September 17, 2024

Firebolt Unleashed: High Efficiency and Low-Cost Concurrency in Action

Introduction

In today’s data-driven landscape, data analysts and developers need systems to process large volumes of information quickly and concurrently, without compromising cost or performance. Firebolt is engineered to meet these demands, offering a unique blend of high concurrency, low-latency query performance, and cost efficiency.

Firebolt’s features, such as its advanced query optimizer and vectorized execution engine, efficient storage format, and intelligent indexing, are key drivers of its performance. Vectorized execution allows Firebolt to efficiently process data in batches while its indexing strategies reduce the data scanned during queries. These optimizations enable Firebolt to deliver query results quickly and efficiently, even for challenging workloads, making it an ideal solution for modern data applications.

This post is part of a benchmarking series that validates Firebolt’s value propositions across different workloads. In this installment, we explore its performance on data application workloads. Our motivation behind this series is twofold: to demonstrate how Firebolt handles large volumes of concurrent queries while maintaining fast execution and affordability and to identify areas where it can continue to evolve. By exploring real-world benchmarks, we aim to ensure Firebolt stays ahead of the ever-growing needs of data analysts and developers. All the queries can be found in our GitHub repository for the entire series of benchmarks.

Summary results 

Benchmark results highlight Firebolt’s strengths in delivering low latency, cost efficiency, and high concurrency, making it a powerful solution for diverse data workloads. The benchmark consists of 25 queries designed to replicate real customer data application workloads on an extended AMPLab dataset. These queries perform complex operations like joins, aggregations, and filters, reflecting real-world production environments. With that in mind, here’s how Firebolt performs:

Low latency:
Firebolt consistently demonstrates fast query execution, making it ideal for time-sensitive tasks like real-time analytics. Firebolt delivered median latencies below 100 ms for this benchmark, which is critical for performance-sensitive workloads. Scale-up and scale-out options can be leveraged to reduce execution time further. For example, scaling up to a single extra-large node configuration resulted in an 8x reduction in execution time for app_q21 compared to a single small node, demonstrating perfect scalability. Similarly, an engine with eight small nodes reduced execution time by over 8x. Not all queries in the benchmark exhibit such perfect scalability, although many do. Depending on the query shape, taking full advantage of all available resources in scale-up or scale-out setups is not always possible. In some cases, complex queries with higher data processing demands benefit even more from scale-out configurations, while simpler queries see significant improvements with scale-up configurations. This scalability ensures that Firebolt can efficiently handle a wide range of workloads, delivering rapid performance regardless of query type or workload size.

The app_q21 query was chosen to illustrate latency as we scale up or scale out because it represents a common yet complex query pattern. It processes web traffic data by combining information from multiple tables through four joins, performing five aggregations, and using a subquery, multiple SQL functions, and complex conditional logic. This query analyzes metrics such as ad revenue, visitor engagement, and page rankings, making it an ideal choice for evaluating performance in real-world scenarios where similar query patterns are often used.
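To make this concrete, here is a rough sketch of a query with the same shape, written against the FireNEWT tables described later in this post. It is not the actual app_q21 (that query lives in the GitHub repository), and the column names that do not appear elsewhere in this post are assumptions based on the table descriptions:

-- Illustrative sketch only; see the FireNEWT repository for the real app_q21.
-- Assumed columns: rankings(pageurl, pagerank), uservisits(adrevenue, useragent),
-- agents(agent_name, device), searchwords(word), ipaddresses(ip, asn_name).
WITH top_pages AS (                                        -- CTE/subquery over rankings
    SELECT pageurl, pagerank
    FROM rankings
    WHERE pagerank > 100
)
SELECT
    ip.asn_name,
    sw.word,
    COUNT(*)                        AS visits,             -- aggregation 1
    COUNT(DISTINCT uv.sourceip)     AS unique_visitors,    -- aggregation 2
    SUM(uv.adrevenue)               AS total_ad_revenue,   -- aggregation 3
    AVG(uv.duration)                AS avg_duration,       -- aggregation 4
    AVG(CASE WHEN a.device = 'mobile'                      -- aggregation 5 + conditional logic
             THEN uv.duration END)  AS avg_mobile_duration
FROM uservisits uv
JOIN top_pages   tp ON uv.destinationurl = tp.pageurl      -- join 1
JOIN searchwords sw ON uv.searchword     = sw.word         -- join 2
JOIN agents      a  ON uv.useragent      = a.agent_name    -- join 3
JOIN ipaddresses ip ON uv.sourceip       = ip.ip            -- join 4
WHERE uv.visitdate BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY ip.asn_name, sw.word
ORDER BY total_ad_revenue DESC
LIMIT 100;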

Cost efficiency:
Firebolt offers a flexible approach to balancing cost and performance. An engine with one small node is the most cost-efficient configuration on our benchmark workload. For faster execution, an engine with a single extra-large node or one with eight small nodes offers significantly reduced query times, though at an incremental cost. While single-node engine configurations are more cost-efficient for this dataset and benchmark, scale-out setups would address cases beyond a single node's capabilities, such as handling larger datasets and high concurrency. This flexibility allows users to select the best configuration based on their needs—whether they prioritize cost savings or faster performance.

High concurrency:
Firebolt excels in managing high-concurrency environments, handling large volumes of queries without degradation in performance. In a multi-cluster engine setup, query throughput increased nearly tenfold from 449 to 4213 QPS, demonstrating Firebolt’s capability to handle thousands of queries per second. Under heavy query loads of 8000 requests sent concurrently, median query latency remained low in the tens of milliseconds, further solidifying Firebolt as a reliable solution for high-concurrency environments.

These benchmark results demonstrate Firebolt’s ability to scale efficiently, providing businesses with flexibility depending on their workload demands. Whether for high-throughput tasks or time-sensitive queries, Firebolt is both cost-effective and highly performant.

Benchmark methodology 

Industry-standard benchmarks like TPC-H have long been used to evaluate database performance. However, these benchmarks often fall short when reflecting the real-world complexities found in production workloads. When we tried to understand the impact of improvements in our production environment, it became evident that relying solely on traditional benchmarks could provide skewed, non-representative insights. This section examines why industry-standard benchmarks often fall short of capturing the complexity of modern data environments and introduces FireNEWT (Nexus for Efficient Workload Testing), a benchmark designed specifically to address these limitations. Unlike industry benchmarks, FireNEWT is inspired by anonymized query patterns observed across Firebolt customers, reflecting the workloads and challenges encountered in today's data environments.

Inspired by the "Nastily Exhausting Wizarding Tests" from the Harry Potter series, FireNEWT is designed to rigorously evaluate Firebolt's ability to manage a wide range of query complexities. It challenges the system with scenarios beyond standard benchmarks by incorporating real-world query patterns, resource-intensive operations, and high-concurrency conditions. This comprehensive approach ensures that Firebolt's performance is tested under the same demanding conditions customers experience, making FireNEWT a realistic and future-proof benchmark for modern data applications. FireNEWT consists of three key components: a set of queries representing customer patterns, an extended version of the AMPLab dataset, and a range of workload execution scenarios. The benchmark is publicly available on GitHub, allowing anyone to explore and test Firebolt's capabilities firsthand.

Why traditional benchmarks like TPC-H are insufficient

TPC-H was originally designed to simulate decision support systems with predefined queries. While useful for some evaluations, research shows that TPC-H doesn't fully capture the complexities of modern data workloads. Firebolt customer workloads, for instance, often involve more complex operations like multiple joins, aggregations, window functions, CTEs, complex conditions, and subqueries. These operations are increasingly common, making it critical for benchmarking methodologies to reflect them accurately. As highlighted in Amazon's research paper, "Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet", real-world workloads demand more robust representations. To illustrate this gap, we compare the structural complexity of TPC-H queries with those in our customer workloads.

Calculating structural complexity: Our approach

In our benchmarking methodology, structural complexity is a key metric that quantifies the complexity of a query based on its components. We assigned weights to the critical elements of a query that most impact performance, such as joins, filters, aggregations, window functions, subqueries, CTEs, SQL functions, and projections.

Weights for each query operation are assigned based on their relative impact on system performance, with more resource-intensive operations like joins receiving higher weights (e.g., 10) due to their significant CPU, memory, and I/O demands. Simpler operations, such as column projections, are assigned lower weights (e.g., 0.5) because they have a much lower impact on overall performance. The structural complexity score for a query is calculated by multiplying the weight assigned to each operation by its frequency within the query and summing the weighted values across all operations, yielding a quantitative measure of a query's inherent difficulty that reflects both the nature and the prevalence of each operation. This approach allows for direct comparisons between the TPC-H and FireNEWT benchmarks.
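As a sketch of the calculation, the complexity score of a query q is

score(q) = sum over operations o of weight(o) × count(o in q)

For example, a query with four joins (weight 10), five aggregations, one subquery, and six projected columns (weight 0.5) would score 4×10 + 5×w_agg + 1×w_sub + 6×0.5; with illustrative weights of 5 for aggregations and 8 for subqueries (assumed values, not the actual ones), that works out to 40 + 25 + 8 + 3 = 76. Only the join and projection weights above come from the description; the others are placeholders to show the arithmetic.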

The role of operational complexity in real-world workloads

While structural complexity is crucial, it's not the only factor that matters in real-world scenarios. Operational complexity, which encompasses the volume of data scanned or processed, execution time, resource consumption, and query frequency, is also critical for assessing a database system's performance under production conditions.

In developing the FireNEWT benchmark, we conducted an in-depth analysis of Firebolt's existing customer query workloads to understand real-world patterns. This analysis measured structural and operational complexities to create a comprehensive performance metric. However, when comparing FireNEWT with TPC-H, we focused solely on structural complexity to ensure an unbiased comparison. Since TPC-H queries are typically run in isolation and not in a production-like environment with diverse, repeated data, including operational factors would have skewed the results.

In the context of the FireNEWT benchmark, we chose to focus on structural complexity for several reasons:

  • Isolation of structural impact: By isolating structural complexity, we can compare the inherent difficulty of queries without the variability introduced by operational factors. This highlights the fundamental difference in complexity between FireNEWT and TPC-H workloads.
  • Benchmarking modern systems: The primary goal of the FireNEWT benchmark is to demonstrate the limitations of TPC-H in representing Firebolt customer workloads. Focusing on structural complexity provides a clear, quantifiable comparison, showing that FireNEWT workloads more closely reflect the structure of real customer queries.
  • Operational factors as a separate metric: Operational factors such as data scan volume and execution time vary widely depending on system configuration and dataset specifics. Including these factors in the complexity score would introduce variability that could obscure the core message—that FireNEWT workloads are structurally more complex than TPC-H workloads. Operational metrics are used to validate and improve Firebolt’s scalability and performance across different configurations.

Comparing structural complexity: FireNEWT vs. TPC-H

Our analysis revealed significant differences in structural complexity between FireNEWT and TPC-H:

  • TPC-H: The highest structural complexity score in TPC-H is 158, with most queries scoring below 100. This highlights the simplicity of TPC-H queries, which do not reflect the complexities of modern workloads.
  • FireNEWT: In contrast, FireNEWT queries demonstrate a broad spectrum of structural complexity, with scores reaching 626. While many queries score above 300, indicating a high level of complexity, the overall workload is skewed. A significant portion of the workload consists of simpler queries, but these are punctuated by a smaller set of highly complex queries that include multiple joins, nested subqueries, window functions, and aggregations. This distribution mirrors the real-world scenarios databases encounter, where the majority of queries may be routine but need to coexist with complex, resource-intensive queries. FireNEWT's design reflects this skewed distribution, providing a comprehensive and realistic benchmark for evaluating database performance across simple and intricate query patterns. Additionally, it is essential to note that we focus exclusively on SELECT queries applicable to data-intensive applications, which are particularly relevant for high-concurrency, real-time environments.

Benchmark | Number of Queries | Highest Complexity Score | Lowest Complexity Score | Average Complexity Score
TPC-H | 22 | 158 | 15 | 70
FireNEWT | 25 | 626 | 4 | 137

These findings emphasize that TPC-H is not a sufficient benchmark for modern data systems and highlight the need for a more accurate benchmark like FireNEWT.

Moving forward with real-world benchmarking

The analysis above underscores the importance of using benchmarks that reflect both structural and operational complexities to ensure databases are prepared for real-world workloads. The FireNEWT benchmark offers a more comprehensive and realistic measure of system performance, helping organizations optimize their databases for the demands of production environments. 

Benchmark setup

Dataset overview

The benchmarking process for Firebolt utilizes the AMPLab dataset, a comprehensive dataset originally designed for evaluating large-scale data processing frameworks. Established at the University of California, Berkeley, the AMPLab dataset simulates a search engine's web analytics, providing a variety of real-world data patterns for analytics workloads. The core tables are `Rankings` and `UserVisits`. We extended the AMPLab dataset with several additional tables that let us express user-centric query patterns, which we cover later in this section. This dataset is particularly useful for measuring scans, aggregations, and joins across different data sizes. AMPLab serves as the basis for our in-depth analysis of Firebolt's data application query capabilities. We call this extension of AMPLab the "FireNEWT" data application benchmark.

Why the AMPLab Dataset?

The AMPLab dataset was chosen because it provides:

  • Relevance: The dataset’s structure mimics common data patterns found in analytics workloads, often involving complex, large-scale relationships—similar to the vast and highly detailed web traffic data processed by Firebolt customers like SimilarWeb. Their data is particularly complex because it spans multiple dimensions such as geographic regions, devices, time periods, and user behavior, requiring deep analytical insights. By reflecting these challenges, this dataset allows us to rigorously test Firebolt's ability to handle intricate queries efficiently and at scale.
  • Scalability: The dataset’s variety in size and data relationships mimics real-world workloads, allowing us to test the system’s ability to efficiently handle large-scale queries.

Dataset extension 

To better simulate real-world workloads, the AMPLab dataset was extended with additional dimension tables to match customer patterns. This modification helps to accurately reflect the complexity of production query patterns, which could not be fully captured with the original set of tables in the AMPLab dataset.

After analyzing Firebolt's customer query workloads, it became clear that customer queries are quite intricate, often involving multiple dimensions, and the original AMPLab tables did not provide sufficient flexibility to express all these patterns. We added the dimension tables searchwords, agents, and ipaddresses, which capture details such as user search terms, system interaction agents, and IP address information, supporting analysis of search behavior, technology trends, and traffic sources. More details can be found in the table below.

Introducing these dimension tables allows us to design and test more complex query patterns that closely mirror real-world applications. This approach ensures the benchmark reflects the diverse data models and query types common in customer workloads, providing valuable insights into how a system performs under similar conditions.

The following table provides an overview of the sizes and characteristics of the tables used in FireNEWT:

Table name | Size | Row count | Description
Rankings | 17.2 GB | 300,000,000 | Stores data about document rankings for various search terms
UserVisits | 1.02 TB | 6,699,905,372 | Records detailed information about user visits, including source and destination URLs, visit dates, ad revenue, and user behavior metrics, enabling the analysis of user engagement and revenue generation across different countries and languages
Agents | 180 KB | 2,000 | Captures details about user agents interacting with the system, such as the operating system, browser, and device architecture, which can be used to analyze trends in user technology preferences
Searchwords | 30 KB | 1,000 | Stores information about search terms used by users, including their unique identifiers, when they were first seen, and whether they are categorized as a topic, providing insights into search behavior and trends over time
IPaddresses | 641.2 MB | 19,049,831 | Maintains a registry of IP addresses linked to their respective autonomous systems and names, useful for network analysis and identifying the geographical or organizational source of traffic
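For illustration, a dimension table like searchwords could be declared roughly as follows. This is a sketch only: the actual DDL, column names, and primary index choices live in the FireNEWT repository, and the exact CREATE TABLE syntax should be checked against the current Firebolt documentation.

-- Hypothetical sketch of one dimension table; not the definition used in the benchmark.
CREATE TABLE IF NOT EXISTS searchwords (
    word       TEXT,     -- the search term (assumed join key from uservisits.searchword)
    word_id    BIGINT,   -- unique identifier
    firstseen  DATE,     -- when the term was first observed
    is_topic   BOOLEAN   -- whether the term is categorized as a topic
) PRIMARY INDEX word;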

Query workload

To define the query workload for our benchmark, we began by analyzing production workloads from our customers. This analysis helped us gain insights into the types of queries they typically run, their resource consumption patterns, and any performance bottlenecks they face. By studying these factors, we could understand the most commonly used query patterns and operator combinations, the levels of concurrency they encounter, and the sizes of datasets they work with. This understanding enabled us to design a benchmark that closely mirrors the challenges customers encounter in real-world scenarios, ensuring that it reflects both Firebolt's performance and operational efficiency.

We categorized the workload into three tiers based on a scoring system ranging from 0 to 700, where scores reflected query structural complexity. Queries scoring below 50 were classified as simple, those between 50 and 200 as moderate, and those above 200 as complex. This range was chosen to reflect the actual distribution of queries in production environments, from simple lookups to complex, resource-heavy operations. This approach allowed us to evaluate Firebolt’s current performance and ability to handle increasingly complex workloads in the future. 

As described earlier, FireNEWT is designed to rigorously evaluate Firebolt's ability to manage a wide range of query complexities, challenging the system with scenarios beyond standard benchmarks by incorporating real-world query patterns, resource-intensive operations, and high-concurrency conditions. FireNEWT features two key scenarios: a power run to test latency and efficiency, and a multi-cluster concurrency run to measure throughput under high load. These scenarios are explored in detail in the next section. The key point is that this approach ensures Firebolt's performance is evaluated under demanding conditions that closely resemble real customer workloads, making FireNEWT a more realistic and forward-looking benchmark for modern data applications.

Clearly, we cannot and would not want to use customer queries or data directly. To maintain realism while protecting customer confidentiality, we adapted production query patterns and rewrote them to leverage the extended AMPLab dataset described above while preserving essential query elements such as joins, aggregations, window functions, CTEs, subqueries, and SQL functions. This allowed us to maintain the integrity of the queries and safeguard our customers’ information while ensuring a realistic assessment of Firebolt’s performance.

Through the creation of FireNEWT, we deliver a comprehensive and accurate benchmarking suite that reflects the evolving demands of modern data applications. This suite gives us valuable insights into how Firebolt performs today and how it will scale to meet future challenges. We used the FireNEWT benchmark to run our experiments.

Engine configuration

A Firebolt engine is a compute resource designed to process data and execute queries efficiently. It is a collection of one or more identical clusters, each consisting of compute resources defined by node type and node count. Engines support vertical scaling by adjusting the node type (Small, Medium, Large, X-Large) and horizontal scaling by modifying the number of nodes (1-128) to fit varying workload needs.

Firebolt provides a range of node types, each offering a unique balance of compute resources that directly influences the performance and efficiency of the system. These resources are critical in both the power run and multi-cluster concurrency scenarios, allowing us to evaluate Firebolt's performance across varying scales and workload intensities. Below is a table outlining the node types used in the benchmark along with their respective costs for the us-east-1 region on AWS:

Node Type | Cost per Hour
Small | $2.8
Medium | $5.6
Large | $11.2
XLarge | $22.4

These resource allocations underpin the various engine configurations used in our benchmarks, enabling us to evaluate Firebolt’s performance across different scales and workload intensities.

As part of the benchmark setup, we utilized different Firebolt engine configurations to test performance under various conditions. Each engine name follows a straightforward naming convention. The prefix "FB" stands for Firebolt, followed by a number that indicates the number of nodes in the engine (e.g., 1, 2, 4, or 8). The final letter denotes the size of the node: "S" for Small, "M" for Medium, "L" for Large, and "XL" for Extra-Large. For example, FB_1M refers to an engine with one medium-sized node, while FB_4L refers to an engine with four large nodes. This naming convention allows for easy identification of engine capacity and configuration, which plays a critical role in how different workloads are processed during the benchmark.
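As a rough sketch, such engines can be created through SQL DDL along the following lines. Treat the property names and exact syntax (TYPE, NODES, CLUSTERS) as assumptions to verify against the current Firebolt documentation; they are not taken from the benchmark scripts.

-- Hypothetical engine DDL sketch; confirm the exact syntax in the Firebolt docs.
CREATE ENGINE IF NOT EXISTS FB_1S  WITH TYPE = S  NODES = 1;
CREATE ENGINE IF NOT EXISTS FB_8S  WITH TYPE = S  NODES = 8;
CREATE ENGINE IF NOT EXISTS FB_1XL WITH TYPE = XL NODES = 1;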

Benchmark scenarios

Based on the data and queries described in the previous sections, we constructed two benchmark scenarios to evaluate different aspects of data app workloads: a power run and a multi-cluster concurrency test.

Power run of all FireNEWT benchmark queries

The Power Run scenario is designed to measure Firebolt's latency, query execution efficiency, and cost-effectiveness. We sequentially execute all 25 FireNEWT benchmark queries, observing how the system handles a wide range of data application queries. This scenario evaluates performance across different engine configurations, specifically by analyzing the effects of both scale-up and scale-out engines on a 1 TB dataset. To increase the stability of the numbers, we run the full set of queries X times with subresult reuse disabled and use the average running time across all runs.

This scenario offers a clear view of Firebolt's baseline efficiency. You’ll gain a better understanding of how different engine configurations affect query processing, which is essential for optimizing workloads. Decision-makers will find this scenario invaluable for resource planning, as it highlights how Firebolt can deliver high performance while managing costs effectively.

Multi-cluster engine concurrency 

The Multi-Cluster Engine Concurrency Run is designed to evaluate how Firebolt scales across multiple clusters, maintaining high throughput and efficiency under extreme concurrency. In this scenario, we execute a highly parameterized query, expanded to 4 million distinct instances, to simulate a real-world, high-demand environment. The specific query pattern used in this scenario is:

SELECT sourceip, searchword, AVG(duration) AS average_duration
FROM uservisits
WHERE visitdate = '2024-09-15'
  AND destinationurl = 'firebolt.io'
GROUP BY sourceip, searchword;

We parameterize this query on visitdate and destinationurl to create 4 million distinct queries.
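In parameterized form, the template looks like the sketch below; the placeholder syntax is illustrative, and the benchmark driver substitutes 4 million distinct (visitdate, destinationurl) pairs into it.

-- Illustrative parameterized template; placeholder style depends on the client/driver used.
SELECT sourceip, searchword, AVG(duration) AS average_duration
FROM uservisits
WHERE visitdate = :visit_date
  AND destinationurl = :destination_url
GROUP BY sourceip, searchword;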

To rigorously test Firebolt's capabilities, we simulate 20 clients, each submitting 400 requests concurrently. As soon as one request completes, another is sent immediately, ensuring a constant total of 8,000 active requests at any given time. This setup pushes the system to its limits, specifically measuring throughput and how well Firebolt manages distributed query execution and resource allocation across multiple clusters—a crucial factor for data analysts and developers overseeing large-scale data operations.

This scenario is crucial for understanding Firebolt's ability to scale out efficiently while maintaining low operational costs and efficient use of resources. The ability to handle massive concurrency with high efficiency directly influences the cost-effectiveness and overall performance of the database platform, especially as data demands continue to grow.

The insights gained from this scenario will demonstrate Firebolt's ability to support data-intensive applications in high-concurrency environments, ensuring the platform delivers strong performance while maintaining cost efficiency. This is vital for organizations looking to expand their data infrastructure without compromising on performance or escalating costs.

Engine configurations

In this benchmark, we evaluate various Firebolt engine configurations covering both scale-up and scale-out scenarios, with the goal of measuring performance in terms of latency, scalability, concurrency, and cost-efficiency.

Power run configurations

For the Power Run scenario, we assess Firebolt's performance using two different scaling strategies: scale-up and scale-out.

  • Scale-Up Configuration: This setup tests vertical scaling by progressively increasing the size of the node within the engine. We'll evaluate how Firebolt improves query performance as we move from Small to Medium, Large, and Extra-Large node engines. This configuration helps us understand how query latency is affected by engine size and how efficiently Firebolt scales with increasing data volume and workload complexity.
  • Scale-Out Configuration: In this setup, we test horizontal scaling by increasing the number of nodes in the engine, evaluating performance across configurations of 1, 2, 4, and 8 Small nodes. This approach allows us to observe how Firebolt distributes workloads and executes queries through distributed query execution across a multi-node engine, aiming to reduce query latency through efficient load balancing and distributed processing. This configuration is essential for understanding how Firebolt scales out to sustain high performance and efficiency as workload demands increase.

Multi-cluster engine configuration

In the multi-cluster setup, we aim to evaluate Firebolt's ability to manage simultaneous query execution while optimizing throughput across multiple clusters. We will test three configurations:

  • A single cluster engine with 1 Large node.
  • A two-cluster engine, each with 1 Large node.
  • A ten-cluster engine, each with 1 Large node.

This setup is designed to measure the impact on throughput as the number of clusters increases. By observing how Firebolt handles workload distribution across these configurations, we can assess its effectiveness in scaling out to maximize query throughput, even under high concurrency.

Results, observations, and insights

Power run analysis

In modern cloud environments, performance and cost are critical factors when deciding between scaling up (using larger nodes) or scaling out (using more nodes). Below, we dive into the Power Run Benchmark, comparing both approaches—scale-up and scale-out—to determine how they affect execution time and cost efficiency.

Scale-up: Faster performance as node size increases

Scaling up involves increasing the size and capacity of individual nodes, allowing for faster execution times due to greater compute power. The results from scaling up through different node types are clear:

Execution speed: As we scale up from FB_1S to FB_1XL, execution time improves significantly, but the speedup factor diminishes as we move to larger configurations. For example, scaling from FB_1S to FB_1M results in a near-perfect 2x improvement, reducing total execution time from 20.1 seconds to 11.2 seconds. However, as we scale further to FB_1L and FB_1XL, the improvements become smaller. The FB_1XL node processes all 25 queries in 6.1 seconds, which is about 3.3x faster than FB_1S, but only 1.27x faster than FB_1L.

Data shows that FB_1XL is an excellent choice when you need to maximize performance for time-sensitive workloads like real-time analytics, where faster response times directly translate to business value. On the other hand, FB_1S offers the best price-performance ratio when cost efficiency is the primary concern.

These diminishing returns are a common pattern seen across most engines when scaling up—scaling always makes things faster, but the efficiency of speedup lessens as you add more resources. While Firebolt excels in improving execution times with scaling, we recognize that there are still opportunities to enhance performance further, particularly in higher configurations. Our goal is to continue pushing these limits to achieve even better results for large-scale workloads.

Cost efficiency: While FB_1XL offers superior speed, it comes at a higher cost: FB_1XL costs $0.038 for the whole power run compared to $0.015 for FB_1S. However, the trade-off between speed and cost can be worthwhile in scenarios where faster processing times lead to greater operational efficiency or cost savings elsewhere in the workflow (e.g., avoiding bottlenecks or freeing up resources for other tasks). Additionally, as we scaled up, we observed that while query execution became faster, peak resource utilization (CPU) stayed below 63%, indicating that the engine did not fully utilize its capacity. This reflects the potential for running more queries concurrently, effectively distributing the workload. As a result, the perceived higher cost of FB_1XL can actually be amortized when running more queries, making it a cost-efficient option when maximizing resource utilization.
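These cost figures follow directly from the hourly node prices and run times, assuming cost accrues per second at the listed rates: FB_1XL at $22.4/hour for 6.1 seconds is 22.4 × (6.1 ÷ 3600) ≈ $0.038, and FB_1S at $2.8/hour for 20.1 seconds is 2.8 × (20.1 ÷ 3600) ≈ $0.016, i.e., roughly the $0.015 quoted above.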

Takeaway: Scale-up delivers excellent performance gains by reducing execution time as node size increases. However, larger nodes come at a premium, so the decision to scale up should be based on whether the increased speed justifies the additional cost.

Scale out: Speed through distributed processing

Scaling out involves using multiple smaller nodes to distribute the workload, enabling parallel processing. Let’s explore how execution time and cost evolve when scaling out with multiple S-type nodes:

Execution speed: As you scale out by adding more nodes, the total execution time decreases due to task parallelization. For example, moving from a single FB_1S node to an FB_8S configuration reduces execution time from 20.1 seconds to 8.2 seconds, cutting the time by a little more than half. However, the reduction in execution time doesn't scale in direct proportion to the number of nodes added. This reflects a common pattern in distributed systems, where the initial gains from adding nodes are substantial, but the improvements diminish as more nodes are introduced.

app_q21 | FB_1S | FB_8S | FB_1XL
Latency (seconds) | 0.71 | 0.083 | 0.086
Price-Performance (cents) | 0.05 | 0.05 | 0.05

Despite this, scaling out remains an attractive option for workloads that benefit from parallel processing, such as longer running queries. In particular, for resource-intensive and complex queries like app_q21, scaling out to FB_8S delivers a perfect improvement, reducing latency by a factor of 8x. This shows that while scaling out offers diminishing returns for overall execution times, it can still provide significant gains for complex workloads, making it a powerful approach for handling more demanding queries.

Cost efficiency: However, this increase in speed comes with a corresponding rise in cost. The cost of the FB_8S configuration is $0.05, which is more than triple the cost of the FB_1S node. While this is a significant cost increase, the ability to complete tasks faster in parallel may justify the expense, particularly in high-throughput environments where overall query speed is critical.
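Assuming the same per-second accrual, the scale-out figure follows from eight Small nodes running for 8.2 seconds: 8 × $2.8/hour × (8.2 ÷ 3600) ≈ $0.051, i.e., roughly the $0.05 quoted above.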

For overall execution time, while FB_8S is nearly 2.5 times faster than FB_1S, it comes with an 8x higher cost per second, highlighting the trade-off between speed and cost when scaling out. However, in the case of complex queries like app_q21, scale-out demonstrates near-linear improvements in latency without increasing costs proportionally. This indicates that for more intricate queries, scaling out can offer both performance gains and cost-efficiency. Additionally, as we scaled out, we observed that while query execution became faster, peak resource utilization (CPU) stayed below 50%, indicating that the engine did not fully utilize its capacity. This reflects the potential for running more queries concurrently, effectively distributing the workload.

Takeaway: For the current FireNEWT benchmark queries, scaling out delivers significant improvements in execution time by leveraging parallelism, but it also increases cost per second as more nodes are added. However, for complex queries like app_q21, scale-out can provide near-linear improvements in latency without an increase in cost, making it an efficient choice for intricate workloads. This approach is best suited for scenarios where both speed and complexity justify the added cost of parallel processing.

Analysis of multi-cluster engine concurrency

In high-concurrency environments, where multiple clusters are used to handle large volumes of queries, understanding the system's scalability, performance trade-offs, and stability is crucial. The data we gathered from running 1, 2, and 10 clusters of 1L size highlights key trends in QPS (Queries Per Second), latency, and error handling that offer insights into how the system behaves as the workload scales.

Scaling throughput with multi-cluster concurrency

Throughput is a key performance metric for systems handling large volumes of queries, and the data shows a significant increase in Queries Per Second (QPS) as clusters are added in highly concurrent environments. The graph below illustrates the QPS growth as the number of clusters increases.

The data shows that QPS scales nearly tenfold from 449 with 1 cluster to 4213 with 10 clusters, demonstrating the system's strong ability to handle increased query load. This near-linear scaling indicates that the system efficiently utilizes additional resources as more clusters are added, ensuring that query throughput increases proportionally to the number of clusters.
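Put differently, throughput per cluster stays nearly constant: 449 QPS with a single cluster versus 4213 ÷ 10 ≈ 421 QPS per cluster with ten clusters, a scaling efficiency of roughly 94%.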

Takeaway: The addition of clusters leads to a substantial improvement in throughput, making the system well-suited for high-concurrency environments where rapid query execution and scalability are critical. This performance scalability ensures that the system can handle larger workloads without compromising speed or efficiency.

Latency breakdown: Analyzing typical query performance

Latency is a critical performance metric for any system handling queries, and the data shows a clear improvement in response times as more clusters are added in a highly concurrent request environment. The plot below shows the distribution of query latencies under high load as we increase the number of clusters in the engine:

As shown in the median and 90th percentile, most queries see a significant reduction in response times as clusters increase. For example, median latency drops from 214 ms with 1 cluster to just 48 ms with 10 clusters, indicating that the majority of queries benefit from faster processing. Even the 90th percentile latency improves as the number of clusters increases, though the improvement starts to plateau between 2 and 10 clusters.

Takeaway: The scaling of clusters results in improved latency for the majority of queries, with noticeable gains in both median and 90th percentile latency. This means the typical user or query workload sees faster response times as more clusters are added, making it ideal for environments requiring fast, consistent query performance.

Ensuring stability:

Across all cluster configurations (1, 2, and 10 clusters), no errors occurred during query processing. This error-free performance is crucial for ensuring reliability and stability in a multi-cluster setup, where increasing the workload and adding clusters often introduces more complexity.

Takeaway: The system demonstrates high reliability, handling large-scale query workloads without errors, even when the number of clusters and the number of queries processed increases significantly. This makes the system dependable for production environments that require consistent query execution without disruptions.

Conclusion

The benchmark results confirm that Firebolt delivers a strong balance of performance, cost-efficiency, and concurrency, making it an excellent choice for data-intensive applications. With its advanced vectorized execution engine, indexing, and query optimizer, Firebolt excels at handling complex queries while maintaining low latency and high concurrency. The scalability of the platform allows users to choose between scaling up for more powerful individual nodes or scaling out to distribute workloads across multiple nodes, providing flexibility based on workload demands.

While scaling up offers rapid improvements in execution time, especially for real-time analytics, it comes with higher costs as node sizes increase. On the other hand, scaling out shows substantial performance gains in tasks that benefit from parallel processing, such as complex queries with multiple joins and aggregations, like app_q21, where latency improves significantly without a proportional cost increase.

Firebolt’s multi-cluster concurrency further showcases its ability to handle heavy query loads, maintaining high throughput and low latency even under extreme concurrency. The system’s ability to scale efficiently, coupled with its cost-effective configurations, makes it a reliable solution for modern data environments that demand both speed and affordability.

Overall, Firebolt’s architecture and performance optimizations position it as a future-proof platform capable of meeting the evolving needs of data analysts and developers, ensuring high performance at scale while keeping operational costs in check.

Try it Yourself

As data workloads grow in complexity and scale, optimizing for performance and cost-efficiency is crucial. This benchmarking exercise demonstrates how Firebolt can efficiently handle diverse query workloads using both scale-up and scale-out configurations, ensuring flexibility and consistent performance under high concurrency. You can access the FireNEWT benchmark GitHub repo to reproduce these results and test Firebolt's performance under real-world conditions. Whether you're a data analyst seeking low-latency query execution, a developer focused on high concurrency, or a decision-maker evaluating cost-performance, Firebolt provides the tools to meet your specific needs. To explore Firebolt's features, you can get started for free here.

Dive into its vectorized execution, intelligent indexing, and scalable architecture to optimize your data workloads. The FireNEWT benchmark is available for you to modify and adapt to your specific use cases, whether they involve data applications, ETL processes, or real-time analytics. By leveraging Firebolt, you'll discover how it can optimize your data operations with low-latency, high-concurrency performance that scales with your needs—all at a cost-effective price.

Dataset reference: AMPLab Big Data Benchmark, https://amplab.cs.berkeley.edu/benchmark/