September 17, 2024

Low Latency Incremental Ingestion: Benchmarking Fast and Efficient DML Operations

Introduction

This blog post explores Firebolt's performance characteristics for incremental data ingestion workloads, sometimes called “trickle” ingestion. Though less common than bulk ingestion, incremental ingestion is a pattern we see in Firebolt customer workloads, and a benchmark enables us to assess our system’s performance in these cases. Our benchmarks also include other data manipulation language (DML) queries to better capture the types of ongoing operations we see on production workloads.

DML operations are challenging for data warehouses due to how data is typically stored, especially in immutable object storage. DML statements introduce processing overhead because of the need for transactional consistency, snapshot isolation, and rewriting or reorganizing files in storage, which often results in poor performance. This benchmark helps characterize and demonstrate Firebolt’s efficiency when processing DML queries.

This blog is part of a series examining Firebolt's end-to-end query capabilities. You can read this blog for a deep dive into our query engine's efficiency and concurrency performance. Our benchmark analysis of bulk ingestion workloads can be found here. At the end of this article, you will find instructions and resources on how to recreate the benchmarks locally.

Summary of Results

Firebolt consistently offers low-latency performance across DML queries. Singleton and micro-batch INSERT queries have P95 response times of 0.273 and 0.284 seconds, respectively, only rising to a P95 of 0.441 seconds when inserting 1,000 rows at a time. Inserting new data is the most common DML operation on existing datasets, but customers often update or delete production data. The P95 latencies for UPDATE and DELETE queries (tested as singleton queries) are 0.263 and 0.313 seconds, respectively. The following sections detail the datasets, queries, and results from our DML benchmarks.

Benchmark Setup

Data

The basis for our trickle ingestion benchmark is the AMPLab Big Data Benchmark, which is the same framework we used to benchmark Firebolt’s bulk ingestion and query performance. Much like those benchmarks, we used the uservisits table, created with the following data definition language (DDL):

CREATE FACT TABLE uservisits (
  sourceip TEXT NOT NULL,
  destinationurl TEXT NOT NULL,
  visitdate DATE NOT NULL,
  adrevenue REAL NOT NULL,
  useragent TEXT NOT NULL,
  countrycode TEXT NOT NULL,
  languagecode TEXT NOT NULL,
  searchword TEXT NOT NULL,
  duration INTEGER NOT NULL
) PRIMARY INDEX visitdate, destinationurl, sourceip;

For incremental ingestion scenarios, inserting data into an existing table rather than an empty one is more realistic because the goal is to measure performance when adding (or updating) small amounts of data to an existing dataset. We chose a version of the uservisits table that is approximately 100GB.

# of Columns    # of Rows       Size (uncompressed)
10              749,995,154     100GB

We settled on 100GB as the size of our dataset for two primary reasons. First, Firebolt stores data in internal structures called tablets¹, where it is sorted, compressed, and indexed. A 100GB dataset ensures we fill multiple tablets, which can increase the time needed to update data in storage and therefore gives a realistic view of ACID DML performance. Second, limiting the size keeps the benchmark inexpensive and accessible for anyone who wants to recreate these measurements, and we saw little difference with larger base tables. Finally, we defined a primary index when creating the table to keep the table and queries realistic and performant. Indexes are critical to high performance in Firebolt; learn more about Firebolt’s primary indexes and how to choose the right columns for your indexes in our documentation.
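
As an illustration of why those index columns matter, a typical read query against this table would filter on the leading primary index column, visitdate, which is what allows Firebolt to prune tablets during scans. The query below is not part of the benchmark, and the date range is made up for readability:

-- Illustrative read query (not part of the benchmark). Filtering on the
-- leading primary index column lets Firebolt skip tablets that cannot
-- contain matching rows.
SELECT destinationurl, SUM(adrevenue) AS revenue
FROM uservisits
WHERE visitdate BETWEEN DATE '2000-02-01' AND DATE '2000-02-07'
GROUP BY destinationurl
ORDER BY revenue DESC
LIMIT 10;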

Hardware

For our tests, we used an engine configured with a single small node. Our DML benchmarks focus on latency, not throughput or concurrency, and a small node is more than sufficient to demonstrate Firebolt’s low latency for incremental updates. A single small node has the added benefit of being inexpensive.

Engine Node Size    # of Nodes    $/hour
Small               1             2.80
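
For readers recreating the setup, an engine like this can be provisioned with DDL along the lines of the sketch below. The engine name is made up, and the exact parameters accepted may differ, so check the CREATE ENGINE documentation before running it:

-- Hypothetical engine definition for the configuration above; the engine
-- name is illustrative, and parameter names should be verified against
-- the current CREATE ENGINE documentation.
CREATE ENGINE IF NOT EXISTS dml_benchmark_engine
WITH TYPE = S NODES = 1;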

Test fixture

We ran the public benchmarking package from an AWS EC2 instance, type c5a.4xlarge, located in the us-east-1 region, targeting the public Firebolt API in the same region. This mirrors a real-world configuration for customers who want the best performance and keeps external latencies negligible for the purposes of this analysis.

Syntax

Each UPDATE and DELETE query filters on the three primary index columns with values known to match exactly one row. For UPDATE, every column is modified slightly based on its existing value. Simplified INSERT, UPDATE, and DELETE templates are shown below to illustrate our approach; as mentioned, all of the queries can be found in our GitHub repository for the entire series of benchmarks.

INSERT:
INSERT INTO table VALUES (row 1 values);
INSERT INTO table VALUES (row 1 values), ..., (row 1000 values);

UPDATE:
UPDATE table SET <every column to mutated value> WHERE <primary index columns all equal a specific value>;

DELETE:
DELETE FROM table WHERE <primary index columns all equal a specific value>;
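
To make these templates concrete, the statements below show what each operation might look like against the uservisits table. The literal values are invented for readability; the actual benchmark values are generated by the scripts in our repository, and the real UPDATE mutates every column rather than the few shown here.

-- Singleton INSERT with illustrative values
INSERT INTO uservisits VALUES
  ('142.250.0.17', 'example.com/landing', DATE '2000-02-01', 0.42,
   'Mozilla/5.0', 'DEU', 'DE-DE', 'analytics', 7);

-- Micro-batch INSERT: additional rows are appended as extra tuples
INSERT INTO uservisits VALUES
  ('142.250.0.17', 'example.com/landing', DATE '2000-02-01', 0.42,
   'Mozilla/5.0', 'DEU', 'DE-DE', 'analytics', 7),
  ('93.184.216.34', 'example.com/pricing', DATE '2000-02-01', 1.05,
   'Mozilla/5.0', 'USA', 'EN-US', 'benchmark', 12);

-- Singleton UPDATE: the WHERE clause pins all three primary index columns
-- to a single known row (only a few of the mutated columns are shown)
UPDATE uservisits
SET adrevenue = adrevenue * 1.1,
    duration = duration + 1
WHERE visitdate = DATE '2000-02-01'
  AND destinationurl = 'example.com/landing'
  AND sourceip = '142.250.0.17';

-- Singleton DELETE using the same primary index predicate
DELETE FROM uservisits
WHERE visitdate = DATE '2000-02-01'
  AND destinationurl = 'example.com/landing'
  AND sourceip = '142.250.0.17';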

Benchmark Scenarios

We explored three different scenarios for our incremental ingestion (and DML) benchmarks. Our primary goal was to measure how efficiently Firebolt ingests new data in single or micro-batches. This type of ingestion is common, but so are the other DML commands: UPDATE and DELETE. We tested all three commands after our initial ingestion of a 100GB table:

Scenario 1: INSERT

Our primary scenario inserts rows into an existing 100GB table. The following sequences of INSERT statements allow us to measure the performance of workloads of various sizes, from singleton to small batch. Similar queries are common among our customers. For each of the following sizes of INSERT, a fresh base table is created, and 100 INSERTs are issued sequentially:

  • INSERT 1 row x 100
  • INSERT 10 rows x 100
  • INSERT 100 rows x 100
  • INSERT 1,000 rows x 100

Scenario 2: UPDATE

Updates to existing tables in a data warehouse are typically less frequent than inserts but are still a common workflow, for example when correcting data errors. Our update scenario mirrors the insert scenario: we start with a fresh base table and issue single-row updates, with target rows chosen at random.

  • UPDATE 1 row x 100

Scenario 3: DELETE

Much like our updates, we create a fresh base table and perform 100 singleton deletes using the same methodology:

  • DELETE 1 row x 100

Full Results & Additional Insights

We focus on two primary metrics for all DML operations: median and 95th percentile (P95) latency, reported in fractions of a second. We group the INSERT operations into singleton (1 row), micro-batch (10 rows per query), and medium batch (100 and 1,000 rows per query)². The UPDATE and DELETE queries are singletons. In Firebolt, all data modifications, including INSERT, UPDATE, and DELETE, are fully transactional. In other words, the following results reflect latency after data has been persisted to S3 (durably stored), with the guarantees of snapshot isolation and strong consistency. Once an update commits in Firebolt, readers across all engines see the changes written on any engine, thanks to Firebolt's strong consistency.

Query     # of Rows / Query    # of Queries    Total Rows Affected    Median (seconds)    P95 (seconds)
INSERT    1                    100             100                    0.183               0.273
INSERT    10                   100             1,000                  0.188               0.284
INSERT    100                  100             10,000                 0.201               0.271
INSERT    1,000                100             100,000                0.305               0.441
UPDATE    1                    100             100                    0.215               0.313
DELETE    1                    100             100                    0.194               0.313

At a median of 0.183 seconds and a P95 of 0.273 seconds, singleton INSERT queries performed well. Micro-batch queries of 10 rows climbed only to 0.188 and 0.284 seconds for median and P95, respectively. Across all INSERT queries, the maximum latency we observed was 0.441 seconds at P95 for 1,000-row batch inserts. Overall, latencies for micro- and small-batch queries stayed low, making ongoing updates to Firebolt tables efficient and responsive and allowing for near real-time data ingestion and querying.

Both UPDATE and DELETE queries demonstrate that Firebolt’s low-latency DML performance extends beyond inserting new data. UPDATE and DELETE queries performed similarly to micro-batch inserts, with median latencies of 0.215 seconds and 0.194 seconds, respectively. These times only marginally increase at P95, reaching a maximum of 0.313 seconds for DELETE queries.

Firebolt shows strong performance for incremental ingestion. When the need for DML arises, Firebolt performs these mutations at low latency, transactionally, and without degrading the performance of the primary read workload. Furthermore, these tests were run on the smallest engine configuration available, keeping costs low.
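
As a small illustration of the transactional behavior described above, consider a change committed on one engine and read from another. The engine roles and values below are hypothetical, but the visibility guarantee is the one these results rely on:

-- Run on a writer engine: once this statement returns, the change has been
-- durably stored and committed.
UPDATE uservisits
SET adrevenue = 0.0
WHERE visitdate = DATE '2000-02-01'
  AND destinationurl = 'example.com/landing'
  AND sourceip = '142.250.0.17';

-- Run immediately afterwards on a different (reader) engine: the committed
-- update is already visible, thanks to Firebolt's strong consistency.
SELECT adrevenue
FROM uservisits
WHERE visitdate = DATE '2000-02-01'
  AND destinationurl = 'example.com/landing'
  AND sourceip = '142.250.0.17';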

Conclusion

Firebolt’s core value proposition centers on delivering high-concurrency, low-latency queries. However, ongoing data maintenance workflows, such as inserting new data, fixing old data, or deleting partitions to save space, all benefit from low latency and transactional support. Updating your tables does not impact the primary read workload. Firebolt continues to optimize its core query performance and DML operations so that customers get the end-to-end performance they can rely on.

DIY Benchmark

These benchmarks are reproducible. The instructions in this GitHub repository contain all the files and scripts necessary to recreate them. We look forward to hearing about your experience, whether you want to learn more about these benchmarks, experiment with the same datasets, or verify the results presented above.

¹ You can read more about tablets, data management, and the overall Firebolt architecture in our whitepaper.
² Micro, medium, and large batches are not strictly defined. For large-batch (bulk) ingestion, we conducted a separate benchmark and wrote a blog reporting the results.