We use tools like SCA, SAST for code analysis, along with practices such as Fuzzing, scanning for pipeline weaknesses (like the use of unverified external sources), and secret scans as part of our secure software development lifecycle.
We use tools like SCA, SAST for code analysis, along with practices such as Fuzzing, scanning for pipeline weaknesses (like the use of unverified external sources), and secret scans as part of our secure software development lifecycle.
Switching production workloads to Firebolt typically involves updating configuration to point to Firebolt endpoints. If all validation is complete and data is already present, this process is straightforward.
Firebolt recommends using aggregating indexes where possible for regularly queried granularities (e.g., daily or weekly), and employing pre-joined or pre-aggregated tables to simplify and speed up dashboard queries. Ensure indexes align closely with filter criteria to optimize query performance across various granularities.
When a tenant comprises a large percentage of data (e.g., 20-25% of all data), avoid subqueries or joins that initially select large volumes of data and subsequently discard most rows. Instead, optimize queries and table structures to filter data as early and narrowly as possible, potentially using aggregated or pre-joined tables.
Firebolt supports both using views and pre-joined tables. However, if most of the query execution time is spent on joins rather than aggregations, pre-joining tables (i.e., creating wider, denormalized tables during data ingestion) is often more performant. Views are effective for reusable SQL but may become slower with complex joins at scale. Aggregating indexes, which can pre-materialize aggregation results for fast query responses, work best on single tables without cross-table joins.
Yes, primary indexes significantly impact query performance in Firebolt. Ensuring correct and optimized indexes is crucial, especially during migration. Indexes should be carefully reviewed and implemented based on query patterns and use cases.
You can add more users to your Firebolt account by either adding them through the web application under or with SQL commands. First create a login, using the email address of your invitee as the login_id. Next, associate the login to a user and assign them the appropriate permissions. Your invitee wiill automatically receive an email invitation to join your account. For more information visit our documentation.
Setting up Apache Superset with Firebolt involves: - Installing Superset locally or on a server. - Configuring the Firebolt connector with appropriate credentials and connection parameters. - Testing queries in Superset to ensure Firebolt’s indexing structure is leveraged efficiently. - Optimizing queries for dashboard performance by using Firebolt’s indexing features to minimize latency. In this case, there were some challenges with reinstalling Superset, but Firebolt’s team is available to assist with setup and troubleshooting.
Primary indexes should include the most frequently used filters, such as tenant_id and date/time columns if queries consistently filter data by tenant and date ranges. A well-chosen primary index ensures queries access only relevant data partitions, maintaining fast performance even as data volumes scale significantly.
Query performance in high-cardinality joins is significantly impacted by data cardinality, joins resulting in large intermediate row outputs, and data shuffles across nodes. Firebolt users should leverage the EXPLAIN ANALYZE functionality to identify expensive operations such as table scans, joins, and shuffles. Reducing data volume before joins through effective indexing, semi-joins, or aggregation indexes can mitigate these impacts.
Yes, semi-joins (implemented via WHERE IN clauses) can be more performant than explicit joins, as Firebolt has built-in optimizations that leverage semi-joins for better data pruning. Using semi-joins helps reduce intermediate row counts earlier in query execution, especially beneficial for high-cardinality datasets.
First, on your S3 account, confirgure the permission policy found in the help center article, https://docs.firebolt.io/Guides/loading-data/configuring-aws-role-to-access-amazon-s3.html#use-aws-iam-roles-to-access-amazon-s3. While still in your AWS Identity and Access Management (IAM) Console, start the process to upload data through the plus sign icon in the develop space. After selecting an ingestion engine, you can select 'IAM Role' as your authetnication method and you can create an IAM role in the application. Copy the trust policy here and follow the rest of the instructions in the article to apply to your AWS account. Note that you don't actually have to upload anything to create the IAM role.
In Firebolt's query profiling, CPU time refers to the actual processing time on CPU cores, while thread time represents the total wall-clock time across all threads and nodes. When thread time is significantly higher than CPU time, it typically indicates waits due to data loading from storage (like S3) or node concurrency constraints. This distinction helps diagnose bottlenecks related to IO-bound or compute-bound workloads.
For high concurrency, use multiple clusters within your engine. Clusters help handle more simultaneous queries by distributing the load. Keep in mind that cache is shared across nodes in a cluster, but not between clusters, so the right balance depends on your workload. You can also consider using auto-scaling to dynamically adjust resources based on demand.
Firebolt proatively maintains a status page at https://firebolt.statuspage.io/ where we keep you notified about any active incidents that may cause interruption to your access or services. From this page, you can also hit the 'subscribe' button to stay informed by phone, RSS, email, or Slack.
You can label a query by setting the query_label system setting before running it:
cursor.execute("set query_label = '<label>';")
cursor.execute("your_query_here")
Here’s a full example using the Firebolt Python SDK:
id = '****'
secret = '****'
connection = connect(
database="<db_name>",
account_name="<account_name>",
auth=ClientCredentials(id, secret)
)
cursor = connection.cursor()
cursor.execute("start engine <engine_name>")
cursor.execute("use engine <engine_name>")
cursor.execute("use database <database_name>")
cursor.execute("set query_label = '123';")
cursor.execute("select 1;")
print(cursor.fetchone())
connection.close()
While Firebolt adheres to NIST SP 800-53, NIST 800-171, and NIST CSF guidelines, we are not currently FedRAMP compliant.
Firebolt is not PCI-DSS compliant and does not permit credit card data storage on its platform.
A separate BAA with Firebolt is required since our service includes proprietary technology and other sub-processors not covered under the standard AWS HIPAA Eligible Services.
Yes. As a business associate under HIPAA, we support business associate agreements (BAAs) to ensure healthcare data protection. Our SOC 2 Type-2 + HIPAA report is available subject to a Non-Disclosure Agreement (NDA)
Firebolt is certified for ISO 27001 and ISO 27018. Certification reports are available here.
Yes, our SOC 2 Type-2 + HIPAA report is available subject to a Non-Disclosure Agreement (NDA).
Yes, more details in our End User License Agreement (EULA) and Data Processing Addendum (DPA).
Yes, during our POC process, Firebolt's team will provide you with fast and accurate cost estimates based on real usage data. During the POC, our team will closely support you, analyzing engine usage, query patterns, and resource consumption to deliver a precise cost breakdown. With our efficient benchmarking and expert guidance, you’ll quickly understand your projected costs, ensuring transparency and confidence in scaling with Firebolt.
Firebolt provides engine consumption and spend information in the Web UI. Additionally, granular engine-level consumption can be found via the information_schema.engine_metering_history view that details the hourly consumption of all the engines within an account. Users can also drill down into how the topology of their engines (node type, number of nodes and number of clusters) was modified over time, providing visibility into the FBU consumption of their engines.
Firebolt provides multidimensional scaling to help right-size workloads. Autostop and Autostart are features that help reduce costs by eliminating idle time. Firebolt also provides global visibility of consumption and costs through built-in organizational governance and account-level consumption breakdown.
Yes, commitment based discounts are available. Contact our sales team for more information.
US East (Virginia), US West (Oregon) and EU (Frankfurt)
Consumption starts when the engine endpoint is available for querying.
Each node type consumes a specified number of FBUs per hour. Compute consumption is billed in one-second increments. For example, a type ‘M’ node consumes 16 FBUs per hour. The same node running for one minute will consume FBU calculated as such: Consumed FBU = (Available FBU per hour / 3600) x ( 1 x 60 seconds) = (16/3600) x 60 = 0.27 FBUs.
While FBUs measure consumption, the performance profile of a workload depends on the engine topology.
Firebolt Unit is a normalized measurement of consumption. FBU normalizes consumption management irrespective of node type, number of nodes, number of clusters, duration of consumption, etc. Thanks to Firebolt’s multidimensional scaling, per-second billing, and auto-stop/start capabilities, compute consumption can be a fraction of a minute. FBU eliminates the need to keep track of individual node types, nodes, and the number of clusters. There’s no binding to specific instance types, so you are free to use pre-paid credits on any node type.
Firebolt's billing is generally sent monthly, aligning with the AWS billing cycle. The bill email provides a breakdown of engine usage and storage consumption, giving you visibility into your total cost. Because Firebolt runs on AWS infrastructure, its billing is influenced by the resources consumed in AWS, and the timing of Firebolt’s billing is closely aligned with AWS bills for the same period.
Firebolt provides comprehensive billing view that break down both compute (engine) consumption and storage usage. You can access detailed information on engine usage through the information_schema.engines_billing table and storage usage through the information_schema.storage_billing table. These tables and UI view offer granular insights into usage by specific engines, storage by table, and usage patterns, allowing for better cost tracking and resource optimization. The billing details can be viewed by hour, day, or month in the Firebolt UI, helping users stay informed about their resource consumption.
The considerations for splitting into separate databases include governance, logical isolation, and performance aspects related to metadata caching. Here are the key points:
Governance and Isolation: Different databases can have different owners and permissions, allowing for better governance. This is particularly important when different teams or departments manage their own data.
Logical Grouping: Currently, without support for custom schemas, databases serve as the primary mechanism to logically group tables and views. This will change when custom schemas are introduced.
Performance on Metadata Caching: The packdb caches metadata per database. A single large database with all tables may complicate this caching process, although the practical impact is likely minimal except in specific scenarios.
Cross Database Queries: At present, cross-database queries are not supported, making it impractical to have a separate database for each table if joins are required. When cross-database queries are supported, they may incur some performance degradation compared to querying within the same database due to metadata storage methods.
Security: From a security perspective, Role-Based Access Control (RBAC) can be applied at the table level to restrict access to specific users, enhancing data security.
In summary, while there are some advantages to splitting databases, such as improved governance and security, the current limitations regarding cross-database queries and potential performance issues should be carefully considered before making a decision.
There is a soft limit of 100 databases per account, that can be increased if needed.
Firebolt does not yet support automatic cross-region replication. If you need to replicate data across regions, you will need to handle the data replication process manually using external tools or services like AWS DataSync or S3 cross-region replication.
On Firebolt Data is stored in Amazon S3, which inherently offers durability and availability features leveraging copies of data stored in 3 Availability Zones per Region. However, Firebolt does not natively provide cross-region disaster recovery (DR) at this time, so manual processes would need to be in place for cross-region DR setups. Compute High Availability across Availability Zones is a roadmap item.
Firebolt supports deployment in multiple AWS regions, allowing you to choose the most appropriate region for your data and workloads. However, Firebolt does not currently offer seamless, cross-region deployments within a single account. To deploy across multiple regions, you need to create separate accounts in each region.
Firebolt is built natively on AWS and currently does not support running directly on Google Cloud Platform (GCP) or MS Azure. You would need to use AWS as the backend for Firebolt, but you can still ingest data from other cloud platforms through various data ingestion tools and connectors, or by loading data from those platforms into S3.
Yes, transferring data between different AWS regions incurs cross-region data transfer costs according to AWS pricing. Firebolt itself does not add additional fees for cross-regional data transfers, but users should consider AWS network charges when moving data across regions.
To work with authentication tokens in Firebolt:
Generate a token via Firebolt’s authentication endpoint using your client ID and secret.
Example:
curl -X POST https://id.app.firebolt.io/oauth/token \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'grant_type=client_credentials' \
--data-urlencode 'client_id=YOUR_CLIENT_ID' \
--data-urlencode 'client_secret=YOUR_CLIENT_SECRET'
Use the token in API requests by including it in the authorization header:
--header 'Authorization: Bearer YOUR_ACCESS_TOKEN'
Refresh tokens regularly, as they expire. Keep tokens secure using environment variables or secret managers.
For more details, refer to the Firebolt API documentation.
When using service accounts in Firebolt, access errors can occur due to incorrect credentials, missing permissions, or expired tokens. To troubleshoot these errors, follow these steps:
- Check your credentials: Ensure that the service account credentials (client ID and secret) are correct and active.
- Verify permissions: Make sure the service account has the appropriate role or permissions to access the resources you are trying to interact with, such as S3 buckets or Firebolt tables.
- Refresh authentication tokens: If you're using tokens, ensure they have not expired. Tokens generated for service accounts typically have a limited lifespan.
- Logs and error details: Review the error message logs for more specific information about the access failure.
- Review documentation: Follow the Firebolt documentation on service accounts for proper setup, permissions, and token management.
Firebolt's Python SDK provides detailed error message handling for SQL queries. When an error occurs, the SDK generates helpful error messages, allowing users to quickly diagnose and fix issues such as syntax problems or missing credentials. The SDK also offers robust logging and debugging capabilities, making it easier for developers to troubleshoot errors in their applications. For more information, refer to the Firebolt Python SDK documentation or visit the GitHub repository for examples.
Firebolt does not natively support Presto. However, Firebolt provides its own high-performance SQL engine and you can integrate it with your existing data infrastructure via ODBC, JDBC, and REST API.
While this is on our roadmap, Firebolt currently does not have a native integration with Kafka. However, you can ingest Kafka data into Firebolt using intermediate storage systems like S3.
Firebolt provides a custom Airflow connector that allows you to orchestrate and automate your Firebolt data workflows directly from Airflow. This integration helps in managing ETL processes, scheduling queries, and handling data pipelines efficiently.
Yes, Firebolt supports data migration from Redshift through standard ETL tools. You can move data from Redshift to Firebolt by exporting Redshift data to S3 and then using Firebolt’s COPY FROM command to ingest data into Firebolt tables.
Firebolt is continuously expanding its integration ecosystem to support a wide range of data sources and connectors. If your preferred connector isn't listed in the current documentation, don’t worry! Firebolt’s development team is actively working on adding new integrations, and you can expect ongoing enhancements to its capabilities.
In the meantime, you can reach out to Firebolt support to inquire about upcoming connectors or even request a specific integration. Firebolt also supports custom connectors through its API and can integrate with many systems using standard protocols like JDBC and ODBC, giving you the flexibility to connect to external sources in various ways.
While this is on our roadmap, Firebolt does not natively integrate with Delta Lake or Databricks. However, you can use data transfer solutions to migrate data between Firebolt and Databricks or Delta Lake via standard ETL tools, enabling the two platforms to coexist in a broader data architecture
System settings in Firebolt allow you to control query execution behavior and performance, providing flexibility when needed. This is particularly useful when you want to override default settings for specific queries via the REST API.
To adjust settings such as the time_zone, you can embed them directly in the URL of your API call. For example, if you need to set the time_zone to UTC, include the parameter in the API call URL.
Example API call:
curl --location 'https://<user engine URL>?engine=<engine_name>&database=<database_name>&time_zone=UTC' \
--header 'Authorization: Bearer <authentication_token>' \
--data "SELECT TIMESTAMPTZ '1996-09-03 11:19:33.123456 Europe/Berlin'"
This query sets the time_zone system setting to UTC for the duration of the query. Each new API call requires you to include the necessary system settings again if you want to apply specific overrides.
Firebolt can be integrated with Coralogix through OpenTelemetry. Firebolt’s OTel Exporter allows you to export Firebolt engine metrics, query logs, and other telemetry data to any OpenTelemetry-compatible platform, including Coralogix. This integration enables real-time monitoring and troubleshooting, giving you better insights into engine performance, query execution, and resource usage. You can refer to Firebolt's GitHub repository for additional setup details and code samples.
If you encounter errors due to missing credentials when accessing AWS S3 from Firebolt, ensure that you have the correct IAM roles and policies assigned. Alternatively, you can provide AWS keys directly within your external table definition using the CREDENTIALS parameter. Check your AWS permissions and Firebolt’s documentation for troubleshooting credential errors.
While this is coming soon, Firebolt does not natively support geospatial data types or queries. However, you can still store and manage geospatial data using standard data types like strings and numeric values, and process geospatial information via external tools or data pipelines integrated with Firebolt.
Yes, Firebolt integrates with a wide range of popular BI and data tools, including Looker, Tableau, and Power BI, among others. These integrations allow users to leverage Firebolt’s performance while visualizing and analyzing data in their preferred tools. Additionally, Firebolt offers JDBC and ODBC drivers to facilitate connectivity with other tools.
At present, Firebolt does not have a direct API connection to external data sources like Google Sheets. However, you can leverage third-party tools or custom ETL pipelines to load data from sources like Google Sheets into Firebolt for analysis.
Yes, Firebolt integrates seamlessly with dbt (data build tool). Firebolt’s dbt adapter allows you to model, transform, and manage your data workflows using dbt. This integration combines dbt’s transformation capabilities with Firebolt’s high-performance query engine, enabling ELT workflows. You can also define models in dbt to run directly on Firebolt, helping you process large volumes of data more efficiently. For more details, visit Firebolt's blog on ELT with dbt.
When using a NOT IN filter, rows where the column value is NULL are excluded from the results, even though NULL is not in the list of values. This is because SQL treats comparisons with NULL as UNKNOWN, which prevents those rows from being returned.
How to include NULL in NOT IN results:
To include rows with NULL values, add an explicit condition checking for NULL using OR column IS NULL.
Example:
SELECT *
FROM players
WHERE playerid NOT IN (1, 2, 3) OR playerid IS NULL;
This query will include rows where playerid is either NOT IN the list or is NULL, ensuring that NULL values are part of the result set.
This error occurs when Firebolt cannot convert data from a text format (e.g., CSV or TSV) to the expected column data type defined in the external table schema.
Common Scenarios:
Mismatched Data Types: If a column contains a value that doesn’t match the expected type (e.g., a string in a numeric column).
Example: A file contains the value "abc" in a column defined as LONG, which leads to the error.
Header Rows in Files: If a CSV file includes a header row and it's not excluded, Firebolt tries to interpret the header text as data.
Solution: Use SKIP_HEADER_ROWS in the TYPE parameter of the CREATE EXTERNAL TABLE DDL.
Troubleshooting Tip: Use a text editor to inspect the first few rows of the file for mismatches. If the issue isn’t obvious, use SELECT...LIMIT and OFFSET to locate problematic rows and identify the file using the SOURCE_FILE_NAME column.
Example query:
SELECT SOURCE_FILE_NAME, COUNT(*)
FROM (SELECT *, SOURCE_FILE_NAME FROM my_external_table LIMIT 10000 OFFSET 0)
GROUP BY SOURCE_FILE_NAME;
To implement LEFT() and RIGHT() string functions in Firebolt, you can use the SUBSTR() function, as Firebolt does not natively support these functions.
LEFT() Alternative
To replicate the LEFT() function, use SUBSTR() to extract characters from the left side of a string. For example:
SELECT SUBSTR(nickname, 1, 6) FROM players WHERE nickname = 'murrayrebecca';
-- This returns "murray"
This extracts the first 6 characters from the string.
RIGHT() Alternative
For the RIGHT() function, combine SUBSTR() with LENGTH() to extract characters from the right side of the string. For example:
SELECT SUBSTR(nickname, LENGTH(nickname) - 6) FROM players WHERE nickname = 'murrayrebecca';
-- This returns "rebecca"
This extracts the last 7 characters by calculating the length of the string and subtracting the desired number of characters.
These methods allow you to achieve the same functionality as LEFT() and RIGHT() using SUBSTR() in Firebolt.
Use PARTITION BY when you need to split the table into distinct data segments for better data management or to prune large amounts of data quickly. Partitioning allows for efficient data removal (e.g., ALTER TABLE...DROP PARTITION).
Use the Primary Index when you want to organize the order of data for optimal query performance. The primary index helps Firebolt efficiently prune data during queries based on filter conditions.
Example:
If you often query by playerid but also need to manage data by tournamentid, you could use playerid in the primary index and tournamentid in PARTITION BY. This would allow you to both optimize query performance and manage large data segments.
CREATE TABLE playstats_partition (
playerid integer,
tournamentid integer,
stattime timestampntz
) PRIMARY INDEX playerid
PARTITION BY tournamentid;
In Firebolt's UI, numeric values are automatically displayed with commas for readability (e.g., 123,456,789). However, this may be undesirable for fields like IDs or other values where commas aren’t needed.
Solution:
To remove commas from numbers in the UI, CAST the numeric field to TEXT using ::TEXT. This ensures that the number is displayed as a plain text string, without commas.
Example:
SELECT
playerid AS playerid_default,
playerid::text AS playerid_text,
nickname,
email
FROM players
LIMIT 10;
In this example, playerid_default will display with commas, while playerid_text will display the number without commas.
This method only affects how numbers are displayed in the Firebolt UI and does not alter the underlying data or its formatting in external tools.
When deciding between a fact or dimension table in Firebolt, it's important to consider how the data will be used and queried, as this choice impacts performance and how data is handled in multi-node engines.
Fact tables are typically large and contain measurable events, like sales or sensor readings. They usually hold foreign keys to dimension tables and measures that are aggregated (e.g., sums or averages). Fact tables benefit from aggregate indexes, which optimize heavy aggregations.
Dimension tables describe the entities in fact tables, such as product details or customer information. Dimension tables are usually smaller, updated more frequently, and replicated across nodes for faster lookups. Join indexes can be applied to dimensions to speed up lookup queries.
In general, choose a fact table when you need to aggregate large volumes of data, and a dimension table for smaller, descriptive datasets primarily used for lookups. For multi-node engines, keep in mind that fact tables are sharded, while dimension tables are replicated.
Yes, Firebolt is designed for ease of use, leveraging SQL simplicity and PostgreSQL compliance. It allows data professionals to manage, process, and query data effortlessly using familiar SQL commands.
Everything in Firebolt is done through SQL. Firebolt’s SQL dialect is compliant with Postgres’s SQL dialect and supports running SQL queries directly on structured and semi-structured data without compromising speed. Firebolt also has multiple extensions in its SQL dialect to better serve modern data applications.
To optimize query performance in Firebolt, follow these guidelines for selecting a primary index:
Frequently Queried Columns: Choose columns often used in WHERE clauses or joins for faster data retrieval.
Range Queries: Include columns used in range filters, like dates, to improve performance in range-based queries.
Data Distribution: Pick columns with many unique values (high cardinality) to ensure even data distribution.
Sorting: Select columns based on how data is typically sorted in queries to minimize the amount of scanned data.
For more detailed information, check out Firebolt’s comprehensive guide on primary indexes.
These steps ensure efficient data pruning and faster query execution.
Firebolt's aggregating index pre-calculates and stores aggregate function results for improved query performance, similar to a materialized view that works with Firebolt's F3 storage format. Firebolt selects the best aggregating indexes to optimize queries at runtime, avoiding full table scans. These indexes are automatically updated with new or modified data to remain consistent with the underlying table data. In multi-node engines, Firebolt shards aggregating indexes across nodes, similar to the sharding of the underlying tables.
The Shuffle operation is the key ingredient to executing queries at scale in distributed systems like Firebolt. Firebolt leverages close to all available network bandwidth and streams intermediate results from one execution state to the next whenever possible. By overlapping the execution of different stages, Firebolt reduces the overall query latency
Firebolt scales to manage hundreds of terabytes of data without performance bottlenecks. Its distributed architecture allows it to leverage all available network bandwidth and execute queries at scale with efficient cross-node data transfer using streaming data shuffle.
Warming up an aggregating index preloads the data into the cache, improving query performance. Use the CHECKSUM function on a query matching the index definition to warm up the index, leading to faster execution when it is utilized.
Solution:
Use the CHECKSUM function to preload specific data into the cache. Focus on frequently accessed columns or data ranges to optimize performance and minimize cache usage.
Example:
-- Warm-up the entire table SELECT CHECKSUM(*) FROM playstats;
-- Warm-up specific columnsSELECT CHECKSUM(GameID, PlayerID, CurrentScore) FROM playstats;
-- Warm-up specific data rangeSELECT CHECKSUM(*) FROM playstats WHERE CurrentLevel BETWEEN 1 AND 5;
Warming up tables using CHECKSUM ensures data is stored in the cache, improving performance for large tables or frequently queried datasets. Use filters or column selection to target relevant data efficiently.
Warming up a table can improve query performance by preloading data into the cache. Running warm-up queries after an engine starts ensures faster execution of subsequent queries.
Querying cold data, or data not yet cached in Firebolt's local SSD storage, may result in slightly slower performance compared to querying hot (cached) data. However, Firebolt's efficient caching mechanisms ensure that even cold data is accessed quickly, minimizing the performance impact.
Firebolt offers observability views through information_schema, allowing you to access real-time engine metrics. These insights help you size your engines for optimal performance and cost efficiency. Read more here- https://docs.firebolt.io/general-reference/information-schema/views.html
Firebolt engines can scale up and out to handle high-concurrency workloads. Firebolt supports adding up to 10 clusters within a single engine to manage spikes in concurrent queries. These clusters can be dynamically added on-demand, ensuring optimal performance even during peak loads.
Sub-plan result caching allows Firebolt to reuse intermediate query artifacts, such as hash tables computed during previous requests when serving new requests, reducing query processing times significantly.It includes built-in automatic cache eviction for efficient memory utilization while maintaining real-time, fully transactional results.
Firebolt’s engine uses vectorized execution, which processes batches of thousands of rows at a time, leveraging modern CPUs for maximum efficiency. Combined with multi-threading, this approach allows queries to scale across all CPU cores, optimizing performance.*
Boncz, Peter A., Marcin Zukowski, and Niels Nes. "MonetDB/X100: Hyper-Pipelining Query Execution." CIDR. Vol. 5. 2005.
* Nes, Stratos Idreos Fabian Groffen Niels, and Stefan Manegold Sjoerd Mullender Martin Kersten. "MonetDB: Two decades of research in column-oriented database architectures." Data Engineering 40 (2012).
Data pruning in Firebolt involves using sparse indexes to minimize the amount of data scanned during queries. This allows for tens of millisecond response times by reducing I/O usage, making your queries highly performant.
Firebolt uses advanced query processing techniques such as granular range-level data pruning with sparse indexes, incrementally updated aggregating indexes, vectorized multi-threaded execution, and tiered caching, including sub-plan result caching.These techniques both minimize data being scanned and reduce CPU time by reusing precomputed, enabling query processing times in tens of milliseconds latency on hundreds of TBs of data.
Firebolt has been benchmarked against several major data warehouses, including Snowflake, BigQuery, and Redshift. These benchmarks highlight Firebolt's superior performance in low-latency, high-concurrency queries, especially for fast aggregations and real-time analytics. See further details in our benchmark Github repo and our benchmark articles about handling concurency, high-volume ingestion and DML operations.
Firebolt is engineered to handle hundreds of analytical queries per second; without compromising speed. It offers unparalleled cost efficiency with industry-leading price-to-performance ratios and scales seamlessly to handle terabytes of data with minimal performance impact.
Drop the associated indexes manually or use the CASCADE option to automatically remove all dependencies when dropping the table.
Use the NULLIF function to convert empty strings to NULL, which can then be cast to the appropriate data type.
When casting columns to data types like DATE or NUMERIC in Firebolt, empty strings in the source data can cause errors. This occurs because empty strings cannot be directly cast to other data types.
Use the NULLIF function to convert empty strings to NULL, which can then be cast to the appropriate data type without causing errors.
Example:
INSERT INTO tournaments_nullif_example_fact
SELECT
NULLIF(dt, '')::date
FROM tournaments_nullif_example;
In this example, NULLIF(dt, '') converts empty strings in the dt column to NULL, allowing the data to be safely cast to a DATE type. This method ensures smooth casting of columns with empty strings in Firebolt.
CSV file ingestion into an external table may fail with errors such as:
Cannot parse input: expected '|' but found '<CARRIAGE RETURN>' instead.
This usually means the file delimiter in the CSV doesn't match the table definition or the number of columns differs.
To troubleshoot check the delimiter: Ensure FIELD_DELIMITER matches the CSV file's delimiter. Also, compare the file to the table definition column-by-column. Finally, if the file is large, create a temporary external table to view the entire row as a single string:
CREATE EXTERNAL TABLE ext_tmp (blob text) URL = 's3://some_bucket/somefolder/' TYPE = (CSV FIELD_DELIMITER=' ');
Example Query:
SELECT * FROM ext_tmp LIMIT 2;
This helps inspect rows and verify column consistency.
Firebolt distributes data across nodes and uses spilling to local SSDs when the working set exceeds available memory, allowing the system to scale even with limited resources.
No, Firebolt doesn't support adding new columns without rebuilding the table. However, you can create a new table with the updated schema or use views to simulate schema changes.
This ensures that the schema remains optimized for performance, which is critical in high-performance analytical databases like Firebolt.
Alternatively, Firebolt offers a flexible approach where you can create views to simulate changes like renaming or restructuring tables without needing to rebuild or re-ingest data. For instance, you can create a view that selects all columns from the original table, effectively simulating the addition of new columns:
Example Usage: To simulate renaming a table or altering its structure, create a view:
CREATE VIEW IF NOT EXISTS new_games ASSELECT * FROM games;
This approach allows you to redirect queries to the new view (new_games), making it function like a table with updated schema without altering the original table.
Firebolt enables running ELT jobs on a separate engine isolated from the customer-facing engine. This prevents disruptions and allows scaling ELT engines dynamically with auto-start and auto-stop features to reduce costs.
Yes, ELT processes can be automated using: Firebolt Python SDK for programmatic database operations. Apache Airflow for scheduling and automating complex workflows. dbt (Data Build Tool) for managing data transformations in a version-controlled environment.
CSV, TSV, JSON, and Parquet formats are supported for exporting data.
Multistage distributed execution allows complex ELT queries to utilize all cluster resources by splitting stages across nodes. This parallelization optimizes resource utilization, speeding up ELT processes.
Firebolt supports the ARRAY data type for mapping arrays from Parquet files and provides functions like ARRAY_AGG and UNNEST for working with arrays.
Yes, Firebolt supports semi-structured data types like JSON. JSON data can be ingested as text columns or parsed into individual columns for flexible schema-on-read or flattened structures.
Data transformations can be applied directly within INSERT INTO SELECT statements during ingestion. Standard SQL functions can be used to manipulate data types, perform calculations, and format strings.
Optimize file sizes (500MB–1GB), use efficient formats (Parquet), and relocate files after ingestion to avoid reprocessing.
Firebolt uses transactional semantics and ACID guarantees. Ingestion operations are fully isolated from ongoing reads or queries, ensuring consistency. There are no partial inserts or copies to clean up.
Firebolt boosts data ingestion performance through parallel processing, multi-node scaling as the engine grows, and pipelined execution for efficient resource use. Using COPY FROM enables linear scaling with the number of nodes, accelerating ingestion speed with larger engines—ideal for latency-sensitive ELT scenarios.
Start with a small node type (CREATE ENGINE ingest_engine TYPE=S NODES=1) and monitor CPU and RAM utilization via information_schema.engine_metrics_history. Scale out the engine (e.g., ALTER ingest_engine SET NODES=4) as needed to increase throughput. As a general rule of thumb, most ingestion workloads benefit from paralellism, specifically when importing multiple files. Adding to that, Firebolt will be even more efficient when files are roughly equivalent in size.
While streaming ingestion is on the roadmap, Firebolt currently don't have a native straming ability. However, Firebolt has the ability to run hiligh preforment micro-batching to persist data to S3 in Parquet or Avro format for near real-time ingestion.
Firebolt allows filtering on file-level information such as name, modified time, and size using metadata fields like $source_file_timestamp, $source_file_name, and $source_file_size.
Use the CREATE EXTERNAL TABLE
command to reference data stored outside Firebolt, like in an S3 bucket, while specifying the file format and schema.
Yes, Firebolt’s COPY FROM command can automatically create the destination table using AUTO_CREATE = TRUE
, which maps columns and creates the table when it doesn't exist.
Firebolt currently supports Parquet and CSV formats. For AVRO, JSON, or ORC, use the external table option.
Firebolt offers multiple data import options, including: COPY FROM SQL command for importing data from S3 buckets with built-in schema inference and automatic table creation. The 'Load data' wizard in the WebUI to explore, set options, infer schema, and load data into Firebolt tables. Direct read for CSV and Parquet files from S3 using read_csv and read_parquet table-valued functions. External tables for data stored in Amazon S3 buckets, supporting formats such as CSV, Parquet, AVRO, ORC, and JSON.
Firebolt is ACID compliant and treats every operation as a transaction. For example, data from a COPY FROM operation is visible only after the entire operation is successful, ensuring data integrity. This eliminates partial updates ensuring data integrity at all times.
Spilling happens when a query requires more memory than allocated, causing intermediate query results to be stored on disk (SSD) instead of in-memory. While this ensures the query completes, it may affect performance.
For more information, check the Firebolt Documentation on Engine Metrics History.
To investigate query timeouts or delays, you can start by using Firebolt’s Query History and Query Profile tools, which provide detailed insights into query performance, including execution time, memory usage, and any potential bottlenecks. You can also check engine logs and metrics using Firebolt’s Engine Metrics History to identify issues like memory limitations, network latency, or resource constraints.
For troubleshooting steps, check the Firebolt documentation on query analysis.
If AWS instance availability is low:
Change the engine instance type.
Retry after some time.
Contact Firebolt support if the issue persists.
Firebolt provides Role-based Access Control (RBAC) to help customers control which users can perform what operations on a given engine. For example, you can provide users with only the ability to use or operate existing engines but not allow them to create new engines. In addition, you can also prevent users from starting or stopping engines, allowing them to only run queries on engines that are already running. These fine-grained controls help ensure that customers do not end up with runaway costs resulting from multiple users in an organization creating and running new engines.
You can use the engine_metering_history information schema view for detailed tracking of FBU consumption.
Use the AUTO_STOP feature to automatically stop engines after a certain amount of idle time. Example:
:ALTER ENGINE MyEngine SET AUTO_STOP = 15;
For more information, read more about Engine Consumption in our Documentation.
Firebolt uses Firebolt Units (FBU) to track engine consumption. For example, for an engine with Type "S", 2 nodes, and 1 cluster running for 30 minutes, it would consume 8 FBUs:
FBU per Hour = 8 * 2 * 1 = 16 FBUs
Consumption = (16 / 3600) * 1800 seconds = 8 FBUs
Monitoring the status of your Firebolt engine using the REST API is a key step to ensure smooth operations. Firebolt provides a way to programmatically check engine status by querying the system engine. This article explains how to authenticate, retrieve the system engine URL, and query the engine status using Firebolt's REST API.To begin, ensure that you have an active service account with the necessary permissions. You will need the service account credentials to generate an access token for API authentication.
After obtaining an access token, use the following request to retrieve the system engine URL:
curl https://api.app.firebolt.io/web/v3/account/<account_name>/engineUrl \
-H 'Accept: application/json' \
-H 'Authorization: Bearer <access_token>'
Once you have the system engine URL, you can query it to check the engine's status with a simple SQL query, as shown below:
curl --location 'https://<system_engine_URL>/query' \
--header 'Authorization: Bearer <access_token>' \
--data "select status from information_schema.engines where engine_name = '<your_engine_name>'
This will return the current status of your engine, helping you monitor its activity and health.
For more information please refer to the Using the API Documentation.
Yes. The engine_running_querie and engine_query_history tables provide insights into current workloads. For more information see our Information Schema Documenation.
Use the engine_metering_history information schema view to track FBU consumption for each engine.
Firebolt provides three different observability views that provide insight into the performance of your engine.
1/ engine_running_queries - This view provides Information about currently running queries. This includes whether a query is running or in the queue. For queries that are currently running, this view also provides information on how long it has been running.
2/ engine_query_history - This view provides historical information about past queries - for each query in history, this includes the execution time of the query, amount of CPU and Memory consumed and amount of time the query spent in queue, among other details.
3/ engine_metrics_history - This view provides information about the utilization of CPU, RAM and Storage for each of the engine clusters. You can use these views to understand whether your engine resources are being utilized optimally, whether your query performance is meeting your needs, what percentage of queries are waiting in the queue and for how long. Based on these insights, you can resize your engine accordingly.
For more information, please refer to the Sizing Engines article in our Documentation.
If the engine has the AUTO_START
option set to True, an engine in a stopped state will be automatically started when it receives a query. By default, this option is set to True. If this option is set to False, you must explicitly start the engine using the START ENGINE
command. For more information, please refer to the Work with Engines Using DDL article in the Firebolt Documentation.