FAQ

Find quick answers to common questions about Firebolt

How can query performance be optimized when querying event data with minute-level granularity in Firebolt?

One approach is to restructure the table by setting the primary index on event_time to better leverage Firebolt’s indexing capabilities. Additionally, an aggregating index on event_time can be beneficial. However, if queries still take longer than expected (e.g., 15 seconds for 30 days of data), it may help to review: - The structure of the primary index and ensure it aligns with the query’s filtering. - Whether unnecessary dimensions are included in the dataset, increasing granularity unnecessarily. - If joins or aggregations can be optimized, possibly through pre-aggregated tables. Firebolt’s architecture is designed to improve query efficiency by avoiding costly full scans and optimizing indexing structures.

Will the creation of an engine automatically result in the creation of the underlying cluster(s)?

Yes. By default, creating an engine would result in the creation of the underlying engine clusters and start the engine. This would enable the engine to be in a running state where it is ready to start serving the queries. However, you have the option to defer the creation of the underlying clusters for an engine by setting the property “INITIALLY STOPPED” to True while calling CREATE ENGINE. You can start the engine at a later point, when you are ready to start running queries on the engine. Note that you cannot modify this property after an engine has been created.

CREATE ENGINE IF NOT EXISTS MyEngine WITH

TYPE = “S” NODES = 2 CLUSTERS =1 START_IMMEDIATELY = FALSE;

Engines

FAQ

Will the creation of an engine automatically result in the creation of the underlying cluster(s)?

Can Firebolt handle complex data types like JSON during ELT processes?

What versions of Presto does Firebolt support?

How do you protect against DDoS attacks?

Can I query the output of another query in Firebolt?

Is Firebolt billing real-time?

How do I start and stop engines in Firebolt?

How do I handle complex structures like arrays in Parquet?

How are SQL errors handled in the SDK?

How does Firebolt protect customer data?

How can I make sure that my engines are not sitting idle and incurring infrastructure costs?

What is multistage distributed execution in Firebolt, and how does it improve ELT operations?

How do I handle access errors when using service accounts in Firebolt?

Is IP Allow-listing supported for tenant connections?

What happens when an engine receives queries while it is in a stopped state?

What options does Firebolt provide to export data?

How do I work with authentication tokens in Firebolt?

Can customers federate users to their chosen identity provider (IdP)?

How do I monitor the performance of my engine to understand whether it is optimally configured?

What file formats are supported by 'COPY TO'?

Is Multi-Factor Authentication (MFA) supported?

How can I monitor the consumption of engines?

Can I automate ELT processes with Firebolt?

Does Firebolt support RBAC?

Can users see what is currently running on a specific engine when it is at 100% utilization?

Our ELT jobs are expensive and impact customer-facing dashboards. How can Firebolt address this?

Is data sovereignty supported?

How can I check Firebolt engine status via the REST API?

Can Firebolt support adding new columns to existing tables without rebuilding the entire table?

Is encryption supported for data at rest and in motion?

How do I calculate the consumption of engines?

How does Firebolt address complex queries that exceed physical memory capacity?

How does Firebolt mitigate SQL Injection attacks?

How can I make sure that my engines are not sitting idle and incurring infrastructure costs?

How do I troubleshoot errors when ingesting CSV files into Firebolt using external tables?

How does Firebolt protect against malware, zero-day exploits, and other runtime threats?

How can I monitor the consumption of engines?

How do I resolve 'Unable to cast' errors during CSV ingestion related to empty strings?

What is Firebolt’s process for data retention and deletion?

I want to ensure that some of the engines in my account are accessible only to certain users. What mechanisms does Firebolt provide to help control what operations users can perform on engines?

How do I drop a table with indexes in Firebolt?

How does Firebolt handle external vulnerability reporting?

How can I troubleshoot issues with Firebolt engines not starting due to low AWS instance availability?

What backup and recovery options are available in case of Availability Zone (AZ) failure?

How can I investigate query timeouts or unexpected delays in Firebolt engines?

Does Firebolt provide insurance coverage?

What is spilling, and how does it work?

How to know the ACS URL for a certain organization?

What is the impact of stopping an engine on performance?

What is the system engine, and how is it used for metadata-related queries?

Can engines be resized dynamically during operation?

What happens to the queries that have already been run during an engine resize?

How can I check the last time an engine was started or stopped?

Could you please give explanation, when it makes sense to scale engines with more clusters, when with higher amount of nodes and when with bigger engines?

Do we need to create new service account for new firebolt version or exist version will work fine?

Do you charge extra for AI features like vector search?

How does Firebolt’s AI-powered data warehouse differ from competitors like Snowflake, Amazon Redshift, and Google BigQuery?

Does Firebolt support real-time AI workloads?

How does Firebolt’s vector search compare to dedicated vector databases?

How does Firebolt ensure low-latency performance for AI applications?

How can users ensure that their Firebolt service account setup is working correctly?

How can Firebolt users optimize query performance by leveraging primary indexes?

How can users extract and save queries in Firebolt for collaboration?

What are the best practices for structuring queries in Firebolt for performance optimization?

How do aggregating indexes work in Firebolt, and what are their trade-offs?

Does Firebolt provide an execution plan for queries, similar to Athena?

How can Firebolt be integrated with Apache Superset for visualization?

How should large table joins be handled in Firebolt to optimize query performance?

Does Firebolt always require creating additional tables, or can large joins be handled directly in SQL as with other warehouses (e.g., Snowflake)?

What is the recommended approach for migrating an existing Firebolt environment to a new organization or domain (for example, if a team is switching from one AWS org to another)?

Is it better to insert data into Firebolt one row at a time or in batches for real-time workloads?

How can organizations optimize queries that filter on high-cardinality timestamp columns (such as a ‘closed_at’ column)?

Does Firebolt still require manual vacuuming, or is there an automatic process to reclaim storage space and optimize performance? Should vacuuming be done on a dedicated engine?

How does Firebolt handle query performance as data within a single tenant expands?

How can organizations estimate the appropriate Firebolt engine size and associated costs for a given query load?

In the absence of true streaming, how can a Node.js application handle large Firebolt query results without running out of memory?

What ingestion throughput can organizations expect, and how does Firebolt handle large batch loads or full refreshes?

How should we structure the primary indexes?

Does Firebolt offer ongoing query-optimization assistance, and is there an extra cost associated with this service?