Architecture
Built for Efficiency

Firebolt is designed to deliver price-performance efficiency with the simplicity of a SQL-centric managed service.

Price-performance efficiency and ease of use are at the core of Firebolt’s architecture. Every workload is unique and requires composability from the data platform. Firebolt combines flexible infrastructure with composable data services to deliver a fast, efficient, general-purpose data platform that addresses those needs.

Flexible Infrastructure

Firebolt’s infrastructure is built on a distributed, decoupled architecture for flexibility and availability. 
Compute, storage, and metadata are fully managed and decoupled, allowing independent scaling across all three layers.

Compute Service
Multidimensional elasticity
Stateless
Controlled consumption

Compute on Firebolt is delivered in the form of on-demand, stateless engines. Each engine is composed of 1 to 10 clusters, and each cluster is composed of 1 to 128 compute nodes with a choice of node types. This multidimensional elasticity helps meet various performance requirements by supporting scale-up (node type), scale-out (nodes per cluster), and concurrency scaling (clusters per engine). Workloads can run on a single engine or across multiple read-write engines that access the same shared data, optimizing both cost and performance while ensuring workload isolation. Integration and operation are streamlined with an easy-to-use SQL API for engine management and online scaling.
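The three scaling dimensions can be made concrete with a small model. This is a minimal sketch, not Firebolt's API: the `EngineShape` class, its field names, and the node type label `"M"` are hypothetical, and only the 1 to 10 cluster and 1 to 128 node limits come from the text above.

```python
from dataclasses import dataclass

# Limits taken from the description above; all names here are hypothetical.
MAX_CLUSTERS = 10
MAX_NODES_PER_CLUSTER = 128


@dataclass(frozen=True)
class EngineShape:
    node_type: str          # scale-up dimension (illustrative label)
    nodes_per_cluster: int  # scale-out dimension
    clusters: int           # concurrency-scaling dimension

    def __post_init__(self) -> None:
        # Reject shapes outside the documented ranges.
        if not 1 <= self.clusters <= MAX_CLUSTERS:
            raise ValueError(f"clusters must be between 1 and {MAX_CLUSTERS}")
        if not 1 <= self.nodes_per_cluster <= MAX_NODES_PER_CLUSTER:
            raise ValueError(
                f"nodes_per_cluster must be between 1 and {MAX_NODES_PER_CLUSTER}"
            )

    @property
    def total_nodes(self) -> int:
        # Every cluster in an engine has the same number of nodes.
        return self.clusters * self.nodes_per_cluster


small = EngineShape(node_type="M", nodes_per_cluster=2, clusters=1)
scaled_out = EngineShape(node_type="M", nodes_per_cluster=8, clusters=1)
high_concurrency = EngineShape(node_type="M", nodes_per_cluster=8, clusters=3)
print(small.total_nodes, scaled_out.total_nodes, high_concurrency.total_nodes)
```

In this model, adding nodes to a cluster gives a single query more resources, while adding clusters gives more queries somewhere to run at once; the two can be changed independently.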

Metadata Service
ACID transactions
Reads/writes
Any engine, any data

The metadata service is central to Firebolt, maintaining consistency across its distributed architecture regardless of the number of nodes, clusters, or engines. It ensures transactional integrity and strong consistency for distributed writes within the cluster. Furthermore, isolated reads and writes can be performed from any provisioned cluster against any data managed by Firebolt. This service also underpins security and observability, promoting a secure, transparent operational environment. Metadata is exposed through information_schema objects, which simplifies management and ecosystem integration.

Storage Service
Columnar
Indexed and hybrid
Native data lake

Firebolt's managed storage service merges the speed of block storage with the scalability of object storage, enhanced by features like tiered storage and adaptive prefetching. It uses a columnar format for efficient data compression and organizes data in tablets with sparse indexing to deliver effective data pruning and rapid response times. This fully managed storage provides both performance and capacity efficiency. Alternatively, for those requiring integration with the data lake, a breadth of native file formats is supported for efficient data exploration and ingestion. External tables allow direct querying of various open file formats (including PARQUET, JSON, CSV, TSV, AVRO, and ORC) stored on Amazon S3.
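To illustrate why sparse indexing enables pruning, the sketch below keeps only a minimum and maximum key per tablet and skips any tablet whose range cannot match a filter. It is a conceptual illustration under stated assumptions: the `Tablet` class and `prune` function are hypothetical names, and Firebolt's actual index layout is not described in this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tablet:
    # Hypothetical stand-in for a storage unit: rows are sorted by key,
    # and the sparse index keeps only the smallest and largest key.
    name: str
    min_key: int
    max_key: int


def prune(tablets: List[Tablet], lo: int, hi: int) -> List[Tablet]:
    """Return only the tablets whose key range overlaps [lo, hi]."""
    return [t for t in tablets if t.max_key >= lo and t.min_key <= hi]


tablets = [
    Tablet("t0", 0, 999),
    Tablet("t1", 1000, 1999),
    Tablet("t2", 2000, 2999),
]

# A filter such as "key BETWEEN 1500 AND 1600" only needs to read t1.
to_scan = prune(tablets, lo=1500, hi=1600)
print([t.name for t in to_scan])  # ['t1']
```

Because the index stores a few values per tablet rather than one entry per row, it stays small while still letting a selective query skip most of the data.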

Data Services

Data Services are the building blocks that turn infrastructure into performance and efficiency. Each of these blocks is essential to delivering modern analytics workloads. The focus is on delivering capabilities through intelligent defaults, so that cost and complexity are minimized.

Data Management
Simplified data onboarding
Fast updates and deletes
Distributed writes

Firebolt offers the versatility to handle both structured and semi-structured data efficiently. For structured data, it facilitates easy migration of both normalized and denormalized models. For semi-structured data, Firebolt supports complex data types and provides a rich library of functions and Lambda expressions for data processing.

In Firebolt, the data lifecycle starts right at ingestion, using schema inference and parallel processing for rapid batch data onboarding. Data is then sorted, compressed, indexed, and stored efficiently as tablets. Trickle inserts, updates, and deletes are likewise handled efficiently so that the data warehouse is always up to date.
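Schema inference can be sketched as sampling a file and choosing the narrowest type that fits every sampled value in a column. The example below is a simplified illustration, not Firebolt's inference logic: the function names are hypothetical and the type names `BIGINT`, `DOUBLE`, and `TEXT` are used only as generic SQL labels.

```python
import csv
import io
from typing import Dict, List


def infer_type(values: List[str]) -> str:
    """Pick the narrowest illustrative type that fits every sampled value."""
    for name, cast in (("BIGINT", int), ("DOUBLE", float)):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue  # at least one value did not fit; try a wider type
    return "TEXT"


def infer_schema(sample: str) -> Dict[str, str]:
    """Infer a column-name to type mapping from a CSV sample with a header row."""
    rows = list(csv.DictReader(io.StringIO(sample)))
    return {col: infer_type([r[col] for r in rows]) for col in rows[0]}


sample = "id,price,city\n1,9.5,Oslo\n2,12,Lima\n"
print(infer_schema(sample))
# {'id': 'BIGINT', 'price': 'DOUBLE', 'city': 'TEXT'}
```

A production implementation would also need to handle nulls, dates, nested types, and conflicting samples, which this sketch leaves out.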

Crucially, Firebolt enables ACID transactions, coordinating data management activities through its metadata service for strong, global consistency across its distributed system. This guarantees data integrity and consistency even in complex, multi-engine environments.

Query Processing
Multimodal optimizer
Distributed multithreaded and vectorized
Multistage execution

Firebolt's query processing service is built for low-latency, scalable processing, using a resource-aware admission control layer that scales to manage high concurrency. Processing begins with an optimizer that combines history-based and cost-based methods to create efficient execution plans. It considers data distribution and index availability, and it learns from past queries to make plan quality more consistent.
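Resource-aware admission control can be pictured as a gate that admits a query only when its estimated resource needs fit what is currently free, and queues it otherwise. The sketch below uses a single memory budget and FIFO queueing; the `AdmissionController` class and its policy are assumptions made for illustration and do not describe Firebolt's actual scheduler.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class AdmissionController:
    """Admit queries while their estimated memory fits the budget; queue the rest."""

    def __init__(self, memory_budget_mb: int) -> None:
        self.free_mb = memory_budget_mb
        self.waiting: Deque[Tuple[str, int]] = deque()
        self.running: Dict[str, int] = {}

    def submit(self, query_id: str, est_mb: int) -> bool:
        """Return True if the query starts now, False if it was queued."""
        # Respect FIFO order: never jump ahead of queries already waiting.
        if not self.waiting and est_mb <= self.free_mb:
            self.free_mb -= est_mb
            self.running[query_id] = est_mb
            return True
        self.waiting.append((query_id, est_mb))
        return False

    def finish(self, query_id: str) -> List[str]:
        """Release a query's memory and admit queued queries that now fit."""
        self.free_mb += self.running.pop(query_id)
        admitted = []
        while self.waiting and self.waiting[0][1] <= self.free_mb:
            qid, est = self.waiting.popleft()
            self.free_mb -= est
            self.running[qid] = est
            admitted.append(qid)
        return admitted


ctl = AdmissionController(memory_budget_mb=100)
print(ctl.submit("q1", 60))  # True: 60 MB fits in the 100 MB budget
print(ctl.submit("q2", 60))  # False: only 40 MB is free, so q2 waits
print(ctl.finish("q1"))      # ['q2']: q1's memory is released and q2 starts
```

The design choice shown here is to queue rather than reject, which keeps the system stable under bursts of concurrent queries at the cost of added wait time.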

The execution of these plans occurs on a distributed runtime system that employs multithreading and vectorized processing, optimizing operations through tiered caching, sub-graph reuse, and resource-aware scheduling. This runtime is designed for efficient memory use and leverages distributed shuffle techniques to maximize the use of cluster resources.
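The idea behind vectorized execution is that operators pass batches of column values between each other instead of one row at a time, which amortizes per-call overhead. The following single-threaded sketch shows that batch-at-a-time pipeline shape only; the operator names are hypothetical, and a real engine would add native SIMD kernels, multithreading, and distributed shuffles.

```python
from typing import Dict, Iterable, Iterator, List

Batch = Dict[str, List[float]]  # column name -> values for one batch


def scan(columns: Batch, batch_size: int) -> Iterator[Batch]:
    """Yield the table as fixed-size column batches."""
    n = len(next(iter(columns.values())))
    for start in range(0, n, batch_size):
        yield {c: v[start:start + batch_size] for c, v in columns.items()}


def filter_gt(batches: Iterable[Batch], column: str, threshold: float) -> Iterator[Batch]:
    """Keep only the positions where `column` exceeds `threshold`, per batch."""
    for b in batches:
        keep = [i for i, x in enumerate(b[column]) if x > threshold]
        yield {c: [v[i] for i in keep] for c, v in b.items()}


def sum_column(batches: Iterable[Batch], column: str) -> float:
    """Aggregate one column across all batches."""
    return sum(sum(b[column]) for b in batches)


table = {
    "price": [5.0, 20.0, 7.5, 30.0, 12.0],
    "qty": [1.0, 2.0, 3.0, 4.0, 5.0],
}

# Equivalent to: SELECT SUM(qty) FROM table WHERE price > 10
total = sum_column(filter_gt(scan(table, batch_size=2), "price", 10.0), "qty")
print(total)  # 11.0
```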

Security
Layered
SQL object model
Shared responsibility

Pervasive security and governance start with a framework designed around Organizations and Accounts, integrating Single Sign-On (SSO), Role-Based Access Control (RBAC), and Network Policies. This structure allows precise control over user access and permissions, ensuring that only authorized users can access specific resources or data. Together, these features form a secure foundation for managing and safeguarding your data and resources.
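Role-based access control can be reduced to two mappings: users to roles, and roles to privileges on objects. The sketch below is a generic illustration of that model; the role names, user names, and object names are invented for the example and do not reflect Firebolt's built-in roles or privilege syntax.

```python
from typing import Dict, Set, Tuple

Privilege = Tuple[str, str]  # (action, object), e.g. ("SELECT", "sales.orders")

# Hypothetical roles and grants, for illustration only.
role_privileges: Dict[str, Set[Privilege]] = {
    "analyst": {("SELECT", "sales.orders")},
    "engineer": {("SELECT", "sales.orders"), ("INSERT", "sales.orders")},
}
user_roles: Dict[str, Set[str]] = {"ana": {"analyst"}, "eli": {"engineer"}}


def is_allowed(user: str, action: str, obj: str) -> bool:
    """A user may act on an object if any of their roles grants the privilege."""
    return any(
        (action, obj) in role_privileges.get(role, set())
        for role in user_roles.get(user, set())
    )


print(is_allowed("ana", "SELECT", "sales.orders"))  # True
print(is_allowed("ana", "INSERT", "sales.orders"))  # False
print(is_allowed("eli", "INSERT", "sales.orders"))  # True
```

Granting privileges to roles rather than to individual users means access can be changed in one place when responsibilities shift.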

Observability
Security
Performance
Consumption

Firebolt enhances observability by providing insights into workload patterns, consumption, and billing, which supports efficient resource management. Observability can be extended through OpenTelemetry integration with existing data collection and analysis tools. This approach enables DevOps and Site Reliability Engineering (SRE) teams to monitor and manage the environment effectively.

Workspace
Develop
Configure
Govern

Delivering insights or data products requires collaboration among multiple personas, such as data architects, data engineers, analytics engineers, developers, and DevOps engineers. The Firebolt workspace is designed to foster that collaboration and provide visibility into the data lifecycle. Security and governance, data modeling, data exploration, SQL development, and consumption and performance management are each performed within dedicated areas.

Workloads

Workloads access the data platform through a variety of APIs. To address these needs, Firebolt provides developers with a SQL API, SDKs, and a CLI to iterate and deliver data products quickly.

SQL API
SDK
CLI

On Firebolt, everything is in SQL, allowing teams to use existing skills while integrating quickly and shortening time to insight. To give developers flexibility, Firebolt provides SDKs for Python, Java, .NET, Node.js, and Go. With these SDKs, developers can programmatically control and optimize data workflows, from ingestion to analysis, within their custom applications, or integrate with open-source tools such as Airflow and dbt. Finally, the Firebolt CLI provides a scriptable interface for managing resources such as engines, databases, tables, and queries from the command line.
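Application code that talks to a SQL platform from Python typically goes through a connection and cursor pair in the style of PEP 249 (DB-API 2.0). The sketch below assumes such an interface and uses the standard library's `sqlite3` purely as a local stand-in so it runs without credentials; the `fetch_all` helper is hypothetical, and the actual connection setup for the Firebolt SDK should be taken from its documentation.

```python
import sqlite3
from typing import Any, List, Tuple


def fetch_all(conn: Any, sql: str) -> List[Tuple]:
    """Run a query through any DB-API 2.0 style connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


# sqlite3 is only a stand-in here; table and column names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
conn.execute("INSERT INTO events VALUES (1, 'click'), (2, 'view')")

rows = fetch_all(
    conn, "SELECT kind, COUNT(*) FROM events GROUP BY kind ORDER BY kind"
)
print(rows)  # [('click', 1), ('view', 1)]
```

Keeping query helpers written against the generic connection interface makes it easier to swap the stand-in for a real connection object later.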