Scalable infrastructure for next-gen Cloud Data Warehouse

Introduction

Businesses in various sectors like Financial Services, Healthcare, Manufacturing, and e-commerce rely on data-driven insights to support daily decisions and long-term strategies. Some are even monetizing insights for external use through fast, interactive analytics. The journey from gathering and transforming raw data to delivering insights at scale creates dynamic demand for compute and storage resources in a data warehouse. Traditional cloud data warehouses address these demands by decoupling compute and storage and providing infrastructure elasticity. Still, they do not adequately address the low-latency and high-concurrency needs of emerging workloads and data-intensive applications. The only remaining options in traditional cloud data warehouses are to overprovision infrastructure or add expensive caching layers, resulting in complexity and high total cost of ownership (TCO).

As a fully managed and purpose-built cloud data warehouse, Firebolt tackles the above challenges squarely while providing efficient infrastructure to optimize costs. This article will explain Firebolt’s key functionality to deliver secure, flexible, and scalable data infrastructure that companies can leverage.

Firebolt Engines

In Firebolt, the compute resources used to process data and serve queries are called engines. Before we discuss how to operate Firebolt engines, let’s first understand the Firebolt architecture at a high level. 

Firebolt’s infrastructure is built on a distributed, decoupled architecture for flexibility and availability. Compute, storage, and metadata are fully managed and decoupled, allowing independent scaling across all three layers. This three-way decoupling of compute, storage, and metadata ensures that businesses can dynamically adjust their infrastructure to meet changing demands efficiently.

First, Firebolt decouples compute from storage. This approach is backed by managed, centralized, persistent storage in the form of Amazon S3, enabling data to be shared and accessed independently by multiple workloads, each with dedicated computing resources. As a result, each workload is fully isolated from interference by other workloads and can scale its compute resources independently of both other workloads and storage.

Next, Firebolt decouples compute from the metadata. Firebolt implements a highly available and scalable metadata service where all metadata information is persisted. This metastore gives different engines the same, consistent view of the metadata, enabling a Single System Image configuration (see Figure 1 below). Each engine can execute all Firebolt-supported SQL operations, which are fully transactional and ACID compliant. Furthermore, committed transactions performed through any engine are immediately visible via any other active and running engine.

Figure 1: Three data applications running isolated from each other while sharing access to the same data.


The following are some key attributes that define Firebolt engines:

  • Name: Engine names are logical engine identifiers unique within a given account.
  • Type: This attribute represents the type of compute node used as a building block. Compute nodes come in Small, Medium, Large, and X-Large sizes. Vertical engine scaling (scale-up or scale-down) is supported through this attribute.
  • Nodes: This attribute represents a number (1 - 128) of compute nodes, allowing granular horizontal scaling to fine-tune query performance characteristics while avoiding overprovisioning and unnecessary cost. Both scaling in and out are supported.
  • Clusters: A cluster is a collection of compute resources, described by the “Type” and “Nodes” attributes. One Firebolt engine can contain one or more clusters; the maximum number of clusters is specified by the Clusters attribute. Only homogeneous cluster configurations (clusters with the same number of nodes and the same Type) are supported within a single engine. The Clusters attribute is leveraged to support query concurrency scaling.
  • URL: The client application uses the engine URL (or endpoint) to submit queries for processing. The engine URL, derived from the engine name, allows clusters to be transparently and dynamically added to (or removed from) a single engine without any impact on the client application.
  • Auto_start: Configures the engine to be automatically started when a new query is issued, assuming the engine is in the stopped state. 
  • Auto_stop: Configures the engine to stop automatically after a specified idle time. Auto_stop, combined with Auto_start, acts as a consumption control mechanism for intermittent workloads or workloads that run during specific intervals.  

Below are a few examples of engine and engine cluster configurations.

Figure 2: A cluster with a collection of four medium-sized nodes.
Figure 3: A Firebolt engine that contains two clusters. Each cluster contains four medium-sized nodes.

Now that we have defined Firebolt engines, in the following sections, we will discuss how engines provide workload isolation and independent scaling, how to scale them along multiple dimensions instantly, and the metrics Firebolt provides to monitor and size engines. We will also discuss how Firebolt provides seamless engine upgrades without incurring service downtime and the granular security controls provided to enable desired authorization checks.

Operating Engines

Let’s look at how users can perform core Create, Read, Update, and Delete (CRUD) operations on engines using the SQL API. 

Creating engines

Firebolt engines have attributes that users configure for desired behavior. Among other options, users can specify whether an engine should start running on creation or at a later point. This gives users the flexibility to create engine specifications as part of their workflows and then start those engines later, when the workloads that need them are ready to process and query data. As mentioned above, Firebolt engines also have cost-saving features such as automatic start and stop. These flexible options allow users to deploy and use Firebolt engines only when needed, leading to cost savings and lower TCO.

Due to metadata, data, and compute decoupling, a given database can be used with multiple engines. Similarly, a single engine can work with multiple databases. 

Users can create an engine using a SQL statement, as shown in the example below:

CREATE ENGINE IF NOT EXISTS MyEngine WITH
TYPE = M CLUSTERS = 2 NODES = 4 AUTO_START = TRUE AUTO_STOP = 15
INITIALLY_STOPPED = TRUE DEFAULT_DATABASE = MyDatabase;

The above example creates an engine, MyEngine, which has two clusters, each with four medium-sized nodes. The AUTO_START option configures the engine to be automatically started when a new query is issued, assuming the engine was previously in the stopped state. The INITIALLY_STOPPED option indicates that the engine will not be started immediately but later when the user needs the engine to be available for processing queries. The AUTO_STOP option configures the engine to stop automatically after 15 minutes of idle time. Users can specify a default database for a given engine at the time of engine creation using the DEFAULT_DATABASE option.

Starting and stopping engines

Engines can be started or stopped using the START/STOP commands as shown below:

START ENGINE MyEngine;
STOP ENGINE MyEngine;

As with all other objects in the Firebolt object model, Firebolt provides an information schema view that offers visibility into all available engines within a given account. A standard SQL query, like the one below, can fetch this information:

SELECT * FROM INFORMATION_SCHEMA.ENGINES;

Note: The information_schema.engines view implements row-level security and shows only the engines that a given user can work with and operate. Administrators, on the other hand, can see and query all engines.

Modifying engines

Let’s examine why users need to evolve and adjust engine configurations.

Users cannot always reliably predict when workloads will change or what the magnitude of those changes will be. To help deal with unpredictable workload changes quickly and without any interruption, Firebolt enables dynamic, fully online engine scaling operations. With multi-dimensional scaling, Firebolt offers the flexibility to fine-tune engine price-performance characteristics and allows users to dynamically scale their compute resources based on workload requirements.

Let’s explore a few cases and scenarios. 

  1. Scaling engines up/down

Users may need to increase the memory available on their engine for specific workloads. For instance, a reporting application can see a sudden increase in the amount of data it needs to process, resulting in a need for more RAM. Firebolt users can quickly scale up from a Large (L) node to an X-Large (XL) node to address the memory needs of this workload.

Figure 5: Scale-up operation allows users to resize the nodes used in an engine. In this example, the node is resized from a Large to an X-Large node.

To resize their engines, users can execute the following SQL statement:

ALTER ENGINE IF EXISTS MyEngine SET TYPE = XL;

Like scaling up, users can freely scale their engine down by setting a node size smaller than what’s currently in use. For example, once the workload calms down, users can scale their engine down by modifying the MyEngine engine to use medium-sized nodes (TYPE = M).
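Mirroring the scale-up statement, the scale-down operation can be sketched with the same ALTER ENGINE syntax:

```sql
-- Scale MyEngine back down to medium-sized nodes once demand subsides
ALTER ENGINE IF EXISTS MyEngine SET TYPE = M;
```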

  2. Scaling engines out/in

There may also be a need to increase the performance of a given query or the workload. A perfect example of this situation is performing large data load operations where users need to process large amounts of data. In such scenarios, users can horizontally scale their engines to distribute the load across additional nodes, increasing query speed and throughput and leading to better workload price-performance characteristics (Figure 6). 

Figure 6: Scale-out operation that demonstrates horizontal engine scaling by increasing the number of nodes. An engine with four Medium-sized nodes is scaled to an engine with six.

To resize their engines, users can execute the following SQL statement:

ALTER ENGINE IF EXISTS MyEngine SET NODES = 6;  -- scaling out the engine to 6 nodes
ALTER ENGINE IF EXISTS MyEngine SET NODES = 3;  -- scaling in the engine to 3 nodes

  3. Cloning engine clusters

Firebolt users can create new clusters (additional clones of a given cluster) to address the dynamic and variable concurrency needs that workloads and data applications may exhibit. Adding clusters to a given engine helps absorb the increased concurrency (Figure 7). When new clusters are added, the system automatically detects the newly added compute capacity, dequeues queued queries, and balances the workload across clusters, increasing system throughput without any changes to the data application.

One thing to note is that all clusters in an engine are created with the same configuration (same type and number of nodes). Clusters with different configurations are not allowed in an engine.

Figure 7: The picture above shows the engine state after an additional engine cluster is added to increase concurrency.

ALTER ENGINE IF EXISTS MyEngine SET CLUSTERS = 2;  -- new engine cluster is added

  4. Fast and online engine scaling

All engine operations described above are fully online and do not require an engine restart, preserving system availability and uninterrupted operation while the engine is adjusted to meet target workload needs. To achieve this behavior, Firebolt provisions a new engine cluster with the updated configuration (the specified node type, the updated number of nodes, or the updated number of clusters). Once the new cluster is provisioned, new workload queries are directed to the newly provisioned cluster(s). Existing queries (submitted before the scaling operation) complete on the old cluster(s). Once those queries are completed, the old cluster is removed.

Engine scaling operations are fully online and near-instantaneous: scaling an engine typically takes seconds.

Note: Although not shown in the examples above, users can change multiple engine properties at a time. With this, an engine can be scaled both up and out using a single ALTER ENGINE statement.
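Assuming the SET clauses combine the same way the corresponding options do in CREATE ENGINE, a combined scale-up and scale-out could be sketched as:

```sql
-- Hypothetical combined operation: larger nodes (scale-up) and more of them (scale-out)
ALTER ENGINE IF EXISTS MyEngine SET TYPE = XL NODES = 8;
```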

Dropping engines

Users can delete engines using the DROP ENGINE command, as shown below. We expect this to be a rare operation, but it is nevertheless supported.

The statement below demonstrates how engines can be dropped:

DROP ENGINE MyEngine;

Note: Running engines cannot be dropped. To drop an engine, the engine must be stopped first.
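Putting the two commands together, a teardown sequence for a running engine might look like:

```sql
-- Stop the engine first; a running engine cannot be dropped
STOP ENGINE MyEngine;
DROP ENGINE MyEngine;
```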

Separating workloads

As mentioned earlier, Firebolt’s compute, storage, and metadata layers are fully managed and decoupled, allowing independent scaling across all three layers. This three-way decoupled architecture is critical to how Firebolt provides flexibility and scalability for companies running a wide range of workloads. Firebolt’s flexible infrastructure allows for the efficient management of multiple workloads with varying resource requirements: each workload can scale independently, ensuring optimal performance and resource utilization without interference.

These workloads may run on the same shared datasets, while each workload may have different and varying performance and resource requirements. For example, users may need to run a reporting dashboard once a day, an ingestion job that runs every 4 hours, and an application that serves customer queries 24x7 (see Figure 4 below). In addition, these workloads may have distinct CPU, memory, and, ultimately, performance requirements.

To meet the varying needs of the different workloads, Firebolt users can spin up and down engines as and when needed. As such, each engine may have different and independent configurations (type, number of nodes, and number of clusters) tailored to meet the performance demands of the workload that it supports.

The examples below demonstrate how Ingest, Reporting, and Analytics engines are created, each with a different engine configuration, while all access the same dataset:

CREATE ENGINE IF NOT EXISTS IngestEngine WITH TYPE = L NODES = 2 CLUSTERS = 1;
CREATE ENGINE IF NOT EXISTS ReportingEngine WITH TYPE = L NODES = 1 CLUSTERS = 1;
CREATE ENGINE IF NOT EXISTS AnalyticsEngine WITH TYPE = M NODES = 4 CLUSTERS = 2;

Firebolt engines leverage highly performant local NVMe SSDs to cache frequently accessed data, delivering low-latency responses even for large working datasets. Due to full engine separation and isolation, each engine has its own local SSD cache and can be fully tuned to meet the performance needs of its workload without impacting workloads on other engines.

Figure 4: The picture above shows a multi-engine Firebolt deployment tuned for each workload.

Sizing Engines

Previous sections described the Firebolt functionality that allows users to dynamically scale their infrastructure as their workloads demand. But how does one know when an engine change is needed?

To help users understand current engine usage and utilization, and to recognize when engine resizing is needed, Firebolt provides observability views through the information_schema schema. Three engine observability views are available: 1) engine_metrics_history, 2) engine_running_queries, and 3) engine_query_history.

engine_metrics_history view

The engine_metrics_history view gathers engine resource utilization metrics at a given point in time (a snapshot). Utilization snapshots are captured every 30 seconds and retained for 30 days, allowing users to understand engine utilization and consumption trends. Let’s walk through the metrics:

  • cpu_used: Amount of CPU consumed as a percentage of the total CPU available for the engine cluster.
  • memory_used: Amount of memory consumed as a percentage of the total RAM available for the engine cluster.
  • disk_used: Amount of disk space consumed as a percentage of the total SSD capacity available for the engine cluster.
  • cache_hit_ratio: Shows how often customer queries are executed using data in the local SSD cache, without fetching data over the network from S3.
  • spilling_size: The amount of disk space used to store data spilling over from memory.
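As a sketch, these metrics can be inspected with a standard SQL query; the example below selects only the columns listed above (the view also carries additional columns not shown here):

```sql
-- Snapshot utilization metrics for the engines visible to the current user
SELECT cpu_used, memory_used, disk_used, cache_hit_ratio, spilling_size
FROM information_schema.engine_metrics_history;
```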

engine_running_queries view

The engine_running_queries view exposes information about queries currently running or waiting to be run in the system. Specifically, the STATUS column depicts whether a given query is being executed or awaiting execution. We will describe how this information can help with engine sizing in the scenarios below. 
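For example, a quick way to spot queuing is to group by the STATUS column described above (a sketch; the exact status values are not enumerated here):

```sql
-- Count queries per status (e.g., running vs. queued)
SELECT status, COUNT(*) AS query_count
FROM information_schema.engine_running_queries
GROUP BY status;
```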

Observability scenarios

Let’s discuss a few scenarios and determine how engine observability views can be used to help resize the engine. 

  1. Scaling engine up to improve performance

Let’s assume we have a scenario where engine_metrics_history shows an increase in both the amount of memory consumed and the amount of data spilled onto disk. The growth of spilled data indicates that the workload could benefit from additional memory (RAM).

To address this, Firebolt users can scale up their engine by modifying its TYPE property. The additional memory helps complete the task in memory, resulting in improved performance.

Figure 8: The above picture shows the engine scale-up operation, going from L to XL size (increase in memory).

  2. Scaling engine out to improve performance

Let’s consider a scenario where a complex analytical query, containing joins across multiple fact tables, is being executed. By querying the engine_metrics_history view, the user notices that CPU and memory consumption spiked and the query spilled, leading to sub-optimal query performance. By looking into the query plan, the user also notices that a large data shuffle operation is happening. Based on this information, the user scales out their engine to use more nodes (going from 4 to 10). After re-executing the query on the larger engine, query execution is partitioned across more nodes, leading to faster query completion.

Similarly, large data ingestion is a highly parallelizable process. When querying engine_metrics_history, the user notices that resources are fully utilized. The user scales the engine out by adding more nodes, driving faster data ingestion.

Figure 9: The above picture shows a user scaling out an engine with additional nodes.

  3. Adding engine clusters to increase concurrency

Let’s consider a scenario where the workload runs on an engine with a single engine cluster. The user notices that some queries exhibit higher latency than anticipated. The user queries the engine_running_queries view and notices a spike in the number of queued queries. To increase engine throughput and allow more concurrent queries, the user resizes the engine by adding a new cluster. Once the new cluster is added, the user notices that queued queries have started to run, allowing more concurrent queries to be processed by the system.

Figure 10: Engine with 2 clusters of type L nodes.


Upgrading Engines

Firebolt constantly evolves its functionality. We frequently upgrade our service to bring our customers new features, security updates, and performance enhancements. 

However, many mission-critical workloads and customer-facing data applications come with strict SLAs, and cannot afford downtime. Any manual upgrade solution requires service downtime or entails a significant operational burden on the user to minimize downtime. Planning and scheduling service downtime to apply the latest versions may delay critical security vulnerability fixes, introducing significant risks for companies.

Firebolt provides an automatic upgrade process with zero downtime to ensure users get timely upgrades seamlessly. Firebolt follows rigorous checks during the upgrade process to ensure no performance degradation to user workloads post-upgrade. Firebolt transparently creates additional cluster(s) that run the new software version. Before switching existing workloads to the new cluster(s), Firebolt transparently replays query workloads on the upgraded cluster and validates workload performance characteristics. Once validation is complete and performance is on par with the workload performance on the old software version, the workload is cut over to the upgraded cluster. The old cluster is then removed from the engine after all running queries complete their execution. All of these tasks are completed seamlessly behind the scenes, without any intervention from the user.

In addition, Firebolt offers a preview option where companies can subscribe to the latest version of our software in their development or test environments, allowing validation of about-to-be-released software versions. Once new features are validated, new software versions can be automatically rolled into the production environment.

Securing Engines

Maintaining performance SLAs for data-intensive applications can be challenging. It is not uncommon for accidental, poor-quality queries to be submitted in a production environment, severely impacting SLAs, especially when an unintended user submits such queries in production. Similarly, organizations want to implement cost-control measures to ensure users only create resources that fit the needs of their business unit and predefined budgets. For example, the Marketing department may only need engines of type ‘S’ with 4 nodes, while the data engineering team may need engines with bigger nodes and higher scale factors.

To help organizations provide strict governance over data access and infrastructure costs, Firebolt provides strong security and governance mechanisms. Account-level environment isolation and Role Based Access Control (RBAC) are built to address these needs.

Firebolt allows the creation of multiple accounts within a given organization. Each account can represent separate environments, such as development, test, or production. In addition, the Firebolt RBAC model enables granular control over resources that are being created. Firebolt accounts and engines are securable objects, allowing administrators to fully control users' actions over these objects. Administrators can ensure that only principals who are supposed to use the production environment and production engines are authorized to access those resources by granting necessary permissions. Both account and engine objects come with flexible permissions for administrators to implement fine-grained access-control policies. 

For example, let us say that we have an engine “myEngine” in account “myAccount” serving production workloads that we want to restrict to one user, Kate. Firebolt supports this scenario by using standard SQL commands as shown below:

CREATE ROLE prodAdminRole;
GRANT USAGE ENGINE ON myEngine IN ACCOUNT myAccount TO prodAdminRole;
-- grants the ability to use the engine
GRANT OPERATE ENGINE ON myEngine IN ACCOUNT myAccount TO prodAdminRole;
-- grants the ability to start/stop the engine
GRANT ROLE prodAdminRole TO USER kate;
-- grants prodAdminRole to kate

Note that even though user kate can start and stop the myEngine engine, user kate cannot create any other engines or delete myEngine engine (intentionally or unintentionally).
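If Kate later moves off the production workload, the assignment can be undone with a matching REVOKE statement (a sketch mirroring the GRANT syntax above):

```sql
-- Remove the role from the user; the role and its engine permissions remain intact
REVOKE ROLE prodAdminRole FROM USER kate;
```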

For more information about engine-level permissions, refer to our RBAC documentation.

Summary

Modern data-driven businesses grapple with a continuous increase in the amount of data stored and processed by heterogeneous workloads with varying performance requirements. In addition, these workloads are dynamic and experience sudden, unpredictable changes in their infrastructure needs. In this whitepaper, we saw how Firebolt offers next-generation, flexible cloud infrastructure to meet the demands of these large-scale analytic workloads, providing capabilities such as workload isolation, multidimensional and granular scaling, instant elasticity, zero-downtime upgrades, and a flexible security model.

Firebolt’s flexible infrastructure, built on a distributed and fully decoupled architecture, is key to delivering scalable, efficient, and high-performing data warehouse solutions. By decoupling compute, storage, and metadata, Firebolt provides unparalleled flexibility and availability, allowing businesses to meet their dynamic and growing data needs efficiently. With workload isolation, multiple workloads scale independently and share the same data without impacting one another. Firebolt's multidimensional scaling capability allows dynamically adding or removing resources in small increments across multiple dimensions, enabling precise resource optimization for each workload and delivering insights at the lowest total cost of ownership (TCO) possible. With automatic updates, Firebolt also minimizes any operational overhead due to service upgrades and helps maximize the uptime of critical workloads. To help customers meet compliance and governance regulations across different verticals such as Banking, Financial Services, and Healthcare, Firebolt provides flexible security to implement granular access control to compute resources.
