Selecting Your Firebolt Engine's Compute Family

Cole Bowden

TL;DR: Storage-optimized engines are the default in Firebolt, but compute-optimized engines can be more cost-effective for workloads that don't require large caches. You'll likely need to experiment and monitor your engine's resource utilization to find the best engine size and configuration for your workload. As a rule of thumb, workloads that query small datasets, or that repeatedly select the same small subset of your data, can use compute-optimized engines to make Firebolt more efficient.

Firebolt has two engine compute families: storage-optimized and compute-optimized. The sole difference between them is hardware; both run the same software in exactly the same way. This guide intentionally keeps hardware details high-level so that it stays accurate as Firebolt's offerings evolve, while still giving you enough guidance to decide which compute family is right for you.

Storage-optimized vs. compute-optimized

The storage-optimized compute family is the default in Firebolt. It uses a storage-optimized compute instance from AWS EC2, which comes with proportionally more SSD space and memory (RAM) relative to vCPUs.

The compute-optimized compute family is the other option in Firebolt. It uses a compute-optimized instance from AWS EC2, which comes with proportionally more vCPUs relative to SSD space and memory.

  • When your node type is the same size: The two compute families have the same number of vCPUs, but the storage-optimized engine has more memory and disk space. Because it comes with more hardware overall, a storage-optimized engine consumes twice as many FBUs as an equivalently sized compute-optimized engine.
  • When your FBU consumption rate is equivalent: The compute-optimized engine has more vCPUs and less memory and storage; the storage-optimized engine has fewer vCPUs and more memory and storage.
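The two comparisons above come down to simple arithmetic, which the following sketch illustrates in made-up relative units rather than real FBU rates. It assumes node sizes named S, M, L, and XL, and that each step up in size doubles consumption; neither assumption is stated in this section, so check current pricing before relying on them. Only the 2x ratio between families at the same size comes from the text. The function names are hypothetical.

```python
from typing import Optional

# Assumed node size names, smallest to largest.
SIZES = ["S", "M", "L", "XL"]


def relative_fbu_rate(size: str, family: str) -> int:
    """Consumption relative to a compute-optimized S node (= 1 unit)."""
    if family not in ("storage", "compute"):
        raise ValueError(f"unknown family: {family}")
    base = 2 ** SIZES.index(size)  # assumption: each size step doubles the rate
    # From the article: at the same node size, storage-optimized consumes
    # twice as many FBUs as compute-optimized.
    return base * 2 if family == "storage" else base


def equal_cost_compute_size(storage_size: str) -> Optional[str]:
    """Compute-optimized size with the same relative rate, if one exists."""
    target = relative_fbu_rate(storage_size, "storage")
    for size in SIZES:
        if relative_fbu_rate(size, "compute") == target:
            return size
    return None  # no larger size available in this simplified model
```

Under these assumptions, `equal_cost_compute_size("S")` returns `"M"`: a compute-optimized engine one size up matches the consumption rate of the storage-optimized engine while offering more vCPUs.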

Selecting your compute family

Because of the differences in hardware arrangement and the different dimensions on which you can scale a Firebolt engine, it can be difficult to prescribe a correct compute family without experimentation.

Our recommendation is to start with a small or medium storage-optimized engine. Firebolt engines cache data locally, which helps serve queries at low latency. The cache size depends on the node type used in your engine, with each increase in node size doubling the size of your cache. Storage-optimized instances have approximately four times the cache size of compute-optimized instances, so starting with storage-optimized gives you a large cache, which is the most reliable way to ensure smooth performance.
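To make the cache relationships concrete, here is a rough sketch in relative units. The size names (S, M, L, XL) and the function name are assumptions for illustration, and the factor of exactly 4 simplifies the "approximately four times" above; real cache sizes depend on the hardware Firebolt currently offers.

```python
# Assumed node size names, smallest to largest.
SIZES = ["S", "M", "L", "XL"]


def relative_cache_size(size: str, family: str) -> float:
    """Cache size relative to a compute-optimized S node (= 1.0)."""
    if family not in ("storage", "compute"):
        raise ValueError(f"unknown family: {family}")
    per_size = 2.0 ** SIZES.index(size)  # each size step doubles the cache
    family_factor = 4.0 if family == "storage" else 1.0  # approx. 4x, simplified
    return per_size * family_factor
```

In this model a storage-optimized S node has a relative cache of 4.0 and a compute-optimized M node has 2.0, so moving up one size while switching families still halves the cache.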

However, as compute-optimized instances can be more cost-effective for workloads that can leverage them, you should consider whether a compute-optimized engine is right for you. The factors that can play into this include:

  • Having a small amount of data or primarily accessing a small subset of your data.
  • Having queries that are computationally complex. Note that this is less true if the computational complexity involves joins on large tables, where having sufficient memory is important to avoid spilling.
  • Having high concurrency on an engine, with many queries accessing the same data at the same time.

Ultimately, many factors will determine the resources your engine needs to use. The cardinality of your data, the exact shape and size of the queries being run most often, the complexity and size of joins, and the amount of data being scanned can all influence the exact resource allocation that is best for your workload.

Evaluating engine utilization

The most foolproof way to determine the resources you’ll need is to run some queries in Firebolt. By using your own data and your own queries, you can see exactly what resources the engine is consuming to handle that workload.

There are two ways to see this: the Firebolt UI, or engine metrics history, which you can query with SQL:

-- Most recent engine metrics from the last two hours
SELECT
    *
FROM
    information_schema.engine_metrics_history
WHERE
    event_time > now() - INTERVAL '2 hours'
ORDER BY
    event_time DESC
LIMIT 100;

The engine monitoring tab in the UI is another approach:

Because we’re interested in the relative ratio between CPU, memory, and disk utilization when considering a compute family, the monitoring tab provides the key data needed to decide which compute family to recommend for an engine.
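If you export utilization samples rather than reading them off a chart, a small helper can summarize them. The sketch below is illustrative only: the `cpu`, `memory`, and `disk` keys are hypothetical names for percentages you would map from your own export, not documented column names, and the sample values are made up. It reports both mean and peak, since this guide does not say which one to size against.

```python
from statistics import mean

# Hypothetical keys; each value is a utilization percentage (0-100).
RESOURCES = ("cpu", "memory", "disk")


def utilization_profile(samples):
    """Return mean and peak utilization per resource across all samples."""
    if not samples:
        raise ValueError("need at least one sample")
    profile = {}
    for resource in RESOURCES:
        values = [sample[resource] for sample in samples]
        profile[resource] = {"mean": mean(values), "peak": max(values)}
    return profile


# Made-up samples for illustration only.
samples = [
    {"cpu": 70.0, "memory": 20.0, "disk": 10.0},
    {"cpu": 90.0, "memory": 24.0, "disk": 12.0},
]
profile = utilization_profile(samples)
```

With these made-up samples, CPU is heavily used while memory and disk stay low, which is the kind of profile that favors a compute-optimized engine.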

Before choosing your compute family, it’s important to make sure that your engine is at least somewhat right-sized for your workload. Engine metrics history can inform you of issues like spilling to disk and suspended queries that may require more scaling. Once you’ve taken care of that, you can evaluate the proportional resource utilization to see if it makes sense to switch to compute-optimized:

  • If memory and disk are both below 25% utilization, as they are in the chart above, you should switch to a compute-optimized engine.
  • If memory and disk are above 25% utilization but below 50% utilization, you should increase your node type by one size and switch to a compute-optimized engine. Note that this will cost the same amount as your current storage-optimized engine, but it will come with double the processing power, potentially improving query performance.
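The two rules above can be sketched as a small function. This is one illustrative reading of them, not an official sizing tool: the function name is hypothetical, it takes the higher of the two utilization figures as the deciding value, and it treats exactly 25% as belonging to the second rule, a boundary the text does not specify.

```python
def recommend_family_change(memory_pct: float, disk_pct: float) -> str:
    """Suggest a next step for a right-sized storage-optimized engine."""
    for value in (memory_pct, disk_pct):
        if not 0 <= value <= 100:
            raise ValueError("utilization must be a percentage between 0 and 100")
    # Assumption: both resources must fit, so decide on the higher of the two.
    highest = max(memory_pct, disk_pct)
    if highest < 25:
        return "switch to compute-optimized at the same node size"
    if highest < 50:
        return "go up one node size and switch to compute-optimized"
    # At 50% or more, the smaller cache and memory would likely be too tight.
    return "stay on storage-optimized"
```

For example, `recommend_family_change(20, 10)` suggests switching at the same node size, while `recommend_family_change(40, 10)` suggests going up one size first.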

Changing an engine’s compute family

As with other engine properties, there are two ways to set the compute family for your Firebolt engine once you’ve settled on the right one for your workload. The first is via the UI, where you click Modify Engine, enter Advanced Settings, and use the dropdown to change your compute family.

The other approach is with SQL:

ALTER ENGINE my_engine SET FAMILY = "COMPUTE_OPTIMIZED"

Changing the compute family on an engine that is already running may take some time, especially if you have queries running on that engine. It will also invalidate your cache and decrease query performance until the cache can repopulate, so be cautious about doing this on a busy engine in production.

Maximizing your price-performance

By selecting the correct compute family and right-sizing your engine to your workload, you can be sure that you are getting industry-leading price-performance for your SQL queries. Repeat this process whenever your data or queries change meaningfully, so that you have enough capacity without paying for resources you aren’t using.

Go try it out in Firebolt today.