Choosing the right data warehouse and analytics infrastructure on Amazon Web Services (AWS) can be confusing. Part of the problem is understanding the differences between the older on-premises warehouses and the three generations of cloud data warehouse and query engine technologies that have evolved since:
- 1st generation cloud data warehouses, such as Redshift, which was built on ParAccel and helped “Shift Red” (Oracle) and other data warehouse deployments by porting data warehouse technology to the cloud and simplifying deployment
- 2nd generation cloud data warehouses, such as Snowflake, which improved scalability by separating storage and compute, and simplified administration
- 2nd generation query engines, such as open source Presto or Amazon Athena (a managed service built on Presto), which provided federated querying on top of multiple data sources
- 3rd generation cloud data warehouses, such as Firebolt, which improved performance and lowered costs by leveraging some of the latest innovations in data warehousing, providing more control over resources, and not charging a markup on cloud computing resources
Comparison - Redshift, Athena, Snowflake, Firebolt
What follows is a side-by-side comparison of the options on AWS across 1st, 2nd, and 3rd generation data warehouses and query engines.
Summary
The detailed comparison of Redshift, Athena, Snowflake, and Firebolt across architecture, scalability, performance, use cases and cost of ownership highlights the following major differences:
- Redshift, while arguably the most mature and feature-rich, is also the most like a traditional data warehouse in its limitations. This makes it the hardest to manage, costly overall for traditional reporting and dashboards, and less well suited to the newer use cases.
- Athena is arguably the easiest, least expensive and best suited for “one-off analytics”. But it is also the most limited, and requires you to manage your own (external) storage and ingestion very well, which is especially hard for continuous ingestion.
- Snowflake, as a more modern cloud data warehouse with decoupled storage and compute, is easier to manage for reporting and dashboards, and delivers strong user scalability. It also runs on clouds other than AWS. But like the others, Snowflake does not deliver sub-second performance for ad hoc, interactive analytics at any reasonable scale, or support continuous ingestion well. It is also often very expensive to scale, especially for large data sets, complex queries and semi-structured data.
- Firebolt is the only data warehouse with decoupled storage and compute that supports ad hoc and semi-structured data analytics with sub-second performance at scale. It also combines simplified administration with choice and control over instance types and a different pricing model to deliver the lowest overall TCO. This makes it the best choice for ad hoc, high performance, operational and customer-facing analytics.
In short, different data warehouses are better suited to different use cases. This information should help you choose the right data warehouse or query engine for each one. It may also help you prepare your analytics infrastructure so that these choices can be made.
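One way to keep the summary above at hand is as a simple lookup from use case to candidate engines. The following is only a sketch distilled from the bullets in this summary, not a benchmark result or an official recommendation; the use-case labels, the `CANDIDATES` table and the `recommend` function are hypothetical names chosen for illustration.

```python
# Illustrative lookup distilled from the summary bullets above.
# The labels and ordering are assumptions made for this sketch.
CANDIDATES = {
    "one-off analytics": ["Athena"],
    # Snowflake is listed first because the summary describes it as easier
    # to manage for this workload; Redshift also covers it, at higher effort.
    "reporting and dashboards": ["Snowflake", "Redshift"],
    "ad hoc analytics": ["Firebolt"],
    "semi-structured data analytics": ["Firebolt"],
    "operational analytics": ["Firebolt"],
    "customer-facing analytics": ["Firebolt"],
}


def recommend(use_case: str) -> list[str]:
    """Return the candidate engines for a use case, in summary order."""
    key = use_case.strip().lower()
    if key not in CANDIDATES:
        raise KeyError(f"no recommendation recorded for use case: {use_case!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(CANDIDATES[key])


if __name__ == "__main__":
    for case in CANDIDATES:
        print(f"{case}: {', '.join(recommend(case))}")
```

A table like this is only a starting point: the right choice for a given workload still depends on data volume, query complexity and ingestion pattern, as the comparison describes.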
There has never been such a thing as a single data warehouse that satisfied all the analytics needs of a company. The combination of today’s modern data pipelines and data lakes with the simplicity of cloud services has made adding another cloud warehouse relatively straightforward and cost effective. Those companies that were already using a cloud data warehouse for reporting and dashboards were able to add Firebolt as another cloud data warehouse in weeks, and use it for ad hoc, semi-structured data, operational and customer-facing analytics, while leaving the existing analytics in place.
If you put a modern data pipeline in place that makes it easier to redirect your data from a data lake or other “single source of the truth”, you will be able to choose the best combination of price and performance for each analytics need.