April 9, 2025

Recap - Firebolt Forward: Data Warehousing in the Age of AI

Multiple contributors

TL;DR: Legacy data warehouses can’t handle AI apps. Firebolt is built for subsecond latency, massive concurrency, and native vector search—no bolt-ons needed.

Outperforms Snowflake, Redshift, BigQuery by 8x-90x on price-performance
Firebolt Core: self-hosted, Docker-based version
Native Iceberg + geospatial support
Built-in vector search powers RAG apps—no external DB needed
Roadmap: Text-to-SQL, streaming ingestion, model management, and more

Fast enough for production AI. Flexible for modern stacks. Simple enough to scale without a massive team.

The Challenge: AI-Ready Infrastructure

We're caught in a frustrating paradox: we're tasked with building infrastructure for the AI revolution, but we're doing it with tooling designed for the BI dashboard era.

According to ISG:

90% of AI models never make it to production
80% of AI project time is spent wrangling data, not training models
35% of enterprises cite inadequate infrastructure as their biggest blocker to GenAI adoption

If you've been feeling this pain—struggling with latency issues, cost overruns, and scattered vector data—you're not alone. The recent Firebolt Forward event tackled this head-on, focusing on a critical question: What happens when you rethink cloud data warehousing from the ground up for AI-powered applications?

This wasn't a future-gazing event full of "someday" promises. It was about solutions available now for the challenges we're facing today. Let's dive into what was presented and what it means for data engineers building in the AI era.

The Old Playbook Doesn't Work Anymore

The classic data warehouse was built for BI dashboards, where your primary concern was, "Will my bar chart render in under 10 seconds?" Today, we're asking much more demanding questions:

Can our infrastructure support semantic search for thousands of users?
Will our chatbot return answers in under 500ms?
How do we avoid breaking the bank every time an AI app scales?

Most warehouses simply weren't designed to serve AI-powered, customer-facing applications. They weren't optimized for low latency, weren't priced for massive concurrency, and certainly weren't built to handle vector workloads or unstructured data.

The result? We've been cobbling together solutions—a vector database here, a caching layer there, maybe a query accelerator thrown in for good measure. It all kind of works... until it doesn't.

Firebolt's thesis is simple but powerful: the warehouse for AI is fundamentally different from the warehouse for BI.

Firebolt's Approach: Built for Performance, Priced for Scale

The event kicked off with a compelling keynote session featuring Igor Stanko, Firebolt's Chief Product Officer, who set the stage by examining how AI is fundamentally reshaping data infrastructure requirements. Igor walked attendees through Firebolt's evolution from a high-performance cloud data warehouse to an AI-first platform designed specifically for the demands of modern applications.

"Traditional data warehouses were built for BI dashboards where acceptable performance meant bar charts rendering in under 10 seconds," Igor explained. "Today's AI applications demand responses in milliseconds, support for thousands of concurrent users, and seamless handling of both structured and unstructured data including vectors and embeddings."

To provide an independent perspective on these market shifts, Igor was joined by David Menninger, Executive Director at ISG, who shared research on AI adoption challenges. According to ISG's findings, AI is now the second-largest area of IT spend, and the AI and data analytics market is projected to hit $261.7 billion, yet 90% of AI models never make it to production. The primary culprit? Inadequate infrastructure.

"The data shows that 80% of AI project time is spent wrangling data, not building models," David revealed. "And 35% of enterprises cite poor infrastructure as their biggest blocker to generative AI adoption."

Together, Igor and David painted a clear picture of the disconnect between AI ambitions and the reality of legacy data infrastructure, setting the stage for Firebolt's vision of purpose-built data warehousing for the AI era.

FireScale Benchmarks: The Numbers That Matter

While the insights from Igor Stanko and David Menninger established the critical need for AI-optimized data infrastructure, the question remained: How much better can a purpose-built solution actually perform? To answer this, Cole Bowden, Developer Advocate at Firebolt, took the stage to present real-world performance data with the introduction of Firebolt's new FireScale benchmark.

"We've all heard performance claims before," Cole acknowledged. "That's why we created FireScale—not as a contrived test, but as a benchmark based on actual customer workloads and the Amplab web traffic dataset that teams are running in production today."

Cole then moved from claims to measurements. The FireScale benchmark compared Firebolt against Snowflake, Redshift, and BigQuery, with these headline results:

Firebolt's smallest engine outperformed competitors' largest ones
Price-performance was up to 8x better than Snowflake, 18x better than Redshift, and 90x better than BigQuery
At high concurrency: 2,500 queries per second with <120ms latency

Because the benchmark is built from workloads that teams run in production today, the results are directly relevant to those of us in the trenches.
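As a rough sanity check on the high-concurrency figure, Little's law (average requests in flight = throughput × average latency) gives a feel for how many queries are being served at once. The sketch below treats the quoted 120 ms as the average latency. That is an assumption: the event only stated that latency stayed below that figure, so the result is an upper bound rather than a measured value.

```python
def in_flight_queries(qps: float, latency_ms: float) -> float:
    """Little's law: average in-flight requests = throughput * latency."""
    return qps * latency_ms / 1000.0


# Figures quoted for the FireScale high-concurrency run. 120 ms is used as a
# ceiling on average latency (assumption), so this is an upper bound.
upper_bound = in_flight_queries(qps=2500, latency_ms=120)
print(f"At most ~{upper_bound:.0f} queries in flight on average")
```

In other words, sustaining that throughput means the engine is juggling on the order of a few hundred queries at any instant, which is the regime customer-facing applications live in.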

Customer Spotlights: Real-World Validation

The customer panel featured Ryan McWilliams from Vrio and Yaron Cohen-Leo from Bigabid, who shared how Firebolt transformed their data operations.

From Database Frustration to Performance Freedom

Ryan McWilliams, CTO of Vrio (an online headless CRM platform), described their struggles with traditional infrastructure:

"Before Firebolt, we were paying astronomical costs to AWS and trying to solve our issues with MySQL and RDS and Aurora, and nothing would work," Ryan explained. "It was probably every day I'd be pinged with somebody saying a report's taking too long, or it's not completing at all."

The breaking point wasn't just performance issues but the mounting development costs: "We were chasing our tails because we'd solve one problem and maybe cause another one somewhere else."

After migrating to Firebolt:

Query complaints disappeared completely
They reduced their AWS spend
Engineers could focus on building features rather than optimizing queries

Breaking Free from "Redshift Purgatory"

Yaron Cohen-Leo from Bigabid, an adtech company processing tens of terabytes daily, explained how their previous setup with MySQL and Redshift couldn't handle their analytics needs:

"Our campaign managers always wanted more and more, and a lot of times things got stuck, or we had to do trade-offs between analyzing a couple of months or seeing the entire data funnel."

With Firebolt, Bigabid built two critical systems:

A comprehensive BI platform
A near real-time campaign management platform with millisecond-level responses

"After a few months, there was full adoption. It now serves almost everyone in the company," Yaron noted.

Keys to Successful Migration

Both customers emphasized that the migration process was straightforward. Ryan highlighted that "the customer support we got from the Firebolt team was second to none," noting how their data ingestion time dropped from 30 minutes to about one minute.

Their advice for teams considering a similar move:

"If you take the time to integrate Firebolt, you're going to save time on the back end. We probably save 50% of the money we were spending on AWS," said Ryan.

Yaron added: "First, understand what your data model is and examine it across platforms... The technical account managers helped us optimize everything until most of our important queries ran faster than the competitors."

Looking Toward an AI-Powered Future

Both companies are now looking to leverage their enhanced infrastructure for AI initiatives.

For Vrio, predictive analytics for e-commerce is the focus: "To be able to predict revenue, predict inventory stock needs... that's where we're going to start pushing our business," Ryan explained.

Bigabid is enhancing their machine learning capabilities: "Our data science teams can use Firebolt to leverage their machine learning algorithms, helping them develop even quicker than what they're doing now," said Yaron.

The customer stories made it clear that Firebolt's impact extends beyond technical metrics, fundamentally changing how these companies operate and enabling new business capabilities.

Product Announcements: What's New in Firebolt

Firebolt Core (Private Preview)

Perhaps the biggest announcement was Firebolt Core—a fully self-hosted edition of Firebolt, packaged as a single Docker image. This is the same distributed query engine, minus the SaaS overhead:

Run Firebolt locally, in any cloud, or in air-gapped environments
Deploy in minutes with zero external dependencies
Query open formats like Apache Iceberg without data duplication

For those of us who need more control over our environments (or who work in organizations with strict data residency requirements), this is a game-changer.

Apache Iceberg Support

Firebolt also announced native support for Apache Iceberg, allowing users to:

Use the read_iceberg() function to query Iceberg tables in place, reading the underlying Parquet files directly
Connect to catalogs like Snowflake, AWS Glue, and GCP BigLake
Get Firebolt-level performance on top of open data formats

With SSD caching, metadata pruning, and transparent subresult caching, Firebolt makes querying Iceberg feel like querying native storage—a huge win for those of us working in lakehouse architectures who are tired of making copies just to make queries fast.

Geospatial Support

Firebolt now supports native geospatial queries, using familiar Postgres-style SQL but with Firebolt's columnar engine under the hood. This means:

Using functions like ST_DISTANCE, ST_WITHIN, and other common spatial functions
Running them directly against high-volume location data
Joining with structured and semi-structured data—all in one query

For use cases like delivery optimization, low-latency tracking, geofencing, and location-based personalization, this eliminates the need for a separate geospatial stack.
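To make the geofencing use case concrete, here is a minimal Python sketch of the check a distance-based geofence performs: compute the great-circle distance between two points and compare it to a radius. It uses the haversine formula on a spherical Earth, chosen purely for illustration. It is not a description of how Firebolt's ST_DISTANCE is implemented, and the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(point, center, radius_m):
    """True if point lies within radius_m meters of center."""
    return haversine_m(*point, *center) <= radius_m


# Hypothetical courier position checked against a 200 m fence around a depot.
depot = (0.0, 0.0)
courier = (0.0, 0.001)
print(inside_geofence(courier, depot, radius_m=200))
```

In a warehouse with native geospatial functions, this per-row arithmetic moves into the query itself, so the filter runs next to the data instead of in application code.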

Security and Compliance

Security got a significant boost with enterprise-grade features built directly into the platform:

HIPAA compliance for healthcare workloads
SOC 2, ISO 27001, and ISO 27018 certifications
RBAC down to the table, view, schema, and location level
PrivateLink support for secure VPC access
Encryption at rest and in transit by default

These aren't add-ons or patches—they're integral to the platform, allowing faster development with sensitive data and deployment to regulated industries.

Pricing Transparency

In an industry where cost surprises are the norm, Firebolt unveiled a refreshingly clear pricing model:

Compute-optimized engines from $0.92/hour
Storage-optimized engines from $1.84/hour

Simple, predictable, and designed to scale with your business—not your anxiety.
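For budgeting, the quoted hourly rates translate into monthly figures with simple arithmetic. The sketch below is a back-of-the-envelope estimate only: it assumes a 30-day month, that you are billed only for the hours an engine runs, and that the "from" rates apply unchanged. Storage charges and larger engine sizes are not modeled, and the dictionary keys are illustrative names.

```python
# "From" hourly rates quoted at the event (USD per engine-hour).
RATES_USD_PER_HOUR = {
    "compute_optimized": 0.92,
    "storage_optimized": 1.84,
}


def engine_cost(engine: str, hours_per_day: float, days: int = 30) -> float:
    """Estimated engine cost in USD for the period, rounded to cents."""
    return round(RATES_USD_PER_HOUR[engine] * hours_per_day * days, 2)


always_on = engine_cost("compute_optimized", hours_per_day=24)
business_hours = engine_cost("compute_optimized", hours_per_day=8)
print(f"24x7: ${always_on:.2f}, 8h/day: ${business_hours:.2f}")
```

Under these assumptions, the gap between the two figures shows why stopping idle engines matters as much as the headline rate.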

Demo Highlights: Firebolt in Action

Firebolt Core + Iceberg Demo

Benjamin Wagner demonstrated Firebolt Core deployed across GCP instances, showcasing:

Quick Deployment: Spun up across four GCP instances using a single Docker image, with no custom provisioning or third-party services
In-Situ Querying: Used read_iceberg() to query Snowflake-managed Iceberg tables directly—no ETL, no data movement, no waiting
Native-Like Performance: Even with external data, performance was comparable to Firebolt's managed storage thanks to:
- SSD caching (automatically used under the hood)
- Subresult caching (reuses query fragments when possible)
- Metadata pruning (uses Parquet and Iceberg metadata to skip irrelevant files and rows)
Simplicity: The entire setup required just a few lines of SQL to register the Iceberg table and standard SQL syntax for queries

For those of us dealing with fragmented storage (Parquet in S3, Iceberg tables across tools, maybe a Snowflake-managed catalog), this demo showed that you don't need to choose between flexibility and speed.
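Metadata pruning is the easiest of the three techniques to picture. Parquet and Iceberg both record per-file minimum and maximum values for columns, so an engine can discard files whose value range cannot match a filter before reading any data. The sketch below shows that general idea in Python with hypothetical file names and statistics. It illustrates the technique, not Firebolt's planner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    path: str
    min_ts: int  # smallest value of the filter column in this file
    max_ts: int  # largest value of the filter column in this file


def prune(files, lo, hi):
    """Keep only files whose [min_ts, max_ts] range overlaps [lo, hi]."""
    return [f for f in files if f.max_ts >= lo and f.min_ts <= hi]


# Hypothetical table of three data files, partitioned by a timestamp-like column.
files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# Query filter: ts BETWEEN 150 AND 180. Only one file can contain matches.
to_scan = prune(files, lo=150, hi=180)
print([f.path for f in to_scan])  # ['part-1.parquet']
```

Two of the three files are never opened, which is where much of the "native-like" feel on external data comes from when filters are selective.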

Building a Chatbot with Firebolt

David Stanko, intern at Firebolt, demonstrated a support chatbot built using Firebolt as the vector search layer in a RAG (retrieval-augmented generation) pipeline. The key takeaways:

No external vector DB needed
Subsecond similarity search with cosine metric
Semantic chunking using nomic-embed-text
Built-in metadata filtering for internal vs. customer queries
Powered by Llama 3.1

The entire solution is open source and available for us to fork and adapt to our own documentation and use cases—a practical example of how Firebolt can eliminate the need for specialized vector databases in RAG workflows.
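The retrieval step of such a RAG pipeline boils down to two operations: filter chunks by metadata, then rank the remainder by cosine similarity to the query embedding. The in-memory Python sketch below shows that logic with toy 3-dimensional vectors. The chunk texts, the `audience` field, and the function names are hypothetical. In the demo this work was done inside Firebolt, and real embeddings would come from a model such as nomic-embed-text.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2, audience=None):
    """Return the k chunks most similar to query_vec, optionally filtered by audience."""
    candidates = [c for c in chunks if audience is None or c["audience"] == audience]
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy data: hypothetical documentation chunks with made-up embeddings.
chunks = [
    {"text": "How to create an engine", "audience": "customer", "embedding": [1.0, 0.0, 0.0]},
    {"text": "Internal escalation runbook", "audience": "internal", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Billing FAQ", "audience": "customer", "embedding": [0.0, 1.0, 0.0]},
]

query = [0.9, 0.1, 0.0]
print(top_k(query, chunks, k=1, audience="customer")[0]["text"])
```

Note that the internal runbook is the closest match to this query, yet the metadata filter keeps it out of customer-facing answers. Running filter and similarity ranking in one place is the practical argument for keeping vectors next to the rest of the data.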

Roadmap: What's Coming Next

Hiren Patel outlined Firebolt's future direction across four pillars:

1. Modern AI Applications

Text-to-SQL
Enhanced vector search and model management
Python UDFs and LLM integrations
Agentic AI enablement

2. Lakehouse Integration

Full Iceberg read/write support
Deeper integration with Glue, Polaris, BigLake
REST Catalog support
Firebolt as an Iceberg catalog (future-looking)

3. Cloud Data Warehouse Enhancements

Streaming ingestion with Kafka
Subsecond engine warm-up
Advanced query optimization (JIT, pruning, background storage optimization)
Fine-grained RBAC and expanded compliance
Power BI, GitHub, Azure, GCP ecosystem integrations

4. AI-Powered Usability

AI-powered query optimizations
Copilots for schema design and SQL generation
Optimization recommendations based on usage patterns
AI-powered migration utilities

The Bottom Line: Firebolt's Strategy in a Sentence

If I had to distill Firebolt's message from the event into a single sentence, it would be:

Fast enough for production-grade AI, flexible enough for modern stacks, and simple enough to scale without a dedicated data engineering team.

Firebolt is not trying to out-feature Snowflake with an endless list of capabilities. Instead, it is focused on out-executing Snowflake for teams building modern, real-time, AI-powered applications, which is exactly where most practitioners are heading.

Getting Started

Ready to see what Firebolt can do for your AI and data workloads?

Try Firebolt: Experience Firebolt's subsecond query speed starting under $1/hour
Join the Firebolt Core Private Preview: Deploy Firebolt anywhere with Docker—no strings, no vendor lock-in
Build your own AI assistant: Fork the Firebolt chatbot repo and power it with your docs
Explore FireScale Benchmarks: See how Firebolt outperformed Snowflake, Redshift, and BigQuery by up to 90x on price-performance with real workloads
