February 5, 2025

Building Geospatial Support in Firebolt (Part-I)

Software Engineer

As the demand for location-based insights grows, integrating geospatial capabilities directly into a data warehouse unlocks a range of powerful use cases—from real-time tracking to spatial analysis and mapping. In this series of blog posts, we’ll take you behind the scenes of how we developed robust and fast geospatial functionality for our cloud data platform, enabling users to seamlessly store, query, and analyze spatial data alongside their traditional datasets. We’ll walk through the key challenges we faced, the technologies we leveraged, and the architectural decisions that made it all possible.

Geospatial support allows Firebolt to store and process spatial data, such as points, lines, and polygons, that represent real-world locations and features. We’ll dive into the various components that make this possible, from specialized libraries to performance optimizations and data pruning.

In Part I, we'll begin by exploring the two main types of geospatial support: GEOMETRY and GEOGRAPHY. We'll explain the reasons behind our decision to implement the GEOGRAPHY type and introduce the S2 geometry library, which we use to power geospatial features in Firebolt. We'll also discuss the benefits and challenges of leveraging this library with a brief look at how S2 Cells can help speed up geospatial functions.

In Part II, we'll examine the underlying architecture and storage infrastructure that enable both high-performance geospatial functions and efficient data pruning through spatial indexing. We will discuss why inputs need to be verified and normalized during ingestion, and see how Firebolt persists S2 cell coverings in its storage layer, which enables powerful pruning as well as function optimizations.

In Part III, we'll take a closer look at how we achieve fast and reliable geospatial operations with S2. While S2 offers robust tools for geospatial functions, these capabilities can come at a performance cost. We'll dive into how careful engineering mitigates these trade-offs by running fast checks on S2 cells before the expensive snap rounding operations that robust functions require.

GEOMETRY vs GEOGRAPHY

When dealing with geospatial data in a database, most systems use either a GEOMETRY or a GEOGRAPHY type. At Firebolt, we decided to implement the GEOGRAPHY type as it is more accurate for geographical data. Let’s have a look at some of the differences between the two types:

GEOMETRY models data on a flat plane. Since Earth is, in fact, not flat, this means that distances and other relations between objects, like containment, become inaccurate the further apart and larger they are. You also have to decide on a projection to map your inputs onto a flat plane, adding complexity to your queries.

GEOGRAPHY models data on a sphere, which resembles the shape of Earth much more closely. Distances and other relations between objects remain accurate, even on a global scale. A projection is also unnecessary since coordinates are directly mapped to the sphere according to the WGS 84 coordinate system.

For example, let’s have a look at the following query where we test whether the point at longitude -100 and latitude 45.1 is contained in a large area defined as a polygon within the USA.

[Map of the polygon and the point, powered by Mapbox]

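The corresponding queries using the GEOMETRY and GEOGRAPHY types of the PostGIS extension for PostgreSQL return different results, as shown below. On a flat plane, the polygon's northern edge is a straight line at latitude 45, so the point at latitude 45.1 lies outside. On a sphere, that edge is a great-circle arc that bulges north of the 45th parallel, which puts the point inside.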

-- GEOMETRY: Returns false
SELECT ST_Covers(
    ST_SetSRID(ST_GeomFromText('POLYGON ((-115 45, -115 35, -90 35, -90 45, -115 45))'), 4326),
    ST_SetSRID(ST_GeomFromText('POINT (-100 45.1)'), 4326)
);

-- GEOGRAPHY: Returns true
SELECT ST_Covers(
    ST_GeogFromText('POLYGON ((-115 45, -115 35, -90 35, -90 45, -115 45))'),
    ST_GeogFromText('POINT (-100 45.1)')
);
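To make the difference concrete, the following minimal sketch computes the latitude at which the polygon's northern edge crosses longitude -100 when the edge is treated as a great-circle arc. It assumes a perfect sphere, which is only an approximation of what a real GEOGRAPHY implementation does, and the function name `great_circle_lat_at_lng` is hypothetical.

```python
import math


def great_circle_lat_at_lng(p1, p2, lng):
    """Latitude (degrees) at which the great circle through p1 and p2 crosses `lng`.

    p1 and p2 are (lng, lat) pairs in degrees. Assumes a perfect sphere and
    that the great circle is not a meridian (otherwise nz below would be 0).
    """
    def to_xyz(lng_deg, lat_deg):
        lam, phi = math.radians(lng_deg), math.radians(lat_deg)
        return (math.cos(phi) * math.cos(lam),
                math.cos(phi) * math.sin(lam),
                math.sin(phi))

    a, b = to_xyz(*p1), to_xyz(*p2)
    # Normal vector of the plane that contains the great circle (a x b).
    nx = a[1] * b[2] - a[2] * b[1]
    ny = a[2] * b[0] - a[0] * b[2]
    nz = a[0] * b[1] - a[1] * b[0]
    lam = math.radians(lng)
    # Every point p on the circle satisfies n . p = 0; solve that for latitude.
    tan_phi = -(nx * math.cos(lam) + ny * math.sin(lam)) / nz
    return math.degrees(math.atan(tan_phi))


# Northern edge of the polygon from the queries above, evaluated at the point's longitude.
edge_lat = great_circle_lat_at_lng((-115.0, 45.0), (-90.0, 45.0), -100.0)
print(edge_lat)
```

Under these assumptions the edge passes longitude -100 at roughly 45.7 degrees north, so a point at latitude 45.1 sits south of the edge and therefore inside the polygon.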

Implementing GEOGRAPHY using the S2 geometry library

Implementing either GEOMETRY or GEOGRAPHY brings many challenges. Consider, for example, the common problem of determining whether a point x is contained in a polygon. We can choose a point y that we know lies outside the polygon and count how often the line between x and y crosses the polygon boundary: an odd number of crossings means x is inside, an even number means it is outside. We also have to account for edge cases such as the line passing through a vertex of the polygon, where we might falsely count two or zero edge crossings.
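The crossing-count idea can be sketched on a flat plane as follows. This is an illustrative even-odd ray cast, not Firebolt's or S2's implementation: the function name `point_in_polygon` is hypothetical, the outside point is taken to be infinitely far to the right of the query point, and points exactly on the boundary are not treated specially.

```python
def point_in_polygon(point, ring):
    """Even-odd ray casting on a flat plane.

    `ring` is a list of (x, y) vertices; the last vertex connects back to the
    first. A horizontal ray is cast from `point` towards +x.
    """
    x, y = point
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        # Half-open rule: an edge counts only if exactly one endpoint lies
        # strictly above the ray. A vertex sitting on the ray is then counted
        # once in total rather than twice or not at all.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses the ray's height.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


# The polygon from the queries above, interpreted on a flat plane.
usa_box = [(-115, 45), (-115, 35), (-90, 35), (-90, 45)]
print(point_in_polygon((-100, 45.1), usa_box))
print(point_in_polygon((-100, 40), usa_box))
```

The half-open comparison `(y1 > y) != (y2 > y)` is one common way to handle the vertex edge case: it assigns each vertex to exactly one side of the ray, so no special-case branch is needed.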

Of course, implementing GEOGRAPHY comes with its own set of problems: we need to implement all geometric primitives in a way that accounts for Earth's curvature and make sure that objects crossing the antimeridian (180 degrees east or west) or the poles are handled correctly.
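A small sketch shows why the antimeridian and the poles need care. Comparing raw longitudes makes two nearby points on either side of the antimeridian look almost 360 degrees apart, whereas converting them to 3D unit vectors (the representation S2 uses for points) removes the discontinuity. The helper names below are hypothetical and the code assumes a perfect sphere.

```python
import math


def to_unit_vector(lng_deg, lat_deg):
    """Convert a (lng, lat) pair in degrees to a 3D unit vector."""
    lam, phi = math.radians(lng_deg), math.radians(lat_deg)
    return (math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi))


def angular_distance_deg(p, q):
    """Angle in degrees between two (lng, lat) points as seen from the sphere's center."""
    a, b = to_unit_vector(*p), to_unit_vector(*q)
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    # atan2 of |a x b| and a . b stays accurate for very small and very large angles.
    return math.degrees(math.atan2(math.hypot(*cross), dot))


east, west = (179.9, 0.0), (-179.9, 0.0)
naive_lng_gap = abs(east[0] - west[0])         # naive difference of the raw longitudes
true_gap = angular_distance_deg(east, west)    # actual angle between the two points
print(naive_lng_gap, true_gap)
```

The same representation also copes with the poles, where every longitude describes the same location.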

Solving these problems is hard and takes a lot of time. Fortunately, the S2 Geometry library that powers Firebolt’s GEOGRAPHY type and functions can do a lot of the heavy lifting. It provides abstractions for shapes like points, polygons, and line strings, and can do many useful computations on them, such as testing whether one shape contains another. It even provides the tools to build powerful spatial indexes using space-filling curves (more about this in a later blog post). Thank you to Google and Eric Veach for open-sourcing and maintaining the S2 Geometry library.

S2 also provides powerful tools for spatial indexing by dividing the Earth into a hierarchy of approximately square-shaped cells. This recursive division allows us to approximate complex shapes or collections of shapes through what are called coverings—sets of S2 cells that together represent the shape. One of the key advantages of using S2 cells is the efficiency of intersection tests. Since each cell is represented by a single integer, determining whether one cell intersects with another can be done using simple integer comparisons, which are extremely fast.
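The integer comparisons can be illustrated with a pure-Python sketch of the 64-bit cell id layout that S2 documents: three bits for the cube face, two bits per level for the child position, then a single 1 bit followed by zeros. The function names are hypothetical and this is not the S2 API itself, only a model of the id arithmetic.

```python
MAX_LEVEL = 30  # S2 leaf cells live at level 30


def cell_id(face, path):
    """Build a 64-bit cell id from a cube face (0-5) and child positions (each 0-3)."""
    level = len(path)
    cid = face << 61
    for i, child in enumerate(path, start=1):
        cid |= child << (61 - 2 * i)
    # A trailing 1 bit marks the cell's level.
    return cid | (1 << (2 * (MAX_LEVEL - level)))


def lowest_set_bit(cid):
    return cid & -cid


def range_min(cid):
    """Smallest leaf id contained in this cell."""
    return cid - (lowest_set_bit(cid) - 1)


def range_max(cid):
    """Largest leaf id contained in this cell."""
    return cid + (lowest_set_bit(cid) - 1)


def contains(a, b):
    """True if cell a contains cell b."""
    return range_min(a) <= b <= range_max(a)


def intersects(a, b):
    """True if the leaf ranges of the two cells overlap."""
    return range_min(a) <= range_max(b) and range_min(b) <= range_max(a)


parent = cell_id(2, [1, 3])
child = cell_id(2, [1, 3, 0, 2])
sibling = cell_id(2, [1, 2])
print(contains(parent, child), intersects(parent, sibling))
```

Because all descendants of a cell share its bit prefix, they occupy one contiguous range of integers, so containment and intersection reduce to a handful of comparisons.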

This efficiency is especially valuable for spatial queries with selective filters on spatial relations. For example, when querying a table to retrieve all points within a specific polygon, we can significantly reduce the number of points we need to examine by testing whether the covering of the polygon intersects with the covering of the point set. This allows us to quickly rule out large portions of the dataset that don't meet the query criteria, improving both the speed and scalability of spatial queries.
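A minimal sketch of this kind of pruning, assuming the points are stored as a sorted list of leaf cell ids and the polygon is approximated by a covering: each covering cell maps to one integer range, and a binary search finds the points that fall inside it. The ids below are toy values that only follow the "trailing 1 bit" convention, the names are hypothetical, and this is not a description of Firebolt's storage layer.

```python
from bisect import bisect_left, bisect_right


def leaf_range(cid):
    """Inclusive range of leaf ids covered by a cell id."""
    lsb = cid & -cid
    return cid - (lsb - 1), cid + (lsb - 1)


def candidate_rows(sorted_leaf_ids, covering):
    """Indexes of points whose leaf cell lies inside any cell of the covering.

    A covering over-approximates the shape, so the returned rows are
    candidates that still need an exact containment test.
    """
    hits = set()
    for cell in covering:
        lo, hi = leaf_range(cell)
        start = bisect_left(sorted_leaf_ids, lo)
        end = bisect_right(sorted_leaf_ids, hi)
        hits.update(range(start, end))
    return sorted(hits)


point_ids = [5, 33, 41, 47, 49, 101]      # toy leaf ids (odd numbers), kept sorted
polygon_covering = [0b101000, 0b1100100]  # toy cells with leaf ranges [33, 47] and [97, 103]
print(candidate_rows(point_ids, polygon_covering))
```

Only the surviving candidates need the expensive exact geometric test; everything else is ruled out by integer comparisons alone.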

We will go into more detail on how spatial indexing and pruning work in Firebolt in a future blog post.

Of course, these are not the only interesting problems we had to solve, nor the only optimizations Firebolt can apply. Stay tuned for future blog posts where we talk about some of the most interesting problems we had to tackle, as well as how we tuned our in-memory and on-disk representation of GEOGRAPHY to enable fast geospatial functions and powerful pruning.

Try geospatial today by signing up for Firebolt for free, and check out our GitHub for a cool demo.
