February 5, 2025

Introducing Geospatial Analytics in Firebolt: Powering Location Intelligence at Scale

In an increasingly connected world, where every device, transaction, and interaction carries a spatial component, geospatial data analysis has emerged as a key driver for innovation. Whether optimizing delivery routes, enabling real-time tracking, analyzing spatial patterns, or monitoring IoT networks, the ability to store and process geospatial data with speed and accuracy is essential.

Today, we’re excited to announce the public preview of Firebolt’s Geospatial Analytics—bringing low-latency, high-performance spatial data processing to the modern cloud data warehouse. With Firebolt, data engineers can now integrate spatial data into their workflows, unlocking new dimensions of insight across industries.

Top Use Cases Empowered by Firebolt Geospatial

Firebolt’s geospatial capabilities cater to mission-critical scenarios, including:

  • Logistics and Fleet Optimization: Real-time delivery tracking and efficient resource allocation at a global scale.
  • IoT and Sensor Data Analysis: Powering IoT networks by analyzing spatial patterns from millions of connected devices.
  • Retail and Customer Insights: Geo-marketing, location-based recommendations, and catchment area analysis to enhance customer experiences.
  • Urban Planning and Infrastructure: Mapping traffic patterns, utilities, and infrastructure layouts to drive smarter city planning.
  • Environmental Monitoring: Real-time tracking of environmental changes, disaster response coordination, and asset monitoring.

Key Pain Points Addressed by Firebolt’s Geospatial Capabilities

Geospatial analytics is notoriously challenging due to issues like computational complexity, scalability limits, and performance bottlenecks. Firebolt overcomes these challenges with:

  1. Accuracy at Scale: Traditional GEOMETRY data types model spatial data on a flat, two-dimensional plane, assuming a Euclidean (flat-earth) surface. This simplification can lead to inaccuracies. Firebolt’s GEOGRAPHY type models data on a sphere, ensuring precision even for long distances and edge cases like poles and the anti-meridian.
  2. Low Latency for Instant Insights: Firebolt's geospatial data processing is built on top of the S2 geometry library and is designed around spatial indexing (the first version of spatial indexing is coming soon). This lets Firebolt deliver fast geospatial queries, even on massive datasets.
  3. IoT-Ready Scalability: Efficiently ingest and process billions of geospatial data points generated by IoT devices, maintaining performance consistency for real-time analytics.
  4. Simplified Spatial Operations: Pre-built support for spatial functions like ST_CONTAINS, ST_INTERSECTS, and ST_DISTANCE reduces the need for manual computation while enabling advanced query capabilities.

Firebolt's Approach: Geospatial Analytics Without Compromise

Firebolt’s geospatial implementation leverages the S2 Geometry Library, a cutting-edge tool for spatial indexing and processing, to achieve both accuracy and performance. Firebolt supports creating GEOGRAPHY values from the industry-standard GeoJSON, Well-Known Text (WKT), and Well-Known Binary (WKB) representations, as well as the extended formats EWKT and EWKB introduced by the PostGIS extension. See the corresponding function documentation for further detail.
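To make the three base representations concrete, here is a small Python sketch that encodes the same point, POINT (1 2), as WKT, GeoJSON, and little-endian WKB using only the standard library. It illustrates the formats themselves, not Firebolt's parser, and the helper names are invented for this example.

```python
import json
import struct

def point_wkt(lon, lat):
    """Well-Known Text: human-readable, x (longitude) before y (latitude)."""
    return f"POINT ({lon} {lat})"

def point_geojson(lon, lat):
    """GeoJSON: a JSON object with a type and a [lon, lat] coordinate array."""
    return json.dumps({"type": "Point", "coordinates": [lon, lat]})

def point_wkb(lon, lat):
    """Well-Known Binary, little-endian.

    Layout: 1 byte for byte order (1 = little-endian), 4 bytes for the
    geometry type (1 = Point), then two 8-byte doubles for x and y.
    """
    return struct.pack("<BIdd", 1, 1, lon, lat)

print(point_wkt(1, 2))
print(point_geojson(1, 2))
print(point_wkb(1, 2).hex().upper())
```

All three outputs describe the same location; the binary form is the most compact, and the text forms are easier to read and debug.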

1. Geospatial Representation: Why GEOGRAPHY?

The GEOGRAPHY type accurately models spatial data on a spherical surface, preserving real-world distances and relationships. Unlike GEOMETRY, which flattens data onto a plane and introduces distortions, GEOGRAPHY supports use cases where precision is critical—such as calculating the shortest route between two global points or determining spatial coverage for IoT devices.
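The difference is easy to see with a little arithmetic. The Python sketch below compares a great-circle (haversine) distance on a sphere with a deliberately naive planar calculation that treats degrees as a uniform grid. It is an illustration under a spherical-Earth assumption, not Firebolt's implementation, and the function names are made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation

def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance between two lon/lat points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flat_km(lon1, lat1, lon2, lat2):
    """Naive planar distance: every degree is assumed to span the same length."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    return math.hypot(lon2 - lon1, lat2 - lat1) * km_per_degree

# Ten degrees of longitude, measured at the equator and at 60 degrees north.
for lat in (0, 60):
    sphere = great_circle_km(0, lat, 10, lat)
    plane = flat_km(0, lat, 10, lat)
    print(f"lat {lat}: sphere {sphere:.0f} km, plane {plane:.0f} km")
```

At the equator the two agree, but at 60 degrees north the planar figure is roughly twice the great-circle distance, because meridians converge toward the poles.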

2. Performance Optimization: Spatial Indexing

At the heart of Firebolt’s geospatial engine is its approach to spatial indexing using S2 Cells—a hierarchical partitioning of Earth into small, manageable regions. The current implementation collects tablet-level metadata to prune full tablets. This enables:

  • Selective Data Access: Queries scan smaller, relevant data subsets based on spatial relationships, providing an initial performance boost.
  • Conditional Performance Optimizations: When possible, Firebolt uses fast intersection tests, such as simple integer comparisons of S2 cells, to evaluate spatial relationships like containment and intersection efficiently. For more complex cases, the system gracefully transitions to advanced computations to ensure accurate results.
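Why can containment between cells come down to integer comparisons? S2 cell IDs are 64-bit integers laid out so that all descendants of a cell fall in one contiguous ID range. The Python sketch below reproduces that public bit layout in a few lines. It is an illustration of the idea, not Firebolt's code; the helper names are invented here, and the example ID is the one reported later in this article.

```python
MAX_LEVEL = 30  # S2 leaf cells sit at level 30

def to_unsigned(cell_id):
    """Reinterpret a signed 64-bit value as an unsigned one."""
    return cell_id & 0xFFFFFFFFFFFFFFFF

def lsb(cell_id):
    """Lowest set bit of the ID; its position encodes the cell's level."""
    return cell_id & -cell_id

def level(cell_id):
    """Subdivision level of the cell (0 = a whole cube face, 30 = leaf)."""
    return MAX_LEVEL - (lsb(cell_id).bit_length() - 1) // 2

def parent(cell_id, parent_level):
    """ID of the ancestor cell at the given (coarser) level."""
    new_lsb = 1 << (2 * (MAX_LEVEL - parent_level))
    return (cell_id & -new_lsb) | new_lsb

def contains(outer, inner):
    """True if cell `outer` contains cell `inner`: two integer comparisons."""
    half_width = lsb(outer) - 1
    return outer - half_width <= inner <= outer + half_width

cell = to_unsigned(-9168534571950538752)  # signed ID as shown in query output
coarse = parent(cell, 10)
print(level(cell), contains(coarse, cell), contains(cell, coarse))
```

Because the check never touches coordinates, it is cheap enough to apply to metadata before any exact geometry test runs.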

Read more about geospatial in our technical blog here

3. Handling Real-World Complexities

Geospatial data is rife with edge cases:

  • Anti-Meridian and Polar Seams: Firebolt’s architecture gracefully handles objects spanning the anti-meridian or polar regions.
  • Snap Rounding for Robustness: To mitigate floating-point inaccuracies, Firebolt uses snap rounding, ensuring reliable results for operations like ST_CONTAINS.
  • Degeneracies and Orientation: Firebolt accounts for tricky cases like degenerate polygons, ensuring consistent and accurate outputs.
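The anti-meridian case is worth a tiny illustration. In the Python sketch below, two points sit half a degree on either side of the 180-degree line. A naive subtraction of longitudes puts them almost a full circle apart, while wrapping the difference gives the true one-degree gap. This only shows why planar reasoning fails there; it is not how Firebolt handles the seam internally, and the function names are hypothetical.

```python
def naive_lon_gap(lon1, lon2):
    """Longitude difference as if the map had hard left and right edges."""
    return abs(lon2 - lon1)

def wrapped_lon_gap(lon1, lon2):
    """Smallest angular gap between two longitudes, in degrees."""
    gap = abs(lon2 - lon1) % 360
    return min(gap, 360 - gap)

print(naive_lon_gap(179.5, -179.5))    # 359.0
print(wrapped_lon_gap(179.5, -179.5))  # 1.0
```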

4. GEOGRAPHY types in Firebolt

Firebolt supports the following GEOGRAPHY types, with examples provided in the Well-Known Text (WKT) format for spatial data representation:

  • Point: A 0-dimensional object representing a single location in coordinate space. Example: POINT (1 2)
  • LineString: A 1-dimensional line made up of a contiguous sequence of line segments, where each segment connects two points. The end of one segment forms the start of the next. Example: LINESTRING (1 2, 3 4, 5 6)
  • Polygon: A 2-dimensional area, defined by an outer boundary (shell) and optionally one or more inner boundaries (holes). Example: POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0), (1 1, 2 1, 2 2, 1 2, 1 1))
  • MultiPoint: A collection of multiple Points. Example: MULTIPOINT ((0 0), (1 2))
  • MultiLineString: A collection of multiple LineStrings. Example: MULTILINESTRING ((0 0,1 1,1 2), (2 3,3 2,5 4))
  • MultiPolygon: A collection of Polygons that do not overlap or share adjacent boundaries, although they may touch at finite Points. Example: MULTIPOLYGON (((1 5, 5 5, 5 1, 1 1, 1 5)), ((6 5, 9 1, 6 1, 6 5)))
  • GeometryCollection: A mixed collection of Geographies. Example: GEOMETRYCOLLECTION (POINT(2 3), LINESTRING(2 3, 3 4))

Geospatial Analytics in action

To do our analysis, we ingested the Kaggle US-Accidents dataset into Firebolt and transformed the coordinates into GEOGRAPHY data types using the ST_GEOGPOINT function, enabling us to perform spatial queries and analysis. This allowed us to work with location-based data and leverage Firebolt’s geospatial functions to gain meaningful insights from the dataset. You can find all the queries and insights here in our GitHub repo if you wish to try the example.

1. First, let’s define a Polygon for Central Park

Query:

set query_parameters={"name":"central_park","value":"POLYGON((-73.95826578140259 40.80120581546623,-73.94845962524414 40.79711243898632,-73.97289991378783 40.76401504612403,-73.98268461227417 40.76785044388771,-73.95826578140259 40.80120581546623))"};
  • This statement defines a polygon representing the Central Park area in New York City, using WKT (Well-Known Text) format.
  • The polygon's coordinates outline the boundaries of Central Park.
  • This step sets the polygon as a query parameter named central_park, enabling its reuse across multiple queries for geographic filtering.
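If you generate such parameters from application code, a small helper avoids hand-editing long coordinate strings. The Python sketch below builds a single-ring WKT polygon from (longitude, latitude) pairs, closes the ring if needed, and formats a statement in the same `set query_parameters=...` shape as the one above. The helper names are invented for this example, and how you send the statement to Firebolt depends on your client.

```python
import json

def polygon_wkt(ring):
    """Build a single-ring POLYGON in WKT from (lon, lat) pairs."""
    points = list(ring)
    if len(points) < 3:
        raise ValueError("a polygon ring needs at least three points")
    if points[0] != points[-1]:
        points.append(points[0])  # WKT rings must end where they start
    coords = ",".join(f"{lon!r} {lat!r}" for lon, lat in points)
    return f"POLYGON(({coords}))"

def set_parameter_statement(name, value):
    """Format a statement shaped like the `set query_parameters` example."""
    # json.dumps takes care of quoting and escaping the parameter value.
    payload = json.dumps({"name": name, "value": value}, separators=(",", ":"))
    return f"set query_parameters={payload};"

central_park = [
    (-73.95826578140259, 40.80120581546623),
    (-73.94845962524414, 40.79711243898632),
    (-73.97289991378783, 40.76401504612403),
    (-73.98268461227417, 40.76785044388771),
]
print(set_parameter_statement("central_park", polygon_wkt(central_park)))
```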

2. Now, Let’s Count Accidents by Year Within Central Park

Query:

select extract(year from start_time) as year, count(*)
from accidentdata
where   
  st_covers(st_geogfromtext(param('central_park')), start_location)
group by all
order by year;
  • This query calculates the number of accidents occurring inside Central Park for each year. 
  • The st_covers function checks if the accident's location (start_location) lies within the Central Park polygon (st_geogfromtext(param('central_park'))).
  • The extract(year from start_time) expression pulls the year out of each accident's start time, and group by all groups the counts by that year, enabling trend analysis.
  • The result provides a temporal view of accidents, helping to identify years with higher or lower accident counts.
  • First Spike (2020): There was a noticeable increase in 2020, with accidents rising to 11, the highest up to that point. This could be attributed to changes in park usage patterns during the COVID-19 pandemic, when outdoor spaces saw increased foot and vehicle traffic.
  • Massive Surge (2022): The most dramatic increase happened in 2022, with 28 accidents, a nearly 4x increase from the previous year. This sharp rise could be due to changes in park policies, increased traffic, new infrastructure, or a surge in visitors.

3. Next, Let’s Retrieve Accident Points Within Central Park

Query:

select st_astext(start_location) 
from accidentdata
where st_covers(st_geogfromtext(param('central_park')), start_location);
  • This query extracts the geographic coordinates of all accidents that occurred within Central Park.
  • The st_covers function ensures only accidents within the polygon are included.
  • The st_astext function converts the accident location (start_location) into a human-readable WKT format.
  • This step provides granular accident location data for further spatial analysis or visualization.
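For intuition about what a coverage test decides, here is a planar ray-casting sketch in Python. It treats longitude and latitude as flat x and y, which is only a reasonable approximation for a small polygon like this one; st_covers in Firebolt evaluates the relationship on the sphere. The function name and the two test locations (approximate coordinates for the middle of the park and for Times Square) are assumptions made for this example.

```python
def point_in_ring(lon, lat, ring):
    """Planar ray casting: count how many edges a ray heading east crosses.

    `ring` is a closed list of (lon, lat) pairs (first point repeated last).
    An odd number of crossings means the point is inside.
    """
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

central_park = [
    (-73.95826578140259, 40.80120581546623),
    (-73.94845962524414, 40.79711243898632),
    (-73.97289991378783, 40.76401504612403),
    (-73.98268461227417, 40.76785044388771),
    (-73.95826578140259, 40.80120581546623),
]

print(point_in_ring(-73.9654, 40.7829, central_park))  # middle of the park
print(point_in_ring(-73.9855, 40.7580, central_park))  # Times Square
```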

4. Let’s Try a New Scenario and Find Where Most of the Accidents Happen in the US

Query:

SELECT 
    ST_S2CellIdFromPoint(start_location, 15) AS s2_cell_id, 
    COUNT(*) AS c
FROM 
    accidentdata
GROUP BY 
    s2_cell_id
ORDER BY 
    c DESC
LIMIT 1;
  • This gives us S2 cell ID: -9168534571950538752

About the query: This query groups accident data by S2 cells at level 15, a spatial granularity that divides Earth's surface into cells averaging approximately 79,173 m² (about the size of 9-10 soccer fields). This resolution is helpful for analyzing accident density at the scale of neighborhoods or city blocks. The ST_S2CellIdFromPoint function converts geographic coordinates (start_location) into an S2 cell ID at a specified level (here, 15), representing the spatial area containing that location. By grouping the data by these cell IDs and counting the occurrences, the query computes accident counts for each geographic area; ordering by that count and keeping the first row returns the cell with the most accidents.
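The cell-size figure can be checked with a quick back-of-the-envelope calculation: S2 projects the sphere onto the six faces of a cube and splits each face into four at every level, so the average cell area is the sphere's surface divided by 6 × 4^level. The Python sketch below assumes a spherical Earth with the mean radius; individual cells vary in size around this average, and the function name is made up for this example.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius, spherical approximation

def average_s2_cell_area_m2(level):
    """Average S2 cell area at a level: sphere area / (6 faces * 4**level)."""
    sphere_area = 4 * math.pi * EARTH_RADIUS_M ** 2
    return sphere_area / (6 * 4 ** level)

for lvl in (10, 15, 20):
    print(lvl, round(average_s2_cell_area_m2(lvl)), "m^2")
```

Each step down in level divides the average area by four, so choosing a level is a direct trade-off between spatial detail and the number of groups the query produces.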

5. Now, Let’s Find ALL Accidents at the above Cell ID: -9168534571950538752

SELECT st_astext(start_location) AS geometry_wkt
FROM accidentdata
WHERE st_s2cellidfrompoint(start_location, 15) = -9168534571950538752;
  • Answer: South LA

IoT and Firebolt: Enabling Real-Time Spatial Insights

Firebolt’s geospatial support unlocks massive potential for IoT-driven analytics. Imagine:

  • Real-Time Asset Tracking: Monitor and manage fleets, shipments, or connected devices with sub-second query response times, ensuring operational efficiency.
  • Sensor Data Aggregation: Analyze geotagged sensor data from billions of IoT devices to identify spatial patterns, anomalies, or trends.
  • Smart Cities at Scale: From traffic optimization to utility monitoring, Firebolt’s geospatial capabilities power data-driven decision-making for urban infrastructure.

Ready to Explore? Try Firebolt’s Geospatial Preview

Here’s how to get started:

  1. Run Interactive Queries: Test-drive Firebolt’s geospatial capabilities with queries like finding points within polygons or evaluating spatial relationships (e.g., intersections and containment).
  2. Visualize Your Data: Pair Firebolt with mapping tools to create rich, interactive visualizations of geospatial datasets.
  3. Scale Your Analytics: Leverage Firebolt’s elastic architecture to handle even the most demanding geospatial workloads without sacrificing performance.

For more insight into everything that went on under the hood to build geospatial analytics support, read the engineering blog: Part I: Building Geospatial support in Firebolt

What’s Next?

The public preview is just the beginning. We’re continuously optimizing and expanding Firebolt’s geospatial capabilities with features like:

  • Spatial Joins for Complex Analysis: Enable faster, more powerful join operations across geospatial datasets.
  • Enhanced Pruning and Indexing: Further reduce query latencies for massive datasets.
  • IoT-Specific Enhancements: Develop features tailored to high-velocity geospatial data streams from IoT networks.

Try Geospatial with Firebolt Today!

With Firebolt’s geospatial analytics, data engineers can unlock location intelligence at unparalleled speed and scale. Whether you're optimizing IoT networks, analyzing supply chains, or building next-gen applications, Firebolt empowers you to turn geospatial data into actionable insights. Ready to experience the future of geospatial analytics? Sign up for the public preview today, try geospatial with this cool demo (GitHub), and read our documentation.
