Robust and efficient geospatial operations using snap rounding (Part III)

Listen to this article

In parts I and II of this series of blog posts, we talked about the foundational building blocks of Firebolt’s GEOGRAPHY data type. In part III, we will explore in more detail how Firebolt implements robust operations on geospatial data.

What is snap rounding?

One of the fundamental challenges we faced when implementing geospatial functions using the S2 Geometry library was the use of snap rounding. To understand snap rounding, let’s first have a look at the problem it solves. For this example, we ignore the curvature of the Earth in order to keep it simple and understandable. Consider a line between the points at coordinates (0, 0.1) and (1, 0.3). We might have a point at coordinates (0.5, 0.2) and want to check whether it is on the line. Since the number 0.2 is not representable exactly as a floating point number (and neither are 0.1 or 0.3), an exact calculation would always return that the point is not on the line. Such errors might also happen because of slight imprecisions in the data due to measurements or because points are constructed from other objects. For example, a point could have been constructed by intersecting two lines, which will introduce some inaccuracies due to floating point calculations. The S2 library goes even further: Because it cannot guarantee that an exact calculation always works, it will always return that the point is not on the line, except if it is exactly equal to one of the points used to define the line. Snap rounding is S2's solution to this problem: We “snap” two inputs together if they are less than a certain distance apart from each other (in Firebolt’s case, we use 1 micron). During this process, we transform the line from (0, 0.1) to (1, 0.3) into a line that starts at (0, 0.1), passes through (0.5, 0.2) and ends at (1, 0.3). Now, (0.5, 0.2) is exactly the same as one of the points used to define the line, so we return that it lies on it.

Mitigating the performance penalty of snap rounding

While snap rounding is a great tool for building robust operations, it comes with a performance hit for actually performing snap rounding. S2 does not natively support a way to only perform snap rounding when it’s required. So in order to get fast and robust operations, we need to take optimization into our own hands. We use several fast paths that are applied per tuple to circumvent performing the actual snap rounding.

Many of these checks use S2 cells and coverings. We already explained these in detail in part II. In short: S2 cells are a hierarchical partitioning of the Earth's surface, designed to represent spatial data at various levels of precision. A covering, in this context, refers to a set of S2 cells that together encompass a given region. One of the key advantages of using S2 cells is that they allow very efficient intersection and containment tests. Since each cell is represented by a single integer, determining whether one cell intersects with another can be done using simple integer comparisons, which are extremely fast.

One example is that if one of the inputs is constant in the query (for example when querying all points from a table that are inside a specific polygon), we compute a covering of S2 cells around the polygon. We can then perform a fast check whether a point from a table is within that covering. If it is not, we can immediately deduce that the point is not inside the polygon. The covering used here is chosen such that it includes everything that would be inside the polygon after snap rounding. Since the snap rounding distance of 1 microns is much smaller than even the most accurate S2 cells, this usually does not actually lead to a different covering than if we ignored snap rounding. It is still important to account for it because otherwise we might falsely determine that a point is outside of the polygon based on the covering even though it would be inside it after snap rounding.

In the example below, the point on the right is outside of the polygon’s covering, so we can immediately return that it is outside of the polygon.

We use a similar technique for clearly positive cases: By computing an interior S2 cell covering, i.e. a set of S2 cells that are guaranteed to be completely inside a polygon, we can quickly determine that a point is inside the polygon.

The example below shows the same polygon and point set as the previous one. Here we see that the left-most point is inside the interior covering of the polygon so we can immediately return that it is inside the polygon.

‍

In the examples above, we still have two points left that we need to check for containment in the polygon. The next optimization we apply here makes use of the rough covering consisting of only one S2 cell that is stored for each point (or any other GEOGRAPHY type, but we’re focussing on points for these examples). This covering is configured such that it contains everything within the snap rounding distance around the point. For a containment query in a polygon, we can then check whether the polygon fully contains that cell. If the cell is contained, then snap rounding cannot change that fact so we can safely return that the polygon contains the point. If the cell is fully outside of the polygon, then the point cannot be snap rounded to be contained in the polygon, so we can safely return that as well. Only if the point is very close to the polygon’s boundary, snap rounding will be applied and an exact containment test on the snap-rounded polygon and point will be performed. This is more expensive than the previous two checks which only used interior and exterior coverings of the polygon, but it is still cheaper than the exact containment test using snap rounding.

In the image below, we see the two remaining points from the example with their respective one-cell coverings. The top left point has a covering that is completely contained in the polygon, so we can return that it is contained in the polygon. The bottom right point’s covering intersects the polygon, so we have to perform snap rounding only for this one remaining point.

‍

Handling degenerate shapes

Since snap rounding is also applied within a single shape, it can also lead to degenerate shapes: Consider a very narrow polygon in a diamond shape, where two of the opposite vertices are under 1 micrometer apart. During snap rounding, these two vertices are snapped together, so the result is a degenerate polygon that has an area of 0.

There are two approaches to handle such degeneracies: Dimension reduction where this polygon would be converted into a line string, or native support of degeneracies where this is treated as an infinitely small polygon (in terms of area). Since there are some functions that behave differently for polygons and linestrings (because they have different definitions of what their boundaries are), Firebolt uses S2’s native support of degeneracies when these occur during snapping.

One example of a function that behaves differently depending on how degeneracies are handled, is ST_CONTAINS. In comparison to the very similar ST_COVERS function, ST_CONTAINS(A, B) only returns true if B is completely inside A, and the interiors of A and B intersect. The interior of the degenerate polygon from the example above is empty, so it contains nothing according to the definition of ST_CONTAINS. However, if we reduce the polygon to a linestring, this changes: By definition, a line string’s interior is the entire linestring except for the first and last vertex. So if we were to convert the polygon into a linestring, it would contain infinitely many points. Because of this subtlety, we recommend using ST_COVERS instead of ST_CONTAINS in most cases. It is easier to understand, less expensive to compute, and does the same as ST_CONTAINS in almost all cases.

This wraps up our three-part series of blog posts about geospatial support in Firebolt. We went from fundamentals in part I to data format and pruning in part II all the way down to the details of snap rounding and performance optimizations in part III. Hopefully you learned a thing or two about the fascinating engineering problems associated with geospatial support in a database. I would also like to thank Google and Eric Veach again for making S2 geometry available. It was an invaluable tool to build our geospatial support. Now that you know all about how it works, you can try out Firebolt’s geospatial functionality yourself using our demo project or have a first look using our live-demo.‍

Robust and efficient geospatial operations using snap rounding (Part III)

Robust and efficient geospatial operations using snap rounding (Part III)

Listen to this article

What is snap rounding?

Mitigating the performance penalty of snap rounding

Handling degenerate shapes

Firing Up Firebolt’s Client Ecosystem

Bogdan Truta

From Zero to 100M Users: Inside Notion’s Data Stack and AI Strategy with Sumit Gupta

Firebolt Team

Introducing Firebolt Core - Self-Hosted Firebolt, For Free, Forever

Mosha Pasumansky

Intrigued? Want to read some more?

Robust and efficient geospatial operations using snap rounding (Part III)

Robust and efficient geospatial operations using snap rounding (Part III)

Listen to this article

What is snap rounding?

Mitigating the performance penalty of snap rounding

Handling degenerate shapes

Firing Up Firebolt’s Client Ecosystem

Bogdan Truta

From Zero to 100M Users: Inside Notion’s Data Stack and AI Strategy with Sumit Gupta

Firebolt Team

Introducing Firebolt Core - Self-Hosted Firebolt, For Free, Forever

Mosha Pasumansky

Intrigued? Want to read some more?

Don't miss a post, subscribe to the Fireblog