Introducing crystal-wkb to work with geometry data (e.g., GEOS) and geospatial databases (e.g., PostGIS)

Hi everyone!

I needed to read and write geospatial data, particularly the Well-Known Binary (WKB) representation of geometry, which is used by most geospatial libraries (e.g., GEOS, GDAL) and by database engines that support geospatial features (e.g., Postgres with PostGIS, MySQL spatial data types, DuckDB spatial). Since I did not find a suitable shard, I’ve implemented one in pure Crystal: crystal-wkb.

The shard supports the most common flavors of WKB: standard WKB for 2D, EWKB used in PostGIS that extends to 3D and 4D with optional SRID, and ISO WKB (3D and 4D). The shard also has limited support for the equivalent Well-Known Text (WKT). In addition, there is an extension for serialization of geometry as GeoJSON, and another extension that provides convenience converters for DB::Serializable.
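For readers new to the format, here is a minimal sketch of how the three flavors differ in the geometry type field, and of what a 2D point looks like on the wire. It uses only the Crystal standard library; `Flavor`, `type_code`, and `encode_point` are names made up for illustration and are not crystal-wkb's API.

```crystal
# Illustrative sketch only; none of these names come from crystal-wkb.
enum Flavor
  Standard # plain WKB, 2D only
  Extended # EWKB (PostGIS): flag bits for Z, M and SRID
  Iso      # ISO WKB: +1000 for Z, +2000 for M
end

POINT = 1_u32 # base geometry type code for Point in every flavor

# Computes the 32-bit geometry type field for a given flavor.
def type_code(flavor : Flavor, base : UInt32, z = false, m = false, srid = false) : UInt32
  case flavor
  when Flavor::Extended
    code = base
    code |= 0x80000000_u32 if z
    code |= 0x40000000_u32 if m
    code |= 0x20000000_u32 if srid
    code
  when Flavor::Iso
    base + (z ? 1000_u32 : 0_u32) + (m ? 2000_u32 : 0_u32)
  else
    base
  end
end

# Encodes a 2D point as little-endian standard WKB:
# 1 byte order marker, 4 bytes type code, then two 8-byte doubles.
def encode_point(x : Float64, y : Float64) : Bytes
  io = IO::Memory.new
  io.write_byte(1_u8) # 1 = little endian
  io.write_bytes(type_code(Flavor::Standard, POINT), IO::ByteFormat::LittleEndian)
  io.write_bytes(x, IO::ByteFormat::LittleEndian)
  io.write_bytes(y, IO::ByteFormat::LittleEndian)
  io.to_slice
end

puts encode_point(1.0, 2.0).hexstring
# => 0101000000000000000000f03f0000000000000040
puts type_code(Flavor::Extended, POINT, z: true, srid: true).to_s(16) # => a0000001
puts type_code(Flavor::Iso, POINT, z: true, m: true)                  # => 3001
```

The hex string is the usual encoding of POINT(1 2): the payload is identical across flavors, and only the type field (plus the optional SRID in EWKB) changes.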

If you need to decode and encode geospatial data in Crystal, give it a try. As always, PRs are welcome!


Looks very neat and it will be a great accessory for those that need it. I haven’t worked with geodata in decades though, so I don’t have a fresh sense of what is necessary or good.

I wonder though about the use of nested arrays in so many places. In some places Tuples or even records seem like a more natural fit. For example, a method creating a 2D point should probably accept exactly two arguments, so representing that with a tuple of exactly two elements could be a nice idea: it would allow a more compact format with fewer allocations and better type safety, without making the values more cumbersome to type in. Something to consider, but perhaps you have already thought about that.

This looks great. I was trying to implement this specifically for PostGIS a few years ago, but ran out of time to work on it. Turned out there were additional challenges on top of implementing WKT/WKB formats, like how the Postgres shard doesn’t have support for data types provided by extensions (the oids are not static the way they are for built-in types). That’s also something I tried to work on but time limitations got me again.

Unfortunately, writing a specialized record per coordinate mode and geometry type would be cumbersome. WKB and its flavors support 2D, 3D or 4D coordinates (XY, XYZ, XYM and XYZM), so you would need to create four different records just for the type Point. In addition, any geometry object can be empty, and EWKB has an optional SRID attribute that is not present in the other flavors.

This library is intended for WKB, which is meant to be an efficient binary representation of geometry objects. Hence an underlying sequence of coordinates, a Slice(Float64) in the case of crystal-wkb, is a natural fit when you need to read and write variable-length byte sequences of double-precision floats.

The approach I followed is based on GEOS, the underlying library for the great majority of geospatial libraries and systems. There, what I called Position (based on the same concept in the GeoJSON RFC) is called CoordinateSequence, and it shares a similar structure with what I’ve implemented in Crystal. This way you create thin wrappers to access the underlying sequence of coordinates, and the overall architecture of the library is greatly simplified.
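To make the "flat buffer plus thin wrapper" idea concrete, here is a rough sketch using only the standard library. `Mode`, `CoordinateView`, and `read_coords` are hypothetical names chosen for this example; the actual types in crystal-wkb may be organized differently.

```crystal
# Hypothetical names; a sketch of the idea, not crystal-wkb's actual types.
enum Mode
  XY
  XYZ
  XYM
  XYZM

  # Number of Float64 values per coordinate.
  def size : Int32
    case self
    when XY   then 2
    when XYZM then 4
    else           3
    end
  end
end

# Thin wrapper over one flat slice holding all ordinates back to back.
struct CoordinateView
  getter coords : Slice(Float64)
  getter mode : Mode

  def initialize(@coords : Slice(Float64), @mode : Mode)
  end

  # Number of coordinates stored in the flat buffer.
  def size : Int32
    @coords.size // @mode.size
  end

  # Returns the i-th coordinate as a sub-slice of the buffer, without copying.
  def [](i : Int32) : Slice(Float64)
    @coords[i * @mode.size, @mode.size]
  end
end

# Reads `count` coordinates of little-endian doubles into a single slice.
def read_coords(io : IO, count : Int32, mode : Mode) : CoordinateView
  values = Slice(Float64).new(count * mode.size) do
    io.read_bytes(Float64, IO::ByteFormat::LittleEndian)
  end
  CoordinateView.new(values, mode)
end

io = IO::Memory.new
[0.0, 0.0, 5.0, 1.0, 1.0, 6.0].each do |v|
  io.write_bytes(v, IO::ByteFormat::LittleEndian)
end
io.rewind

view = read_coords(io, 2, Mode::XYZ)
puts view.size  # => 2
p view[1].to_a  # => [1.0, 1.0, 6.0]
```

The same wrapper serves all four coordinate modes, since the only thing that varies is the stride used to index into the buffer.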

Regarding the use of nested arrays of Float64, this follows naturally from the previous points, and indeed both WKT and GeoJSON represent the basic geometry objects as nested arrays of coordinates. Moreover, in crystal-wkb the use of (nested) Array(Float64) is meant to be the public interface for manually creating instances of WKB::Object, aimed at people familiar with GeoJSON or WKT. If you only need, for instance, to decode WKB into GeoJSON, it should be efficient because the respective decoder and encoder work directly on the underlying slices.

Yes, indeed. When I searched for potential solutions in Crystal I stumbled upon your repository. However, I needed a more generic solution that could work with different libraries and databases. I hope it comes in handy for you, and that someday we can have support for custom data types with dynamic OIDs in the Postgres driver.

Indeed. You’ve taken a huge part of that work off my plate with this project. Thank you!

I don’t see how this is a problem for a tuple/struct approach. Actually, I would expect Crystal’s union types to work pretty well for this.

Looking at GEOS, it seems the C++ implementation is closer to what @yxhuvud hints at than what’s implemented in crystal-wkb. There are separate CoordinateXY, CoordinateXYM etc. types for example.
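As a rough illustration of the tuple/struct approach with union types, something along these lines would compile. All type names here are invented for the example and do not come from crystal-wkb or GEOS.

```crystal
# Hypothetical sketch of one record per coordinate mode, tied together by a union.
record PointXY, x : Float64, y : Float64
record PointXYZ, x : Float64, y : Float64, z : Float64
record PointXYM, x : Float64, y : Float64, m : Float64
record PointXYZM, x : Float64, y : Float64, z : Float64, m : Float64

# Empty geometries get their own marker type.
struct EmptyPoint
end

alias AnyPoint = PointXY | PointXYZ | PointXYM | PointXYZM | EmptyPoint

# Exhaustive `case ... in`: the compiler rejects this method if a
# member of the union is left unhandled.
def dimensions(point : AnyPoint) : Int32
  case point
  in PointXY    then 2
  in PointXYZ   then 3
  in PointXYM   then 3
  in PointXYZM  then 4
  in EmptyPoint then 0
  end
end

points = [PointXY.new(1.0, 2.0), PointXYM.new(1.0, 2.0, 9.0), EmptyPoint.new] of AnyPoint
points.each { |pt| puts dimensions(pt) }
# prints 2, 3 and 0 on separate lines
```

The trade-off is visible even in this small sketch: the arity of each point is checked at compile time, but every geometry type would need the same four-plus-one variants, and an optional SRID would still have to live somewhere.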

I see. Regarding GEOS, I took inspiration from the C API because it “provides long-term API and ABI stability”, while the C++ API “will likely change across versions”, according to the project’s own home page.

Nonetheless, perhaps I could add new public initializers for Position (and Point) that take a Tuple or NamedTuple and automatically set the underlying Slice(Float64) and coordinate mode, without allocating intermediary arrays (if the user does not already have those). For instance, they could accept {1.0, 2.0} or {x: 1.0, y: 2.0, z: 3.0, m: 4.0}. In the case of a 3-coordinate unnamed tuple, where there is ambiguity, it could default to XYZ.
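A minimal sketch of what such initializers might look like, assuming a Position that stores a Slice(Float64) and a coordinate mode. The `Position` and `Mode` definitions below are stand-ins written for this example, not the real WKB::Position.

```crystal
# Hypothetical stand-ins; the real WKB::Position may look quite different.
enum Mode
  XY
  XYZ
  XYM
  XYZM
end

struct Position
  getter coords : Slice(Float64)
  getter mode : Mode

  def initialize(@coords : Slice(Float64), @mode : Mode)
  end

  # Two-element tuple: unambiguous XY.
  def initialize(xy : Tuple(Float64, Float64))
    @coords = Slice[xy[0], xy[1]]
    @mode = Mode::XY
  end

  # Three-element unnamed tuple: ambiguous, so default to XYZ.
  def initialize(xyz : Tuple(Float64, Float64, Float64))
    @coords = Slice[xyz[0], xyz[1], xyz[2]]
    @mode = Mode::XYZ
  end

  # Named tuples carry the mode in their keys.
  def initialize(xym : NamedTuple(x: Float64, y: Float64, m: Float64))
    @coords = Slice[xym[:x], xym[:y], xym[:m]]
    @mode = Mode::XYM
  end

  def initialize(xyzm : NamedTuple(x: Float64, y: Float64, z: Float64, m: Float64))
    @coords = Slice[xyzm[:x], xyzm[:y], xyzm[:z], xyzm[:m]]
    @mode = Mode::XYZM
  end
end

a = Position.new({1.0, 2.0})
b = Position.new({x: 1.0, y: 2.0, z: 3.0, m: 4.0})
c = Position.new({1.0, 2.0, 3.0})
d = Position.new({x: 1.0, y: 2.0, m: 7.0})

puts a.mode # => XY
puts b.mode # => XYZM
puts c.mode # => XYZ
puts d.mode # => XYM
```

Tuples and named tuples are stack-allocated values, so the only heap allocation in each initializer is the slice itself; overload resolution on the tuple type picks the coordinate mode at compile time.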