Replies: 8 comments 13 replies
-
It seems like trait-based geo algorithms could be developed in a separate repository or crate. If it pans out, there might be a reason to have geo depend on it, or merge it into georust/geo if it seems beneficial. This might help avoid a situation where we're rewriting (and frankly, probably complicating) already working geo code under the auspices of "maybe this will be useful to someone someday". This is similar to what we did with Proj - which used to have a hard dependency on geo-types, but now we have a
-
I had a similar thought and am definitely in favour of this approach.
-
It's taken me like 4 months of learning, but I think I'm finally at a point where I can contribute to the discussion 😄.

Motivation

I'm really excited about cross-language tools and workflows, where it's easy to pass data between Rust, Python, and C++, and similarly in the browser between WebAssembly and JS. Today, there tend to be data silos where code was written for one memory model, and thus in order to use that data from another program, it needs to be serialized and deserialized. In the geo world, that data silo tends to be GEOS/JTS, because so many of today's geo tools have been implemented around them. But with the advent of FlatGeobuf and GeoArrow, I think there's potential in having the memory model as the core specification, not the library. I think the Georust community is really well poised to take advantage of this going forward because:
Potential zero-copy memory formats

GeoArrow

Apache Arrow is a columnar data format designed for cross-language support and zero-copy access. GeoArrow is an in-progress specification to extend Arrow to handle geospatial vector data. When using the arrow-native coordinate encoding, any coordinate of any geometry in the array can be accessed in constant time. Arrow has already seen wide use outside of the Geo sphere, but I think there's a ton of potential for it in the geo world as well. I do a lot of geo data visualization in the browser, and I've written about how deck.gl (a WebGL-enabled map rendering library) can use GeoArrow data directly, saving a ton of CPU processing time. So you could imagine a data exploration workflow like:
Being able to go through this entire process where the memory format is virtually unchanged in each step would be super fast and reduce memory overheads.

FlatGeobuf

FlatGeobuf is a performance-oriented vector geometry format. Until now, FlatGeobuf has been used almost exclusively as a file format, where language support for FlatGeobuf means loading the file to convert it into that language's native geometry type. But Flatbuffers, which FlatGeobuf is built upon, allows for zero-copy access. So FlatGeobuf could also be used as a memory format. Indeed, I believe this is why FlatGeobuf does not offer compression, so that it could be memory mapped. But if you can't operate on the original memory layout, then memory mapping loses some of its value.

GEOS?

I don't know the internals of GEOS well, but it looks like it is theoretically possible to read GEOS memory without a copy. Then georust could share memory with e.g. shapely in Python.

Trait Definition

I've been hacking on a trait definition on this branch, as of today at commit

Trait methods

One big question would be how many methods to require on the traits themselves, vs implementing on top of the traits. Fewer methods on the traits would of course make them easier to implement but harder to use. For now, I just define the most minimal methods to be able to access underlying data.

Geometry types

For now I only implemented the 6 core types:

Ownership

I at first planned to exclusively use references in the traits, but ended up hitting lots of
when trying to pass in Arrow data. As I understand it, Arrow2 is designed to support copy-on-write semantics, so clones are really cheap and using owned objects in the traits seemed to make things easier for now, but this is certainly something to revisit. Generic
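To make the "minimal accessor methods" and "owned objects" ideas above concrete, here is a rough sketch, not the code from the branch mentioned in this comment. The names `CoordAccess`, `LineStringAccess`, `SeparatedLineString`, and `path_length` are hypothetical and exist only for illustration. It assumes 2D `f64` coordinates and returns coordinates by value, since copying two floats is cheap.

```rust
/// Hypothetical minimal accessor for a 2D coordinate (illustrative only).
pub trait CoordAccess {
    fn x(&self) -> f64;
    fn y(&self) -> f64;
}

/// Hypothetical minimal accessor for a line string. Coordinates are
/// returned as owned values rather than references.
pub trait LineStringAccess {
    type Coord: CoordAccess;
    fn num_coords(&self) -> usize;
    fn coord(&self, i: usize) -> Option<Self::Coord>;
}

impl CoordAccess for (f64, f64) {
    fn x(&self) -> f64 {
        self.0
    }
    fn y(&self) -> f64 {
        self.1
    }
}

/// Interleaved layout: one buffer of (x, y) pairs.
impl LineStringAccess for Vec<(f64, f64)> {
    type Coord = (f64, f64);
    fn num_coords(&self) -> usize {
        self.len()
    }
    fn coord(&self, i: usize) -> Option<(f64, f64)> {
        self.get(i).copied()
    }
}

/// Separated layout: one buffer per dimension, loosely in the spirit of a
/// columnar store (an assumption for illustration, not a GeoArrow type).
pub struct SeparatedLineString {
    pub xs: Vec<f64>,
    pub ys: Vec<f64>,
}

impl LineStringAccess for SeparatedLineString {
    type Coord = (f64, f64);
    fn num_coords(&self) -> usize {
        self.xs.len().min(self.ys.len())
    }
    fn coord(&self, i: usize) -> Option<(f64, f64)> {
        Some((*self.xs.get(i)?, *self.ys.get(i)?))
    }
}

/// An algorithm written once against the trait, usable with either layout.
pub fn path_length<L: LineStringAccess>(line: &L) -> f64 {
    let mut total = 0.0;
    for i in 1..line.num_coords() {
        if let (Some(a), Some(b)) = (line.coord(i - 1), line.coord(i)) {
            total += (b.x() - a.x()).hypot(b.y() - a.y());
        }
    }
    total
}
```

Calling `path_length` on a `Vec<(f64, f64)>` and on a `SeparatedLineString` holding the same coordinates gives the same result without converting either one into a shared geometry struct first.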
-
Thanks for fleshing out some working code to move this along!
I may be misunderstanding, but would something like a blanket implementation solve the problem? e.g. how we impl Translate for anything that implements AffineOps: https://github.com/georust/geo/blob/main/geo/src/algorithm/translate.rs#L49

(How) do GeometryCollections fit into this proposal? A heterogeneous bag of potentially different geometry types seems like it could be hard to reconcile with a column store, but I'm not really familiar with the subject.
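For readers unfamiliar with the pattern, a blanket implementation can be sketched roughly as below. This is not geo's actual `Translate` or `AffineOps` code: the traits `MapCoords` and `Shift` and the type `Path` are hypothetical stand-ins, and coordinates are assumed to be `(f64, f64)` pairs.

```rust
/// Hypothetical low-level capability: rebuild a geometry by applying a
/// function to every coordinate.
pub trait MapCoords: Sized {
    fn map_coords(&self, f: impl Fn((f64, f64)) -> (f64, f64)) -> Self;
}

/// Hypothetical higher-level operation, expressed only in terms of `MapCoords`.
pub trait Shift: Sized {
    fn shift(&self, dx: f64, dy: f64) -> Self;
}

/// The blanket implementation: every type that implements `MapCoords`
/// gets `Shift` for free, with no per-type code.
impl<G: MapCoords> Shift for G {
    fn shift(&self, dx: f64, dy: f64) -> Self {
        self.map_coords(|(x, y)| (x + dx, y + dy))
    }
}

/// A toy geometry type that only implements the low-level trait.
#[derive(Debug, Clone, PartialEq)]
pub struct Path(pub Vec<(f64, f64)>);

impl MapCoords for Path {
    fn map_coords(&self, f: impl Fn((f64, f64)) -> (f64, f64)) -> Self {
        Path(self.0.iter().map(|&c| f(c)).collect())
    }
}
```

With this in place, `Path(vec![(0.0, 0.0)]).shift(1.0, 2.0)` compiles even though `Path` never mentions `Shift`. The same shape could let any third-party type that implements a small accessor trait pick up existing algorithms.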
-
Also, I'm hopeful we could keep some kind of CoordNum-based generic number type where possible. Integer types and "bring your own precision model" aren't something I've personally used yet in geo, but I think some people might appreciate the option, and it might be kind of a novel thing that we can support better in Rust than in, e.g., C++, so I'm a little resistant to dropping it.
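As a rough illustration of what a generic number type buys, the sketch below computes twice the signed area of a ring for any numeric type that supports the needed arithmetic. The `Scalar` trait is a hypothetical stand-in defined here with the standard library only. It is not the real `CoordNum` definition, and `twice_signed_area` is likewise an illustrative name.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical stand-in for a CoordNum-style bound (not geo's definition).
/// `Default` supplies zero for the built-in numeric types.
pub trait Scalar:
    Copy + Default + Add<Output = Self> + Sub<Output = Self> + Mul<Output = Self>
{
}

impl<T> Scalar for T where
    T: Copy + Default + Add<Output = T> + Sub<Output = T> + Mul<Output = T>
{
}

/// Twice the signed area of a ring via the shoelace formula. The ring is
/// treated as implicitly closed; counter-clockwise rings give a positive value.
/// Returning twice the area avoids a division, so integer inputs stay exact.
pub fn twice_signed_area<T: Scalar>(ring: &[(T, T)]) -> T {
    let mut sum = T::default();
    for i in 0..ring.len() {
        let (x0, y0) = ring[i];
        let (x1, y1) = ring[(i + 1) % ring.len()];
        sum = sum + (x0 * y1 - x1 * y0);
    }
    sum
}
```

The same function accepts `&[(i32, i32)]` and `&[(f64, f64)]`, which is the kind of flexibility a trait design fixed to `f64` would give up.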
-
In Discord, @kylebarron mentioned a concern about "how to adapt georust algorithms to use traits in a backwards-compatible way". I responded there since that concern wasn't (initially at least) brought up here, but it was suggested I include my comment here. It's a minor one, and more of a side-thought, but perhaps worth considering.
-
For everyone ending up here: this effort is alive and well, particularly in #1157.
-
#1157 has been merged! https://crates.io/crates/geo-traits
-
There's been some recent discussion around implementing geometry primitives in terms of traits. The idea isn't new (see e.g. #67), and has seen some concrete use in https://github.com/andelf/rust-postgis/blob/master/src/types.rs for instance, but we haven't pursued it in geo-types as yet.

The need for efficient (zero-copy) access to the geometry representations in other formats (e.g. Arrow) has brought the idea to the fore again.

This would require a significant overhaul of many of geo's APIs, and we're nowhere near a concrete idea of what those changes would look like. For the moment, this discussion serves as a repository for ideas, sketches, experiments, with the possible goal of producing an RFC for discussion.