Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WrappedProcessor to help reading MVTs #222

Merged
merged 3 commits into from
Sep 4, 2024

Conversation

dabreegster
Copy link
Contributor

My ultimate aim is to read an mbtiles file with a single zoom level and extract features from it. It contains compressed MVT data. To that end, I wrote https://github.com/acteng/will-it-fit/blob/main/data_prep/fix_osmm/src/main.rs using geozero and some other crates. It works like this:

  1. Use the mbtiles crate to open the file
  2. Calculate all tiles in the file
  3. For each tile, read the raw data
  4. Use flate2 to gunzip it
  5. Decode into the geozero::mvt::Tile proto
  6. Use the existing mvt geozero reader to process all layers.
  7. But because the geometry in each tile is scaled to fit that tile, I need to transform the coordinates back into WGS84. IIUC the geozero approach correctly, doing that translation as I go is more performant than collecting into something like an FGB writer first and then trying to go back and transform everything. So, I wrote WrappedProcessor to delegate to another FeatureProcessor (an FgbWriter in this example), but call a function on every coordinate first

This PR adds the WrappedProcessor, in case it might be helpful for other use cases. But I'm not sure this approach is the nicest one -- maybe the mvt::process function should instead take optional info about the current tile (the tile x and y, the zoom level, and the extent), plumb it through the private methods in that file, and apply in process_coord?

I'm not opinionated about the best way to read an mvt tile doing this coordinate transformation, so happy to implement the other approach or something else entirely. Thanks!

@michaelkirk
Copy link
Member

Sorry this sat here for so long!

I really like the direction of this, but I have some proposed amendments for clarity and ergonomics - see
dabreegster#1

@michaelkirk
Copy link
Member

Also, looking at your example, I wonder if in the future, more of this should be built into geozero's mvt support.

Currently it looks like we only implement FeatureProcessor for a single tile, but it seems reasonable that lots of (most?) people will want to process the entire tileset.

@dabreegster
Copy link
Contributor Author

Thank you for the fixes; merged them in here.

I wonder if in the future, more of this should be built into geozero's mvt support.

I'd vote yes, but I'm not sure how to design the API. Would we want optional dependencies on mbtiles and flate2 in geozero? And to implement GeozeroDatasource or similar on `Mbtiles? What if the user doesn't want to process every single tile?

Or maybe the mvt::process function could take optional info about the current tile (the tile x and y, the zoom level, and the extent) and handle the math?

@michaelkirk
Copy link
Member

michaelkirk commented Aug 16, 2024

Would we want optional dependencies on mbtiles and flate2 in geozero?

I think that would be unavoidable, right? It seems reasonable to me, but I'm not actually a user of the rust mbtiles integration, so I'd try to defer to what other users think.

And to implement GeozeroDatasource or similar on `Mbtiles? What if the user doesn't want to process every single tile?

These are very good questions that I haven't thought about.

My first guess would be to add some kind of new types to geozero for managing this logic:

// 1. read everything
let tile_iter = MBTilesReader::new(some_impl_read).select_all();

// 2. read a contiguous range (like your example)
let tile_iter = MBTilesReader::new(some_impl_read).select_range(x1..=x2, y1..=y2, zoom..=zoom)

// 3. read arbitrary sequence of tiles given an iterator of  tile ids (x, y, zoom)
// NOTE: getting every other tile is a contrived example, but it seems reasonable that 
// eventually someone will want something more flexible than what's offered by (2.)
let tile_id_iter = (x1..=x2).step_by(2).flat_map(|x|  y1..=y2.iter().step_by(2) ).map(|y| (x, y, zoom))
let tile_iter = MBTilesReader::new(some_impl_read).select_tiles_by_iter(tile_id_iter)

// TileIter is the thing that implements the geozero::FeatureProcessor 
let geojson = tile_iter.to_geojson()

It might even make sense to upstream these Reader/Selection APIs to mbtiles, but I wouldn't hold my breath.

@nyurik
Copy link
Member

nyurik commented Aug 16, 2024

i would love to have proper iterator support in the mbtiles crate. PRs welcome

dabreegster and others added 2 commits September 3, 2024 16:45
1. Rename to clarify it's wrapping XY processing. I'm hopeful we might
   have pre-processors for other fields!

2. Make mutable instead of passing a value - this makes the
   pre-processor signature closer to the processor signature.

3. Add "builder" api to convert FeatureProcessor into XYPreProcessor

4. Add a test!
@michaelkirk
Copy link
Member

I've just rebased and clarified the test cases.

What do you think about this API addition @pka?

I think it implies that we could add similar pre-processing methods for pipelining other transformations.

@pka
Copy link
Member

pka commented Sep 4, 2024

It was one of the initial goals to support processing pipelines like reprojections, but I never found a good way to implement it (see also https://github.com/georust/geozero/blob/main/geozero/src/multiplex.rs for an experimental chained processor).
It gets trickier when you want to process more than one coordinate at a time (like e.g. for smoothing lines) and/or reduce the number of coordinates in a processing step. For that use case I experimented with an event based API, but gave up, because the performance was much worse than the current callback implementation. I deferred these experiments to an iterator based approach.

Having said that, the pre_process_xy API looks good to me! (Non-inverse order would obviousle be nicer.)

@michaelkirk
Copy link
Member

There's only so many algorithms that can be solved in a single pass. Even simple things, like what @kylebarron ran into in #237 (comment), are hindered by this.

That said, I think a reasonable Simplification implementation would work OK with the current APIs.

Copy link
Member

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind taking another look at my amendments @dabreegster? If they look good to you, I'll go ahead and merge.

Copy link
Contributor Author

@dabreegster dabreegster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, thanks so much for taking over this!

i would love to have proper iterator support in the mbtiles crate. PRs welcome

When I get some time, I'll rewrite my example reader with the changes here, and see how an iterator API might work

@michaelkirk michaelkirk added this pull request to the merge queue Sep 4, 2024
Merged via the queue into georust:main with commit c30f80e Sep 4, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants