New documentation structure #433

Open
9 of 30 tasks
leouieda opened this issue Feb 7, 2024 · 4 comments
leouieda (Member) commented Feb 7, 2024

I'm proposing a full reorganization of the documentation based on the Divio system. Instead of a gallery and a user guide, we would have the following sections, all using the Ensaio data.

Getting started

  • Installing
  • A taste of Verde: A quick showcase of using Verde to generate a grid and a profile from a dataset. Use a Cartesian dataset, or just run it on lon,lat with a warning that this isn't the best way to do it. The main goal is to get people excited about Verde, so there's no need to explain much here: no weights, block means, cross-validation, etc. End by pointing readers to the tutorial.
  • Citing Verde
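The "taste of Verde" boils down to a short workflow: load scattered data, fit an interpolator, and predict on a regular grid. A minimal sketch of that workflow, using `scipy.interpolate.griddata` and synthetic data as stand-ins for Verde's interpolators and the Ensaio datasets (the real showcase would use Verde's fit/grid API):

```python
# Sketch of the showcase workflow: scattered data -> regular grid.
# scipy.interpolate.griddata and the synthetic data here are stand-ins
# for Verde's interpolators and the Ensaio datasets.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(42)
# Hypothetical scattered observations (e.g., bathymetry along ship tracks)
easting = rng.uniform(0, 100, size=500)
northing = rng.uniform(0, 100, size=500)
values = np.sin(easting / 15) * np.cos(northing / 15)

# Build a regular grid and interpolate the scattered values onto it
grid_east, grid_north = np.meshgrid(
    np.linspace(0, 100, 101), np.linspace(0, 100, 101)
)
grid_values = griddata(
    (easting, northing), values, (grid_east, grid_north), method="linear"
)
print(grid_values.shape)  # (101, 101)
```

The showcase would stop at a plot of the grid; everything else (weights, decimation, validation) is deferred to the tutorial.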

Tutorial

The tutorial should have several parts that grow in complexity and are meant to be followed in order by someone new to Verde. They can include some notes with further reading but shouldn't try to explain how things work under the hood or anything more advanced than what each one is aiming to tackle (leave that for other sections).

  • Creating your first grid: Show how to make a grid in Cartesian coordinates using BlockReduce and Spline with default parameters. Go over adding metadata to the grid and saving it to netCDF. Use the bathymetry data.
  • Grids and profiles in geographic coordinates: Show how to pass a projection to the grid and profile methods to get geographic grids/profiles. How to edit the metadata to get proper names. Use the same data. After this, do all of the following tutorials using projections.
  • Using data weights and uncertainties: How to use weights in BlockMean/BlockReduce and Spline. First show how to manually add a weight to avoid fitting a point (use BlockReduce). Then show how to use uncertainties to add weights to BlockMean and Spline. Use the vertical GPS data which has weights.
  • Chaining operations: Use Chain to build a pipeline with BlockMean and Spline for the GPS data. Show how to access each individual step. This is important for the cross-validation section.
  • Evaluating interpolations through cross-validation: How to use cross-validation in Verde. Only use the blocked versions. Start with train_test_split then show BlockKFold and cross_val_score. Explain why a Chain is needed and link to resources on leakage. Use the GPS data.
  • Selecting optimal spline parameters: How to do a grid-search to find the best spline damping. Use the BlockKFold from the previous section on a loop. Use the GPS data.
  • Interpolating 2D and 3D vectors: How to use Vector with a Chained BlockMean and Spline to grid the 3 components at once. Then how to use VectorSpline2D on the horizontal components. Use the GPS data.
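The key idea behind the blocked cross-validation tutorials can be sketched without Verde at all: assign each point to a spatial block and split whole blocks between train and test, so that nearby (spatially correlated) points never straddle the split. A minimal numpy version, with block size and point counts chosen only for illustration:

```python
# Illustration of a *blocked* train/test split: whole spatial blocks,
# not individual points, are held out. Numbers here are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
easting = rng.uniform(0, 100, size=1000)
northing = rng.uniform(0, 100, size=1000)

# Assign each point to one of 4x4 = 16 blocks of 25x25 units
block_size = 25
block_id = (
    (northing // block_size).astype(int) * 4
    + (easting // block_size).astype(int)
)

# Hold out a quarter of the *blocks* (not the points) as the test set
blocks = np.unique(block_id)
test_blocks = rng.choice(blocks, size=len(blocks) // 4, replace=False)
is_test = np.isin(block_id, test_blocks)

print(is_test.sum(), (~is_test).sum())  # roughly a 25/75 split by area
```

Random (non-blocked) splits leak information because test points sit right next to training points; holding out whole blocks is what `BlockKFold` and the blocked `train_test_split` formalize.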

How to

These should be short and to the point, focused on the problem presented. They can assume that people have done the tutorials.

  • Decimate large datasets: How to turn a large dataset into a smaller one with BlockReduce. Particularly when data are oversampled along tracks. The bathymetry data is good for this.
  • Interpolate large datasets: For datasets larger than ~10k points the Spline is too heavy. Recommend KNeighbors instead and show an example using it on a full lidar dataset (maybe the volcano one) with no BlockReduce. Try cross-validation to find the number of neighbors.
  • Project a grid: Make a synthetic lon,lat grid and project it to polar coordinates.
  • Select points inside a region: How to use vd.inside to index points inside a region.
  • Mask grid points too far from data points: Use verde.distance_mask. Trail island dataset + KNeighbors is a good one for this.
  • Mask grid points outside of the data convex hull: Use convexhull_mask. Maybe use the Bushveld height data for this.
  • Estimate and remove a polynomial trend: Fit a trend, remove it from the point data, and grid the trend.
  • Calculate statistics on spatial bins: Use BlockReduce to calculate standard deviation within blocks on volcano lidar as a measure of roughness.
  • Split point data into spatial blocks: Run block_split on pretty much any dataset and show how to loop over the blocks.
  • Split point data into rolling windows: Run rolling_window to split the data and loop over the windows.
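Several of these how-tos revolve around block operations. What block reduction does can be sketched in plain numpy: bin points into regular spatial blocks and apply a reduction (median, standard deviation, etc.) per block. A rough stand-in for illustration only; the function name, spacing, and data here are hypothetical, not Verde's API:

```python
# Hypothetical stand-in for block reduction: bin points into regular
# spatial blocks and reduce the values in each block to one number.
import numpy as np

def block_reduce(easting, northing, values, spacing, reduction=np.median):
    """Reduce values in each spacing-by-spacing block to a single value."""
    col = (easting // spacing).astype(int)
    row = (northing // spacing).astype(int)
    per_block = {}
    for r, c, v in zip(row, col, values):
        per_block.setdefault((r, c), []).append(v)
    # One (block center, reduced value) pair per non-empty block
    coords = np.array(
        [((c + 0.5) * spacing, (r + 0.5) * spacing) for r, c in per_block]
    )
    reduced = np.array([reduction(v) for v in per_block.values()])
    return coords, reduced

rng = np.random.default_rng(1)
east = rng.uniform(0, 10, 500)
north = rng.uniform(0, 10, 500)
data = rng.normal(size=500)
coords, medians = block_reduce(east, north, data, spacing=2.0)
print(len(medians))  # at most 25 blocks on this 10x10 area
```

Swapping `np.median` for `np.std` gives the "statistics on spatial bins" how-to (e.g., roughness from lidar); keeping the median gives decimation.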

Explanations

These are meant to explain how things work and why they are that way.

  • Weights and uncertainties in data decimation: How weights work in BlockMean and BlockReduce. This actually goes into detail about what each of them means, not just how to pass the weights.
  • Adjust spacing or region in grid coordinates: How these work and what it looks like when they are changed.
  • Grid-node and pixel registration: Explain the difference and what they look like when we make a grid.
  • How spline interpolation works: Theory behind the spline interpolation. Build the Jacobian matrix and solve the linear system by hand. Show how to make a prediction.
  • Conventions and definitions: List of conventions and definitions used throughout the project.
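The spline explanation reduces to three steps: build a Jacobian whose columns are Green's functions evaluated at the data points, solve the linear system for the force coefficients, then evaluate the same functions at new points to predict. A sketch of those steps, using a Gaussian radial basis as a stand-in for Verde's actual biharmonic Green's functions:

```python
# Spline-style interpolation by hand: Jacobian -> solve -> predict.
# The Gaussian basis and the 4x4 grid of points are illustrative
# stand-ins, not Verde's actual Green's functions or data.
import numpy as np

# Data locations: a deterministic 4x4 grid of well-separated points
x, y = np.meshgrid(np.arange(4.0), np.arange(4.0))
points = np.column_stack([x.ravel(), y.ravel()])
data = np.sin(points[:, 0]) + np.cos(points[:, 1])

def jacobian(coords, force_coords, width=0.5):
    """Column j holds the Gaussian 'Green's function' of force j
    evaluated at every coordinate in coords."""
    dx = coords[:, None, 0] - force_coords[None, :, 0]
    dy = coords[:, None, 1] - force_coords[None, :, 1]
    return np.exp(-(dx**2 + dy**2) / (2 * width**2))

# Fit: one force per data point; solve the square system J @ forces = data
forces = np.linalg.solve(jacobian(points, points), data)

# Predict: evaluate the same Green's functions at the target points.
# An exact interpolant reproduces the data it was fit to.
predicted = jacobian(points, points) @ forces
print(np.allclose(predicted, data))  # True
```

The explanation would then show how damping perturbs this exact system to trade data misfit for smoothness.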

Reference documentation

What we already have.

  • API
  • References
  • Changelog
  • Version compatibility
  • Documentation for other versions

A lot of the existing docs can be repurposed for this. The main changes would be swapping the datasets used, updating the writing, and using more notes/hints/etc.

This can be done in parts, one section at a time. When it's all done, we can delete the sphinx-gallery parts and remove the sample data.

@leouieda leouieda added the documentation Improvements or additions to documentation label Feb 7, 2024
@leouieda leouieda added this to the v2.0.0 milestone Feb 7, 2024
mdtanker (Member) commented Feb 7, 2024

Great idea! I just watched the Divio pycon talk and think it's a great watch to organize 🙂

leouieda (Member, Author) commented Feb 7, 2024

I saw that talk a while ago and it stayed in the back of my mind all this time. I finally think I digested that enough to come up with this plan. Hope it works!

MGomezN (Member) commented Feb 15, 2024

Great idea! I just watched the Divio pycon talk and think it's a great watch to organize 🙂

This one, @mdtanker? https://www.youtube.com/watch?v=t4vKPhjcMZg

mdtanker (Member) commented

@MGomezN yes, that's the one!

leouieda added a commit that referenced this issue Nov 29, 2024
The first tutorial of the new documentation structure (see #433). Trying
to be as simple as possible on how to generate a grid from some data. No
cross-validation or other fancy things if it can be avoided.