PseudoCode ‐ Patching
Recall the general pipeline process for EO data: patching. Patching takes fixed-size subsets (patches) from a larger scene or area of interest (AOI). In general, there are two ways to do the patching process: 1) pre-patch the data and then save it to file, or 2) patch on the fly using a custom dataset.
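To make the idea concrete, here is a minimal sketch of gridded patching in plain numpy (the scene shape and the 256/64 patch/stride values are illustrative only):

```python
import numpy as np

# a fake single-band scene ["H W"]; real scenes would come from a raster file
scene = np.random.rand(1024, 1024)
patch_size, stride = 256, 64

# slide a fixed-size window over the scene with the given stride
patches = [
    scene[r : r + patch_size, c : c + patch_size]
    for r in range(0, scene.shape[0] - patch_size + 1, stride)
    for c in range(0, scene.shape[1] - patch_size + 1, stride)
]
print(len(patches), patches[0].shape)  # 169 (256, 256)
```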
Sources
- SatCLIP example for random clean patches
In this case, we will pre-patch the images to produce consistent, chipped datasets. One advantage of this method is that we are free to choose the data structure we save to, which gives people flexibility when they create custom datasets, provided the patches are stored in simple data structures like `.tif`, `.png`, or `numpy` arrays. In addition, the user will not have to worry about making patches.
Operations
- Load Analysis-Ready Data
- Initialize Normalizer
- Pre-Patching
- Save ML-Ready Data
- Save Normalizer
PseudoCode
# select analysis-ready files and load data
analysis_ready_files: List[str] = …
ds: GeoDataset = load_dataset(analysis_ready_files)
# calculate transformation parameters
transform_params: Dataclass = calculate_transform_params(ds, **params)
save_normalizer(…, transform_params)
# define patcher and patch parameters
patch_size: Dataclass = Dataclass(lon=256, lat=256)
stride: Dataclass = Dataclass(lon=64, lat=64)
patcher: Patcher = Patcher(patch_size, stride)
# save patches to ML Ready Bucket
file_path: Path = Path(…)
save_name_id: str = …
num_workers: int = …
save_patches(ds, patcher, num_workers, file_path, save_name_id)
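To make the helpers above concrete, here is a minimal, single-process sketch of what `calculate_transform_params`, `save_normalizer`, and `save_patches` might do. All names come from the pseudocode, not from a real library; the per-channel mean/std normalizer, the callable patcher, and the `.npy` output format are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

import numpy as np


@dataclass
class TransformParams:
    mean: list  # per-channel mean
    std: list   # per-channel std


def calculate_transform_params(ds) -> TransformParams:
    # assume `ds` iterates over ["C H W"] numpy scenes
    stack = np.concatenate([s.reshape(s.shape[0], -1) for s in ds], axis=-1)
    return TransformParams(stack.mean(axis=-1).tolist(), stack.std(axis=-1).tolist())


def save_normalizer(path: Path, params: TransformParams) -> None:
    # persist the parameters so the exact same normalizer can be rebuilt later
    path.write_text(json.dumps(asdict(params)))


def save_patches(ds, patcher, num_workers, file_path: Path, save_name_id: str) -> None:
    # assume `patcher(ds)` yields fixed-size arrays; a real implementation
    # would parallelize this loop over `num_workers`
    file_path.mkdir(parents=True, exist_ok=True)
    for i, patch in enumerate(patcher(ds)):
        np.save(file_path / f"{save_name_id}_{i:06d}.npy", np.asarray(patch))
```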
Operations
- Load ML-Ready Data
- Load Normalizer
- Apply Normalizer
- Create Dataset
PseudoCode
# get ml ready data files
ml_ready_data_files: List[str] = […]
# load transform params, init transform
transform_params: PyTree = load_transform_params(…)
transformer = init_transformer(transform_params)
# create ML dataset
ds: MLDataset = MLDataset(ml_ready_data_files, transformer)
# demo item
num_samples: int = …
sample: Tensor["B C H W"] = ds.sample(num_samples)
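A minimal sketch of what the `MLDataset` above might look like, assuming the patches were written as `.npy` files by the pre-patching step (the class name and `sample` method mirror the pseudocode; none of this is a real library API):

```python
from typing import Callable, List

import numpy as np
import torch
from torch.utils.data import Dataset


class MLDataset(Dataset):
    def __init__(self, files: List[str], transformer: Callable):
        self.files = files
        self.transformer = transformer  # e.g. a normalizer built from saved params

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> torch.Tensor:
        patch = np.load(self.files[idx])  # ["C H W"] array from disk
        return self.transformer(torch.from_numpy(patch))

    def sample(self, num_samples: int) -> torch.Tensor:
        # draw random patches and stack them into a ["B C H W"] batch
        idxs = np.random.randint(0, len(self), size=num_samples)
        return torch.stack([self[i] for i in idxs])
```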
In this case, we will create a dataset that does some preprocessing on the fly. We just need to save the scenes to a chosen data structure, plus a custom dataset that allows us to subset an AOI and take patches. Some advantages of this approach are that we do not need to double-save the data, we can retain some of the metadata, and we have more flexibility to experiment with different patching strategies. The disadvantages are that we need a more advanced dataset, which requires more code, and it can be very expensive if memory is not managed properly.
Operations
- Load Analysis-Ready Data
- Apply Normalizer
- Patch On The Fly
PseudoCode
# get analysis ready data files
analysis_ready_files: List[str] = […]
# load transform params, init transform
transform_params: PyTree = …
transformer: Callable = init_transformer(transform_params)
# initialize patch parameters
patch_size: Dataclass = Dataclass(lon=256, lat=256)
stride: Dataclass = Dataclass(lon=64, lat=64)
# initialize dataset
ds: Dataset = Dataset(
analysis_ready_files,
transformer,
patch_size,
stride,
**kwargs
)
# demo item
sample: Tensor["1 C 256 256"] = ds.sample(1)
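A sketch of the on-the-fly `Dataset` above, under the simplifying assumption that each scene fits in memory as a ["C H W"] numpy array and that patch size and stride are plain integers (the pseudocode presumably works on lazily loaded `GeoDataset` scenes instead; all names here are illustrative):

```python
import itertools
from typing import Callable, List

import numpy as np
import torch
from torch.utils.data import Dataset


class OnTheFlyPatchDataset(Dataset):
    def __init__(self, files: List[str], transformer: Callable,
                 patch_size: int = 256, stride: int = 64):
        self.scenes = [np.load(f) for f in files]  # real code would lazy-load
        self.transformer = transformer
        self.patch_size = patch_size
        # precompute (scene_id, row, col) for every valid patch corner
        self.index = []
        for s, scene in enumerate(self.scenes):
            _, h, w = scene.shape
            rows = range(0, h - patch_size + 1, stride)
            cols = range(0, w - patch_size + 1, stride)
            self.index += [(s, r, c) for r, c in itertools.product(rows, cols)]

    def __len__(self) -> int:
        return len(self.index)

    def __getitem__(self, idx: int) -> torch.Tensor:
        s, r, c = self.index[idx]
        p = self.patch_size
        patch = self.scenes[s][:, r : r + p, c : c + p]
        return self.transformer(torch.from_numpy(patch.copy()))
```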
Libraries
There are a number of libraries that offer this patching strategy.
- xrpatcher is a lightweight patcher for `xarray.Dataset` structures which can easily be composed with PyTorch datasets.
- torchgeo provides some lightweight datasets for rasters and vectors that include geo information.
- Raster-Vision is a framework for deep learning on satellite and aerial imagery.
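For instance, on-the-fly gridded patching with torchgeo might look like the sketch below. This assumes the sampler API of recent torchgeo releases (around v0.5) and uses its built-in `Sentinel2` raster dataset; substitute whatever `RasterDataset` subclass matches your files.

```python
from torch.utils.data import DataLoader
from torchgeo.datasets import Sentinel2, stack_samples
from torchgeo.samplers import GridGeoSampler

ds = Sentinel2("data/sentinel2")                   # indexes rasters on disk
sampler = GridGeoSampler(ds, size=256, stride=64)  # gridded fixed-size patches
loader = DataLoader(ds, sampler=sampler, collate_fn=stack_samples)

batch = next(iter(loader))
image = batch["image"]  # Tensor["B C 256 256"]
```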
This research is funded through a NASA 22-MDRAIT22-0018 award (No 80NSSC23K1045) and managed by Trillium Technologies Inc (trillium.tech).