
PseudoCode ‐ Patching


Recall the patching step from the general pipeline for EO data. Patching is the process of taking fixed-size subsets from a larger scene or area of interest (AOI). In general, there are two ways to do this: 1) pre-patch your data and save the patches to files, or 2) patch on the fly using a custom dataset.
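
To make the idea concrete, here is a minimal sketch of what a single patch is: a fixed-size window sliced out of a scene array. The scene shape and window location below are made up for illustration.

import numpy as np

# a synthetic scene with layout (channels, height, width)
scene = np.random.rand(4, 1024, 1024)

# a patch is just a fixed-size window into the scene
patch_size = 256
row, col = 128, 512  # top-left corner of the window
patch = scene[:, row : row + patch_size, col : col + patch_size]

assert patch.shape == (4, 256, 256)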

Sources

  • SatClip Example for Random Clean Patches - SatClip

Option I: Pre-Patching

In this case, we will pre-chip the images to produce consistent chipped datasets. One advantage of this method is that we are free to choose the data structure to save to, which gives people flexibility when creating their custom datasets, provided they use simple data structures like .tif, .png, or numpy arrays. In addition, the user will not have to worry about making patches.

Part I - Create ML-Ready Data

Operations

  • Load Analysis-Ready Data
  • Initialize Normalizer
  • Pre-Patching
  • Save ML-Ready Data
  • Save Normalizer

PseudoCode

# select analysis-ready files and load data
analysis_ready_files: List[str] = […]
ds: GeoDataset = load_dataset(analysis_ready_files)

# calculate transformation parameters
transform_params: Dataclass = calculate_transform_params(ds, **params)
save_normalizer(…, transform_params)

# define patcher and patch parameters
patch_size: Dataclass = Dataclass(lon=256, lat=256)
stride: Dataclass = Dataclass(lon=64, lat=64)
patcher: Patcher = Patcher(patch_size, stride)

# save patches to ML Ready Bucket
file_path: Path = Path(…)
save_name_id: str = …
num_workers: int = …
save_patches(patcher, num_workers, file_path, save_name_id)
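
As a concrete sketch of these operations, the snippet below computes per-channel mean/std as hypothetical transform parameters, slides a window over a scene with a given stride, and saves each patch as a .npy file. All function names and the on-disk layout are assumptions for illustration, not this project's actual API.

import numpy as np
from pathlib import Path

def save_patches_to_npy(scene: np.ndarray, patch_size: int, stride: int,
                        out_dir: Path, name_id: str) -> None:
    # slide a fixed-size window over a (C, H, W) scene and save each patch
    out_dir.mkdir(parents=True, exist_ok=True)
    _, height, width = scene.shape
    for i in range(0, height - patch_size + 1, stride):
        for j in range(0, width - patch_size + 1, stride):
            patch = scene[:, i : i + patch_size, j : j + patch_size]
            np.save(out_dir / f"{name_id}_{i}_{j}.npy", patch)

# analysis-ready scene (synthetic stand-in for load_dataset)
scene = np.random.rand(4, 1024, 1024)

# per-channel mean/std as the (assumed) transform parameters
np.savez("normalizer.npz",
         mean=scene.mean(axis=(1, 2)),
         std=scene.std(axis=(1, 2)))

# save patches to the ML-ready location
save_patches_to_npy(scene, patch_size=256, stride=64,
                    out_dir=Path("ml_ready"), name_id="scene_000")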

Part II - Create ML Dataset

Operations

  • Load ML-Ready Data
  • Load Normalizer
  • Apply Normalizer
  • Create Dataset

PseudoCode

# get ml ready data files
ml_ready_data_files: List[str] = […]

# load transform params, init transform
transform_params: PyTree = load_transform_params(…)
transformer = init_transformer(transform_params)

# create ML dataset
ds: MLDataset = MLDataset(ml_ready_data_files, transformer)
# demo item 
num_samples: int = …
sample: Tensor["B C H W"] = ds.sample(num_samples)
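
Continuing the hypothetical .npy layout from the Part I sketch above, a minimal ML dataset could load each saved patch and apply the saved normalizer on access. The class and file layout are illustrative assumptions, not this project's MLDataset.

import numpy as np
import torch
from pathlib import Path
from torch.utils.data import Dataset

class NpyPatchDataset(Dataset):
    def __init__(self, files, mean, std):
        self.files = list(files)
        # reshape stats to (C, 1, 1) for broadcasting over (C, H, W)
        self.mean = mean.reshape(-1, 1, 1)
        self.std = std.reshape(-1, 1, 1)

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        patch = np.load(self.files[idx])
        patch = (patch - self.mean) / self.std  # apply the normalizer
        return torch.from_numpy(patch).float()

# load the saved normalizer and build the dataset
params = np.load("normalizer.npz")
files = sorted(Path("ml_ready").glob("*.npy"))
ds = NpyPatchDataset(files, params["mean"], params["std"])
x = ds[0]  # Tensor with shape (C, 256, 256)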

Option II: On-the-Fly Patching

In this case, we will create a dataset that does some preprocessing on the fly. We just need to save the scenes to a chosen data structure, together with a custom dataset that allows us to subset an AOI and take patches. Some advantages of this approach are that we don't need to double-save the data, we can retain some of the metadata, and we have more flexibility to experiment with different patching strategies. Some disadvantages are that we need a more advanced dataset, which requires more code, and it can be very expensive if memory is not managed properly.

Operations

  • Load Analysis-Ready Data
  • Apply Normalizer
  • Patch On The Fly

PseudoCode

# get analysis ready data files
analysis_ready_files: List[str] = […]

# load transform params, init transform
transform_params: PyTree = load_transform_params(…)
transformer: Callable = init_transformer(transform_params)

# initialize patch parameters
patch_size: Dataclass = Dataclass(lon=256, lat=256)
stride: Dataclass = Dataclass(lon=64, lat=64)

# initialize dataset
ds: Dataset = Dataset(
	analysis_ready_files, 
	transformer, 
	patch_size, 
	stride, 
	**kwargs
)
# demo item
sample: Tensor["1 C 256 256"] = ds.sample(1)
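
A runnable sketch of such a dataset is shown below: it precomputes every (scene, row, col) patch location from the patch size and stride, then cuts and normalizes patches lazily on access. The class name and the in-memory scene list are illustrative assumptions; a real implementation would read windows from disk to manage memory.

import numpy as np
import torch
from torch.utils.data import Dataset

class OnTheFlyPatchDataset(Dataset):
    def __init__(self, scenes, transformer, patch_size=256, stride=64):
        self.scenes = scenes
        self.transformer = transformer
        self.patch_size = patch_size
        # precompute (scene_idx, row, col) for every patch location
        self.index = []
        for s, scene in enumerate(scenes):
            _, h, w = scene.shape
            for i in range(0, h - patch_size + 1, stride):
                for j in range(0, w - patch_size + 1, stride):
                    self.index.append((s, i, j))

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        s, i, j = self.index[idx]
        p = self.patch_size
        patch = self.scenes[s][:, i : i + p, j : j + p]
        patch = self.transformer(patch)  # normalize on the fly
        return torch.from_numpy(patch).float()

# usage with a simple standardizing transformer
scenes = [np.random.rand(4, 1024, 1024)]
mean = scenes[0].mean(axis=(1, 2), keepdims=True)
std = scenes[0].std(axis=(1, 2), keepdims=True)
ds = OnTheFlyPatchDataset(scenes, lambda x: (x - mean) / std)
sample = ds[0]  # Tensor with shape (4, 256, 256)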

Libraries

There are a number of libraries that offer this patching strategy.
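
For example, TorchGeo implements on-the-fly patching through geospatial samplers, and xbatcher provides similar windowed batching for xarray data. Below is a sketch of the TorchGeo pattern, assuming a recent TorchGeo version and a directory of rasters readable by RasterDataset; the data path is a placeholder.

from torch.utils.data import DataLoader
from torchgeo.datasets import RasterDataset, stack_samples
from torchgeo.samplers import GridGeoSampler

ds = RasterDataset(paths="path/to/analysis_ready/")
# grid sampler: fixed-size patches with a fixed stride, as in Option II
sampler = GridGeoSampler(ds, size=256, stride=64)
loader = DataLoader(ds, sampler=sampler, collate_fn=stack_samples)

batch = next(iter(loader))
patches = batch["image"]  # Tensor[B C 256 256]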