Block Size

Jennifer V Medina edited this page Jul 17, 2023 · 1 revision

Cube data stored in an Amazon Web Services (AWS) S3 bucket is flattened into a 1-dimensional array of bytes. For NumPy arrays, the storage format of the TESSCut cubes, the data are concatenated column-wise: the first byte of a column is stored immediately after the last byte of the preceding column (see figure below). As a result, a cutout cannot be requested with 2-dimensional indexing. Instead, an API call must be made for the specific bytes that correspond to the requested cutout, so the locations of those bytes must be known. To do this we use Astropy’s new SEEK functionality to unpack and buffer through the cube file that contains the cutout until the location of the requested cutout is reached.

[Figure: cube data concatenated column-wise into a 1-dimensional array]
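To make the column-wise layout concrete, the following is a minimal sketch that computes which byte ranges of a flattened 2-dimensional array hold a requested cutout. It uses a toy 4×5 array rather than a real TESSCut cube, and the helper name `cutout_byte_ranges` is hypothetical; it is not part of Astropy or astrocut.

```python
import numpy as np

def cutout_byte_ranges(shape, itemsize, rows, cols):
    """Return one (start, stop) byte range per cutout column.

    Assumes a 2-D array flattened column-wise (Fortran order), so each
    column of the cutout is a single contiguous run of bytes.
    """
    n_rows, _ = shape
    r0, r1 = rows
    c0, c1 = cols
    ranges = []
    for c in range(c0, c1):
        # Element (r, c) sits at flat index c * n_rows + r.
        start = (c * n_rows + r0) * itemsize
        stop = (c * n_rows + r1) * itemsize
        ranges.append((start, stop))
    return ranges

# Toy 4x5 "image", serialized column-wise into a 1-D run of bytes.
image = np.arange(20, dtype=np.int32).reshape(4, 5)
raw = image.tobytes(order="F")

# Request rows 1..2 and columns 2..3 (half-open ranges, as in slicing).
ranges = cutout_byte_ranges(image.shape, image.itemsize, rows=(1, 3), cols=(2, 4))

# Rebuild the cutout from only the bytes inside those ranges.
columns = [np.frombuffer(raw[a:b], dtype=image.dtype) for a, b in ranges]
cutout = np.stack(columns, axis=1)
```

Here `cutout` matches `image[1:3, 2:4]`, yet only two short byte runs were read. The same reasoning is why a remote cutout needs byte offsets rather than 2-dimensional indices.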

This new functionality comes from Astropy wrapping Filesystem Spec (fsspec), a tool that lets users access data on remote file systems through instances created for the data storage client. Through Astropy, we use fsspec to unpack and buffer through the cube file to the location of the cutout, and to generate the cutout without downloading the entire cube file. Bytes are buffered in steps of blocks. The block size is the size of the buffer, that is, the number of bytes buffered in a single request. If the pixels that belong to the requested cutout do not all fall within one block, another request is made, and requests continue until all of the desired pixels have been retrieved.
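The effect of the block size on the number of requests can be sketched with a simplified model. It assumes the file is divided into fixed-size blocks and that each block touched by the cutout costs one request; this illustrates the idea only and is not fsspec's actual caching logic. The function name `blocks_needed` and the byte ranges are hypothetical.

```python
def blocks_needed(byte_ranges, block_size):
    """Return the sorted indices of fixed-size blocks covering byte_ranges."""
    needed = set()
    for start, stop in byte_ranges:
        if stop <= start:
            continue  # skip empty ranges
        first = start // block_size
        last = (stop - 1) // block_size  # stop is exclusive
        needed.update(range(first, last + 1))
    return sorted(needed)

# Hypothetical byte ranges occupied by a small cutout.
byte_ranges = [(36, 44), (52, 60)]

# Small blocks: each range falls in its own block, so two requests.
small = blocks_needed(byte_ranges, block_size=16)

# Larger blocks: a single block covers both ranges, so one request.
large = blocks_needed(byte_ranges, block_size=64)
```

In this model a larger block size means fewer requests but more unneeded bytes per request. With Astropy, a block size can typically be passed to fsspec through the `fsspec_kwargs` argument of `astropy.io.fits.open` (for example `fsspec_kwargs={"block_size": ...}` together with `use_fsspec=True`); check the Astropy and fsspec documentation for the options your storage backend supports.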
