Block Size
Cube data stored in an Amazon Web Services (AWS) S3 bucket are concatenated into a 1-dimensional array. For NumPy arrays, the storage format of the TESSCut cubes, the data are concatenated column-wise, meaning the first byte of a column is stored adjacent to the last byte of the preceding column (see figure below). As a result, a cutout request cannot rely on 2-dimensional indexing; instead, an API call must be made that specifies which bytes correspond to the requested cutout. To do this, we use Astropy's new SEEK functionality to unpack and buffer through the cube file that contains the cutout until the location of the requested cutout is reached.
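To make the mapping from 2-dimensional indices to bytes concrete, here is a minimal sketch in pure Python. It assumes a small 2-dimensional image of 16-bit integers flattened column-wise as described above. The function name `cutout_byte_ranges` and the toy image are hypothetical and are not part of TESSCut or Astropy; real cubes have more dimensions and a file header that would shift every offset.

```python
import struct

def cutout_byte_ranges(nrows, itemsize, rows, cols):
    """Byte ranges (start, stop) of a cutout in a column-wise flattened 2-D array.

    rows and cols are half-open (start, stop) index pairs.
    Each column of the cutout is one contiguous run of bytes.
    """
    r0, r1 = rows
    c0, c1 = cols
    return [((c * nrows + r0) * itemsize, (c * nrows + r1) * itemsize)
            for c in range(c0, c1)]

# Toy 4 x 5 image where image[r][c] = r * 5 + c (hypothetical data).
nrows, ncols, itemsize = 4, 5, 2
image = [[r * ncols + c for c in range(ncols)] for r in range(nrows)]

# Flatten column-wise: every column is written out in full before the next one.
flat = [image[r][c] for c in range(ncols) for r in range(nrows)]
flat_bytes = struct.pack(f"<{len(flat)}h", *flat)

# Request rows 1..2 and columns 2..3, then read only those bytes.
ranges = cutout_byte_ranges(nrows, itemsize, rows=(1, 3), cols=(2, 4))
columns = [struct.unpack(f"<{(stop - start) // itemsize}h", flat_bytes[start:stop])
           for start, stop in ranges]

# The bytes come back one column at a time, so transpose to get rows.
cutout = [list(row) for row in zip(*columns)]
```

Each column of the cutout maps to a separate contiguous byte range, so a wider cutout touches more scattered regions of the file.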
This functionality comes from Astropy's integration with Filesystem Spec (fsspec), a tool that gives users access to data on remote file systems through instances created for the data storage client. Through Astropy, we use fsspec to unpack and buffer the cube file up to the location of the cutout, and generate the cutout without downloading the entire cube file. Bytes are buffered in steps of blocks. The block size is the size of the buffer, that is, the number of bytes buffered for a single request. If the pixels belonging to the requested cutout do not all fall within one block, a further request is made, and requests continue until all of the desired pixels have been retrieved.
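The effect of block size on the number of requests can be illustrated with a simplified model. The sketch below assumes that the file is divided into fixed, block-aligned chunks and that each chunk touched by the cutout costs one request. This is an assumption made for illustration; the actual buffering and caching strategy used by fsspec may differ. The function name `blocks_needed` and the byte ranges are hypothetical.

```python
def blocks_needed(byte_ranges, block_size):
    """Indices of the block-aligned chunks that overlap the given byte ranges.

    byte_ranges is a list of half-open (start, stop) byte offsets.
    """
    blocks = set()
    for start, stop in byte_ranges:
        first = start // block_size
        last = (stop - 1) // block_size  # stop is exclusive
        blocks.update(range(first, last + 1))
    return sorted(blocks)

# Hypothetical byte ranges for the pixels of one cutout.
cutout_ranges = [(18, 22), (26, 30)]

small = blocks_needed(cutout_ranges, block_size=4)    # many small requests
medium = blocks_needed(cutout_ranges, block_size=8)   # fewer requests
large = blocks_needed(cutout_ranges, block_size=16)   # one request, more unused bytes
```

In this model a larger block size reduces the number of requests but retrieves more bytes that do not belong to the cutout, while a smaller block size does the opposite.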