diff --git a/lectures/lesson-15-xarray.qmd b/lectures/lesson-15-xarray.qmd
index 6400497..89dcab5 100644
--- a/lectures/lesson-15-xarray.qmd
+++ b/lectures/lesson-15-xarray.qmd
@@ -236,26 +236,64 @@ temp
 
 At this point, since we have a single variable, the dataset attributes and the variable attributes are the same.
 
-### Indexing
-An `xarray.DataArray` allows both positional indexing (like `numpy`) and label-based indexing (like `pandas`).
-Positional indexing is the most basic, and it's done using Python's `[]` syntax, as in `array[i,j]` with i and j both integers.
-**Label-based indexing** takes advantage of dimensions in the array having names and coordinate values that we can use to access data instead of remembering the positional order of each dimension.
+### Subsetting
+An `xarray.DataArray` is a multi-dimensional array with labeled dimensions.
+To select data from it we need to specify which subsets along each dimension we are interested in.
+We can specify the data we need from each dimension either by relying on the dimension's positions (**positional dimension lookup**) or by calling each dimension by its name (**dimension lookup by name**).
+Let's see some examples.
 
-As an example, suppose we want to know what was the temperature recorded by the weather station located at 40°0′N 80°0′E on September 1st, 2022.
-By recalling all the information about how the array is setup with respect to the dimensions and coordinates, we can access this data positionally:
+
+
+**Example**
+
+
+
+Suppose we want to know what was the temperature recorded by the weather station located at 40°0′N 80°0′E on September 1st, 2022.
+
+When we want to rely on the position of the dimensions in the `xarray.DataArray`, we need to remember that lat is the first dimension, lon is the second, and date the third.
+
+Then, we can access the values along each dimension in two ways:
+
+- by integer: the exact same as a `np.array`. Use the locator brackets `[]` and "simply" remember that ***:
 
 ```{python}
+# access dimensions by position, then use integers for indexing
 temp[0,1,2]
 ```
 
-Or, we can use the dimensions names and their coordinates to access the same value:
+- by label: same as `pandas`. We use the `.loc[]` locator to look up a specific coordinate at each position (which represents a dimension):
+
+```{python}
+# access dimensions by position, then use labels for indexing
+temp.loc[0, 0, '2022-09-03']
+```
+
+For datasets with dozens of dimensions, it can be confusing to remember which dimensions go where.
+We can also use the dimension names to subset data, without the need to remember which dimension goes where.
+In this case, there are still two ways of selecting data along a dimension:
+
+- by integer: we specify the integer location of the data we want along each dimension:
+
+```{python}
+# access dimensions by name, then use integers for indexing
+temp.isel(time=2, lon=0, lat=1)
+```
+
+- by label: we use the coordinate values we want to get!
 
 ```{python}
+# access dimensions by name, then use labels for indexing
 temp.sel(time='2022-09-01', lat=40, lon=80)
 ```
 
 Notice that the result of this indexing is a 1x1 `xarray.DataArray`.
-This is because operations on an `xarray.DataArray` (resp. `xarray.DataSet`) always return another `xarray.DataArray` (resp. `xarray.DataSet`).
+This is because operations on an `xarray.DataArray` always return another `xarray.DataArray`.
 In particular, operations returning scalar values will also produce `xarray` objects, so we need to cast them as numbers manually.
 See [xarray.DataArray.item](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.item.html).
 
@@ -321,10 +359,8 @@ Then save the dataset using the `to_netcdf` method with your file path.
 Opening NetCDF is similarly straightforward using `xarray.open_dataset()`.
 
 ```{python}
-# specify file path - don't forget the .nc extension!
-fp = os.path.join(os.getcwd(),'temp_dataset.nc')
-# save file
-temp_dataset.to_netcdf(fp)
+# save file - don't forget the .nc extension!
+temp_dataset.to_netcdf('temp_dataset.nc')
 
 # open to check:
 check = xr.open_dataset(fp)