indexing, first pass

carmengg · Nov 6, 2023 · aaae959
1 parent 7f519d0
commit aaae959
Showing 1 changed file with 48 additions and 12 deletions.
diff --git a/lectures/lesson-15-xarray.qmd b/lectures/lesson-15-xarray.qmd
@@ -236,26 +236,64 @@ temp
 
 At this point, since we have a single variable, the dataset attributes and the variable attributes are the same. 
 
-### Indexing
-An `xarray.DataArray` allows both positional indexing (like `numpy`) and label-based indexing (like `pandas`). 
-Positional indexing is the most basic, and it's done using Python's `[]` syntax, as in `array[i,j]` with i and j both integers. 
-**Label-based indexing** takes advantage of dimensions in the array having names and coordinate values that we can use to access data instead of remembering the positional order of each dimension.
+### Subsetting
+An `xarray.DataArray` is a multi-dimensional array with labeled dimensions. 
+To select data from it we need to specify which subsets along each dimension we are interested in. 
+We can specify the data we need from each dimension either by relying on the dimensions' positions (**positional dimension lookup**) or by calling each dimension by its name (**dimension lookup by name**). 
+Let's see some examples.
 
-As an example, suppose we want to know what was the temperature recorded by the weather station located at 40°0′N 80°0′E on September 1st, 2022. 
-By recalling all the information about how the array is setup with respect to the dimensions and coordinates, we can access this data positionally:
+<!--
+An `xarray.DataArray` allows both positional indexing (like `numpy`) and label-based indexing (similar to `pandas`). 
+
+- **Positional indexing** is the most basic, and it's done using Python's `[]` syntax, as in `array[i,j]` with i and j both integers. 
+
+- **Label-based indexing** takes advantage of dimensions in the array having names and coordinate values that we can use to access data instead of remembering the positional order of each dimension.
+-->
+
+**Example** 
+
+<!-- TO DO: change this 0,1,2 -->
+
+Suppose we want to know what was the temperature recorded by the weather station located at 40°0′N 80°0′E on September 1st, 2022. 
+
+When we want to rely on the position of the dimensions in the `xarray.DataArray`, we need to remember that `lat` is the first dimension, `lon` is the second, and `time` the third. 
+
+We can then access the values along each dimension in two ways:
+
+- by integer: exactly as with a `np.array`. Use the locator brackets `[]` and "simply" remember that ***:
 
 ```{python}
+# access dimensions by position, then use integers for indexing
 temp[0,1,2]
 ```
 
-Or, we can use the dimensions names and their coordinates to access the same value:
+- by label: same as `pandas`. We use the `.loc[]` locator to look up a specific coordinate at each position (which represents a dimension):
+
+```{python}
+# access dimensions by position, then use labels for indexing
+temp.loc[0, 0, '2022-09-03']
+```
+
+For datasets with dozens of dimensions, it can be confusing to remember which dimensions go where. 
+We can also use the dimension names to subset data, without needing to remember which dimension goes where. 
+In this case, there are still two ways of selecting data along a dimension:
+
+- by integer: we specify the integer location of the data we want along each dimension:
+
+```{python}
+# access dimensions by name, then use integers for indexing
+temp.isel(time=2, lon=0, lat=1)
+```
+
+- by label: we use the coordinate values we want to get!
 
 ```{python}
+# access dimensions by name, then use labels for indexing
 temp.sel(time='2022-09-01', lat=40, lon=80)
 ```
 
 Notice that the result of this indexing is a 1x1 `xarray.DataArray`. 
-This is because operations on an `xarray.DataArray` (resp. `xarray.DataSet`) always return another `xarray.DataArray` (resp. `xarray.DataSet`). 
+This is because operations on an `xarray.DataArray` always return another `xarray.DataArray`. 
 In particular, operations returning scalar values will also produce `xarray` objects, so we need to cast them as numbers manually. 
 See [xarray.DataArray.item](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.item.html).
 
@@ -321,10 +359,8 @@ Then save the dataset using the `to_netcdf` method with your file path.
 Opening NetCDF is similarly straightforward using `xarray.open_dataset()`.
 
 ```{python}
-# specify file path - don't forget the .nc extension!
-fp = os.path.join(os.getcwd(),'temp_dataset.nc') 
-# save file
-temp_dataset.to_netcdf(fp)
+# save file - don't forget the .nc extension!
+temp_dataset.to_netcdf('temp_dataset.nc')
 
 # open to check:
 check = xr.open_dataset(fp)
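
Note on the subsetting hunk: the four selection styles it introduces can be compared side by side on a toy array. The sketch below is a minimal illustration, not the lesson's code. The dimension order `(time, lat, lon)`, the coordinate values, and the data are all assumptions made up for this example, so the integer positions here need not match the ones used in the lesson.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the lesson's `temp` array.
# Assumed dimension order: (time, lat, lon); coordinates and data are invented.
times = pd.date_range("2022-09-01", periods=3)
lats = [70, 40, 10]
lons = [60, 70, 80]
data = np.arange(27).reshape(3, 3, 3)

temp = xr.DataArray(
    data,
    dims=["time", "lat", "lon"],
    coords={"time": times, "lat": lats, "lon": lons},
)

# 1. dimensions by position, values by integer index
by_pos_int = temp[0, 1, 2]

# 2. dimensions by position, values by coordinate label
by_pos_label = temp.loc["2022-09-01", 40, 80]

# 3. dimensions by name, values by integer index
by_name_int = temp.isel(time=0, lat=1, lon=2)

# 4. dimensions by name, values by coordinate label
by_name_label = temp.sel(time="2022-09-01", lat=40, lon=80)

# Convert the selected element to a plain Python number
value = by_name_label.item()
```

Under these assumptions all four expressions pick the same element, and each returns a zero-dimensional `xarray.DataArray`. Calling `.item()` converts that result to a plain Python number.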