indexing, first pass
carmengg committed Nov 6, 2023
1 parent 7f519d0 commit aaae959
Showing 1 changed file with 48 additions and 12 deletions.
60 changes: 48 additions & 12 deletions lectures/lesson-15-xarray.qmd
@@ -236,26 +236,64 @@ temp

At this point, since we have a single variable, the dataset attributes and the variable attributes are the same.

### Indexing
An `xarray.DataArray` allows both positional indexing (like `numpy`) and label-based indexing (like `pandas`).
Positional indexing is the most basic, and it's done using Python's `[]` syntax, as in `array[i,j]` with i and j both integers.
**Label-based indexing** takes advantage of dimensions in the array having names and coordinate values that we can use to access data instead of remembering the positional order of each dimension.
### Subsetting
An `xarray.DataArray` is a multi-dimensional array with labelled dimensions.
To select data from it we need to specify which subsets along each dimension we are interested in.
We can specify the data we need from each dimension either by relying on the dimension's positions (**positional dimension lookup**) or by calling each dimension by its name (**dimension lookup by name**).
Let's see some examples.

As an example, suppose we want to know what was the temperature recorded by the weather station located at 40°0′N 80°0′E on September 1st, 2022.
By recalling all the information about how the array is setup with respect to the dimensions and coordinates, we can access this data positionally:
<!--
An `xarray.DataArray` allows both positional indexing (like `numpy`) and label-based indexing (similar to `pandas`).
- **Positional indexing** is the most basic, and it's done using Python's `[]` syntax, as in `array[i,j]` with i and j both integers.
- **Label-based indexing** takes advantage of dimensions in the array having names and coordinate values that we can use to access data instead of remembering the positional order of each dimension.
-->

**Example**

<!-- TO DO: change this 0,1,2 -->

Suppose we want to know what was the temperature recorded by the weather station located at 40°0′N 80°0′E on September 1st, 2022.

When we want to rely on the position of the dimensions in the `xarray.DataArray`, we need to remember that `lat` is the first dimension, `lon` is the second, and `time` is the third.

We can then access the values along each dimension in two ways:

- by integer: exactly as with a `np.array`. Use the locator brackets `[]` and "simply" remember that ***:

```{python}
# access dimensions by position, then use integers for indexing
temp[0,1,2]
```

Or, we can use the dimensions names and their coordinates to access the same value:
- by label: same as `pandas`. We use the `.loc[]` locator to look up a specific coordinate at each position (which represents a dimension):

```{python}
# access dimensions by position, then use labels for indexing
temp.loc[0, 0, '2022-09-03']
```

For datasets with dozens of dimensions, it can be confusing to remember which dimensions go where.
We can also use the dimension names to subset data, without needing to remember which dimension goes where.
In this case, there are still two ways of selecting data along a dimension:

- by integer: we specify the integer location of the data we want along each dimension:

```{python}
# access dimensions by name, then use integers for indexing
temp.isel(time=2, lon=0, lat=1)
```

- by label: we use the coordinate values we want to get!

```{python}
# access dimensions by name, then use labels for indexing
temp.sel(time='2022-09-01', lat=40, lon=80)
```
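
The four selection styles above can be compared side by side on a small made-up array. This is only a sketch: `toy`, its coordinate values, and its dimension order (`lat`, `lon`, `time`) are assumptions chosen for illustration, not the lesson's `temp` array.

```python
import numpy as np
import pandas as pd
import xarray as xr

# hypothetical toy array (not the lesson's `temp`): dims ordered lat, lon, time
toy = xr.DataArray(
    np.arange(12).reshape(2, 2, 3),
    dims=("lat", "lon", "time"),
    coords={
        "lat": [40, 50],
        "lon": [70, 80],
        "time": pd.date_range("2022-09-01", periods=3),
    },
)

# 1. dimensions by position, values by integer
by_pos_int = toy[0, 1, 0]
# 2. dimensions by position, values by label
by_pos_label = toy.loc[40, 80, "2022-09-01"]
# 3. dimensions by name, values by integer (keyword order does not matter)
by_name_int = toy.isel(time=0, lon=1, lat=0)
# 4. dimensions by name, values by label
by_name_label = toy.sel(time="2022-09-01", lat=40, lon=80)

# all four selections point at the same cell of the array
for selection in (by_pos_int, by_pos_label, by_name_int, by_name_label):
    print(selection.item())
```

With the name-based forms (`isel` and `sel`) the keywords can be given in any order, so they should keep working even if the array's dimensions are later reordered; the positional forms would silently pick a different cell.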

Notice that the result of this indexing is a zero-dimensional `xarray.DataArray` holding a single value, not a plain number.
This is because operations on an `xarray.DataArray` (resp. `xarray.Dataset`) always return another `xarray.DataArray` (resp. `xarray.Dataset`).
This is because operations on an `xarray.DataArray` always return another `xarray.DataArray`.
In particular, operations returning scalar values will also produce `xarray` objects, so we need to cast them as numbers manually.
See [xarray.DataArray.item](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.item.html).
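
As a minimal sketch of that cast, the snippet below selects a single cell and converts it with `.item()`. The array `toy` and its values are made up for illustration and are not part of the lesson's data.

```python
import numpy as np
import xarray as xr

# hypothetical 2x2 array of temperatures, made up for illustration
toy = xr.DataArray(
    np.array([[10.5, 11.0], [12.5, 13.0]]),
    dims=("lat", "lon"),
    coords={"lat": [40, 50], "lon": [70, 80]},
)

# selecting one cell still gives a DataArray, now with zero dimensions
selected = toy.sel(lat=50, lon=70)

# .item() extracts the value as a plain Python number
value = selected.item()

print(type(selected).__name__, selected.ndim)
print(type(value).__name__, value)
```

Note that `.item()` only makes sense when the selection holds exactly one element; on a larger array it raises an error instead of returning a number.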

@@ -321,10 +359,8 @@ Then save the dataset using the `to_netcdf` method with your file path.
Opening NetCDF is similarly straightforward using `xarray.open_dataset()`.

```{python}
# specify file path - don't forget the .nc extension!
fp = os.path.join(os.getcwd(),'temp_dataset.nc')
# save file
temp_dataset.to_netcdf(fp)
# save file - don't forget the .nc extension!
temp_dataset.to_netcdf('temp_dataset.nc')
# open to check:
check = xr.open_dataset(fp)