# Map transformations

Most of the georeferencing machinery for gridded datasets is
handled by the `Grid`

class: its capacity to handle
gridded datasets in a painless manner was one of the primary
motivations to develop Salem.

## Grids

A point on earth can be defined unambiguously in two ways:

- DATUM (lon, lat, datum)
longitudes and latitudes are angular coordinates of a point on an ellipsoid (often called “datum”)

- PROJ (x, y, projection)
x (eastings) and y (northings) are cartesian coordinates of a point in a map projection (the unit of x, y is usually meter)

Salem adds a third coordinate reference system (**crs**) to this list:

- GRID (i, j, Grid)
on a structured grid, the (x, y) coordinates are distant of a constant (dx, dy) step. The (x, y) coordinates are therefore equivalent to a new reference frame (i, j) proportional to the projection’s (x, y) frame.

Transformations between datums and projections is handled by several tools in the python ecosystem, for example GDAL or the more lightweight pyproj, which is the tool used by Salem internally [1].

The concept of Grid added by Salem is useful when transforming data between two structured datasets, or from an unstructured dataset to a structured one.

A `Grid`

is defined by a projection, a reference point in
this projection, a grid spacing and a number of grid points:

```
In [1]: import numpy as np
In [2]: import salem
In [3]: from salem import wgs84
In [4]: grid = salem.Grid(nxny=(3, 2), dxdy=(1, 1), x0y0=(0.5, 0.5), proj=wgs84)
In [5]: x, y = grid.xy_coordinates
In [6]: x
Out[6]:
array([[0.5, 1.5, 2.5],
[0.5, 1.5, 2.5]])
In [7]: y
Out[7]:
array([[0.5, 0.5, 0.5],
[1.5, 1.5, 1.5]])
```

The default is to define the grids according to the pixels center point:

```
In [8]: smap = salem.Map(grid)
In [9]: smap.set_data(np.arange(6).reshape((2, 3)))
In [10]: lon, lat = grid.ll_coordinates
In [11]: smap.set_points(lon.flatten(), lat.flatten())
In [12]: smap.visualize(addcbar=False);
```

But with the `pixel_ref`

keyword you can use another convention. For Salem,
the two conventions are identical:

```
In [13]: grid_c = salem.Grid(nxny=(3, 2), dxdy=(1, 1), x0y0=(0, 0),
....: proj=wgs84, pixel_ref='corner')
....:
In [14]: assert grid_c == grid
```

While it’s good to know how grids work, most of the time grids should be inferred directly from the data files (see also: Initializing the accessor):

```
In [15]: ds = salem.open_xr_dataset(salem.get_demo_file('himalaya.tif'))
In [16]: grid = ds.salem.grid
In [17]: grid.proj.srs
Out[17]: '+proj=longlat +datum=WGS84 +no_defs'
In [18]: grid.extent
Out[18]:
[np.float64(78.31668014000002),
np.float64(93.9416809549872),
np.float64(24.233329905463762),
np.float64(31.54166362)]
```

Grids come with several convenience functions, for example for transforming points onto the grid coordinates:

```
In [19]: grid.transform(85, 27, crs=salem.wgs84)
Out[19]: (801.4983413684229, 544.4996059728029)
```

Or for reprojecting structured data as explained below.

## Reprojecting data

### Interpolation

The standard way to reproject a gridded dataset into another one is to use the
`transform()`

method:

```
In [20]: dse = salem.open_xr_dataset(salem.get_demo_file('era_interim_tibet.nc'))
In [21]: t2_era_reproj = ds.salem.transform(dse.t2m.isel(time=0))
In [22]: t2_era_reproj.salem.quick_map();
```

This is the recommended way if the output grid (in this case, a high resolution
lon-lat grid) is of similar or finer resolution than the input grid (in this
case, reanalysis data at 0.75°). As of v0.2, three interpolation methods are
available in Salem: `nearest`

(default), `linear`

, or `spline`

:

```
In [23]: t2_era_reproj = ds.salem.transform(dse.t2m.isel(time=0), interp='spline')
In [24]: t2_era_reproj.salem.quick_map();
```

Internally, Salem uses pyproj for the coordinates transformation and scipy’s interpolation methods for the resampling. Note that reprojecting data can be computationally and memory expensive: it is generally recommended to reproject your data at the end of the processing chain if possible.

The `transform()`

method returns an object of
the same structure as the input. The only differences are the coordinates and
the grid, which are those of the arrival grid:

```
In [25]: dst = ds.salem.transform(dse)
In [26]: dst
Out[26]:
<xarray.Dataset> Size: 53MB
Dimensions: (time: 4, x: 1875, y: 877)
Coordinates:
* time (time) datetime64[ns] 32B 2012-06-01 ... 2012-06-01T18:00:00
* x (x) float64 15kB 78.32 78.33 78.34 78.35 ... 93.92 93.93 93.94
* y (y) float64 7kB 31.54 31.53 31.52 31.51 ... 24.26 24.25 24.25 24.24
Data variables:
t2m (time, y, x) float64 53MB 278.6 278.6 278.6 ... 296.2 296.2 296.2
Attributes:
pyproj_srs: +proj=longlat +datum=WGS84 +no_defs
In [27]: dst.salem.grid == ds.salem.grid
Out[27]: True
```

### Aggregation

If you need to resample higher resolution data onto a coarser grid,
`lookup_transform()`

may be the way to go. This
method gets its name from the “lookup table” it uses internally to store
the information needed for the resampling: for each
grid point in the coarser dataset, the lookup table stores the coordinates
of the high-resolution grid located below.

The default resampling method is to average all these points:

```
In [28]: dse = dse.salem.subset(corners=((77, 23), (94.5, 32.5)))
In [29]: dsl = dse.salem.lookup_transform(ds)
In [30]: dsl.data.salem.quick_map(cmap='terrain');
```

But any aggregation method is available, for example `np.std`

, or `len`

if
you want to know the number of high resolution pixels found below a coarse
grid point:

```
In [31]: dsl = dse.salem.lookup_transform(ds, method=len)
In [32]: dsl.data.salem.quick_map();
```