Map transformations

Most of the georeferencing machinery for gridded datasets is handled by the Grid class. Making it painless to work with gridded datasets was one of the primary motivations for developing Salem.

Grids

A point on earth can be defined unambiguously in two ways:

DATUM (lon, lat, datum): longitudes and latitudes are angular coordinates of a point on an ellipsoid (often called “datum”)
PROJ (x, y, projection): x (eastings) and y (northings) are cartesian coordinates of a point in a map projection (x and y are usually given in meters)

Salem adds a third coordinate reference system (crs) to this list:

GRID (i, j, Grid): on a structured grid, neighboring (x, y) coordinates are separated by a constant (dx, dy) step. The (x, y) coordinates are therefore equivalent to a new reference frame (i, j) proportional to the projection’s (x, y) frame.

Transformations between datums and projections are handled by several tools in the Python ecosystem, for example GDAL or the more lightweight pyproj, which is the tool used by Salem internally [1].

The concept of Grid added by Salem is useful when transforming data between two structured datasets, or from an unstructured dataset to a structured one.

A Grid is defined by a projection, a reference point in this projection, a grid spacing and a number of grid points:

In [1]: import numpy as np

In [2]: import salem

In [3]: from salem import wgs84

In [4]: grid = salem.Grid(nxny=(3, 2), dxdy=(1, 1), x0y0=(0.5, 0.5), proj=wgs84)

In [5]: x, y = grid.xy_coordinates

In [6]: x
Out[6]: 
array([[0.5, 1.5, 2.5],
       [0.5, 1.5, 2.5]])

In [7]: y
Out[7]: 
array([[0.5, 0.5, 0.5],
       [1.5, 1.5, 1.5]])
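
To make the relation between the GRID frame (i, j) and the projection frame (x, y) concrete, the coordinates above can be reproduced by hand with x = x0 + i * dx and y = y0 + j * dy. The following is a minimal pure-Python sketch, not Salem's implementation; the helper name grid_xy_coordinates is hypothetical.

```python
def grid_xy_coordinates(nxny, dxdy, x0y0):
    """Return 2D lists (rows follow j, columns follow i) of pixel-center coordinates.

    Hypothetical helper for illustration only; it is not part of Salem.
    """
    nx, ny = nxny
    dx, dy = dxdy
    x0, y0 = x0y0
    # x only depends on the column index i, y only on the row index j
    x = [[x0 + i * dx for i in range(nx)] for _ in range(ny)]
    y = [[y0 + j * dy for _ in range(nx)] for j in range(ny)]
    return x, y


x, y = grid_xy_coordinates(nxny=(3, 2), dxdy=(1, 1), x0y0=(0.5, 0.5))
print(x)  # [[0.5, 1.5, 2.5], [0.5, 1.5, 2.5]]
print(y)  # [[0.5, 0.5, 0.5], [1.5, 1.5, 1.5]]
```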

The default is to define grids according to the pixel center points:

In [8]: smap = salem.Map(grid)

In [9]: smap.set_data(np.arange(6).reshape((2, 3)))

In [10]: lon, lat = grid.ll_coordinates

In [11]: smap.set_points(lon.flatten(), lat.flatten())

In [12]: smap.visualize(addcbar=False);

But with the pixel_ref keyword you can use another convention. For Salem, the two conventions are equivalent, so the grid below is identical to the one defined above:

In [13]: grid_c = salem.Grid(nxny=(3, 2), dxdy=(1, 1), x0y0=(0, 0),
   ....:                     proj=wgs84, pixel_ref='corner')
   ....: 

In [14]: assert grid_c == grid
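
The equality holds because a reference point given at a pixel corner and one given at the pixel center differ by half a grid step. A minimal sketch of that conversion follows; the helper name corner_to_center is hypothetical, and the sign convention for negative dy is an assumption rather than something taken from Salem's source.

```python
def corner_to_center(x0y0_corner, dxdy):
    """Shift a corner reference point by half a pixel to get the pixel center.

    Hypothetical helper for illustration only; it is not part of Salem.
    """
    dx, dy = dxdy
    return (x0y0_corner[0] + dx / 2, x0y0_corner[1] + dy / 2)


center = corner_to_center((0, 0), (1, 1))
print(center)  # (0.5, 0.5), the x0y0 used for the center-based grid above
```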

While it’s good to know how grids work, most of the time grids should be inferred directly from the data files (see also: Initializing the accessor):

In [15]: ds = salem.open_xr_dataset(salem.get_demo_file('himalaya.tif'))

In [16]: grid = ds.salem.grid

In [17]: grid.proj.srs
Out[17]: '+proj=longlat +datum=WGS84 +no_defs'

In [18]: grid.extent
Out[18]: 
[np.float64(78.31668014000002),
 np.float64(93.9416809549872),
 np.float64(24.233329905463762),
 np.float64(31.54166362)]

Grids come with several convenience functions, for example for transforming points onto the grid coordinates:

In [19]: grid.transform(85, 27, crs=salem.wgs84)
Out[19]: (801.4983413684229, 544.4996059728029)
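
Because this grid is already in lon-lat WGS84, no projection change is involved here and the result can be checked by hand: the grid indices are simply the offsets from the first pixel center divided by the grid step. The sketch below is an illustration and not Salem's code. It assumes that the extent lists the outer pixel corners as [left, right, bottom, top], that the first pixel is the upper-left one, and that the raster has 1875 x 877 pixels (the x and y dimensions reported for this grid further down the page). The helper name to_grid_indices is hypothetical.

```python
def to_grid_indices(x, y, extent, nxny):
    """Convert projection coordinates to fractional (i, j) grid indices.

    Hypothetical helper for illustration only; it is not part of Salem.
    """
    left, right, bottom, top = extent
    nx, ny = nxny
    dx = (right - left) / nx
    dy = (bottom - top) / ny  # negative: rows run from north to south
    # Center of the first (upper-left) pixel
    x0 = left + dx / 2
    y0 = top + dy / 2
    return (x - x0) / dx, (y - y0) / dy


extent = [78.31668014000002, 93.9416809549872,
          24.233329905463762, 31.54166362]
i, j = to_grid_indices(85, 27, extent, nxny=(1875, 877))
print(round(i, 3), round(j, 3))  # close to the values returned by grid.transform
```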

Or for reprojecting structured data as explained below.

Reprojecting data

Interpolation

The standard way to reproject a gridded dataset into another one is to use the transform() method:

In [20]: dse = salem.open_xr_dataset(salem.get_demo_file('era_interim_tibet.nc'))

In [21]: t2_era_reproj = ds.salem.transform(dse.t2m.isel(time=0))

In [22]: t2_era_reproj.salem.quick_map();

This is the recommended way if the output grid (in this case, a high resolution lon-lat grid) is of similar or finer resolution than the input grid (in this case, reanalysis data at 0.75°). As of v0.2, three interpolation methods are available in Salem: nearest (default), linear, or spline:

In [23]: t2_era_reproj = ds.salem.transform(dse.t2m.isel(time=0), interp='spline')

In [24]: t2_era_reproj.salem.quick_map();

Internally, Salem uses pyproj for the coordinate transformations and scipy’s interpolation methods for the resampling. Note that reprojecting data can be expensive in both computation time and memory: it is generally recommended to reproject your data at the end of the processing chain if possible.
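
To illustrate how the nearest and linear options differ, here is a hand-written sketch that samples a small array at a fractional grid position. It is not Salem's or scipy's implementation, and the helper names sample_nearest and sample_linear are hypothetical.

```python
import math


def sample_nearest(data, i, j):
    """Return the value of the pixel whose center is closest to (i, j)."""
    return data[math.floor(j + 0.5)][math.floor(i + 0.5)]


def sample_linear(data, i, j):
    """Bilinear interpolation between the four pixels surrounding (i, j)."""
    nrows, ncols = len(data), len(data[0])
    i0, j0 = math.floor(i), math.floor(j)
    # Clamp the upper neighbours so that positions on the last row/column work
    i1, j1 = min(i0 + 1, ncols - 1), min(j0 + 1, nrows - 1)
    wi, wj = i - i0, j - j0
    upper = data[j0][i0] * (1 - wi) + data[j0][i1] * wi
    lower = data[j1][i0] * (1 - wi) + data[j1][i1] * wi
    return upper * (1 - wj) + lower * wj


data = [[0, 1, 2],
        [3, 4, 5]]
nearest = sample_nearest(data, 0.25, 0.25)
linear = sample_linear(data, 0.25, 0.25)
print(nearest, linear)  # 0 1.0
```

Nearest-neighbour sampling keeps the original values (and the blocky look of the coarse grid), while linear interpolation blends neighbouring pixels into a smoother field.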

The transform() method returns an object of the same structure as the input. The only differences are the coordinates and the grid, which are those of the arrival grid:

In [25]: dst = ds.salem.transform(dse)

In [26]: dst
Out[26]: 
<xarray.Dataset> Size: 53MB
Dimensions:  (time: 4, x: 1875, y: 877)
Coordinates:
  * time     (time) datetime64[ns] 32B 2012-06-01 ... 2012-06-01T18:00:00
  * x        (x) float64 15kB 78.32 78.33 78.34 78.35 ... 93.92 93.93 93.94
  * y        (y) float64 7kB 31.54 31.53 31.52 31.51 ... 24.26 24.25 24.25 24.24
Data variables:
    t2m      (time, y, x) float64 53MB 278.6 278.6 278.6 ... 296.2 296.2 296.2
Attributes:
    pyproj_srs:  +proj=longlat +datum=WGS84 +no_defs

In [27]: dst.salem.grid == ds.salem.grid
Out[27]: True

Aggregation

If you need to resample higher resolution data onto a coarser grid, lookup_transform() may be the way to go. This method gets its name from the “lookup table” it uses internally to store the information needed for the resampling: for each grid point in the coarser dataset, the lookup table stores the coordinates of the high-resolution grid points located below it.

The default resampling method is to average all these points:

In [28]: dse = dse.salem.subset(corners=((77, 23), (94.5, 32.5)))

In [29]: dsl = dse.salem.lookup_transform(ds)

In [30]: dsl.data.salem.quick_map(cmap='terrain');

But any aggregation method is available, for example np.std, or len if you want to know the number of high resolution pixels found below a coarse grid point:

In [31]: dsl = dse.salem.lookup_transform(ds, method=len)

In [32]: dsl.data.salem.quick_map();
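
The lookup-table idea can be sketched in a few lines of pure Python: each high-resolution pixel is assigned to the coarse cell it falls into, and an aggregation function is then applied per cell. This is an illustration under simplifying assumptions (both grids share the same projection and are described by pixel centers), not Salem's implementation; the helper names build_lookup and aggregate are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def build_lookup(fine_x, fine_y, coarse_x0y0, coarse_dxdy):
    """Map each coarse cell (j, i) to the (row, col) of the fine pixels below it."""
    cx0, cy0 = coarse_x0y0  # center of the first coarse cell
    cdx, cdy = coarse_dxdy
    lookup = defaultdict(list)
    for row, y in enumerate(fine_y):
        for col, x in enumerate(fine_x):
            # Index of the coarse cell whose center is nearest to (x, y)
            i = math.floor((x - cx0) / cdx + 0.5)
            j = math.floor((y - cy0) / cdy + 0.5)
            lookup[(j, i)].append((row, col))
    return lookup


def aggregate(fine, lookup, method=mean):
    """Apply `method` to the fine values collected for each coarse cell."""
    return {cell: method([fine[r][c] for r, c in pixels])
            for cell, pixels in lookup.items()}


# 4 x 4 fine grid (step 0.5) resampled onto a 2 x 2 coarse grid (step 1)
fine_centers = [0.25, 0.75, 1.25, 1.75]
fine = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
lookup = build_lookup(fine_centers, fine_centers, (0.5, 0.5), (1, 1))
means = aggregate(fine, lookup)               # default: average
counts = aggregate(fine, lookup, method=len)  # number of fine pixels per cell
print(means)
print(counts)
```

Building the table once and reusing it for several aggregation functions mirrors the way the method argument is swapped in the examples above.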