R as GIS, part 1: vector

[ad_1]

Working with spatial information is changing into an increasing number of frequent with the event of geoportals offering free entry to massive variety of spatial datasets. Geographic Information System software program such as ArcGis or QGis are generally used to work with these information, on this sequence of submit I need to present the GIS capacities of R.

Spatial information are available two class:

  1. vector-type: spatial information are represented as discrete parts such as factors (cities), traces (rivers) or polygons (nations).
  2. raster-type: spatial information are represented as common grids with spatial data discretized into cells / pixels. Raster information are the usual format for distant sensing, satelite information.

In this primary submit we’ll see easy methods to work with vector information in R utilizing the sf package deal.

Getting the info

First we load the wanted libraries:

library(tidyverse)
library(sf)

Reading in spatial information into R might be simply finished utilizing the st_read perform. The perform assist numerous codecs by utilizing the GDAL driver within the background. We will use an instance dataset from the Flemish area of Belgium, downloading a zipper file with all of the shapefiles, unzipping it and loading it into R:

# outline a goal listing in your laptop computer
target_dir <- "./"
# put this as the working listing
setwd(target_dir)
# obtain the information
obtain.file("https://downloadagiv.blob.core.windows.net/referentiebestand-gemeenten/VoorlopigRefBestandGemeentegrenzen_2016-01-29/Voorlopig_referentiebestand_gemeentegrenzen_toestand_29_01_2016_GewVLA_Shape.zip", destfile = "municipality.zip")
# unzip it
unzip(zipfile = "municipality.zip")
# learn within the Flemish province shapefile
province <- st_read("Shapefile/Refprv.shp")
## Reading layer `Refprv' from information supply `D:DocumentsProgramming_stufflionel68.github.io_posts_datar_as_gisShapefileRefprv.shp' utilizing driver `ESRI Shapefile'
## Simple function assortment with 5 options and eight fields
## geometry kind:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 21991.38 ymin: 153049.Four xmax: 258878.5 ymax: 244027.1
## projected CRS:  Belge 1972 / Belgian Lambert 72

The perform tells us that te dataset is fabricated from 5 options (rows) every having Eight fields (columns). We additionally get some infos on the bounding field and the Coordinate Reference System (CRS).

The perform returns an object of sophistication sf that appears like an information.body:

province
## Simple function assortment with 5 options and eight fields
## geometry kind:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 21991.38 ymin: 153049.Four xmax: 258878.5 ymax: 244027.1
## projected CRS:  Belge 1972 / Belgian Lambert 72
##   UIDN OIDN TERRID            NAAM NISCODE NUTS2   LENGTE    OPPERVL
## 1    6    2    357       Antwerpen   10000  BE21 409906.2 2876444170
## 2    7    4    359  Vlaams Brabant   20001  BE24 507999.1 2118893799
## 3    9    5    356 Oost-Vlaanderen   40000  BE23 404985.3 3008166146
## 4   10    1    355         Limburg   70000  BE22 397732.6 2428024237
## 5   11    3    351 West-Vlaanderen   30000  BE25 356653.0 3197075771
##                         geometry
## 1 MULTIPOLYGON (((178133.9 24...
## 2 MULTIPOLYGON (((200484.9 19...
## Three MULTIPOLYGON (((142473.9 22...
## Four MULTIPOLYGON (((231635.6 21...
## 5 MULTIPOLYGON (((80189.16 22...

The necessary distinction with a normal information.body is the “geometry” column that incorporates the vector informations utilizing the well-known textual content illustration. This column is sticky, it should keep within the R object except explicitely dropped (verify ?st_drop_geometry).

Manipulating sf object

The dplyr capabilities can be utilized to control sf objects, as an illustration we’ll rename the “NAAM” column, create a brand new “area” column and kind the rows by the realm descending:

province %>%
  rename(identify = NAAM) %>%
  mutate(space = OPPERVL / 1e6) %>%
  choose(identify, space) %>%
  prepare(desc(space)) -> province

Coordinate Reference System

Every spatial information comes with a Coordinate reference system (CRS briefly) that describes how the info are represented on the floor of the earth. An in-depth definition of CRS is past the scope of this submit, right here we’ll simply see easy methods to get CRS data from sf object and easy methods to rework them into new reference system.

CRS are outlined in sf by their epsg code, as an illustration the classical GPS/WGS84 CRS has the 4326 code, let’s discover this:

# get CRS
st_crs(province)
## Coordinate Reference System:
##   User enter: Belge 1972 / Belgian Lambert 72 
##   wkt:
## PROJCRS["Belge 1972 / Belgian Lambert 72",
##     BASEGEOGCRS["Belge 1972",
##         DATUM["Reseau National Belge 1972",
##             ELLIPSOID["International 1924",6378388,297,
##                 LENGTHUNIT["metre",1]],
##             ID["EPSG",6313]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["Degree",0.0174532925199433]]],
##     CONVERSION["unnamed",
##         METHOD["Lambert Conic Conformal (2SP)",
##             ID["EPSG",9802]],
##         PARAMETER["Latitude of false origin",90,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8821]],
##         PARAMETER["Longitude of false origin",4.36748666666667,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8822]],
##         PARAMETER["Latitude of 1st standard parallel",49.8333339,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8823]],
##         PARAMETER["Latitude of 2nd standard parallel",51.1666672333333,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8824]],
##         PARAMETER["Easting at false origin",150000.01256,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8826]],
##         PARAMETER["Northing at false origin",5400088.4378,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8827]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]
# re-project into the European CRS
province <- st_transform(province, 3035)

When working with spatial information it’s key to be sure that the CRS are enough, as an illustration computing areas or distances make extra sense in projected CRS somewhat than in geographic CRS. Also, when doing spatial operations on a number of spatial objects all of the CRS need to be an identical.

Plotting spatial information

sf objects might be, very convinently, plotted utilizing ggplot2:

# a easy plot of the province
# coloured by their floor
ggplot() +
  geom_sf(information = province, aes(fill = space)) +
  geom_sf_label(information = province, aes(label = identify)) +
  scale_fill_continuous(kind = "viridis",
                        identify = "Area [km²]") +
  labs(title = "Flemish provinces",
       x = "",
       y = "") +
  theme(panel.background = element_blank(),
        axis.textual content = element_blank(),
        axis.ticks = element_blank())

Geometrical operations

Several kinds of geometrical operations are applied in sf, we’ll right here discover a few of them, you may take a look at this vignette for the complete listing.

Unary operations returning a geometry

These operations take a single sf object and return geometries, these embody getting a buffer round geometries or the centroid of polygons. We will discover this with a dataset of the Flemish municipalities:

# load the municipality shapefile
municipality <- st_read("Shapefile/Refgem.shp")
## Reading layer `Refgem' from information supply `D:DocumentsProgramming_stufflionel68.github.io_posts_datar_as_gisShapefileRefgem.shp' utilizing driver `ESRI Shapefile'
## Simple function assortment with 308 options and 9 fields
## geometry kind:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 21991.38 ymin: 153049.Four xmax: 258878.5 ymax: 244027.1
## projected CRS:  Belge 1972 / Belgian Lambert 72
# re-project municipality information
municipality <- st_transform(municipality,
                                 st_crs(province))

# make a buffer of 5000m across the metropolis of Antwerpen
ant_buffer <- st_buffer(municipality[municipality$NAAM == "Antwerpen",], dist = 5000)

# get the centroids of the municipalities
municipality_cent <- st_centroid(municipality)
## Warning in st_centroid.sf(municipality): st_centroid assumes attributes are
## fixed over geometries of x
# plot the completely different objects
ggplot() +
  geom_sf(information = municipality) +
  geom_sf(information = ant_buffer, alpha = 0.5, fill = "orange") +
  geom_sf(information = municipality_cent, colour = "green")

Binary logical operations

These geometrical operation take two geometries and return logical data, as an illustration we will verify which of town centroids are throughout the 5km buffer of town of Antwerpen

st_within(municipality_cent, ant_buffer)
## Sparse geometry binary predicate listing of size 308, the place the predicate was `inside'
## first 10 parts:
##  1: (empty)
##  2: (empty)
##  3: (empty)
##  4: (empty)
##  5: 1
##  6: (empty)
##  7: (empty)
##  8: (empty)
##  9: (empty)
##  10: (empty)

The perform returns a listing the place every aspect characterize the centroid of one of many 308 municipalities, an “(empty)” worth implies that the centroid was not throughout the buffer whereas a worth of “1” point out that it was throughout the buffer. We also can get this output as a logical vector/matrix:

st_within(municipality_cent, ant_buffer, sparse = FALSE)[1:10,]
##  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE

This can be utilized to filter the municipality dataset to solely hold the cities whose centroid are throughout the buffer:

municipality %>%
  filter(st_within(municipality_cent, ant_buffer, sparse = FALSE))
## Simple function assortment with 20 options and 9 fields
## geometry kind:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 3912907 ymin: 3123760 xmax: 3941971 ymax: 3156259
## projected CRS:  ETRS89-extended / LAEA Europe
## First 10 options:
##    UIDN OIDN TERRID NISCODE       NAAM  DATPUBLBS      NUMAC   LENGTE
## 1   313    5    140   11001 Aartselaar 1982-12-29 1982001920 22122.90
## 2   370   62    124   11029    Mortsel 1982-12-29 1982001920 13450.48
## 3   379   71     85   11040    Schoten 1831-02-07       <NA> 28983.56
## 4   399   91    122   11004   Boechout 1976-01-23 1975123003 26307.31
## 5   400   92    117   11007   Borsbeek 1831-02-07       <NA> 10035.45
## 6   403   95    142   11024    Kontich 1976-01-23 1975123003 29080.36
## 7   418  110     74   11023   Kapellen 1982-07-17 1982001074 29310.94
## 8   482  174     75   11044   Stabroek 1988-09-08 1988000266 21599.21
## 9   490  182    120   46013   Kruibeke 1982-12-29 1982001920 27536.07
## 10  503  195     71   11002  Antwerpen 1988-09-08 1988000266 99479.14
##      OPPERVL                       geometry
## 1   11025465 MULTIPOLYGON (((3929668 313...
## 2    7785306 MULTIPOLYGON (((3933399 313...
## 3   29513750 MULTIPOLYGON (((3941284 314...
## 4   20712116 MULTIPOLYGON (((3938920 313...
## 5    3902970 MULTIPOLYGON (((3935869 313...
## 6   23824644 MULTIPOLYGON (((3930045 313...
## 7   37238543 MULTIPOLYGON (((3932909 315...
## 8   21536451 MULTIPOLYGON (((3929829 315...
## 9   33587947 MULTIPOLYGON (((3924151 313...
## 10 204283071 MULTIPOLYGON (((3926991 315...

All of those operations comply with the identical logic, st_operation(A, B) checks for every combos of the geometries in A and B whether or not A operation B is true or false. For occasion st_within(A, B) checks whether or not the geometries in A are inside B, that is just like st_contains(B, A), the distinction between the 2 being the form of the returned object. If A has n geometries and B has m, st_contains(B, A) returns a listing of size m the place every parts incorporates the row IDs (numbers between 1 and n) of the geometries in A satisfying the operation. By utilizing sparse=FALSE the capabilities returns matrices, like st_within(A, B, sparse=FALSE) returns a n x m matrix, st_within(B, A, sparse=FALSE) returns a m x n matrix. Note that working st_operation(A, A) checks the operation between all geometries of the item, so returning a n x n matrix.

There are numerous such operations applied in sf, you may verify ?st_intersects for a listing of choices. These capabilities work for any kind of geometries (factors, traces, polygons), you may verify this determine for a graphical illustration of a few of the operations.

Another instance of such binary operations can be to rely what number of municipalities are inside every province:

mat <- st_contains(province, municipality_cent, sparse = FALSE)
province$municipality_count <- apply(mat, 1, sum)
province
## Simple function assortment with 5 options and three fields
## geometry kind:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 3799513 ymin: 3074638 xmax: 4032508 ymax: 3167919
## projected CRS:  ETRS89-extended / LAEA Europe
##              identify     space                       geometry
## 1 West-Vlaanderen 3197.076 MULTIPOLYGON (((3859766 316...
## 2 Oost-Vlaanderen 3008.166 MULTIPOLYGON (((3921616 315...
## 3       Antwerpen 2876.444 MULTIPOLYGON (((3958497 316...
## 4         Limburg 2428.024 MULTIPOLYGON (((4009899 313...
## 5  Vlaams Brabant 2118.894 MULTIPOLYGON (((3976903 311...
##   municipality_count
## 1                 64
## 2                 65
## 3                 69
## 4                 44
## 5                 65

Binary operation returning a geometry

These operations prolong the capabilities seen within the earlier part by returning the corresponding geometries, as an illustration if we need to compute the areas of the intersection of the 5km buffer round Antwerpen with the municipalities:

municipality %>%
  st_intersection(., st_geometry(ant_buffer)) %>%
  mutate(space = st_area(.)) %>%
  mutate(prop_area = as.numeric(space / OPPERVL)) %>%
  prepare(prop_area) -> municipality_inter
## Warning: attribute variables are assumed to be spatially fixed
## all through all geometries
municipality_inter
## Simple function assortment with 29 options and 11 fields
## geometry kind:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 3916863 ymin: 3124856 xmax: 3942025 ymax: 3157429
## projected CRS:  ETRS89-extended / LAEA Europe
## First 10 options:
##    UIDN OIDN TERRID NISCODE      NAAM  DATPUBLBS      NUMAC   LENGTE
## 1   352   44    133   12021      Lier 1982-12-29 1982001920 54049.33
## 2   487  179    128   46025     Temse 1982-12-29 1982001920 36505.16
## 3   554  246    154   12007    Bornem 1976-01-23 1975123003 34757.33
## 4   499  191     64   11022 Kalmthout 1982-12-29 1982001920 44002.17
## 5   589  281    146   11025      Lint 1869-06-30       <NA> 13603.12
## 6   358   50    104   11035     Ranst 1982-12-29 1982001920 32714.08
## 7   392   84     88   11039   Schilde 1982-12-29 1982001920 33382.47
## 8   443  135    163   11005      Boom 1831-02-07       <NA> 12224.58
## 9   387   79    155   11037     Rumst 1982-12-29 1982001920 29597.17
## 10  592  284     76   46003   Beveren 1982-12-29 1982001920 63007.70
##      OPPERVL              space  prop_area                       geometry
## 1   49856712    69998.32 [m^2] 0.00140399 POLYGON ((3938357 3130414, ...
## 2   40111036   478231.03 [m^2] 0.01192268 MULTIPOLYGON (((3920096 313...
## 3   46197971   898380.83 [m^2] 0.01944633 POLYGON ((3922951 3128162, ...
## 4   59441394  1430014.45 [m^2] 0.02405755 POLYGON ((3934127 3153756, ...
## 5    5627892   319806.05 [m^2] 0.05682520 POLYGON ((3934665 3129181, ...
## 6   43665536  5457104.05 [m^2] 0.12497509 POLYGON ((3940083 3133253, ...
## 7   36095176  6511826.42 [m^2] 0.18040711 POLYGON ((3942004 3137363, ...
## 8    7199506  1427152.64 [m^2] 0.19822923 POLYGON ((3926726 3125739, ...
## 9   20142693  5249294.65 [m^2] 0.26060541 POLYGON ((3929559 3126868, ...
## 10 153050589 84895066.28 [m^2] 0.55468631 POLYGON ((3921616 3153206, ...

The necessary distinction with these perform is that the geometry column returned correspond to not the geometry column of the inputted objects however to the geometry of the operation itself. The completely different operations obtainable might be discovered by checking ?st_intersection. An necessary level to think about is that the attributes (the columns) of the inputted objects are handed to the outcomes of the operation assuming that the attributes values stays fixed. For occasion, within the instance above the column “OPPERVL” characterize the realm of the municipalities and is handed on the intersections with no modifications.

We can verify the returned geometries:

ggplot() +
  geom_sf(information = subset(municipality, NAAM %in% municipality_inter$NAAM)) +
  geom_sf(information = municipality_inter, aes(fill = prop_area), alpha = 0.3)

Joins (spatial and non-spatial)

With sf objects various kinds of joins might be carried out, (i) on the attributes (columns) and (ii) on the geometries (spatial).

First we will see joins primarily based on attributes including inhabitants information to the municipality dataset:

# load inhabitants file
inhabitants <- learn.csv("https://raw.githubusercontent.com/lionel68/lionel68.github.io/master/_posts_data/r_as_gis/population_flanders.csv")

# be a part of to the municipality
municipality %>%
  left_join(inhabitants) -> municipality_pop
## Joining, by = "NAAM"

For this one can use all of the various kinds of joints obtainable inside tidyverse.

More fascinating on this context are spatial joins, as an illustration becoming a member of the province and municipality datasets primarily based on whether or not the municipalities geometries are throughout the province geometries:

# barely adapt names in province
names(province)[1:2] <- paste(names(province)[1:2], "province", sep="_")

# barely adapt names province
names(municipality_pop)[5] <- "name_municipality"

# joins, utilizing municipality inside provinces
municipality_province <- st_join(municipality_pop[,c(5, 10)], 
                                 province, be a part of = st_within)

# plot the outcomes
ggplot() +
  geom_sf(information = municipality_province, 
          aes(fill = name_province)) +
  geom_sf(information = province, colour = "red",
         alpha = 0.2)

We can use for the spatial joins any of the binary logical operations, see ?st_intersects for all of the choices. These joins are then outlined by the be a part of= argument of the perform. These joins additionally works for any kinds of vector information (level, line, polygon). Here with our examples a few of the municipalities weren’t discovered to be inside any provinces, these have been municipalities on the border of the provinces the place almost definitely the borders of the municipalities touched or possibly barely overflowed out of the provinces.

Use case

To sum up all of the issues we noticed as much as know, let’s put them in apply by taking a look at what number of inhabitants of Flanders are inside 10km of Seveso websites, websites the place harmful (chemical) substances are saved. To achieve this we’ll load a dataset containing the geometries of the registered Seveso websites, create buffers, interset this with the municipalities geometries, then assuming that inhabitants are homogeneously distributed over the municipality compute what number of inhabitants are within the completely different intersections.

Let’s go:

# load seveso information
seveso <- st_read("https://raw.githubusercontent.com/lionel68/lionel68.github.io/master/_posts_data/r_as_gis/seveso.geojson")
## Reading layer `seveso' from information supply `https://uncooked.githubusercontent.com/lionel68/lionel68.github.io/grasp/_posts_data/r_as_gis/seveso.geojson' utilizing driver `GeoJSON'
## Simple function assortment with 291 options and a pair of fields
## geometry kind:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 3809095 ymin: 3082887 xmax: 4012066 ymax: 3164945
## projected CRS:  ETRS89-extended / LAEA Europe
# create 10km buffers
seveso_buffer <- st_buffer(seveso, 10000)
# create intersection with municipality
municipality_seveso <- st_intersection(municipality_pop, seveso_buffer)create 10km buffers
## Warning: attribute variables are assumed to be spatially fixed
## all through all geometries
# compute areas and inhabitants throughout the intersection
# and compute sums
municipality_seveso %>%
  mutate(area_intersection = as.numeric(st_area(.))) %>%
  mutate(pop_intersection = (area_intersection /
                               OPPERVL) * inhabitants) %>%
  st_drop_geometry() %>% # we drop the geometry column
  summarise(S_seveso = sum(pop_intersection, na.rm = TRUE),
            S_population = sum(inhabitants, na.rm = TRUE)) %>%
  mutate(prop = S_seveso / S_population)
##   S_seveso S_population      prop
## 1 70455290    136289905 0.5169516

Based on the simplified assumptions that municipality inhabitants is homogeneously distributed throughout the municipality space we discovered that greater than half of the inhabitants in Flanders are inside 10km of a Seveso website …

Conclusion

In this submit the most typical GIS operation on vector information have been lined utilizing the sf package deal in R. Make positive to verify the very full Geocomputation in R ebook for extra in-depth infos on this. Next time we’ll discover raster information, keep tuned!



[ad_2]

Source hyperlink

Write a comment