## R as GIS, part 1: vector

[ad_1]

Working with spatial information is changing into an increasing number of frequent with the event of geoportals offering free entry to massive variety of spatial datasets. Geographic Information System software program such as ArcGis or QGis are generally used to work with these information, on this sequence of submit I need to present the GIS capacities of R.

Spatial information are available two class:

- vector-type: spatial information are represented as discrete parts such as factors (cities), traces (rivers) or polygons (nations).
- raster-type: spatial information are represented as common grids with spatial data discretized into cells / pixels. Raster information are the usual format for distant sensing, satelite information.

In this primary submit we’ll see easy methods to work with vector information in R utilizing the sf package deal.

## Getting the info

First we load the wanted libraries:

```
library(tidyverse)
library(sf)
```

Reading in spatial information into R might be simply finished utilizing the `st_read`

perform. The perform assist numerous codecs by utilizing the GDAL driver within the background. We will use an instance dataset from the Flemish area of Belgium, downloading a zipper file with all of the shapefiles, unzipping it and loading it into R:

`# outline a goal listing in your laptop computer target_dir <- "./" # put this as the working listing setwd(target_dir) # obtain the information obtain.file("https://downloadagiv.blob.core.windows.net/referentiebestand-gemeenten/VoorlopigRefBestandGemeentegrenzen_2016-01-29/Voorlopig_referentiebestand_gemeentegrenzen_toestand_29_01_2016_GewVLA_Shape.zip", destfile = "municipality.zip") # unzip it unzip(zipfile = "municipality.zip") # learn within the Flemish province shapefile province <- st_read("Shapefile/Refprv.shp")`

## Reading layer `Refprv' from information supply `D:DocumentsProgramming_stufflionel68.github.io_posts_datar_as_gisShapefileRefprv.shp' utilizing driver `ESRI Shapefile' ## Simple function assortment with 5 options and eight fields ## geometry kind: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 21991.38 ymin: 153049.Four xmax: 258878.5 ymax: 244027.1 ## projected CRS: Belge 1972 / Belgian Lambert 72

The perform tells us that te dataset is fabricated from 5 options (rows) every having Eight fields (columns). We additionally get some infos on the bounding field and the Coordinate Reference System (CRS).

The perform returns an object of sophistication `sf`

that appears like an information.body:

`province`

## Simple function assortment with 5 options and eight fields ## geometry kind: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 21991.38 ymin: 153049.Four xmax: 258878.5 ymax: 244027.1 ## projected CRS: Belge 1972 / Belgian Lambert 72 ## UIDN OIDN TERRID NAAM NISCODE NUTS2 LENGTE OPPERVL ## 1 6 2 357 Antwerpen 10000 BE21 409906.2 2876444170 ## 2 7 4 359 Vlaams Brabant 20001 BE24 507999.1 2118893799 ## 3 9 5 356 Oost-Vlaanderen 40000 BE23 404985.3 3008166146 ## 4 10 1 355 Limburg 70000 BE22 397732.6 2428024237 ## 5 11 3 351 West-Vlaanderen 30000 BE25 356653.0 3197075771 ## geometry ## 1 MULTIPOLYGON (((178133.9 24... ## 2 MULTIPOLYGON (((200484.9 19... ## Three MULTIPOLYGON (((142473.9 22... ## Four MULTIPOLYGON (((231635.6 21... ## 5 MULTIPOLYGON (((80189.16 22...

The necessary distinction with a normal information.body is the “geometry” column that incorporates the vector informations utilizing the well-known textual content illustration. This column is sticky, it should keep within the R object except explicitely dropped (verify `?st_drop_geometry`

).

## Manipulating sf object

The `dplyr`

capabilities can be utilized to control `sf`

objects, as an illustration we’ll rename the “NAAM” column, create a brand new “area” column and kind the rows by the realm descending:

```
province %>%
rename(identify = NAAM) %>%
mutate(space = OPPERVL / 1e6) %>%
choose(identify, space) %>%
prepare(desc(space)) -> province
```

## Coordinate Reference System

Every spatial information comes with a Coordinate reference system (CRS briefly) that describes how the info are represented on the floor of the earth. An in-depth definition of CRS is past the scope of this submit, right here we’ll simply see easy methods to get CRS data from `sf`

object and easy methods to rework them into new reference system.

CRS are outlined in sf by their epsg code, as an illustration the classical GPS/WGS84 CRS has the 4326 code, let’s discover this:

`# get CRS st_crs(province)`

## Coordinate Reference System: ## User enter: Belge 1972 / Belgian Lambert 72 ## wkt: ## PROJCRS["Belge 1972 / Belgian Lambert 72", ## BASEGEOGCRS["Belge 1972", ## DATUM["Reseau National Belge 1972", ## ELLIPSOID["International 1924",6378388,297, ## LENGTHUNIT["metre",1]], ## ID["EPSG",6313]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["Degree",0.0174532925199433]]], ## CONVERSION["unnamed", ## METHOD["Lambert Conic Conformal (2SP)", ## ID["EPSG",9802]], ## PARAMETER["Latitude of false origin",90, ## ANGLEUNIT["Degree",0.0174532925199433], ## ID["EPSG",8821]], ## PARAMETER["Longitude of false origin",4.36748666666667, ## ANGLEUNIT["Degree",0.0174532925199433], ## ID["EPSG",8822]], ## PARAMETER["Latitude of 1st standard parallel",49.8333339, ## ANGLEUNIT["Degree",0.0174532925199433], ## ID["EPSG",8823]], ## PARAMETER["Latitude of 2nd standard parallel",51.1666672333333, ## ANGLEUNIT["Degree",0.0174532925199433], ## ID["EPSG",8824]], ## PARAMETER["Easting at false origin",150000.01256, ## LENGTHUNIT["metre",1], ## ID["EPSG",8826]], ## PARAMETER["Northing at false origin",5400088.4378, ## LENGTHUNIT["metre",1], ## ID["EPSG",8827]]], ## CS[Cartesian,2], ## AXIS["(E)",east, ## ORDER[1], ## LENGTHUNIT["metre",1, ## ID["EPSG",9001]]], ## AXIS["(N)",north, ## ORDER[2], ## LENGTHUNIT["metre",1, ## ID["EPSG",9001]]]]

```
# re-project into the European CRS
province <- st_transform(province, 3035)
```

When working with spatial information it’s key to be sure that the CRS are enough, as an illustration computing areas or distances make extra sense in projected CRS somewhat than in geographic CRS. Also, when doing spatial operations on a number of spatial objects all of the CRS need to be an identical.

## Plotting spatial information

`sf`

objects might be, very convinently, plotted utilizing `ggplot2`

:

```
# a easy plot of the province
# coloured by their floor
ggplot() +
geom_sf(information = province, aes(fill = space)) +
geom_sf_label(information = province, aes(label = identify)) +
scale_fill_continuous(kind = "viridis",
identify = "Area [km²]") +
labs(title = "Flemish provinces",
x = "",
y = "") +
theme(panel.background = element_blank(),
axis.textual content = element_blank(),
axis.ticks = element_blank())
```

## Geometrical operations

Several kinds of geometrical operations are applied in `sf`

, we’ll right here discover a few of them, you may take a look at this vignette for the complete listing.

### Unary operations returning a geometry

These operations take a single `sf`

object and return geometries, these embody getting a buffer round geometries or the centroid of polygons. We will discover this with a dataset of the Flemish municipalities:

`# load the municipality shapefile municipality <- st_read("Shapefile/Refgem.shp")`

## Reading layer `Refgem' from information supply `D:DocumentsProgramming_stufflionel68.github.io_posts_datar_as_gisShapefileRefgem.shp' utilizing driver `ESRI Shapefile' ## Simple function assortment with 308 options and 9 fields ## geometry kind: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 21991.38 ymin: 153049.Four xmax: 258878.5 ymax: 244027.1 ## projected CRS: Belge 1972 / Belgian Lambert 72

`# re-project municipality information municipality <- st_transform(municipality, st_crs(province)) # make a buffer of 5000m across the metropolis of Antwerpen ant_buffer <- st_buffer(municipality[municipality$NAAM == "Antwerpen",], dist = 5000) # get the centroids of the municipalities municipality_cent <- st_centroid(municipality)`

## Warning in st_centroid.sf(municipality): st_centroid assumes attributes are ## fixed over geometries of x

```
# plot the completely different objects
ggplot() +
geom_sf(information = municipality) +
geom_sf(information = ant_buffer, alpha = 0.5, fill = "orange") +
geom_sf(information = municipality_cent, colour = "green")
```

### Binary logical operations

These geometrical operation take two geometries and return logical data, as an illustration we will verify which of town centroids are throughout the 5km buffer of town of Antwerpen

`st_within(municipality_cent, ant_buffer)`

## Sparse geometry binary predicate listing of size 308, the place the predicate was `inside' ## first 10 parts: ## 1: (empty) ## 2: (empty) ## 3: (empty) ## 4: (empty) ## 5: 1 ## 6: (empty) ## 7: (empty) ## 8: (empty) ## 9: (empty) ## 10: (empty)

The perform returns a listing the place every aspect characterize the centroid of one of many 308 municipalities, an “(empty)” worth implies that the centroid was not throughout the buffer whereas a worth of “1” point out that it was throughout the buffer. We also can get this output as a logical vector/matrix:

`st_within(municipality_cent, ant_buffer, sparse = FALSE)[1:10,]`

## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE

This can be utilized to filter the municipality dataset to solely hold the cities whose centroid are throughout the buffer:

`municipality %>% filter(st_within(municipality_cent, ant_buffer, sparse = FALSE))`

## Simple function assortment with 20 options and 9 fields ## geometry kind: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 3912907 ymin: 3123760 xmax: 3941971 ymax: 3156259 ## projected CRS: ETRS89-extended / LAEA Europe ## First 10 options: ## UIDN OIDN TERRID NISCODE NAAM DATPUBLBS NUMAC LENGTE ## 1 313 5 140 11001 Aartselaar 1982-12-29 1982001920 22122.90 ## 2 370 62 124 11029 Mortsel 1982-12-29 1982001920 13450.48 ## 3 379 71 85 11040 Schoten 1831-02-07 <NA> 28983.56 ## 4 399 91 122 11004 Boechout 1976-01-23 1975123003 26307.31 ## 5 400 92 117 11007 Borsbeek 1831-02-07 <NA> 10035.45 ## 6 403 95 142 11024 Kontich 1976-01-23 1975123003 29080.36 ## 7 418 110 74 11023 Kapellen 1982-07-17 1982001074 29310.94 ## 8 482 174 75 11044 Stabroek 1988-09-08 1988000266 21599.21 ## 9 490 182 120 46013 Kruibeke 1982-12-29 1982001920 27536.07 ## 10 503 195 71 11002 Antwerpen 1988-09-08 1988000266 99479.14 ## OPPERVL geometry ## 1 11025465 MULTIPOLYGON (((3929668 313... ## 2 7785306 MULTIPOLYGON (((3933399 313... ## 3 29513750 MULTIPOLYGON (((3941284 314... ## 4 20712116 MULTIPOLYGON (((3938920 313... ## 5 3902970 MULTIPOLYGON (((3935869 313... ## 6 23824644 MULTIPOLYGON (((3930045 313... ## 7 37238543 MULTIPOLYGON (((3932909 315... ## 8 21536451 MULTIPOLYGON (((3929829 315... ## 9 33587947 MULTIPOLYGON (((3924151 313... ## 10 204283071 MULTIPOLYGON (((3926991 315...

All of those operations comply with the identical logic, st_*operation*(A, B) checks for every combos of the geometries in A and B whether or not A *operation* B is true or false. For occasion `st_within(A, B)`

checks whether or not the geometries in A are *inside* B, that is just like `st_contains(B, A)`

, the distinction between the 2 being the form of the returned object. If A has n geometries and B has m, `st_contains(B, A)`

returns a listing of size m the place every parts incorporates the row IDs (numbers between 1 and n) of the geometries in A satisfying the operation. By utilizing `sparse=FALSE`

the capabilities returns matrices, like `st_within(A, B, sparse=FALSE)`

returns a n x m matrix, `st_within(B, A, sparse=FALSE)`

returns a m x n matrix. Note that working st_*operation*(A, A) checks the operation between all geometries of the item, so returning a n x n matrix.

There are numerous such operations applied in sf, you may verify `?st_intersects`

for a listing of choices. These capabilities work for any kind of geometries (factors, traces, polygons), you may verify this determine for a graphical illustration of a few of the operations.

Another instance of such binary operations can be to rely what number of municipalities are inside every province:

`mat <- st_contains(province, municipality_cent, sparse = FALSE) province$municipality_count <- apply(mat, 1, sum) province`

## Simple function assortment with 5 options and three fields ## geometry kind: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 3799513 ymin: 3074638 xmax: 4032508 ymax: 3167919 ## projected CRS: ETRS89-extended / LAEA Europe ## identify space geometry ## 1 West-Vlaanderen 3197.076 MULTIPOLYGON (((3859766 316... ## 2 Oost-Vlaanderen 3008.166 MULTIPOLYGON (((3921616 315... ## 3 Antwerpen 2876.444 MULTIPOLYGON (((3958497 316... ## 4 Limburg 2428.024 MULTIPOLYGON (((4009899 313... ## 5 Vlaams Brabant 2118.894 MULTIPOLYGON (((3976903 311... ## municipality_count ## 1 64 ## 2 65 ## 3 69 ## 4 44 ## 5 65

### Binary operation returning a geometry

These operations prolong the capabilities seen within the earlier part by returning the corresponding geometries, as an illustration if we need to compute the areas of the intersection of the 5km buffer round Antwerpen with the municipalities:

`municipality %>% st_intersection(., st_geometry(ant_buffer)) %>% mutate(space = st_area(.)) %>% mutate(prop_area = as.numeric(space / OPPERVL)) %>% prepare(prop_area) -> municipality_inter`

## Warning: attribute variables are assumed to be spatially fixed ## all through all geometries

`municipality_inter`

## Simple function assortment with 29 options and 11 fields ## geometry kind: GEOMETRY ## dimension: XY ## bbox: xmin: 3916863 ymin: 3124856 xmax: 3942025 ymax: 3157429 ## projected CRS: ETRS89-extended / LAEA Europe ## First 10 options: ## UIDN OIDN TERRID NISCODE NAAM DATPUBLBS NUMAC LENGTE ## 1 352 44 133 12021 Lier 1982-12-29 1982001920 54049.33 ## 2 487 179 128 46025 Temse 1982-12-29 1982001920 36505.16 ## 3 554 246 154 12007 Bornem 1976-01-23 1975123003 34757.33 ## 4 499 191 64 11022 Kalmthout 1982-12-29 1982001920 44002.17 ## 5 589 281 146 11025 Lint 1869-06-30 <NA> 13603.12 ## 6 358 50 104 11035 Ranst 1982-12-29 1982001920 32714.08 ## 7 392 84 88 11039 Schilde 1982-12-29 1982001920 33382.47 ## 8 443 135 163 11005 Boom 1831-02-07 <NA> 12224.58 ## 9 387 79 155 11037 Rumst 1982-12-29 1982001920 29597.17 ## 10 592 284 76 46003 Beveren 1982-12-29 1982001920 63007.70 ## OPPERVL space prop_area geometry ## 1 49856712 69998.32 [m^2] 0.00140399 POLYGON ((3938357 3130414, ... ## 2 40111036 478231.03 [m^2] 0.01192268 MULTIPOLYGON (((3920096 313... ## 3 46197971 898380.83 [m^2] 0.01944633 POLYGON ((3922951 3128162, ... ## 4 59441394 1430014.45 [m^2] 0.02405755 POLYGON ((3934127 3153756, ... ## 5 5627892 319806.05 [m^2] 0.05682520 POLYGON ((3934665 3129181, ... ## 6 43665536 5457104.05 [m^2] 0.12497509 POLYGON ((3940083 3133253, ... ## 7 36095176 6511826.42 [m^2] 0.18040711 POLYGON ((3942004 3137363, ... ## 8 7199506 1427152.64 [m^2] 0.19822923 POLYGON ((3926726 3125739, ... ## 9 20142693 5249294.65 [m^2] 0.26060541 POLYGON ((3929559 3126868, ... ## 10 153050589 84895066.28 [m^2] 0.55468631 POLYGON ((3921616 3153206, ...

The necessary distinction with these perform is that the geometry column returned correspond to not the geometry column of the inputted objects however to the geometry of the operation itself. The completely different operations obtainable might be discovered by checking `?st_intersection`

. An necessary level to think about is that the attributes (the columns) of the inputted objects are handed to the outcomes of the operation assuming that the attributes values stays fixed. For occasion, within the instance above the column “OPPERVL” characterize the realm of the municipalities and is handed on the intersections with no modifications.

We can verify the returned geometries:

```
ggplot() +
geom_sf(information = subset(municipality, NAAM %in% municipality_inter$NAAM)) +
geom_sf(information = municipality_inter, aes(fill = prop_area), alpha = 0.3)
```

## Joins (spatial and non-spatial)

With `sf`

objects various kinds of joins might be carried out, (i) on the attributes (columns) and (ii) on the geometries (spatial).

First we will see joins primarily based on attributes including inhabitants information to the municipality dataset:

`# load inhabitants file inhabitants <- learn.csv("https://raw.githubusercontent.com/lionel68/lionel68.github.io/master/_posts_data/r_as_gis/population_flanders.csv") # be a part of to the municipality municipality %>% left_join(inhabitants) -> municipality_pop`

## Joining, by = "NAAM"

For this one can use all of the various kinds of *joints* obtainable inside tidyverse.

More fascinating on this context are spatial joins, as an illustration becoming a member of the province and municipality datasets primarily based on whether or not the municipalities geometries are throughout the province geometries:

```
# barely adapt names in province
names(province)[1:2] <- paste(names(province)[1:2], "province", sep="_")
# barely adapt names province
names(municipality_pop)[5] <- "name_municipality"
# joins, utilizing municipality inside provinces
municipality_province <- st_join(municipality_pop[,c(5, 10)],
province, be a part of = st_within)
# plot the outcomes
ggplot() +
geom_sf(information = municipality_province,
aes(fill = name_province)) +
geom_sf(information = province, colour = "red",
alpha = 0.2)
```

We can use for the spatial joins any of the binary logical operations, see `?st_intersects`

for all of the choices. These joins are then outlined by the `be a part of=`

argument of the perform. These joins additionally works for any kinds of vector information (level, line, polygon). Here with our examples a few of the municipalities weren’t discovered to be inside any provinces, these have been municipalities on the border of the provinces the place almost definitely the borders of the municipalities touched or possibly barely overflowed out of the provinces.

## Use case

To sum up all of the issues we noticed as much as know, let’s put them in apply by taking a look at what number of inhabitants of Flanders are inside 10km of Seveso websites, websites the place harmful (chemical) substances are saved. To achieve this we’ll load a dataset containing the geometries of the registered Seveso websites, create buffers, interset this with the municipalities geometries, then assuming that inhabitants are homogeneously distributed over the municipality compute what number of inhabitants are within the completely different intersections.

Let’s go:

`# load seveso information seveso <- st_read("https://raw.githubusercontent.com/lionel68/lionel68.github.io/master/_posts_data/r_as_gis/seveso.geojson")`

## Reading layer `seveso' from information supply `https://uncooked.githubusercontent.com/lionel68/lionel68.github.io/grasp/_posts_data/r_as_gis/seveso.geojson' utilizing driver `GeoJSON' ## Simple function assortment with 291 options and a pair of fields ## geometry kind: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 3809095 ymin: 3082887 xmax: 4012066 ymax: 3164945 ## projected CRS: ETRS89-extended / LAEA Europe

`# create 10km buffers seveso_buffer <- st_buffer(seveso, 10000)`

# create intersection with municipality municipality_seveso <- st_intersection(municipality_pop, seveso_buffer)create 10km buffers ## Warning: attribute variables are assumed to be spatially fixed ## all through all geometries

`# compute areas and inhabitants throughout the intersection # and compute sums municipality_seveso %>% mutate(area_intersection = as.numeric(st_area(.))) %>% mutate(pop_intersection = (area_intersection / OPPERVL) * inhabitants) %>% st_drop_geometry() %>% # we drop the geometry column summarise(S_seveso = sum(pop_intersection, na.rm = TRUE), S_population = sum(inhabitants, na.rm = TRUE)) %>% mutate(prop = S_seveso / S_population)`

## S_seveso S_population prop ## 1 70455290 136289905 0.5169516

Based on the simplified assumptions that municipality inhabitants is homogeneously distributed throughout the municipality space we discovered that greater than half of the inhabitants in Flanders are inside 10km of a Seveso website …

## Conclusion

In this submit the most typical GIS operation on vector information have been lined utilizing the `sf`

package deal in R. Make positive to verify the very full Geocomputation in R ebook for extra in-depth infos on this. Next time we’ll discover raster information, keep tuned!

[ad_2]

Source hyperlink