Setup: sample data

For more on setting up the environment and sample data, see the preparation document.

Table 0.1: Sample data
Data	File(s)	Format	Source
“Nafot”	`nafot.shp` (+7)	Shapefile	https://www.gov.il/he/Departments/Guides/info-gis
Railways	`RAIL_STRATEGIC.shp` (+7)	Shapefile	https://data.gov.il/dataset/rail_strategic
Statistical areas	`statisticalareas_demography2018.gdb`	GDB	https://www.cbs.gov.il/he/Pages/geo-layers.aspx

The data for this tutorial can be downloaded from:

https://github.com/michaeldorman/R-Spatial-Workshop-at-CBS-2021/raw/main/data.zip

A script with the R code of this document is available here:

https://github.com/michaeldorman/R-Spatial-Workshop-at-CBS-2021/raw/main/main.R

All of the materials are also available on GitHub.

Please feel free to ask questions as we go along!

1 R for Spatial Data Analysis

1.1 Software for analysis of spatial data

Software in general, and software for spatial analysis in particular, is characterized by two types of interfaces:

Graphical User Interface (GUI) (Figure 1.1)
Command Line Interface (CLI) (Figure 1.2)

In a GUI, our interaction with the computer is restricted to the predefined set of input elements, such as buttons, menus, and dialog boxes. In a CLI, we interact with the computer by writing code, which means that our instructions are practically unconstraned. In other words, with a CLI, we can give the computer specific instructions to do anything we want.

R, which we talk about today, is an example of CLI software for working with (among other things) spatial data.

Figure 1.1: QGIS, an example of Graphical User Interface (GUI) software

Figure 1.2: R, an example of Command Line Interface (CLI) software

1.2 What is R?

R is a programming language and environment, originally developed for statistical computing and graphics. Notable advantages of R are that it is a full-featured programming language, yet customized for working with data, relatively simple and has a huge collection of ~16,000 packages in the official repository from various areas of interest.

Over time, there was an increasing number of contributed packages for handling and analyzing spatial data in R. Today, spatial analysis is a major functionality in R. As of October 2020, there are ~185 packages specifically addressing spatial analysis in R, and many more are indirectly related to spatial data.

Figure 1.3: Books on Spatial Data Analysis with R

1.3 History of spatial analysis in R

Some important events in the history of spatial analysis support in R are summarized in Table 1.1.

Table 1.1: Significant events in the history of R-spatial
Year	Event
pre-2003	Variable and incomplete approaches (`MASS`, `spatstat`, `maptools`, `geoR`, `splancs`, `gstat`, …)
2003	Consensus that a package defining standard data structures should be useful; `rgdal` released on CRAN
2005	`sp` released on CRAN; `sp` support in `rgdal`
2008	Applied Spatial Data Analysis with R, 1^st ed.
2010	`raster` released on CRAN
2011	`rgeos` released on CRAN
2013	Applied Spatial Data Analysis with R, 2^nd ed.
2016	`sf` released on CRAN (Section 2.1)
2018	`stars` released on CRAN
2019	Geocomputation with R (https://geocompr.robinlovelace.net/)
2021(?)	Spatial Data Science (https://www.r-spatial.org/book/)

1.4 R as a GIS?

A question that arises, at this point, is: can R be used as a Geographic Information System (GIS), or as a comprehensive toolbox for doing spatial analysis? The answer is definitely yes. Moreover, R has some important advantages over traditional approaches to GIS, i.e., software with GUIs such as ArcGIS or QGIS.

General advantages of Command Line Interface (CLI) software include:

Automation—Doing otherwise unfeasible repetitive tasks
Reproducibility—Precise control of instructions to the computer

Moreover, specific strengths of R as a GIS are:

R capabilities in data processing, statistics, and visualization, combined with dedicated packages for spatial data
A single environment encompassing all analysis aspects—acquiring data, computation, statistics, visualization, Web, etc.

Nevertheless, there are situations when other tools are needed:

Interactive editing or georeferencing (but see mapedit package)
Unique GIS algorithms (3D analysis, label placement, splitting lines at intersections)
Data that cannot fit in RAM (but R can connect to spatial databases¹ and other softwere for working with big data)

The following sections (1.5–1.11) highlight some of the capabilities of spatial data analysis packages in R, through short examples.

1.5 `sf` and `stars`

Reading spatial layers from a file into an R data structure, or writing the R data structure into a file, are handled by external libraries:

GDAL/OGR is used for reading/writing vector and raster files, with sf and stars
PROJ is used for handling Coordinate Reference Systems (CRS), in both sf and stars

1.6 `sf`: Vector Layers

GEOS is used for geometric operations on vector layers with sf:

Numeric operators—Area, Length, Distance…
Logical operators—Contains, Within, Within distance, Crosses, Overlaps, Equals, Intersects, Disjoint, Touches…
Geometry generating operators—Centroid, Buffer (Figure 1.4), Intersection, Union, Difference, Convex-Hull, Simplification…

Figure 1.4: Buffer function

1.7 `stars`: Rasters

Geometric operations on rasters can be done with package stars:

Accessing cell values—As matrix / array, Extracting to points / lines / polygons
Raster algebra—Arithmetic (+, -, …), Math (sqrt, log10, …), logical (!, ==, >, …), summary (mean, max, …), Masking
Changing resolution and extent—Cropping, Mosaic, Resampling, Reprojection (Figure 1.5)
Transformations—Raster <-> Points / Contour lines / Polygons

Figure 1.5: Reprojection of the MODIS NDVI raster from Sinusoidal (left) to ITM (right)

1.8 `gstat`: Interpolation

Univariate and multivariate geostatistics:

Variogram modelling
Ordinary and universal point or block (co)kriging (Figure 1.6)
Cross-validation

Figure 1.6: Predicted Zinc concentration, using Ordinary Kriging

1.9 `spdep`: Spatial dependence

Modelling with spatial weights:

Building neighbor lists (Figure 1.7) and spatial weights
Tests for spatial autocorrelation for areal data (e.g. Moran’s I)
Spatial regression models (e.g. SAR, CAR)

Figure 1.7: Neighbors list based on regions with contiguous boundaries

1.10 `spatstat`: Point patterns

Techniques for statistical analysis of spatial point patterns (Figure 1.8), such as:

Kernel density estimation
Detection of clustering using Ripley’s K-function
Spatial logistic regression

Figure 1.8: Distance map for the Biological Cells point pattern dataset

1.11 `RPostgreSQL`: PostGIS

When:

working with big spatial data, and/or
when data are collaboratively prepared by many users (e.g., in a large organization),

we may want to combine R with a (spatial) database.

Package sf (Section 2.1) combined with RPostgreSQL can be used to read from, and write to, a PostGIS spatial database. First, we need to create a connection object:

library(sf)
library(RPostgreSQL)

con = dbConnect(
  PostgreSQL(),
  dbname = "gisdb",
  host = "159.89.13.241",
  port = 5432,
  user = "geobgu",
  password = "*******"
)

Then, we can read or write to the database, just like from a file, using the st_read function (Section 2.3):

st_read(con, query = "SELECT name_lat, geometry FROM plants LIMIT 3;")

## Loading required package: DBI
## Simple feature collection with 3 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 35.19337 ymin: 31.44711 xmax: 35.67976 ymax: 32.77013
## geographic CRS: WGS 84
##         name_lat                  geometry
## 1    Iris haynei POINT (35.67976 32.77013)
## 2    Iris haynei   POINT (35.654 32.74137)
## 3 Iris atrofusca POINT (35.19337 31.44711)

dbDisconnect(con)
## [1] TRUE

1.12 Other examples

Network routing: stplanr, dodgr, sfnetworks, igraph
Simplification: rmapshaper
Access to OpenStreetMap data: osmdata, osmextract
Static mapping: ggplot2, tmap, rasterVis
Web mapping and web APIs: leaflet, mapview, deckgl, mapsapi
Spherical geometry: geosphere, s2, lwgeom

2 Spatial data structures

2.1 The `sf` package

The sf package (Figure 2.1), released in 2016, is a newer package for working with vector layers in R, which we are going to use in this tutorial. In recent years, sf has become the standard package for working with vector data in R, practically replacing sp, rgdal, and rgeos.

Figure 2.1: Pebesma, 2018, The R Journal (https://journal.r-project.org/archive/2018-1/)

One of the important innovations in sf is a complete implementation of the Simple Features standard. Since 2003, Simple Features been widely implemented in spatial databases (such as PostGIS), commercial GIS (e.g., ESRI ArcGIS) and forms the vector data basis for libraries such as GDAL. The Simple Features standard defines several types of geometries, of which seven are most commonly used in the world of GIS and spatial data analysis (Figure 2.2). When working with spatial databases, Simple Features are commonly specified as Well Known Text (WKT).

Figure 2.2: Seven Simple Feature geometry types most commonly used in GIS (see also: https://r-spatial.github.io/sf/articles/sf1.html)

The sf package depends on several external software components (installed along with the R package²), most importantly GDAL, GEOS and PROJ (Figure 2.3). These well-tested and popular open-source components are common to numerous open-source and commercial software for spatial analysis, such as QGIS and PostGIS.

Figure 2.3: sf package dependencies (source: https://github.com/edzer/rstudio_conf)

2.2 `sf` data structures

Package sf defines a hierarchical class system, with three classes (Table 2.1):

Class sfg—a single geometry
Class sfc—a geometry column, which is a set of sfg geometries + CRS information
Class sf—a layer, which is an sfc geometry column inside a data.frame with non-spatial attributes

Table 2.1: Spatial data structures in package `sf`
Class	Hierarchy	Information
`sfg`	Geometry	type, coordinates
`sfc`	Geometry column	set of `sfg` + CRS
`sf`	Layer	`sfc` + attributes

The sf class represents a vector layer by extending the data.frame class, supplementing it with a geometry column. This is similar to the way that spatial databases are structured. For example, the sample dataset shown in Figure 2.4 represents a polygonal layer with three features and six non-spatial attributes. The attributes refer to demographic and epidemiological attributes of US counties, such as the number of births in 1974 (BIR74), the number of sudden infant death cases in 1974 (SID74), and so on. The seventh column is the geometry column, containing the polygon geometries.

Figure 2.4: Structure of an sf object (source: https://cran.r-project.org/web/packages/sf/vignettes/sf1.html)

Figure 2.5 shows what the layer in Figure 2.4 would look like when mapped. We can see the outline of the three polygons, as well as the values of all six non-spatial attributes (in separate panels).

$Visualization of the `sf` object shown in Figure \@ref(fig:nc-geometry-column)$

Figure 2.5: Visualization of the sf object shown in Figure 2.4

2.3 Reading vector data

We move on to practical examples. First, let’s see what importing and examining sf layers looks like, in the R environment.

Function st_read can be used to read vector layers into sf data structures. Before doing anything else we need to load the sf package:

library(sf)
## Linking to GEOS 3.9.0, GDAL 3.2.0, PROJ 7.2.0

Then, we can use st_read to import a vector layer into the R environment. In this case, we are reading the nafot.shp Shapefile:

nafot = st_read("data/nafot.shp")
## Reading layer `nafot' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/nafot.shp' using driver `ESRI Shapefile'
## Simple feature collection with 15 features and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N

Printing the object gives a summary of its properties, and the values of the (first 10) features. In this case, we see that the nafot layer has 15 features, and one attribute called name_eng:

nafot
## Simple feature collection with 15 features and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##     name_eng                       geometry
## 1      Zefat POLYGON ((739780.1 3686007,...
## 2  Jerusalem POLYGON ((672273.8 3518394,...
## 3   Kinneret POLYGON ((745560 3649860, 7...
## 4   Yizre'el POLYGON ((702283.1 3628046,...
## 5       Akko POLYGON ((702725.9 3630513,...
## 6      Golan POLYGON ((759304.4 3691202,...
## 7      Haifa POLYGON ((701391.6 3631170,...
## 8     Hadera POLYGON ((706537.1 3602188,...
## 9     Sharon POLYGON ((692687.3 3583974,...
## 10     Ramla POLYGON ((672841.5 3544808,...

The class function, applied on an sf layer, returns both "sf" and "data.frame" class names:

class(nafot)
## [1] "sf"         "data.frame"

As mentioned above, a layer (geometry+attributes) is represented by an sf object. The second printed value, data.frame, implies that sf is, in fact, an extension of the data.frame class, inheriting many of its methods. Indeed, we will see that many functions in R work exactly the same way on sf and data.frame objects.

If we want just the geometric part, it can be extracted with st_geometry, resulting in an object of class sfc, i.e., a geometry column (Table 2.1):

st_geometry(nafot)
## Geometry set for 15 features 
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
## First 5 geometries:
## POLYGON ((739780.1 3686007, 739786.6 3686007, 7...
## POLYGON ((672273.8 3518394, 672368.4 3518511, 6...
## POLYGON ((745560 3649860, 745585.8 3649853, 745...
## POLYGON ((702283.1 3628046, 702283.5 3628046, 7...
## POLYGON ((702725.9 3630513, 702724.1 3630510, 7...

Conversely, If we want just the non-geometric part (the “attributes”), it can be extracted with st_drop_geometry which returns a data.frame:

st_drop_geometry(nafot)
##       name_eng
## 1        Zefat
## 2    Jerusalem
## 3     Kinneret
## 4     Yizre'el
## 5         Akko
## 6        Golan
## 7        Haifa
## 8       Hadera
## 9       Sharon
## 10       Ramla
## 11     Rehovot
## 12    Ashqelon
## 13 Be'er Sheva
## 14 Petah Tiqwa
## 15    Tel Aviv

The plot function is a quick way to see the spatial arrangment and attribute values in an sf layer. For example, the nafot layer attribute can be plotted as follows (Figure 2.6):

plot(nafot, key.width = lcm(4))

Figure 2.6: The nafot layer

Alternatively, we can plot the geometry only, without attributes, as follows (Figure 2.7):

plot(st_geometry(nafot))

Figure 2.7: The nafot layer

2.4 Coordinate Reference Systems (CRS)

2.4.1 What are CRS?

A Coordinate Reference System (CRS) defines how the coordinates in our geometries relate to the surface of the Earth. There are two main types of CRS:

Geographic—longitude and latitude, in degrees
Projected—implying flat surface, usually in units of true distance (e.g., meters)

For example, Figure 2.8 shows the same polygonal layer (U.S. counties) in two different projection. On the left, the county layer is in the WGS84 geographic projection. Indeed, we can see that the axes are given in degrees of longitude and latitude. For example, we can see how the nothern border of U.S. follows the 49° latitude line. On the right, the same layer is displayed in the US National Atlas projection, where units are arbitrary but reflect true distance (meters). For example, the distance between every two consecutive grid lines is 1,000,000 meters or 1,000 kilometers.

Figure 2.8: US counties in WGS84 and US Atlas projections

The CRS of a given sf layer can be obtained using function st_crs. The CRS information is returned in the WKT format:

st_crs(nafot)
## Coordinate Reference System:
##   User input: WGS 84 / UTM zone 36N 
##   wkt:
## PROJCRS["WGS 84 / UTM zone 36N",
##     BASEGEOGCRS["WGS 84",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["UTM zone 36N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",33,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     ID["EPSG",32636]]

2.4.2 Vector layer reprojection

Reprojection is the transformation of geometry coordinates, from one CRS to another. It is an important part of spatial analysis workflow, since we often need to:

Transform several layers into the same projection, so that they can be displayed one on top of the other (e.g., Figure 3.3) or so that they can be subject to a spatial operator (e.g., Figure 3.10)
Switch between geographic and projected CRS

A vector layer can be reprojected with st_transform. The st_transform function has two important parameters:

x—The layer to be reprojected
crs—The target CRS

The crs can be specified in one of four ways:

An EPSG code (e.g., 4326)
A PROJ4 string (e.g., "+proj=longlat +datum=WGS84 +no_defs")
A WKT string
A crs object of another layer, as returned by st_crs

For example, the following expression reprojects the nafot layer to the geographic WGS84 CRS, using its EPSG code (4326):

nafot_wgs84 = st_transform(nafot, 4326)

Printing the layer summary demostrates that the coordinates and CRS definition have changed:

nafot_wgs84
## Simple feature collection with 15 features and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 34.26679 ymin: 29.48737 xmax: 35.89468 ymax: 33.33479
## geographic CRS: WGS 84
## First 10 features:
##     name_eng                       geometry
## 1      Zefat POLYGON ((35.57481 33.2865,...
## 2  Jerusalem POLYGON ((34.81956 31.78814...
## 3   Kinneret POLYGON ((35.6271 32.95949,...
## 4   Yizre'el POLYGON ((35.15965 32.77173...
## 5       Akko POLYGON ((35.16491 32.79389...
## 6      Golan POLYGON ((35.78575 33.32878...
## 7      Haifa POLYGON ((35.15081 32.80006...
## 8     Hadera POLYGON ((35.19932 32.53785...
## 9     Sharon POLYGON ((35.0482 32.37614,...
## 10     Ramla POLYGON ((34.83026 32.02623...

The two layers, nafot and nafot_wgs84, are shown in Figure 2.9:

plot(st_geometry(nafot), main = "UTM", axes = TRUE)
plot(st_geometry(nafot_wgs84), main = "WGS84", axes = TRUE)

Figure 2.9: Nafot in UTM and WGS84 coordinate reference systems

3 Geoprocessing functions

3.1 Reading layers into R

For the next series of examples, we read a second layer called RAIL_STRATEGIC.shp. The RAIL_STRATEGIC.shp layer contains railway lines in Israel. Using st_read, we import the layer and create an sf object called rail:

rail = st_read("data/RAIL_STRATEGIC.shp", options = "ENCODING=UTF-8")
## options:        ENCODING=UTF-8 
## Reading layer `RAIL_STRATEGIC' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/RAIL_STRATEGIC.shp' using driver `ESRI Shapefile'
## Simple feature collection with 217 features and 6 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 157822.2 ymin: 385552.2 xmax: 255113.7 ymax: 790663.5
## projected CRS:  Israel 1993 / Israeli TM Grid

(Note that the ENCODING option is only needed when operating system default encoding is not the same as the layer encoding.)

Here is a summary of the rail layer:

rail
## Simple feature collection with 217 features and 6 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 157822.2 ymin: 385552.2 xmax: 255113.7 ymax: 790663.5
## projected CRS:  Israel 1993 / Israeli TM Grid
## First 10 features:
##       UPGRADE SHAPE_LENG SHAPE_Le_1               SEGMENT ISACTIVE
## 1  שדרוג 2040  14489.401  12379.715     כפר יהושע - נשר_1     פעיל
## 2  שדרוג 2030   2159.344   2274.112    באר יעקב-ראשונים_2     פעיל
## 3  שדרוג 2040   4942.306   4942.306          נתבג - נען_3  לא פעיל
## 4  שדרוג 2040   2829.892   2793.338 לב המפרץ מזרח - נשר_4     פעיל
## 5  שדרוג 2040   1976.491   1960.171     קרית גת - להבים_5     פעיל
## 6  שדרוג 2040   1195.701   1195.701    רמלה - רמלה מזרח_6     פעיל
## 7  שדרוג 2030   4224.799   4224.799   אור עקיבא - עתלית_7  לא פעיל
## 8    חדש 2040  36452.633  36452.633      מרכז ספיר-פארן_8  לא פעיל
## 9  שדרוג 2040   7822.722   7892.334       נען - בית שמש_9     פעיל
## 10 שדרוג 2030   4610.831   4176.157       נתבג - ההגנה_10     פעיל
##                   UPGRAD                       geometry
## 1  שדרוג בין 2030 ל-2040 LINESTRING (205530.1 741563...
## 2          שדרוג עד 2030 LINESTRING (181507.6 650706...
## 3  שדרוג בין 2030 ל-2040 LINESTRING (189180.6 645433...
## 4  שדרוג בין 2030 ל-2040 LINESTRING (203482.8 744181...
## 5  שדרוג בין 2030 ל-2040 LINESTRING (178574.1 609392...
## 6  שדרוג בין 2030 ל-2040 LINESTRING (189266.6 647211...
## 7          שדרוג עד 2030 LINESTRING (193268.9 719774...
## 8  פתיחה בין 2030 ל-2040 LINESTRING (219966 509396.8...
## 9  שדרוג בין 2030 ל-2040 LINESTRING (188081.3 642530...
## 10         שדרוג עד 2030 LINESTRING (184248 657573.3...

3.2 Reprojection

For any type of spatial analysis, we usually need all input layers to be in the same CRS. For this purpose, we will reproject the rail layer to the CRS of the nafot layer:

rail = st_transform(rail, st_crs(nafot))

3.3 Basic plotting

We can plot the rail layer on its own, as shown in Figure 3.1, to examine its attributes:

plot(rail)

$The `rail` layer$

Figure 3.1: The rail layer

Or plot just one attribute, in which case there is a legened (Figure 3.2):

plot(rail["ISACTIVE"], key.width = lcm(3))

Figure 3.2: One attribute of the rail layer

We can also plot the nafot and rail geometries together, to examine their arrangement (Figure 3.3). The second expression uses add=TRUE to add the geometries on top of the existing plot, instead of initializing a new one:

plot(st_geometry(nafot), border = "grey")
plot(st_geometry(rail), col = "red", add = TRUE)

Figure 3.3: The nafot and rail geometries

3.4 Subsetting

3.4.1 Non-spatial

Non-spatial subsetting of features in an sf vector layer, e.g., according to attribute values, is done in exactly the same way as in a data.frame. For example, the following expression selects the rail features for active lines, using the logical vector rail$ISACTIVE == "פעיל" as the rows index:

rail = rail[rail$ISACTIVE == "פעיל", ]
rail
## Simple feature collection with 161 features and 6 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##       UPGRADE SHAPE_LENG SHAPE_Le_1               SEGMENT ISACTIVE
## 1  שדרוג 2040  14489.401  12379.715     כפר יהושע - נשר_1     פעיל
## 2  שדרוג 2030   2159.344   2274.112    באר יעקב-ראשונים_2     פעיל
## 4  שדרוג 2040   2829.892   2793.338 לב המפרץ מזרח - נשר_4     פעיל
## 5  שדרוג 2040   1976.491   1960.171     קרית גת - להבים_5     פעיל
## 6  שדרוג 2040   1195.701   1195.701    רמלה - רמלה מזרח_6     פעיל
## 9  שדרוג 2040   7822.722   7892.334       נען - בית שמש_9     פעיל
## 10 שדרוג 2030   4610.831   4176.157       נתבג - ההגנה_10     פעיל
## 11 שדרוג 2040   1426.854   1426.854   סוקולוב - נורדאו_11     פעיל
## 13 שדרוג 2030   8758.880   8758.880        נהריה - עכו_13     פעיל
## 14 שדרוג 2030   5102.569   5102.569    חולות-יבנה מערב_14     פעיל
##                   UPGRAD                       geometry
## 1  שדרוג בין 2030 ל-2040 LINESTRING (692568.6 362751...
## 2          שדרוג עד 2030 LINESTRING (670422.8 353618...
## 4  שדרוג בין 2030 ל-2040 LINESTRING (690467.1 363008...
## 5  שדרוג בין 2030 ל-2040 LINESTRING (668326.9 349482...
## 6  שדרוג בין 2030 ל-2040 LINESTRING (678250.9 353284...
## 9  שדרוג בין 2030 ל-2040 LINESTRING (677161.1 352814...
## 10         שדרוג עד 2030 LINESTRING (673022.5 354310...
## 11 שדרוג בין 2030 ל-2040 LINESTRING (679299.3 356088...
## 13         שדרוג עד 2030 LINESTRING (696063 3653684,...
## 14         שדרוג עד 2030 LINESTRING (664704.1 353470...

We can also subset the columns (attributes) we need. For example, in the following expressions we create an ID attribute called segment_id, and remove all other attributes:

rail$segment_id = 1:nrow(rail)
rail = rail["segment_id"]
rail
## Simple feature collection with 161 features and 1 field
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##    segment_id                       geometry
## 1           1 LINESTRING (692568.6 362751...
## 2           2 LINESTRING (670422.8 353618...
## 4           3 LINESTRING (690467.1 363008...
## 5           4 LINESTRING (668326.9 349482...
## 6           5 LINESTRING (678250.9 353284...
## 9           6 LINESTRING (677161.1 352814...
## 10          7 LINESTRING (673022.5 354310...
## 11          8 LINESTRING (679299.3 356088...
## 13          9 LINESTRING (696063 3653684,...
## 14         10 LINESTRING (664704.1 353470...

The modified layer is shown in Figure 3.4:

plot(rail)

Figure 3.4: Subset of active railway lines

3.4.2 Spatial

We can also subset features according to whether they intersect with another layer, using the latter as an index. For example, the following expression creates a subset of nafot, named nafot1, with only those features intersecting the rail layer:

nafot1 = nafot[rail, ]
nafot1
## Simple feature collection with 12 features and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 742395.3 ymax: 3665564
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##       name_eng                       geometry
## 2    Jerusalem POLYGON ((672273.8 3518394,...
## 4     Yizre'el POLYGON ((702283.1 3628046,...
## 5         Akko POLYGON ((702725.9 3630513,...
## 7        Haifa POLYGON ((701391.6 3631170,...
## 8       Hadera POLYGON ((706537.1 3602188,...
## 9       Sharon POLYGON ((692687.3 3583974,...
## 10       Ramla POLYGON ((672841.5 3544808,...
## 11     Rehovot POLYGON ((660883 3525603, 6...
## 12    Ashqelon POLYGON ((660883 3525603, 6...
## 13 Be'er Sheva POLYGON ((639165.2 3481650,...

Figure 3.5 shows the nafot1 subset (in grey fill), the complete nafot layer (in grey outline), and the railway lines layer (in red):

plot(st_geometry(nafot), border = "grey")
plot(st_geometry(nafot1), border = "grey", col = "grey90", add = TRUE)
plot(st_geometry(rail), col = "red", add = TRUE)

Figure 3.5: The nafot and rail geometries

3.5 Geometric calculations

Geometric operations on vector layers can conceptually be divided into three groups according to their output:

Numeric values (Section 3.5.1)—Functions that summarize geometrical properties of:
- A single layer—e.g., area, length
- A pair of layers—e.g., distance
Logical values (Section 3.5.2)—Functions that evaluate whether a certain condition holds true, regarding:
- A single layer—e.g., geometry is valid
- A pair of layers—e.g., feature A intersects feature B
Spatial layers (Section 3.5.3)—Functions that create a new layer based on:
- A single layer—e.g., centroid, buffer
- A pair of layers—e.g., intersection area

3.5.1 Numeric

There are several functions to calculate numeric geometric properties of vector layers in package sf:

st_length—Length
st_area—Area
st_distance—Distance
st_bbox—Bounding box
st_dimension—Dimensions (0 = points, 1 = lines, 2 = polygons)

For example, we can calculate the area of each feature in the nafot layer using st_area. The result is assigned to a new attribute named area in nafot:

nafot$area = st_area(nafot)

The nafot layer now has a new attribute named area, with the “Nafa” polygon areas:

nafot
## Simple feature collection with 15 features and 2 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##     name_eng                       geometry             area
## 1      Zefat POLYGON ((739780.1 3686007,...  660340513 [m^2]
## 2  Jerusalem POLYGON ((672273.8 3518394,...  653746831 [m^2]
## 3   Kinneret POLYGON ((745560 3649860, 7...  639422163 [m^2]
## 4   Yizre'el POLYGON ((702283.1 3628046,... 1231817404 [m^2]
## 5       Akko POLYGON ((702725.9 3630513,...  955636113 [m^2]
## 6      Golan POLYGON ((759304.4 3691202,... 1153904250 [m^2]
## 7      Haifa POLYGON ((701391.6 3631170,...  299106812 [m^2]
## 8     Hadera POLYGON ((706537.1 3602188,...  577310399 [m^2]
## 9     Sharon POLYGON ((692687.3 3583974,...  348445079 [m^2]
## 10     Ramla POLYGON ((672841.5 3544808,...  339246753 [m^2]

You may notice the area values are printed with their units [m^2]. The resulting area values, just like any other numeric calculations in sf, are returned as object of class units:

class(nafot$area)
## [1] "units"

A units object is basically a numeric vector associated with units of measurement. We can transform units measurements to different units, with function set_units from package units. First, we load the units library:

library(units)
## udunits system database from /usr/share/xml/udunits

Then, we make the transformation to $km^2$ units:

nafot$area = set_units(nafot$area, "km^2")

The modified area column can be observed in the layer summary:

nafot
## Simple feature collection with 15 features and 2 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##     name_eng                       geometry             area
## 1      Zefat POLYGON ((739780.1 3686007,...  660.3405 [km^2]
## 2  Jerusalem POLYGON ((672273.8 3518394,...  653.7468 [km^2]
## 3   Kinneret POLYGON ((745560 3649860, 7...  639.4222 [km^2]
## 4   Yizre'el POLYGON ((702283.1 3628046,... 1231.8174 [km^2]
## 5       Akko POLYGON ((702725.9 3630513,...  955.6361 [km^2]
## 6      Golan POLYGON ((759304.4 3691202,... 1153.9042 [km^2]
## 7      Haifa POLYGON ((701391.6 3631170,...  299.1068 [km^2]
## 8     Hadera POLYGON ((706537.1 3602188,...  577.3104 [km^2]
## 9     Sharon POLYGON ((692687.3 3583974,...  348.4451 [km^2]
## 10     Ramla POLYGON ((672841.5 3544808,...  339.2468 [km^2]

Figure 3.6 shows a map of the calculated area sizes:

plot(nafot["area"])

Figure 3.6: Calculated area attribute

3.5.2 Logical

Given two layers, x and y, the following logical geometric functions check whether each feature in x maintains the specified relation with each feature in y (Table tab:logical-geometric-functions).

Table 3.1: Logical geometric functions (from https://keen-swartz-3146c4.netlify.app/geommanip.html#predicates)
Function	Value	Inverse of
`st_contains`	None of the points of A are outside B	`st_within`
`st_contains_properly`	A contains B and B has no points in common with the boundary of A
`st_covers`	No points of B lie in the exterior of A	`st_covered_by`
`st_covered_by`	inverse of `st_covers`
`st_crosses`	A and B have some but not all interior points in common
`st_disjoint`	A and B have no points in common	`st_intersects`
`st_equals`	A and B are geometrically equal; node order number of nodes may differ; identical to A contains B AND A within B
`st_equals_exact`	A and B are geometrically equal, and have identical node order
`st_intersects`	A and B are not disjoint	`st_disjoint`
`st_is_within_distance`	A is closer to B than a given distance
`st_within`	None of the points of B are outside A	`st_contains`
`st_touches`	A and B have at least one boundary point in common, but no interior points
`st_overlaps`	A and B have some points in common; the dimension of these is identical to that of A and B
`st_relate`	given a pattern, returns whether A and B adhere to this pattern

When specifying sparse=FALSE the functions return a logical matrix. Each element i,j in the matrix is TRUE when f(x[i], y[j]) is TRUE.

For example, this creates a matrix of intersection relations between nafot:

int = st_intersects(nafot1, nafot1, sparse = FALSE)

int
##        [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
##  [1,]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE
##  [2,] FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [3,] FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [4,] FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [5,] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [6,] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE
##  [7,]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE
##  [8,]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
##  [9,]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
## [11,] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE
## [12,] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE

The following code section visualizes the matrix (Figure 3.7):

int1 = apply(int, 2, rev)
int1 = t(int1)
image(int1, col = c("lightgrey", "red"), asp = 1, axes = FALSE)
axis(3, at = seq(0, 1, 1/(nrow(int1)-1)), labels = nafot1$name_eng, las = 2, lwd = 0, lwd.ticks = 1, cex.axis = 0.75)
axis(2, at = seq(0, 1, 1/(nrow(int1)-1)), labels = rev(nafot1$name_eng), las = 1, pos = -0.046, lwd = 0, lwd.ticks = 1, cex.axis = 0.75)

Figure 3.7: Intersection relations between nafot features

3.5.3 Spatial

sf provides common geometry-generating functions applicable to individual geometries, such as:

st_centroid—Centroids
st_buffer—Buffer
st_sample—Random sample points
st_convex_hull—Convex hull
st_voronoi—Voronoi polygons

Figure 3.8: Geometry-generating operations on individual layers

For example, to calculate the centroids of the “Nafa” polygons we can use the st_centroid function as follows:

nafot_ctr = st_centroid(nafot)

They can be plotted as follows, the result is shown in Figure 3.9:

plot(st_geometry(nafot), border = "grey")
plot(st_geometry(nafot_ctr), col = "red", pch = 3, add = TRUE)

Figure 3.9: “Nafot” centroids

Other geometry-generating functions work on pairs of input geometries (Figure 3.10):

st_intersection
st_difference
st_sym_difference
st_union

Figure 3.10: Geometry-generating operations on pairs of layers

Now we move on to examples of two specific tasks:

Calculating line density per polygon (Section 4)
Calculating heirarchical clusters (Section 5)

4 Example #1: Rail density

4.1 Splitting lines by polygons

In this example, our goal is to calculate the average rail density ($km$ per $km^2$) per “Nafa” (Figure 4.5). To calculate total railway length per “Nafa”, we are going to dissolve the rail layer, then split it by “Nafa” polygons.

First, we “dissolve” the railway layer into a single MULTILINESTRING geometry:

rail = st_union(rail)
rail
## Geometry set for 1 feature 
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS:  WGS 84 / UTM zone 36N

The result is a geometry column (sfc), which we convert to sf so that it can accomodate “Nafa” name attribute later on:

rail = st_sf(geometry = rail)
rail
## Simple feature collection with 1 feature and 0 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS:  WGS 84 / UTM zone 36N
##                         geometry
## 1 MULTILINESTRING ((649213.3 ...

Then, we “split” the rail layer back according to “Nafot” polygons using st_intersection (Section 3.5.3):

rail = st_intersection(rail, nafot)

The result is a new line layer of 12 MULTILINESTRING geometries, including the name_eng attribute of the corresponding “Nafa” they intersect with:

rail
## Simple feature collection with 12 features and 2 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##       name_eng              area                       geometry
## 1    Jerusalem   653.7468 [km^2] MULTILINESTRING ((673559.5 ...
## 2     Yizre'el  1231.8174 [km^2] MULTILINESTRING ((697854.1 ...
## 3         Akko   955.6361 [km^2] MULTILINESTRING ((696063 36...
## 4        Haifa   299.1068 [km^2] MULTILINESTRING ((683294.1 ...
## 5       Hadera   577.3104 [km^2] MULTILINESTRING ((678559 35...
## 6       Sharon   348.4451 [km^2] MULTILINESTRING ((673053.9 ...
## 7        Ramla   339.2468 [km^2] MULTILINESTRING ((671460.5 ...
## 8      Rehovot   323.6068 [km^2] MULTILINESTRING ((658033.1 ...
## 9     Ashqelon  1270.6010 [km^2] MULTILINESTRING ((649213.3 ...
## 10 Be'er Sheva 13182.9382 [km^2] MULTILINESTRING ((649386 34...

The layer is visualized in Figure 4.1:

plot(rail["name_eng"], lwd = 3, key.width = lcm(4), reset = FALSE)
plot(st_geometry(nafot), border = "lightgrey", add = TRUE)

Figure 4.1: Intersection result

4.2 Line length

Next, we will calculate line length of each rail segment, using function st_length (Section 3.5.1). We assign the result into a railway length attribute called length:

rail$length = st_length(rail)
rail
## Simple feature collection with 12 features and 3 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##       name_eng              area                       geometry        length
## 1    Jerusalem   653.7468 [km^2] MULTILINESTRING ((673559.5 ...  60237.68 [m]
## 2     Yizre'el  1231.8174 [km^2] MULTILINESTRING ((697854.1 ...  43289.01 [m]
## 3         Akko   955.6361 [km^2] MULTILINESTRING ((696063 36...  36191.26 [m]
## 4        Haifa   299.1068 [km^2] MULTILINESTRING ((683294.1 ...  41606.83 [m]
## 5       Hadera   577.3104 [km^2] MULTILINESTRING ((678559 35...  46310.90 [m]
## 6       Sharon   348.4451 [km^2] MULTILINESTRING ((673053.9 ...  24785.27 [m]
## 7        Ramla   339.2468 [km^2] MULTILINESTRING ((671460.5 ...  82702.28 [m]
## 8      Rehovot   323.6068 [km^2] MULTILINESTRING ((658033.1 ...  47651.51 [m]
## 9     Ashqelon  1270.6010 [km^2] MULTILINESTRING ((649213.3 ...  76780.70 [m]
## 10 Be'er Sheva 13182.9382 [km^2] MULTILINESTRING ((649386 34... 119747.86 [m]

For convenience, we will also convert the values from $m$ units to $km$ units:

rail$length = set_units(rail$length, "km")
rail
## Simple feature collection with 12 features and 3 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##       name_eng              area                       geometry         length
## 1    Jerusalem   653.7468 [km^2] MULTILINESTRING ((673559.5 ...  60.23768 [km]
## 2     Yizre'el  1231.8174 [km^2] MULTILINESTRING ((697854.1 ...  43.28901 [km]
## 3         Akko   955.6361 [km^2] MULTILINESTRING ((696063 36...  36.19126 [km]
## 4        Haifa   299.1068 [km^2] MULTILINESTRING ((683294.1 ...  41.60683 [km]
## 5       Hadera   577.3104 [km^2] MULTILINESTRING ((678559 35...  46.31090 [km]
## 6       Sharon   348.4451 [km^2] MULTILINESTRING ((673053.9 ...  24.78527 [km]
## 7        Ramla   339.2468 [km^2] MULTILINESTRING ((671460.5 ...  82.70228 [km]
## 8      Rehovot   323.6068 [km^2] MULTILINESTRING ((658033.1 ...  47.65151 [km]
## 9     Ashqelon  1270.6010 [km^2] MULTILINESTRING ((649213.3 ...  76.78070 [km]
## 10 Be'er Sheva 13182.9382 [km^2] MULTILINESTRING ((649386 34... 119.74786 [km]

The lengths are visualized in Figure 4.2:

plot(rail["name_eng"], lwd = 3, key.width = lcm(4), reset = FALSE)
plot(st_geometry(nafot), border = "lightgrey", add = TRUE)
text(st_coordinates(st_centroid(rail)), as.character(round(rail$length)))

Figure 4.2: Intersection result

4.3 Join layer with table

To transfer the track lengths we just calculated to the polygonal nafot layer, we use a non-spatial join. First, we extract the attribute table of rail, using st_drop_geometry (Section 2.3):

rail = st_drop_geometry(rail)
rail
##       name_eng              area         length
## 1    Jerusalem   653.7468 [km^2]  60.23768 [km]
## 2     Yizre'el  1231.8174 [km^2]  43.28901 [km]
## 3         Akko   955.6361 [km^2]  36.19126 [km]
## 4        Haifa   299.1068 [km^2]  41.60683 [km]
## 5       Hadera   577.3104 [km^2]  46.31090 [km]
## 6       Sharon   348.4451 [km^2]  24.78527 [km]
## 7        Ramla   339.2468 [km^2]  82.70228 [km]
## 8      Rehovot   323.6068 [km^2]  47.65151 [km]
## 9     Ashqelon  1270.6010 [km^2]  76.78070 [km]
## 10 Be'er Sheva 13182.9382 [km^2] 119.74786 [km]
## 11 Petah Tiqwa   283.1575 [km^2]  35.57314 [km]
## 12    Tel Aviv   175.6073 [km^2]  33.82369 [km]

and keep just the name_eng and length columns:

rail = rail[c("name_eng", "length")]
rail
##       name_eng         length
## 1    Jerusalem  60.23768 [km]
## 2     Yizre'el  43.28901 [km]
## 3         Akko  36.19126 [km]
## 4        Haifa  41.60683 [km]
## 5       Hadera  46.31090 [km]
## 6       Sharon  24.78527 [km]
## 7        Ramla  82.70228 [km]
## 8      Rehovot  47.65151 [km]
## 9     Ashqelon  76.78070 [km]
## 10 Be'er Sheva 119.74786 [km]
## 11 Petah Tiqwa  35.57314 [km]
## 12    Tel Aviv  33.82369 [km]

Now we can join nafot and rail, by the common column "name_eng". This is done using function merge. The additional argument all.x=TRUE implies a left join:

nafot = merge(nafot, rail, by = "name_eng", all.x = TRUE)

As a result of the join operation, the nafot layer now has a new length attribute:

nafot
## Simple feature collection with 15 features and 3 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##       name_eng              area         length                       geometry
## 1         Akko   955.6361 [km^2]  36.19126 [km] POLYGON ((702725.9 3630513,...
## 2     Ashqelon  1270.6010 [km^2]  76.78070 [km] POLYGON ((660883 3525603, 6...
## 3  Be'er Sheva 13182.9382 [km^2] 119.74786 [km] POLYGON ((639165.2 3481650,...
## 4        Golan  1153.9042 [km^2]        NA [km] POLYGON ((759304.4 3691202,...
## 5       Hadera   577.3104 [km^2]  46.31090 [km] POLYGON ((706537.1 3602188,...
## 6        Haifa   299.1068 [km^2]  41.60683 [km] POLYGON ((701391.6 3631170,...
## 7    Jerusalem   653.7468 [km^2]  60.23768 [km] POLYGON ((672273.8 3518394,...
## 8     Kinneret   639.4222 [km^2]        NA [km] POLYGON ((745560 3649860, 7...
## 9  Petah Tiqwa   283.1575 [km^2]  35.57314 [km] POLYGON ((674110.3 3549169,...
## 10       Ramla   339.2468 [km^2]  82.70228 [km] POLYGON ((672841.5 3544808,...

Length of NA implies there was no matching rail feature, which means there are no railways in the “Nafa”. These NA values should be replaced with zero:

nafot$length[is.na(nafot$length)] = 0
nafot
## Simple feature collection with 15 features and 3 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##       name_eng              area         length                       geometry
## 1         Akko   955.6361 [km^2]  36.19126 [km] POLYGON ((702725.9 3630513,...
## 2     Ashqelon  1270.6010 [km^2]  76.78070 [km] POLYGON ((660883 3525603, 6...
## 3  Be'er Sheva 13182.9382 [km^2] 119.74786 [km] POLYGON ((639165.2 3481650,...
## 4        Golan  1153.9042 [km^2]   0.00000 [km] POLYGON ((759304.4 3691202,...
## 5       Hadera   577.3104 [km^2]  46.31090 [km] POLYGON ((706537.1 3602188,...
## 6        Haifa   299.1068 [km^2]  41.60683 [km] POLYGON ((701391.6 3631170,...
## 7    Jerusalem   653.7468 [km^2]  60.23768 [km] POLYGON ((672273.8 3518394,...
## 8     Kinneret   639.4222 [km^2]   0.00000 [km] POLYGON ((745560 3649860, 7...
## 9  Petah Tiqwa   283.1575 [km^2]  35.57314 [km] POLYGON ((674110.3 3549169,...
## 10       Ramla   339.2468 [km^2]  82.70228 [km] POLYGON ((672841.5 3544808,...

The calculated total lengths per “Nafa” are shown in Figure 4.3:

plot(nafot["length"])

Figure 4.3: Total railway length per Nafa

4.4 Calculating density

Now that we know the total railway length and the area, per “Nafa”, we can divide total railway length by area. This gives us railway density per “Nafa”:

nafot$density = nafot$length / nafot$area

Sorting the features lets us clearly see which “Nafa” has the highest densities of railway lines:

nafot = nafot[order(nafot$density, decreasing = TRUE), ]
nafot
## Simple feature collection with 15 features and 4 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##       name_eng             area        length                       geometry
## 10       Ramla  339.2468 [km^2] 82.70228 [km] POLYGON ((672841.5 3544808,...
## 13    Tel Aviv  175.6073 [km^2] 33.82369 [km] POLYGON ((674110.3 3549169,...
## 11     Rehovot  323.6068 [km^2] 47.65151 [km] POLYGON ((660883 3525603, 6...
## 6        Haifa  299.1068 [km^2] 41.60683 [km] POLYGON ((701391.6 3631170,...
## 9  Petah Tiqwa  283.1575 [km^2] 35.57314 [km] POLYGON ((674110.3 3549169,...
## 7    Jerusalem  653.7468 [km^2] 60.23768 [km] POLYGON ((672273.8 3518394,...
## 5       Hadera  577.3104 [km^2] 46.31090 [km] POLYGON ((706537.1 3602188,...
## 12      Sharon  348.4451 [km^2] 24.78527 [km] POLYGON ((692687.3 3583974,...
## 2     Ashqelon 1270.6010 [km^2] 76.78070 [km] POLYGON ((660883 3525603, 6...
## 1         Akko  955.6361 [km^2] 36.19126 [km] POLYGON ((702725.9 3630513,...
##              density
## 10 0.24378209 [1/km]
## 13 0.19260989 [1/km]
## 11 0.14725126 [1/km]
## 6  0.13910357 [1/km]
## 9  0.12563022 [1/km]
## 7  0.09214221 [1/km]
## 5  0.08021836 [1/km]
## 12 0.07113107 [1/km]
## 2  0.06042865 [1/km]
## 1  0.03787138 [1/km]

Plotting the layer shows the new density attribute (Figures 4.4 on its own:

plot(nafot["density"])

Figure 4.4: Railway density per “Nafa”

or along with 4.5) the other attributes:

plot(nafot)

Figure 4.5: Nafot layer with calculated attributes

5 Example #2: Population patterns

5.1 Reading statistical areas

In the second example, we are going to examine the dissimilarity between “Nafot” in terms of their population age structure. The demographic data come at the statistical area level, which we are going to aggregate to the “Nafa” level.

First, we read the statistical areas Shapefile into an object named stat:

stat = st_read("data/statisticalareas_demography2018.gdb", options = "ENCODING=UTF-8")
## options:        ENCODING=UTF-8 
## Reading layer `statisticalareas_demography2018' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/statisticalareas_demography2018.gdb' using driver `OpenFileGDB'
## Simple feature collection with 3194 features and 34 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 130175.2 ymin: 378139.7 xmax: 284068.5 ymax: 804530.3
## projected CRS:  Israel 1993 / Israeli TM Grid

5.2 Subsetting

The stat layer has numerous columns:

vars = colnames(stat)
vars
##  [1] "SEMEL_YISHUV"         "STAT11"               "YISHUV_STAT11"       
##  [4] "SHEM_YISHUV"          "SHEM_YISHUV_ENGLISH"  "Stat11_Unite"        
##  [7] "Stat11_Ref"           "Main_Function_Code"   "Main_Function_Txt"   
## [10] "Religion_yishuv_Code" "Religion_yishuv_Txt"  "Religion_Stat_Code"  
## [13] "Religion_Stat_Txt"    "Pop_Total"            "Male_Total"          
## [16] "Female_Total"         "age_0_4"              "age_5_9"             
## [19] "age_10_14"            "age_15_19"            "age_20_24"           
## [22] "age_25_29"            "age_30_34"            "age_35_39"           
## [25] "age_40_44"            "age_45_49"            "age_50_54"           
## [28] "age_55_59"            "age_60_64"            "age_65_69"           
## [31] "age_70_74"            "age_75_up"            "SHAPE_Length"        
## [34] "SHAPE_Area"           "SHAPE"

In our case, we are only interested in the population estimates per age category, i.e., all variables that start with "age_":

vars = vars[grepl("age_", vars)]
vars
##  [1] "age_0_4"   "age_5_9"   "age_10_14" "age_15_19" "age_20_24" "age_25_29"
##  [7] "age_30_34" "age_35_39" "age_40_44" "age_45_49" "age_50_54" "age_55_59"
## [13] "age_60_64" "age_65_69" "age_70_74" "age_75_up"
stat = stat[vars]

The resulting subset of the stat layer, now with just 16 attributes, is shown in Figure 5.1.

plot(stat, max.plot = 16)

Figure 5.1: Population estimates in the stat layer

5.3 Filling missing data

One thing we may notice in Figure 5.1 (or printing stat in the console) is that many of the statistical areas have NA values, meant to represent zero (rather than unknown) population size (e.g., in open areas). For all practical purposes, these NA values should be replaced with 0 values:

stat[is.na(stat)] = 0

The modified stat layer is shown in Figure 5.2:

plot(stat, max.plot = 16)

Figure 5.2: Population estimates in the stat layer

5.4 Reprojecting

Next, we are going to calculate the average age structure per “Nafa” (nafot), based on the data from statistical areas (stat). First of all, we need to make sure both layers are in the same CRS (Section 2.4.2):

stat = st_transform(stat, st_crs(nafot))

Figure 5.3 shows one of the attributes in stat (colored polygons with narrow borders) with the nafot layer on top (transparent polygons with thick black bordwer), demonstrating that the layers are aligned:

plot(stat["age_10_14"], pal = hcl.colors(12, "Reds", rev = TRUE), border = "black", lwd = 0.07, reset = FALSE)
plot(st_geometry(nafot), lwd = 0.5, add = TRUE)

Figure 5.3: The age_10_14 attribute in stat, and the nafot layer

5.5 Area weighted sum

Now, in theory, we are ready to calculate the area-weighted sum of population per “Nafa”, using function st_interpolate_aw. The function requires three parameters:

x—Input layer (with values to transfer)
to—Target geometries (where to transfer values)
extensive—Does the layer contains spatially extensive variables, which are summed, such as population (TRUE), or spatially intensive variables which are averaged, such as population density (FALSE)?

In our case, the appropriate expression is:

st_interpolate_aw(stat, nafot, extensive = TRUE)

5.6 Standardizing geometries

However, the operation fails, since the layer contains unsapported geometry types:

x = st_interpolate_aw(stat, nafot, extensive = TRUE)
## Error in CPL_geos_op2(op, x, y): Evaluation error: ParseException: Unknown WKB type 12.

namely, MULTISURFACE geometries (detected using st_geometry_type):

as.data.frame(table(st_geometry_type(stat)))
##                  Var1 Freq
## 1            GEOMETRY    0
## 2               POINT    0
## 3          LINESTRING    0
## 4             POLYGON    0
## 5          MULTIPOINT    0
## 6     MULTILINESTRING    0
## 7        MULTIPOLYGON 3068
## 8  GEOMETRYCOLLECTION    0
## 9      CIRCULARSTRING    0
## 10      COMPOUNDCURVE    0
## 11       CURVEPOLYGON    0
## 12         MULTICURVE    0
## 13       MULTISURFACE  126
## 14              CURVE    0
## 15            SURFACE    0
## 16  POLYHEDRALSURFACE    0
## 17                TIN    0
## 18           TRIANGLE    0

We can “cast” the latter to MULTIPOLYGON, as follows, to avoid the unsupported geometry type:

stat = st_cast(stat, "MULTIPOLYGON")

Now, all 3194 features in stat are of type MULTIPOLYGON:

as.data.frame(table(st_geometry_type(stat)))
##                  Var1 Freq
## 1            GEOMETRY    0
## 2               POINT    0
## 3          LINESTRING    0
## 4             POLYGON    0
## 5          MULTIPOINT    0
## 6     MULTILINESTRING    0
## 7        MULTIPOLYGON 3194
## 8  GEOMETRYCOLLECTION    0
## 9      CIRCULARSTRING    0
## 10      COMPOUNDCURVE    0
## 11       CURVEPOLYGON    0
## 12         MULTICURVE    0
## 13       MULTISURFACE    0
## 14              CURVE    0
## 15            SURFACE    0
## 16  POLYHEDRALSURFACE    0
## 17                TIN    0
## 18           TRIANGLE    0

It may also be necessary to make a topological correction, depending on the version of sf and the linked GDAL and GEOS software:

stat = st_make_valid(stat)

5.7 Area-weighed interpolation

The following expression calculates the area-weighted average population properties per “Nafa”:

x = st_interpolate_aw(stat, nafot, extensive = TRUE)

The resulting layer x has the nafot geometries, along with summed population estimates from stat. It is shown in Figure 5.4:

plot(x, max.plot = 16)

Figure 5.4: Area-weighted counts per age group, by “Nafa”

5.8 Calculating proportions

Next, we are going to cluster the “Nafot” according to similarity in their age structure. First, we extract the attribute table of x with the age group estimates per “Nafa”:

dat = st_drop_geometry(x)
dat[, 1:5]
##       age_0_4    age_5_9  age_10_14  age_15_19 age_20_24
## 1   34308.157  35480.099  32369.329  29095.881 24620.472
## 2  127130.244 109620.934  91668.354  82691.547 80979.798
## 3   52838.467  51748.292  45701.403  42056.490 37653.274
## 4   44738.801  41258.817  36659.088  35189.068 37087.742
## 5   72960.980  72360.969  65454.855  56370.752 47684.046
## 6  143014.908 127973.492 113769.180 103933.860 95599.193
## 7   42729.350  42823.751  40581.731  39749.875 33351.647
## 8   40842.886  42077.430  40111.430  36793.293 31939.422
## 9   55514.217  50283.571  42664.732  39655.143 39107.549
## 10  56794.113  57174.737  57322.555  59915.198 52538.711
## 11  48496.599  47099.450  45340.007  47387.870 42124.138
## 12  81693.567  72531.221  63411.609  60628.745 56235.210
## 13   4541.896   4949.645   4886.759   5050.816  3791.931
## 14  10894.496  10116.174   9526.119   9484.625  8823.037
## 15  12651.048  11674.798  10784.952  10414.150  9279.833

We also set the rownames as the “Nafa” name, for convenience:

rownames(dat) = nafot$name_eng
dat[, 1:5]
##                age_0_4    age_5_9  age_10_14  age_15_19 age_20_24
## Ramla        34308.157  35480.099  32369.329  29095.881 24620.472
## Tel Aviv    127130.244 109620.934  91668.354  82691.547 80979.798
## Rehovot      52838.467  51748.292  45701.403  42056.490 37653.274
## Haifa        44738.801  41258.817  36659.088  35189.068 37087.742
## Petah Tiqwa  72960.980  72360.969  65454.855  56370.752 47684.046
## Jerusalem   143014.908 127973.492 113769.180 103933.860 95599.193
## Hadera       42729.350  42823.751  40581.731  39749.875 33351.647
## Sharon       40842.886  42077.430  40111.430  36793.293 31939.422
## Ashqelon     55514.217  50283.571  42664.732  39655.143 39107.549
## Akko         56794.113  57174.737  57322.555  59915.198 52538.711
## Yizre'el     48496.599  47099.450  45340.007  47387.870 42124.138
## Be'er Sheva  81693.567  72531.221  63411.609  60628.745 56235.210
## Golan         4541.896   4949.645   4886.759   5050.816  3791.931
## Kinneret     10894.496  10116.174   9526.119   9484.625  8823.037
## Zefat        12651.048  11674.798  10784.952  10414.150  9279.833

The “Nafot” have variable population sizes. Therefore, it is necessary to standardize the population counts by totals, to work with proportions. To do that, we use sweep to divide each row by its sum, thus transforming counts to proportions:

dat = sweep(dat, 1, rowSums(dat), "/")
dat[, 1:5]
##                age_0_4    age_5_9  age_10_14  age_15_19  age_20_24
## Ramla       0.09756403 0.10089675 0.09205048 0.08274159 0.07001462
## Tel Aviv    0.08968093 0.07732941 0.06466520 0.05833273 0.05712522
## Rehovot     0.08780640 0.08599476 0.07594610 0.06988902 0.06257181
## Haifa       0.07696696 0.07098013 0.06306693 0.06053796 0.06380437
## Petah Tiqwa 0.09512865 0.09434633 0.08534194 0.07349783 0.06217185
## Jerusalem   0.12794951 0.11449258 0.10178457 0.09298532 0.08552864
## Hadera      0.09602274 0.09623488 0.09119654 0.08932717 0.07494887
## Sharon      0.08737008 0.09001098 0.08580536 0.07870729 0.06832401
## Ashqelon    0.10039783 0.09093817 0.07715945 0.07171659 0.07072626
## Akko        0.08797158 0.08856115 0.08879011 0.09280600 0.08138015
## Yizre'el    0.09384794 0.09114426 0.08773948 0.09170239 0.08151631
## Be'er Sheva 0.11874226 0.10542472 0.09216929 0.08812437 0.08173833
## Golan       0.09003600 0.09811900 0.09687238 0.10012454 0.07516912
## Kinneret    0.09559008 0.08876096 0.08358372 0.08321964 0.07741476
## Zefat       0.10276190 0.09483203 0.08760399 0.08459204 0.07537821

5.9 Hierarchical clustering

Now we can calculate the distance (i.e., the dissimilarity) between Nafot age structures using dist:

d = dist(dat)

and calculate hierarchical clusters using hclust:

hc = hclust(d, "average")

The dendrogram can then be split at a particular “level” to obtain a patricular number of distinct groups (such as k=3). The named vector groups specifies the classification of nafot features into three distinct clusters 1, 2, and 3:

k = 3
groups = cutree(hc, k = k)
groups
##       Ramla    Tel Aviv     Rehovot       Haifa Petah Tiqwa   Jerusalem 
##           1           2           1           2           1           3 
##      Hadera      Sharon    Ashqelon        Akko    Yizre'el Be'er Sheva 
##           1           1           1           1           1           3 
##       Golan    Kinneret       Zefat 
##           1           1           1

The dendrogram and the division to groups are shown in Figure 5.5:

plot(hc)
rect.hclust(hc, k = k)

Figure 5.5: Hierarchical dendrogram of age structure per Nafa

The division to clusters can also be displayed on a map (Figure 5.6):

nafot$group = factor(groups)
plot(nafot["group"])

Figure 5.6: Age structure clusters on a map

Finally, to visualize the age structure distribution within each cluster, we can reshape the data:

library(reshape2)

# Reshape
dat2 = dat
dat2$group = groups
dat2$name = rownames(dat2)
dat2 = melt(dat2, id.vars = c("group", "name"))
dat2$variable = gsub("age_", "", dat2$variable)

# Set age groups order
x = strsplit(dat2$variable, "_")
x = sapply(x, "[", 1)
x = as.numeric(x)
x = dat2$variable[order(x)]
x = unique(x)
dat2$variable = factor(dat2$variable, levels = x)

# Set cluster labels
x = split(dat2$name, dat2$group)
x = sapply(x, "[", 1)
dat2$group = factor(dat2$group, levels = 1:3, labels = x)

head(dat2)
##       group        name variable      value
## 1     Ramla       Ramla      0_4 0.09756403
## 2  Tel Aviv    Tel Aviv      0_4 0.08968093
## 3     Ramla     Rehovot      0_4 0.08780640
## 4  Tel Aviv       Haifa      0_4 0.07696696
## 5     Ramla Petah Tiqwa      0_4 0.09512865
## 6 Jerusalem   Jerusalem      0_4 0.12794951

and then use ggplot2 (Figure 5.7).

library(ggplot2)
library(ggrepel)
ggplot(dat2, aes(x = variable, y = value, group = name, colour = name)) +
    geom_line() +
    geom_point() +
    geom_label_repel(data = dat2[dat2$variable == "75_up", ], aes(label = name)) +
    scale_x_discrete("Age group") +
    scale_y_continuous("Proportion") +
    scale_colour_discrete(guide = FALSE) +
    facet_wrap(~ group) +
    theme_bw() +
    theme(
      axis.text.x = element_text(angle = 90, vjust = 0.5),
      axis.text.y = element_text(angle = 90, hjust = 0.5),
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank()
    )

Figure 5.7: Distribution of age structures in each cluster of nafot

6 More information

Other:

sf tutorial from useR!2017 conference
sf tutorial from rstudio::conf 2018 conference
The r-spatial blog

Thank you for listening!

Using R for Spatial Data Analysis Israel Central Bureau of Statistics 2021-01-19

Michael Dorman Geography and Environmental Development dorman@post.bgu.ac.il