2021-01-19
For more on setting up the environment and sample data, see the preparation document.
Data | File(s) | Format | Source |
---|---|---|---|
“Nafot” | nafot.shp (+7) |
Shapefile | https://www.gov.il/he/Departments/Guides/info-gis |
Railways | RAIL_STRATEGIC.shp (+7) |
Shapefile | https://data.gov.il/dataset/rail_strategic |
Statistical areas | statisticalareas_demography2018.gdb |
GDB | https://www.cbs.gov.il/he/Pages/geo-layers.aspx |
The data for this tutorial can be downloaded from:
https://github.com/michaeldorman/R-Spatial-Workshop-at-CBS-2021/raw/main/data.zip
A script with the R code of this document is available here:
https://github.com/michaeldorman/R-Spatial-Workshop-at-CBS-2021/raw/main/main.R
All of the materials are also available on GitHub.
Please feel free to ask questions as we go along!
Software in general, and software for spatial analysis in particular, is characterized by two types of interfaces:
In a GUI, our interaction with the computer is restricted to the predefined set of input elements, such as buttons, menus, and dialog boxes. In a CLI, we interact with the computer by writing code, which means that our instructions are practically unconstraned. In other words, with a CLI, we can give the computer specific instructions to do anything we want.
R, which we talk about today, is an example of CLI software for working with (among other things) spatial data.
R is a programming language and environment, originally developed for statistical computing and graphics. Notable advantages of R are that it is a full-featured programming language, yet customized for working with data, relatively simple and has a huge collection of ~16,000 packages in the official repository from various areas of interest.
Over time, there was an increasing number of contributed packages for handling and analyzing spatial data in R. Today, spatial analysis is a major functionality in R. As of October 2020, there are ~185 packages specifically addressing spatial analysis in R, and many more are indirectly related to spatial data.
Some important events in the history of spatial analysis support in R are summarized in Table 1.1.
Year | Event |
---|---|
pre-2003 | Variable and incomplete approaches (MASS , spatstat , maptools , geoR , splancs , gstat , …) |
2003 | Consensus that a package defining standard data structures should be useful; rgdal released on CRAN |
2005 | sp released on CRAN; sp support in rgdal |
2008 | Applied Spatial Data Analysis with R, 1st ed. |
2010 | raster released on CRAN |
2011 | rgeos released on CRAN |
2013 | Applied Spatial Data Analysis with R, 2nd ed. |
2016 | sf released on CRAN (Section 2.1) |
2018 | stars released on CRAN |
2019 | Geocomputation with R (https://geocompr.robinlovelace.net/) |
2021(?) | Spatial Data Science (https://www.r-spatial.org/book/) |
A question that arises, at this point, is: can R be used as a Geographic Information System (GIS), or as a comprehensive toolbox for doing spatial analysis? The answer is definitely yes. Moreover, R has some important advantages over traditional approaches to GIS, i.e., software with GUIs such as ArcGIS or QGIS.
General advantages of Command Line Interface (CLI) software include:
Moreover, specific strengths of R as a GIS are:
Nevertheless, there are situations when other tools are needed:
mapedit
package)The following sections (1.5–1.11) highlight some of the capabilities of spatial data analysis packages in R, through short examples.
sf
and stars
Reading spatial layers from a file into an R data structure, or writing the R data structure into a file, are handled by external libraries:
sf
: Vector LayersGEOS is used for geometric operations on vector layers with sf
:
stars
: RastersGeometric operations on rasters can be done with package stars
:
+
, -
, …), Math (sqrt
, log10
, …), logical (!
, ==
, >
, …), summary (mean
, max
, …), Maskinggstat
: InterpolationUnivariate and multivariate geostatistics:
spdep
: Spatial dependenceModelling with spatial weights:
spatstat
: Point patternsTechniques for statistical analysis of spatial point patterns (Figure 1.8), such as:
RPostgreSQL
: PostGISWhen:
we may want to combine R with a (spatial) database.
Package sf
(Section 2.1) combined with RPostgreSQL
can be used to read from, and write to, a PostGIS spatial database. First, we need to create a connection object:
library(sf)
library(RPostgreSQL)
con = dbConnect(
PostgreSQL(),
dbname = "gisdb",
host = "159.89.13.241",
port = 5432,
user = "geobgu",
password = "*******"
)
Then, we can read or write to the database, just like from a file, using the st_read
function (Section 2.3):
st_read(con, query = "SELECT name_lat, geometry FROM plants LIMIT 3;")
## Loading required package: DBI
## Simple feature collection with 3 features and 1 field
## geometry type: POINT
## dimension: XY
## bbox: xmin: 35.19337 ymin: 31.44711 xmax: 35.67976 ymax: 32.77013
## geographic CRS: WGS 84
## name_lat geometry
## 1 Iris haynei POINT (35.67976 32.77013)
## 2 Iris haynei POINT (35.654 32.74137)
## 3 Iris atrofusca POINT (35.19337 31.44711)
dbDisconnect(con)
## [1] TRUE
sf
packageThe sf
package (Figure 2.1), released in 2016, is a newer package for working with vector layers in R, which we are going to use in this tutorial. In recent years, sf
has become the standard package for working with vector data in R, practically replacing sp
, rgdal
, and rgeos
.
One of the important innovations in sf
is a complete implementation of the Simple Features standard. Since 2003, Simple Features been widely implemented in spatial databases (such as PostGIS), commercial GIS (e.g., ESRI ArcGIS) and forms the vector data basis for libraries such as GDAL. The Simple Features standard defines several types of geometries, of which seven are most commonly used in the world of GIS and spatial data analysis (Figure 2.2). When working with spatial databases, Simple Features are commonly specified as Well Known Text (WKT).
The sf
package depends on several external software components (installed along with the R package2), most importantly GDAL, GEOS and PROJ (Figure 2.3). These well-tested and popular open-source components are common to numerous open-source and commercial software for spatial analysis, such as QGIS and PostGIS.
sf
data structuresPackage sf
defines a hierarchical class system, with three classes (Table 2.1):
sfg
—a single geometrysfc
—a geometry column, which is a set of sfg
geometries + CRS informationsf
—a layer, which is an sfc
geometry column inside a data.frame
with non-spatial attributesClass | Hierarchy | Information |
---|---|---|
sfg |
Geometry | type, coordinates |
sfc |
Geometry column | set of sfg + CRS |
sf |
Layer | sfc + attributes |
The sf
class represents a vector layer by extending the data.frame
class, supplementing it with a geometry column. This is similar to the way that spatial databases are structured. For example, the sample dataset shown in Figure 2.4 represents a polygonal layer with three features and six non-spatial attributes. The attributes refer to demographic and epidemiological attributes of US counties, such as the number of births in 1974 (BIR74
), the number of sudden infant death cases in 1974 (SID74
), and so on. The seventh column is the geometry column, containing the polygon geometries.
Figure 2.5 shows what the layer in Figure 2.4 would look like when mapped. We can see the outline of the three polygons, as well as the values of all six non-spatial attributes (in separate panels).
We move on to practical examples. First, let’s see what importing and examining sf
layers looks like, in the R environment.
Function st_read
can be used to read vector layers into sf
data structures. Before doing anything else we need to load the sf
package:
library(sf)
## Linking to GEOS 3.9.0, GDAL 3.2.0, PROJ 7.2.0
Then, we can use st_read
to import a vector layer into the R environment. In this case, we are reading the nafot.shp
Shapefile:
nafot = st_read("data/nafot.shp")
## Reading layer `nafot' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/nafot.shp' using driver `ESRI Shapefile'
## Simple feature collection with 15 features and 1 field
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
Printing the object gives a summary of its properties, and the values of the (first 10) features. In this case, we see that the nafot
layer has 15 features, and one attribute called name_eng
:
nafot
## Simple feature collection with 15 features and 1 field
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng geometry
## 1 Zefat POLYGON ((739780.1 3686007,...
## 2 Jerusalem POLYGON ((672273.8 3518394,...
## 3 Kinneret POLYGON ((745560 3649860, 7...
## 4 Yizre'el POLYGON ((702283.1 3628046,...
## 5 Akko POLYGON ((702725.9 3630513,...
## 6 Golan POLYGON ((759304.4 3691202,...
## 7 Haifa POLYGON ((701391.6 3631170,...
## 8 Hadera POLYGON ((706537.1 3602188,...
## 9 Sharon POLYGON ((692687.3 3583974,...
## 10 Ramla POLYGON ((672841.5 3544808,...
The class
function, applied on an sf
layer, returns both "sf"
and "data.frame"
class names:
class(nafot)
## [1] "sf" "data.frame"
As mentioned above, a layer (geometry+attributes) is represented by an sf
object. The second printed value, data.frame
, implies that sf
is, in fact, an extension of the data.frame
class, inheriting many of its methods. Indeed, we will see that many functions in R work exactly the same way on sf
and data.frame
objects.
If we want just the geometric part, it can be extracted with st_geometry
, resulting in an object of class sfc
, i.e., a geometry column (Table 2.1):
st_geometry(nafot)
## Geometry set for 15 features
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
## First 5 geometries:
## POLYGON ((739780.1 3686007, 739786.6 3686007, 7...
## POLYGON ((672273.8 3518394, 672368.4 3518511, 6...
## POLYGON ((745560 3649860, 745585.8 3649853, 745...
## POLYGON ((702283.1 3628046, 702283.5 3628046, 7...
## POLYGON ((702725.9 3630513, 702724.1 3630510, 7...
Conversely, If we want just the non-geometric part (the “attributes”), it can be extracted with st_drop_geometry
which returns a data.frame
:
st_drop_geometry(nafot)
## name_eng
## 1 Zefat
## 2 Jerusalem
## 3 Kinneret
## 4 Yizre'el
## 5 Akko
## 6 Golan
## 7 Haifa
## 8 Hadera
## 9 Sharon
## 10 Ramla
## 11 Rehovot
## 12 Ashqelon
## 13 Be'er Sheva
## 14 Petah Tiqwa
## 15 Tel Aviv
The plot
function is a quick way to see the spatial arrangment and attribute values in an sf
layer. For example, the nafot
layer attribute can be plotted as follows (Figure 2.6):
plot(nafot, key.width = lcm(4))
Alternatively, we can plot the geometry only, without attributes, as follows (Figure 2.7):
plot(st_geometry(nafot))
A Coordinate Reference System (CRS) defines how the coordinates in our geometries relate to the surface of the Earth. There are two main types of CRS:
For example, Figure 2.8 shows the same polygonal layer (U.S. counties) in two different projection. On the left, the county layer is in the WGS84 geographic projection. Indeed, we can see that the axes are given in degrees of longitude and latitude. For example, we can see how the nothern border of U.S. follows the 49° latitude line. On the right, the same layer is displayed in the US National Atlas projection, where units are arbitrary but reflect true distance (meters). For example, the distance between every two consecutive grid lines is 1,000,000 meters or 1,000 kilometers.
The CRS of a given sf
layer can be obtained using function st_crs
. The CRS information is returned in the WKT format:
st_crs(nafot)
## Coordinate Reference System:
## User input: WGS 84 / UTM zone 36N
## wkt:
## PROJCRS["WGS 84 / UTM zone 36N",
## BASEGEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]],
## CONVERSION["UTM zone 36N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",33,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## ID["EPSG",32636]]
Reprojection is the transformation of geometry coordinates, from one CRS to another. It is an important part of spatial analysis workflow, since we often need to:
A vector layer can be reprojected with st_transform
. The st_transform
function has two important parameters:
x
—The layer to be reprojectedcrs
—The target CRSThe crs
can be specified in one of four ways:
4326
)"+proj=longlat +datum=WGS84 +no_defs"
)crs
object of another layer, as returned by st_crs
For example, the following expression reprojects the nafot
layer to the geographic WGS84 CRS, using its EPSG code (4326
):
nafot_wgs84 = st_transform(nafot, 4326)
Printing the layer summary demostrates that the coordinates and CRS definition have changed:
nafot_wgs84
## Simple feature collection with 15 features and 1 field
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 34.26679 ymin: 29.48737 xmax: 35.89468 ymax: 33.33479
## geographic CRS: WGS 84
## First 10 features:
## name_eng geometry
## 1 Zefat POLYGON ((35.57481 33.2865,...
## 2 Jerusalem POLYGON ((34.81956 31.78814...
## 3 Kinneret POLYGON ((35.6271 32.95949,...
## 4 Yizre'el POLYGON ((35.15965 32.77173...
## 5 Akko POLYGON ((35.16491 32.79389...
## 6 Golan POLYGON ((35.78575 33.32878...
## 7 Haifa POLYGON ((35.15081 32.80006...
## 8 Hadera POLYGON ((35.19932 32.53785...
## 9 Sharon POLYGON ((35.0482 32.37614,...
## 10 Ramla POLYGON ((34.83026 32.02623...
The two layers, nafot
and nafot_wgs84
, are shown in Figure 2.9:
plot(st_geometry(nafot), main = "UTM", axes = TRUE)
plot(st_geometry(nafot_wgs84), main = "WGS84", axes = TRUE)
For the next series of examples, we read a second layer called RAIL_STRATEGIC.shp
. The RAIL_STRATEGIC.shp
layer contains railway lines in Israel. Using st_read
, we import the layer and create an sf
object called rail
:
rail = st_read("data/RAIL_STRATEGIC.shp", options = "ENCODING=UTF-8")
## options: ENCODING=UTF-8
## Reading layer `RAIL_STRATEGIC' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/RAIL_STRATEGIC.shp' using driver `ESRI Shapefile'
## Simple feature collection with 217 features and 6 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: 157822.2 ymin: 385552.2 xmax: 255113.7 ymax: 790663.5
## projected CRS: Israel 1993 / Israeli TM Grid
(Note that the ENCODING
option is only needed when operating system default encoding is not the same as the layer encoding.)
Here is a summary of the rail
layer:
rail
## Simple feature collection with 217 features and 6 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: 157822.2 ymin: 385552.2 xmax: 255113.7 ymax: 790663.5
## projected CRS: Israel 1993 / Israeli TM Grid
## First 10 features:
## UPGRADE SHAPE_LENG SHAPE_Le_1 SEGMENT ISACTIVE
## 1 שדרוג 2040 14489.401 12379.715 כפר יהושע - נשר_1 פעיל
## 2 שדרוג 2030 2159.344 2274.112 באר יעקב-ראשונים_2 פעיל
## 3 שדרוג 2040 4942.306 4942.306 נתבג - נען_3 לא פעיל
## 4 שדרוג 2040 2829.892 2793.338 לב המפרץ מזרח - נשר_4 פעיל
## 5 שדרוג 2040 1976.491 1960.171 קרית גת - להבים_5 פעיל
## 6 שדרוג 2040 1195.701 1195.701 רמלה - רמלה מזרח_6 פעיל
## 7 שדרוג 2030 4224.799 4224.799 אור עקיבא - עתלית_7 לא פעיל
## 8 חדש 2040 36452.633 36452.633 מרכז ספיר-פארן_8 לא פעיל
## 9 שדרוג 2040 7822.722 7892.334 נען - בית שמש_9 פעיל
## 10 שדרוג 2030 4610.831 4176.157 נתבג - ההגנה_10 פעיל
## UPGRAD geometry
## 1 שדרוג בין 2030 ל-2040 LINESTRING (205530.1 741563...
## 2 שדרוג עד 2030 LINESTRING (181507.6 650706...
## 3 שדרוג בין 2030 ל-2040 LINESTRING (189180.6 645433...
## 4 שדרוג בין 2030 ל-2040 LINESTRING (203482.8 744181...
## 5 שדרוג בין 2030 ל-2040 LINESTRING (178574.1 609392...
## 6 שדרוג בין 2030 ל-2040 LINESTRING (189266.6 647211...
## 7 שדרוג עד 2030 LINESTRING (193268.9 719774...
## 8 פתיחה בין 2030 ל-2040 LINESTRING (219966 509396.8...
## 9 שדרוג בין 2030 ל-2040 LINESTRING (188081.3 642530...
## 10 שדרוג עד 2030 LINESTRING (184248 657573.3...
For any type of spatial analysis, we usually need all input layers to be in the same CRS. For this purpose, we will reproject the rail
layer to the CRS of the nafot
layer:
rail = st_transform(rail, st_crs(nafot))
We can plot the rail
layer on its own, as shown in Figure 3.1, to examine its attributes:
plot(rail)
Or plot just one attribute, in which case there is a legened (Figure 3.2):
plot(rail["ISACTIVE"], key.width = lcm(3))
We can also plot the nafot
and rail
geometries together, to examine their arrangement (Figure 3.3). The second expression uses add=TRUE
to add the geometries on top of the existing plot, instead of initializing a new one:
plot(st_geometry(nafot), border = "grey")
plot(st_geometry(rail), col = "red", add = TRUE)
Non-spatial subsetting of features in an sf
vector layer, e.g., according to attribute values, is done in exactly the same way as in a data.frame
. For example, the following expression selects the rail
features for active lines, using the logical vector rail$ISACTIVE == "פעיל"
as the rows index:
rail = rail[rail$ISACTIVE == "פעיל", ]
rail
## Simple feature collection with 161 features and 6 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## UPGRADE SHAPE_LENG SHAPE_Le_1 SEGMENT ISACTIVE
## 1 שדרוג 2040 14489.401 12379.715 כפר יהושע - נשר_1 פעיל
## 2 שדרוג 2030 2159.344 2274.112 באר יעקב-ראשונים_2 פעיל
## 4 שדרוג 2040 2829.892 2793.338 לב המפרץ מזרח - נשר_4 פעיל
## 5 שדרוג 2040 1976.491 1960.171 קרית גת - להבים_5 פעיל
## 6 שדרוג 2040 1195.701 1195.701 רמלה - רמלה מזרח_6 פעיל
## 9 שדרוג 2040 7822.722 7892.334 נען - בית שמש_9 פעיל
## 10 שדרוג 2030 4610.831 4176.157 נתבג - ההגנה_10 פעיל
## 11 שדרוג 2040 1426.854 1426.854 סוקולוב - נורדאו_11 פעיל
## 13 שדרוג 2030 8758.880 8758.880 נהריה - עכו_13 פעיל
## 14 שדרוג 2030 5102.569 5102.569 חולות-יבנה מערב_14 פעיל
## UPGRAD geometry
## 1 שדרוג בין 2030 ל-2040 LINESTRING (692568.6 362751...
## 2 שדרוג עד 2030 LINESTRING (670422.8 353618...
## 4 שדרוג בין 2030 ל-2040 LINESTRING (690467.1 363008...
## 5 שדרוג בין 2030 ל-2040 LINESTRING (668326.9 349482...
## 6 שדרוג בין 2030 ל-2040 LINESTRING (678250.9 353284...
## 9 שדרוג בין 2030 ל-2040 LINESTRING (677161.1 352814...
## 10 שדרוג עד 2030 LINESTRING (673022.5 354310...
## 11 שדרוג בין 2030 ל-2040 LINESTRING (679299.3 356088...
## 13 שדרוג עד 2030 LINESTRING (696063 3653684,...
## 14 שדרוג עד 2030 LINESTRING (664704.1 353470...
We can also subset the columns (attributes) we need. For example, in the following expressions we create an ID attribute called segment_id
, and remove all other attributes:
rail$segment_id = 1:nrow(rail)
rail = rail["segment_id"]
rail
## Simple feature collection with 161 features and 1 field
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## segment_id geometry
## 1 1 LINESTRING (692568.6 362751...
## 2 2 LINESTRING (670422.8 353618...
## 4 3 LINESTRING (690467.1 363008...
## 5 4 LINESTRING (668326.9 349482...
## 6 5 LINESTRING (678250.9 353284...
## 9 6 LINESTRING (677161.1 352814...
## 10 7 LINESTRING (673022.5 354310...
## 11 8 LINESTRING (679299.3 356088...
## 13 9 LINESTRING (696063 3653684,...
## 14 10 LINESTRING (664704.1 353470...
The modified layer is shown in Figure 3.4:
plot(rail)
We can also subset features according to whether they intersect with another layer, using the latter as an index. For example, the following expression creates a subset of nafot
, named nafot1
, with only those features intersecting the rail
layer:
nafot1 = nafot[rail, ]
nafot1
## Simple feature collection with 12 features and 1 field
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 742395.3 ymax: 3665564
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng geometry
## 2 Jerusalem POLYGON ((672273.8 3518394,...
## 4 Yizre'el POLYGON ((702283.1 3628046,...
## 5 Akko POLYGON ((702725.9 3630513,...
## 7 Haifa POLYGON ((701391.6 3631170,...
## 8 Hadera POLYGON ((706537.1 3602188,...
## 9 Sharon POLYGON ((692687.3 3583974,...
## 10 Ramla POLYGON ((672841.5 3544808,...
## 11 Rehovot POLYGON ((660883 3525603, 6...
## 12 Ashqelon POLYGON ((660883 3525603, 6...
## 13 Be'er Sheva POLYGON ((639165.2 3481650,...
Figure 3.5 shows the nafot1
subset (in grey fill), the complete nafot
layer (in grey outline), and the railway lines layer (in red):
plot(st_geometry(nafot), border = "grey")
plot(st_geometry(nafot1), border = "grey", col = "grey90", add = TRUE)
plot(st_geometry(rail), col = "red", add = TRUE)
Geometric operations on vector layers can conceptually be divided into three groups according to their output:
There are several functions to calculate numeric geometric properties of vector layers in package sf
:
st_length
—Lengthst_area
—Areast_distance
—Distancest_bbox
—Bounding boxst_dimension
—Dimensions (0 = points, 1 = lines, 2 = polygons)For example, we can calculate the area of each feature in the nafot
layer using st_area
. The result is assigned to a new attribute named area
in nafot
:
nafot$area = st_area(nafot)
The nafot
layer now has a new attribute named area
, with the “Nafa” polygon areas:
nafot
## Simple feature collection with 15 features and 2 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng geometry area
## 1 Zefat POLYGON ((739780.1 3686007,... 660340513 [m^2]
## 2 Jerusalem POLYGON ((672273.8 3518394,... 653746831 [m^2]
## 3 Kinneret POLYGON ((745560 3649860, 7... 639422163 [m^2]
## 4 Yizre'el POLYGON ((702283.1 3628046,... 1231817404 [m^2]
## 5 Akko POLYGON ((702725.9 3630513,... 955636113 [m^2]
## 6 Golan POLYGON ((759304.4 3691202,... 1153904250 [m^2]
## 7 Haifa POLYGON ((701391.6 3631170,... 299106812 [m^2]
## 8 Hadera POLYGON ((706537.1 3602188,... 577310399 [m^2]
## 9 Sharon POLYGON ((692687.3 3583974,... 348445079 [m^2]
## 10 Ramla POLYGON ((672841.5 3544808,... 339246753 [m^2]
You may notice the area
values are printed with their units [m^2]
. The resulting area values, just like any other numeric calculations in sf
, are returned as object of class units
:
class(nafot$area)
## [1] "units"
A units
object is basically a numeric vector associated with units of measurement. We can transform units
measurements to different units, with function set_units
from package units
. First, we load the units
library:
library(units)
## udunits system database from /usr/share/xml/udunits
Then, we make the transformation to \(km^2\) units:
nafot$area = set_units(nafot$area, "km^2")
The modified area
column can be observed in the layer summary:
nafot
## Simple feature collection with 15 features and 2 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng geometry area
## 1 Zefat POLYGON ((739780.1 3686007,... 660.3405 [km^2]
## 2 Jerusalem POLYGON ((672273.8 3518394,... 653.7468 [km^2]
## 3 Kinneret POLYGON ((745560 3649860, 7... 639.4222 [km^2]
## 4 Yizre'el POLYGON ((702283.1 3628046,... 1231.8174 [km^2]
## 5 Akko POLYGON ((702725.9 3630513,... 955.6361 [km^2]
## 6 Golan POLYGON ((759304.4 3691202,... 1153.9042 [km^2]
## 7 Haifa POLYGON ((701391.6 3631170,... 299.1068 [km^2]
## 8 Hadera POLYGON ((706537.1 3602188,... 577.3104 [km^2]
## 9 Sharon POLYGON ((692687.3 3583974,... 348.4451 [km^2]
## 10 Ramla POLYGON ((672841.5 3544808,... 339.2468 [km^2]
Figure 3.6 shows a map of the calculated area sizes:
plot(nafot["area"])
Given two layers, x
and y
, the following logical geometric functions check whether each feature in x
maintains the specified relation with each feature in y
(Table tab:logical-geometric-functions).
Function | Value | Inverse of |
---|---|---|
st_contains |
None of the points of A are outside B | st_within |
st_contains_properly |
A contains B and B has no points in common with the boundary of A | |
st_covers |
No points of B lie in the exterior of A | st_covered_by |
st_covered_by |
inverse of st_covers |
|
st_crosses |
A and B have some but not all interior points in common | |
st_disjoint |
A and B have no points in common | st_intersects |
st_equals |
A and B are geometrically equal; node order number of nodes may differ; identical to A contains B AND A within B | |
st_equals_exact |
A and B are geometrically equal, and have identical node order | |
st_intersects |
A and B are not disjoint | st_disjoint |
st_is_within_distance |
A is closer to B than a given distance | |
st_within |
None of the points of B are outside A | st_contains |
st_touches |
A and B have at least one boundary point in common, but no interior points | |
st_overlaps |
A and B have some points in common; the dimension of these is identical to that of A and B | |
st_relate |
given a pattern, returns whether A and B adhere to this pattern |
When specifying sparse=FALSE
the functions return a logical matrix
. Each element i,j
in the matrix
is TRUE
when f(x[i], y[j])
is TRUE
.
For example, this creates a matrix of intersection relations between nafot:
int = st_intersects(nafot1, nafot1, sparse = FALSE)
int
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
## [2,] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE
## [7,] TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE
## [8,] TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE
## [9,] TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
## [11,] FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
## [12,] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
The following code section visualizes the matrix
(Figure 3.7):
int1 = apply(int, 2, rev)
int1 = t(int1)
image(int1, col = c("lightgrey", "red"), asp = 1, axes = FALSE)
axis(3, at = seq(0, 1, 1/(nrow(int1)-1)), labels = nafot1$name_eng, las = 2, lwd = 0, lwd.ticks = 1, cex.axis = 0.75)
axis(2, at = seq(0, 1, 1/(nrow(int1)-1)), labels = rev(nafot1$name_eng), las = 1, pos = -0.046, lwd = 0, lwd.ticks = 1, cex.axis = 0.75)
sf
provides common geometry-generating functions applicable to individual geometries, such as:
st_centroid
—Centroidsst_buffer
—Bufferst_sample
—Random sample pointsst_convex_hull
—Convex hullst_voronoi
—Voronoi polygonsFor example, to calculate the centroids of the “Nafa” polygons we can use the st_centroid
function as follows:
nafot_ctr = st_centroid(nafot)
They can be plotted as follows, the result is shown in Figure 3.9:
plot(st_geometry(nafot), border = "grey")
plot(st_geometry(nafot_ctr), col = "red", pch = 3, add = TRUE)
Other geometry-generating functions work on pairs of input geometries (Figure 3.10):
st_intersection
st_difference
st_sym_difference
st_union
Now we move on to examples of two specific tasks:
In this example, our goal is to calculate the average rail density (\(km\) per \(km^2\)) per “Nafa” (Figure 4.5). To calculate total railway length per “Nafa”, we are going to dissolve the rail
layer, then split it by “Nafa” polygons.
First, we “dissolve” the railway layer into a single MULTILINESTRING
geometry:
rail = st_union(rail)
rail
## Geometry set for 1 feature
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS: WGS 84 / UTM zone 36N
The result is a geometry column (sfc
), which we convert to sf
so that it can accomodate “Nafa” name attribute later on:
rail = st_sf(geometry = rail)
rail
## Simple feature collection with 1 feature and 0 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS: WGS 84 / UTM zone 36N
## geometry
## 1 MULTILINESTRING ((649213.3 ...
Then, we “split” the rail
layer back according to “Nafot” polygons using st_intersection
(Section 3.5.3):
rail = st_intersection(rail, nafot)
The result is a new line layer of 12 MULTILINESTRING
geometries, including the name_eng
attribute of the corresponding “Nafa” they intersect with:
rail
## Simple feature collection with 12 features and 2 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng area geometry
## 1 Jerusalem 653.7468 [km^2] MULTILINESTRING ((673559.5 ...
## 2 Yizre'el 1231.8174 [km^2] MULTILINESTRING ((697854.1 ...
## 3 Akko 955.6361 [km^2] MULTILINESTRING ((696063 36...
## 4 Haifa 299.1068 [km^2] MULTILINESTRING ((683294.1 ...
## 5 Hadera 577.3104 [km^2] MULTILINESTRING ((678559 35...
## 6 Sharon 348.4451 [km^2] MULTILINESTRING ((673053.9 ...
## 7 Ramla 339.2468 [km^2] MULTILINESTRING ((671460.5 ...
## 8 Rehovot 323.6068 [km^2] MULTILINESTRING ((658033.1 ...
## 9 Ashqelon 1270.6010 [km^2] MULTILINESTRING ((649213.3 ...
## 10 Be'er Sheva 13182.9382 [km^2] MULTILINESTRING ((649386 34...
The layer is visualized in Figure 4.1:
plot(rail["name_eng"], lwd = 3, key.width = lcm(4), reset = FALSE)
plot(st_geometry(nafot), border = "lightgrey", add = TRUE)
Next, we will calculate line length of each rail segment, using function st_length
(Section 3.5.1). We assign the result into a railway length attribute called length
:
rail$length = st_length(rail)
rail
## Simple feature collection with 12 features and 3 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng area geometry length
## 1 Jerusalem 653.7468 [km^2] MULTILINESTRING ((673559.5 ... 60237.68 [m]
## 2 Yizre'el 1231.8174 [km^2] MULTILINESTRING ((697854.1 ... 43289.01 [m]
## 3 Akko 955.6361 [km^2] MULTILINESTRING ((696063 36... 36191.26 [m]
## 4 Haifa 299.1068 [km^2] MULTILINESTRING ((683294.1 ... 41606.83 [m]
## 5 Hadera 577.3104 [km^2] MULTILINESTRING ((678559 35... 46310.90 [m]
## 6 Sharon 348.4451 [km^2] MULTILINESTRING ((673053.9 ... 24785.27 [m]
## 7 Ramla 339.2468 [km^2] MULTILINESTRING ((671460.5 ... 82702.28 [m]
## 8 Rehovot 323.6068 [km^2] MULTILINESTRING ((658033.1 ... 47651.51 [m]
## 9 Ashqelon 1270.6010 [km^2] MULTILINESTRING ((649213.3 ... 76780.70 [m]
## 10 Be'er Sheva 13182.9382 [km^2] MULTILINESTRING ((649386 34... 119747.86 [m]
For convenience, we will also convert the values from \(m\) units to \(km\) units:
rail$length = set_units(rail$length, "km")
rail
## Simple feature collection with 12 features and 3 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng area geometry length
## 1 Jerusalem 653.7468 [km^2] MULTILINESTRING ((673559.5 ... 60.23768 [km]
## 2 Yizre'el 1231.8174 [km^2] MULTILINESTRING ((697854.1 ... 43.28901 [km]
## 3 Akko 955.6361 [km^2] MULTILINESTRING ((696063 36... 36.19126 [km]
## 4 Haifa 299.1068 [km^2] MULTILINESTRING ((683294.1 ... 41.60683 [km]
## 5 Hadera 577.3104 [km^2] MULTILINESTRING ((678559 35... 46.31090 [km]
## 6 Sharon 348.4451 [km^2] MULTILINESTRING ((673053.9 ... 24.78527 [km]
## 7 Ramla 339.2468 [km^2] MULTILINESTRING ((671460.5 ... 82.70228 [km]
## 8 Rehovot 323.6068 [km^2] MULTILINESTRING ((658033.1 ... 47.65151 [km]
## 9 Ashqelon 1270.6010 [km^2] MULTILINESTRING ((649213.3 ... 76.78070 [km]
## 10 Be'er Sheva 13182.9382 [km^2] MULTILINESTRING ((649386 34... 119.74786 [km]
The lengths are visualized in Figure 4.2:
plot(rail["name_eng"], lwd = 3, key.width = lcm(4), reset = FALSE)
plot(st_geometry(nafot), border = "lightgrey", add = TRUE)
text(st_coordinates(st_centroid(rail)), as.character(round(rail$length)))
To transfer the track lengths we just calculated to the polygonal nafot
layer, we use a non-spatial join. First, we extract the attribute table of rail
, using st_drop_geometry
(Section 2.3):
rail = st_drop_geometry(rail)
rail
## name_eng area length
## 1 Jerusalem 653.7468 [km^2] 60.23768 [km]
## 2 Yizre'el 1231.8174 [km^2] 43.28901 [km]
## 3 Akko 955.6361 [km^2] 36.19126 [km]
## 4 Haifa 299.1068 [km^2] 41.60683 [km]
## 5 Hadera 577.3104 [km^2] 46.31090 [km]
## 6 Sharon 348.4451 [km^2] 24.78527 [km]
## 7 Ramla 339.2468 [km^2] 82.70228 [km]
## 8 Rehovot 323.6068 [km^2] 47.65151 [km]
## 9 Ashqelon 1270.6010 [km^2] 76.78070 [km]
## 10 Be'er Sheva 13182.9382 [km^2] 119.74786 [km]
## 11 Petah Tiqwa 283.1575 [km^2] 35.57314 [km]
## 12 Tel Aviv 175.6073 [km^2] 33.82369 [km]
and keep just the name_eng
and length
columns:
rail = rail[c("name_eng", "length")]
rail
## name_eng length
## 1 Jerusalem 60.23768 [km]
## 2 Yizre'el 43.28901 [km]
## 3 Akko 36.19126 [km]
## 4 Haifa 41.60683 [km]
## 5 Hadera 46.31090 [km]
## 6 Sharon 24.78527 [km]
## 7 Ramla 82.70228 [km]
## 8 Rehovot 47.65151 [km]
## 9 Ashqelon 76.78070 [km]
## 10 Be'er Sheva 119.74786 [km]
## 11 Petah Tiqwa 35.57314 [km]
## 12 Tel Aviv 33.82369 [km]
Now we can join nafot
and rail
, by the common column "name_eng"
. This is done using function merge
. The additional argument all.x=TRUE
implies a left join:
nafot = merge(nafot, rail, by = "name_eng", all.x = TRUE)
As a result of the join operation, the nafot
layer now has a new length
attribute:
nafot
## Simple feature collection with 15 features and 3 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng area length geometry
## 1 Akko 955.6361 [km^2] 36.19126 [km] POLYGON ((702725.9 3630513,...
## 2 Ashqelon 1270.6010 [km^2] 76.78070 [km] POLYGON ((660883 3525603, 6...
## 3 Be'er Sheva 13182.9382 [km^2] 119.74786 [km] POLYGON ((639165.2 3481650,...
## 4 Golan 1153.9042 [km^2] NA [km] POLYGON ((759304.4 3691202,...
## 5 Hadera 577.3104 [km^2] 46.31090 [km] POLYGON ((706537.1 3602188,...
## 6 Haifa 299.1068 [km^2] 41.60683 [km] POLYGON ((701391.6 3631170,...
## 7 Jerusalem 653.7468 [km^2] 60.23768 [km] POLYGON ((672273.8 3518394,...
## 8 Kinneret 639.4222 [km^2] NA [km] POLYGON ((745560 3649860, 7...
## 9 Petah Tiqwa 283.1575 [km^2] 35.57314 [km] POLYGON ((674110.3 3549169,...
## 10 Ramla 339.2468 [km^2] 82.70228 [km] POLYGON ((672841.5 3544808,...
Length of NA
implies there was no matching rail
feature, which means there are no railways in the “Nafa”. These NA
values should be replaced with zero:
nafot$length[is.na(nafot$length)] = 0
nafot
## Simple feature collection with 15 features and 3 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng area length geometry
## 1 Akko 955.6361 [km^2] 36.19126 [km] POLYGON ((702725.9 3630513,...
## 2 Ashqelon 1270.6010 [km^2] 76.78070 [km] POLYGON ((660883 3525603, 6...
## 3 Be'er Sheva 13182.9382 [km^2] 119.74786 [km] POLYGON ((639165.2 3481650,...
## 4 Golan 1153.9042 [km^2] 0.00000 [km] POLYGON ((759304.4 3691202,...
## 5 Hadera 577.3104 [km^2] 46.31090 [km] POLYGON ((706537.1 3602188,...
## 6 Haifa 299.1068 [km^2] 41.60683 [km] POLYGON ((701391.6 3631170,...
## 7 Jerusalem 653.7468 [km^2] 60.23768 [km] POLYGON ((672273.8 3518394,...
## 8 Kinneret 639.4222 [km^2] 0.00000 [km] POLYGON ((745560 3649860, 7...
## 9 Petah Tiqwa 283.1575 [km^2] 35.57314 [km] POLYGON ((674110.3 3549169,...
## 10 Ramla 339.2468 [km^2] 82.70228 [km] POLYGON ((672841.5 3544808,...
The calculated total lengths per “Nafa” are shown in Figure 4.3:
plot(nafot["length"])
Now that we know the total railway length and the area, per “Nafa”, we can divide total railway length by area. This gives us railway density per “Nafa”:
nafot$density = nafot$length / nafot$area
Sorting the features lets us clearly see which “Nafa” has the highest densities of railway lines:
nafot = nafot[order(nafot$density, decreasing = TRUE), ]
nafot
## Simple feature collection with 15 features and 4 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS: WGS 84 / UTM zone 36N
## First 10 features:
## name_eng area length geometry
## 10 Ramla 339.2468 [km^2] 82.70228 [km] POLYGON ((672841.5 3544808,...
## 13 Tel Aviv 175.6073 [km^2] 33.82369 [km] POLYGON ((674110.3 3549169,...
## 11 Rehovot 323.6068 [km^2] 47.65151 [km] POLYGON ((660883 3525603, 6...
## 6 Haifa 299.1068 [km^2] 41.60683 [km] POLYGON ((701391.6 3631170,...
## 9 Petah Tiqwa 283.1575 [km^2] 35.57314 [km] POLYGON ((674110.3 3549169,...
## 7 Jerusalem 653.7468 [km^2] 60.23768 [km] POLYGON ((672273.8 3518394,...
## 5 Hadera 577.3104 [km^2] 46.31090 [km] POLYGON ((706537.1 3602188,...
## 12 Sharon 348.4451 [km^2] 24.78527 [km] POLYGON ((692687.3 3583974,...
## 2 Ashqelon 1270.6010 [km^2] 76.78070 [km] POLYGON ((660883 3525603, 6...
## 1 Akko 955.6361 [km^2] 36.19126 [km] POLYGON ((702725.9 3630513,...
## density
## 10 0.24378209 [1/km]
## 13 0.19260989 [1/km]
## 11 0.14725126 [1/km]
## 6 0.13910357 [1/km]
## 9 0.12563022 [1/km]
## 7 0.09214221 [1/km]
## 5 0.08021836 [1/km]
## 12 0.07113107 [1/km]
## 2 0.06042865 [1/km]
## 1 0.03787138 [1/km]
Plotting the layer shows the new density
attribute (Figures 4.4 on its own:
plot(nafot["density"])
or along with 4.5) the other attributes:
plot(nafot)
In the second example, we are going to examine the dissimilarity between “Nafot” in terms of their population age structure. The demographic data come at the statistical area level, which we are going to aggregate to the “Nafa” level.
First, we read the statistical areas Shapefile into an object named stat
:
stat = st_read("data/statisticalareas_demography2018.gdb", options = "ENCODING=UTF-8")
## options: ENCODING=UTF-8
## Reading layer `statisticalareas_demography2018' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/statisticalareas_demography2018.gdb' using driver `OpenFileGDB'
## Simple feature collection with 3194 features and 34 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 130175.2 ymin: 378139.7 xmax: 284068.5 ymax: 804530.3
## projected CRS: Israel 1993 / Israeli TM Grid
The stat
layer has numerous columns:
vars = colnames(stat)
vars
## [1] "SEMEL_YISHUV" "STAT11" "YISHUV_STAT11"
## [4] "SHEM_YISHUV" "SHEM_YISHUV_ENGLISH" "Stat11_Unite"
## [7] "Stat11_Ref" "Main_Function_Code" "Main_Function_Txt"
## [10] "Religion_yishuv_Code" "Religion_yishuv_Txt" "Religion_Stat_Code"
## [13] "Religion_Stat_Txt" "Pop_Total" "Male_Total"
## [16] "Female_Total" "age_0_4" "age_5_9"
## [19] "age_10_14" "age_15_19" "age_20_24"
## [22] "age_25_29" "age_30_34" "age_35_39"
## [25] "age_40_44" "age_45_49" "age_50_54"
## [28] "age_55_59" "age_60_64" "age_65_69"
## [31] "age_70_74" "age_75_up" "SHAPE_Length"
## [34] "SHAPE_Area" "SHAPE"
In our case, we are only interested in the population estimates per age category, i.e., all variables that start with "age_"
:
vars = vars[grepl("age_", vars)]
vars
## [1] "age_0_4" "age_5_9" "age_10_14" "age_15_19" "age_20_24" "age_25_29"
## [7] "age_30_34" "age_35_39" "age_40_44" "age_45_49" "age_50_54" "age_55_59"
## [13] "age_60_64" "age_65_69" "age_70_74" "age_75_up"
stat = stat[vars]
The resulting subset of the stat
layer, now with just 16 attributes, is shown in Figure 5.1.
plot(stat, max.plot = 16)
One thing we may notice in Figure 5.1 (or printing stat
in the console) is that many of the statistical areas have NA
values, meant to represent zero (rather than unknown) population size (e.g., in open areas). For all practical purposes, these NA
values should be replaced with 0
values:
stat[is.na(stat)] = 0
The modified stat
layer is shown in Figure 5.2:
plot(stat, max.plot = 16)
Next, we are going to calculate the average age structure per “Nafa” (nafot
), based on the data from statistical areas (stat
). First of all, we need to make sure both layers are in the same CRS (Section 2.4.2):
stat = st_transform(stat, st_crs(nafot))
Figure 5.3 shows one of the attributes in stat
(colored polygons with narrow borders) with the nafot
layer on top (transparent polygons with thick black bordwer), demonstrating that the layers are aligned:
plot(stat["age_10_14"], pal = hcl.colors(12, "Reds", rev = TRUE), border = "black", lwd = 0.07, reset = FALSE)
plot(st_geometry(nafot), lwd = 0.5, add = TRUE)
Now, in theory, we are ready to calculate the area-weighted sum of population per “Nafa”, using function st_interpolate_aw
. The function requires three parameters:
x
—Input layer (with values to transfer)to
—Target geometries (where to transfer values)extensive
—Does the layer contains spatially extensive variables, which are summed, such as population (TRUE
), or spatially intensive variables which are averaged, such as population density (FALSE
)?In our case, the appropriate expression is:
st_interpolate_aw(stat, nafot, extensive = TRUE)
However, the operation fails, since the layer contains unsapported geometry types:
x = st_interpolate_aw(stat, nafot, extensive = TRUE)
## Error in CPL_geos_op2(op, x, y): Evaluation error: ParseException: Unknown WKB type 12.
namely, MULTISURFACE
geometries (detected using st_geometry_type
):
as.data.frame(table(st_geometry_type(stat)))
## Var1 Freq
## 1 GEOMETRY 0
## 2 POINT 0
## 3 LINESTRING 0
## 4 POLYGON 0
## 5 MULTIPOINT 0
## 6 MULTILINESTRING 0
## 7 MULTIPOLYGON 3068
## 8 GEOMETRYCOLLECTION 0
## 9 CIRCULARSTRING 0
## 10 COMPOUNDCURVE 0
## 11 CURVEPOLYGON 0
## 12 MULTICURVE 0
## 13 MULTISURFACE 126
## 14 CURVE 0
## 15 SURFACE 0
## 16 POLYHEDRALSURFACE 0
## 17 TIN 0
## 18 TRIANGLE 0
We can “cast” the latter to MULTIPOLYGON
, as follows, to avoid the unsupported geometry type:
stat = st_cast(stat, "MULTIPOLYGON")
Now, all 3194 features in stat
are of type MULTIPOLYGON
:
as.data.frame(table(st_geometry_type(stat)))
## Var1 Freq
## 1 GEOMETRY 0
## 2 POINT 0
## 3 LINESTRING 0
## 4 POLYGON 0
## 5 MULTIPOINT 0
## 6 MULTILINESTRING 0
## 7 MULTIPOLYGON 3194
## 8 GEOMETRYCOLLECTION 0
## 9 CIRCULARSTRING 0
## 10 COMPOUNDCURVE 0
## 11 CURVEPOLYGON 0
## 12 MULTICURVE 0
## 13 MULTISURFACE 0
## 14 CURVE 0
## 15 SURFACE 0
## 16 POLYHEDRALSURFACE 0
## 17 TIN 0
## 18 TRIANGLE 0
It may also be necessary to make a topological correction, depending on the version of sf
and the linked GDAL and GEOS software:
stat = st_make_valid(stat)
The following expression calculates the area-weighted average population properties per “Nafa”:
x = st_interpolate_aw(stat, nafot, extensive = TRUE)
The resulting layer x
has the nafot
geometries, along with summed population estimates from stat
. It is shown in Figure 5.4:
plot(x, max.plot = 16)
Next, we are going to cluster the “Nafot” according to similarity in their age structure. First, we extract the attribute table of x
with the age group estimates per “Nafa”:
dat = st_drop_geometry(x)
dat[, 1:5]
## age_0_4 age_5_9 age_10_14 age_15_19 age_20_24
## 1 34308.157 35480.099 32369.329 29095.881 24620.472
## 2 127130.244 109620.934 91668.354 82691.547 80979.798
## 3 52838.467 51748.292 45701.403 42056.490 37653.274
## 4 44738.801 41258.817 36659.088 35189.068 37087.742
## 5 72960.980 72360.969 65454.855 56370.752 47684.046
## 6 143014.908 127973.492 113769.180 103933.860 95599.193
## 7 42729.350 42823.751 40581.731 39749.875 33351.647
## 8 40842.886 42077.430 40111.430 36793.293 31939.422
## 9 55514.217 50283.571 42664.732 39655.143 39107.549
## 10 56794.113 57174.737 57322.555 59915.198 52538.711
## 11 48496.599 47099.450 45340.007 47387.870 42124.138
## 12 81693.567 72531.221 63411.609 60628.745 56235.210
## 13 4541.896 4949.645 4886.759 5050.816 3791.931
## 14 10894.496 10116.174 9526.119 9484.625 8823.037
## 15 12651.048 11674.798 10784.952 10414.150 9279.833
We also set the rownames as the “Nafa” name, for convenience:
rownames(dat) = nafot$name_eng
dat[, 1:5]
## age_0_4 age_5_9 age_10_14 age_15_19 age_20_24
## Ramla 34308.157 35480.099 32369.329 29095.881 24620.472
## Tel Aviv 127130.244 109620.934 91668.354 82691.547 80979.798
## Rehovot 52838.467 51748.292 45701.403 42056.490 37653.274
## Haifa 44738.801 41258.817 36659.088 35189.068 37087.742
## Petah Tiqwa 72960.980 72360.969 65454.855 56370.752 47684.046
## Jerusalem 143014.908 127973.492 113769.180 103933.860 95599.193
## Hadera 42729.350 42823.751 40581.731 39749.875 33351.647
## Sharon 40842.886 42077.430 40111.430 36793.293 31939.422
## Ashqelon 55514.217 50283.571 42664.732 39655.143 39107.549
## Akko 56794.113 57174.737 57322.555 59915.198 52538.711
## Yizre'el 48496.599 47099.450 45340.007 47387.870 42124.138
## Be'er Sheva 81693.567 72531.221 63411.609 60628.745 56235.210
## Golan 4541.896 4949.645 4886.759 5050.816 3791.931
## Kinneret 10894.496 10116.174 9526.119 9484.625 8823.037
## Zefat 12651.048 11674.798 10784.952 10414.150 9279.833
The “Nafot” have variable population sizes. Therefore, it is necessary to standardize the population counts by totals, to work with proportions. To do that, we use sweep
to divide each row by its sum, thus transforming counts to proportions:
dat = sweep(dat, 1, rowSums(dat), "/")
dat[, 1:5]
## age_0_4 age_5_9 age_10_14 age_15_19 age_20_24
## Ramla 0.09756403 0.10089675 0.09205048 0.08274159 0.07001462
## Tel Aviv 0.08968093 0.07732941 0.06466520 0.05833273 0.05712522
## Rehovot 0.08780640 0.08599476 0.07594610 0.06988902 0.06257181
## Haifa 0.07696696 0.07098013 0.06306693 0.06053796 0.06380437
## Petah Tiqwa 0.09512865 0.09434633 0.08534194 0.07349783 0.06217185
## Jerusalem 0.12794951 0.11449258 0.10178457 0.09298532 0.08552864
## Hadera 0.09602274 0.09623488 0.09119654 0.08932717 0.07494887
## Sharon 0.08737008 0.09001098 0.08580536 0.07870729 0.06832401
## Ashqelon 0.10039783 0.09093817 0.07715945 0.07171659 0.07072626
## Akko 0.08797158 0.08856115 0.08879011 0.09280600 0.08138015
## Yizre'el 0.09384794 0.09114426 0.08773948 0.09170239 0.08151631
## Be'er Sheva 0.11874226 0.10542472 0.09216929 0.08812437 0.08173833
## Golan 0.09003600 0.09811900 0.09687238 0.10012454 0.07516912
## Kinneret 0.09559008 0.08876096 0.08358372 0.08321964 0.07741476
## Zefat 0.10276190 0.09483203 0.08760399 0.08459204 0.07537821
Now we can calculate the distance (i.e., the dissimilarity) between Nafot age structures using dist
:
d = dist(dat)
and calculate hierarchical clusters using hclust
:
hc = hclust(d, "average")
The dendrogram can then be split at a particular “level” to obtain a patricular number of distinct groups (such as k=3
). The named vector groups
specifies the classification of nafot
features into three distinct clusters 1
, 2
, and 3
:
k = 3
groups = cutree(hc, k = k)
groups
## Ramla Tel Aviv Rehovot Haifa Petah Tiqwa Jerusalem
## 1 2 1 2 1 3
## Hadera Sharon Ashqelon Akko Yizre'el Be'er Sheva
## 1 1 1 1 1 3
## Golan Kinneret Zefat
## 1 1 1
The dendrogram and the division to groups are shown in Figure 5.5:
plot(hc)
rect.hclust(hc, k = k)
The division to clusters can also be displayed on a map (Figure 5.6):
nafot$group = factor(groups)
plot(nafot["group"])
Finally, to visualize the age structure distribution within each cluster, we can reshape the data:
library(reshape2)
# Reshape
dat2 = dat
dat2$group = groups
dat2$name = rownames(dat2)
dat2 = melt(dat2, id.vars = c("group", "name"))
dat2$variable = gsub("age_", "", dat2$variable)
# Set age groups order
x = strsplit(dat2$variable, "_")
x = sapply(x, "[", 1)
x = as.numeric(x)
x = dat2$variable[order(x)]
x = unique(x)
dat2$variable = factor(dat2$variable, levels = x)
# Set cluster labels
x = split(dat2$name, dat2$group)
x = sapply(x, "[", 1)
dat2$group = factor(dat2$group, levels = 1:3, labels = x)
head(dat2)
## group name variable value
## 1 Ramla Ramla 0_4 0.09756403
## 2 Tel Aviv Tel Aviv 0_4 0.08968093
## 3 Ramla Rehovot 0_4 0.08780640
## 4 Tel Aviv Haifa 0_4 0.07696696
## 5 Ramla Petah Tiqwa 0_4 0.09512865
## 6 Jerusalem Jerusalem 0_4 0.12794951
and then use ggplot2
(Figure 5.7).
library(ggplot2)
library(ggrepel)
ggplot(dat2, aes(x = variable, y = value, group = name, colour = name)) +
geom_line() +
geom_point() +
geom_label_repel(data = dat2[dat2$variable == "75_up", ], aes(label = name)) +
scale_x_discrete("Age group") +
scale_y_continuous("Proportion") +
scale_colour_discrete(guide = FALSE) +
facet_wrap(~ group) +
theme_bw() +
theme(
axis.text.x = element_text(angle = 90, vjust = 0.5),
axis.text.y = element_text(angle = 90, hjust = 0.5),
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank()
)
Other:
sf
tutorial from useR!2017 conferencesf
tutorial from rstudio::conf 2018 conferenceThank you for listening!