fuzzySim
R packagefuzzySim
is an R package for computing
fuzzy similarity in species distributions. It can also
produce the underlying fuzzy distribution
data, e.g. through interpolating or modelling species
presence-absence data.
fuzzySim
fuzzySim
works within the free and open-source R
statistical software, so you first need to download, install and
open R (available at http://www.r-project.org). In this tutorial, in
monospaced
font are the commands that you need to type (or
copy and paste) into the R console (and then press the enter
key to execute them). For commands that generate visible results in R,
these are usually shown below them, preceded by hash marks (##). Note
that all commands are case-sensitive, so you must
respect upper- and lower-case letters; that you must always use
straight ('
, "
)
rather than curly quotes and apostrophes; and that
R is only ready to receive a new command when there’s a prompt
sign (>) at the end of the R console; if not, it’s still
waiting for an operation to be finished or for you to complete a
previous command – watch out for unclosed parentheses or such, and press
Esc for a fresh prompt.
You can install the latest CRAN version of fuzzySim
by
pasting the command below in the R console (when connected to the
internet). You may need to provide additional information if R asks you,
such as selecting a CRAN mirror to download from (choose any).
install.packages("fuzzySim")
If you instead need the latest development version of the package, install it from R-Forge:
install.packages("fuzzySim", repos = "http://R-Forge.R-project.org")
You only need to install the package once (unless a new version
becomes available), but you need to load it every time you open
a new R session in which you intend to use
fuzzySim
(no need for an internet connection anymore), by
pasting the following command in R:
library(fuzzySim)
NOTE: This package works mainly with species occurrence data in presence-absence tables. If your data are point occurrences and raster maps, check out this blog post on how to convert them into tabular presence-absence data.
Load the rotifers sample dataset that comes with the
fuzzySim
package, to use as an example:
data(rotifers)
This dataset was originally published by Fontaneto, Barbosa, Segers & Pautasso in Ecography. You can get more information on the data (the following command should open an R Documentation window):
help(rotifers)
Show the first 10 rows of the rotifers dataset:
head(rotifers, 10)
## TDWG4 species
## 1 DEN-OO Brachionus_plicatilis_plicatilis
## 11 DEN-OO Keratella_cochlearis_cochlearis
## 15 DEN-OO Keratella_quadrata_quadrata
## 16 DEN-OO Asplanchna_priodonta
## 17 DEN-OO Brachionus_angularis_angularis
## 19 DEN-OO Filinia_longiseta
## 24 DEN-OO Brachionus_calyciflorus
## 32 FIN-OO Asplanchna_priodonta
## 37 FIN-OO Keratella_cochlearis_cochlearis
## 46 FIN-OO Brachionus_calyciflorus
The first column contains the identifiers of the spatial units, which are TDWG level 4 region codes, and the second column contains the (sub)species names. These are a bit long, especially if we intend to use them as column names further on, so use the spCodes function to add to the rotifers dataset a column named spcode with species name abbreviations, consisting of the first letter of the genus + the first 5 letters of the specific name. Specify that the character separating words in the input species names is an underscore (_, as you’ve seen in the table above), and that the character you want separating the genus from the specific name code is empty (no separator):
rotifers$spcode <- spCodes(rotifers$species, sep.species = "_", nchar.gen = 1, nchar.sp = 5, nchar.ssp = 0, sep.spcode = "")
## OK - no duplicated spcodes found.
You can try the above with different options for nchar.gen,
nchar.sp and nchar.ssp; the function will return an
error message if the resulting codes are not unique for each species.
Find out more details on this function with
help(spCodes)
.
Now show the first 10 rows of the rotifers dataset after you’ve added the spcode column:
head(rotifers, 10)
## TDWG4 species spcode
## 1 DEN-OO Brachionus_plicatilis_plicatilis Bplica
## 11 DEN-OO Keratella_cochlearis_cochlearis Kcochl
## 15 DEN-OO Keratella_quadrata_quadrata Kquadr
## 16 DEN-OO Asplanchna_priodonta Apriod
## 17 DEN-OO Brachionus_angularis_angularis Bangul
## 19 DEN-OO Filinia_longiseta Flongi
## 24 DEN-OO Brachionus_calyciflorus Bcalyc
## 32 FIN-OO Asplanchna_priodonta Apriod
## 37 FIN-OO Keratella_cochlearis_cochlearis Kcochl
## 46 FIN-OO Brachionus_calyciflorus Bcalyc
The rotifers dataset is in long format, listing in the same
column the species that are present in each spatial unit. For analyzing
distributional relationships with fuzzySim
, we need a
presence-absence table with species in separate columns (wide format).
So, create a new table called rotifers.presabs with the
rotifers presence-absence data converted to wide format and using
spcodes as column names:
rotifers.presabs <- splist2presabs(rotifers, sites.col = "TDWG4", sp.col = "spcode", keep.n = FALSE)
Show the first rows of the result:
head(rotifers.presabs)
## TDWG4 Abrigh Afissa Apriod Bangul Bcalyc Bplica Bquadr Burceo Cgibba Edilat
## 1 ABT-OO 0 0 1 1 0 0 0 0 0 0
## 2 AFG-OO 1 0 1 1 1 1 1 1 1 0
## 3 AGE-BA 1 1 0 1 1 1 1 1 0 1
## 4 AGE-CD 0 0 0 1 0 1 1 0 0 1
## 5 AGE-CH 0 0 0 1 0 0 1 0 1 0
## 6 AGE-CN 0 0 0 0 1 1 1 0 0 1
## Flongi Kcochl Kquadr Ktropi Lbulla Lclost Lhamat Lluna Llunar Lovali Lpatel
## 1 1 1 1 0 0 0 0 1 1 1 0
## 2 0 1 1 1 1 1 1 1 0 1 1
## 3 1 1 1 1 1 1 1 1 1 1 0
## 4 1 1 0 0 1 1 0 1 1 1 0
## 5 1 0 0 1 1 1 1 0 1 1 0
## 6 1 0 1 0 0 0 0 1 0 1 0
## Lquadr Mventr Ppatul Pquadr Pvulga Specti Tpatin Tsimil Ttetra
## 1 0 0 1 0 1 1 0 0 1
## 2 1 0 1 1 1 0 1 1 0
## 3 1 1 1 1 1 0 1 0 1
## 4 1 1 0 0 1 0 0 1 0
## 5 1 1 1 1 1 0 0 1 1
## 6 0 1 1 1 1 0 0 1 0
You can map these data if you have a map of the same spatial units
and with the same unit identifiers; the TDWG maps are available online. You can
use R commands to download the TDWG level 4 map into your current R
session. This requires the terra
package; the following
command will install terra
within your R installation if
it’s not there already (and if you’re connected to the internet).
if (!("terra" %in% .packages(all.available = TRUE)))
install.packages("terra")
Now load the terra
package and create a map named
tdwg by importing the online file, using the vect
function of terra
:
library(terra)
tdwg <- vect("https://raw.githubusercontent.com/tdwg/wgsrpd/master/geojson/level4.geojson")
The map is now in your R session. Now add the rotifers.presabs data to the tdwg map’s attribute table within R, matching them by the name of the column containing the region identifiers in each table, and keeping all rows of tdwg in the final result:
tdwg <- merge(tdwg, rotifers.presabs,
by.x = "Level4_cod", by.y = "TDWG4",
all.x = TRUE)
You are now ready to map the presence-absence of particular species from the rotifers.presabs table:
plot(tdwg, y = "Abrigh", col = rev(heat.colors(2)), main = "Abrigh occurrence records")
Try this also with y = other species in
names(tdwg)
. As you can see in the maps and in the
rotifers.presabs table, these are binary (either 0 or 1)
presence-absence data, with missing values in regions with no rotifer
occurrence records in this dataset.
You can get a fuzzy (continuous between 0 and 1) version of
presence-absence data using e.g. trend surface analysis or inverse
distance interpolation. These require the spatial coordinates of each
study unit, which the rotifers table doesn’t have. So, load the
rotif.env sample dataset that also comes with the
fuzzySim
package, which is already in wide format and
contains the geographical coordinates of the centroid of each TDWG4
unit:
data(rotif.env)
Take a look at its first rows (results not shown here):
head(rotif.env)
Show the column names of this dataset, to see which columns contain the species data and which contain the coordinates:
names(rotif.env)
## [1] "TDWG4" "LEVEL_NAME"
## [3] "REGION_NAME" "CONTINENT"
## [5] "Area" "Altitude"
## [7] "AltitudeRange" "HabitatDiversity"
## [9] "HumanPopulation" "Latitude"
## [11] "Longitude" "Precipitation"
## [13] "PrecipitationSeasonality" "TemperatureAnnualRange"
## [15] "Temperature" "TemperatureSeasonality"
## [17] "UrbanArea" "Abrigh"
## [19] "Afissa" "Apriod"
## [21] "Bangul" "Bcalyc"
## [23] "Bplica" "Bquadr"
## [25] "Burceo" "Cgibba"
## [27] "Edilat" "Flongi"
## [29] "Kcochl" "Kquadr"
## [31] "Ktropi" "Lbulla"
## [33] "Lclost" "Lhamat"
## [35] "Lluna" "Llunar"
## [37] "Lovali" "Lpatel"
## [39] "Lquadr" "Mventr"
## [41] "Ppatul" "Pquadr"
## [43] "Pvulga" "Specti"
## [45] "Tpatin" "Tsimil"
## [47] "Ttetra"
You can see that species are in columns 18 to 47 and geographical
coordinates are in columns 10 and 11. You can use either the names or
the index numbers of these columns in the multTSA
and
distPres
functions below. Mind that the coordinates
must be specified to the function in the order x, y,
i.e. Longitude, Latitude!
First, try a multiple trend surface analysis (TSA) for all species using a 3rd-degree polynomial with stepwise selection of terms:
spcols <- 18:47
rotifers.tsa <- multTSA(rotif.env, sp.cols = spcols, coord.cols = c("Longitude", "Latitude"), id.col = 1, degree = 3, step = TRUE)
You can find out more about trend surface analysis and the different
options of the multTSA
function by reading the help file
that appears when you type help(multTSA)
. Now look at the
first rows of the rotifers.tsa table created just above:
head(rotifers.tsa)
Results (not shown here) are continuous values representing the spatial trend in each species’ occurrence. You can confirm that they are bounded between 0 and 1 (so that they can be used in fuzzy logic) by checking the range of values in all columns except the first one (which contains region identifiers rather than species data):
range(rotifers.tsa[ , -1])
## [1] 9.408071e-05 9.996004e-01
Now add the TSA results to the tdwg map table:
tdwg <- merge(tdwg, rotifers.tsa,
by.x = "Level4_cod", by.y = "TDWG4",
all = TRUE)
Plot the results for the first species (and then
others as you like; check names(tdwg)
for available
y options):
plot(tdwg, y = "Abrigh_TS", col = rev(heat.colors(256)), main = "Abrigh TSA")
The TSA depicts a general spatial trend in a species’ occurrence, but
this may not be a faithful representation of its (fuzzy) occurrence area
(compare with the presence-absence map shown before for the same
species). You can try multTSA
again with different
polynomial degrees, with or without stepwise selection; or you can
compute inverse distance to presence to use instead of
TSA:
rotifers.invdist <- distPres(rotif.env, sp.cols = spcols, coord.cols = c("Longitude", "Latitude"), id.col = 1, p = 1, inv = TRUE, suffix = "_D")
## Warning in distPres(rotif.env, sp.cols = spcols, coord.cols = c("Longitude", :
## Distances may be inaccurate when CRS is not provided.
You can check help(distPres)
for more information and
options for this function (for example, you may want to use p=2
for a more conservative squared distance, especially if your spatial
units are smaller). Check out the first rows of the resulting
table (results not shown here):
head(rotifers.invdist)
You can check that the values in this table (excluding the first column, which has the region identifiers) also range between 0 and 1, so they can be used with fuzzy logic:
range(rotifers.invdist[ , -1])
## [1] 0 1
Note that inverse distance to presence is computed only for absence
localities; distPres
maintains the value 1 for presences.
Now add these distances to the tdwg map table and plot
the first species (then try other species as well):
tdwg <- merge(tdwg, rotifers.invdist,
by.x = "Level4_cod", by.y = "TDWG4",
all = TRUE)
plot(tdwg, y = "Abrigh_D", col = rev(heat.colors(256)), main = "Abrigh inverse distance")