fuzzySim is an R package for computing fuzzy similarity in species distributions. It can also produce the underlying fuzzy distribution data, e.g. through interpolating or modelling species presence-absence data.

Installing and loading fuzzySim

fuzzySim works within the free and open-source R statistical software, so you first need to download, install and open R (available at http://www.r-project.org). In this tutorial, in monospaced font are the commands that you need to type (or copy and paste) into the R console (and then press the enter key to execute them). For commands that generate visible results in R, these are usually shown below them, preceded by hash marks (##). Note that all commands are case-sensitive, so you must respect upper- and lower-case letters; that you must always use straight (', ") rather than curly quotes and apostrophes; and that R is only ready to receive a new command when there’s a prompt sign (>) at the end of the R console; if not, it’s still waiting for an operation to be finished or for you to complete a previous command – watch out for unclosed parentheses or such, and press Esc for a fresh prompt.

You can install the latest CRAN version of fuzzySim by pasting the command below in the R console (when connected to the internet). You may need to provide additional information if R asks you, such as selecting a CRAN mirror to download from (choose any).

install.packages("fuzzySim")

If you instead need the latest development version of the package, install it from R-Forge:

install.packages("fuzzySim", repos = "http://R-Forge.R-project.org")

You only need to install the package once (unless a new version becomes available), but you need to load it every time you open a new R session in which you intend to use fuzzySim (no need for an internet connection anymore), by pasting the following command in R:

library(fuzzySim)

Preparing the species occurrence data

NOTE: This package works mainly with species occurrence data in presence-absence tables. If your data are point occurrences and raster maps, check out this blog post on how to convert them into tabular presence-absence data.

Load the rotifers sample dataset that comes with the fuzzySim package, to use as an example:

data(rotifers)

This dataset was originally published by Fontaneto, Barbosa, Segers & Pautasso in Ecography. You can get more information on the data (the following command should open an R Documentation window):

help(rotifers)

Show the first 10 rows of the rotifers dataset:

head(rotifers, 10)
##     TDWG4                          species
## 1  DEN-OO Brachionus_plicatilis_plicatilis
## 11 DEN-OO  Keratella_cochlearis_cochlearis
## 15 DEN-OO      Keratella_quadrata_quadrata
## 16 DEN-OO             Asplanchna_priodonta
## 17 DEN-OO   Brachionus_angularis_angularis
## 19 DEN-OO                Filinia_longiseta
## 24 DEN-OO          Brachionus_calyciflorus
## 32 FIN-OO             Asplanchna_priodonta
## 37 FIN-OO  Keratella_cochlearis_cochlearis
## 46 FIN-OO          Brachionus_calyciflorus

The first column contains the identifiers of the spatial units, which are TDWG level 4 region codes, and the second column contains the (sub)species names. These are a bit long, especially if we intend to use them as column names further on, so use the spCodes function to add to the rotifers dataset a column named spcode with species name abbreviations, consisting of the first letter of the genus + the first 5 letters of the specific name. Specify that the character separating words in the input species names is an underscore (_, as you’ve seen in the table above), and that the character you want separating the genus from the specific name code is empty (no separator):

rotifers$spcode <- spCodes(rotifers$species, sep.species = "_", nchar.gen = 1, nchar.sp = 5, nchar.ssp = 0, sep.spcode = "")
## OK - no duplicated spcodes found.

You can try the above with different options for nchar.gen, nchar.sp and nchar.ssp; the function will return an error message if the resulting codes are not unique for each species. Find out more details on this function with help(spCodes).

Now show the first 10 rows of the rotifers dataset after you’ve added the spcode column:

head(rotifers, 10)
##     TDWG4                          species spcode
## 1  DEN-OO Brachionus_plicatilis_plicatilis Bplica
## 11 DEN-OO  Keratella_cochlearis_cochlearis Kcochl
## 15 DEN-OO      Keratella_quadrata_quadrata Kquadr
## 16 DEN-OO             Asplanchna_priodonta Apriod
## 17 DEN-OO   Brachionus_angularis_angularis Bangul
## 19 DEN-OO                Filinia_longiseta Flongi
## 24 DEN-OO          Brachionus_calyciflorus Bcalyc
## 32 FIN-OO             Asplanchna_priodonta Apriod
## 37 FIN-OO  Keratella_cochlearis_cochlearis Kcochl
## 46 FIN-OO          Brachionus_calyciflorus Bcalyc

The rotifers dataset is in long format, listing in the same column the species that are present in each spatial unit. For analyzing distributional relationships with fuzzySim, we need a presence-absence table with species in separate columns (wide format). So, create a new table called rotifers.presabs with the rotifers presence-absence data converted to wide format and using spcodes as column names:

rotifers.presabs <- splist2presabs(rotifers, sites.col = "TDWG4", sp.col = "spcode", keep.n = FALSE)

Show the first rows of the result:

head(rotifers.presabs)
##    TDWG4 Abrigh Afissa Apriod Bangul Bcalyc Bplica Bquadr Burceo Cgibba Edilat
## 1 ABT-OO      0      0      1      1      0      0      0      0      0      0
## 2 AFG-OO      1      0      1      1      1      1      1      1      1      0
## 3 AGE-BA      1      1      0      1      1      1      1      1      0      1
## 4 AGE-CD      0      0      0      1      0      1      1      0      0      1
## 5 AGE-CH      0      0      0      1      0      0      1      0      1      0
## 6 AGE-CN      0      0      0      0      1      1      1      0      0      1
##   Flongi Kcochl Kquadr Ktropi Lbulla Lclost Lhamat Lluna Llunar Lovali Lpatel
## 1      1      1      1      0      0      0      0     1      1      1      0
## 2      0      1      1      1      1      1      1     1      0      1      1
## 3      1      1      1      1      1      1      1     1      1      1      0
## 4      1      1      0      0      1      1      0     1      1      1      0
## 5      1      0      0      1      1      1      1     0      1      1      0
## 6      1      0      1      0      0      0      0     1      0      1      0
##   Lquadr Mventr Ppatul Pquadr Pvulga Specti Tpatin Tsimil Ttetra
## 1      0      0      1      0      1      1      0      0      1
## 2      1      0      1      1      1      0      1      1      0
## 3      1      1      1      1      1      0      1      0      1
## 4      1      1      0      0      1      0      0      1      0
## 5      1      1      1      1      1      0      0      1      1
## 6      0      1      1      1      1      0      0      1      0

Mapping the species occurrence data

You can map these data if you have a map of the same spatial units and with the same unit identifiers; the TDWG maps are available online. You can use R commands to download the TDWG level 4 map into your current R session. This requires the terra package; the following command will install terra within your R installation if it’s not there already (and if you’re connected to the internet).

if (!("terra" %in% .packages(all.available = TRUE)))
  install.packages("terra")

Now load the terra package and create a map named tdwg by importing the online file, using the vect function of terra:

library(terra)
tdwg <- vect("https://raw.githubusercontent.com/tdwg/wgsrpd/master/geojson/level4.geojson")

The map is now in your R session. Now add the rotifers.presabs data to the tdwg map’s attribute table within R, matching them by the name of the column containing the region identifiers in each table, and keeping all rows of tdwg in the final result:

tdwg <- merge(tdwg, rotifers.presabs, 
              by.x = "Level4_cod", by.y = "TDWG4", 
              all.x = TRUE)

You are now ready to map the presence-absence of particular species from the rotifers.presabs table:

plot(tdwg, y = "Abrigh", col = rev(heat.colors(2)), main = "Abrigh occurrence records")

Try this also with y = other species in names(tdwg). As you can see in the maps and in the rotifers.presabs table, these are binary (either 0 or 1) presence-absence data, with missing values in regions with no rotifer occurrence records in this dataset.

Converting binary to fuzzy occurrence data

You can get a fuzzy (continuous between 0 and 1) version of presence-absence data using e.g. trend surface analysis or inverse distance interpolation. These require the spatial coordinates of each study unit, which the rotifers table doesn’t have. So, load the rotif.env sample dataset that also comes with the fuzzySim package, which is already in wide format and contains the geographical coordinates of the centroid of each TDWG4 unit:

data(rotif.env)

Take a look at its first rows (results not shown here):

head(rotif.env)

Show the column names of this dataset, to see which columns contain the species data and which contain the coordinates:

names(rotif.env)
##  [1] "TDWG4"                    "LEVEL_NAME"              
##  [3] "REGION_NAME"              "CONTINENT"               
##  [5] "Area"                     "Altitude"                
##  [7] "AltitudeRange"            "HabitatDiversity"        
##  [9] "HumanPopulation"          "Latitude"                
## [11] "Longitude"                "Precipitation"           
## [13] "PrecipitationSeasonality" "TemperatureAnnualRange"  
## [15] "Temperature"              "TemperatureSeasonality"  
## [17] "UrbanArea"                "Abrigh"                  
## [19] "Afissa"                   "Apriod"                  
## [21] "Bangul"                   "Bcalyc"                  
## [23] "Bplica"                   "Bquadr"                  
## [25] "Burceo"                   "Cgibba"                  
## [27] "Edilat"                   "Flongi"                  
## [29] "Kcochl"                   "Kquadr"                  
## [31] "Ktropi"                   "Lbulla"                  
## [33] "Lclost"                   "Lhamat"                  
## [35] "Lluna"                    "Llunar"                  
## [37] "Lovali"                   "Lpatel"                  
## [39] "Lquadr"                   "Mventr"                  
## [41] "Ppatul"                   "Pquadr"                  
## [43] "Pvulga"                   "Specti"                  
## [45] "Tpatin"                   "Tsimil"                  
## [47] "Ttetra"

You can see that species are in columns 18 to 47 and geographical coordinates are in columns 10 and 11. You can use either the names or the index numbers of these columns in the multTSA and distPres functions below. Mind that the coordinates must be specified to the function in the order x, y, i.e. Longitude, Latitude!

First, try a multiple trend surface analysis (TSA) for all species using a 3rd-degree polynomial with stepwise selection of terms:

spcols <- 18:47

rotifers.tsa <- multTSA(rotif.env, sp.cols = spcols, coord.cols = c("Longitude", "Latitude"), id.col = 1, degree = 3, step = TRUE)

You can find out more about trend surface analysis and the different options of the multTSA function by reading the help file that appears when you type help(multTSA). Now look at the first rows of the rotifers.tsa table created just above:

head(rotifers.tsa)

Results (not shown here) are continuous values representing the spatial trend in each species’ occurrence. You can confirm that they are bounded between 0 and 1 (so that they can be used in fuzzy logic) by checking the range of values in all columns except the first one (which contains region identifiers rather than species data):

range(rotifers.tsa[ , -1])
## [1] 9.408071e-05 9.996004e-01

Now add the TSA results to the tdwg map table:

tdwg <- merge(tdwg, rotifers.tsa, 
              by.x = "Level4_cod", by.y = "TDWG4", 
              all = TRUE)

Plot the results for the first species (and then others as you like; check names(tdwg) for available y options):

plot(tdwg, y = "Abrigh_TS", col = rev(heat.colors(256)), main = "Abrigh TSA")

The TSA depicts a general spatial trend in a species’ occurrence, but this may not be a faithful representation of its (fuzzy) occurrence area (compare with the presence-absence map shown before for the same species). You can try multTSA again with different polynomial degrees, with or without stepwise selection; or you can compute inverse distance to presence to use instead of TSA:

rotifers.invdist <- distPres(rotif.env, sp.cols = spcols, coord.cols = c("Longitude", "Latitude"), id.col = 1, p = 1, inv = TRUE, suffix = "_D")
## Warning in distPres(rotif.env, sp.cols = spcols, coord.cols = c("Longitude", :
## Distances may be inaccurate when CRS is not provided.

You can check help(distPres) for more information and options for this function (for example, you may want to use p=2 for a more conservative squared distance, especially if your spatial units are smaller). Check out the first rows of the resulting table (results not shown here):

head(rotifers.invdist)

You can check that the values in this table (excluding the first column, which has the region identifiers) also range between 0 and 1, so they can be used with fuzzy logic:

range(rotifers.invdist[ , -1])
## [1] 0 1

Note that inverse distance to presence is computed only for absence localities; distPres maintains the value 1 for presences. Now add these distances to the tdwg map table and plot the first species (then try other species as well):

tdwg <- merge(tdwg, rotifers.invdist, 
              by.x = "Level4_cod", by.y = "TDWG4", 
              all = TRUE)

plot(tdwg, y = "Abrigh_D", col = rev(heat.colors(256)), main = "Abrigh inverse distance")