Bioclim SDM explained

Modified on Wed, 22 Feb, 2023 at 3:37 PM

Introduction

Bioclim is a so-called envelope-style method that uses only occurrence data to define a multi-dimensional environmental space in which a species can occur. This space is constructed as a bounding box around the minimum and maximum values of the environmental variables for all occurrences, resulting in a multi-dimensional rectilinear envelope (Figure 1). To avoid the over-predictive effect of outliers, the resulting envelope can be reduced by specified percentiles or standard deviations.
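
The envelope construction described above can be sketched in base R. This is a minimal illustration rather than the 'dismo' implementation: the data frame `occ`, its columns `temp` and `rain`, and the helper functions `make_envelope` and `in_envelope` are hypothetical names chosen for the example.

```r
# Hypothetical occurrence records: environmental values extracted at
# each presence location (names and numbers are illustrative only).
occ <- data.frame(
  temp = c(12, 14, 15, 15, 16, 17, 18, 19, 21, 30),  # 30 is an outlier
  rain = c(400, 520, 560, 600, 640, 700, 720, 800, 850, 900)
)

# Build a rectilinear envelope: one lower and one upper bound per variable.
# trim = 0 uses the full min-max range; trim = 0.05 uses the 5th-95th
# percentiles, shrinking the box to reduce the influence of outliers.
make_envelope <- function(env, trim = 0) {
  sapply(env, quantile, probs = c(trim, 1 - trim), names = FALSE)
}

# A location lies inside the envelope only if every variable is in bounds.
in_envelope <- function(site, envelope) {
  all(site >= envelope[1, ] & site <= envelope[2, ])
}

full    <- make_envelope(occ)               # min-max bounding box
trimmed <- make_envelope(occ, trim = 0.05)  # percentile-reduced box

site <- c(temp = 28, rain = 650)
in_envelope(site, full)     # TRUE: inside the min-max box
in_envelope(site, trimmed)  # FALSE: excluded once the box is trimmed
```

In this sketch the single warm outlier stretches the full envelope far enough to include the test site, while the trimmed envelope leaves it out, which is the over-prediction effect that percentile reduction is meant to limit.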

To predict the probability of species occurrence at any given location, BIOCLIM compares the values of the environmental variables at that location to the percentile distribution of the values from known locations. The 50th percentile is the median, which divides the data in half. The closer the value at an unknown location is to the 50th percentile, the more suitable the location is considered for the species and the higher the predicted probability of occurrence, up to a maximum of 1 at the median. The two tails of the distribution are not distinguished: the 10th percentile is treated as equal to the 90th percentile, and both receive the same probability value. BIOCLIM combines the scores for each environmental variable into an overall probability of occurrence for each location, giving all environmental variables equal weight. Its simplicity has made BIOCLIM widely used, although it is acknowledged not to perform as well as other modelling methods.
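
A minimal base R sketch of the percentile idea for a single variable follows. The function names `percentile_rank` and `tail_distance` and the values in `known` are assumptions made for illustration. The rank is computed here as the share of known values below the new value, with ties counted as half, which is one common convention and not necessarily the one every BIOCLIM implementation uses.

```r
# Hypothetical values of one environmental variable at known occurrences.
known <- c(10, 12, 13, 15, 16, 18, 19, 21, 22, 25)

# Percentile rank of a new value within the known values:
# fraction strictly below, plus half of the fraction that tie.
percentile_rank <- function(value, reference) {
  (sum(reference < value) + 0.5 * sum(reference == value)) / length(reference)
}

# Distance from the median (rank 0.5), ignoring which tail the value is in.
# Smaller distances correspond to higher suitability.
tail_distance <- function(value, reference) {
  abs(percentile_rank(value, reference) - 0.5)
}

percentile_rank(11, known)  # 0.1: 10th percentile, lower tail
percentile_rank(23, known)  # 0.9: 90th percentile, upper tail

# Both tails are treated alike (all.equal allows for rounding error).
isTRUE(all.equal(tail_distance(11, known), tail_distance(23, known)))

tail_distance(17, known)    # 0: at the median, the most suitable value
```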

On Biosecurity Commons, BIOCLIM is implemented using the 'dismo' R package. There, percentile scores larger than 0.5 are subtracted from 1 to transform upper tail values into the lower tail, so that every score lies between 0 and 0.5. Then, the minimum percentile score across all environmental variables is used as the overall score for an unknown location. By using the minimum across all variables, the model predicts that a species' chance of occurring at any grid cell is determined by the most limiting environmental variable, that is, the one with the lowest percentile score. The final score is multiplied by two so that the results lie between 0 and 1. The developers of the 'dismo' package implemented this scaling to make the results more similar to those of other species distribution modelling methods and easier to interpret. Values of 1 will rarely be observed, as this would require a location to have the optimal (median) value for all environmental variables. Values of 0 are very common, as they are assigned to all cells that have at least one environmental variable with a value outside the range observed at the known locations.
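
The scoring steps above can be sketched in base R as follows. This is an illustrative reimplementation under stated assumptions, not the 'dismo' source code: the function names `folded_rank` and `bioclim_score` and the matrix `train` are hypothetical, and ties are counted as half when ranking. The sketch doubles the folded percentile (which lies between 0 and 0.5), so that a value at the median maps to 1 and a value outside the observed range maps to 0.

```r
# Hypothetical training data: environmental values at known occurrences,
# one row per occurrence and one column per variable.
train <- cbind(
  temp = c(10, 12, 14, 16, 18, 20, 22, 24),
  rain = c(300, 400, 500, 600, 700, 800, 900, 1000)
)

# Percentile rank of a value, with upper-tail ranks folded onto the lower
# tail (ranks above 0.5 become 1 - rank), so the result is in [0, 0.5].
folded_rank <- function(value, reference) {
  r <- (sum(reference < value) + 0.5 * sum(reference == value)) /
    length(reference)
  if (r > 0.5) 1 - r else r
}

# Overall score for one site: the minimum folded rank across variables
# (the most limiting variable decides), doubled to span 0 to 1.
bioclim_score <- function(site, train) {
  ranks <- vapply(
    colnames(train),
    function(v) folded_rank(site[[v]], train[, v]),
    numeric(1)
  )
  2 * min(ranks)
}

bioclim_score(c(temp = 17, rain = 650), train)  # 1: both variables at the median
bioclim_score(c(temp = 17, rain = 350), train)  # 0.25: rain is limiting
bioclim_score(c(temp = 30, rain = 650), train)  # 0: temp outside observed range
```

The second call shows the effect of taking the minimum: temperature is ideal, yet the score is set entirely by the less suitable rainfall value.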

BIOCLIM was the first species distribution modelling (SDM) package that linked spatially explicit species occurrence data with maps of environmental variables. It was developed in Australia under the leadership of Henry Nix.

Advantages

Simple and intuitive
Presence only model, no absence data needed
Provides ranking of environmental predictor variables
Useful in teaching species distribution modelling

Limitations

Susceptible to overprediction
Does not account for the interaction between predictors
Cannot use categorical variables
Does not make quantitative predictions or provide confidence levels

Assumptions

Bioclim was developed primarily to model species distributions in relation to climatic variables. It therefore assumes that species occurrence is influenced by climate at the spatial scale of the climate variables used, and that these variables are normally distributed.

Requires absence data

No.

Model configuration options

Random seed

Seed used for generating random values. Using the same seed value, e.g. 123, ensures that running the same model with the same data and settings generates the same result, despite stochastic processes such as machine learning or cross-validation. The default, NULL, is not to set a random seed.
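
The effect of a fixed seed can be illustrated with base R's `set.seed()`. The sketch below assumes a hypothetical random selection of 4 test records out of 20 occurrence records, as might happen when splitting data for cross-validation; the seed value 123 and the variable names are examples only.

```r
n_records <- 20  # hypothetical number of occurrence records

# With the same seed, the random selection is identical on every run.
set.seed(123)
test_rows_a <- sample(n_records, size = 4)

set.seed(123)
test_rows_b <- sample(n_records, size = 4)

identical(test_rows_a, test_rows_b)  # TRUE: same seed, same selection

# Without resetting the seed, the next draw continues the random stream
# and will generally differ from the first.
test_rows_c <- sample(n_records, size = 4)
```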

Bioclim in R

Usage

dismo::bioclim(x, p, ...)

Arguments

x
Raster* object or matrix (including a RasterBrick of environmental variables)

p
two-column matrix or SpatialPoints* object

...
Additional arguments

Value

An object of class 'Bioclim' (inherits from DistModel-class)

Author(s)

Robert J. Hijmans

Examples

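library(dismo)  # also attaches the raster and sp packages it depends on

# Predictors: the three-layer R logo raster shipped with the raster package
logo <- stack(system.file("external/rlogo.grd", package="raster"))

# Presence data: 20 points, x coordinates in column 1 and y in column 2
pts <- matrix(c(48.243420, 48.243420, 47.985820, 52.880230, 49.531423, 46.182616, 54.168232,
  69.624263, 83.792291, 85.337894, 74.261072, 83.792291, 95.126713, 84.565092, 66.275456, 41.803408,
  25.832176, 3.936132, 18.876962, 17.331359, 7.048974, 13.648543, 26.093446, 28.544714, 39.104026,
  44.572240, 51.171810, 56.262906, 46.269272, 38.161230, 30.618865, 21.945145, 34.390047, 59.656971,
  69.839163, 73.233228, 63.239594, 45.892154, 43.252326, 28.356155), ncol=2)

# Fit the model from the raster predictors and the presence points
bc <- bioclim(logo, pts)

# Or extract the predictor values first and fit from the resulting matrix
v <- extract(logo, pts)
bc <- bioclim(v)

# Predict for every cell; tails sets, for each layer in turn, which tail(s) to use
p1 <- predict(logo, bc)
p2 <- predict(logo, bc, tails=c('both', 'low', 'high'))

# Or supply the presence points as a SpatialPoints object
#sp <- SpatialPoints(pts)
#bc <- bioclim(logo, sp)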

References

Araujo MB, Peterson AT (2012) Uses and misuses of bioclimatic envelope modeling. Ecology, 93(7): 1527-1539.
Booth TH, Nix HA, Busby JR, Hutchinson MF (2014) BIOCLIM: the first species distribution modelling package, its early applications and relevance to most current MAXENT studies. Diversity and Distributions, 20(1): 1-9.
Hijmans RJ, Elith J (2015) Species distribution modeling with R.