Inverse-Distance Weighting, Voronoi Hull, Convex Hull, Geographic Distance and Circles
These are all geographic models: simple models that require no environmental predictor variables. They are typically run early in an analysis to produce results that feed into later steps of the modelling process. Each is described individually below.
Inverse-Distance Weighted Model
Introduction
Inverse-Distance Weighted Model is a geographical model that uses the location of known occurrences and predicts that the likelihood of finding a species in an area depends on the distance of that area to a known occurrence point. It is different from the Geographic Distance model, in that it explicitly implements the assumption that points that are close to one another are more alike than those that are farther apart. The probability of species occurrence for an unknown location is calculated as the average of some number of surrounding known locations weighted by their inverse distance from the unknown location. The values of known locations closest to the unknown location have more influence on the predicted value than values of locations farther away. Thus, the values of nearby known locations have greater weights, and the weights decrease as a function of distance, hence the name ‘inverse-distance weighted’.
This model does not use the input of environmental variables to predict the distribution of a species. It can be an easy way to generate a bias layer from survey locations.
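The weighted average described above can be sketched in a few lines of Python. This is an illustration only, not the platform's implementation: the function name `idw_predict`, the `power` exponent, the optional neighbour count `k`, and the use of planar coordinates are all assumptions made for the example. Known locations carry a value of 1 (presence) or 0 (absence).

```python
import math

def idw_predict(known, target, power=2.0, k=None):
    """Inverse-distance weighted average at `target`.

    known  : list of (x, y, value) tuples, value 1 = presence, 0 = absence
    target : (x, y) location to predict
    power  : exponent applied to the distance (assumed; 2 is a common choice)
    k      : optionally use only the k nearest known locations
    """
    dists = []
    for x, y, value in known:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            # The target coincides with a known location: return its value.
            return float(value)
        dists.append((d, value))

    dists.sort(key=lambda pair: pair[0])
    if k is not None:
        dists = dists[:k]

    # Weights shrink as distance grows, so nearby locations dominate.
    weights = [1.0 / d ** power for d, _ in dists]
    weighted_sum = sum(w * v for w, (_, v) in zip(weights, dists))
    return weighted_sum / sum(weights)

known = [(0.0, 0.0, 1), (1.0, 0.0, 1), (10.0, 10.0, 0)]
near = idw_predict(known, (0.5, 0.5))   # close to the two presences
far = idw_predict(known, (9.0, 9.0))    # close to the absence
```

Here `near` comes out close to 1 and `far` close to 0, because in each case the nearest known locations carry almost all of the weight.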
Advantages
Simple and easy to interpret (but less so than other geographic models)
Limitations
Does not use environmental variables to predict species occurrence
Assumptions
N/A
Requires absence data
Yes
Configuration options
Biosecurity Commons allows the user to set model arguments as specified below.
random_seed | Seed used for generating random values. Using the same seed value, e.g. 123, ensures that running the same model with the same data and settings generates the same result, despite stochastic processes such as machine learning or cross-validation. |
Tails (tails) | The "tails" argument can be used to ignore the left or right tail of the percentile distribution for a variable. If supplied, tails should be a character vector with a length equal to the number of variables used in the model. Valid values are "both", "low" and "high". |
References
- Franklin, J. (2010). Mapping species distributions: Spatial inference and prediction. Cambridge University Press.
- Hijmans, R. J., & Elith, J. (2015). Species distribution modeling with R.
Voronoi Hull
Introduction
Voronoi Hull Model is a geographical model that uses the location of known occurrences and predicts that a species is present inside Voronoi hulls around observed occurrences, and absent outside those hulls. To create a Voronoi hull, a polygon is drawn around each known location, consisting of all points whose distance to that known location is less than or equal to their distance to any other known location.
This model does not use the input of environmental variables to predict the distribution of a species.
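Because each Voronoi polygon contains exactly the points that are closest to its own known location, deciding which polygon a location falls in is the same as finding its nearest known location. The sketch below uses that equivalence. It assumes that polygons are built around both presence and absence records and that only the presence polygons count as predicted presence; the function name `voronoi_predict` and the planar coordinates are hypothetical choices for the example.

```python
import math

def voronoi_predict(presences, absences, target):
    """Return 1 if `target` falls in the Voronoi polygon of a presence, else 0.

    presences, absences : lists of (x, y) known locations
    target              : (x, y) location to predict
    """
    labelled = [(p, 1) for p in presences] + [(a, 0) for a in absences]

    def distance_to_target(item):
        (x, y), _label = item
        return math.hypot(x - target[0], y - target[1])

    # The nearest known location owns the polygon that contains the target.
    _nearest_point, label = min(labelled, key=distance_to_target)
    return label

presences = [(0.0, 0.0), (1.0, 1.0)]
absences = [(5.0, 5.0)]
inside = voronoi_predict(presences, absences, (0.2, 0.1))
outside = voronoi_predict(presences, absences, (4.5, 4.8))
```

In this toy example `inside` is 1 because the nearest record is a presence, and `outside` is 0 because the nearest record is the absence.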
Advantages
Relatively simple and easy to interpret
Limitations
Does not use environmental variables to predict species occurrence
Assumptions
N/A
Requires absence data
Yes
Configuration options
Biosecurity Commons allows the user to set model arguments as specified below.
random_seed | Setting a random seed would not change this model. |
References
- Hijmans, R. J., & Elith, J. (2015). Species distribution modeling with R.
Convex Hull
Introduction
Convex Hull is a geographical model that uses the location of known occurrences and predicts that a species can be present within a spatial convex hull around these occurrence points. A convex hull is the smallest convex polygon that encloses all occurrence points. For any two points inside the hull, the line between them has to fall completely within the hull.
This model does not use the input of environmental variables to predict the distribution of a species.
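To make the idea concrete, the following sketch builds a convex hull with Andrew's monotone chain algorithm, tests whether a location falls inside it, and computes the hull area. It is a minimal illustration under stated assumptions, not the platform's implementation: coordinates are assumed to be planar (for example projected kilometres), and the names `convex_hull`, `in_hull` and `hull_area` are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it would create a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]

def in_hull(hull, q):
    """True if q is inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))

def hull_area(hull):
    """Polygon area from the shoelace formula."""
    n = len(hull)
    twice_area = sum(
        hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0

occurrences = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = convex_hull(occurrences)
```

Only the four outermost records end up as hull vertices; the two interior records do not change the predicted range. This is why the outermost points, including any erroneous ones, determine the result.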
The International Union for Conservation of Nature (IUCN) uses the convex hull method to estimate the extent of occurrence for species, with the adaptation that large areas with obvious unsuitable habitat (such as ocean for terrestrial species) are excluded. Species are considered to be critically endangered if the extent of occurrence is < 100 km², endangered when the extent of occurrence is < 5,000 km², and vulnerable when the extent of occurrence is < 20,000 km².
Advantages
- Simple and easy to interpret
- Presence only model, no absence data needed
Limitations
- Does not use environmental variables to predict species occurrence
- Likely to over- or underestimate the species range, for three reasons:
First, a convex hull assumes that a species is distributed throughout the entirety of the hull. Major overpredictions can arise if the hull includes biomes in which the species is not found. In the case of a coastal species, for example, a convex hull would include the whole of Australia rather than just the coastal areas where the species is found.
Second, convex hulls are susceptible to underestimation if the data are not representative of all the places where a species actually occurs. In the image below, the northern part of the species distribution is not captured because there are no available data points from this area. This problem is common for species that occur in remote parts of Australia where there is little sampling.
Third, convex hulls are vulnerable to errors in occurrence data. If you are attempting to capture the distribution of a wild species, an occurrence record from a zoo or museum would result in an overestimation of the species range. Errors in the recorded latitude and longitude are also not uncommon, but these kinds of errors can be reduced by filtering your data.
Assumptions
That a species' distribution in the wild is well captured by the outermost points of the available occurrence data.
Requires absence data
No
Configuration options
Biosecurity Commons allows the user to set model arguments as specified below.
random_seed | A random seed will not impact this model. |
Tails (tails) | The "tails" argument can be used to ignore the left or right tail of the percentile distribution for a variable. If supplied, tails should be a character vector with a length equal to the number of variables used in the model. Valid values are "both", "low" and "high". (default = NULL) |
References
- Burgman, M. A., & Fox, J. C. (2003). Bias in species range estimates from minimum convex polygons: Implications for conservation and options for improved planning. Animal Conservation Forum, 6(1), 19–28.
- Hijmans, R. J., Phillips, S., Leathwick, J., & Elith, J. (2015). dismo: Species distribution modeling. R package version 1-1.
- IUCN (2001). Red list categories and criteria: version 3.1. IUCN, Gland, Switzerland and Cambridge, UK.
Geographic Distance
Introduction
Geographic Distance is a geographical model that uses the location of known occurrences and predicts that the likelihood of finding a species in an area depends on the distance of that area to a known occurrence point. The predicted values are the inverse linear distance to the nearest known presence point. Distances smaller than or equal to zero are set to 1 (highest score).
This model does not use the input of environmental variables to predict the distribution of a species.
It is another very simple way to generate a bias layer.
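The scoring rule can be sketched as follows. This is a hedged illustration rather than the platform's code: the function name `geo_distance_score` and the planar coordinates in metres are assumptions, the `scale` argument mirrors the configuration option of the same name, and capping scores at 1 for locations closer than one scale unit is an assumption made here to keep the score between 0 and 1.

```python
import math

def geo_distance_score(presences, target, scale=1000.0):
    """Inverse of the scaled distance from `target` to the nearest presence.

    presences : list of (x, y) presence locations, in metres
    target    : (x, y) location to score, in metres
    scale     : divisor applied to the distance before inverting it
    """
    nearest = min(
        math.hypot(px - target[0], py - target[1]) for px, py in presences
    )
    scaled = nearest / scale
    if scaled <= 0:
        # Zero distance: the target is on a presence point (highest score).
        return 1.0
    # Assumption: cap at 1 so locations within one scale unit score 1.
    return min(1.0, 1.0 / scaled)

presences = [(0.0, 0.0), (5000.0, 0.0)]
on_point = geo_distance_score(presences, (0.0, 0.0))
two_km = geo_distance_score(presences, (0.0, 2000.0))
four_km = geo_distance_score(presences, (5000.0, 4000.0))
```

With the default scale of 1000, a location 2 km from the nearest presence scores 0.5 and one 4 km away scores 0.25, so the score halves each time the distance doubles.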
Advantages
- Simple and easy to interpret
Limitations
- Does not use environmental variables to predict species occurrence
Assumptions
N/A
Requires absence data
No
Configuration options
Biosecurity Commons allows the user to set model arguments as specified below.
random_seed | Setting a random seed will not impact this model. |
scale | Scale (in metres) by which the distance from occurrence records is divided before the inverse distance is computed. (default = 1000) |
Tails (tails) | The "tails" argument can be used to ignore the left or right tail of the percentile distribution for a variable. If supplied, tails should be a character vector with a length equal to the number of variables used in the model. Valid values are "both", "low" and "high". (default = NULL) |
References
- Hijmans, R. J., & Elith, J. (2015). Species distribution modeling with R.
Circles
Introduction
Circles is a geographical model that uses the location of known occurrences and predicts that a species can be present within a circle with a given radius around these occurrence points. This model does not use the input of environmental variables to predict the distribution of a species.
By default, the radius is computed from the mean of all distances between points. This can be a very large distance, for example if you are modelling marine species that occur across the globe. In this case, some circles might overlap, and the algorithm tries to merge these circles, which might result in a failed experiment. The solution is to rerun the experiment with a fixed distance for the radius.
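A minimal sketch of the circles rule is shown below, assuming longitude/latitude coordinates in decimal degrees and great-circle (haversine) distances. The function names and the Earth radius constant are hypothetical choices for the example, and the default radius is taken as half the mean inter-point distance, following the description of the circle radius option (the mean distance is treated as the diameter). This sketch does not build or merge circle polygons; it only tests whether a location lies within the radius of any occurrence.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (assumed constant)

def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def default_radius_m(points):
    """Half the mean pairwise distance (mean distance used as the diameter)."""
    pairs = list(combinations(points, 2))
    mean_distance = sum(haversine_m(a, b) for a, b in pairs) / len(pairs)
    return mean_distance / 2.0

def circles_predict(points, target, radius_m=None):
    """Return 1 if `target` lies within `radius_m` of any occurrence, else 0."""
    if radius_m is None:
        radius_m = default_radius_m(points)
    return int(any(haversine_m(p, target) <= radius_m for p in points))

occurrences = [(150.0, -30.0), (150.5, -30.0), (151.0, -30.5)]
close = circles_predict(occurrences, (150.1, -30.0), radius_m=20000)
distant = circles_predict(occurrences, (155.0, -35.0), radius_m=20000)
```

Passing an explicit `radius_m`, as in the two calls above, corresponds to the fixed radius recommended for widely scattered occurrence records; omitting it falls back to the data-derived default, which grows with the spread of the points.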
Advantages
- Simple and easy to interpret
- Presence only model, no absence data needed
Limitations
Does not use environmental variables to predict species occurrence
Assumptions
N/A
Requires absence data
No
Configuration options
Biosecurity Commons allows the user to set model arguments as specified below.
random_seed | There is no impact of setting a random seed for this model. |
Circle radius (d) | The radius of each circle in metres. A single number or a vector with elements corresponding to rows in p (the occurrence points). If missing, the diameter is computed from the mean inter-point distance. (default = NULL) |
Tails (tails) | The "tails" argument can be used to ignore the left or right tail of the percentile distribution for a variable. If supplied, tails should be a character vector with a length equal to the number of variables used in the model. Valid values are "both", "low" and "high". (default = NULL) |
References
- Hijmans, R. J., Phillips, S., Leathwick, J., & Elith, J. (2015). dismo: Species distribution modeling. R package version 1-1.