Inverse-Distance Weighting, Voronoi Hull, Convex Hull, Geographic Distance and Circles 

 

These are all geographic models: simple models that require no predictor (environmental) variables. They are typically run early in an analysis to produce results that are used later in the modelling process. Each is described individually below.

 


Inverse-Distance Weighted Model

Introduction

 

The Inverse-Distance Weighted model is a geographical model that uses the locations of known occurrences and predicts that the likelihood of finding a species in an area depends on the distance from that area to known occurrence points. It differs from the Geographic Distance model in that it explicitly implements the assumption that points close to one another are more alike than points farther apart. The probability of species occurrence at an unknown location is calculated as the average of the values at a number of surrounding known locations, each weighted by the inverse of its distance from the unknown location. Nearby known locations therefore receive greater weights, and the weights decrease as a function of distance, hence the name 'inverse-distance weighted'.
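The weighting described above can be sketched in a few lines of Python. This is an illustration only, not the code the platform runs: the function name idw_predict, the power parameter (inverse distance squared) and the optional k nearest neighbours are assumptions made for the example, and coordinates are treated as planar.

```python
import math

def idw_predict(known, query, power=2.0, k=None):
    """Inverse-distance weighted average at `query`.

    known : list of (x, y, value), where value is 1 (presence) or 0 (absence)
    power : exponent applied to distance (assumed here; 2 is a common choice)
    k     : optionally use only the k nearest known locations
    """
    if not known:
        raise ValueError("at least one known location is required")
    dists = []
    for x, y, value in known:
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            return float(value)  # query sits exactly on a known location
        dists.append((d, value))
    dists.sort(key=lambda dv: dv[0])
    if k is not None:
        dists = dists[:k]
    weights = [1.0 / d ** power for d, _ in dists]
    weighted_sum = sum(w * v for w, (_, v) in zip(weights, dists))
    return weighted_sum / sum(weights)

# Two presences and one absence on a planar grid (hypothetical data).
known = [(0.0, 0.0, 1), (10.0, 0.0, 0), (0.0, 10.0, 1)]
near_presence = idw_predict(known, (1.0, 1.0))  # dominated by the presence at (0, 0)
near_absence = idw_predict(known, (9.0, 0.5))   # dominated by the absence at (10, 0)
```

Because the weights fall off with distance, a location next to a presence record scores close to 1 and a location next to an absence record scores close to 0.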


This model does not use environmental variables to predict the distribution of a species. It can be an easy way to generate a bias layer from survey locations.

 

Advantages

  • Simple and easy to interpret (but less so than other geographic models)


Limitations

  • Does not use environmental variables to predict species occurrence


Assumptions

N/A


Requires absence data

Yes


Configuration options

Biosecurity Commons allows the user to set model arguments as specified below. 


random_seed  

Seed used for generating random values. Using the same seed value, e.g. 123, ensures that running the same model with the same data and settings generates the same result, despite stochastic processes such as machine learning or cross-validation.
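The effect of a fixed seed can be illustrated with Python's standard random module. This is only a sketch of the general idea; the helper draw_sample is hypothetical and is not how the platform generates random values.

```python
import random

def draw_sample(seed, n=5):
    """Return n pseudo-random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, does not touch global state
    return [rng.random() for _ in range(n)]

same_a = draw_sample(123)
same_b = draw_sample(123)     # same seed: identical sequence
different = draw_sample(456)  # different seed: different sequence
```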

Tails (tails) 

The "tails" argument can be used to ignore the left or right tail of the percentile distribution for a variable. If supplied, tails should be a character vector with a length equal to the number of variables used in the model. Valid values are "both", "low" and "high".

 

References 

  • Franklin, J. (2010). Mapping species distributions: Spatial inference and prediction. Cambridge University Press.
  • Hijmans, R. J., & Elith, J. (2015). Species distribution modeling with R.



Voronoi Hull

Introduction

The Voronoi Hull model is a geographical model that uses the locations of known occurrences and predicts that a species is present inside Voronoi hulls around observed occurrences, and absent outside those hulls. To create a Voronoi hull, a polygon is drawn around each known location, consisting of all points whose distance to that known location is less than or equal to their distance to any other known location.
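Because each Voronoi polygon contains exactly the points that are closest to its known location, the prediction can be sketched as a nearest-neighbour lookup without building the polygons. The sketch below is a simplified illustration: the function name voronoi_predict is hypothetical, coordinates are treated as planar, and it assumes that the known locations include both presence and absence records.

```python
import math

def voronoi_predict(presences, absences, query):
    """Return 1 if `query` falls in the Voronoi polygon of a presence record, else 0.

    A point lies in the polygon of whichever known location is nearest to it,
    so the nearest record's label is the prediction.
    """
    labelled = [(p, 1) for p in presences] + [(a, 0) for a in absences]
    _, label = min(labelled, key=lambda item: math.dist(item[0], query))
    return label

# Hypothetical planar coordinates.
presences = [(0.0, 0.0), (1.0, 0.0)]
absences = [(10.0, 0.0)]
inside = voronoi_predict(presences, absences, (2.0, 0.0))   # nearest record is a presence
outside = voronoi_predict(presences, absences, (8.0, 0.0))  # nearest record is an absence
```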

 

This model does not use the input of environmental variables to predict the distribution of a species.


Advantages

  • Relatively simple and easy to interpret

Limitations

  • Does not use environmental variables to predict species occurrence

Assumptions

N/A


Requires absence data

Yes

Configuration options

Biosecurity Commons allows the user to set model arguments as specified below.

   

random_seed  

Setting a random seed would not change this model. 

 

References

  • Hijmans, R. J., & Elith, J. (2015). Species distribution modeling with R. 




Convex Hull

Introduction

Convex Hull is a geographical model that uses the locations of known occurrences and predicts that a species can be present within a spatial convex hull around these occurrence points. A convex hull is the smallest convex polygon that encloses all occurrence points. For any two points inside the hull, the straight line between them falls completely within the hull.
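As an illustration of the definition above, the following sketch builds a convex hull with Andrew's monotone chain algorithm and tests whether a location falls inside it. It is not the implementation used by the platform; the function names are hypothetical and coordinates are treated as planar.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o); > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a right turn or a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def in_hull(hull, query):
    """True if `query` is inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], query) >= 0 for i in range(n))

# Hypothetical occurrence points; the two interior points do not affect the hull.
occurrences = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(occurrences)
predicted_present = in_hull(hull, (2, 2))
predicted_absent = not in_hull(hull, (5, 2))
```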


This model does not use the input of environmental variables to predict the distribution of a species.



 

The International Union for Conservation of Nature (IUCN) uses the convex hull method to estimate the extent of occurrence for species, with the adaptation that large areas of obviously unsuitable habitat (such as ocean for terrestrial species) are excluded. Species are considered to be critically endangered if the extent of occurrence is < 100 km², endangered if it is < 5,000 km², and vulnerable if it is < 20,000 km².
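The thresholds quoted above can be applied to a hull area as in the sketch below. It assumes hull vertices in projected coordinates measured in kilometres, uses the shoelace formula for the area, and does not exclude unsuitable habitat; the function names and the example hull are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0

def eoo_category(area_km2):
    """Map an extent of occurrence (km²) to the thresholds quoted in the text."""
    if area_km2 < 100:
        return "critically endangered"
    if area_km2 < 5000:
        return "endangered"
    if area_km2 < 20000:
        return "vulnerable"
    return "above the quoted thresholds"

# Hypothetical hull: a 50 km by 40 km rectangle in projected kilometres.
hull_km = [(0, 0), (50, 0), (50, 40), (0, 40)]
area = polygon_area(hull_km)
category = eoo_category(area)
```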

 

Advantages

  • Simple and easy to interpret 
  • Presence only model, no absence data needed 


Limitations

  • Does not use environmental variables to predict species occurrence
  • Likely to over- or underestimate the species range, for example:

A convex hull assumes that a species is distributed throughout the entire hull. Major overpredictions can arise if the hull includes biomes in which the species is not found. For a coastal species, for example, a convex hull could include the whole of Australia rather than just the coastal areas where the species is found.

 

   

 

  

The second limitation is that convex hulls are susceptible to underestimation if the data are not representative of all the places where a species actually occurs. In the image below, the northern part of the species distribution is not captured because there are no data points available from this area. This problem is common for species that occur in remote parts of Australia where there is little sampling.




 

  

The third limitation is that convex hulls are vulnerable to errors in occurrence data. When attempting to capture the distribution of a wild species, an occurrence record from a zoo or museum would result in an overestimation of the species range. Errors in the recorded latitude and longitude are also not uncommon, but these kinds of errors can be reduced by filtering your data.

 

 

  


Assumptions

That a species' distribution in the wild is well captured by the outermost available occurrence points.

 

Requires absence data 

No

 

Configuration options

Biosecurity Commons allows the user to set model arguments as specified below.  

 

random_seed  

A random seed will not impact this model. 

Tails (tails) 

The "tails" argument can be used to ignore the left or right tail of the percentile distribution for a variable. If supplied, tails should be a character vector with a length equal to the number of variables used in the model. Valid values are "both", "low" and "high". (default = NULL)

 
 

References

  • Burgman, M. A., & Fox, J. C. (2003). Bias in species range estimates from minimum convex polygons: Implications for conservation and options for improved planning. Animal Conservation, 6(1), 19–28.
  • Hijmans, R. J., Phillips, S., Leathwick, J., & Elith, J. (2015). dismo: Species distribution modeling. R package version 1-1.
  • IUCN (2001). IUCN Red List categories and criteria: version 3.1. IUCN, Gland, Switzerland and Cambridge, UK.





Geographic Distance

Introduction

Geographic Distance is a geographical model that uses the location of known occurrences and predicts that the likelihood of finding a species in an area depends on the distance of that area to a known occurrence point. The predicted values are the inverse linear distance to the nearest known presence point. Distances smaller than or equal to zero are set to 1 (highest score). 
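A minimal sketch of this scoring rule follows, using the scale option described under Configuration options. It assumes planar coordinates in metres; the function name geo_distance_score is hypothetical, and capping the score at 1 for distances shorter than the scale is an assumption made so that 1 remains the highest score.

```python
import math

def geo_distance_score(presences, query, scale=1000.0):
    """Inverse of the scaled distance from `query` to the nearest presence point.

    presences, query : planar coordinates in metres (assumed)
    scale            : distance is divided by this value before inverting
    """
    nearest = min(math.dist(p, query) for p in presences)
    if nearest <= 0:
        return 1.0  # at a known presence point: highest score
    # 1 / (nearest / scale); the cap at 1.0 is an assumption of this sketch.
    return min(1.0, scale / nearest)

# Hypothetical presence points, coordinates in metres.
presences = [(0.0, 0.0), (5000.0, 0.0)]
at_presence = geo_distance_score(presences, (0.0, 0.0))
two_km_away = geo_distance_score(presences, (0.0, 2000.0))
four_km_away = geo_distance_score(presences, (9000.0, 0.0))
```

With the default scale of 1000, a location 2 km from the nearest presence scores 0.5 and a location 4 km away scores 0.25.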

 

This model does not use the input of environmental variables to predict the distribution of a species.

It is another very simple way to generate a bias layer.

 

 


Advantages 

  • Simple and easy to interpret 
     

Limitations 

  • Does not use environmental variables to predict species occurrence 
     

Assumptions 

  

N/A 

 

Requires absence data 


No 

 

Configuration options

Biosecurity Commons allows the user to set model arguments as specified below. 

   

random_seed  

Setting a random seed will not impact this model. 

scale 

Scale (in metres) by which the distance from occurrence records is divided before the inverse distance is computed. (default = 1000)

Tails (tails) 

The "tails" argument can be used to ignore the left or right tail of the percentile distribution for a variable. If supplied, tails should be a character vector with a length equal to the number of variables used in the model. Valid values are "both", "low" and "high". (default = NULL)

 
 

References

  • Hijmans, R. J., & Elith, J. (2015). Species distribution modeling with R.



  

 

Circles

Introduction 


Circles is a geographical model that uses the location of known occurrences and predicts that a species can be present within a circle with a given radius around these occurrence points. This model does not use the input of environmental variables to predict the distribution of a species. 


By default, the radius is computed from the mean of all distances between points. This can be a very large distance, for example if you are modelling marine species that occur across the globe. In this case, some circles might overlap, and the algorithm tries to merge these circles, which might result in a failed experiment. The solution is to rerun the experiment with a fixed distance for the radius.
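The sketch below illustrates the circles prediction with either a fixed radius or a default derived from the mean inter-point distance. It is an illustration under stated assumptions: points are (longitude, latitude) in degrees, distances use the haversine formula on a spherical Earth, the function names are hypothetical, and halving the mean distance (treating it as a diameter) is an assumption about how the default is turned into a radius.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

def haversine_m(a, b):
    """Great-circle distance in metres between (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))

def default_radius_m(points):
    """Half the mean distance between all pairs of points (assumed default)."""
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("at least two points are needed for a default radius")
    mean_distance = sum(haversine_m(a, b) for a, b in pairs) / len(pairs)
    return mean_distance / 2.0

def circles_predict(points, query, radius_m=None):
    """True if `query` lies within `radius_m` of any occurrence point."""
    if radius_m is None:
        radius_m = default_radius_m(points)
    return any(haversine_m(p, query) <= radius_m for p in points)

# Hypothetical occurrence records (longitude, latitude).
points = [(145.0, -37.0), (145.1, -37.0)]
near = circles_predict(points, (145.01, -37.0), radius_m=5000)  # under 1 km away
far = circles_predict(points, (146.0, -37.0), radius_m=5000)    # roughly 80 km away
```

Passing a fixed radius_m mirrors the suggested fix for widely scattered records, where a radius derived from the mean inter-point distance would be very large.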

 


 

Advantages

  • Simple and easy to interpret
  • Presence only model, no absence data needed 
 

Limitations 

   

  • Does not use environmental variables to predict species occurrence


Assumptions 

  

N/A 


Requires absence data


No 


Configuration options  

Biosecurity Commons allows the user to set model arguments as specified below. 

  

random_seed  

There is no impact of setting a random seed for this model. 

Circle radius (d) 

The radius of each circle in metres. A single number or a vector with elements corresponding to rows in p (the occurrence points). If missing, the diameter is computed from the mean inter-point distance. (default = NULL)

Tails (tails) 

The "tails" argument can be used to ignore the left or right tail of the percentile distribution for a variable. If supplied, tails should be a character vector with a length equal to the number of variables used in the model. Valid values are "both", "low" and "high". (default = NULL)

 

References 

  • Hijmans, R. J., Phillips, S., Leathwick, J., & Elith, J. (2015). dismo: Species distribution modeling. R package version 1-1.