Generalized Additive Model (GAM) SDM explained

Modified on Thu, 23 Feb, 2023 at 1:01 PM

Introduction

Generalized Additive Models (GAMs) extend Generalized Linear Models (GLMs) by allowing some predictor variables to be modeled non-parametrically, alongside linear and polynomial terms for other predictors. GAMs are therefore useful when the relationships between the variables are expected to take a more complex form that is not easily fitted by standard linear or non-linear models, or when there is no a priori reason for using a particular model.

Like GLMs, GAMs have three important components:

the probability distribution of the response variable;
the linear predictor (LP), which combines all predictor variables and represents an overall score for environmental suitability;
the link function, which describes how the mean of the response depends on the linear predictor.

However, in GAMs the coefficient of each predictor variable in the linear predictor is replaced by a smoothing function. The model fits a smooth curve to each predictor variable and then combines the results additively. The GAM algorithm in BCCVL uses a cubic spline smoother.
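In symbols, the difference between the two linear predictors can be written as follows. The notation is ours rather than the platform's: g is the link function, y the response, x_1 to x_p the predictor variables, the betas are GLM coefficients, and f_1 to f_p are the smooth functions fitted by the GAM.

```latex
\begin{align*}
\text{GLM:}\quad g\big(\mathbb{E}[y]\big) &= \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p \\
\text{GAM:}\quad g\big(\mathbb{E}[y]\big) &= \beta_0 + f_1(x_1) + \dots + f_p(x_p)
\end{align*}
```

For presence/absence data with a binomial family, g is typically the logit link, g(mu) = log(mu / (1 - mu)).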

The coefficients are estimated by maximum likelihood estimation (MLE), which maximizes the agreement between the predicted species occurrences and the observed data. In other words, MLE finds the coefficient values under which the observed results would be most likely. As with GLMs, GAMs use the iteratively reweighted least squares method (IWLS, also abbreviated IRLS) to carry out MLE.
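To make the IWLS idea concrete, here is a minimal sketch in plain Python for the simplest case: a binomial model with a logit link, one predictor, and ordinary linear coefficients. It is an illustration only, not the platform's implementation. A real GAM fit additionally applies smoothing penalties to spline terms. The function names, the toy data, and the stopping rule are all assumptions made for this example.

```python
import math


def binomial_deviance(x, y, b0, b1):
    """Return -2 * log-likelihood of the logistic model logit(p) = b0 + b1 * x."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -2.0 * total


def fit_logistic_iwls(x, y, max_iter=100, epsilon=1e-6):
    """Estimate b0 and b1 by iteratively reweighted least squares (hypothetical helper)."""
    b0, b1 = 0.0, 0.0
    previous_deviance = float("inf")
    for _ in range(max_iter):
        # Linear predictor and fitted probabilities under the current coefficients.
        eta = [b0 + b1 * xi for xi in x]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        # Working weights and working response for the binomial/logit case.
        w = [m * (1.0 - m) for m in mu]
        z = [e + (yi - m) / wi for e, yi, m, wi in zip(eta, y, mu, w)]
        # Weighted least squares of z on (1, x): solve the 2x2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
        # Stop once the deviance no longer changes appreciably.
        deviance = binomial_deviance(x, y, b0, b1)
        if abs(deviance - previous_deviance) / (abs(deviance) + 0.1) < epsilon:
            break
        previous_deviance = deviance
    return b0, b1


# Toy presence (1) / absence (0) records along one environmental gradient (made-up data).
x = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.8, 2.1, 2.5, 3.0]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
b0, b1 = fit_logistic_iwls(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

Each pass refits a weighted least squares problem using weights and a working response derived from the current fit, which is why the method is described as "iteratively reweighted". The `max_iter` and `epsilon` arguments play a role similar to the iteration limit and convergence threshold exposed in the configuration options.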

Advantages

Able to deal with non-linear and non-monotonic relationships between the response and the predictor variables.
Able to deal with categorical predictors.

Limitations

More susceptible to overfitting than GLMs. To avoid this, it is good practice to compare the fit of a GLM with the fit of a GAM and evaluate whether the added complexity of the GAM is necessary to obtain a satisfactory fit to the data. If the two fits are comparable, the simpler GLM is the better choice.
Harder to interpret than GLMs.

Assumptions

No assumptions are made about the distributions of the environmental variables. However, they should not be highly correlated with one another because this could cause problems with the estimation.
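One simple way to screen for this before fitting is to compute pairwise correlations between the environmental variables and flag pairs above a chosen threshold. The sketch below uses only the Python standard library. The variable names, the sample values, and the 0.7 cut-off (a commonly used rule of thumb, not a platform setting) are assumptions for illustration.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def correlated_pairs(predictors, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up values at six sites, for illustration only.
predictors = {
    "temperature": [12.0, 14.5, 15.1, 18.3, 20.2, 22.8],
    "elevation": [900.0, 760.0, 700.0, 480.0, 350.0, 150.0],
    "rainfall": [820.0, 640.0, 910.0, 700.0, 880.0, 690.0],
}
flagged = correlated_pairs(predictors)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are highly correlated (r = {r:.2f})")
```

When a pair is flagged, a common response is to keep only the variable that is more ecologically meaningful for the species in question.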

Requires absence data

Yes.

Configuration options

Biosecurity Commons allows the user to set model arguments as specified below.

random_seed	Seed used for generating random values. Using the same seed value, e.g. 123, ensures that running the same model with the same data and settings generates the same result, despite stochastic processes such as machine learning or cross-validation.
Number of repetitions (nb_run_eval)	Integer value, corresponding to the number of repetitions to be done for calibration/validation splitting. (default = 10)
Data split percentage (data_split)	Numeric value between 0 and 100, corresponding to the percentage of data used to calibrate the models (calibration/validation splitting). (default = 100)
prevalence	Gives more or less weight to particular observations. If NULL (default), each observation (presence or absence) has the same weight; if value < 0.5, absences are given more weight; if value > 0.5, presences are given more weight. (algorithm parameter)
Variable importance (var_import)	Integer value, corresponding to the number of permutations to be done for each variable to estimate variable importance. (default = 0)
Scale models (rescale_all_models)	A logical value defining whether all model predictions should be scaled with a binomial GLM or not. (default = FALSE)
Evaluate all models (do_full_models)	A logical value defining whether models calibrated and evaluated over the whole dataset should be computed or not. (default = TRUE)
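interaction_level	Integer value, corresponding to the level of interaction between predictor variables to consider in the model; 0 means no interactions are considered. (default = 0)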
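Smooth parameter (k)	Dimension of the basis used to represent each smooth term, which sets an upper limit on how flexible the fitted curve can be; -1 leaves the choice to the fitting routine's default. (default = -1)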
Family (family)	Family object specifying the distribution and link to use in fitting. (default = binomial)
Method (method)	The smoothing parameter estimation method. (default = 'GCV.Cp')
Optimizer (optimizer)	An array specifying the numerical optimization method to use to optimize the smoothing parameter estimation criterion (given by method). (default = c('outer', 'newton'))
Select penalty (select)	If this is TRUE then gam can add an extra penalty to each term so that it can be penalized to zero. This means that the smoothing parameter estimation that is part of fitting can completely remove terms from the model. If the corresponding smoothing parameter is estimated as zero then the extra penalty has no effect. (default = FALSE)
Ridge regression penalty (control_irls.reg)	The size of the ridge regression penalty added to the model to impose identifiability; for most models this should be 0. (default = 0)
Control epsilon (control_epsilon)	The threshold used for judging convergence of the GLM IRLS loop. (default = 0.000001)
Maximum iterations (control_maxit)	Maximum number of IRLS iterations to perform. (default = 100)
Convergence tolerance (control_mgvc.tol)	The convergence tolerance parameter to use in GCV/UBRE optimization. (default = 1e-7)
Number of halvings (control.mcv.half)	If a step of the GCV/UBRE optimization method leads to a worse GCV/UBRE score, then the step length is halved; this is the number of halvings to try before giving up. (default = 15)
Diagnostic output (control.trace)	Set this to TRUE to turn on diagnostic output. (default = FALSE)
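As an illustration of how the seed, the number of repetitions, and the data split percentage interact, the following Python sketch draws repeated calibration/validation splits from a set of records. It is not the platform's code. The function name `repeated_splits` is hypothetical, and its parameter names simply mirror the options listed above.

```python
import random


def repeated_splits(n_records, data_split=100, nb_run_eval=10, random_seed=123):
    """Return nb_run_eval (calibration, validation) index pairs (hypothetical helper)."""
    rng = random.Random(random_seed)  # same seed -> same sequence of splits
    n_calibration = round(n_records * data_split / 100)
    splits = []
    for _ in range(nb_run_eval):
        indices = list(range(n_records))
        rng.shuffle(indices)
        calibration = sorted(indices[:n_calibration])
        validation = sorted(indices[n_calibration:])
        splits.append((calibration, validation))
    return splits


# 20 records, 80% used for calibration, 3 repetitions.
splits = repeated_splits(20, data_split=80, nb_run_eval=3, random_seed=123)
for run, (calibration, validation) in enumerate(splits, start=1):
    print(f"run {run}: {len(calibration)} calibration, {len(validation)} validation records")
```

Calling the function again with the same seed reproduces the same splits. With `data_split=100`, every record is used for calibration and none is left over for validation.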

References

Elith, J., Graham, C. H., Anderson, R. P., Dudík, M., Ferrier, S., Guisan, A., Hijmans, R. J., Huettmann, F., Leathwick, J. R., Lehmann, A., Li, J., Lohmann, L. G., Loiselle, B. A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J. McC. M., Peterson, A. T., … Zimmermann, N. E. (2006). Novel methods improve prediction of species’ distributions from occurrence data. Ecography, 29(2), 129–151.
Franklin, J. (2010). Mapping species distributions: spatial inference and prediction. Cambridge University Press.
Guisan, A., Edwards, T. C., & Hastie, T. (2002). Generalized linear and generalized additive models in studies of species distributions: Setting the scene. Ecological Modelling, 157(2), 89–100.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.

Additional Reading

Arenas-Castro, S., Gonçalves, J., Alves, P., Alcaraz-Segura, D., & Honrado, J. P. (2018). Assessing the multi-scale predictive ability of ecosystem functional attributes for species distribution modelling. PLOS ONE, 13(6), e0199292.
Bazzichetto, M., Malavasi, M., Bartak, V., Acosta, A. T. R., Rocchini, D., & Carranza, M. L. (2018). Plant invasion risk: A quest for invasive species distribution modelling in managing protected areas. Ecological Indicators, 95, 311–319.
Cacciapaglia, C., & van Woesik, R. (2018). Marine species distribution modelling and the effects of genetic isolation under climate change. Journal of Biogeography, 45(1), 154–163.
Ducci, L., Agnelli, P., Di Febbraro, M., Frate, L., Russo, D., Loy, A., Carranza, M. L., Santini, G., & Roscioni, F. (2015). Different bat guilds perceive their habitat in different ways: A multiscale landscape approach for variable selection in species distribution modelling. Landscape Ecology, 30(10), 2147–2159.
Eaton, S., Ellis, C., Genney, D., Thompson, R., Yahr, R., & Haydon, D. T. (2018). Adding small species to the big picture: Species distribution modelling in an age of landscape scale conservation. Biological Conservation, 217, 251–258.
Feuda, R., Bannikova, A. A., Zemlemerova, E. D., Di Febbraro, M., Loy, A., Hutterer, R., Aloise, G., Zykov, A. E., Annesi, F., & Colangelo, P. (2015). Tracing the evolutionary history of the mole, Talpa europaea, through mitochondrial DNA phylogeography and species distribution modelling. Biological Journal of the Linnean Society, 114(3), 495–512.
Golding, N., & Purse, B. V. (2016). Fast and flexible Bayesian species distribution modelling using Gaussian processes. Methods in Ecology and Evolution, 7(5), 598–608.
Greiser, C., Hylander, K., Meineri, E., Luoto, M., & Ehrlén, J. (2020). Climate limitation at the cold edge: Contrasting perspectives from species distribution modelling and a transplant experiment. Ecography, 43(5), 637–647.
Niamir, A., Skidmore, A. K., Muñoz, A.-R., Toxopeus, A. G., & Real, R. (2019). Incorporating knowledge uncertainty into species distribution modelling. Biodiversity and Conservation, 28(3), 571–588.
Oyafuso, Z. S., Drazen, J. C., Moore, C. H., & Franklin, E. C. (2017). Habitat-based species distribution modelling of the Hawaiian deepwater snapper-grouper complex. Fisheries Research, 195, 19–27.
Phillips, N. D., Reid, N., Thys, T., Harrod, C., Payne, N. L., Morgan, C. A., White, H. J., Porter, S., & Houghton, J. D. R. (2017). Applying species distribution modelling to a data poor, pelagic fish complex: The ocean sunfishes. Journal of Biogeography, 44(10), 2176–2187.
Rodríguez-Rey, M., Consuegra, S., Börger, L., & Leaniz, C. G. de. (2019). Improving Species Distribution Modelling of freshwater invasive species for management applications. PLOS ONE, 14(6), e0217896.
Zhang, Z., Xu, S., Capinha, C., Weterings, R., & Gao, T. (2019). Using species distribution model to predict the impact of climate change on the potential distribution of Japanese whiting Sillago japonica. Ecological Indicators, 104, 333–340.
Zuckerberg, B., Fink, D., La Sorte, F. A., Hochachka, W. M., & Kelling, S. (2016). Novel seasonal land cover associations for eastern North American forest birds identified through dynamic species distribution modelling. Diversity and Distributions, 22(6), 717–730.