What is Maxent?
Maxent is a tool that uses a “maximum-entropy approach for species habitat modeling”. Maximum entropy is a statistical concept that may be used as a method for fitting distributions. A research group from Princeton University applied this approach to habitat modeling for ecologists. Specifically, Maxent is designed to model presence only data, which is traditionally difficult for ecologists. This data occurs commonly in ecology because we know where organisms are found, but do not have good observations of where they are not located.
How does Maxent work?
That’s a good question. The underlying theory was based upon computer science and machine learning research from the 1950s (as a side note, applied statisticians across different disciplines are often discovering and rediscovering each others work. The proliferation of science has only made the lack of cross-communication worse because of the volume of scientific output produced). The Maxent authors provide three articles explaining the software on the program’s homepage. However, as far as I can tell, the program is close source, which means looking at the code is difficult if not impossible (for this reason, a colleague of mine passionately dislikes Maxent). Luckily for us, Renner and Warton dove into the weeds and explain how Maxent works. They discovered that the underlying model is simply a Poisson point processes model (a generalized linear model [GLM] with Poisson error distribution). What makes Maxent special is how the GLM is parametrized. Maxent uses a lasso approach. As an interesting site note, Warton also shows how the Poisson point processes model converges to logistic regression under certain conditions.
What does this mean for the casual user?
Ecologists are known for having messy data and needing powerful statistical tools. That being said, ecologists have also been criticized for not knowing enough statistics. For the non-statistician ecologist, Warton’s research means we have expanded our theory of how Maxent works. That being said, ecologists often become polarized and even dogmatic with their viewpoints. Some people hate Maxent while others seem to worship it. Personally, I am skeptical of it because the program feels black box/closed source and some users over-hype it. When Warton made some of these points on the Methods in Ecology and Evolution blog (while posting as an Associated Editor), he elicited some colorful comments. Additionally, some of Warton’s discussion arose because Maxent was initially presented as being free of some assumptions of a GLM. Not surprisingly (if you’ve hung out around ecologists), one of Warton’s critic soon dove into a God/religion comparison. Ironically, I would argue that his or her view supporting Maxent is dogmatic. Rather than embracing a black box approach, programing a transparent appraoch seems less dogmatic, but I digresses (Ellison and Dennis make this point in their pathways to statistical fluency article when being critical of Program Mark users for lacking control over model assumptions).
What does this mean for the “DIY” crowd/advanced users?
The DYI (Do it yourself) crowd that is comfortable programming would not need Maxent. The glm function and a stepwise function in R combined with a GIS program such as GRASS or ESRI’s ArcGIS should be able to get the job done for a basic user (some R users might be able to do everything in R, but I have not had the best luck using GIS layers in R). For a more advanced user, a Bayesian program such as JAGS or Stan could be used for model parametrization and selection.
Maxent fills a useful role for ecologists in that it improves their toolbox for modeling habitat using presence only data. For a basic user (such as the group of people who use Program Mark), this canned program could do an adequate job (this is why another colleague of mine likes Maxent and uses it for her own reserach). An advanced user who is comfortable with statistical programming, would likely be better suited fitting his or her data with a Poisson point processes model. All users also need to be aware of the assumptions of their model (be it Maxent or GLM or any program). My own statistical viewpoint has been shaped by Gelman (a future blog post at some point), so I will start off trying to use Stan to parametrize my habitat model with a Bayesian approach.