The Population Dynamics of Disturbance Specialist Plant Populations

In my second post for Quantitative Dynamics I’m going to discuss a topic that I have studied since my graduate work at the University of Nebraska. In 2010 Brigitte Tenhumberg, Richard Rebarber, Diana Pilson and I embarked on a journey studying the long-term, stochastic dynamics of wild sunflower, a disturbance specialist plant population that uses a seed bank to buffer against the randomness of disturbances.

Because the seeds of disturbance specialist plants cannot germinate without a soil disturbance, these populations go through many periods with zero or few above-ground plants, and hence no new members of the population from one season to the next. As such, much as a freelance worker with uncertain pay needs a savings account, these populations need a seed bank for long-term viability.

In our work (which you can find here, here and here) we created an integral projection model with stochasticity modeling 1) the presence of a disturbance and 2) the depth of a disturbance. We found through mathematical analyses and simulations that the presence of disturbances increased population viability (as you would expect), but that the intensity, depth and autocorrelation of disturbances affected populations differently depending on their viability. For populations that were viable, increasingly intense and positively autocorrelated disturbances enhanced long-term population sizes, whereas for populations near extinction levels these same features were actually harmful to viability. These results were novel and surprising. You can find my blog post on the topic in The American Naturalist as well.
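
To give a flavor of the stochastic drivers, here is a rough sketch in R (not the model from our papers; every number below is invented for illustration). Disturbance presence is simulated as a two-state Markov chain, which allows for autocorrelation, and depth is drawn at random in disturbed years:

# Sketch of the two stochastic drivers; all parameter values are invented.
set.seed(1)
n_years  <- 100
p_dist   <- 0.3    # long-run frequency of disturbance
autocorr <- 0.5    # lag-1 autocorrelation of the presence process

# Transition probabilities of a stationary two-state Markov chain with the
# desired frequency and autocorrelation
p11 <- p_dist + autocorr * (1 - p_dist)  # P(disturbed | disturbed last year)
p01 <- p_dist * (1 - autocorr)           # P(disturbed | undisturbed last year)

disturbed <- numeric(n_years)
disturbed[1] <- rbinom(1, 1, p_dist)
for (t in 2:n_years) {
  p <- if (disturbed[t - 1] == 1) p11 else p01
  disturbed[t] <- rbinom(1, 1, p)
}

# Disturbance depth only matters in disturbed years
depth <- ifelse(disturbed == 1, runif(n_years, min = 1, max = 20), 0)
head(cbind(disturbed, depth), 10)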

In subsequent work we would like to study the transient dynamics of such systems. Transient dynamics have not, to this point, garnered the same attention as long-term dynamics in stochastic systems. However, my friend Iain Stott and colleagues have gotten the ball rolling in that direction, and it’s only a matter of time.

Integral projection models

Matrix population models describe populations using discrete life, size, or age stages. Scientists apply these models to understand population ecology and guide conservation. However, some species have continuous life histories. For example, thistles grow continuously, as presented in this paper.

Fish also grow continuously. We sought to understand how different management approaches could be used to control grass carp, a species that impacts native ecosystems by out-competing native fish. Managers were interested in evaluating the use of YY-males to control populations. YY-males work because they spawn but only produce male offspring. Thus, it is possible in theory to cause a population to crash by biasing the sex ratio.

We constructed an integral projection model for grass carp and compared different YY-male release strategies. We found that the life history of grass carp does not work well with the YY-male strategy because the species is long-lived and females produce many offspring.
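
To make the idea concrete, here is a bare-bones, generic size-structured IPM in R. This is not our grass carp model; every vital-rate function and parameter below is invented for illustration. The kernel is discretized with the midpoint rule, after which it behaves just like a (large) matrix model:

# Hypothetical vital rates as functions of continuous size z
survival  <- function(z) plogis(-2 + 0.05 * z)                        # survival probability
growth    <- function(z1, z) dnorm(z1, mean = z + 5, sd = 3)          # size-transition density
fecundity <- function(z1, z) 0.01 * z * dnorm(z1, mean = 10, sd = 2)  # recruits and their sizes

# Full kernel: survive-and-grow plus reproduce
kernel <- function(z1, z) survival(z) * growth(z1, z) + fecundity(z1, z)

# Discretize the kernel on a size mesh with the midpoint rule
n_mesh <- 100
L <- 0; U <- 120                      # size range (arbitrary units)
h <- (U - L) / n_mesh                 # mesh width
mesh <- L + (1:n_mesh - 0.5) * h      # mesh midpoints
K <- h * outer(mesh, mesh, kernel)    # n_mesh x n_mesh projection matrix

# Asymptotic growth rate, exactly as in a matrix population model
lambda <- Re(eigen(K)$values[1])
lambda

From here, sensitivities or management scenarios can be explored by modifying the kernel, just as you would modify entries of a matrix model.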

An Introduction to Galton-Watson Processes

Howdy! I’m Eric Eager, and I’m an associate professor of mathematical biology at the University of Wisconsin – La Crosse.  I’m also a data scientist for Pro Football Focus and Orca Pacific.  In my first post for Quantitative Dynamics, I’m going to discuss a topic near and dear to my heart: Branching processes (thanks Sebastian Schreiber for teaching me these five years ago).

Branching processes are a great bridge between the continuous-space population models that permeate the ecological literature (e.g. Caswell 2001, Ellner, Childs and Rees 2016) and the individual-based realities that drive ecological systems (Railsback and Grimm 2011). All branching process models specify an absorbing state (usually extinction in ecology) and model the probability of reaching the absorbing state by creating an iterative map from one generation to the next. This lets you work with a model whose state space is a set of discrete values (individual-based), while the resulting model is still a difference equation (as in traditional ecological models).

The most famous example of a branching process is the Galton-Watson process. Francis Galton was concerned about the eventual fate of surnames (a quaint artifact of the past), especially among the aristocracy. Below are a couple of videos I made, one deriving the Galton-Watson process and one solving it. Enjoy!
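
If you prefer code to chalkboards, here is a minimal R sketch. It assumes Poisson-distributed offspring with a made-up mean, iterates the probability-generating-function map (the iterative map mentioned above) to approximate the extinction probability, and cross-checks that number by simulating the individual-based process directly:

# Extinction probability: iterate q -> f(q) starting from q = 0, where
# f(q) = E[q^X] = exp(m * (q - 1)) for Poisson(m) offspring
gw_extinction <- function(m, n_iter = 1000) {
  q <- 0
  for (i in 1:n_iter) {
    q <- exp(m * (q - 1))
  }
  q
}

# Individual-based cross-check: each individual leaves Poisson(m) offspring
gw_simulate <- function(m, n_gen = 50, n0 = 1, cap = 1e4) {
  N <- n0
  for (t in 1:n_gen) {
    if (N == 0 || N > cap) break   # stop once extinct or clearly established
    N <- sum(rpois(N, lambda = m))
  }
  N
}

m <- 1.5                                      # supercritical: mean offspring > 1
gw_extinction(m)                              # roughly 0.42
mean(replicate(2000, gw_simulate(m)) == 0)    # should be close to the value above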

Review of “How Not to be Wrong: The Power of Mathematical Thinking”

I just got done reading “How Not to be Wrong: The Power of Mathematical Thinking” by Jordan Ellenberg. My younger brother lent me his copy. In a sentence, the book can be summed up by this phrase from the book: “Mathematics is the extension of common sense by other means.” The book does a great job of explaining how mathematics and statistics can be used to understand the world around us. It is filled with many good examples, such as Wald’s WWII work on where to place extra armor on planes. Wald was given data on where planes got shot and was asked where extra armor should be placed (answer: the places without the holes!). The book is filled with many other interesting examples as well.

The only downsides to the book are that it can become long and drawn out at times and that I was already familiar with many of the examples. Finally, I would add that the book is “math lite”, which is a strength for many potential readers.

6 tips for a new LaTeX user

Recently a coworker started using LaTeX and asked for some tips. Here are my 6 tips for starting to use LaTeX:
  1.  Finish what you start (i.e., close environments or else you get errors or weird bugs), for example \begin{equation} needs an \end{equation}
  2. Every document needs 3 things: \documentclass{<class>}, \begin{document}, and \end{document} (a minimal example combining tips 1-3 appears after this list)
  3. For equations, use \( \) for inline math and \begin{eqnarray} \end{eqnarray} for displayed equations. \\ creates a new line. Use \nonumber before \\ to continue an equation without numbering that line, and &=& to align multiple equations, for example:
    \begin{eqnarray}
    a &=& b + c \\
    a &=& b \nonumber \\
      & & c
    \end{eqnarray}
     Gives you something like:
    a = b + c   (1)
    a = b
        c       (2)

  4. Bib files are your friend for citations. Use Google Scholar to populate new citations.
  5.  \textit{My italics text}, \textbf{my bold text}, should get most of your formatting. Do NOT use the deprecated {\bf bold} or {\it italics} style. (cf. http://tex.stackexchange.com/questions/41681/correct-way-to-bold-italicize-text for more details on the second point)
  6. {} can be very helpful, especially for complicated math functions, when order of operations is important. For example, \sigma_{\pi^2}^{2}
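
To tie tips 1-3 together, here is a minimal document that should compile as-is (a bare-bones sketch; swap in your own class and content):

\documentclass{article}

\begin{document}

Inline math looks like \( a = b + c \), while displayed, numbered
equations use an environment that must be opened and closed:

\begin{eqnarray}
a &=& b + c \\
a &=& b \nonumber \\
  & & c
\end{eqnarray}

\end{document}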

ArcPy tips and observations

When I started using ArcGIS a few weeks ago, I quickly became frustrated when operations failed after pointing and clicking several times. This motivated me to start scripting in ArcPy. Now, I only need to load the script file once to make it crash. Thus, I am crashing more efficiently :). Overall, my experience with ArcPy has been alright. I discovered two blog posts about hating ArcGIS 10 and ArcGIS sucking that are well worth the read. Despite the titles, the author actually likes ArcGIS, and I understand why: ArcPy and ArcGIS are powerful. Plus, the documentation is usually well done and easy to find online (although including data and reproducible examples like R’s help files would be a great addition). However, ArcPy does have some problems/idiosyncrasies. Here’s what I have discovered:

1) Non-basic error messages are often cryptic, and some require hours of Googling. The archives on GIS Stack Exchange have become a new favorite page of mine. As an example, I got this error message (I removed and shortened my file paths in the code snippet):

Layer (Folder) exists, please delete output if you wish to recreate layer
done with extraction for us_asp
0:00:36.046000
Could not create tile. name: z:\my documents\...\slopeclipped, adepth: 32, type: 1, iomode: 3, version: 3, compression: 0, eval case: 1 (GetOpenTileChannel)
Traceback (most recent call last):
File "", line 1, in
File "z:/My Documents/.../PythonFile.py", line 77, in
mF.RasterClipper(Slope, grid, SlopeClipped, arcpy.env.workspace)
File "myFunctions.py", line 37, in RasterClipper
arcpy.Clip_management(raster, clipExt, out, "", "ClippingGeometry")
File "C:\ArcGIS\Desktop10.1\arcpy\arcpy\management.py", line 12015, in Clip
raise e
arcgisscripting.ExecuteError: ERROR 999999: Error executing function.
Failed to copy raster dataset
Failed to execute (Clip).

This error leads me to my second tip.

2) The workspace should not include a space. Sometimes, but not all the time, this causes problems. It took me a while to figure out, and I only found the answer by chance online while looking up another problem. For example, this code:
arcpy.env.workspace = "C:/Folder/"

is okay, but
arcpy.env.workspace = "C:/My Documents/"

caused the error shown in my example above.

3) Be prepared to use libraries such as NumPy. ArcPy does not always play well with USA-wide data. This seems to be the case for ArcPy’s wrapper functions more so than the base functions (my guess is that the best-performing functions are GDAL functions). I plan on posting this function once I have written and tested it.

4) “Python Scripting for ArcGIS” is still a great reference!

I plan to post more of my project as I work through it.

Learning Python

Over the past year, I have been teaching myself Python. I am writing this blog post so I remember how I wish I had learned Python in case I ever need to teach anybody else. My desire to learn Python has been driven by several factors: 1) my laziness and disdain for GUIs (i.e., I hate pointing and clicking); 2) my need to solve complex problems that GUIs could not solve; and 3) my curiosity to learn new things.

Python is a powerful computer language. I want to use Python for scripting and scientific computing, but many other uses of Python exist. A great way to learn Python and some computer science theory is with the book “Think Python: How to Think Like a Computer Scientist”. The book was originally written by the author for his college class using the Java language. However, he published the book under a Creative Commons license, and a high school teacher adapted it for Python! Score one for open source freedom.

After working through this book, I went to the “Invent with Python” books, which are fun (but not necessarily that useful for my goal…). The first book, “Invent Your Own Computer Games with Python”, covers basic game designs that are terminal based. The second book, “Making games with Python and Pygame”, includes graphics. These books are good because they are simple and fun. If I were simply learning Python for fun, I would start with these books rather than “Think Python”. These books also gave me more experience coding (it’s been said the first 10,000 lines of code are for learning, so at least these gave me a fun couple of thousand), so perhaps the sugar coating of game development worked on me as well.

After these books, dive into subject area books. For me, these were GIS books. “Python Scripting for ArcGIS” does a great job introducing and covering ESRI’s flagship product, ArcGIS. The only downside is that the underlying code is closed source. I decided to stop trying to read source code when a .py file included a warning about being a trade secret. “Learning Geospatial Analysis with Python” is an introduction to advanced (but open source) libraries such as Shapely and GDAL. However, the book is a bit pricey for what you get. But the book is easier to read than the author’s blog, which contains most of the same material. Also note that the ebook kills Python’s tab spacing (Python uses whitespace such as tabs to define functions and code blocks). Also, I was initially overwhelmed because I did not know GIS or Python my first time reading it. My co-worker’s copy of “The Geospatial Desktop” provided a great refresher for my GIS skills!

Finally, Learning Python, 5th ed. is a tome, but very thorough. My own course of study began with the games books. Then I tried “Learning Geospatial Analysis with Python”. Along the way, I bought “Learning Python” and then discovered “Think Python”. I discovered the ArcGIS book only when I needed to script ArcPy for work. Not an optimal course of study, but needing to script ArcGIS has been a great crash course in Python for me. As an added bonus, Python has also made me a better R programmer! Hooray for learning.


Maxent, a tool for ecologists and other spatial modelers

What is Maxent?

Maxent is a tool that uses a “maximum-entropy approach for species habitat modeling”. Maximum entropy is a statistical concept that may be used as a method for fitting distributions. A research group from Princeton University applied this approach to habitat modeling for ecologists. Specifically, Maxent is designed to model presence-only data, which are traditionally difficult for ecologists to handle. Such data are common in ecology because we know where organisms have been found but do not have good observations of where they are absent.

How does Maxent work?

That’s a good question. The underlying theory was based upon computer science and machine learning research from the 1950s (as a side note, applied statisticians across different disciplines are often discovering and rediscovering each other’s work; the proliferation of science has only made the lack of cross-communication worse because of the sheer volume of scientific output). The Maxent authors provide three articles explaining the software on the program’s homepage. However, as far as I can tell, the program is closed source, which means looking at the code is difficult if not impossible (for this reason, a colleague of mine passionately dislikes Maxent). Luckily for us, Renner and Warton dove into the weeds and explained how Maxent works. They discovered that the underlying model is simply a Poisson point process model (a generalized linear model [GLM] with a Poisson error distribution). What makes Maxent special is how the GLM is parametrized: Maxent uses a lasso approach. As an interesting side note, Warton also shows how the Poisson point process model converges to logistic regression under certain conditions.
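
For the curious, a do-it-yourself version of what Renner and Warton describe (a lasso-penalized Poisson regression) might look roughly like the sketch below in R. The data are entirely made up; in a real point-process analysis the response and weights would come from quadrature/background points rather than raw counts:

# A rough DIY analogue of Maxent's guts: lasso-penalized Poisson regression
library(glmnet)

set.seed(42)
n <- 500
env <- data.frame(                 # hypothetical environmental covariates
  temp   = rnorm(n, 15, 5),
  precip = rnorm(n, 800, 200),
  elev   = rnorm(n, 300, 100)
)
# Fake counts generated from two of the three covariates
counts <- rpois(n, lambda = exp(-1 + 0.05 * env$temp + 0.001 * env$precip))

X <- as.matrix(env)
fit <- cv.glmnet(X, counts, family = "poisson", alpha = 1)  # alpha = 1 gives the lasso
coef(fit, s = "lambda.min")        # penalized coefficients; elev should shrink toward zero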

What does this mean for the casual user?

Ecologists are known for having messy data and needing powerful statistical tools. That being said, ecologists have also been criticized for not knowing enough statistics. For the non-statistician ecologist, Warton’s research means we now have a better theoretical understanding of how Maxent works. At the same time, ecologists often become polarized and even dogmatic in their viewpoints. Some people hate Maxent while others seem to worship it. Personally, I am skeptical of it because the program feels black box/closed source and some users over-hype it. When Warton made some of these points on the Methods in Ecology and Evolution blog (while posting as an Associate Editor), he elicited some colorful comments. Additionally, some of Warton’s discussion arose because Maxent was initially presented as being free of some assumptions of a GLM. Not surprisingly (if you’ve hung out around ecologists), one of Warton’s critics soon dove into a God/religion comparison. Ironically, I would argue that his or her view supporting Maxent is the dogmatic one. Rather than embracing a black box, programming a transparent approach seems less dogmatic, but I digress (Ellison and Dennis make this point in their pathways to statistical fluency article when criticizing Program Mark users for lacking control over model assumptions).

What does this mean for the “DIY” crowd/advanced users?

The DIY (do-it-yourself) crowd that is comfortable programming does not need Maxent. The glm function and a stepwise-selection function in R, combined with a GIS program such as GRASS or ESRI’s ArcGIS, should get the job done for a basic user (some R users might be able to do everything in R, but I have not had the best luck using GIS layers in R). For a more advanced user, a Bayesian program such as JAGS or Stan could be used for model parametrization and selection.

Conclusion:

Maxent fills a useful role for ecologists in that it improves their toolbox for modeling habitat with presence-only data. For a basic user (such as the group of people who use Program Mark), this canned program can do an adequate job (this is why another colleague of mine likes Maxent and uses it for her own research). An advanced user who is comfortable with statistical programming would likely be better suited fitting his or her data with a Poisson point process model. All users also need to be aware of the assumptions of their model (be it Maxent, a GLM, or any other program). My own statistical viewpoint has been shaped by Gelman (a future blog post at some point), so I will start off trying to use Stan to parametrize my habitat model with a Bayesian approach.


How much will my 401k be in 30 years?

A couple of months ago, I discovered Mr. Money Mustache (MMM). He’s somebody who “retired” at the age of 30 by saving ~3/4 of his income and living a frugal lifestyle. A unique and enthralling way of living the American Dream. I have been reading through his different posts and like what I read. As you could likely guess, many of MMM’s posts deal with investing. I like his posts because they encourage me to think about topics and ideas that I might otherwise not think about.

One of these topics is the “safety factor” when dealing with investing. Safety factors are commonly used in engineering, and MMM was trained as an engineer. In a nutshell, a safety factor takes the worst-case scenario and then applies a multiplier (e.g., 10 times that value) as a margin of error. As an example, perhaps a type of chair breaks at 1,000 lbs during a test. A safety factor of 2 would limit the chair to a 500 lb maximum in case the wood deteriorates or other stressors are added or not accounted for.

In contrast, I was trained as an ecotoxicologist with a healthy mix of statistics tossed into my education. We usually deal with messy data, and safety factors are often discouraged because they often do not perform well. One alternative approach is probabilistic risk assessment. This approach assigns a probability distribution to inputs (e.g., market returns) and, conditional on the quality of the data, allows one to estimate the probability of an event occurring. In this post, I apply a probabilistic approach to explore MMM’s investment assumptions.

MMM usually assumes a 5% return rate on the stock market (or more specifically, on broad index funds) after inflation and uses a 4% withdrawal rate as an added layer of safety. He uses this to define the “safe withdrawal rate” as

“the maximum rate at which you can spend your retirement savings, such that you don’t run out in your lifetime.”

Besides applying this to retirement, he also applies this idea to understanding how investments grow. Overall, I like his approach. It is a nice approximation that is easy to understand. That being said, I use probabilistic tools at work and I want to explore this investment concept more.

Motivating my own case study is my pending switch from a contractor to a Federal Term employee. I will have ~$3,000 in my 401k when I leave and am wondering how much money will be there in 30 years when I can access it. I’ll do my simulations in R because I am familiar with the program and it is free, so anybody can “follow along at home” if they would like (note that you should not copy the “>” prompt into R).

To start off, I define two variables:

> nYear = 30 # number of years
> growth = 1.05 # annual growth after inflation

and then use this to calculate how much money I will have in 30 years. This is a difference equation that MMM likes to use and the math is simple:

> growth^nYear
[1] 4.321942

Based upon this worst-case scenario, I should expect to have 4.3 times my starting investment. Therefore, my $3,000 will become about $13,000 when I can access it in 30 years. However, what about the variability and uncertainty in this projection? This year I might get 1.05 growth, but next year could be 1.03 and the year after 1.05. We could simply take the product of 30 years:
1.05 * 1.04 * 1.03 * 1.06 * …
but this is tedious, error prone, and boring. Programming and simulations come to the rescue in situations such as this. Instead of generating the numbers by hand, we can assume that they come from a normal distribution. This requires me to guess a standard deviation. I’ll guess 0.05 because it seems reasonable and I do not know a great deal about quantitative finance. In R, this looks like:

> rnorm(n = nYear, mean = 1.05, sd = 0.05)
 [1] 1.0460267 1.0851230 0.9941926 1.0551473 1.0233291 1.0695875 1.0287381 1.1444693
 [9] 1.0428234 1.0557862 1.0947897 1.0367700 1.0036091 1.0797985 0.9861632 1.0154970
[17] 0.9705547 1.1681693 1.0083213 1.1077836 1.0220802 1.0551286 0.9268657 0.9814797
[25] 0.9921165 1.1120307 1.0640827 1.0439697 1.0824330 1.0698607
This gives us 30 different growth rates. We then take the product of these and save it as x:
> x = prod(rnorm(n = nYear, mean = 1.05, sd = 0.05))
> 
> x
[1] 3.318441

In this example, my investment becomes about 3.3 times larger. But this is just one possible trajectory. To check out more, I created a function that also plots a histogram and tells me where 95% of the simulations fall:

marketReturn <- function(nYear, growth, sd, reps = 10000, title = "Histogram"){
    x <- numeric(reps)
    for(i in 1:reps){
        x[i] <- prod(rnorm(nYear, growth, sd))
    }
    hist(x, main = title)
    return(list(
        min = min(x),
        max = max(x),
        ci95 = quantile(x, c(0.025, 0.975))
    ))
}

I can now use this function to run 10,000 simulations:

> sd = 0.05
> marketReturn(nYear = nYear, growth = growth, sd = sd, reps = 10^4, title = paste("sd =", sd))
$min
[1] 2.070814

$max
[1] 9.194219

$ci95
    2.5%    97.5% 
2.996903 6.019339

[Histogram of the 10,000 simulated growth multiples, sd = 0.05]

In case you are unfamiliar, a histogram shows the number of counts on the y-axis and the observations on the x-axis. In this case, the x-axis is how many times my money has grown at the end of 30 years (e.g., 4 times).

Thus, assuming a worst-case scenario, I can expect my money to grow to between 3 and 6 times its current value. However, if we dig around on the web, perhaps we can do better than my guess for the standard deviation.
This page lists 0.0346 as a standard deviation:

> sd = 0.034
> marketReturn(nYear = nYear, growth = growth, sd = sd, reps = 10^4)
$min
[1] 2.360319

$max
[1] 7.885858

$ci95
    2.5%    97.5% 
2.989520 6.035744

[Histogram of the 10,000 simulated growth multiples, sd = 0.034]

This yields a similar result. Another site uses 0.15, but this seems high:

> sd = 0.15
> marketReturn(nYear = nYear, growth = growth, sd = sd, reps = 10^4, title = paste("sd =", sd))
$min
[1] 2.191817

$max
[1] 7.982334

$ci95
    2.5%    97.5% 
2.977771 6.008814

[Histogram of the 10,000 simulated growth multiples, sd = 0.15]

Regardless, the 5% annual growth after inflation is a worst-case scenario from MMM. A more optimistic return might be 7%:

> growth = 1.07
> sd = 0.05
> 
> marketReturn(nYear = nYear, growth = growth, sd = sd, reps = 10^4, title = paste("Mean growth of 7% with sd =", sd))
$min
[1] 2.830627

$max
[1] 18.28947

$ci95
     2.5%     97.5% 
 4.491611 12.106446

[Histogram of the 10,000 simulated growth multiples, mean growth of 7% with sd = 0.05]

Thus, if I am lucky, or at least average, I will most likely have 6-7x my initial amount when I retire, so my $3,000 will likely be $18,000 to $21,000 in today’s dollars! If I am unlucky, I will only have ~3x my current amount (actually my worst case was ~2.8, which is more pessimistic than MMM’s worst case). And if the market does very well, I could have 12-18x my current amount!
Einstein is often quoted as saying:

“Compound interest is the most powerful force in the universe.”

Although this quote may or may not be from him, these simulations demonstrate the power of compound interest.

Although past performance is no indicator of future gains, the future does look promising! If you’d like to explore different scenarios, please feel free to adapt and share this code. If you find anything cool or any bugs in my code, please share it in the comments.

Midwest Mathematical Biology Conference

Two weekends ago, I was lucky enough to attend the Midwest Mathematical Biology Conference.

This meeting was hosted by the University of Wisconsin-La Crosse and was the first time the event had been held. Overall, the meeting was great, and I especially enjoyed the keynote speakers. Meetings like this help inspire me because I enjoy seeing how other people solve problems. I also enjoy the networking and collaborations formed between researchers.

The keynote address that stood out most to me was the one by Carlos Castillo-Chavez. Carlos provided a great overview of epidemiological modeling and of how to use multiple scales of modeling together. He summarized a project that used an individual-based model to examine how disease spreads at the city level and then scaled up the results to a national level. These results were also compared with traditional SIR models. The research does a great job illuminating where people catch infectious diseases during outbreaks. Surprisingly, most people get sick at home. This happens because people have the most contact with others when they are at home. However, when school is not in session, outbreaks are less likely to occur.

Besides the other great talks, my postdoc mentor had a great realization: the USGS has very few mathematicians, and all of them are doing geology (or at least non-biology) research. We wished we had more mathematical support. Furthermore, the Midwest Mathematical Biology Conference expanded his view of biomath. Even if I had gained nothing else from the meeting, having my postdoc mentor appreciate math more would have made it all worthwhile! However, I did gain more. So, the meeting was a great opportunity!