Data.gov, a great source of teaching data


Recently, I’ve been finding datasets for an R course I’m teaching. In the past, I’ve used my own data or the default R datasets. But this time I wanted to find datasets that would appeal to a broader audience than the ecologists and environmental scientists I usually interact with. Enter Data.gov.

Although the site is quirky, I’ve found several interesting datasets for students to explore. These range from economic data to medical data to crime data. I have only three criticisms. First, it took me a little while to figure out how to navigate the page, though I got the hang of it quickly. Second, many cool datasets only have metadata because the original data cannot be shared. Alas, all good things have limits. Third, several pages had dead links. However, I was often able to find the original dataset with a quick Google search.

In conclusion, Data.gov rocks. It is an integrated warehouse for data from around the US, ranging from local governments up to state and federal datasets. If it were a commercial product, I would give it 3 stars, but as a government product I give it 4 stars because of all the agencies it bridges.

My Favorite Topic to Teach


In this post I’m going to discuss a topic that I’m currently covering in UW – La Crosse’s MTH 353 Differential Equations course: Laplace transforms. While (like many math topics) I didn’t appreciate transforms as much as I should have when I learned them for the first time, they have become my favorite topic to teach in the undergraduate curriculum.

First, Pierre-Simon Laplace was an absolute crusher in the mathematical sciences. In addition to the transform method that bears his name, he’s responsible for a lot of the theoretical underpinnings of Bayesian statistics (one of Richard’s favorite topics), tidal flow, spherical harmonics, potential theory and Laplace’s equation, among many other things.

The Laplace transform, in its simplest application, transforms a linear, generally inhomogeneous, constant-coefficient ordinary differential equation in the time variable t into an algebraic equation in a (complex) frequency variable s.
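Concretely, the transform is defined by an improper integral, and its action on derivatives is what turns calculus into algebra:

    L{f}(s) = ∫_0^∞ e^(-st) f(t) dt,    L{f'}(s) = s L{f}(s) - f(0).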

What I love most about Laplace transforms as a topic in the mathematics curriculum is that they require students to apply techniques from earlier in their training (a small worked example follows the list below). For example:

– Completing the square (elementary algebra)

– Horizontal translation of functions (elementary algebra)

– Improper integration (second-semester calculus)

– Partial fraction decomposition (second-semester calculus)

– Linear transformations (linear algebra/functional analysis)

– Elementary theory of linear ODEs (elementary differential equations)
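To see several of these at work at once, consider inverting F(s) = 1/(s^2 + 2s + 5). Completing the square gives F(s) = 1/((s + 1)^2 + 4), and undoing the horizontal translation in s (the first shifting theorem) yields

    f(t) = (1/2) e^(-t) sin(2t).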

Another nice thing about the Laplace transform is that it can handle discontinuous inhomogeneous (forcing) data like Heaviside step functions and Dirac delta functions, in addition to the standard forcing terms like polynomials, sines, cosines and exponential functions. When viewing the solution to a differential equation in this setting, in the space of functions of the variable s, one can clearly see how initial data and forcing data are propagated in time back in the original solution space.

If you’re interested in Laplace transforms, I’ve created some videos for my MTH 353 course, and they can be found here on my YouTube page!

Clustering Applied to Quarterback Play in the NFL


In this blog post I want to talk a bit about unsupervised learning. As some of you who know me may know, I am relatively new to data science and machine learning, with my formal training in applied mathematics/mathematical biology. My interest in machine learning came not through mathematical biology or ecology, but through studying football.

Using ProFootballFocus data (I am a data scientist for PFF) we can study the quality of quarterback play by grading players on every play of every game of every season. To do so, it’s most efficient to “cluster” quarterback seasons into buckets of similar seasons. The best way to do this (to date) is through k-means clustering.

While there are many references on k-means clustering in the literature and on the web, I’ll briefly summarize the idea in this blog. K-means clustering is an unsupervised learning algorithm that aims to partition a data set of n observations into k clusters, where each observation belongs to the one cluster with the nearest mean. Visually, one can think of a cluster as a collection of objects in m-dimensional space that are “close” to each other. Below is an example of clustering quarterbacks from the 2016 season by their proportions of positively-graded and negatively-graded throws, with different clusters visualized in different colors.
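To make the idea concrete, here is a minimal sketch in R (made-up numbers, not actual PFF data; the variable names are mine):

    # each row is one quarterback season, described by its proportions
    # of positively- and negatively-graded throws (simulated here)
    set.seed(42)
    qb_seasons <- data.frame(
      pos_graded = runif(50, 0.15, 0.35),
      neg_graded = runif(50, 0.05, 0.20)
    )

    # k-means alternates between assigning each season to the nearest
    # cluster mean and recomputing the means until assignments stabilize
    fit <- kmeans(scale(qb_seasons), centers = 4, nstart = 25)

    # color each season by its cluster, as in the figure described above
    plot(qb_seasons, col = fit$cluster, pch = 19,
         xlab = "Prop. positively-graded throws",
         ylab = "Prop. negatively-graded throws")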

As a part of our in-depth study of quarterback play at PFF, we clustered quarterbacks on the composition of their play-by-play grades in various settings (when under pressure, when kept clean, when using play action). This gave us a tier-based system with which to evaluate the position throughout the PFF era (2006-present). In 2016 the only quarterback in our top cluster on all throws, throws from a clean pocket, throws under pressure, and throws on third and long was New England Patriots’ star Tom Brady.

Stay tuned for more of an in-depth look at the quarterback position by visiting profootballfocus.com both in-season and during the offseason.

Teaching Mathematical Biology at the College Level


As another semester at the University of Wisconsin – La Crosse reaches its halfway point, it’s time to start preparing for my spring class – MTH 265: Mathematical Models in Biology. This is a course that only has a first-semester calculus prerequisite, meaning that it is unlike many of the mathematical biology courses around the world (which often require differential equations and/or linear algebra as a prerequisite).

My thought process when teaching this course is that the students likely do not have the mathematical background to fully appreciate the breadth and depth that mathematical biology has to offer. Whether it’s the global and/or asymptotic stability of equilibria of difference equations, principal component analysis applied to multivariate data, or Markov processes applied to allele frequencies, research-level mathematical biology requires mathematical flexibility and maturity. However, most of the students in my MTH 265 class are not mathematics majors. Many will be researchers or practitioners in the life sciences, though, meaning that they will have to interact in a meaningful way with mathematicians, statisticians and computer scientists at some point during their careers. Thus, my goal for the course eventually became to give a survey of many different topics pertaining to mathematical biology during the 15-week course. This way, students will know that a solution (possibly) exists to their quantitative problems (even if they may not be able to come up with it themselves). Simply knowing such a solution exists allows one to approach the right people for collaborations, and keeps the math-biology interface a fruitful one.

Survey courses are fairly common in graduate work, but students in their second semester of mathematics are pretty new to reading mathematics. Thus, to cover the material in 15 weeks, I created a collection of videos as a part of an inverted, or “flipped” classroom.  Videos appear to be a medium that reaches current students better than (or in conjunction with) traditional textbooks. Students were asked to view these videos prior to class, while during class they were assigned groups in which they worked on “case studies” that took the duration of the hour. I provided assistance with the case studies, as well as any homework questions the students had.

The term “flipped” comes from the way the course is structured relative to a traditional course, where lectures occur during the regular class period (where the professor is present but the student engagement is low) and homework/case studies occur outside of the classroom space (where demands on the student are high, but direct help from the professor is not immediately available).

This course has been a great success. Some of the things we’ve learned from flipping the course can be found in this paper, and were used in a section of Grand Valley State professor Robert Talbert’s new book on flipped learning in the college classroom. I owe a great deal of my ideas to the Mathematical Association of America, especially their Project NExT program. The progress we’ve made as educators in just the short time (six years) I’ve been a mathematics professor has me excited for what is to come.

Continuous time, discrete event models


Recently, I’ve been exposed to situations where I am trying to model discrete, binary events (i.e., 0 or 1, like heads-or-tails). My knee-jerk response has been to use logistic regression or another model with a binomial outcome. The jack-of-all-trades generalized linear model usually serves me well in these situations. However, my recent problems have had continuous-time predictors. Although a Cox proportional hazards model can be used when the event is something like survival, that did not seem appropriate for my situation because I had multiple events occurring per individual. Enter continuous-time, discrete-event models.

A Poisson regression is similar to a binomial one if the probability of an event occurring is small enough. Enter the Poisson regression as a method for modeling animal behavior. I first saw this in a mathematical statistics paper describing models of animal movement, but found another paper by some of the co-authors that was more accessible. From this, I learned I needed to use the following version of the Poisson regression:

y ~ Poisson(μ)

μ = τ exp(x′β),

where τ plays the role of an exposure time (the length of the observation interval, entering as an offset) and x′β is the usual linear predictor.

I was able to program this in Stan by adapting code I found online. This model can also be modified to treat individuals as a random effect (and prevent pseudo-replication) if the data allow or require it.
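If you want to experiment before reaching for Stan, the fixed-effects version of this model is a one-liner in base R, with the interval length entering as an offset (simulated data and variable names are mine, purely for illustration):

    # simulate toy data: event counts whose rate scales with exposure tau
    set.seed(1)
    n   <- 200
    tau <- runif(n, 0.5, 2)          # observation interval lengths
    x   <- rnorm(n)                  # continuous predictor
    mu  <- tau * exp(-1 + 0.5 * x)   # mu = tau * exp(x'beta)
    y   <- rpois(n, mu)

    # log(tau) enters as an offset, exactly as in the equation above
    fit <- glm(y ~ x + offset(log(tau)), family = poisson)
    summary(fit)                     # estimates should be near (-1, 0.5)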

The Population Dynamics of Disturbance Specialist Plant Populations


In my second post for Quantitative Dynamics I’m going to discuss a topic that I have studied since my graduate work at the University of Nebraska. In 2010 Brigitte Tenhumberg, Richard Rebarber, Diana Pilson and I embarked on a journey studying the long-term, stochastic dynamics of wild sunflower, a disturbance specialist plant population that uses a seed bank to buffer against the randomness of disturbances.

Because the seeds of disturbance specialist plants cannot germinate without a soil disturbance, there are many periods of time for which these populations will have zero or few above-ground plants, and hence no new members of the population from one season to the next. As such, much like a freelance worker with uncertain pay, a seed bank (account) is necessary for long-term viability.

In our work (which you can find here, here and here) we created an integral projection model with stochasticity in 1) the presence of a disturbance and 2) the depth of a disturbance. We found through mathematical analyses and simulations that the presence of disturbances increased population viability (as you would expect), but that the intensity, depth and autocorrelation of disturbances affected populations differently depending on their viability. For populations that were viable, increasingly intense and positively-autocorrelated disturbances enhanced long-term population sizes, whereas for populations near extinction levels both of these features were actually harmful to viability. These results were novel and surprising. You can find my blog post on the topic in The American Naturalist as well.

In subsequent work we would like to study the transient dynamics of such systems. Transient dynamics, to this point, have not garnered the same attention as long-term dynamics in stochastic systems. However, my friend Iain Stott and colleagues have gotten the ball rolling in that direction, and it’s only a matter of time.

Integral projection models


Matrix population models describe populations using discrete life-, size-, or age-stages. Scientists apply these models to understand population ecology and to guide conservation. However, some species have continuous life histories. For example, thistles grow continuously, as presented within this paper.
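Integral projection models (IPMs) handle this case: the population is described by a density n(x, t) over a continuous state x (such as size), and the projection matrix is replaced by a kernel,

    n(y, t + 1) = ∫ K(y, x) n(x, t) dx,

where K(y, x) combines survival, growth from state x to state y, and fecundity.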

Fish also grow continuously. We sought to understand how different management approaches could be used to control grass carp, a species that impacts native ecosystems by out-competing native fish. Managers were interested in evaluating the use of YY-males to control populations. YY-males work because they spawn and produce only male offspring. Thus, it is possible in theory to cause a population to crash by biasing the sex ratio.

We constructed an integral projection model for grass carp and compared different YY-male release strategies. We found that the YY-male strategy does not pair well with the life history of grass carp, because the species is long-lived and females produce many offspring.

An Introduction to Galton-Watson Processes


Howdy! I’m Eric Eager, and I’m an associate professor of mathematical biology at the University of Wisconsin – La Crosse. I’m also a data scientist for Pro Football Focus and Orca Pacific. In my first post for Quantitative Dynamics, I’m going to discuss a topic near and dear to my heart: branching processes (thanks to Sebastian Schreiber for teaching me these five years ago).

Branching processes are a great bridge between the continuous-state population models that permeate the ecological literature (e.g. Caswell 2001; Ellner, Childs and Rees 2016) and the individual-based realities that drive ecological systems (Railsback and Grimm 2011). A branching process model specifies an absorbing state (usually extinction in ecology) and models the probability of reaching that state via an iterative map from one generation to the next. This allows you to work with a model whose state space is a set of discrete values (as in individual-based models), while the resulting model is a difference equation (as in traditional ecological models).

The most famous example of a branching process is the Galton-Watson process. Francis Galton was concerned about the eventual fate of surnames (a quaint artifact of the past), especially among the aristocracy. Below are a couple of videos I made, one deriving the Galton-Watson process and one solving it. Enjoy!
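If you’d like to play along at home, here is a quick R sketch (parameter choices are mine, purely for illustration) that simulates a Galton-Watson process with a Poisson offspring distribution and estimates the probability of extinction, i.e., the surname dying out, by brute force:

    # one lineage: each individual independently leaves a
    # Poisson(lambda) number of offspring in the next generation
    simulate_gw <- function(lambda, generations = 50, z0 = 1) {
      z <- z0
      for (g in seq_len(generations)) {
        if (z == 0) return(0)        # the surname is extinct
        z <- sum(rpois(z, lambda))   # offspring of the current generation
      }
      z
    }

    # Monte Carlo estimate of the extinction probability
    set.seed(123)
    sims <- replicate(10000, simulate_gw(lambda = 1.2))
    mean(sims == 0)   # approximates the root q of q = exp(lambda * (q - 1))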

Review of “How Not to be Wrong: The Power of Mathematical Thinking”

I just finished reading “How Not to be Wrong: The Power of Mathematical Thinking” by Jordan Ellenberg. My younger brother lent me his copy. In a sentence, the book can be summed up by this phrase from its pages: “Mathematics is the extension of common sense by other means.” The book does a great job of explaining how mathematics and statistics can be used to understand the world around us. It is filled with many good examples, such as Wald’s WWII work on where to place extra armor on planes: Wald was given data on where planes got shot and was asked where extra armor should be placed (answer: the places without the holes, since planes shot there never made it home!). The book contains many other interesting examples as well.

The only downsides are that the book can become long and drawn out at times, and that I was already familiar with many of the examples. Finally, I would add that the book is “math lite,” which is a strength for many potential readers.

6 tips for a new LaTeX user

Recently a coworker started using LaTeX and asked for some tips. Here are my six tips for getting started with LaTeX:
  1. Finish what you start (i.e., close environments or else you get errors or weird bugs); for example, \begin{equation} needs an \end{equation}
  2. Every document needs 3 things: \documentclass{<class>}, \begin{document}, and \end{document}
  3. For equations, use \( \) inline and \begin{eqnarray} \end{eqnarray} for displayed equations. \\ creates a new line, \nonumber leaves the current line unnumbered when continuing an equation, and &=& aligns multiple equations, for example:
    \begin{eqnarray}
    a &=& b + c \\
    a &=& b \nonumber \\
      & & c
    \end{eqnarray}
     Gives you something like:

    a = b + c   (1)
    a = b
            c   (2)

  4. Bib files are your friend for citations. Use Google Scholar to populate new citations.
  5. \textit{My italics text} and \textbf{my bold text} should get you most of your formatting. Do NOT use the deprecated {\bf bold} or {\it italics} style (cf. http://tex.stackexchange.com/questions/41681/correct-way-to-bold-italicize-text for more details on the second point)
  6. {} can be very helpful, especially in complicated math expressions where grouping matters. For example, \sigma_{\pi^2}^{2}
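Putting tips 1-3 and 6 together, here is one minimal document that compiles (my own skeleton; there are many ways to arrange this):

    \documentclass{article}
    \begin{document}
    Inline math like \( \sigma_{\pi^2}^{2} \) sits in the text, while
    \begin{eqnarray}
    a &=& b + c
    \end{eqnarray}
    is displayed and numbered.
    \end{document}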