Knitr and R Markdown

I’m late to the game, but have recently begun using R Markdown. I was motivated because my employer now has an open data/open code requirement for all data we generate. My specific problem was that I often write R code, but need to document what I am doing so that I can share it. Hence, R Markdown was a perfect solution for me. As an added bonus, I have also switched over my R teaching materials to R Markdown and am now using R Markdown to develop an online course on mixed-effect models with R.

Previously, I used Sweave. Although powerful, Sweave offers functionality similar to R Markdown but requires the file to be compiled multiple times. Thus, Sweave offers me no benefit over R Markdown.

I usually use RStudio as my editor and love how it works. RStudio is easy to use and R Markdown is well documented. I was able to learn the program easily and get up to speed because of 3 factors. First, I previously used Sweave. Second, I am familiar with Markdown from StackOverflow. Third, I am good with R. My only regret is that I did not start using it earlier.

As for time, learning R Markdown only required a couple of hours on a Monday afternoon and I am now fully up to speed. The tutorials built into RStudio were fabulous! In summary, I would recommend R Markdown for everybody wanting to create documents with R code embedded within them!

 

Review of Modeling for Insight

During graduate school, I went to a Joint Mathematics Meeting and attended a session on “quantitative literacy” as part of higher education. During that session, the book Modeling for Insight: A Master Class for Business Analysts was recommended to me, and I purchased it shortly after the meeting.

The book was written by two professors: Stephen Powell, currently at Dartmouth College, and Bob Batt, currently at UW-Madison. They wrote the book for their graduate-level business program while both were at Dartmouth College. The target audience is people who need to do quantitative modeling and analysis, but lack programming skills (e.g., MBA students). The book teaches both the modeling process necessary to make quantitative decisions and how to do so using Microsoft Excel (with some plugins).

The book’s overview of how to make quantitative decisions makes it a worthwhile purchase in and of itself. For example, the book describes “spreadsheet engineering” as four steps:

  1. Design
  2. Build
  3. Test
  4. Analyze

Although these are basic steps, they are important. Many scientists I know build models using more complicated tools and do not use all of these steps (for example, many ecological modelers do not “test” their code). Also, many builders of simple scientific models lack any formal process! Furthermore, even though the book targets business applications, many of the lessons apply to scientists as well. Indirectly, the authors do a good job of teaching the modeling process. Directly, much of modern science involves running large projects and planning these projects. The case studies in the book can be adapted to scientific projects just as easily as to traditional business projects.
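For readers who do code, here is a minimal sketch of how those four steps might carry over from a spreadsheet to a script. The toy profit model, its parameter names, and all of the numbers are hypothetical illustrations of mine, not examples taken from the book.

```python
"""Toy profit model walked through design, build, test, and analyze."""

# 1. Design: decide on the inputs, the output, and how they relate.
#    Inputs (all made up): price per unit, units sold, unit cost, fixed cost.
#    Output: profit.


# 2. Build: implement the relationship as a small, pure function.
def profit(price, units, unit_cost, fixed_cost):
    """Return profit = revenue - variable cost - fixed cost."""
    return price * units - unit_cost * units - fixed_cost


# 3. Test: check the model against cases that can be worked out by hand.
def test_profit():
    assert profit(10.0, 0, 4.0, 500.0) == -500.0  # no sales: lose the fixed cost
    assert profit(10.0, 100, 4.0, 600.0) == 0.0   # break-even point
    assert profit(10.0, 200, 4.0, 600.0) == 600.0


# 4. Analyze: vary one input to see how sensitive the output is.
def sensitivity(prices, units=150, unit_cost=4.0, fixed_cost=600.0):
    """Return a {price: profit} mapping for a list of candidate prices."""
    return {p: profit(p, units, unit_cost, fixed_cost) for p in prices}


if __name__ == "__main__":
    test_profit()
    for price, value in sensitivity([8.0, 10.0, 12.0]).items():
        print(f"price={price:5.2f}  profit={value:8.2f}")
```

The point of the sketch is the "test" step: the hand-checked cases run every time the model changes, which is exactly the habit many model builders skip.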

I have recommended this book to friends who work in business and to the admin team at my center for their own business modeling needs. Additionally, I have recommended the book to ecologists who teach modeling classes with Excel because of its valuable Excel modeling content. More broadly, this book is good for anyone who wants to learn spreadsheet modeling or wants an introduction to modeling in general.

That being said, there were a few things I disliked about the book. First, I do not like that they require Excel plugins. I understand why the authors use them, but I could not justify buying them. For example, Oracle Crystal Ball currently costs just slightly less than $1,000! However, this makes a good case for learning open source programs such as R or Python! Second, some of their terms, like “spreadsheet engineering”, seem gimmicky to me. However, I suspect jargon like that is often used in the business world. Last, I am a coding snob and think everyone should learn programming. But, alas, we live in an imperfect world…

Overall, I give this book 5/5 stars! I recommend this book for business people who need to learn to model, but do not want to code; ecologists who use spreadsheet models (although I shake my head at you while doing so); and professors teaching undergraduates spreadsheet modeling in either ecology or business classes.

Data.gov, a great source of teaching data

Recently, I’ve been finding datasets for an R course I’m teaching. In the past, I’ve used my own data or the default R datasets. But, I wanted to find datasets that would appeal to a broader audience than the ecologists and environmental scientists I usually interact with. Enter Data.gov.

Although the site is quirky, I’ve found several interesting datasets for students to explore. These range from economic data to medical data to crime data. I only have 3 criticisms. First, it took me a little while to figure out how to navigate the page. However, I quickly got past this. Second, many cool datasets only have metadata because the original data cannot be shared. Alas, all good things have limits. Third, several pages had dead links. However, I was often able to find the original dataset using a quick Google search.

In conclusion, Data.gov rocks. It is an integrated data warehouse for data from around the US, ranging from local governments up to state and federal datasets. If it were a commercial product, I would give it 3 stars, but as a government product I give it 4 stars because of all the agencies it bridges.