ArcPy tips and observations


When I started using ArcGIS a few weeks ago, I quickly became frustrated when operations failed after pointing and clicking several times. This motivated me to start scripting in ArcPy. Now, I only need to load the script file once to make it crash. Thus, I am crashing more efficiently :). Overall, my experience with ArcPy has been alright. I discovered two blog posts about hating ArcGIS 10 and ArcGIS sucking that are well worth the read. Despite the titles, the author actually likes ArcGIS. I understand why. ArcPy and ArcGIS are powerful. Plus, the documentation is usually well done and easy to find online (although including data and reproducible examples, as R’s help files do, would be a great addition). However, ArcPy does have some problems/idiosyncrasies. Here’s what I have discovered:

1) Non-basic error messages are often cryptic, and some require hours of searching on Google. The archives on GIS Stack Exchange have become a new favorite page of mine. As an example, I got this error message (I removed and shortened my file paths in the code snippet):

Layer (Folder) exists, please delete output if you wish to recreate layer
done with extraction for us_asp
0:00:36.046000
Could not create tile. name: z:\my documents\...\slopeclipped, adepth: 32, type: 1, iomode: 3, version: 3, compression: 0, eval case: 1 (GetOpenTileChannel)
Traceback (most recent call last):
File "", line 1, in
File "z:/My Documents/.../PythonFile.py", line 77, in
mF.RasterClipper(Slope, grid, SlopeClipped, arcpy.env.workspace)
File "myFunctions.py", line 37, in RasterClipper
arcpy.Clip_management(raster, clipExt, out, "", "ClippingGeometry")
File "C:\ArcGIS\Desktop10.1\arcpy\arcpy\management.py", line 12015, in Clip
raise e
arcgisscripting.ExecuteError: ERROR 999999: Error executing function.
Failed to copy raster dataset
Failed to execute (Clip).

This error leads me to my second tip.

2) The workspace path should not include a space. Sometimes, but not all the time, a space causes problems. This took me a while to figure out, and I only found it by chance online when looking up another problem. For example, this code
arcpy.env.workspace = "C:/Folder/"

is okay, but
arcpy.env.workspace = "C:/My Documents/"

caused my example problem.
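
One way to catch this before ArcPy does is to check the path yourself. The sketch below is plain Python and does not call ArcPy at all; the helper name `find_path_spaces` is my own invention for illustration, not part of ArcPy.

```python
def find_path_spaces(path):
    """Return the components of `path` that contain a space.

    Hypothetical helper (not part of ArcPy). An empty list means
    the path has no spaces in any folder or file name.
    """
    # Treat Windows backslashes and forward slashes the same way.
    parts = path.replace("\\", "/").split("/")
    return [part for part in parts if " " in part]


for workspace in ["C:/Folder/", "C:/My Documents/"]:
    offenders = find_path_spaces(workspace)
    if offenders:
        print("Risky workspace", workspace, "-> contains spaces in:", offenders)
    else:
        print("Workspace looks fine:", workspace)
```

Running a check like this right before setting arcpy.env.workspace would at least turn one cryptic ERROR 999999 into a readable warning.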

3) Be prepared to use libraries such as NumPy. ArcPy does not always play well with USA-wide data. This seems to be the case for ArcPy’s wrapper functions more so than the base functions (my guess is that the best-performing functions are GDAL functions). I plan on posting my NumPy-based function once I have written and tested it.

4) “Python Scripting for ArcGIS” is still a great reference!

I plan to post more of my project as I work through it.

Learning Python


Over the past year, I have been teaching myself Python. I am writing this blog post so I remember how I wish I had learned Python in case I ever need to teach anybody else. My desire to learn Python has been driven by several factors: 1) my laziness and disdain for GUIs (i.e., I hate pointing and clicking); 2) my need to solve complex problems that GUIs would not solve (i.e., GUIs could not solve my problem); and 3) my curiosity to learn new things.

Python is a powerful computer language. I want to use Python for scripting and scientific computing, but many other uses of Python exist. A great way to learn Python and some computer science theory is with the book “Think Python: How to Think Like a Computer Scientist”. The book was originally written by the author for his college class using the Java language. However, he published the book under a Creative Commons license, and a high school teacher adapted the book for Python! Score one for open source freedom.

After working through this book, I went to the “Invent with Python” books, which are fun (but not necessarily that useful for my goal…). The first book, “Invent Your Own Computer Games with Python”, covers basic game designs that are terminal based. The second book, “Making Games with Python and Pygame”, includes graphics. These books are good because they are simple and fun. If I were simply learning Python for fun, I would start with these books rather than “Think Python”. These books also gave me more experience coding (it’s been said that the first 10,000 lines of code are learning, so at least these gave me a fun couple of thousand), so perhaps the sugar coating of game development worked on me as well.

After these books, dive into subject area books. For me, these were GIS books. “Python Scripting for ArcGIS” does a great job introducing and covering ESRI’s flagship product, ArcGIS. The only downside is that the software is closed source. I decided to stop trying to read the source code when a .py file included a warning about being a trade secret. “Learning Geospatial Analysis with Python” is an introduction to advanced (but open source) libraries such as Shapely and GDAL. However, the book is a bit pricey for what you get, although it is easier to read than the author’s blog, which contains most of the same material. Also note that the ebook kills Python’s tab spacing (Python uses white space such as tabs to delimit functions and other blocks of code). I was also initially overwhelmed because I knew neither GIS nor Python the first time I read it. My co-worker’s copy of “The Geospatial Desktop” provided a great refresher for my GIS skills!
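
To see why an ebook that strips indentation is a real problem, here is a small sketch using only the built-in compile() function. The `compiles` helper and the `area` function are made-up names for illustration; the two source strings are identical except for the leading spaces on the second line.

```python
def compiles(source):
    """Return True if `source` is valid Python, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


# The same function, with and without the indentation an ebook might strip.
indented = "def area(width, height):\n    return width * height\n"
flattened = "def area(width, height):\nreturn width * height\n"

print(compiles(indented))   # True
print(compiles(flattened))  # False
```

In other words, code copied from such an ebook has to be re-indented by hand before it will even load.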

Finally, “Learning Python, 5th ed.” is a tome, but very thorough. My own course of study began with the games books. Then I tried “Learning Geospatial Analysis with Python”. Along the way, I bought “Learning Python” and then discovered “Think Python”. I discovered the ArcGIS book only when I needed to script ArcPy for work. Not an optimal course of study, but needing to script ArcGIS has been a great crash course in Python for me. As an added bonus, Python has also made me a better R programmer! Hooray for learning.

 

 

Maxent, a tool for ecologists and other spatial modelers


What is Maxent?

Maxent is a tool that uses a “maximum-entropy approach for species habitat modeling”. Maximum entropy is a statistical concept that may be used as a method for fitting distributions. A research group from Princeton University applied this approach to habitat modeling for ecologists. Specifically, Maxent is designed to model presence-only data, which are traditionally difficult for ecologists to analyze. Such data occur commonly in ecology because we know where organisms are found, but do not have good observations of where they are not located.

How does Maxent work?

That’s a good question. The underlying theory was based upon computer science and machine learning research from the 1950s (as a side note, applied statisticians across different disciplines are often discovering and rediscovering each other’s work. The proliferation of science has only made the lack of cross-communication worse because of the volume of scientific output produced). The Maxent authors provide three articles explaining the software on the program’s homepage. However, as far as I can tell, the program is closed source, which means looking at the code is difficult if not impossible (for this reason, a colleague of mine passionately dislikes Maxent). Luckily for us, Renner and Warton dove into the weeds and explained how Maxent works. They discovered that the underlying model is simply a Poisson point process model (a generalized linear model [GLM] with a Poisson error distribution). What makes Maxent special is how the GLM is parametrized. Maxent uses a lasso approach. As an interesting side note, Warton also shows how the Poisson point process model converges to logistic regression under certain conditions.

What does this mean for the casual user?

Ecologists are known for having messy data and needing powerful statistical tools. That being said, ecologists have also been criticized for not knowing enough statistics. For the non-statistician ecologist, Warton’s research means we have expanded our theory of how Maxent works. Ecologists also often become polarized and even dogmatic in their viewpoints. Some people hate Maxent while others seem to worship it. Personally, I am skeptical of it because the program feels black box/closed source and some users over-hype it. When Warton made some of these points on the Methods in Ecology and Evolution blog (while posting as an Associate Editor), he elicited some colorful comments. Additionally, some of Warton’s discussion arose because Maxent was initially presented as being free of some assumptions of a GLM. Not surprisingly (if you’ve hung out around ecologists), one of Warton’s critics soon dove into a God/religion comparison. Ironically, I would argue that his or her view supporting Maxent is dogmatic. Rather than embracing a black box approach, programming a transparent approach seems less dogmatic, but I digress (Ellison and Dennis make this point in their pathways to statistical fluency article when being critical of Program Mark users for lacking control over model assumptions).

What does this mean for the “DIY” crowd/advanced users?

The DIY (do-it-yourself) crowd that is comfortable programming would not need Maxent. The glm function and a stepwise function in R combined with a GIS program such as GRASS or ESRI’s ArcGIS should be able to get the job done for a basic user (some R users might be able to do everything in R, but I have not had the best luck using GIS layers in R). For a more advanced user, a Bayesian program such as JAGS or Stan could be used for model parametrization and selection.
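
To give a feel for what “just a Poisson GLM” means, here is a minimal sketch in plain Python (no libraries) that fits log(mu) = b0 + b1 * x by Newton-Raphson, which is essentially what a glm routine does under the hood for this model. Everything here is an assumption for illustration: the function name `fit_poisson_glm`, the single covariate, and the made-up counts. It leaves out Maxent’s lasso penalty and the background (quadrature) points a real point process fit would need, so treat it as a toy, not a Maxent replacement.

```python
import math


def fit_poisson_glm(x, y, iterations=25, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x with a Poisson likelihood.

    Hypothetical helper: Newton-Raphson on the log-likelihood,
    one covariate only, no penalty term.
    """
    b0 = math.log(max(sum(y) / len(y), 1e-8))  # start at the log of the mean count
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information * step = score) by hand.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Made-up example: counts of a species in ten grid cells along a
# hypothetical habitat gradient (e.g., elevation class 0 to 9).
gradient = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
counts = [1, 3, 2, 5, 4, 8, 11, 12, 19, 26]
intercept, slope = fit_poisson_glm(gradient, counts)
print("intercept:", round(intercept, 3), "slope:", round(slope, 3))
```

A positive slope here would mean the expected count rises along the gradient. In practice I would still reach for R’s glm or a Bayesian tool rather than hand-rolled code, but writing it out shows there is no magic inside the model itself.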

Conclusion:

Maxent fills a useful role for ecologists in that it improves their toolbox for modeling habitat using presence-only data. For a basic user (such as the group of people who use Program Mark), this canned program could do an adequate job (this is why another colleague of mine likes Maxent and uses it for her own research). An advanced user who is comfortable with statistical programming would likely be better served by fitting his or her data with a Poisson point process model. All users also need to be aware of the assumptions of their model (be it Maxent or GLM or any program). My own statistical viewpoint has been shaped by Gelman (a future blog post at some point), so I will start off trying to use Stan to parametrize my habitat model with a Bayesian approach.