I’ve seen a lot of GIS developers struggle to create a good project structure when building Python applications; often the difficulty is the transition from one enormous file with a single method to a “real” software project, with modular design, well-defined dependencies, and the necessary tooling.
The goal of this post is to be a summary and short checklist; these steps can improve almost any project, and are easy to implement. Used properly, they can help ease developer onboarding, promote code reuse, and reduce the time spent on boilerplate activities and code. Even the smallest applications can benefit from these tools and conventions, and it’s rare to see any real-world project that doesn’t include all of the following:
- Package management via pip
- Environment isolation using virtualenv
- A standard .gitignore
- Standard directory structure
- setup.py file to track versions and dependencies
- Test runner and scaffolding
1. Package Management
If you aren’t using a package manager to bring in your dependencies, you’re making your life harder. Modern Python versions include pip by default; if you’re using a Python version older than 2.7.9 (that’s ArcMap 10.3 and older), you’ll need to install it yourself.
With pip, you can bring in third-party libraries automatically, without relying on them already being installed; rather than reading through documentation (or the source code!) and manually hunting down beautiful_soup and the like, you can bring them all in with a single pip install command.
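To get a feel for the manual bookkeeping pip takes off your hands, here is a minimal sketch that reports which of a project's dependencies are not yet installed. It assumes Python 3.8 or newer (`importlib.metadata` is not available in the Python 2.7 that ships with ArcMap); the helper name `missing_packages` and the package names are placeholders, not anything from the original project.

```python
from importlib.metadata import PackageNotFoundError, version


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Placeholder dependency list; substitute your project's own.
    for name in missing_packages(["requests", "beautifulsoup4"]):
        print("not installed:", name)
```

With pip and a declared dependency list, this kind of check becomes unnecessary: the installer resolves and fetches whatever is missing.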
2. Virtual Environments
GIS machines tend to get clogged up over time; a small number of staff responsible for a large number of projects is the norm, and those projects tend to live a long time, requiring periodic maintenance. Rather than installing all the dependencies of all our projects into the global site-packages directory, we can use virtual environments to keep a separate set of dependencies and library versions for each project we work on.
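One quick way to confirm that you are working inside a project's virtual environment rather than the global install is to ask the interpreter itself. The sketch below is one common check, not the only one; the function name `in_virtual_environment` is made up for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older releases of the virtualenv tool set sys.real_prefix on the
    # interpreters they create.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases leave sys.base_prefix pointing at
    # the original install, so it differs from sys.prefix inside an environment.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtual environment:", in_virtual_environment())
    print("packages install under:", sys.prefix)
```

Running this before and after activating an environment shows `sys.prefix` switching from the global Python directory to the project's own.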
3. .gitignore
Pulling in a standard .gitignore file right from the start will keep a lot of junk out of your repository: .pyc files, your virtual environment, and any editor-specific files don’t need to be checked into version control. If you want an easy tool to manage .gitignore templates, try getignore.
4. Directory Structure
The typical Python project has a very simple directory structure: a top-level directory for the source files, a tests directory, and a few files (such as setup.py and .gitignore) in the root.
It’s usually a good idea to keep your Python projects fairly flat, with each .py file acting as its own module. If you need more nested directories, don’t forget to add an __init__.py file to each one.
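The layout described above can be sketched as a small scaffolding script. This is only an illustration, assuming Python 3 (for `pathlib`); the package name `my_project` is borrowed from the coverage command later in this post, and `test_example.py` is a hypothetical name for a first test module.

```python
import tempfile
from pathlib import Path


def scaffold(root, package="my_project"):
    """Create an empty project skeleton under `root` and return its files."""
    root = Path(root)
    files = [
        root / ".gitignore",
        root / "setup.py",
        root / package / "__init__.py",      # marks the source directory as a package
        root / "tests" / "test_example.py",  # hypothetical first test module
    ]
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    # Report paths relative to the project root, with forward slashes.
    return sorted(p.relative_to(root).as_posix() for p in files)


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for name in scaffold(tmp):
            print(name)
```

The files are created empty; the point is only the shape of the tree, with source code in one directory, tests in another, and project-level files in the root.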
5. setup.py
setup.py is analogous to package.json in a Node.js project; even a very simple one records the package’s name, version, and dependencies.
install_requires lists the dependencies our project needs to run.
extras_require holds additional dependencies used for development (our testing tools, linters, and so on), grouped under a name such as test.
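As a rough sketch of how these two settings fit together, the block below spells out the keyword arguments a small setup.py might pass to `setuptools.setup()`. The values are assumptions for illustration: `my_project` and the `test` extra come from the commands later in this post, `pytest-cov` is inferred from the `--cov` flag, and the version number and the `requests` dependency are placeholders. The helper `requirements_for` is hypothetical and only shows which direct requirements each install command asks for.

```python
# Keyword arguments a setup.py could pass to setuptools.setup().
SETUP_KWARGS = {
    "name": "my_project",
    "version": "0.1.0",                # placeholder version
    "packages": ["my_project"],
    "install_requires": ["requests"],  # placeholder runtime dependency
    "extras_require": {
        "test": ["pytest", "pytest-cov"],  # requested by `pip install .[test]`
    },
}


def requirements_for(extra=None):
    """Direct requirements for `pip install .` or `pip install .[extra]`."""
    required = list(SETUP_KWARGS["install_requires"])
    if extra is not None:
        required += SETUP_KWARGS["extras_require"][extra]
    return required


# In an actual setup.py the file would end with:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

Keeping test tooling in an extra means a plain install pulls in only what the code needs at runtime, while developers opt in to the rest.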
6. Test Runner
Even if you don’t have any tests around your code yet, setting up the test runner and getting into the habit of running it before checking in can be a great first step; all you need is a tests directory, a module with the prefix test_, and a simple pip install pytest.
Once these are in place, you can add tests a bit at a time; you can find low-hanging fruit in your existing code–maybe you have a pure function that is easy to write tests for. And the next time you have to add a brand-new feature, you can implement it in a separate module, and practice using TDD.
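For a sense of what that low-hanging fruit might look like, here is a minimal sketch of a pure function and two pytest-style tests. The function `bounding_box` is a made-up example; in a real project it would live in the package and the tests in a file such as tests/test_geometry.py that imports it, but they are combined here so the example is self-contained.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a sequence of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# pytest collects functions whose names start with test_ and runs them;
# a failing assert is reported as a failing test.
def test_bounding_box_of_several_points():
    points = [(2.0, 5.0), (-1.0, 3.0), (4.0, -2.0)]
    assert bounding_box(points) == (-1.0, -2.0, 4.0, 5.0)


def test_bounding_box_of_single_point():
    assert bounding_box([(7.0, 7.0)]) == (7.0, 7.0, 7.0, 7.0)
```

Because the function takes plain values and returns plain values, the tests need no fixtures, mocks, or ArcGIS licence to run.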
Working with the Project
Once we have everything in place, developing against this package is straightforward; with a few lines, a new developer can be up and running on a fresh machine:
    git clone firstname.lastname@example.org:lobsteropteryx/testing-arcpy.git
    cd testing-arcpy
    virtualenv --python C:/Python27/ArcGIS10.5/python.exe --system-site-packages venv
    source venv/Scripts/activate
    pip install .
    pip install .[test]
    pytest --cov=my_project tests/
It’s also easy to publish your code as a package, so that it can be shared between applications, departments, or even across organizations.