Six Essentials for a Python Project · Notes from the Lifeboat

I’ve seen a lot of GIS developers struggle to create a good project structure when building Python applications. Often the hard part is the transition from one enormous file with a single function to a “real” software project, with modular design, well-defined dependencies, and the necessary tooling.

The goal of this post is to be a summary and short checklist; these steps can improve almost any project, and are easy to implement. Used properly, they can help ease developer onboarding, promote code reuse, and reduce the time spent on boilerplate activities and code. Even the smallest applications can benefit from these tools and conventions, and it’s rare to see any real-world project that doesn’t include all of the following:

Package management via pip
Environmental isolation using virtualenv
Standard .gitignore file
Standard directory structure
setup.py file to track versions and dependencies
Test runner and scaffolding

There is a wealth of great documentation about all of these tools; in particular, check out The Hitchhiker’s Guide to Python and the Python Packaging User Guide.

1. Pip

If you aren’t using a package manager to bring in your dependencies, you’re making your life harder. Modern Python versions include pip by default (2.7.9 and later on the 2.x line, 3.4 and later on 3.x); if you’re using a Python version older than 2.7.9 (that’s ArcMap 10.3 and older), you’ll need to install it yourself.

With pip, you can bring in third-party libraries automatically, without relying on them already being installed on the machine. Rather than reading through documentation (or the source code!) and manually hunting down requests, beautifulsoup4, etc., you can bring them all in with a single pip install.
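As a rough illustration of what "already installed" means in practice, the sketch below checks whether a list of modules can be imported by the current interpreter. The helper name missing_packages is hypothetical, the module names are only examples, and the code assumes Python 3.4 or newer (importlib.util is not available on the Python 2.7 that ships with ArcMap).

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "requests" and "bs4" are the import names of the requests and
    # beautifulsoup4 distributions; substitute your own dependencies.
    missing = missing_packages(["requests", "bs4"])
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

Anything the script reports as missing is a candidate for pip install, and for an entry in your project's dependency list.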

2. Virtualenv

GIS machines tend to get clogged up over time; a small number of staff responsible for a large number of projects is the norm, and those projects tend to live a long time, requiring periodic maintenance. Rather than installing the dependencies of every project into the global site-packages directory, we can use virtual environments to keep a separate set of dependencies and library versions for each project we work on.
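If you are ever unsure whether a script is running against the global interpreter or an isolated one, a small check like the following can help. This is a minimal sketch: the function name in_virtual_environment is made up for illustration, and it relies on the sys.real_prefix attribute that older virtualenv releases set and on the sys.base_prefix attribute used by venv and newer virtualenv releases.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix on the interpreter.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from
    # sys.base_prefix; fall back to sys.prefix where base_prefix is absent.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter prefix: " + sys.prefix)
    print("Inside a virtual environment: " + str(in_virtual_environment()))
```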

3. Gitignore

Pulling in the standard .gitignore file right from the start will keep a lot of junk out of your repository: .pyc files, your virtual environment, and any editor-specific files don’t need to be checked into version control. If you want an easy tool to manage .gitignore templates, try getignore.

4. Directory Structure

The typical Python project has a very simple directory structure: a top-level directory for the source files, a tests directory, and a few files (setup.py, .gitignore) in the root:

├── my_project
│   ├── __init__.py
│   └── my_module.py
├── setup.py
├── .gitignore
└── tests
    ├── __init__.py
    └── test_my_module.py

It’s usually a good idea to keep your Python projects fairly flat; each .py file will act as its own module. If you need more nested directories, don’t forget to add an __init__.py file to each one.
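If you find yourself creating this layout by hand for every new project, it can be scripted. The following is only a sketch, assuming Python 3 and the file names from the tree above; create_skeleton is a hypothetical helper, not part of any standard tool, and it creates empty files without overwriting existing ones.

```python
from pathlib import Path


def create_skeleton(root, package_name):
    """Create the flat package layout under root and return the file paths."""
    root = Path(root)
    files = [
        root / package_name / "__init__.py",
        root / package_name / "my_module.py",
        root / "tests" / "__init__.py",
        root / "tests" / "test_my_module.py",
        root / "setup.py",
        root / ".gitignore",
    ]
    for path in files:
        # Create the containing directory first, then an empty file.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # leaves the contents of existing files alone
    return files
```

Calling create_skeleton("scratch", "my_project") would produce the same structure as the tree shown earlier, inside a scratch directory.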

5. Setup

The setup.py file defines a few key pieces of information about our package: its name, version, and dependencies. If you’re a JavaScript developer coming to Python, setup.py is analogous to package.json. A very simple setup.py might look like:

from setuptools import setup, find_packages

setup(
    name='my_project',
    version='1.0.0',
    description='Sample project for arcpy applications',
    url='https://github.com/lobsteropteryx/arcpy-testing',
    author='Ian Firkin',
    author_email='ian.firkin@gmail.com',
    packages=find_packages(exclude=['contrib', 'docs', 'tests']),
    install_requires=['requests'],
    extras_require={
        'test': ['pytest', 'pytest-cov', 'pylint']
    },
    entry_points={
        'console_scripts': []
    },
)

install_requires lists the dependencies our project needs to run; in this case, we’re only using requests. extras_require maps a group name to a list of optional dependencies; here the test group holds the tools we use for development, such as the test runner, its coverage plugin, and a linter.

6. Test Runner

Even if you don’t have any tests around your code yet, setting up the test runner and getting into the habit of running it before checking in can be a great first step; all you need is a tests directory, a module whose file name starts with test_, and a simple pip install pytest.

Once these are in place, you can add tests a bit at a time. Look for low-hanging fruit in your existing code; maybe you have a pure function that is easy to write tests for. And the next time you have to add a brand-new feature, you can implement it in a separate module and practice test-driven development (TDD).
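To make the "pure function" idea concrete, here is a minimal sketch of what a first test might look like. The function feet_to_meters and both test names are invented for illustration; in a real project the function would live in my_project/my_module.py and the tests in tests/test_my_module.py, but they are shown together here so the example runs as a single file.

```python
import math


# Would live in my_project/my_module.py (hypothetical contents).
def feet_to_meters(feet):
    """Convert a length in international feet to meters."""
    return feet * 0.3048


# Would live in tests/test_my_module.py, which would begin with:
#     from my_project.my_module import feet_to_meters
def test_one_foot_is_0_3048_meters():
    # Compare floats with a tolerance rather than exact equality.
    assert math.isclose(feet_to_meters(1), 0.3048)


def test_zero_feet_is_zero_meters():
    assert feet_to_meters(0) == 0
```

pytest collects functions whose names start with test_ from files whose names start with test_, so once the code is split across the two files, running pytest from the project root should pick both tests up without any extra configuration.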

Working with the Project

Once we have everything in place, developing against this package is straightforward; with a few lines, a new developer can be up and running on a fresh machine:

git clone git@github.com:lobsteropteryx/testing-arcpy.git
cd testing-arcpy
virtualenv --python C:/Python27/ArcGIS10.5/python.exe --system-site-packages venv
source venv/Scripts/activate
pip install .
pip install .[test]
pytest --cov=my_project tests/

It’s also easy to publish your code as a package, so that it can be shared between applications, departments, or even across organizations.