Test Driven Development has been a major part of the discipline of software engineering for almost two decades; a Red-Green-Refactor cycle helps keep codebases maintainable, while lowering the risk of adding new features by protecting against regression and facilitating extensible designs.
The design and history of arcpy and its related modules can make it difficult to test, and automation is not always given high priority in GIS projects. Nevertheless, there are techniques to help ease the process, and it’s relatively easy to test drive the development of GIS software.
This post walks through a couple of basic test scenarios, and the full project is available on GitHub.
Geodatabase Operations
A common task that can be automated via arcpy is transforming geodatabase data and schemas. We may want to use FieldMappings, add or remove fields, and calculate new values derived from existing fields; all of these operations can be test-driven.
In this example, we’ll test-drive a very simple feature: Adding a new field, and populating it with the sum of two other fields.
Test Fixtures
While arcpy exposes methods to create a new feature class and insert records, this can become difficult to maintain for large numbers of tables with many fields. Another approach is to use a fixture.
Our fixture will be a file geodatabase (FGDB); this database will hold known data that will be used by our tests. Since an FGDB is just a directory, we can check it into our project and manage it just like any other test files.
When creating a test fixture, we want to use the minimum set of data needed to prove out our logic; in practice, this usually means tables with one or two rows. We’ll be running the tests many, many times, so we want them to be as fast as possible!
Note: ArcMap may lock the database if we have it open; because of this, it’s a good idea to add .lock files in the test data directory to your .gitignore file.
Because we need a FGDB to hold our fixture data, the tests are not true unit tests, since they’re still coupled to the filesystem. We can keep this coupling as loose as possible, by copying the fixture data into an in-memory workspace before each test.
Once we have our fixture created, our project structure should look something like this:
├── my_project
│   ├── __init__.py
│   └── my_module.py
├── setup.py
└── tests
    ├── __init__.py
    ├── fixtures
    │   └── test_data.gdb
    │       ├── <a whole bunch of FGDB files>
    └── test_my_module.py
Setting up and tearing down
We want each test to run from a fresh, clean state, and to be independent of all the other tests. We can accomplish this by copying our fixture data into memory before each test, and cleaning up in the tearDown method that Python’s unittest.TestCase provides; this method is run automatically after each test.
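The per-test lifecycle can be seen without arcpy at all. The following is a minimal sketch, not code from the project: `LifecycleDemo`, `EVENTS`, and `run_demo` are illustrative names, and a temporary directory stands in for the in-memory workspace.

```python
import shutil
import tempfile
import unittest

EVENTS = []  # records the order in which the hooks and tests run


class LifecycleDemo(unittest.TestCase):
    def setUp(self):
        # Runs before every test method: build a fresh scratch workspace.
        self.workspace = tempfile.mkdtemp()
        EVENTS.append('setUp')

    def tearDown(self):
        # Runs after every test method (provided setUp succeeded),
        # whether the test passed or failed.
        shutil.rmtree(self.workspace)
        EVENTS.append('tearDown')

    def test_one(self):
        EVENTS.append('test_one')

    def test_two(self):
        EVENTS.append('test_two')


def run_demo():
    """Run both tests and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(LifecycleDemo)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

After `run_demo()`, `EVENTS` shows each test wrapped in its own setUp and tearDown pair, which is what keeps one test's data from leaking into the next.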
The Copy tool doesn’t support in_memory workspaces, but we can use a create and append pattern, with a template:
import os
import unittest

import arcpy


class MyModuleTest(unittest.TestCase):
    TEST_GDB = 'in_memory'

    def setup_data(self, table_name='MyTable'):
        # The fixture path is relative, so run the tests from the project root.
        input_file = os.path.abspath('tests/fixtures/test_data.gdb/{}'.format(table_name))
        output_file = '{}/{}'.format(self.TEST_GDB, table_name)
        # Create an empty in-memory feature class using the fixture as a template...
        arcpy.CreateFeatureclass_management(
            out_path=self.TEST_GDB,
            out_name=table_name,
            template=input_file
        )
        # ...then append the fixture rows into it.
        arcpy.Append_management(
            inputs=input_file,
            target=output_file,
            schema_type='NO_TEST'
        )
        return output_file
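One caveat with the `os.path.abspath('tests/fixtures/...')` call above is that it resolves against the current working directory, so the fixture is only found when the tests are started from the project root. A possible alternative, sketched here under the assumption that the fixture stays at `tests/fixtures/test_data.gdb`, is to build the path from the test module's own location; `fixture_path` is a hypothetical helper, not part of the project.

```python
import os


def fixture_path(table_name, tests_dir):
    """Build the path to a table inside <tests_dir>/fixtures/test_data.gdb.

    tests_dir is the directory that contains the test module; passing it in
    means the result does not depend on where the test runner was started.
    """
    return os.path.join(tests_dir, 'fixtures', 'test_data.gdb', table_name)


# Inside tests/test_my_module.py this might be called as:
#   fixture_path('SumData', os.path.dirname(os.path.abspath(__file__)))
```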
To clean up, we call the Delete tool in our tearDown method:
    def tearDown(self):
        arcpy.Delete_management(self.TEST_GDB)
Now we’re ready to write our first test!
Adding a Field
Say we want our business code to add a SUM field to the table, and calculate some value. We’ll write our test first:
    # Assumes add_sum_field is imported from my_project.my_module
    # at the top of the test module.
    def test_adds_sum_field(self):
        feature_class = self.setup_data('SumData')
        field_name = 'SUM'
        add_sum_field(feature_class)
        field_names = [field.name for field in arcpy.ListFields(feature_class)]
        self.assertTrue(field_name in field_names)
and watch it fail:
$ pytest
============================= test session starts =============================
platform win32 -- Python 2.7.12, pytest-3.1.2, py-1.4.34, pluggy-0.4.0
rootdir: C:\develop\testing-arcpy, inifile:
collected 1 items
tests\test_my_module.py F
================================== FAILURES ===================================
______________________ MyModuleTest.test_adds_sum_field _______________________
self = <tests.test_my_module.MyModuleTest testMethod=test_adds_sum_field>
def test_adds_sum_field(self):
feature_class = self.setup_data('SumData')
field_name = 'SUM'
field_names = [field.name for field in arcpy.ListFields(feature_class)]
> self.assertTrue(field_name in field_names)
E AssertionError: False is not true
tests\test_my_module.py:32: AssertionError
========================== 1 failed in 11.63 seconds ==========================
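The `AssertionError: False is not true` message in the output above says little about what actually went wrong. As an aside, unittest's `assertIn` reports both the missing member and the container it searched. The sketch below compares the two assertion styles without arcpy; the field list is made up, and `failure_message` and the two `check_*` functions are illustrative helpers only.

```python
import unittest

FIELD_NAMES = ['OBJECTID', 'FIRST_VALUE', 'SECOND_VALUE']  # made-up field list


def check_with_assert_true(case):
    case.assertTrue('SUM' in FIELD_NAMES)


def check_with_assert_in(case):
    case.assertIn('SUM', FIELD_NAMES)


def failure_message(check):
    """Run one failing check and return the final line of its traceback."""

    class Demo(unittest.TestCase):
        def runTest(self):
            check(self)

    result = unittest.TestResult()
    Demo().run(result)
    traceback_text = result.failures[0][1]
    return traceback_text.strip().splitlines()[-1]


print(failure_message(check_with_assert_true))
print(failure_message(check_with_assert_in))
```

The second message names the field that was expected and lists the fields that were found, which can save a trip to the debugger when a schema test goes red.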
Next we add our implementation code to the business module:
import arcpy


def add_sum_field(feature_class):
    arcpy.AddField_management(
        in_table=feature_class,
        field_name='SUM',
        field_type='DOUBLE'
    )
And rerun our tests:
$ pytest
============================= test session starts =============================
platform win32 -- Python 2.7.12, pytest-3.1.2, py-1.4.34, pluggy-0.4.0
rootdir: C:\develop\testing-arcpy, inifile:
collected 1 items
tests\test_my_module.py .
========================== 1 passed in 11.61 seconds ==========================
With everything green, we can now look for opportunities to refactor: we know our code works, and if we break something while cleaning up the code, we’ll know right away. There’s not much logic here, so we’ll go ahead and start the cycle over again, with a new, failing test.
Calculating a Field Value
Now we want to add some logic to calculate our SUM value; again, we’ll write the test first:
    def test_calculates_sum(self):
        feature_class = self.setup_data('SumData')
        add_sum_field(feature_class)
        with arcpy.da.SearchCursor(feature_class, ['SUM']) as cursor:
            for row in cursor:
                self.assertEqual(row[0], 2)
and watch it fail:
$ pytest
============================= test session starts =============================
platform win32 -- Python 2.7.12, pytest-3.1.2, py-1.4.34, pluggy-0.4.0
rootdir: C:\develop\testing-arcpy, inifile:
collected 2 items
tests\test_my_module.py .F
================================== FAILURES ===================================
______________________ MyModuleTest.test_calculates_sum _______________________
self = <tests.test_my_module.MyModuleTest testMethod=test_calculates_sum>
def test_calculates_sum(self):
feature_class = self.setup_data('SumData')
add_sum_field(feature_class)
with arcpy.da.SearchCursor(feature_class, ['SUM']) as cursor:
for row in cursor:
> self.assertEqual(row[0], 2)
E AssertionError: None != 2
tests\test_my_module.py:42: AssertionError
===================== 1 failed, 1 passed in 12.76 seconds =====================
then implement our business logic:
import arcpy


def add_sum_field(feature_class):
    arcpy.AddField_management(
        in_table=feature_class,
        field_name='SUM',
        field_type='DOUBLE'
    )
    fields = ['FIRST_VALUE', 'SECOND_VALUE', 'SUM']
    with arcpy.da.UpdateCursor(feature_class, fields) as cursor:
        for row in cursor:
            row[2] = row[0] + row[1]
            cursor.updateRow(row)
And check that our tests pass:
$ pytest
============================= test session starts =============================
platform win32 -- Python 2.7.12, pytest-3.1.2, py-1.4.34, pluggy-0.4.0
rootdir: C:\develop\testing-arcpy, inifile:
collected 2 items
tests\test_my_module.py ..
========================== 2 passed in 13.22 seconds ==========================
Once everything is green, we’ll look again for places to remove duplication, make our intent clear, and just clean up our code (including our test code). It’s easy to gloss over the refactor step, but it is critical!
Looking at our above code, we might separate out setting the new field value into its own function:
import arcpy


def add_sum_field(feature_class):
    arcpy.AddField_management(
        in_table=feature_class,
        field_name='SUM',
        field_type='DOUBLE'
    )
    # Populate the new field once it exists.
    _update_sum_value(feature_class)


def _update_sum_value(feature_class):
    fields = ['FIRST_VALUE', 'SECOND_VALUE', 'SUM']
    with arcpy.da.UpdateCursor(feature_class, fields) as cursor:
        for row in cursor:
            row[2] = row[0] + row[1]
            cursor.updateRow(row)
After making a change, we make sure our tests are still green:
$ pytest
============================= test session starts =============================
platform win32 -- Python 2.7.12, pytest-3.1.2, py-1.4.34, pluggy-0.4.0
rootdir: C:\develop\testing-arcpy, inifile:
collected 2 items
tests\test_my_module.py ..
========================== 2 passed in 13.23 seconds ==========================
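The refactoring could arguably be taken one step further by pulling the arithmetic out of the cursor loop, so that it can be checked in milliseconds without a geodatabase. The sketch below is not from the project: `sum_values`, `update_sum_rows`, and `FakeCursor` are hypothetical names, and returning None when either input is null is an assumption about the desired behavior (the code above would raise a TypeError in that case).

```python
def sum_values(first_value, second_value):
    """Return the sum of two field values, or None if either one is null."""
    if first_value is None or second_value is None:
        return None  # assumption: a null input yields a null SUM
    return first_value + second_value


def update_sum_rows(cursor):
    """Fill the third column of each row with the sum of the first two.

    cursor only needs to be iterable and to offer updateRow(row), which is
    the part of the update cursor that the business code above relies on.
    """
    for row in cursor:
        row[2] = sum_values(row[0], row[1])
        cursor.updateRow(row)


class FakeCursor(object):
    """Hypothetical in-memory stand-in for an update cursor, for tests only."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]
        self.updated = []  # every row passed to updateRow, in order

    def __iter__(self):
        return iter(self.rows)

    def updateRow(self, row):
        self.updated.append(list(row))
```

In the business module, `update_sum_rows` would be handed the `arcpy.da.UpdateCursor` opened in `_update_sum_value`, while the fast tests pass a `FakeCursor`. The fixture-based tests remain useful for checking that the field really is added to the feature class.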
Having tests lets us make changes to keep our code neat and maintainable, while giving confidence that we haven’t changed the fundamental behavior; it also protects against regression when new features are added, or requirements change in the future. Once you fall into the rhythm of red-green-refactor, it’s both gratifying and surprising to see how quickly you can add new features to a project!