Pydoit for Workflow Automation

Who: Camille Scott, lead instructor.

When: January 20, 2016

Times: 9:15am-12:15pm PST

Where: DSI Space, Shields Library

Contact: Please contact Jessica Mizzi with any questions.

Description

Pydoit is a task management and automation tool, similar to ‘make’ (comparison: http://swcarpentry.github.io/bc/intermediate/doit/make-vs-doit.html). Tasks are defined individually and executed in order according to dependencies, via a directed acyclic graph. The basic building blocks of a pydoit workflow are tasks, which encode the work we would like to get done. Here is an extremely simple task:

def task_hello_world():
   return {'actions': ['echo "hello world!" > hello.txt’ ],
               'targets': ['hello.txt’]}

The task is a python function prefixed with task, which returns a dictionary containing some predefined entries. The actions entry is a list of the actual commands we’d like to run, in this case, a single shell command. The targets entry is a list of the files output by this task. Of course, hello world doesn’t really do anything for us. Throughout this lesson, we’re going to build a pipeline which downloads some data, plots it with matplotlib, generates a markup file with the chart, and outputs a final compiled document – in other words, a barebones version of a publication pipeline.

Computer and workshop requirements

Attendees will need to bring a computer with an Internet connection.

Please install Python 3.x before the workshop by following the instructions below.

Installation instructions

Python

Python is a very powerful programming language. We recommend using Anaconda as an all-in-one installer.

Windows:

  1. Go to https://www.continuum.io/downloads.
  2. Download Python 3.x (3.4 is fine) 64-bit graphical installer for Windows, not Python 2.7.
  3. Install Python 3.x using all defaults except make sure to check Make Anaconda the default Python.

OS X:

  1. Go to https://www.continuum.io/downloads.
  2. Download Python 3.x (3.4 is fine) 64-bit graphical installer for OS X, not Python 2.7.
  3. Install Python 3.x using all defaults.

Once installation is complete, please go to this page to test that Python has installed correctly. If you do not have bash installed, please refer to the installation instructions at the bottom of this page.

Dependencies

Windows and OS X:

If you are using Anaconda, open Python and run:

install doit seaborn jinja2

If you are not using Anconda, open Python and run the previous command as well as:

pip install matplotlib pandas

To install necessary dependencies.

Pandoc

Windows and OS X:

Scroll to the bottom of the latest pandoc release page. Download and run the correct package installer for your system (Windows users use the package ending in -windows.msi, Mac users use the package ending in -osx.pkg).


LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.