mnarayan
7/12/2015 - 7:03 PM

Tools for simulation and data analysis workflow management

Philip Guo summarized the problem quite well in Burrito. Are there any modern solutions to this problem?

@pditommaso provides a nice collection of all the tools. Some subset is worth trying out.

So far, Sumatra, noworkflow, recipy, and WorldMake appear to care most about provenance tracking; Nextflow appears to be a very promising upgrade to GNU Make for containerized data science pipelines.

  1. Nextflow
  2. Sumatra
  3. Luigi and SciLuigi
  4. Doit and a tutorial from sw
  5. nipype
  6. joblib. Does this effectively give us provenance tracking for free? Could it be used with noworkflow?
  7. drake "Make for Data"
  8. noworkflow, recommended by Philip Stark
  9. recipy, similar to Sumatra but only works within Python
  10. Flex, a command-line tool for data science pipelines
  11. WorldMake
  12. Reprozip, from the noworkflow folks, useful for capturing environment information only.
  13. scikit-bio has a nice workflow module worth looking into. Useful project templates.
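
On the joblib question in item 6: joblib's `Memory` caches function results keyed by the function and its arguments, which tells you that a result exists but not much about how it was produced. The sketch below is not joblib's or noworkflow's API; it is a standard-library-only illustration of what caching plus a minimal provenance log could look like. The names `ProvenanceCache`, `track`, and `simulate`, and the choice of record fields, are all hypothetical.

```python
import hashlib
import inspect
import json
import time


def _digest(obj):
    """Return a short SHA-256 digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True, default=repr).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


class ProvenanceCache:
    """Hypothetical in-memory cache that logs one record per computed call."""

    def __init__(self):
        self.results = {}   # cache key -> cached return value
        self.records = []   # provenance records, in order of execution

    def track(self, func):
        try:
            source = inspect.getsource(func)
        except (OSError, TypeError):
            source = func.__qualname__   # source unavailable, e.g. in a REPL
        code_hash = _digest(source)

        def wrapper(*args, **kwargs):
            key = _digest([func.__qualname__, code_hash, args, kwargs])
            if key in self.results:
                return self.results[key]   # cache hit: nothing is recomputed
            value = func(*args, **kwargs)
            self.results[key] = value
            self.records.append({
                "function": func.__qualname__,
                "code_hash": code_hash,
                "input_hash": _digest([args, kwargs]),
                "output_hash": _digest(value),
                "timestamp": time.time(),
            })
            return value

        return wrapper


cache = ProvenanceCache()


@cache.track
def simulate(n, seed=0):
    """Toy deterministic 'simulation' standing in for an expensive step."""
    return [(seed + i) ** 2 % 7 for i in range(n)]


first = simulate(5, seed=1)
second = simulate(5, seed=1)   # served from the cache, no new record
third = simulate(5, seed=2)    # new inputs, so a second record is logged
```

Even this toy version shows the gap: the cache key alone answers "have I computed this before?", while the separate records answer "which code and inputs produced this output, and when?". Environment capture (package versions, OS) is a third concern that this sketch does not attempt.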

There is also a nice example of a reproducible workflow, though any real project is likely to be far more complicated.

Updates: New tools for running computational experiments.

  1. SciExp https://pypi.python.org/pypi/sciexp2/1.1.9
  2. Comp-Exp https://pypi.python.org/pypi/comp-exp/2.3.1
  3. Lazyrunner (old) http://www.stat.washington.edu/~hoytak/code/lazyrunner/
  4. Expyriment http://www.expyriment.org/
  5. Reprozip and Reprounzip https://pypi.python.org/pypi/reprounzip/
  6. pypet is a wrapper around Sumatra for dealing with multiple experiments https://pypi.python.org/pypi/pypet/0.4.0