This lesson is being piloted (Beta version)

Reproducible analyses with REANA: Glossary

Key Points

  • Workflow is the new data.

  • Data + Code + Environment + Workflow = Reproducible Analyses

  • Before reproducibility comes preproducibility

First example
  • Use the reana-client rich command-line client to run containerised workflows from your laptop on remote compute clouds

  • Before running an analysis remotely, check its correctness locally with the validate command

  • As always, when in doubt, use the --help command-line argument
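To make "check locally first" concrete, here is a minimal Python sketch of a pre-flight check on a reana.yaml-like specification held in a plain dict. It is not how reana-client validate is implemented; the check_spec() helper, the file names, the --events flag and the container image are hypothetical. It only verifies that the inputs, workflow and outputs sections exist and that every step declares a container environment and some commands, i.e. that data, code, environment and workflow are all captured.

```python
"""Hypothetical pre-flight check for a reana.yaml-like specification.

This is NOT the implementation of `reana-client validate`; it only
illustrates catching obvious mistakes before submitting remotely.
"""

# Top-level sections assumed here, mirroring a typical reana.yaml layout.
REQUIRED_SECTIONS = ("inputs", "workflow", "outputs")


def check_spec(spec: dict) -> list[str]:
    """Return a list of human-readable problems; empty means 'looks OK'."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in spec:
            problems.append(f"missing top-level section: {section}")
    workflow = spec.get("workflow", {})
    steps = workflow.get("specification", {}).get("steps", [])
    if not steps:
        problems.append("workflow declares no steps")
    for i, step in enumerate(steps):
        # Each step should pin its computing environment (container image).
        if not step.get("environment"):
            problems.append(f"step {i}: no container environment declared")
        if not step.get("commands"):
            problems.append(f"step {i}: no commands to run")
    return problems


if __name__ == "__main__":
    spec = {
        "inputs": {"files": ["code/analysis.py"], "parameters": {"events": 100}},
        "workflow": {
            "type": "serial",
            "specification": {
                "steps": [
                    {
                        "environment": "docker.io/library/python:3.11-slim",
                        "commands": ["python code/analysis.py --events 100"],
                    }
                ]
            },
        },
        "outputs": {"files": ["results/plot.png"]},
    }
    for problem in check_spec(spec) or ["no problems found"]:
        print(problem)
```

A check like this is cheap to run on every edit, which is the point of validating before paying for a remote run.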

Developing serial workflows
  • Develop workflows progressively; add steps as needed

  • When developing a workflow, stay in the same workspace

  • When developing bytecode-interpreted code, stay in the same container

  • Use smaller test data before scaling out

  • Use workflows as Continuous Integration; make atomic commits that always work
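A hypothetical Python sketch of progressive development on small data: the pipeline starts with a single step, and the next step is appended only once the first one works on a small, seeded sample. The generate(), select() and histogram() steps and the 20-unit threshold are illustrative assumptions, not part of any REANA example.

```python
"""Sketch: grow a serial workflow one step at a time, testing on small data.

All step names and the toy 'events' are hypothetical illustrations.
"""
import random
from typing import Callable

Step = Callable[[list], list]


def generate(n_events: int, seed: int = 42) -> list[float]:
    """Produce toy 'events' (random energies) so tests are cheap and repeatable."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 100.0) for _ in range(n_events)]


def select(events: list[float]) -> list[float]:
    """Keep events above a hypothetical 20-unit threshold."""
    return [e for e in events if e > 20.0]


def histogram(events: list[float]) -> list[int]:
    """Count events in ten bins of width 10; values of 90 and above go to the last bin."""
    counts = [0] * 10
    for e in events:
        counts[min(int(e // 10), 9)] += 1
    return counts


def run(steps: list[Step], data):
    """Run steps serially; each step consumes the previous step's output."""
    for step in steps:
        data = step(data)
    return data


if __name__ == "__main__":
    small = generate(n_events=50)  # small test sample first
    # Stage 1: only the selection step exists so far.
    print(len(run([select], small)))
    # Stage 2: append the next step once stage 1 works.
    print(run([select, histogram], small))
```

Scaling out then means changing only n_events (or the input dataset), not the steps themselves.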

HiggsToTauTau analysis: serial
  • Writing serial workflows is like chaining shell script commands
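The shell-script analogy can be sketched in Python with the standard subprocess module: each step is one command, steps run in order, and check=True stops the chain at the first failing command, much like a shell script run with set -e. The commands are placeholders that simply call the current Python interpreter, and run_serial() is a hypothetical helper.

```python
"""Sketch: a serial workflow behaves like a shell script run with `set -e`.

The commands are hypothetical placeholders so the example runs anywhere.
"""
import subprocess
import sys

STEPS = [
    [sys.executable, "-c", "print('step 1: skim')"],
    [sys.executable, "-c", "print('step 2: histogram')"],
    [sys.executable, "-c", "print('step 3: plot')"],
]


def run_serial(steps: list[list[str]]) -> list[str]:
    """Run each command in order; raise CalledProcessError on the first failure."""
    outputs = []
    for cmd in steps:
        # check=True turns a non-zero exit status into an exception,
        # so later steps never run after a failure.
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        outputs.append(result.stdout.strip())
    return outputs


if __name__ == "__main__":
    for line in run_serial(STEPS):
        print(line)
```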

Coffee break
  • Refresh your mind

  • Discuss your experience

Developing parallel workflows
  • Computational analysis is a graph of inter-dependent steps

  • Fully declare inputs and outputs for each step

  • Use dependencies between workflow steps to allow running jobs in parallel

  • Use the scatter/gather paradigm to parallelise parametrised computations
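The following Python sketch, with assumed step and file names, shows how fully declared inputs and outputs determine the graph: a step depends on whichever step produces one of its input files, and graphlib.TopologicalSorter (standard library, Python 3.9+) then groups steps into waves whose members do not depend on each other and could therefore run in parallel. The dependencies() and parallel_waves() helpers are illustrative, not REANA internals.

```python
"""Sketch: derive parallelisable 'waves' from declared inputs and outputs.

Step names and file names are hypothetical.
"""
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical steps with fully declared input and output files.
STEPS = {
    "skim_signal": {"inputs": {"signal.root"}, "outputs": {"signal_skim.root"}},
    "skim_background": {"inputs": {"background.root"}, "outputs": {"background_skim.root"}},
    "hist_signal": {"inputs": {"signal_skim.root"}, "outputs": {"signal_hist.root"}},
    "hist_background": {"inputs": {"background_skim.root"}, "outputs": {"background_hist.root"}},
    "fit": {"inputs": {"signal_hist.root", "background_hist.root"}, "outputs": {"fit.json"}},
}


def dependencies(steps: dict[str, dict[str, set[str]]]) -> dict[str, set[str]]:
    """A step depends on every step that produces one of its input files."""
    producer = {f: name for name, io in steps.items() for f in io["outputs"]}
    # Inputs with no producer (e.g. signal.root) are external data, not edges.
    return {
        name: {producer[f] for f in io["inputs"] if f in producer}
        for name, io in steps.items()
    }


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; a step only depends on steps in earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(parallel_waves(dependencies(STEPS)), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

If a step forgot to declare an input, its edge would silently disappear and it could be scheduled too early, which is why full declaration matters.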

HiggsToTauTau analysis: parallel
  • Use step dependencies to express main analysis stages

  • Use the scatter-gather paradigm in stages to massively parallelise DAG workflow execution

  • REANA usage scenarios remain the same regardless of workflow language details
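Scatter/gather can be sketched with Python's concurrent.futures: the scatter phase submits one job per sample, and the gather phase merges the partial results bin by bin. The sample names, the toy per-sample histogram and the helper names are hypothetical stand-ins for real analysis steps. A thread pool is an assumption of this sketch: if each scattered job ran as a separate containerised job, the driver would mostly wait on them rather than compute.

```python
"""Sketch: scatter one job per sample, then gather by summing histograms.

Sample names and the toy computation are hypothetical.
"""
import random
import zlib
from concurrent.futures import ThreadPoolExecutor

SAMPLES = ["GluGluToHToTauTau", "VBF_HToTauTau", "DYJetsToLL", "TTbar"]
N_BINS = 5


def process_sample(sample: str, n_events: int = 1000) -> list[int]:
    """Scatter job: build a toy histogram for one sample."""
    # crc32 gives a seed that is stable across runs (unlike hash()).
    rng = random.Random(zlib.crc32(sample.encode()))
    counts = [0] * N_BINS
    for _ in range(n_events):
        counts[rng.randrange(N_BINS)] += 1
    return counts


def gather(partials: list[list[int]]) -> list[int]:
    """Gather step: merge per-sample histograms by summing bin by bin."""
    return [sum(bins) for bins in zip(*partials)]


def scatter_gather(samples: list[str], max_workers: int = 4) -> list[int]:
    """Run process_sample for every sample concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(process_sample, samples))
    return gather(partials)


if __name__ == "__main__":
    total = scatter_gather(SAMPLES)
    print(total, sum(total))
```

Because the gather step only sums bins, adding another sample means adding one entry to SAMPLES; the structure of the workflow does not change.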

A glimpse at advanced topics
  • Workflow specification uses hints to hide implementation complexity

  • Use the kerberos: true clause to automatically trigger Kerberos token initialisation

  • Use the resources clause to access CVMFS repositories

  • Use the compute_backend hint in your workflow steps to dispatch jobs to various HPC/HTC backends

  • Use the open/close commands to open and close interactive sessions on your workspace

  • Enable the REANA application on GitLab to run long-running tasks that would time out in GitLab CI

  • Experiment with containerised workflows to advance scientific reproducibility in your research
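The hints above are declarative: a step states what it needs and the platform handles the mechanics. The Python sketch below mirrors a serial reana.yaml as a dict, using the kerberos, compute_backend and resources/cvmfs keys named in this lesson, and reports what each step would request. The step names, image, scripts, the default kubernetes backend and the summarise_hints() helper are assumptions for illustration; consult the REANA documentation for the authoritative schema.

```python
"""Sketch: reading declarative hints from a reana.yaml-like spec.

Key names follow the hints named in this lesson; everything else
(step names, scripts, default backend, helper) is hypothetical.
"""

SPEC = {
    "workflow": {
        "type": "serial",
        "resources": {"cvmfs": ["cms.cern.ch"]},
        "specification": {
            "steps": [
                {
                    "name": "skim",
                    "environment": "docker.io/library/python:3.11-slim",
                    "kerberos": True,
                    "compute_backend": "htcondorcern",
                    "commands": ["python skim.py"],
                },
                {
                    "name": "plot",
                    "environment": "docker.io/library/python:3.11-slim",
                    "commands": ["python plot.py"],
                },
            ]
        },
    }
}

DEFAULT_BACKEND = "kubernetes"  # assumption: used when a step gives no hint


def summarise_hints(spec: dict) -> dict[str, dict]:
    """Map each step name to the infrastructure its hints would request."""
    workflow = spec["workflow"]
    # Declared once at workflow level; this sketch assumes every step sees it.
    cvmfs = workflow.get("resources", {}).get("cvmfs", [])
    summary = {}
    for step in workflow["specification"]["steps"]:
        summary[step["name"]] = {
            "backend": step.get("compute_backend", DEFAULT_BACKEND),
            "kerberos": bool(step.get("kerberos", False)),
            "cvmfs": list(cvmfs),
        }
    return summary


if __name__ == "__main__":
    for name, needs in summarise_hints(SPEC).items():
        print(name, needs)
```

The step's commands never mention kinit, mount points or batch submission; those details stay behind the hints.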

