Checkpointing

BenoitMorel edited this page Nov 12, 2018 · 1 revision

ParGenes provides a checkpointing system. This means that if your analysis stops for any reason (user interruption, cluster wall-time limit reached, etc.), you can restart it from the last saved state.

ParGenes will not rerun jobs (raxml, modeltest, etc.) that have already finished.

In addition, raxml and modeltest implement their own checkpointing systems, which allow ParGenes to restart them from their last saved state if they did not finish analysing an MSA.

How to use checkpointing

To restart ParGenes, type the original command again and add --continue. If you don't remember the command you used to run ParGenes in the first place, you can find it in the logs.
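As a minimal sketch of the rule above, the following Python helper builds the restart command from the original argument list by appending --continue. The helper name build_restart_command, the script name pargenes.py, and the -a and -o arguments with their values are illustrative assumptions and are not taken from this page; only --continue and -c are described here.

```python
import shlex


def build_restart_command(original_args):
    """Return a copy of the original argument list with --continue appended."""
    restart_args = list(original_args)
    # Appending the flag twice would be redundant, so only add it when missing.
    if "--continue" not in restart_args:
        restart_args.append("--continue")
    return restart_args


# Hypothetical original invocation: script name, paths and values are placeholders.
original = ["python", "pargenes.py", "-a", "msa_dir", "-o", "output_dir", "-c", "256"]
restart = build_restart_command(original)

# shlex.join quotes arguments so the result can be pasted into a shell.
print(shlex.join(restart))
```

The original list is left untouched, so the same value can be reused if the run has to be restarted more than once.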

Changing some ParGenes arguments for a restart

When restarting ParGenes, we strongly recommend that you do NOT change the initial command, unless you know what you are doing.

Some exceptions are:

  • The number of cores (-c): it is ok to first run ParGenes with 256 cores, and then with 16 or 1024 cores. ParGenes will reschedule the jobs accordingly.
  • Adding an astral step at the end (--use-astral).
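The first exception above can be illustrated with a small Python sketch that swaps the value following -c and then adds --continue. The helper name with_cores and the example command (script name, -a, -o and their values) are hypothetical placeholders, not part of ParGenes.

```python
def with_cores(args, cores):
    """Return a copy of args where the value following -c is replaced by cores."""
    updated = list(args)
    if "-c" not in updated:
        raise ValueError("no -c option found in the command")
    index = updated.index("-c")
    if index + 1 >= len(updated):
        raise ValueError("-c is not followed by a value")
    updated[index + 1] = str(cores)
    return updated


# Hypothetical first run on 256 cores: script name and paths are placeholders.
first_run = ["python", "pargenes.py", "-a", "msa_dir", "-o", "output_dir", "-c", "256"]

# Restart on 16 cores: same command, new -c value, plus --continue.
restart_args = with_cores(first_run, 16) + ["--continue"]
print(" ".join(restart_args))
```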

Some examples that will definitely not work with checkpointing:

  • Adding some MSAs to the analysis (by adding them in the input directory)
  • Changing raxml/modeltest parameters
  • Changing the output directory (the run itself will work, but the checkpoint will not be used)
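The rules in the two lists above can be approximated by a small check run before resubmitting a job. The Python sketch below is a hypothetical helper, not part of ParGenes: it ignores the arguments this page lists as safe to change (-c and its value, --use-astral, and --continue) and requires everything else to be identical and in the same order. It assumes that --use-astral takes no value, and the example commands (script name, -a, -o and their values) are placeholders. It cannot detect MSAs added to the input directory, since that change does not appear on the command line.

```python
# Arguments this page lists as safe to change on restart.
SAFE_FLAGS = {"--use-astral", "--continue"}  # assumed to take no value
SAFE_OPTIONS = {"-c"}                        # followed by exactly one value


def strip_safe(args):
    """Drop the arguments that may differ between the first run and a restart."""
    remaining = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the value of a safe option such as -c.
            skip_next = False
            continue
        if arg in SAFE_OPTIONS:
            skip_next = True
            continue
        if arg in SAFE_FLAGS:
            continue
        remaining.append(arg)
    return remaining


def is_safe_restart(original_args, restart_args):
    """True if restart_args has --continue and differs only in safe arguments."""
    return ("--continue" in restart_args
            and strip_safe(original_args) == strip_safe(restart_args))


# Hypothetical commands: script name and paths are placeholders.
first = ["pargenes.py", "-a", "msa_dir", "-o", "out", "-c", "256"]
good = ["pargenes.py", "-a", "msa_dir", "-o", "out", "-c", "16", "--use-astral", "--continue"]
bad = ["pargenes.py", "-a", "msa_dir", "-o", "other_out", "-c", "256", "--continue"]

print(is_safe_restart(first, good))
print(is_safe_restart(first, bad))
```

The comparison is deliberately strict: reordering arguments is also reported as unsafe, which errs on the side of asking the user to double-check the command.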
