...

Luigi does not provide a pre-defined strategy for exception handling; it has to be implemented individually
by the user, where possible. However, Luigi offers a mechanism for so-called 'idempotent' tasks, which are run
only once for a given set of parameters. As a result, Luigi skips completed tasks when a pipeline is run
a second time after encountering an unresolvable dependency (i.e. a predecessor task that failed during execution).
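
The skipping behaviour can be illustrated with a small stand-alone sketch. It does not use Luigi itself; it only mimics the rule that a task counts as complete once its output exists. The class `FileTask`, the function `build` and the task names are hypothetical and chosen for illustration only.

```python
import tempfile
from pathlib import Path


class FileTask:
    """Toy stand-in for a pipeline task: complete once its output file exists."""

    def __init__(self, workdir, name, requires=(), fail=False):
        self.name = name
        self.requires = list(requires)
        self.fail = fail
        self.output = Path(workdir) / f"{name}.done"

    def complete(self):
        return self.output.exists()

    def run(self):
        if self.fail:
            raise RuntimeError(f"{self.name} failed")
        self.output.write_text("done")


def build(task, executed):
    """Run dependencies first and skip every task that is already complete."""
    if task.complete():
        return
    for dependency in task.requires:
        build(dependency, executed)
    task.run()
    executed.append(task.name)


workdir = tempfile.mkdtemp()
extract = FileTask(workdir, "extract")
transform = FileTask(workdir, "transform", requires=[extract], fail=True)
report = FileTask(workdir, "report", requires=[transform])

first_run = []
try:
    build(report, first_run)
except RuntimeError:
    pass  # 'transform' failed, so 'report' could not be run

transform.fail = False  # assume the cause of the failure was fixed
second_run = []
build(report, second_run)

print(first_run)   # ['extract']
print(second_run)  # ['transform', 'report']
```

In the second run, 'extract' is skipped because its output already exists; only the failed task and its successor are executed.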

A more detailed strategy for exception handling and the implementation of idempotent tasks are given in the following sections.

...

Approach 1
Process: None.
Exception Handling: None. All tasks have to be re-run manually. No traceability.

Approach 2
Process: Tasks are stored within a database and updated whenever a task terminates.

  1. Register new pipeline tasks within Luigi’s database:
      1. Tag database entries as ‘open’
  2. For each task in pipeline tasks:
      1. Run task
      2. If task terminates successfully:
          1. Tag database entry as ‘successful’
      3. Else:
          1. Tag database entry as ‘failed’

Exception Handling: On Luigi start-up, check for open pipeline tasks and re-run them.
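
The database-backed approach could be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module, not an existing Luigi feature; the table layout and the function names (`open_registry`, `register`, `run_open_tasks`, `statuses`) are assumptions made for this example.

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open (or create) the hypothetical task registry."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "task_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def register(conn, task_ids):
    """Step 1: register new pipeline tasks and tag them as 'open'."""
    conn.executemany(
        "INSERT OR IGNORE INTO tasks (task_id, status) VALUES (?, 'open')",
        [(task_id,) for task_id in task_ids],
    )
    conn.commit()


def run_open_tasks(conn, runners):
    """Step 2, also usable at start-up: run every task still tagged 'open'."""
    rows = conn.execute(
        "SELECT task_id FROM tasks WHERE status = 'open' ORDER BY rowid"
    ).fetchall()
    for (task_id,) in rows:
        try:
            runners[task_id]()
            status = "successful"
        except Exception:
            status = "failed"
        conn.execute(
            "UPDATE tasks SET status = ? WHERE task_id = ?", (status, task_id)
        )
        conn.commit()


def statuses(conn):
    """Return a mapping of task id to its current tag."""
    return dict(conn.execute("SELECT task_id, status FROM tasks"))


def failing_task():
    raise ValueError("invalid parameters")


conn = open_registry()
register(conn, ["load", "train"])
run_open_tasks(conn, {"load": lambda: None, "train": failing_task})
print(statuses(conn))
```

Because the status is committed after every task, a restart only needs to call `run_open_tasks` again to pick up the entries that were registered but never finished.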

A successful task (with a Docker container as target) can be defined in different ways:

  • A result file with a specified format exists (Drawback: additional effort for data handling, error-prone due to file formatting).
  • The exit code of the Docker container is equal to 0 (= successful).

The second approach is more accurate due to its simplicity.
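
A minimal sketch of the exit-code check, assuming the container is started through the `docker run` command line, which passes the exit code of the container's process back to the caller. The helper names `run_succeeded` and `docker_command` are hypothetical; the demonstration at the end uses plain Python processes as stand-ins so that it runs without Docker.

```python
import subprocess
import sys


def run_succeeded(command):
    """Run a command and judge success purely by its exit code (0 = successful)."""
    return subprocess.run(command).returncode == 0


def docker_command(image, *args):
    """Build an assumed 'docker run' invocation for an algorithm image."""
    # '--rm' removes the container once it has exited.
    return ["docker", "run", "--rm", image, *args]


# Stand-in processes instead of real containers.
ok = run_succeeded([sys.executable, "-c", "raise SystemExit(0)"])
bad = run_succeeded([sys.executable, "-c", "raise SystemExit(3)"])
print(ok, bad)  # True False
```

With a real image, the call would be `run_succeeded(docker_command(image_name))`; no result file has to be parsed.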

...

Exception handling through algorithm containers (e.g. invalid parameters, unreachable database, ...)

Approaches:

Approach 1
Process: None.
Exception Handling: None. Risk of data inconsistency.

Approach 2
Process: An algorithm continuously saves results, which are deleted if an exception is thrown.

  1. Run algorithm
  2. If any exception is thrown:
      1. Clean up results
      2. Terminate with an error
  3. Else:
      1. Terminate without an error

Exception Handling: On encountering an exception, clean up inconsistent results and terminate with an error.
Each algorithm is responsible for saving consistent results.

Approach 3
Process: An algorithm persistently saves either all results or none.

  1. Run algorithm
  2. If any exception is thrown:
      1. Terminate with an error
  3. Else:
      1. Save results
      2. Terminate without an error

Exception Handling: Results are only saved if no exception was thrown.
Each algorithm is responsible for saving consistent results.
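
The difference between approaches (2) and (3) can be sketched as two small functions. Both are illustrations under assumptions: an "algorithm" is modelled as a list of callables that each return a text result, results are plain files, and the function names and file naming scheme are hypothetical. The returned integer stands for the exit code the container would terminate with.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_cleanup(steps, result_dir):
    """Approach 2: save results continuously, delete them if anything fails."""
    result_dir.mkdir(parents=True, exist_ok=True)
    try:
        for index, step in enumerate(steps):
            (result_dir / f"part-{index}.txt").write_text(step())
    except Exception:
        shutil.rmtree(result_dir, ignore_errors=True)  # clean up results
        return 1  # terminate with an error
    return 0  # terminate without an error


def run_all_or_nothing(steps, result_dir):
    """Approach 3: compute everything first, save only if nothing failed."""
    try:
        results = [step() for step in steps]
    except Exception:
        return 1  # terminate with an error, nothing was written
    result_dir.mkdir(parents=True, exist_ok=True)
    for index, text in enumerate(results):
        (result_dir / f"part-{index}.txt").write_text(text)
    return 0  # terminate without an error


def broken_step():
    raise RuntimeError("unreachable database")


base = Path(tempfile.mkdtemp())
code_2 = run_with_cleanup([lambda: "a", broken_step], base / "approach2")
code_3 = run_all_or_nothing([lambda: "a", broken_step], base / "approach3")
print(code_2, code_3)  # 1 1
```

In both cases no result files remain after the failure; approach (3) simply never writes them, at the cost of keeping all results in memory until the end.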

           

           

Note 1:

Approaches (2) and (3) may still produce inconsistent results if an algorithm has saved result fragments and is terminated abruptly by
an external failure (e.g. power failure). Therefore, each algorithm is responsible for saving (only) consistent results.
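
One common technique an algorithm might use to avoid leaving half-written files is to write to a temporary file and rename it over the target in a single step. This is not prescribed here; it is a sketch of one option, and the function name `save_atomically` is hypothetical. `os.replace` is atomic on POSIX systems when source and target are on the same filesystem; how well the data survives a power failure still depends on the filesystem and hardware.

```python
import os
import tempfile
from pathlib import Path


def save_atomically(target, data):
    """Write to a temporary file, then rename it over the target in one step."""
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name, suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, target)  # readers see the old or the new file, never a fragment
    except BaseException:
        # Remove the temporary file if writing or renaming failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


out = Path(tempfile.mkdtemp()) / "result.json"
save_atomically(out, '{"status": "done"}')
print(out.read_text())
```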

Note 2:

We do not provide any rollback mechanism. Inadequate results are neither detected nor cleaned up by Luigi. Therefore, consistent results
are crucial. (Advantage: reduced complexity of the workflow engine and thereby a lower error rate due to rollbacks.)

It is also possible to save (consistent) result fragments iteratively. However, if the algorithm is restarted after a failure (e.g. power failure), it has to
be able to handle such fragments itself.

Idempotent Algorithm

Description:

Avoid re-running an algorithm whose results are already available.

Approaches:

Approach 1
Process: Use Luigi's mechanism of idempotent algorithms (see preliminary).

Approach 2
Process: Each algorithm is responsible for fulfilling the idempotent property itself. Luigi's mechanism is omitted.
Drawback: More complex algorithms, error-prone due to wrong implementations.

Given the mechanism of idempotent tasks and the fact that we can verify an algorithm's exit code, Luigi could simply infer whether an algorithm has to be
run or was already completed. A problem appears after restarting Luigi, which discards all containers and their exit codes.
A recommended solution is to save all containers' exit codes and standard streams (stdout/stderr) within Luigi's database. This solution supports
traceability for historical batch processes and provides Luigi with a way to identify completed tasks.
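
The recommended solution could look roughly like the following sketch. It assumes a SQLite table as the store and identifies a task by a text key built from its name and parameters; the schema, the key format and the names `open_history` and `run_once` are assumptions for illustration. A plain Python process stands in for the container.

```python
import sqlite3
import subprocess
import sys


def open_history(path=":memory:"):
    """Open (or create) the hypothetical run history."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               task_key TEXT NOT NULL,
               exit_code INTEGER NOT NULL,
               stdout TEXT,
               stderr TEXT,
               finished_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def run_once(conn, task_key, command):
    """Skip the command if a run with exit code 0 is already recorded."""
    done = conn.execute(
        "SELECT 1 FROM runs WHERE task_key = ? AND exit_code = 0 LIMIT 1",
        (task_key,),
    ).fetchone()
    if done:
        return "skipped"
    proc = subprocess.run(command, capture_output=True, text=True)
    # Persist exit code and both standard streams for traceability.
    conn.execute(
        "INSERT INTO runs (task_key, exit_code, stdout, stderr) VALUES (?, ?, ?, ?)",
        (task_key, proc.returncode, proc.stdout, proc.stderr),
    )
    conn.commit()
    return "successful" if proc.returncode == 0 else "failed"


history = open_history()
command = [sys.executable, "-c", "print('results written')"]
first = run_once(history, "train(batch=1)", command)
second = run_once(history, "train(batch=1)", command)
print(first, second)  # successful skipped
```

When the history is stored in a file instead of `:memory:`, the recorded exit codes survive a restart, so completed tasks can still be identified afterwards. Failed runs stay in the table as well, which keeps their stderr output available for later inspection.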