...
A more detailed strategy for exception handling and the implementation of idempotent tasks are given in the following sections.
...
ID | Process | Exception Handling |
---|---|---|
1 | None. | None. |
2 | Tasks are stored within a database and updated whenever a task terminates. | On start-up, Luigi checks for open pipeline tasks and re-runs them. |
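Process 2 can be sketched as a small status table. The Python sketch below is a minimal illustration under stated assumptions, not Luigi's own task history. The table `pipeline_tasks`, its status values and all helper functions are hypothetical, and SQLite stands in for whatever database the pipeline actually uses. A task is registered as `open` before it starts and updated when it terminates. At start-up, any task still marked `open` is treated as interrupted and re-run.

```python
import sqlite3
from typing import Callable, List

# Hypothetical schema: one row per pipeline task and its current status.
TASKS_SCHEMA = """
CREATE TABLE IF NOT EXISTS pipeline_tasks (
    task_id TEXT PRIMARY KEY,
    status  TEXT NOT NULL CHECK (status IN ('open', 'done', 'failed'))
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the task store; ':memory:' is only useful for a demo."""
    conn = sqlite3.connect(path)
    conn.execute(TASKS_SCHEMA)
    return conn


def register(conn: sqlite3.Connection, task_id: str) -> None:
    """Mark a task as open before it starts (keeps an existing row)."""
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR IGNORE INTO pipeline_tasks (task_id, status) VALUES (?, 'open')",
            (task_id,),
        )


def mark_terminated(conn: sqlite3.Connection, task_id: str, succeeded: bool) -> None:
    """Update the stored status whenever a task terminates."""
    with conn:
        conn.execute(
            "UPDATE pipeline_tasks SET status = ? WHERE task_id = ?",
            ("done" if succeeded else "failed", task_id),
        )


def open_tasks(conn: sqlite3.Connection) -> List[str]:
    """Tasks that started but never reached a terminal state."""
    rows = conn.execute(
        "SELECT task_id FROM pipeline_tasks WHERE status = 'open' ORDER BY task_id"
    )
    return [task_id for (task_id,) in rows]


def recover_on_startup(conn: sqlite3.Connection, run_task: Callable[[str], bool]) -> List[str]:
    """On start-up, re-run every open task and store its new status."""
    interrupted = open_tasks(conn)
    for task_id in interrupted:
        mark_terminated(conn, task_id, run_task(task_id))
    return interrupted
```

Suppose tasks `a`, `b` and `c` are registered and only `a` is marked as terminated before a crash. A restart then re-runs `b` and `c` only.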
...
- A result file in a specified format exists (Drawback: additional effort for data handling; error-prone due to file formatting).
- The exit code of the Docker container equals 0 (= successful).
The second approach seems more suitable due to its simplicity.
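The following is a minimal sketch of the exit-code criterion. It assumes containers are started through the Docker CLI with an attached `docker run --rm`. In that mode, `docker run` returns the exit code of the command inside the container, except for 125, 126 and 127, which it reserves for its own errors (daemon failure, command not invokable, command not found). A non-zero code therefore signals failure either way. The helper names are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence


def docker_command(image: str, *args: str) -> List[str]:
    """Hypothetical helper: build an attached `docker run --rm` invocation."""
    return ["docker", "run", "--rm", image, *args]


def run_succeeded(cmd: Sequence[str]) -> bool:
    """Run a command to completion; exit code 0 is the only success signal."""
    completed = subprocess.run(list(cmd), capture_output=True, text=True)
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in for a container so the sketch runs without Docker:
    # a local Python process that exits with code 3.
    print(run_succeeded([sys.executable, "-c", "raise SystemExit(3)"]))
```

In the pipeline, the check would wrap a call such as `run_succeeded(docker_command("my-algorithm:latest"))`, where the image name is only a placeholder.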
...
ID | Process | Exception Handling |
---|---|---|
1 | None. | None. |
2 | An algorithm continuously saves results, which are deleted if an exception is thrown. | When encountering an exception, clean up inconsistent results and exit the container with an error. |
3 | An algorithm persistently saves either all results or none. | Results are only saved if no exception was thrown. |
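Here is a rough sketch of approaches (2) and (3) for an algorithm whose results go into a single text file. Both functions are hypothetical illustrations, not part of Luigi. Approach (3) uses the common pattern of writing to a temporary file and then renaming it. `os.replace` is atomic when source and destination are on the same file system, so the final path never holds a half-written file. This does not make results that span several files or a database consistent, which is why that responsibility still lies with each algorithm.

```python
import os
import tempfile
from typing import Iterable


def save_continuously(path: str, results: Iterable[str]) -> None:
    """Approach (2): write results as they are produced; if an exception
    occurs, delete the partial file and re-raise so the container exits
    with a non-zero code."""
    try:
        with open(path, "w", encoding="utf-8") as fh:
            for line in results:
                fh.write(line + "\n")
    except BaseException:
        if os.path.exists(path):
            os.remove(path)
        raise


def save_all_or_none(path: str, results: Iterable[str]) -> None:
    """Approach (3): write into a temporary file next to the target and
    rename it into place only if no exception was raised."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            for line in results:
                fh.write(line + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # push data to disk before the rename
        os.replace(tmp_path, path)  # atomic on the same file system
    except BaseException:
        os.remove(tmp_path)
        raise
```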
Note 1:
Approaches (2) and (3) may still lead to inconsistent results if an algorithm has already saved result fragments and is terminated abruptly by a failure outside its control (e.g., a power failure). Therefore, each algorithm is responsible for saving consistent results.
Note 2:
We do not provide any rollback mechanism. Inadequate results are neither detected nor cleaned up by Luigi; therefore, consistent results are crucial. (Advantage: this reduces the complexity of the workflow engine and thereby the error rate caused by rollbacks.)
It is also possible to save (consistent) result fragments iteratively. However, if the algorithm is restarted after a failure (e.g., a power failure), it has to be able to handle such existing fragments individually.
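Iterative saving can be sketched as a checkpointing loop. Each chunk's result is saved as its own complete fragment, and a restarted run skips chunks whose fragment already exists. The fragment naming scheme, the JSON format and the `process_in_fragments` helper are assumptions made for illustration. The sketch also assumes that `compute` is deterministic, so a skipped fragment equals what a re-run would produce.

```python
import json
import os
from typing import Callable, List, Sequence


def fragment_path(out_dir: str, index: int) -> str:
    """Hypothetical naming scheme: one file per completed chunk."""
    return os.path.join(out_dir, f"fragment_{index:05d}.json")


def process_in_fragments(
    items: Sequence, chunk_size: int, out_dir: str, compute: Callable[[list], object]
) -> List[int]:
    """Save each chunk's result as a separate, consistent fragment.
    Chunks whose fragment already exists (e.g. from a run interrupted by a
    failure) are skipped. Returns the indices computed in this run."""
    os.makedirs(out_dir, exist_ok=True)
    computed = []
    for index, start in enumerate(range(0, len(items), chunk_size)):
        target = fragment_path(out_dir, index)
        if os.path.exists(target):
            continue  # finished before the failure; do not recompute
        result = compute(list(items[start:start + chunk_size]))
        tmp = target + ".tmp"  # a stale .tmp from a crash is simply overwritten
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(result, fh)
        os.replace(tmp, target)  # the fragment only appears once complete
        computed.append(index)
    return computed
```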
Idempotent Algorithm
Description:
Avoid running an algorithm again if its results are already available.
Approaches:
ID | Process |
---|---|
1 | Using Luigi's mechanism of idempotent algorithms (see preliminary). |
2 | Each algorithm is responsible for fulfilling the idempotence property by itself; Luigi's mechanism is omitted. Drawback: more complex algorithms; error-prone due to incorrect implementations. |
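The idea behind approach 1 can be illustrated without Luigi itself. In Luigi, a task declares its results through `output()`, and the default `Task.complete()` checks whether those output targets exist. A task that is already complete is not run again. The dependency-free sketch below mimics that contract with plain file paths. `IdempotentTask`, `build` and `WriteGreeting` are hypothetical names, not Luigi classes. Under approach 2, every algorithm would have to reimplement an equivalent check itself.

```python
import os
from abc import ABC, abstractmethod
from typing import List


class IdempotentTask(ABC):
    """Stand-in for Luigi's contract (not Luigi itself): a task is complete
    once all of its declared outputs exist."""

    @abstractmethod
    def outputs(self) -> List[str]:
        """Paths the task produces (Luigi uses Target objects instead)."""

    @abstractmethod
    def run(self) -> None:
        """Produce the outputs."""

    def complete(self) -> bool:
        paths = self.outputs()
        # A task without outputs is never considered complete here.
        return bool(paths) and all(os.path.exists(p) for p in paths)


def build(task: IdempotentTask) -> bool:
    """Run the task only if it is not yet complete; return True if it ran."""
    if task.complete():
        return False
    task.run()
    return True


class WriteGreeting(IdempotentTask):
    """Hypothetical example task with a single output file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def outputs(self) -> List[str]:
        return [self.path]

    def run(self) -> None:
        with open(self.path, "w", encoding="utf-8") as fh:
            fh.write("hello\n")
```

Calling `build` twice on the same `WriteGreeting` runs it once; the second call sees the existing output and skips it.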
Given the mechanism of idempotent tasks and the fact that we can verify an algorithm's exit code, Luigi could simply infer whether an algorithm has to be run or has already completed. A problem arises after restarting Luigi, which discards all containers and their exit codes.
The recommended solution is to save the exit codes and standard streams (stdout/stderr) of all containers in Luigi's database. This supports traceability of historical batch processes and gives Luigi a way to identify completed tasks.
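The recommended solution could be sketched as a run log. This assumes SQLite as a stand-in for Luigi's database and a hypothetical `container_runs` table; the table and helpers are illustrative additions, not part of Luigi. Every run is recorded with its exit code, stdout and stderr. A task counts as complete if its most recent run exited with code 0. This still holds after Luigi has been restarted and the containers are gone, provided the log is stored in a file rather than in memory.

```python
import sqlite3
import subprocess
import sys
from typing import Optional, Sequence

# Hypothetical run log: one row per container run, kept across restarts.
RUNS_SCHEMA = """
CREATE TABLE IF NOT EXISTS container_runs (
    run_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id   TEXT    NOT NULL,
    exit_code INTEGER NOT NULL,
    stdout    TEXT    NOT NULL,
    stderr    TEXT    NOT NULL,
    finished  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_run_log(path: str = ":memory:") -> sqlite3.Connection:
    """Use a file path in practice; ':memory:' does not survive a restart."""
    conn = sqlite3.connect(path)
    conn.execute(RUNS_SCHEMA)
    return conn


def run_and_record(conn: sqlite3.Connection, task_id: str, cmd: Sequence[str]) -> int:
    """Run the (container) command and persist exit code, stdout and stderr."""
    completed = subprocess.run(list(cmd), capture_output=True, text=True)
    with conn:
        conn.execute(
            "INSERT INTO container_runs (task_id, exit_code, stdout, stderr) "
            "VALUES (?, ?, ?, ?)",
            (task_id, completed.returncode, completed.stdout, completed.stderr),
        )
    return completed.returncode


def last_exit_code(conn: sqlite3.Connection, task_id: str) -> Optional[int]:
    """Exit code of the most recent recorded run, or None if never run."""
    row = conn.execute(
        "SELECT exit_code FROM container_runs WHERE task_id = ? "
        "ORDER BY run_id DESC LIMIT 1",
        (task_id,),
    ).fetchone()
    return None if row is None else row[0]


def is_complete(conn: sqlite3.Connection, task_id: str) -> bool:
    """A task is complete if its most recent recorded run exited with 0."""
    return last_exit_code(conn, task_id) == 0


if __name__ == "__main__":
    log = open_run_log()
    # Local stand-in for a container command, so the sketch runs without Docker.
    run_and_record(log, "demo", [sys.executable, "-c", "print('done')"])
    print(is_complete(log, "demo"))
```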
The alternative approach, not re-running algorithms that already have a valid result file, is inadequate, since a container may have been deleted while its results are already present in a database (e.g., a property database).