hmh commented Apr 9, 2019

This PR addresses issue #13.

The current data flow from an action (in schedule A) to its destination schedule(s) (B, C, ...) is:

  • If an action returns non-zero, its results are destroyed.
  • If an action returns zero and has destinations, its output is moved to the destination schedule(s)' "base directory", regardless of whether that (destination) schedule is already running or not.
  • There is no data life-cycle management implemented to deal with data consumed by an action.

This really gets in the way of implementing resilient reporting of results to a collector. The desired report flow is: render the report, optionally compress it, and if this fails (e.g. storage full), don't remove the source data. After the report is rendered successfully, remove only the source data that is present in that report, and queue the report for transmission. The report is only removed when the transmission succeeds. The render+transmit steps may be implemented atomically, and then it becomes "the source data present in the report must only be removed when the transmission succeeds".

This PR implements the following action -> schedule data-flow:

  1. Action output is sent to an "incoming" queue on each destination schedule (this can continue to be the base directory of the schedule), unless the destination is the action's own schedule, in which case the output goes to that schedule's "processing" queue so it is immediately available.
  2. Data can arrive at the "incoming" queue of a schedule at any time, including while that schedule is already running. Writers must use either advisory locking or a "write then rename" strategy so that a partially written (data, meta) pair is never visible (see the write-then-rename sketch after this list). This is already true as implemented.
  3. When the runner starts processing a schedule, it first moves every complete pair of (meta, data) files to the "processing" queue of that schedule.
  4. Actions must consume data only from the "processing" queue of their schedule.
  5. If, and only if, every action of a schedule returns a zero status, the "processing" queue is emptied by the runner when the schedule finishes running. If any action returns a non-zero status, or no actions exist, the "processing" queue is left alone (i.e. it accumulates data).

Note: if a schedule has no actions, it currently will accumulate input in its storage (because it doesn't actually "run"). This behavior is left unchanged.
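
For illustration, here is a minimal sketch of the "write then rename" deposit mentioned in item 2. The directory layout, file names and helper below are hypothetical (they are not the actual lmapd functions); the point is only that each file of the (data, meta) pair becomes visible through an atomic rename(2), so a reader scanning the "incoming" queue never sees a half-written file:

```c
/* Sketch only: deposit a (data, meta) pair into a schedule's _incoming
 * directory using "write then rename".  Paths and helper names are
 * illustrative, not actual lmapd code. */
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

static int write_tmp(const char *tmp, const char *buf, size_t len)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    return close(fd);
}

int deposit_pair(const char *dir, const char *name,
                 const char *data, size_t dlen,
                 const char *meta, size_t mlen)
{
    char tmp[PATH_MAX], dst[PATH_MAX];

    /* Temporaries use the "_" namespace, which queue readers skip. */
    snprintf(tmp, sizeof tmp, "%s/_tmp.%s.data", dir, name);
    snprintf(dst, sizeof dst, "%s/%s.data", dir, name);
    if (write_tmp(tmp, data, dlen) != 0 || rename(tmp, dst) != 0)
        return -1;

    /* The meta file is renamed last, so a complete pair exists by the
     * time it appears in the queue. */
    snprintf(tmp, sizeof tmp, "%s/_tmp.%s.meta", dir, name);
    snprintf(dst, sizeof dst, "%s/%s.meta", dir, name);
    if (write_tmp(tmp, meta, mlen) != 0 || rename(tmp, dst) != 0)
        return -1;

    return 0;
}
```

Advisory locking would give the same guarantee; write-then-rename simply avoids making queue readers take locks.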

hmh added 10 commits March 31, 2019 21:55
Consider unsafe (and therefore %-encode) any filesystem name that starts
with "." (hidden files, and the "." and ".." special entries) or "_",
as well as a few other characters.

This gives us a private namespace (anything prefixed with "_") to use
internally.

CEPTRO.br-issue: MDC-473
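
The exact set of unsafe characters is defined in the patch itself; the helper below is only a hypothetical illustration of the rule described above (names starting with "." or "_" get %-encoded, along with a few other characters):

```c
#include <stdbool.h>
#include <string.h>

/* Illustration only, not the actual lmapd predicate.  Names starting
 * with "." (hidden files, ".", "..") or "_" (reserved for internal
 * use) are treated as unsafe and therefore get %-encoded. */
static bool name_is_unsafe(const char *name)
{
    if (name[0] == '.' || name[0] == '_')
        return true;
    /* "a few other characters": '%' itself and '/' are assumed here. */
    return strpbrk(name, "%/") != NULL;
}
```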
Use the new private namespace (_*) to add an _incoming directory to each
schedule.  When moving output files from an action to a destination
schedule, use the _incoming directory instead of the schedule's base
directory.

CEPTRO.br-issue: MDC-473
Add a function to move .meta and .data files (and only when the
pair is present) from the _incoming directory to the "processing"
directory of a schedule, and call it before executing a schedule.

Add a function to remove files from the "processing" directory,
and call it when all actions of a schedule exit with a successful
status.  Do not remove directories (action workspaces), or anything
prefixed by "_".

CEPTRO.br-issue: MDC-473
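
A rough sketch of the pair-move step, with hypothetical directory arguments and helper name (the real function and layout live in the patch): only entries whose .meta and .data partner files both exist are moved, so an incomplete pair stays in _incoming until it is complete.

```c
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch only: move every complete (.meta, .data) pair from the
 * schedule's _incoming queue to its processing directory. */
static int move_pairs(const char *incoming, const char *processing)
{
    char src[PATH_MAX], dst[PATH_MAX];
    char dsrc[PATH_MAX], ddst[PATH_MAX];
    struct dirent *e;
    DIR *d = opendir(incoming);

    if (!d)
        return -1;

    while ((e = readdir(d)) != NULL) {
        size_t len = strlen(e->d_name);

        /* Look only at "<name>.meta" entries; skip the "_" namespace. */
        if (e->d_name[0] == '_' || len < 6 ||
            strcmp(e->d_name + len - 5, ".meta") != 0)
            continue;

        snprintf(src, sizeof src, "%s/%s", incoming, e->d_name);
        snprintf(dst, sizeof dst, "%s/%s", processing, e->d_name);
        snprintf(dsrc, sizeof dsrc, "%.*s.data",
                 (int)(strlen(src) - 5), src);
        snprintf(ddst, sizeof ddst, "%.*s.data",
                 (int)(strlen(dst) - 5), dst);

        /* Only move the pair when the .data partner is present. */
        if (access(dsrc, F_OK) != 0)
            continue;
        if (rename(dsrc, ddst) != 0 || rename(src, dst) != 0) {
            closedir(d);
            return -1;
        }
    }
    closedir(d);
    return 0;
}
```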
Should an action attempt to output to its own schedule (i.e.  one of its
destinations is its own schedule), direct the output to the schedule's
active (processing) queue instead of its special incoming queue.

The incoming queue would be processed only on the *next* invocation of
that schedule, while the active (processing) queue is immediately
available for the next action to use.

This only really makes sense for "sequential" execution mode, although
it could work in "pipelined" mode depending on how it is
implemented.  It is rather difficult to detect misuse since struct
action does not know about its own schedule, so we don't even try.

CEPTRO.br-issue: MDC-473
Define _GNU_SOURCE so that O_DIRECTORY is defined for open(); this also
defines _BSD_SOURCE, so that we get the prototype for dirfd().

While at it, bump _XOPEN_SOURCE to 700, since dirfd() actually
belongs to that level.

CEPTRO.br-issue: MDC-473
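
For reference, the feature test macros only take effect if they appear before the first system header; a minimal illustration of the intended arrangement (not the actual source file):

```c
/* Must come before any system header is included. */
#define _GNU_SOURCE         /* O_DIRECTORY for open(2), BSD extensions */
#define _XOPEN_SOURCE 700   /* dirfd(3) is specified at this level */

#include <dirent.h>
#include <fcntl.h>

int open_dir_fd(const char *path)
{
    /* O_DIRECTORY makes open() fail unless path is a directory. */
    return open(path, O_RDONLY | O_DIRECTORY);
}

int dir_stream_fd(DIR *d)
{
    return dirfd(d);
}
```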
Use S_ISDIR(st_mode) instead of testing directly for st_mode & S_IFDIR
in lmapd_workspace_schedule_clean().

The S_IFDIR bit is also set for Unix sockets and block devices, which we
don't want to "implicitly" skip from removal during schedule workspace
cleanup.

Fixes: commit 1966cd3
       (lmapd: workspace: handle the schedule incoming queue)

CEPTRO.br-issue: MDC-473
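
The difference is visible from the bit patterns alone; the snippet below uses only standard <sys/stat.h> macros (it is not taken from lmapd):

```c
#include <stdio.h>
#include <sys/stat.h>

/* S_IFDIR is a single bit within the file-type field, and that bit is
 * also set in S_IFSOCK and S_IFBLK, so "st_mode & S_IFDIR" matches
 * sockets and block devices too.  S_ISDIR() compares the whole field. */
static const char *classify(mode_t m)
{
    if (m & S_IFDIR)
        return S_ISDIR(m) ? "directory" : "false hit: not a directory";
    return "something else";
}

int main(void)
{
    printf("%s\n", classify(S_IFDIR));  /* directory */
    printf("%s\n", classify(S_IFSOCK)); /* false hit: not a directory */
    printf("%s\n", classify(S_IFBLK));  /* false hit: not a directory */
    return 0;
}
```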
Ensure the inodes we try to move from the incoming to the active queue
in a schedule workspace are in fact regular files.

The current workspace code and its filesystem layout are not prepared to
cope with subdirectories as queue items, and we don't want other inode
types getting moved/copied, either.

While at it, issue an error message when we fail to link() the "data"
file, and correct the error message when we fail to link() the "meta"
file (it was displaying the name of the "data" file).

CEPTRO.br-issue: MDC-473
lmapd_workspace_action_move() is responsible for moving (actually,
copying) action output data to the destination schedule(s).  Currently,
it would "move" subdirectories (and anything else link() would accept).

However, the whole workspace logic and layout are not prepared to deal
with directories as action workspace data: hardlinking can only work if
there isn't such a directory already at the destination, for example.
Also, you can't have the destination schedule's input queue be a parent
of anything else (in this case, action workspaces) if an action output
can have subdirectories, etc.

Change lmapd_workspace_action_move() to only work on regular files.
This will skip subdirectories, as well as any unexpected inode types.

Note: lmapd_workspace_action_clean() *does* handle subdirectories and
other inode types.

CEPTRO.br-issue: MDC-473
Skip the private namespace (directories and files prefixed by "_") when
moving action results and cleaning action workspaces.  This gives
actions a way to preserve state across separate runs of a schedule.

Hidden files are also skipped when moving an action's result to
destinations, but they are not preserved (they will be removed by
lmapd_workspace_action_clean()).

Note that actions cannot depend on workspace contents surviving an
lmapd restart or reconfiguration, as the workspaces themselves depend
on the lmapd task and schedule configuration.

CEPTRO.br-issue: MDC-1018
CEPTRO.br-issue: MDC-473
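
Summarizing the naming rules with two hypothetical predicates (illustrative only; the actual checks are part of the move and cleanup code in the patch):

```c
#include <stdbool.h>

/* Illustrative predicates only (not the actual lmapd code):
 *  - "_"-prefixed entries are neither moved to destinations nor
 *    cleaned, so actions can keep state there across schedule runs;
 *  - "."-prefixed (hidden) entries are not moved to destinations,
 *    but they are removed by the workspace cleanup;
 *  - everything else is moved and then cleaned. */
static bool skip_when_moving(const char *name)
{
    return name[0] == '_' || name[0] == '.';
}

static bool skip_when_cleaning(const char *name)
{
    /* "." and ".." are, of course, never removed in any case. */
    return name[0] == '_';
}
```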
Clean up any leftover data inside action workspaces of a schedule right
before we start executing the schedule.

Such leftover data is usually the (possibly incomplete) output of
tasks aborted because lmapd exited or the schedule was force-suppressed.

Should these leftover files inside an action workspace be left alone and
the action run to completion, the leftover output would end up being
moved to the action's destinations.

CEPTRO.br-issue: MDC-1018
CEPTRO.br-issue: MDC-473