hmh commented Apr 9, 2019

This PR addresses issue #13.

The current data flow from an action (in schedule A) to its destination schedule(s) (B, C, ...) is:

  • If an action returns non-zero, its results are destroyed.
  • If an action returns zero and has destinations, its output is moved to the destination schedule(s)' "base directory", regardless of whether that (destination) schedule is already running or not.
  • There is no data life-cycle management implemented to deal with data consumed by an action.

This really gets in the way of implementing resilient reporting of results to a collector. The desired report flow is: render the report, optionally compress it, and if this fails (e.g. storage full), don't remove the source data. After the report is rendered successfully, remove only the source data that is present in that report, and queue the report for transmission. The report is only removed when the transmission succeeds. The render+transmit steps may be implemented atomically, and then it becomes "the source data present in the report must only be removed when the transmission succeeds".

This PR implements the following action -> schedule data-flow:

  1. Action output is sent to an "incoming" queue on each destination schedule (this can continue to be the base directory of the schedule), unless the destination is the action's own schedule, in which case the output goes to that schedule's "processing" queue so it is immediately available.
  2. Data can arrive at the "incoming" queue of a schedule at any time, including while that schedule is already running. Writers must use either advisory locking or a "write then rename" strategy so that a partially written (data, meta) pair is never visible (see the write-then-rename sketch after this list). This is already true as implemented.
  3. When the runner starts processing a schedule, it first moves every complete pair of (meta, data) files to the "processing" queue of that schedule.
  4. Actions must consume data only from the "processing" queue of their schedule.
  5. If, and only if, every action of a schedule returns a zero status, the "processing" queue is emptied by the runner when the schedule finishes running. If any action returns a non-zero status, or no actions exist, the "processing" queue is left alone (i.e. it accumulates data).

Note: if a schedule has no actions, it currently will accumulate input in its storage (because it doesn't actually "run"). This behavior is left unchanged.
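
For illustration, here is a minimal sketch of the "write then rename" deposit mentioned in item 2. The directory layout, file names and helper below are hypothetical (they are not the actual lmapd functions); the point is only that each file of the (data, meta) pair becomes visible through an atomic rename(2), so a reader scanning the "incoming" queue never sees a half-written file:

```c
/* Sketch only: deposit a (data, meta) pair into a schedule's _incoming
 * directory using "write then rename".  Paths and helper names are
 * illustrative, not actual lmapd code. */
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

static int write_tmp(const char *tmp, const char *buf, size_t len)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    return close(fd);
}

int deposit_pair(const char *dir, const char *name,
                 const char *data, size_t dlen,
                 const char *meta, size_t mlen)
{
    char tmp[PATH_MAX], dst[PATH_MAX];

    /* Temporaries use the "_" namespace, which queue readers skip. */
    snprintf(tmp, sizeof tmp, "%s/_tmp.%s.data", dir, name);
    snprintf(dst, sizeof dst, "%s/%s.data", dir, name);
    if (write_tmp(tmp, data, dlen) != 0 || rename(tmp, dst) != 0)
        return -1;

    /* The meta file is renamed last, so a complete pair exists by the
     * time it appears in the queue. */
    snprintf(tmp, sizeof tmp, "%s/_tmp.%s.meta", dir, name);
    snprintf(dst, sizeof dst, "%s/%s.meta", dir, name);
    if (write_tmp(tmp, meta, mlen) != 0 || rename(tmp, dst) != 0)
        return -1;

    return 0;
}
```

Advisory locking would give the same guarantee; write-then-rename simply avoids making queue readers take locks.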

hmh added 10 commits March 31, 2019 21:55
Consider unsafe (and therefore %-encode) any filesystem name that starts
with "." (hidden files, and the "." and ".." special entries) or "_",
as well as a few other characters.

This gives us a private namespace (anything prefixed with "_") to use
internally.

CEPTRO.br-issue: MDC-473
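
The exact set of unsafe characters is defined in the patch itself; the helper below is only a hypothetical illustration of the rule described above (names starting with "." or "_" get %-encoded, along with a few other characters):

```c
#include <stdbool.h>
#include <string.h>

/* Illustration only, not the actual lmapd predicate.  Names starting
 * with "." (hidden files, ".", "..") or "_" (reserved for internal
 * use) are treated as unsafe and therefore get %-encoded. */
static bool name_is_unsafe(const char *name)
{
    if (name[0] == '.' || name[0] == '_')
        return true;
    /* "a few other characters": '%' itself and '/' are assumed here. */
    return strpbrk(name, "%/") != NULL;
}
```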
Use the new private namespace (_*) to add an _incoming directory to each
schedule.  When moving output files from an action to a destination
schedule, use the _incoming directory instead of the schedule's base
directory.

CEPTRO.br-issue: MDC-473
Add a function to move .meta and .data files (and only when the
pair is present) from the _incoming directory to the "processing"
directory of a schedule, and call it before executing a schedule.

Add a function to remove files from the "processing" directory,
and call it when all actions of a schedule exit with a successful
status.  Do not remove directories (action workspaces), or anything
prefixed by "_".

CEPTRO.br-issue: MDC-473
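
A rough sketch of the pair-move step, with hypothetical directory arguments and helper name (the real function and layout live in the patch): only entries whose .meta and .data partner files both exist are moved, so an incomplete pair stays in _incoming until it is complete.

```c
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch only: move every complete (.meta, .data) pair from the
 * schedule's _incoming queue to its processing directory. */
static int move_pairs(const char *incoming, const char *processing)
{
    char src[PATH_MAX], dst[PATH_MAX];
    char dsrc[PATH_MAX], ddst[PATH_MAX];
    struct dirent *e;
    DIR *d = opendir(incoming);

    if (!d)
        return -1;

    while ((e = readdir(d)) != NULL) {
        size_t len = strlen(e->d_name);

        /* Look only at "<name>.meta" entries; skip the "_" namespace. */
        if (e->d_name[0] == '_' || len < 6 ||
            strcmp(e->d_name + len - 5, ".meta") != 0)
            continue;

        snprintf(src, sizeof src, "%s/%s", incoming, e->d_name);
        snprintf(dst, sizeof dst, "%s/%s", processing, e->d_name);
        snprintf(dsrc, sizeof dsrc, "%.*s.data",
                 (int)(strlen(src) - 5), src);
        snprintf(ddst, sizeof ddst, "%.*s.data",
                 (int)(strlen(dst) - 5), dst);

        /* Only move the pair when the .data partner is present. */
        if (access(dsrc, F_OK) != 0)
            continue;
        if (rename(dsrc, ddst) != 0 || rename(src, dst) != 0) {
            closedir(d);
            return -1;
        }
    }
    closedir(d);
    return 0;
}
```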
Should an action attempt to output to its own schedule (i.e.  one of its
destinations is its own schedule), direct the output to the schedule's
active (processing) queue instead of its special incoming queue.

The incoming queue would be processed only on the *next* invocation of
that schedule, while the active (processing) queue is immediately
available for the next action to use.

This only really makes sense for "sequential" execution mode, although
it could work in "pipelined" mode depending on how it is
implemented.  It is rather difficult to detect misuse since struct
action does not know about its own schedule, so we don't even try.

CEPTRO.br-issue: MDC-473
Define _GNU_SOURCE so that O_DIRECTORY is defined for open(); this also
defines _BSD_SOURCE, so that we get the prototype for dirfd().

While at it, bump _XOPEN_SOURCE to 700, since dirfd() actually
belongs to that level.

CEPTRO.br-issue: MDC-473
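
For reference, the feature test macros only take effect if they appear before the first system header; a minimal illustration of the intended arrangement (not the actual source file):

```c
/* Must come before any system header is included. */
#define _GNU_SOURCE         /* O_DIRECTORY for open(2), BSD extensions */
#define _XOPEN_SOURCE 700   /* dirfd(3) is specified at this level */

#include <dirent.h>
#include <fcntl.h>

int open_dir_fd(const char *path)
{
    /* O_DIRECTORY makes open() fail unless path is a directory. */
    return open(path, O_RDONLY | O_DIRECTORY);
}

int dir_stream_fd(DIR *d)
{
    return dirfd(d);
}
```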
Use S_ISDIR(st_mode) instead of testing directly for st_mode & S_IFDIR
in lmapd_workspace_schedule_clean().

The S_IFDIR bit is also set for Unix sockets and block devices, which we
don't want to "implicitly" skip from removal during schedule workspace
cleanup.

Fixes: commit 1966cd3
       (lmapd: workspace: handle the schedule incoming queue)

CEPTRO.br-issue: MDC-473
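
The difference is visible from the bit patterns alone; the snippet below uses only standard <sys/stat.h> macros (it is not taken from lmapd):

```c
#include <stdio.h>
#include <sys/stat.h>

/* S_IFDIR is a single bit within the file-type field, and that bit is
 * also set in S_IFSOCK and S_IFBLK, so "st_mode & S_IFDIR" matches
 * sockets and block devices too.  S_ISDIR() compares the whole field. */
static const char *classify(mode_t m)
{
    if (m & S_IFDIR)
        return S_ISDIR(m) ? "directory" : "false hit: not a directory";
    return "something else";
}

int main(void)
{
    printf("%s\n", classify(S_IFDIR));  /* directory */
    printf("%s\n", classify(S_IFSOCK)); /* false hit: not a directory */
    printf("%s\n", classify(S_IFBLK));  /* false hit: not a directory */
    return 0;
}
```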
Ensure the inodes we try to move from the incoming to the active queue
in a schedule workspace are in fact regular files.

The current workspace code and its filesystem layout are not prepared to
cope with subdirectories as queue items, and we don't want other inode
types getting moved/copied, either.

While at it, issue an error message when we fail to link() the "data"
file, and correct the error message when we fail to link() the "meta"
file (it was displaying the name of the "data" file).

CEPTRO.br-issue: MDC-473
lmapd_workspace_action_move() is responsible for moving (actually,
copying) action output data to the destination schedule(s).  Currently,
it would "move" subdirectories (and anything else link() would accept).

However, the whole workspace logic and layout are not prepared to deal
with directories as action workspace data: hardlinking can only work if
there isn't such a directory already at the destination, for example.
Also, you can't have the destination schedule's input queue be a parent
of anything else (in this case, action workspaces) if an action output
can have subdirectories, etc.

Change lmapd_workspace_action_move() to only work on regular files.
This will skip subdirectories, as well as any unexpected inode types.

Note: lmapd_workspace_action_clean() *does* handle subdirectories and
other inode types.

CEPTRO.br-issue: MDC-473
Skip the private namespace (directories and files prefixed by "_") when
moving action results and cleaning action workspaces.  This gives
actions a way to preserve state across separate runs of a schedule.

Hidden files are also skipped when moving an action's result to
destinations, but they are not preserved (they will be removed by
lmapd_workspace_action_clean()).

Note that actions cannot depend on workspace contents surviving an
lmapd restart or reconfiguration, as the workspaces themselves depend
on the lmapd task and schedule configuration.

CEPTRO.br-issue: MDC-1018
CEPTRO.br-issue: MDC-473
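
Summarizing the naming rules with two hypothetical predicates (illustrative only; the actual checks are part of the move and cleanup code in the patch):

```c
#include <stdbool.h>

/* Illustrative predicates only (not the actual lmapd code):
 *  - "_"-prefixed entries are neither moved to destinations nor
 *    cleaned, so actions can keep state there across schedule runs;
 *  - "."-prefixed (hidden) entries are not moved to destinations,
 *    but they are removed by the workspace cleanup;
 *  - everything else is moved and then cleaned. */
static bool skip_when_moving(const char *name)
{
    return name[0] == '_' || name[0] == '.';
}

static bool skip_when_cleaning(const char *name)
{
    /* "." and ".." are, of course, never removed in any case. */
    return name[0] == '_';
}
```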
Clean up any leftover data inside action workspaces of a schedule right
before we start executing the schedule.

Such leftover data is usually the (possibly incomplete) output of
tasks aborted because lmapd exited or the schedule was force-suppressed.

Should these leftover files inside an action workspace be left alone and
the action run to completion, the leftover output would end up being
moved to the action's destinations.

CEPTRO.br-issue: MDC-1018
CEPTRO.br-issue: MDC-473