shepherd's Issues

Assumption of linear transfer time in PostgreSQL state engine

Related to #5, the PostgreSQL state engine assumes a linear relationship between transfer time and filesize. Specifically:

Transfer Time = Filesize / Transfer Rate

In reality, it is an affine relationship:

Transfer Time = (Filesize / Transfer Rate) + Overhead

The state engine ought to do linear regression on the transfer time data, rather than simple averaging, to obtain more accurate estimates.
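
A minimal sketch of such a fit, using PostgreSQL's built-in regr_slope/regr_intercept least-squares aggregates (conn is a psycopg2 connection; the attempts table and its columns are illustrative, not shepherd's actual schema):

def fit_transfer_model(conn):
    """Least-squares fit of: time = (slope * filesize) + overhead.

    slope is seconds per byte (i.e., the reciprocal of the transfer
    rate) and overhead is the constant per-transfer cost in seconds.
    The aggregates return nulls until two attempts are recorded.
    """
    with conn.cursor() as cur:
        cur.execute("""
            select regr_slope(duration, filesize),
                   regr_intercept(duration, filesize)
              from attempts
             where succeeded
        """)
        return cur.fetchone()

def estimate_transfer_time(slope, overhead, filesize):
    # Affine estimate, rather than the current linear assumption
    return (slope * filesize) + overhead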

PostgreSQL state engine can still deadlock

Despite the effort to eliminate deadlocking in the PostgreSQL state engine, it can still occur. However, it appears to be limited to when there are no appropriate tasks (i.e., ones that can fit into the time limit) on the todo list.

More investigation is required...

Suspect iRODS upload options

The following options to iput have been anecdotally shown to be unstable:

  • -T: renew the socket connection after 10 minutes
  • --retries COUNT: retry the iput in case of error; COUNT specifies the number of times to retry
  • -X FILE: specifies that the restart option is on; FILE specifies a local file that contains the restart information
  • --lfrestart FILE: specifies that the large file restart option is on; FILE specifies a local file that contains the restart information
  • --wlock: use an advisory write (exclusive) lock for the upload
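
For illustration, a hypothetical invocation combining all of the above (not shepherd's actual route template, which lives under lib/planning/templates):

import subprocess

def suspect_iput(local_path, irods_path):
    # Enables all five of the options listed above; anecdotally,
    # transfers run like this have proven unstable.
    subprocess.run(
        ["iput",
         "-T",
         "--retries", "3",
         "-X", "restart.info",
         "--lfrestart", "lf-restart.info",
         "--wlock",
         local_path, irods_path],
        check=True,
    )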

Incorrectly establishing iRODS user and zone

A recent change in the iRODS client commands (specifically iuserinfo) means our wrapper code can no longer establish the correct user and zone, resulting in persistent "Permission Denied" errors.

Solution: Parse ienv output, rather than iuserinfo.

(See irods/irods#4955)
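
A minimal sketch of that approach, assuming ienv reports values as "irods_user_name - USER" pairs (the exact output format varies between iRODS versions, so treat this as illustrative):

import re
import subprocess

def user_and_zone():
    """Establish the current iRODS user and zone from ienv, rather
    than iuserinfo."""
    out = subprocess.run(["ienv"], capture_output=True, text=True,
                         check=True).stdout
    env = dict(re.findall(r"(irods_\w+) - (\S+)", out))
    return env["irods_user_name"], env["irods_zone_name"]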

Filesystems should not be associated with jobs in the PostgreSQL state engine

By associating filesystems with jobs, the maximum concurrency of a filesystem is only respected under that job. By disassociating filesystems from everything -- under the assumption that the same unique names are used by all clients -- multiple jobs can be run concurrently without overwhelming any one filesystem.

(Note, by doing this, transfer workers could find themselves under-utilised while multiple jobs are running concurrently.)

No checksum or size invalidation in the PostgreSQL state engine

Per d66500a, if an attempt shows a mismatch in either checksum or size, then it will be considered unsuccessful and subject to retry (modulo the maximum attempts). However, as there is no mechanism to invalidate mismatched checksums or sizes, subsequent attempts -- even if they are successful -- will continue to show a mismatched checksum or size.

Potential solutions:

  • Recalculate the checksum on every fetch, rather than using the stored result. This is costly at runtime.
  • Expose a method that invalidates checksum and size data (sketched below). This is messy and (potentially) coupled to the design of the PostgreSQL interface.
  • Rework the schema such that checksums and sizes are associated with attempts, rather than data (files). This would be the best solution, but is also the hardest to implement.
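
A minimal sketch of the second option, with hypothetical table and column names (conn is a psycopg2 connection; this is not shepherd's actual schema):

def invalidate_metadata(conn, file_id):
    """Clear the stored checksum and size for a file, forcing them to
    be recalculated on the next attempt."""
    with conn, conn.cursor() as cur:
        cur.execute(
            "update data set checksum = null, size = null where id = %s",
            (file_id,),
        )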

Dummy client perpetually reincarnates itself when a task is too long for the queue

The dummy client is designed to only accept tasks that fit into the runtime limit imposed by the execution context. If there are remaining tasks to complete, but they exceed that runtime limit, then the transfer phase workers will wake up, decide there's nothing they can do in the time available, kill themselves, then resurrect themselves to continue this Sisyphean cycle. (In many ways, this is quite amusing.)

It should be noted that the overhead of transferring files can be relatively large; for small files it is significant and skews the transfer rate calculation. This, in turn, distorts the runtime estimates of larger files, where the overhead becomes negligible -- anecdotally, at least in our production environment, by a factor of 30+ -- making the above cycle even more likely (despite it now also being erroneous).

Thundering Herd

The workers are a bit aggressive against the PostgreSQL database, with their advisory locking, such that the database becomes a contended resource. This is particularly acute when there are many small files (i.e., when DB lookups are frequent). This is a classic thundering herd problem.

Possible solutions:

  1. The advisory locking is only really needed when looking up the shared todo view and then writing the corresponding attempt record. IIRC, this was the original design, but it wasn't perfect and ran into deadlocks; to save time (i.e., incur technical debt), the locking was applied to every transaction (see #7). Using a more intelligent locking scheme would mitigate the problem.

  2. The attempt fetching routine, which currently runs in a tight loop, could have jittered backoff applied (sketched after this list), so that not all workers are competing at once.

  3. The attempt fetching routine currently fetches the next single task (that will fit in the run time limit). Perhaps it could instead return N tasks (10, say), whose total estimated time fits into the limit, to reduce DB requests.

  4. Investigate lock-free data structures which could be applied to the todo view, such that no locking is required (e.g., partitioning of todo, etc.)
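
A minimal sketch of option 2, where fetch_attempt stands in for the actual fetching routine:

import random
import time

def fetch_with_backoff(fetch_attempt, base=0.5, cap=30.0):
    """Poll for the next attempt, sleeping with jittered exponential
    backoff after each empty fetch so the workers don't stampede the
    database all at once."""
    delay = base
    while True:
        task = fetch_attempt()
        if task is not None:
            return task
        time.sleep(random.uniform(0, delay))  # full jitter
        delay = min(cap, delay * 2)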

Add tasks defensively

The dummy CLI needs to be made more defensive as it will crash when adding transfer tasks in (at least) the following cases:

  • File doesn't exist
    Should log and skip/fail.
  • File cannot be accessed due to permissions
    All files must be readable by the user running the CLI; otherwise, should log and fail/skip.
  • File cannot be accessed due to corruption
    If a file cannot be stat'd, for whatever reason, this should be logged and (probably) fail.
  • File is not unique
    At least with the PostgreSQL backend, this raises a constraint violation (by design); these can be logged and skipped.

If this preparation stage fails, then the transfer phase should also be cancelled.
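
A minimal sketch of the above checks, where add_task and the logger name stand in for the actual CLI internals:

import logging
import os

log = logging.getLogger("shepherd.dummy")  # hypothetical logger name

def add_task_defensively(path, add_task):
    try:
        os.stat(path)
    except FileNotFoundError:
        log.warning("%s does not exist; skipping", path)
        return False
    except OSError as e:  # cannot be stat'd (e.g., corruption)
        log.error("%s cannot be stat'd: %s; failing", path, e)
        raise
    if not os.access(path, os.R_OK):
        log.error("%s is not readable by this user; failing", path)
        raise PermissionError(path)
    try:
        add_task(path)
    except Exception:  # in practice, catch the backend's uniqueness violation specifically
        log.warning("%s is already on the todo list; skipping", path)
        return False
    return True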

Edge case to the route transformation design

You are expecting all input paths to look like this:

/something/something/something/PROJECT/.vault/.staged/ENCODED/PATH

…where you’re presumably extracting the PROJECT part and decoding the ENCODED/PATH part (amongst other things, like the Lustre volume, to avoid collisions), per the design. This is fine and will work in general, with the one exception (at Sanger) of hgi. Because hgi owns the infrastructure, the root of its subtree -- which is where the vault gets placed -- is a bit higher and will be different on each of our current Lustre volumes. Specifically, the PROJECT value, following the above schema, will equal the following for any hgi team vaults:

(Lustre volume: PROJECT value)

  • scratch114: scratch114
  • scratch115: scratch115
  • scratch118: humgen
  • scratch119: mdt3

One way you might handle this is, instead of simply extracting PROJECT from the paths you are fed, to do group(/something/something/something/PROJECT) (i.e., check the group owner of the path prefix ending in PROJECT). This would work for hgi and all other project directories, but it would break for team directories, as they're named things like soranzo and hurles, while their groups are team151 and team29, respectively (etc.). I would therefore suggest encoding the exceptions as a mapping, read in from some config file, and doing something like:

group = project_prefix.group()             # project_prefix = /something/something/something/PROJECT (a pathlib.Path, extracted from the source)
project = group_mapping.get(group, group)  # i.e., fall back to the group name if no mapping exists
# etc.
target = some_root / project / some_disambiguator / decoded_path

Here, group_mapping would be a dictionary, populated from a config file, that looks like, say:

{
   "team151": "soranzo",
   "team29": "hurles",
   # etc.
}

Tag a file once it has been transferred and verified

Suggested by @gn5: Once a file has been successfully transferred and verified, an additional piece of metadata can be added to it (if the filesystem supports it) to inform end users that the transfer completed and was verified successfully.

This could be implemented as a route transformer (lib.planning.types.RouteScriptTransformation) or simply as part of a transfer route definition (e.g., in the current, dummy implementation: lib/planning/templates/posix_to_irods.sh.j2).
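
As a sketch of the latter, assuming the tag is applied to the iRODS data object via imeta (the attribute name is illustrative):

import subprocess
from datetime import datetime, timezone

def tag_verified(irods_path):
    """Attach a "verified" AVU to a data object once its transfer has
    been verified."""
    timestamp = datetime.now(timezone.utc).isoformat()
    subprocess.run(
        ["imeta", "add", "-d", irods_path,
         "shepherd:verified", timestamp],
        check=True,
    )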
