shepherd's Issues

Assumption of linear transfer time in PostgreSQL state engine

Related to #5, the PostgreSQL state engine assumes a linear relationship between transfer time and filesize. Specifically:

Transfer Time = Filesize / Transfer Rate

In reality, it is an affine relationship:

Transfer Time = (Filesize / Transfer Rate) + Overhead

The state engine ought to do linear regression on the transfer time data, rather than simple averaging, to obtain more accurate estimates.
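
A minimal sketch of such a fit, using PostgreSQL's built-in regr_slope/regr_intercept least-squares aggregates (conn is a psycopg2 connection; the attempts table and its columns are illustrative, not shepherd's actual schema):

def fit_transfer_model(conn):
    """Least-squares fit of: time = (slope * filesize) + overhead.

    slope is seconds per byte (i.e., the reciprocal of the transfer
    rate) and overhead is the constant per-transfer cost in seconds.
    The aggregates return nulls until two attempts are recorded.
    """
    with conn.cursor() as cur:
        cur.execute("""
            select regr_slope(duration, filesize),
                   regr_intercept(duration, filesize)
              from attempts
             where succeeded
        """)
        return cur.fetchone()

def estimate_transfer_time(slope, overhead, filesize):
    # Affine estimate, rather than the current linear assumption
    return (slope * filesize) + overhead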

PostgreSQL state engine can still deadlock

Despite the effort to eliminate deadlocking in the PostgreSQL state engine, it can still occur. However, it appears to be limited to when there are no appropriate tasks (i.e., ones that can fit into the time limit) on the todo list.

More investigation is required...

Suspect iRODS upload options

The following options to iput have been anecdotally shown to be unstable:

  • -T: renew the socket connection after 10 minutes
  • --retries COUNT: retry the iput in case of error; COUNT specifies the number of times to retry
  • -X FILE: specifies that the restart option is on; FILE specifies a local file that contains the restart information
  • --lfrestart FILE: specifies that the large file restart option is on; FILE specifies a local file that contains the restart information
  • --wlock: use an advisory write (exclusive) lock for the upload
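
For illustration, a hypothetical invocation combining all of the above (not shepherd's actual route template, which lives under lib/planning/templates):

import subprocess

def suspect_iput(local_path, irods_path):
    # Enables all five of the options listed above; anecdotally,
    # transfers run like this have proven unstable.
    subprocess.run(
        ["iput",
         "-T",
         "--retries", "3",
         "-X", "restart.info",
         "--lfrestart", "lf-restart.info",
         "--wlock",
         local_path, irods_path],
        check=True,
    )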

Incorrectly establishing iRODS user and zone

A recent change in the iRODS client commands (specifically iuserinfo) means our wrapper code can no longer establish the correct user and zone, resulting in persistent "Permission Denied" errors.

Solution: Parse ienv output, rather than iuserinfo.

(See irods/irods#4955)
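
A minimal sketch of that approach, assuming ienv reports values as "irods_user_name - USER" pairs (the exact output format varies between iRODS versions, so treat this as illustrative):

import re
import subprocess

def user_and_zone():
    """Establish the current iRODS user and zone from ienv, rather
    than iuserinfo."""
    out = subprocess.run(["ienv"], capture_output=True, text=True,
                         check=True).stdout
    env = dict(re.findall(r"(irods_\w+) - (\S+)", out))
    return env["irods_user_name"], env["irods_zone_name"]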

Filesystems should not be associated with jobs in the PostgreSQL state engine

By associating filesystems with jobs, the maximum concurrency of a filesystem is only respected under that job. By disassociating filesystems from everything -- under the assumption that the same unique names are used by all clients -- multiple jobs can be run concurrently without overwhelming any one filesystem.

(Note, by doing this, transfer workers could find themselves under-utilised while multiple jobs are running concurrently.)

No checksum or size invalidation in the PostgreSQL state engine

Per d66500a, if an attempt shows a mismatch in either checksum or size, then it will be considered unsuccessful and subject to retry (modulo the maximum attempts). However, as there is no mechanism to invalidate mismatched checksums or sizes, subsequent attempts -- even if they are successful -- will continue to show a mismatched checksum or size.

Potential solutions:

  • Recalculate the checksum on every fetch, rather than using the stored result. This is costly at runtime.
  • Expose a method that invalidates checksum and size data (sketched below). This is messy and (potentially) coupled to the design of the PostgreSQL interface.
  • Rework the schema such that checksums and sizes are associated with attempts, rather than data (files). This would be the best solution, but is also the hardest to implement.
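
A minimal sketch of the second option, with hypothetical table and column names (conn is a psycopg2 connection; this is not shepherd's actual schema):

def invalidate_metadata(conn, file_id):
    """Clear the stored checksum and size for a file, forcing them to
    be recalculated on the next attempt."""
    with conn, conn.cursor() as cur:
        cur.execute(
            "update data set checksum = null, size = null where id = %s",
            (file_id,),
        )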

Dummy client perpetually reincarnates itself when a task is too long for the queue

The dummy client is designed to only accept tasks that fit into the runtime limit imposed by the execution context. If there are remaining tasks to complete, but they exceed that runtime limit, then the transfer phase workers will wake up, decide there's nothing they can do in the time available, kill themselves, then resurrect themselves to continue this Sisyphean cycle. (In many ways, this is quite amusing.)

It should be noted that the overhead of transferring files can be relatively large; for small files it is significant and skews the transfer rate calculation. This, in turn, distorts the runtime estimates of larger files, where the overhead becomes negligible -- anecdotally, at least in our production environment, by a factor of 30+ -- making the above cycle even more likely (despite it now also being erroneous).

Thundering Herd

The workers are a bit aggressive against the PostgreSQL database, with their advisory locking, such that the database becomes a contended resource. This is particularly acute when there are many small files (i.e., when DB lookups are frequent). This is a classic thundering herd problem.

Possible solutions:

  1. The advisory locking is only really needed when looking up the shared todo view and then writing the corresponding attempt record. IIRC, this was the original design, but it wasn't perfect and ran into deadlocks; to save time (i.e., incur technical debt), the locking was applied to every transaction (see #7). Using a more intelligent locking scheme would mitigate the problem.

  2. The attempt fetching routine, which currently runs in a tight loop, could have jittered backoff applied (sketched after this list), so that not all workers are competing at once.

  3. The attempt fetching routine currently fetches the next single task (that will fit in the run time limit). Perhaps it could instead return N tasks (10, say), whose total estimated time fits into the limit, to reduce DB requests.

  4. Investigate lock-free data structures which could be applied to the todo view, such that no locking is required (e.g., partitioning of todo, etc.)
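
A minimal sketch of option 2, where fetch_attempt stands in for the actual fetching routine:

import random
import time

def fetch_with_backoff(fetch_attempt, base=0.5, cap=30.0):
    """Poll for the next attempt, sleeping with jittered exponential
    backoff after each empty fetch so the workers don't stampede the
    database all at once."""
    delay = base
    while True:
        task = fetch_attempt()
        if task is not None:
            return task
        time.sleep(random.uniform(0, delay))  # full jitter
        delay = min(cap, delay * 2)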

Add tasks defensively

The dummy CLI needs to be made more defensive as it will crash when adding transfer tasks in (at least) the following cases:

  • File doesn't exist
    Should log and skip/fail.
  • File cannot be accessed due to permissions
    All files must be readable by the user running the CLI; otherwise, should log and fail/skip.
  • File cannot be accessed due to corruption
    If a file cannot be stat'd, for whatever reason, this should be logged and (probably) fail.
  • File is not unique
    At least with the PostgreSQL backend, this raises a constraint violation (by design); these can be logged and skipped.

If this preparation stage fails, then the transfer phase should also be cancelled.
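
A minimal sketch of the above checks, where add_task and the logger name stand in for the actual CLI internals:

import logging
import os

log = logging.getLogger("shepherd.dummy")  # hypothetical logger name

def add_task_defensively(path, add_task):
    try:
        os.stat(path)
    except FileNotFoundError:
        log.warning("%s does not exist; skipping", path)
        return False
    except OSError as e:  # cannot be stat'd (e.g., corruption)
        log.error("%s cannot be stat'd: %s; failing", path, e)
        raise
    if not os.access(path, os.R_OK):
        log.error("%s is not readable by this user; failing", path)
        raise PermissionError(path)
    try:
        add_task(path)
    except Exception:  # in practice, catch the backend's uniqueness violation specifically
        log.warning("%s is already on the todo list; skipping", path)
        return False
    return True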

Edge case to the route transformation design

You are expecting all input paths to look like this:

/something/something/something/PROJECT/.vault/.staged/ENCODED/PATH

…where you’re presumably extracting the PROJECT part and decoding the ENCODED/PATH part (amongst other things, like the Lustre volume, to avoid collisions), per the design. This is fine and will work in general, with the one exception (at Sanger) of hgi. Because hgi owns the infrastructure, the root of its subtree -- which is where the vault gets placed -- is a bit higher and will be different on each of our current Lustre volumes. Specifically, the PROJECT value, following the above schema, will equal the following for any hgi team vaults:

(Lustre volume: PROJECT value)

  • scratch114: scratch114
  • scratch115: scratch115
  • scratch118: humgen
  • scratch119: mdt3

One way you might handle this is, instead of simply extracting PROJECT from the paths you are fed, to do group(/something/something/something/PROJECT) (i.e., check the group owner of the path prefix ending in PROJECT). This would work for hgi and all other project directories, but it would break for team directories, as they're named things like soranzo and hurles, while their groups are team151 and team29, respectively (etc.). I would therefore suggest encoding the exceptions as a mapping, read in from some config file, and doing something like:

group = project_prefix.group()             # project_prefix = /something/something/something/PROJECT (a pathlib.Path, extracted from the source)
project = group_mapping.get(group, group)  # i.e., fall back to the group name if no mapping exists
# etc.
target = some_root / project / some_disambiguator / decoded_path

Here, group_mapping would be a dictionary, populated from a config file, that looks like, say:

{
   "team151": "soranzo",
   "team29": "hurles",
   # etc.
}

Tag a file once it has been transferred and verified

Suggested by @gn5: Once a file has been successfully transferred and verified, an additional piece of metadata can be added to it (if the filesystem supports it) to inform end users that the transfer completed and was verified successfully.

This could be implemented as a route transformer (lib.planning.types.RouteScriptTransformation) or simply as part of a transfer route definition (e.g., in the current, dummy implementation: lib/planning/templates/posix_to_irods.sh.j2).
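
As a sketch of the latter, assuming the tag is applied to the iRODS data object via imeta (the attribute name is illustrative):

import subprocess
from datetime import datetime, timezone

def tag_verified(irods_path):
    """Attach a "verified" AVU to a data object once its transfer has
    been verified."""
    timestamp = datetime.now(timezone.utc).isoformat()
    subprocess.run(
        ["imeta", "add", "-d", irods_path,
         "shepherd:verified", timestamp],
        check=True,
    )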
