Problem
`ls()` is the central mechanism for asset management at Mindbender. It works by (1) scanning the filesystem for assets and (2) delivering them as dictionaries of data as defined by their respective “schema”.
```python
from mindbender import api

for asset in api.ls():
    print(asset["name"])
```
Each dictionary encapsulates all information about a given asset, including (1) versions, (2) subsets, (3) representations and ultimately (4) files.
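As a purely hypothetical illustration (the real keys and nesting are dictated by each asset’s schema, so none of the names below should be taken literally), such a dictionary might look like this:

```python
# Hypothetical illustration only; real keys and nesting follow the asset's schema.
asset = {
    "name": "AssetA",
    "versions": [...],          # (1) versions
    "subsets": [...],           # (2) subsets
    "representations": [...],   # (3) representations
    "files": [...],             # (4) files
}
```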
The API is straightforward.
- Register a root: `api.register_root("/path/to/assets")`
- Call `ls()` to read its contents.
At the moment, this reading of content occurs individually on each client - e.g. an artist’s computer. This means that (1) an artist requests a listing, (2) ls() is invoked and requests a listing via the OS, (3) the OS makes several (four) requests for a directory listing over the network per asset and (4) the resulting data is formatted as per its schema.
With 10 artists working simultaneously on small projects, even with the subpar network performance already present at Mindbender, the number of listings could peak at ~2,000/second (50 assets × 10 artists × 4 levels), but the bottleneck is likely still not disk performance and need not be of concern.
However, as crews and projects grow larger, and as additional strain is put on the filesystem (e.g. sync), the impact on performance may start to become noticeable.
Solution
Here, I have a suggestion for one achievable method that may alleviate this burden, up to an estimated 100+ artists and feature-film-sized projects.
It consists of (1) a client and (2) a server.
A dedicated computer hosts an “ls service”. This service takes requests for asset listings and reports back.
`ls()` is retargeted to make requests to this service, as opposed to querying the networked filesystem directly.
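As a minimal sketch, assuming the service speaks JSON over HTTP at a hypothetical address (neither the address nor the `/ls` endpoint exist yet), the retargeted client could look something like this:

```python
import json
import urllib.request

LS_SERVICE = "http://ls-service.local:5000"  # hypothetical address of the ls service


def ls():
    """Yield asset dictionaries fetched from the ls service, not the filesystem."""
    with urllib.request.urlopen(LS_SERVICE + "/ls") as response:
        for asset in json.loads(response.read().decode("utf-8")):
            yield asset
```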
The service may then (1) reside directly on the computer closest to the filesystem (in this case the NAS) and (2), most crucially, cache prior requests for listings.
The caching mechanism can in the simplest case hand pre-made results back without a roundtrip to the filesystem and update the cache only at a given interval - such as every 10-30 seconds.
This means that no matter how many requests are made, the filesystem is never bothered more than once every 10-30 seconds.
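A minimal sketch of such an interval-based cache, assuming a `scan_filesystem()` callable that performs the full scan (a hypothetical name, standing in for whatever `ls()` does on the filesystem today):

```python
import time

CACHE_TTL = 30  # seconds between filesystem scans; tune anywhere between 10-30

_cache = {"assets": None, "timestamp": 0.0}


def cached_ls(scan_filesystem):
    """Return cached listings, touching the filesystem at most once per CACHE_TTL."""
    now = time.time()
    if _cache["assets"] is None or now - _cache["timestamp"] > CACHE_TTL:
        _cache["assets"] = list(scan_filesystem())
        _cache["timestamp"] = now
    return _cache["assets"]
```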
The immediate side-effect of this caching, as with caching of any kind, is data going stale; that is, the artist not receiving the latest information when it was expected. As a solution, the caching mechanism could be integrated with publishing or with the operating system’s native journalling mechanism, such that it rarely has to perform a “full scan” and instead mostly updates the parts that change.
For example, given a cache made 30 seconds ago, a new asset is published and the publishing process makes a request to the ls service, saying “Hey, I just added ‘AssetA’ to this location”. The caching mechanism then updates its internal cache, without having to roundtrip to the filesystem.
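In code terms, this could be as simple as the publisher POSTing the new asset to the service, which then patches its in-memory cache rather than re-scanning; the `/published` endpoint, address and payload below are assumptions:

```python
import json
import urllib.request

LS_SERVICE = "http://ls-service.local:5000"  # hypothetical address, as above


def announce_publish(asset):
    """Publisher side: tell the ls service about a newly published asset."""
    request = urllib.request.Request(
        LS_SERVICE + "/published",
        data=json.dumps(asset).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)


def on_published(cache, asset):
    """Service side: patch the in-memory cache instead of re-scanning the filesystem."""
    cache.setdefault("assets", []).append(asset)
```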
Remote
Once the above service is implemented, an additional benefit arises: if local artists can query a single source of information for assets, so can remote artists.
Practically, anyone working remotely could send a request to the same server for a listing of assets. Once an asset is chosen, their computer could then evaluate whether (1) it is already available locally or (2) it should be “made available offline”.
Due to caching, requests made to the service would be virtually free, enabling both local and remote computers to poll it for updates. That is, whenever a new asset is made available, the artist could receive a notification - such as a balloon popup in the task bar.
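A sketch of such a polling client, where `fetch_listing` could be the retargeted `ls()` above and `notify` is whatever notification mechanism is available (balloon popup, print, etc.):

```python
import time


def poll_for_new_assets(fetch_listing, notify, interval=30):
    """Poll the ls service for listings and notify when new assets appear."""
    known = {asset["name"] for asset in fetch_listing()}
    while True:
        time.sleep(interval)
        for asset in fetch_listing():
            if asset["name"] not in known:
                known.add(asset["name"])
                notify("New asset available: %s" % asset["name"])
```

Because the service answers from its cache, this kind of polling remains virtually free even with many local and remote machines doing it at once.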