When copying projects at scale (currently only 10-20...), storage might enter a bad st

Storage: refactor copying of projects about osparc-simcore HOT 11 CLOSED

sanderegg commented on July 27, 2024

from osparc-simcore.

Comments (11)

sanderegg commented on July 27, 2024 1

I will also add here a few things to think about:

* have storage use a separate DB from the rest of the simcore platform (scaling them will invariably hit the limit on the number of Postgres clients)
* totally ditch the storage database
* use AWS ElastiCache and check the best practices listed on this page: https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/using-caching-for-frequently-accessed-content.html

bisgaard-itis commented on July 27, 2024

Here's some food for thought: several of the issues we encounter with the metamodeling come from the fact that we have to copy/clone an entire project when running a study. This means that running a study job (in the sense of the api-server) creates a clone of the project in S3. This can potentially lead to a huge amount of data, and moreover it is quite inefficient. Even if we improve storage drastically, this will always be a very heavy operation. One reason why we need this is that our projects are not version controlled. If our projects were version controlled, one would simply specify a commit with which the project should run. Note that cloning the projects is also the source of #5803. I suspect that the cloning of projects will continue to bite our 🍑. Hence, I suggest we consider introducing a version control system for our projects.
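
To make the "specify a commit" idea concrete, here is a minimal sketch of what pinning a project version could look like. Everything in it is an assumption for illustration: `ProjectHistory`, `snapshot_id` and the job dictionary are hypothetical names, not part of osparc-simcore or the api-server.

```python
import hashlib
import json


def snapshot_id(manifest: dict[str, str]) -> str:
    """Derive a deterministic id from a {path: content_hash} manifest."""
    canonical = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:12]


class ProjectHistory:
    """Hypothetical store of immutable project snapshots (not an osparc API)."""

    def __init__(self) -> None:
        self._snapshots: dict[str, dict[str, str]] = {}

    def commit(self, files: dict[str, bytes]) -> str:
        # Record only content hashes; the file contents themselves are not duplicated here.
        manifest = {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}
        sid = snapshot_id(manifest)
        self._snapshots[sid] = manifest
        return sid

    def manifest(self, sid: str) -> dict[str, str]:
        return dict(self._snapshots[sid])


history = ProjectHistory()
pinned = history.commit({"inputs.json": b'{"x": 1}', "mesh.bin": b"\x00\x01"})

# A later edit produces a new snapshot; the pinned one is left untouched.
latest = history.commit({"inputs.json": b'{"x": 2}', "mesh.bin": b"\x00\x01"})

# A job would reference the pinned snapshot instead of a cloned project.
job = {"project": "my-study", "snapshot": pinned}
```

With something along these lines, a job submitted against `pinned` keeps seeing the same inputs even if the project is edited afterwards, and unchanged files such as `mesh.bin` resolve to the same content hash in both snapshots.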

wvangeit commented on July 27, 2024

I also discussed this with @matusdrobuliak66 a bit this week. Another option is to have some filesystem-based snapshot mechanism that we would use.
If we could e.g. just mount a snapshot of a study, things would be fairly easy. That way we could also have a read-only mount in case one wants to investigate a study somebody else has open.

bisgaard-itis commented on July 27, 2024

> I also discussed this with @matusdrobuliak66 a bit this week. Another option is to have some filesystem-based snapshot mechanism that we would use. If we could e.g. just mount a snapshot of a study, things would be fairly easy. That way we could also have a read-only mount in case one wants to investigate a study somebody else has open.

Here you mean mount when running the job? Because I think the whole problem is that we don't have a version control system, so there is not really such a thing as a "snapshot" (unless I am mistaken). Currently I think there is just "the latest version". So if someone goes and changes a file while your map service is submitting jobs then the jobs will not be running the expected simulations.

wvangeit commented on July 27, 2024

Well yeah, when you start a job, you create a snapshot of the template's disk space, and run your job from that one.

wvangeit commented on July 27, 2024

(I was mostly discussing with @matusdrobuliak66 because when picking a networked file system it could be good to check if it would have such capabilities or not)

bisgaard-itis commented on July 27, 2024

> (I was mostly discussing with @matusdrobuliak66 because when picking a networked file system it could be good to check if it would have such capabilities or not)

OK, interesting idea. I suspect it would be more flexible to actually version control our projects. But if the file system effort fits with this, then that would of course be very nice.

wvangeit commented on July 27, 2024

The question is what you define as 'version control'. You mean git or so?
Since these projects could potentially be huge with lots of binary files, I would do it at a lower level. But anyhow, to be discussed.

bisgaard-itis commented on July 27, 2024

> The question is what you define as 'version control'. You mean git or so? Since these projects could potentially be huge with lots of binary files, I would do it at a lower level. But anyhow, to be discussed.

The tool I know for doing this with large amounts of data is https://github.com/iterative/dvc. We already use it in the Sim4Life desktop team. This would essentially be a replacement for RClone, which we currently use to synchronize data in the running container with S3. The way DVC is intended to be used is with Git (but I think it can be used independently). It adds every "large" file to the .gitignore and computes a small file with metadata which is committed to the Git repo. It then backs up the large file to S3. Among other things, this metadata file holds a checksum of the large file.
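
As a rough illustration of the pointer-file idea described above, here is a minimal sketch. It is not DVC's actual implementation or file format: the `.pointer.json` suffix, the JSON fields, the choice of SHA-256 and the function names are all assumptions made for this example, and the upload of the large file to S3 is left out.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Checksum of the file contents (SHA-256 chosen for this sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_pointer(large_file: Path) -> Path:
    """Write a small metadata file next to large_file and git-ignore the large file."""
    pointer = large_file.with_name(large_file.name + ".pointer.json")
    meta = {
        "path": large_file.name,
        "sha256": file_digest(large_file),
        "size": large_file.stat().st_size,
    }
    pointer.write_text(json.dumps(meta, indent=2))
    # Only the small pointer file is meant to be committed to Git.
    with (large_file.parent / ".gitignore").open("a") as fh:
        fh.write(f"/{large_file.name}\n")
    return pointer


def is_unchanged(large_file: Path, pointer: Path) -> bool:
    """Compare the current file contents against the recorded checksum."""
    meta = json.loads(pointer.read_text())
    return file_digest(large_file) == meta["sha256"]


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "results.h5"
    data.write_bytes(b"x" * 1024)
    ptr = write_pointer(data)
    unchanged_before = is_unchanged(data, ptr)

    data.write_bytes(b"y" * 1024)  # simulate a modified large file
    unchanged_after = is_unchanged(data, ptr)
    ignored = (Path(tmp) / ".gitignore").read_text()
```

The point of the checksum is that a change to the large file can be detected (and a specific version identified) from the small committed file alone, without the large file ever entering the Git history.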

Of course this solution will never bring the same performance you can probably get by mounting a file system. Let's discuss next week.

Tbh I don't really mind if it would be DVC or another tool. I know there are good alternatives for combining Git with large files and DVC is just the one I know, so that's why I mention that one. For me the key feature which would bring several big advantages would be to use proper version control. One of them would be to get rid of the cloning of projects when submitting jobs in metamodeling, but another would be, from a user perspective, to just have different branches with code/data and to be able to easily revert changes.

One nice thing about DVC is that it offers essentially the functionality you would hope to have from Git. E.g. you can do a dvc diff to check what your changes are. Currently, if you have two versions of your project in osparc, that is not easy to achieve. To me the fact that we can't pin a version of a project (a snapshot) is a big limitation, and I am convinced we would find other places in the code where this would be a big advantage.

In general I think the fact that we have to lock the db/redis whenever we touch a file is unfortunate. My understanding is that we do that because we are afraid a user with access to it could modify it. With version control we could probably get rid of a lot of that and, given the profiling I have done with the metamodeling in the last couple of weeks, I think that would be a big advantage.

sanderegg commented on July 27, 2024

Before we start changing technology, I would be very happy if we listed what the requirements are and what we want to achieve. It should also be backwards compatible. I am against making special ways just for the public API if it does fulfil what the current GUI does. That means that the current tech stack must be migrated in any case.
Nevertheless, I find it very problematic that we are now chasing issues because we totally opened the gates in production without limits.

Anyway, I am not sure I understand whether the DVC is really needed here, S3 is made for large files already. I think the issues currently are:

* database access is locked while copying is going on (this is due to the multipart upload), which means that you cannot copy more than 12 projects at one time,
* we could also have a copy-on-modify policy,

bisgaard-itis commented on July 27, 2024

> Before we start changing technology, I would be very happy that we list what the requirements are and what we want to achieve, also it should be backwards compatible. I am against making special ways just for the public API if it does fulfil what the current GUI does. That means, that the current tech stack must be migrated in any case. Nevertheless I find very problematic now that we are running behind issues because we totally opened the gates without limits in production.
>
> Anyway, I am not sure I understand whether the DVC is really needed here, S3 is made for large files already. I think the issues currently are:
> * access to the database when copying that is locked while copying is going on (and this is due to the multipart upload), which means that you cannot copy more than 12 projects at one time,
> * we could also have a copy-on-modify policy,

Makes sense. Let's discuss in the meeting today
