Giter Site home page Giter Site logo

Comments (13)

AkihiroSuda avatar AkihiroSuda commented on June 28, 2024 2

Oh sorry for leaving the PR #88 for months..
I'll be work on the PR today

from continuity.

stevvooe avatar stevvooe commented on June 28, 2024

Looks like a decent proposal:

  • let's follow the right to left convention, ie Copy(dst, src string), since we've followed that here in other contexts.
  • I'm not sure if a full manifest build is necessary, but the extra work is negligible and can be hidden behind the function call.
  • How will this integrate with drivers? I don't Resource needs a copyable method, but we need to push this down to the driver, so they can make efficient decisions. Ideally, we'd have a CopyTree function (or some clever naming) at the root that can be used in common cases and then one can fall back to drivers for more complex uses (I am copying from VM remote to some specialized thing).

from continuity.

ijc avatar ijc commented on June 28, 2024

Ack to your first two bullets.

Not sure I quite follow the third. By "I don't Resource needs a copyable method" did you mean "I don't think Resource needs a Copy method" or "I don't think Resource needs a Copyable interface"?

I think perhaps you are saying both, since Resource doesn't have visibility onto a Driver. That implies we should add the copying for each resource type to the toplevel context.CopyTree you propose which would then know how to use the drivers to duplicate each kind of Resource?

Hrm, might we want different drivers for source (implied in the context) and destination (would need to be passed?) Perhaps destination should also be a(n empty) Context?

from continuity.

stevvooe avatar stevvooe commented on June 28, 2024

Not sure I quite follow the third. By "I don't Resource needs a copyable method" did you mean "I don't think Resource needs a Copy method" or "I don't think Resource needs a Copyable interface"?

"I don't think Resource needs a Copyable interface"

Resource is not supposed to be bound to an environment, although it may draw on resources, such as disk storage.

I'm not sure how tightly we should integrate with drivers. Perhaps, only as much is needed to avoid repeating logic in getting resources from a given environment.

I guess that would mean there is a version that takes paths and instantiates the environments (contexts) from paths and another that works solely from environments. Does that make sense?

from continuity.

ijc avatar ijc commented on June 28, 2024

I take it your use of "environment" assume #78 has happened and will try to interpret with that in mind and use that term throughout.

So I think you are proposing two functions. First a

func Duplicate(dstroot, srcroot string) error

That would build an Environment (nÊe Context) for srcroot and dstroot and then call our other new function:

func DuplicateEnvironment(dst, src, Environment) error

Or should this be:

func (src *Environment) Duplicate(dst Environment) error

In either case this will then iterate over the Resources in src and call their Copy methods in turn.

My original proposal had those Copy methods having this signature:

Copy(srcroot, dstroot string) error

Which makes it impossible to integrate with drivers at all. Maybe this is signposting me towards the idea that the duplication should be happening in DupicateEnvironment (with a big type switch) rather than pushing down via a new Copy method.

To do this you would need to be able to "readish" operations on src.driver and "writeish" operations on dst.driver. Since driver is private in Environment either one or both of those is going to be inaccessible in any particular method. The first option I see there is to add Driver method to Environment to expose it (or to make the member public). WDYT? I'm not immediately enamoured with that idea.

An alternative model just occurred to me:

func (src *Environment) Duplicate(dst string) (*Environment, error)
func (src *Environment) DuplicateWithOptions(dst string, options EnvironmentOptions) (*Environment, error)

i.e. take the dst as a string and return a new Environment referencing the newly duplicated tree. Now we have access to src.driver since it is a method on src and we can have access to the dst driver because we are creating it and can hold onto it.

from continuity.

stevvooe avatar stevvooe commented on June 28, 2024

@ijc Yes, I am using "environment" because "context" is confusing in the context of the existence of the context package.

That proposal looks good. As far as driver access goes, I think it is reasonable to expose something on environment to allow implementations to access platforms directly for performance optimizations.

One other thing to consider is the behavior: do this "sync" (match the set) or "copy" (destination is now a superset of both src and dst). Are they destructive?

from continuity.

ijc avatar ijc commented on June 28, 2024

That proposal looks good.

There was at least 3 in #95 (comment) ;-) Do you have a favourite?

One other thing to consider is the behavior: do this "sync" (match the set) or "copy" (destination is now a superset of both src and dst). Are they destructive?

In the name of simplicity (and because I'm in inherently lazy) I'd prefer to mandate that the destination not exist at all upon entry for now. If we come up with a need to "overlay" src onto dst that should be doable in a backwards compatible way I think.

from continuity.

stevvooe avatar stevvooe commented on June 28, 2024

There was at least 3 in #95 (comment) ;-) Do you have a favourite?

There all my favo[u]rite! I guess I was saying the general direction looks good.

I think the focus should be on the top-level, path-based function Duplicate (maybe CopyTree or SyncTree, we can bike shed it later). Pushing down some functionality to the environment seems like a good implementation detail.

I think the main implementation would be something like this:

func Duplicate(dst, src string) error {
  return DuplicateEnvironment(local.New(dst), local.New(src))
}

Basically, its a helper for doing the easy thing and then you can use the other parts when something special is required.

However, I do see the benefit of making the "environment" one-sided, in that you sync to a local directory, but this may have limitations if you are syncing two environments that are only accessible from their drivers.

Stepping back, I may have incorrectly rejected one of the original ideas to have this on the manifest, because something like this makes a lot of sense:

func (m *Manifest) Apply(env Environment, provider Provider)

The above would let you apply the manifest to an environment, retrieving the content from the given provider, which might let you get direct access to fds for using things like CopyFileRange. We may already have a concept similar to the above, however, it would be nice to avoid the creation of the manifest, as hashing the whole filesystem is expensive when your syncing to an empty directory tree.

@dmcgowan Any thoughts here?

from continuity.

ijc avatar ijc commented on June 28, 2024

it would be nice to avoid the creation of the manifest, as hashing the whole filesystem is expensive when your syncing to an empty directory tree.

I'm unclear if hashing is a necessary part of the hardlink stuff. The hardlinkManager doesn't hash but the hlm.Merge calls the package-level Merge which does seem to pay some attention to the digests.

So it's not the Manifest as such which causes all the hashing but rather the fact that it creates all the Resources. While we might be able to avoid building the Manifest I'm not so sure we can avoid creating Resource objects, can we?

I think we certainly want to DTRT with hardlinks in this functionality.

from continuity.

ijc avatar ijc commented on June 28, 2024

I finally started looking into implementing this week (before I got distracted again)... It looks like continuity.ApplyManifest already covers most of the use case here, in my copious free time I'll look into massaging it to suite my needs, likely by coercing ./bin/continuity apply to be able to dup a manifest+source tree in the first instance.

One thing which seems to be missing is timestamps, adding that would require reving the protobuf descriptions I think. Need to think a bit harder about whether that is a requirement for my use case (copy up from underlying image into a volumes for cri-containerd) at first glance I think it probably is...

from continuity.

stevvooe avatar stevvooe commented on June 28, 2024

One thing which seems to be missing is timestamps

I think we need to add timestamsp, regardless. cc @AkihiroSuda

I know there was a PR for this.

from continuity.

stevvooe avatar stevvooe commented on June 28, 2024

@AkihiroSuda I think we were undecided on #88 but I think it is clearly a necessity now.

from continuity.

ijc avatar ijc commented on June 28, 2024

My use case for this was to achieve the same thing which containerd/cri#535 did much more simply.

Given that I no longer have a pressing need for this functionality. Probably this issue can be closed, unless you would prefer to keep it around for the design discussion should this need arise again in a different context?

from continuity.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤ī¸ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.