
Comments (5)

jensens commented on August 22, 2024

Streaming sounds like a good plan. Getting a copy of a site by something like fetching a stream from a source and piping it into the target would be awesome (simplified):

curl https://my-source-plone.tld/@export -H "Accept: application/json" | curl -H "Content-Type: application/json" -X POST --data-binary @- https://my-target-plone.tld/@import

This could work for folders as well (given relations do not point outside the folder).


djay commented on August 22, 2024

The tar looks possible:

  • create a tarfile(fileobj=response) to stream the tar
  • still write the content json to a temp file
  • annotate the request with the tarfile
  • use another set of serializers that write their blob data into the tarfile and write the path to that blob into the json; you could use content path/field/filename to make the path more meaningful
  • write the content json from the temp file into the tar
  • write any other json exporters into the tar

This should stream a compressed tar pretty efficiently (sketched below).

But I guess uploading it is never going to stream, so it would be an unrealistic way to handle blobs for a large site.
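A minimal sketch of that export side, assuming response is a file-like object with a write() method (as Zope responses are) and that items and serialize are hypothetical stand-ins for the real content walker and serializer set:

    import io
    import json
    import tarfile
    import tempfile

    def export_stream(response, items, serialize):
        # mode="w|gz" writes a gzip-compressed tar as a pure stream,
        # so nothing beyond one member is buffered in memory.
        with tarfile.open(fileobj=response, mode="w|gz") as tar, \
             tempfile.TemporaryFile() as tmp:
            tmp.write(b"[")
            for i, item in enumerate(items):
                # Blob serializer: write the blob into the tar under a
                # meaningful path, and reference that path in the json.
                blob_path = "%s/%s/%s" % (item["path"], item["field"], item["filename"])
                info = tarfile.TarInfo(name=blob_path)
                info.size = len(item["blob"])
                tar.addfile(info, io.BytesIO(item["blob"]))
                record = serialize(item, blob_path=blob_path)
                if i:
                    tmp.write(b",")
                tmp.write(json.dumps(record).encode("utf-8"))
            tmp.write(b"]")
            # Finally, copy the content json from the temp file into the tar.
            info = tarfile.TarInfo(name="content.json")
            info.size = tmp.tell()
            tmp.seek(0)
            tar.addfile(info, tmp)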


fredvd commented on August 22, 2024

Hi Dylan,

Thanks for starting this issue. I promised Philip to create two separate issues 2 weeks ago, but work, a 'light' flu, and PloneConf preparation got in the way. I had to rush a bit through the last section of my talk at PloneConf; I had some slides in there as well about generating a generic bundle format. We have to take into account that there are 4-5 different but related issues at play here, with different use cases:

  1. Improving the output/input form UI
  2. Making more specific selection of output possible (path/criteria)
  3. Creating a 'bundled' export format to export both the content tree and multiple 'other/metadata' data items
  4. Importing content from the bundle format in one go, while fixing the chicken-and-egg problem with a two-pass import

It's a lot to talk about at once, but when you improve the UI, the question of what the export format will be comes up directly.

I like the idea of using tar.gz to combine the .xml's into one bundle, both for the content tree and the metadata items. If you select only one item in the UI, you could return it as a .gz file. OTOH, most metadata files aren't large, HTTP has gzip transport nowadays, and it would be an extra compress/uncompress step.

The risk of dropping separate files in a tar.gz is that you cannot stream-upload and import the content. Oh wait, I'm wrong: TAR is of course the Tape Archive format, so you can read and uncompress it as a stream :-) (Thank you @jensens.) We just shouldn't use ZIP, he says.

So one idea Philip and I discussed is to move back from exporting the metadata in separate json files and put it back into the object structure of each individual item. BUT on import those properties have to be stored in a separate import queue and imported/processed on the second pass. This would then become our preferred bundle format.
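A minimal sketch of that two-pass import, assuming a streamable tar upload; create_item, apply_deferred, the DEFERRED key list, and the per-item json layout (including a "UID" key) are illustrative stand-ins, not an agreed format:

    import json
    import tarfile

    # Illustrative: properties that must wait for the second pass.
    DEFERRED = ("relations", "default_page", "localroles", "portlets")

    def import_stream(fileobj, create_item, apply_deferred):
        queue = []
        # mode="r|gz" reads the tar strictly as a stream, so the import
        # can start while the upload is still arriving.
        with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
            for member in tar:
                if not member.name.endswith(".json"):
                    continue  # blob members are handled by the item serializer
                data = json.load(tar.extractfile(member))
                deferred = {k: data.pop(k) for k in DEFERRED if k in data}
                create_item(data)  # first pass: the object exists after this
                if deferred:
                    queue.append((data["UID"], deferred))
        # Second pass: every object exists now, so relations, default_page,
        # local roles, portlets, etc. can be resolved safely.
        for uid, deferred in queue:
            apply_deferred(uid, deferred)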

Relations, default_page, objPosinParent, localroles/ownership, portlets, etc. The relation values would, however, have to be converted to a UUID structure per exported content item, as the internal implementation of that catalog uses intids for performance. And those adapters are not yet created. [look up ticket/issue on some other repo from 1.5 years ago]

There are still extra metadata items outside the content tree, like members and groups. For sites without external user/group sources (LDAP) you'd want to disable members in the export UI. Also, there is no harm in setting ownership user IDs on content without having the users already.

Another edge case would be group/context/user portlets, which still need their own lists in the json stream after the main content tree.

The trickiest part with such a format is possibly relation values on persistent tiles and portlets. We'll have to come up with a json substructure on each context/item export that stores the relation values as UUID/type; on importing the content-tree stream, we put those values in the second-pass import list and restore them in the tile/portlet annotations on those contexts.
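Purely as an illustration of that idea (none of these key names are an agreed format, and the UIDs are made up), such a per-item substructure could look like:

    {
      "@id": "https://my-source-plone.tld/folder/page",
      "UID": "a1b2c3d4",
      "deferred": {
        "relations": [
          {"from_attribute": "relatedItems", "to_uuid": "e5f6a7b8"}
        ],
        "portlets": []
      }
    }

On import, the "deferred" block would be popped off and queued for the second pass, as in the sketch above.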

I had this ticket open all day yesterday and added paragraphs but forgot to post it in the end :-$


djay commented on August 22, 2024

I've changed this issue to be about a single export format, since that is the thing most other things are hinging on. Using tar streaming is definitely possible and would allow blobs to be more naturally included in an export for small sites; I outlined how above. I was also thinking that if a single json file is selected, there would be no tar, just the json uncompressed.

But the more I thought about it: putting all the json formats into a single meta json file, even including base64-encoded blobs, is perhaps not significantly different from having a tar of separate json files and blobs? Both are possible, and while I don't think it's good to give the user too many choices that make no difference, it's also possible to provide both options.

The reason I'd tend towards a combined tar format is that I'd still like to move this plugin towards being both a migration tool and an end-user import/sync tool. I'd like to get to the point where it's possible, for example, for someone to create a directory structure locally with PDFs in it, tar it, upload it, and have exportimport create all those files inside Plone for them.
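A hedged sketch of that end-user flow; an @import endpoint accepting a raw tar upload, its content type, and the credentials are all hypothetical here:

    import tarfile
    import requests

    # Pack a local folder of PDFs into a compressed tar.
    with tarfile.open("pdfs.tar.gz", "w:gz") as tar:
        tar.add("pdfs/", arcname="pdfs")  # would become the folder tree in Plone

    # Stream the archive to the (hypothetical) import endpoint;
    # requests streams a file object chunk by chunk, not from memory.
    with open("pdfs.tar.gz", "rb") as f:
        requests.post(
            "https://my-target-plone.tld/@import",
            headers={"Content-Type": "application/x-tar"},
            data=f,
            auth=("admin", "secret"),
        )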


Rudd-O commented on August 22, 2024

zfs send | zfs recv for Plone. I like it!

