
Comments (4)

techdragon commented on June 19, 2024

Because of the way this library leans on itertable, this is trickier than I expected. So far I've had to subclass or replace as many parts of itertable as of data_wizard, so I'll probably package my final fix as a separate app, like data_wizard.sources. Otherwise it seems I'd just be adding a lot of otherwise useless functionality to itertable solely so that data_wizard can support django-storages.

from django-data-wizard.

sheppard commented on June 19, 2024

I have integrated Django Data Wizard with S3 before, but that was with very large (multi-GB) CSV files and some heavy customization to parallelize the import across several Lambda workers. For the simple case with django-storages, I think all that is needed is to get the file data from S3 into a BytesIO on the file attribute of a custom itertable class. Perhaps something like this:

# myapp/wizard.py

from io import BytesIO

import boto3
from django.conf import settings
from itertable import ExcelFileIter  # or whichever format
from data_wizard.loaders import FileLoader

s3 = boto3.client('s3')  # credentials come from the usual boto3 configuration


class ExcelS3Iter(ExcelFileIter):
    def load(self):
        response = s3.get_object(
            Bucket=self.bucket,
            Key=self.filename,
        )
        # response['Body'] is a streaming, non-seekable object, so buffer it
        # in memory before handing it to the parser
        self.file = BytesIO(response['Body'].read())


class S3Loader(FileLoader):
    def load_iter(self):
        return ExcelS3Iter(
            bucket=settings.AWS_STORAGE_BUCKET_NAME,
            filename=self.file.name,
        )

# myproject/settings.py
DATA_WIZARD = {
    'LOADER': 'myapp.wizard.S3Loader',
}

I believe no other customization of itertable should be necessary. That said, itertable exists primarily to support data_wizard - so anything to make that support better is a valid contribution IMO. The only caveat is that itertable shouldn't need to know anything about Django - so it could for example have an optional boto3 integration, but not django-storages specifically.

The goal with having itertable be its own library is to make it easier to test the file loading and parsing code in isolation, without worrying about the complexity introduced by data_wizard's task runner.


sheppard commented on June 19, 2024

Actually, I think a better approach in this specific case is to update itertable.load_file() to accept arbitrary file-like objects (wq/itertable@5a47f32). Then in data_wizard it's just a matter of passing the file object directly from the storage backend to itertable (4a63066).

I haven't tested this with django-storages specifically, but it should just work. If you would like to try it out before the next release, be sure to update both itertable and data_wizard to the latest development builds.
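
For readers wondering what "accept arbitrary file-like objects" means in practice, here is a minimal sketch of the general pattern. It is not itertable's actual implementation: `open_source` and `read_csv_rows` are hypothetical names made up for illustration, and the sketch assumes UTF-8 CSV data.

```python
import csv
import io
import os


def open_source(source):
    """Return (file_obj, opened_here) for a path or an open file-like object.

    Hypothetical helper for illustration only; not part of itertable.
    """
    if hasattr(source, "read"):
        # Already file-like, e.g. an object returned by a storage backend
        return source, False
    # Otherwise treat it as a filesystem path and open it ourselves
    return open(os.fspath(source), "rb"), True


def read_csv_rows(source):
    """Parse CSV rows from either a path or a file-like object."""
    file_obj, opened_here = open_source(source)
    try:
        data = file_obj.read()
    finally:
        if opened_here:
            # Only close files this function opened; the caller owns the rest
            file_obj.close()
    if isinstance(data, bytes):
        data = data.decode("utf-8")  # assumption: UTF-8 encoded input
    return list(csv.DictReader(io.StringIO(data)))
```

With a loader written this way, a Django project could pass the result of `default_storage.open(name, "rb")` straight through, regardless of whether the storage backend is the local filesystem or S3 via django-storages.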


sheppard commented on June 19, 2024

This fix has been released.
