Comments (3)

jonbakerfish commented on July 17, 2024

Did you try using MongoDB to save your data? All you need to do is install MongoDB and change the pipeline.
Besides, MongoDB will use the ID of the user/tweet as an index and ensure they are unique; see pipelines.py.

BTW, you can use mongo-hacker for better UX when looking into the data.
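
For readers who have not opened pipelines.py, here is a minimal sketch of the idea described above: a Scrapy-style item pipeline that keeps one document per ID. It is not the project's actual code. The class name `MongoUpsertPipeline`, the field name `"ID"`, and the way the collection is passed in are all assumptions made for illustration; `collection` is expected to behave like a pymongo `Collection`.

```python
class MongoUpsertPipeline:
    """Hypothetical pipeline that stores one document per ID (illustration only)."""

    def __init__(self, collection, key="ID"):
        # `collection` should behave like a pymongo Collection.
        # `key` is the assumed name of the unique identifier field.
        self.collection = collection
        self.key = key

    def open_spider(self, spider):
        # A unique index makes the database itself reject duplicate IDs.
        self.collection.create_index(self.key, unique=True)

    def process_item(self, item, spider):
        doc = dict(item)
        # Upsert: insert when the ID is new, otherwise update the stored copy.
        self.collection.update_one(
            {self.key: doc[self.key]}, {"$set": doc}, upsert=True
        )
        return item
```

With a real database, the collection would typically come from something like `pymongo.MongoClient()["some_db"]["tweet"]`, where the database and collection names are placeholders.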

from tweetscraper.

ckingdev commented on July 17, 2024

I looked at that file plenty, but that doesn't change the fact that the default pipeline is broken and there's no documentation. If you read what I wrote, I have my problem solved. I was offering to clean up the code and make a pull request. I don't need Mongo to store this; I'd just have to pull everything out and dump it to disk as JSON anyway.


grandimam commented on July 17, 2024

@ckingdev - saying the default pipeline is broken is not entirely correct if you understand how Scrapy works. And one of the reasons DB interaction is simpler is that it avoids the overhead of storing data in files.

However, to make your work a bit simpler, you could have just bypassed the pipeline flow, i.e. replaced the yield at lines 151 and 161 with file insertion logic. This would have worked for your specific use case.

Introducing this logic into the package itself is not recommended, as the DB interaction is more elegant and this design should be encouraged.
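
As a rough illustration of the "file insertion logic" discussed in this thread, the sketch below writes each scraped item to disk as one JSON object per line. It is an alternative to editing the spider's yield statements, not the change described above and not code from the project. The class name `JsonLinesPipeline` and the default path `tweets.jl` are hypothetical; the only thing assumed about Scrapy is that an item pipeline is a plain class exposing `open_spider`, `close_spider`, and `process_item`.

```python
import json


class JsonLinesPipeline:
    """Hypothetical pipeline that appends each item to a JSON Lines file."""

    def __init__(self, path="tweets.jl"):
        # `path` is a placeholder output location.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Append mode, so a restarted crawl does not overwrite earlier output.
        self.file = open(self.path, "a", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # One JSON object per line; ensure_ascii=False keeps non-ASCII text readable.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item
```

To try it, the class would be registered under `ITEM_PIPELINES` in the project's settings, using whatever module path it ends up living in. Unlike the MongoDB approach, nothing here prevents the same ID from being written twice.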
