Comments (3)
Did you try using MongoDB to save your data? All you need to do is install MongoDB and change the pipeline.
Besides, MongoDB will use the ID of the user/tweet as an index and ensure the records are unique. See pipelines.py.
BTW, you can use mongo-hacker for a better UX when looking into the data.
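To illustrate the uniqueness point above, here is a minimal sketch of a pipeline-style class that upserts each item by its ID, so scraping the same tweet twice does not create a duplicate. This is not the code in pipelines.py: the class name, the `id_` field, the database and collection names, and the `open_collection` helper are all assumptions made for illustration. Only `update_one(..., upsert=True)` and `create_index(..., unique=True)` are standard pymongo calls.

```python
class MongoUpsertPipeline:
    """Hypothetical pipeline that stores each item keyed by its ID."""

    def __init__(self, collection, id_field="id_"):
        # `collection` is any object exposing pymongo's Collection.update_one.
        # The "id_" field name is an assumption, not taken from the project.
        self.collection = collection
        self.id_field = id_field

    def process_item(self, item, spider=None):
        doc = dict(item)
        # Upsert on the ID: insert if the ID is new, otherwise update in place.
        self.collection.update_one(
            {self.id_field: doc[self.id_field]},
            {"$set": doc},
            upsert=True,
        )
        return item


def open_collection(uri="mongodb://localhost:27017", db="tweets_db", name="tweet"):
    """Hypothetical helper; needs pymongo installed and a running MongoDB server."""
    from pymongo import MongoClient

    collection = MongoClient(uri)[db][name]
    # A unique index makes the database itself reject duplicate IDs.
    collection.create_index("id_", unique=True)
    return collection
```

With a server running, something like `MongoUpsertPipeline(open_collection()).process_item(item)` would store one document per ID.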
from tweetscraper.
I looked at that file plenty, but that doesn't change the fact that the default pipeline is broken and there's no documentation. If you read what I wrote, I have my problem solved. I was offering to clean up the code and make a pull request. I don't need Mongo to store this; I'd just have to pull everything out and dump it to disk as JSON anyway.
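For reference, dumping items straight to disk as JSON can be sketched as a small pipeline-style class. This is a sketch under assumptions, not the project's pipeline: the class name and the default `tweets.jsonl` path are hypothetical, and items are assumed to be convertible with `dict(item)`. The method names follow Scrapy's item pipeline interface (`open_spider`, `close_spider`, `process_item`).

```python
import json
from pathlib import Path


class JsonLinesPipeline:
    """Hypothetical pipeline that appends one JSON object per line to a file."""

    def __init__(self, path="tweets.jsonl"):
        # The default path is an assumption made for this example.
        self.path = Path(path)
        self.file = None

    def open_spider(self, spider=None):
        # Create the output directory if needed and open the file for appending.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.file = self.path.open("a", encoding="utf-8")

    def close_spider(self, spider=None):
        if self.file is not None:
            self.file.close()
            self.file = None

    def process_item(self, item, spider=None):
        # ensure_ascii=False keeps non-ASCII tweet text readable in the file.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item
```

One object per line means the output can be appended to across runs and read back line by line without loading the whole file.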
@ckingdev - saying the default pipeline is broken is not entirely correct, if you understand how Scrapy works. One of the reasons DB interaction is simpler is that it avoids the overhead of storing data in files.
However, to make your work a bit simpler, you could have just removed the pipeline flow, i.e. replaced the yield at lines 151 and 161 with file insertion logic. This would have worked for your specific use case.
Introducing this logic into the package itself is not recommended, as the DB interaction is more elegant and this design should be encouraged.