Comments (4)
from epicosm_legacy.
Need to look into finding another field that holds the full text. I'd really rather not have a conditional in there if we can avoid it.
from epicosm_legacy.
I've updated the mongoexport call to produce a CSV with the fields
user.id_str,id_str,created_at,retweeted_status.full_text,full_text
that is
USER ID, TWEET ID, WHEN, CONTENTS OF RETWEET FIELD, CONTENTS OF BASE TEXT FIELD.
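As a rough sketch, the export described above could be assembled like this. The field list is taken from the comment; the database name, collection name, and output path are hypothetical placeholders, and the helper `build_mongoexport_command` is introduced here only for illustration.

```python
import shlex

# Field order taken from the comment above.
FIELDS = [
    "user.id_str",
    "id_str",
    "created_at",
    "retweeted_status.full_text",
    "full_text",
]


def build_mongoexport_command(db, collection, out_path):
    """Return the argument list for a CSV export of the five fields."""
    return [
        "mongoexport",
        f"--db={db}",
        f"--collection={collection}",
        "--type=csv",
        f"--fields={','.join(FIELDS)}",
        f"--out={out_path}",
    ]


# Hypothetical names; substitute the real database and collection.
cmd = build_mongoexport_command("twitter_db", "tweets", "tweets.csv")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `mongoexport` is installed.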
This loses no data (none is lost anyway, because it is all in MongoDB), but it presents a problem.
- Original tweets (not retweets) produce a blank retweet field; the last field is correct.
- Retweets are truncated in the last field, but are full and correct in the penultimate field.
- You can't just move the retweet text into the final field, because then you can't tell whether a row is a tweet or a retweet. Sorting that out would require an odd shell script, since mongoexport does not support conditional export.
- The other option is for the Docker container to include a JavaScript script that queries MongoDB and says "if it IS a tweet take THIS field, if NOT take THAT field". But you are still left with the problem that the full retweet text does not carry the "RT @blahblah" prefix, so you can't tell whether it is a retweet in the final CSV.
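One possible way around the ambiguity, sketched here as an assumption rather than as what the project does, is to post-process the exported CSV in Python: merge the two text columns into one and add an explicit retweet flag. The helper `merge_text_columns` and the sample rows are made up for illustration, and the sketch assumes an empty retweet column always means an original tweet.

```python
import csv
import io


def merge_text_columns(rows):
    """Yield rows with one text column plus an explicit is_retweet flag.

    Assumes each input row is
    [user_id, tweet_id, created_at, retweet_full_text, full_text],
    and that an empty fourth column means the row is an original tweet.
    """
    for user_id, tweet_id, created_at, retweet_text, base_text in rows:
        is_retweet = retweet_text != ""
        text = retweet_text if is_retweet else base_text
        yield [user_id, tweet_id, created_at, int(is_retweet), text]


# Made-up sample data, for illustration only.
sample = io.StringIO(
    "user.id_str,id_str,created_at,retweeted_status.full_text,full_text\n"
    "1,100,Mon,,an original tweet\n"
    "2,200,Tue,the full retweeted text,RT @someone: the full retw\n"
)
reader = csv.reader(sample)
next(reader)  # skip the header row
merged = list(merge_text_columns(reader))
for row in merged:
    print(row)
```

Because the flag is a separate column, the full retweet text can sit in the single text field without needing the "RT @" prefix to identify it.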
from epicosm_legacy.
For NLP, this is dealt with by the function tweet_or_retweet.
In plain CSV output, both fields are written to the file; otherwise it would be unclear which one applies.
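The implementation of `tweet_or_retweet` is not shown in this thread. A minimal sketch of the kind of selection such a function might perform on a tweet document is given below; the name `pick_full_text` is hypothetical, and the sketch assumes documents are plain dicts with the `retweeted_status` and `full_text` keys named in the export above.

```python
def pick_full_text(tweet):
    """Return (is_retweet, text) for a tweet document stored as a dict.

    Assumption: a retweet carries a nested "retweeted_status" document
    whose "full_text" holds the untruncated text.
    """
    retweeted = tweet.get("retweeted_status")
    if retweeted and retweeted.get("full_text"):
        return True, retweeted["full_text"]
    return False, tweet.get("full_text", "")


# Made-up documents, for illustration only.
original = {"full_text": "an original tweet"}
retweet = {
    "full_text": "RT @someone: the full retw",
    "retweeted_status": {"full_text": "the full retweeted text"},
}
print(pick_full_text(original))
print(pick_full_text(retweet))
```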
from epicosm_legacy.
Related Issues (20)
- Duplicate user file is made even when empty. Fix.
- Information notice "pulling from dockerhub" needs adding.
- Check that Docker is running locally needs to be portable.
- Container launch needs to handle incorrect passwords.
- Add docker restart option.
- We might need a new name>ID method; Twitter is being very slow at this step now, when it used to be very fast.
- Log needs to be more comprehensive, and happen throughout run time, not at exit.
- Add and test get friends. And document.
- Need to test get following in Docker env.
- create pyexeinstaller withing baking in credentials.
- Add input() wait if it detects no user list or credentials file, so the window doesn't auto-close.
- --refresh happens for all iterations in python executable.
- Need a stop and restart command for executable.
- groundtruth reports
- senti text fields
- On macOS, the openssl version may be too new for MongoDB (specifically mongodump).
- get_tweets is messy
- Pathing for NLP needs tidying.
- A check for the path of mongodb would be nice to include.
- v2API - tweet harvest count discrepancy