Comments (4)

altanner commented on July 18, 2024

tweepy/tweepy#1085

from epicosm_legacy.

altanner commented on July 18, 2024

Need to look into finding another field that holds the full text. Really don't want a conditional in there if we can avoid it.

altanner commented on July 18, 2024

I've updated the mongoexport call so that it produces a CSV with the fields

user.id_str,id_str,created_at,retweeted_status.full_text,full_text
that is
USER ID, TWEET ID, WHEN, CONTENTS OF RETWEET FIELD, CONTENTS OF BASE TEXT FIELD.
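
For illustration, an export along these lines might be invoked roughly as follows. This is only a sketch: the database name, collection name, and output path are made up, and the actual command used in the project is not shown in this thread. The flags themselves (`--db`, `--collection`, `--type=csv`, `--fields`, `--out`) are standard mongoexport options.

```python
import shlex

# Fields in the order listed above; dotted names reach into nested documents.
FIELDS = [
    "user.id_str",
    "id_str",
    "created_at",
    "retweeted_status.full_text",
    "full_text",
]


def build_mongoexport_cmd(db, collection, out_path):
    """Return the argv list for a CSV export of FIELDS."""
    return [
        "mongoexport",
        "--db", db,
        "--collection", collection,
        "--type=csv",
        "--fields", ",".join(FIELDS),
        "--out", out_path,
    ]


# Hypothetical names, chosen only for this example.
cmd = build_mongoexport_cmd("twitter_db", "tweets", "tweets.csv")
print(shlex.join(cmd))

# To actually run it (needs mongoexport on PATH and a reachable mongod):
# import subprocess
# subprocess.run(cmd, check=True)
```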

This loses no data (well, none is lost anyway because it is all in mongodb), but it presents a problem.

  1. Original tweets (i.e. not retweets) leave the retweet field blank (the last field is correct).
  2. Retweets are truncated in the last field, but are in full and correct in the penultimate field.
  3. You can't just move retweets into the final field, because then you can't tell whether a row is a tweet or a retweet. Sorting that out would require an odd shell script, since conditional export is not possible with mongoexport.
  4. The other option is for docker to contain a script, in javascript, which interacts with mongodb to say "if it IS a tweet take THIS field, if NOT take THAT field", but you are still left with the problem that the full retweet text does not have the "RT @blahblah" prefix, so you can't tell whether it is a retweet or not in the final csv.
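
One possible way around points 3 and 4, sketched here in Python rather than javascript, is to pick the text per document and write an explicit retweet flag as its own column, so the missing "RT @..." prefix no longer matters. The function names, the output column names, and the sample documents below are all made up for this example. It is not the project's code, and it does use the conditional discussed earlier.

```python
import csv
import io

COLUMNS = ["user_id", "tweet_id", "created_at", "is_retweet", "text"]


def flatten_tweet(doc):
    """Turn one stored tweet document into a flat CSV row."""
    retweet = doc.get("retweeted_status")
    is_retweet = isinstance(retweet, dict)
    if is_retweet:
        # Prefer the untruncated retweet text; fall back to the base field.
        text = retweet.get("full_text", doc["full_text"])
    else:
        text = doc["full_text"]
    return {
        "user_id": doc["user"]["id_str"],
        "tweet_id": doc["id_str"],
        "created_at": doc["created_at"],
        "is_retweet": is_retweet,
        "text": text,
    }


def write_csv(docs, handle):
    """Write one row per document, with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for doc in docs:
        writer.writerow(flatten_tweet(doc))


# Invented sample documents shaped like the fields named above.
sample_docs = [
    {
        "user": {"id_str": "1"},
        "id_str": "100",
        "created_at": "Mon Jan 01 00:00:00 +0000 2020",
        "full_text": "an original tweet",
    },
    {
        "user": {"id_str": "2"},
        "id_str": "101",
        "created_at": "Mon Jan 01 00:01:00 +0000 2020",
        "full_text": "RT @someone: a long retweet that gets cut",
        "retweeted_status": {"full_text": "a long retweet that gets cut off here"},
    },
]

buffer = io.StringIO()
write_csv(sample_docs, buffer)
print(buffer.getvalue())
```

In practice the documents would come from a mongodb cursor rather than a hard-coded list; the row-building logic would stay the same.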

altanner commented on July 18, 2024

For NLP, this is dealt with by the function tweet_or_retweet.

In the plain csv output, both fields are written to the file, since otherwise it would be unclear whether a row is a tweet or a retweet.
