
dentoir commented on August 16, 2024

The latest git version adds the Twitter fields specified above, as well as UTF8MB4 support and several other features (Commit: commit). See the upgrade manual for instructions on upgrading dmi-tcat.

Note that UTF8MB4 and the new Twitter fields will only be available for new query bins, unless you run the upgrade script.
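
Why UTF8MB4 matters for emoji: MySQL's older utf8 character set stores at most three bytes per character, while characters above U+FFFF (most emoji among them) take four bytes in UTF-8. A minimal Python sketch of that check follows; the function name is hypothetical and not part of dmi-tcat.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character needs 4 bytes in UTF-8 (hypothetical helper).

    MySQL's legacy 'utf8' charset stores at most 3 bytes per character,
    so such text can only be stored faithfully in a utf8mb4 column.
    """
    # Code points above U+FFFF are exactly the ones UTF-8 encodes in 4 bytes.
    return any(ord(ch) > 0xFFFF for ch in text)


for sample in ["plain ascii", "café", "hello 😀"]:
    print(repr(sample), needs_utf8mb4(sample))
```

This is why existing bins, which stay in UTF8, cannot store emoji even after upgrading the code.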

As of now, MySQL 5.5.3 or later is MANDATORY (5.5.3 is the first MySQL release with utf8mb4 support). Please upgrade your MySQL server if necessary.
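
To check whether an existing server meets the requirement, compare the output of MySQL's `SELECT VERSION();` against 5.5.3. A minimal Python sketch of that comparison, assuming a version string of the usual `X.Y.Z[-suffix]` form; the helper names are hypothetical.

```python
MIN_MYSQL = (5, 5, 3)  # minimum version stated in this release note


def parse_mysql_version(version: str) -> tuple:
    """Parse the leading X.Y.Z of a version string such as '5.5.62-log'."""
    core = version.split("-", 1)[0]          # drop suffixes like '-log'
    return tuple(int(p) for p in core.split(".")[:3])


def meets_minimum(version: str, minimum: tuple = MIN_MYSQL) -> bool:
    """Tuple comparison orders versions component by component."""
    return parse_mysql_version(version) >= minimum


for v in ["5.1.73", "5.5.3", "5.5.62-log", "8.0.36"]:
    print(v, meets_minimum(v))
```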

New fields

The ‘_tweet’ table now has the following additional fields:

  • from_user_created_at (datetime). Indicates when the user's account was created
  • from_user_withheld_scope (varchar). If present, indicates whether the withheld content is the "status" or the "user" (see documentation)
  • possibly_sensitive (boolean). True if the tweet contains a URL that points to possibly disturbing material (see explanation)
  • truncated (boolean). True when this tweet is a retweet whose original text was shortened because adding the "RT @user" prefix pushed it over the 140-character limit
  • withheld_copyright (boolean). True when the content has been withheld due to copyright infringement

Note that some of this extra information, such as withheld_copyright, will only show up when running search.php from the command line, not during real-time tracking.
See a detailed explanation of withheld content and scopes here.
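
To make the new '_tweet' columns concrete, here is a rough sketch of how they could be read from a raw tweet. It assumes the Twitter API v1.1 tweet and user object layout; the helper name and the choice to store a missing boolean as False are assumptions, not dmi-tcat's actual PHP code.

```python
from datetime import datetime

# Twitter API v1.1 timestamps look like "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def extract_new_tweet_fields(tweet: dict) -> dict:
    """Map a v1.1 tweet object onto the new '_tweet' columns (hypothetical helper)."""
    user = tweet.get("user", {})
    return {
        "from_user_created_at": datetime.strptime(user["created_at"], TWITTER_TIME_FORMAT),
        # Absent unless something is withheld; then "status" or "user".
        "from_user_withheld_scope": user.get("withheld_scope"),
        # Missing flags are treated as False here (an assumption).
        "possibly_sensitive": bool(tweet.get("possibly_sensitive", False)),
        "truncated": bool(tweet.get("truncated", False)),
        "withheld_copyright": bool(tweet.get("withheld_copyright", False)),
    }


sample = {
    "truncated": False,
    "possibly_sensitive": True,
    "user": {"created_at": "Wed Oct 10 20:19:24 +0000 2018"},
}
print(extract_new_tweet_fields(sample))
```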

The ‘_urls’ table now has the following additional fields:

  • url_is_media_upload (boolean). True if the URL points to media uploaded directly to Twitter
  • media_type (varchar). Null, or the type of media uploaded. Can currently only have the value 'photo'
  • photo_size_width (smallint). Width of the picture in px
  • photo_size_height (smallint). Height of the picture in px
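
A sketch of where these '_urls' values live in a raw tweet, assuming the Twitter API v1.1 `entities.media` layout. Each media entry offers several `sizes`; which one dmi-tcat records is not stated here, so the `size` parameter (defaulting to "large") is an assumption, as is the helper name.

```python
def extract_media_rows(tweet: dict, size: str = "large") -> list:
    """Build '_urls'-style rows for media uploaded directly to Twitter (hypothetical helper)."""
    rows = []
    for media in tweet.get("entities", {}).get("media", []):
        dims = media.get("sizes", {}).get(size, {})
        rows.append({
            "url": media.get("url"),
            "url_is_media_upload": True,       # it came from entities.media
            "media_type": media.get("type"),   # currently always "photo"
            "photo_size_width": dims.get("w"),
            "photo_size_height": dims.get("h"),
        })
    return rows


sample = {
    "entities": {
        "media": [{
            "url": "https://t.co/example",
            "type": "photo",
            "sizes": {"large": {"w": 1024, "h": 768, "resize": "fit"}},
        }]
    }
}
print(extract_media_rows(sample))
```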

There is a new ‘_withheld’ table per bin, storing detailed information on which tweets and which users were withheld in which countries:

  • tweet_id (bigint) the tweet id
  • user_id (bigint) the user id
  • country (varchar) ISO name of country
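
The '_withheld' table holds one row per (tweet, country) pair. The sketch below flattens a tweet into such rows and stores them in an in-memory SQLite table standing in for the bin's MySQL table. It assumes the v1.1 `withheld_in_countries` array of two-letter country codes; the table name `mybin_withheld` and the helper are hypothetical.

```python
import sqlite3


def withheld_rows(tweet: dict) -> list:
    """One (tweet_id, user_id, country) row per withholding country (hypothetical helper)."""
    countries = tweet.get("withheld_in_countries", [])
    return [(tweet["id"], tweet["user"]["id"], c) for c in countries]


# In-memory SQLite stand-in for a bin's '_withheld' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mybin_withheld (tweet_id INTEGER, user_id INTEGER, country TEXT)")

sample = {"id": 1001, "user": {"id": 42}, "withheld_in_countries": ["DE", "FR"]}
conn.executemany("INSERT INTO mybin_withheld VALUES (?, ?, ?)", withheld_rows(sample))

print(conn.execute(
    "SELECT country FROM mybin_withheld WHERE tweet_id = 1001 ORDER BY country"
).fetchall())
```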

There is a new ‘_places’ table per bin, storing which Twitter place marker (place_id; see docs) was attached to a tweet.
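
Most tweets carry no place, so a '_places' row only makes sense when one is present. A minimal sketch, assuming the v1.1 layout where `place` is null or an object with a string `id`; the helper name is hypothetical.

```python
def place_row(tweet: dict):
    """Return (tweet_id, place_id), or None when no place is attached (hypothetical helper)."""
    place = tweet.get("place")  # null in the API when no place was tagged
    if not place:
        return None
    return (tweet["id"], place["id"])


tweets = [
    {"id": 1, "place": None},
    {"id": 2, "place": {"id": "01a9a39529b27f36"}},
]
# Keep only tweets that actually have a place marker.
print([row for row in map(place_row, tweets) if row is not None])
```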

Search

search.php can now be run through cron by setting the $cronjob option to true in the script parameters. The script takes care not to insert duplicate tweets. You will need to choose an appropriate cron schedule yourself; it depends on the volume of results your query returns.
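
Consecutive cron runs usually return overlapping results, so duplicates must be rejected at insert time. How search.php does this internally is not described here; one common approach, sketched below with SQLite as a stand-in, is a primary key on the tweet id combined with an insert that skips conflicting rows (`INSERT OR IGNORE` in SQLite, comparable to MySQL's `INSERT IGNORE`). The table name `mybin_tweets` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A primary key on the tweet id makes the database reject duplicates.
conn.execute("CREATE TABLE mybin_tweets (id INTEGER PRIMARY KEY, text TEXT)")


def store(batch):
    """Insert (id, text) tuples, skipping ids already stored; return rows added."""
    before = conn.total_changes
    conn.executemany("INSERT OR IGNORE INTO mybin_tweets (id, text) VALUES (?, ?)", batch)
    return conn.total_changes - before


# Two overlapping cron runs: tweet 2 is returned by both searches.
print(store([(1, "first"), (2, "second")]))  # 2
print(store([(2, "second"), (3, "third")]))  # 1
```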

For a medium-volume query you may want to run search.php every 10 minutes.

*/10 * * * * (cd /var/www/dmi-tcat/capture/search/; php search.php)
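
Choosing the interval is a capacity question: each run should pick up everything posted since the previous run. The sketch below turns that into arithmetic using two hypothetical planning inputs, an estimate of how many tweets per minute the query matches and how many tweets one search.php run can reasonably fetch (neither is a dmi-tcat setting). It only returns steps that divide 60, because a `*/N` minute field restarts at minute 0 every hour, so a step such as 7 would leave an uneven gap.

```python
def cron_minute_step(tweets_per_minute: float, tweets_per_run: int, safety: float = 0.5) -> int:
    """Largest '*/N' minute step (N dividing 60, at most 30) within estimated capacity.

    tweets_per_run and safety are hypothetical planning inputs.
    """
    if tweets_per_minute <= 0:
        return 30  # nothing expected; fall back to the longest step offered
    max_minutes = (tweets_per_run * safety) / tweets_per_minute
    # Steps that divide 60 keep the gap between runs uniform across the hour.
    candidates = [n for n in (1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30) if n <= max_minutes]
    return max(candidates) if candidates else 1


step = cron_minute_step(tweets_per_minute=100, tweets_per_run=2000)
print(f"*/{step} * * * * (cd /var/www/dmi-tcat/capture/search/; php search.php)")
```

With roughly 100 matching tweets per minute and a run that can fetch about 2,000, the sketch suggests `*/10`, matching the schedule above; treat the result as a starting point and adjust after watching real runs.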

New configuration options

Optional MySQL configuration options have been added to config.php (see config.php.example), e.g. USE_INSERT_DELAYED and DISABLE_INSERT_IGNORE.
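
The two option names suggest how they shape the insert statement, since MySQL's grammar is `INSERT [DELAYED] [IGNORE] INTO ...`. The sketch below shows that mapping; it is an assumption based on the option names, not dmi-tcat's actual code, and the helper name is hypothetical.

```python
def insert_prefix(use_insert_delayed: bool, disable_insert_ignore: bool) -> str:
    """Build a MySQL INSERT prefix from two flags named after the config options."""
    parts = ["INSERT"]
    if use_insert_delayed:
        parts.append("DELAYED")   # queue the insert instead of waiting for it
    if not disable_insert_ignore:
        parts.append("IGNORE")    # skip rows that violate a unique key
    parts.append("INTO")
    return " ".join(parts)


for flags in [(False, False), (True, False), (False, True)]:
    print(flags, insert_prefix(*flags))
```

Keep in mind that `INSERT DELAYED` only works with certain storage engines such as MyISAM, not InnoDB, and that disabling IGNORE turns duplicate rows into errors.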

from dmi-tcat.

supersambo commented on August 16, 2024

Hi,
just a quick question. If I do not need the additional fields in my existing datasets, I do not need to run the upgrade script, right? But the new fields will be used for new bins anyway?

ErikBorra commented on August 16, 2024

@supersambo that's correct. Also note that your old datasets will continue to be in UTF8 (so no emoji support) while the new datasets will be in UTF8MB4.

supersambo commented on August 16, 2024

great, thanks!
