The latest git version adds the Twitter fields listed below, as well as UTF8MB4 support and several other features (see the commit). See the upgrade manual for instructions on upgrading dmi-tcat.
Note that UTF8MB4 and the new Twitter fields will only be available for new query bins, unless you run the upgrade script.
As of now, MySQL 5.5.3 or later is MANDATORY, since that is the first MySQL version with utf8mb4 support. Please upgrade your MySQL server if necessary.
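Why the version matters: MySQL's older utf8 character set stores at most 3 bytes per character, while characters outside the Basic Multilingual Plane (most emoji, for example) need 4 bytes in UTF-8 and therefore need utf8mb4. The minimal Python sketch below is not part of dmi-tcat; the function names are hypothetical. It checks a server version string against the 5.5.3 minimum and detects text that a utf8 column could not hold.

```python
import re

# utf8mb4 first appeared in MySQL 5.5.3.
MIN_MYSQL_VERSION = (5, 5, 3)


def parse_mysql_version(version_string):
    """Extract (major, minor, patch) from a string such as '5.5.62-0ubuntu0.14.04.1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised MySQL version: {version_string!r}")
    return tuple(int(part) for part in match.groups())


def supports_utf8mb4(version_string):
    """True if the server version is at least 5.5.3 (tuple comparison)."""
    return parse_mysql_version(version_string) >= MIN_MYSQL_VERSION


def needs_utf8mb4(text):
    """True if the text contains a code point above U+FFFF.

    Such characters take 4 bytes in UTF-8, which MySQL's 3-byte
    'utf8' character set cannot store.
    """
    return any(ord(ch) > 0xFFFF for ch in text)
```

For instance, supports_utf8mb4("5.1.73") is False, and needs_utf8mb4 is True for any string containing an emoji such as U+1F600.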
New fields
The ‘_tweet’ table now has the following additional fields:
- from_user_created_at (datetime). Indicates when the user's account was created
- from_user_withheld_scope (varchar). If present, indicates whether the withheld content is the "status" or the "user" (see the Twitter documentation)
- possibly_sensitive (boolean). True if the tweet links to a URL that may point to sensitive or disturbing material (see Twitter's explanation)
- truncated (boolean). True when the tweet is a retweet whose original text was shortened because prepending "RT @user" pushed it past the 140-character limit
- withheld_copyright (boolean). True when the content has been withheld due to copyright infringement
Note that some of this information, such as withheld_copyright, only shows up when collecting with search.php from the command line, not during real-time tracking.
See Twitter's documentation for a detailed explanation of withheld content and scopes.
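To make the mapping concrete, here is a minimal Python sketch (not dmi-tcat's own PHP code) that pulls the new '_tweet' values out of a Twitter API v1.1 tweet decoded into a dict. The function names are hypothetical, and it assumes from_user_withheld_scope is read from the user object's withheld_scope attribute, as the column name suggests.

```python
from datetime import datetime, timezone


def parse_twitter_date(value):
    """Convert Twitter's 'Wed Oct 10 20:19:24 +0000 2018' format to MySQL DATETIME text (UTC)."""
    parsed = datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def extract_new_tweet_fields(tweet):
    """Map a decoded v1.1 tweet (dict) to the new '_tweet' columns.

    Twitter omits most of these attributes when they do not apply, so a
    missing key becomes None for text/date columns and False for booleans.
    """
    user = tweet.get("user", {})
    created = user.get("created_at")
    return {
        "from_user_created_at": parse_twitter_date(created) if created else None,
        # Assumption: taken from the user object, not the tweet object.
        "from_user_withheld_scope": user.get("withheld_scope"),
        "possibly_sensitive": bool(tweet.get("possibly_sensitive", False)),
        "truncated": bool(tweet.get("truncated", False)),
        "withheld_copyright": bool(tweet.get("withheld_copyright", False)),
    }
```

Defaulting the booleans to False mirrors the fact that Twitter simply leaves these attributes out when they are not set.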
The ‘_urls’ table now has the following additional fields:
- url_is_media_upload (boolean). True if the URL points to media uploaded directly to Twitter
- media_type (varchar). NULL, or the type of the uploaded media; currently the only possible value is 'photo'
- photo_size_width (smallint). Width of the photo in pixels
- photo_size_height (smallint). Height of the photo in pixels
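As an illustration only, the Python sketch below shows how these '_urls' columns could be derived from a v1.1 tweet: ordinary links come from entities["urls"], uploaded media from entities["media"], whose "sizes" map holds w/h per size variant. Which variant TCAT actually stores is not stated here; the sketch assumes "large", and url_rows is a hypothetical name.

```python
# Assumption: the size variant whose dimensions are stored.
PHOTO_SIZE_VARIANT = "large"


def url_rows(tweet):
    """Return one dict per URL in a v1.1 tweet, including the new media columns."""
    entities = tweet.get("entities", {})
    rows = []
    # Ordinary links are not media uploads, so the media columns stay empty.
    for link in entities.get("urls", []):
        rows.append({
            "url": link.get("url"),
            "url_is_media_upload": False,
            "media_type": None,
            "photo_size_width": None,
            "photo_size_height": None,
        })
    # Media uploaded to Twitter is listed separately under entities["media"].
    for media in entities.get("media", []):
        size = media.get("sizes", {}).get(PHOTO_SIZE_VARIANT, {})
        rows.append({
            "url": media.get("url"),
            "url_is_media_upload": True,
            "media_type": media.get("type"),
            "photo_size_width": size.get("w"),
            "photo_size_height": size.get("h"),
        })
    return rows
```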
There is a new ‘_withheld’ table per bin, storing detailed information on which tweets and which users were withheld in which countries.
- tweet_id (bigint). The tweet ID
- user_id (bigint). The user ID
- country (varchar). ISO country code
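A hypothetical Python sketch of how '_withheld' rows might be produced from a v1.1 tweet, using the withheld_in_countries lists that Twitter attaches to the tweet and to the user object. How TCAT actually fills tweet_id and user_id is not described here; the sketch assumes tweet-level withholding records the tweet ID and account-level withholding records the user ID, leaving the other column as None.

```python
def withheld_rows(tweet):
    """Return (tweet_id, user_id, country) tuples for a decoded v1.1 tweet.

    Assumption: a withheld tweet is recorded with its tweet ID and a
    withheld account with its user ID; the unused ID is None.
    """
    rows = []
    # Countries in which this particular tweet is withheld.
    for country in tweet.get("withheld_in_countries", []):
        rows.append((tweet["id"], None, country))
    # Countries in which the whole account is withheld.
    user = tweet.get("user", {})
    for country in user.get("withheld_in_countries", []):
        rows.append((None, user["id"], country))
    return rows
```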
There is a new ‘_places’ table per bin, storing which Twitter place (place_id; see the Twitter documentation) was attached to a tweet.
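A minimal Python sketch of the extraction step, assuming a simple (tweet_id, place_id) row; the real '_places' columns are not listed here, and place_row is a hypothetical name. The main point is that Twitter sends "place": null when no place is attached, so that case must yield no row.

```python
def place_row(tweet):
    """Return (tweet_id, place_id) if the tweet has a place attached, else None.

    A missing "place" key and an explicit null are both treated as "no place".
    """
    place = tweet.get("place")
    if place is None:
        return None
    return (tweet["id"], place["id"])
```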
Search
search.php can now be run through cron by setting the $cronjob option to true in the script parameters. The script takes care not to insert duplicate tweets. You need to choose an appropriate cron schedule yourself; it depends on the volume of results your query returns.
For a medium-volume query you may want to run search.php every 10 minutes.
*/10 * * * * (cd /var/www/dmi-tcat/capture/search/ && php search.php)
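The exact deduplication logic of search.php is not described here. One common way to make repeated, overlapping cron runs safe is to key the table on the tweet ID and let the database skip rows it already holds. The Python sketch below illustrates that idea, with SQLite's INSERT OR IGNORE standing in for MySQL's INSERT IGNORE; the table layout and function names are hypothetical.

```python
import sqlite3


def create_store():
    """Open an in-memory database whose primary key is the tweet ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT)")
    return conn


def store_tweets(conn, tweets):
    """Insert tweets, silently skipping IDs that are already stored.

    Returns the number of rows actually inserted in this run.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, text) VALUES (?, ?)",
        [(t["id"], t["text"]) for t in tweets],
    )
    conn.commit()
    return conn.total_changes - before
```

If two consecutive runs both return tweet 2, only the first run stores it; the second run inserts only the tweets it has not seen before.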
New configuration options
Optional MySQL configuration options have been added to config.php (see config.php.example), e.g. USE_INSERT_DELAYED and DISABLE_INSERT_IGNORE.
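The option names suggest that they toggle keywords on the INSERT statements TCAT issues; config.php.example remains the authoritative description. Under that assumption, a hypothetical Python helper could assemble the statement prefix as follows. MySQL's grammar places DELAYED before IGNORE (INSERT [DELAYED] [IGNORE] INTO tbl_name).

```python
def build_insert_prefix(use_insert_delayed=False, disable_insert_ignore=False):
    """Return the leading keywords of a MySQL INSERT statement.

    Assumption: IGNORE is used by default and dropped only when
    disable_insert_ignore is set; DELAYED is added only on request.
    """
    parts = ["INSERT"]
    if use_insert_delayed:
        parts.append("DELAYED")
    if not disable_insert_ignore:
        parts.append("IGNORE")
    parts.append("INTO")
    return " ".join(parts)
```

Keep in mind that MySQL deprecated INSERT DELAYED as of 5.6 and that it only applies to some storage engines, such as MyISAM.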
from dmi-tcat.
Hi,
Just a quick question: if I do not need the additional fields in my existing datasets, I do not need to run the update script, right? But they will be used for new bins anyway?
@supersambo that's correct. Also note that your old datasets will continue to be in UTF8 (so no emoji support) while the new datasets will be in UTF8MB4.
great, thanks!