digitalmethodsinitiative / dmi-tcat

Digital Methods Initiative - Twitter Capture and Analysis Toolset

License: Apache License 2.0

PHP 93.93% CSS 0.29% JavaScript 1.41% Python 0.47% Shell 3.86% Dockerfile 0.05%

dmi-tcat's Introduction

DMI TCAT

The Digital Methods Initiative Twitter Capture and Analysis Toolset (DMI-TCAT) allows one to retrieve and collect tweets from Twitter and to analyze them in various ways.

Installation

You can find detailed installation instructions in the wiki

Requirements

  • Twitter API credentials (these can be obtained from https://apps.twitter.com)
  • One of the following Linux distributions:
    • Ubuntu 18.04
    • Debian 9.*
  • ... or Docker (experimental)

Debian and Ubuntu

Run:

curl https://raw.githubusercontent.com/digitalmethodsinitiative/dmi-tcat/master/helpers/tcat-install-linux.sh | sudo bash

Docker

Our latest Docker images are available on Docker Hub.

  1. Install Docker Desktop and start it. Note that on Windows you may need to ensure that WSL (Windows Subsystem for Linux) integration is enabled in Docker. You can find this option in Docker Desktop under Settings -> Resources -> WSL Integration -> Enable integration with required distros.

Basic installation

  1. Run the command docker run --publish 80:80 --volume tcat_data:/var/lib/mysql/ --detach --name tcat digitalmethodsinitiative/tcat:1.0 and Docker will download version 1.0 (or whichever tag you use in place of "1.0")
  • --publish HOST_PORT:80 lets you choose which port on the host is used. If you use a port other than 80, you may also need to add -e SERVERNAME=localhost:HOST_PORT, where HOST_PORT is the chosen port, because this value is used for internal links in the TCAT interface.
  • --volume volume_name:/var/lib/mysql/ stores the TCAT MySQL database in a named volume, so you can reuse it and recover your data even after you stop using TCAT.
  2. Open the logs to retrieve your login information, either via Docker's interface or on the command line with docker logs tcat (installation may take some time, so you can either wait or run docker logs -f tcat to follow along).
  3. Open http://localhost:80 in your browser and complete the configuration by providing your Twitter API information and the type of tweet capturing you would like to do.
  4. Congratulations! You can use the admin menu to create your first tweet capture bins.
  5. In the future, you can stop and start your TCAT container with docker stop tcat and docker start tcat.
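The port-related notes above can be sketched as a small shell helper. This is a minimal sketch, not part of TCAT: the function name tcat_run_cmd is hypothetical, and it only prints the docker run command for a chosen host port so that you can inspect it before running it. The flags, volume name, and image tag are the ones used in the Basic installation command.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with TCAT): print the docker run command
# for a given host port instead of executing it.
tcat_run_cmd() {
    host_port="${1:-80}"
    server_env=""
    # The README suggests setting SERVERNAME when the host port is not 80,
    # because TCAT uses it for internal links.
    if [ "$host_port" != "80" ]; then
        server_env=" -e SERVERNAME=localhost:${host_port}"
    fi
    printf 'docker run --publish %s:80%s --volume tcat_data:/var/lib/mysql/ --detach --name tcat digitalmethodsinitiative/tcat:1.0\n' \
        "$host_port" "$server_env"
}

# Show the command for host port 8080.
tcat_run_cmd 8080
```

Once the printed command looks right, it can be copied into a terminal, or executed directly with eval "$(tcat_run_cmd 8080)".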

Customize for your own server

The Docker installation also allows you to easily host TCAT on a server. In addition to the SERVERNAME environment variable, you can also use Let's Encrypt by adding -e LETSENCRYPT=y and -e [email protected]. You should also open port 443 for Let's Encrypt to work. Your full command might look like this:

docker run --publish 80:80 --publish 443:443 --volume tcat_data:/var/lib/mysql/ --detach --name tcat -e SERVERNAME=my.website.com -e LETSENCRYPT=y -e [email protected] digitalmethodsinitiative/tcat:1.0

Further TCAT customization

Finally, if you wish to develop TCAT yourself, you can clone this repository and create your own image.

  1. Clone this repository: git clone https://github.com/digitalmethodsinitiative/dmi-tcat.git
  2. Build the image (from the directory where you have cloned TCAT): docker image build --progress=plain -t tcat:1.0 .
  3. In the docker run command shown earlier, replace digitalmethodsinitiative/tcat:1.0 with tcat:1.0
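The three steps above can be sketched as one script. This is a sketch under assumptions rather than an official build script: DRY_RUN, run, and build_tcat are illustrative names, the script prints the commands by default instead of executing them, and it passes the cloned directory dmi-tcat as the build context instead of changing into it and using ".".

```shell
#!/bin/sh
# Dry-run wrapper: prints each command unless DRY_RUN=0 is set.
# DRY_RUN and run() are illustrative names, not part of TCAT or Docker.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_tcat() {
    # Step 1: clone the repository.
    run git clone https://github.com/digitalmethodsinitiative/dmi-tcat.git
    # Step 2: build a local image, using the cloned directory as build context.
    run docker image build --progress=plain -t tcat:1.0 dmi-tcat
    # Step 3: start a container from the local tag instead of the Docker Hub image.
    run docker run --publish 80:80 --volume tcat_data:/var/lib/mysql/ \
        --detach --name tcat tcat:1.0
}

build_tcat
```

Running the script with DRY_RUN=0 in the environment executes the commands instead of printing them.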

Issues

Please use the issue templates when reporting issues and bugs.

Status

Nice way to describe the fact that we don't have much

Contributing

We are happy to receive suggestions and improvements.

License

Apache License Version 2.0

dmi-tcat's People

Contributors

bernorieder, brendam, dale-wahl, dentoir, ekborra, erikborra, frederickjansen, hoylen, matnel, mikesname, nickwest, stijn-uva, theneva, thierrymarianne, tmantynen, xmacex

dmi-tcat's Issues

Captureroles

Hi Eric et al,

I'm setting up a new and more stable server for DMI-TCAT here in Copenhagen. However, installing the most recent version from GitHub returns an error that the SQL table tcat_controller_tasklist is missing. Manually creating this table gets me further, but only to reach yet another error: CAPTUREROLES is not defined.

Are you in the middle of some major changes that break the current GitHub version? If so, do you have any ETA for when it will be up and running again?

Best regards
Tobias

Mysql Error Clean Installation 5/5/2014

Hi guys, I'm trying to install this project and I have an issue: no MySQL schema is included in master.zip. Can you check it? It's a clean installation. Can you help me?

onepercent

Hi Erik,
I updated tcat about a week ago and keyword tracking is working well so far. I discovered that there is already a script to get the onepercent sample. Is this already integrated?
When I run php dmi-tcat/capture/stream/onepercent.php it returns the following error:

PHP Deprecated: mysql_connect(): The mysql extension is deprecated and will be removed in the future: use mysqli or PDO instead in /home/supersambo/www/dmi-tcat/common/functions.php on line 5
PHP Warning: Invalid argument supplied for foreach() in /home/supersambo/www/dmi-tcat/capture/common/functions.php on line 18
PHP Fatal error: Call to undefined function processstweets() in /home/supersambo/www/dmi-tcat/capture/stream/onepercent.php on line 58

I assume this function is not finished yet? or is it a problem with my setup?

Catchable fatal error

Catchable fatal error: Object of class stdClass could not be converted to string in /home/bd/public_html/socialtrack/capture/common/functions.php on line 667

This happens when I search with php search.php.

Any idea?

creating bin with - in its name (volkskrant-links) results in SQL error

After creating a new bin with the name "volkskrant-links", DMI-TCAT became inaccessible. I had to manually remove the row with the bin (happened to be nr. 22) from the SQL database with
DELETE FROM tcat_query_bins WHERE id=22;
After that, DMI-TCAT worked fine again.

See /var/log/apache2/error.log:

[Tue Jul 15 13:55:58 2014] [error] [client 145.18.196.100] PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-link_hashtags (\n\t\tid int(11) NOT NULL AUTO_INCREMENT,\n\t\ttweet_id bigint(20) NOT' at line 1' in /var/www/dmi-tcat/capture/common/functions.php:49\nStack trace:\n#0 /var/www/dmi-tcat/capture/common/functions.php(49): PDOStatement->execute()\n#1 /var/www/dmi-tcat/capture/query_manager.php(74): create_bin('volkskrant-link')\n#2 /var/www/dmi-tcat/capture/query_manager.php(19): create_new_bin(Array)\n#3 {main}\n thrown in /var/www/dmi-tcat/capture/common/functions.php on line 49, referer: http://datacollection2.followthenews-uva.cloudlet.sara.nl/dmi-tcat/capture/
[Tue Jul 15 13:56:08 2014] [error] [client 145.18.196.100] PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-link_tweets' at line 1' in /var/www/dmi-tcat/capture/query_manager.php:528\nStack trace:\n#0 /var/www/dmi-tcat/capture/query_manager.php(528): PDOStatement->execute()\n#1 /var/www/dmi-tcat/capture/index.php(16): getBins()\n#2 {main}\n thrown in /var/www/dmi-tcat/capture/query_manager.php on line 528, referer: http://datacollection2.followthenews-uva.cloudlet.sara.nl/dmi-tcat/
[Tue Jul 15 13:56:12 2014] [error] [client 145.18.196.100] PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-link_tweets' at line 1' in /var/www/dmi-tcat/capture/query_manager.php:528\nStack trace:\n#0 /var/www/dmi-tcat/capture/query_manager.php(528): PDOStatement->execute()\n#1 /var/www/dmi-tcat/capture/index.php(16): getBins()\n#2 {main}\n thrown in /var/www/dmi-tcat/capture/query_manager.php on line 528
[Tue Jul 15 13:56:16 2014] [error] [client 145.18.196.100] PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-link_tweets' at line 1' in /var/www/dmi-tcat/capture/query_manager.php:528\nStack trace:\n#0 /var/www/dmi-tcat/capture/query_manager.php(528): PDOStatement->execute()\n#1 /var/www/dmi-tcat/capture/index.php(16): getBins()\n#2 {main}\n thrown in /var/www/dmi-tcat/capture/query_manager.php on line 528, referer: http://datacollection2.followthenews-uva.cloudlet.sara.nl/dmi-tcat/
[Tue Jul 15 13:56:31 2014] [error] [client 145.18.196.100] PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-link_tweets t' at line 1' in /var/www/dmi-tcat/analysis/common/functions.php:685\nStack trace:\n#0 /var/www/dmi-tcat/analysis/common/functions.php(685): PDOStatement->execute()\n#1 /var/www/dmi-tcat/analysis/common/functions.php(6): get_all_datasets()\n#2 /var/www/dmi-tcat/analysis/index.php(4): require_once('/var/www/dmi-tc...')\n#3 {main}\n thrown in /var/www/dmi-tcat/analysis/common/functions.php on line 685, referer: http://datacollection2.followthenews-uva.cloudlet.sara.nl/dmi-tcat/
[Tue Jul 15 13:56:34 2014] [error] [client 145.18.196.100] PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-link_tweets t' at line 1' in /var/www/dmi-tcat/analysis/common/functions.php:685\nStack trace:\n#0 /var/www/dmi-tcat/analysis/common/functions.php(685): PDOStatement->execute()\n#1 /var/www/dmi-tcat/analysis/common/functions.php(6): get_all_datasets()\n#2 /var/www/dmi-tcat/analysis/index.php(4): require_once('/var/www/dmi-tc...')\n#3 {main}\n thrown in /var/www/dmi-tcat/analysis/common/functions.php on line 685, referer: http://datacollection2.followthenews-uva.cloudlet.sara.nl/dmi-tcat/
[Tue Jul 15 13:56:36 2014] [error] [client 145.18.196.100] PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-link_tweets t' at line 1' in /var/www/dmi-tcat/analysis/common/functions.php:685\nStack trace:\n#0 /var/www/dmi-tcat/analysis/common/functions.php(685): PDOStatement->execute()\n#1 /var/www/dmi-tcat/analysis/common/functions.php(6): get_all_datasets()\n#2 /var/www/dmi-tcat/analysis/index.php(4): require_once('/var/www/dmi-tc...')\n#3 {main}\n thrown in /var/www/dmi-tcat/analysis/common/functions.php on line 685, referer: http://datacollection2.followthenews-uva.cloudlet.sara.nl/dmi-tcat/
[Tue Jul 15 13:56:38 2014] [error] [client 145.18.196.100] PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-link_tweets t' at line 1' in /var/www/dmi-tcat/analysis/common/functions.php:685\nStack trace:\n#0 /var/www/dmi-tcat/analysis/common/functions.php(685): PDOStatement->execute()\n#1 /var/www/dmi-tcat/analysis/common/functions.php(6): get_all_datasets()\n#2 /var/www/dmi-tcat/analysis/index.php(4): require_once('/var/www/dmi-tc...')\n#3 {main}\n thrown in /var/www/dmi-tcat/analysis/common/functions.php on line 685, referer: http://datacollection2.followthenews-uva.cloudlet.sara.nl/dmi-tcat/
[Tue Jul 15 13:56:39 2014] [error] [client 145.18.196.100] PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-link_tweets t' at line 1' in /var/www/dmi-tcat/analysis/common/functions.php:685\nStack trace:\n#0 /var/www/dmi-tcat/analysis/common/functions.php(685): PDOStatement->execute()\n#1 /var/www/dmi-tcat/analysis/common/functions.php(6): get_all_datasets()\n#2 /var/www/dmi-tcat/analysis/index.php(4): require_once('/var/www/dmi-tc...')\n#3 {main}\n thrown in /var/www/dmi-tcat/analysis/common/functions.php on line 685, referer: http://datacollection2.followthenews-uva.cloudlet.sara.nl/dmi-tcat/
[Tue Jul 15 13:56:40 2014] [error] [client 145.18.196.100] PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-link_tweets t' at line 1' in /var/www/dmi-tcat/analysis/common/functions.php:685\nStack trace:\n#0 /var/www/dmi-tcat/analysis/common/functions.php(685): PDOStatement->execute()\n#1 /var/www/dmi-tcat/analysis/common/functions.php(6): get_all_datasets()\n#2 /var/www/dmi-tcat/analysis/index.php(4): require_once('/var/www/dmi-tc...')\n#3 {main}\n thrown in /var/www/dmi-tcat/analysis/common/functions.php on line 685, referer: http://datacollection2.followthenews-uva.cloudlet.sara.nl/dmi-tcat/
[Tue Jul 15 13:57:28 2014] [error] [client 127.0.0.1] PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-link_tweets t' at line 1' in /var/www/dmi-tcat/analysis/common/functions.php:685\nStack trace:\n#0 /var/www/dmi-tcat/analysis/common/functions.php(685): PDOStatement->execute()\n#1 /var/www/dmi-tcat/analysis/common/functions.php(6): get_all_datasets()\n#2 /var/www/dmi-tcat/analysis/index.php(4): require_once('/var/www/dmi-tc...')\n#3 {main}\n thrown in /var/www/dmi-tcat/analysis/common/functions.php on line 685, referer: http://localhost/dmi-tcat/
root@datacollection2:/var/log/apache2#
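
The SQL errors above come from table names derived from a bin name that contains a dash. A check along the following lines could reject such names before a bin is created. This is a minimal sketch, not TCAT code: is_safe_bin_name is a hypothetical function, and the rule it applies (letters, digits, and underscores only) is an assumption, not TCAT's documented policy.

```shell
#!/bin/sh
# Hypothetical pre-check (not TCAT code): accept a bin name only if it is
# non-empty and made of ASCII letters, digits, and underscores, so that table
# names derived from it (for example <bin>_tweets) need no quoting in MySQL.
is_safe_bin_name() {
    case "$1" in
        ""|*[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

for name in volkskrant-links volkskrant_links; do
    if is_safe_bin_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: rejected"
    fi
done
```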

Add more api keys

Hi guys, I recently added more API keys to track with "search" in config.php, but I'm still getting "{"errors":[{"message":"Rate limit exceeded","code":88}]}"

$twitter_keys = array(
    array("twitter_consumer_key" => "xxx",
        "twitter_consumer_secret" => "xxx",
        "twitter_user_token" => "xxx-xxx",
        "twitter_user_secret" => "xxx",));
$twitter_keys = array(
    array("twitter_consumer_key" => "xxx",
        "twitter_consumer_secret" => "xxx",
        "twitter_user_token" => "xxx-xxx",
        "twitter_user_secret" => "xxx",));
$twitter_keys = array(
    array("twitter_consumer_key" => "xxx",
        "twitter_consumer_secret" => "xxx",
        "twitter_user_token" => "xxx-xxx",
        "twitter_user_secret" => "xxx",));
$twitter_keys = array(
    array("twitter_consumer_key" => "xxx",
        "twitter_consumer_secret" => "xxx",
        "twitter_user_token" => "xxx-xxx",
        "twitter_user_secret" => "xxx",));

Tracking keywords and language filter

Hello,
I've just started using DMI-TCAT and I'm not too familiar with it yet. I was wondering whether it is possible to track keywords AND filter on language? I know it's easy to filter afterwards, but my current queries capture way too many tweets. I'm not sure where I should add the parameter.
Sorry for the noob question, I'm still in the midst of understanding DMI-TCAT as a whole (I had a hard time finding where to fill in my proxy settings!).

Recommended server?

Hi,

I will no longer have access to my development server by the end of this week, and was wondering if you have any recommendations of server hosts that would allow me command line access to install and continue running dmi-tcat?

Thanks,
Sam

bin already exists

Hi Erik,

With the new version / web interface, I have the config set up to follow users, and have truncated the tables in the database (installing this on a VM that was briefly used for testing).

When I create a new pool of users, I get a popup stating that the bin name already exists, and it doesn't get added. I get this popup whatever I call the bin, and even though there are no bins defined in the SQL tables (and I removed the old followbins.php etc in case it was checking against them..)

Any ideas?

Cheers,

Darryl

DMI-TCAT controller killed a process

I'm constantly receiving emails saying that dmi-tcat killed the processes because the script was idle. This started suddenly, and I'm not aware of any major changes to my server that could cause it. I'm getting the message every 5 minutes.

The log files do not point me to the problem. dmi-tcat/capture/stream/logs/error.log just contains:

2013-10-29 07:05:03 connecting to API socket
2013-10-29 07:05:03 connecting - query array (
'track' => 'global warming,globalwarming,climate,climatechange,yasuni,yasuniitt',
)
2013-10-29 07:11:03 connecting to API socket
2013-10-29 07:11:03 connecting - query array (
'track' => 'global warming,globalwarming,climate,climatechange,yasuni,yasuniitt',
)
2013-10-29 07:17:03 connecting to API socket
2013-10-29 07:17:03 connecting - query array (
'track' => 'global warming,globalwarming,climate,climatechange,yasuni,yasuniitt',
)

dmi-tcat/capture/stream/logs/controller.log states:

2013-10-29 07:05:01 script was idle for more than 300 seconds - killing and starting
2013-10-29 07:06:01 script called - pid:3038 idle:58
2013-10-29 07:07:01 script called - pid:3038 idle:118
2013-10-29 07:08:01 script called - pid:3038 idle:178
2013-10-29 07:09:01 script called - pid:3038 idle:238
2013-10-29 07:10:01 script called - pid:3038 idle:298
2013-10-29 07:11:01 script called - pid:3038 idle:358
2013-10-29 07:11:01 script was idle for more than 300 seconds - killing and starting
2013-10-29 07:12:01 script called - pid:3119 idle:58
2013-10-29 07:13:01 script called - pid:3119 idle:118
2013-10-29 07:14:01 script called - pid:3119 idle:178
2013-10-29 07:15:01 script called - pid:3119 idle:238
2013-10-29 07:16:01 script called - pid:3119 idle:298
2013-10-29 07:17:01 script called - pid:3119 idle:358
2013-10-29 07:17:01 script was idle for more than 300 seconds - killing and starting

The Apache error log does not contain any relevant information either.

I tested my Twitter tokens and they work well with some Python scripts that retrieve user information.
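
To see how often the controller restarts the capture script, the restart messages in controller.log can be counted. This is a minimal sketch that assumes the log format shown in the excerpt above; count_restarts is a hypothetical helper, and the sample file is built from a few lines of that excerpt.

```shell
#!/bin/sh
# Count how often the controller restarted the capture script.
# The message text is taken from the controller.log excerpt in this issue.
count_restarts() {
    grep -c 'killing and starting' "$1"
}

# Build a small sample log from lines quoted in the issue.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
2013-10-29 07:05:01 script was idle for more than 300 seconds - killing and starting
2013-10-29 07:06:01 script called - pid:3038 idle:58
2013-10-29 07:11:01 script called - pid:3038 idle:358
2013-10-29 07:11:01 script was idle for more than 300 seconds - killing and starting
EOF

count_restarts "$sample"   # prints 2
rm -f "$sample"
```

On a real installation the argument would be the path to controller.log, for example dmi-tcat/capture/stream/logs/controller.log as named in this issue.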

Errors: Test the capturing scripts

I encounter the following errors:

[ec2-user@ip-10-253-71-114 stream]$ php dmitcat_onepercent.php

PHP Notice: Use of undefined constant CAPTUREROLES - assumed 'CAPTUREROLES' in /var/www/html/dmi-tcat/capture/common/functions.php on line 983

PHP Notice: unserialize(): Error at offset 0 of 12 bytes in /var/www/html/dmi-tcat/capture/common/functions.php on line 983

PHP Warning: in_array() expects parameter 2 to be array, boolean given in /var/www/html/dmi-tcat/capture/common/functions.php on line 984

tracker_run() role onepercent is not configured to run

Warning notice from Twitter re. dmi-tcat

Hi there,

I haven't used dmi-tcat for a few months, but received the warning notice below from Twitter yesterday. I am not sure which processes in the dmi-tcat application are against Twitter's guidelines, but thought I should let you know that this has occurred in case you or someone else receives similar notifications in future. I had followed all your setup instructions to the letter, and had not used dmi-tcat since April, so I am not sure what has happened. Here is the email message I received from Twitter:

"This is a notice that your application, dmi-tcat onepercent, is no longer allowed to perform write operations.

Please make sure that your application follows Twitter's API Terms of Service
To request that your application be re-enabled for write operations, please visit our support form.

While this restriction is still in place, please do not attempt to register a new API key for your application without authorization from Twitter. Such an action is a violation of our API Terms of Service and may result in the permanent suspension of your application (as well as any associated developer accounts). "

Do let me know if you want me to ask why they restricted the application, and I will.

Best,
Sam

capture/index.php in 404 & mysql insert errors after a bin query was created with a dash

Hello,

When Oscar named a new query bin with a dash "-" to separate words, both the capture and analysis UI went down. They now return a depressing 404 Not Found :'(

In Mysql, as I wanted to check if the server was not overloaded by too many tweets coming in, the following query returned an error:

select count(id) from disseny-barcelona_urls;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-barcelona_urls' at line 1

But the following worked:

select * from `disseny-barcelona_urls`;

Source: http://stackoverflow.com/questions/18670394/using-dash-in-mysql-table-name

In track.error.log there are also insert errors every time the software tries to insert data into disseny-barcelona_hashtags, disseny-barcelona_mentions, disseny-barcelona_tweets, or disseny-barcelona_urls. For example:

2014-05-23 20:18:18 insert error: INSERT IGNORE INTO disseny-barcelona_urls (tweet_id,created_at,from_user_name,from_user_id,url,url_expanded) VALUES ('469920255322914816','2014-05-23 20:18:14','hachenar','435069351','http://t.co/SHo6zKxsX8','http://instagram.com/p/oWOxyXCKdG/')

Would the following resolve the problem?
1- Delete the tables disseny-barcelona_hashtags, disseny-barcelona_mentions, disseny-barcelona_tweets, disseny-barcelona_urls
2- Delete the disseny-barcelona row from tcat_query_bins (id 5)
3- Delete all rows from tcat_query_bins_periods and tcat_query_bins_phrases where querybin_id=5
4- Recreate the query with an underscore, then hit Oscar :-)

Thank you very much!
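
The cleanup steps proposed above can be written out as SQL. The following is a hedged sketch that only prints the statements so they can be reviewed (and the database backed up) before anything is run. print_bin_cleanup_sql is a hypothetical helper; the table and column names are the ones mentioned in this issue, and whether these steps are sufficient is not confirmed here. Table names that contain a dash are wrapped in backticks, which MySQL requires for such identifiers.

```shell
#!/bin/sh
# Hypothetical helper (not TCAT code): print the cleanup SQL for a broken bin.
# Backticks quote the table names so that MySQL accepts the dash.
print_bin_cleanup_sql() {
    bin_name="$1"
    bin_id="$2"
    # Step 1: drop the per-bin tables.
    for suffix in hashtags mentions tweets urls; do
        printf 'DROP TABLE IF EXISTS `%s_%s`;\n' "$bin_name" "$suffix"
    done
    # Step 3: remove the rows that reference the bin.
    printf 'DELETE FROM tcat_query_bins_periods WHERE querybin_id = %s;\n' "$bin_id"
    printf 'DELETE FROM tcat_query_bins_phrases WHERE querybin_id = %s;\n' "$bin_id"
    # Step 2: remove the bin itself.
    printf 'DELETE FROM tcat_query_bins WHERE id = %s;\n' "$bin_id"
}

print_bin_cleanup_sql disseny-barcelona 5
```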

Add more in-depth explanations and examples of TCAT queries

And give some examples.

For instance:

"When you want to retrieve unique text strings within TCAT, you need to slightly change your query. For instance, if you are looking for the terms 'global' and 'trade', and you don't want TCAT to also return hits for the word 'globally' and 'traded', you need to enter your query as follows:

[ global ] AND [ trade ]

Please note that using the spaces between the brackets and your text string is essential. Without them, TCAT will return hits for 'globally' as well."
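
The difference between substring hits and whole-word hits described above can be illustrated outside TCAT. In this sketch grep stands in for TCAT's matching purely as an analogy; it is not how TCAT implements the bracket syntax, and the sample lines are made up.

```shell
#!/bin/sh
# Illustration only: grep stands in for TCAT's matching to show the difference
# between substring and whole-word hits. This is not TCAT's implementation.
sample_tweets() {
    printf '%s\n' \
        'global trade is slowing' \
        'shares traded globally today' \
        'a global view'
}

# Substring match: also counts the line that only contains "globally".
sample_tweets | grep -c 'global'    # prints 3
# Whole-word match (-w): counts only lines containing the word "global".
sample_tweets | grep -cw 'global'   # prints 2
```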

Utilize different mysql port number in tcat

Hi,

We need to use a port number other than the MySQL default for our TCAT database. It would be great if this were configurable in the config.php script, rather than having to hand-edit the various PDO connect commands in the distribution.

A couple of notes from a recent install

A couple of things I noticed on a recent install:

For follow & onepercent, the config file is missing slashes, which means it doesn't work (tries to connect to api.twitter.com1.1/track/.. etc):

'track' => "https://stream.twitter.com/",
'follow' => "https://stream.twitter.com",
'onepercent' => "https://stream.twitter.com",
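
The missing-slash problem is a general URL-joining issue: concatenating a base without a trailing slash and a path without a leading slash fuses the host and the path. A minimal sketch of a join that tolerates both forms follows; join_url is a hypothetical helper, not TCAT code, and the path used is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: join a base URL and a path with exactly one slash,
# whether or not the base ends with "/" or the path starts with "/".
join_url() {
    base="${1%/}"   # strip one trailing slash from the base, if present
    path="${2#/}"   # strip one leading slash from the path, if present
    printf '%s/%s\n' "$base" "$path"
}

# The path below is a placeholder, not taken from TCAT's config.
join_url "https://stream.twitter.com" "1.1/example"
join_url "https://stream.twitter.com/" "/1.1/example"
```

Both calls print the same URL, so it no longer matters which form the config entries use.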

Also, the analysis module doesn't seem to pick up 'bins' from the followbins file, so to analyse currently you have to download the tweets directly, but I guess that's in the pipeline?

Cheers,

Darryl

Not tracking

Erik, thanks for your support. I was able to install it now, but my system is not tracking anything and the MySQL tables are empty. Can you help me?

The PHP cron job returns this:

X-Powered-By: PHP/5.3.26
Content-type: text/html

dmitcat_track.php works but not controller.php

I've got a few query bins with track queries. They used to work fine, but they stopped collecting tweets. When I used dmitcat_track.php, it works fine; it also works when I use the search script with the query bins. But when I use controller.php, nothing seems to happen.

Here is the result of tail *log:

tail *log
==> controller.log <==
2014-09-06 17:59:01 controller.php already running, skipping this check
2014-09-06 18:00:01 controller.php already running, skipping this check
2014-09-06 18:01:01 controller.php already running, skipping this check
2014-09-06 18:02:01 controller.php already running, skipping this check
2014-09-06 18:02:07 check_running_role: no running track script (pid 2128 seems dead)
2014-09-06 18:02:07 check_running_role: no running track script found

==> track.error.log <==
2014-09-05 17:45:32 connecting - query array (
  'track' => '[tracked keywords]',
)
2014-09-06 16:13:42 running php version 5.3.10-1ubuntu3.13 in mode cli with extensions Core,date,ereg,libxml,openssl,pcre,zlib,bcmath,bz2,calendar,ctype,dba,dom,hash,fileinfo,filter,ftp,gettext,SPL,iconv,json,mbstring,pcntl,session,posix,readline,Reflection,standard,shmop,SimpleXML,soap,sockets,Phar,exif,sysvmsg,sysvsem,sysvshm,tokenizer,wddx,xml,xmlreader,xmlwriter,zip,curl,gd,imagick,imap,intl,mcrypt,memcache,ming,mysql,mysqli,PDO,pdo_mysql,pdo_pgsql,pdo_sqlite,pgsql,ps,pspell,recode,snmp,sqlite3,tidy,xmlrpc,xsl,mhash (ini file: /etc/php5/cli/php.ini)
2014-09-06 16:13:42 installing term signal handler for this script
2014-09-06 16:13:42 started script track with pid 2128
2014-09-06 16:13:42 connecting to API socket
2014-09-06 16:13:42 connecting - query array (
  'track' => '[tracked keywords]',
)

Notice and Warning

Notice: Use of undefined constant CURLOPT_EXPECT_100_TIMEOUT_MS - assumed 'CURLOPT_EXPECT_100_TIMEOUT_MS' in /home/bd/public_html/socialtrack/capture/common/tmhOAuth/tmhOAuth.php on line 799

Warning: curl_setopt_array(): Array keys must be CURLOPT constants or equivalent integer values in /home/bd/public_html/socialtrack/capture/common/tmhOAuth/tmhOAuth.php on line 804

Notice: Undefined index: headers in /home/bd/public_html/socialtrack/capture/search/adn.php on line 62
remaining

With latest version.

Any idea?

URL expander exits with error

The urlexpand.py script exits with an error

On line 112:
AttributeError: 'module' object has no attribute 'wait'

This seems to happen after the script has done its job already. The cronjob will restart the script, but the script is not running in the background continuously.

GNIP script

Hello,

I'd very much like to try your GNIP import script.
Thanks!

SSL issue

Hi,

We had your great tool already running, but for about a month now it won't start anymore.
In the track.error.log I found an error message regarding SSL:
'error' => 'error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure',
'errno' => 35,

Does anybody have an idea why this is occurring and how we can solve the problem?

Thanks very much!

Search API Functionality

It would be useful if DMI-TCAT could 'stream' (i.e. repeatedly search) via the Search API. In particular, this would be useful for use cases not supported under the streaming API, such as the tracking of specific URLs or part of URLs.

There seems to be no mysql table for the query manager

There seems to be no mysql table for the query manager. The error message is

PHP Fatal error:  Uncaught exception 'PDOException' with message 'SQLSTATE[42S02]: Base table or view not found: 1146 Table 'twittercapture.tcat_query_phrases' doesn't exist' in /home/supersambo/www/dmi-tcat/capture/common/functions.php:380
Stack trace:
#0 /home/supersambo/www/dmi-tcat/capture/common/functions.php(380): PDOStatement->execute()
#1 /home/supersambo/www/dmi-tcat/capture/stream/track.php(46): getActivePhrases()
#2 /home/supersambo/www/dmi-tcat/capture/stream/track.php(30): stream()
#3 {main}
  thrown in /home/supersambo/www/dmi-tcat/capture/common/functions.php on line 380

This will happen for users who pulled the latest source and were using DMI-TCAT before the query manager was implemented.

Removing bins

Thanks for providing us with this great tool!
Is it correct that there is currently no way to delete a bin (except deleting it from the SQL databases manually?)
Thanks,
Damian

add or modify query-bin via cli

Hi,

is there any way to create a query-bin, or modify an existing one, via CLI (and thus articulate it with other tools)?

Modifying a query

I have a (too) large query bin, with 157 phrases which hit the rate limit quite often. So I want to modify it. But when I click on "modify phrases", the /capture/ page simply reloads without showing the popup for modifying the keywords. When I try to do that for a smaller query, it works fine, though. Any idea what may be going on?

Analysis: exclude [RT] - File with \[RT_\] cannot be downloaded

Hello Erik!

We're having a lot of fun with TCat in Barcelona :-)

For big datasets, we wanted to exclude the retweets, so we introduced [RT ] in exclude.
The file is generated correctly on the analysis/cache folder on the server.
However, we cannot download it from a browser since it contains 2 slashes before the opening and closing brackets "[RT_]".

The generated url is /analysis/cache/Barcelona-20140602-20140603--/[RT_/]----tweetStats--26060f3a66.csv
When we click on the download link, a 404 error occurs.

I attached a picture of what was introduced in the analysis UI for a Tweet stats analysis.
[Screenshot: 2014-06-04 at 10:53:51]

Best!

Not Tracking all tweets

Hello,

Dmi-tcat is installed and running on a Debian 7 server.
The tool is capturing some tweets but apparently very few of them:
[Screenshot: 2014-05-20 at 14:08:05]

I read and followed the questions in the following thread #21

dmitcat_track.php is running:

foxsi@oxsi:/var/www/dmi-tcat/logs$ ps aux | grep php
foxsi 5921 0.0 0.3 169600 14020 ? S 13:43 0:00 /usr/bin/php /var/www/dmi-tcat/capture/stream/dmitcat_track.php
foxsi 6110 0.0 0.0 11108 892 pts/1 S+ 13:56 0:00 grep php
root 17811 0.0 0.1 169820 4784 ? Ss May15 0:12 php-fpm: master process (/etc/php5/fpm/php-fpm.conf)
www-data 17813 0.0 0.1 169820 4236 ? S May15 0:00 php-fpm: pool www
www-data 17814 0.0 0.1 169820 4236 ? S May15 0:00 php-fpm: pool www

Log files are writeable:
foxsi@oxsi:/var/www/dmi-tcat/logs$ tail -f *log
==> controller.log <==
2014-05-20 12:51:01 script track is running with pid [5921] and has been idle for 75 seconds
2014-05-20 12:52:01 script track is running with pid [5921] and has been idle for 135 seconds
2014-05-20 12:53:01 script track is running with pid [5921] and has been idle for 195 seconds
2014-05-20 12:54:01 script track is running with pid [5921] and has been idle for 255 seconds
2014-05-20 12:55:01 script track is running with pid [5921] and has been idle for 315 seconds
2014-05-20 12:56:01 script track is running with pid [5921] and has been idle for 375 seconds
2014-05-20 12:57:01 script track is running with pid [5921] and has been idle for 435 seconds
2014-05-20 12:58:01 script track is running with pid [5921] and has been idle for 495 seconds
2014-05-20 12:59:01 script track is running with pid [5921] and has been idle for 21 seconds
2014-05-20 13:00:02 script track is running with pid [5921] and has been idle for 82 seconds

==> track.error.log <==
2014-05-20 12:43:05 failed to flush capture buffer
2014-05-20 12:43:05 writing rate limit information to database
2014-05-20 12:43:05 exiting now on TERM signal
2014-05-20 12:43:06 running php version 5.4.4-14+deb7u9 in mode cli with extensions Core,date,ereg,libxml,openssl,pcre,zlib,bcmath,bz2,calendar,ctype,dba,dom,hash,fileinfo,filter,ftp,gettext,SPL,iconv,json,mbstring,pcntl,session,posix,Reflection,standard,shmop,SimpleXML,soap,sockets,Phar,exif,sysvmsg,sysvsem,sysvshm,tokenizer,wddx,xml,xmlreader,xmlwriter,zip,PDO,curl,gd,mysql,mysqli,pdo_mysql,mhash (ini file: /etc/php5/cli/php.ini)
2014-05-20 12:43:06 installing term signal handler for this script
2014-05-20 12:43:06 started script track with pid 5921
2014-05-20 12:43:06 connecting to API socket
2014-05-20 12:43:06 connecting - query array (
'track' => '[barcelona]',
)

The content of crontab -e

* * * * * (cd /var/www/dmi-tcat/capture/stream/; php controller.php)
0 * * * * (. /home/foxsi/politweaks/bin/activate; cd /var/www/dmi-tcat/helpers; sh urlexpand.sh)

Thank you for your help!!!

problems with accents

Tweets are getting into database like this:

Alexis Sanchez would ‘prefer a move to Arsenal over Liverpool’ topic http://t.co/CYnEHER9rY

Characters such as áéíóú or ñ are not stored correctly. Any idea?
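
The pattern in the sample tweet looks like what typically happens when UTF-8 bytes are read as Windows-1252 somewhere between capture, database connection and display. Whether that is the cause here is an assumption. A minimal Python sketch (not part of TCAT; the function name is hypothetical) that reproduces and reverses one round of this damage:

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes decoded as Windows-1252'."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (or already correct): leave it untouched.
        return text


# How the damage arises: the two UTF-8 bytes of "ñ" are shown as two characters.
damaged = "ñ".encode("utf-8").decode("cp1252")

# The quoted phrase from the sample tweet, written with escapes to stay unambiguous.
broken = "\u00e2\u20ac\u02dcprefer a move to Arsenal over Liverpool\u00e2\u20ac\u2122"
repaired = repair_mojibake(broken)
print(damaged)
print(repaired)
```

Repairing stored text this way only treats the symptom. If the assumption holds, the lasting fix is to make every step (table, connection, page) agree on UTF-8.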

Sentiment Analysis?

How can I see this? I see many files in the analysis folder, but they aren't in the export section. Any idea?

Regards.

Problem with search

Hi guys, I appreciate your work and the time you take to answer my problems. I have another problem, with this query in search.php:

$bin_name = 'CharlesAranguiz';
$keywords = 'charanguiz20' OR 'charles aranguiz' OR 'aranguiz' OR 'el principe' OR 'sir charles' OR 'Cha Cha Cha' OR 'principe azul';

I get only 100 results, but searching manually on Twitter for these queries returns a lot more. Any idea? The full search.php is here:

http://pastebin.com/sXWHS8xY
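
One thing worth checking: if the $keywords line is used exactly as quoted, PHP applies OR as a logical operator between separate string literals instead of building one query string. Twitter's search syntax expects the alternatives inside a single string, with multi-word phrases in double quotes. A minimal Python sketch of assembling such a string (the helper name is hypothetical and this is not TCAT code):

```python
def build_or_query(terms):
    """Join search terms with OR, quoting multi-word phrases."""
    parts = []
    for term in terms:
        term = term.strip()
        if not term:
            continue  # skip empty entries
        parts.append(f'"{term}"' if " " in term else term)
    return " OR ".join(parts)


query = build_or_query(["charanguiz20", "charles aranguiz", "aranguiz", "el principe"])
print(query)
```

Whether this explains the limit of 100 results is not established by the post.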

Regards!

Adding new Twitter fields

Twitter sometimes adds fields to its JSON objects.

Fields currently not incorporated into the DMI-TCAT tables for tweets are place, possibly_sensitive, scopes, truncated, favorited, withheld_copyright, withheld_in_countries, and withheld_scope. For users it concerns withheld_in_countries, withheld_scope, favorites_count, and created_at. For entities it concerns media, and in particular the extended entities.

The fields place, extended_entities, and withheld_in_countries should each get a new table in DMI-TCAT, e.g. _places, _media, and _withheld_in_countries. The other new fields can be incorporated into the existing _tweets tables, i.e. possibly_sensitive boolean, scopes varchar(255), truncated boolean, favorited boolean, withheld_copyright boolean, withheld_scope varchar(255), from_user_withheld_scope varchar(255), from_user_favorites_count int, from_user_created_at datetime, and two foreign keys, withheld_in_countries bigint and from_user_withheld_in_countries bigint, which link to the _withheld_in_countries tables.

Also, we need to store media links in the url table.

In order to incorporate those new fields into the database definitions of DMI-TCAT, we need a script which checks the table encoding, new fields, and indices against the current database definitions. The script should be called by the controller just before starting one of the tracking scripts.
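
The field check described above could look roughly like the following Python sketch. It is not the TCAT implementation: the function name, the table name in the usage line, and the exact SQL types are assumptions for illustration, and the two foreign keys, the encoding check, and the index check are left out.

```python
# Columns proposed above for the _tweets tables (SQL types assumed from the list).
EXPECTED_TWEET_COLUMNS = {
    "possibly_sensitive": "BOOLEAN",
    "scopes": "VARCHAR(255)",
    "truncated": "BOOLEAN",
    "favorited": "BOOLEAN",
    "withheld_copyright": "BOOLEAN",
    "withheld_scope": "VARCHAR(255)",
    "from_user_withheld_scope": "VARCHAR(255)",
    "from_user_favorites_count": "INT",
    "from_user_created_at": "DATETIME",
}


def missing_column_statements(table, existing_columns, expected=EXPECTED_TWEET_COLUMNS):
    """Return ALTER TABLE statements for expected columns the table lacks."""
    existing = {name.lower() for name in existing_columns}
    return [
        f"ALTER TABLE `{table}` ADD COLUMN `{name}` {sql_type}"
        for name, sql_type in expected.items()
        if name.lower() not in existing
    ]


# Hypothetical bin table that already has every proposed column except "scopes".
have = [c for c in EXPECTED_TWEET_COLUMNS if c != "scopes"] + ["id", "text"]
statements = missing_column_statements("barcelona_tweets", have)
print(statements)
```

In practice the list of existing columns would come from the database itself, and the statements would be run once per bin before the tracking script starts.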

Error with tracking

Controller log:

2014-07-10 14:47:18 received instruction to execute 'reload' for script track
2014-07-10 14:47:18 check_running_role: no running track script (pid 30898 seems dead)
2014-07-10 14:47:18 check_running_role: no running track script found
2014-07-10 14:47:18 script track was not running - starting

But tracking does not start.

Any idea?

Errors: Test the capturing scripts

Hi,
I am trying to install dmi-tcat, but when I get to the section on testing the capturing scripts, I get the following errors. Can you tell me where I am going wrong? Apologies for such a noob query.

Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42S02]: Base table or view not found: 1146 Table 'twittercapture.tcat_query_phrases' doesn't exist' in /Users/BASE_URL/Sites/dmi-tcat/capture/common/functions.php:440
Stack trace:
#0 /Users/BASE_URL/Sites/dmi-tcat/capture/common/functions.php(440): PDOStatement->execute()
#1 /Users/BASE_URL/Sites/dmi-tcat/capture/stream/track.php(49): getActivePhrases()
#2 /Users/BASE_URL/Sites/dmi-tcat/capture/stream/track.php(32): stream()
#3 {main}
  thrown in /Users/BASE_URL/Sites/dmi-tcat/capture/common/functions.php on line 440

no track script found

Hi,

  1. Some words in Portuguese return the error "no track script found", even though there is a lot of matching text in the Twitter stream.

  2. I'm missing something when I try the command sudo chown to assign the user that should run the app.

  3. Some words return data, others don't.

What am I doing wrong?

Thank you,
Gabriela

Encoding, accents & special characters

Hello,

When connecting to index.php in capture or analysis, Firefox and Chrome auto-detect the encoding as Western ISO-8859-1, even though the internal encoding is set to UTF-8 in config.php.
I think adding the following line, based on the setting in config.php, would resolve the problem:
<meta charset="UTF-8">

[Screenshot: 2014-05-21 at 12:33:44]

(Otherwise, we can still manually set the encoding in View > Character Encoding > UTF-8)
[Screenshot: 2014-05-21 at 12:33:57]

Best!

lat & lon

Hi Eric,

Thank you for providing the community with an excellent tool for working with twitter. It has already gained quite a lot of interest here in Copenhagen.

We were wondering if there is a way to capture tweets for a specific location (lat/long). We are able to do this using the Twitter API, but are unsure if there is a way to track such a call in TCAT.

With the Twitter API, this could look like this:
GET https://search.twitter.com/search.json?q=to:twitter%20geocode:37.781157,-122.398720,25mi&callback=yourCallback

Maybe there is a PHP file in which one could hardcode the geocode?
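
For illustration, the q parameter of the request above can be assembled as in the following Python sketch. The function names are hypothetical, and this only builds the string. It does not show how or whether TCAT accepts a geocode.

```python
from urllib.parse import quote, urlencode


def geocode_query(terms, lat, lon, radius_miles):
    """Build a search string in the 'geocode:lat,lon,radius' form used above."""
    return f"{terms} geocode:{lat:.6f},{lon:.6f},{radius_miles}mi"


def encode_query(q):
    """URL-encode the q parameter, keeping ':' and ',' readable as in the example."""
    return urlencode({"q": q}, safe=":,", quote_via=quote)


q = geocode_query("to:twitter", 37.781157, -122.398720, 25)
encoded = encode_query(q)
print(encoded)
```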

Best regards
Tobias

Script track idle for more than 600 seconds after Migration

After migrating to the new version, we get an email with the following content every 10 minutes:

script track was idle for more than 600 seconds - killing and starting

The log contains no useful information about what causes the error, and it appears that at least some tweets are making it into the database. At first I thought it was a database error caused by the heavy migration, but the entire database has been repaired and returns no errors.
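
To see how the idle time develops between restarts, controller.log lines of the form quoted in the "Not Tracking all tweets" report can be parsed. The following Python sketch assumes that line format. The function names are hypothetical and it is not part of TCAT.

```python
import re
from datetime import datetime

IDLE_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) script (\w+) is running "
    r"with pid \[(\d+)\] and has been idle for (\d+) seconds$"
)


def parse_idle_lines(lines):
    """Return (timestamp, script, pid, idle_seconds) for each matching line."""
    records = []
    for line in lines:
        m = IDLE_LINE.match(line.strip())
        if m:
            stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            records.append((stamp, m.group(2), int(m.group(3)), int(m.group(4))))
    return records


def idle_resets(records):
    """Return records whose idle time is lower than in the preceding record."""
    return [cur for prev, cur in zip(records, records[1:]) if cur[3] < prev[3]]


sample = [
    "2014-05-20 12:58:01 script track is running with pid [5921] and has been idle for 495 seconds",
    "2014-05-20 12:59:01 script track is running with pid [5921] and has been idle for 21 seconds",
    "2014-05-20 13:00:02 script track is running with pid [5921] and has been idle for 82 seconds",
]
records = parse_idle_lines(sample)
resets = idle_resets(records)
print(resets)
```

A drop in idle time suggests that data arrived in between. Long stretches without such drops would match the 600-second message.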

Any suggestions anyone?

Best regards
Tobias

New features available in DMI-TCAT

The DMI summer school proved to be very productive for the development of DMI-TCAT!

The various updates include:

  • the possibility to display the line graph by day, hour or minute. You can also export the line graph as SVG
  • new filters: twitter client, geo coordinates (if you have mysql >= 5.6.1)
  • new exports: from_user_lang-hashtag, twitter client-hashtag
  • a new experimental module: the Sankey Maker can be used to plot the relation between from_user_lang, hashtags, timezone, and/or twitter client
  • streamlining filenames
  • streamlining key looping in ids/lookup.php, search/search.php, user/timeline.php and user/user-friends.php
  • resolving various small issues

Thanks to the summer school participants for pushing the limits of DMI-TCAT and to Emile den Tex and Bernhard Rieder for their programming efforts!

More updates next week!

controller.php already running, skipping this check

Hi,
my DMI-TCAT did not capture any tweets during the last week. The controller.log has an entry saying
"controller.php already running, skipping this check" for each minute during the last seven days, but this was not the case: controller and track were not running. I started the controller script again manually, and TCAT is capturing tweets again.
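
How controller.php decides that it is "already running" is not shown here. If the decision relies on a recorded PID, a stale record can be told apart from a live process with a signal-0 probe. A minimal Python sketch for POSIX systems (the function name is hypothetical):

```python
import os


def pid_is_running(pid: int) -> bool:
    """Report whether a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without sending anything
    except ProcessLookupError:
        return False  # no such process: the recorded PID is stale
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


print(pid_is_running(os.getpid()))
```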
