pbinkley / twarc-report
Data conversions and examples for generating reports from twarc collections using tools such as D3.js
License: Creative Commons Zero v1.0 Universal
line 41: module archive has no attribute main.
import archive
archive.main() <--- causing problems, please advise.
Add a script that can be run from a single cron job, and that will harvest all the active projects (based on start/end dates in metadata.json), and generate outputs.
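A minimal sketch of how such a script might pick out the active projects. The "start" and "end" keys, their ISO date format, and the helper names are assumptions made for illustration; the issue says only that start/end dates live in metadata.json.

```python
import json
from datetime import date
from pathlib import Path


def is_active(metadata, today):
    """Return True if today falls within the project's start/end dates.

    Assumes hypothetical "start" and "end" keys holding ISO dates
    (YYYY-MM-DD); a missing key leaves the range open on that side.
    """
    start = metadata.get("start")
    end = metadata.get("end")
    if start and today < date.fromisoformat(start):
        return False
    if end and today > date.fromisoformat(end):
        return False
    return True


def active_projects(projects_dir, today=None):
    """Yield project directories whose metadata.json marks them as active."""
    today = today or date.today()
    for metadata_path in sorted(Path(projects_dir).glob("*/metadata.json")):
        with open(metadata_path) as fh:
            metadata = json.load(fh)
        if is_active(metadata, today):
            yield metadata_path.parent
```

A cron-driven wrapper could then loop over active_projects("projects") and run the harvest and the report scripts for each directory it yields.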
Generate outputs into a Jekyll site. Develop Jekyll plugins:
Hello,
Thank you so much for extending the twarc library!
This isn't an issue in the classic sense so I apologize for using this mechanism.
I was wondering if you could say a bit more about the relationship between harvest.py and twarc (the submodule specified, not the most current version).
More specifically, from looking at the code in harvest.py which eventually calls upon twarc's archive.py, it does not seem that there is a mechanism for including the API keys. The version of twarc that twarc-report uses called upon one to enter them as:
twarc.py --consumer_key foo --consumer_secret bar --access_token baz --access_token_secret bez --search ferguson
How is this handled when using harvest.py?
Thanks for your help!
Benjamin
I see your requirements specifies pysparklines but no version and I just wanted to let you know that the recently released 1.0 does not support Python versions before 3, so could cause your project issues. I would change your requirements.txt to use pysparklines==0.9 to resolve this problem until your project is fully Python 3 compatible.
Any -t option greater than 1 causes an error, I think due to differences between Python 2 and 3 in iterating through the keys of a dict. It seems to work fine in Python 2.7.
Just FYI, and thanks for the tools!
When I try the second command:
git submodule update
I get:
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Clone of 'git@github.com:edsu/twarc.git' into submodule path 'twarc' failed
I receive the following error message on my first attempt to run harvest.py. I have tried several different solutions to ensure that twarc-archive.py is in my PATH, but continue to receive this error message.
vagelos-ve536-0866:twarc-report-master Research$ ./harvest.py projects/projectA
/Users/Research/anaconda/bin/twarc-archive.py
/Library/Frameworks/Python.framework/Versions/2.7/bin/twarc-archive.py
/Library/Frameworks/Python.framework/Versions/3.4/bin/twarc-archive.py
/opt/local/bin/twarc-archive.py
/opt/local/sbin/twarc-archive.py
/usr/local/bin/twarc-archive.py
Cannot run twarc-archive.py
As a follower of an event that is being live-tweeted, I want to have a project directory where I will update a harvest periodically with a cronjob using twarc/utils/archive.py, with project metadata such as the twarc query, project title and creator, etc., all stored in a json file, so that the same cron job can generate twarc-report outputs that include the project metadata for clarity.
I'm thinking of json like this:
{"twarcquery": "#code4lib OR #c4l15 OR #code4arc",
"title": "Code4lib Conference, Portland OR, 10-12 Feb. 2015",
"creator": "Peter Binkley"}
And have a module that loads it with:
import json

with open("metadata.json") as json_data:
    project_metadata = json.load(json_data)  # the with block closes the file
title = project_metadata["title"]
And finally, use this in a script that embeds archive.py and runs the updates and the twarc-report outputs.
Hello,
When I try to execute harvest.py, I receive the following error:
Traceback (most recent call last):
File "./harvest.py", line 41, in <module>
archive.main()
File "twarc/utils/archive.py", line 76, in main
sys.exit(1)
NameError: global name 'sys' is not defined
Am I making a mistake?
Hi everyone,
After struggling with twarc for a long time, I finally extracted the tweets for some Twitter hashtags. My question now is: how can I convert the data I got to full text? All I can see is numbers. (The image is attached.)
PS: I followed the steps at https://github.com/DocNow/twarc, saved my file as JSON, and opened it in Excel.
Another PS: I am neither a programmer nor a developer :)
Refactor to imitate the structure of twarc, with a single executable twarc-report that takes subcommands to specify the desired script. Enable installation by pip install.
Timebar example works perfectly, and makes me really excited to explore this more.
But in https://github.com/pbinkley/twarc-report#d3wordcloudpy
clicking the example --> https://www.wallandbinkley.com/twarc/c4l15/animatedwordcloud.html
results in a
failed to load resource: the server responded with a status of 404 (Not Found)
for the
Hi, I frequently redirect reports to text files on my Debian and OS X machines to keep track of ongoing twarc harvests. But in the first IF...THEN...ELSE, four lines lack .encode("utf-8") when calling sparkline.sparkify() to print percentiles.
Adding .encode("utf-8"), as you do in all the other calls, at lines 25, 29, 33 and 37 resolves the following error:
reportprofile.py tweets.json > reporttweets.txt
Traceback (most recent call last):
File "../twarc-report/reportprofile.py", line 25, in <module>
print "User percentiles: " + sparkline.sparkify(data["userspercentiles"])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-27: ordinal not in range(128)
Sorry for not submitting a pull request, but I'm not sure this happens to all users.
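The failure can be reproduced in isolation. The sketch below is written for Python 3, whereas the traceback comes from Python 2's print statement. It does not call sparkline.sparkify(); the sample string of block characters and the write_report helper are stand-ins chosen for illustration.

```python
import io

# Block characters of the kind a sparkline renderer emits (illustrative sample).
line = "User percentiles: " + "▁▂▃▅▇"


def write_report(text, stream):
    """Encode explicitly as UTF-8 before writing to a binary stream."""
    stream.write(text.encode("utf-8") + b"\n")


# ASCII cannot represent the block characters, which matches the reported error.
try:
    line.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

buffer = io.BytesIO()  # stands in for stdout redirected to a file
write_report(line, buffer)
```

Encoding explicitly makes the output independent of whatever codec the interpreter picks when stdout is redirected, which is why the report is said to fail only when writing to a file.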
Update the way native twarc functions are called, to use the new twarc structure.
Version the metadata.json in git to track changes in the query etc. (e.g. when you add an extra hashtag after you've been harvesting a project for a while). Associate the commit id with each harvest.
output directory, with an index.html
Locations are now contained in the place element:
"place": {
    "full_name": "Toronto, Ontario",
    "url": "https://api.twitter.com/1.1/geo/id/3797791ff9c0e4c6.json",
    "country": "Canada",
    "place_type": "city",
    "bounding_box": {
        "type": "Polygon",
        "coordinates": [
            [
                [-79.639319, 43.403221],
                [-78.90582, 43.403221],
                [-78.90582, 43.855401],
                [-79.639319, 43.855401]
            ]
        ]
    },
    "contained_within": [],
    "country_code": "CA",
    "attributes": {},
    "id": "3797791ff9c0e4c6",
    "name": "Toronto"
},
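As a rough sketch of how a report script might read a location from this element, the function below pulls out the name and country code and averages the bounding-box corners. The summarize_place helper is hypothetical, and the averaged point is only an approximate centre, not a value Twitter supplies.

```python
def summarize_place(place):
    """Return (full_name, country_code, (lon, lat)) for a tweet's place element.

    The centre is the plain average of the bounding-box corners, which is
    only a rough approximation of the location.
    """
    ring = place["bounding_box"]["coordinates"][0]
    lon = sum(point[0] for point in ring) / len(ring)
    lat = sum(point[1] for point in ring) / len(ring)
    return place["full_name"], place["country_code"], (lon, lat)


# The subset of the place element shown above that the function needs.
toronto = {
    "full_name": "Toronto, Ontario",
    "country_code": "CA",
    "bounding_box": {
        "type": "Polygon",
        "coordinates": [
            [
                [-79.639319, 43.403221],
                [-78.90582, 43.403221],
                [-78.90582, 43.855401],
                [-79.639319, 43.855401],
            ]
        ],
    },
}

name, country, centre = summarize_place(toronto)
```

Coordinates in the bounding box are ordered longitude first, then latitude, so the returned pair follows the same order.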