pbinkley / twarc-report

Data conversions and examples for generating reports from twarc collections using tools such as D3.js

License: Creative Commons Zero v1.0 Universal

JavaScript 15.62% Python 44.84% HTML 36.41% Shell 3.14%
python social

Contributors: pbinkley, ruebot

twarc-report's Issues

harvest.py

line 41: module archive has no attribute main.

import archive
archive.main() <--- causing problems, please advise.

Add universal cron function

Add a script that can be run from a single cron job, and that will harvest all the active projects (based on start/end dates in metadata.json), and generate outputs.
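One way the "active project" check described above could look is sketched below. The issue does not specify how the dates are stored, so the key names `start_date` and `end_date`, the ISO date format, and the `projects/<name>/metadata.json` layout are all assumptions made for illustration.

```python
import json
from datetime import date
from pathlib import Path


def is_active(metadata, today):
    """Return True if today falls within the project's start/end dates."""
    # "start_date" and "end_date" are assumed key names (ISO format, YYYY-MM-DD).
    start = date.fromisoformat(metadata["start_date"])
    end = date.fromisoformat(metadata["end_date"])
    return start <= today <= end


def active_projects(projects_dir, today=None):
    """Yield each project directory whose metadata.json marks it as active."""
    today = today or date.today()
    # Assumed layout: one subdirectory per project, each with a metadata.json.
    for metadata_path in sorted(Path(projects_dir).glob("*/metadata.json")):
        with open(metadata_path) as f:
            metadata = json.load(f)
        if is_active(metadata, today):
            yield metadata_path.parent
```

A single cron entry could then loop over `active_projects("projects")` and run the harvest and report steps for each directory it yields.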

Add jekyll integration

Generate outputs into a Jekyll site. Develop Jekyll plugins:

  • versioning of outputs at given intervals (say all the tweets from a calendar day). Provide menus of versions.
  • data-driven pages for images, link, wall, etc. Maintain local cache of thumbnails of images and links.
  • build and deploy jekyll site after each harvest

clarification of harvest.py's relationship to twarc submodule

Hello,

Thank you so much for extending the twarc library!

This isn't an issue in the classic sense so I apologize for using this mechanism.

I was wondering if you could say a bit more about the relationship between harvest.py and twarc (the submodule specified, not the most current version).

More specifically, from looking at the code in harvest.py which eventually calls upon twarc's archive.py, it does not seem that there is a mechanism for including the API keys. The version of twarc that twarc-report uses called upon one to enter them as:

twarc.py --consumer_key foo --consumer_secret bar --access_token baz --access_token_secret bez --search ferguson

How is this handled when using harvest.py?

Thanks for your help!

Benjamin

Populating the twarc subdirectory

When I try the second command:
git submodule update
I get:
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Clone of 'git@github.com:edsu/twarc.git' into submodule path 'twarc' failed

Error when trying to run harvest.py

I receive the following error message on my first attempt to run harvest.py. I have tried several different solutions to ensure that twarc-archive.py is in my PATH, but continue to receive this error message.

vagelos-ve536-0866:twarc-report-master Research$ ./harvest.py projects/projectA
/Users/Research/anaconda/bin/twarc-archive.py
/Library/Frameworks/Python.framework/Versions/2.7/bin/twarc-archive.py
/Library/Frameworks/Python.framework/Versions/3.4/bin/twarc-archive.py
/opt/local/bin/twarc-archive.py
/opt/local/sbin/twarc-archive.py
/usr/local/bin/twarc-archive.py
Cannot run twarc-archive.py

Project metadata

As a follower of an event that is being live-tweeted, I want to have a project directory where I will update a harvest periodically with a cronjob using twarc/utils/archive.py, with project metadata such as the twarc query, project title and creator, etc., all stored in a json file, so that the same cron job can generate twarc-report outputs that include the project metadata for clarity.

I'm thinking of json like this:

{"twarcquery": "#code4lib OR #c4l15 OR #code4arc", 
"title": "Code4lib Conference, Portland OR, 10-12 Feb. 2015",
"creator": "Peter Binkley"}

And have a module that loads it with:

import json

# The with block closes the file automatically, so no explicit close() is needed.
with open("metadata.json") as json_data:
    project_metadata = json.load(json_data)
title = project_metadata["title"]

And finally, use this in a script that embeds archive.py and runs the updates and the twarc-report outputs.

archive.py

Hello,

When I try to execute harvest.py, I receive the following error:

Traceback (most recent call last):
  File "./harvest.py", line 41, in <module>
    archive.main()
  File "twarc/utils/archive.py", line 76, in main
    sys.exit(1)
NameError: global name 'sys' is not defined

Am I making a mistake?

how to convert twarc data to full texts?

Hi everyone,
After a long time struggling with twarc, I finally extracted the tweets for some Twitter hashtags. My question now is: how can I convert the data I got to full text? All I can see now is just numbers (a screenshot is attached).

PS: I followed the steps here: https://github.com/DocNow/twarc. I saved my file as JSON and opened it in Excel.
Another PS: I am not a programmer nor developer :)

[Attached image: Screen Shot 2020-12-13 at 2 26 32 PM]
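For readers with the same question, a minimal sketch of one possible conversion follows. It assumes the file holds one tweet per line as JSON, with the tweet text under `full_text` (or `text` for older tweets) and the author under `user.screen_name`; the function names and the chosen columns are illustrative, not part of twarc or twarc-report.

```python
import csv
import json


def tweet_text(tweet):
    """Return the tweet's text, preferring the untruncated field if present."""
    # Assumption: extended-mode tweets carry "full_text", older ones "text".
    return tweet.get("full_text") or tweet.get("text", "")


def tweets_to_csv(jsonl_path, csv_path):
    """Write one CSV row per tweet and return the number of rows written."""
    count = 0
    with open(jsonl_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["id", "created_at", "user", "text"])
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            tweet = json.loads(line)
            writer.writerow([
                tweet.get("id_str", ""),
                tweet.get("created_at", ""),
                tweet.get("user", {}).get("screen_name", ""),
                tweet_text(tweet),
            ])
            count += 1
    return count
```

Calling `tweets_to_csv("tweets.jsonl", "tweets.csv")` would produce a file that a spreadsheet can open with one readable row per tweet, rather than raw JSON.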

Repackage scripts as subcommands

Refactor to imitate the structure of twarc, with a single executable twarc-report that takes subcommands to specify the desired script. Enable installation by pip install.

some

Hi, I frequently redirect reports to text files on my Debian and OS X machines, to keep track of ongoing twarc collections. But in the first if/else block, four lines lack .encode("utf-8") when calling sparkline.sparkify() to print percentiles.
Adding .encode("utf-8"), as you do in all the other calls, at lines 25, 29, 33 and 37 solves the following error:

reportprofile.py tweets.json > reporttweets.txt
Traceback (most recent call last):
  File "../twarc-report/reportprofile.py", line 25, in <module>
    print "User percentiles: " + sparkline.sparkify(data["userspercentiles"])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-27: ordinal not in range(128)

Sorry for not submitting a pull request, but I'm not sure this happens to all users.
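The failure reported above can be illustrated with a small sketch. Under Python 2, output redirected to a file is encoded as ASCII by default, and sparkline block characters fall outside that range. The snippet is written for Python 3 so that it runs today, and the sample characters are illustrative rather than actual sparkline.sparkify() output.

```python
# Block characters of the kind a sparkline is drawn with (illustrative sample).
bars = "▁▂▃▅▇"
line = "User percentiles: " + bars


def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(line, "ascii"))  # False: the block characters are not ASCII
print(can_encode(line, "utf-8"))  # True: UTF-8 can represent them
```

The label "User percentiles: " is 18 characters long, which matches the traceback's report that the unencodable characters start at position 18.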

Update use of twarc

Update the way native twarc functions are called, to use the new twarc structure.

Manage project config with git

Version the metadata.json in git to track changes in the query etc. (e.g. when you add an extra hashtag after you've been harvesting a project for a while). Associate the commit id with each harvest.
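A rough sketch of how a harvest could record the commit of the project config is shown below. It assumes the project directory is inside a git repository and that git is on the PATH; the function names and the shape of the returned record are hypothetical.

```python
import subprocess


def current_commit_id(project_dir, run=subprocess.check_output):
    """Return the id of the commit currently checked out in project_dir."""
    # "git rev-parse HEAD" prints the full hash of the current commit.
    output = run(["git", "rev-parse", "HEAD"], cwd=project_dir)
    return output.decode("ascii").strip()


def harvest_record(project_dir, harvested_at, run=subprocess.check_output):
    """Build a small record tying one harvest to the config version used."""
    return {
        "harvested_at": harvested_at,
        "config_commit": current_commit_id(project_dir, run),
    }
```

The `run` parameter exists only so the git call can be replaced in tests; in normal use the default is enough.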

Add basic management suite

  • Add script to create new projects (specify the query, the title, etc., and it will create the project directory, populate it with the necessary files such as metadata.json).
  • Add script to generate all outputs for a given project (both twarc-report and native twarc) into an output directory, with an index.html

Replace old geo element with place

Locations are now contained in place element:

 "place": {
    "full_name": "Toronto, Ontario",
    "url": "https://api.twitter.com/1.1/geo/id/3797791ff9c0e4c6.json",
    "country": "Canada",
    "place_type": "city",
    "bounding_box": {
      "type": "Polygon",
      "coordinates": [
        [
          [
            -79.639319,
            43.403221
          ],
          [
            -78.90582,
            43.403221
          ],
          [
            -78.90582,
            43.855401
          ],
          [
            -79.639319,
            43.855401
          ]
        ]
      ]
    },
    "contained_within": [],
    "country_code": "CA",
    "attributes": {},
    "id": "3797791ff9c0e4c6",
    "name": "Toronto"
  },
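Since a place carries a bounding box rather than a single point, a report that needs one coordinate per tweet has to derive it. The sketch below takes the center of the box, which is only one possible choice; the function name is hypothetical and the sample data repeats the Toronto values shown above.

```python
def bounding_box_center(place):
    """Return (longitude, latitude) at the center of a place's bounding box."""
    # The first ring of the polygon holds [longitude, latitude] pairs.
    ring = place["bounding_box"]["coordinates"][0]
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    return ((min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2)


toronto = {
    "full_name": "Toronto, Ontario",
    "bounding_box": {
        "type": "Polygon",
        "coordinates": [[
            [-79.639319, 43.403221],
            [-78.90582, 43.403221],
            [-78.90582, 43.855401],
            [-79.639319, 43.855401],
        ]],
    },
}

lon, lat = bounding_box_center(toronto)
print(round(lon, 4), round(lat, 4))
```

For a city-sized place the center is a coarse approximation, so a map built this way would stack every tweet from the same place on a single point.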
