wikiwho

An algorithm to identify authorship and editor interactions in Wiki revisioned content.

Installation Requirements

WikiWho has been tested on Mac OS X and Debian GNU/Linux, running on Python 2.7. (A Python 3 version can be found in this branch: https://github.com/maribelacosta/wikiwho/tree/python3 However, it has not been extensively tested yet, so stick to the master branch if you need reliability.)

WikiWho uses the Wikimedia Utilities library to process the revisioned content extracted from Wikipedia. The library can be downloaded from the official Wikimedia Utilities repository (released under the MIT license).

Running WikiwhoRelationships.py

(Note: WikiWho.py is the original script and provides only provenance information. WikiwhoRelationships.py provides exactly the same authorship/provenance data plus interactions, but it may run slower because of the overhead of computing interactions. We have not benchmarked this yet.)

Expected data:

  • The full revision history of one article or of a set of wiki articles (essentially anything that Wikimedia Utilities can process). The input is expected to be .xml here; .json or other formats also work if you parse them correctly.

  • You can get XML for single articles through the wiki's export function. Alternatively (more reliable, but limited to 50 revisions per request), use the official API; note that you will have to use rvcontinue to retrieve all revisions. Finally, the full history dump can be downloaded from the Wikimedia dumps page. The enwiki dump is very large, so it is not recommended just for testing. Use Wikimedia Utilities to read the compressed files.
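Paging through the API with rvcontinue can be sketched as follows. This is a minimal Python 3 sketch, not part of WikiWho: it assumes the English Wikipedia endpoint and the standard MediaWiki query parameters (prop=revisions, rvlimit=50, rvslots=main, formatversion=2), and the helper names build_params, http_fetch and iter_revisions are hypothetical. The fetch function is injectable so the paging logic can be checked offline.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; swap in another wiki's api.php as needed.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_params(title, cont=None):
    """Query parameters for one batch of revisions (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|content",
        "rvslots": "main",
        "rvlimit": "50",      # at most 50 revisions per request when content is requested
        "rvdir": "newer",     # oldest revision first
        "format": "json",
        "formatversion": "2",
    }
    if cont:
        # Merge the whole "continue" object (including rvcontinue) into the next request.
        params.update(cont)
    return params


def http_fetch(params):
    """Perform one GET request against the API and decode the JSON body."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": "wikiwho-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_revisions(title, fetch=http_fetch):
    """Yield every revision dict of an article, following continuation tokens."""
    cont = None
    while True:
        data = fetch(build_params(title, cont))
        for page in data["query"]["pages"]:  # a list when formatversion=2
            for rev in page.get("revisions", []):
                # With formatversion=2 the wikitext is in rev["slots"]["main"]["content"].
                yield rev
        if "continue" not in data:
            break  # no continuation object: all revisions have been returned
        cont = data["continue"]
```

For example, `for rev in iter_revisions("Randomarticle"): print(rev["revid"])` would list every revision ID. The JSON this produces is not the XML format WikiWho expects as -i input, so it would still need converting or parsing accordingly.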

How to run:

python WikiwhoRelationships.py

Parameters:

-i [source_file_name.xml] (complete history dump XML of one article)

-o [a | r] --> type of output to produce --> a = authorship for all tokens of a revision | r = interactions of every revision with each earlier revision. That is, for every revision and for each interaction type we defined (delete, undelete, reintro, ...), it lists the revisions that were the target of that interaction and the number of tokens the interaction involved. We plan to provide code that outputs a more aggregated version of this as an editor-editor network. From the current output, however, you can already construct such a network yourself by summing up the positive and/or negative interactions between two editors over the whole revision history or a part of it.

-r [<revid> | all] --> which revision to show: a revision ID or "all" for -o a; a revision ID only for -o r
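To make the constraints on these options concrete, here is a hypothetical argparse mirror of the three documented parameters. It is not the script's actual argument handling; it only encodes the rules stated above (-o takes a or r, and -r all is accepted only together with -o a), and it assumes revision IDs are purely numeric.

```python
import argparse


def parse_args(argv=None):
    """Illustrative mirror of the documented options (not WikiWho's own parser)."""
    parser = argparse.ArgumentParser(
        description="Hypothetical mirror of the WikiwhoRelationships.py options")
    parser.add_argument("-i", dest="input", required=True,
                        help="complete history dump XML of one article")
    parser.add_argument("-o", dest="output", choices=["a", "r"], required=True,
                        help="a = authorship, r = interactions")
    parser.add_argument("-r", dest="revision", required=True,
                        help="revision ID, or 'all' (only with -o a)")
    args = parser.parse_args(argv)

    if args.revision == "all":
        # "all" is documented only for authorship output.
        if args.output != "a":
            parser.error("-r all is only valid with -o a")
    elif not args.revision.isdigit():
        # Assumption: revision IDs are numeric.
        parser.error("-r must be a revision ID or 'all'")
    return args
```

With this sketch, `-o r -r all` exits with a usage error, while `-o a -r all` and `-o r -r 5` are accepted.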

example A:

python WikiwhoRelationships.py -i Randomarticle.xml -o a -r 5

outputs the authorship of all tokens in revision 5 (this must be an actual revision ID) of Randomarticle

example B:

python WikiwhoRelationships.py -i Randomarticle.xml -o r -r 5

outputs the edit interactions that every revision, up to and including revision 5 (this must be an actual revision ID), had with earlier revisions of Randomarticle
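The -o r output is organized per revision, so building the editor-editor network described under -o means mapping revisions to editors and summing. A minimal sketch, assuming you have already parsed that output into hypothetical (revid, target_revid, interaction_type, n_tokens) tuples and have a separate revision-to-editor mapping. Neither format is defined by WikiWho here, and which interaction types count as positive or negative is left to the hypothetical `weights` argument.

```python
from collections import defaultdict


def editor_network(interactions, rev_editor, weights=None):
    """Sum token-level interactions between editor pairs.

    interactions: iterable of (revid, target_revid, interaction_type, n_tokens)
    rev_editor:   dict mapping revision ID -> editor name
    weights:      optional dict interaction_type -> weight (e.g. -1.0 for negative);
                  types missing from it contribute 0. Without weights every
                  interaction counts +1 per token (total interaction volume).
    """
    edges = defaultdict(float)
    for revid, target, kind, n_tokens in interactions:
        src = rev_editor[revid]    # editor who performed the interaction
        dst = rev_editor[target]   # editor of the targeted revision
        if src == dst:
            continue  # design choice: ignore editors interacting with themselves
        w = 1.0 if weights is None else weights.get(kind, 0.0)
        edges[(src, dst)] += w * n_tokens
    return dict(edges)
```

Restricting `interactions` to a range of revisions gives the network for "a part of" the history, as suggested above.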

Contact

  • Fabian Floeck: fabian.floeck[.]gesis.org
  • Maribel Acosta: maribel.acosta[.]kit.edu

License

This work is licensed under the MIT license.