cliff-api-client's Introduction

This is the source code for the Media Cloud core system. Media Cloud, a joint project of the Berkman Center for Internet & Society at Harvard University and the Center for Civic Media at MIT, is an open source, open data platform that allows researchers to answer complex quantitative and qualitative questions about the content of online media.

For more information on Media Cloud, go to mediacloud.org.

Note: Most users prefer to use Media Cloud's API and public tools to query our data instead of running their own Media Cloud instance.

The code in this repository will be of interest to those users who wish to run their own Media Cloud instance and users of the public tools who want to understand how Media Cloud is implemented.

The Media Cloud code here does three things:

  • Runs a web app that allows you to manage a set of media sources and their feeds.

  • Periodically crawls the feeds set up within the web app and downloads any new stories found within the downloaded feeds.

  • Extracts the substantive text from the downloaded story content (minus the ads, navigation, comments, etc.) and associates a set of tags with each story based on that extracted text.

For very brief installation instructions, see INSTALL.markdown.

Please send us a note at [email protected] if you are using any of this code or if you have any questions. We are very interested in knowing who's using the code and for what.

History of the Project

Print newspapers are declaring bankruptcy nationwide. High-profile blogs are proliferating. Media companies are exploring new production techniques and business models in a landscape that is increasingly dominated by the Internet. In the midst of this upheaval, it is difficult to know what is actually happening to the shape of our news. Beyond one-off anecdotes or painstaking manual content analysis, there are few ways to examine the emerging news ecosystem.

The idea for Media Cloud emerged through a series of discussions between faculty and friends of the Berkman Center. The conversations would follow a predictable pattern: one person would ask a provocative question about what was happening in the media landscape, someone else would suggest interesting follow-on inquiries, and everyone would realize that a good answer would require heavy number crunching. Nobody had the time to develop a huge infrastructure and download all the news just to answer a single question. However, there were eventually enough of these questions that we decided to build a tool for everyone to use.

Some of the early driving questions included:

  • Do bloggers introduce storylines into mainstream media or the other way around?
  • What parts of the world are being covered or ignored by different media sources?
  • Where do stories begin?
  • How are competing terms for the same event used in different publications?
  • Can we characterize the overall mix of coverage for a given source?
  • How do patterns differ between local and national news coverage?
  • Can we track news cycles for specific issues?
  • Do online comments shape the news?

Media Cloud offers a way to quantitatively examine all of these challenging questions by collecting and analyzing the news stream of tens of thousands of online sources.

Using Media Cloud, academic researchers, journalism critics, policy advocates, media scholars, and others can examine which media sources cover which stories, what language different media outlets use in conjunction with different stories, and how stories spread from one media outlet to another.

Sponsors

Media Cloud is made possible by the generous support of the Ford Foundation, the Open Society Foundations, and the John D. and Catherine T. MacArthur Foundation.

Collaborators

Past and present collaborators include Morningside Analytics, Betaworks, and Bit.ly.

License

Media Cloud is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Media Cloud is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with Media Cloud. If not, see <http://www.gnu.org/licenses/>.

cliff-api-client's People

Contributors

dependabot[bot], rahulbot

cliff-api-client's Issues

experiment with person disambiguation from "entity-linking" literature

We only disambiguate places right now, even though we return other entity types as well. I ran into this recent paper on SWAT (A System for Detecting Salient Wikipedia Entities in Texts), which tries to identify salient entities by linking them to Wikipedia entries (and lots more after that). While the live demo of SWAT didn't work well for me in terms of the salience aspect of news articles, we should check out some of the "entity linking" tools it links to.

I couldn't find links for the other two "salience" tools they mention: CMU-Google and SEL. Maybe those were just internal projects without public-facing APIs / code?

Error in parse_text() - ValueError("No JSON object could be decoded")

I've just updated to the latest version of cliff.api (note my previous update would have been in 2018), and now can't seem to get parse_text functioning. I can access JSON through the localhost in the browser, so I know the VM must be running correctly, but when I run the following:

from cliff.api import Cliff
my_cliff = Cliff('http://localhost:8999')
output = my_cliff.parse_text("This is about Einstien at the IIT in New Delhi.")

I come up against the below error:

Traceback (most recent call last):
  File "C:/Users/joeym/Documents/carnivore_text_mining/python/geoparse_text.py", line 3, in <module>
    output = my_cliff.parse_text("This is about Einstien at the IIT in New Delhi.")
  File "C:\Python27\lib\site-packages\cliff\api.py", line 31, in parse_text
    return self._parse_query(self.PARSE_TEXT_PATH, cleaned_text, demonyms, language)
  File "C:\Python27\lib\site-packages\cliff\api.py", line 58, in _parse_query
    return self._query(path, payload)
  File "C:\Python27\lib\site-packages\cliff\api.py", line 65, in _query
    return r.json()
  File "C:\Python27\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Python27\Lib\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\Lib\json\decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python27\Lib\json\decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
>>> 

Whereas in the browser, the same text call to the localhost spits out the JSON:

{"results":{"organizations":[{"count":1,"name":"IIT"}],"places":{"focus":{"cities":[{"id":1261481,"lon":77.22445,"name":"New Delhi","score":1,"countryGeoNameId":"1269750","countryCode":"IN","featureCode":"PPLC","featureClass":"P","stateCode":"07","lat":28.63576,"stateGeoNameId":"1273293","population":317797}],"states":[{"id":1273293,"lon":77.1,"name":"National Capital Territory of Delhi","score":1,"countryGeoNameId":"1269750","countryCode":"IN","featureCode":"ADM1","featureClass":"A","stateCode":"07","lat":28.6667,"stateGeoNameId":"1273293","population":16787941}],"countries":[{"id":1269750,"lon":79.0,"name":"Republic of India","score":1,"countryGeoNameId":"1269750","countryCode":"IN","featureCode":"PCLI","featureClass":"A","stateCode":"00","lat":22.0,"stateGeoNameId":"","population":1173108018}]},"mentions":[{"id":1261481,"lon":77.22445,"source":{"charIndex":38,"string":"New Delhi"},"name":"New Delhi","countryGeoNameId":"1269750","countryCode":"IN","featureCode":"PPLC","featureClass":"P","stateCode":"07","confidence":1.0,"lat":28.63576,"stateGeoNameId":"1273293","population":317797}]},"people":[{"count":1,"name":"Einstien"}]},"status":"ok","milliseconds":221,"version":"2.3.0"}

The module worked perfectly before, so I assume it must be either an update to the module itself or some new dependency incompatibility. Any ideas what the problem might be, or how I might diagnose it myself? I should add, thank you so much for this python module - it's been incredibly useful!
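
One way to narrow this down, offered only as a sketch: the traceback ends in r.json(), which raises ValueError whenever the response body is not valid JSON (for example an HTML error page or an empty body). Looking at the status code and the raw body before decoding usually shows which it is. The helper names below (summarize_body, fetch_and_summarize) are hypothetical and not part of the cliff client. The URL and the parameter name are placeholders to be replaced with whatever works in the browser. Whether the client sends a GET or a POST is not shown in the traceback, so the POST here is an assumption.

```python
import json
from urllib import request, error, parse


def summarize_body(status_code, content_type, body_text):
    """Report whether a response body decodes as JSON, plus a short preview."""
    summary = {
        "status_code": status_code,
        "content_type": content_type,
        "is_json": False,
        "preview": body_text[:200],  # enough to recognise an HTML error page
    }
    try:
        json.loads(body_text)
        summary["is_json"] = True
    except ValueError:
        # Same failure that r.json() surfaces in the traceback above.
        pass
    return summary


def fetch_and_summarize(url, params):
    """POST form-encoded params to url (assumption) and summarize the reply."""
    data = parse.urlencode(params).encode("utf-8")
    try:
        with request.urlopen(request.Request(url, data=data), timeout=30) as resp:
            status = resp.status
            ctype = resp.headers.get("Content-Type", "")
            body = resp.read().decode("utf-8", errors="replace")
    except error.HTTPError as exc:
        # 4xx/5xx replies still carry a body worth inspecting.
        status = exc.code
        ctype = exc.headers.get("Content-Type", "")
        body = exc.read().decode("utf-8", errors="replace")
    return summarize_body(status, ctype, body)


# Example (placeholders, not the client's real path or parameter name):
# print(fetch_and_summarize("http://localhost:8999/<path-that-works-in-browser>",
#                           {"<param>": "This is about Einstien at the IIT in New Delhi."}))
```

If the summary shows a non-200 status or an HTML preview, the client is probably requesting a different path or using a different method than the browser call that succeeds. That would point to a version mismatch between the client and the running server, not a JSON problem.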
