Giter Site home page Giter Site logo

newsapi_'s Introduction

newsapi_

Intro

This repository provides an overview of pulling news articles from NewsAPI. In addition the code here will do a simple deduplication based on the title and entity relevance determination.

Deduplication is based on Min-hash & LSH While relevance detemination is based on Spacy and a simple heuristic.

To run the code, clone the repo, then install the requirements

pip install -r requirements.txt

The entry is located at:

app/main.py

The code can be executed via:

main.py --search_query Bloomberg --language en --api_key <APIKEY HERE>

You will need to provision your APIKEYS from NewsAPI yourself.


Docker

Additionally, the repository contains a Dockerfile that can deploy a service Make sure to obtain API KEYs from NewsAPI and include it in the Dockerfile on line #6

ENV API_KEY=#<REPLACE WITH YOUR OWN API KEY FROM https://newsapi.org/>

Then, to set up the docker simply run the following commands

#build docker container based on Dockerfile and tag it
docker build -t news_api ./

#run to start container based on image
docker run -d --name newsapi_ctnr -p 8002:80 -v ~/newsapi_sample/:/data news_api

Now you can test if the docker is running by using python -

# in python to test the connection
import requests
params={
    "search_query":"Bloomberg",    
}
resp = requests.get('http://127.0.0.1:8002/articles',params=params)
print(f'{resp.json()}')

Sample Results

  • The sample payload looks as follows -
{
    "query": "Bloomberg",
    "results": 100,
    "articles": [
        {
            "source": {
                "id": "wired",
                "name": "Wired"
            },
            "author": "Noam Cohen",
            "title": "Michael Bloomberg, Geezer Tech Bro",
            "description": "He may have founded his startup way back in the early 1980s, but he likes to \u201cmove fast and break things\u201d too.",
            "url": "https://www.wired.com/story/michael-bloomberg-geezer-tech-bro/",
            "urlToImage": "https://media.wired.com/photos/5e45e1590334170008035101/191:100/w_1280,c_limit/Ideas-Bloomberg-Tech-1200442377.jpg",
            "publishedAt": "2020-02-14T13:00:00Z",
            "content": "Bloombergs tech startup wasnt typical. It was always based in New York City, for one thing. The time was the early 1980s, and Bloomberg was 39, a general partner at the prominent Wall Street firm Salomon Brothers. He had been a successful trader who was then \u2026 [+3844 chars]",
            "dn_id": "f9aff21e-8a2b-426f-803d-86784fd5e9f2",
            "dn_duplicate_ids": [],
            "dn_confidence": "1",
            "dn_relevance": 1
        },

where fields: "dn_id" - UUID of the article "dn_duplicate_ids" - list of UUID IDs that are duplicates of the article (based on minhash & LSH hash) "dn_confidence" - Entity relevance confidence, this value is always 1 due to use of a heuristic "dn_relevance" - Entity relevance of an article belonging to the query

newsapi_'s People

Contributors

felflare avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.