
eventregistry / event-registry-python


Python package for API access to news articles and events in the Event Registry

Home Page: http://eventregistry.org/

License: MIT License

Python 100.00%
event-registry python news news-aggregator news-feed newsapi events information-extraction news-articles

event-registry-python's Introduction

Event Registry is a Python package that can be used to easily access the news data available in Event Registry through the API. The package can be used to query for articles or events by filtering using a large set of filters, like keywords, concepts, topics, sources, sentiment, date, etc. Details about the News API are available on the landing page of the product.

Installation

The Event Registry package can be installed using Python's pip installer. In the command line, simply type:

pip install eventregistry

and the package should be installed. Alternatively, you can also clone the package from the GitHub repository. After cloning it, open the command line and run:

python setup.py install

Validating installation

To ensure the package has been properly installed, run python and type:

import eventregistry

If you don't get any error messages, then your installation has been successful.
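
The same check can be scripted, for example as part of a setup step. The following is a minimal sketch that uses only the standard library; the helper name `is_installed` is made up for this illustration and is not part of the package.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the import system can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("eventregistry"):
        print("eventregistry is installed")
    else:
        print("eventregistry is missing; run: pip install eventregistry")
```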

Updating the package

As features are added to the package, you will at some point need to update it. If you downloaded the package from GitHub, simply do a git pull. If you installed it using pip, run:

pip install eventregistry --upgrade

Authentication and API key

When making queries to Event Registry you will have to use an API key that you can obtain for free. The details on how to obtain and use the key are described in the Authorization section.
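
To keep the key out of source files, one option is to read it from an environment variable. This is only a sketch: the variable name `ER_API_KEY` and the helper `load_api_key` are assumptions made for this example, not something the package defines.

```python
import os


def load_api_key(env=None, var_name="ER_API_KEY"):
    """Return the API key stored in an environment variable, or raise a clear error.

    ER_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            "Set the %s environment variable to your Event Registry API key" % var_name)
    return key
```

The result can then be passed to the constructor shown in the examples below, as in `EventRegistry(apiKey = load_api_key())`.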

Five simple examples to get you interested

Print a list of recent articles or blog posts from US-based sources with positive sentiment mentioning the phrases "George Clooney" or "Sandra Bullock"

from eventregistry import *
er = EventRegistry(apiKey = YOUR_API_KEY)

# get the USA URI
usUri = er.getLocationUri("USA")    # = http://en.wikipedia.org/wiki/United_States

q = QueryArticlesIter(
    keywords = QueryItems.OR(["George Clooney", "Sandra Bullock"]),
    minSentiment = 0.4,
    sourceLocationUri = usUri,
    dataType = ["news", "blog"])

# obtain at most 500 newest articles or blog posts, remove maxItems to get all
for art in q.execQuery(er, sortBy = "date", maxItems = 500):
    print(art)

Print a list of most relevant business articles from the last month related to Microsoft or Google. The articles should be in any language (including Chinese, Arabic, ...)

from eventregistry import *
# allowUseOfArchive=False will allow us to search only over the last month of data
er = EventRegistry(apiKey = YOUR_API_KEY, allowUseOfArchive=False)

# get the URIs for the companies and the category
microsoftUri = er.getConceptUri("Microsoft")    # = http://en.wikipedia.org/wiki/Microsoft
googleUri = er.getConceptUri("Google")          # = http://en.wikipedia.org/wiki/Google
businessUri = er.getCategoryUri("news business")    # = news/Business

q = QueryArticlesIter(
    conceptUri = QueryItems.OR([microsoftUri, googleUri]),
    categoryUri = businessUri)

# obtain at most 500 newest articles, remove maxItems to get all
for art in q.execQuery(er, sortBy = "date", maxItems = 500):
    print(art)

Search for latest events related to Star Wars

from eventregistry import *
er = EventRegistry(apiKey = YOUR_API_KEY)

q = QueryEvents(keywords = "Star Wars")
q.setRequestedResult(RequestEventsInfo(sortBy = "date", count = 50))   # request event details for latest 50 events

# get the full list of 50 events at once
print(er.execQuery(q))

Search for articles that (a) mention immigration, (b) are related to business, and (c) were published by news sources located in New York City

from eventregistry import *
er = EventRegistry(apiKey = YOUR_API_KEY)

q = QueryArticlesIter(
    # here we don't use keywords so we will also get articles that mention immigration using various synonyms
    conceptUri = er.getConceptUri("immigration"),
    categoryUri = er.getCategoryUri("business"),
    sourceLocationUri = er.getLocationUri("New York City"))

# obtain 500 articles that were shared the most on social media
for art in q.execQuery(er, sortBy = "socialScore", maxItems = 500):
    print(art)

What are the currently trending topics

from eventregistry import *
er = EventRegistry(apiKey = YOUR_API_KEY)

# top 10 trending concepts in the news
q = GetTrendingConcepts(source = "news", count = 10)
print(er.execQuery(q))

Learning from examples

We believe that it's easiest to learn how to use our service by looking at examples. For this reason, we have prepared examples of the most commonly used features. View the examples grouped by main search actions:

View examples of searching for articles

View examples of searching for events

View examples of obtaining information about an individual event

Examples of how to obtain the full feed of articles

Examples of how to obtain the full feed of events

Play with interactive Jupyter notebook

To interactively learn about how to use the SDK, see examples of use, see how to get extra meta-data properties, and more, please open this Binder. You'll be able to view and modify the examples.

Where to next?

Terminology. There are numerous terms in the Event Registry that you will constantly see. If you don't know what we mean by an event, story, concept or category, you should definitely check this page first.

Learn about the EventRegistry class. You will need to use the EventRegistry class whenever you want to interact with Event Registry, so you should learn about it.

Details about articles/events/concepts/categories/... that we can provide. When you request information about events, articles, concepts, and other things, what details can you ask for about each of these?

Querying events. Check this page if you are interested in searching for events that match various search criteria, such as relevant concepts, keywords, date, location or others.

Querying articles. Read if you want to search for articles based on the publisher's URL, article date, mentioned concepts or others.

Trends. Are you interested in finding which concepts are currently trending the most in the news? Maybe which movie actor is most popular in social media? How about trending of various news categories?

Articles and events shared the most on social media. Do you want to get the list of articles that have been shared the most on Facebook and Twitter on a particular date? What about the most relevant event based on shares on social media?

Data access and usage restrictions

Event Registry is a commercial service, but it also allows unsubscribed users to perform a certain number of operations. Non-paying users are not allowed to use the obtained data for any commercial purposes (see the details on our Terms of Service page) and have access to only the last 30 days of content. To avoid these restrictions, please contact us about the available plans.

event-registry-python's People

Contributors

deanflood, gregorleban, jphalip, mattgallivan, muki, renaud, scovetta


event-registry-python's Issues

More Sample Code?

Hi,

Would it be possible to post more sample code? I'm unsure how to (for example) run a query for the last 5 years, for any article title that matches the phrase "soybeans".

I'm also unclear on the difference between a keyword and a conceptUri.
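
A possible starting point for the title search, sketched with parameters that appear elsewhere on this page (`keywords`, `keywordsLoc`, `dateStart`, `dateEnd`). The helper names `last_n_years` and `iter_title_matches` are made up for this illustration. Note that, per the usage restrictions above, non-paying users only have access to the last 30 days of content, so a five-year window presumably requires a paid plan. On the second question, the README example on immigration hints at the difference: a keyword matches the text as written, while a concept URI also finds articles that refer to the same thing using synonyms.

```python
import datetime


def last_n_years(today, years=5):
    """Return (start, end) dates covering the last `years` years up to `today`."""
    try:
        start = today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        start = today.replace(year=today.year - years, day=28)
    return start, today


def iter_title_matches(api_key, phrase, max_items=100):
    """Return an iterator over articles whose title mentions `phrase`.

    Requires `pip install eventregistry`; imported lazily so the date helper
    above can be used without the package.
    """
    from eventregistry import EventRegistry, QueryArticlesIter

    er = EventRegistry(apiKey=api_key)
    date_start, date_end = last_n_years(datetime.date.today())
    q = QueryArticlesIter(
        keywords=phrase,
        keywordsLoc="title",   # match the keyword in the title only
        dateStart=date_start,
        dateEnd=date_end)
    return q.execQuery(er, sortBy="date", maxItems=max_items)
```

Usage would look like `for art in iter_title_matches(YOUR_API_KEY, "soybeans"): print(art)`.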

Get source of events

Hi,

I am trying to track the information flow in news media. Specifically, I download all the articles belonging to a particular event and find out the sources of these articles. For example, articles a,b and c are clustered as one event by EventRegistry. So, I would like to know the source of a, the source of b and the source of c.

But the issue is that the daily limitation on the number of queries for the event with all the articles inside is 2000. It takes one query to download all the articles of each event. Using this approach, I can track the flow of 2000 events per day only.

Is there any way I can download more in one day? The article content is not very important to me. I just need the timestamp and sources of all the articles inside one event. I was thinking about downloading as many articles as possible (that gives me 2000 queries per day * 200 articles per query) and then find out all the articles belonging to the same events on my machine locally. Would this approach work?

Thanks in advance.
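
The local grouping step described above can be sketched as follows. It assumes that each downloaded article is a dict carrying an "eventUri", a "dateTime" string in ISO format, and a nested "source" dict with a "uri"; these field names are assumptions for this sketch and should be checked against the data actually returned. Whether the quota arithmetic works out is a separate question that only the service can answer.

```python
from collections import defaultdict


def sources_by_event(articles):
    """Map each event URI to a time-ordered list of (dateTime, source URI) pairs.

    Articles that are not assigned to any event are skipped.
    """
    grouped = defaultdict(list)
    for art in articles:
        event_uri = art.get("eventUri")
        if not event_uri:
            continue
        source_uri = (art.get("source") or {}).get("uri")
        grouped[event_uri].append((art.get("dateTime"), source_uri))
    for pairs in grouped.values():
        # ISO timestamps sort correctly as plain strings
        pairs.sort(key=lambda pair: pair[0] or "")
    return dict(grouped)
```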

python 3.5

Does it support Python 3.5? I managed to install it successfully via pip.

However, the code fails:

File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/eventregistry/__init__.py", line 2, in <module>
from .EventRegistry import *
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/eventregistry/EventRegistry.py", line 62
print "Event Registry host: %s" % (self._host)

I tested the above using interactive Python: from eventregistry import *

List of concepts or categories

Can someone point to where I can see the list of categories/concepts that exist?
I intend to filter my query by categories if possible.
Example: Searching article titles for the word 'obama' yields the article 'Former Michelle Obama chief to head Grammys inclusion group' which is in the business/art_music category. I want to ignore this category if possible.

Provided uri is not valid concept uri.

Hey,
I am trying to get news with the Python API for a given company, using its concept URI. For many companies I have tried, the concept search worked, but for a few companies I encountered the following error:

Provided uri is not valid concept uri.

For instance, for this company, I am running the following commands:
concept_uri = er.getConceptUri(conceptLabel='Amer International Group')
# the result is 'http://en.wikipedia.org/wiki/Amer_International_Group'
er.getConceptInfo(concept_uri)
This throws an error:

Out[13]: {'http://en.wikipedia.org/wiki/Amer_International_Group': {'error': "Provided uri ('http://en.wikipedia.org/wiki/Amer_International_Group') is not valid concept uri."}}

If I understand correctly, there are several legal entity types on Wikipedia for different types of organizations, so in order to cover these cases, should I use something like this?

concept_uri = er.getConceptUri(conceptLabel='Amer International Group', sources=['org', 'private', 'public', 'limited partnership'])

In the code documentation of the getConceptUri function I can only see this possible input for concept sources:
loc, org, wiki, person

Could you please let me know if these are the current types of sources or whether I can/should use other types of pages from Wikipedia?
Thanks

Complex query returning results outside of specified date range

Hi,

I'm having difficulty putting together a complex query using the JSON structure. I've followed the instructions here, but I'm still getting odd results. Specifically, I'm getting results outside of the date range I've specified. I'm also seeing some very low-ranked sources in my results, suggesting the startSourceRankPercentile and endSourceRankPercentile params aren't working.

What I'm trying to do is get results between the given date range (required) and source rank range (required), where at least one of the listed concepts or keywords occur.

See below for a rough reproduction of my query.

{
    "query": {
        "$query": {
            "$and": [
                {
                    "dateStart": "2022-02-01",
                    "dateEnd": "2022-03-01",
                    "startSourceRankPercentile": 0,
                    "endSourceRankPercentile": 20,
                    "$or": [
                        {
                            "conceptUri": {
                                "$or": [
                                    "http://en.wikipedia.org/wiki/British_Columbia",
                                    "http://en.wikipedia.org/wiki/Flood",
                                    "http://en.wikipedia.org/wiki/Natural_disaster",
                                    "http://en.wikipedia.org/wiki/Environment_and_Climate_Change_Canada",
                                    "http://en.wikipedia.org/wiki/Global_warming",
                                    "http://en.wikipedia.org/wiki/Severe_weather",
                                    "http://en.wikipedia.org/wiki/K\\u00f6ppen_climate_classification",
                                    "http://en.wikipedia.org/wiki/University_of_Victoria",
                                    "http://en.wikipedia.org/wiki/Preprint",
                                    "http://en.wikipedia.org/wiki/Peer_review",
                                    "http://en.wikipedia.org/wiki/Probability",
                                    "http://en.wikipedia.org/wiki/Atmospheric_river",
                                    "http://en.wikipedia.org/wiki/Precipitation",
                                    "http://en.wikipedia.org/wiki/Email"
                                ]
                            }
                        },
                        {
                            "keyword": {
                                "$or": [
                                    "A list",
                                    "of a bunch of",
                                    "people",
                                    "that aren't on wikipedia"
                                ]
                            }
                        }
                    ]
                }
            ]
        }
    },
    "resultType": "articles",
    "articlesPage": 1,
    "articlesSortBy": "rel",
    "articlesArticleBodyLen": -1,
    "includeArticleConcepts": true,
    "includeArticleSocialScore": true,
    "includeArticleLocation": true,
    "includeSourceLocation": true,
    "includeSourceRanking": true,
    "forceMaxDataTimeWindow": -1,
    "apiKey": "MY_API_KEY"
}
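
Until the server-side behaviour is clarified, stray results can at least be dropped on the client. The sketch below does not fix the query itself; it assumes each returned article is a dict with a "date" field in YYYY-MM-DD form, which should be verified against the actual response. The function name `within_date_range` is made up for this illustration.

```python
import datetime


def within_date_range(articles, date_start, date_end):
    """Keep only articles whose "date" (YYYY-MM-DD) lies in [date_start, date_end]."""
    start = datetime.date.fromisoformat(date_start)
    end = datetime.date.fromisoformat(date_end)
    kept = []
    for art in articles:
        try:
            published = datetime.date.fromisoformat(art.get("date") or "")
        except ValueError:
            continue  # missing or malformed date: drop rather than guess
        if start <= published <= end:
            kept.append(art)
    return kept
```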

Timeout Error for some concepts

I am querying for S&P 500 companies, and my query is:

start_date = datetime.date(2020, 2, 10)
end_date = datetime.date(2020, 2, 26)
query = QueryArticlesIter(conceptUri='https://en.wikipedia.org/wiki/AbbVie_Inc.',
                          categoryUri='dmoz/Business',
                          lang=['eng'],
                          dateStart=start_date,
                          dateEnd=end_date,
                          dataType = ["news", 'pr'])
for q in query.execQuery(er, sortBy='date', sortByAsc=True, maxItems=10):
    print(q)

Sometimes this query runs forever, and other times I get the following error (translated here from German):

Event Registry exception while executing the request:
('Connection aborted.', TimeoutError(10060, 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond', None, 10060, None))
The request will be automatically repeated in 3 seconds...

Is there any solution to handle this error, or a way to optimize my query and speed it up?
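
The log above suggests the package already repeats failed requests on its own. For calls where the caller wants explicit control over the number of attempts and the waiting time, a generic retry wrapper with exponential backoff is one option. This is a standalone sketch; `call_with_retries` is a made-up helper, not part of the package.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0, retry_on=(OSError,), sleep=time.sleep):
    """Call func(); on a listed exception wait base_delay * 2**n seconds and try again.

    TimeoutError and ConnectionError are subclasses of OSError, so the default
    covers the usual network failures.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * (2 ** attempt))
```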

Why is the categories field empty?

Hi,
I'm executing this query.

q = QueryArticles(lang = ["ita"])
q.addRequestedResult(RequestArticlesInfo(page = 1, count = 30, sortBy = "date", sortByAsc = False,
    returnInfo = ReturnInfo(articleInfo = ArticleInfoFlags(bodyLen = -1,
        basicInfo = True,
        title = True,
        body = True,
        eventUri = False,
        concepts = False,
        storyUri = False,
        duplicateList = False,
        originalArticle = False,
        categories = True,
        location = False,
        image = True,
        extractedDates = False,
        socialScore = False,
        details = True))))
return er.execQuery(q)

In the results, the "categories" field is empty for every article.

Why?

Greetings

Rate Limited when searching for articles in specific language?

I'm utilizing the code below to search for articles for a list of topics. This method works fine, but as soon as I add the lang argument, the api suddenly starts rate limiting me. Why is this happening? Code is below:

def get_articles(keyList):
    alist = []
    for key in keyList:
        q = QueryArticlesIter(conceptUri = er.getConceptUri(key), lang='eng')
        for article in q.execQuery(er, articleBatchSize = 10, ):
            alist.append(article)
    return alist

Accessing articles from an event

I am performing an event query and want to see the articles associated with a given event. Somehow I can't see this information in the returned result, or find relevant information on the wiki. Did I miss this?

ImportError: No module named 'EventRegistry' - Python3

EventRegistry doesn't seem to work out of the box with Python3 like promised. Here are the steps I took:

From the terminal:
pip3 install eventregistry

Then
python3 test.py (source code is included below - it's literally just importing and referencing EventRegistry)

Which gives me the following error:

Traceback (most recent call last):
  File "test.py", line 1, in <module>
    from eventregistry import *
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/eventregistry/__init__.py", line 1, in <module>
    from EventRegistry import *
ImportError: No module named 'EventRegistry'

I can reproduce this behavior both on OSX 10.12.1 and Bash for Windows (although I'd expect the latter to be a bit buggy).

The test.py file is just the following:

from eventregistry import *

er = EventRegistry()

How to apply text analytics for extracted articles.

Hi,

I already queried for some news for one year and saved them in a database. I wanted to get the annotation for each news based on its unique Identifier (e.g., uri :"1031133810") from the EventRegistry. Is it possible to send a request with the uri and get the annotation as a response or I have to do it while querying the news and get the raw text and annotations in one-go?
Thanks for your support.

Filtering GetRecentArticles

Hi, congratulations on your DNI funding.
Is it possible to filter the stream of recent articles by source or event? Right now it sends every single article added to ER, which is not very useful as the volume is way above the maxArticleCount limit of 200.
Thanks

{u'error': u'No results match the criteria'} in Django app

So I'm trying to pull political news (nationally) and it was working fine. But now I get an output of

{u'error': u'No results match the criteria'}

when using my code in views.py of a Django app. If I run the same code in a standalone python program it gives me the above error.

Here's my code


	current = datetime.today()
	current.strftime("%Y-%m-%d")
	d = datetime.today() - timedelta(days=1)
	d.strftime("%Y-%m-%d")

	#Get Events
	er = EventRegistry()
	q = QueryEvents(lang = "eng")
	q.addConcept(er.getConceptUri("Politics", lang = "eng"))   
	q.addLocation(er.getLocationUri("United States"))
	#Sort events by their size of articles
	q.addRequestedResult(RequestEventsInfo(sortBy = "size", count=10))   # return event details for last 10 events
	q.setDateLimit(d, current)
	#q.EventInfoFlags()

	res = er.execQuery(q)

	#   Print all of the info the the execQuery
	pprint(res)

Could it be a limit on my requests? I've definitely done this a bunch of times, but I don't know why it would work in its own Python code.

I didn't know where else to post this question.

AttributeError: 'dict' object has no attribute 'getConceptUri'

I am using the latest version of EventRegistry. When I try to add a concept using er.getConceptUri("Reliance Communications"), I get the above-mentioned error. The error occurs when I log in using a username and password. If I don't log in with a username and password, I don't see this error.


python news.py
Event Registry host: http://eventregistry.org
{u'action': u'success', u'desc': u'Login successful'}
Reliance Communications
Traceback (most recent call last):
  File "news.py", line 16, in <module>
    q.addConcept(er.getConceptUri(x))
AttributeError: 'dict' object has no attribute 'getConceptUri'


Please help.

Regards

Bala

Usage Limits

Hi,

I would like to know the procedure for requesting a higher usage limit, or at least discuss a way to prevent my account from getting locked.

Can't get ArticlesUriList

Hi, I am trying to get the ArticlesUriList as a result of my query, but keep getting the following error:

{'error': 'The user does not have sufficient permissions to query multiple return types in a single call.'}

What are the multiple return types? Is this just not possible in the free plan or am I doing it wrong?

er = EventRegistry(apiKey=API_KEY)

q = QueryArticles()
q.addNewsSource(['www.nytimes.com'])
q.addRequestedResult(RequestArticlesUriList())
print(er.execQuery(q))

Thanks a lot in advance.
Mirco

Wildcard

Hello!

I'm trying to figure out if it is possible to query for articles using a wildcard '*' symbol. E.g. if I'm looking for 'fish', 'fishing' and 'fisheries', is there a way to just look for 'fish*'?

Thanks in advance!

Jasper Ginn.

data not appearing

Something is wrong: I am not able to get any results. This is the output of running the newEvents example Python gist.

hrishikeshbman@cloudshell:~$ vi h.py
hrishikeshbman@cloudshell:~$ python h.py
Event Registry host: http://beta.eventregistry.org
22:44:34 request took 0.588 sec. Response size: 0.21KB

0 articles were added since the last call
sleeping for 20 seconds...
22:44:39 request took 0.297 sec. Response size: 0.21KB

0 articles were added since the last call
sleeping for 20 seconds...
22:44:45 request took 0.296 sec. Response size: 0.21KB

0 articles were added since the last call
sleeping for 20 seconds...
22:44:50 request took 0.294 sec. Response size: 0.21KB

0 articles were added since the last call
sleeping for 20 seconds...
22:44:55 request took 0.293 sec. Response size: 0.21KB

0 articles were added since the last call
sleeping for 20 seconds...
22:45:01 request took 0.301 sec. Response size: 0.21KB

0 articles were added since the last call
sleeping for 20 seconds...
22:45:06 request took 0.294 sec. Response size: 0.21KB

0 articles were added since the last call
sleeping for 20 seconds...
^CTraceback (most recent call last):
File "h.py", line 24, in
time.sleep(5)
KeyboardInterrupt
hrishikeshbman@cloudshell:~$

GetTopSharedArticles issue

Hi Leban,

I'm not getting any results for GetTopSharedArticles queries:

from eventregistry import *
er = EventRegistry(apiKey = key)
q = GetTopSharedArticles(date = "2015-05-23", count = 30)
ret = er.execQuery(q)

Output:

{'articles': {'page': 1, 'totalResults': 0, 'pages': 0, 'results': []}}

Is there any way I can get some results?

Thanks,
Roland

No results when start/end date specified

For any combination of query parameters, whenever I set a start and end date, the API returns: No results match the search conditions.
This happens not only for queries that do return articles when no date is set; the query from the examples folder also returns no articles:

from eventregistry import *
er = EventRegistry()

q = QueryArticles(
    dateStart = datetime.date(2016, 3, 22), dateEnd = datetime.date(2016, 3, 23),
    conceptUri = er.getConceptUri("Brussels"),
    sourceUri = er.getNewsSourceUri("New York Times"))

# return details about the articles, including the concepts, categories, location and image
q.setRequestedResult(RequestArticlesInfo(count = 30,
    returnInfo = ReturnInfo(
        articleInfo = ArticleInfoFlags(duplicateList = True, concepts = True, categories = True, location = True, image = True))))
# execute the query
res = er.execQuery(q)

I wonder whether this is a bug and you can reproduce this behaviour?

Handling the exception of not available tokens for the specific day

Hi,
I am getting this exception message:

Event Registry exception while executing the request:
You have used all available tokens for unsubscribed users from this IP address. In order to continue using Event Registry please subscribe to a paid plan.
The request will be automatically repeated in 3 seconds...

when I iterate over the result of my search query, but I cannot find where exactly this exception happens internally (and I also cannot handle it properly because it does not appear to be of the type Exception).

This is how I iterate over the response of my query:

er = EventRegistry( ... )
query_result = query.execQuery(er, ...)
for article in query_result:
    print(article)  # shows the above-mentioned message

I would like to handle this exception and stop calling the API, and then try the next day. Apparently the method er.getRemainingAvailableRequests shows the remaining tokens in general, not for the specific day and this one er.getDailyAvailableRequests just shows general information about the default tokens in event registry.

How can I get the remaining daily tokens in Python? I also don't see a relevant HTTP status code for the Rest API: https://eventregistry.org/documentation?tab=introduction

Don't get any result for an existing conceptUri.

I upgraded to a monthly subscription and am querying S&P 500 companies.
I noticed for some companies like 3M with conceptUri: https://en.wikipedia.org/wiki/3M, I still don't get any result for the last year. I am wondering if I'm doing something wrong or it's an issue with the library. Is there any way to test the conceptUri for 500 companies without using 500 tokens just to make sure I can get the correct results?

er = EventRegistry(apiKey=api_key)
query = QueryArticlesIter(conceptUri="https://en.wikipedia.org/wiki/3M",
                          lang=['eng'],
                          dateStart=datetime.date(2019, 1, 1),
                          dateEnd=datetime.date(2020, 2, 1),
                          )

for cur_query in query.execQuery(er, sortBy='date',
                                 sortByAsc=True,
                                 maxItems=10):
    print(cur_query)

Getting requestInfo for all queried articles.

Hi,

I am using the following query, and dump the results in a local json file.

query = QueryArticlesIter(concept="my_concept")
query.setRequestedResult(RequestArticlesInfo(
    count=100,
    returnInfo=ReturnInfo(
        articleInfo=ArticleInfoFlags(
            bodyLen=-1,
            concepts=True))))
news = er.execQuery(query)
news_list = []
for article in news["articles"]["results"]:
    news_list.append(article)
news_file.write(json.dumps(news_list))

My Question
Although there are 313 articles for my query, the above code returns only 100 of them. As I understand from the documentation, this is because of the count=100 argument in RequestArticlesInfo. Is it possible to apply ReturnInfo to all queried articles in the above code?
Thanks for your help in advance.
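
The README examples iterate with `QueryArticlesIter(...).execQuery(er, maxItems = ...)` and note that removing `maxItems` returns all results, which avoids manual paging. When a single `RequestArticlesInfo` is used instead, the `page` and `count` parameters seen in other snippets on this page suggest fetching page by page. The loop below is a generic sketch of that idea: `fetch_page` is a hypothetical caller-supplied function, and the response shape is assumed to match the `{'articles': {'page': ..., 'pages': ..., 'results': [...]}}` output quoted elsewhere on this page.

```python
def fetch_all_articles(fetch_page, max_pages=None):
    """Collect the "results" lists from every page of an articles response.

    fetch_page(page) must return a dict shaped like
    {"articles": {"page": ..., "pages": ..., "results": [...]}}.
    """
    collected = []
    page = 1
    while True:
        block = fetch_page(page).get("articles", {})
        collected.extend(block.get("results", []))
        total_pages = block.get("pages", 0)
        if page >= total_pages or (max_pages is not None and page >= max_pages):
            break
        page += 1
    return collected
```

Each page costs a separate request, so `max_pages` gives a way to cap the number of calls.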

Control length of returned article text?

Is it possible to increase the length of article text returned from the API? Or is there a maximum length that can be returned? I'm looking to get more text content from articles for analysis and the default length is limited.

Sorting event by virality

When I use QueryEventsIter, is there a way to sort the returned events by virality, as on the webpage?
Thanks!

Wrong "body" value for certain articles

It seems that the "Body" JSON values are not correct for certain articles. It looks like they are taken from other articles.

Examples are articles with URI 525017041 or 525216702

Regards,
Matt

No JSON object could be decoded

@gregorleban
I was running through various examples from the wiki ... but haven't had any success in articles returning. Just keep getting "No JSON object could be decoded". I'm wondering am I doing something wrong? Thanks.

0 results

Hi, I'm just trying to do a standard concept query and I'm getting 0 results. What am I doing wrong?

TRAINING_STARTDATE = date(year = 2014, month = 1, day = 1)
TRAINING_ENDDATE = date(year = 2019, month = 3, day = 13)

def query(outfile, params):
    q = QueryArticles(
        keywords = params["keywords"],
        conceptUri = "https://en.wikipedia.org/wiki/Apple_Inc.",
        categoryUri = params["categories"],
        sourceUri = params["sources"],
        sourceLocationUri = None,
        sourceGroupUri = None,
        authorUri = None,
        locationUri = None,
        lang = "eng",
        dateStart = TRAINING_ENDDATE - datetime.timedelta(days=7),
        dateEnd = TRAINING_ENDDATE,
        dateMentionStart = None,
        dateMentionEnd = None,
        keywordsLoc = "title",
        ignoreKeywords = None,
        ignoreConceptUri = None,
        ignoreCategoryUri = None,
        ignoreSourceUri = None,
        ignoreSourceLocationUri = None,
        ignoreSourceGroupUri = None,
        ignoreAuthorUri = None,
        ignoreLocationUri = None,
        ignoreLang = None,
        ignoreKeywordsLoc = "body",
        isDuplicateFilter = "keepAll",
        hasDuplicateFilter = "keepAll",
        eventFilter = "keepAll",
        startSourceRankPercentile = params["rankStart"], #experiment with this
        endSourceRankPercentile = params["rankEnd"],
        dataType = ["news", "pr", "blogs"])

    r = ReturnInfo(articleInfo = ArticleInfoFlags(
        bodyLen = -1,
        title = True,
        basicInfo = True,
        body = True,
        url = True,
        eventUri = True,
        authors = True,
        concepts = True,
        categories = True,
        links = False,
        videos = False,
        image = False,
        socialScore = True,
        sentiment = True,
        location = False,
        dates = False,
        extractedDates = False,
        duplicateList = False,
        originalArticle = False,
        storyUri = False))

    articles = RequestArticlesInfo(
        page = 1,
        count = 100,
        sortBy = "date",
        sortByAsc = False,
        returnInfo = r
        )

    uris = RequestArticlesUriWgtList(
        page = 1,
        count = 50000,
        sortBy = "")

    time = RequestArticlesTimeAggr()

    q.setRequestedResult(articles)

    results = er.execQuery(q)
    print(results)

    with open(outfile, "w") as fp:
        json.dump(results, fp)

if __name__ == "__main__":

    query("./data/test.json", {
        "keywords": "Apple",
        "concepts": "https://en.wikipedia.org/wiki/Apple_Inc.",
        "categories": None,
        "sources": None,
        "rankStart": 0,
        "rankEnd": 100
    })

Query with multiple search terms

Hello,
Thanks for the great tool.
I plan to get all the articles (or at least the count) matching a query (e.g. "Angelique kerber US open"),
but since the query is converted to a concept URI, is there a way I can query multiple keywords?

Also, the count of mentions in news and social media doesn't seem to work for social media (it works great for news, though). It always returns a count of 0. Is there any issue with the social media data?
Thank you.

Adding Timeout

When eventregistry is slow to respond (it can be over 30 sec), it would be good to have a timeout on requests.
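
Until the package offers such an option, a caller-side workaround is to run the request in a worker thread and stop waiting after a deadline. This is a sketch using only the standard library; `call_with_timeout` is a made-up helper name. It bounds how long the caller waits but does not cancel the underlying request.

```python
import concurrent.futures


def call_with_timeout(func, timeout_seconds):
    """Run func() in a worker thread; raise concurrent.futures.TimeoutError if it is too slow.

    The worker thread is not killed on timeout. It keeps running in the
    background until func() returns, so only the caller's wait is bounded.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(func)
        return future.result(timeout=timeout_seconds)
    finally:
        # do not block on a call that is still running
        executor.shutdown(wait=False)
```

With the package, the wrapped call could be something like `call_with_timeout(lambda: er.execQuery(q), 30)`.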

QueryEventsIter ignores requestedResult parameter

Running a QueryEventsIter request with specified requestedResult parameter returns data with default returnInfo settings. Example code:

from eventregistry import *

import argparse
# parse the input arguments
parser = argparse.ArgumentParser()
parser.add_argument("apiKey", help="Event Registry API key (obtainable on your ER profile page).")
args = parser.parse_args()

er = EventRegistry(apiKey = args.apiKey)

# specify return info - we turn off summary and concepts
returnInfoSpec = RequestEventsInfo(
    sortBy = "rel",
    returnInfo = ReturnInfo(
        eventInfo = EventInfoFlags(
            title=True,
            summary=False,
            articleCounts=False,
            concepts=False,
            categories=False,
            location=True,
            date=True)))

# build query
q = QueryEventsIter(
    conceptUri = er.getConceptUri("Obama"),
    requestedResult = returnInfoSpec)

# run query
for event in q.execQuery(er, sortBy = "date", maxItems=1):
    # event contains default info (including summary, concepts ...)
    print(event.keys())

not json serialisable

I'm testing code from the wiki

#!/usr/local/bin/python3.5

from eventregistry import *
er = EventRegistry()

q = GetTrendingConcepts(source="news",count=10)
print(er.execQuery(q))

I keep getting the error:
TypeError: b'news' is not JSON serializable
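
The message suggests that a `bytes` value (`b'news'`) reached the JSON encoder, which Python 3's `json` module refuses to serialize. The snippet below reproduces that class of error using only the standard library and shows that decoding to `str` avoids it. Where the bytes value originates inside the package is not established here, and `to_text` is a hypothetical helper.

```python
import json


def to_text(value, encoding="utf-8"):
    """Return str for bytes input and leave every other value untouched (hypothetical helper)."""
    return value.decode(encoding) if isinstance(value, bytes) else value


params = {"source": b"news", "count": 10}

try:
    json.dumps(params)
    failed = False
except TypeError:
    # Python 3's json module has no default encoding for bytes objects.
    failed = True

# Decode any bytes values before serializing.
clean = {key: to_text(val) for key, val in params.items()}
encoded = json.dumps(clean, sort_keys=True)
```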

Can I make a case sensitive keyword query?

qStr = ComplexArticleQuery(
        CombinedQuery.AND([
            BaseQuery(dateStart = from_date, dateEnd = till_date),
            BaseQuery(keyword = QueryItems.OR([symbol, name])),
            BaseQuery(categoryUri = QueryItems.OR([businessUri, financeUri])),
            BaseQuery(lang = "eng")
        ])
    )

For example, if the symbol happens to be "INTL", which is the stock code of Intel Corp., the resulting articles are not of interest to me.
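
Whether the API supports case-sensitive keyword matching is not stated here. One possible workaround is to post-filter the returned articles on the client. The sketch below assumes each article is a dict with `title` and `body` strings; `mentions_exact` is a hypothetical helper, not part of eventregistry.

```python
import re


def mentions_exact(article, term):
    """True if `term` occurs as a whole word with exactly this casing (hypothetical helper)."""
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")
    text = "%s\n%s" % (article.get("title") or "", article.get("body") or "")
    return pattern.search(text) is not None


# Made-up sample data for illustration only.
articles = [
    {"title": "INTL shares rise", "body": "The stock gained 2%."},
    {"title": "Intl. travel rebounds", "body": "Airlines report more intl bookings."},
]

# Keep only the articles that mention the symbol in upper case.
kept = [a for a in articles if mentions_exact(a, "INTL")]
```

The query itself stays case-insensitive; the filter only discards results in which the term never appears with the requested casing.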

Warning for account deactivation

Hi,

I am currently using the Python news API of Event Registry and have used approximately 50% of my monthly tokens. When I perform searches, I get the following warning, so I stopped using the API to avoid my account being disabled. I just tried to use it again and the warning appeared after only 5 searches.
Do you know why this warning is happening and how I can fix it? I'd like to use the rest of my tokens today. (It is the last day of the month and the tokens expire afterwards.)
Edit: The keywords that I am using are not generic and sometimes it can happen that they don't return any articles. Thus, I assume the slow requests are not caused by having too broad or no keywords.
Edit 2: I have also added a 5-second timer between the query.execQuery() calls but I keep getting this warning. I am also counting the time that a query takes to be executed and it takes less than a second, even though the warning reports several seconds.

=========== WARNING ===========
The processing of the request took a lot of time (41 sec). By repeatedly making slow requests your account will be automatically disabled.
===============================
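
The client-side measurement described above (timing each call and pausing between calls) can be sketched as follows. `timed_call` and `run_spaced` are hypothetical helpers. The duration reported in the warning may be measured differently from the wall-clock time seen by the client; that is an assumption, not something confirmed in this thread.

```python
import time


def timed_call(func, *args, **kwargs):
    """Return (result, elapsed_seconds) for func(*args, **kwargs), measured on the client."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def run_spaced(calls, pause=5.0):
    """Run zero-argument callables in order, sleeping `pause` seconds between them.

    Returns the client-side duration of each call in seconds.
    """
    timings = []
    for index, call in enumerate(calls):
        if index > 0:
            time.sleep(pause)
        _, elapsed = timed_call(call)
        timings.append(elapsed)
    return timings
```

Logging these timings next to the warnings would show whether the slow requests line up with particular queries.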

Use of API Key through SDK?

Hey, there!

From my understanding, it is not possible to pass the API key generated on the website to the EventRegistry SDK, right? Unless I have it stored in a settings.json file in the same directory as the EventRegistry.py file, that is (which I only found out by reading the code's comments; I think this should also be in the wiki somewhere). This makes the SDK pretty useless if you intend to do more than 20 searches a day (in which case you'd need to use your API key), right?
I'd strongly recommend that you add a parameter to the EventRegistry.__init__() for the apiKey. Also, it would be great if the wiki mentioned the need for an API key, because I only found out about it once my 20 requests were done and I accidentally found the section of the control panel where you can find the API key.
Finally, it's not clear to me: is it possible to group terms in the keywords parameter? Can I do something like q = QueryArticles(keywords="'new york' baseball 'mets stadium'")?
Thanks!
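
Whether the keywords parameter accepts grouped phrases is not established in this thread. Independently of that, a quoted string like the one above can be split into separate phrases on the client with the standard library. `split_phrases` is a hypothetical helper; how the resulting list is then passed to the API (for instance via the `QueryItems.OR([...])` form used in another issue on this page) is an assumption.

```python
import shlex


def split_phrases(text):
    """Split a query string into terms, keeping quoted phrases together."""
    return shlex.split(text)


terms = split_phrases("'new york' baseball 'mets stadium'")
```

Here `terms` holds three entries: the two quoted phrases and the single word between them.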

Getting data only for last 10 days

I was able to get data only for the last 10 days. Is there any new restriction? I am able to browse through last month's data on the website, though. (I tried getting India vs England data for last month.) Please respond.
Thanks.

request() got an unexpected keyword argument 'json'

Hi,
I'm trying to use the EventRegistry API for the first time.
I receive this error:
"Event Registry exception while executing the request:
request() got an unexpected keyword argument 'json' "
when I try to execute a query.

Is this caused by a mistake in my use of the API?

Greetings
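
An error of this form is typically raised by older releases of the `requests` HTTP library, whose `request()` function gained the `json` keyword in version 2.4.2. That eventregistry relies on `requests` here is an assumption based on the error text. A quick check of the installed version can be sketched as follows; both helper names are hypothetical.

```python
from itertools import takewhile


def supports_json_kwarg(version):
    """True if a requests version string is at least 2.4.2 (hypothetical helper)."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits, so a piece like "0rc1" is read as 0.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts) >= (2, 4, 2)


def installed_requests_version():
    """Return the installed requests version string, or None if requests is missing."""
    try:
        import requests
    except ImportError:
        return None
    return requests.__version__
```

If `supports_json_kwarg(installed_requests_version())` is false for an installed version, upgrading with `pip install requests --upgrade` may resolve the error.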

Downloading Articles related to an Event

Hi,

I have been able to use a combination of QueryEvents along with ReturnInfo flags to get a list of events that occurred in a particular time period.

I have enabled the flag for stories; however, I am unable to find all the articles pertaining to a story, and I can only get the medoid article for that story.

Is there a way to get all the links to the articles about an event?

Find articles which exclude a specific location

Hi,
When I try to find articles in a specific location (for example, Italy), I do this:

q = QueryArticles(lang = ["ita"])
q.addLocation(er.getLocationUri("Italy"))

How can I instead find articles that exclude a specific location?

Greetings
