kotartemiy / pygooglenews

If Google News had a Python library

Home Page: https://newscatcherapi.com

License: MIT License

Python 100.00%

news google rss python data-science

pygooglenews's Issues

Could not parse your date

This is my code.

s=gn.search('energy digital transformation',helper=True,from_ =date1.strftime("%Y-%m-%d"), to_ =date2.strftime("%Y-%m-%d"))

The following result is obtained.
#Exception: Could not parse your date

Why can't it recognize the date?

error in feedparser setup command: use_2to3 is invalid.

I tried to install it
pip install pygooglenews

but I got this error

.
.
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
error in feedparser setup command: use_2to3 is invalid.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Suggestion: argument to specify number of returned articles

Hi there

I've just started using your pygooglenews library - it works nicely :)

I was wondering whether it's possible for an extra argument to be added to the function arguments to specify the number of returned news articles? Currently it's limited to 100 - I don't know whether that's a limit imposed by google or not...

No worries if it's not feasible.

Major issue with using from and to.

So, for the time being, it seems like dateparser itself is broken. Here's a stackoverflow thread detailing the issue, https://stackoverflow.com/questions/71498132/error-in-heroku-regex-regex-core-error-bad-escape-d-at-position-7-when-usin

To put it simply, whenever you try to use the from_ or to_ arguments, you get the error "error: bad escape \d at position 7", which is related to issues between regex and dateparser. I was able to fix it by rolling back regex to 2022.3.2, but you may want to find a more long term solution.

ImportError: cannot import name 'GoogleNews' from partially initialized module 'pygooglenews'

Hello, I'm having a problem running pygooglenews.
The error is as follows:

Traceback (most recent call last):

File "C:\pygooglenews.py", line 1, in <module>
from pygooglenews import GoogleNews

ImportError: cannot import name 'GoogleNews' from partially initialized module 'pygooglenews' (most likely due to a circular import) (C:\pygooglenews.py)

Here's my code :

from pygooglenews import GoogleNews
gn = GoogleNews()
search = gn.search('lockdown')
print(search)
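
The path at the end of the ImportError (C:\pygooglenews.py) suggests that the script has the same name as the package, so the import probably resolves to the script itself rather than the installed library; renaming the script is the usual fix. A minimal sketch of that check, where `shadows_package` is a hypothetical helper and not part of pygooglenews:

```python
from pathlib import PureWindowsPath


def shadows_package(script_path, package_name):
    """Return True if the script's file name (without .py) equals the package name."""
    return PureWindowsPath(script_path).stem == package_name


# Path taken from the traceback above.
clash = shadows_package(r"C:\pygooglenews.py", "pygooglenews")
# A renamed script (hypothetical name) no longer collides with the package.
no_clash = shadows_package(r"C:\news_demo.py", "pygooglenews")
print(clash, no_clash)
```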

Max Results

Hello, how can I limit my results to only 5 for a top news search?
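
No limit argument is mentioned anywhere in these issues, so one option is to slice the returned list. A minimal sketch, assuming the result is a dict with an `entries` list as in the other snippets on this page; `first_n` and `fake_feed` are illustrative names only:

```python
def first_n(feed, n):
    """Return at most n entries from a feed-like dict with an 'entries' list."""
    return feed["entries"][:n]


# Stand-in for the dict returned by gn.top_news(); real entries carry more fields.
fake_feed = {"entries": [{"title": f"Story {i}"} for i in range(1, 9)]}

top_five = first_n(fake_feed, 5)
print([entry["title"] for entry in top_five])
```

Slicing happens after the feed has been downloaded, so it trims the output but does not reduce what is fetched.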

Broken link in README

The newscatcherapi.com link in the README points to https://github.com/kotartemiy/pygooglenews/blob/master/newscatcherapi.com, which returns a 404.

Perhaps it ought to link to:

https://newscatcherapi.com/

https://github.com/kotartemiy/newscatcher.

Mix both topics and search

Hello!
I would say it's more of a question than an issue, sorry, but is there a way to mix or combine the topic_headlines and search methods? For example, looking for news on a specific topic but filtering it by some keyword search.

Thank you
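
One way to approximate this is to fetch the topic feed first and filter its entries locally. A minimal sketch, assuming each entry has a `title` field as in the other snippets on this page; `filter_by_keyword` is a hypothetical helper and the sample entries are made up:

```python
def filter_by_keyword(entries, keyword):
    """Keep entries whose title contains the keyword, ignoring case."""
    needle = keyword.casefold()
    return [entry for entry in entries if needle in entry["title"].casefold()]


# Stand-in for the 'entries' list of a topic feed.
sample_entries = [
    {"title": "Oil prices rise as markets open"},
    {"title": "New electric car unveiled"},
    {"title": "OIL exporters meet this week"},
]

matches = filter_by_keyword(sample_entries, "oil")
print([entry["title"] for entry in matches])
```

This only matches against the headline text, so it is narrower than a real search query.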

not getting full search results when date range includes pre-2020 and post-2020 parameters

The search by date range from_ & to_ don't seem to return the full results when including both pre-2020 and post-2020 date parameters.

For example:
search = gn.search(f"intitle:{ticker_name}", from_='2017-01-01', to_='2020-12-01')  # only the results for 2020 are returned

I have worked around this for now by splitting this into two queries:

first_search = gn.search(f"intitle:{ticker_name}", from_='2017-01-01', to_='2019-12-31')  # results returned as expected

second_search = gn.search(f"intitle:{ticker_name}", from_='2020-01-01', to_='2020-12-01')  # results returned as expected
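
The split can be generalised so that no query crosses a year boundary. A minimal sketch using only the standard library; `split_by_year` is a hypothetical helper, and splitting on every year (rather than only at 2020) is an assumption that goes slightly beyond the workaround above:

```python
import datetime


def split_by_year(start, end):
    """Split [start, end] into consecutive chunks that never cross a year boundary."""
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        year_end = datetime.date(chunk_start.year, 12, 31)
        chunk_end = min(year_end, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + datetime.timedelta(days=1)
    return chunks


chunks = split_by_year(datetime.date(2017, 1, 1), datetime.date(2020, 12, 1))
for chunk_start, chunk_end in chunks:
    # Each pair would feed one call, e.g.
    # gn.search(query, from_=chunk_start.isoformat(), to_=chunk_end.isoformat())
    print(chunk_start.isoformat(), chunk_end.isoformat())
```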

news subsections

Hello, how can I get the news subsections in business, such as 'personal finance'? Is there an option to do this?

link for reference
https://news.google.com/topics/CAAqKggKIiRDQkFTRlFvSUwyMHZNRGx6TVdZU0JXVnVMVWRDR2dKSFFpZ0FQAQ?hl=en-GB&gl=GB&ceid=GB%3Aen

Exception: Could not parse your date

I have this very simple code

gn = GoogleNews()

start = datetime.date(2018,3,1)

end = datetime.date(2019,3,1)

print(start)

gn.search(query="car", from_=start.strftime('%Y-%m-%d'), to_=end.strftime('%Y-%m-%d'))

but it's giving me the error of
`

AttributeError Traceback (most recent call last)
/opt/anaconda3/lib/python3.8/site-packages/pygooglenews/__init__.py in __from_to_helper(self, validate)
89 try:
---> 90 validate = parse_date(validate).strftime('%Y-%m-%d')
91 return str(validate)

/opt/anaconda3/lib/python3.8/site-packages/dateparser/conf.py in wrapper(*args, **kwargs)
84
---> 85 return f(*args, **kwargs)
86 return wrapper

/opt/anaconda3/lib/python3.8/site-packages/dateparser/__init__.py in parse(date_string, date_formats, languages, locales, region, settings)
52
---> 53 data = parser.get_date_data(date_string, date_formats)
54

/opt/anaconda3/lib/python3.8/site-packages/dateparser/date.py in get_date_data(self, date_string, date_formats)
416 for locale in self._get_applicable_locales(date_string):
--> 417 parsed_date = _DateLocaleParser.parse(
418 locale, date_string, date_formats, settings=self._settings)

/opt/anaconda3/lib/python3.8/site-packages/dateparser/date.py in parse(cls, locale, date_string, date_formats, settings)
193 instance = cls(locale, date_string, date_formats, settings)
--> 194 return instance._parse()
195

/opt/anaconda3/lib/python3.8/site-packages/dateparser/date.py in _parse(self)
197 for parser_name in self._settings.PARSERS:
--> 198 date_obj = self._parsers[parser_name]()
199 if self._is_valid_date_obj(date_obj):

/opt/anaconda3/lib/python3.8/site-packages/dateparser/date.py in _try_parser(self)
221 self._settings.DATE_ORDER = self.locale.info.get('date_order', _order)
--> 222 date_obj, period = date_parser.parse(
223 self._get_translated_date(), settings=self._settings)

/opt/anaconda3/lib/python3.8/site-packages/dateparser/conf.py in wrapper(*args, **kwargs)
84
---> 85 return f(*args, **kwargs)
86 return wrapper

/opt/anaconda3/lib/python3.8/site-packages/dateparser/date_parser.py in parse(self, date_string, settings)
36 stz = get_localzone()
---> 37 date_obj = stz.localize(date_obj)
38 else:

AttributeError: 'backports.zoneinfo.ZoneInfo' object has no attribute 'localize'

During handling of the above exception, another exception occurred:

Exception Traceback (most recent call last)
in
26 return stories
27
---> 28 df = pd.DataFrame(get_news('Banana'))

in get_news(search)
13
14 for date in date_list[:-1]:
---> 15 search = gn.search(search, from_=date.strftime('%Y-%m-%d'), to_=(date+delta).strftime('%Y-%m-%d'))
16 newsitem = search['entries']
17

/opt/anaconda3/lib/python3.8/site-packages/pygooglenews/__init__.py in search(self, query, helper, when, from_, to_, proxies, scraping_bee)
139
140 if from_ and not when:
--> 141 from_ = self.__from_to_helper(validate=from_)
142 query += ' after:' + from_
143
143

/opt/anaconda3/lib/python3.8/site-packages/pygooglenews/__init__.py in __from_to_helper(self, validate)
91 return str(validate)
92 except:
---> 93 raise Exception('Could not parse your date')
94
95

Exception: Could not parse your date
`

I would appreciate any help
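
Since the traceback shows the library appending `after:` plus the parsed date to the query, one way to sidestep the failing date parser is to format the dates yourself and put the operators straight into the query string. A minimal sketch, assuming Google News also accepts a matching `before:` operator; `build_dated_query` is a hypothetical helper, not part of pygooglenews:

```python
import datetime


def build_dated_query(query, start, end):
    """Append after:/before: operators to a query string.

    The dates are formatted with the standard library only,
    so dateparser is never involved.
    """
    if start > end:
        raise ValueError("start must not be later than end")
    return f"{query} after:{start:%Y-%m-%d} before:{end:%Y-%m-%d}"


start = datetime.date(2018, 3, 1)
end = datetime.date(2019, 3, 1)
dated_query = build_dated_query("car", start, end)
print(dated_query)
# The result would then be passed without from_/to_, e.g. gn.search(dated_query)
```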

does this work with python3 ?

My search results return fewer news than I expect, is this normal?

Thank you for the great tool! I would like to scrape large-scale news data from Google News. However, when I use the keyword 'covid' to get the response for 48 months, I get only 100 news items.
Is that normal? I don't think Google News has so little data related to the topic. Or does the API limit the number of results? Here is my code:

gn = GoogleNews()
search = gn.search("covid", when = '60m') # set the keyword

all_news = search['entries']

print("There are totally {} news".format(len(all_news)))

Combining geo with search?

Hi, great work, wondering if it's possible to combine geo() with searches? So for example
"~/geo/NY" and "apnews"? Thanks for any pointers you can give, in your library or generally.

'when' not working

q='Todd Hido' when='6m'

produces this url:

https://news.google.com/rss/search?q=Todd+Hido+when:6m&ceid=US:en&hl=en-US&gl=US

returns no results

yet, searching with this url

https://www.google.com/search?q=todd+hido&oq=todd+hido&tbm=nws

shows plenty of results within the last six months

removing the when produces results, but a different set of results to the web ui version
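
For debugging, it can help to rebuild the feed URL by hand and open it in a browser. A minimal sketch that reproduces the URL quoted above from its parts; `build_search_url` is a hypothetical helper, and the parameter layout is inferred from that single URL rather than from the library source:

```python
from urllib.parse import quote_plus


def build_search_url(query, when=None, lang="en", country="US"):
    """Build a Google News RSS search URL in the shape shown above."""
    if when:
        query = f"{query} when:{when}"
    # Keep ':' unescaped so operators such as when:6m stay readable.
    encoded = quote_plus(query, safe=":")
    return (
        "https://news.google.com/rss/search"
        f"?q={encoded}&ceid={country}:{lang}&hl={lang}-{country}&gl={country}"
    )


url_with_when = build_search_url("Todd Hido", when="6m")
url_without_when = build_search_url("Todd Hido")
print(url_with_when)
print(url_without_when)
```

Comparing the two URLs side by side makes it easier to see whether the `when:` operator alone is what empties the feed.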

Copy editing

Think you meant to write "Before we start, ... then I would advise you to use one of the 3 methods described below" instead of "above" in https://github.com/kotartemiy/pygooglenews#working-with-google-news-in-production. Cheers.

Only able to retrieve 100 links

Hi,
I tried to download some articles using pygooglenews but it only gives me 100 links. I put the from and to dates of 2020-01-01 and 2020-07-07. Please help.
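
Several reports on this page describe the same ceiling of about 100 entries per request. A common workaround is to run the search over several shorter date windows and merge the results. A minimal sketch of the merge step, assuming each entry has a `link` field; `merge_entries` is a hypothetical helper and the sample batches are made up:

```python
def merge_entries(batches):
    """Merge several lists of entries, keeping the first entry seen per link."""
    seen_links = set()
    merged = []
    for batch in batches:
        for entry in batch:
            link = entry["link"]
            if link not in seen_links:
                seen_links.add(link)
                merged.append(entry)
    return merged


# Stand-ins for the 'entries' lists of two searches over adjacent date windows.
january = [{"link": "https://example.com/a"}, {"link": "https://example.com/b"}]
february = [{"link": "https://example.com/b"}, {"link": "https://example.com/c"}]

all_entries = merge_entries([january, february])
print(len(all_entries))
```

Whether shorter windows actually surface more articles depends on how Google fills each feed, so the total is not guaranteed to grow.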

Could it be that the installation is broken?

$ pip install pygooglenews  --upgrade
Collecting pygooglenews
  WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/84/9e/893c2336f2faa6fa96b0f86b794cccb99f4b090a5c62a61d3eeee594acff/pygooglenews-0.1.1-py3-none-any.whl
  WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/84/9e/893c2336f2faa6fa96b0f86b794cccb99f4b090a5c62a61d3eeee594acff/pygooglenews-0.1.1-py3-none-any.whl
  WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/84/9e/893c2336f2faa6fa96b0f86b794cccb99f4b090a5c62a61d3eeee594acff/pygooglenews-0.1.1-py3-none-any.whl
  WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/84/9e/893c2336f2faa6fa96b0f86b794cccb99f4b090a5c62a61d3eeee594acff/pygooglenews-0.1.1-py3-none-any.whl
  WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/84/9e/893c2336f2faa6fa96b0f86b794cccb99f4b090a5c62a61d3eeee594acff/pygooglenews-0.1.1-py3-none-any.whl
ERROR: Could not install packages due to an EnvironmentError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Max retries exceeded with url: /packages/84/9e/893c2336f2faa6fa96b0f86b794cccb99f4b090a5c62a61d3eeee594acff/pygooglenews-0.1.1-py3-none-any.whl (Caused by ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)"))

$ ping files.pythonhosted.org
PING dualstack.r.ssl.global.fastly.net (151.101.1.63) 56(84) bytes of data.
64 bytes from 151.101.1.63 (151.101.1.63): icmp_seq=1 ttl=55 time=18.2 ms
64 bytes from 151.101.1.63 (151.101.1.63): icmp_seq=2 ttl=55 time=25.5 ms
64 bytes from 151.101.1.63 (151.101.1.63): icmp_seq=3 ttl=55 time=9.44 ms
64 bytes from 151.101.1.63 (151.101.1.63): icmp_seq=4 ttl=55 time=10.0 ms
64 bytes from 151.101.1.63 (151.101.1.63): icmp_seq=5 ttl=55 time=36.3 ms
^C
--- dualstack.r.ssl.global.fastly.net ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 9.449/19.928/36.369/10.125 ms

Unable to install: error in feedparser setup command: use_2to3 is invalid.

I got this error while downloading package using pip3 install pygooglenews :

Collecting feedparser<6.0.0,>=5.2.1
  Using cached feedparser-5.2.1.zip (1.2 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      error in feedparser setup command: use_2to3 is invalid.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Tried with pip3 install feedparser but no results

Package Errors

Hey y'all, I imported the packages properly, but for some reason I keep getting the following errors. Any suggestions?

I'm running the following code:

from pygooglenews import GoogleNews
import json
import time

gn = GoogleNews()
top = gn.top_news()

entries = top["entries"]
count = 0
for entry in entries:
  count = count + 1
  print(
    str(count) + ". " + entry["title"] + entry["published"]
  )
  time.sleep(0.25)

Exception: Could not parse your date

AttributeError: 'zoneinfo.ZoneInfo' object has no attribute 'localize'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File ~\Documents\python class\pygooglenews.py:16 in <module>
s = gn.search( query="NVIDIA" , from_ = '2024-04-01', to_ = '2024-04-02' )

File ~\anaconda3\lib\site-packages\pygooglenews\__init__.py:141 in search
from_ = self.__from_to_helper(validate=from_)

File ~\anaconda3\lib\site-packages\pygooglenews\__init__.py:93 in __from_to_helper
raise Exception('Could not parse your date')

The code works fine without from_ and to_, but it can't parse the date.

Top headlines by date

Can we get top headlines within a date range?
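
Nothing on this page shows a date argument for top headlines, so one option is to filter the returned entries by their publication time. A minimal sketch, assuming each entry's `published` field is an RFC 822 date string as is usual for RSS; `published_between` is a hypothetical helper and the sample entries are made up:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def published_between(entries, start, end):
    """Keep entries whose 'published' time falls in the half-open range [start, end)."""
    kept = []
    for entry in entries:
        published = parsedate_to_datetime(entry["published"])
        if start <= published < end:
            kept.append(entry)
    return kept


# Stand-in for the 'entries' list of a top news feed.
sample = [
    {"title": "Old", "published": "Mon, 01 Apr 2024 08:00:00 GMT"},
    {"title": "New", "published": "Tue, 02 Apr 2024 09:30:00 GMT"},
]

window_start = datetime(2024, 4, 2, tzinfo=timezone.utc)
window_end = datetime(2024, 4, 3, tzinfo=timezone.utc)
recent = published_between(sample, window_start, window_end)
print([entry["title"] for entry in recent])
```

The same function works for hourly windows, since the bounds are full datetimes. It can only narrow what the feed already contains; it cannot retrieve older headlines.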

Can I obtain intraday data by hour?

error using ScrapingBee

When adding ScrapingBee API key the function gives an error:

Exception: ScrapingBee status_code: 500 {"error": "If you wish to scrape Google, use the custom_google=True parameter. ! Each requests will costs 20 credits !"}

Code Is Broken Due To Google Changes?

Attempting to run some simple text queries for a project however I don't seem to get proper results anymore, is this library broken?

AttributeError at importation

(module 'base64' has no attribute 'decodestring') This method was removed in Python 3.9 and replaced by decodebytes.
This bug makes it impossible to import the library.

i Can't Install it!!!

I tried pip install pygooglenews in cmd, and also in the Visual Studio Code terminal, but it gives me this error every time:

error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      error in feedparser setup command: use_2to3 is invalid.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

i tried different methods, such as to upgrade pip, pip3, and various solutions people propose here: facebook/prophet#418

but nothing works....

If someone has any help, it would be helpful!
Thanks

is rss still working on google news?

First of all, I would like to thank you for sharing this great library! 💯

I am having trouble with getting any kind of RSS feed in google news. When using query search, the response is an empty list []

from pygooglenews import GoogleNews

gn = GoogleNews()
search = gn.search('lockdown')
search['entries']

Also, adding /rss/ after news.google.com/ in the browser shows no results for a query search,

while top_news and topics seem to work fine, both with pygooglenews and when adding /rss/ in the browser.

Is this something related to news.google itself? Is there some workaround?

dependencies question

hello, i got this readout on docker build...
pygooglenews 0.1.2 requires beautifulsoup4<5.0.0,>=4.9.1, but you'll have beautifulsoup4 4.8.1 which is incompatible.
pygooglenews 0.1.2 requires requests<3.0.0,>=2.24.0, but you'll have requests 2.21.0 which is incompatible.

when I tested in development everything worked fine... may I ask is it possible it can still work with this requests and beautifulsoup version? or maybe it only fully works with your required ones but mostly works otherwise with the versions I have?
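
To see which installed versions fall outside the declared ranges, the constraints from the warning can be checked by hand. A minimal sketch that handles plain dotted numeric versions only; `parse_version` and `in_range` are hypothetical helpers, and a real project would more likely rely on a dedicated version-parsing library:

```python
def parse_version(text):
    """Turn a dotted numeric version such as '2.24.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_range(installed, minimum, upper_bound):
    """Check minimum <= installed < upper_bound using tuple comparison."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(upper_bound)


# Versions and ranges taken from the pip warning above.
requests_ok = in_range("2.21.0", "2.24.0", "3.0.0")
bs4_ok = in_range("4.8.1", "4.9.1", "5.0.0")
print(requests_ok, bs4_ok)
```

Falling outside the range does not prove that the library will break, only that the combination is outside what the package declares.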

Can't install - feedparser dependency issue

This project is unmaintained. If you need a workaround, you can force the dependencies to be a specific version like this:

python -m pip install "beautifulsoup4==4.9.1"
python -m pip install "dateparser==0.7.6"
python -m pip install "requests==2.24.0"
python -m pip install "feedparser==6.0.8"
python -m pip install --no-deps pygooglenews

This should work. (If it doesn't, check if the versions have changed in pyproject.toml for some reason)

error in installing pygooglenews

I have tried installing the library, but the installation fails. I ran it in Colab, on Windows 11.

!pip install pygooglenews --upgrade

Collecting pygooglenews
Using cached pygooglenews-0.1.2-py3-none-any.whl (10 kB)
Requirement already satisfied: beautifulsoup4<5.0.0,>=4.9.1 in /usr/local/lib/python3.10/dist-packages (from pygooglenews) (4.11.2)
Collecting dateparser<0.8.0,>=0.7.6 (from pygooglenews)
Using cached dateparser-0.7.6-py2.py3-none-any.whl (362 kB)
Collecting feedparser<6.0.0,>=5.2.1 (from pygooglenews)
Using cached feedparser-5.2.1.zip (1.2 MB)
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

It seems not to work with lang "pt", even though it is shown as available

I even tested in the API service, it returns:

"message":"[lang] parameter's form is not correct/not supported language"

base 64 deprecation

This package depends on feedparser, which has an error in this section of feedparser.py (line 91):

# Python 3.1 deprecates decodestring in favor of decodebytes
_base64decode = getattr(base64, 'decodebytes', base64.decodestring)

This has been deprecated since Python 3.1, but it was completely removed in Python 3.9. This makes it unable to import.
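
The quoted line fails because the fallback argument `base64.decodestring` is evaluated before `getattr` runs, even though `decodebytes` exists. A stopgap sometimes used for this kind of breakage is to restore the old name before the affected module is imported. A minimal sketch under that assumption; upgrading to a newer feedparser is likely the more durable fix:

```python
import base64

# Restore the removed alias so that old code referencing
# base64.decodestring can still be imported on Python 3.9+.
if not hasattr(base64, "decodestring"):
    base64.decodestring = base64.decodebytes

decoded = base64.decodestring(b"aGVsbG8=\n")
print(decoded)
```

The shim has to run before the first import of the library that depends on the old feedparser, otherwise the AttributeError is raised first.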

Error installing the package

Hello, I cannot install pygooglenews

Simply running pip install pygooglenews doesn't work, neither does installing setuptools

Collecting pygooglenews
  Using cached pygooglenews-0.1.2-py3-none-any.whl (10 kB)
Requirement already satisfied: beautifulsoup4<5.0.0,>=4.9.1 in /home/gitpod/.pyenv/versions/3.12.1/lib/python3.12/site-packages (from pygooglenews) (4.12.2)
Collecting dateparser<0.8.0,>=0.7.6 (from pygooglenews)
  Using cached dateparser-0.7.6-py2.py3-none-any.whl (362 kB)
Collecting feedparser<6.0.0,>=5.2.1 (from pygooglenews)
  Using cached feedparser-5.2.1.zip (1.2 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [11 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 14, in <module>
        File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/setuptools/__init__.py", line 16, in <module>
          import setuptools.version
        File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/setuptools/version.py", line 1, in <module>
          import pkg_resources
        File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/pkg_resources/__init__.py", line 2158, in <module>
          register_finder(pkgutil.ImpImporter, find_on_path)
                          ^^^^^^^^^^^^^^^^^^^
      AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Is there a way to pull "thumbnail" images from the posts as well?

I am wanting to find out if there is a way to pull the "thumbnail" image from each article?

I feel like that's the only thing this library is missing!!

Thanks so much in advance to anyone who can answer!

Add list of other country

I noticed when I checked for my home country it was still showing American news.

Having more than 100 results from a "search"

Hi,
I'm trying to gather all the results around a specific topic, I cannot get more than 100 results.
Can it be changed ?
Cheers