kotartemiy / pygooglenews
If Google News had a Python library
Home Page: https://newscatcherapi.com
License: MIT License
This is my code.
The following result is obtained.
#Exception: Could not parse your date
Why can't it recognize the date?
I tried to install it
pip install pygooglenews
.
.
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
error in feedparser setup command: use_2to3 is invalid.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Hi there
I've just started using your pygooglenews library - it works nicely :)
I was wondering whether an extra argument could be added to the function to specify the number of returned news articles? Currently it's limited to 100, and I don't know whether that's a limit imposed by Google or not...
No worries if it's not feasible.
D
So, for the time being, it seems like dateparser itself is broken. Here's a Stack Overflow thread detailing the issue: https://stackoverflow.com/questions/71498132/error-in-heroku-regex-regex-core-error-bad-escape-d-at-position-7-when-usin
To put it simply, whenever you try to use the from_ or to_ arguments, you get the error "error: bad escape \d at position 7", which is related to issues between regex and dateparser. I was able to fix it by rolling back regex to 2022.3.2, but you may want to find a longer-term solution.
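A possible stopgap that sidesteps the date parser, sketched under the assumption that Google News honours the after: and before: search operators when they are written directly into the query. build_dated_query is a hypothetical helper, not part of pygooglenews:

```python
from datetime import date


def build_dated_query(query: str, start: date, end: date) -> str:
    """Append Google-style after:/before: operators to a search query."""
    return f"{query} after:{start.isoformat()} before:{end.isoformat()}"


query = build_dated_query("car", date(2018, 3, 1), date(2019, 3, 1))
print(query)  # car after:2018-03-01 before:2019-03-01

# Assumed usage: pass the finished string as the plain query and leave
# from_ and to_ unset, so the library's date helper is skipped.
# gn.search(query)
```

The dates are formatted with the standard library only, so neither dateparser nor regex is involved in building the string.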
Traceback (most recent call last):
File "C:\pygooglenews.py", line 1, in
from pygooglenews import GoogleNews
Here's my code :
from pygooglenews import GoogleNews
gn = GoogleNews()
search = gn.search('lockdown')
print(search)
Hello, how can I limit a top news search to only 5 results?
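Nothing in this thread shows a built-in limit argument, so one simple option is to slice the entries list after the call. A minimal sketch with a stand-in feed; first_n is a hypothetical helper:

```python
def first_n(feed: dict, n: int = 5) -> list:
    """Return at most n entries from a parsed feed dictionary."""
    return feed.get("entries", [])[:n]


# Stand-in for the result of gn.top_news(); real entries carry more fields.
sample_feed = {"entries": [{"title": f"Story {i}"} for i in range(1, 9)]}

for entry in first_n(sample_feed, 5):
    print(entry["title"])
```

With the real library this would presumably be first_n(gn.top_news(), 5).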
The newscatcherapi.com link in the README points to https://github.com/kotartemiy/pygooglenews/blob/master/newscatcherapi.com, which returns a 404.
Perhaps it ought to link to:
or
Hello!
I would say it's more of a question than an issue, sorry, but is there a way to mix or combine the topic_headlines and search methods? For example, looking for news on a specific topic, but filtering it by a keyword search.
Thank you
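One way to approximate this on the client side, assuming each entry exposes a "title" field as in the snippets elsewhere in this thread: fetch the topic headlines, then keep only the entries whose title mentions the keyword. filter_by_keyword is a hypothetical helper, and the 'business' topic argument in the comment is an assumption:

```python
def filter_by_keyword(entries, keyword):
    """Keep entries whose title contains the keyword, ignoring case."""
    needle = keyword.lower()
    return [e for e in entries if needle in e.get("title", "").lower()]


# Stand-in for gn.topic_headlines('business')['entries'].
sample_entries = [
    {"title": "Markets rally as lockdown eases"},
    {"title": "Tech earnings beat forecasts"},
]

print(filter_by_keyword(sample_entries, "Lockdown"))
```

This only narrows what the topic feed already returned; it does not ask Google for a combined topic-and-keyword search.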
The date-range search with from_ and to_ doesn't seem to return the full results when the range includes both pre-2020 and post-2020 dates.
For example:
search = gn.search(f"intitle:{ticker_name}", from_='2017-01-01', to_='2020-12-01')  # only the results for 2020 are returned
I have worked around this for now by splitting this into two queries:
first_search = gn.search(f"intitle:{ticker_name}", from_='2017-01-01', to_='2019-12-31')  # results returned as expected
second_search = gn.search(f"intitle:{ticker_name}", from_='2020-01-01', to_='2020-12-01')  # results returned as expected
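The same splitting idea can be generalised. The sketch below cuts a range into per-year pieces, which is finer than the two-query split above but avoids hard-coding the boundary. split_by_year is a hypothetical helper, and the search call in the comment is assumed usage:

```python
from datetime import date


def split_by_year(start: date, end: date) -> list:
    """Split [start, end] into consecutive ranges that never cross a year boundary."""
    ranges = []
    current = start
    while current <= end:
        stop = min(date(current.year, 12, 31), end)
        ranges.append((current.isoformat(), stop.isoformat()))
        current = date(current.year + 1, 1, 1)
    return ranges


for from_, to_ in split_by_year(date(2017, 1, 1), date(2020, 12, 1)):
    print(from_, to_)
    # Assumed usage: gn.search(query, from_=from_, to_=to_)
```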
Hello, how can I get the news subsections in Business, such as 'Personal Finance'? Is there an option to do this?
link for reference
https://news.google.com/topics/CAAqKggKIiRDQkFTRlFvSUwyMHZNRGx6TVdZU0JXVnVMVWRDR2dKSFFpZ0FQAQ?hl=en-GB&gl=GB&ceid=GB%3Aen
I have this very simple code
import datetime
from pygooglenews import GoogleNews
gn = GoogleNews()
start = datetime.date(2018,3,1)
end = datetime.date(2019,3,1)
print(start)
gn.search(query="car", from_=start.strftime('%Y-%m-%d'), to_=end.strftime('%Y-%m-%d'))
AttributeError Traceback (most recent call last)
/opt/anaconda3/lib/python3.8/site-packages/pygooglenews/__init__.py in __from_to_helper(self, validate)
89 try:
---> 90 validate = parse_date(validate).strftime('%Y-%m-%d')
91 return str(validate)
/opt/anaconda3/lib/python3.8/site-packages/dateparser/conf.py in wrapper(*args, **kwargs)
84
---> 85 return f(*args, **kwargs)
86 return wrapper
/opt/anaconda3/lib/python3.8/site-packages/dateparser/__init__.py in parse(date_string, date_formats, languages, locales, region, settings)
52
---> 53 data = parser.get_date_data(date_string, date_formats)
54
/opt/anaconda3/lib/python3.8/site-packages/dateparser/date.py in get_date_data(self, date_string, date_formats)
416 for locale in self._get_applicable_locales(date_string):
--> 417 parsed_date = _DateLocaleParser.parse(
418 locale, date_string, date_formats, settings=self._settings)
/opt/anaconda3/lib/python3.8/site-packages/dateparser/date.py in parse(cls, locale, date_string, date_formats, settings)
193 instance = cls(locale, date_string, date_formats, settings)
--> 194 return instance._parse()
195
/opt/anaconda3/lib/python3.8/site-packages/dateparser/date.py in _parse(self)
197 for parser_name in self._settings.PARSERS:
--> 198 date_obj = self._parsers[parser_name]()
199 if self._is_valid_date_obj(date_obj):
/opt/anaconda3/lib/python3.8/site-packages/dateparser/date.py in _try_parser(self)
221 self._settings.DATE_ORDER = self.locale.info.get('date_order', _order)
--> 222 date_obj, period = date_parser.parse(
223 self._get_translated_date(), settings=self._settings)
/opt/anaconda3/lib/python3.8/site-packages/dateparser/conf.py in wrapper(*args, **kwargs)
84
---> 85 return f(*args, **kwargs)
86 return wrapper
/opt/anaconda3/lib/python3.8/site-packages/dateparser/date_parser.py in parse(self, date_string, settings)
36 stz = get_localzone()
---> 37 date_obj = stz.localize(date_obj)
38 else:
AttributeError: 'backports.zoneinfo.ZoneInfo' object has no attribute 'localize'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
in
26 return stories
27
---> 28 df = pd.DataFrame(get_news('Banana'))
in get_news(search)
13
14 for date in date_list[:-1]:
---> 15 search = gn.search(search, from_=date.strftime('%Y-%m-%d'), to_=(date+delta).strftime('%Y-%m-%d'))
16 newsitem = search['entries']
17
/opt/anaconda3/lib/python3.8/site-packages/pygooglenews/__init__.py in search(self, query, helper, when, from_, to_, proxies, scraping_bee)
139
140 if from_ and not when:
--> 141 from_ = self.__from_to_helper(validate=from_)
142 query += ' after:' + from_
143
/opt/anaconda3/lib/python3.8/site-packages/pygooglenews/__init__.py in __from_to_helper(self, validate)
91 return str(validate)
92 except:
---> 93 raise Exception('Could not parse your date')
94
95
Exception: Could not parse your date
I would appreciate any help
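The AttributeError in the traceback suggests that get_localzone() handed back a zoneinfo-style timezone, while the parsing code calls localize, a method that pytz timezones provide. The sketch below only illustrates that mismatch; it is not a patch from dateparser, attach_timezone is a hypothetical helper, and it uses the standard-library zoneinfo (Python 3.9+) rather than the backports package seen in the traceback:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library from Python 3.9


def attach_timezone(naive: datetime, tz) -> datetime:
    """Attach tz to a naive datetime, whichever timezone flavour tz is."""
    if hasattr(tz, "localize"):
        # pytz-style timezone objects
        return tz.localize(naive)
    # zoneinfo and datetime.timezone objects
    return naive.replace(tzinfo=tz)


print(hasattr(ZoneInfo, "localize"))  # False
aware = attach_timezone(datetime(2018, 3, 1), timezone.utc)
print(aware.isoformat())  # 2018-03-01T00:00:00+00:00
```

Because the failure happens inside the date helper, a search without from_ and to_ never reaches this code path.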
Thank you for the great tool! I would like to scrape large-scale news data from Google News. However, when I use the keyword 'covid' to get the response for 48 months, I get only 100 news items.
Is that normal? I don't think Google News has so little data related to the topic. Or does the API limit the number of responses? Here is my code:
from pygooglenews import GoogleNews
gn = GoogleNews()
search = gn.search("covid", when = '60m') # set the keyword
all_news = search['entries']
print("There are totally {} news".format(len(all_news)))
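Several reports here mention the same cap of 100 entries per call. If the cap applies per request, one workaround is to query many short date windows and merge the results. This is a sketch under that assumption; collect_entries and fake_fetch are hypothetical names, and the real fetcher in the comment is assumed usage:

```python
from datetime import date, timedelta


def collect_entries(fetch, start: date, end: date, step_days: int = 7) -> list:
    """Call fetch(from_, to_) per window and merge results, dropping duplicate links."""
    seen = set()
    merged = []
    current = start
    while current <= end:
        stop = min(current + timedelta(days=step_days - 1), end)
        for entry in fetch(current.isoformat(), stop.isoformat()):
            link = entry.get("link")
            if link not in seen:
                seen.add(link)
                merged.append(entry)
        current = stop + timedelta(days=1)
    return merged


# Stand-in fetcher. Assumed real one:
# lambda f, t: gn.search("covid", from_=f, to_=t)["entries"]
def fake_fetch(from_, to_):
    return [
        {"link": "https://example.com/shared"},
        {"link": f"https://example.com/{from_}"},
    ]


print(len(collect_entries(fake_fetch, date(2020, 1, 1), date(2020, 1, 21))))  # 4
```

Shorter windows mean more requests, so it may be worth pausing between calls.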
Hi, great work, wondering if it's possible to combine geo() with searches? So for example
"~/geo/NY" and "apnews"? Thanks for any pointers you can give, in your library or generally.
q='Todd Hido' when='6m'
produces this url:
https://news.google.com/rss/search?q=Todd+Hido+when:6m&ceid=US:en&hl=en-US&gl=US
returns no results
yet, searching with this url
https://www.google.com/search?q=todd+hido&oq=todd+hido&tbm=nws
shows plenty of results within the last six months
Removing the when parameter produces results, but a different set of results from the web UI version.
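To test variations of the query outside the library, the RSS URL from this report can be rebuilt by hand. The sketch mirrors the shape of the URL quoted above; it is not necessarily how pygooglenews assembles it, and build_search_url is a hypothetical helper:

```python
from urllib.parse import quote_plus


def build_search_url(query: str, when: str = "", lang: str = "en", country: str = "US") -> str:
    """Rebuild the RSS search URL in the shape shown in the report."""
    full_query = f"{query} when:{when}" if when else query
    # Keep ':' unescaped so the when: operator stays readable in the URL.
    encoded = quote_plus(full_query, safe=":")
    return (
        "https://news.google.com/rss/search"
        f"?q={encoded}&ceid={country}:{lang}&hl={lang}-{country}&gl={country}"
    )


print(build_search_url("Todd Hido", when="6m"))
```

Opening the printed URL in a browser, with and without the when part, makes it easier to see whether the empty result comes from Google's feed or from the library.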
Think you meant to write "Before we start, ... then I would advise you to use one of the 3 methods described below" instead of "above" in https://github.com/kotartemiy/pygooglenews#working-with-google-news-in-production. Cheers.
Hi,
I tried to download some articles using pygooglenews but it only gives me 100 links. I put the from and to dates of 2020-01-01 and 2020-07-07. Please help.
$ pip install pygooglenews --upgrade
Collecting pygooglenews
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/84/9e/893c2336f2faa6fa96b0f86b794cccb99f4b090a5c62a61d3eeee594acff/pygooglenews-0.1.1-py3-none-any.whl
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/84/9e/893c2336f2faa6fa96b0f86b794cccb99f4b090a5c62a61d3eeee594acff/pygooglenews-0.1.1-py3-none-any.whl
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/84/9e/893c2336f2faa6fa96b0f86b794cccb99f4b090a5c62a61d3eeee594acff/pygooglenews-0.1.1-py3-none-any.whl
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/84/9e/893c2336f2faa6fa96b0f86b794cccb99f4b090a5c62a61d3eeee594acff/pygooglenews-0.1.1-py3-none-any.whl
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/84/9e/893c2336f2faa6fa96b0f86b794cccb99f4b090a5c62a61d3eeee594acff/pygooglenews-0.1.1-py3-none-any.whl
ERROR: Could not install packages due to an EnvironmentError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Max retries exceeded with url: /packages/84/9e/893c2336f2faa6fa96b0f86b794cccb99f4b090a5c62a61d3eeee594acff/pygooglenews-0.1.1-py3-none-any.whl (Caused by ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)"))
$ ping files.pythonhosted.org
PING dualstack.r.ssl.global.fastly.net (151.101.1.63) 56(84) bytes of data.
64 bytes from 151.101.1.63 (151.101.1.63): icmp_seq=1 ttl=55 time=18.2 ms
64 bytes from 151.101.1.63 (151.101.1.63): icmp_seq=2 ttl=55 time=25.5 ms
64 bytes from 151.101.1.63 (151.101.1.63): icmp_seq=3 ttl=55 time=9.44 ms
64 bytes from 151.101.1.63 (151.101.1.63): icmp_seq=4 ttl=55 time=10.0 ms
64 bytes from 151.101.1.63 (151.101.1.63): icmp_seq=5 ttl=55 time=36.3 ms
^C
--- dualstack.r.ssl.global.fastly.net ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 9.449/19.928/36.369/10.125 ms
I got this error while downloading the package using pip3 install pygooglenews:
Collecting feedparser<6.0.0,>=5.2.1
Using cached feedparser-5.2.1.zip (1.2 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
error in feedparser setup command: use_2to3 is invalid.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Tried with pip3 install feedparser, but no results.
Hey y'all, I properly imported the packages, but for some reason I keep getting the following errors. Any suggestions?
I'm running the following code:
from pygooglenews import GoogleNews
import json
import time
gn = GoogleNews()
top = gn.top_news()
entries = top["entries"]
count = 0
for entry in entries:
    count = count + 1
    print(
        str(count) + ". " + entry["title"] + entry["published"]
    )
    time.sleep(0.25)
AttributeError: 'zoneinfo.ZoneInfo' object has no attribute 'localize'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ~\Documents\python class\pygooglenews.py:16 in
s = gn.search( query="NVIDIA" , from_ = '2024-04-01', to_ = '2024-04-02' )
File ~\anaconda3\lib\site-packages\pygooglenews\__init__.py:141 in search
from_ = self.__from_to_helper(validate=from_)
File ~\anaconda3\lib\site-packages\pygooglenews\__init__.py:93 in __from_to_helper
raise Exception('Could not parse your date')
The code works fine without from_ and to_, but it can't parse the date.
Can we get top headlines within a date range?
When adding ScrapingBee API key the function gives an error:
Exception: ScrapingBee status_code: 500 {"error": "If you wish to scrape Google, use the custom_google=True parameter. ! Each requests will costs 20 credits !"}
I'm attempting to run some simple text queries for a project, but I don't seem to get proper results anymore. Is this library broken?
module 'base64' has no attribute 'decodestring': this method was removed in Python 3.9 and replaced by decodebytes.
This bug makes it impossible to import the library.
I tried pip install pygooglenews in cmd, and also in the Visual Studio Code terminal, but it gives me this error every time:
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
error in feedparser setup command: use_2to3 is invalid.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
I tried different methods, such as upgrading pip and pip3, and various solutions people propose here: facebook/prophet#418
but nothing works.
If someone can help, it would be much appreciated!
Thanks
First of all, I would like to thank you for sharing this great library! 💯
I am having trouble getting any kind of RSS feed from Google News. When using a query search, the response is an empty list []
from pygooglenews import GoogleNews
gn = GoogleNews()
search = gn.search('lockdown')
search['entries']
Also, adding /rss/ after news.google.com/ in the browser shows no results for a query search,
while top_news and topics seem to work fine, both with pygooglenews and when adding /rss/ in the browser.
Is this related to news.google itself? Is there some workaround?
hello, i got this readout on docker build...
pygooglenews 0.1.2 requires beautifulsoup4<5.0.0,>=4.9.1, but you'll have beautifulsoup4 4.8.1 which is incompatible.
pygooglenews 0.1.2 requires requests<3.0.0,>=2.24.0, but you'll have requests 2.21.0 which is incompatible.
When I tested in development everything worked fine. Is it possible it can still work with these requests and beautifulsoup versions? Or does it only fully work with the required ones, and mostly work otherwise with the versions I have?
This project is unmaintained. If you need a workaround, you can force the dependencies to be a specific version like this:
python -m pip install "beautifulsoup4==4.9.1"
python -m pip install "dateparser==0.7.6"
python -m pip install "requests==2.24.0"
python -m pip install "feedparser==6.0.8"
python -m pip install --no-deps pygooglenews
This should work. (If it doesn't, check whether the versions have changed in pyproject.toml for some reason.)
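After forcing versions like this, it can help to confirm what actually ended up installed. A minimal sketch using the standard library's importlib.metadata; the pins are copied from the commands above, and find_mismatches is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the workaround commands above.
PINS = {
    "beautifulsoup4": "4.9.1",
    "dateparser": "0.7.6",
    "requests": "2.24.0",
    "feedparser": "6.0.8",
}


def find_mismatches(pins: dict, lookup=version) -> dict:
    """Map each package whose installed version differs from its pin to what was found."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


print(find_mismatches(PINS))
```

An empty dictionary means every pinned package matches; anything else lists the package with the version that was found, or None if it is missing.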
I have tried installing the library but the installation fails. I tried it on Colab and on Windows 11.
!pip install pygooglenews --upgrade
Collecting pygooglenews
Using cached pygooglenews-0.1.2-py3-none-any.whl (10 kB)
Requirement already satisfied: beautifulsoup4<5.0.0,>=4.9.1 in /usr/local/lib/python3.10/dist-packages (from pygooglenews) (4.11.2)
Collecting dateparser<0.8.0,>=0.7.6 (from pygooglenews)
Using cached dateparser-0.7.6-py2.py3-none-any.whl (362 kB)
Collecting feedparser<6.0.0,>=5.2.1 (from pygooglenews)
Using cached feedparser-5.2.1.zip (1.2 MB)
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Preparing metadata (setup.py) ... error
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
I even tested in the API service, it returns:
"message":"[lang] parameter's form is not correct/not supported language"
This package depends on feedparser, which has an error in this section of feedparser.py (line 91):
# Python 3.1 deprecates decodestring in favor of decodebytes
_base64decode = getattr(base64, 'decodebytes', base64.decodestring)
decodestring has been deprecated since Python 3.1 and was completely removed in Python 3.9, so the module can no longer be imported on 3.9 and later.
Hello, I cannot install pygooglenews.
Simply running pip install pygooglenews doesn't work, and neither does installing setuptools.
Collecting pygooglenews
Using cached pygooglenews-0.1.2-py3-none-any.whl (10 kB)
Requirement already satisfied: beautifulsoup4<5.0.0,>=4.9.1 in /home/gitpod/.pyenv/versions/3.12.1/lib/python3.12/site-packages (from pygooglenews) (4.12.2)
Collecting dateparser<0.8.0,>=0.7.6 (from pygooglenews)
Using cached dateparser-0.7.6-py2.py3-none-any.whl (362 kB)
Collecting feedparser<6.0.0,>=5.2.1 (from pygooglenews)
Using cached feedparser-5.2.1.zip (1.2 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [11 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 14, in <module>
File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/setuptools/__init__.py", line 16, in <module>
import setuptools.version
File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/setuptools/version.py", line 1, in <module>
import pkg_resources
File "/workspace/.pyenv_mirror/user/current/lib/python3.12/site-packages/pkg_resources/__init__.py", line 2158, in <module>
register_finder(pkgutil.ImpImporter, find_on_path)
^^^^^^^^^^^^^^^^^^^
AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
I would like to find out whether there is a way to pull the "thumbnail" image from each article.
I feel like that's the only thing this library is missing!!
Thanks so much in advance to anyone who can answer!
I noticed when I checked for my home country it was still showing American news.
Hi,
I'm trying to gather all the results around a specific topic, but I cannot get more than 100 results.
Can it be changed?
Cheers