jmoiron / speedparser

feedparser but faster and worse

License: MIT License

Python 100.00%

speedparser's Issues

Update pip version

Update the PyPI release to reflect the changes in master, such as encoding detection with parse(..., encoding=True):

https://pypi.python.org/pypi/speedparser

ImportError: cannot import name 'parse'

I follow the tutorial exactly, but I get this error on Python 3.4.

Using legacy feedparser date parsing breaks on this feed.

http://www.theprizefinder.com/feed/top-prizes

Not exactly sure on the best approach here... I don't want to uninstall feedparser since I use it when parsing fails catastrophically, but I also don't really want to use the broken date stuff. Perhaps we could optionally force feedparser compat instead of using actual feedparser?

Does not work with http://news.ycombinator.com/rss

>>> import speedparser
>>> feed = "http://news.ycombinator.com/rss"
>>> speedparser.parse(feed)
{'bozo_tb': 'Traceback (most recent call last):\n  File "/home/kir/.virtualenvs/rss/lib/python3.5/site-packages/speedparser/speedparser.py", line 688, in parse\n    parser = SpeedParser(document, cleaner, unix_timestamp, encoding)\n  File "/home/kir/.virtualenvs/rss/lib/python3.5/site-packages/speedparser/speedparser.py", line 585, in __init__\n    self.tree = tree.getroottree()\nAttributeError: \'NoneType\' object has no attribute \'getroottree\'\n', 'bozo_exception': AttributeError("'NoneType' object has no attribute 'getroottree'",), 'bozo': 1, 'feed': {}, 'entries': []}
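
One possible cause, stated as an assumption rather than a confirmed diagnosis: if parse() expects the feed document itself rather than a URL, the URL string gets parsed as if it were XML, no root element is found, and the tree ends up as None. A minimal sketch of downloading the feed first is shown below. The helper name fetch_then_parse and its opener parameter are hypothetical and exist only for illustration; the parser is passed in as an argument so the sketch does not depend on speedparser importing cleanly.

```python
import urllib.request


def fetch_then_parse(url, parse, opener=urllib.request.urlopen, timeout=10):
    """Download `url` and hand the response body (bytes) to `parse`.

    `parse` is any callable that accepts the raw feed document.
    `opener` is injectable so the function can be exercised offline.
    """
    with opener(url, timeout=timeout) as response:
        document = response.read()
    return parse(document)
```

Assuming speedparser imports successfully, a call would look like fetch_then_parse("http://news.ycombinator.com/rss", speedparser.parse).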

Is speedparser compatible with Python 3.7?

I get an import error when trying to import it:

>>> import speedparser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python\Python37\lib\site-packages\speedparser\__init__.py", line 1, in <module>
    from speedparser import parse
ImportError: cannot import name 'parse' from 'speedparser' (C:\Python\Python37\lib\site-packages\speedparser\__init__.py)

I've checked __init__.py, and changing the first line to from .speedparser import parse fixes the import error, but a different error is then raised:

>>> import speedparser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python\Python37\lib\site-packages\speedparser\__init__.py", line 1, in <module>
    from .speedparser import parse
  File "C:\Python\Python37\lib\site-packages\speedparser\speedparser.py", line 19, in <module>
    import urlparse
ModuleNotFoundError: No module named 'urlparse'

It looks like urlparse is a Python 2 module. There are at least a few related libraries:
https://pypi.org/project/urlparse2/
https://pypi.org/project/urlparse3/
https://pypi.org/project/urlparse4/
but all of them are for Python 2 only.
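
For reference, the functions from the Python 2 urlparse module live in the standard library's urllib.parse on Python 3, so no third-party package is needed. The snippet below is a minimal sketch of the usual compatibility import, not code taken from the speedparser source; keeping the name urlparse as an alias is an assumption made so that existing urlparse.urljoin(...) style call sites would keep working.

```python
# Compatibility import: Python 2 ships `urlparse` as a top-level module,
# while Python 3 moved the same functions into `urllib.parse`.
try:
    import urlparse  # Python 2
except ImportError:
    import urllib.parse as urlparse  # Python 3

# Call sites written against the Python 2 module name work unchanged.
absolute = urlparse.urljoin("http://example.com/feed/", "item/1")
parts = urlparse.urlparse(absolute)
print(absolute, parts.netloc)
```

This only addresses the urlparse import; other Python 2 constructs in the module may still need porting.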

ImportError: cannot import name 'parse' from partially initialized module 'speedparser'

I installed speedparser using pip install speedparser. When I tried importing it in a Python shell, it gave the following error (in both VSCode and the Windows Terminal).

>>> import speedparser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\speedparser\__init__.py", line 1, in <module>
    from speedparser import parse
ImportError: cannot import name 'parse' from partially initialized module 'speedparser' (most likely due to a circular import) (C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\speedparser\__init__.py)

The same error also exists when I try to import it by including the import statement in a .py file.

I have feedparser installed alongside speedparser. I don't think that is the issue, but it seems like the only thing that might cause a circular import. I don't have any files named speedparser.py or parse.py in the current project directory.

Related issue: #13.

Cannot get it to work when I parse a URL

I get an error when I try to parse something.
I am trying to parse this URL: http://rss.cnn.com/rss/money_news_companies.rss

I work with Python 3.4 and tornadoweb, so I made a change in the 'speedparser.py' file: I changed urlparse to urllib.parse, because urlparse is no longer available in Python 3.

The error is:
{'bozo_tb': 'Traceback (most recent call last):\n File "C:\Python34\lib\site-packages\speedparser\speedparser.py", line 685, in parse\n parser = SpeedParser(document, cleaner, unix_timestamp, encoding)\n File "C:\Python34\lib\site-packages\speedparser\speedparser.py", line 582, in __init__\n self.tree = tree.getroottree()\nAttributeError: 'NoneType' object has no attribute 'getroottree'\n', 'bozo': 1, 'bozo_exception': AttributeError("'NoneType' object has no attribute 'getroottree'",), 'feed': {}, 'entries': []}

Issue with spaces in xmlns namespace

The following feed fails due to spaces around the equals sign in the xmlns attribute:

xmlns = 'http://www.w3.org/2005/Atom'

Speedparser returns an error about not being able to determine the version. The code appears to be specifically looking for xmlns= with no spaces.
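
A whitespace-tolerant match is one way the version check could be made more forgiving. The sketch below is illustrative only: XMLNS_RE and find_default_xmlns are hypothetical names, and this is not how speedparser currently detects the namespace.

```python
import re

# Hypothetical pattern, not speedparser's own: allow optional whitespace
# on both sides of "=" and accept either quote style. A prefixed
# declaration such as xmlns:dc="..." is deliberately not matched.
XMLNS_RE = re.compile(r"""\bxmlns\s*=\s*(["'])(.*?)\1""")


def find_default_xmlns(head):
    """Return the default xmlns URI found in `head`, or None."""
    match = XMLNS_RE.search(head)
    return match.group(2) if match else None


print(find_default_xmlns("<feed xmlns = 'http://www.w3.org/2005/Atom'>"))
print(find_default_xmlns('<feed xmlns="http://www.w3.org/2005/Atom">'))
```

Both calls return the Atom namespace URI, with or without spaces around the equals sign.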

lxml.clean cannot defang some raw text ("<3")

lxml.html.clean cannot clean some raw text (in titles, descriptions, etc.), particularly text that might look like HTML but isn't. The first case I noticed was the element "<title><3</title>", which fails with a ParserError('Document is empty'); this is likely an underlying libxml2 issue.

We cannot simply pass the text through, as <3<script>alert('foo');</script> will also raise this same error. Currently, there is a regression test "TestHeartParserError" which confirms this error in speedparser and confirms that feedparser will read this content.
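
One option that avoids passing the raw text through is to escape it whenever the cleaner fails. The following is a minimal sketch under stated assumptions, not a fix that exists in speedparser: clean_or_escape is a hypothetical helper, the cleaner is any callable taking and returning a string, and the broad except clause stands in for catching the specific lxml ParserError. The result would also differ from what feedparser returns for the same content.

```python
import html


def clean_or_escape(text, cleaner):
    """Run `cleaner` on `text`; if it raises, return the text HTML-escaped."""
    try:
        return cleaner(text)
    except Exception:
        # Escaping turns "<" into "&lt;", so "<3" survives as plain text
        # and a trailing <script> tag becomes inert rather than being
        # passed through as markup.
        return html.escape(text, quote=False)


def always_fails(text):
    # Stand-in for a cleaner that rejects the input, as described above.
    raise ValueError("Document is empty")


print(clean_or_escape("<3", always_fails))
print(clean_or_escape("<3<script>alert('foo');</script>", always_fails))
```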

Embedded YouTube videos not playing

https://github.com/hiidef/flavorsdjango/issues/206

YouTube embeds display as thumbnails in the WordPress feed and link back to the blog post rather than the video.

http://cocinela.com/#18e/wordpress
https://flavorsme.tenderapp.com/discussions/problems/6415-youtube-videos-through-wordpress

Note that WordPress.com uses shortcodes for embeds: http://en.support.wordpress.com/videos/youtube/
