
speedparser's Issues

Using legacy feedparser date parsing breaks on this feed.

http://www.theprizefinder.com/feed/top-prizes

Not exactly sure on the best approach here... I don't want to uninstall feedparser since I use it when parsing fails catastrophically, but I also don't really want to use the broken date stuff. Perhaps we could optionally force feedparser compat instead of using actual feedparser?

Does not work with http://news.ycombinator.com/rss

>>> import speedparser
>>> feed = "http://news.ycombinator.com/rss"
>>> speedparser.parse(feed)
{'bozo_tb': 'Traceback (most recent call last):\n  File "/home/kir/.virtualenvs/rss/lib/python3.5/site-packages/speedparser/speedparser.py", line 688, in parse\n    parser = SpeedParser(document, cleaner, unix_timestamp, encoding)\n  File "/home/kir/.virtualenvs/rss/lib/python3.5/site-packages/speedparser/speedparser.py", line 585, in __init__\n    self.tree = tree.getroottree()\nAttributeError: \'NoneType\' object has no attribute \'getroottree\'\n', 'bozo_exception': AttributeError("'NoneType' object has no attribute 'getroottree'",), 'bozo': 1, 'feed': {}, 'entries': []}
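A possible cause, judging only from the traceback: `parse` hands its first argument to `SpeedParser(document, ...)`, which suggests it expects the feed's content rather than a URL. A minimal sketch of downloading the feed first, assuming that reading is correct; `looks_like_url` and `load_feed` are hypothetical helpers, not part of speedparser:

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def looks_like_url(value):
    """Return True when value is an http(s) URL rather than feed markup."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def load_feed(source, timeout=10):
    """Return raw feed bytes, downloading first if source is a URL."""
    if looks_like_url(source):
        with urlopen(source, timeout=timeout) as response:
            return response.read()
    # Already feed content: normalise text to bytes and pass it through.
    return source.encode("utf-8") if isinstance(source, str) else source
```

With this in place, the call would become `speedparser.parse(load_feed(feed))`. Whether that resolves the report has not been verified here.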

Is speedparser compatible with Python 3.7?

I get an import error when trying to import it:

>>> import speedparser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python\Python37\lib\site-packages\speedparser\__init__.py", line 1, in <module>
    from speedparser import parse
ImportError: cannot import name 'parse' from 'speedparser' (C:\Python\Python37\lib\site-packages\speedparser\__init__.py)

I checked __init__.py: changing the first line to from .speedparser import parse fixes the import error, but then a different error is raised:

>>> import speedparser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python\Python37\lib\site-packages\speedparser\__init__.py", line 1, in <module>
    from .speedparser import parse
  File "C:\Python\Python37\lib\site-packages\speedparser\speedparser.py", line 19, in <module>
    import urlparse
ModuleNotFoundError: No module named 'urlparse'

It looks like urlparse is a Python 2 module. There are a few libraries on PyPI:
https://pypi.org/project/urlparse2/
https://pypi.org/project/urlparse3/
https://pypi.org/project/urlparse4/
but all of them are for Python 2 only.
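For context, `urlparse` was a Python 2 standard-library module; in Python 3 its functions live in `urllib.parse`, so no third-party package is needed. A minimal sketch of an import that works on both versions. It illustrates the pattern only; which `urlparse` functions speedparser.py actually calls is not shown in this report:

```python
try:
    # Python 3: the module moved into the urllib package.
    from urllib import parse as urlparse
except ImportError:
    # Python 2: the standalone module that speedparser.py imports.
    import urlparse

# Existing calls of the form urlparse.<function>(...) keep working unchanged.
parts = urlparse.urlparse("http://news.ycombinator.com/rss")
print(parts.netloc)
```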

ImportError: cannot import name 'parse' from partially initialized module 'speedparser'

I installed speedparser using pip install speedparser. When I tried importing it in a Python shell, it gave the following error (in both VSCode and the Windows Terminal).

>>> import speedparser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\speedparser\__init__.py", line 1, in <module>
    from speedparser import parse
ImportError: cannot import name 'parse' from partially initialized module 'speedparser' (most likely due to a circular import) (C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\speedparser\__init__.py)

The same error also exists when I try to import it by including the import statement in a .py file.

I have feedparser installed alongside speedparser. I don't think that is the issue, but it seems like the only thing that might cause a circular import. I don't have any files named speedparser.py or parse.py in the current project directory.

Related issue: #13.
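A likely explanation, consistent with the traceback: in Python 3, `from speedparser import parse` inside the package's own `__init__.py` is an absolute import, so it resolves to the half-initialised package itself rather than to the sibling module `speedparser/speedparser.py`. The sketch below reproduces that with throwaway packages; `try_import` and the `demo_*` names are made up for the illustration:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(init_line, pkg_name):
    """Build a package pkg_name/{__init__,pkg_name}.py and try to import it."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / pkg_name
        pkg.mkdir()
        (pkg / "__init__.py").write_text(init_line + "\n")
        # Sibling module with the same name as the package, as in speedparser.
        (pkg / (pkg_name + ".py")).write_text("def parse(doc):\n    return doc\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(pkg_name)
            return "ok" if callable(module.parse) else "missing"
        except ImportError:
            return "ImportError"
        finally:
            # Leave no trace of the throwaway package behind.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules
                         if n == pkg_name or n.startswith(pkg_name + ".")]:
                del sys.modules[name]


# Absolute form, as in the installed __init__.py: the package imports itself.
print(try_import("from demo_abs import parse", "demo_abs"))
# Explicit relative form: reaches the sibling module.
print(try_import("from .demo_rel import parse", "demo_rel"))
```

If this is the cause, having feedparser installed is unrelated, and the remaining Python 2 imports inside speedparser.py would still need attention after the first line is changed.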

Cannot get it to work when I parse a URL

I get an error when I try to parse something.
I try to parse this url: http://rss.cnn.com/rss/money_news_companies.rss

I work with Python 3.4 and tornadoweb, so I made a change in the 'speedparser.py' file: I changed urlparse to urllib.parse, because urlparse is no longer available in Python 3.

The error is:

{'bozo_tb': 'Traceback (most recent call last):\n File "C:\Python34\lib\site-packages\speedparser\speedparser.py", line 685, in parse\n parser = SpeedParser(document, cleaner, unix_timestamp, encoding)\n File "C:\Python34\lib\site-packages\speedparser\speedparser.py", line 582, in __init__\n self.tree = tree.getroottree()\nAttributeError: 'NoneType' object has no attribute 'getroottree'\n', 'bozo': 1, 'bozo_exception': AttributeError("'NoneType' object has no attribute 'getroottree'",), 'feed': {}, 'entries': []}
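Note that the failure is reported through the `bozo` flag in the returned dictionary rather than by raising an exception, so callers have to inspect the result themselves. A minimal sketch of doing that and handing the document to a second parser when the first one returns nothing usable; `parse_with_fallback` is a hypothetical helper, and both parsers are passed in as plain callables:

```python
def parse_with_fallback(document, primary, fallback):
    """Use primary(document) unless it reports bozo and yields no entries."""
    result = primary(document)
    if result.get("bozo") and not result.get("entries"):
        # The primary parser flagged an error and produced nothing usable.
        return fallback(document)
    return result


# Stand-in parsers that mimic the result shape shown above.
def failing_parser(document):
    return {"bozo": 1, "feed": {}, "entries": []}


def working_parser(document):
    return {"bozo": 0, "feed": {"title": "demo"}, "entries": [{"title": "item"}]}


result = parse_with_fallback("<rss/>", failing_parser, working_parser)
print(len(result["entries"]))
```

In real use the two callables could be `speedparser.parse` and `feedparser.parse`, assuming both are given the feed's content.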

lxml.html.clean cannot defang some raw text ("<3")

lxml.html.clean cannot clean some raw text (in titles, descriptions, etc.), in particular text that might look like HTML but isn't. The first case I noticed was the tag "<title><3</title>", which fails with a ParserError('Document is empty'); this is likely an underlying libxml2 issue.

We cannot simply pass the text through, as <3<script>alert('foo');</script> will also raise this same error. Currently, there is a regression test "TestHeartParserError" which confirms this error in speedparser and confirms that feedparser will read this content.
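One conservative option, sketched here as an assumption rather than as what speedparser does: when the cleaner raises, escape the text instead of passing it through, so that any markup in it is rendered inert. `clean_or_escape` is a hypothetical helper and the cleaner is passed in as a callable; with lxml, `errors` could be narrowed to `lxml.etree.ParserError`.

```python
import html


def clean_or_escape(text, cleaner, errors=(Exception,)):
    """Return cleaner(text), or the HTML-escaped text if the cleaner fails."""
    try:
        return cleaner(text)
    except errors:
        # Escaping keeps the characters visible but stops them parsing as tags.
        return html.escape(text)


def always_failing_cleaner(text):
    # Stand-in for a cleaner that rejects the input as an empty document.
    raise ValueError("Document is empty")


print(clean_or_escape("<3", always_failing_cleaner))
print(clean_or_escape("<3<script>alert('foo');</script>", always_failing_cleaner))
```

The trade-off is that consumers receive escaped text for these fields, which may differ from what feedparser returns for the same feed.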
