Comments (4)

SiqingYu commented on June 4, 2024

Hi @quinncomendant 😄 I'd like to triage this issue.
Could you please send an email containing the URL to the podcast you added?
My email is [email protected]

from mygpo.

SiqingYu commented on June 4, 2024

A bug results in podcast URLs in the database like this:
http://www.npr.org/rss/podcast.php?id=510300http:%2525252F%25252Fwww.npr.org%2525252Frss%252Fpodcast.php%2525253Fid=510300
Each 25 comes from a percent sign (%) being encoded as %25, and the 2F comes from a slash (/) being encoded as %2F. It seems that the URL is re-encoded with urllib.parse.quote every time a podcast is updated, so each existing % becomes %25 again, and it is never decoded the matching number of times.
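To illustrate how the runs of 25 accumulate, here is a minimal sketch. It is not mygpo code: the helper name requote is made up for this example, and it assumes quote is called with safe="" so that the slash is encoded too.

```python
from urllib.parse import quote, unquote


def requote(value, times):
    """Apply urllib.parse.quote repeatedly (hypothetical helper, not from mygpo)."""
    for _ in range(times):
        # safe="" so that "/" is encoded as well; every pass turns each "%" into "%25".
        value = quote(value, safe="")
    return value


for n in range(1, 5):
    print(n, requote("/", n))

# Decoding only recovers the original if unquote is applied the same number of times.
four_times = requote("/", 4)
restored = four_times
for _ in range(4):
    restored = unquote(restored)
print(restored)
```

Four passes over a single slash give %2525252F, which matches the fragment seen in the database URL above.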

SiqingYu commented on June 4, 2024

From urllib.parse — Parse URLs into components — Python 3.8.2 documentation

The URL quoting functions focus on taking program data and making it safe for use as URL components by quoting special characters and appropriately encoding non-ASCII text.

My opinion is that we shouldn't use urllib.parse.quote to deal with user-submitted URLs; otherwise, we have to handle the error-prone encoding and decoding process ourselves. Instead, we should use urllib.parse.quote only for URLs constructed in code.

Another thing worth noting is that Django provides django.utils.http.urlencode, a variant of urllib.parse.urlencode that also handles MultiValueDict and non-string values.

An example in the wild is Sentry. I found that Sentry doesn't use urllib.parse.quote/unquote in their code at all. They only use django.utils.http.urlencode for their auto-generated URLs.
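As a rough sketch of that approach: the user-submitted URL is stored untouched, and encoding happens exactly once, at the point where code builds another URL around it. The standard library's urllib.parse.urlencode is used here so the example runs without Django. The endpoint https://feeds.example.org/parse, the parameter name url, and the function build_lookup_url are all assumptions made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_lookup_url(base, feed_url):
    """Embed a user-submitted feed URL as a query parameter (hypothetical helper)."""
    # urlencode escapes the value once; the stored feed URL itself is never modified.
    return base + "?" + urlencode({"url": feed_url})


submitted = "http://example.com/test?key=abc%23def"
lookup = build_lookup_url("https://feeds.example.org/parse", submitted)
print(lookup)

# The receiving side decodes exactly once and gets the submitted URL back unchanged.
received = parse_qs(urlsplit(lookup).query)["url"][0]
print(received == submitted)
```

Because encoding and decoding each happen once at a well-defined boundary, there is no step where an already-encoded URL gets encoded again.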

zb3 commented on June 4, 2024

I wanted to help with this one, though note that I'm not really familiar with the whole codebase. After testing this locally, I see (at least) two issues here:

The first issue is that mygpo-feedservice calls unquote on the feed url and therefore can't handle URLs with %23 unless quoted twice. This can be observed live by typing http://example.com/test?key=abc%23def here. The service fetches http://example.com/test?key=abc instead. This is what makes me unable to add the podcast via the "Missing podcast" feature because the correct feed is not fetched.
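The effect of that unquote call can be reproduced with the standard library alone. This is only a sketch of the mechanism, not the feedservice code: once %23 is decoded to a literal #, everything after it is parsed as a fragment, and fragments are not sent to the server.

```python
from urllib.parse import unquote, urlsplit

submitted = "http://example.com/test?key=abc%23def"

# What the URL looks like after one unquote, as described above.
decoded = unquote(submitted)
print(decoded)

parts = urlsplit(decoded)
print(parts.query)     # the query is cut short at the "#"
print(parts.fragment)  # the rest is now treated as a fragment

# Quoting "#" twice survives a single unquote, which is why double-quoted URLs work.
twice = "http://example.com/test?key=abc%2523def"
print(unquote(twice) == submitted)
```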

The second issue I see is that normalize_feed_url is not idempotent when it comes to the query part. I'm not sure I got this right, but the way I understand it is that the urllib.parse.quote function used there isn't meant to encode the input, but rather to quote only what should be quoted but currently isn't. This currently works for the path but not the query. So for example this:

http://example.com/a%3Ab:c?query=a%3Abc

gets normalized into

http://example.com/a%3Ab%3Ac?query=a%253Abc

while IMO it should be

http://example.com/a%3Ab%3Ac?query=a%3Abc

So when normalize_feed_url is invoked multiple times, we can end up with URLs like

http://example.com/a%3Ab%3Ac?query=a%252525253Abc
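One possible way to get idempotent behaviour is to treat % as a safe character in both the path and the query, so existing escapes are left alone and only unescaped characters get quoted. The sketch below is an assumption about how this could be done, not the actual normalize_feed_url; the function name normalize_once and the chosen safe sets are made up for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def normalize_once(url):
    """Quote only what is not already quoted (hypothetical sketch, not mygpo code)."""
    parts = urlsplit(url)
    # Keeping "%" in the safe set preserves existing escapes such as %3A.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


original = "http://example.com/a%3Ab:c?query=a%3Abc"
first = normalize_once(original)
second = normalize_once(first)
print(first)
print(first == second)

# For contrast: quoting the query value without "%" in the safe set grows on every call.
print(quote("a%3Abc"))
```

The trade-off is that a stray literal % that is not part of an escape sequence is also left untouched, so a real fix would probably need to decide how to handle such input.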

If these issues are relevant, I can submit PRs to fix both, but there's a problem: in some cases these issues cancel each other out, so they would need to be fixed together. Fixing the problem with feedservice might also break some URLs quoted twice that already exist in the production database. Fixing those would probably require a data migration.

BTW, I wasn't able to reproduce the original case where # is not even saved to the database. When I create and update the podcast via the API, it seems to work for me because of those 2 bugs (the url contains %2523, feedservice unquotes it).