Comments (4)
Hi @quinncomendant 😄 I'll triage this issue.
Could you please send an email containing the URL of the podcast you added?
My email is [email protected]
from mygpo.
A bug results in podcast URLs in the database like this:
http://www.npr.org/rss/podcast.php?id=510300http:%2525252F%25252Fwww.npr.org%2525252Frss%252Fpodcast.php%2525253Fid=510300
The 25 is the result of an encoded % (percent sign), and the 2F is the result of an encoded / (slash). It seems that % is encoded using urllib.parse.quote every time a podcast is updated and then not appropriately decoded.
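A minimal sketch of how the repeated 25 sequences can build up, assuming each update runs the stored value through urllib.parse.quote once more. The helper name requote is hypothetical and is not taken from mygpo:

```python
from urllib.parse import quote, unquote


def requote(value: str, times: int) -> str:
    """Apply urllib.parse.quote repeatedly, as a buggy update loop might."""
    for _ in range(times):
        # safe="" means "/" and "%" are both percent-encoded on every pass.
        value = quote(value, safe="")
    return value


slash_once = requote("/", 1)   # "%2F"
slash_twice = requote("/", 2)  # "%252F": the "%" itself became "%25"
slash_four = requote("/", 4)   # "%2525252F", as seen in the stored URL

# Undoing the damage takes exactly as many unquote passes as quote passes.
restored = slash_four
for _ in range(4):
    restored = unquote(restored)

print(slash_once, slash_twice, slash_four, restored)
```

Each extra pass adds one more 25, so the number of 25 groups hints at how many times the value was re-encoded.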
From urllib.parse — Parse URLs into components — Python 3.8.2 documentation
The URL quoting functions focus on taking program data and making it safe for use as URL components by quoting special characters and appropriately encoding non-ASCII text.
My opinion is that we shouldn't use urllib.parse.quote to deal with user-submitted URLs; otherwise, we have to handle the error-prone encoding and decoding process ourselves. Instead, we should use urllib.parse.quote only for URLs constructed in code.
Another thing worth noting is that Django provides django.utils.http.urlencode, a wrapper around urllib.parse.urlencode that quotes query keys and values for you.
An example in the wild is Sentry. I found that Sentry doesn't use urllib.parse.quote/unquote in their code at all. They only use django.utils.http.urlencode for their auto-generated URLs.
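A small sketch of the "quote only URLs constructed in code" idea. To stay self-contained it uses the standard library's urllib.parse.urlencode as a stand-in for django.utils.http.urlencode; the function name build_lookup_url, the base URL, and the url parameter name are all hypothetical:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_lookup_url(base: str, feed_url: str) -> str:
    """Build a URL in code; the user-submitted feed URL is encoded exactly once."""
    # urlencode quotes the value as a query parameter, including any "%" in it.
    return f"{base}?{urlencode({'url': feed_url})}"


user_feed = "http://example.com/test?key=abc%23def"
lookup = build_lookup_url("http://feeds.example.org/parse", user_feed)

# The receiving side decodes exactly once and gets the submitted URL back intact.
round_tripped = parse_qs(urlsplit(lookup).query)["url"][0]
print(lookup)
print(round_tripped == user_feed)
```

The user-submitted string is never modified in place; it is only encoded at the point where it becomes a component of another URL, and decoded once by the query parser on the other end.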
I wanted to help with this one, though note that I'm not really familiar with the whole codebase. After testing this locally, I see (at least) two issues here:
The first issue is that mygpo-feedservice calls unquote on the feed URL and therefore can't handle URLs containing %23 unless they are quoted twice. This can be observed live by typing http://example.com/test?key=abc%23def here. The service fetches http://example.com/test?key=abc instead. This is why I can't add the podcast via the "Missing podcast" feature: the correct feed is not fetched.
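The effect can be sketched with the standard library alone. The function url_actually_fetched is a hypothetical stand-in for what the feedservice ends up requesting, assuming it unquotes the whole URL and the HTTP client then drops the fragment; it is not the service's real code:

```python
from urllib.parse import unquote, urldefrag


def url_actually_fetched(feed_url: str) -> str:
    """Hypothetical model: unquote the URL, then drop the fragment like an HTTP client."""
    decoded = unquote(feed_url)    # "%23" turns into a literal "#"
    return urldefrag(decoded).url  # everything after "#" is a fragment and is not sent


single = url_actually_fetched("http://example.com/test?key=abc%23def")
double = url_actually_fetched("http://example.com/test?key=abc%2523def")

print(single)  # http://example.com/test?key=abc
print(double)  # http://example.com/test?key=abc%23def
```

Only the double-quoted form survives the unquote step with %23 still in the query, which matches the "unless they are quoted twice" observation.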
The second issue I see is that normalize_feed_url is not idempotent when it comes to the query part. I'm not sure I got this right, but the way I understand it, the urllib.parse.quote call used there isn't meant to encode the whole input, but rather to quote only what should be quoted and currently isn't. This works for the path but not for the query. So for example this:
http://example.com/a%3Ab:c?query=a%3Abc
gets normalized into
http://example.com/a%3Ab%3Ac?query=a%253Abc
while IMO it should be
http://example.com/a%3Ab%3Ac?query=a%3Abc
So when normalize_feed_url is invoked multiple times, we can end up with URLs like
http://example.com/a%3Ab%3Ac?query=a%252525253Abc
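A rough reproduction of this behaviour, not the actual normalize_feed_url from mygpo. Both functions below are hypothetical, and the assumption is simply that the path is quoted with % treated as safe while the query is not:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def normalize_naive(url: str) -> str:
    """Hypothetical model of the bug: "%" is safe in the path but not in the query."""
    p = urlsplit(url)
    path = quote(p.path, safe="/%")
    query = quote(p.query, safe="=&")  # "%" gets re-encoded to "%25" on every call
    return urlunsplit((p.scheme, p.netloc, path, query, p.fragment))


def normalize_idempotent(url: str) -> str:
    """Same idea, but existing escapes in the query are left alone as well."""
    p = urlsplit(url)
    path = quote(p.path, safe="/%")
    query = quote(p.query, safe="=&%")
    return urlunsplit((p.scheme, p.netloc, path, query, p.fragment))


url = "http://example.com/a%3Ab:c?query=a%3Abc"

naive = url
for _ in range(4):
    naive = normalize_naive(naive)

fixed_once = normalize_idempotent(url)
fixed_twice = normalize_idempotent(fixed_once)

print(normalize_naive(url))  # http://example.com/a%3Ab%3Ac?query=a%253Abc
print(naive)                 # http://example.com/a%3Ab%3Ac?query=a%252525253Abc
print(fixed_once)            # http://example.com/a%3Ab%3Ac?query=a%3Abc
```

Treating % as safe keeps existing escapes intact, at the cost of also leaving a stray literal % (one not followed by two hex digits) unescaped, so a real fix may need to handle that case separately.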
If these issues are relevant, I can submit PRs to fix both, but there's a catch: in some cases the two issues cancel each other out, so they'd need to be fixed together. Fixing the feedservice problem might also break URLs that are already stored double-quoted in the production database; fixing those would probably require a data migration.
BTW, I wasn't able to reproduce the original case where # is not even saved to the database. When I create and update the podcast via the API, it seems to work for me because of those two bugs (the URL contains %2523, and feedservice unquotes it).