dplocki / podcast-downloader

92.0 2.0 14.0 291 KB

A Python script for downloading new MP3 files from the given RSS channels

License: GNU General Public License v3.0

Python 99.71% Dockerfile 0.29%
python3 podcast script automation rss rss-feed-bot no-database json-configuration

podcast-downloader's People

Contributors

clstaudt, dplocki, ru-fu

podcast-downloader's Issues

pip3 install podcast_downloader installs not-latest version

Describe the bug
The wrong version of podcast_downloader is installed.

To Reproduce
Steps to reproduce the behavior:

  1. Run pip3 install podcast_downloader or python3 -m pip install podcast_downloader
  2. Run pip3 show podcast_downloader
  3. Observe that the installed version is 0.1.1

Expected behavior
The installed version should be 0.2.0 or the latest release.

Desktop (please complete the following information):

  • OS: macOS
  • Python version: 3.9
  • Version: 0.1.1
  • Link to RSS feed: Not applicable

Additional context
I originally thought this was an issue with the app itself, but then realized I didn't actually have the latest version of the package.
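You can confirm which release pip actually installed by reading the package metadata. The following is a minimal sketch that uses the standard library's importlib.metadata (Python 3.8+). The helper names are illustrative, and the version parser assumes plain numeric versions such as 0.1.1.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as "0.2.0" into (0, 2, 0)."""
    # Assumption: numeric-only versions; tags like "1.0rc1" are not handled.
    return tuple(int(part) for part in text.split("."))


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_at_least(installed: str, expected: str) -> bool:
    """Compare two numeric versions component by component."""
    return parse_version(installed) >= parse_version(expected)


if __name__ == "__main__":
    current = installed_version("podcast_downloader")
    print(current, current is not None and is_at_least(current, "0.2.0"))
```

If the check reports an older release, running pip3 install --upgrade podcast_downloader asks pip for the newest version it considers compatible with the running Python.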

Please add new variables

Hi, thanks for the excellent work.

1 - Could you add new variables?

For example, from this RSS feed I would like to get the description and the author:
https://www.omnycontent.com/d/playlist/8c0a4104-a688-4e57-91fd-ad7b00d5dddd/a32cf512-c3ce-4057-8ec8-af3400c547e5/ac708daf-04da-4352-ae6d-af3400ca82ad/podcast.rss

2 - The same RSS feed produces this error:

[2023-05-08 16:42:55] Error: The podcast file "https://traffic.omny.fm/d/clips/8c0a4104-a688-4e57-91fd-ad7b00d5dddd/a32cf512-c3ce-4057-8ec8-af3400c547e5/f789c11e-447f-460d-a89c-af390172e0b3/audio.mp3?utm_source=Podcast&in_playlist=ac708daf-04da-4352-ae6d-af3400ca82ad" could not be saved to disk "C:\Users\Filipe Mota/Downloads/Podcast/A caminho do Catar\[20221027] Portugueses a viver no Catar "É um país muito rico e compensa vir para cá trabalhar".mp3" due to the following error:
[Errno 22] Invalid argument: 'C:\Users\Filipe Mota/Downloads/Podcast/A caminho do Catar\[20221027] Portugueses a viver no Catar "É um país muito rico e compensa vir para cá trabalhar".mp3'

3 - With the RSS feed below, this error occurs for episode 1 and for the trailer:

[2023-05-08 15:43:26] Error: The podcast file "https://traffic.omny.fm/d/clips/b04d3ae5-22c4-41b6-b20a-aa54000ba759/4093b241-20e0-4025-8a00-afba013b2218/29e80dd4-0527-4d7d-85e9-afc401721117/audio.mp3?utm_source=Podcast&in_playlist=b150e14d-4d2e-4c4e-9cf2-afba013f7a91" could not be saved to disk "C:\Users\Filipe Mota/Downloads/Podcast/O Sargento na Cela 7\[20230314] Estreia. "O Sargento na Cela 7". Episódio 1 O Prisioneiro.mp3" due to the following error:
[Errno 22] Invalid argument: 'C:\Users\Filipe Mota/Downloads/Podcast/O Sargento na Cela 7\[20230314] Estreia. "O Sargento na Cela 7". Episódio 1 O Prisioneiro.mp3'

https://www.omnycontent.com/d/playlist/b04d3ae5-22c4-41b6-b20a-aa54000ba759/4093b241-20e0-4025-8a00-afba013b2218/b150e14d-4d2e-4c4e-9cf2-afba013f7a91/podcast.rss
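Both failing file names contain double quotes ("). Windows does not allow that character in file names, and it is also missing from the character class in str_to_filename, which is quoted in another issue on this page. The following is a minimal sketch, assuming the same NFKC normalization and space replacement as that function, with the double quote added to Windows' reserved set. The pattern name and function name are hypothetical.

```python
import re
import unicodedata

# Characters Windows rejects in file names: control characters plus < > : " / \ | ? *
INVALID_FILE_NAME_CHARACTERS = re.compile(r'[\u0000-\u001F\u007F<>:"/\\|?*]')


def str_to_windows_safe_filename(value: str) -> str:
    """Normalize the text, then replace every reserved character with a space."""
    value = unicodedata.normalize("NFKC", value)
    value = INVALID_FILE_NAME_CHARACTERS.sub(" ", value)
    return value.strip()


print(str_to_windows_safe_filename('Estreia. "O Sargento na Cela 7". Episódio 1 O Prisioneiro'))
```

Accented letters such as "ó" pass through unchanged, because NFKC only folds compatibility characters.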

4 - I wish there were an alternative date format.

Instead of YEARMMDD, something like YEAR.MM.DD.

Dots in the date would make it much easier to read.
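In the project code quoted in another issue on this page, %publish_date% is expanded with the hard-coded format "%Y%m%d". Below is a sketch of a configurable format. The parameter name date_format is hypothetical and does not correspond to an existing option.

```python
import time

# Hypothetical default mirroring the hard-coded "%Y%m%d" used for %publish_date%.
DEFAULT_DATE_FORMAT = "%Y%m%d"


def format_publish_date(published: time.struct_time, date_format: str = DEFAULT_DATE_FORMAT) -> str:
    """Render an episode's publish date with a user-chosen strftime pattern."""
    return time.strftime(date_format, published)


published = time.strptime("2023-03-14", "%Y-%m-%d")
print(format_publish_date(published))              # 20230314
print(format_publish_date(published, "%Y.%m.%d"))  # 2023.03.14
```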

5 - I have a question:

Is it possible to have more than one podcast in a JSON file? I tried, but I couldn't get it to work.
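In the configurations shown in other issues on this page, "podcasts" is a JSON array, so several podcast objects can be listed side by side, separated by commas. The following sketch only checks that such a file parses and lists both entries. The names, feed URLs, and paths are placeholders.

```python
import json

# Two podcast objects inside the "podcasts" array; note the comma between them.
CONFIG_TEXT = """
{
    "if_directory_empty": "download_from_4_days",
    "podcasts": [
        {
            "name": "First podcast",
            "rss_link": "https://example.com/first.rss",
            "path": "~/Podcasts/first"
        },
        {
            "name": "Second podcast",
            "rss_link": "https://example.com/second.rss",
            "path": "~/Podcasts/second"
        }
    ]
}
"""

config = json.loads(CONFIG_TEXT)
for podcast in config["podcasts"]:
    print(podcast["name"], "->", podcast["path"])
```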

6 - Error caused by accented characters

https://rss.podplaystudio.com/3240.xml

error

Thanks and keep up the great work.
Best regards,
BlackSpirits

Nothing is downloaded from this feed: https://feed.xyzfm.space/jve6gh9jt8vm

Describe the bug
Nothing is downloaded from the feed at https://feed.xyzfm.space/jve6gh9jt8vm.

Screenshots
[2023-05-27 10:54:22] Loading configuration (from file: "D:\AudioProject\data_engineering\podcast-downloader-master\config\config.json")
[2023-05-27 10:54:22] Checking "北海怪兽"
[2023-05-27 10:54:28] Last downloaded file ""
[2023-05-27 10:54:28] 北海怪兽: Nothing new
[2023-05-27 10:54:28] ------------------------------
[2023-05-27 10:54:28] Finished

The final file name can exceed 255 characters

Describe the bug
The final file name can exceed 255 characters if the template string includes another pattern in addition to the title. This causes the program to crash.

To Reproduce
Steps to reproduce the behavior:

  • Set file_name_template to "[%publish_date%] %title%.%file_extension%", then download an episode whose title is longer than 255 characters.

Expected behavior
The program should not crash; the expanded template needs to be truncated.

Additional context
I checked the code. It looks like the truncation applies only to the title, not to the expanded template.

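import re
import time
import unicodedata

# FILE_NAME_CHARACTER_LIMIT, RSSEntity, link_to_file_name and link_to_extension are defined elsewhere in the project.


def str_to_filename(value: str) -> str: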
    value = unicodedata.normalize("NFKC", value)
    value = re.sub(r"[\u0000-\u001F\u007F\*/:<>\?\\\|]", " ", value)

    return value.strip()[:FILE_NAME_CHARACTER_LIMIT]


def file_template_to_file_name(name_template: str, entity: RSSEntity) -> str:
    return (
        name_template.replace("%file_name%", link_to_file_name(entity.link))
        .replace("%publish_date%", time.strftime("%Y%m%d", entity.published_date))
        .replace("%file_extension%", link_to_extension(entity.link))
        .replace("%title%", str_to_filename(entity.title))
    )
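The following is a minimal sketch of the fix the reporter suggests: truncate the whole expanded file name, not only the title, and keep the extension intact. It assumes the limit is counted in characters, as in str_to_filename. Some file systems (for example ext4) actually limit names to 255 bytes, so a byte-based limit might be safer for non-ASCII titles. The value 255 and the helper name truncate_file_name are assumptions, not the project's code.

```python
import os

# Assumed value; the limit is counted in characters, as str_to_filename does.
FILE_NAME_CHARACTER_LIMIT = 255


def truncate_file_name(file_name: str, limit: int = FILE_NAME_CHARACTER_LIMIT) -> str:
    """Shorten an already-expanded file name to `limit` characters, keeping its extension."""
    if len(file_name) <= limit:
        return file_name
    stem, extension = os.path.splitext(file_name)
    if len(extension) >= limit:
        # Degenerate case: the "extension" alone fills the limit, so cut the whole name.
        return file_name[:limit]
    return stem[: limit - len(extension)].rstrip() + extension
```

It would wrap the result of file_template_to_file_name, for example truncate_file_name(file_template_to_file_name(name_template, entity)), so every pattern in the template counts toward the limit.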

Default location of the configuration file

Is your feature request related to a problem? Please describe.
Now that the project has become a Python module, the configuration file needs to be in the home directory.

Describe the solution you'd like
The configuration file should be placed in the home directory, so that it does not depend on the directory the script is called from.

Describe alternatives you've considered
A script parameter for the configuration file location would also be nice.
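Here is a sketch combining both ideas: a default file in the home directory (the path ~/.podcast_downloader_config.json appears in logs elsewhere on this page) and an optional script parameter. The --config flag name is hypothetical and not necessarily the project's real option.

```python
import argparse
import os
from typing import List, Optional

DEFAULT_CONFIG_FILE = "~/.podcast_downloader_config.json"


def resolve_config_path(argv: Optional[List[str]] = None) -> str:
    parser = argparse.ArgumentParser(description="podcast downloader (sketch)")
    # Hypothetical parameter name, used here only for illustration.
    parser.add_argument("--config", default=DEFAULT_CONFIG_FILE, help="path to the JSON configuration file")
    args = parser.parse_args(argv)
    # expanduser replaces a leading "~" with the home directory, so the default
    # path is the same wherever the script is started from.
    return os.path.expanduser(args.config)
```

Calling resolve_config_path() with no arguments reads sys.argv. Passing a list, as in resolve_config_path(["--config", "/tmp/custom.json"]), makes it easy to test.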

The downloaded audio does not match the audio provided by the RSS feed.

To Reproduce
Steps to reproduce the behavior:

  1. Use this configuration:
    {
        "if_directory_empty": "download_all_from_feed",
        "podcasts": [
            {
                "name": "Thai PBS Podcast",
                "rss_link": "https://www.thaipbspodcast.com/program-rss.php?id=133",
                "path": "xxx",
                "podcast_extensions": {".mp3": "audio/x-m4a"}
            }
        ]
    }

  2. See the error: all downloaded files are 65 KB, and reading them fails with an error.
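The report does not show what the 65 KB files contain. One way to check is to inspect their first bytes. Files with an ID3v2 tag start with "ID3", raw MPEG audio frames start with an 11-bit sync word, and MP4/M4A files usually have "ftyp" at byte offset 4. This is a diagnostic sketch with hypothetical helper names; it does not reproduce the project's behavior.

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess what a downloaded file really is from its first bytes."""
    if header.startswith(b"ID3"):
        return "mp3"  # MP3 with an ID3v2 tag
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"  # MPEG frame sync: 11 set bits
    if header[4:8] == b"ftyp":
        return "mp4"  # ISO base media file (MP4/M4A)
    if header.lstrip().startswith(b"<"):
        return "html/xml"  # probably an error page instead of audio
    return "unknown"


def sniff_file(path: str) -> str:
    with open(path, "rb") as downloaded:
        return sniff_audio_format(downloaded.read(16))
```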

RSS feed has no attribute 'href'

Describe the bug
It's not this project's fault but the podcast RSS feed's fault; I wonder whether there is a workaround.
For an RSS feed like https://feeds.audiomeans.fr/feed/88cf4afb-075f-42e2-b94b-3f3d4ed98f69.xml, downloading fails with: "AttributeError: object has no attribute 'href'"

To Reproduce
Steps to reproduce the behavior:
{
    "if_directory_empty": "download_all_from_feed",
    "podcasts": [
        {
            "name": "test",
            "rss_link": "https://feeds.audiomeans.fr/feed/88cf4afb-075f-42e2-b94b-3f3d4ed98f69.xml",
            "path": "~/test"
        }
    ]
}
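One possible workaround is to skip malformed links instead of failing on them. The sketch below assumes that each entry exposes its links as a list of dictionaries with optional "rel", "type", and "href" keys, which is the shape the feedparser library uses for entry.links. The function name is hypothetical.

```python
from typing import Iterable, List, Mapping


def enclosure_links(entry_links: Iterable[Mapping[str, str]]) -> List[str]:
    """Return the audio URLs of one feed entry, skipping links without an 'href'."""
    urls = []
    for link in entry_links:
        href = link.get("href")
        if not href:
            continue  # malformed link: skip it instead of raising AttributeError
        if link.get("rel") == "enclosure" or link.get("type", "").startswith("audio/"):
            urls.append(href)
    return urls
```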

The limit on how many files are downloaded at once should also be available in the configuration file

Describe the bug
There is no way to limit the number of downloaded files in the configuration file.

Expected behavior
I can enter a limit value in the configuration file.

Desktop (please complete the following information):

  • OS general
  • Python version: general
  • Version: 0.1.1
  • Link to RSS feed: general
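Here is a sketch of reading such a limit from the configuration and applying it to the list of new episodes. The key name "download_limit" and the helper name are hypothetical; the issue does not name the option.

```python
import json
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def apply_download_limit(new_episodes: Iterable[T], limit: Optional[int]) -> List[T]:
    """Keep at most `limit` episodes; None means no limit."""
    if limit is None:
        return list(new_episodes)
    return list(islice(new_episodes, limit))


# "download_limit" is a hypothetical key for the requested option.
config = json.loads('{"download_limit": 2, "podcasts": []}')
print(apply_download_limit(["ep3.mp3", "ep2.mp3", "ep1.mp3"], config.get("download_limit")))
```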

Include the episode title in the file name

Is your feature request related to a problem? Please describe.
For better organization, it would be useful to be able to include the episode title in the file name.

Describe the solution you'd like
A new configuration flag, for example require_title.

Add check-in test run

Is your feature request related to a problem? Please describe.
The check-in workflow is missing.

Describe the solution you'd like
Add a workflow that checks every new commit by running the tests.

The structure of the configuration file needs to be redesigned

Is your feature request related to a problem? Please describe.
Currently, the configuration file contains only podcast data; there is no room for general options.

Describe the solution you'd like
The configuration file should have a section for general options.
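The configurations in later issues on this page already place a general option ("if_directory_empty") next to the "podcasts" array. Below is a sketch of combining general options with each podcast's own settings. The rule that per-podcast values override general ones is an assumption, and the function name is hypothetical.

```python
import json
from typing import Any, Dict, List


def podcast_settings(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Merge top-level (general) options into every podcast entry.

    Assumption: when both define the same key, the podcast's own value wins.
    """
    general = {key: value for key, value in config.items() if key != "podcasts"}
    return [{**general, **podcast} for podcast in config.get("podcasts", [])]


config = json.loads(
    '{"if_directory_empty": "download_from_4_days",'
    ' "podcasts": [{"name": "A", "path": "./a"},'
    ' {"name": "B", "path": "./b", "if_directory_empty": "download_all_from_feed"}]}'
)
for settings in podcast_settings(config):
    print(settings["name"], settings["if_directory_empty"])
```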

Support mp4 file downloads

Is your feature request related to a problem? Please describe.
Some podcasts like this one have both .mp3 and .m4a audio files.

Describe the solution you'd like
It would be cool if the script could download both kinds!

Describe alternatives you've considered
Doing it with a shell command instead 🤷🏻 😅, though I prefer the way your script keeps track of files already downloaded!
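Downloading both kinds amounts to accepting more than one extension. The sketch below filters episode URLs by an extension-to-MIME mapping, similar in shape to the "podcast_extensions" entry in a configuration quoted in another issue on this page. The helper names and the example mapping are hypothetical. The extension is read from the URL path, so query strings such as ?utm_source=Podcast are ignored.

```python
import posixpath
from typing import Iterable, List, Mapping
from urllib.parse import urlparse


def link_extension(url: str) -> str:
    """Return the lower-case extension of the URL path, e.g. ".mp3"."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def select_downloads(urls: Iterable[str], allowed_extensions: Mapping[str, str]) -> List[str]:
    """Keep only URLs whose extension is a key of `allowed_extensions`."""
    return [url for url in urls if link_extension(url) in allowed_extensions]


allowed = {".mp3": "audio/mpeg", ".m4a": "audio/x-m4a"}
urls = ["https://example.com/a.mp3", "https://example.com/b.m4a?x=1", "https://example.com/c.ogg"]
print(select_downloads(urls, allowed))
```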

If the directory is empty, download all new episodes from the last n days

Is your feature request related to a problem? Please describe.
Currently, if a podcast's directory is empty, the script downloads all MP3s from the RSS feed. This is not desirable for someone who is already up to date with the podcast.

Describe the solution you'd like
An option in the configuration file that specifies how often the script is run (for example, as a number of days).
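A configuration later on this page uses the value "download_from_4_days" for "if_directory_empty". Below is a sketch of how such a value could be parsed and used to keep only recent episodes. The regex, the (title, date) entry shape, and the function names are assumptions, not the project's implementation.

```python
import re
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

DAYS_OPTION = re.compile(r"download_from_(\d+)_days")


def days_from_option(option: str) -> Optional[int]:
    """Extract N from "download_from_N_days"; return None for other values."""
    match = DAYS_OPTION.fullmatch(option)
    return int(match.group(1)) if match else None


def entries_from_last_days(
    entries: List[Tuple[str, datetime]], days: int, now: datetime
) -> List[Tuple[str, datetime]]:
    """Keep (title, published) pairs published within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [entry for entry in entries if entry[1] >= cutoff]
```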

Can't download from this RSS

Describe the bug
Can't download from this RSS: "https://www.omnycontent.com/d/playlist/6dd8413b-ede6-483a-bf4e-ab80014939de/20f4bf02-d62f-40b2-b532-af10011ba71b/2bdbf0f4-e0ca-4343-9fb2-af10011ba729/podcast.rss"

To Reproduce
JSON file:

{
    "if_directory_empty": "download_from_4_days",
    "podcasts": [
        {
            "name": " Listening Time",
            "rss_link": "https://www.omnycontent.com/d/playlist/6dd8413b-ede6-483a-bf4e-ab80014939de/20f4bf02-d62f-40b2-b532-af10011ba71b/2bdbf0f4-e0ca-4343-9fb2-af10011ba729/podcast.rss",
            "path": "./ttt",
            "file_name_template": "[%publish_date%] %title%.%file_extension%"
        }
    ]
}

Command:
python3 -m podcast_downloader

Expected behavior
Download episodes.

Error message

[2023-02-05 14:48:21] Loading configuration (from file: "~/.podcast_downloader_config.json")
[2023-02-05 14:48:21] Checking " Listening Time"
[2023-02-05 14:48:22] Last downloaded file "<none>"
[2023-02-05 14:48:22]  Listening Time: Nothing new
[2023-02-05 14:48:22] ------------------------------
[2023-02-05 14:48:22] Finished

Desktop (please complete the following information):

  • OS: Ubuntu
  • Python version: 3.8.10
  • Version: ??? (I don't know what this version refers to.)
  • Link to RSS feed: https://www.omnycontent.com/d/playlist/6dd8413b-ede6-483a-bf4e-ab80014939de/20f4bf02-d62f-40b2-b532-af10011ba71b/2bdbf0f4-e0ca-4343-9fb2-af10011ba729/podcast.rss

error

C:\Users\Filipe Mota>python -m podcast_downloader
[2023-10-15 16:20:41] Loading configuration (from file: "~/.podcast_downloader_config.json")
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "C:\Users\Filipe Mota\AppData\Roaming\Python\Python312\site-packages\podcast_downloader\__main__.py", line 159, in
    load_configuration_file(os.path.expanduser(CONFIG_FILE)),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Filipe Mota\AppData\Roaming\Python\Python312\site-packages\podcast_downloader\parameters.py", line 21, in load_configuration_file
    return json.load(json_file)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
           ^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Invalid \escape: line 7 column 23 (char 217)
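A common cause of "Invalid \escape" is a Windows path written with single backslashes inside a JSON string. In JSON, a backslash starts an escape sequence, and sequences such as \U or \D are not valid. The configuration itself is not shown, so this cause is an assumption. The sketch below reproduces the error and shows two forms that parse: doubled backslashes and forward slashes. The helper name parse_config_text is hypothetical.

```python
import json
from typing import Any, Optional, Tuple


def parse_config_text(text: str) -> Tuple[Optional[Any], Optional[str]]:
    """Parse configuration text, returning (config, None) or (None, error description)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as error:
        return None, f"{error.msg} (line {error.lineno}, column {error.colno})"


broken = r'{"path": "C:\Users\Filipe Mota\Downloads\Podcast"}'      # \U is not a JSON escape
doubled = r'{"path": "C:\\Users\\Filipe Mota\\Downloads\\Podcast"}'  # \\ is a literal backslash
forward = '{"path": "C:/Users/Filipe Mota/Downloads/Podcast"}'       # Windows also accepts "/"

for text in (broken, doubled, forward):
    print(parse_config_text(text))
```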

If the directory is empty, an exception is thrown

Describe the bug
If you try to check an empty directory, an exception is thrown.

To Reproduce
Steps to reproduce the behavior:

  1. Set up a podcast
  2. Make sure that its directory is empty
  3. Run the script

Additional context

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "__main__.py", line 79, in <module>
    last_downloaded_file = get_last_downloaded(rss_source_path)
  File "downloaded.py", line 23, in get_last_downloaded
    return next(get_downloaded_files(podcast_directory))
StopIteration
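The traceback shows get_last_downloaded calling next() on the iterator of downloaded files without a default, which raises StopIteration when the directory has no files. The built-in next(iterator, default) returns the default instead. The helpers below are hypothetical stand-ins for the project's functions. They assume file names start with a sortable date such as "[20230314]", so that reverse order puts the newest file first.

```python
import os
from typing import Iterable, Iterator, Optional


def newest_first(file_names: Iterable[str]) -> Iterator[str]:
    """Yield .mp3 names in reverse lexical order (newest first under the date-prefix assumption)."""
    return iter(sorted((name for name in file_names if name.endswith(".mp3")), reverse=True))


def last_downloaded(file_names: Iterable[str]) -> Optional[str]:
    # next() with a default returns None instead of raising StopIteration.
    return next(newest_first(file_names), None)


def last_downloaded_in(podcast_directory: str) -> Optional[str]:
    return last_downloaded(os.listdir(podcast_directory))
```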

There is a problem with reading the configuration file

Describe the bug
The script cannot find the existing configuration file in the home directory: ~/.podcast_downloader_config.json

To Reproduce
Steps to reproduce the behavior:

  1. Place the configuration file at ~/.podcast_downloader_config.json
  2. Run the script
  3. See the error: "Cannot find configuration file"

Expected behavior
The script runs without problems.
