mmguero / cleanvid

cleanvid is a little script to mute profanity in video files

License: BSD 3-Clause "New" or "Revised" License

Python 94.30% Dockerfile 2.51% Shell 3.19%
srt video profanity video-files ffmpeg subtitle objectional-language python python3 profanity-detection

cleanvid's Introduction

cleanvid

cleanvid is a little script to mute profanity in video files in a few simple steps:

  1. The user provides as input a video file and matching .srt subtitle file. If subtitles are not provided explicitly, they will be extracted from the video file if possible; if not, subliminal is used to attempt to download the best matching .srt file.
  2. pysrt is used to parse the .srt file, and each entry is checked against a list of profanity or other words or phrases you'd like muted. Mappings can be provided (e.g., map "sh*t" to "poop"); otherwise the word will be replaced with *****.
  3. A new "clean" .srt file is created, containing only those phrases with the censored/replaced objectionable language.
  4. ffmpeg is used to create a cleaned video file. This file contains the original video stream, but the audio stream is muted during the segments containing objectionable language. The audio stream is re-encoded as AAC and remultiplexed back together with the video. Optionally, the clean .srt file can be embedded in the cleaned video file as a subtitle track.
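Steps 2 and 3 boil down to matching each subtitle's text against a word list and substituting either a mapped word or `*****`. The sketch below shows that idea with the standard library only. It is not cleanvid's code: the word list is a plain dict (an assumption, since the on-disk format of the swears file isn't described here), matching is a case-insensitive whole-word regex, and `build_pattern`/`censor` are hypothetical names.

```python
import re

def build_pattern(words):
    """Compile one case-insensitive regex matching any listed word or phrase
    as a whole word. Longer entries come first so phrases win over substrings."""
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def censor(text, mapping):
    """Return (cleaned_text, changed).

    mapping: {word_or_phrase: replacement}; an empty replacement means '*****'.
    """
    if not mapping:
        return text, False
    pattern = build_pattern(mapping.keys())
    lowered = {k.lower(): v for k, v in mapping.items()}

    def substitute(match):
        replacement = lowered.get(match.group(0).lower())
        return replacement if replacement else "*****"

    cleaned, count = pattern.subn(substitute, text)
    return cleaned, count > 0
```

For example, `censor("Oh sh*t!", {"sh*t": "poop"})` returns `("Oh poop!", True)`. Only cues where `changed` is true would go into the "clean" .srt described in step 3.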

You can then use your favorite media player to play the cleaned video file together with the cleaned subtitles.

As an alternative to creating a new video file, cleanvid can create a simple EDL file (see the mplayer or KODI documentation) or a custom JSON definition file for PlexAutoSkip.
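MPlayer-style EDL files are plain text: one line per segment with a start time and an end time in seconds, followed by an action code, where `1` means mute (and `0` means skip). A minimal sketch of writing such lines from a list of (start, end) spans; `to_edl` is a hypothetical helper, not part of cleanvid, and the PlexAutoSkip JSON format is not covered because its schema isn't described here.

```python
def to_edl(intervals, action=1):
    """Format (start, end) spans in seconds as EDL lines (action 1 = mute, 0 = skip)."""
    lines = [f"{start:.3f} {end:.3f} {action}" for start, end in sorted(intervals)]
    # One entry per line, each terminated by a newline; empty input gives "".
    return "".join(line + "\n" for line in lines)
```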

cleanvid is part of a family of projects with similar goals:

Installation

Using pip, to install the latest release from PyPI:

python3 -m pip install -U cleanvid

Or to install directly from GitHub:

python3 -m pip install -U 'git+https://github.com/mmguero/cleanvid'

Prerequisites

cleanvid requires:

To install FFmpeg, use your operating system's package manager or install binaries from ffmpeg.org. The Python dependencies will be installed automatically if you are using pip to install cleanvid.
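A quick way to confirm that FFmpeg is reachable before running cleanvid, as a small standard-library sketch (`ffmpeg_version` is a hypothetical helper):

```python
import shutil
import subprocess

def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `<executable> -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

If `ffmpeg_version()` returns `None`, FFmpeg is either not installed or not on your `PATH`.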

usage

usage: cleanvid.py [-h] [-s <srt>] -i <input video> [-o <output video>] 
                   [--plex-auto-skip-json <output JSON>] [--plex-auto-skip-id <content identifier>]
                   [--subs-output <output srt>] [-w <profanity file>] [-l <language>] [-p <int>] [-e] [-f] [--subs-only] [--offline] [--edl] [-r] [-b]
                   [-v VPARAMS] [-a APARAMS]

options:
  -h, --help            show this help message and exit
  -s <srt>, --subs <srt>
                        .srt subtitle file (will attempt auto-download if unspecified and not --offline)
  -i <input video>, --input <input video>
                        input video file
  -o <output video>, --output <output video>
                        output video file
  --plex-auto-skip-json <output JSON>
                        custom JSON file for PlexAutoSkip (also implies --subs-only)
  --plex-auto-skip-id <content identifier>
                        content identifier for PlexAutoSkip (also implies --subs-only)
  --subs-output <output srt>
                        output subtitle file
  -w <profanity file>, --swears <profanity file>
                        text file containing profanity (with optional mapping)
  -l <language>, --lang <language>
                        language for srt download (default is "eng")
  -p <int>, --pad <int>
                        pad (seconds) around profanity
  -e, --embed-subs      embed subtitles in resulting video file
  -f, --full-subs       include all subtitles in output subtitle file (not just scrubbed)
  --subs-only           only operate on subtitles (do not alter audio)
  --offline             don't attempt to download subtitles
  --edl                 generate MPlayer EDL file with mute actions (also implies --subs-only)
  -r, --re-encode       Re-encode video
  -b, --burn            Hard-coded subtitles (implies re-encode)
  -v VPARAMS, --video-params VPARAMS
                        Video parameters for ffmpeg (only if re-encoding)
  -a APARAMS, --audio-params APARAMS
                        Audio parameters for ffmpeg
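For context on what producing the output video involves: one way to silence a list of time spans with ffmpeg is its `volume` filter with a timeline `enable` expression, while copying the video stream untouched and re-encoding the audio as AAC. This is an illustrative alternative, not necessarily the filter chain cleanvid builds; `mute_filter` and `ffmpeg_command` are hypothetical names.

```python
def mute_filter(intervals):
    """Build an ffmpeg audio filter string that silences each (start, end) span."""
    parts = [
        # The single quotes are ffmpeg filtergraph quoting, protecting the commas in between().
        f"volume=enable='between(t,{start:.3f},{end:.3f})':volume=0"
        for start, end in intervals
    ]
    # "anull" is ffmpeg's pass-through audio filter, used when nothing needs muting.
    return ",".join(parts) if parts else "anull"

def ffmpeg_command(src, dst, intervals):
    """argv that copies the video stream and re-encodes audio as AAC with mutes applied."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "copy",
        "-af", mute_filter(intervals),
        "-c:a", "aac",
        dst,
    ]
```

Passing the list straight to `subprocess.run(ffmpeg_command("in.mkv", "out.mkv", [(27.465, 29.015)]), check=True)` avoids any shell quoting of the filter string.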

Docker

Alternately, a Dockerfile is provided to allow you to run cleanvid in Docker. You can build the oci.guero.top/cleanvid:latest Docker image with build_docker.sh, then run cleanvid-docker.sh inside the directory where your video/subtitle files are located.

Contributing

If you'd like to help improve cleanvid, pull requests will be welcomed!

Authors

  • Seth Grover - Initial work - mmguero

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Acknowledgments

Thanks to:

cleanvid's People

Contributors

drive4code, mmguero

cleanvid's Issues

Make the .edl CMX 3600 Format for Compatibility

Because the audio artifacts are still there, I've been trying to find a way to solve the problem by bypassing ffmpeg. The problem is that the file generated with the --edl flag is only compatible with MPlayer and Kodi, as its format differs from the standard.

If the EDL were in CMX 3600 format, I would be able to import it into a video editor and apply the changes there, bypassing ffmpeg.
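CMX 3600 event lines express times as `HH:MM:SS:FF` timecodes (frames rather than fractional seconds), so any converter first has to map second-based times onto frames. A sketch of just that conversion, assuming an integer frame rate and non-drop-frame timecode (29.97 fps drop-frame needs different arithmetic). `to_timecode` is a hypothetical name, and the full event-line layout is not shown.

```python
def to_timecode(seconds, fps=24):
    """Convert seconds to non-drop-frame HH:MM:SS:FF at an integer frame rate."""
    total_frames = round(seconds * fps)       # snap to the nearest whole frame
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, remainder = divmod(total_seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"
```

At 24 fps, `to_timecode(27.465)` gives `00:00:27:11` (659 frames).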

Add the ability to insert time codes and have cleanvid make cuts

CleanVid works great! I appreciate the work. I'd love the ability to also provide a file with time codes and have cleanvid cut the video, kind of like what Vidangel does, but open source of course. If I had the skills I'd love to try making it, but my coding is very limited.
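A cut feature first needs to turn a list of time spans to remove into the segments to keep. A sketch of that step (hypothetical `keep_segments`; extracting and concatenating the kept segments with ffmpeg is left out):

```python
def keep_segments(duration, cuts):
    """Return the (start, end) spans to keep once `cuts` are removed from 0..duration."""
    kept = []
    cursor = 0.0  # end of the last region already accounted for
    for start, end in sorted(cuts):
        start, end = max(0.0, start), min(duration, end)
        if start >= end:
            continue  # empty, or entirely outside the video
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)  # overlapping cuts simply extend the cursor
    if cursor < duration:
        kept.append((cursor, duration))
    return kept
```

For a 100-second video with cuts at 10 to 20, 15 to 30, and 50 to 60 seconds, this keeps 0 to 10, 30 to 50, and 60 to 100.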

--swears

I must be doing something wrong. -w / --swears doesn't seem to allow me to use my own swears.txt file.

GUI or Plugin Support coming?

Would be really nice to have plugin support for Kodi/Jellyfin, Emby, or Plex.
Any plans to add this, or just create a basic GUI to make running this easier?

--edl is not consistent

When running with the --edl parameter, it does not get all of the subtitle sections containing swear words. It does not seem to be based on the word, but more on how close together the entries are; maybe it only drops the last one?

Below is the cleaned SRT I was testing with.

1
00:00:27,465 --> 00:00:29,015
Propose. dang bad word there

2
00:00:29,066 --> 00:00:31,265

- Two.
- What's the stake? dang bad word there

3
00:00:33,297 --> 00:00:34,647
Your ring. poop bad word there

4
00:00:40,548 --> 00:00:44,257
Gambling again, Poldark. ***** bad word there
Remind me why you enlisted.

5
00:00:44,306 --> 00:00:47,016
To escape the gallows, sir. ***** bad word there

And this is my resulting EDL file. The SRT file went well past this point, but this is a good example.

27.5 29.015 1
29.1 31.265 1
33.3 34.647 1
40.5 44.257 1
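In the example above, the fifth cue (00:00:44,306 to 00:00:47,016) has no EDL line, and it starts only 49 ms after the fourth cue ends. Whatever the cause in cleanvid turns out to be, one way to keep nearby mute spans from colliding is to merge spans that overlap or nearly touch before writing the EDL. A sketch with a hypothetical `merge_intervals` helper and a tolerance `gap` in seconds:

```python
def merge_intervals(intervals, gap=0.0):
    """Merge (start, end) spans that overlap or are separated by at most `gap` seconds."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= gap:
            # Close enough to the previous span: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With `gap=0.1`, the five cues above collapse into three spans: (27.465, 31.265), (33.297, 34.647), and (40.548, 47.016).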

-f output

Is it possible to have the subtitles output with multiple words per line instead of 1 word per timestamp?

[image]
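Word-level subtitle files (one cue per word) can be regrouped into readable lines by joining consecutive cues while the gap between them stays small. A sketch under that assumption; `group_words`, `max_words`, and `max_gap` are hypothetical, and the thresholds are arbitrary:

```python
def group_words(cues, max_words=7, max_gap=0.5):
    """Combine (start, end, text) cues into multi-word (start, end, text) lines."""
    lines = []  # each entry: (start, end, [words])
    for start, end, text in cues:
        if lines:
            last_start, last_end, last_words = lines[-1]
            if start - last_end <= max_gap and len(last_words) < max_words:
                lines[-1] = (last_start, end, last_words + [text])
                continue
        lines.append((start, end, [text]))
    return [(s, e, " ".join(words)) for s, e, words in lines]
```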

Subtitle output

Is it possible to integrate an argument for subtitle output location?

I'm exporting my cleaned movies to another folder, but the clean srt stays with the input file.

something like ./cleanvid.py -w swears.txt -i /mnt/terriblehorriblemovies/Filthymovie.mp4 -o mnt/familymovies/Cleanmovie.mp4 -s /mnt/terriblehorriblemovies/Filthymovie.srt --subout /mnt/familymovies/Cleanmovie.srt

Make the padding a float

Would it be possible? It's annoying that, a fraction of a second before the word ends, I start to hear it.
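The usage text declares `-p <int>`. Accepting fractional seconds is a matter of parsing the option as a float. A minimal argparse sketch (a hypothetical parser, not cleanvid's actual one):

```python
import argparse

def build_parser():
    """Hypothetical parser showing only a float-valued --pad option."""
    parser = argparse.ArgumentParser(prog="pad-sketch")
    parser.add_argument("-p", "--pad", type=float, default=0.0,
                        help="pad (seconds, fractions allowed) around profanity")
    return parser
```

With this parser, `-p 0.25` yields a quarter-second pad, and whole numbers such as `-p 1` still work.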

Timestamp inaccuracy

After the update, the video gets muted for too long. The timing after the swear word works well, but before it there are consistent issues.
Here's a visualization of the issue:
[image]

My SRT file at this point contains the following:
212
00:00:58,680 --> 00:00:58,700
a

213
00:00:58,700 --> 00:00:58,720
fuck.

214
00:00:58,720 --> 00:00:59,300
You're

215
00:00:59,300 --> 00:00:59,360
a

216
00:00:59,360 --> 00:00:59,620
fucking

217
00:00:59,620 --> 00:00:59,680
liar.
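Most of these cues last well under half a second, so any fixed padding dominates the muted span. With a one-second pad, the 20 ms cue at 00:00:58,700 becomes a mute of about 2.02 seconds. A small sketch of that arithmetic (hypothetical `srt_seconds` and `padded_span`, parsing the standard `HH:MM:SS,mmm` timestamp):

```python
import re

SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")

def srt_seconds(stamp):
    """Convert an SRT timestamp such as '00:00:58,700' to seconds."""
    match = SRT_TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000

def padded_span(start_stamp, end_stamp, pad):
    """Mute span for one cue, widened by `pad` seconds on each side (never below 0)."""
    start, end = srt_seconds(start_stamp), srt_seconds(end_stamp)
    return max(0.0, start - pad), end + pad
```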

Audio changes

Is it possible to modify the audio track of the input video and apply a filter to boost it by a certain number of dB, as you normally can in ffmpeg? I noticed there is an option for audio parameters, but it only seems to take effect when re-encoding.
Please let me know thanks.
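ffmpeg's `volume` filter accepts a gain in decibels, e.g. `volume=6dB`. How cleanvid's `-a` hands its value to ffmpeg isn't described here, so this sketch builds a standalone ffmpeg command that copies the video and boosts the audio (`boost_command` is a hypothetical name):

```python
def boost_command(src, dst, gain_db=6):
    """argv that copies the video stream and raises the audio gain by gain_db decibels."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "copy",                 # leave the video untouched
        "-af", f"volume={gain_db}dB",   # audio gain filter
        "-c:a", "aac",                  # filtered audio must be re-encoded
        dst,
    ]
```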

Mute master pc volume app

Hi I'm sorry if this is in the wrong place but I can't see any other way of contacting you. Thank you for creating software to remove profanities.
I currently use an EDL file along with Kodi to mute the swearing in videos, but this is limited to videos I own. I would love to be able to play TV shows and films from streaming platforms such as Amazon and Apple TV and mute the swearing on the fly. The only way I can think of doing this is a simple timer app that imports an EDL file for the timings and is started at the same time as the streaming video. The app would mute the master PC volume based on those timings.
I'm not a programmer so I'm not going to attempt this myself but wondered if you might be up for the challenge or could direct me to the right website where I could request this?

Regards

Nathan
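The scheduling half of such an app is straightforward, while the muting half depends on the operating system. The sketch below therefore takes a `set_muted` callback instead of calling any real audio API. It reads MPlayer-style EDL lines (start, end, action, with `1` meaning mute), and times are relative to when `run_schedule` starts. `parse_edl`, `run_schedule`, and `set_muted` are hypothetical names. Keeping the timer in sync with a streaming service's playback (pauses, buffering, ads) is the hard part and is not addressed here.

```python
import time

def parse_edl(text):
    """Return sorted (start, end) spans for mute (action 1) lines of an EDL."""
    spans = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # blank, malformed, or comment line
        if fields[2] == "1":
            spans.append((float(fields[0]), float(fields[1])))
    return sorted(spans)

def run_schedule(spans, set_muted, clock=time.monotonic, sleep=time.sleep):
    """Call set_muted(True) at each span start and set_muted(False) at each end."""
    t0 = clock()
    for start, end in spans:
        for when, muted in ((start, True), (end, False)):
            delay = when - (clock() - t0)
            if delay > 0:
                sleep(delay)
            set_muted(muted)
```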

Multilingual Filtering

The swears.txt file is English-only, but I like watching foreign-language films, so a multilingual dataset of bad words would be great. LDNOOBW by Shutterstock is probably the best dataset I've come across that could do the job.

Seeing as we already have a --lang flag, perhaps we could extend it to select which language(s) to search for bad words? The one problem would be when some video has more than one language in it - thoughts?

Keep getting "SyntaxError: unexpected character after line continuation character"

I am new to your program. No matter what I do, I keep getting the following error "SyntaxError: unexpected character after line continuation character."

I am using the following code, but I am not sure what to use for APARAMS:

cleanvid.py [-s <C:\Users\User\Desktop\Video1.srt>] -i <C:\Users\User\Desktop\Video1.mkv> [-o <C:\Users\User\Desktop\Clean Video1.mkv>] [--subs-output <C:\Users\User\Desktop\Clean Video1.srt>] [-w <C:\Users\User\Desktop\Profanity.txt>] [-p <3>] [--offline] [-a APARAMS]
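In the usage text, square brackets mark optional arguments and angle brackets mark placeholders. Neither is typed literally, and `-a APARAMS` can be left out entirely. Python also reports exactly this `SyntaxError` when a line containing `C:\Users\...` is typed at the `>>>` interpreter prompt, so the command may have been entered there rather than in a terminal such as PowerShell (an assumption about this case). From a Python script, building the arguments as a list avoids quoting problems with paths that contain spaces. `cleanvid_args` and `run_cleanvid` are hypothetical helpers that use only the options listed above:

```python
import subprocess

def cleanvid_args(video, subs, output, swears, pad=None, offline=True):
    """Build a cleanvid argument list; no brackets or angle brackets are included."""
    args = ["cleanvid", "-i", video, "-s", subs, "-o", output, "-w", swears]
    if pad is not None:
        args += ["-p", str(pad)]
    if offline:
        args.append("--offline")
    return args

def run_cleanvid(args):
    """Run cleanvid; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(args, check=True)
```

For example: `run_cleanvid(cleanvid_args(r"C:\Users\User\Desktop\Video1.mkv", r"C:\Users\User\Desktop\Video1.srt", r"C:\Users\User\Desktop\Clean Video1.mkv", r"C:\Users\User\Desktop\Profanity.txt", pad=3))`.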

Error while executing the code

I have tried installing delegator.py, but I still end up with the error below.

Would you please assist me? My Python version is 2.7.

AttributeError: 'module' object has no attribute 'run'

Traceback (most recent call last):
File "cleanvid.py", line 162, in
cleaner.MultiplexCleanVideo()
File "cleanvid.py", line 124, in MultiplexCleanVideo
ffmpegResult = delegator.run("ffmpeg -y -i "" + self.inputVidFileSpec + """ +
AttributeError: 'module' object has no attribute 'run'
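`delegator.run` in this traceback comes from the third-party delegator.py package, and the `delegator` module that got imported has no `run` attribute. Python 2.7 is no longer supported upstream, and the install instructions above use `python3`. Python 3's standard library covers the same need with `subprocess.run`, as in this sketch (`run_command` is a hypothetical name):

```python
import subprocess

def run_command(argv):
    """Run a command given as a list; return (exit code, stdout, stderr) as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.returncode, result.stdout, result.stderr
```

For example, `run_command(["ffmpeg", "-version"])` returns ffmpeg's exit code and output, assuming ffmpeg is on `PATH`.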

--plex-auto-skip-json is not generating anything

Consider the following command; I am unable to get cleanvid to generate the auto-skip JSON:

cleanvid -i "/example/Oppenheimer (2023) (2160p).mkv" --plex-auto-skip-json /Users/Example/Desktop/json.json --plex-auto-skip-id 66558 -w /Users/Example/Desktop/wordfilters.txt -s "/Users/Example/Downloads/oppenheimer.srt"

I can get it to create a video mkv file if I set an output AND designate -e. But I cannot get the tool to create the json file for plex-auto-skip

With that said, I really appreciate you supporting this and including plex-auto-skip. Let me know what other info you need.

Feature request: integrate with Plex

Your tool looks amazing. Can you please integrate it with Plex? I would love to mute profanity while watching videos with Plex.

https://www.plex.tv/
https://github.com/pkkid/python-plexapi
https://python-plexapi.readthedocs.io/en/latest/

https://www.reddit.com/r/PleX/comments/orbdby/plex_github_project_to_censor_profanity/
"I cant say that it would be possible for remote clients, or when playing shared content, but if someone is playing content on a plex client that is on the same network as the server then this is technically possible without plugins, assuming you have subtitle files that are in sync with the media. It is pretty much the same thing as how the automatic intro skippers work, only instead of skipping forward the intro length you do client.mute and client.setSubtitleStream actions, followed by the reverse after however long you want those off. Projects like the python plex api can be used to help control plex clients and poll what is playing. I cant recommend anything for scanning a subtitle file for keywords and getting the timestamp of where it happens, but odds are with how common subtitles are that there's a project or 5 already out there just for that purpose. I'd be using this type setup for a "movie dialog game" instead of to "censor profanity," and have something like the corner light flash red for a few seconds 5 or 10 seconds before a muted scene so I'd know to be ready for the next "improv section," but to each their own. No reason you cant host your creation on github."

[ Windows ] Error Out on Long files

I've noticed that cleanvid errors out on Windows when the file is too long; this happened with a 50-minute video.

Error: Too Long Commandline Argument

I think it's a Windows-specific issue with the limit on command-line length when chaining all those afade filters together.
I've tried enabling long paths in the registry as suggested online, but still no luck.
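Windows caps a process command line at 32,767 characters, and enabling long paths in the registry only affects file path length, so it does not help here. A common workaround is to write the filter graph to a file and point ffmpeg at it with `-filter_script`. That option has been documented in ffmpeg's manual for many releases, but newer releases may deprecate it in favor of a different syntax, so check the documentation for your version. Whether cleanvid should do this is an open question. `write_filter_script` and `command_with_script` are hypothetical names:

```python
import os
import tempfile

def write_filter_script(filter_graph):
    """Write a (possibly very long) filter graph to a temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(filter_graph)
    return path

def command_with_script(src, dst, script_path):
    """ffmpeg argv that reads the audio filter graph from script_path."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "copy",
        "-filter_script:a", script_path,  # keeps the command line short
        "-c:a", "aac",
        dst,
    ]
```

The caller should delete the temporary file once ffmpeg finishes.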

Utilizing more than 1 thread

OS: Linux (in a docker container to be specific)

Is there a way to have this script use more than one thread? I have some other ffmpeg jobs that can use more, which speeds them up considerably. Is that not possible here because of some restriction?
If so, is there a way to pass the -threads flag to ffmpeg through cleanvid?
Please let me know, thanks.

Embedded & Burned Subtitles are Incorrect

I've noticed that the subs when running the --burn and --embed-subs flags are the ones of the words_clean.srt rather than the cleaned original subtitles. This means the only thing displayed is ***** or replacement words where the profanity's been muted, rather than the original file's subtitle with the censored words

Error when running

I get this error when running the cmd:
File "C:\Program Files\Python38\lib\codecs.py", line 504, in read newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 0: invalid start byte

my cmd: cleanvid.py -s tls.srt -i tls.mp4 -o tlsc.mp4 -w swears.txt

I have Python 3.8 64bit installed.
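Byte 0xa9 is `©` in Windows-1252 and Latin-1, which suggests that one of the input text files (the .srt or swears.txt; the traceback doesn't say which) is not UTF-8. Re-saving the file as UTF-8 is the simplest fix. In code, a fallback decode could look like this sketch (hypothetical `decode_text` and `read_text`):

```python
def decode_text(raw, encodings=("utf-8-sig", "cp1252", "latin-1")):
    """Return (text, encoding) using the first encoding that decodes `raw`."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"could not decode data with any of {encodings}")

def read_text(path):
    """Read a subtitle or word-list file as bytes and decode it with fallbacks."""
    with open(path, "rb") as handle:
        return decode_text(handle.read())
```

`utf-8-sig` also strips a UTF-8 byte-order mark if one is present. `latin-1` accepts any byte, so it acts as a last resort that never fails but may produce wrong characters.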

Wiki

Please add a wiki page on how to process a movie file.

Possible inclusion of a range flag / option?

Hello!
Firstly, I wanted to thank you for making this neat program, I really appreciate it and it's been a wonderful help for me!

For this issue/suggestion, I'd like to ask about a range option, or for help with a line of code in the script if that's possible. Some other profanity muters have an option to extend each matched subtitle line before and after. That way, sudden exclamations, slightly offset timing, or trailing swears are still caught by the mute, because it starts a little early (usually 500 ms) and ends a little later (also 500 ms) than the line's own duration, like so:

[image]

Any assistance would be wonderful!
Thank you again
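A before/after extension is a small transform applied to each (start, end) span. Widened spans can then overlap, so they are best merged before muting. A sketch with a hypothetical `extend` helper, using the 500 ms defaults mentioned above:

```python
def extend(intervals, before=0.5, after=0.5, duration=None):
    """Widen each (start, end) span by `before`/`after` seconds, clamped to the video."""
    widened = []
    for start, end in intervals:
        new_start = max(0.0, start - before)  # never before the start of the video
        new_end = end + after if duration is None else min(duration, end + after)
        widened.append((new_start, new_end))
    return widened
```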

Add support for Kodi plugin

Awesome project! Any chance that this could be wrapped as a Kodi plugin, so the whole process can be managed within Kodi (generating clean srt file + edl file)?

Verbose Mode / Embedding Comments in EDL Files

Sometimes I'd like to tweak my EDL files, and while this tool can output a generally great starting point, it's cumbersome trying to figure out which words it's trying to mute on. I've tested on Debian Stable's MPlayer (1.4) and it appears it ignores comments or in fact anything after the 3rd field. See this line for reference. For Kodi, which I don't use, it allows comments on their own lines starting with ##. Note I haven't checked if Kodi disregards anything past the 3rd field.

I propose a command-line flag (or even default behavior!) embedding within the EDL file information on what word(s) it is censoring at such-and-such a timestamp.
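Going by the observations in this issue (MPlayer 1.4 ignoring anything after the third field, Kodi accepting `##` comment lines), an annotated EDL could be written in either style. Since it is untested whether Kodi also ignores trailing fields, the sketch keeps the two styles separate. `annotated_edl` is a hypothetical helper:

```python
def annotated_edl(entries, style="mplayer"):
    """entries: (start, end, words) tuples; returns EDL text using mute action 1.

    mplayer: words are appended after the third field on the same line.
    kodi: words go on their own '##' comment line before the entry.
    """
    lines = []
    for start, end, words in entries:
        entry = f"{start:.3f} {end:.3f} 1"
        if style == "kodi":
            lines.extend([f"## {words}", entry])
        else:
            lines.append(f"{entry} # {words}")
    return "\n".join(lines) + "\n"
```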

Strip subtitle format?

Hi. I am getting subtitles in this format when using the tool. Is there a way to strip the font size etc? I don't want that in the subtitles.

1
00:00:05,040 --> 00:00:05,880
{\an1}[engine revving]

2
00:00:06,360 --> 00:00:07,920
{\an1}Sideways in linen.
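`{\an1}` is an ASS/SSA override tag (here setting alignment). Like HTML-style tags such as `<i>` or `<font ...>`, it is markup rather than dialogue. A regex-based sketch for removing both kinds from cue text (hypothetical `strip_formatting`; it handles only these two simple tag shapes):

```python
import re

# ASS/SSA override blocks such as {\an1} or {\i1}
OVERRIDE_TAGS = re.compile(r"\{\\[^}]*\}")
# HTML-style tags such as <i>, </i>, <font size="20">
HTML_TAGS = re.compile(r"</?[a-zA-Z][^>]*>")

def strip_formatting(text):
    """Remove override blocks and HTML-style tags, leaving the spoken text."""
    return HTML_TAGS.sub("", OVERRIDE_TAGS.sub("", text))
```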

Offline mode?

From the README: "If subtitles are not provided explicitly, they will be extracted from the video file if possible; if not, subliminal is used to attempt to download the best matching .srt file."

It would be a nice idea to include some flag where cleanvid would never need Internet access or otherwise call out to whatever various servers subliminal uses.

Swear Censoring

Hey, I'm rather new to Python and coding in general, but I'm a quick learner. I've been at it for hours trying to get this script to work and for the life of me cannot.

PS C:\Users\Kaize> cleanvid -i "C:\Users\Kaize\OneDrive\Pictures\Camera Roll\WIN_20240205_12_48_28_Pro.mp4" -w "D:\Programs\Python 3\Lib\site-packages\cleanvid\swears.txt" -o "C:\Users\Kaize\OneDrive\Pictures\Camera Roll\Censored"
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in run_code
File "D:\Programs\Python 3\Scripts\cleanvid.exe\__main__.py", line 7, in
File "D:\Programs\Python 3\Lib\site-packages\cleanvid\cleanvid.py", line 663, in RunCleanvid
subsFile = GetSubtitles(inFile, lang, args.offline)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Programs\Python 3\Lib\site-packages\cleanvid\cleanvid.py", line 147, in GetSubtitles
savedSub = save_subtitles(video, [bestSubtitles[video][0]])
~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

This is what I'm getting. I'm not worried about subtitles; I just need the profanity in videos muted. I've tried --offline as well, and still nothing. I'm using Windows PowerShell.
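The traceback ends at `bestSubtitles[video][0]`, which suggests that no downloadable subtitle was found for this file. A camera recording (as the name `WIN_20240205_12_48_28_Pro.mp4` suggests) is also unlikely to carry embedded subtitles. Because cleanvid locates profanity through subtitle text, some .srt has to exist for the video and be passed with `-s`. A sketch of checking for embedded subtitle streams beforehand with ffprobe (hypothetical `subtitle_streams`; assumes ffprobe, which usually ships with ffmpeg, is on `PATH`):

```python
import shutil
import subprocess

def parse_stream_indexes(ffprobe_output):
    """Parse ffprobe's csv=p=0 output (one stream index per line) into ints."""
    indexes = []
    for line in ffprobe_output.splitlines():
        field = line.strip().split(",")[0]
        if field.isdigit():
            indexes.append(int(field))
    return indexes

def subtitle_streams(video):
    """Return the indexes of subtitle streams in `video`; empty if there are none."""
    if shutil.which("ffprobe") is None:
        raise FileNotFoundError("ffprobe is not on PATH")
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "s",
         "-show_entries", "stream=index", "-of", "csv=p=0", video],
        capture_output=True, text=True, check=True,
    )
    return parse_stream_indexes(result.stdout)
```

An empty list means there is nothing to extract, so a matching .srt would have to come from somewhere else.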
