Comments (7)
Thank you for the offer @fogoplayer! At least for now, I've had to remove the GPL/open-source license on the audio component, so it isn't in this repo anymore. I have a pretty busy next couple of days, but after that I hope to have some time to dedicate to this. If I had to guess, it is probably a case of the captions being added out of sync with the audio, but that's just a guess. It seems like YouTube has gotten progressively worse over the last couple years with that. I'll definitely take a closer look as soon as I can though, and report findings back here.
If you wanted to do some more research, I'm curious whether we can find the API that YouTube uses to request the captions, and whether it includes more timing info. Right now the muting is pretty simple: a mutation observer watches for any nodes being added or modified and determines whether each one is part of the captions. It then filters the node; if it contains a word that needs to be filtered, it mutes until the next word/phrase gets added to the page, and if it doesn't, it unmutes. Then the process repeats. So right now the timing depends entirely on when the elements get added to/removed from the DOM.
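For anyone following along, the mutation-observer approach described above might be sketched roughly like this (the caption-container selector, the word list, and the use of `video.muted` are all placeholder assumptions for illustration, not the extension's actual code):

```js
// Rough sketch of the mutation-observer muting approach described above.
// The word list and the way the caption container is located are placeholders.
const filteredWords = ["badword1", "badword2"]

function containsFilteredWord(text) {
  const lower = text.toLowerCase()
  return filteredWords.some((word) => lower.includes(word))
}

function watchCaptions(captionContainer, video) {
  const observer = new MutationObserver(() => {
    // Re-read the caption text whenever nodes are added/modified,
    // and mute until the next caption update if a filtered word appears
    const text = captionContainer.textContent || ""
    video.muted = containsFilteredWord(text)
  })
  observer.observe(captionContainer, {
    childList: true,
    subtree: true,
    characterData: true,
  })
  return observer
}
```

The key limitation is visible right in the sketch: muting can only react to DOM changes, so it is only as timely as YouTube's caption rendering.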
It looks like we can find the API!
For this video: https://www.youtube.com/watch?v=CWWSovO3Txc
It sent this request and got this response. (Recorded as Gists so I don't add a 9,000-line JSON file to this thread.)
Key insights:
- The request seems to grab caption data for the entire video--it's only fired once
- It's fired when the user enables Closed Captioning, not on page load.
- The jackpot seems to be the `events` member, which is a list.
- Members of that list are objects that have the following members:
  - `tStartMs`: number
  - `dDurationMs`: number
  - `wWinId`: number
  - `segs`: Object[]
- I'm guessing the first two are the start time and duration of the string represented in the caption
- `segs` is another list of objects. Those keys are:
  - `utf8`: a string containing a single word
  - `tOffsetMs`: number
  - `acAsrConf`: number
My next steps are going to be to whip together a proof-of-concept script to validate whether this JSON data can be converted into accurate timings. I also might try to write something to crawl over the file and make sure there aren't any exceptions to the grammar above.
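That grammar crawl could be as simple as a function that walks every event and seg and reports anything that doesn't match the shape above (a quick sketch; field names are taken from the recorded response, and the optionality of `dDurationMs`/`segs`/`tOffsetMs` is an assumption to verify against the real file):

```js
// Quick-and-dirty validator for the caption JSON grammar described above.
// Returns a list of human-readable problems; an empty list means the
// data matched the expected shape.
function validateCaptionGrammar(data) {
  const problems = []
  if (!Array.isArray(data.events)) {
    problems.push("top-level `events` is not an array")
    return problems
  }
  data.events.forEach((event, i) => {
    if (typeof event.tStartMs !== "number")
      problems.push(`events[${i}].tStartMs is not a number`)
    if (event.dDurationMs !== undefined && typeof event.dDurationMs !== "number")
      problems.push(`events[${i}].dDurationMs is not a number`)
    if (event.segs === undefined) return
    if (!Array.isArray(event.segs)) {
      problems.push(`events[${i}].segs is not an array`)
      return
    }
    event.segs.forEach((seg, j) => {
      if (typeof seg.utf8 !== "string")
        problems.push(`events[${i}].segs[${j}].utf8 is not a string`)
      if (seg.tOffsetMs !== undefined && typeof seg.tOffsetMs !== "number")
        problems.push(`events[${i}].segs[${j}].tOffsetMs is not a number`)
    })
  })
  return problems
}
```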
P.S.: I know you said you're short on time, so I have no expectation of quick replies. I'll post anything I can find in this thread, and I'll take direction and updates as they come.
Thanks for the examples @fogoplayer, I'll take a look and see what can be done as soon as I get some more time.
Sounds great!
I'd be happy to pitch in my own TypeScript skills, but every time I've scanned the repo for words related to audio, muting, etc., I haven't been able to find anything that looks like a relevant segment of code. I don't expect you to hold my hand through the whole process, but if you can point me in a direction, I'd love to get started on a PR!
Okay, proof-of-concept done:

```js
// jsdoc-typed js, because there's no need to get Babel involved in this
/**
 * @param {string} captionData - the stringified JSON data from the YT API
 * @param {boolean} [verbose] - controls logging
 */
function checkTimings(captionData, verbose = false) {
  /** @type {HTMLVideoElement} */
  const videoElement = document.querySelector("video")

  // YT caption JSON data includes "control characters" that are invalid JSON.
  // I'm not sure how Google is parsing them, but given that no curse words
  // contain control characters, it's enough for us to just ignore them.
  captionData = captionData.replaceAll(/"[\u0000-\u001F]"/g, '"control character"')
  const {events} = JSON.parse(captionData)

  for (const event of events) {
    const {tStartMs: startTime, segs} = event
    if (!segs) continue
    setTimeout(() => {
      for (const seg of segs) {
        const {tOffsetMs: delay = 0, utf8: token} = seg
        setTimeout(() => {
          if (verbose) console.log(token)
          if (token.trim() === "earbuds") {
            videoElement.volume = 0
          } else {
            videoElement.volume = 1
          }
        }, delay / videoElement.playbackRate)
      }
    }, startTime / videoElement.playbackRate)
  }

  videoElement.currentTime = 0
  videoElement.play()
}
```
A few key findings here:
- The start times in the JSON file do seem to have the obvious meaning!
- Sanitization of the JSON file may prove to be a non-trivial difficulty here. I know there are ways to access the `response.json()` directly, and if that method is properly escaped it might be a non-issue.
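One way to get at that parsed JSON directly would be to wrap `window.fetch` so the caption response can be cloned and read without re-parsing raw text ourselves. This is an untested sketch; in particular, the assumption that the caption endpoint URL contains `timedtext` needs to be checked against the actual recorded request:

```js
// Sketch: intercept the caption request so we receive already-parsed JSON
// instead of sanitizing and re-parsing the raw response text.
// ASSUMPTION: the caption endpoint URL contains "timedtext".
function isCaptionRequest(url) {
  return typeof url === "string" && url.includes("timedtext")
}

const originalFetch = typeof window !== "undefined" ? window.fetch : null

function installCaptionInterceptor(onCaptions) {
  window.fetch = async function (...args) {
    const response = await originalFetch.apply(this, args)
    const url = typeof args[0] === "string" ? args[0] : args[0].url
    if (isCaptionRequest(url)) {
      // Clone so the page's own code can still consume the body
      response.clone().json().then(onCaptions).catch(() => {})
    }
    return response
  }
}
```

An extension content script would call `installCaptionInterceptor(json => ...)` once at page load, before YouTube fires the caption request.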
And the big one:
- Timings were still somewhat inaccurate. They started out extremely accurate but seemed to drift as time went on, and by the three-minute mark they were noticeably lagging behind. It could be that YouTube's timestamps are off, but it seems more likely that it's an issue with my code.*

This is where I'm likely to focus going forward. I've heard using timeouts inside a Web Worker makes them more accurate, but a solution that runs on the `timeupdate` event might be better.

\* `setTimeout` is known to drift, and I wasn't trying particularly hard to optimize my code, so there may be additional delays (such as the use of a slower for-loop syntax). In addition to the method above (timeouts inside timeouts), I also tried a technique where I computed the total time offset before creating the timeout, with no noticeable difference.
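A third option worth trying is a self-correcting timer: instead of trusting `setTimeout`'s delay, each tick re-anchors against the video's own clock, so error can't accumulate across tokens. A sketch under that idea (the `video` object and schedule shape are placeholders):

```js
// Drift-compensating scheduler: every tick reads video.currentTime and
// fires all tokens whose timestamps have passed, then schedules the next
// tick relative to the video clock rather than chaining fixed delays.
function scheduleAgainstClock(video, timestamps, onToken) {
  let index = 0
  function tick() {
    if (index >= timestamps.length) return
    const nowMs = video.currentTime * 1000
    // Fire every token that is due, so lag on one tick can't snowball
    while (index < timestamps.length && timestamps[index].timeMs <= nowMs) {
      onToken(timestamps[index].token)
      index++
    }
    const nextDelay = index < timestamps.length
      ? Math.max(0, (timestamps[index].timeMs - nowMs) / video.playbackRate)
      : 0
    setTimeout(tick, nextDelay)
  }
  tick()
}
```

Because every tick re-reads `currentTime`, this also tolerates playback-rate changes between ticks, which the nested-timeout version cannot.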
New attempt:
```js
/**
 * @typedef {Object} Timestamp
 * @property {number} timeMs
 * @property {string} token
 */

/**
 * @param {string} captionData - the stringified JSON data from the YT API
 * @param {string} wordToMute - the word to mute when it comes up
 */
function checkTimings(captionData, wordToMute) {
  ////////////////////////
  // Parse caption data //
  ////////////////////////

  // YT caption JSON data includes "control characters" that are invalid JSON.
  // I'm not sure how Google is parsing them, but given that no curse words
  // contain control characters, it's enough for us to just ignore them.
  captionData = captionData.replaceAll(/"[\u0000-\u001F]"/g, '"control character"')
  const {events} = JSON.parse(captionData)

  ////////////////////////////////////
  // Create sorted timestamps array //
  ////////////////////////////////////

  /** @type {Timestamp[]} */
  const timestamps = []
  for (const event of events) {
    const {tStartMs: startTime, segs} = event
    if (!segs) continue
    for (const seg of segs) {
      const {tOffsetMs: delay = 0, utf8: token} = seg
      timestamps.push({timeMs: startTime + delay, token})
    }
  }
  // events appear to already be in order, but sort to be safe
  timestamps.sort((a, b) => a.timeMs - b.timeMs)

  //////////////////////////////////////////////////////
  // Check current token each time the player updates //
  //////////////////////////////////////////////////////

  const video = document.querySelector("video")

  // removed due to low polling rate
  // video.ontimeupdate =

  // we'd probably want to add and remove the interval on play and pause
  // events, but good enough for now
  clearInterval(window.captionInterval)
  window.captionInterval = setInterval(() => {
    const start = performance.now()
    const {token} = binSearch(video.currentTime * 1000)
    const end = performance.now()
    console.log(token, "\t", video.currentTime * 1000, "\t", `${(end - start).toFixed(2)}ms`)
    if (token.trim() === wordToMute) video.volume = 0
    else video.volume = 1
  }, 50)

  video.currentTime = 0
  video.play()

  /**
   * Binary search of the timestamps array
   * @param {number} val the current time in ms
   * @returns {Timestamp} the entry closest to `val` without going over
   */
  function binSearch(val, start = 0, end = timestamps.length) {
    if (end - start <= 1) return timestamps[start]
    const med = Math.floor((start + end) / 2)
    if (timestamps[med].timeMs > val) return binSearch(val, start, med)
    else return binSearch(val, med, end)
  }
}
```
Rather than setting timeouts for future mutes, I check the current time of the video, do a binary search to turn that into a token, and then apply filtering if that token matches the passed-in word to block.

I found a lot of benefits to this approach. It doesn't decay over time like `setTimeout`, and it handles changes to the playback rate and skipping around the video by default, without any extra logic. The binary search is super fast--the performance API often said its runtime was 0ms.

However, it still sometimes comes in too late, and at this point I think that's probably due to inaccuracy in the caption timings themselves. Increasing the polling rate to 100 Hz and decreasing the playback rate had no effect on the accuracy.

I wonder if a setting could be added to make audio censoring always come in early, kind of like how a minimum duration can be set right now?
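The "come in early" idea could drop straight into the binary-search lookup: search for the token slightly ahead of the playhead, so the mute lands before the word does. A minimal sketch, where `leadMs` would be the user-configurable setting (250 is a made-up default) and `timestamps` is the sorted `{timeMs, token}` array built above:

```js
// Look up the caption token a little ahead of the current playhead.
// `leadMs` is a hypothetical user setting; 250 is an arbitrary default.
function tokenAt(timestamps, timeMs, leadMs = 250) {
  const target = timeMs + leadMs
  // Iterative binary search: find the last entry at or before `target`
  let start = 0
  let end = timestamps.length
  while (end - start > 1) {
    const mid = Math.floor((start + end) / 2)
    if (timestamps[mid].timeMs > target) end = mid
    else start = mid
  }
  return timestamps[start]
}
```

Inside the polling interval, `tokenAt(timestamps, video.currentTime * 1000)` would replace the plain `binSearch` call, and the mute would fire `leadMs` early by construction.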
Thank you so much for all your work on this @fogoplayer, you were very thorough! I'm sorry it's taken me longer to get back to you. I will take some time to go through it all and let you know some next steps for where we can go with it.
I also agree with your conclusion that the actual timing info may not be accurate. The problem before was that we didn't have the timing info available at all, so there was no way to mute pre-emptively; with this information, we should be able to. It likely wouldn't need much extra, but I do think it could be an option we could allow.