
Comments (7)

richardfrost commented on July 24, 2024

Thank you for the offer @fogoplayer! At least for now, I've had to remove the GPL/open-source license on the audio component, so it isn't in this repo anymore. I have a pretty busy next couple of days, but after that I hope to have some time to dedicate to this. If I had to guess, it is probably a case of the captions being added out of sync with the audio, but that's just a guess. It seems like YouTube has gotten progressively worse over the last couple years with that. I'll definitely take a closer look as soon as I can though, and report findings back here.

If you wanted to do some more research, I'm curious whether we can find the API that YouTube uses to request the captions, and whether it includes more timing info. Right now the muting is pretty simple: a mutation observer watches for any nodes being added or modified and determines whether each one is part of the captions. It then filters the node; if it contains a word that needs to be filtered, it mutes until the next word/phrase is added to the page and repeats the process, and if nothing needs filtering, it unmutes. So right now the timing depends entirely on when elements are added to and removed from the DOM.
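A rough sketch of that flow (not the extension's actual code — the word list and the caption selector here are assumptions):

```javascript
// Hypothetical filter list; the real extension loads the user's word list.
const FILTERED_WORDS = new Set(["earbuds"])

// Pure check: does a caption string contain any filtered word?
function containsFilteredWord(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .some((word) => FILTERED_WORDS.has(word))
}

// In the extension this would run on the watch page, roughly:
// const video = document.querySelector("video")
// const observer = new MutationObserver((mutations) => {
//   for (const mutation of mutations) {
//     for (const node of mutation.addedNodes) {
//       if (node instanceof HTMLElement && node.matches(".ytp-caption-segment")) {
//         // mute while the offending caption segment is on screen
//         video.muted = containsFilteredWord(node.textContent || "")
//       }
//     }
//   }
// })
// observer.observe(document.body, { childList: true, subtree: true })
```

The muting granularity is exactly the caption-segment granularity, which is why the timing is at the mercy of the DOM updates.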

from advancedprofanityfilter.

fogoplayer commented on July 24, 2024

It looks like we can find the API!

For this video: https://www.youtube.com/watch?v=CWWSovO3Txc

It sent this request and got this response. (Recorded as Gists so I don't add a 9,000 line JSON file to this thread.)

Key insights:

  • The request seems to grab caption data for the entire video; it's only fired once.
  • It's fired when the user enables Closed Captioning, not on page load.
  • The jackpot seems to be the events member, which is a list. Members of that list are objects with the following members:
    • tStartMs: number
    • dDurationMs: number
    • wWinId: number
    • segs: Object[]
  • I'm guessing the first two are the start time and duration of the string represented in the caption.
  • segs is another list of objects. Those keys are:
    • utf8: a string containing a single word
    • tOffsetMs: number
    • acAsrConf: number
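If that guess holds, turning the events list into a flat (time, word) sequence is straightforward. A sketch against a made-up sample (the field values are invented; the structure follows the description above):

```javascript
// Invented sample mirroring the structure described above
const sampleCaptionJson = {
  events: [
    { tStartMs: 0, dDurationMs: 1200, wWinId: 1, segs: [{ utf8: "these", acAsrConf: 90 }] },
    {
      tStartMs: 1200,
      dDurationMs: 1800,
      segs: [
        { utf8: "wireless", tOffsetMs: 0, acAsrConf: 90 },
        { utf8: "earbuds", tOffsetMs: 600, acAsrConf: 90 },
      ],
    },
  ],
}

// Flatten events into [{ timeMs, token }], treating tStartMs + tOffsetMs as
// each word's absolute time (tOffsetMs defaults to 0 when absent)
function flattenCaptions(events) {
  const tokens = []
  for (const { tStartMs, segs } of events) {
    if (!segs) continue // some events may carry only window metadata
    for (const { utf8, tOffsetMs = 0 } of segs) {
      tokens.push({ timeMs: tStartMs + tOffsetMs, token: utf8 })
    }
  }
  return tokens
}
```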

My next steps are to whip together a proof-of-concept script to validate whether this JSON data can be converted into accurate timings. I also might write something to crawl over the file and make sure there aren't any exceptions to the grammar above.
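That crawler could be as simple as walking every event and recording anything that deviates from the guessed shape (the type checks here are my reading of the grammar above, not a confirmed schema):

```javascript
// Walk the events list and collect anything that deviates from the guessed
// grammar; an empty result means no exceptions were found.
function findGrammarExceptions(events) {
  const exceptions = []
  events.forEach((event, i) => {
    if (typeof event.tStartMs !== "number") {
      exceptions.push(`event ${i}: tStartMs is not a number`)
    }
    if (event.segs === undefined) return // events without segs may be window metadata
    event.segs.forEach((seg, j) => {
      if (typeof seg.utf8 !== "string") {
        exceptions.push(`event ${i}, seg ${j}: utf8 is not a string`)
      }
      if (seg.tOffsetMs !== undefined && typeof seg.tOffsetMs !== "number") {
        exceptions.push(`event ${i}, seg ${j}: tOffsetMs is not a number`)
      }
    })
  })
  return exceptions
}
```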

P.S.: I know you said you're short on time, so I have no expectation of quick replies. I'll post anything I can find in this thread, and I'll take direction and updates as they come.


richardfrost commented on July 24, 2024

Thanks for the examples @fogoplayer, I'll take a look and see what can be done as soon as I get some more time.


fogoplayer commented on July 24, 2024

Sounds great!

I'd be happy to pitch in my own typescript skills, and every time I've scanned the repo for words related to audio, muting, etc. I haven't been able to find anything that looks like a relevant segment of code. I don't expect you to hold my hand through the whole process, but if you can point me in a direction I'd love to get started on a PR!


fogoplayer commented on July 24, 2024

Okay, proof-of-concept done:

// jsdoc-typed js, because there's no need to get Babel involved in this

/**
 * @param {string} captionData - the stringified JSON data from the YT API
 * @param {boolean?} verbose - controls logging
 */
function checkTimings(captionData, verbose = false) {
    /** @type {HTMLVideoElement} */
    const videoElement = document.querySelector("video")

    // YT caption JSON data includes raw control characters, which are invalid JSON.
    // I'm not sure how Google parses them, but given that no curse words contain
    // control characters, it's enough for us to replace them with a placeholder.
    captionData = captionData.replaceAll(/"[\u0000-\u001F]"/g, '"control character"')

    const {events} = JSON.parse(captionData)

    for (const event of events) {
        const {tStartMs: startTime, segs} = event
        if (!segs) continue // some events carry only window metadata, no segs
        setTimeout(()=>{
            for (const seg of segs) {
                const {tOffsetMs: delay = 0, utf8: token} = seg
                setTimeout(()=>{
                    console.log(token)
                    if(token.trim() === "earbuds") {
                        videoElement.volume = 0
                    } else {
                        videoElement.volume = 1
                    }
                }, delay / videoElement.playbackRate)
            }
        }, startTime / videoElement.playbackRate)
    }

    videoElement.currentTime = 0
    videoElement.play()
}

A few key findings here:

  • The start times in the JSON file do seem to have the obvious meaning!
  • Sanitization of the JSON file may prove to be a non-trivial difficulty here. I know there are ways to access the response.json() directly, and if that path properly handles the escaping, it might be a non-issue.
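For reference, the interception I have in mind looks something like this. The /api/timedtext path matches the request above, but treat the fetch-patching details as an untested sketch:

```javascript
// Decide whether a fetched URL is a YouTube caption request
function isCaptionRequest(url) {
  try {
    return new URL(url, "https://www.youtube.com").pathname === "/api/timedtext"
  } catch {
    return false
  }
}

// Untested sketch of patching fetch in the page context, so we receive the
// parsed JSON from YouTube's own request instead of re-parsing raw text:
// const originalFetch = window.fetch
// window.fetch = async (input, init) => {
//   const response = await originalFetch(input, init)
//   const url = typeof input === "string" ? input : input.url
//   if (isCaptionRequest(url)) {
//     // clone() so the page's own consumer can still read the body;
//     // handleCaptionJson is a hypothetical callback of ours
//     response.clone().json().then(handleCaptionJson).catch(() => {})
//   }
//   return response
// }
```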

And the big one:

  • Timings were still somewhat inaccurate. They started out extremely accurate but seemed to drift as time went on; by the three-minute mark they were noticeably lagging behind. It could be that YouTube's timestamps are off, but it seems more likely that it's an issue with my code.*
    This is where I'm likely to focus going forward. I've heard using timeouts inside a Web Worker makes them more accurate, but a solution that runs on the timeupdate event might be better.

*setTimeout is known to drift, and I wasn't trying particularly hard to optimize my code, so there may be additional delays (such as the use of a slower for-loop syntax). In addition to the method above (timeouts inside timeouts), I also tried a technique where I computed the total time offset before creating the timeout, with no noticeable difference.


fogoplayer commented on July 24, 2024

New attempt:

/**
 * @typedef {{
 *   timeMs: number
 *   token: string
 * }} Timestamp
 */

/**
 * @param {string} captionData - the stringified JSON data from the YT API
 * @param {string} wordToMute - the token to mute when it appears
 */
function checkTimings(captionData, wordToMute) {
    ////////////////////////
    // Parse caption data //
    ////////////////////////
    // YT caption JSON data includes raw control characters, which are invalid JSON.
    // I'm not sure how Google parses them, but given that no curse words contain
    // control characters, it's enough for us to replace them with a placeholder.
    captionData = captionData.replaceAll(/"[\u0000-\u001F]"/g, '"control character"')
    const {events} = JSON.parse(captionData)

    ////////////////////////////////////
    // Create sorted timestamps array //
    ////////////////////////////////////
    /** @type {{timeMs: number, token: string}[]} */
    const timestamps = []

    for (const event of events) {
        const {tStartMs: startTime, segs} = event
        if(!segs) continue

        for (const seg of segs) {
            const {tOffsetMs: delay = 0, utf8: token} = seg
            timestamps.push({timeMs: startTime + delay, token})
        }
    }

    //////////////////////////////////////////////////////
    // Check current token each time the player updates //
    //////////////////////////////////////////////////////
    const video = document.querySelector("video")
    // removed due to low polling rate
    // video.ontimeupdate = 
    
    // we'd probably want to add and remove the interval on play and pause events, but good enough for now
    clearInterval(window.captionInterval)
    window.captionInterval = setInterval(() => {
        const start = performance.now()
        const {token} = binSearch(video.currentTime * 1000)
        const end = performance.now()
        console.log(token, "\t", video.currentTime * 1000, "\t", `lookup took ${end - start}ms`)

        if(token.trim() === wordToMute) video.volume = 0
        else video.volume = 1
    }, 50)

    video.currentTime = 0
    video.play()

    /**
     * Binary search of timestamps array
     * @param {number} val the current time in ms
     * @returns {{timeMs: number, token: string}} the last entry at or before val
     */
    function binSearch(val, start=0, end=timestamps.length) {
        if(end-start <= 1) return timestamps[start]

        const med = Math.floor((start+end)/2)
        if(timestamps[med].timeMs > val) return binSearch(val, start, med)
        else return binSearch(val, med, end)
    }
}

Rather than setting timeouts for future mutes, I check the current time of the video, do a binary search to turn that into a token, and then apply filtering if that token matches the passed-in word to block.

I found a lot of benefits to this approach. It doesn't decay over time like setTimeout did. It handles changes to the playback rate and skipping around the video by default, without any extra logic. And the binary search is super fast: the performance API often reported its runtime as 0ms.

However, it still comes in too late sometimes, and at this point I think that's probably due to inaccuracy in the timings in the captions themselves. Increasing the polling rate to 100 Hz and decreasing the playback rate had no effect on the accuracy.

I wonder if a setting could be added to make audio censoring always come in early, kind of like how a minimum duration can be set right now?
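That option could amount to adding a lead time to the lookup. A sketch on top of a sorted timestamps array like the one built above (LEAD_MS is an invented setting, not an existing option):

```javascript
const LEAD_MS = 250 // invented user setting: how early to mute, in ms

// Find the token active at (currentTimeMs + leadMs) in a sorted
// [{ timeMs, token }] array, i.e. look slightly ahead of the playhead
function tokenAt(timestamps, currentTimeMs, leadMs = LEAD_MS) {
  const target = currentTimeMs + leadMs
  let lo = 0
  let hi = timestamps.length - 1
  let answer = timestamps[0]
  while (lo <= hi) {
    const mid = (lo + hi) >> 1
    if (timestamps[mid].timeMs <= target) {
      answer = timestamps[mid] // candidate: last entry at or before target
      lo = mid + 1
    } else {
      hi = mid - 1
    }
  }
  return answer
}
```

With leadMs = 0 this behaves like the binSearch above; a positive value mutes that many milliseconds early, which would compensate for captions that consistently lag the audio.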


richardfrost commented on July 24, 2024

Thank you so much for all your work on this @fogoplayer, you were very thorough! I'm sorry it's taken me longer to get back to you. I will take some time to go through it all and let you know some next steps for where we can go with it.

I also agree with your conclusion that the actual timing info may not be accurate. The problem before was that we didn't have the timing info available at all, so there was no way to mute pre-emptively; with this information, we should be able to. It likely wouldn't need to be much extra, and I do think it could be an option we could allow.

