
Comments (7)

richardfrost commented on July 24, 2024

Thank you for the offer @fogoplayer! At least for now, I've had to remove the GPL/open-source license on the audio component, so it isn't in this repo anymore. I have a pretty busy next couple of days, but after that I hope to have some time to dedicate to this. If I had to guess, it is probably a case of the captions being added out of sync with the audio, but that's just a guess. It seems like YouTube has gotten progressively worse over the last couple years with that. I'll definitely take a closer look as soon as I can though, and report findings back here.

If you wanted to do some more research, I'm curious whether we can find the API that YouTube uses to request the captions, and whether it includes more timing info. Right now the muting is pretty simple: a mutation observer watches for any nodes being added or modified and determines whether each one is part of the captions. It then filters the node; if it contains a word that needs to be filtered, it mutes until the next word/phrase is added to the page and repeats the process, and if nothing needs filtering, it unmutes. So right now the timing depends entirely on when elements are added to and removed from the DOM.
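A rough sketch of that flow (not the extension's actual code — the word list and the caption selector here are assumptions):

```javascript
// Hypothetical filter list; the real extension loads the user's word list.
const FILTERED_WORDS = new Set(["earbuds"])

// Pure check: does a caption string contain any filtered word?
function containsFilteredWord(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .some((word) => FILTERED_WORDS.has(word))
}

// In the extension this would run on the watch page, roughly:
// const video = document.querySelector("video")
// const observer = new MutationObserver((mutations) => {
//   for (const mutation of mutations) {
//     for (const node of mutation.addedNodes) {
//       if (node instanceof HTMLElement && node.matches(".ytp-caption-segment")) {
//         // mute while the offending caption segment is on screen
//         video.muted = containsFilteredWord(node.textContent || "")
//       }
//     }
//   }
// })
// observer.observe(document.body, { childList: true, subtree: true })
```

The muting granularity is exactly the caption-segment granularity, which is why the timing is at the mercy of the DOM updates.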

from advancedprofanityfilter.

fogoplayer commented on July 24, 2024

It looks like we can find the API!

For this video: https://www.youtube.com/watch?v=CWWSovO3Txc

It sent this request and got this response. (Recorded as Gists so I don't add a 9,000 line JSON file to this thread.)

Key insights:

  • The request seems to grab caption data for the entire video; it's only fired once.
  • It's fired when the user enables Closed Captioning, not on page load.
  • The jackpot seems to be the events member, which is a list. Members of that list are objects with the following members:
    • tStartMs: number
    • dDurationMs: number
    • wWinId: number
    • segs: Object[]
  • I'm guessing the first two are the start time and duration of the string represented in the caption.
  • segs is another list of objects. Those keys are:
    • utf8: a string containing a single word
    • tOffsetMs: number
    • acAsrConf: number
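If that guess holds, turning the events list into a flat (time, word) sequence is straightforward. A sketch against a made-up sample (the field values are invented; the structure follows the description above):

```javascript
// Invented sample mirroring the structure described above
const sampleCaptionJson = {
  events: [
    { tStartMs: 0, dDurationMs: 1200, wWinId: 1, segs: [{ utf8: "these", acAsrConf: 90 }] },
    {
      tStartMs: 1200,
      dDurationMs: 1800,
      segs: [
        { utf8: "wireless", tOffsetMs: 0, acAsrConf: 90 },
        { utf8: "earbuds", tOffsetMs: 600, acAsrConf: 90 },
      ],
    },
  ],
}

// Flatten events into [{ timeMs, token }], treating tStartMs + tOffsetMs as
// each word's absolute time (tOffsetMs defaults to 0 when absent)
function flattenCaptions(events) {
  const tokens = []
  for (const { tStartMs, segs } of events) {
    if (!segs) continue // some events may carry only window metadata
    for (const { utf8, tOffsetMs = 0 } of segs) {
      tokens.push({ timeMs: tStartMs + tOffsetMs, token: utf8 })
    }
  }
  return tokens
}
```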

My next steps are to whip together a proof-of-concept script to validate whether this JSON data can be converted into accurate timings. I also might write something to crawl over the file and make sure there aren't any exceptions to the grammar above.
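That crawler could be as simple as walking every event and recording anything that deviates from the guessed shape (the type checks here are my reading of the grammar above, not a confirmed schema):

```javascript
// Walk the events list and collect anything that deviates from the guessed
// grammar; an empty result means no exceptions were found.
function findGrammarExceptions(events) {
  const exceptions = []
  events.forEach((event, i) => {
    if (typeof event.tStartMs !== "number") {
      exceptions.push(`event ${i}: tStartMs is not a number`)
    }
    if (event.segs === undefined) return // events without segs may be window metadata
    event.segs.forEach((seg, j) => {
      if (typeof seg.utf8 !== "string") {
        exceptions.push(`event ${i}, seg ${j}: utf8 is not a string`)
      }
      if (seg.tOffsetMs !== undefined && typeof seg.tOffsetMs !== "number") {
        exceptions.push(`event ${i}, seg ${j}: tOffsetMs is not a number`)
      }
    })
  })
  return exceptions
}
```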

P.S.: I know you said you're short on time, so I have no expectation of quick replies. I'll post anything I can find in this thread, and I'll take direction and updates as they come.


richardfrost commented on July 24, 2024

Thanks for the examples @fogoplayer, I'll take a look and see what can be done as soon as I get some more time.


fogoplayer commented on July 24, 2024

Sounds great!

I'd be happy to pitch in my own typescript skills, and every time I've scanned the repo for words related to audio, muting, etc. I haven't been able to find anything that looks like a relevant segment of code. I don't expect you to hold my hand through the whole process, but if you can point me in a direction I'd love to get started on a PR!


fogoplayer commented on July 24, 2024

Okay, proof-of-concept done:

// jsdoc-typed js, because there's no need to get Babel involved in this

/**
 * @param {string} captionData - the stringified JSON data from the YT API
 * @param {boolean?} verbose - controls logging
 */
function checkTimings(captionData, verbose = false) {
    /** @type {HTMLVideoElement} */
    const videoElement = document.querySelector("video")

    // YT caption JSON data includes raw control characters, which are invalid JSON.
    // I'm not sure how Google parses them, but given that no curse words contain
    // control characters, it's enough for us to replace them with a placeholder.
    captionData = captionData.replaceAll(/"[\u0000-\u001F]"/g, '"control character"')

    const {events} = JSON.parse(captionData)

    for (const event of events) {
        const {tStartMs: startTime, segs} = event
        if (!segs) continue // some events carry only window metadata, no segs
        setTimeout(()=>{
            for (const seg of segs) {
                const {tOffsetMs: delay = 0, utf8: token} = seg
                setTimeout(()=>{
                    console.log(token)
                    if(token.trim() === "earbuds") {
                        videoElement.volume = 0
                    } else {
                        videoElement.volume = 1
                    }
                }, delay / videoElement.playbackRate)
            }
        }, startTime / videoElement.playbackRate)
    }

    videoElement.currentTime = 0
    videoElement.play()
}

A few key findings here:

  • The start times in the JSON file do seem to have the obvious meaning!
  • Sanitization of the JSON file may prove to be a non-trivial difficulty here. I know there are ways to access the response.json() directly, and if that path properly handles the escaping, it might be a non-issue.
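For reference, the interception I have in mind looks something like this. The /api/timedtext path matches the request above, but treat the fetch-patching details as an untested sketch:

```javascript
// Decide whether a fetched URL is a YouTube caption request
function isCaptionRequest(url) {
  try {
    return new URL(url, "https://www.youtube.com").pathname === "/api/timedtext"
  } catch {
    return false
  }
}

// Untested sketch of patching fetch in the page context, so we receive the
// parsed JSON from YouTube's own request instead of re-parsing raw text:
// const originalFetch = window.fetch
// window.fetch = async (input, init) => {
//   const response = await originalFetch(input, init)
//   const url = typeof input === "string" ? input : input.url
//   if (isCaptionRequest(url)) {
//     // clone() so the page's own consumer can still read the body;
//     // handleCaptionJson is a hypothetical callback of ours
//     response.clone().json().then(handleCaptionJson).catch(() => {})
//   }
//   return response
// }
```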

And the big one:

  • Timings were still somewhat inaccurate. They started out extremely accurate but seemed to drift as time went on; by the three-minute mark they were noticeably lagging behind. It could be that YouTube's timestamps are off, but it seems more likely that it's an issue with my code.*
    This is where I'm likely to focus going forward. I've heard using timeouts inside a Web Worker makes them more accurate, but a solution that runs on the timeupdate event might be better.

*setTimeout is known to drift, and I wasn't trying particularly hard to optimize my code, so there may be additional delays (such as the use of a slower for-loop syntax). In addition to the method above (timeouts inside timeouts), I also tried a technique where I computed the total time offset before creating the timeout, with no noticeable difference.


fogoplayer commented on July 24, 2024

New attempt:

/**
 * @typedef {{
 *   timeMs: number
 *   token: string
 * }} Timestamp
 */

/**
 * @param {string} captionData - the stringified JSON data from the YT API
 * @param {string} wordToMute - the token to mute when it appears
 */
function checkTimings(captionData, wordToMute) {
    ////////////////////////
    // Parse caption data //
    ////////////////////////
    // YT caption JSON data includes raw control characters, which are invalid JSON.
    // I'm not sure how Google parses them, but given that no curse words contain
    // control characters, it's enough for us to replace them with a placeholder.
    captionData = captionData.replaceAll(/"[\u0000-\u001F]"/g, '"control character"')
    const {events} = JSON.parse(captionData)

    ////////////////////////////////////
    // Create sorted timestamps array //
    ////////////////////////////////////
    /** @type {{timeMs: number, token: string}[]} */
    const timestamps = []

    for (const event of events) {
        const {tStartMs: startTime, segs} = event
        if(!segs) continue

        for (const seg of segs) {
            const {tOffsetMs: delay = 0, utf8: token} = seg
            timestamps.push({timeMs: startTime + delay, token})
        }
    }

    //////////////////////////////////////////////////////
    // Check current token each time the player updates //
    //////////////////////////////////////////////////////
    const video = document.querySelector("video")
    // removed due to low polling rate
    // video.ontimeupdate = 
    
    // we'd probably want to add and remove the interval on play and pause events, but good enough for now
    clearInterval(window.captionInterval)
    window.captionInterval = setInterval(() => {
        const start = performance.now()
        const {token} = binSearch(video.currentTime * 1000)
        const end = performance.now()
        console.log(token, "\t", video.currentTime * 1000, "\t", `lookup took ${end - start}ms`)

        if(token.trim() === wordToMute) video.volume = 0
        else video.volume = 1
    }, 50)

    video.currentTime = 0
    video.play()

    /**
     * Binary search of timestamps array
     * @param {number} val the current time in ms
     * @returns {{timeMs: number, token: string}} the last entry at or before val
     */
    function binSearch(val, start=0, end=timestamps.length) {
        if(end-start <= 1) return timestamps[start]

        const med = Math.floor((start+end)/2)
        if(timestamps[med].timeMs > val) return binSearch(val, start, med)
        else return binSearch(val, med, end)
    }
}

Rather than setting timeouts for future mutes, I check the current time of the video, do a binary search to turn that into a token, and then apply filtering if that token matches the passed-in word to block.

I found a lot of benefits to this approach. It doesn't decay over time like setTimeout did. It handles changes to the playback rate and skipping around the video by default, without any extra logic. And the binary search is super fast: the performance API often reported its runtime as 0ms.

However, it still comes in too late sometimes, and at this point I think that's probably due to inaccuracy in the timings in the captions themselves. Increasing the polling rate to 100 Hz and decreasing the playback rate had no effect on the accuracy.

I wonder if a setting could be added to make audio censoring always come in early, kind of like how a minimum duration can be set right now?
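That option could amount to adding a lead time to the lookup. A sketch on top of a sorted timestamps array like the one built above (LEAD_MS is an invented setting, not an existing option):

```javascript
const LEAD_MS = 250 // invented user setting: how early to mute, in ms

// Find the token active at (currentTimeMs + leadMs) in a sorted
// [{ timeMs, token }] array, i.e. look slightly ahead of the playhead
function tokenAt(timestamps, currentTimeMs, leadMs = LEAD_MS) {
  const target = currentTimeMs + leadMs
  let lo = 0
  let hi = timestamps.length - 1
  let answer = timestamps[0]
  while (lo <= hi) {
    const mid = (lo + hi) >> 1
    if (timestamps[mid].timeMs <= target) {
      answer = timestamps[mid] // candidate: last entry at or before target
      lo = mid + 1
    } else {
      hi = mid - 1
    }
  }
  return answer
}
```

With leadMs = 0 this behaves like the binSearch above; a positive value mutes that many milliseconds early, which would compensate for captions that consistently lag the audio.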


richardfrost commented on July 24, 2024

Thank you so much for all your work on this @fogoplayer, you were very thorough! I'm sorry it's taken me longer to get back to you. I will take some time to go through it all and let you know some next steps for where we can go with it.

I also agree with your conclusion that the actual timing info may not be accurate. The problem before was that we didn't have the timing info available at all, so there was no way to mute pre-emptively; with this information, we should be able to. It likely wouldn't need to be much extra, and I do think it could be an option we could allow.

