
Question about wav content (localcroft, OPEN)

emphasize commented on August 9, 2024
Question about wav content

from localcroft.

Comments (8)

el-tocino commented on August 9, 2024

You can train precise for recognizing sneezes, actually, if so inclined.

Using sox you can trim longer clips down, based on silence between words. Aim for 3s or less per clip. Then dump them in the nww folders as appropriate. It's still better to try and use false activation words and noises where possible. Random speech will help to an extent, but you also want to fine-tune this to be as accurate as possible, both at recognizing the wake word and at rejecting not-wake-words.
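
A minimal sketch for checking the "3s or less" target before copying clips over, assuming the clips are plain PCM WAV files. It uses only Python's standard `wave` module; the function names and the folder argument are made up for illustration and are not part of precise or localcroft.

```python
import wave
from pathlib import Path

MAX_SECONDS = 3.0  # target length taken from the comment above


def clip_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def too_long(folder, limit=MAX_SECONDS):
    """List the *.wav files in `folder` that run longer than `limit` seconds."""
    return sorted(p for p in Path(folder).glob("*.wav") if clip_seconds(p) > limit)
```

Calling `too_long("some/folder")` on a folder of candidate clips gives the files that still need trimming.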


emphasize commented on August 9, 2024

Thanks, that's not meant to be a substitute. More an addition to the word finder methods you suggest.

Mozilla's Common Voice dataset is an acceptable source then. Sadly not single words, but short sentences of six or fewer words. And a hefty amount of data that's at least somewhat "peer-reviewed".

Do you recommend some ambient sound sources besides the tuxfamily.org suggestion?

--
Short additional question: What is meant by the batch -b option flag of precise-train?

Cheers,
Swen


el-tocino commented on August 9, 2024

Precise community data has a not-wake-word section including some noises. The Google Speech Commands dataset is an ideal addition to not-wake-words (though it's large, and will significantly increase training time). Recording ambient noise is pretty easy with a cell phone as well.

Batch size is useful for making a wider pass of data for each epoch. I tend to use pretty large sizes (5000?); some experimentation would be useful.
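
To put rough numbers on that: in the usual minibatch sense, the batch size is how many clips go into one weight update, so a larger batch means fewer, wider steps per epoch. The sketch below only shows that arithmetic. It assumes precise-train's -b follows the usual meaning, and the clip count of 12000 is a made-up example.

```python
import math


def batches_per_epoch(num_clips, batch_size):
    """Number of weight updates needed to see every clip once."""
    return math.ceil(num_clips / batch_size)


# Hypothetical dataset of 12000 clips:
#   batches_per_epoch(12000, 5000) -> 3 large steps per epoch
#   batches_per_epoch(12000, 32)   -> 375 small steps per epoch
```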

Latest Common Voice now has a large subset of single word entries.
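
A rough sketch of picking out those single-word entries, assuming the metadata comes as a tab-separated file with `path` and `sentence` columns. Those column names are an assumption here, so check them against the header of the file you actually downloaded; the function name is made up.

```python
import csv


def short_entries(tsv_lines, max_words=1):
    """Return (path, sentence) pairs whose sentence has at most `max_words` words.

    `tsv_lines` is any iterable of text lines, e.g. an open file.
    The `path` and `sentence` column names are assumed, not verified.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        (row["path"], row["sentence"])
        for row in reader
        if 0 < len(row["sentence"].split()) <= max_words
    ]
```

With a real file this would be called as `short_entries(open("validated.tsv", encoding="utf-8"))`, where the file name is again only an example.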


emphasize commented on August 9, 2024

Google Research has a lot of different language datasets (Nepali, who would guess that), but unfortunately no German one. Or are you suggesting that the language itself plays a lesser role?
GSC v.2 is already downloaded, but then I realized: there's not much spoken English around here ;)

I think I will train them in a Raspbian virtual machine, if that's possible. Or turn to Windows completely for that process. My Pi buddies are already sweatin'.


el-tocino commented on August 9, 2024

The language isn't as important as the phonemes and the pattern of the words.

I'd train on a desktop rather than a pi with that volume of data. ;)


emphasize commented on August 9, 2024

After reviewing the Common Voice dataset more closely, I think I'm pressed to trim down parts "based on silence between words", as you put it.

Do you mind sharing some useful sox commands?

Cheers


emphasize commented on August 9, 2024

I have a proposition myself.

https://d-rhyme.de/worte-verdrehen/

In general it's more for our German audience, but this particular section "twists words" in a way that the middle part of the name/word is replaced by random syllable(s?)/letters (word length is constant), and is therefore language-agnostic.

Let's say the wake word is "Samira". It spits out Salisa, Savita, Saliga, Sakita, ...

In my understanding that should be a great addition to the wordfinder/rhyme methods given by your howto.
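
For anyone who wants to try this offline, here is a guess at the idea in Python. It is not the site's algorithm, which isn't documented in this thread. Judging only from the four examples above, the first and last letters and the vowels stay put while the inner consonants get swapped, so that is what this sketch does. All names are made up.

```python
import random

VOWELS = set("aeiouäöü")
CONSONANTS = "bcdfghjklmnpqrstvwxz"


def twist(word, rng):
    """Swap every inner consonant for a different random one; length is unchanged."""
    chars = list(word)
    for i in range(1, len(chars) - 1):  # first and last letter are kept
        c = chars[i].lower()
        if chars[i].isalpha() and c not in VOWELS:
            chars[i] = rng.choice([x for x in CONSONANTS if x != c])
    return "".join(chars)


def variants(word, count, seed=0):
    """Collect up to `count` distinct twisted forms of `word`."""
    rng = random.Random(seed)
    found = []
    for _ in range(count * 50):  # bounded, so short words cannot loop forever
        candidate = twist(word, rng)
        if candidate not in found:
            found.append(candidate)
        if len(found) == count:
            break
    return found
```

`variants("Samira", 4)` returns four words of the shape Sa?i?a. Whether they make useful not-wake-word recordings still has to be tested by training with them.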


el-tocino commented on August 9, 2024

Try it and see?

Google "sox silence"; I don't have it handy, and the search results will explain the parameters better.
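
As a starting point for that search, this sketch builds one commonly cited form of the command: sox's `silence` effect chained with the `newfile` and `restart` pseudo-effects, which starts a new output file at each detected pause. The function name is made up, and the duration and threshold values are untested guesses that will need tuning per recording, so read the sox man page before relying on them.

```python
def sox_split_command(src, dst, min_sound=0.1, min_silence=0.3, threshold="1%"):
    """Build a sox argument list that splits `src` at pauses.

    min_sound:   seconds of audio above `threshold` that count as speech (guess)
    min_silence: seconds below `threshold` that end a clip (guess)
    """
    return [
        "sox", src, dst,
        "silence", "1", str(min_sound), threshold,  # trim leading silence
        "1", str(min_silence), threshold,           # stop at the next pause
        ":", "newfile", ":", "restart",             # then start the next clip
    ]
```

To run it, pass the list to `subprocess.run(sox_split_command("long.wav", "clip.wav"), check=True)` with sox installed. The pieces should come out as numbered files based on the output name, and anything still over 3s needs another pass.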

