Support for audio use cases (turicreate, closed)

apple commented on July 20, 2024
Support for audio use cases

from turicreate.

Comments (24)

TobyRoseman commented on July 20, 2024

It's slated for the 5.4 release, which we plan to go out in March.

MatthewWaller commented on July 20, 2024

Would love to know about this as well, especially as it relates to speech recognition.

MatthewWaller commented on July 20, 2024

I would disagree with @coolioxlr.

Apple's native SDK does not allow fully on-device speech recognition, and it even asks you not to dictate health or other sensitive data.

And there is the opportunity to optimize for specific words in specialized fields. Any work you can do in general speech recognition on device with CoreML would be helpful.

TobyRoseman commented on July 20, 2024

This conversation is certainly not dead. In fact I just put up two pull requests for a sound classifier.

@jamois - could you tell me more about your use case?

davidcittadini commented on July 20, 2024

@TobyRoseman Have you had any more thoughts about using ML to apply "effects" to audio? ML could be very useful with non-linear audio, which existing coding approaches are not very good at. For example, ML could learn a distortion profile from an audio stream and then apply that same distortion profile to any clean audio stream. The trick is then being able to apply the model to live, real-time audio streams.

jamois commented on July 20, 2024

Excellent! Thanks for the update and have a nice weekend.

coolioxlr commented on July 20, 2024

Thanks @TobyRoseman. This is exactly what I have been waiting for. Looking forward to WWDC too.

coolioxlr commented on July 20, 2024

+1

TobyRoseman commented on July 20, 2024

@jrjames83 @MatthewWaller @coolioxlr - Could you please share more details about what types of audio use cases you would like us to support?

MatthewWaller commented on July 20, 2024

Thanks for looking into this @TobyRoseman.

Speech recognition, as mentioned before, would be great: a toolkit that takes something like frames of MFCC features and outputs probabilities of letters and punctuation at each frame. Something like the DeepSpeech architecture that Mozilla is working on, or the Listen, Attend and Spell architectures that Google has recently published on.

Outside of that, it would be great to have a deep learning speaker diarization toolkit that can identify different speakers in an audio file.
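
To make the per-frame output described above concrete, here is a minimal sketch of how per-frame letter probabilities might be turned into text. It assumes a CTC-style output with a reserved "blank" symbol, which DeepSpeech-style models commonly use but which this comment does not specify. The function name `greedy_ctc_decode`, the toy alphabet, and the probabilities are all made up for illustration and are not part of Turi Create.

```python
def greedy_ctc_decode(frame_probs, alphabet, blank=0):
    """Collapse per-frame probabilities into a string (greedy, CTC-style).

    frame_probs: one list of probabilities per audio frame, one entry per symbol.
    alphabet:    symbols indexed like the probabilities; alphabet[blank] is a
                 placeholder for the assumed "no letter" symbol.
    """
    decoded = []
    previous = blank
    for probs in frame_probs:
        # Pick the most likely symbol for this frame.
        best = max(range(len(probs)), key=probs.__getitem__)
        # Skip blanks, and skip a symbol repeated from the previous frame.
        if best != blank and best != previous:
            decoded.append(alphabet[best])
        previous = best
    return "".join(decoded)


# Toy example: index 0 is the assumed blank symbol.
alphabet = ["_", "h", "i", "!"]
frames = [
    [0.10, 0.80, 0.05, 0.05],  # h
    [0.10, 0.70, 0.10, 0.10],  # h (repeat, collapsed)
    [0.90, 0.03, 0.03, 0.04],  # blank
    [0.10, 0.10, 0.70, 0.10],  # i
    [0.10, 0.10, 0.70, 0.10],  # i (repeat, collapsed)
    [0.10, 0.10, 0.10, 0.70],  # !
]
print(greedy_ctc_decode(frames, alphabet))  # hi!
```

A real recognizer would usually replace the greedy step with a beam search and a language model; the sketch only shows what "probabilities of letters at each frame" means in practice.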

coolioxlr commented on July 20, 2024

@tbartelmess It would be great to provide a simple example like the following, detecting just a few commands: https://www.tensorflow.org/tutorials/sequences/audio_recognition
or
https://github.com/aqibsaeed/Urban-Sound-Classification
I know we can kind of achieve this using the activity classification sample in Turi Create, but it is not optimized for audio classification. An iOS sample showing how to use the model would be helpful as well, since we might have to convert the audio to a spectrogram.

I don't think building another deep learning speech recognition model is helpful here, since iOS already provides speech recognition in the native SDK.

narner commented on July 20, 2024

Hey there; just wanted to see if there was any update on this - thanks!

TobyRoseman commented on July 20, 2024

@davidcittadini - that is a cool use case. Thanks for sharing. Unfortunately this is not possible with Turi Create.

jamois commented on July 20, 2024

Hoping this conversation is not dead. I too am interested in a Turi example using audio, not necessarily for speech recognition. Thx.

jamois commented on July 20, 2024

> This conversation is certainly not dead. In fact I just put up two pull requests for a sound classifier.
>
> @jamois - could you tell me more about your use case?

Sure. I just want to be able to train a model using audio files (e.g. .wav). So, for instance, if I have 5 sounds I want my system to recognize, I would train using 5 classes, where each class would be represented by numerous (e.g. 100) sound files. I know all of this is possible via TensorFlow but would prefer (at the moment) to use Turi if possible. Thanks for the help!
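
A dataset like the one described here (a handful of classes, each with many sound files) could be indexed roughly as follows before training. This is only a sketch using the Python standard library. It assumes a folder-per-class layout such as `sounds/dog/bark_01.wav`, which is not stated in the thread, and the helper names `index_sound_files` and `class_counts` are hypothetical.

```python
import os
from collections import Counter


def index_sound_files(root, extensions=(".wav",)):
    """Return (path, label) pairs, using each file's parent folder name as its label.

    Assumes one sub-folder per class under `root` (an assumption, not a requirement
    of any particular toolkit).
    """
    examples = []
    for dirpath, _dirnames, filenames in os.walk(root):
        label = os.path.basename(dirpath)
        for name in sorted(filenames):
            # Case-insensitive extension check; non-audio files are ignored.
            if name.lower().endswith(extensions):
                examples.append((os.path.join(dirpath, name), label))
    return examples


def class_counts(examples):
    """Count how many files were found for each label."""
    return Counter(label for _path, label in examples)
```

Calling `class_counts(index_sound_files("sounds"))` on such a tree gives a quick check that every class has roughly the expected number of files (for example, about 100 each) before any model is trained.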

TobyRoseman commented on July 20, 2024

@jamois - Your use case sounds like exactly what we are planning to support with our new Sound Classifier.

jamois commented on July 20, 2024

> @jamois - Your use case sounds like exactly what we are planning to support with our new Sound Classifier.

Thanks for the update. When are you planning to roll this out?

TobyRoseman commented on July 20, 2024

@davidcittadini - I have not thought more about this, but it sounds very interesting. I'd like to learn more. Are there any resources (ex: papers, blog posts, other products) you recommend?

rplom commented on July 20, 2024

> It's slated for the 5.4 release, which we plan to go out in March.

I was about to implement my own custom classifier when I ran into this post. How will it be accessed in the client code? I.e., there's MLImageClassifier; will there be an MLSoundClassifier? Or will clients write their own?

TobyRoseman commented on July 20, 2024

@rplom - to be clear: the Sound Classifier will be included in the next release of Turi Create. Two new functions will be added:

turicreate.load_audio(...)
turicreate.sound_classifier.create(...)

The first version of the sound classifier will support exporting to Core ML.
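
As a rough illustration of how these two functions and the Core ML export might fit together, here is a minimal sketch. It follows the Turi Create 5.4 API as I understand it and is not taken from this thread: the `target` and `feature` arguments, the `path` and `audio` column names, and the folder-per-class labeling are assumptions to check against the docstrings, and `label_from_path` and `train_and_export` are hypothetical helper names.

```python
import os


def label_from_path(path):
    """Hypothetical helper: use the parent folder name as the class label."""
    return os.path.basename(os.path.dirname(path))


def train_and_export(audio_dir, model_path="SoundClassifier.mlmodel"):
    """Train a sound classifier on a folder of audio files and export it to Core ML."""
    # Imported here so the labeling helper can be tried without Turi Create installed.
    import turicreate as tc

    # Load the audio files; the resulting SFrame is assumed to have
    # 'path' and 'audio' columns.
    data = tc.load_audio(audio_dir)

    # Derive a label per file (assumes one sub-folder per class).
    data["label"] = data["path"].apply(label_from_path)

    # Hold out part of the data to check accuracy.
    train_set, test_set = data.random_split(0.8)

    model = tc.sound_classifier.create(train_set, target="label", feature="audio")
    print("accuracy:", model.evaluate(test_set)["accuracy"])

    # Export for use from an app via Core ML.
    model.export_coreml(model_path)
    return model
```

Something like `train_and_export("sounds")` would then cover the whole path from a directory of recordings to an `.mlmodel` file.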

TobyRoseman commented on July 20, 2024

Everything needed to use the Sound Classifier has now been merged into master. If you're willing to build from master, please give it a try.

I'm currently working on updating our User Guide with a Sound Classifier section. Until then you should be able to get started by using the docstrings of the above methods.

rplom commented on July 20, 2024

This is great!

jamois commented on July 20, 2024

Wow, great news! Thanks @TobyRoseman !

TobyRoseman commented on July 20, 2024

Turi Create 5.4 is now launched. With this version you can create a sound classifier, using turicreate.load_audio(...) and turicreate.sound_classifier.create(...).

See the Sound Classifier Section of the User Guide for details.

Since we now support an audio use case, I'm going to close this issue. Feel free to open new issues, either about the sound classifier or for new audio use cases.
