
tomcat-speechanalyzer's People

Contributors

adarshp, vincentraymond-ua

tomcat-speechanalyzer's Issues

Pinning OpenSMILE version

We should pin the version of OpenSMILE used in the Docker image for the sake of reproducibility. Right now, the image builds from the latest commit on the master branch of the OpenSMILE GitHub repo. Ideally, we should simply use a tagged release from their GitHub repo.

@paulosoaresua - is this something you can take care of? Feel free to delegate the task to an undergrad or someone else if you'd like.

Audio chunk timestamps

We will need to capture the timestamps of the individual incoming audio chunks.

@vincentraymond-ua Could you please implement this? Basically, every time an audio chunk reaches the speechAnalyzer agent over websocket, it should publish a message to the message bus to the topic agent/speech_analyzer/chunk_metadata (or some better name if you come up with one!) with the timestamp, chunk size in bytes, participant_id, and the header and msg parts of the common TA3 message format.
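A minimal sketch of the payload this could produce for each chunk, assuming the TA3 header and msg parts are already available to the agent as dicts. The data field names (timestamp, size_in_bytes) and the ISO 8601 timestamp format are illustrative assumptions, not an agreed spec, and the topic name is the tentative one from the issue.

```python
from datetime import datetime, timezone
from typing import Optional

# Tentative topic name from the issue; may be renamed.
CHUNK_METADATA_TOPIC = "agent/speech_analyzer/chunk_metadata"


def make_chunk_metadata_message(chunk: bytes, participant_id: str,
                                header: dict, msg: dict,
                                received_at: Optional[datetime] = None) -> dict:
    """Build a metadata message for one audio chunk received over websocket.

    `header` and `msg` are assumed to be the TA3 common-format parts; they
    are copied so the caller's dicts are not modified. The `data` field
    names are hypothetical.
    """
    if received_at is None:
        # Record arrival time in UTC as close to receipt as possible.
        received_at = datetime.now(timezone.utc)
    return {
        "header": dict(header),
        "msg": dict(msg),
        "data": {
            "timestamp": received_at.isoformat(),
            "size_in_bytes": len(chunk),
            "participant_id": participant_id,
        },
    }
```

The resulting dict would then be serialized (e.g. with json.dumps) and published to the topic by whatever message bus client the agent already uses.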

UTC timestamps in features table

Currently, the timestamp in the features table seems to be a relative one. We should also have a column for the absolute timestamp if possible.
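One way to derive the absolute column, sketched under the assumption that the relative timestamp is in seconds since the start of the audio stream and that the stream's start time is recorded as a timezone-aware UTC datetime. The column names timestamp and timestamp_utc are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_absolute_utc(stream_start_utc: datetime, relative_seconds: float) -> datetime:
    """Convert a stream-relative timestamp (seconds) into an absolute UTC time."""
    if stream_start_utc.tzinfo is None:
        # A naive datetime would make the result ambiguous, so reject it.
        raise ValueError("stream_start_utc must be timezone-aware")
    absolute = stream_start_utc + timedelta(seconds=relative_seconds)
    return absolute.astimezone(timezone.utc)


def add_absolute_timestamp(row: dict, stream_start_utc: datetime) -> dict:
    """Return a copy of a features row with an extra 'timestamp_utc' column.

    Assumes the existing relative timestamp lives under the key 'timestamp'.
    """
    out = dict(row)
    out["timestamp_utc"] = to_absolute_utc(stream_start_utc, row["timestamp"]).isoformat()
    return out
```

Keeping the relative column and adding the absolute one alongside it avoids breaking anything that already consumes the relative timestamps.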

Agent info message

With data collection starting soon and the new features that are in the pipeline (e.g. vocalic feature extraction, sentiment analysis), it makes sense to add the agent version info message proposed by JCR here:

https://gitlab.asist.aptima.com/asist/testbed/-/blob/develop/MessageSpecs/Agent/versioninfo/agent_versioninfo.md

While this is a draft, I doubt that there will be large changes to the format, so I think we can go ahead and add it.

This message should be published to the message bus whenever a trial start message is published on the bus.

For now, just include the agent_name, version, and owner fields in the data part of the message. We can update the format when it gets finalized in the testbed WG.
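A minimal sketch of the trigger logic, assuming trial start messages arrive on a topic named trial with msg.sub_type equal to "start"; both of those, the version info topic, and the header and msg.sub_type values below are assumptions to check against the testbed spec. Only agent_name, version, and owner go into data, as requested above; publish is a hypothetical callable wrapping the agent's bus client.

```python
from typing import Callable

# Assumed topic names; verify against the testbed message specs.
TRIAL_TOPIC = "trial"
VERSION_INFO_TOPIC = "agent/speech_analyzer/versioninfo"  # hypothetical


def make_version_info(trial_message: dict, agent_name: str,
                      version: str, owner: str) -> dict:
    """Build a version info message with only the three requested data fields."""
    trial_msg = trial_message.get("msg", {})
    return {
        "header": {"message_type": "agent"},  # assumed header contents
        "msg": {
            "sub_type": "versioninfo",  # assumed sub_type
            "trial_id": trial_msg.get("trial_id"),
        },
        "data": {"agent_name": agent_name, "version": version, "owner": owner},
    }


def handle_message(topic: str, message: dict,
                   publish: Callable[[str, dict], None],
                   agent_name: str, version: str, owner: str) -> bool:
    """Publish version info when `message` is a trial start; return True if sent."""
    is_trial_start = (topic == TRIAL_TOPIC
                      and message.get("msg", {}).get("sub_type") == "start")
    if is_trial_start:
        publish(VERSION_INFO_TOPIC,
                make_version_info(message, agent_name, version, owner))
    return is_trial_start
```

Injecting publish keeps the logic independent of the bus client; with paho-mqtt, for example, it could be a small wrapper around client.publish(topic, json.dumps(payload)).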

Stream restarting

@vincentraymond-ua Google Cloud Speech's streaming recognition requests have a 5-minute limit, so we need to restart the streams periodically to get around this limit and support endless streaming transcription. The Python version already implements this; the C++ version should be updated to do the same.

See here: https://github.com/clulab/tomcat-speech/blob/99f660e1f78a2722a7a88687e9a5ebdfe48a7faa/agents/asr/google_asr_client.py#L110-L115

And here: https://github.com/clulab/tomcat-speech/blob/99f660e1f78a2722a7a88687e9a5ebdfe48a7faa/agents/asr/google_asr_client.py#L36-L39
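The restart pattern can be sketched independently of the Google client library: hand out one chunk generator per recognition stream, end it once a time limit is reached, and carry the chunk that arrived at the cutoff over to the next stream so no audio is dropped. This is an illustrative sketch, not the linked Python implementation; the limit value is an assumed safety margin below 5 minutes, and recognize is a hypothetical callable that opens one streaming request and consumes the generator it is given.

```python
import time
from typing import Callable, Iterable, Iterator, Optional

# Assumed safety margin below the 5-minute streaming limit (hypothetical value).
STREAM_LIMIT_SECONDS = 290.0


class RestartingAudioStream:
    """Splits one audio chunk source across successive time-limited streams."""

    def __init__(self, chunks: Iterable[bytes],
                 limit_seconds: float = STREAM_LIMIT_SECONDS,
                 clock: Callable[[], float] = time.monotonic):
        self._chunks = iter(chunks)
        self._limit = limit_seconds
        self._clock = clock
        self._pending: Optional[bytes] = None  # chunk carried into the next stream
        self.exhausted = False

    def next_stream(self) -> Iterator[bytes]:
        """Yield chunks for one stream, stopping once the time limit is reached."""
        start = self._clock()
        if self._pending is not None:
            chunk, self._pending = self._pending, None
            yield chunk
        for chunk in self._chunks:
            if self._clock() - start >= self._limit:
                # Limit reached: keep this chunk for the next stream and stop.
                self._pending = chunk
                return
            yield chunk
        # Only reached when the underlying source has no more chunks.
        self.exhausted = True


def transcribe_forever(audio: RestartingAudioStream,
                       recognize: Callable[[Iterator[bytes]], None]) -> None:
    """Open a new recognition stream each time the previous one hits the limit.

    `recognize` must fully consume the generator it receives; otherwise
    `exhausted` is never set and this loop will not terminate.
    """
    while not audio.exhausted:
        recognize(audio.next_stream())
```

Injecting the clock makes the cutoff behavior testable without waiting for real time to pass.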

Remove data.sentiment.speaker field

The data.sentiment.speaker field in agent/asr/final messages seems to contain UUIDs that vary for a given speaker. This is likely a bug. Since the data.participant_id identifies the speaker, we probably don't need the data.sentiment.speaker key (unless I'm missing something).
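If the key is dropped at the point where the message is assembled, the change amounts to something like the sketch below. It assumes the message is handled as a nested dict and that data.participant_id stays as the speaker identifier; any sentiment keys other than speaker are left untouched.

```python
import copy


def drop_sentiment_speaker(message: dict) -> dict:
    """Return a copy of an agent/asr/final message without data.sentiment.speaker.

    data.participant_id is kept as the speaker identifier; the original
    message is not modified.
    """
    cleaned = copy.deepcopy(message)
    sentiment = cleaned.get("data", {}).get("sentiment")
    if isinstance(sentiment, dict):
        sentiment.pop("speaker", None)  # no-op if the key is already absent
    return cleaned
```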

Refactor word alignment messages

We need to refactor our word alignment messages to work around problems caused by Elasticsearch's automatic schema generation (dynamic mapping).

@vincentraymond-ua - could you please make the data.features part of the message a string instead of a JSON object? The string should be parseable as a JSON object. The dump() member function of nlohmann::json in the nlohmann-json library produces such a string.

Other changes requested:

  • change the topic from word/feature to agent/asr/word_alignment
  • change msg.sub_type from alignment to asr:alignment
  • add agent/asr/word_alignment to message_topics.csv

The JSON schema and example in the MessageSpec folder will also need to be updated.
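The requested message changes are sketched below in Python for brevity; in the C++ agent, the serialization step would correspond to calling dump() on the nlohmann::json features object. The shape of the incoming header, msg, and data dicts is assumed, and the feature keys in any usage are placeholders.

```python
import json

WORD_ALIGNMENT_TOPIC = "agent/asr/word_alignment"


def make_word_alignment_message(header: dict, msg: dict, data: dict) -> dict:
    """Build a word alignment message whose data.features is a JSON string.

    Storing features as a string means Elasticsearch sees a single text
    value instead of inferring a mapping from the nested feature keys.
    """
    out_msg = dict(msg, sub_type="asr:alignment")  # was "alignment"
    out_data = dict(data)
    out_data["features"] = json.dumps(data["features"])
    return {"header": dict(header), "msg": out_msg, "data": out_data}
```

Consumers recover the original object with json.loads (or nlohmann::json::parse in C++), so the string form loses no information.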
