ml4ai / tomcat-speechanalyzer

An agent program for multi-participant speech analysis.

CMake 2.10% C++ 71.46% PHP 14.48% Shell 0.47% Dockerfile 0.67% C 10.25% HTML 0.58%

tomcat-speechanalyzer's Issues

data.features.utterance_id is redundant with data.id in agent/asr/final messages

The data.features.utterance_id field is identical to the data.id field in agent/asr/final messages. Is this intentional? If not, let's remove it.

Pinning OpenSMILE version

We should pin the version of OpenSMILE used in the Docker image for the sake of reproducibility. Right now it builds from the latest commit on the master branch of the OpenSMILE GitHub repo. Ideally, we should simply use a tagged release from their GitHub repo.

@paulosoaresua - is this something you can take care of? Feel free to delegate the task to an undergrad or someone else if you'd like.

Remove data.alternatives field for intermediate Google transcriptions

According to the Google Cloud Speech documentation, confidence values are only set for final transcriptions. From looking at the speechAnalyzer outputs in the reprocessed spiral 3 data, it looks like intermediate transcription messages only contain one alternative, so there is no point in providing this field to the users.

Thus, we should remove data.alternatives for messages published on agent/asr/intermediate.

Audio chunk timestamps

We will need to capture the timestamps of the individual incoming audio chunks.

@vincentraymond-ua Could you please implement this? Basically, every time an audio chunk reaches the speechAnalyzer agent over websocket, it should publish a message to the message bus to the topic agent/speech_analyzer/chunk_metadata (or some better name if you come up with one!) with the timestamp, chunk size in bytes, participant_id, and the header and msg parts of the common TA3 message format.
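
A minimal sketch of what the per-chunk metadata could look like, assuming a UTC ISO 8601 timestamp with millisecond precision. The `ChunkMetadata` struct, the `chunk_size_bytes` key, and both function names are hypothetical, and the header and msg parts of the common TA3 message format are left out. The JSON is assembled by hand only to keep the example free of third-party dependencies.

```cpp
#include <chrono>
#include <cstddef>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical record describing one incoming audio chunk.
struct ChunkMetadata {
    std::string timestamp;       // UTC, ISO 8601
    std::size_t size_bytes;      // size of the chunk payload
    std::string participant_id;  // who the audio came from
};

// Format a time point as an ISO 8601 UTC string with millisecond precision.
// Assumes the time point is not before the Unix epoch.
std::string to_iso8601_utc(std::chrono::system_clock::time_point tp) {
    using namespace std::chrono;
    const auto ms = duration_cast<milliseconds>(tp.time_since_epoch()) % 1000;
    const std::time_t t = system_clock::to_time_t(tp);
    // std::gmtime is not thread-safe; a multi-threaded agent would need
    // a reentrant alternative here.
    const std::tm utc = *std::gmtime(&t);
    std::ostringstream out;
    out << std::put_time(&utc, "%Y-%m-%dT%H:%M:%S") << '.'
        << std::setfill('0') << std::setw(3) << ms.count() << 'Z';
    return out.str();
}

// Serialize the data part of the message. Assumes participant_id contains
// no characters that need JSON escaping.
std::string chunk_metadata_json(const ChunkMetadata& m) {
    std::ostringstream out;
    out << "{\"timestamp\":\"" << m.timestamp << "\","
        << "\"chunk_size_bytes\":" << m.size_bytes << ","
        << "\"participant_id\":\"" << m.participant_id << "\"}";
    return out.str();
}
```

In the agent, the timestamp would presumably be taken with `std::chrono::system_clock::now()` as soon as the chunk arrives on the websocket, before any further processing.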

UTC timestamps in features table

Currently, the timestamp in the features table seems to be a relative one. We should also have a column for the absolute timestamp if possible.
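
One way to derive the absolute column, sketched under the assumption that the relative timestamp is measured in seconds from the moment the audio stream started and that this start time is recorded as a wall-clock time point. The function name `to_absolute` is hypothetical.

```cpp
#include <chrono>

// Convert a timestamp relative to the start of the audio stream into an
// absolute wall-clock time point.
std::chrono::system_clock::time_point to_absolute(
    std::chrono::system_clock::time_point stream_start,
    double relative_seconds) {
    using namespace std::chrono;
    // Convert fractional seconds into the clock's native tick type.
    const auto offset =
        duration_cast<system_clock::duration>(duration<double>(relative_seconds));
    return stream_start + offset;
}
```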

Agent info message

With data collection starting soon and the new features that are in the pipeline (e.g. vocalic feature extraction, sentiment analysis), it makes sense to add the agent version info message proposed by JCR here:

https://gitlab.asist.aptima.com/asist/testbed/-/blob/develop/MessageSpecs/Agent/versioninfo/agent_versioninfo.md

While this is a draft, I doubt that there will be large changes to the format, so I think we can go ahead and add it.

This message should be published to the message bus whenever there is a trial start message published on the bus.

For now, just include the agent_name, version, and owner fields in the data part of the message. We can update the format when it gets finalized in the testbed WG.

data.features.time_interval field

Is the data.features.time_interval field still required?

Stream restarting

@vincentraymond-ua Google Cloud Speech's streaming recognition requests have a 5-minute limit, so we need to restart the streams periodically to get around this limit and do endless streaming transcription. The Python version already implements this; the C++ version should be updated to do so as well.

See here: https://github.com/clulab/tomcat-speech/blob/99f660e1f78a2722a7a88687e9a5ebdfe48a7faa/agents/asr/google_asr_client.py#L110-L115

And here: https://github.com/clulab/tomcat-speech/blob/99f660e1f78a2722a7a88687e9a5ebdfe48a7faa/agents/asr/google_asr_client.py#L36-L39
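
The timing side of the restart logic could be sketched as follows. This is not a port of the linked Python code: the class name `StreamRestartTimer` is hypothetical, and the limit is a parameter that the caller would set somewhat below the 5-minute cap. Opening and closing the actual streaming request is omitted.

```cpp
#include <chrono>

// Tracks how long the current streaming request has been open and reports
// when it should be replaced by a fresh one.
class StreamRestartTimer {
public:
    using Clock = std::chrono::steady_clock;

    StreamRestartTimer(Clock::duration limit, Clock::time_point now)
        : limit_(limit), started_(now) {}

    // True once the current stream has been open for at least `limit`.
    bool should_restart(Clock::time_point now) const {
        return now - started_ >= limit_;
    }

    // Call right after a new streaming request has been opened.
    void restart(Clock::time_point now) { started_ = now; }

private:
    Clock::duration limit_;
    Clock::time_point started_;
};
```

The audio loop would check `should_restart(Clock::now())` before writing each chunk and, when it returns true, finish the current request, open a new one, and call `restart`. Passing the time in explicitly keeps the class easy to test without waiting for real minutes to elapse.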

Remove data.sentiment.speaker field

The data.sentiment.speaker field in agent/asr/final messages seems to contain UUIDs that vary for a given speaker. This is likely a bug. Since the data.participant_id identifies the speaker, we probably don't need the data.sentiment.speaker key (unless I'm missing something).

Refactor word alignment messages

We need to refactor our word alignment messages to bypass the issue of Elasticsearch's automatic schema generation.

@vincentraymond-ua - could you please make the data.features part of the message a string instead of a JSON object? The string should still be parseable as a JSON object. I think the nlohmann-json library has a dump() member function on its json objects that does this.

Other changes requested:

Change the topic from word/feature to agent/asr/word_alignment
Change msg.sub_type from alignment to asr:alignment
Add agent/asr/word_alignment to message_topics.csv

The JSON schema and example in the MessageSpec folder will also need to be updated.
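
To illustrate what "features as a string" means on the wire, here is a dependency-free sketch that embeds already-serialized JSON as a string value. Both function names are hypothetical, and the escaping only covers quotes, backslashes, and the common whitespace escapes. With nlohmann-json, the equivalent would most likely be to assign the result of calling `dump()` on the features object to the field and let the library do the escaping.

```cpp
#include <string>

// Escape a string so it can be embedded as a JSON string value.
// Sketch only: other control characters are passed through unchanged.
std::string escape_json(const std::string& s) {
    std::string out;
    out.reserve(s.size() + 2);
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;
        }
    }
    return out;
}

// Produce a "features" member whose value is a JSON string that itself
// contains the serialized features object.
std::string features_as_string_field(const std::string& features_json) {
    return "\"features\":\"" + escape_json(features_json) + "\"";
}
```

A consumer reads the field as an ordinary string and parses it a second time to recover the object, so Elasticsearch only ever sees a single string-typed field.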

data.features.text is redundant with data.text

data.features.text is identical to data.text in agent/asr/final messages. We should remove data.features.text.
