ml4ai / tomcat-speechanalyzer
An agent program for multi-participant speech analysis.
The data.features.utterance_id field is identical to the data.id field in agent/asr/final messages. Is this intentional? If not, let's remove it.
We should pin the version of OpenSMILE used in the Docker image for the sake of reproducibility. Right now, the image builds from the latest commit on the master branch of the OpenSMILE GitHub repo. Ideally, we should simply use a tagged release from their GitHub repo.
@paulosoaresua - is this something you can take care of? Feel free to delegate the task to an undergrad or someone else if you'd like.
According to the Google Cloud Speech documentation, confidence values are only set for final transcriptions. From looking at the speechAnalyzer outputs in the reprocessed spiral 3 data, intermediate transcription messages appear to contain only one alternative, so there is no point in providing the alternatives list to users.
Thus, we should remove data.alternatives from messages published on agent/asr/intermediate.
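A minimal Python sketch of the change, assuming messages are handled as plain dicts laid out like the published JSON (header/msg/data). The function name and topic constant are hypothetical; the actual agent may build these messages differently, in which case the field should simply never be added for intermediate results.

```python
import copy
from typing import Any, Dict

# Topic name taken from the issue; everything else here is illustrative.
INTERMEDIATE_TOPIC = "agent/asr/intermediate"


def strip_intermediate_alternatives(topic: str, message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the message, dropping data.alternatives for intermediate results.

    Final transcriptions (agent/asr/final) are returned unchanged.
    """
    out = copy.deepcopy(message)  # never mutate the caller's message
    if topic == INTERMEDIATE_TOPIC:
        out.get("data", {}).pop("alternatives", None)
    return out
```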
We will need to capture the timestamps of the individual incoming audio chunks.
@vincentraymond-ua Could you please implement this? Basically, every time an audio chunk reaches the speechAnalyzer agent over the websocket, it should publish a message on the topic agent/speech_analyzer/chunk_metadata (or some better name if you come up with one!) containing the timestamp, the chunk size in bytes, the participant_id, and the header and msg parts of the common TA3 message format.
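A rough Python sketch of the message such a handler could build for each incoming chunk. The header/msg fields shown (timestamp, message_type, source, sub_type, version) are an assumption about the common TA3 format, and the data key names and sub_type value are hypothetical; the actual websocket handling and publishing to the bus are left out.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Topic name proposed in the issue (may be renamed).
CHUNK_METADATA_TOPIC = "agent/speech_analyzer/chunk_metadata"


def iso_utc_now() -> str:
    """Current UTC time as an ISO 8601 string with millisecond precision and a 'Z' suffix."""
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def make_chunk_metadata_message(
    chunk: bytes, participant_id: str, received_at: Optional[str] = None
) -> Dict[str, Any]:
    """Build the metadata message for one audio chunk.

    received_at should be captured as soon as the chunk arrives; it defaults to now.
    """
    ts = received_at or iso_utc_now()
    return {
        # Assumed header/msg layout of the common TA3 message format.
        "header": {"timestamp": ts, "message_type": "agent", "version": "0.1"},
        "msg": {
            "timestamp": ts,
            "source": "speechAnalyzer",
            "sub_type": "speech_analyzer:chunk_metadata",  # hypothetical sub_type
            "version": "0.1",
        },
        # Hypothetical key names for the three requested values.
        "data": {
            "timestamp": ts,
            "chunk_size_bytes": len(chunk),
            "participant_id": participant_id,
        },
    }
```

The message would then be serialized (e.g. with `json.dumps`) and published on `CHUNK_METADATA_TOPIC`.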
Currently, the timestamp in the features table seems to be a relative one. We should also have a column for the absolute timestamp if possible.
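One way to derive the absolute column, sketched in Python under two assumptions not stated in the issue: the relative timestamp is an offset in seconds from the start of the participant's audio stream, and the agent records the wall-clock time at which that stream started. The column name `timestamp_absolute` is hypothetical. Drift between audio time and wall-clock time is ignored here, which is one reason the per-chunk timestamps discussed above would be useful.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def to_absolute(stream_start: datetime, relative_seconds: float) -> datetime:
    """Convert a stream-relative offset (in seconds) to an absolute UTC datetime."""
    if stream_start.tzinfo is None:
        raise ValueError("stream_start must be timezone-aware")
    return (stream_start + timedelta(seconds=relative_seconds)).astimezone(timezone.utc)


def add_absolute_column(
    rows: List[Dict[str, Any]], stream_start: datetime
) -> List[Dict[str, Any]]:
    """Return copies of feature rows with an extra ISO 8601 'timestamp_absolute' field."""
    out = []
    for row in rows:
        new_row = dict(row)
        new_row["timestamp_absolute"] = to_absolute(stream_start, row["timestamp"]).isoformat()
        out.append(new_row)
    return out
```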
With data collection starting soon and the new features that are in the pipeline (e.g. vocalic feature extraction, sentiment analysis), it makes sense to add the agent version info message proposed by JCR here:
While this is a draft, I doubt that there will be large changes to the format, so I think we can go ahead and add it.
This message should be published to the message bus whenever a trial start message is published on the bus.
For now, just include the agent_name, version, and owner fields in the data part of the message. We can update the format when it gets finalized in the testbed WG.
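A minimal Python sketch of the trigger. It assumes trial starts are announced on a `trial` topic with `msg.sub_type == "start"`, and it uses placeholder agent values, a hypothetical topic name, and a hypothetical sub_type; only the three requested data fields are included. `publish` stands in for whatever bus client the agent actually uses.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict

# Hypothetical placeholder values; the real agent would fill these in from its build.
AGENT_NAME = "speechAnalyzer"
AGENT_VERSION = "0.0.0"
AGENT_OWNER = "<owner>"
VERSION_INFO_TOPIC = "agent/speech_analyzer/versioninfo"  # hypothetical topic name

Publish = Callable[[str, Dict[str, Any]], None]


def is_trial_start(topic: str, message: Dict[str, Any]) -> bool:
    """Assumes trial starts arrive on the 'trial' topic with msg.sub_type == 'start'."""
    return topic == "trial" and message.get("msg", {}).get("sub_type") == "start"


def on_message(topic: str, message: Dict[str, Any], publish: Publish) -> None:
    """Publish a version info message whenever a trial start message is seen."""
    if not is_trial_start(topic, message):
        return
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
    trial_msg = message.get("msg", {})
    version_info = {
        "header": {"timestamp": now, "message_type": "agent", "version": "0.1"},
        "msg": {
            "timestamp": now,
            "source": AGENT_NAME,
            "sub_type": "versioninfo",  # hypothetical sub_type
            "version": "0.1",
            # Copy the trial context so the message can be tied to the trial.
            "trial_id": trial_msg.get("trial_id"),
            "experiment_id": trial_msg.get("experiment_id"),
        },
        # Only the three fields requested for now.
        "data": {"agent_name": AGENT_NAME, "version": AGENT_VERSION, "owner": AGENT_OWNER},
    }
    publish(VERSION_INFO_TOPIC, version_info)
```

Keeping the data part limited to these three keys should make it easy to extend once the format is finalized in the testbed WG.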
Is the data.features.time_interval field still required?
@vincentraymond-ua Google Cloud Speech's streaming recognition requests have a 5-minute limit, so we need to restart the streams periodically to get around this limit and do endless streaming transcription. The Python version already implements this; the C++ version should be updated to do the same.
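The core bookkeeping for restarting streams can be sketched in Python without the Google client library: group the incoming audio chunks into sessions whose total audio duration stays under a limit, and open a new streaming request for each session. This assumes 16-bit mono PCM (LINEAR16) audio, and the 240-second default is an assumed safety margin below the 5-minute limit. A production version would also need to handle audio that was in flight when a stream was closed, which this sketch omits.

```python
from typing import Iterable, Iterator, List


def split_into_sessions(
    chunks: Iterable[bytes],
    sample_rate_hz: int,
    bytes_per_sample: int = 2,  # assumes 16-bit mono PCM
    max_session_seconds: float = 240.0,  # assumed margin below the 5-minute limit
) -> Iterator[List[bytes]]:
    """Group audio chunks into sessions whose total duration stays under the limit.

    Each yielded list would be sent on its own streaming recognition request.
    Sizes are tracked in bytes (integers) to avoid floating-point drift.
    """
    max_bytes = int(max_session_seconds * sample_rate_hz * bytes_per_sample)
    session: List[bytes] = []
    session_bytes = 0
    for chunk in chunks:
        # Start a new session if adding this chunk would exceed the limit.
        if session and session_bytes + len(chunk) > max_bytes:
            yield session
            session, session_bytes = [], 0
        session.append(chunk)
        session_bytes += len(chunk)
    if session:
        yield session
```

Because it is a generator, it works on a live chunk stream as well as on a finite list.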
The data.sentiment.speaker field in agent/asr/final messages seems to contain UUIDs that vary for a given speaker. This is likely a bug. Since the data.participant_id field identifies the speaker, we probably don't need the data.sentiment.speaker key (unless I'm missing something).
We need to refactor our word alignment messages to work around Elasticsearch's automatic schema generation.
@vincentraymond-ua - could you please make the data.features part of the message a string instead of a JSON object? The string should be parseable as a JSON object. I think the dump() member function in the nlohmann-json library (e.g. j.dump()) does this.
Other changes requested:
- Change the topic from word/feature to agent/asr/word_alignment
- Change msg.sub_type from alignment to asr:alignment
- Add agent/asr/word_alignment to message_topics.csv
The JSON schema and example in the MessageSpec folder will also need to be updated.
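A Python sketch of the requested word alignment message shape: data.features serialized to a JSON string, the new topic, and the new msg.sub_type. The function name is hypothetical and the header/msg/data dicts are assumed to come from the existing message builder. Keeping features as an opaque string means Elasticsearch's dynamic mapping sees a single string field instead of inferring a schema for every nested key; in the C++ agent, nlohmann-json's `dump()` produces the equivalent string.

```python
import json
from typing import Any, Dict, Tuple

WORD_ALIGNMENT_TOPIC = "agent/asr/word_alignment"  # replaces word/feature


def make_word_alignment_message(
    header: Dict[str, Any], msg: Dict[str, Any], data: Dict[str, Any]
) -> Tuple[str, Dict[str, Any]]:
    """Return (topic, message) with data.features encoded as a JSON string."""
    out_msg = dict(msg, sub_type="asr:alignment")  # was "alignment"
    out_data = dict(data)
    # Consumers can recover the object with json.loads(message["data"]["features"]).
    out_data["features"] = json.dumps(data["features"])
    return WORD_ALIGNMENT_TOPIC, {"header": dict(header), "msg": out_msg, "data": out_data}
```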
data.features.text is identical to data.text in agent/asr/final messages. We should remove data.features.text.