Comments (5)
@Alex-Cook4, depending on a specific environment, the role of a speaker (Agent/Client; Trader/Client) can possibly be retrieved and assigned to each individual rtpStreamId.
I'd suggest keeping it, but producing a default value such as Unassigned for environments where the role cannot be determined. I also suggest renaming it from role to speakerRole.
Regarding the speakerId, isn't it required for environments where speaker diarization is required? In that case, there may be multiple different values of speakerId (1,2,3...) for each individual rtpStreamId.
I agree with renaming suggestions with one minor correction: rtpStreamStartTime, not rtpStreamsStartTime.
What do you think?
from streamsx.speech2text.
@mgorbat Thanks for the input.
With regard to:
"depending on a specific environment, the role of a speaker (Agent/Client; Trader/Client) can possibly be retrieved and assigned to each individual rtpStreamId."
Can that information really come from the RTP packets themselves? I understand that we can get it from a CTI feed that we later correlate with the identifiers in the RTP stream, but if that's what you're talking about, then in my opinion those attributes shouldn't show up until later.
Regarding:
"Regarding the speakerId, isn't it required for environments where speaker diarization is required? In that case, there may be multiple different values of speakerId (1,2,3...) for each individual rtpStreamId."
Diarization results are currently placed in the list<int32> utteranceSpeakers since a given utterance may have multiple speaker identifiers.
@Alex-Cook4
My understanding is that in certain cases a stream can be identified as originating from an agent or a client, based on the range or list of IP addresses and ports present in the RTP packets and on the direction of the stream. It is hard to say, though, how frequent those cases are, and whether the direction of an RTP stream (forward/reverse) is preserved after a network tap point.
I agree with removing speakerId; I hadn't noticed that there is already an attribute for this.
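As a rough illustration of the address-based idea above, the following Python sketch maps a packet's source IP address to a role and falls back to the Unassigned default suggested earlier in the thread. The network ranges and the `speaker_role` function are hypothetical, chosen only for the example; they are not part of the toolkit.

```python
from ipaddress import ip_address, ip_network

# Hypothetical address ranges; real values depend entirely on the deployment.
AGENT_NETWORKS = [ip_network("10.1.0.0/16")]
CLIENT_NETWORKS = [ip_network("192.168.50.0/24")]


def speaker_role(source_ip: str) -> str:
    """Return "Agent", "Client", or "Unassigned" for an RTP source address."""
    addr = ip_address(source_ip)
    if any(addr in net for net in AGENT_NETWORKS):
        return "Agent"
    if any(addr in net for net in CLIENT_NETWORKS):
        return "Client"
    # No configured range matched, so the role cannot be determined.
    return "Unassigned"
```

Whether such a lookup is reliable depends on the open question raised above, namely whether the addresses and stream direction survive the network tap point.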
In that case, I'm fine with keeping the role attribute as an indicator that this is something that can potentially be set in customized situations. My updated proposal would be the following:
- callId -> rtpStreamId: this isn't actually the ID of a call, since it covers only a single speaker. The true call ID comes from CTI correlation, and one call would contain several of these "callId"s.
- captureSeconds -> rtpStreamStartTime: since it actually refers to the captureSeconds of the first packet in the RTP stream
- role -> keep, but document that it is currently unusable
- speakerId -> REMOVE: unless there are plans to support this in some way
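The effect of the proposal above on an output record can be sketched in Python as follows. The record is modeled as a plain dict, and `apply_proposal` is a hypothetical helper used only for illustration; the actual change would be made in the toolkit's output schema.

```python
# Attribute renames and removals taken from the proposal above.
RENAMES = {"callId": "rtpStreamId", "captureSeconds": "rtpStreamStartTime"}
REMOVED = {"speakerId"}


def apply_proposal(record: dict) -> dict:
    """Return a copy of record with the proposed renames and removals applied."""
    out = {}
    for name, value in record.items():
        if name in REMOVED:
            continue  # speakerId is dropped
        out[RENAMES.get(name, name)] = value  # role and other attributes are kept
    return out
```

For example, a record with callId, captureSeconds, role, and speakerId comes back with rtpStreamId, rtpStreamStartTime, and role only.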
I would also like to add a:
- rtpStreamComplete - boolean attribute to indicate that this is the last utterance from a stream
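One way to picture the proposed flag is the Python sketch below, which marks the final utterance of each stream in an already collected batch. The `Utterance` class, its `text` field, and `mark_last` are assumptions made for the example; a streaming implementation would instead have to set the flag when it detects the end of the RTP stream.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Utterance:
    rtpStreamId: str
    text: str
    rtpStreamComplete: bool = False


def mark_last(utterances: List[Utterance]) -> List[Utterance]:
    """Set rtpStreamComplete on the last utterance seen for each rtpStreamId."""
    last_index = {}
    for i, u in enumerate(utterances):
        last_index[u.rtpStreamId] = i  # later positions overwrite earlier ones
    return [
        replace(u, rtpStreamComplete=(last_index[u.rtpStreamId] == i))
        for i, u in enumerate(utterances)
    ]
```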