Comments (5)

mgorbat commented on September 26, 2024

@Alex-Cook4, depending on a specific environment, the role of a speaker (Agent/Client; Trader/Client) can possibly be retrieved and assigned to each individual rtpStreamId.
I'd suggest keeping it, but producing a default value such as Unassigned in environments where the role cannot be determined. I also suggest renaming it from role to speakerRole.

Regarding the speakerId, isn't it required for environments where speaker diarization is required? In that case, there may be multiple different values of speakerId (1,2,3...) for each individual rtpStreamId.

I agree with the renaming suggestions, with one minor correction: rtpStreamStartTime, not rtpStreamsStartTime.

What do you think?

from streamsx.speech2text.

Alex-Cook4 commented on September 26, 2024

@mgorbat Thanks for the input.
With regard to:

depending on a specific environment, the role of a speaker (Agent/Client; Trader/Client) can possibly be retrieved and assigned to each individual rtpStreamId.

Is that information available from the RTP packets themselves? I understand that we can get it from a CTI feed that we later correlate with the identifiers in the RTP stream, but if that's what you mean, then in my opinion those attributes shouldn't show up until later.
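
A minimal sketch of that later correlation step, assuming a hypothetical CTI record type (CtiEvent with call_id and rtp_stream_id fields; neither name comes from the toolkit or any particular CTI product). It only shows that call identity is attached after the fact, by grouping several RTP streams under one call:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CtiEvent:
    # Hypothetical CTI record linking one RTP stream to the call it belongs to.
    call_id: str
    rtp_stream_id: str


def correlate(events):
    """Group rtpStreamIds by call, using (hypothetical) CTI events.

    Returns a dict mapping each call_id to the sorted list of
    rtp_stream_ids the CTI feed reported for that call.
    """
    streams_by_call = defaultdict(set)
    for event in events:
        streams_by_call[event.call_id].add(event.rtp_stream_id)
    return {call_id: sorted(ids) for call_id, ids in streams_by_call.items()}


feed = [
    CtiEvent("call-1", "ssrc-a"),
    CtiEvent("call-1", "ssrc-b"),
    CtiEvent("call-2", "ssrc-c"),
]
print(correlate(feed))  # {'call-1': ['ssrc-a', 'ssrc-b'], 'call-2': ['ssrc-c']}
```

Because the join happens downstream, the speech-to-text output itself only needs to carry the per-stream identifier.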

Regarding:

Regarding the speakerId, isn't it required for environments where speaker diarization is required? In that case, there may be multiple different values of speakerId (1,2,3...) for each individual rtpStreamId.

Diarization results are currently placed in the list<int32> attribute utteranceSpeakers, since a given utterance may have multiple speaker identifiers.
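
To illustrate why a separate per-stream speakerId would be redundant, here is a small sketch that derives the set of speakers seen on each stream from utteranceSpeakers alone. The Utterance class is a hypothetical, trimmed-down stand-in for the real output tuple, keeping only the two fields needed here:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Utterance:
    # Hypothetical, simplified view of an output tuple with only the fields
    # used here; utterance_speakers mirrors list<int32> utteranceSpeakers.
    rtp_stream_id: str
    utterance_speakers: List[int] = field(default_factory=list)


def speakers_per_stream(utterances: List[Utterance]) -> Dict[str, Set[int]]:
    """Collect every diarization speaker id seen on each RTP stream."""
    seen: Dict[str, Set[int]] = {}
    for utt in utterances:
        seen.setdefault(utt.rtp_stream_id, set()).update(utt.utterance_speakers)
    return seen


utts = [
    Utterance("ssrc-a", [1]),
    Utterance("ssrc-a", [1, 2]),  # one utterance, two speakers
    Utterance("ssrc-b", [3]),
]
print(speakers_per_stream(utts))  # {'ssrc-a': {1, 2}, 'ssrc-b': {3}}
```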

mgorbat commented on September 26, 2024

@Alex-Cook4
My understanding is that in certain cases, the range or list of IP addresses and ports present in the RTP packets, together with the direction of a stream, can identify the stream as originating from an agent or a client. It is hard to say, though, how frequent those cases are, and whether the direction of an RTP stream (forward/reverse) is preserved after a network tap point.
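
A rough sketch of that kind of address-based role assignment, falling back to an Unassigned value when nothing matches. The subnet lists are hypothetical placeholders (documentation ranges from RFC 5737), only source addresses are checked (ports could be added the same way), and it assumes the stream's source address survives the tap point:

```python
import ipaddress
from enum import Enum


class SpeakerRole(Enum):
    AGENT = "Agent"
    CLIENT = "Client"
    UNASSIGNED = "Unassigned"


# Hypothetical site configuration: networks known to carry agent or client
# endpoints. A real deployment would load these from its own configuration.
AGENT_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
CLIENT_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]


def classify_stream(src_ip: str) -> SpeakerRole:
    """Guess the speaker role of an RTP stream from its source address.

    Returns UNASSIGNED when the address matches no configured range,
    i.e. in environments where the role cannot be determined.
    """
    addr = ipaddress.ip_address(src_ip)
    if any(addr in net for net in AGENT_NETWORKS):
        return SpeakerRole.AGENT
    if any(addr in net for net in CLIENT_NETWORKS):
        return SpeakerRole.CLIENT
    return SpeakerRole.UNASSIGNED


print(classify_stream("192.0.2.15").value)  # Agent
print(classify_stream("203.0.113.9").value)  # Unassigned
```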

I agree with removing speakerId; I hadn't noticed that there is already an attribute for this.

Alex-Cook4 commented on September 26, 2024

In that case, I'm fine with keeping the role attribute as an indicator that this is something that can potentially be set in customized situations. My updated proposal would be the following:

  • callId -> rtpStreamId: this isn't actually the ID of a call, since it only has a single speaker. The true call ID comes from CTI correlation, and one call would have multiple of these "callId"s.
  • captureSeconds -> rtpStreamStartTime: since it actually refers to the captureSeconds of the first packet in the RTP stream
  • role -> keep, but document that it is currently unusable
  • speakerId -> REMOVE: unless there are plans to support this in some way
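
The renames and removal above can be expressed as a simple mapping. This sketch treats an output tuple as a plain dict, which is an assumption for illustration only; migrate_tuple is a hypothetical helper, not part of the toolkit. Attributes the proposal doesn't mention pass through unchanged:

```python
from typing import Any, Dict

# Old attribute name -> new attribute name, per the updated proposal.
RENAMES = {
    "callId": "rtpStreamId",
    "captureSeconds": "rtpStreamStartTime",
}
# Attributes the proposal removes.
REMOVED = {"speakerId"}


def migrate_tuple(old: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rename/drop attributes of a tuple given as a dict.

    Attributes not covered by the proposal (e.g. role, utteranceSpeakers)
    are copied through with their original names.
    """
    new: Dict[str, Any] = {}
    for name, value in old.items():
        if name in REMOVED:
            continue
        new[RENAMES.get(name, name)] = value
    return new


old = {
    "callId": "ssrc-a",
    "captureSeconds": 1700000000,
    "role": "Unassigned",
    "speakerId": 0,
    "utteranceSpeakers": [1, 2],
}
print(migrate_tuple(old))
```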

Alex-Cook4 commented on September 26, 2024

I would also like to add:

  • rtpStreamComplete: a boolean attribute indicating that this is the last utterance from a stream
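
One possible way to populate such a flag in a streaming setting, where the last utterance is only known once the stream ends: hold back each stream's most recent utterance by one step. The event shape, (stream id, text) with None as a hypothetical end-of-stream marker, is an assumption for this sketch:

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical input: (rtp_stream_id, utterance_text), or (rtp_stream_id, None)
# once that RTP stream has ended.
Event = Tuple[str, Optional[str]]
# Output: (rtp_stream_id, utterance_text, rtp_stream_complete)
Output = Tuple[str, str, bool]


def mark_stream_complete(events: Iterable[Event]) -> Iterator[Output]:
    """Emit utterances with rtpStreamComplete set on the last one per stream.

    Each stream's newest utterance is held back: it is emitted with False
    when a newer utterance arrives, or with True at end of stream.
    """
    pending: Dict[str, str] = {}
    for stream_id, text in events:
        if text is None:  # end of this RTP stream
            if stream_id in pending:
                yield (stream_id, pending.pop(stream_id), True)
            continue
        if stream_id in pending:
            yield (stream_id, pending[stream_id], False)
        pending[stream_id] = text


events = [("a", "hello"), ("b", "hi"), ("a", "bye"), ("a", None), ("b", None)]
print(list(mark_stream_complete(events)))
```

The trade-off is one utterance of added latency per stream; emitting a separate end-of-stream tuple would avoid that at the cost of a tuple that carries no transcript.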
