ibmstreams / streamsx.speech2text

(incubation) Provides ability to transform speech to text in a Streams application

Home Page: http://ibmstreams.github.io/streamsx.speech2text

License: Apache License 2.0

Makefile 21.92% Shell 40.21% Java 37.87%

speech2text ibm-streams stream-processing speech-to-text

streamsx.speech2text's Introduction

streamsx.speech2text Repository

This repository provides supporting applications/solutions, as well as microservices for effective transformation and analysis of speech to text in a Streams application using the product-included Speech2Text Toolkit.

Check out this video about how Verizon is using Speech2Text in Streams: https://youtu.be/Zg-_BJt6jdc

This is NOT the Speech2Text Toolkit

Using the Speech2Text toolkit with the WatsonS2T operator requires purchase of the IBM Streams product. The toolkit itself is provided as a separate download at no extra cost.

Build toolkit

Install cyrus-sasl-devel.x86_64 (this is only needed for the dps toolkit, i.e. the CallState application): yum install cyrus-sasl-devel.x86_64
Run ant: ant

streamsx.speech2text's People

Contributors

Stargazers

Watchers

Forkers

alex-cook4 mgorbat jjbosox adamtorn

streamsx.speech2text's Issues

Project population and graduation status

Please populate repository.

To graduate from incubation phase, please keep GRADUATION_STATUS.md up to date.

When ready to move out of the incubation phase, please open an issue in the streamsx.administration project. PMC members will review the GRADUATION_STATUS.md page and vote on whether the project is ready to be moved out of incubation.

Change name of the default branch

As per the guidelines, the "main" branch shall no longer be named "master".

RTPExtract Operator Fails for some packets

I have seen the RTPExtract operator failing with "out of memory" errors caused by certain PCAP files.

DatacenterSink Job should send JSON tuples rather than Streams tuples

Changing to JSON tuples will allow changes to the schema without breaking downstream applications (assuming we add more fields).
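The benefit can be illustrated with a minimal Python sketch. It is not code from this repository: the helper `read_utterance`, the sample values, and the added `sentiment` field are hypothetical, and only the attribute names `callId`, `utteranceNumber`, and `utterance` come from the Utterance type discussed in the issues here. A downstream consumer that reads only the fields it knows keeps working when the producer adds a field.

```python
import json

# Fields that this (hypothetical) downstream application relies on.
KNOWN_FIELDS = ("callId", "utteranceNumber", "utterance")


def read_utterance(message: str) -> dict:
    """Parse a JSON-encoded tuple and keep only the fields we know about."""
    record = json.loads(message)
    return {name: record.get(name) for name in KNOWN_FIELDS}


# A message in the current schema (sample values are made up).
old_message = json.dumps(
    {"callId": "10.0.0.1_1500000000", "utteranceNumber": 0, "utterance": "hello"}
)

# The same message after the producer adds a new, hypothetical field.
new_message = json.dumps(
    {
        "callId": "10.0.0.1_1500000000",
        "utteranceNumber": 0,
        "utterance": "hello",
        "sentiment": 0.8,
    }
)

# The consumer sees identical data in both cases.
old_view = read_utterance(old_message)
new_view = read_utterance(new_message)
```

With fixed Streams tuple schemas, by contrast, the consumer's declared input type would need to change in step with the producer's.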

Naming of some attributes/types is deceptive

I haven't started changing attribute names yet because I want to try to keep the work backwards compatible for now. This issue is more for guidance when we decide we can really overhaul things.

Since I keep getting confused, I'm working on adding documentation on what each type means. I will call out attributes that I believe are misleading.

In the Utterance type:

type Utterance =
    tuple<
        rstring callId,                // ipAddress + captureSeconds. In an environment where
                                       // each speaker is on a separate RTP stream, the callId
                                       // is effectively the ID for that speaker's stream.
                                       // In the one-speaker to one-stream case, CTI correlation
                                       // must be done.
        int32 utteranceNumber,         // The utterance number for a given RTP stream, [0,1,...]
        float64 utteranceStartTime,    // Seconds of audio processed for a given RTP stream
                                       // up to the start of the utterance
        float64 utteranceEndTime,      // Seconds of audio processed for a given RTP stream
                                       // up to the end of the utterance
        uint32 captureSeconds,         // The capture time in seconds of the first
                                       // RTP packet in the SSRC stream
        rstring role,                  // role = "AGENT" -- this is currently useless
        rstring utterance,             // The text of a single utterance
        int32 speakerId,               // Not used: based on a channel ID that is set to 0, since
                                       // we only handle a single channel at a time
        rstring callCenter,            // ID for the call center the utterance is coming from
        float64 utteranceConfidence,   // Statistical confidence in the transcription of the utterance
        list<float64> utteranceTokenConfidences /*,  // Statistical confidence in each token/word of the utterance
        list<int32> utteranceSpeakers,               // If using diarization, speaker of each token/word
        list<rstring> nBestHypotheses */ >;          // Alternative guesses for the utterance text

I recommend the following:

callId -> rtpStreamId: this isn't actually the ID of a call, since it covers only a single speaker. The true call ID comes from CTI correlation and would span several of these "callId"s.
captureSeconds -> rtpStreamsStartTime: it actually refers to the captureSeconds of the first packet in the RTP stream.
role -> REMOVE: unless there are plans to support it in some way.
speakerId -> REMOVE: unless there are plans to support it in some way.
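The recommendations above can be sketched as a small migration step. This is a minimal Python illustration, not code from the repository: tuples are modelled as plain dictionaries, `migrate_utterance` is a hypothetical helper, and the sample values are made up. Only the old and new attribute names come from the list above.

```python
# Old attribute name -> proposed new name.
RENAMED = {
    "callId": "rtpStreamId",
    "captureSeconds": "rtpStreamsStartTime",
}

# Attributes proposed for removal.
REMOVED = {"role", "speakerId"}


def migrate_utterance(old: dict) -> dict:
    """Return a copy of an utterance record using the proposed names."""
    new = {}
    for name, value in old.items():
        if name in REMOVED:
            continue  # dropped in the proposed schema
        new[RENAMED.get(name, name)] = value  # rename if listed, else keep as-is
    return new


# Sample record in the current schema (values are invented for illustration).
legacy = {
    "callId": "10.0.0.1_1500000000",
    "captureSeconds": 1500000000,
    "role": "AGENT",
    "speakerId": 0,
    "utterance": "hello",
}
migrated = migrate_utterance(legacy)
```

Keeping the mapping in one table makes it easy to run old and new consumers side by side while backwards compatibility is still required.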

As I find other types/attributes that I think could be cleaned up, I will add them to this issue.

CallCenter application should leverage "transcriptionComplete" attribute

Newer versions of the WatsonS2T operator have a "transcriptionComplete" field that is set when sending the last utterance for a given id (in the case of files, it's the last utterance of the file).

Currently, parts of the code rely on utteranceNumber == -1 to indicate that an RTP stream has finished processing. Although this tuple with -1 is emitted most of the time, it is emitted only when no "partialUtterance" was found at the moment the reset signal was received, so it is unreliable.

The -1 tuple is most likely to be missing if a call is disconnected mid-sentence; otherwise a missing tuple is less likely.
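A minimal Python sketch of the proposed check follows. It is an illustration under assumptions, not the CallCenter implementation: tuples are modelled as dictionaries, "transcriptionComplete" is assumed to behave like a boolean flag, and `stream_finished` is a hypothetical helper. It treats the flag as the primary signal and keeps the -1 sentinel as a fallback for tuples from older operator versions.

```python
def stream_finished(utterance: dict) -> bool:
    """Return True if this tuple marks the end of its RTP stream."""
    # Preferred signal: set on the last utterance for a given id
    # (assumed here to be a boolean-like attribute).
    if utterance.get("transcriptionComplete", False):
        return True
    # Legacy signal: sentinel utterance number, which is not always emitted.
    return utterance.get("utteranceNumber") == -1


# Last utterance from a newer operator: flag set, normal utterance number.
last_with_flag = {"utteranceNumber": 7, "transcriptionComplete": True}
# Sentinel tuple from an older operator that has no flag.
legacy_sentinel = {"utteranceNumber": -1}
# An ordinary mid-stream utterance.
mid_stream = {"utteranceNumber": 3, "transcriptionComplete": False}

results = [stream_finished(t) for t in (last_with_flag, legacy_sentinel, mid_stream)]
```

Checking the flag first means a stream that ends mid-sentence is still closed out, even when no -1 tuple ever arrives.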
