I haven't started renaming attributes yet because I want to keep the work backwards compatible for now. This issue is meant as guidance for when we decide we can really overhaul things.
Since I keep getting confused, I'm working on adding documentation on what each type means. I will call out attributes that I believe are misleading.
type Utterance =
tuple<rstring callId, // ipAddress + captureSeconds. In an environment where
// each speaker is on a separate RTP Stream, the callId
// is effectively the ID for that speaker's stream.
// In the one-speaker-per-stream case, CTI correlation
// must be done.
int32 utteranceNumber, // Zero-based index of the utterance within a given RTP Stream: 0, 1, ...
float64 utteranceStartTime, // Seconds of audio processed for a given RTP Stream
// up to start of the Utterance
float64 utteranceEndTime, // Seconds of audio processed for a given RTP Stream
// up to end of the Utterance
uint32 captureSeconds, // Capture time, in seconds, of the first
// RTP packet in the SSRC stream
rstring role, // Currently always "AGENT", so it carries no information
rstring utterance, // The text of a single utterance
int32 speakerId, // Not used - based on a channel id that is set to 0, since
// we only handle a single channel at a time
rstring callCenter, // ID for the call center the utterance is coming from
float64 utteranceConfidence, // Statistical confidence in the transcription of the utterance
list<float64> utteranceTokenConfidences/*, // Statistical confidence in each token/word of the utterance
list<int32> utteranceSpeakers, // If using diarization, speaker of each token/word
list<rstring> nBestHypotheses*/> ; // Alternative guesses for the utterance text
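
To sanity-check my reading of the attributes above, here is a rough Python mirror of the active (non-commented-out) fields. It is only a sketch, not SPL and not how the pipeline is implemented. The helpers `make_call_id`, `utterance_duration`, and `group_by_stream` are hypothetical names, and the `_` separator in the callId is an assumption, since the exact way ipAddress and captureSeconds are combined is not spelled out here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Utterance:
    """Python mirror of the active attributes of the SPL Utterance tuple."""
    callId: str                   # ipAddress + captureSeconds (per-stream ID)
    utteranceNumber: int          # 0, 1, ... within one RTP stream
    utteranceStartTime: float     # seconds of audio processed up to the start
    utteranceEndTime: float       # seconds of audio processed up to the end
    captureSeconds: int           # capture time of the first RTP packet
    role: str                     # currently "AGENT"
    utterance: str                # text of the utterance
    speakerId: int                # not used, effectively 0
    callCenter: str               # ID of the originating call center
    utteranceConfidence: float    # confidence in the whole transcription
    utteranceTokenConfidences: List[float] = field(default_factory=list)


def make_call_id(ip_address: str, capture_seconds: int) -> str:
    # ASSUMPTION: the real concatenation format is unknown; "_" is illustrative.
    return f"{ip_address}_{capture_seconds}"


def utterance_duration(u: Utterance) -> float:
    # Start and end are both offsets into the same stream's audio,
    # so their difference is the length of the utterance in seconds.
    return u.utteranceEndTime - u.utteranceStartTime


def group_by_stream(utterances: List[Utterance]) -> Dict[str, List[Utterance]]:
    # callId identifies a single speaker's stream, so grouping on it yields
    # per-stream transcripts; utteranceNumber restores the order.
    streams: Dict[str, List[Utterance]] = {}
    for u in utterances:
        streams.setdefault(u.callId, []).append(u)
    for stream_utterances in streams.values():
        stream_utterances.sort(key=lambda u: u.utteranceNumber)
    return streams
```

Note that grouping on callId only gives per-stream transcripts. Stitching the streams of one conversation back together would still need the CTI correlation mentioned in the callId comment.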
As I find other types/attributes that I think could be cleaned up, I will add them to this issue.