open-voice-interoperability / docs-archived

A repo containing OVON published and work-in-progress (WIP) artifacts, docs, etc. A beta site is available on github.io.

Home Page: https://open-voice-network.github.io/docs

License: Apache License 2.0


docs-archived's Introduction

Open Voice Network (OVON) document repo

This is the document repo for the Open Voice Network. This repository contains the published and work-in-progress artifacts, docs, schemas, technical committee working group meeting minutes, etc., of the OVON.

A static version of this repo is hosted at https://open-voice-network.github.io/docs.

OVON way of working

Our "way of working" is documented here.

Still unsure what to do? Ask in the #general channel in the OVN Slack and someone can help you.

Static site development

Our static site is built with Jekyll.

# install dependencies, then build and serve the site locally
bundle install
bundle exec jekyll serve --baseurl=""

docs-archived's People

Contributors

jcstine, maridob, nsouthernlf, kldank, pmotch, lens-fitzgerald, fmfiterate, dadahl, slushpupie, jim1151, nkmyers0794-zz, elizabeth-robins, rogerkibbe, omitfo2, davidattwater, johniwasz


docs-archived's Issues

AWG CHECK IN

Please leave a comment containing your name. When enough of the AWG working group members have checked in, we will start using GitHub to keep track of our issues, tasks, etc.

Outside expertise contacts

Jonathan Eisenzopf (Discourse.ai) Oct 2nd at 1:36 PM
On some of the issues that are being documented above, as you start getting into solutioning, I'd recommend inviting one or more PhD-level Computational Linguists and VUI Designers. These issues are easy for people but non-trivial for a system to resolve. The NLP libraries out there will not work for you, so I would not recommend taking them on without some outside expertise. Wally Brill, James Giangola, Peter Krogh, Dan Padgett, Jonathan Bloom, or Roberto Pieraccini at Google are all excellent. Lisa Falkson at Amazon is great. There are many other great options at other companies, like Susan Hura, Leslie Walker, and Nandini Stocker, who may be willing to lend their expertise. Let me know if you'd like me to reach out to any of these people for advice. @jim Larson (SpeechTEK) you probably know all of these people too.

VRS: an encompassing solution architecture and value propositions

What is the encompassing solution architecture of a voice registry system? How does a VRS interact with proprietary platforms (Alexa, Google, Bixby, Magenta, etc.) and independent agents (as developed with Soundhound, Rasa, etc.)? What will be the VRS value propositions for 1) individuals, 2) enterprise brand owners, 3) enterprise developers, 4) independent agent developers and designers, and 5) platforms?

some possible activities that we should consider doing.

Here are some possible activities that we should consider doing. I have organized them under the 5 areas of research from Jon’s OVN presentations (I am compiling a list of additional possible activities suggested by the developer-experience community that will be available in a couple of weeks.)

  1. Destination Registry
    1.1. Name: Approaches for Voice Registry
    Motivation: Need to determine the best approach for developing a voice registry
    Description: Analyze the pros and cons of developing a Voice Registry which leverages DNS vs developing an entirely new Voice Registry which is independent of DNS.

  2. Voice Commerce Core Processes
    2.1. Pay by voice
    Motivation: Accelerate people's purchasing process through voice payments using the user's voice print and biometrics.
    Description: Specify the vocabulary for a frequently-used business activity by constructing and deploying a skill (printing a portion of a conversation). An example approach is outlined in ETSI ES 202 076 V2.1.1 (2009-06), Human Factors (HF); User Interfaces; Generic spoken command vocabulary for ICT devices and services (telephone commands in five languages): https://www.etsi.org/deliver/etsi_es/202000_202099/202076/02.01.01_50/es_202076v020101m.pdf

  3. Identification and Authentication
    (see 2.1 pay by voice)

  4. Data Privacy
    4.1. Name: Evaluate Opal Approach to Privacy
    Motivation: Need to specify a data privacy policy and enforcement mechanism
    Description: Need to specify a data privacy policy for speech applications. Evaluate the Almond approach to privacy and determine if it is useful in our work. Reference: Almond, the Open, Privacy-Preserving Virtual Assistant: https://oval.cs.stanford.edu/

  5. Interoperability
    5.1. Name: Vendor Independent language for writing speech applications
    Status: Proposed
    Motivation: Developers waste time and effort developing applications twice, once for Alexa and once for Google.
    Description: Specify a voice application in a vendor-independent manner that can be transformed into working Alexa and Google voice applications. Recommend whether we should develop a vendor-independent language for writing speech applications that can be translated to execute on both the Alexa and Google platforms.
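To make the idea concrete, here is a minimal, purely hypothetical sketch of what a vendor-independent intent definition and per-platform transforms might look like. The data shapes and function names are illustrative assumptions, not part of any OVON specification or any vendor's actual schema.

```python
# Hypothetical sketch: one vendor-independent intent definition, transformed
# into two platform-flavored shapes. All structures here are illustrative.

def to_alexa(intent: dict) -> dict:
    """Render a generic intent in an Alexa-style interaction-model shape."""
    return {
        "name": intent["name"],
        "samples": intent["utterances"],
    }

def to_google(intent: dict) -> dict:
    """Render the same intent in a Dialogflow-style shape."""
    return {
        "displayName": intent["name"],
        "trainingPhrases": [{"parts": [{"text": u}]} for u in intent["utterances"]],
    }

# The single source of truth: write the intent once, generate both targets.
check_stock = {
    "name": "CheckStock",
    "utterances": ["is the ps5 in stock", "do you have a ps5"],
}

print(to_alexa(check_stock)["samples"][0])   # is the ps5 in stock
```

A real translator would also have to map slots/entities, built-in intents, and platform-specific capabilities, which is where the hard design work would be.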

  6. Other
    6.1. Name: next generation Voice app platform
    Motivation: Make new functions and features available to users via enhanced platform (that is also vendor-independent).
    Description: Extend the design work by Dan and Maria in their “master plan” to include multimodal functionality, emotion detection and generation, and other new and useful features and functionality. This could be step 1 of developing a next-gen voice app platform.

VRS scenario discussion raised by Jonathan

When an invocation is made, how does the voice assistant know what to pass to VRS?
Here is an example that goes with the question. "Hey Google, talk to a man about a dog".
In this case, what does Google pass to VRS?
Even more pragmatically: "Alexa, ask Target if they have a pee es five in stock". What does Alexa pass to VRS?
One final example, "Hey Google, talk to Target, the department store".

cc: @eisenzopf
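To illustrate the open question, here is a deliberately naive, hypothetical sketch of how an assistant might carve an utterance into a fragment to forward to VRS. The wake words, dispatch phrases, and function name are all illustrative assumptions; the point is that simple prefix-stripping does not tell you where the invocation name ends and the request begins.

```python
# Hypothetical sketch: what fragment might the assistant forward to VRS?
# The parsing below is deliberately naive and purely illustrative.

DISPATCH_PHRASES = ("talk to", "ask", "open")

def extract_vrs_query(utterance: str) -> str:
    """Strip the wake word and dispatch phrase; return the remaining text."""
    text = utterance.lower()
    for wake in ("hey google,", "alexa,"):
        if text.startswith(wake):
            text = text[len(wake):].strip()
    for phrase in DISPATCH_PHRASES:
        if text.startswith(phrase + " "):
            return text[len(phrase) + 1:]
    return text

print(extract_vrs_query("Alexa, ask Target if they have a pee es five in stock"))
# target if they have a pee es five in stock
# ...which leaves the core problem unsolved: is the name "target", or
# "target if they", or something else entirely ("a man about a dog")?
```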

Category

  • Understand the need for the category.
  • Should we have a category as an augmentation for better identification?

how should discovery work?

Users can invoke a google-like search for voice app capabilities. The user specifies his/her needs, and the system responds with a prioritized list of voice applications and their skills. Users with displays will see voice application descriptions in a format similar to the app stores or online marketplaces for mobile devices.
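As a toy illustration of that search-and-rank idea, the sketch below matches a stated need against a capability catalog and returns a prioritized list. The catalog contents, scoring, and field names are illustrative assumptions only.

```python
# Hypothetical sketch of capability discovery: rank voice applications by how
# well their declared capabilities overlap with the user's stated need.
# The catalog and scoring scheme are illustrative, not a proposed design.

CATALOG = [
    {"name": "Target", "capabilities": {"shopping", "hours"}},
    {"name": "Delta Dental", "capabilities": {"insurance", "claims"}},
    {"name": "Delta", "capabilities": {"flights", "check-in"}},
]

def discover(need: str) -> list:
    """Return app names ranked by capability overlap with the need."""
    words = set(need.lower().split())
    scored = [(len(words & app["capabilities"]), app["name"]) for app in CATALOG]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

print(discover("flights"))  # ['Delta']
```

A production system would need semantic matching rather than word overlap, plus the description/ratings metadata mentioned above for display clients.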

Alternative names scenario

PROBLEM
Discuss the solution options for how VRS can help solve the alternative-names scenarios (disambiguation #56 #64).
Example: "Tarjey" = "Target"; "Delta" = "Delta Dental"

This is different from mispronunciation.

DOD

  • Identify the possible options for handling alternative names
  • Make a recommendation on the best path forward
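One option worth evaluating can be sketched in a few lines: each VRS entry carries registered aliases alongside its canonical name, and resolution checks both. The entries and helper below are illustrative assumptions, not a proposed VRS schema.

```python
# Hypothetical sketch of alias-based resolution for the examples above
# ("Tarjey" -> "Target", "Delta" -> "Delta Dental"). Illustrative only.

VRS_ENTRIES = {
    "target": {"aliases": {"tarjey"}},
    "delta dental": {"aliases": {"delta"}},
}

def resolve(spoken: str) -> list:
    """Return canonical names whose name or registered aliases match."""
    spoken = spoken.lower()
    return [
        name for name, entry in VRS_ENTRIES.items()
        if spoken == name or spoken in entry["aliases"]
    ]

print(resolve("tarjey"))  # ['target']
print(resolve("delta"))   # ['delta dental']
```

Note this differs from handling mispronunciation: aliases are deliberate alternative names someone must register, which raises its own questions (who may claim "Delta"?).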

Rename Dialog Broker

Members of the Steering Committee, and the chair, have noted that Dialog Broker should be renamed. A few things:

  • Dialog Manager, Dialog Broker, etc -- too many common sounding names
  • Suggestion was "Intent Broker" -- more specific and explicit

VRS integration with other components

PROBLEM:
Identify the other components that VRS is going to interact with

DOD (Definition of Done)

  • Add scenarios and examples in the VRS markdown file
  • Identify the flow and the responsibilities of the VRS and the components it integrates with
  • Design the interfaces (API contracts) between components
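As a starting point for that contract-design work, here is a minimal sketch of request/response types a dialog manager might exchange with the VRS. Every field name and the stub resolver are illustrative assumptions, not an agreed API.

```python
# Hypothetical sketch of a dialog-manager <-> VRS contract. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class VRSRequest:
    utterance_fragment: str                      # candidate invocation name
    locale: str = "en-US"
    context: dict = field(default_factory=dict)  # e.g. device type, location

@dataclass
class VRSResponse:
    matches: list                # ranked canonical names
    ambiguous: bool = False      # True when disambiguation is needed

def lookup(request: VRSRequest) -> VRSResponse:
    """Stub resolver: a real VRS would consult its registry here."""
    registry = {"target": "Target", "delta dental": "Delta Dental"}
    hits = [v for k, v in registry.items()
            if request.utterance_fragment.lower() in k]
    return VRSResponse(matches=hits, ambiguous=len(hits) > 1)

print(lookup(VRSRequest("Target")).matches)  # ['Target']
```

Writing the contract down this way forces the open questions into the open: what context fields are allowed, and who resolves ambiguity, the VRS or the caller?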

AWG Tactic: Explore-research the approach and technology of Jovo, the Berlin-based open source voice framework.

Jovo is widely regarded as the leading open voice platform available today. The value proposition is simple: build one voice experience on Jovo, and it will work across multiple platforms, including Alexa, Google Assistant, Samsung Bixby, and more. Jan Konig is the founder and CEO, and an early supporter of the OVN concept; he can be reached at [email protected]. The Jovo web site is here: https://www.jovo.tech/.

Multimodal dialog manager

While speech is a useful mode for users interacting with computers, speech interfaces are greatly enhanced by additional modes. The modes available to users depend on the physical capabilities of the client. Desktops, laptops, phones, cars, home appliances, and wearables will support a variety of input and output modes. North Star should support a variety of input modes in addition to speech, including tactile (keyboard, pen, mouse, joystick) and visual (scanner, still camera, and video camera). It will also support a variety of output modes in addition to speech, including visual display (text, photographs, videos) and tactile output.

The dialog manager will collect input from users via various input modes, convert that input to internal formats, and integrate the information into requests and commands for processing by the dialog broker.

The dialog manager will also convert information to be presented to the user to formats required by available hardware output components. For example, if the client has a display, the information could be displayed as a photo, or if no display is available, the information could be spoken as a verbal description of the photo.
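The photo-versus-spoken-description behavior described above can be sketched as a small capability check. The capability names, content fields, and render logic here are illustrative assumptions, not a proposed design.

```python
# Hypothetical sketch of output-mode selection: the dialog manager picks a
# rendering based on the client's output capabilities. Illustrative only.

def render(content: dict, client_modes: set) -> dict:
    """Choose how to present content given the client's output modes."""
    if "display" in client_modes and "image_url" in content:
        return {"mode": "display", "payload": content["image_url"]}
    # Fall back to a spoken description when no display is available.
    return {"mode": "speech", "payload": content["description"]}

photo = {"image_url": "https://example.com/photo.jpg",
         "description": "A photo of a red bicycle."}

print(render(photo, {"display", "speech"})["mode"])  # display
print(render(photo, {"speech"})["payload"])          # A photo of a red bicycle.
```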

Term "platform" in vocabulary

Platform: The collection of components (the environment) needed to execute a voice application. Examples of platforms include the Amazon and Google platforms that execute voice applications.

Dialog Broker vs. Dialog Manager

As I was reviewing the Technical Masterplan, I noticed that I had to re-read and think through the difference between Dialog Broker and Dialog Manager a few times in order to understand the concepts. I think it would be helpful to update the current graphics, or create a new one entirely, that provides a more "non-technical" view of what these concepts are and how they are meant to function in the overall VRS model. Thoughts on this, or is anyone else experiencing the same hurdle?

For example:

When I think of Dialog Broker I think of an Alexa Skill, Google Action, etc. where intents have been stored for a given voice application. Am I on the right track with this?

When I think of Dialog Manager I think of the different voice assistant services like Alexa, Google Assistant, Siri, etc. Am I on the right track with this?

Any feedback or thoughts would be helpful!

A strategy for privacy in speech applications

Object (speech application) owners may specify access and privacy constraints ranging from high-level to fine-grained. Object owners can specify constraints involving the following (borrowed from Stanford's Opal system):

  1. Names of one or more users (human and/or software agents) who may request that a skill be performed.
  2. Collection of skills which the user(s) may perform.
  3. Under what conditions the skills may be performed, for example, “do not send invoices before the end of the month.”

If a user is denied access to an object, an error message will be presented to the user describing why the request was denied and what to do or who to contact to gain access. For example, “It is too early in the month to send the monthly invoice, send invoice at the end of the month.”
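The three constraint types above (who, which skills, under what conditions) plus the explanatory denial message can be sketched as a simple policy check. The policy contents, field names, and date threshold are illustrative assumptions.

```python
# Hypothetical sketch of an Opal-style policy check, covering the three
# constraint types described above. All names and rules are illustrative.
import datetime

POLICY = {
    "allowed_users": {"alice", "billing-bot"},
    "allowed_skills": {"send_invoice", "read_balance"},
    # Condition: do not send invoices before the 25th of the month.
    "conditions": {"send_invoice": lambda today: today.day >= 25},
}

def authorize(user: str, skill: str, today: datetime.date) -> str:
    """Check user, skill, and condition constraints; explain any denial."""
    if user not in POLICY["allowed_users"]:
        return "Denied: you are not authorized. Contact the object owner for access."
    if skill not in POLICY["allowed_skills"]:
        return f"Denied: '{skill}' is not permitted on this object."
    condition = POLICY["conditions"].get(skill)
    if condition and not condition(today):
        return "It is too early in the month to send the monthly invoice."
    return "Allowed"

print(authorize("alice", "send_invoice", datetime.date(2020, 6, 10)))
# It is too early in the month to send the monthly invoice.
```

The key design point is that every denial path returns an actionable explanation, matching the error-message requirement above.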

Conversation platforms vs voice apps

Looking at the VRS document, it occurs to me that we may adhere too much to the tech platforms' offerings of conversational platform services. In the current models drawn in the VRS docs, we don't account for the fact that non-tech platforms like Volkswagen or US Bank will offer conversational services straight to users through their own endpoints: a Volkswagen car or the US Bank mobile app. In a way, they are their own conversational platforms as well as apps. And Volkswagen will probably "pipe" their assistant through Alexa, GA, and the like. Our current approach does not account for this.

Especially in the Volkswagen context, it is not unthinkable that various assistants will be available at some point. How will a VRS solution work then?

Use VRS for determining invocation app for each platform

Problem #1: As a Technical Resource, I want to use the VRS so that I can determine the invocation name for the voice applications on each conversational platform following their guidelines and clearly knowing the restrictions.

Purpose: Search for what to name the voice application.

(edited based on Meeting 09.02.2020)

Submitted by Mark.Tucker

Presentation component in architecture

Many voice applications will have a visual component, especially to present graphics, videos, and illustrations to users. Insert a box labeled “presentation” next to the TTS/STT box to represent a new component that displays visual content to the user.

Add LICENSE file in each OVON git repo

Not every OVON repo has a LICENSE file, and where a license file is included (in the Website folder), the license type (MIT) doesn't match the license type in the Technical Charter (Apache).

There is hardly an explicit invocation in our personal and messy world

In the VRS work and documents, we determined that people will use explicit invocations to activate assistants or apps. Entities believe their customers will call them by name. Yet this seems to be a very top-down approach. People refer to entities or services in a multitude of ways and with a multitude of meanings. Key is that "my" is not included in the current explicit-invocation thinking: for example, "My Albert Hein" is the new one, where the old one is the same distance from my house. I also refer to Target said with a French accent: "Tarrjay".

Voice is the most contextualized and personalized channel ever. That makes it very messy for a centralized approach. With a central approach, we will miss a view of user needs that fits the channel. At its core is how the user gets the best possible experience.

VRS: how will the "discoverability" of VRS be developed, optimized?

This is an issue of technology and process/marketing questions. Is it possible that standardized VRS components are made available and attached to independent conversational agents? How will VRS components interact with existing voice application platforms? How might an OVN develop and drive awareness of VRS, and with whom, through whom?

Master Plan suggested addition/revisions -- vertical industry enterprise "problems"

Recommended for the Master Plan: inclusion of "problems" -- envisioned and documented from vertical industry enterprise usage points of view -- that OVN proposed standards must enable and/or resolve. Included as of today (2020.05.26) is an example for the commerce industry, specifically retail. Recommended prior to the final definition of activities or projects: industry SME development of Tech Comm-reviewed/approved problems for health & life sciences, financial services, transportation, and media verticals. Also needed, in time: problems for smart and connected cities, public safety, education. Purpose of the problems: to a) envision aspirational enterprise usage of voice assistance, with under-girding assumptions of technical progress, and b) establish usage benchmarks against which the OVN will mind-test proposed standards.

1.0.6.2 Who is the decision maker on whether a user's utterance is an explicit or implicit invocation?

PROBLEM

  • Discuss and debate the options for ownership of the implicit-vs-explicit decision.

DOD

  • Define explicit and implicit terminology.
  • Write up the options for the path forward.
  • Identify the pros and cons of all options.
  • Make a recommended decision.

https://github.com/open-voice-network/docs/blob/210d3b28e8a407c0520c611933c28cf286e43cb8/components/voice_registry_system.md#L158-L158

Link: #102
