open-voice-interoperability / docs-archived

A repo containing OVON published and work-in-progress (WIP) artifacts, docs, etc. A beta site is available on github.io.

Home Page: https://open-voice-network.github.io/docs

License: Apache License 2.0


docs-archived's Introduction

Open Voice Network (OVON) document repo

This is the document repo for the Open Voice Network. This repository contains the published and work-in-progress artifacts, docs, schemas, technical committee working group meeting minutes, etc., of the OVON.

A static version of this repo is hosted at https://open-voice-network.github.io/docs.

OVON way of working

Our "way of working" is documented here.

Still unsure what to do? Ask in the #general channel in the OVN Slack and someone can help you.

Static site development

Our static site is built with Jekyll.

# install dependencies, then build and serve the site locally
bundle install
bundle exec jekyll serve --baseurl=""

docs-archived's People

Contributors

jcstine, maridob, nsouthernlf, kldank, pmotch, lens-fitzgerald, fmfiterate, dadahl, slushpupie, jim1151, nkmyers0794-zz, elizabeth-robins, rogerkibbe, omitfo2, davidattwater, johniwasz


docs-archived's Issues

AWG CHECK IN

Please leave a comment containing your name. When enough of the AWG working group members have checked in, we will start using GitHub to keep track of our issues, tasks, etc.

Outside expertise contacts

Jonathan Eisenzopf (Discourse.ai) Oct 2nd at 1:36 PM
On some of the issues that are being documented above, as you start getting into solutioning, I'd recommend inviting one or more PhD-level Computational Linguists and VUI Designers. These issues are easy for people but non-trivial for a system to resolve. The NLP libraries out there will not work for you, so I would not recommend taking them on without some outside expertise. Wally Brill, James Giangola, Peter Krogh, Dan Padgett, Jonathan Bloom, or Roberto Pieraccini at Google are all excellent. Lisa Falkson at Amazon is great. There are many other great options at other companies, like Susan Hura, Leslie Walker, and Nandini Stocker, who may be willing to lend their expertise. Let me know if you'd like me to reach out to any of these people for advice. @jim Larson (SpeechTEK) you probably know all of these people too.

VRS: an encompassing solution architecture and value propositions

What is the encompassing solution architecture of a voice registry system? How does a VRS interact with proprietary platforms (Alexa, Google, Bixby, Magenta, etc.) and independent agents (as developed with Soundhound, Rasa, etc.)? What will be the VRS value propositions for 1) individuals, 2) enterprise brand owners, 3) enterprise developers, 4) independent agent developers and designers, and 5) platforms?

some possible activities that we should consider doing.

Here are some possible activities that we should consider doing. I have organized them under the 5 areas of research from Jon’s OVN presentations (I am compiling a list of additional possible activities suggested by the developer-experience community that will be available in a couple of weeks.)

  1. Destination Registry
    1.1. Name: Approaches for Voice Registry
    Motivation: Need to determine the best approach for developing a voice registry
    Description: Analyze the pros and cons of developing a Voice Registry which leverages DNS vs developing an entirely new Voice Registry which is independent of DNS.

  2. Voice Commerce Core Processes
    2.1. Pay by voice
    Motivation: Accelerate people's purchasing process through voice payments using the user's voice print and biometrics.
    Description: Specify the vocabulary for a frequently-used business activity by constructing and deploying a skill (printing a portion of a conversation). An example approach is outlined in ETSI ES 202 076 V2.1.1 (2009-06), Human Factors (HF); User Interfaces; Generic spoken command vocabulary for ICT devices and services (telephone commands in five languages): https://www.etsi.org/deliver/etsi_es/202000_202099/202076/02.01.01_50/es_202076v020101m.pdf

  3. Identification and Authentication
    (see 2.1 pay by voice)

  4. Data Privacy
    4.1. Name: Evaluate Opal Approach to Privacy
    Motivation: Need to specify a data privacy policy and enforcement mechanism
    Description: Need to specify a data privacy policy for speech applications. Evaluate the Almond approach to privacy and determine if it is useful in our work. Reference: Almond, the Open, Privacy-Preserving Virtual Assistant: https://oval.cs.stanford.edu/

  5. Interoperability
    5.1. Name: Vendor Independent language for writing speech applications
    Status: Proposed
    Motivation: Developers waste time and effort developing applications twice, once for Alexa and once for Google.
    Description: Specify a voice application in a vendor-independent manner that can be transformed into working Alexa and Google voice applications. Recommend whether we should develop a vendor-independent language for writing speech applications that can be translated to execute on both the Alexa and Google platforms.
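To make the idea concrete, here is a minimal, purely hypothetical sketch of what a vendor-independent intent definition and per-platform transforms might look like. The data shapes and function names are illustrative assumptions, not part of any OVON specification or any vendor's actual schema.

```python
# Hypothetical sketch: one vendor-independent intent definition, transformed
# into two platform-flavored shapes. All structures here are illustrative.

def to_alexa(intent: dict) -> dict:
    """Render a generic intent in an Alexa-style interaction-model shape."""
    return {
        "name": intent["name"],
        "samples": intent["utterances"],
    }

def to_google(intent: dict) -> dict:
    """Render the same intent in a Dialogflow-style shape."""
    return {
        "displayName": intent["name"],
        "trainingPhrases": [{"parts": [{"text": u}]} for u in intent["utterances"]],
    }

# The single source of truth: write the intent once, generate both targets.
check_stock = {
    "name": "CheckStock",
    "utterances": ["is the ps5 in stock", "do you have a ps5"],
}

print(to_alexa(check_stock)["samples"][0])   # is the ps5 in stock
```

A real translator would also have to map slots/entities, built-in intents, and platform-specific capabilities, which is where the hard design work would be.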

  6. Other
    6.1. Name: next generation Voice app platform
    Motivation: Make new functions and features available to users via enhanced platform (that is also vendor-independent).
    Description: Extend the design work by Dan and Maria in their “master plan” to include multimodal functionality, emotion detection and generation, and other new and useful features and functionality. This could be step 1 of developing a next-gen voice app platform.

VRS scenario discussion raised by Jonathan

When an invocation is made, how does the voice assistant know what to pass to VRS?
Here is an example that goes with the question. "Hey Google, talk to a man about a dog".
In this case, what does Google pass to VRS?
Even more pragmatically: "Alexa, ask Target if they have a pee es five in stock". What does Alexa pass to VRS?
One final example, "Hey Google, talk to Target, the department store".

cc: @eisenzopf
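To illustrate the open question, here is a deliberately naive, hypothetical sketch of how an assistant might carve an utterance into a fragment to forward to VRS. The wake words, dispatch phrases, and function name are all illustrative assumptions; the point is that simple prefix-stripping does not tell you where the invocation name ends and the request begins.

```python
# Hypothetical sketch: what fragment might the assistant forward to VRS?
# The parsing below is deliberately naive and purely illustrative.

DISPATCH_PHRASES = ("talk to", "ask", "open")

def extract_vrs_query(utterance: str) -> str:
    """Strip the wake word and dispatch phrase; return the remaining text."""
    text = utterance.lower()
    for wake in ("hey google,", "alexa,"):
        if text.startswith(wake):
            text = text[len(wake):].strip()
    for phrase in DISPATCH_PHRASES:
        if text.startswith(phrase + " "):
            return text[len(phrase) + 1:]
    return text

print(extract_vrs_query("Alexa, ask Target if they have a pee es five in stock"))
# target if they have a pee es five in stock
# ...which leaves the core problem unsolved: is the name "target", or
# "target if they", or something else entirely ("a man about a dog")?
```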

Category

  • Understand the need for the category.
  • Should we have a category as an augmentation for better identification?

how should discovery work?

Users can invoke a google-like search for voice app capabilities. The user specifies his/her needs, and the system responds with a prioritized list of voice applications and their skills. Users with displays will see voice application descriptions in a format similar to the app stores or online marketplaces for mobile devices.
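As a toy illustration of that search-and-rank idea, the sketch below matches a stated need against a capability catalog and returns a prioritized list. The catalog contents, scoring, and field names are illustrative assumptions only.

```python
# Hypothetical sketch of capability discovery: rank voice applications by how
# well their declared capabilities overlap with the user's stated need.
# The catalog and scoring scheme are illustrative, not a proposed design.

CATALOG = [
    {"name": "Target", "capabilities": {"shopping", "hours"}},
    {"name": "Delta Dental", "capabilities": {"insurance", "claims"}},
    {"name": "Delta", "capabilities": {"flights", "check-in"}},
]

def discover(need: str) -> list:
    """Return app names ranked by capability overlap with the need."""
    words = set(need.lower().split())
    scored = [(len(words & app["capabilities"]), app["name"]) for app in CATALOG]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

print(discover("flights"))  # ['Delta']
```

A production system would need semantic matching rather than word overlap, plus the description/ratings metadata mentioned above for display clients.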

Alternative names scenario

PROBLEM
Discuss the solution options for how VRS can help solve the alternative-names scenarios (disambiguation #56 #64).
Example: "Tarjey" = "Target"; "Delta" = "Delta Dental"

This is different from mispronunciation.

DOD

  • Identify the possible options for handling alternative names
  • Make a recommendation on the best path forward
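One option worth evaluating can be sketched in a few lines: each VRS entry carries registered aliases alongside its canonical name, and resolution checks both. The entries and helper below are illustrative assumptions, not a proposed VRS schema.

```python
# Hypothetical sketch of alias-based resolution for the examples above
# ("Tarjey" -> "Target", "Delta" -> "Delta Dental"). Illustrative only.

VRS_ENTRIES = {
    "target": {"aliases": {"tarjey"}},
    "delta dental": {"aliases": {"delta"}},
}

def resolve(spoken: str) -> list:
    """Return canonical names whose name or registered aliases match."""
    spoken = spoken.lower()
    return [
        name for name, entry in VRS_ENTRIES.items()
        if spoken == name or spoken in entry["aliases"]
    ]

print(resolve("tarjey"))  # ['target']
print(resolve("delta"))   # ['delta dental']
```

Note this differs from handling mispronunciation: aliases are deliberate alternative names someone must register, which raises its own questions (who may claim "Delta"?).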

Rename Dialog Broker

Members of the Steering Committee, and the chair, have noted that Dialog Broker should be renamed. A few things:

  • Dialog Manager, Dialog Broker, etc -- too many common sounding names
  • Suggestion was "Intent Broker" -- more specific and explicit

VRS integration with other components

PROBLEM:
Identify the other components that VRS is going to interact with

DOD (Definition of Done)

  • Add scenarios and examples in the VRS markdown file
  • Identify the flow and the responsibilities of the VRS and the components it integrates with
  • Design the interfaces (API contracts) between components
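As a starting point for that contract-design work, here is a minimal sketch of request/response types a dialog manager might exchange with the VRS. Every field name and the stub resolver are illustrative assumptions, not an agreed API.

```python
# Hypothetical sketch of a dialog-manager <-> VRS contract. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class VRSRequest:
    utterance_fragment: str                      # candidate invocation name
    locale: str = "en-US"
    context: dict = field(default_factory=dict)  # e.g. device type, location

@dataclass
class VRSResponse:
    matches: list                # ranked canonical names
    ambiguous: bool = False      # True when disambiguation is needed

def lookup(request: VRSRequest) -> VRSResponse:
    """Stub resolver: a real VRS would consult its registry here."""
    registry = {"target": "Target", "delta dental": "Delta Dental"}
    hits = [v for k, v in registry.items()
            if request.utterance_fragment.lower() in k]
    return VRSResponse(matches=hits, ambiguous=len(hits) > 1)

print(lookup(VRSRequest("Target")).matches)  # ['Target']
```

Writing the contract down this way forces the open questions into the open: what context fields are allowed, and who resolves ambiguity, the VRS or the caller?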

AWG Tactic: Explore-research the approach and technology of Jovo, the Berlin-based open source voice framework.

Jovo is widely regarded as the leading open voice platform available today. The value proposition is simple: build one voice experience on Jovo, and it will work across multiple platforms, including Alexa, Google Assistant, Samsung Bixby, and more. Jan Konig is the founder and CEO, and an early supporter of the OVN concept; he can be reached at [email protected]. The Jovo web site is here: https://www.jovo.tech/.

Multimodal dialog manager

While speech is a useful mode for users interacting with computers, speech interfaces are greatly enhanced by additional modes. The modes available to users depend on the physical capabilities of the client. Desktops, laptops, phones, cars, home appliances, and wearables will support a variety of input and output modes. North Star should support a variety of input modes in addition to speech, including tactile (keyboard, pen, mouse, joystick) and visual (scanner, still camera, and video camera). It will also support a variety of output modes in addition to speech, including visual display (text, photographs, videos) and tactile output.

The dialog manager will collect input from users via various input modes, convert that input to internal formats, and integrate the information into requests and commands for processing by the dialog broker.

The dialog manager will also convert information to be presented to the user to formats required by available hardware output components. For example, if the client has a display, the information could be displayed as a photo, or if no display is available, the information could be spoken as a verbal description of the photo.
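The photo-versus-spoken-description behavior described above can be sketched as a small capability check. The capability names, content fields, and render logic here are illustrative assumptions, not a proposed design.

```python
# Hypothetical sketch of output-mode selection: the dialog manager picks a
# rendering based on the client's output capabilities. Illustrative only.

def render(content: dict, client_modes: set) -> dict:
    """Choose how to present content given the client's output modes."""
    if "display" in client_modes and "image_url" in content:
        return {"mode": "display", "payload": content["image_url"]}
    # Fall back to a spoken description when no display is available.
    return {"mode": "speech", "payload": content["description"]}

photo = {"image_url": "https://example.com/photo.jpg",
         "description": "A photo of a red bicycle."}

print(render(photo, {"display", "speech"})["mode"])  # display
print(render(photo, {"speech"})["payload"])          # A photo of a red bicycle.
```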

Term "platform" in vocabulary

Platform: The collection of components (the environment) needed to execute a voice application. Examples of platforms include the Amazon and Google platforms that execute voice applications.

Dialog Broker vs. Dialog Manager

As I was reviewing the Technical Masterplan, I noticed that I had to re-read and think through the difference between Dialog Broker and Dialog Manager a few times in order to understand the concepts. I think it would be helpful to update the current graphics, or create a new one entirely, that provides a more "non-technical" view of what these concepts are and how they are meant to function in the overall VRS model. Thoughts on this, or is anyone else experiencing the same hurdle?

For example:

When I think of Dialog Broker I think of an Alexa Skill, Google Action, etc. where intents have been stored for a given voice application. Am I on the right track with this?

When I think of Dialog Manager I think of the different voice assistant services like Alexa, Google Assistant, Siri, etc. Am I on the right track with this?

Any feedback or thoughts would be helpful!

A strategy for privacy in speech applications

Object (speech application) owners may specify access and privacy constraints ranging from high-level to fine-grained. Object owners can specify constraints involving the following (borrowed from Stanford's Opal system):

  1. Names of one or more users (human and/or software agents) who may request that a skill be performed.
  2. Collection of skills which the user(s) may perform.
  3. Under what conditions the skills may be performed, for example, “do not send invoices before the end of the month.”

If a user is denied access to an object, an error message will be presented to the user describing why the request was denied and what to do or who to contact to gain access. For example, “It is too early in the month to send the monthly invoice, send invoice at the end of the month.”
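The three constraint types above (who, which skills, under what conditions) plus the explanatory denial message can be sketched as a simple policy check. The policy contents, field names, and date threshold are illustrative assumptions.

```python
# Hypothetical sketch of an Opal-style policy check, covering the three
# constraint types described above. All names and rules are illustrative.
import datetime

POLICY = {
    "allowed_users": {"alice", "billing-bot"},
    "allowed_skills": {"send_invoice", "read_balance"},
    # Condition: do not send invoices before the 25th of the month.
    "conditions": {"send_invoice": lambda today: today.day >= 25},
}

def authorize(user: str, skill: str, today: datetime.date) -> str:
    """Check user, skill, and condition constraints; explain any denial."""
    if user not in POLICY["allowed_users"]:
        return "Denied: you are not authorized. Contact the object owner for access."
    if skill not in POLICY["allowed_skills"]:
        return f"Denied: '{skill}' is not permitted on this object."
    condition = POLICY["conditions"].get(skill)
    if condition and not condition(today):
        return "It is too early in the month to send the monthly invoice."
    return "Allowed"

print(authorize("alice", "send_invoice", datetime.date(2020, 6, 10)))
# It is too early in the month to send the monthly invoice.
```

The key design point is that every denial path returns an actionable explanation, matching the error-message requirement above.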

Conversation platforms vs voice apps

Looking at the VRS document, it occurs to me that we may adhere too much to the tech platforms' offerings of conversational platform services. In the current models drawn in the VRS docs, we don't account for the fact that non-tech platforms like Volkswagen or US Bank will offer conversational services straight to users through their own endpoints: a Volkswagen car or the US Bank mobile app. In a way, they are their own conversational platforms as well as apps. And Volkswagen will probably "pipe" their assistant through Alexa, GA, and the like. Our current approach does not account for this.

Especially in the Volkswagen context, it is not unthinkable that various assistants will be available at some point. How will a VRS solution work then?

Use VRS for determining invocation app for each platform

Problem #1: As a Technical Resource, I want to use the VRS so that I can determine the invocation name for the voice applications on each conversational platform following their guidelines and clearly knowing the restrictions.

Purpose: Search for what to name the voice application.

(edited based on Meeting 09.02.2020)

Submitted by Mark.Tucker

Presentation component in architecture

Many voice applications will have a visual component, especially to present graphics, videos, and illustrations to users. Insert a box labeled “presentation” next to the TTS/STT box to represent a new component that displays visual content to the user.

Add LICENSE file in each OVON git repo

Not every OVON repo has a LICENSE file, and where a license file is included (in the Website folder), the license type (MIT) doesn't match the license type in the Technical Charter (Apache).

There is hardly an explicit invocation in our personal and messy world

In the VRS work and documents, we determined that people will use explicit invocations to activate assistants or apps. Entities believe their customers will call them by name. Yet this seems to be a very top-down approach. People refer to entities or services in a multitude of ways and with a multitude of meanings. Key is that "my" is not included in the current explicit-invocation thinking: for example, "My Albert Hein" is the new one, where the old one is the same distance from my house. I also refer to Target said with a French accent: "Tarrjay".

Voice is the most contextualized and personalized channel ever. That makes it very messy for a centralized approach. With a central approach, we will miss a view of user needs that fits the channel. At its core is how the user gets the best possible experience.

VRS: how will the "discoverability" of VRS be developed, optimized?

This is an issue of technology and process/marketing questions. Is it possible that standardized VRS components are made available and attached to independent conversational agents? How will VRS components interact with existing voice application platforms? How might an OVN develop and drive awareness of VRS, and with whom, through whom?

Master Plan suggested addition/revisions -- vertical industry enterprise "problems"

Recommended for the Master Plan: inclusion of "problems" -- envisioned and documented from vertical industry enterprise usage points of view -- that OVN proposed standards must enable and/or resolve. Included as of today (2020.05.26) is an example for the commerce industry, specifically retail. Recommended prior to the final definition of activities or projects: industry SME development of Tech Comm-reviewed/approved problems for health & life sciences, financial services, transportation, and media verticals. Also needed, in time: problems for smart and connected cities, public safety, education. Purpose of the problems: to a) envision aspirational enterprise usage of voice assistance, with under-girding assumptions of technical progress, and b) establish usage benchmarks against which the OVN will mind-test proposed standards.

1.0.6.2 Who is the decision maker on whether a user's utterance is an explicit or implicit invocation?

PROBLEM

  • Discuss and debate the options for ownership of the implicit-vs-explicit decision.

DOD

  • Define explicit and implicit terminology.
  • Write up the options for the path forward.
  • Identify the pros and cons of all options.
  • Make a recommended decision.

https://github.com/open-voice-network/docs/blob/210d3b28e8a407c0520c611933c28cf286e43cb8/components/voice_registry_system.md#L158-L158

Link: #102
