
Cognitive Services Tutorial

Cognitive Services Tutorial for AI Immersion Workshop 2017

Goals

The goal of the Cognitive Services track for the AI Immersion Workshop is to give Build attendees the opportunity to build an end-to-end scenario using Cognitive Services and other Microsoft Azure technologies in tandem. The whole is greater than the sum of its parts, and the result is a code base to build on well after the session is over.

Scenario

We're building an end-to-end scenario that allows you to pull in your own pictures, use Cognitive Services to find objects and people in the images, figure out how those people are feeling, and store all of that data in a NoSQL store (DocumentDB). We use that NoSQL store to populate an Azure Search index, and then build a Bot Framework bot using LUIS to allow easy, targeted querying.

We walk through the scenario in detail in the Lab Manual - please start there!

Architecture

We build a simple C# application that allows you to ingest pictures from your local drive, then invoke several different Cognitive Services to gather data on those images:

  • Computer Vision: We use this to grab tags and a description
  • Face: We use this to grab faces and their details from each image
  • Emotion: We use this to pull emotion scores from each face in the image

We'll walk through why each of those APIs is used and the differences between them. Once we have this data, we process it to pull out the details we need and store it all in DocumentDB, our NoSQL PaaS offering.
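The per-image flow described above can be sketched as follows. The tutorial's code is C#; this is an illustrative Python sketch with stubbed service calls, and all function and field names are assumptions, not the workshop's actual code.

```python
# Hypothetical per-image pipeline; the three callables stand in for the real
# Cognitive Services REST calls.

def analyze_image(image_bytes, vision, face, emotion):
    """Run all three Cognitive Services on one image and merge the results."""
    result = {}
    # Computer Vision: overall tags and a description for the whole image
    v = vision(image_bytes)
    result["tags"] = v["tags"]
    result["description"] = v["description"]
    # Face: locate faces and per-face details (age, rectangle, ...)
    faces = face(image_bytes)
    # Emotion: per-face emotion scores, matched to the detected faces
    emotions = emotion(image_bytes)
    result["faces"] = [
        {**f, "scores": e["scores"]} for f, e in zip(faces, emotions)
    ]
    return result

# Stubbed responses stand in for live API calls.
doc = analyze_image(
    b"...",
    vision=lambda _: {"tags": ["cat", "outdoor"], "description": "a cat outside"},
    face=lambda _: [{"faceRectangle": {"left": 10, "top": 20}, "age": 30}],
    emotion=lambda _: [{"scores": {"happiness": 0.9, "sadness": 0.1}}],
)
```

The merged `doc` is what later gets written to DocumentDB.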

Once we have it in DocumentDB, we'll build an Azure Search index on top of it (Azure Search is our PaaS offering for faceted, fault-tolerant search - think Elasticsearch without the management overhead). We'll show you how to query your data, and then build a Bot Framework bot to query it. Finally, we'll extend this bot with LUIS to automatically derive intents from your queries and use them to direct your searches intelligently.

Architecture Diagram

Extra Credit

There is no reason all of this must be controlled from the user's machine. In a real production system, you would want to scale out and throttle your queries to the various services involved, and this is simple to do in Azure. For extra credit, we'll extend our initial application to upload the images to Blob Storage and publish an Azure Service Bus topic message for each one. We use those messages to trigger Azure Function jobs, which query the various Cognitive Services and write their results to DocumentDB. This allows the system to scale out as needed and process images online as soon as they make it into the cloud, even if the client detaches. It also allows multiple clients to upload and process at once.
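The scale-out path can be sketched as a message handler: one topic message per uploaded blob triggers a function that analyzes the image and writes to DocumentDB. This is an illustrative Python sketch with stubbed dependencies; the message shape and names are assumptions.

```python
import json

def handle_message(message_body, analyze, store):
    """Azure Function-style handler: the message carries the blob URL of one image."""
    msg = json.loads(message_body)
    blob_url = msg["blobUrl"]
    metadata = analyze(blob_url)     # would call the Cognitive Services
    metadata["imageUrl"] = blob_url  # remember where the image lives
    store(metadata)                  # would upsert the document into DocumentDB
    return metadata

# Simulate one message arriving from the topic.
written = []
out = handle_message(
    '{"blobUrl": "https://example.blob.core.windows.net/images/cat.jpg"}',
    analyze=lambda url: {"tags": ["cat"]},
    store=written.append,
)
```

Because each message is independent, many function instances can process uploads in parallel, which is where the scale-out comes from.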

More Extra Credit

What's in a face? We've pulled out facial details from all of your images as they were ingested, but can we pull out people? We'll build a client to allow you to look at faces we've detected and label and group them, throwing these tags and groups into the Face API to allow future images to be auto-tagged/grouped as they are ingested. We can then extend the Bot to allow you to query for individuals or groups ("show me pictures of my family taken outside").

Contributors

ayako, carlosp-ms, jennifermarsman, liamca, lpperras, noodlefrenzy, simonpo, yorek

Issues

Build LUIS Model

Build the LUIS model required for pulling out intents and entities from the bot. So far the intents I see are possibly "show me" and maybe "list"(?), with entities being "#", object-type, person-name.

Possible utterances:

  • show me the top 3 pictures of cats
  • list all the pictures taken outside
  • show me pictures of a bird

Feel free to shoot any and all of these full of holes.
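To make the intent/entity split concrete, here is a toy rule-based stand-in for the LUIS model that maps the sample utterances above to the proposed intents and entities. The intent names and entity keys are assumptions for illustration; the real model would be trained in LUIS, not hand-written.

```python
import re

def parse(utterance):
    """Toy intent/entity extraction mirroring the proposed LUIS model."""
    u = utterance.lower()
    if u.startswith("show me"):
        intent = "ShowMe"
    elif u.startswith("list"):
        intent = "List"
    else:
        intent = "None"
    entities = {}
    m = re.search(r"top (\d+)", u)          # the "#" entity
    if m:
        entities["count"] = int(m.group(1))
    m = re.search(r"pictures of (?:a |my )?(\w+)", u)  # the object-type entity
    if m:
        entities["object-type"] = m.group(1)
    return intent, entities
```

Running it over the sample utterances shows where the model needs to disambiguate, e.g. "list all the pictures taken outside" carries no object-type entity at all.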

Finalize preliminary architecture

Yes, I know "finalizing" a preliminary architecture seems odd, but this is just getting everyone on-board, revising, etc. until we're all satisfied. Then converting it into something we can put in the README.

Ingest images

Build code to ingest images from a directory and operate upon them. What "operate" means is TBD until the architecture is finalized.

Finalize Timeline

Once we have a better sense of how long it will take to present the various phases as well as build the code, we should finalize the timeline and do any culling we need to do.

On the cull list (in order, but definitely negotiable):

  1. Anything related to specific face recognition, grouping
  2. Move calls to all but one of the vision cog services APIs into the skeleton code, so they don't need to write API/SDK-calling code more than once.
  3. Move transform from API results into DocDB schema into skeleton code.
  4. Export LUIS model and have them just import it.
  5. The entire Bot.

Build rest of bot dialog

Very vague here, but at least:

  • help text (i.e. they say "help", we say "you can do xxx")
  • really minor chatter/flavor content

Build Skeleton Code: Image ingest -> DocDB

Take full working solution from image ingestion to docdb metadata-insertion and rip out the guts, leaving method stubs and comments. This will be the skeleton framework that folks can build on during the tutorial.

Document Phase 3: Bot

LabManual.md documentation of building the bot, integrating the LUIS model, calling Azure Search and displaying the results.

Document Phase 2: Azure Search

LabManual.md documentation for the queries we'd like to do within Azure Search, as well as how the schema gets created via ingesting DocumentDB and any tuning we do there.

Build Baseline Image Corpus

Gather a corpus of images we can use as a baseline, for people who don't have their own, and put it in a place that's easy for people to ingest (perhaps in this GH repo).

Store Metadata to DocumentDB

Open question - (where) do we store images? If not to DocDB, we should have a separate task for that and this should be updated to store the location of the stored image.

Store all metadata gathered from the cognitive services about the image, as well as filename, type, initial path, upload/processing date/time, time for each API call, total time to process.
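A hypothetical shape for that DocumentDB document, covering the fields listed above; every field name here is an assumption for illustration, not a settled schema.

```python
from datetime import datetime, timezone

def build_document(filename, path, api_results, timings_ms):
    """Assemble one metadata document for DocumentDB."""
    return {
        "id": filename,
        "fileName": filename,
        "fileType": filename.rsplit(".", 1)[-1],
        "initialPath": path,
        "processedAt": datetime.now(timezone.utc).isoformat(),
        "apiTimingsMs": timings_ms,            # time for each API call
        "totalTimeMs": sum(timings_ms.values()),
        **api_results,                         # tags, description, faces, ...
    }

doc = build_document(
    "cat.jpg", "C:/photos/cat.jpg",
    {"tags": ["cat"]},
    {"vision": 120, "face": 200, "emotion": 90},
)
```

If we decide images live in Blob Storage rather than DocumentDB, an `imageUrl` field would be added per the open question above.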

Call Emotion API

Code to call emotion API for each face in each image, and pull out all relevant metadata from the response payload. Should handle errors, as well.
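The response shape below follows the Emotion API's documented format (one entry per detected face, each with a `scores` map); the helper that picks the dominant emotion per face is an illustrative sketch.

```python
import json

# Example response payload: one detected face with its emotion scores.
sample_response = json.loads("""
[
  {"faceRectangle": {"left": 68, "top": 97, "width": 64, "height": 64},
   "scores": {"anger": 0.01, "happiness": 0.92, "neutral": 0.05, "sadness": 0.02}}
]
""")

def dominant_emotions(response):
    """Return the highest-scoring emotion for each face in the response."""
    return [max(face["scores"], key=face["scores"].get) for face in response]
```

Storing the full `scores` map (not just the winner) keeps the door open for later search queries like "happiest pictures".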

Build initial bot code

Build baseline bot code to get the bot up and running, integrating LUIS and taking the response to talk to search (LUIS model and search queries themselves are separate tasks, see #12 and #13 ). Rendering the results is also a separate task, see #15.

Build Presentation: Phase 2

More detailed presentation of how to create an Azure Search index from DocumentDB, how to query it, how faceting works.

Show search results in bot

When you get back one or more images from a search query, render them in the bot framework using the appropriate cards. Intentionally vague at this point.
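One plausible rendering is a hero card per search hit. The attachment `contentType` below is the documented Bot Framework value; the hit fields and helper are illustrative assumptions.

```python
def to_hero_card(hit):
    """Turn one search hit into a Bot Framework hero card attachment."""
    return {
        "contentType": "application/vnd.microsoft.card.hero",
        "content": {
            "title": hit["fileName"],
            "subtitle": ", ".join(hit.get("tags", [])),
            "images": [{"url": hit["imageUrl"]}],
        },
    }

card = to_hero_card({
    "fileName": "cat.jpg",
    "tags": ["cat", "outdoor"],
    "imageUrl": "https://example.com/cat.jpg",
})
```

A carousel of such attachments would handle the multiple-results case.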

Define number of phases

The tutorial session will be multi-phase. My suggestion was two phases:

  1. Images->Q->Az Fn->Cog Svcs/Vision->DocDB->Az Search
  2. Bot Framework->LUIS->Az Search

My thought was we should create branches for each phase, for skeleton and for complete version. Plus a final stage with the complete solution.

Call Face API

Code to call face API for each image, and pull out all relevant metadata from the response payload. Should handle errors, as well.
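Constructing (not sending) the detect request looks roughly like this. The endpoint path, query parameters, and `Ocp-Apim-Subscription-Key` header follow the public Face API; the region and key are placeholders.

```python
from urllib.parse import urlencode

def face_detect_request(region, key, attributes=("age", "gender", "smile")):
    """Build the URL and headers for a Face API detect call on raw image bytes."""
    query = urlencode({
        "returnFaceId": "true",
        "returnFaceRectangle": "true",
        "returnFaceAttributes": ",".join(attributes),
    })
    url = f"https://{region}.api.cognitive.microsoft.com/face/v1.0/detect?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,            # your API key
        "Content-Type": "application/octet-stream",  # raw image bytes in the body
    }
    return url, headers

url, headers = face_detect_request("westus", "<your-key>")
```

Error handling would wrap the actual POST: check for 429 (rate limit) and retry with backoff.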

Build Azure Search Index

Build the Azure Search index on top of the Document DB. This involves whatever tuning is needed after the vanilla "sync with DocDB" step.
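Once the index exists, the queries we want look like this as REST calls. The query parameters (`search`, `$top`, `facet`, `api-version`) follow the Azure Search REST API; the service and index names are placeholders.

```python
from urllib.parse import urlencode

def search_url(service, index, text, facet=None, top=3):
    """Build an Azure Search query URL: full-text search plus an optional facet."""
    params = {"api-version": "2016-09-01", "search": text, "$top": top}
    if facet:
        params["facet"] = facet
    return (f"https://{service}.search.windows.net/indexes/{index}/docs?"
            + urlencode(params))

url = search_url("myservice", "images", "cats", facet="tags", top=3)
```

This maps directly onto the sample utterances: "top 3 pictures of cats" becomes `search=cats` with `$top=3`.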

Call Image Recognition API

Code to call image recognition API for each image, and pull out all relevant metadata from the response payload. Should handle errors, as well.
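The response shape below follows the Computer Vision "analyze" API (a `tags` array and a `description` with candidate captions); the extraction helper and confidence threshold are illustrative choices.

```python
import json

# Example analyze response for one image.
sample = json.loads("""
{"tags": [{"name": "cat", "confidence": 0.99}, {"name": "grass", "confidence": 0.8}],
 "description": {"captions": [{"text": "a cat lying in the grass", "confidence": 0.91}]}}
""")

def extract(analysis, min_confidence=0.5):
    """Keep confident tags and the best caption from an analyze response."""
    tags = [t["name"] for t in analysis["tags"] if t["confidence"] >= min_confidence]
    caption = max(analysis["description"]["captions"],
                  key=lambda c: c["confidence"])["text"]
    return tags, caption
```

The confidence cutoff is a tuning knob: too low and the index fills with noise tags, too high and sparse images get no tags at all.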

Face Tagging/Grouping Pre-Logic

Build whatever code is required to allow tracking of faces within an image and across images to allow later Face API operations for tagging and grouping.

Build Skeleton Code: Bot Framework

Take full working solution for all of the bot pieces and rip out the guts, leaving method stubs and comments. This will be the skeleton framework that folks can build on during the tutorial. Alternately, we can leave in the results-rendering and help/flavor dialog portions to reduce scope.