
Cognitive Services Tutorial

Cognitive Services Tutorial for AI Immersion Workshop 2017

Goals

The goal of the Cognitive Services track for the AI Immersion Workshop is to give Build attendees the opportunity to build an end-to-end scenario using Cognitive Services and other Microsoft Azure technologies in tandem. The whole is greater than the sum of its parts, and the result is a code base to build on well after the session is over.

Scenario

We're building an end-to-end scenario that allows you to pull in your own pictures, use Cognitive Services to find objects and people in the images, figure out how those people are feeling, and store all of that data in a NoSQL store (DocumentDB). We use that NoSQL store to populate an Azure Search index, and then build a Bot Framework bot using LUIS to allow easy, targeted querying.

We walk through the scenario in detail in the Lab Manual - please start there!

Architecture

We build a simple C# application that allows you to ingest pictures from your local drive, then invoke several different Cognitive Services to gather data on those images:

  • Computer Vision: We use this to grab tags and a description
  • Face: We use this to grab faces and their details from each image
  • Emotion: We use this to pull emotion scores from each face in the image

We'll walk through why each of those APIs is used and the differences between them. Once we have this data, we process it to pull out the details we need and store it all in DocumentDB, our NoSQL PaaS offering.
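The per-image flow described above can be sketched as follows. The tutorial's code is C#; this is an illustrative Python sketch with stubbed service calls, and all function and field names are assumptions, not the workshop's actual code.

```python
# Hypothetical per-image pipeline; the three callables stand in for the real
# Cognitive Services REST calls.

def analyze_image(image_bytes, vision, face, emotion):
    """Run all three Cognitive Services on one image and merge the results."""
    result = {}
    # Computer Vision: overall tags and a description for the whole image
    v = vision(image_bytes)
    result["tags"] = v["tags"]
    result["description"] = v["description"]
    # Face: locate faces and per-face details (age, rectangle, ...)
    faces = face(image_bytes)
    # Emotion: per-face emotion scores, matched to the detected faces
    emotions = emotion(image_bytes)
    result["faces"] = [
        {**f, "scores": e["scores"]} for f, e in zip(faces, emotions)
    ]
    return result

# Stubbed responses stand in for live API calls.
doc = analyze_image(
    b"...",
    vision=lambda _: {"tags": ["cat", "outdoor"], "description": "a cat outside"},
    face=lambda _: [{"faceRectangle": {"left": 10, "top": 20}, "age": 30}],
    emotion=lambda _: [{"scores": {"happiness": 0.9, "sadness": 0.1}}],
)
```

The merged `doc` is what later gets written to DocumentDB.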

Once we have it in DocumentDB, we'll build an Azure Search index on top of it (Azure Search is our PaaS offering for faceted, fault-tolerant search - think Elasticsearch without the management overhead). We'll show you how to query your data, and then build a Bot Framework bot to query it. Finally, we'll extend this bot with LUIS to automatically derive intents from your queries and use them to direct your searches intelligently.

Architecture Diagram

Extra Credit

There is no reason all of this must be controlled from the user's machine. In a real production system, you would want to scale out and throttle your queries to the various services involved, and this is simple to do in Azure. For extra credit, we'll extend our initial application to upload the images to Blob Storage and publish an Azure Service Bus topic message for each one. We use those messages to trigger Azure Function jobs, which query the various Cognitive Services and write their results to DocumentDB. This allows the system to scale out as needed and process images online as soon as they make it into the cloud, even if the client detaches. It also allows multiple clients to upload and process at once.
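The scale-out path can be sketched as a message handler: one topic message per uploaded blob triggers a function that analyzes the image and writes to DocumentDB. This is an illustrative Python sketch with stubbed dependencies; the message shape and names are assumptions.

```python
import json

def handle_message(message_body, analyze, store):
    """Azure Function-style handler: the message carries the blob URL of one image."""
    msg = json.loads(message_body)
    blob_url = msg["blobUrl"]
    metadata = analyze(blob_url)     # would call the Cognitive Services
    metadata["imageUrl"] = blob_url  # remember where the image lives
    store(metadata)                  # would upsert the document into DocumentDB
    return metadata

# Simulate one message arriving from the topic.
written = []
out = handle_message(
    '{"blobUrl": "https://example.blob.core.windows.net/images/cat.jpg"}',
    analyze=lambda url: {"tags": ["cat"]},
    store=written.append,
)
```

Because each message is independent, many function instances can process uploads in parallel, which is where the scale-out comes from.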

More Extra Credit

What's in a face? We've pulled out facial details from all of your images as they were ingested, but can we pull out people? We'll build a client to allow you to look at faces we've detected and label and group them, throwing these tags and groups into the Face API to allow future images to be auto-tagged/grouped as they are ingested. We can then extend the Bot to allow you to query for individuals or groups ("show me pictures of my family taken outside").

Contributors

ayako, carlosp-ms, jennifermarsman, liamca, lpperras, noodlefrenzy, simonpo, yorek

Issues

Build LUIS Model

Build the LUIS model required for pulling out intents and entities from the bot. So far the intents I see are possibly "show me" and maybe "list"(?), with entities being "#", object-type, person-name.

Possible utterances:

  • show me the top 3 pictures of cats
  • list all the pictures taken outside
  • show me pictures of a bird

Feel free to shoot any and all of these full of holes.
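To make the intent/entity split concrete, here is a toy rule-based stand-in for the LUIS model that maps the sample utterances above to the proposed intents and entities. The intent names and entity keys are assumptions for illustration; the real model would be trained in LUIS, not hand-written.

```python
import re

def parse(utterance):
    """Toy intent/entity extraction mirroring the proposed LUIS model."""
    u = utterance.lower()
    if u.startswith("show me"):
        intent = "ShowMe"
    elif u.startswith("list"):
        intent = "List"
    else:
        intent = "None"
    entities = {}
    m = re.search(r"top (\d+)", u)          # the "#" entity
    if m:
        entities["count"] = int(m.group(1))
    m = re.search(r"pictures of (?:a |my )?(\w+)", u)  # the object-type entity
    if m:
        entities["object-type"] = m.group(1)
    return intent, entities
```

Running it over the sample utterances shows where the model needs to disambiguate, e.g. "list all the pictures taken outside" carries no object-type entity at all.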

Finalize preliminary architecture

Yes, I know "finalizing" a preliminary architecture seems odd, but this is just getting everyone on-board, revising, etc. until we're all satisfied. Then converting it into something we can put in the README.

Ingest images

Build code to ingest images from a directory and operate upon them. What "operate" means is TBD until the architecture is finalized.

Finalize Timeline

Once we have a better sense of how long it will take to present the various phases as well as build the code, we should finalize the timeline and do any culling we need to do.

On the cull list (in order, but definitely negotiable):

  1. Anything related to specific face recognition, grouping
  2. Move calls to all but one of the vision cog services APIs into the skeleton code, so they don't need to write API/SDK-calling code more than once.
  3. Move transform from API results into DocDB schema into skeleton code.
  4. Export LUIS model and have them just import it.
  5. The entire Bot.

Build rest of bot dialog

Very vague here, but at least:

  • help text (i.e. they say "help", we say "you can do xxx")
  • really minor chatter/flavor content

Build Skeleton Code: Image ingest -> DocDB

Take full working solution from image ingestion to docdb metadata-insertion and rip out the guts, leaving method stubs and comments. This will be the skeleton framework that folks can build on during the tutorial.

Document Phase 3: Bot

LabManual.md documentation of building the bot, integrating the LUIS model, calling Azure Search and displaying the results.

Document Phase 2: Azure Search

LabManual.md documentation for the queries we'd like to do within Azure Search, as well as how the schema gets created via ingesting DocumentDB and any tuning we do there.

Build Baseline Image Corpus

Gather a corpus of images we can use as a baseline, for people who don't have their own, and put it in a place that's easy for people to ingest (perhaps in this GH repo).

Store Metadata to DocumentDB

Open question - (where) do we store images? If not to DocDB, we should have a separate task for that and this should be updated to store the location of the stored image.

Store all metadata gathered from the cognitive services about the image, as well as filename, type, initial path, upload/processing date/time, time for each API call, total time to process.
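A hypothetical shape for that DocumentDB document, covering the fields listed above; every field name here is an assumption for illustration, not a settled schema.

```python
from datetime import datetime, timezone

def build_document(filename, path, api_results, timings_ms):
    """Assemble one metadata document for DocumentDB."""
    return {
        "id": filename,
        "fileName": filename,
        "fileType": filename.rsplit(".", 1)[-1],
        "initialPath": path,
        "processedAt": datetime.now(timezone.utc).isoformat(),
        "apiTimingsMs": timings_ms,            # time for each API call
        "totalTimeMs": sum(timings_ms.values()),
        **api_results,                         # tags, description, faces, ...
    }

doc = build_document(
    "cat.jpg", "C:/photos/cat.jpg",
    {"tags": ["cat"]},
    {"vision": 120, "face": 200, "emotion": 90},
)
```

If we decide images live in Blob Storage rather than DocumentDB, an `imageUrl` field would be added per the open question above.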

Call Emotion API

Code to call emotion API for each face in each image, and pull out all relevant metadata from the response payload. Should handle errors, as well.
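The response shape below follows the Emotion API's documented format (one entry per detected face, each with a `scores` map); the helper that picks the dominant emotion per face is an illustrative sketch.

```python
import json

# Example response payload: one detected face with its emotion scores.
sample_response = json.loads("""
[
  {"faceRectangle": {"left": 68, "top": 97, "width": 64, "height": 64},
   "scores": {"anger": 0.01, "happiness": 0.92, "neutral": 0.05, "sadness": 0.02}}
]
""")

def dominant_emotions(response):
    """Return the highest-scoring emotion for each face in the response."""
    return [max(face["scores"], key=face["scores"].get) for face in response]
```

Storing the full `scores` map (not just the winner) keeps the door open for later search queries like "happiest pictures".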

Build initial bot code

Build baseline bot code to get the bot up and running, integrating LUIS and taking the response to talk to search (LUIS model and search queries themselves are separate tasks, see #12 and #13 ). Rendering the results is also a separate task, see #15.

Build Presentation: Phase 2

More detailed presentation of how to create an Azure Search index from DocumentDB, how to query it, how faceting works.

Show search results in bot

When you get back one or more images from a search query, render them in the bot framework using the appropriate cards. Intentionally vague at this point.
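One plausible rendering is a hero card per search hit. The attachment `contentType` below is the documented Bot Framework value; the hit fields and helper are illustrative assumptions.

```python
def to_hero_card(hit):
    """Turn one search hit into a Bot Framework hero card attachment."""
    return {
        "contentType": "application/vnd.microsoft.card.hero",
        "content": {
            "title": hit["fileName"],
            "subtitle": ", ".join(hit.get("tags", [])),
            "images": [{"url": hit["imageUrl"]}],
        },
    }

card = to_hero_card({
    "fileName": "cat.jpg",
    "tags": ["cat", "outdoor"],
    "imageUrl": "https://example.com/cat.jpg",
})
```

A carousel of such attachments would handle the multiple-results case.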

Define number of phases

The tutorial session will be multi-phase. My suggestion was two phases:

  1. Images->Q->Az Fn->Cog Svcs/Vision->DocDB->Az Search
  2. Bot Framework->LUIS->Az Search

My thought was we should create branches for each phase, for skeleton and for complete version. Plus a final stage with the complete solution.

Call Face API

Code to call face API for each image, and pull out all relevant metadata from the response payload. Should handle errors, as well.
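Constructing (not sending) the detect request looks roughly like this. The endpoint path, query parameters, and `Ocp-Apim-Subscription-Key` header follow the public Face API; the region and key are placeholders.

```python
from urllib.parse import urlencode

def face_detect_request(region, key, attributes=("age", "gender", "smile")):
    """Build the URL and headers for a Face API detect call on raw image bytes."""
    query = urlencode({
        "returnFaceId": "true",
        "returnFaceRectangle": "true",
        "returnFaceAttributes": ",".join(attributes),
    })
    url = f"https://{region}.api.cognitive.microsoft.com/face/v1.0/detect?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,            # your API key
        "Content-Type": "application/octet-stream",  # raw image bytes in the body
    }
    return url, headers

url, headers = face_detect_request("westus", "<your-key>")
```

Error handling would wrap the actual POST: check for 429 (rate limit) and retry with backoff.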

Build Azure Search Index

Build the Azure Search index on top of the Document DB. This involves whatever tuning is needed after the vanilla "sync with DocDB" step.
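Once the index exists, the queries we want look like this as REST calls. The query parameters (`search`, `$top`, `facet`, `api-version`) follow the Azure Search REST API; the service and index names are placeholders.

```python
from urllib.parse import urlencode

def search_url(service, index, text, facet=None, top=3):
    """Build an Azure Search query URL: full-text search plus an optional facet."""
    params = {"api-version": "2016-09-01", "search": text, "$top": top}
    if facet:
        params["facet"] = facet
    return (f"https://{service}.search.windows.net/indexes/{index}/docs?"
            + urlencode(params))

url = search_url("myservice", "images", "cats", facet="tags", top=3)
```

This maps directly onto the sample utterances: "top 3 pictures of cats" becomes `search=cats` with `$top=3`.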

Call Image Recognition API

Code to call image recognition API for each image, and pull out all relevant metadata from the response payload. Should handle errors, as well.
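The response shape below follows the Computer Vision "analyze" API (a `tags` array and a `description` with candidate captions); the extraction helper and confidence threshold are illustrative choices.

```python
import json

# Example analyze response for one image.
sample = json.loads("""
{"tags": [{"name": "cat", "confidence": 0.99}, {"name": "grass", "confidence": 0.8}],
 "description": {"captions": [{"text": "a cat lying in the grass", "confidence": 0.91}]}}
""")

def extract(analysis, min_confidence=0.5):
    """Keep confident tags and the best caption from an analyze response."""
    tags = [t["name"] for t in analysis["tags"] if t["confidence"] >= min_confidence]
    caption = max(analysis["description"]["captions"],
                  key=lambda c: c["confidence"])["text"]
    return tags, caption
```

The confidence cutoff is a tuning knob: too low and the index fills with noise tags, too high and sparse images get no tags at all.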

Face Tagging/Grouping Pre-Logic

Build whatever code is required to allow tracking of faces within an image and across images to allow later Face API operations for tagging and grouping.

Build Skeleton Code: Bot Framework

Take full working solution for all of the bot pieces and rip out the guts, leaving method stubs and comments. This will be the skeleton framework that folks can build on during the tutorial. Alternately, we can leave in the results-rendering and help/flavor dialog portions to reduce scope.