aws-solutions / media-insights-on-aws

A serverless framework to accelerate the development of applications that discover next-generation insights in your video, audio, text, and image resources by utilizing AWS Machine Learning and Media services.

License: Apache License 2.0

media-insights-on-aws's Introduction

Media Insights on AWS is a development framework for building serverless applications that process video, images, audio, and text on AWS. It takes care of workflow orchestration and data persistence so that you can focus on workflow development. By addressing the concerns of running workflows, Media Insights on AWS empowers you to build applications faster with the benefit of inheriting a pre-built and robust back end.

Media Insights on AWS has been successfully used in a variety of scenarios, such as:

Deriving video features for ad placement
Transforming video content with redaction
Indexing videos based on visual and audio content
Translating videos for automated localization

For additional details and sample use cases, refer to How to Rapidly Prototype Multimedia Applications on AWS with Media Insights on AWS on the AWS Media blog.

This repository contains the Media Insights on AWS back-end framework. Users interact with the framework through REST APIs or by invoking Lambda functions directly. You will not find a graphical user interface (GUI) in this repository, but a reference application for Media Insights on AWS that includes a GUI is in the Content Localization repository.

Install

You can deploy Media Insights on AWS in your AWS account with the following AWS CloudFormation templates. The CloudFormation stack name must be 12 or fewer characters long.

Region	Launch
US East (N. Virginia)
US West (Oregon)
EU West (Ireland)

The CloudFormation options for these one-click deploys are described in the installation parameters section.

Build from scratch:

Run the following commands to build and deploy Media Insights on AWS from scratch. Be sure to define values for MI_STACK_NAME and REGION first.

REGION=[specify a region]
MI_STACK_NAME=[specify a stack name]
git clone https://github.com/aws-solutions/media-insights-on-aws
cd media-insights-on-aws
cd deployment
VERSION=0.0.0
DATETIME=$(date '+%s')
DIST_OUTPUT_BUCKET=media-insights-on-aws-$DATETIME
aws s3 mb s3://$DIST_OUTPUT_BUCKET-$REGION --region $REGION
aws s3 mb s3://$DIST_OUTPUT_BUCKET --region $REGION
./build-s3-dist.sh --template-bucket $DIST_OUTPUT_BUCKET --code-bucket $DIST_OUTPUT_BUCKET --version $VERSION --region $REGION
TEMPLATE={copy "Template to deploy" link from output of build script}
aws cloudformation create-stack --stack-name $MI_STACK_NAME --template-url $TEMPLATE --region $REGION --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND --disable-rollback

Outputs

If you're building applications on Media Insights on AWS, you will need to understand the following resources in the Outputs tab of the CloudFormation stack:

DataplaneApiEndpoint is the endpoint for accessing the dataplane APIs to create, update, delete, and retrieve media assets.
DataplaneBucket is the S3 bucket used to store derived media (derived assets) and raw analysis metadata created by the workflows.
WorkflowApiEndpoint is the endpoint for accessing the Workflow APIs to create, update, delete, and execute the workflows.
WorkflowCustomResourceArn is the custom resource that can be used to create workflows in CloudFormation scripts.
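
As a minimal sketch of how an application might read these outputs programmatically, the following Python helper flattens a response shaped like the one returned by the CloudFormation DescribeStacks API (for example, from boto3's `describe_stacks`). The helper names and all sample values are hypothetical and only illustrate the shape of the data; they are not part of this repository.

```python
from typing import Dict, List

# Output keys named in the list above.
EXPECTED_KEYS = (
    "DataplaneApiEndpoint",
    "DataplaneBucket",
    "WorkflowApiEndpoint",
    "WorkflowCustomResourceArn",
)


def stack_outputs(describe_stacks_response: dict) -> Dict[str, str]:
    """Flatten the Outputs of the first stack into a {key: value} dict."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        raise ValueError("response contains no stacks")
    outputs = stacks[0].get("Outputs", [])
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}


def missing_outputs(outputs: Dict[str, str]) -> List[str]:
    """Return the expected output keys that are absent from the stack."""
    return [key for key in EXPECTED_KEYS if key not in outputs]


# Hand-written sample response with made-up values (assumption, for illustration).
sample_response = {
    "Stacks": [
        {
            "StackName": "mi-demo",
            "Outputs": [
                {"OutputKey": "DataplaneApiEndpoint", "OutputValue": "https://example.invalid/dataplane/"},
                {"OutputKey": "DataplaneBucket", "OutputValue": "example-dataplane-bucket"},
                {"OutputKey": "WorkflowApiEndpoint", "OutputValue": "https://example.invalid/workflow/"},
                {"OutputKey": "WorkflowCustomResourceArn", "OutputValue": "arn:aws:lambda:us-east-1:111122223333:function:example"},
            ],
        }
    ]
}

outputs = stack_outputs(sample_response)
print(outputs["WorkflowApiEndpoint"])
print(missing_outputs(outputs))
```

In a real deployment, the response would come from a call such as `boto3.client("cloudformation").describe_stacks(StackName=...)` rather than from a literal dictionary.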

Cost

You are responsible for the cost of the AWS services used while running this solution. The cost for running this solution with the default settings in the us-east-1 (N. Virginia) region is approximately $24 per month without free tiers, or $13 per month with free tiers, for 100 workflow runs. Most use cases are covered by the free tier for all AWS services except Amazon Kinesis and AWS Lambda. The costs for the Amazon Kinesis data stream ($12.56/mo) and the Workflow Scheduler Lambda function ($3.73/mo) will remain relatively unchanged, regardless of how many workflows execute.

Approximate monthly cost, excluding all free tiers:

AWS Service	Quantity	Cost
Amazon API Gateway	1 million workflows	$3.50 / mo
Amazon DynamoDB	1 million workflows	$0.025 / mo
AWS Lambda	100 workflows	$4.75 / mo
Amazon Kinesis	100 workflows	$12.56 / mo
Amazon SQS	1 million workflows	$0.40 / mo
Amazon SNS	n/a	No charge
Amazon S3	100 workflows	$2.30 / mo
AWS X-Ray	100 workflows	$0.0005 / mo

These cost estimates are based on workflows processing live action videos 10 minutes in duration. Each additional 100 workflow executions will cost roughly $2, or higher for videos longer than 10 minutes and lower for videos shorter than 10 minutes.
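
The arithmetic behind these figures can be checked with a short Python sketch. It sums the per-service rows of the table (which yields the "approximately $24" figure), adds up the two roughly fixed costs, and extrapolates using the "roughly $2 per additional 100 executions" rule of thumb. The `estimate` function and its linear extrapolation are assumptions made for illustration, not an official pricing model.

```python
from decimal import Decimal

# Monthly costs copied from the table above (excluding free tiers).
monthly_cost = {
    "Amazon API Gateway": Decimal("3.50"),
    "Amazon DynamoDB": Decimal("0.025"),
    "AWS Lambda": Decimal("4.75"),
    "Amazon Kinesis": Decimal("12.56"),
    "Amazon SQS": Decimal("0.40"),
    "Amazon SNS": Decimal("0"),
    "Amazon S3": Decimal("2.30"),
    "AWS X-Ray": Decimal("0.0005"),
}

# Sum of all rows: the "approximately $24 per month" figure.
total = sum(monthly_cost.values(), Decimal("0"))

# Costs that stay roughly constant regardless of workflow count:
# the Kinesis data stream plus the Workflow Scheduler Lambda function.
fixed = Decimal("12.56") + Decimal("3.73")


def estimate(workflow_runs: int, per_100_runs: Decimal = Decimal("2")) -> Decimal:
    """Rough monthly estimate (assumed linear): table total for the first
    100 runs, plus about $2 for each additional 100 runs."""
    extra_runs = max(workflow_runs - 100, 0)
    return total + per_100_runs * Decimal(extra_runs) / Decimal(100)


print(f"Table total:    ${total}")
print(f"Fixed portion:  ${fixed}")
print(f"300 runs/month: ${estimate(300)}")
```

Because the estimates assume 10-minute videos, longer or shorter media would shift the per-100-runs figure up or down.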

Limitations

The CloudFormation stack name for Media Insights on AWS must be 12 or fewer characters long. This ensures that all the resources in the stack remain under the maximum length allowed by CloudFormation.

Media Insights on AWS does not inherently limit media attributes such as file size or video duration. Those limitations depend on the services used in user-defined workflows. For example, if a workflow uses Amazon Rekognition, then that workflow will be subject to the limitations listed in the guidelines and quotas for Amazon Rekognition. If you use the Amazon Rekognition service within workflows, be aware of the guidance on use cases that involve public safety and of the general AWS Service Terms.

Architecture Overview

Deploying Media Insights on AWS builds the following environment in the AWS Cloud:

The AWS CloudFormation template provisions the following resources:

Resource: An Amazon API Gateway resource for the control plane REST API

Execution flow: This is the entry point where requests to create, read, update, delete (CRUD), or execute workflows begin.
Resource: AWS Lambda and Amazon Simple Queue Service (Amazon SQS) resources to support workflow orchestration and translating user-defined workflows into AWS Step Functions

Execution flow: Requests for workflow CRUD will finish in this step after an AWS Lambda function updates workflow-related tables in DynamoDB. Requests to execute workflows will begin in this step with an AWS Lambda function that saves the request to an SQS queue, which is later read and executed by an AWS Lambda function (called the workflow scheduler) that controls how many workflows can run at the same time.
Resource: Amazon DynamoDB tables to store workflow-related data, such as state machine definitions for operators, workflow configurations, and workflow execution status.
Resource: Step function resources in AWS Step Functions

Execution flow: When a user defines a new workflow using the workflow API, an AWS Lambda function creates an executable step function resource in AWS Step Functions. When the workflow scheduler starts a workflow, it starts that step function resource, which then invokes a series of AWS Lambda functions that call external services and/or download results from those services. When all the AWS Lambda functions in a workflow have finished execution, an AWS Lambda function is called to update the workflow status in Amazon DynamoDB.
Resource: AWS Lambda functions for using the following commonly used services in workflows: Amazon Rekognition, Amazon Comprehend, Amazon Translate, Amazon Transcribe, Amazon Polly, and AWS Elemental MediaConvert

Execution flow: Operators consist of AWS Lambda functions that call external services and/or download results from those services. They are invoked by a state machine in AWS Step Functions, as prescribed by the workflow definition. These AWS Lambda functions save results to long-term storage via the data plane REST API.
Resource: An Amazon API Gateway resource for the data plane REST API

Execution flow: Operators save results to long-term storage by calling this API.
Resource: Amazon Simple Storage Service (Amazon S3), DynamoDB, and DynamoDB Streams for media and metadata data storage

Execution flow: The AWS Lambda function behind the data plane API directly accesses Amazon S3 and Amazon DynamoDB to perform incoming CRUD requests. That AWS Lambda function saves files, such as binary media files or JSON metadata files, in Amazon S3. A pointer to those files is saved in an Amazon DynamoDB table. Finally, a time-ordered sequence of modifications to that table is saved in an Amazon DynamoDB Stream and an Amazon Kinesis data stream.
Resource: An Amazon Kinesis Data stream for interfacing with external applications

Execution flow: The Amazon Kinesis data stream provides an interface for external applications to access data stored in the data plane. This interface is appropriate for feeding downstream data stores, such as the Amazon Elasticsearch Service or Amazon Neptune, that support specialized data access patterns required by end-user applications. To feed a downstream data store, you must implement a consumer (e.g. an AWS Lambda function) that consumes records from the data stream and performs the extract, transform, and load (ETL) tasks needed for the external application.

NOTE: The ETL tasks that feed downstream data stores are entirely use-case dependent and therefore must be user-defined. The Implementation Guide includes detailed instructions for implementing ETL functions in Media Insights on AWS.

Architecture components:

Workflow API: Use the workflow API to create, update, delete, execute, and monitor workflows.
Control plane: The control plane includes the workflow API and state machines for workflows. Workflow state machines are composed of operators from the Media Insights on AWS operator library. When operators within the state machine are run, they interact with the Media Insights on AWS data plane to store and retrieve derived assets and metadata generated from the workflow.

The control plane uses the following Amazon DynamoDB tables to store workflow-related data:
- Workflow – This table records user-defined workflows.
- Workflow Execution – This table records the details of every workflow run.
- Operations – This table records details for each operator in the operator library, such as references to Lambda functions and default runtime parameters.
- Stage – This table records the auto-generated AWS Step Functions code needed for each operator.
- System – This table records system-wide configurations, such as maximum concurrent workflows.
Operators: Operators are generated state machines that call AWS Lambda functions to perform media analysis or media transformation tasks. Users can define custom operators, but the Media Insights on AWS operator library includes the following pre-built operators:
- Celebrity Recognition - An asynchronous operator to identify celebrities in a video using Amazon Rekognition.
- Content Moderation - An asynchronous operator to identify unsafe content in videos using Amazon Rekognition.
- Face Detection - An asynchronous operator to identify faces in videos using Amazon Rekognition.
- Face Search - An asynchronous operator to identify faces from a custom face collection in videos using Amazon Rekognition.
- Label Detection - An asynchronous operator to identify objects in a video using Amazon Rekognition.
- Person Tracking - An asynchronous operator to identify people in a video using Amazon Rekognition.
- Shot Detection - An asynchronous operator to identify camera shots in a video using Amazon Rekognition.
- Text Detection – An asynchronous operator to identify text in a video using Amazon Rekognition.
- Technical Cue Detection – An asynchronous operator to identify technical cues such as end credits, color bars, and black bars in a video using Amazon Rekognition.
- Comprehend Key Phrases – An asynchronous operator to find key phrases in text using Amazon Comprehend.
- Comprehend Entities – An asynchronous operator to find references to real-world objects, dates, and quantities in text using Amazon Comprehend.
- Create SRT Captions – A synchronous operator to generate SRT formatted caption files from a video transcript generated by Amazon Transcribe.
- Create VTT Captions - A synchronous operator to generate VTT formatted caption files from a video transcript generated by Amazon Transcribe.
- Media Convert - An asynchronous operator to transcode input video into mpeg4 format using AWS Elemental MediaConvert.
- Media Info – A synchronous operator to read technical tag data for video files.
- Polly - An asynchronous operator that turns input text into speech using Amazon Polly.
- Thumbnail - An asynchronous operator that generates thumbnail images for an input video file using AWS Elemental MediaConvert.
- Transcribe - An asynchronous operator to convert input audio to text using Amazon Transcribe.
- Translate - An asynchronous operator to translate input text using Amazon Translate.
Data plane: This stores the media assets and metadata generated by workflows. Implement a consumer of the Kinesis data stream in the data plane to extract, transform, and load (ETL) data from the master data store to downstream databases that support the data access patterns required by end-user applications.
Data plane API: This API is used to create, update, delete, and retrieve media assets and metadata.
Data plane pipeline: This pipeline stores metadata for an asset that can be retrieved using an object's AssetId and Metadata type. Writing data to the pipeline initiates a copy of the data to be stored in Kinesis Data Streams. This data stream is the interface that end-user applications can connect to in order to use data stored in the data plane.
Data pipeline consumers: Changes to the data plane DynamoDB table are reflected in an Amazon Kinesis data stream. For each record in that stream, data pipeline consumers perform the necessary extract, transform, and load (ETL) tasks needed to replicate data, such as media metadata, to the data stores used by external applications. These ETL tasks are entirely use-case dependent and therefore must be user-defined. The Implementation Guide includes detailed instructions for implementing data pipeline consumers.
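
As a rough illustration of a data pipeline consumer, the following Python sketch shows an AWS Lambda handler that decodes the records delivered by a Kinesis event source. The base64-encoded `kinesis.data` field is part of the standard Lambda event format for Kinesis; the assumption that each payload is a JSON object, the `AssetId` and `Operator` field names in the sample, and the `transform` step are hypothetical. Consult the Implementation Guide for the actual record schema.

```python
import base64
import json


def decode_records(event: dict) -> list:
    """Decode the base64 payload of each Kinesis record in a Lambda event.

    Assumes every payload is a JSON document (an assumption for this sketch).
    """
    decoded = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload))
    return decoded


def transform(item: dict) -> dict:
    """Hypothetical transform step: keep the asset ID next to the raw item."""
    return {"asset_id": item.get("AssetId"), "raw": item}


def lambda_handler(event, context):
    """Extract and transform records; the load step is left to the user."""
    documents = [transform(item) for item in decode_records(event)]
    # Load step omitted: write `documents` to the downstream data store here.
    return {"processed": len(documents)}


# Local demonstration with a hand-built event (sample values are made up).
sample_payload = {"AssetId": "example-asset", "Operator": "exampleOperator"}
sample_event = {
    "Records": [
        {
            "kinesis": {
                "data": base64.b64encode(
                    json.dumps(sample_payload).encode("utf-8")
                ).decode("ascii")
            }
        }
    ]
}

result = lambda_handler(sample_event, None)
print(result)
```

Keeping the decode and transform steps as separate functions makes it easier to unit test the ETL logic without a live stream.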

Installation Parameters

You can deploy Media Insights on AWS in your AWS account with the one-click deploy buttons shown above.

Required parameters

Stack Name: The name of the stack. This must be 12 or fewer characters long.

Optional parameters

Parameter	Default	Description
`MaxConcurrentWorkflows`	`5`	Identifies the maximum number of workflows to run concurrently. When the maximum is reached, additional workflows are added to a wait queue. If too high, then workflows may fail due to external service quotas. Recommended range is 2 to 5.
`DeployAnalyticsPipeline`	`true`	Determines whether to deploy a data streaming pipeline that can be consumed by external applications. By default, this capability is activated when the solution is deployed. Set to `false` to deactivate this capability.
`DeployTestWorkflow`	`false`	Determines whether to deploy test resources that contain Lambda functions required for integration and end-to-end testing. By default, this capability is deactivated. Set to `true` to activate this capability.
`EnableXrayTrace`	`false`	Determines whether to activate active X-Ray tracing on all entry points to the stack. By default, this capability is deactivated when the solution is deployed. Set to `true` to activate this capability.
`ExternalBucketArn`	``	The ARN for Amazon S3 resources that exist outside the stack which may need to be used as inputs to the workflows. The ARN must be a valid Amazon S3 ARN and must reference the same AWS account that is used for the stack. By default, ExternalBucketArn will be blank, meaning workflows will only be able to input media files from the data plane bucket.
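
To make the effect of `MaxConcurrentWorkflows` concrete, here is a small in-memory Python sketch of the behavior described above: workflows start immediately while capacity is available, and otherwise wait in a queue until a running workflow finishes. This is only an illustration under simplifying assumptions; the actual workflow scheduler is a Lambda function backed by Amazon SQS, and the class and method names here are hypothetical.

```python
from collections import deque


class WorkflowScheduler:
    """Toy model of a concurrency-limited workflow scheduler."""

    def __init__(self, max_concurrent: int = 5):
        self.max_concurrent = max_concurrent
        self.waiting = deque()  # workflows queued until capacity frees up
        self.running = set()    # workflows currently executing

    def submit(self, workflow_id: str) -> None:
        """Queue a workflow and start it right away if capacity allows."""
        self.waiting.append(workflow_id)
        self._start_waiting()

    def finish(self, workflow_id: str) -> None:
        """Mark a workflow as done and start the next queued one, if any."""
        self.running.discard(workflow_id)
        self._start_waiting()

    def _start_waiting(self) -> None:
        # Start queued workflows in FIFO order until the limit is reached.
        while self.waiting and len(self.running) < self.max_concurrent:
            self.running.add(self.waiting.popleft())


scheduler = WorkflowScheduler(max_concurrent=2)
for workflow_id in ("wf-1", "wf-2", "wf-3"):
    scheduler.submit(workflow_id)

print(sorted(scheduler.running), list(scheduler.waiting))  # wf-3 is waiting

scheduler.finish("wf-1")
print(sorted(scheduler.running), list(scheduler.waiting))  # wf-3 has started
```

A lower limit trades throughput for a smaller chance of exceeding the quotas of the external services that the operators call.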

Developers

Join our Gitter chat at https://gitter.im/awslabs/aws-media-insights-engine! This public chat forum was created to foster communication between Media Insights on AWS developers worldwide.

For instructions on how to build applications with Media Insights on AWS, read the API reference and builder's guide in the Implementation Guide.

Security

Media Insights on AWS uses AWS_IAM to authorize REST API requests. The following screenshot shows how to test authentication to the Media Insights on AWS API using Postman. Be sure to specify the AccessKey and SecretKey for your own AWS environment.

For more information, see the Implementation Guide.
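
For readers who prefer code to Postman, the following Python sketch shows the general shape of AWS Signature Version 4 signing for a GET request to an API Gateway endpoint (service name `execute-api`), using only the standard library. The host, path, and credentials are placeholders chosen for illustration, and the function names are hypothetical. The sketch assumes a simple path that needs no URI encoding, no query string, and long-term credentials (temporary credentials would also require an `x-amz-security-token` header). In practice, an SDK or signing library is usually the safer choice.

```python
import datetime
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key from the secret key and credential scope."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_get_request(host, path, region, access_key, secret_key, now):
    """Return the headers for a SigV4-signed GET with an empty body.

    `now` must be a UTC datetime.
    """
    service = "execute-api"
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # Canonical request: method, path, query string, headers, signed headers, body hash.
    payload_hash = hashlib.sha256(b"").hexdigest()
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join(
        ["GET", path, "", canonical_headers, signed_headers, payload_hash]
    )

    # String to sign: algorithm, timestamp, credential scope, hashed canonical request.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join(
        [
            "AWS4-HMAC-SHA256",
            amz_date,
            scope,
            hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
        ]
    )

    key = derive_signing_key(secret_key, date_stamp, region, service)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    authorization = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return {"host": host, "x-amz-date": amz_date, "Authorization": authorization}


# Placeholder values for illustration only; substitute your own endpoint and keys.
demo_headers = sign_get_request(
    host="abc123.execute-api.us-east-1.amazonaws.com",  # hypothetical host
    path="/api/workflow/operation",                     # hypothetical path
    region="us-east-1",
    access_key="AKIDEXAMPLE",
    secret_key="wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY",
    now=datetime.datetime(2015, 8, 30, 12, 36, 0),
)
print(demo_headers["Authorization"])
```

The returned headers would be attached to the HTTP request sent to the WorkflowApiEndpoint or DataplaneApiEndpoint.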

S3 Macie

Amazon Macie can help you discover and protect sensitive data in AWS. If your use-case generates and stores sensitive data to Amazon S3, we recommend that you enable Amazon Macie on the Dataplane Amazon S3 bucket.

Uninstall

To uninstall Media Insights on AWS, delete the CloudFormation stack, as described below. This will delete all the resources created by the Media Insights on AWS template except the Dataplane and the DataplaneLogs S3 buckets. These two buckets are retained when the solution stack is deleted in order to help prevent accidental data loss. You can use either the AWS Management Console or the AWS Command Line Interface (AWS CLI) to empty, then delete those S3 buckets after deleting the CloudFormation stack.

Option 1: Uninstall using the AWS Management Console

Sign in to the AWS CloudFormation console.
Select the Media Insights on AWS stack.
Choose Delete.

Option 2: Uninstall using AWS Command Line Interface

aws cloudformation delete-stack --stack-name <installation-stack-name> --region <aws-region>

Deleting Media Insights on AWS data buckets

Media Insights on AWS creates two S3 buckets that are not automatically deleted. To delete these buckets, use the steps below.

Sign in to the Amazon S3 console.
Select the Dataplane bucket.
Choose Empty.
Choose Delete.
Select the DataplaneLogs bucket.
Choose Empty.
Choose Delete.

To delete the S3 bucket using AWS CLI, run the following command:

aws s3 rb s3://<bucket-name> --force

Collection of operational metrics

This solution collects anonymized operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, please see the implementation guide.

It consists of the following information:

Solution ID: The Media Insights on AWS solution identifier (SO0163)
Unique ID (UUID): Randomly generated, unique identifier for each Media Insights on AWS deployment
Timestamp: Date and time of the stack deployment
Instance Data: The version of the solution that was deployed

Example data:

{
    "Solution": "SO0163",
    "UUID": "d84a0bd5-7483-494e-8ab1-fdfaa7e97687",
    "TimeStamp": "2021-03-01T20:03:05.798545",
    "Data": {
        "Version": "2.0.5",
        "CFTemplate": "Created"
    }
}

To opt out of this reporting, complete the following steps before launching the AWS CloudFormation template.

Download the AWS CloudFormation template to your local hard drive.
Open the AWS CloudFormation template with a text editor.
Modify the AWS CloudFormation template mapping section so SendAnonymizedData value is No.

Example:

    AnonymizedData:
      SendAnonymizedData:
        Data: No

Known Issues

Visit the Issue page in this repository for known issues and feature requests.

Contributing

See the CONTRIBUTING file for how to contribute.

Logo

The Media Insights on AWS logo features a clapperboard representing multimedia, centered inside a crosshair representing scrutiny.

License

See the LICENSE file for our project's licensing.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

media-insights-on-aws's Issues

Cognito Support for APIs

Build pattern for user schemas / groups
Cloudformation for cognito infrastructure
Integrate Workflow API
Integrate Dataplane API
Update any operators / services calling an API to perform an auth call, or bypass the API and invoke Lambda directly

disable polly if translate is disabled

Show the output from the Polly operator in the UI

The CompleteWorkflow runs Polly, but the Media Insights UI doesn't show the result.

Question: does Polly automatically change the input language to match the output language specified in Translate? Should we expose a voice selection for this operator in the configuration?

Update dataplane helper with scoped down IAM policy

Dataplane helper (which is needed for every operator) needs ability to invoke dataplane API handler lambda, but not lambda *
- Pattern is completed and implemented for most operators
- Just need to finish applying this to all operators

mieCompleteWorkflow state machine is failing: An error occurred (ResourceNotFoundException) when calling the DescribeCollection operation: Collection:family_faces not found!

Running the default workflow (mieCompleteWorkflow state machine) results in an error in the faceSearch operator: An error occurred (ResourceNotFoundException) when calling the DescribeCollection operation: Collection:family_faces not found!

The state machine is looking for a face collection that does not exist in the stack. Since this operator's data is not surfaced in the MIE app, I think the operator should be removed from the workflow.

"Analytics" tab is a bit misleading

After I uploaded an asset, my instinct was to go to the Analytics page to see the results of the workflow analysis. But what I'm looking for is actually in the Collections tab, as a link to the asset that's been analyzed.

We might want to consider renaming the Analytics tab to something else. Don't have a suggestion at the moment though.

Add a generic operator for ingesting precomputed data files

Add a new operator that can be used to ingest results precomputed by offline/standalone ML algorithms.

Add proxy encode

Add an ingest step to the complete workflow that uses mediainfo and mediaconvert to better prepare videos for processing. Include support for all the media types supported by Media Convert. Currently the front-end only supports jpg and mp4.

Also, show an alert in the browser when a user tries to upload a non-supported media type. Currently those alerts only go to the JavaScript console.

Versioning

Add versioning to MIE externals. Version should be set in one place (during build) and propagated automatically to components. Consider:

APIs - schema
boto config
Operators
Data schema

Allow multiple MIE deployments per region

Allow multiple MIE deployments per region. Currently stack deployments will fail if MIE has already been deployed in the region.

Make the WebAppCloudfrontUrl MIE stack output a clickable link

Just need to add "https://" to the front, I think.

Add support for new Translate languages

New languages for Amazon Translate: Greek, Hungarian, Romanian, Thai, Ukrainian, Urdu and Vietnamese

Fix chalice api handler roles / policies

* Add functionality to replace IAM role in API handler lambdas with a parameter passed in from workflow stack
* Create new IAM policies specific to each API
* Update cloudformation stacks

Media upload gets error

Once I deployed the solution, I could access the main page for the media collection, but the loading icon never goes away. When I try to upload media files to the system, I see a JavaScript alert and then it does not work at all. Would you please help me fix it? Also, is there no plan to add Cognito authentication to restrict user access?

Support multi-tenant MIE with data isolation among users

As an owner of an MIE installation, I want to have multiple users on the system, but I only want each user to see the data for the workflows they ran.

Chunked workflows

Applications may want to run workflows for an asset in chunks but store and retrieve the asset metadata and derived assets as a single dataplane Asset or Media Object.

Examples

I want to do frame accurate content analysis for a media object (where I can reference it by time or segment then frame). I will split the video into frames then run concurrent instances of the same workflow against all the frames.

I want to speed up the processing of an indexing workflow so I can return results faster. I split the video into HLS segments and then process each segment in parallel.

I want to analyze a video that is too large to process as a single object due to service limits.

I want to process live video and return the analysis results in as close to real time as possible so that viewers can see the analysis as part of the viewing experience. As video segments are produced, I may process them directly or further chunk them into frames and process them. I want to retrieve all data up to now for the asset or all data for a chunk (referenced by time, segment, or frame).

I want to run different workflows on the same asset concurrently. For example, run one workflow to generate instant translation and one to index.

Proposed solution

Provide a mechanism to group multiple workflows under a single workflow for monitoring purposes (parent-child, or possibly just group by asset id).

We just need to determine what our “session” id is going to be for the dataplane, possibly some “parent” workflow id

Provide a mechanism to store data generated from multiple workflows for the same asset

The logic we used for storing paged results will work for chunks as well. We open a “session” with the dataplane and append data as it comes in.

Provide a mechanism to retrieve the latest data for an asset even if all the chunks are not processed yet (dataplane doesn’t know)

Chunks are written to S3 as they come in, but the pointer isn’t updated until the dataplane receives an “end” signal. Possibly add a new method for retrieving live data that essentially bypasses the pointer and reads directly from S3.

Update remaining tests to pass auth header

Per comment on PR# 42

"As discussed, please add an issue for tests failing due to missing Auth header and address post delivery."

Remaining tests to update are instant translate

Dockerize building the MIE deployment package

Move the build process into Docker to ensure we have a consistent environment to generate Lambda layers and packages. The base MIE Lambda layer is already being built in Docker. The idea is to do this for the entire build.

Show Step Function links and workflow output to help people troubleshoot errors

So that workflows can be more easily monitored.

Build custom face collection for Rekognition in Media Insights UI

Create a user interface that provides a guided experience to create a new face collection, add and delete faces, and delete a face collection from the Media Insights UI.

Cost estimations

Make a statement like this front and center in the MIE README: "Some of the services MIE uses are not free tier. When you run MIE, you will pay. Processing lengthy movies can get expensive."

Explain methodology for estimating cost of MIE. Give some example scenarios with actual costs.

Costs should be by media type, size and duration since most ML and Media services cost models are based on these dimensions.

clean up workflows and tests

Create one test that tests all the operators. Is tests-parameterized-rekognition redundant?
Does anything use the comprehend workflow? If not, remove it.

Parametrize building the MIE deployment package

Break up the build process into smaller actions, especially for parts that are delivered separately.

Some possible build options:

Build deployment package for a certain app: Instant Translate, Media Insights
Build API docs
Build web app

Add a visualization for the transcript as captions or overlay on the video in the UI

We would like to be able to show the timed text output from Amazon Transcribe as part of the video playback. This could be done as a web page overlay or by producing a captions track for the video player. The requirements for this need to be scoped to come up with implementation details. Some possibilities:

Show captions with video playback
Highlight words or phrases below a confidence threshold
Create multiple caption tracks from translated output

Collection view incorrectly renders fields and item props

https://bootstrap-vue.js.org/docs/components/table/

The collection view is rendering item data that shouldn't be shown per the bootstrap vue docs.

fix broken thumbnails for image media types

fix broken thumbnails for image media types and display the image instead of video player in asset view

Elements defined in custom table slots not being rendered

Fields such as thumbnail and actions are not being rendered from the slot prop that is passed into the table.

https://bootstrap-vue.js.org/docs/components/table/

Add 'rerun analysis' button on asset analysis view

Provide an option for users to rerun an analysis workflow for a given asset

Document and implement solution limits

Design, Document and implement configuration parameters for solution limits.

Break out this task after design.

Hard limits are absolute limits of the solution that can't be changed.
Soft limits are configurable limits set by the owner of the deploy based on the policies they want to enforce for their workloads.

Example: Currently, we set an artificial limit of 256M on the input file size. This prevents unintentional and possibly costly jobs from being run on large (file size) inputs. We want to make this check configurable for the solution. This is a soft limit. There is also an absolute limit (unmeasured) to the size of file MIE can handle, this is a hard limit.

Proposal for limits on this solution:

Video

File-based video

Video size: This is a tough determination to make, since the input sizes for video can vary widely for the same length video. Also, we could use proxy encodes to reduce file sizes if we hit service limits.

Video duration: Use the Rekognition limit of 2 hours for a single file. Longer durations could be handled by chunking inputs (segmented video). Soft limits: configurable.

Segmented video (#43)

Video size: Same as file-based.

Video duration: Hard limit: unlimited (up to storage limits for analysis outputs). Soft limits: configurable.

Create an operator that can detect scenes in video.

This operator will be very useful to help determine different scenes in video. Applications include searching for ads, credits, slates, scenes. Another application is that applying ML to unique scenes would save processing time by running ML on still frames, rather than on video.

Find an alternative to the IP-based policy for Elasticsearch

Relying on the IP-based policy for Elasticsearch causes several pain points for users of the MIE web app and Kibana, which currently access ES from the internet (rather than the AWS network):

changing locations requires you to change the IPs in the ES access policy and reboot the cluster
it is not always obvious what IP will be used to access ES in more complex network environments, so it creates a difficult out-of-the-box experience
for production or even shared use, everyone will need to change this. Provide something that can be extended for more production use cases

Show workflow configuration options in the GUI

Show workflow configuration options for the kitchen sink workflow (MieCompleteWorkflow), including:

enable / disable each operator
specify custom vocabulary for Transcribe
specify face collection for Face Search
specify data file for Generic Data Lookup
specify input and output languages for translations
specify the voice for Polly

Set deployment defaults to enable all components required to build the sample Media Insights web application

The deployment script for MIE is componentized so the stack can be deployed in layers. This facilitates debugging and also allows you to deploy only the parts of MIE that are needed for your application.

The current default settings for the deployment are set to build only the APIs. The default should be changed to deploy everything needed for the demo application.

Operators should be self-documenting

When I list operators using

GET /workflow/operator 
or
GET /workflow/operator/<operator-name>

I want to see a description of the operator, its input parameters and their types, and its outputs and their types. Also, it would be great if the same information were (somehow) propagated automatically to the documentation when I build MIE.
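One possible shape for this, purely as a sketch: keep a small description record next to each operator and render it both as a dictionary (for an API response) and as plain text (for generated documentation). The `Param` and `OperatorDoc` classes and their fields are hypothetical and are not part of the existing MIE API.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Param:
    """One input or output of an operator (hypothetical structure)."""
    name: str
    type: str
    description: str = ""


@dataclass
class OperatorDoc:
    """Self-describing metadata for one operator (hypothetical structure)."""
    name: str
    description: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def to_dict(self):
        # Suitable for serializing into a JSON response body.
        return asdict(self)

    def to_text(self):
        # Suitable for dropping into generated documentation.
        lines = [f"{self.name}: {self.description}", "Inputs:"]
        lines += [f"  {p.name} ({p.type}) {p.description}".rstrip() for p in self.inputs]
        lines.append("Outputs:")
        lines += [f"  {p.name} ({p.type}) {p.description}".rstrip() for p in self.outputs]
        return "\n".join(lines)
```

Because both renderings come from the same record, the API response and the built documentation cannot drift apart.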

provide a developer's guide

Provide a developer's guide that explains how to create and use workflows and operators, how to get media data from DynamoDB, and how to extend the Elasticsearch consumer to feed other downstream data stores.

Cognito support in GUI

* Integrate cognito libraries into app
* Update website build script to parse cognito values into env variables
* Login component
* Logout component
* Update secure views to perform authorization check
* Update API calls to pass auth token

Add AWS Cognito Authentication and IAM policies to MIE Dataplane and Workflow APIs

Currently APIs are secured using IP-based policies. Replace this with Cognito Authentication and AWS IAM policies to enable more flexible user access management.

show transcript as subtitles in the video player

Feature requested by stakeholders: show transcript as subtitles in the video player

"Elasticsearch server denied access. Please check its access policy" message on Analyze page of UI

Upload is working, as I can see my video in the Collection view, but when I click the Analyze link for the asset in the Collection view I get this error message and no data is displayed for the asset.

Enable configuration for custom vocabulary for Transcribe operations

Provide an interface in the Media Insights UI to configure a custom vocabulary to be used by Transcribe.

Include user guide under Help menu

Include a user guide under the Help menu so users know how to get started (e.g. step 1: upload; step 2: wait for the workflow to finish; step 3: go to the Collection view and click Analyze; etc.)

Support for subtitle indexing (.scc, .srt, etc)

Allow indexing of individual closed captioning files, or batches of them, for assets. Potentially expose this via the API to allow for MAM integration, and via Kibana for search. For example, "Search this Phrase" would return all occurrences (timecode, assetid) of the phrase in an asset.
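As a sketch of the phrase search described above, the following parses SubRip (.srt) text into timed cues and returns the (assetid, start, end) of every cue containing a phrase. The function names and the cue dictionary layout are hypothetical; a real implementation would index the cues in Elasticsearch rather than scan them in memory, and .scc files would need a different parser.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_ms(hours, minutes, seconds, millis):
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(text):
    """Parse SRT text into a list of {"start_ms", "end_ms", "text"} cues."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                parts = match.groups()
                cues.append({
                    "start_ms": _to_ms(*parts[:4]),
                    "end_ms": _to_ms(*parts[4:]),
                    # Everything after the timing line is caption text.
                    "text": " ".join(l.strip() for l in lines[i + 1:]),
                })
                break
    return cues


def search_phrase(asset_id, cues, phrase):
    """Return (asset_id, start_ms, end_ms) for each cue containing phrase."""
    needle = phrase.lower()
    return [
        (asset_id, cue["start_ms"], cue["end_ms"])
        for cue in cues
        if needle in cue["text"].lower()
    ]
```

This simple scan only finds phrases that fall within a single cue; a phrase split across two consecutive cues would need the cue texts to be joined before searching.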

List entire workflow history for a user's session in the upload page.

List the entire workflow history for a user's session in the upload page so they know what they've run. Currently that workflow history disappears when users leave the workflow page.

add tip to lower confidence threshold if no data is found

When no data is shown because the confidence threshold is too high, advise the user that they can lower the threshold to see data.
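The logic behind such a tip could look like the sketch below, written in Python for illustration even though the GUI itself would implement it in the front end. The function name and the assumption that each result carries a numeric `Confidence` field are hypothetical.

```python
def filter_by_confidence(items, threshold):
    """Return (visible_items, tip).

    tip is a message only when results exist but every one of them falls
    below the threshold; otherwise it is None.
    """
    visible = [item for item in items if item["Confidence"] >= threshold]
    tip = None
    if items and not visible:
        best = max(item["Confidence"] for item in items)
        tip = (
            f"No results at or above {threshold}% confidence. "
            f"Lower the threshold to {best}% or less to see data."
        )
    return visible, tip
```

The tip is deliberately suppressed when there are no results at all, since lowering the threshold would not help in that case.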

add favicon and app name

Change the name and icon (i.e. favicon) that appears in the browser tab for this application.

Support > 5G with S3 multipart upload

Right now, we are not using multipart upload, so the S3 API limits us to inputs of 5G or less.
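When moving to multipart upload, the part size has to respect two S3 limits: every part except the last must be at least 5 MiB, and an upload can have at most 10,000 parts. The sketch below picks a part size for a given file under those limits; the function name and the 64 MiB preferred size are assumptions made for illustration.

```python
import math

MIN_PART_BYTES = 5 * 1024 * 1024  # S3 minimum size for all parts but the last
MAX_PARTS = 10_000                # S3 maximum number of parts per upload


def plan_parts(total_bytes, preferred_part_bytes=64 * 1024 * 1024):
    """Return (part_size_bytes, part_count) for a multipart upload.

    The part size is the preferred size, raised if needed so that the
    minimum part size and the maximum part count are both respected.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    part_size = max(
        preferred_part_bytes,
        MIN_PART_BYTES,
        math.ceil(total_bytes / MAX_PARTS),
    )
    return part_size, math.ceil(total_bytes / part_size)
```

For a 6 GiB input with the assumed 64 MiB preference, this plans 96 parts of 64 MiB each.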

labelDetection failed for 1 hour long media all the time

Hello,

I am a big fan of Media Insights Engine, and it works very well for short video files in most cases. However, labelDetection fails every time on a 1-hour-long video, regardless of the file size, as you can see from the attachment.

The CloudWatch Logs for the failed operation say that something went wrong with AssetId.

START RequestId: dd865da7-5f88-404b-89e0-64f6090dc364 Version: $LATEST
Received: {'Name': 'labelDetection', 'Status': 'Executing', 'MetaData': {'LabelDetectionJobId': '211cae014c68d54f773fd2213fedbaecc37b934066e6ad3ae9edc5f86c8d267f', 'AssetId': '5d17cf20-4d83-4562-8ca3-13f1e0d4643d', 'WorkflowExecutionId': '14aae1e4-993b-425e-92dd-bca2dd6d83a0'}, 'Media': {}, 'Outputs': {'Error': 'Lambda.Unknown', 'Cause': 'The cause could not be determined because Lambda did not return an error type.'}}
Missing a required key for building the output object: 'AssetId'
[ERROR] Exception
Traceback (most recent call last):
File "/var/task/operator_failed.py", line 36, in lambda_handler
raise Exception
END RequestId: dd865da7-5f88-404b-89e0-64f6090dc364
REPORT RequestId: dd865da7-5f88-404b-89e0-64f6090dc364 Duration: 2.80 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 69 MB Init Duration: 275.74 ms

This is the only thing that doesn't look normal to me in chectlabelDetection:

START RequestId: eb1a8b08-b05f-4891-b79a-18893e192803 Version: $LATEST
Uploaded metadata for asset: 5d17cf20-4d83-4562-8ca3-13f1e0d4643d
Uploaded metadata for asset: 5d17cf20-4d83-4562-8ca3-13f1e0d4643d
Uploaded metadata for asset: 5d17cf20-4d83-4562-8ca3-13f1e0d4643d
Uploaded metadata for asset: 5d17cf20-4d83-4562-8ca3-13f1e0d4643d
Uploaded metadata for asset: 5d17cf20-4d83-4562-8ca3-13f1e0d4643d
...
Uploaded metadata for asset: 5d17cf20-4d83-4562-8ca3-13f1e0d4643d
Uploaded metadata for asset: 5d17cf20-4d83-4562-8ca3-13f1e0d4643d
Uploaded metadata for asset: 5d17cf20-4d83-4562-8ca3-13f1e0d4643d
END RequestId: eb1a8b08-b05f-4891-b79a-18893e192803
REPORT RequestId: eb1a8b08-b05f-4891-b79a-18893e192803 Duration: 300100.18 ms Billed Duration: 300000 ms Memory Size: 128 MB Max Memory Used: 86 MB