plaidml / openvino

This project forked from openvinotoolkit/openvino

OpenVINO™ Toolkit - Deep Learning Deployment Toolkit repository

Home Page: https://docs.openvinotoolkit.org/latest/index.html

License: Apache License 2.0

CMake 1.09% Python 17.82% C++ 80.22% C 0.46% Shell 0.20% Batchfile 0.12% HTML 0.03% Dockerfile 0.01% JavaScript 0.02% CSS 0.03%

openvino's Introduction

A platform for making deep learning work everywhere.

To Our Users

First off, we’d like to thank you for choosing PlaidML. Whether you’re a new user or a multi-year veteran, we greatly appreciate you for the time you’ve spent tinkering around with our source code, sending us feedback, and improving our codebase. PlaidML would truly not be the same without you.

The feedback we have received from our users indicates an ever-increasing need for performance, programmability, and portability. During the past few months, we have been restructuring PlaidML to address those needs. Below is a summary of the biggest changes:

- We’ve adopted MLIR, an extensible compiler infrastructure that has gained industry-wide adoption since its release in early 2019. MLIR makes it easier to integrate new software and hardware into our compiler stack, as well as making it easier to write optimizations for our compiler.
- We’ve worked extensively on Stripe, our low-level intermediate representation within PlaidML. Stripe contains optimizations that greatly improve the performance of our compiler. While our work on Stripe began before we decided to use MLIR, we are in the process of fully integrating Stripe into MLIR.
- We created our C++/Python embedded domain-specific language (EDSL) to improve the programmability of PlaidML.

Today, we’re announcing a new branch of PlaidML: plaidml-v1. This will act as our development branch going forward and will allow us to more rapidly prototype the changes we’re making without breaking our existing user base. As a precaution, please note that certain features, tests, and hardware targets may be broken in plaidml-v1, as it is a research project. Right now, plaidml-v1 only supports Intel and AMD CPUs with AVX2 and AVX512 support.

You can continue to use code on the master branch or from our releases on PyPI. For your convenience, the contents of our master branch will be released as version 0.7.0. There is no further development in this branch. plaidml-v1 is a research project.

PlaidML is an advanced and portable tensor compiler for enabling deep learning on laptops, embedded devices, or other devices where the available computing hardware is not well supported or the available software stack contains unpalatable license restrictions.

PlaidML sits underneath common machine learning frameworks, enabling users to access any hardware supported by PlaidML. PlaidML supports Keras, ONNX, and nGraph.

As a component within the nGraph Compiler stack, PlaidML further extends the capabilities of specialized deep-learning hardware (especially GPUs) and makes it both easier and faster to access or make use of subgraph-level optimizations that would otherwise be bounded by the compute limitations of the device.

As a component under Keras, PlaidML can accelerate training workloads with customized or automatically-generated Tile code. It works especially well on GPUs, and it doesn't require use of CUDA/cuDNN on Nvidia hardware, while achieving comparable performance.

PlaidML works on all major operating systems: Linux, macOS, and Windows.

Building PlaidML from source

Because the build environment is managed with conda, PlaidML runs on all major Linux distributions.

export PLAIDML_WORKSPACE_DIR=[choose a directory of your choice]

# setting up miniconda env
cd ${PLAIDML_WORKSPACE_DIR}
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.12.0-Linux-x86_64.sh
bash Miniconda3-py37_4.12.0-Linux-x86_64.sh -p ${PLAIDML_WORKSPACE_DIR}/miniconda3
eval "$(${PLAIDML_WORKSPACE_DIR}/miniconda3/bin/conda shell.bash hook)"
conda activate

# clone plaidml-v1 and set up env
git clone https://github.com/plaidml/plaidml.git --recursive -b plaidml-v1
cd plaidml
conda env create -f environment.yml -p .cenv/
conda activate .cenv/

# we might need to go into .cenv/bin and create a symlink
cd .cenv/bin/
ln -s ninja ninja-build
cd ../../

# preparing PlaidML build
./configure

# building PlaidML
cd build-x86_64/Release
ninja && PYTHONPATH=$PWD python plaidml/plaidml_setup.py

Demos and Related Projects

Plaidbench

Plaidbench is a performance testing suite designed to help users compare the performance of different cards and different frameworks.

cd build-x86_64/Release
ninja plaidbench_py && PYTHONPATH=$PWD KMP_AFFINITY=granularity=fine,verbose,compact,1,0 OMP_NUM_THREADS=8 python plaidbench/plaidbench.py -n128 keras resnet50

The command above is suited to 8-core Intel/AMD CPUs with hyper-threading enabled. For example, on an Intel i9-11900K we expect a latency of around 8.5 ms.
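To adapt the Plaidbench command to a machine with a different core count, the environment can be assembled programmatically. The following is a minimal sketch and not part of PlaidML: the helper name `plaidbench_invocation` is hypothetical, and it assumes that `OMP_NUM_THREADS` should match the number of physical cores, as in the 8-core command. All other values are copied from that command.

```python
import os
import shlex


def plaidbench_invocation(physical_cores, n=128, network="resnet50"):
    """Return (env, argv) mirroring the documented plaidbench command.

    Hypothetical helper: only OMP_NUM_THREADS varies with the core count
    (an assumption); the remaining values come from the documented command.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    env = {
        "PYTHONPATH": os.getcwd(),  # the documented command uses $PWD
        "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
        "OMP_NUM_THREADS": str(physical_cores),
    }
    argv = ["python", "plaidbench/plaidbench.py", f"-n{n}", "keras", network]
    return env, argv


env, argv = plaidbench_invocation(physical_cores=8)
# Print the equivalent shell command line for inspection.
print(" ".join(f"{k}={shlex.quote(v)}" for k, v in env.items()), shlex.join(argv))
```

To launch the benchmark from Python instead of printing it, pass the result to `subprocess.run(argv, env={**os.environ, **env})` from inside `build-x86_64/Release`.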

Reporting Issues

Open a ticket on GitHub.

CI & Validation

Validated Hardware

A comprehensive set of tests is run for each release against the hardware targets listed below.

- AMD CPUs with AVX2 and AVX512
- Intel CPUs with AVX2 and AVX512

Validated Networks

We support all of the Keras application networks from current versions of 2.x. Validated networks are tested for performance and correctness as part of our continuous integration system.

CNNs
- Inception v3
- ResNet50
- VGG19
- VGG16
- Xception
- DenseNet

openvino's Issues

Add LSTMSequence

Needs SCF (plaidml/plaidml#1494) or maybe eDSL looping (plaidml/plaidml#1466)

Handle multiple-output ops

Operations that produce multiple outputs currently cause map::at errors; this appears related to how we index tensorIONameMap_.

Add BinaryConvolution

Fix FP16 Testing

Tests of FP16 ops fail for the PlaidML plugin in bizarre ways. An example error message is:

Relative comparison of values expected: -4713.58544921875 and actual: -54740496384 at index 0 with threshold 0.0099999997764825821 failed

This is especially notable as the most negative FP16 value is -65504, so it would not be possible to achieve this value in FP16.

However, CPU tests for these same ops with fp16 precision do pass.
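The range argument can be checked with Python's standard `struct` module, whose `e` format is IEEE 754 half precision. This is only an illustrative sketch: it does not touch the PlaidML plugin or the test harness, and the helper name `fits_in_fp16` is hypothetical.

```python
import struct

# Most negative finite half-precision value: bit pattern 0xFBFF (little-endian bytes ff fb).
fp16_lowest = struct.unpack("<e", bytes.fromhex("fffb"))[0]

# The "actual" value quoted in the error message.
reported_actual = -54740496384.0


def fits_in_fp16(value):
    """Return True if value lies within the finite FP16 range after rounding."""
    try:
        struct.pack("<e", value)
        return True
    except OverflowError:
        # struct refuses finite values that are too large for half precision.
        return False


print(fp16_lowest)
print(fits_in_fp16(-4713.58544921875))
print(fits_in_fp16(reported_actual))
```

The expected value is within range while the reported value is not, which is consistent with the observation that the reported number cannot be a finite FP16 result.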

Add ExtractImagePatches

Add ReadValue

Add ScatterNDUpdate

Wait for an updated eDSL scatter (plaidml/plaidml#1493)

Add Assign

Add Mish

If we add softplus (see plaidml/plaidml#1496) we can use it here
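For reference, the relationship between the two ops is mish(x) = x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x). The sketch below is a scalar reference in plain Python, not eDSL code, and the rearranged softplus is just one common way to avoid overflow for large inputs.

```python
import math


def softplus(x):
    """softplus(x) = ln(1 + e^x), rearranged so exp() never sees a large positive argument."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """mish(x) = x * tanh(softplus(x)), built directly on softplus."""
    return x * math.tanh(softplus(x))


for x in (-2.0, 0.0, 1.0, 1000.0):
    print(x, softplus(x), mish(x))
```

A scalar reference like this could serve as a comparison point when checking a plugin implementation against expected values.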

Add PriorBox

Add OneHot

I believe there is some existing work on this, claiming this while I confirm.

Add Interpolate

This would benefit from interpolated gather (plaidml/plaidml#1490)

Add NonMaxSuppression

Needs SCF or something similar (plaidml/plaidml#1494)

Add Hswish

Add GatherTree

Probably needs SCF (plaidml/plaidml#1494), although an eDSL looping construct may be sufficient (plaidml/plaidml#1466)

Add ShuffleChannels

Add ReverseSequence

There may be existing work... I'm assigning to myself to look

Add Proposal

Needs sort (plaidml/plaidml#1442)

Add RegionYolo

Add support for the RegionYolo op to the PlaidML plugin. It is possible that code for this already exists and just needs to be wrapped in calling code in inference-engine/src/plaidml_plugin/ops/region_yolo.cpp.

Add SoftPlus

Consider if it should be an eDSL intrinsic (plaidml/plaidml#1496)

Add ScatterUpdate

Wait for an updated eDSL scatter (plaidml/plaidml#1493)

Add Gather

To support this with good performance, we will want eDSL to have an axis parameter for gather (plaidml/plaidml#1491). This can be worked around at the cost of performance.

Add ReorgYolo

Add support for the ReorgYolo op to the PlaidML plugin. It is possible that code for this already exists and just needs to be wrapped in calling code in inference-engine/src/plaidml_plugin/ops/reorg_yolo.cpp.

Add PriorBoxClustered

Add EmbeddingSegmentsSum

Add DeformableConvolution

This will be easier if we wait for interpolated gather to be available in the eDSL (plaidml/plaidml#1490).

Add EmbeddingBagOffsetsSum

Add Bucketize

This might be more efficient if we can directly write SCF dialect (plaidml/plaidml#1494).

However, I believe this is nonetheless currently possible: repeat to add a new dimension whose size is the number of bucket divisions, broadcast-compare to each bucket value in this axis, do a sum contraction/reduction over this axis to get the bucket number.
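The repeat, compare, and reduce steps described above can be sketched in plain Python, with nested lists standing in for tensors. This is not eDSL code. It assumes the boundaries are sorted in ascending order, and the `with_right_bound` handling reflects one reading of the Bucketize op (a value equal to a boundary falls in the lower bucket when the flag is true), which should be checked against the op specification.

```python
def bucketize(values, boundaries, with_right_bound=True):
    """Return the bucket index of each value using compare-and-sum.

    Assumes boundaries are sorted ascending. The with_right_bound
    semantics are an assumption to verify against the op specification.
    """
    # Steps 1 and 2: give every value a new axis with one entry per boundary,
    # holding the result of comparing the value against that boundary.
    compare = [
        [(v > b) if with_right_bound else (v >= b) for b in boundaries]
        for v in values
    ]
    # Step 3: sum over the boundary axis; the count of boundaries passed
    # is the bucket number.
    return [sum(row) for row in compare]


print(bucketize([0, 1, 3, 4, 12], [1, 4, 10]))
print(bucketize([0, 1, 3, 4, 12], [1, 4, 10], with_right_bound=False))
```

The intermediate comparison has size (number of values) x (number of boundaries), which is the cost this formulation pays compared with a search-based approach.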