- Victor is an NPU software architect at Intel.
- Victor received his master's degree in EE from Zhejiang University and joined Intel Flex in 2011. He started his career in graphics driver and C-for-Media runtime development for Intel GPUs.
- Since Q4 2018, Victor has worked in VPU Architecture on various projects, including hardware numeric emulation, neural network low-bit quantization and pruning, and VPU performance modeling.
- AI software architecture for multiple generations of the Intel NPU product line.
- Focus on compilation technology to run models efficiently on the NPU, including layer fusion, vertical fusion, operator tiling, and scheduling optimization.
- LLM performance optimization, mixed precision, FlashAttention, task pipelining, etc.
- Key model performance analysis; worked with the engineering team to identify optimization opportunities and solutions.
- Author of nbperf (5-member team): a high-level abstract compiler that works with VPU-EM for accurate and reliable model performance simulation.
- Author of numericsbench (3-member team): numeric emulation software used for NPU numeric sign-off and validation.
- Academic paper
- Neural network quantization and pruning: QAT and Post-training quantization; PACT; Low-bit quantization; Mixed Precision Quantization
- Reinforcement learning and NAS-based approaches to search for optimal models that fit the VPU.
- Academic paper
- Focus on deep learning algorithm development for computer vision tasks.
- Led an innovation project "Personal Fitness Coach Powered by AI" incubated by China I2R
- Key developer for chip defect inspection in manufacturing.
- Runtime and User Mode Graphics driver development on multiple mainstream OSes (Linux and Windows)
- Optimized resource management and cross-layer code refactoring
- Cut validation time by 80% using virtualization technology
- Worked as a GPGPU (MDF) SDK runtime developer for Intel integrated GPUs (from Sandy Bridge to Skylake)
- Open-source project link: C for Media
- Mastery of Computer Vision (Convolutional Neural Networks, Human Pose Estimation, Product Defect Detection in Manufacturing)
- Mastery of Graphics Runtime Development (Resource Management, knowledge of the graphics subsystem on Windows (DX9/DX11) and the graphics software stack on Linux (libva))
- Mastery of Programming Languages and Tools (C/C++, Python, Keras, Caffe)
- Intel ZiZhu Innovation Star 2018
- Multi-person 2D human pose estimation; CNN design and optimization.
- Project introduction article: Get Your AI Fitness Coach
- Selected and incubated by the Intel China I2R Batch 4 Program
- Numbers: ~4x inference acceleration (depth-wise/separable conv, layer fusion, multi-task learning, clCaffe, FP16, inference engine); 30 fps on an i7-6700HQ
- The factory faced chip escapes with defects on the solder-resistor and land pad.
- Traditional CV methods could not meet the factory's requirements (false negatives and false positives).
- Designed a U-shaped segmentation network that outperformed the detection network.
- Designed an image synthesis algorithm to address the shortage of training samples.
- Competition to detect the keypoints of clothing to represent fashion, covering 5 categories: skirt, blouse, dress, trousers, and outwear.
- Keywords: Keras, U-net, GlobalNet+RefineNet, Multiple stack, Online hard negative mining.
- Ranked Top 2%, 45/2321, in the first round of the competition.
- Competition to identify nerve structures in ultrasound images.
- Keras, U-net, Dice coefficient loss, Transformation for data augmentation.
- Ranked Top 5%, 55/923
- Competition to classify driver behavior with a CNN, such as texting, drinking, or reaching behind while driving.
- Caffe, Fine-tuning from ResNet, Driver location normalization, Data augmentation. Dropout to overcome overfitting.
- Ranked Top 10%, 132/1440
- 2008-06-01 - 2011-03-01
- Wireless Sensor Network
- MAC (Media Access Control) Protocol
- 2002-08-01 - 2006-06-01
- Network
- C Programming Language
- Apparatus, method, device and medium for accelerating computation of process engine
- Methods and apparatus to accelerate convolution
- Apparatus and method for reinforcement learning based post-training sparsification
- Malware Detection in Memory
- Data Stored or Free Space based Fifo Buffer
- Facilitating Efficient Communication and Data Processing in a Heterogeneous Computing Environment
- Event-driven Framework for GPU Programming
- Execution Unit-Shared Hybrid Technique for Accelerated Computing on Graphics Processors
- Graphics Processing Unit Operation
- GPU-CPU Two-Path Memory Copy
- Method and Apparatus to Improve Shared Memory Efficiency
- Apparatus and Method to Improve Memory Access Performance Between Shared Local Memory and System Global Memory
- Fluent in English
- Native speaker of Mandarin