
License: Apache License 2.0


workload-services-framework's Introduction

Note: The Workload Services Framework is a benchmarking framework and is not intended for deploying workloads in production environments. Users should consider any adjustments necessary to deploy these workloads in production, including those needed to implement software best practices for workload scalability and security.

Introduction

This is the Workload Services Framework repository. The repository contains a set of workloads optimized for Intel(R) Xeon(R) platforms. See the list of supported workloads under the workload directory.

Setup

  • Sync your system date/time. This is required for credential authorization.
  • If you are behind a corporate firewall, set http_proxy, https_proxy, and no_proxy in /etc/environment (a minimal sketch follows this list).
  • Run the setup-dev.sh script to set up the development host for Cloud and On-Premises workload development and evaluation. See Cloud and On-Premises Setup for more details on the setup.
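
A minimal sketch of the proxy setup, assuming a Debian/Ubuntu-style /etc/environment and placeholder proxy addresses (replace them with your corporate values); the setup-dev.sh path may vary by release:

echo 'http_proxy=http://proxy.example.com:911' | sudo tee -a /etc/environment    # placeholder proxy
echo 'https_proxy=http://proxy.example.com:912' | sudo tee -a /etc/environment   # placeholder proxy
echo 'no_proxy=localhost,127.0.0.1' | sudo tee -a /etc/environment               # hosts that bypass the proxy
./setup-dev.sh                         # set up the development host; run from the directory containing the script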

Evaluate Workload

Evaluate any workload as follows:

mkdir build 
cd build
cmake ..                               # .. is required here
cd workload/OpenSSL-RSAMB              # Go to any workload folder
./ctest.sh -N                          # List all test cases
./ctest.sh -R test_openssl_rsamb_sw_rsa -V  # Evaluate a specific test case
./list-kpi.sh logs*                    # Show KPIs

  • The WSF supports multiple validation backends. By default, the docker backend (or the Kubernetes backend, if available) is used to evaluate workloads locally. To evaluate workloads on Cloud or in an on-premises cluster, use the terraform backend; additional setup is required, such as configuring Cloud account credentials. A configuration sketch follows this list.
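
A minimal sketch of selecting the terraform backend at configuration time; BACKEND is the cmake setting reflected in the configuration output quoted in the issues below, while provider credentials (for example AWS or Azure) must be configured separately and are not shown here:

cd build
cmake -DBACKEND=terraform ..           # use the terraform backend instead of docker/kubernetes
cd workload/OpenSSL-RSAMB
./ctest.sh -N                          # test cases now target the configured Cloud/on-premises SUT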

Build Workload

mkdir -p build
cd build
cmake -DREGISTRY= -DBENCHMARK=ResNet-50 ..
cd workload/ResNet-50
make
./ctest.sh -N

TIP: You can specify BENCHMARK to limit the repository scope to the specified workload. The build and test operations on all other workloads are disabled. See Build Options for details.

cd build
cmake -DBENCHMARK=ResNet-50 ..
make
./ctest.sh -N


workload-services-framework's People

Contributors

amit152769, ddmatthe, dependabot[bot], fruitboy1226, huangguanxu, jiangrenzhi1226, lianhao, liuleomiao, logonletty, mrw7, palade, paladin2000cn, praveennish, przmk0, rdower, rsznejder, tangrui333, tank-tong, wech3, xiaohui1wang, xiaoshaw, xwu2intel, ywu75, yyxxll12345


workload-services-framework's Issues

Docker image build failed for PyTorch-Xeon

As part of building the docker images for the ResNet50-PyTorch-Xeon-Public and BERTLarge-PyTorch-Xeon-Public workloads, the build of the PyTorch-Xeon intermediary image failed while setting up the Conda environment.

https://github.com/intel/workload-services-framework/blob/23.2/stack/PyTorch-Xeon/Dockerfile.2.intel_public#L64

One of the dependency packages, tornado, has dropped support for Python 3.7 in its latest versions, which leads to the following error:

#10 27.09 ERROR: Could not find a version that satisfies the requirement tornado==6.3.2

A couple of options to resolve this (a sketch of the second option follows the list):

  • Upgrade the Python version to 3.8 or higher here
  • Pin the tornado version to <= 6.2.0
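
A minimal sketch of the pinning option, assuming pip is what installs the Python dependencies in the stack's Dockerfile (the actual install line or requirements file may differ):

pip install "tornado<=6.2.0"           # last tornado series that still supports Python 3.7
# Dockerfile variant of the same pin:
# RUN pip install "tornado<=6.2.0"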

OpenSSL3-RSAMB kubernetes-config.yaml CONFIG is always set to "qat-rsa"

Summary

When running the OpenSSL3-RSAMB workload with various qatsw_* test cases, the Kubernetes configuration consistently points to "qat-rsa".

Workload Release Version

v23.2

Configuration

-- Setting: PLATFORM=SRF, ARCH=linux/amd64
-- Setting: REGISTRY=docker-registry.services.svc.cluster.local:5000/workload_public/openssl3-rsamb/
-- Setting: RELEASE=:v23.2
-- Setting: TIMEOUT=86400
-- Setting: BENCHMARK=OpenSSL3-RSAMB
-- Setting: BACKEND=kubernetes

Issue

During our testing of OpenSSL3-RSAMB with multiple qatsw_* test cases, we observed that test_openssl3_rsamb_qatsw_* consistently produces RSA KPI results. An interesting observation is that the Kubernetes configuration file (kubernetes-config.yaml) generated by ctest.sh sets the CONFIG environment variable to "qat-rsa" for all of these test cases.

The kubernetes-config.yaml below is from running ctest.sh with the test_openssl3_rsamb_qatsw_aes-gcm test case:

#
# Apache v2 license
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#


apiVersion: batch/v1
kind: Job
metadata:
  name: benchmark
spec:
  template:
    spec:
      containers:
      - name: benchmark
        image: openssl3-rsamb-qat-sw:v23.2
        imagePullPolicy: Always
        env:
        - name: CONFIG
          value: "qat-rsa"
        securityContext:
          privileged: true
      restartPolicy: Never
  backoffLimit: 4

Expected

#
# Apache v2 license
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#


apiVersion: batch/v1
kind: Job
metadata:
  name: benchmark
spec:
  template:
    spec:
      containers:
      - name: benchmark
        image: openssl3-rsamb-qat-sw:v23.2
        imagePullPolicy: Always
        env:
        - name: CONFIG
          value: "qat-aes-gcm"
        securityContext:
          privileged: true
      restartPolicy: Never
  backoffLimit: 4
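
A quick way to confirm which CONFIG value was generated for a given test case; the path below is an assumption (adjust it to wherever ctest.sh writes kubernetes-config.yaml in your run):

grep -A1 'name: CONFIG' logs-*/kubernetes-config.yaml   # expect value: "qat-aes-gcm" for the aes-gcm test case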

WSF External Workload Docker Builds Fail For Some Workloads

Summary

We are trying to build docker images for the WSF external workloads using the make command with the v23.3 release; the docker image builds fail for some of the workloads with various issues.

Build Issues

Workloads and their respective issues

Workload : BERTLarge-PyTorch-Xeon-Public

ERROR: Could not find a version that satisfies the requirement tornado==6.3.3
ERROR: No matching distribution found for tornado==6.3.3

Workload : SpecCpu-2017

ERROR: failed to solve: failed to compute cache key: failed to calculate the checksum of ref 8f74b3ec-14bb-4b3f-bf4a-2415442bc78c::s1b8k8203tvipdcv00euqyggg: "/data": not found

Workload : SmartScience-YOLO-MSTCN-OpenVINO

ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref 7076f431-11b5-40fa-bd05-b9b81ac8589c::fheik60wiqbjlxsagh6d5d4hc: "/script": not found

Workload : SPDK-NVMe-o-TCP

Docker builds do not start when running make from the workload directory. There appears to be a bug in the Make Steps.

Workload : Malconv

AssertionError: Framework is not detected correctly from the model format. This could be caused by an unsupported model or inappropriate framework installation.

Workload : Video-Structure

Same issue as SPDK-NVMe-o-TCP

Workload : ResNet50-PyTorch-Xeon-Public

ERROR: Could not find a version that satisfies the requirement tornado==6.3.3
ERROR: No matching distribution found for tornado==6.3.3

Workload: 3DHuman-Pose-Estimation

ERROR: failed to solve: failed to compute cache key: failed to calculate the checksum of ref 8f74b3ec-14bb-4b3f-bf4a-2415442bc78c::lsi9ljjqcrg6rzsiznnqlueve: "/motion-tracking-sdk": not found

Docker Image Name Inconsistency Between Make Build and Kubernetes Config for v23.3 Release

Summary

When building a Docker image for the HammerDB-TPCC workload using the make command with the v23.3 release, the generated Docker image has a different name compared to the one specified in Kubernetes configuration files. This issue was not present in the v23.2 release.

The Kubernetes configuration file expects the image name to be: tpcc-mysql8031-base:v23.3, but the make build generates the Docker image name as: mysql8031-base:v23.3. This discrepancy has been observed across multiple workloads, including Fio, BertLarge-PyTorch, SmartScience, Video-Structure, CDN-NGINX, 3DHuman-Pose-Estimation, SpecCpu-2017, ResNet50-PyTorch, SPDK-NVMe-o-TCP, and Istio-Envoy.

Workload Release Version

v23.3

Configuration

  • Setting: PLATFORM=
  • Setting: REGISTRY=
  • Setting: RELEASE=:v23.3
  • Setting: TIMEOUT=86400
  • Setting: BACKEND=kubernetes

Issue

We are encountering an issue while attempting to build Docker images for the HammerDB-TPCC workload using the make command from the workload directory. The problem is an inconsistency between the Docker image name generated by make and the image name specified in our Kubernetes manifest files. In the case of the v23.3 release, the Kubernetes config file expects the image name to be tpcc-mysql8031-base:v23.3, while the make build process generates the Docker image name mysql8031-base:v23.3.

This inconsistency has also been identified across several other workloads, including Fio, BertLarge-PyTorch, SmartScience, Video-Structure, CDN-NGINX, 3DHuman-Pose-Estimation, SpecCpu-2017, ResNet50-PyTorch, SPDK-NVMe-o-TCP, and Istio-Envoy.

Expected

The image name specified in the Kubernetes configuration file should match the Docker image name generated by the make build process: tpcc-mysql8031-base:v23.3.
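
Until the naming is aligned, a possible interim workaround (not an official fix) is to retag the image that make produced so that it matches what the Kubernetes config expects:

docker tag mysql8031-base:v23.3 tpcc-mysql8031-base:v23.3   # give the built image the expected name
# If a private registry is used, push the retagged image too (the registry prefix is a placeholder):
# docker tag tpcc-mysql8031-base:v23.3 <registry>/tpcc-mysql8031-base:v23.3 && docker push <registry>/tpcc-mysql8031-base:v23.3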

Running WSF External Workloads on a Kubernetes Pod without using terraform docker image

Summary

We are exploring the possibility of running WSF external workloads on a K8s pod. We are considering the 23.1 branch of WSF. Our primary objective is to determine if it's feasible to execute these workloads directly on the pod (just like a local execution), without the need to pull Docker images for running the workload.

Platform

Kubernetes EKS 1.23

What is expected?

The ability to run workloads on a pod without using the Terraform docker image, as is done in Jenkins jobs. Can we achieve this?
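
One possible approach, not confirmed by the maintainers, is to let ctest.sh generate the Kubernetes manifest and then apply it directly with kubectl, bypassing the terraform image entirely; this sketch assumes the images referenced in the manifest are already reachable by the cluster and that KPI collection is done manually afterwards:

kubectl apply -f kubernetes-config.yaml   # submit the generated benchmark Job directly
kubectl logs -f job/benchmark             # follow output; the Job is named "benchmark" in the generated manifests shown earlier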
