intel / workload-services-framework

License: Apache License 2.0

workload-services-framework's Issues

OpenSSL3-RSAMB kubernetes-config.yaml CONFIG is always set to "qat-rsa"

Summary

When running the OpenSSL3-RSAMB workload with various qatsw_* test cases, the generated Kubernetes configuration always sets CONFIG to "qat-rsa".

Workload Release Version

v23.2

Configuration

-- Setting: PLATFORM=SRF, ARCH=linux/amd64
-- Setting: REGISTRY=docker-registry.services.svc.cluster.local:5000/workload_public/openssl3-rsamb/
-- Setting: RELEASE=:v23.2
-- Setting: TIMEOUT=86400
-- Setting: BENCHMARK=OpenSSL3-RSAMB
-- Setting: BACKEND=kubernetes

Issue

During our testing of OpenSSL3-RSAMB with multiple qatsw_* test cases, we observed that every test_openssl3_rsamb_qatsw_* test case produces RSA KPI results. The Kubernetes configuration file (kubernetes-config.yaml) generated by ctest.sh sets the CONFIG environment variable to "qat-rsa" for all of these test cases.

The kubernetes-config.yaml below was generated by running ctest.sh with the test_openssl3_rsamb_qatsw_aes-gcm test case:

#
# Apache v2 license
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#


apiVersion: batch/v1
kind: Job
metadata:
  name: benchmark
spec:
  template:
    spec:
      containers:
      - name: benchmark
        image: openssl3-rsamb-qat-sw:v23.2
        imagePullPolicy: Always
        env:
      containers:
      - name: benchmark
        image: openssl3-rsamb-qat-sw:v23.2
        imagePullPolicy: Always
        env:
        - name: CONFIG
          value: "qat-rsa"
        securityContext:
          privileged: true
      restartPolicy: Never
  backoffLimit: 4

Expected

#
# Apache v2 license
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#


apiVersion: batch/v1
kind: Job
metadata:
  name: benchmark
spec:
  template:
    spec:
      containers:
      - name: benchmark
        image: openssl3-rsamb-qat-sw:v23.2
        imagePullPolicy: Always
        env:
      containers:
      - name: benchmark
        image: openssl3-rsamb-qat-sw:v23.2
        imagePullPolicy: Always
        env:
        - name: CONFIG
          value: "qat-aes-gcm"
        securityContext:
          privileged: true
      restartPolicy: Never
  backoffLimit: 4
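A check for this mismatch can be sketched in a few lines of Python. This is a minimal sketch, not part of WSF: the helper names are hypothetical, and the naming rule (test_openssl3_rsamb_qatsw_<algo> implies CONFIG "qat-<algo>") is an assumption inferred only from the aes-gcm example above.

```python
import re
from typing import Optional

# Assumed test case prefix, taken from the test case name quoted in this issue.
PREFIX = "test_openssl3_rsamb_qatsw_"


def expected_config(test_case: str) -> str:
    """Derive the CONFIG value implied by a test case name (assumed rule)."""
    if not test_case.startswith(PREFIX):
        raise ValueError(f"unexpected test case name: {test_case}")
    return "qat-" + test_case[len(PREFIX):]


def config_value(manifest: str) -> Optional[str]:
    """Return the value of the CONFIG env entry in the manifest text, if any."""
    match = re.search(
        r'-\s*name:\s*CONFIG\s*\n\s*value:\s*"?([^"\n]+)"?', manifest
    )
    return match.group(1).strip() if match else None


def config_matches(test_case: str, manifest: str) -> bool:
    """True when the manifest's CONFIG agrees with the test case name."""
    return config_value(manifest) == expected_config(test_case)
```

Running config_matches against the generated file for each qatsw_* test case would flag every case other than the RSA one, which is the behavior reported here.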

Docker Image Name Inconsistency Between Make Build and Kubernetes Config for v23.3 Release

Summary

When building a Docker image for the HammerDB-TPCC workload using the make command with the v23.3 release, the generated Docker image has a different name compared to the one specified in Kubernetes configuration files. This issue was not present in the v23.2 release.

The Kubernetes configuration file expects the image name to be: tpcc-mysql8031-base:v23.3, but the make build generates the Docker image name as: mysql8031-base:v23.3. This discrepancy has been observed across multiple workloads, including Fio, BertLarge-PyTorch, SmartScience, Video-Structure, CDN-NGINX, 3DHuman-Pose-Estimation, SpecCpu-2017, ResNet50-PyTorch, SPDK-NVMe-o-TCP, and Istio-Envoy.

Workload Release Version

v23.3

Configuration

  • Setting: PLATFORM=
  • Setting: REGISTRY=
  • Setting: RELEASE=:v23.3
  • Setting: TIMEOUT=86400
  • Setting: BACKEND=kubernetes

Issue

We are encountering an issue while attempting to build Docker images for the HammerDB-TPCC workload using the make command from the workload directory. The problem lies in the inconsistency between the Docker image name generated by make and the image name specified in our Kubernetes manifest files. In the case of v23.3 release, the Kubernetes config file expects the image name to be: tpcc-mysql8031-base:v23.3, while the make build process generates the Docker image name as: mysql8031-base:v23.3.

This inconsistency has also been identified across several other workloads, including Fio, BertLarge-PyTorch, SmartScience, Video-Structure, CDN-NGINX, 3DHuman-Pose-Estimation, SpecCpu-2017, ResNet50-PyTorch, SPDK-NVMe-o-TCP, and Istio-Envoy.

Expected

The image name specified in the Kubernetes configuration file should match the Docker image name generated by the make build process: tpcc-mysql8031-base:v23.3.
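The mismatch can be detected mechanically by comparing the image references in a manifest with the names of the images that were actually built. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the manifest is read as plain text rather than parsed as YAML, and image names are compared exactly, with no handling of registry prefixes.

```python
import re
from typing import Iterable, List, Set


def manifest_images(manifest: str) -> Set[str]:
    """Collect every value of an `image:` key in the manifest text."""
    pattern = r'^\s*(?:-\s*)?image:\s*"?([^\s"]+)"?\s*$'
    return set(re.findall(pattern, manifest, flags=re.MULTILINE))


def missing_images(manifest: str, built_images: Iterable[str]) -> List[str]:
    """Return images the manifest expects that are not among the built ones."""
    return sorted(manifest_images(manifest) - set(built_images))
```

With the names from this issue, a manifest that references tpcc-mysql8031-base:v23.3 and a build that produced only mysql8031-base:v23.3 would report tpcc-mysql8031-base:v23.3 as missing. The list of built images is supplied by the caller, for example from the output of docker images.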

WSF External Workload Docker Builds Fail For Some Workloads

Summary

We are trying to build Docker images for the WSF external workloads using the make command with the v23.3 release. The image builds fail for some of the workloads, with various errors.

Build Issues

Workloads and their respective issues

Workload : BERTLarge-PyTorch-Xeon-Public

ERROR: Could not find a version that satisfies the requirement tornado==6.3.3
ERROR: No matching distribution found for tornado==6.3.3

Workload : SpecCpu-2017

ERROR: failed to solve: failed to compute cache key: failed to calculate the checksum of ref 8f74b3ec-14bb-4b3f-bf4a-2415442bc78c::s1b8k8203tvipdcv00euqyggg: "/data": not found

Workload : SmartScience-YOLO-MSTCN-OpenVINO

ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref 7076f431-11b5-40fa-bd05-b9b81ac8589c::fheik60wiqbjlxsagh6d5d4hc: "/script": not found

Workload : SPDK-NVMe-o-TCP

Running make from the workload directory does not trigger any Docker build. This looks like a bug in the make steps.

Workload : Malconv

AssertionError: Framework is not detected correctly from the model format. This could be caused by an unsupported model or inappropriate framework installation.

Workload : Video-Structure

Same issue as SPDK-NVMe-o-TCP

Workload : ResNet50-PyTorch-Xeon-Public

ERROR: Could not find a version that satisfies the requirement tornado==6.3.3
ERROR: No matching distribution found for tornado==6.3.3

Workload : 3DHuman-Pose-Estimation

ERROR: failed to solve: failed to compute cache key: failed to calculate the checksum of ref 8f74b3ec-14bb-4b3f-bf4a-2415442bc78c::lsi9ljjqcrg6rzsiznnqlueve: "/motion-tracking-sdk": not found
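Errors of the form "failed to compute cache key ... not found" usually mean that a COPY or ADD source is absent from the Docker build context. That diagnosis is an assumption here and is not confirmed in this issue. Under that assumption, a rough pre-build check could look like the sketch below. The function name is hypothetical, and the parsing is deliberately simple: it ignores line continuations, the JSON array form, wildcards, and .dockerignore.

```python
import os
from typing import List


def missing_copy_sources(dockerfile_text: str, context_dir: str) -> List[str]:
    """List COPY/ADD sources that do not exist under the build context."""
    missing = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0].upper() not in ("COPY", "ADD"):
            continue
        flags = [p for p in parts[1:] if p.startswith("--")]
        if any(f.startswith("--from") for f in flags):
            continue  # source comes from another build stage, not the context
        args = [p for p in parts[1:] if not p.startswith("--")]
        for src in args[:-1]:  # the last argument is the destination
            if src.startswith(("http://", "https://")) or "*" in src or "?" in src:
                continue  # remote URLs and wildcards are out of scope here
            # Sources are resolved relative to the context, even if written as /data.
            if not os.path.exists(os.path.join(context_dir, src.lstrip("/"))):
                missing.append(src)
    return missing
```

Calling it with a workload's Dockerfile text and the directory passed to docker build would list sources such as data, script, or motion-tracking-sdk if they are not present in that directory.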

Running WSF External Workloads on a Kubernetes Pod without using terraform docker image

Summary

We are exploring the possibility of running WSF external workloads on a K8s pod. We are considering the 23.1 branch of WSF. Our primary objective is to determine if it's feasible to execute these workloads directly on the pod (just like a local execution), without the need to pull Docker images for running the workload.

Platform

Kubernetes EKS 1.23

What is expected?

The ability to run workloads on a pod without using the Terraform Docker image that the Jenkins jobs use. Can we achieve this?

Docker image build failed for PyTorch-Xeon

While building Docker images for the ResNet50-PyTorch-Xeon-Public and BERTLarge-PyTorch-Xeon-Public workloads, the build of the intermediate PyTorch-Xeon image failed while setting up the Conda environment.

https://github.com/intel/workload-services-framework/blob/23.2/stack/PyTorch-Xeon/Dockerfile.2.intel_public#L64

One of the dependency packages, tornado, dropped support for Python 3.7 in its latest versions, so the build throws the following error:

#10 27.09 ERROR: Could not find a version that satisfies the requirement tornado==6.3.2

A couple of options to resolve this:

  • Upgrade the Python version to 3.8 or higher
  • Fix tornado version to <= 6.2.0
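The second option can be expressed as a version-dependent pin. The sketch below is illustrative only: the function name is hypothetical, and the cut-off at Python 3.8 reflects the observation above that recent tornado releases no longer support Python 3.7.

```python
import sys
from typing import Tuple


def tornado_requirement(python_version: Tuple[int, int] = sys.version_info[:2]) -> str:
    """Pick a pip requirement string for tornado based on the Python version."""
    if tuple(python_version) < (3, 8):
        # Older interpreters: stay on a release that still supports them.
        return "tornado<=6.2.0"
    # Python 3.8 or newer: the version requested by the failing build can be used.
    return "tornado==6.3.2"
```

For example, tornado_requirement((3, 7)) yields the capped requirement, which could then be passed to pip install in place of the hard-coded tornado==6.3.2.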
