axs2mlperf's People

Contributors

akshat-tripathi elimkwan ens-lg4 g4v maria-18-git mosalov psyhtest sahelib25 xihajun

Forkers

boringresearch

axs2mlperf's Issues

Support cpupower

If we install linux-tools-* on Ubuntu (including in Docker):

sudo apt install linux-tools-common linux-tools-generic 

we can easily switch between different governors e.g.:

  • for Peak Performance profiles:
sudo cpupower frequency-set --governor performance
  • for Energy Efficiency profiles:
sudo cpupower frequency-set --governor powersave

More governors may be available, such as conservative and ondemand, but performance and powersave should be sufficient.

We would like to add these commands to axs, so that when governor=$GOVERNOR is specified, we automatically run:

sudo cpupower frequency-set --governor $GOVERNOR

before the experiment.
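
A minimal sketch of how this could look on the axs side, assuming a small Python helper is called before the experiment is launched; the function name set_governor is an assumption:

import subprocess

def set_governor(governor):
    """Switch all cores to the requested governor via cpupower (requires sudo)."""
    # hypothetical helper; governor comes from the governor=$GOVERNOR input parameter
    subprocess.run(["sudo", "cpupower", "frequency-set", "--governor", governor], check=True)

# e.g. call set_governor("performance") before launching a Peak Performance run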

Add input parameters for creating and running a container for all benchmarks

We should add input parameter(s) for creating and running a container, as in CK.
We have the following parameters in CK:
--docker=container_only
--container=$CONTAINER_ID

Examples in CK:
https://github.com/krai/ck-qaic-dev-krai/blob/2a10fb8eb7e806b0de29be40f28867cd1950b236/cmdgen/benchmark.inference.qaic-loadgen/.cm/meta.json#L123

https://github.com/krai/ck-qaic-dev-krai/blob/2a10fb8eb7e806b0de29be40f28867cd1950b236/cmdgen/benchmark.inference.qaic-loadgen/.cm/meta.json#L132

Currently, we have the following in axs:

mkdir -p ${WORKDIR}/${USER}/axs_experiment_${BENCHMARK} && \

docker run -it --name ${USER}_${BENCHMARK} --entrypoint /bin/bash --privileged --group-add $(getent group qaic | cut -d: -f3) \
-v ${WORKDIR}/${USER}/work_collection/axs2qaic-dev:/home/krai/work_collection/axs2qaic-dev_main \
-v ${WORKDIR}/${USER}/work_collection/axs2sut-dev:/home/krai/work_collection/axs2sut-dev_main \
-v ${WORKDIR}/${USER}/work_collection/axs2kilt-dev:/home/krai/work_collection/axs2kilt-dev_main \
-v ${WORKDIR}/${USER}/work_collection/axs2cpu-dev:/home/krai/work_collection/axs2cpu-dev_main \
-v ${WORKDIR}/${USER}/work_collection/axs2system-dev:/home/krai/work_collection/axs2system-dev_main \
-v ${WORKDIR}/${USER}/work_collection/kilt-mlperf-dev_main:/home/krai/work_collection/kilt-mlperf-dev_main \
-v ${WORKDIR}/${USER}/axs_experiment_${BENCHMARK}:/home/krai/experiment \
${IMAGE_NAME}
  • Add the ability to choose which repositories and which experiment directory to map, with sensible defaults (see the sketch below).
  • Add permissions for the experiment directory:
    0 - cannot delete
    1 - delete with confirmation
    2 - delete without confirmation
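
A minimal sketch of how the command above could be parameterised, assuming a Python helper composes it; the function name docker_run_cmd and the default repository list are assumptions:

import os, shlex

def docker_run_cmd(user, benchmark, image_name, workdir,
                   repos=("axs2qaic-dev", "axs2sut-dev", "axs2kilt-dev",
                          "axs2cpu-dev", "axs2system-dev")):
    """Compose the 'docker run' command, mounting only the selected repositories."""
    # (--group-add for the qaic group and other site-specific flags omitted for brevity)
    cmd = ["docker", "run", "-it", "--name", f"{user}_{benchmark}",
           "--entrypoint", "/bin/bash", "--privileged"]
    for repo in repos:                       # which repositories to map is now a parameter
        host = os.path.join(workdir, user, "work_collection", repo)
        cmd += ["-v", f"{host}:/home/krai/work_collection/{repo}_main"]
    experiment_dir = os.path.join(workdir, user, f"axs_experiment_{benchmark}")
    cmd += ["-v", f"{experiment_dir}:/home/krai/experiment", image_name]
    return shlex.join(cmd)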

Investigate difference between `model_name_compliance_dict` in `base_loadgen_program` and `model_name_dict` in `submitter`

There are currently two similar dictionaries: model_name_compliance_dict in base_loadgen_program and model_name_dict in submitter.

base_loadgen_program:
https://github.com/krai/axs2mlperf/blob/master/base_loadgen_program/data_axs.json#L94
submitter:
https://github.com/krai/axs2mlperf/blob/master/submitter/data_axs.json#L21

Find the difference between them or, if there is none, combine them into a single dictionary kept in one place.

Add `experiment_begin_timestamp` and `experiment_end_timestamp` for all experiments

All loadgen experiments should be recorded together with timestamps:
experiment_begin_timestamp - the time when the experiment starts.
experiment_end_timestamp - the time when the experiment stops.

So we need to add experiment_begin_timestamp and experiment_end_timestamp in the base class: set them when the output_entry is created and write them into the output_entry.
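
A minimal sketch of what this could look like, assuming the base class wraps the run in a single method; run_experiment, run_fn and the dict-like output_entry are placeholder names:

from datetime import datetime, timezone

def run_experiment(output_entry, run_fn):
    """Record begin/end timestamps around the actual loadgen run."""
    output_entry["experiment_begin_timestamp"] = datetime.now(timezone.utc).isoformat()
    try:
        run_fn()                      # the actual loadgen run
    finally:
        output_entry["experiment_end_timestamp"] = datetime.now(timezone.utc).isoformat()
    return output_entry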

Change dependency of `model_name` to `dataset_name` in `model_name2dataset_size` and `task` in `model_name2buffer_size`

There are two dictionaries, model_name2dataset_size and model_name2buffer_size, which allow setting dataset_size and buffer_size according to model_name in submitter.
We need to key model_name2dataset_size by dataset_name and model_name2buffer_size by task instead.
task will be added as an attribute for all programs instead of tags.
See issue:
#6
dataset_name should be taken from data_axs.json of model_recipe.
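
An illustrative sketch of the re-keyed dictionaries; the keys follow the dataset_name/task scheme from this issue, but the values below are made up, not taken from the real submitter entry:

dataset_name2dataset_size = {     # illustrative values only
    "imagenet":      50000,
    "openimages":    24781,
    "squad_v1_1":    10833,
    "cnndm_v3_0_0":  13368,
}
task2buffer_size = {              # illustrative values only
    "image_classification": 1024,
    "object_detection":     64,
    "bert":                 10833,
    "gptj":                 13368,
}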

Allow setting deterministic=False for an arbitrary pipeline command

For the main axs repo: find a way to pass deterministic=True to an arbitrary call() in a pipeline (in list format).
Ideally also allow testing it from command line.

Grep for deterministic in both main axs and axs2mlperf repositories after introducing this feature, to remove our workaround.

Add new attribute `task` for all programs instead of tags like `classified_imagenet`, `detected_coco`, `bert_squad`, `gptj_cnndm`

The new attribute task should be added for all benchmark programs (with and without KILT, with and without loadgen).
We need to update:
https://github.com/krai/axs2mlperf/tree/master
https://github.com/krai/axs2kilt-dev
https://github.com/krai/axs/tree/master/core_collection/workflows_collection

So tags like classified_imagenet, detected_coco, bert_squad, gptj_cnndm are replaced by the task attribute in _producer_rules according to the following table:

tag in _producer_rules                    attribute in _producer_rules   extra attribute
classified_imagenet or image_classifier   task=image_classification      dataset_name=imagenet
detected_coco                             task=object_detection          dataset_name=openimages/coco
bert_squad                                task=bert                      dataset_name=squad_v1_1
gptj_cnndm                                task=gptj                      dataset_name=cnndm_v3_0_0

Add task to each created entry, program and _producer_rules.
Remove these tags from output_entry_tags and move "output_entry_tags": [ "loadgen_output" ] to base_loadgen_program
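
For illustration, a query that currently selects by tag would instead select by the task attribute, mirroring the task=... queries already used for power experiments elsewhere in this list:

axs byquery loadgen_output,detected_coco,framework=onnx,model_name=retinanet_openimages,loadgen_scenario=Offline           # before (tag)
axs byquery loadgen_output,task=object_detection,framework=onnx,model_name=retinanet_openimages,loadgen_scenario=Offline   # after (task attribute)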

Need to update:
Without KILT:

With KILT:

Without loadgen and KILT:

Switch to JSON as internal parameter transfer mechanism for loadgen-based programs

Currently, parameters are passed to the execution scripts (in ONNX-classification, ONNX-detection, ONNX-BERT and PyTorch-classification) positionally. This is fragile (the "sending" and "receiving" sides have to be carefully kept in sync) and partially duplicates parameters that we already record in the experiment entry.

Instead, we could record ALL the input parameters in the experiment entry and pass the path to its data_axs.json file to the executing script as its only parameter.

In other words, the input parameters move from the command line into the experiment entry: all parameters end up in the data_axs.json of the output_entry, and the path to this JSON file becomes the only input parameter to the script.
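
A minimal sketch of the receiving side under this scheme; the parameter names below are only examples, not the real keys in data_axs.json:

import json, sys

def main():
    entry_json_path = sys.argv[1]            # the only positional argument
    with open(entry_json_path) as f:
        params = json.load(f)

    model_path   = params["model_path"]      # example keys, assumptions only
    dataset_size = params["loadgen_dataset_size"]
    scenario     = params["loadgen_scenario"]
    # ... run the benchmark with these parameters ...

if __name__ == "__main__":
    main()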

Add `lambada` dataset for `gptj`

Dataset:
https://huggingface.co/Intel/gpt-j-6B-int8-dynamic

All models supporting the lambada dataset: https://huggingface.co/models?dataset=dataset:lambada

We need to run with the following models:

  1. Intel/gpt-j-6B-pytorch-int8-static - https://huggingface.co/Intel/gpt-j-6B-pytorch-int8-static

  2. Intel/gpt-j-6B-int8-static - https://huggingface.co/Intel/gpt-j-6B-int8-static

  3. ENOT-AutoDL/gpt-j-6B-tensorrt-int8 - https://huggingface.co/ENOT-AutoDL/gpt-j-6B-tensorrt-int8 (we need to use TensorRT)

Try to use https://github.com/krai/axs2mlperf/tree/master/gptj_reference_loadgen for working with these models and lambada. Update gptj_reference_loadgen if needed.
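
A hedged sketch of pulling the lambada data and a GPT-J tokenizer with the Hugging Face libraries; the exact dataset id/split on the Hub, and whether the int8 checkpoints above load through plain transformers, are assumptions to verify:

from datasets import load_dataset
from transformers import AutoTokenizer

# dataset id "lambada" and split "validation" are assumptions; adjust if the Hub name differs
lambada = load_dataset("lambada", split="validation")

# tokenizer of the base GPT-J model; the int8 variants should share it
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")

sample = lambada[0]["text"]
inputs = tokenizer(sample, return_tensors="pt")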

`object_detection_onnx_loadgen_py` doesn't work for retinanet_openimages with a single command for accuracy or performance

Run

maria@dyson:~/work_collection/axs2mlperf$ 
axs byquery loadgen_output,detected_coco,framework=onnx,loadgen_scenario=Offline,loadgen_mode=AccuracyOnly,model_name=retinanet_openimages,loadgen_dataset_size=20,loadgen_buffer_size=100,execution_device=cpu
...
} saved to '/home/maria/work_collection/data_axs.json'
WARNING:root:The resolved downloading_tool_entry 'wget_tool' located at '/home/maria/work_collection/wget_tool' uses the shell tool '/usr/bin/wget'
WARNING:root:[shell]  parameter 'shell_cmd' is contained but BLOCKED, skipping further
------------------------------------------------------------------------------------------------------------------------
While computing nested_calls in {'file_name': 'extracted/openimages-mlperf.json', 'archive_path': ['^', 'execute', [[['byquery', [['downloaded', 'openimages_annotations', 'v2_1']]], ['get_path']]], {}], '__query': 'inference_ready,openimages_annotations,v2_1', 'tags': ['v2_1', 'openimages_annotations', 'inference_ready']} the following exception was raised: The "run" function is missing required positional arguments: ['shell_cmd']
========================================================================================================================
------------------------------------------------------------------------------------------------------------------------
While computing nested_calls in ['^', 'byquery', 'inference_ready,openimages_annotations,v2_1'] the following exception was raised: The "run" function is missing required positional arguments: ['shell_cmd']
========================================================================================================================
------------------------------------------------------------------------------------------------------------------------
While computing nested_calls in ['^^', 'execute', [[['get', 'openimages_annotations_v21_entry'], ['get_path']]]] the following exception was raised: The "run" function is missing required positional arguments: ['shell_cmd']
========================================================================================================================
------------------------------------------------------------------------------------------------------------------------
While computing nested_calls in [['^^', 'get', 'validation'], True, ['^^', 'substitute', 'cp #{openimages_mlperf_v2_1_path}# #{output_entry_path}#/annotations']] the following exception was raised: The "run" function is missing required positional arguments: ['shell_cmd']
========================================================================================================================
------------------------------------------------------------------------------------------------------------------------
While computing nested_calls in ['^^', 'case', [['^^', 'get', 'validation'], True, ['^^', 'substitute', 'cp #{openimages_mlperf_v2_1_path}# #{output_entry_path}#/annotations']], {'default_value': ''}] the following exception was raised: The "run" function is missing required positional arguments: ['shell_cmd']
========================================================================================================================
------------------------------------------------------------------------------------------------------------------------
While computing nested_calls in ['^^', 'substitute', ['^^', 'get', 'shell_cmd_with_subs']] the following exception was raised: The "run" function is missing required positional arguments: ['shell_cmd']
========================================================================================================================
------------------------------------------------------------------------------------------------------------------------
While computing nested_calls in ['^', 'execute', [[['byquery', [['^^', 'get', 'images_query']]]]], {}, ['images_query']] the following exception was raised: The "run" function is missing required positional arguments: ['shell_cmd']
========================================================================================================================
------------------------------------------------------------------------------------------------------------------------
While computing nested_calls in ['^^', 'execute', [[['get', 'images_entry'], ['get_path', [['^^', 'get', 'images_rel_dir']]]]]] the following exception was raised: The "run" function is missing required positional arguments: ['shell_cmd']
========================================================================================================================
------------------------------------------------------------------------------------------------------------------------
While computing nested_calls in {'resolution': 800, 'supported_extensions': ['jpeg', 'jpg', 'gif', 'png'], 'data_type': 'uint8', 'new_file_extension': 'rgb8', 'file_name': 'preprocessed', 'fof_name': 'original_dimensions.txt', '__query': ['preprocessed', 'dataset_name=openimages', 'resolution=800', 'first_n=20'], 'dataset_entry': ['^', 'byquery', [['^^', 'substitute', 'dataset,dataset_name=#{dataset_name}#']], {}, ['dataset_name']], 'images_dir': ['^^', 'dig', 'dataset_entry.images_dir'], 'annotation_data': ['^^', 'dig', 'dataset_entry.annotation_data'], 'dataset_name': 'openimages', 'first_n': 20, 'tags': ['preprocessed']} the following exception was raised: The "run" function is missing required positional arguments: ['shell_cmd']
========================================================================================================================
------------------------------------------------------------------------------------------------------------------------
While computing nested_calls in ['^', 'execute', [[['byquery', [['^^', 'get', 'preprocessed_images_query']]], ['get_path']]], {}, ['preprocessed_images_query']] the following exception was raised: The "run" function is missing required positional arguments: ['shell_cmd']
========================================================================================================================
...
  File "/home/maria/axs/core_collection/essentials_collection/downloader/code_axs.py", line 80, in download                                                                                     retval = downloading_tool_entry.call('run', [], {"url": url, "target_path": target_path, "record_entry_path": record_entry_path})                                                         File "/home/maria/axs/runnable.py", line 324, in call                                                                                                                                         action_object, joint_arg_tuple, optional_arg_dict   = function_access.prep(action_object, pos_params, self, captured_mapping)                                                             File "/home/maria/axs/function_access.py", line 85, in prep                                                                                                                                   raise TypeError( 'The "{}" function is missing required positional arguments: {}'
TypeError: The "run" function is missing required positional arguments: ['shell_cmd']

As a result, the model has been downloaded but not the other data.

maria@dyson:~/work_collection$ ll downloaded_openimages/
total 8
drwxr-xr-x  2 maria krai   27 Sep  6 11:21 ./
drwxr-xr-x 21 maria krai 4096 Sep  6 11:21 ../
-rw-r--r--  1 maria krai   99 Sep  6 11:21 data_axs.json
maria@dyson:~/work_collection$ ll downloaded_openimages-mlperf_annotations_2.1.json.zip/
total 8
drwxr-xr-x  2 maria krai   27 Sep  6 11:21 ./
drwxr-xr-x 21 maria krai 4096 Sep  6 11:21 ../
-rw-r--r--  1 maria krai  951 Sep  6 11:21 data_axs.json
maria@dyson:~/work_collection$ ll downloaded_resnext50_32x4d_fpn.onnx/
total 145212
drwxr-xr-x  2 maria krai        59 Sep  6 11:21 ./
drwxr-xr-x 21 maria krai      4096 Sep  6 11:21 ../
-rw-r--r--  1 maria krai      1583 Sep  6 11:21 data_axs.json
-rw-r--r--  1 maria krai 148688824 Sep  6 08:04 resnext50_32x4d_fpn.onnx

Rename entry names

According to https://github.com/krai/axs2kilt-dev/issues/37
rename:
base_imagenet_loadgen_experiment -> base_image_classification_loadgen_experiment
image_classification_onnx_loadgen_py -> image_classification_using_onnxrt_loadgen
image_classification_torch_loadgen_py -> image_classification_using_torch_loadgen
object_detection_onnx_loadgen_py -> object_detection_using_onnxrt_loadgen
bert_squad_onnxruntime_loadgen_py -> bert_using_onnxrt_loadgen
gptj_cnndm_reference_loadgen_py -> gptj_reference_loadgen

Move `num_gpu` and parameters which use it to the base class `nvidia_gpu_support` and add `supported_execution_providers`

Update all entries to support `entry_creator`

All axs entries should use entry_creator for object creation.
Currently, different ways of saving are used for this.
For example, set_path (with an entry_name setting) is used in the case of record_entry.

Entries without dependencies on other entries should be updated and tested first.
We also need to check whether we can move the pipeline parameter, e.g.

    "pipeline": [ "^^", "execute", [[
        [ "run" ],
        [],
        [ "get", "stored_newborn_entry" ]
    ]] ],

to entry_creator or not.

Remove links in power experiments

Currently, a power experiment contains symlinks to other experiments.
This makes it difficult to move power experiments to other machines, where the links will be broken.
We need to switch to JSON with the names of the experiments instead.
Example:

maria@eb6 ~/work_collection/axs2mlperf/base_loadgen_program (master *=)$ time axs byquery power_loadgen_output,task=image_classification,framework=onnxrt,loadgen_scenario=SingleStream,loadgen_mode=PerformanceOnly,model_name=resnet50,loadgen_dataset_size=20,loadgen_buffer_size=100,loadgen_target_latency=0.68,sut_name=eb6-kilt-qaic
...
['^', 'byname', 'generated_by_power_measurement_on_run_fb6c03e09eb54951a3c79345c1398afb']
maria@eb6 ~/work_collection/axs2mlperf/base_loadgen_program (master *=)$ ll ~/work_collection/generated_by_power_measurement_on_run_fb6c03e09eb54951a3c79345c1398afb
total 36
drwxr-xr-x  3 maria krai 4096 Oct 18 16:02 ./
drwxr-xr-x 99 maria krai 8192 Oct 18 15:52 ../
-rw-r--r--  1 maria krai 2698 Oct 18 16:02 data_axs.json
lrwxrwxrwx  1 maria krai  122 Oct 18 16:02 last_mlperf_logs -> /data/maria/work_collection/generated_by_image_classification_using_onnxrt_loadgen_on_get_883c03616cf64d7b96e67306a1c37338/
drwxr-xr-x  5 maria krai 4096 Oct 18 16:02 power_logs/
-rw-r--r--  1 maria krai  251 Oct 18 16:02 program_output.json
lrwxrwxrwx  1 maria krai  122 Oct 18 15:52 ranging_logs -> /data/maria/work_collection/generated_by_image_classification_using_onnxrt_loadgen_on_get_f3da5fb349de43179d7b637b3b09f46d/
lrwxrwxrwx  1 maria krai  122 Oct 18 16:02 testing_logs -> /data/maria/work_collection/generated_by_image_classification_using_onnxrt_loadgen_on_get_883c03616cf64d7b96e67306a1c37338/
maria@eb6 ~/work_collection/axs2mlperf/base_loadgen_program (master *=)$ cat ~/work_collection/generated_by_power_measurement_on_run_fb6c03e09eb54951a3c79345c1398afb/program_output.json
{
    "ranging_entry_name": "generated_by_image_classification_using_onnxrt_loadgen_on_get_f3da5fb349de43179d7b637b3b09f46d",
    "testing_entry_name": "generated_by_image_classification_using_onnxrt_loadgen_on_get_883c03616cf64d7b96e67306a1c37338"
}

Add support for `JSON` files as input and output for all benchmarks without `loadgen`

We need to add a JSON file as input, carrying all parameters, and a JSON file as output (a sketch follows the link lists below).
Update:
https://github.com/krai/axs/blob/master/core_collection/workflows_collection/image_classification/onnx_image_classifier/data_axs.json#L80

https://github.com/krai/axs/blob/master/core_collection/workflows_collection/image_classification/pytorch_image_classifier/data_axs.json#L53

https://github.com/krai/axs/blob/master/core_collection/workflows_collection/image_classification/tf_image_classifier/data_axs.json#L84

https://github.com/krai/axs/blob/master/core_collection/workflows_collection/object_detection/onnx_object_detector/data_axs.json#L80

https://github.com/krai/axs/blob/master/core_collection/workflows_collection/bert/bert_squad_onnxruntime_py/data_axs.json#L81

following these examples:
https://github.com/krai/axs2mlperf/blob/master/image_classification_onnx_loadgen_py/data_axs.json#L97
https://github.com/krai/axs2mlperf/blob/master/base_loadgen_program/data_axs.json#L83
https://github.com/krai/axs2mlperf/blob/master/image_classification_onnx_loadgen_py/onnx_loadgen_classifier.py#L37-L56
https://github.com/krai/axs2mlperf/blob/master/image_classification_torch_loadgen_py/torch_loadgen_classifier.py#L29-L41

Also need to update:
https://github.com/krai/axs/blob/master/core_collection/workflows_collection/image_classification/onnx_image_classifier/onnx_classify.py#L59-L72

https://github.com/krai/axs/blob/master/core_collection/workflows_collection/image_classification/pytorch_image_classifier/pytorch_classify.py#L53-L66

https://github.com/krai/axs/blob/master/core_collection/workflows_collection/image_classification/tf_image_classifier/tf_classify.py#L31-L49

https://github.com/krai/axs/blob/master/core_collection/workflows_collection/object_detection/onnx_object_detector/onnx_detect.py#L28-L48

https://github.com/krai/axs/blob/master/core_collection/workflows_collection/bert/bert_squad_onnxruntime_py/bert_squad_onnxruntime.py#L12-L33
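
A hedged sketch of what these scripts could look like after the change, reading all inputs from one JSON file and writing the results to another; the parameter names (e.g. output_file_path) are assumptions:

import json, sys

def main():
    with open(sys.argv[1]) as f:                      # input JSON with all parameters
        params = json.load(f)

    # ... run the classifier/detector/BERT model using params ...
    results = {"model_name": params.get("model_name"), "predictions": []}

    with open(params["output_file_path"], "w") as f:  # output JSON with the results
        json.dump(results, f, indent=4)

if __name__ == "__main__":
    main()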

Support `ipmitool` sensors for all benchmarks
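
One possible way to capture sensor readings around a benchmark run is a small helper around 'ipmitool sensor'; the helper name and where it hooks into the benchmarks are assumptions, and the command needs suitable privileges on the SUT:

import subprocess

def read_ipmi_sensors():
    """Return the raw output of 'ipmitool sensor' as a string."""
    return subprocess.run(
        ["sudo", "ipmitool", "sensor"],
        capture_output=True, text=True, check=True,
    ).stdout

# e.g. call read_ipmi_sensors() before and after the run and store both in the output_entry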

Move 'work_collection' out of `output_entry` and make the target collection an input parameter

For example, we currently have
[ "attach", [ "^", "work_collection" ] ], in output_entry
https://github.com/krai/axs/blob/c10d6a6c888434067f14f87170e75640fa3cc02e/core_collection/workflows_collection/base_benchmark_program/data_axs.json#L10

for each experiment, so all experiments are created in work_collection.
It would be better to make the target collection an input parameter when saving experiments.

We need to add the name of the container (collection) in which an experiment will be created as a parameter of output_entry.
