argonne-lcf / dlio_benchmark

An I/O benchmark for deep learning applications

Home Page: https://dlio-benchmark.readthedocs.io

License: Apache License 2.0

Python 99.84% Dockerfile 0.16%
artificial-intelligence data-management deep-learning storage pytorch tensorflow

dlio_benchmark's Introduction

Deep Learning I/O (DLIO) Benchmark

This README provides abbreviated documentation of the DLIO code. Please refer to https://dlio-benchmark.readthedocs.io for the full user documentation.

Overview

DLIO is an I/O benchmark for Deep Learning. DLIO is aimed at emulating the I/O behavior of various deep learning applications. The benchmark is delivered as an executable that can be configured for various I/O patterns. It uses a modular design to incorporate more data loaders, data formats, datasets, and configuration parameters. It emulates modern deep learning applications using Benchmark Runner, Data Generator, Format Handler, and I/O Profiler modules.

Installation and running DLIO

Bare metal installation

git clone https://github.com/argonne-lcf/dlio_benchmark
cd dlio_benchmark/
pip install .
dlio_benchmark ++workload.workflow.generate_data=True

Bare metal installation with profiler

git clone https://github.com/argonne-lcf/dlio_benchmark
cd dlio_benchmark/
pip install .[dlio_profiler]

Container

git clone https://github.com/argonne-lcf/dlio_benchmark
cd dlio_benchmark/
docker build -t dlio .
docker run -t dlio dlio_benchmark ++workload.workflow.generate_data=True

You can also pull a prebuilt container from Docker Hub (it might not reflect the most recent changes to the code):

docker pull docker.io/zhenghh04/dlio:latest
docker run -t docker.io/zhenghh04/dlio:latest python ./dlio_benchmark/main.py ++workload.workflow.generate_data=True

If you're running on a different architecture, refer to the Dockerfile to build the dlio_benchmark container from scratch.

One can also run interactively inside the container:

docker run -it docker.io/zhenghh04/dlio:latest /bin/bash
root@30358dd47935:/workspace/dlio$ python ./dlio_benchmark/main.py ++workload.workflow.generate_data=True

PowerPC

PowerPC requires installation through anaconda.

# Setup required channels
conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

# create and activate environment
conda env create --prefix ./dlio_env_ppc --file environment-ppc.yaml --force
conda activate ./dlio_env_ppc
# install other dependencies
python -m pip install .

Lassen, LLNL

For specific instructions on how to install and run the benchmark on Lassen please refer to: Install Lassen

Running the benchmark

A DLIO run is split into three phases:

  • Generate synthetic data that DLIO will use
  • Run the benchmark using the previously generated data
  • Post-process the results to generate a report

The configurations of a workload can be specified through a yaml file. Examples of yaml files can be found in dlio_benchmark/configs/workload/.

One can specify the workload through the workload= option on the command line. Specific configuration fields can then be overridden following the hydra framework convention (e.g. ++workload.framework=tensorflow).

First, generate the data

mpirun -np 8 dlio_benchmark workload=unet3d ++workload.workflow.generate_data=True ++workload.workflow.train=False

If possible, one can flush the filesystem caches in order to properly capture device I/O

sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

Finally, run the benchmark

mpirun -np 8 dlio_benchmark workload=unet3d

Alternatively, run the benchmark with the profiler enabled

export DLIO_PROFILER_ENABLE=1
export DLIO_PROFILER_INC_METADATA=1
mpirun -np 8 dlio_benchmark workload=unet3d

All the outputs will be stored in the hydra_log/unet3d/$DATE-$TIME folder. To post-process the data, run

dlio_postprocessor --output-folder hydra_log/unet3d/$DATE-$TIME

This will generate DLIO_$model_report.txt in the output folder.

Workload YAML configuration file

Workload characteristics are specified by a YAML configuration file. Below is an example of a YAML file for the UNet3D workload which is used for 3D image segmentation.

# contents of unet3d.yaml
model: unet3d

framework: pytorch

workflow:
  generate_data: False
  train: True
  checkpoint: True

dataset: 
  data_folder: data/unet3d/
  format: npz
  num_files_train: 168
  num_samples_per_file: 1
  record_length: 146600628
  record_length_stdev: 68341808
  record_length_resize: 2097152
  
reader: 
  data_loader: pytorch
  batch_size: 4
  read_threads: 4
  file_shuffle: seed
  sample_shuffle: seed

train:
  epochs: 5
  computation_time: 1.3604

checkpoint:
  checkpoint_folder: checkpoints/unet3d
  checkpoint_after_epoch: 5
  epochs_between_checkpoints: 2
  model_size: 499153191

The full list of configurations can be found in: https://argonne-lcf.github.io/dlio_benchmark/config.html
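As a rough sanity check before generating data, the dataset fields above can be multiplied out to estimate how much storage the training set needs. The sketch below assumes that record_length is the mean sample size in bytes and ignores record_length_stdev and any file-format overhead, so the real footprint will differ.

```python
# Values copied from the unet3d.yaml example above.
num_files_train = 168
num_samples_per_file = 1
record_length = 146_600_628  # assumption: mean sample size in bytes

# Expected size of the training set if every sample had the mean length.
total_bytes = num_files_train * num_samples_per_file * record_length
total_gib = total_bytes / 2**30

print(f"{total_bytes} bytes (~{total_gib:.2f} GiB)")
# prints "24628905504 bytes (~22.94 GiB)"
```

Under these assumptions the unet3d example needs roughly 23 GiB of free space in data_folder.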

The YAML file is loaded through hydra (https://hydra.cc/). The default settings are overridden by the configurations loaded from the YAML file. These can in turn be overridden on the command line (https://hydra.cc/docs/advanced/override_grammar/basic/).
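The precedence described here (defaults, then the YAML file, then command-line overrides) can be illustrated with plain dictionaries. This is a simplified sketch and not Hydra's implementation: deep_merge and apply_override are hypothetical helpers, the default values are invented for illustration, and override values are kept as strings.

```python
from copy import deepcopy


def deep_merge(base: dict, extra: dict) -> dict:
    """Return a copy of base with values from extra layered on top."""
    merged = deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def apply_override(config: dict, override: str) -> None:
    """Apply one '++a.b.c=value' style override to a nested dict in place."""
    dotted_key, _, raw_value = override.lstrip("+").partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = raw_value  # values stay strings in this simplified sketch


# Hypothetical defaults, then values as they might come from a workload YAML file.
defaults = {"workload": {"framework": "tensorflow", "train": {"epochs": 1}}}
from_yaml = {"workload": {"framework": "pytorch", "train": {"epochs": 5}}}

config = deep_merge(defaults, from_yaml)
apply_override(config, "++workload.framework=tensorflow")

print(config["workload"])
# prints "{'framework': 'tensorflow', 'train': {'epochs': 5}}"
```

The command-line value wins for framework, while epochs keeps the value from the YAML file because nothing later overrides it.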

Current Limitations and Future Work

  • DLIO currently assumes that samples are always 2D images, even though the size of each sample can be set through --record_length. We expect the shape of the sample to have minimal impact on the I/O itself, but this has yet to be validated case by case. We plan to add an option for specifying the shape of the sample.

  • We assume the data/label pairs are stored in the same file. Storing data and labels in separate files will be supported in the future.

  • File format support: we currently support only the tfrecord, hdf5, npz, csv, jpg, and jpeg formats. Support for other data formats can be added.

  • Data loader support: we support reading datasets using the TensorFlow tf.data data loader, the PyTorch DataLoader, and a set of custom data readers implemented in ./reader. For the TensorFlow tf.data data loader and the PyTorch DataLoader:

    • We have complete support for the tfrecord format in the TensorFlow data loader.
    • For npz, jpg, jpeg, and hdf5, we currently support only one sample per file. In other words, each sample is stored in an independent file. Multiple samples per file will be supported in the future.

How to contribute

We welcome contributions from the community to the benchmark code. In particular, we welcome new features in the following areas:

  • support for new workloads: if you think that your workload(s) would be of interest to the public, and would like to provide the yaml file to be included in the repo, please submit an issue.
  • support for new data loaders, such as the DALI loader, MxNet loader, etc.
  • support for new frameworks, such as MxNet.
  • support for novel file systems or storage, such as AWS S3.
  • support for loading new data formats.

If you would like to contribute, please submit an issue to https://github.com/argonne-lcf/dlio_benchmark/issues, and contact the ALCF DLIO team, Huihuo Zheng at [email protected]

Citation and Reference

The original CCGrid'21 paper describes the design and implementation of the DLIO code. Please cite this paper if you use DLIO for your research.

@inproceedings{devarajan2021dlio,
  title={DLIO: A Data-Centric Benchmark for Scientific Deep Learning Applications},
  author={H. Devarajan and H. Zheng and A. Kougkas and X.-H. Sun and V. Vishwanath},
  booktitle={IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid'21)},
  year={2021},
  pages={81--91},
  publisher={IEEE/ACM}
}

We also encourage people to take a look at related work from the MLPerf Storage working group.

@article{balmau2022mlperfstorage,
  title={Characterizing I/O in Machine Learning with MLPerf Storage},
  author={O. Balmau},
  journal={SIGMOD Record DBrainstorming},
  year={2022},
  volume={51},
  number={3},
  publisher={ACM}
}

Acknowledgments

This work used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility under Contract DE-AC02-06CH11357 and is supported in part by National Science Foundation under NSF, OCI-1835764 and NSF, CSR-1814872.

License

Apache 2.0 LICENSE


Copyright (c) 2022, UChicago Argonne, LLC All Rights Reserved

If you have questions about your rights to use or distribute this software, please contact Argonne Intellectual Property Office at [email protected]

NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.

dlio_benchmark's People

Contributors

hariharan-devarajan, johnugeorge, kaushikvelusamy, krehm, lhovon, louisddn, olgakogiou, theassembler1, venkat-1, zhenghh04

dlio_benchmark's Issues

Support for DALI, pytorch, and tensorflow reader

All data loaders support internal reading functions. I will use this issue to describe some data loaders and their possible integration into dlio_benchmark.

Dali data loader

Examples: npz tfrecord

Suggestion: define an input pipeline where we do the following: a) read files, b) extract samples, and c) resize. DaliReader will have an init, read, and finalize API.

TFRecord

Examples: csv (experimental) and tfrecord

Suggestion: return a tf.data.Dataset which includes reading, extracting samples, and resizing. TensorflowReader will have an init, read, and finalize API.

Pytorch

The recommended way to use PyTorch is to define custom data loaders. But it has some custom image loading.

Suggested Changes

  • I will create separate enums for the TensorflowReaders and DaliReaders we support. They will have numbers similar to our ReaderType for compatibility.
  • Rename our data loaders to DLIO_PYTORCH, DLIO_TENSORFLOW, and DLIO_DALI, as these are our implementations.
  • Similarly, rename our data readers as DLIO_CSV and so on.
  • The new data loaders would be called NATIVE_TENSORFLOW and NATIVE_PYTORCH.
  • For validation, our current loaders work with our DLIOReaderType. If the user selects NATIVE_TENSORFLOW, then it will be validated against TensorflowReaderType, and similarly for DALI.
  • The base classes for these readers would be different as well. We will have four base classes: DaliBaseReader, DLIOBaseReader, PyTorchBaseReader, and TensorflowBaseReader.
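To make the proposed init, read, and finalize API more concrete, here is a minimal sketch of what a shared reader interface could look like. BaseReader and InMemoryReader are hypothetical names used only for illustration; they are not classes from dlio_benchmark, DALI, TensorFlow, or PyTorch, and the "resize" step is reduced to truncating bytes.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List


class BaseReader(ABC):
    """Hypothetical common interface with the three calls named in the proposal."""

    @abstractmethod
    def init(self) -> None:
        """Open files or build the input pipeline."""

    @abstractmethod
    def read(self) -> Iterator[bytes]:
        """Yield samples one at a time."""

    @abstractmethod
    def finalize(self) -> None:
        """Release any resources held by the reader."""


class InMemoryReader(BaseReader):
    """Toy reader: 'files' are byte strings already held in memory."""

    def __init__(self, files: List[bytes], resize_to: int) -> None:
        self._files = files
        self._resize_to = resize_to
        self.is_open = False

    def init(self) -> None:
        self.is_open = True

    def read(self) -> Iterator[bytes]:
        for sample in self._files:             # a) read files, b) extract samples
            yield sample[: self._resize_to]    # c) resize (truncation as a stand-in)

    def finalize(self) -> None:
        self.is_open = False


reader = InMemoryReader([b"abcdef", b"xy"], resize_to=4)
reader.init()
samples = list(reader.read())
reader.finalize()
print(samples)  # prints "[b'abcd', b'xy']"
```

Framework-specific readers would then differ only in how read builds its pipeline, which is one way the separate base classes listed above could still share a calling convention.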

Docker image error: " File system scheme 's3' not implemented. "

I built my own docker image from the source code, and it works fine with the local filesystem. However, when I try to use the s3 interface I get the following error:

Steps to reproduce the problem:
docker build -t dlio .

docker run -e S3_ENDPOINT=http://xxxx:9000 -e AWS_ACCESS_KEY_ID=xxxx -e AWS_SECRET_ACCESS_KEY=xxx -e S3_VERIFY_SSL=0 -e S3_USE_HTTPS=0 -e AWS_REGION=us-east-1 -t dlio dlio_benchmark workload=resnet50 ++workload.workflow.generate_data=True ++workload.workflow.train=False ++workload.storage.storage_type=s3 ++workload.storage.storage_root=mybucket

Error:
  File "/usr/local/lib/python3.10/site-packages/dlio_benchmark/framework/tf_framework.py", line 130, in create_node
    tf.io.gfile.mkdir(id)
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/lib/io/file_io.py", line 483, in create_dir_v2
    _pywrap_file_io.CreateDir(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 's3' not implemented (file: 's3://mybucket/resnet50')

When I build and install the source code on a bare-metal node, the S3 interface works fine. I can read/write data to S3 without any issue. The requirement.txt includes tensorflow-io, and I could not figure out why the docker image does not work.

Add support for NFS profiler

Since files are in closed category, we need to add profiler support for NFS. Currently, we have IOSTAT which won't work for NFS.

ModuleNotFoundError: No module named 'dlio_benchmark'

Installing this for the first time so forgive my ignorance here.

I'm deploying to physical Ubuntu 22.04.3 LTS hosts with python 3.10.12

From a clean install I'm running the following steps:

git clone https://github.com/argonne-lcf/dlio_benchmark
cd dlio_benchmark/
python3 -m pip install .
dlio_benchmark ++workload.workflow.generate_data=True

However, invoking dlio_benchmark produces a traceback:

apatt@server-12:~/github/dlio_benchmark$ dlio_benchmark ++workload.workflow.generate_data=True
Traceback (most recent call last):
  File "/home/apatt/.local/bin/dlio_benchmark", line 5, in <module>
    from dlio_benchmark import main
ModuleNotFoundError: No module named 'dlio_benchmark'

Poking around Google (and even some hunting through the forest of ChatGPT responses) did not yield any solid leads.
Not sure where to go from here. Before I go mucking about with .pth files and the like just want to know if I'm overlooking something simple.

profiler invalid

Hi, everything runs fine in docker. But how should I use other profilers when running the BERT model? If I set ++workload.profiling.profiler=tf/tensorflow, it causes the error 'tf' is not a valid Profiler/'tensorflow' is not a valid Profiler. How should I use other profilers correctly?

dlio_postprocessor.py fails extracting iostat trace

Hi Team

Below is my run config, which uses the default.yaml template with some params changed.

MPI_procs = 128

batch_size=64
epochs = 2

profiling:
  profiler: iostat
  iostat_devices: nvme3n1

Below is the failing trace

2022-12-20 06:42:29 Processing loading and processing times for epoch 1
2022-12-20 06:42:29 Processing loading times for phase eval
2022-12-20 06:42:29 Processing processing times for phase eval
2022-12-20 06:42:29 Processing loading and processing times for epoch 2
2022-12-20 06:42:29 Reading from /workspace/dlio/hydra_log/default/2022-12-20-06-42-17/125_load_and_proc_times.json
2022-12-20 06:42:29 Processing loading and processing times for epoch 1
2022-12-20 06:42:29 Processing loading times for phase eval
2022-12-20 06:42:29 Processing processing times for phase eval
2022-12-20 06:42:29 Processing loading and processing times for epoch 2
2022-12-20 06:42:29 Reading from /workspace/dlio/hydra_log/default/2022-12-20-06-42-17/126_load_and_proc_times.json
2022-12-20 06:42:29 Processing loading and processing times for epoch 1
2022-12-20 06:42:29 Processing loading times for phase eval
2022-12-20 06:42:29 Processing processing times for phase eval
2022-12-20 06:42:29 Processing loading and processing times for epoch 2
2022-12-20 06:42:29 Reading from /workspace/dlio/hydra_log/default/2022-12-20-06-42-17/127_load_and_proc_times.json
2022-12-20 06:42:29 Processing loading and processing times for epoch 1
2022-12-20 06:42:29 Processing loading times for phase eval
2022-12-20 06:42:29 Processing processing times for phase eval
2022-12-20 06:42:29 Processing loading and processing times for epoch 2
2022-12-20 06:42:29 Computing overall stats
2022-12-20 06:42:29 Computing per epoch stats
2022-12-20 06:42:29 Computing stats for epoch 1 eval
2022-12-20 06:42:29 Parsing iostat trace
2022-12-20 06:42:29 Processing iostat item 0
2022-12-20 06:42:29 Extracting stats from iostat trace
2022-12-20 06:42:29 Extracting stats for epoch 1 start
2022-12-20 06:42:29 Extracting stats for epoch 1 block1
===============Processing DLIO output================
  Job configuration
  output_folder: /workspace/dlio/hydra_log/default/2022-12-20-06-42-17
  num_proc: 128
  epochs: 2
Traceback (most recent call last):
  File "/workspace/dlio/src/dlio_postprocessor.py", line 626, in <module>
  batch_size: 64
  do_eval: True
  batch_size_eval: 1
  do_checkpoint: False
  debug: False
  name: default
    main()
  File "/workspace/dlio/src/dlio_postprocessor.py", line 623, in main
    postproc.generate_report()
  File "/workspace/dlio/src/dlio_postprocessor.py", line 552, in generate_report
    self.extract_stats_from_iostat_trace()
  File "/workspace/dlio/src/dlio_postprocessor.py", line 346, in extract_stats_from_iostat_trace
    start, end = pd.to_datetime(phase_data['start']), pd.to_datetime(phase_data['end'])
KeyError: 'end'

latest master fails to run via docker container

This morning I executed the following code based on the README.md instructions:

git clone https://github.com/argonne-lcf/dlio_benchmark
cd dlio_benchmark/
docker build -t dlio .
docker run -t dlio dlio_benchmark ++workload.workflow.generate_data=True

It fails almost immediately with:

[root@delphi-017 dlio_benchmark]# docker run -t dlio dlio_benchmark ++workload.workflow.generate_data=True
Traceback (most recent call last):
  File "/usr/local/bin/dlio_benchmark", line 5, in <module>
    from dlio_benchmark.main import main
  File "/usr/local/lib/python3.8/dist-packages/dlio_benchmark/main.py", line 41, in <module>
    from dlio_benchmark.utils.statscounter import StatsCounter
  File "/usr/local/lib/python3.8/dist-packages/dlio_benchmark/utils/statscounter.py", line 18, in <module>
    from dlio_benchmark.utils.config import ConfigArguments
  File "/usr/local/lib/python3.8/dist-packages/dlio_benchmark/utils/config.py", line 35, in <module>
    from dlio_profiler.logger import fn_interceptor as Profile
  File "/usr/local/lib/python3.8/dist-packages/dlio_profiler/logger.py", line 19, in <module>
    import dlio_profiler_py as profiler
ModuleNotFoundError: No module named 'dlio_profiler_py'

Syntax Error in dlio_benchmark.py

Hello again.

I finally installed dlio_benchmark and ran it, but I got a SyntaxError.

SyntaxError: Non-ASCII character '\xc2' in file ./src/dlio_benchmark.py on line 3, but no encoding declared;

How can I resolve this error? Thanks.

logs not getting written when multiprocessing_context is spawn or forkserver

I just opened PR #130 to fix dlio.log so that it gets reopened in spawn and forkserver child
processes so that the child log messages are not lost.

The same problem exists with dlp.log, but some of the code that needs to change is in repository
dlio-profiler. Once that is updated and its release number is bumped, then changes can
be made in dlio_benchmark to use the newer dlio-profiler version.

container run issue with exec format error

Hi, I have an x64-based system running Ubuntu 20.04 with GPUs in it. When I try to run
docker run -t docker.io/zhenghh04/dlio:latest python ./dlio_benchmark/main.py ++workload.workflow.generate_data=True
I get the error below:
standard_init_linux.go:211: exec user process caused "exec format error"
Please suggest.

Multi-threaded reading for Tensorflow

python src/dlio_benchmark.py workload=unet3d ++workload.reader.read_threads=4 ++workload.workflow.generate_data=True ++workload.framework=tensorflow ++workload.reader.data_loader=tensorflow

unet3d training fails when used with multiprocessing_context=spawn

I have been chasing a problem where the unet3d training run fails at the end with a malloc corruption abort. I have managed to shrink the size of the training run down to the minimum that fails, which is 5 sample files and 1 MPI process and 1 reader process. The training run reaches the end and calculates the AU efficiency, then aborts.

...
INFO] Averaged metric over all epochs
[METRIC] ==========================================================
[METRIC] Training Accelerator Utilization [AU] (%): 0.0000 (0.0000)
[METRIC] Training Throughput (samples/second): 0.4132 (0.0000)
[METRIC] Training I/O Throughput (MB/second): 57.7696 (0.0000)
[METRIC] train_au_meet_expectation: fail
[METRIC] ==========================================================
 [/mnt/nvm/rehm/storage/dlio_benchmark/dlio_benchmark/utils/statscounter.py:121]
[INFO] 2024-01-23T08:49:04.125469 outputs saved in RANKID_output.json [/mnt/nvm/rehm/storage/dlio_benchmark/dlio_benchmark/utils/statscounter.py:318]
[DLIO_PROFILER ERROR]: signal caught 15
[DLIO_PROFILER ERROR]: signal caught 15
[DLIO_PROFILER ERROR]: signal caught 11
malloc(): unsorted double linked list corrupted
[DLIO_PROFILER ERROR]: signal caught 6

The two signal 15 messages correspond to the resource_tracker child process and the single child process spawned to read the 5 sample files. What happens is that one of the threads in the spawn-child aborts due to the malloc corruption. numpy is running BLAS and has a thread pool with a number of threads in it. After the one thread takes the abort, all the other threads are left blocked waiting for pthread locks forever, and the resource_tracker child process waits for the spawn-child to exit, which it never will. Both processes have to be killed off with "kill -9" to get rid of them. I do not yet know the source of the signal 11 seen above, perhaps it is also related to the malloc corruption.

I will upload the yaml file that I use with this run. You will note that profiling is disabled, yet it appears to be running anyway in both child processes. I am running the latest 'main' branch, commit fb762c2 is the head commit.

The command line that I use is:

mpirun -np 1 python3 dlio_benchmark/dlio_benchmark/main.py --config-path=/mnt/nvm/rehm/storage/storage-conf workload=unet3d ++workload.workflow.train=True ++hydra.output_subdir=configs

unet3d.yaml.txt

Checkpointing feature doesn't work

Checkpointing is enabled in the config file, but the feature is broken. Below are the relevant parts of the sample config file that I used:

workflow:
  generate_data: False
  train: True
  evaluation: True
  checkpoint: True

checkpoint:
  checkpoint_after_epoch: 2
  epochs_between_checkpoints: 2
  steps_between_checkpoints: 4

Create pip packages for dlio_benchmark and dlio-profiler

We need to create a pip package for dlio_benchmark and dlio-profiler and release them.

Also as a part of this change, I can check dynamically within dlio_benchmark if dlio-profiler is installed and then set the required variables before loading it up.

Unable to run, missing package perftrace

Did the code get updated but not the requirements.txt file? I started getting the following error when I updated to the latest head (at about 1pm MT 3/10).

ImportError: cannot import name 'perftrace' from 'src.utils.utility' (/workspace/dlio/src/utils/utility.py)
from src.utils.utility import utcnow, measure_performance, perftrace

I am searching if there is a pip package associated to this, but I haven't found it yet.

Persistent Pytorch workers

@zhenghh04 there is an option in the PyTorch data loader to make workers persistent using persistent_workers=True. I believe this will reduce the overhead of spawning workers within PyTorch and help the data loading.

Thoughts?

Checkpointing issues

I ran a quick test of the checkpointing feature and see the following issues:

  1. When epochs_between_checkpoints is set, evaluation is skipped in the same epoch where checkpointing happens.
  2. When steps_between_checkpoints is set, an error occurs during checkpointing of the second block:
Traceback (most recent call last):
  File "dlio_benchmark/src/dlio_benchmark.py", line 354, in main
    benchmark.run()
  File "dlio_benchmark/src/dlio_benchmark.py", line 295, in run
    steps = self._train(epoch)
  File "dlio_benchmark/src/dlio_benchmark.py", line 236, in _train
    self.stats.end_block(epoch, block, block_step)
  File "/home/ubuntu/storage/dlio_benchmark/src/utils/statscounter.py", line 103, in end_block
    if 'end' in self.per_epoch_stats[epoch][f'block{block}']:
KeyError: 'block2' 

dliobenchmark installation issue.kindly help

I have installed and tested the dlio benchmark on my cluster previously, but for the last three days I have been facing errors on my system regarding openfabrics, and I am unable to diagnose the cause. I even tried installing it on another system, but the same issue comes up again and again. Any kind of help and suggestions would help me a lot. I have attached a screenshot for reference. Please help.

Screenshot (5)

tensorflow framework computation time

If framework=tensorflow is set, the computation_time set in the configuration file does not reflect the true computation time.

For example, with https://github.com/argonne-lcf/dlio_benchmark/blob/workloads/dlio_benchmark/configs/workload/resnet50.yaml

the computation time recorded in the trace is longer than the value that was set.

{"name":"TFFramework.compute","cat":"ai_framework","pid":0,"tid":110573,"ts":169653616,"dur":115326,"ph":"X","args":{"hostname":"x3101c0s7b0n0","core_affinity": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]}}
{"name":"TFFramework.compute","cat":"ai_framework","pid":0,"tid":110573,"ts":169778575,"dur":118006,"ph":"X","args":{"hostname":"x3101c0s7b0n0","core_affinity": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]}}
{"name":"TFFramework.compute","cat":"ai_framework","pid":0,"tid":110573,"ts":169910633,"dur":116634,"ph":"X","args":{"hostname":"x3101c0s7b0n0","core_affinity": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]}}
{"name":"TFFramework.compute","cat":"ai_framework","pid":0,"tid":110573,"ts":170239135,"dur":107555,"ph":"X","args":{"hostname":"x3101c0s7b0n0","core_affinity": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]}}
{"name":"TFFramework.compute","cat":"ai_framework","pid":0,"tid":110573,"ts":170356073,"dur":123732,"ph":"X","args":{"hostname":"x3101c0s7b0n0","core_affinity": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]}}
{"name":"TFFramework.compute","cat":"ai_framework","pid":0,"tid":110573,"ts":170487830,"dur":118801,"ph":"X","args":{"hostname":"x3101c0s7b0n0","core_affinity": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]}}
{"name":"TFFramework.compute","cat":"ai_framework","pid":0,"tid":110573,"ts":170613041,"dur":125267,"ph":"X","args":{"hostname":"x3101c0s7b0n0","core_affinity": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]}}
{"name":"TFFramework.compute","cat":"ai_framework","pid":0,"tid":110573,"ts":170744207,"dur":132214,"ph":"X","args":{"hostname":"x3101c0s7b0n0","core_affinity": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]}}
{"name":"TFFramework.compute","cat":"ai_framework","pid":0,"tid":110573,"ts":170883659,"dur":119470,"ph":"X","args":{"hostname":"x3101c0s7b0n0","core_affinity": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]}}
{"name":"TFFramework.compute","cat":"ai_framework","pid":0,"tid":110573,"ts":171017504,"dur":136645,"ph":"X","args":{"hostname":"x3101c0s7b0n0","core_affinity": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]}}
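To compare these events against the configured computation_time, the dur fields can be averaged. The sketch below assumes that dur is expressed in microseconds, as in the Chrome trace event format, and uses copies of the ten events above trimmed to the name and dur fields; mean_compute_seconds is a hypothetical helper, not part of dlio_benchmark or dlio-profiler.

```python
import json
from statistics import mean
from typing import Iterable

# The ten events above, trimmed to the two fields used in this sketch.
trace_lines = [
    '{"name":"TFFramework.compute","dur":115326}',
    '{"name":"TFFramework.compute","dur":118006}',
    '{"name":"TFFramework.compute","dur":116634}',
    '{"name":"TFFramework.compute","dur":107555}',
    '{"name":"TFFramework.compute","dur":123732}',
    '{"name":"TFFramework.compute","dur":118801}',
    '{"name":"TFFramework.compute","dur":125267}',
    '{"name":"TFFramework.compute","dur":132214}',
    '{"name":"TFFramework.compute","dur":119470}',
    '{"name":"TFFramework.compute","dur":136645}',
]


def mean_compute_seconds(lines: Iterable[str], event_name: str) -> float:
    """Average the 'dur' of matching events, assuming microsecond units."""
    events = (json.loads(line) for line in lines)
    durations_us = [e["dur"] for e in events if e["name"] == event_name]
    return mean(durations_us) / 1e6


measured = mean_compute_seconds(trace_lines, "TFFramework.compute")
print(f"mean compute time: {measured:.6f} s")
# prints "mean compute time: 0.121365 s"
```

Under that unit assumption, the emulated compute step averages about 0.12 s over these ten events, which is the number to compare with computation_time in the workload file.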

torch_framework creates checkpoint_folder in the wrong location

Checkpoint folders and files are to be created relative to the storage_root of the storage class used for the run. Code in torch_framework.py for checkpoint_folder and checkpoint file creation is hard-coded using posix calls which may not be appropriate for future storage classes, and the checkpoint_folder is not relative to the storage_root. Storage class methods should be used for checkpoint_folder and checkpoint file creation.

Pull request #126 addresses these issues.

ConfigArguments singleton not compatible with multiprocessing_context of spawn or forkserver

I have tried running dlio_benchmark with infiniband, and need to use multiprocessing_context=spawn or forkserver. It turns out that the ConfigArguments singleton doesn't play well with these modes.

When dlio_benchmark starts up, main instantiates the singleton, which then contains all the configuration default values. main then calls LoadConfig() with the yaml file as a parameter, which causes attributes in the ConfigArguments singleton to be replaced with the overriding values from the yaml file and from the command line.

This all works fine in the main process, but when a child process is spawned, the pickled data sent to that child for execution only contains a reference to ConfigArguments, not the current contents of the parent's ConfigArguments singleton.

I set a breakpoint at the beginning of method worker_init() in torch_data_loader.py. When the child process breaks in pdb, I step into the ReaderFactory.get_reader() function, then step into the following line of code:

_args = ConfigArguments.get_instance()

Since this is a child process that was not created with fork, the ConfigArguments singleton doesn't yet exist, so a new one is created, but this new instantiation contains only the default config parameters, all of the overrides in the yaml file and on the command line are lost. So the child process ends up executing incorrect code.

Any ideas on how to fix this?

Thanks, Kevan
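One possible direction, sketched here under stated assumptions, is to export the singleton's attributes as a plain dictionary in the parent and restore them explicitly in the child. Config below is a hypothetical stand-in for ConfigArguments, and the "child" is simulated by clearing the class-level instance rather than by actually spawning a process, so this only illustrates the idea.

```python
class Config:
    """Hypothetical stand-in for a singleton such as ConfigArguments."""

    _instance = None

    def __init__(self) -> None:
        # Default values, as a freshly created singleton would hold them.
        self.batch_size = 1
        self.read_threads = 1

    @classmethod
    def get_instance(cls) -> "Config":
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def to_state(self) -> dict:
        """Export the current attributes as a picklable dict."""
        return dict(vars(self))

    @classmethod
    def restore(cls, state: dict) -> "Config":
        """Create (or reuse) the singleton and overwrite it with state."""
        instance = cls.get_instance()
        vars(instance).update(state)
        return instance


# Parent process: defaults are overridden by the YAML file and command line.
parent = Config.get_instance()
parent.batch_size = 4
state = parent.to_state()

# Simulated spawn child: the interpreter starts without a singleton,
# so get_instance() only sees the defaults.
Config._instance = None
defaults_seen = Config.get_instance().batch_size

# Simulated child with the fix: restore the parent's state first.
Config._instance = None
child = Config.restore(state)

print(defaults_seen, child.batch_size)  # prints "1 4"
```

In a real run the state dictionary would have to reach the child as an explicit, picklable argument to whatever function the worker executes first, since spawn and forkserver children do not inherit the parent's memory.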

MPIRun crash when DLIO run on ubuntu 22.04

The following dump is observed when run on ubuntu 22.04. This error is always reproducible
Followed same commands as in

sudo apt-get install mpich
python -m pip install --upgrade pip
pip install -r requirements.txt

However, no issue is found on Ubuntu 20.04 (which is the same as GitHub CI).
One difference that I saw is that the default mpich package on Ubuntu 22.04 is 4.0-3, while it was 3.3.2 on Ubuntu 20.04.

mpirun -np 8 python3 dlio_benchmark/src/dlio_benchmark.py --config-path=$CONFIG_PATH workload=unet3d ++workload.workflow.profiling=True ++workload.profiling.profiler=iostat ++workload.profiling.io_devices_to_trace=sda

2022-11-27T15:39:13.503015 Running DLIO with 8 processes [/home/ubuntu/storage/dlio_benchmark/src/dlio_benchmark.py:102]
2022-11-27T15:39:13.503270 Reading YAML config file './configs/workload/unet3d.yaml' [/home/ubuntu/storage/dlio_benchmark/src/dlio_benchmark.py:104]
2022-11-27T15:39:13.535216 Profiling Started with iostat [/home/ubuntu/storage/dlio_benchmark/src/dlio_benchmark.py:182]
2022-11-27T15:39:13.535376 Max steps per epoch: 113 = 1 * 3620 / 4 / 8 (samples per file * num files / batch size / comm size) [/home/ubuntu/storage/dlio_benchmark/src/dlio_benchmark.py:273]
2022-11-27T15:39:13.535420 Steps per eval: 5 = 1 * 42 / 1 / 8 (samples per file * num files / batch size eval / comm size) [/home/ubuntu/storage/dlio_benchmark/src/dlio_benchmark.py:277]
2022-11-27T15:39:13.535456 Starting epoch 1: 113 steps expected [/home/ubuntu/storage/dlio_benchmark/src/utils/statscounter.py:50]
2022-11-27T15:39:13.539909 Starting block 1 [/home/ubuntu/storage/dlio_benchmark/src/utils/statscounter.py:93]
A process has executed an operation involving a call
to the fork() system call to create a child process.

As a result, the libfabric EFA provider is operating in
a condition that could result in memory corruption or
other system errors.

For the libfabric EFA provider to work safely when fork()
is called, you will need to set the following environment
variable:
          RDMAV_FORK_SAFE

However, setting this environment variable can result in
signficant performance impact to your application due to
increased cost of memory registration.

You may want to check with your application vendor to see
if an application-level alternative (of not using fork)
exists.

Your job will now abort.

python3:7393 terminated with signal 6 at PC=7f8e8d500a7c SP=7fff7bc8e290.  Backtrace:
A process has executed an operation involving a call
to the fork() system call to create a child process.

As a result, the libfabric EFA provider is operating in
a condition that could result in memory corruption or
other system errors.
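The message names RDMAV_FORK_SAFE but gives no value. A minimal sketch, assuming the variable is honored when set to 1 and that it has to be in the environment before the MPI or libfabric libraries initialize; the helper name is hypothetical, and the performance caveat from the message still applies:

```python
import os


def enable_fork_safe_rdma(env=os.environ):
    """Set RDMAV_FORK_SAFE unless the user already chose a value."""
    env.setdefault("RDMAV_FORK_SAFE", "1")
    return env["RDMAV_FORK_SAFE"]


# Call this at the very top of the launcher script, before importing
# any module that initializes MPI.
enable_fork_safe_rdma()
```

Exporting the variable in the shell before calling mpirun should have the same effect, provided mpirun forwards the environment to every rank.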

Darshan Tool

Hi, I am trying to run the cosmoflow DLIO benchmark and I came across the Darshan profiling tool. Can anyone help me understand how to use the Darshan profiler to generate its reports and graphs? What steps should be followed, and what commands should be used on a Linux system?

Validating DLRM config

The DLRM workload support is added here: #114. But we still need to validate that. I am adding this issue to keep track of that.

Error on running dlio_benchmark

Running the code from master

mpirun -np 4 python3 src/dlio_benchmark.py workload=unet3d ++workload.workflow.generate_data=True ++workload.workflow.train=True

You get the below error.

Traceback (most recent call last):
  File "/home/robl/src/dlio_benchmark/src/dlio_benchmark.py", line 390, in main
    benchmark.run()
  File "/home/robl/src/dlio_benchmark/src/utils/utility.py", line 147, in wrapper
    x = func(*args, **kwargs)
  File "/home/robl/src/dlio_benchmark/src/dlio_benchmark.py", line 347, in run
    self.framework.get_loader(DatasetType.VALID).read(epoch)
  File "/home/robl/src/dlio_benchmark/src/utils/utility.py", line 147, in wrapper
    x = func(*args, **kwargs)
  File "/home/robl/src/dlio_benchmark/src/data_loader/torch_data_loader.py", line 49, in read
    dataset = TorchDataset(self.format, self.dataset_type, epoch_number)
  File "/home/robl/src/dlio_benchmark/src/data_loader/torch_data_loader.py", line 24, in __init__
    self.reader.read(epoch_number)
  File "/home/robl/src/dlio_benchmark/src/utils/utility.py", line 147, in wrapper
    x = func(*args, **kwargs)
  File "/home/robl/src/dlio_benchmark/src/reader/npz_reader.py", line 54, in read
    self.after_read()
  File "/home/robl/src/dlio_benchmark/src/reader/reader_handler.py", line 137, in after_read
    self.total = int(math.ceil(self.get_sample_len() / self.batch_size))
TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
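The TypeError means that `self.batch_size` was None when the last frame evaluated `self.get_sample_len() / self.batch_size`. A minimal sketch of a guard that would turn this into a clearer message; the function name and the error text are hypothetical, and only the ceil-division expression comes from the traceback:

```python
import math


def steps_per_epoch(sample_len, batch_size):
    """Number of batches needed to cover sample_len samples."""
    if batch_size is None or batch_size <= 0:
        # Fail early with a message that points at the configuration.
        raise ValueError(
            f"batch_size must be a positive integer, got {batch_size!r}; "
            "check that it is set for this dataset type"
        )
    return int(math.ceil(sample_len / batch_size))
```

The guard does not explain why the value is unset for the validation loader; it only makes the failure easier to read.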

No support for different sample types in same data format

Currently, a data format (npz, tfrecord, etc.) maps to a single sample type (image, text, etc.).

We will need a method for the various container formats to support different sample types. I would recommend adding an abstraction for SampleType that describes how to create or process a type of record. A reader would use the SampleType instead of using hard coded methods.

For example: TFRecord only supports images but it should support samples used for DLRM as well.
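A rough sketch of what such an abstraction could look like. Every class and method name here is hypothetical, and the DLRM-like record layout is only a placeholder, not the real DLRM sample format:

```python
import random
from abc import ABC, abstractmethod


class SampleType(ABC):
    """Hypothetical interface: how to create one record of a given kind."""

    @abstractmethod
    def generate(self, rng: random.Random) -> bytes:
        """Return one synthetic record as raw bytes."""


class ImageSample(SampleType):
    def __init__(self, height: int, width: int):
        self.height, self.width = height, width

    def generate(self, rng):
        # One byte per pixel, values 0..255.
        return bytes(rng.randrange(256) for _ in range(self.height * self.width))


class TabularSample(SampleType):
    """Placeholder for a DLRM-style row of integer features."""

    def __init__(self, num_features: int):
        self.num_features = num_features

    def generate(self, rng):
        # Four bytes per feature, little-endian.
        return b"".join(
            rng.randrange(2**31).to_bytes(4, "little")
            for _ in range(self.num_features)
        )


class ContainerWriter:
    """Stand-in for a format handler that no longer hard-codes the record kind."""

    def __init__(self, sample_type: SampleType):
        self.sample_type = sample_type

    def build_records(self, count: int, seed: int = 0):
        rng = random.Random(seed)
        return [self.sample_type.generate(rng) for _ in range(count)]
```

The same `ContainerWriter` then serves both kinds of sample, for example `ContainerWriter(ImageSample(2, 3))` and `ContainerWriter(TabularSample(2))`, which is the decoupling the proposal asks for.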

output_folder doesn't get created if storage class is S3

While experimenting with S3 storage, I found that the output_folder doesn't get created. This happened because the code tries to create the output folder with the following code:

        self.output_folder = self.args.output_folder
        self.output = StorageFactory().get_storage(self.args.storage_type, self.args.output_folder,
                                                   self.args.framework)
        self.output.create_namespace(exist_ok=True)

but that assumes that the storage class is posix. If the class is S3, then self.output.create_namespace() is called with what should be a bucket, and the routine itself is a no-op because the bucket is supposed to already exist. Log files are always posix, so I've created pull request #124 where I've replaced the above calls with an os.makedirs() call which always works for posix directories.
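The replacement described above can be sketched as follows. This is an illustration of the idea, not the code from the pull request, and the helper name is hypothetical:

```python
import os


def ensure_output_folder(path):
    """Create the local log directory regardless of the storage backend."""
    # exist_ok=True makes the call safe when several processes race to
    # create the same directory or when it is left over from a prior run.
    os.makedirs(path, exist_ok=True)
    return path
```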

Separately, I moved the configuration of the dlio.log file farther down after the prior dlio.log has been deleted, so there is no chance that a process will end up with a log handler pointing to an open but unlinked file.

Finally, one log message was issued before the prior dlio.log was deleted, so the message was lost. I moved it later in the code so that it gets logged.

Post-processing issue: ImportError: cannot import name 'quantiles'

Hi team, I am trying to create a report from the profiler's trace, but it fails when importing quantiles from the statistics module.

[root@k8s-worker86 dlio_benchmark]# ll
total 40
drwxr-xr-x  4 root root    54 Dec 15 12:18 configs
-rw-r--r--  1 root root   372 Dec 13 10:48 Dockerfile
drwxr-xr-x  3 root root    69 Dec 13 10:48 docs
-rwxr-xr-x  1 root root     0 Dec 15 12:20 __init__.py
-rw-r--r--  1 root root 11357 Dec 13 10:48 LICENSE
-rw-r--r--  1 root root  8459 Dec 13 10:48 README.md
-rw-r--r--  1 root root  1028 Dec 13 10:48 requirements.txt
-rw-r--r--  1 root root   613 Dec 13 10:48 setup.cfg
-rw-r--r--  1 root root   215 Dec 13 10:48 setup.py
drwxr-xr-x 11 root root   228 Dec 15 12:23 src
drwxr-xr-x  3 root root   106 Dec 13 10:48 tests
[root@k8s-worker86 dlio_benchmark]#
[root@k8s-worker86 dlio_benchmark]#
[root@k8s-worker86 dlio_benchmark]# python3 -m src.dlio_postprocessor.py --output-folder /mnt/dlio/hydra_log/default/2022-12-15-02-27-09
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib64/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/root/dlio_benchmark/src/dlio_postprocessor.py", line 24, in <module>
    from statistics import mean, median, stdev, quantiles
ImportError: cannot import name 'quantiles'
[root@k8s-worker86 dlio_benchmark]#
[root@k8s-worker86 dlio_benchmark]#
[root@k8s-worker86 dlio_benchmark]#
[root@k8s-worker86 dlio_benchmark]# pip install statistics
Requirement already satisfied: statistics in /usr/local/lib/python3.6/site-packages (1.0.3.5)
Requirement already satisfied: docutils>=0.3 in /usr/local/lib/python3.6/site-packages (from statistics) (0.18.1)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
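The traceback shows Python 3.6, while `statistics.quantiles` was added to the standard library in Python 3.8; the `statistics` package installed from PyPI is a separate backport and does not appear to supply it. A small guard, sketched with a hypothetical helper name, that reports the requirement up front instead of failing at import time:

```python
import sys


def has_statistics_quantiles(version_info=sys.version_info):
    """True when the running interpreter ships statistics.quantiles."""
    return tuple(version_info[:2]) >= (3, 8)


if has_statistics_quantiles():
    from statistics import quantiles
else:
    # Leave the name unset so callers can print a clear message.
    quantiles = None
```

Running the post-processor under Python 3.8 or newer is likely the simplest way around the error.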

Output explanation?

I just started testing DLIO and the output has me confused. Is there an explanation of the output somewhere?

Thanks!

Impact of affinity on read threads in DLIO.

On Livermore Computing machines, the srun and lrun commands set a core affinity of 1 core, which can make all threads on the node run on a single core even if more cores are available.

We should increase the core affinity on our mpirun to make use of all cores in the system. If users have an affinity less than read threads, we should throw a warning to let them know.
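The proposed warning could look roughly like this. It is a sketch with hypothetical function names; it assumes Linux, where `os.sched_getaffinity` is available, and falls back to `os.cpu_count()` elsewhere:

```python
import os
import warnings


def available_cores():
    """Cores this process may run on, honoring the launcher's affinity mask."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def warn_if_affinity_too_small(read_threads, cores=None):
    """Warn and return True when fewer cores than read threads are usable."""
    cores = available_cores() if cores is None else cores
    if cores < read_threads:
        warnings.warn(
            f"{read_threads} read threads requested but only {cores} "
            "core(s) are available to this process; threads will share cores"
        )
        return True
    return False
```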

Building horovod requires gcc version < 8

I am trying to build DLIO in a virtual environment to test its portability but I cannot build horovod.
I get the following error indicating that the gcc version on my machine is not supported.
I could downgrade gcc, however this is a shared system and it might break other things.

Maybe we should go back to using docker to increase portability. Or are we already supposed to be using docker? If so, we should say so in the readme.

Failed to build horovod
Installing collected packages: horovod
  Running setup.py install for horovod ... error
  error: subprocess-exited-with-error
  
  × Running setup.py install for horovod did not run successfully.
  │ exit code: 1
  ╰─> [278 lines of output]
      running install
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.8
      creating build/lib.linux-x86_64-3.8/horovod
(... skipped)
      Running CMake in build/temp.linux-x86_64-3.8/RelWithDebInfo:
      cmake /tmp/pip-install-7kh3ez0t/horovod_b725b0ff4af74068b022383e3db12517 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELWITHDEBINFO=/tmp/pip-install-7kh3ez0t/horovod_b725b0ff4af74068b022383e3db12517/build/lib.linux-x86_64-3.8 -DPYTHON_EXECUTABLE:FILEPATH=/dl-bench/lhovon/dlio_benchmark/.venv/bin/python3
      cmake --build . --config RelWithDebInfo -- -j8 VERBOSE=1
      -- Could not find CCache. Consider installing CCache to speed up compilation.
      -- The CXX compiler identification is GNU 9.4.0
      -- Check for working CXX compiler: /usr/bin/c++
      -- Check for working CXX compiler: /usr/bin/c++ -- works
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Build architecture flags: -mf16c -mavx -mfma
      -- Using command /dl-bench/lhovon/dlio_benchmark/.venv/bin/python3
      -- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so (found version "3.1")
      -- Found MPI: TRUE (found version "3.1")
      -- Looking for a CUDA compiler
      -- Looking for a CUDA compiler - /usr/bin/nvcc
      -- Looking for a CUDA host compiler - /usr/bin/c++
      -- The CUDA compiler identification is unknown
      -- Check for working CUDA compiler: /usr/bin/nvcc
      -- Check for working CUDA compiler: /usr/bin/nvcc -- broken
      CMake Error at /usr/share/cmake-3.16/Modules/CMakeTestCUDACompiler.cmake:46 (message):
        The CUDA compiler
      
          "/usr/bin/nvcc"
      
        is not able to compile a simple test program.
      
        It fails with the following output:
      
          Change Dir: /tmp/pip-install-7kh3ez0t/horovod_b725b0ff4af74068b022383e3db12517/build/temp.linux-x86_64-3.8/RelWithDebInfo/CMakeFiles/CMakeTmp
      
          Run Build Command(s):/usr/bin/make cmTC_a113d/fast && /usr/bin/make -f CMakeFiles/cmTC_a113d.dir/build.make CMakeFiles/cmTC_a113d.dir/build
          make[1]: Entering directory '/tmp/pip-install-7kh3ez0t/horovod_b725b0ff4af74068b022383e3db12517/build/temp.linux-x86_64-3.8/RelWithDebInfo/CMakeFiles/CMakeTmp'
          Building CUDA object CMakeFiles/cmTC_a113d.dir/main.cu.o
          /usr/bin/nvcc -ccbin=/usr/bin/c++    -x cu -c /tmp/pip-install-7kh3ez0t/horovod_b725b0ff4af74068b022383e3db12517/build/temp.linux-x86_64-3.8/RelWithDebInfo/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_a113d.dir/main.cu.o
          In file included from /usr/include/cuda_runtime.h:83,
                           from <command-line>:
          /usr/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
            138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
                |  ^~~~~
          make[1]: *** [CMakeFiles/cmTC_a113d.dir/build.make:66: CMakeFiles/cmTC_a113d.dir/main.cu.o] Error 1
          make[1]: Leaving directory '/tmp/pip-install-7kh3ez0t/horovod_b725b0ff4af74068b022383e3db12517/build/temp.linux-x86_64-3.8/RelWithDebInfo/CMakeFiles/CMakeTmp'
          make: *** [Makefile:121: cmTC_a113d/fast] Error 2
      
      
      
      
      
        CMake will not be able to correctly generate this project.
      Call Stack (most recent call first):
        CMakeLists.txt:177 (enable_language)
      
      
      -- Configuring incomplete, errors occurred!
      See also "/tmp/pip-install-7kh3ez0t/horovod_b725b0ff4af74068b022383e3db12517/build/temp.linux-x86_64-3.8/RelWithDebInfo/CMakeFiles/CMakeOutput.log".
      See also "/tmp/pip-install-7kh3ez0t/horovod_b725b0ff4af74068b022383e3db12517/build/temp.linux-x86_64-3.8/RelWithDebInfo/CMakeFiles/CMakeError.log".
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-7kh3ez0t/horovod_b725b0ff4af74068b022383e3db12517/setup.py", line 213, in <module>
          setup(name='horovod',
        File "/dl-bench/lhovon/dlio_benchmark/.venv/lib/python3.8/site-packages/setuptools/__init__.py", line 145, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/dl-bench/lhovon/dlio_benchmark/.venv/lib/python3.8/site-packages/setuptools/command/install.py", line 61, in run
          return orig.install.run(self)
        File "/usr/lib/python3.8/distutils/command/install.py", line 589, in run
          self.run_command('build')
        File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3.8/distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/dl-bench/lhovon/dlio_benchmark/.venv/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 84, in run
          _build_ext.run(self)
        File "/usr/lib/python3.8/distutils/command/build_ext.py", line 340, in run
          self.build_extensions()
        File "/tmp/pip-install-7kh3ez0t/horovod_b725b0ff4af74068b022383e3db12517/setup.py", line 145, in build_extensions
          subprocess.check_call(command, cwd=cmake_build_dir)
        File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-install-7kh3ez0t/horovod_b725b0ff4af74068b022383e3db12517', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELWITHDEBINFO=/tmp/pip-install-7kh3ez0t/horovod_b725b0ff4af74068b022383e3db12517/build/lib.linux-x86_64-3.8', '-DPYTHON_EXECUTABLE:FILEPATH=/dl-bench/lhovon/dlio_benchmark/.venv/bin/python3']' returned non-zero exit status 1.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> horovod

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
