mesh's Introduction

Mesh TensorFlow - Model Parallelism Made Easier

Introduction

Mesh TensorFlow (mtf) is a language for distributed deep learning, capable of specifying a broad class of distributed tensor computations. The purpose of Mesh TensorFlow is to formalize and implement distribution strategies for your computation graph over your hardware/processors. For example: "Split the batch over rows of processors and split the units in the hidden layer across columns of processors." Mesh TensorFlow is implemented as a layer over TensorFlow.

Watch our YouTube video.

Do I need Mesh TensorFlow?

If you just want data-parallel training (batch-splitting), then you do not need Mesh TensorFlow, though Mesh TensorFlow can do this. The most common reasons for more sophisticated parallel computation are:

  • The parameters of the model do not fit on one device - e.g. a 5-billion-parameter language model.

  • The activations for a single example are so large that they do not fit on one device - e.g. a large 3D image model (experimental/unet.py).

  • Lower-latency parallel inference (at batch size 1).

The Mesh TensorFlow Approach to Distributed Computation

  • A "Mesh" is an n-dimensional array of processors, connected by a network.

  • Each tensor is distributed (split and/or replicated) across all processors in a mesh.

  • Tensor dimensions and mesh dimensions are named. The layouts of all tensors follow from a set of user-defined layout rules which specify which tensor-dimensions are split across which mesh-dimensions. This ensures that the corresponding dimensions in different tensors are split in the same manner.

  • Layouts do not affect results - only performance.

  • The implementation of an operation involves parallel computation on all processors in the mesh, and sometimes also collective communication. A processor usually just manipulates the slices of the input tensors already resident on that processor, and produces the slice of the output that goes on that processor.
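
To make the layout idea concrete, here is a small self-contained sketch (plain Python, no Mesh TensorFlow required; the dimension and mesh names are just illustrative) of how layout rules determine the slice shape on each processor:

# Tensor dimensions, mesh dimensions, and layout rules, all referred to by name.
tensor_shape = {"batch": 100, "hidden": 1024}
mesh_shape = {"processor_rows": 2, "processor_cols": 4}
layout_rules = {"batch": "processor_rows", "hidden": "processor_cols"}

# Each split dimension shrinks by the size of the mesh dimension it maps to;
# unsplit dimensions are replicated at full size.
slice_shape = {
    dim: size // mesh_shape.get(layout_rules.get(dim, ""), 1)
    for dim, size in tensor_shape.items()
}
print(slice_shape)  # {'batch': 50, 'hidden': 256}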

Getting Started

Installation

To install the latest stable version, run

pip install mesh-tensorflow

To install the latest development version, run

pip install -e "git+https://github.com/tensorflow/mesh.git#egg=mesh-tensorflow"

Installing mesh-tensorflow does not automatically install or update TensorFlow. We recommend installing it via pip install tensorflow or pip install tensorflow-gpu. See TensorFlow’s installation instructions for details. If you're using a development version of Mesh TensorFlow, you may need to use TensorFlow's nightly package (tf-nightly).
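
A quick sanity check after installation (a minimal sketch; it assumes a compatible TensorFlow is installed and only verifies that the package exposes the core API):

import mesh_tensorflow as mtf

graph = mtf.Graph()
mesh = mtf.Mesh(graph, "sanity_check_mesh")
dim = mtf.Dimension("batch", 8)
print(graph, mesh, dim)  # should print without raising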

Example Network (MNIST)

To illustrate, let us consider a simple model for the MNIST image-classification task. Our network has one hidden layer with 1024 units, and an output layer with 10 units (corresponding to the 10 digit classes).

The code consists of two parts, the first describing the mathematical operations, and the second describing the devices and tensor/computation layout. For the full example, see examples/mnist.py. TODO(noam): verify that this code works.

# tf_images is a tf.Tensor with shape [100, 28, 28] and dtype tf.float32
# tf_labels is a tf.Tensor with shape [100] and dtype tf.int32
graph = mtf.Graph()
mesh = mtf.Mesh(graph, "my_mesh")
batch_dim = mtf.Dimension("batch", 100)
rows_dim = mtf.Dimension("rows", 28)
cols_dim = mtf.Dimension("cols", 28)
hidden_dim = mtf.Dimension("hidden", 1024)
classes_dim = mtf.Dimension("classes", 10)
images = mtf.import_tf_tensor(
    mesh, tf_images, shape=[batch_dim, rows_dim, cols_dim])
labels = mtf.import_tf_tensor(mesh, tf_labels, [batch_dim])
w1 = mtf.get_variable(mesh, "w1", [rows_dim, cols_dim, hidden_dim])
w2 = mtf.get_variable(mesh, "w2", [hidden_dim, classes_dim])
# einsum is a generalization of matrix multiplication (see numpy.einsum)
hidden = mtf.relu(mtf.einsum([images, w1], output_shape=[batch_dim, hidden_dim]))
logits = mtf.einsum([hidden, w2], output_shape=[batch_dim, classes_dim])
loss = mtf.reduce_mean(mtf.layers.softmax_cross_entropy_with_logits(
    logits, mtf.one_hot(labels, classes_dim), classes_dim))
w1_grad, w2_grad = mtf.gradients([loss], [w1, w2])
update_w1_op = mtf.assign(w1, w1 - w1_grad * 0.001)
update_w2_op = mtf.assign(w2, w2 - w2_grad * 0.001)

In the code above, we have built a Mesh TensorFlow graph, which is simply a Python structure. We have completely defined the mathematical operations. In the code below, we specify the mesh of processors and the layout of the computation.

devices = ["gpu:0", "gpu:1", "gpu:2", "gpu:3"]
mesh_shape = [("all_processors", 4)]
layout_rules = [("batch", "all_processors")]
mesh_impl = mtf.placement_mesh_impl.PlacementMeshImpl(
    mesh_shape, layout_rules, devices)
lowering = mtf.Lowering(graph, {mesh:mesh_impl})
tf_update_ops = [lowering.lowered_operation(update_w1_op),
                 lowering.lowered_operation(update_w2_op)]

The particular layout above implements data-parallelism, splitting the batch of examples evenly across all four processors. Any tensor with a "batch" dimension (e.g. images, hidden, logits, and their gradients) is split in that dimension across all processors, while any tensor without a "batch" dimension (e.g. the model parameters) is replicated identically on every processor.

Alternatively, for model-parallelism, we can set layout_rules=[("hidden", "all_processors")]. In this case, any tensor with a "hidden" dimension (e.g. hidden, w1, w2) is split, while any other tensor (e.g. images, logits) is fully replicated.
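
The corresponding placement for this model-parallel layout is a one-line change from the data-parallel example above (a sketch, reusing the same four GPUs):

devices = ["gpu:0", "gpu:1", "gpu:2", "gpu:3"]
mesh_shape = [("all_processors", 4)]
layout_rules = [("hidden", "all_processors")]
mesh_impl = mtf.placement_mesh_impl.PlacementMeshImpl(
    mesh_shape, layout_rules, devices)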

We can even combine data-parallelism and model-parallelism on a 2-dimensional mesh of processors. We split the batch along one dimension of the mesh, and the units in the hidden layer along the other dimension of the mesh, as below. In this case, the hidden layer is tiled across the four processors, being split in both the "batch" and "hidden" dimensions.

mesh_shape = [("processor_rows", 2), ("processor_cols", 2)]
layout_rules = [("batch", "processor_rows"), ("hidden", "processor_cols")]

Where does the network communication happen?

Some Mesh TensorFlow operations cause network communication. For example, an einsum (generalized matrix multiplication) is computed as follows:

  • On each processor, compute the einsum of the slices of the two operands that are local to that processor.
  • If no reduced-out dimensions are split, then we are done.
  • If reduced-out dimensions are split, then perform an "allreduce" operation on the resulting slices - summing across any mesh dimensions over which the reduced-out dimensions are split.

Where the allreduces happen depends on the computation layout. For example, in a data-parallel layout where the "batch" dimension is split, allreduces will happen when computing the parameter gradients, since this involves matrix multiplications which reduce out the "batch" dimension.
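
For intuition, the parameter-gradient einsum in the MNIST example reduces out "batch"; under the data-parallel layout each processor computes a partial result from its batch slice, and the slices are then allreduced. A sketch (not the literal gradient code; logits stands in here for the back-propagated gradient with respect to the logits):

# Shape [batch, hidden] x [batch, classes] -> [hidden, classes]:
# "batch" is reduced out, so if "batch" is split, an allreduce follows.
w2_grad_like = mtf.einsum([hidden, logits],
                          output_shape=[hidden_dim, classes_dim])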

How do I pick a layout?

While results do not depend on layout (except in the realm of roundoff errors and random seeds), performance and memory consumption depend heavily on layout. Fortunately, the auto_mtf subpackage provides a method for automatically choosing a layout. For more information about what auto_mtf is doing to choose a layout, see its README file.

import mesh_tensorflow.auto_mtf

graph = mtf.Graph()
mesh = mtf.Mesh(graph, "my_mesh")
# Insert model code here.
outputs = [logits, loss]  # iterable of mtf.Tensor, the outputs you're computing
mesh_shape = [("processor_rows", 2), ("processor_cols", 2)]
layout_rules = mtf.auto_mtf.layout(graph, mesh_shape, outputs)

It is possible for advanced users to eke out additional performance by tuning the layout (and model) further. Mesh TensorFlow helps by accumulating and printing counters of computation/communication. To start, here are some tricks/guidelines.

  • It is illegal for two dimensions of the same tensor to be split across the same mesh dimension.
  • For any compute-intense operation (e.g. einsum), make sure that all mesh-dimensions are used to split dimensions of the inputs or outputs. Otherwise, computation is duplicated.
  • To keep the ratio of compute/communication high (i.e. not be bandwidth-bound), split dimensions into large chunks. This should be familiar in the data-parallelism case, where we want a large batch size per processor to avoid spending most of our time communicating.

The Mesh TensorFlow Language

Mesh TensorFlow (v0.0) is implemented as a Python library which can generate part of a TensorFlow graph. The user first builds an mtf.Graph (the analog of a TensorFlow graph) made up of mtf.Tensors and mtf.Operations. As in TensorFlow, this graph consists of simple Python objects. The user then creates an mtf.Lowering object, which lowers the mtf.Graph into TensorFlow, adding to the default TensorFlow graph.

The Mesh TensorFlow language is nearly identical to TensorFlow, with the familiar notion of a Graph, Tensors, Operations, and automatic gradient computation. The principal differences are as follows:

Meshes replace devices

A Mesh is an n-dimensional array of processors with named dimensions. Each Tensor is assigned to a Mesh, instead of a device.

Tensor dimensions are named

Each Tensor has a static Shape, which is a tuple of different "Dimensions". A Dimension is a (name, size) pair. For example, the shape of a Tensor representing a batch of images might be:

[("batch", 100), ("rows", 28"), ("cols", 28), ("channels", 3)].

Layouts

A Tensor is laid out on its mesh with one slice on each processor. A Tensor "layout" is an injective partial map specifying which dimensions of the tensor are (evenly) split across which dimensions of the mesh. No dimension of a tensor may be split across two dimensions of its mesh, and no two dimensions of a tensor may be split across the same dimension of its mesh. The user defines a global set of layout rules in the form of (tensor-dimension-name, mesh-dimension-name) pairs. A dimension of a tensor is split across a dimension of its mesh if there is a matching rule.

Example Layouts

Take our example Tensor image_batch with shape: [("batch", 100), ("rows", 28), ("cols", 28), ("channels", 3)]

Assume that this Tensor is assigned to a mesh of 8 processors with shape: [("processor_rows", 2), ("processor_cols", 4)]

  • If we use an empty set of layout rules [], we get no splitting. Each processor contains the whole Tensor.

  • If we use the layout rules "batch:processor_cols", then the "batch" dimension of the Tensor is split across the "processor_cols" dimension of the batch. This means that each processor contains a Tensor slice with shape [25, 28, 28, 3]. For example, processors (0, 3) and (1, 3) contain identical slices - image_batch[75:100, :, :, :].

  • If we use the layout rules "rows:processor_rows;cols:processor_cols", then the image is split in two dimensions, with each processor containing one spatial tile with shape [100, 14, 7, 3]. For example, processor (0, 1) contains the slice image_batch[:, 0:14, 7:14, :].

Some layout rules would lead to illegal layouts:

  • "batch:processor_rows;rows:processor_rows" is illegal because two tensor dimensions could not be split across the same mesh dimension.

  • "channels:processor_rows" is illegal because the size of the tensor dimension is not evenly divisible by the size of the mesh dimension.

Einsum

Mesh TensorFlow uses Einstein-summation notation, mtf.einsum(inputs, output_shape), using the (named) Dimensions as the symbols. Matrix multiplication, broadcast, sum-reduction, and transposition can all be expressed as special cases of mtf.einsum, though the familiar interfaces are also supported. The operation is lowered to slice-wise tf.einsums, followed by allreduce across any mesh-dimensions corresponding to the summed-out Tensor dimensions.
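
For example, using the dimensions from the MNIST model above, both a matrix multiplication and a sum-reduction are einsums whose output_shape simply omits the summed-out dimension (a sketch):

# Matrix multiplication: "hidden" is absent from output_shape, so it is summed out.
scores = mtf.einsum([hidden, w2], output_shape=[batch_dim, classes_dim])

# Sum-reduction over "hidden": a single-operand einsum.
hidden_totals = mtf.einsum([hidden], output_shape=[batch_dim])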

Reshape can be expensive

mtf.reshape(x, new_shape) is used to change a Tensor's shape, potentially leading to a new tensor layout and hence network communication.
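
For example (a sketch reusing the MNIST dimensions; the "pixels" dimension is hypothetical):

# Flatten the spatial dimensions; total sizes must match (28 * 28 = 784).
pixels_dim = mtf.Dimension("pixels", 28 * 28)
flat_images = mtf.reshape(images, [batch_dim, pixels_dim])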

CPU/GPU/TPU implementations

Mesh TensorFlow works on CPU, GPU and TPU. The TPU implementation is very different from the CPU/GPU implementation.

Multi-CPU/GPU meshes are implemented with PlacementMeshImpl. In this case Mesh TensorFlow emits separate TensorFlow operations placed on the different devices, all in one big TensorFlow graph.

TPU meshes are implemented with SimdMeshImpl. In this case, Mesh TensorFlow emits TensorFlow operations (and communication collectives) from the perspective of one core, and this same program runs on every core, relying on the fact that each core actually performs the same operations. This piggy-backs on the TPU data-parallelism infrastructure, which operates the same way. This "SIMD" approach keeps the TensorFlow and XLA graphs from growing with the number of cores. The differences between cores are as follows:

  • different slices of the variables (this works now)
  • different positions in the collective communication (this works now)
  • different slices of the infed and outfed tensors. We currently work around this by requiring that all imported/exported tensors be fully-replicated. In the future, we should handle this correctly.

Experimental features

The input pipeline of Mesh TensorFlow models can become a bottleneck when training with large inputs (e.g., high-resolution images). We provide new APIs and a new input pipeline for running Mesh TensorFlow models; you can find them under the experimental/ folder. We suggest trying them when your input is so large that running Mesh TensorFlow models with the default APIs is almost infeasible. To be more specific:

  • The BROADCAST mode in TPUEstimator does not scale up to large inputs (images of tens of millions of pixels). We provide a new input pipeline: experimental/input_reader.py. See experimental/model_executor.py on how to use it.
  • If your model takes images as input and has convolution layers, you cannot directly map image height and width dimensions to mesh dimensions, due to the sliding-window nature of convolution. Instead, you should use spatial partitioning. We provide examples in experimental/unet.py.
  • If you want more control on the training and evaluation loop, instead of using the default API (TPUEstimator) to run your model, you can use low level APIs in experimental/model_executor.py.

Note that we have not tested the experimental code on GPUs; we ran it only on TPUs. Some debugging would likely be required for it to work on GPUs.

Instructions for running on cloud-tpu

Note: It requires tensorflow>=1.11.0.

Prerequisite

Please go through the Transformer tutorial.

Create VM and TPU instance in Cloud console

TODO(trandustin,ylc): update given mtf pypi package

ctpu up -name=ylc-mtf-donut -tf-version=nightly -tpu-size=v2-8 -zone=us-central1-b

SSH into VM

git clone https://github.com/tensorflow/mesh.git
cd mesh/
pip install --user .

Run the Transformer model (no Tensor2Tensor dependencies)

pip install tensorflow_datasets

cd mesh/
DATA_DIR=gs://noam-mtf/data
MODEL_DIR=gs://noam-mtf/transformer_standalone
TPU=noam-mtf-donut

# MODEL HPARAMS AND DIRECTORY  (uncomment one)
# base model
MODEL=./transformer/gin/model_base.gin
# 5B parameters (too big for this dataset, only trains with model-parallelism)
# MODEL=./transformer/gin/model_5b.gin

# UNCOMMENT ONE OF THESE
# Data-parallelism
LAYOUT=./transformer/gin/layout_data_parallel.gin
# Model-parallelism
# LAYOUT=./transformer/gin/layout_model_parallel.gin
# Data-parallelism and Model-Parallelism
# LAYOUT=./transformer/gin/layout_data_and_model_parallel.gin

# TRAIN
python examples/transformer_standalone.py \
  --tpu=$TPU --data_dir=$DATA_DIR --model_dir=$MODEL_DIR --gin_file=$MODEL \
  --gin_file=$LAYOUT --gin_param="run.mode='train'"

# EVAL
python examples/transformer_standalone.py \
  --tpu=$TPU --data_dir=$DATA_DIR --model_dir=$MODEL_DIR --gin_file=$MODEL \
  --gin_file=$LAYOUT --gin_param="run.mode='evaluate'"

The above code will train on the LM1B language modeling benchmark, as specified in examples/transformer_standalone_defaults.gin. To train a sequence-to-sequence model on WMT14 en-de, change utils.run.dataset to wmt_translate_ende/ende_subwords8k_t2t and set utils.run.mode to True. Note that the wmt_translate_ende/ende_subwords8k_t2t dataset was removed from TensorFlow Datasets in commit 211cb6f, so in order to train a model using this dataset you need to install a version of TFDS before this commit. Then, you can decode the WMT en-de development set and evaluate it using SacreBLEU like so:

# INFER
pip3 install sacrebleu
mkdir ~/input ~/output
DECODE_INPUT=/home/$USER/input/ende.dev
DECODE_OUTPUT=/home/$USER/output/ende.dev.out
~/.local/bin/sacrebleu -t wmt13 -l en-de --echo src > $DECODE_INPUT
python examples/transformer_standalone.py \
  --tpu=$TPU --data_dir=$DATA_DIR --model_dir=$MODEL_DIR --gin_file=$MODEL \
  --gin_file=$LAYOUT \
  --gin_param="decode_from_file.input_filename='$DECODE_INPUT'" \
  --gin_param="decode_from_file.output_filename='$DECODE_OUTPUT'" \
  --gin_param="run.mode='infer'"

# Compute BLEU score for dev set
cat $DECODE_OUTPUT | ~/.local/bin/sacrebleu -t wmt13 -l en-de -tok intl

Run the Transformer model with Tensor2Tensor config

git clone https://github.com/tensorflow/tensor2tensor.git
cd tensor2tensor/
pip install --user  .

Before running the model, you need to prepare the training data and bucket for storing checkpoints. Refer to the Transformer tutorial to learn how to generate the training data and create buckets.

CONF=mtf_transformer_paper_tr_0_mesh_8
NAME=ende_$CONF\_0828
MODEL=mtf_transformer
PROBLEM=translate_ende_wmt32k_packed

DATA_DIR=gs://xxxx
OUT_DIR=gs://xxxx
TPU_NAME=ylc-mtf-donut

tensor2tensor/bin/t2t-trainer \
  --model=$MODEL \
  --hparams_set=$CONF \
  --problem=$PROBLEM \
  --train_steps=10000 \
  --eval_steps=200 \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --use_tpu=True \
  --cloud_tpu_name=$TPU_NAME

Run the toy model without Tensor2Tensor dependencies

This toy model contains two fully-connected layers which aim to train an identity function: f(x) = x. Since there are 8 TPU cores, we can arbitrarily change FLAGS.mesh_shape and FLAGS.layout to achieve different data-parallelism and model-parallelism strategies.

MODEL_DIR=gs://xxxx
TPU_NAME=ylc-mtf-donut

# 2-way data-parallelism and 4-way model-parallelism.
# In this configuration, we split the batch dimension across 2 cores and the
# hidden dimension across 4 cores.
python examples/toy_model_tpu.py \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --io_size=8 \
  --hidden_size=8 \
  --mesh_shape='x:2;y:4' \
  --layout='batch:x;hidden:y'

# 8-way model-parallelism.
# In this configuration, we split the hidden dimension across 8 cores.
python examples/toy_model_tpu.py \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --io_size=8 \
  --hidden_size=8 \
  --mesh_shape='all:8' \
  --layout='hidden:all'

References

N. Shazeer, Y. Cheng, N. Parmar, D. Tran, A. Vaswani, P. Koanantakool, P. Hawkins, H. Lee, M. Hong, C. Young, R. Sepassi, and B. Hechtman. Mesh-TensorFlow: Deep learning for supercomputers. In Neural Information Processing Systems, 2018.

@inproceedings{shazeer2018mesh,
  author = {Noam Shazeer and Youlong Cheng and Niki Parmar and Dustin Tran and Ashish Vaswani and Penporn Koanantakool and Peter Hawkins and HyoukJoong Lee and Mingsheng Hong and Cliff Young and Ryan Sepassi and Blake Hechtman},
  title = {{Mesh-TensorFlow}: Deep Learning for Supercomputers},
  booktitle = {Neural Information Processing Systems},
  year = {2018},
}


mesh's Issues

mtf.dropout is inverted

mtf.dropout(x, 0.1) means dropout with 90% probability.

tf.dropout(x, 0.1) means dropout with 10% probability.

For around a month, this has caused an agonizing bug with a GPT project that was ported to mesh tensorflow.

Is there a reason this is inverted? Is it too late to change? If not, you might want to issue some sort of warning, somewhere. Although mtf doesn't explicitly say that it's compatible with the tf api, it was somewhat shocking to end-users that it inverted a basic operation.
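
For readers hitting this, a minimal illustration of the two conventions (a sketch; x is any mtf.Tensor of activations):

# Mesh TensorFlow: the second positional argument is the KEEP probability.
y = mtf.dropout(x, 0.9)              # keeps ~90% of activations, drops ~10%
# TF 2.x: tf.nn.dropout takes the DROP rate instead.
# z = tf.nn.dropout(x_tf, rate=0.1)  # also drops ~10%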

Does mesh tensorflow really support GPU training?

Hi, I have been trying to use Mesh TensorFlow on GPUs. I ran the mnist.py example to test the speed on GPU and CPU by setting the CUDA_VISIBLE_DEVICES variable (I removed the convolutional layers due to the cuDNN version). However, using GPUs I obtained 80-100 global_steps/sec, and got similar values using the CPU. I originally came to doubt the real GPU support from my attempts to train a T5 model on GPUs. Do you have a working example that demonstrates GPU support, particularly in terms of speed?

Set up website under tensorflow.org

Alternatively, we may not want to commit to more open-source platforms (this website, but also a mailing list). Instead, we may want to look into how Mesh TF could be merged into core TF. If that's the future, this TODO would only be useful for the short-term.

Running the transformer model with Tensor2Tensor using Mesh-Tensorflow(GPU implementation)

I am trying to run the transformer model with Tensor2Tensor using Mesh TensorFlow (GPU implementation), but I am facing a few errors.

steps to reproduce:

PROBLEM=translate_enfr_wmt32k
MODEL=mtf_transformer
HPARAMS=mtf_transformer_paper_tr_0_mesh_8
DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS
mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# datagen:
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

# train:
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=10

error
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Multiple OpKernel registrations match NodeDef '{{node transformer/dropout/binary_op/parallel_0_1/Less}}': 'op: "Less" device_type: "CPU" constraint { name: "T" allowed_values { list { type: DT_BFLOAT16 } } }' and 'op: "Less" device_type: "CPU" constraint { name: "T" allowed_values { list { type: DT_BFLOAT16 } } }'
[[transformer/dropout/binary_op/parallel_0_1/Less]]

Regarding data and model parallelism of mnist python code in examples

I have made changes to mnist.py in the examples section, as documented on GitHub, to achieve data parallelism and model parallelism. I have collected nvprof files for each of them. The results seem a bit off: p2p interaction is happening in data parallelism but not in model parallelism. I went back, re-created the files, and checked again, but it still looks the same. I am attaching screenshots of nvprof. I did this using 4 GPUs. I am also attaching the nvprof files.

data-parallelism

model-parallelism

link for model parallelism nvprof file:
https://drive.google.com/open?id=1omQ_neb7eUgmDRnYMmLUyKzD2inO4Kai

link for data parallelism nvprof file:
https://drive.google.com/open?id=1MHGdzexNIcV9L66x1VkUQ11DBcM5H_qv

Performance on GPUs and multiple GPU support

We tried to run Mesh-TensorFlow to train T5 on GPUs following the instructions on T5's repository, but the training is extremely slow.

global_step/sec: 0.0467347
examples/sec: 0.186939

The training script successfully detected GPUs (showing "Adding visible gpu devices: ..."), but most of the computation seems to run on the CPU.
By enabling log_device_placement, we can see many operators on both CPUs and GPUs.
ProfilerHook showed that it actually uses both, but I couldn't tell whether this behavior is expected.

I am wondering if Mesh-TensorFlow runs on GPUs in a practical sense.
I found an issue that mentioned a similar problem, but it was closed with no answer (#35).

I also failed to find reliable documents about training on multiple GPUs.
An existing issue #20 mentioned the same question, but no answer was given.

I would appreciate it if someone could share any information regarding the above questions.

mtf.reduce_mean crashes when reducing over no elements

culprit:
return reduce_sum(x, output_shape=output_shape) * (output_shape.size / x.shape.size)

Desired behavior:

  • when reduced dimension is size 0, should return a tensor of NaNs
  • more importantly, when reduced dimension is non-zero, should just return a new tensor of size zero

related bug: division by 0 shouldn't crash, should return +- inf
relevant line: return ScalarMultiplyOperation(x1, 1.0 / x2).outputs[0]
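
A hedged sketch of one possible guard, following the snippet above (not the maintainers' fix; reduce_sum here is the module-level function referenced in the culprit line):

def reduce_mean_guarded(x, output_shape):
  # Sketch only: avoid the division-by-zero crash when x has no elements.
  if x.shape.size == 0:
    # Reducing over no elements: propagate NaN instead of crashing.
    return reduce_sum(x, output_shape=output_shape) * float("nan")
  return reduce_sum(x, output_shape=output_shape) * (
      output_shape.size / x.shape.size)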

package published to pypi is broken?

cluster@master0:~/diseaseTools$ clear
cluster@master0:~/diseaseTools$ docker run -it python:3.6-jessie sh
# pip install mesh-tensorflow

Collecting mesh-tensorflow
  Downloading https://files.pythonhosted.org/packages/7b/9a/8f46d2bf6ecc8f622a4d3a7a9838c340bf0e6523a2bfc2a56a0ce870d2d8/mesh_tensorflow-0.0.1-py2.py3-none-any.whl
Collecting six (from mesh-tensorflow)
  Downloading https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Collecting future (from mesh-tensorflow)
  Downloading https://files.pythonhosted.org/packages/00/2b/8d082ddfed935f3608cc61140df6dcbf0edea1bc3ab52fb6c29ae3e81e85/future-0.16.0.tar.gz (824kB)
    100% |████████████████████████████████| 829kB 21.6MB/s
Building wheels for collected packages: future
  Running setup.py bdist_wheel for future ... done
  Stored in directory: /root/.cache/pip/wheels/bf/c9/a3/c538d90ef17cf7823fa51fc701a7a7a910a80f6a405bf15b1a
Successfully built future
Installing collected packages: six, future, mesh-tensorflow
Successfully installed future-0.16.0 mesh-tensorflow-0.0.1 six-1.11.0
# # python
Python 3.6.6 (default, Oct 16 2018, 07:22:54)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mesh_tensorflow as mtf
>>> mtf.Graph()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'mesh_tensorflow' has no attribute 'Graph'
>>> mtf.__path__
['/usr/local/lib/python3.6/site-packages/mesh_tensorflow']
>>> quit()
# ls /usr/local/lib/python3.6/site-packages/mesh_tensorflow
__init__.py  __pycache__  import_test.py
#

As shown, there is nothing inside the package.

When I do the equivalent with the dev install, pip install -e "git+https://github.com/tensorflow/mesh.git#egg=mesh-tensorflow" things work.

Incorrect tensorflow dependency requirements

In setup.py, the tensorflow requirement is >=1.15. However, in mesh_tensorflow.utils:

with tf.summary.create_file_writer(model_dir).as_default():

Here, tf.summary is a TF 2.0 module, so when using the gin config

utils.tpu_estimator_model_fn.tpu_summaries = True

it throws an error with tensorflow 1.15:

(screenshot of the resulting error omitted)

Non autoregressive Predict and Evaluate doesn’t Work

Hi,

I am using the Google T5 library, which is based on Mesh TensorFlow, to train a non-autoregressive model like BERT.

Training runs without a problem, but both prediction and evaluation fail because the Unitransformer model expects an autoregressive model for decoding.

ERROR:tensorflow:Error recorded from prediction_loop: must be autoregressive
  In call to configurable 'sample_autoregressive' (<function Unitransformer.sample_autoregressive at 0x7f2a3276a620>)
INFO:tensorflow:prediction_loop marked as finished
WARNING:tensorflow:Reraising captured error
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-86647c2a14e0> in <module>()
      2 model.eval(
      3     mixture_or_task_name="ss3",
----> 4     checkpoint_steps="all"
      5 )

31 frames
/usr/local/lib/python3.6/dist-packages/mesh_tensorflow/transformer/transformer.py in sample_autoregressive(self, partial_sequences, stop_at_token, max_steps, temperature, variable_dtype, encoder_output, encoder_sequence_id, encoder_inputs, shared_params, has_partial_sequences, encoder_layer_outputs, never_end, remove_partial_sequences, sampling_keep_top_k)
    778     """
    779     if not self.autoregressive:
--> 780       raise ValueError("must be autoregressive")
    781 
    782     inputs = partial_sequences

ValueError: must be autoregressive
  In call to configurable 'sample_autoregressive' (<function Unitransformer.sample_autoregressive at 0x7f2a3276a620>)
  In call to configurable 'decode' (<function decode at 0x7f2a3270eb70>)

This is the gin file that I have used:

import mesh_tensorflow.optimize
import mesh_tensorflow.transformer.learning_rate_schedules
import mesh_tensorflow.transformer.transformer_layers
import t5.models.mesh_transformer
import t5.data.sentencepiece_vocabulary

# Macros:
# ==============================================================================
d_ff = 3072
d_kv = 64
d_model = 768
dropout_rate = 0.1
MIXTURE_NAME = 'ss3'
num_heads = 12
num_layers = 12
model_parallelism = 1
split= "train"
tokens_per_batch = 65536

# Parameters for AdafactorOptimizer:
# ==============================================================================
AdafactorOptimizer.beta1 = 0.0
AdafactorOptimizer.clipping_threshold = 1.0
AdafactorOptimizer.decay_rate = None
AdafactorOptimizer.epsilon1 = 1e-30
AdafactorOptimizer.epsilon2 = 0.001
AdafactorOptimizer.factored = True
AdafactorOptimizer.min_dim_size_to_factor = 128
AdafactorOptimizer.multiply_by_parameter_scale = True

# Parameters for denoise:
# ==============================================================================
denoise.inputs_fn = @preprocessors.noise_span_to_unique_sentinel
denoise.noise_density = 0.15
denoise.noise_mask_fn = @preprocessors.iid_noise_mask
denoise.targets_fn = @preprocessors.nonnoise_span_to_unique_sentinel

# Parameters for DenseReluDense:	
# ==============================================================================	
DenseReluDense.dropout_rate = %dropout_rate	
DenseReluDense.hidden_size = %d_ff	

# Parameters for drop_noise_tokens:	
# ==============================================================================	
# None.	

# Parameters for drop_nonnoise_tokens:	
# ==============================================================================	
# None.

# Parameters for get_dataset:
# ==============================================================================

# Parameters for get_sentencepiece_model_path:
# ==============================================================================
get_sentencepiece_model_path.mixture_or_task_name = %MIXTURE_NAME

# Parameters for get_variable_dtype:
# ==============================================================================
get_variable_dtype.activation_dtype = 'bfloat16'

# Parameters for iid_noise_mask:
# ==============================================================================
# None.

# Parameters for LayerStack:
# ==============================================================================
LayerStack.dropout_rate = %dropout_rate	
LayerStack.norm_epsilon = 1e-06

# Parameters for learning_rate_schedule_noam:
# ==============================================================================
learning_rate_schedule_noam.linear_decay_fraction = 0.0
learning_rate_schedule_noam.multiplier = 1.0
learning_rate_schedule_noam.offset = 0
learning_rate_schedule_noam.warmup_steps = 10000

# Parameters for make_layer_stack:
# ==============================================================================
make_layer_stack.block_scope = True	
make_layer_stack.layers = \
    [@mesh_tensorflow.transformer.transformer_layers.SelfAttention,	
     @mesh_tensorflow.transformer.transformer_layers.DenseReluDense]	
make_layer_stack.num_layers = %num_layers

# Parameters for mesh_train_dataset_fn:
# ==============================================================================
mesh_train_dataset_fn.mixture_or_task_name = %MIXTURE_NAME

# Parameters for noise_span_to_unique_sentinel:
# ==============================================================================
# None.

# Parameters for nonnoise_span_to_unique_sentinel:
# ==============================================================================
# None.

# Parameters for pack_dataset:
# ==============================================================================

# Parameters for pack_or_pad:
# ==============================================================================
# None.

# Parameters for rate_num_examples:
# ==============================================================================
rate_num_examples.maximum = 524288
rate_num_examples.scale = 1.0
rate_num_examples.temperature = 1.0

# Parameters for reduce_concat_tokens:
# ==============================================================================
reduce_concat_tokens.batch_size = 128
reduce_concat_tokens.feature_key = 'targets'

# Parameters for run:
# ==============================================================================
run.autostack = True
run.batch_size = ('tokens_per_batch', %tokens_per_batch)
run.dataset_split = %split
run.ensemble_inputs = None
run.eval_checkpoint_step = None
run.eval_dataset_fn = None
run.eval_summary_dir = None
run.export_path = ''
run.iterations_per_loop = 100
run.keep_checkpoint_max = None
run.layout_rules = \
    'ensemble:ensemble,batch:batch,d_ff:model,heads:model,vocab:model,experts:batch'
run.learning_rate_schedule = @learning_rate_schedules.learning_rate_schedule_noam
run.mesh_shape = @mesh_tensorflow.transformer.utils.tpu_mesh_shape()
run.mode = 'train'
run.model_type = 'aligned'
run.optimizer = @optimize.AdafactorOptimizer
run.perplexity_eval_steps = 10
run.predict_fn = None
run.save_checkpoints_steps = 5000
run.sequence_length = {'inputs': 512, 'targets': 512}
run.train_dataset_fn = @t5.models.mesh_transformer.mesh_train_dataset_fn
run.train_steps = 786432
run.variable_filter = None
run.vocabulary = @t5.data.sentencepiece_vocabulary.SentencePieceVocabulary()

# Parameters for select_random_chunk:
# ==============================================================================
select_random_chunk.feature_key = 'targets'
select_random_chunk.max_length = 65536

# Parameters for SelfAttention:
# ==============================================================================
SelfAttention.attention_kwargs = None	
SelfAttention.dropout_rate = %dropout_rate	
SelfAttention.key_value_size = %d_kv	
SelfAttention.num_heads = %num_heads	
SelfAttention.num_memory_heads = 0	
SelfAttention.relative_attention_num_buckets = 32	
SelfAttention.relative_attention_type = 'bias_shared'	
SelfAttention.shared_kv = False

# Parameters for SentencePieceVocabulary:
# ==============================================================================
SentencePieceVocabulary.extra_ids = 100
SentencePieceVocabulary.sentencepiece_model_file = \
    @t5.models.mesh_transformer.get_sentencepiece_model_path()

# Parameters for serialize_num_microbatches:
# ==============================================================================
serialize_num_microbatches.tokens_per_microbatch_per_replica = 2048

# Parameters for split_tokens:
# ==============================================================================
split_tokens.feature_key = 'targets'
split_tokens.min_tokens_per_segment = None

# Parameters for split_tokens_to_inputs_length:
# ==============================================================================
# None.

# Parameters for tpu_estimator_model_fn:
# ==============================================================================
tpu_estimator_model_fn.outer_batch_size = 1
tpu_estimator_model_fn.tpu_summaries = False

# Parameters for tpu_mesh_shape:
# ==============================================================================
tpu_mesh_shape.ensemble_parallelism = None
tpu_mesh_shape.model_parallelism = %model_parallelism
tpu_mesh_shape.tpu_topology = %tpu_topology

# Parameters for Unitransformer:
# ==============================================================================
Unitransformer.d_model = %d_model	
Unitransformer.ensemble = None	
#Unitransformer.input_full_attention = True	
Unitransformer.label_smoothing = 0.0	
Unitransformer.loss_denominator = None	
Unitransformer.loss_fn = None	
Unitransformer.loss_on_targets_only = False	
Unitransformer.max_length = 512	
Unitransformer.name = 'transformer'	
Unitransformer.positional_embedding = True	
Unitransformer.shared_embedding_and_softmax_weights = True	
Unitransformer.vocab_divisor = 128	
Unitransformer.z_loss = 0.0001

# Parameters for unsupervised:
# ==============================================================================
unsupervised.preprocessors = \
    [@preprocessors.select_random_chunk,
     @preprocessors.reduce_concat_tokens,
     @preprocessors.split_tokens_to_inputs_length,
     @preprocessors.denoise]

Is there a solution for this, or does non-autoregressive currently not work for eval and predict?

Mesh TensorFlow requires `tensorflow.python.tpu.ops`?

I am running an experiment that requires:

  • tensorflow==1.13.1 or tensorflow-gpu==1.13.1
  • tensor2tensor==1.11.0

In tensor2tensor==1.11.0 and mesh-tensorflow==0.1.1, it imports mesh_tensorflow which further imports tensorflow.python.tpu.ops :

import mesh_tensorflow as mtf
#File "/usr/local/lib/python3.6/dist-packages/mesh_tensorflow/__init__.py", line 26, in <module>     
from mesh_tensorflow import simd_mesh_impl
  #File "/usr/local/lib/python3.6/dist-packages/mesh_tensorflow/simd_mesh_impl.py", line 32, in <module>
from tensorflow.python.tpu.ops import tpu_ops  # pylint: disable=g-direct-tensorflow-import
#ModuleNotFoundError: No module named 'tensorflow.python.tpu'  

In my version of TF 1.13.1 there is no tensorflow.python.tpu. Any way to fix this error? Which version of mesh_tensorflow should I downgrade to?

Split along layers

Is it possible to split it such that layers are split along some dimension of the mesh too?

For example:

Mesh shape: x:16,y:32
Layout: layers: x, hidden: y

If I had 32 layers, for example, I'd like the result to have 2 layers on the first slice of x, 2 layers on the next slice, etc. Ideally, something like GPipe where the forward and backward passes are pipelined so that 15/16ths of the devices don't sit idle would be preferable, but even being able to do the split naïvely would be useful.

[Bug Fix] Evaluation and Prediction for Aligned model

Hello,

Both evaluation and prediction are currently not working with the aligned ("BERT-style") model.

I have fixed this issue by adding a new if statement in "transformer/utils.py":

    elif mode == tf.estimator.ModeKeys.PREDICT:
      inputs = mtf_features["inputs"]
      if predict_fn:
        mtf_samples = predict_fn(
            model=transformer_model,
            features=mtf_features,
            variable_dtype=get_variable_dtype())
      elif isinstance(transformer_model, transformer.Unitransformer) and model_type == 'aligned':
        # pad so that there is enough room for the targets
        inputs = mtf.pad(
            inputs, [0, sequence_length["targets"]], length_dim.name)
        logits, _ = transformer_model.call_simple(
            inputs=inputs, variable_dtype=get_variable_dtype(),
            compute_loss=False,
            mode=tf.estimator.ModeKeys.PREDICT)

        label_c_dim = mtf.Dimension('vocab', 256)
        mtf_samples = mtf.argmax(logits, label_c_dim)

As well as "transformer/transformer.py" needs to be modified :

  def call_simple(self,
                  inputs = None,
                  targets = None,
                  compute_loss = False,
                  mode=tf.estimator.ModeKeys.TRAIN,
                  variable_dtype=mtf.VariableDType(tf.float32),
                  sequence_id=None,
                  subsequence_id=None,
                  position=None,
                  encoder_output=None,
                  encoder_sequence_id=None,
                  encoder_inputs=None,
                  shared_params=None,
                  layer_outputs=None,
                  encoder_layer_outputs=None,
                  num_microbatches=1):

The only thing that I am currently defining manually is "label_c_dim".
@adarob @craffel @nshazeer It would be great if you could merge my code or define a better solution, and find an automatic way to determine the vocab size for "label_c_dim".

The memory consumption does not include the backwards phase?

Dear authors,

I have read the code of auto-mesh. I found that when calculating the memory consumption given a schedule, it only included the consumption by the forward phase, but did not include the backward phase. This confused me, because the backpropagation also produces new data in memory.

Is there something I missed, or did you do it this way on purpose?

Thanks for your answer,
Xiaoda

Can you go across multiple nodes?

Is it possible to use devices that are on different machines? For example, in Horovod I can specify the IP addresses of multiple machines and do data parallelism across them. However, this requires me to specifically have MPI setup on each machine. It's unclear to me if this can be done with TF Mesh. Maybe with a tf.train.clusterspec and the parameter server model??

Thanks.
-Tony

Running on multiple GPU

Hello, I am trying to run the mnist Python code in the examples section. When I ran it, I observed that only 1 GPU is used for all three configurations: data parallelism, model parallelism, and data plus model parallelism. How can I make them run on multiple GPUs?

Capture performance profile using Tensorboard

I would like to debug training/fine-tuning performance of mesh transformer on CPU/GPU.
Is it possible to capture performance profile using Tensorboard?
If so, is there an example or tutorial that I can follow?

README Questions

Hi there,

Thanks for creating this framework. I was trying to run the transformer example provided in the README.md and I realized some files are missing in the repository.
Could you please update those files?

For example, examples/transformer_standalone.py is missing. I looked at the commit history and still could not find it. It seems it was never pushed.

python examples/transformer_standalone.py --tpu=$TPU --data_dir=$DATA_DIR --model_dir=$MODEL_DIR --gin_file=$MODEL --gin_file=$LAYOUT --gin_param="run.mode='train'"

Version:
Tensorflow : v1.13
mesh-tensorflow : head of the repo.

(Sorry, I could not add a label as per the contribution guidelines, since I don't have the permissions to do so.)

Support for MultiworkerMirroredStrategy?

Is it possible to incorporate MultiworkerMirroredStrategy into Mesh TF? I would like to run model + data parallelism on a supercomputer that has multiple GPUs on multiple nodes.

It seems that, by default, MultiworkerMirroredStrategy uses all possible GPUs and replicates the model across nodes, making model parallelism by Mesh TF difficult to run on multiple nodes.

Preventing leak in packed sequences

When packing is done here https://github.com/tensorflow/mesh/blob/6a812c8bb847e081e976533ed497c7c5016bb1ec/mesh_tensorflow/transformer/dataset.py
Each packed sequence has multiple examples ("segments"). I'm trying to figure out where you prevent information from leaking between these examples (e.g. in attention).

I came across this

def attention_mask_same_segment(

But I see it is not used anywhere.

I can't seem to find where the information leak is prevented elsewhere. Can you clarify?

GPipe vs mesh?

Any comments about GPipe which was supposed to be open sourced by Google soon?

Looks like both GPipe and Mesh can do model/data parallelism.

Distributed Mesh-TF

I want to run the mnist.py example via mpirun to use devices from different nodes. Is that actually possible?

Add mtf-nightly

We need a nightly package so that, for example, Tensor2Tensor's open source does not break when it runs Travis builds using the latest functionality here.

Layers and Session Support

  1. I have an image classification model defined in Keras that I'm attempting to parallelize with MTF. However, it's not clear to me whether MTF support exists for keras.layers/tf.layers or if I'll need to recreate my model in MTF. Does MTF support keras.layers or tf.layers?

  2. Does MTF exclusively use sessions for training or is there support for TF 2.0 eager execution?

If the answer is "no" to either of the above questions, is there any plan to add support in the future?

mixed precision support on GPUs

Hi,
To speed up training on V100 GPUs, I'd like to run mesh tf using mixed precision. While TensorFlow has an easy to use automatic mixed precision feature, it requires the optimizer to be a tf.train.Optimizer. This won't work on mesh tf's optimizers.

My question is: how can I use mixed precision on GPUs with mesh tf? If not supported yet, can you add some support for this? Thanks.

Regarding change in code that will convert layout to use both model and data parallelism

mesh_shape = [("processor_rows", 2), ("processor_cols", 2)]
layout_rules = [("batch", "processor_rows"), ("hidden", "processor_cols")]

The above code change is described as using both model and data parallelism, but with it we get a "mesh_size" error, so we need to change the value of mesh_size as well. It should be `mesh_size = len(mesh_shape) * len(mesh_shape[0])`.
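
For reference, mesh_size should equal the total number of processors in the mesh, i.e. the product of the mesh dimension sizes (a sketch; for the 2x2 mesh above this is 4):

mesh_shape = [("processor_rows", 2), ("processor_cols", 2)]
mesh_size = 1
for _, size in mesh_shape:
    mesh_size *= size
print(mesh_size)  # 4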

Mesh overlapping is allowed ?

Hi, does MTF support overlapped meshes? For example, for a NN model with 6 layers, I want to parallelize the first three layers with a 1D mesh and the remaining three with a 2D mesh. These two meshes overlap on 4 devices. If this is not allowed in MTF, is there any way to do it?

Communication Between TPU Cores and Encoder->Reduce->Decoder Pattern

My understanding from the readme is that there is some flexibility in the TPU Mesh, but all operations must be replicated on all TPU cores.

Will there ever be support for reducing an encoder split across 8 cores to run a decoder on a single core?

Effectively, the graph would take an input of (cores * bs, other shapes) and the output would simply be (1, other shapes). An example usage would be encoding a set of tweets and outputting a single summary.

Finetuning a `bfloat16` checkpoint with `float32`

I'm trying to fine-tune a released T5 checkpoint in float32,
but I get the following error:

2020-09-03 16:33:42.380962: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Invalid argument: tensor_name =
/block_018/layer_002/layer_norm/scale; expected dtype float does not equal original dtype bfloat16

Is what I'm trying to do supported? These are the relevant parts I set:
--gin_param="get_variable_dtype.activation_dtype = 'float32'"
--gin_param="get_variable_dtype.master_dtype = 'float32'"
--gin_param="get_variable_dtype.slice_dtype = 'float32'"
--gin_file="gs://t5-data/pretrained_models/3B/operative_config.gin"

(We explicitly want float32)

PROBLEM=./mesh_tensorflow/transformer/gin/problems/lm1b.gin

Line 13 in lm1b.gin "dataset.get_tfds_vocabulary.dataset_name = %dataset_name"
causes an error

There is no function named "get_tfds_vocabulary"
in /mesh_tensorflow/transformer/dataset.py

To fix the error the line can be replaced with
"vocabulary.get_tfds_vocabulary.dataset_name=%dataset_name"

tf2 in mesh_tensorflow/utils.py incompatible with tensor2tensor/rl

I'm wondering if tf2 is absolutely needed in mesh_tensorflow/utils.py? I'm trying to reproduce on the provided Google colab https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/rl
with tensorflow 1.13.1 and T2T 1.13.1 (the recommended config), but I got stuck at line 26 import tensorflow.compat.v2 as tf2 because I'm using tensorflow 1.13.1

Would it be possible to make mesh_tensorflow compatible with tensorflow v1?

Convolution layers in mesh tensorflow

I'd like to run the following Keras example, adapted from here

# 1D CNN neural network
model_m = Sequential()
model_m.add(Reshape((TIME_PERIODS, num_sensors), input_shape=(input_shape,)))
model_m.add(Conv1D(100, 10, activation='relu', input_shape=(TIME_PERIODS, num_sensors)))
model_m.add(Conv1D(100, 10, activation='relu'))
model_m.add(MaxPooling1D(3))
model_m.add(Conv1D(160, 10, activation='relu'))
model_m.add(Conv1D(160, 10, activation='relu'))
model_m.add(GlobalAveragePooling1D())
model_m.add(Dropout(0.5))
model_m.add(Dense(num_classes, activation='softmax'))

on more than one machine (e.g. two CPU nodes, each with multiple cores). Can I use mesh_tensorflow graphs for convolutional layers?
I'd like to apply both data and spatial parallelism to this example (maybe on bigger data) on two identical machines. Would you please help me with this? I couldn't find many examples of using Mesh TF.
Thanks

Question on params['context']

In the toy_model_tpu.py example, params['context'] is used to understand device assignments and host placements. Where is its value populated?

def model_fn(features, labels, mode, params):
...
if FLAGS.use_tpu:
ctx = params['context']

Support for training with multiple TPUs

The mtf_transformer in Tensor2Tensor defaults to a mesh configuration for TPUs that uses 32 cores or 4 Cloud TPUs. I wasn't able to find documentation on utilizing more than a single Cloud TPU, but I tried it anyway with TPU_NAME=grpc://tpu0:8470,grpc://tpu1:8470 and got an error:

*** InternalError: Invalid system configuration: 1x1 host topology with 0 missing hosts, but 2 hosts in total.

I am using TF 1.11.0 and the meshTF in Tensor2Tensor 1.9.0, for compatibility with Cloud TPU.
