n3pdf / vegasflow


VegasFlow: accelerating Monte Carlo simulation across multiple hardware platforms

Home Page: https://vegasflow.readthedocs.io

License: Apache License 2.0

Python 98.92% Shell 1.08%

monte-carlo-integration machine-learning gpu-acceleration tensorflow monte-carlo efficiency


vegasflow's Issues

Add a progressbar

This is a bit tricky because most of the calculation runs in parallel, but even an option to show "10 out of 20 chunks completed" would be nice, just to give the feeling that something is happening.
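
A minimal sketch of the "chunks completed" message, assuming the work is already split into chunks that finish one at a time. The names report_progress and run_in_chunks are hypothetical and not part of VegasFlow.

```python
import sys

def report_progress(done, total, stream=sys.stderr):
    """Write a one-line 'done out of total chunks completed' message."""
    message = f"{done} out of {total} chunks completed"
    # '\r' rewrites the same terminal line instead of scrolling
    stream.write("\r" + message)
    if done == total:
        stream.write("\n")
    stream.flush()
    return message

def run_in_chunks(chunks, work):
    """Apply `work` to each chunk, reporting progress after each one."""
    results = []
    for i, chunk in enumerate(chunks, start=1):
        results.append(work(chunk))
        report_progress(i, len(chunks))
    return results

totals = run_in_chunks([[1, 2], [3, 4], [5]], sum)
```

With chunks running in parallel, the same message could be emitted from whatever callback collects the finished chunks.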

References for integrands

Here is a list of interesting integrands:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.3141&rep=rep1&type=pdf

Add examples of using VegasFlow as a "clever" random number generator.

Even though it is designed as an integrator, there is a use case for which VegasFlow can be "abused": using it as a random number generator that follows a certain distribution.

In order to do that one has to train the integrator with a target function and then just call .make_random() or whatever. This might be particularly relevant as we add more integration algorithms to the framework.
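
To illustrate the idea, here is a sketch of how a trained one-dimensional Vegas-style grid can be turned into a sampler. It assumes the usual Vegas convention that every bin carries the same probability, so narrow bins (where the target function is large) are sampled more densely. The function name and the grid values are hypothetical.

```python
import random

def sample_from_grid(edges, n_samples, rng):
    """Draw points from a 1D Vegas-style grid.

    Every bin carries the same probability, so narrow bins are sampled
    more densely. Returns the points and the density at each point.
    """
    n_bins = len(edges) - 1
    points, densities = [], []
    for _ in range(n_samples):
        y = rng.random() * n_bins       # uniform in [0, n_bins)
        i = min(int(y), n_bins - 1)     # bin index
        width = edges[i + 1] - edges[i]
        points.append(edges[i] + (y - i) * width)
        densities.append(1.0 / (n_bins * width))
    return points, densities

# Hypothetical grid, as if trained on a function peaked near x = 0
edges = [0.0, 0.05, 0.15, 0.4, 1.0]
points, densities = sample_from_grid(edges, 10000, random.Random(0))
```

The returned densities play the role of the probability of each point, which is needed to reweight the samples afterwards.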

rpy to tensor

I'm a beginner. I'm running an MC integration to compute the CDF of a vine copula model (involving 6 different variables), but the process lacks speed and efficiency. To tackle this, I'm trying to run my model with VegasFlow, but I'm not sure how to convert an rpy object to a tensor, or how to set the limits of integration.

Here is my code using VEGAS:

import numpy as np
import vegas

@vegas.batchintegrand
def integrate_fun(k):
    # Calculate the VineCopula pdf (θ is read from the enclosing scope)
    pdf_vine = np.array(VineCopula.RVinePDF(k, θ))
    return pdf_vine

def vegas_6D(θ, u, v, w, x, y, z):
    integ = vegas.Integrator([[0, u], [0, v], [0, w], [0, x], [0, y], [0, z]])
    result = integ(integrate_fun, nitn=10, neval=10000)
    return result.mean

where u, v, w, x, y, z are values taken from np.linspace between 0 and 1.

Thank you in advance for your help

Accuracy of integration as target

The problem

We would like a function for zfit that integrates a (black-box) function up to a certain accuracy. We are not interested in the number of dimensions; rather, we would like the algorithm to continue until the accuracy is reached (or perhaps until a max_iter value is hit). Is that too much black magic? What do you think?

Proposed solution

So far I am not quite sure about the internals, and maybe it's also worth having a new high-level function for this logic.
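
A minimal sketch of what such a high-level function could look like, assuming plain Monte Carlo on the unit hypercube rather than VegasFlow's own integrators. The name integrate_to_accuracy and all of its parameters are hypothetical. Iterations are combined with inverse-variance weights and the loop stops at the requested relative error or at max_iter.

```python
import math
import random

def integrate_to_accuracy(integrand, n_dim, rel_tol, n_events=10000,
                          max_iter=50, seed=0):
    """Plain Monte Carlo on the unit hypercube, iterated until the
    relative error of the combined estimate drops below `rel_tol`."""
    rng = random.Random(seed)
    sum_w, sum_wx = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        values = [integrand([rng.random() for _ in range(n_dim)])
                  for _ in range(n_events)]
        mean = sum(values) / n_events
        # Variance of the mean for this iteration
        var = sum((v - mean) ** 2 for v in values) / (n_events - 1) / n_events
        if var == 0.0:
            return mean, 0.0, iteration  # constant integrand, nothing to refine
        # Combine iterations, weighting each by its inverse variance
        sum_w += 1.0 / var
        sum_wx += mean / var
        result, error = sum_wx / sum_w, math.sqrt(1.0 / sum_w)
        if error <= rel_tol * abs(result):
            break
    return result, error, iteration

result, error, n_iter = integrate_to_accuracy(lambda x: x[0] * x[1], 2, rel_tol=5e-3)
```

The same stopping rule could wrap any integrator that returns a result and an error per iteration.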

Are you available/want to contribute?

Yes

Example with histograms

It would be good to have an example where histograms are used.
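
As a starting point, here is a sketch of the underlying idea in plain Python: each event's weight is added to the bin of some observable while the integral is accumulated, so the bins add up to the total. This is not VegasFlow's histogram interface; the function name and its arguments are hypothetical.

```python
import random

def integrate_with_histogram(integrand, observable, n_events, n_bins, seed=0):
    """Plain Monte Carlo in 1D that also bins each event's weight.

    `observable` maps a point to a value in [0, 1); the bins then add up
    to the integral estimate.
    """
    rng = random.Random(seed)
    histogram = [0.0] * n_bins
    for _ in range(n_events):
        x = rng.random()
        weight = integrand(x) / n_events
        bin_index = min(int(observable(x) * n_bins), n_bins - 1)
        histogram[bin_index] += weight
    return sum(histogram), histogram

# Integrate 2x on [0, 1] and histogram the result as a function of x
total, histogram = integrate_with_histogram(lambda x: 2 * x, lambda x: x, 20000, 4)
```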

Release of v 1.2

To Do

Changelog

The version 1.2 brings some improvements, the most outstanding being

Fine grained log control #57 [docs].
Distribution of jobs using dask #55 [docs].

We have also included several new use cases

pineappl integration [example].
slurm distribution [example].

Minor improvements of version 1.2 are

Keyword simplify_signature #58 [docs].
VegasFlow, vegas_wrapper, MonteCarloFlow, int_me, float_me can be imported directly from vegasflow, e.g.

from vegasflow import VegasFlow

The base type for float (DTYPE) and int (DTYPEINT) can be controlled with environment variables #58 [docs].
Adds the utility function generate_condition_function #60 [docs].
The integrators now have the set_seed method for seeding the random number generator [docs].

Let me know if I'm forgetting anything.

Stability and future development of package

We're developing a general purpose model fitting library mainly based on TensorFlow, called zfit, with a focus on High Energy Physics application. Since models can be multidimensional (up to 10 dims) and often lack an analytic integral, we've been looking for efficient MC integration packages. VEGAS is of course a well proven algorithm, so vegasflow seems like an interesting package. However, we were wondering about:

what is the stability of the package? Do you expect many changes?
would you be interested to "help contributing it" into zfit by guiding which parts are stable and provide some expert knowledge on when to best use it?
What are the future plans for the package, will it be maintained? By whom and how long?
is there any comparison with the i-flow package, which claims to be more efficient than the VEGAS algorithm?

Log control

Implement the log control implemented in pdfflow (basically, copy it here). Mainly to take control of the tensorflow logs which make our logs look ugly.
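
A sketch of the kind of control meant here, using only the standard logging module and the TF_CPP_MIN_LOG_LEVEL environment variable. The "vegasflow" logger name is an assumption, and the helper set_log_level is hypothetical; this is not the pdfflow implementation.

```python
import logging
import os

# Must be set before TensorFlow is imported to silence its C++ logs
os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")

def set_log_level(name, level):
    """Set the level of a named logger and return it."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger

# Quiet the Python-side TensorFlow logger, keep the integrator's messages
tf_logger = set_log_level("tensorflow", logging.ERROR)
vf_logger = set_log_level("vegasflow", logging.INFO)  # assumed logger name
```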

Comparison to other integrators.

Lepage's Vegas

Webpage: https://vegas.readthedocs.io/en/latest/
This is the most obvious one. Once we have all the examples merged to the main branch we can have a wrapper around it that takes them all and compares them to Vegas. It doesn't make sense to do it one by one.

Pineappl example is obsolete

The Python interface for pineappl has changed a lot, so the example should be updated; otherwise it won't work.

Upload package to PyPI

Specifying integration limits

Hi vegasflow developers!

Can I specify integration (upper and lower) limits in vegasflow? I searched the documentation but didn't find a way.
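
Independently of what the library offers, limits can always be handled by a change of variables, assuming the integrator samples the unit hypercube: map each u_i in [0, 1] to x_i = a_i + (b_i - a_i) u_i and multiply the integrand by the box volume. A sketch with a hypothetical helper rescale_integrand:

```python
import random

def rescale_integrand(integrand, limits):
    """Wrap `integrand`, defined on the box `limits`, so it can be
    integrated over the unit hypercube instead."""
    volume = 1.0
    for lower, upper in limits:
        volume *= upper - lower

    def unit_integrand(u):
        # Map each u_i in [0, 1] to x_i in [lower_i, upper_i]
        x = [lo + (hi - lo) * ui for (lo, hi), ui in zip(limits, u)]
        return integrand(x) * volume  # the Jacobian is the box volume

    return unit_integrand

# Integral of x * y over [0, 2] x [1, 3] is 2 * 4 = 8
f = rescale_integrand(lambda x: x[0] * x[1], [(0.0, 2.0), (1.0, 3.0)])
rng = random.Random(1)
n = 20000
estimate = sum(f([rng.random(), rng.random()]) for _ in range(n)) / n
```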

Enable multi-device support for Vegas+

The problem

Currently Vegas+ does not support running on multiple GPUs. The reason is the variable number of events, which makes it impossible to know beforehand how many events should be sent to each GPU.
#64 (comment)

Proposed solution

A possible solution would be to re-distribute the number of events per-device before each iteration.
This would introduce some overhead, but Vegas+ is expected to be used in scenarios where the integrand is complicated (otherwise you can just throw more points in by using Vegas) so it should be fine.

Freeze grid

AMD issue

VegasFlow doesn't scale particularly well on AMD CPUs. One thing that improves the results somewhat is using conda's version of tensorflow-mkl. The difference is most obvious for 1-4 threads; after that it reaches a strange plateau.

The multithreading problem might be due to the fact that the only AMD CPU we had for testing was this beast. For smaller-scale AMD CPUs with a more reasonable number of cores, this fix could go a long way:

conda install mkl -c intel --no-update-deps
conda install tensorflow

[Figures: benchmark plots before and after the fix. Note that the label should say "cores" in plural.]

Multi-GPU parallelism

After a more careful reading of https://www.tensorflow.org/guide/gpu, and testing some operators like matmul with large matrices, I realized that TF doesn't use all available GPUs automatically, i.e.:

it always allocates the model on GPU0
the tf operators do not have multi gpu implementation

Thus for this project we have to consider splitting manually the n_iter or n_calls across the available tf.devices so we may get a factor nGPUs faster. We may also consider adding the CPU0 together with GPUs.

So we need to:

confirm my observation (I may be forgetting something trivial)
implement an algorithm for job balancing, which divides the workload among the devices.
apply the job distribution inside vegas
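
The job-balancing step could start from something as simple as the sketch below, which splits the events proportionally to a per-device weight. The function split_events, the device list and the weights are all hypothetical; how the weights would be measured is left open.

```python
def split_events(n_events, weights):
    """Split `n_events` across devices proportionally to `weights`,
    making sure every event is assigned exactly once."""
    total = sum(weights)
    shares = [int(n_events * w / total) for w in weights]
    # Hand the events lost to rounding down to the first devices
    for i in range(n_events - sum(shares)):
        shares[i % len(shares)] += 1
    return shares

# Hypothetical setup: two equal GPUs and a CPU that is four times slower
devices = ["/device:GPU:0", "/device:GPU:1", "/device:CPU:0"]
events_per_device = dict(zip(devices, split_events(1_000_000, [4, 4, 1])))
```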

Benchmarks of GPUs

The Titan V and the RTX seem to produce exactly the same results.
In the past we found the Titan V to be faster, but I'm starting to think this could be due to the fact that the RTX was connected through the x1. I cannot find a commit in which the Titan V is faster.

This includes commits around 08d2b7a for which we used to see the reference... but it was the x16...

I am confused though, why is the Titan V not overwhelming the RTX in double precision? Is this due to TensorFlow?

Implement checkpoints for the Vegas grid

We need to implement two functions:
save_grid(file_name)

which should write down a file with divisions array (I would do human-readable, but it is not important)

load_grid(file_name)

which loads the file and resets the divisions array to be that one.

We can also add metadata to the file if we want, so ideally a json file with some data. Easy-peasy but needs to be done.
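
A sketch of the two functions as free functions working on a plain list of lists, assuming a JSON file with a metadata block as suggested above. Inside the integrator they would be methods acting on the divisions array; the metadata keys shown are only examples.

```python
import json
import os
import tempfile

def save_grid(divisions, file_name, **metadata):
    """Write the divisions array (a list of lists) and metadata as JSON."""
    with open(file_name, "w") as f:
        json.dump({"metadata": metadata, "divisions": divisions}, f, indent=2)

def load_grid(file_name, n_dim=None):
    """Read the divisions array back, optionally checking its dimension."""
    with open(file_name) as f:
        data = json.load(f)
    divisions = data["divisions"]
    if n_dim is not None and len(divisions) != n_dim:
        raise ValueError(f"grid has {len(divisions)} dimensions, expected {n_dim}")
    return divisions, data["metadata"]

# Round trip with a hypothetical 2D grid of 4 bins per dimension
grid = [[0.0, 0.1, 0.3, 0.6, 1.0], [0.0, 0.25, 0.5, 0.75, 1.0]]
path = os.path.join(tempfile.mkdtemp(), "grid.json")
save_grid(grid, path, n_dim=2, n_bins=4)
loaded, metadata = load_grid(path, n_dim=2)
```

Checking the metadata on load guards against resetting the grid with a file produced for a different number of dimensions.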

`tensorflow-metal` is not supported, errors when using tf with GPU on a Mac - some of the examples in the docs do not work

Description

Hi, I'm trying to execute some of the examples from your documentation, but I'm running into device-related issues. I don't have an NVIDIA GPU (it's an M1 Mac), but the device is automatically set to GPU even though the device entry seems to be set to False.

print(sampler.devices)
# {'/device:GPU:0': False}

Is it possible to change the use to CPU instead? I couldn't find a set device function in Vegasflow. Should it be done through tensorflow?

Thanks for your help.
Cheers

Code example

from vegasflow import VegasFlow, run_eager
import tensorflow as tf

run_eager(True)

def my_complicated_fun(xarr, **kwargs):
  return tf.reduce_sum(xarr, axis=1)

n_dim = 10
n_events = int(1e5)
sampler = VegasFlow(n_dim, n_events, verbose=False)
sampler.compile(my_complicated_fun)

# Now let's train the integrator for 10 iterations
_ = sampler.run_integration(10)

# Now we can use sampler to generate random numbers
rnds, _, px = sampler.generate_random_array(100)

Additional information

import tensorflow as tf
import vegasflow

print(f"Vegasflow: {vegasflow.__version__}")
# Vegasflow: 1.3.0
print(f"Tensorflow: {tf.__version__}")
# Tensorflow: 2.14.0
print(f"tf-mkl: {tf.python.framework.test_util.IsMklEnabled()}")
# AttributeError: module 'tensorflow' has no attribute 'python'
print(f"tf-cuda: {tf.test.is_built_with_cuda()}")
# tf-cuda: False
print(f"GPU available: {tf.test.is_gpu_available()}")
# GPU available: True

Traceback

[INFO] (vegasflow.monte_carlo) Checking whether the integrand outputs the correct shape (note, this will run a very small number of events and potentially trigger a retrace)
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
Cell In[103], line 15
     12 sampler.compile(my_complicated_fun)
     14 # Now let's train the integrator for 10 iterations
---> 15 _ = sampler.run_integration(10)
     17 # Now we can use sampler to generate random numbers
     18 rnds, _, px = sampler.generate_random_array(100)

File ~/packages/miniconda3/lib/python3.9/site-packages/vegasflow/monte_carlo.py:682, in MonteCarloFlow.run_integration(self, n_iter, log_time, histograms)
    679     start = time.time()
    681 # Run one single iteration and append results
--> 682 res, error = self._run_iteration()
    683 all_results.append((res, error))
    685 # If there is a histogram variable, store it and empty it

File ~/packages/miniconda3/lib/python3.9/site-packages/vegasflow/vflow.py:447, in VegasFlow._run_iteration(self)
    445 def _run_iteration(self):
    446     """Runs one iteration of the Vegas integrator"""
--> 447     return self._iteration_content()

File ~/packages/miniconda3/lib/python3.9/site-packages/vegasflow/vflow.py:442, in VegasFlow._iteration_content(self)
    440 # If training is active, act post integration
    441 if self.train:
--> 442     self.refine_grid(arr_res2)
    443 return res, sigma

File ~/packages/miniconda3/lib/python3.9/site-packages/vegasflow/vflow.py:363, in VegasFlow.refine_grid(self, arr_res2)
    361 for j in range(self.n_dim):
    362     new_divisions = refine_grid_per_dimension(arr_res2[j, :], self.divisions[j, :])
--> 363     self.divisions[j, :].assign(new_divisions)

File ~/packages/miniconda3/lib/python3.9/site-packages/tensorflow/python/ops/array_ops.py:1359, in strided_slice.<locals>.assign(val, name)
   1356 if name is None:
   1357   name = parent_name + "_assign"
-> 1359 return var._strided_slice_assign(
   1360     begin=begin,
   1361     end=end,
   1362     strides=strides,
   1363     value=val,
   1364     name=name,
   1365     begin_mask=begin_mask,
   1366     end_mask=end_mask,
   1367     ellipsis_mask=ellipsis_mask,
   1368     new_axis_mask=new_axis_mask,
   1369     shrink_axis_mask=shrink_axis_mask)

File ~/packages/miniconda3/lib/python3.9/site-packages/tensorflow/python/ops/resource_variable_ops.py:1523, in BaseResourceVariable._strided_slice_assign(self, begin, end, strides, value, name, begin_mask, end_mask, ellipsis_mask, new_axis_mask, shrink_axis_mask)
   1518 def _strided_slice_assign(self, begin, end, strides, value, name, begin_mask,
   1519                           end_mask, ellipsis_mask, new_axis_mask,
   1520                           shrink_axis_mask):
   1521   with _handle_graph(self.handle), self._assign_dependencies():
   1522     return self._lazy_read(
-> 1523         gen_array_ops.resource_strided_slice_assign(
   1524             ref=self.handle,
   1525             begin=begin,
   1526             end=end,
   1527             strides=strides,
   1528             value=ops.convert_to_tensor(value, dtype=self.dtype),
   1529             name=name,
   1530             begin_mask=begin_mask,
   1531             end_mask=end_mask,
   1532             ellipsis_mask=ellipsis_mask,
   1533             new_axis_mask=new_axis_mask,
   1534             shrink_axis_mask=shrink_axis_mask))

File ~/packages/miniconda3/lib/python3.9/site-packages/tensorflow/python/ops/gen_array_ops.py:8826, in resource_strided_slice_assign(ref, begin, end, strides, value, begin_mask, end_mask, ellipsis_mask, new_axis_mask, shrink_axis_mask, name)
   8824   return _result
   8825 except _core._NotOkStatusException as e:
-> 8826   _ops.raise_from_not_ok_status(e, name)
   8827 except _core._FallbackException:
   8828   pass

File ~/packages/miniconda3/lib/python3.9/site-packages/tensorflow/python/framework/ops.py:5888, in raise_from_not_ok_status(e, name)
   5886 def raise_from_not_ok_status(e, name) -> NoReturn:
   5887   e.message += (" name: " + str(name if name is not None else ""))
-> 5888   raise core._status_to_exception(e) from None

InvalidArgumentError: Cannot assign a device for operation ResourceStridedSliceAssign: Could not satisfy explicit device specification '/job:localhost/replica:0/task:0/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceStridedSliceAssign: CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  ref (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  ResourceStridedSliceAssign (ResourceStridedSliceAssign) /job:localhost/replica:0/task:0/device:GPU:0

Op: ResourceStridedSliceAssign
Node attrs: new_axis_mask=0, Index=DT_INT32, shrink_axis_mask=1, end_mask=2, ellipsis_mask=0, begin_mask=2, T=DT_DOUBLE
Registered kernels:
  device='XLA_CPU_JIT'; Index in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_FLOAT8_E5M2, DT_FLOAT8_E4M3FN]
  device='DEFAULT'; T in [DT_INT32]
  device='CPU'; T in [DT_UINT64]
  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_UINT32]
  device='CPU'; T in [DT_UINT16]
  device='CPU'; T in [DT_INT16]
  device='CPU'; T in [DT_UINT8]
  device='CPU'; T in [DT_INT8]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_COMPLEX128]
  device='CPU'; T in [DT_BOOL]
  device='CPU'; T in [DT_STRING]
  device='CPU'; T in [DT_RESOURCE]
  device='CPU'; T in [DT_VARIANT]
  device='CPU'; T in [DT_QINT8]
  device='CPU'; T in [DT_QUINT8]
  device='CPU'; T in [DT_QINT32]

	 [[{{node ResourceStridedSliceAssign}}]] [Op:ResourceStridedSliceAssign] name: strided_slice/_assign

Add documentation for Vegas+

Before tagging the 1.3 release, documentation for Vegas+ (PR #64) should be added.

Performance issue in src/vegasflow/vflow.py

Hello! Our static bug checker has found a performance issue in src/vegasflow/vflow.py: refine_grid_per_dimension is repeatedly called in a for loop, but there are tf.function decorated functions while_check and while_body defined and called in refine_grid_per_dimension.

In that case, when refine_grid_per_dimension is called in a loop, the functions while_check and while_body will create new graphs every time, and that can trigger tf.function retracing warning.

Here is the tensorflow document to support it.

Briefly, for better efficiency, it's better to use:

@tf.function
def inner():
    pass

def outer():
    inner()

than:

def outer():
    @tf.function
    def inner():
        pass
    inner()

Looking forward to your reply. Btw, I am glad to create a PR to fix it if you are too busy.

Simplifying signature not working for PlainFlow

Description

I'm trying to run an integration using PlainFlow on a function that only takes the data tensor as an argument. To achieve this, I'm passing the keyword simplify_signature=True to the constructor. However, this leads to an error about unexpected arguments, which is the same output as when the keyword is omitted.

I'm assuming this is a bug, since the docs mention the keyword explicitly (https://vegasflow.readthedocs.io/en/latest/intalg.html?highlight=simplify_signature#id2) and it works without issue for the VegasFlow constructor.

Code example

import tensorflow as tf
from vegasflow import PlainFlow

def integrand(x):
    return tf.reduce_prod(x, axis=1)

vegas_instance = PlainFlow(4, int(1e4), simplify_signature=True)
vegas_instance.compile(integrand)
vegas_instance.run_integration(5)

This produces the following error

TypeError: integrand(x) got unexpected keyword argument `n_dim`

Additional information

I'm running on CPU with the following versions
Vegasflow: 1.2.0
Tensorflow: 2.5.0
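
Until the keyword is honoured, a possible workaround is to wrap the integrand so that it swallows any extra keyword arguments. This sketch assumes, based on the error message above, that the extra arguments are passed by keyword; the helper accept_extra_kwargs is hypothetical and shown here with plain Python lists rather than tensors.

```python
def accept_extra_kwargs(integrand):
    """Wrap a one-argument integrand so extra keyword arguments are ignored."""
    def wrapped(x, **kwargs):
        return integrand(x)
    return wrapped

def integrand(x):
    return sum(x)

# The wrapped function tolerates keywords such as n_dim
wrapped_integrand = accept_extra_kwargs(integrand)
value = wrapped_integrand([1.0, 2.0], n_dim=2)
```

The wrapped function would then be the one passed to compile.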

Create a conda package

Add Arxiv link to readme

https://arxiv.org/abs/2002.12921

differentiability/run inside tf.function decorated function

AFAIK, the integration cannot be performed within an actual graph. It is also not differentiable as it returns a float.

Both would be quite good to see in the library (such as in torchquad).

To give an idea, using the example from the readme:

Details

from vegasflow import vegas_wrapper
import tensorflow as tf

def integrand(x, **kwargs):
    """ Function:
       x_{1} * x_{2} ... * x_{n}
       x: array of dimension (events, n)
    """
    return tf.reduce_prod(x, axis=1)

dimensions = 8
iterations = 5
events_per_iteration = int(1e5)
vegas_wrapper(integrand, dimensions, iterations, events_per_iteration, compilable=True)

neither

jitted = tf.function(vegas_wrapper)
jitted(integrand, dimensions, iterations, events_per_iteration, compilable=True)

nor (deactivating the jit with tf.config.run_functions_eagerly(True))

with tf.GradientTape() as tape:
    y = vegas_wrapper(integrand, dimensions, iterations, events_per_iteration, compilable=True)[0]
tape.gradient(a, y)

I think the point is that vegasflow now considers itself as the highest level library. While this can be for some cases, integration can also be used as part of a function (say to integrate out a few dimensions).

What do you think about this? Or is there a way to achieve this already?

Python 3.9 support

Basically, as it stands VegasFlow should work with Python 3.9 out of the box. The only thing stopping it from working is the fact that TensorFlow (tensorflow/tensorflow#44485) won't provide official pip packages for Python 3.9 for a while.

The pip package works perfectly fine with python 3.9 so if you have your own installation of Tensorflow working (for instance from your OS vendor) you can simply do

pip install vegasflow --no-deps

Benchmarking Integrands

Some examples:

high dim functions, compare to Vegas.
drell-yan LO vs aMC@NLO
risk analysis MC vs ?

Devices to be tested:

CPUs:
- i7-8850U
- i9 9980XE
- 2990WX
- Gold 6216
- Snapdragon 845
GPUs
- V100
- K40
- Titan V
- RTX 2080ti, 2080
- Radeon VII
- P100
- Adreno 630
- K80
TPU ?

Scientific notation in intermediate result

To avoid scary results that look like 0.00000
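
For illustration, Python's format specification already covers this. The helper below is a hypothetical sketch, not the library's printing code.

```python
def format_result(value, error, digits=4):
    """Format a result and its error in scientific notation."""
    return f"{value:.{digits}e} +/- {error:.{digits}e}"

small = 0.0000012345
fixed = f"{small:.5f}"                      # fixed-point hides the value
line = format_result(small, 0.0000000321)   # scientific notation keeps it
```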

Add VegasFlow to the AUR

Given that I am an Arch user and some of our machines use Arch, it would make our (mainly my) lives easier.

Todos

Software architecture
- create an installation setup
- create an entry point vegas which takes the integrand and the number of events/calls
- try/provide an interface to C code
  - cffi
  - custom op
- try tf-lite
- try tf-js

numpy vs tensorflow integrands

Another point I am trying to understand clearly is the relationship between a pure numpy integrand (without @tf.function) and the tensorflow integrand. I did some tests with the lepage example and the performance of both looks identical; in particular, the CPU usage for both runs is quite similar.

I have also tried limiting the numpy threads to 1 and observing the performance. I didn't see any deterioration, so I suspect that tf converts the objects to tensors when mixing numpy with @tf.function calls.
