etrommer / torch-approx

GPU-accelerated Neural Network layers using Approximate Multiplications for PyTorch

Home Page: https://etrommer.de/torch-approx

License: MIT License

C++ 3.48% C 0.88% Python 53.85% Cuda 21.07% Shell 0.21% Jupyter Notebook 20.51%
python approximate-computing convolutional-layers deep-learning fully-connected library machine-learning neural-network paper python3

torch-approx's Introduction

This is where I keep my research/productive work.

For more information about me see my website

Tinkering, partially done projects and non-professional work can be found under my private account

torch-approx's People

Contributors

etrommer, jadeaffenjaeger

torch-approx's Issues

Layer Noise Mode Interface

The layer's noise mode currently adds a zero-mean noise tensor with a learnable standard deviation:

@property
def stdev(self) -> torch.nn.Parameter:
    """
    The relative standard deviation of the additive Gaussian noise that is added
    to the computation output. Scaling is done relative to the current batch's standard deviation.
    This is only used when the mode is set to `noise`. It will have no effect in other modes.
    """
    return self._stdev

@stdev.setter
def stdev(self, noise_std: float):
    self._stdev = torch.nn.Parameter(torch.tensor(noise_std), requires_grad=True)

This is quite specific to torch-approx's origin as a backend for AGN Approx. To make the feature more generally useful, the learnable-noise version should be kept on a separate branch, and the noise implementation should be replaced with a more generic interface that adds Gaussian noise with a fixed mean and standard deviation to the layer output:

@property
def stdev(self) -> float:
    """
    Perturbation Error Relative Standard Deviation

    Returns:
        Currently configured perturbation standard deviation
    """
    return self._stdev.item()

@stdev.setter
def stdev(self, val: float):
    self._stdev = torch.tensor([val], device=self.weight.device)  # type: ignore

@property
def mean(self) -> float:
    """
    Perturbation Error mean

    Returns:
        Currently configured perturbation mean
    """
    return self._mean.item()

@mean.setter
def mean(self, val: float):
    self._mean = torch.tensor([val], device=self.weight.device)  # type: ignore
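
To illustrate what the proposed interface would compute, here is a minimal sketch in plain Python, so it runs without PyTorch. It assumes the noise standard deviation stays relative to the standard deviation of the current outputs, as the first docstring describes. The helper name `perturb` and its parameters are hypothetical and not part of torch-approx.

```python
import random
import statistics


def perturb(outputs, mean=0.0, rel_stdev=0.1, rng=None):
    """Add Gaussian noise with a fixed mean and relative stdev to layer outputs.

    `outputs` is a flat list of floats standing in for a layer's output tensor.
    The absolute noise stdev is `rel_stdev` times the stdev of `outputs`.
    """
    if rng is None:
        rng = random.Random()
    # Population standard deviation of the current batch of outputs
    # (assumption: a tensor library may use the sample stdev instead)
    batch_std = statistics.pstdev(outputs)
    return [y + rng.gauss(mean, rel_stdev * batch_std) for y in outputs]


if __name__ == "__main__":
    rng = random.Random(42)
    y = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
    y_noisy = perturb(y, mean=0.5, rel_stdev=0.2, rng=rng)
    err = [b - a for a, b in zip(y, y_noisy)]
    # Empirical error mean and error stdev relative to the output stdev
    print(statistics.fmean(err), statistics.pstdev(err) / statistics.pstdev(y))
```

With enough samples, the two printed values should land close to the configured `mean` and `rel_stdev`.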

Accuracy Benchmarking

Benchmarking of model accuracy when retrained using several modes is required.

Likely candidates for comparison:

  • Baseline (retraining with accurate multiplication but the same hyperparameters)
  • Gaussian noise with the same standard deviation as the multiplier's error
  • Behavioral Simulation (LUT)
  • Regression Models
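
As a rough illustration of the behavioral simulation (LUT) candidate, the sketch below tabulates a multiplier for all pairs of signed 8-bit operands and computes a dot product purely by lookup. The indexing by two's-complement bit pattern is an assumption; it appears consistent with the `lut` tensor printed in the test log further down, but the actual layout used by torch-approx may differ. All function names are hypothetical.

```python
def to_index(v):
    """Map a signed 8-bit value (-128..127) to its two's-complement index (0..255)."""
    return v & 0xFF


def to_signed(i):
    """Inverse of to_index: map an index (0..255) back to a signed 8-bit value."""
    return i - 256 if i >= 128 else i


def build_lut(multiply):
    """Tabulate `multiply` for every pair of signed 8-bit operands."""
    return [
        [multiply(to_signed(a), to_signed(b)) for b in range(256)]
        for a in range(256)
    ]


def lut_dot(xs, ws, lut):
    """Dot product in which every product is a table lookup."""
    return sum(lut[to_index(x)][to_index(w)] for x, w in zip(xs, ws))


# Exact multiplication as the reference table; an approximate multiplier
# would be simulated by passing its behavioral model to build_lut instead.
exact_lut = build_lut(lambda a, b: a * b)
```

Swapping the lambda for a model of an approximate multiplier changes the simulated arithmetic without touching `lut_dot`, which is what makes the LUT approach convenient for comparing multipliers.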

FAILED & ERROR when running Unit Tests

Hi etrommer, I ran into errors when running the unit tests with "poetry run pytest test". I installed poetry in a conda environment (python=3.10.13) and cloned your code. Then I installed the packages with "poetry install --with "dev,extras"", and the additional dependencies and pre-commit hooks installed fine. However, several unit tests fail, and all subsequent tests report errors. I also ran the benchmark; there are a few failures, but most of the rest seem fine. Could you help me solve these errors? Thanks :)

My CUDA version is 11.7. The output logs of the unit tests and the benchmark follow.

  1. unit test

============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-7.4.2, pluggy-1.3.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/zhaojun/torch-approx
configfile: pyproject.toml
plugins: cov-3.0.0, benchmark-4.0.0
collected 436 items

test/test_approx_layer.py .............................FFFFFFEEEEEEEEEEE [ 10%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 27%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 37%]
test/test_approx_mm.py EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 48%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 64%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 81%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 96%]
test/test_dwconv2d.py EEEEEEEEEEEEEE [100%]

==================================== ERRORS ====================================
_____ ERROR at setup of test_layer_fwd[cuda-weight_qconfig0-layer_config6] _____

@pytest.fixture(autouse=True)
def fix_seed():
    """
    Run before every test.
    - Fixes random seed to make test reproducible
    - Sets CUDA to blocking to allow for benchmarking of normally asynchronous kernels
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    np.random.seed(42)
>   torch.manual_seed(42)

test/conftest.py:36:


../anaconda3/envs/approx/lib/python3.10/site-packages/torch/random.py:40: in manual_seed
torch.cuda.manual_seed_all(seed)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/cuda/random.py:113: in manual_seed_all
_lazy_call(cb, seed_all=True)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/cuda/__init__.py:183: in _lazy_call
callable()


def cb():
    for i in range(device_count()):
        default_generator = torch.cuda.default_generators[i]
>       default_generator.manual_seed(seed)

E RuntimeError: CUDA error: an illegal memory access was encountered
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/cuda/random.py:111: RuntimeError
_____ ERROR at setup of test_layer_fwd[cuda-weight_qconfig1-layer_config0] _____

[............................... similar errors .............................................]
=================================== FAILURES ===================================
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config0] ______________

device = 'cuda'
layer_config = (<class 'torch.nn.modules.linear.Linear'>, (4, 20), (20, 10), {})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}

@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
    input_dims = layer_config[1]
    layer, ref_layer = generate_models(layer_config, device, weight_qconfig)

    x = torch.rand(input_dims, device=device)
    xref = copy.deepcopy(x)
>   y = layer(x)

test/test_approx_layer.py:165:


../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_linear.py:46: in approx_fwd
y = self.approx_op(x, w, quant_params, self.htp_model)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/operators/lut.py:82: in forward
return ApproxGeMM.apply(x, w, self.lut, quant_params, htp_model)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]


x = tensor([[0.8243, 0.2120, 0.7301, 0.3219, 0.7536, 0.2120, 0.9263, 0.1413, 0.3454,
0.9970, 0.5495, 0.2512, 0.92... 0.7536, 0.7065, 0.3297, 0.9106, 0.3925, 0.1727, 0.9813, 0.3690, 0.2591,
0.9185, 0.9891]], device='cuda:0')
w = tensor([[ 0.0501, -0.2194, -0.0449, -0.2056, -0.1538, -0.0086, 0.1054, -0.0415,
0.0086, -0.0950, -0.1158, ...0225, 0.0881, -0.2074]], device='cuda:0',
grad_fn=)
lut = tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 1, 2, ..., -3, -2, -1],
[ 0, 2, 4, ..., -6, -4, -2]... ..., 9, 6, 3],
[ 0, -2, -4, ..., 6, 4, 2],
[ 0, -1, -2, ..., 3, 2, 1]], dtype=torch.int32)
quant_params = QuantizationParameters(x_scale=tensor([0.0079], device='cuda:0'), x_zero_point=tensor([0], device='cuda:0', dtype=torch.int32), w_scale=tensor([0.0017], device='cuda:0'), w_zero_point=tensor([0], device='cuda:0', dtype=torch.int32))
htp_model = None

@staticmethod
def forward(  # type: ignore
    x: torch.Tensor,
    w: torch.Tensor,
    lut: torch.Tensor,
    quant_params: "QuantizationParameters",
    htp_model: Optional[Callable],
) -> torch.Tensor:
    """
    Approximate forward operation
    """

    x_q = torch.round((x / quant_params.x_scale) + quant_params.x_zero_point)[
        :, None, :
    ]
    w_q = torch.round(
        (w / quant_params.w_scale[:, None]) + quant_params.w_zero_point[:, None]
    ).T

    if htp_model is None:
>       y_q = approx(x_q.char(), w_q.char(), lut).float()

E RuntimeError: CUDA error: invalid argument
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

src/torchapprox/operators/approxgemm.py:39: RuntimeError
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config1] ______________

device = 'cuda'
layer_config = (<class 'torch.nn.modules.conv.Conv2d'>, (2, 8, 4, 4), (8, 16, 3), {'groups': 1})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}

@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
    input_dims = layer_config[1]
    layer, ref_layer = generate_models(layer_config, device, weight_qconfig)

    x = torch.rand(input_dims, device=device)
    xref = copy.deepcopy(x)
>   y = layer(x)

test/test_approx_layer.py:165:


../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_conv2d.py:185: in forward
return ApproxLayer.forward(self, x_q, x_scale, x_zero_point, bias)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_conv2d.py:155: in approx_fwd
y = ApproxConv2dOp.apply(
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
src/torchapprox/operators/conv2d.py:240: in forward
y_q = _im2col_conv2d(x_q, w_q, conv_args, lut, out_dims)


x_q = tensor([[[[105., 27., 93., 41.],
[ 96., 27., 118., 18.],
[ 44., 127., 70., 32.],
... [ 18., 102., 123., 77.],
[ 99., 57., 5., 16.],
[ 80., 83., 114., 84.]]]], device='cuda:0')
w_q = tensor([[[[ 29., -125., -26.],
[-117., -88., -4.],
[ 60., -24., 5.]],

     [[ -54.,...
     [[  22., -123.,   89.],
      [  25.,   91., -126.],
      [  62., -107.,  -40.]]]], device='cuda:0')

conv_args = Conv2dArgs(in_channels=8, out_channels=16, kernel_size=(3, 3), stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1)
lut = tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 1, 2, ..., -3, -2, -1],
[ 0, 2, 4, ..., -6, -4, -2]... ..., 9, 6, 3],
[ 0, -2, -4, ..., 6, 4, 2],
[ 0, -1, -2, ..., 3, 2, 1]], dtype=torch.int32)
out_dims = (2, 2)

def _im2col_conv2d(
    x_q: torch.FloatTensor,
    w_q: torch.FloatTensor,
    conv_args: Conv2dArgs,
    lut: torch.ShortTensor,
    out_dims: Tuple[int, int],
) -> torch.FloatTensor:
    # Pre-allocate output tensor
    y_q = torch.empty(
        x_q.size(0),
        conv_args.out_channels,
        math.prod(out_dims),
        device=x_q.device,
        dtype=torch.int32,
    )

    w_s8 = w_q.char()
    for group in range(conv_args.groups):
        # Calculate lower and upper channel index for current group
        in_ch_lower, in_ch_upper = _group_limits(
            group, conv_args.groups, conv_args.in_channels
        )
        out_ch_lower, out_ch_upper = _group_limits(
            group, conv_args.groups, conv_args.out_channels
        )

        # Im2Col operation
        x_unfold_s8 = torch.nn.functional.unfold(
            x_q[
                :,
                in_ch_lower:in_ch_upper,
                :,
            ],
            kernel_size=conv_args.kernel_size,
            padding=conv_args.padding,
            stride=conv_args.stride,
            dilation=conv_args.dilation,
        ).char()

        # Reshape weights to 2D
        w_flat_s8 = w_s8[out_ch_lower:out_ch_upper].view(
            conv_args.out_channels // conv_args.groups, -1
        )

        # ApproxGeMM
>       y_q[:, out_ch_lower:out_ch_upper] = approx(
            w_flat_s8,
            x_unfold_s8,
            lut,
        )

E RuntimeError: CUDA error: invalid argument
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

src/torchapprox/operators/conv2d.py:200: RuntimeError

______________ test_layer_fwd[cuda-weight_qconfig0-layer_config4] ______________

device = 'cuda'
layer_config = (<class 'torch.nn.modules.conv.Conv2d'>, (2, 8, 4, 4), (8, 16, 3), {'groups': 8})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}

@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
    input_dims = layer_config[1]
    layer, ref_layer = generate_models(layer_config, device, weight_qconfig)

    x = torch.rand(input_dims, device=device)
    xref = copy.deepcopy(x)
>   y = layer(x)

test/test_approx_layer.py:165:


../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_conv2d.py:185: in forward
return ApproxLayer.forward(self, x_q, x_scale, x_zero_point, bias)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_conv2d.py:155: in approx_fwd
y = ApproxConv2dOp.apply(
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
src/torchapprox/operators/conv2d.py:240: in forward
y_q = _im2col_conv2d(x_q, w_q, conv_args, lut, out_dims)


x_q = tensor([[[[105., 27., 93., 41.],
[ 96., 27., 118., 18.],
[ 44., 127., 70., 32.],
... [ 18., 102., 123., 77.],
[ 99., 57., 5., 16.],
[ 80., 83., 114., 84.]]]], device='cuda:0')
w_q = tensor([[[[ 29., -127., -26.],
[-119., -89., -5.],
[ 61., -24., 5.]]],

    [[[ -55...
    [[[-119., -126.,   53.],
      [ -20.,  118.,   20.],
      [  50.,   -8., -123.]]]], device='cuda:0')

conv_args = Conv2dArgs(in_channels=8, out_channels=16, kernel_size=(3, 3), stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups=8)
lut = tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 1, 2, ..., -3, -2, -1],
[ 0, 2, 4, ..., -6, -4, -2]... ..., 9, 6, 3],
[ 0, -2, -4, ..., 6, 4, 2],
[ 0, -1, -2, ..., 3, 2, 1]], dtype=torch.int32)
out_dims = (2, 2)

def _im2col_conv2d(
    x_q: torch.FloatTensor,
    w_q: torch.FloatTensor,
    conv_args: Conv2dArgs,
    lut: torch.ShortTensor,
    out_dims: Tuple[int, int],
) -> torch.FloatTensor:
    # Pre-allocate output tensor
    y_q = torch.empty(
        x_q.size(0),
        conv_args.out_channels,
        math.prod(out_dims),
        device=x_q.device,
        dtype=torch.int32,
    )

    w_s8 = w_q.char()
    for group in range(conv_args.groups):
        # Calculate lower and upper channel index for current group
        in_ch_lower, in_ch_upper = _group_limits(
            group, conv_args.groups, conv_args.in_channels
        )
        out_ch_lower, out_ch_upper = _group_limits(
            group, conv_args.groups, conv_args.out_channels
        )

        # Im2Col operation
        x_unfold_s8 = torch.nn.functional.unfold(
            x_q[
                :,
                in_ch_lower:in_ch_upper,
                :,
            ],
            kernel_size=conv_args.kernel_size,
            padding=conv_args.padding,
            stride=conv_args.stride,
            dilation=conv_args.dilation,
        ).char()

        # Reshape weights to 2D
        w_flat_s8 = w_s8[out_ch_lower:out_ch_upper].view(
            conv_args.out_channels // conv_args.groups, -1
        )

        # ApproxGeMM
>       y_q[:, out_ch_lower:out_ch_upper] = approx(
            w_flat_s8,
            x_unfold_s8,
            lut,
        )

E RuntimeError: CUDA error: invalid argument
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

src/torchapprox/operators/conv2d.py:200: RuntimeError
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config5] ______________

device = 'cuda'
layer_config = (<class 'torch.nn.modules.conv.Conv2d'>, (2, 8, 4, 4), (8, 8, 3), {'groups': 8})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}

@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
    input_dims = layer_config[1]
    layer, ref_layer = generate_models(layer_config, device, weight_qconfig)

    x = torch.rand(input_dims, device=device)
    xref = copy.deepcopy(x)
>   y = layer(x)

test/test_approx_layer.py:165:


../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_conv2d.py:185: in forward
return ApproxLayer.forward(self, x_q, x_scale, x_zero_point, bias)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_conv2d.py:155: in approx_fwd
y = ApproxConv2dOp.apply(
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
src/torchapprox/operators/conv2d.py:237: in forward
y_q = dwconv2d(x_q, w_q, lut, conv_args.stride, conv_args.padding)


x = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f91966b69d0>
w = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f91966b6d40>
lut = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f91966b4fe0>
stride = (1, 1), padding = (0, 0)

def dwconv2d(
    x: torch.FloatTensor,
    w: torch.FloatTensor,
    lut: torch.ShortTensor,
    stride: int = 1,
    padding: int = 0,
) -> torch.FloatTensor:
    """
    Approximate 2D Depthwise Convolution
    """
    x = x.char()
    w = w.char()

    assert x.device == w.device
    assert x.is_cuda
    assert (
        x.dtype == w.dtype == torch.int8
    ), "Input operands need to be 8-Bit signed Integer"
    assert lut.dtype == torch.int32, "LUT needs to be 32 bit signed Integer"

    def make_tuple(val):
        if not isinstance(val, tuple):
            return (val, val)
        return val

    stride = make_tuple(stride)
    padding = make_tuple(padding)

    lut = lut.to(x.device)
    small = ta_backend.use_dwconv2d_small(x, w, 1, 1, *stride, *padding)
    if small:
        out = ta_backend.dwconv2d_small(x, w, lut, 1, 1, *stride, *padding, True)
    else:
        out = ta_backend.dwconv2d(x, w, lut, 1, 1, *stride, *padding, *padding, True)
>   return out.float()

E RuntimeError: CUDA error: an illegal memory access was encountered
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

src/torchapprox/operators/backend.py:70: RuntimeError
=============================== warnings summary ===============================
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/utils/cpp_extension.py:25
/home/zhaojun/anaconda3/envs/approx/lib/python3.10/site-packages/torch/utils/cpp_extension.py:25: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import packaging # type: ignore[attr-defined]

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config0]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config1]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config2]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config3]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config4]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config5]
ERROR test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config6]
ERROR test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig1-layer_config0]
ERROR test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig1-layer_config1]
......

  2. benchmark
    ==========================
    benchmarks/test_bench_torchapprox.py .F........................................F........................................F........................................F........................................F........ [ 70%]
    ................................F....................................... [100%]

======================================================================= short test summary info =================================================================================================
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[mobilenet_v2-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[effcientnet_b0-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[vgg16-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[alexnet-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[resnet18-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[resnet50-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
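
For readers following the traceback, the forward pass shown above quantizes its operands as round(x / scale + zero_point) before casting them to signed 8-bit integers. A minimal pure-Python sketch of that affine quantization step is given here. The clamp to [-128, 127] and the `dequantize` helper are assumptions added for illustration and do not appear in the traceback.

```python
def quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Affine quantization of a float to a signed 8-bit integer value."""
    # Python's round() rounds halves to the nearest even integer
    q = round(x / scale + zero_point)
    # Assumption: saturate instead of wrapping around on overflow
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point=0):
    """Map a quantized integer back to an approximate float value."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    scale = 0.0079  # x_scale value taken from the log above
    for x in (0.0, 0.5, 1.0, 2.0):
        q = quantize(x, scale)
        print(x, q, dequantize(q, scale))
```

Values inside the representable range are recovered to within half a quantization step; values outside it saturate at the range limits.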

Benchmark against TFApprox

A comparison with TFApprox was requested.

This should be kept separate from productive code, similar to #6.

Test case is not fully-defined yet. Most likely scenario: Comparison of Conv2D inference speed.

Add Benchmarks

Add (micro-)benchmarks to compare the throughput of the different inference modes.
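
A micro-benchmark of this kind could look roughly like the sketch below. It uses the standard library's `timeit` rather than the pytest-benchmark plugin that appears in the test log, and the two "modes" are trivial hypothetical stand-ins, not torch-approx layers.

```python
import timeit

INPUTS = list(range(256))
SQUARES = [x * x for x in INPUTS]  # precomputed table for the lookup variant


def exact_mode():
    """Stand-in for an inference mode that multiplies directly."""
    return [x * x for x in INPUTS]


def lut_mode():
    """Stand-in for an inference mode that looks products up in a table."""
    return [SQUARES[x] for x in INPUTS]


def throughput(fn, items_per_call, repeat=5, number=200):
    """Best observed throughput of `fn` in items per second."""
    # timeit.repeat returns one total duration (for `number` calls) per
    # repetition; the minimum is the least disturbed by other system activity.
    best = min(timeit.repeat(fn, repeat=repeat, number=number))
    return items_per_call * number / best


if __name__ == "__main__":
    for name, fn in (("exact", exact_mode), ("lut", lut_mode)):
        print(f"{name}: {throughput(fn, len(INPUTS)):.3e} items/s")
```

When timing GPU kernels, the asynchronous launch behavior has to be accounted for. The test fixture shown in the log above sets `CUDA_LAUNCH_BLOCKING` for this reason.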

Support inline compilation of C approximate functions

Add the feature to replace the LUT operation with an inlined C function that performs operand transformations according to the logic of a given AM.

This will be helpful in benchmarking 12-Bit and 16-Bit AMs where a LUT would be too large.
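
The size argument can be made concrete with a short calculation. The sketch below assumes a full two-operand table with 4-byte entries (the LUT in the test log is int32). The truncating multiplier is a hypothetical example of AM logic expressed as a function rather than a table. It is written in Python for illustration only, whereas the proposed feature would inline a C function.

```python
def lut_bytes(bits, entry_bytes=4):
    """Size in bytes of a full product table for two `bits`-wide operands."""
    return (1 << (2 * bits)) * entry_bytes


def truncated_mul(a, b, drop_bits=2):
    """Hypothetical approximate multiplier.

    Clears the lowest `drop_bits` bits of each operand before multiplying.
    This stands in for the operand transformations of a real AM.
    """
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)


if __name__ == "__main__":
    for bits in (8, 12, 16):
        print(f"{bits}-bit LUT: {lut_bytes(bits) / 2**20:g} MiB")
    print(truncated_mul(7, 9), 7 * 9)
```

Under these assumptions the table grows from 0.25 MiB at 8 bits to 64 MiB at 12 bits and 16 GiB at 16 bits, while the functional form needs no table at all.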

Refactor ApproxConv2d operator into torch.autograd.Function

The ApproxConv2d operator is currently composed of several separate autograd Functions. Refactoring them into a single Function will likely reduce the excessive memory consumption, because fewer intermediate tensors need to be tracked.

Set up Sphinx

Improve documentation, specifically:

  • Set up Sphinx in GitHub Actions
  • Add preliminary content to README.md

Benchmark against adaPT

TorchApprox and adaPT need to be compared in terms of runtime.

This should be kept on a separate branch so that it does not interfere with productive code.
