etrommer / torch-approx

GPU-accelerated Neural Network layers using Approximate Multiplications for PyTorch

Home Page: https://etrommer.de/torch-approx

License: MIT License

C++ 3.48% C 0.88% Python 53.85% Cuda 21.07% Shell 0.21% Jupyter Notebook 20.51%

python approximate-computing convolutional-layers deep-learning fully-connected library machine-learning neural-network paper python3

torch-approx's Introduction

This is where I keep my research/productive work.

For more information about me, see my website.

Tinkering, partially done projects and non-professional work can be found under my private account.

torch-approx's Issues

Layer Noise Mode Interface

The layer's noise mode currently adds a zero-mean tensor with a learnable standard deviation.

torch-approx/src/torchapprox/layers/approx_layer.py

Lines 50 to 60 in 5740d50

    
    def stdev(self) -> torch.nn.Parameter:
        """
        The relative standard deviation of the Additive Gaussian noise that is added
        to the computation output. Scaling is done relative to the current batch's standard deviation.
        This is only used when the mode is set to `noise`. It will have no effect in other modes.
        """
        return self._stdev

    @stdev.setter
    def stdev(self, noise_std: float):
        self._stdev = torch.nn.Parameter(torch.tensor(noise_std), requires_grad=True)

This is quite specific to torch-approx's origin as a backend for AGN Approx. To make the library more generally useful, this feature should be kept on a separate branch, and the noise implementation should be replaced with a more generic interface that adds Gaussian noise with a fixed mean and standard deviation to the layer output.

torch-approx/src/torchapprox/layers/approx_layer.py

Lines 48 to 74 in 8284cf7

    
    @property
    def stdev(self) -> float:
        """
        Perturbation Error Relative Standard Deviation

        Returns:
            Currently configured perturbation standard deviation
        """
        return self._stdev.item()

    @stdev.setter
    def stdev(self, val: float):
        self._stdev = torch.tensor([val], device=self.weight.device)  # type: ignore

    @property
    def mean(self) -> float:
        """
        Perturbation Error mean

        Returns:
            Currently configured perturbation mean
        """
        return self._mean.item()

    @mean.setter
    def mean(self, val: float):
        self._mean = torch.tensor([val], device=self.weight.device)  # type: ignore
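A minimal sketch of how the generic interface described above could look as a stand-alone module. `GaussianPerturbation` is a hypothetical name and is not part of torch-approx. It assumes that `stdev` is relative to the standard deviation of the current output (as the original docstring describes) while `mean` is an absolute offset. Unlike the original noise mode, both values are stored as buffers, so they move with the module between devices but are not learnable.

```python
import torch


class GaussianPerturbation(torch.nn.Module):
    """
    Hypothetical stand-alone module (not part of torch-approx) that adds Gaussian
    noise with a fixed, non-learnable mean and relative standard deviation to its input.
    """

    def __init__(self, mean: float = 0.0, stdev: float = 0.0):
        super().__init__()
        # Buffers follow .to(device) calls but are never updated by an optimizer,
        # unlike the learnable nn.Parameter used by the original noise mode.
        self.register_buffer("_mean", torch.tensor([float(mean)]))
        self.register_buffer("_stdev", torch.tensor([float(stdev)]))

    @property
    def mean(self) -> float:
        return self._mean.item()

    @mean.setter
    def mean(self, val: float):
        self._mean.fill_(val)

    @property
    def stdev(self) -> float:
        return self._stdev.item()

    @stdev.setter
    def stdev(self, val: float):
        self._stdev.fill_(val)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # Assumption: stdev is relative to the standard deviation of the current
        # batch output (needs more than one element), mean is an absolute offset.
        scale = y.detach().std()
        noise = self._mean + self._stdev * scale * torch.randn_like(y)
        return y + noise
```

Usage would be along the lines of `GaussianPerturbation(mean=0.0, stdev=0.05)(layer(x))`, with `mean` and `stdev` adjustable later through the properties.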

Accuracy Benchmarking

Model accuracy needs to be benchmarked after retraining in several modes.

Likely candidates for comparison:

Baseline (retraining with accurate multiplication but the same hyperparameters)
Gaussian noise with the same stdev as the multiplier error
Behavioral Simulation (LUT)
Regression Models
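For the "Behavioral Simulation (LUT)" and "Gaussian noise" candidates, a sketch of the two ingredients in plain PyTorch: a reference LUT-based GeMM and the error statistics of a multiplier LUT. All function names are hypothetical. The sketch assumes a 256x256 table indexed by the two's-complement bit pattern of each int8 operand (index 255 means -1), which is consistent with the LUT rows printed in the test log. The statistics assume uniformly distributed operands; real weight and activation distributions are not uniform, so the error seen during training would differ.

```python
import torch


def exact_lut() -> torch.Tensor:
    """
    Hypothetical helper: accurate signed 8x8-bit products as a 256x256 int32 table.
    Assumption: index i stands for the int8 value with two's-complement bit pattern i.
    """
    vals = torch.arange(256)
    vals = torch.where(vals > 127, vals - 256, vals)
    return torch.outer(vals, vals).to(torch.int32)


def lut_gemm(a: torch.Tensor, b: torch.Tensor, lut: torch.Tensor) -> torch.Tensor:
    """
    Reference (slow, memory-hungry) behavioral simulation of an approximate GeMM:
    every product a[i, k] * b[k, j] is replaced by a table lookup before summation.
    a: (N, K) int8, b: (K, M) int8, lut: (256, 256) integer table.
    """
    ia = a.long() & 0xFF  # map int8 values to table indices 0..255
    ib = b.long() & 0xFF
    prods = lut[ia[:, :, None], ib[None, :, :]]  # (N, K, M) looked-up products
    return prods.sum(dim=1)  # summing an integer tensor promotes to int64


def multiplier_error_stats(lut: torch.Tensor) -> tuple[float, float]:
    """Mean and standard deviation of the multiplier error over all operand pairs."""
    err = (lut.long() - exact_lut().long()).double()
    return err.mean().item(), err.std(unbiased=False).item()
```

The returned standard deviation is absolute (in units of the integer product), so it would still have to be rescaled by the quantization scales before it could configure a noise-based comparison.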

FAILED & ERROR when running Unit Tests

Hi etrommer, I ran into errors when running the unit tests with "poetry run pytest test". I installed poetry in a conda environment (python=3.10.13) and cloned your code. Then I installed the packages with "poetry install --with "dev,extras"" and installed the additional dependencies as well as the pre-commit hooks without problems. However, several unit tests fail and then all of the remaining ones error out. I also ran the benchmark; apart from a few failures, the rest seems fine. Could you help me solve the errors? Thanks :)

My CUDA version is 11.7. The output logs of the unit tests and the benchmark follow.

unit test

============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-7.4.2, pluggy-1.3.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/zhaojun/torch-approx
configfile: pyproject.toml
plugins: cov-3.0.0, benchmark-4.0.0
collected 436 items

test/test_approx_layer.py .............................FFFFFFEEEEEEEEEEE [ 10%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 27%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 37%]
test/test_approx_mm.py EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 48%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 64%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 81%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 96%]
test/test_dwconv2d.py EEEEEEEEEEEEEE [100%]

==================================== ERRORS ====================================
_____ ERROR at setup of test_layer_fwd[cuda-weight_qconfig0-layer_config6] _____

@pytest.fixture(autouse=True)
def fix_seed():
    """
    Run before every test.
    - Fixes random seed to make test reproducible
    - Sets CUDA to blocking to allow for benchmarking of normally asynchronous kernels
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    np.random.seed(42)

  torch.manual_seed(42)

test/conftest.py:36:

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/random.py:40: in manual_seed
torch.cuda.manual_seed_all(seed)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/cuda/random.py:113: in manual_seed_all
_lazy_call(cb, seed_all=True)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/cuda/__init__.py:183: in _lazy_call
callable()

def cb():
    for i in range(device_count()):
        default_generator = torch.cuda.default_generators[i]

      default_generator.manual_seed(seed)

E RuntimeError: CUDA error: an illegal memory access was encountered
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/cuda/random.py:111: RuntimeError
_____ ERROR at setup of test_layer_fwd[cuda-weight_qconfig1-layer_config0] _____

[............................... similar errors .............................................]
=================================== FAILURES ===================================
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config0] ______________

device = 'cuda'
layer_config = (<class 'torch.nn.modules.linear.Linear'>, (4, 20), (20, 10), {})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}

@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
    input_dims = layer_config[1]
    layer, ref_layer = generate_models(layer_config, device, weight_qconfig)

    x = torch.rand(input_dims, device=device)
    xref = copy.deepcopy(x)

  y = layer(x)

test/test_approx_layer.py:165:

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_linear.py:46: in approx_fwd
y = self.approx_op(x, w, quant_params, self.htp_model)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/operators/lut.py:82: in forward
return ApproxGeMM.apply(x, w, self.lut, quant_params, htp_model)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]

x = tensor([[0.8243, 0.2120, 0.7301, 0.3219, 0.7536, 0.2120, 0.9263, 0.1413, 0.3454,
0.9970, 0.5495, 0.2512, 0.92... 0.7536, 0.7065, 0.3297, 0.9106, 0.3925, 0.1727, 0.9813, 0.3690, 0.2591,
0.9185, 0.9891]], device='cuda:0')
w = tensor([[ 0.0501, -0.2194, -0.0449, -0.2056, -0.1538, -0.0086, 0.1054, -0.0415,
0.0086, -0.0950, -0.1158, ...0225, 0.0881, -0.2074]], device='cuda:0',
grad_fn=)
lut = tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 1, 2, ..., -3, -2, -1],
[ 0, 2, 4, ..., -6, -4, -2]... ..., 9, 6, 3],
[ 0, -2, -4, ..., 6, 4, 2],
[ 0, -1, -2, ..., 3, 2, 1]], dtype=torch.int32)
quant_params = QuantizationParameters(x_scale=tensor([0.0079], device='cuda:0'), x_zero_point=tensor([0], device='cuda:0', dtype=torch.int32), w_scale=tensor([0.0017], device='cuda:0'), w_zero_point=tensor([0], device='cuda:0', dtype=torch.int32))
htp_model = None

@staticmethod
def forward(  # type: ignore
    x: torch.Tensor,
    w: torch.Tensor,
    lut: torch.Tensor,
    quant_params: "QuantizationParameters",
    htp_model: Optional[Callable],
) -> torch.Tensor:
    """
    Approximate forward operation
    """

    x_q = torch.round((x / quant_params.x_scale) + quant_params.x_zero_point)[
        :, None, :
    ]
    w_q = torch.round(
        (w / quant_params.w_scale[:, None]) + quant_params.w_zero_point[:, None]
    ).T

    if htp_model is None:

      y_q = approx(x_q.char(), w_q.char(), lut).float()

E RuntimeError: CUDA error: invalid argument
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

src/torchapprox/operators/approxgemm.py:39: RuntimeError
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config1] ______________

device = 'cuda'
layer_config = (<class 'torch.nn.modules.conv.Conv2d'>, (2, 8, 4, 4), (8, 16, 3), {'groups': 1})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}

@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
    input_dims = layer_config[1]
    layer, ref_layer = generate_models(layer_config, device, weight_qconfig)

    x = torch.rand(input_dims, device=device)
    xref = copy.deepcopy(x)

  y = layer(x)

test/test_approx_layer.py:165:

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_conv2d.py:185: in forward
return ApproxLayer.forward(self, x_q, x_scale, x_zero_point, bias)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_conv2d.py:155: in approx_fwd
y = ApproxConv2dOp.apply(
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
src/torchapprox/operators/conv2d.py:240: in forward
y_q = _im2col_conv2d(x_q, w_q, conv_args, lut, out_dims)

x_q = tensor([[[[105., 27., 93., 41.],
[ 96., 27., 118., 18.],
[ 44., 127., 70., 32.],
... [ 18., 102., 123., 77.],
[ 99., 57., 5., 16.],
[ 80., 83., 114., 84.]]]], device='cuda:0')
w_q = tensor([[[[ 29., -125., -26.],
[-117., -88., -4.],
[ 60., -24., 5.]],

     [[ -54.,...
     [[  22., -123.,   89.],
      [  25.,   91., -126.],
      [  62., -107.,  -40.]]]], device='cuda:0')

conv_args = Conv2dArgs(in_channels=8, out_channels=16, kernel_size=(3, 3), stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1)
lut = tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 1, 2, ..., -3, -2, -1],
[ 0, 2, 4, ..., -6, -4, -2]... ..., 9, 6, 3],
[ 0, -2, -4, ..., 6, 4, 2],
[ 0, -1, -2, ..., 3, 2, 1]], dtype=torch.int32)
out_dims = (2, 2)

def _im2col_conv2d(
    x_q: torch.FloatTensor,
    w_q: torch.FloatTensor,
    conv_args: Conv2dArgs,
    lut: torch.ShortTensor,
    out_dims: Tuple[int, int],
) -> torch.FloatTensor:
    # Pre-allocate output tensor
    y_q = torch.empty(
        x_q.size(0),
        conv_args.out_channels,
        math.prod(out_dims),
        device=x_q.device,
        dtype=torch.int32,
    )

    w_s8 = w_q.char()
    for group in range(conv_args.groups):
        # Calculate lower and upper channel index for current group
        in_ch_lower, in_ch_upper = _group_limits(
            group, conv_args.groups, conv_args.in_channels
        )
        out_ch_lower, out_ch_upper = _group_limits(
            group, conv_args.groups, conv_args.out_channels
        )

        # Im2Col operation
        x_unfold_s8 = torch.nn.functional.unfold(
            x_q[
                :,
                in_ch_lower:in_ch_upper,
                :,
            ],
            kernel_size=conv_args.kernel_size,
            padding=conv_args.padding,
            stride=conv_args.stride,
            dilation=conv_args.dilation,
        ).char()

        # Reshape weights to 2D
        w_flat_s8 = w_s8[out_ch_lower:out_ch_upper].view(
            conv_args.out_channels // conv_args.groups, -1
        )

        # ApproxGeMM

      y_q[:, out_ch_lower:out_ch_upper] = approx(

            w_flat_s8,
            x_unfold_s8,
            lut,
        )

E RuntimeError: CUDA error: invalid argument
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

src/torchapprox/operators/conv2d.py:200: RuntimeError

______________ test_layer_fwd[cuda-weight_qconfig0-layer_config4] ______________

device = 'cuda'
layer_config = (<class 'torch.nn.modules.conv.Conv2d'>, (2, 8, 4, 4), (8, 16, 3), {'groups': 8})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}

@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
    input_dims = layer_config[1]
    layer, ref_layer = generate_models(layer_config, device, weight_qconfig)

    x = torch.rand(input_dims, device=device)
    xref = copy.deepcopy(x)

  y = layer(x)

test/test_approx_layer.py:165:

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_conv2d.py:185: in forward
return ApproxLayer.forward(self, x_q, x_scale, x_zero_point, bias)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_conv2d.py:155: in approx_fwd
y = ApproxConv2dOp.apply(
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
src/torchapprox/operators/conv2d.py:240: in forward
y_q = _im2col_conv2d(x_q, w_q, conv_args, lut, out_dims)

    [[[ -55...
    [[[-119., -126.,   53.],
      [ -20.,  118.,   20.],
      [  50.,   -8., -123.]]]], device='cuda:0')

conv_args = Conv2dArgs(in_channels=8, out_channels=16, kernel_size=(3, 3), stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups=8)
lut = tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 1, 2, ..., -3, -2, -1],
[ 0, 2, 4, ..., -6, -4, -2]... ..., 9, 6, 3],
[ 0, -2, -4, ..., 6, 4, 2],
[ 0, -1, -2, ..., 3, 2, 1]], dtype=torch.int32)
out_dims = (2, 2)

def _im2col_conv2d(
    x_q: torch.FloatTensor,
    w_q: torch.FloatTensor,
    conv_args: Conv2dArgs,
    lut: torch.ShortTensor,
    out_dims: Tuple[int, int],
) -> torch.FloatTensor:
    # Pre-allocate output tensor
    y_q = torch.empty(
        x_q.size(0),
        conv_args.out_channels,
        math.prod(out_dims),
        device=x_q.device,
        dtype=torch.int32,
    )

    w_s8 = w_q.char()
    for group in range(conv_args.groups):
        # Calculate lower and upper channel index for current group
        in_ch_lower, in_ch_upper = _group_limits(
            group, conv_args.groups, conv_args.in_channels
        )
        out_ch_lower, out_ch_upper = _group_limits(
            group, conv_args.groups, conv_args.out_channels
        )

        # Im2Col operation
        x_unfold_s8 = torch.nn.functional.unfold(
            x_q[
                :,
                in_ch_lower:in_ch_upper,
                :,
            ],
            kernel_size=conv_args.kernel_size,
            padding=conv_args.padding,
            stride=conv_args.stride,
            dilation=conv_args.dilation,
        ).char()

        # Reshape weights to 2D
        w_flat_s8 = w_s8[out_ch_lower:out_ch_upper].view(
            conv_args.out_channels // conv_args.groups, -1
        )

        # ApproxGeMM

      y_q[:, out_ch_lower:out_ch_upper] = approx(

            w_flat_s8,
            x_unfold_s8,
            lut,
        )

E RuntimeError: CUDA error: invalid argument
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

src/torchapprox/operators/conv2d.py:200: RuntimeError
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config5] ______________

device = 'cuda'
layer_config = (<class 'torch.nn.modules.conv.Conv2d'>, (2, 8, 4, 4), (8, 8, 3), {'groups': 8})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}

@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
    input_dims = layer_config[1]
    layer, ref_layer = generate_models(layer_config, device, weight_qconfig)

    x = torch.rand(input_dims, device=device)
    xref = copy.deepcopy(x)

  y = layer(x)

test/test_approx_layer.py:165:

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_conv2d.py:185: in forward
return ApproxLayer.forward(self, x_q, x_scale, x_zero_point, bias)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_conv2d.py:155: in approx_fwd
y = ApproxConv2dOp.apply(
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
src/torchapprox/operators/conv2d.py:237: in forward
y_q = dwconv2d(x_q, w_q, lut, conv_args.stride, conv_args.padding)

x = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f91966b69d0>
w = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f91966b6d40>
lut = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f91966b4fe0>
stride = (1, 1), padding = (0, 0)

def dwconv2d(
    x: torch.FloatTensor,
    w: torch.FloatTensor,
    lut: torch.ShortTensor,
    stride: int = 1,
    padding: int = 0,
) -> torch.FloatTensor:
    """
    Approximate 2D Depthwise Convolution
    """
    x = x.char()
    w = w.char()

    assert x.device == w.device
    assert x.is_cuda
    assert (
        x.dtype == w.dtype == torch.int8
    ), "Input operands need to be 8-Bit signed Integer"
    assert lut.dtype == torch.int32, "LUT needs to be 32 bit signed Integer"

    def make_tuple(val):
        if not isinstance(val, tuple):
            return (val, val)
        return val

    stride = make_tuple(stride)
    padding = make_tuple(padding)

    lut = lut.to(x.device)
    small = ta_backend.use_dwconv2d_small(x, w, 1, 1, *stride, *padding)
    if small:
        out = ta_backend.dwconv2d_small(x, w, lut, 1, 1, *stride, *padding, True)
    else:
        out = ta_backend.dwconv2d(x, w, lut, 1, 1, *stride, *padding, *padding, True)

  return out.float()

E RuntimeError: CUDA error: an illegal memory access was encountered
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

src/torchapprox/operators/backend.py:70: RuntimeError
=============================== warnings summary ===============================
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/utils/cpp_extension.py:25
/home/zhaojun/anaconda3/envs/approx/lib/python3.10/site-packages/torch/utils/cpp_extension.py:25: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import packaging # type: ignore[attr-defined]

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config0]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config1]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config2]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config3]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config4]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config5]
ERROR test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config6]
ERROR test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig1-layer_config0]
ERROR test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig1-layer_config1]
......

benchmark
==========================
benchmarks/test_bench_torchapprox.py .F........................................F........................................F........................................F........................................F........ [ 70%]
................................F....................................... [100%]

======================================================================= short test summary info =================================================================================================
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[mobilenet_v2-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[effcientnet_b0-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[vgg16-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[alexnet-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[resnet18-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[resnet50-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
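The benchmark failures above come from an assertion on the LUT dtype. The `dwconv2d` excerpt in the unit test log performs a similar check (`lut.dtype == torch.int32`), even though the type hints in the same log annotate `lut` as `torch.ShortTensor`. A hedged caller-side workaround is to convert the LUT before handing it to the layers. `as_int32_lut` is a hypothetical helper that assumes an 8x8-bit multiplier (a 256x256 table); it does not address the CUDA "invalid argument" and "illegal memory access" errors in the unit tests.

```python
import torch


def as_int32_lut(lut: torch.Tensor) -> torch.Tensor:
    """
    Hypothetical helper: return an 8x8-bit multiplier LUT as torch.int32 (the tensor
    itself if it already has that dtype). Raises if values cannot be represented exactly.
    """
    if lut.shape != (256, 256):
        raise ValueError(f"expected a 256x256 LUT, got {tuple(lut.shape)}")
    # Float tables are only accepted if every entry is a whole number
    if lut.is_floating_point() and not torch.equal(lut, lut.round()):
        raise ValueError("LUT contains non-integer values")
    info = torch.iinfo(torch.int32)
    if lut.min().item() < info.min or lut.max().item() > info.max:
        raise ValueError("LUT values do not fit into int32")
    return lut.to(torch.int32)
```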

Implement Approximate Depthwise Convolution Kernels

Benchmarking has shown that Im2Col + ApproxGeMM is extremely slow for depthwise-separable convolution operations.

This should be addressed by adding dedicated approximate DWConv operators.

Accurate FP32 DWConv operators should be used as a template.
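A sketch, in plain PyTorch with exact arithmetic, of why Im2Col + GeMM is a poor fit for depthwise convolution; the function names are hypothetical. `im2col_conv2d` mirrors the per-group loop of `_im2col_conv2d` from the unit test log, with a float matmul standing in for ApproxGeMM. With `groups == in_channels`, it issues one GeMM per channel whose inner dimension is only `kh * kw` (9 for a 3x3 kernel), which plausibly accounts for much of the overhead. `depthwise_conv2d` instead computes all channels at once as per-channel dot products. It still materializes the unfolded patches, so it only illustrates the idea; a dedicated kernel modeled on the accurate FP32 DWConv operators would avoid that copy.

```python
import torch
import torch.nn.functional as F


def _out_size(size: int, k: int, stride: int, padding: int) -> int:
    # Output extent of a convolution with dilation 1
    return (size + 2 * padding - k) // stride + 1


def im2col_conv2d(x, w, groups=1, stride=1, padding=0):
    """Grouped Im2Col + GeMM convolution, one GeMM per group."""
    n, c_in, h, wd = x.shape
    c_out, _, kh, kw = w.shape
    oh, ow = _out_size(h, kh, stride, padding), _out_size(wd, kw, stride, padding)
    in_g, out_g = c_in // groups, c_out // groups
    y = torch.empty(n, c_out, oh * ow, dtype=x.dtype, device=x.device)
    for g in range(groups):
        # Im2Col: (N, in_g*kh*kw, oh*ow) patch matrix for this group's input channels
        cols = F.unfold(x[:, g * in_g:(g + 1) * in_g], (kh, kw), padding=padding, stride=stride)
        # Flatten this group's filters to (out_g, in_g*kh*kw)
        w_flat = w[g * out_g:(g + 1) * out_g].reshape(out_g, -1)
        # (out_g, K) @ (N, K, L) broadcasts to (N, out_g, L)
        y[:, g * out_g:(g + 1) * out_g] = w_flat @ cols
    return y.view(n, c_out, oh, ow)


def depthwise_conv2d(x, w, stride=1, padding=0):
    """Loop-free depthwise convolution with one filter per channel (multiplier 1)."""
    n, c, h, wd = x.shape
    kh, kw = w.shape[-2:]
    assert w.shape[:2] == (c, 1), "sketch only covers channel multiplier 1"
    oh, ow = _out_size(h, kh, stride, padding), _out_size(wd, kw, stride, padding)
    cols = F.unfold(x, (kh, kw), padding=padding, stride=stride).reshape(n, c, kh * kw, -1)
    # Each output element is a kh*kw-term dot product with its channel's filter
    y = torch.einsum("nckl,ck->ncl", cols, w.reshape(c, kh * kw))
    return y.reshape(n, c, oh, ow)
```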

Benchmark against TFApprox

A comparison with TFApprox was requested.

This should be kept separate from production code, similar to #6.

The test case is not fully defined yet. Most likely scenario: comparison of Conv2D inference speed.
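For the Conv2D inference speed comparison, a small timing helper along these lines might serve as a starting point on the PyTorch side (`time_forward` is a hypothetical name). Because CUDA kernels run asynchronously, it synchronizes the device around each timed call; the test suite's conftest achieves a similar effect by setting `CUDA_LAUNCH_BLOCKING=1`. TFApprox would need an equivalent harness in TensorFlow.

```python
import time

import torch


def time_forward(fn, *args, warmup: int = 3, iters: int = 11) -> float:
    """
    Hypothetical helper: middle value of the sorted wall-clock timings (seconds)
    of fn(*args), i.e. the median for an odd number of iterations.
    """
    use_cuda = torch.cuda.is_available() and any(
        isinstance(a, torch.Tensor) and a.is_cuda for a in args
    )
    with torch.no_grad():
        # Warm-up runs absorb one-time costs such as kernel loading and caching
        for _ in range(warmup):
            fn(*args)
        times = []
        for _ in range(iters):
            if use_cuda:
                torch.cuda.synchronize()  # wait for previously queued work
            start = time.perf_counter()
            fn(*args)
            if use_cuda:
                torch.cuda.synchronize()  # wait until this call has finished
            times.append(time.perf_counter() - start)
    times.sort()
    return times[len(times) // 2]
```

For example, `time_forward(torch.nn.functional.conv2d, x, w)` would give an accurate-arithmetic baseline against which the approximate layers could be compared.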

Set up Sphinx in GitHub Actions
Add preliminary content to README.md

Benchmark against adaPT

TorchApprox and adaPT need to be compared in terms of runtime.

This should be kept on a separate branch so that it does not interfere with production code.
