
pku-liang / flextensor


Automatic Schedule Exploration and Optimization Framework for Tensor Computations

License: MIT License

Languages: Python 70.65%, Makefile 0.06%, Cuda 12.86%, C++ 3.63%, C 12.57%, Shell 0.22%

flextensor's People

Contributors

hatsu3, holmosaint, knowingnothing, light-of-hers, yzliu567


flextensor's Issues

optimize_block_celluar.py cannot work with '--target cuda'

Just tested 4 files: optimize_block_celluar.py, optimize_conv1d.py, optimize_conv2d.py, optimize_conv3d.py

  • optimize_block_celluar.py:
<class 'AssertionError'>
Traceback (most recent call last):
  File "../../../auto_schedule/testing/scheduler.py", line 161, in exec_func
    res = func(*args, **kwargs)
  File "../../../auto_schedule/testing/scheduler.py", line 78, in build_func
    s, bufs = schedule_with_config(task_key, configs, op_pos=op_pos)
  File "../../../auto_schedule/testing/scheduler.py", line 1711, in schedule_with_config
    template(s, op)
  File "../../../auto_schedule/testing/scheduler.py", line 1263, in _cuda_schedule_split_reorder_fuse
    assert pos < len(spatial_remainder)
AssertionError
op build fail:
(the same AssertionError traceback and "op build fail:" message repeat five more times)
<class 'RuntimeError'>
Traceback (most recent call last):
  File "../../../auto_schedule/testing/scheduler.py", line 161, in exec_func
    res = func(*args, **kwargs)
  File "../../../auto_schedule/testing/scheduler.py", line 82, in build_func
    raise RuntimeError("Invalid %s(%d) kernel"%(task.target, task.dev_id))
RuntimeError: Invalid cuda(0) kernel
op build fail:Invalid cuda(0) kernel
  • optimize_conv1d.py:
<class 'queue.Empty'>
op run fail:
Traceback (most recent call last):
  File "../../../auto_schedule/testing/scheduler.py", line 1525, in get
    res = self.q.get(block=True, timeout=timeout)
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 105, in get
    raise Empty
queue.Empty
(the same queue.Empty "op run fail" trace repeats once more)
<class 'queue.Empty'>
op build fail:
Traceback (most recent call last):
  File "../../../auto_schedule/testing/scheduler.py", line 1525, in get
    res = self.q.get(block=True, timeout=timeout)
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 105, in get
    raise Empty
queue.Empty
<class 'RuntimeError'>
Traceback (most recent call last):
  File "../../../auto_schedule/testing/scheduler.py", line 161, in exec_func
    res = func(*args, **kwargs)
  File "../../../auto_schedule/testing/scheduler.py", line 82, in build_func
    raise RuntimeError("Invalid %s(%d) kernel"%(task.target, task.dev_id))
RuntimeError: Invalid cuda(0) kernel
op build fail:Invalid cuda(0) kernel
  • optimize_conv3d.py:
<class 'queue.Empty'>
op run fail:
Traceback (most recent call last):
  File "../../../auto_schedule/testing/scheduler.py", line 1525, in get
    res = self.q.get(block=True, timeout=timeout)
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 105, in get
    raise Empty
queue.Empty
<class 'RuntimeError'>
Traceback (most recent call last):
  File "../../../auto_schedule/testing/scheduler.py", line 161, in exec_func
    res = func(*args, **kwargs)
  File "../../../auto_schedule/testing/scheduler.py", line 82, in build_func
    raise RuntimeError("Invalid %s(%d) kernel"%(task.target, task.dev_id))
RuntimeError: Invalid cuda(0) kernel
op build fail:Invalid cuda(0) kernel
(further RuntimeError "Invalid cuda(0) kernel" and queue.Empty "op run fail" traces repeat in the same pattern)

And optimize_conv2d.py didn't output anything, even with '--target llvm'.

Where is the "kernel.c" file for the gemmini examples?

Hello,

Thanks for the amazing work and codes.

I am trying to run your examples and tests, but I have not been able to run the gemmini examples such as '/testing/others/hand-craft/gemmini-*.py'.

These examples require a "kernel.c" file, but when I looked for it, the current project does not include that file.

Can you tell me where the "kernel.c" file is? If not, could you share it?

Thanks,

problems in DQN search

I ran optimize/optimize_conv2d.py with the following command:

python3 optimize_conv2d.py --shapes yolo --target cuda --trials 1000 --timeout 10 --parallel 8 --log tmp_log.txt --method q

However, after a lot of warnings during warm up, I got this error:

Traceback (most recent call last):

  File "optimize_conv2d.py", line 204, in <module>
    logfile=flog,

  File "optimize_conv2d.py", line 116, in optimize
    rpc_info=rpc_info,

  File "/home/max/workspaces/python/FlexTensor/flextensor/scheduler.py", line 2100, in schedule
    perf_path=perf_path,

  File "/home/max/workspaces/python/FlexTensor/flextensor/scheduler.py", line 670, in schedule
    return self._q_schedule(configs, wanted_types, use_model=use_model)

  File "/home/max/workspaces/python/FlexTensor/flextensor/scheduler.py", line 443, in _q_schedule
    from_lst, next_points, action_lst = self.walker_group.walk(cur_lst, trial)

  File "/home/max/workspaces/python/FlexTensor/flextensor/model.py", line 359, in walk
    next_index_lst, direction_lst = self.walkers[name].walk(flattened_lst, index_lst, trial, epsilon, gamma)

  File "/home/max/workspaces/python/FlexTensor/flextensor/model.py", line 72, in walk
    q_values_lst = self.pre_judger(torch.FloatTensor(inputs)).detach()

  File "/home/max/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)

  File "/home/max/workspaces/python/FlexTensor/flextensor/model.py", line 34, in forward
    out = self.net(inputs)

  File "/home/max/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)

  File "/home/max/.local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)

  File "/home/max/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)

  File "/home/max/.local/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)

  File "/home/max/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 1372, in linear
    output = input.matmul(weight.t())

RuntimeError: size mismatch, m1: [1 x 0], m2: [22 x 64] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:197

The same issue arose with another op I wrote myself when using the "method=q" setting.

Are there any instructions on using DQN search?

How to encode and schedule for "reorder"?

Hello,

Thanks for the amazing work and codes.

I looked into the paper and the code but was confused about how you encode the "reorder" schedule in the search space. I have some questions.

  1. In the code I found
    reorder_space = generate_reorder_space(groups) and groups = 3 by default
    Should I set groups equal to rlevel*slevel?

  2. The config it generates is just one-dimensional, which does not match the description in the paper, i.e.,
    "reorder": i1, j1, i2, j2, k1, i3, k2, j3, k3, i4, k4, j4 in Figure 3

Could you please explain these? Thanks.
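For what it's worth, one way to picture the one-dimensional "reorder" config is as an index into a list of candidate orders over a few coarse loop groups, rather than over every individual split axis. A minimal sketch of that reading (the helper below is hypothetical and only mirrors what generate_reorder_space(groups) might enumerate; it is not FlexTensor's actual implementation):

import itertools

def sketch_reorder_space(groups=3):
    # One candidate per permutation of `groups` coarse loop groups,
    # so the space has groups! entries (6 for the default groups=3).
    return list(itertools.permutations(range(groups)))

space = sketch_reorder_space(3)
print(len(space))  # 6
print(space[1])    # (0, 2, 1) -- a config like [[1]] would index this entry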

Some problem while running on GPU

I want to test the performance of C9 of YOLO after FlexTensor's optimization, but there seem to be some problems when running optimize_conv2d.py on the GPU:

$ python optimize_conv2d.py --shapes yolo --from 8 --to 9 --parallel 16 --target cuda
......
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 16
warm up [1599394505.223908] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 16
warm up [1599394508.009939] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 16
warm up [1599394510.781969] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 16
Fail to find valid schedule, too many errors
warm up [1599394513.576313] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 16
warm up [1599394516.424372] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 16
......

I have seen a previous issue, and the current code already uses 'spawn' for multiprocessing.
It seems the run never stops because it can't find a valid schedule.

Some problem while running on GPU

There seem to be some problems when running the second case on the GPU:

$ python3 optimize_block_circulant_matrix.py --target cuda --parallel 20 --timeout 120
Optimize block_circulant_matrix shape (1024, 256, 8)
......
######################################
op schedules:
----------------------------------
fuse [[1, 2, 2]]
spatial [[8, 1, 4, 4], [16, 1, 2, 8]]
reduce [[1, 1, 8]]
reorder [[0]]
unroll [[512, 1]]
----------------------------------
fuse [[1, 2, 2]]
spatial [[32, 1, 2, 16], [2, 4, 32, 1]]
reorder [[0]]
unroll [[1, 1]]
graph schedules:
inline [[0, 0]]
merge [[0, 1]]
block_circulant_matrix_block_circulant_matrix_(1024, 256, 8)_cuda(0):[[{"fuse": [[1, 2, 2]], "spatial": [[8, 1, 4, 4], [16, 1, 2, 8]], "reduce": [[1, 1, 8]], "reorder": [[0]], "inline": [], "unroll": [[512, 1]], "merge": []}, {"fuse": [[1, 2, 2]], "spatial": [[32, 1, 2, 16], [2, 4, 32, 1]], "reduce": [], "reorder": [[0]], "inline": [], "unroll": [[1, 1]], "merge": []}], {"fuse": [], "spatial": [], "reduce": [], "reorder": [], "inline": [[0, 0]], "unroll": [], "merge": [[0, 1]]}]
Use 0.24673579999999998 ms
Cost 1717.7187826633453 s

Optimize block_circulant_matrix shape (1024, 256, 16)
warm up [inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf]
(the same "warm up [inf, ...]" line repeats 19 more times)
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 20
warm up [inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf]
(this warning and automatic retry cycle repeats several more times)
.......

But if I run only the 2nd case, everything seems OK:

$ python3 optimize_block_circulant_matrix.py --target cuda --parallel 20 --timeout 120 -f 1 -t 2
Optimize block_circulant_matrix shape (1024, 256, 16)
......
######################################
op schedules:
----------------------------------
fuse [[1, 2, 2]]
spatial [[32, 1, 2, 1], [4, 1, 4, 16]]
reduce [[1, 2, 8]]
reorder [[1]]
unroll [[512, 0]]
----------------------------------
fuse [[1, 2, 2]]
spatial [[128, 4, 1, 2], [4, 4, 16, 1]]
reorder [[2]]
unroll [[512, 1]]
graph schedules:
inline [[0, 0]]
merge [[1, 0]]
block_circulant_matrix_block_circulant_matrix_(1024, 256, 16)_cuda(0):[[{"fuse": [[1, 2, 2]], "spatial": [[32, 1, 2, 1], [4, 1, 4, 16]], "reduce": [[1, 2, 8]], "reorder": [[1]], "inline": [], "unroll": [[512, 0]], "merge": []}, {"fuse": [[1, 2, 2]], "spatial": [[128, 4, 1, 2], [4, 4, 16, 1]], "reduce": [], "reorder": [[2]], "inline": [], "unroll": [[512, 1]], "merge": []}], {"fuse": [], "spatial": [], "reduce": [], "reorder": [], "inline": [[0, 0]], "unroll": [], "merge": [[1, 0]]}]
Use 0.0080564 ms
Cost 1806.2457213401794 s

And if I run cases 3~6, case 3 is OK but case 4 fails:

$ python3 optimize_block_circulant_matrix.py --target cuda --parallel 20 --timeout 120 -f 2 -t 6
Optimize block_circulant_matrix shape (1024, 512, 8)
......

Optimize block_circulant_matrix shape (1024, 512, 16)
warm up [inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf]
(the same "warm up [inf, ...]" line repeats 19 more times)
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 20
warm up [inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf]
(this warning and automatic retry cycle repeats several more times)
......

The same problem also appears when I run some other scripts, such as 'optimize_conv1d.py' and 'optimize_gemm.py'.

Multiprocessing isolation

@KnowingNothing
I have a question regarding the call path of the parallel_evaluate routines in FlexTensor.
When tuning for an "llvm" target, it makes sense that the "parallel" option should be set to <= num cores as to have a form of isolation on the HW and thus have accurate performance measurements, where each measurement occurs on a separate core.
In the case of "cuda" target, where the candidate schedules are executed on the GPU, this isn't as straightforward.

When looking at the number of simultaneously executing schedules on the GPU, I could observe several simultaneous processes being present on the device - confirmed via inspection through nvidia-smi - during a single measurement trial. This is somewhat concerning as it causes multiple processes to potentially compete for GPU resources at the same time during measurement, effectively causing the measurement to be inaccurate as the candidates can be scheduled in a different order internally by the nvidia's cuda driver scheduler.

FlexTensor seems to reuse some of the tvm's RPC pipeline for evaluating modules, namely "func.time_evaluator" which goes through the RPCTimeEvaluator module, in basically the same fashion as it happens in AutoTVM, with the main difference being that in AutoTVM, the LocalRunner (locally spawned RPC server & tracker) keeps the "parallel" option hardcoded to 1 as to maintain the isolation between 2 separate candidate schedules being measured on the device. I thought its maybe something to do with the torch.multiprocessing module, however it seems to be a simple wrapper over Python's multiprocessing which allows to share a portion of memory between several processes and on face-value, has nothing to do with isolation when it comes to GPU execution.

For "cuda" targets, would you recommend to stick with parallel = 1 to ensure process isolation or maybe I'm missing something obvious that ensures the isolation is maintained despite multiple processes executing on the device simultaneously?
I've tested this with AutoTVM and Ansor and they both seem to isolate each candidate and as such, you can only see a single process being registered in nvidia-smi at one time.

Any chance you could clarify the above?

Many thanks! :)
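(For readers hitting the same question: until this is clarified, the conservative setup is to serialize GPU measurements yourself. A sketch, assuming the schedule() keyword arguments shown in the tutorial issue further down this page, with task_key being an already-registered task:)

from flextensor.scheduler import schedule

# Keep only one candidate on the GPU at a time so measurements are not
# perturbed by concurrent kernels, mirroring AutoTVM's LocalRunner which
# hardcodes a single measurement process.
s, bufs, configs = schedule(
    task_key,
    op_trial=100,
    timeout=10,
    method="searching",
    parallel=1,  # serialize device measurements for isolation
)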

Problem for importing flextensor

Hi! I am following your guide to run the tutorial code: https://pku-ahs.github.io/tutorial/en/master/steps.html#install

I have pulled and installed your Docker image in WSL, and I have completed these steps:

cd FlexTensor-Micro
export PYTHONPATH=$PYTHONPATH:/path/to/FlexTensor-Micro
cd FlexTensor-Micro/flextensor/tutorial

# First, CPU experiments
cd conv2d_llvm

# run flextensor
python optimize_conv2d.py --shapes res --target llvm --parallel 8 --timeout 20 --log resnet_config.log

But I am stuck on this error:

root@4f3f7856e4d4:/FlexTensor-Micro/flextensor/tutorial/conv2d_llvm# python3 optimize_conv2d.py --shapes res --target llvm --parallel 8 --timeout 20 --log resnet_config.log
Traceback (most recent call last):

  File "optimize_conv2d.py", line 9, in <module>
    from flextensor.utils import Config, RpcInfo

ModuleNotFoundError: No module named 'flextensor'

Could you kindly help me? Thank you!!!!
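(A common cause of this particular ModuleNotFoundError is that PYTHONPATH ends up pointing at the flextensor package directory itself, or was exported in a different shell, rather than at the repository root that contains it. A quick way to check and work around it from inside Python, assuming the /FlexTensor-Micro path visible in the shell prompt above:)

import sys

# The importable root is the directory that *contains* the `flextensor/`
# package, i.e. /FlexTensor-Micro in the Docker image used above.
sys.path.insert(0, "/FlexTensor-Micro")

import flextensor
print(flextensor.__file__)  # should point into /FlexTensor-Micro/flextensor/

from flextensor.utils import Config, RpcInfo  # the import that failed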

No valid solution found using the tutorial example

Source Code:

import tvm

def gemm(A, B):
    k = tvm.reduce_axis((0, B.shape[0]))
    return tvm.compute((A.shape[0], B.shape[1]), lambda i, j: tvm.sum(A[i, k] * B[k, j], axis=k))

def wrap_gemm(N, K, M):
    A = tvm.placeholder((N, K))
    B = tvm.placeholder((K, M))
    Output = gemm(A, B)
    return [Output.op], [A, B, Output]

from flextensor.task import register_task, Task
from flextensor.model import WalkerGroup
from flextensor.scheduler import schedule

if __name__ == '__main__':
  task = Task(
    "gemm",
    "gemm",
    wrap_gemm,
    (1024, 1024, 1024),
    "llvm",
    0)
  register_task(task)
  s, bufs, configs = schedule(
            task.key, # give the key of target task
            slevel=4,
            rlevel=3,
            op_trial=100,
            timeout=10,
            op_stop=30,
            method="searching",
            parallel=4,
            )

Output:

graph space size 1
op 0 space size: 43188288
[Warning] Directory lib is not empty, but reusing it
warm up [1584960963.652893] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584960977.754150] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584960991.611287] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961005.486223] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961019.563355] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961033.576784] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961047.659202] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961061.669718] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961075.572861] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961089.390276] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961103.236112] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961117.260888] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961131.503041] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961145.477987] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961159.549252] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961173.469501] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961187.413991] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961201.389768] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961215.701637] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
warm up [1584961230.179169] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 20
warm up [1584961244.064868] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 20
warm up [1584961257.944109] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 20
warm up [1584961271.798598] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 20
warm up [1584961285.662753] [ inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf ]
Warning: No valid schedule found in warm up process, please use more trials
...
...
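(Side note for anyone reproducing this on a recent TVM build: the top-level tvm.placeholder / tvm.compute / tvm.reduce_axis aliases used above were removed in later TVM releases, where these constructors live under tvm.te. An equivalent kernel definition, assuming TVM 0.8 or newer:)

import tvm
from tvm import te

def gemm(A, B):
    # Same GEMM as above, written against the tvm.te namespace.
    k = te.reduce_axis((0, B.shape[0]), name="k")
    return te.compute((A.shape[0], B.shape[1]),
                      lambda i, j: te.sum(A[i, k] * B[k, j], axis=k))

def wrap_gemm(N, K, M):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    Output = gemm(A, B)
    return [Output.op], [A, B, Output]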

[CPU] TypeError: reduce() of empty sequence with no initial value

Hi,
I'm running optimize/optimize_conv2d.py with a custom convolution layer on a CPU, and I get the following error. I understand why this happens, but shouldn't there be a guard for when config['spatial'] is empty?

The layer parameters are (128, 128, 28, 28, 256, _, 3, 3, _, 1, 1, 1, 1), and optimize_conv2d.py uses default args.

warm up [1606782542.740292] [ inf ]
Warning: No valid schedule found in warm up process, please use more trials
Now automatically use more trials, increase 1
Fail to find valid schedule, too many errors

Traceback (most recent call last):

  File "optimize_conv2d.py", line 210, in <module>
    logfile=flog,

  File "optimize_conv2d.py", line 116, in optimize
    rpc_info=rpc_info,

  File "/homes/tharindu/FlexTensor/flextensor/scheduler.py", line 2135, in schedule
    s, bufs = schedule_with_config(task_key, configs, rewrite=rewrite)

  File "/homes/tharindu/FlexTensor/flextensor/scheduler.py", line 2155, in schedule_with_config
    s, bufs = schedule_with_config_ops(ops, bufs, configs, op_pos=op_pos, target=task.target)

  File "/homes/tharindu/FlexTensor/flextensor/scheduler.py", line 2202, in schedule_with_config_ops
    template(s, op, op_states[i])

  File "/homes/tharindu/FlexTensor/flextensor/scheduler.py", line 1680, in _cpu_schedule_simple
    tmp_extent = reduce(lambda a, b: a * b, [x[count] for x in config["spatial"]])

TypeError: reduce() of empty sequence with no initial value
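(The guard asked about above could be as small as giving reduce() an initial value, so an empty config["spatial"] yields an extent of 1 instead of raising. A self-contained sketch of that behaviour; whether to apply it to the reduce call at scheduler.py line 1680 is of course for the maintainers to decide:)

from functools import reduce

def product_or_one(factors):
    # The trailing 1 is reduce()'s initial value, so an empty factor list
    # returns 1 instead of raising "reduce() of empty sequence ...".
    return reduce(lambda a, b: a * b, factors, 1)

print(product_or_one([4, 2, 8]))  # 64
print(product_or_one([]))         # 1, no TypeError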

Thanks,

'test_graph_schedule_cpu_general_dx' cannot be imported

Hello,

I got the following error when trying to run some examples, like flextensor/examples/opt_gemm_cpu.py.

Traceback (most recent call last):

  File "opt_gemm_cpu.py", line 6, in <module>
    from flextensor.test import test_graph_schedule_cpu_general_dx

ImportError: cannot import name 'test_graph_schedule_cpu_general_dx'

Is there any solution for that?

Thanks!
