gpgpu-sim_uvmsmart's People

Contributors

debashisganguly, ronianz

gpgpu-sim_uvmsmart's Issues

cudaMallocManaged addresses that are not a multiple of the page size do not work correctly

UVM makes the base address of a managed allocation a multiple of 256 bytes, not a multiple of the page size.

This causes two basic blocks within a large page to share the same page.

For example, with each basic block being 64 KB and 4 KB small pages, a base address that is only 256-byte aligned makes the first basic block span 17 pages and the second basic block span 17 pages, so the two basic blocks share the page at their boundary.

As a result, the same page can be requested twice over PCIe. In addition, when evicting pages, the algorithm implemented in the source code does not work correctly (valid_pages_erase() erases the wrong pages, fragmentation occurs, etc.).
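
For illustration, a minimal self-contained sketch of the alignment issue, assuming 4 KB small pages and hypothetical constant/helper names (not the simulator's actual code): rounding each managed base address up to the next page boundary keeps two basic blocks from sharing a boundary page.

#include <cassert>
#include <cstdint>

// Hypothetical constant and helper; the simulator's real names differ.
static const uint64_t PAGE_SIZE = 4096;              // 4 KB small page (assumed)

// Round an address up to the next page boundary.
static uint64_t align_to_page(uint64_t addr) {
    return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

int main() {
    uint64_t base = 0x100;                            // 256-byte-aligned base, as UVM produces
    uint64_t first_block  = align_to_page(base);      // starts on its own page
    uint64_t second_block = align_to_page(first_block + 64 * 1024);
    // With page-aligned bases, a 64 KB basic block spans exactly 16 pages and the
    // next basic block starts on a fresh page, so no boundary page is shared.
    assert(first_block % PAGE_SIZE == 0);
    assert(second_block % PAGE_SIZE == 0);
    assert((second_block - first_block) / PAGE_SIZE == 16);
    return 0;
}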

is_block_evictable() returns false forever due to the is_valid() condition on the large page

I used a 10 MB GDDR size, the 2 MB LRU page eviction policy, and the TBN prefetcher.

When a page eviction has to occur (i.e., when should_evict_page() returns true), is_block_evictable() checks whether the first page of the large page is valid.

Because that is_valid() check keeps returning false (the first page has never been touched and has not arrived), no pages can ever be evicted.

It seems that if the first page of the large page is never touched, eviction can never happen.

This means the simulator runs forever, looking for an evictable 2 MB page.
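
To make the failure mode concrete, here is a minimal self-contained sketch, assuming 4 KB small pages inside a 2 MB block and hypothetical names: when evictability is keyed off the first small page only, a block whose first page was never touched is never reported evictable, even if every other page in it is valid.

#include <cassert>

static const int PAGES_PER_LARGE_BLOCK = 512;        // 2 MB block of 4 KB pages (assumed)

// Mirrors the reported behaviour: only the first small page is checked.
static bool is_block_evictable_first_only(const bool valid[PAGES_PER_LARGE_BLOCK]) {
    return valid[0];
}

// A check over all pages would still find the block evictable.
static bool block_has_valid_page(const bool valid[PAGES_PER_LARGE_BLOCK]) {
    for (int i = 0; i < PAGES_PER_LARGE_BLOCK; i++)
        if (valid[i]) return true;
    return false;
}

int main() {
    bool valid[PAGES_PER_LARGE_BLOCK] = {false};
    for (int i = 1; i < PAGES_PER_LARGE_BLOCK; i++)
        valid[i] = true;                              // every page except the first has arrived
    assert(!is_block_evictable_first_only(valid));    // so eviction never triggers
    assert(block_has_valid_page(valid));              // even though the block holds valid pages
    return 0;
}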

D2H time missing under oversubscription

In my understanding, there should be D2H memcpy under oversubscription, but not all log files under Results/Smart_Runtime/Oversub report a D2H memcpy time. Am I missing some detail?

Error when running managed benchmarks in Docker image

After building the image from the Dockerfile, I am having trouble running the test cases provided in benchmarks/Managed. Specifically, the backprop benchmark outputs "ALLOC_1D_DBL: Couldn't allocate array of n floats", indicating that the cudaMallocManaged calls returned null. Running some of the other benchmarks, like sssp, gives an error saying that the CUDA driver version is insufficient for the runtime version. This is directly after building the image, before making any modifications aside from adding the CUDA installation to the path. What could be causing this?

I hit an error and cannot fix it; please help me

GPGPU-Sim PTX: allocating global region for "__T20" from 0x100 to 0x135 (global memory space)
GPGPU-Sim PTX: Warning __T20 was declared previous at _4.ptx:12 skipping new declaration
GPGPU-Sim PTX: allocating stack frame region for .param "_ZN5cufhe12DataTemplateIiED1Ev_param_0" from 0x0 to 0x8
GPGPU-Sim PTX: allocating stack frame region for .param "_ZN5cufhe12DataTemplateIiED0Ev_param_0" from 0x0 to 0x8
GPGPU-Sim PTX: allocating stack frame region for .param "func_retval0" from 0x0 to 0x8
GPGPU-Sim PTX: allocating stack frame region for .param "_ZNK5cufhe12DataTemplateIiE8SizeDataEv_param_0" from 0x8 to 0x10
GPGPU-Sim PTX: allocating stack frame region for .param "_ZN5cufhe11LWESample_TIiED1Ev_param_0" from 0x0 to 0x8
GPGPU-Sim PTX: allocating stack frame region for .param "_ZN5cufhe11LWESample_TIiED0Ev_param_0" from 0x0 to 0x8
GPGPU-Sim PTX: allocating stack frame region for .param "func_retval0" from 0x0 to 0x8
GPGPU-Sim PTX: allocating stack frame region for .param "_ZNK5cufhe11LWESample_TIiE8SizeDataEv_param_0" from 0x8 to 0x10
GPGPU-Sim PTX: allocating stack frame region for .param "free_param_0" from 0x0 to 0x8
GPGPU-Sim PTX: Warning __T20 was declared previous at _4.ptx:12 skipping new declaration
GPGPU-Sim PTX: allocating global region for "_ZTVN5cufhe12DataTemplateIiEE" from 0x180 to 0x1a8 (global memory space)
_4.ptx:60: Syntax error:

.global .align 8 .u64 _ZTVN5cufhe12DataTemplateIiEE[5] = {0, 0, _ZN5cufhe12DataTemplateIiED1Ev, _ZN5cufhe12DataTemplateIiED0Ev, _ZNK5cufhe12DataTemplateIiE8SizeDataEv};
^

GPGPU-Sim PTX: parser error detected, exiting... but first extracting .ptx to "_ptx_errors_wFRTmT"
Aborted (core dumped)

cudaMallocManaged support

Can the simulator support over-GPU-memory allocation using cudaMallocManaged? Suppose GPU memory is 4 GB and I want to allocate 6 GB on the host first, have the GPU access it through on-demand paging, and thereby create oversubscription. I found that cudaMallocManaged in cuda_runtime_api.cc, and the subsequent call to gpu_mallocmanaged, seem to have a 4 GB size limit. Is that the case?
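
For reference, a minimal host-side test along the lines described, written against the CUDA runtime API (the 6 GB size is illustrative; whether the simulated gpu_mallocmanaged accepts it is exactly the question being asked):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(char *p, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = 1;                      // first GPU touch triggers on-demand paging
}

int main() {
    const size_t six_gb = 6ULL << 30;         // larger than a 4 GB GPU memory
    char *buf = NULL;
    cudaError_t err = cudaMallocManaged(&buf, six_gb);
    if (err != cudaSuccess) {
        printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    touch<<<(unsigned)((six_gb + 255) / 256), 256>>>(buf, six_gb);
    cudaDeviceSynchronize();                  // faults and evictions occur here under oversubscription
    cudaFree(buf);
    return 0;
}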

prmt_impl instruction

I'm new to this, but how can we add our own implementations of instructions? Are there any useful links for learning how to do that?

Duplicated calls to reserve_pages_remove()

If reserve_pages_remove() is also called when a cache MISS or HIT_RESERVED occurs, the call from 'case 4' in writeback() over-decrements the reserved-page counter, which eventually triggers an assertion.

reserve_pages_remove() should only be called for the mem_fetch object returned from the lower level (e.g., L2); the right locations are 'case 4' in writeback() and where WRITE_ACK is checked.

Am I missing something?
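
To illustrate the bookkeeping in question, a minimal self-contained sketch with hypothetical names (not the simulator's actual data structures): decrementing the reserve counter both on the cache-access path and again on the write-back/WRITE_ACK path for the same request underflows it and trips the assertion.

#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical reserved-page bookkeeping: page address -> outstanding reservations.
static std::map<uint64_t, int> reserved;

static void reserve_pages_insert(uint64_t page) { reserved[page]++; }

static void reserve_pages_remove(uint64_t page) {
    std::map<uint64_t, int>::iterator it = reserved.find(page);
    assert(it != reserved.end() && it->second > 0);   // fires on a duplicated remove
    if (--it->second == 0) reserved.erase(it);
}

int main() {
    uint64_t page = 0x1000;
    reserve_pages_insert(page);       // reserved when the request is issued
    reserve_pages_remove(page);       // removed once, for the mem_fetch returned from L2
    // reserve_pages_remove(page);    // a second call (MISS/HIT_RESERVED path plus
    //                                // writeback 'case 4') would trip the assert above
    return 0;
}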

Build From Source Errors

I am not using Docker but building from source. My environment is CentOS 6.10 with gcc 5.4 and CUDA 8.0. When running benchmarks, an exception is thrown at src/option_parser.cc:102 because the argument str is an empty string while the type of m_variable is integer. The cause is at src/gpgpu-sim/gpu-sim.cc:171, so I changed the 6th parameter from "" to "0". After recompiling, the benchmarks run fine.
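
A small self-contained sketch of the failure mode (hypothetical function, not the actual option_parser code): converting an empty default string to an integer throws, while the reported change of the default from "" to "0" parses cleanly.

#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the integer conversion performed by the option parser.
static int parse_int_option(const std::string &str) {
    return std::stoi(str);              // throws std::invalid_argument for ""
}

int main() {
    assert(parse_int_option("0") == 0); // the reported fix: default "" -> "0"
    bool threw = false;
    try {
        parse_int_option("");           // the reported failure at option_parser.cc:102
    } catch (const std::invalid_argument &) {
        threw = true;
    }
    assert(threw);
    return 0;
}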

Query in Eviction code

latency_type ltype = latency_type::PCIE_WRITE_BACK;
for (std::list<eviction_t *>::iterator it = valid_pages.begin(); it != valid_pages.end(); it++) {
    if ((*it)->addr <= iter->first && iter->first < (*it)->addr + (*it)->size) {
        if ((*it)->RW == 1) {
            ltype = latency_type::INVALIDATE;
            break;
        }
    }
}

In the part where pages are evicted, this loop checks whether any of the evicted pages are RW and decides the latency type based on that. The code uses latency_type::PCIE_WRITE_BACK by default and switches to latency_type::INVALIDATE if there is any RW page in the set of evicted pages.

Shouldn't it be the other way around? I would expect writeback latency only when there are modified pages that need to be written back, and invalidate latency when all the pages are clean.
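
For comparison, here is a self-contained sketch of the behaviour the question expects, with minimal hypothetical stand-ins for the types in the quoted snippet (this is not the simulator's current code): clean-only evictions get the invalidate latency, and any dirty (RW) page forces the PCIe write-back latency.

#include <cassert>
#include <cstdint>
#include <list>

// Hypothetical minimal stand-ins for the types used in the quoted snippet.
enum class latency_type { PCIE_WRITE_BACK, INVALIDATE };
struct eviction_t { uint64_t addr; uint64_t size; int RW; };

// Expected behaviour: invalidate for clean pages, write-back only if a dirty page is evicted.
static latency_type pick_latency(const std::list<eviction_t *> &valid_pages, uint64_t evict_addr) {
    latency_type ltype = latency_type::INVALIDATE;
    for (std::list<eviction_t *>::const_iterator it = valid_pages.begin(); it != valid_pages.end(); it++) {
        if ((*it)->addr <= evict_addr && evict_addr < (*it)->addr + (*it)->size) {
            if ((*it)->RW == 1) {
                ltype = latency_type::PCIE_WRITE_BACK;
                break;
            }
        }
    }
    return ltype;
}

int main() {
    eviction_t clean = {0x0000, 0x1000, 0};
    eviction_t dirty = {0x1000, 0x1000, 1};
    std::list<eviction_t *> pages;
    pages.push_back(&clean);
    pages.push_back(&dirty);
    assert(pick_latency(pages, 0x0000) == latency_type::INVALIDATE);       // clean page: just invalidate
    assert(pick_latency(pages, 0x1000) == latency_type::PCIE_WRITE_BACK);  // dirty page: write it back
    return 0;
}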
