debashisganguly / gpgpu-sim_uvmsmart Goto Github PK
License: Other
UVM makes the base address of managed allocations a multiple of 256 bytes, so it is not necessarily page-aligned. This causes two basic blocks within a large page to share the same page. For example, assuming each basic block corresponds to 64 KB, a misaligned basic block spans 17 pages instead of 16, and its last page is also the first page of the next basic block, so the two basic blocks share that page. As a result, the same page can be requested twice over PCI-e. In addition, when evicting the pages, the algorithm implemented in the source code does not work correctly (valid_pages_erase() erases the wrong pages, fragmentation, etc.).
I used a 10 MB GDDR size, the 2 MB LRU page eviction policy, and the TBN prefetcher.
When page eviction has to occur (i.e., when should_evict_page() returns true), is_block_evictable() checks whether the first page of the large page is valid. Because the is_valid() condition keeps returning false (the first page has not been touched and has not yet arrived), no page can be evicted. If the first page of the large page is never touched, eviction can never happen, and the simulator runs forever searching for an evictable 2 MB page.
In my understanding, there should be a D2H memcpy under oversubscription, but not all log files under Results/Smart_Runtime/Oversub
have a D2H memcpy time. Am I missing some details?
After building the image from the Dockerfile, I am having trouble running the test cases provided in benchmarks/Managed. Specifically, the backprop benchmark outputs "ALLOC_1D_DBL: Couldn't allocate array of n floats", indicating that the cudaMallocManaged calls returned null. Running some of the other benchmarks like sssp give an error saying that the CUDA driver version is insufficient for the runtime version. This is directly after having built the image, before making any modifications, aside from adding the CUDA installation to the path. What could be causing this?
GPGPU-Sim PTX: allocating global region for "__T20" from 0x100 to 0x135 (global memory space)
GPGPU-Sim PTX: Warning __T20 was declared previous at _4.ptx:12 skipping new declaration
GPGPU-Sim PTX: allocating stack frame region for .param "_ZN5cufhe12DataTemplateIiED1Ev_param_0" from 0x0 to 0x8
GPGPU-Sim PTX: allocating stack frame region for .param "_ZN5cufhe12DataTemplateIiED0Ev_param_0" from 0x0 to 0x8
GPGPU-Sim PTX: allocating stack frame region for .param "func_retval0" from 0x0 to 0x8
GPGPU-Sim PTX: allocating stack frame region for .param "_ZNK5cufhe12DataTemplateIiE8SizeDataEv_param_0" from 0x8 to 0x10
GPGPU-Sim PTX: allocating stack frame region for .param "_ZN5cufhe11LWESample_TIiED1Ev_param_0" from 0x0 to 0x8
GPGPU-Sim PTX: allocating stack frame region for .param "_ZN5cufhe11LWESample_TIiED0Ev_param_0" from 0x0 to 0x8
GPGPU-Sim PTX: allocating stack frame region for .param "func_retval0" from 0x0 to 0x8
GPGPU-Sim PTX: allocating stack frame region for .param "_ZNK5cufhe11LWESample_TIiE8SizeDataEv_param_0" from 0x8 to 0x10
GPGPU-Sim PTX: allocating stack frame region for .param "free_param_0" from 0x0 to 0x8
GPGPU-Sim PTX: Warning __T20 was declared previous at _4.ptx:12 skipping new declaration
GPGPU-Sim PTX: allocating global region for "_ZTVN5cufhe12DataTemplateIiEE" from 0x180 to 0x1a8 (global memory space)
_4.ptx:60: Syntax error:
.global .align 8 .u64 _ZTVN5cufhe12DataTemplateIiEE[5] = {0, 0, _ZN5cufhe12DataTemplateIiED1Ev, _ZN5cufhe12DataTemplateIiED0Ev, _ZNK5cufhe12DataTemplateIiE8SizeDataEv};
^
GPGPU-Sim PTX: parser error detected, exiting... but first extracting .ptx to "_ptx_errors_wFRTmT"
Aborted (core dumped)
Can the simulator support over-GPU-memory allocation using cudaMallocManaged? Suppose GPU memory is 4 GB and I want to allocate 6 GB on the host first, have the GPU access it via on-demand paging, and then have oversubscription. I found that cudaMallocManaged in cuda_runtime_api.cc and the subsequent call to gpu_mallocmanaged seem to have a 4 GB size limit.
Hi,
I think this repo is very useful to everybody using GPGPU-Sim.
Do you have any plan to merge this repo with the original one?
Thanks
I'm kind of new to this, but how can we add our own implementations of instructions? Are there any useful links for learning that?
If reserve_pages_remove() is called when a cache MISS or HIT_RESERVED occurs, the reserve_pages_remove() call from 'case 4' in writeback() over-decrements the reserved-page counter, which eventually causes an assertion failure.
reserve_pages_remove() should only be called for the mem_fetch object returned from the lower level (e.g., L2).
The locations are 'case 4' in writeback() and where WRITE_ACK is checked.
Am I missing something?
I am not using Docker but building from source. My environment is CentOS 6.10 with gcc 5.4 and CUDA 8.0. When running benchmarks, an error occurs at src/option_parser.cc:102 because the argument str is an empty string while the type of m_variable is integer, so an exception is thrown. The cause is at src/gpgpu-sim/gpu-sim.cc:171, so I changed the 6th parameter from "" to "0". After recompiling, the benchmarks run fine.
gpgpu-sim_UVMSmart/src/gpgpu-sim/gpu-sim.cc
Lines 2302 to 2311 in bafab5d
In the part where the pages are evicted, there is this loop that checks if there are any RW pages and decides the latency type based on it. The code uses latency_type::PCIE_WRITE_BACK by default and switches to latency_type::INVALIDATE if there are any RW pages in the set of evicted pages.
Shouldn't it be the other way around? I would expect writeback latency to be used only when there are modified pages that need to be written back, and invalidate latency when all the evicted pages are clean.