imp:
  https://en.wikipedia.org/wiki/Thread_block
  https://www.quora.com/What-is-NVIDIA-GPU-micro-architecture
  https://www.cs.cmu.edu/afs/cs/academic/class/15668-s11/www/cuda-doc/html/group__CUDART__DEVICE_g5aa4f47938af8276f08074d09b7d520c.html
  https://devtalk.nvidia.com/default/topic/516290/shared-memory-per-block/
  https://stackoverflow.com/questions/3519598/streaming-multiprocessors-blocks-and-threads-cuda?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
http://fimi.ua.ac.be/data/T10I4D100K.dat
https://stackoverflow.com/questions/11498769/what-cuda-shared-memory-size-means?rq=1