keroro824 / hashingdeeplearning

Codebase for "SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems"

License: MIT License

C++ 78.66% C 0.91% Python 14.91% Makefile 1.15% CMake 4.37%

hashingdeeplearning's Introduction

SLIDE

The SLIDE package contains the source code for reproducing the main experiments in this paper.

For optimized CPU code (with AVX, BFloat16, and other memory optimizations) from the newer paper, please refer here

Dataset

The dataset can be downloaded from Amazon-670K. Note that the data is sorted by label, so please shuffle at least the validation/testing data.

TensorFlow Baselines

We suggest pulling the TensorFlow Docker image directly to install TensorFlow-GPU. For TensorFlow-CPU compiled with AVX2, we recommend using this precompiled build.

There is also a TensorFlow Docker image built specifically for CPUs with AVX-512 instructions; to get it, use:

docker pull clearlinux/stacks-dlrs_2-mkl    

config.py controls the TensorFlow training parameters, such as the learning rate. example_full_softmax.py and example_sampled_softmax.py are example scripts for the Amazon-670K dataset with full softmax and sampled softmax, respectively.

Run

python python_examples/example_full_softmax.py
python python_examples/example_sampled_softmax.py

Running SLIDE

Dependencies

  • CMake v3.0 and above
  • C++11-compliant compiler
  • Linux: Ubuntu 16.04 and newer
  • Transparent Huge Pages must be enabled.
    • SLIDE requires approximately 900 2MB pages and 10 1GB pages (Instructions; a sketch of reserving them follows this list).
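
A rough sketch of reserving the pools (assuming the hugeadm tool from libhugetlbfs is available; the counts follow the pool sizes reported in the issues below, and the linked instructions take precedence):

# Example only -- reserve ~1000 x 2MB pages and ~20 x 1GB pages.
# On many systems, 1GB pages must instead be reserved at boot time.
sudo hugeadm --pool-pages-min 2MB:1000
sudo hugeadm --pool-pages-min 1GB:20
hugeadm --pool-list    # verify the Minimum/Current columns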

Notes:

  • For simplicity, please refer to our Docker image with all environments installed. To replicate the experiment without setting up Hugepages, please download Amazon-670K into the path /home/code/HashingDeepLearning/dataset/Amazon

  • Also, note that only Skylake or newer architectures support Hugepages. For older Haswell processors, remove the flag -mavx512f from the OPT_FLAGS line in the Makefile (see the sketch after these notes). You can also revert to commit 2d10d46b5f6f1eda5d19f27038a596446fc17cee to ignore the HugePages optimization and still use SLIDE (which could lead to roughly 30% slower performance).

  • This version builds all dependencies (which currently are ZLIB and CNPY).
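
For reference, a sketch of the Makefile edit (the OPT_FLAGS line below is the one quoted in an issue further down; your Makefile and line number may differ):

# Original line (Skylake, with AVX-512):
# OPT_FLAGS := -fno-strict-aliasing -g -O3 -fopenmp -march=skylake -mtune=intel -mavx2 -mavx512f
# Haswell-compatible sketch: drop -mavx512f (-march=skylake may also need adjusting on pre-Skylake CPUs):
OPT_FLAGS := -fno-strict-aliasing -g -O3 -fopenmp -mtune=intel -mavx2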

Commands

Change the paths in ./SLIDE/Config_amz.csv appropriately.

git clone https://github.com/sarthakpati/HashingDeepLearning.git
cd HashingDeepLearning
mkdir bin
cd bin
cmake ..
make
./runme ../SLIDE/Config_amz.csv

hashingdeeplearning's People

Contributors

iitkgpanshu, jcfarwe, keroro824, ottovonxu, rahulunair, sarthakpati, tharun24, wrathematics, xman

hashingdeeplearning's Issues

Any plan for an OS X version ?

Hi there,

is there any plan to provide an OS X-compatible source version (or some patches)?

Thanks in advance,
Djamé

Illegal instruction (core dumped)

I have the following issue:

./runme Config_amz.csv
new Network
new Layer
new Node array
Illegal instruction (core dumped)

I reverted to 2d10d46 and then I was able to run the experiment.

My THP configuration:

hugeadm --pool-list
Size  Minimum  Current  Maximum  Default
2097152     1000     1000     1000        *
1073741824       20       20       20

dmesg says
[7506565.490401] traps: runme[27327] trap invalid opcode ip:5654a0192380 sp:7ffd6a645dd0 error:0 in runme[5654a0188000+1c000]
CPU:

model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
 cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3
cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total
cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d

OS: Ubuntu 18.04
g++: 7.4.0

Please let me know if more information is needed

Failed to execute with default given commands

I followed the steps mentioned in the README. I got the following:
./runme ../SLIDE/Config_amz.csv

new Network
new Layer
new Node array
[1] 2402 segmentation fault (core dumped) ./runme ../SLIDE/Config_amz.csv

I did adjust the config file as well and removed -mavx512f from the Makefile.

error in docker: shape requires a multiple of 100

Steps to reproduce:

docker run -it ottovonxu/slide:v3 bash


[ASCII-art banner printed by the container omitted]

WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.

To avoid this, run the container by specifying your user's userid:

$ docker run -u $(id -u):$(id -g) args...

root@0506217073cd:/# cd /slide/src/HashingDeepLearning/python_examples

root@0506217073cd:/slide/src/HashingDeepLearning/python_examples# python example_sampled_softmax.py
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_impl.py:1344: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

2020-03-08 20:17:57.007308: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Finished 0 steps. Time elapsed for last 500 batches = 0.00030684471130371094
test_acc: 0.0
#######################
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 85771648 values, but the requested shape requires a multiple of 100
[[Node: Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_Placeholder_2_0_2, sampled_softmax_loss/concat_1/values_0)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "example_sampled_softmax.py", line 116, in
main()
File "example_sampled_softmax.py", line 94, in main
sess.run(train_step, feed_dict={x_idxs:idxs_batch, x_vals:vals_batch, y:labels_batch})
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 85771648 values, but the requested shape requires a multiple of 100
[[Node: Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_Placeholder_2_0_2, sampled_softmax_loss/concat_1/values_0)]]

Caused by op 'Reshape', defined at:
File "example_sampled_softmax.py", line 116, in
main()
File "example_sampled_softmax.py", line 55, in main
loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(tf.transpose(W2),b2,tf.reshape(y,[-1,max_label]),layer_1,n_samples,n_classes,remove_accidental_hits=False, num_true=max_label,partition_strategy='div'))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 6113, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1718, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 85771648 values, but the requested shape requires a multiple of 100
[[Node: Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_Placeholder_2_0_2, sampled_softmax_loss/concat_1/values_0)]]

Warnings about narrowing conversions

During the build for #11, I saw lots of warnings about narrowing conversions, which should be fixed for stability.

I think this also goes back to #9 a bit, since ensuring cross-platform conformance via a third-party tool would basically ensure some of these warnings are fixed.

Errors when running the code with TF-GPU

I ran the Python code using TF-GPU on the tensorflow/tensorflow:latest-gpu image, and there are a lot of errors in the code, such as:

x_idxs = tf.placeholder(tf.int64, shape=[None,2])
AttributeError: module 'tensorflow' has no attribute 'placeholder'

I solved it by using
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

But why does this error occur? I thought the code should run without any errors.

Pytorch compatibility

Hi.

Thank you for your greatly interesting work !

I know that a huge chunk of the industry uses PyTorch for research and TensorFlow for deployment, so making this usable for training with the PyTorch framework would be much appreciated, I think.
Are there any plans to make this compatible with PyTorch?

Segmentation fault

When trying to run the runme program, I get a segmentation fault. I would be grateful if you could provide hints toward a solution.

This is valgrind output:

valgrind -s --leak-check=full ./runme Config_amz.csv
==26728== Memcheck, a memory error detector
==26728== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==26728== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==26728== Command: ./runme Config_amz.csv
==26728==
new Network
==26728== Invalid write of size 4
==26728== at 0x40F9EB: Network::Network(int*, NodeType*, int, int, float, int, int*, int*, int*, float*, std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, cnpy::NpyArray, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, cnpy::NpyArray> > >) (Network.cpp:15)
==26728== by 0x403936: main (main.cpp:472)
==26728== Address 0xb is not stack'd, malloc'd or (recently) free'd
==26728==
==26728==
==26728== Process terminating with default action of signal 11 (SIGSEGV)
==26728== Access not within mapped region at address 0xB
==26728== at 0x40F9EB: Network::Network(int*, NodeType*, int, int, float, int, int*, int*, int*, float*, std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, cnpy::NpyArray, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, cnpy::NpyArray> > >) (Network.cpp:15)
==26728== by 0x403936: main (main.cpp:472)
==26728== If you believe this happened as a result of a stack
==26728== overflow in your program's main thread (unlikely but
==26728== possible), you can try to increase the size of the
==26728== main thread stack using the --main-stacksize= flag.
==26728== The main thread stack size used in this run was 8388608.
==26728==
==26728== HEAP SUMMARY:
==26728== in use at exit: 246 bytes in 12 blocks
==26728== total heap usage: 37 allocs, 25 frees, 115,114 bytes allocated
==26728==
==26728== LEAK SUMMARY:
==26728== definitely lost: 0 bytes in 0 blocks
==26728== indirectly lost: 0 bytes in 0 blocks
==26728== possibly lost: 0 bytes in 0 blocks
==26728== still reachable: 246 bytes in 12 blocks
==26728== suppressed: 0 bytes in 0 blocks
==26728== Reachable blocks (those to which a pointer was found) are not shown.
==26728== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==26728==
==26728== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==26728==
==26728== 1 errors in context 1 of 1:
==26728== Invalid write of size 4
==26728== at 0x40F9EB: Network::Network(int*, NodeType*, int, int, float, int, int*, int*, int*, float*, std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, cnpy::NpyArray, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, cnpy::NpyArray> > >) (Network.cpp:15)
==26728== by 0x403936: main (main.cpp:472)
==26728== Address 0xb is not stack'd, malloc'd or (recently) free'd
==26728==
==26728== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

Profile for performance considerations

I'm seeing scale-out behavior that needs some further study. In particular, larger thread counts seem to increase the page usage, but I don't have the profiler completely set up. Has anyone come up with a configuration file that is good for profiling? I.e., something with a 10-15 minute run time that still produces a data access pattern in the spirit of the paper.

Multiple CPU Parallelism?

After reading the paper, I believe the approach here is Open MPI? If so, would multi-CPU parallelism be an option here, or would network-bound IPC become the limiting factor? I am wondering whether Dask MPI integration would make sense as a first step to support it? https://mpi.dask.org/en/latest/

Unable to get Amazon-670k in the format needed

The download link asks for access, but after two days I did not get access.

Other sources of Amazon-670K have a different file format, and it is not documented what file format you need.

Thanks for any help

Code vs Paper

Hi

I am attempting to run the code and compare it to the plots in your paper, but at first things don't appear to match up. I've seen the previous issue mentioning the different batch size used for the Amazon-670K dataset between the committed code and what was actually run. I'm also curious about the logging steps and the number of batches used to calculate the accuracy, between the plots in Figure 5 and the current code. Running the TF GPU code, the accuracy is reported every 50 batches, and it reaches a top value of around 0.6 compared to the 0.3 in the paper. This is based on the log file, but I want to check whether you used a different sample batch count for calculating the accuracy than the current code (which does it with 20 batches every 50 steps in python_examples/example_full_softmax.py).

It would be nice to have a setup that can reproduce your runs, to ensure an adequate baseline. The wall-clock time for a run of the TF GPU code came out much slower (>8 hours, reporting ~22 s every 50 steps using the latest code from the repository) than what I believe is shown in the figures. You mention a few different run times, but it's not always clear exactly which configuration was used. I've been sticking to the full (non-sampled) softmax TF code on GPU so far, and I have yet to compare on the same hardware with TF CPU or SLIDE, as I am seeing such a difference in run time and accuracy using the code as-is.

In the meantime I've been experimenting with tuning the GPU version, as its GPU utilization was low and it spent quite a bit of time waiting on data. Overall I've achieved something close to a 3x improvement. You may have intentionally kept the code as similar as possible between SLIDE and TF, but that does not necessarily take advantage of a GPU's raw performance. Could you comment on whether any of this was deliberate, to offer some degree of equivalent comparison in your view? Again, raw run-time performance may not be the ultimate goal, but understanding this and how to reproduce it is nonetheless helpful.

I did attempt to run the TF sampled softmax; however, it produces the error "Input to reshape is a tensor with 85771648 values, but the requested shape requires a multiple of 100", and this was with TF 1.8 (for everything else I've been using TF 1.15 and TF 2.1.0).

Thanks

Tony

segfaults on nullptr access

A quick gdb points to:

activeValuesperlayer[layerIndex + 1][i] = _Nodes[activenodesperlayer[layerIndex + 1][i]].getActivation(activenodesperlayer[layerIndex], activeValuesperlayer[layerIndex], lengths[layerIndex], inputID);

At layerIndex = 0, activeValuesperlayer[layerIndex] is 0x0.

stdout:

new Layer
new Node array
128 24921
new Layer
new Node array
670091 1.92284e+06
after layer
Network Initialization takes 9038.33 milliseconds
128 records, with 0 features and 0 labels
Inference takes 5610.18 milliseconds
 iter 0: 0 correct
128 records, with 71740189 features and 758811012 labels
Segmentation fault (core dumped)

CMake support

First off, let me congratulate the team on this fantastic idea and for making the implementation available for the community. I am sure this will push the boundaries of ML even further and truly democratize the landscape.

Would it be possible to add CMake support for your package?

Cheers,
Sarthak

Question about paper

Quick question -- is the code in python_examples/example_full_softmax.py the code that you used to produce Fig 5 in the paper?

DensifiedWtaHash and LSH questions

Dear authors, thanks for this wonderful work. I think the following may be bugs in the code (or in my understanding) and I would appreciate a response. (A small sketch of the first point is given after the list.)

* LSH.cpp:82 => why log(binsize) and not log2(binsize). It looks like we want K * floor(log2(binsize)) == RangePow 
   -- as the comment on DensifiedWtaHash.cpp:60 seems to indicate
   -- this (i.e., binary log) seems to be the intent in the shift logic in line 82 as well  
* DensifiedWtaHash:97-99,150-152 => should it be _numhashes or _numhashes -1? 
   -- Lines 101 and 154 would cause a memory violation.
*  DensifiedWtaHash:102 => why 100 and is it safe to break as `next` and `hashArray[i]` would be INT_MIN 
   -- which causes a serious problem later in LSH.cpp:82, where with zero shift 0x80000000 > (1<<RangePow) - 1
   --  a few steps later this can cause a segmentation violation in LSH.cpp:130
      -- (this issue is easier to see in the OptAdd version in LNS, which removes the needless saving of indices)
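
As a minimal illustrative sketch of the first point (not the repository's code; the helper name is made up), concatenating K per-bin hash values with a binary-log shift would look roughly like this:

#include <cmath>
#include <cstdint>

// Each of the K hash values lies in [0, binsize), so concatenation needs
// ceil(log2(binsize)) bits per value. With binsize = 8 that is 3 bits,
// whereas floor(log(8)) = 2 (natural log) shifts by too few bits and
// lets distinct fingerprints collide.
uint32_t combineHashes(const int *hashes, int K, int binsize) {
    int bitsPerHash = static_cast<int>(std::ceil(std::log2(binsize)));
    uint32_t index = 0;
    for (int j = 0; j < K; j++) {
        index += static_cast<uint32_t>(hashes[j]) << ((K - 1 - j) * bitsPerHash);
    }
    return index;  // the table construction would then expect RangePow == K * bitsPerHash
}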

Windows support

Hi,

Having this build on Windows would be amazing. There are quite a few Linux-only headers being used, and I am fairly certain cross-platform solutions can be found.

Thanks again for all the work!

Cheers,
Sarthak

Porting to Cython

Is it possible for you to port this to Cython and release it as a PyPI package?
It would make it easy for existing DL users (TF and PyTorch users) to use it natively in their code.

Memory leaks

I noticed incorrect usage of delete.
For instance, here:

delete _dwtaHasher, _binids;

Such an expression destroys only its first operand and ignores everything after the comma.
More precisely, the comma here is the comma operator, which splits the statement into a delete-expression and a separate, discarded expression, as illustrated below.
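
A small illustrative example (not taken from the repository) of why the comma form leaks:

int main() {
    int *p = new int(1);
    int *q = new int(2);

    // "delete p, q;" parses as "(delete p), q;" because of the comma operator,
    // so only p is freed here; q is merely evaluated and discarded.
    delete p, q;

    // Correct: one delete-expression per pointer.
    delete q;
    return 0;
}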

./runme gets Segmentation fault (core dumped)

Hello Beidi Chen:
My name is Yongtao Huang; I am an intern at VMware China.
I heard that your new paper () has been accepted to ICLR 2021. Congratulations.

I ran into some errors while running the SLIDE code.
My CPU is "Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz", which is a Haswell processor.
And I have enabled Transparent Huge Pages.

(slide) root@hyongtao:/home/hyongtao/SLIDE/HashingDeepLearning/bin# ./runme ../SLIDE/Config_amz.csv
new Network
new Layer
new Node array
Segmentation fault (core dumped)
(slide) root@hyongtao:/home/hyongtao/SLIDE/HashingDeepLearning/bin#

Could you help me resolve this problem? Thank you very much.

Add a license?

Hi there, I really like this work. I'd like to try to reimplement it in Julia. I can use the paper as a reference, and I'd like to use this code as a reference as well, but since there's no license, I'm not sure if I'm allowed to do so. Can you add a license to make it explicit (one way or another)?

Assuming you want people to use this code freely, the MIT license is a good choice. If you want to be more restrictive about how people use the code, then GPL or "Research use only" would be better. (I apologize if you already know this!).

Either way, it would be good to be explicit. Thanks!

(I'd submit the same issue in the RUSH-LAB version of this repo, but issues are not enabled there.)

SLIDE fails to execute

I am trying to run SLIDE (the C++ implementation) per the steps in the README, but it fails with all of the configurations below. The platform I am using is a Xeon-SKX.

Configuration 1:
Compile the latest master code and run:
./runme Config_amz.csv
new Network
Segmentation fault (core dumped)

Configuration 2:
Per the description "You can also revert to the commit 2d10d46 to ignore the HugePages optimization and still use SLIDE (which could lead to a 30% slower performance)",
I ran "git checkout 2d10d46" and then ran:
./runme Config_amz.csv
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr: __pos (which is 18446744073709551615) > this->size() (which is 0)
Aborted (core dumped)

Configuration 3:
Change Makefile line 16 to:
OPT_FLAGS := -fno-strict-aliasing -g -O3 -fopenmp -march=skylake -mtune=intel -mavx2 -mavx512f
./runme Config_amz.csv
new Network
Segmentation fault (core dumped)

May I know if I did the right thing?

Thanks.

Segfault

This looks very interesting, but I'm unable to successfully run the example. I modified the trainData, testData, and logFile paths of the config file appropriately. Valgrind reports this:

==12259== Memcheck, a memory error detector
==12259== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12259== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==12259== Command: ./runme Config_amz.csv
==12259== 
new Network
==12259== Invalid write of size 4
==12259==    at 0x10F124: Network::Network(int*, NodeType*, int, int, float, int, int*, int*, int*, float*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, cnpy::NpyArray, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cnpy::NpyArray> > >) (Network.cpp:13)
==12259==    by 0x10D01E: main (main.cpp:472)
==12259==  Address 0xb is not stack'd, malloc'd or (recently) free'd

Issues with Implementation and Replicating Paper Results

Hi there! I am working on a PyTorch implementation of SLIDE and am currently trying to compare its performance against SLIDE. I ran into a few doubts/issues while evaluating SLIDE and need clarification on the following.

  1. I'm unable to replicate the accuracy-vs-iteration plot for the Delicious-200K dataset using the parameters mentioned in the paper (SimHash, K=9, L=50); the plot is attached. I also observe that SLIDE's accuracy seems to worsen beyond a certain point. What could be the reasons for these?
    [attached plot: accuracy vs. iterations on Delicious-200K]
  2. I observe a few inconsistencies in the implementations of WTA and DWTA hashes.
  • index += h<<((_K-1-j)*(int)floor(log(binsize)));

    The hashes are combined as index += h<<((_K-1-j)*(int)floor(log(binsize))). But if the hashes are simply to be concatenated, shouldn't it instead be index += h<<((_K-1-j)*(int)ceil(log2(binsize)))? However, for binsize = 8, I also observe that shifting by floor(log(binsize)) = 2 bits gives better convergence than shifting by ceil(log2(binsize)) = 3 bits. Is this intentional? Why is this the case?

  • There appears to be a bug in the WTA hash.

    values[i] = data[i*binsize+j];

  3. What is the reason behind using SimHash for Delicious-200K and the DWTA hash for Amazon-670K?
  4. The paper mentioned extending SLIDE to convolutions as a future direction. Has there been any progress along this line?

Pushing the _normalizationConstants array down to threads

The paper shows good scaling up to 32 threads. I have been looking at 64, 128, and higher thread counts.
There is noticeable contention on arrays like _normalizationConstants, which seem better suited to living in the body of the thread for their whole lifetime, except perhaps for the one call that gathers statistics.

In addition, in Layer.cpp, this change (accumulate in a local variable inside the loop, then write once) was beneficial with gcc and helped reduce the sharing contention.

if(_type == NodeType::Softmax) {
    float accum = 0;
    for (int i = 0; i < len; i++) {
        float realActivation = exp(activeValuesperlayer[layerIndex + 1][i] - maxValue);
        activeValuesperlayer[layerIndex + 1][i] = realActivation;
        _Nodes[activenodesperlayer[layerIndex + 1][i]].SetlastActivation(inputID, realActivation);
        accum += realActivation;
    }
    _normalizationConstants[inputID] = accum;
}

Killed error

I want to ask where we can find the weight and savedweight files that are included as paths in the config file?
