
graphchi / graphchi-cpp


GraphChi's C++ version. Big Data - small machine.

Home Page: https://www.usenix.org/system/files/conference/osdi12/osdi12-final-126.pdf

Makefile 0.57% Python 0.31% CSS 0.29% JavaScript 0.28% C++ 94.43% C 1.07% Shell 1.24% HTML 1.63% Roff 0.19%

graphchi-cpp's People

Contributors: akyrola, bitdeli-chef, clstaudt, dbickson, kckjn97


graphchi-cpp's Issues

Error when running make, even though apple-gcc42 is installed and the Makefile was modified

Hi all,

I tried to build graphchi-cpp on my MacBook (Yosemite).
Even though I followed all the steps given for Mac OS X, i.e. I installed apple-gcc42 and modified the Makefile to set CPP = g++-4.2, I still get an error and cannot build graphchi-cpp.
The error is below:

An-Vo:graphchi-cpp-master DANIEL_VO$ make
g++-4.2 -g -O3 -I/usr/local/include/ -I./src/ -fopenmp -Wall -Wno-strict-aliasing -Iexample_apps/ example_apps/connectedcomponents.cpp -o bin/example_apps/connectedcomponents -lz
couldn't understand kern.osversion `14.0.0'
In file included from ./src/metrics/metrics.hpp:40,
from ./src/io/stripedio.hpp:48,
from ./src/api/vertex_aggregator.hpp:42,
from ./src/graphchi_basic_includes.hpp:43,
from example_apps/connectedcomponents.cpp:50:
./src/util/pthread_tools.hpp: In constructor ‘graphchi::semaphore::semaphore()’:
./src/util/pthread_tools.hpp:161: warning: ‘sem_init’ is deprecated (declared at /usr/include/sys/semaphore.h:55)
./src/util/pthread_tools.hpp:161: warning: ‘sem_init’ is deprecated (declared at /usr/include/sys/semaphore.h:55)
./src/util/pthread_tools.hpp: In destructor ‘graphchi::semaphore::~semaphore()’:
./src/util/pthread_tools.hpp:173: warning: ‘sem_destroy’ is deprecated (declared at /usr/include/sys/semaphore.h:53)
./src/util/pthread_tools.hpp:173: warning: ‘sem_destroy’ is deprecated (declared at /usr/include/sys/semaphore.h:53)
./src/api/chifilenames.hpp: At global scope:
./src/api/chifilenames.hpp: In instantiation of ‘std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&, const std::basic_string<_CharT, _Traits, _Alloc>&) [with _CharT = char, _Traits = std::char_traits, _Alloc = std::allocator]’:
./src/api/chifilenames.hpp:79: instantiated from here
./src/api/chifilenames.hpp:79: error: explicit instantiation of ‘std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&, const std::basic_string<_CharT, _Traits, _Alloc>&) [with _CharT = char, _Traits = std::char_traits, _Alloc = std::allocator]’ but no definition available
./src/api/chifilenames.hpp: In instantiation of ‘std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&, const std::basic_string<_CharT, _Traits, _Alloc>&) [with _CharT = char, _Traits = std::char_traits, _Alloc = std::allocator]’:
./src/api/chifilenames.hpp:79: instantiated from here
./src/api/chifilenames.hpp:79: error: explicit instantiation of ‘std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&, const std::basic_string<_CharT, _Traits, _Alloc>&) [with _CharT = char, _Traits = std::char_traits, _Alloc = std::allocator]’ but no definition available
./src/api/chifilenames.hpp: In instantiation of ‘std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&, const std::basic_string<_CharT, _Traits, _Alloc>&) [with _CharT = char, _Traits = std::char_traits, _Alloc = std::allocator]’:
./src/api/chifilenames.hpp:79: instantiated from here
./src/api/chifilenames.hpp:79: error: explicit instantiation of ‘std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&, const std::basic_string<_CharT, _Traits, _Alloc>&) [with _CharT = char, _Traits = std::char_traits, _Alloc = std::allocator]’ but no definition available
./src/api/chifilenames.hpp: In instantiation of ‘std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&, const std::basic_string<_CharT, _Traits, _Alloc>&) [with _CharT = char, _Traits = std::char_traits, _Alloc = std::allocator]’:
./src/api/chifilenames.hpp:79: instantiated from here
./src/api/chifilenames.hpp:79: error: explicit instantiation of ‘std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&, const std::basic_string<_CharT, _Traits, _Alloc>&) [with _CharT = char, _Traits = std::char_traits, _Alloc = std::allocator]’ but no definition available
make: *** [example_apps/connectedcomponents] Error 1

The error points at ./src/api/chifilenames.hpp, but I could not find anything wrong there.
Please kindly advise.

Best,
An Vo

couldn't understand kern.osversion `14.3.0'

I encountered the above error while running "make apps" or "make".
It seems that g++-4.2 does not work on Yosemite. I used g++-4.8 instead and it worked fine :)

Errors when running make

After downloading and unzipping the zip file, I wanted to build the project. However, when I compiled it with MinGW's make command, I ran into this error:

File not found - *.hpp
File not found - *.hpp
File not found - *.hpp
File not found - *.hpp
File not found - *.hpp
The syntax of the command is incorrect.
makefile:31: recipe for target 'example_apps/connectedcomponents' failed
make: *** [example_apps/connectedcomponents] Error 1

How can I fix this?

Unclear error message when system is out of file descriptors

./toolkits/collaborative_filtering/itemcf --distance=4 --training=dataset --K=350 --quiet=0 --nshards=1 --clean_cache=1 --min_allowed_intersection=48 --allow_zeros=1
WARNING: common.hpp(print_copyright:195): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to [email protected]
[distance] => [4]
[training] => [dataset]
[K] => [350]
[quiet] => [0]
[nshards] => [1]
[clean_cache] => [1]
[min_allowed_intersection] => [48]
[allow_zeros] => [1]
WARNING: chifilenames.hpp(find_shards:271): Could not find shards with nshards = 1
WARNING: chifilenames.hpp(find_shards:272): Please define 'nshards 0' or 'nshards auto' to automatically detect.
INFO: sharder.hpp(start_preprocessing:370): Starting preprocessing, shovel size: 17476266
INFO: io.hpp(compute_matrix_size:136): Starting to read matrix-market input. Matrix dimensions: 5466961 x 15325, non-zeros: 125225326
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.0.shovel, max:5475672
INFO: sharder.hpp(flush:193): Sort done.dataset4.0.shovel
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.1.shovel, max:5477316
INFO: sharder.hpp(flush:193): Sort done.dataset4.1.shovel
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.2.shovel, max:5477394
INFO: sharder.hpp(flush:193): Sort done.dataset4.2.shovel
INFO: sharder.hpp(flush_shovel:407): Too many outstanding shoveling threads...
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.3.shovel, max:5479571
INFO: sharder.hpp(flush:193): Sort done.dataset4.3.shovel
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.4.shovel, max:5481431
INFO: sharder.hpp(flush:193): Sort done.dataset4.4.shovel
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.5.shovel, max:5481431
INFO: sharder.hpp(flush:193): Sort done.dataset4.5.shovel
INFO: sharder.hpp(flush_shovel:407): Too many outstanding shoveling threads...
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.6.shovel, max:5481997
INFO: sharder.hpp(flush:193): Sort done.dataset4.6.shovel
INFO: io.hpp(convert_matrixmarket:564): Global mean is: 317.101 Now creating shards.
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.7.shovel, max:5482284
INFO: sharder.hpp(flush:193): Sort done.dataset4.7.shovel
INFO: sharder.hpp(flush_shovel:401): Waiting shoveling threads...
INFO: io.hpp(convert_matrixmarket:578): Now creating shards.
INFO: sharder.hpp(determine_number_of_shards:586): Number of shards to be created: 1
INFO: sharder.hpp(write_shards:967): Edges per shard: 125225327 nshards=1 total: 125225326
INFO: sharder.hpp(write_shards:973): Buffer size in merge phase: 52428804
INFO: sharder.hpp(finish_shard:625): Starting final processing for shard: 0
DEBUG: sharder.hpp(finish_shard:636): Shovel size:1502703912 edges: 125225326
DEBUG: sharder.hpp(finish_shard:702): 0 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 10000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 20000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 30000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 40000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 50000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 60000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 70000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 80000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 90000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 100000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 110000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 120000000 / 125225326
INFO: sharder.hpp(createnextshard:887): Remaining edges: 0 remaining shards:0 edges per shard=125225327
INFO: sharder.hpp(createnextshard:890): Edges per shard: 125225327

=== REPORT FOR sharder() ===
[Timings]
edata_flush: 0.352861s (count: 478, min: 0.000675s, max: 0.001248, avg: 0.000738203s)
execute_sharding: 21.844 s
finish_shard.sort: 9.24113 s
preprocessing: 55.024 s
shard_final: 14.7475 s
[Other]
app: sharder
INFO: sharder.hpp(done:903): Created 1 shards, for 125225326 edgesSuccessfully finished sharding for dataset
INFO: io.hpp(convert_matrixmarket:583): Created 1 shards.
INFO: itemcf.cpp(main:514): M = 5466961
DEBUG: stripedio.hpp(stripedio:271): Start io-manager with 2 threads.
INFO: graphchi_engine.hpp(graphchi_engine:154): Initializing graphchi_engine. This engine expects 4-byte edge data.
INFO: chifilenames.hpp(load_vertex_intervals:400): shard: 0 - 5482284
INFO: graphchi_engine.hpp(run:737): GraphChi starting
INFO: graphchi_engine.hpp(run:738): Licensed under the Apache License 2.0
INFO: graphchi_engine.hpp(run:739): Copyright Aapo Kyrola et al., Carnegie Mellon University (2012)
DEBUG: slidingshard.hpp(sliding_shard:213): Total edge data size: 500901304, dataset.edata.e4B.0_1sizeof(ET): 4
INFO: graphchi_engine.hpp(print_config:132): Engine configuration:
INFO: graphchi_engine.hpp(print_config:133): exec_threads = 24
INFO: graphchi_engine.hpp(print_config:134): load_threads = 4
INFO: graphchi_engine.hpp(print_config:135): membudget_mb = 800
INFO: graphchi_engine.hpp(print_config:136): blocksize = 1048576
INFO: graphchi_engine.hpp(print_config:137): scheduler = 1
INFO: graphchi_engine.hpp(run:773): Start iteration: 0
INFO: graphchi_engine.hpp(run:852): 0.064265s: Starting: 0 -- 5482284
INFO: graphchi_engine.hpp(run:865): Iteration 0/5, subinterval: 0 - 5482284
DEBUG: graphchi_engine.hpp(run:880): Allocation 5482285 vertices, sizeof:64 total:350866240
ERROR: memoryshard.hpp(load_edata:320): Did not find block dataset.edata.e4B.0_1_blockdir_1048576/329
ERROR: memoryshard.hpp(load_edata:321): Going to exit...
DEBUG: memoryshard.hpp(load_edata:327): Compressed/full size: 0.688722 number of blocks: 478
itemcf: ../../src/shards/memoryshard.hpp:329: void graphchi::memory_shard<VT, ET, svertex_t>::load_edata() [with VT = unsigned int; ET = unsigned int; svertex_t = graphchi::graphchi_vertex<unsigned int, unsigned int>]: Assertion `blockid == nblocks' failed.
Aborted (core dumped)
tarjan:~/graphchi-cpp$ ulimit -n
1024
tarjan:~/graphchi-cpp$ ulimit -n 50000
tarjan:~/graphchi-cpp$

NOW THE SAME CODE WORKS!

./toolkits/collaborative_filtering/itemcf --distance=4 --training=dataset --K=350 --quiet=0 --nshards=1 --clean_cache=1 --min_allowed_intersection=48 --allow_zeros=1
WARNING: common.hpp(print_copyright:195): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to [email protected]
[distance] => [4]
[training] => [dataset]
[K] => [350]
[quiet] => [0]
[nshards] => [1]
[clean_cache] => [1]
[min_allowed_intersection] => [48]
[allow_zeros] => [1]
WARNING: chifilenames.hpp(find_shards:271): Could not find shards with nshards = 1
WARNING: chifilenames.hpp(find_shards:272): Please define 'nshards 0' or 'nshards auto' to automatically detect.
INFO: sharder.hpp(start_preprocessing:370): Starting preprocessing, shovel size: 17476266
INFO: io.hpp(compute_matrix_size:136): Starting to read matrix-market input. Matrix dimensions: 5466961 x 15325, non-zeros: 125225326
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.0.shovel, max:5475672
INFO: sharder.hpp(flush:193): Sort done.dataset4.0.shovel
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.1.shovel, max:5477316
INFO: sharder.hpp(flush:193): Sort done.dataset4.1.shovel
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.2.shovel, max:5477394
INFO: sharder.hpp(flush:193): Sort done.dataset4.2.shovel
INFO: sharder.hpp(flush_shovel:407): Too many outstanding shoveling threads...
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.3.shovel, max:5479571
INFO: sharder.hpp(flush:193): Sort done.dataset4.3.shovel
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.4.shovel, max:5481431
INFO: sharder.hpp(flush:193): Sort done.dataset4.4.shovel
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.5.shovel, max:5481431
INFO: sharder.hpp(flush:193): Sort done.dataset4.5.shovel
INFO: sharder.hpp(flush_shovel:407): Too many outstanding shoveling threads...
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.6.shovel, max:5481997
INFO: sharder.hpp(flush:193): Sort done.dataset4.6.shovel
INFO: io.hpp(convert_matrixmarket:564): Global mean is: 317.101 Now creating shards.
INFO: sharder.hpp(flush:191): Sorting shovel: dataset4.7.shovel, max:5482284
INFO: sharder.hpp(flush:193): Sort done.dataset4.7.shovel
INFO: sharder.hpp(flush_shovel:401): Waiting shoveling threads...
INFO: io.hpp(convert_matrixmarket:578): Now creating shards.
INFO: sharder.hpp(determine_number_of_shards:586): Number of shards to be created: 1
INFO: sharder.hpp(write_shards:967): Edges per shard: 125225327 nshards=1 total: 125225326
INFO: sharder.hpp(write_shards:973): Buffer size in merge phase: 52428804
INFO: sharder.hpp(finish_shard:625): Starting final processing for shard: 0
DEBUG: sharder.hpp(finish_shard:636): Shovel size:1502703912 edges: 125225326
DEBUG: sharder.hpp(finish_shard:702): 0 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 10000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 20000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 30000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 40000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 50000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 60000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 70000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 80000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 90000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 100000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 110000000 / 125225326
DEBUG: sharder.hpp(finish_shard:702): 120000000 / 125225326
INFO: sharder.hpp(createnextshard:887): Remaining edges: 0 remaining shards:0 edges per shard=125225327
INFO: sharder.hpp(createnextshard:890): Edges per shard: 125225327

=== REPORT FOR sharder() ===
[Timings]
edata_flush: 0.343608s (count: 478, min: 0.000667s, max: 0.000944, avg: 0.000718845s)
execute_sharding: 21.5186 s
finish_shard.sort: 9.32435 s
preprocessing: 53.9705 s
shard_final: 14.3798 s
[Other]
app: sharder
INFO: sharder.hpp(done:903): Created 1 shards, for 125225326 edgesSuccessfully finished sharding for dataset
INFO: io.hpp(convert_matrixmarket:583): Created 1 shards.
INFO: itemcf.cpp(main:514): M = 5466961
DEBUG: stripedio.hpp(stripedio:271): Start io-manager with 2 threads.
INFO: graphchi_engine.hpp(graphchi_engine:154): Initializing graphchi_engine. This engine expects 4-byte edge data.
INFO: chifilenames.hpp(load_vertex_intervals:400): shard: 0 - 5482284
INFO: graphchi_engine.hpp(run:737): GraphChi starting
INFO: graphchi_engine.hpp(run:738): Licensed under the Apache License 2.0
INFO: graphchi_engine.hpp(run:739): Copyright Aapo Kyrola et al., Carnegie Mellon University (2012)
DEBUG: slidingshard.hpp(sliding_shard:213): Total edge data size: 500901304, dataset.edata.e4B.0_1sizeof(ET): 4
INFO: graphchi_engine.hpp(print_config:132): Engine configuration:
INFO: graphchi_engine.hpp(print_config:133): exec_threads = 24
INFO: graphchi_engine.hpp(print_config:134): load_threads = 4
INFO: graphchi_engine.hpp(print_config:135): membudget_mb = 800
INFO: graphchi_engine.hpp(print_config:136): blocksize = 1048576
INFO: graphchi_engine.hpp(print_config:137): scheduler = 1
INFO: graphchi_engine.hpp(run:773): Start iteration: 0
INFO: graphchi_engine.hpp(run:852): 0.17536s: Starting: 0 -- 5482284
INFO: graphchi_engine.hpp(run:865): Iteration 0/5, subinterval: 0 - 5482284
DEBUG: graphchi_engine.hpp(run:880): Allocation 5482285 vertices, sizeof:64 total:350866240
DEBUG: memoryshard.hpp(load_edata:327): Compressed/full size: 1 number of blocks: 478
INFO: graphchi_engine.hpp(run:889): Start updates
INFO: graphchi_engine.hpp(exec_updates_inmemory_mode:470): In-memory mode: Iteration 0 starts. (6.66187 secs)
entering iteration: 0 on before_exec_interval
pivot_st is 5466961 window_en 5482285
DEBUG: itemcf.cpp(before_exec_interval:464): Window init, grabbed: 0 edges extending pivor_range to : 5482286
DEBUG: itemcf.cpp(before_exec_interval:466): Window en is: 5482285 vertices: 5482285
INFO: graphchi_engine.hpp(exec_updates_inmemory_mode:470): In-memory mode: Iteration 1 starts. (8.83943 secs)

Segmentation fault (core dumped)

Why is it that, when using the StreamSpot dataset, running the analyzer with base and stream data for graph id 0 generates a sketch file, but running it with base and stream data for graph id 1 produces an error?
I use the following command

bin/unicorn/main filetype edgelist niters 10000 base ../base_data/streamspot_1 stream ../stream_data/streamspot_1 sketch ../sketch_data/train/streamspot_1

The following error occurred

INFO:     graphchi_engine.hpp(print_config:134): Engine configuration:
INFO:     graphchi_engine.hpp(print_config:135):  exec_threads = 4
INFO:     graphchi_engine.hpp(print_config:136):  load_threads = 4
INFO:     graphchi_engine.hpp(print_config:137):  membudget_mb = 2000
INFO:     graphchi_engine.hpp(print_config:138):  blocksize = 1048608
INFO:     graphchi_engine.hpp(print_config:139):  scheduler = 1
INFO:     graphchi_engine.hpp(run:783): Start iteration: 0
INFO: graphchi_engine.hpp(run:868): 0.023569s: Starting: 0 -- 1610644
Segmentation fault (core dumped)

SVD computation fails

Hi,

I tried to compute the SVD of a 14.6M x 14.6M matrix with ~213M non-zero values on a 32-core machine with 244 GB RAM. It failed three times. Any hints are welcome.

Here are the three commands and error messages:

$ toolkits/collaborative_filtering/svd --training=/mnt/2/data.mm.mtx --nsv=50 --nv=55

INFO: graphchi_engine.hpp(run:838): 146.755s: Starting: 400798 -- 626074
INFO: graphchi_engine.hpp(run:851): Iteration 0/0, subinterval: 400798 - 626074
DEBUG: memoryshard.hpp(load_edata:257): Compressed/full size: 0.34846 number of blocks: 23
INFO: graphchi_engine.hpp(run:873): Start updates
INFO: graphchi_engine.hpp(run:883): Finished updates
*** Error in `/home/ubuntu/graphchi-prokopp/toolkits/collaborative_filtering/svd': free(): invalid size: 0x0000000044489d20 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7fe6e8906a46]
/home/ubuntu/graphchi-prokopp/toolkits/collaborative_filtering/svd[0x439d3b]
/home/ubuntu/graphchi-prokopp/toolkits/collaborative_filtering/svd[0x416899]
/home/ubuntu/graphchi-prokopp/toolkits/collaborative_filtering/svd[0x4193dd]
/home/ubuntu/graphchi-prokopp/toolkits/collaborative_filtering/svd[0x4051eb]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fe6e88a7ea5]
/home/ubuntu/graphchi-prokopp/toolkits/collaborative_filtering/svd[0x406231]
======= Memory map: ========
00400000-0045d000 r-xp 00000000 08:01 1579028 /home/ubuntu/graphchi-prokopp/toolkits/collaborative_filtering/svd
0065c000-0065d000 r--p 0005c000 08:01 1579028 /home/ubuntu/graphchi-prokopp/toolkits/collaborative_filtering/svd
0065d000-0065e000 rw-p 0005d000 08:01 1579028 /home/ubuntu/graphchi-prokopp/toolkits/collaborative_filtering/svd
0065e000-0065f000 rw-p 00000000 00:00 0
01006000-462d1000 rw-p 00000000 00:00 0 [heap]
7fdf7c000000-7fdf7ff14000 rw-p 00000000 00:00 0
7fdf7ff14000-7fdf80000000 ---p 00000000 00:00 0
7fdf80000000-7fdf83c01000 rw-p 00000000 00:00 0
[...]
7fe3b0000000-7fe3b4000000 rw-p 00000000 00:00 0
7fe3b4000000-7fe3b8000000 rw-p 00000000 00:00 0 Aborted (core dumped)

--- Next try ---

$ toolkits/collaborative_filtering/svd --training=/mnt/2/data.mm.mtx --nsv=50 --nv=55 membudget_mb 100000 nshards 1
[...]
DEBUG: graphchi_engine.hpp(run:740): Engine being restarted, do not reinitialize.
INFO: graphchi_engine.hpp(print_config:131): Engine configuration:
INFO: graphchi_engine.hpp(print_config:132): exec_threads = 32
INFO: graphchi_engine.hpp(print_config:133): load_threads = 4
INFO: graphchi_engine.hpp(print_config:134): membudget_mb = 100000
INFO: graphchi_engine.hpp(print_config:135): blocksize = 4194304
INFO: graphchi_engine.hpp(print_config:136): scheduler = 0
INFO: graphchi_engine.hpp(run:759): Start iteration: 0
INFO: graphchi_engine.hpp(run:838): 222.611s: Starting: 0 -- 14628414
INFO: graphchi_engine.hpp(run:851): Iteration 0/0, subinterval: 0 - 14628414
DEBUG: memoryshard.hpp(load_edata:257): Compressed/full size: 0.33842 number of blocks: 203
INFO: graphchi_engine.hpp(run:873): Start updates
INFO: graphchi_engine.hpp(exec_updates_inmemory_mode:481): In-memory mode: Iteration 0 starts. (273.507 secs)
*** Error in `/home/ubuntu/graphchi-prokopp/toolkits/collaborative_filtering/svd': free(): invalid size: 0x00007fc4a7023670 ***
Segmentation fault (core dumped)

--- Next try ---

$ toolkits/collaborative_filtering/svd --training=/mnt/2/data.mm.mtx --nsv=5 --nv=5 membudget_mb 100000 nshards 1
[...]
INFO: graphchi_engine.hpp(exec_updates_inmemory_mode:481): In-memory mode: Iteration 0 starts. (188.425 secs)
INFO: graphchi_engine.hpp(run:883): Finished updates
INFO: graphchi_engine.hpp(run:723): GraphChi starting
INFO: graphchi_engine.hpp(run:724): Licensed under the Apache License 2.0
INFO: graphchi_engine.hpp(run:725): Copyright Aapo Kyrola et al., Carnegie Mellon University (2012)
DEBUG: graphchi_engine.hpp(run:740): Engine being restarted, do not reinitialize.
INFO: graphchi_engine.hpp(print_config:131): Engine configuration:
INFO: graphchi_engine.hpp(print_config:132): exec_threads = 32
INFO: graphchi_engine.hpp(print_config:133): load_threads = 4
INFO: graphchi_engine.hpp(print_config:134): membudget_mb = 100000
INFO: graphchi_engine.hpp(print_config:135): blocksize = 4194304
INFO: graphchi_engine.hpp(print_config:136): scheduler = 0
INFO: graphchi_engine.hpp(run:759): Start iteration: 0
INFO: graphchi_engine.hpp(run:838): 194.59s: Starting: 0 -- 14628414
INFO: graphchi_engine.hpp(run:851): Iteration 0/0, subinterval: 0 - 14628414
DEBUG: memoryshard.hpp(load_edata:257): Compressed/full size: 0.33842 number of blocks: 203
INFO: graphchi_engine.hpp(run:873): Start updates
INFO: graphchi_engine.hpp(exec_updates_inmemory_mode:481): In-memory mode: Iteration 0 starts. (249.173 secs)
*** Error in `...../toolkits/collaborative_filtering/svd': munmap_chunk(): invalid pointer: 0x00007fb6c1976070 ***
Segmentation fault (core dumped)

matrix factorization toolkit not writing out output files

hi,

I tried nmf and pmf as shown in the blog post below, and no output files are written. Training appears to run and converge OK, but the *U and *V files are missing.
http://bickson.blogspot.com/2012/12/collaborative-filtering-with-graphchi.html

./toolkits/collaborative_filtering/pmf --training=x_mm --minval=1 --maxval=500 --max_iter=10 --pmf_burn_in=5 --allow_zeros=1 --R_output_format

./toolkits/collaborative_filtering/nmf --training=x_mm --minval=1 --maxval=500 --max_iter=100 --quiet=1

WALS problem

For the wals algorithm, when I set all rating weights to the same weight 1.0, the training RMSE is 0 in every iteration. Why is that?

Unconventional MatrixMarket format

There are a number of output files, obtained from running the collaborative filtering algorithms in toolkits/collaborative_filtering, that advertise themselves as MatrixMarket files through a .mm extension or a %%MatrixMarket matrix array real general header, but do not seem to follow the MatrixMarket format as defined by NIST.

For example, the output of running ./toolkits/collaborative_filtering/rating --training=smallnetflix_mm --num_ratings=5 --quiet=1 --algorithm=als is two files:

  • smallnetflix_mm.ids
  • smallnetflix_mm.ratings

Their header is (only one is shown here):

$ head -n 10 smallnetflix_mm.ids
%%MatrixMarket matrix array real general 
%This file contains item ids matching the ratings. In each row i, num_ratings top item ids for user i. (First column: user id, next columns, top K ratings). Note: 0 item id means there are no more items to recommend for this user.
95526 6 
1 1243 424 2641 2109 1557
2 2641 1548 1227 548 76 
3 1243 2548 1227 2641 76 
4 1449 2641 2109 3172 1227 
5 1449 1227 2298 735 1382 
6 2109 2669 1227 3112 2583
7 3516 2016 2647 1548 1243 

'array' here tells the parser to expect one value per line (column-oriented dense data), yet that is not the case here. Other files with the same problem include those ending in _U.mm or _V.mm.

This problem is especially apparent when using mmread from scipy.io (SciPy's reader for MatrixMarket files): the format is perceived as invalid and the file can't be read. (The --R_output_format option does not change any of that for me.)

I might be missing something here though. Thanks for the tool :).

GraphChi's .hpp files cause compilation errors

Hi,

If I want to organize my code in .h and .cpp files, I get compilation errors of the type

/tmp/ccSLfl1Z.o: In function `get_dirname':
/home/ndavid/Documents/Research/LTR/graphchi/toolkits/graphchi-ltr/src/../../../src/preprocessing/conversions.hpp:173: multiple definition of `graphchi::get_dirname(std::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
/tmp/ccqcbsrt.o:/home/ndavid/Documents/Research/LTR/graphchi/toolkits/graphchi-ltr/src/../../../src/preprocessing/conversions.hpp:173: first defined here

This happens if the code in question #includes anything from GraphChi. I am not a big fan of .hpp-only code, but I have to use it to get around the problem. It would be great if you guys could somehow address this.

Best,
David Nemeskey

potential bugs in inmemconncomps.cpp

I find that the vertex data is used in the update function before it is initialized.

The following graph, in adjlist format, is used for testing.

0 1 1
2 1 0 1
3 2 2 3 4
4 3 2 2 4
5 4 2 2 3

The expected output should be:

List of labels was written to file: /home/user/test.adj.components

  1. label: 2, size: 3
  2. label: 0, size: 2

However, inmemconncomps.cpp gives the following:

List of labels was written to file: /home/user/test.adj.components

  1. label: 0, size: 5

Problems creating large shard file (> 2GiB) ?

It appears that GraphChi 0.2.3 has trouble creating shard files larger than 2 GiB. I have plenty of space for them, but it errors out. Any additional details I can provide?

Reducing the memory budget sufficiently (so that shards smaller than 2 GiB are created) works around the issue.

Running GraphChi Connected Components program
INFO:     conversions.hpp(convert_if_notexists:767): Did not find preprocessed shards for nodes_20131001
INFO:     conversions.hpp(convert_if_notexists:768): (Edge-value size: 4)
INFO:     conversions.hpp(convert_if_notexists:769): Will try create them now...
INFO:     sharder.hpp(start_preprocessing:326): Starting preprocessing, shovel size: 1310720000
INFO:     conversions.hpp(convert_edgelist:221): Reading in edge list format!
DEBUG:    conversions.hpp(convert_edgelist:226): Read 10000000 lines, 180.039 MB
.
.
.
DEBUG:    conversions.hpp(convert_edgelist:226): Read 1590000000 lines, 29900.5 MB
INFO:     sharder.hpp(flush:152): Sorting shovel: nodes_201310014.1.shovel, max:1738496712
INFO:     sharder.hpp(flush:154): Sort done.nodes_201310014.1.shovel
ERROR:    ioutil.hpp(writea:129): Could not write 3435929520 bytes! error:Bad file descriptor
connectedcomponents_list: ./src/util/ioutil.hpp:130: void writea(int, T*, size_t) [with T = graphchi::edge_with_value<unsigned int>]: Assertion `false' failed.

set_last_iteration is not reset properly

If I run an engine after calling set_last_iteration on the context, and then run the engine again, the number of requested iterations is ignored and the value from the previous run is used.
I suggest resetting the last_iteration flag when the engine is run.

two "return"s with the same indentation

In toolkits/collaborative_filtering/libfm.cpp there are two "return"s, at lines 193 and 194, with the same indentation. They are not separated by any braces, either.
Is something wrong there?

reproducing the numbers of Koren's SVD++

Solved... my data had some issues.


Hi,

I am trying to replicate the results on the Netflix probe set from Koren's 2008 KDD paper using the GraphChi implementation of SVD++, so I set the following parameters in graphchi-cpp/toolkits/collaborative_filtering/svdpp.cpp:

%%%%
svdpp.step_dec = get_option_float("svdpp_step_dec", 0.9);
svdpp.itmBiasStep = get_option_float("svdpp_item_bias_step", 0.007);
svdpp.itmBiasReg = get_option_float("svdpp_item_bias_reg", 0.005);
svdpp.usrBiasStep = get_option_float("svdpp_user_bias_step", 0.007);
svdpp.usrBiasReg = get_option_float("svdpp_user_bias_reg", 0.005);
svdpp.usrFctrStep = get_option_float("svdpp_user_factor_step", 0.007);
svdpp.usrFctrReg = get_option_float("svdpp_user_factor_reg", 0.015);
svdpp.itmFctrReg = get_option_float("svdpp_item_factor_reg", 0.015);
svdpp.itmFctrStep = get_option_float("svdpp_item_factor_step", 0.007);
svdpp.itmFctr2Reg = get_option_float("svdpp_item_factor2_reg", 0.015);
svdpp.itmFctr2Step = get_option_float("svdpp_item_factor2_step", 0.001);

%%%%

However I am getting something like this:

[minval] => [1]
[maxval] => [5]
[max_iter] => [40]
[quiet] => [1]
[D] => [50]
28.9275) Iteration: 0 Training RMSE: 0.991109 Validation RMSE: 1.05016 ratings_per_sec: 0
56.988) Iteration: 1 Training RMSE: 1.13069 Validation RMSE: 1.13803 ratings_per_sec: 1.702e+06
84.9509) Iteration: 2 Training RMSE: 1.37655 Validation RMSE: 1.43292 ratings_per_sec: 2.29897e+06
113.139) Iteration: 3 Training RMSE: 1.68918 Validation RMSE: 1.87668 ratings_per_sec: 2.59855e+06
140.337) Iteration: 4 Training RMSE: 1.88749 Validation RMSE: 2.08639 ratings_per_sec: 2.79937e+06
169.082) Iteration: 5 Training RMSE: 1.98145 Validation RMSE: 2.06916 ratings_per_sec: 2.90865e+06
197.412) Iteration: 6 Training RMSE: 1.98927 Validation RMSE: 2.12117 ratings_per_sec: 2.99253e+06
225.531) Iteration: 7 Training RMSE: 1.98473 Validation RMSE: 2.11817 ratings_per_sec: 3.05846e+06
254.665) Iteration: 8 Training RMSE: 1.97704 Validation RMSE: 2.14269 ratings_per_sec: 3.09737e+06
283.274) Iteration: 9 Training RMSE: 1.97679 Validation RMSE: 2.14098 ratings_per_sec: 3.13412e+06
310.496) Iteration: 10 Training RMSE: 1.97564 Validation RMSE: 2.1162 ratings_per_sec: 3.17823e+06
338.768) Iteration: 11 Training RMSE: 1.97641 Validation RMSE: 2.11651 ratings_per_sec: 3.20535e+06
366.073) Iteration: 12 Training RMSE: 1.97204 Validation RMSE: 2.06543 ratings_per_sec: 3.23678e+06
394.577) Iteration: 13 Training RMSE: 1.97018 Validation RMSE: 2.0808 ratings_per_sec: 3.25387e+06
423.161) Iteration: 14 Training RMSE: 1.9682 Validation RMSE: 2.08301 ratings_per_sec: 3.26834e+06
452.649) Iteration: 15 Training RMSE: 1.9673 Validation RMSE: 2.05791 ratings_per_sec: 3.27412e+06
481.934) Iteration: 16 Training RMSE: 1.96858 Validation RMSE: 2.05757 ratings_per_sec: 3.28081e+06
511.155) Iteration: 17 Training RMSE: 1.96518 Validation RMSE: 2.06031 ratings_per_sec: 3.28712e+06
539.405) Iteration: 18 Training RMSE: 1.96373 Validation RMSE: 2.08988 ratings_per_sec: 3.29856e+06
569.019) Iteration: 19 Training RMSE: 1.96371 Validation RMSE: 2.05959 ratings_per_sec: 3.30103e+06


Any idea what happened?

Thanks !

Matrixmarket output not compliant with standard mmread.m

I think I found a bug in MMOutputter_vec (io.hpp), related to storing U and V in MatrixMarket dense format. The actual output is not what mmread.m expects, causing wrong data to be loaded in Octave/Matlab. It affects at least als.cpp and wals.cpp by default, when not using R_output_format=1.

It is easy to reproduce with the latest version of GraphChi (October 21, 2013):
./toolkits/collaborative_filtering/wals --unittest=1

Then in octave:
U = mmread('test_wals_U.mm');
V = mmread('test_wals_V.mm');
full(U*V') # very different from the actual test_wals matrix
[U(1:20); U(21:40); U(41:60); U(61:80)]*[V(1:20); V(21:40); V(41:60); V(61:80)]' # that seems to be the real matrix

Analogous with als: starting with als --unittest=1 --training=test_als

Regards

Is this project alive ?

I wonder whether this project is still being developed, or whether better solutions have been found and it was abandoned.

Example app trianglecounting fails with twitter graph

I ran the trianglecounting example app with no problem on small graphs, but I get a segfault when I run it on the Twitter graph with 42 million vertices and 1.5 billion edges. I have 500GB of RAM, 64 cores, and over 1TB of free disk space, and the crash occurs when nothing else is running.

Here is the output from gdb when I try to debug. It seems it fails within malloc:

DEBUG: slidingshard.hpp(sliding_shard:213): Total edge data size: 839065660, /mnt/ram0/graphs/twitter_rv.net_degord.edata.e4B.0_0.dyngraph.0_7 sizeof(ET): 4

Program received signal SIGSEGV, Segmentation fault.
_int_malloc (av=0x7ffff7492720, bytes=643908276) at malloc.c:3900
3900 malloc.c: No such file or directory.
(gdb) backtrace
#0 _int_malloc (av=0x7ffff7492720, bytes=643908276) at malloc.c:3900
#1 0x00007ffff715bf95 in __GI___libc_malloc (bytes=643908276) at malloc.c:2924
#2 0x00000000004428ed in graphchi::graphchi_dynamicgraph_engine<unsigned int, unsigned int, graphchi::graphchi_vertex<unsigned int, unsigned int> >::commit_graph_changes (this=0x7fffffffd730) at ./src/engine/dynamic_graphs/graphchi_dynamicgraph_engine.hpp:733
#3 0x000000000042a476 in graphchi::graphchi_engine<unsigned int, unsigned int, graphchi::graphchi_vertex<unsigned int, unsigned int> >::run (this=0x7fffffffd730, userprogram=..., _niters=<optimized out>) at ./src/engine/graphchi_engine.hpp:952
#4 0x0000000000404eb3 in main (argc=<optimized out>, argv=<optimized out>) at example_apps/trianglecounting.cpp:470
(gdb) frame 2
#2 0x00000000004428ed in graphchi::graphchi_dynamicgraph_engine<unsigned int, unsigned int, graphchi::graphchi_vertex<unsigned int, unsigned int> >::commit_graph_changes (this=0x7fffffffd730) at ./src/engine/dynamic_graphs/graphchi_dynamicgraph_engine.hpp:733
733 edata = (graphchi_edge*)malloc(num_edges * sizeof(graphchi_edge));

Any idea how to solve this?
Thanks

About in-memory mode

How can I make GraphChi run in in-memory mode if the machine has a sufficient amount of memory?

unknown errors: too many open files

I just input the Netflix data in matrix-market format, then got the following error:

ERROR: stripedio.hpp(open_session:404): Could not open: graphchi_train.edata..Z.e4B.4_5_blockdir_1048576/60 session: 1023 error: Too many open files
biassgd: ../../src/io/stripedio.hpp:406: int graphchi::stripedio::open_session(std::string, bool, bool): Assertion `rddesc>=0' failed.

------------------------details------------------------
./toolkits/collaborative_filtering/biassgd --training=graphchi_train --validation=graphchi_test --biassgd_lambda=1e-4 --biassgd_gamma=1e-4 --minval=1 --maxval=5 --max_iter=10
WARNING: common.hpp(print_copyright:214): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to [email protected]
[training] => [graphchi_train]
[validation] => [graphchi_test]
[biassgd_lambda] => [1e-4]
[biassgd_gamma] => [1e-4]
[minval] => [1]
[maxval] => [5]
[max_iter] => [10]
INFO: chifilenames.hpp(find_shards:265): Detected number of shards: 5
INFO: chifilenames.hpp(find_shards:266): To specify a different number of shards, use command-line parameter 'nshards'
INFO: io.hpp(convert_matrixmarket:509): File graphchi_train was already preprocessed, won't do it again.
INFO: io.hpp(read_global_mean:109): Opened matrix size: 480189 x 17770 edges: 99072112 Global mean is: 3.6033 time bins: 0 Now creating shards.
INFO: chifilenames.hpp(find_shards:265): Detected number of shards: 1
INFO: chifilenames.hpp(find_shards:266): To specify a different number of shards, use command-line parameter 'nshards'
INFO: io.hpp(convert_matrixmarket:509): File graphchi_test was already preprocessed, won't do it again.
INFO: io.hpp(read_global_mean:111): Opened VLIDATION matrix size: 480189 x 17770 edges: 1408395 Global mean is: 3.67363 time bins: 0 Now creating shards.
DEBUG: stripedio.hpp(stripedio:271): Start io-manager with 2 threads.
INFO: graphchi_engine.hpp(graphchi_engine:154): Initializing graphchi_engine. This engine expects 4-byte edge data.
INFO: chifilenames.hpp(load_vertex_intervals:400): shard: 0 - 497958
DEBUG: stripedio.hpp(stripedio:271): Start io-manager with 2 threads.
INFO: graphchi_engine.hpp(graphchi_engine:154): Initializing graphchi_engine. This engine expects 4-byte edge data.
INFO: chifilenames.hpp(load_vertex_intervals:400): shard: 0 - 484050
INFO: chifilenames.hpp(load_vertex_intervals:400): shard: 484051 - 487367
INFO: chifilenames.hpp(load_vertex_intervals:400): shard: 487368 - 491268
INFO: chifilenames.hpp(load_vertex_intervals:400): shard: 491269 - 494699
INFO: chifilenames.hpp(load_vertex_intervals:400): shard: 494700 - 497958
INFO: graphchi_engine.hpp(run:744): GraphChi starting
INFO: graphchi_engine.hpp(run:745): Licensed under the Apache License 2.0
INFO: graphchi_engine.hpp(run:746): Copyright Aapo Kyrola et al., Carnegie Mellon University (2012)
DEBUG: slidingshard.hpp(sliding_shard:213): Total edge data size: 79274092, graphchi_train.edata..Z.e4B.0_5sizeof(ET): 4
DEBUG: slidingshard.hpp(sliding_shard:213): Total edge data size: 79256548, graphchi_train.edata..Z.e4B.1_5sizeof(ET): 4
DEBUG: slidingshard.hpp(sliding_shard:213): Total edge data size: 79273696, graphchi_train.edata..Z.e4B.2_5sizeof(ET): 4
DEBUG: slidingshard.hpp(sliding_shard:213): Total edge data size: 79245960, graphchi_train.edata..Z.e4B.3_5sizeof(ET): 4
DEBUG: slidingshard.hpp(sliding_shard:213): Total edge data size: 79238152, graphchi_train.edata..Z.e4B.4_5sizeof(ET): 4
INFO: graphchi_engine.hpp(print_config:132): Engine configuration:
INFO: graphchi_engine.hpp(print_config:133): exec_threads = 8
INFO: graphchi_engine.hpp(print_config:134): load_threads = 4
INFO: graphchi_engine.hpp(print_config:135): membudget_mb = 800
INFO: graphchi_engine.hpp(print_config:136): blocksize = 1048576
INFO: graphchi_engine.hpp(print_config:137): scheduler = 0
INFO: graphchi_engine.hpp(run:780): Start iteration: 0
DEBUG: rmse_engine.hpp(reset_rmse:148): Detected number of threads: 8
INFO: graphchi_engine.hpp(run:859): 0.000446s: Starting: 0 -- 484050
DEBUG: graphchi_engine.hpp(determine_next_window:325): Memory budget exceeded with 838868704 bytes.
INFO: graphchi_engine.hpp(run:872): Iteration 0/9, subinterval: 0 - 80199
DEBUG: graphchi_engine.hpp(run:887): Allocation 80200 vertices, sizeof:64 total:5132800
DEBUG: memoryshard.hpp(load_edata:327): Compressed/full size: 0.172229 number of blocks: 76
INFO: graphchi_engine.hpp(run:896): Start updates
INFO: graphchi_engine.hpp(run:906): Finished updates
DEBUG: graphchi_engine.hpp(determine_next_window:325): Memory budget exceeded with 838865568 bytes.
INFO: graphchi_engine.hpp(run:872): Iteration 0/9, subinterval: 80200 - 234145
DEBUG: graphchi_engine.hpp(run:887): Allocation 153946 vertices, sizeof:64 total:9852544
INFO: graphchi_engine.hpp(run:896): Start updates
INFO: graphchi_engine.hpp(run:906): Finished updates
INFO: graphchi_engine.hpp(run:872): Iteration 0/9, subinterval: 234146 - 484050
DEBUG: graphchi_engine.hpp(run:887): Allocation 249905 vertices, sizeof:64 total:15993920
INFO: graphchi_engine.hpp(run:896): Start updates
INFO: graphchi_engine.hpp(run:906): Finished updates
INFO: graphchi_engine.hpp(run:859): 3.83968s: Starting: 484051 -- 487367
INFO: graphchi_engine.hpp(run:872): Iteration 0/9, subinterval: 484051 - 487367
DEBUG: graphchi_engine.hpp(run:887): Allocation 3317 vertices, sizeof:64 total:212288
DEBUG: memoryshard.hpp(load_edata:327): Compressed/full size: 0.172574 number of blocks: 76
INFO: graphchi_engine.hpp(run:896): Start updates
INFO: graphchi_engine.hpp(run:906): Finished updates
INFO: graphchi_engine.hpp(run:859): 4.27056s: Starting: 487368 -- 491268
INFO: graphchi_engine.hpp(run:872): Iteration 0/9, subinterval: 487368 - 491268
DEBUG: graphchi_engine.hpp(run:887): Allocation 3901 vertices, sizeof:64 total:249664
DEBUG: memoryshard.hpp(load_edata:327): Compressed/full size: 0.172775 number of blocks: 76
INFO: graphchi_engine.hpp(run:896): Start updates
INFO: graphchi_engine.hpp(run:906): Finished updates
INFO: graphchi_engine.hpp(run:859): 4.57392s: Starting: 491269 -- 494699
INFO: graphchi_engine.hpp(run:872): Iteration 0/9, subinterval: 491269 - 494699
DEBUG: graphchi_engine.hpp(run:887): Allocation 3431 vertices, sizeof:64 total:219584
DEBUG: memoryshard.hpp(load_edata:327): Compressed/full size: 0.171853 number of blocks: 76
INFO: graphchi_engine.hpp(run:896): Start updates
INFO: graphchi_engine.hpp(run:906): Finished updates
INFO: graphchi_engine.hpp(run:859): 4.86931s: Starting: 494700 -- 497958
INFO: graphchi_engine.hpp(run:872): Iteration 0/9, subinterval: 494700 - 497958
DEBUG: graphchi_engine.hpp(run:887): Allocation 3259 vertices, sizeof:64 total:208576
DEBUG: memoryshard.hpp(load_edata:327): Compressed/full size: 0.172008 number of blocks: 76
INFO: graphchi_engine.hpp(run:896): Start updates
INFO: graphchi_engine.hpp(run:906): Finished updates
5.38353) Iteration: 0 Training RMSE: 1.32668INFO: graphchi_engine.hpp(run:744): GraphChi starting
INFO: graphchi_engine.hpp(run:745): Licensed under the Apache License 2.0
INFO: graphchi_engine.hpp(run:746): Copyright Aapo Kyrola et al., Carnegie Mellon University (2012)
DEBUG: slidingshard.hpp(sliding_shard:213): Total edge data size: 5633580, graphchi_test.edata..Z.e4B.0_1sizeof(ET): 4
INFO: graphchi_engine.hpp(print_config:132): Engine configuration:
INFO: graphchi_engine.hpp(print_config:133): exec_threads = 8
INFO: graphchi_engine.hpp(print_config:134): load_threads = 4
INFO: graphchi_engine.hpp(print_config:135): membudget_mb = 800
INFO: graphchi_engine.hpp(print_config:136): blocksize = 1048576
INFO: graphchi_engine.hpp(print_config:137): scheduler = 0
INFO: graphchi_engine.hpp(run:780): Start iteration: 0
INFO: graphchi_engine.hpp(run:859): 5.1685s: Starting: 0 -- 497958
INFO: graphchi_engine.hpp(run:872): Iteration 0/0, subinterval: 0 - 497958
DEBUG: graphchi_engine.hpp(run:887): Allocation 497959 vertices, sizeof:64 total:31869376
DEBUG: memoryshard.hpp(load_edata:327): Compressed/full size: 0.180397 number of blocks: 6
INFO: graphchi_engine.hpp(run:896): Start updates
INFO: graphchi_engine.hpp(exec_updates_inmemory_mode:470): In-memory mode: Iteration 0 starts. (5.23427 secs)
Validation RMSE: 1.23962
INFO: graphchi_engine.hpp(run:906): Finished updates
INFO: graphchi_engine.hpp(run:780): Start iteration: 1
DEBUG: rmse_engine.hpp(reset_rmse:148): Detected number of threads: 8
INFO: graphchi_engine.hpp(run:859): 5.27579s: Starting: 0 -- 484050
DEBUG: graphchi_engine.hpp(determine_next_window:325): Memory budget exceeded with 838868704 bytes.
INFO: graphchi_engine.hpp(run:872): Iteration 1/9, subinterval: 0 - 80199
DEBUG: graphchi_engine.hpp(run:887): Allocation 80200 vertices, sizeof:64 total:5132800
DEBUG: memoryshard.hpp(load_edata:327): Compressed/full size: 0.172229 number of blocks: 76
INFO: graphchi_engine.hpp(run:896): Start updates
INFO: graphchi_engine.hpp(run:906): Finished updates
DEBUG: graphchi_engine.hpp(determine_next_window:325): Memory budget exceeded with 838865568 bytes.
INFO: graphchi_engine.hpp(run:872): Iteration 1/9, subinterval: 80200 - 234145
DEBUG: graphchi_engine.hpp(run:887): Allocation 153946 vertices, sizeof:64 total:9852544

ERROR: stripedio.hpp(open_session:404): Could not open: graphchi_train.edata..Z.e4B.4_5_blockdir_1048576/60 session: 1023 error: Too many open files
biassgd: ../../src/io/stripedio.hpp:406: int graphchi::stripedio::open_session(std::string, bool, bool): Assertion `rddesc>=0' failed.

Triangle counting stops

I'm trying to count triangles on a Twitter dataset with 8.9M nodes and 25.9M edges.
All the IDs in the edgelist have been renumbered to ensure they fit a 32bit integer.
Pagerank and Community detection give no problems, while trianglecounting stops at a certain point:

$ bin/example_apps/trianglecounting file ~/share/data/edgelist.txt nshards 2 membudget_mb 14000 execthreads 4

Produces: (note the -nan in the last line)

[...]
INFO:     trianglecounting.cpp(before_iteration:373): Now pivots: 8976316 8976316
INFO:     graphchi_engine.hpp(run:859): 23.2758s: Starting: 0 -- 8516683
INFO:     graphchi_engine.hpp(run:872): Iteration 3/99999, subinterval: 0 - 8516683
DEBUG:    graphchi_engine.hpp(run:887): Allocation 8516684 vertices, sizeof:72 total:613201248
INFO:     graphchi_dynamicgraph_engine.hpp(incorporate_buffered_edges:372): ::: Used 0 buffered edges.
DEBUG:    memoryshard.hpp(load_edata:327): Compressed/full size: -nan number of blocks: 0

Not sure what's wrong.

adding "xcode-select --install" to the docs

I had tons of errors when compiling on Mac: missing header files and libraries (see below).
All of them were resolved once I ran "xcode-select --install". Please consider adding it to README.md, in the 'compiling on mac' section.

'''
example_apps/connectedcomponents.cpp:47:17: error: cmath: No such file or directory
example_apps/connectedcomponents.cpp:48:18: error: string: No such file or directory
In file included from example_apps/connectedcomponents.cpp:50:
./src/graphchi_basic_includes.hpp:36:19: error: sstream: No such file or directory
In file included from ./src/graphchi_basic_includes.hpp:38,
from example_apps/connectedcomponents.cpp:50:
./src/api/chifilenames.hpp:36:19: error: fstream: No such file or directory
./src/api/chifilenames.hpp:43:18: error: vector: No such file or directory
In file included from ./src/api/chifilenames.hpp:47,
from ./src/graphchi_basic_includes.hpp:38,
from example_apps/connectedcomponents.cpp:50:
./src/logger/logger.hpp:53:19: error: cstdlib: No such file or directory
./src/logger/logger.hpp:54:20: error: iostream: No such file or directory
./src/logger/logger.hpp:55:19: error: cassert: No such file or directory
./src/logger/logger.hpp:56:19: error: cstring: No such file or directory
./src/logger/logger.hpp:57:19: error: cstdarg: No such file or directory
In file included from ./src/api/graph_objects.hpp:39,
from ./src/api/graphchi_program.hpp:35,
from ./src/graphchi_basic_includes.hpp:40,
from example_apps/connectedcomponents.cpp:50:
./src/util/qsort.hpp:27:21: error: algorithm: No such file or directory
In file included from ./src/io/stripedio.hpp:48,
from ./src/api/vertex_aggregator.hpp:42,
from ./src/graphchi_basic_includes.hpp:43,
from example_apps/connectedcomponents.cpp:50:
./src/metrics/metrics.hpp:34:15: error: map: No such file or directory
./src/metrics/metrics.hpp:36:18: error: limits: No such file or directory
In file included from ./src/metrics/metrics.hpp:40,
from ./src/io/stripedio.hpp:48,
from ./src/api/vertex_aggregator.hpp:42,
from ./src/graphchi_basic_includes.hpp:43,
from example_apps/connectedcomponents.cpp:50:
./src/util/pthread_tools.hpp:15:16: error: list: No such file or directory
In file included from ./src/util/cmdopts.hpp:38,
from ./src/metrics/metrics.hpp:41,
from ./src/io/stripedio.hpp:48,
from ./src/api/vertex_aggregator.hpp:42,
from ./src/graphchi_basic_includes.hpp:43,
from example_apps/connectedcomponents.cpp:50:
./src/util/configfile.hpp:34:18: error: cstdio: No such file or directory
In file included from ./src/io/stripedio.hpp:49,
from ./src/api/vertex_aggregator.hpp:42,
from ./src/graphchi_basic_includes.hpp:43,
from example_apps/connectedcomponents.cpp:50:
./src/util/synchronized_queue.hpp:4:17: error: queue: No such file or directory
In file included from ./src/api/chifilenames.hpp:47,
from ./src/graphchi_basic_includes.hpp:38,
from example_apps/connectedcomponents.cpp:50:
./src/logger/logger.hpp:127: error: ‘stringstream’ in namespace ‘std’ does not name a type
./src/logger/logger.hpp:155: error: ‘string’ in namespace ‘std’ does not name a type
./src/logger/logger.hpp:202: error: declaration of ‘operator<<’ as non-function
./src/logger/logger.hpp:202: error: expected ‘;’ before ‘(’ token
In file included from ./src/engine/graphchi_engine.hpp:54,
from ./src/graphchi_basic_includes.hpp:45,
from example_apps/connectedcomponents.cpp:50:
./src/shards/memoryshard.hpp:162: error: expected `;' before end of line
./src/shards/memoryshard.hpp:162: error: expected `}' before end of line
In file included from ./src/api/chifilenames.hpp:47,
from ./src/graphchi_basic_includes.hpp:38,
from example_apps/connectedcomponents.cpp:50:
./src/logger/logger.hpp: In member function ‘void file_logger::set_log_to_console(bool)’:
./src/logger/logger.hpp:151: error: ‘log_to_console’ was not declared in this scope
./src/logger/logger.hpp: In member function ‘bool file_logger::get_log_to_console()’:
./src/logger/logger.hpp:161: error: ‘log_to_console’ was not declared in this scope
./src/logger/logger.hpp: In member function ‘int file_logger::get_log_level()’:
./src/logger/logger.hpp:166: error: ‘log_level’ was not declared in this scope
./src/logger/logger.hpp: In member function ‘file_logger& file_logger::operator<<(T)’:
./src/logger/logger.hpp:174: error: ‘streambuffkey’ was not declared in this scope
./src/logger/logger.hpp:176: error: ‘stringstream’ is not a member of ‘std’
./src/logger/logger.hpp:176: error: ‘streambuffer’ was not declared in this scope
./src/logger/logger.hpp:176: error: ‘struct logger_impl::streambuff_tls_entry’ has no member named ‘streambuffer’
./src/logger/logger.hpp: In member function ‘file_logger& file_logger::operator<<(const char*)’:
./src/logger/logger.hpp:187: error: ‘streambuffkey’ was not declared in this scope
./src/logger/logger.hpp:189: error: ‘stringstream’ is not a member of ‘std’
./src/logger/logger.hpp:189: error: ‘streambuffer’ was not declared in this scope
./src/logger/logger.hpp:189: error: ‘struct logger_impl::streambuff_tls_entry’ has no member named ‘streambuffer’
./src/logger/logger.hpp:194: error: ‘strlen’ was not declared in this scope
./src/logger/logger.hpp:195: error: ‘stream_flush’ was not declared in this scope
In file included from ./src/engine/graphchi_engine.hpp:54,
from ./src/graphchi_basic_includes.hpp:45,
from example_apps/connectedcomponents.cpp:50:
./src/shards/memoryshard.hpp: At global scope:
./src/shards/memoryshard.hpp:162: error: expected unqualified-id before end of line
./src/shards/memoryshard.hpp:162: error: expected declaration before end of line
./src/logger/logger.hpp:119: warning: ‘messages’ defined but not used
make: *** [example_apps/connectedcomponents] Error 1
'''

Implementing new graph file parsers for graphchi

I would like to try graphchi with a collection of graphs in the so-called METIS format, a simple adjacency list format, which is however not the same as the adjacency lists already supported.

http://www.cc.gatech.edu/dimacs10/downloads.shtml

The Introduction to Example Applications states that "it is fairly easy to write your own parsers", but it is not apparent how this works, and looking at the source code did not get me far. There should be some hints in the documentation on how to create a new parser.

Time svd++ does unneeded work?

When reading timesvdpp source code I noticed a weird thing. If you look at https://github.com/GraphChi/graphchi-cpp/blob/master/toolkits/collaborative_filtering/timesvdpp.cpp#L204

for (int i = 0; i < (int)M; i++) {
    vertex_data & data = latent_factors_inmem[i];
    data.pvec = zeros(4*D);
    time_svdpp_usr usr(data);
    *usr.bu = 0;
    for (int m=0; m< D; m++){
      usr.p[m] = 0.01*drand48() / (double) (D);
      usr.pu[m] = 0.001 * drand48() / (double) (D);
      usr.x[m] = 0.001 * drand48() / (double) (D);
      usr.ptemp[m] = usr.p[m];
    }
  }

you will notice that it creates a time_svdpp_usr object, initializes it with random values, and then the object immediately goes out of scope and is destroyed after the current iteration. The same thing happens with time_svdpp_movie and time_svdpp_time. Is there a reason for doing this, or is it a bug?
Regards.

fatal error when compiling graphchi-cpp on Mac

After successfully getting clang@mp compiled on Mac (10.9.4), I got stuck compiling graphchi-cpp; the errors are as follows:

example_apps/connectedcomponents.cpp:47:18: fatal error: 'cmath' file not found

I checked the 'Makefile' in the graphchi-cpp folder: the compiler was set to 'clang2', which is an OpenMP-enabled clang (C) compiler, version 3.5. I actually want to set the compiler to clang2++ (the C++ compiler) instead, but when I checked the clang@mp installation directory, no clang++ compiler had been generated; I suspect the version of LLVM with OpenMP enabled is too old. How can I solve this issue?

Not getting validation rmse below 0.95 with MovieLens 100k

Hi,
I am trying out different algorithms with MovieLens 100k (ua.base and ua.test),
but the best RMSE I could get is with biassgd:

./toolkits/collaborative_filtering/biassgd --training=movielense.train --validation=movielense.test --biassgd_lambda=0.25 --biassgd_gamma=0.10 --minval=1 --maxval=5 --max_iter=100 --quiet=1

Iteration: 99 Training RMSE: 0.906892 Validation RMSE: 0.955748

Can you post some of the best tunings? I have seen some here: http://mymedialite.net/examples/datasets.html but could not translate them directly.

Also, how do I control the number of latent features (the hidden layer size, in the case of RBM)?
