laurolins / nanocube
License: Eclipse Public License 1.0
Hello,
I ran into a problem when I tried to configure it:
configure: WARNING: boost/thread.hpp: present but cannot be compiled
configure: WARNING: boost/thread.hpp: check for missing prerequisite headers?
configure: WARNING: boost/thread.hpp: see the Autoconf documentation
configure: WARNING: boost/thread.hpp: section "Present But Cannot Be Compiled"
configure: WARNING: boost/thread.hpp: proceeding with the compiler's result
I googled a lot and tried every solution I could find, but it still does not work.
Any clue?
system config:
os: ubuntu 14.04.1
gcc/g++ 4.8.2
thanks a lot.
Hello there,
I've checked the Nabble forum but haven't found a related topic, so...
Here I am, let's try this beautiful stuff :)
So
my env is:
- OS X
- python installed with pandas in a separate env (I followed your guide)
my csv is (header and first line):
type,date,longitude,latitude,gender,age,ethnicity,outcome,clothes_removal
Person search,2014-12-01T00:10:00+00:00,-2.571604,51.414716,Male,over 34,White,Nothing found - no further action,
...many other lines
unset PYTHONHOME
unset PYTHONPATH
source $NANOCUBE_SRC/myPy/bin/activate
creating the dump (some warnings about dates/deprecation etc., but it seems OK)
nanocube-binning-csv --sep=',' --latcol='latitude' --loncol='longitude' --timecol='date' --catcol='type','gender','age','ethnicity','outcome','clothes_removal' path_to_csv/stop_search.csv > stop_search.dmp
that's the .dmp header (first 37 lines + a blank line):
name: same_path_to_csv/stop_search.csv
encoding: binary
metadata: location__origin degrees_mercator_quadtree25
field: location nc_dim_quadtree_25
field: type nc_dim_cat_1
valname: type 0 Person_and_Vehicle_search
valname: type 1 Person_search
field: gender nc_dim_cat_1
valname: gender 1 Male
valname: gender 2 Other
valname: gender 0 Female
field: age nc_dim_cat_1
valname: age 1 18-24
valname: age 2 25-34
valname: age 0 10-17
valname: age 4 under_10
valname: age 3 over_34
field: ethnicity nc_dim_cat_1
valname: ethnicity 3 White
valname: ethnicity 2 Other
valname: ethnicity 1 Black
valname: ethnicity 0 Asian
field: outcome nc_dim_cat_1
valname: outcome 0 Article_found_-Detailed_outcome_unavailable
valname: outcome 2 Nothing_found-_no_further_action
valname: outcome 6 Suspect_arrested
valname: outcome 4 Offender_given_drugs_possession_warning
valname: outcome 7 Suspect_summonsed_to_court
valname: outcome 1 Local_resolution
valname: outcome 3 Offender_cautioned
valname: outcome 5 Offender_given_penalty_notice
field: clothes_removal nc_dim_cat_1
valname: clothes_removal 1 True
valname: clothes_removal 0 False
metadata: tbin 2014-12-01_00:00:00_3600s
field: date nc_dim_time_2
field: count nc_var_uint_4
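The .dmp header above is plain text and easy to read programmatically. Here is a minimal sketch of a parser for the `field:`/`valname:` lines shown above; this is an illustration only, not nanocube's official parser, and it ignores the `metadata:` lines:

```python
def parse_dmp_header(lines):
    """Parse 'field:' and 'valname:' lines of a nanocube .dmp text header.

    Returns (fields, valnames):
      fields:   field name -> type string (e.g. 'nc_dim_cat_1')
      valnames: field name -> {category code (int): label}
    """
    fields = {}
    valnames = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "field:":
            fields[parts[1]] = parts[2]
        elif parts[0] == "valname:":
            _, field, code, label = parts[:4]
            valnames.setdefault(field, {})[int(code)] = label
    return fields, valnames

header = [
    "field: gender nc_dim_cat_1",
    "valname: gender 1 Male",
    "valname: gender 2 Other",
    "valname: gender 0 Female",
]
fields, valnames = parse_dmp_header(header)
print(fields["gender"])       # nc_dim_cat_1
print(valnames["gender"][0])  # Female
```

This kind of helper is handy for sanity-checking the category codes that nanocube-binning-csv assigned before feeding the dump to the server.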
cat stop_search.dmp | nanocube-leaf -q 29512 -f 10000
VERSION: 3.2.1 Could not find program: /Users/my_user_name/nanocube-3.2.1/bin/nc_q25_c1_c1_c1_c1_c1_c1_u2_u4
Where did I mess something up? :(
Thanks in advance,
Luca
Sorry for my verbosity, I just hope it helps.
Hi there! We want to deploy nanocubes through an iframe in web portals. We've implemented a replace rule on the application nginx server that rewrites "http://" and "https://" to just "//" (protocol-independent URLs), but this doesn't appear sufficient: the JavaScript served by nanocube makes http requests, which breaks if the front-end portal runs over https. Equally, if we hard-code https URLs in the nanocube backend, we can't support portals running over http. Is there a protocol-independent solution we can implement on the nanocube server side? Any guidance would be most appreciated. Kind regards, Mark
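For what it's worth, the protocol-relative rewrite being asked about is easy to express as a normalization step; this Python sketch (not an existing nanocube option, just the string logic) shows the transformation the nginx rule is trying to achieve:

```python
def make_protocol_relative(url):
    """Strip an explicit http/https scheme so the browser inherits the
    scheme of the embedding page (a 'protocol-relative' URL)."""
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            return "//" + url[len(prefix):]
    return url

print(make_protocol_relative("http://example.com/tile/0/0/0"))
# //example.com/tile/0/0/0
```

The catch described in the issue is that this only helps for URLs written into markup; URLs assembled inside JavaScript at request time bypass the rewrite, which is why a server-side or client-side fix is still needed.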
Dear @laurolins/nanocube support,
I've recently discovered your toolkit and I'm testing it.
The test demo about crimes on my machine at localhost now works perfectly, BUT...
I usually set exported variables in my bash .profile.
It seems that your configuration and installation depend on the name of the variables and not only on their content.
That's the case for NANOCUBE_SRC: the dependency seems to be on the variable's name rather than its value.
I suppose there is some hardcoded dependency on it; I tried renaming it to "NANOCUBE_HOME" in every parameter used during configuration, but it fails at the make/make install step.
I hope my report is helpful for your development.
Best regards
Luca
I installed all the dependencies as described in the README, as follows:
sudo apt-get install build-essential
sudo apt-get install automake
sudo apt-get install libtool
sudo apt-get install zlib1g-dev
sudo apt-get install libboost-all-dev
sudo apt-get install libcurl4-openssl-dev
Then I entered the following commands into the console:
wget https://github.com/laurolins/nanocube/archive/3.2.1.zip
unzip 3.2.1.zip
cd nanocube-3.2.1
export NANOCUBE_SRC=`pwd`
./bootstrap
mkdir build
cd build
../configure --prefix=$NANOCUBE_SRC CXXFLAGS="-O3"
Then the problem appeared at the last command, configure.
The error is shown below; I don't know if I need to do some extra steps to solve it.
checking for Boost's header version...
configure: error: invalid value: boost_major_version=
Any ideas about it? Thanks.
Running the following command :
../configure --prefix=$NANOCUBE_SRC CXXFLAGS="-O3"
Gives me the following output :
configure: Detected BOOST_ROOT; continuing with --with-boost=/usr/include/boost157
checking for Boost headers version >= 1.48.0... no
configure: error: cannot find Boost headers version >= 1.48.0
Before that, using the ./bootstrap command, I received the following error :
configure.ac:30: error: possibly undefined macro: LT_LIB_DLLOAD
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation
PS: I had installed boost on my machine before, following this tutorial
Hi All,
I followed the tutorial to install the nanocube from master branch.
I run this in scripts folder:
python csv2Nanocube.py --catcol='Primary Type' --latcol='Latitude' --loncol='Longitude' crime50k.csv | NANOCUBE_BIN=../src ../src/ncserve --rf=100000 --threads=100
In the browser: http://localhost:8000
I can see the control panel, but the map side is blank, with the top-right and bottom-left parts gray.
Is that normal?
Thank you,
Colin
Would you like to replace any double quotes by angle brackets around file names for include statements?
Would you like to add the configuration script "AX_PTHREAD" to your build specification?
Hi,
I've got in my csv file a time column with years, from 1901 to 2014.
When I execute that command, I get no error.
python csv2Nanocube.py --sep="," --catcol='type' --latcol="lat" --loncol="lon" --timecol="year" file.csv | NANOCUBE_BIN=../src ../src/ncserve --rf=100000 --threads=100
But in my browser, the time plot is not correct: it stops in 1908 and the curves are wrong.
How can I solve this? I tried using --datefmt but it didn't work.
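One thing worth checking with a year-only time column is how the bare years are being interpreted as dates. A sketch with the standard library (assuming the column holds integer years) that makes the conversion explicit before binning:

```python
from datetime import datetime

def year_to_date(year):
    """Map a bare year like 1901 to a concrete timestamp (Jan 1 of that
    year), so a time binner sees unambiguous dates instead of small
    integers that a lenient parser might misread."""
    return datetime.strptime(str(year), "%Y")

dates = [year_to_date(y) for y in (1901, 1908, 2014)]
print(dates[0].isoformat())  # 1901-01-01T00:00:00
```

Pre-converting the column this way (and re-exporting the CSV with full dates) sidesteps any ambiguity in the csv2Nanocube.py date handling.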
Hi there! I'm grappling with an OLAP style problem and I'm hoping to apply nanocubes, but I'm not entirely sure how well my problem maps to this domain.
I've got an event stream representing changes to a set of entities. Something like 30 million entities, each of which might have a dozen dimensions. New events for each entity could arrive years or seconds apart. There is no spatial component to the data.
I mostly answer queries along the lines of 'at midnight every day between 2015-01-01 and 2015-07-31, how many entities had dimensions A = 1, B = 8, C = 3'. Maybe a colloquial way of stating the problem could be 'at midnight each day, how many people are watching netflix, eating popcorn, and wearing red socks'. My event stream only tells me when events change.
So in Postgres (after months of research into the validity of this approach) I end up building table partitions for each dimension, each row containing the entity id, the dimension's value, and the tsrange for which this fact was true. Then the problem reduces to intersecting time ranges, and building plain old macro scale cubes to cache aggregation results. But the bloat is staggering: ~6 GB of compressed data when unpacked this way and indexed tops 120 GB, and I'm not even considering all the possible dimensions yet. I feel like I'm forcing myself towards a big data problem I shouldn't have.
How might one introduce the concept of an event with a duration into a nanocube? If you can point me in the right direction I'll be sure to contribute some sample code back to the repo :)
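To make the "entities active at a timestamp" query concrete: the interval-intersection counting that the Postgres tsrange layout performs can be answered with two sorted arrays and binary search, without materializing per-day cubes. This is a sketch of that idea only, not nanocube code:

```python
import bisect

def count_active(intervals, t):
    """Count half-open intervals [start, end) that cover timestamp t.

    intervals: list of (start, end) pairs, e.g. the validity range of a
    fact like 'dimension A = 1' for one entity. For large data the two
    sorted arrays would be built once and reused across queries.
    """
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    # intervals started on or before t, minus intervals already ended
    return bisect.bisect_right(starts, t) - bisect.bisect_right(ends, t)

facts = [(0, 10), (5, 15), (12, 20)]  # hypothetical (valid_from, valid_to)
print(count_active(facts, 7))   # 2: the first two intervals cover t=7
```

A nanocube-flavored variant of the same trick is to ingest each interval as two events, +1 at start and -1 at end, and take a running sum over the time dimension.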
Would you like to add more error handling for return values from functions like the following?
Hello,
I am trying to comprehend the algorithm to build a nanocube. My final goal is to multithread the building process.
To understand the pseudocode which comes with the nanocube paper, I tried to apply it to the illustration on page 2 (Fig. 2).
Adding the first point (o1) works fine and I get the exact same result as illustrated. If I try to add the second point (o2) I run into the following problem:
Starting from the nanocube #1 in Fig. 2, we want to add the second point o2 ((0,1), (01,10) ; IPhone). According to the pseudocode on page 3 (Fig. 3), the following instructions will be performed in this order:
updated_nodes = empty set
ADD(nano_cube, o2, 1, S, ltime, updated_nodes)
[l1, l2] = CHAIN(S, 1)
stack = TRAILPROPERPATH(nano_cube, [l1(o2), l2(o2)])
stack = STACK()
PUSH(stack, nano_cube)
node = nano_cube
child = CHILD(nano_cube, (0,1))
PUSH(stack, (0,1))
node = (0,1)
child = CHILD((0,1), (01,10))
PUSH(stack, (01,10))
node = (01,10)
return stack
child = null
node = POP(stack) // (01,10)
update = false
update = true // Content is proper
ADD(catNode, o2, 2, updated_nodes) // catNode is the node under (01,10)
[ld] = CHAIN(S, 2) //ld is the device labeling function
stack = TRAILPROPERPATH(catNode, [ld(o2)])
stack = STACK()
PUSH(stack, catNode)
node = catNode
child = CHILD(catNode, “IPhone”)
child = NEWPROPERCHILD(catNode, “IPhone”, NODE())
PUSH(stack, “IPhone”)
node = child
return stack
child = null
node = POP(stack) // “IPhone” node
update = false
SETPROPERCONTENT(“IPhone”, SUMMEDTABLETIMESERIES()) //IPhoneTimeSeries
update = true
INSERT(IPhoneTimeSeries, ltime(o2))
INSERT(updated_nodes, IPhoneTimeSeries)
child = “IPhone” // IPhone Node
node = POP(stack) // catNode
update = false
### Weirdness begins here ### Content of catNode is shared and not in updated_nodes
shallowCopy = SHALLOWCOPY(androidTimeSeries) //CONTENT(catNode) is the Android Timeseries, isn’t it?
node_sc = NODE()
SETSHAREDCONTENT(node_sc, CONTENT(androidTimeSeries)) //what is the value of the second argument in this case? What is the content of a timeseries?
Did I do something wrong?
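For what it's worth, here is how I read the step where the weirdness begins, as a toy Python sketch of the paper's idea (not the real implementation): SHALLOWCOPY produces a new node whose content pointer aliases the original content, and SETSHAREDCONTENT marks that the content is borrowed rather than owned:

```python
class Node:
    """Toy nanocube node: content plus a flag saying whether the content
    is 'proper' (owned by this node) or 'shared' (borrowed)."""
    def __init__(self):
        self.content = None
        self.content_is_proper = False

    def set_proper_content(self, content):
        self.content = content
        self.content_is_proper = True

    def set_shared_content(self, content):
        self.content = content
        self.content_is_proper = False

def shallow_copy(node):
    """SHALLOWCOPY: a new node sharing (not duplicating) the content."""
    copy = Node()
    copy.set_shared_content(node.content)
    return copy

cat_node = Node()
cat_node.set_proper_content(["timeseries-entry"])  # e.g. the Android time series
sc = shallow_copy(cat_node)
print(sc.content is cat_node.content)  # True: the content object is aliased
print(sc.content_is_proper)            # False: sc does not own it
```

Under this reading, the "content of a timeseries" in SETSHAREDCONTENT is just the same timeseries object, held through a shared (non-owning) reference until a later insertion forces it to become proper.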
When there are a lot of points, you can't tell which points are hot. Maybe a heat map would be useful.
Hi,
I'm currently trying to pass a custom dmp file to nanocube in order to visualize my dataset.
But firstly, I wanted to test the example with the sftaxi.dmp given here :
https://github.com/laurolins/nanocube/blob/master/web/README
I follow the instructions, but when I go here :
http://localhost:8000/sftaxi_src.html
I see nothing and get the following error in firebug:
TypeError: str is undefined, leaflet-src.js (line 138)
I tried to update leaflet (to 0.7.3), but no more luck.
Do you have any ideas?
EDIT:
Another question, in the same thread. In the example.dmp given on the wiki, the data seems to be encoded, and there is no "header" with fields like in the sftaxi.dmp. Besides, in the sftaxi.dmp the data is written in the clear... I'm a bit lost.
Finally, could you explain the following command to me?
cat sftaxi.dmp | \
    ncdmp --encoding=b \
        dim-dmq=src,src_lat,src_lon,25 \
        dim-dmq=dst,dst_lat,dst_lon,25 \
        dim-tbin=time,time,2008-01-01_1h,2 \
        var-one=count,4 \
    | ncserve --rf=100000 --port=29513
Thank you in advance
Julien
How does one retrieve the extents of different dimensions?
In particular, the time dimensions have the lower bound specified in the tbin
metadata, but is there an upper bound? Are bounds known for the spatial dimensions?
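As far as I can tell, the upper bound is not stored anywhere; it can only be derived from the tbin origin, the bin width, and the largest bin index actually present. A sketch, assuming the `tbin` string format `YYYY-MM-DD_HH:MM:SS_<width>s` seen in the dump headers elsewhere in this thread:

```python
from datetime import datetime, timedelta

def tbin_bounds(tbin, max_bin_index):
    """Derive [lower, upper) time bounds from a tbin metadata string
    like '2014-12-01_00:00:00_3600s' plus the largest bin index seen
    in the data. The string format is an assumption based on the
    dump headers quoted in this thread."""
    date_part, time_part, width_part = tbin.split("_")
    origin = datetime.strptime(date_part + " " + time_part, "%Y-%m-%d %H:%M:%S")
    width = timedelta(seconds=int(width_part.rstrip("s")))
    return origin, origin + (max_bin_index + 1) * width

lo, hi = tbin_bounds("2014-12-01_00:00:00_3600s", 23)
print(lo.isoformat())  # 2014-12-01T00:00:00
print(hi.isoformat())  # 2014-12-02T00:00:00
```

The max bin index itself would have to come from a time query over the whole cube; spatial extents can similarly be recovered by walking coarse quadtree tiles and seeing which are non-empty.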
Hi guys,
I compiled 1.0 successfully, but now what shall I do to try out the web client? I assume I have to feed some data to stdin of the stree_serve binary?
Thanks so much!
Daniel
Is it possible to save (on disk) or cut a piece of the tree, so that it becomes viable for a data streaming problem?
checking for Boost headers version >= 1.48.0... yes
checking for Boost's header version... 1_58_0
checking for the flags needed to use pthreads... -pthread
checking for the toolset name used by Boost for g++... configure: WARNING: could not figure out which toolset name to use for g++
checking boost/system/error_code.hpp usability... yes
checking boost/system/error_code.hpp presence... yes
checking for boost/system/error_code.hpp... yes
checking for the Boost system library... yes
checking boost/thread.hpp usability... no
checking boost/thread.hpp presence... yes
configure: WARNING: boost/thread.hpp: present but cannot be compiled
configure: WARNING: boost/thread.hpp: check for missing prerequisite headers?
configure: WARNING: boost/thread.hpp: see the Autoconf documentation
configure: WARNING: boost/thread.hpp: section "Present But Cannot Be Compiled"
configure: WARNING: boost/thread.hpp: proceeding with the compiler's result
configure: WARNING: ## ----------------------------------- ##
configure: WARNING: ## Report this to [email protected] ##
configure: WARNING: ## ----------------------------------- ##
checking for boost/thread.hpp... no
configure: error: cannot find boost/thread.hpp
~/nanocube-3.2.1/build$ uname -a
Linux mijn 2.6.32-042stab093.4 #1 SMP Mon Aug 11 18:47:39 MSK 2014 x86_64 GNU/Linux
Note I had to edit the configure script because it wasn't getting the Boost header version. Here's how I "fixed" that issue:
boost_cv_lib_version=1_58_0 #`cat conftest.i`
[root@centos nanocube-master]# make
make all-recursive
make[1]: Entering directory `/tmp/nanocube-master'
Making all in src
make[2]: Entering directory `/tmp/nanocube-master/src'
depbase=`echo ncdmp.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
g++ -DHAVE_CONFIG_H -I. -I.. -I../src -I/usr/include -I/usr/include -D_GLIBCXX_USE_NANOSLEEP -D_GLIBCXX_USE_SCHED_YIELD -pthread -DVERSION="2014.03.25_13:26" -g -O2 -std=c++0x -MT ncdmp.o -MD -MP -MF $depbase.Tpo -c -o ncdmp.o ncdmp.cc && \
mv -f $depbase.Tpo $depbase.Po
In file included from ncdmp_base.hh:1,
from ncdmp.cc:1:
DumpFile.hh:111: error: function definition does not declare parameters
DumpFile.hh: In member function 'bool dumpfile::DumpFileDescription::isBinary() const':
DumpFile.hh:105: error: 'encoding' was not declared in this scope
DumpFile.hh: In member function 'bool dumpfile::DumpFileDescription::isText() const':
DumpFile.hh:107: error: 'encoding' was not declared in this scope
ncdmp.cc: In function 'int main(int, char**)':
ncdmp.cc:226: error: expected initializer before ':' token
ncdmp.cc:229: error: expected primary-expression before '}' token
ncdmp.cc:229: error: expected ';' before '}' token
ncdmp.cc:229: error: expected primary-expression before '}' token
ncdmp.cc:229: error: expected ')' before '}' token
ncdmp.cc:229: error: expected primary-expression before '}' token
ncdmp.cc:229: error: expected ';' before '}' token
ncdmp.cc:232: error: 'struct dumpfile::DumpFileDescription' has no member named 'encoding'
ncdmp.cc:234: error: 'struct dumpfile::DumpFileDescription' has no member named 'encoding'
make[2]: *** [ncdmp.o] Error 1
make[2]: Leaving directory `/tmp/nanocube-master/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/nanocube-master'
make: *** [all] Error 2
Hello,
We at YP Mobile Labs are currently working with this interesting technology to visualize some of our datasets.
We compiled ourselves a nc_q25_c1_c1_c1_c1_c1_c1_c1_c1_c1_c1_c1_c1_u2_u4 cube, which works fine, but it takes approximately two weeks to put ~200M points in it.
How can we speed up the building process? We thought about multithreading the building process. If that is a suitable way, can you guys provide us with some information and tips on how to do this in the best way possible?
Hi guys,
So I downgraded my pandas version to 0.15=6.2, but it does not accept 'to_numeric'.
So my question is: which version should I use to make it work?
Regards,
Hawk
Is there any reason why Docker isn't supported by the project?
nanocube-binning-csv expects the column names to be present in the first line of the file.
When we deal with files of 15-20 GB, it is time-consuming to insert that line.
It would be good if we could pass the column schema via the command line rather than as the first line of the file.
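As a workaround until such a flag exists, the header can be injected into the stream instead of editing the 15-20 GB file on disk. A sketch (the column names here are just examples):

```python
import csv
import io

def with_header(stream, columns):
    """Yield a synthetic header line followed by the original rows, so a
    header-expecting CSV tool can consume a headerless file without the
    file itself ever being rewritten."""
    yield ",".join(columns) + "\n"
    for line in stream:
        yield line

headerless = io.StringIO("1.0,2.0,2014-12-01\n3.0,4.0,2014-12-02\n")
rows = list(csv.DictReader(with_header(headerless, ["latitude", "longitude", "date"])))
print(rows[0]["latitude"])  # 1.0
```

The same idea works at the shell level: prepend the header with `cat` from a one-line file and pipe the result into nanocube-binning-csv, so no multi-gigabyte rewrite is needed.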
Hi. Our web application is only used in China. Can nanocube show only one country? How can I do that?
Our "core" query language is a conjunction of clauses across dimensions. This core language is good in that if the query is resolution-bounded, so is the time it takes to answer the query.
At the same time, very common and important queries fall outside this. (Typical example: difference between two heatmaps).
This issue will track the progress of a general querying infrastructure.
Avoid crashing the server on this assertion:
Assertion failed: (min_address.level == max_address.level && min_address.x <= max_address.x && min_address.y <= max_address.y), function visitRange, file ../../src/QuadTree.hh, line 1146.
Line 168 has a bug: var n_records = data.byteLength / record_size should be changed to var n_records = Math.floor(data.byteLength / record_size);
when make
I've installed Nanocubes 3.2.1, and Chicago crime data.
There appears to be a bug when playing around with the time slider: in particular, if
you move the time slider brush outside of the plotted data range, all interactive controls stop working and you need to hit page refresh. It happens in both Chrome and Safari. Haven't tested Firefox.
Temporary demo site here:
http://52.64.86.244/nanocube/
I am in China. The lat and lng are shifted. I don't know what coordinate system nanocube uses. Is it WGS84?
Noticed in src/nc.cc there are a fair number of unavailable API methods defined.
IDs are a concern to my team. We would prefer to use nanocubes for our solution, but without retrieving an original reference to the dataset returned from a query, we will be unable to feed the nanocube result to query an external service.
When will the API be fully developed?
In the BrightKite demo, if you select multiple bars from one histogram, it only uses the selection with the lowest index. So if you select both 'Mon' and 'Tue', it will act as though only 'Mon' is selected, even though that isn't true (to see this, note that the time series chart doesn't change).
I'm trying a new dataset, and during the build process, I get
terminate called after throwing an instance of 'std::runtime_error'
what(): Invalid Path Size
I don't know if this error message is from FlatTree.hh or from FlatTreeN.hh, though.
I'm happy to share the dataset, but I wanted to confirm this is a bug and not a problem in my data formatting. It happens about 800k elements into the dataset, so some of the dataset does get processed correctly.
In FlatTreeSerialization.hh, line 38: type->getNumFields() resets to 0 after the number of categories crosses 256. Something to do with the data type.
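That reset-at-256 behavior is what you would expect if the count is stored in a single byte somewhere (the cat dimensions here are 1 byte wide, hence the `_c1` suffix); an unsigned 8-bit counter wraps to 0 at 256. A quick illustration of the wraparound being described, not the actual C++ code:

```python
def uint8(n):
    """Simulate storing a count in an unsigned 8-bit integer:
    only the low 8 bits survive."""
    return n & 0xFF

print(uint8(255))  # 255
print(uint8(256))  # 0  <- the observed reset
print(uint8(257))  # 1
```

If that diagnosis is right, the fix is to widen the type holding the count (or the serialized field) beyond one byte.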
Hi,
Is it possible to select two or more columns as categories with the "--catcol" option? Like in this example: http://nanocubes.net/view.html#flights
Julien
Use one hierarchy to implement a multi-tag bit set.
I would like to point out that identifiers like "__mg_master_callback" and "__server" do not fit the naming conventions of the C language standard (identifiers beginning with a double underscore are reserved for the implementation).
Would you like to adjust your selection of unique names?
I read the documentation and the example, but I can't understand var-uint and var-one. Can you give an example?
According to our project, we would like to know the coordinates of the polygon corners while the user is drawing it, in particular on click event on the map; so we would like to use this code of the library Leaflet JS or something similar:
L.ClickHandler = L.Handler.extend({
    addHooks: function() {
        L.DomEvent.on(document, 'click', this._captureClick, this);
    },
    removeHooks: function() {
        L.DomEvent.off(document, 'click', this._captureClick, this);
    },
    _captureClick: function(event) {
        // Convert the raw mouse event into geographic coordinates.
        var latLng = mymap.mouseEventToLatLng(event);
        alert(latLng);
        return latLng;
    }
});
L.Map.addInitHook('addHandler', 'click', L.ClickHandler);
var mymap = L.map('mapid', {
    click: true
}).setView([51.505, -0.09], 13);
and obtaining a similar result.
Our question is: where could we insert this function in your project? Where and how is the map defined in your project?
Thank you for your attention
Best regards,
Nicholas
Does the current implementation support adding new data to a nanocube that is currently in use?
It would be a good idea to allow users to configure the cross-origin headers emitted by nanocube http servers. Right now we're hard-coding "Access-Control-Allow-Origin: *", which means that if we ever want to add a layer of security it needs to happen via a proxy. That is possible but painful for users to configure. We could add a command-line option for users to change * to whatever they want.
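A sketch of what the command-line side of this could look like. The flag name `--allow-origin` is hypothetical and the real server is C++; this only illustrates the proposed behavior, with the default preserving today's hard-coded `*`:

```python
import argparse

def build_parser():
    """Hypothetical CLI: a configurable CORS origin whose default
    matches the current hard-coded behavior."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--allow-origin", default="*")
    return parser

# An operator locking the server down to one portal origin:
args = build_parser().parse_args(["--allow-origin", "https://portal.example"])
headers = {"Access-Control-Allow-Origin": args.allow_origin}
print(headers["Access-Control-Allow-Origin"])  # https://portal.example
```

Because the default stays `*`, existing deployments would be unaffected unless the flag is passed.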
Please could you advise how to get past this error?
mongoose.o: In function `load_dll':
/home/bencevans/Development/nanocube/src/mongoose.c:3834: undefined reference to `dlopen'
/home/bencevans/Development/nanocube/src/mongoose.c:3846: undefined reference to `dlsym'
One is badly needed :)
Hi All,
I followed the tutorial to install nanocube
And I run this script in scripts folder:
python csv2Nanocube.py --catcol='Primary Type' crime50k.csv | NANOCUBE_BIN=../src ../src/ncserve --rf=100000 --threads=100
and I got the following log:
//*********************************************
"VERSION: 2014.03.25_13:26
nc_dim_quadtree_25
quadtree dimension with 25 levels
nc_dim_cat_1
categorical dimension with 1 bytes
nc_dim_time_2
time dimension with 2 bytes
nc_var_uint_4
time dimension with 4 bytes
Dimensions: q25_c1
Variables: u2_u4
Registering handler: query
Registering handler: binquery
Registering handler: binqueryz
Registering handler: tile
Registering handler: tquery
Registering handler: bintquery
Registering handler: bintqueryz
Registering handler: stats
Registering handler: schema
Registering handler: valname
Registering handler: tbin
Registering handler: summary
Registering handler: graphviz
Registering handler: version
Registering handler: timing
Registering handler: start
Starting NanoCubeServer on port 29512
Mongoose starting 100 threads
ncserve: TaggedPointer.hh:47: void tagged_pointer::TaggedPointer::setPointer(T) [with T = quadtree::Node<flattree::FlatTree<timeseries::TimeSeries<nanocube::TimeSeriesEntryType<boost::mpl::vector<nanocube::u2, nanocube::u4> > > > >]: Assertion `data.aux.tag == 0 || data.aux.tag == 0xFF' failed."
**************************************************************************************************************//
I think the nanocubes server is not running. I tried to change the nanocubes server port by editing config.json (in the web folder), but this file does not exist.
Could you give me some advices?
P.S.: when I ran the "make" command to compile nanocubes, I got:
//*********************************************
“In file included from ContentHolder.hh:3:0,
from QuadTreeNode.hh:11,
from QuadTree.hh:13,
from NanoCube.hh:34,
from nc.cc:12:
TaggedPointer.hh:14:41: warning: left shift count >= width of type [enabled by default]
static const UInt64 bit47 = (1UL << 47);”
***************************************************************************************************************//
Is that normal?
Best,
LinhTH
It might be worth adding a warning to the user anytime an out-of-order (in the TimeSeries dimension) insertion happens. It's an easy mistake to make that triggers a 100x slowdown (it's accidentally just bitten us), and it's not completely obvious that the out-of-order time variable is the culprit.
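A cheap pre-insertion check along the suggested lines (a sketch only; the real warning would live in the C++ insertion path):

```python
def check_time_order(timestamps):
    """Yield (index, t) for every timestamp that is earlier than its
    predecessor, i.e. an out-of-order insertion in the time dimension
    that would trigger the slow path."""
    last = None
    for i, t in enumerate(timestamps):
        if last is not None and t < last:
            yield i, t
        last = t

violations = list(check_time_order([1, 2, 5, 3, 7, 6]))
print(violations)  # [(3, 3), (5, 6)]
```

Running something like this over a dump before ingestion (or emitting one warning the first time the condition trips inside the builder) would make the 100x slowdown much easier to diagnose.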
I'm generating my queries from a custom language, so I'm making a lot of syntax errors in my queries. Things like missing parentheses and extra parentheses.
Instead of getting an error, these queries wait for about a minute and then return an empty response. It would be more helpful for them to fail, even if they are not specific about the syntax error.
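Until the server reports syntax errors, queries can be sanity-checked on the client before sending. A minimal balance check covering just the parenthesis mistakes mentioned (a sketch, not the server's grammar):

```python
def balanced_parens(query):
    """Return True iff parentheses in the query string are balanced,
    catching the 'missing or extra parenthesis' class of mistakes
    before the query is ever sent to the server."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a close with no matching open
                return False
    return depth == 0

print(balanced_parens("a(b(c))"))  # True
print(balanced_parens("a(b(c)"))   # False
```

Rejecting such queries locally avoids the minute-long wait for an empty response.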
Is there a more graceful way to stop a nanocube server than sending SIGQUIT or SIGKILL or whatever? I want to stop the server every night so that it can be reloaded with new data (there are updates released daily).
No, we don’t have that on the master branch. On v1.0 we had a mechanism by sending a shutdown request (something like http://host:port/shutdown=key). I will add an issue to include back this feature on the master branch.
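In the meantime, the nightly-restart use case can also be handled with signals: trap SIGTERM, finish in-flight work, then exit. A Python sketch of the pattern only (the real server is C++, and this is not an existing nanocube feature):

```python
import os
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    # Flip a flag instead of dying immediately, so the main loop can
    # drain in-flight requests and then exit cleanly.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the nightly restart script sending SIGTERM to the server.
os.kill(os.getpid(), signal.SIGTERM)
print(shutting_down)  # True: the serving loop would now finish and exit
```

With this in place, the nightly reload becomes "send SIGTERM, wait for exit, start the server with the new data" rather than a hard SIGKILL.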