ucbrise / confluo
Real-time Monitoring and Analysis of Data Streams
Home Page: https://ucbrise.github.io/confluo/
License: Apache License 2.0
Some parameters shared between the server and the client (e.g., TIME_RESOLUTION_NS) can become inconsistent if either the server or the client changes them. We need to add a mechanism to ensure consistency for these parameters.
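One possible mechanism, as a minimal sketch: the client fetches the server's shared parameters at connection time and fails fast on any mismatch. The function name `verify_shared_params` and the map-based representation are illustrative assumptions, not Confluo API.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch: compare the server's advertised shared parameters
// against the client's compiled-in values and reject the connection on
// any mismatch. Parameter names here are illustrative only.
void verify_shared_params(const std::map<std::string, uint64_t>& server_params,
                          const std::map<std::string, uint64_t>& client_params) {
  for (const auto& kv : client_params) {
    auto it = server_params.find(kv.first);
    if (it == server_params.end() || it->second != kv.second) {
      throw std::runtime_error("Shared parameter mismatch: " + kv.first);
    }
  }
}
```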
Support projections for cases where not all columns are required in the output of a query.
Support Golang client and demo.
Right now, the time-period for trigger evaluation defaults to whatever the filter granularity is. We should dissociate the two and make sure the monitor thread correctly aggregates required filters at the proper time-period.
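To illustrate the dissociation, a sketch (function name and granularities are hypothetical, not the monitor thread's actual code) of aggregating fine-grained filter buckets into a coarser trigger evaluation period:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: sum per-bucket counts at filter granularity
// `bucket_ns` over one trigger period `period_ns` (a multiple of it),
// so the trigger period need not equal the filter granularity.
uint64_t aggregate_period(const std::vector<uint64_t>& bucket_counts,
                          size_t start_bucket, uint64_t bucket_ns,
                          uint64_t period_ns) {
  size_t n = static_cast<size_t>(period_ns / bucket_ns);
  uint64_t total = 0;
  for (size_t i = start_bucket; i < start_bucket + n && i < bucket_counts.size(); ++i)
    total += bucket_counts[i];
  return total;
}
```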
storage_mode is used by both the read tail and the data log, so the current in_memory mode allocates from a mempool only if the requested size matches; otherwise it calls malloc. We need a cleaner way to make these allocations.
The rpc_dialog_writer hasn't had remove_trigger, remove_filter or remove_index tested. Add test cases to the C++ writer client test to cover this functionality.
The current reflog is a monolog_exp2, which has exponentially growing buckets. This would require a memory pool for each bucket. Instead, the reflog should have exponentially growing bucket containers that have pointers to fixed size buckets. The buckets can then be allocated using a single memory pool.
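A minimal sketch of the proposed layout (not Confluo's actual reflog code): a growing container of pointers to fixed-size buckets, so that every bucket allocation has the same size and can be served by one memory pool. Here `std::vector` and `new` stand in for the bucket container and the mempool:

```cpp
#include <cstddef>
#include <vector>

// Sketch: all buckets have the same fixed size, so a single memory pool
// could serve every bucket allocation; only the pointer container grows.
template <typename T, size_t BUCKET_SIZE = 1024>
class fixed_bucket_log {
 public:
  void push_back(const T& val) {
    size_t bucket = size_ / BUCKET_SIZE;
    if (bucket == buckets_.size())
      buckets_.push_back(new T[BUCKET_SIZE]);  // stand-in for a mempool allocation
    buckets_[bucket][size_ % BUCKET_SIZE] = val;
    ++size_;
  }
  const T& at(size_t idx) const {
    return buckets_[idx / BUCKET_SIZE][idx % BUCKET_SIZE];
  }
  size_t size() const { return size_; }
  ~fixed_bucket_log() { for (T* b : buckets_) delete[] b; }
 private:
  std::vector<T*> buckets_;  // bucket container; growth handled by std::vector here
  size_t size_ = 0;
};
```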
Add (potentially modified) SQL interface for supporting queries, adding/removing filters/indexes/triggers, etc.
The current query planner has a few issues that need to be resolved:
The following methods of the monolog implementations need to be tested if they are to be used in the future:
void set(size_t idx, const T* data, size_t len);
void set_unsafe(size_t idx, const T* data, size_t len);
void get(T* data, size_t idx, size_t len);
size_t storage_size();
On Linux machines the vm.max_map_count limit is hit very quickly (the default on an EC2 c4.8xlarge instance is 65536). It must be raised manually (e.g., sysctl -w vm.max_map_count=<new-limit>) for archival to succeed, since reflogs have relatively small buckets and each bucket is memory-mapped.
Instead, the storage_allocator could memory-map larger-than-requested chunks of files and hand out the excess of the mapped range on future calls. It would also have to map at a different granularity depending on what is being mmapped (data log vs. reflog buckets); this could be determined simply from the size passed to the mmap call.
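A rough sketch of the idea, using anonymous mappings in place of file-backed ones; the class and its interface are hypothetical stand-ins for the actual storage_allocator:

```cpp
#include <cstddef>
#include <sys/mman.h>

// Sketch under assumptions: reserve one larger-than-requested mapping and
// carve sub-ranges out of it on later calls, so many small allocations
// consume a single map entry instead of one each (reducing pressure on
// vm.max_map_count). A real allocator would mmap file ranges instead.
class chunked_allocator {
 public:
  explicit chunked_allocator(size_t chunk_size) : chunk_size_(chunk_size) {}
  void* allocate(size_t size) {
    if (cur_ == nullptr || offset_ + size > chunk_size_) {
      // Map a fresh chunk; the excess beyond `size` serves future calls.
      void* m = mmap(nullptr, chunk_size_, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (m == MAP_FAILED) return nullptr;
      cur_ = static_cast<char*>(m);
      offset_ = 0;
    }
    void* p = cur_ + offset_;
    offset_ += size;
    return p;
  }
 private:
  size_t chunk_size_;
  char* cur_ = nullptr;
  size_t offset_ = 0;
};
```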
I get a compilation error when building Confluo. I have already read the other compile-error issues, but none of them matches my problem. Could you help me? Thank you very much.
error message:
In file included from /root/work/confluo/libconfluo/../libutils/utils/atomic.h:7:0,
from /root/work/confluo/libconfluo/confluo/archival/archival_utils.h:4,
from /root/work/confluo/libconfluo/src/archival/archival_utils.cc:1:
/usr/include/c++/4.8.2/atomic: In instantiation of ‘struct std::atomic<confluo::storage::encoded_ptr >’:
/root/work/confluo/libconfluo/confluo/storage/swappable_encoded_ptr.h:367:32: required from ‘class confluo::storage::swappable_encoded_ptr’
/root/work/confluo/libconfluo/src/archival/archival_utils.cc:17:42: required from here
/usr/include/c++/4.8.2/atomic:167:7: error: function ‘std::atomic<_Tp>::atomic() [with _Tp = confluo::storage::encoded_ptr]’ defaulted on its first declaration with an exception-specification that differs from the implicit declaration ‘std::atomic<confluo::storage::encoded_ptr >::atomic()’
atomic() noexcept = default;
^
make[2]: *** [libconfluo/CMakeFiles/confluo.dir/src/archival/archival_utils.cc.o] Error 1
My environment configuration:
boost 1.69
gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
I only build the RPC component; all other components are OFF.
Expose Sketch API to be able to:
Expose corresponding remote API.
Command line parser sometimes maps short options to completely unrelated arguments.
Implement count-min-sketch (supposedly more space-efficient than count-sketch)
Time-adaptive sketch: decay accuracy & storage for older values (using inflation)
Integrate with atomic multilog for offline/online approximate queries.
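For reference, a minimal count-min sketch (unrelated to Confluo's actual sketch implementation): d hash rows of w counters, where the point estimate of a key's count is the minimum over its d counters, which can only overestimate:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Minimal count-min sketch: each of the d rows hashes the key to one of
// w counters; estimates take the row-wise minimum. The per-row hash here
// is a crude mix for illustration, not a proper pairwise-independent family.
class count_min_sketch {
 public:
  count_min_sketch(size_t w, size_t d)
      : w_(w), counts_(d, std::vector<uint64_t>(w, 0)) {}
  void add(uint64_t key) {
    for (size_t i = 0; i < counts_.size(); ++i)
      counts_[i][hash(key, i)] += 1;
  }
  uint64_t estimate(uint64_t key) const {
    uint64_t est = UINT64_MAX;
    for (size_t i = 0; i < counts_.size(); ++i)
      est = std::min(est, counts_[i][hash(key, i)]);
    return est;
  }
 private:
  size_t hash(uint64_t key, size_t row) const {
    return std::hash<uint64_t>{}(key * (row + 1) + row) % w_;
  }
  size_t w_;
  std::vector<std::vector<uint64_t>> counts_;
};
```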
If possible, check in the CMakeLists.txt whether all of the dependencies for the Python client are installed. Currently, a missing Python dependency is not detected at configure time and the build fails later:
Traceback (most recent call last):
File "setup.py", line 3, in <module>
from setuptools import setup
ImportError: No module named setuptools
pyclient/CMakeFiles/pyclient_build.dir/build.make:57: recipe for target 'pyclient/CMakeFiles/pyclient_build' failed
make[2]: *** [pyclient/CMakeFiles/pyclient_build] Error 1
CMakeFiles/Makefile2:1281: recipe for target 'pyclient/CMakeFiles/pyclient_build.dir/all' failed
make[1]: *** [pyclient/CMakeFiles/pyclient_build.dir/all] Error 2
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2
Currently, data is double-copied on the server side:
This resolves the Java bug where it prints "Received n" for every RPC call.
Appending to an atomic multilog even when it has no filters and indexes may have a large memory overhead despite only having to update a single data structure (the data log). Need to profile to discover possible memory leak(s).
Archive data, filters and indexes periodically and when hitting a memory limit.
Support nullable types for records that may have missing values.
Possibly caused by atomic multilog append latency.
Currently the client must read from and write to a table using a contiguous string buffer. Add an interface to the C++ and Python RPC clients to read and write using JSON for easier usability. Validate all fields when a record is written using this interface.
The documentation does not mention any of this, and there is nothing about subscriptions in the client API.
How can Confluo be used as a pub/sub system?
These tests would check if we can successfully recover a table from an on-disk version of the data.
Store/compute all aggregates as double-precision values rather than the specific type of the aggregated attribute. This avoids overflow and similar issues for smaller types.
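A small example of the motivation (the helper functions are hypothetical): summing a 32-bit attribute in its own type wraps around, while a double accumulator does not:

```cpp
#include <cstddef>
#include <cstdint>

// Illustration: accumulating in the attribute's own (small) type wraps
// modulo 2^32, while accumulating in double does not (at some cost in
// precision for very large sums).
uint32_t sum_u32(const uint32_t* v, size_t n) {
  uint32_t s = 0;
  for (size_t i = 0; i < n; ++i) s += v[i];  // wraps modulo 2^32
  return s;
}
double sum_dbl(const uint32_t* v, size_t n) {
  double s = 0;
  for (size_t i = 0; i < n; ++i) s += v[i];  // no 32-bit wraparound
  return s;
}
```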
Add a parser for memory sizes: e.g., the values provided could be 10k, 100M, or 5g, and would be parsed to the appropriate number of bytes.
size_t configuration_params::MAX_MEMORY = dialog_conf.get<size_t>(
"max_memory", constants::DEFAULT_MAX_MEMORY);
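A possible shape for such a parser; the function name and the exact suffix rules (case-insensitive k/m/g, powers of 1024, bare numbers as bytes) are assumptions, not existing Confluo code:

```cpp
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical parser: "10k" -> 10 * 1024, "100M" -> 100 * 1024^2,
// "5g" -> 5 * 1024^3; a bare number is taken as bytes.
uint64_t parse_memory_size(const std::string& s) {
  size_t pos = 0;
  uint64_t value = std::stoull(s, &pos);
  if (pos == s.size()) return value;  // plain byte count, no suffix
  if (pos + 1 != s.size()) throw std::invalid_argument("bad size: " + s);
  switch (std::tolower(static_cast<unsigned char>(s[pos]))) {
    case 'k': return value << 10;
    case 'm': return value << 20;
    case 'g': return value << 30;
    default: throw std::invalid_argument("bad suffix: " + s);
  }
}
```

The parsed value could then feed configuration entries such as max_memory above.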
The way the current radix_tree_node is written does not permit a simple mempool allocation of the entire struct.
Currently the record size for a schema is fixed; switch to dynamically sized records.
Determining if the pthread_create exist failed with the following output:
Change Dir: /root/confluo_test/confluo/build/CMakeFiles/CMakeTmp
Run Build Command:"/usr/bin/gmake" "cmTC_6011a/fast"
/usr/bin/gmake -f CMakeFiles/cmTC_6011a.dir/build.make CMakeFiles/cmTC_6011a.dir/build
gmake[1]: Entering directory `/root/confluo_test/confluo/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_6011a.dir/CheckSymbolExists.c.o
/usr/bin/cc -o CMakeFiles/cmTC_6011a.dir/CheckSymbolExists.c.o -c /root/confluo_test/confluo/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c
Linking C executable cmTC_6011a
/usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_6011a.dir/link.txt --verbose=1
/usr/bin/cc -rdynamic CMakeFiles/cmTC_6011a.dir/CheckSymbolExists.c.o -o cmTC_6011a
CMakeFiles/cmTC_6011a.dir/CheckSymbolExists.c.o: In function `main':
CheckSymbolExists.c:(.text+0x16): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
gmake[1]: *** [cmTC_6011a] Error 1
gmake[1]: Leaving directory `/root/confluo_test/confluo/build/CMakeFiles/CMakeTmp'
gmake: *** [cmTC_6011a/fast] Error 2
File /root/confluo_test/confluo/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:
/* */
#include <pthread.h>
int main(int argc, char** argv)
{
(void)argv;
#ifndef pthread_create
return ((int*)(&pthread_create))[argc];
#else
(void)argc;
return 0;
#endif
}
Determining if the function pthread_create exists in the pthreads failed with the following output:
Change Dir: /root/confluo_test/confluo/build/CMakeFiles/CMakeTmp
Run Build Command:"/usr/bin/gmake" "cmTC_f4480/fast"
/usr/bin/gmake -f CMakeFiles/cmTC_f4480.dir/build.make CMakeFiles/cmTC_f4480.dir/build
gmake[1]: Entering directory `/root/confluo_test/confluo/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_f4480.dir/CheckFunctionExists.c.o
/usr/bin/cc -DCHECK_FUNCTION_EXISTS=pthread_create -o CMakeFiles/cmTC_f4480.dir/CheckFunctionExists.c.o -c /usr/local/share/cmake-3.13/Modules/CheckFunctionExists.c
Linking C executable cmTC_f4480
/usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_f4480.dir/link.txt --verbose=1
/usr/bin/cc -DCHECK_FUNCTION_EXISTS=pthread_create -rdynamic CMakeFiles/cmTC_f4480.dir/CheckFunctionExists.c.o -o cmTC_f4480 -lpthreads
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status
gmake[1]: *** [cmTC_f4480] Error 1
gmake[1]: Leaving directory `/root/confluo_test/confluo/build/CMakeFiles/CMakeTmp'
gmake: *** [cmTC_f4480/fast] Error 2
A generic parsing framework would eliminate code redundancy in parsing:
Allow filters to use a function instead of just an expression. The filter class already accepts a function in the constructor but there's currently no way for an RPC client to specify a function, so the default_filter is always used.
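A sketch of what a function-based filter could look like; the `record` type and the `apply_filter` helper are stand-ins for illustration, not Confluo's classes:

```cpp
#include <functional>
#include <vector>

// Hypothetical: a filter backed by an arbitrary predicate rather than a
// parsed expression. `record` is a stand-in type, not Confluo's record.
struct record { long value; };
using filter_fn = std::function<bool(const record&)>;

std::vector<record> apply_filter(const std::vector<record>& in,
                                 const filter_fn& f) {
  std::vector<record> out;
  for (const auto& r : in)
    if (f(r)) out.push_back(r);  // keep records the predicate accepts
  return out;
}
```

Letting an RPC client register such a predicate (rather than always falling back to the default_filter) is the open part of this issue.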
Get alerts by time and trigger name
Subscribe to alerts
Active alerts
Corresponding C++/Python client methods and tests
Hi there,
I see there's been some activity in a branch called chain-replication, but it'd be nice to know what you folks have in mind for a scale-out architecture. :)
Create a separate build process for debugging that enables the debug flag and does not enable compiler optimizations.
Currently the client exposes buffered reads and writes that assume sequential access patterns. These should be moved to a separate streaming interface.
The ad hoc filter creates an empty plan for queries on long fields with no upper bound (e.g., value >= 100). This may extend to other types as well, but that has not been confirmed.
Improve allocation using pools and file consolidation.
Hi, I'm trying to make install Confluo. However, the installation fails with errors while fetching from the Maven repository. I have a Maven proxy (mirror); where can I set it in the Confluo source?
[ 32%] Building CXX object libconfluo/CMakeFiles/confluo.dir/src/confluo_store.cc.o
In file included from /home/root/C++/confluo/libconfluo/confluo/aggregate/aggregate_info.h:7:0,
from /home/root/C++/confluo/libconfluo/confluo/aggregated_reflog.h:5,
from /home/root/C++/confluo/libconfluo/confluo/filter.h:4,
from /home/root/C++/confluo/libconfluo/confluo/archival/load_utils.h:7,
from /home/root/C++/confluo/libconfluo/confluo/atomic_multilog.h:11,
from /home/root/C++/confluo/libconfluo/confluo/confluo_store.h:8,
from /home/root/C++/confluo/libconfluo/src/confluo_store.cc:1:
/home/root/C++/confluo/libconfluo/confluo/parser/aggregate_parser.h:32:27: error: expected identifier before ‘(’ token
(std::string, agg)
^
/home/root/C++/confluo/libconfluo/confluo/parser/aggregate_parser.h:32:41: error: ‘agg’ has not been declared
(std::string, agg)
^
/home/root/C++/confluo/libconfluo/confluo/parser/aggregate_parser.h:33:23: error: ‘field_name’ has not been declared
(std::string, field_name))
^
/home/root/C++/confluo/libconfluo/confluo/parser/aggregate_parser.h:33:33: error: ‘parameter’ declared as function returning a function
(std::string, field_name))
^
/home/root/C++/confluo/libconfluo/confluo/parser/aggregate_parser.h:35:1: error: expected constructor, destructor, or type conversion before ‘namespace’
namespace confluo {
^
make[2]: *** [libconfluo/CMakeFiles/confluo.dir/src/confluo_store.cc.o] Error 1
make[1]: *** [libconfluo/CMakeFiles/confluo.dir/all] Error 2
make: *** [all] Error 2
GCC VERSION:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)
BOOST VERSION:
1.53.0
LOCATE:
boost: /usr/include/boost
Improve and add documentation where none exists in several header files, including but not limited to:
storage.h
monolog_linear.h
monolog_exp2.h
dialog_table.h
expression.h
tiered_index.h
The User Guide at https://ucbrise.github.io/confluo/ could use a full Python version. At this time it's completely unclear how to use Confluo from Python.
Currently, the trigger evaluation framework uses real time (more precisely, server time) to determine which time buckets to check for trigger evaluation. This can be an issue if the application supplies the timestamps and those timestamps have no correlation with server time.
If a filter or index was invalidated it shouldn't be valid again on recovery/restart.
A reader thread freeing memory does not know the id of the writer thread that allocated the pointer in the first place.
Currently, Confluo requires a fixed schema for each atomic multilog, which restricts different applications like network monitoring and diagnosis. We want to support flexible schemas (e.g., JSON) to facilitate these applications.