ucbrise / confluo

Real-time Monitoring and Analysis of Data Streams

Home Page: https://ucbrise.github.io/confluo/

License: Apache License 2.0

CMake 1.89% C++ 91.68% Shell 0.42% C 0.02% Thrift 0.27% Python 2.52% Java 3.21%
atomic lock-free log

confluo's People

Contributors

anuragkh, attaluris, concurrencypractitioner, dependabot[bot], haal, khandelwalwires, louishust, neilgiri, neilgoldman, shaneknapp, ujvl

confluo's Issues

Consistent parameters across Confluo servers and clients

Some parameters shared between the server and the client (e.g., TIME_RESOLUTION_NS) can become inconsistent if either the server or the client changes them. We need to add a mechanism to ensure consistency for these parameters.
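
One possible mechanism is a handshake in which the client sends its copy of the shared parameters and the server reports any that differ. The sketch below only illustrates that comparison; `param_map` and `mismatched_params` are hypothetical names rather than Confluo APIs, and TIME_RESOLUTION_NS is the only parameter taken from this issue.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation of the parameters both sides must agree on.
using param_map = std::map<std::string, std::uint64_t>;

// Returns the names of parameters that are missing on one side or whose
// values differ. An empty result means the two sides are consistent.
std::vector<std::string> mismatched_params(const param_map& server,
                                           const param_map& client) {
  std::vector<std::string> out;
  for (const auto& kv : server) {
    auto it = client.find(kv.first);
    if (it == client.end() || it->second != kv.second) {
      out.push_back(kv.first);
    }
  }
  // Parameters known only to the client are also reported.
  for (const auto& kv : client) {
    if (server.find(kv.first) == server.end()) {
      out.push_back(kv.first);
    }
  }
  return out;
}
```

A server could refuse the connection, or a client could adopt the server's values, whenever the returned list is non-empty.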

Support projections

Support projections for cases where not all columns are required in the output of a query.
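
As a rough illustration of what a projection does, the sketch below resolves requested column names against a schema once and then copies only those columns from each record. It assumes, purely for simplicity, that a record is a vector of strings; `record`, `resolve_columns`, and `project` are hypothetical names, not Confluo's record or schema types.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Simplifying assumption: one string value per column.
using record = std::vector<std::string>;

// Maps requested column names to their positions in the schema.
std::vector<std::size_t> resolve_columns(const std::vector<std::string>& schema,
                                         const std::vector<std::string>& wanted) {
  std::vector<std::size_t> out;
  for (const auto& name : wanted) {
    auto it = std::find(schema.begin(), schema.end(), name);
    if (it == schema.end()) {
      throw std::invalid_argument("unknown column: " + name);
    }
    out.push_back(static_cast<std::size_t>(it - schema.begin()));
  }
  return out;
}

// Copies only the selected columns, in the requested order.
record project(const record& full, const std::vector<std::size_t>& cols) {
  record out;
  out.reserve(cols.size());
  for (std::size_t c : cols) {
    out.push_back(full.at(c));
  }
  return out;
}
```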

Add trigger time-period as a configurable parameter

Right now, the time-period for trigger evaluation defaults to whatever the filter granularity is. We should dissociate the two and make sure the monitor thread correctly aggregates required filters at the proper time-period.

storage_mode usage

storage_mode is used by both the read tail and data log, so the current in_memory mode allocates using a mempool only if the size requested matches; otherwise it calls malloc. Need a cleaner way to make the allocations.

More tests for C++ writer client

The rpc_dialog_writer hasn't had remove_trigger, remove_filter or remove_index tested. Add test cases to the C++ writer client test to cover this functionality.

Monolog compatible with mempool design

The current reflog is a monolog_exp2, which has exponentially growing buckets. This would require a memory pool for each bucket. Instead, the reflog should have exponentially growing bucket containers that have pointers to fixed size buckets. The buckets can then be allocated using a single memory pool.
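
The proposed layout can be sketched as follows. This is a single-threaded illustration only, with no atomics and with `new[]` standing in for the memory pool; `bucketed_log` and its members are hypothetical names, not the monolog implementation. Container c holds 2^c bucket pointers, so the pointer containers grow exponentially while every bucket has the same fixed size.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

template <typename T, std::size_t BUCKET_SIZE = 1024>
class bucketed_log {
 public:
  struct location {
    std::size_t container;  // which pointer container
    std::size_t slot;       // which bucket pointer inside that container
    std::size_t offset;     // position inside the fixed-size bucket
  };

  // Maps a flat index to (container, slot, offset).
  static location locate(std::size_t idx) {
    std::size_t pos = idx / BUCKET_SIZE + 1;  // 1-based bucket number
    std::size_t container = 0;
    while ((pos >> (container + 1)) != 0) ++container;  // floor(log2(pos))
    return {container, pos - (std::size_t(1) << container), idx % BUCKET_SIZE};
  }

  void set(std::size_t idx, const T& value) {
    location loc = locate(idx);
    if (loc.container >= containers_.size()) {
      containers_.resize(loc.container + 1);
    }
    std::vector<std::unique_ptr<T[]>>& slots = containers_[loc.container];
    if (slots.empty()) slots.resize(std::size_t(1) << loc.container);
    if (!slots[loc.slot]) slots[loc.slot] = allocate_bucket();
    slots[loc.slot][loc.offset] = value;
  }

  const T& get(std::size_t idx) const {
    location loc = locate(idx);
    assert(loc.container < containers_.size());
    assert(!containers_[loc.container].empty());
    assert(containers_[loc.container][loc.slot]);
    return containers_[loc.container][loc.slot][loc.offset];
  }

  std::size_t buckets_allocated() const { return buckets_allocated_; }

 private:
  // Every bucket has the same size, so one pool could serve all of them.
  // Plain new[] is used here as a stand-in for that pool.
  std::unique_ptr<T[]> allocate_bucket() {
    ++buckets_allocated_;
    return std::unique_ptr<T[]>(new T[BUCKET_SIZE]());
  }

  std::vector<std::vector<std::unique_ptr<T[]>>> containers_;
  std::size_t buckets_allocated_ = 0;
};
```

Because only the pointer containers vary in size, the pool never has to serve more than one bucket size.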

Add SQL interface

Add (potentially modified) SQL interface for supporting queries, adding/removing filters/indexes/triggers, etc.

Cleanup Query Planner

The current query planner has a few issues that need to be resolved:

  • Does not work if there are no indexes for a minterm (yields zero results)
  • Reduce filter is hard-coded
  • Does not handle all operators (e.g., !=)
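
For the missing `!=` operator, one option is to rewrite the predicate as a union of range lookups. The sketch below assumes that indexes answer inclusive range queries over 64-bit signed keys; `range` and `not_equal_ranges` are hypothetical names and do not reflect the planner's actual types.

```cpp
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Inclusive [first, second] key range (assumed index lookup unit).
using range = std::pair<std::int64_t, std::int64_t>;

// Rewrites `field != v` as the union of the ranges below and above v.
// At the extremes of the key space one of the two ranges is empty and
// is therefore omitted.
std::vector<range> not_equal_ranges(std::int64_t v) {
  const std::int64_t lo = std::numeric_limits<std::int64_t>::min();
  const std::int64_t hi = std::numeric_limits<std::int64_t>::max();
  std::vector<range> out;
  if (v > lo) out.emplace_back(lo, v - 1);
  if (v < hi) out.emplace_back(v + 1, hi);
  return out;
}
```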

Add tests for contiguous read/write monolog methods

The following methods of the monolog implementations need to be tested if they will be used in the future.

void set(size_t idx, const T* data, size_t len);
void set_unsafe(size_t idx, const T* data, size_t len);
void get(T* data, size_t idx, size_t len);
size_t storage_size();

More intelligent mmapping by storage_allocator

On Linux machines the vm.max_map_count limit is hit very quickly (default on an ec2 c4.8xlarge instance is 65536). This needs to be manually increased for archival to be successful (since reflogs have relatively small buckets and each bucket is memory mapped).

Instead the storage_allocator could potentially memory map larger-than-requested chunks of files and store the excess of the available range for future calls. It would also have to map at a different granularity depending on what's being mmapped (data log vs reflog buckets). This could be done by just looking at the size passed to the mmap call.
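
The idea can be sketched roughly as below. The sketch uses anonymous mappings instead of file-backed ones, never unmaps, and abandons the unused tail of a chunk when a request does not fit; `chunked_mapper` is a hypothetical name, not the storage_allocator interface, and the 64-byte alignment is an arbitrary choice.

```cpp
#include <sys/mman.h>

#include <cstddef>
#include <stdexcept>

class chunked_mapper {
 public:
  explicit chunked_mapper(std::size_t chunk_size) : chunk_size_(chunk_size) {}

  // Returns `size` bytes of mapped memory. Small requests (e.g. reflog
  // buckets) are carved out of a shared chunk; requests larger than the
  // chunk size (e.g. data log buckets) get their own mapping.
  void* map(std::size_t size) {
    const std::size_t align = 64;
    size = (size + align - 1) & ~(align - 1);
    if (size > chunk_size_) return raw_map(size);
    if (cur_ == nullptr || size > remaining_) {
      cur_ = static_cast<char*>(raw_map(chunk_size_));
      remaining_ = chunk_size_;
    }
    void* out = cur_;
    cur_ += size;
    remaining_ -= size;
    return out;
  }

  // Number of mmap calls made so far; this is what counts against
  // vm.max_map_count.
  std::size_t num_mappings() const { return num_mappings_; }

 private:
  void* raw_map(std::size_t len) {
    void* p = ::mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
    ++num_mappings_;
    return p;
  }

  std::size_t chunk_size_;
  char* cur_ = nullptr;
  std::size_t remaining_ = 0;
  std::size_t num_mappings_ = 0;
};
```

With a 1 MiB chunk, a thousand 256-byte requests share a single mapping instead of consuming a thousand entries.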

An error when compiling Confluo

I get an error when compiling Confluo. I have already read the other compile-error issues, but none of them matches my problem. Could you help me? Thank you very much.

error message:
In file included from /root/work/confluo/libconfluo/../libutils/utils/atomic.h:7:0,
from /root/work/confluo/libconfluo/confluo/archival/archival_utils.h:4,
from /root/work/confluo/libconfluo/src/archival/archival_utils.cc:1:
/usr/include/c++/4.8.2/atomic: In instantiation of ‘struct std::atomic<confluo::storage::encoded_ptr >’:
/root/work/confluo/libconfluo/confluo/storage/swappable_encoded_ptr.h:367:32: required from ‘class confluo::storage::swappable_encoded_ptr’
/root/work/confluo/libconfluo/src/archival/archival_utils.cc:17:42: required from here
/usr/include/c++/4.8.2/atomic:167:7: error: function ‘std::atomic<_Tp>::atomic() [with _Tp = confluo::storage::encoded_ptr]’ defaulted on its first declaration with an exception-specification that differs from the implicit declaration ‘std::atomic<confluo::storage::encoded_ptr >::atomic()’
atomic() noexcept = default;
^
make[2]: *** [libconfluo/CMakeFiles/confluo.dir/src/archival/archival_utils.cc.o] Error 1

My environment configuration:
boost 1.69
gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)

I only build the RPC component; all other components are OFF.

Expose Sketch API for multilog

Expose Sketch API to be able to:

  • Add/remove a sketch on a field
  • Execute ad-hoc queries on frequencies
  • Evaluate approximate triggers in real-time

Expose corresponding remote API.

More thorough testing of compression

  • Need to write more tests to cover edge cases of delta and lz4 encoding/decoding
    • Different array sizes
    • Different data sets (current tests are monotonically increasing elements with constant increments)
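
As an illustration of the kind of round-trip test meant here, the sketch below uses a plain, unpacked delta transform as a stand-in for the real encoder. The function names are hypothetical and the input sets are only examples of non-monotonic and boundary data.

```cpp
#include <cstdint>
#include <vector>

// Stand-in encoder: stores each value as the difference from its
// predecessor. Unsigned arithmetic wraps, so decreasing inputs are fine.
std::vector<std::uint64_t> delta_encode(const std::vector<std::uint64_t>& in) {
  std::vector<std::uint64_t> out;
  out.reserve(in.size());
  std::uint64_t prev = 0;
  for (std::uint64_t v : in) {
    out.push_back(v - prev);
    prev = v;
  }
  return out;
}

// Inverse transform: a running sum of the deltas.
std::vector<std::uint64_t> delta_decode(const std::vector<std::uint64_t>& deltas) {
  std::vector<std::uint64_t> out;
  out.reserve(deltas.size());
  std::uint64_t prev = 0;
  for (std::uint64_t d : deltas) {
    prev += d;
    out.push_back(prev);
  }
  return out;
}

bool round_trips(const std::vector<std::uint64_t>& in) {
  return delta_decode(delta_encode(in)) == in;
}

// Example inputs beyond "monotonic with constant increments".
std::vector<std::vector<std::uint64_t>> edge_cases() {
  return {
      {},                           // empty array
      {42},                         // single element
      {10, 3, 7, 7, 0},             // non-monotonic with repeats
      {0, UINT64_MAX, 1},           // extremes of the value range
      std::vector<std::uint64_t>(1000, 5),  // long run of equal values
  };
}
```

A test would call `round_trips` on every entry of `edge_cases()` and repeat the same inputs against the lz4 path.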

Universal sketches, approximate queries

  • Implement count-min-sketch (supposedly more space-efficient than count-sketch)

  • Time-adaptive sketch: decay accuracy & storage for older values (using inflation)

  • Integrate with atomic multilog for offline/online approximate queries.

    • Figure out what the basic interface should look like
    • Consider:
      • cross-stream correlation queries
      • use-cases where the same stream is partitioned across multiple servers
      • storage-accuracy tradeoff, blowfish
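
For reference, a count-min sketch can be written in a few lines. The version below is a minimal, single-threaded illustration and is not integrated with the atomic multilog; the class name and the row hashing, which salts `std::hash` with the row index, are assumptions chosen for brevity rather than for hash quality.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <string>
#include <vector>

class count_min_sketch {
 public:
  count_min_sketch(std::size_t depth, std::size_t width)
      : depth_(depth), width_(width), counts_(depth * width, 0) {
    assert(depth > 0 && width > 0);
  }

  // Adds `count` to one counter in every row.
  void update(const std::string& key, std::uint64_t count = 1) {
    for (std::size_t row = 0; row < depth_; ++row) {
      counts_[row * width_ + bucket(key, row)] += count;
    }
  }

  // The minimum over rows: collisions can only inflate a counter, so the
  // estimate never falls below the true frequency.
  std::uint64_t estimate(const std::string& key) const {
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (std::size_t row = 0; row < depth_; ++row) {
      best = std::min(best, counts_[row * width_ + bucket(key, row)]);
    }
    return best;
  }

 private:
  // Assumed hashing scheme: one salted std::hash per row.
  std::size_t bucket(const std::string& key, std::size_t row) const {
    return std::hash<std::string>()(key + '#' + std::to_string(row)) % width_;
  }

  std::size_t depth_;
  std::size_t width_;
  std::vector<std::uint64_t> counts_;
};
```

Width controls the size of the overestimate and depth controls how likely a large overestimate is, which is the storage-accuracy tradeoff mentioned above.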

Detect missing Python client build dependencies

If possible, in the CMakeLists.txt check if all of the dependencies for the Python client are installed.

Currently if a Python dependency is missing, it isn't detected and the build fails:

Traceback (most recent call last):
File "setup.py", line 3, in <module>
from setuptools import setup
ImportError: No module named setuptools
pyclient/CMakeFiles/pyclient_build.dir/build.make:57: recipe for target 'pyclient/CMakeFiles/pyclient_build' failed
make[2]: *** [pyclient/CMakeFiles/pyclient_build] Error 1
CMakeFiles/Makefile2:1281: recipe for target 'pyclient/CMakeFiles/pyclient_build.dir/all' failed
make[1]: *** [pyclient/CMakeFiles/pyclient_build.dir/all] Error 2
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2

Remove double copy in server side record batches

Currently, data is copied twice on the server side:

  • once while reading it from thrift's serialized format
  • once while converting it to dialog's expected format

These can be collapsed into a single copy with ease, which should improve performance.

Possible memory leak

Appending to an atomic multilog even when it has no filters and indexes may have a large memory overhead despite only having to update a single data structure (the data log). Need to profile to discover possible memory leak(s).

JSON interface for RPC Client

Currently the client must read from and write to a table using a contiguous string buffer. Add an interface to the C++ and Python RPC clients to read and write using JSON for easier usability. Validate all fields when a record is written using this interface.

Aggregates as double precision values

Store/compute all aggregates as double precision values rather than as the specific type of the aggregated attribute. This avoids issues such as overflow for smaller types.
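
The overflow problem is easy to reproduce with a small unsigned type. The sketch below compares a sum kept in the field's own type with a sum kept as a double; both function names are hypothetical. Note that a double represents integers exactly only up to 2^53, so very large 64-bit sums would lose precision.

```cpp
#include <cstdint>
#include <vector>

// Accumulates in the field's own type: wraps around for small types.
template <typename T>
T sum_in_field_type(const std::vector<T>& values) {
  T acc = 0;
  for (T v : values) {
    acc = static_cast<T>(acc + v);
  }
  return acc;
}

// Accumulates as double precision regardless of the field type.
template <typename T>
double sum_as_double(const std::vector<T>& values) {
  double acc = 0.0;
  for (T v : values) {
    acc += static_cast<double>(v);
  }
  return acc;
}
```

For two `uint16_t` values of 40000, the first function wraps to 14464 while the second returns 80000.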

Memory size parser

Add a parser for memory sizes, e.g., the values provided could be 10k, 100M, or 5g, and would be parsed to the appropriate number of bytes.

size_t configuration_params::MAX_MEMORY = dialog_conf.get<size_t>(
    "max_memory", constants::DEFAULT_MAX_MEMORY);

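A minimal sketch of such a parser is shown below. It assumes binary multiples (k = 1024 bytes), case-insensitive single-letter suffixes, and no support for fractional values; `parse_memory_size` is a hypothetical helper, not an existing Confluo function.

```cpp
#include <cctype>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Parses strings such as "10k", "100M", "5g", or "4096" into bytes.
std::size_t parse_memory_size(const std::string& text) {
  if (text.empty() || !std::isdigit(static_cast<unsigned char>(text[0]))) {
    throw std::invalid_argument("memory size must start with a digit: " + text);
  }
  std::size_t pos = 0;
  // std::stoull throws std::out_of_range if the digits do not fit.
  unsigned long long value = std::stoull(text, &pos);

  unsigned shift = 0;  // assumption: k, m, g are powers of 1024
  if (pos < text.size()) {
    if (pos + 1 != text.size()) {
      throw std::invalid_argument("trailing characters in memory size: " + text);
    }
    switch (std::tolower(static_cast<unsigned char>(text[pos]))) {
      case 'k': shift = 10; break;
      case 'm': shift = 20; break;
      case 'g': shift = 30; break;
      default:
        throw std::invalid_argument("unknown memory size suffix: " + text);
    }
  }
  // Reject values that would overflow size_t after scaling.
  if (value > (std::numeric_limits<std::size_t>::max() >> shift)) {
    throw std::out_of_range("memory size too large: " + text);
  }
  return static_cast<std::size_t>(value) << shift;
}
```

The configuration value could then be read as a string and passed through this helper before being assigned to MAX_MEMORY.
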
I have a problem installing Confluo

Determining if the pthread_create exist failed with the following output:
Change Dir: /root/confluo_test/confluo/build/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/gmake" "cmTC_6011a/fast"
/usr/bin/gmake -f CMakeFiles/cmTC_6011a.dir/build.make CMakeFiles/cmTC_6011a.dir/build
gmake[1]: Entering directory `/root/confluo_test/confluo/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_6011a.dir/CheckSymbolExists.c.o
/usr/bin/cc -o CMakeFiles/cmTC_6011a.dir/CheckSymbolExists.c.o -c /root/confluo_test/confluo/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c
Linking C executable cmTC_6011a
/usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_6011a.dir/link.txt --verbose=1
/usr/bin/cc -rdynamic CMakeFiles/cmTC_6011a.dir/CheckSymbolExists.c.o -o cmTC_6011a
CMakeFiles/cmTC_6011a.dir/CheckSymbolExists.c.o: In function `main':
CheckSymbolExists.c:(.text+0x16): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
gmake[1]: *** [cmTC_6011a] Error 1
gmake[1]: Leaving directory `/root/confluo_test/confluo/build/CMakeFiles/CMakeTmp'
gmake: *** [cmTC_6011a/fast] Error 2

File /root/confluo_test/confluo/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:
/* */
#include <pthread.h>

int main(int argc, char** argv)
{
(void)argv;
#ifndef pthread_create
return ((int*)(&pthread_create))[argc];
#else
(void)argc;
return 0;
#endif
}

Determining if the function pthread_create exists in the pthreads failed with the following output:
Change Dir: /root/confluo_test/confluo/build/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/gmake" "cmTC_f4480/fast"
/usr/bin/gmake -f CMakeFiles/cmTC_f4480.dir/build.make CMakeFiles/cmTC_f4480.dir/build
gmake[1]: Entering directory `/root/confluo_test/confluo/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_f4480.dir/CheckFunctionExists.c.o
/usr/bin/cc -DCHECK_FUNCTION_EXISTS=pthread_create -o CMakeFiles/cmTC_f4480.dir/CheckFunctionExists.c.o -c /usr/local/share/cmake-3.13/Modules/CheckFunctionExists.c
Linking C executable cmTC_f4480
/usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_f4480.dir/link.txt --verbose=1
/usr/bin/cc -DCHECK_FUNCTION_EXISTS=pthread_create -rdynamic CMakeFiles/cmTC_f4480.dir/CheckFunctionExists.c.o -o cmTC_f4480 -lpthreads
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status
gmake[1]: *** [cmTC_f4480] Error 1
gmake[1]: Leaving directory `/root/confluo_test/confluo/build/CMakeFiles/CMakeTmp'
gmake: *** [cmTC_f4480/fast] Error 2

Filter using a function

Allow filters to use a function instead of just an expression. The filter class already accepts a function in the constructor but there's currently no way for an RPC client to specify a function, so the default_filter is always used.

Clustering/scale-out roadmap?

Hi there,

I see there's been some activity in a branch called chain-replication but it'd be nice to know what you folks have in mind for scale-out architecture. :)

Separate debugging build

Create a separate build process for debugging that enables the debug flag and does not enable compiler optimizations.

Range query bug

The ad hoc filter creates an empty plan for queries on long fields with no upper bound (e.g., value >= 100). This may extend to other types, but that has not been confirmed.

Where to set a Maven repo proxy?

Hi, I'm trying to run make install for confluo. However, the installation fails with errors when fetching from the Maven repo. I have a Maven proxy (mirror). Where can I set it in the confluo source?

I have a problem compiling Confluo

[ 32%] Building CXX object libconfluo/CMakeFiles/confluo.dir/src/confluo_store.cc.o
In file included from /home/root/C++/confluo/libconfluo/confluo/aggregate/aggregate_info.h:7:0,
from /home/root/C++/confluo/libconfluo/confluo/aggregated_reflog.h:5,
from /home/root/C++/confluo/libconfluo/confluo/filter.h:4,
from /home/root/C++/confluo/libconfluo/confluo/archival/load_utils.h:7,
from /home/root/C++/confluo/libconfluo/confluo/atomic_multilog.h:11,
from /home/root/C++/confluo/libconfluo/confluo/confluo_store.h:8,
from /home/root/C++/confluo/libconfluo/src/confluo_store.cc:1:
/home/root/C++/confluo/libconfluo/confluo/parser/aggregate_parser.h:32:27: error: expected identifier before ‘(’ token
(std::string, agg)
^
/home/root/C++/confluo/libconfluo/confluo/parser/aggregate_parser.h:32:41: error: ‘agg’ has not been declared
(std::string, agg)
^
/home/root/C++/confluo/libconfluo/confluo/parser/aggregate_parser.h:33:23: error: ‘field_name’ has not been declared
(std::string, field_name))
^
/home/root/C++/confluo/libconfluo/confluo/parser/aggregate_parser.h:33:33: error: ‘parameter’ declared as function returning a function
(std::string, field_name))
^
/home/root/C++/confluo/libconfluo/confluo/parser/aggregate_parser.h:35:1: error: expected constructor, destructor, or type conversion before ‘namespace’
namespace confluo {
^
make[2]: *** [libconfluo/CMakeFiles/confluo.dir/src/confluo_store.cc.o] Error 1
make[1]: *** [libconfluo/CMakeFiles/confluo.dir/all] Error 2
make: *** [all] Error 2

GCC VERSION:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)

BOOST VERSION:
1.53.0

LOCATE :
boost: /usr/include/boost

Improve documentation

Improve and add documentation where none exists in several header files, including but not limited to:

  • storage.h
  • monolog_linear.h
  • monolog_exp2.h
  • dialog_table.h
  • expression.h
  • tiered_index.h

Trigger evaluation should be agnostic to application timestamps

Currently, the trigger evaluation framework uses realtime (more precisely, server time) to determine which time-buckets to check for trigger evaluation. This can potentially be an issue if the application supplies the timestamps, and these timestamps have no correlation with server time.

Memory pool thread id

A reader thread freeing memory does not know the id of the writer thread that allocated the pointer in the first place.
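
One way to make the owner recoverable at free time is to store it in a small header in front of every allocation. The sketch below assumes that approach and uses malloc in place of the pool; `alloc_header`, `tagged_alloc`, `owner_of`, and `tagged_free` are hypothetical names, and the per-allocation header is a space overhead the real pool may not want to pay.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Header placed immediately before the user-visible block. Aligning it to
// max_align_t keeps the block that follows suitably aligned as well.
struct alignas(std::max_align_t) alloc_header {
  std::size_t owner_id;  // id of the thread (or pool) that allocated
  std::size_t size;      // requested size in bytes
};

void* tagged_alloc(std::size_t size, std::size_t owner_id) {
  void* raw = std::malloc(sizeof(alloc_header) + size);
  if (raw == nullptr) throw std::bad_alloc();
  alloc_header* header = new (raw) alloc_header{owner_id, size};
  return header + 1;  // user block starts right after the header
}

// Any thread holding the pointer can recover who allocated it.
std::size_t owner_of(const void* ptr) {
  return (static_cast<const alloc_header*>(ptr) - 1)->owner_id;
}

void tagged_free(void* ptr) {
  std::free(static_cast<alloc_header*>(ptr) - 1);
}
```

A reader thread could call `owner_of` before freeing and hand the block back to the writer's pool.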

Support for flexible schema

Currently, Confluo requires a fixed schema for each atomic multilog, which restricts different applications like network monitoring and diagnosis. We want to support flexible schemas (e.g., JSON) to facilitate these applications.
