ccommon's Introduction

ccommon

ccommon is a C library for the various cache projects developed by Twitter's cache team. It is currently used by the unified cache backend.

Origins

The Twitter Cache team started working on a fork of Memcached in 2010, and over time has written various cache backends such as fatcache and slimcache, as well as the cache middle layer twemproxy. These projects have a lot in common, especially when you examine the project structure and the underlying mechanism that drives the runtime. Instead of stretching our effort thin by maintaining several individual code bases, we started building a library that captures the commonality of these projects. It is also our belief that the commonality extends beyond just caching, and can be used as the skeleton for writing many more high-throughput, low-latency services intended for a distributed environment.

Dependencies

Build using CMake

To use CMake, make sure it is already installed and that its version is above 2.8.

# you can also configure and compile in-source, i.e., directly at the project top level, but out-of-source compile is strongly encouraged by CMake.
# For one: there won't be something like "make (dist)clean" to help you clean up the mess afterwards
mkdir _build
cd _build
cmake ..
make

License

This software is licensed under the Apache 2 license, see LICENSE for details.

ccommon's People

Contributors

brayniac, juliaferraioli, kevyang, michalbiesek, noxiouz, paegun, pavanky, seppo0010, slyphon, swlynch99, synecdoche, thinkingfish

ccommon's Issues

duplicate symbols due to definition in header file

include/cc_signal.h defines struct signal signals[SIGNAL_MAX]; which is transitively included in multiple files. This results in undefined behavior. The struct signal signals should be declared as extern in the header and defined in a single translation unit.
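
A minimal sketch of the proposed split, shown as a single listing for brevity. The fields of struct signal and the value of SIGNAL_MAX are placeholders chosen for illustration, not the actual definitions in cc_signal.h.

```c
/* --- cc_signal.h (sketch): declaration only, safe to include anywhere --- */
#define SIGNAL_MAX 32               /* placeholder value */

struct signal {                     /* placeholder fields */
    int         signo;
    const char *info;
};

extern struct signal signals[SIGNAL_MAX];

/* --- cc_signal.c (sketch): the single definition --- */
struct signal signals[SIGNAL_MAX];
```

With this layout every file that includes the header sees only a declaration, and the linker finds exactly one definition of the array.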

Rename some ccommon functions in bstring per slyphon's and kevyang's suggestions

Some names in the bstring module are confusing; we should rename them per @slyphon's suggestion.

Quote from twitter/pelikan#194

Jonathan:

bstring_set_raw might be better named bstring_set_literal
bstring_set_text might be better named bstring_set_char_p or bstring_from_char_ptr

Kevin:

I like bstring_set_literal. I think bstring_set_cstr is a clear name to me, not sure how you guys feel about it though.

Jonathan:

bstring_set_cstr is good. +1
Rust has CStr/CString to refer to a null-terminated string, so it follows that naming convention well.

fix flaky tests

Add a mock timer for tests that rely on timing, such as pipe, tcp and timewheel. These occasionally fail due to timing issues.
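
One possible shape for such a mock timer, as a sketch only: code under test reads the time through a function pointer, and tests repoint it at a clock they advance by hand. All names here (clock_fn, now_ns, mock_advance_ns) are hypothetical and do not exist in ccommon.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical clock hook: returns the current time in nanoseconds. */
typedef uint64_t (*clock_fn)(void);

static uint64_t
real_now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

static uint64_t mock_ns;    /* the time seen by code under test */

static uint64_t
mock_now_ns(void)
{
    return mock_ns;
}

static void
mock_advance_ns(uint64_t delta)
{
    mock_ns += delta;
}

/* Production code calls now_ns(); tests set it to mock_now_ns. */
static clock_fn now_ns = real_now_ns;
```

A test would set now_ns = mock_now_ns, call mock_advance_ns() to cross a deadline, and then check the expected timeout behavior without sleeping.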

Rust-enabled build is broken

Fix rust-enabled build

Expected behavior

CI should be green.

Actual behavior

CI is failing due to Rust-enabled build.

Steps to reproduce the behavior

This started showing up in the last week without code change.

style guide and externs

"- Use of extern should be considered as evil, if it is used in header files to reference global variables."
Why?

struct allocation in initialization

I noticed an asymmetry among different modules, and I was wondering whether there's a reason for it or whether we should unify them. In some cases, the initialization function calls cc_alloc and returns a pointer to the newly created struct[1]; in other cases, it receives a pointer to memory that has not yet been initialized[2]. The latter seems more generic, because it also works with stack allocations.

[1] https://github.com/twitter/ccommon/blob/master/src/cc_log.c#L76
[2] https://github.com/twitter/ccommon/blob/master/include/cc_bstring.h#L52
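
The two styles can be sketched side by side. struct widget and its functions are hypothetical, and malloc stands in for cc_alloc. The sketch also shows that the allocating style can be layered on top of the caller-provided-storage style.

```c
#include <stdbool.h>
#include <stdlib.h>

struct widget {             /* hypothetical module state */
    int  level;
    bool ready;
};

/* Style [2]: the caller owns the storage (stack, static, or heap). */
static void
widget_init(struct widget *w, int level)
{
    w->level = level;
    w->ready = true;
}

/* Style [1]: the module allocates, then reuses the init function. */
static struct widget *
widget_create(int level)
{
    struct widget *w = malloc(sizeof(*w));  /* stands in for cc_alloc */

    if (w == NULL) {
        return NULL;
    }
    widget_init(w, level);
    return w;
}

static void
widget_destroy(struct widget **w)
{
    free(*w);
    *w = NULL;
}
```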

A separate cc_realloc implementation for tests

Buffer operations are often error-prone, especially when raw pointers are used and the underlying buffer is moved before the references are discarded/reset. The one place where buffer movement happens is cc_realloc(). Therefore, if we can make the memory address change every time realloc is called, we can detect such problems more quickly (these are nasty bugs to reproduce in production).

Since we wrap realloc already, it should be easy to swap in our own implementation when debugging/testing. E.g. by turning that into a combo alloc + memcpy, we are guaranteed to get a new address each time.
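
A minimal sketch of the alloc + memcpy idea. Plain realloc does not expose the old size, so this sketch assumes a small size header in front of every block. The names test_alloc, test_free and test_realloc are hypothetical, not part of ccommon.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Header stored in front of every block so the old size is known. */
union test_hdr {
    size_t      size;
    max_align_t align;      /* keeps the payload suitably aligned */
};

static void *
test_alloc(size_t size)
{
    union test_hdr *h = malloc(sizeof(*h) + size);

    if (h == NULL) {
        return NULL;
    }
    h->size = size;
    return h + 1;
}

static void
test_free(void *ptr)
{
    if (ptr != NULL) {
        free((union test_hdr *)ptr - 1);
    }
}

/* Always moves: allocate a new block, copy, then release the old one. */
static void *
test_realloc(void *ptr, size_t size)
{
    void *fresh = test_alloc(size);

    if (fresh == NULL) {
        return NULL;        /* old block is left untouched, like realloc */
    }
    if (ptr != NULL) {
        size_t old = ((union test_hdr *)ptr - 1)->size;

        memcpy(fresh, ptr, old < size ? old : size);
        test_free(ptr);
    }
    return fresh;
}
```

Because the new block is allocated before the old one is released, the returned address always differs from the one passed in, so any stale pointer into the old buffer becomes a use-after-free that memory checkers can flag.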

This will help us prevent problems like twitter/pelikan#98

rust proc_macro for config structs

Add a proc_macro for the config structs (e.g. ccommon-stream, ccommon-debug, ...)

Problem

In #245 we added configuration structs for the ccommon components; however, there is a significant amount of boilerplate.

Solution

I propose adding a proc_macro for generating code to reduce the boilerplate. It should be possible to define the overall configuration struct, with its default values in one location and let the proc_macro expand that out to a complete struct definition, accessors, and corresponding serde annotations.

math on void pointer

In include/cc_array.h on line 112 (sha 05c6e1e) there is the line
return arr->data + arr->size * idx;

arr->data is declared as void * which has no defined operations on it. It is likely that changing it to a char * (and cascading the requirements of that change) is sufficient.
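
A sketch of one way to make the arithmetic well-defined. It casts at the point of use rather than changing the field type, which is an alternative to the char * change suggested above. struct array_sketch is a simplified stand-in, not the real struct array.

```c
#include <stddef.h>
#include <stdint.h>

struct array_sketch {       /* simplified stand-in for struct array */
    size_t  size;           /* size of one element, in bytes */
    void   *data;           /* start of element storage */
};

static inline void *
array_sketch_get(const struct array_sketch *arr, uint32_t idx)
{
    /* cast to a byte pointer first; arithmetic on void * is a GNU extension */
    return (uint8_t *)arr->data + arr->size * idx;
}
```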

add interface to accept with flags

Current tcp_accept() does not allow customized flags and sets the presumed ones (O_NONBLOCK, TCP_NODELAY) using separate calls.

A better interface would be to allow flags to be passed in, and take advantage of accept4 when possible.
accept4 avoids further syscalls when one wants to accept a connection and immediately apply certain flags to the socket.

It is relatively recent (glibc 2.10 and Linux 2.6.28) and is missing from current OS X versions, so a compile-time check and a fallback implementation are needed to make it work universally.
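
A sketch of the compile-time switch, assuming a hypothetical HAVE_ACCEPT4 macro produced by a build-time check. Note that accept4 only takes SOCK_NONBLOCK and SOCK_CLOEXEC, so TCP_NODELAY would still need its own setsockopt call on either path.

```c
#define _GNU_SOURCE         /* exposes accept4() on glibc */
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* HAVE_ACCEPT4 is a hypothetical macro set by a build-time check. */
static int
accept_nonblock(int lfd, struct sockaddr *addr, socklen_t *len)
{
#ifdef HAVE_ACCEPT4
    /* one syscall: accept and set O_NONBLOCK atomically */
    return accept4(lfd, addr, len, SOCK_NONBLOCK);
#else
    /* fallback: accept, then set the flag with fcntl */
    int fd = accept(lfd, addr, len);
    int flags;

    if (fd < 0) {
        return -1;
    }
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
#endif
}
```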

generic hashtable template with optional locking mask

There are 4 hashtables in Pelikan now, each only very slightly different. In general if we can create a hashtable template with the following components pluggable, we can serve them all. This is likely gonna require more macros, but we do use cc_queue.h anyway.

  • type of hash node
  • type of "key" to build hash from
  • hash function used (against key)
  • comparison function on "key"

Additionally, it'd be nice to provide built-in per-entry locking support. This can be achieved by having a bit-mask for all entries, and setting the corresponding bit to true when an entry is being accessed. The granularity of the lock can be adjusted by changing the mapping between bit-mask and hash entry (e.g. 2**n entries mapping to the same bit in mask, n being configurable).
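
A rough sketch of what such a macro template could look like, covering only the four pluggable components and not the locking mask. HT_GENERATE and everything it generates are hypothetical; the example instantiation uses FNV-1a purely for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * HT_GENERATE (hypothetical) emits a chained hash table for one node type.
 *   name       prefix for the generated struct and functions
 *   node_type  struct with a `next` link of type node_type *
 *   key_type   type of the key
 *   keyof      macro or function yielding the key of a node pointer
 *   hashfn     uint32_t hashfn(key_type)
 *   eqfn       int eqfn(key_type, key_type), non-zero when equal
 */
#define HT_GENERATE(name, node_type, key_type, keyof, hashfn, eqfn)         \
struct name {                                                               \
    uint32_t    nbucket;    /* must be a power of two */                    \
    node_type **bucket;                                                     \
};                                                                          \
static void                                                                 \
name##_put(struct name *ht, node_type *n)                                   \
{                                                                           \
    uint32_t i = hashfn(keyof(n)) & (ht->nbucket - 1);                      \
    n->next = ht->bucket[i];                                                \
    ht->bucket[i] = n;                                                      \
}                                                                           \
static node_type *                                                          \
name##_get(struct name *ht, key_type key)                                   \
{                                                                           \
    node_type *n = ht->bucket[hashfn(key) & (ht->nbucket - 1)];             \
    while (n != NULL && !eqfn(keyof(n), key)) {                             \
        n = n->next;                                                        \
    }                                                                       \
    return n;                                                               \
}

/* Example instantiation: string keys. */
struct item {
    struct item *next;
    const char  *key;
    int          val;
};

#define ITEM_KEY(it) ((it)->key)

static uint32_t
str_hash(const char *s)     /* FNV-1a */
{
    uint32_t h = 2166136261u;

    for (; *s != '\0'; ++s) {
        h = (h ^ (uint8_t)*s) * 16777619u;
    }
    return h;
}

static int
str_eq(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

HT_GENERATE(item_ht, struct item, const char *, ITEM_KEY, str_hash, str_eq)
```

Each of the existing tables would then be one HT_GENERATE line with its own node type, key type, hash function and comparison function.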

Of course, if only we use a language that has proper templating...

check_wheel is a little flaky

Mostly due to timing delays, should revisit to either relax timing (and reduce jitter's impact) or find another way that is timing independent.

option to preallocate in contiguous memory in cc_pool

We should allow the option to force contiguous memory for preallocated pools of objects. This may be useful for resource pools where we want memory locality.

This most likely already happens because preallocation calls the allocate function in a tight loop. However, we rely on the memory allocator for this behavior, and depending on the implementation of it we may or may not actually end up with contiguous memory for a preallocated pool.
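
A sketch of what forcing contiguity could look like: one allocation sized for all objects, with each object addressed by index. struct slab and its functions are hypothetical and are not the cc_pool interface. The sketch assumes obj_size is already a multiple of the object's alignment requirement.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical slab: nobj objects of obj_size bytes in one allocation. */
struct slab {
    size_t  obj_size;
    size_t  nobj;
    char   *base;
};

static int
slab_prealloc(struct slab *s, size_t obj_size, size_t nobj)
{
    s->base = calloc(nobj, obj_size);   /* one contiguous region */
    if (s->base == NULL) {
        return -1;
    }
    s->obj_size = obj_size;
    s->nobj = nobj;
    return 0;
}

/* Address of the idx-th object, or NULL when idx is out of range. */
static void *
slab_at(const struct slab *s, size_t idx)
{
    return idx < s->nobj ? s->base + idx * s->obj_size : NULL;
}

static void
slab_release(struct slab *s)
{
    free(s->base);
    s->base = NULL;
    s->nobj = 0;
}
```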

merge notes and docs

We have two different directories for docs about the repo: notes and docs. We should merge these and prune what we don't need.

introduce murmurhash3

In preparation to port fatcache to Pelikan, we need a non-cryptographic hash function that can generate message digests that are longer than what we use for hashtables (typically 32-bit) and have low collision rate.

After reading this post, and a few other blogposts, I think murmurhash3 is an excellent alternative for the SHA-1 that's currently used in fatcache.

I will copy the C++ implementation (MIT license) into ccommon, modify it, and see whether any changes are necessary to make it work with a C compiler. Another C version (provided by qLibc) doesn't seem to yield platform-optimized code the way the canonical version does, and I would need to modify it as well to incorporate the source, so starting from the C++ version makes more sense.

tcp_maximize_sndbuf behavior broken on Linux

I was writing a test for tcp_maximize_sndbuf and it was easy on OS X, but its current implementation makes no sense on Linux (Ubuntu).

The implementation does a binary search with these values:

134250496 0
201342976 0
234889216 0
251662336 0
260048896 0
264242176 0
266338816 0
267387136 0
267911296 0
268173376 0
268304416 0
268369936 0
268402696 0
268419076 0
268427266 0
268431361 0
268433409 0
268434433 0
268434945 0
268435201 0
268435329 0
268435393 0
268435425 0
268435441 0
268435449 0
268435453 0
268435455 0
268435456 0

The left column is the attempted value and the right column is the status received. It can be seen that the status is always 0, meaning that setsockopt never fails even if the value used is higher than the maximum value possible. Calling getsockopt at the end returns the value 425984 (at least on my machine), which is lower than any value attempted.

Should we set it to INT_MAX instead of doing the binary search?
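
The observation is consistent with the Linux socket(7) documentation: for SO_SNDBUF the kernel doubles the requested value and silently caps the request at net.core.wmem_max instead of returning an error. A sketch of a helper that requests a size and then reports what was actually granted, which is the only reliable signal on such systems (sndbuf_request is a hypothetical name):

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask for `want` bytes of send buffer and report what the kernel granted.
 * Returns the effective size from getsockopt(), or -1 on error.
 */
static int
sndbuf_request(int sd, int want)
{
    int got;
    socklen_t len = sizeof(got);

    if (setsockopt(sd, SOL_SOCKET, SO_SNDBUF, &want, sizeof(want)) < 0) {
        return -1;
    }
    if (getsockopt(sd, SOL_SOCKET, SO_SNDBUF, &got, &len) < 0) {
        return -1;
    }
    return got;
}
```

With this, a single large request followed by the read-back would replace the binary search on platforms where setsockopt never fails.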

timeout_event_create segfaults

It appears that simply calling timeout_event_create causes a segfault, when the newly created timeout_event is being reset:

$r
Process 72759 launched: './pelikan_slimcache' (x86_64)
load config from ../template/slimcache.conf
Process 72759 stopped
* thread #1: tid = 0x268df5, 0x00000001000192a5 pelikan_slimcache`timeout_event_reset(t=0x0000000100103970) + 37 at cc_wheel.c:61, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x00000001000192a5 pelikan_slimcache`timeout_event_reset(t=0x0000000100103970) + 37 at cc_wheel.c:61
   58       t->free = false;
   59   
   60       TAILQ_NEXT(t, tqe) = NULL;
-> 61       TAILQ_PREV(t, tevent_tqh, tqe) = NULL;
   62       t->cb = NULL;
   63       t->data = NULL;
   64       t->recur = false;
(lldb) bt
* thread #1: tid = 0x268df5, 0x00000001000192a5 pelikan_slimcache`timeout_event_reset(t=0x0000000100103970) + 37 at cc_wheel.c:61, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
  * frame #0: 0x00000001000192a5 pelikan_slimcache`timeout_event_reset(t=0x0000000100103970) + 37 at cc_wheel.c:61
    frame #1: 0x000000010001931b pelikan_slimcache`timeout_event_create + 43 at cc_wheel.c:80
    frame #2: 0x000000010001e53e pelikan_slimcache`main + 5 at main.c:62
    frame #3: 0x000000010001e539 pelikan_slimcache`main(argc=<unavailable>, argv=<unavailable>) + 201
    frame #4: 0x00007fff89b425fd libdyld.dylib`start + 1
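
One plausible explanation, assuming the queue macros follow the BSD sys/queue.h definitions: TAILQ_PREV is not a field accessor. It expands to an expression that dereferences the element's tqe_prev pointer, so on a freshly allocated element whose tqe_prev is NULL it faults (address=0x8 would match reading a field at offset 8 from a NULL pointer). A sketch of a reset that writes the link fields directly, using a trimmed-down stand-in for timeout_event:

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

struct tevent {                     /* trimmed-down stand-in for timeout_event */
    TAILQ_ENTRY(tevent) tqe;
    void              (*cb)(void *);
    void               *data;
    bool                recur;
    bool                free;
};

static void
tevent_reset(struct tevent *t)
{
    t->free = false;
    /* write the link fields directly instead of going through TAILQ_PREV */
    t->tqe.tqe_next = NULL;
    t->tqe.tqe_prev = NULL;
    t->cb = NULL;
    t->data = NULL;
    t->recur = false;
}
```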

changes to continuous integration

We will be dropping our paid Travis CI plan at the end of 2021. We do not expect there to be any visible changes to this repo, but wanted to give some notice just in case. We recommend migrating CI jobs to GitHub Actions.

Travis CI provides free testing for open source projects. In addition, Twitter has paid for a small number of additional concurrent builds which were available for open source as well as private repositories. Many Twitter projects have already moved to GitHub Actions for CI, and we have no private repos left using Travis, so we will be discontinuing our plan at the end of 2021.

Since this repo is open source, we do not expect this change to impact Travis CI builds for this project. However, we still recommend most Twitter projects to migrate to GitHub Actions for CI at your convenience.

Stream generalization

Investigate a setup where channel type and buffer type are both pluggable, what would the stream setup look like then? Reference: #76

typedef conventions

I'm thinking about slowly converting all the user-defined _t type names into a few alternatives, so we are compliant with the style guide:

  • _e for enum
  • _f for floating point numbers, regardless of size
  • _i for signed integers, regardless of size
  • _u for unsigned integers, regardless of size
  • _p for other pointer type
  • _fn for function pointer types
  • _st for structs

In general, we may want to discourage applying typedef on primitive types unless there's a good reason (e.g. different type representation on different platforms), types that may change in the future, or type names that are commonly used as aliases.
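
For illustration, one made-up type per suffix in the list above. None of these names exist in ccommon.

```c
#include <stdint.h>

typedef enum { LEVEL_LOW, LEVEL_HIGH } level_e;     /* _e: enum */
typedef double   ratio_f;                           /* _f: floating point */
typedef int64_t  delta_i;                           /* _i: signed integer */
typedef uint32_t count_u;                           /* _u: unsigned integer */
typedef char    *cursor_p;                          /* _p: other pointer type */
typedef int    (*compare_fn)(const void *, const void *);  /* _fn: function pointer */
typedef struct point { int x, y; } point_st;        /* _st: struct */
```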

Thoughts?

unused parameter warning in debug_log_flush

/Users/kyang/ccommon/src/cc_debug.c: In function 'debug_log_flush':
/Users/kyang/ccommon/src/cc_debug.c:106:23: warning: unused parameter 'arg' [-Wunused-parameter]
debug_log_flush(void *arg)
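
A common way to silence this warning when the parameter is required by a callback signature is to cast it to void. The sketch below uses a hypothetical callback of the same shape; the counter exists only to make the effect observable.

```c
#include <stdio.h>

static unsigned nflush;     /* illustration only: counts invocations */

/* Hypothetical callback with the same signature shape as debug_log_flush. */
static void
flush_cb(void *arg)
{
    (void)arg;              /* required by the callback type, unused here */
    fflush(stderr);
    nflush++;
}
```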

Update default buffer size

Subtract 16 bytes (the metadata overhead of malloc) from the current default size, which is exactly 16KB.
