jctools / jctools
Home Page: http://jctools.github.io/JCTools
License: Apache License 2.0
Users currently expect a power-of-2 requested capacity to end up being the actual capacity of the queue. While this is mildly arbitrary (given that a requested capacity below a power of 2 is rounded up to it), it is arguably consistent with the following guideline:
Queue capacity >= requested capacity
SpscGrowableArrayQueue breaks that assumption: for a requested capacity that rounds up to a power of 2, the resulting capacity will be that power of 2 minus 1.
To fix the issue I would like to restructure the queue such that a marker value is used to signal that the queue is full, while the link to the next queue is held in an extra slot.
This follows input from V. Klang (TypeSafe) and P. Ron (P.Universe), who expressed a requirement that a concurrent queue's memory footprint be minimal, even at the risk of false sharing. For JAQ queues we err on the side of padding, so perhaps have 4 settings: MIN/LOW/HIGH/MAX.
In the case of bounded impls this would translate into:
MIN-> no padding
LOW-> pad between P/C fields
HIGH-> pad pre/post class + between P/C fields
MAX-> pad pre/post class + pre/post array + between P/C fields
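The LOW setting could look something like the following sketch, where a cache line's worth of long fields sits between the producer and consumer state, introduced via an inheritance chain so the fields keep their relative order. All class and field names here are illustrative, not JCTools code.

```java
// Hypothetical sketch of the LOW setting: padding between P/C fields only,
// no pre/post class padding. Names are illustrative, not JCTools API.
abstract class ProducerFields {
    long producerIndex;
}

abstract class ProducerToConsumerPad extends ProducerFields {
    // 16 longs ~= 128 bytes: a 64-byte cache line plus adjacent-line prefetch
    long p00, p01, p02, p03, p04, p05, p06, p07;
    long p10, p11, p12, p13, p14, p15, p16, p17;
}

class LowPaddedQueueFields extends ProducerToConsumerPad {
    long consumerIndex;
}
```

MIN would drop the padding class entirely, while HIGH/MAX would add similar padding classes before and after the whole hierarchy (and, for MAX, around the backing array as well).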
I need to monitor the SpscArrayQueue. Unfortunately, I can neither extend it nor access the getter methods mentioned in the subject. Hence, could you please remove the final modifier from the class and make its getter methods protected?
The premise of the Channel approach is that inlining the data into the ring buffer is good for locality and should therefore improve performance. We should be able to demonstrate the benefit via a set of benchmarks.
Hi Nitsan,
as you suggested I am asking my question over here. I was reading up on your SpscArrayQueue and studying the implementation a bit as well. In particular I am interested in the clear() method. As far as I understood, it is expected to be called exclusively by the consumer (as it invokes poll() continuously until the queue is emptied).
Is there a possibility to clear() the SpscArrayQueue from the producer thread side as well?
Thank you for any help or suggestion on this matter, cheers Christian.
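For reference, the consumer-side clear() described above amounts to a poll loop, which is why it inherits poll()'s single-consumer contract; calling it from the producer thread would violate that contract. A minimal sketch, using a JDK queue as a stand-in for SpscArrayQueue:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of a consumer-side clear(): drain by polling until empty. Because
// clear() is just repeated poll(), it must run on the (single) consumer
// thread; a producer-side clear would need external coordination.
final class ConsumerClear {
    static <E> void clearFromConsumer(Queue<E> q) {
        while (q.poll() != null) {
            // discard the element
        }
    }
}
```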
I found two more bugs:
L91: should allow the initial size to be equal to the max size and thus not grow.
L138-145: there is a missing case when the queue is at max capacity and there is exactly one space left. L138 checks whether the slot at index + 1 is occupied; thus, at max capacity, one can't have 128 elements in the queue but only 127. In the below-max case, it won't use the cell at index unless index + 1 is occupied, which is odd. I think the correct check would be:
if (null == lvElement(buffer, calcElementOffset(index + 1, mask))) { // buffer is not full
    return writeToQueue(buffer, e, index, offset);
} else if (mask == maxSize) {
    if (null == lvElement(buffer, offset)) {
        return writeToQueue(buffer, e, index, offset);
    }
    // we're full and can't grow
    return false;
}
Note the == check.
Heya Nitsan
I was wondering if it is safe to access JCTools' SP*C abstractions sequentially from multiple producers. Those producers would be taking turns, and only one would be accessing the queue at any given moment in time.
Do you keep some internal state, like a producer index, that needs to become visible to subsequent producers, so that I would need to guarantee visibility by syncing the threads on the outside?
Cheers, Christian.
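A sketch of how the "taking turns" handoff could be made safe, under the assumption that the queue does keep non-volatile producer state: any external synchronization that orders "producer A finished" before "producer B starts" (a lock, a volatile flag, thread start/join) establishes the needed happens-before edge. Class names are hypothetical, and a JDK queue stands in for the SP*C queue:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Producers take turns guarded by a lock: the unlock by the previous producer
// happens-before the lock by the next one, so any plain-field producer state
// written inside the critical section is visible to the next producer.
final class TurnTakingProducers {
    private final Queue<Integer> queue = new ConcurrentLinkedQueue<>();
    private final ReentrantLock turn = new ReentrantLock();

    void produce(int value) {
        turn.lock();
        try {
            queue.offer(value); // single "logical producer" at a time
        } finally {
            turn.unlock();
        }
    }

    Integer poll() {
        return queue.poll();
    }
}
```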
Support for MC.
Consider disruptor-like (but with one 'stage') queues where all consumers get all messages.
Consider: the last consumer must null the entry, so the number of consumers is bounded (but may be dynamic).
I'd be interested in your thoughts on the algorithm presented in this recent paper for a doubly-linked list, and how it compares performance-wise with the LinkedList implementations currently in JCTools. I've been struggling to find an actual implementation of this algorithm, though the pseudocode in the paper seems thorough. It is honestly a bit beyond my comfort zone, hence my reaching out to you before I attempt it, to find out whether you've already pursued it or have guidance or opinions on it.
Paper: http://arxiv.org/abs/1408.1935
Thanks
In the paper "BQueue ..." (International Journal of Parallel Programming, 2013), Wang et al. propose an adaptive backtracking technique on the consumer side to avoid cache thrashing due to shared access to the buffer array. Have you already investigated this technique with SpscArrayQueue?
This would be hugely helpful and could be considered a completely separate tool/lib. It is an extended alternative to the @Contended annotation which should allow for field groups. E.g.:
class Foo extends Bar {
    @GroupLayout(id=0, prepad=64)
    @FieldLayout(group=0)
    int a;
    @FieldLayout(group=0)
    int b;
    @FieldLayout(group=1)
    byte c;
}
Will result in a hierarchy being generated such that fields are ordered and padded:
abstract class FooPrePadding extends Bar {
    long p0, p1, ...;
}
abstract class FooFields1 extends FooPrePadding {
    int a;
    int b;
}
abstract class FooFields2 extends FooFields1 {
    byte c;
}
class Foo extends FooFields2 {
}
Btw: increase the version number to 1.1. Currently, mvn package generates a 1.0-SNAPSHOT jar.
This allows:
When using a Java compiler version less than 8, I get the following error in Eclipse:
Description Resource Path Location Type
The method getAndSetObject(MpscLinkedQueue8ProducerNodeRef<E>, long, LinkedQueueNode<E>) is undefined for the type Unsafe MpscLinkedQueue8.java /jctools-core/src/main/java/org/jctools/queues line 29 Java Problem
A common/valid use case to optimize for, not enabled by existing work, is a concurrent map or set for a single-threaded writer with many readers. The implementation can leverage and simplify existing work from the highly scalable lib. It should yield improved performance and a lower footprint over existing work. The Set implementation should be specialized, not built over a Map.
This is the ConcurrentQueue interface and related implementations. Also incorporate any enhancements made from queues onto cqueues.
Not sure how this can be described in the spec... also consider that this goes against the FF look-in-buffer approach and requires consumer cursor visibility for wrap detection.
I need an SpSc queue that blocks on the consumer side if no data is available. Is there a more efficient approach than integrating a semaphore?
For example, the consumer does not always need to check a lock on each consuming action if it caches the queue's size.
As far as I know, an Mpsc_Queue consists of multiple Spsc_Queues.
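One semaphore-free approach is a park/unpark handshake: the producer only issues a wakeup when the consumer may have observed the queue as empty. This is a minimal sketch under that assumption, with a JDK queue standing in for the SPSC queue; it is not JCTools API:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

// Blocking take() on top of a non-blocking queue: the consumer registers
// itself, re-checks the queue (to avoid a lost wakeup), and parks; the
// producer unparks only if a consumer has registered.
final class BlockingConsumerQueue<E> {
    private final Queue<E> queue = new ConcurrentLinkedQueue<>();
    private volatile Thread waiting; // consumer thread, when parked

    void put(E e) {
        queue.offer(e);
        Thread w = waiting;
        if (w != null) {
            LockSupport.unpark(w); // wake a possibly-parked consumer
        }
    }

    E take() {
        E e;
        while ((e = queue.poll()) == null) {
            waiting = Thread.currentThread();
            if ((e = queue.poll()) != null) { // re-check before parking
                break;
            }
            LockSupport.park(this);
        }
        waiting = null;
        return e;
    }
}
```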
Channels have both a minimum and a maximum capacity and the mantra "ram is cheap, don't underprovision". They should have:
Many methods still seem unimplemented. Maybe this can be reviewed to see what can be implemented rather than just throwing UnsupportedOperationException.
Since I updated, building requires gpg (on a Mac). Is this normal?
Consider code-gen for padding?
Consider coding standards?
e.g., via Integer.getInteger(<property>, 64)
Queue semantics don't offer object reuse. Having to allocate a new object instance for every offer() call effectively halves queue throughput.
It would be useful to have disruptor-like pattern around a buffer.
Could we save some volatile index and array updates by using a batch/buffered add for a producer in a high-throughput context? Optimally, we could just wrap/decorate an available SpSc queue and adapt the element-adding behavior.
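A sketch of the idea, assuming the queue cooperates so a batch can be published with a single ordered index store: elements are written into the array first, and the producer index is published once per batch with lazySet (a stand-in for an soProducerIndex ordered write), so a batch of n elements costs one ordered index store instead of n. Minimal single-producer model, not JCTools code; capacity checks are elided for brevity:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Batched SPSC production: n element stores, one index publication.
final class BatchPublishingSpsc<E> {
    private final AtomicReferenceArray<E> buffer;
    private final int mask;
    private final AtomicLong producerIndex = new AtomicLong();
    private final AtomicLong consumerIndex = new AtomicLong();
    private long localProducerIndex; // producer-private, plain long

    BatchPublishingSpsc(int capacityPow2) {
        buffer = new AtomicReferenceArray<>(capacityPow2);
        mask = capacityPow2 - 1;
    }

    void offerBatch(E[] batch) {
        for (E e : batch) {
            buffer.lazySet((int) (localProducerIndex++ & mask), e);
        }
        producerIndex.lazySet(localProducerIndex); // single publication
    }

    E poll() {
        long ci = consumerIndex.get();
        if (ci >= producerIndex.get()) {
            return null; // nothing published yet
        }
        int offset = (int) (ci & mask);
        E e = buffer.get(offset);
        buffer.lazySet(offset, null); // free the slot
        consumerIndex.lazySet(ci + 1);
        return e;
    }
}
```

The trade-off is latency: elements become visible to the consumer only when the batch is flushed.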
Allocate n nodes at the beginning, just like we do with an array, and link the last node with the first. In contrast to the array version, we do not need to manage any indices or compute an offset; we just need to sync the node value via soValue and lvValueAndNull. The next pointer is fixed in a bounded version. I do not know whether cache locality could hurt performance here, since we do not use an array.
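A minimal sketch of the bounded node ring described above, with AtomicReference lazySet/get standing in for soValue/lvValue (illustration only): a slot is "empty" when its value is null, so no indices are needed at all.

```java
import java.util.concurrent.atomic.AtomicReference;

// Preallocated circular chain of nodes; fullness/emptiness is signalled by
// the slot value itself (null == empty), so no producer/consumer indices.
final class BoundedNodeRing<E> {
    static final class Node<E> {
        final AtomicReference<E> value = new AtomicReference<>();
        Node<E> next; // fixed after construction, so a plain field is fine
    }

    private Node<E> producerNode;
    private Node<E> consumerNode;

    BoundedNodeRing(int capacity) {
        Node<E> first = new Node<>();
        Node<E> current = first;
        for (int i = 1; i < capacity; i++) {
            current.next = new Node<>();
            current = current.next;
        }
        current.next = first; // close the ring
        producerNode = first;
        consumerNode = first;
    }

    boolean offer(E e) {
        if (producerNode.value.get() != null) {
            return false; // consumer has not freed this slot yet: full
        }
        producerNode.value.lazySet(e); // soValue
        producerNode = producerNode.next;
        return true;
    }

    E poll() {
        E e = consumerNode.value.get(); // lvValue
        if (e == null) {
            return null; // empty
        }
        consumerNode.value.lazySet(null); // null the slot for reuse
        consumerNode = consumerNode.next;
        return e;
    }
}
```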
Some improvements still possible on top of LongAdder for particular cases as demonstrated by the following post:
http://psy-lob-saw.blogspot.com/2013/06/java-concurrent-counters-by-numbers.html
While working on some queue-related enhancements in RxJava, I've come across a situation where one thread writes a plain long field f1 followed by an ordered write of f2, and the other thread volatile-reads f2 followed by a volatile read of f1. This works on a 64-bit JVM but may read an incorrect value on a 32-bit JVM. This gist demonstrates the problem: https://gist.github.com/akarnokd/760d4c0669b7caac7a1a. This may happen with SpscArrayQueue.producerIndex, but the likelihood is very small because it may present itself only after 1<<32 elements plus a concurrent size() call. Is it worth preparing for this issue by lazy-setting these longs as well?
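The proposed mitigation can be sketched with AtomicLong.lazySet as a portable stand-in for a putOrderedLong store (the field name below is hypothetical): a plain long store may tear into two 32-bit halves on a 32-bit JVM, since the JLS makes atomicity of non-volatile long/double writes optional, while lazySet keeps the cheap ordered-store flavor and guarantees the 64-bit store is atomic.

```java
import java.util.concurrent.atomic.AtomicLong;

// Index kept in an AtomicLong and advanced with lazySet: a concurrent
// size() can never observe a half-written 64-bit value, even on 32-bit JVMs,
// unlike an increment of a plain long field.
final class NonTearingIndex {
    private final AtomicLong producerIndex = new AtomicLong();

    void advance() {
        // single-writer increment: plain read, ordered atomic store
        producerIndex.lazySet(producerIndex.get() + 1);
    }

    long get() {
        return producerIndex.get();
    }
}
```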
http://www.1024cores.net/home/lock-free-algorithms/queues/non-intrusive-mpsc-node-based-queue
Note XCHG only available in JDK8 :(
I have a use case where I'm using the queue size as a specific limit and relying on offer rejection as a boundary. I found that there seem to be times when offer rejects even though there should be space available.
A unit test that demonstrates this behavior is here: https://github.com/benjchristensen/JCTools/blob/offer/jctools-core/src/test/java/org/jctools/queues/SpmcArrayQueueTest.java#L54
This test does the following:
- Producer.request is called
- queue.poll in turn calls Producer.request, which schedules onto the same producing thread, offering more items via queue.offer
- offer rejects
I can't see anything wrong with the code that would cause more values to be offered than items polled. The unit test works if I replace the SpmcArrayQueue with a synchronized LinkedList or an ArrayBlockingQueue.
This should be a relatively small change and should benefit implementations where the CAS loop can be replaced by LOCK XADD. It does, however, open the door to producers/consumers overstepping and spinning until the full/empty queue state is resolved.
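The XADD idea can be sketched after D. Vyukov's bounded MPMC design: producers claim a slot unconditionally with getAndIncrement (LOCK XADD on x86) instead of a CAS retry loop, then spin on a per-slot sequence until the slot is usable, which is exactly the "overstepping" risk mentioned above. Illustration, not JCTools code:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.AtomicReferenceArray;

// XADD-claim bounded MPMC queue sketch (after Vyukov): per-slot sequence
// stamps arbitrate between producers and consumers after the index claim.
final class XaddClaimQueue<E> {
    private final AtomicReferenceArray<E> buffer;
    private final AtomicLongArray sequence; // slot generation stamps
    private final int mask;
    private final AtomicLong producerIndex = new AtomicLong();
    private final AtomicLong consumerIndex = new AtomicLong();

    XaddClaimQueue(int capacityPow2) {
        buffer = new AtomicReferenceArray<>(capacityPow2);
        sequence = new AtomicLongArray(capacityPow2);
        mask = capacityPow2 - 1;
        for (int i = 0; i < capacityPow2; i++) {
            sequence.set(i, i); // slot i is free for producer index i
        }
    }

    void put(E e) {
        long index = producerIndex.getAndIncrement(); // LOCK XADD claim
        int slot = (int) (index & mask);
        while (sequence.get(slot) != index) { } // spin until slot is free
        buffer.set(slot, e);
        sequence.set(slot, index + 1); // mark ready for the consumer
    }

    E take() {
        long index = consumerIndex.getAndIncrement();
        int slot = (int) (index & mask);
        while (sequence.get(slot) != index + 1) { } // spin until slot ready
        E e = buffer.get(slot);
        sequence.set(slot, index + mask + 1); // free slot for the next lap
        return e;
    }
}
```

Note that put/take here block (spin) on a full/empty queue, unlike offer/poll, which is one way the overstepping manifests.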
Goal: pack lots of boolean values into bytes. Issue also includes investigating performance through JMH Benchmarks.
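One straightforward packing scheme, sketched here as a baseline the JMH benchmarks could compare against (hypothetical helper, not existing JCTools code): 8 flags per byte via shift/mask, an 8x density improvement over boolean[], which typically uses a byte per element. Whether the extra bit arithmetic costs more than the smaller cache footprint saves is exactly what the benchmarks would have to show.

```java
// Pack booleans into a byte array, 8 flags per byte.
final class BitPack {
    static byte[] pack(boolean[] flags) {
        byte[] packed = new byte[(flags.length + 7) / 8];
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                packed[i / 8] |= 1 << (i % 8); // set bit i%8 of byte i/8
            }
        }
        return packed;
    }

    static boolean get(byte[] packed, int i) {
        return (packed[i / 8] & (1 << (i % 8))) != 0;
    }
}
```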
The SPSC fixed-size ring buffer uses an int marker for publication:
private boolean isMessageReady(long offset) {
    return UNSAFE.getIntVolatile(null, offset) == READY_MESSAGE_INDICATOR;
}

private void busyIndicator(long offset) {
    UNSAFE.putOrderedInt(null, offset, BUSY_MESSAGE_INDICATOR);
}
This is potentially an issue if the offset is not int-aligned. We use an int because there's no putOrderedByte. The int access may not be atomic, but this is not really an issue because we only set the value to 0 or 1, so only one byte is actually in use.
I am aware that for JDK9 Unsafe gains new methods for unaligned access which we may need to use here and elsewhere in the channel code.
I am also aware that current packing/layout of data in each message is NOT aligned, which is something we may want to address in future.
This is true for the MPSC queue because of the 'bubbles' in the queue, where the producerNode is set but not yet linked to the next node.
Maybe you can use contention instead of padding.
Would it be possible/feasible to batch-consume the contents of an MpscLinkedQueue? I.e., have a method taking a callback which then traverses the links via plain loads until it reaches the producer's node. So instead of polling one by one, it would move the consumer pointer once and not bother nulling out every node's value.
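A single-threaded model of the proposed drain: the node chain is walked with plain reads, each value handed to a callback, and the consumer pointer is moved once at the end; only the node that becomes the new stub needs its value nulled, since the bypassed nodes become garbage anyway. This is a minimal sketch of an MpscLinkedQueue-style chain, not the JCTools implementation:

```java
import java.util.function.Consumer;

// Stub-node linked queue with a batch drain instead of per-element poll.
final class DrainableNodeQueue<E> {
    static final class Node<E> {
        E value;
        Node<E> next;
    }

    private Node<E> producerNode = new Node<>(); // stub node
    private Node<E> consumerNode = producerNode;

    void offer(E e) {
        Node<E> n = new Node<>();
        n.value = e;
        producerNode.next = n; // the real queue uses XCHG + an ordered store
        producerNode = n;
    }

    // Traverse with plain loads; move the consumer pointer once at the end.
    int drain(Consumer<E> onElement) {
        int count = 0;
        Node<E> node = consumerNode.next;
        Node<E> last = consumerNode;
        while (node != null) {
            onElement.accept(node.value);
            last = node;
            node = node.next;
            count++;
        }
        last.value = null;   // the new stub must not retain a value
        consumerNode = last; // single consumer-pointer update
        return count;
    }
}
```

In the real MPSC queue the drain would also have to cope with the 'bubble' case, where a node is exchanged in but not yet linked.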
(Copied from my comment on RxJava)
Correct me if I'm wrong, but it seems the code on line 135 may read beyond the buffer if the offset is at the last element.
The other thing I see is that the queue should disallow offering Object[] values, because they would confuse the poll. Better yet, when a new array needs to be communicated, wrap it in a private holder class so it is not confused with any other type; resizing should be infrequent enough not to cause significant overhead.
I was using SpmcArrayQueue with a single producer and single consumer, and I've noticed that sometimes I read a non-null value via peek but the subsequent poll returns null. Looking into the class, it seems there is a race window between spElement and soProducerIndex and the reading of the various indexes in poll.
I could change my algorithm to not require peek, but the question is: shouldn't either offer/peek have a happens-before relation (which would require soElement), or poll be a bit stronger (lvElement, then wait for the index update)?
(Java 1.8u31 64 bit, Win7)
Hi, it would ease things if this was available in Maven Central.
-ruediger
to support wider use cases.
Working towards a full suite of queues. Implement a counter-based, single-class format. Validate against the stock JDK.