jctools / jctools
Home Page: http://jctools.github.io/JCTools
License: Apache License 2.0
Users currently expect a power-of-2 requested capacity to end up being the actual capacity of the queue. While this is mildly arbitrary (given that a requested capacity below a power of 2 is rounded up to it), it is arguably consistent with the following guideline:
Queue capacity >= requested capacity
SpscGrowableArrayQueue breaks that assumption: for a requested capacity that rounds up to a power of 2, the resulting capacity will be that power of 2 minus 1.
To fix the issue I would like to restructure the queue such that a marker value is used to signal that the queue is full, while the link to the next queue is held in an extra slot.
This follows input from V. Klang (TypeSafe) and P. Ron (P.Universe), who expressed a requirement that a concurrent queue's memory footprint be minimal, even at the risk of false sharing. For JAQ queues we err on the side of padding, so perhaps have 4 settings: MIN/LOW/HIGH/MAX.
In the case of bounded impls this would translate into:
MIN-> no padding
LOW-> pad between P/C fields
HIGH-> pad pre/post class + between P/C fields
MAX-> pad pre/post class + pre/post array + between P/C fields
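The LOW setting could look something like the following sketch, where a cache line's worth of long fields sits between the producer and consumer state, introduced via an inheritance chain so the fields keep their relative order. All class and field names here are illustrative, not JCTools code.

```java
// Hypothetical sketch of the LOW setting: padding between P/C fields only,
// no pre/post class padding. Names are illustrative, not JCTools API.
abstract class ProducerFields {
    long producerIndex;
}

abstract class ProducerToConsumerPad extends ProducerFields {
    // 16 longs ~= 128 bytes: a 64-byte cache line plus adjacent-line prefetch
    long p00, p01, p02, p03, p04, p05, p06, p07;
    long p10, p11, p12, p13, p14, p15, p16, p17;
}

class LowPaddedQueueFields extends ProducerToConsumerPad {
    long consumerIndex;
}
```

MIN would drop the padding class entirely, while HIGH/MAX would add similar padding classes before and after the whole hierarchy (and, for MAX, around the backing array as well).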
I need to monitor the SpscArrayQueue. Unfortunately, I can neither extend it nor access the getter methods mentioned in the subject. Hence, could you please remove the final modifier from the class and make its getter methods protected?
The premise of the Channel approach is that inlining the data into the ring buffer is good for locality and should therefore improve performance. We should be able to demonstrate the benefit via a set of benchmarks.
Hi Nitsan,
as you suggested I am asking my question over here. I was reading up on your SpscArrayQueue and studying the implementation a bit as well. In particular I am interested in the clear() method. As far as I understood, it is expected to be called exclusively by the consumer (as it invokes poll() continuously until the queue is emptied).
Is there a possibility to clear() the SpscArrayQueue from the producer thread side as well?
Thank you for any help or suggestion on this matter, cheers Christian.
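For reference, the consumer-side clear() described above amounts to a poll loop, which is why it inherits poll()'s single-consumer contract; calling it from the producer thread would violate that contract. A minimal sketch, using a JDK queue as a stand-in for SpscArrayQueue:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of a consumer-side clear(): drain by polling until empty. Because
// clear() is just repeated poll(), it must run on the (single) consumer
// thread; a producer-side clear would need external coordination.
final class ConsumerClear {
    static <E> void clearFromConsumer(Queue<E> q) {
        while (q.poll() != null) {
            // discard the element
        }
    }
}
```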
I found two more bugs:
L91: should allow the initial size to be equal to the max size and thus not grow.
L138-145: there is a missing case when the queue is at max capacity and there is exactly one space left. L138 checks whether the slot at index + 1 is occupied; thus, at max capacity, one can't have 128 elements in the queue but only 127. In the below-max case, it won't use the cell at index unless index + 1 is occupied, which is odd. I think the correct check would be:
if (null == lvElement(buffer, calcElementOffset(index + 1, mask))) { // buffer is not full
    return writeToQueue(buffer, e, index, offset);
} else if (mask == maxSize) {
    if (null == lvElement(buffer, offset)) {
        return writeToQueue(buffer, e, index, offset);
    }
    // we're full and can't grow
    return false;
}
Note the == check.
Heya Nitsan
I was wondering if it is safe to access JCTools' SP*C abstractions sequentially from multiple producers. Those producers would be taking turns, and only one would be accessing the queue at any given moment in time.
Do you keep some internal state, like a producer index, that needs to become visible to subsequent producers, so that I would need to guarantee visibility by syncing the threads on the outside?
Cheers, Christian.
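A sketch of how the "taking turns" handoff could be made safe, under the assumption that the queue does keep non-volatile producer state: any external synchronization that orders "producer A finished" before "producer B starts" (a lock, a volatile flag, thread start/join) establishes the needed happens-before edge. Class names are hypothetical, and a JDK queue stands in for the SP*C queue:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Producers take turns guarded by a lock: the unlock by the previous producer
// happens-before the lock by the next one, so any plain-field producer state
// written inside the critical section is visible to the next producer.
final class TurnTakingProducers {
    private final Queue<Integer> queue = new ConcurrentLinkedQueue<>();
    private final ReentrantLock turn = new ReentrantLock();

    void produce(int value) {
        turn.lock();
        try {
            queue.offer(value); // single "logical producer" at a time
        } finally {
            turn.unlock();
        }
    }

    Integer poll() {
        return queue.poll();
    }
}
```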
Support for MC.
Consider disruptor-like (but with one 'stage') queues where all consumers get all messages.
Consider: the last consumer must null the entry, so the number of consumers is bounded (but may be dynamic).
I'd be interested in your thoughts on the algorithm presented in this recent paper for a doubly-linked list, and how it compares performance-wise with the LinkedList implementations currently in JCTools. I've been struggling to find an actual implementation of this algorithm, though the pseudocode in the paper seems thorough. It is honestly a bit beyond my comfort zone, hence my reaching out to you before I attempt it, to find out whether you've already pursued it or have guidance or opinions on it.
Paper: http://arxiv.org/abs/1408.1935
Thanks
In the paper "BQueue ..." (International Journal of Parallel Programming, 2013), Wang et al. propose an adaptive backtracking technique on the consumer side to avoid cache thrashing due to shared access to the buffer array. Have you already investigated this technique with SpscArrayQueue?
This would be hugely helpful and could be considered a completely separate tool/lib. It is an extended alternative to the @Contended annotation which should allow for field groups. E.g.:
class Foo extends Bar {
    @GroupLayout(id=0, prepad=64)
    @FieldLayout(group=0)
    int a;
    @FieldLayout(group=0)
    int b;
    @FieldLayout(group=1)
    byte c;
}
Will result in a hierarchy being generated such that fields are ordered and padded:
abstract class FooPrePadding extends Bar {
    long p0, p1, ...;
}
abstract class FooFields1 extends FooPrePadding {
    int a;
    int b;
}
abstract class FooFields2 extends FooFields1 {
    byte c;
}
class Foo extends FooFields2 {
}
Btw: increase the version number to 1.1. Currently, mvn package generates a 1.0-SNAPSHOT jar.
This allows:
When using a Java compiler version less than 8, I get the following error in Eclipse:
Description Resource Path Location Type
The method getAndSetObject(MpscLinkedQueue8ProducerNodeRef<E>, long, LinkedQueueNode<E>) is undefined for the type Unsafe MpscLinkedQueue8.java /jctools-core/src/main/java/org/jctools/queues line 29 Java Problem
A common/valid use case to optimize for, not enabled by existing work, is a concurrent map or set for a single-threaded writer with many readers. The implementation can leverage and simplify existing work from the highly scalable lib. It should yield improved performance and a lower footprint over existing work. The Set implementation should be specialized, not built over a Map.
This is the ConcurrentQueue interface and related implementations. Also incorporate any enhancements made from queues onto cqueues.
Not sure how this can be described in the spec... also consider that this goes against the FF look-in-buffer approach and requires consumer cursor visibility for wrap detection.
I need an SpSc queue that blocks on the consumer side if no data is available. Is there a more efficient approach than integrating a semaphore?
For example, the consumer does not always need to check a lock on each consuming action if it caches the queue's size.
As far as I know, an Mpsc_Queue consists of multiple Spsc_Queues.
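One semaphore-free approach is a park/unpark handshake: the producer only issues a wakeup when the consumer may have observed the queue as empty. This is a minimal sketch under that assumption, with a JDK queue standing in for the SPSC queue; it is not JCTools API:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

// Blocking take() on top of a non-blocking queue: the consumer registers
// itself, re-checks the queue (to avoid a lost wakeup), and parks; the
// producer unparks only if a consumer has registered.
final class BlockingConsumerQueue<E> {
    private final Queue<E> queue = new ConcurrentLinkedQueue<>();
    private volatile Thread waiting; // consumer thread, when parked

    void put(E e) {
        queue.offer(e);
        Thread w = waiting;
        if (w != null) {
            LockSupport.unpark(w); // wake a possibly-parked consumer
        }
    }

    E take() {
        E e;
        while ((e = queue.poll()) == null) {
            waiting = Thread.currentThread();
            if ((e = queue.poll()) != null) { // re-check before parking
                break;
            }
            LockSupport.park(this);
        }
        waiting = null;
        return e;
    }
}
```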
Channels have both a minimum and a maximum capacity and the mantra "ram is cheap, don't underprovision". They should have:
Many methods still seem unimplemented. Maybe this can be reviewed to see what can be implemented rather than just throwing UnsupportedOperationException.
Since I updated, building requires gpg (on a Mac). Is this normal?
Consider code-gen for padding?
Consider coding standards?
e.g., via Integer.getInteger(<property>, 64)
Queue semantics don't offer object reuse. Having to allocate a new object instance for every offer() call effectively halves queue throughput.
It would be useful to have disruptor-like pattern around a buffer.
Could we save some volatile index and array updates by using a batch/buffered add for a producer in a high-throughput context? Optimally, we could just wrap/decorate an available SpSc queue and adapt the element-adding behavior.
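A sketch of the idea, assuming the queue cooperates so a batch can be published with a single ordered index store: elements are written into the array first, and the producer index is published once per batch with lazySet (a stand-in for an soProducerIndex ordered write), so a batch of n elements costs one ordered index store instead of n. Minimal single-producer model, not JCTools code; capacity checks are elided for brevity:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Batched SPSC production: n element stores, one index publication.
final class BatchPublishingSpsc<E> {
    private final AtomicReferenceArray<E> buffer;
    private final int mask;
    private final AtomicLong producerIndex = new AtomicLong();
    private final AtomicLong consumerIndex = new AtomicLong();
    private long localProducerIndex; // producer-private, plain long

    BatchPublishingSpsc(int capacityPow2) {
        buffer = new AtomicReferenceArray<>(capacityPow2);
        mask = capacityPow2 - 1;
    }

    void offerBatch(E[] batch) {
        for (E e : batch) {
            buffer.lazySet((int) (localProducerIndex++ & mask), e);
        }
        producerIndex.lazySet(localProducerIndex); // single publication
    }

    E poll() {
        long ci = consumerIndex.get();
        if (ci >= producerIndex.get()) {
            return null; // nothing published yet
        }
        int offset = (int) (ci & mask);
        E e = buffer.get(offset);
        buffer.lazySet(offset, null); // free the slot
        consumerIndex.lazySet(ci + 1);
        return e;
    }
}
```

The trade-off is latency: elements become visible to the consumer only when the batch is flushed.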
Allocate n nodes at the beginning, just like we do with an array, and link the last node with the first. In contrast to the array version, we do not need to manage any indices or compute an offset; we just need to sync the node value via soValue and lvValueAndNull. The next pointer is fixed in a bounded version. I do not know whether cache locality could hurt performance here, since we do not use an array.
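A minimal sketch of the bounded node ring described above, with AtomicReference lazySet/get standing in for soValue/lvValue (illustration only): a slot is "empty" when its value is null, so no indices are needed at all.

```java
import java.util.concurrent.atomic.AtomicReference;

// Preallocated circular chain of nodes; fullness/emptiness is signalled by
// the slot value itself (null == empty), so no producer/consumer indices.
final class BoundedNodeRing<E> {
    static final class Node<E> {
        final AtomicReference<E> value = new AtomicReference<>();
        Node<E> next; // fixed after construction, so a plain field is fine
    }

    private Node<E> producerNode;
    private Node<E> consumerNode;

    BoundedNodeRing(int capacity) {
        Node<E> first = new Node<>();
        Node<E> current = first;
        for (int i = 1; i < capacity; i++) {
            current.next = new Node<>();
            current = current.next;
        }
        current.next = first; // close the ring
        producerNode = first;
        consumerNode = first;
    }

    boolean offer(E e) {
        if (producerNode.value.get() != null) {
            return false; // consumer has not freed this slot yet: full
        }
        producerNode.value.lazySet(e); // soValue
        producerNode = producerNode.next;
        return true;
    }

    E poll() {
        E e = consumerNode.value.get(); // lvValue
        if (e == null) {
            return null; // empty
        }
        consumerNode.value.lazySet(null); // null the slot for reuse
        consumerNode = consumerNode.next;
        return e;
    }
}
```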
Some improvements still possible on top of LongAdder for particular cases as demonstrated by the following post:
http://psy-lob-saw.blogspot.com/2013/06/java-concurrent-counters-by-numbers.html
While working on some queue-related enhancements in RxJava, I've come across a situation where one thread writes a plain long field f1 followed by an ordered write of f2, and the other thread volatile-reads f2 followed by a volatile read of f1. This works on a 64-bit JVM but may read an incorrect value on a 32-bit JVM. This gist demonstrates the problem: https://gist.github.com/akarnokd/760d4c0669b7caac7a1a. This may happen with SpscArrayQueue.producerIndex, but the likelihood is very small because it may present itself only after 1<<32 elements plus a concurrent size() call. Is it worth preparing for this issue by lazy-setting these longs as well?
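The proposed mitigation can be sketched with AtomicLong.lazySet as a portable stand-in for a putOrderedLong store (the field name below is hypothetical): a plain long store may tear into two 32-bit halves on a 32-bit JVM, since the JLS makes atomicity of non-volatile long/double writes optional, while lazySet keeps the cheap ordered-store flavor and guarantees the 64-bit store is atomic.

```java
import java.util.concurrent.atomic.AtomicLong;

// Index kept in an AtomicLong and advanced with lazySet: a concurrent
// size() can never observe a half-written 64-bit value, even on 32-bit JVMs,
// unlike an increment of a plain long field.
final class NonTearingIndex {
    private final AtomicLong producerIndex = new AtomicLong();

    void advance() {
        // single-writer increment: plain read, ordered atomic store
        producerIndex.lazySet(producerIndex.get() + 1);
    }

    long get() {
        return producerIndex.get();
    }
}
```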
http://www.1024cores.net/home/lock-free-algorithms/queues/non-intrusive-mpsc-node-based-queue
Note XCHG only available in JDK8 :(
I have a use case where I'm using the queue size as a specific limit and relying on offer rejection as a boundary. I found that there seem to be times when offer rejects even though there should be space available.
A unit test that demonstrates this behavior is here: https://github.com/benjchristensen/JCTools/blob/offer/jctools-core/src/test/java/org/jctools/queues/SpmcArrayQueueTest.java#L54
This test does the following:
- Producer.request is called
- queue.poll in turn calls Producer.request, which schedules onto the same producing thread, offering more items via queue.offer
- offer rejects
I can't see anything wrong with the code that would cause more values to be offered than items polled. The unit test works if I replace the SpmcArrayQueue with a synchronized LinkedList or an ArrayBlockingQueue.
This should be a relatively small change and should benefit implementations where the CAS loop can be replaced by LOCK XADD. It does, however, open the door to producers/consumers overstepping and spinning until the full/empty queue state is resolved.
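The XADD idea can be sketched after D. Vyukov's bounded MPMC design: producers claim a slot unconditionally with getAndIncrement (LOCK XADD on x86) instead of a CAS retry loop, then spin on a per-slot sequence until the slot is usable, which is exactly the "overstepping" risk mentioned above. Illustration, not JCTools code:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.AtomicReferenceArray;

// XADD-claim bounded MPMC queue sketch (after Vyukov): per-slot sequence
// stamps arbitrate between producers and consumers after the index claim.
final class XaddClaimQueue<E> {
    private final AtomicReferenceArray<E> buffer;
    private final AtomicLongArray sequence; // slot generation stamps
    private final int mask;
    private final AtomicLong producerIndex = new AtomicLong();
    private final AtomicLong consumerIndex = new AtomicLong();

    XaddClaimQueue(int capacityPow2) {
        buffer = new AtomicReferenceArray<>(capacityPow2);
        sequence = new AtomicLongArray(capacityPow2);
        mask = capacityPow2 - 1;
        for (int i = 0; i < capacityPow2; i++) {
            sequence.set(i, i); // slot i is free for producer index i
        }
    }

    void put(E e) {
        long index = producerIndex.getAndIncrement(); // LOCK XADD claim
        int slot = (int) (index & mask);
        while (sequence.get(slot) != index) { } // spin until slot is free
        buffer.set(slot, e);
        sequence.set(slot, index + 1); // mark ready for the consumer
    }

    E take() {
        long index = consumerIndex.getAndIncrement();
        int slot = (int) (index & mask);
        while (sequence.get(slot) != index + 1) { } // spin until slot ready
        E e = buffer.get(slot);
        sequence.set(slot, index + mask + 1); // free slot for the next lap
        return e;
    }
}
```

Note that put/take here block (spin) on a full/empty queue, unlike offer/poll, which is one way the overstepping manifests.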
Goal: pack lots of boolean values into bytes. Issue also includes investigating performance through JMH Benchmarks.
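One straightforward packing scheme, sketched here as a baseline the JMH benchmarks could compare against (hypothetical helper, not existing JCTools code): 8 flags per byte via shift/mask, an 8x density improvement over boolean[], which typically uses a byte per element. Whether the extra bit arithmetic costs more than the smaller cache footprint saves is exactly what the benchmarks would have to show.

```java
// Pack booleans into a byte array, 8 flags per byte.
final class BitPack {
    static byte[] pack(boolean[] flags) {
        byte[] packed = new byte[(flags.length + 7) / 8];
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                packed[i / 8] |= 1 << (i % 8); // set bit i%8 of byte i/8
            }
        }
        return packed;
    }

    static boolean get(byte[] packed, int i) {
        return (packed[i / 8] & (1 << (i % 8))) != 0;
    }
}
```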
The SPSC fixed-size ring buffer uses an int marker for publication:
private boolean isMessageReady(long offset) {
    return UNSAFE.getIntVolatile(null, offset) == READY_MESSAGE_INDICATOR;
}

private void busyIndicator(long offset) {
    UNSAFE.putOrderedInt(null, offset, BUSY_MESSAGE_INDICATOR);
}
This is potentially an issue if the offset is not int-aligned. We use an int because there's no putOrderedByte. The int access may not be atomic, but this is not really an issue because we only set the value to 0 or 1, so only one byte is actually in use.
I am aware that for JDK9 Unsafe gains new methods for unaligned access which we may need to use here and elsewhere in the channel code.
I am also aware that current packing/layout of data in each message is NOT aligned, which is something we may want to address in future.
This is true for the MPSC queue because of the 'bubbles' in the queue, where the producerNode is set but not yet linked to the next node.
Maybe you can use contention instead of padding.
Would it be possible/feasible to batch-consume the contents of an MpscLinkedQueue? I.e., have a method taking a callback which then traverses the links via plain loads until it reaches the producer's node. So instead of polling one by one, it would move the consumer pointer once and not bother nulling out every node's value.
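A single-threaded model of the proposed drain: the node chain is walked with plain reads, each value handed to a callback, and the consumer pointer is moved once at the end; only the node that becomes the new stub needs its value nulled, since the bypassed nodes become garbage anyway. This is a minimal sketch of an MpscLinkedQueue-style chain, not the JCTools implementation:

```java
import java.util.function.Consumer;

// Stub-node linked queue with a batch drain instead of per-element poll.
final class DrainableNodeQueue<E> {
    static final class Node<E> {
        E value;
        Node<E> next;
    }

    private Node<E> producerNode = new Node<>(); // stub node
    private Node<E> consumerNode = producerNode;

    void offer(E e) {
        Node<E> n = new Node<>();
        n.value = e;
        producerNode.next = n; // the real queue uses XCHG + an ordered store
        producerNode = n;
    }

    // Traverse with plain loads; move the consumer pointer once at the end.
    int drain(Consumer<E> onElement) {
        int count = 0;
        Node<E> node = consumerNode.next;
        Node<E> last = consumerNode;
        while (node != null) {
            onElement.accept(node.value);
            last = node;
            node = node.next;
            count++;
        }
        last.value = null;   // the new stub must not retain a value
        consumerNode = last; // single consumer-pointer update
        return count;
    }
}
```

In the real MPSC queue the drain would also have to cope with the 'bubble' case, where a node is exchanged in but not yet linked.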
(Copied from my comment on RxJava)
Correct me if I'm wrong, but it seems the code on line 135 may read beyond the buffer if the offset is at the last element.
The other thing I see is that the queue should disallow offering Object[] values, because they would confuse the poll. Better yet, when a new array needs to be communicated, wrap it in a private holder class so it is not confused with any other type; resizing should be infrequent enough not to cause significant overhead.
I was using SpmcArrayQueue with a single producer and single consumer, and I've noticed that sometimes I read a non-null value via peek but the subsequent poll returns null. Looking into the class, it seems there is a race window between spElement and soProducerIndex and the reading of the various indexes in poll.
I could change my algorithm to not require peek, but the question is: shouldn't either offer/peek have a happens-before relation (which would require soElement), or poll be a bit stronger (lvElement, then wait for the index update)?
(Java 1.8u31 64 bit, Win7)
Hi, it would ease things if this was available in Maven Central.
-ruediger
to support wider use cases.
Working towards a full suite of queues. Implement a counter-based, single-class format. Validate against the stock JDK.