Comments (51)
@damir78 It seems you did not read all our thoughts about numbers in the old project.
I know that BigDecimal is not performant. I know it builds a new object for each operation (precisely the reason the caching system exists). I know double is far faster; Tail was using doubles. But operations on doubles lead to big value approximations, which I considered unacceptable for a technical analysis lib.
</my-two-cents>
from ta4j.
I also think that BigDecimal is not the performance bottleneck but one of ta4j's advantages, because of its high precision.
@damir78 your indicator from your repo does not fit into ta4j's framework. Also

> Why use caching??? I think it is overengineered. Simple rolling window concept: collect a few values, fire (calculate) and forget.

is not compatible with some ta4j concepts. It could work if you just need the successive values of e.g. two SMAs at the current index, but ta4j should also support the possibility to build more complex strategies that could need results from independent indices over the whole time series.
I have implemented (or copied^^) the columnar store approach for the ta4j TimeSeries (and Tick/Bar) and two indicators (branch restructuring/performance-timeSeries-and-ticks).
I have also started with some tests to get a first idea of the performance difference between the current approach and the columnar store approach. All in all it runs @damir78's example and calculates all values of a very big TimeSeries for several SMAs (with increasing time frame from 2 to 30), and prints the required time for each run.
Maybe we could add a 'double implementation' and some caching logic and compare the results, too.
--------------Test (time frame 2-3)--------------
[Column based] time: 3544 lastValue: 1166398.5
[Classic ] time: 6050 lastValue: 1166397.5
Fastest: [Column based] diff: 2506
--------------Test (time frame 2-4)--------------
[Column based] time: 5077 lastValue: 1166398
[Classic ] time: 5580 lastValue: 1166397
Fastest: [Column based] diff: 503
--------------Test (time frame 2-5)--------------
[Column based] time: 7142 lastValue: 1166397.5
[Classic ] time: 8444 lastValue: 1166396.5
Fastest: [Column based] diff: 1302
--------------Test (time frame 2-6)--------------
[Column based] time: 9659 lastValue: 1166397
[Classic ] time: 11356 lastValue: 1166396
Fastest: [Column based] diff: 1697
--------------Test (time frame 2-7)--------------
[Column based] time: 12211 lastValue: 1166396.5
[Classic ] time: 14336 lastValue: 1166395.5
Fastest: [Column based] diff: 2125
--------------Test (time frame 2-8)--------------
[Column based] time: 14830 lastValue: 1166396
[Classic ] time: 17345 lastValue: 1166395
Fastest: [Column based] diff: 2515
--------------Test (time frame 2-9)--------------
[Column based] time: 17445 lastValue: 1166395.5
[Classic ] time: 20409 lastValue: 1166394.5
Fastest: [Column based] diff: 2964
--------------Test (time frame 2-10)--------------
[Column based] time: 20197 lastValue: 1166395
[Classic ] time: 25868 lastValue: 1166394
Fastest: [Column based] diff: 5671
--------------Test (time frame 2-11)--------------
[Column based] time: 22873 lastValue: 1166394.5
[Classic ] time: 26605 lastValue: 1166393.5
Fastest: [Column based] diff: 3732
--------------Test (time frame 2-12)--------------
[Column based] time: 25725 lastValue: 1166394
[Classic ] time: 29895 lastValue: 1166393
Fastest: [Column based] diff: 4170
--------------Test (time frame 2-13)--------------
[Column based] time: 28579 lastValue: 1166393.5
[Classic ] time: 33258 lastValue: 1166392.5
Fastest: [Column based] diff: 4679
--------------Test (time frame 2-14)--------------
[Column based] time: 31546 lastValue: 1166393
[Classic ] time: 39835 lastValue: 1166392
Fastest: [Column based] diff: 8289
--------------Test (time frame 2-15)--------------
[Column based] time: 35214 lastValue: 1166392.5
[Classic ] time: 40504 lastValue: 1166391.5
Fastest: [Column based] diff: 5290
--------------Test (time frame 2-16)--------------
[Column based] time: 37678 lastValue: 1166392
[Classic ] time: 44137 lastValue: 1166391
Fastest: [Column based] diff: 6459
--------------Test (time frame 2-17)--------------
[Column based] time: 40757 lastValue: 1166391.5
[Classic ] time: 48264 lastValue: 1166390.5
Fastest: [Column based] diff: 7507
--------------Test (time frame 2-18)--------------
[Column based] time: 43887 lastValue: 1166391
[Classic ] time: 53531 lastValue: 1166390
Fastest: [Column based] diff: 9644
--------------Test (time frame 2-19)--------------
[Column based] time: 46901 lastValue: 1166390.5
[Classic ] time: 55245 lastValue: 1166389.5
Fastest: [Column based] diff: 8344
--------------Test (time frame 2-20)--------------
[Column based] time: 50250 lastValue: 1166390
[Classic ] time: 59122 lastValue: 1166389
Fastest: [Column based] diff: 8872
--------------Test (time frame 2-21)--------------
[Column based] time: 52783 lastValue: 1166389.5
[Classic ] time: 62247 lastValue: 1166388.5
Fastest: [Column based] diff: 9464
--------------Test (time frame 2-22)--------------
[Column based] time: 57312 lastValue: 1166389
[Classic ] time: 69830 lastValue: 1166388
Fastest: [Column based] diff: 12518
--------------Test (time frame 2-23)--------------
[Column based] time: 60414 lastValue: 1166388.5
[Classic ] time: 71119 lastValue: 1166387.5
Fastest: [Column based] diff: 10705
--------------Test (time frame 2-24)--------------
[Column based] time: 63334 lastValue: 1166388
[Classic ] time: 75748 lastValue: 1166387
Fastest: [Column based] diff: 12414
--------------Test (time frame 2-25)--------------
[Column based] time: 66719 lastValue: 1166387.5
[Classic ] time: 78736 lastValue: 1166386.5
Fastest: [Column based] diff: 12017
--------------Test (time frame 2-26)--------------
[Column based] time: 70651 lastValue: 1166387
[Classic ] time: 83484 lastValue: 1166386
Fastest: [Column based] diff: 12833
--------------Test (time frame 2-27)--------------
[Column based] time: 74081 lastValue: 1166386.5
[Classic ] time: 89094 lastValue: 1166385.5
Fastest: [Column based] diff: 15013
--------------Test (time frame 2-28)--------------
[Column based] time: 76898 lastValue: 1166386
[Classic ] time: 91456 lastValue: 1166385
Fastest: [Column based] diff: 14558
--------------Test (time frame 2-29)--------------
[Column based] time: 80644 lastValue: 1166385.5
[Classic ] time: 95866 lastValue: 1166384.5
Fastest: [Column based] diff: 15222
--------------Test (time frame 2-30)--------------
[Column based] time: 85560 lastValue: 1166385
[Classic ] time: 100278 lastValue: 1166384
Fastest: [Column based] diff: 14718
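For readers following along, the column-based idea can be sketched roughly like this (a minimal sketch; the class and method names are illustrative, not the actual branch's API):

```java
// Hypothetical sketch of column-based storage: instead of one Tick object
// per bar, each field lives in its own primitive array.
class ColumnarTimeSeries {
    private double[] open;
    private double[] high;
    private double[] low;
    private double[] close;
    private int size = 0;

    ColumnarTimeSeries(int initialCapacity) {
        open = new double[initialCapacity];
        high = new double[initialCapacity];
        low = new double[initialCapacity];
        close = new double[initialCapacity];
    }

    // Append one bar; no per-bar object is allocated.
    void addBar(double o, double h, double l, double c) {
        if (size == close.length) {
            grow();
        }
        open[size] = o;
        high[size] = h;
        low[size] = l;
        close[size] = c;
        size++;
    }

    // Double the capacity of every column when full.
    private void grow() {
        int newCapacity = close.length * 2;
        open = java.util.Arrays.copyOf(open, newCapacity);
        high = java.util.Arrays.copyOf(high, newCapacity);
        low = java.util.Arrays.copyOf(low, newCapacity);
        close = java.util.Arrays.copyOf(close, newCapacity);
    }

    double getClosePrice(int index) {
        return close[index];
    }

    int getTickCount() {
        return size;
    }
}
```

Indicators then read straight from a primitive array instead of unwrapping a Tick object per access, which is where the measured speedup plausibly comes from.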
from ta4j.
Strangely, when I run the sampler against a huge demo app (thousands of strats against tens of thousands of ticks) it takes 118 seconds, and the first hotspot I see is getSimpleName().
A quick look at the call tree for my demo shows the calls are in AbstractRule.traceIsSatisfied(). Every call to that generates a call to getSimpleName(). Change it to use a protected final String class variable, and re-run.
Next run takes 106 seconds and the hotspot is once again getSimpleName(), this time from BaseStrategy.traceShouldExit(). Same problem, same fix, re-run.
Next run takes 100 seconds and now the top 5 hotspots are all in BigDecimal. I re-wrote Decimal to use double instead of BigDecimal and re-ran.
This run completed in 13 seconds. Obviously the big improvement was the change to Decimal, but those getSimpleName() calls are annoying and widespread and create noise in the performance analysis.
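The "protected final String class variable" fix described above might look like this (a sketch; the trace output format is assumed, only the caching of the name follows the comment):

```java
// Sketch of the getSimpleName() fix: resolve the class name once per
// instance instead of on every trace call.
abstract class AbstractRule {
    // Cached once; getClass().getSimpleName() recomputes the name string
    // on every call, which showed up as a profiler hotspot.
    protected final String className = getClass().getSimpleName();

    // Illustrative trace method using the cached name.
    protected void traceIsSatisfied(int index, boolean satisfied) {
        System.out.println(className + "#isSatisfied(" + index + "): " + satisfied);
    }
}
```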
from ta4j.
Guys, I did a proof of concept: replaced Decimal with Double. It is better. Much better:
For one iteration (for (int h = 2; h < 3; h++)) it took:
h = 2
average = 5.0
Time elapsed: 9256 -> with Double (before, with Decimal: 86950, roughly a factor of 10)
for (int h = 2; h < 201; h++) :
...
h = 197
h = 198
h = 199
h = 200
average = 5.0
Time elapsed: 264863, or ~4 min, which is definitely better than 50+ min.
If anybody wants the adapted sources ('Double instead of Decimal'), let me know and I will upload them.
The next bottleneck is CachedIndicator: the implementation is bad; on growing, it has to clear and re-add all ticks each time -> very expensive operations...
from ta4j.
@damir78 @edlins Nice finds.
Maybe we could extract an interface from the Decimal class and create two classes, FastDecimal (with a double delegate) and ExactDecimal (with a BigDecimal delegate), and let the user choose which one he wants to use?
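A minimal sketch of that split (the names FastDecimal/ExactDecimal come from the comment; the method set is an assumed, tiny subset of ta4j's actual Decimal API):

```java
// Common interface: indicators code against Decimal, the user picks the
// implementation trade-off (speed vs. precision).
interface Decimal {
    Decimal plus(Decimal other);
    double toDouble();
}

// Speed-oriented implementation delegating to a primitive double.
class FastDecimal implements Decimal {
    private final double value;
    FastDecimal(double value) { this.value = value; }
    public Decimal plus(Decimal other) { return new FastDecimal(value + other.toDouble()); }
    public double toDouble() { return value; }
}

// Precision-oriented implementation delegating to BigDecimal.
// (A full version would avoid the toDouble() round-trip when both
// operands are ExactDecimal, to keep exactness.)
class ExactDecimal implements Decimal {
    private final java.math.BigDecimal value;
    ExactDecimal(String value) { this.value = new java.math.BigDecimal(value); }
    private ExactDecimal(java.math.BigDecimal value) { this.value = value; }
    public Decimal plus(Decimal other) {
        return new ExactDecimal(value.add(java.math.BigDecimal.valueOf(other.toDouble())));
    }
    public double toDouble() { return value.doubleValue(); }
}
```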
from ta4j.
Why is performance not priority?
from ta4j.
While the OP's example is what I would consider a straw man, it's true that ta4j's performance has not been optimized.
I've done a lot of modifications on my personal fork of ta4j and among the changes are significant improvements to performance. A "low hanging fruit" is the Indicator caching mechanism. For example, there's absolutely no reason to cache values for simple indicators like the ClosePriceIndicator, VolumeIndicator, etc.
I've added a NonCachedIndicator (does basically what you think it would do) and there's a stark performance increase when using it vs CachedIndicator for most of the simple indicators. Consider this test:
```java
@Test
public void testClosePriceIndicatorPerformance() {
    List<Tick> ticks = new ArrayList<>();
    for (int i = 1; i < 999_999; i++) {
        ticks.add(new MockTick(ZonedDateTime.now(), Duration.ZERO, i));
    }
    TimeSeries series = new TimeSeries(ticks);
    long start = System.currentTimeMillis();
    ClosePriceIndicator closePrices = new ClosePriceIndicator(series);
    for (int i = 0; i < series.getTickCount(); i++) {
        closePrices.getValue(i);
    }
    System.out.println("Time Elapsed: " + (System.currentTimeMillis() - start) + " ms");
}
```
When ClosePriceIndicator extends CachedIndicator, this runs in 80 ms on my computer. When extending NonCachedIndicator, it's down to 13 ms.
While there are use cases where caching is valuable (i.e. complex calculations), it can be detrimental in simple cases, and shouldn't be used as a universal Indicator foundation.
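A sketch of what such a NonCachedIndicator could look like (the fork's actual code may differ; the close-price example class here is hypothetical):

```java
// getValue() simply delegates to calculate() with no result store.
abstract class NonCachedIndicator<T> {
    // Recompute on every access; cheap indicators like ClosePriceIndicator
    // only do an array/list lookup, so caching costs more than it saves.
    T getValue(int index) {
        return calculate(index);
    }

    protected abstract T calculate(int index);
}

// Example: a close-price indicator that just reads from a backing array.
class SimpleClosePriceIndicator extends NonCachedIndicator<Double> {
    private final double[] closePrices;
    SimpleClosePriceIndicator(double[] closePrices) { this.closePrices = closePrices; }
    protected Double calculate(int index) { return closePrices[index]; }
}
```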
When I get some time, I'll look into Decimal performance.
from ta4j.
Hi @edlins, improving performance is always a good idea, I think. Feel free to contribute. At the moment the usual practice is to make a pull request to the develop branch with your changes or suggestions. The pull request will be discussed, can be updated, and finally your PR will be merged into develop if accepted. If you don't want to implement something specific, or if you don't want to risk doing a lot of work that could be rejected, you can also open an issue first.
Similar discussion in #26
from ta4j.
Okay :D, maybe I will add a double implementation to the test..
from ta4j.
see develop...edlins:perf-getSimpleName,Decimal
from ta4j.
Hey guys,
refactored ta4j to use double instead of Decimal. There is no Decimal in the project anymore.
Feel free: https://github.com/damir78/ta4jUsingDouble
Using double implementation:
SMALauncher:
h = 2
average = 5.0
Time elapsed: 732
from ta4j.
Hey, guys,
I consider the whole issue closed with resolution 'won't fix'.
Important: we have identified the current bottlenecks: the Decimal class (too many objects are created during calculations) and CachedIndicator (during increaseLengthTo).
Decimal will NOT be replaced by double. Custom implementations are possible (see related issues). CachedIndicator can be optimized as well.
Both are out of scope of this issue.
from ta4j.
Using a simple custom SMA implementation (for comparison):
loop: h = 2; h < 3
```java
@Test
public void testCustomSMA() {
    long start = System.currentTimeMillis();
    int initialCapacity = 54 * 5 * 24 * 60 * 3;
    double[] input = new double[initialCapacity];
    for (int i = 0; i < input.length; i++) {
        input[i] = 5d;
    }
    double average = 0d;
    for (int h = 2; h < 3; h++) {
        SimpleMovingAverageUsingQueue sma = new SimpleMovingAverageUsingQueue(h);
        for (int i = 0; i < initialCapacity; i++) {
            average = sma.average(input[i]);
        }
    }
    long end = System.currentTimeMillis();
    System.out.println("average = " + average);
    System.out.println("Time elapsed: " + (end - start));
}
```
average = 5.0
Time elapsed: 63
loop: h = 2; h < 201
```java
@Test
public void testCustomSMA() {
    long start = System.currentTimeMillis();
    int initialCapacity = 54 * 5 * 24 * 60 * 3;
    double[] input = new double[initialCapacity];
    for (int i = 0; i < input.length; i++) {
        input[i] = 5d;
    }
    double average = 0d;
    for (int h = 2; h < 201; h++) {
        SimpleMovingAverageUsingQueue sma = new SimpleMovingAverageUsingQueue(h);
        for (int i = 0; i < initialCapacity; i++) {
            average = sma.average(input[i]);
        }
    }
    long end = System.currentTimeMillis();
    System.out.println("average = " + average);
    System.out.println("Time elapsed: " + (end - start));
}
```
average = 5.0
Time elapsed: 4595
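For reference, an assumed reconstruction of what a queue-based rolling SMA such as SimpleMovingAverageUsingQueue might look like (the real implementation is in the linked repo; this sketch keeps a running sum so each update is O(1), with no caching and no object churn):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Rolling SMA over the last 'period' values.
class RollingSma {
    private final int period;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0d;

    RollingSma(int period) {
        this.period = period;
    }

    // Feed one value, get back the average over the current window.
    double average(double value) {
        window.addLast(value);
        sum += value;
        // Drop the oldest value once the window is full.
        if (window.size() > period) {
            sum -= window.removeFirst();
        }
        return sum / window.size();
    }
}
```

This is the "collect a few values, fire and forget" style argued for elsewhere in this thread: no per-index cache, only the window state.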
from ta4j.
Why do you close? I do not know if your performance issue is because of this lib or because of your code. Can you publish your custom SMA implementation and the implementation of AnotherMockTimeSeries? I also use this lib for backtesting and never had performance issues.
With this code I get a better result:
```java
@Test
public void testIfQuickEnough() {
    long start = System.currentTimeMillis();
    int initialCapacity = 54 * 5 * 24 * 60 * 3;
    double[] input = new double[initialCapacity];
    List<Tick> ticks = new ArrayList<>();
    for (int i = 0; i < input.length; i++) {
        ticks.add(new BaseTick(ZonedDateTime.now(), Decimal.valueOf(5d), Decimal.valueOf(5d), Decimal.valueOf(5d), Decimal.valueOf(5d), Decimal.valueOf(5d)));
    }
    TimeSeries timeSeries = new BaseTimeSeries(ticks);
    Decimal average = null;
    for (int h = 2; h < 3; h++) {
        SMAIndicator quoteSMA = new SMAIndicator(new ClosePriceIndicator(timeSeries), h);
        for (int i = 0; i < timeSeries.getTickCount(); i++) {
            average = quoteSMA.getValue(i);
        }
    }
    long end = System.currentTimeMillis();
    System.out.println("average = " + average);
    System.out.println("Time elapsed: " + (end - start));
}
```
average = 5
Time elapsed: 13433
from ta4j.
Hi @team172011 ,
I closed this issue because... it will not be fixed... The problem is the framework itself. Performance was not a high priority...
I have reopened it.
from ta4j.
In this patch file you can find a simple test that tests the performance of the most used SMA indicator.
LackOfPerformance.patch.txt
from ta4j.
Recorded the video: https://youtu.be/uCGYMz6bxtY
from ta4j.
@team172011 @magick93
Guys, thanks for your interest in this issue. It was my view on this problem (performance) with ta4j. But reality says: ta4j is not good enough. You can download the patch that I added and test it yourself. This framework should be rewritten (IMHO). I would not use ta4j for a live trading engine.
I recorded another video of how it works with ta-lib, the good old TA library.
Compare how quick ta-lib is: https://youtu.be/inllw1cJjHA
try this with ta4j: for (int h = 2; h < 201; h++) ....
```java
@Test
public void testIfQuickEnoughFullIteration() {
    long start = System.currentTimeMillis();
    int initialCapacity = 54 * 5 * 24 * 60 * 3;
    double[] input = new double[initialCapacity];
    for (int i = 0; i < input.length; i++) {
        input[i] = 5d;
    }
    TimeSeries timeSeries = new AnotherMockTimeSeries(input);
    Decimal average = null;
    for (int h = 2; h < 201; h++) {
        SMAIndicator quoteSMA = new SMAIndicator(new ClosePriceIndicator(timeSeries), h);
        for (int i = 0; i < timeSeries.getTickCount(); i++) {
            average = quoteSMA.getValue(i);
        }
    }
    long end = System.currentTimeMillis();
    System.out.println("average = " + average);
    System.out.println("Time elapsed: " + (end - start));
}
```
from ta4j.
Maybe the performance problem comes from converting to Decimal/BigDecimal (while running calculate()?).. #26. We should use a performance/memory analyzer to dig in.. does ta-lib use BigDecimal?
from ta4j.
@damir78 it would be nice to see your implementation of SMAIndicatorUsingQueue; maybe we could fix this issue with your implementation?
from ta4j.
I've created a GitHub project:
Check out: https://github.com/damir78/ta4j-performance-test
You can find all the files and tests there (src/test).
from ta4j.
We can add something like this to enable/disable the caching system:

```java
public abstract class CachedIndicator<T> extends AbstractIndicator<T> {
    protected abstract boolean enableCache();
}

public class CorrelationCoefficientIndicator extends CachedIndicator<Decimal> {
    @Override
    protected boolean enableCache() {
        return false;
    }
}
```

Or we can directly say:

```java
// no need to redirect to CachedIndicator
public class CorrelationCoefficientIndicator extends AbstractIndicator<Decimal> {
    ..
}
```

Caching should be disabled by default. All values can be re-calculated.
Maybe in each indicator another constructor can be used to explicitly enable caching:

```java
new SMAIndicator(.., .., enableCache);
```
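A rough sketch of the constructor-flag variant (not existing ta4j API; the flag and the minimal null-based cache are assumptions for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Indicator base class whose caching can be switched off per instance.
abstract class CachedIndicator<T> {
    private final boolean cacheEnabled;
    private final List<T> cache = new ArrayList<>();

    protected CachedIndicator(boolean cacheEnabled) {
        this.cacheEnabled = cacheEnabled;
    }

    T getValue(int index) {
        if (!cacheEnabled) {
            // Rolling-window style: compute and forget.
            return calculate(index);
        }
        // Grow the cache lazily; null marks "not yet computed".
        while (cache.size() <= index) {
            cache.add(null);
        }
        T result = cache.get(index);
        if (result == null) {
            result = calculate(index);
            cache.set(index, result);
        }
        return result;
    }

    protected abstract T calculate(int index);
}
```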
from ta4j.
Hey guys, I'm scott. Good work on this project. This appears to be under active development so I thought I'd throw my $0.02 in. I don't mean to derail this issue, but my comment is performance related. I'm happy to open a new issue but it might appear redundant. (Related - is there a forum or irc channel for informal discussion?)
I like the ta4j design overall, but some of the class implementations are hideously inefficient. When I ran the Quickstart example it took about 30 seconds to run on my aging laptop. I implemented a new Tick class with a few tiny changes, added one new default method to the Tick interface, and added one new method to CsvTradesLoader. Quickstart now runs in about 4 seconds for me. I haven't pushed the changes to my fork yet, and I haven't looked at anything else but I feel that if Decimal isn't rewritten this will never be fast. Let me know if you're interested.
from ta4j.
I think the main reason why ta-lib is so fast and ta4j is so slow is neither the decimal data type nor the caching logic, but the POJO structure. Maybe we should restructure the framework like described in https://medium.com/@entzik/a-time-series-columnar-store-exercise-4e2d77fe6c24 or implemented in https://github.com/entzik/acropolis-concept
from ta4j.
Interesting approach. So: storing all values in a temporary/non-persistent wide column store backed by Java arrays, with a layer to access these values as if they were ordinary object instances. Makes sense.
If we do so, we should evaluate the integration of a caching framework as well (https://ignite.apache.org/). Then we would be able to cache (and store persistently) whatever we want (trading record, time series, ..) and replace the current caching layer with a 3rd-party one.
from ta4j.
we should evaluate the integration of a "caching framework" as well (https://ignite.apache.org/).
Great idea!
from ta4j.
As discussed on riot.im:
https://github.com/PatrickCallaghan/datastax-timeseries-lib/tree/master/src/main/java/com/datastax/timeseries/model
I think this gets closer to what I'm thinking, but the candlestick and candlestick series should not use ArrayList; they should be combined to use datatype[] more efficiently. Best would be to have this handled by an array-caching memory manager to reduce object creation and move dirty arrays back to a cache pool. Using the periodicity and duration parameters we could assume a default array size; however, we should not limit the size and should let the primitive arrays grow dynamically.
from ta4j.
@magick93, I think an interface should be created to keep the ta4j sink db-agnostic, which contributors can then implement with the specific db framework of their choice.
from ta4j.
"JSR 107 Caching API" is the interface (and is already "db agonistic") and is supported by the caching framework. No need to have another interface.
from ta4j.
Guys, before you refactor or restructure, you should evaluate the new approach. Generally:
- Use primitives instead of objects: int -> long -> float -> double ... -> Object. Object is slowest. How to use int for prices: simply multiply by 1000; this approach is taken from the IEX stock exchange. IMHO, Decimal must be replaced by a primitive data type. Because array growing can slow performance, I would suggest using primitive data collections! An array is very useful if you have a constant data length, which makes this approach perfect for backtesting. For realtime trading: primitive data collections.
- ta-lib is very powerful, but you need to understand its purpose. Ta-lib is perfect for backtesting!!! If you do realtime trading you should introduce the "rolling window" concept: store some values before applying the calculation. In this case I would suggest looking again at my SimpleMovingAverage: https://github.com/damir78/ta4j-performance-test
- Why use caching??? I think it is overengineered. Simple rolling window concept: collect a few values, fire (calculate) and forget.
- If the core of the core must be refactored, is it better to create a new project?
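A small sketch of the fixed-point idea above (the factor 1000 follows the comment; IEX's actual wire format may use a different scale, and the class and method names here are hypothetical):

```java
// Prices stored as long values scaled by 1000, so 5.123 becomes 5123.
final class FixedPointPrice {
    static final long SCALE = 1000L;

    // Convert a decimal price string to a scaled long. Going through
    // BigDecimal avoids an intermediate double, so nothing is lost on the
    // way in; longValueExact() throws if the price has more than 3
    // decimal places.
    static long toScaled(String price) {
        return new java.math.BigDecimal(price)
                .movePointRight(3)
                .longValueExact();
    }

    // Convert back for display; arithmetic should stay in the long domain.
    static double toDouble(long scaled) {
        return scaled / (double) SCALE;
    }
}
```

Sums and comparisons on the scaled longs are exact integer operations, which is the whole point of the trick.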
from ta4j.
> Use primitives instead of objects.

Maybe we should restrict the use of (Big)Decimal to price-related indicators only. For any other indicator not related to prices, we could use double. The "how to use int for prices" trick is a workaround which should be avoided! There are money classes (suitable for all price indicators) that can be used instead, see #26.
To differentiate between "price indicator" and "non-price indicator", we could create an abstract class PriceIndicator<Decimal>:

```java
// PriceIndicator<Decimal> extends (Cached)Indicator<Decimal>
public class MedianPriceIndicator extends PriceIndicator<Decimal> {
    ..
}
```
More information about high performance money classes:
- https://github.com/JavaMoney/jsr354-ri-bp/blob/master/src/main/java/org/javamoney/moneta/FastMoney.java
- http://java-performance.info/high-performance-money-class/
- https://github.com/mikvor/money-conversion
> Why use caching??? I think it is overengineered. Simple rolling window concept: collect a few values, fire (calculate) and forget.

Yes. This makes more sense! No need for internal caching; a "rolling window" would be sufficient. Actually, there is already the concept of "maxTickCount", which could be extended to a rolling window ("flexible windowing"), maybe something like the flexible windowing of https://flink.apache.org/introduction.html.
By the way, if someone wants to cache or store something, the user can use any caching/db provider, which is out of scope for ta4j. I guess the current internal caching of ta4j is only used to avoid redundant calculations..
from ta4j.
I think @TheCookieLab's point shows that, before jumping down BigDecimal's throat, some profiling is desperately needed instead of going around guessing what can be improved. The core of their point is that there is likely a fair amount of ta4j (not JDK) code that is much slower and more readily optimised.

> Maybe we should restrict the use of (Big)Decimal only for price-related indicators. For any other indicator not related to prices, we could use double.

I think this is an indication that removing BigDecimal is a knee-jerk reaction that should be avoided until there is data supporting the claim that the biggest/final obstacle to improving ta4j's performance is removing it.
Most of what the indicators work with is not actually "money": despite price being an input, the intermediate and ultimate results are simply indicator values, not monetary amounts. Thinking about this as "money" leads down the wrong mental model, because money has no real need for precision beyond its monetary representation, and a lot of money libraries allow precision on only one side of the decimal point but not both. One example (out of many) is the MACD histogram values. They are quite tiny, especially if you're working with an asset that has 8 places of decimal precision (as most cryptocurrencies do); doubles will open you up to a high margin of error when calculating these (and most other derivatives).
Correctness in a TA library seems, to me, like the most important aspect, and floating point explicitly trades precision for speed. Before making that tradeoff in ta4j there should be data on performance and on the rate of error (how quickly does floating-point imprecision accumulate when using derivative values such as EMA, SMA, etc., and derivatives of those derivatives?) in a large variety of scenarios. There should also be profiling to indicate that the bottleneck is in fact BigDecimal and not a poor algorithm or implementation in ta4j.
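The accumulation concern is easy to demonstrate with the classic 0.1 example (a toy illustration, not ta4j code):

```java
import java.math.BigDecimal;

// Compare repeated addition in double arithmetic vs. exact BigDecimal.
class PrecisionDrift {
    // Sum 'value' n times in double arithmetic; rounding error accumulates.
    static double doubleSum(double value, int n) {
        double sum = 0d;
        for (int i = 0; i < n; i++) {
            sum += value;
        }
        return sum;
    }

    // Sum the same value n times exactly using BigDecimal.
    static BigDecimal exactSum(String value, int n) {
        BigDecimal sum = BigDecimal.ZERO;
        BigDecimal step = new BigDecimal(value);
        for (int i = 0; i < n; i++) {
            sum = sum.add(step);
        }
        return sum;
    }

    public static void main(String[] args) {
        // The classic case: 0.1 added ten times is not 1.0 in double,
        // because 0.1 has no exact binary representation.
        System.out.println(doubleSum(0.1, 10) == 1.0); // false
        System.out.println(exactSum("0.1", 10));       // 1.0
    }
}
```

With values carrying 8 decimal places and long derivative chains (EMA of EMA, etc.), the same class of error compounds further, which is exactly the measurement the comment asks for.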
See #26 for previous discussion on this.
To summarise: It is not my impression that anyone is against improving the performance of this library but I applaud @team172011 (and originally @mdeverdelhan) and other core developers for taking the time to consider the implications of changes to the most important aspects of this library.
from ta4j.
> I think this is an indication that removing BigDecimal is a knee-jerk reaction that should be avoided until there is data supporting the claim that the biggest/final obstacle to improving ta4j's performance is removing it.

Yes, I would also stick with BigDecimal. I have not had any performance problems, and even if I did, I would rather have precise values with an eventual performance penalty than imprecise values with better performance. Maybe today performance is bad, but tomorrow it will be better (because of faster computers); what stays is the choice between precise and imprecise values. And we should definitely use precise values all over the lib!
from ta4j.
Guys, I think if you want to make a good TA math library you should invest more in effectiveness and performance and not in "how to represent 100 as a Money class".
from ta4j.
Replace Decimal. It is the bottleneck...
from ta4j.
By the way, for the "column-based" approach: please also evaluate the performance and memory consumption of different data structures. For example, to get a value, a LinkedHashSet (ordered) is faster than an ArrayList (ordered); to set a value, an ArrayList is faster than a HashSet. All common cache providers (Infinispan, GridGain, etc.) use a HashMap to store values/objects. Please evaluate those solutions as well.
from ta4j.
This is relevant for caching, right? I use arrays without caching at the moment. In the test, every value is calculated only once.
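To make the column-based idea concrete, here is a minimal, hypothetical sketch (class and method names are illustrative, not the actual ta4j API): each bar field lives in its own primitive array, so an indicator like an SMA scans a `double[]` column directly, with no per-bar objects.

```java
// Hypothetical sketch of a column-based time series: one primitive array
// per bar field instead of one object per bar. Names are illustrative.
final class ColumnTimeSeries {
    private final double[] close; // close prices, one slot per bar
    private int size;

    ColumnTimeSeries(int capacity) {
        this.close = new double[capacity];
    }

    void addBar(double closePrice) {
        close[size++] = closePrice;
    }

    // SMA computed directly over the close column; no Tick objects,
    // no boxing, no BigDecimal allocations.
    double sma(int endIndex, int timeFrame) {
        double sum = 0;
        for (int i = endIndex - timeFrame + 1; i <= endIndex; i++) {
            sum += close[i];
        }
        return sum / timeFrame;
    }
}

public class ColumnDemo {
    public static void main(String[] args) {
        ColumnTimeSeries series = new ColumnTimeSeries(5);
        for (double p : new double[] {1, 2, 3, 4, 5}) {
            series.addBar(p);
        }
        System.out.println(series.sma(4, 5)); // prints 3.0
    }
}
```

This layout is also what makes the benchmark above cache-friendly: successive SMA values read adjacent memory.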
from ta4j.
Exactly what I was thinking. Decimal should be an interface.
from ta4j.
As a reminder, do not hesitate to have a look at our old thoughts about Decimal and the caching system: #24 (comment)
Of course BigDecimal is the bottleneck. I generalized the cache usage because of its bad performance (see mdeverdelhan/ta4j-origins#20 (comment)). But be careful if you want to go the way you are describing; it may raise more questions.
I continue to think that the best solution is just refactoring the inner logic of the Decimal class (for instance: by using a pair of longs under the hood instead of a BigDecimal).
from ta4j.
I recommend improving ta4j step by step:
1. step: implement the "column-based" approach, which is a good idea from @team172011 (for example, no need to create a new object for each tick)
2. step: improve the "getSimpleName()" issue found by @edlins
3. step: make an interface called "Decimal.java" and implement "BaseDecimal.java" (which should use BigDecimal, or better, long). This should be the standard! Or we can call it LongDecimal (uses long), ShortDecimal (uses short), BaseDecimal (uses BigDecimal), so the user can decide which to use.
4. step: improve the existing caching system
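A minimal sketch of what step 3 could look like. The operation names mirror ta4j's existing Decimal (`plus`, `minus`, `multipliedBy`, `dividedBy`); everything else here (class layout, the `MathContext` choice) is an assumption, not the actual #88 design:

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical sketch: Decimal as an interface with pluggable backends.
interface Decimal {
    Decimal plus(Decimal other);
    Decimal minus(Decimal other);
    Decimal multipliedBy(Decimal other);
    Decimal dividedBy(Decimal other);
    double doubleValue();
}

// Default implementation backed by BigDecimal; a LongDecimal backed by a
// scaled long could implement the same interface interchangeably.
final class BaseDecimal implements Decimal {
    private static final MathContext CONTEXT = MathContext.DECIMAL64;
    private final BigDecimal delegate;

    BaseDecimal(String value) { this.delegate = new BigDecimal(value); }
    private BaseDecimal(BigDecimal value) { this.delegate = value; }

    public Decimal plus(Decimal o)         { return new BaseDecimal(delegate.add(bd(o))); }
    public Decimal minus(Decimal o)        { return new BaseDecimal(delegate.subtract(bd(o))); }
    public Decimal multipliedBy(Decimal o) { return new BaseDecimal(delegate.multiply(bd(o))); }
    public Decimal dividedBy(Decimal o)    { return new BaseDecimal(delegate.divide(bd(o), CONTEXT)); }
    public double doubleValue()            { return delegate.doubleValue(); }

    private static BigDecimal bd(Decimal o) { return ((BaseDecimal) o).delegate; }

    @Override public String toString() { return delegate.toPlainString(); }
}
```

An indicator written against the interface would then run unchanged whether the user plugs in a BigDecimal-backed or a long-backed implementation.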
According to https://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency:
The float and double types are particularly ill-suited for monetary calculations because it is impossible to represent 0.1 (or any other negative power of ten) as a float or double exactly.
For example, suppose you have $1.03 and you spend 42c. How much money do you have left?
System.out.println(1.03 - .42);
prints out 0.6100000000000001. The right way to solve this problem is to use BigDecimal, int, or long for monetary calculations.
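The quoted example can be checked directly; a BigDecimal version of the same subtraction gives the exact result:

```java
import java.math.BigDecimal;

public class MoneyDemo {
    public static void main(String[] args) {
        // double: 0.42 and 1.03 are not exactly representable in binary
        System.out.println(1.03 - .42); // prints 0.6100000000000001

        // BigDecimal constructed from Strings is exact
        BigDecimal result = new BigDecimal("1.03").subtract(new BigDecimal("0.42"));
        System.out.println(result);     // prints 0.61
    }
}
```

Note the String constructor: `new BigDecimal(1.03)` would bake the double's binary approximation into the BigDecimal and defeat the purpose.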
from ta4j.
How do you use long instead of BigDecimal? By "using a pair of long under the hood instead of a BigDecimal"?
from ta4j.
- step: improve the "getSimpleName()"-issue found by @edlins
I'll open an issue for this and take it. For now I think I'll only touch AbstractRule and BaseStrategy. There are many other uses of getSimpleName() but those two classes are the worst for using it in log.trace().
from ta4j.
@team172011 You can use a long if you work like decimal4j (in short: your smallest decimal is the first digit of your long).
The problem is that decimal4j is limited to approx. 18 digits, which I find not enough. On the other hand, BigDecimal is unlimited, which is useless for us. I think that 32 digits should be sufficient. It might be achieved by implementing the Decimal class using 2 longs. But it is complex maths, and I have not found enough time to dive into it.
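For illustration, a minimal decimal4j-style fixed-point sketch (all names hypothetical, not decimal4j's API): the value is stored as a long scaled by 10^8, so one 64-bit long covers roughly 18 significant digits, which is exactly the limitation mentioned above.

```java
import java.math.BigDecimal;

// Hypothetical sketch of fixed-point arithmetic on a single long
// (decimal4j-style). A value v is stored as the long v * 10^SCALE.
final class FixedPoint {
    static final int SCALE = 8;           // 8 decimal places
    static final long ONE = 100_000_000L; // 10^SCALE

    private final long raw;               // scaled value

    private FixedPoint(long raw) { this.raw = raw; }

    static FixedPoint of(String value) {
        return new FixedPoint(new BigDecimal(value).movePointRight(SCALE).longValueExact());
    }

    // Addition/subtraction are exact plain long arithmetic.
    FixedPoint plus(FixedPoint o)  { return new FixedPoint(Math.addExact(raw, o.raw)); }
    FixedPoint minus(FixedPoint o) { return new FixedPoint(Math.subtractExact(raw, o.raw)); }

    // (a*10^S)*(b*10^S) = a*b*10^2S, so rescale once. This truncates
    // instead of rounding, and Math.multiplyExact exposes the ~18-digit
    // overflow limit: the two pain points of the single-long approach.
    FixedPoint multipliedBy(FixedPoint o) {
        return new FixedPoint(Math.multiplyExact(raw, o.raw) / ONE);
    }

    @Override public String toString() {
        return BigDecimal.valueOf(raw).movePointLeft(SCALE).stripTrailingZeros().toPlainString();
    }
}
```

Extending this to 32 digits would mean carrying the product of two longs across a pair of longs (a 128-bit intermediate), which is the "complex maths" referred to above.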
We already discussed that (with @davassi) here.
Anyway, I strongly advise you not to use double for your calculations. Here are some issues where strong value deviations were found when ta4j was using double: mdeverdelhan/ta4j-origins#50, mdeverdelhan/ta4j-origins#4
Also an old quote (from mdeverdelhan/ta4j-origins#19 (comment)):
Here you will find some resources about the problems of doubles and why they should not be used when dealing with money:
- http://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency
- http://stackoverflow.com/questions/179427/how-to-resolve-a-java-rounding-double-issue
- http://programmers.stackexchange.com/questions/224859/when-do-rounding-problems-become-a-real-problem-is-the-least-significant-digit
from ta4j.
There is no need to reanimate Decimal. It works fine without it: https://github.com/damir78/ta4jUsingDouble
You should understand: every time you use BigDecimal, a new immutable object is created internally. That is not performant if you have too many calculations.
For example: minus, plus, multipliedBy, dividedBy → the JVM creates at least 4 BigDecimal objects.
If you use double, it is highly optimized.
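The allocation point can be made concrete with plain java.math.BigDecimal (ta4j's `minus`/`plus`/`multipliedBy`/`dividedBy` wrap these methods; the chain below is an illustrative example, not ta4j code). Every call returns a brand-new immutable object:

```java
import java.math.BigDecimal;

public class AllocationDemo {
    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("100.25");
        // Four operations -> at least four intermediate BigDecimal objects,
        // because BigDecimal is immutable and every call allocates a new one.
        BigDecimal result = price
                .subtract(new BigDecimal("0.25")) // allocation 1
                .add(new BigDecimal("1.00"))      // allocation 2
                .multiply(new BigDecimal("2"))    // allocation 3
                .divide(new BigDecimal("4"));     // allocation 4
        System.out.println(result);               // prints 50.50

        // The same chain on primitive doubles allocates nothing on the heap.
        double d = (100.25 - 0.25 + 1.00) * 2 / 4;
        System.out.println(d);                    // prints 50.5
    }
}
```

Run per indicator value over a series with millions of bars, this allocation churn is where the GC pressure comes from.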
Let me know if https://github.com/damir78/ta4jUsingDouble does not meet your visions/aims.
But it is now much, much faster.
from ta4j.
@mdeverdelhan I have read your previous discussions (and the others on the web) regarding using double for financial operations and I agree; or rather, there is nothing to agree with but the fact that double can lead to big inaccuracies.
For dealing with 'classic' securities that have fewer decimal places, BigDecimal is too massive and double is too imprecise.
I think we don't have to reinvent the wheel; we could test decimal4j with fewer decimal places and introduce it as a second Decimal type instead of double.
from ta4j.
@mdeverdelhan Sorry, but I can't understand your argument. All calculations are correct. All tests are adapted and green as well.
from ta4j.
For the calculations you don't need BigDecimal. If you still want to have it, use it ONLY for end-result representation / money representation.
from ta4j.
For the calculations you don't need BigDecimal.
@damir78 Why do you think so? If you read the posts here, you should understand the issues regarding double and the solution we will provide in the future (#88). So please stop discussing why we should use double solely for our calculations. The end result will vary if internal calculations are imprecise! This is common math and should be clear. If double fits your needs, then stick with it.
from ta4j.
Simon, at this point what is your opinion on #88? If you oppose making Decimal an interface with BaseDecimal using BigDecimal, I'd like to know why. Sure, it's a little more work (I volunteer), but the results are the same. I don't see a compelling rational reason to obstruct Decimal-as-interface. In fact, this debate makes exactly the case for why it should be an interface, does it not?
from ta4j.
Because I want to enable the possibility of implementing one's own calculation type/logic.
from ta4j.