This report is about the oblig 3 in the UiO coarse IN4030 (Efficient parallel programming). In this report I'm solving the process of finding large prime numbers & factorizing large numbers. It demonstrates the usage of Sieve of Eratosthenes and how paralellization can make this process more efficient.
The program can be run using the command:
java -cp target/IN4030-oblig-3-1.0-SNAPSHOT.jar src.Main <n> <t> <m>
- n - Program will generate all primes up to n and factorize the 100 largest numbers less than n*n
- t - How many threads to use. if zero then the program uses the number of cores on the machine
- m - Which mode to use. Can be the following:
- 0: runs the program sequentially
- 1: runs the program in parallel
- 2: runs benchmarks and prints these times
- 3: runs tests
Example usage for running sequentially for 20000000:
java -cp target/IN4030-oblig-3-1.0-SNAPSHOT.jar src.Main 20000000 0 0
Example usage for running parallel for 20000000 with 4 threads:
java -cp target/IN4030-oblig-3-1.0-SNAPSHOT.jar src.Main 20000000 4 1
Example usage for running benchmarks:
java -cp target/IN4030-oblig-3-1.0-SNAPSHOT.jar src.Main 20000000 0 2
Example usage for running tests:
java -cp target/IN4030-oblig-3-1.0-SNAPSHOT.jar src.Main 20000000 0 3
Given that the number n should be factorized.
First I used the sequential sieve to obtain all prime numbers up to m (red line). In order to obtain these numbers, the sequential sieve had to mark all numbers from 0 to p (green line) and then collect the primes (numbers which was not marked). This sequential part is relatively fast as it's only iterating double square root of n. The iteration is also skipping:
- Skipping even numbers
- Skipping by 2p
- Starting to read from p*p
Now that I had primes from 0 to m (in red), I could start marking the numbers from 0 to n (black line). In order to parallelize it, I decided to split 0 to n so each thread only calculates for a certain portion. Since we're using bytearrays for splitting and skipping even numbers, it's important that we split so each thread doesn't start at the middle of a byte, but rather starts at the following 16th byte.
Now that we've marked all numbers from 0-n, we have a bitarray of marked numbers (numbers which is not prime numbers). All we have to do at the end is sequentially convert this bitarray to an array of integer with prime numbers from 0-n
4. Parallel factorization of a large number – how you did the parallelization – consider including drawings
(Before starting this process, it's important to know the prime numbers has already been generated by the sieve)
The prime numbers were evenly distributed among all threads. Each thread would be responsible for around the same amount of prime numbers to mark
In each thread, it iterates the assigned primes and check if the prime number is a factor in n. If yes, it will add this prime to an array, otherwise it will continue searching.
After all threads were finished, they had a local array of factors. The final thing we have to do is to merge these arrays and add the remainder if there is any. Then we have the complete factorization and are finished.
The main method should be used for launching the program. See 2. (User guide) on how to launch it. Based on the input the program either:
- Runs sequentially where it's using the precode for the sieve. Then it factorizes using the sequentially code which I created.
- Runs parallel. In this process, It's using the parallel sieve which I've built to obtain the prime numbers. After this, it uses the result of the prime numbers to factorize in parallel.
- Runs benchmarks. In this process, it's taking every individual process (sequential sieve, sequential factorization, parallel sieve, parallel factorization) and measures the time for each of these. It also displays the speedsup achieved. It calculates the median runtime of total 7 runs.
- Runs tests. In this case it's doing the whole procedure of factorizing sequential and factorizing in parallel. After completed, it compares the outout of the results for each sequential & parallel and ensures these are equal.
6. Measurements – includes discussion, tables, graphs of speedups for the four values of N, number of cores used.
Note: Time in in milliseconds
n | Sequential | Parallel | Speedup |
---|---|---|---|
2000000 | 7.235 | 8.130 | 0.889 |
20000000 | 69.18 | 60.838 | 1.137 |
200000000 | 855.313 | 496.267 | 1.723 |
2000000000 | 12616.962 | 7223.361 | 1.746 |
n | Sequential | Parallel | Speedup |
---|---|---|---|
2000000 | 7.041 | 68.891 | 0.102 |
20000000 | 53.188 | 81.871 | 0.649 |
200000000 | 319.489 | 135.36 | 2.360 |
2000000000 | 1147.948 | 379.519 | 3.024 |
n | Sequential | Parallel | Speedup |
---|---|---|---|
2000000 | 27.900 | 63.561 | 0.438 |
20000000 | 119.145 | 136.906 | 0.870 |
200000000 | 1363.669 | 805.802 | 1.692 |
2000000000 | 14228.420 | 7664.074 | 1.856 |
From the results we can first see that paralleling the sieve seems to give a speedup. Once n increases, the efficiency of multithreading the program is also increasing. At 2 billion, we have a speedup for around 1.856 which means that the algorithm is almost runs at twice the speed due to running multithreaded. For smaller numbers however, it's not giving large speedsup. For e.g 20000000 we see that we only achieve speedup of 0.870 and when n is even lower, then parallelization is slower likely due to overhead of starting threads etc. We can also see that the parallel sieve does complete at 7223.361 which is below the requirement of the sieve completing in 30 seconds.
Over to the factorization part, we can see that the speedup begins with around 0.102 which implies that for low numbers, paralleling does not increase the speed. The trend shows that as N does become larger, the efficiency of paralleling increases. When is n is 200000000 or larger, we achieve a speedup greater than 1. For even greater n numbers, we see that the effect of parallelization is even higher (2000000000 has speedup of 3.024). We see that having multiple threads which each factorizes subset of the prime numbers does drastically increase the efficiency for high numbers. For low numbers however, having a sequential solution is faster likely due to the overhead of synchronization's and creating/starting threads.
For the whole procedure of sieve + factorization, we see that the sequential algorithm is faster for lower numbers likely due to not having to create threads, synchronize etc. As n increases however, the overhead turns out to be less significant and parallelize runs faster than sequential. I'd expect this speedup to increase more if we're increasing n to a higher number.
We can see that parallelling the sieve does give a speedup, especially once n becomes larger. We also see that this occurs in parallelling the factorization. As n becomes larger, the efficiency increases. Multithreading enables us to run the factorization faster once there's large numbers. In total, when adding the sieve and the factorization, we see that parallelization is beneficial when n is 20000000 or more. Lower numbers are faster running in sequence which is likely due to the overhead of creating threads, synchronization of threads etc in the parallel solution.
Running benchmarks. This can take a while, please wait...
Starting benchmarking sieves...
Sequential Sieve used median time 7.2358ms for n = 2000000
Parallel Sieve used median time 8.1305ms for n = 2000000
Speedup: 0.8899575671852901
Sequential Sieve used median time 69.1867ms for n = 20000000
Parallel Sieve used median time 60.8381ms for n = 20000000
Speedup: 1.1372265077311752
Sequential Sieve used median time 855.3137ms for n = 200000000
Parallel Sieve used median time 496.2678ms for n = 200000000
Speedup: 1.723492235442235
Sequential Sieve used median time 12616.9623ms for n = 2000000000
Parallel Sieve used median time 7223.3613ms for n = 2000000000
Speedup: 1.7466885257421638
Finished benchmarking sieves...
--------------------------------
Starting benchmarking factorization...
Sequential Factorization used median time 7.0416ms for n = 2000000
Paralell Factorization used median time 68.8914ms for n = 2000000
Speedup: 0.10221304836307579
Sequential Factorization used median time 53.1885ms for n = 20000000
Paralell Factorization used median time 81.8712ms for n = 20000000
Speedup: 0.6496606865417875
Sequential Factorization used median time 319.4897ms for n = 200000000
Paralell Factorization used median time 135.36ms for n = 200000000
Speedup: 2.3602962470449174
Sequential Factorization used median time 1147.9484ms for n = 2000000000
Paralell Factorization used median time 379.5191ms for n = 2000000000
Speedup: 3.024744736167429
Finished benchmarking factorization...
--------------------------------
Starting benchmarking sieve + factorization...
Sequential sieve + factorization used median time 27.9006ms for n = 2000000
Parallel sieve + factorization used median time 63.5612ms for n = 2000000
Speedup: 0.4389564702994909
Sequential sieve + factorization used median time 119.1454ms for n = 20000000
Parallel sieve + factorization used median time 136.9062ms for n = 20000000
Speedup: 0.8702703018563074
Sequential sieve + factorization used median time 1363.6698ms for n = 200000000
Parallel sieve + factorization used median time 805.8028ms for n = 200000000
Speedup: 1.69231206444058
Sequential sieve + factorization used median time 14228.4203ms for n = 2000000000
Parallel sieve + factorization used median time 7664.0743ms for n = 2000000000
Speedup: 1.8565086588474227
Finished benchmarking sieve + factorization
--------------------------------
Process finished with exit code 0
Run the command
java -jar target/benchmarks.jar
The measurement begins with warming up in order to make sure that:
- Program is in memory/cache
- JIT has done it's optimizations
- The data has been read from disk
- The data has been through memory/cache
It warms up once. After this, it runs 3 iterations of the benchmarks and collects the results of these runs. We can see the results as following for n=200_000_000:
Iteration 1: 1.101 s/op
Iteration 2: 1.089 s/op
Iteration 3: 1.076 s/op
Result "src.JbhBenchmarks.testSequential":
1.089 ±(99.9%) 0.232 s/op [Average]
(min, avg, max) = (1.076, 1.089, 1.101), stdev = 0.013
CI (99.9%): [0.857, 1.320] (assumes normal distribution)
Iteration 1: 0.700 s/op
Iteration 2: 0.694 s/op
Iteration 3: 0.691 s/op
Result "src.JbhBenchmarks.testParallel":
0.695 ±(99.9%) 0.079 s/op [Average]
(min, avg, max) = (0.691, 0.695, 0.700), stdev = 0.004
CI (99.9%): [0.616, 0.774] (assumes normal distribution)
Benchmark Mode Cnt Score Error Units
JbhBenchmarks.testParallel avgt 3 0.695 ± 0.079 s/op
JbhBenchmarks.testSequential avgt 3 1.089 ± 0.232 s/op
From the benchmarks we can see that we can run the sequential procedure around 1.089 per second the parallel procedure 0.695 per second. We can also see that we've achieved a speedup of around 1.56 by running it parallel. We concluded these results by seeing the time it took on average running a total of 3 runs for each parallel and sequential. Due to the warmup, I believe the variance for each run is lower between each run. We can also see from the results that there is not much variance between each run. As there's not much variance, using the average time seems like a good fit. We could potentially run the methods more than 3 times, but for this benchmarking I thought it was sufficient. As a summary, the results can be used to argue that paralleling this procedure for such number n does improve the speed of the algorithm.
The benchmarks are set to report the average time usage
How does factorizationa nd Sieve of Sieve of Eratosthenes work? (Not neccesairy to read for examiner)
A prime number is a whole number which is only dividable by itself and 1
Given the numbers 1-17
- We ignore 1 (x)
- Second number is 2, we underline it. It's a prime
- Then we jump by 2, cross that out, jump by 2, cross that out, jump by 2, cross that out... (2,4,6,8,10,12,14,16 are crossed out)
- Then we go to next number which has not been crossed out (3). That's a prime number and cross out every jump by 3 (6, 9, 12, 15 crossed out if not already)
- Then we go to next number which is not crossed out, (5) which is a prime number and cross out every jump by 5 (10, 15 if not crossed out already)
- Then we go to 7, but 7 is greater than of the maximum number now (17). It means that the remaining numbers which are not crossed out, is primes (11, 13, 17)
- We should also remove the even numbers before starting this process (besides 2).
- We can do the stepping by starting looking at numbers from p*p (e.g for 3 we start at 9, or for 7 we start at 49)
- If we're stepping forward with 3 we hit an odd number each 2 time (3, 6(odd), 9, 12(odd)) so we can skip jump by 6 all the time (2*p)
Any number higher than 1 is possible to factorise as a product of prime numbers N = p1 * p2 * p3 * px. E.g 4 = 22 (2 is a prime number) E.g 6 = 32 (3 and 2 is prime number) If you ignore the order of the factorised numbers, there's only one possible combination (so 6 can be factorised as 32 or 23 but if you ignore the order they are the same). In this submission we order them from highest to lowest If there's only one number in this factorization, the number itself is a prime number.
If we wanna factorize e.g 532 the way we do is:
- Start taking numbers from the sieve, starting with 2
- Try divide that into 532: 532/2 = 266. So current factorization is 2
- Then we retry to divide it by the number again: 266/2 = 133. Current factorization is 2*2
- Then we try again 133/2 which doesnt work.
- So we step to the next number in the sieve (3) and check that it's not greater than the square root of by multiple 3*3 (9) and checking that its not higher than
- 3 is also not dividable, so we go to next number in sieve (5), not dividable by 133.
- So we go to next number (7). It's dividable by 7! So we end up with 19 and the factorisation 2 * 2 * 7.
- Then we try 7 again and check if it divides by 17. But before this, we have to check if we should keep trying:
- We should stop trying whenever the candidate is greater than the square root of the number we're working on (square root of 19 is less than 5. And 7 is greater than this so we should stop trying)
- So we have reached out stop condition. If there is a number left (19 in this case) we know it's a prime number and therfore also a factor.
- So the factorization is now 227*19.