A distributed implementation of argon, built using Cloud Haskell with a PostgreSQL database.
Distributed-Argon uses Cloud Haskell to distribute the workload of argon, a library that measures code complexity. It implements two distribution patterns: work-stealing and master/slave.
The program accepts a GitHub repository and calculates the complexity of every file in every commit of the project, storing the results in a database. I created a separate repository, Charting-Complexity, to generate the graphs.
I decided to implement two algorithms and graph their results against each other.
- Work-Stealing
Worker nodes steal work from the manager. The manager hands out files on a first-come, first-served basis. Each worker evaluates the complexity of a file, returns the result and requests more work from the manager. This implementation is often referred to as the self-scheduling or work-stealing pattern.
Link to implementation in the source
- Master/Slave
A manager node decides on the distribution of the work. The manager splits the work evenly on a per-file basis and sends an equal share to each worker.
Link to implementation in the source
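The two patterns above can be sketched on a single machine. This is a minimal illustration, not the project's code: it uses plain threads, an `MVar` and a `Chan` from `base` as stand-ins for Cloud Haskell processes and messages, and the names `complexity`, `workStealing`, `staticSplit` and `deal` are hypothetical. It assumes at least one worker.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.MVar (modifyMVar, newMVar)
import Control.Monad (forM_, replicateM)
import Data.List (sort)

-- Hypothetical stand-in for running argon on one file.
complexity :: FilePath -> Int
complexity = length

-- Work-stealing: each worker pulls the next file from a shared queue
-- as soon as it is free, and stops when the queue is empty.
workStealing :: Int -> [FilePath] -> IO [(FilePath, Int)]
workStealing nWorkers files = do
  queue   <- newMVar files
  results <- newChan
  forM_ [1 .. nWorkers] $ \_ -> forkIO (worker queue results)
  -- The "manager" collects exactly one result per file, in arrival order.
  replicateM (length files) (readChan results)
  where
    worker queue results = do
      next <- modifyMVar queue $ \q -> case q of
        []       -> return ([], Nothing)
        (f : fs) -> return (fs, Just f)
      case next of
        Nothing -> return ()
        Just f  -> do
          let c = complexity f
          c `seq` writeChan results (f, c)  -- force the work in the worker
          worker queue results

-- Master/slave: the manager deals the files out up front, round-robin,
-- and each worker processes only its own share.
staticSplit :: Int -> [FilePath] -> IO [(FilePath, Int)]
staticSplit nWorkers files = do
  results <- newChan
  forM_ (deal nWorkers files) $ \share ->
    forkIO $ forM_ share $ \f -> do
      let c = complexity f
      c `seq` writeChan results (f, c)
  replicateM (length files) (readChan results)

-- Split a list into n shares, round-robin.
deal :: Int -> [a] -> [[a]]
deal n xs = [[x | (i, x) <- zip [0 ..] xs, i `mod` n == k] | k <- [0 .. n - 1]]

main :: IO ()
main = do
  let files = ["src/Lib.hs", "app/Main.hs", "test/Spec.hs"]
  a <- workStealing 2 files
  b <- staticSplit 2 files
  -- Arrival order may differ, but both patterns yield the same results.
  print (sort a == sort b)
```

In both functions the results arrive in whatever order the workers finish, which is why the comparison sorts them first.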
The manager stores the results it receives from the workers in a database as they arrive, in non-deterministic order.
As I expected, the work-stealing pattern was faster on average. This can be seen from the sample results provided below. In the master/slave pattern the manager fixes each worker's share in advance, so working time can be lost while the manager waits for one worker to finish its share even though other workers are already idle. This does not occur with the work-stealing pattern, because the manager simply sends the next file to whichever worker is ready.
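A small worked example makes the difference concrete. The sketch below is not taken from the project: the per-file costs are invented, and `staticMakespan` and `selfSchedulingMakespan` are hypothetical names. It compares the total elapsed time when files are dealt out round-robin up front with the time when each file goes to whichever worker is free first.

```haskell
import Data.List (foldl', sort)

-- Elapsed time when files are dealt round-robin to n workers in advance:
-- the run lasts as long as the most heavily loaded worker.
staticMakespan :: Int -> [Int] -> Int
staticMakespan n costs =
  maximum [sum [c | (i, c) <- zip [0 ..] costs, i `mod` n == k] | k <- [0 .. n - 1]]

-- Elapsed time when each file is handed to the worker that becomes free first.
-- The list holds the time at which each worker finishes its current file.
selfSchedulingMakespan :: Int -> [Int] -> Int
selfSchedulingMakespan n costs = maximum (foldl' assign (replicate n 0) costs)
  where
    assign busy c = case sort busy of
      (t : rest) -> (t + c) : rest  -- earliest-free worker takes the file
      []         -> []

main :: IO ()
main = do
  let costs = [5, 1, 5, 1]  -- invented per-file costs, in arbitrary time units
  print (staticMakespan 2 costs)          -- 10: one worker gets both expensive files
  print (selfSchedulingMakespan 2 costs)  -- 6: the load evens out
```

With these costs the static split leaves one worker idle for most of the run. Self-scheduling is not guaranteed to win on every input, which matches the "on average" qualification above.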
A PostgreSQL database is used to store the relevant information about a repository's complexity and the time taken with various numbers of nodes. The database maintains the following tables:
Repository
Id | Url | Nodes | Start Time | End time |
---|---|---|---|---|
1 | https://github.com... | 2 | 2017-11-26 15:02:36.830273+00 | 2017-11-26 15:03:25.63044+00 |
Commit Info
Id | Commit | Start Time | End Time |
---|---|---|---|
1 | 22939d... | 2017-11-26 15:02:36.830273+00 | 2017-11-26 15:02:36.830273+00 |
Commit Results
Id | Commit | File Path | Complexity |
---|---|---|---|
1 | 22939d... | Distributed-Argon/src/Lib.hs | JSON data |
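As an illustration of how the stored timings can be used, the sketch below models a Repository row as a Haskell record and derives the run duration from its start and end times. The record and field names are hypothetical and not taken from the project's schema code; the timestamps are those of the sample row above, and the `time` package is assumed to be available.

```haskell
import Data.Time (NominalDiffTime, TimeOfDay (..), UTCTime (..),
                  diffUTCTime, fromGregorian, timeOfDayToTime)

-- Hypothetical record mirroring the columns of the Repository table.
data Repository = Repository
  { repoId    :: Int
  , repoUrl   :: String
  , repoNodes :: Int
  , startTime :: UTCTime
  , endTime   :: UTCTime
  }

-- Wall-clock duration of one run over a repository.
runDuration :: Repository -> NominalDiffTime
runDuration r = diffUTCTime (endTime r) (startTime r)

-- The sample row from the Repository table (URL left elided as in the table).
sampleRun :: Repository
sampleRun = Repository
  { repoId    = 1
  , repoUrl   = "https://github.com..."
  , repoNodes = 2
  , startTime = UTCTime (fromGregorian 2017 11 26) (timeOfDayToTime (TimeOfDay 15 2 36.830273))
  , endTime   = UTCTime (fromGregorian 2017 11 26) (timeOfDayToTime (TimeOfDay 15 3 25.63044))
  }

main :: IO ()
main = print (runDuration sampleRun)  -- roughly 48.8 seconds with 2 nodes
```

Comparing this duration across rows with different Nodes values for the same Url is one way to see how a run scales with the number of workers.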
- PostgreSQL to store the data.
- Stack to build and run the project.
stack build
Fire up two shells and execute the following scripts.
bash workers.sh
bash run.sh <Github Repository> <pattern>
The patterns can be
work-stealing
or master-slave
The number of workers, the host address and the port numbers can be changed by editing the worker.sh and manager.sh scripts.
I have built a graphical display of the results using Chart.js. A link to that repo can be found here
Alternatively, since all the necessary information is stored in a database, you can query and manipulate it in any way you see fit.
Thanks to all the argon contributors for allowing me to display my distributed programming skills with their great library!