William Clift Data Structures April 2020
Parallel Merge Sort Assignment
Concepts
The given code allowed us to perform a merge-sort algorithm on an array of integers recursively. The task was then to parallelize this algorithm by extending RecursiveAction. We introduced the ForkJoin framework so that each time we create a new MergeSort object, we can solve the subproblem for each half of the array. Once the array is split down to pieces of at most two elements, we return and call merge(), which merges the two sorted sublists back into the larger list. This concludes when the entire list is considered in the last recursive return, and the full list is sorted. We take advantage of multiple cores on a given machine by splitting into two new tasks every time we take the recursive step. Let us now examine the implementation of its parallelization.
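The sequential algorithm described above can be sketched as follows. This is a minimal illustration, not the assignment's exact code; the class name and helper structure are assumptions.

```java
import java.util.Arrays;

public class SequentialMergeSort {
    public static void mergeSort(int[] a, int lo, int hi) {
        if (hi - lo < 2) return;          // base case: zero or one element
        int mid = (lo + hi) / 2;
        mergeSort(a, lo, mid);            // solve the left-half subproblem
        mergeSort(a, mid, hi);            // solve the right-half subproblem
        merge(a, lo, mid, hi);            // merge the two sorted halves
    }

    // Merge the sorted ranges [lo, mid) and [mid, hi) back into a.
    private static void merge(int[] a, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi)
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 7};
        mergeSort(data, 0, data.length);
        System.out.println(Arrays.toString(data)); // [1, 2, 5, 7, 9]
    }
}
```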
Implementation
The method compute() was added to the given code first, overriding the corresponding method in RecursiveAction. Then, when we start the merge sort, we call compute() instead of mergeSort() for every new object that is forked as a new task. If the subproblem is not to be multi-threaded, we continue classically with the recursion; otherwise we create two new subtasks and use ForkJoinTask.invokeAll() to fork the work into two parallel tasks. I also have the program keep track of how many cores on the machine remain available, so that the number of threads never exceeds it and we avoid creating many threads that each do small amounts of work. Once the tasks have been split across all cores of the machine, we continue with classic mergeSort. Finally, the subtasks merge and the array comes back sorted.
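The parallel structure can be sketched like this. It is a hedged sketch, not the assignment's code: where the report tracks remaining cores, this version uses an assumed size THRESHOLD to decide when to stop forking, which is a common simplification with the same intent.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;

public class ParallelMergeSort extends RecursiveAction {
    private static final int THRESHOLD = 1 << 13; // assumed cutoff: below this, sort sequentially
    private final int[] a;
    private final int lo, hi;

    public ParallelMergeSort(int[] a, int lo, int hi) {
        this.a = a; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo < 2) return;
        int mid = (lo + hi) / 2;
        if (hi - lo <= THRESHOLD) {
            // small subproblem: continue classically in this thread
            new ParallelMergeSort(a, lo, mid).compute();
            new ParallelMergeSort(a, mid, hi).compute();
        } else {
            // fork the two halves as subtasks and wait for both to finish
            ForkJoinTask.invokeAll(new ParallelMergeSort(a, lo, mid),
                                   new ParallelMergeSort(a, mid, hi));
        }
        merge(a, lo, mid, hi);
    }

    // Merge the sorted ranges [lo, mid) and [mid, hi) back into a.
    private static void merge(int[] a, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6};
        ForkJoinPool.commonPool().invoke(new ParallelMergeSort(data, 0, data.length));
        System.out.println(java.util.Arrays.toString(data)); // [1, 1, 2, 3, 4, 5, 6, 9]
    }
}
```

The common pool sizes itself to the number of available cores, so invokeAll() naturally spreads the forked subtasks across the machine.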
The Addition of Generics
For an additional challenge, I introduced generics to the algorithm so that it could be used for any set of data. This required using <T extends Comparable<? super T>> and ensuring that each time we work with the data structure, we can handle any type it contains. This made the program more involved, because every reference to a data structure or variable of this type needed to be replaced with the type parameter T, which stands in for the concrete type supplied at the call site. But given any reference type that implements Comparable, we are now able to sort it.
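The shape of that change can be sketched as follows, assuming the same sequential structure; only the signatures and the comparison change, with compareTo() replacing the primitive <= operator.

```java
import java.util.Arrays;

public class GenericMergeSort {
    public static <T extends Comparable<? super T>> void mergeSort(T[] a, int lo, int hi) {
        if (hi - lo < 2) return;
        int mid = (lo + hi) / 2;
        mergeSort(a, lo, mid);
        mergeSort(a, mid, hi);
        merge(a, lo, mid, hi);
    }

    @SuppressWarnings("unchecked")
    private static <T extends Comparable<? super T>> void merge(T[] a, int lo, int mid, int hi) {
        Object[] tmp = new Object[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi)
            tmp[k++] = (a[i].compareTo(a[j]) <= 0) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        for (int t = 0; t < tmp.length; t++)
            a[lo + t] = (T) tmp[t]; // unchecked but safe: tmp only ever holds elements of a
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "mango"};
        mergeSort(words, 0, words.length);
        System.out.println(Arrays.toString(words)); // [apple, mango, pear]
    }
}
```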
Empirical Analysis
As we know from our studies, the merge-sorting algorithm has a runtime of O(n log n). The data was gathered by generating random integers with RandomNumGenerate, transferring these to a file, and then reading the file into the program and recording the time output. The timer, of course, started recording only once the merge sort algorithm began running.
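The timing setup can be sketched like this. It is a minimal illustration of the measurement approach described, not the assignment's harness; Arrays.sort() stands in for the merge sort under test, and the input is generated in memory rather than read from a file.

```java
import java.util.Arrays;
import java.util.Random;

public class TimingHarness {
    public static void main(String[] args) {
        int n = 1_000_000;
        // Random input, seeded so runs are repeatable.
        int[] data = new Random(42).ints(n).toArray();

        long start = System.nanoTime();  // start the clock only once sorting begins
        Arrays.sort(data);               // stand-in for the merge sort under test
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(n + " integers sorted in " + elapsedMs + " ms");
    }
}
```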
Single, Multi-Threaded, and Difference (Runtime vs. Number of Integers)
As can be seen above, the blue data points refer to the single-threaded algorithm, the green to the multi-threaded, and the orange to the difference. This shows a speedup going from single- to multi-threaded, which generally widens as the number of integers increases. This is supported by the fact that the difference (single-threaded minus multi-threaded) is mostly positive. Let us now observe a regression analysis on the datasets.
Trendline Comparison (Runtime vs. Number of Integers)
Adding the regression plots gives us these models. We can see that a logarithmic regression fits the graphs nicely. The following equations represent their functions.
For the Single-threaded, labeled in blue:
For the Multi-threaded, labeled in green:
For the Difference, labeled in orange:
Clearly, these regression lines model the algorithm's runtime well as the input grows. By this, we can conclude the runtime for our algorithm must be
Conclusion
In this programming exercise, we saw that parallelizing a merge-sorting algorithm led to a speedup in runtime, as shown by the trend in our analysis. We also got practice using generics with Java's Comparable interface as well as the ForkJoin framework. Getting the chance to thoroughly investigate an elegant sorting algorithm that uses recursion to achieve efficiency gives us insight into the problem of sorting.