CSC352 Sharon Pamela's Project 2013

Abstract
My project compares the performance of MPI and Hadoop when sorting through the wikidump files. Both tools will be used at the full capacity available to us in class (e.g. all 8 cores for MPI, all nodes available in AWS, etc.). The goal is to help someone who has access to the same tools we have in class, and who is interested in making a collage or working with the wikidump in general, choose the better tool for loading and working with the wikidump files. With this study I am hoping to find a significant difference in execution time between the two methods. However, time gained is only one piece of the puzzle: this project will also give insight into the complexity of the programs involved.

Both the MPI and Hadoop programs will use the manager-worker paradigm we saw in class, but not in a round-robin way. Instead, the manager will start by passing, for example, 2 chunks of 10 images each to every worker. When a worker finishes its first chunk, it notifies the manager before it starts working on its second one. Once the manager knows that the worker has moved on to its second chunk, it starts preparing the next chunk to send. This way the manager and the workers work at roughly the same time, and the manager does not waste time waiting for a worker to finish before sending it its next chunk. The hope is that this method will be much more efficient than the round-robin method we have seen in the past.
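The hand-off scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's MPI or Hadoop code: threads and `queue.Queue` objects stand in for worker processes and messages, uppercasing a file name stands in for the real image work, and the names `run_manager`, `worker`, `make_chunks`, `CHUNK_SIZE` and `PREFETCH` are all hypothetical.

```python
import queue
import threading

CHUNK_SIZE = 10  # images per chunk, taken from the abstract's example
PREFETCH = 2     # chunks handed to each worker up front


def make_chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def worker(rank, inbox, notices, results):
    """Process chunks from the inbox until the manager sends None."""
    while True:
        chunk = inbox.get()
        if chunk is None:
            break
        # Stand-in for the real work on each image (assumption).
        processed = [name.upper() for name in chunk]
        results.put((rank, processed))
        # Notify the manager that one chunk is done; the worker then
        # moves straight on to the chunk already waiting in its inbox.
        notices.put(rank)


def run_manager(items, n_workers):
    """Distribute chunks so each worker always has one chunk buffered."""
    pending = iter(make_chunks(items, CHUNK_SIZE))
    inboxes = [queue.Queue() for _ in range(n_workers)]
    notices = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(r, inboxes[r], notices, results))
        for r in range(n_workers)
    ]
    for t in threads:
        t.start()

    outstanding = 0  # chunks sent but not yet reported as finished
    # Initial hand-out: PREFETCH chunks per worker, if enough exist.
    for _ in range(PREFETCH):
        for inbox in inboxes:
            chunk = next(pending, None)
            if chunk is not None:
                inbox.put(chunk)
                outstanding += 1

    # Each notification frees a slot; refill that worker while it is
    # busy with the chunk it already had buffered.
    while outstanding:
        rank = notices.get()
        outstanding -= 1
        chunk = next(pending, None)
        if chunk is not None:
            inboxes[rank].put(chunk)
            outstanding += 1

    for inbox in inboxes:
        inbox.put(None)  # shutdown signal
    for t in threads:
        t.join()

    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```

Calling `run_manager(names, 3)` on a list of 95 file names would split the work into 10 chunks and return one `(rank, processed_chunk)` pair per chunk. In an MPI version, the two kinds of queue would presumably become point-to-point messages between the manager rank and the worker ranks, but that mapping is an assumption rather than something the abstract specifies.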

      • Under construction