imuqtadir / nasdaq-stocksvolatility-mapreduce
The project aims to find the ten most volatile and the ten least volatile stocks using data stored in HDFS, which is designed to store very large datasets reliably. Three datasets are provided: large, medium and small. Each dataset contains many files, and each file holds the stock pricing data for one company. We use MapReduce to speed up the work by reading and processing the files in parallel. The input files are partitioned into smaller parts, which are read and processed by multiple mappers. Each mapper outputs <key, value> pairs, which serve as the input to the reducer. The reducer then takes a <key, List<value>> pair as its input, processes it, and outputs the results to the user.
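The mapper-to-reducer flow described above can be sketched in plain Python, without Hadoop, to make the <key, value> and <key, List<value>> shapes concrete. This is only an illustrative sketch, not the project's code. The function names (`map_phase`, `shuffle`, `reduce_phase`, `rank`), the sample tickers, and the record layout are all hypothetical. The volatility measure used here, the population standard deviation of simple price-to-price returns, is an assumption, since the description does not say how volatility is computed.

```python
from collections import defaultdict
from statistics import pstdev


def map_phase(records):
    """Mapper: emit one <key, value> pair (ticker, price) per input record."""
    for ticker, price in records:
        yield ticker, price


def shuffle(pairs):
    """Group values by key so each reducer call sees <key, List<value>>."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(ticker, prices):
    """Reducer: ASSUMED volatility = population std dev of simple returns."""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    volatility = pstdev(returns) if returns else 0.0
    return ticker, volatility


def rank(records, n=10):
    """Run map, shuffle and reduce, then return (most, least) volatile lists."""
    grouped = shuffle(map_phase(records))
    results = [reduce_phase(t, p) for t, p in grouped.items()]
    most = sorted(results, key=lambda r: r[1], reverse=True)[:n]
    least = sorted(results, key=lambda r: r[1])[:n]
    return most, least


# Hypothetical sample data: (ticker, price) in chronological order.
sample = [
    ("AAA", 10.0), ("AAA", 10.0), ("AAA", 10.0),
    ("BBB", 10.0), ("BBB", 20.0), ("BBB", 10.0),
]
most, least = rank(sample, n=1)
print("most volatile:", most)
print("least volatile:", least)
```

In an actual Hadoop job the map and reduce steps would be written as Mapper and Reducer classes and the grouping would be done by the framework's shuffle stage; the sketch only mirrors the data flow.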