akashbing / csse-senior-thesis

This project forked from gateslm/csse-senior-thesis

HTML 43.05% Java 32.43% Shell 0.50% TeX 23.59% PigLatin 0.42%

CSSE-Senior-Thesis

Senior thesis project done for 2016-2017. Utilizes Hadoop Ecosystem, Java MapReduce, Apache Pig, Apache Hive.

This project was completed during the 2016-2017 academic year at Rose-Hulman Institute of Technology for the Computer Science and Software Engineering Department, under the advisement of Dr. Sriram Mohan.


Folder Layout

The folders in this repository contain the following:

  • Diagrams: Contains various diagrams that I used in papers and presentations.
  • Example_Data: Contains subfolders that hold example data:
    • AdvertisementData: Contains example output of the advertisement data randomly generated by the GenerateThoroughBillboardData project. Includes regionAds.csv for the 29 regions and segmentAds.csv for the ~1300 segments, each covering the time period coded into the program.
    • RegionData: Three files (CSV, XML, and JSON) downloaded from the Chicago Data Portal using the automated download script. They show what the 'Congestion Estimates by Regions' dataset looks like.
    • SegmentData: Three files (CSV, XML, and JSON) downloaded from the Chicago Data Portal using the automated download script. They show what the 'Congestion Estimates by Segments' dataset looks like.
    • Visual Data: Contains the datasets used to find the coordinates of the regions and segments. Its subfolder holds an HTML file that shows a map of Chicago's zip codes with 58 markers, where markers of matching color indicate the north-east and south-west corners of a region's square.
  • Misc: Various files and projects
    • Projects/GenerateBillboardData: Project that was initially used to create advertisement data. (Eclipse Project)
    • Scripts: Scripts that I used to keep track of commands.
    • Summaries: Documents related to the progress of the thesis during the academic school year.
  • Paper: Documents used to create the final thesis paper for the class.
  • Presentations: Various files used for a poster presentation and for quarterly updates.
  • Projects: The projects that make up the thesis work (Eclipse projects).
    • DataCollection: Uses jsoup to download Census data per zip code.
    • GenerateThoroughBillboardData: Generates the advertisement data for segments and regions. Each row holds the segment or region identifier, an advertisement, a rating, and the length of time for the advertisement. The data is available in Example_Data/AdvertisementData.
    • SegmentsInRegions: Quick program to determine which segments fall within a region. It was an idea for determining how much segments factor into a region.
    • Traffic Analysis: Main aggregation of data. Contains 4 separate projects within the project itself.
      • CongestionFinder: Original prototype for day analysis. Attributes from this project are used in the DayAnalysis project, which is described below.
      • DayAnalysis: Takes all the data for a region or segment on a given day, computes congestion statistics, and finds the longest period of congestion.
      • FlowAnalysis: Original prototype for flow analysis. The code was reused in the FlowAnalysisGrouping project with added features.
      • FlowAnalysisGrouping: Takes advertisement data joined with DayAnalysis output and determines whether a given area gets an advertisement and, if so, which advertisement is best for that region or segment. Advertisement placement also takes into account the length of time over which the congestion calculations were made. Results are split by week of year, month of year, year, and all time (same as FlowAnalysis).
  • Results: Final results after running all the commands found in trafficAnalysisCommands.txt.
  • Scripts: Contains all the Pig, Hive, and Bash scripts used to download the data, clean and parse it, or store it in an SQL format. The order in which to run the commands is found in trafficAnalysisCommands.txt.
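To give a rough idea of what "finding the longest period of congestion" in DayAnalysis can look like, here is a minimal sketch in plain Java. It is not the code from the project: the class name, the method name, and the 15 mph threshold are all assumptions made for illustration, and it assumes the day's readings are already ordered by time.

```java
public class LongestCongestion {
    // Hypothetical threshold: readings below this speed (mph) count as congested.
    static final double CONGESTED_BELOW_MPH = 15.0;

    /**
     * Returns {startIndex, length} of the longest run of consecutive congested
     * readings, or {-1, 0} if no reading is congested. If two runs tie, the
     * earlier one is returned.
     */
    static int[] longestCongestedRun(double[] speeds) {
        int bestStart = -1;
        int bestLen = 0;
        int runStart = 0;
        int runLen = 0;
        for (int i = 0; i < speeds.length; i++) {
            if (speeds[i] < CONGESTED_BELOW_MPH) {
                if (runLen == 0) {
                    runStart = i; // a new congested run begins here
                }
                runLen++;
                if (runLen > bestLen) {
                    bestLen = runLen;
                    bestStart = runStart;
                }
            } else {
                runLen = 0; // an uncongested reading ends the current run
            }
        }
        return new int[] {bestStart, bestLen};
    }

    public static void main(String[] args) {
        // Made-up speeds for one region on one day, in time order.
        double[] speeds = {25, 12, 10, 30, 9, 8, 7, 28};
        int[] run = longestCongestedRun(speeds);
        System.out.println("start=" + run[0] + " length=" + run[1]); // start=4 length=3
    }
}
```

The same single pass would fit inside a reducer that receives one day's readings for a region or segment, provided the readings arrive sorted by timestamp.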

trafficAnalysisCommands.txt:

Contains all the commands needed to run the project, in the order to run them. Following them should produce results similar to mine, although the results will differ if the advertisement data is regenerated, since that data is randomly generated.
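Because the advertisement data is random, one way to make reruns comparable is to fix the random seed. The sketch below illustrates the idea only. The class name, the advertisement names, the 1-5 rating scale, and the 15-minute steps are all hypothetical and are not taken from GenerateThoroughBillboardData; only the row shape (identifier, advertisement, rating, length of time) follows the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SeededAdRows {
    // Hypothetical advertisement names, used only for this illustration.
    static final String[] ADS = {"AdA", "AdB", "AdC"};

    /** Builds one CSV row per identifier: id,advertisement,rating,minutes. */
    static List<String> generate(int identifiers, long seed) {
        Random random = new Random(seed); // same seed gives the same sequence
        List<String> rows = new ArrayList<>();
        for (int id = 1; id <= identifiers; id++) {
            String ad = ADS[random.nextInt(ADS.length)];
            int rating = 1 + random.nextInt(5);          // assumed 1-5 scale
            int minutes = 15 * (1 + random.nextInt(8));  // assumed 15-minute steps
            rows.add(id + "," + ad + "," + rating + "," + minutes);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> first = generate(29, 42L);
        List<String> second = generate(29, 42L);
        System.out.println(first.equals(second)); // true: same seed, same rows
    }
}
```

With a fixed seed, two runs of the generator yield identical rows, so differences in the final results can be traced to the analysis steps rather than to the input data.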


Please feel free to contact me for more information through GitHub.
