kaustubhsagale / te_it_dsbda_assignments_sppu

This project forked from ranjeetkumbhar01/te_it_dsbda_assignments_sppu

314457: DSBDA Lab | TE IT SPPU (2019 Pattern)

License: MIT License

Languages: Jupyter Notebook 99.94%, Java 0.06%

te_it_dsbda_assignments_sppu's Introduction

Third Year Information Technology (2019 Course) - 314457: DS & BDA Lab

Introduction

This repository contains the lab assignments and projects for the DS & BDA (Data Science and Big Data Analytics) Lab course, part of the third-year Information Technology curriculum under the 2019 course pattern.

Table of Contents

  • Group A: Assignments based on Hadoop
  • Group B: Assignments based on Data Analytics using Python
  • Group C: Model Implementation
  • Usage
  • Requirements
  • License
  • Contact

Group A: Assignments based on Hadoop

This assignment involves the installation of Hadoop on either a single node or multiple nodes. The instructions provided here will guide you through the installation process.

This assignment involves designing a distributed application using MapReduce (Java) to process a log file from a system. The goal is to identify the users who have logged in for the maximum period on the system.
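The map and reduce phases of such a job can be sketched in plain Python. This is not the repository's Java implementation: the log format (`user,login_time,logout_time` with `YYYY-MM-DD HH:MM:SS` timestamps), the function names, and the sample lines are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def map_line(line):
    """Map phase: emit (user, session length in seconds) for one log line."""
    user, login, logout = line.strip().split(",")
    duration = datetime.strptime(logout, TIME_FORMAT) - datetime.strptime(login, TIME_FORMAT)
    return user, int(duration.total_seconds())


def reduce_durations(pairs):
    """Reduce phase: sum the session lengths per user."""
    totals = defaultdict(int)
    for user, seconds in pairs:
        totals[user] += seconds
    return dict(totals)


def users_with_max_login(lines):
    """Return every user whose total logged-in time is the maximum, plus that time."""
    totals = reduce_durations(map_line(line) for line in lines if line.strip())
    longest = max(totals.values())
    return sorted(user for user, seconds in totals.items() if seconds == longest), longest


# Hypothetical sample log, not taken from the repository.
sample_log = [
    "alice,2023-01-01 09:00:00,2023-01-01 11:00:00",
    "bob,2023-01-01 10:00:00,2023-01-01 10:30:00",
    "alice,2023-01-02 09:00:00,2023-01-02 09:30:00",
    "carol,2023-01-01 08:00:00,2023-01-01 10:30:00",
]
print(users_with_max_login(sample_log))
```

The function returns a list rather than a single name because several users can tie for the longest total, as `alice` and `carol` do in the sample.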

This assignment uses HiveQL to build a flight information system. It covers the following tasks:

  • Creating, dropping, and altering database tables.
  • Creating an external Hive table.
  • Loading data into a table, inserting new values and fields, and joining tables with Hive.
  • Creating an index on the flight information table.
  • Finding the average departure delay per day in 2008.
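The last task, the average departure delay per day, is a grouped aggregation. In HiveQL it would be a `GROUP BY` query; the same logic is sketched below in plain Python rather than HiveQL. The column names (`Year`, `Month`, `DayofMonth`, `DepDelay`), the use of `NA` for a missing delay, and the sample rows are assumptions, and "per day" is assumed to mean per calendar day.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows; the column names are assumed, not taken from the repository.
SAMPLE_CSV = """Year,Month,DayofMonth,DepDelay
2008,1,3,8
2008,1,3,19
2008,1,4,-4
2008,1,4,NA
2008,1,4,34
2007,12,31,10
"""


def average_departure_delay_per_day(csv_text, year=2008):
    """Group rows by (month, day) for one year and average DepDelay, skipping missing values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["Year"]) != year or row["DepDelay"] == "NA":
            continue
        day = (int(row["Month"]), int(row["DayofMonth"]))
        sums[day] += float(row["DepDelay"])
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sorted(sums)}


print(average_departure_delay_per_day(SAMPLE_CSV))
```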

Group B: Assignments based on Data Analytics using Python

In this assignment, we work with the Facebook metrics dataset and perform the following operations:

  1. Create data subsets: We create subsets of the dataset based on specific criteria or filters.

  2. Merge Data: We merge multiple datasets together based on common columns or keys.

  3. Sort Data: We sort the data based on one or more columns in ascending or descending order.

  4. Transposing Data: We transpose the data to interchange rows and columns.

  5. Shape and Reshape Data: We reshape the data to convert it into a different structure or format.

  6. Visualize the data: We use Python libraries such as Matplotlib and Seaborn to plot graphs and visualize the data.
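Operations 1 to 5 above can be sketched without any dependencies, using lists of dictionaries as tables. The rows and column names are hypothetical stand-ins for the Facebook metrics dataset, and this is not the notebook's own code; a DataFrame library such as pandas provides each of these operations as a built-in method. Plotting is left out so the sketch runs with the standard library alone.

```python
# Hypothetical rows standing in for the Facebook metrics dataset.
posts = [
    {"post_id": 1, "type": "Photo", "likes": 79, "shares": 17},
    {"post_id": 2, "type": "Status", "likes": 130, "shares": 29},
    {"post_id": 3, "type": "Photo", "likes": 66, "shares": 14},
]
reach = [
    {"post_id": 1, "reach": 2752},
    {"post_id": 2, "reach": 10460},
    {"post_id": 3, "reach": 2413},
]

# 1. Subset: keep only the rows that match a filter.
photos = [row for row in posts if row["type"] == "Photo"]

# 2. Merge: inner join of the two tables on the shared key post_id.
reach_by_id = {row["post_id"]: row for row in reach}
merged = [
    {**row, **reach_by_id[row["post_id"]]}
    for row in posts
    if row["post_id"] in reach_by_id
]

# 3. Sort: order by likes, highest first.
by_likes = sorted(merged, key=lambda row: row["likes"], reverse=True)

# 4. Transpose: turn the rows into columns.
columns = list(posts[0])
transposed = dict(zip(columns, zip(*(row.values() for row in posts))))

# 5. Reshape: wide to long, one (post_id, metric, value) record per measurement.
long_form = [
    {"post_id": row["post_id"], "metric": metric, "value": row[metric]}
    for row in posts
    for metric in ("likes", "shares")
]

print([row["post_id"] for row in by_likes])
print(transposed["likes"])
print(len(long_form))
```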

In this assignment, we work with the Air Quality and Heart Diseases datasets and perform the following operations:

  1. Data Cleaning: We clean the datasets by handling missing values, outliers, and inconsistent data.

  2. Data Integration: We integrate multiple datasets into a single dataset based on common attributes or keys.

  3. Data Transformation: We transform the data by applying mathematical or statistical operations, feature scaling, or encoding categorical variables.

  4. Error Correcting: We correct errors in the data, such as fixing inconsistent values or resolving data quality issues.

  5. Data Model Building: We build predictive models or analyze patterns in the data using machine learning algorithms or statistical techniques.

  6. Visualize the data: We use Python libraries such as Matplotlib and Seaborn to plot graphs and visualize the data.
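The cleaning and transformation steps above can be illustrated with a minimal, dependency-free sketch: mean imputation for missing values, min-max feature scaling, and one-hot encoding of a categorical variable. The records, column names, and helper functions are hypothetical and not taken from the notebooks, which may use different techniques for each step.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"age": 63, "chol": 233.0, "sex": "male"},
    {"age": 41, "chol": None, "sex": "female"},
    {"age": 56, "chol": 294.0, "sex": "male"},
    {"age": 57, "chol": 193.0, "sex": "female"},
]


def impute_mean(rows, column):
    """Data cleaning: replace missing values in a column with the column mean."""
    fill = mean(row[column] for row in rows if row[column] is not None)
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]


def min_max_scale(rows, column):
    """Data transformation: rescale a numeric column to the range [0, 1]."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid dividing by zero for a constant column
    return [{**row, column: (row[column] - low) / span} for row in rows]


def one_hot(rows, column):
    """Data transformation: encode a categorical column as 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


cleaned = one_hot(min_max_scale(impute_mean(records, "chol"), "chol"), "sex")
print(cleaned[1])
```

Each helper returns new rows instead of modifying its input, so the original records stay available for comparison after every step.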

Visualize the data by plotting graphs with the Python libraries Matplotlib and Seaborn.

In this assignment, we work with the Adult and Iris datasets and perform the following data visualization operations using Tableau:

  1. 1D (Linear) Data Visualization: We visualize data along a single dimension using techniques such as bar charts, histograms, or box plots.

  2. 2D (Planar) Data Visualization: We visualize data in two dimensions using techniques such as scatter plots, bubble charts, or heatmaps.

  3. 3D (Volumetric) Data Visualization: We visualize data in three dimensions using techniques such as 3D scatter plots, surface plots, or volume rendering.

  4. Temporal Data Visualization: We visualize data over time using techniques such as line graphs, area charts, or time series plots.

  5. Multidimensional Data Visualization: We visualize data with more than three dimensions using techniques such as parallel coordinates, radar charts, or trellis plots.

  6. Tree/Hierarchical Data Visualization: We visualize hierarchical or tree-structured data using techniques such as tree maps, sunburst charts, or dendrograms.

  7. Network Data Visualization: We visualize network or graph data using techniques such as node-link diagrams, force-directed layouts, or chord diagrams.

Group C: Model Implementation

Assignment 1: Web Scraping

Create a review scraper for any e-commerce website to fetch real-time comments, reviews, ratings, comment tags, and customer names using Python.
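The parsing half of such a scraper can be sketched with the standard library's `html.parser`. The markup, the class names (`review`, `customer`, `rating`, `comment`), and the `ReviewParser` class are hypothetical; a real site will use different tags and needs its own selectors. Fetching the page is left out so the example runs offline.

```python
from html.parser import HTMLParser

# Hypothetical markup; a real site will use different tags and class names.
SAMPLE_HTML = """
<div class="review">
  <span class="customer">Asha</span>
  <span class="rating">4</span>
  <p class="comment">Good value for money.</p>
</div>
<div class="review">
  <span class="customer">Rahul</span>
  <span class="rating">2</span>
  <p class="comment">Stopped working after a week.</p>
</div>
"""


class ReviewParser(HTMLParser):
    """Collect customer, rating, and comment text from each review block."""

    FIELDS = {"customer", "rating", "comment"}

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "review":
            self.reviews.append({})  # start a new review record
        elif css_class in self.FIELDS and self.reviews:
            self._field = css_class

    def handle_data(self, data):
        if self._field and data.strip():
            self.reviews[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None  # stop capturing text at any closing tag


def parse_reviews(html):
    """Return a list of review dictionaries with the rating converted to int."""
    parser = ReviewParser()
    parser.feed(html)
    for review in parser.reviews:
        if "rating" in review:
            review["rating"] = int(review["rating"])
    return parser.reviews


print(parse_reviews(SAMPLE_HTML))
```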

Usage

Each assignment is organized into separate folders, and within each folder, you will find the necessary files and code for that assignment. Feel free to explore the code and datasets provided.

Requirements

To run the code in these assignments, you need to have Python installed on your system along with the required libraries and dependencies. Make sure to install the necessary packages mentioned in the assignment files. For Tableau, you will need to have Tableau software installed on your machine.

License

This project is licensed under the MIT License. Feel free to use the code and materials for educational purposes or personal projects.

Contact

If you have any questions or suggestions, feel free to get in touch by email:

  • Ranjeet - contact [dot] ranjeetkumbhar [at] gmail [dot] com

Contributors

  • ranjeetkumbhar01
  • runburn
