Giter Site home page Giter Site logo

sports-analytics's Introduction

Problem Statement 2

Methods and solutions are well explained here. Highly recommended to go through Readme.md to understand the methods and solutions of the project.

Introduction

As a data analysis intern, you have to analyse sports data for a client. You are given two datasets related to IPL (Indian Premier League) cricket matches. One dataset contains ball-by-ball data and the other contains match-wise data. You have to import the datasets into an SQL database and perform the tasks given in this assignment to find important insights from this dataset.

Quick Access

DataSets

IPL_matches.csv

IPL_Ball.csv

Code

sports_data_analytics.sql

Read File

Readme.md

About the Data

The "IPL_Ball.csv" file is for ball-by-ball data and it has information of all the 19,3468 balls bowled between the years 2008 and 2020. It has 17 columns.

The "IPL_matches.csv" file contains match-wise data and has data of 816 IPL matches. This table has 17 columns.


Project

Prerquisites before solving the problem statement

  • Creating a database in MySQL Commandline Client
CREATE DATABASE ipl;
  • Using the database
use ipl;

1. Create a table named matches with appropriate data types for columns.

id is used as PRIMARY KEY here

CREATE TABLE matches(
    id INT PRIMARY KEY,
    city VARCHAR(40),
    date DATE,
    player_of_match VARCHAR(40),
    venue VARCHAR(80),
    neutral_venue INT,
    team1 VARCHAR(80),
    team2 VARCHAR(80),
    toss_winner VARCHAR(80),
    toss_decision VARCHAR(20),
    winner VARCHAR(80),
    result VARCHAR(40),
    result_margin INT,
    eliminator VARCHAR(10),
    method VARCHAR(10),
    umpire1 VARCHAR(40),
    umpire2 VARCHAR(40)
);

Then describe the table.

DESCRIBE matches;

Alt text

2. Create a table named deliveries with appropriate data types for columns.

Here id is used as a FOREIGN KEY.

CREATE TABLE deliveries (
    `id` INT,
    `inning` INT,
    `over` INT,
    `ball` INT,
    `batsman` VARCHAR(40),
    `non_striker` VARCHAR(40),
    `bowler` VARCHAR(40),
    `batsman_runs` INT,
    `extra_runs` INT,
    `total_runs` INT,
    `is_wicket` INT,
    `dismissal_kind` VARCHAR(40),
    `player_dismissed` VARCHAR(40),
    `fielder` VARCHAR(40),
    `extras_type` VARCHAR(40),
    `batting_team` VARCHAR(40),
    `bowling_team` VARCHAR(40),
     FOREIGN KEY(id) REFERENCES matches(id)
);

Describing the Table.

DESCRIBE deliveries;

Alt text

3. Import data from CSV file 'IPL_matches.csv' attached in resources to matches.

Prerequisite

  • Open the csv files and save as in 'CSV MS-DOS' format to avoid encryption (It is probably 'CSV comma delimited' prior).

  • Remove all the commas contained in the observations using excel by, Selecting the column > Home > Editing tab > Find & Select > Replace.

  • Change to yyyy-mm-dd format by right cick > Format cells > Number > Time > selecting 'yyyy-mm-dd' format > Ok. And save te file.

  • For the sql version above 5.6, there must be a restriction to import file. So, you have to paste the importing file in the file directory. File directories are mostly a hidden file.

  • Check the directory.

show variables like "secure_file_priv";

Alt text

  • Apply show all hidden files > search the directory in pc > paste the files in the directory.

Importing 'IPL_matches.csv'.

LOAD DATA INFILE "C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\IPL_matches.csv" INTO TABLE matches
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id,city,date,player_of_match,venue,neutral_venue,team1,team2,toss_winner,toss_decision,winner,result,result_margin,eliminator,method,umpire1,umpire2)
;

Alt text

4. Import data from CSV file ’IPL_Ball.csv’ attached in resources to ‘deliveries’

LOAD DATA INFILE "C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\IPL_Ball.csv" INTO TABLE deliveries
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;

Alt text

5. Select the top 20 rows of the deliveries table.

SELECT * FROM deliveries
LIMIT 20;

Alt text

6. Select the top 20 rows of the matches table.

SELECT * FROM matches
LIMIT 20;

Alt text

7. Fetch data of all the matches played on 2nd May 2013.

SELECT * FROM matches
WHERE date = '2013-05-02';

2013-05-02

8. Fetch data of all the matches where the margin of victory is more than 100 runs.

SELECT * FROM matches
WHERE result_margin > 100;

Alt text

9. Fetch data of all the matches where the final scores of both teams are tied and order it in descending order of the date.

SELECT * FROM matches
WHERE result = 'tie'
ORDER BY date DESC;

Alt text

10. Get the count of cities that have hosted an IPL match.

SELECT COUNT(DISTINCT city) 
FROM matches;

Alt text

11. Get the count of venues that have hosted IPL matches.

SELECT COUNT(DISTINCT venue)
FROM matches;

Alt text

12. Fetch data of 10 matches played before 2015.

SELECT * FROM matches
WHERE date < '2015-01-01'
LIMIT 10;

Alt text

13. Select the 10 matches ordered by winner, venue then city in descending order.

SELECT * FROM matches
ORDER BY winner DESC, venue DESC, city DESC
LIMIT 10;

winner_venue_city

14. Name different Non-strikers played in the IPL.

SELECT DISTINCT non_striker
FROM deliveries;

non_striker

15. Find all the winner teams who have result margin above 100 and won by runs.

SELECT * FROM matches
WHERE result_margin > 100 AND result = 'runs';

margin_above100

16. Find average and sum of result margins when Rajasthan Royals is winner.

SELECT AVG(result_margin) FROM matches
WHERE winner = 'Rajasthan Royals';
SELECT SUM(result_margin) FROM matches
WHERE winner = 'Rajasthan Royals';

margin_rroyals sum_rroyals

17. Find name of stadiums that are named under associations and rename it (column name) as Stadium. Find which cities these stadiums are located.

SELECT DISTINCT venue AS  Stadium, city FROM matches
WHERE venue LIKE '%Association%';

association

18. Create a database comprising of id, player of match, batsman and bowler usingleft join of 10 rows.

SELECT matches.id, matches.player_of_match, deliveries.batsman, deliveries.bowler
FROM matches
LEFT JOIN deliveries
ON matches.id = deliveries.id
LIMIT 10;

left_join

sports-analytics's People

Contributors

rh540640 avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Forkers

rahulraman1981

sports-analytics's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.