
Predicting the Best 11 for a fantasy cricket game

License: GNU Affero General Public License v3.0


best11-fantasycricket's Introduction

Best11-Fantasycricket

HackerSpace-PESU

Description

Over the past year or so, fantasy cricket has gained a lot of traction, and with the recent deal struck between Dream11 and the IPL, more people are playing fantasy cricket than ever. The problem is that many of them make poor choices when picking a team and come away thinking winning is all about luck and nothing else. With our project we want to break that myth by building a model that, given a list of players, predicts the best 11 likely to score the most points in the fantasy league. We have gathered statistics across each player's career; the model takes the scores from the last 5 games a player has played and tries to predict his score in the next game using a linear model.
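As a rough sketch of that idea (not necessarily the repo's exact model, which lives in the codebase): fit a least-squares line through a player's last 5 scores and extrapolate one game ahead.

```python
def predict_next_score(last_scores):
    """Fit a simple least-squares line through a player's recent scores
    and extrapolate one game ahead. This is an illustrative sketch of
    the README's idea, not the repository's actual model."""
    n = len(last_scores)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(last_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, last_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * n  # predicted score in the next game

print(predict_next_score([30, 42, 38, 55, 60]))  # 66.9
```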

Requirements

  1. FastAPI
  2. sklearn
  3. scrapyrt
  4. scrapy

Install using

pip3 install -r requirements.txt

Local Development

To run our project, follow these steps:

  1. Clone our repo into your system

  2. Change your directory to 'Best11-Fantasycricket' using

cd Best11-Fantasycricket

  3. Point the hostname espncricinfo at your own machine:

     Linux and macOS

     1. Type nano /etc/hosts on your terminal or open /etc/hosts in your preferred editor

     Windows

     1. Open C:\windows\system32\drivers\etc\hosts in your preferred editor

     Then add the line below to the file and save

     127.0.0.1 espncricinfo

     OR

     1. Open app/fantasy_cricket/scrapyrt_client.py in your preferred editor

     2. Change line 16 to

       	self.url = "http://localhost:9080/crawl.json"

  4. Open a tab on your terminal and run

uvicorn app.main:app

  5. Open another tab on your terminal and run

scrapyrt

  6. Open http://localhost:8000/ and voila!

Note: Visit http://localhost:9080/crawl.json with the correct queries to see the crawler API
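For example, scrapyrt's crawl.json endpoint takes a spider_name query parameter (other parameters depend on the spider). The spider name below is an assumption — check the spider's name attribute in the repo's scrapy project before using it.

```python
from urllib.parse import urlencode

# Build a scrapyrt query URL. "espncricinfo" as the spider name is an
# assumption for illustration; use the real spider's `name` attribute.
params = {"spider_name": "espncricinfo"}
query_url = "http://localhost:9080/crawl.json?" + urlencode(params)
print(query_url)  # http://localhost:9080/crawl.json?spider_name=espncricinfo
```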

Docker

  1. Follow the steps:

    	docker build -t espncricinfo:latest "." -f docker/espncricinfo/Dockerfile
    	docker build -t best11:latest "." -f docker/11tastic/Dockerfile
    	docker-compose -f docker/docker-compose.yaml up
  2. Visit http://localhost:8080/ to see the website in action

Note: Visit http://localhost:9080/crawl.json with the correct queries to see the crawler API

How do I contribute to this project?

โš ๏ธ Warning! Existing contributors and/or future contributors , re-fork the repo as the commit-history has been rewritten to reduce size of the repo while cloning which makes cloning much faster than before!.

Refer to the Contributing.md file of our repository

If you have any suggestions for our project, do raise an issue and we will look into it; if we think it helps the project, we will keep it open until it is implemented by us or by anyone else.

If you have any questions regarding our project, you can contact any of the maintainers (info on their respective profile pages) or raise an issue and we'll answer you as soon as possible.

Thank You

Maintainers

  1. Royston

  2. Shreyas

  3. Sammith

Acknowledgements

  1. Special thanks to scientes for allowing us to use the server to host the website

  2. We would like to thank espncricinfo for their amazing website with daily updates and its availability for scraping

If you liked our project, we would really appreciate you starring this repo.

Thank you

best11-fantasycricket's People

Contributors

dan-329, jukiforde, milanmandal, nimendrak, roysti10, sammithsb, scientes, srp457


best11-fantasycricket's Issues

[BUG] Web crawler searches through matches from the 1900s

Describe the bug
The web crawler in feature-crawler takes in match records from the 1900s. This wastes a lot of time and reduces the efficiency of the crawler.
To Reproduce
Steps to reproduce the behavior:

  1. Follow the instructions in the README file to run the crawler
  2. Wait for the IDs crawl to finish and notice the match records dated from the 1900s

Expected behavior
The solution would be to set a filter that takes match records only from the year 2017 onwards.
Possible solution
In crawler/cricketcrawler/spiders/howstat.py, in function parse_scorecard:

if int(date[0:4]) >= 2017:
    item = MatchidItem(name=url[startint + 10:], folder=folder, matchid=matchid, date=date)
    yield item

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • Version [feature-crawler]

Additional context
The starting point to this might be crawler/cricketcrawler/spiders/howstat.py

[FEATURE] Add batting crawler to the webcrawler

Is your feature request related to a problem? Please describe.
The crawler currently only crawls for new players and match ids. The web crawler also needs to crawl for batting stats.

Describe the solution you'd like
Extend the parse_scorecard function in crawler/cricketcrawler/cricketcrawler/spiders/howstat.py (a function similar to parse_player and the existing parse_scorecard) to cover batting and collect the following:

  • runs
  • no of 4s and 6s
  • strike rate

Refer to Dataset.md to understand the matchcodes and playercodes

Additional context
Crawler can be found in feature-webcrawler branch of the repo. Crawler is built using [scrapy](https://scrapy.org/)
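A hypothetical shape for the extracted batting stats might look like the helper below. The scorecard column layout here is an assumption for illustration, not taken from the real howstat markup; the actual extension would live inside parse_scorecard and yield scrapy items.

```python
def parse_batting_row(cells):
    """Turn one scorecard row (a list of cell strings; the column order
    here is assumed) into the batting stats this issue asks for."""
    return {
        "runs": int(cells[0]),
        "fours": int(cells[1]),
        "sixes": int(cells[2]),
        "strike_rate": float(cells[3]),
    }

print(parse_batting_row(["57", "6", "2", "118.75"]))
```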

Ignore retired players

Describe the bug
It is evident that retired players don't play anymore. The web crawler still includes them, so they need to be filtered out.

To Reproduce
Steps to reproduce the behavior:

  1. Follow the instructions in README.md and watch once it starts collecting players. Retired players can also be noticed in data_crawler/ids_names.csv.

Possible Solution
The solution :
In crawler/cricketcrawler/spiders/howstat.py, in function parse_player:

if not retired:
    yield PlayerItem(name=url[url.find("?PlayerID=") + 10:], gametype=gametype, folder=".", longname=name, retired=retired)

Screenshots
Screenshot from 2020-11-14 13-51-43

Desktop (please complete the following information):

  • Version [master]

[DATA] Organizing data into one csv file

Is your data format related to a problem? Please describe.
Currently, we have about 337 files each for batting and bowling, and each player has his own csv file. This won't scale as the number of players keeps increasing.

Describe the solution you'd like
Inside the zip folder there are two folders called ODI and T20; these two folders must each be converted into a single csv file, called zip_ODI.csv and zip_T20.csv respectively.
The format of the csv file is as follows:

| player  | matches  | Date   |
|---------|----------|--------|
| player1 | matchid1 | date 1 |
|         | matchid2 | date 2 |
|         | matchid3 | date 3 |
| player2 | matchid1 | date 1 |
| ...     | ...      | ...    |

Similarly, do the same for zip2, bowl and wk.
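A minimal sketch of that merge, assuming each per-player file holds matchid,date rows (shown here with in-memory strings; the real script would read the files from the zip/ODI folder on disk):

```python
import csv
import io

def merge_player_files(player_csvs):
    """Combine per-player (matchid, date) CSVs into one table with a
    `player` column -- the zip_ODI.csv layout described above.
    `player_csvs` maps player name -> CSV text (an illustrative stand-in
    for reading the actual files)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["player", "matchid", "date"])
    for player, text in player_csvs.items():
        for matchid, date in csv.reader(io.StringIO(text)):
            writer.writerow([player, matchid, date])
    return out.getvalue()

merged = merge_player_files({"player1": "m1,2019-06-01\nm2,2019-06-05\n"})
print(merged)
```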

Additional context
I'll be creating a separate branch for this once a PR is opened for it

Check the appropriate choice

  • Organize data better
  • Adding data to the dataset

More tests required

Tests for the espn-matches crawler are required.

Scrapy_autounit fails due to the random generation of links in the crawler.

[FEATURE] Add a function for averages

Describe the issue
We would like to add statistics such as batting average, bowling average to our dataset

Solution
To do this, we would like you to add two functions, batting_average and bowling_average, in a file called average.py and implement them.
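A minimal sketch of the two functions, using the standard cricket definitions. The signatures here are assumptions for illustration; the maintainers may want them to take rows from the dataset instead.

```python
def batting_average(runs, dismissals):
    """Career runs divided by the number of times dismissed.
    (Signature assumed for illustration.)"""
    return runs / dismissals if dismissals else float("inf")

def bowling_average(runs_conceded, wickets):
    """Runs conceded per wicket taken. (Signature assumed.)"""
    return runs_conceded / wickets if wickets else float("inf")

print(batting_average(4500, 90))   # 50.0
print(bowling_average(2100, 84))   # 25.0
```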

Test
Run both of them and check whether the dataset is updated.

Note: While making a PR, don't send it in with the updated dataset; that will be done by the maintainers only. Just write the functions in the file for now.

Comment if you would like to work on it

Setup Web-Crawler for daily updates

Describe the Issue:
The player records in the data folder are outdated and static. Thus, they may not be enough to accurately predict player performances in current matches. Previous records were created from web-scraped data from howstat.com.

Solution:
Keep the records up to date using web-scraping for daily updates and reflect those changes in the data folder.

Comment if you would like to work on this

Dockerfile

How do you plan on hosting the App?

Describe the solution you'd like
Using Docker/docker-compose would be one of the easier ways of hosting, especially if you plan on using a database. It would also make development more comparable to production when database issues arise.

Test Player Records Required

Describe the Issue:
The current model uses player data from ODI matches only. Adding Test matches would improve the usability of the model.

Solution:
Employ a web-scraper to scrape player records from howstat.com for test matches and update them in the data folder.
Ensure the scraped data is in the same format as the files in the data folder.

Comment if you would like to work on it.

Unnecessary Comments and print statements

Describe the issue
In team.py, there are a lot of unnecessary comments, comment strings and a few print statements.

Solution
Remove all such comments, comment strings and print statements, and remember to autoformat using black.

Note: This is only a first timers issue, PRs from experienced users will be labelled invalid
Comment if you would like to work on it

Pycricbuzz is down

Describe the bug
The pycricbuzz package, which we had used for getting the live matches and their respective squads, has been disabled; an alternative has to be put in place ASAP.

To Reproduce
Steps to reproduce the behavior:
Run the local development of the website

Desktop (please complete the following information):

  • OS: [ALL]
  • Version [v0.1.0]

Additional context
Possible solution is to crawl espncricinfo

[FEATURE REQ] Scoring systems for different Fantasy cricket platforms

Is your feature request related to a problem? Please describe.
Since more and more fantasy cricket platforms are emerging, we would like to build support for all such platforms.

Describe the solution you'd like
In the file fantasy_leagues.py, each fantasy cricket platform should be represented in the following way.
The list in each key of the dictionary represents the points for ['T20', 'ODI', 'TEST'].
Example class

class Dream11(Teams):
    """Dream11 League

    Supported platforms:
        * ODI
        * T20
        * TEST
    """

    name = "Dream11"

    batting_dict = {
        "runs": [1, 1, 1],
        "boundaries": [1, 1, 1],
        "sixes": [2, 2, 2],
        "50": [8, 4, 4],
        "100": [16, 8, 8],
        "duck": [-2, -3, -4],
    }

    bowling_dict = {
        "wicket": [25, 25, 16],
        "4-wicket-haul": [8, 4, 4],
        "5-wicket-haul": [16, 8, 8],
        "Maiden": [4, 8, None],
    }

    wk_dict = {
        "Catch": [8, 8, 8],
        "Stump": [12, 12, 12],
    }

Some platforms are

Comment if you would like to work on it
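For context, one way such per-format lists might be consumed (an illustrative sketch, not the repo's Teams implementation): pick the index for the match format and sum each stat count times its point value, skipping None entries for unsupported stats.

```python
FORMATS = {"T20": 0, "ODI": 1, "TEST": 2}

def batting_points(stats, batting_dict, fmt):
    """Sum stat counts times the per-format point values.
    (Sketch only; the repo's scoring code may differ.)"""
    idx = FORMATS[fmt]
    total = 0
    for key, count in stats.items():
        value = batting_dict[key][idx]
        if value is not None:  # None marks stats a format doesn't score
            total += value * count
    return total

dream11_batting = {
    "runs": [1, 1, 1],
    "boundaries": [1, 1, 1],
    "sixes": [2, 2, 2],
    "50": [8, 4, 4],
    "100": [16, 8, 8],
    "duck": [-2, -3, -4],
}
# 57 runs + 6 boundaries + 2 sixes (2 pts each) + one fifty (8 pts) in T20:
print(batting_points({"runs": 57, "boundaries": 6, "sixes": 2, "50": 1}, dream11_batting, "T20"))  # 75
```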

Categorising Players by Country

Describe the Issue:
The current data has players in no particular order or pattern. To allow for other exploratory data analysis, categorising players by country would be useful.

Solution:
Categorise the players in the data folder by country and update them in another folder named categorised under the data folder.

Comment if you would like to work on it.

Better interface

Issue
We are looking for a better front-end for our GUI; no backend changes are expected.
Solution
We are not expecting anything complex; any kind of significant improvement to the present design is welcome.

Players performance against different teams

Describe the issue
Many a time a player will be out of form, but the moment he faces a specific opponent he somehow finds that form. A great example is Steve Smith against India: no matter what his recent form is, he plays well against India. There are many more examples like this, and one of the problems with our model is that it doesn't take this into account.

Solution
There's no prototype solution for this. You can collect data from websites (make sure it's legal) and build an algorithm or detect patterns; anything that improves the model's losses is welcome.
One solution we have in mind is to take into account both recent form and the player's record against that specific team, so that if a player has had a rough patch in recent games but has an amazing record against that opponent, the model takes this into account and predicts a better 11. We think this might help improve the model.
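The weighting idea above can be sketched as a simple blend. The 0.75/0.25 split below is an arbitrary assumption to be tuned against the losses check.py reports, not a value from the repo.

```python
def blended_form(recent_avg, vs_opponent_avg, weight=0.75):
    """Weight recent form against the player's record versus this
    specific opponent. The default weight is an arbitrary assumption."""
    return weight * recent_avg + (1 - weight) * vs_opponent_avg

# A player in poor recent form (20) but strong against this opponent (60)
# gets a boosted expectation rather than being written off:
print(blended_form(20, 60))  # 30.0
```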

Comment if you would like to work on it

No match found

Getting "There are no matches scheduled for the next 24 hours!".
IPL 2022 has started, but it is showing the above result. Can anyone help with this?

Women's Cricket Data

Describe the Issue:
Note: This is an optional issue as of now
The data used consists of men's cricket only. To make it more inclusive, adding records of women's cricket would be very useful.

Solution:
Mimic the data folder for women's cricket using a web-scraper.
Currently there are no records for women's cricket on howstat.com; if you can find an open-source, web-scraping-friendly website, go ahead, but be careful of legal issues while scraping.
Use the names as given in the folder with the appropriate suffix,
e.g. zip_women.csv

Comment if you would like to work on it.

[BUG] Fix changes after PR #26

Describe the bug
In PR #26, the model fails due to updated folder names. It needs to be fixed before merging into master.

To Reproduce
Steps to reproduce the behavior:
Run app.py or check.py from the issue-25 branch only

Expected behavior
It must work the same as before. All changes must be done in the issue-25 branch only

Additional context
Not much work: wherever there is zip, zip2, bowl or wk, it just needs to be updated to 'zip/ODI', 'zip2/ODI' and so on.

[FEATURE] Add wicket keeper crawler to the webcrawler

Is your feature request related to a problem? Please describe.
The crawler currently only crawls for new players and match ids. The web crawler also needs to crawl for wicketkeeping stats.

Describe the solution you'd like
Extend the parse_scorecard function in crawler/cricketcrawler/cricketcrawler/spiders/howstat.py to cover wicketkeeping and collect the following:

  • catches
  • stumpings

Additional context
Crawler can be found in feature-webcrawler branch of the repo. Crawler is built using scrapy

[FEATURE] Add bowling crawler to the webcrawler

Is your feature request related to a problem? Please describe.
The crawler currently only crawls for new players and match ids. The web crawler also needs to crawl for bowling stats.

Describe the solution you'd like
Extend the parse_scorecard function in crawler/cricketcrawler/cricketcrawler/spiders/howstat.py to cover bowling and collect the following:

  • wickets
  • Overs
  • Maidens
  • Economy

Refer to Dataset.md to understand the matchcodes and playercodes

Additional context
Crawler can be found in feature-webcrawler branch of the repo. Crawler is built using [scrapy](https://scrapy.org/)

Time series Model

Describe the issue
Currently our model predicts points using linear regression, due to the lack of features. If you can produce better results than the current model with a time series model, that would be great.

Solution
Change the current linear regression model to your model and set up a pull request.
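One candidate time-series baseline to compare against the linear model is simple exponential smoothing, sketched below in plain Python. The alpha value is a tunable assumption; any submission should be judged by the losses check.py reports, not this toy.

```python
def ses_forecast(scores, alpha=0.5):
    """Simple exponential smoothing: the smoothed level after the last
    observation serves as the one-step-ahead forecast. `alpha` is an
    arbitrary starting point, not a fitted value."""
    level = scores[0]
    for y in scores[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

print(ses_forecast([30, 42, 38, 55, 60]))  # 53.0
```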

Test
Run check.py and see if your score beats our losses.

Comment if you would like to work on it

[DOCS] Update Dataset.md after PR 36

Is your feature request related to a problem? Please describe.
After #36 the entire dataset has been restructured; those changes need to be reflected in Dataset.md.
The file structure can be found in the description of #36.

Describe the solution you'd like
Since the zip, zip2, Bowl and wk folders no longer exist, they need to be removed from the docs; the scoring table remains the same.
Two folders have been added, namely ODI and T20, each having 4 folders containing the joined files of zip, zip2, Bowl and wk.
This needs to be reflected.

[FEATURE] Implementation of FastAPI Framework

Describe the Issue
The current model makes use of the Flask web framework. Now, with the existence of many more robust frameworks, implementing one such framework, FastAPI, would be beneficial.

Solution
Create a FastAPI implementation of the existing Flask model that has basic functionalities provided by it. Any extra features that seem appropriate or aesthetically pleasing are well appreciated.

Test
Ensure the model makes use of the existing Python scripts and matches to form teams and display them accordingly, i.e. it should integrate seamlessly with the current scripts and project.

Comment if you would like to work on this.

Fantasy cricket API

We are looking for an API for any of the fantasy cricket platforms, as scraping them would be illegal as of now.
Any suggestions are welcome.

Preferably not very expensive; amazing if it were free xD

Typing hints using pydantic

Is your feature request related to a problem? Please describe.
Add type hints to the files inside fantasy_cricket using pydantic and typing.
Additional context
It would also be great if mypy was used for the checks and added to the CI
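A sketch of the kind of annotations this issue asks for, shown with the stdlib typing module (a pydantic BaseModel would add runtime validation on top). The function name and shapes below are illustrative, not the repo's actual API.

```python
from typing import Dict, List

def best_eleven(predicted_points: Dict[str, float], size: int = 11) -> List[str]:
    """Return the `size` player names with the highest predicted points.
    (Hypothetical helper used only to illustrate the annotation style.)"""
    ranked = sorted(predicted_points, key=predicted_points.get, reverse=True)
    return ranked[:size]

print(best_eleven({"a": 9.0, "b": 7.5, "c": 8.1}, size=2))  # ['a', 'c']
```

With annotations like these in place, mypy can be run over fantasy_cricket as part of CI.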

[DATA] Segregation of ODI records from T20 records

Is your feature request related to a problem? Please describe.
Currently the zip and zip2 folders have records for both ODI and T20. We would like to segregate these records so that it will be easier to scrape data in the future.

Describe the solution you'd like
Create two folders, ODI and T20, in both zip and zip2 and segregate the records; also create a new folder called ODI in bowl and wk and place all the files in it.

Comment if you would like to work on it

T20 League Player Records Required

Describe the Issue:
The current data folder contains player records from ODI matches only. Including matches from other leagues would make the model more relevant.
Set up a PR if you are done with any T20 league (currently only the IPL is available on howstat.com).
Solution:
The webcrawler has been set up in feature-webcrawler. Add your solution to it
Try to keep the data in the same format as in the data folder

T20_Leagues list

  • IPL
  • Big Bash
  • CPL (Caribbean)

Comment if you would like to work on it.

Add algorithm to select players based on credits

Describe the issue
Currently our model predicts only on points, but fantasy cricket has limitations based on credits; we would like to implement an algorithm for this.

Solution
Create a function that takes as input the players list, the credits list, the maximum credits for the match and the points predicted by the model for each player, and selects the best 11 players based on points without crossing the maximum credits.
Note: be careful not to violate the team rule, i.e. a maximum of 7 players from each team.
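A greedy sketch of that selection is below: take the highest predicted scorers that fit the credit budget and the per-team cap. This is an illustrative baseline under assumed input shapes, not a submission (a knapsack-style search would be needed for an optimal pick).

```python
def pick_eleven(players, max_credits, team_cap=7, squad_size=11):
    """Greedy selection. `players` is a list of
    (name, team, credits, predicted_points) tuples -- an assumed shape.
    Highest-scoring players are taken first, skipping any that would
    bust the credit budget or the per-team cap."""
    chosen, spent, per_team = [], 0.0, {}
    for name, team, cost, _pts in sorted(players, key=lambda p: p[3], reverse=True):
        if len(chosen) == squad_size:
            break
        if spent + cost > max_credits or per_team.get(team, 0) >= team_cap:
            continue
        chosen.append(name)
        spent += cost
        per_team[team] = per_team.get(team, 0) + 1
    return chosen

players = [("a", "IND", 10, 90), ("b", "AUS", 9, 80), ("c", "IND", 8, 70)]
print(pick_eleven(players, max_credits=19, squad_size=2))  # ['a', 'b']
```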

Test
You can test with your own credits for now, as we are still figuring out an API for them.
You do not need to integrate it with the current Flask model; you can create a function in team.py for now.

Comment if you would like to work on it

More player records for ODI

Describe the Issue:
Currently we have players from only 11 countries, whose distribution can be found in issue #16. We would like players from more countries to be added, e.g. Sri Lanka, Zimbabwe, etc.
Solution:
Employ a web-scraper to scrape player records from howstat.com for ODI matches for these countries (only non-retired players) and update them in the data folder.
Ensure the scraped data is in the same format as the files in the data folder.

Comment if you would like to work on it.

Limit the countries in Web Scraping

The scraper gets players from some countries whose matches the fantasy cricket platforms don't host on their websites. The same is true for matches.

Make the crawler take only the players/matches from the following countries:

  • India
  • England
  • Australia
  • Bangladesh
  • New Zealand
  • South Africa
  • West Indies
  • Pakistan
  • Ireland
  • Afghanistan
  • Sri Lanka
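Inside the spider, this comes down to a whitelist check before yielding an item, sketched below (where and how the country string is extracted from the page is left to the implementation):

```python
ALLOWED_COUNTRIES = {
    "India", "England", "Australia", "Bangladesh", "New Zealand",
    "South Africa", "West Indies", "Pakistan", "Ireland",
    "Afghanistan", "Sri Lanka",
}

def keep_player(country):
    """Return True only for players (or matches) from the countries the
    fantasy platforms actually host."""
    return country in ALLOWED_COUNTRIES

print(keep_player("India"), keep_player("Netherlands"))  # True False
```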
