
Predicting the Best 11 for a fantasy cricket game

License: GNU Affero General Public License v3.0


best11-fantasycricket's Introduction

Best11-Fantasycricket

HackerSpace-PESU

Description

Over the past year or so, fantasy cricket has gained a lot of traction, and with the recent deal struck between Dream11 and the IPL, more people are playing fantasy cricket than ever. The problem is that many of them make poor choices when picking a team and come away thinking winning is all about luck and nothing else. With our project we want to break that myth by building a model that, given a list of players, predicts the best 11 likely to score the most points in the fantasy league. We have gathered statistics across each player's career; the model takes the scores from the last 5 games a player has played and tries to predict his score in the next game using a linear model.
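As a rough sketch of that idea (not necessarily the repo's exact model, which lives in the codebase): fit a least-squares line through a player's last 5 scores and extrapolate one game ahead.

```python
def predict_next_score(last_scores):
    """Fit a simple least-squares line through a player's recent scores
    and extrapolate one game ahead. This is an illustrative sketch of
    the README's idea, not the repository's actual model."""
    n = len(last_scores)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(last_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, last_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * n  # predicted score in the next game

print(predict_next_score([30, 42, 38, 55, 60]))  # 66.9
```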

Requirements

  1. FastAPI
  2. sklearn
  3. scrapyrt
  4. scrapy

Install using

pip3 install -r requirements.txt

Local Development

To run our project, follow these steps:

  1. Clone our repo into your system

  2. Change your directory to 'Best11-Fantasycricket' using

cd Best11-Fantasycricket

  3. Point the hostname espncricinfo at your own machine:

     Linux and macOS

     1. Type nano /etc/hosts on your terminal or open /etc/hosts in your preferred editor

     Windows

     1. Open C:\windows\system32\drivers\etc\hosts in your preferred editor

     Then add the line below to the file and save

     127.0.0.1 espncricinfo

     OR

     1. Open app/fantasy_cricket/scrapyrt_client.py in your preferred editor

     2. Change line 16 to

       	self.url = "http://localhost:9080/crawl.json"

  4. Open a tab on your terminal and run

uvicorn app.main:app

  5. Open another tab on your terminal and run

scrapyrt

  6. Open http://localhost:8000/ and voila!

Note: Visit http://localhost:9080/crawl.json with the correct queries to see the crawler API
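For example, scrapyrt's crawl.json endpoint takes a spider_name query parameter (other parameters depend on the spider). The spider name below is an assumption — check the spider's name attribute in the repo's scrapy project before using it.

```python
from urllib.parse import urlencode

# Build a scrapyrt query URL. "espncricinfo" as the spider name is an
# assumption for illustration; use the real spider's `name` attribute.
params = {"spider_name": "espncricinfo"}
query_url = "http://localhost:9080/crawl.json?" + urlencode(params)
print(query_url)  # http://localhost:9080/crawl.json?spider_name=espncricinfo
```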

Docker

  1. Follow the steps:

    	docker build -t espncricinfo:latest "." -f docker/espncricinfo/Dockerfile
    	docker build -t best11:latest "." -f docker/11tastic/Dockerfile
    	docker-compose -f docker/docker-compose.yaml up
  2. Visit http://localhost:8080/ to see the website in action

Note: Visit http://localhost:9080/crawl.json with the correct queries to see the crawler API

How do I contribute to this project?

โš ๏ธ Warning! Existing contributors and/or future contributors , re-fork the repo as the commit-history has been rewritten to reduce size of the repo while cloning which makes cloning much faster than before!.

Refer to the Contributing.md file of our repository

If you have any suggestions for our project, do raise an issue and we will look into it; if we think it helps the project, we will keep it open until it is implemented by us or by anyone else.

If you have any questions regarding our project, you can contact any of the maintainers (info on their respective profile pages) or raise an issue and we'll answer you as soon as possible.

Thank You

Maintainers

  1. Royston

  2. Shreyas

  3. Sammith

Acknowledgements

  1. Special thanks to scientes for allowing us to use the server to host the website

  2. We would like to thank espncricinfo for their amazing website with daily updates and its availability for scraping

If you liked our project, we would really appreciate you starring this repo.

Thank you

best11-fantasycricket's People

Contributors

dan-329, jukiforde, milanmandal, nimendrak, roysti10, sammithsb, scientes, srp457


best11-fantasycricket's Issues

[BUG] Web crawler searches through matches from the 1900s

Describe the bug
The web crawler in feature-crawler takes in match records from the 1900s. This wastes a lot of time and reduces the efficiency of the crawler.
To Reproduce
Steps to reproduce the behavior:

  1. Follow the instructions in the README file to run the crawler
  2. Wait for the IDs crawl to finish and notice the match records dated from the 1900s

Expected behavior
The solution would be to set a filter that takes match records only from the year 2017 onwards.
Possible solution
In crawler/cricketcrawler/spiders/howstat.py, in function parse_scorecard:

if int(date[0:4]) >= 2017:
    item = MatchidItem(name=url[startint + 10:], folder=folder, matchid=matchid, date=date)
    yield item

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • Version [feature-crawler]

Additional context
The starting point to this might be crawler/cricketcrawler/spiders/howstat.py

[FEATURE] Add batting crawler to the webcrawler

Is your feature request related to a problem? Please describe.
The crawler currently only crawls for new players and match ids. The web crawler also needs to crawl for batting stats.

Describe the solution you'd like
Extend the parse_scorecard function in crawler/cricketcrawler/cricketcrawler/spiders/howstat.py (a function similar to parse_player and the existing parse_scorecard) to cover batting and collect the following:

  • runs
  • no of 4s and 6s
  • strike rate

Refer to Dataset.md to understand the matchcodes and playercodes

Additional context
Crawler can be found in feature-webcrawler branch of the repo. Crawler is built using [scrapy](https://scrapy.org/)
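A hypothetical shape for the extracted batting stats might look like the helper below. The scorecard column layout here is an assumption for illustration, not taken from the real howstat markup; the actual extension would live inside parse_scorecard and yield scrapy items.

```python
def parse_batting_row(cells):
    """Turn one scorecard row (a list of cell strings; the column order
    here is assumed) into the batting stats this issue asks for."""
    return {
        "runs": int(cells[0]),
        "fours": int(cells[1]),
        "sixes": int(cells[2]),
        "strike_rate": float(cells[3]),
    }

print(parse_batting_row(["57", "6", "2", "118.75"]))
```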

Ignore retired players

Describe the bug
It is evident that retired players don't play anymore. The web crawler still includes them, so they need to be filtered out.

To Reproduce
Steps to reproduce the behavior:

  1. Follow the instructions in README.md and watch once it starts collecting players. Retired players can also be noticed in data_crawler/ids_names.csv.

Possible Solution
The solution :
In crawler/cricketcrawler/spiders/howstat.py, in function parse_player:

if not retired:
    yield PlayerItem(name=url[url.find("?PlayerID=") + 10:], gametype=gametype, folder=".", longname=name, retired=retired)

Screenshots
Screenshot from 2020-11-14 13-51-43

Desktop (please complete the following information):

  • Version [master]

[DATA] Organizing data into one csv file

Is your data format related to a problem? Please describe.
Currently, we have about 337 files each for batting and bowling, and each player has his own csv file. This won't scale as the number of players keeps increasing.

Describe the solution you'd like
Inside the zip folder there are two folders called ODI and T20; these two folders must each be converted into a single csv file, called zip_ODI.csv and zip_T20.csv respectively.
The format of the csv file is as follows:

| player  | matches  | Date   |
|---------|----------|--------|
| player1 | matchid1 | date 1 |
|         | matchid2 | date 2 |
|         | matchid3 | date 3 |
| player2 | matchid1 | date 1 |
| ...     | ...      | ...    |

Similarly, do the same for zip2, bowl and wk.
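A minimal sketch of that merge, assuming each per-player file holds matchid,date rows (shown here with in-memory strings; the real script would read the files from the zip/ODI folder on disk):

```python
import csv
import io

def merge_player_files(player_csvs):
    """Combine per-player (matchid, date) CSVs into one table with a
    `player` column -- the zip_ODI.csv layout described above.
    `player_csvs` maps player name -> CSV text (an illustrative stand-in
    for reading the actual files)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["player", "matchid", "date"])
    for player, text in player_csvs.items():
        for matchid, date in csv.reader(io.StringIO(text)):
            writer.writerow([player, matchid, date])
    return out.getvalue()

merged = merge_player_files({"player1": "m1,2019-06-01\nm2,2019-06-05\n"})
print(merged)
```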

Additional context
I'll be creating a separate branch for this once a PR is opened for it

Check the appropriate choice

  • Organize data better
  • Adding data to the dataset

More tests required

Tests for the espn-matches crawler are required.

Scrapy_autounit fails due to the random generation of links in the crawler.

[FEATURE] Add a function for averages

Describe the issue
We would like to add statistics such as batting average, bowling average to our dataset

Solution
To do this, we would like you to add two functions, batting_average and bowling_average, in a file called average.py and implement them.
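A minimal sketch of the two functions, using the standard cricket definitions. The signatures here are assumptions for illustration; the maintainers may want them to take rows from the dataset instead.

```python
def batting_average(runs, dismissals):
    """Career runs divided by the number of times dismissed.
    (Signature assumed for illustration.)"""
    return runs / dismissals if dismissals else float("inf")

def bowling_average(runs_conceded, wickets):
    """Runs conceded per wicket taken. (Signature assumed.)"""
    return runs_conceded / wickets if wickets else float("inf")

print(batting_average(4500, 90))   # 50.0
print(bowling_average(2100, 84))   # 25.0
```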

Test
Run both of them and check whether the dataset is updated.

Note: While making a PR, don't send it in with the updated dataset; that will be done by the maintainers only. Just write the functions in the file for now.

Comment if you would like to work on it

Setup Web-Crawler for daily updates

Describe the Issue:
The player records in the data folder are outdated and static. Thus, they may not be enough to accurately predict player performances in current matches. Previous records were created from web-scraped data from howstat.com.

Solution:
Keep the records up to date using web-scraping for daily updates and reflect those changes in the data folder.

Comment if you would like to work on this

Dockerfile

How do you plan on hosting the App?

Describe the solution you'd like
Using Docker/docker-compose would be one of the easier ways of hosting, especially if you plan on using a database. It would also make development more comparable to production when database issues arise.

Test Player Records Required

Describe the Issue:
The current model uses player data from ODI matches only. Adding Test matches would improve the usability of the model.

Solution:
Employ a web-scraper to scrape player records from howstat.com for test matches and update them in the data folder.
Ensure the scraped data is in the same format as the files in the data folder.

Comment if you would like to work on it.

Unnecessary Comments and print statements

Describe the issue
In team.py, there are a lot of unnecessary comments, comment strings and a few print statements.

Solution
Remove all such comments, comment strings and print statements, and remember to autoformat using black.

Note: This is only a first timers issue, PRs from experienced users will be labelled invalid
Comment if you would like to work on it

Pycricbuzz is down

Describe the bug
The pycricbuzz package, which we had used for getting the live matches and their respective squads, has been disabled; an alternative has to be put in place ASAP.

To Reproduce
Steps to reproduce the behavior:
Run the local development of the website

Desktop (please complete the following information):

  • OS: [ALL]
  • Version [v0.1.0]

Additional context
Possible solution is to crawl espncricinfo

[FEATURE REQ] Scoring systems for different Fantasy cricket platforms

Is your feature request related to a problem? Please describe.
Since more and more fantasy cricket platforms are emerging, we would like to build support for all such platforms.

Describe the solution you'd like
In the file fantasy_leagues.py, each fantasy cricket platform should be represented in the following way.
The list in each key of the dictionary represents the points for ['T20', 'ODI', 'TEST'].
Example class

class Dream11(Teams):
    """Dream11 League

    Supported platforms:
        * ODI
        * T20
        * TEST
    """

    name = "Dream11"

    batting_dict = {
        "runs": [1, 1, 1],
        "boundaries": [1, 1, 1],
        "sixes": [2, 2, 2],
        "50": [8, 4, 4],
        "100": [16, 8, 8],
        "duck": [-2, -3, -4],
    }

    bowling_dict = {
        "wicket": [25, 25, 16],
        "4-wicket-haul": [8, 4, 4],
        "5-wicket-haul": [16, 8, 8],
        "Maiden": [4, 8, None],
    }

    wk_dict = {
        "Catch": [8, 8, 8],
        "Stump": [12, 12, 12],
    }

Some platforms are

Comment if you would like to work on it
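For context, one way such per-format lists might be consumed (an illustrative sketch, not the repo's Teams implementation): pick the index for the match format and sum each stat count times its point value, skipping None entries for unsupported stats.

```python
FORMATS = {"T20": 0, "ODI": 1, "TEST": 2}

def batting_points(stats, batting_dict, fmt):
    """Sum stat counts times the per-format point values.
    (Sketch only; the repo's scoring code may differ.)"""
    idx = FORMATS[fmt]
    total = 0
    for key, count in stats.items():
        value = batting_dict[key][idx]
        if value is not None:  # None marks stats a format doesn't score
            total += value * count
    return total

dream11_batting = {
    "runs": [1, 1, 1],
    "boundaries": [1, 1, 1],
    "sixes": [2, 2, 2],
    "50": [8, 4, 4],
    "100": [16, 8, 8],
    "duck": [-2, -3, -4],
}
# 57 runs + 6 boundaries + 2 sixes (2 pts each) + one fifty (8 pts) in T20:
print(batting_points({"runs": 57, "boundaries": 6, "sixes": 2, "50": 1}, dream11_batting, "T20"))  # 75
```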

Categorising Players by Country

Describe the Issue:
The current data has players in no particular order or pattern. To allow for other exploratory data analysis, categorising players by country would be useful.

Solution:
Categorise the players in the data folder by country and update them in another folder named categorised under the data folder.

Comment if you would like to work on it.

Better interface

Issue
We are looking for a better front-end for our GUI; no backend changes are expected.
Solution
We are not expecting anything complex; any kind of significant improvement to the present design is welcome.

Players performance against different teams

Describe the issue
Many a time a player will be out of form, but the moment he faces a specific opponent he somehow finds that form. A great example is Steve Smith against India: no matter what his recent form is, he plays well against India. There are many more examples like this, and one of the problems with our model is that it doesn't take this into account.

Solution
There's no prototype solution for this. You can collect data from websites (make sure it's legal) and build an algorithm or detect patterns; anything that improves the model's losses is welcome.
One solution we have in mind is to take into account both recent form and the player's record against that specific team, so that if a player has had a rough patch in recent games but has an amazing record against that opponent, the model takes this into account and predicts a better 11. We think this might help improve the model.
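The weighting idea above can be sketched as a simple blend. The 0.75/0.25 split below is an arbitrary assumption to be tuned against the losses check.py reports, not a value from the repo.

```python
def blended_form(recent_avg, vs_opponent_avg, weight=0.75):
    """Weight recent form against the player's record versus this
    specific opponent. The default weight is an arbitrary assumption."""
    return weight * recent_avg + (1 - weight) * vs_opponent_avg

# A player in poor recent form (20) but strong against this opponent (60)
# gets a boosted expectation rather than being written off:
print(blended_form(20, 60))  # 30.0
```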

Comment if you would like to work on it

No match found

Getting "There are no matches scheduled for the next 24 hours!".
IPL 2022 has started, but it is showing the above result. Can anyone help with this?

Women's Cricket Data

Describe the Issue:
Note: This is an optional issue as of now
The data used consists of men's cricket only. To make it more inclusive, adding records of women's cricket would be very useful.

Solution:
Mimic the data folder for women's cricket using a web-scraper.
Currently there are no records for women's cricket on howstat.com; if you can find an open-source, web-scraping-friendly website, go ahead, but be careful of legal issues while scraping.
Use the names as given in the folder with the appropriate suffix,
e.g. zip_women.csv

Comment if you would like to work on it.

[BUG] Fix changes after PR #26

Describe the bug
In PR #26, the model fails due to updated folder names. It needs to be fixed before merging into master.

To Reproduce
Steps to reproduce the behavior:
Run app.py or check.py from the issue-25 branch only

Expected behavior
It must work the same as before. All changes must be done in the issue-25 branch only

Additional context
Not much work: wherever there is zip, zip2, bowl or wk, it just needs to be updated to 'zip/ODI', 'zip2/ODI' and so on.

[FEATURE] Add wicket keeper crawler to the webcrawler

Is your feature request related to a problem? Please describe.
The crawler currently only crawls for new players and match ids. The web crawler also needs to crawl for wicketkeeping stats.

Describe the solution you'd like
Extend the parse_scorecard function in crawler/cricketcrawler/cricketcrawler/spiders/howstat.py to cover wicketkeeping and collect the following:

  • catches
  • stumpings

Additional context
Crawler can be found in feature-webcrawler branch of the repo. Crawler is built using scrapy

[FEATURE] Add bowling crawler to the webcrawler

Is your feature request related to a problem? Please describe.
The crawler currently only crawls for new players and match ids. The web crawler also needs to crawl for bowling stats.

Describe the solution you'd like
Extend the parse_scorecard function in crawler/cricketcrawler/cricketcrawler/spiders/howstat.py to cover bowling and collect the following:

  • wickets
  • Overs
  • Maidens
  • Economy

Refer to Dataset.md to understand the matchcodes and playercodes

Additional context
Crawler can be found in feature-webcrawler branch of the repo. Crawler is built using [scrapy](https://scrapy.org/)

Time series Model

Describe the issue
Currently our model predicts points using linear regression, due to the lack of features. If you can produce better results than the current model with a time series model, that would be great.

Solution
Change the current linear regression model to your model and set up a pull request.
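One candidate time-series baseline to compare against the linear model is simple exponential smoothing, sketched below in plain Python. The alpha value is a tunable assumption; any submission should be judged by the losses check.py reports, not this toy.

```python
def ses_forecast(scores, alpha=0.5):
    """Simple exponential smoothing: the smoothed level after the last
    observation serves as the one-step-ahead forecast. `alpha` is an
    arbitrary starting point, not a fitted value."""
    level = scores[0]
    for y in scores[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

print(ses_forecast([30, 42, 38, 55, 60]))  # 53.0
```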

Test
Run check.py and see if your score beats our losses.

Comment if you would like to work on it

[DOCS] Update Dataset.md after PR 36

Is your feature request related to a problem? Please describe.
After #36 the entire dataset has been restructured; those changes need to be reflected in Dataset.md.
The file structure can be found in the description of #36.

Describe the solution you'd like
Since the zip, zip2, Bowl and wk folders no longer exist, they need to be removed from the docs; the scoring table remains the same.
Two folders have been added, namely ODI and T20, each having 4 folders containing the joined files of zip, zip2, Bowl and wk.
This needs to be reflected.

[FEATURE] Implementation of FastAPI Framework

Describe the Issue
The current model makes use of the Flask web framework. Now, with the existence of many more robust frameworks, implementing one such framework, FastAPI, would be beneficial.

Solution
Create a FastAPI implementation of the existing Flask model that has basic functionalities provided by it. Any extra features that seem appropriate or aesthetically pleasing are well appreciated.

Test
Ensure the model makes use of the existing Python scripts and matches to form teams and display them accordingly, i.e. it should integrate seamlessly with the current scripts and project.

Comment if you would like to work on this.

Fantasy cricket API

We are looking for an API for any of the fantasy cricket platforms, as scraping them would be illegal as of now.
Any suggestions are welcome.

Preferably not very expensive; amazing if it were free xD

Typing hints using pydantic

Is your feature request related to a problem? Please describe.
Add type hints to the files inside fantasy_cricket using pydantic and typing.
Additional context
It would also be great if mypy was used for the checks and added to the CI
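A sketch of the kind of annotations this issue asks for, shown with the stdlib typing module (a pydantic BaseModel would add runtime validation on top). The function name and shapes below are illustrative, not the repo's actual API.

```python
from typing import Dict, List

def best_eleven(predicted_points: Dict[str, float], size: int = 11) -> List[str]:
    """Return the `size` player names with the highest predicted points.
    (Hypothetical helper used only to illustrate the annotation style.)"""
    ranked = sorted(predicted_points, key=predicted_points.get, reverse=True)
    return ranked[:size]

print(best_eleven({"a": 9.0, "b": 7.5, "c": 8.1}, size=2))  # ['a', 'c']
```

With annotations like these in place, mypy can be run over fantasy_cricket as part of CI.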

[DATA] Segregation of ODI records from T20 records

Is your feature request related to a problem? Please describe.
Currently the zip and zip2 folders have records for both ODI and T20. We would like to segregate these records so that it will be easier to scrape data in the future.

Describe the solution you'd like
Create two folders, ODI and T20, in both zip and zip2 and segregate the records; also create a new folder called ODI in bowl and wk and place all the files in it.

Comment if you would like to work on it

T20 League Player Records Required

Describe the Issue:
The current data folder contains player records from ODI matches only. Including matches from other leagues would make the model more relevant.
Set up a PR if you are done with any T20 league (currently only the IPL is available on howstat.com).
Solution:
The webcrawler has been set up in feature-webcrawler. Add your solution to it
Try to keep the data in the same format as in the data folder

T20_Leagues list

  • IPL
  • Big Bash
  • CPL (Caribbean)

Comment if you would like to work on it.

Add algorithm to select players based on credits

Describe the issue
Currently our model predicts only on points, but fantasy cricket has limitations based on credits; we would like to implement an algorithm for this.

Solution
Create a function that takes as input the players list, the credits list, the maximum credits for the match and the points predicted by the model for each player, and selects the best 11 players based on points without crossing the maximum credits.
Note: be careful not to violate the team rule, i.e. a maximum of 7 players from each team.
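A greedy sketch of that selection is below: take the highest predicted scorers that fit the credit budget and the per-team cap. This is an illustrative baseline under assumed input shapes, not a submission (a knapsack-style search would be needed for an optimal pick).

```python
def pick_eleven(players, max_credits, team_cap=7, squad_size=11):
    """Greedy selection. `players` is a list of
    (name, team, credits, predicted_points) tuples -- an assumed shape.
    Highest-scoring players are taken first, skipping any that would
    bust the credit budget or the per-team cap."""
    chosen, spent, per_team = [], 0.0, {}
    for name, team, cost, _pts in sorted(players, key=lambda p: p[3], reverse=True):
        if len(chosen) == squad_size:
            break
        if spent + cost > max_credits or per_team.get(team, 0) >= team_cap:
            continue
        chosen.append(name)
        spent += cost
        per_team[team] = per_team.get(team, 0) + 1
    return chosen

players = [("a", "IND", 10, 90), ("b", "AUS", 9, 80), ("c", "IND", 8, 70)]
print(pick_eleven(players, max_credits=19, squad_size=2))  # ['a', 'b']
```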

Test
You can test with your own credits for now, as we are still figuring out an API for them.
You do not need to integrate it with the current Flask model; you can create a function in team.py for now.

Comment if you would like to work on it

More player records for ODI

Describe the Issue:
Currently we have players from only 11 countries, whose distribution can be found in issue #16. We would like players from more countries to be added, e.g. Sri Lanka, Zimbabwe, etc.
Solution:
Employ a web-scraper to scrape player records from howstat.com for ODI matches for these countries (only non-retired players) and update them in the data folder.
Ensure the scraped data is in the same format as the files in the data folder.

Comment if you would like to work on it.

Limit the countries in Web Scraping

The scraper gets players from some countries whose matches the fantasy cricket platforms don't host on their websites. The same is true for matches.

Make the crawler take only the players/matches from the following countries:

  • India
  • England
  • Australia
  • Bangladesh
  • New Zealand
  • South Africa
  • West Indies
  • Pakistan
  • Ireland
  • Afghanistan
  • Sri Lanka
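Inside the spider, this comes down to a whitelist check before yielding an item, sketched below (where and how the country string is extracted from the page is left to the implementation):

```python
ALLOWED_COUNTRIES = {
    "India", "England", "Australia", "Bangladesh", "New Zealand",
    "South Africa", "West Indies", "Pakistan", "Ireland",
    "Afghanistan", "Sri Lanka",
}

def keep_player(country):
    """Return True only for players (or matches) from the countries the
    fantasy platforms actually host."""
    return country in ALLOWED_COUNTRIES

print(keep_player("India"), keep_player("Netherlands"))  # True False
```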
