Github-Repository-Analysis

This project helps you pull data from GitHub through the GitHub API and build an analytics database.

Cloning

Use the GitHub CLI to clone this project to your machine.

gh repo clone AndriiShchur/Github-Repository-Analysis

Usage

First, create a database. This example uses Azure SQL with the following schema:

[Image: database schema diagram]

Create an env.dist file with your credentials:

SERVER="sql server name"
DB="main"
USER="your sql server user name"
PASSWORD="your sql server password"
DB_NAME_TEST="your new db for analytics"
GIT_TOKEN="your git token"
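
As a rough illustration of how a script could consume these settings, the sketch below parses KEY="value" lines with the standard library only and assembles an ODBC-style connection string. The helper names parse_env and build_conn_str, the driver name, and the sample server name are assumptions made for this example; the project's own scripts may load the file differently.

```python
def parse_env(text):
    """Parse KEY="value" lines into a dict; blank lines and # comments are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate spaces around "=" and strip the surrounding quotes.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def build_conn_str(settings, driver="ODBC Driver 18 for SQL Server"):
    """Assemble an ODBC connection string (the driver name is an assumption)."""
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={settings['SERVER']};"
        f"DATABASE={settings['DB']};"
        f"UID={settings['USER']};"
        f"PWD={settings['PASSWORD']}"
    )


if __name__ == "__main__":
    # Hypothetical values, shaped like the env.dist file above.
    sample = (
        'SERVER="example.database.windows.net"\n'
        'DB="main"\n'
        'USER="me"\n'
        'PASSWORD="secret"\n'
    )
    print(build_conn_str(parse_env(sample)))
```

The resulting string is in the form that ODBC-based drivers generally accept; check the exact driver name installed on your machine before relying on it.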

To create DB run:

python scr\scripts\create_db.py

Enter the SQL script name create_db.sql, and change the DB configuration inside the file if necessary.

To create TABLES run:

python scr\scripts\create_tables.py

Enter the SQL script name create_tables.sql, and change the table schemas inside the file if necessary.

To create the PROCEDURE and TRIGGER for the analytics table, run:

python scr\scripts\create_sql_proc_and_trig.py

Enter the SQL script name analytics_procedure.sql for the procedure and analytics_update_triger.sql for the trigger, then change the parameters inside the files if necessary.
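
SQL Server requires CREATE PROCEDURE and CREATE TRIGGER to be the first statement in a batch, so a runner script usually has to send each batch to the server separately. The sketch below shows one way to do that splitting step. It assumes the .sql files separate batches with GO lines, which may not match how the project's scripts work; split_batches and the sample procedure and trigger names are hypothetical.

```python
import re

# "GO" on a line by itself is a batch separator understood by SQL Server
# client tools, not a T-SQL statement, so it has to be handled client-side.
_GO_LINE = re.compile(r"^[ \t]*GO[ \t]*$", re.IGNORECASE | re.MULTILINE)


def split_batches(script):
    """Split a T-SQL script into batches on GO lines, dropping empty ones."""
    return [part.strip() for part in _GO_LINE.split(script) if part.strip()]


if __name__ == "__main__":
    # Made-up script text, only to show the splitting behaviour.
    sample = """CREATE PROCEDURE dbo.DemoProc AS SELECT 1;
GO
CREATE TRIGGER dbo.DemoTrig ON dbo.PRFile AFTER INSERT AS EXEC dbo.DemoProc;
GO
"""
    for batch in split_batches(sample):
        print(batch.splitlines()[0])
```

Each returned batch could then be passed to the execute method of whichever database driver the scripts use.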

To get data from a GitHub repository and INSERT it into the SQL DB, run:

python scr\scripts\get_data_from_git.py

Enter the SQL script name insert_new_repo.sql to load a new repository name into the RepoMain table, insert_records_in_pr_main.sql to load new PR information into the PRMain table, and insert_records_in_pr_files.sql to load new PR file information into the PRFile table. The RepoAnalytics table updates automatically once the last record has been inserted into the PRFile table.
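
For orientation, here is a minimal sketch of how pull request data can be requested from the GitHub REST API using only the standard library. It is not the code in get_data_from_git.py. The helper names are hypothetical, and the sketch assumes closed pull requests are what gets loaded. The two endpoints used are the documented "list pull requests" and "list pull request files" routes.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def github_request(path, token, **params):
    """Build an authenticated GET request for the GitHub REST API."""
    url = f"{API_ROOT}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


def closed_pulls_request(owner, repo, token, page=1):
    """Request one page of closed pull requests for a repository."""
    return github_request(
        f"/repos/{owner}/{repo}/pulls", token, state="closed", per_page=100, page=page
    )


def pull_files_request(owner, repo, number, token):
    """Request the files changed by a single pull request."""
    return github_request(f"/repos/{owner}/{repo}/pulls/{number}/files", token)


def fetch_json(request):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In this sketch the token argument would be the GIT_TOKEN value from env.dist, and the decoded JSON would then be written to the PRMain and PRFile tables.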

To get the analytics results, run:

SELECT ra.[RepoID]
      ,rm.[RepoName]
      ,ra.[MinPRTime]
      ,ra.[MaxPRTime]
      ,ra.[AVGPRTime]
      ,ra.[Top1File]
      ,ra.[Top2File]
      ,ra.[Top3File]
FROM [dbo].[RepoAnalytics] AS ra
LEFT JOIN RepoMain AS rm ON ra.RepoID=rm.RepoID

You will see results like the following (example):

[Image: example query results]

  • MinPRTime - minimum number of minutes to merge a PR in the repository
  • MaxPRTime - maximum number of minutes to merge a PR in the repository
  • AVGPRTime - average number of minutes to merge a PR in the repository
  • Top1File - the file changed most often in the repository's pull requests
  • Top2File - the second most frequently changed file in the repository's pull requests
  • Top3File - the third most frequently changed file in the repository's pull requests
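
To make these definitions concrete, the sketch below computes the same six values in plain Python. In the project they are produced by the SQL procedure, so this is only an illustration. It assumes merge time is measured from the created_at to the merged_at timestamp of each PR, and the input shape (a list of dicts with a "files" list) is hypothetical.

```python
from collections import Counter
from datetime import datetime


def minutes_to_merge(created_at, merged_at):
    """Minutes between two UTC timestamps in the GitHub API's ISO 8601 format."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    delta = datetime.strptime(merged_at, fmt) - datetime.strptime(created_at, fmt)
    return delta.total_seconds() / 60


def summarize(pull_requests):
    """Compute min/max/average merge time and the three most-changed files.

    Each item is a dict with "created_at", "merged_at" and "files" keys
    (a hypothetical in-memory shape, not the project's table layout).
    The list must contain at least one merged pull request.
    """
    times = [minutes_to_merge(pr["created_at"], pr["merged_at"]) for pr in pull_requests]
    file_counts = Counter(name for pr in pull_requests for name in pr["files"])
    top = [name for name, _ in file_counts.most_common(3)]
    top += [None] * (3 - len(top))  # pad when fewer than three files exist
    return {
        "MinPRTime": min(times),
        "MaxPRTime": max(times),
        "AVGPRTime": sum(times) / len(times),
        "Top1File": top[0],
        "Top2File": top[1],
        "Top3File": top[2],
    }
```

Padding with None keeps the result shape fixed for repositories whose pull requests touch fewer than three distinct files.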

Testing

To check the scripts' results, use the unit tests in the tests/unit folder.

To test DB creation run:

python -m pytest tests/unit -k "unittestdb"

This checks that the new DB was created (or already exists) and verifies its configuration.

To test the DB schema, run:

python -m pytest tests/unit -m "unittesttableexc"
python -m pytest tests/unit -m "unittesttablestr"

This checks that the new tables were created (or already exist) and verifies their schema.

To test whether the latest repository in the RepoMain table has analytics, run:

python -m pytest tests/unit -m "unittestresults"

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.
