
cncf / devstats.archive


📈CNCF-created tool for analyzing and graphing developer contributions

Home Page: https://devstats.cncf.io/

License: Apache License 2.0

Shell 72.96% Makefile 0.14% Go 2.15% Ruby 1.21% HTML 21.71% Roff 0.03% PLpgSQL 1.76% Vim Script 0.03%
githubarchive postgres kubernetes grafana-dashboard metrics apache jaeger cncf statistics github git golang tsdb

devstats.archive's Introduction


GitHub archives and git Grafana visualization dashboards

Authors: Łukasz Gryglicki, Justyna Gryglicka.

This is a toolset to visualize GitHub archives using Grafana dashboards.

GHA2DB stands for GitHub Archives to DashBoards.

More information about Kubernetes dashboards here.

Kubernetes and Helm

Please see example Helm chart for an example Helm deployment.

Please see Helm chart for a full Helm deployment.

Please see LF Helm chart for the LF Helm deployment (it is a data deployment, has no Grafana, and uses ElasticSearch in addition to Postgres to store data).

Please see GraphQL Helm chart for GraphQL foundation DevStats deployment.

Please see Kubernetes dashboard if you want to enable a local dashboard to explore the cluster state.

Please see bare metal example to see an example of bare metal deployment.

The rest of this document describes the current bare metal deployment on metal.equinix.com used by CNCF projects.

Presentations

  • Presentations are available here.
  • Direct link.
  • Another direct link.

Talks

Architecture

DevStats is deployed using Helm on Kubernetes running on bare metal servers provided by Equinix.

DevStats is written in Go and uses GitHub archives, the GitHub API and git as its main data sources.

Under the hood, DevStats uses the following CNCF projects:

  • Helm (for deployment).
  • containerd (as a Kubernetes container runtime, CRI).
  • cert-manager (for HTTPS/SSL certificates).
  • OpenEBS (for local storage volumes support).
  • MetalLB (as a load balancer for bare metal servers).
  • CoreDNS (Kubernetes internal DNS).

And other projects, including:

  • Equinix (bare metal servers provider).
  • Ubuntu (containers base operating system).
  • kubeadm (for installing Kubernetes).
  • NFS (for shared write network volumes support).
  • NGINX (for ingress).
  • Calico (as networking for Kubernetes, CNI).
  • Golang (DevStats is written in Go).
  • PostgreSQL (DevStats database is Postgres).
  • patroni (HA deployment of PostgreSQL database, tweaked for DevStats).
  • GitHub archives (main data source).
  • GitHub API (data source).
  • git (data source).
  • Grafana (UI).
  • Let's Encrypt (provides HTTPS/SSL certificates).
  • Travis CI (continuous integration & testing).

Please check this for a detailed architecture description.

Deploying on your own project(s)

See the simple DevStats example repository for a single-project deployment (Homebrew), and follow its instructions to deploy for your own project.

Goal

We want to create a toolset for visualizing various metrics for the Kubernetes community (and also for all CNCF projects).

Everything is open source so that it can be used by other CNCF and non-CNCF open source projects.

The only requirement is that the project must be hosted in one or more public GitHub repositories.

Data hiding

If you want to hide your data (replace with anon-#) please follow the instructions here.

Forking and installing locally

This toolset uses only Open Source tools: GitHub archives, GitHub API, git, Postgres databases, and multiple Grafana instances. It is written in Go and can be forked and installed by anyone.

Contributions and PRs are welcome. If you see a bug or want to add a new metric please create an issue and/or PR.

To work on this project locally, please fork the original repository, and:

  • Please see Development for the local development guide.
  • For a more detailed description of all environment variables, tools, switches, etc., please see Usage.

Metrics

We want to support all kinds of metrics, including historical ones. Please see requested metrics to see what kind of metrics are needed. Many of them cannot be computed based on the data sources currently used.

Repository groups

Some repositories are grouped together into repository groups. They are defined in scripts/kubernetes/repo_groups.sql.

To set up the default repository groups:

  • PG_PASS=pwd ./kubernetes/setup_repo_groups.sh.

This is part of the kubernetes/psql.sh script, and the Kubernetes psql dump already has the groups configured.

In the All CNCF project, repository groups are mapped to individual CNCF projects; see scripts/all/repo_groups.sql.

Company Affiliations

We also want to have per company statistics. To implement such metrics we need a mapping of developers and their employers.

There is a project that attempts to create such a mapping: cncf/gitdm.

DevStats has an import tool that fetches company affiliations from cncf/gitdm and makes it possible to create per-company metrics/statistics. It also uses the companies.yaml file to map company acquisitions (any data generated by a company acquired by another company is assigned to the latter using a mapping from companies.yaml).
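As a rough illustration of the acquisition mapping described above, the sketch below reassigns a company name to its acquirer. The mapping contents, the company names and the `resolve_company` helper are all hypothetical, and following a chain of several acquisitions is an assumption; the actual layout of companies.yaml is not shown in this document.

```python
# Hypothetical acquisition mapping: acquired company -> acquiring company.
# The real companies.yaml layout may differ; these names are made up.
ACQUISITIONS = {
    "Small Startup": "Mid Corp",
    "Mid Corp": "Big Corp",
}


def resolve_company(name, acquisitions=ACQUISITIONS):
    """Follow the acquisition chain until reaching a company that was not acquired."""
    seen = set()
    while name in acquisitions and name not in seen:
        seen.add(name)  # guard against accidental cycles in the mapping
        name = acquisitions[name]
    return name
```

With this mapping, activity recorded for "Small Startup" would be counted under "Big Corp", while a company that does not appear in the mapping keeps its own name.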

If you see errors in the company affiliations, please open a pull request on cncf/gitdm and the updates will be reflected on https://k8s.devstats.cncf.io a couple of days after the PR has been accepted. Note that gitdm supports mapping based on dates, to account for developers moving between companies.
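To make the date-based mapping more concrete, here is a minimal sketch of how an affiliation could be looked up for a given day. The data structure, the developer login, the company names and the dates are all made up for illustration; this is not gitdm's file format or DevStats' import code.

```python
from datetime import date

# Hypothetical affiliation history per developer: (company, valid_until).
# Entries are ordered by date; valid_until=None means "current employer".
AFFILIATIONS = {
    "dev1": [
        ("Company A", date(2016, 6, 1)),
        ("Company B", None),
    ],
}


def company_at(login, day, affiliations=AFFILIATIONS):
    """Return the company a developer is attributed to on a given day."""
    for company, valid_until in affiliations.get(login, []):
        if valid_until is None or day < valid_until:
            return company
    return None  # unknown developer, or no entry covers this day
```

In this sketch a contribution made by dev1 in January 2016 is attributed to "Company A" and one made in 2017 to "Company B".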

New affiliations are imported into DevStats about 1-2 times/month.

Architecture

For architecture details please see architecture file.

Detailed usage is here

Adding new metrics

Please see metrics to see how to add new metrics.

Adding new projects

To add a new project on a bare metal deployment follow adding new project instructions.

See cncf/devstats-helm:ADDING_NEW_PROJECTS.md for information about how to add more projects on Kubernetes/Helm deployment.

Grafana dashboards

Please see dashboards to see a list of already defined Grafana dashboards.

Exporting data

Please see exporting.

Detailed Usage instructions

Servers

The servers to run devstats are generously provided by Equinix bare metal hosting as part of CNCF's Community Infrastructure Lab.

One line run all projects

  • Use GHA2DB_PROJECTS_OVERRIDE="+cncf" PG_PASS=pwd devstats.
  • Or add this command using crontab -e to run it every hour at HH:08.

Checking projects activity

  • Use: PG_PASS=... PG_DB=allprj ./devel/activity.sh '1 month,,' > all.txt.
  • Example results here - all CNCF project activity during January 2018, excluding bots.

devstats.archive's People

Contributors

afrittoli, ahrkrak, amitkumarj441, ant31, austbot, dankohn, detiber, dgrizzanti, gaocegege, haardikdharma10, isamrish, kmova, lukaszgryglicki, mattfarina, matthiasr, mattray, mhbauer, micahhausler, muvaf, notmyfault, prasadjivane, rhockenbury, roberthbailey, sergeykanzhelev, spiffxp, svrnm, vbehar, xtreme-sameer-vohra


devstats.archive's Issues

Custom Date Ranges for Dashboards

It would be helpful to be able to pick a custom date range for all dashboards.

Use Case: We are creating programs around mentoring and contributor ladder where we will need to measure the value from multiple angles and the programs may not start or end on a specific 7 day/month/etc. period. It may be something like the last three days or two weeks, etc.

possible to add release timeframes to Quick Ranges?

I want to use devstats to answer things like:

  • how has X changed over the past 3 releases?
  • is Y better or worse for the pending release vs. the last release we shipped?
  • what is the histogram of people who did Z for the release we're about to ship?

Something that would be helpful is to have Quick Ranges that are defined by specific kubernetes releases, eg:

  • specific releases: 1.8 release cycle, 1.7 release cycle
  • relative release: last release, last two releases, last three releases, (to now, or to their end)

How can we make answering these sorts of release-date based questions easier?

Suggested approvers stopped working about 12th December 2017

Last data is 11th Dec 2017:
https://k8s.devstats.cncf.io/dashboard/db/suggested-approvers?orgId=1
Workflow changed?
I've run metric SQL manually for (replacing {{from}} and {{to}}):
2017-10-01 - 2017-11-01 --> data is OK
2017-11-01 - 2017-12-01 --> data is OK
2017-12-01 - 2018-01-01 --> data ending at 12th Dec
2018-01-01 - 2018-02-01 --> no data.

SQL is here: https://github.com/cncf/devstats/blob/master/metrics/kubernetes/other_approver.sql

create temp table suggested_approvers as
select distinct i.id as issue_id,
  substring(
    c.body from '(?i)META={"approvers":\["([^"]+)"\]}'
  ) as approver,
  i.dup_repo_name as repo_name
from
  gha_comments c,
  gha_payloads pl,
  gha_issues i
where
  i.is_pull_request = true
  and c.event_id = pl.event_id
  and i.event_id = pl.event_id
  and i.created_at >= '{{from}}'
  and i.created_at < '{{to}}'
  and c.dup_actor_login = 'k8s-merge-robot'
  and c.body like '%APPROVALNOTIFIER%'
  and substring(
    c.body from '(?i)META={"approvers":\["([^"]+)"\]}'
  ) is not null
;

create temp table actual_approvers as
select distinct i.id as issue_id,
  c.dup_actor_login as approver
from
  gha_comments c,
  gha_payloads pl,
  gha_issues i
where
  i.is_pull_request = true
  and c.event_id = pl.event_id
  and i.event_id = pl.event_id
  and i.created_at >= '{{from}}'
  and i.created_at < '{{to}}'
  and c.dup_actor_login not in ('googlebot')
  and c.dup_actor_login not like 'k8s-%'
  and c.dup_actor_login not like '%-bot'
  and c.dup_actor_login not like '%-robot'
  and substring(
    c.body from '(?i)(?:^|\n|\r)\s*/approve\s*(?:\n|\r|$)'
  ) is not null
;

select
  'other_approvers;All;all_suggested_approvers,no_approver,other_approver,suggested_approver' as name,
  round(count(distinct sa.issue_id) / {{n}}, 2) as all_suggested_approvers,
  round(count(distinct sa.issue_id) filter (where aa.issue_id is null) / {{n}}, 2) as no_approver,
  round(count(distinct sa.issue_id) filter (where sa.approver != aa.approver) / {{n}}, 2) as other_approver,
  round(count(distinct sa.issue_id) filter (where sa.approver = aa.approver) / {{n}}, 2) as suggested_approver
from
  suggested_approvers sa
left join
  actual_approvers aa
on
  aa.issue_id = sa.issue_id
union select 'other_approvers;' || r.repo_group || ';all_suggested_approvers,no_approver,other_approver,suggested_approver' as name,
  round(count(distinct sa.issue_id) / {{n}}, 2) as all_suggested_approvers,
  round(count(distinct sa.issue_id) filter (where aa.issue_id is null) / {{n}}, 2) as no_approver,
  round(count(distinct sa.issue_id) filter (where sa.approver != aa.approver) / {{n}}, 2) as other_approver,
  round(count(distinct sa.issue_id) filter (where sa.approver = aa.approver) / {{n}}, 2) as suggested_approver
from
  gha_repos r
join
  suggested_approvers sa
on
  sa.repo_name = r.name
  and r.repo_group is not null
left join
  actual_approvers aa
on
  aa.issue_id = sa.issue_id
group by
  repo_group
order by
  all_suggested_approvers desc,
  name asc
;

drop table suggested_approvers;
drop table actual_approvers;
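One way to narrow this down is to try the two regular expressions from the SQL on sample comment bodies outside the database. The sketch below does that in Python; it only approximates PostgreSQL's regex engine, and the comment bodies and helper names are made up for illustration. It does show one property of the first pattern: it matches only when the approvers list holds exactly one quoted name.

```python
import re

# The two patterns used by the metric SQL, copied verbatim.
SUGGESTED = re.compile(r'(?i)META={"approvers":\["([^"]+)"\]}')
APPROVE = re.compile(r'(?i)(?:^|\n|\r)\s*/approve\s*(?:\n|\r|$)')


def suggested_approver(body):
    """Return the single suggested approver from a bot comment, or None."""
    match = SUGGESTED.search(body)
    return match.group(1) if match else None


def is_approve_comment(body):
    """True when the comment has /approve on a line of its own."""
    return APPROVE.search(body) is not None


# Made-up comment bodies for illustration only.
one_approver = 'APPROVALNOTIFIER <!-- META={"approvers":["alice"]} -->'
two_approvers = 'APPROVALNOTIFIER <!-- META={"approvers":["alice","bob"]} -->'
```

Here `suggested_approver(one_approver)` yields "alice", while `suggested_approver(two_approvers)` yields None, so a change in the bot's comment format (or in the bot's login, which the SQL also filters on) would be enough to make the metric return no rows. Whether that is what happened here is not confirmed.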

[Screenshot, 2018-01-12 15:46:23]

Any suggestions?

CI?

Would it be possible to use Travis CI or CircleCI to get automated testing?

I note this because I'm willing to help set that up if there's interest.

Min+max+avg in legends is a lie

If min+max+avg is enabled in the legend, it's important to know that the values are in fact the min/max/avg of the values returned from the query.

So if the query sent to InfluxDB is aggregated by avg():
min will be the min of the avg values
max will be the max of the avg values
avg will be the correct value.

So the values might be off if you compare them to raw data.
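A small worked example of the effect, using made-up numbers and plain Python rather than an actual InfluxDB query: four raw samples are averaged in two equal-sized time buckets, and the legend statistics are then computed from the bucket averages.

```python
# Made-up raw samples, split into two equal-sized time buckets.
raw = [1, 9, 2, 4]
buckets = [raw[0:2], raw[2:4]]

# What an avg()-aggregated query would return: one average per bucket.
bucket_avgs = [sum(b) / len(b) for b in buckets]

# The legend only sees the aggregated series, not the raw samples.
legend = {
    "min": min(bucket_avgs),
    "max": max(bucket_avgs),
    "avg": sum(bucket_avgs) / len(bucket_avgs),
}

# The same statistics computed directly on the raw data.
raw_stats = {"min": min(raw), "max": max(raw), "avg": sum(raw) / len(raw)}
```

The legend reports min 3.0 and max 5.0, while the raw data has min 1 and max 9; the average is 4.0 in both cases. Note that the averages agree here because both buckets hold the same number of samples; with unevenly filled buckets the average of averages can also drift from the raw average.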

Add support for /lifecycle bot-command

OK this one should be easy :)

A new bot-command /lifecycle was added within the last month or so, and is being used quite a bit. I'd like to see that reflected in the bot-commands dashboard.

I was going to try PR'ing this myself, but I'm not sure I understand the development flow. My guess was I'd need to modify:

  • metrics/kubernetes/bot_commands.sql
  • metrics/kubernetes/bot_commands_tags.sql
  • metrics/kubernetes/gaps.yaml

If I've got an example to follow, I may be able to do this for other bot commands we may have missed.

Add exporting options

Would it be possible to add an exporting feature to dashboards? If it's a matter of selecting a 'top 10' dashboards of importance to do this, I can select those. Looking for excel, csv, etc. options. json is fine for devs and the like but people using this tool may not have experience with that format. This will allow for others to easily consume the data in the format that they feel comfortable with.

Query by individual

Querying for company stats exists, but querying for individuals does not. There are multiple workflows where this is useful, but a particular near-term need is to measure activity towards membership ladder progression. Devstats appears to offer a more powerful path to insights than looking at a user's activity on a GitHub page.

Consider SIGs as a top level dashboard object

I think it'd be a useful tool for the SIGs if the metrics being measured could be made available to them on a SIG level. So for example it'd be nice for SIGs to know how they'd been doing cycle to cycle or perhaps allow people to ascertain which SIGs need more reviewers or more hands on deck, which ones are running smooth, those sorts of things.

"But wait, there's no way for us to tell which SIGs own which parts of the codebase!" Perhaps we can start encouraging people to start moving towards that model.

Combining dashboards

Should we continue the work of #41 and combine more dashboards?

Lukasz, on a dashboard like https://k8s.devstats.cncf.io/dashboard/db/companies-velocity?orgId=1 is it possible to add single line sections between each graph (like the please report section on the homepage) that would include an A NAME and so clicking on it would produce a URL like https://k8s.devstats.cncf.io/dashboard/db/companies-velocity?orgId=1#companies_active that would jump you to that section?

If so, shall we combine the three companies dashboards? Is there a logical order?

Can we combine the Issues and PRs dashboards?

The 4 SIGs ones?

Anything else?

@spiffxp @parispittman @castrojo Please chime in with what you would find useful.

req: issue age dashboard

It would be nice to have an issues-age dashboard, parallel to the prs-age dashboard currently in place

It would be nice, if possible, to be able to filter by: kind, sig, priority

The question I'm trying to answer is: how many auto-filed issues (author:k8s-merge-robot,label:kind/flake,"Failure cluster") have been addressed by humans vs. left untouched?

Add remaining CNCF projects: rkt, CNI, Envoy, Jaeger, Notary, TUF, Rook, All, CNCF

Add remaining CNCF projects:

  • rkt
  • CNI
  • Envoy
  • Jaeger
  • Notary
  • TUF

With their websites:

  • rkt.cncftest.io, rkt.devstats.cncf.io
  • cni.cncftest.io, cni.devstats.cncf.io
  • envoy.cncftest.io, envoy.devstats.cncf.io
  • jaeger.cncftest.io, jaeger.devstats.cncf.io
  • notary.cncftest.io, notary.devstats.cncf.io
  • tuf.cncftest.io, tuf.devstats.cncf.io

Contributors by Release

Similar to the Top Comments Dashboard, it would be very beneficial for easily accessible data on who contributed to each release by their handle, including # of PRs, and the repo that they went into. We can then filter by the repo and the release date/number.

Are "All PRs ..." dashboards redundant?

Trying to reduce cognitive load from too many dashboards.

Comparing PRs merged to All PRs merged it seems like the same information is available. Ditto for Need Rebase PRs vs All Need Rebase PRs. I prefer the more detailed non-all variants.

What do folks think about dropping the "All" dashboards? If people use them or prefer it to the more detailed dashboards, could we instead move the "all" graph to be another graph in the non-all dashboards?

Dashboards as code

Some day in the future we really want to improve the experience for people who want to store their dashboards primarily as files rather than primarily in the database. Until then I recommend using https://github.com/weaveworks/grafanalib, a great way of writing your dashboards as code instead of bloated JSON files :)

All Dashboards

Can you get the panel of All CNCF Projects (with their logos) to appear below the list of all dashboards on the homepage?

If not, please add it to the list of all views visible on the homepage.

Categorize items on the home page

The items displayed on the home page (https://devstats.k8s.io) are numerous and they are probably going to increase. We may end up with lots of scrolling, which can be confusing. Can we categorize related items, e.g. have all issue-related items under "Issues" and company-related items under "Companies" (for example, have "Companies stats", "Companies summary" and "Companies velocity" under a main item "Companies")? Thanks!

User Guide

Add to readme or github wiki a DevStats User Guide describing how to use the site and some example workflows to find and interpret data.

Some enhancements and corrections required for the Ubuntu 16 setup guide

I am trying to setup a local devstats environment using the instructions provided in:
https://github.com/cncf/devstats/blob/master/INSTALL_UBUNTU16.md

It will be good to include the following:

  • Minimum system requirements and recommended settings (like sysctl) for running Postgres and InfluxDB on the same machine. (Currently I have run into an issue where InfluxDB does not start its HTTP service due to default limits on open files etc. I am still trying to get past this stage.)

Also, I noticed the following minor items that need to be updated:

  • GOPATH needs to be fixed throughout the doc. For example /data/dev or $HOME/data/dev
  • Need Go 1.7+. Using 1.6 default on Ubuntu 16.04 causes error at step : go get github.com/google/go-github/github
  • sysctl: cannot stat /proc/sys/net/ipv4/tcp_tw_recycle: No such file or directory (not sure if this is expected)
  • /etc/github/aoauth -> /etc/github/oauth

[dashboards] repository groups are unclear

What concrete repositories do the repository groups map to?

Personally, I would prefer to have the ability to select individual repositories everywhere that I can currently only use repository groups.

Drop down menu of projects

Would it be possible to modify Grafana so that when you click the project logo at the top left, under the Pin option there would be entries for every CNCF project (with logo and name)? Selecting one would redirect you to that project's page.

Dashboard item to display open issues and PR for SIGs

In the last ContribX SIG meeting (11/15/2017), we had a discussion about Kubernetes issues that are currently open but should have been closed based on some investigation done by a non-SIG member. Ideally, a SIG member should quickly close such issues, based on the findings added as comments on the issue, without spending much time on them. However, due to bandwidth limitations that is not always the case. One of the possible solutions was to display open issues and PRs in devstats per SIG, to give each SIG an idea of them. Looking at devstats, I guess a good place to display such stats for open issues would be under 'SIG issues', with a new box for 'Open issues', and something similar for PRs. (I am also working on the possibility of new labels to identify close candidates, something that could be displayed on devstats in the future.)

ContribX meeting note for related action item under 11/15/2017:
https://docs.google.com/document/d/1qf-02B7EOrItQgwXFxgqZ5qjW0mtfu5qkYIF1Hl4ZLI/edit?ts=5a0c7fbb

Dashboard request: New/Episodic contributors

Here's some graphs I'd like to have:

Top request:

New Contributors: count of the number of PRs and count of unique individuals submitting a PR for the first time this month/week/day in repo history.

Also like to have, but less important:

Episodic Contributors: people/PRs from contributors who have not submitted a PR in the prior 3 months, and who have not submitted more than 12 PRs ever.

New Issue Filers: count of the number of people and the number of issues from folks who are filing an issue for the first time in their history.

It would also be really nice if all of the above was divisible by SIG label.
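The first two definitions above could be sketched roughly as follows. This is only one possible reading of the request, not DevStats code: the PR history is made up, "3 months" is approximated as 90 days measured back from the start of the period, and the `classify` helper and its parameters are hypothetical.

```python
from datetime import date

# Made-up PR history as (author, created_on) pairs, for illustration only.
PRS = [
    ("alice", date(2017, 1, 10)),
    ("alice", date(2017, 11, 5)),
    ("bob", date(2017, 11, 7)),
    ("carol", date(2017, 10, 20)),
    ("carol", date(2017, 11, 9)),
]


def classify(prs, start, end, gap_days=90, max_prs=12):
    """Split authors with PRs in [start, end) into new and episodic sets."""
    new, episodic = set(), set()
    for author in {a for a, d in prs if start <= d < end}:
        dates = sorted(d for a, d in prs if a == author)
        earlier = [d for d in dates if d < start]
        if not earlier:
            new.add(author)  # first PR ever falls inside the period
        elif (start - earlier[-1]).days >= gap_days and len(dates) <= max_prs:
            episodic.add(author)  # returning after a long quiet spell
    return new, episodic


new_authors, episodic_authors = classify(PRS, date(2017, 11, 1), date(2017, 12, 1))
```

For November 2017 this treats bob as a new contributor and alice as an episodic one, while carol, who also opened a PR in October, falls into neither group. Counting PRs instead of people, or splitting by SIG label, would need the corresponding fields in the input data.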
