autodesk / hubble

🛰 Collaboration, usage, and health data visualization for GitHub Enterprise

Home Page: https://autodesk.github.io/hubble

License: MIT License

Python 88.35% Makefile 2.35% Shell 7.58% Dockerfile 1.72%
hubble-enterprise github-enterprise analytics github git

hubble's Introduction

Hubble Enterprise visualizes GitHub Enterprise collaboration, usage, and health data.

Explore our interactive demo or watch the recording of our GitHub Universe talk to learn more!

⚠️ Attention: Hubble Enterprise is not supported by or affiliated with GitHub. Use it at your own risk! Autodesk assumes no responsibility for any data loss or hardship incurred directly or indirectly by using Hubble Enterprise.

Hubble Enterprise runs all queries through the GitHub Enterprise administrative shell and ignores repository visibility settings to generate statistics over all repositories on your appliance. Consequently, the names (no content!) of private repositories could show up on the Hubble dashboard published via GitHub Pages on your appliance. If you have enabled Public Pages on your GitHub Enterprise management console, then everyone on your network will be able to see the Hubble dashboard!

Please use Hubble Enterprise on your production instance only after reviewing the source code carefully!

Getting Started

Hubble Enterprise consists of two components. The updater component is a Python script that queries relevant data from a GitHub Enterprise appliance and stores the results in a Git repository once a day. The docs component is a web application that visualizes the collected data and is hosted with GitHub Pages.

  1. Create a new, initialized, public repository for Hubble’s data on your GitHub Enterprise appliance (for instance, https://git.company.com/scm/hubble-data).
  2. Publish Hubble’s data repository on GitHub Pages. Go to the repository settings, options tab, GitHub Pages section, then choose master branch as source, and click save. GitHub Enterprise will now tell you the URL of the published data pages (for instance, https://pages.git.company.com/scm/hubble-data if you have subdomain isolation enabled). Please be aware that this is a GitHub Pages URL and not just the repository’s URL. Note this URL down as dataURL, as you will need it later.
  3. Create a new, uninitialized, public repository for Hubble on your GitHub Enterprise appliance (for instance, https://git.company.com/scm/hubble).
  4. Clone this repository to your local machine, add your new Hubble repository as a remote, and push Hubble’s master branch to this remote:
    git clone https://github.com/autodesk/hubble
    cd hubble
    git remote add ghe https://git.company.com/scm/hubble
    git push -u ghe master
  5. Open docs/_config.yml in your editor and set the dataURL that you noted earlier. Commit and push the change to your Hubble repository:
    git add docs/_config.yml
    git commit -m "Adjusting dataURL to our own instance"
    git push
  6. Publish Hubble’s docs folder on GitHub Pages. Go to the repository settings, options tab, GitHub Pages section, then choose master branch/docs folder as source, and click save. GitHub Enterprise will now tell you the URL of the published dashboard pages. You may want to bookmark this URL to conveniently access the dashboard of Hubble Enterprise.
  7. Configure the updater component.

Contributing

Review the contributing guidelines before you consider working on Hubble Enterprise and proposing contributions.

Core Team

These are the humans that form the core team of Hubble Enterprise, in alphabetical order:


@larsxschneider

@pluehne

License

SPDX-License-Identifier: MIT

hubble's People

Contributors

craigez, dependabot[bot], dfarr, filmaj, jonico, larsxschneider, mlbright, pluehne, stoe, svasek, swardu

hubble's Issues

feature request

Hello there,

We have a small feature request for the housekeeping section; maybe it would help others as well.

Git requests per user:
Similar to API requests per user, could we also have a table under housekeeping that shows the number of Git requests per user?

GitHub LFS data per repository:
For example, the list of repositories that consume a lot of space on GitHub is available only to admins. It might be good to show the same data in Hubble so that it is easily accessible to end users.

Regards
Piyush

git-versions.tsv produces data file with lots of entries containing day and 1

Hi guys,
using v0.3.0 at the moment and for some reason on our GitHub Enterprise ReportGitVersion.py produces a data file with lots of entries containing the day and a 1. I ran the zgrep statement on a haproxy.* file for testing and get the same output with the day and 1. Output in git-versions.tsv and git-versionsnew.tsv looks something like this.
31 1 31 1 31 1 31 1 31 1 31 1 31 1 31 1 31 1 31 1 31 1 31 1 31 1 31 1 31 1 31 1 31 1 31 1 ... ... ... 2.17.1 131 2.17.0 120 2.17 1 2.16.3 181 2.16.2 180 2.16.1 144 2.16.0 8 2.15.2 2 2.15.1 426 2.15.0 141
Has anybody else seen this issue? If I get some time tomorrow, I will take the zgrep command apart and see where and why it happens, but I am not the best bash/perl guy. My mileage may not be too good :)

Data being displayed is behind by 2 days

Today's date is Feb 7; however, the data being displayed in Hubble only goes up to Feb 5. I looked at the raw data files and they do have data for Feb 6.

I collect data every morning and always have seen Hubble display day -1 (which is expected). Now, though, Hubble is displaying day -2 but I keep collecting the data at the same time every morning.

Thanks for any insight.

SQL syntax error introduced by Pull Request GH-5

The latest change in PR #5 to the file hubble/updater/reports/ReportForksToOrgs.py leads to an SQL exception.
GHE Version 2.11.2

ERROR 1064 (42000) at line 2: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CAST(repos.created_at AS date) AS "creation date"
			FROM
				users AS orgs,
			' at line 3

Traceback (most recent call last):
  File "update-stats.py", line 99, in <module>
    main()
  File "update-stats.py", line 72, in main
    ReportForksToOrgs(configuration, dataDirectory, metaStats).update()
  File "~/Workspace/hubble/updater/reports/Report.py", line 173, in update
    self.updateData()
  File "~/Workspace/hubble/updater/reports/ReportDaily.py", line 35, in updateData
    self.updateDailyData()
  File "~/Workspace/hubble/updater/reports/ReportForksToOrgs.py", line 13, in updateDailyData
    self.detailedHeader, self.detailedData = self.parseData(self.executeQuery(self.query()))
  File "~/Workspace/hubble/updater/reports/Report.py", line 105, in executeQuery
    return self.executeScript(self.configuration["databaseCommand"], stdin = query)
  File "~/Workspace/hubble/updater/reports/Report.py", line 96, in executeScript
    stdout, stderr = executeCommand(script, stdin)
  File "~/Workspace/hubble/updater/reports/Report.py", line 23, in executeCommand
    raise RuntimeError(command[0] + " failed with exit code " + str(process.returncode))
RuntimeError: ssh failed with exit code 1

Distribute colors in collaboration chart in order of segment size

Currently, the collaboration chart segments are colored in clockwise order. The color palette was chosen to have sequentially contrasting colors. However, the segments usually have varying sizes, which means that less important segments might end up getting contrasting colors, while the bigger chunks are hard to distinguish.

For this reason, I’d like to distribute the colors not in clockwise order, but from the biggest to the smallest segments. In this way, the biggest chunks, which are most important, would get contrasting colors.
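
The idea above can be sketched in a few lines. This is only an illustration, not the code in charts.js: the function name, the sample sizes, and the palette are made up, and the palette is assumed to be ordered so that neighboring entries contrast well.

```python
def assign_colors_by_size(segment_sizes, palette):
    """Return one color per segment, giving the first palette entries
    to the largest segments instead of following clockwise order."""
    # Segment indices, biggest first; ties keep their original (clockwise) order.
    order = sorted(range(len(segment_sizes)), key=lambda i: -segment_sizes[i])
    colors = [None] * len(segment_sizes)
    for rank, index in enumerate(order):
        # Wrap around if there are more segments than palette entries.
        colors[index] = palette[rank % len(palette)]
    return colors


# Hypothetical data: four segments listed in clockwise order.
sizes = [5, 40, 12, 30]
palette = ["red", "blue", "green", "orange"]
print(assign_colors_by_size(sizes, palette))  # ['orange', 'red', 'green', 'blue']
```

The segments keep their clockwise positions; only the color assignment changes, so the two biggest chunks get the first two palette entries.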

Feature request: Traffic based on user

Users are reviewing the transferred data (per day) and would like to get a histogram per user (instead of the aggregated total). Is there a way to narrow down the statistics to per user (or provide a graph/chart for user traffic)?

This would help users know whether the improvements that they are making to their processes are working.

Feature request: Have hubble updater run as part of a CI job or Heroku

So far, the hubble updater can either be installed on the GitHub Enterprise machine or run as a cron job from a separate machine. Running it as a service on GitHub Enterprise itself has the drawback that you need to reinstall it after every upgrade, while running it on a separate machine has the drawback that you have to find, secure, and properly maintain that machine. In both cases, some manual setup steps are required.

I wonder whether it would be possible to use a CI service like Travis or Jenkins or a free Heroku app to run hubble updater once a day. All systems provide a declarative approach to define their build / run steps + encrypted environment variables for the credentials needed to connect to GitHub Enterprise. With Heroku, one could even come up with a Heroku deploy button.

Move action bar below the charts on small screens

The action bar is currently vertically aligned with the title. For this reason, it is rendered above the charts on smaller screens:

screenshot from 2018-02-06 14-41-03

It might make sense to place the action bar below the charts along with the info boxes. First, because it’s not the topmost important piece of information (especially on small screens, where space is limited). Second, vertically aligning the action bar with the title will look odd when adding a view switcher as suggested in #115.

Feature request: showing % of contributions coming from within an org vs. outside

It's similar to the current Collaboration chart, except instead of visualizing which orgs collaborate with each other, I would like to be able to track what % of contributions come from outside the org (vs. inside), for some definition of "contribution". One idea: number of commits in master authored by members of the organization, vs. number of commits by non-members of the org? I'm open to different definitions.

I would love for this to be time-series data, that is, tracking this % of outside contributions over time. This is similar to #35 in that slicing the data up by time would be required.

Finally, one extension of this could be doing a similar analysis except replacing 'org' with 'repo'.

Let me know what you think. I'm happy to take a stab at this, but I'm still learning the project 😓. Haven't delved into the updater/ portion of the repo yet, which is where I believe this change would need to land.
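
For discussion, the commit-based definition suggested above could be sketched roughly like this. It assumes that a list of commit authors and the set of organization members are already available for some time period; how the updater would obtain them is left open, and all names here are hypothetical.

```python
def outside_contribution_share(commit_authors, org_members):
    """Return the percentage of commits authored by non-members of the org,
    or None if there were no commits in the period."""
    if not commit_authors:
        return None
    outside = sum(1 for author in commit_authors if author not in org_members)
    return 100.0 * outside / len(commit_authors)


# Hypothetical sample: four commits, two of them by non-members.
print(outside_contribution_share(["a", "b", "c", "a"], {"a"}))  # 50.0
```

Calling this once per day, week, or month would give the time series mentioned above.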

Disable "chart download" link for collaboration chart

In #85 we introduced an action bar with a download link for our charts. That's great for most charts but the collaboration chart has a raw data format that is not really useful as it cannot easily be processed in Excel and the like. That's why we should remove the download link for it.

Monitor LDAP sync

In GHE you can define the LDAP sync intervals. The sync time should always be smaller than the sync interval. You can monitor the sync time (for ldap team syncs) of your GHE appliance as follows:

$ zcat -f production.log* |  perl -ne 'print if s/^(.*) \+.*resque\.performed.*ldap_team_sync.*?([0-9\.]+ms)$/\1 \2/' | sort
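
If this were to become a report in the Python updater, the same extraction might look like the following sketch. The regular expression mirrors the perl one above; the function name is made up, and no claim is made about what the real production.log lines look like beyond what that expression implies.

```python
import re

# Same idea as the perl expression: capture the timestamp in front of the
# " +" time zone offset and the duration in milliseconds at the end of the line.
LDAP_SYNC = re.compile(
    r"^(.*) \+.*resque\.performed.*ldap_team_sync.*?([0-9]+(?:\.[0-9]+)?)ms$"
)


def ldap_sync_times(lines):
    """Return sorted (timestamp, milliseconds) pairs for LDAP team sync jobs."""
    results = []
    for line in lines:
        match = LDAP_SYNC.match(line.rstrip("\n"))
        if match:
            results.append((match.group(1), float(match.group(2))))
    return sorted(results)
```

The largest value per day could then be compared against the configured sync interval.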

Visualize minimal remaining rate limiting quota

If rate limiting is enabled, then GHE logs will report the remaining quota for a user. Visualize the minimal remaining quota per day as well as the top 20 users using up the most of their quota.

Find minimal remaining quota with the following call:

zgrep -v 'status=40[13]' /var/log/github/unicorn.* | grep -Fv 'controller="Api::Internal::Raw"' | grep -Fv 'controller="Api::Meta"' | grep -Fv 'controller="Api::Root" path_info="/api/v3/internal/storage/favicon.ico"' | grep -Fv 'controller="Api::Enterprise" path_info="/api/v3/enterprise/stats/all"' | grep -oP 'rate_limit_remaining=\K[^ ]+' | sort -n | uniq | head

This chart would be useful to lower the available quota gradually over time.
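
A rough Python equivalent of the pipeline above, in case it helps when turning this into a report. The exclusion strings and the rate_limit_remaining field are taken from the command above; the function name is hypothetical, and the per-user top 20 is not covered because the name of the user field is not given here.

```python
import re

STATUS_SKIPPED = re.compile(r"status=40[13]")
REMAINING = re.compile(r"rate_limit_remaining=(\S+)")

# Fixed strings excluded by the grep -Fv calls in the shell pipeline.
EXCLUDED = (
    'controller="Api::Internal::Raw"',
    'controller="Api::Meta"',
    'controller="Api::Root" path_info="/api/v3/internal/storage/favicon.ico"',
    'controller="Api::Enterprise" path_info="/api/v3/enterprise/stats/all"',
)


def minimal_remaining_quota(lines):
    """Return the smallest remaining quota found in the lines, or None."""
    values = []
    for line in lines:
        if STATUS_SKIPPED.search(line):
            continue
        if any(fragment in line for fragment in EXCLUDED):
            continue
        match = REMAINING.search(line)
        if match:
            try:
                values.append(int(match.group(1)))
            except ValueError:
                # Ignore values that are not plain integers.
                pass
    return min(values) if values else None
```

Unlike the shell version, this returns only the single minimum instead of the ten smallest distinct values.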

Date mismatch—timezone issue?

Hubble data is collected from the prior day's rotated logs. That data is collecting properly on my system and being pushed to GHE.

When I view the Hubble graphs, though, the date that is shown is day -2 but the actual data is from day -1 (which is expected). I reside in the Mountain Time Zone. If I Remote Desktop into a system in Pacific Time Zone the date that is shown is day -1 (which is expected) and the data is still OK.

So for me, I'm seeing a display date issue but the underlying data is OK.

I am using the latest released version of Hubble (0.2.0?) and GHE 2.10.6. There was no date display issue until I updated to Hubble 0.2.0 so maybe something was introduced in the new release.

Our 2.12.X upgrade is a few weeks out. I need to install this Hubble on stage to see if the problem is still there.

Git traffic and tokenless authentication charts empty after GHE upgrade

Hello there,

We run Hubble as a service on a dedicated machine. We recently upgraded GHE from v2.11 to v2.12.6, and since then Hubble no longer shows the Git traffic; there is no data in git-download.tsv.

FYI, there are no errors reported when we run update-stats.py.

In general, do we need to do anything before or after we apply an update to GHE?

Regards
Piyush

Support multiple time ranges in the collaboration chart

It would be nice to have a view switcher for the collaboration chart that supports multiple time ranges.

In this way, we could render the collaboration chart with data from:

  • the last two months
  • the last two years
  • all data (@larsxschneider: Would this make sense?)

For this to work, we’ll need support for multiple data URLs (one for each view), because one file can’t trivially hold the data for all time ranges.

Display "Loading" label in collaboration chart

The "Collaboration Across Organizations" chart takes a bit of time on the first load if many organizations are visualized. A "Loading..." label would be nice to inform the user that things are happening.

JS code coverage reporting in PRs

I think it might be possible with a service like codecov.io, which is free for public projects like Hubble.

Let me know if you think this would be useful / desired, and I can take a look at implementing it.

Feature request: Test suite/linter

  • docs/assets/charts.js is pretty big and as I look to contribute to the project, I would like to ensure I'm not breaking anything in there.
  • adding a set of explicit JS linter rules for charts.js would be nice. I mentioned this for selfish reasons as my IDE currently looks like a red underline mess of spaghetti due to this project's preference for tabs 😜

Read-only user for /var/log and MySQL access

Hubble accesses GitHub Enterprise log data via the administrative shell and the database via MySQL. It would be awesome if there were a read-only administrative shell user limited to /var/log and a read-only user for SQL queries.

This would remove the (potential) risk for modifications to the instance. Plus, the user could get a lower priority or could be blocked in case of performance issues.

/cc @b4mboo @djdefi @kai @acrlewis

Overlapping data series are not rendered nicely

When selecting just two data series in the charts, everything looks rather good:

screenshot from 2017-10-30 17-19-09

However, when adding more data series, the colors are mixed awkwardly, making it hard to distinguish them:

screenshot from 2017-10-30 17-19-22

This is due to the use of transparency in the fill colors. Perhaps, we should look at Chart.js’s area charts, which seem to handle color transparency much more nicely:

screenshot from 2017-10-30 17-27-41

In the worst case, we should consider removing the fill colors entirely, as legibility of the charts should be top priority.

Hide empty views in view switcher

When there isn’t enough data for plotting a weekly-aggregated chart, an empty chart is plotted. Empty charts are not interesting to look at and might even confuse users. For this reason, it might be desirable to simply hide the respective view switcher buttons (or to hide the view switcher entirely if only one view can be shown currently).

Document new multiple-view configuration options

In #105, I introduce the possibility to have multiple views per chart. This still needs some documentation after we experimented a bit with the changes and feel confident about the implementation, naming, etc.

Document + test data-config attribute for charts

It'd be nice to describe in the README what the data-config attribute customizes as well as what options exist. In a similar vein, formalizing tests for its functionality would be nice.

I'll be working on a PR for this, btw.

Generate more representative demo data

It’s nice to showcase demo data on autodesk.github.io, even if it’s generated. In this way, users can get a feeling for new features such as the multiview charts.

However, some demo data files are very small and not very representative for demonstrating the use of multiview charts, for example. Hence, it would be great to have more extensive demo data files so that everything looks nice on the interactive live demo.

git-download.sh script broken on GHE 2.12.x

Due to a log format (order) change on github-audit.log.* on GHE 2.12.x (we tested that on GHE 2.12.3), git-download.sh is broken.

Here is a proposal to fix it using awk:

Current version, which works on GHE <=2.11.x:

eval "$CAT_LOG_FILE" |
    perl -ne 'print if s/.*"program":"upload-pack".*"repo_name":"([^"]+).*"user_login":"([^"]+).*"cloning":([^,]+).*"uploaded_bytes":([^ ]+).*/\1\t\2\t\3\t\4/' |
    sort |
    perl -ne '$S{$1} += $2 and $C{$1} += 1 if (/^(.+)\t(\d+)$/);END{printf("%s\t%i\t%i\n",$_,$C{$_},$S{$_}) for ( keys %S );}' |
    sort -rn -k5,5

For GHE >=2.12

eval "$CAT_LOG_FILE" |
    perl -ne 'print if s/.*"cloning":([^,]+).*"program":"upload-pack".*"repo_name":"([^"]+).*"uploaded_bytes":([^,]+).*"user_login":"([^"]+).*/\1\t\2\t\3\t\4/' |
    awk '{print $2"\t"$4"\t"$1"\t"$3}' |
    sort |
    perl -ne '$S{$1} += $2 and $C{$1} += 1 if (/^(.+)\t(\d+)$/);END{printf("%s\t%i\t%i\n",$_,$C{$_},$S{$_}) for ( keys %S );}' |
    sort -rn -k5,5
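
As an alternative that does not depend on the field order at all, the entries could be parsed as JSON. This is only a sketch: it assumes that each log line contains one JSON object starting at the first "{", which the field syntax above suggests but which should be checked against a real github-audit.log. The function name is made up.

```python
import json
from collections import defaultdict


def summarize_git_downloads(lines):
    """Aggregate upload-pack entries per (repo, user, cloning),
    regardless of the order of the fields in each log line.

    Returns (repo, user, cloning, request count, uploaded bytes) tuples,
    largest transfer volume first.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [request count, uploaded bytes]
    for line in lines:
        start = line.find("{")
        if start < 0:
            continue
        try:
            entry = json.loads(line[start:])
        except ValueError:
            continue  # not valid JSON, skip the line
        if not isinstance(entry, dict) or entry.get("program") != "upload-pack":
            continue
        try:
            key = (str(entry["repo_name"]), str(entry["user_login"]), bool(entry["cloning"]))
            size = int(entry["uploaded_bytes"])
        except (KeyError, TypeError, ValueError):
            continue  # entry lacks one of the expected fields
        totals[key][0] += 1
        totals[key][1] += size
    rows = [key + (count, size) for key, (count, size) in totals.items()]
    return sorted(rows, key=lambda row: -row[4])
```

With this approach, a future reordering of the log fields would not break the report again.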

Avoid error if no log file exists

There are no log files on a freshly provisioned GHE appliance. Consequently, Hubble cannot find any log file and experiences errors such as this one:

Started update of git-download.tsv
gzip: /var/log/github-audit.log.1*.gz: No such file or directory
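
One way to avoid the error would be to expand the file pattern first and treat "no matching files" as an empty input. A minimal sketch, assuming the log reading were done in Python instead of in the shell scripts (the function name is hypothetical):

```python
import glob
import gzip


def read_rotated_logs(pattern):
    """Yield all lines from the gzip files matching the pattern.

    If no file matches (for instance, on a freshly provisioned appliance),
    nothing is yielded and no error is raised.
    """
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", errors="replace") as handle:
            yield from handle
```

For example, list(read_rotated_logs("/var/log/github-audit.log.1*.gz")) would simply return an empty list on a fresh appliance, and the report could then write an empty data file.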

Aggregation not working in ”pull request usage” chart

I implemented a generalized aggregation framework and enabled monthly data aggregation for the pull request usage chart, but the data is not actually aggregated. This isn’t visible in the live demo, whose TSV file has monthly data only. On a real appliance, where daily data is recorded, it becomes apparent:

screenshot from 2018-02-12 15-03-31

Apparently, I introduced this bug in #105. The code responsible for aggregation is now only called for multiview charts (but the pull request usage chart has a single view currently).

Updater service not triggered without user session

When using the hubble-enterprise.timer on the appliance server, statistics are never collated. It seems that when you log off the primary server the updater stops too.

     ___ _ _   _  _      _      ___     _                    _
    / __(_) |_| || |_  _| |__  | __|_ _| |_ ___ _ _ _ __ _ _(_)___ ___
   | (_ | |  _| __ | || | '_ \ | _|| ' \  _/ -_) '_| '_ \ '_| (_-</ -_)
    \___|_|\__|_||_|\_,_|_.__/ |___|_||_\__\___|_| | .__/_| |_/__/\___|
                                                   |_|

Administrative shell access is permitted for troubleshooting and performing
documented operations procedures only. Modifying system and application files,
running programs, or installing unsupported software packages may void your
support contract. Please contact GitHub Enterprise technical support at
[email protected] if you have a question about the activities allowed by
your support contract.
Last login: Thu Dec 21 07:11:59 2017 from [masked-ip-address]
admin@github-primary:~$ journalctl -f --user-unit hubble-enterprise.timer
-- Logs begin at Wed 2017-06-21 20:01:19 UTC. --
Dec 21 06:59:43 github-primary systemd[23519]: Starting Runs Hubble Enterprise updater periodically.
Dec 21 06:59:43 github-primary systemd[23519]: Started Runs Hubble Enterprise updater periodically.
Dec 21 07:11:56 github-primary systemd[23519]: Stopping Runs Hubble Enterprise updater periodically.
Dec 21 07:11:56 github-primary systemd[23519]: Stopped Runs Hubble Enterprise updater periodically.
Dec 21 07:11:59 github-primary systemd[16682]: Starting Runs Hubble Enterprise updater periodically.
Dec 21 07:11:59 github-primary systemd[16682]: Started Runs Hubble Enterprise updater periodically.
Dec 21 07:13:16 github-primary systemd[16682]: Stopping Runs Hubble Enterprise updater periodically.
Dec 21 07:13:16 github-primary systemd[16682]: Stopped Runs Hubble Enterprise updater periodically.
Dec 21 07:16:13 github-primary systemd[22010]: Starting Runs Hubble Enterprise updater periodically.
Dec 21 07:16:13 github-primary systemd[22010]: Started Runs Hubble Enterprise updater periodically.

From the above, I logged in at 07:11:59 and logged out at 07:13:16, then logged back in at 07:16:13.
It seems that you have to remain logged in for the collator to work.

The service collator works fine, but it has to be run manually every day.

Currently running on GHE Version 2.11.4 (but it also did this on GHE Version 2.10)
Using hubble-enterprise_0.1.1_all.deb

Active repo list should include repos with at least 2 pushers?

The active repositories list includes repositories with only 1 pusher. While many important projects are individual efforts, I wonder if this is useful information in Hubble. Requiring at least 2 pushers per project would very often reduce noise and allow Hubble consumers to focus on projects where more effective collaboration is a true goal.
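
To make the proposal concrete, the filter could look something like this. It assumes a mapping from repository name to the users who pushed to it is available; the function name and the sample data are made up.

```python
def active_repositories(pushers_by_repo, min_pushers=2):
    """Return the names of repositories with at least min_pushers distinct pushers."""
    return sorted(
        repo
        for repo, pushers in pushers_by_repo.items()
        if len(set(pushers)) >= min_pushers
    )


# Hypothetical sample: only org/a has two distinct pushers.
sample = {"org/a": ["alice", "bob"], "org/b": ["carol"], "org/c": ["dave", "dave"]}
print(active_repositories(sample))  # ['org/a']
```

Keeping the threshold as a parameter would allow the current behavior (min_pushers=1) to stay available.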

programming languages in use

GitHub has data about programming languages used in all its repositories. Given a .tsv of orgs, repos and languages in use, would it be interesting to add some kind of visualization like a bubble chart? If the .tsv file has (language, number-of-characters) tuples, this should be possible. If we want to break it down by organization, something like this might be interesting too.

Feature request: Link to a .tsv file with a list of active repositories

It would be handy to have a list of repositories that are considered active, as defined by Hubble. Perhaps a user may want to query the GitHub API for more specific information using this list of repositories. A text file with the org and repo names in an easily consumable format .json, .csv, .tsv etc. would serve this purpose nicely. The file could be linked on the Repositories -> Activity page.

Would you like me to submit a PR for this?

No more mouse over?

I just updated to the latest Hubble and now I no longer have mouse over information like mousing over bubbles or the bands in the collaboration charts.

I also reloaded the site in Safari in a new private window, I see the same behavior.

Have tooltip show date ranges for aggregated data

Sometimes, when looking at weekly-aggregated charts, it confuses me to see the date of the first day of the week in the tooltips when hovering over data points. Users could also be confused, because it’s not clear whether the shown value represents a single date or an entire week (and where the week starts and ends is also undocumented).

This could be addressed by having Chart.js show the respective date range in the tooltips. For data series that have been aggregated by first and last (which select only the chronologically first/last within each period), the tooltip should still show the respective (single) date. However, for data aggregated with sum, mean, med, min, and max, we could show the time range instead.
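
The labeling rule described above, sketched independently of Chart.js. The function and parameter names are hypothetical; the start of the period is passed in explicitly because where a week starts is not defined here.

```python
from datetime import date, timedelta

# Aggregations that pick one existing data point instead of combining several.
POINT_AGGREGATIONS = {"first", "last"}


def tooltip_label(period_start, period_days, aggregation, picked_date=None):
    """Return the date text for a tooltip of an aggregated data point.

    For first/last, show the single date of the selected data point;
    for sum, mean, med, min, and max, show the whole date range.
    """
    if aggregation in POINT_AGGREGATIONS:
        return (picked_date or period_start).isoformat()
    period_end = period_start + timedelta(days=period_days - 1)
    return f"{period_start.isoformat()} to {period_end.isoformat()}"


print(tooltip_label(date(2018, 2, 5), 7, "sum"))  # 2018-02-05 to 2018-02-11
print(tooltip_label(date(2018, 2, 5), 7, "last", date(2018, 2, 11)))  # 2018-02-11
```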

Collaboration chart truncates long organization names

The collaboration chart truncates long organization names:
screen shot 2017-10-17 at 08 10 20

I see two ways to fix it:

  1. Pre-calculate how much space we need for the given organization names and adjust the chart margin accordingly.

  2. Visualize organization names somewhat differently. Maybe something like this:
