autodesk / hubble
🛰 Collaboration, usage, and health data visualization for GitHub Enterprise
Home Page: https://autodesk.github.io/hubble
License: MIT License
It would be handy to have a list of the repositories that Hubble considers active. A user may want to query the GitHub API for more specific information about these repositories. A text file with the org and repo names in an easily consumable format (.json, .csv, .tsv, etc.) would serve this purpose nicely. The file could be linked on the Repositories -> Activity page.
Would you like me to submit a PR for this?
The "Collaboration Across Organizations" chart takes a bit of time on the first load if many organizations are visualized. A "Loading..." label would be nice to inform the user that things are happening.
It would be nice to have a view switcher for the collaboration chart that supports multiple time ranges.
In this way, we could render the collaboration chart with data from:
For this to work, we’ll need support for multiple data URLs (one for each view), because one file can’t trivially hold the data for all time ranges.
So far, the Hubble updater can either be installed on the GitHub Enterprise machine or run as a cron job from a separate machine. Running it as a service on GitHub Enterprise itself has the drawback that you need to reinstall it after every upgrade; running it on a separate machine has the drawback that you have to find, secure, and properly maintain this other machine. In both cases, some manual setup steps are required.
I wonder whether it would be possible to use a CI service like Travis or Jenkins, or a free Heroku app, to run the Hubble updater once a day. All of these systems provide a declarative way to define their build/run steps, plus encrypted environment variables for the credentials needed to connect to GitHub Enterprise. With Heroku, one could even provide a Heroku deploy button.
Hubble data is collected from the prior day's rotated logs. That data is being collected properly on my system and pushed to GHE.
When I view the Hubble graphs, though, the date that is shown is day -2 but the actual data is from day -1 (which is expected). I reside in the Mountain Time Zone. If I Remote Desktop into a system in Pacific Time Zone the date that is shown is day -1 (which is expected) and the data is still OK.
So for me, I'm seeing a display date issue but the underlying data is OK.
I am using the latest released version of Hubble (0.2.0?) and GHE 2.10.6. There was no date display issue until I updated to Hubble 0.2.0, so maybe something was introduced in the new release.
Our 2.12.x upgrade is a few weeks out. I need to install this Hubble version on our staging instance to see if the problem is still there.
In #85 we introduced an action bar with a download link for our charts. That's great for most charts, but the collaboration chart has a raw data format that is not really useful, as it cannot easily be processed in Excel and the like. That's why we should remove its download link.
After the recent changes, the repository feature usage chart stopped working on the live demo.
This is probably due to the small number of data points in this example. However, the chart should still be rendered, no matter how few data points exist.
I’ll look into this later.
When selecting just two data series in the charts, everything looks rather good:
However, when adding more data series, the colors are mixed awkwardly, making it hard to distinguish them:
This is due to the use of transparency in the fill colors. Perhaps we should look at Chart.js’s area charts, which seem to handle color transparency much more nicely:
In the worst case, we should consider removing the fill colors entirely, as legibility of the charts should be top priority.
"API request details" seems to contain a few undesired types (e.g. lfs
, application
, ...). Remove them! Only the types repos
and repositories
should be supported.
See: https://github.com/Autodesk/hubble/blob/master/updater/scripts/api-requests.sh#L8
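A minimal Python sketch of the intended filtering, assuming the type is the first path segment after /api/v3/ (so /api/v3/repos/org/repo counts as repos). How api-requests.sh actually derives the type may differ; request_type, filter_request_types, and ALLOWED_TYPES are hypothetical names.

```python
import re

# Only these API types should be reported.
ALLOWED_TYPES = {"repos", "repositories"}

# Assumption: request paths look like "/api/v3/<type>/...".
API_PATH = re.compile(r"^/api/v3/([^/?]+)")


def request_type(path_info):
    """Return the API type of a request path, or None if it is not an API path."""
    match = API_PATH.match(path_info)
    return match.group(1) if match else None


def filter_request_types(paths):
    """Count requests per type, ignoring every type outside ALLOWED_TYPES."""
    counts = {}
    for path in paths:
        kind = request_type(path)
        if kind in ALLOWED_TYPES:
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```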
GitHub recommends for most companies that they have a single large organization, and that you use teams to break down access rather than organizations. The collaborators graph is an awesome feature, but unfortunately gives little insight if you are running in the recommended configuration.
When there isn’t enough data for plotting a weekly-aggregated chart, an empty chart is plotted. Such charts are not interesting to look at and might even confuse users. For this reason, it might be desirable to simply hide the respective view switcher buttons (or to hide the view switcher entirely if only one view can currently be shown).
Currently, the collaboration chart segments are colored in clockwise order. The color palette was chosen to have sequentially contrasting colors. However, the segments usually have varying sizes, which means that less important segments might end up getting contrasting colors, while the bigger chunks are hard to distinguish.
For this reason, I’d like to distribute the colors not in clockwise order, but from the biggest to the smallest segments. In this way, the biggest chunks, which are most important, would get contrasting colors.
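A sketch of the proposed ordering, written in Python only to illustrate the logic (the charts themselves are JavaScript): sort the segments by size, largest first, and hand out the palette in that order. assign_colors is a hypothetical helper, and the palette is assumed to already be ordered for sequential contrast. The segments can still be drawn clockwise; only the color assignment changes.

```python
def assign_colors(segment_sizes, palette):
    """Map each segment to a palette color, handing out colors from largest to smallest.

    segment_sizes: dict of segment name -> size (e.g. amount of collaboration)
    palette:       colors ordered so that consecutive entries contrast strongly
    """
    # Largest segments first; ties are broken by name to keep the result stable.
    ordered = sorted(segment_sizes, key=lambda name: (-segment_sizes[name], name))
    # Wrap around if there are more segments than palette colors.
    return {name: palette[i % len(palette)] for i, name in enumerate(ordered)}
```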
Users are reviewing the transferred data (per day) and would like to get a histogram per user (instead of the aggregated total). Is there a way to narrow down the statistics to per user (or provide a graph/chart for user traffic)?
This would help users know whether the improvements they are making to their processes are working.
If rate limiting is enabled, then GHE logs will report the remaining quota for a user. Visualize the minimal remaining quota per day as well as the top 20 users using up the most of their quota.
Find the minimal remaining quota with the following command:
zgrep -v 'status=40[13]' /var/log/github/unicorn.* | grep -Fv 'controller="Api::Internal::Raw"' | grep -Fv 'controller="Api::Meta"' | grep -Fv 'controller="Api::Root" path_info="/api/v3/internal/storage/favicon.ico"' | grep -Fv 'controller="Api::Enterprise" path_info="/api/v3/enterprise/stats/all"' | grep -oP 'rate_limit_remaining=\K[^ ]+' | sort -n | uniq | head
Such a chart would help administrators lower the available quota gradually over time.
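The per-day minimum and the top 20 users could also be computed in the updater. The Python sketch below parses the key=value pairs visible in the command above (e.g. controller="Api::Meta", rate_limit_remaining=...). It assumes, hypothetically, that each line also carries an ISO 8601 timestamp in a field named now and the login in a field named user; both names must be checked against the real unicorn logs. For brevity it mirrors only part of the grep filters (401/403 responses and two controllers).

```python
import re

# key=value pairs; values are either double-quoted or run to the next space,
# e.g. controller="Api::Meta" rate_limit_remaining=4987
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')

# Hypothetical field names for the timestamp and the user; verify them first.
DATE_KEY, USER_KEY = "now", "user"

# Part of the exclusions from the grep pipeline above.
EXCLUDED_STATUSES = {"401", "403"}
EXCLUDED_CONTROLLERS = {"Api::Internal::Raw", "Api::Meta"}


def parse_pairs(line):
    """Turn 'a=1 b="x y"' into {'a': '1', 'b': 'x y'}."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def quota_stats(lines, top=20):
    """Return (minimal remaining quota per day, users with the lowest remaining quota)."""
    min_per_day, min_per_user = {}, {}
    for line in lines:
        fields = parse_pairs(line)
        remaining = fields.get("rate_limit_remaining", "")
        if (not remaining.isdigit()
                or fields.get("status") in EXCLUDED_STATUSES
                or fields.get("controller") in EXCLUDED_CONTROLLERS):
            continue
        value = int(remaining)
        day = fields.get(DATE_KEY, "")[:10]  # assumes an ISO 8601 timestamp
        user = fields.get(USER_KEY)
        if day:
            min_per_day[day] = min(value, min_per_day.get(day, value))
        if user:
            min_per_user[user] = min(value, min_per_user.get(user, value))
    # Users who came closest to exhausting their quota come first.
    ranked = sorted(min_per_user.items(), key=lambda item: (item[1], item[0]))
    return min_per_day, ranked[:top]
```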
The active repositories list includes repositories with only 1 pusher. While many important projects are individual efforts, I wonder if this is useful information in Hubble. Requiring at least 2 pushers would very often reduce noise and allow Hubble consumers to focus on projects where more effective collaboration is a true goal.
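If such a threshold were adopted, the filter itself is simple. A Python sketch, assuming the updater can produce (repository, pusher) pairs; collaborative_repos and the input shape are hypothetical. Keeping the threshold as a parameter makes it easy to compare both definitions of "active".

```python
from collections import defaultdict


def collaborative_repos(pushes, min_pushers=2):
    """Return the repositories pushed to by at least `min_pushers` distinct users.

    pushes: iterable of (repository, pusher) pairs, e.g. one pair per push
    """
    pushers = defaultdict(set)
    for repo, user in pushes:
        pushers[repo].add(user)
    return sorted(repo for repo, users in pushers.items() if len(users) >= min_pushers)
```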
Today's date is Feb 7; however, the data displayed in Hubble only goes up to Feb 5. I looked at the raw data files, and they do contain data for Feb 6.
I collect data every morning and always have seen Hubble display day -1 (which is expected). Now, though, Hubble is displaying day -2 but I keep collecting the data at the same time every morning.
Thanks for any insight.
Can/Should we move docs/favicon*.png and docs/apple-touch-icon*.png to docs/assets/images?
There are no log files on a freshly provisioned GHE appliance. Consequently, Hubble cannot find any log file and experiences errors such as this one:
Started update of git-download.tsv
gzip: /var/log/github-audit.log.1*.gz: No such file or directory
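The message appears because an unmatched shell glob is passed to gzip literally. A Python sketch of a more tolerant reader, assuming the updater could read the rotated logs directly: glob.glob returns an empty list when nothing matches, so a fresh appliance simply yields no lines. read_rotated_logs is a hypothetical helper, not part of Hubble.

```python
import glob
import gzip


def read_rotated_logs(pattern):
    """Yield the lines of every gzipped log matching `pattern`, in sorted path order.

    glob.glob returns an empty list when nothing matches (unlike an unmatched
    shell glob, which is passed on literally), so a fresh appliance without
    rotated logs yields no lines instead of failing.
    """
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            yield from handle
```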
Sometimes, when looking at weekly-aggregated charts, it confuses me to see the date of the first day of the week in the tooltips when hovering over data points. Users could also be confused, because it’s not clear whether the shown value represents a single date or an entire week (and where the week starts and ends is also undocumented).
This could be addressed by having Chart.js show the respective date range in the tooltips. For data series that have been aggregated by first and last (which select only the chronologically first/last value within each period), the tooltip should still show the respective (single) date. However, for data aggregated with sum, mean, med, min, and max, we could show the time range instead.
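The tooltip logic could look roughly like this, written in Python for illustration only (in Hubble it would live in a Chart.js tooltip callback). It assumes fixed-length periods such as weeks, and that for first and last the date of the selected data point is known; tooltip_label is a hypothetical name.

```python
from datetime import timedelta

# Methods that pick one data point per period vs. methods that summarize it.
POINT_METHODS = {"first", "last"}
RANGE_METHODS = {"sum", "mean", "med", "min", "max"}


def tooltip_label(method, period_start, period_days, point_date=None):
    """Return the date text for the tooltip of an aggregated data point.

    period_start: first day of the aggregation period (e.g. the week)
    period_days:  length of the period in days (7 for weekly data)
    point_date:   for first/last, the date of the data point that was picked
    """
    if method in POINT_METHODS:
        return (point_date or period_start).isoformat()
    if method in RANGE_METHODS:
        period_end = period_start + timedelta(days=period_days - 1)
        return f"{period_start.isoformat()} to {period_end.isoformat()}"
    raise ValueError(f"unknown aggregation method: {method}")
```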
Hubble tracks pure Git traffic already. Let's find a way to track Git LFS traffic as well.
The latest change in PR #5 to the file hubble/updater/reports/ReportForksToOrgs.py led to an SQL exception.
GHE Version 2.11.2
ERROR 1064 (42000) at line 2: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CAST(repos.created_at AS date) AS "creation date"
FROM
users AS orgs,
' at line 3
Traceback (most recent call last):
File "update-stats.py", line 99, in <module>
main()
File "update-stats.py", line 72, in main
ReportForksToOrgs(configuration, dataDirectory, metaStats).update()
File "~/Workspace/hubble/updater/reports/Report.py", line 173, in update
self.updateData()
File "~/Workspace/hubble/updater/reports/ReportDaily.py", line 35, in updateData
self.updateDailyData()
File "~/Workspace/hubble/updater/reports/ReportForksToOrgs.py", line 13, in updateDailyData
self.detailedHeader, self.detailedData = self.parseData(self.executeQuery(self.query()))
File "~/Workspace/hubble/updater/reports/Report.py", line 105, in executeQuery
return self.executeScript(self.configuration["databaseCommand"], stdin = query)
File "~/Workspace/hubble/updater/reports/Report.py", line 96, in executeScript
stdout, stderr = executeCommand(script, stdin)
File "~/Workspace/hubble/updater/reports/Report.py", line 23, in executeCommand
raise RuntimeError(command[0] + " failed with exit code " + str(process.returncode))
RuntimeError: ssh failed with exit code 1
The reports fail to generate in 2.11.12 unless the import calendar statement is added to Reports.py.
In #105, I introduce the possibility to have multiple views per chart. This still needs some documentation after we experimented a bit with the changes and feel confident about the implementation, naming, etc.
Hello there,
We have a small feature request for the housekeeping section; maybe it helps others as well.
Git requests per user:
Similar to API requests per user, could we also have a table under housekeeping that shows the number of Git requests per user?
GitHub LFS data per repository:
For example, the information about which repositories consume a lot of storage on GitHub is currently only available to admins. It would be good to share it in Hubble as well, so that it is easily accessible to end users.
Regards
Piyush
It'd be nice to describe in the README what the data-config attribute customizes as well as what options exist. In a similar vein, formalizing tests for its functionality would be nice.
I'll be working on a PR for this, btw.
It’s nice to showcase demo data on autodesk.github.io, even if it’s generated. In this way, users can get a feeling for new features such as the multiview charts.
However, some demo data files are very small and not representative enough to demonstrate features such as multiview charts. Hence, it would be great to have more extensive demo data files so that everything looks nice in the interactive live demo.
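One way to produce larger demo files is to generate them. A Python sketch, assuming a TSV layout with a date column followed by one column per data series (check the real demo files before relying on it); write_demo_tsv and the column names are hypothetical. The weekly rhythm and slow trend give daily, weekly, and monthly views some visible structure, and the fixed seed keeps the output reproducible.

```python
import csv
import random
from datetime import date, timedelta


def write_demo_tsv(path, start, days, columns, seed=0):
    """Write `days` rows of synthetic daily data (one row per day) to a TSV file."""
    rng = random.Random(seed)  # seeded, so the generated demo data is reproducible
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["date"] + list(columns))
        for offset in range(days):
            day = start + timedelta(days=offset)
            weekend = 0.3 if day.weekday() >= 5 else 1.0  # less activity on weekends
            trend = 1.0 + offset / days                    # slow growth over time
            row = [day.isoformat()]
            for _ in columns:
                row.append(round(100 * weekend * trend * rng.uniform(0.8, 1.2)))
            writer.writerow(row)
```

For example, write_demo_tsv("demo.tsv", date(2016, 1, 1), 730, ["opened", "merged"]) would write two years of data for two series.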
I just updated to the latest Hubble, and now I no longer get mouse-over information, e.g. when hovering over the bubbles or the bands in the collaboration charts.
I also reloaded the site in a new private window in Safari and see the same behavior.
The action bar is currently vertically aligned with the title. For this reason, it is rendered above the charts on smaller screens:
It might make sense to place the action bar below the charts along with the info boxes. First, because it’s not the most important piece of information (especially on small screens, where space is limited). Second, vertically aligning the action bar with the title will look odd once a view switcher is added as suggested in #115.
The user activity chart stems from before we came up with the idea to have all sliding windows and aggregation methods consider 4-week intervals instead of 30-day intervals. The benefit of this is that weekends are evened out, which avoids accidentally misinterpreting some fluctuations.
The user activity chart query should be changed, and the data format should be adapted to make things consistent.
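A quick Python check of the weekend argument: every 28-day window contains exactly 8 weekend days, whereas a 30-day window contains 8, 9, or 10, depending on the weekday it starts on. The helper names are made up for this illustration.

```python
from datetime import date, timedelta


def weekend_days(start, length):
    """Count Saturdays and Sundays in the `length`-day window beginning at `start`."""
    return sum((start + timedelta(days=i)).weekday() >= 5 for i in range(length))


def weekend_counts(length):
    """Collect the weekend-day counts of windows starting on each of the 7 weekdays."""
    monday = date(2018, 1, 1)  # 2018-01-01 was a Monday
    return {weekend_days(monday + timedelta(days=i), length) for i in range(7)}


print(sorted(weekend_counts(28)))  # -> [8]
print(sorted(weekend_counts(30)))  # -> [8, 9, 10]
```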
Hello there,
We run Hubble as a service on a dedicated machine. We recently upgraded GHE from v2.11 to v2.12.6, and since then Hubble does not show the Git traffic; there is no data in git-download.tsv.
FYI, no errors are reported when we run update-stats.py.
In general, do we need to do anything before or after applying an update to GHE?
Regards
Piyush
Calculate and visualize the number of failed webhook deliveries per day:
grep -c '^\["repository-' /var/log/hookshot/resqued.log.1
Can the previous collaboration graph color palette be used? The new palette is rather bright.
docs/assets/charts.js is pretty big, and as I look to contribute to the project, I would like to ensure I'm not breaking anything in there.
charts.js would be nice. I mention this for selfish reasons, as my IDE currently looks like a red-underlined mess of spaghetti due to this project's preference for tabs 😜
The inline JavaScript tag displaying the server name in the header bar is bad practice.
I believe that we can obtain the server name statically through Jekyll instead.
It's similar to the current Collaboration chart, except instead of visualizing which orgs collaborate with each other, I would like to be able to track what % of contributions come from outside the org (vs. inside), for some definition of "contribution". One idea: number of commits in master authored by members of the organization, vs. number of commits by non-members of the org? I'm open to different definitions.
I would love for this to be time-series data, that is, tracking this % of outside contributions over time. This is similar to #35 in that slicing the data up by time would be required.
Finally, one extension of this could be doing a similar analysis except replacing 'org' with 'repo'.
Let me know what you think. I'm happy to take a stab at this, but I'm still learning the project 😓. I haven't delved into the updater/ portion of the repo yet, which is where I believe this change would need to land.
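To make one of the suggested definitions concrete, here is a Python sketch that computes the monthly share of commits authored by non-members. It assumes the updater can list commits (for example on master) as (date, author login) pairs and knows the organization's member logins; the function name and input shapes are hypothetical.

```python
from collections import defaultdict


def outside_share_per_month(commits, members):
    """Return, per month, the percentage of commits authored by non-members.

    commits: iterable of (commit_date, author_login) pairs, e.g. commits on master
    members: set of logins that belong to the organization
    """
    totals = defaultdict(int)
    outside = defaultdict(int)
    for commit_date, author in commits:
        month = commit_date.strftime("%Y-%m")
        totals[month] += 1
        if author not in members:
            outside[month] += 1
    return {month: 100.0 * outside[month] / totals[month] for month in sorted(totals)}
```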
Background info: https://help.github.com/enterprise/2.11/user/articles/migrating-your-previous-admin-teams-to-the-improved-organization-permissions/
Apparently legacy admin teams can slow down the instance in some situations.
This is how we can find them:
ghe-console -y
Organization.find_each { |o|
  if o.teams && o.teams.legacy_admin.size > 0
    puts "\nOrg: #{o.name}\n"
    o.teams.legacy_admin.each { |t|
      puts "#{t.name} -- #{t.members.size} members"
    }
  end
}
I received the following feature request: Add an option for choosing a date range in the collaboration chart, while leaving the default range at two years.
E.g. like this:
grep 'BUG: soft lockup' /var/log/syslog
or
grep 'BUG: soft lockup' /var/log/error
Due to a log format (field order) change in github-audit.log.* on GHE 2.12.x (we tested this on GHE 2.12.3), git-download.sh is broken.
Here is a proposal to fix it using awk:
Current version, which works on <= 2.11.x:
eval "$CAT_LOG_FILE" |
perl -ne 'print if s/.*"program":"upload-pack".*"repo_name":"([^"]+).*"user_login":"([^"]+).*"cloning":([^,]+).*"uploaded_bytes":([^ ]+).*/\1\t\2\t\3\t\4/' |
sort |
perl -ne '$S{$1} += $2 and $C{$1} += 1 if (/^(.+)\t(\d+)$/);END{printf("%s\t%i\t%i\n",$_,$C{$_},$S{$_}) for ( keys %S );}' |
sort -rn -k5,5
For GHE >= 2.12 (each pipe must end a line; a line starting with | is a shell syntax error):
eval "$CAT_LOG_FILE" |
perl -ne 'print if s/.*"cloning":([^,]+).*"program":"upload-pack".*"repo_name":"([^"]+).*"uploaded_bytes":([^,]+).*"user_login":"([^"]+).*/\1\t\2\t\3\t\4/' |
awk '{print $2"\t"$4"\t"$1"\t"$3}' |
sort |
perl -ne '$S{$1} += $2 and $C{$1} += 1 if (/^(.+)\t(\d+)$/);END{printf("%s\t%i\t%i\n",$_,$C{$_},$S{$_}) for ( keys %S );}' |
sort -rn -k5,5
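An alternative that would survive future reordering is to read the fields by name instead of by position. The Python sketch below assumes each audit log line ends with a single JSON object (the field names come from the regexes above); git_downloads and parse_audit_entry are hypothetical helpers. It produces the same columns as the pipeline: repository, user, cloning, request count, and uploaded bytes, sorted by bytes.

```python
import json
from collections import defaultdict


def parse_audit_entry(line):
    """Parse the JSON object embedded in an audit log line, or return None.

    Assumption: everything from the first "{" to the end of the line is one JSON
    object. Reading fields by name makes the result independent of field order.
    """
    start = line.find("{")
    if start < 0:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def git_downloads(lines):
    """Aggregate upload-pack requests per (repo_name, user_login, cloning).

    Returns rows of (repo, user, cloning, request count, uploaded bytes),
    sorted by uploaded bytes in descending order, like git-download.sh.
    """
    count = defaultdict(int)
    size = defaultdict(int)
    for line in lines:
        entry = parse_audit_entry(line)
        if not entry or entry.get("program") != "upload-pack":
            continue
        key = (entry.get("repo_name"), entry.get("user_login"), entry.get("cloning"))
        count[key] += 1
        size[key] += int(entry.get("uploaded_bytes", 0))
    rows = [key + (count[key], size[key]) for key in count]
    return sorted(rows, key=lambda row: row[4], reverse=True)
```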
Hello there,
How do I update the Hubble version running as a service on a dedicated machine? I am not able to find the right documentation.
Regards
Piyush
Although I implemented a generalized aggregation framework and enabled monthly data aggregation for the pull request usage chart, the chart is still not aggregated monthly. This isn’t visible in the live demo, whose TSV file has monthly data only. On a real appliance, where daily data is recorded, it becomes apparent:
Apparently, I introduced this bug in #105. The code responsible for aggregation is now only called for multiview charts (but the pull request usage chart has a single view currently).
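The key point is that aggregation should run for every chart, whether it has one view or several. A Python sketch of monthly aggregation over daily rows, using the method names first, last, sum, mean, med, min, and max; aggregate_monthly and the row format are assumptions, and the real aggregation code lives in the JavaScript frontend.

```python
import statistics

# Aggregation methods by name; each maps a list of values to one value.
AGGREGATORS = {
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "sum": sum,
    "mean": statistics.mean,
    "med": statistics.median,
    "min": min,
    "max": max,
}


def aggregate_monthly(daily_rows, method):
    """Aggregate chronologically sorted (ISO date, value) rows into one value per month."""
    months = {}  # dicts keep insertion order, so months stay chronological
    for day, value in daily_rows:
        months.setdefault(day[:7], []).append(value)  # "2018-02-06" -> "2018-02"
    combine = AGGREGATORS[method]
    return [(month, combine(values)) for month, values in months.items()]
```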
GitHub has data about programming languages used in all its repositories. Given a .tsv of orgs, repos and languages in use, would it be interesting to add some kind of visualization like a bubble chart? If the .tsv file has (language, number-of-characters) tuples, this should be possible. If we want to break it down by organization, something like this might be interesting too.
When using hubble-enterprise.timer on the appliance server, statistics are never collated. It seems that when you log off the primary server, the updater stops too.
Last login: Thu Dec 21 07:11:59 2017 from [masked-ip-address]
admin@github-primary:~$ journalctl -f --user-unit hubble-enterprise.timer
-- Logs begin at Wed 2017-06-21 20:01:19 UTC. --
Dec 21 06:59:43 github-primary systemd[23519]: Starting Runs Hubble Enterprise updater periodically.
Dec 21 06:59:43 github-primary systemd[23519]: Started Runs Hubble Enterprise updater periodically.
Dec 21 07:11:56 github-primary systemd[23519]: Stopping Runs Hubble Enterprise updater periodically.
Dec 21 07:11:56 github-primary systemd[23519]: Stopped Runs Hubble Enterprise updater periodically.
Dec 21 07:11:59 github-primary systemd[16682]: Starting Runs Hubble Enterprise updater periodically.
Dec 21 07:11:59 github-primary systemd[16682]: Started Runs Hubble Enterprise updater periodically.
Dec 21 07:13:16 github-primary systemd[16682]: Stopping Runs Hubble Enterprise updater periodically.
Dec 21 07:13:16 github-primary systemd[16682]: Stopped Runs Hubble Enterprise updater periodically.
Dec 21 07:16:13 github-primary systemd[22010]: Starting Runs Hubble Enterprise updater periodically.
Dec 21 07:16:13 github-primary systemd[22010]: Started Runs Hubble Enterprise updater periodically.
From the above: I logged in at 07:11:59 and logged out at 07:13:16, then logged back in at 07:16:13.
It seems that you have to remain logged in for the collator to work.
The service collator works fine, but it has to be run manually every day.
Currently running on GHE Version 2.11.4 (but it also did this on GHE Version 2.10)
Using hubble-enterprise_0.1.1_all.deb
Hi guys,
We are using v0.3.0 at the moment, and for some reason ReportGitVersion.py on our GitHub Enterprise produces a data file with lots of entries containing the day of the month and a 1. I ran the zgrep statement on a haproxy.* file for testing and got the same output with the day and 1. The output in git-versions.tsv and git-versionsnew.tsv looks something like this:
31	1
31	1
31	1
...
2.17.1	131
2.17.0	120
2.17	1
2.16.3	181
2.16.2	180
2.16.1	144
2.16.0	8
2.15.2	2
2.15.1	426
2.15.0	141
Has anybody else had this issue? If I get some time tomorrow, I will take the zgrep command apart and see where and why it happens, but I am not the best bash/perl guy, so my mileage may vary :)
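One possible fix is to take the version only from a Git user agent string, so that unrelated numbers such as the day of the month can never match. The Python sketch below assumes the haproxy lines contain user agents like git/2.17.1 (possibly with suffixes such as .windows.1); this is an assumption, not a description of the zgrep in ReportGitVersion.py, and count_git_versions is a hypothetical name.

```python
import re
from collections import Counter

# Matches "git/2.17.1" or "git/2.17"; anchoring on "git/" excludes other numbers.
GIT_VERSION = re.compile(r"\bgit/(\d+\.\d+(?:\.\d+)?)")


def count_git_versions(lines):
    """Count requests per Git client version; lines without a Git user agent are skipped."""
    versions = Counter()
    for line in lines:
        match = GIT_VERSION.search(line)
        if match:
            versions[match.group(1)] += 1
    return versions
```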
I installed Hubble on a test machine and noticed that the README.md does not say anything about the need for meta.tsv to be present in the hubble-data repository. With an empty hubble-data repo, the function checkSchemaVersion will fail.
In GHE you can define the LDAP sync intervals. The sync time should always be smaller than the sync interval. You can monitor the sync time (for ldap team syncs) of your GHE appliance as follows:
$ zcat -f production.log* | perl -ne 'print if s/^(.*) \+.*resque\.performed.*ldap_team_sync.*?([0-9\.]+ms)$/\1 \2/' | sort
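The same check could be automated. A Python sketch that mirrors the perl one-liner and reports team syncs whose duration reaches the configured sync interval; slow_team_syncs is a hypothetical name. Unlike the perl version, whose greedy (.*) captures up to the last " +", this regex takes the timestamp up to the first " +".

```python
import re

# Timestamp (up to the first " +") and trailing duration of ldap_team_sync jobs.
SYNC = re.compile(r"^(.*?) \+.*resque\.performed.*ldap_team_sync.*?([0-9.]+)ms$")


def slow_team_syncs(lines, interval_ms):
    """Return (timestamp, duration in ms) for team syncs that took at least `interval_ms`."""
    slow = []
    for line in lines:
        match = SYNC.match(line.rstrip("\n"))
        if match:
            duration = float(match.group(2))
            if duration >= interval_ms:
                slow.append((match.group(1), duration))
    return sorted(slow)
```

For an hourly sync interval, interval_ms would be 3600000.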
I think it might be possible with a service like codecov.io, which is free for public projects like Hubble.
Let me know if you think this would be useful / desired, and I can take a look at implementing it.
Hubble accesses GitHub Enterprise log data via the administrative shell and the database via MySQL. It would be awesome if there were a read-only administrative shell user limited to /var/log and a read-only user for SQL queries.
This would remove the (potential) risk of modifications to the instance. Plus, these users could get a lower priority or could be blocked in case of performance issues.