
chaoss / grimoirelab

463 stars · 24 watchers · 177 forks · 269.56 MB

GrimoireLab: platform for software development analytics and insights

Home Page: https://chaoss.github.io/grimoirelab/

License: GNU General Public License v3.0

Shell 16.38% Roff 48.71% Python 30.23% Dockerfile 4.68%
chaoss metrics grimoirelab software-analytics data-mining data-visualization insights

grimoirelab's People

Contributors

alpgarcia, anajsana, animeshk08, brntbeer, canasdiaz, dependabot[bot], dlumbrer, drashti4, eyehwan, georglink, hmitsch, ilmari-lauhakangas, jgbarah, jjmerchante, jonasrosland, jsmanrique, kevtainer, kritisingh1, marcrepo, mhow2, myml, nebrethar, nolski, rcheesley, sduenas, valeriocos, vchrombie, vsevagen, yankcrime, zhquan


grimoirelab's Issues

Error cloning into bare repository: RPC failed; result=22, HTTP code = 504

Using elasticgirl.22 and the ONAP repos (e.g. http://gerrit.onap.org/r/dcae/apod/cdap), I've seen this error:

2017-11-16 12:11:17,047 - grimoire_elk.arthur - ERROR - Error feeding ocean from git (http://gerrit.onap.org/r/dcae/apod/cdap): git command - Cloning into bare repository '/home/bitergia/.perceval/repositories/http://gerrit.onap.org/r/dcae/apod/cdap-git'...                            
error: RPC failed; result=22, HTTP code = 504  
fatal: The remote end hung up unexpectedly     

Traceback (most recent call last):             
  File "./grimoire_elk/arthur.py", line 130, in feed_backend                                   
    ocean_backend.feed()                       
  File "./grimoire_elk/ocean/elastic.py", line 204, in feed                                    
    for item in items:                         
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backend.py", line 360, in decorator                                                                          
    for data in func(self, *args, **kwargs):   
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 120, in fetch                                                                    
    latest_items)                              
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 141, in __fetch_from_repo                                                        
    repo = self.__create_git_repository()      
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 186, in __create_git_repository                                                  
    repo = GitRepository.clone(self.uri, self.gitpath)                                         
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 763, in clone                                                                    
    cls._exec(cmd, env=env)                    
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 1220, in _exec                                                                   
    raise RepositoryError(cause=cause)         
perceval.errors.RepositoryError: git command - Cloning into bare repository '/home/bitergia/.perceval/repositories/http://gerrit.onap.org/r/dcae/apod/cdap-git'...                            
error: RPC failed; result=22, HTTP code = 504  
fatal: The remote end hung up unexpectedly 

Error "is not a Git mirror of repository " found while collecting existent Git data from Github

While trying to retrieve data for Chef repositories, I've discovered we are losing data due to this error.

How to reproduce:

2017-11-21 19:39:24,148 - grimoire_elk.arthur - ERROR - Error feeding ocean from git (https://github.com/chef/chef.git): directory '/home/bitergia/.perceval/repositories/https://github.com/chef/chef.git-git' is not a Git mirror of repository 'https://github.com/chef/chef.git'
Traceback (most recent call last):                         
  File "./grimoire_elk/arthur.py", line 116, in feed_backend                                                          
    ocean_backend.feed(latest_items=latest_items)                                                                     
  File "./grimoire_elk/ocean/elastic.py", line 204, in feed                                                           
    for item in items:                                                                                                
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backend.py", line 360, in decorator   
    for data in func(self, *args, **kwargs):                                                                          
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 120, in fetch                                                                                                                    
    latest_items)                                          
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 141, in __fetch_from_repo                                                                                                        
    repo = self.__create_git_repository()                                                                             
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 188, in __create_git_repository                                                                                                  
    repo = GitRepository(self.uri, self.gitpath)                                                                      
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 732, in __init__                                                                                                                 
    raise RepositoryError(cause=cause)    

Add Time to Commit Field

One thing I have used a lot that doesn't seem to be supported in GrimoireLab is a field that provides the amount of time it takes for code to be committed upstream. This is the difference between the author_date and commit_date fields. In the past, I've done this with a scripted field in Kibana, but from my understanding this is very inefficient, and it also tends to disappear frequently after running Mordred (it seems to do something that wipes out scripted fields when I run it).

Kibana offers a basic time difference calculation, but it outputs a number of milliseconds and needs some formatting to be made human readable. I've typically expressed this in a number of days to the nearest tenth (e.g. 14.2 days), but hours might also work.

Would it be possible to add a field for something like time_to_commit?
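For illustration, a minimal sketch of how such a field could be derived from the two dates (assuming author_date and commit_date are ISO timestamps; the one-decimal rounding and the time_to_commit_days name are just for this example, not GrimoireLab code):

from datetime import datetime

def time_to_commit_days(item):
    # Difference between when a change was authored and when it was committed upstream.
    author = datetime.fromisoformat(item["author_date"])
    commit = datetime.fromisoformat(item["commit_date"])
    return round((commit - author).total_seconds() / 86400, 1)

print(time_to_commit_days({"author_date": "2017-11-01T10:00:00",
                           "commit_date": "2017-11-15T14:48:00"}))  # -> 14.2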

Add support to the data source groups.io

The service provided by groups.io is starting to be used to extend and improve the mailing list services widely used by FLOSS communities like Cloudfoundry.

The API offers a REST endpoint, more info at https://groups.io/api

The first iteration of this support could offer the same features already offered for mailman:

  • data collection (with all the data related to the message, sender and recipient)
  • data enrichment with affiliation support
  • an overview Kibiter panel (Kibana dashboard) summarizing activity, contributors and repos (origin)

Move all repos to the chaoss GitHub organization

This issue is to move all repositories in the grimoirelab organization to the chaoss organization, following this schema:

  • grimoirelab/perceval -> chaoss/grimoirelab-perceval
  • grimoirelab/perceval-opnfv -> chaoss/grimoirelab-perceval-opnfv
  • grimoirelab/perceval-mozilla -> chaoss/grimoirelab-perceval-mozilla
  • grimoirelab/perceval-puppet -> chaoss/grimoirelab-perceval-puppet
  • grimoirelab/arthur -> chaoss/grimoirelab-kingarthur
  • grimoirelab/GrimoireELK -> chaoss/grimoirelab-elk
  • grimoirelab/grimoirelab-toolkit -> chaoss/grimoirelab-toolkit
  • grimoirelab/sortinghat -> chaoss/grimoirelab-sortinghat
  • grimoirelab/mordred -> chaoss/grimoirelab-mordred
  • grimoirelab/panels -> chaoss/grimoirelab-sigils
  • grimoirelab/reports -> chaoss/grimoirelab-manuscripts
  • grimoirelab/grimoirelab -> chaoss/grimoirelab
  • grimoirelab/training -> chaoss/grimoirelab-tutorial
  • grimoirelab/grimoirelab.github.io -> chaoss/grimoirelab-web
  • grimoirelab/use_cases -> DEPRECATED

The plan will be described in comments to this issue.

[tests] Some details about cache-test.tgz

Some details about cache-test.tgz in tests directory:

  • It would be better if instead of one large file, we had one per data source. That would make it more manageable, would need less storage when changing details in one data source (or adding a new one), and would avoid the current warning by github: "file is larger than GitHub's recommended maximum file size of 50.00 MB".
  • it unpacks to perceval-cache: it would be better if the basename of the file and the directory name were the same
  • contains data for testing: are we sure all of it is public? In particular, it seems we have a spreadsheet, and I'm not sure it is public...
  • I'm not sure the data for the telegram channel can be considered public

Get the percent of questions with a reply (comment or answer) of any kind

In mailing lists and Q&A forums there are some questions answered or commented and other without any answer of any kind.

Questions might be marked as "answered", together with the dates of the first and last answers (so there would be a chance to get the evolution over time of attention to questions).
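As a sketch of the metric itself (assuming each question item carries some kind of has_reply flag; the field name is hypothetical):

def percent_with_reply(questions):
    # Share of questions that received a comment or answer of any kind.
    if not questions:
        return 0.0
    answered = sum(1 for q in questions if q.get("has_reply"))
    return 100.0 * answered / len(questions)

print(percent_with_reply([{"has_reply": True}, {"has_reply": False}, {"has_reply": True}]))
# -> 66.66666666666667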

Git data is not being collected due to "Error in the pull function"

Data collection is not working for the following repo: https://git.opendaylight.org/gerrit/dlux. I've got the following error:

2017-11-27 22:43:43,347 - grimoire_elk.arthur - ERROR - Error feeding ocean from git (https://git.opendaylight.org/gerrit/dlux): git command - fatal: unable to access 'https://git.opendaylight.org/gerrit/dlux/': gnutls_handshake() failed: Error in the pull function.

Also digging in the logs, I found this other one:

2017-11-27 19:02:00,871 - grimoire_elk.arthur - ERROR - Error feeding ocean from git (https://git.opendaylight.org/gerrit/integration): <urlopen error [Errno 101] Network is unreachable>

Using release elasticgirl.23

Gerrit status values for patches

GrimoireLab groups the Gerrit status of patches into MERGED (=CR+2), ABANDONED and NEW (CR-2, CR-1, CR0, CR+1), but the enriched items miss which values were set originally.

Differentiating might be nice to cover different use cases such as:

  • "This patch has not received any review / feedback at all" (CR0)
  • "This patch needs improvement, follow up with the developer if they plan to improve it" (CR-1)
  • "This patch needs someone with CR+2 rights to merge this patch" (CR+1).

This issue was originally opened by @aklapper in chaoss/grimoirelab-perceval#357

Creating a new metric

@jgbarah

Over the past month or so I've been trying to get more familiar with the Kibana interface and with making my own custom visualizations; however, there's one thing I still haven't figured out how to do. I'm interested in creating a custom visualization, a metric. Calculating this metric requires some simple math on values pulled from the database. I can't seem to find any way to do this; can you help me out? Specifically, I'm trying to create a metric that gives the 'Pony Factor' for a repository, possibly getting it accepted into the project as a default visualization.
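For reference, a minimal sketch of the calculation behind such a metric, outside Kibana (assuming the Pony Factor is taken as the smallest number of authors accounting for more than half of the commits, and that commit author names can be pulled from the enriched git index):

from collections import Counter

def pony_factor(commit_authors, threshold=0.5):
    # Smallest number of authors whose commits cover more than `threshold` of all commits.
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    for n, (_author, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered > total * threshold:
            return n
    return len(counts)

print(pony_factor(["alice", "alice", "alice", "bob", "bob", "carol"]))  # -> 2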

GSoC idea: Reporting of CHAOSS Metrics

[ This issue is for addressing questions and comments related to this GSoC idea, which is one of the ideas proposed by the CHAOSS group for the 2018 edition of GSoC ]

[Edited, 2018-03-03: Added procedure for uploading information for application.]
[Edited, 2018-03-12: Added link to guidelines for proposals]

Currently, GrimoireLab includes a tool for reporting: Manuscripts. This tool reads data from a GrimoireLab ElasticSearch database and uses it to produce a PDF report with relevant metrics for a set of analyzed projects. Internally, Manuscripts uses some Python code to produce charts and CSV tables, which are integrated into a LaTeX document to produce the final PDF. Other approaches, such as producing Jupyter notebooks, will be explored too.

This idea is about adding support to Manuscripts to produce reports based on the work of the CHAOSS Community. Since Manuscripts is still a moving target, this will be also a chance to participate in the general development of the tool itself, to convert it into a generic reporting system for GrimoireLab data.

The aims of the project are as follows:

  • Writing Python code to query GrimoireLab Elasticsearch databases and obtain from them the metrics relevant for the report. Possible technologies to achieve this aim include Python Pandas.
  • Writing Python code to produce suitable representations for those metrics, such as tables and charts.
  • Adapting current tools to produce reports directly from data sources, by managing the GrimoireLab toolchain. Possible solutions include adding the code to Mordred, the tool orchestrating GrimoireLab tools.

Other aims, such as producing Jupyter notebooks as a final result or an intermediate step, are completely within scope.

  • Difficulty: easy/medium
  • Requirements: Python programming. Interest in software analytics. Willingness to understand GrimoireLab internals.
  • Recommended: Experience with Python interfaces to databases would be convenient, but can be learned during the project. Experience with LaTeX and/or Python Jupyter Notebooks would help.
  • Mentors: @jgbarah , @germonprez , @jcabot

Microtasks

For becoming familiar with the GrimoireLab technology, it is useful if you produce an analysis (Elasticsearch indexes and dashboard) for a GitHub project (git repos and GitHub issues and pull requests). Have a look at the "Before you start" chapter in the GrimoireLab tutorial to learn how to install the needed infrastructure (ElasticSearch, Kibiter, MariaDB) and GrimoireLab itself as Python packages, or how to deploy everything as a Docker container.

Once you're familiar with producing analysis, you can exploit the information in the indexes via a Python script, preferably presented as a Python Jupyter Notebook:

  • Microtask 1: Produce a listing of the number of new committers per month, and the number of commits for each of them, as a table and as a CSV file. Use the GrimoireLab enriched index for git (a minimal query sketch is included below, after this list).

  • Microtask 2: Produce a chart showing the distribution of time-to-close (using the corresponding field in the GrimoireLab enriched index for GitHub issues) for issues already closed, and opened during the last six months.

  • Microtask 3: Produce a listing of repositories, as a table and as CSV file, with the number of commits authored, issues opened, and pull requests opened, during the last three months, ordered by the total number (commits plus issues plus pull requests).

  • Microtask 4: Perform any other analysis you may find interesting, based on GrimoireLab enriched indexes for git and GitHub repositories.

If you want, you can also:

  • Microtask 5: Produce a pull request for any of the GrimoireLab tools, and try to follow instructions until it gets accepted. Try to do something simple that you consider useful, not necessarily a fix to the code: improvements to comments, documentation or testing will usually be easier to get accepted, and very useful for the project. Please avoid producing a random pull request just to have another microtask: the objective is not that you get one more microtask done, but that you understand how to interact with developers in the project by contributing something that could be useful.

Of course, there is no need to do all the microtasks; you only need to show that your skills are a good fit for working on this project.
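As an illustration of what Microtask 1 asks for, here is a minimal sketch (assumptions: Elasticsearch listening on localhost:9200, an enriched git index named "git", and the field names author_name and grimoire_creation_date; adjust all of them to your own deployment):

import pandas as pd
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")
hits = scan(es, index="git", query={"query": {"match_all": {}}},
            _source=["author_name", "grimoire_creation_date"])

df = pd.DataFrame([hit["_source"] for hit in hits])
df["month"] = pd.to_datetime(df["grimoire_creation_date"]).dt.to_period("M")

# A committer is "new" in the month of their first commit.
first_month = df.groupby("author_name")["month"].min().rename("first_month")
df = df.join(first_month, on="author_name")
new = df[df["month"] == df["first_month"]]

table = (new.groupby(["first_month", "author_name"]).size()
            .rename("commits").reset_index())
print(table)  # table of new committers per month and their commits
table.to_csv("new_committers_per_month.csv", index=False)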

Showing the work you did

If you want to show the work you did, open a GitHub repository, and upload to it:

  • A README.md file explaining what you did, and linking to the results (which will be in the same repository, see below). This will be the main file to show your skills and interest in the project, so try to make it organized and clear, in a way that we can easily understand what you did.

  • Screenshots of the dashboard(s) you produced, and the configuration files you used for them (if any). Please remember to remove passwords and/or auth tokens that could be in them before uploading. In the README.md file, link all those files, telling which GitHub projects you analyzed.

  • Python scripts and/or Python Jupyter notebooks you produced, with enough information (in comments and/or in the README.md file) so that we can run them if needed. Upload Python scripts ready to work assuming GrimoireLab packages are already installed. Upload Jupyter notebooks ready to be seen via the GitHub web interface.

  • Links to the pull request you did (if any), along with any comment you may have about it.

Submitting information for the application process

  • You must complete at least one micro-task related to the idea you are interested in.

  • Once you completed at least one micro-task, go to the governance repository and create a pull request to add yourself, your information, and a link to your repository with the completed micro-task(s) in the GSoC-interest.md file (see above for the contents of the repository).

  • You are welcome to include in your repository other information that could be of interest, such as open issues or pull requests submitted to the project to which you intend to contribute during GSoC, contributions to other projects, skills, and other related information.

  • You must complete these things by March 27 16:00 UTC. Make sure to also submit the information required by GSoC for applicants (i.e., project proposal), linking to it from your pull request in the GSoC-interest.md file.

  • Message to the mailing list with guidelines on proposals.

Useful documentation

Asking for help

If you need help, please use the following channels.

For issues related to GrimoireLab:

For general issues related to CHAOSS or CHAOSS metrics

Custom metadata support

When there's additional custom metadata available for a data source that's being ingested into GrimoireLab / Bitergia, I want to be able to capture and index that metadata into the system, so that I can query and report on it the same as the "system" metadata that GrimoireLab / Bitergia already captures.

Examples of such metadata include:

  • Project lifecycle information (e.g. current state and lifecycle stage transition dates)
  • Project contribution date (which is not always the same as repository creation date or first commit date)
  • Member / customer flag
  • Type of organisation (e.g. FinServ vs FinTech vs other)

Stats on Jira activities

I'd like to see stats for people's contributions to a project's Jira. For example, the number of activities (e.g. comments added, tickets opened/closed, etc.) for each individual would be helpful.

Track multiple Discourse instances

We would like to be able to have more than one Discourse instance in our Community Analytics stack. Whether these multiple instances will be visualized in one dashboard or in multiple dashboards (tabs) is currently of secondary concern.

-Henrik

Define 'Path to Maintainership' and 'List Maintainers'

Per proposal in chaoss/community#5 :

The README.md of the repository contains a list of maintainers. Each CHAOSS repository brings together different people, and they document in the repository-specific CONTRIBUTING.md how someone becomes a maintainer of their repository.

TODO:

  • Specify 'path to maintainership' for this repo in CONTRIBUTING.md file
  • List current maintainers in README.md file

Git data is not being collected for repo git://git.openembedded.org/bitbake

Data collection is not working when analyzing the repo git://git.openembedded.org/bitbake. It stores 0 commits in the ES database.

Current version used:

SORTINGHAT='641dabd71d4f4a23547592a10096240343268bd1'
GRIMOIREELK='9a69a31a6499402cddb3c622f12dc5d11e490fed'
PERCEVAL='f1c170ac6692ffe54558d590a96bedf748624651'
KIBITER='f7bf173cb1d79418b9909d877adfb998ce7b52da'
GRIMOIRELAB_GITHUB_IO='2f71c036020f6873efdd079f4ec2b5f26584cb2b'
ARTHUR='3efff3311b52222b7d06bf63ce97f8a42572bc06'
USE_CASES='31ea49ecc28f3e27aa29fddb0e01b7f475069b6a'
PANELS='ffe73323afe84e491d0b1705e86a7db3588bff38'
MORDRED='024019f5ee1df7096f91f7b41286cd58e55e6495'
TRAINING='9df30c3ec955387a423f261bb141f74133bf7ff7'
PERCEVAL_MOZILLA='423615b9745ab0b0e577701504e46e0f32aab6a3'
PERCEVAL_PUPPET='9c0d4c72a0e9b3c7c1e3295f72ebe70c5c874084'
REPORTS='4ad4f7c9d8cd5eb728a95c3a358d8aa339102c75'
GRIMOIRELAB_TOOLKIT='95d271af32744eb38f73df592f2e0dc5ea97683b'
PERCEVAL_OPNFV='0bec845df7e17965892dfb1181da0ab1c77168dc'
GRIMOIRELAB='efd656ff641623e6a639e095f14f38a71cacc094'

Perceval seems to be working ok but 0 documents are stored in the index (with that origin)

2017-11-23 16:13:42,311 - perceval.backends.core.git - DEBUG - Running command git update-ref refs/heads/1.20 3bb3f1823bdd46ab34577d43f1e39046a32bca77 (cwd: /home/bitergia/.perceval/repositories/git://git.openembedded.org/bitbake-git, env: {'HOME': '/home/bitergia', 'LANG': 'C', 'PAGER': ''})
2017-11-23 16:13:42,330 - perceval.backends.core.git - DEBUG - Git refs/heads/1.20 ref updated to 3bb3f1823bdd46ab34577d43f1e39046a32bca77 in git://git.openembedded.org/bitbake (/home/bitergia/.perceval/repositories/git://git.openembedded.org/bitbake-git)
2017-11-23 16:13:42,331 - perceval.backends.core.git - DEBUG - Running command git update-ref refs/tags/1.6.2 b72be28933502be03bf20a837e08913e898d66ab (cwd: /home/bitergia/.perceval/repositories/git://git.openembedded.org/bitbake-git, env: {'HOME': '/home/bitergia', 'LANG': 'C', 'PAGER': ''})
2017-11-23 16:13:42,351 - perceval.backends.core.git - DEBUG - Git refs/tags/1.6.2 ref updated to b72be28933502be03bf20a837e08913e898d66ab in git://git.openembedded.org/bitbake (/home/bitergia/.perceval/repositories/git://git.openembedded.org/bitbake-git)
2017-11-23 16:13:42,352 - perceval.backends.core.git - DEBUG - Running command git update-ref refs/heads/1.3 eda84e7271639baac4cb0623ded03b1d7a1ffcae (cwd: /home/bitergia/.perceval/repositories/git://git.openembedded.org/bitbake-git, env: {'HOME': '/home/bitergia', 'LANG': 'C', 'PAGER': ''})
2017-11-23 16:13:42,424 - perceval.backends.core.git - DEBUG - Git refs/heads/1.3 ref updated to eda84e7271639baac4cb0623ded03b1d7a1ffcae in git://git.openembedded.org/bitbake (/home/bitergia/.perceval/repositories/git://git.openembedded.org/bitbake-git)
2017-11-23 16:13:42,424 - perceval.backends.core.git - DEBUG - Running command git remote prune origin (cwd: /home/bitergia/.perceval/repositories/git://git.openembedded.org/bitbake-git, env: {'HOME': '/home/bitergia', 'LANG': 'C', 'PAGER': ''})
2017-11-23 16:13:42,532 - perceval.backends.core.git - DEBUG - Git repository git://git.openembedded.org/bitbake (/home/bitergia/.perceval/repositories/git://git.openembedded.org/bitbake-git) is synced
...
2017-11-23 16:13:42,533 - grimoire_elk.ocean.conf - DEBUG - Adding repo to Ocean https://bitergia:[email protected]/data/conf/repos/git_yocto_171123_git:__git.openembedded.org_bitbake {'index': 'git_yocto_171123', 'repo_update': '2017-11-23T16:13:42.533460', 'repo_update_start': '2017-11-23T16:13:39.266488', 'index_enrich': 'git_yocto_171123_enriched_171123', 'project': None, 'success': True, 'backend_name': 'git', 'backend_params': ['git://git.openembedded.org/bitbake', '--latest-items']}
2017-11-23 16:13:42,647 - urllib3.connectionpool - DEBUG - https://yoctoproject.biterg.io:443 "POST /data/conf/repos/git_yocto_171123_git:__git.openembedded.org_bitbake HTTP/1.1" 200 None
2017-11-23 16:13:42,649 - grimoire_elk.arthur - INFO - Done git 

No errors found:

root@somosierra:/data/docker/containers/mordred/yocto# cat /data/docker/logs/mordred/yocto/all.log|grep "org/bitbake"|grep ERROR|wc -l
0

Jenkins errored fetching "status_code"

Got the following traceback with this repo: https://jenkins.opendaylight.org/releng:

2017-11-27 22:43:43,149 - grimoire_elk.arthur - ERROR - Error feeding ocean from jenkins (https://jenkins.opendaylight.org/releng): 'NoneType' object has no attribute 'status_code'
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 345, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 844, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connection.py", line 326, in connect
    ssl_context=context)
  File "/usr/local/lib/python3.4/dist-packages/urllib3/util/ssl_.py", line 325, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.4/ssl.py", line 364, in wrap_socket
    _context=self)
  File "/usr/lib/python3.4/ssl.py", line 577, in __init__
    self.do_handshake()
  File "/usr/lib/python3.4/ssl.py", line 804, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:600)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 630, in urlopen
    raise SSLError(e)
urllib3.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:600)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.6-py3.4.egg/perceval/backends/core/jenkins.py", line 233, in __send_request
    req = requests.get(url)
  File "/usr/local/lib/python3.4/dist-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.4/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/requests/adapters.py", line 519, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:600)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./grimoire_elk/arthur.py", line 133, in feed_backend
    ocean_backend.feed()
  File "./grimoire_elk/ocean/elastic.py", line 204, in feed
    for item in items:
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.6-py3.4.egg/perceval/backend.py", line 360, in decorator
    for data in func(self, *args, **kwargs):
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.6-py3.4.egg/perceval/backends/core/jenkins.py", line 91, in fetch
    raw_builds = self.client.get_builds(job['name'])
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.6-py3.4.egg/perceval/backends/core/jenkins.py", line 223, in get_builds
    return self.__send_request(url_jenkins)
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.6-py3.4.egg/perceval/backends/core/jenkins.py", line 237, in __send_request
    if e.response.status_code in [408, 410, 502, 503, 504]:
AttributeError: 'NoneType' object has no attribute 'status_code'

Using release elasticgirl.23

Stats on Gerrit reviews

I'd like to get some data on people's contributions to code reviews. For example, I'd like to see a table with reviewers' names, the number of reviews, the number of projects where reviews were contributed, etc.

'Feature Request' label for this tracker

In order to group tickets by type, it would be great to have a 'Feature Request' label/tag. I see the 'enhancement' label, but I'm not sure whether it can be used for the same purpose.

[tests] Some details in docker-compose.yml:

When reading the file docker-compose.yml in tests: shouldn't mariadb export some port? Or maybe not, and we could do the same for elasticsearch and kibiter? As the file is now, you need certain ports to be available, which may not be the case on a given host.

Querying using python while running grimoire in Docker

@utkarshrai commented in chaoss/grimoirelab-elk#231 :

I was working on calculating the time of close of issues.

When I'm using p2o.py, I am able to load a particular repository in my database and they have all the required fields to work with. This accomplishes the task easily.

When I run grimoire/full in docker and expose the 9200 port I am able to access the ES database from my py/ipynb script. But by default, the entire grimoire project is loaded. I tried reading through the settings but could not find a way to restrict it to loading a particular repository.
I only want to produce dashboards and database for grimoire-perceval, not the full project.
What should be the command?

Current Command:
docker run -p 127.0.0.1:5601:5601 -p 9200:9200 -v $(pwd)/credentials.cfg:/mordred-override.cfg -t grimoirelab/full

Moving the question here, because here is where the information about the docker images is being kept.
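Since the underlying goal is computing time to close from a py/ipynb script against the exposed 9200 port, here is a minimal query sketch (the index name github_issues and the fields state and time_to_close_days are assumptions; check the indexes and mappings of your own deployment):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")
closed = scan(es, index="github_issues",
              query={"query": {"term": {"state": "closed"}}},
              _source=["title", "time_to_close_days"])

for hit in closed:
    src = hit["_source"]
    print(src.get("time_to_close_days"), "-", src.get("title"))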

Visualising the Metrics

As per the discussion with @jgbarah, apart from creating PDFs with analyses of the projects using Manuscripts, we can look at other methods to visualise the metrics.
Some of them being:

  • HTML pages (one for each Metric, describing the activity metrics under that Metric).
  • Interactive notebooks, embedding D3.js code in them so that users can analyse the Metrics more thoroughly, using JupyterLab for that.
  • Using Vega to visualise the Metrics.

This ticket can serve as a discussion point for how the Metrics can be further visualised.

Support for projects hierarchy

Wouldn't it be nice to have some kind of project hierarchy in GrimoireLab?

Currently GrimoireLab only supports a 1-level hierarchy, according to the projects.json example and the existing documentation. So we have Project > Repositories.

Bestiary is introducing the Ecosystem level, so 2 levels are going to be supported (sort of): Ecosystem > Project > Repositories.

Will it support Ecosystem > Project > ... > Project > Repositories? Do we need that depth? This might also be related to #71.

I would also like to know whether the community agrees on the following assumptions:

  • Do we expect that the same repositories might be in different projects or ecosystems? I would say "no, by design".
  • Might some projects be under no specific ecosystem? I would say "no, by design".
  • Might some repositories be under no specific project? I would say "no, by design".
  • Project names might not be unique. For example, several ecosystems might have a "Documentation" project. This should not be an issue.

Thanks for the comments.

Missing dependency in grimoire-mordred

When I try to run mordred with the latest release (18.04-01), it seems that there is a missing dependency:

Traceback (most recent call last):
  File "/home/jsmanrique/grimoirelab/venv/bin/mordred", line 36, in <module>
    from mordred.config import Config
  File "/home/jsmanrique/grimoirelab/venv/lib/python3.5/site-packages/mordred/config.py", line 29, in <module>
    from grimoire_elk.utils import get_connectors
  File "/home/jsmanrique/grimoirelab/venv/lib/python3.5/site-packages/grimoire_elk/utils.py", line 55, in <module>
    from perceval.backends.puppet.puppetforge import PuppetForge, PuppetForgeCommand
ImportError: No module named 'perceval.backends.puppet'

Aggregated Jenkins metrics by job instead of by build

Wouldn't it be nice to have trends and metrics related to Jenkins jobs instead of builds?

For Jenkins as a data source, GrimoireLab currently shows metrics related to each build:
(screenshot: Jenkins build metrics, captured 2018-03-15)

It would be nice to have metrics related to each job, like:

  • average time
  • evolution of the time spent on it
  • build success ratio
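A minimal sketch of that kind of per-job aggregation (the field names job_name, duration and result are placeholders, not the actual Jenkins enriched-index mapping):

import pandas as pd

# Toy build items; in practice they would come from the Jenkins enriched index.
builds = pd.DataFrame([
    {"job_name": "builder-a", "duration": 610000, "result": "SUCCESS"},
    {"job_name": "builder-a", "duration": 655000, "result": "FAILURE"},
    {"job_name": "builder-b", "duration": 120000, "result": "SUCCESS"},
])

per_job = builds.groupby("job_name").agg(
    builds=("duration", "size"),
    avg_duration_ms=("duration", "mean"),
    success_ratio=("result", lambda r: (r == "SUCCESS").mean()),
)
print(per_job)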

[tests] Check some details in projects.json

There are some details to check in projects.json, in the tests directory:

  • why "*jenkins" and "jenkins" as entries?
  • Why "*jira" (two of them) and "jira"?

Let's check them, and either explain why they are needed, or just remove those entries.

Git data is not being collected due to 'could not read Username'

Using the latest version provided by our Dev team, I'm not able to collect data for the YoctoProject.

How to reproduce:

The data collection is stuck, not sure whether the thread is alive or waiting for a password.

2017-11-22 10:32:45,677 - grimoire_elk.arthur - ERROR - Error feeding ocean from git (https://github.com/patternfly/patternfly-planning.git): git command - Cloning into bare repository '/home/bitergia/.perceval/repositories/https://github.com/patternfly/patternfly-planning.git-git'...
fatal: could not read Username for 'https://github.com': No such device or address

Traceback (most recent call last):
  File "./grimoire_elk/arthur.py", line 116, in feed_backend
    ocean_backend.feed(latest_items=latest_items)
  File "./grimoire_elk/ocean/elastic.py", line 204, in feed
    for item in items:
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backend.py", line 360, in decorator
    for data in func(self, *args, **kwargs):
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 120, in fetch
    latest_items)
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 141, in __fetch_from_repo
    repo = self.__create_git_repository()
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 186, in __create_git_repository
    repo = GitRepository.clone(self.uri, self.gitpath)
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 763, in clone
    cls._exec(cmd, env=env)
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 1220, in _exec
    raise RepositoryError(cause=cause)
perceval.errors.RepositoryError: git command - Cloning into bare repository '/home/bitergia/.perceval/repositories/https://github.com/patternfly/patternfly-planning.git-git'...
fatal: could not read Username for 'https://github.com': No such device or address

This error was found with the version:

SORTINGHAT='641dabd71d4f4a23547592a10096240343268bd1'      
GRIMOIREELK='9a69a31a6499402cddb3c622f12dc5d11e490fed'     
PERCEVAL='fcae408a56f16a25f1f62abc6c01b61d2e0dc100'        
KIBITER='f7bf173cb1d79418b9909d877adfb998ce7b52da'         
GRIMOIRELAB_GITHUB_IO='2f71c036020f6873efdd079f4ec2b5f26584cb2b'                                                       
ARTHUR='3efff3311b52222b7d06bf63ce97f8a42572bc06'          
USE_CASES='31ea49ecc28f3e27aa29fddb0e01b7f475069b6a'       
PANELS='ffe73323afe84e491d0b1705e86a7db3588bff38'          
MORDRED='024019f5ee1df7096f91f7b41286cd58e55e6495'         
TRAINING='9df30c3ec955387a423f261bb141f74133bf7ff7'        
PERCEVAL_MOZILLA='423615b9745ab0b0e577701504e46e0f32aab6a3'                                                            
PERCEVAL_PUPPET='9c0d4c72a0e9b3c7c1e3295f72ebe70c5c874084' 
REPORTS='4ad4f7c9d8cd5eb728a95c3a358d8aa339102c75'         
GRIMOIRELAB_TOOLKIT='95d271af32744eb38f73df592f2e0dc5ea97683b'                                                         
PERCEVAL_OPNFV='0bec845df7e17965892dfb1181da0ab1c77168dc'  
GRIMOIRELAB='efd656ff641623e6a639e095f14f38a71cacc094' 

Add gender support to the GrimoireLab toolchain

This task aims at building proper gender support across the several tools available in GrimoireLab.

The following pieces are required, to the best of my understanding:

This ticket will likely have an iterative process and other tools or requirements could be later added.

[tests] Some details in the stage script

Some comments on the stage script in the tests directory:

  • it should run on any properly configured machine (it now includes hardwired paths, see below).
  • it should be, for consistency, a Python script
  • it would be better if Python packages were installed with pip install, even from a local directory (instead of python3 setup.py install). In particular, the latter may not work for perceval-* packages.
  • why do the Python module installations need sudo?
  • arthur seems to be a part of the test; shouldn't it at least be optional?
  • shouldn't http://172.17.0.1:5601 be a configurable address, or something?

Track reviewers in panels for code review systems

Currently, in panels for code review systems (Gerrit, GitHub Pull Requests), we are tracking mainly submitters of patches. We should track (maybe in different panels) reviewers, since they are also a very interesting part of the story. For this, I think we have all the information in the raw indexes, so very likely we don't need changes in Perceval. But for sure we will need changes in GrimoireELK (maybe for creating a new index with info about reviewers) and in panels (for the new panels, or modification of the current ones).

Ignore submodule updates when calculating git stats

Gerrit has a feature called Submodule subscriptions [0] where it will automatically move the commit hash pointer for aggregator projects that are configured to subscribe to a submodule. When it does this however it automatically creates a new commit in the aggregator project for the updated submodule.

You can see many examples in this project [1], where most of the commits are just "Update git submodules". The result of this is that a contribution to a submodule project is counted more than once (by the number of projects that load the submodule).

[0] https://gerrit-review.googlesource.com/Documentation/user-submodules.html
[1] https://github.com/opendaylight/releng-autorelease/commits/master
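A minimal sketch of the filtering, assuming commit items shaped like Perceval's git output (where the commit message lives under data/message); matching on the literal "Update git submodules" message is just the simple heuristic suggested above:

def is_submodule_bump(item):
    # Automatic commits created by Gerrit's submodule subscriptions.
    message = item.get("data", {}).get("message", "")
    return message.strip().startswith("Update git submodules")

def count_commits(items):
    # Count commits, ignoring submodule-bump commits.
    return sum(1 for item in items if not is_submodule_bump(item))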

Latest grimoirelab/full docker image not working with GitHub Enterprise backends

I ran the full docker image a couple of months ago on my company's GitHub Enterprise instance. All worked well! I would run the image like so:

docker run --name grim -p 127.0.0.1:9200:9200 -p my.secret.ip.address:5601:5601 \
  -v $(pwd)/logs:/logs \
  -v $(pwd)/es-data:/var/lib/elasticsearch \
  -v $(pwd)/credentials.cfg:/mordred-override.cfg \
  -v $(pwd)/projects.json:/projects.json \
  -t grimoirelab/full

... with the following config files:

➔ cat credentials.cfg
[github]
api-token = super-secret-i-wont-tell-you
enterprise-url = https://git.corp.adobe.com

[projects]
projects_file = /projects.json

~/src/corp-grimoirelab on master
➔ cat projects.json
{
    "opensource_submission_process": {
        "github": [
            "https://git.corp.adobe.com/OpenSourceAdvisoryBoard/opensource_submission_process"
        ],
        "git": [
            "https://git.corp.adobe.com/OpenSourceAdvisoryBoard/opensource_submission_process"
        ]
    }
}

However, I just updated to the latest, and it no longer works. I see authentication errors related to GitHub in the output of the docker command - it looks like grimoire is trying to talk to public GitHub and not my enterprise instance:

Starting container: eb226f9c9242
Starting Elasticsearch
[ ok ] Starting Elasticsearch Server:.
Waiting for Elasticsearch to start...
tcp        0      0 0.0.0.0:9200            0.0.0.0:*               LISTEN      -
Elasticsearch started
Starting MariaDB
[ ok ] Starting MariaDB database server: mysqld.
Waiting for MariaDB to start...
tcp6       0      0 :::3306                 :::*                    LISTEN      -
MariaDB started
Starting Kibiter
Waiting for Kibiter to start...
..Kibiter started
Starting Mordred to build a GrimoireLab dashboard
This will usually take a while...
2018-04-02 17:01:33,746 - mordred.task_panels - ERROR - Can not find kibiter version
2018-04-02 17:01:33,747 - mordred.task_panels - ERROR - Can not configure kibiter
Dashboard panels, visualizations: uploading...
Dashboard panels, visualizations: uploaded!
Dashboard menu: uploading...
Dashboard menu: uploaded!
Collection for github: starting...
Collection for git: starting...
2018-04-02 17:01:40,574 - grimoire_elk.arthur - ERROR - Error feeding ocean from github (https://github.com/OpenSourceAdvisoryBoard/opensource_submission_process): 401 Client Error: Unauthorized for url: https://api.github.com/rate_limit
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/grimoire_elk/arthur.py", line 207, in feed_backend
    ocean_backend.feed()
  File "/usr/local/lib/python3.5/dist-packages/grimoire_elk/ocean/elastic.py", line 204, in feed
    self.feed_items(items)
  File "/usr/local/lib/python3.5/dist-packages/grimoire_elk/ocean/elastic.py", line 213, in feed_items
    for item in items:
  File "/usr/local/lib/python3.5/dist-packages/perceval/backend.py", line 127, in fetch
    self.client = self._init_client()
  File "/usr/local/lib/python3.5/dist-packages/perceval/backends/core/github.py", line 218, in _init_client
    self.archive, from_archive)
  File "/usr/local/lib/python3.5/dist-packages/perceval/backends/core/github.py", line 355, in __init__
    self._init_rate_limit()
  File "/usr/local/lib/python3.5/dist-packages/perceval/backends/core/github.py", line 531, in _init_rate_limit
    raise error
  File "/usr/local/lib/python3.5/dist-packages/perceval/backends/core/github.py", line 525, in _init_rate_limit
    response = super().fetch(url)
  File "/usr/local/lib/python3.5/dist-packages/perceval/client.py", line 132, in fetch
    response = self._fetch_from_remote(url, payload, headers, method, stream, verify)
  File "/usr/local/lib/python3.5/dist-packages/perceval/client.py", line 157, in _fetch_from_remote
    raise e
  File "/usr/local/lib/python3.5/dist-packages/perceval/client.py", line 153, in _fetch_from_remote
    response.raise_for_status()
  File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://api.github.com/rate_limit
Collection for github: finished after 00:00:00 hours
Username for 'https://git.corp.adobe.com': Loading blacklist...
0/0 blacklist entries loaded
Loading unique identities...
48/48 unique identities loaded
Enrichment for github: starting...
Elasticsearch aliases for github: creating...
Elasticsearch aliases for github: created!
Enrichment for github: finished after 00:00:00 hours
Elasticsearch aliases for github: creating...
Elasticsearch aliases for github: created!
Loading blacklist...
0/0 blacklist entries loaded
Loading unique identities...
48/48 unique identities loaded
Loading blacklist...
0/0 blacklist entries loaded
Loading unique identities...
48/48 unique identities loaded
Loading blacklist...
0/0 blacklist entries loaded
Loading unique identities...
48/48 unique identities loaded

... with that last bit of the output, from 'feeding ocean' to the end, repeating over and over.

It looks like the enterprise-url portion of my .cfg file is not being honoured.

(not sure if the Can not configure kibiter error is serious or not, let me know if I should file that separately)

Thanks for any help!

Aggregate Jenkins metrics by node type

Usually Jenkins nodes are set up dynamically with a certain configuration, and their names could be related to that configuration.

For example:

  • ubuntu1604-basebuild-4c-4g-2165, following the pattern <OS>-basebuild-<Cores>c-<GB RAM>g-<random>, used by ONAP and OpenDayLight
  • lf-pod2, following the pattern <company>-<slave_type><number>. Ref: OPNFV Jenkins Slave Naming Scheme

Wouldn't it be nice to have aggregated metrics by one of these patterns?

For example, based on the "architecture" pattern above, I would like to know:

  • Number of builds, jobs, etc.
  • Average and trends of time to build

It would help identify efficient nodes and their characteristics.
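A minimal sketch of parsing the first naming pattern so that builds could later be grouped by OS, cores and RAM (the regular expression only covers the "<OS>-basebuild-<Cores>c-<GB RAM>g-<random>" form mentioned above):

import re

NODE_PATTERN = re.compile(r"^(?P<os>\w+)-basebuild-(?P<cores>\d+)c-(?P<ram_gb>\d+)g-\d+$")

def parse_node_name(name):
    # Extract OS, cores and RAM from a node name; None if the name does not match.
    match = NODE_PATTERN.match(name)
    return match.groupdict() if match else None

print(parse_node_name("ubuntu1604-basebuild-4c-4g-2165"))
# -> {'os': 'ubuntu1604', 'cores': '4', 'ram_gb': '4'}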

Adding different data sources with Dockerhub installation

Hi,

Thanks, GrimoireLab team, for creating this tool. I have used it, but I find it difficult to add data sources and see the dashboard for those orgs or repos. This is particularly the case with DockerHub.

Could anyone guide me to the right resource?

Support for release based reporting

A feature to allow a user to specify 2 refspecs, either branches or tags. The statistics should walk through git to locate the common parent of the 2 branches or tags, and then use that as the initial commit when counting contributions in the reporting branch.

For example, if a project has a 1.0 release and a 2.0 release, we would be interested in identifying and getting statistics on all of the contributions to the 2.0 release.
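A minimal sketch of the idea, relying on git itself (git merge-base to find the common parent of the two refs, git rev-list to count the commits only reachable from the release ref); the tag names in the commented example are hypothetical:

import subprocess

def commits_since_common_parent(repo_path, base_ref, release_ref):
    # Common parent of the two refs.
    merge_base = subprocess.check_output(
        ["git", "-C", repo_path, "merge-base", base_ref, release_ref],
        text=True).strip()
    # Commits reachable from the release ref but not from the common parent.
    commits = subprocess.check_output(
        ["git", "-C", repo_path, "rev-list", f"{merge_base}..{release_ref}"],
        text=True).splitlines()
    return merge_base, len(commits)

# Example: base, n = commits_since_common_parent(".", "1.0", "2.0")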

GSoC idea: Support of Standard Formats for Description of Projects

[ This issue is for addressing questions and comments related to this GSoC idea, which is one of the ideas proposed by the CHAOSS group for the 2018 edition of GSoC ]

[Edited, 2018-02-28: Added some more detail to the idea, and changed its structure so that it is easier to understand.]
[Edited, 2018-03-03: Added procedure for uploading information for application.]
[Edited, 2018-03-12: Added link to guidelines for proposals]

Currently, GrimoireLab uses its own format for describing a project, including the data sources (repositories to retrieve information from), the internal organization of the project (e.g., in subprojects), and specifics about how the data is to be presented. It also uses its own format for expressing the many identities that a person may have in the different data sources, and the information related to their profile (preferred name, affiliation, etc.). The usual workflow has the people maintaining GrimoireLab dashboards edit those files to decide which repositories are analyzed, or to add new identities that should be considered for certain persons. However, at least some of that information could be retrieved automatically from other sources. This idea is about moving in that direction: retrieving information automatically, using files maintained by the projects, or APIs provided by services.

Some of these sources of information are:

  • For the project structure, including repositories, some standard formats already exist that can be used directly, or with some modifications. Among them, DOAP is one of the most interesting ones, but there are many others. DOAP is used, for example, to define all Apache projects.

  • Some projects, such as Eclipse or OpenStack, maintain their own format for expressing their structure, including their repositories.

  • Some APIs offer lists of repositories for projects, such as GitHub, BitBucket or GitLab, which provide lists of repositories for a given organization, or Gerrit, which provides lists of git repositories.

  • Some projects maintain identities of developers in files in git repositories. Some popular formats include gitdm (used by the tool of the same name) and mailmap (used by git and other tools), which allow declaring several email addresses to be merged for the same person, and in some cases affiliation and other information.

  • Some projects use consistent naming in all or most of the datasources, allowing for easy identification of the same person in all of them.

This idea is about identifying formats used by projects to describe themselves and adding support for them to GrimoireLab. This includes not only static formats, but also APIs.

The aims of the project are as follows:

  • Supporting as many as possible of the formats and APIs mentioned above, and others that could be found interesting, by either converting them to the current GrimoireLab format or, more likely, supporting them directly in Mordred or some related tools. For some of them there is already partial support (for example, gitdm is partially supported by SortingHat, and the OpenStack format is partially supported by Mordred).

  • Testing the implementation with large projects supporting those formats, such as Apache Server (that uses DOAP), or some GitHub organization, etc.

The aims may require modifications to Mordred and other related tools to make them modular and simplify the implementation of support for future formats or APIs.

  • Difficulty: easy/medium
  • Requirements: Python programming. Willingness to understand GrimoireLab internals.
  • Recommended: Experience with Python HTTP and XML libraries would be convenient, but can be learned during the project.
  • Mentors: @jgbarah , @valeriocos

Microtasks

For becoming familiar with the GrimoireLab technology, it is useful if you produce an analysis (Elasticsearch indexes and dashboard) for a GitHub project (git repos and GitHub issues and pull requests). Have a look at the "Before you start" chapter in the GrimoireLab tutorial to learn how to install the needed infrastructure (ElasticSearch, Kibiter, MariaDB) and GrimoireLab itself as Python packages, or how to deploy everything as a Docker container.

Once you're familiar with producing analysis, you can try to automatically produce the list of repositories to analyze, in several ways:

  • Microtask 1: Produce a Python script that produces configuration files for Mordred to analyze a complete GitHub organization, excluding repositories that are forks from other GitHub repositories. Test it with at least two GitHub organizations, producing screenshots of the resulting dashboard (a minimal sketch of the repository listing is included below, after this list).

  • Microtask 2: Produce a Python script that adds a new GitHub repository (git and GitHub issues / pull requests) to a given set of Mordred configuration files. Test it by adding at least two repositories (in two separate steps) to a GrimoireLab dashboard, producing screenshots of the results.

  • Microtask 3: Produce a Python script that removes a GitHub repository (git and GitHub issues / pull requests) from a working GrimoireLab dashboard, by modifying the needed Mordred configuration files, and fixing the raw and enriched indexes to remove the items for the removed repository. Test it by removing at least two repositories (in two separate steps) from a GrimoireLab dashboard, producing screenshots of the results.

  • Microtask 4: Perform any other modification of Mordred configuration files that you may find useful, showing screenshots of how that affects the resulting dashboard.

If you want, you can also:

  • Microtask 5: Produce a pull request for any of the GrimoireLab tools, and try to follow instructions until it gets accepted. Try to do something simple that you consider useful, not necessarily a fix to the code: improvements to comments, documentation or testing will usually be easier to get accepted, and very useful for the project. Please avoid producing a random pull request just to have another microtask: the objective is not that you get one more microtask done, but that you understand how to interact with developers in the project by contributing something that could be useful.

Of course, there is no need to do all the microtasks; you only need to show that your skills are a good fit for working on this project.
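As an illustration of Microtask 1, a minimal sketch that lists the non-fork repositories of a GitHub organization and writes a projects.json-style file (whether this exact structure matches what your Mordred setup expects is an assumption to verify; unauthenticated API access may also hit rate limits):

import json
import requests

def org_projects(org):
    repos, page = [], 1
    while True:
        resp = requests.get(f"https://api.github.com/orgs/{org}/repos",
                            params={"per_page": 100, "page": page})
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        # Skip repositories that are forks of other GitHub repositories.
        repos.extend(r["html_url"] for r in batch if not r["fork"])
        page += 1
    return {org: {"git": [url + ".git" for url in repos], "github": repos}}

with open("projects.json", "w") as fd:
    json.dump(org_projects("chaoss"), fd, indent=4)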

Showing the work you did

If you want to show the work you did, open a GitHub repository, and upload to it:

  • A README.md file explaining what you did, and linking to the results (which will be in the same repository, see below). This will be the main file to show your skills and interest in the project, so try to make it organized and clear, in a way that we can easily understand what you did.

  • Screenshots of the dashboard(s) you produced, and the configuration files you used for them (if any). Please remember to remove passwords and/or auth tokens that could be in them before uploading. In the README.md file, link all those files, telling which GitHub projects you analyzed.

  • Python scripts and/or Python Jupyter notebooks you produced, with enough information (in comments and/or in the README.md file) so that we can run them if needed. Upload Python scripts ready to work assuming GrimoireLab packages are already installed. Upload Jupyter notebooks ready to be seen via the GitHub web interface.

  • Links to the pull request you did (if any), along with any comment you may have about it.

Submitting information for the application process

  • You must complete at least one micro-task related to the idea you are interested in.

  • Once you completed at least one micro-task, go to the governance repository and create a pull request to add yourself, your information, and a link to your repository with the completed micro-task(s) in the GSoC-interest.md file (see above for the contents of the repository).

  • You are welcome to include in your repository other information that could be of interest, such as open issues or pull requests submitted to the project to which you intend to contribute during GSoC, contributions to other projects, skills, and other related information.

  • You must complete these things by March 27 16:00 UTC. Make sure to also submit the information required by GSoC for applicants (i.e., project proposal), linking to it from your pull request in the GSoC-interest.md file.

  • Message to the mailing list with guidelines on proposals.

Useful documentation

Asking for help

If you need help, please use the following channels.

For issues related to GrimoireLab:

For general issues related to CHAOSS or CHAOSS metrics

Meetup collection broken due to rate limits: sleep-for-rate not working

The Meetup collection is not working and prints errors about too many connections; I guess this is related to the rate limits. We're already using a dedicated token, BTW.

I'm using the version below:

#!/bin/bash
SORTINGHAT='641dabd71d4f4a23547592a10096240343268bd1'
GRIMOIREELK='9a69a31a6499402cddb3c622f12dc5d11e490fed'
PERCEVAL='f1c170ac6692ffe54558d590a96bedf748624651'
KIBITER='f7bf173cb1d79418b9909d877adfb998ce7b52da'
GRIMOIRELAB_GITHUB_IO='2f71c036020f6873efdd079f4ec2b5f26584cb2b'
ARTHUR='3efff3311b52222b7d06bf63ce97f8a42572bc06'
USE_CASES='31ea49ecc28f3e27aa29fddb0e01b7f475069b6a'
PANELS='ffe73323afe84e491d0b1705e86a7db3588bff38'
MORDRED='024019f5ee1df7096f91f7b41286cd58e55e6495'
TRAINING='9df30c3ec955387a423f261bb141f74133bf7ff7'
PERCEVAL_MOZILLA='423615b9745ab0b0e577701504e46e0f32aab6a3'
PERCEVAL_PUPPET='9c0d4c72a0e9b3c7c1e3295f72ebe70c5c874084'
REPORTS='4ad4f7c9d8cd5eb728a95c3a358d8aa339102c75'
GRIMOIRELAB_TOOLKIT='95d271af32744eb38f73df592f2e0dc5ea97683b'
PERCEVAL_OPNFV='0bec845df7e17965892dfb1181da0ab1c77168dc'
GRIMOIRELAB='efd656ff641623e6a639e095f14f38a71cacc094'

This is the Mordred setup being used:

[meetup]
raw_index = meetup_cloudfoundry_170929
enriched_index = meetup_cloudfoundry_170929_enriched_170929
api-token = ****
no-cache = true
sleep-for-rate = true

Below you can see the 'Too Many Requests' error and how it keeps sending queries instead of stopping:

requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://api.meetup.com/Berlin-PaaS-Cloud-Foundry-Meetup/events/hrjxjmypsjbdb/comments?page=200&key=6c5d36544b39452392c3746c5f7028&sign=true                       
2017-11-23 15:01:30,815 - grimoire_elk.arthur - ERROR - Error feeding ocean from meetup (https://meetup.com/): 429 Client Error: Too Many Requests for url: https://api.meetup.com/Madison-Cloud-Foundry-Meetup/events?fields=event_hosts,featured,group_topics,plain_text_description,rsvpable,series&status=cancelled,upcoming,past,proposed,suggested,draft&order=updated&page=200&scroll=since%3A2016-05-05T00%3A33%3A18.000Z&key=6c5d36544b39452392c3746c5f7028&sign=true
Traceback (most recent call last):                                                                                                                                                                                                            
  File "./grimoire_elk/arthur.py", line 130, in feed_backend                                                                                                                                                                                  
    ocean_backend.feed()                                                                                                                                                                                                                      
  File "./grimoire_elk/ocean/elastic.py", line 204, in feed                                                                                                                                                                                   
    for item in items:                                                                                                                                                                                                                        
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backend.py", line 360, in decorator                                                                                                                          
    for data in func(self, *args, **kwargs):                                                                                                                                                                                                  
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/meetup.py", line 117, in fetch                                                                                                                 
    for evp in ev_pages:                                                                                                                                                                                                                      
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/meetup.py", line 412, in events                                                                                                                
    for page in self._fetch(resource, params):                                                                                                                                                                                                
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/meetup.py", line 491, in _fetch                                                                                                                
    raise e                                                                                                                                                                                                                                   
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/meetup.py", line 484, in _fetch                                                                                                                
    r.raise_for_status()                                                                                                                                                                                                                      
  File "/usr/local/lib/python3.4/dist-packages/requests/models.py", line 935, in raise_for_status                                                                                                                                             
    raise HTTPError(http_error_msg, response=self)                                                                                                                                                                                            
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://api.meetup.com/Madison-Cloud-Foundry-Meetup/events?fields=event_hosts,featured,group_topics,plain_text_description,rsvpable,series&status=cancelled,upcoming,past,proposed,suggested,draft&order=updated&page=200&scroll=since%3A2016-05-05T00%3A33%3A18.000Z&key=6c5d36544b39452392c3746c5f7028&sign=true
2017-11-23 15:01:35,685 - grimoire_elk.arthur - ERROR - Error feeding ocean from meetup (https://meetup.com/): 403 Client Error: Forbidden for url: https://api.meetup.com/Boulder-Cognitive-Business-Meetup/events/225399399/comments?page=200&key=6c5d36544b39452392c3746c5f7028&sign=true 
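
What sleep-for-rate is expected to do is precisely to avoid this situation: when the API answers with HTTP 429 (or reports that no requests are left), the collector should pause until the rate-limit window resets instead of issuing more queries. The snippet below is only a minimal sketch of that behaviour, not Perceval's actual implementation; the X-RateLimit-Reset header name and the 30-second fallback are assumptions that may differ from what the Meetup API version in use actually returns.

# Minimal sketch (not Perceval's implementation) of the behaviour expected
# from sleep-for-rate: on HTTP 429, wait until the rate-limit window resets
# instead of firing more queries. The X-RateLimit-Reset header name and the
# 30-second fallback are assumptions.

import time

import requests

MEETUP_API = "https://api.meetup.com"


def fetch_with_rate_limit(resource, params, max_retries=5):
    """Fetch a Meetup resource, sleeping when the rate limit is hit."""
    url = MEETUP_API + resource
    for _ in range(max_retries):
        r = requests.get(url, params=params)

        if r.status_code == 429:
            # Seconds until the current rate-limit window resets
            # (assumed header; defaults to 30 seconds if it is missing).
            reset = float(r.headers.get("X-RateLimit-Reset", 30))
            time.sleep(reset)
            continue

        r.raise_for_status()
        return r.json()

    raise RuntimeError("rate limit not released after %d retries" % max_retries)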

Start using sortinghat on existing indexes

This is a follow up on chaoss/grimoirelab-sortinghat#112

I am ready to set up sortinghat, but I wanted to double-check something first. My dashboard has been running for over a year now, based primarily on GrimoireELK and vanilla Elasticsearch and Kibana (5.4.x).

Can I install sortinghat and start using it on my existing indexes as well?

I use p2o to import my data, and I know I will have to add some parameters so that p2o also starts using sortinghat. Do I need to create new indexes, or will sortinghat just extend the existing ones?

Thanks.

Git data is not being collected: 'Name or service not known' error

Using the latest version provided by our Dev team, I'm not able to collect data for the YoctoProject.

How to reproduce:

2017-11-22 09:28:20,995 - grimoire_elk.arthur - ERROR - Error feeding ocean from git (git://git.openembedded.org/openembedded-core): [Errno -2] Name or service not known                                                                     
Traceback (most recent call last):
  File "./grimoire_elk/arthur.py", line 116, in feed_backend
    ocean_backend.feed(latest_items=latest_items)
  File "./grimoire_elk/ocean/elastic.py", line 204, in feed
    for item in items:
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backend.py", line 360, in decorator
    for data in func(self, *args, **kwargs):
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 120, in fetch
    latest_items)
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 146, in __fetch_from_repo
    commits = self.__fetch_newest_commits_from_repo(repo)
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 177, in __fetch_newest_commits_from_repo
    hashes = repo.sync()
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 867, in sync
    pack_name, refs = self._fetch_pack()
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.9.5-py3.4.egg/perceval/backends/core/git.py", line 1012, in _fetch_pack
    fd.write)
  File "/usr/local/lib/python3.4/dist-packages/dulwich-0.18.6-py3.4-linux-x86_64.egg/dulwich/client.py", line 727, in fetch_pack
    proto, can_read = self._connect(b'upload-pack', path)
  File "/usr/local/lib/python3.4/dist-packages/dulwich-0.18.6-py3.4-linux-x86_64.egg/dulwich/client.py", line 814, in _connect
    self._host, self._port, socket.AF_UNSPEC, socket.SOCK_STREAM)
  File "/usr/lib/python3.4/socket.py", line 530, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

Important: from the host, the command git clone works with the repo git://git.openembedded.org/openembedded-core
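
The traceback ends in socket.getaddrinfo, so the failure happens while resolving git.openembedded.org, before any Git protocol traffic. If Perceval runs inside a container, its DNS setup can differ from the host's, which would explain why git clone works from the host but not from the collector. A quick way to confirm this, run from inside the same environment as p2o/arthur, is a small resolution check like the sketch below (the hostname and the git:// default port 9418 are taken from the failing repository URL; everything else is just illustrative):

# Quick check, run *inside* the environment where p2o/arthur runs
# (e.g. the container), to confirm whether the failure is plain DNS
# resolution rather than anything Perceval/dulwich specific.

import socket

host, port = "git.openembedded.org", 9418

try:
    for res in socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        print("resolved:", res[4])
except socket.gaierror as e:
    # This is the same error shown in the traceback; if it triggers here
    # too, the problem is the environment's DNS setup, not GrimoireLab.
    print("resolution failed:", e)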

This error was found with the version:

SORTINGHAT='641dabd71d4f4a23547592a10096240343268bd1'      
GRIMOIREELK='9a69a31a6499402cddb3c622f12dc5d11e490fed'     
PERCEVAL='fcae408a56f16a25f1f62abc6c01b61d2e0dc100'        
KIBITER='f7bf173cb1d79418b9909d877adfb998ce7b52da'         
GRIMOIRELAB_GITHUB_IO='2f71c036020f6873efdd079f4ec2b5f26584cb2b'                                                       
ARTHUR='3efff3311b52222b7d06bf63ce97f8a42572bc06'          
USE_CASES='31ea49ecc28f3e27aa29fddb0e01b7f475069b6a'       
PANELS='ffe73323afe84e491d0b1705e86a7db3588bff38'          
MORDRED='024019f5ee1df7096f91f7b41286cd58e55e6495'         
TRAINING='9df30c3ec955387a423f261bb141f74133bf7ff7'        
PERCEVAL_MOZILLA='423615b9745ab0b0e577701504e46e0f32aab6a3'                                                            
PERCEVAL_PUPPET='9c0d4c72a0e9b3c7c1e3295f72ebe70c5c874084' 
REPORTS='4ad4f7c9d8cd5eb728a95c3a358d8aa339102c75'         
GRIMOIRELAB_TOOLKIT='95d271af32744eb38f73df592f2e0dc5ea97683b'                                                         
PERCEVAL_OPNFV='0bec845df7e17965892dfb1181da0ab1c77168dc'  
GRIMOIRELAB='efd656ff641623e6a639e095f14f38a71cacc094' 

[tests] Some details for init-raw.sh

The script init-raw.sh, in the tests directory, uploads data to 172.17.0.1. In order to be useful in other setups, it should be able to upload to any address, or maybe to localhost, since it will run on the machine exposing the Elasticsearch port. I see it can be configured with an environment variable, but it would be more explicit with a command line argument.

In any case, for consistency, I think it should be written in Python.
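
As a rough illustration of that suggestion, a Python replacement could take the Elasticsearch address as a command line argument defaulting to localhost. The sketch below is only an outline under assumptions: the index name, the document layout (one JSON document per line) and the input file are placeholders, since they depend on whatever test data init-raw.sh currently uploads.

#!/usr/bin/env python3
# Sketch of the suggested rewrite: same idea as init-raw.sh, but in Python
# and taking the Elasticsearch address as a command line argument
# (defaulting to localhost) instead of a hardcoded 172.17.0.1.
# The index name and input format are placeholders.

import argparse
import json

import requests


def main():
    parser = argparse.ArgumentParser(description="Upload raw test data to Elasticsearch")
    parser.add_argument("--elastic-url", default="http://localhost:9200",
                        help="Elasticsearch URL (default: %(default)s)")
    parser.add_argument("--index", default="test_raw",
                        help="Index to upload the documents to")
    parser.add_argument("file", help="File with one JSON document per line")
    args = parser.parse_args()

    with open(args.file) as fd:
        for line in fd:
            doc = json.loads(line)
            # One POST per document; good enough for small test fixtures.
            url = "%s/%s/items" % (args.elastic_url, args.index)
            r = requests.post(url, json=doc)
            r.raise_for_status()


if __name__ == "__main__":
    main()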
