I think it would be pertinent to include a filter that excludes contributions to the c



Exclude company's own projects filter about osci HOT 13 OPEN

vallode commented on May 23, 2024 2

Exclude company's own projects filter

from osci.

Comments (13)

dzintars commented on May 23, 2024 1

... or at least you could start small and list the number of repositories that each organization's contributors contribute to.
If most of an organization's contributors contribute to a single repository or only a few, this is a good indication of where their efforts go. :)
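A minimal sketch of that count, assuming commit data is available as (author, repo) pairs. The input shape and the sample names are hypothetical and do not reflect OSCI's actual schema.

```python
from collections import defaultdict


def repos_per_contributor(commits):
    """Map each author to the number of distinct repositories they committed to.

    `commits` is an iterable of (author, repo) pairs. This shape is an
    assumption for illustration only.
    """
    repos = defaultdict(set)
    for author, repo in commits:
        repos[author].add(repo)
    return {author: len(names) for author, names in repos.items()}


# Hypothetical sample data: duplicate commits to one repo count once.
sample = [
    ("alice", "acme/core"),
    ("alice", "acme/core"),
    ("alice", "kubernetes/kubernetes"),
    ("bob", "acme/core"),
]
counts = repos_per_contributor(sample)
```

A distribution where most authors map to one or two repositories would suggest contributions concentrated in a few projects.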

from osci.

vlad-isayko commented on May 23, 2024

The idea is pretty interesting.

There are also a number of primary questions that arise before the implementation of this idea.

Main question:

How to identify the repositories in relation to the company (a company's own repository or not)?

There is an option to use information about the organization (see OrgId). However, this requires a mapping between each company and the organizations that belong to it. Such a list would have to be created by hand for each company and constantly kept up to date. Even then, there is no certainty that this criterion is 100% valid.
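As a rough illustration of the mapping approach, the sketch below checks a repository's owner against a hand-maintained table. The table contents, the use of organization login names instead of OrgId, and the function name are all assumptions made for the example.

```python
# Hypothetical hand-maintained mapping from company name to GitHub org logins.
COMPANY_ORGS = {
    "ACME": {"acme", "acme-labs"},
}


def is_company_repo(company, repo_full_name, company_orgs=COMPANY_ORGS):
    """Return True if the repo's owner is one of the orgs listed for the company.

    `repo_full_name` is expected in "owner/name" form.
    """
    owner = repo_full_name.split("/", 1)[0].lower()
    return owner in company_orgs.get(company, set())
```

Companies missing from the table simply match nothing, which is where the maintenance burden described above shows up.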

Do you have any ideas on this?

from osci.

abitrolly commented on May 23, 2024

How to identify the repositories in relation to the company (a company's own repository or not)?

1. If source repo belongs to company. Maintaining official repo status is no different than maintaining official list of domains.
2. If all commits and merge requests are from the company

from osci.

vlad-isayko commented on May 23, 2024

1. If source repo belongs to company. Maintaining official repo status is no different than maintaining official list of domains.

I agree that at first glance, maintaining a list of repositories does not differ much from maintaining a list of companies. But there are significantly more repositories than companies, and the list of repositories changes much more often than the list of domains.

2. If all commits and merge requests are from the company

I didn't quite understand what it meant. Could you explain a little more broadly?

Are you suggesting that a company's own repositories are those in which all commits come from the company? Is this a necessary and/or sufficient condition?

from osci.

abitrolly commented on May 23, 2024

But there are significantly more repositories than companies, and the list of repositories changes much more often than the list of domains.

It may turn out that the number of non-owned repositories that companies commit to is insignificant.

I didn't quite understand what it meant. Could you explain a little more broadly?

A repo where all commits are from corporate emails is definitely owned by the company. That's a sufficient condition for a filter. )

from osci.

vallode commented on May 23, 2024

Sorry for taking a while to respond, I simply don't have enough information on the workflow that OSCI uses (my bad) to elaborate further than what @abitrolly said. I would only ever consider a contribution to be in the company's full self-interest if the contribution landed on a repository that was owned by the company itself.

Is this a trivial task? Very unlikely, I think a "repo where all commits are from corporate emails" is too specific of a scenario and wouldn't affect the dataset very much (especially for the top dogs which is where my interest lies the most)...

We'd need a way to filter out contributions made from the organisation's own authors into the organisation's own repositories.
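A minimal sketch of such a filter, assuming each contribution already carries the author's company and the repository owner, and that a company-to-organization table exists. The `Contribution` type, field names, and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    author_company: str  # company inferred for the author (assumed given)
    repo_owner: str      # GitHub org or user that owns the repository


def external_contributions(contributions, company_orgs):
    """Keep only contributions that land outside the author's own company orgs."""
    kept = []
    for c in contributions:
        own_orgs = company_orgs.get(c.author_company, set())
        if c.repo_owner.lower() not in own_orgs:
            kept.append(c)
    return kept


# Hypothetical data: one self-contribution, two that should survive the filter.
company_orgs = {"ACME": {"acme"}}
contributions = [
    Contribution("ACME", "acme"),
    Contribution("ACME", "kubernetes"),
    Contribution("Initech", "acme"),
]
external = external_contributions(contributions, company_orgs)
```

Only the ACME commit into the `acme` organization is dropped. A contribution from another company into an ACME repository is kept, since it is not self-interested in the sense discussed here.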

from osci.

abitrolly commented on May 23, 2024

We'd need a way to filter out contributions made from the organisation's own authors into the organisation's own repositories.

I agree. That would be sufficient.

from osci.

patrickstephens2 commented on May 23, 2024

I suggest the way to move forward on this issue is:

1. pick a company at random
2. look at the list of repos which OSCI is showing their employees contribute to
3. try to define some logic (algorithm) defining which of these repos are "company repos" vs "non-company repos". As part of this task you will have to define what is a "company repo", that in itself will be challenging.
4. Now pick another company at random and test the logic you came up with, refine it.
5. And so on with additional companies until you have logic which appears to manage the general case.

It's important to understand that a perfect algorithm for this does not exist, just different directions to go, each with pros and cons. An empirical approach (if that's the right term) like I suggest above is necessary rather than defining a theoretical approach. Your goal has to be to iterate until you reach a logic which is "good enough" to show a general picture of activity across organizations. This was our experience defining the logic for OSCI itself. What looks easy at a high level gets very challenging when one tries to define the detail and algorithmize it.

from osci.

abitrolly commented on May 23, 2024

As part of this task you will have to define what is a "company repo", that in itself will be challenging.

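def outside_contributions(total_committed, employees_committed,
                          contractors_committed, robots_committed):
    """Return True if any commit came from outside the company."""
    # Commits not attributable to employees, contractors, or bots
    outside = (total_committed - employees_committed
               - contractors_committed - robots_committed)
    return outside > 0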

from osci.

patrickstephens2 commented on May 23, 2024

Let's take company ACME. It creates and runs project X. This project is not under the ACME org on github, so programmatically not directly connectable to the company. The project has 100 contributors, 99 who work at ACME and 1 who is outside (perhaps it is an ex-employee who worked on this before leaving the company and continued after... I have seen such examples). Is this a company project?
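One way to make this question concrete is a share-based heuristic: treat a project as a company project when employees make up at least some fraction of its contributors. The sketch below is only an illustration; the 0.9 threshold and the function name are arbitrary assumptions, not a rule proposed in this thread.

```python
def looks_like_company_project(contributors, employees, threshold=0.9):
    """Return True if at least `threshold` of the contributors are employees.

    The default threshold is an arbitrary assumption for illustration.
    """
    contributors = set(contributors)
    if not contributors:
        return False
    share = len(contributors & set(employees)) / len(contributors)
    return share >= threshold


# The ACME example: 100 contributors, 99 of them employees.
contributors = {f"dev{i}" for i in range(100)}
employees = {f"dev{i}" for i in range(99)}
is_company = looks_like_company_project(contributors, employees)
is_company_strict = looks_like_company_project(contributors, employees, threshold=1.0)
```

With the default threshold, project X counts as a company project. A strict "all contributors" rule (threshold 1.0) does not classify it that way because of the single outside contributor, which shows how sensitive the answer is to the chosen cutoff.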

from osci.

dzintars commented on May 23, 2024

What could be the simplest, even if not the most accurate, insight? While getting perfect stats sounds sweet, most likely we will not get there right away. So... what could be done right now to make the index 1% better?
How about CLAs? Could those be considered an indication? If a repo requires contributors to submit a CLA, could it be considered an org X repository?
Could a manual PR process be implemented to meta-tag the repos? Like... the community could submit PRs to this repository to mark/add indexed repos to one category or the other and even augment the metadata. While a fully automated process is neat... I think we are mostly interested in something like 2-5K public repositories, and those could definitely be meta-tagged manually over time.
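A rough sketch of how community-submitted tags could be consumed, assuming they live in a simple CSV file with `repo` and `company` columns. The file format, column names, and sample rows are hypothetical.

```python
import csv
import io

# Hypothetical contents of a community-maintained tag file (repo -> owning company).
TAGS_CSV = """repo,company
acme/widget,ACME
acme-labs/tooling,ACME
"""


def load_repo_tags(text):
    """Parse 'repo,company' rows into a dict keyed by lowercase repo name."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["repo"].lower(): row["company"] for row in reader}


tags = load_repo_tags(TAGS_CSV)
```

A plain text format like this keeps each pull request reviewable as a one-line diff, and untagged repositories simply stay uncategorized.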

from osci.

abitrolly commented on May 23, 2024

Maybe the priority should be to publish the data that would make different kinds of filters possible. Right now the site https://opensourceindex.io/ just links to this repo, with no diagrams of the DB schema and no information on whether the BigQuery datasets are public.

from osci.

jeffwilcox commented on May 23, 2024

At our company, internally we gather public data on GitHub activity from employees who choose to opt-in regarding their GitHub activity and contributions, with the goal of identifying trends in contributions to projects outside of Microsoft's governance. Our data is skewed differently than this index, however, since we have an internal indicator of who our employees are on GitHub once they opt-in to tell us, vs having to determine it from profiles.

Our numbers for December 2021, for example, are significantly higher for 'total community' and other figures as a result of so many people being e-mail private on GitHub... but of that specific month's contributions, I tried pulling equivalent data, and around a third of our actively-open-contributing employees contributed to projects not governed by our company, yielding a number higher than the index but not majorly larger.

While the data is interesting, our key reason for differentiating "is it controlled by Microsoft or not" is to help encourage our employees' participation in communities to become eligible in our FOSS Fund and to evolve the culture.

I agree slicing off a company's controlled projects is an interesting pivot, but a murky gray area, especially given foundations and cross-industry collaborations and so on.

from osci.
