Comments (68)
Could we start with the easy cases? I feel that reducing the size of the problem also makes it less intimidating to approach.
We are already manually changing headers from BSD to Apache for files whose authors are committers with ICLAs, so I think an automated pass for this case should not be that hard: parse the header for authors, check whether all are committers, and replace with the Apache header. If that sounds right, I can script that and give it a try.
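A minimal sketch of that check, assuming headers carry one `Author: Name <email>` line per author (continuation lines without the `Author:` prefix and the `Copyright` lines are not handled here, so a real pass would need to cover them). The names `header_authors` and `can_relicense` and the committer set are hypothetical:

```python
import re

# Assumed header shape: " *   Author: Some Name <user@example.com>".
# Adjust the pattern to whatever the real file headers look like.
AUTHOR_RE = re.compile(
    r"^\s*\*?\s*Authors?:\s*(?P<name>[^<\n]+?)\s*(?:<[^>]*>)?\s*$",
    re.MULTILINE,
)


def header_authors(source_text):
    """Return the author names found on 'Author:' lines of a file header."""
    return [m.group("name").strip() for m in AUTHOR_RE.finditer(source_text)]


def can_relicense(source_text, committers):
    """True only if at least one author is listed and all are known committers."""
    authors = header_authors(source_text)
    return bool(authors) and all(name in committers for name in authors)
```

Files with no recognizable author line are reported as not eligible, so anything the pattern misses falls back to manual review rather than being converted.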
What confuses me, though, is that we're worrying about git authors. I believed that if someone contributes to a file without listing themselves as an author in the header (for the BSD case), they concede rights over the code by doing so. At least that was my understanding at the time when I submitted patches to existing files and did not add an extra line naming me as author in every affected file. If this is not the correct assumption, I agree that a "best effort" approach (comparing the git author to the authors in the header) is the only remaining possibility.
from nuttx.
@justinmclean Can you help me understand our requirements here a little bit more with a couple of examples:
1. https://github.com/apache/incubator-nuttx/blob/master/arch/risc-v/include/arch.h
It would seem that this needs to keep the BSD header until Ken re-licenses it under Apache, and we need to call this file out in the LICENSE file as BSD-3. It would not need to be called out in the NOTICE file.
2. https://github.com/apache/incubator-nuttx/blob/master/arch/arm/src/arm/arm.h
This one we can put the Apache header on, and we do not need to make any additions to the NOTICE or LICENSE files beyond the boilerplate Apache. This is because Greg has agreed to re-license this code.
3. https://github.com/apache/incubator-nuttx/blob/master/arch/arm/src/imxrt/hardware/rt102x/imxrt102x_ccm.h
This one we can put the Apache header on, and we do not need to make any additions to the NOTICE or LICENSE files beyond the boilerplate Apache. This is because Greg has agreed to re-license this code, and while there are other authors listed, he is the sole copyright holder listed.
4. https://github.com/apache/incubator-nuttx/blob/master/arch/arm/src/imxrt/imxrt_lcd.c
It would seem that this needs to keep the BSD header unless NXP is willing to relicense it under Apache, even though portions are copyrighted by Greg, and we need to call this file out in the LICENSE file as BSD-3. It would not need to be called out in the NOTICE file.
General Questions:
When do we need to add the "Based on source code originally developed by" entry to the NOTICE file? In a couple of the files coming from FreeBSD I see entries like:
- Portions of this software were developed by David Chisnall
- under sponsorship from the FreeBSD Foundation.
I know we have other files with other license or cases to go through, but this should cover the vast majority and can get us moving in the right direction.
from nuttx.
@justinmclean any thoughts on these examples? I'm trying to be 100% sure I understand what we need to do here to move this forward in a meaningful way.
from nuttx.
- Correct
- Correct
- It would depend on the history of the file and the changes made. In general, unless the changes are significant, the original license and header should be kept.
- Would need to be discussed; in general, 3rd party headers should not be changed without permission. It looks like they have been changed here, and it would be best to revert to the original header.
from nuttx.
Note that with a WIP disclaimer none of this actually blocks a release.
from nuttx.
License clearing wiki page (with draft process and tools): https://cwiki.apache.org/confluence/display/NUTTX/License+Clearing
This was used in release 9.0.0 and 9.1.0.
from nuttx.
Add the related email thread:
https://lists.apache.org/thread.html/r0d30d8c95e861826a3027499fc43bc3851e19f89fdaf8606eada1818%40%3Cdev.nuttx.apache.org%3E
https://lists.apache.org/thread.html/r3149c844791bd0164a3016cbebc690edd9277905678cfb33526937cb%40%3Cdev.nuttx.apache.org%3E
https://lists.apache.org/thread.html/r897f825f1bfcd3501c132438acc9403a70d415652119d1e528f7349f%40%3Cdev.nuttx.apache.org%3E
from nuttx.
@adamfeuer do you have enough free time to collect the statistics information? My team leader has reserved a dedicated resource, @PeterBee97, to help you improve the tools and generate the report.
from nuttx.
Thanks @xiaoxiang781216 – I should have enough time to do a high-level analysis this week or next, and I could definitely use the help!
@PeterBee97 are you able to help me do this? If so, reply here or send me an email (it's on my profile), and we'll work out what to do. 🙂
from nuttx.
@adamfeuer Hi Adam, sure I'm here to help. BTW I spent some time yesterday on a script that doesn't modify anything yet but only tries to extract information. Hope this helps :)
from nuttx.
@PeterBee97 Great work with the script and database! I'll update my tools branch and post it here– would you be willing to do a PR to that, so we can have a single branch that we're working on? I'm hoping we can merge these tools to master so that others can help us or continue our work.
Here's a few questions:
- Are you subscribed to the [email protected] email list? If not, would you be willing to subscribe?
- What's your email address? Will you either post it here, send me an email at [email protected]? So we can correspond with the NuttX email list if necessary.
- What time zone are you in? I am in Seattle WA USA, Pacific Time Zone, UTC-7.
- Have you seen the NuttX license clearing wiki page? The process we need to follow and improve is there, as well as a few tools.
- The authors in the file are good to have, but not enough to clear the licenses– we need to look at the git log and get authors from that. There's a script on the wiki page above that can do that.
- Would you be willing to make the script you wrote also emit a plain text file, ideally tab delimited CSV?
from nuttx.
@PeterBee97 I updated my license-clearing tools branch to upstream/master, here's where I've put my tools: https://github.com/starcat-io/incubator-nuttx/tree/feature/license-clearing-tools/tools/license-clearing
from nuttx.
@PeterBee97 Let's try running the process that we did on the sched/ module on either fs/ or mm/ – only the estimation part, not the whole clearing process. They have 100-250 files each, so it's a smaller chunk. We need git authors as well as what is in the file headers. Once we have a way to get stats for that module and all files, then we can try to do it for the whole project.
You can see what we did on sched/ at this wiki subpage: https://cwiki.apache.org/confluence/display/NUTTX/Analysis+March+2020
from nuttx.
@PeterBee97 Great work with the script and database! I'll update my tools branch and post it here– would you be willing to do a PR to that, so we can have a single branch that we're working on? I'm hoping we can merge these tools to master so that others can help us or continue our work.
Here's a few questions:
- Are you subscribed to the [email protected] email list? If not, would you be willing to subscribe?
- What's your email address? Will you either post it here, send me an email at [email protected]? So we can correspond with the NuttX email list if necessary.
- What time zone are you in? I am in Seattle WA USA, Pacific Time Zone, UTC-7.
- Have you seen the NuttX license clearing wiki page? The process we need to follow and improve is there, as well as a few tools.
- The authors in the file are good to have, but not enough to clear the licenses– we need to look at the git log and get authors from that. There's a script on the wiki page above that can do that.
- Would you be willing to make the script you wrote also emit a plain text file, ideally tab delimited CSV?
- Not yet, sure I'm willing to subscribe
- [email protected]
- I'm in Beijing, UTC+8 so my work time will be about 7 pm to 7 am in your timezone :(
- Yes, I browsed through the docs and mailing lists before making that tool
- Yeah, actually my tool is based on your script. The author0~author2 are from git log
- Sure, exporting to csv file is just one command in sqlite
@PeterBee97 Let's try running the process that we did on the sched/ module on either fs/ or mm/ – only the estimation part, not the whole clearing process. They have 100-250 files each, so it's a smaller chunk. We need git authors as well as what is in the file headers. Once we have a way to get stats for that module and all files, then we can try to do it for the whole project.
You can see what we did on sched/ at this wiki subpage: https://cwiki.apache.org/confluence/display/NUTTX/Analysis+March+2020
By typing sched/ in the DB Browser filter I can see that these files either have apache license already or only owe copyrights to Greg or Xiaomi & Pinecone, which should have already approved the license change.
The csv files are uploaded && PR created. https://github.com/PeterBee97/incubator-nuttx/tree/feature/license-clearing-tools/tools/license-clearing
from nuttx.
@PeterBee97 Cool, thanks– I didn't realize the script already used git to find the authors, sorry for missing that. We will need all the authors, not just the top 3. I'll take a closer look tomorrow.
Re: Xiaomi and Pinecone already approving the license change, do you know if they have filed an Apache Software Grant Agreement (SGA)?
Would you be willing to run your tool on fs and mm directories, and see if you can extract a report of the authors for each section and file? That way we can see if we're dealing with 10 authors, 100 authors, etc.
I think another next step is to get you an account on the NuttX Fossology instance. At some point we'll need to get the data into there. I'll email Brennan and you on the list.
Thanks again for being willing to help with this!
from nuttx.
Top 3 was my idea, given that some one-commit contributors can be ignored (can't they?). I don't know the exact details of the license issue; @xiaoxiang781216 knows better. I already ran the tool on the whole project, so those two directories can just be filtered. I'll try to get a report for particular files.
You're welcome :)
from nuttx.
@patacongo Are the original CVS and SVN archives saved anywhere?
from nuttx.
@patacongo Are the original CVS and SVN archives saved anywhere?
No
from nuttx.
@patacongo Ok. I'll see if I can look through the commit message to see if I can see what's going on there.
I'm logged in to Bitbucket, but for some reason I can't view the graph link you posted. Maybe it's a permissions issue or I don't have access to the graphs addon?
from nuttx.
@PeterBee97 Cool, thanks– I didn't realize the script already used git to find the authors, sorry for missing that. We will need all the authors, not just the top 3. I'll take a closer look tomorrow.
I mentioned this before, but it bears repeating. The NuttX project was 13 years old in February of 2020. For the first 6 to 6 and a half years, the project used CVS and SVN. You will find no authorship or contact information for the first half of the project's life in the current GIT authors. The log will show me as the sole author during that time. I did by far most of the changes in those days, but not all. Prior to GIT, contributors were noted only in commit comments. It should be possible to get the names, or in most cases just user handles, from the comments, but with no contact information.
Github apparently does not even know how to parse that early activity. If you look at https://github.com/apache/incubator-nuttx/graphs/contributors you would conclude that the project has only existed since sometime in 2013. The project was actually created in February of 2007. This is clearer in the Bitbucket statistics[1]: https://bitbucket.org/nuttx/nuttx/addon/bitbucket-graphs/graphs-repo-page#!graph=contributors&uuid=4430abf9-a782-49ff-bd16-bc1df696048e&type=c&group=weeks which goes all the way back to the day the project was created. I think that is because prior to GIT, authors were NOT referenced by email address, but rather with some UUID.
[1] Note you have to be logged into Bitbucket to see the statistics there.
@PeterBee97 can we add a column in the database to indicate that the source code existed before git was used? @patacongo, we need to gather the statistics first and convert the unambiguous code base automatically (of course we need to review the PR carefully), and then work on the rest case by case; otherwise NuttX can never become a top-level project.
from nuttx.
@xiaoxiang781216 @patacongo @PeterBee97 I cloned the Bitbucket repo last night (https://bitbucket.org/nuttx/nuttx/src/master/), looked through the commit logs, and I can see what @patacongo is talking about. I didn't compare to the github log, but we should probably also do that. Then we can see if we can do anything with the information there.
It seems like we should be able to come up with a strategy for dealing with this:
- If we can get names and contact info from the commit messages, then we can run the license clearing process we already have, maybe with some additional steps about that process.
- At the very least, we can collect statistics about how many contributors we are talking about.
- If we can't get names and contact info from the commit messages, then we need to get help to address what @xiaoxiang781216 is talking about, so NuttX can graduate from podling status. Surely other Apache projects have faced this same issue.
Let me know if you have other thoughts about this.
@PeterBee97 Will you clone the Bitbucket repo and look at the logs to see if you have some insight about it?
from nuttx.
This is also informative:
git log | grep '^Author:'
That will produce over 30 thousand lines, but you can clearly see that the last several thousand commits have the author:
patacongo patacongo@42af7a65-404d-4744-a932-0658087f49c3
That, I think is a bogus email that was created when the SVN repository was converted to GIT.
Then there are several thousand with author:
Gregory Nutt [email protected]
That is GIT, but when I was still using GIT as though it were SVN with no authors.
The first author that is not me appears at:
commit b0507038494cd1ae9d14807db758d4e3ae98a1ef
Author: jeditekunum <[email protected]>
Date: Sat Jan 24 14:31:35 2015 -0600
First step at porting to MoteinoMEGA. LED shows assert failure at boot. Appears to be short double blink, short off (~1sec), followed by 250ms toggle cycles. Most of it derived from amber board.
So it appears that there is no authorship information for the first 8 years, only for the last 5 years.
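A rough way to count how many commits fall into each era, assuming the usual `Author: name <email>` lines of plain `git log` output and assuming that SVN-converted commits can be recognized by an email whose domain is a repository UUID, as in the patacongo identity above. The function name `count_authors` and the placeholder address in the usage note are hypothetical:

```python
import re
from collections import Counter

AUTHOR_LINE = re.compile(r"^Author:\s*(?P<name>.*?)\s*<(?P<email>[^>]*)>\s*$")
# Assumption: converted SVN commits carry "user@<repository UUID>" as email.
SVN_UUID = re.compile(r"@[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def count_authors(log_text):
    """Count commits per (author name, origin), where origin is 'svn' or 'git'."""
    counts = Counter()
    for line in log_text.splitlines():
        match = AUTHOR_LINE.match(line)
        if match:
            origin = "svn" if SVN_UUID.search(match.group("email")) else "git"
            counts[(match.group("name"), origin)] += 1
    return counts
```

Feeding it the output of `git log` gives one counter entry per identity, so the SVN-era commits with no real author information can be separated from the later ones.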
from nuttx.
@patacongo @PeterBee97 If I do git log --reverse and search for ' by ', I find commits like this:
commit f03cb0ff3ababdcc84245d75d795ab956d110e09
Author: patacongo <patacongo@42af7a65-404d-4744-a932-0658087f49c3>
Date: Tue Mar 16 00:53:32 2010 +0000
Bugfixes submitted by David Hewson
git-svn-id: svn://svn.code.sf.net/p/nuttx/code/trunk@2543 42af7a65-404d-4744-a932-0658087f49c3
There are others. They seem to indicate patches or other code from contributors, committed by Greg.
from nuttx.
@patacongo Thanks for pointing this out again, I am sorry I didn't remember this.
from nuttx.
Bugfixes submitted by David Hewson
David Hewson I know. We are connected on LinkedIn. He just started working for HPE. He did some of the LPC31 port in the 2010 timeframe but has not been involved significantly since.
from nuttx.
If I do git log --reverse and search for ' by ' I find commits like this
"by" or "from" would both be good search keys. I also recorded the authors in the old ChangeLog files that were recently removed from the repositories because they are not used in the current workflow. That should be a complete list of authors except for a few trivial things like typo fixes that weren't normally included in the ChangeLog.
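A small sketch of that search, assuming attributions look like "submitted by David Hewson" and that a name is one to four capitalized words. This is only a heuristic: it will also pick up non-human names (vendors, config symbols) that need filtering afterwards. `ATTRIBUTION` and `attributed_names` are hypothetical names:

```python
import re

# "by" or "from" followed by up to four capitalized words on the same line.
ATTRIBUTION = re.compile(
    r"\b(?:by|from)\s+(?P<name>[A-Z][\w'-]*(?:[ \t]+[A-Z][\w'-]*){0,3})"
)


def attributed_names(message):
    """Return candidate contributor names mentioned in a commit message."""
    return [m.group("name") for m in ATTRIBUTION.finditer(message)]
```

Run over every commit message or ChangeLog entry, this yields a raw candidate list that still needs the manual cleanup discussed in this thread.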
from nuttx.
@PeterBee97 Will you clone the Bitbucket repo and look at the logs to see if you have some insight about it?
I cloned the Bitbucket repo today, but the git log seems to be the same as the one on GitHub...
So I found the latest ChangeLog from NuttX 9.0.0 RC0 and tried to filter out the names with keywords from|by and the help of some NLP library and put the results in names-changelog.txt. Also processed the git log in the same way and the result is names-gitlog.txt. Still the commit messages of earlier SVN commits are incomplete and many commits are authorless.
This may help cover some corner cases. Maybe we can open an issue and mention these users? But before that let's filter out the "safe" files first as @xiaoxiang781216 suggests.
from nuttx.
@PeterBee97 That's great! Less than 450 names in each file. The next steps are probably:
- remove all the non-human-names (Atmel, CONFIG_SDIO_PREFLIGHT, etc.)
- remove all the names of committers (they have ICLAs) - I manually made a list of committers
- remove duplicates (may need to be done manually since there are typos in the names)
- merge the lists
Once this is done, it will give us a scope of how many people there are. Ideally we'd have a list of commits for each name, and only to the top N contributors... not sure what N should be, but looking at the data should tell us. Do you have an idea how to get a list of commits per name?
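The cleanup steps above could be sketched roughly as follows, assuming the committer list and the list of non-human names are maintained by hand, and using a similarity cutoff of 0.9 (an arbitrary choice) to catch typos. `clean_names` is a hypothetical helper; merging the two files amounts to passing their concatenated contents:

```python
import difflib


def clean_names(names, committers, non_human):
    """Drop committers, non-human names and near-duplicate spellings."""
    excluded = {n.casefold() for n in committers} | {n.casefold() for n in non_human}
    kept = []
    for name in names:
        normalized = " ".join(name.split())  # collapse stray whitespace
        key = normalized.casefold()
        if not key or key in excluded:
            continue
        # A spelling very close to an already kept name counts as a duplicate.
        seen = [k.casefold() for k in kept]
        if difflib.get_close_matches(key, seen, n=1, cutoff=0.9):
            continue
        kept.append(normalized)
    return kept
```

The fuzzy match only flags likely duplicates; two different people with similar names would be merged by mistake, so the result still deserves a manual look.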
from nuttx.
I cloned the bitbucket repo today but the git log seems to be the same with that on GitHub...
Yes, the Bitbucket repositories are read-only mirrors of the incubator repositories.
from nuttx.
* I manually made a [list of committers](https://github.com/starcat-io/incubator-nuttx/blob/feature/license-clearing-tools/tools/license-clearing/committers.txt)
A large number of people do not use their names on PRs or commits, but rather some username/handle. A few of these I know. For example, v01d is Matias Nitshe, raidenpl is Mateusz Szafoni. Both Matias and Mateusz are Committers. But there are many more that I don't know.
from nuttx.
@patacongo Yes– we should find a way to update the committer list and the contributor list with handles... I'll think of some ways to do that...
from nuttx.
@PeterBee97 That's great! Less than 450 names in each file. The next steps are probably:
- remove all the non-human-names (Atmel, CONFIG_SDIO_PREFLIGHT, etc.)
- remove all the names of committers (they have ICLAs) - I manually made a list of committers
- remove duplicates (may need to be done manually since there are typos in the names)
- merge the lists
Once this is done, it will give us a scope of how many people there are. Ideally we'd have a list of commits for each name, and only to the top N contributors... not sure what N should be, but looking at the data should tell us. Do you have an idea how to get a list of commits per name?
I used this script to get the list from git log and earlier commits by @patacongo :
git log --no-merges --author=patacongo --pretty=format:"%h %s" > gp.txt
xargs -I pp grep "pp" gp.txt < ng2.txt > commits-patacongo.txt
./name-commits.sh ng2.txt name-commits.txt commits-patacongo.txt
Result:(I didn't exclude enlisted committers yet)
https://github.com/PeterBee97/authors-tool/blob/master/name-commits-full.txt
The names with no commits may be issue reporters' names, or names of committers who only contributed to the apps repo (I only ran the above commands in the nuttx repo). Also, some names are mentioned in the ChangeLog, but sadly there's no commit authored by or mentioning them.
from nuttx.
Any updates here? I think this is the only blocker issue preventing us from graduating, so let's try to make progress.
Thanks.
from nuttx.
@Apache9 No progress since the last update, I've been busy with other things. I'll merge @PeterBee97's code today. Next we should generate a list of people and the total lines of code for each person. Then we could sort in reverse order and decide how many people we need to try to contact.
@PeterBee97 Can you help with this? Can you find out how many lines of code were in each commit, tie them to a person in our list, create a list that combines all lines of code for each person, and create a CSV sorted in reverse order by total lines of code contributed?
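One possible starting point, assuming the input comes from something like `git log --numstat --pretty=format:@%an`, so that each commit starts with an `@Author` line followed by tab-separated `added<TAB>deleted<TAB>path` lines. This sketch only attributes lines to the git author; tying SVN-era commits to names found in commit messages would be a separate step. `lines_per_author` and `to_csv` are hypothetical names:

```python
import csv
import io
from collections import Counter


def lines_per_author(numstat_log):
    """Sum added lines per git author, sorted largest first."""
    totals = Counter()
    author = None
    for line in numstat_log.splitlines():
        if line.startswith("@"):
            author = line[1:].strip()
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue  # blank lines or anything that is not a numstat row
        added = parts[0]
        if added.isdigit():  # binary files are reported as "-"
            totals[author] += int(added)
    return totals.most_common()


def to_csv(rows):
    """Render (author, lines_added) rows as CSV text with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["author", "lines_added"])
    writer.writerows(rows)
    return buf.getvalue()
```

Counting added lines is only a proxy for contribution size: moved or copied files inflate the number for whoever made the commit.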
from nuttx.
@adamfeuer how about we convert the source files which satisfy all of the following:
1. The first commit comes from git, not svn or cvs
2. The copyright owner in the source code has already signed an SGA or ICLA
3. All contributors in the git log have already signed an SGA or ICLA
from nuttx.
@xiaoxiang781216 That would be a good first step for the conversion process. But as discussed on the mailing list, I thought we first wanted to do a rough total estimate of the entire project?
If we want to do both in parallel, then I think your idea will be a good start. We would need:
- list of all contributors who have signed SGA or ICLA - right now we only have committers who I presume have signed ICLAs. I don't know how to get the complete list, do you?
- list of all files for which
- only ICLA committers are in the git log
- first commit is not from svn or cvs
- file's author headers match git author or author listed in git commit message
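The three file conditions above could be expressed as a single predicate once the per-file data has been collected. This is only a sketch: the `FileHistory` record and its fields are hypothetical, and files with no header author are conservatively treated as not eligible. A stricter variant would also require the header authors themselves to be in the signed set:

```python
from dataclasses import dataclass, field


@dataclass
class FileHistory:
    path: str
    git_authors: set          # authors from the git log of this file
    header_authors: set       # authors listed in the file header
    message_authors: set = field(default_factory=set)  # names credited in commit messages
    first_commit_from_svn: bool = False


def easy_case(history, signed):
    """True if the file meets all three conditions for automatic conversion."""
    if history.first_commit_from_svn:
        return False
    if not history.git_authors <= signed:
        return False
    credited = history.git_authors | history.message_authors
    return bool(history.header_authors) and history.header_authors <= credited
```

Everything the predicate rejects stays on the list for manual review.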
from nuttx.
My recollection is not 100% clear, but I am recalling that @justinmclean mentioned in very early phases of this project that there were some legacy changes that could be just grandfathered in without following the full IP clearance process. I understood that this was necessary for other large, established Incubator projects as well.
If my understanding is correct, then I propose that we get permission to "cut some corners" on the pre-GIT changes that have no author associated with the individual commits. In most cases, the author of those early changes will be noted as an author or copyright holder in the BSD license header. In fact, I think that is true of all significant early code contributions. I would propose that we only use the GIT author changes for any automated analysis.
pre-GIT means pre-2014 so we are referring to very old changes.
Resolution of any remaining issues in the license headers will have to be a largely manual process anyway. We will have to examine each BSD license header and resolve all authors and copyright claims anyway. This should include all of the significant, pre-GIT changes. So I think with my suggestion here, the job can be made doable and there will be no loss of authorship on any significant contributions.
from nuttx.
In most cases, the author of those early changes will be noted as an author or copyright holder in the BSD license header. In fact, I think that is true of all significant early code contributions.
I can think of one very frequent case where this is not true. In many cases, people clone files from one location to another. This is particularly true under arch/ and boards/. You will discover many files that I wrote, that have me as the copyright holder and author, but GIT will claim, incorrectly, that the person doing the PR/patch was the author. This will apply to several hundred files. There are cases where the info in the license header is more accurate than in the GIT log.
Third party code brought into the OS will have the same issue. The true author of the code is in the license header, not in the GIT log.
And there are places where people make mistakes in copying files without updating the license headers. For example, under net/ there are a few files that include some small bits of logic from Adam Dunkels. I see that those files with headers have been cloned numerous times and most are no longer correct. Adam Dunkels is not the author of any of the files under net/ (except perhaps some logic under net/sixlowpan and the TCP state machine and even those are very highly customized).
It is all very complex and we cannot expect to get it all 100% correct. I think we just have to keep a high level of integrity and do our best effort to discover and document all authorship.
I think the point is that GIT authors may not agree with the authorship in the license header and those will all need some clarification.
from nuttx.
@xiaoxiang781216 @patacongo I updated my comment above to include "file's author headers match git author or author listed in git commit message" – that handles the cases where things would match up easily.
Yes, there are a bunch of files that won't match up or are confusing... I think we just need to get a count of how many there are to see what it will take to track down the ones that matter.
from nuttx.
Hi
What confuses me though is that we're worrying about git authors whereas I believe that if someone contributes a file without listing themselves as the authors in the header (for the BSD case), didn't the author concede rights over the code by doing so?
Without an ICLA (or an equivalent) this is not the case. Copyright automatically applies. They may not even own rights to the code they commit if their employment contract says otherwise. Thanks, Justin
So @justinmclean is it safe to do the batch conversion if the source code meets all of the following criteria?
1. The source code isn't converted from SVN or CVS
2. All committers (or their companies) in the git log have signed an ICLA or SGA
3. The copyright holder in the source code has signed an ICLA or SGA
I also have one question: do we need a contributor to sign an ICLA if they just modified a small portion of code (e.g. ~10 lines)? The threshold is also important for writing an automation tool.
from nuttx.
Take care with this. The copyright holder in source may or may not be the correct one.
Similarly, the author in GIT may not be the author of the file. Often the copyright holder in the source file header is the correct one, even though that person may not appear in GIT history.
Many people copy files that I wrote into different locations (very often for new architectures and for new boards which are very similar to older architectures and boards). Very often, I am the author of the file in these cases.
Bottom line: There is no magic, automated way to correctly determine the author. It requires collecting data and then also applying human insight.
@justinmclean For many cases there are multiple contributors of changes to a file. There is an original author, the original committer (who might be a different person), and people who have made trivial changes (as trivial as a spelling fix) or who have made substantial enhancements or re-designs. Those with trivial changes would not be treated as authors or copyright holders, but those with substantial changes may be. Is there any rule of thumb for what constitutes a significant change warranting rights to the file? Or does this also require human insight?
There are thousands of files involved here. This is potentially multiple man years of effort. I don't see how we can ever accomplish this.
from nuttx.
We can only operate on the information we have. If authorship information was lost from the CVS and SVN era (git author is Greg) and the header does not list anyone other than Greg, we can either "play safe" and leave the BSD header (we would be respecting the original author's license even if we don't know who it really was), or assume that without further information the original author cannot prove authorship either, in which case we are safe to change to Apache. For these "unknown" cases, I don't see any other way. We just need to decide and then act.
For other cases where there is indeed information I think we can script a header change based on various scenarios of git author/header author/author aliases where all have ICLAs. This change can be made to create one commit per file change and add the reason for the safety of the change to the commit message for traceability. Then, we can review each commit in a PR and decide if manual intervention is needed (throwing out unsafe changes, for example).
from nuttx.
In the SVN/CVS days, I did always give credit to the contributor in comments. However, the task of reading all comments in those 15 thousand or so commits is a very onerous task. The information is there, just not easily accessible.
AFAIK there are no un-credited changes in the repositories.
from nuttx.
We can try to see what wording you used in general and use some regular expression to try to match the attribution.
What I'm thinking is that in any case we will always need to analyze a file by looking at its complete git history to extract git author + header author + commit message attribution, right? The "easy" cases would then be files only touched by current committers.
from nuttx.
Let's clear the license for the files we own first. I think it is OK to have some files under compatible licenses in an ASF project. You just need to mention them in the NOTICE file. Another possible solution is to rewrite these files so we can change the license. Anyway, this depends on the number of files whose license we cannot change.
Thanks.
from nuttx.
I think @xiaoxiang781216 has already found someone who wishes to help here? But anyway, we need at least a committer to review the work...
from nuttx.
I've been writing some scripts which convert the output of git log (over a given file) into JSON format, to obtain metadata for each revision of the file. The final JSON contains (among other information): commit author, commit message and blob hash for the file.
I then started writing a python script to parse the JSON and extract (using regular expressions) authors from commit message and file header, in each commit. It is working nicely so far.
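For readers who want to try the idea without the scripts from this comment, here is an independent minimal sketch (not the scripts described above). It assumes the log was produced with `git log --pretty=format:%H%x1f%an%x1f%B%x1e -- <file>`, so fields are separated by the 0x1f byte and records by 0x1e. The blob hash mentioned above is left out, and `log_to_json` is a hypothetical name:

```python
import json

FIELD_SEP = "\x1f"   # emitted by %x1f in the assumed git format string
RECORD_SEP = "\x1e"  # emitted by %x1e at the end of each commit


def log_to_json(raw_log):
    """Convert separator-delimited git log output into a JSON array."""
    records = []
    for chunk in raw_log.split(RECORD_SEP):
        chunk = chunk.strip("\n")
        if not chunk:
            continue
        commit, author, message = chunk.split(FIELD_SEP, 2)
        records.append({"commit": commit, "author": author, "message": message})
    return json.dumps(records, indent=2)
```

Using control characters as separators avoids most of the escaping problems that arise when commit messages contain quotes or newlines.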
The final goal would be to determine if a given file passes the previously discussed checks for the easy cases that can be moved to Apache header. The python script could also be used to make the header change and commit the result.
I will work a bit more on this and open a draft PR (to add the script inside tools/).
from nuttx.
I've been writing some scripts which convert the output of git log (over a given file) into JSON format, to obtain metadata for each revision of the file. The final JSON contains (among other information): commit author, commit message and blob hash for the file.
People have been using Fossology to get historical information: https://www.fossology.org/
from nuttx.
Yeah, life intervened and I haven't been able to get back to this. I have less time for it than I thought.
@PeterBee97 made some progress in parsing out the list of contributors from the Git log messages. I will see if I can take his list and see if I can get a list of files and also number of lines of code for each contribution... anyway that seems to be the next steps:
- get a list of people who contributed
- get a list of the commits they were involved with
- work out how many lines of code per person are involved
- sort the list largest to smallest – this will give us an idea of how big the job is
- try contacting people with the n largest contributions
There are several other approaches. This is just the one that seems most straightforward to me. If anyone wants to help, we could use help with:
- writing a script that could take a list of commits and output the contribution size in lines
- getting a list of names and commits from the git log (Peter's scripts are this, or very close I think)
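The first of these, sizing contributions by lines, could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it counts added lines only, groups by the git author name, and the function names (`lines_per_author`, `rank`, `numstat_for`) and the `\x1e` commit marker are made up for this example. It relies on `git show --numstat`, which prints `added<TAB>deleted<TAB>path` per file and `-` for binary files.

```python
import subprocess
from collections import Counter

# Marker emitted before each commit's author name (an assumption of this sketch).
COMMIT_MARK = "\x1e"


def lines_per_author(raw):
    """Sum added lines per author from `git show --numstat --format=%x1e%an`."""
    totals = Counter()
    for chunk in raw.split(COMMIT_MARK):
        lines = chunk.splitlines()
        if not lines:
            continue
        author = lines[0].strip()
        for line in lines[1:]:
            parts = line.split("\t")
            # Binary files are reported as "-\t-\tpath" and are skipped.
            if len(parts) == 3 and parts[0].isdigit():
                totals[author] += int(parts[0])
    return totals


def rank(totals):
    """Sort authors from largest to smallest contribution."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def numstat_for(commits, repo="."):
    """Run git for the given commit hashes and return the raw numstat text."""
    return subprocess.run(
        ["git", "-C", repo, "show", "--numstat", "--format=%x1e%an", *commits],
        check=True, capture_output=True, text=True,
    ).stdout
```

Calling `rank(lines_per_author(numstat_for(commits)))` with a list of commit hashes gives the largest-to-smallest ordering from the steps above. Added lines are a crude measure of contribution size, so the result is a starting point for deciding whom to contact first, not a legal assessment.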
from nuttx.
Please see #1834
I know @PeterBee97 started some of this work but to be honest it was quite difficult for me to take advantage of those, considering it was based on sqlite databases. I chose JSON format since it is quite easy to read and parse with different programming languages.
from nuttx.
I have to be in favor of anything that makes forward progress.
from nuttx.
@patacongo Re: anything that makes forward progress, me too.
@v01d yes, text-based json or csv/tsv formats would be great. The scripts in #1834 look cool. Maybe we combine them into one python script with the sh module. I'll try them out.
from nuttx.
There's quite a bit of escaping going on in the bash script, so embedding it inside python would probably require some work. Not sure if it is worth it, but we can think about it.
from nuttx.
Comment moved to #1834
from nuttx.
Oops, thought I was on the PR, I'll move the comments there
from nuttx.
Hi guys, we made some progress and posted it here:
#1954
Basically, we collected the list of authors and companies that have not signed the agreement. So the next step is to contact them via email and get them to sign the agreement.
My questions are the following:
- Is there an email template for contacting the authors?
- Where do we return the signed ICLA to? Is there somebody from Apache Foundation to collect and verify them?
from nuttx.
ICLAs are emailed to secretary@apache.org; see https://www.apache.org/licenses/contributor-agreements.html
from nuttx.
@justinmclean Thanks! One more question: how would you normally contact companies to get their SGA signed? Do you contact people you know from the company to get introduced? What department is normally responsible for this?
For other authors, shall we just send automated emails to contact them?
from nuttx.
@justinmclean One more question: shall we ask authors to send their ICLA directly to secretary@apache.org? Will someone from the Apache Secretary's office process the mails, update the list, and sync with us on the author list?
from nuttx.
I think this issue can be closed:
- It is inactive. There have been no comments since 2020
- NuttX has since graduated to a TLP so all IP clearance issues must have been resolved.
If there is something I am missing please just re-open.
from nuttx.