alexcourouble / email2git

Matching Commits with Their Mailing List Discussions

Home Page: http://mcis.polymtl.ca/~courouble/email2git/

License: GNU General Public License v3.0

Language: Python 100.00%

Topics: linux, opensource, git, patchwork, discussion

email2git's Introduction

email2git

Background Information

The Linux project's email-based reviewing process is highly effective in filtering open source contributions on their way from mailing list discussions towards Linus Torvalds' Git repository. However, once integrated, it can be difficult to link Git commits back to their review comments in mailing list discussions, especially when considering commits that underwent multiple versions (and hence review rounds), that belong to a multi-patch series, or that were cherry-picked.

To address these and other issues, we created email2git, a patch retrieval system built for the Linux kernel. For a given commit, the tool can find the email patch as well as the email conversation that took place during the review process.

This repository contains the scripts used to retrieve the original email patch and discussion that introduced Linux commits. The data comes from two different sources: the Linux Git repository and Patchwork. Patchwork tracks the patches sent to a mailing list and organises the review in a user-friendly manner.

The commit-patch matches are accessible here.

Using Email2git for your Project

If your project uses an email-based contribution workflow, you can use email2git to provide your community with better context around your project's commits.

These are the steps I followed to run email2git on the Linux kernel. Keep in mind that the kernel has more than 700k commits, which is a large amount of data to parse. The steps below were designed to speed up the matching process.

The process is composed of two major steps:

Email subject matching:

This step leverages the patch subject / commit summary convention. Depending on the Linux subsystem, the email subject is often reused as the commit summary when the patch makes it into the main Linux tree.

The scripts used to prepare the data are in subject_data_gen.

commit_subject_generator.py reads git log output, and pwSubjectFull.py reads a data dump from the Patchwork database.
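As a rough illustration of the commit side of this preparation, the sketch below parses git log output into (commit id, summary) pairs. The format string `--pretty=format:%H%x09%s` (full hash, a tab, then the subject line) and the function name `parse_git_log` are assumptions made for this example; the input format that commit_subject_generator.py actually expects may differ.

```python
def parse_git_log(log_text):
    """Parse the output of: git log --pretty=format:%H%x09%s

    Each non-empty line is expected to be "<commit hash>\t<summary>".
    This layout is an assumption for the example, not the repository's format.
    """
    pairs = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the summary survive.
        commit_id, _, summary = line.partition("\t")
        pairs.append((commit_id, summary.strip()))
    return pairs


# Hypothetical sample data, not taken from the Linux tree.
sample = "a1b2c3\tnet: fix null dereference in foo_probe\nd4e5f6\tdocs: update maintainers entry\n"
commit_subjects = parse_git_log(sample)
```

Writing the hash and the summary on one line keeps the dump easy to stream, which matters when the log covers several hundred thousand commits.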

These two scripts will generate the datasets used by subject_matching.py to execute the subject matching.
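The idea behind subject matching can be sketched as follows. This is a minimal illustration, not the code in subject_matching.py: the function names, the normalisation rules (dropping leading bracketed tags such as `[PATCH v2 3/7]`, lowercasing, collapsing whitespace), and the input shapes are all assumptions made for the example.

```python
import re
from collections import defaultdict

# One or more leading bracketed tags, e.g. "[PATCH v2 3/7]" or "[RFC]".
_TAG_RE = re.compile(r"^\s*(\[[^\]]*\]\s*)+")


def normalize_subject(subject):
    """Strip leading bracketed tags, lowercase, and collapse whitespace."""
    subject = _TAG_RE.sub("", subject)
    return " ".join(subject.lower().split())


def match_subjects(commits, patches):
    """Map each commit id to the patch ids that share its normalized subject.

    commits: iterable of (commit_id, summary)
    patches: iterable of (patch_id, email_subject)
    """
    by_subject = defaultdict(list)
    for patch_id, subject in patches:
        by_subject[normalize_subject(subject)].append(patch_id)

    matches = {}
    for commit_id, summary in commits:
        key = normalize_subject(summary)
        if key in by_subject:
            # Several versions of a patch (v1, v2, ...) can share a subject,
            # so every candidate is kept.
            matches[commit_id] = list(by_subject[key])
    return matches
```

Indexing the patch subjects in a dictionary means each commit needs a single lookup rather than a scan over every patch, which is one way to keep the matching tractable at kernel scale.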

Targeted lines-based matching:

This step reads the author and the files touched by each patch and commit to make targeted patch/Git diff line comparisons.

In this step, we also have to generate the necessary data with the scripts in line_data_prep. Once the data is generated, we run email2git.py to generate the list of matches.
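A targeted comparison of this kind could look like the sketch below. It is an illustration under stated assumptions, not the logic of email2git.py: the dictionary keys (`id`, `author`, `files`, `diff`), the Jaccard similarity measure, and the 0.8 threshold are all hypothetical choices made for the example.

```python
def changed_lines(diff_text):
    """Collect the added and removed lines of a unified diff as (sign, text) pairs.

    Simplification: any line starting with "+++" or "---" is treated as a
    file header and skipped.
    """
    lines = set()
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            content = line[1:].strip()
            if content:
                lines.add((line[0], content))
    return lines


def line_similarity(patch_diff, commit_diff):
    """Jaccard similarity between the changed lines of two diffs."""
    a = changed_lines(patch_diff)
    b = changed_lines(commit_diff)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_candidate(patch, commit):
    """Compare diffs only when the author matches and a file is shared."""
    same_author = patch["author"] == commit["author"]
    shared_files = set(patch["files"]) & set(commit["files"])
    return same_author and bool(shared_files)


def best_match(commit, patches, threshold=0.8):
    """Return the id of the most similar candidate patch, or None."""
    best_id, best_score = None, 0.0
    for patch in patches:
        if not is_candidate(patch, commit):
            continue  # skip the expensive line comparison
        score = line_similarity(patch["diff"], commit["diff"])
        if score > best_score:
            best_id, best_score = patch["id"], score
    return best_id if best_score >= threshold else None
```

Filtering on author and files first restricts the line-by-line comparison to a small set of plausible pairs, which is what makes the comparison "targeted".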

email2git's People

Contributors

couralex6

email2git's Issues

Running email2git

Hi,

I'd like to run email2git on my local machine. Many resources are hard-coded in the Python files, and the exact steps to run email2git remain unclear to me. What are the steps, and in what order, to run email2git?

What version of Python will I have to use? There are some lines in the code that look like they are valid in neither Python 2 nor Python 3, such as the following in email2git.py:
print "========================= ", + len(MATCHED_CID) + ," ========================="

Thanks
Ralf
