This is a hacky little tool I wrote to parse Linux kernel commits, with security fixes in mind.
Lica allows you to parse a Linux repository's commit history, filtering for fixes and looking for specific keywords.
I've included some statistics in the output and a naive search for patch coverage if you give it some local kernel sources.
I wrote more about the motivation behind this tool over on my blog, in the post Analysing Linux Kernel Commits.
The --help output details the available arguments, and should run out of the box with the requirements.txt:
$ python3 lica/core.py --help
.____ .__
| | |__| ____ _____
| | | |/ ___\\__ \
| |___| \ \___ / __ \_
|_______ \__|\___ >____ /
\/ \/ \/
- some kind of tool to analyse Linux kernel commits.
usage: core.py [-h] [--since [SINCE]] [--release [RELEASE]] [--backports [BACKPORTS]] [--token [TOKEN]]
[--repo [REPO]]
Lica is a somewhat configurable tool for analysing Linux kernel commits.
options:
-h, --help show this help message and exit
--since [SINCE] How many days back to search commit history?
--release [RELEASE] Which kernel release to analyse? Major.Minor E.g. latest, '6.1', '5.15' etc.
--backports [BACKPORTS]
Do you want to check to see if commits were backported? See config.py's
'coverage_list' config for details.
--token [TOKEN] Specify your GitHub API token for increased limits.
--repo [REPO] Specify the GitHub repository you'd like to query over the API (only tested for
gregkh/linux)
$ python3 core.py --token my_github_token --backports /mnt/black/Kernels/ --since 180
I apologise in advance for this lol, but to configure Lica I went for a quick and dirty dictionary-based approach.
In config.py you can basically define your own dictionaries for filters and configurations. I've included comments for both the filter/keyword dictionaries and the overarching configuration dictionary, as well as examples. Hopefully it's not too awkward.
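To give a rough feel for the dictionary-based approach, here's a minimal sketch of a keyword filter and how it might be applied to a commit message. The names `example_filter` and `matches_filter`, and the dictionary keys, are made up for illustration; the comments in config.py describe the real layout.

```python
import re

# Hypothetical filter dictionary; the real keys are documented in config.py.
example_filter = {
    "name": "memory-safety",
    "keywords": ["use-after-free", "out-of-bounds", "overflow"],
    "require_fixes_tag": True,
}


def matches_filter(message, filt):
    """Return the keywords from filt that appear in a commit message."""
    # Optionally skip commits that carry no "Fixes: " trailer line.
    if filt.get("require_fixes_tag") and not re.search(r"^Fixes: ", message, re.MULTILINE):
        return []
    lowered = message.lower()
    return [kw for kw in filt["keywords"] if kw in lowered]


sample = 'net: fix use-after-free in foo_release()\n\nFixes: 0123456789ab ("net: add foo")\n'
print(matches_filter(sample, example_filter))
```

Keeping filters as plain dictionaries means adding a new one is just a case of copying an existing entry and swapping the keywords.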
Okay, so long story short, initially I wrote this to manually parse the output of running git log on a local repository. Then I realised, oof, I might want to actually push this code. Surely there's a better way?
Then I remembered APIs were a thing and, lo and behold, GitHub has an API for just this. So I reimplemented everything using PyGithub, only to find out that not only is it slower, but you can also get rate limited pretty easily without setting up a token.
I played around briefly with trying to cache results (e.g. requests_cache), but didn't get anywhere. So if this gets any use I'll look more into that, or just reimplement an option to query a local repository instead.
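For what it's worth, a local-repository option could look something like the sketch below. This is not how Lica currently works; `parse_log` and `local_commits` are hypothetical helpers, and the idea is just to ask git log for machine-friendly separators rather than scraping its default output.

```python
import subprocess

# Separators that are very unlikely to appear in commit text.
FIELD, RECORD = "\x1f", "\x1e"
# %H = hash, %aI = author date (ISO 8601), %s = subject, %b = body.
LOG_FORMAT = "%H%x1f%aI%x1f%s%x1f%b%x1e"


def parse_log(raw):
    """Split raw git log output (in LOG_FORMAT) into a list of dicts."""
    commits = []
    for record in raw.split(RECORD):
        record = record.strip("\n")
        if not record:
            continue
        sha, date, subject, body = record.split(FIELD, 3)
        commits.append({"sha": sha, "date": date, "subject": subject, "body": body})
    return commits


def local_commits(repo_path, since_days):
    """Run git log on a local checkout and return the parsed commits."""
    result = subprocess.run(
        [
            "git", "-C", repo_path, "log",
            f"--since={since_days} days ago",
            f"--pretty=format:{LOG_FORMAT}",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_log(result.stdout)
```

Calling `local_commits("/path/to/linux", 180)` would then cover roughly the same window as `--since 180`, without touching the API or its rate limits.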
I haven't spent a lot of time on this tool, but there's plenty of scope to improve and expand upon it. I'll chuck some ideas here.
- add comments
Contributions are welcome!