Create a way to see how many hits there are on a context outside of the current subset (this counter is already in the dataset).
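One possible way to derive that number, as a rough sketch only: it assumes the counter in the dataset is a per-context total, and that the current subset can be listed as a sequence of context values. The names `total_hits`, `subset_contexts` and `hits_outside_subset` are made up for illustration.

```python
from collections import Counter


def hits_outside_subset(total_hits, subset_contexts):
    """Return, per context seen in the subset, the hits that fall outside it.

    total_hits:      dict mapping context -> overall hit count (assumed to be
                     the counter that is already in the dataset)
    subset_contexts: iterable of context values, one per hit in the subset
    """
    in_subset = Counter(subset_contexts)
    return {
        # Clamp at zero in case the stored counter lags behind the subset.
        ctx: max(total_hits.get(ctx, 0) - count, 0)
        for ctx, count in in_subset.items()
    }
```

The result could be shown next to each context so it is clear whether pivoting on it would bring in anything new.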
I think it would be wise to stop adding hashes to contexts if there is no filename: this just takes extra space and creates noise in the context fields.
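A minimal sketch of that guard, assuming a context is a plain dict with a list of hash entries. The structure and the name `add_hash_to_context` are hypothetical and not taken from the existing code.

```python
def add_hash_to_context(context, file_hash, filename):
    """Attach a hash to a context only when a filename is known.

    Returns True if the hash was added, False if it was skipped.
    """
    # No filename (None, empty or whitespace only): skip, to avoid noise.
    if not filename or not filename.strip():
        return False
    context.setdefault("hashes", []).append(
        {"hash": file_hash, "filename": filename}
    )
    return True
```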
Add the info to the DB while parsing the file the first time; this would save a lot of going back to the samples.
Additionally, there would probably be a lot of contexts to pivot on.
This is only necessary if it makes the retrieval fast(er). The idea is that it might additionally facilitate incremental updates to the graph as results come in.
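Storing the info during the first parse could look roughly like the sketch below. It uses SQLite purely as a stand-in for whatever DB is actually in use, and the table `sample_info`, the `key=value` line format and the function `parse_and_store` are all assumptions made for the example.

```python
import sqlite3


def parse_and_store(conn, sample_id, lines):
    """Parse 'key=value' lines once and store them, returning the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sample_info "
        "(sample_id TEXT, key TEXT, value TEXT)"
    )
    rows = []
    for line in lines:
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that do not contain '='
            rows.append((sample_id, key.strip(), value.strip()))
    # The connection as a context manager commits on success and rolls back
    # on error, so one file is stored in one transaction.
    with conn:
        conn.executemany("INSERT INTO sample_info VALUES (?, ?, ?)", rows)
    return len(rows)
```

Because rows are written as each file is parsed, later lookups (and possibly incremental graph updates) would query the table instead of reopening the sample.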