While working on the single-word pre-task and timing exact-match queries, I noticed incorrect behavior. Here are some logs I printed out to help set up the code correctly:
Query: aminophylin | Expected ED: 1 | Found: [('aminophyllin', 0, 0.9972)]
Query: amonul | Expected ED: 1 | Found: [('ammonul', 0, 0.9619)]
Query: aquaphylin | Expected ED: 1 | Found: [('aquaphyllin', 0, 0.9939)]
Query: aranon | Expected ED: 1 | Found: [('arranon', 0, 0.9619)]
Query: atna | Expected ED: 1 | Found: [('atnaa', 0, 0.96)]
Query: aved | Expected ED: 1 | Found: [('aveed', 0, 0.9533)]
Query: bacim | Expected ED: 1 | Found: [('baciim', 0, 0.9667)]
Query: brisdele | Expected ED: 1 | Found: [('brisdelle', 0, 0.9889)]
Query: bucal | Expected ED: 1 | Found: [('buccal', 0, 0.9611)]
Query: coper | Expected ED: 1 | Found: [('copper', 0, 0.9611)]
Query: davp | Expected ED: 1 | Found: [('ddavp', 0, 0.94)]
Query: duave | Expected ED: 1 | Found: [('duavee', 0, 0.9722)]
Query: dycil | Expected ED: 1 | Found: [('dycill', 0, 0.9722)]
Query: efexor | Expected ED: 1 | Found: [('effexor', 0, 0.9619)]
Query: emoquete | Expected ED: 1 | Found: [('emoquette', 0, 0.9889)]
Query: erycete | Expected ED: 1 | Found: [('erycette', 0, 0.9833)]
Query: evkeza | Expected ED: 1 | Found: [('evkeeza', 0, 0.9714)]
Query: ingreza | Expected ED: 1 | Found: [('ingrezza', 0, 0.9833)]
Query: inovar | Expected ED: 1 | Found: [('innovar', 0, 0.9619)]
Query: kimides | Expected ED: 1 | Found: [('kimidess', 0, 0.9875)]
Query: kwel | Expected ED: 1 | Found: [('kwell', 0, 0.96)]
Query: lunele | Expected ED: 1 | Found: [('lunelle', 0, 0.9762)]
Query: merem | Expected ED: 1 | Found: [('merrem', 0, 0.9611)]
Query: minipres | Expected ED: 1 | Found: [('minipress', 0, 0.9926)]
Query: mycapsa | Expected ED: 1 | Found: [('mycapssa', 0, 0.9833)]
Query: niki | Expected ED: 1 | Found: [('nikki', 0, 0.9533)]
Query: oral | Expected ED: 1 | Found: [('oral', 0, 1.0)]
Query: paladone | Expected ED: 1 | Found: [('palladone', 0, 0.9741)]
Query: pastile | Expected ED: 1 | Found: [('pastille', 0, 0.9833)]
Query: pellet | Expected ED: 1 | Found: [('pellet', 0, 1.0)]
Query: phexi | Expected ED: 1 | Found: [('phexxi', 0, 0.9667)]
Query: sebri | Expected ED: 1 | Found: [('seebri', 0, 0.9556)]
Query: shampo | Expected ED: 1 | Found: [('shampoo', 0, 0.981)]
Query: sula | Expected ED: 1 | Found: [('sulla', 0, 0.9533)]
Query: suprelin | Expected ED: 1 | Found: [('supprelin', 0, 0.9741)]
Query: talzena | Expected ED: 1 | Found: [('talzenna', 0, 0.9833)]
Query: vetids | Expected ED: 1 | Found: [('veetids', 0, 0.9619)]
Query: vivele | Expected ED: 1 | Found: [('vivelle', 0, 0.9762)]
Query: xaracol | Expected ED: 1 | Found: [('xaracoll', 0, 0.9875)]
Query: xidra | Expected ED: 1 | Found: [('xiidra', 0, 0.9556)]
{"avg_search_time_per_word": 9.558915955462466e-05, "false_postive_count": 40}
Average search time per word should be self-explanatory.
False positive count is the number of queries that were run as exact-match searches and returned a result with a reported edit distance of 0, even though the expected edit distance was 1.
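As a rough way to double-check this count independently of the search code, the reported distance can be compared against a plain Levenshtein distance. This is only a sketch: `levenshtein` and `count_false_positives` are hypothetical helper names, not functions from the project, and the result shape `(word, reported_distance, score)` is assumed from the logs above.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def count_false_positives(results):
    """Count results reported at distance 0 whose true distance is non-zero.

    `results` is assumed to be a list of (query, found) pairs, where `found`
    is a list of (word, reported_distance, score) tuples as in the logs.
    """
    return sum(
        1
        for query, found in results
        for word, reported, _score in found
        if reported == 0 and levenshtein(query, word) != 0
    )


# A few entries copied from the logs above.
sample = [
    ("aminophylin", [("aminophyllin", 0, 0.9972)]),
    ("oral", [("oral", 0, 1.0)]),
    ("xidra", [("xiidra", 0, 0.9556)]),
]
print(count_false_positives(sample))
```

On this sample, only the "oral" entry has a true distance of 0, so the other two are the ones counted.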
Looking at the queries and returned results, it appears that words with "doubled characters" (e.g. "ee", "aa", etc.) cause the search algorithm to treat the extra character as not contributing to the edit distance between the word and the query. Two of the logged queries (oral and pellet) returned the query itself with a score of 1.0, so those look like genuine exact matches rather than instances of this problem. This is bad! We should re-run all tasks completed during term 1 and re-evaluate any conclusions drawn from them.
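One quick way to test the doubled-character hypothesis against the logs is to collapse every run of repeated characters to a single character and see whether the query and the returned word then become identical. This is a diagnostic sketch only: `collapse_runs` is a hypothetical helper and says nothing about how the search algorithm actually works internally.

```python
from itertools import groupby


def collapse_runs(word: str) -> str:
    """Collapse each run of identical adjacent characters to one character."""
    return "".join(ch for ch, _run in groupby(word))


# (query, returned word) pairs copied from the logs above.
pairs = [
    ("atna", "atnaa"),
    ("aved", "aveed"),
    ("davp", "ddavp"),
    ("shampo", "shampoo"),
    ("xidra", "xiidra"),
]

for query, found in pairs:
    same = collapse_runs(query) == collapse_runs(found)
    print(f"{query!r} vs {found!r}: identical after collapsing runs = {same}")
```

If every mismatched pair in the logs becomes identical after collapsing, that is consistent with the doubled character being ignored somewhere in the matching or scoring step, although it does not show where.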