From Asad Sayeed's statistical NLP course at the University of Gothenburg.
My name: Konstantinos Peratinos
The only meaningful additional information is that the result.txt files contain two extra lines before every matrix: a line with the class of the folder and a line with the number of lines that the matrix takes up. This was done in order to parse the files successfully with simdoc.
Files were named resultX.txt, where X is the question number.
Simdoc needs a bit of fine-tuning to work with uneven matrices, but I ran out of time.
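
The result-file layout described above (a class line, a row-count line, then the matrix rows) could be read roughly as follows. This is only a sketch and not the actual simdoc code: the function name `parse_result_file`, the whitespace-separated numeric rows, and the sample class labels are assumptions made for illustration.

```python
def parse_result_file(text):
    """Parse repeated blocks of: class label, row count, then that many matrix rows.

    Assumption: matrix rows are whitespace-separated numbers.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    matrices = {}
    i = 0
    while i < len(lines):
        label = lines[i].strip()      # first extra line: class of the folder
        n_rows = int(lines[i + 1])    # second extra line: number of matrix lines
        block = lines[i + 2 : i + 2 + n_rows]
        matrices[label] = [[float(x) for x in ln.split()] for ln in block]
        i += 2 + n_rows               # jump to the next class block
    return matrices


# Hypothetical miniature file with two classes.
sample = "grain\n2\n1 0 2\n0 1 1\ncrude\n1\n3 0 0\n"
parsed = parse_result_file(sample)
```

Storing the row count ahead of each matrix means the reader never has to guess where one class ends and the next begins.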
I did not think about it too much; I simply chose a shortened vocabulary with -B = 40.
File names | Grain-Grain | Crude-Crude | Crude-Grain | Grain-Crude | Notes |
---|---|---|---|---|---|
result1.txt | - | - | - | - | Could not calculate due to different dimensions of the classes |
result2.txt | 0.297 | 0.396 | 0.162 | 0.162 | |
result3.txt | - | - | - | - | Could not calculate due to different dimensions of the classes |
result4.txt | 0.226 | 0.248 | 0.171 | 0.171 | |
result5.txt | 0.214 | 0.270 | 0.056 | 0.052 | |
result6.txt | - | - | - | - | Could not calculate due to different dimensions of the classes |
result7.txt | 0.094 | 0.097 | 0.092 | 0.092 | |
result8.txt | 0.11 | 0.10 | 0.07 | 0.07 | Could not calculate due to different dimensions of the classes; truncating to 1000 did not make them uniform |
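
To illustrate why classes with different dimensions cannot be compared, here is a minimal sketch of one way a class-to-class score could be computed: the mean cosine similarity over all pairs of document vectors. This is an assumption about the general approach, not a description of what simdoc actually does, and the names `cosine` and `mean_similarity` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; fails if their lengths differ."""
    if len(u) != len(v):
        raise ValueError("vectors have different dimensions")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # treat all-zero vectors as dissimilar


def mean_similarity(class_a, class_b):
    """Average cosine similarity over every (row of class_a, row of class_b) pair."""
    sims = [cosine(u, v) for u in class_a for v in class_b]
    return sum(sims) / len(sims)


# Hypothetical toy matrices: one row per document, one column per feature.
grain = [[1.0, 0.0], [0.0, 1.0]]
crude = [[1.0, 0.0]]
score = mean_similarity(grain, crude)
```

If the two matrices have different numbers of columns, `cosine` raises a `ValueError`, which is the same kind of dimension mismatch noted in the table.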