A5 extract_query_doc_features about uis-dat640-fall2020 HOT 7 OPEN

kbalog commented on August 14, 2024

A5 extract_query_doc_features

from uis-dat640-fall2020.

Comments (7)

BerntA commented on August 14, 2024

Replying to thek123's question about the A5 errata test (whether avg_TF_body for QUERY[2] and d5 should be 0.5 rather than 1.0):

I get 1.0 because I divide by the number of unique query terms, i.e., 2/2. Aren't we aggregating over the terms that appear in both the analyzed query and the document field?
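BerntA's reading can be sketched as follows. This is a minimal sketch, assuming the analyzed query and the body field are already plain lists of terms (no Elasticsearch calls); `avg_tf_over_matched_terms` is a hypothetical helper, not the assignment's `extract_query_doc_features`.

```python
from collections import Counter


def avg_tf_over_matched_terms(query_terms, field_terms):
    """Hypothetical helper: mean TF over the unique query terms that occur in the field."""
    tf = Counter(field_terms)  # term -> frequency in the document field
    matched = {t for t in query_terms if tf[t] > 0}  # unique query terms present in the field
    if not matched:
        return 0.0  # no overlap: define the average as 0
    return sum(tf[t] for t in matched) / len(matched)


query = ["t5", "t2"]          # QUERY[2], as quoted in the thread
body = "t1 t2 t3 t5".split()  # body of d5, as quoted in the thread
print(avg_tf_over_matched_terms(query, body))  # 2 / 2 = 1.0
```

Note that under this reading a query term missing from the field is simply ignored rather than counted as a zero.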

Dregster commented on August 14, 2024

Wondering about the same!

The expected feature_vect_q3_d3 is only correct if we divide by the query length, not by the number of unique query terms as above.
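To see when the denominator matters, here is a minimal sketch with a hypothetical query containing a repeated term and a hypothetical body field (neither is taken from the toy index); `avg_query_tf` and its `unique` flag are illustrative names, not part of the assignment code.

```python
from collections import Counter


def avg_query_tf(query_terms, field_terms, unique=False):
    """Hypothetical helper: average TF of query terms in a field.

    unique=False divides by the query length (every occurrence counts);
    unique=True counts each distinct query term once.
    """
    tf = Counter(field_terms)  # term -> frequency in the field
    terms = sorted(set(query_terms)) if unique else list(query_terms)
    if not terms:
        return 0.0
    return sum(tf[t] for t in terms) / len(terms)


query = ["t6", "t6", "t2"]    # hypothetical query with a repeated term
body = "t2 t2 t6 t1".split()  # hypothetical body field
print(avg_query_tf(query, body))               # (1 + 1 + 2) / 3
print(avg_query_tf(query, body, unique=True))  # (1 + 2) / 2 = 1.5
```

When no query term repeats (as with QUERY[2]), the two denominators coincide; they only diverge for queries with repeated terms.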

commented on August 14, 2024

Replying to Dregster:

But I'm not sure we can do that, since 'unique query term' is explicitly mentioned.

tlinjordet commented on August 14, 2024

Hello, please look at https://github.com/kbalog/ir-course/blob/master/assignments/A5_errata.md

thek123 commented on August 14, 2024

Hey, I think there is still an error in the new test you gave us in the A5 errata:
q2d5_features = extract_query_doc_features(analyze_query(es, QUERY[2], 'body'), 'd5', es, index='toy_index')
assert q2d5_features['unique_query_terms_in_body'] == 2
assert q2d5_features['avg_TF_body'] == 1.0
Here, q2d5_features['avg_TF_body'] should be 0.5, not 1.0. d5 is
('d5', {'title': 't2', 'body': 't1 t2 t3 t5'})
and QUERY[2] is
['t5', 't2']
so the sum of the query-term frequencies in the body is 2, and the total number of terms is 4, giving 2/4 = 0.5.
Am I missing something here?
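The two readings of avg_TF_body can be compared directly on the values quoted above. This is a minimal sketch, assuming whitespace-tokenized terms stand in for the Elasticsearch analyzer output.

```python
from collections import Counter

query = ["t5", "t2"]          # QUERY[2], as quoted above
body = "t1 t2 t3 t5".split()  # body field of d5, as quoted above
tf = Counter(body)            # term -> frequency in the body

matched_tf_sum = sum(tf[t] for t in query)  # 1 + 1 = 2

# Reading in the comment above: normalize by the document length.
per_doc_length = matched_tf_sum / len(body)  # 2 / 4 = 0.5

# The errata test expects 1.0, which matches averaging over the query terms.
per_query_term = matched_tf_sum / len(query)  # 2 / 2 = 1.0

print(per_doc_length, per_query_term)
```

So 0.5 is a length-normalized TF, while the test's 1.0 is the average TF per query term.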

thek123 commented on August 14, 2024

Replying to BerntA, who gets 1.0 by dividing by the number of unique query terms (2/2):

Should it be unique query terms, though?

Or else an aggregation function (sum, maximum, or average) over the term frequencies of each query term?

Or should it be over the frequencies (length) of the query?

ChristofferHolmesland commented on August 14, 2024

@thek123, on whether it should be unique query terms or the frequencies (length) of the query:

It's all query terms; check the last line of the errata.
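Following the answer above (all query terms), here is a minimal sketch of the aggregation step, assuming term lists in place of Elasticsearch term vectors and assuming that a query term absent from the field counts as TF 0; `aggregate_query_tf` and its `agg` parameter are hypothetical names.

```python
from collections import Counter


def aggregate_query_tf(query_terms, field_terms, agg="avg"):
    """Hypothetical helper: aggregate TF over all query terms.

    Repeated query terms count once per occurrence; terms absent
    from the field contribute TF = 0 (an assumption).
    """
    tf = Counter(field_terms)  # term -> frequency in the field
    values = [tf[t] for t in query_terms]  # one TF value per query term
    if not values:
        return 0.0
    if agg == "sum":
        return sum(values)
    if agg == "max":
        return max(values)
    if agg == "avg":
        return sum(values) / len(values)
    raise ValueError(f"unknown aggregation: {agg}")


query = ["t5", "t2"]          # QUERY[2]
body = "t1 t2 t3 t5".split()  # body of d5
for agg in ("sum", "max", "avg"):
    print(agg, aggregate_query_tf(query, body, agg))
```

For QUERY[2] and d5 this prints sum 2, max 1 and avg 1.0, consistent with the errata's expected avg_TF_body of 1.0.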
