This is the starter code and tests for CSE 535 Fall 2014 Project 1
Repository: nicklondhe/newsindexer
According to one of the rules: "Any punctuation marks that possibly mark the end of a sentence (. ! ?) should be removed. Obviously if the symbol appears within a token it should be retained (a.out for example)."
Since opening and closing brackets "()" are also considered punctuation marks, the test case f'(x) => f(x) should not yield f(x). Ideally it should yield f(x
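A minimal sketch of one reading of the quoted rule, in which only the three listed marks (. ! ?) are stripped and only from the end of a token. The class and method names are hypothetical and not part of the starter code. Under this reading brackets are left alone, so whether they should also count as punctuation remains the open question for the graders.

```java
public class SentencePunctuation {
    // Strips trailing '.', '!' and '?' from a token.
    // Occurrences inside the token (as in "a.out") are kept.
    public static String stripSentenceEnd(String token) {
        int end = token.length();
        while (end > 0 && ".!?".indexOf(token.charAt(end - 1)) >= 0) {
            end--;
        }
        return token.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(stripSentenceEnd("test."));  // test
        System.out.println(stripSentenceEnd("a.out"));  // a.out
        System.out.println(stripSentenceEnd("f(x)"));   // f(x)
    }
}
```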
While testing the indexer, the TokenStream is not updated in getAnalyzedTerm:
Analyzer analyzer = fact.getAnalyzerForField(FieldNames.CONTENT, stream);
while (analyzer.increment()) { } // stream is not updated
stream.reset();
return stream.next().toString();
As seen above, the stream is not updated and is used again directly. Is this a bug? Or should the analyzer work on the same object rather than create a new TokenStream internally? I am a bit confused here.
On UBlearns, you said the following should be the ideal case. Please let me know whether the change will be incorporated into the final version you use for testing.
The indexer test works properly with this code:
Analyzer analyzer = fact.getAnalyzerForField(FieldNames.CONTENT, stream);
stream = analyzer.getStream();
while (analyzer.increment()) { }
stream.reset();
return stream.next().toString();
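The difference between the two versions can be sketched with two hypothetical analyzers, one that builds a new token list and one that rewrites the caller's list in place. None of these classes exist in the starter code, and a plain List stands in for TokenStream. The sketch only shows why a test that ignores getStream() sees unmodified tokens when the analyzer copies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AnalyzerSketch {
    // Hypothetical analyzer that lowercases tokens into a NEW list.
    static class CopyingAnalyzer {
        private final List<String> output = new ArrayList<>();

        CopyingAnalyzer(List<String> input) {
            for (String t : input) {
                output.add(t.toLowerCase());
            }
        }

        List<String> getStream() {
            return output;
        }
    }

    // Caller keeps using its own list: the analysis result is lost.
    public static String firstWithoutGetStream(String token) {
        List<String> stream = new ArrayList<>(Arrays.asList(token));
        new CopyingAnalyzer(stream);
        return stream.get(0);
    }

    // Caller re-reads the stream from the analyzer: the result is visible.
    public static String firstWithGetStream(String token) {
        List<String> stream = new ArrayList<>(Arrays.asList(token));
        stream = new CopyingAnalyzer(stream).getStream();
        return stream.get(0);
    }

    // Hypothetical in-place variant: the caller's list itself is rewritten.
    public static String firstInPlace(String token) {
        List<String> stream = new ArrayList<>(Arrays.asList(token));
        stream.replaceAll(String::toLowerCase);
        return stream.get(0);
    }
}
```

If analyzers are required to work in place, the original test is fine as written; if they may copy, the test needs the getStream() call.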
The directory browsing is done incorrectly; test and fix.
For the test case "email is [email protected]", the @ symbol is used to split the token, while in the test case "a+b-c" we are just dropping the special characters.
Please confirm what the tests expect.
I believe your intersect method is modifying the original index hashmaps: on the third iteration the occurrences are 4 for docs 2, 3, and 4 when they should be 3.
Given docs:
String[] strs = {"new home sales top sales forecasts", "home sales rise in july",
"increase in home sales in july", "july new home sales rise"};
Query: "sales", "home", "july"
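A minimal sketch of an intersection that leaves its inputs untouched, using the four documents and the query from this report. The class, the method names, and the docId-to-count map layout are assumptions made for illustration; the test's actual intersect method may be structured differently.

```java
import java.util.Map;
import java.util.TreeMap;

public class IntersectSketch {
    // Builds docId -> occurrence count for one term (doc ids start at 1).
    public static Map<Integer, Integer> postings(String[] docs, String term) {
        Map<Integer, Integer> result = new TreeMap<>();
        for (int i = 0; i < docs.length; i++) {
            for (String tok : docs[i].split(" ")) {
                if (tok.equals(term)) {
                    result.merge(i + 1, 1, Integer::sum);
                }
            }
        }
        return result;
    }

    // Returns a NEW map with the docs present in both inputs and summed counts.
    // Neither a nor b is modified.
    public static Map<Integer, Integer> intersect(Map<Integer, Integer> a,
                                                  Map<Integer, Integer> b) {
        Map<Integer, Integer> out = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                out.put(e.getKey(), e.getValue() + other);
            }
        }
        return out;
    }

    // Intersects the postings of all query terms, left to right.
    public static Map<Integer, Integer> query(String[] docs, String... terms) {
        Map<Integer, Integer> acc = postings(docs, terms[0]);
        for (int i = 1; i < terms.length; i++) {
            acc = intersect(acc, postings(docs, terms[i]));
        }
        return acc;
    }

    public static void main(String[] args) {
        String[] strs = {"new home sales top sales forecasts", "home sales rise in july",
                "increase in home sales in july", "july new home sales rise"};
        System.out.println(query(strs, "sales", "home", "july")); // {2=3, 3=3, 4=3}
    }
}
```

Because intersect writes only to a fresh map, repeating the query any number of times gives the same counts.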
In the method validateAuthorOrg:
When the value of authorOrg is null, the assertNull always fails because assertNull checks the String array reference rather than the value it holds.
The code is as follows:
private void validateAuthorOrg(Document d, int count) {
    String authorOrg = authororgs[count];
    if (authorOrg == null) {
        assertNull(d.getField(FieldNames.AUTHORORG));
    } else {
        assertEquals(authorOrg,
                d.getField(FieldNames.AUTHORORG)[0]);
    }
}
The corrected code is as follows:
private void validateAuthorOrg(Document d, int count) {
    String authorOrg = authororgs[count];
    if (authorOrg == null) {
        assertNull(d.getField(FieldNames.AUTHORORG)[0]);
    } else {
        assertEquals(authorOrg,
                d.getField(FieldNames.AUTHORORG)[0]);
    }
}
The same bug applies to the validateAuthor method as well.
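The distinction between a null array and an array holding a null can be sketched as follows. The helper name isMissing is hypothetical, and how getField actually represents a missing field depends on each Document implementation, which this report does not show. Note that indexing with [0] throws a NullPointerException if getField returns null itself, so a check that tolerates both representations may be safer.

```java
public class NullFieldSketch {
    // Treats a field as missing when the array is null, empty,
    // or holds null as its first element.
    public static boolean isMissing(String[] field) {
        return field == null || field.length == 0 || field[0] == null;
    }

    public static void main(String[] args) {
        String[] holdsNull = new String[] { null };
        System.out.println(holdsNull == null);     // false: the array reference exists
        System.out.println(holdsNull[0] == null);  // true: only the element is null
        System.out.println(isMissing(null));       // true
        System.out.println(isMissing(holdsNull));  // true
    }
}
```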
The method setupIndex() in IndexerTest is annotated with @BeforeClass and hence must be a static method.
Also, the variable reader will have to be made static so that it can be accessed inside setupIndex().
//null at end
assertFalse(stream.hasNext());
assertNull(stream.getCurrent());
TokenStreamTest getCurrent()
while (stream.hasNext()) {
    tNext = stream.next();
    for (int i = 0; i < 5; i++) {
        tCurrent = stream.getCurrent();
        assertTrue(stream.hasNext());
        assertEquals(tNext, tCurrent);
    }
}
At some point tNext points to the last element, so assertTrue(stream.hasNext()); will fail.
If our delimiter is whitespace, then the last token should be "test.", not "test"; it is the responsibility of SymbolTokenFilter to remove the ".".
There are 2 bugs in testGetCurrent(), or my understanding of the getCurrent method is wrong.
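The hasNext() failure on the last token can be reproduced with a small stand-in stream. MiniStream is hypothetical and encodes just one possible reading of the intended semantics: getCurrent() returns whatever the most recent next() returned, and next() returns null once the tokens are exhausted. The project's TokenStream contract may differ.

```java
import java.util.Arrays;
import java.util.List;

public class MiniStream {
    private final List<String> tokens;
    private int pos = 0;            // index of the next token to hand out
    private String current = null;  // result of the most recent next() call

    public MiniStream(String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    public boolean hasNext() {
        return pos < tokens.size();
    }

    public String next() {
        current = hasNext() ? tokens.get(pos++) : null;
        return current;
    }

    public String getCurrent() {
        return current;
    }

    public static void main(String[] args) {
        MiniStream stream = new MiniStream("this", "is", "a", "test.");
        while (stream.hasNext()) {
            String tNext = stream.next();
            // After the last token is consumed, hasNext() is false here,
            // so an unconditional assertTrue(stream.hasNext()) would fail.
            System.out.println(tNext + " hasNext=" + stream.hasNext());
        }
    }
}
```

Under this reading, the loop in the test would need to skip the hasNext() assertion on the final token, and getCurrent() only becomes null after one more next() call.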
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Ljava.util.HashMap;
at edu.buffalo.cse.irf14.index.test.IndexerTest.prepareIndex(IndexerTest.java:247)
at edu.buffalo.cse.irf14.index.test.IndexerTest.testQuery(IndexerTest.java:173)
Is anyone else getting this?
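The report does not show line 247 of IndexerTest, so the cause in that file is unknown. One common way to get this exact exception is casting the result of the no-argument toArray(), whose runtime type is Object[], to HashMap[]. The sketch below reproduces that case and shows the typed toArray overload as a possible alternative; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class ArrayCastSketch {
    // toArray(T[]) returns an array whose runtime type is HashMap[],
    // so no cast from Object[] is needed.
    @SuppressWarnings("unchecked")
    public static HashMap<String, Integer>[] toTypedArray(List<HashMap<String, Integer>> maps) {
        return maps.toArray(new HashMap[0]);
    }

    // Returns true if casting the Object[] from toArray() throws ClassCastException.
    @SuppressWarnings("rawtypes")
    public static boolean plainCastFails(List<HashMap<String, Integer>> maps) {
        try {
            Object[] raw = maps.toArray();   // runtime type is Object[]
            HashMap[] bad = (HashMap[]) raw; // throws for an ArrayList
            return bad == null;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<HashMap<String, Integer>> maps = new ArrayList<>();
        maps.add(new HashMap<>());
        System.out.println(plainCastFails(maps));       // true
        System.out.println(toTypedArray(maps).length);  // 1
    }
}
```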