Comments (6)
After some basic testing, it looks like this started happening when line 42 in uspto/util/client.py was commented out.
from uspto-opendata-python.
Dear Christopher,
thanks for writing in. We can confirm the erratic behavior you are observing:
uspto-peds search 'appExamName:WILSON, NICHOLAS R' | jq '.numFound'
451438
Introduction
The change f07ebef you are referring to, which removes the mm parameter from the HTTP request, came from #7 and was introduced just recently. From the behavior you are observing, I recognize that it seems to have an unfortunate side effect.
Investigation
As the change was made in a hurry in order to support @rahul-gj, I recognize that I should have checked the meaning of this parameter first. Having now looked into the documentation of the Lucene/Solr DisMax query parser, we should take these details about the mm (Minimum Should Match) parameter into consideration:
- While the documentation on DisMax query parser says that
The default value of mm is 100% (meaning that all clauses must match).
- An answer on stackoverflow (solr-mm-parameter-of-dismax-parser) says that
If no mm parameter is specified in the query, or as a default in solrconfig.xml, the effective value of the q.op parameter (either in the query, as a default in solrconfig.xml, or from the 'defaultOperator' option in schema.xml) is used to influence the behavior.
So, the default behavior of the mm is determined by q.op parameter. If q.op is effectively AND, then mm=100%; if q.op is OR, then mm=1.
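The fallback rule quoted above can be written down as a tiny function. This is only a sketch of the described behavior, not code from Solr or from this project: the name effective_mm and its arguments are made up for illustration, and the values are the ones given in the quoted answer.

```python
def effective_mm(mm=None, q_op="OR"):
    """Return the mm value that applies under the rule quoted above.

    Hypothetical helper: an explicit mm wins, otherwise q.op decides.
    """
    if mm is not None:
        return mm
    # Fallback as described in the quoted answer: AND -> 100%, OR -> 1.
    return "100%" if q_op.strip().upper() == "AND" else "1"
```

The point of the sketch is that a client which omits mm hands the decision to whatever q.op the backend happens to use.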
Conclusion
So, we definitely should send the mm parameter from our end in order to control the query processing behavior of the Lucene/Solr query parser on the remote search backend. The appropriate value should be determined by the nature of the query, or rather by the intention of the researcher, and should be chosen so that it adheres to "do what I mean" principles.
So, when implementing that, querying for numberlists with a search command like
uspto-peds search 'patentNumber:(6583088 6875727 8697602)'
should probably be handled a bit differently.
Thoughts
Let's see whether we a) can determine the mm value heuristically from the query expression, b) should add another command line parameter for designating it, or c) should just amend the documentation to propose an expression like
uspto-peds search 'patentNumber:(6583088 OR 6875727 OR 8697602)'
for querying numberlists - if this actually would be the right thing to do here.
If c) fits the bill, we might even be able to set mm back to its former value of 100% in order to solve your issue while still keeping @rahul-gj happy.
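For option c), the expression with explicit OR operators could also be generated instead of typed by hand. The following is a minimal sketch under that assumption; numberlist_expression is a hypothetical helper, not part of uspto-opendata-python, and whether the backend then returns the intended results for every mm value still needs to be verified.

```python
def numberlist_expression(field, numbers):
    """Build 'field:(a OR b OR c)' from a list of numbers.

    Hypothetical helper for illustration only.
    """
    # Normalize to strings and drop empty entries.
    terms = [str(number).strip() for number in numbers]
    terms = [term for term in terms if term]
    if not terms:
        raise ValueError("at least one number is required")
    return "{0}:({1})".format(field, " OR ".join(terms))


expression = numberlist_expression("patentNumber", [6583088, 6875727, 8697602])
print(expression)
```

The resulting string can be passed to the search command in place of the space-separated list.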
Thanks again for reporting this to us.
With kind regards,
Andreas.
After some more investigation, we want to share that a query like this will always return the correct number of results, regardless of the mm value.
uspto-peds search 'appExamName:"WILSON, NICHOLAS R"' | jq '.numFound'
269
The user interface at https://ped.uspto.gov/peds/#/search also behaves like that and adds quotes to the search string "WILSON, NICHOLAS R" to make it verbatim. By the way, the user interface currently always sets mm=0%.
Based on your observations I found this works in my case where I am taking advantage of the internal classes:
name = "WILSON, NICHOLAS R"
client = UsptoPatentExaminationDataSystemClient()
expression = 'appExamName:"{0}"'.format(name)
result = client.search(expression)
result
Note the double quotes around the variable for the format expression.
A change request to the code may not be necessary for the sake of @rahul-gj but perhaps a warning or something in the user doc?
I appreciate you looking into this.
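The quoting approach above can be wrapped in a small function that also copes with values containing double quotes or backslashes. This is a sketch only: phrase_expression is a hypothetical name, and the backslash escaping follows the general Lucene query syntax rather than anything confirmed for the PEDS backend.

```python
def phrase_expression(field, value):
    """Build 'field:"value"' so the value is matched as one phrase.

    Hypothetical helper for illustration only.
    """
    # Escape backslashes first, then embedded double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '{0}:"{1}"'.format(field, escaped)


expression = phrase_expression("appExamName", "WILSON, NICHOLAS R")
print(expression)
```

The result can be handed to client.search() exactly like the hand-built expression.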
The usage of quotes around the search expression has been working. Thanks again!
Thanks for letting me know. I've diverted #11 and #12 from here. Thanks likewise!