

bioSandMan avatar bioSandMan commented on May 31, 2024

Doing some basic testing, it looks like this has been happening since you commented out line 42 in uspto/util/client.py.

from uspto-opendata-python.

amotl avatar amotl commented on May 31, 2024

Dear Christopher,

thanks for writing in. We can confirm the erratic behavior you are observing:

uspto-peds search 'appExamName:WILSON, NICHOLAS R' | jq '.numFound'
451438

Introduction

The change f07ebef you are referring to, which removes the mm parameter from the HTTP request, came from #7 and was introduced only recently. From the behavior you are observing, it seems to have an unfortunate side effect.

Investigation

As the change was made in a hurry in order to support @rahul-gj, it's on me that I did not check the meaning of this parameter first. Having now looked into the documentation about the Lucene/Solr DisMax query parser, we should take these details about the mm (Minimum Should Match) parameter into consideration:

  1. While the documentation on the DisMax query parser says that

The default value of mm is 100% (meaning that all clauses must match).

  2. An answer on Stack Overflow (solr-mm-parameter-of-dismax-parser) says that

If no mm parameter is specified in the query, or as a default in solrconfig.xml, the effective value of the q.op parameter (either in the query, as a default in solrconfig.xml, or from the 'defaultOperator' option in schema.xml) is used to influence the behavior.
So, the default behavior of the mm is determined by q.op parameter. If q.op is effectively AND, then mm=100%; if q.op is OR, then mm=1.
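The fallback rule quoted above can be sketched as a small function. This is only an illustration of the quoted rule, not code from Solr or from uspto-opendata-python; the name effective_mm and its arguments are hypothetical.

```python
from typing import Optional


def effective_mm(mm: Optional[str], q_op: str = "OR") -> str:
    """Return the mm value implied by the rule quoted in this thread.

    Hypothetical helper: it mirrors the quoted Stack Overflow answer only
    and does not query a real Solr instance.
    """
    if mm is not None:
        # An explicitly supplied mm always takes precedence.
        return mm
    # Without mm, the effective q.op decides: AND -> 100%, OR -> 1.
    return "100%" if q_op.strip().upper() == "AND" else "1"


for mm, q_op in [(None, "AND"), (None, "OR"), ("0%", "AND")]:
    print(mm, q_op, "->", effective_mm(mm, q_op))
```

Under this rule, dropping mm from the request hands the decision to whatever q.op the remote backend applies, which the client cannot see.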

Conclusion

So, we definitely should send the mm parameter from our end in order to control the query processing behavior of the Lucene/Solr query parser on the remote search backend. The appropriate value should be determined by the character of the query, or rather by the intention of the researcher, and should be populated in a way that adheres to "do what I mean" principles.

So, when implementing that, querying for numberlists with a search command like

uspto-peds search 'patentNumber:(6583088 6875727 8697602)'

should probably be handled a bit differently.

Thoughts

Let's see if we can a) determine the mm value heuristically from the query expression, b) add another command line option for designating it, or even c) just amend the documentation to propose an expression like

uspto-peds search 'patentNumber:(6583088 OR 6875727 OR 8697602)'

for querying numberlists - if this actually would be the right thing to do here.

If c) would fit the bill, we might even be able to set mm back to its former value of 100% in order to solve your issue while still keeping @rahul-gj happy.
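For option c), a caller using the library from Python could build the OR-joined expression with a small helper. This is a minimal sketch; numberlist_expression is a hypothetical name and not part of uspto-opendata-python.

```python
from typing import Iterable


def numberlist_expression(field: str, numbers: Iterable[str]) -> str:
    """Join document numbers with explicit OR operators.

    Hypothetical helper for illustration. With explicit ORs, the query
    should not depend on how the backend treats bare whitespace.
    """
    cleaned = [str(n).strip() for n in numbers if str(n).strip()]
    if not cleaned:
        raise ValueError("at least one number is required")
    return "{0}:({1})".format(field, " OR ".join(cleaned))


expression = numberlist_expression(
    "patentNumber", ["6583088", "6875727", "8697602"]
)
print(expression)  # patentNumber:(6583088 OR 6875727 OR 8697602)
```

The resulting string has the same shape as the command line example above and could be passed to the search call unchanged.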

Thanks again for reporting this to us.

With kind regards,
Andreas.


amotl avatar amotl commented on May 31, 2024

After some more investigation, we want to share that a query like this will always return the correct number of results, regardless of the mm value:

uspto-peds search 'appExamName:"WILSON, NICHOLAS R"' | jq '.numFound'
269

The user interface at https://ped.uspto.gov/peds/#/search behaves the same way: it adds quotes around the search string "WILSON, NICHOLAS R" so that it is matched verbatim. By the way, the user interface currently always sets mm=0%.


bioSandMan avatar bioSandMan commented on May 31, 2024

Based on your observations I found this works in my case where I am taking advantage of the internal classes:

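from uspto.peds.client import UsptoPatentExaminationDataSystemClient

name = "WILSON, NICHOLAS R"
client = UsptoPatentExaminationDataSystemClient()
# Wrap the name in double quotes so it is matched as a verbatim phrase.
expression = 'appExamName:"{0}"'.format(name)
result = client.search(expression)
result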

Note the double quotes around the variable for the format expression.
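If the value could itself contain a double quote or a backslash, the quoting could be wrapped in a helper that escapes those characters first. This is a sketch under that assumption; phrase_expression is a hypothetical name and not part of the library, and the backslash escaping follows the usual Lucene query syntax convention.

```python
def phrase_expression(field: str, value: str) -> str:
    """Build a field:"value" phrase query.

    Hypothetical helper: backslashes and double quotes inside the value
    are escaped so they cannot terminate the phrase early.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '{0}:"{1}"'.format(field, escaped)


print(phrase_expression("appExamName", "WILSON, NICHOLAS R"))
# appExamName:"WILSON, NICHOLAS R"
```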

A change request to the code may not be necessary for the sake of @rahul-gj but perhaps a warning or something in the user doc?

I appreciate you looking into this.


bioSandMan avatar bioSandMan commented on May 31, 2024

The usage of quotes around the search expression has been working. Thanks again!


amotl avatar amotl commented on May 31, 2024

The usage of quotes around the search expression has been working. Thanks again!

Thanks for letting me know. I've split off #11 and #12 from here. Thanks likewise!

