
ip-tools / uspto-opendata-python


A client library for accessing the USPTO Open Data APIs, written in Python.

Home Page: https://docs.ip-tools.org/uspto-opendata-python/

License: MIT License

Languages: Python 97.35%, Makefile 2.65%
Topics: uspto, pair, patent, information, research, search, bulk-api, bulk-download, bulk-downloader, opendata

uspto-opendata-python's People

Contributors

amotl


uspto-opendata-python's Issues

Outdated dependencies

Can we update the dependencies?

Installation unnecessarily uninstalls newer versions of packages and installs older ones:

  Found existing installation: urllib3 1.24.1
    Uninstalling urllib3-1.24.1:
      Successfully uninstalled urllib3-1.24.1
  Found existing installation: idna 2.8
    Uninstalling idna-2.8:
      Successfully uninstalled idna-2.8
  Found existing installation: requests 2.21.0
    Uninstalling requests-2.21.0:
      Successfully uninstalled requests-2.21.0
  Found existing installation: lxml 4.3.1
    Uninstalling lxml-4.3.1:
      Successfully uninstalled lxml-4.3.1
  Found existing installation: beautifulsoup4 4.7.1
    Uninstalling beautifulsoup4-4.7.1:
      Successfully uninstalled beautifulsoup4-4.7.1
beautifulsoup4-4.6.0,  lxml-4.2.5 requests-2.18.4 urllib3-1.22

Problem with search fields

I searched the USPTO website for Amazon patents and got about 9,000 results. Using this library, I found only about 600.

I searched using this expression:
expression = 'firstNamedApplicant:(Amazon)'

Which other fields can be used? (I tried some other fields, but the search result didn't change.)

How to search by appEarlyPubNumber?

I tried the following:

from uspto.peds.client import UsptoPatentExaminationDataSystemClient

client = UsptoPatentExaminationDataSystemClient()

client.search('appEarlyPubNumber:(US 2006-0063272 A1)')
client.search('appEarlyPubNumber:(US 2006-0063272)')
client.search('appEarlyPubNumber:(2006-0063272 A1)')
client.search('appEarlyPubNumber:(2006-0063272 2017-0042821)')

All of them return:

{'numFound': 0,
 'start': 0,
 'docs': [],
 'metadata': {'indexLastUpdatedDate': 'Thu May 30 02:30:21 EDT 2019',
  'queryId': '9f12c1af-cb6b-4f8c-8e0e-97289ba404ec',
  'responseHeader': {'zkConnected': True, 'status': 0, 'QTime': 73}}}

I know the client has some issues, but searching by patent number works fine:

client.search('patentNumber:(6583088 6875727 8697602)')

Improve query expression documentation

Introduction

This is about writing query expressions properly.

Searching for names

uspto-peds search 'appExamName:"WILSON, NICHOLAS R"'

Note the quotes around the examiner name here.

-- #10 (comment)

Searching for (multiple) document numbers

  1. For querying number lists, propose an expression like the following (see also #10 (comment)):
     uspto-peds search 'patentNumber:(6583088 OR 6875727 OR 8697602)'
  2. Improve querying of number lists by providing an appropriate --numberlist= command line option.
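
A minimal sketch of the first proposal: it turns a list or a comma-separated string of document numbers into the OR expression shown above. The helper name `numberlist_expression` and its `field` parameter are hypothetical and not part of uspto-opendata-python.

```python
def numberlist_expression(numbers, field="patentNumber"):
    """Build a query expression such as 'patentNumber:(1 OR 2 OR 3)'.

    Hypothetical helper for illustration; not part of uspto-opendata-python.
    """
    if isinstance(numbers, str):
        # Accept "6583088, 6875727" as well as whitespace-separated input.
        numbers = numbers.replace(",", " ").split()
    cleaned = [str(n).strip() for n in numbers if str(n).strip()]
    if not cleaned:
        raise ValueError("at least one document number is required")
    return "{}:({})".format(field, " OR ".join(cleaned))


print(numberlist_expression("6583088, 6875727, 8697602"))
# prints: patentNumber:(6583088 OR 6875727 OR 8697602)
```

The resulting string could then be passed to `uspto-peds search` or `client.search()` in the same way as the hand-written expression.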

Namespace issue

Okay, this one's pretty simple:

pip install uspto-opendata-python
# success

import uspto 
# success

import uspto.pdb.client
# ImportError: No module named pdb.client

import pkgutil
[name for _, name, _ in pkgutil.iter_modules(['uspto'])]
# []

Any ideas?
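
A note on the diagnostic itself: `pkgutil.iter_modules(['uspto'])` treats `'uspto'` as a directory path relative to the current working directory, so an empty list does not show that the installed package has no submodules. A minimal sketch of listing them from the package's own `__path__` follows; the helper name `list_submodules` is hypothetical, and a standard-library package is used so the example runs without the library installed.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the names of the direct submodules of an importable package."""
    package = importlib.import_module(package_name)
    # Packages expose their search path as __path__; plain modules do not.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(name for _, name, _ in pkgutil.iter_modules(search_path))


# Demonstrated with a standard-library package; for the report above one
# would call list_submodules("uspto") instead.
print(list_submodules("json"))
```

If the list printed for `uspto` does not contain `pdb`, the import path itself may be worth double-checking against the names that are listed.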

Future Development? - Patent Client

Hey! This is the only way I can see to contact you, so here I go!

I'm the author and maintainer of patent_client, a library with a similar scope and feature set as your own. patent_client is under active development, and growing, so if you'd like, I'd love to have you contribute, or add a note on your readme pointing to it!


PyPI | GitHub | Docs

Thanks!

Parker

Reintegrate aspects from "uspto-peds-python" fork

Coming from #7, @rahul-gj created a fork of this library called uspto-peds-python, which just wraps the PEDS Search API and is purely based on the requests and BeautifulSoup packages.

This variant operates with a trimmed-down subset of dependencies, which makes it more usable for specific use cases. However, the same thing could be achieved using the extras_require mechanism of setuptools.

This issue has been created to track the reintegration of both variants with each other again.

Thanks for your valuable input on that, Rahul.

UnicodeDecodeError during installation on Windows

I tried cmd, PowerShell, the IPython Qt console, and Cygwin.

This is the Cygwin output; the others are the same.

$ pip install uspto-opendata-python
Collecting uspto-opendata-python
  Using cached uspto-opendata-python-0.7.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\cygwin64\tmp\pip-build-irgg6e4i\uspto-opendata-python\setup.py", line 5, in <module>
        README = open(os.path.join(here, 'README.rst')).read()
      File "C:\Users\user\WinPython-32bit-3.6.2.0Qt5\python-3.6.2\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4791: character maps to <undefined>

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\cygwin64\tmp\pip-build-irgg6e4i\uspto-opendata-python\
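
The traceback shows setup.py opening README.rst with the platform default encoding (cp1252 here), which cannot decode byte 0x9d. A minimal sketch of a possible workaround is to pass the encoding explicitly. It assumes the README is UTF-8 encoded, which the report does not state; the temporary file below merely stands in for README.rst, and `read_text` is a hypothetical helper name.

```python
import io
import os
import tempfile

# Stand-in for README.rst: a right double quotation mark encodes to the
# UTF-8 bytes E2 80 9D, and 0x9d is undefined in cp1252.
sample = u"A \u201cquoted\u201d word"
handle, path = tempfile.mkstemp(suffix=".rst")
with os.fdopen(handle, "wb") as f:
    f.write(sample.encode("utf-8"))


def read_text(filename, encoding="utf-8"):
    """Read a text file with an explicit encoding (works on Python 2 and 3)."""
    with io.open(filename, encoding=encoding) as f:
        return f.read()


# Reading with cp1252 reproduces the failure from the traceback.
try:
    read_text(path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Reading with an explicit UTF-8 encoding succeeds.
readme = read_text(path)
os.remove(path)
```

In setup.py, the equivalent change would be to replace the bare `open(...)` call on README.rst with one that names the encoding.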

Unable to install: many errors, e.g. while building regex

  Building wheel for regex (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python2.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-uxmfp1/regex/setup.py'"'"'; __file__='"'"'/tmp/pip-install-uxmfp1/regex/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-JPZCzH
       cwd: /tmp/pip-install-uxmfp1/regex/
  Complete output (1451 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  creating build/lib.linux-x86_64-2.7/regex
  copying regex_3/__init__.py -> build/lib.linux-x86_64-2.7/regex
  copying regex_3/regex.py -> build/lib.linux-x86_64-2.7/regex
  copying regex_3/_regex_core.py -> build/lib.linux-x86_64-2.7/regex
  copying regex_3/test_regex.py -> build/lib.linux-x86_64-2.7/regex
  running build_ext
  building 'regex._regex' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/regex_3
  x86_64-linux-gnu-gcc -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -ffile-prefix-map=/build/python2.7-W40Ff2/python2.7-2.7.18=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c regex_3/_regex.c -o build/temp.linux-x86_64-2.7/regex_3/_regex.o
  regex_3/_regex.c: In function ‘bytes1_char_at’:
  regex_3/_regex.c:755:15: error: ‘Py_UCS1’ undeclared (first use in this function); did you mean ‘Py_UCS4’?
    755 |     return *((Py_UCS1*)text + pos);
        |               ^~~~~~~
        |               Py_UCS4
  regex_3/_regex.c:755:15: note: each undeclared identifier is reported only once for each function it appears in
  regex_3/_regex.c:755:23: error: expected expression before ‘)’ token
    755 |     return *((Py_UCS1*)text + pos);
        |                       ^
  regex_3/_regex.c: In function ‘bytes1_set_char_at’:
  regex_3/_regex.c:760:8: error: ‘Py_UCS1’ undeclared (first use in this function); did you mean ‘Py_UCS4’?
    760 |     *((Py_UCS1*)text + pos) = (Py_UCS1)ch;
        |        ^~~~~~~
        |        Py_UCS4
  regex_3/_regex.c:760:16: error: expected expression before ‘)’ token
    760 |     *((Py_UCS1*)text + pos) = (Py_UCS1)ch;
        |                ^
  regex_3/_regex.c:760:40: error: expected ‘;’ before ‘ch’
    760 |     *((Py_UCS1*)text + pos) = (Py_UCS1)ch;
        |                                        ^~
        |                                        ;
  regex_3/_regex.c: In function ‘bytes1_point_to’:
  regex_3/_regex.c:765:13: error: ‘Py_UCS1’ undeclared (first use in this function); did you mean ‘Py_UCS4’?
    765 |     return (Py_UCS1*)text + pos;
        |             ^~~~~~~
        |             Py_UCS4
  regex_3/_regex.c:765:21: error: expected expression before ‘)’ token
    765 |     return (Py_UCS1*)text + pos;
        |                     ^
  regex_3/_regex.c: In function ‘bytes2_char_at’:
  regex_3/_regex.c:770:15: error: ‘Py_UCS2’ undeclared (first use in this function); did you mean ‘Py_UCS4’?
    770 |     return *((Py_UCS2*)text + pos);
        |               ^~~~~~~
        |               Py_UCS4
  regex_3/_regex.c:770:23: error: expected expression before ‘)’ token
    770 |     return *((Py_UCS2*)text + pos);
        |                       ^
  regex_3/_regex.c: In function ‘bytes2_set_char_at’:
  regex_3/_regex.c:775:8: error: ‘Py_UCS2’ undeclared (first use in this function); did you mean ‘Py_UCS4’?
    775 |     *((Py_UCS2*)text + pos) = (Py_UCS2)ch;
        |        ^~~~~~~
        |        Py_UCS4
  regex_3/_regex.c:775:16: error: expected expression before ‘)’ token
    775 |     *((Py_UCS2*)text + pos) = (Py_UCS2)ch;
        |                ^
  regex_3/_regex.c:775:40: error: expected ‘;’ before ‘ch’
    775 |     *((Py_UCS2*)text + pos) = (Py_UCS2)ch;
        |                                        ^~
        |                                        ;
   ^
    regex_3/_regex.c:26230:16: note: declared here
    26230 | PyMODINIT_FUNC PyInit__regex(void) {
          |                ^~~~~~~~~~~~~
    regex_3/_regex.c: At top level:
    regex_3/_regex.c:26217:27: error: storage size of ‘regex_module’ isn’t known
    26217 | static struct PyModuleDef regex_module = {
          |                           ^~~~~~~~~~~~
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python2.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-uxmfp1/regex/setup.py'"'"'; __file__='"'"'/tmp/pip-install-uxmfp1/regex/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-A7_y2n/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/sudhanshu/.local/include/python2.7/regex Check the logs for full command output.
WARNING: You are using pip version 20.3.4; however, version 23.2.1 is available.
You should consider upgrading via the '/usr/bin/python2.7 -m pip install --upgrade pip' command.

Synchronously download documents for multiple patent numbers

I would like to know whether I can download documents for a list of patent numbers or application numbers in synchronous mode. This is possible on https://ped.uspto.gov/peds/ by entering comma-separated values like '6583088, 6875727, 8697602, 6331531, 6274350, 10112906, 9491944, 9504251, 9137998'.

I am asking because, in my tests, a request took almost the same time to complete whether I asked for one number or 300.

Something like:

from uspto.peds.client import UsptoPatentExaminationDataSystemClient
client = UsptoPatentExaminationDataSystemClient()

client.download_document(
    type='patent',
    numbers='6583088, 6875727, 8697602, 6331531, 6274350, 10112906, 9491944, 9504251, 9137998', # or list
)

Result returns too many docs

Hello, I have been using the API for about a month now and noticed something different today. The query below returns 451438 records, but it should return only the 269 records associated with the given examiner.

# Peds basic query to check if PEDS is online
from uspto.peds.client import UsptoPatentExaminationDataSystemClient
import pandas as pd
name = 'WILSON, NICHOLAS R'
client = UsptoPatentExaminationDataSystemClient()
expression = "appExamName:{0}".format(name)
result = client.search(expression)
{'numFound': 451438,
 'start': 0,
 'docs': [{'corrAddrCountryName': 'UNITED STATES',
   'applId': '03429712',
   'totalPtoDays': '0',
   'appFilingDate': '1954-05-13T00:00:00Z',
   'appExamName': 'MATZ, DANIEL R',
   'appExamNameFacet': 'MATZ, DANIEL R',
...

I also emailed PEDS. They recently throttled the number of requests they could handle, but we were able to get them to increase it again. I don't think the problem I'm experiencing is related to their changes, though. Any thoughts?
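
One possible explanation, going by the quoting note in the query expression issue above: in Lucene-style query syntax, a field prefix applies only to the term directly after it, so without quotes the remaining words of the name may be matched as separate terms. Whether that is what happened here is an assumption. A minimal sketch of building a quoted phrase expression, with `phrase_expression` as a hypothetical helper name that is not part of the library:

```python
def phrase_expression(field, value):
    """Return 'field:"value"', escaping characters that would end the phrase."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '{}:"{}"'.format(field, escaped)


expression = phrase_expression("appExamName", "WILSON, NICHOLAS R")
print(expression)
# prints: appExamName:"WILSON, NICHOLAS R"
```

The quoted expression could then be passed to `client.search(expression)` in place of the unquoted one.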
