

Web Crawler of USPTO PatFT Database

A crawler for fetching information on US patents and downloading their PDFs in batches.

Motivation

I have been working on a patent analysis project since April 2017. Our team needs to run certain queries on PatFT, examine whether each resulting patent is relevant to our topic, and then analyze the relevant patents. I found that the bulk patent data from USPTO's Open Data Portal (Download patent data and PAIR Bulk Data) can only be retrieved by searching for certain words, names, or regions, which isn't very useful for us, and the suitable tools I could find online all charge a fee. So I started writing a Python script with the basic functions, which sped up the project. To make the program more user-friendly, I revised the code and built a UI with PyQt5.

Download Executable Files

The source code has been packaged with PyInstaller for Windows in two forms:
1. Normal package
2. Single executable file

Instructions

You can follow the instructions below or watch this video. It should be easy to learn :)

Patent Fetcher

(1) Insert PN (2) Filtering conditions (3) Information to be fetched (4) PDF type to be downloaded (5) Table

  1. Insert the patent numbers (PNs) to be processed in either of the following ways:
    (a) Choose a CSV file with the PNs in the first column (example).
    (b) Search with a query (the query should be tested on PatFT first). The resulting PNs are shown in the table.

  2. (Optional) Filter the PNs in the table by patent type, application date range, and issue date range.
    PNs that are filtered out remain in the table during this process and are deleted at the end of it.

  3. Fetch the information of the patents shown in the table by web crawling.

  4. Download the PDF of the full text, the drawing section, or both, for the patents shown in the table.

  5. Save the table as a CSV file at any time.
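Steps 1(a) and 5 both come down to CSV input and output. The sketch below shows one way to read PNs from the first column of a CSV file and to write table rows back out. It is a minimal illustration, not the project's code: the helper names `load_pns`, `table_to_csv`, and `save_table` are hypothetical, and it assumes the input CSV has no header row.

```python
import csv
import io
from typing import Iterable, List, Sequence


def load_pns(lines: Iterable[str]) -> List[str]:
    """Return the PNs found in the first column of CSV text.

    `lines` may be an open text file or any iterable of CSV lines.
    Assumption: there is no header row. Blank first cells are skipped.
    """
    pns = []
    for row in csv.reader(lines):
        if row and row[0].strip():
            pns.append(row[0].strip())
    return pns


def table_to_csv(rows: Sequence[dict], fieldnames: Sequence[str]) -> str:
    """Serialize table rows (one dict per patent) to CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def save_table(path: str, rows: Sequence[dict], fieldnames: Sequence[str]) -> None:
    """Write the table to `path`; newline="" disables newline translation."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(table_to_csv(rows, fieldnames))
```

For example, `load_pns(open("pns.csv", newline=""))` returns the first-column values in file order. Keeping `table_to_csv` separate from the file write makes the serialization easy to check without touching the disk.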
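Step 2 keeps filtered-out PNs visible until the end of the process. One way to model that is to split the records into kept and dropped lists instead of deleting them in place. This is a minimal sketch, assuming each record already carries its patent type and dates; `PatentRecord`, `filter_records`, and the type labels are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from datetime import date
from typing import Collection, Iterable, List, Optional, Tuple

# A bound of None means "no limit" on that side of the range.
DateRange = Tuple[Optional[date], Optional[date]]


@dataclass
class PatentRecord:
    pn: str
    patent_type: str  # hypothetical labels, e.g. "utility" or "design"
    application_date: date
    issue_date: date


def in_range(d: date, bounds: DateRange) -> bool:
    """True if d falls inside the inclusive range given by bounds."""
    start, end = bounds
    return (start is None or d >= start) and (end is None or d <= end)


def filter_records(
    records: Iterable[PatentRecord],
    types: Optional[Collection[str]] = None,
    application: DateRange = (None, None),
    issue: DateRange = (None, None),
) -> Tuple[List[PatentRecord], List[PatentRecord]]:
    """Split records into (kept, dropped) without discarding anything yet."""
    kept: List[PatentRecord] = []
    dropped: List[PatentRecord] = []
    for record in records:
        matches = (
            (types is None or record.patent_type in types)
            and in_range(record.application_date, application)
            and in_range(record.issue_date, issue)
        )
        (kept if matches else dropped).append(record)
    return kept, dropped
```

A UI can show both lists while the run is in progress and discard `dropped` only once it finishes.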

Browser

On the second page, you can enter a PN to view that patent's PatFT page, or open its PDF in your default browser.
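Opening a page or a downloaded PDF in the default browser can be done with Python's standard `webbrowser` module. The sketch below is an assumption about how such a feature might work, not the project's implementation: `open_in_browser` is a hypothetical helper, the PatFT URL is supplied by the caller rather than constructed here, and local file paths are turned into `file://` URIs first.

```python
import webbrowser
from pathlib import Path
from typing import Callable


def open_in_browser(target: str, opener: Callable[[str], object] = webbrowser.open) -> str:
    """Open a URL or a local PDF path with the default browser.

    Local paths are resolved to absolute paths and converted to file:// URIs.
    `opener` is injectable so the function can be tried without a browser.
    Returns the URI that was passed to the opener.
    """
    if target.startswith(("http://", "https://", "file://")):
        uri = target
    else:
        # as_uri() requires an absolute path, hence resolve() first.
        uri = Path(target).resolve().as_uri()
    opener(uri)
    return uri
```

Calling `open_in_browser("downloads/5123456.pdf")` would hand the PDF to whatever application the system registers as the default browser.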

Caution

  1. The program has some problems fetching information on patents issued before 1976. Still working on it.
  2. Searching with a long query takes a long time, just as it does on PatFT itself (example). I tried using threading in the program, but it made the search even slower, and multiprocessing led to bad connections. If your long query returns fewer than 500 results, it should be faster to copy the patent numbers into a CSV file yourself and load that file.
  3. If you encounter any problems or have any suggestions (such as adding other functions), feel free to contact me!
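For the workaround in caution 2, a list of patent numbers copied by hand can be turned into a one-column CSV file. This is a minimal sketch assuming the pasted text has one PN per line; `pns_to_csv` is a hypothetical helper, and it does not rewrite the PNs themselves, so any formatting the program expects (for example, whether thousands separators are allowed) is left to you.

```python
import csv
import io


def pns_to_csv(pasted: str) -> str:
    """Turn pasted PNs (one per line) into one-column CSV text.

    Whitespace is trimmed, blank lines are skipped, and duplicates are
    dropped while keeping the order of first appearance.
    """
    seen = set()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in pasted.splitlines():
        pn = line.strip()
        if pn and pn not in seen:
            seen.add(pn)
            writer.writerow([pn])
    return buffer.getvalue()
```

Write the result with `open("pns.csv", "w", newline="").write(...)` and load that file in step 1(a). A PN containing commas is quoted by the `csv` module, so it still stays in the first column.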


Contributors

mattwang44, nobodyzxc


uspto-patft-web-crawler's Issues

Future Development? Patent Client

Hey! This is the only way I can see to contact you, so here I go!

I'm the author and maintainer of patent_client, a library with a similar scope and feature set as your own. patent_client is under active development, and growing, so if you'd like, I'd love to have you contribute, or add a note on your readme pointing to it!



Thanks!

Parker
