The patentr from jyprojs

Convert all formats to CSV

TXT (1976-2001)
XML type 1 (2002-2004)
XML type 2 (2005-present)

add shiny UI to work w/ patent data

convert_funcs.cpp last patent prints incorrectly

fout << currID << ",\"" << title << "\"," << appDate << "," << issDate << ",\"" << inventor << "\",\"" << assignee << "\"," << iclClass << "," << refs << "\n";
The output statement should match the line above (inventor and assignee wrapped in quotes), not the unquoted version below:
fout << currID << ",\"" << title << "\"," << appDate << "," << issDate << "," << inventor << "," << assignee << "," << iclClass << "," << refs << "\n";

note: assignee and inventor fields may have commas
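
Because quoted fields can themselves contain double quotes as well as commas, a small quoting helper may be safer than writing `"\""` by hand at each field. The following is a minimal sketch, assuming RFC 4180-style quoting; `csv_quote` is a hypothetical name and is not part of convert_funcs.cpp.

```cpp
#include <string>

// Hypothetical helper (not in patentr): wrap a field in double quotes and
// double any embedded quote characters, following RFC 4180 conventions.
std::string csv_quote(const std::string& field) {
    std::string out;
    out.reserve(field.size() + 2);
    out.push_back('"');
    for (char c : field) {
        if (c == '"') {
            out.push_back('"');  // an embedded quote becomes two quotes
        }
        out.push_back(c);
    }
    out.push_back('"');
    return out;
}
```

With such a helper, the inventor and assignee parts of the output statement could be written as `csv_quote(inventor)` and `csv_quote(assignee)`, so commas inside either field stay within a single CSV column.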

xml1 assignee might sometimes be ONM for organization name

CRAN release v0.2.0

XML1 format
XML2 format
text properties (abstract, description, etc.) for TXT format; edit: chose to ignore
vignette(s)
fix GitHub open issues

fix documentation for `output_file` parameter

no default value, output file name must be provided

add sample dataset

~10-50 rows showing the patent format output

account for immutability of data frames in R

each assignment copies the entire data frame, resulting in a significant slowdown (very inefficient code)

TXT (already line-by-line w/o modifying DFs b/c done in C++)
XML1
XML2

New field to retrieve

Congratulations on the work done! The tool works well and seems very user-friendly. I know that the USPTO bulk files contain other fields that are not currently implemented in the package, such as the abstract and the description of each patent. This information is fundamental for analyzing the technical state of the art of a domain. Is it possible to implement some functions for extracting other text fields of the patents?

IPC classification might use classification-ipc or classification-ipcr

xml2 assignee has first and last names if individual

Create UI function to obtain bulk data

Add helper functions

visualization
summary data
print

improve automated testing

confirm content as well in tests/testthat/test-convert.R

add update parameter to show user progress

likely w/ cat statements, per n patents

Unable to save as a dataframe and not getting headers in a csv

Hi,

I have two questions on the package's execution.

The document description says, "Data can be returned as a data frame or written to a file (see `output_file` parameter)." However, I am getting the following error when I try to retrieve the data as a data frame.

df2018w1 <- get_bulk_patent_data(year = 2018, week = 1)

Error in cat("WKU,Title,App_Date,Issue_Date,Inventor,Assignee,ICL_Class,References,Claims\n",  : 
  argument "output_file" is missing, with no default

The code runs fine if I write the data to a CSV, but there are no column names in the CSV output. Please let me know how to get them.

Thanks!

need consistent date format across 3 formats

For 1 January 1976, TXT currently writes the date as 19760101, while the XML formats write it as 1976-01-01; TXT should switch to 1976-01-01 for consistency and readability
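
Since the TXT conversion runs in C++, the reformatting could happen there before the date is written out. A minimal sketch, assuming the TXT date always arrives as eight digits; `to_iso_date` is a hypothetical name, and anything that is not exactly eight digits is passed through unchanged rather than guessed at.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not in patentr): turn "YYYYMMDD" into "YYYY-MM-DD".
// Input that is not exactly eight digits is returned unchanged.
std::string to_iso_date(const std::string& yyyymmdd) {
    if (yyyymmdd.size() != 8) {
        return yyyymmdd;
    }
    for (unsigned char c : yyyymmdd) {
        if (!std::isdigit(c)) {
            return yyyymmdd;
        }
    }
    return yyyymmdd.substr(0, 4) + "-" + yyyymmdd.substr(4, 2) + "-" +
           yyyymmdd.substr(6, 2);
}
```

The helper only inserts separators; it does not check that the month and day are valid calendar values.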

add GitHub repo elements (more aesthetic)

README.md content
README.md badges
pkgdown website

add automated testing

Getting `TRUE` for 2001 and after, download error for earlier years

Hi devs,

Thanks for the work and package.

For more recent years I am getting the TRUE placeholder and no data is downloading.

I tried running the examples and other years before 2001, and it attempts to download but I am getting this error:

Error in utils::download.file(url = curr_url, destfile = dest_file) : 
  cannot open URL 'https://bulkdata.uspto.gov/data/patent/grant/redbook/fulltext/2001/pftaps20010102_wk01.zip'
In addition: Warning message:
In utils::download.file(url = curr_url, destfile = dest_file) :
  InternetOpenUrl failed: 'The certificate authority is invalid or incorrect'

Probably can't verify the SSL of the USPTO website.

On an unrelated note, is there a way to download the patent data connected to a certain inventor or company instead of going by week?

add vignette(s)

Claims missing in some TXT conversions (Need to search for DCLM tag and PAL subtag under DCLM or CLMS)

An example of the PAL subtag is in 1995 wk 32

add claims field for all 3 formats

TXT
XML1
XML2

complete package documentation

References of patentr Output not matching with USPTO portal for week 1 2019 data

Hi,

I noticed that the CSV output for week 1 of 2019 has blank values in the "references" field for all WKUs. I randomly checked the references for some of them on https://portal.uspto.gov/pair/PublicPair, using the patent number as the search field, and found that the latest reference document on the "Display References" tab shows multiple references for these patents. Then I compared the contents for week 1 of 2012. The CSV output for that week has populated references fields for the issued patents, and these matched the information on the PublicPair portal for randomly chosen patents.

Can you please run the 2019 and 2012 week 1 data and check the reason for the references mismatch between the patentr output and the PublicPair portal?

Thanks!
