Giter Site home page Giter Site logo

government-interest's Introduction

This repository contains all necessary scripts to run and understand government interest patents.

There are two steps involved in running the scripts:

1. Data Preparation - The scripts in the 1_R_Data_Prep folder will prepare data for analysis.

2. Data Visualization - The scripts in the 2_Data_Viz_Generate folder will generate visualizations.

Step 1: Data Preparation

a. Download the bulk download data files from http://www.patentsview.org/download/. 
These are the bulk download data files you will need:
	1. assignee
	2. foreigncitation
	3. government_interest
	4. government_organization
	5. inventor_gender 
	6. nber 
	7. patent 
	8. patent_assignee
	9. patent_govintorg 
	10.patent_inventor 
	11.rawassignee
	12. usapplicationcitation
	13. uspatentcitation
	14. wipo
	15. wipo_field

	Save these in the 'data_to_read' folder under the '2_Data_Viz_Generate' folder. 

b. Go to the folder '1_R_Data_Prep'. Open the scripts from this folder in R/RStudio. 

c. First look at requirements.R. Make sure you change the **input_folder** and **output_folder** variables to match the folder paths where you stored the bulk download files from part a above (Example folder path: "<Your-Path-Here>/government-interest/2_Data_Viz_Generate/data_to_read/"). 

d. Run the script "assignees_looked_up_types.R" first. Then run through the remaining R scripts in numerical order. When running scripts, make sure your working directory matches your current directory (Example folder path: *"<Your-Path-Here>/government-interest/1_R_Data_Prep/"*)

Note: Some of the scripts will take time to run since several bulk download tables are large. 
Here are estimated running times:
	* assignees_looked_up_types.R (~ 17 minutes)
	* 1_intermediate_patcit.R (~ 1 hour, 15 minutes)
	* 2_create_core_tables (~ 1 hour, 10 minutes)
	* 3_create_assignee_table (~ 6 minutes)
	* 4_inventor_gender (~ 3 minutes)
	* 5_create_5yr_citation_1thru5 (~ 1 hour)

e. All the temporary tables generated from running the scripts in '1_R_Data_Prep' will be saved to the output folder path you specified ("2_Data_Viz_Generate/data_to_read/" folder).
These tables include the following:
	
	Assignees_looked_up_types.R:
		1. assignees_lookedup_types.csv
		2. temp_gi_assignee_type.csv

	Script 1:
		1. temp_num_foreign_documents_cited.csv
		2. temp_num_us_applications_cited.csv
		3. temp_num_us_patents_cited.csv
		4. temp_num_times_cited_by_us_patents.csv
		5. temp_patent_counts_fac_vfinal.csv

	Script 2:
		1. temp_patent_level_all.csv
		2. temp_patent_level_gi_subset.csv
		3. temp_patent_level_nongi_subset.csv

	Script 3:
		1. temp_gi_assignee_type.csv
		2. all_assignees.csv
		3. assignee_type.csv

	Script 4:
		1. temp_govt_associated_inventors_clean.csv
		2. temp_gi_inventor_gender.csv
		3. temp_gi_has_female_inv.csv

	Script 5:
		1. temp_5yr_citations_by_cite_yr1
		2. temp_5yr_citations_by_cite_yr2
		3. temp_5yr_citations_by_cite_yr3
		4. temp_5yr_citations_by_cite_yr4
		5. temp_5yr_citations_by_cite_yr5


f. Additional files: 

1. Go to the following webpage: https://www.aaas.org/programs/r-d-budget-and-policy/historical-trends-federal-rd.
Download the Excel file for "Total R&D by Agency, 1976-2018" under the _By Agency_ section. 

2. Open this file in Excel and transpose the data table. To do this, copy the data table (only upto the "Total R&D" section (ignore the R&D: Defense and Nondefense section)) and use the paste special option _transpose the table_. 

3. Save the transposed table as a csv file with the name "agencies.csv" in the '2_Data_Viz_Generate/data_to_read/' folder.

Step 2: Data Visualizations

a. Go to the folder '2_Data_Viz_Generate'. Open the scripts from this folder in R/RStudio. Make sure you change your working directory to match your current directory (Example folder path: *"<Your-Path-Here>/government-interest/2_Data_Viz_Generate/"*)

b. Run through the script "requirements.R". Be sure to set the input and output folder paths.

c. Next, run the script "govIntBrief.R". This will generate all visualizations and runs for ~ 41 minutes.

d. Run through the "requirements.R" script and then run the script "govIntBrief_assignee.R".

e. Run through the "requirements.R" script and then run the script "govIntBrief_gender.R".

This script will generate output to two folders:
(1) Folder **'data_viz/'**: a folder to store all of the viz that will be generated from running this R script
(2) Folder **'out/'**: a folder to store all of the tables that will be generated from running this R script

government-interest's People

Contributors

patenish avatar sarvo-madhavan avatar evelyndeng-air avatar

Watchers

James Cloos avatar Russ Allen avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.