
i10brook / national_center_for_immunization_and_respiratory_diseases_about_national_immunizations_in_children


This project forked from philipuit/national_center_for_immunization_and_respiratory_diseases_about_national_immunizations_in_children


Retrieve all of the data within nispuf14.dat and store it in a more accessible format: a CSV file, a JSON file, or a relational database. For this assignment, feel free to use a dataframe (the Python library pandas) for intermediate steps.



National Center for Immunization and Respiratory Diseases about National Immunizations in Children

Retrieve all of the data within nispuf14.dat and store it in a more accessible format. The accessible format can be any of the following: a CSV file, a JSON file, or a relational database.
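To make the three target formats concrete, here is a minimal sketch that uses only the Python standard library. The records, field names, and the table name `nis_records` are hypothetical placeholders, not values taken from nispuf14.dat, and SQLite stands in for "a relational database" purely as an assumption.

```python
import csv
import io
import json
import sqlite3

# Hypothetical records, standing in for rows already parsed from nispuf14.dat.
records = [
    {"child_id": "0001", "age_group": "1"},
    {"child_id": "0002", "age_group": "3"},
]
fields = ["child_id", "age_group"]


def to_csv_text(rows, fieldnames):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json_text(rows):
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def to_sqlite(rows, fieldnames, connection, table="nis_records"):
    """Store rows in a SQLite table whose columns are all TEXT."""
    columns = ", ".join(f'"{name}" TEXT' for name in fieldnames)
    placeholders = ", ".join("?" for _ in fieldnames)
    connection.execute(f'CREATE TABLE "{table}" ({columns})')
    connection.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[name] for name in fieldnames) for row in rows],
    )
    connection.commit()


csv_text = to_csv_text(records, fields)
json_text = to_json_text(records)
connection = sqlite3.connect(":memory:")  # in-memory database for the sketch
to_sqlite(records, fields, connection)
```

Keeping every column as text is a deliberate choice in this sketch: coded survey values often have leading zeros that a numeric type would drop.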

About the datasets:

We will work with 2 datasets this week: NISPUF14_CODEBOOK.PDF & nispuf14.dat (attached) from the National Center for Immunization and Respiratory Diseases about National Immunizations in Children.

We will need to read the .pdf file to be able to better understand the .dat file, as we will outline below.

NISPUF14_CODEBOOK.PDF is a PDF that contains a description of the format for the data in nispuf14.dat. In other words, the PDF tells you how to read the data in nispuf14.dat.

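Why would we need a PDF to tell us how to read our data? Well, this data file is stored in a positional format. This means that both the value and the relative position of each character provide meaning within the dataset: there are no delimiters, so a value can only be interpreted once we know which character positions belong to which variable.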
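As an illustration of what "positional" means in practice, the sketch below cuts one line into named fields by character position. The layout, the variable names, and the sample line are all made up for the example; the real positions would have to come from NISPUF14_CODEBOOK.PDF.

```python
# Hypothetical layout: (variable name, start column, end column), 1-based and
# inclusive, the way fixed-width codebooks commonly list positions.
LAYOUT = [
    ("child_id", 1, 4),
    ("age_group", 5, 5),
    ("state", 6, 7),
]


def parse_record(line, layout):
    """Cut one fixed-width line into named fields using column positions."""
    record = {}
    for name, start, end in layout:
        # Convert 1-based inclusive positions to a 0-based Python slice.
        record[name] = line[start - 1:end].strip()
    return record


sample_line = "0042136"  # made-up record, not taken from nispuf14.dat
parsed = parse_record(sample_line, LAYOUT)
print(parsed)
```

The same seven characters would mean something entirely different under another layout, which is why the codebook is needed before the .dat file can be read.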

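Setup

# importing packages we will need
# (these class names are from PyPDF2 releases before 3.0; later releases
# rename them to PdfReader, PdfWriter and PdfMerger)
import PyPDF2
from PyPDF2 import PdfFileReader, PdfFileWriter, PdfFileMerger

import pandas as pd
import numpy as np
import json

# import tabula library
import tabula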

Summary

In conclusion, we looked at retrieving and cleansing data from a survey produced by the National Center for Immunization and Respiratory Diseases about National Immunizations in Children. We successfully pulled the data from the NISPUF14_CODEBOOK.PDF file using PyPDF2 and tabula and placed its tables into a clean dataframe. We then cleaned that dataframe down to Section 1, the only part relevant to the .dat file, and ran pivot_table with the column names as columns to isolate just the information needed to understand nispuf14.dat.

With that insight from the .pdf file, we successfully loaded the .dat file and did some cleaning on that dataset. It was a tough dataset to read, but the work done with tabula and pivot_table on the .pdf file showed that we only needed lines 0-97 for the Section 1 information. We also learned each variable, its variable name, and its line placement, so we could define them and place them in their own catalog (or library) if needed. Finally, we were able to retrieve all of the data in the .dat file and store it in more accessible formats, namely .json and .csv (files attached).
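The step from a cleaned codebook table to a readable data file can be sketched with pandas as follows. This is not the notebook's actual code: the codebook rows, column names, and sample lines are hypothetical, and `pd.read_fwf` is used here as one reasonable way to apply variable positions to a fixed-width file.

```python
import io

import pandas as pd

# Hypothetical codebook excerpt: one row per variable with 1-based, inclusive
# start and end positions. The real values come from NISPUF14_CODEBOOK.PDF.
codebook = pd.DataFrame(
    {
        "variable": ["child_id", "age_group", "state"],
        "start": [1, 5, 6],
        "end": [4, 5, 7],
    }
)

# read_fwf expects 0-based, half-open (start, end) pairs.
colspecs = [
    (int(start) - 1, int(end))
    for start, end in zip(codebook["start"], codebook["end"])
]

# Made-up fixed-width lines standing in for nispuf14.dat.
raw = io.StringIO("0042136\n0043248\n")

data = pd.read_fwf(
    raw,
    colspecs=colspecs,
    names=list(codebook["variable"]),
    header=None,
    dtype=str,  # keep leading zeros in coded values
)
print(data)
```

Once the data is in a dataframe like this, `data.to_csv(...)` and `data.to_json(...)` cover the .csv and .json outputs mentioned above.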

