Neutron_Scattering_Lengths_PDF_to_CSV

Purpose: convert a PDF table of neutron scattering lengths and cross sections to CSV format.

About

A comma-separated values (CSV) file is a text file that usually uses a comma to separate each value [1]. CSV files are often used to store and tabulate data.

The PDF in question is a 10-page, 110 kB file available here, provided by the Vienna University of Technology (click for English [3]) [2]. The webpage hosting this file was last updated on 02/14/2001 [2].

Most of the column headers are described in the first paragraph of [2] and in Table 1 of [5]; to reiterate:

ZSymbA: nuclide charge number Z, element symbol Symb, mass number A
p or T_{1/2}: natural abundance (percent) or half-life (MIN: minutes, Y: years)
I: nuclear spin
b_{c}: bound coherent scattering length (fm, femtometers, 1e-15 m)
b_{+}: spin-dependent scattering length for I + 1/2 (fm, femtometers, 1e-15 m)
b_{-}: spin-dependent scattering length for I - 1/2 (fm, femtometers, 1e-15 m)
c: ?? (if you know this, please contact me.)
sigma_{coh}: coherent cross-section (barns, 1e-24 cm^2)
sigma_{inc}: incoherent cross-section (barns, 1e-24 cm^2)
sigma_{scatt}: scattering cross-section (barns, 1e-24 cm^2)
sigma_{abs}: absorption cross-section (barns, 1e-24 cm^2)

The purpose of converting this PDF to CSV is to obtain the exact information contained in the PDF in a more machine-readable format. The quickest way to do this was to look for an existing Python package with this capability. The first package that came up was tabula. tabula has two methods relevant to this task: read_pdf and convert_into [4]. Converting the PDF file into CSV took two lines of code: (1) importing tabula, and (2) calling convert_into.
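The two-line conversion described above might look roughly like the sketch below. It assumes the tabula-py package (imported as tabula) and a Java runtime are installed. The function name convert_pdf_tables and both file names are hypothetical placeholders, not the names used in this project.

```python
from pathlib import Path


def convert_pdf_tables(pdf_path, csv_path, pages="1"):
    """Write the tables found on the given PDF pages to a CSV file.

    `pages` is passed straight to tabula: "1" for a first test run,
    "all" for the whole document.
    """
    if not Path(pdf_path).is_file():
        raise FileNotFoundError(pdf_path)
    # Imported here so the file check above works even before
    # tabula-py is installed.
    import tabula

    tabula.convert_into(str(pdf_path), str(csv_path),
                        output_format="csv", pages=pages)
    return Path(csv_path)


# Hypothetical usage; the file names are placeholders:
# convert_pdf_tables("scattering_lengths.pdf", "scattering_lengths.csv", pages="1")
```

Wrapping the call in a function is only a convenience for switching pages between "1" and "all"; the underlying work is still the import plus one convert_into call.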

Initially, the pages parameter of convert_into was set to "1" to test the method. The output CSV was confirmed to match the input PDF by comparing the numbers in each cell of five randomly selected rows. As a demonstration, the first six rows of the PDF file (header included) are:

ZSymbA p or T_{1/2} I b_{c} b_{+} b_{-} c \sigma_{coh} \sigma_{inc} \sigma_{scatt} \sigma_{abs}
0-N-1 10.3 MIN 1/2 -37.0(6) 0 -37.0(6) 43.01(2) 43.01(2) 0
1-H -3.7409(11) 1.7568(10) 80.26(6) 82.02(6) 0.3326(7)
1-H-1 99.985 1/2 -3.7423(12) 10.817(5) -47.420(14) +/- 1.7583(10) 80.27(6) 82.03(6) 0.3326(7)
1-H-2 0.0149 1 6.674(6) 9.53(3) 0.975(60) 5.592(7) 2.05(3) 7.64(3) 0.000519(7)
1-H-3 12.26 Y 1/2 4.792(27) 4.18(15) 6.56(37) 2.89(3) 0.14(4) 3.03(5) < 6.0E-6

and the first six rows of the CSV file are:

ZSymbA,p or T1/2,I,bc,b+,b-,c,σcoh,σ inc,σscatt,σabs
0-N-1,10.3 MIN,1/2,-37.0(6),0,-37.0(6),,43.01(2),,43.01(2),0
1-H,,,-3.7409(11),,,,1.7568(10),80.26(6),82.02(6),0.3326(7)
1-H-1,99.985,1/2,-3.7423(12),10.817(5),-47.420(14),+/-,1.7583(10),80.27(6),82.03(6),0.3326(7)
1-H-2,0.0149,1,6.674(6),9.53(3),0.975(60),,5.592(7),2.05(3),7.64(3),0.000519(7)
1-H-3,12.26 Y,1/2,4.792(27),4.18(15),6.56(37),,2.89(3),0.14(4),3.03(5),< 6.0E-6
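To illustrate what the CSV form offers, the rows above can be read with Python's standard csv module. This is a small sketch, with the sample embedded as a string rather than read from the real output file; the variable names are illustrative. It shows that every row has the same number of fields as the header and that blank PDF cells arrive as empty strings.

```python
import csv
import io

# The six CSV rows shown above, embedded so the example is self-contained.
SAMPLE = """ZSymbA,p or T1/2,I,bc,b+,b-,c,σcoh,σ inc,σscatt,σabs
0-N-1,10.3 MIN,1/2,-37.0(6),0,-37.0(6),,43.01(2),,43.01(2),0
1-H,,,-3.7409(11),,,,1.7568(10),80.26(6),82.02(6),0.3326(7)
1-H-1,99.985,1/2,-3.7423(12),10.817(5),-47.420(14),+/-,1.7583(10),80.27(6),82.03(6),0.3326(7)
1-H-2,0.0149,1,6.674(6),9.53(3),0.975(60),,5.592(7),2.05(3),7.64(3),0.000519(7)
1-H-3,12.26 Y,1/2,4.792(27),4.18(15),6.56(37),,2.89(3),0.14(4),3.03(5),< 6.0E-6
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
header, records = rows[0], rows[1:]

# Every row, header included, should have the same number of fields.
widths = {len(row) for row in rows}
print(len(header), widths)

# Blank cells in the PDF appear as empty strings, not as shifted columns.
natural_hydrogen = dict(zip(header, records[1]))
print(repr(natural_hydrogen["I"]), natural_hydrogen["σcoh"])
```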

At this point, pages was changed to "all", and the final CSV file was obtained. The final CSV file was again checked against the original PDF file. There is no obvious method, besides manually checking the numbers row by row, to verify that every value across all 10 pages of the PDF was carried over exactly. After verifying that the values in five randomly selected rows of the final CSV file exactly matched their counterparts in the PDF file, it was assumed that the rest of the CSV file was copied correctly. Empty cells in the PDF are empty in the corresponding CSV file, which preserves the dimensions of the data structure. If there is a more rigorous way to approach this problem, please contact me.

This project concludes with a reflection: consider storing experimental data both in PDF format, for final copies, and in a machine-readable format such as CSV, for use in data science applications.
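One possible partial check, offered as a sketch rather than a definitive method: the scattering cross-section is the sum of the coherent and incoherent cross-sections, so every CSV row that has all three values can be tested for internal consistency. This cannot prove that the CSV matches the PDF, but it can flag rows worth checking by hand. The sketch assumes that parenthesized digits such as 80.26(6) give the uncertainty in the last digits, that the three cross-sections sit in columns 7 to 9 (counting from 0), and that the sum of the three uncertainties is a reasonable tolerance. The names parse_value and inconsistent_rows are hypothetical.

```python
import csv
import io
import re

# Sample rows from the CSV shown earlier; the real file would be read instead.
SAMPLE = """ZSymbA,p or T1/2,I,bc,b+,b-,c,σcoh,σ inc,σscatt,σabs
0-N-1,10.3 MIN,1/2,-37.0(6),0,-37.0(6),,43.01(2),,43.01(2),0
1-H,,,-3.7409(11),,,,1.7568(10),80.26(6),82.02(6),0.3326(7)
1-H-1,99.985,1/2,-3.7423(12),10.817(5),-47.420(14),+/-,1.7583(10),80.27(6),82.03(6),0.3326(7)
1-H-2,0.0149,1,6.674(6),9.53(3),0.975(60),,5.592(7),2.05(3),7.64(3),0.000519(7)
1-H-3,12.26 Y,1/2,4.792(27),4.18(15),6.56(37),,2.89(3),0.14(4),3.03(5),< 6.0E-6
"""

VALUE = re.compile(r"([+-]?\d+(?:\.\d+)?)\((\d+)\)")


def parse_value(cell):
    """Return (value, uncertainty) for a cell like '80.26(6)', else None."""
    match = VALUE.fullmatch(cell.strip())
    if match is None:
        return None
    number, digits = match.groups()
    decimals = len(number.partition(".")[2])
    # Assumption: the digits in parentheses apply to the last decimal places.
    return float(number), int(digits) * 10.0 ** -decimals


def inconsistent_rows(rows):
    """Return the ZSymbA label of each row where coh + inc differs from scatt."""
    flagged = []
    for row in rows:
        parsed = [parse_value(row[i]) for i in (7, 8, 9)]
        if any(p is None for p in parsed):
            continue  # a value is missing or in another form; cannot test
        (coh, d_coh), (inc, d_inc), (scatt, d_scatt) = parsed
        if abs(coh + inc - scatt) > d_coh + d_inc + d_scatt:
            flagged.append(row[0])
    return flagged


records = list(csv.reader(io.StringIO(SAMPLE)))[1:]  # skip the header row
flagged = inconsistent_rows(records)
print(flagged)
```

A row that is flagged is not necessarily wrong, and a row that passes is not necessarily right: a misread digit in a column outside these three would go unnoticed, so manual spot checks remain useful.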

References

  • [1] Definition of the CSV Format. Internet Engineering Task Force. Retrieved September 25, 2020.
  • [2] Bound Coherent Neutron Scattering Lengths. Vienna University of Technology. Retrieved September 25, 2020.
  • [3] Neutron Scattering Lengths. Vienna University of Technology. Retrieved September 26, 2020.
  • [4] Ariga A. (2020) tabula-py. github.com/chezou. Retrieved September 25, 2020.
  • [5] Varley F. Sears (1992) Neutron scattering lengths and cross sections. Neutron News, 3:3, 26-37. Retrieved September 27, 2020.
  • Parenthetical:

    For how to use tabula's method "read_pdf": https://stackoverflow.com/a/49562555
    For how to resolve the issue "module 'tabula' has no attribute 'read_pdf'": https://stackoverflow.com/a/60532664
    For why method 'read_pdf' was not included in tabula: https://stackoverflow.com/a/49997114
    For another reason why 'read_pdf' was not working with tabula: https://stackoverflow.com/a/54123725
    For how to respond to 'y/n' prompts in Jupyter Notebook: https://stackoverflow.com/a/39841757
