derek-jones / pdf-2-csv
Extracts points in graphs to csv
License: Mozilla Public License 2.0
Converting a graph from pdf to csv first requires that all the associated pdf operations be identified.
When viewing a pdf, using Mozilla's PDF.js, the user highlights a region on the page. All pdf operations associated with this region need to be identified and extracted from the pdf.
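As a rough illustration of the region step, the sketch below keeps only those operations whose coordinates all fall inside the highlighted rectangle. The `Rect` type, the `ops_in_region` function and the dictionary layout of an operation (`"op"` plus a list of `"points"`) are assumptions made for this example; they are not taken from PDF.js or from this project.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Highlighted region in page-relative coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def ops_in_region(ops, region):
    """Keep operations whose coordinates all lie inside the region.

    Each operation is assumed to be a dict with a "points" list of (x, y)
    pairs; operations without coordinates are dropped.
    """
    return [
        op for op in ops
        if op["points"] and all(region.contains(x, y) for x, y in op["points"])
    ]
```

Requiring every coordinate to be inside the region is a deliberately strict choice; an operation that only partly overlaps the highlight may still belong to the graph, so a looser test may be needed in practice.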
A graph will contain at least one x and y axis (assume one of each to start).
The most likely start/end of the x and y axes need to be identified.
One possible heuristic is to identify the two longest lines at right angles to each other.
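The heuristic can be sketched as follows, assuming the line-drawing operations have already been reduced to segments of the form `((x0, y0), (x1, y1))`. The function names, the tolerance and the choice of "greatest combined length" as the ranking are assumptions for illustration, not the project's implementation.

```python
import math
from itertools import combinations


def length(seg):
    (x0, y0), (x1, y1) = seg
    return math.hypot(x1 - x0, y1 - y0)


def perpendicular(a, b, tol=1e-3):
    """True when the direction vectors of a and b are (nearly) orthogonal."""
    (ax0, ay0), (ax1, ay1) = a
    (bx0, by0), (bx1, by1) = b
    dot = (ax1 - ax0) * (bx1 - bx0) + (ay1 - ay0) * (by1 - by0)
    return abs(dot) <= tol * length(a) * length(b)


def guess_axes(segments):
    """Return the perpendicular pair with the greatest combined length, or None."""
    best = None
    for a, b in combinations(segments, 2):
        if length(a) == 0 or length(b) == 0:
            continue  # skip degenerate segments
        if perpendicular(a, b):
            score = length(a) + length(b)
            if best is None or score > best[0]:
                best = (score, a, b)
    return None if best is None else (best[1], best[2])
```

The pairwise search is quadratic in the number of segments, which is likely acceptable for a single highlighted region but may need pruning for dense plots.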
pdf operations use page-relative coordinates. Knowing the page-relative coordinates of the axes allows the page-relative coordinates of graph points to be mapped to graph-relative coordinates.
Extracting x/y axis relative coordinates of points in a graph requires knowing the start/end values and scale used for the x and y axes.
The start/end values, and whether a linear or log scale is used, need to be identified for each of the x and y axes.
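Once the page positions and values of an axis's two ends are known, the mapping for one coordinate is a linear interpolation, carried out in log space for a log axis. The sketch below is a minimal version of that arithmetic; the function and parameter names are hypothetical, and base-10 logs are an assumption (any base gives the same result).

```python
import math


def page_to_value(p, p_start, p_end, v_start, v_end, log_scale=False):
    """Map a page coordinate p on one axis to the value it represents.

    p_start/p_end are the page coordinates of the axis ends and
    v_start/v_end the values shown at those ends.
    """
    t = (p - p_start) / (p_end - p_start)  # fraction of the way along the axis
    if log_scale:
        lo, hi = math.log10(v_start), math.log10(v_end)
        return 10 ** (lo + t * (hi - lo))
    return v_start + t * (v_end - v_start)
```

For example, on an axis drawn from page coordinate 100 to 300 and labelled 1 to 100 on a log scale, a point at page coordinate 200 sits halfway along the axis and therefore represents 10, not 50.5.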
The method used by a particular pdf generation tool to denote a single point (e.g., circle or cross) needs to be identified. Tools often embed an identifying string in the pdf they generate.
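One place such a string may appear is the `/Producer` or `/Creator` entry of the pdf's document information dictionary. The sketch below searches raw bytes for those two keys. It is a simplification: it assumes the entries are stored uncompressed as literal strings without nested parentheses, which will not hold for every file, and the function name is hypothetical.

```python
import re

# Matches e.g. "/Producer (Some Tool 1.0)"; escaped characters inside the
# string are tolerated, nested unescaped parentheses are not.
INFO_STRING = re.compile(rb"/(Producer|Creator)\s*\(((?:\\.|[^\\)])*)\)")


def generator_strings(pdf_bytes):
    """Return any Producer/Creator strings found in the raw pdf bytes."""
    return {
        m.group(1).decode("ascii"): m.group(2).decode("latin-1")
        for m in INFO_STRING.finditer(pdf_bytes)
    }
```

The returned strings could then be used to select a tool-specific rule for recognising points.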
Some tools create their own 'point' character by, for instance, drawing vertical/horizontal lines. Things get complicated when the tool generates pdf operations that draw all the horizontal lines first, followed by all the vertical lines (or vice versa).
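When the two strokes of a cross are not adjacent in the operation stream, one way to reassemble them is to pair each horizontal segment with a vertical segment that has the same midpoint. The sketch below assumes the segments have already been split into horizontal and vertical lists; the names and the tolerance are illustrative assumptions.

```python
def midpoint(seg):
    (x0, y0), (x1, y1) = seg
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def pair_crosses(horizontals, verticals, tol=0.01):
    """Return the centre of each cross formed by one horizontal and one
    vertical segment whose midpoints coincide (within tol)."""
    remaining = list(verticals)
    centres = []
    for h in horizontals:
        hx, hy = midpoint(h)
        for v in remaining:
            vx, vy = midpoint(v)
            if abs(hx - vx) <= tol and abs(hy - vy) <= tol:
                centres.append((hx, hy))
                remaining.remove(v)  # each vertical stroke is used once
                break
    return centres
```

Matching on midpoints rather than on stream order means the result does not depend on whether the tool emitted horizontal or vertical strokes first.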
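The following draws a circle on the page, presumably by stroking (text rendering mode 1 Tr) the outline of the glyph for (l) in a symbol font selected as /F1: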
/F1 1 Tf 1 Tr 6.21 0 0 6.21 135.35 423.79 Tm (l) Tj 0 Tr
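For operations of this shape, the position of the point can be read from the last two operands of `Tm`, which are the translation part of the text matrix. The sketch below is a minimal tokenizer for this one pattern. It splits on whitespace, so it will not cope with strings that contain spaces; it ignores any enclosing transformation matrix; and the position it reports is the glyph origin, which is not necessarily the centre of the drawn circle. The function name is hypothetical.

```python
def text_points(content):
    """Return (x, y, glyph) for each Tj, using the translation from the
    most recent Tm operator."""
    operands = []
    position = None
    points = []
    for tok in content.split():
        if tok == "Tm":
            # Tm takes a b c d e f; e and f are the x/y translation.
            position = (float(operands[-2]), float(operands[-1]))
            operands = []
        elif tok == "Tj":
            if position is not None and operands:
                glyph = operands[-1].strip("()")
                points.append((position[0], position[1], glyph))
            operands = []
        elif tok in {"Tf", "Tr"}:
            operands = []  # font and rendering mode are not needed here
        else:
            operands.append(tok)
    return points
```

Applied to the line above, this returns a single entry at (135.35, 423.79) for the glyph `l`.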