rimij405 / documentparser

Document parsing is difficult. This application is being created to knock out the bugs and learn how to use the NetOffice framework. (The rival DocX framework may also be tested.)

License: GNU Affero General Public License v3.0

C# 86.56% HTML 13.44%

documentparser's Introduction

Document Parser

This program is being built to profile resumes, extract client information, and store it in a way that makes the data manageable.

Development Process:

  1. Framework that will store the data. Contact information, experiences, and education all go here.

  2. Application that will handle extracting data.

  3. Application that will handle formatting the extracted data.

  4. Framework that will handle the conversion of one document to the other.

Who is this for?

This program was originally created for a recruiting agency. The source code is hereby released for free to the open source community under the terms of the AGPLv3 license.

Who created this?

You can learn more about the developer here, or check out his GitHub profile.

Github | @Rimij405
Twitter | @Rimij405
LinkedIn | Ian Effendi

License

Boilerplate

Resume Scraper Copyright (C) 2016 Ian A. Effendi

This project was created to scrape client data and information from different documents and place it into a separate document, guided by a series of design specifications defined in the code.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see: [ http://www.gnu.org/licenses/ ].

Dependencies and Attributions

This project makes use of the following open-source external libraries:

- libphonenumber-csharp (Under the Apache 2.0 License)

< https://github.com/erezak/libphonenumber-csharp/blob/master/LICENSE >

This library is used by the Telephone.cs class to help extract and format telephone numbers in a variety of document settings. This particular library is a port of the original Java library.


- US Address Parser (Under the GNU General Public License v. 2)

< https://usaddress.codeplex.com/ >

This library is used by the AddressBook.cs class to help extract and format street addresses in a variety of documents and settings. This library is a partial port of the original Perl module "Geo::StreetAddress::US", written by Schuyler D. Erle for CPAN.


- DocX (Under the Microsoft Public License)

< https://docx.codeplex.com >

This .NET library allows developers to manipulate Word documents from Word 2007/2010/2013. It does not require Microsoft Word or Office to be installed. It was written by Cathal Coffey.


- iTextSharp Library (Under the GNU Affero General Public License v. 3)

< http://itextpdf.com/ >

This library helps automate the documentation process involving PDF files. iText's license prevents this source code from being developed for commercial purposes, because a commercial waiver of the AGPLv3 license's limitations has not been purchased at present.

Use of iTextSharp is permitted so long as this source remains open source.


- PDFSharp Library (Under the MIT License)

< http://www.pdfsharp.net/ >

PDFSharp is an Open Source .NET library that can easily create and process PDF documents, on the fly, from any .NET language. The same drawing routines can be used to create PDF documents, draw on the screen, or send output to any printer.


- MigraDoc Foundation Library (Under the MIT License)

< http://www.pdfsharp.net/ >

MigraDoc Foundation is an Open Source .NET library that can easily create documents based on an object model with paragraphs, tables, styles, etc., and render them into PDF or RTF files.


- NPOI Library (Under the Apache 2.0 License)

< https://npoi.codeplex.com/ >

NPOI is the .NET version of the POI Java project at < http://poi.apache.org/ >. POI is an open source project that can help you read and write .xls, .doc, and .ppt files, and it has a wide range of applications.


- Newtonsoft Json.NET Library (Under the MIT License)

< http://www.newtonsoft.com/json >

A high-performance JSON serializer library, released as open source under the MIT License.

It supports LINQ queries, XML conversion, and the .NET languages, making it versatile and invaluable.


- SharpZipLib Library (Under the MIT License)

< https://github.com/icsharpcode/SharpZipLib >

This library was previously released under the GNU General Public License v. 2 (GPLv2); however, it has since been re-released under the MIT License, a simpler, more permissive license.
