awesome-web-data-extractor

A curated list of promising Web Data Extractors resources

80legs - Powerful and Economical Service Platform for Crawling and Processing Web Content
http://www.80legs.com/
Agenty – Hosted Web Scraping Tool
https://www.agenty.com/
Anthracite
http://freecode.com/projects/anthracite
Aristo - Answer Questions with a Knowledgeable Machine http://allenai.org/aristo/
artoo.js - The Client-Side Scraping Companion http://medialab.github.io/artoo/
AutoMate - Automate Data Extraction
https://www.networkautomation.com/

Automated RSS Scraper Scripts
http://www.djeaux.com/rss/
Automated Information Solutions
http://www.automated-info-solutions.com/
Automatic Information Extraction From Semi-Structured Web Pages By Pattern Discovery
http://portal.acm.org/citation.cfm?id=640423&dl=ACM&coll=portal
Beautiful Soup
http://freecode.com/projects/beautifulsoup
Beautiful Soup - HTML/XML Parser for Quick Turnaround Screen Scraping and Web Data Extraction http://www.crummy.com/software/BeautifulSoup/
BLIASoft Knowledge Discovery http://www.bliasoft.com/Eindex.html
Bot Research
http://www.BotResearch.info/
BYU Data Extraction Research Group
http://www.deg.byu.edu/
Captiva Software: Digital Information Capture Software
http://www.emc.com/enterprise-content-management/captiva/captiva.htm
ChartSearch Data Search Technology
http://www.ChartSearch.net/
Client-Side Deep Web Data Extraction
http://www.tic.udc.es/~mad/publications/ceceast2004.pdf
CloudScrape – Extract, Enrich and Connect
http://www.cloudscrape.com/
Common Crawl
http://www.commoncrawl.org/

Connotate – Web Data Extraction and Monitoring
http://www.connotate.com/
Content Grabber – Extract Data from Websites
http://www.ContentGrabber.com/
ContextMiner - Tools to Collect Data, Metadata and Contextual Information http://www.contextminer.org/
cQuery - Content Query Engine
http://cquery.com/
CrawlMonster
http://www.crawlmonster.com/
Crawly
http://crawly.diffbot.com/
Create a Crawler - Extract Data From an Entire Website https://www.import.io/
cURL groks URLs - Command Line Tool for Transferring Data http://curl.haxx.se/
Data Extraction Services
http://www.dataextractionservices.com/
DataHen – Advanced Web Scraping and Data Extraction Services
https://www.datahen.com/
Data Mining Resources
http://www.DataMiningResources.info/
Data Miner – Extract Data From any Website in Seconds
https://data-miner.io/
Dataminr - Real-time Information Discovery http://www.dataminr.com/
Data Scraper – Easy Web Scraping with Google Chrome
https://chrome.google.com/webstore/detail/data-scraper-easy-web-scr/nndknepjnldbdbepjfgmncbggmopgden?hl=en-US

DataSift - Powerful Social Data Platform http://datasift.com/
Data Toolbar – Web Data Extraction Software Made Simple
http://datatoolbar.com/
DataWatch Monarch – Self-Service Data Preparation
http://www.datawatch.com/
DataWrangler - Data Cleaning and Transformation Tool http://vis.stanford.edu/wrangler/
Deep Web Research 2017
http://www.DeepWebResearch.info/
DEiXTo – Powerful Web Data Extraction Tool Based on W3C DOM
http://deixto.com/
dexi.io – Web Data Processing for Professionals – Extract, Enrich and Connect
https://dexi.io/
DiffBot – Web Data Extraction Using Artificial Intelligence
http://www.DiffBot.com/
Digital Footprints - Collect Facebook Data http://digitalfootprints.dk/
DiscoverText - Import, Sort, Distribute and Analyze Electronic Content from eMail, Document Repositories, and Social Media http://discovertext.com/
Easy PDF Cloud https://www.easypdfcloud.com/
Easy Web Extract – Best Tool for Web Scraping
http://webextract.net/
eGrabber - Data Capture Tools
http://www.egrabber.com/
Facepager - Fetching Public Data From Facebook https://github.com/strohne/Facepager

FeedsAPI - Extract Content from Web Pages Tool http://www.feedsapi.com/
Ficstar Software - Web Data Extraction
http://www.ficstar.com/
File Information Tool Set (FITS) https://projects.iq.harvard.edu/fits
FMiner – Web Scraping Software
http://www.fminer.com/
Fresh WebSuction
http://www.freshwebmaster.com/
Grabby
https://grabby.io/
Grepsr – Web Scraping Made Simple, Fast and Manageable
https://www.grepsr.com/
Helium Scraper
http://www.heliumscraper.com/
Huginn - Your Agents Are Standing By https://github.com/cantino/huginn
iMacros – Data Extraction
http://imacros.net/overview
Imagination Engines
http://www.Imagination-Engines.com/
Import.io - Turn the Web Into Data With Extractors, Crawlers and Connectors https://import.io/
InfoExtractor - Extracts Relevant Information from Blogs, YouTube and Twitter http://www.infoextractor.org/
Information Retrieval (IR) and Information Extraction (IE) on the Web
http://www.webir.org/

Introduction to Information Retrieval
http://www-nlp.stanford.edu/IR-book/
iOpus Internet Macros
http://www.iopus.com/imacros/
iRobotSoft – Visual Web Scraping and Web Automation
http://irobotsoft.com/
iWeb Scraping Services
http://www.iwebscraping.com/
Junar - Discovering Data http://www.junar.com/
Karma - Data Integration Tool
http://www.isi.edu/integration/karma/
Kimono - Turn Website Into Structured APIs From Your Browser In Seconds https://www.kimonolabs.com/
Knowledge Discovery Resources
http://www.KnowledgeDiscovery.info/
Knowlesys® - Web Data Extraction, Web Grabber and Screen Scraper
http://www.knowlesys.com/index.htm
Liberty Metrics – Web Scraping Services
http://libertymetrics.com/
LingPipe – Information Extraction and Data Mining Tools
http://alias-i.com/lingpipe/
Metadata Extraction Tool
http://meta-extractor.sourceforge.net/
Mozenda – Comprehensive Web Data Gathering
http://www.mozenda.com/
NCapture - Capture Web Content http://www.qsrinternational.com/products_nvivo_add-ons.aspx

Netlytic - Making Sense of Online Conversations https://netlytic.org/home/
Newprosoft – Web Data Extraction Software
http://newprosoft.com/
NewsClipper.com - Snip and Ship Dynamic News Content to Your Web Pages
http://www.newsclipper.com/
Octoparse – Automated Web Scraping Software
http://www.octoparse.com/
Online Data Extractor Tool
http://www.onlinedataextractor.com/
OutWit Hub - Harvest the Web With Your Own Web Collection Engine http://www.outwit.com/
ParseHub – Web Crawling Using Machine Learning
http://www.ParseHub.com/
Pervasive Data Management and Integration Products
http://www.pervasive.com/
Priceonomics - Crawl Data From the Web http://priceonomics.com/
QL2 Software - Unstructured Data Management and Web Mining Software
http://www.ql2.com/
Quick Code
https://quickcode.io/
REBOL Technologies
http://www.rebol.com/
SalesTools.io
https://salestools.io/
Semantic Scholar - Free Scientific Literature Search and Discovery http://allenai.org/semantic-scholar/

ScrapeForge
http://freecode.com/projects/scrapeforge
ScrapeHero
https://www.scrapehero.com/
Scraper
http://freecode.com/projects/scraper
ScrapingHub – Cloud Based Data Extraction Tool
http://www.ScrapingHub.com/
Scraping Solutions – When the Solution You Seek Seems Impossible
https://www.scrapingsolutions.com.au/
Scrapy – Open Source Web Scraping Framework for Python
http://scrapy.org/
Screen-Scraper
http://freecode.com/projects/screenscraper
Screen-Scraper – Extracts Information From Web Sites
http://www.Screen-Scraper.com/
Screenscraping the Senate by Paul Ford
http://www.xml.com/pub/a/2004/09/01/hack-congress.html
Search and Replace with TextPipe Pattern Matching
http://www.datamystic.com/textpipe.html
Sensible Code
http://sensiblecode.io/
Social Media Data Collection Tools http://socialmediadata.wikidot.com/
Software for Web Scraping
http://scraping.pro/software-for-web-scraping/
Spinn3r - Indexing the Blogosphere http://docs.spinn3r.com/#overview

SPSS Modeler
http://developer.ibm.com/predictiveanalytics
Squirro - Find, Remember, Organize and Share Important Information https://squirro.com/
STACKS - Social Media Tracker, Analyzer, & Collector Toolkit at Syracuse https://github.com/bitslabsyr/stack
TadaWeb - Clone and Amplify Human Intelligence for Web Data Collection and Analysis https://www.tadaweb.com/
Texifter - Search, Sift, Sort, Classify and Analyze http://texifter.com/
TextConverter 4
https://www.simx.com/
TextRazor - Text Analysis Infrastructure https://www.textrazor.com/
Topicgrazer - Graze On Web Pages and Documents http://www.topicscape.com/Topicgrazer/help.php
UiPath – Web Data Extraction
https://www.uipath.com/guides/web-data-extraction
Unit Miner - Web Data Extraction Software
http://www.unitminer.com/
VietSpider
http://binhgiang.sourceforge.net/
VisualScraper – Web Data Extractor
http://www.VisualScraper.com/
Visual Web Ripper – Data Extraction Software
http://www.VisualWebRipper.com/
Visual Web Task
http://www.lencom.com/VisualWTSite.html

W3C Publishes Data Extraction Language (DEL) as W3C Note
http://xml.coverpages.org/ni2001-11-06-a.html
Web Content Extractor
http://www.newprosoft.com/
Web Data Extraction
http://www.wintask.com/web-data-extraction.php
Web Data Extraction Software Data Toolbar
https://webdataextractionsoftwaredatatoolbar.en.softonic.com/
Web Data Extractor
http://www.rafasoft.com/
Web Data Extractor
http://www.webextractor.com/
Web Data Extractor
http://fivesmallq.github.io/web-data-extractor
Web Data Extractor
http://www.lantechsoft.com/web-data-extractor.html
Web Data Guru – Web Data Extraction and Scraping Services
http://www.webdataguru.com/
Web-Harvest – Open Source Web Data Extraction Tool
http://web-harvest.sourceforge.net/index.php
WebHarvy – Intuitive Powerful Visual Web Scraper
https://www.webharvy.com/index.html
Webhose.io – Web Data For Your Business
http://www.webhose.io/
Web Robots – Web Scraping and Crawling
https://webrobots.io/
Web Scraper
http://www.webscraper.io/

Web Scraping – Wikipedia
https://en.wikipedia.org/wiki/Web_scraping
Website Data Extractor – Time to Rethink Web Scraping
http://www.kofax.com/
Website Extractor – Offline Browser
http://www.internet-soft.com/extractor.htm
WebSunDew – Advanced Web Scraping Tool
http://www.websundew.com/
Wikimedia Public Data Dumps http://meta.wikimedia.org/wiki/Data_dumps
WinAutomation
http://www.winautomation.com/
XRay Web Scraping Tool
http://freecode.com/projects/xrayguibasedwebscrapingtool
YaCy Web page Indexer
http://freecode.com/projects/yacy

Subject Tracer™ Information Blogs
Subject Tracer™ Information Blogs, created and developed by the Virtual Private Library™, combine the best of the latest tools on the Internet. Using bots, blogs, and news aggregators, they generate RSS feeds with the latest resources, creating a current flow of information through niche subject tracers. I am proud to be the creator of the Internet’s first Subject Tracer™ Information Blogs:
Virtual Private Library™ http://www.VirtualPrivateLibrary.com/
Accessibility Resources
http://www.AccessibilityResources.info/
Agriculture Resources
http://www.AgricultureResources.info/
AnswerSpot
http://www.AnswerSpot.us/
Artificial Intelligence Resources
http://www.AIResources.info/
Astronomy Resources
http://www.AstronomyResources.info/
Auction Resources
http://www.AuctionResources.info/
Biological Informatics
http://www.BiologicalInformatics.info/
Biotechnology Resources
http://www.BiotechnologyResources.info/
Bot Research
http://www.BotResearch.info/
Business Intelligence Resources
http://www.BIResources.info/

ChatterBots
http://www.ChatterBots.info/
Data Mining Resources
http://www.DataMiningResources.info/
Deep Web Research
http://www.DeepWebResearch.info/
Directory Resources
http://www.DirectoryResources.info/
eCommerce Resources
http://eCommerceResources.info/
Education and Academic Resources
http://www.EducationResources.info/
Elder Resources
http://www.ElderResources.info/
Employment Resources
http://www.EmploymentResources.info/
Entrepreneurial Resources
http://www.EntrepreneurialResources.info/
Fact Checkers Directory
http://www.FactCheckers.info/
Financial Sources
http://www.FinancialSources.info/
Finding People
http://www.FindingPeople.info/
Games Resources
http://www.GamesResources.info/
Genealogy Resources
http://www.GenealogyResources.info/

Grant Resources
http://www.GrantResources.info/
Green Files
http://www.GreenFiles.info/
Grid, Distributed and Cloud Computing Resources
http://www.GridResources.info/
Healthcare Resources
http://www.HealthcareResources.info/
Information Futures Markets
http://www.InformationFuturesMarkets.com/
Information Quality Resources
http://www.InformationQualityResources.info/
International Trade Resources
http://www.InternationalTradeResources.info/
Internet Alerts
http://www.InternetAlerts.info/
Internet Demographics
http://www.InternetDemographics.info/
Internet Experts 2016
http://www.InternetExperts.info/
Internet Hoaxes
http://www.InternetHoaxes.info/
Intrapreneurial Resources
http://www.IntrapreneurialResources.info/
Journalism Resources
http://www.JournalismResources.info/
Knowledge Discovery
http://www.KnowledgeDiscovery.info/

Military Resources
http://www.MilitaryResources.info/
New Economy Analytics, Resources and Alerts
http://www.NewEconomyAnalytics.com/
Outsourcing/Offshoring Information and Resources
http://www.OutsourcingOffshore.us/
Privacy Resources
http://www.PrivacyResources.info/
ProxyCrawl - Crawling and Scraping Tools https://proxycrawl.com
Reference Resources
http://www.ReferenceResources.info/
Research Resources
http://www.ResearchResources.info/
RestStress™
http://www.RestStress.com/
Script Resources
http://www.ScriptResources.info/
ShoppingBots
http://www.ShoppingBots.info/
Social Informatics
http://www.SocialInformatics.info/
Statistics Resources and Big Data
http://www.StatisticsResources.info/
Student Research
http://www.StudentResearch.info/
Theology Resources
http://www.TheologyResources.info/
Tutorial Resources
http://www.TutorialResources.info/

World Wide Web Reference
http://www.WWWReference.info/

Original material from, and inspired by, Web Data Extractors 2018, a white paper link compilation written by Marcus P. Zillman, M.S., A.M.H.A.

awesome-web-data-extractor's People

Contributors

crawlbase, wanghaisheng
