Topic: scraping

👇 Here are 5863 public repositories matching this topic...

aapatre / automatic-udemy-course-enroller-get-paid-udemy-courses-for-free

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

User: aapatre

python python3 scraper scraping selenium

adbar / trafilatura

Python & command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

User: adbar

Home Page: https://trafilatura.readthedocs.io

article-extractor corpus corpus-builder corpus-tools crawler html-to-markdown html2text news news-aggregator news-crawler nlp readability rss-feed scraping tei text-cleaning text-extraction text-mining text-preprocessing web-scraping

alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

User: alirezamika

scraping scraper scrape webscraping crawler web-scraping ai artificial-intelligence python webautomation automation machine-learning

altimis / scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

User: altimis

selenium-webdriver scraper scraping twitter tweets python following followers twitter-scraper scrape

apify / crawlee

Crawlee: a web scraping and browser automation library for Node.js to build reliable crawlers, in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Supports both headful and headless mode, with proxy rotation.

Organization: apify

Home Page: https://crawlee.dev

apify automation crawler crawling headless headless-chrome javascript nodejs npm playwright puppeteer scraper scraping typescript web-crawler web-crawling web-scraping

apify / crawlee-python

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless mode, with proxy rotation.

Organization: apify

Home Page: https://crawlee.dev/python/

apify automation beautifulsoup crawler crawling headless headless-chrome pip playwright python

apify / fingerprint-suite

Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

Organization: apify

fingerprinting playwright puppeteer scraping typescript

claffin / cloudproxy

Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.

User: claffin

Home Page: https://cloudproxy.io/

cloud proxy proxy-server scraping

code4craft / webmagic

A scalable web crawler framework for Java.

User: code4craft

Home Page: http://webmagic.io/

crawler java scraping framework

d60 / twikit

User: d60

Home Page: https://twikit.readthedocs.io/en/latest/twikit.html

python bot client python3 scraper scraping search twitter wrapper twitter-api

damklis / dataengineeringproject

Example end-to-end data engineering project.

User: damklis

big-data scraping mongodb elasticsearch data-engineering kafka kafka-connect debezium django-rest-framework redis airflow minio s3 python data-pipeline hacktoberfest

elixir-crawly / crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

Organization: elixir-crawly

Home Page: https://hexdocs.pm/crawly

elixir erlang scraper scraping scraping-websites extract-data spider crawler crawling

emadehsan / thal

Getting started with Puppeteer and Chrome Headless for Web Scraping

User: emadehsan

Home Page: https://emadehsan.com

puppeteer chrome-headless nodejs scraping mongoose mongodb

fake-useragent / fake-useragent

Up-to-date, simple user-agent faker with a real-world database

Organization: fake-useragent

Home Page: https://pypi.python.org/pypi/fake-useragent

python python3 user agent fake faker scraping user-agent user-agent-spoofer useragent

feder-cr / linkedin_auto_jobs_applier_with_ai

LinkedIn_AIHawk is a tool that automates the job application process on LinkedIn. Using artificial intelligence, it lets users apply for multiple job offers in an automated and personalized way.

User: feder-cr

automation linkedin-api linkedin-scraper bot challenge automate chatgpt gpt job jobsearch

geziyor / geziyor

Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

Organization: geziyor

crawler go scraper scraping spider

gocolly / colly

Elegant Scraper and Crawler Framework for Golang

Organization: gocolly

Home Page: https://go-colly.org/

golang scraper framework crawler scraping crawling spider go

holgerd77 / django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

User: holgerd77

Home Page: http://django-dynamic-scraper.readthedocs.io

python django scraper scraping scrapy spider webscraping

iawia002 / lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

User: iawia002

crawler crawling downloader python python3 scraper scraping video

iiab / iiab

Internet-in-a-Box - Build your own LIBRARY OF ALEXANDRIA with a Raspberry Pi!

Organization: iiab

Home Page: https://internet-in-a-box.org

learning hotspot library scraping raspberry-pi knowledge medical prisoners-rights human-rights curriculum-design diy community-networks mesh-networks privacy civic-tech offline home-school international-development distraction-free education

istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.

Organization: istresearch

Home Page: http://scrapy-cluster.readthedocs.io/

python scrapy kafka redis scraping distributed

jfilter / clean-text

🧹 Python package for text cleaning

User: jfilter

python natural-language-processing text-cleaning text-normalization text-preprocessing python-package nlp user-generated-content scraping

kevinzg / facebook-scraper

Scrape Facebook public pages without an API key

User: kevinzg

facebook facebook-scraper facebook-scraping hacktoberfest scraping

khuyentran1401 / data-science

Collection of useful data science topics along with articles, videos, and code

User: khuyentran1401

Home Page: https://khuyentran1401.github.io/Data-science/

data-science machine-learning natural-language-processing python data-visualization data-analysis articles artificial-intelligence time-series scraping

leoncvlt / loconotion

📄 Python tool to turn Notion.so pages into lightweight, customizable static websites

User: leoncvlt

notion pyhton scraping static-site-generator

lorey / mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

User: lorey

Home Page: https://pypi.org/project/mlscraper/

scraping crawling html machine-learning extraction-engine scraper crawler crawler-python

lorien / awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

User: lorien

web-scraping captcha-bypass captcha-recaptcha crawling crawling-framework crawling-python crawling-tool scraping scraping-framework scraping-python

lorien / grab

Web Scraping Framework

User: lorien

Home Page: https://grab.readthedocs.io

web-scraping http-client framework python pycurl asynchronous network urllib3 spider crawler crawling scraping python-library python3

medialab / artoo

artoo.js - the client-side scraping companion.

Organization: medialab

Home Page: http://medialab.github.io/artoo/

scraping

meetmangukiya / instagram-scraper

Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.

User: meetmangukiya

instagram no-authentication javascript client scraping python-3-6 requests-html instagram-scraper

mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

Organization: mendableai

Home Page: https://firecrawl.dev

ai ai-scraping crawler data html-to-markdown llm markdown rag scraper scraping web-crawler

montferret / ferret

Declarative web scraping

Organization: montferret

Home Page: https://www.montferret.dev/

cdp chrome cli crawler crawling data-mining dsl go golang hacktoberfest library query-language scraper scraping scraping-websites tool

nikolait / googlescraper

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Includes asynchronous networking support.

User: nikolait

Home Page: https://scrapeulous.com/

crawler python scraping search-engine search-engine-optimization search-engines

okfn-brasil / querido-diario

📰 Brazilian government gazettes, accessible to everyone.

Organization: okfn-brasil

Home Page: https://queridodiario.ok.org.br/

data-science machine-learning politics artificial-intelligence open-data civic-tech governments-gazettes govtech spider scraping hacktoberfest hacktoberfest2023

online-judge-tools / oj

Tools for various online judges. Downloading sample cases, generating additional test cases, testing your code, and submitting it.

Organization: online-judge-tools

atcoder automation codeforces competitive-programming programming-contests scraping testing

oscarotero / embed

Get info from any web service or page

User: oscarotero

opengraph twitter-cards embeds scraping oembed

psf / requests-html

Pythonic HTML Parsing for Humans™

Organization: psf

Home Page: http://html.python-requests.org

html scraping python requests http kennethreitz lxml pyquery css-selectors beautifulsoup

scrapegraphai / scrapegraph-ai

Python scraper based on AI

Organization: scrapegraphai

Home Page: https://scrapegraphai.com

machine-learning scraping scraping-python scrapingweb automated-scraper sc gpt-3 gpt-4 llm llama3

scrapy / parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

Organization: scrapy

python lxml xpath xml selectors css scraping hacktoberfest

scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Organization: scrapy

Home Page: https://scrapy.org

crawler crawling framework hacktoberfest python scraping web-scraping web-scraping-python

simonw / shot-scraper

A command-line utility for taking automated screenshots of websites

User: simonw

Home Page: https://shot-scraper.datasette.io

playwright playwright-python scraping screenshot-utility screenshots

smartproxy / smartproxy

HTTP(S)/SOCKS5 rotating residential proxies - code examples & general information.

Organization: smartproxy

Home Page: https://smartproxy.com

proxy proxies https-proxy proxy-server python data-collection proxy-integration proxy-list python-scraper scraping

snooppr / snoop

Snoop: an open-source intelligence tool based on open data (OSINT world)

User: snooppr

Home Page: https://github.com/snooppr/snoop/releases

blueteam ctf geo geocoder infosec ip nickname osint parser pentest police redteam scanner scraping security termux username username-checker username-search web-scraping

sparklemotion / mechanize

Mechanize is a Ruby library that makes automated web interaction easy.

Organization: sparklemotion

Home Page: https://www.rubydoc.info/gems/mechanize/

ruby scraping web

spider-rs / spider

The fastest, most efficient web crawler and scraper written in Rust.

Organization: spider-rs

Home Page: https://spider.cloud

crawler indexer rust spider scraping headless-chrome ai-scraping llm-crawler web-crawler

symfony / panther

A browser testing and web crawling library for PHP and Symfony

Organization: symfony

scraping e2e-testing webdriver selenium selenium-webdriver symfony php chromedriver hacktoberfest

tabulapdf / tabula

Tabula is a tool for liberating data tables trapped inside PDF files

Organization: tabulapdf

Home Page: http://tabula.technology

pdf csv excel tables scraping

transitive-bullshit / awesome-puppeteer

A curated list of awesome Puppeteer resources.

User: transitive-bullshit

automation awesome awesome-list crawling headless-chrome puppeteer scraping

ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)

User: ultrafunkamsterdam

Home Page: https://github.com/UltrafunkAmsterdam/undetected-chromedriver

anti-bot anti-detection automation bot-detection browser captcha chrome chromedriver cloudflare cloudflare-bypass distil navigator python3 scraping selenium testing webdriver

yujiosaka / headless-chrome-crawler

Distributed crawler powered by Headless Chrome

User: yujiosaka

chrome chromium crawler crawling headless-chrome jquery promise puppeteer scraper scraping
