sebastian-nagel
Name: Sebastian Nagel
Type: User
Company: @commoncrawl
Location: Konstanz, Germany
Blog: https://de.linkedin.com/pub/sebastian-nagel/35/320/8b4
Google Cloud Search Apache Nutch Indexer Plugin
A collection of awesome web crawlers and spiders in different languages
Run a high-fidelity browser-based crawler in a single Docker container
Easily convert Common Crawl to a dataset of captions and documents. Image/text, audio/text, video/text, ...
A small tool which uses the Common Crawl URL Index to download documents with certain file types or MIME types for mass-testing of frameworks like Apache POI and Apache Tika
Backend of Common Search. Analyses webpages and sends them to the index.
Instructions for sewing a cotton face mask
A set of reusable Java components that implement functionality common to any web crawler
Tools for managing datasets for governance and training.
Apache Hadoop docker image
DuckDB-Web - Source code of duckdb.org
Open Source, Distributed, RESTful Search Engine
ggplot for python
Apache Hadoop
Neural net language identification for many languages on short texts plus construction-based dialectometry
Impf Bot.py: automation for the Corona ImpfterminService bot
Java library for reading and writing WARC files with a typed API
Port of Google's language-detection library to Python.
news-please - an integrated web crawler and information extractor for news that just works.
Mirror of Apache Nutch
A registry of publicly available datasets on AWS
Open JSON - a truly open source JSON implementation
Experiments and metrics about robots.txt captures, presentation at #ossym2022
A Proposal for Common Crawl to Consider Moving Compression from Gzip to Zstandard
PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena.