thebabellibrarybot

followers: 1 following: 1 repos: 29 gists: 0

Name: babelbot

Type: User

Company: The Babel Public Library

Bio: Experiments and programs investigating the intersection of art and AI with a focus on deciphering cryptic icon and linguistic archives of the distant past

Blog: http://thebabelpubliclibrary.org/

Hi there 👋

- 🔭 I’m currently working on ...

Putting ML models into production for page analysis, text-line extraction, object detection, and HOCR of medieval manuscripts.

Here you can find a variety of tools used to annotate data for ML, format data for ML, and run models in a UI. All projects are workspaces for The Babel Public Library.

You can also check out my basic project portfolio website MumbotPorts

Some of my favorite repos are pinned below. They include a dataset I scraped and formatted to mirror MNIST, using a collection of 9 characters in Latin textura from medieval texts (provided by paleographers); an annotator aimed at leveraging a paleographer's approach to transcribing, compiling, and carefully considering language data found in manuscripts; and exporters and APIs that convert the object structures I regularly use into standardized ML formats or standardized historic library formats such as PAGE XML, COCO, MARC, or Dublin Core. Last but not least, there are APIs that may or may not be available to perform ML-enabled alterations on datasets via Lambda functions and SageMaker endpoints. (The SageMaker endpoints are off more often than not, because that's a whole bill.)

- 🌱 Stack ...

React
Node.js
AWS
Python
GitHub
Docker

- 👯 I’m looking to collaborate on ...

Historic HOCR ML Pipelines !

Game asset generation !

Making DevOps Cheaper !

- 💬 Ask me about ...

I'm really interested in natural language coding, few-shot learning on depreciated data, unstructured language analysis, and just having fun with tech.

- 📫 How to reach me: ...

[email protected]

- 😄 Pronouns: ...

he/him

- ⚡ Fun fact: ...

I love to bike in NYC, 12mi a day baby!

babelbot's Projects

autoencoder-scribes

autoencoder and dataloaders for running anomaly detection on a medieval-scribes version of the MNIST dataset

babelanno-test

Webapp for annotating manuscripts with a paleographic approach to HOCR ground truth data-labeling

babelhistoricannotator

An annotator with advanced labeling, export and versioning tools to facilitate a paleographer's approach to ground truth creation for HOCR model training of manuscripts

bablib

background_funcs

bokted_backend

backend for booking with prod and dev branches

bokted_frontend

booking app frontend with prod and dev branches

booked

a booking and book keeping application for tattoo artists

bookedoauth

booked

casestudy

masterworks case-study with real-estate DB

gitbot-docker

dockerized project that scrapes your GitHub repos and helps you locate the exact files and lines you may be confused about.

gitbot_shell

just the frontend for interaction and viewing purposes

honeyhive

API to make prompt templates and generate prompt completions for GPT-3, Vicuna, and Cohere

jackandadam

crud

keras-dreambooth-i

fine-tune stable diffusion to learn class-instance of sks pixel-art-character

kraken-bot-api

kraken, boto, FROM public.ecr.aws/lambda/python:3.8

kraken_mk_text_lines

crops original image to bbox dims found with kraken api. generates images per text line from original image metadata

libcurator

allows researchers to make collections for their research topics and review collection statistics quickly

manuscript-segmentation

Semantic segmentation of elements in a manuscript page. Includes code for training on custom annotations. Segmentation of text, image, noise, and marginalia, trained on a dataset curated with help from the Morgan Library. NN built on DeepLabv3 via transfer learning, using ResNet-50 pretrained weights.

manuscript_text_segmentation

NN designed to identify and draw bounding boxes around the sections of a page that include text, specifically for manuscripts (13-15c BOH). Crops to the furthest corners of the bounding boxes and applies a mask over the excess non-text space. **Should be followed up with a similar version used to find noise // bad lines and mask over that info, aka prepping pages for line segmentation.
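The crop-and-mask step described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the repo's actual code: the function name `crop_and_mask`, the `(x_min, y_min, x_max, y_max)` box layout, and the nested-list image are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max), with max edges exclusive.
Box = Tuple[int, int, int, int]


def crop_and_mask(image: List[List[int]], boxes: List[Box], fill: int = 0) -> List[List[int]]:
    """Crop to the extent of all boxes, then blank every pixel outside all boxes."""
    if not boxes:
        raise ValueError("at least one bounding box is required")

    # The furthest corners across all boxes define the crop window.
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)

    cropped = []
    for y in range(y0, y1):
        row = []
        for x in range(x0, x1):
            inside = any(
                bx0 <= x < bx1 and by0 <= y < by1
                for bx0, by0, bx1, by1 in boxes
            )
            # Keep text pixels, replace non-text space with the fill value.
            row.append(image[y][x] if inside else fill)
        cropped.append(row)
    return cropped


# A 4x6 toy "page" where every pixel has value 9, with two hypothetical text boxes.
page = [[9] * 6 for _ in range(4)]
text_boxes = [(1, 0, 3, 2), (3, 2, 5, 4)]
result = crop_and_mask(page, text_boxes)
```

On a real page the same idea would normally run on an array library rather than nested lists; the loop form is used here only to keep the sketch dependency-free.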

morganapi

dockerized api and SQL database to look at records from the Morgan Library's archives. data visualization views are linked in this repo.

pack2

artwork packing for storage spaces

prep_fasterrcnn_anno

This is a quick tool for creating image masks used for training a FasterRCNN object detection model. Object detection takes several inputs, including image data, bbox dims, object labels, and image data masks. To quickly prepare all this data with one tool, this script automatically creates masks from a simple set of bbox annotations created with VGG Image Annotator.
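As a rough illustration of turning bbox annotations into masks, the sketch below builds one binary mask per rectangle. It assumes a VIA-style `regions` list in which each region has `shape_attributes` with `name`, `x`, `y`, `width`, and `height`; the function name `rect_regions_to_masks` and the `label` attribute are hypothetical, and the repo's actual input handling may differ.

```python
from typing import Dict, List


def rect_regions_to_masks(regions: List[Dict], height: int, width: int) -> List[List[List[int]]]:
    """Return one binary mask (height x width, values 0 or 1) per rectangular region."""
    masks = []
    for region in regions:
        shape = region["shape_attributes"]
        if shape.get("name") != "rect":
            continue  # this sketch only handles rectangles

        # Clamp the box to the image so out-of-bounds annotations do not fail.
        x0 = max(0, shape["x"])
        y0 = max(0, shape["y"])
        x1 = min(width, shape["x"] + shape["width"])
        y1 = min(height, shape["y"] + shape["height"])

        mask = [
            [1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
            for y in range(height)
        ]
        masks.append(mask)
    return masks


# Hypothetical annotation for a 3x5 image, shaped like a VIA "regions" list.
example_regions = [
    {
        "shape_attributes": {"name": "rect", "x": 1, "y": 0, "width": 2, "height": 2},
        "region_attributes": {"label": "initial"},  # assumed attribute name
    },
]
masks = rect_regions_to_masks(example_regions, height=3, width=5)
```

Each mask can then be paired with its bbox and label when assembling training targets.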