
Hi there 👋

- 🔭 I'm currently working on ...

Putting ML models into production for page analysis, text-line extraction, object detection, and HOCR of medieval manuscripts.

Here you can find a variety of tools used to annotate data for ML, format data for ML, and run models in a UI. All projects are workspaces for The Babel Public Library.

You can also check out my basic project portfolio website, MumbotPorts.

Some of my favorite repos are pinned below. They include a dataset I scraped and formatted to mirror MNIST, using a collection of nine characters in Latin textura from medieval texts (provided by paleographers); an annotator that takes a paleographer's approach to transcribing, compiling, and carefully considering language data found in manuscripts; and exporters and APIs that convert the object structures I regularly use into standardized ML formats or standardized historic library formats such as PAGE XML, COCO, MARC, or Dublin Core. Last but not least, there are APIs, which may or may not be available, that perform ML-enabled alterations on datasets via Lambda functions and SageMaker endpoints. (The SageMaker endpoints are off more often than not, because that's a whole bill.)

- 🌱 Stack ...

  • React
  • Node.js
  • AWS
  • Python
  • GitHub
  • Docker

- 👯 I'm looking to collaborate on ...

Historic HOCR ML Pipelines!

Game asset generation!

Making DevOps Cheaper!

- 💬 Ask me about ...

I'm really interested in natural language coding, few-shot learning on depreciated data, unstructured language analysis, and just having fun with tech.

- 📫 How to reach me: ...

[email protected]

- 😄 Pronouns: ...

he/him

- ⚡ Fun fact: ...

I love to bike in NYC, 12 miles a day, baby!

babelbot's Projects

autoencoder-scribes

Autoencoder and dataloaders for running anomaly detection on a medieval-scribes version of the MNIST dataset

babelanno-test

Webapp for annotating manuscripts with a paleographic approach to HOCR ground truth data-labeling

babelhistoricannotator

An annotator with advanced labeling, export and versioning tools to facilitate a paleographer's approach to ground truth creation for HOCR model training of manuscripts

booked

A booking and bookkeeping application for tattoo artists

gitbot-docker

Dockerized project that scrapes your GitHub repos and helps you locate the exact files and lines you may be confused about.

honeyhive

API to make prompt templates and generate prompt completions for GPT-3, Vicuna, and Cohere

kraken_mk_text_lines

Crops the original image to the bbox dims found with the kraken API, generating one image per text line from the original image metadata.

libcurator

allows researchers to make collections for their research topics and review collection statistics quickly

manuscript-segmentation

Semantic segmentation of elements in a manuscript page. Includes code for training on custom annotations. Segmentation of text, image, noise, and marginalia, trained on a dataset curated with help from the Morgan Library. The NN is built by transfer learning on DeepLabv3 with ResNet-50 pretrained weights.

manuscript_text_segmentation

NN designed to identify and draw bounding boxes around the sections of a page that include text, for manuscripts of course (13-15c BOH). Crops to the furthest corners of the bounding boxes and applies a mask over the excess non-text space. Should be followed up with a similar version that finds noise and bad lines and masks over them, i.e. prepping pages for line segmentation.
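
The crop-and-mask step described above could look roughly like the sketch below. It is not the repo's code: it assumes boxes arrive as hypothetical `(x_min, y_min, x_max, y_max)` pixel tuples and uses a plain nested list as a stand-in for a grayscale image.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max), max edges exclusive.
Box = Tuple[int, int, int, int]


def union_box(boxes: List[Box]) -> Box:
    """Return the smallest box that reaches the furthest corners of all boxes."""
    if not boxes:
        raise ValueError("at least one box is required")
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def crop_and_mask(image: List[List[int]], boxes: List[Box], fill: int = 0) -> List[List[int]]:
    """Crop to the union of the boxes and overwrite non-text pixels with `fill`."""
    x0, y0, x1, y1 = union_box(boxes)
    cropped = []
    for y in range(y0, y1):
        row = []
        for x in range(x0, x1):
            # A pixel is kept only if it falls inside at least one text box.
            inside = any(
                bx0 <= x < bx1 and by0 <= y < by1
                for bx0, by0, bx1, by1 in boxes
            )
            row.append(image[y][x] if inside else fill)
        cropped.append(row)
    return cropped
```

In a real pipeline the nested list would likely be an image array and the boxes would come from the detection model's output.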

morganapi

Dockerized API and SQL database for looking at records from the Morgan Library's archives. Data visualization views are linked in this repo.

pack2

artwork packing for storage spaces

prep_fasterrcnn_anno

This is a quick tool for creating image masks used for training a FasterRCNN object detection model. Object detection takes several inputs, including image data, bbox dims, object labels, and image data masks. To prepare all this data quickly with one tool, this script automatically creates masks from a simple set of bbox annotations created with VGG Image Annotator.
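
As a rough illustration of the idea (not the script itself), the sketch below turns rectangle annotations into an instance mask. It assumes each annotation carries `x`, `y`, `width`, and `height` keys, as in VGG Image Annotator's rectangle `shape_attributes`; the helper names `rect_to_box` and `boxes_to_mask` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max edges exclusive


def rect_to_box(shape: Dict[str, int]) -> Box:
    """Convert an {x, y, width, height} rectangle into corner coordinates."""
    x, y = shape["x"], shape["y"]
    return (x, y, x + shape["width"], y + shape["height"])


def boxes_to_mask(height: int, width: int, boxes: List[Box]) -> List[List[int]]:
    """Build a mask where 0 is background and box i is filled with i + 1."""
    mask = [[0] * width for _ in range(height)]
    for label, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        # Clamp to the image so boxes that overhang the edge do not fail.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = label
    return mask
```

Later boxes overwrite earlier ones where they overlap, which is a simplification; a training pipeline may prefer one binary mask per object instead.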

wand_dhseg_playground

A playground for building an LLM with LangChain that will either write the integration itself or accurately identify and request the methods needed to integrate any repo (in this case dhSegment) with wandb for training and logging.

yolo_dims_edit_img_api

Takes dims found with a YOLO SageMaker model, now stored in MongoDB, and generates new image_versions from the coordinates.
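
A minimal sketch of the coordinate step, assuming the stored dims use the common YOLO convention of normalized `(center_x, center_y, width, height)` values. The description above does not confirm that format, and `yolo_to_pixel_box` is a hypothetical helper, not part of the API.

```python
from typing import Tuple


def yolo_to_pixel_box(
    cx: float, cy: float, w: float, h: float, img_w: int, img_h: int
) -> Tuple[int, int, int, int]:
    """Convert a normalized YOLO box to pixel (x_min, y_min, x_max, y_max)."""
    x0 = round((cx - w / 2) * img_w)
    y0 = round((cy - h / 2) * img_h)
    x1 = round((cx + w / 2) * img_w)
    y1 = round((cy + h / 2) * img_h)
    # Clamp to the image bounds so a crop never reaches outside the image.
    x0, x1 = max(0, x0), min(img_w, x1)
    y0, y1 = max(0, y0), min(img_h, y1)
    return (x0, y0, x1, y1)
```

The resulting tuple is in the corner order that most image libraries expect for a crop call, so each stored detection could map to one new image version.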
