mrgenius01 / finetuner

This project forked from jina-ai/finetuner

🎯 Task-oriented embedding tuning for BERT, CLIP, etc.

Home Page: https://finetuner.jina.ai

License: Apache License 2.0

Python 98.05% Makefile 1.95%

finetuner's Introduction



Finetuner helps you create experiments to improve embeddings on search tasks. It accompanies you in delivering the last mile of performance tuning for neural search applications.

Task-oriented finetuning for better embeddings on neural search

Fine-tuning is an effective way to improve performance on neural search tasks. However, setting up and performing fine-tuning can be very time-consuming and resource-intensive.

Jina AI's Finetuner makes fine-tuning easier and faster by streamlining the workflow and handling all the complexity and infrastructure in the cloud. With Finetuner, you can easily enhance the performance of pre-trained models, making them production-ready without extensive labeling or expensive hardware.

🎏 Better embeddings: Create high-quality embeddings for semantic search, visual similarity search, cross-modal text<->image search, recommendation systems, clustering, duplicate detection, anomaly detection, or other uses.

Low budget, high expectations: Bring considerable improvements to model performance with as few as a few hundred training samples, and finish fine-tuning in as little as an hour.

📈 Performance promise: Enhance the performance of pre-trained models so that they deliver state-of-the-art performance on domain-specific applications.

🔱 Simple yet powerful: Easy access to 40+ mainstream loss functions, 10+ optimizers, layer pruning, weight freezing, dimensionality reduction, hard-negative mining, cross-modal models, and distributed training.

All-in-cloud: Train on our GPU infrastructure and manage runs, experiments, and artifacts on Jina AI Cloud without worrying about resource availability, complex integration, or infrastructure costs.

Pretrained Text Embedding Models

| Name | Parameters | Dimension | Hugging Face |
|---|---|---|---|
| jina-embedding-t-en-v1 | 14M | 312 | link |
| jina-embedding-s-en-v1 | 35M | 512 | link |
| jina-embedding-b-en-v1 | 110M | 768 | link |
| jina-embedding-l-en-v1 | 330M | 1024 | link |
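
The embedding dimension largely determines how much space a vector index needs. The following is a rough sketch, not part of Finetuner: it assumes each embedding is stored as uncompressed float32 (4 bytes per value) and ignores any index overhead. The helper name `index_size_bytes` is hypothetical; the dimensions are copied from the table above.

```python
# Dimensions copied from the model table; float32 storage is an assumption.
MODEL_DIMS = {
    "jina-embedding-t-en-v1": 312,
    "jina-embedding-s-en-v1": 512,
    "jina-embedding-b-en-v1": 768,
    "jina-embedding-l-en-v1": 1024,
}
BYTES_PER_FLOAT32 = 4


def index_size_bytes(model_name: str, num_vectors: int) -> int:
    """Raw storage for `num_vectors` embeddings, without index overhead."""
    return MODEL_DIMS[model_name] * BYTES_PER_FLOAT32 * num_vectors


if __name__ == "__main__":
    for name in MODEL_DIMS:
        size_gb = index_size_bytes(name, 1_000_000) / 1e9
        print(f"{name}: {size_gb:.3f} GB per million vectors")
```

Under these assumptions, moving from the smallest to the largest model roughly triples the raw storage per vector, which is worth weighing against the quality gain for a given application.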

Benchmarks

| Model | Task | Metric | Pretrained | Finetuned | Delta |
|---|---|---|---|---|---|
| BERT | Quora Question Answering | mRR | 0.835 | 0.967 | 15.8% |
| | | Recall | 0.915 | 0.963 | 5.3% |
| ResNet | Visual similarity search on TLL | mAP | 0.110 | 0.196 | 78.2% |
| | | Recall | 0.249 | 0.460 | 84.7% |
| CLIP | Deep Fashion text-to-image search | mRR | 0.575 | 0.676 | 17.4% |
| | | Recall | 0.473 | 0.564 | 19.2% |
| M-CLIP | Cross market product recommendation (German) | mRR | 0.430 | 0.648 | 50.7% |
| | | Recall | 0.247 | 0.340 | 37.7% |
| PointNet++ | ModelNet40 3D Mesh Search | mRR | 0.791 | 0.891 | 12.7% |
| | | Recall | 0.154 | 0.242 | 57.1% |

All metrics were evaluated at k=20 after training for 5 epochs using the Adam optimizer, with learning rates of 1e-4 for ResNet, 1e-7 for CLIP, 1e-5 for the BERT models, and 5e-4 for PointNet++.
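
For readers unfamiliar with the metrics, here is a minimal sketch of how mean reciprocal rank and recall at a cutoff k are commonly computed, and how a relative Delta can be derived from a pretrained and a finetuned score. These are textbook definitions under the assumption that each query has a ranked result list and a set of relevant items; Finetuner's own evaluator may differ in details. The function names and the toy data are hypothetical.

```python
from typing import Hashable, List, Sequence, Set


def reciprocal_rank_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 20) -> float:
    """1/rank of the first relevant item within the top k, or 0.0 if none."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 20) -> float:
    """Fraction of the relevant items that appear within the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(rankings: List[Sequence[Hashable]], relevants: List[Set[Hashable]], k: int = 20) -> float:
    """Average of the per-query reciprocal ranks."""
    scores = [reciprocal_rank_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


def relative_delta_percent(pretrained: float, finetuned: float) -> float:
    """Relative improvement of the finetuned score over the pretrained one."""
    return (finetuned - pretrained) / pretrained * 100.0


if __name__ == "__main__":
    # Toy query: the first relevant document is at rank 2.
    ranked = ["d7", "d2", "d9", "d4"]
    relevant = {"d2", "d4", "d5"}
    print(reciprocal_rank_at_k(ranked, relevant))  # first hit at rank 2
    print(recall_at_k(ranked, relevant))           # 2 of 3 relevant items found
    # BERT mRR row from the table: 0.835 -> 0.967
    print(round(relative_delta_percent(0.835, 0.967), 1))
```

Computed this way, the BERT mRR row gives 15.8%, matching the table. A few other rows differ by a tenth or two, plausibly because the published scores are themselves rounded.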

Install

Make sure you have Python 3.8+ installed. Finetuner can be installed via pip by executing:

pip install -U finetuner

If you want to submit a fine-tuning job on the cloud, please use:

pip install "finetuner[full]"

⚠️ Starting with version 0.5.0, Finetuner computing is performed on Jina AI Cloud. The last local version is 0.4.1. This version is still available for installation via pip. See Finetuner git tags and releases.
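
To see which side of the 0.5.0 boundary an environment is on, a small standard-library check can help. This is only a sketch: the helper names are hypothetical, the version parsing is deliberately simplified (it reads the leading digits of the first three components and ignores pre-release suffixes), and it only reports what is installed without importing Finetuner.

```python
from importlib import metadata
from typing import Optional, Tuple

# From the note above: computing runs on Jina AI Cloud starting with 0.5.0.
CLOUD_SINCE = (0, 5, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Simplified parser: leading digits of the first three dot-separated parts."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def runs_in_cloud(version: str) -> bool:
    """True if this version performs its computing on Jina AI Cloud."""
    return parse_version(version) >= CLOUD_SINCE


def installed_finetuner_version() -> Optional[str]:
    """Installed version of the 'finetuner' distribution, or None if absent."""
    try:
        return metadata.version("finetuner")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_finetuner_version()
    if version is None:
        print("finetuner is not installed")
    elif runs_in_cloud(version):
        print(f"finetuner {version}: computing runs on Jina AI Cloud")
    else:
        print(f"finetuner {version}: local version")
```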

Articles about Finetuner

Check out our published blogposts and tutorials to see Finetuner in action!

If you find Jina Embeddings useful in your research, please cite the following paper:

@misc{günther2023jina,
      title={Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models}, 
      author={Michael Günther and Louis Milliken and Jonathan Geuter and Georgios Mastrapas and Bo Wang and Han Xiao},
      year={2023},
      eprint={2307.11224},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Support

Join Us

Finetuner is backed by Jina AI and licensed under Apache-2.0.

We are actively hiring AI engineers and solution engineers to build the next generation of open-source AI ecosystems.
