SS_project

Resource:

purpose:

  • find similar commits
  • input: batches of commits (including forks)
  • output: clusters of similar commits
  • currently focused on a single repository

Environment:

  • conda create --name ss
  • conda activate ss
  • pip install -r requirements.txt
  • conda install -c pytorch faiss-gpu

Question:

  1. How does k-shingling deal with long content?

  2. For dimensionality reduction, why not change the FC layer's output dimension?
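As background for question 1, here is a minimal sketch of k-shingling combined with MinHash, written under stated assumptions rather than taken from this project. Shingles are built from whitespace tokens, and the names `shingles`, `minhash_signature`, `estimated_jaccard` and the parameters `k` and `num_perm` are hypothetical. The point it illustrates: the shingle set grows with the length of the content, while the MinHash signature stays at a fixed length.

```python
import hashlib


def shingles(text, k=5):
    """Return the set of k-token shingles of `text` (whitespace tokens)."""
    tokens = text.split()
    if len(tokens) < k:
        # Short input: treat the whole token sequence as a single shingle.
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(shingle, seed):
    # Salted 64-bit hash; each seed acts as one hash function.
    salt = seed.to_bytes(4, "big")
    digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(shingle_set, num_perm=64):
    """Fixed-length signature: per seed, the minimum hash over all shingles."""
    return [
        min((_hash(s, seed) for s in shingle_set), default=2**64 - 1)
        for seed in range(num_perm)
    ]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree (estimates Jaccard)."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

With this approach a long diff costs more time to hash, but the stored signature has `num_perm` entries regardless of input length.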

how:

  • Flat Index: used when the dataset fits in RAM and an exact search is required.
  • PQ Index: when memory is a constraint and the dataset is large, compress the vectors with Product Quantization (PQ) to reduce the memory footprint.
  • Flat Index with IVF: for large datasets, an inverted file index (IVF) improves search speed.
  • PQ Index with IVF: the choice when both memory and search efficiency matter, especially when the dataset is large.
  • Annoy Index: the choice when the vector dimension is less than 100.
  • HNSWx Index: used when the vector dimension is greater than or equal to 100 and a more efficient index is needed.
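The selection rules above can be sketched as a small helper. This is one possible reading of the notes, not the project's actual logic: the function name `choose_index`, its boolean flags, and the defaults `nlist=256` and `pq_m=8` are assumptions made for illustration. The returned strings other than `"Annoy"` follow the description-string format accepted by `faiss.index_factory(dim, description)`; Annoy is a separate library, so its label is only a marker.

```python
def choose_index(*, dim, fits_in_ram, need_exact, memory_limited,
                 need_fast_search, large_dataset, nlist=256, pq_m=8):
    """Return an index description following the rules of thumb above.

    All parameters are hypothetical knobs for this sketch. `pq_m` (number
    of PQ sub-quantizers) must divide `dim` for FAISS to accept it.
    """
    if fits_in_ram and need_exact:
        # Exact brute-force search.
        return "Flat"
    if large_dataset and memory_limited:
        if need_fast_search:
            # Compressed vectors plus inverted-file partitioning.
            return f"IVF{nlist},PQ{pq_m}"
        # Compressed vectors only.
        return f"PQ{pq_m}"
    if large_dataset:
        # Uncompressed vectors, partitioned for faster search.
        return f"IVF{nlist},Flat"
    # Remaining cases are decided by dimension, per the notes.
    return "Annoy" if dim < 100 else "HNSW32"
```

For example, `choose_index(dim=128, fits_in_ram=True, need_exact=True, memory_limited=False, need_fast_search=False, large_dataset=False)` selects the flat index.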

input and output are fixed

input normalization

url -> commits -> diff -> similar diffs -> map back to commits, branch, repository
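The last steps of this pipeline (diff -> similar diffs -> map back) might look roughly like the sketch below. It assumes the commits have already been collected, and it swaps in plain token-set Jaccard similarity with union-find grouping as a stand-in for whatever similarity method the project ends up using. The `Commit` record, `diff_tokens`, `cluster_commits` and the `threshold` value are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Commit:
    sha: str
    branch: str
    repository: str
    diff: str


def diff_tokens(diff):
    """Tokens of added/removed lines only; file and hunk headers are skipped."""
    tokens = set()
    for line in diff.splitlines():
        if line.startswith(("+++", "---", "@@", "diff ", "index ")):
            continue
        if line.startswith(("+", "-")):
            tokens.update(line[1:].split())
    return tokens


def cluster_commits(commits, threshold=0.8):
    """Group commits whose diffs have Jaccard similarity >= threshold."""
    token_sets = [diff_tokens(c.diff) for c in commits]
    parent = list(range(len(commits)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(commits)), 2):
        union = token_sets[i] | token_sets[j]
        shared = token_sets[i] & token_sets[j]
        similarity = len(shared) / len(union) if union else 1.0
        if similarity >= threshold:
            parent[find(i)] = find(j)

    # Map each cluster back to (repository, branch, sha) triples.
    clusters = {}
    for i, commit in enumerate(commits):
        key = (commit.repository, commit.branch, commit.sha)
        clusters.setdefault(find(i), []).append(key)
    return list(clusters.values())
```

The pairwise comparison is quadratic in the number of commits, which is why an approximate nearest-neighbour index becomes attractive for larger inputs.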

try DR and know; evaluation, ablation of different methods

gitlib issues, list of repositories

use cherry harvest, cargo run --release, to test

can use Python; push changes to GitLab
