proxima's Introduction

Proxima Bilin Engine

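Background

As AI technology is applied more widely and data volumes keep growing, the demand for processing unstructured data is also increasing. Vector retrieval has gradually become an indispensable part of the AI technology pipeline, and it also complements traditional search technology.

Proxima is a vector retrieval kernel developed in-house by the System AI Lab of Alibaba DAMO Academy. Its core capabilities are currently used widely across many businesses within Alibaba and Ant Group, such as Taobao search and recommendation, Ant face payment, Youku video search, and Alimama advertising retrieval. Proxima is also deeply integrated into a variety of big data and database products, such as Alibaba Cloud Hologres, the search engines Elastic Search and ZSearch, and the offline engine MaxCompute (ODPS), providing them with vector retrieval capabilities.

Proxima BE, short for Proxima Bilin Engine, is a service engine developed by the Proxima team that provides high-performance similarity search over big data. It can be accessed through a RESTful HTTP interface, and SDKs in multiple languages can access it over the gRPC protocol.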

Core Capabilities

The main core capabilities of Proxima BE are as follows:

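  • Very large single-machine indexes: engineering and retrieval-algorithm optimizations in the underlying vector index deliver efficient retrieval at limited cost. Disk-based indexes are supported, and a single index shard can reach a scale of several billion.

  • Full and incremental synchronization from multiple data sources: through components such as Mysql Repository, data in sources such as MySQL can be synchronized to the index service in real time and made available for queries, which simplifies the data processing workflow.

  • Real-time create, read, update, and delete on vector indexes: built on a brand-new CRUD graph index, it supports streaming writes that build a large-scale online vector index from scratch. Index entries can be added, deleted, updated, and queried immediately, so the index does not need to be rebuilt periodically.

  • Forward-index data queries: all structured fields of a document can be returned at query time. This capability will later be extended to features such as joint text and vector retrieval.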
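
To make the idea of similarity search concrete, the sketch below shows the brute-force baseline: scan every vector and keep the k nearest. It is not Proxima BE's method, which is described here as graph-index based. The function names and the toy data are hypothetical and exist only for illustration.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_topk(index, query, k):
    """Return the k (key, distance) pairs closest to query, nearest first.

    index maps a primary key to its vector. Every vector is scanned, so the
    cost grows linearly with the index size.
    """
    scored = ((key, euclidean(vec, query)) for key, vec in index.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy data, not taken from Proxima BE.
toy_index = {
    1: [0.0, 0.0],
    2: [1.0, 0.0],
    3: [0.0, 3.0],
    4: [5.0, 5.0],
}
results = brute_force_topk(toy_index, [0.9, 0.1], k=2)
print(results)
```

A linear scan like this is exact but becomes impractical at large scale, which is the usual motivation for approximate index structures.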

How to Build

Requirements:

  • Linux or macOS
  • gcc >= 4.9
  • cmake >= 3.14

git clone https://github.com/alibaba/proximabilin.git
cd proximabilin && git submodule update --init

mkdir build && cd build

# Build with Debug (Intel Haswell Microarchitecture)
#cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_HASWELL=ON ..

# Build with Release (Intel Haswell Microarchitecture)
cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_HASWELL=ON ..

make -j all

Get the Docker Image

Platform       Repository                        Version
Linux X86_64   ghcr.io/proximabilin/proxima-be   0.2.0

Quick Start

User Manual

Case Studies

License

Apache License 2.0

Notice

Proxima BE depends on the following projects:

proxima's People

Contributors

bingdai86, chendianzhang, divenswu, fancy-liu, leovirgo, proxima-se

proxima's Issues

Why has this project not been fully open-sourced?

We can see in this directory that the (suspected) core part of this project, proxima, is not open source, and the directory deps/proxima/lib contains binary files of the library for different platforms. What is the reason for this? Is this another KPI open-source project of Alibaba?

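How to improve image search accuracy

Has anyone used this for image search?
Following the tutorial, I used a ResNet18 model in PyTorch to extract image feature vectors, inserted them into the index, and then ran searches. Only the exact same image can be retrieved; an image with even slight modifications cannot. Is this a problem with the feature vector extraction or with the distance calculation?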
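
One general point behind the question of extraction versus distance can be checked in isolation: on L2-normalized vectors, squared Euclidean distance and cosine similarity are tied together by d² = 2 − 2·cos, whereas on raw vectors the magnitude alone can push two same-direction vectors apart. The sketch below demonstrates that with hypothetical three-dimensional vectors. It does not diagnose this particular setup, and the helper names are made up for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical feature vectors: `scaled` points in the same direction as
# `original` but has twice the magnitude.
original = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]

raw_distance = squared_l2(original, scaled)
unit_distance = squared_l2(l2_normalize(original), l2_normalize(scaled))
cos = cosine_similarity(original, scaled)

print(raw_distance, unit_distance, cos)
```

Comparing the raw and normalized distances for a query image and its modified copy is one way to tell whether magnitude, rather than direction, dominates the score.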

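Redundant escaping

The JSON string gets escaped a second time. What I send to proxima is data that has already been converted to a string, but proxima adds another layer of escaping, turning \"key\": \"value\" into \\\"key\\\": \\\"value\\\". Please do not do this unnecessary extra work.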
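
The effect described here is what happens in general when an already-serialized JSON string is serialized again. The sketch below reproduces it with Python's standard json module. It only illustrates the mechanism and makes no claim about where in Proxima BE the second encoding happens.

```python
import json

payload = {"key": "value"}

encoded_once = json.dumps(payload)        # a JSON object
encoded_twice = json.dumps(encoded_once)  # a JSON string that contains JSON

print(encoded_once)   # {"key": "value"}
print(encoded_twice)  # "{\"key\": \"value\"}"

# A receiver has to decode twice to get the original object back.
decoded_once = json.loads(encoded_twice)
decoded_twice = json.loads(decoded_once)
```

Each extra encoding pass adds one more level of backslashes, so the number of decoding passes has to match the number of encoding passes.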

A check in WriteRequestBuilder::validate_request

There is a check in WriteRequestBuilder::validate_request:

auto &request_index_metas = request.row_meta().index_column_metas();
size_t index_column_size = static_cast<size_t>(request_index_metas.size());
...
size_t index_value_size =
    static_cast<size_t>(row.index_column_values().values_size());
if (index_value_size != index_column_size) {
  LOG_ERROR(
      "Row index columns size mismatched. meta[%zu] "
      "values[%zu] collection[%s]",
      index_column_size, index_value_size, collection.c_str());
  return ErrorCode_InvalidWriteRequest;
}

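index_column_size is the number of index columns in the request, and
index_value_size is the values of all index columns in the current row. An index column may have multiple dimensions, so shouldn't these two values be different to begin with?

I am not sure whether my understanding is correct.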

thanks
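
One possible reading of the snippet, which the source does not confirm, is that values_size() counts entries in the value list rather than vector components, with each entry holding one whole vector for one index column. Under that assumption the two counts match whenever every index column has a value, whatever the dimension. The Python sketch below mirrors only the count comparison; the dictionary layout and the function name are hypothetical.

```python
def validate_row(index_column_metas, index_column_values):
    """Mirror the count comparison from the C++ snippet.

    Assumption: index_column_values holds one entry per index column, and
    each entry is a whole vector regardless of its dimension.
    """
    if len(index_column_values) != len(index_column_metas):
        return "ErrorCode_InvalidWriteRequest"
    return "OK"


# Hypothetical request: one index column holding an 8-dimensional vector.
metas = [{"column_name": "feature", "dimension": 8}]
row_values = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]]

status = validate_row(metas, row_values)
# A second vector with no matching column meta trips the check.
mismatch = validate_row(metas, row_values + [[1.0] * 8])

print(status, mismatch)
```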

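When will libproxima.so be open-sourced?

Hello, I have been reading the proximabilin code recently and have learned a lot. However, the code for several key underlying data structures, such as the forward index and segments, is encapsulated in libproxima.so. When will this library be open-sourced?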

MySQL Repository cannot synchronize data to ProximaBE

Prerequisites:
Everything is deployed in Docker on the same machine, whose IP address is 172.16.36.229.

Steps:
1. Run docker run -p 3306:3306 -e MYSQL_ROOT_PASSWORD=root -d mysql:5.7 and modify my.cnf as follows:
[mysqld]
server-id = 12000
log_bin = binlog
binlog_format = ROW
After that, running show master logs; returns the list of binlogs.

2. Start ProximaBE:
sudo docker run -d --name proxima-bilin-engine -p 16000:16000 -p 16001:16001 \
  -v /home/root/proxima-be/conf:/var/lib/proxima-be/conf \
  -v /home/root/proxima-be/data:/var/lib/proxima-be/data \
  -v /home/root/proxima-be/log:/var/lib/proxima-be/log \
  ghcr.io/proximabilin/proxima-be:0.2.0
Configuration:
common_config {
grpc_listen_port: 16000
http_listen_port: 16001
logger_type: "AppendLogger"
log_directory: "./log/"
log_file: "proxima_be.log"
log_level: 1
}

query_config {
query_thread_count: 8
}

index_config {
max_build_qps: 0
index_directory: "./data/"
flush_internal: 300
}

meta_config {
meta_uri: "sqlite:///var/lib/proxima-be/data/proxima_be_meta.sqlite"
}

3. Start MySQL Repository:
sudo docker run -d --name proxima-mysql-repository \
  -v /home/root/proxima-be/conf:/var/lib/proxima-be/conf \
  -v /home/root/proxima-be/data:/var/lib/proxima-be/data \
  -v /home/root/proxima-be/log:/var/lib/proxima-be/log \
  ghcr.io/proximabilin/proxima-be:0.2.0 \
  /var/lib/proxima-be/bin/mysql_repository --config /var/lib/proxima-be/conf/mysql_repo.conf

Configuration:
common_config {
log_directory: "/var/lib/proxima-be/log/"
log_file: "mysql_repo.log"
log_level: 1
}
repository_config {
index_agent_addr: "172.16.36.229:16000"
}

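4. For testing, I used the Python example script mysql_example.py.
The logs printed during the run show that no vectors can be found by search, and the stats_collection interface returns 'total_doc_count': 0: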

CollectionStats{
    'collection_name': 'iris',
     'collection_path': './data//iris',
     'total_doc_count': 0,
     'total_segment_count': 1,
     'total_index_file_count': 6, 
    'total_index_file_size': 5455872, 
    'segment_stats': [SegmentStats{'segment_id': 0, 'state': <SegmentState.WRITING: 1>, 'doc_count': 0, 'index_file_count': 2, 'index_file_size': 2138112, 'min_doc_id': 0, 'max_doc_id': 0, 'min_primary_key': 18446744073709551615, 'max_primary_key': 0, 'min_timestamp': 18446744073709551615, 'max_timestamp': 0, 'min_lsn': 18446744073709551615, 'max_lsn': 0, 'segment_path': ''}]
}
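
When reading the segment_stats output above, it helps to know that 18446744073709551615 is 2^64 − 1, the largest unsigned 64-bit integer. A min field at that value paired with a max field at 0 looks like a pair of initial values that no write has updated yet; that interpretation is an assumption, not something the source states. The sketch below checks the pattern on values copied from the output. The helper name is hypothetical, and min_doc_id and max_doc_id are left out because both are 0 in the output.

```python
UINT64_MAX = 2**64 - 1  # 18446744073709551615

# Values copied from the segment_stats output above.
segment = {
    "doc_count": 0,
    "min_primary_key": 18446744073709551615,
    "max_primary_key": 0,
    "min_timestamp": 18446744073709551615,
    "max_timestamp": 0,
    "min_lsn": 18446744073709551615,
    "max_lsn": 0,
}


def looks_unwritten(stats):
    """Heuristic check (an assumption, not a documented rule).

    True when doc_count is 0, every min_* field is UINT64_MAX, and every
    max_* field is 0, i.e. the values look like untouched initial state.
    """
    min_fields = [v for k, v in stats.items() if k.startswith("min_")]
    max_fields = [v for k, v in stats.items() if k.startswith("max_")]
    return (
        stats["doc_count"] == 0
        and all(v == UINT64_MAX for v in min_fields)
        and all(v == 0 for v in max_fields)
    )


print(looks_unwritten(segment))
```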

Docker image does not exist

docker pull ghcr.io/alibaba/proxima-be

Using default tag: latest
Error response from daemon: manifest unknown

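Comparison with milvus

Is there a rough comparison of overall performance between this framework and milvus, for example in resource overhead, retrieval speed, and ranking quality? Thanks.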
