ayuei / debeir
Dense Bi-Encoder Retrieval for Rapid Experimentation
License: GNU General Public License v3.0
pip install -r requirements-dev.txt
cd tests
./build_test_env.sh
(venv) tests$ cleanup.sh
Error response from daemon: No such container: elasticsearch_debir_test
Error: No such container: elasticsearch_debir_test
Error: No such container: indexer_test_elasticsearch
rm: cannot remove 'test_set/': No such file or directory
(venv) tests$ build_test_env.sh
Cloning into 'go-clinical-indexer'...
remote: Enumerating objects: 154, done.
remote: Counting objects: 100% (154/154), done.
remote: Compressing objects: 100% (81/81), done.
remote: Total 154 (delta 67), reused 124 (delta 40), pack-reused 0
Receiving objects: 100% (154/154), 60.58 KiB | 602.00 KiB/s, done.
Resolving deltas: 100% (67/67), done.
Extracting test data
tar: test.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
[+] Building 1.0s (6/8)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 38B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/golang:latest 0.9s
=> [internal] load build context 0.0s
=> => transferring context: 203.42kB 0.0s
=> CANCELED [1/4] FROM docker.io/library/golang@sha256:04f76f956e51797a44847e066bde1341c01e09054d3878ae88c7f77 0.0s
=> => resolve docker.io/library/golang@sha256:04f76f956e51797a44847e066bde1341c01e09054d3878ae88c7f77f09897c4d 0.0s
=> => sha256:04f76f956e51797a44847e066bde1341c01e09054d3878ae88c7f77f09897c4d 2.36kB / 2.36kB 0.0s
=> => sha256:8ea012ba16112273afc171ff75ce517fe4edeb3849f6714554aa4e71fe54e4c1 1.80kB / 1.80kB 0.0s
=> => sha256:180567aa84db27f3a680fc34bf2a84cc577b8b7a641ed6575c0aae78217f1e9a 7.11kB / 7.11kB 0.0s
=> ERROR [2/4] COPY test_set/test.tsv test.tsv 0.0s
------
> [2/4] COPY test_set/test.tsv test.tsv:
------
failed to compute cache key: "/test_set/test.tsv" not found: not found
Unable to find image 'elasticsearch:8.4.1' locally
8.4.1: Pulling from library/elasticsearch
3b65ec22a9e9: Pull complete
50533006a600: Pull complete
21050bd73374: Pull complete
d6d8ab1d90f2: Pull complete
b9a7535fafc7: Pull complete
7afff2f64b58: Pull complete
5552a92a8deb: Pull complete
65f9b3939425: Pull complete
48e160ec61e6: Pull complete
Digest: sha256:7dd81b0af4aa916cf58373d5befaad56f89b96dd5582ced9f63879ed2650802c
Status: Downloaded newer image for elasticsearch:8.4.1
3a319c8ebcacfa123f8b3406265aeb57d368104905539c4e3a7c4afbd3695145
Unable to find image 'debir/test:0.1' locally
docker: Error response from daemon: pull access denied for debir/test, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
Collecting https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_md-0.5.0.tar.gz
Downloading https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_md-0.5.0.tar.gz (120.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 120.2/120.2 MB 887.4 kB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting spacy<3.3.0,>=3.2.3
Downloading spacy-3.2.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.1/6.1 MB 2.0 MB/s eta 0:00:00
Collecting spacy-loggers<2.0.0,>=1.0.0
Downloading spacy_loggers-1.0.4-py3-none-any.whl (11 kB)
Collecting tqdm<5.0.0,>=4.38.0
Downloading tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 7.0 MB/s eta 0:00:00
Collecting jinja2
Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.1/133.1 kB 9.7 MB/s eta 0:00:00
Collecting preshed<3.1.0,>=3.0.2
Downloading preshed-3.0.8-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (124 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.7/124.7 kB 9.1 MB/s eta 0:00:00
Collecting numpy>=1.15.0
Downloading numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 6.3 MB/s eta 0:00:00
Requirement already satisfied: packaging>=20.0 in ./venv/lib/python3.10/site-packages (from spacy<3.3.0,>=3.2.3->en-core-sci-md==0.5.0) (22.0)
Collecting pathy>=0.3.5
Downloading pathy-0.10.1-py3-none-any.whl (48 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.9/48.9 kB 7.5 MB/s eta 0:00:00
Collecting srsly<3.0.0,>=2.4.1
Downloading srsly-2.4.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (491 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 491.3/491.3 kB 7.3 MB/s eta 0:00:00
Collecting catalogue<2.1.0,>=2.0.6
Downloading catalogue-2.0.8-py3-none-any.whl (17 kB)
Collecting wasabi<1.1.0,>=0.8.1
Downloading wasabi-0.10.1-py3-none-any.whl (26 kB)
Collecting murmurhash<1.1.0,>=0.28.0
Downloading murmurhash-1.0.9-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (21 kB)
Collecting langcodes<4.0.0,>=3.2.0
Downloading langcodes-3.3.0-py3-none-any.whl (181 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 181.6/181.6 kB 11.5 MB/s eta 0:00:00
Collecting spacy-legacy<3.1.0,>=3.0.8
Downloading spacy_legacy-3.0.10-py2.py3-none-any.whl (21 kB)
Collecting cymem<2.1.0,>=2.0.2
Downloading cymem-2.0.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34 kB)
Requirement already satisfied: setuptools in ./venv/lib/python3.10/site-packages (from spacy<3.3.0,>=3.2.3->en-core-sci-md==0.5.0) (65.6.3)
Collecting click<8.1.0
Downloading click-8.0.4-py3-none-any.whl (97 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.5/97.5 kB 16.6 MB/s eta 0:00:00
Collecting requests<3.0.0,>=2.13.0
Downloading requests-2.28.1-py3-none-any.whl (62 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.8/62.8 kB 17.7 MB/s eta 0:00:00
Collecting typer<0.5.0,>=0.3.0
Downloading typer-0.4.2-py3-none-any.whl (27 kB)
Collecting pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4
Downloading pydantic-1.8.2-py3-none-any.whl (126 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 126.0/126.0 kB 10.4 MB/s eta 0:00:00
Collecting thinc<8.1.0,>=8.0.12
Downloading thinc-8.0.17-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (659 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 659.5/659.5 kB 7.3 MB/s eta 0:00:00
Collecting blis<0.8.0,>=0.4.0
Downloading blis-0.7.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 5.3 MB/s eta 0:00:00
Collecting smart-open<7.0.0,>=5.2.1
Downloading smart_open-6.3.0-py3-none-any.whl (56 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.8/56.8 kB 3.0 MB/s eta 0:00:00
Collecting typing-extensions>=3.7.4.3
Downloading typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting idna<4,>=2.5
Downloading idna-3.4-py3-none-any.whl (61 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.5/61.5 kB 5.8 MB/s eta 0:00:00
Collecting urllib3<1.27,>=1.21.1
Downloading urllib3-1.26.13-py2.py3-none-any.whl (140 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 140.6/140.6 kB 5.5 MB/s eta 0:00:00
Collecting certifi>=2017.4.17
Downloading certifi-2022.12.7-py3-none-any.whl (155 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.3/155.3 kB 4.5 MB/s eta 0:00:00
Collecting charset-normalizer<3,>=2
Downloading charset_normalizer-2.1.1-py3-none-any.whl (39 kB)
Collecting MarkupSafe>=2.0
Downloading MarkupSafe-2.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Building wheels for collected packages: en-core-sci-md
Building wheel for en-core-sci-md (setup.py) ... done
Created wheel for en-core-sci-md: filename=en_core_sci_md-0.5.0-py3-none-any.whl size=120252792 sha256=ebec347e8b23f52be25326759f087613b3e3b09a0cab652609ab1c8aaa554cd8
Stored in directory: /home/konrad/.cache/pip/wheels/96/61/7c/4f20424bc721af69e3a01337f45c41f5bef510f6fe9c3c3d43
Successfully built en-core-sci-md
Installing collected packages: wasabi, cymem, urllib3, typing-extensions, tqdm, spacy-loggers, spacy-legacy, smart-open, numpy, murmurhash, MarkupSafe, langcodes, idna, click, charset-normalizer, certifi, catalogue, typer, srsly, requests, pydantic, preshed, jinja2, blis, thinc, pathy, spacy, en-core-sci-md
Successfully installed MarkupSafe-2.1.1 blis-0.7.9 catalogue-2.0.8 certifi-2022.12.7 charset-normalizer-2.1.1 click-8.0.4 cymem-2.0.7 en-core-sci-md-0.5.0 idna-3.4 jinja2-3.1.2 langcodes-3.3.0 murmurhash-1.0.9 numpy-1.23.5 pathy-0.10.1 preshed-3.0.8 pydantic-1.8.2 requests-2.28.1 smart-open-6.3.0 spacy-3.2.4 spacy-legacy-3.0.10 spacy-loggers-1.0.4 srsly-2.4.5 thinc-8.0.17 tqdm-4.64.1 typer-0.4.2 typing-extensions-4.4.0 urllib3-1.26.13 wasabi-0.10.1
Traceback (most recent call last):
File "/home/konrad/tmp/DeBEIR/./examples/indexing/create_semantic_index.py", line 4, in <module>
import plac
ModuleNotFoundError: No module named 'plac'
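The run ends on a missing import, which suggests plac is not covered by the requirements that were installed. As a minimal sketch (the helper name missing_modules is hypothetical and not part of DeBEIR), an example script could check its third-party imports up front and print an actionable message instead of a bare traceback:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path.

    Only top-level names are supported here; dotted names would make
    find_spec import the parent package first.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


def format_hint(missing):
    """Build a short message telling the user what to install."""
    if not missing:
        return ""
    return "Missing modules: " + ", ".join(missing) + ". Install them with pip first."
```

Calling missing_modules(["plac"]) at the top of create_semantic_index.py would report the gap before any indexing work starts.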
Part of the JOSS review openjournals/joss-reviews#5017.
The documentation instructs to install from source, but ideally it should be installable from PyPI.
Part of the JOSS review openjournals/joss-reviews#5017
(venv) trec2022$ python training.py
Traceback (most recent call last):
File "/home/konrad/tmp/debeir/examples/trec2022/training.py", line 11, in <module>
from training.utils import DatasetToSentTrans
File "/home/konrad/tmp/debeir/examples/trec2022/training.py", line 11, in <module>
from training.utils import DatasetToSentTrans
ModuleNotFoundError: No module named 'training.utils'; 'training' is not a package
(venv) trec2022$ python train.py
Traceback (most recent call last):
File "/home/konrad/tmp/debeir/examples/trec2022/train.py", line 16, in <module>
from training.utils import get_scheduler_with_wandb
File "/home/konrad/tmp/debeir/examples/trec2022/training.py", line 11, in <module>
from training.utils import DatasetToSentTrans
ModuleNotFoundError: No module named 'training.utils'; 'training' is not a package
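The message "'training' is not a package" is what Python reports when the name training resolves to a plain module and a submodule is then requested. The repeated frame for training.py line 11 indicates that the script in examples/trec2022 is being imported as that module. Where the intended training package lives is not visible from the log. A minimal sketch of a check for this kind of name clash, with a hypothetical helper name:

```python
from pathlib import Path


def shadowing_scripts(script_dir, top_level_names):
    """Return the names that a same-named .py file in script_dir would shadow.

    The directory of the script being run is searched first on import,
    so a file such as training.py hides a package called training that
    is installed elsewhere. A regular package in the same directory
    would be found before the .py file, so that case is not reported.
    """
    script_dir = Path(script_dir)
    shadowed = []
    for name in top_level_names:
        has_script = (script_dir / f"{name}.py").is_file()
        has_local_package = (script_dir / name / "__init__.py").is_file()
        if has_script and not has_local_package:
            shadowed.append(name)
    return shadowed
```

Renaming the script, or importing through the fully qualified package path, would avoid the clash.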
(venv) hparam_tuning$ python hparam_tuning_from_config.py
2023-02-28 13:33:52.220491: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-28 13:33:52.301716: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-02-28 13:33:52.304477: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-02-28 13:33:52.304491: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-02-28 13:33:52.826877: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-02-28 13:33:52.826932: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-02-28 13:33:52.826940: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
File "/home/konrad/tmp/debeir/examples/hparam_tuning/hparam_tuning_from_config.py", line 53, in <module>
hparam_config = HparamConfig.from_json(
File "/home/konrad/.local/lib/python3.10/site-packages/debeir/training/hparm_tuning/config.py", line 39, in from_json
return HparamConfig(json.load(open(fp)))
FileNotFoundError: [Errno 2] No such file or directory: './configs/hparam/trec2021_tuning.json'
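The path ./configs/hparam/trec2021_tuning.json is relative, so it is resolved against the directory the script is started from, here examples/hparam_tuning. Whether the file exists elsewhere in the repository is not clear from the log. A minimal sketch, using a hypothetical helper, that resolves a config path against a known anchor file and fails with the absolute path in the message:

```python
from pathlib import Path


def resolve_config(relative_path, anchor_file):
    """Resolve relative_path against the directory that contains anchor_file.

    Raises FileNotFoundError with the absolute path that was tried, so
    the message does not depend on the current working directory.
    """
    path = (Path(anchor_file).resolve().parent / relative_path).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Config file not found: {path}")
    return path
```

Inside a script, the anchor would typically be `__file__`, for example resolve_config("configs/hparam/trec2021_tuning.json", `__file__`).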
Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Part of the JOSS review openjournals/joss-reviews#5017
@Ayuei, in #14 (comment) you explained how to hook into the fit/training loop to support integrated Intel CPUs/GPUs.
However, that is quite cumbersome, and given that NVIDIA (and AMD?) hardware is supported by default, I would prefer that intel_extension_for_pytorch be included by default and used automatically where possible.
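A rough sketch of what "used automatically if possible" could look like. The function names and the backend labels "cuda", "ipex" and "cpu" are hypothetical and are not DeBEIR options. The sketch only detects whether the package is installed and does not call into it:

```python
import importlib.util


def ipex_installed():
    """Return True if intel_extension_for_pytorch can be found, without importing it."""
    return importlib.util.find_spec("intel_extension_for_pytorch") is not None


def choose_backend(cuda_available, ipex_available):
    """Pick a backend label: CUDA first, then the Intel extension, then plain CPU."""
    if cuda_available:
        return "cuda"
    if ipex_available:
        return "ipex"
    return "cpu"
```

The training code could then branch on choose_backend(...) and apply the hook from #14 only when the label is "ipex", so users without the extension see no change.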
For example, there is no documentation on https://ayuei.github.io/DeBEIR/debeir/core/pipeline.html.
Please provide the missing documentation and describe in the README how it is generated.
This issue is part of the JOSS review at openjournals/joss-reviews#5017.
docs/debeir.html:
debeir
The NIR (Neural Index Ranker) source code library.
See ./main.py in the parent directory for an out-of-the-box runnable code.
Otherwise, check out notebooks in the parent directory for training your own model amongst other things.
Part of the JOSS review openjournals/joss-reviews#5017.
Even on an i9-12900K without a GPU, the fastest example estimates a runtime of 4 hours. Is it possible to create an example that runs faster, or to modify one of the existing ones?
Please add a README.md to the examples/pipeline folder that specifies the setup steps. Without any setup, it aborts with the following error message:
RuntimeError: Elasticsearch instance cannot be reached at <AsyncElasticsearch(['http://localhost:9200'])>
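Until the setup steps are documented, a preflight check can at least tell a missing Elasticsearch instance apart from other failures. A minimal sketch with a hypothetical helper; it only tests whether something accepts TCP connections on the port and says nothing about cluster health:

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
```

For the address in the error above, the call would be port_open("localhost", 9200).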
(venv) indexing$ python create_semantic_index.py
2023-02-28 13:35:37.990490: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-28 13:35:38.071820: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-02-28 13:35:38.074573: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-02-28 13:35:38.074588: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-02-28 13:35:38.595994: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-02-28 13:35:38.596048: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-02-28 13:35:38.596056: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
File "/home/konrad/tmp/debeir/examples/indexing/create_semantic_index.py", line 81, in <module>
raise e
File "/home/konrad/tmp/debeir/examples/indexing/create_semantic_index.py", line 79, in <module>
plac.call(main)
File "/home/konrad/.local/lib/python3.10/site-packages/plac_core.py", line 436, in call
cmd, result = parser.consume(arglist)
File "/home/konrad/.local/lib/python3.10/site-packages/plac_core.py", line 287, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/home/konrad/tmp/debeir/examples/indexing/create_semantic_index.py", line 40, in main
config = config_factory(config, GenericConfig)
File "/home/konrad/.local/lib/python3.10/site-packages/debeir/datasets/factory.py", line 119, in config_factory
return config_cls.from_args(args_dict, config_cls)
File "/home/konrad/.local/lib/python3.10/site-packages/debeir/core/config.py", line 62, in from_args
obj = field_class(**{k: v for k, v in args_dict.items() if k in field_names})
AttributeError: 'NoneType' object has no attribute 'items'
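The traceback shows args_dict arriving as None in from_args, presumably because the script was started without a config argument. A minimal sketch of a guard that would turn this into a readable error. The helper is hypothetical; it mirrors the dict comprehension in the last frame but is not the library's code:

```python
def select_known_fields(args_dict, field_names):
    """Keep only the entries of args_dict whose keys appear in field_names.

    Raises ValueError with a hint when no configuration was loaded at all.
    """
    if args_dict is None:
        raise ValueError(
            "No configuration was loaded; pass the path to a config file."
        )
    return {key: value for key, value in args_dict.items() if key in field_names}
```

With such a check, running create_semantic_index.py without arguments would explain what is missing instead of failing inside the config class.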
According to requirements.txt, the outdated torch major version 1 is used while the current major version is 2.
Is it possible to upgrade to the current version?
$ python3 -m virtualenv venv
/usr/bin/python3: No module named virtualenv
Either add pip install virtualenv to the docs or use -m venv instead, if that is enough; venv has shipped with Python by default since 3.3, see https://stackoverflow.com/questions/41573587/what-is-the-difference-between-venv-pyvenv-pyenv-virtualenv-virtualenvwrappe.
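For reference, the standard-library venv module can also be driven from Python, which shows that no extra install is needed. A minimal sketch with a hypothetical wrapper name; on some distributions pip bootstrapping needs an additional system package, which is why with_pip is exposed as a parameter:

```python
import venv
from pathlib import Path


def create_env(target, with_pip=True):
    """Create a virtual environment at target and return the path to its pyvenv.cfg."""
    venv.create(target, with_pip=with_pip)
    return Path(target) / "pyvenv.cfg"
```

create_env("venv") creates the same kind of environment as running the venv module from the command line.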
Part of the JOSS review openjournals/joss-reviews#5017.