debeir's People

Contributors

ayuei

debeir's Issues

errors on build_test_env.sh

Steps

  1. clone this repository
  2. activate virtual environment
  3. pip install -r requirements-dev.txt
  4. cd tests
  5. ./build_test_env.sh

Output

(venv) tests$ cleanup.sh
Error response from daemon: No such container: elasticsearch_debir_test
Error: No such container: elasticsearch_debir_test
Error: No such container: indexer_test_elasticsearch
rm: cannot remove 'test_set/': No such file or directory
(venv) tests$ build_test_env.sh
Cloning into 'go-clinical-indexer'...
remote: Enumerating objects: 154, done.
remote: Counting objects: 100% (154/154), done.
remote: Compressing objects: 100% (81/81), done.
remote: Total 154 (delta 67), reused 124 (delta 40), pack-reused 0
Receiving objects: 100% (154/154), 60.58 KiB | 602.00 KiB/s, done.
Resolving deltas: 100% (67/67), done.
Extracting test data
tar: test.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
[+] Building 1.0s (6/8)                                                                                               
 => [internal] load build definition from Dockerfile                                                             0.0s
 => => transferring dockerfile: 38B                                                                              0.0s
 => [internal] load .dockerignore                                                                                0.0s
 => => transferring context: 2B                                                                                  0.0s
 => [internal] load metadata for docker.io/library/golang:latest                                                 0.9s
 => [internal] load build context                                                                                0.0s
 => => transferring context: 203.42kB                                                                            0.0s
 => CANCELED [1/4] FROM docker.io/library/golang@sha256:04f76f956e51797a44847e066bde1341c01e09054d3878ae88c7f77  0.0s
 => => resolve docker.io/library/golang@sha256:04f76f956e51797a44847e066bde1341c01e09054d3878ae88c7f77f09897c4d  0.0s
 => => sha256:04f76f956e51797a44847e066bde1341c01e09054d3878ae88c7f77f09897c4d 2.36kB / 2.36kB                   0.0s
 => => sha256:8ea012ba16112273afc171ff75ce517fe4edeb3849f6714554aa4e71fe54e4c1 1.80kB / 1.80kB                   0.0s
 => => sha256:180567aa84db27f3a680fc34bf2a84cc577b8b7a641ed6575c0aae78217f1e9a 7.11kB / 7.11kB                   0.0s
 => ERROR [2/4] COPY test_set/test.tsv test.tsv                                                                  0.0s
------
 > [2/4] COPY test_set/test.tsv test.tsv:
------
failed to compute cache key: "/test_set/test.tsv" not found: not found
Unable to find image 'elasticsearch:8.4.1' locally
8.4.1: Pulling from library/elasticsearch
3b65ec22a9e9: Pull complete 
50533006a600: Pull complete 
21050bd73374: Pull complete 
d6d8ab1d90f2: Pull complete 
b9a7535fafc7: Pull complete 
7afff2f64b58: Pull complete 
5552a92a8deb: Pull complete 
65f9b3939425: Pull complete 
48e160ec61e6: Pull complete 
Digest: sha256:7dd81b0af4aa916cf58373d5befaad56f89b96dd5582ced9f63879ed2650802c
Status: Downloaded newer image for elasticsearch:8.4.1
3a319c8ebcacfa123f8b3406265aeb57d368104905539c4e3a7c4afbd3695145
Unable to find image 'debir/test:0.1' locally
docker: Error response from daemon: pull access denied for debir/test, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
Collecting https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_md-0.5.0.tar.gz
  Downloading https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_md-0.5.0.tar.gz (120.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 120.2/120.2 MB 887.4 kB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting spacy<3.3.0,>=3.2.3
  Downloading spacy-3.2.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.1/6.1 MB 2.0 MB/s eta 0:00:00
Collecting spacy-loggers<2.0.0,>=1.0.0
  Downloading spacy_loggers-1.0.4-py3-none-any.whl (11 kB)
Collecting tqdm<5.0.0,>=4.38.0
  Downloading tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 7.0 MB/s eta 0:00:00
Collecting jinja2
  Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.1/133.1 kB 9.7 MB/s eta 0:00:00
Collecting preshed<3.1.0,>=3.0.2
  Downloading preshed-3.0.8-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (124 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.7/124.7 kB 9.1 MB/s eta 0:00:00
Collecting numpy>=1.15.0
  Downloading numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 6.3 MB/s eta 0:00:00
Requirement already satisfied: packaging>=20.0 in ./venv/lib/python3.10/site-packages (from spacy<3.3.0,>=3.2.3->en-core-sci-md==0.5.0) (22.0)
Collecting pathy>=0.3.5
  Downloading pathy-0.10.1-py3-none-any.whl (48 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.9/48.9 kB 7.5 MB/s eta 0:00:00
Collecting srsly<3.0.0,>=2.4.1
  Downloading srsly-2.4.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (491 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 491.3/491.3 kB 7.3 MB/s eta 0:00:00
Collecting catalogue<2.1.0,>=2.0.6
  Downloading catalogue-2.0.8-py3-none-any.whl (17 kB)
Collecting wasabi<1.1.0,>=0.8.1
  Downloading wasabi-0.10.1-py3-none-any.whl (26 kB)
Collecting murmurhash<1.1.0,>=0.28.0
  Downloading murmurhash-1.0.9-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (21 kB)
Collecting langcodes<4.0.0,>=3.2.0
  Downloading langcodes-3.3.0-py3-none-any.whl (181 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 181.6/181.6 kB 11.5 MB/s eta 0:00:00
Collecting spacy-legacy<3.1.0,>=3.0.8
  Downloading spacy_legacy-3.0.10-py2.py3-none-any.whl (21 kB)
Collecting cymem<2.1.0,>=2.0.2
  Downloading cymem-2.0.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34 kB)
Requirement already satisfied: setuptools in ./venv/lib/python3.10/site-packages (from spacy<3.3.0,>=3.2.3->en-core-sci-md==0.5.0) (65.6.3)
Collecting click<8.1.0
  Downloading click-8.0.4-py3-none-any.whl (97 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.5/97.5 kB 16.6 MB/s eta 0:00:00
Collecting requests<3.0.0,>=2.13.0
  Downloading requests-2.28.1-py3-none-any.whl (62 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.8/62.8 kB 17.7 MB/s eta 0:00:00
Collecting typer<0.5.0,>=0.3.0
  Downloading typer-0.4.2-py3-none-any.whl (27 kB)
Collecting pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4
  Downloading pydantic-1.8.2-py3-none-any.whl (126 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 126.0/126.0 kB 10.4 MB/s eta 0:00:00
Collecting thinc<8.1.0,>=8.0.12
  Downloading thinc-8.0.17-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (659 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 659.5/659.5 kB 7.3 MB/s eta 0:00:00
Collecting blis<0.8.0,>=0.4.0
  Downloading blis-0.7.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 5.3 MB/s eta 0:00:00
Collecting smart-open<7.0.0,>=5.2.1
  Downloading smart_open-6.3.0-py3-none-any.whl (56 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.8/56.8 kB 3.0 MB/s eta 0:00:00
Collecting typing-extensions>=3.7.4.3
  Downloading typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting idna<4,>=2.5
  Downloading idna-3.4-py3-none-any.whl (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.5/61.5 kB 5.8 MB/s eta 0:00:00
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.13-py2.py3-none-any.whl (140 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 140.6/140.6 kB 5.5 MB/s eta 0:00:00
Collecting certifi>=2017.4.17
  Downloading certifi-2022.12.7-py3-none-any.whl (155 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.3/155.3 kB 4.5 MB/s eta 0:00:00
Collecting charset-normalizer<3,>=2
  Downloading charset_normalizer-2.1.1-py3-none-any.whl (39 kB)
Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-2.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Building wheels for collected packages: en-core-sci-md
  Building wheel for en-core-sci-md (setup.py) ... done
  Created wheel for en-core-sci-md: filename=en_core_sci_md-0.5.0-py3-none-any.whl size=120252792 sha256=ebec347e8b23f52be25326759f087613b3e3b09a0cab652609ab1c8aaa554cd8
  Stored in directory: /home/konrad/.cache/pip/wheels/96/61/7c/4f20424bc721af69e3a01337f45c41f5bef510f6fe9c3c3d43
Successfully built en-core-sci-md
Installing collected packages: wasabi, cymem, urllib3, typing-extensions, tqdm, spacy-loggers, spacy-legacy, smart-open, numpy, murmurhash, MarkupSafe, langcodes, idna, click, charset-normalizer, certifi, catalogue, typer, srsly, requests, pydantic, preshed, jinja2, blis, thinc, pathy, spacy, en-core-sci-md
Successfully installed MarkupSafe-2.1.1 blis-0.7.9 catalogue-2.0.8 certifi-2022.12.7 charset-normalizer-2.1.1 click-8.0.4 cymem-2.0.7 en-core-sci-md-0.5.0 idna-3.4 jinja2-3.1.2 langcodes-3.3.0 murmurhash-1.0.9 numpy-1.23.5 pathy-0.10.1 preshed-3.0.8 pydantic-1.8.2 requests-2.28.1 smart-open-6.3.0 spacy-3.2.4 spacy-legacy-3.0.10 spacy-loggers-1.0.4 srsly-2.4.5 thinc-8.0.17 tqdm-4.64.1 typer-0.4.2 typing-extensions-4.4.0 urllib3-1.26.13 wasabi-0.10.1
Traceback (most recent call last):
  File "/home/konrad/tmp/DeBEIR/./examples/indexing/create_semantic_index.py", line 4, in <module>
    import plac
ModuleNotFoundError: No module named 'plac'

Part of the JOSS review openjournals/joss-reviews#5017.
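The run above ends with `ModuleNotFoundError: No module named 'plac'`, which suggests that `plac` was not installed by `requirements-dev.txt` in this environment. A minimal sketch of a pre-flight check that could run before the example scripts is shown below. The helper name `missing_modules` is hypothetical and not part of debeir; only `plac` is taken from the traceback.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "plac" is the module named in the traceback; extend the list as needed.
    missing = missing_modules(["plac"])
    if missing:
        print("Missing modules:", ", ".join(missing))
```

Running such a check first would surface the missing dependency before the longer Docker and pip steps start.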

errors with trec2022 examples

(venv) trec2022$ python training.py             
Traceback (most recent call last):
  File "/home/konrad/tmp/debeir/examples/trec2022/training.py", line 11, in <module>
    from training.utils import DatasetToSentTrans
  File "/home/konrad/tmp/debeir/examples/trec2022/training.py", line 11, in <module>
    from training.utils import DatasetToSentTrans
ModuleNotFoundError: No module named 'training.utils'; 'training' is not a package
(venv) trec2022$ python train.py   
Traceback (most recent call last):
  File "/home/konrad/tmp/debeir/examples/trec2022/train.py", line 16, in <module>
    from training.utils import get_scheduler_with_wandb
  File "/home/konrad/tmp/debeir/examples/trec2022/training.py", line 11, in <module>
    from training.utils import DatasetToSentTrans
ModuleNotFoundError: No module named 'training.utils'; 'training' is not a package
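The repeated frame in the traceback is consistent with name shadowing: the script is itself called `training.py`, so `from training.utils import ...` resolves `training` to the script rather than to a package. The sketch below reproduces that behaviour in a temporary directory, assuming only the standard library. The function name `reproduce_shadowing` is hypothetical, and the one-line script is a stand-in for the real example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def reproduce_shadowing() -> str:
    """Run a script named training.py that imports training.utils; return stderr."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "training.py"
        # The script's own directory comes first on sys.path, so "training"
        # resolves to this file instead of any installed package.
        script.write_text("from training.utils import DatasetToSentTrans\n")
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            cwd=tmp,
        )
        return result.stderr


if __name__ == "__main__":
    print(reproduce_shadowing())
```

If this is the cause, renaming the script or importing through a fully qualified package path would be the usual ways to avoid the clash.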

error with hparam tuning example

(venv) hparam_tuning$ python hparam_tuning_from_config.py 
2023-02-28 13:33:52.220491: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-28 13:33:52.301716: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-02-28 13:33:52.304477: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-02-28 13:33:52.304491: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-02-28 13:33:52.826877: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-02-28 13:33:52.826932: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-02-28 13:33:52.826940: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
  File "/home/konrad/tmp/debeir/examples/hparam_tuning/hparam_tuning_from_config.py", line 53, in <module>
    hparam_config = HparamConfig.from_json(
  File "/home/konrad/.local/lib/python3.10/site-packages/debeir/training/hparm_tuning/config.py", line 39, in from_json
    return HparamConfig(json.load(open(fp)))
FileNotFoundError: [Errno 2] No such file or directory: './configs/hparam/trec2021_tuning.json'
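The path `./configs/hparam/trec2021_tuning.json` is relative, so it is resolved against the current working directory, which here is `hparam_tuning`. A minimal sketch of resolving such a path against an explicit base directory and failing with a clearer message is shown below. The helper `resolve_config` is hypothetical and not part of debeir.

```python
from pathlib import Path


def resolve_config(relative_path: str, base_dir: Path) -> Path:
    """Resolve a config path against base_dir and verify that the file exists."""
    candidate = (Path(base_dir) / relative_path).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"Config file not found: {candidate} (resolved from base directory {base_dir})"
        )
    return candidate
```

An example script could pass `Path(__file__).parent` as `base_dir`, so that it behaves the same regardless of where it is started from.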

support Intel CPUs/GPUs by default

@Ayuei, in #14 (comment) you explained how to hook into the fit/training loop to support integrated Intel CPUs/GPUs.

However, that is quite cumbersome. Given that NVIDIA (and AMD?) hardware is supported by default, I would prefer that intel_extension_for_pytorch be included by default and used automatically where possible.
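One way the requested automatic behaviour could look is sketched below: the extension is used only when it is installed, and everything else is left unchanged. The helper names `ipex_available` and `maybe_optimize` are hypothetical and not part of debeir. The sketch assumes that `ipex.optimize(model, optimizer=optimizer)` returns the optimized model and optimizer, as in the training examples of the Intel extension.

```python
import importlib.util


def ipex_available() -> bool:
    """Report whether intel_extension_for_pytorch can be imported."""
    return importlib.util.find_spec("intel_extension_for_pytorch") is not None


def maybe_optimize(model, optimizer):
    """Apply the Intel extension if present; otherwise return the inputs unchanged."""
    if not ipex_available():
        return model, optimizer
    # Imported lazily so that the extension stays an optional dependency.
    import intel_extension_for_pytorch as ipex

    return ipex.optimize(model, optimizer=optimizer)
```

Keeping the import lazy means that users on other hardware would not need the extension installed.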

quick example possible?

Even on an i9-12900K without a GPU, the fastest example reports an estimated run time of 4 hours. Is it possible to create an example that runs faster, or to modify one of the existing ones?

add README.md in pipeline folder

Please add a README.md to the examples/pipeline folder that specifies the setup steps. Without any setup, the example aborts with the following error message:

RuntimeError: Elasticsearch instance cannot be reached at <AsyncElasticsearch(['http://localhost:9200'])>
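The error indicates that nothing was answering on `http://localhost:9200` when the pipeline started. A minimal sketch of a standard-library check that a setup section could recommend before running the example is shown below. The function name `is_reachable` is hypothetical. It only tests whether the TCP port accepts connections, not whether Elasticsearch itself is healthy.

```python
import socket


def is_reachable(host: str = "localhost", port: int = 9200, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    if not is_reachable():
        print("Nothing is listening on localhost:9200; start Elasticsearch first.")
```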

error with create_semantic_index.py example

(venv) indexing$ python create_semantic_index.py 
2023-02-28 13:35:37.990490: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-28 13:35:38.071820: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-02-28 13:35:38.074573: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-02-28 13:35:38.074588: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-02-28 13:35:38.595994: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-02-28 13:35:38.596048: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-02-28 13:35:38.596056: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
  File "/home/konrad/tmp/debeir/examples/indexing/create_semantic_index.py", line 81, in <module>
    raise e
  File "/home/konrad/tmp/debeir/examples/indexing/create_semantic_index.py", line 79, in <module>
    plac.call(main)
  File "/home/konrad/.local/lib/python3.10/site-packages/plac_core.py", line 436, in call
    cmd, result = parser.consume(arglist)
  File "/home/konrad/.local/lib/python3.10/site-packages/plac_core.py", line 287, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/konrad/tmp/debeir/examples/indexing/create_semantic_index.py", line 40, in main
    config = config_factory(config, GenericConfig)
  File "/home/konrad/.local/lib/python3.10/site-packages/debeir/datasets/factory.py", line 119, in config_factory
    return config_cls.from_args(args_dict, config_cls)
  File "/home/konrad/.local/lib/python3.10/site-packages/debeir/core/config.py", line 62, in from_args
    obj = field_class(**{k: v for k, v in args_dict.items() if k in field_names})
AttributeError: 'NoneType' object has no attribute 'items'
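The last frame shows `args_dict.items()` being called on `None`, which suggests that no configuration was loaded because the script was started without arguments. The sketch below shows how the same filtering step could fail with a more helpful message. It mirrors the dict comprehension from the traceback, but `filter_known_fields` is a hypothetical helper and not debeir's actual code.

```python
def filter_known_fields(args_dict, field_names):
    """Keep only the entries of args_dict whose keys are known field names."""
    if args_dict is None:
        # Explicit guard in place of the AttributeError on NoneType.
        raise ValueError(
            "No configuration was loaded; pass a config file path to the script."
        )
    return {k: v for k, v in args_dict.items() if k in field_names}
```

A guard like this would not make the example run without arguments, but it would tell the user what is missing.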

upgrade torch major version to 2

According to requirements.txt, torch major version 1 is used, which is outdated; the current major version is 2.
Is it possible to upgrade to the current version?
