treygrainger / ai-powered-search Goto Github PK

The codebase for the book "AI-Powered Search" (Manning Publications, 2024)

Dockerfile 0.05% Jupyter Notebook 97.33% Python 2.47% Shell 0.01% HTML 0.15%

ai ai-powered-search click-models foundation-models generative-search hybrid-search information-retrieval large-language-models learning-to-rank multimodal-search opensearch personalized-search question-answering reflected-intelligence search-engine semantic-knowledge-graphs semantic-search solr vector-database vector-search

ai-powered-search's People

Contributors

Stargazers

Watchers

ai-powered-search's Issues

docker-compose up failing : rpc error exit code 137

Enjoying the book, got the following error when running docker-compose up for the first time:

failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/bash -o pipefail -c python -m pip install --upgrade pip && pip install -r requirements.txt]: exit code: 137

How do I solve this please. Thanks Rob
docker build error.txt

External aips-solr-data volume

Hi,
I do not succeed to persist the solr data :(
I added to ai-powered-search/docker/solr/docker-compose.yml the following:

volumes:
  aips-solr-data:
    name: aips-solr-data
    external: true

And I created the volume - output of docker volume inspect aips-solr-data is:

[
    {
        "CreatedAt": "2021-11-04T14:37:07+01:00",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/aips-solr-data/_data",
        "Name": "aips-solr-data",
        "Options": {},
        "Scope": "local"
    }
]

Any hints/helps?
Thanks in advance!

outdoors dataset in notebook ch5/2.index-datasets

Because https://github.com/ai-powered-search/outdoors.git doesn't contain a single tar file, I changed the following line in notebook ch5/2.index-datasets

! cd outdoors && mkdir -p '../../data/outdoors/' && tar -xvf outdoors.tgz -C '../../data/outdoors/'

! cd outdoors && mkdir -p '../../data/outdoors/' && cat outdoors.tgz* | tar -xz && cp posts.csv ../../data/outdoors

The download and notebook seem to work now but I am not sure if my solution is appropriate?

solr container not responding

I was able to run docker-compose up and access the notebook but the solr container failed to start.

It first failed the health check in the welcome.ipynb with the error message

Error! One or more containers are not responding.
Please follow the instructions in Appendix A.

looking inside the docker-compose up log, the aips-solr failed to start

$ docker-compose down && docker-compose up
Removing aips-data-science ...
Removing aips-solr         ...
...
Creating aips-data-science ...
Creating aips-data-science ... done
Attaching to aips-zk, aips-solr, aips-data-science

aips-solr    | /bin/sh: 0: Can't open                         <--------------  cannot open 
aips-zk      | ZooKeeper JMX enabled by default
...
ps-zk      | 2021-05-09 18:08:19,839 [myid:1] - WARN  [main:QuorumPeerMain@125] - Either no config or no quorum defined in config, running  in standalone mode
aips-solr exited with code 127                               <-------------- failed with code error 127
...

I am using Windows10 Docker Desktop v20.10.5 and integration with WSL 2 distro Ubuntu-18.04.
I tried to ran the docker-compose up inside both the WSL, Powershell and Git Bash, all failed to start Solr.

Anyone else with this error message?

"Up next" link in `ch5/3.semantic-knowledge-graph.ipynb` points to Chapter 10, not Chapter 6

https://github.com/treygrainger/ai-powered-search/blob/main/docker/data-science/notebooks/ch5/3.semantic-knowledge-graph.ipynb?short_path=b296569#L867

`ch7/2.semantic-search.ipynb` - no example of use for defined functions

The notebook defines a bunch of different functions but doesn't show how to use them

`ch11/a.end-to-end-auto-ltr.ipynb` doesn't work end-to-end

The cell 8 returns <Response [405]>, and cell 9 doesn't find file data/product_judgments.txt

docker-compose up load error: solr.jvm:memory.heap.used

Has anyone seen this error / warning message before? Any ideas on how to fix it?

aips-solr | 2022-01-16 15:44:19.671 INFO (qtp1962329560-20) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=26

Full docker-compose up logs attached.
docker jvm memory prob.txt

Try to switch to Solr 9.x instead of 8.x

Or at least switch to 8.11.x to match the supported versions policy

"Loading / logging training set (omitted from book)" in `ch10/2.judgments-and-logging.ipynb` and "Log large training set" in `ch10/3.pairwise-transform.ipynb` fail with error

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
File ~/notebooks/ch10/../ltr/judgments.py:40, in judgments_open(path, mode)
     39 try:
---> 40     f=open(path, mode)
     41     if mode[0] == 'r':

FileNotFoundError: [Errno 2] No such file or directory: 'data/ai_pow_search_judgments.txt'

During handling of the above exception, another exception occurred:

UnboundLocalError                         Traceback (most recent call last)
Cell In[10], line 8
      4 from ltr import download
      6 ftr_logger=FeatureLogger(client, index='tmdb', feature_set='movies')
----> 8 with judgments_open('data/ai_pow_search_judgments.txt') as judgment_list:
      9     for qid, query_judgments in groupby(judgment_list, key=lambda j: j.qid):
     10         ftr_logger.log_for_qid(judgments=query_judgments, 
     11                                qid=qid,
     12                                keywords=judgment_list.keywords(qid))

File /opt/conda/lib/python3.10/contextlib.py:135, in _GeneratorContextManager.__enter__(self)
    133 del self.args, self.kwds, self.func
    134 try:
--> 135     return next(self.gen)
    136 except StopIteration:
    137     raise RuntimeError("generator didn't yield") from None

File ~/notebooks/ch10/../ltr/judgments.py:48, in judgments_open(path, mode)
     46         writer.flush()
     47 finally:
---> 48     f.close()

UnboundLocalError: local variable 'f' referenced before assignment

How to view and use the Chapt 13 's examples?

I followed all the steps indicated in the Appendix A.2 of the book.

How to view and use the Chapt 13's examples?

Rename code for chapters from `ch3`, `ch4`, ... to `ch03`, `ch04`, ... so they will be in the right order

This is how it looks like right now:

Chapter 10, 11, ... are shown before Chapter 3, 4, 5, ...

`ch7/1.index-datasets.ipynb` - not clear what implementation of `index_reviews_collection` should be used

Right now we have two slightly different implementations of index_reviews_collection in the source code (cell 10 & 11). Second is failing because of the file name, but it has more columns

Appendix A: error running healthcheck

I get the following error when running healthcheck. I tried doing this as a reply on Manning but ran into a 5K char limit as well as posting consecutive comments so I put my issue here.

Error! One or more containers are not responding.
Please follow the instructions in Appendix A.

I am running this on Windows 10 using Windows Terminal/Powershell

From the command line I ran this to check the docker-compose version

PS C:\kendevelopment\ai-powered-search\docker> docker-compose --version
docker-compose version 1.29.2, build 5becea4c

Then I run docker-compose up and here's the output.

PS C:\kendevelopment\ai-powered-search\docker> docker-compose up
Creating network "docker_zk-solr" with the default driver
Creating network "docker_solr-data-science" with the default driver
Creating aips-zk ... done
Creating aips-solr ... done
Creating aips-data-science ... done
Attaching to aips-zk, aips-solr, aips-data-science
: No such file /bin/sh: 0: cannot open
aips-zk | ZooKeeper JMX enabled by default
aips-zk | Using config: /conf/zoo.cfg
aips-zk | 2022-04-01 14:00:43,938 [myid:] - INFO [main:QuorumPeerConfig@133] - Reading configuration from: /conf/zoo.cfg
aips-zk | 2022-04-01 14:00:43,945 [myid:] - INFO [main:QuorumPeerConfig@375] - clientPort is not set
aips-zk | 2022-04-01 14:00:43,945 [myid:] - INFO [main:QuorumPeerConfig@389] - secureClientPort is not set
aips-zk | 2022-04-01 14:00:43,954 [myid:] - ERROR [main:QuorumPeerConfig@645] - Invalid configuration, only one server specified (ignoring)
aips-zk | 2022-04-01 14:00:43,960 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
aips-zk | 2022-04-01 14:00:43,961 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
aips-zk | 2022-04-01 14:00:43,961 [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
aips-zk | 2022-04-01 14:00:43,961 [myid:1] - WARN [main:QuorumPeerMain@125] - Either no config or no quorum defined in config, running in standalone mode
aips-zk | 2022-04-01 14:00:43,964 [myid:1] - INFO [main:ManagedUtil@46] - Log4j found with jmx enabled.
aips-zk | 2022-04-01 14:00:43,976 [myid:1] - INFO [main:QuorumPeerConfig@133] - Reading configuration from: /conf/zoo.cfg
aips-zk | 2022-04-01 14:00:43,976 [myid:1] - INFO [main:QuorumPeerConfig@375] - clientPort is not set
aips-zk | 2022-04-01 14:00:43,977 [myid:1] - INFO [main:QuorumPeerConfig@389] - secureClientPort is not set
aips-zk | 2022-04-01 14:00:43,977 [myid:1] - ERROR [main:QuorumPeerConfig@645] - Invalid configuration, only one server specified (ignoring)
aips-zk | 2022-04-01 14:00:43,977 [myid:1] - INFO [main:ZooKeeperServerMain@117] - Starting server
aips-zk | 2022-04-01 14:00:44,040 [myid:1] - INFO [main:Environment@109] - Server environment:zookeeper.version=3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built on 05/03/2019 12:07 GMT
aips-zk | 2022-04-01 14:00:44,040 [myid:1] - INFO [main:Environment@109] - Server environment:host.name=aips-zk
aips-zk | 2022-04-01 14:00:44,042 [myid:1] - INFO [main:Environment@109] - Server environment:java.version=1.8.0_232
aips-zk | 2022-04-01 14:00:44,042 [myid:1] - INFO [main:Environment@109] - Server environment:java.vendor=Oracle Corporation
aips-zk | 2022-04-01 14:00:44,042 [myid:1] - INFO [main:Environment@109] - Server environment:java.home=/usr/local/openjdk-8
aips-zk | 2022-04-01 14:00:44,043 [myid:1] - INFO [main:Environment@109] - Server environment:java.class.path=/apache-zookeeper-3.5.5-bin/bin/../zookeeper-server/target/classes:/apache-zookeeper-3.5.5-bin/bin/../build/classes:/apache-zookeeper-3.5.5-bin/bin/../zookeeper-server/target/lib/.jar:/apache-zookeeper-3.5.5-bin/bin/../build/lib/.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/zookeeper-jute-3.5.5.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/zookeeper-3.5.5.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/slf4j-log4j12-1.7.25.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/slf4j-api-1.7.25.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/netty-all-4.1.29.Final.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/log4j-1.2.17.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/json-simple-1.1.1.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jline-2.11.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jetty-util-9.4.17.v20190418.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jetty-servlet-9.4.17.v20190418.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jetty-server-9.4.17.v20190418.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jetty-security-9.4.17.v20190418.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jetty-io-9.4.17.v20190418.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jetty-http-9.4.17.v20190418.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/javax.servlet-api-3.1.0.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jackson-databind-2.9.8.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jackson-core-2.9.8.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/jackson-annotations-2.9.0.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/commons-cli-1.2.jar:/apache-zookeeper-3.5.5-bin/bin/../lib/audience-annotations-0.5.0.jar:/apache-zookeeper-3.5.5-bin/bin/../zookeeper-.jar:/apache-zookeeper-3.5.5-bin/bin/../zookeeper-server/src/main/resources/lib/.jar:/conf:
aips-zk | 2022-04-01 14:00:44,043 [myid:1] - INFO [main:Environment@109] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
aips-zk | 2022-04-01 14:00:44,043 [myid:1] - INFO [main:Environment@109] - Server environment:java.io.tmpdir=/tmp
aips-zk | 2022-04-01 14:00:44,043 [myid:1] - INFO [main:Environment@109] - Server environment:java.compiler=
aips-zk | 2022-04-01 14:00:44,043 [myid:1] - INFO [main:Environment@109] - Server environment:os.name=Linux
aips-zk | 2022-04-01 14:00:44,043 [myid:1] - INFO [main:Environment@109] - Server environment:os.arch=amd64
aips-zk | 2022-04-01 14:00:44,043 [myid:1] - INFO [main:Environment@109] - Server environment:os.version=5.10.102.1-microsoft-standard-WSL2
aips-zk | 2022-04-01 14:00:44,043 [myid:1] - INFO [main:Environment@109] - Server environment:user.name=zookeeper
aips-zk | 2022-04-01 14:00:44,044 [myid:1] - INFO [main:Environment@109] - Server environment:user.home=/home/zookeeper
aips-zk | 2022-04-01 14:00:44,044 [myid:1] - INFO [main:Environment@109] - Server environment:user.dir=/apache-zookeeper-3.5.5-bin
aips-zk | 2022-04-01 14:00:44,044 [myid:1] - INFO [main:Environment@109] - Server environment:os.memory.free=367MB
aips-zk | 2022-04-01 14:00:44,044 [myid:1] - INFO [main:Environment@109] - Server environment:os.memory.max=889MB
aips-zk | 2022-04-01 14:00:44,044 [myid:1] - INFO [main:Environment@109] - Server environment:os.memory.total=379MB
aips-zk | 2022-04-01 14:00:44,048 [myid:1] - INFO [main:ZooKeeperServer@938] - minSessionTimeout set to 4000
aips-zk | 2022-04-01 14:00:44,048 [myid:1] - INFO [main:ZooKeeperServer@947] - maxSessionTimeout set to 40000
aips-zk | 2022-04-01 14:00:44,049 [myid:1] - INFO [main:ZooKeeperServer@166] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /datalog/version-2 snapdir /data/version-2
aips-zk | 2022-04-01 14:00:44,084 [myid:1] - INFO [main:Log@193] - Logging initialized @772ms to org.eclipse.jetty.util.log.Slf4jLog
aips-zk | 2022-04-01 14:00:44,184 [myid:1] - WARN [main:ContextHandler@1588] - o.e.j.s.ServletContextHandler@cb644e{/,null,UNAVAILABLE} contextPath ends with /*
aips-zk | 2022-04-01 14:00:44,184 [myid:1] - WARN [main:ContextHandler@1599] - Empty contextPath
aips-zk | 2022-04-01 14:00:44,198 [myid:1] - INFO [main:Server@370] - jetty-9.4.17.v20190418; built: 2019-04-18T19:45:35.259Z; git: aa1c656c315c011c01e7b21aabb04066635b9f67; jvm 1.8.0_232-b09
aips-zk | 2022-04-01 14:00:44,259 [myid:1] - INFO [main:DefaultSessionIdManager@365] - DefaultSessionIdManager workerName=node0
aips-zk | 2022-04-01 14:00:44,259 [myid:1] - INFO [main:DefaultSessionIdManager@370] - No SessionScavenger set, using defaults
aips-zk | 2022-04-01 14:00:44,262 [myid:1] - INFO [main:HouseKeeper@149] - node0 Scavenging every 600000ms
aips-zk | 2022-04-01 14:00:44,297 [myid:1] - INFO [main:ContextHandler@855] - Started o.e.j.s.ServletContextHandler@cb644e{/,null,AVAILABLE}
aips-zk | 2022-04-01 14:00:44,327 [myid:1] - INFO [main:AbstractConnector@292] - Started ServerConnector@100fc185{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
aips-zk | 2022-04-01 14:00:44,327 [myid:1] - INFO [main:Server@410] - Started @1031ms
aips-zk | 2022-04-01 14:00:44,328 [myid:1] - INFO [main:JettyAdminServer@112] - Started AdminServer on address 0.0.0.0, port 8080 and command URL /commands
aips-zk | 2022-04-01 14:00:44,333 [myid:1] - INFO [main:ServerCnxnFactory@135] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
aips-zk | 2022-04-01 14:00:44,338 [myid:1] - INFO [main:NIOServerCnxnFactory@673] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers.
aips-zk | 2022-04-01 14:00:44,339 [myid:1] - INFO [main:NIOServerCnxnFactory@686] - binding to port /0.0.0.0:2181
aips-zk | 2022-04-01 14:00:44,356 [myid:1] - INFO [main:ZKDatabase@117] - zookeeper.snapshotSizeFactor = 0.33
aips-zk | 2022-04-01 14:00:44,359 [myid:1] - INFO [main:FileTxnSnapLog@372] - Snapshotting: 0x0 to /data/version-2/snapshot.0
aips-zk | 2022-04-01 14:00:44,362 [myid:1] - INFO [main:FileTxnSnapLog@372] - Snapshotting: 0x0 to /data/version-2/snapshot.0
aips-zk | 2022-04-01 14:00:44,376 [myid:1] - INFO [main:ContainerManager@64] - Using checkIntervalMs=60000 maxPerMinute=10000
aips-solr exited with code 2
aips-data-science | [I 14:00:46.559 NotebookApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
aips-data-science | [W 14:00:47.634 NotebookApp] All authentication is disabled. Anyone who can connect to this server will be able to run code.
aips-data-science | [W 14:00:47.659 NotebookApp] Error loading server extension jupyterlab
aips-data-science | Traceback (most recent call last):
aips-data-science | File "/home/jovyan/.local/lib/python3.7/site-packages/notebook/notebookapp.py", line 1572, in init_server_extensions
aips-data-science | mod = importlib.import_module(modulename)
aips-data-science | File "/opt/conda/lib/python3.7/importlib/init.py", line 127, in import_module
aips-data-science | return _bootstrap._gcd_import(name[level:], package, level)
aips-data-science | File "", line 1006, in _gcd_import
aips-data-science | File "", line 983, in _find_and_load
aips-data-science | File "", line 967, in _find_and_load_unlocked
aips-data-science | File "", line 677, in _load_unlocked
aips-data-science | File "", line 728, in exec_module
aips-data-science | File "", line 219, in _call_with_frames_removed
aips-data-science | File "/opt/conda/lib/python3.7/site-packages/jupyterlab/init.py", line 7, in
aips-data-science | from .labapp import LabApp
aips-data-science | File "/opt/conda/lib/python3.7/site-packages/jupyterlab/labapp.py", line 15, in
aips-data-science | from jupyter_server.serverapp import flags
aips-data-science | File "/opt/conda/lib/python3.7/site-packages/jupyter_server/serverapp.py", line 40, in
aips-data-science | from jupyter_core.paths import secure_write
aips-data-science | ImportError: cannot import name 'secure_write' from 'jupyter_core.paths' (/home/jovyan/.local/lib/python3.7/site-packages/jupyter_core/paths.py)
aips-data-science | [I 14:00:47.664 NotebookApp] Serving notebooks from local directory: /home/jovyan/notebooks
aips-data-science | [I 14:00:47.664 NotebookApp] The Jupyter Notebook is running at:
aips-data-science | [I 14:00:47.664 NotebookApp] http://(4214d2bd0462 or 127.0.0.1):8888/
aips-data-science | [I 14:00:47.664 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Everything looks OK except for an "aips-solr exited with code 2" message. I would think solr should be running.

In a separate Windows Terminal windows I run docker-compose ps

PS C:\kendevelopment\ai-powered-search\docker> docker-compose ps
Name Command State Ports

aips-data-science tini -g -- /bin/bash -o pi ... Up 0.0.0.0:2345->2345/tcp, 0.0.0.0:7077->7077/tcp, 0.0.0.0:8082->8080/tcp, 0.0.0.0:8081->8081/tcp, 0.0.0.0:8888->8888/tcp
aips-solr /bin/sh -c "./run_solr_w_l ... Exit 2
aips-zk /docker-entrypoint.sh zkSe ... Up 0.0.0.0:2181->2128/tcp, 2181/tcp, 2888/tcp, 3888/tcp, 8080/tcp
PS C:\kendevelopment\ai-powered-search\docker>

aips-solr with a state of Exit 2 is probably not good. Then went to http://localhost:8888/notebooks/welcome.ipynb

And ran the Healthcheck

Error! One or more containers are not responding.
Please follow the instructions in Appendix A.

Thanks for your help. Thanks for writing this as I'm really motivated to learn this.

requirements.txt step failing when running docker-compose up

Howdy, any idea on how I can get around this final issue with the install step? Thanks! I ran this immediately after cloning the repo.

jeff@Jeffreys-iMac docker % docker-compose up
Building notebooks
Step 1/30 : FROM jupyter/scipy-notebook:2021-11-04
 ---> 8255b7a7b41e
Step 2/30 : USER root
 ---> Using cache
 ---> 603875557610
Step 3/30 : RUN sudo apt-get update && apt-get install -y --reinstall  build-essential
 ---> Using cache
 ---> 4aca572c5f97
Step 4/30 : ENV APACHE_SPARK_VERSION=2.4.7     HADOOP_VERSION=2.7     SPARK_SOLR_VERSION=3.8.0
 ---> Using cache
 ---> f53298cb7313
Step 5/30 : RUN apt-get -y update &&     apt-get install --no-install-recommends -y openjdk-8-jre-headless ca-certificates-java &&     rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 9946e753dd7d
Step 6/30 : RUN conda install python=3.7.12
 ---> Using cache
 ---> 029bacab755d
Step 7/30 : COPY pull_aips_dependency.py pull_aips_dependency.py
 ---> Using cache
 ---> 8c11a388abe1
Step 8/30 : RUN python pull_aips_dependency.py spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz &&     tar xzf spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz -C /usr/local --owner root --group root --no-same-owner &&     rm spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
 ---> Using cache
 ---> a4d55e5f7492
Step 9/30 : RUN cd /usr/local && ln -s spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark
 ---> Using cache
 ---> 0de66e85c2f1
Step 10/30 : ENV SPARK_HOME=/usr/local/spark
 ---> Using cache
 ---> f26c8187d9ad
Step 11/30 : ENV PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip     SPARK_OPTS="--driver-java-options=-Xms1024M --driver-java-options=-Xmx4096M --spark.driver.extraLibraryPath=/usr/local/spark/lib/spark-solr-${SPARK_SOLR_VERSION}-shaded.jar --spark.executor.extraLibraryPath=/usr/local/spark/lib/spark-solr-${SPARK_SOLR_VERSION}-shaded.jar --driver-java-options=-Dlog4j.logLevel=info"     PATH=$PATH:$SPARK_HOME/bin
 ---> Using cache
 ---> a87bf90e4302
Step 12/30 : ENV SPARK_CLASSPATH=$SPARK_CLASSPATH:/usr/local/spark/lib/spark-solr-${SPARK_SOLR_VERSION}-shaded.jar
 ---> Using cache
 ---> 80247e5f54ac
Step 13/30 : ENV PYSPARK_SUBMIT_ARGS="--jars /usr/local/spark/lib/spark-solr-${SPARK_SOLR_VERSION}-shaded.jar"
 ---> Using cache
 ---> d0dd4e045c79
Step 14/30 : Run echo $SPARK_HOME
 ---> Using cache
 ---> 2bbc1fc08916
Step 15/30 : Run mkdir /usr/local/spark/lib/ && cd /usr/local/spark/lib/ &&     wget -q https://repo1.maven.org/maven2/com/lucidworks/spark/spark-solr/${SPARK_SOLR_VERSION}/spark-solr-${SPARK_SOLR_VERSION}-shaded.jar &&     echo "3bd0614d50ce6ef2769eb0d654e58fd68cf3e1f63c567dca8b12432a7e6ac907753b289f6d3cca5a80a67454d6ff841e438f53472cba37530293548751edaa8f *spark-solr-${SPARK_SOLR_VERSION}-shaded.jar" | sha512sum -c - &&     export EXTRA_CLASSPATH=/usr/local/spark/lib/spark-solr-${SPARK_SOLR_VERSION}-shaded.jar &&     $SPARK_HOME/bin/spark-shell --jars spark-solr-${SPARK_SOLR_VERSION}-shaded.jar
 ---> Using cache
 ---> ac5858d662ed
Step 16/30 : Run chmod a+rwx /usr/local/spark/lib/spark-solr-${SPARK_SOLR_VERSION}-shaded.jar
 ---> Using cache
 ---> bf4d54255f2a
Step 17/30 : COPY notebooks notebooks
 ---> Using cache
 ---> 13738a4630e6
Step 18/30 : RUN chown -R $NB_UID:$NB_UID /home/$NB_USER
 ---> Using cache
 ---> a00e63fb40b0
Step 19/30 : USER $NB_UID
 ---> Using cache
 ---> f15a2cc6aabd
Step 20/30 : WORKDIR /home/$NB_USER
 ---> Using cache
 ---> 3f0ecb34d427
Step 21/30 : COPY requirements.txt ./
 ---> 00d45e820595
Step 22/30 : RUN python -m pip install --upgrade pip &&   pip install -r requirements.txt
 ---> Running in 0d99e2cee0bb
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pip in /opt/conda/lib/python3.7/site-packages (21.3.1)
Defaulting to user installation because normal site-packages is not writeable
Collecting appnope==0.1.0
  Downloading appnope-0.1.0-py2.py3-none-any.whl (4.0 kB)
Collecting attrs==19.1.0
  Downloading attrs-19.1.0-py2.py3-none-any.whl (35 kB)
Collecting backcall==0.1.0
  Downloading backcall-0.1.0.zip (11 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting bleach==3.1.4
  Downloading bleach-3.1.4-py2.py3-none-any.whl (151 kB)
Collecting bs4==0.0.1
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting certifi==2019.6.16
  Downloading certifi-2019.6.16-py2.py3-none-any.whl (157 kB)
Collecting chardet==3.0.4
  Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting cython==0.29.20
  Downloading Cython-0.29.20-cp37-cp37m-manylinux1_x86_64.whl (2.0 MB)
Collecting decorator==4.4.0
  Downloading decorator-4.4.0-py2.py3-none-any.whl (8.3 kB)
Collecting defusedxml==0.6.0
  Downloading defusedxml-0.6.0-py2.py3-none-any.whl (23 kB)
Requirement already satisfied: entrypoints==0.3 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 11)) (0.3)
Collecting findspark==1.3.0
  Downloading findspark-1.3.0-py2.py3-none-any.whl (3.0 kB)
Collecting idna==2.8
  Downloading idna-2.8-py2.py3-none-any.whl (58 kB)
Collecting ipykernel==5.1.1
  Downloading ipykernel-5.1.1-py3-none-any.whl (114 kB)
Collecting ipython==7.5.0
  Downloading ipython-7.5.0-py3-none-any.whl (770 kB)
Requirement already satisfied: ipython-genutils==0.2.0 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 16)) (0.2.0)
Collecting ipywidgets==7.5.0
  Downloading ipywidgets-7.5.0-py2.py3-none-any.whl (121 kB)
Collecting jedi==0.14.0
  Downloading jedi-0.14.0-py2.py3-none-any.whl (1.0 MB)
Collecting Jinja2==2.10.1
  Downloading Jinja2-2.10.1-py2.py3-none-any.whl (124 kB)
Collecting jsonschema==3.0.1
  Downloading jsonschema-3.0.1-py2.py3-none-any.whl (54 kB)
Collecting jupyter==1.0.0
  Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting jupyter-client==5.2.4
  Downloading jupyter_client-5.2.4-py2.py3-none-any.whl (89 kB)
Collecting jupyter-console==6.0.0
  Downloading jupyter_console-6.0.0-py2.py3-none-any.whl (21 kB)
Collecting jupyter-core==4.5.0
  Downloading jupyter_core-4.5.0-py2.py3-none-any.whl (78 kB)
Collecting jupyterlab_server==1.0.0
  Downloading jupyterlab_server-1.0.0-py3-none-any.whl (26 kB)
Collecting lxml==4.6.2
  Downloading lxml-4.6.2-cp37-cp37m-manylinux1_x86_64.whl (5.5 MB)
Collecting MarkupSafe==1.1.1
  Downloading MarkupSafe-1.1.1-cp37-cp37m-manylinux2010_x86_64.whl (33 kB)
Collecting mergedeep==1.3.0
  Downloading mergedeep-1.3.0-py3-none-any.whl (6.3 kB)
Requirement already satisfied: mistune==0.8.4 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 29)) (0.8.4)
Collecting nbconvert==5.5.0
  Downloading nbconvert-5.5.0-py2.py3-none-any.whl (447 kB)
Collecting nbformat==4.4.0
  Downloading nbformat-4.4.0-py2.py3-none-any.whl (155 kB)
Requirement already satisfied: notebook in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 33)) (6.4.5)
Collecting nltk==3.5
  Downloading nltk-3.5.zip (1.4 MB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting nmslib==2.1.1
  Downloading nmslib-2.1.1-cp37-cp37m-manylinux2010_x86_64.whl (13.5 MB)
Collecting numpy==1.19.0
  Downloading numpy-1.19.0-cp37-cp37m-manylinux2010_x86_64.whl (14.6 MB)
Collecting pandocfilters==1.4.2
  Downloading pandocfilters-1.4.2.tar.gz (14 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting parso==0.5.0
  Downloading parso-0.5.0-py2.py3-none-any.whl (94 kB)
Collecting pexpect==4.7.0
  Downloading pexpect-4.7.0-py2.py3-none-any.whl (58 kB)
Requirement already satisfied: pickleshare==0.7.5 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 41)) (0.7.5)
Collecting plotly==4.14.3
  Downloading plotly-4.14.3-py2.py3-none-any.whl (13.2 MB)
Collecting plotnine==0.7.1
  Downloading plotnine-0.7.1-py3-none-any.whl (4.4 MB)
Collecting prometheus-client==0.7.1
  Downloading prometheus_client-0.7.1.tar.gz (38 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting prompt-toolkit==2.0.9
  Downloading prompt_toolkit-2.0.9-py3-none-any.whl (337 kB)
Collecting ptyprocess==0.6.0
  Downloading ptyprocess-0.6.0-py2.py3-none-any.whl (39 kB)
Collecting py4j==0.10.7
  Downloading py4j-0.10.7-py2.py3-none-any.whl (197 kB)
Collecting Pygments==2.4.2
  Downloading Pygments-2.4.2-py2.py3-none-any.whl (883 kB)
Collecting pyrsistent==0.15.2
  Downloading pyrsistent-0.15.2.tar.gz (106 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting pysolr==3.8.1
  Downloading pysolr-3.8.1-py2.py3-none-any.whl (16 kB)
Collecting python-dateutil==2.8.0
  Downloading python_dateutil-2.8.0-py2.py3-none-any.whl (226 kB)
Collecting pytz==2019.1
  Downloading pytz-2019.1-py2.py3-none-any.whl (510 kB)
Collecting pyzmq==18.0.1
  Downloading pyzmq-18.0.1-cp37-cp37m-manylinux1_x86_64.whl (1.1 MB)
Collecting qtconsole==4.5.1
  Downloading qtconsole-4.5.1-py2.py3-none-any.whl (118 kB)
Collecting requests==2.22.0
  Downloading requests-2.22.0-py2.py3-none-any.whl (57 kB)
Collecting retrying==1.3.3
  Downloading retrying-1.3.3.tar.gz (10 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting Send2Trash==1.5.0
  Downloading Send2Trash-1.5.0-py3-none-any.whl (12 kB)
Collecting six==1.12.0
  Downloading six-1.12.0-py2.py3-none-any.whl (10 kB)
Collecting spacy==2.3.0
  Downloading spacy-2.3.0-cp37-cp37m-manylinux1_x86_64.whl (10.0 MB)
Collecting transformers==4.5.1
  Downloading transformers-4.5.1-py3-none-any.whl (2.1 MB)
Collecting sentence-transformers==1.1.0
  Downloading sentence-transformers-1.1.0.tar.gz (78 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting testpath==0.4.2
  Downloading testpath-0.4.2-py2.py3-none-any.whl (163 kB)
Collecting tornado==6.0.3
  Downloading tornado-6.0.3.tar.gz (482 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting traitlets==4.3.2
  Downloading traitlets-4.3.2-py2.py3-none-any.whl (74 kB)
Collecting urllib3==1.25.4
  Downloading urllib3-1.25.4-py2.py3-none-any.whl (125 kB)
Collecting wcwidth==0.1.7
  Downloading wcwidth-0.1.7-py2.py3-none-any.whl (21 kB)
Requirement already satisfied: webencodings==0.5.1 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 69)) (0.5.1)
Collecting widgetsnbextension==3.5.0
  Downloading widgetsnbextension-3.5.0-py2.py3-none-any.whl (2.2 MB)
Requirement already satisfied: beautifulsoup4 in /opt/conda/lib/python3.7/site-packages (from bs4==0.0.1->-r requirements.txt (line 5)) (4.10.0)
Requirement already satisfied: setuptools>=18.5 in /opt/conda/lib/python3.7/site-packages (from ipython==7.5.0->-r requirements.txt (line 15)) (60.0.4)
Requirement already satisfied: json5 in /opt/conda/lib/python3.7/site-packages (from jupyterlab_server==1.0.0->-r requirements.txt (line 25)) (0.9.5)
Requirement already satisfied: click in /opt/conda/lib/python3.7/site-packages (from nltk==3.5->-r requirements.txt (line 35)) (8.0.3)
Requirement already satisfied: joblib in /opt/conda/lib/python3.7/site-packages (from nltk==3.5->-r requirements.txt (line 35)) (1.1.0)
Collecting regex
  Downloading regex-2021.11.10-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (749 kB)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from nltk==3.5->-r requirements.txt (line 35)) (4.62.3)
Requirement already satisfied: psutil in /opt/conda/lib/python3.7/site-packages (from nmslib==2.1.1->-r requirements.txt (line 36)) (5.8.0)
Collecting pybind11<2.6.2
  Downloading pybind11-2.6.1-py2.py3-none-any.whl (188 kB)
Requirement already satisfied: scipy>=1.2.0 in /opt/conda/lib/python3.7/site-packages (from plotnine==0.7.1->-r requirements.txt (line 43)) (1.7.3)
Requirement already satisfied: patsy>=0.5.1 in /opt/conda/lib/python3.7/site-packages (from plotnine==0.7.1->-r requirements.txt (line 43)) (0.5.2)
Requirement already satisfied: statsmodels>=0.11.1 in /opt/conda/lib/python3.7/site-packages (from plotnine==0.7.1->-r requirements.txt (line 43)) (0.13.1)
Collecting mizani>=0.7.1
  Downloading mizani-0.7.3-py3-none-any.whl (63 kB)
Requirement already satisfied: matplotlib>=3.1.1 in /opt/conda/lib/python3.7/site-packages (from plotnine==0.7.1->-r requirements.txt (line 43)) (3.5.1)
Requirement already satisfied: pandas>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from plotnine==0.7.1->-r requirements.txt (line 43)) (1.3.5)
Collecting descartes>=1.1.0
  Downloading descartes-1.1.0-py3-none-any.whl (5.8 kB)
Collecting preshed<3.1.0,>=3.0.2
  Downloading preshed-3.0.6-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (125 kB)
Collecting thinc==7.4.1
  Downloading thinc-7.4.1-cp37-cp37m-manylinux1_x86_64.whl (2.1 MB)
Collecting blis<0.5.0,>=0.4.0
  Downloading blis-0.4.1-cp37-cp37m-manylinux1_x86_64.whl (3.7 MB)
Collecting catalogue<1.1.0,>=0.0.7
  Downloading catalogue-1.0.0-py2.py3-none-any.whl (7.7 kB)
Collecting srsly<1.1.0,>=1.0.2
  Downloading srsly-1.0.5-cp37-cp37m-manylinux2014_x86_64.whl (184 kB)
Collecting cymem<2.1.0,>=2.0.2
  Downloading cymem-2.0.6-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (35 kB)
Collecting wasabi<1.1.0,>=0.4.0
  Downloading wasabi-0.9.0-py3-none-any.whl (25 kB)
Collecting murmurhash<1.1.0,>=0.28.0
  Downloading murmurhash-1.0.6-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (21 kB)
Collecting plac<1.2.0,>=0.9.6
  Downloading plac-1.1.3-py2.py3-none-any.whl (20 kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from transformers==4.5.1->-r requirements.txt (line 60)) (4.10.0)
Collecting sacremoses
  Downloading sacremoses-0.0.46-py3-none-any.whl (895 kB)
Requirement already satisfied: packaging in /opt/conda/lib/python3.7/site-packages (from transformers==4.5.1->-r requirements.txt (line 60)) (21.2)
Collecting filelock
  Downloading filelock-3.4.0-py3-none-any.whl (9.8 kB)
Collecting torch>=1.6.0
  Downloading torch-1.10.1-cp37-cp37m-manylinux1_x86_64.whl (881.9 MB)
ERROR: Service 'notebooks' failed to build : The command '/bin/bash -o pipefail -c python -m pip install --upgrade pip &&   pip install -r requirements.txt' returned a non-zero code: 137

Update `jupyter/pyspark-notebook` to Spark 3.3.2 at least

try if Spark 3.4.0 image will work

Code in `Listing 13.12` is failing because of the problems with dependencies

It's a known issue caused by the new versions & related to the holzschu/Carnets#310 - after pinning dependencies it works.

ch5 - Notebook 2.index-datasets.ipynb not able to open

Hello,

First of all, this is an awesome book. Currently, I'm working with it day and night. :D

right now I'm in Chapter 5, but since the last commit 2.index-datasets.ipynb is broken. In commit 7cbf560aa9b4fb12a6b453b819e716774984e397 it works but after this not anymore.

I get the following error if I try to open it:
NotJSONError("Notebook does not appear to be JSON: '\\n\\n\\n\\n\\n\\n\\n<!DOCTYPE html>\\n<html la...")

Jupyter Lab interface generates a lot of noise in the logs

Should see if we can clean this up before publication:

aips-data-science  | [I 2023-12-03 16:44:07.224 ServerApp] Generating new user for token-authenticated request: d61e0fc3fb7e48b383d772bfa33204da
aips-data-science  | [I 2023-12-03 16:44:12.296 ServerApp] Generating new user for token-authenticated request: 36893885b19344c28acefaffb4aa2471
aips-data-science  | [I 2023-12-03 16:44:17.363 ServerApp] Generating new user for token-authenticated request: 4ff7daa059a24581abd581bef7814b32
aips-data-science  | [I 2023-12-03 16:44:22.419 ServerApp] Generating new user for token-authenticated request: df5178e8735246f4bd8a0331248208e0
aips-data-science  | [I 2023-12-03 16:44:27.476 ServerApp] Generating new user for token-authenticated request: bad3506047074bf898b8e747a16c19d5
aips-data-science  | [I 2023-12-03 16:44:32.527 ServerApp] Generating new user for token-authenticated request: d045277bf54d426cb4c2d4aaba3f7304
aips-data-science  | [I 2023-12-03 16:44:37.594 ServerApp] Generating new user for token-authenticated request: b04cc852e49246dd8ac9f7e022493374
aips-data-science  | [I 2023-12-03 16:44:42.682 ServerApp] Generating new user for token-authenticated request: 807f25cf442e4754994fa9a6e0d0f9d2
aips-data-science  | [I 2023-12-03 16:44:47.755 ServerApp] Generating new user for token-authenticated request: 88a9aee0871642169b06eea8aa001b91
aips-data-science  | [I 2023-12-03 16:44:52.818 ServerApp] Generating new user for token-authenticated request: fa90fdf67f314784846439465a05ac6b

Not all chapters are listed in the welcome notebook.

Right now, only chapters 1-5 & 10 are listed in the welcome notebook. We need to list each chapter & provide links to specific notebooks

`Listing 12.13 - Fully Automated LTR Loop` fails

stacktrace:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[29], line 30
     28 train, test = test_train_split(sdbn, train=0.8) 
     29 ranksvm_ltr(train, model_name='exploit', feature_set=exploit_feature_set)
---> 30 eval_model(test, model_name='exploit', sdbn=new_sdbn) 
     32 # ===============
     33 # EXPLORE
     35 explore_feature_set = [
     36     {
     37       "name" : "manufacturer_match",
   (...)
     66       }
     67     }]

NameError: name 'new_sdbn' is not defined

`Listing 12.12 - Rerun A/B test on new test3 model` fails

The cell 27 fails with following stacktrace:

KeyError                                  Traceback (most recent call last)
Cell In[27], line 5
      2 purchases = {'test1': 0, 'test3': 0}
      3 for _ in range(0, NUM_USERS):
----> 5     model_name, purchase_made = a_or_b_model(query='transformers dvd', 
      6                                              a_model='test1',
      7                                              b_model='test3')
      8     if purchase_made:
      9         purchases[model_name]+= 1 

Cell In[17], line 20, in a_or_b_model(query, a_model, b_model)
     17 else:
     18     model_name=b_model
---> 20 purchase_made = live_user_query(query=query, 
     21                                model_name=model_name,
     22                                desired=wants_to_purchase,
     23                                meh=might_purchase)
     24 return (model_name, purchase_made)

Cell In[16], line 15, in live_user_query(query, model_name, desired, meh, desired_prob, meh_prob, uninteresting_prob, quit_per_rank_prob)
      1 def live_user_query(query, model_name,
      2                     desired, meh,
      3                     desired_prob=0.15, 
      4                     meh_prob=0.03, 
      5                     uninteresting_prob=0.01,
      6                     quit_per_rank_prob=0.2):
      7     """Live user for 'query' where purchase probability depends on if 
      8        products upc is in one of three sets.
      9        
   (...)
     13        
     14        """   
---> 15     search_results = search(query, model_name, at=10)
     17     results = pd.DataFrame(search_results).reset_index()
     18     for doc in results.to_dict(orient="records"):

Cell In[11], line 32, in search(query, model_name, at, log)
     29 if log:
     30     print(resp)
---> 32 search_results = resp['response']['docs']
     34 for rank, result in enumerate(search_results):
     35     result['rank'] = rank

KeyError: 'response'

Make Retrotech Search Dataset Available

Howdy, enjoying the book! Can you add the Retrotech search sessions dataset? This would be to play around with the data referenced in chapter 11 regarding SDBM's.

https://github.com/treygrainger/ai-powered-search/tree/master/docker/data-science/notebooks/data/retrotech

or perhaps to the spark-3-support branch (to support the working pip installable branch, per #46)

Thank you!

Creating indexes in `ch12/0.ch12.setup.ipynb` generates Solr errors

Delete dynamic field
Status: Failure; Response:[ {'responseHeader': {'status': 400, 'QTime': 89}, 'error': {'metadata': ['error-class', 'org.apache.solr.api.ApiBag$ExceptionWithErrObject', 'root-error-class', 'org.apache.solr.api.ApiBag$ExceptionWithErrObject'], 'details': [{'delete-dynamic-field': {'name': '*_ngram'}, 'errorMessages': ["The dynamic field '*_ngram' is not present in this schema, and so cannot be deleted.\n"]}], 'msg': 'error processing commands', 'code': 400}} ]

docker-compose up failing due to: error: can't find Rust compiler

Hi there,

I have started enjoying the book but I am not able to start the containers 😞

After changing spacy version from 2.3.1 to 2.3.7 (to fix a very first error, No matching distribution found for spacy==2.3.0), I end up receiving an error related to a missing rust compiler during tokenizers installation.
Here's my environment:
macos big sur v11.4 Apple chip M1
docker 20.10.8
pip 21.3.1
I am on master, up-to-date.

Can you please have a look ?
Many thanks.
Steve

Full log below.

zookeeper uses an image, skipping
Building solr
[+] Building 0.4s (9/9) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 37B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/solr:8.5.2 0.2s
=> [internal] load build context 0.0s
=> => transferring context: 39B 0.0s
=> [1/4] FROM docker.io/library/solr:8.5.2@sha256:dd56c541fb28a60e241550a3eb63afde0d8890a1ffe3971399fd245a22d071be 0.0s
=> CACHED [2/4] ADD run_solr_w_ltr.sh ./run_solr_w_ltr.sh 0.0s
=> CACHED [3/4] RUN chown solr:solr run_solr_w_ltr.sh 0.0s
=> CACHED [4/4] RUN chmod u+x run_solr_w_ltr.sh 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:3348d34a2841ca9d73e6db70d4d22aa998fcfddd9cf16c2b6baedd6754d119d0 0.0s
=> => naming to docker.io/library/docker_solr 0.0s

Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
Building notebooks
[+] Building 443.0s (19/22)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 37B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/jupyter/scipy-notebook:2021-11-04 0.2s
=> [ 1/18] FROM docker.io/jupyter/scipy-notebook:2021-11-04@sha256:72632c1103a10a2d6212b2a5dc7726ebad0fd73c5e15395b8ed59608502c9b0c 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 11.54kB 0.0s
=> CACHED [ 2/18] RUN sudo apt-get update && apt-get install -y --reinstall build-essential 0.0s
=> CACHED [ 3/18] RUN apt-get -y update && apt-get install --no-install-recommends -y openjdk-8-jre-headless ca-certificates-java && rm -r 0.0s
=> CACHED [ 4/18] RUN conda install python=3.7.12 0.0s
=> CACHED [ 5/18] COPY pull_aips_dependency.py pull_aips_dependency.py 0.0s
=> CACHED [ 6/18] RUN python pull_aips_dependency.py spark-2.4.7-bin-hadoop2.7.tgz && tar xzf spark-2.4.7-bin-hadoop2.7.tgz -C /usr/local --ow 0.0s
=> CACHED [ 7/18] RUN cd /usr/local && ln -s spark-2.4.7-bin-hadoop2.7 spark 0.0s
=> CACHED [ 8/18] RUN echo /usr/local/spark 0.0s
=> CACHED [ 9/18] RUN mkdir /usr/local/spark/lib/ && cd /usr/local/spark/lib/ && wget -q https://repo1.maven.org/maven2/com/lucidworks/spark/s 0.0s
=> CACHED [10/18] RUN chmod a+rwx /usr/local/spark/lib/spark-solr-3.8.0-shaded.jar 0.0s
=> CACHED [11/18] COPY notebooks notebooks 0.0s
=> CACHED [12/18] RUN chown -R 1000:1000 /home/jovyan 0.0s
=> CACHED [13/18] WORKDIR /home/jovyan 0.0s
=> CACHED [14/18] COPY requirements.txt ./ 0.0s
=> ERROR [15/18] RUN python -m pip install --upgrade pip && pip install -r requirements.txt 442.8s

[15/18] RUN python -m pip install --upgrade pip && pip install -r requirements.txt:
#19 0.438 Defaulting to user installation because normal site-packages is not writeable
#19 0.455 Requirement already satisfied: pip in /opt/conda/lib/python3.7/site-packages (21.3.1)
#19 1.315 Defaulting to user installation because normal site-packages is not writeable
#19 1.430 Collecting appnope==0.1.0
#19 1.513 Downloading appnope-0.1.0-py2.py3-none-any.whl (4.0 kB)
#19 1.545 Collecting attrs==19.1.0
#19 1.564 Downloading attrs-19.1.0-py2.py3-none-any.whl (35 kB)
#19 1.590 Collecting backcall==0.1.0
#19 1.607 Downloading backcall-0.1.0.zip (11 kB)
#19 1.612 Preparing metadata (setup.py): started
#19 1.763 Preparing metadata (setup.py): finished with status 'done'
#19 1.805 Collecting bleach==3.1.4
#19 1.821 Downloading bleach-3.1.4-py2.py3-none-any.whl (151 kB)
#19 1.870 Collecting bs4==0.0.1
#19 1.887 Downloading bs4-0.0.1.tar.gz (1.1 kB)
#19 1.892 Preparing metadata (setup.py): started
#19 2.040 Preparing metadata (setup.py): finished with status 'done'
#19 2.071 Collecting certifi==2019.6.16
#19 2.089 Downloading certifi-2019.6.16-py2.py3-none-any.whl (157 kB)
#19 2.125 Collecting chardet==3.0.4
#19 2.149 Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
#19 2.429 Collecting cython==0.29.20
#19 2.449 Downloading Cython-0.29.20-py2.py3-none-any.whl (973 kB)
#19 2.526 Collecting decorator==4.4.0
#19 2.545 Downloading decorator-4.4.0-py2.py3-none-any.whl (8.3 kB)
#19 2.571 Collecting defusedxml==0.6.0
#19 2.590 Downloading defusedxml-0.6.0-py2.py3-none-any.whl (23 kB)
#19 2.594 Requirement already satisfied: entrypoints==0.3 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 11)) (0.3)
#19 2.615 Collecting findspark==1.3.0
#19 2.632 Downloading findspark-1.3.0-py2.py3-none-any.whl (3.0 kB)
#19 2.659 Collecting idna==2.8
#19 2.675 Downloading idna-2.8-py2.py3-none-any.whl (58 kB)
#19 2.736 Collecting ipykernel==5.1.1
#19 2.753 Downloading ipykernel-5.1.1-py3-none-any.whl (114 kB)
#19 2.819 Collecting ipython==7.5.0
#19 2.846 Downloading ipython-7.5.0-py3-none-any.whl (770 kB)
#19 2.865 Requirement already satisfied: ipython-genutils==0.2.0 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 16)) (0.2.0)
#19 2.910 Collecting ipywidgets==7.5.0
#19 2.926 Downloading ipywidgets-7.5.0-py2.py3-none-any.whl (121 kB)
#19 2.964 Collecting jedi==0.14.0
#19 2.985 Downloading jedi-0.14.0-py2.py3-none-any.whl (1.0 MB)
#19 3.049 Collecting Jinja2==2.10.1
#19 3.086 Downloading Jinja2-2.10.1-py2.py3-none-any.whl (124 kB)
#19 3.122 Collecting jsonschema==3.0.1
#19 3.142 Downloading jsonschema-3.0.1-py2.py3-none-any.whl (54 kB)
#19 3.167 Collecting jupyter==1.0.0
#19 3.184 Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
#19 3.224 Collecting jupyter-client==5.2.4
#19 3.244 Downloading jupyter_client-5.2.4-py2.py3-none-any.whl (89 kB)
#19 3.272 Collecting jupyter-console==6.0.0
#19 3.289 Downloading jupyter_console-6.0.0-py2.py3-none-any.whl (21 kB)
#19 3.339 Collecting jupyter-core==4.5.0
#19 3.355 Downloading jupyter_core-4.5.0-py2.py3-none-any.whl (78 kB)
#19 3.408 Collecting jupyterlab_server==1.0.0
#19 3.426 Downloading jupyterlab_server-1.0.0-py3-none-any.whl (26 kB)
#19 3.641 Collecting lxml==4.6.2
#19 3.657 Downloading lxml-4.6.2-cp37-cp37m-manylinux2014_aarch64.whl (6.7 MB)
#19 3.872 Collecting MarkupSafe==1.1.1
#19 3.888 Downloading MarkupSafe-1.1.1-cp37-cp37m-manylinux2014_aarch64.whl (34 kB)
#19 3.914 Collecting mergedeep==1.3.0
#19 3.929 Downloading mergedeep-1.3.0-py3-none-any.whl (6.3 kB)
#19 3.935 Requirement already satisfied: mistune==0.8.4 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 29)) (0.8.4)
#19 3.970 Collecting nbconvert==5.5.0
#19 3.991 Downloading nbconvert-5.5.0-py2.py3-none-any.whl (447 kB)
#19 4.027 Collecting nbformat==4.4.0
#19 4.060 Downloading nbformat-4.4.0-py2.py3-none-any.whl (155 kB)
#19 4.108 Collecting notebook==5.7.8
#19 4.139 Downloading notebook-5.7.8-py2.py3-none-any.whl (9.0 MB)
#19 4.420 Collecting nltk==3.5
#19 4.440 Downloading nltk-3.5.zip (1.4 MB)
#19 4.526 Preparing metadata (setup.py): started
#19 4.688 Preparing metadata (setup.py): finished with status 'done'
#19 4.746 Collecting nmslib==2.1.1
#19 4.764 Downloading nmslib-2.1.1-cp37-cp37m-manylinux2014_aarch64.whl (14.0 MB)
#19 5.372 Collecting numpy==1.19.0
#19 5.391 Downloading numpy-1.19.0-cp37-cp37m-manylinux2014_aarch64.whl (12.2 MB)
#19 5.660 Collecting pandocfilters==1.4.2
#19 5.676 Downloading pandocfilters-1.4.2.tar.gz (14 kB)
#19 5.685 Preparing metadata (setup.py): started
#19 5.823 Preparing metadata (setup.py): finished with status 'done'
#19 5.855 Collecting parso==0.5.0
#19 5.872 Downloading parso-0.5.0-py2.py3-none-any.whl (94 kB)
#19 5.905 Collecting pexpect==4.7.0
#19 5.922 Downloading pexpect-4.7.0-py2.py3-none-any.whl (58 kB)
#19 5.930 Requirement already satisfied: pickleshare==0.7.5 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 40)) (0.7.5)
#19 5.991 Collecting plotly==4.14.3
#19 6.009 Downloading plotly-4.14.3-py2.py3-none-any.whl (13.2 MB)
#19 6.403 Collecting plotnine==0.7.1
#19 6.428 Downloading plotnine-0.7.1-py3-none-any.whl (4.4 MB)
#19 6.558 Collecting prometheus-client==0.7.1
#19 6.575 Downloading prometheus_client-0.7.1.tar.gz (38 kB)
#19 6.586 Preparing metadata (setup.py): started
#19 6.734 Preparing metadata (setup.py): finished with status 'done'
#19 6.799 Collecting prompt-toolkit==2.0.9
#19 6.817 Downloading prompt_toolkit-2.0.9-py3-none-any.whl (337 kB)
#19 6.846 Collecting ptyprocess==0.6.0
#19 6.864 Downloading ptyprocess-0.6.0-py2.py3-none-any.whl (39 kB)
#19 6.896 Collecting py4j==0.10.7
#19 6.913 Downloading py4j-0.10.7-py2.py3-none-any.whl (197 kB)
#19 6.957 Collecting Pygments==2.4.2
#19 6.978 Downloading Pygments-2.4.2-py2.py3-none-any.whl (883 kB)
#19 7.027 Collecting pyrsistent==0.15.2
#19 7.044 Downloading pyrsistent-0.15.2.tar.gz (106 kB)
#19 7.061 Preparing metadata (setup.py): started
#19 7.208 Preparing metadata (setup.py): finished with status 'done'
#19 7.236 Collecting pysolr==3.8.1
#19 7.251 Downloading pysolr-3.8.1-py2.py3-none-any.whl (16 kB)
#19 7.284 Collecting python-dateutil==2.8.0
#19 7.306 Downloading python_dateutil-2.8.0-py2.py3-none-any.whl (226 kB)
#19 7.383 Collecting pytz==2019.1
#19 7.401 Downloading pytz-2019.1-py2.py3-none-any.whl (510 kB)
#19 7.573 Collecting pyzmq==18.0.1
#19 7.596 Downloading pyzmq-18.0.1.tar.gz (1.2 MB)
#19 7.749 Preparing metadata (setup.py): started
#19 7.942 Preparing metadata (setup.py): finished with status 'done'
#19 7.977 Collecting qtconsole==4.5.1
#19 7.994 Downloading qtconsole-4.5.1-py2.py3-none-any.whl (118 kB)
#19 8.057 Collecting requests==2.22.0
#19 8.082 Downloading requests-2.22.0-py2.py3-none-any.whl (57 kB)
#19 8.106 Collecting retrying==1.3.3
#19 8.122 Downloading retrying-1.3.3.tar.gz (10 kB)
#19 8.129 Preparing metadata (setup.py): started
#19 8.272 Preparing metadata (setup.py): finished with status 'done'
#19 8.298 Collecting Send2Trash==1.5.0
#19 8.315 Downloading Send2Trash-1.5.0-py3-none-any.whl (12 kB)
#19 8.342 Collecting six==1.12.0
#19 8.363 Downloading six-1.12.0-py2.py3-none-any.whl (10 kB)
#19 8.487 Collecting spacy==2.3.7
#19 8.504 Downloading spacy-2.3.7.tar.gz (5.8 MB)
#19 9.188 Installing build dependencies: started
#19 110.1 Installing build dependencies: still running...
#19 178.1 Installing build dependencies: still running...
#19 184.5 Installing build dependencies: finished with status 'done'
#19 184.5 Getting requirements to build wheel: started
#19 184.7 Getting requirements to build wheel: finished with status 'done'
#19 184.8 Installing backend dependencies: started
#19 186.1 Installing backend dependencies: finished with status 'done'
#19 186.1 Preparing metadata (pyproject.toml): started
#19 186.4 Preparing metadata (pyproject.toml): finished with status 'done'
#19 186.4 Collecting transformers==4.5.1
#19 186.5 Downloading transformers-4.5.1-py3-none-any.whl (2.1 MB)
#19 186.5 Collecting sentence-transformers==1.1.0
#19 186.5 Downloading sentence-transformers-1.1.0.tar.gz (78 kB)
#19 186.6 Preparing metadata (setup.py): started
#19 186.7 Preparing metadata (setup.py): finished with status 'done'
#19 186.7 Collecting testpath==0.4.2
#19 186.8 Downloading testpath-0.4.2-py2.py3-none-any.whl (163 kB)
#19 186.8 Collecting tornado==6.0.3
#19 186.8 Downloading tornado-6.0.3.tar.gz (482 kB)
#19 186.9 Preparing metadata (setup.py): started
#19 187.0 Preparing metadata (setup.py): finished with status 'done'
#19 187.1 Collecting traitlets==4.3.2
#19 187.1 Downloading traitlets-4.3.2-py2.py3-none-any.whl (74 kB)
#19 187.2 Collecting urllib3==1.25.4
#19 187.2 Downloading urllib3-1.25.4-py2.py3-none-any.whl (125 kB)
#19 187.2 Collecting wcwidth==0.1.7
#19 187.2 Downloading wcwidth-0.1.7-py2.py3-none-any.whl (21 kB)
#19 187.2 Requirement already satisfied: webencodings==0.5.1 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 68)) (0.5.1)
#19 187.3 Collecting widgetsnbextension==3.5.0
#19 187.4 Downloading widgetsnbextension-3.5.0-py2.py3-none-any.whl (2.2 MB)
#19 187.5 Requirement already satisfied: beautifulsoup4 in /opt/conda/lib/python3.7/site-packages (from bs4==0.0.1->-r requirements.txt (line 5)) (4.10.0)
#19 187.5 Requirement already satisfied: setuptools>=18.5 in /opt/conda/lib/python3.7/site-packages (from ipython==7.5.0->-r requirements.txt (line 15)) (60.5.0)
#19 187.6 Requirement already satisfied: json5 in /opt/conda/lib/python3.7/site-packages (from jupyterlab_server==1.0.0->-r requirements.txt (line 25)) (0.9.5)
#19 187.7 Requirement already satisfied: terminado>=0.8.1 in /opt/conda/lib/python3.7/site-packages (from notebook==5.7.8->-r requirements.txt (line 32)) (0.12.1)
#19 187.7 Requirement already satisfied: click in /opt/conda/lib/python3.7/site-packages (from nltk==3.5->-r requirements.txt (line 34)) (8.0.3)
#19 187.7 Requirement already satisfied: joblib in /opt/conda/lib/python3.7/site-packages (from nltk==3.5->-r requirements.txt (line 34)) (1.1.0)
#19 188.1 Collecting regex
#19 188.1 Downloading regex-2021.11.10-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (746 kB)
#19 188.2 Requirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from nltk==3.5->-r requirements.txt (line 34)) (4.62.3)
#19 188.2 Requirement already satisfied: psutil in /opt/conda/lib/python3.7/site-packages (from nmslib==2.1.1->-r requirements.txt (line 35)) (5.9.0)
#19 188.2 Collecting pybind11<2.6.2
#19 188.2 Downloading pybind11-2.6.1-py2.py3-none-any.whl (188 kB)
#19 188.3 Requirement already satisfied: statsmodels>=0.11.1 in /opt/conda/lib/python3.7/site-packages (from plotnine==0.7.1->-r requirements.txt (line 42)) (0.13.1)
#19 188.3 Collecting descartes>=1.1.0
#19 188.3 Downloading descartes-1.1.0-py3-none-any.whl (5.8 kB)
#19 188.3 Requirement already satisfied: patsy>=0.5.1 in /opt/conda/lib/python3.7/site-packages (from plotnine==0.7.1->-r requirements.txt (line 42)) (0.5.2)
#19 188.3 Requirement already satisfied: matplotlib>=3.1.1 in /opt/conda/lib/python3.7/site-packages (from plotnine==0.7.1->-r requirements.txt (line 42)) (3.5.1)
#19 188.3 Requirement already satisfied: pandas>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from plotnine==0.7.1->-r requirements.txt (line 42)) (1.3.5)
#19 188.4 Collecting mizani>=0.7.1
#19 188.4 Downloading mizani-0.7.3-py3-none-any.whl (63 kB)
#19 188.4 Requirement already satisfied: scipy>=1.2.0 in /opt/conda/lib/python3.7/site-packages (from plotnine==0.7.1->-r requirements.txt (line 42)) (1.7.3)
#19 188.5 Collecting wasabi<1.1.0,>=0.4.0
#19 188.5 Using cached wasabi-0.9.0-py3-none-any.whl (25 kB)
#19 188.6 Collecting cymem<2.1.0,>=2.0.2
#19 188.6 Using cached cymem-2.0.6-cp37-cp37m-linux_aarch64.whl
#19 188.6 Collecting plac<1.2.0,>=0.9.6
#19 188.6 Using cached plac-1.1.3-py2.py3-none-any.whl (20 kB)
#19 188.6 Collecting preshed<3.1.0,>=3.0.2
#19 188.6 Using cached preshed-3.0.6-cp37-cp37m-linux_aarch64.whl
#19 188.7 Collecting murmurhash<1.1.0,>=0.28.0
#19 188.7 Using cached murmurhash-1.0.6-cp37-cp37m-linux_aarch64.whl
#19 188.8 Collecting thinc<7.5.0,>=7.4.1
#19 188.8 Using cached thinc-7.4.5-cp37-cp37m-linux_aarch64.whl
#19 188.9 Collecting blis<0.8.0,>=0.4.0
#19 188.9 Using cached blis-0.7.5-cp37-cp37m-linux_aarch64.whl
#19 188.9 Collecting catalogue<1.1.0,>=0.0.7
#19 188.9 Using cached catalogue-1.0.0-py2.py3-none-any.whl (7.7 kB)
#19 189.0 Collecting srsly<1.1.0,>=1.0.2
#19 189.0 Using cached srsly-1.0.5-cp37-cp37m-linux_aarch64.whl
#19 189.1 Collecting filelock
#19 189.1 Downloading filelock-3.4.2-py3-none-any.whl (9.9 kB)
#19 189.2 Collecting sacremoses
#19 189.2 Downloading sacremoses-0.0.47-py2.py3-none-any.whl (895 kB)
#19 189.2 Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from transformers==4.5.1->-r requirements.txt (line 59)) (4.10.0)
#19 189.4 Collecting tokenizers<0.11,>=0.10.1
#19 189.4 Downloading tokenizers-0.10.3.tar.gz (212 kB)
#19 189.5 Installing build dependencies: started
#19 191.5 Installing build dependencies: finished with status 'done'
#19 191.5 Getting requirements to build wheel: started
#19 191.7 Getting requirements to build wheel: finished with status 'done'
#19 191.7 Preparing metadata (pyproject.toml): started
#19 191.8 Preparing metadata (pyproject.toml): finished with status 'done'
#19 191.8 Requirement already satisfied: packaging in /opt/conda/lib/python3.7/site-packages (from transformers==4.5.1->-r requirements.txt (line 59)) (21.2)
#19 191.9 Collecting torch>=1.6.0
#19 192.2 Downloading torch-1.10.1-cp37-cp37m-manylinux2014_aarch64.whl (51.0 MB)
#19 193.2 Requirement already satisfied: scikit-learn in /opt/conda/lib/python3.7/site-packages (from sentence-transformers==1.1.0->-r requirements.txt (line 60)) (1.0.2)
#19 193.3 Collecting sentencepiece
#19 193.3 Downloading sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.2 MB)
#19 193.4 Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->transformers==4.5.1->-r requirements.txt (line 59)) (3.6.0)
#19 193.4 Requirement already satisfied: typing-extensions>=3.6.4 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->transformers==4.5.1->-r requirements.txt (line 59)) (3.10.0.2)
#19 193.4 Requirement already satisfied: pyparsing>=2.2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.1.1->plotnine==0.7.1->-r requirements.txt (line 42)) (2.4.7)
#19 193.4 Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.1.1->plotnine==0.7.1->-r requirements.txt (line 42)) (4.28.5)
#19 193.4 Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.1.1->plotnine==0.7.1->-r requirements.txt (line 42)) (0.11.0)
#19 193.4 Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.1.1->plotnine==0.7.1->-r requirements.txt (line 42)) (1.3.2)
#19 193.4 Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.1.1->plotnine==0.7.1->-r requirements.txt (line 42)) (8.4.0)
#19 193.5 Collecting palettable
#19 193.5 Downloading palettable-3.3.0-py2.py3-none-any.whl (111 kB)
#19 193.7 Collecting pandas>=1.1.0
#19 193.8 Downloading pandas-1.3.4-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (10.7 MB)
#19 194.0 Downloading pandas-1.3.3-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (10.7 MB)
#19 194.4 Requirement already satisfied: soupsieve>1.2 in /opt/conda/lib/python3.7/site-packages (from beautifulsoup4->bs4==0.0.1->-r requirements.txt (line 5)) (2.0.1)
#19 194.5 Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn->sentence-transformers==1.1.0->-r requirements.txt (line 60)) (3.0.0)
#19 194.6 Building wheels for collected packages: backcall, bs4, nltk, pandocfilters, prometheus-client, pyrsistent, pyzmq, retrying, spacy, sentence-transformers, tornado, tokenizers
#19 194.6 Building wheel for backcall (setup.py): started
#19 194.8 Building wheel for backcall (setup.py): finished with status 'done'
#19 194.8 Created wheel for backcall: filename=backcall-0.1.0-py3-none-any.whl size=10413 sha256=d446198efc54c473b28e330cf08a9b80d4e0e5ba7685678634b2a2eefe3fc48b
#19 194.8 Stored in directory: /home/jovyan/.cache/pip/wheels/60/5a/10/2177abb11261d49069a732cbc0e66207783c7ee79c1f807167
#19 194.8 Building wheel for bs4 (setup.py): started
#19 195.0 Building wheel for bs4 (setup.py): finished with status 'done'
#19 195.0 Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1271 sha256=434cd8a88de0a62283a52bf90a8e0e970476bdf89513753ca9d7bd9ce583f88d
#19 195.0 Stored in directory: /home/jovyan/.cache/pip/wheels/0a/9e/ba/20e5bbc1afef3a491f0b3bb74d508f99403aabe76eda2167ca
#19 195.0 Building wheel for nltk (setup.py): started
#19 195.5 Building wheel for nltk (setup.py): finished with status 'done'
#19 195.5 Created wheel for nltk: filename=nltk-3.5-py3-none-any.whl size=1434693 sha256=320180c519a60c785e771f927f07268893476bb211e23fa24fa3c39ce330815c
#19 195.5 Stored in directory: /home/jovyan/.cache/pip/wheels/45/6c/46/a1865e7ba706b3817f5d1b2ff7ce8996aabdd0d03d47ba0266
#19 195.5 Building wheel for pandocfilters (setup.py): started
#19 195.7 Building wheel for pandocfilters (setup.py): finished with status 'done'
#19 195.7 Created wheel for pandocfilters: filename=pandocfilters-1.4.2-py3-none-any.whl size=7871 sha256=b5280b820859d0124baf03fe2d0df4d132d5a1d21adaa7ee7e468e429b8c8b9b
#19 195.7 Stored in directory: /home/jovyan/.cache/pip/wheels/63/99/01/9fe785b86d1e091a6b2a61e06ddb3d8eb1bc9acae5933d4740
#19 195.7 Building wheel for prometheus-client (setup.py): started
#19 195.9 Building wheel for prometheus-client (setup.py): finished with status 'done'
#19 195.9 Created wheel for prometheus-client: filename=prometheus_client-0.7.1-py3-none-any.whl size=41404 sha256=3c5dcfe44be00e3cc625bf43c1b82ccab4c70c32396183dc29227ff06c18f73c
#19 195.9 Stored in directory: /home/jovyan/.cache/pip/wheels/30/0c/26/59ba285bf65dc79d195e9b25e2ddde4c61070422729b0cd914
#19 195.9 Building wheel for pyrsistent (setup.py): started
#19 196.6 Building wheel for pyrsistent (setup.py): finished with status 'done'
#19 196.6 Created wheel for pyrsistent: filename=pyrsistent-0.15.2-cp37-cp37m-linux_aarch64.whl size=129753 sha256=9a4ca4be9aef2fbd77f4e2a769f87d67a4805fcc5940a9ab329a059645c95419
#19 196.6 Stored in directory: /home/jovyan/.cache/pip/wheels/bf/84/c2/d54c1edb44cc3248e7a0bc06f76395b7e971eaf52b3f63835b
#19 196.6 Building wheel for pyzmq (setup.py): started
#19 251.2 Building wheel for pyzmq (setup.py): finished with status 'done'
#19 251.2 Created wheel for pyzmq: filename=pyzmq-18.0.1-cp37-cp37m-linux_aarch64.whl size=6365580 sha256=c26cca45da14c3c3ae0654f29a224970de8ae284f66c58fcc7a0860012828d00
#19 251.2 Stored in directory: /home/jovyan/.cache/pip/wheels/41/19/79/c53cb3e5358dff43028ac8afed4f3dcf3ced0239f9fa97984f
#19 251.2 Building wheel for retrying (setup.py): started
#19 251.5 Building wheel for retrying (setup.py): finished with status 'done'
#19 251.5 Created wheel for retrying: filename=retrying-1.3.3-py3-none-any.whl size=11448 sha256=3752c6106eff9307a81f111a601a72d0d548ae679e4d38315d2c6b637c32522e
#19 251.5 Stored in directory: /home/jovyan/.cache/pip/wheels/f9/8d/8d/f6af3f7f9eea3553bc2fe6d53e4b287dad18b06a861ac56ddf
#19 251.5 Building wheel for spacy (pyproject.toml): started
#19 312.5 Building wheel for spacy (pyproject.toml): still running...
#19 373.3 Building wheel for spacy (pyproject.toml): still running...
#19 433.6 Building wheel for spacy (pyproject.toml): still running...
#19 441.3 Building wheel for spacy (pyproject.toml): finished with status 'done'
#19 441.3 Created wheel for spacy: filename=spacy-2.3.7-cp37-cp37m-linux_aarch64.whl size=26652568 sha256=9c3ab5eb8e4723226a3e68cf4943bb779010b0d11f6321938f36bc74cd3c293d
#19 441.3 Stored in directory: /home/jovyan/.cache/pip/wheels/aa/99/63/f57e42849e2e628229458201f2d3e61896ed3cfe2fe0c339e3
#19 441.3 Building wheel for sentence-transformers (setup.py): started
#19 441.6 Building wheel for sentence-transformers (setup.py): finished with status 'done'
#19 441.6 Created wheel for sentence-transformers: filename=sentence_transformers-1.1.0-py3-none-any.whl size=119616 sha256=cb1d60ae82025d648865233c1d6fa94990f677990bb430c68a2f0863ae05ea68
#19 441.6 Stored in directory: /home/jovyan/.cache/pip/wheels/20/fd/72/b2524b6c3af92dae3ce173595aeff673a8114255809a9aa381
#19 441.6 Building wheel for tornado (setup.py): started
#19 441.9 Building wheel for tornado (setup.py): finished with status 'done'
#19 441.9 Created wheel for tornado: filename=tornado-6.0.3-cp37-cp37m-linux_aarch64.whl size=424708 sha256=3af5a2cbe946cbd2be632b4cc9a721868ec22ab5dabb04aeac424c658f8ea6f2
#19 441.9 Stored in directory: /home/jovyan/.cache/pip/wheels/d0/31/2c/9406ed59f0dcdce0c453a8664124d738551590e74fc087f604
#19 442.0 Building wheel for tokenizers (pyproject.toml): started
#19 442.1 Building wheel for tokenizers (pyproject.toml): finished with status 'error'
#19 442.1 ERROR: Command errored out with exit status 1:
#19 442.1 command: /opt/conda/bin/python3.7 /opt/conda/lib/python3.7/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmptmcnxiko
#19 442.1 cwd: /tmp/pip-install-7dlawjhx/tokenizers_81a2f8296e644e608c00a45a89cdcd07
#19 442.1 Complete output (50 lines):
#19 442.1 running bdist_wheel
#19 442.1 running build
#19 442.1 running build_py
#19 442.1 creating build
#19 442.1 creating build/lib.linux-aarch64-3.7
#19 442.1 creating build/lib.linux-aarch64-3.7/tokenizers
#19 442.1 copying py_src/tokenizers/init.py -> build/lib.linux-aarch64-3.7/tokenizers
#19 442.1 creating build/lib.linux-aarch64-3.7/tokenizers/models
#19 442.1 copying py_src/tokenizers/models/init.py -> build/lib.linux-aarch64-3.7/tokenizers/models
#19 442.1 creating build/lib.linux-aarch64-3.7/tokenizers/decoders
#19 442.1 copying py_src/tokenizers/decoders/init.py -> build/lib.linux-aarch64-3.7/tokenizers/decoders
#19 442.1 creating build/lib.linux-aarch64-3.7/tokenizers/normalizers
#19 442.1 copying py_src/tokenizers/normalizers/init.py -> build/lib.linux-aarch64-3.7/tokenizers/normalizers
#19 442.1 creating build/lib.linux-aarch64-3.7/tokenizers/pre_tokenizers
#19 442.1 copying py_src/tokenizers/pre_tokenizers/init.py -> build/lib.linux-aarch64-3.7/tokenizers/pre_tokenizers
#19 442.1 creating build/lib.linux-aarch64-3.7/tokenizers/processors
#19 442.1 copying py_src/tokenizers/processors/init.py -> build/lib.linux-aarch64-3.7/tokenizers/processors
#19 442.1 creating build/lib.linux-aarch64-3.7/tokenizers/trainers
#19 442.1 copying py_src/tokenizers/trainers/init.py -> build/lib.linux-aarch64-3.7/tokenizers/trainers
#19 442.1 creating build/lib.linux-aarch64-3.7/tokenizers/implementations
#19 442.1 copying py_src/tokenizers/implementations/init.py -> build/lib.linux-aarch64-3.7/tokenizers/implementations
#19 442.1 copying py_src/tokenizers/implementations/byte_level_bpe.py -> build/lib.linux-aarch64-3.7/tokenizers/implementations
#19 442.1 copying py_src/tokenizers/implementations/sentencepiece_bpe.py -> build/lib.linux-aarch64-3.7/tokenizers/implementations
#19 442.1 copying py_src/tokenizers/implementations/base_tokenizer.py -> build/lib.linux-aarch64-3.7/tokenizers/implementations
#19 442.1 copying py_src/tokenizers/implementations/bert_wordpiece.py -> build/lib.linux-aarch64-3.7/tokenizers/implementations
#19 442.1 copying py_src/tokenizers/implementations/char_level_bpe.py -> build/lib.linux-aarch64-3.7/tokenizers/implementations
#19 442.1 copying py_src/tokenizers/implementations/sentencepiece_unigram.py -> build/lib.linux-aarch64-3.7/tokenizers/implementations
#19 442.1 creating build/lib.linux-aarch64-3.7/tokenizers/tools
#19 442.1 copying py_src/tokenizers/tools/init.py -> build/lib.linux-aarch64-3.7/tokenizers/tools
#19 442.1 copying py_src/tokenizers/tools/visualizer.py -> build/lib.linux-aarch64-3.7/tokenizers/tools
#19 442.1 copying py_src/tokenizers/init.pyi -> build/lib.linux-aarch64-3.7/tokenizers
#19 442.1 copying py_src/tokenizers/models/init.pyi -> build/lib.linux-aarch64-3.7/tokenizers/models
#19 442.1 copying py_src/tokenizers/decoders/init.pyi -> build/lib.linux-aarch64-3.7/tokenizers/decoders
#19 442.1 copying py_src/tokenizers/normalizers/init.pyi -> build/lib.linux-aarch64-3.7/tokenizers/normalizers
#19 442.1 copying py_src/tokenizers/pre_tokenizers/init.pyi -> build/lib.linux-aarch64-3.7/tokenizers/pre_tokenizers
#19 442.1 copying py_src/tokenizers/processors/init.pyi -> build/lib.linux-aarch64-3.7/tokenizers/processors
#19 442.1 copying py_src/tokenizers/trainers/init.pyi -> build/lib.linux-aarch64-3.7/tokenizers/trainers
#19 442.1 copying py_src/tokenizers/tools/visualizer-styles.css -> build/lib.linux-aarch64-3.7/tokenizers/tools
#19 442.1 running build_ext
#19 442.1 error: can't find Rust compiler
#19 442.1
#19 442.1 If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.
#19 442.1
#19 442.1 To update pip, run:
#19 442.1
#19 442.1 pip install --upgrade pip
#19 442.1
#19 442.1 and then retry package installation.
#19 442.1
#19 442.1 If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
#19 442.1 ----------------------------------------
#19 442.1 ERROR: Failed building wheel for tokenizers
#19 442.1 Successfully built backcall bs4 nltk pandocfilters prometheus-client pyrsistent pyzmq retrying spacy sentence-transformers tornado
#19 442.1 Failed to build tokenizers
#19 442.1 ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

executor failed running [/bin/bash -o pipefail -c python -m pip install --upgrade pip && pip install -r requirements.txt]: exit code: 1
ERROR: Service 'notebooks' failed to build : Build failed

Make all notebooks trusted by default?

Right now, all notebooks are untrusted when you load them first time. Should we make them trusted? (docs)

Should we commit results in the notebooks?

If we're expecting that people will run the code in the notebooks, should we strip results from them, or leave committed?

With results in, it's a bit harder to do code reviews, etc.

Upgrade Spark to 3.x series

We're using Spark 2.4 that is outdated & don't have many performance improvements. It's better to switch to Spark 3.x, and install PySpark via pip instead of "manual" download.

`ch6/synonym detection.ipynb` fails on reading non-existent `product_description.json`

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[2], line 1
----> 1 product_description = pd.read_json("../data/temp/product_description.json")
      2 signals = pd.read_json("../data/temp/signal_sample.json")
      3 signals["query"] = signals["query_s"].apply(lambda x: re.sub("s$","", x.lower())) #conduct minimum stemming

File /opt/conda/lib/python3.10/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    209     else:
    210         kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)

Should we remove `ch6/RelatedKeywords-samples-old.ipynb` ?

Should we remove "Old ALS Code" in the `ch09/1.personalization.ipynb` ?

It doesn't work anyway...

Fix warning about loading JupyterLab as notebook extension

[W 13:01:53.975 NotebookApp] Loading JupyterLab as a classic notebook (v6) extension.
[C 13:01:53.976 NotebookApp] You must use Jupyter Server v1 to load JupyterLab as notebook extension. You have v2.3.0 installed.
    You can fix this by executing:
        pip install -U "jupyter-server<2.0.0"

plot_judgments doesn't display a chart inline

In the 3.ch10-pairwise-transform notebook, command 5 didn't display the chart with the call to plot_judgments.

I needed to run %matplotlib inline to get the plot to show up.

`ch7/2.semantic-search.ipynb` - web server doesn't work correctly in Docker

IFrame shows connection refused even if web server is running.

But in general - it's not clear why we need it in this notebook...

Upgrade Spark-Solr to latest version

Current version uses 4.0.2, while 4.0.3 is already available

Discuss: Swtich to Jupyter Lab interface instead of using classic notebooks

Jupyter Lab provides more features, including tabbed browser, more functionality, etc.

If we switch to it, then we'll need to regenerate screenshots for Appendix A

`ch14/3.question-answering-GPU-fine-tuning.ipynb` should have a comment that it's designed to run on Google Collab, not in the Jupyter directly

Also, versions installed from the notebook conflict with installed in Jupyter

Can't setup on Mac OS (Apple Chip)

When you run docker-compose up the environment setup fails because there are no images available for Mac OS.

docker-compose up                                                                  at  11:52:45
[+] Running 0/1
 ⠴ zookeeper Pulling                                                                                                                                            2.5s
no matching manifest for linux/arm64/v8 in the manifest list entries

Chapter 13 Download Data set fails

when running this part of the code

def download_outdoors_dataset():
    from ltr.download import download, extract_tgz
    import tarfile

    dataset = ['https://github.com/ai-powered-search/outdoors/raw/master/outdoors.tgz']
    download(dataset, dest='data/')
    extract_tgz('data/outdoors.tgz') # -> Holds 'outdoors.csv', a big CSV file of the stackexchange outdoors dataset

Receiving this error:

ReadError: not a gzip file

Inside the gz file is a bunch of html I do not see any csv files

Packages can't be installed when building `docker-notebooks` image.

Building the image docker-notebooks fails:

docker-compose up

Fails with error:

[docker-notebooks  4/16] COPY requirements.txt ./                                                                                                           0.0s
 => ERROR [docker-notebooks  5/16] RUN python -m pip --no-cache-dir install --upgrade pip &&   pip --no-cache-dir install -r requirements.txt


#0 161.3       running build_ext
#0 161.3       running build_rust
#0 161.3       error: can't find Rust compiler
#0 161.3       
#0 161.3       If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.
#0 161.3       
#0 161.3       To update pip, run:
#0 161.3       
#0 161.3           pip install --upgrade pip
#0 161.3       
#0 161.3       and then retry package installation.
#0 161.3       
#0 161.3       If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
#0 161.3       [end of output]
#0 161.3   
#0 161.3   note: This error originates from a subprocess, and is likely not a problem with pip.
#0 161.3   ERROR: Failed building wheel for tokenizers
#0 161.3   Building wheel for sacremoses (setup.py): started
#0 161.6   Building wheel for sacremoses (setup.py): finished with status 'done'
#0 161.6   Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895260 sha256=c0570d73297a0d503f9b7d64e9302ac5028a9d10417ed3375cbb8b0027e98382
#0 161.6   Stored in directory: /tmp/pip-ephem-wheel-cache-bg0l1kp3/wheels/42/79/78/5ad3b042cb2d97c294535162cdbaf9b167e3b186eae55ab72d
#0 161.6 Successfully built bs4 pysolr retrying sentence-transformers spacy sacremoses
#0 161.6 Failed to build tokenizers
#0 161.6 ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
------
failed to solve: executor failed running [/bin/bash -o pipefail -c python -m pip --no-cache-dir install --upgrade pip &&   pip --no-cache-dir install -r requirements.txt]: exit code: 1

Memory Issue

When I issue the command "docker-compose down" everything appears to run fine until it comes time to install python packages for the notebooks service. At this point, I receive: "ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device". I don't think that my device lacks the memory or RAM. Any suggestions what might be causing this issue? Is there any chance you could include system requirements on the README as well?

Some packages are installed twice, increasing the build time & image size

Some dependencies that are specified in the requirements.txt, like, transformers are later replaced by installing https://github.com/explosion/spacy-experimental/releases/download/v0.6.1/en_coreference_web_trf-3.4.0a2-py3-none-any.whl:

en-coreference-web-trf-3.4.0a2 spacy-alignments-0.9.1 spacy-transformers-1.1.9 transformers-4.25.1

Add a note to `ch14/1.question-answering-visualizer.ipynb` (and other notebooks in Chapter 14) that it requires that `outdoors` data set is already installed

We either need to add a comment, or add a code that will clone & unpack this dataset.

`ch5/1.open-information-extraction.ipynb` - Listing 5.1 `RuntimeError: could not create a primitive descriptor for a matmul primitive` on Apple M2

Hi, when trying to run Listing 5.1 via the provided docker-compose image on an Apple M2 machine, I get an exception (below). Other sections of the notebooks are working as expected, and the same code works correctly on an x86_64-based Linux system, also running via docker-compose. Perhaps an issue with spaCy or PyTorch support on that architecture? (I can offer to help test patches, but I'm not sure how to further triage the underlying issue.)

Specific notebook entry:

ai-powered-search/docker/data-science/notebooks/ch05/1.open-information-extraction.ipynb

Lines 167 to 186 in 4e1681d

    
            "source": [ 
        
             "text = \"\"\"\n", 
        
             "Data Scientists build machine learning models. They also write code. Companies employ Data Scientists. \n", 
        
             "Software Engineers also write code. Companies employ Software Engineers.\n", 
        
             "\"\"\"\n", 
        
             "\n", 
        
             "def generate_graph(text):\n", 
        
             "    doc = coref_model(text)    \n", 
        
             "    doc = resolve_coreferences(doc) # \"they\" => \"Data Scientists\"\n", 
        
             "    sentences = get_sentences(lang_model(doc)) # Data Scientists also write code. => ['nsubj, 'advmod', ROOT', 'dobj', 'punct']    \n", 
        
             "    \n", 
        
             "    facts=list()\n", 
        
             "    for sentence in sentences:\n", 
        
             "        facts.extend(resolve_facts(sentence)) # subj:(Companies), rel:(employ), obj:(Data Scientists)\n", 
        
             "    return facts\n", 
        
             "\n", 
        
             "graph = generate_graph(text)\n", 
        
             "for i in graph: print(i)" 
        
            ] 
        
           },

exception

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[8], line 16
     13         facts.extend(resolve_facts(sentence)) # subj:(Companies), rel:(employ), obj:(Data Scientists)
     14     return facts
---> 16 graph = generate_graph(text)
     17 for i in graph: print(i)

Cell In[8], line 7, in generate_graph(text)
      6 def generate_graph(text):
----> 7     doc = coref_model(text)    
      8     doc = resolve_coreferences(doc) # "they" => "Data Scientists"
      9     sentences = get_sentences(lang_model(doc)) # Data Scientists also write code. => ['nsubj, 'advmod', ROOT', 'dobj', 'punct']    

File /opt/conda/lib/python3.10/site-packages/spacy/language.py:1031, in Language.__call__(self, text, disable, component_cfg)
   1029     raise ValueError(Errors.E109.format(name=name)) from e
   1030 except Exception as e:
-> 1031     error_handler(name, proc, [doc], e)
   1032 if not isinstance(doc, Doc):
   1033     raise ValueError(Errors.E005.format(name=name, returned_type=type(doc)))

File /opt/conda/lib/python3.10/site-packages/spacy/util.py:1670, in raise_error(proc_name, proc, docs, e)
   1669 def raise_error(proc_name, proc, docs, e):
-> 1670     raise e

File /opt/conda/lib/python3.10/site-packages/spacy/language.py:1026, in Language.__call__(self, text, disable, component_cfg)
   1024     error_handler = proc.get_error_handler()
   1025 try:
-> 1026     doc = proc(doc, **component_cfg.get(name, {}))  # type: ignore[call-arg]
   1027 except KeyError as e:
   1028     # This typically happens if a component is not initialized
   1029     raise ValueError(Errors.E109.format(name=name)) from e

File /opt/conda/lib/python3.10/site-packages/spacy/pipeline/trainable_pipe.pyx:56, in spacy.pipeline.trainable_pipe.TrainablePipe.__call__()

File /opt/conda/lib/python3.10/site-packages/spacy/util.py:1670, in raise_error(proc_name, proc, docs, e)
   1669 def raise_error(proc_name, proc, docs, e):
-> 1670     raise e

File /opt/conda/lib/python3.10/site-packages/spacy/pipeline/trainable_pipe.pyx:52, in spacy.pipeline.trainable_pipe.TrainablePipe.__call__()

File /opt/conda/lib/python3.10/site-packages/spacy_experimental/coref/coref_component.py:153, in CoreferenceResolver.predict(self, docs)
    150     out.append([])
    151     continue
--> 153 scores, idxs = self.model.predict([doc])
    154 # idxs is a list of mentions (start / end idxs)
    155 # each item in scores includes scores and a mapping from scores to mentions
    156 ant_idxs = idxs

File /opt/conda/lib/python3.10/site-packages/thinc/model.py:334, in Model.predict(self, X)
    330 def predict(self, X: InT) -> OutT:
    331     """Call the model's `forward` function with `is_train=False`, and return
    332     only the output, instead of the `(output, callback)` tuple.
    333     """
--> 334     return self._func(self, X, is_train=False)[0]

File /opt/conda/lib/python3.10/site-packages/thinc/layers/chain.py:54, in forward(model, X, is_train)
     52 callbacks = []
     53 for layer in model.layers:
---> 54     Y, inc_layer_grad = layer(X, is_train=is_train)
     55     callbacks.append(inc_layer_grad)
     56     X = Y

File /opt/conda/lib/python3.10/site-packages/thinc/model.py:310, in Model.__call__(self, X, is_train)
    307 def __call__(self, X: InT, is_train: bool) -> Tuple[OutT, Callable]:
    308     """Call the model's `forward` function, returning the output and a
    309     callback to compute the gradients via backpropagation."""
--> 310     return self._func(self, X, is_train=is_train)

File /opt/conda/lib/python3.10/site-packages/spacy_experimental/coref/coref_model.py:85, in coref_forward(model, X, is_train)
     84 def coref_forward(model: Model, X, is_train: bool):
---> 85     return model.layers[0](X, is_train)

File /opt/conda/lib/python3.10/site-packages/thinc/model.py:310, in Model.__call__(self, X, is_train)
    307 def __call__(self, X: InT, is_train: bool) -> Tuple[OutT, Callable]:
    308     """Call the model's `forward` function, returning the output and a
    309     callback to compute the gradients via backpropagation."""
--> 310     return self._func(self, X, is_train=is_train)

File /opt/conda/lib/python3.10/site-packages/thinc/layers/pytorchwrapper.py:225, in forward(model, X, is_train)
    222 convert_outputs = model.attrs["convert_outputs"]
    224 Xtorch, get_dX = convert_inputs(model, X, is_train)
--> 225 Ytorch, torch_backprop = model.shims[0](Xtorch, is_train)
    226 Y, get_dYtorch = convert_outputs(model, (X, Ytorch), is_train)
    228 def backprop(dY: Any) -> Any:

File /opt/conda/lib/python3.10/site-packages/thinc/shims/pytorch.py:97, in PyTorchShim.__call__(self, inputs, is_train)
     95     return self.begin_update(inputs)
     96 else:
---> 97     return self.predict(inputs), lambda a: ...

File /opt/conda/lib/python3.10/site-packages/thinc/shims/pytorch.py:115, in PyTorchShim.predict(self, inputs)
    113 with torch.no_grad():
    114     with torch.cuda.amp.autocast(self._mixed_precision):
--> 115         outputs = self._model(*inputs.args, **inputs.kwargs)
    116 self._model.train()
    117 return outputs

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File /opt/conda/lib/python3.10/site-packages/spacy_experimental/coref/pytorch_coref_model.py:88, in CorefClusterer.forward(self, word_features)
     85     top_rough_scores_batch = top_rough_scores[i : i + batch_size]
     87     # a_scores_batch    [batch_size, n_ants]
---> 88     a_scores_batch = self.ana_scorer(
     89         all_mentions=words,
     90         mentions_batch=words_batch,
     91         pairwise_batch=pairwise_batch,
     92         top_indices_batch=top_indices_batch,
     93         top_rough_scores_batch=top_rough_scores_batch,
     94     )
     95     a_scores_lst.append(a_scores_batch)
     97 coref_scores = torch.cat(a_scores_lst, dim=0)

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File /opt/conda/lib/python3.10/site-packages/spacy_experimental/coref/pytorch_coref_model.py:165, in AnaphoricityScorer.forward(self, all_mentions, mentions_batch, pairwise_batch, top_indices_batch, top_rough_scores_batch)
    160 pair_matrix = self._get_pair_matrix(
    161     all_mentions, mentions_batch, pairwise_batch, top_indices_batch
    162 )
    164 # [batch_size, n_ants]
--> 165 scores = top_rough_scores_batch + self._ffnn(pair_matrix)
    166 scores = add_dummy(scores, eps=True)
    168 return scores

File /opt/conda/lib/python3.10/site-packages/spacy_experimental/coref/pytorch_coref_model.py:175, in AnaphoricityScorer._ffnn(self, x)
    170 def _ffnn(self, x: torch.Tensor) -> torch.Tensor:
    171     """
    172     x: tensor of shape (batch_size x rough_k x n_features
    173     returns: tensor of shape (batch_size x antecedent_limit)
    174     """
--> 175     x = self.out(self.hidden(x))
    176     return x.squeeze(2)

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: could not create a primitive descriptor for a matmul primitive

ERROR: Version in "./docker-compose.yml" is unsupported.

With this docker-composer.yaml file I get the following error:

(base) raphy@pc:~/ai-powered-search/docker$ docker-compose up
ERROR: Version in "./docker-compose.yml" is unsupported. You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/

(base) raphy@pc:~/ai-powered-search/docker$ sudo docker-compose --version
docker-compose version 1.25.0, build unknown

(base) raphy@pc:~/ai-powered-search/docker$ sudo docker --version
Docker version 20.10.11, build dea9396

Commentin tin docker-compose.yml the line of the version:

#version: '3.8'

I get this error:

(base) raphy@pc:~/ai-powered-search/docker$ docker-compose up
ERROR: The Compose file './docker-compose.yml' is invalid because:
Unsupported config option for services: 'solr'
Unsupported config option for networks: 'zk-solr'

O.S. : Ubuntu 20.04

How to solve the problem?

Listing numbers in `ch13/4.semantic-search.ipynb` doesn't match to book's

13.16 in ch13/4.semantic-search.ipynb is 13.14 in the manuscript. We need to do adjustments

`ch6/bonus.related-terms-from-documents`, `ch6/synonym detection-v2.ipynb` & `ch6/RelatedKeywords-Samples` has missing code

Cell 4 uses a variable that isn't defined

aggr_signals = aggr_signals[aggr_signals["count"] > 1]
aggr_signals.shape[0]

gives

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 aggr_signals = aggr_signals[aggr_signals["count"] > 1]
      2 aggr_signals.shape[0]

NameError: name 'aggr_signals' is not defined

Indexing errors in `ch10/1.setup-the-movie-db.ipynb`, `ch11/0.setup.ipynb`, `ch12/0.setup.ipynb`

We need to add a warning that this is normal...

Wiping 'tmdb' collection
[('action', 'CREATE'), ('name', 'tmdb'), ('numShards', 1), ('replicationFactor', 1)]
Creating 'tmdb' collection
Status: Success
Del/Adding LTR QParser for tmdb collection
<Response [400]>
Status: Failure; Response:[ {'responseHeader': {'status': 400, 'QTime': 1}, 'errorMessages': ["error processing commands, errors: [{delete-queryparser=ltr, errorMessages=[NO such queryParser 'ltr' ]}], \n"], 'WARNING': 'This response format is experimental.  It is likely to change in the future.', 'error': {'metadata': ['error-class', 'org.apache.solr.api.ApiBag$ExceptionWithErrObject', 'root-error-class', 'org.apache.solr.api.ApiBag$ExceptionWithErrObject'], 'details': [{'delete-queryparser': 'ltr', 'errorMessages': ["NO such queryParser 'ltr' "]}], 'msg': "error processing commands, errors: [{delete-queryparser=ltr, errorMessages=[NO such queryParser 'ltr' ]}], ", 'code': 400}} ]
Status: Success
Adding LTR Doc Transformer for tmdb collection
Status: Failure; Response:[ {'responseHeader': {'status': 400, 'QTime': 1}, 'errorMessages': ["error processing commands, errors: [{delete-transformer=features, errorMessages=[NO such transformer 'features' ]}], \n"], 'WARNING': 'This response format is experimental.  It is likely to change in the future.', 'error': {'metadata': ['error-class', 'org.apache.solr.api.ApiBag$ExceptionWithErrObject', 'root-error-class', 'org.apache.solr.api.ApiBag$ExceptionWithErrObject'], 'details': [{'delete-transformer': 'features', 'errorMessages': ["NO such transformer 'features' "]}], 'msg': "error processing commands, errors: [{delete-transformer=features, errorMessages=[NO such transformer 'features' ]}], ", 'code': 400}} ]

Problems in `ch07/2.semantic-search.ipynb`

If I'm trying to run get_category_and_term_vector_solr_response("kimchi"), I'm getting:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 get_category_and_term_vector_solr_response("kimchi")

Cell In[6], line 16, in get_category_and_term_vector_solr_response(keyword)
      1 def get_category_and_term_vector_solr_response(keyword):
      2     query = {
      3         "params": { "fore": keyword, "back": "*:*", "df": "text_t" },
      4         "query": "*:*", "limit": 0,
   (...)
     13                         "type" : "terms", "field" : "doc_type", "limit": 1, "sort": { "r2": "desc" },
     14                         "facet" : { "r2" : "relatedness($fore,$back)"  }}}}}}
---> 16     response = run_search(query)
     17     return json.loads(response)

Cell In[8], line 12, in run_search(text)
     11 def run_search(text):
---> 12     q = urllib.parse.quote(text)
     13     qf, defType = "text_t", "lucene"
     15     return requests.get(SOLR_URL + "/reviews/select?q=" + q + "&qf=" + qf + "&defType=" + defType).text

File /opt/conda/lib/python3.10/urllib/parse.py:869, in quote(string, safe, encoding, errors)
    867     if errors is not None:
    868         raise TypeError("quote() doesn't support 'errors' for bytes")
--> 869 return quote_from_bytes(string, safe)

File /opt/conda/lib/python3.10/urllib/parse.py:894, in quote_from_bytes(bs, safe)
    889 """Like quote(), but accepts a bytes object rather than a str, and does
    890 not perform string-to-bytes encoding.  It always returns an ASCII string.
    891 quote_from_bytes(b'abc def\x3f') -> 'abc%20def%3f'
    892 """
    893 if not isinstance(bs, (bytes, bytearray)):
--> 894     raise TypeError("quote_from_bytes() expected bytes")
    895 if not bs:
    896     return ''

TypeError: quote_from_bytes() expected bytes

Also, run_search is defined later in the notebook, not before this function

Improvement: refactor `docker-compose.yaml` to avoid using `container_name`

Right now, the container_name is hardcoded, so when new containers are built (i.e., code has changed) they don't replace previous one, and they need to be removed explicitly.

When doing next round of refactoring, we may need to get rid of the container_name and just use service names as host names

	"source": [
	"text = \"\"\"\n",
	"Data Scientists build machine learning models. They also write code. Companies employ Data Scientists. \n",
	"Software Engineers also write code. Companies employ Software Engineers.\n",
	"\"\"\"\n",
	"\n",
	"def generate_graph(text):\n",
	" doc = coref_model(text) \n",
	" doc = resolve_coreferences(doc) # \"they\" => \"Data Scientists\"\n",
	" sentences = get_sentences(lang_model(doc)) # Data Scientists also write code. => ['nsubj, 'advmod', ROOT', 'dobj', 'punct'] \n",
	" \n",
	" facts=list()\n",
	" for sentence in sentences:\n",
	" facts.extend(resolve_facts(sentence)) # subj:(Companies), rel:(employ), obj:(Data Scientists)\n",
	" return facts\n",
	"\n",
	"graph = generate_graph(text)\n",
	"for i in graph: print(i)"
	]
	},

treygrainger / ai-powered-search Goto Github PK

ai-powered-search's People

Contributors

Stargazers

Watchers

Forkers

ai-powered-search's Issues

PS C:\kendevelopment\ai-powered-search\docker> docker-compose ps Name Command State Ports

Recommend Projects

Recommend Topics

Recommend Org

PS C:\kendevelopment\ai-powered-search\docker> docker-compose ps
Name Command State Ports