data61 / anonlink-entity-service
Privacy Preserving Record Linkage Service
License: Apache License 2.0
I'm concerned about the following line in compute_filter_similarity() in async_worker.py:

chunk_results = anonlink.entitymatch.calculate_filter_similarity(
    chunk_dp1, chunk_dp2, threshold=threshold, k=5, use_python=False)

Why is k arbitrarily set to 5? Is there a better value for k, and why?
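One option is to make k deployment-configurable rather than hard-coded. A minimal sketch reusing the call above; the SIMILARITY_K environment variable is a hypothetical name:

import os

import anonlink.entitymatch

def compute_chunk(chunk_dp1, chunk_dp2, threshold):
    # Let deployers tune k; default to the current value of 5.
    k = int(os.environ.get("SIMILARITY_K", "5"))
    return anonlink.entitymatch.calculate_filter_similarity(
        chunk_dp1, chunk_dp2, threshold=threshold, k=k, use_python=False)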
From engineering created by hardbyte : n1analytics/engineering#215
Issue by tho802
Thursday Jul 28, 2016 at 21:40 GMT
Originally opened as https://github.csiro.au/magic/n1-compute/issues/215
Running the 1M x 1M job on the entity service, I got a failure on the db:
FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
LOG: database system was not properly shut down; automatic recovery in progress
LOG: invalid record length at 0/727F6AC0
Which causes connection errors in the worker process:
08:11:03 INFO Received task: async_worker.compute_filter_similarity[77e6485d-4673-474b-8be2-f08125fa4568]
08:11:03 WARNING warning connecting to default postgres db
08:11:03 WARNING warning connecting to default postgres db
08:11:03 WARNING Can't connect to database
08:11:03 ERROR Task async_worker.compute_filter_similarity[8f104a8c-4798-4e7b-8abd-6a74c0eccc07] raised unexpected: ConnectionError('Issue connecting to database',)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/var/www/async_worker.py", line 361, in compute_filter_similarity
db = connect_db()
File "/var/www/database.py", line 39, in connect_db
raise ConnectionError("Issue connecting to database")
ConnectionError: Issue connecting to database
08:11:03 WARNING Can't connect to database
Need to deal with that somehow: by changing the task rate, re-running the failed tasks, or marking the mapping as "failed".
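For the re-running option, Celery's own retry machinery could absorb the transient ConnectionError. A sketch only, assuming Celery 4's autoretry_for; the broker URL and task body are placeholders:

from celery import Celery

app = Celery("tasks", broker="redis://redis:6379/0")  # placeholder broker URL

def connect_db():
    # Stand-in for database.connect_db(), which raises ConnectionError
    # while postgres is in recovery.
    raise ConnectionError("Issue connecting to database")

@app.task(bind=True, autoretry_for=(ConnectionError,),
          retry_backoff=True, max_retries=5)
def compute_filter_similarity(self, chunk_info):
    # Retried with exponential backoff instead of failing the whole mapping.
    db = connect_db()
    return db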
Ingress resources can specify URL rewriting and paths as well as domains/virtual host names.
It would be nice to have multiple versions of the entity service available behind the same domain so end users can test or pin to particular versions - we have even allowed for this in the path: https://es.data61.xyz/api/version
es.data61.xyz/api/v1 -> point to latest stable 1.x service
es.data61.xyz/api/v1.1 -> point to a specific version service
es.data61.xyz/api/v1.2 -> point to a specific version service
Lots of options: Elasticsearch, PostgreSQL, AWS CloudWatch...
Would be nice to support the same as N1-Engine, which suggests logging to PostgreSQL.
The image representing the deployment is saved in a png file which cannot be easily updated in case of typo (for example, the container is named traefik, not traefic).
And having red underlines for words which are not recognised is not really nice...
More generally, it would be nice to keep the source files which created our images for easy updates if necessary.
For long term maintainability it would be good to refactor the database code to use SQLAlchemy.
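For illustration, a minimal SQLAlchemy sketch (assuming SQLAlchemy 1.4+); the model and connection string are hypothetical, not the service's actual schema:

from sqlalchemy import Column, DateTime, Float, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Mapping(Base):
    # Hypothetical model standing in for the raw-SQL mappings table.
    __tablename__ = "mappings"
    resource_id = Column(String, primary_key=True)
    threshold = Column(Float)
    time_added = Column(DateTime)

engine = create_engine("postgresql://user:pass@db/postgres")  # placeholder DSN
Session = sessionmaker(bind=engine)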
Thinking something similar to how the redis helm chart allows loading a "sidecar" prometheus exporter of metrics. This optional component would export regular updates into our monitoring solution of choice, e.g. PostgreSQL or Elasticsearch.
https://github.com/kubernetes/charts/tree/master/stable/redis#configuration
Create sphinx docs for the entity service.
Need to look at best practice.
Probably implement via a shared BaseTask.
Check that any hashing done as part of testing uses clkhash instead of anonlink.
Note some testing code uses the fake PII generators that were in anonlink.
Seeing these logs from redis on a system with memory constraints:
redis_1 | 1:M 08 Nov 06:22:22.105 # Client id=1048 addr=172.19.0.5:51698 fd=31 name= age=9 idle=0 flags=N db=0 sub=616549 psub=0 multi=-1 qbuf=16305 qbuf-free=16569 obl=0 oll=3307 omem=81319152 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
We should determine and document minimum reasonable requirements.
Test that we can deploy the service to a kubernetes cluster.
Aha! Link: https://csiro.aha.io/features/ANONLINK-14
I see in nginx conf that:
# Disable buffering of client data so we can handle larger uploads
proxy_request_buffering off;
This might make sense when receiving large uploads, so the backend can choose what to do, but I'm not sure it is working/handled well in general.
In fact, I cannot POST a JSON body to the mapping endpoint when it is sent as a single chunk.
To reproduce the issue, create a file containing the following request:
{
  "paillier_context": {
    "public_key": {
      "n": "AKkOPnV97gEWxlWxE2VzSolyEI-5x0TFf_kQaBa7ykuFo6gy8Mi6VVbEHPmNCYcCXBWhMPiGrkCID2lOYr_PKbx8npyblbRRXyPFlx9h1XbUugTUIoHE_jJiz2mVd7tJwoX8odCGPnEioxb0fZpNI8yNvfAjMTx7MnLw6uGvhkI_U-JbYKg-QJV-SGjeWz5nz6dHz7G1d9yKLAcwMFrW-3-ZkkwNb8SbYE7dJCElEiddAPUoBOyoFB-hy4JMYO3Avj3XD6kOkIBlyge8TpvkMHPjCoFRd7Qszi70xSebgtMrEWdYdd-4Ama306q4NG6y2KLsBH4f_mdIJRKzqhNext8",
      "key_ops": ["encrypt"],
      "kty": "DAJ",
      "kid": "Paillier public key for entity matching service",
      "alg": "PAI-GN1"
    },
    "s": true,
    "p": 2048,
    "base": 2,
    "encoded": true
  },
  "public_key": {
    "n": "AKkOPnV97gEWxlWxE2VzSolyEI-5x0TFf_kQaBa7ykuFo6gy8Mi6VVbEHPmNCYcCXBWhMPiGrkCID2lOYr_PKbx8npyblbRRXyPFlx9h1XbUugTUIoHE_jJiz2mVd7tJwoX8odCGPnEioxb0fZpNI8yNvfAjMTx7MnLw6uGvhkI_U-JbYKg-QJV-SGjeWz5nz6dHz7G1d9yKLAcwMFrW-3-ZkkwNb8SbYE7dJCElEiddAPUoBOyoFB-hy4JMYO3Avj3XD6kOkIBlyge8TpvkMHPjCoFRd7Qszi70xSebgtMrEWdYdd-4Ama306q4NG6y2KLsBH4f_mdIJRKzqhNext8",
    "key_ops": ["encrypt"],
    "kty": "DAJ",
    "kid": "Paillier public key for entity matching service",
    "alg": "PAI-GN1"
  },
  "schema": [
    {"identifier": "INDEX", "weight": 0, "notes": "", "unigram": false, "toRemove": ""},
    {"identifier": "NAME first last", "weight": 1, "notes": "", "unigram": false, "toRemove": ""},
    {"identifier": "DOB YYYY/MM/DD", "weight": 1, "notes": "", "unigram": false, "toRemove": "/"},
    {"identifier": "GENDER M or F", "weight": 1, "notes": "", "unigram": true, "toRemove": ""}
  ],
  "result_type": "permutation_unencrypted_mask"
}
Start the entity service.
The following command works (returns a 200 response):
curl -v -X POST --header "Content-Type: application/json" -d @erquestFile http://0.0.0.0:8851/api/v1/mappings
However, the following returns a 400 status:
curl -v -X POST --header "Transfer-Encoding: chunked" --header "Content-Type: application/json" -d @erquestFile http://0.0.0.0:8851/api/v1/mappings
The received message is
{
"message": "Failed to decode JSON object: Expecting value: line 1 column 1 (char 0)"
}
Either within the deployed app, or at least with the standalone docs #3, we should serve the OpenAPI spec that was written.
A Flask plugin looks like one way. I've tried sphinx-swaggerdoc but found too many issues... I opened a PR but I've already decided it is beyond help.
Currently an unauthed user can see a mapping's status.
E.g. GET https://es.data61.xyz/api/v1/mappings/77bc11914e957d00c82d32cae965a040e3514a2fd66ef0c8/status
{
"ready": true,
"time_completed": "2017-08-02T09:44:18.211053",
"time_started": "2017-08-02T09:43:05.998527",
"time_added": "2017-08-02T09:42:59.726863",
"threshold": 0.95
}
Should we also expose the current progress and the size of the matching job?
From engineering created by hardbyte : n1analytics/engineering#398
Issue by smi9c4
Monday Jan 30, 2017 at 04:35 GMT
Originally opened as https://github.csiro.au/magic/n1-compute/issues/398
For the branch feature-es-database-refactor and the PR #391:
When a new permutation is posted (with encrypted mask), the Paillier public key and the context are saved in the database as-is (the public key is checked but not the context).
Then, when encrypting a number, only the public key and the base are used, not the remaining information from the context, such as the precision and the signedness settings.
We should first check the received context, use it when encrypting the mask, and send it along with the encrypted values.
In some use cases the actual decision of what to do with a possible link could be made outside this server if the similarity scores were exposed.
This proposal is to add a new view type where all the links above a certain threshold are returned. Note this would be a many-to-many linkage where some rows may be referenced multiple times.
Put together a load-testing suite.
The tool https://locust.io/ has proven good for this kind of thing.
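For example, a minimal locustfile sketch; this assumes a recent Locust API and a GET-able status endpoint:

from locust import HttpUser, between, task

class EntityServiceUser(HttpUser):
    wait_time = between(1, 5)  # simulated users pause 1-5s between requests

    @task
    def check_status(self):
        self.client.get("/api/v1/status")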
I have a few jupyter notebooks which have been used to demonstrate this system. It would be good to tidy them up and include them in this repo.
Ideally include them in the production deployment so we can demo this thing.
To make it easier for external people developing a client-side tool, we need to define the translation of PII into CLKs more strictly.
I see this as the resource that configures the hashing tools; both participants can download the hashing schema from the server to check they agree on how they are creating bloom filters.
There are two components:
An example schema:
{
  "version": "1.0",
  "hash": {
    "type": "double hash"
  },
  "features": [
    {"identifier": "firstname", "type": "freetext", "ngram": 2, "weight": 5, "notes": ""},
    {"identifier": "gender", "type": "enum", "values": ["M", "F"], "ngram": 1, "weight": 1},
    {"identifier": "phone", "type": "freetext", "ngram": 1, "weight": 1, "transforms": [{"type": "strip", "values": "()-"}]},
    {"identifier": "postcode", "type": "freetext", "ngram": 1, "positional": true, "weight": 2}
  ]
}
Eventually I'd like to document the schema using http://json-schema.org which is very similar to OpenAPI spec/swagger but for JSON instead of REST.
Started in branch jenkins-pipeline
Writing the swagger docs for #4 I noticed that we don't allow cross origin requests for the entity-service.
We just need to add the CORS header as documented here.
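For reference, a minimal sketch with the flask-cors extension; restricting it to the API routes is one option:

from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
# Send the Access-Control-Allow-Origin header on API responses only.
CORS(app, resources={r"/api/*": {"origins": "*"}})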
From engineering created by hardbyte : n1analytics/engineering#409
Issue by tho802
Thursday Feb 09, 2017 at 00:49 GMT
Originally opened as https://github.csiro.au/magic/n1-compute/issues/409
As @smi9c4 points out in a PR comment, we just chunk up the hashes from one data provider without considering whether the other is perhaps larger. Even better would be to chunk both, as sketched below.
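A minimal sketch of chunking both providers; the names are illustrative, not the service's actual helpers:

def chunks(filters, size):
    # Yield successive slices of at most `size` filters.
    for i in range(0, len(filters), size):
        yield filters[i:i + size]

def chunk_both(dp1_filters, dp2_filters, size):
    # Pair every chunk of DP1 with every chunk of DP2, so each task performs
    # at most size * size comparisons regardless of which provider is larger.
    for a in chunks(dp1_filters, size):
        for b in chunks(dp2_filters, size):
            yield a, b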
We should have a high level architecture diagram showing the communication between the various containers.
Some bad news while testing scalability of the entity service for ICML... Something went wrong. This issue is to record and investigate what happened, other issues will track the required fixes.
The only external indication is that progress continued past 1.
The test in question was a 1M x 1M match with 20 worker pods across 8 spot instances.
At least one worker has had an exception:
23:02:29 INFO Received task: async_worker.compute_filter_similarity[484710f7-a9d5-4a9b-b12c-ee96e4e853e0]
23:02:29 INFO Received task: async_worker.compute_filter_similarity[f51346d6-ba08-46de-8f4e-c76947a729a9]
23:07:40 INFO Timings: Prep: 17.1777 + 32.3618, Solve: 37268.0178, Total: 37317.5574 Comparisons: 499857456
23:07:50 ERROR Chord callback '9f9a597c-b781-486d-98bd-2780d94aa02a' raised: ValueError('9f9a597c-b781-486d-98bd-2780d94aa02a',)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/celery/backends/base.py", line 549, in on_chord_part_return
raise ValueError(gid)
ValueError: 9f9a597c-b781-486d-98bd-2780d94aa02a
23:07:55 INFO Task async_worker.compute_filter_similarity[1c2625ab-4454-48ca-a9f1-d55e812429e4] succeeded in 37332.14917168999s: [(429312, 0.9969604863221885, 923257), (429313, 0.995417048579285, 479794), (429314, 0.9966254218222722, 454009), (429315,...
23:07:55 INFO Received task: async_worker.compute_filter_similarity[7f406fe6-ecf8-441f-bf4c-f1302794a287]
00:09:24 INFO Timings: Prep: 19.9007 + 32.9159, Solve: 37639.4037, Total: 37692.2203 Comparisons: 499857456
00:09:34 ERROR Chord callback '9f9a597c-b781-486d-98bd-2780d94aa02a' raised: ValueError('9f9a597c-b781-486d-98bd-2780d94aa02a',)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/celery/backends/base.py", line 549, in on_chord_part_return
raise ValueError(gid)
ValueError: 9f9a597c-b781-486d-98bd-2780d94aa02a
Need to look into how we are using chords.
Centralised (searchable) logging from all workers would be nice for this.
Companies may want to create multiple CLKs per row so they can link with multiple other organisations.
To support this:
Following on from the discussion in #71, this issue is to edit the tools/build.sh and tools/upload.sh scripts and to allow the tagging logic to be set in the Jenkinsfile.
I'd like to keep an easy way for a developer to build the docker images locally. In that case I think it is acceptable to tag with latest, but we should avoid such tagging in Jenkins.
Think about using Alembic to migrate between database schemas.
This is especially great to have once we are deployed for real in more than one place.
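For illustration, a minimal Alembic migration sketch; the revision ids, table and column are hypothetical:

from alembic import op
import sqlalchemy as sa

revision = "a1b2c3d4e5f6"   # placeholder revision id
down_revision = None

def upgrade():
    op.add_column("mappings",
                  sa.Column("time_completed", sa.DateTime(), nullable=True))

def downgrade():
    op.drop_column("mappings", "time_completed")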
Aha! Link: https://csiro.aha.io/features/ANONLINK-20
From engineering created by hardbyte : n1analytics/engineering#394
Thursday Jan 19, 2017 at 00:39 GMT
Originally opened as https://github.csiro.au/magic/n1-compute/issues/394
A deployment-time configurable for the entity service is the maximum number of comparisons each task should perform. The trade-off is that if it is too large, most jobs won't be executed in parallel, but if it is too small, the network and serialisation overhead dominates the actual work.
I've been running tests with it set to 10M, but as seen in the logs below for a job comparing 1M x 1M, the solving is only taking ~5% of the actual time per task. I think 100M will improve things, but it would be good to consider different chunk sizes for each task depending on the size of the overall job; one possible heuristic is sketched after the logs.
2017-01-19T00:34:09.471822433Z 00:34:09 INFO Received task: async_worker.compute_filter_similarity[9315ed1e-5d5a-4c39-a2d3-a464111883cf]
2017-01-19T00:34:09.955952543Z 00:34:09 INFO Timings: Prep: 94.2852 + 0.0058, Solve: 5.7127, Total: 100.0037
2017-01-19T00:34:09.957798556Z 00:34:09 INFO Progress. Compared 10000000 CLKS
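A sketch of one possible heuristic, with entirely hypothetical bounds and parameter names:

def choose_chunk_size(total_comparisons, num_workers,
                      min_chunk=10_000_000, max_chunk=100_000_000,
                      chunks_per_worker=4):
    # Create enough chunks to keep every worker busy a few times over,
    # clamped between fixed lower and upper bounds.
    target = total_comparisons // (num_workers * chunks_per_worker)
    return max(min_chunk, min(max_chunk, target))

# e.g. a 1M x 1M job (10**12 comparisons) with 20 workers
print(choose_chunk_size(10**12, 20))  # -> 100000000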
There is a bottleneck: the main entity service app container must be able to fit any uploaded hashes entirely in memory before processing them. It shouldn't be too hard to instead allow uploading binary hashes directly to the object store.
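A sketch of streaming an upload straight into MinIO with minio-py, assuming its streaming put_object (length=-1 with a part_size); the endpoint, credentials, bucket and object names are hypothetical, and the bucket is assumed to exist:

import io

from minio import Minio

client = Minio("minio:9000", access_key="minio", secret_key="minio123",
               secure=False)

def store_uploaded_clks(stream, object_name):
    # Stream the raw body into the object store instead of buffering the
    # whole upload in the app container's memory.
    client.put_object("clk-uploads", object_name, stream,
                      length=-1, part_size=10 * 1024 * 1024)

store_uploaded_clks(io.BytesIO(b"\x00" * 1024), "dp1-hashes.bin")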
Something that is untested is broken.
At the moment there are fairly decent unit tests of the anonlink library and a script which does some end-to-end testing of a deployed entity matching service. However, there are no tests of the end-to-end service that actually check the results of the matching!
Testing the flask endpoints isn't that straightforward as we are coupled with celery, redis, postgresql and minio.
http://flask.pocoo.org/docs/0.12/testing/
Celery tasks can also be tested with a bit of mocking - http://docs.celeryproject.org/en/latest/userguide/testing.html
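For example, a pytest sketch against the Flask test client; the create_app factory and the expected status codes are assumptions, not the service's actual API:

import pytest

from entityservice import create_app  # hypothetical app factory

@pytest.fixture
def client():
    app = create_app(testing=True)
    with app.test_client() as client:
        yield client

def test_unknown_mapping_is_not_a_500(client):
    resp = client.get("/api/v1/mappings/not-a-real-mapping/status")
    assert resp.status_code in (403, 404)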
From engineering created by hardbyte : n1analytics/engineering#401
Issue by tho802
Monday Feb 06, 2017 at 05:40 GMT
Originally opened as https://github.csiro.au/magic/n1-compute/issues/401
Found an issue when the entity service was passed a request with an incorrect JSON structure: instead of failing gracefully, the server threw a 500 error. In this case an object was found where a string was expected, and the server said the dict was unhashable.
This ticket is to explore libraries that offer type checking of JSON structures and to implement one.
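For instance, a sketch using the jsonschema library; the (partial) request schema below is hypothetical:

from jsonschema import ValidationError, validate

MAPPING_REQUEST_SCHEMA = {
    "type": "object",
    "required": ["schema", "result_type"],
    "properties": {
        "result_type": {"type": "string"},
        "schema": {"type": "array", "items": {"type": "object"}},
    },
}

def validate_mapping_request(payload):
    # Turn a malformed body into a helpful 400 instead of a 500.
    try:
        validate(payload, MAPPING_REQUEST_SCHEMA)
    except ValidationError as e:
        return False, e.message
    return True, None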
From engineering created by hardbyte : n1analytics/engineering#365
Issue by smi9c4
Wednesday Dec 14, 2016 at 07:37 GMT
Originally opened as https://github.csiro.au/magic/n1-compute/issues/365
There are quite a few tests in the entity service, finishing by receiving the mapping/permutation. However, we are not checking that the result is correct.
This would be very easy on kubernetes, so this depends on automated k8s tests.
It has gotten a bit unwieldy dealing with each view type - it should be refactored into more of a dispatcher.
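A sketch of what that dispatcher could look like; the handler names are stand-ins:

def handle_mapping(run):
    raise NotImplementedError  # per-view-type logic lives here

def handle_permutation(run):
    raise NotImplementedError

def handle_similarity_scores(run):
    raise NotImplementedError

RESULT_TYPE_HANDLERS = {
    "mapping": handle_mapping,
    "permutation": handle_permutation,
    "permutation_unencrypted_mask": handle_permutation,
    "similarity_scores": handle_similarity_scores,
}

def dispatch_result(result_type, run):
    try:
        handler = RESULT_TYPE_HANDLERS[result_type]
    except KeyError:
        raise ValueError("Unknown result type: {}".format(result_type))
    return handler(run)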
Need a script to remove all git history, IDE configurations, build the docs and zip everything up.
Attached a candidate release:
n1-es-v1.4.10.zip
The outputs of the many compute_filter_similarity celery tasks are "reduced" to become the arguments to another task. In a very large matching this will be a fairly large amount of data; it is unclear if there is a maximum other than what can fit in the underlying queue (redis).
I suspect we should save this match data elsewhere and simply return a database id or filename.
Aha! Link: https://csiro.aha.io/features/ANONLINK-9
This would allow us to use a standard PostgreSQL docker container or helm deployment, and hosted database solutions like RDS etc.
I get the following error:
Step 8 : RUN cd AnonymousLinking && pip install -U -r requirements.txt && pip install -e . && cd ..
---> Running in 42dc08862697
Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'
The command '/bin/sh -c cd AnonymousLinking && pip install -U -r requirements.txt && pip install -e . && cd ..' returned a non-zero code: 1
Currently the similarity scores are only stored if the result type is set to "similarity_scores". In the future we may want to store the similarity scores regardless of the result type. However, beware that it may need a lot of storage!
Aha! Link: https://csiro.aha.io/features/ANONLINK-10
Thresholds are configured for the server; they should be part of each match.
Friday Jan 06, 2017 at 06:32 GMT
Originally opened as https://github.csiro.au/magic/n1-compute/issues/383
The entity service should record and report each mapping's state.
For example it currently might return:
{'current': '375000750000',
'elapsed': 3677.268006,
'message': "Mapping isn't ready.",
'progress': 1.0,
'total': '375000750000'}
Looking at the server logs is required to see whether it was busy creating a permutation or encrypting data.
In Feb 2017, Dongxi Liu from Data61 Marsfield proposed a method for doing secure division using Paillier to calculate the Dice coefficient, meaning the entity matching could be worked out without a semi-trusted third party.
Update: calculate E(A)/E(B) by sending E(r·A + e0) and E(r·B + e) for random r, e0, and e, and then computing (r·A + e0) / (r·B + e), which will be an estimate of the Dice coefficient with a bit more noise.
The attached Excel spreadsheet illustrates this updated calculation of dice coefficient.
approximate-dice-coefficient.xlsx
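As a toy illustration of the arithmetic (with no encryption involved), the randomised quotient closely approximates the Dice coefficient A/B, where A = 2 × common 1-bits and B = total 1-bits:

import random

common_ones, total_ones = 60, 200
A, B = 2 * common_ones, total_ones          # Dice coefficient = A / B = 0.6
r = random.randint(10**6, 10**7)            # random blinding factor
e0, e = random.randint(1, 100), random.randint(1, 100)  # small noise terms
estimate = (r * A + e0) / (r * B + e)
print(estimate, A / B)                      # estimate is ~0.6 plus a little noise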
I think the correctness proof is not hard, and once we have an implementation, the correctness and accuracy can also be verified. For security, our goal is to protect each party's bloom filter: that is, neither party can learn more about the number of common 1-bits or the number of 1-bits in the other party's bloom filter than in the ideal model, in which both bloom filters are sent to a trusted third party. Informally, if we assume the homomorphic encryption is secure, then party A (owning the private key) has those values encrypted, so B cannot learn any information about A's bloom filter. For B's bloom filter, the information about 1-bits is randomised with three random numbers, and the way of randomisation means that A cannot recover the number of 1-bits, based on the hardness of the approximate GCD problem. In addition, the bloom filter cannot be too short; otherwise a party could do a dictionary attack to recover the other party's bloom filter based on the similarity ratio.
Consider allowing users to upload "blocks" along with the CLKs.
This is a repeatable issue. I'm trying to compute a permutation with unencrypted mask where DP1 has fewer CLKs than DP2 (but I cannot ensure that the data arrive in any particular order).
From logs:
es_backend_1 | [2017-02-06 23:38:49 +0000] [12] [ERROR] Error handling request
es_backend_1 | Traceback (most recent call last):
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 130, in handle
es_backend_1 | self.handle_request(listener, req, client, addr)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 171, in handle_request
es_backend_1 | respiter = self.wsgi(environ, resp.start_response)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1836, in __call__
es_backend_1 | return self.wsgi_app(environ, start_response)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1820, in wsgi_app
es_backend_1 | response = self.make_response(self.handle_exception(e))
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask_restful/__init__.py", line 271, in error_router
es_backend_1 | return original_handler(e)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1403, in handle_exception
es_backend_1 | reraise(exc_type, exc_value, tb)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask/_compat.py", line 32, in reraise
es_backend_1 | raise value.with_traceback(tb)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1817, in wsgi_app
es_backend_1 | response = self.full_dispatch_request()
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1477, in full_dispatch_request
es_backend_1 | rv = self.handle_user_exception(e)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask_restful/__init__.py", line 271, in error_router
es_backend_1 | return original_handler(e)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1381, in handle_user_exception
es_backend_1 | reraise(exc_type, exc_value, tb)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask/_compat.py", line 32, in reraise
es_backend_1 | raise value.with_traceback(tb)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1475, in full_dispatch_request
es_backend_1 | rv = self.dispatch_request()
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1461, in dispatch_request
es_backend_1 | return self.view_functions[rule.endpoint](**req.view_args)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask_restful/__init__.py", line 477, in wrapper
es_backend_1 | resp = resource(*args, **kwargs)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask/views.py", line 84, in view
es_backend_1 | return self.dispatch_request(*args, **kwargs)
es_backend_1 | File "/usr/local/lib/python3.5/site-packages/flask_restful/__init__.py", line 587, in dispatch_request
es_backend_1 | resp = meth(*args, **kwargs)
es_backend_1 | File "/var/www/entityservice.py", line 293, in get
es_backend_1 | "progress": (comparisons/total_comparisons) if total_comparisons is not 'NA' else 0.0
es_backend_1 | ZeroDivisionError: division by zero
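A sketch of a guard for the progress calculation, reusing the names from the traceback. Note that the original `is not 'NA'` is an identity comparison against a string literal, which is itself a bug; an equality check plus a zero guard avoids both problems:

def mapping_progress(comparisons, total_comparisons):
    # Avoid dividing before any comparisons have been recorded, and use !=
    # rather than `is not` to compare against the 'NA' sentinel.
    if total_comparisons == 'NA' or not total_comparisons:
        return 0.0
    return comparisons / total_comparisons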
Need to check if the entity service can actually carry out record linkage with large test sets before deploying to production.
Aha! Link: https://csiro.aha.io/features/ANONLINK-15