microsoft / tools-for-health-data-anonymization

Set of tools for helping with data (in FHIR format) anonymization.

License: MIT License

PowerShell 2.80% C# 97.20%

tools-for-health-data-anonymization's Introduction

Tools for Health Data Anonymization

Privacy Notice and Consent

This project provides you the scripts and command line tools for your own use. It does NOT and cannot access, use, collect, or manage any of your data, including any personal or health-related data. You must bring your own data, and be 100% responsible for using our tools to work with your own data.

Tools for Health Data Anonymization is an open-source project that helps anonymize healthcare data, on-premises or in the cloud, for secondary usage such as research, public health, and more. The project first released the anonymization of FHIR data to open source on Friday, March 6th, 2020. Currently, it supports both FHIR data anonymization and DICOM data anonymization.

For information on FHIR data anonymization, please check out the FHIR anonymization documentation.
For information on DICOM data anonymization, please check out the DICOM anonymization documentation.

The anonymization core engine uses a configuration file specifying different parameters as well as anonymization methods for different data-elements and datatypes. The repo contains a sample configuration file, which is based on the HIPAA Safe Harbor method. You can modify or create your own configuration file as needed.

This open source project is fully backed by the Microsoft Healthcare team, but we know that this project will only get better with your feedback and contributions. We are leading the development of this code base, and test builds and deployments daily.

FHIR® is the registered trademark of HL7 and is used with the permission of HL7. Use of the FHIR trademark does not constitute endorsement of this product by HL7.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.


tools-for-health-data-anonymization's People

Contributors

Stargazers

Watchers

Forkers

rrios042 tongwu-sh feitianyiren dj-singh meddevintgrp sowu880 taffywrinkle claudiusgonzo ingramali quanwanxx icepear-jzx vivekkranjan hkamel xelement2020 brendankowitz global-localhost global19 global19-atlassian-net hongyh13 mchorfa itye-msft curlybytes equideum roel4ez brian-careevolution patradebkumar suthep-cto qpc-database rsliang buildabonfire iboonz chgl marmikreal snkeos vthambidurai-r1 carlossardo macmullensantiago 5l1v3r1 ellerbach alancam smseerat flypig0425 lushpush thegotoguy christopher-watanabe-snkeos dattachandan zoltanvilaghy duke-crucible brianpos brianm-zus mohrmz treaston2 aced-idp patientzer03131 omri374 mikix ecgkit mohitasrivas fredatgithub venkyvb karthigeyang prathameshkale alexathomases rsanthanagopalan td-peter hyelee-harv sathya-reddy-m nkaluva9 chriss-0x01 pharmaccess obior

tools-for-health-data-anonymization's Issues

User-facing API to add custom IAnonymizerProcessor

It would be great if it were possible to add custom IAnonymizerProcessors as a user of the library.

My proposal is to add an AnonymizerEngine.AddProcessor(string methodKey, IAnonymizerProcessor processor) method, where methodKey is a user-defined configuration key.

This might require refactoring the way security labels are applied: the IAnonymizerProcessor would need to specify the label to apply, since there would no longer be a fixed lookup between the static AnonymizationOperations and the security labels.
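A minimal Python sketch of the proposed registration pattern, assuming processors are plain objects that carry their own security label. Processor and Engine are hypothetical stand-ins for IAnonymizerProcessor and AnonymizerEngine, not the library's C# API; the point is only that the label travels with the processor instead of coming from a fixed lookup table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class Processor:
    """Hypothetical stand-in for IAnonymizerProcessor."""
    transform: Callable[[str], Optional[str]]  # returns the new value, or None to drop it
    security_label: Optional[str]              # label the processor asks the engine to apply


class Engine:
    """Toy engine: processors are looked up by a user-defined method key."""

    def __init__(self) -> None:
        self._processors: Dict[str, Processor] = {}

    def add_processor(self, method_key: str, processor: Processor) -> None:
        # Keys are compared case-insensitively here (a choice made for this sketch).
        key = method_key.lower()
        if key in self._processors:
            raise ValueError(f"method '{method_key}' is already registered")
        self._processors[key] = processor

    def apply(self, method_key: str, value: str, labels: Set[str]) -> Optional[str]:
        processor = self._processors.get(method_key.lower())
        if processor is None:
            raise KeyError(f"Anonymization method {method_key} not supported.")
        # The label comes from the processor itself, not from a static table.
        if processor.security_label:
            labels.add(processor.security_label)
        return processor.transform(value)


engine = Engine()
engine.add_processor("redactAll", Processor(lambda value: None, "REDACTED"))
engine.add_processor("upper", Processor(str.upper, None))
labels: Set[str] = set()
print(engine.apply("upper", "abc", labels), labels)  # ABC set()
```

Rejecting duplicate keys keeps a user-registered method from silently shadowing an existing one.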

Cannot instantiate `AnonymizerEngine`

Firstly, this is a brilliant idea for a project. Thank you!

I'm using the AnonymizerEngine class from the NuGet package at:

<PackageReference Include="Microsoft.Health.Fhir.Anonymizer.R4.Core" Version="3.0.0.33" />

with the following configuration file:

https://raw.githubusercontent.com/microsoft/Tools-for-Health-Data-Anonymization/master/FHIR/src/Microsoft.Health.Fhir.Anonymizer.R4.CommandLineTool/configuration-sample.json

Upon instantiating the class, the following Exception is thrown:

Microsoft.Health.Fhir.Anonymizer.Core.Exceptions.AnonymizerConfigurationException
  HResult=0x80131500
  Message=Invalid FHIR path nodesByType('Extension')
  Source=Microsoft.Health.Fhir.Anonymizer.R4.Core
  StackTrace:
   at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurations.AnonymizerConfigurationValidator.Validate(AnonymizerConfiguration config)
   at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager..ctor(AnonymizerConfiguration configuration)
   at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager.CreateFromSettingsInJson(String settingsInJson)
   at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager.CreateFromConfigurationFile(String configFilePath)
   at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerEngine..ctor(String configFilePath)
   at [REFERENCES TO MY CALLING CODE]

  This exception was originally thrown at this call stack:
    [External Code]

Inner Exception 1:
ArgumentException: Unknown symbol 'nodesByType'

Any idea what I might be doing wrong?

Mark pseudonymized identifiers more visibly

I noticed that this library only replaces Identifier.value. While some information indicating that this might not be a "real" value is present in the security attribute of the resource, I do not think that is enough. While I agree with #52, I also do not think that simply adding an extension is enough either.
Both the security entry and the extension can justifiably be ignored by receiving applications. Those applications would then go on to interpret the value as a "real" value. This might quickly become a problem, e.g. when trying to find corresponding objects in another data source that is not pseudonymized (or is pseudonymized with a different system and therefore has different pseudonyms).
I believe one solution to this problem would be to create and add a modifier extension (https://www.hl7.org/fhir/extensibility.html#modifierExtension) to the Identifier. Modifier extensions tell the receiving application that this Identifier might not be what it expects, and even if the system does not recognize the extension, it now knows that "something is different about this Identifier".
Another solution (which I personally would prefer, since modifier extensions tend to cause problems) would be to also change the system. The original system could be added in an extension. This would avoid confusion and also some issues, e.g. with constraints that test the format of an identifier (such as a profile which specifies an error if identifier.value does not start with two letters followed by a colon). It might also be useful to add a coding to identifier.type which expresses that this is now a pseudonym generated from an "original" identifier.
Since I did not find information on how to handle pseudonyms in the FHIR spec, this might also be a topic to discuss on the FHIR chat in order to get community feedback.
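A minimal sketch of the second proposal (change the system, keep the original system in an extension), working on an Identifier as a plain JSON dict. Both URIs are hypothetical placeholders, not defined by FHIR or by this tool; the keyed hash is HMAC-SHA256 from Python's standard library, chosen only for illustration. The identifier.type coding mentioned above is left out to keep the example short.

```python
import hashlib
import hmac

# Hypothetical URIs: neither is defined by FHIR or by this tool.
PSEUDONYM_SYSTEM = "urn:example:pseudonym"
ORIGINAL_SYSTEM_EXT = "urn:example:original-identifier-system"


def pseudonymize_identifier(identifier: dict, key: bytes) -> dict:
    """Return a new Identifier whose value is a keyed hash, whose system marks it
    as a pseudonym, and which keeps the original system in an extension."""
    original_system = identifier.get("system", "")
    message = f"{original_system}|{identifier['value']}".encode("utf-8")
    result = {k: v for k, v in identifier.items() if k not in ("system", "value", "extension")}
    result["system"] = PSEUDONYM_SYSTEM
    result["value"] = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Copy any existing extensions so the input dict is not mutated.
    extensions = list(identifier.get("extension", []))
    if original_system:
        extensions.append({"url": ORIGINAL_SYSTEM_EXT, "valueUri": original_system})
    if extensions:
        result["extension"] = extensions
    return result
```

Because the pseudonym lives under a different system, a receiving application that ignores the extension still cannot mistake it for a value in the original namespace.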

.NET 6.0

Since support for .NET Core 3.1 ends in December 2022, are there any plans to migrate to .NET 6.0?

Consider labeling output data with relevant security labels

For background on how to apply these, see: https://www.hl7.org/fhir/security-labels.html#rsl

https://www.hl7.org/fhir/v3/ObservationValue/cs.html#v3-ObservationValue-ANONYED is a relevant tag that would make sense to assign to Resource.meta.security.

This would allow us to indicate that the output data have been subject to anonymization. (Another relevant tag might be https://www.hl7.org/fhir/v3/ObservationValue/cs.html#v3-ObservationValue-REDACTED; all this might be good to control with configuration.)
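A minimal sketch of adding such a tag to Resource.meta.security on a resource held as a JSON dict. The system URI is the one used elsewhere on this page; the display strings are my reading of the published v3-ObservationValue code system and should be checked against the terminology release you validate with. The function name is hypothetical.

```python
V3_OBSERVATION_VALUE = "http://terminology.hl7.org/CodeSystem/v3-ObservationValue"

# Display strings as I understand them from v3-ObservationValue; verify before use.
DISPLAYS = {"ANONYED": "anonymized", "MASKED": "masked", "REDACTED": "redacted"}


def add_security_label(resource: dict, code: str) -> dict:
    """Add a v3-ObservationValue coding to Resource.meta.security unless already present."""
    security = resource.setdefault("meta", {}).setdefault("security", [])
    already = any(c.get("system") == V3_OBSERVATION_VALUE and c.get("code") == code
                  for c in security)
    if not already:
        security.append({"system": V3_OBSERVATION_VALUE, "code": code,
                         "display": DISPLAYS[code]})
    return resource
```

Checking for an existing coding first keeps repeated runs from stacking duplicate labels; which codes to emit could then be driven by configuration, as suggested above.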

Microsoft.Health.Fhir.Anonymizer.R4.AzureDataFactoryPipeline is broken at head

Configure FHIR/src/Microsoft.Health.Fhir.Anonymizer.R4.AzureDataFactoryPipeline/AzureDataFactorySettings.json and FHIR/src/Microsoft.Health.Fhir.Anonymizer.Shared.AzureDataFactoryPipeline/scripts/ArmTemplate/arm_template_parameters.json as per the instructions and then try, e.g.:

PS /home/you> Connect-AzAccount
PS /home/you> Get-AzSubscription
PS /home/you> Select-AzSubscription -SubscriptionId "YOURS"
PS /home/you/code/FHIR-Tools-for-Anonymization/FHIR/src> $SubscriptionId = "YOURS"
PS /home/you/code/FHIR-Tools-for-Anonymization/FHIR/src> $BatchAccountName = "YOURSdeidbatchacct"
PS /home/you/code/FHIR-Tools-for-Anonymization/FHIR/src> $BatchAccountPoolName = "YOURSDeidentifierPool"
PS /home/you/code/FHIR-Tools-for-Anonymization/FHIR/src> $BatchComputeNodeSize = "Standard_D2_v3"
PS /home/you/code/FHIR-Tools-for-Anonymization/FHIR/src> $ResourceGroupName = "YOURS-fhir-deidentifier"
PS /home/you/code/FHIR-Tools-for-Anonymization/FHIR/src/Microsoft.Health.Fhir.Anonymizer.R4.AzureDataFactoryPipeline> .\DeployAzureDataFactoryPipeline.ps1 -SubscriptionId $SubscriptionId -BatchAccountName $BatchAccountName -BatchAccountPoolName $BatchAccountPoolName -BatchComputeNodeSize $BatchComputeNodeSize -ResourceGroupName $ResourceGroupName

The pipeline will fail with an unhelpful error message:

<Error>
<Code>ResourceNotFound</Code>
<Message>The specified resource does not exist. RequestId:REDACTED Time:REDACTED</Message>
</Error>

We have a fork that does still work at 57542d1, so it was broken at some point after that.

Provide Nuget Package for the Core library

Thank you for this tool!

Have you considered publishing the Stu3/R4.Core libraries on NuGet?

configuration.json in CommandLineTool doesn't parse because of {"tag":"(50xx,xxxx)", "method":"remove"}

Number.Parsing throws an exception: System.FormatException: 'The input string '50xx' was not in a correct format.' when trying to parse the configuration.json file in the project. The offending line is:

    {"tag":"(50xx,xxxx)", "method":"remove"}, 
    {"tag":"(60xx,4000)", "method":"remove"}, 
    {"tag":"(60xx,3000)", "method":"remove"},

The (50xx,xxxx) etc. values are obviously not valid.

Also, the following lines do not parse:

{"tag":"DA", "method":"dateshift"},
{"tag":"DT", "method":"dateshift"}

Please remove or update this line. Location of the file:
DICOM/src/Microsoft.Health.Dicom.Anonymizer.CommandLineTool/configuration.json
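For context: in DICOM's de-identification profile (PS3.15 Annex E), an "x" in a tag such as (60xx,3000) stands for a wildcard hex digit in a repeating group, so a parser that reads "50xx" as a plain hex number will fail as reported. A minimal sketch of how such a mask could be matched, under that assumption; the function names are hypothetical. DA and DT look like value representations (VRs) rather than tags, so this sketch rejects them; they would need a separate VR lookup.

```python
import re
from typing import Tuple

# Assumption: 'x' stands for any single hex digit, as in DICOM repeating-group notation.
_TAG = re.compile(r"^\(([0-9A-Fa-fxX]{4}),([0-9A-Fa-fxX]{4})\)$")


def parse_tag_mask(mask: str) -> Tuple[int, int]:
    """Return (value, care_bits): care_bits has 0xF for every fixed hex digit."""
    m = _TAG.match(mask.strip())
    if m is None:
        raise ValueError(f"not a tag or tag mask: {mask!r}")
    value = care = 0
    for ch in m.group(1) + m.group(2):
        value <<= 4
        care <<= 4
        if ch not in "xX":
            value |= int(ch, 16)
            care |= 0xF
    return value, care


def tag_matches(mask: str, group: int, element: int) -> bool:
    value, care = parse_tag_mask(mask)
    tag = (group << 16) | element
    return (tag & care) == value
```

For example, tag_matches("(60xx,4000)", 0x6002, 0x4000) is True, while the same mask does not match (6002,3000).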

Consider allowing comments in configuration files

When writing a configuration file, it is extremely useful to comment why a line was added (or often more importantly, why a line is not present).

JSON doesn't support comments. So what I've done for my config files is to add a comment field like so:

{"path": "Immunization.lotNumber", "method": "keep", "comment": "this is useful for X reason"},

Or more awkwardly, when I want to explain why a line isn't there or want to demark a section of rules:

{"comment": "** https://www.hl7.org/fhir/R4/immunization.html **", "path": "xxx", "method": "redact"},
{"comment": "Immunization.lotNumber isn't useful for Y reason", "path": "xxx", "method": "redact"},

But life would be a lot easier if I could just use comments - and would be more performant because the tool doesn't need to apply these fake rules.

YAML Option
JSON is (mostly) a subset of YAML - i.e. a YAML parser can read JSON. So by switching your config parser to a YAML parser, you'd lose no functionality and folks that wanted to use YAML (which supports comments) could.

JSON Option
Use a JSON parser that extends the format to allow for comments, or preprocess the config file to remove comments or some such workaround.
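A minimal sketch of the preprocessing workaround: strip // and /* */ comments that sit outside JSON strings, then hand the result to a standard JSON parser. The function name is hypothetical and trailing commas are not handled. On the .NET side, System.Text.Json can skip comments natively via JsonSerializerOptions.ReadCommentHandling = JsonCommentHandling.Skip, though whether that fits this tool's configuration loader is not stated in the issue.

```python
import json


def strip_json_comments(text: str) -> str:
    """Remove // line comments and /* */ block comments outside JSON strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:   # keep escaped characters, e.g. \"
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end    # keep the newline itself
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


config_text = """{
  "fhirPathRules": [
    // ** https://www.hl7.org/fhir/R4/immunization.html **
    {"path": "Immunization.lotNumber", "method": "keep"} /* useful for X reason */
  ]
}"""
config = json.loads(strip_json_comments(config_text))
```

Tracking string state matters because URLs inside JSON values also contain "//".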

Display values in security coding do not fit those in the CodeSystem

In the security attribute, where the modifications to the resource are coded e.g.
{ "system": "http://terminology.hl7.org/CodeSystem/v3-ObservationValue", "code": "REDACTED", "display": "part of the resource is removed" },
the display values are wrong. Display values are defined as "Representation defined by the system" (https://hl7.org/implement/standards/fhir/2015Jan/datatypes-definitions.html#Coding.display), but the display values defined here https://terminology.hl7.org/1.0.0/CodeSystem-v3-ObservationValue.html do not match those in the resources. (This should be caught by terminology-aware validators.)

Allow additional option to `processingErrors` to allow skipping of `InvalidInputException`

Setting processingErrors: "skip" skips only AnonymizerProcessingException and not InvalidInputException. While this is documented behavior, there are situations where it would be desirable to skip files for which deidentification fails due to InvalidInputException:

An upstream system emits malformed FHIR, or a corrupted file.
An upstream system emits FHIR of an unexpected version.
Additional, non-FHIR files are accidentally placed in the directory with FHIR.

Because our deidentification pipeline may take 24 hours or longer, we prefer to skip invalid input rather than fail on a single malformed file.

I propose adding an additional option, processingErrors: "invalid" that skips both AnonymizerProcessingException and InvalidInputException.

Edit: I see a comment indicating that there may originally have been a desire to add this feature.
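A minimal sketch of how the proposed option could be layered on a per-file loop. The two exception classes are stand-ins named after the ones in the issue; "skip" behaves as the issue describes, "raise" is assumed here to mean fail fast, and "invalid" is the proposed setting. The loop and its signature are hypothetical, not the tool's code.

```python
from typing import Callable, Iterable, List, Tuple


class InvalidInputException(Exception):
    """Stand-in for the tool's exception for malformed or wrong-version input."""


class AnonymizerProcessingException(Exception):
    """Stand-in for the tool's exception raised while applying a rule."""


# Exception types each setting is allowed to swallow.
SKIPPABLE = {
    "raise": (),
    "skip": (AnonymizerProcessingException,),
    "invalid": (AnonymizerProcessingException, InvalidInputException),
}


def anonymize_all(files: Iterable[str], anonymize_file: Callable[[str], None],
                  processing_errors: str = "raise") -> Tuple[List[str], List[str]]:
    """Run anonymize_file over every file; return (processed, skipped) names."""
    skippable = SKIPPABLE[processing_errors]
    processed, skipped = [], []
    for name in files:
        try:
            anonymize_file(name)
        except skippable:   # an empty tuple catches nothing
            skipped.append(name)
            continue
        processed.append(name)
    return processed, skipped
```

Returning the skipped names lets a long-running pipeline finish and still report which files need attention.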

Add extensions to the deidentified data to indicate which operations were performed during anonymization (redacted, shifted, etc.).

Per my comment at DevDays 2020, I think that having extensions on FHIR elements that were modified during the anonymization process could be useful.

Include ability to spread deidentification over multiple nodes in a batch pool

Currently, the ADF pipeline is only able to execute on a single node, though that node can be of arbitrary size.

It takes us more than 48 hours to deidentify our entire set of FHIR resources. Since the task of deidentification is embarrassingly parallel, we would like the ability to spread our processing out over more than a single node so that we can deidentify our resources more quickly.
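Because each file can be deidentified independently, one simple way to split work is to assign every input file to a node by a stable hash of its name. A minimal sketch under that assumption; the function names are hypothetical, and how buckets would map onto the ADF pipeline or Batch pool tasks is not specified in the issue.

```python
import zlib
from typing import List, Sequence


def assign_node(file_name: str, node_count: int) -> int:
    """Map a file to a node index with a stable hash (Python's hash() is salted per process)."""
    return zlib.crc32(file_name.encode("utf-8")) % node_count


def partition(files: Sequence[str], node_count: int) -> List[List[str]]:
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    buckets: List[List[str]] = [[] for _ in range(node_count)]
    for name in files:
        buckets[assign_node(name, node_count)].append(name)
    return buckets
```

Hashing balances file counts, not file sizes; if a few files dominate the total volume, sorting by size and assigning greedily to the least-loaded node would balance better.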

Complete fhirPathRules example

Thanks for a great product.
Is there a more complete example of redacting FHIR bundles?
For example:

shift all date fields and ages
redact all identifiers

Consider using the "masked" code for encrypted resource parts

Currently, when a value is encrypted, the custom encrypted flag is set as a security label: https://github.com/microsoft/FHIR-Tools-for-Anonymization/blob/6a9b8614c319afb5f85959c02f86b2304ec4618c/src/Microsoft.Health.Fhir.Anonymizer.Shared.Core/Models/SecurityLabels.cs#L28-L32. I think using the masked code here would be more appropriate, see https://terminology.hl7.org/2.0.0/CodeSystem-v3-ObservationValue.html:

Usage Note: "MASKED" may be used, per applicable policy, as a flag to indicate to a user or receiver that some portion of an IT resource has been further encrypted, and may be accessed only by an authorized user or receiver to which a decryption key is provided.

Which seems fitting.

Running the tool with an unsupported anonymization method results in an error that includes the raw content of the input FHIR JSON files

Description:

When running the tool with an unsupported anonymization method, we see the AnonymizerConfigurationException stack trace displayed in the console log. However, the log also includes the raw content of the input FHIR JSON files.

This could happen by mistake, although it is very unlikely to go unnoticed. Nevertheless, if it happens, it might be seen as undesirable behaviour when running the tool on a server, as these logs might leak (for example, if collected inadvertently) while containing sensitive information from the FHIR JSON files.

Desired outcome

Do not include the raw content of the input FHIR JSON files in the console log.

Reproduction steps:

Edit the tool's configuration-sample.json file and purposefully set an unsupported anonymization method:
{"path": "Resource.id", "method": "FAKEcryptoHash"}.
Run the tool on Windows as follows:
Microsoft.Health.Fhir.Anonymizer.R4.CommandLineTool.exe -i "C:\Tool\FHIR\samples\fhir-r4-files" -o "C:\Tool\FHIR\samples\fhir-r4-files\output" -c "C:\Tool\configuration-sample.json"
You'll see the following output in the console:

[C:\Tool\FHIR\samples\fhir-r4-files\practitionerInformation1583281853074.json] Error:
Resource: {
"resourceType": "Bundle",
"type": "transaction",
"entry": [
{
"fullUrl": "urn:uuid:00000170-a2f3-ce01-0000-00000000010e",
"resource": {
"resourceType": "Practitioner",
"id": "00000170-a2f3-ce01-0000-00000000010e",
"meta": {
"profile": [
"http://hl7.org/fhir/us/core/StructureDefinition/us-core-practitioner"
]
},
"extension": [
{
"url": "http://synthetichealth.github.io/synthea/utilization-encounters-extension",
"valueInteger": 259
}
],
"identifier": [
{
"system": "http://hl7.org/fhir/sid/us-npi",
"value": "270"
}
],
"active": true,
"name": [
{
"family": "Roberts511",
"given": [
"Chi716"
],
"prefix": [
"Dr."
]
}
],
"telecom": [
{
"extension": [
{
"url": "http://hl7.org/fhir/us/core/StructureDefinition/us-core-direct",
"valueBoolean": true
}
],
"system": "email",
"value": "[email protected]",
"use": "work"
}
],
"address": [
{
"line": [
"1035-116TH AVE NE"
],
"city": "BELLEVUE",
"state": "WA",
"postalCode": "98004",
"country": "US"
}
],
"gender": "male"
},
"request": {
"method": "POST",
"url": "Practitioner"
}
}
]
}

ErrorMessage: Microsoft.Health.Fhir.Anonymizer.Core.Exceptions.AnonymizerConfigurationException: Anonymization method FAKEcryptoHash not supported.
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurations.AnonymizerConfigurationValidator.Validate(AnonymizerConfiguration config) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurations\AnonymizerConfigurationValidator.cs:line 19
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager..ctor(AnonymizerConfiguration configuration) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 22
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager.CreateFromSettingsInJson(String settingsInJson) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 39
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager.CreateFromConfigurationFile(String configFilePath) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 51
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerEngine.CreateWithFileContext(String configFilePath, String fileName, String inputFolderName) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerEngine.cs:line 53
at Microsoft.Health.Fhir.Anonymizer.Tool.FilesAnonymizerForJsonFormatResource.FileAnonymize(String fileName) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.CommandLineTool\FilesAnonymizerForJsonFormatResource.cs:line 96
Error:
Resource: C:\Projects\T\FHIR\samples\fhir-r4-files\practitionerInformation1583281853074.json
ErrorMessage: Microsoft.Health.Fhir.Anonymizer.Core.Exceptions.AnonymizerConfigurationException: Anonymization method FAKEcryptoHash not supported.
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurations.AnonymizerConfigurationValidator.Validate(AnonymizerConfiguration config) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurations\AnonymizerConfigurationValidator.cs:line 19
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager..ctor(AnonymizerConfiguration configuration) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 22
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager.CreateFromSettingsInJson(String settingsInJson) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 39
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager.CreateFromConfigurationFile(String configFilePath) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 51
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerEngine.CreateWithFileContext(String configFilePath, String fileName, String inputFolderName) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerEngine.cs:line 53
at Microsoft.Health.Fhir.Anonymizer.Tool.FilesAnonymizerForJsonFormatResource.FileAnonymize(String fileName) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.CommandLineTool\FilesAnonymizerForJsonFormatResource.cs:line 96
at Microsoft.Health.Fhir.Anonymizer.Tool.FilesAnonymizerForJsonFormatResource.<>c__DisplayClass5_0.<b__0>d.MoveNext() in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.CommandLineTool\FilesAnonymizerForJsonFormatResource.cs:line 52
Unhandled exception. Microsoft.Health.Fhir.Anonymizer.Core.Exceptions.AnonymizerConfigurationException: Anonymization method FAKEcryptoHash not supported.
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurations.AnonymizerConfigurationValidator.Validate(AnonymizerConfiguration config) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurations\AnonymizerConfigurationValidator.cs:line 19
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager..ctor(AnonymizerConfiguration configuration) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 22
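The trace shows configuration validation running inside AnonymizerEngine.CreateWithFileContext, i.e. once per file. A minimal sketch of the desired behaviour: validate the rules once before any file is read, and log only the file name and error for per-file failures. The method list is an assumption based on method names mentioned on this page plus substitute, and the function names are hypothetical.

```python
from typing import Callable, Iterable, List

# Assumed list of method names, lower-cased; check it against the tool's documentation.
SUPPORTED_METHODS = {"keep", "redact", "dateshift", "cryptohash", "encrypt",
                     "substitute", "perturb", "generalize"}


class ConfigurationError(Exception):
    """Stand-in for AnonymizerConfigurationException."""


def validate_rules(rules: Iterable[dict]) -> None:
    for rule in rules:
        if rule["method"].lower() not in SUPPORTED_METHODS:
            raise ConfigurationError(f"Anonymization method {rule['method']} not supported.")


def run(rules: List[dict], files: Iterable[str],
        anonymize_file: Callable[[str], None], log: Callable[[str], None]) -> None:
    # Validate once, before any file content is read, so a bad rule cannot
    # drag resource JSON into an error message.
    validate_rules(rules)
    for name in files:
        try:
            anonymize_file(name)
        except Exception as exc:
            # Log the file name and the error only, never the resource body.
            log(f"[{name}] Error: {type(exc).__name__}: {exc}")
```

Failing on configuration before the file loop also means a bad method name stops the run immediately instead of once per input file.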

Failing unit tests due to international decimal parsing

Description

I've pulled release 3.0.0 to my local machine in Germany and noticed that I had failing unit tests targeting the PerturbProcessor. Specifically, I was receiving the following error pertaining to the Decimal.Parsing of test values:

System.FormatException : Input string was not in a correct format.

This occurs, for example, in line 138 of the PerturbProcessorTests.cs

new Quantity { Value = decimal.Parse("25,162.1378") },

To address this problem locally, I've passed CultureInfo.InvariantCulture to the Parse function so that it does not use the local culture for parsing, since in Germany ',' is used as the decimal separator and '.' as the thousands separator. Under German culture rules, the value "25,162.1378" would place a thousands separator after the decimal separator, thus triggering a System.FormatException. Here is my workaround:

new Quantity { Value = decimal.Parse("25,162.1378", NumberStyles.Any, CultureInfo.InvariantCulture) },

This problem naturally affected the unit tests that check results produced by the PerturbProcessor. One example is the range assertion in AnonymizationVisitorTests.GivenAPerturbRule_WhenProcess_NodeShouldBePerturbed(), whose test data uses a proportional span of 0.2. In countries that use '.' as the thousands separator, this value is parsed as 2, which gives a much wider perturbation range than 0.2, so the result is very likely to fall outside the range required for the test to pass.

Desired outcome

Make the parsing of perturb settings culture-invariant to avoid FormatExceptions and unexpected values in the PerturbProcessor.

Reproduction steps
Run Fhir.Anonymizer.sln tests on a machine with regional settings set to a region where the ',' is used as a decimal separator and the '.' is used as a thousands separator, Germany for example.
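The failure mode can be reproduced with a deliberately simplified model of culture-aware parsing, in which group separators are allowed only in the integer part and are dropped. This is a Python illustration of the behaviour described above, not .NET's actual parsing rules (which are more detailed); the function and the two separator tables are hypothetical.

```python
from decimal import Decimal


def parse_decimal(text: str, group_sep: str, decimal_sep: str) -> Decimal:
    """Simplified model: group separators may appear only before the decimal separator."""
    integer_part, sep, fraction = text.partition(decimal_sep)
    if group_sep in fraction or decimal_sep in fraction:
        raise ValueError(f"Input string '{text}' was not in a correct format.")
    digits = integer_part.replace(group_sep, "")
    return Decimal(digits + ("." + fraction if sep else ""))


INVARIANT = {"group_sep": ",", "decimal_sep": "."}
GERMAN = {"group_sep": ".", "decimal_sep": ","}
```

Under GERMAN, "25,162.1378" raises because a group separator follows the decimal separator, and "0.2" quietly becomes 2, matching both symptoms in the issue; under INVARIANT both parse as intended, which is what passing CultureInfo.InvariantCulture achieves in the C# workaround.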

Update version of firely SDK for R4 standard

Hi all,

I've just noticed that the version used for the FHIR R4 standard is relatively old, version 3.4.0 released nearly a year ago. I'm wondering if it would be beneficial to look into updating this version? Since release, quite a few issues have been closed, so it might be beneficial to update this.

An initial check I've run is to update the version to the most recent v.4.0.0 and running the R4.Core.UnitTests and R4.Core.FunctionalTests tests. The following tests fail:

Several AttributeValidatorTests in the R4.Core.UnitTests fail due to updated firely error messages.
GivenAStu3OnlyResource_WhenAnonymizing_ExceptionShouldBeThrown(testFile: "Stu3OnlyResource/ProcessResponse.json", ResourceName: "ProcessResponse") fails due to a failure to throw an exception. I assume the Firely SDK now handles the previous situation that caused an exception.
GivenAStu3OnlyResource_WhenAnonymizing_ExceptionShouldBeThrown(testFile: "Stu3OnlyResource/ProcessRequest.json", ResourceName: "ProcessRequest") fails due to a failure to throw an exception. I assume the Firely SDK now handles the previous situation that caused an exception.

The test failure due to changed error messages is a relatively quick fix. The GivenAStu3OnlyResource_WhenAnonymizing_ExceptionShouldBeThrown failures would have to be researched to understand how Stu3-only resources are handled in the updated Firely SDK.

What do you think? If this is a benefit, I could create a PR for it. Seems like relatively little work to get the latest/greatest.

Question: Generalize Rule

Hi,

I don't understand how to use the last generalize example.

https://github.com/microsoft/Tools-for-Health-Data-Anonymization/blob/master/docs/FHIR-anonymization.md#generalize

How is
"$this.replaceMatches('(?<year>\\d{2,4})-(?<month>\\d{1,2})-(?<day>\\d{1,2})\\b', '${year}-${month}'"

a key-value pair for the cases in the generalization method? It's just a method call with two parameters.
Other examples make more sense:

"$this >= @2010-1-1": "@2010"

Here the key is clearly the condition and the value is the target.
In the replaceMatches example, the condition-target key-value pair is not understandable, hence I don't know how to write the JSON file appropriately.
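Independent of how the cases object pairs it with a condition, the expression itself can be read on its own: the regular expression captures year, month and day, and the replacement keeps only year and month. A Python translation of that pattern, for illustration only: named groups (?<name>...) become (?P<name>...), ${name} becomes \g<name>, and the doubled backslashes in the JSON string become single backslashes once the JSON is decoded.

```python
import re

# The documentation's pattern, translated to Python regex syntax.
DATE = re.compile(r"(?P<year>\d{2,4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})\b")


def truncate_to_month(value: str) -> str:
    """Keep year and month, drop the day (what '${year}-${month}' expresses)."""
    return DATE.sub(r"\g<year>-\g<month>", value)
```

So "2015-04-28" becomes "2015-04", and a value without a matching date is returned unchanged.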

Extend generalize method

Hi all,

We have a business requirement for which we would like to extend the generalize transformation to accept not only 'Keep' and 'Redact' but all the other methods as well. I have made the code changes locally and would like to submit a pull request for your review; however, it seems I do not have permission to create a new remote branch in your repository.

Looking forward to hearing from you.

Thanks,
Zoli

Necessity of field "params" in config.

We can discuss whether to remove this feature later.

Originally posted by @sowu880 in #107 (comment)

Filename too long

error: unable to create file FHIR/src/Microsoft.Health.Fhir.Anonymizer.R4.AzureDataFactoryPipeline.UnitTests/Microsoft.Health.Fhir.Anonymizer.R4.AzureDataFactoryPipeline.UnitTests.csproj: Filename too long
error: unable to create file FHIR/src/Microsoft.Health.Fhir.Anonymizer.Shared.AzureDataFactoryPipeline.UnitTests/Microsoft.Health.Fhir.Anonymizer.Shared.AzureDataFactoryPipeline.UnitTests.projitems: Filename too long
error: unable to create file FHIR/src/Microsoft.Health.Fhir.Anonymizer.Shared.AzureDataFactoryPipeline.UnitTests/Microsoft.Health.Fhir.Anonymizer.Shared.AzureDataFactoryPipeline.UnitTests.shproj: Filename too long
error: unable to create file FHIR/src/Microsoft.Health.Fhir.Anonymizer.Stu3.AzureDataFactoryPipeline.UnitTests/Microsoft.Health.Fhir.Anonymizer.Stu3.AzureDataFactoryPipeline.UnitTests.csproj: Filename too long
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.

Running the tool with a fake/wrong path in the config.json does not result in an error

Description:

Running the tool with a fake or wrong path in config.json does not result in an error.

This could happen by mistake, although it is very unlikely to go unnoticed. Nevertheless, if it happens, it might be seen as undesirable behaviour because users might assume that their data is being anonymized when it is not.

Desired outcome

Throw an AnonymizerConfigurationException error informing that the path is not valid.

Reproduction steps:

Edit the tool's configuration-sample.json file and purposefully set an invalid path:
{"path": "Resource.id-MISTAKE", "method": "cryptoHash"}.
Run the tool on Windows as follows:
Microsoft.Health.Fhir.Anonymizer.R4.CommandLineTool.exe -i "C:\Tool\FHIR\samples\fhir-r4-files" -o "C:\Tool\FHIR\samples\fhir-r4-files\output" -c "C:\Tool\configuration-sample.json"

You'll see "Finished processing 'C:\Tool\FHIR\samples\fhir-r4-files'!" and no error is thrown, so the user might not notice that Resource.id was not cryptoHashed.
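One likely reason no error appears is that Resource.id-MISTAKE may be syntactically valid FHIRPath (a subtraction of MISTAKE from Resource.id) that simply selects nothing. A complementary check to the requested exception is to report rules that matched zero nodes across a run. A minimal sketch, assuming a tiny dotted-path subset rather than real FHIRPath; both function names are hypothetical.

```python
from typing import Dict, List


def select(resource: dict, path: str) -> list:
    """Very small subset of FHIRPath: 'Type.a.b' with list flattening."""
    head, *rest = path.split(".")
    if head not in ("Resource", resource.get("resourceType")):
        return []
    nodes = [resource]
    for step in rest:
        nxt = []
        for node in nodes:
            if isinstance(node, dict) and step in node:
                value = node[step]
                nxt.extend(value if isinstance(value, list) else [value])
        nodes = nxt
    return nodes


def unmatched_rules(rules: List[dict], resources: List[dict]) -> List[str]:
    """Return the paths of rules that selected nothing in any resource."""
    counts: Dict[str, int] = {rule["path"]: 0 for rule in rules}
    for resource in resources:
        for rule in rules:
            counts[rule["path"]] += len(select(resource, rule["path"]))
    return [path for path, count in counts.items() if count == 0]
```

A zero-match rule is not always wrong (the data may simply lack that element), so this fits better as a warning; rejecting invalid paths outright would need a check against the FHIR element definitions.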

Date shift operation leads to type inconsistency for dateTime, time, date and instant

The date shift operation generates a new date/time value of type string, but the original type is PartialDateTime or PartialTime.
