tools-for-health-data-anonymization's Introduction

Tools for Health Data Anonymization

Privacy Notice and Consent

This project provides you the scripts and command line tools for your own use. It does NOT and cannot access, use, collect, or manage any of your data, including any personal or health-related data. You must bring your own data, and be 100% responsible for using our tools to work with your own data.


Tools for Health Data Anonymization is an open-source project that helps anonymize healthcare data, on-premises or in the cloud, for secondary usage such as research, public health, and more. The project first released the anonymization of FHIR data to open source on Friday, March 6th, 2020. Currently, it supports both FHIR data anonymization and DICOM data anonymization.

The anonymization core engine uses a configuration file specifying different parameters as well as anonymization methods for different data-elements and datatypes. The repo contains a sample configuration file, which is based on the HIPAA Safe Harbor method. You can modify or create your own configuration file as needed.

This open source project is fully backed by the Microsoft Healthcare team, but we know that this project will only get better with your feedback and contributions. We are leading the development of this code base, and test builds and deployments daily.

FHIR® is the registered trademark of HL7 and is used with the permission of HL7. Use of the FHIR trademark does not constitute endorsement of this product by HL7.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

tools-for-health-data-anonymization's People

Contributors

boyawu10, cchaomsft, chgl, christopher-watanabe-snkeos, dependabot[bot], feordin, ginalee-dotcom, hongyh13, jeganlingam, joedrowan, jovinson-ms, microsoft-github-operations[bot], microsoftopensource, mikesoennichsen, moria97, omri374, ranvijaykumar, sowu880, thegotoguy, tongwu-sh

tools-for-health-data-anonymization's Issues

User-facing API to add custom IAnonymizerProcessor

It would be great if it were possible to add custom IAnonymizerProcessors as a user of the library.

My proposal is to add an AnonymizerEngine.AddProcessor(string methodKey, IAnonymizerProcessor processor) method, where methodKey is a user-defined configuration key.

This might require refactoring the way the security labels are applied: the IAnonymizerProcessor will need to specify the label that should be applied, as there is no longer a fixed lookup between the static AnonymizationOperations and the security labels.
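The proposal above can be sketched as a small registry keyed by method name. This is only an illustration in Python, not the library's API: `EngineSketch`, `add_processor`, `apply`, and `mask_all` are hypothetical names, and the case-insensitive key lookup is an assumption. Each processor returns the new value together with the security label it wants applied, which mirrors the refactoring concern raised in the issue.

```python
from typing import Callable, Dict, Tuple

# A processor maps an input value to (anonymized value, security label code).
Processor = Callable[[str], Tuple[str, str]]


class EngineSketch:
    """Hypothetical stand-in for the engine; not the library's actual class."""

    def __init__(self) -> None:
        self._processors: Dict[str, Processor] = {}

    def add_processor(self, method_key: str, processor: Processor) -> None:
        # Assumption: method keys are matched case-insensitively.
        key = method_key.lower()
        if key in self._processors:
            raise ValueError(f"method '{method_key}' is already registered")
        self._processors[key] = processor

    def apply(self, method_key: str, value: str) -> Tuple[str, str]:
        try:
            processor = self._processors[method_key.lower()]
        except KeyError:
            raise ValueError(
                f"Anonymization method {method_key} not supported."
            ) from None
        return processor(value)


def mask_all(value: str) -> Tuple[str, str]:
    # The processor itself names the label to apply, so the engine needs
    # no fixed lookup table from method to label.
    return "*" * len(value), "MASKED"
```

A caller would register `mask_all` under a key such as "maskAll" and then reference that key from a configuration rule.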

Cannot instantiate `AnonymizerEngine`

Firstly, this is a brilliant idea for a project. Thank you!

I'm using the AnonymizerEngine class from the NuGet package at:

<PackageReference Include="Microsoft.Health.Fhir.Anonymizer.R4.Core" Version="3.0.0.33" />

with the following configuration file:

https://raw.githubusercontent.com/microsoft/Tools-for-Health-Data-Anonymization/master/FHIR/src/Microsoft.Health.Fhir.Anonymizer.R4.CommandLineTool/configuration-sample.json

Upon instantiating the class, the following Exception is thrown:

Microsoft.Health.Fhir.Anonymizer.Core.Exceptions.AnonymizerConfigurationException
  HResult=0x80131500
  Message=Invalid FHIR path nodesByType('Extension')
  Source=Microsoft.Health.Fhir.Anonymizer.R4.Core
  StackTrace:
   at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurations.AnonymizerConfigurationValidator.Validate(AnonymizerConfiguration config)
   at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager..ctor(AnonymizerConfiguration configuration)
   at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager.CreateFromSettingsInJson(String settingsInJson)
   at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager.CreateFromConfigurationFile(String configFilePath)
   at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerEngine..ctor(String configFilePath)
   at [REFERENCES TO MY CALLING CODE]

  This exception was originally thrown at this call stack:
    [External Code]

Inner Exception 1:
ArgumentException: Unknown symbol 'nodesByType'

Any idea what I might be doing wrong?

Mark pseudonymized identifiers more visibly

I noticed that this library only replaces the Identifier.value. While some information indicating that this might not be a "real" value is in the security attribute of the resource, I do not think that is enough. While I agree with #52, I do not think simply adding an extension is enough either.
Both the security entry and the extension can justifiably be ignored by receiving applications. Those applications would then go on to interpret the value as a "real" value. This might quickly become a problem, e.g. when trying to find corresponding objects in another data source that is not pseudonymized (or is pseudonymized with a different system and therefore has different pseudonyms).
I believe one solution to this problem would be to create and add a modifier extension (https://www.hl7.org/fhir/extensibility.html#modifierExtension) to the Identifier. Modifier extensions tell the receiving application that this Identifier might not be what it expects, and even if the system does not recognize the extension, it now knows that "something is different about this Identifier".
Another solution (which I personally would prefer, since modifier extensions tend to cause problems) would be to also change the system. The original system could be added in an extension. This would avoid confusion and also some issues, e.g. with constraints that test the format of an identifier (such as a profile that specifies an error if the identifier.value does not start with two letters followed by a colon). It might also be useful to add a coding to identifier.type which expresses that this is now a pseudonym generated from an "original" identifier.
Since I did not find information on how to handle pseudonyms in the FHIR spec, this might also be a topic that should be discussed on the FHIR chat in order to get community feedback.

.NET 6.0

Since the support of .NET Core 3.1 stops in December 2022, are there any plans to migrate to .NET 6.0?

Consider labeling output data with relevant security labels

For background on how to apply these, see: https://www.hl7.org/fhir/security-labels.html#rsl

https://www.hl7.org/fhir/v3/ObservationValue/cs.html#v3-ObservationValue-ANONYED is a relevant tag that would make sense to assign to Resource.meta.security.

This would allow us to indicate that the output data have been subject to anonymization. (Another relevant tag might be https://www.hl7.org/fhir/v3/ObservationValue/cs.html#v3-ObservationValue-REDACTED; all this might be good to control with configuration.)
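As a rough illustration of the labeling idea, the Python sketch below appends a coding to Resource.meta.security on a resource held as a plain dictionary. The function name `add_security_label` is hypothetical, the code system URI is the one quoted in a related issue on this page, and the `display` element is deliberately omitted because its correct wording is not established here.

```python
from typing import Any, Dict

# Code system URI as quoted elsewhere on this page for ObservationValue codes.
OBSERVATION_VALUE_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-ObservationValue"


def add_security_label(resource: Dict[str, Any], code: str) -> Dict[str, Any]:
    """Add a security coding to resource['meta']['security'] if not present.

    The resource is modified in place and also returned for convenience.
    """
    meta = resource.setdefault("meta", {})
    security = meta.setdefault("security", [])
    label = {"system": OBSERVATION_VALUE_SYSTEM, "code": code}
    if label not in security:  # avoid duplicate labels on repeated runs
        security.append(label)
    return resource
```

Which codes are emitted (for example ANONYED or REDACTED) could then be driven by configuration, as the issue suggests.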

Microsoft.Health.Fhir.Anonymizer.R4.AzureDataFactoryPipeline is broken at head

Configure FHIR/src/Microsoft.Health.Fhir.Anonymizer.R4.AzureDataFactoryPipeline/AzureDataFactorySettings.json and FHIR/src/Microsoft.Health.Fhir.Anonymizer.Shared.AzureDataFactoryPipeline/scripts/ArmTemplate/arm_template_parameters.json as per the instructions and then try, e.g.:

PS /home/you> Connect-AzAccount
PS /home/you> Get-AzSubscription
PS /home/you> Select-AzSubscription -SubscriptionId "YOURS"
PS /home/you/code/FHIR-Tools-for-Anonymization/FHIR/src> $SubscriptionId = "YOURS"
PS /home/you/code/FHIR-Tools-for-Anonymization/FHIR/src> $BatchAccountName = "YOURSdeidbatchacct"
PS /home/you/code/FHIR-Tools-for-Anonymization/FHIR/src> $BatchAccountPoolName = "YOURSDeidentifierPool"
PS /home/you/code/FHIR-Tools-for-Anonymization/FHIR/src> $BatchComputeNodeSize = "Standard_D2_v3"
PS /home/you/code/FHIR-Tools-for-Anonymization/FHIR/src> $ResourceGroupName = "YOURS-fhir-deidentifier"
PS /home/you/code/FHIR-Tools-for-Anonymization/FHIR/src/Microsoft.Health.Fhir.Anonymizer.R4.AzureDataFactoryPipeline> .\DeployAzureDataFactoryPipeline.ps1 -SubscriptionId $SubscriptionId -BatchAccountName $BatchAccountName -BatchAccountPoolName $BatchAccountPoolName -BatchComputeNodeSize $BatchComputeNodeSize -ResourceGroupName $ResourceGroupName

The pipeline will fail with an unhelpful error message:

<Error>
<Code>ResourceNotFound</Code>
<Message>The specified resource does not exist. RequestId:REDACTED Time:REDACTED</Message>
</Error>

We have a fork that does still work at 57542d1, so it was broken at some point after that.

configuration.json in CommandLineTool doesn't parse because of {"tag":"(50xx,xxxx)", "method":"remove"}

Number.Parsing throws an exception: System.FormatException: 'The input string '50xx' was not in a correct format.' when trying to parse the configuration.json file in the project. The offending line is:

    {"tag":"(50xx,xxxx)", "method":"remove"}, 
    {"tag":"(60xx,4000)", "method":"remove"}, 
    {"tag":"(60xx,3000)", "method":"remove"}, 

Masked values such as (50xx,xxxx) cannot be parsed as numbers.

The following entries also do not parse:
{"tag":"DA", "method":"dateshift"},
{"tag":"DT", "method":"dateshift"}

Please remove or update these lines. Location of the file:
DICOM/src/Microsoft.Health.Dicom.Anonymizer.CommandLineTool/configuration.json
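One way a parser could accept entries like (50xx,xxxx) is to treat each "x" as a wildcard hexadecimal digit instead of parsing the string as a number. The Python sketch below shows that idea only; `masked_tag_to_regex` is a hypothetical helper, not how the tool works, and it does not cover the "DA" and "DT" entries, which are not tag numbers at all.

```python
import re

# Four characters per part, each a hex digit or the wildcard letter x.
_TAG_SHAPE = re.compile(r"^\(([0-9A-Fa-fxX]{4}),([0-9A-Fa-fxX]{4})\)$")


def masked_tag_to_regex(tag: str) -> "re.Pattern[str]":
    """Turn a tag such as '(50xx,xxxx)' into a regex over concrete tags."""
    match = _TAG_SHAPE.match(tag.strip())
    if match is None:
        raise ValueError(f"not a DICOM tag or masked tag: {tag!r}")

    def expand(part: str) -> str:
        # Each x matches any single hex digit; other digits match literally.
        return "".join("[0-9A-F]" if ch in "xX" else ch.upper() for ch in part)

    group, element = (expand(p) for p in match.groups())
    return re.compile(rf"^\({group},{element}\)$")
```

A concrete tag written in upper-case hexadecimal, for example "(5012,3000)", can then be tested against the compiled pattern with `pattern.match(...)`.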

Consider allowing comments in configuration files

When writing a configuration file, it is extremely useful to comment why a line was added (or often more importantly, why a line is not present).

JSON doesn't support comments. So what I've done for my config files is to add a comment field like so:

{"path": "Immunization.lotNumber", "method": "keep", "comment": "this is useful for X reason"},

Or, more awkwardly, when I want to explain why a line isn't there or want to demarcate a section of rules:

{"comment": "** https://www.hl7.org/fhir/R4/immunization.html **", "path": "xxx", "method": "redact"},
{"comment": "Immunization.lotNumber isn't useful for Y reason", "path": "xxx", "method": "redact"},

But life would be a lot easier if I could just use comments - and would be more performant because the tool doesn't need to apply these fake rules.

YAML Option
JSON is (mostly) a subset of YAML - i.e. a YAML parser can read JSON. So by switching your config parser to a YAML parser, you'd lose no functionality and folks that wanted to use YAML (which supports comments) could.

JSON Option
Use a JSON parser that extends the format to allow for comments, or preprocess the config file to remove comments or some such workaround.
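The preprocessing workaround could look roughly like the Python sketch below, which removes `//` line comments before handing the text to a standard JSON parser. It is a minimal sketch under stated assumptions: only `//` comments are handled (no `/* */`), and `strip_line_comments` is a hypothetical helper. String contents are left untouched, so URLs inside values survive.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # character after a backslash
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False        # closing quote
            i += 1
        elif ch == '"':
            in_string = True             # opening quote
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line but keep the newline itself.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_config(text: str) -> dict:
    return json.loads(strip_line_comments(text))
```

With this in place, a rule could carry an explanatory `// ...` line above it and no placeholder rules with fake paths would be needed.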

Display values in security coding do not fit those in the CodeSystem

In the security attribute, where the modifications to the resource are coded e.g.
{ "system": "http://terminology.hl7.org/CodeSystem/v3-ObservationValue", "code": "REDACTED", "display": "part of the resource is removed" },
the display values are wrong. Display values are defined as "Representation defined by the system" (https://hl7.org/implement/standards/fhir/2015Jan/datatypes-definitions.html#Coding.display), but the display values defined at https://terminology.hl7.org/1.0.0/CodeSystem-v3-ObservationValue.html do not match those in the resources. (This should be caught by terminology-aware validators.)

Allow additional option to `processingErrors` to allow skipping of `InvalidInputException`

Setting processingErrors: "skip" skips only AnonymizerProcessingException and not InvalidInputException. While this is documented behavior, there are situations where it would be desirable to skip files for which deidentification fails due to InvalidInputException:

  • An upstream system emits malformed FHIR, or a corrupted file.
  • An upstream system emits FHIR of an unexpected version.
  • Additional, non-FHIR files are accidentally placed in the directory with FHIR.

Because our deidentification pipeline may take 24 hours or longer, we prefer to skip invalid input rather than fail on a single malformed file.

I propose adding an additional option, processingErrors: "invalid" that skips both AnonymizerProcessingException and InvalidInputException.
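The proposed behaviour can be sketched as a mapping from the setting value to the set of exception types that are skipped. The Python below is illustrative only: the two exception classes stand in for AnonymizerProcessingException and InvalidInputException, `process_all` is a hypothetical driver, and the name "raise" for the fail-fast default is an assumption.

```python
class AnonymizerProcessingError(Exception):
    """Stand-in for AnonymizerProcessingException."""


class InvalidInputError(Exception):
    """Stand-in for InvalidInputException."""


# Which exception types each setting skips. "raise" is an assumed name for
# the default that skips nothing; "invalid" is the option proposed above.
SKIPPABLE = {
    "raise": (),
    "skip": (AnonymizerProcessingError,),
    "invalid": (AnonymizerProcessingError, InvalidInputError),
}


def process_all(items, anonymize, processing_errors="raise"):
    """Apply anonymize to each item, skipping failures allowed by the setting.

    Returns (results, skipped), where skipped lists (item, exception name).
    """
    skippable = SKIPPABLE[processing_errors]
    results, skipped = [], []
    for item in items:
        try:
            results.append(anonymize(item))
        except skippable as exc:
            skipped.append((item, type(exc).__name__))
    return results, skipped
```

Recording what was skipped, rather than silently dropping it, keeps a long-running pipeline going while still leaving a trail to review afterwards.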

Edit: I see a comment indicating that there may have originally been a desire to add this feature.

Include ability to spread deidentification over multiple nodes in a batch pool

Currently, the ADF pipeline is only able to execute on a single node, though that node can be of arbitrary size.

It takes us more than 48 hours to deidentify our entire set of FHIR resources. Since the task of deidentification is embarrassingly parallel, we would like the ability to spread our processing out over more than a single node so that we can deidentify our resources more quickly.
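Because each file can be deidentified independently, the work could be split by assigning every input file to exactly one node. The Python sketch below shows one simple, deterministic way to do that with a stable hash of the file name. It says nothing about how Azure Batch or the ADF pipeline would schedule the shards; `shard_for` and `partition` are hypothetical helpers.

```python
import zlib
from typing import Iterable, List


def shard_for(name: str, node_count: int) -> int:
    """Map a file name to a node index using a stable CRC32 hash."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return zlib.crc32(name.encode("utf-8")) % node_count


def partition(names: Iterable[str], node_count: int) -> List[List[str]]:
    """Split file names into node_count disjoint lists, one per node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    shards: List[List[str]] = [[] for _ in range(node_count)]
    for name in sorted(names):
        shards[shard_for(name, node_count)].append(name)
    return shards
```

Hash-based assignment needs no coordination between nodes, at the cost of shards that may be uneven when file sizes vary widely.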

Complete fhirPathRules example

Thanks for a great product.
Is there a more complete example of redacting FHIR bundles?
For example:

  • shift all date fields and ages
  • redact all identifiers

Consider using the "masked" code for encrypted resource parts

Currently, when a value is encrypted, the custom encrypted flag is set as a security label: https://github.com/microsoft/FHIR-Tools-for-Anonymization/blob/6a9b8614c319afb5f85959c02f86b2304ec4618c/src/Microsoft.Health.Fhir.Anonymizer.Shared.Core/Models/SecurityLabels.cs#L28-L32. I think using the masked code here would be more appropriate, see https://terminology.hl7.org/2.0.0/CodeSystem-v3-ObservationValue.html:

Usage Note: "MASKED" may be used, per applicable policy, as a flag to indicate to a user or receiver that some portion of an IT resource has been further encrypted, and may be accessed only by an authorized user or receiver to which a decryption key is provided.

Which seems fitting.

Running the tool with a non supported anonymization method results in error that includes the (raw content) input FHIR JSON files

Description:

When running the tool with an unsupported anonymization method, the AnonymizerConfigurationException error (with stack trace) is displayed in the console log. However, the output also includes the raw content of the input FHIR JSON files.

This could happen by mistake and it is very unlikely to go unnoticed. Nevertheless, if it happens, it might be seen as undesirable behaviour when running the tool on a server, as these logs might be collected inadvertently and leak while containing sensitive information from the FHIR JSON files.

Desired outcome

Do not include the (raw content) input FHIR JSON files in the console log.

Reproduction steps:

  1. Edit the tool's configuration-sample.json file and purposefully set an unsupported anonymization method:
    {"path": "Resource.id", "method": "FAKEcryptoHash"}.
  2. Run the tool on Windows as follows:
    Microsoft.Health.Fhir.Anonymizer.R4.CommandLineTool.exe -i "C:\Tool\FHIR\samples\fhir-r4-files" -o "C:\Tool\FHIR\samples\fhir-r4-files\output" -c "C:\Tool\configuration-sample.json"
  3. You'll see the following output in the console:

[C:\Tool\FHIR\samples\fhir-r4-files\practitionerInformation1583281853074.json] Error:
Resource: {
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {
      "fullUrl": "urn:uuid:00000170-a2f3-ce01-0000-00000000010e",
      "resource": {
        "resourceType": "Practitioner",
        "id": "00000170-a2f3-ce01-0000-00000000010e",
        "meta": {
          "profile": [
            "http://hl7.org/fhir/us/core/StructureDefinition/us-core-practitioner"
          ]
        },
        "extension": [
          {
            "url": "http://synthetichealth.github.io/synthea/utilization-encounters-extension",
            "valueInteger": 259
          }
        ],
        "identifier": [
          {
            "system": "http://hl7.org/fhir/sid/us-npi",
            "value": "270"
          }
        ],
        "active": true,
        "name": [
          {
            "family": "Roberts511",
            "given": [
              "Chi716"
            ],
            "prefix": [
              "Dr."
            ]
          }
        ],
        "telecom": [
          {
            "extension": [
              {
                "url": "http://hl7.org/fhir/us/core/StructureDefinition/us-core-direct",
                "valueBoolean": true
              }
            ],
            "system": "email",
            "value": "[email protected]",
            "use": "work"
          }
        ],
        "address": [
          {
            "line": [
              "1035-116TH AVE NE"
            ],
            "city": "BELLEVUE",
            "state": "WA",
            "postalCode": "98004",
            "country": "US"
          }
        ],
        "gender": "male"
      },
      "request": {
        "method": "POST",
        "url": "Practitioner"
      }
    }
  ]
}

ErrorMessage: Microsoft.Health.Fhir.Anonymizer.Core.Exceptions.AnonymizerConfigurationException: Anonymization method FAKEcryptoHash not supported.
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurations.AnonymizerConfigurationValidator.Validate(AnonymizerConfiguration config) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurations\AnonymizerConfigurationValidator.cs:line 19
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager..ctor(AnonymizerConfiguration configuration) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 22
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager.CreateFromSettingsInJson(String settingsInJson) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 39
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager.CreateFromConfigurationFile(String configFilePath) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 51
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerEngine.CreateWithFileContext(String configFilePath, String fileName, String inputFolderName) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerEngine.cs:line 53
at Microsoft.Health.Fhir.Anonymizer.Tool.FilesAnonymizerForJsonFormatResource.FileAnonymize(String fileName) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.CommandLineTool\FilesAnonymizerForJsonFormatResource.cs:line 96
Error:
Resource: C:\Projects\T\FHIR\samples\fhir-r4-files\practitionerInformation1583281853074.json
ErrorMessage: Microsoft.Health.Fhir.Anonymizer.Core.Exceptions.AnonymizerConfigurationException: Anonymization method FAKEcryptoHash not supported.
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurations.AnonymizerConfigurationValidator.Validate(AnonymizerConfiguration config) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurations\AnonymizerConfigurationValidator.cs:line 19
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager..ctor(AnonymizerConfiguration configuration) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 22
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager.CreateFromSettingsInJson(String settingsInJson) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 39
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager.CreateFromConfigurationFile(String configFilePath) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 51
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerEngine.CreateWithFileContext(String configFilePath, String fileName, String inputFolderName) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerEngine.cs:line 53
at Microsoft.Health.Fhir.Anonymizer.Tool.FilesAnonymizerForJsonFormatResource.FileAnonymize(String fileName) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.CommandLineTool\FilesAnonymizerForJsonFormatResource.cs:line 96
at Microsoft.Health.Fhir.Anonymizer.Tool.FilesAnonymizerForJsonFormatResource.<>c__DisplayClass5_0.<b__0>d.MoveNext() in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.CommandLineTool\FilesAnonymizerForJsonFormatResource.cs:line 52
Unhandled exception. Microsoft.Health.Fhir.Anonymizer.Core.Exceptions.AnonymizerConfigurationException: Anonymization method FAKEcryptoHash not supported.
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurations.AnonymizerConfigurationValidator.Validate(AnonymizerConfiguration config) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurations\AnonymizerConfigurationValidator.cs:line 19
at Microsoft.Health.Fhir.Anonymizer.Core.AnonymizerConfigurationManager..ctor(AnonymizerConfiguration configuration) in D:\a\1\s\FHIR\src\Microsoft.Health.Fhir.Anonymizer.Shared.Core\AnonymizerConfigurationManager.cs:line 22

Failing unit tests due to international decimal parsing

Description

I've pulled release 3.0.0 to my local machine in Germany and noticed that I had failing unit tests targeting the PerturbProcessor. Specifically, I was receiving the following error pertaining to the Decimal.Parsing of test values:

System.FormatException : Input string was not in a correct format.

This occurs, for example, in line 138 of the PerturbProcessorTests.cs

new Quantity { Value = decimal.Parse("25,162.1378") },

To address this problem locally, I've included a CultureInfo.InvariantCulture parameter in the Parse call so that the local culture is not used for parsing, since in Germany the ',' is used as a decimal separator and the '.' as a thousands separator. According to German culture rules, the value "25,162.1378" would place the thousands separator after the decimal separator, thus triggering a System.FormatException. Here is my workaround:

new Quantity { Value = decimal.Parse("25,162.1378", NumberStyles.Any, CultureInfo.InvariantCulture) },

This problem naturally affected the unit tests targeting test results from data fed into the PerturbProcessor. For example, the range assertion in AnonymizationVisitorTests.GivenAPerturbRule_WhenProcess_NodeShouldBePerturbed() with the test data considering a proportional span of 0.2. In countries that use the '.' as a thousands separator, this value is parsed as 2, which gives a much wider range of perturbation than 0.2 and is very likely to fall out of the range prescribed for a successful test.
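The two failure modes described above can be reproduced with a small model of culture-dependent parsing. The Python sketch below is not .NET's algorithm; `parse_decimal` is a hypothetical helper that takes the decimal and group separators explicitly, which is enough to show why the same text yields different results under different conventions.

```python
from decimal import Decimal


def parse_decimal(text: str, decimal_sep: str, group_sep: str) -> Decimal:
    """Parse text using explicit decimal and thousands separators.

    Simplified model: group separators are dropped from the integer part
    and are rejected if they appear after the decimal separator.
    """
    parts = text.split(decimal_sep)
    if len(parts) > 2:
        raise ValueError(f"more than one decimal separator in {text!r}")
    integer_part = parts[0].replace(group_sep, "")
    if len(parts) == 1:
        return Decimal(integer_part)
    fraction_part = parts[1]
    if group_sep in fraction_part:
        raise ValueError(f"group separator after decimal separator in {text!r}")
    return Decimal(f"{integer_part}.{fraction_part}")


# Invariant-style convention: '.' is the decimal separator, ',' groups digits.
invariant = parse_decimal("25,162.1378", decimal_sep=".", group_sep=",")

# German-style convention: ',' is the decimal separator, '.' groups digits.
# Here "0.2" loses its '.' and is read as the integer 2.
german_span = parse_decimal("0.2", decimal_sep=",", group_sep=".")
```

Under the German-style convention the string "25,162.1378" raises an error in this model, which corresponds to the FormatException reported above. Fixing the separators in code, rather than reading them from the machine's regional settings, makes the result the same everywhere.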

Desired outcome

Make the parsing of perturb settings and test values culture-invariant, to avoid FormatExceptions and unexpected values resulting from the PerturbProcessor.

Reproduction steps
Run Fhir.Anonymizer.sln tests on a machine with regional settings set to a region where the ',' is used as a decimal separator and the '.' is used as a thousands separator, Germany for example.

Update version of firely SDK for R4 standard

Hi all,

I've just noticed that the version of the Firely SDK used for the FHIR R4 standard is relatively old: version 3.4.0, released nearly a year ago. I'm wondering if it would be beneficial to look into updating this version? Since that release, quite a few issues have been closed, so an update might be worthwhile.

As an initial check, I updated the version to the most recent v4.0.0 and ran the R4.Core.UnitTests and R4.Core.FunctionalTests tests. The following tests fail:

  1. Several AttributeValidatorTests in the R4.Core.UnitTests fail due to updated Firely error messages.
  2. GivenAStu3OnlyResource_WhenAnonymizing_ExceptionShouldBeThrown(testFile: "Stu3OnlyResource/ProcessResponse.json", ResourceName: "ProcessResponse") fails due to a failure to throw an exception. I assume the Firely SDK now handles the previous situation that caused an exception.
  3. GivenAStu3OnlyResource_WhenAnonymizing_ExceptionShouldBeThrown(testFile: "Stu3OnlyResource/ProcessRequest.json", ResourceName: "ProcessRequest") fails due to a failure to throw an exception. I assume the Firely SDK now handles the previous situation that caused an exception.

The test failures due to changed error messages are a relatively quick fix. The GivenAStu3OnlyResource_WhenAnonymizing_ExceptionShouldBeThrown failures would have to be researched to understand how the Stu3OnlyResource is handled in the updated Firely SDK.

What do you think? If this is a benefit, I could create a PR for it. Seems like relatively little work to get the latest/greatest.

Question: Generalize Rule

Hi,

I don't understand how to use the last generalize example.

https://github.com/microsoft/Tools-for-Health-Data-Anonymization/blob/master/docs/FHIR-anonymization.md#generalize

How is
"$this.replaceMatches('(?<year>\\d{2,4})-(?<month>\\d{1,2})-(?<day>\\d{1,2})\\b', '${year}-${month}'"

a key-value pair for the cases in the generalization method? It's just a method call with two parameters in it.
The other examples make more sense:

"$this >= @2010-1-1": "@2010"

Here the key is clearly the condition and the value is the target.
In the replaceMatches example, the condition-target key-value pair is not understandable, hence I don't know how to specify the JSON file appropriately.
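For readers trying to see what the replaceMatches expression computes, here is the same regular expression applied with Python's `re` module. This does not answer how the expression must be split into a key and a value in the configuration file; it only shows the transformation. The .NET-style named groups `(?<year>...)` and references `${year}` are written as `(?P<year>...)` and `\g<year>` in Python, and `truncate_to_month` is a hypothetical name.

```python
import re

# Same pattern as in the documentation example, in Python's named-group syntax.
DATE_PATTERN = re.compile(
    r"(?P<year>\d{2,4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})\b"
)


def truncate_to_month(value: str) -> str:
    """Replace each year-month-day match with just year-month."""
    return DATE_PATTERN.sub(r"\g<year>-\g<month>", value)
```

For example, `truncate_to_month("2020-03-06")` gives "2020-03": the day component is dropped, which is the generalization the expression performs.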

Extend generalize method

Hi all,

We have a business requirement for which we would like to extend the generalize transformation to accept not only 'Keep' and 'Redact' but all the other methods as well. I have made the code changes locally and would like to submit a pull request for your review; however, it seems I do not have permission to create a new remote branch in your repository.

Looking forward to hearing from you.

Thanks,
Zoli

Filename too long

error: unable to create file FHIR/src/Microsoft.Health.Fhir.Anonymizer.R4.AzureDataFactoryPipeline.UnitTests/Microsoft.Health.Fhir.Anonymizer.R4.AzureDataFactoryPipeline.UnitTests.csproj: Filename too long
error: unable to create file FHIR/src/Microsoft.Health.Fhir.Anonymizer.Shared.AzureDataFactoryPipeline.UnitTests/Microsoft.Health.Fhir.Anonymizer.Shared.AzureDataFactoryPipeline.UnitTests.projitems: Filename too long
error: unable to create file FHIR/src/Microsoft.Health.Fhir.Anonymizer.Shared.AzureDataFactoryPipeline.UnitTests/Microsoft.Health.Fhir.Anonymizer.Shared.AzureDataFactoryPipeline.UnitTests.shproj: Filename too long
error: unable to create file FHIR/src/Microsoft.Health.Fhir.Anonymizer.Stu3.AzureDataFactoryPipeline.UnitTests/Microsoft.Health.Fhir.Anonymizer.Stu3.AzureDataFactoryPipeline.UnitTests.csproj: Filename too long
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.

Running the tool with a fake/wrong path in the config.json does not result in an error

Description:

Running the tool with a fake/wrong path in the config.json does not result in an error.

This could happen by mistake and it is very unlikely to go unnoticed. Nevertheless, if it happens, it might be seen as undesirable behaviour because users might assume that their data is being anonymized while it is not.

Desired outcome

Throw an AnonymizerConfigurationException error informing that the path is not valid.

Reproduction steps:

  1. Edit the tool's configuration-sample.json file and purposefully set an invalid path:
    {"path": "Resource.id-MISTAKE", "method": "cryptoHash"}.
  2. Run the tool on Windows as follows:
    Microsoft.Health.Fhir.Anonymizer.R4.CommandLineTool.exe -i "C:\Tool\FHIR\samples\fhir-r4-files" -o "C:\Tool\FHIR\samples\fhir-r4-files\output" -c "C:\Tool\configuration-sample.json"
  3. You'll see: Finished processing 'C:\Tool\FHIR\samples\fhir-r4-files'! No error is thrown, and the user might not notice that Resource.id was not cryptoHashed.
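Short of validating paths against the FHIR specification, a run could at least report rules that never matched anything. The Python sketch below is a heavily simplified stand-in: it resolves only plain dotted paths on dictionaries and is not a FHIRPath evaluator, and `rule_matches` and `unmatched_rules` are hypothetical helpers. A rule that matches nothing is not necessarily invalid (the element may simply be absent from the data), so this would suit a warning more than an error.

```python
from typing import Any, Dict, List


def rule_matches(resource: Dict[str, Any], path: str) -> bool:
    """Return True if a simple dotted path resolves on the resource.

    The first segment must be 'Resource' or the resource's own type.
    """
    head, *rest = path.split(".")
    if head not in ("Resource", resource.get("resourceType")):
        return False
    node: Any = resource
    for segment in rest:
        if not isinstance(node, dict) or segment not in node:
            return False
        node = node[segment]
    return True


def unmatched_rules(
    resources: List[Dict[str, Any]], rules: List[Dict[str, str]]
) -> List[str]:
    """List the paths of rules that matched no resource at all."""
    return [
        rule["path"]
        for rule in rules
        if not any(rule_matches(res, rule["path"]) for res in resources)
    ]
```

Printing the result of `unmatched_rules` at the end of a run would make a typo such as "Resource.id-MISTAKE" visible instead of passing silently.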
