nhsx / ai-dictionary

Prototype AI Dictionary from the NHS AI Lab

Home Page: https://nhsx.github.io/ai-dictionary/

License: MIT License

JavaScript 86.98% CSS 3.81% Python 9.21%

ai nhs reference

ai-dictionary's People

nhsx-mirror manishjiva tomlincr puntofisso mbfons v-smith samhollings harrietrs

ai-dictionary's Issues

[CORRECTION] Algorithm

Thank you for finding and reporting an inaccuracy in the term, please fill in the details below:

Term: Algorithm

Current content: A set of instructions that can be followed by a human or computer.

For example, the NHS algorithm for detecting Acute Kidney Injury is a set of instructions that can be repeated for multiple patients.

In AI, machine learning algorithms use data to make decisions.

Corrected content: A set of instructions that can be followed by a human or computer.

For example, the NHS algorithm for detecting Acute Kidney Injury is a set of instructions that can be repeated for multiple patients.

In AI, machine learning algorithms use data to make predictions or recommendations which can inform decision making.

Reason for correction including reference if appropriate: thanks to LE from the Lab

[CORRECTION] F1 score

Thank you for finding and reporting an inaccuracy in the term, please fill in the details below:

Term: F1 score

Current content: An accuracy metric that combines the precision and recall values for a classification model into a single number, which ranges from 0 (poor accuracy) to 1 (high accuracy).

Corrected content: A metric which describes the accuracy of a classification model, by combining the precision and recall values into a single number, which ranges from 0 (poor accuracy) to 1 (high accuracy).

Reason for correction including reference if appropriate: Flagged by Vijay on the Hub as needing more explanation for generalists
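
For generalist readers, the way precision and recall are combined can be shown with a short calculation. The sketch below assumes the usual definition of F1 as the harmonic mean of precision and recall; the function name and the choice to return 0.0 when the score is undefined are illustrative assumptions, not part of the dictionary.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Combine precision and recall into one number between 0 and 1."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # Assumption: report 0.0 when precision or recall cannot be computed.
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive  # how many flagged cases were right
    recall = true_positives / actual_positive        # how many real cases were found
    if precision + recall == 0:
        return 0.0
    # Harmonic mean: low if either precision or recall is low.
    return 2 * precision * recall / (precision + recall)
```

A model with 8 true positives, 2 false positives and 2 false negatives has precision and recall of 0.8 each, so its F1 score is also approximately 0.8.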

[NEW] Data Clean Room

Thank you for suggesting a new term, please enter its details below:

Title: Data clean room

Description:

Related terms: data protection

Reason for including new term including references if appropriate: Suggestion from Alasdair R. (LinkedIn)

Allow paragraphs in term description

See #16 but for line breaks:

[CORRECTION] MLOps

Thank you for finding and reporting an inaccuracy in the term, please fill in the details below:

Term: MLOps

Current content: Machine learning operations is the process of safely deploying, monitoring and updating machine learning models in production, or real-world, environments.

Because machine learning models are built on data, and data can change, it is important to build resilience into the system so they can adapt to a changing environment.

Corrected content: Machine learning operations is the process of safely deploying, monitoring and updating machine learning models in production, or real-world, environments.

Because machine learning models are built on data, and data can change, it is important to build robustness into the system so they can adapt to a changing environment without losing performance.

Reason for correction including reference if appropriate: resilience needs to be defined from Rubeta/Hub

[NEW] Data augmentation

Thank you for suggesting a new term, please enter its details below:

Title: Data augmentation

Description: The process of artificially increasing the amount of data used to train a model, to reduce overfitting and improve model performance.

Commonly used in imaging applications, this can include rotating, cropping, adding noise or random levels of blur to existing images.

Related terms: overfitting, model, machine-learning

Reason for including new term including references if appropriate:
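
The image transformations named in the description can be sketched in a few lines. This is a minimal illustration using plain Python lists as greyscale images; the function names, the noise scale and the list-of-lists image format are all assumptions made for the example, and a real pipeline would normally use an imaging library.

```python
import random
from typing import List

Image = List[List[float]]  # assumed format: rows of pixel intensities


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def rotate_90_clockwise(image: Image) -> Image:
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def add_noise(image: Image, scale: float, rng: random.Random) -> Image:
    """Add Gaussian noise with standard deviation `scale` to every pixel."""
    return [[pixel + rng.gauss(0.0, scale) for pixel in row] for row in image]


def augment(image: Image, rng: random.Random) -> List[Image]:
    """Return the original image plus three artificially varied copies."""
    return [
        image,
        flip_horizontal(image),
        rotate_90_clockwise(image),
        add_noise(image, 0.05, rng),
    ]
```

Each original image yields four training examples here, which is the sense in which the amount of training data is "artificially increased".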

Query params/URL rewriting glitchy

Steps to reproduce:

Issue 1

Click on a term
Hit refresh
Expected behaviour: you see the same term. Actual behaviour: you are taken to front page

Issue 2

Click on a term
Click "Back to dictionary"
Expected behaviour: url parameter is removed. Actual behaviour: URL parameter remains, but is not acted on

Issue 3

Click on a term
Copy the URL e.g. https://nhsx.github.io/ai-dictionary?term=algorithm
Open a new browser window
Paste the URL
Expected behaviour: you see the term. Actual behaviour: redirected to front page
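
The expected behaviour in Issues 1 and 3 amounts to reading the `term` query parameter when the page loads and showing that term if it exists. The site itself is JavaScript, so the Python below is only a language-neutral sketch of that lookup; the function name and the `known_slugs` argument are hypothetical.

```python
from typing import Optional, Set
from urllib.parse import parse_qs, urlparse


def term_from_url(url: str, known_slugs: Set[str]) -> Optional[str]:
    """Return the slug named in the `term` parameter, or None to show the front page."""
    query = parse_qs(urlparse(url).query)
    values = query.get("term", [])
    if not values:
        return None
    slug = values[0].strip().lower()
    # Unknown slugs fall back to the front page instead of an empty term view.
    return slug if slug in known_slugs else None
```

For Issue 2, the counterpart would be clearing the parameter from the URL when the user returns to the dictionary, so that the URL and the visible page stay in step.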

Sort terms alphabetically on front page

[CORRECTION] Validation data

Thank you for finding and reporting an inaccuracy in the term, please fill in the details below:

Term: Validation data

Current content: Data that is not included in the training data, but is used to improve the performance of an AI model as it is trained.

This definition relates to the definition within AI, and not the regulatory aspects of medical devices.

Corrected content: Data that is not included in the training data, but is used to check the performance of the model as it is being trained. This is separate to the test data used to check the final performance of the model.

This definition relates to the definition within AI, and not the regulatory aspects of medical devices.

Reason for correction including reference if appropriate: I think it could be explained that 'validation data' is used as an initial check of the performance of the AI model but is strictly not used for training the model. - Moyeen on the Hub
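
The distinction between training, validation and test data can be illustrated with a simple three-way split. This is a sketch only: the function name, the 20% fractions and the fixed seed are assumptions chosen for the example.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    records: Sequence,
    validation_fraction: float = 0.2,
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List, List, List]:
    """Shuffle the records and split them into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]                             # held back for the final check
    validation = shuffled[n_test:n_test + n_validation]  # checked during training
    training = shuffled[n_test + n_validation:]          # the only data the model learns from
    return training, validation, test
```

The three sets do not overlap: the validation set is consulted while the model is being developed, and the test set is used once at the end.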

[IMPROVEMENT] Order related terms in terms of importance

Currently related terms are read in from the array in json, and not ordered.

If we keep this behaviour, we should order the related terms in terms of relevance (by hand).

[NEW] Linked data

Thank you for suggesting a new term, please enter its details below:

Title: Linked data

Description:

Related terms:

Reason for including new term including references if appropriate: Suggestion from Pam/Hub

[NEW] NLP

Thank you for suggesting a new term, please enter its details below:

Title: NLP

Description:

Related terms: transformer

Reason for including new term including references if appropriate: Suggestion from Pam/Hub

[NEW] Multimodal AI

Thank you for suggesting a new term, please enter its details below:

Title: Multimodal AI

Description: An AI that brings different facets (smaller AI models or data verticals) of the target together to form a more complete predictor.

In healthcare, the term is also associated with specific 'modalities' of data, such as the sequences in mpMRI scanning. However, the concept of multimodality with respect to AI is more than just bringing different data verticals together - it is about how very different AI models effectively interoperate - even 'merge' - to create a whole that is greater than the sum of its parts.

Related terms: model

Reason for including new term including references if appropriate: Suggestion from @manishjiva #39

[CORRECTION] API

Thank you for finding and reporting an inaccuracy in the term, please fill in the details below:

Term: API

Current content: “Application Programming Interface”. A standardised way to share information. An API is much like a shipping container - it implements an agreed structure and mechanism for sharing data, and enables interoperability.

Corrected content: Application Programming Interface. A standardised way to share data. An API defines the mechanisms to receive and send data, which is agnostic to how the underlying data is stored.

For example, NHS Digital has a number of APIs available to help build modern healthcare technology.

Reason for correction including reference if appropriate: API – I’m not sure the analogy of it being like a shipping container helps particularly, Rubeta from Hub

Create open graph tags for social sharing

To be checked with:
https://socialsharepreview.com/?url=https://nhsx.github.io/ai-dictionary

[UX] User feedback

The browser's back button not going back through the slides feels jarring initially.
Would be nice to have the list across 1 - 3 columns to save some space/scrolling. Not sure how easy that is to change dynamically.
~~One little thing, the search does not seem to pick up 'AI' as a word in the same way as Python: it returns every result with the word AI in the text nvm~~
I removed the fade at the top and bottom of the scroll using chrome inspect and personally think it looks cleaner without. Purely aesthetic choice, but on a small screen the fade is quite noticeable
You have chosen to display the acronym first, rather than the full definition. Can you look at either ranking the acronyms first when searching, or adding a hover or similar for the full definition for each acronym?

[SUGGESTION] Visual map of terms

Ref: https://www.d3-graph-gallery.com/network

[SUGGESTION] Implement DefinedTerm schema

Suggestion from HR: migrate to https://schema.org/DefinedTerm
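
As a rough idea of what the migration could produce, the sketch below maps one dictionary entry to schema.org `DefinedTerm` JSON-LD. The input keys (`title`, `slug`, `description`) are assumed from how other issues describe the terms file, and the function name is hypothetical; `termCode` and `inDefinedTermSet` are properties defined by schema.org for this type.

```python
import json
from typing import Dict


def to_defined_term(term: Dict[str, str], term_set_url: str) -> Dict[str, str]:
    """Map one dictionary entry to a schema.org DefinedTerm object."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": term["title"],
        "termCode": term["slug"],
        "description": term["description"],
        "inDefinedTermSet": term_set_url,
    }


example = to_defined_term(
    {
        "title": "Algorithm",
        "slug": "algorithm",
        "description": "A set of instructions that can be followed by a human or computer.",
    },
    "https://nhsx.github.io/ai-dictionary/",
)
print(json.dumps(example, indent=2))
```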

[NEW] Neural Networks

Thank you for suggesting a new term, please enter its details below:

Title: Neural networks

Description: Neural networks are an approach to machine learning, loosely inspired by nature, that can describe complex relationships using a broader range of data than traditional approaches.

For example, neural networks can be trained on image data to describe features in medical images such as tumours. They can also be trained on free text such as clinical notes, allowing their use in clinical coding applications.

Neural networks can also be trained on tabulated, or structured data, such as a spreadsheet. Their ability to model complexity often comes at the cost of explainability, whereby the more complex the model, the harder to explain it becomes.

Related terms: explainability, model, machine-learning, structured, unstructured, supervised, ai, deep-learning

Reason for including new term including references if appropriate: important term, suggested by Moyeen and Sharan from the Hub

[NEW] Metadata

Thank you for suggesting a new term, please enter its details below:

Title: Metadata

Description:

Related terms:

Reason for including new term including references if appropriate: Suggestion from Pam/Hub

[CORRECTION] ICO Feedback

Thank you for finding and reporting an inaccuracy in the term, please fill in the details below:

Accuracy: Accuracy is also one of data protection’s foundational principles and it relates to inputs not outputs. In our guidance on AI and data protection we use the term statistical accuracy to describe what you term as accuracy here, so we see the term ‘statistical accuracy’ as more appropriate to avoid confusing organisations processing personal data in the context of AI.
Bias: Humans (reviewers, labellers, etc) can also be the source of bias, not just the datasets themselves. With that in mind, describing bias as the outcome of ‘errors’ may not capture all of its sources.
Fairness: Fairness is also one of data protection’s foundational principles. The definition given in the dictionary appears to conflate fairness with anti-discrimination. But fairness in data protection, even though it encompasses discrimination concerns, is broader than discrimination. The ICO has said “fairness means that you should only handle personal data in ways that people would reasonably expect and not use it in ways that have unjustified adverse effects on them”. An unfair decision allocated indiscriminately across groups is not automatically fair.

[CORRECTION] Standard

Thank you for finding and reporting an inaccuracy in the term, please fill in the details below:

standard:

**An agreed set of definitions, guidelines and sometimes technical approaches for a specific area. Formal Standards may be mandated by the Government, whereas de facto standards are created and used by communities working in that space.

For example, ISO 13485 is a designated Standard mandated by the UK Government for the development of a medical device. It includes specific guidelines and processes for the safe development of a medical device, and organisations must demonstrate their adherence to the Standard to be allowed to place a medical device on the market.**

For example, ISO 13485 is a UK Designated Standard for quality management systems for medical devices; it can be used to demonstrate conformance with parts of the medical device regulations.

Standards are entirely voluntary under the UK's device regulations.

[NEW] Overfitting

Thank you for suggesting a new term, please enter its details below:

Title: Overfitting

Description: The process of building a model which is based too closely on the data. This results in a model which may be very accurate on the training data, but when tested on additional datasets such as the test data, unseen data or data from a new environment, performs badly.

Approaches to reduce overfitting include cross-validation, data augmentation and ensemble techniques (which combine different models).

Related terms: machine-learning, bias, model, training-data, test-data, cross-validation, underfitting, data-augmentation

Reason for including new term including references if appropriate: Important term, suggested by Moyeen on the Hub

[CORRECTION] Cross validation

Thank you for finding and reporting an inaccuracy in the term, please fill in the details below:

Term: Cross validation

Current content: An approach to generating different validation and training data sets from within a single data set, to improve the performance of a predictive model.

Popular techniques include k-fold cross validation.

Corrected content: An approach to reducing overfitting during model development, by iteratively selecting different portions of the data to train and validate a predictive (supervised) machine learning model.

Cross validation can increase the overall performance of a model, along with data augmentation techniques.

Reason for correction including reference if appropriate: by trying to be simple I wonder if the meaning has been lost here – the Wikipedia definition (below) could be simplified
Cross-validation is a resampling method that uses different portions of the data to test and train a model on different iterations. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice, from Rubeta/Hub
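
The idea of "iteratively selecting different portions of the data" can be made concrete with k-fold splitting, the technique named in the current definition. The sketch below only generates the index splits; the function name is an assumption, and it leaves shuffling and model training to the caller.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (training_indices, validation_indices) pairs covering every sample once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        training = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((training, validation))
        start = stop
    return folds
```

Each sample is used for validation in exactly one fold and for training in the others, so the model's performance is estimated on data it was not trained on in that iteration.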

[NEW] Underfitting

Thank you for suggesting a new term, please enter its details below:

Title: Underfitting

Description: The process of building a model which is not based closely enough on the data. This results in a model which performs badly and fails to capture the relationships you are looking for.

There is a balance to be made between underfitting and overfitting.

Related terms: machine-learning, model, training-data, overfitting

Reason for including new term including references if appropriate: Important term, suggested by Moyeen on the Hub

[UX] Format tables

To implement #63 we need to format html tables.

Please see branch feature/table for example structure.

Example formatting (ignore colour, although differentiating between false and positive terms would be useful): https://miro.medium.com/max/1400/1*fxiTNIgOyvAombPJx5KGeA.png

Include slug and description in search terms

Currently the search bar searches through the term.title.

As a user, I want to be able to search for a term by matching the slug or description of the term so that I can find the definition I'm interested in.

UAT: Searching for "natural" will return the current Machine Learning definition
UAT: Searching for "ai" will return the Artificial Intelligence definition
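
The matching rule in this user story can be sketched as a case-insensitive substring search across three fields. The site is written in JavaScript, so this Python is only an illustration of the logic; the field names follow the ones mentioned above, and the function name is hypothetical.

```python
from typing import Dict, List

SEARCH_FIELDS = ("title", "slug", "description")


def search_terms(terms: List[Dict[str, str]], query: str) -> List[Dict[str, str]]:
    """Return the terms whose title, slug or description contains the query."""
    needle = query.strip().casefold()
    if not needle:
        return list(terms)  # an empty search shows everything
    return [
        term
        for term in terms
        if any(needle in term.get(field, "").casefold() for field in SEARCH_FIELDS)
    ]
```

One consequence of plain substring matching is that a short query such as "ai" also matches any description containing those letters, so the second acceptance test may need the slug match to be ranked first rather than being the only result.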

[NEW] Deep Learning

Thank you for suggesting a new term, please enter its details below:

Title: Deep Learning

Description: An approach to building models using neural networks with more than one "hidden" layer of artificial neurons. This is a common approach when working with image and text data.

Deep learning models are able to capture complex relationships, but it can be difficult to interpret what data leads to a particular outcome.

Related terms: machine-learning, supervised, ai, neural-network, explainability

Reason for including new term including references if appropriate: important term, suggested by Moyeen and Sharan from the Hub

Json-validation - check self-reference

Currently a related term can relate to itself
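
A check for this could sit alongside the existing JSON validation. The sketch below assumes each term is an object with a `slug` and a list of related slugs under a key called `related`; both the key name and the function name are assumptions, since the actual schema is not shown here.

```python
from typing import Dict, List


def find_self_references(terms: List[Dict]) -> List[str]:
    """Return the slugs of terms that list themselves as a related term."""
    return [
        term["slug"]
        for term in terms
        if term["slug"] in term.get("related", [])
    ]
```

Validation would then fail whenever the returned list is not empty.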

Create Github community collateral

e.g. issue templates, PR templates, contributing, license etc.

Add dead-link checker

Add automation to check for dead (404, DNR) links in the terms list.
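
One possible shape for this automation, using only the Python standard library, is sketched below. It assumes links appear as plain URLs inside each term's `description`, treats any status of 400 or above as dead, and treats a failed connection as not resolving; the function names and those thresholds are assumptions.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Optional, Tuple

URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")


def extract_urls(text: str) -> List[str]:
    """Find URLs in free text, dropping trailing sentence punctuation."""
    return [url.rstrip(".,;") for url in URL_PATTERN.findall(text)]


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for a HEAD request, or None if the host cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404
    except OSError:
        return None  # DNS failure, refused connection or timeout


def find_dead_links(
    terms: List[Dict[str, str]],
    fetch_status: Callable[[str], Optional[int]] = http_status,
) -> List[Tuple[str, str]]:
    """Return (slug, url) pairs for links that are unreachable or return an error status."""
    dead = []
    for term in terms:
        for url in extract_urls(term.get("description", "")):
            status = fetch_status(url)
            if status is None or status >= 400:
                dead.append((term.get("slug", ""), url))
    return dead
```

Passing `fetch_status` as a parameter lets the check be tested without network access. Some servers reject HEAD requests, so a production version might retry with GET before reporting a link as dead.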

[SUGGESTION] Add further reading links

From MX, via Hub/email:

What would be helpful is to add extra references in each of the words to allow users to dig a bit more into each of the concepts. Perhaps wikipedia links might be useful.

[CORRECTION] Unsupervised machine learning

Thank you for finding and reporting an inaccuracy in the term, please fill in the details below:

Term: Unsupervised machine learning

Current content: A type of machine learning where you do not know the outcome or definition of your data, and are looking for patterns. This includes clustering techniques such as k-nearest neighbours and principal component analysis (PCA).

For example, unsupervised machine learning can help identify different groups of hospital patients who use hospital services in different ways.

Corrected content: A type of machine learning where you do not know the outcome or definition of your data, and are looking for patterns. This includes clustering techniques such as k-means and dimensionality reduction techniques such as principal component analysis (PCA).

For example, unsupervised machine learning can help identify different groups of hospital patients who use hospital services in different ways.

Reason for correction including reference if appropriate: k-means not knn; thanks LE from Lab
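
To show why k-means fits here where k-nearest neighbours does not, the sketch below clusters unlabelled numbers into groups without any known outcome. It is a deliberately small one-dimensional version; the function name, the deterministic choice of starting centres and the example visit counts are all assumptions for illustration.

```python
from typing import List, Tuple


def k_means_1d(values: List[float], k: int, iterations: int = 20) -> Tuple[List[float], List[List[float]]]:
    """Group numbers into k clusters by repeatedly assigning each to its nearest centre."""
    ordered = sorted(values)
    # Assumption: start with centres spread evenly across the sorted values.
    centres = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    clusters: List[List[float]] = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda c: abs(value - centres[c]))
            clusters[nearest].append(value)
        # Move each centre to the mean of its cluster; keep it in place if the cluster is empty.
        centres = [
            sum(cluster) / len(cluster) if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return centres, clusters


# Hypothetical yearly hospital visits for six patients.
centres, clusters = k_means_1d([1, 2, 3, 20, 21, 22], k=2)
```

With these made-up visit counts the algorithm separates occasional users from frequent users without being told which group any patient belongs to.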

[NEW] Various terms

Thank you for suggesting a new term, please enter its details below:

Title: Synthetic Data, Gradient descent, Binary, Sequential data

Description:

Related terms:

Reason for including new term including references if appropriate: Inbound over email from Dr. Z.A.

Add pre-commit hooks

To validate JSON schema

Replace NHSX logo with NHS logo

[CORRECTION] Validation data

Thank you for finding and reporting an inaccuracy in the term, please fill in the details below:

Term: Validation data

Current content: Data that is not included in the training data, but is used to improve the performance of an AI model as it is trained.

This definition relates to the definition within AI, and not the regulatory aspects of medical devices.

Corrected content: Data that are not included in the training data, but are used to improve the performance of an AI model as it is trained.

This definition relates to the definition within AI, and not the regulatory aspects of medical devices.

Reason for correction including reference if appropriate: data is a plural word – therefore the statement should read: Data that are not…..but are used…….

[IMPROVEMENT] Versioning of definitions

Suggestion from HR: Have you got plans to tag/semver it so that one can refer to it from a QMS? I was thinking that you might want to separate the related terms so that the versioned term definitions are cleaner.

[NEW] Graph Neural Networks

Thank you for suggesting a new term, please enter its details below:

Title:
Graph Neural Networks

Description:
Graph Neural Networks (GNNs) are a class of deep learning methods designed to perform inference on data described by graphs.

Graphs are a very powerful way of representing data, relationships and their complexity. Machine learning models can be trained to learn relationships in graphs and to predict their features as more data is integrated into the graph.

GNNs are neural networks that can be directly applied to graphs, and provide an easy way to do node-level, edge-level, and graph-level prediction tasks.

In recent years, variants of GNNs such as the graph convolutional network (GCN), graph attention network (GAT) and graph recurrent network (GRN) have demonstrated ground-breaking performance on many deep learning tasks.

An example of healthcare data representable as a graph is a psychopathology network, which consists of aspects (e.g., symptoms) of mental disorders (nodes) and the connections between those aspects (edges). A GNN trained on this graph will be able to predict disorders based on the provided symptoms.

Related terms:
Neural Networks
Models
Deep Learning

Reason for including new term including references if appropriate:
Reference:
1- https://neptune.ai/blog/graph-neural-network-and-some-of-gnn-applications
2- https://arxiv.org/abs/1812.08434

[IMPROVEMENT] Machine Learning

Reference https://www.gov.uk/government/publications/good-machine-learning-practice-for-medical-device-development-guiding-principles

[NEW] Federated Learning

Thank you for suggesting a new term, please enter its details below:

Title: Federated Learning

Description:

Related terms: Data protection, machine learning

Reason for including new term including references if appropriate: Suggestion from Alasdair R. (LinkedIn)

[NEW] XAI

XAI

Typically, AI solutions adopt a "black box" approach in which it is impossible (or at the least very difficult) to explain how the model generated a specific answer. XAI, short for Explainable Artificial Intelligence, refers to when it is possible for humans to understand how the results of an AI model were obtained.

[NEW] GAN

Thank you for suggesting a new term, please enter its details below:

Title: Generative Adversarial Network (GAN)

Description:

Related terms: Neural Network, Deep Learning, AI

Reason for including new term including references if appropriate: Suggestion from Alasdair R. (LinkedIn)

[CORRECTION] Data

Thank you for finding and reporting an inaccuracy in the term, please fill in the details below:

Term: Data

Current content: Information stored in a digital way.

For example, this can be information on your physical state such as heart rate, blood pressure, or notes on your recent visit to your primary care physician. A picture can be data, as can audio but we are struggling to standardise how to represent smell (for the time being).

Corrected content: Information stored in a digital way.

For example, this can be information on your physical state such as heart rate, blood pressure, or notes on your recent visit to your primary care physician.

Imaging data is a common type of healthcare data, which includes data generated from X-ray machines, CT scanners, MRI scanners, OCT systems etc.

Reason for correction including reference if appropriate: I wasn't sure why smell was mentioned – and actually smell essentially detects volatile compounds in the skin so that would be the data to collect https://pubmed.ncbi.nlm.nih.gov/21079799/ from Rubeta/Hub

[NEW] Confusion Matrix

Thank you for suggesting a new term, please enter its details below:

Confusion Matrix

A tool that is used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.

False Positive
False Negative
True Positive
True Negative

Reason for including new term including references if appropriate:
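
The four cells listed above can be counted directly from a model's predictions and the known true values. This is a minimal sketch for a two-class (binary) classifier; the function name and the dictionary keys are assumptions for the example.

```python
from typing import Dict, Sequence


def confusion_matrix(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, int]:
    """Count the four outcomes of a binary classifier against the known true values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"true_positive": 0, "false_positive": 0, "true_negative": 0, "false_negative": 0}
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["true_positive"] += 1
        elif guess and not truth:
            counts["false_positive"] += 1   # flagged, but not actually positive
        elif not guess and truth:
            counts["false_negative"] += 1   # missed a real positive
        else:
            counts["true_negative"] += 1
    return counts
```

Metrics such as precision, recall and the F1 score are all calculated from these four counts.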

Allow hyperlinks in descriptions

As a term writer, I need to include hyperlinks to other websites, so that I can direct readers to further information.

Current naive attempts at including HTML a tags in the terms.json file yield:

I assume we want search to exclude URLs as it may trigger false positives?
