alanzanardi / hematological-analysis

License: MIT License

Python 100.00%
anova-analysis statistical-analysis statistics

hematological-analysis's Introduction

statistical-tools

Here I have collected two scripts written in Python and SQL, designed for analyzing data related to physiological parameters derived from experimental measurements. These tools were created to expedite the statistical analysis process, extracting and sorting data from tabular-format datasets, in my specific case studies.

I hope these scripts are useful to you as examples from which to draw inspiration for your own cases.

*** DISCLAIMER *** These scripts were written to help me with my laboratory data analysis work. The examples I use to explain how they work show entirely random values; in any case, all data contained here should be treated as confidential. Thank you for your support.

How it works 🔧

Both scripts are designed to collect specific data from my dataset (namely, haematological_dataset.db). They differ in where the data are stored: haematological_analysis_local_folder reads a local database located in the project folder (not included in the repository), while haematological_analysis_host reads a database located on a host (in my case, localhost).
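As a rough illustration of the local-folder case, the sketch below reads one timepoint from a SQLite database using Python's standard sqlite3 module. The table name (measurements), the column layout and the helper load_measurements are assumptions made for this example and are not taken from the repository; the values are made up, and an in-memory database stands in for haematological_dataset.db.

```python
import sqlite3


def load_measurements(conn, timepoint):
    """Return (genotype, sex, rcb, hgb) rows for one timepoint, in months."""
    query = (
        "SELECT genotype, sex, rcb, hgb FROM measurements "
        "WHERE timepoint = ? ORDER BY genotype, sex"
    )
    return conn.execute(query, (timepoint,)).fetchall()


# Hypothetical schema and made-up values; an in-memory database is used
# here in place of the real file, e.g. sqlite3.connect("haematological_dataset.db").
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements ("
    "subject_id INTEGER, genotype TEXT, sex TEXT, "
    "timepoint INTEGER, rcb REAL, hgb REAL)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "WT", "M", 6, 9.1, 14.2),
        (2, "KO", "F", 6, 8.4, 13.0),
        (3, "HE", "M", 6, 8.9, 13.8),
        (1, "WT", "M", 12, 9.0, 14.0),
    ],
)

rows = load_measurements(conn, 6)
for row in rows:
    print(row)
```

For the host case, presumably only the connection object would change, with the query logic staying much the same. Note that the parameter placeholder style (? here) varies between database drivers.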

What was my problem? 🔍

The type of dataset I needed to analyze was relatively simple.

In my case, the dataset came from haematological analyses performed on blood samples from different subjects (both male and female) at different timepoints (6, 12, 18 and 24 months). The subjects were grouped by genotype (knock-out, KO; heterozygous, HE; and wild-type, WT).

The haematological parameters measured are: red blood cell count, rcb; haemoglobin level, hgb; hematocrit, hct; mean corpuscular volume, mcv; mean corpuscular haemoglobin, mch; mean corpuscular haemoglobin concentration, mchc; red cell distribution width - standard deviation, rdw_sd; reticulocyte number, ret_num; reticulocyte percentage, ret_perc; platelet count, plt; white blood cell count, wbc; reticulocyte haemoglobin content, ret_he.

An example of such a dataset is shown in Figure 1, left panel.

My goal was to group the data at each timepoint by the parameter considered (see Figure 1, right panels), in order to perform analysis of variance and post hoc tests on the three genotypes using GraphPad Prism.

Given the number of haematological parameters measured, the timepoints considered and the potentially significant differences between males and females, this process used to be time-consuming (approximately two hours per analysis).

Figure 1

Why was the project useful? 💡

These scripts were created to automate the data collection process from my dataset, covering all the steps I previously performed manually, from the Excel file to the GraphPad Prism analysis.

Data are collected at specific timepoints (entered as input at the start) and grouped by the haematological parameter considered (in my case, all parameters).
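The grouping step can be sketched in plain Python as follows. This is only a minimal illustration, not the repository's code: the function name group_by_genotype, the dictionary-per-row layout and the sample values are all assumptions made for the example.

```python
from collections import defaultdict


def group_by_genotype(rows, parameter, timepoint, sex=None):
    """Collect the values of one parameter at one timepoint, keyed by genotype.

    rows is a list of dicts with at least "genotype", "sex" and "timepoint"
    keys (a hypothetical layout). Missing values (None) are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        if row["timepoint"] != timepoint:
            continue
        if sex is not None and row["sex"] != sex:
            continue
        value = row.get(parameter)
        if value is None:
            continue
        groups[row["genotype"]].append(value)
    return dict(groups)


# Made-up values, for illustration only.
sample_rows = [
    {"genotype": "WT", "sex": "M", "timepoint": 6, "hgb": 14.2},
    {"genotype": "WT", "sex": "F", "timepoint": 6, "hgb": 13.9},
    {"genotype": "KO", "sex": "M", "timepoint": 6, "hgb": 12.8},
    {"genotype": "KO", "sex": "M", "timepoint": 12, "hgb": 12.5},
    {"genotype": "HE", "sex": "F", "timepoint": 6, "hgb": None},
]

all_subjects = group_by_genotype(sample_rows, "hgb", 6)
males_only = group_by_genotype(sample_rows, "hgb", 6, sex="M")
print(all_subjects)
print(males_only)
```

Each resulting list holds the values for one genotype, which is the shape that a three-group comparison expects.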

Following this, the algorithm runs the Shapiro-Wilk test to check whether the data are normally distributed, which determines the type of test to use (parametric vs. non-parametric); it then calculates the p-value with ANOVA or the Kruskal-Wallis test, and runs post hoc tests (Tukey’s or Dunn’s, depending on the data distribution) for multiple comparisons.
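To make the non-parametric branch more concrete, here is a hand-written sketch of the Kruskal-Wallis H test for exactly three groups (such as WT, HE and KO). It is not the repository's implementation; in practice a statistics library would normally do this (SciPy, for example, provides scipy.stats.kruskal). The sketch relies on the fact that, with three groups, H is compared against a chi-square distribution with 2 degrees of freedom, whose upper tail probability is exp(-H/2). The function names and sample values are made up for illustration.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_three_groups(group_a, group_b, group_c):
    """Return (H, p) for three non-empty groups, with tie correction.

    The p-value uses the chi-square approximation with 2 degrees of
    freedom, so this helper is only valid for exactly three groups.
    """
    groups = [group_a, group_b, group_c]
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one value")
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the three groups.
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    return h, math.exp(-h / 2)


# Made-up values, for illustration only.
wt = [1.0, 2.0, 3.0]
he = [4.0, 5.0, 6.0]
ko = [7.0, 8.0, 9.0]
h_stat, p_value = kruskal_wallis_three_groups(wt, he, ko)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")  # H = 7.20, p = 0.0273
```

With very small groups the chi-square approximation is rough, so the p-value above should be read as indicative only.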

At this stage of the algorithm, a report displaying p-values and post hoc test results for each parameter is printed in the terminal (see Figure 2).

Figure 2


In addition, a brief preview of the graph for each parameter considered is displayed, giving an overview of the data distribution in each case (see Figure 3).

Figure 3


Using these scripts, I was able to save several hours of unproductive work. 🥰
