kmader / quantitative-big-imaging-2019 Goto Github PK

View Code? Open in Web Editor NEW

27.0 4.0 19.0 266.74 MB

The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2019

Home Page: http://kmader.github.io/Quantitative-Big-Imaging-2019

License: Apache License 2.0

Shell 0.01% Jupyter Notebook 99.95% HTML 0.04% CSS 0.01% Python 0.01%

image-processing lecture-slides exercises lecture-video reproducible-science

quantitative-big-imaging-2019's Introduction

Quantitative Big Imaging Course 2019

Here are the lectures, exercises, and additional course materials corresponding to the spring semester 2019 course at ETH Zurich, 227-0966-00L: Quantitative Big Imaging.

The lectures have been prepared and given by Kevin Mader and associated guest lecturers. Please note the Lecture Slides and PDF do not contain source code, this is only available in the handout file. Some of the lectures will be recorded and placed on YouTube on the QBI Playlist. The lectures are meant to be followed in chronological order and each lecture has a corresponding hands-on exercise. The entire lecture set is available as a single PDF file available in the releases section

Learning Objectives

General

Ability to compare qualitative and quantitative methods and name situations where each would be appropriate
Awareness of the standard process of image processing, the steps involved and the normal order in which they take place
Ability to create and evaluate quantitative metrics to compare the success of different approaches/processes/workflows
Appreciation of automation and which steps it is most appropriate for
The relationship between automation and reproducibility for analysis

Image Enhancement

Awareness of the function enhancement serves and the most commonly used methods
Knowledge of limitations and new problems created when using/overusing these techniques

Segmentation

Awareness of different types of segmentation approaches and strengths of each
Understanding of when to use automatic methods and when they might fail

Shape Analysis

Knowledge of which types of metrics are easily calculated for shapes in 2D and 3D
Ability to describe a physical measurement problem in terms of shape metrics
Awareness of common metrics and how they are computed for arbitrary shapes

Statistics / Big Data

Awareness of common statistical techniques for hypothesis testing
Ability to design basic experiments to test a hypothesis
Ability to analyze and critique poorly designed imaging experiments
Familiarity with vocabulary, tools, and main concepts of big data
Awareness of the differences between normal and big data approaches
Ability to explain MapReduce and apply it to a simple problem

Target Audience

The course is designed with both advanced undergraduate and graduate level students in mind. Ideally students will have some familiarity with basic manipulation and programming in languages like Python (Matlab or R are also reasonable starting points). Much of the material is available as visual workflows in a tool called KNIME, although these are less up to date than the Python material. Interested students who are worried about their skill level in this regard are encouraged to contact Kevin Mader directly ([email protected]).

Students with very diverse academic backgrounds have done well in the course (Informatics to Art History to Agriculture).
Successful students typically spent a few hours a week working on the exercises to really understand the material.
More advanced students who are already very familiar with Python, C++, or Java are also encouraged to take the course and will have to opportunity to develop more of their own tools or explore topics like machine learning in more detail.

Slack

For communicating, discussions, asking questions, and everything, we will be trying out Slack this year. You can sign up under the following link. It isn't mandatory, but it seems to be an effective way to engage collaboratively How scientists use slack

Weekly Plan

21st February - Introduction and Workflows

Exercises

28th February - Ground Truth: Building and Augmenting Datasets

Exercises

Data Augmentation in KNIME

7th March - Image Enhancement (Guest Lecture - A. Kaestner)

Exercises

Overview

14th March - Basic Segmentation, Discrete Binary Structures

Exercises

Overview

21th March - Advanced Segmentation

Exercises

Overview

28th March - Analyzing Single Objects, Shape and Texture

Exercises

Overview

4th April - Analyzing Complex Objects

Exercises

Overview

11th April - Dynamic Experiments

Exercises

18th April - Statistics, Prediction, and Reproducibility

Exercises

KNIME Exercises
C. Elegans Dataset on Kaggle R Notebook or Python Notebook
Lung Segmentation [https://www.kaggle.com/kmader/dsb-lung-segmentation-algorithm/notebook](Rule-based Image Processing) and Simple Neural Network

2nd May - Scaling Up / Big Data

Exercises

9th May - Guest Lecture - High Content Screening (M. Prummer)

High Content Screening Slides - Michael Prummer / Nexus / Roche

Exercises

High Content Screening with C. Elegans
Goal is looking at what metrics accurately indicate living or dead worms and building a simple predictive model
High Content Screening using Dask/Big Data
Kaggle Overview
Shape Analysis
Processing in R

16th May - Tracking/Dynamic Experiments - Live Coding

23rd May - Project Presentations

Exercises

General Information

The exercises are based on the lectures and take place in the same room after the lecture completes. The exercises are designed to offer a tiered level of understanding based on the background of the student. We will (for most lectures) take advantage of an open-source tool called KNIME (www.knime.org), with example workflows here (https://www.knime.org/example-workflows). The basic exercises will require adding blocks in a workflow and adjusting parameters, while more advanced students will be able to write their own snippets, blocks or plugins to accomplish more complex tasks easily. The exercises from two years ago (available here are done entirely in ImageJ and Matlab for students who would prefer to stay in those environments (not recommended)

Install KNIME

Windows: https://www.dropbox.com/s/75hx7fdpnpzrh5u/knime_rsna_2018.zip?dl=0
Mac: https://www.dropbox.com/s/3tdssp67daadzix/knime_rsna_mac.zip?dl=0
(After you extract it move the KNIME.app into the /Applications/ folder)

Install Python

If you use colab, kaggle or mybinder you won't need python on your own machine but if you want to set it up in the same way the class has you can follow the instructions shown in the video here and below

Install Anaconda Python https://www.anaconda.com/distribution/#download-section
Download the course from github as a zip file
Extract the zip file
Open a terminal (or command prompt on windows)
Go to the binder folder inside the course directory (something like: Downloads/Quantitative-Big-Imaging-2019-master/binder)
Install the environment
conda env create -f environment.yml
Activate the environment conda activate qbi2019 or activate qbi2019
Go up one directory to the root of the course cd ..
Start python jupyter notebook

Assistance

The exercises will be supported by Amogha Pandeshwar and Kevin Mader. There will be office hours in ETZ H75 on Thursdays between 14-15 or by appointment.

Online Tools

The exercises will be available on Kaggle as 'Datasets' and we will be using mybinder as stated above.

Feedback (as much as possible)

Create an issue (on the group site that everyone can see and respond to, requires a Github account), issues from last year
Provide anonymous feedback on the course here
Or send direct email (slightly less anonymous feedback) to Kevin

Final Examination

The final examination (as originally stated in the course material) will be a 30 minute oral exam covering the material of the course and its applications to real systems. For students who present a project, they will have the option to use their project for some of the real systems related questions (provided they have sent their slides to Kevin after the presentation and bring a printed out copy to the exam including several image slices if not already in the slides). The exam will cover all the lecture material from Image Enhancement to Scaling Up (the guest lecture will not be covered). Several example questions (not exhaustive) have been collected which might be helpful for preparation.

Practice Questions

Projects

Overview of possible projects
Here you signup for your project with team members and a short title and description

Software Dependencies

The course, slides and exercises are primarily done using Python 3.6 and Jupyter Notebook 5.5. The binder/repo2docker-compatible environment](https://github.com/jupyter/repo2docker) can be found at binder/environment.yml. A full copy of the environment at the time the class was given is available in the wiki file. As many of these packages are frequently updated we have also made a copy of the docker image produced by repo2docker uploaded to Docker Hub at https://hub.docker.com/r/kmader/qbi2018/

All Lectures

The packages which are required for all lectures

numpy
matplotlib
scipy
scikit-image
scikit-learn
ipyvolume

Machine Learning Packages

For machine learning and big data lectures a few additional packages are required

tensorflow
pytorch
opencv
dask
dask_ndmeasure
dask_ndmorph
dask_ndfilter

Image Registration / Medical Image Data

For the image registration lecture and medical image data

itk
SimpleITK
itkwidgets

Other Material

Data Science/Python Introduction Handbook
ETH Deep Learning Course taught in the Fall Semester, also uses Python but with a much more intensive mathematical grounding and less focus on images.
EPFL Deep Learning Course taught in the Spring Semester by Francois Fleuret, uses Python and PyTorch covers theoretical topics and more advanced research topics with a number of applications and code.
FastAI Deep Learning Course and Part 2 for a very practically focused introduction to Deep Learning using the Python skills developed in QBI.
Deep Learning for Self-Driving Cars at MIT open to beginners and is designed for those who are new to machine learning, but it can also benefit advanced researchers in the field looking for a practical overview of deep learning methods and their application
Reproducible Research
Coursera Course
Course and Tools in R
Performance Computing Courses
High Performance Computing for Science and Engineering (HPCSE) I
Programming Massively Parallel Processors with CUDA
Introduction to Machine Learning (EPFL)

Additional Lectures from Previous Years

Tutorial: Python, Notebooks and Scikit

Lecture slides
Handout

Roads from Aerial Images

Javier Montoya / Computer Vision / ScopeM

Introduction to Deep Learning / Machine Learning

Presented by Aurelien Lucchi in Data Analytics Lab in D-INFK at ETHZ

Slides

quantitative-big-imaging-2019's People

Contributors

Stargazers

Watchers

Forkers

lento234 kkc-krish gsoumyendu natandrade imaginglectures sh4zkh4n traintravel nphilip1098 pableizi ahmadjaved97 00mjk habibmrad

quantitative-big-imaging-2019's Issues

Split Topology/Distribution lectures

Update PCA explanation and examples

The examples with cells are currently not great and don't really show axes or shape tensors well

K-means on images / 3D data

I'd like to cluster my vector maps with k-means in time. My number of samples are thus my number of timesteps and the featurevector are the u and v vectors for all pixels. Like this, I cannot use the information that u and v are a couple at each point in space, I just stack them behind each other. sklearn.Kmeans doesn't allow for a third dimension.

Add puppy pvalue video to stats lecture

Use medium article from @kozyrkov
https://hackernoon.com/explaining-p-values-with-puppies-af63d68005d0

Emphasize background correction more

rolling ball
CLAHE
detrend
high pass

Add links to Data viz book

https://socviz.co/lookatdata.html

Data Visualization: A practical introduction"
A stunning, beautiful, carefully researched, free, online book by
@kjhealy

Add to why distributed image processing is hard

This issue and figures perfectly show why it is hard

dask/dask-image#94

Parallel k-means

Send to student

Add example from awesome dask_image blog post

Use @mrocklin blog post as basis for big data exercise: https://blog.dask.org/2019/04/09/numba-stencil

Datasets building reference

https://link.springer.com/chapter/10.1007/978-3-319-94878-2_6

Add Databricks Cloud Example

add examples for image analysis and word count using spark

Animated gif for K-Means

show how k-means works iteratively

Add SEM angle change reconstruction kernel

Gotta catch them all python intro

Add tutorial for python bugs

https://inventwithpython.com/blog/2019/08/15/python-error-messages-gotta-catch-em-all/
@AlSweigart

Better example of vector images

Make a better example using DTI images: https://en.wikipedia.org/wiki/Diffusion_MRI#/media/File:DTI-axial-ellipsoids.jpg

Link to understanding images

http://www.cs.utoronto.ca/~fidler/teaching/2018/CSC420.html

Fix link to course wiki

Add epfl deep learning course

Merge exercises and lectures into one list

ROC Curve to Clinical Utility

https://jamanetwork.com/journals/jama/fullarticle/2748179

Considering Net Benefit During the Selection of the Best-Performing Model
Two receiver operating characteristic (ROC) curves are shown. Expected utility increases in the direction of the arrowhead. The hypotenuse of the light blue triangle marks the maximum utility achievable by taking readmission preventing actions based on the model represented by the orange curve. The blue ROC curve extends into the triangle, showing that actions based on this model have a higher utility, even though it has a lower area under the ROC.

Negative mining

https://www.reddit.com/r/computervision/comments/2ggc5l/what_is_hard_negative_mining_and_how_is_it/ckiuu9i?utm_medium=android_app&utm_source=share

Let's say I give you a bunch of images that contain one or more people, and I give you bounding boxes for each one. Your classifier will need both positive training examples (person) and negative training examples (not person).

For each person, you create a positive training example by looking inside that bounding box. But how do you create useful negative examples?

A good way to start is to generate a bunch of random bounding boxes, and for each that doesn't overlap with any of your positives, keep that new box as a negative.

Ok, so you have positives and negatives, so you train a classifier, and to test it out, you run it on your training images again with a sliding window. But it turns out that your classifier isn't very good, because it throws a bunch of false positives (people detected where there aren't actually people).

A hard negative is when you take that falsely detected patch, and explicitly create a negative example out of that patch, and add that negative to your training set. When you retrain your classifier, it should perform better with this extra knowledge, and not make as many false positives.

Windows: https://www.dropbox.com/s/75hx7fdpnpzrh5u/knime_rsna_2018.zip?dl=0
Mac: https://www.dropbox.com/s/3tdssp67daadzix/knime_rsna_mac.zip?dl=0
(After you extract it move the KNIME.app into the /Applications/ folder)

from pipe_utils import flatten_step
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
digit_pipe = Pipeline([('Flatten', flatten_step),
                       ('Normalize', RobustScaler())])
digit_pipe.fit(img_data)

show_pipe(digit_pipe, img_data)
show_pipe(digit_pipe, img_data, show_hist=True)

Produces the following error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-fb8d4dbf3178> in <module>()
      6 digit_pipe.fit(img_data)
      7 
----> 8 show_pipe(digit_pipe, img_data)
      9 show_pipe(digit_pipe, img_data, show_hist=True)

~/GitHub/QBI-ethz/Quantitative-Big-Imaging-2019-master/Lectures/pipe_utils.py in show_pipe(pipe, in_data, show_hist)
     52                     last_data = step_op.predict(last_data)
     53 
---> 54         display_data(c_ax, last_data, show_hist)
     55         c_ax.set_title('Step {} {}\n{}'.format(i, last_data.shape, step_name))
     56         c_ax.axis('on')

~/GitHub/QBI-ethz/Quantitative-Big-Imaging-2019-master/Lectures/pipe_utils.py in display_data(in_ax, raw_data, show_hist)
     33         else:
     34             n_stack = np.stack([(x-x.mean())/x.std() for x in in_data], 0)
---> 35             in_ax.imshow(montage2d(n_stack))
     36 
     37 

TypeError: 'module' object is not callable

Computer vision notes

https://towardsdatascience.com/how-to-screw-up-a-computer-vision-project-166dfcc44a5f

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.