coursera-uw-machine-learning-clustering-retrieval's Introduction

Coursera UW Machine Learning Clustering & Retrieval

The course can be found on Coursera.

Notebooks for quick reference can be found on my blog, SSQ.

Videos are on Bilibili (where I uploaded them).

  • Week 1 Intro

  • Week 2 Nearest Neighbor Search: Retrieving Documents

    • Implement nearest neighbor search for retrieval tasks
    • Contrast document representations (e.g., raw word counts, tf-idf,…)
      • Emphasize important words using tf-idf
    • Contrast methods for measuring similarity between two documents
      • Euclidean vs. weighted Euclidean
      • Cosine similarity vs. similarity via unnormalized inner product
    • Describe complexity of brute force search
    • Implement KD-trees for nearest neighbor search
    • Implement LSH for approximate nearest neighbor search
    • Compare pros and cons of KD-trees and LSH, and decide which is more appropriate for a given dataset
    • Choosing features and metrics for nearest neighbor search
    • Implementing Locality Sensitive Hashing from scratch
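
The Week 2 objectives on document representation, similarity, and brute-force search can be illustrated with a minimal sketch. It is not the course's notebook code: the function names, the toy documents, and the unsmoothed `log(N / df)` idf are assumptions chosen for brevity.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    return [
        {w: count * math.log(n / df[w]) for w, count in Counter(doc).items()}
        for doc in docs
    ]

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_neighbor(query_index, vectors):
    """Brute-force search: compare the query against every other document."""
    best, best_sim = None, -1.0
    for i, vec in enumerate(vectors):
        if i == query_index:
            continue
        sim = cosine_similarity(vectors[query_index], vec)
        if sim > best_sim:
            best, best_sim = i, sim
    return best, best_sim

# Toy corpus (hypothetical data, for illustration only).
docs = [
    "the cat sat on the mat".split(),
    "the cat chased the mouse".split(),
    "stocks fell on the market".split(),
]
vectors = tf_idf(docs)
print(nearest_neighbor(0, vectors))
```

A word that appears in every document, such as "the" here, gets weight zero, which is how tf-idf emphasizes the rarer, more informative words. Each query scans all N documents, which is the linear cost that KD-trees and LSH try to reduce.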
  • Week 3 Clustering with k-means

    • Describe potential applications of clustering
    • Describe the input (unlabeled observations) and output (labels) of a clustering algorithm
    • Determine whether a task is supervised or unsupervised
    • Cluster documents using k-means
    • Interpret k-means as a coordinate descent algorithm
    • Define data parallel problems
    • Explain the Map and Reduce steps of the MapReduce framework
    • Use existing MapReduce implementations to parallelize k-means, understanding what’s being done under the hood
    • Clustering text data with k-means
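
The coordinate descent view of k-means from Week 3 can be sketched as two alternating steps: fix the centroids and update the assignments, then fix the assignments and update the centroids. The code below is a plain-Python illustration under assumed names and toy data, with the initial centroids passed in explicitly rather than chosen by any particular initialization scheme.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on tuples of floats, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    assignments = None
    for _ in range(max_iter):
        # Step 1: with centroids fixed, assign each point to its closest centroid.
        new_assignments = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # No assignment changed, so the algorithm has converged.
        assignments = new_assignments
        # Step 2: with assignments fixed, move each centroid to its cluster mean.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # Leave an empty cluster's centroid where it is.
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments

# Toy data (hypothetical): two well-separated groups in 2-D.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, [(0, 0), (10, 10)])
print(centroids, assignments)
```

Step 1 is independent per point, which is what makes it a data-parallel Map step; Step 2 aggregates per cluster, which corresponds to a Reduce step.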
  • Week 4 Mixture Models: Model-Based Clustering

    • Interpret a probabilistic model-based approach to clustering using mixture models
    • Describe model parameters
    • Motivate the utility of soft assignments and describe what they represent
    • Discuss issues related to how the number of parameters grows with the number of dimensions
      • Interpret diagonal covariance versions of mixtures of Gaussians
    • Compare and contrast mixtures of Gaussians and k-means
    • Implement an EM algorithm for inferring soft assignments and cluster parameters
      • Determine an initialization strategy
      • Implement a variant that helps avoid overfitting issues
    • Implementing EM for Gaussian mixtures
    • Clustering text data with Gaussian mixtures
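
As an illustration of the Week 4 EM objectives, here is a minimal sketch for a one-dimensional mixture of Gaussians. The function names, the toy data, and the `var_floor` guard are assumptions: the floor is one simple way to stop a component's variance from collapsing to zero, and may differ from the variant used in the course.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=50, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture, starting from the given parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: soft assignments (responsibilities) of each point to each cluster.
        resp = []
        for x in data:
            likes = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: re-estimate parameters from the soft counts.
        for j in range(k):
            soft_count = sum(r[j] for r in resp)
            weights[j] = soft_count / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / soft_count
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / soft_count
            variances[j] = max(var_floor, var)  # Assumed guard against collapse.
    return means, variances, weights, resp

# Toy data (hypothetical): two groups centered near 0 and near 5.
data = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
means, variances, weights, resp = em_gmm_1d(
    data, means=[0.0, 4.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print(means, weights)
```

Unlike k-means, each row of `resp` is a probability distribution over clusters rather than a single hard label.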
  • Week 5 Latent Dirichlet Allocation: Mixed Membership Modeling

    • Compare and contrast clustering and mixed membership models
    • Describe a document clustering model for the bag-of-words document representation
    • Interpret the components of the LDA mixed membership model
    • Analyze a learned LDA model
      • Topics in the corpus
      • Topics per document
    • Describe Gibbs sampling steps at a high level
    • Utilize Gibbs sampling output to form predictions or estimate model parameters
    • Implement collapsed Gibbs sampling for LDA
    • Modeling text topics with Latent Dirichlet Allocation
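
The collapsed Gibbs sampling step for LDA listed under Week 5 can be sketched as follows. This is an illustrative toy, not the course's implementation: the function name, the hyperparameter defaults (`alpha`, `beta`), the integer word ids, and the tiny corpus are all assumptions.

```python
import random

def collapsed_gibbs_lda(docs, n_topics, vocab_size,
                        alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for LDA on documents given as lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_dk counts
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_kw counts
    topic_total = [0] * n_topics                              # n_k counts
    z = []
    # Random initial topic for every word occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            t = rng.randrange(n_topics)
            z_d.append(t)
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
        z.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                t = z[d][i]
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # Unnormalized conditional probability of each topic for this word.
                probs = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                # Sample a new topic and add it back into the counts.
                t = rng.choices(range(n_topics), weights=probs)[0]
                z[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return z, doc_topic, topic_word

# Toy corpus (hypothetical): word ids 0-1 and 3-4 tend to co-occur.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 2], [0, 1, 1, 0], [4, 3, 3, 4]]
z, doc_topic, topic_word = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=5)
print(doc_topic)
```

The returned counts are what one would normalize (after adding `alpha` or `beta`) to estimate topics per document and words per topic.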
  • Week 6 Hierarchical Clustering & Closing Remarks

    • Bonus content: Hierarchical clustering
      • Divisive clustering
      • Agglomerative clustering
        • The dendrogram for agglomerative clustering
        • Agglomerative clustering details
    • Hidden Markov models (HMMs): Another notion of “clustering”
    • Modeling text data with a hierarchy of clusters
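
For the agglomerative clustering bonus content in Week 6, a minimal sketch is shown below. It assumes single linkage (the distance between two clusters is the smallest distance between any pair of their points), which is only one of several possible linkage choices; the function names and toy data are hypothetical. The returned merge list is the information a dendrogram plots.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> point indices
    merges = []
    next_id = len(points)

    def linkage(a, b):
        # Single linkage: closest pair of points across the two clusters.
        return min(euclidean(points[i], points[j])
                   for i in clusters[a] for j in clusters[b])

    while len(clusters) > 1:
        ids = sorted(clusters)
        pairs = [(x, y) for idx, x in enumerate(ids) for y in ids[idx + 1:]]
        a, b = min(pairs, key=lambda pair: linkage(*pair))
        height = linkage(a, b)
        # The merged cluster gets a fresh id, as in a dendrogram's internal nodes.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
        next_id += 1
    return merges

# Toy data (hypothetical): four points on a line.
points = [(0.0,), (1.0,), (5.0,), (5.5,)]
merges = single_linkage(points)
print(merges)
```

Cutting the merge history at a chosen height yields a flat clustering: here, stopping before the last merge leaves the two groups {0, 1} and {2, 3}.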
