

LDA (Latent Dirichlet Allocation)

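This project is written in Python 3. It is an example that uses scikit-learn to compute word frequencies from news data and fits an LDA topic model on them.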

LDA Introduction:

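LDA is a topic model: it analyzes a document and computes the probability that the document belongs to each topic. For example, if a document contains many words such as Apple, Samsung, Huawei, and Meizu, it most likely belongs to the topic of mobile phones.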

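First, a brief statement of the core idea of LDA:
We assume that every document (Doc) is composed of multiple topics (Topic), and that every topic is composed of multiple words (Word).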

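After tokenizing (or extracting words from) every document d in the corpus D and training the model, we obtain two probability matrices:
First, for each Doc, the probabilities over the K topics;
Second, for each Topic, the probabilities over the vocabulary of N words.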

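Note that because LDA is based on word-frequency statistics, raw term counts are generally used as document features rather than TF-IDF.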
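
To make the difference concrete, here is a small sketch comparing the two feature types, assuming scikit-learn's `CountVectorizer` and `TfidfVectorizer`. The two-document corpus is hypothetical. Count features are whole-number word occurrences, which is the kind of input LDA models, while TF-IDF produces reweighted fractional values.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical two-document corpus for illustration only.
docs = ["phone phone battery", "phone camera"]

# Columns follow the sorted vocabulary: battery, camera, phone.
count_matrix = CountVectorizer().fit_transform(docs).toarray()
tfidf_matrix = TfidfVectorizer().fit_transform(docs).toarray()

print(count_matrix)           # integer counts: [[1 0 2], [0 1 1]]
print(tfidf_matrix.round(2))  # fractional, reweighted values
```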

