dblp-coauthor's Introduction

DBLP-Coauthor-Mining (mining co-authors from the DBLP dataset)

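Details

For a detailed write-up, see the post "数据挖掘实战之DBLP中合作者挖掘(Python+Hadoop)" (Data Mining in Practice: Mining Co-authors in DBLP with Python and Hadoop).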

File descriptions

getAuthors.py

Download the DBLP dataset dblp.xml from http://dblp.uni-trier.de/xml/ into this directory.
Run getAuthors.py to produce authors.txt.
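The extraction step can be pictured with a minimal sketch like the one below. It is not the code in getAuthors.py: the set of record tags, the comma-separated output format, and the helper name `iter_author_lists` are all assumptions made for illustration. The real dblp.xml also relies on character entities declared in its DTD, which may need extra handling that this sketch leaves out.

```python
import io
import xml.etree.ElementTree as ET

# Record tags treated as publications (an assumption for this sketch).
PUBLICATION_TAGS = {"article", "inproceedings"}

def iter_author_lists(xml_source):
    """Yield the list of author names for each publication record."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag in PUBLICATION_TAGS:
            authors = [a.text.strip() for a in elem.iter("author") if a.text]
            if authors:
                yield authors
            elem.clear()  # release the finished record; dblp.xml is very large

# A tiny in-memory stand-in for dblp.xml (hypothetical data).
sample = io.BytesIO(b"""<dblp>
<article><author>A. Smith</author><author>B. Jones</author><title>T1</title></article>
<inproceedings><author>A. Smith</author><title>T2</title></inproceedings>
</dblp>""")

author_lists = list(iter_author_lists(sample))
# One line per publication, names joined by commas (assumed layout of authors.txt).
lines = [",".join(names) for names in author_lists]
```

Streaming with `iterparse` and clearing each finished record keeps memory use low, which matters for a file of this size.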

encode.py

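Running this script encodes the authors.txt file from the previous step, assigning each author name a consecutive integer in order of first appearance. It produces the encoded file authors_encoded.txt and the name-to-ID mapping file authors_index.txt. In the mapping file, a name's ID is its line number minus 1, so IDs start from 0.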
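The encoding rule described above can be sketched as follows. The function name `encode_authors` and the in-memory lists are assumptions for illustration; the actual script reads and writes the files named above.

```python
def encode_authors(author_lists):
    """Assign each name an integer ID in order of first appearance, from 0."""
    index = {}    # name -> ID
    encoded = []  # one list of IDs per publication
    for names in author_lists:
        row = []
        for name in names:
            if name not in index:
                index[name] = len(index)  # next unused ID
            row.append(index[name])
        encoded.append(row)
    # Dicts preserve insertion order, so position k in this list holds ID k,
    # i.e. line k + 1 of the index file.
    names_by_id = list(index)
    return encoded, names_by_id

encoded, names_by_id = encode_authors(
    [["A. Smith", "B. Jones"], ["B. Jones", "C. Wu"]]
)
```

Here `encoded` plays the role of authors_encoded.txt and `names_by_id` the role of authors_index.txt.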

view.data.py

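Reads authors.txt, counts how many authors meet each support level, and plots the resulting curve to help determine the approximate range of the support threshold.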
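A minimal sketch of the counting behind that curve is shown below. It assumes that an author's support is the number of publications the author appears in, and it leaves out the plotting; the name `authors_per_support` is hypothetical.

```python
from collections import Counter

def authors_per_support(author_lists, thresholds):
    """For each threshold, count the authors appearing in at least that many records."""
    paper_counts = Counter(
        name for names in author_lists for name in set(names)
    )
    return {
        t: sum(1 for c in paper_counts.values() if c >= t)
        for t in thresholds
    }

# Hypothetical records: one list of author names per publication.
records = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
result = authors_per_support(records, [1, 2, 3])
```

Plotting the thresholds against these counts gives the curve used to pick a reasonable support range.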

final.py

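Based mainly on the example in《机器学习实战》(Machine Learning in Action). The results are written to result*.txt files. Note that the final results are additionally filtered by confidence.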
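As a rough illustration of confidence filtering, the sketch below keeps a rule A -> B only when support(A, B) / support(A) reaches a minimum value. It is restricted to author pairs for brevity; the `support` dictionary layout and the name `confident_rules` are assumptions, not the structure used in final.py.

```python
def confident_rules(support, min_conf):
    """Return (antecedent, consequent, confidence) for pairs passing min_conf."""
    rules = []
    for itemset, count in support.items():
        if len(itemset) != 2:
            continue  # this sketch only derives rules from pairs
        for antecedent in itemset:
            (consequent,) = itemset - {antecedent}
            conf = count / support[frozenset([antecedent])]
            if conf >= min_conf:
                rules.append((antecedent, consequent, conf))
    return sorted(rules)

# Hypothetical support counts for single authors and one pair.
support = {
    frozenset(["a"]): 4,
    frozenset(["b"]): 2,
    frozenset(["a", "b"]): 2,
}
rules = confident_rules(support, 0.6)
```

With these counts, b -> a has confidence 2/2 and is kept, while a -> b has confidence 2/4 and is dropped.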

mapper.py & reduce.py

The Map and Reduce scripts for the first MapReduce round. In essence, this round is a word count.
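The word-count pattern can be sketched as a pair of plain Python generators. This is not the repository's mapper.py and reduce.py: it assumes comma-separated author names per line, and a local `sorted` call stands in for the shuffle-and-sort phase that Hadoop performs between the two scripts.

```python
from itertools import groupby

def mapper(lines):
    """Emit (author, 1) for every author occurrence."""
    for line in lines:
        for name in line.strip().split(","):
            if name:
                yield name, 1

def reducer(sorted_pairs):
    """Sum the counts of each author; input must be sorted by author."""
    for name, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield name, sum(count for _name, count in group)

# Hypothetical input: one publication per line.
lines = ["a,b\n", "a,c\n", "a,b\n"]
counts = dict(reducer(sorted(mapper(lines))))
```

The resulting per-author counts are the item supports that an FP-growth pass needs before it can order and prune items.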

mapper2.py & reduce2.py

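The Map and Reduce scripts for the second MapReduce round. Note that the output here is not the complete mining result but the conditional pattern sets; they may be converted into final results when time permits. (This experiment was only meant to verify that FP-growth can be implemented in a distributed setting, so complete results are not provided.)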
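One way to picture conditional pattern sets coming out of a map and reduce pair is the sketch below. It is an illustration under assumptions, not the logic of mapper2.py and reduce2.py: each record keeps only its frequent items, orders them by descending count, and emits every item together with the prefix that precedes it. The reducer then groups the prefixes by item. All names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def mapper(transactions, item_counts, min_support):
    """Emit (item, prefix path) pairs from frequency-ordered records."""
    for items in transactions:
        frequent = [i for i in set(items) if item_counts.get(i, 0) >= min_support]
        # Most frequent first; ties broken by name so the order is deterministic.
        frequent.sort(key=lambda i: (-item_counts[i], i))
        for pos in range(1, len(frequent)):
            yield frequent[pos], tuple(frequent[:pos])

def reducer(pairs):
    """Group prefix paths by item, counting how often each path occurs."""
    bases = defaultdict(Counter)
    for item, prefix in pairs:
        bases[item][prefix] += 1
    return bases

transactions = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["d", "a"]]
item_counts = {"a": 4, "b": 2, "c": 2, "d": 1}  # e.g. from a first counting round
bases = reducer(mapper(transactions, item_counts, 2))
```

Each entry of `bases` is the conditional pattern base of one item. Turning these into frequent itemsets still requires mining each base, which is the step the text above describes as not yet done.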

viewRelation.py

Adds visualization of the relationships between an author and their co-authors, using the networkx package.
