askintution / poemmining

This project is forked from liuhuanyong/poemmining.

A Chinese classical poetry text-mining project: corpus building with a web spider and content analysis with NLP methods.

Python 55.36% HTML 44.64%

PoemMining

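Project Introduction

Classical Chinese poetry is undoubtedly a cultural treasure, and mining it with quantitative-linguistics methods is of considerable value. This project explores the following directions:
1) Generating poet profiles from poem collections
2) Recognizing poets' location footprints from poem collections
3) Clustering similar poets from poem collections, based on the ATM (author-topic) model and the user2vec model
4) Sentiment classification and automatic tag generation from poem collections
5) Imagery mining from poem collections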
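As a rough illustration of direction 3, the sketch below ranks poets by how similar their poem collections are. It is not the ATM or user2vec approach named above: it swaps in a much simpler stand-in, cosine similarity over character-bigram counts, using only the standard library. The function names and the toy data are hypothetical and do not come from the project.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Return overlapping character bigrams, ignoring punctuation and whitespace."""
    chars = [c for c in text if c.isalnum()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def poet_vector(poems):
    """Aggregate bigram counts over all poems of one poet."""
    counts = Counter()
    for poem in poems:
        counts.update(char_bigrams(poem))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(poets, name):
    """Rank every other poet by similarity to `name`, most similar first."""
    vectors = {p: poet_vector(poems) for p, poems in poets.items()}
    target = vectors[name]
    scored = [(cosine(target, v), p) for p, v in vectors.items() if p != name]
    return sorted(scored, reverse=True)


# Toy data for illustration only (short phrases, not real corpus entries).
TOY_POETS = {
    "poet_a": ["明月清风", "明月高楼"],
    "poet_b": ["明月清风"],
    "poet_c": ["铁马冰河"],
}

if __name__ == "__main__":
    for score, poet in most_similar(TOY_POETS, "poet_a"):
        print(f"{poet}: {score:.3f}")
```

A topic model would replace the raw bigram counts with per-author topic distributions, but the ranking step on top of the vectors would look much the same.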

Project Structure

The project consists of two main tasks:

  1. Building a classical poetry corpus
  2. Mining the classical poetry corpus

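Script Structure

1, poem_spider.py: builds the classical poetry corpus by crawling Gushiwen (古诗文网, https://so.gushiwen.org); the results are saved in corpus_poem.zip
2, poem_process.py: runs basic text analysis on the corpus and uses the site's user-interaction data to derive external information about the poems
3, atm_model.py: applies the author-topic model to analyze the topics of the poems; the end goal is per-author topic distributions and style clustering
4, location_mining.py: mines and visualizes places from poets' encyclopedia biography entries; the end goal is one-step generation of the locations associated with a poet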
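To make the idea behind place mining concrete, here is a minimal sketch that counts place names in a biography text by matching against a gazetteer (a fixed list of known place names). This is an assumption about one plausible approach, not the method location_mining.py actually uses; the function name, the gazetteer, and the sample sentence are all made up for illustration.

```python
from collections import Counter


def extract_places(text, gazetteer):
    """Count gazetteer place names in `text`.

    At each position the longest matching name wins, so a short name
    that is a prefix of a longer one is not counted twice.
    """
    names = sorted(gazetteer, key=len, reverse=True)
    counts = Counter()
    i = 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                counts[name] += 1
                i += len(name)  # skip past the matched name
                break
        else:
            i += 1  # no place name starts here
    return counts


# Hypothetical gazetteer and a made-up sample sentence, for illustration only.
GAZETTEER = {"长安", "金陵", "扬州", "江陵"}
SAMPLE_BIO = "游历金陵与扬州,后入长安,晚年再至金陵。"

if __name__ == "__main__":
    for place, count in extract_places(SAMPLE_BIO, GAZETTEER).most_common():
        print(place, count)
```

A real pipeline would likely need a much larger gazetteer or a named-entity recognizer, plus coordinates for each place before anything can be drawn on a map.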

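Interim Results

1, Classical poetry corpus: 92,127 poems collected in total
2, External quantitative analysis results for the poems, saved in the result folder
3, One-step poet footprint generation. Usage is shown below; the result is written directly to an HTML file named after the poet you search for: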

from location_mining import PoetWalk

name = '李白'  # poet to look up
handler = PoetWalk()
handler.mining_main(name)  # writes an HTML file named after the poet

Example results:
Footprint maps (images) for Li Bai (李白), Li Qingzhao (李清照), Su Shi (苏轼), and Wen Tianxiang (文天祥).
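The output step, an HTML file named after the poet, can be sketched as follows. The page layout here is an assumption: this README does not describe the real output format, and the sketch writes a plain ranked list rather than a map. The function names are hypothetical.

```python
import html
from pathlib import Path


def render_footprint_page(name, place_counts):
    """Build a minimal HTML page listing places, most frequent first."""
    ordered = sorted(place_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    items = "\n".join(
        f"    <li>{html.escape(place)}: {count}</li>" for place, count in ordered
    )
    title = html.escape(f"{name} footprints")
    return (
        "<!DOCTYPE html>\n"
        '<html lang="zh">\n'
        f'<head><meta charset="utf-8"><title>{title}</title></head>\n'
        "<body>\n"
        f"  <h1>{title}</h1>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


def write_footprint_page(name, place_counts, out_dir="."):
    """Write the page to `<out_dir>/<name>.html` and return its path."""
    path = Path(out_dir) / f"{name}.html"
    path.write_text(render_footprint_page(name, place_counts), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(write_footprint_page("demo_poet", {"长安": 1, "金陵": 2}))
```

Keeping rendering separate from file writing makes the page content easy to check without touching the filesystem.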

If you have any questions about the project or about me, see https://liuhuanyong.github.io/
