cail's Introduction

For more models and code, see https://github.com/shelleyHLX/text-classification :) Until we meet again.

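CAIL (法研杯) competition

Legal dataset

File contents

cail2018_big.json: about 1.71 million records (171万)

Data composition

The data covers 183 law articles and 202 charges. All cases are criminal cases.

Data cleaning

The first 101 articles of the Criminal Law were filtered out of the data (these articles do not involve any charge). To make model training easier, charge and law-article classes with fewer than 30 occurrences were also removed.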
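The cleaning rule above can be sketched in a few lines. This is only an illustration under assumptions: the helper names `drop_general_articles` and `drop_rare_labels` are hypothetical, and the README does not say whether a sample that loses all of its labels is kept, so this sketch drops it.

```python
from collections import Counter


def drop_general_articles(articles, first_kept=102):
    """Keep article numbers from first_kept onward (articles 1-101 carry no charge)."""
    return [article for article in articles if article >= first_kept]


def drop_rare_labels(label_lists, min_count=30):
    """Remove labels seen fewer than min_count times.

    label_lists holds one list of labels (charges or articles) per sample.
    Samples left without any label are dropped (an assumption of this sketch).
    """
    counts = Counter(label for labels in label_lists for label in labels)
    kept = []
    for labels in label_lists:
        frequent = [label for label in labels if counts[label] >= min_count]
        if frequent:
            kept.append(frequent)
    return kept


# Toy labels invented for illustration: "theft" is frequent, "rare" is not.
toy = [["theft"]] * 30 + [["theft", "rare"], ["rare"]]
cleaned = drop_rare_labels(toy)
filtered_articles = drop_general_articles([67, 264, 101, 102])
```

Because a sample is only dropped when none of its labels is frequent, the counts of the surviving labels do not change, so a single pass is enough.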

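Data format

The data is stored in JSON format, one record per line. Each record is a dictionary.

Fields and their meanings
  • fact: description of the facts of the case
  • meta: annotations, which include:
    • criminals: defendant (every record contains exactly one defendant)
    • punish_of_money: fine (unit: yuan)
    • accusation: charge
    • relevant_articles: relevant law articles
    • term_of_imprisonment: prison term (unit: months), with the sub-fields:
      • death_penalty: whether the sentence is the death penalty
      • life_imprisonment: whether the sentence is life imprisonment
      • imprisonment: length of the fixed-term imprisonment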
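A minimal sketch of reading this format with the standard `json` module follows. The record below is invented for illustration, and the value types (lists for `criminals`, `accusation` and `relevant_articles`, booleans for the two flags) are assumptions rather than something this README states. The helper names `iter_records` and `summarize` are hypothetical.

```python
import io
import json


def iter_records(lines):
    """Yield one dictionary per non-empty line of a JSON-lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(record):
    """Return (charges, articles, prison term in months) for one record."""
    meta = record["meta"]
    term = meta["term_of_imprisonment"]
    return meta["accusation"], meta["relevant_articles"], term["imprisonment"]


# Hypothetical record: every value here is made up.
sample_line = json.dumps({
    "fact": "被告人某某盗窃手机一部。",
    "meta": {
        "criminals": ["某某"],
        "punish_of_money": 1000,
        "accusation": ["盗窃"],
        "relevant_articles": [264],
        "term_of_imprisonment": {
            "death_penalty": False,
            "life_imprisonment": False,
            "imprisonment": 6,
        },
    },
}, ensure_ascii=False)

# An in-memory stream stands in for the real file.
records = list(iter_records(io.StringIO(sample_line + "\n\n")))
charges, articles, months = summarize(records[0])
print(charges, articles, months)
```

For the real file, the same generator can be fed an open file handle, for example `open("data_raw/cail2018_big.json", encoding="utf-8")`, assuming the raw file sits in `data_raw`. Reading line by line avoids loading the whole file into memory.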

Data processing

Stop words: place names, person names, and general stop words.

Tokenization: the Python package jieba.
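The two steps above could be combined roughly as follows. This is a sketch under assumptions: the function name `tokenize` and the stop-word set are hypothetical, and the repository's own preprocessing scripts may differ. `jieba.lcut` is the jieba call that returns a list of tokens; it is imported lazily so the demo can run with a stand-in tokenizer.

```python
def tokenize(text, stopwords, cut=None):
    """Split text into words and drop stop words.

    cut is any callable mapping a string to a list of tokens. When omitted,
    jieba.lcut is used (requires the third-party jieba package).
    """
    if cut is None:
        import jieba  # imported lazily so a custom tokenizer needs no jieba
        cut = jieba.lcut
    return [tok for tok in cut(text) if tok.strip() and tok not in stopwords]


# Hypothetical stop-word set mixing the three kinds named above:
# a place name, a person name, and a general stop word.
stopwords = {"北京市", "张某", "的"}

# Stand-in tokenizer so the demo runs without jieba: the text is pre-split on spaces.
demo_tokens = tokenize("张某 在 北京市 盗窃 的 手机", stopwords, cut=str.split)
print(demo_tokens)
```

With jieba installed, `tokenize(fact, stopwords)` segments the raw `fact` string directly. Keeping the stop words in a set makes each membership test constant time, which matters at this corpus size.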

Models

This part involves two models: TextCNN and Attention.
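For readers unfamiliar with the two building blocks, here is a rough NumPy sketch of their core operations: convolution over word windows with max-over-time pooling (as in Kim's TextCNN) and additive attention pooling (as in Yang et al.). It is not the repository's `network.py`, which is written in TensorFlow and is not shown in this README. All function names, shapes and random weights are assumptions made for illustration.

```python
import numpy as np


def textcnn_features(embedded, kernels):
    """Convolve over word windows, apply ReLU, then max-pool over time.

    embedded: (seq_len, embed_dim) word vectors of one document.
    kernels: list of arrays shaped (width, embed_dim, n_filters).
    Returns one vector concatenating the pooled output of every kernel.
    """
    seq_len = embedded.shape[0]
    pooled = []
    for kernel in kernels:
        width = kernel.shape[0]
        # Stack every window of `width` consecutive word vectors.
        windows = np.stack(
            [embedded[i:i + width] for i in range(seq_len - width + 1)]
        )
        # Contract over the width and embedding axes: (n_windows, n_filters).
        conv = np.tensordot(windows, kernel, axes=([1, 2], [0, 1]))
        conv = np.maximum(conv, 0.0)  # ReLU
        pooled.append(conv.max(axis=0))  # max over time
    return np.concatenate(pooled)


def attention_pool(hidden, weight, bias, context):
    """Weight each position by a softmax score and sum.

    hidden: (seq_len, dim); weight: (dim, attn_dim);
    bias: (attn_dim,); context: (attn_dim,).
    Returns (pooled vector of shape (dim,), attention weights of shape (seq_len,)).
    """
    scores = np.tanh(hidden @ weight + bias) @ context
    scores = scores - scores.max()  # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ hidden, alpha


rng = np.random.default_rng(0)
embedded = rng.normal(size=(10, 8))  # 10 words, 8-dimensional embeddings
kernels = [rng.normal(size=(width, 8, 4)) for width in (2, 3, 4)]
features = textcnn_features(embedded, kernels)

pooled, alpha = attention_pool(
    embedded,
    rng.normal(size=(8, 5)),
    np.zeros(5),
    rng.normal(size=5),
)
```

In a trained network the kernels and attention parameters are learned, and the pooled vectors feed a classification layer over the charge or article labels.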

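Code structure

|- ckpt                      # saved trained models
|- data                      # preprocessed data
|- data_raw                  # raw data
|- log                       # training logs
|- models                    # model code
|  |- Attention_TextCNN      # model name
|  |  |- network.py          # network definition
|  |  |- train.py            # model training
|  |  |- predict.py          # model prediction
|- process_data              # preprocessing
|- scores                    # prediction results
|- summary                   # TensorBoard data
|- data_helper.py            # data-processing helper functions
|- evaluator.py              # evaluation functions
|- utils.py                  # other utility functions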

Below are the environment dependencies used in my experiments. The versions are for reference only.

Environment/library  Version
Ubuntu               16.04 LTS
python               3.5.0
tensorflow-gpu       1.4.0

Running the code

law_id.py --> embed2ndarray.py --> fact2dic_law2id.py --> fact2words.py --> word2id.py --> batch_data.py

train.py --> predict.py
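The two chains above can be driven from one small script. This is a sketch under assumptions: the preprocessing scripts are taken to live in `process_data` and the training scripts in `models/Attention_TextCNN`, which the directory tree suggests but the README does not state. The scripts may also expect to be launched from their own directory. `build_commands` and `run_all` are hypothetical helper names.

```python
import os
import subprocess
import sys

# Script order copied from the two chains above.
PREPROCESS_STEPS = [
    "law_id.py",
    "embed2ndarray.py",
    "fact2dic_law2id.py",
    "fact2words.py",
    "word2id.py",
    "batch_data.py",
]
MODEL_STEPS = ["train.py", "predict.py"]


def build_commands(preprocess_dir="process_data",
                   model_dir=os.path.join("models", "Attention_TextCNN")):
    """Return one command per script, in running order.

    Both directory defaults are assumptions, not facts from the README.
    """
    commands = [
        [sys.executable, os.path.join(preprocess_dir, name)]
        for name in PREPROCESS_STEPS
    ]
    commands += [
        [sys.executable, os.path.join(model_dir, name)]
        for name in MODEL_STEPS
    ]
    return commands


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


commands = build_commands()
```

Calling `run_all(commands)` executes the steps. `check=True` raises `subprocess.CalledProcessError` on a non-zero exit code, so a failed preprocessing step does not silently feed stale data into training.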

Results

Task 1: 42/170 shelley 86.91 85.34 85.81

Task 2: 41/170 shelley 84.63 82.87 83.40

References

(1) TextCNN:

Kim Y. Convolutional Neural Networks for Sentence Classification[J]. Eprint Arxiv, 2014.

Conneau A, Schwenk H, Barrault L, et al. Very Deep Convolutional Networks for Text Classification[J]. 2017:1107-1116.

Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[J]. 2014:1-9.

(2) Attention:

Yang Z, Yang D, Dyer C, et al. Hierarchical Attention Networks for Document Classification[C]// Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016:1480-1489.

Contributors

shelleyhlx