general-pos's Introduction

general_pos

POS tag set (classification scheme): https://blog.csdn.net/qq_32023541/article/details/84632829

data:

ontonotes:

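  1. Raw data: ontonotes-release-5.0/

  2. Processing script 1: util/process_ontonotes_data_to_word.py
    Description: Splits the raw OntoNotes data with BasicTokenizer, so that each Chinese character is one unit and each English word is one unit.

  3. Processing script 2: util/replace_tag.sh
    Description: Removes or replaces any tags in the output of step 2 that are not part of the POS tag set.

  4. Processing script 3: util/split_ontonotes_data.py
    Description: Splits the data produced by step 3 into train/dev/test sets.

  5. Processing script 4: util/process_ontonotes_word_to_wordpiece.py
    Description: Converts the raw OntoNotes data into standard NER-format data, split with WordpieceTokenizer, suitable for BERT. Note: the Chinese vocabulary shipped with BERT does not include uppercase letters (A-Z), so do_lower_case must be enabled. The English vocabulary includes uppercase letters, so do_lower_case is not needed.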
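
The unit splitting in step 2 and the lowercase note in step 5 can be sketched roughly as follows. This is a minimal illustration, not the code in util/: the regex, the function names (split_units, to_bio, wordpiece) and the toy vocabulary are hypothetical, and the B-/I- prefixing is an assumption inferred from tag names such as B-NN and I-NN in the reported metrics.

```python
import re

# Assumption: one CJK character is one unit, a run of Latin letters/digits
# is one unit, and any other non-space character is its own unit.
_UNIT = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+|[^\s\u4e00-\u9fffA-Za-z0-9]")


def split_units(word):
    """Split a word into character units (Chinese) or word units (English)."""
    return _UNIT.findall(word)


def to_bio(word, tag):
    """Give the first unit of a word B-<tag> and every later unit I-<tag>."""
    units = split_units(word)
    return [(u, ("B-" if i == 0 else "I-") + tag) for i, u in enumerate(units)]


def wordpiece(token, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split against a given vocabulary."""
    pieces, start = [], 0
    while start < len(token):
        end, match = len(token), None
        while start < end:
            sub = token[start:end]
            if start > 0:
                sub = "##" + sub  # continuation pieces carry the ## prefix
            if sub in vocab:
                match = sub
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole token becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary with lowercase entries only, mimicking a
# vocabulary that has no uppercase letters.
TOY_VOCAB = {"bert", "play", "##ing", "中", "国"}

if __name__ == "__main__":
    print(to_bio("中国", "NR"))
    print(wordpiece("BERT", TOY_VOCAB))
    print(wordpiece("BERT".lower(), TOY_VOCAB))
```

With a vocabulary that has no uppercase entries, an uppercase token finds no matching piece and collapses to the unknown token, which is why lowercasing is needed in that case.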

train/predict/export procedure:

cd bert_pos/
train: sh run_train.sh
export: sh run_export.sh

metrics:

Results on the OntoNotes test set (trained on the OntoNotes training set):
the macro F1 is: 0.856473
the micro F1 is: 0.942755
================================================================================
tp = 204922     tn = 1172907
================================================================================
tag             tp      fp      fn      Precision       Recall          F1
B-NN            26627   1678    1578    0.940717        0.944052        0.942382
I-NN            28076   1421    1607    0.951826        0.945861        0.948834
B-PU            21446   72      52      0.996654        0.997581        0.997117
I-PU            389     22      71      0.946472        0.845652        0.893226
B-VV            18466   1167    1175    0.940559        0.940176        0.940368
I-VV            13366   1076    1138    0.925495        0.921539        0.923513
B-AD            12497   614     773     0.953169        0.941748        0.947424
I-AD            5995    616     670     0.906822        0.899475        0.903133
B-NR            6243    292     182     0.955318        0.971673        0.963426
I-NR            10729   708     259     0.938096        0.976429        0.956878
B-PN            5981    166     203     0.972995        0.967173        0.970075
I-PN            2058    96      120     0.955432        0.944904        0.950139
B-P             4382    191     233     0.958233        0.949512        0.953853
I-P             752     37      38      0.953105        0.951899        0.952502
B-CD            3777    189     165     0.952345        0.958143        0.955235
I-CD            2096    112     84      0.949275        0.961468        0.955333
B-DEG           3896    192     332     0.953033        0.921476        0.936989
I-DEG           0       0       0       1.000000        1.000000        1.000000
B-M             3321    160     176     0.954036        0.949671        0.951849
I-M             147     18      39      0.890909        0.790323        0.837607
B-JJ            2344    492     692     0.826516        0.772069        0.798365
I-JJ            1881    435     526     0.812176        0.781471        0.796528
B-DEC           2539    358     226     0.876424        0.918264        0.896856
I-DEC           0       0       1       1.000000        0.000000        0.000000
B-DT            2603    183     116     0.934314        0.957337        0.945686
I-DT            1327    107     93      0.925384        0.934507        0.929923
B-VC            2346    102     107     0.958333        0.956380        0.957356
I-VC            0       0       2       1.000000        0.000000        0.000000
B-VA            2104    474     373     0.816137        0.849415        0.832443
I-VA            1367    432     340     0.759867        0.800820        0.779806
B-NT            2056    100     90      0.953618        0.958062        0.955834
I-NT            3012    133     92      0.957711        0.970361        0.963994
B-LC            1776    101     84      0.946191        0.954839        0.950495
I-LC            472     31      36      0.938370        0.929134        0.933729
B-SP            2031    137     141     0.936808        0.935083        0.935945
I-SP            58      2       5       0.966667        0.920635        0.943089
B-AS            1558    111     66      0.933493        0.959360        0.946250
I-AS            0       0       0       1.000000        1.000000        1.000000
B-CC            1387    98      82      0.934007        0.944180        0.939066
I-CC            206     18      47      0.919643        0.814229        0.863732
B-VE            1138    83      53      0.932023        0.955500        0.943615
I-VE            155     13      20      0.922619        0.885714        0.903790
B-IJ            1281    49      40      0.963158        0.969720        0.966428
I-IJ            81      9       45      0.900000        0.642857        0.750000
B-OD            281     30      57      0.903537        0.831361        0.865948
I-OD            259     7       37      0.973684        0.875000        0.921708
B-CS            348     16      21      0.956044        0.943089        0.949523
I-CS            337     19      24      0.946629        0.933518        0.940028
B-MSP           303     27      31      0.918182        0.907186        0.912651
I-MSP           0       0       0       1.000000        1.000000        1.000000
B-DEV           284     12      27      0.959459        0.913183        0.935750
I-DEV           0       0       0       1.000000        1.000000        1.000000
B-BA            245     2       2       0.991903        0.991903        0.991903
I-BA            0       0       0       1.000000        1.000000        1.000000
B-ETC           162     1       1       0.993865        0.993865        0.993865
I-ETC           19      1       0       0.950000        1.000000        0.974359
B-SB            159     5       10      0.969512        0.940828        0.954955
I-SB            0       0       0       1.000000        1.000000        1.000000
B-DER           90      10      13      0.900000        0.873786        0.886700
I-DER           0       0       0       1.000000        1.000000        1.000000
B-LB            73      9       7       0.890244        0.912500        0.901235
I-LB            0       0       0       1.000000        1.000000        1.000000
B-URL           18      0       2       1.000000        0.900000        0.947368
I-URL           378     8       20      0.979275        0.949749        0.964286
B-FW            0       0       11      1.000000        0.000000        0.000000
I-FW            0       0       1       1.000000        0.000000        0.000000
B-ON            0       0       5       1.000000        0.000000        0.000000
I-ON            0       1       2       0.000000        0.000000        0.000000
B-X             0       0       0       1.000000        1.000000        1.000000
I-X             0       0       0       1.000000        1.000000        1.000000
================================================================================
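
For reference, the per-tag columns above can be reproduced from the tp/fp/fn counts with a few lines of Python. This is a sketch under assumptions, not the repository's evaluation script: the function names are hypothetical, the zero-denominator convention (precision or recall reported as 1.0 when undefined, F1 as 0.0 when both are 0) is read off rows such as I-DEG, I-DEC and I-ON, and the macro and micro definitions shown are the usual ones (unweighted mean of per-tag F1, and F1 over pooled counts), which may differ in detail from how the figures above were computed.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts for one tag."""
    # Convention inferred from the table: an undefined ratio is reported as 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_micro_f1(counts):
    """counts maps tag -> (tp, fp, fn); returns (macro F1, micro F1)."""
    # Macro: unweighted mean of the per-tag F1 scores.
    f1s = [prf(*c)[2] for c in counts.values()]
    macro = sum(f1s) / len(f1s)
    # Micro: pool the counts over all tags, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = prf(tp, fp, fn)[2]
    return macro, micro


# Three rows copied from the table above, used only as sample input.
SAMPLE = {
    "B-NN": (26627, 1678, 1578),
    "I-DEC": (0, 0, 1),
    "B-BA": (245, 2, 2),
}

if __name__ == "__main__":
    for tag, c in SAMPLE.items():
        print(tag, ["%.6f" % v for v in prf(*c)])
    print(macro_micro_f1(SAMPLE))
```

Rare tags with few or no test examples (I-DEC, B-FW, B-ON and similar) score 0.0 or 1.0 under this convention and count equally in an unweighted mean, so they can pull a macro average well away from the micro average.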

general_pos_predict_sdk

Inference SDK

general-pos's People

Contributors

xmcbbkad
