less than 1 minute read

  • Collected a Chinese corpus bigger than 5G, composing of news, poems(shi), Chinese ci, classical proses, papers
  • Implemented machine learning algorithms including Naive Bayers and Logistic regression to classify
  • Used word2vec, combined with TextCNN, TextRNN, TextRNN_Att, TextRCNN, DPCNN, Transformer with pytorch; compared the accuracies
  • Implemented ERNIE, bert, bert_cnn, bert_dpcnn, bert_rnn, bert_rcnn with pytorch; compare the accuracies

Machine learning algorithms

avataravatar

RNN & CNN

  • Models comparison

    Model Accuracy
    TextCNN 0.8734
    TextRNN 0.8853
    TextRNN_Att 0.8859
    TextRCNN 0.8862
    DPCNN 0.8853
    Transformer 0.8752
  • The best moder performance
    avatar
  • Confusion matrix
    avatar

Bert & ERNIE

  • Models comparison

    Model Accuracy
    ERNIE 0.8869
    Bert 0.8890
    Bert_cnn 0.8819
    Bert_dpcnn 0.8777
    Bert_rnn 0.8809
    Bert_rcnn 0.8789
  • The best moder performance
    avatar
  • Confusion matrix
    avatar