Classifying Long-Text Documents Based on RNN & BERT


Collected a Chinese corpus of more than 5 GB, consisting of news, poems (shi), ci, classical prose, and papers
Implemented classical machine learning algorithms, including Naive Bayes and logistic regression, to classify the documents
Used word2vec embeddings combined with TextCNN, TextRNN, TextRNN_Att, TextRCNN, DPCNN, and Transformer models in PyTorch, and compared their accuracies
Implemented ERNIE, BERT, bert_cnn, bert_dpcnn, bert_rnn, and bert_rcnn in PyTorch, and compared their accuracies
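The page does not say how the word2vec vectors were fed to the networks. A common approach, assumed here, is to copy each vocabulary token's pretrained vector into an embedding matrix and give the remaining tokens small random vectors. The sketch below parses the word2vec text format (a header line `<count> <dim>`, then one token followed by its values per line) with NumPy. The function names, the reserved `<PAD>` row at index 0, and the ±0.25 random range are illustrative choices, not the project's code.

```python
import numpy as np


def parse_word2vec_text(lines):
    """Parse word2vec text format: header "<count> <dim>", then "<token> v1 ... v_dim" per line."""
    it = iter(lines)
    _count, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return (matrix, hits): row i holds the vector for vocab[i]; hits counts pretrained matches.

    Index 0 is assumed to be the padding token and stays all zeros.
    Tokens missing from `pretrained` keep small random vectors.
    """
    rng = np.random.default_rng(seed)
    matrix = rng.uniform(-0.25, 0.25, size=(len(vocab), dim)).astype(np.float32)
    matrix[0] = 0.0  # padding row
    hits = 0
    for i, token in enumerate(vocab[1:], start=1):
        vec = pretrained.get(token)
        if vec is not None:
            matrix[i] = np.asarray(vec, dtype=np.float32)
            hits += 1
    return matrix, hits


# Toy example: two pretrained vectors, one token ("论文") without a pretrained vector.
lines = ["2 3", "新闻 0.1 0.2 0.3", "诗 0.4 0.5 0.6"]
vectors, dim = parse_word2vec_text(lines)
emb, hits = build_embedding_matrix(["<PAD>", "新闻", "诗", "论文"], vectors, dim)
print(emb.shape, hits)  # (4, 3) 2
```

In PyTorch, such a matrix can be loaded with `nn.Embedding.from_pretrained(torch.from_numpy(emb), freeze=False)` so the vectors are fine-tuned along with the classifier.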

Machine learning algorithms

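The code for the classical baselines is not shown on this page. As an illustration of the first one, the sketch below implements multinomial Naive Bayes with add-one (Laplace) smoothing from scratch, treating each Chinese character as a token. Character-level tokenization is an assumption (the project may have used word segmentation), and the training sentences and labels are made up. In practice, scikit-learn's `MultinomialNB` and `LogisticRegression` cover both baselines.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing over token counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        # docs: list of token lists; labels: list of class names (same length)
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        doc_counts = Counter(labels)
        # log P(class) estimated from document frequencies
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.total = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # Smoothed log P(token | class)
        v = len(self.vocab)
        return math.log((self.token_counts[c][token] + self.alpha) / (self.total[c] + self.alpha * v))

    def predict(self, doc):
        # Tokens never seen in training carry no information and are skipped.
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in doc if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical toy data: two "poem" and two "news" snippets, split into characters.
docs = [list("床前明月光疑是地上霜"), list("举头望明月低头思故乡"),
        list("本报讯记者报道经济增长"), list("记者从会议获悉经济数据")]
labels = ["poem", "poem", "news", "news"]
clf = MultinomialNB().fit(docs, labels)
print(clf.predict(list("明月照故乡")))  # poem
```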

RNN & CNN

Model comparison

Model	Accuracy
TextCNN	0.8734
TextRNN	0.8853
TextRNN_Att	0.8859
TextRCNN	0.8862
DPCNN	0.8853
Transformer	0.8752
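To make the first model in the table concrete: in TextCNN, convolutions with several window sizes slide over the embedded token sequence, each feature map is max-pooled over time, and the pooled features feed a softmax classifier. The project's PyTorch code is not shown, so the sketch below is a framework-free, inference-only NumPy forward pass with hypothetical shapes and random weights (the five output classes are an assumption mirroring the five corpus categories). In PyTorch the same model is usually assembled from `nn.Embedding`, `nn.Conv1d` (or `nn.Conv2d`), and `nn.Linear`, with training handled by autograd.

```python
import numpy as np


def text_cnn_forward(token_ids, embedding, filters, fc_weight, fc_bias):
    """Inference-only TextCNN forward pass for a single document.

    token_ids: (seq_len,) integer array of vocabulary indices
    embedding: (vocab_size, emb_dim) lookup table (e.g. initialised from word2vec)
    filters:   list of (weight, bias) pairs; weight has shape (num_filters, k, emb_dim)
               for a window of k tokens, bias has shape (num_filters,)
    fc_weight: (num_classes, total_num_filters); fc_bias: (num_classes,)
    Returns a probability vector of shape (num_classes,).
    """
    x = embedding[token_ids]  # (seq_len, emb_dim)
    pooled = []
    for weight, bias in filters:
        k = weight.shape[1]
        if len(x) < k:
            raise ValueError(f"sequence shorter than window size {k}; pad it first")
        # All windows of k consecutive token vectors: (seq_len - k + 1, k, emb_dim)
        windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
        # Convolution: dot product of every window with every filter -> (windows, num_filters)
        feature_maps = np.einsum("wke,fke->wf", windows, weight) + bias
        feature_maps = np.maximum(feature_maps, 0.0)  # ReLU
        pooled.append(feature_maps.max(axis=0))  # max-over-time pooling
    features = np.concatenate(pooled)  # (total_num_filters,)
    logits = fc_weight @ features + fc_bias
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


# Toy shapes: vocabulary of 50, 8-dim embeddings, 4 filters per window size 2/3/4, 5 classes.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(50, 8))
filters = [(rng.normal(size=(4, k, 8)), np.zeros(4)) for k in (2, 3, 4)]
probs = text_cnn_forward(np.array([1, 5, 7, 9, 3, 2]), embedding, filters,
                         rng.normal(size=(5, 12)), np.zeros(5))
print(probs.shape)  # (5,)
```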

Performance of the best model
Confusion matrix
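The confusion-matrix figure is not reproduced in this text. For reference, rows of a confusion matrix count true classes, columns count predicted classes, and the accuracy reported in the tables equals the diagonal sum divided by the number of test documents. The pure-Python sketch below uses hypothetical labels borrowed from the corpus categories and made-up predictions; scikit-learn's `confusion_matrix` and `accuracy_score` compute the same quantities.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, both in the order of `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix, labels):
    """Correct predictions for each true class divided by that class's row total."""
    return {
        label: (matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0)
        for i, label in enumerate(labels)
    }


# Hypothetical labels and predictions for six test documents.
labels = ["news", "poem", "ci", "prose", "paper"]
y_true = ["news", "news", "poem", "ci", "prose", "paper"]
y_pred = ["news", "paper", "ci", "ci", "prose", "paper"]
cm = confusion_matrix(y_true, y_pred, labels)
print(cm[0])  # [1, 0, 0, 0, 1]
print(accuracy(cm))  # 4 of 6 correct, about 0.667
```

Per-class recall read off the rows (here "poem" is always confused with "ci") is what the figure makes visible beyond the single accuracy number.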

BERT & ERNIE

Model comparison

Model	Accuracy
ERNIE	0.8869
Bert	0.8890
Bert_cnn	0.8819
Bert_dpcnn	0.8777
Bert_rnn	0.8809
Bert_rcnn	0.8789
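Because these documents are long, one practical detail matters for every BERT variant above: BERT-base has 512 position embeddings, so each input must be cut to at most 512 tokens including `[CLS]` and `[SEP]`. BERT's tokenizer splits Chinese text into individual characters. The page does not say how long documents were shortened; the sketch below assumes the simplest option, keeping the head of the text, and builds fixed-length `input_ids` and `attention_mask` lists. The function name and the toy vocabulary are hypothetical.

```python
def build_bert_input(text, vocab, pad_size=512):
    """Turn a Chinese document into fixed-length BERT inputs (head truncation, assumed).

    - Each non-whitespace character becomes one token.
    - [CLS] is prepended and [SEP] appended; the text is cut so the total fits `pad_size`.
    - Shorter inputs are padded with [PAD] and masked out via the attention mask.
    `vocab` maps token -> id; characters not in `vocab` map to [UNK].
    """
    chars = [ch for ch in text if not ch.isspace()]
    tokens = ["[CLS]"] + chars[: pad_size - 2] + ["[SEP]"]
    ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    mask = [1] * len(ids)
    padding = pad_size - len(ids)
    ids += [vocab["[PAD]"]] * padding
    mask += [0] * padding
    return ids, mask


# Toy vocabulary and a tiny pad_size so the output is easy to read.
vocab = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102, "明": 1, "月": 2}
ids, mask = build_bert_input("明月 光", vocab, pad_size=6)
print(ids)   # [101, 1, 2, 100, 102, 0]
print(mask)  # [1, 1, 1, 1, 1, 0]
```

With the Hugging Face `BertTokenizer`, the same effect comes from calling the tokenizer with `max_length=512, truncation=True, padding="max_length"`. Keeping only the head discards the rest of a long document, which is one plausible reason the BERT scores above sit close to the RNN/CNN ones.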

Performance of the best model
Confusion matrix


Jingru Chen
