ALBERT vs BERT Parameters
BERT Base: 110M parameters
ALBERT Base: 12M parameters
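Matrix Factorization (factorized embedding parameterization): instead of one V × H embedding matrix, ALBERT uses a V × E embedding matrix followed by an E × H projection, where V is the vocabulary size, E the embedding size, and H the hidden size. Because E is much smaller than H (ALBERT Base uses E = 128 and H = 768), V × E + E × H < V × H.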
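To make the inequality concrete, the sketch below counts embedding parameters both ways. It assumes a vocabulary of V = 30,000 together with E = 128 and H = 768. The function names are hypothetical, and the projection bias is left out to keep the count aligned with the formula.

```python
# Embedding parameter counts with and without factorization.
# Assumed sizes: V = 30,000 (vocabulary), E = 128 (embedding), H = 768 (hidden).

def unfactorized_params(V: int, H: int) -> int:
    """A single V x H embedding matrix (BERT-style)."""
    return V * H

def factorized_params(V: int, E: int, H: int) -> int:
    """A V x E embedding matrix plus an E x H projection (bias omitted)."""
    return V * E + E * H

V, E, H = 30_000, 128, 768
full = unfactorized_params(V, H)
factored = factorized_params(V, E, H)
print(f"V x H         = {full:,}")       # V x H         = 23,040,000
print(f"V x E + E x H = {factored:,}")   # V x E + E x H = 3,938,304
print(f"reduction     = {full / factored:.1f}x")  # reduction     = 5.9x
```

For these sizes, factorization saves parameters for any E below V·H / (V + H), which is about 748.8. A small E such as 128 therefore cuts the embedding cost by roughly six times.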
Layer Sharing (cross-layer parameter sharing): all 12 Transformer layers share the same set of weights.
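The following rough sketch shows why sharing matters. It assumes a standard Transformer encoder layer with H = 768 and a feed-forward size of I = 3072, and it counts only attention projections, feed-forward layers, and LayerNorms. The helper names are hypothetical.

```python
# Approximate encoder parameter counts with and without cross-layer sharing.
# Assumed layer shape: H = 768 hidden, I = 3072 feed-forward, 12 layers.

def encoder_layer_params(H: int, I: int) -> int:
    attention = 4 * (H * H + H)           # Q, K, V and output projections, with biases
    feed_forward = H * I + I + I * H + H  # two dense layers, with biases
    layer_norms = 2 * (2 * H)             # two LayerNorms, each with gain and bias
    return attention + feed_forward + layer_norms

def run_encoder(x, layer, num_layers: int):
    # With sharing, the same layer (same weights) is applied at every depth.
    for _ in range(num_layers):
        x = layer(x)
    return x

H, I, num_layers = 768, 3072, 12
per_layer = encoder_layer_params(H, I)
unshared = num_layers * per_layer  # each layer owns its weights
shared = per_layer                 # one set of weights reused by all layers
print(f"per layer: {per_layer:,}")  # per layer: 7,087,872
print(f"unshared:  {unshared:,}")   # unshared:  85,054,464
print(f"shared:    {shared:,}")     # shared:    7,087,872
```

Sharing reduces the number of stored parameters, not the amount of computation. The shared layer still runs 12 times per forward pass, as `run_encoder` illustrates.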