ALBERT vs BERT Parameters

BERT Base: 110M parameters
ALBERT Base: 12M parameters
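Factorized embedding parameterization: V x E + E x H < V x H when the embedding size E is much smaller than the hidden size H (V = vocabulary size)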
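To see how much the factorization saves, here is a minimal sketch that compares the two token-embedding sizes. It assumes V = 30,000, H = 768 and E = 128, which are base-model sizes chosen for illustration, and it ignores any bias on the E x H projection. Under these assumptions the inequality holds whenever E < VH / (V + H), which is roughly 749.

```python
from typing import Optional


def embedding_params(vocab_size: int, hidden_size: int,
                     embedding_size: Optional[int] = None) -> int:
    """Token-embedding parameter count (assumed sizes, biases ignored)."""
    if embedding_size is None:
        # BERT style: a single V x H lookup table
        return vocab_size * hidden_size
    # ALBERT style: a V x E lookup table followed by an E x H projection
    return vocab_size * embedding_size + embedding_size * hidden_size


V, H, E = 30_000, 768, 128  # assumed illustrative sizes
bert_emb = embedding_params(V, H)
albert_emb = embedding_params(V, H, E)
print(f"V x H         = {bert_emb:,}")    # 23,040,000
print(f"V x E + E x H = {albert_emb:,}")  # 3,938,304
```

With these sizes, the embedding block shrinks by a factor of about 5.8.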
Cross-layer parameter sharing: all 12 layers share the same weights
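Combining both tricks roughly reproduces the 110M vs 12M comparison. The sketch below counts encoder parameters under stated assumptions. It uses a hidden size of 768, a feed-forward size of 3072, 512 positions, 2 segment types, a vocabulary of 30,522 for BERT and 30,000 for ALBERT, and E = 128 for ALBERT. The pooler is included and the pretraining heads are excluded. The helper names are hypothetical. Under these assumptions the totals come out to about 109.5M and 11.7M.

```python
from typing import Optional


def transformer_layer_params(h: int, ffn: int) -> int:
    """Parameters in one encoder layer (assumed standard BERT layout)."""
    attention = 4 * (h * h + h)                  # Q, K, V, output projections + biases
    feed_forward = h * ffn + ffn + ffn * h + h   # two dense layers + biases
    layer_norms = 2 * 2 * h                      # two LayerNorms (scale + bias)
    return attention + feed_forward + layer_norms


def encoder_params(vocab: int, h: int, ffn: int, n_layers: int,
                   embedding_size: Optional[int] = None,
                   share_layers: bool = False,
                   max_pos: int = 512, type_vocab: int = 2) -> int:
    e = h if embedding_size is None else embedding_size
    # Token, position and segment lookups plus one embedding LayerNorm
    embeddings = (vocab + max_pos + type_vocab) * e + 2 * e
    if embedding_size is not None:
        embeddings += e * h + h                  # E -> H projection + bias
    # With sharing, one set of layer weights is reused by every layer
    layers = transformer_layer_params(h, ffn) * (1 if share_layers else n_layers)
    pooler = h * h + h
    return embeddings + layers + pooler


bert = encoder_params(30_522, 768, 3072, 12)
albert = encoder_params(30_000, 768, 3072, 12,
                        embedding_size=128, share_layers=True)
print(f"BERT Base:   {bert:,}")    # 109,482,240
print(f"ALBERT Base: {albert:,}")  # 11,683,584
```

Most of the saving comes from layer sharing. One layer holds about 7.1M parameters, so sharing removes roughly 78M of the 85M layer parameters.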