How It Works
The heatmap above visualizes the Positional Encoding matrix. Each row represents a position in the sequence (from top to bottom), and each column represents a dimension of the embedding vector (from left to right).
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
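As a rough illustration, the two formulas above can be turned into a matrix with NumPy. This is only a sketch: the function name `positional_encoding`, the parameter `seq_len`, and the example sizes are assumptions made for the example, and `d_model` is assumed to be even.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal encodings.

    Hypothetical helper for illustration; assumes d_model is even.
    """
    pos = np.arange(seq_len)[:, None]           # shape (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]   # shape (1, d_model / 2), holds the values 2i
    angles = pos / np.power(10000.0, two_i / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even columns: PE(pos, 2i)
    pe[:, 1::2] = np.cos(angles)  # odd columns:  PE(pos, 2i+1)
    return pe

# Example sizes chosen arbitrarily for the sketch.
pe = positional_encoding(seq_len=50, d_model=64)
```

Each row of `pe` corresponds to one position and each column to one embedding dimension, which is the same layout the heatmap uses.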
The color indicates the value of the encoding, ranging from -1 (Blue/Dark) to +1 (Red/Light). You can see distinct patterns:
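- The left side (lower dimensions) oscillates at high frequency: moving down the rows, the values swing between -1 and 1 every few positions.
- The right side (higher dimensions) oscillates at low frequency, forming long waves that change only slightly from one position to the next. The wavelengths grow geometrically across the dimensions, from 2π up to 10000 · 2π.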
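- Because each sine/cosine pair oscillates at its own fixed frequency, the encoding at position pos + k is a linear function of the encoding at position pos for any fixed offset k. This property is what makes relative positions easy for the model to learn.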
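The link between this pattern and relative positions can be checked numerically. The sketch below assumes small example values (`d_model = 8`, `pos = 5`, `k = 3`) and two hypothetical helpers, `encode` and `shift_matrix`. It builds a block-diagonal matrix of 2x2 rotations that depends only on the offset `k`, and that matrix maps the encoding of `pos` to the encoding of `pos + k`.

```python
import numpy as np

# Example values chosen only for this sketch.
d_model, pos, k = 8, 5, 3

# One angular frequency per (sin, cos) pair: 1 / 10000^(2i / d_model).
freqs = 1.0 / np.power(10000.0, np.arange(0, d_model, 2) / d_model)

def encode(p: int) -> np.ndarray:
    """Encoding vector for a single position p (sin in even slots, cos in odd slots)."""
    out = np.empty(d_model)
    out[0::2] = np.sin(p * freqs)
    out[1::2] = np.cos(p * freqs)
    return out

def shift_matrix(offset: int) -> np.ndarray:
    """Block-diagonal matrix of 2x2 rotations; it depends on the offset only."""
    m = np.zeros((d_model, d_model))
    for j, w in enumerate(freqs):
        c, s = np.cos(offset * w), np.sin(offset * w)
        # sin((p+k)w) =  sin(pw)cos(kw) + cos(pw)sin(kw)
        # cos((p+k)w) = -sin(pw)sin(kw) + cos(pw)cos(kw)
        m[2 * j:2 * j + 2, 2 * j:2 * j + 2] = [[c, s], [-s, c]]
    return m

shifted = shift_matrix(k) @ encode(pos)
matches = np.allclose(shifted, encode(pos + k))
```

Because `shift_matrix(k)` never looks at `pos`, the same matrix works for every starting position, so a fixed offset always corresponds to the same linear map.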