Performance
Easy-to-implement algorithm
- Uses only common, 32-bit ALU operations
- Can eliminate 128-bit rotates by compressing quad ops into 4 sequential stages
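
The rotate-elimination point can be sketched as follows. This is a minimal illustration, not the cipher's actual round: it assumes the 128-bit block is held as four 32-bit words, that each stage XORs a round-function output into one word, and that the "128-bit rotate" is a rotation of the block by one word. The round function `f` and the names `quad_rotating` and `quad_unrolled` are hypothetical stand-ins.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, r):
    """Rotate a 32-bit word left by r bits."""
    r &= 31
    return ((x << r) | (x >> (32 - r))) & MASK32

def f(x, k):
    """Hypothetical stand-in round function (NOT the cipher's real one)."""
    return rotl32((x + k) & MASK32, 5) ^ x

def quad_rotating(words, keys):
    """Four identical stages, each followed by a one-word rotate of the block."""
    s = list(words)
    for k in keys:
        s[2] ^= f(s[3], k)            # same word positions every stage
        s = [s[3], s[0], s[1], s[2]]  # rotate the 128-bit block by 32 bits
    return s

def quad_unrolled(words, keys):
    """Same computation with the rotates folded into fixed word positions."""
    a, b, c, d = words
    c ^= f(d, keys[0])
    b ^= f(c, keys[1])
    a ^= f(b, keys[2])
    d ^= f(a, keys[3])
    return [a, b, c, d]
```

After four one-word rotates the block is back in its original order, so the unrolled form produces the same result while touching only 32-bit registers.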
Parallelization
- First 6 quad-rounds are not parallelizable
- Last 6 quad-rounds parallelize better
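
One way such a difference can arise is through data dependencies between stages. The sketch below is an assumption about the structure, not taken from the source: it supposes a four-word generalized Feistel layout in which a forward quad-round chains each stage on the previous stage's output, while a reverse quad-round reads mostly unmodified words. The round function `f` and both quad-round functions are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, r):
    """Rotate a 32-bit word left by r bits."""
    r &= 31
    return ((x << r) | (x >> (32 - r))) & MASK32

def f(x, k):
    """Hypothetical stand-in round function (NOT the cipher's real one)."""
    return rotl32((x + k) & MASK32, 5) ^ x

def forward_quad(words, keys):
    """Assumed forward layout: every stage needs the previous stage's result."""
    a, b, c, d = words
    c ^= f(d, keys[0])
    b ^= f(c, keys[1])   # waits for the new c
    a ^= f(b, keys[2])   # waits for the new b
    d ^= f(a, keys[3])   # waits for the new a
    return [a, b, c, d]

def reverse_quad(words, keys):
    """Assumed reverse layout: three stages read only the original inputs."""
    a, b, c, d = words
    # These three round-function calls are independent of each other,
    # so they could be issued in parallel.
    fa, fb, fc = f(a, keys[3]), f(b, keys[2]), f(c, keys[1])
    d ^= fa
    a ^= fb
    b ^= fc
    c ^= f(d, keys[0])   # only this stage waits on an updated word
    return [a, b, c, d]
```

Under these assumptions the forward quad-round has a dependency chain four stages long, while the reverse one has a chain of only two. With the same subkeys, `reverse_quad` also undoes `forward_quad`, which is a quick sanity check on the layout.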
Operations are simple, but the key schedule takes MANY of them