Many algorithms are memory-bound, so reducing the memory bandwidth they need improves their performance...
But F16 is not a format supported in C, nor a format supported by an FPU.
F16 & Altivec
Well, in fact F16 is supported by the Altivec/VMX/Velocity Engine SIMD extension.
Just have a look at the intrinsic vec_re: it computes a reciprocal estimate in half precision (accurate to about 12 bits, half of the 24 significant bits of an F32). Half precision here is not the 'half' storage format: the F16 numbers are still held in F32 containers.
F13 for everyone
Converting from a 32-bit format to a 16-bit one is not very quick on either scalar or SIMD FPUs: it involves bias management, exponent checks, mantissa re-calibration, and so on.
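To make that cost concrete, here is a minimal sketch of a scalar F32 to F16 conversion in C. It assumes IEEE 754 binary32 input and the binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits), and it rounds by truncation to stay short. The function name f32_to_f16_bits is hypothetical, not taken from any library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a binary32 value to a binary16 bit pattern.
   Assumption: rounding is plain truncation (toward zero). */
uint16_t f32_to_f16_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);                 /* read the F32 bit pattern */

    uint16_t sign  = (uint16_t)((u >> 16) & 0x8000u);
    uint32_t exp32 = (u >> 23) & 0xFFu;
    uint32_t man32 = u & 0x007FFFFFu;

    /* Exponent check 1: Inf and NaN keep an all-ones exponent. */
    if (exp32 == 0xFFu)
        return (uint16_t)(sign | 0x7C00u | (man32 ? 0x0200u : 0u));

    /* Bias management: F32 bias is 127, F16 bias is 15. */
    int32_t e = (int32_t)exp32 - 127 + 15;

    /* Exponent check 2: too large for F16, return infinity. */
    if (e >= 31)
        return (uint16_t)(sign | 0x7C00u);

    /* Exponent check 3: too small for a normal F16. */
    if (e <= 0) {
        if (e < -10)
            return sign;                      /* underflows to signed zero */
        man32 |= 0x00800000u;                 /* make the implicit 1 explicit */
        return (uint16_t)(sign | (man32 >> (14 - e)));  /* F16 subnormal */
    }

    /* Mantissa re-calibration: keep the top 10 of the 23 mantissa bits. */
    return (uint16_t)(sign | ((uint32_t)e << 10) | (man32 >> 13));
}
```

Even in this simplified form there are three branches on the exponent plus a variable shift, which is awkward to vectorize.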
The secret: just cut!
F13 has limited accuracy, but it is enough for some algorithms (and still better than fixed-point computation).
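The "just cut" idea can be sketched in C as follows. The text above does not spell out the exact F13 bit layout, so this example assumes that cutting means keeping the upper 16 bits of the F32 pattern (sign, 8 exponent bits, top 7 mantissa bits) and dropping the rest. The names f32_cut and f32_uncut are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: keep only the upper 16 bits of an F32.
   Assumption: the 16 low-order mantissa bits are simply dropped. */
uint16_t f32_cut(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);     /* read the F32 bit pattern */
    return (uint16_t)(u >> 16);   /* no bias change, no exponent check */
}

/* Hypothetical helper: expand the 16-bit value back to an F32
   by padding the dropped mantissa bits with zeros. */
float f32_uncut(uint16_t h)
{
    uint32_t u = (uint32_t)h << 16;
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}
```

Because the exponent field is untouched, the round trip needs no branches: under this assumed layout, f32_uncut(f32_cut(3.14159f)) gives 3.140625, the input with its low mantissa bits cleared.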