TensorFloat-32

TensorFloat-32 (TF32) is a numeric floating-point format designed for the Tensor Cores of certain Nvidia GPUs. It was first implemented in the Ampere architecture.[1] TF32 combines the 8-bit exponent of IEEE 754 single precision with the 10-bit mantissa of half precision. Together with a sign bit, this gives a total of 19 bits per number (1 + 8 + 10). TF32 therefore has the same range as single precision and the same precision as half precision. It is comparable to the bfloat16 format, which also uses an 8-bit exponent but has only a 7-bit mantissa.
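
The bit layout can be illustrated with a short Python sketch. For illustration only, it assumes that a TF32 value is held inside an ordinary 32-bit float whose 13 lowest mantissa bits are zero, and that conversion uses round-to-nearest-even. The rounding used by actual Nvidia hardware or libraries may differ. The names `FORMATS`, `total_bits`, `tf32_fields` and `round_to_tf32` are hypothetical helpers and are not part of any Nvidia API. Inputs outside the float32 range make `struct.pack` raise `OverflowError`.

```python
import struct

# Hypothetical table: (exponent bits, mantissa bits); every format also has 1 sign bit.
FORMATS = {
    "float32": (8, 23),
    "tf32": (8, 10),
    "bfloat16": (8, 7),
    "float16": (5, 10),
}


def total_bits(name: str) -> int:
    """Sign bit + exponent bits + mantissa bits."""
    exponent_bits, mantissa_bits = FORMATS[name]
    return 1 + exponent_bits + mantissa_bits


def float32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(b: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def tf32_fields(x: float) -> tuple[int, int, int]:
    """Split x into TF32 (sign, exponent, mantissa) fields, truncating extra mantissa bits."""
    b = float32_bits(x)
    sign = b >> 31
    exponent = (b >> 23) & 0xFF   # same 8-bit exponent field as float32
    mantissa = (b >> 13) & 0x3FF  # top 10 of float32's 23 mantissa bits
    return sign, exponent, mantissa


def round_to_tf32(x: float) -> float:
    """Round x to TF32 precision, returned as a float32 value.

    Assumption: round-to-nearest-even; real hardware rounding may differ.
    """
    b = float32_bits(x)
    if (b >> 23) & 0xFF == 0xFF:  # infinity or NaN: pass through unchanged
        return bits_to_float32(b)
    dropped = 23 - 10             # 13 low mantissa bits do not fit in TF32
    lsb = (b >> dropped) & 1      # lowest kept bit, used to break ties toward even
    b += (1 << (dropped - 1)) - 1 + lsb
    b &= ~((1 << dropped) - 1)    # clear the dropped bits
    return bits_to_float32(b)


print(total_bits("tf32"))         # 19
print(tf32_fields(-2.5))          # (1, 128, 256)
print(round_to_tf32(1 / 3))       # 0.333251953125
```

Because the exponent field is left untouched, rounding to TF32 in this sketch keeps the float32 range. The 10-bit mantissa, however, leaves only about three significant decimal digits, the same relative precision as half precision.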
