E8M0: Exponent-Only Power-of-Two Scale Factor
Block-scaled number formats (like OCP MX and NVIDIA NVFP4) need a compact scale factor that can be applied to a group of low-precision elements. This scale factor must be:
- Compact: 1 byte per block (not per element)
- Efficient to apply: scaling by a power of 2 is an exponent adjustment (or a bit shift for fixed-point data), not a multiply
- Wide range: must cover the full dynamic range of the data being scaled
The e8m0 type is an 8-bit exponent-only format: no sign bit, no mantissa, pure power-of-two encoding. Encodings 0 through 254 represent values of the form 2^(encoding - 127), covering a range from 2^-127 (~5.9 × 10^-39) to 2^127 (~1.7 × 10^38); encoding 0xFF is reserved for NaN. Scaling a floating-point value by an e8m0 factor adjusts its exponent (a bit shift for fixed-point values) rather than performing a full floating-point multiply.
e8m0 is a fixed-format 8-bit exponent-only type:
| Property | Value |
|---|---|
| Total bits | 8 |
| Sign bits | 0 (always positive) |
| Mantissa bits | 0 (no fractional part) |
| Exponent bits | 8 |
| Bias | 127 |
| Value | 2^(encoding - 127) |
| Range | 2^-127 to 2^127 |
| Special | 0xFF = NaN |
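
The table above maps directly onto a few lines of code. The following is a minimal standalone sketch of the encoding, not the library's implementation: `e8m0_decode` and `e8m0_encode_exponent` are hypothetical helper names, and `double` is used for the decoded value only so that 2^-127 stays a normal number.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration; not part of the Universal API.
constexpr int kE8M0Bias = 127;
constexpr std::uint8_t kE8M0NaN = 0xFF;

// Decode a stored byte to its value 2^(byte - 127); 0xFF decodes to NaN.
inline double e8m0_decode(std::uint8_t bits) {
    if (bits == kE8M0NaN) return std::numeric_limits<double>::quiet_NaN();
    return std::ldexp(1.0, static_cast<int>(bits) - kE8M0Bias);
}

// Encode an unbiased exponent in [-127, 127] as a stored byte.
inline std::uint8_t e8m0_encode_exponent(int exponent) {
    assert(exponent >= -127 && exponent <= 127);
    return static_cast<std::uint8_t>(exponent + kE8M0Bias);
}
```

For example, `e8m0_decode(127)` yields 1.0 and `e8m0_encode_exponent(8)` yields encoding 135, matching the bias of 127.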
Key Properties
- Pure power-of-two: every value is an exact power of 2
- No sign bit: always positive (scale factors are magnitudes)
- No mantissa: maximum dynamic range for a single byte
- Shift-based arithmetic: scaling = exponent addition (integer add on bytes)
- Trivially copyable: single `uint8_t` storage
- NaN marker: encoding 0xFF is reserved for NaN
Value Table (Selected)
| Encoding | Value |
|---|---|
| 0 | 2^-127 ≈ 5.88 × 10^-39 |
| 64 | 2^-63 ≈ 1.08 × 10^-19 |
| 127 | 2^0 = 1.0 |
| 128 | 2^1 = 2.0 |
| 190 | 2^63 ≈ 9.22 × 10^18 |
| 254 | 2^127 ≈ 1.70 × 10^38 |
| 255 | NaN |
How It Works
The encoding is simple: the stored byte minus the bias (127) gives the exponent.

```
value = 2^(stored_byte - 127)
```

Multiplying two e8m0 values adds their stored bytes and subtracts the bias (127) once. Applying an e8m0 scale to a floating-point value adds the e8m0 exponent to the float's exponent field, which is equivalent to `ldexp(value, exponent)`; for fixed-point values it is a simple bit shift.
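
Both operations can be sketched on raw bytes without the library. The helper names `e8m0_multiply` and `e8m0_apply_scale` are hypothetical, and saturating an out-of-range product to encodings 0 or 254 is an assumption made for this sketch; the actual type may handle overflow differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration; not part of the Universal API.

// Product of two e8m0 encodings: add the stored bytes, subtract the bias once.
// NaN (0xFF) propagates. Saturation to [0, 254] is an assumption of this sketch.
inline std::uint8_t e8m0_multiply(std::uint8_t a, std::uint8_t b) {
    if (a == 0xFF || b == 0xFF) return 0xFF;
    int biased = static_cast<int>(a) + static_cast<int>(b) - 127;
    if (biased < 0) biased = 0;
    if (biased > 254) biased = 254;
    return static_cast<std::uint8_t>(biased);
}

// Apply an e8m0 scale to a float by adjusting its exponent with ldexp.
inline float e8m0_apply_scale(float value, std::uint8_t scale) {
    if (scale == 0xFF) return std::numeric_limits<float>::quiet_NaN();
    return std::ldexp(value, static_cast<int>(scale) - 127);
}

// Usage sketch: 2^1 * 2^8 == 2^9, and 1.5 scaled by 2^8 is 384.
inline void e8m0_scaling_demo() {
    assert(e8m0_multiply(128, 135) == 136);
    assert(e8m0_apply_scale(1.5f, 135) == 384.0f);
}
```

No floating-point multiplier is involved in either path: the product is an integer add on bytes, and the scale application only changes the exponent of `value`.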
How to Use It
Include

```cpp
#include <universal/number/e8m0/e8m0.hpp>
using namespace sw::universal;
```

Basic Usage

```cpp
e8m0 scale(1.0f);     // Encoding 127, value = 2^0 = 1.0
e8m0 scale2(256.0f);  // Encoding 135, value = 2^8 = 256.0

std::cout << "scale: " << scale << " encoding: " << to_binary(scale) << std::endl;
std::cout << "scale2: " << scale2 << " encoding: " << to_binary(scale2) << std::endl;
```

As Block Scale Factor

```cpp
// e8m0 is the scale type for OCP MX blocks
#include <universal/number/mxfloat/mxfloat.hpp>

// MX block: 32 elements sharing one e8m0 scale
mxblock<e4m3, 32> block;
// The block internally uses e8m0 as the shared scale factor

// Quantize float data into the block
std::vector<float> data(32);
// ... fill data ...
block.quantize(data);
// The e8m0 scale captures the block's dynamic range
// Elements are scaled relative to this power-of-two
```

Dynamic Range Inspection

```cpp
e8m0 val;
for (unsigned i = 0; i < 256; ++i) {
    val.setbits(i);
    if (i < 255) {
        std::cout << "encoding " << i << ": 2^" << (int(i) - 127) << " = " << val << std::endl;
    } else {
        std::cout << "encoding 255: NaN" << std::endl;
    }
}
```

Problems It Solves
| Problem | How e8m0 Solves It |
|---|---|
| Block formats need a compact scale factor | 1 byte per block, covers about 5.9 × 10^-39 to 1.7 × 10^38 |
| Floating-point multiply for scaling is expensive | Power-of-2 scaling is an exponent add (or a bit shift for fixed-point data) |
| Scale factor must cover full data dynamic range | 8-bit exponent spans 254 binades (2^-127 to 2^127), roughly 76 decimal orders of magnitude |
| Hardware needs simple, fixed-format scale encoding | No sign, no mantissa, trivial decode |
| OCP MX specification compliance | Direct implementation of OCP MX v1.0 scale type |
| Memory-efficient metadata for quantized tensors | 1 byte overhead per 32 elements |
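
The block-scale and overhead rows can be made concrete with a small standalone sketch. It assumes the commonly described MX conversion rule, where the shared exponent is floor(log2(max |x|)) minus the largest exponent of the element format (8 for e4m3). `choose_block_scale` and `bits_per_element` are hypothetical names, not the `mxblock` API, and the handling of all-zero and non-finite blocks is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: pick an e8m0 encoding for one block of floats.
// elem_emax is the largest unbiased exponent of the element format (8 for e4m3).
inline std::uint8_t choose_block_scale(const std::vector<float>& block, int elem_emax) {
    float amax = 0.0f;
    for (float v : block) {
        if (!std::isfinite(v)) return 0xFF;  // assumption: NaN/Inf input marks the scale as NaN
        amax = std::max(amax, std::fabs(v));
    }
    if (amax == 0.0f) return 127;            // assumption: all-zero block uses scale 2^0
    // ilogb returns floor(log2(amax)) for finite nonzero input.
    int exponent = std::ilogb(amax) - elem_emax;
    exponent = std::max(-127, std::min(127, exponent));  // clamp to the e8m0 range
    return static_cast<std::uint8_t>(exponent + 127);
}

// Effective storage cost per element when one 8-bit scale is shared by a block.
inline double bits_per_element(int elem_bits, int block_size) {
    assert(block_size > 0);
    return (static_cast<double>(elem_bits) * block_size + 8.0) / block_size;
}
```

With 8-bit elements and 32-element blocks, `bits_per_element(8, 32)` comes to 8.25 bits, so the shared scale adds a quarter of a bit per element.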