ucalc: Mixed-Precision REPL Calculator
ucalc is an interactive calculator for exploring and comparing arithmetic
across Universal number types. Instead of writing, compiling, and running C++
for each experiment, you can compare representations, measure precision, and
analyze errors interactively.
Quick Start

    # Build
    cmake -DUNIVERSAL_BUILD_TOOLS_UCALC=ON ..
    make ucalc

    # Interactive
    ucalc

    # One-shot
    ucalc "type posit32; show 1/3"

    # Pipe mode
    echo "compare sqrt(2)" | ucalc

Commands Reference
| Command | Description |
|---|---|
| `type <name>` | Set the active arithmetic type |
| `types` | List all available types |
| `show <expr>` | Value + binary decomposition + components |
| `compare <expr>` | Evaluate across all types in a table |
| `range` | Symmetry range: [maxneg … minneg] 0 [minpos … maxpos] |
| `precision` | Binary/decimal digits, epsilon, minpos, maxpos |
| `ulp <value>` | Unit in the last place at a given value |
| `bits <expr>` | Raw bit pattern |
| `sweep <expr> for <var> in [a, b, n]` | Error analysis across a range |
| `faithful <expr>` | Check faithful rounding vs higher-precision reference |
| `color on/off` | Toggle ANSI color-coded bit fields |
| `vars` | List defined variables |
| `help` | Command reference |
Expressions support standard arithmetic (+, -, *, /, ^), parentheses,
variables (x = expr), constants (pi, e, phi, ln2, ln10, sqrt2),
and functions (sqrt, abs, log, exp, sin, cos, pow).
Constants are sourced at quad-double precision (~64 decimal digits) and converted to the active type at its native precision.
Example 1: Precision Near 1.0 — Where Posit Outshines IEEE
Posit’s tapered precision allocates more fraction bits near 1.0 than IEEE float does at the same bit width. This means posit32 has a smaller epsilon (7.45e-9 vs 1.19e-7) and can resolve smaller perturbations.

    double> type float
    Active type: float (float (IEEE-754 binary32))
    float> precision
      type: float (IEEE-754 binary32)
      binary digits: 23
      decimal digits: 6.9
      epsilon: 1.1920929e-07
      minpos: 1.40129846e-45
      maxpos: 3.40282347e+38
    float> type posit32
    Active type: posit32 (posit< 32, 2, uint32_t>)
    posit32> precision
      type: posit< 32, 2, uint32_t>
      binary digits: 27
      decimal digits: 8.1
      epsilon: 7.450580597e-09
      minpos: 7.523163845e-37
      maxpos: 1.329227996e+36

Example 2: How 0.1 Looks Across 42 Types
The decimal value 0.1 cannot be represented exactly in binary floating-point.
The compare command reveals how each type approximates it, grouped by
bit width (small <=32, medium 33-80, large >80):

    double> compare 1/10
    Type         Value                Binary
    ----------------------------------------------------------------------
    float        0.100000001          0b0.01111011.10011001100110011001101
    posit8       1.02e-01             0b0.01.00.101
    posit16      1.0001e-01           0b0.01.00.10011001101
    posit32      1.000000001e-01      0b0.01.00.100110011001100110011001101
    bfloat16     0.1                  0x0.01111011.1001100
    fp16         9.9976e-02           0b0.01011.1001100110
    fp32         9.99999940e-02       0b0.01111011.10011001100110011001100
    fp8e2m5      0.00e+00             0b0.00.00000
    fp8e3m4      9.38e-02             0b0.000.0110
    fp8e4m3      1.02e-01             0b0.0011.101
    fp8e5m2      9.4e-02              0b0.01011.10
    fixpnt16     0.10156250           0b00000000.00011010
    fixpnt32     0.1000061035156250   0b0000000000000000.0001100110011010
    lns8         0.10511              0b0.11100.11
    int8         0                    0b00000000
    int16        0                    0b0000000000000000
    int32        0                    0b00000000000000000000000000000000
    takum8       0.125                0b0.0.110.0.00
    takum16      0.099976             0b0.0.101.11.100110011
    takum32      0.09999999963        0b0.0.101.11.1001100110011001100110011
    hfloat32     0.099999964          0b0.1000000.000110011001100110011001
    decimal32    0.1000000            0b0.01001.011110.00000000000000000000
    rational8    0.1                  0b0000'0001 / 0b0000'1010
    rational16   0.1                  0b0000'0000'0000'0001 / 0b0000'0000'0000'1010
    rational32   0.1                  0b0000'0000'0000'0000'0000'0000'0000'0001 / 0b0000'0000'0000'0000'0000'0000'0000'1010

    Type         Value                      Binary
    --------------------------------------------------------------------------------
    double       0.10000000000000001        0b0.01111111011.1001100110011001100110011001100110011001100110011010
    posit64      1.0000000000000000002e-01  0b0.01.00.10011001100110011001100110011001100110011001100110011001101
    fp64         9.99999999999999917e-02    0b0.01111111011.1001100110011001100110011001100110011001100110011001
    lns16        0.100112047230168338396    0b0.1111100.10101110
    int64        0                          0b0000000000000000000000000000000000000000000000000000000000000000
    takum64      0.10000000000000000555     0b0.0.101.11.100110011001100110011001100110011001100110011001101000000
    dfixpnt8_4   0.1000                     0.0000000000000000.0001000000000000
    dfixpnt16_8  0.10000000                 0.00000000000000000000000000000000.00010000000000000000000000000000
    hfloat64     0.099999999999999992       0b0.1000000.00011001100110011001100110011001100110011001100110011001
    decimal64    0.1000000000000000         0b0.01001.01111110.00000000000000000000000000000000000000000000000000

    Type         Value / Binary
    --------------------------------------------------------------------------------
    fp128        9.99999999999999999999999999999999928e-02
                 0b0.011111111111011.1001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001
    lns32        0.09999987268604650092473917766255908645689487457275390625
                 0b0.111111111111100.1010110110010110
    dd           1.0000000000000000000000000000000e-01
                 0b0.01111111011.1001100110011001100110011001100110011001100110011010|01100110011001100110011001100110011001100110011001101
    dd_cascade   9.999999999999999999999999999999969e-02
                 0b0.01111111011.1001100110011001100110011001100110011001100110011010|01100110011001100110011001100110011001100110011001101
    td_cascade   1.0000000000000000000000000000000000000000000000002e-01
                 0b0.01111111011.1001100110011001100110011001100110011001100110011010|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000
    qd           1.000000000000000000000000000000000000000000000000000000000000000e-01
                 0b0.01111111011.1001100110011001100110011001100110011001100110011010|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000
    qd_cascade   9.99999999999999999999999999999999999999999999999999999999999999991e-02
                 0b0.01111111011.1001100110011001100110011001100110011001100110011010|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000

Notice that decimal32 and rational32 represent 0.1 exactly: decimal
uses base-10 encoding and rational stores the fraction 1/10 directly. Every
binary type introduces rounding error, but the magnitude differs by orders
of magnitude across types.
Now try:

    double> compare 0.1

The results differ because the input is now the double-precision
floating-point number 0.1, which only approximates 1/10, and that approximation
propagates through all the number systems.
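The gap between 1/10 and the literal 0.1 can be made concrete outside ucalc. The following is a minimal Python sketch, not part of ucalc: it uses the standard `struct` module to round to IEEE binary16 and binary32, and `fractions.Fraction` to measure the exact rounding error. The helper name `round_trip` is made up for this illustration.

```python
import struct
from decimal import Decimal
from fractions import Fraction


def round_trip(value: float, fmt: str) -> float:
    """Round a Python float (binary64) to the IEEE format named by a struct code."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


tenth = Fraction(1, 10)             # the exact rational 1/10
as_double = 0.1                     # binary64 value closest to 1/10
as_single = round_trip(0.1, "<f")   # the double above, rounded to binary32
as_half = round_trip(0.1, "<e")     # the double above, rounded to binary16

for name, v in [("binary16", as_half), ("binary32", as_single), ("binary64", as_double)]:
    error = Fraction(v) - tenth     # exact signed rounding error
    print(f"{name}: {Decimal(v)}  error = {float(error):.3e}")

# A decimal or rational encoding holds 1/10 with no error at all.
print(Fraction(Decimal("0.1")) == tenth)
```

Every binary value printed here is a nearby dyadic fraction rather than 1/10 itself, which is why feeding the double 0.1 to `compare` starts every type from an already-rounded input.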
Example 3: The Golden Ratio Identity — Measuring Arithmetic Fidelity
The golden ratio satisfies phi^2 - phi - 1 = 0. With native-precision arithmetic, each type reveals its true residual:

    posit32> x = phi
    1.618033990e+00
    posit32> show x * x - x - 1
      value: 0.000000000e+00
      binary: 0b0.0000000000000000000000000000000..
      components: sign: +, regime: -31, exponent: 1, significand: 1
      type: posit< 32, 2, uint32_t>

Posit32 evaluates to exactly zero (lucky cancellation at this precision). IEEE single shows a residual of one ULP:

    fp32> y = phi
    1.61803401e+00
    fp32> show y * y - y - 1
      value: 1.19209290e-07
      binary: 0b0.01101000.00000000000000000000000
      components: sign: +, scale: -23, significand: 1.000000000e+00
      type: fp32 (IEEE-754 binary32)

Double-double reveals a residual at its own machine epsilon (~1e-33):

    dd> z = phi
    1.6180339887498948482045868343656e+00
    dd> show z * z - z - 1
      value: -6.1629758220391547297791294162718e-33
      binary: 0b1.01110010100.000...0|000...0
      components: double-double: -6.16298e-33
      type: double-double

Posit32 is lucky in this expression because phi and phi^2 round in the same direction and yield values exactly 1.0 apart:

    Active type: posit32 (posit< 32, 2, uint32_t>)
    posit32> x = phi
    1.618033990e+00
    posit32> xsqr = phi * phi
    2.618033990e+00
    posit32> vars
      posit32 x = 1.618033990e+00
      posit32 xsqr = 2.618033990e+00

    posit32> show x
      value: 1.618033990e+00
      color: 01000100111100011011101111001110
      components: sign: +, regime: 0, exponent: 1, significand: 1.61803399026393890381
      type: posit< 32, 2, uint32_t>
    posit32> show xsqr
      value: 2.618033990e+00
      color: 01001010011110001101110111100111
      components: sign: +, regime: 0, exponent: 2, significand: 1.3090169951319694519
      type: posit< 32, 2, uint32_t>
    posit32> show xsqr - x
      value: 1.000000000e+00
      color: 01000000000000000000000000000000
      components: sign: +, regime: 0, exponent: 1, significand: 1
      type: posit< 32, 2, uint32_t>

The Priest-based dd_cascade shows the same dynamic:

    posit32> type dd_cascade
    Active type: dd_cascade (double-double Priest)
    dd_cascade> y = phi
    1.618033988749894848204586834365637e+00
    dd_cascade> ysqr = phi * phi
    2.618033988749895023991527441632691e+00
    dd_cascade> vars
      dd_cascade y = 1.618033988749894848204586834365637e+00
      dd_cascade ysqr = 2.618033988749895023991527441632691e+00

    dd_cascade> show y
      value: 1.618033988749894848204586834365637e+00
      color: dd_cascade[ high: 1.61803, low: -5.43212e-17 ]
      components: double-double Priest: 1.61803
      type: double-double Priest
    dd_cascade> show ysqr
      value: 2.618033988749895023991527441632691e+00
      color: dd_cascade[ high: 2.61803, low: 1.21466e-16 ]
      components: double-double Priest: 2.61803
      type: double-double Priest
    dd_cascade> show ysqr - y
      value: 1.000000000000000000000000000000000e+00
      color: dd_cascade[ high: 1, low: 0 ]
      components: double-double Priest: 1
      type: double-double Priest

Example 4: Dynamic Range Comparison Across 16-bit Types
The range command reveals how different 16-bit types trade precision for
range:

    fp16> range
    fp16 (IEEE-754 binary16)
    [ -6.5504e+04 ... -5.9605e-08 0 5.9605e-08 ... 6.5504e+04 ]

    bfloat16> range
    bfloat16
    [ -3.4e+38 ... -1.2e-38 0 1.2e-38 ... 3.4e+38 ]

    posit16> range
    posit< 16, 2, uint16_t>
    [ -7.2058e+16 ... -1.3878e-17 0 1.3878e-17 ... 7.2058e+16 ]

    lns16> range
    lns< 16, 8, uint16_t, Saturating>
    [ -18396865112328554496 ... -5.436e-20 0 5.436e-20 ... 18396865112328554496 ]

All four are 16 bits, but their tradeoffs are dramatic:
| Type | Dynamic Range (decades) | Precision (digits) |
|---|---|---|
| fp16 | ~12 decades | 3.0 |
| bfloat16 | ~76 decades | 2.1 |
| posit16 | ~34 decades | 3.3 |
| lns16 | ~39 decades | ~2.4 |
bfloat16 matches float’s range but sacrifices precision. posit16 delivers more precision than fp16 AND more range. lns16, by encoding values as logarithms, reaches further than posit16 at the cost of precision, though bfloat16 remains the widest of the four.
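The "decades" figure is simply log10(maxpos / minpos). As a rough cross-check, here is a small Python sketch that applies that formula to the minpos and maxpos values printed by `range` above. The dictionary and function names are illustrative only, and the bfloat16 inputs carry just two significant digits, so treat the results as approximate.

```python
import math

# (minpos, maxpos) copied from the `range` output shown above.
ranges = {
    "fp16": (5.9605e-08, 6.5504e+04),
    "bfloat16": (1.2e-38, 3.4e+38),
    "posit16": (1.3878e-17, 7.2058e+16),
    "lns16": (5.436e-20, 18396865112328554496.0),
}


def decades(minpos: float, maxpos: float) -> float:
    """Dynamic range in powers of ten between the smallest and largest positive value."""
    return math.log10(maxpos / minpos)


spans = {name: decades(lo, hi) for name, (lo, hi) in ranges.items()}

# Print from narrowest to widest dynamic range.
for name, span in sorted(spans.items(), key=lambda item: item[1]):
    print(f"{name:9s} {span:5.1f} decades")
```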
Example 5: Precision Ladder — From 8-bit to 32-bit
The precision command measures each type’s effective precision at 1.0.
This reveals how bit-width translates to decimal accuracy:

    fp8e4m3> precision
      type: fp8e4m3 (OFP 8-bit e4m3)
      binary digits: 3
      decimal digits: 0.9
      epsilon: 1.25e-01
      minpos: 1.95e-03
      maxpos: 4.16e+02

    posit16> precision
      type: posit< 16, 2, uint16_t>
      binary digits: 11
      decimal digits: 3.3
      epsilon: 4.8828e-04
      minpos: 1.3878e-17
      maxpos: 7.2058e+16

    fp32> precision
      type: fp32 (IEEE-754 binary32)
      binary digits: 23
      decimal digits: 6.9
      epsilon: 1.19209290e-07
      minpos: 1.40129846e-45
      maxpos: 3.40282347e+38

    posit32> precision
      type: posit< 32, 2, uint32_t>
      binary digits: 27
      decimal digits: 8.1
      epsilon: 7.450580597e-09
      minpos: 7.523163845e-37
      maxpos: 1.329227996e+36

At 32 bits, posit delivers 27 binary digits near 1.0 compared to IEEE float’s 23, a 16x smaller epsilon. This means posit32 can resolve perturbations near 1.0 that fp32 cannot (see Example 1).
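The relationship between the reported fields can be reproduced by hand: with p fraction bits at 1.0, epsilon is 2^-p and the decimal digit count is p·log10(2). A short Python sketch of that arithmetic follows; the function name `precision_at_one` is made up for this example and is not a ucalc API.

```python
import math


def precision_at_one(fraction_bits: int) -> tuple[float, float]:
    """Return (epsilon, decimal digits) for a format with the given
    number of fraction bits at 1.0."""
    epsilon = 2.0 ** -fraction_bits
    decimal_digits = fraction_bits * math.log10(2)
    return epsilon, decimal_digits


fp32_eps, fp32_digits = precision_at_one(23)
posit32_eps, posit32_digits = precision_at_one(27)
ratio = fp32_eps / posit32_eps   # 2**(27 - 23)

print(f"fp32:    epsilon={fp32_eps:.9e}  digits={fp32_digits:.1f}")
print(f"posit32: epsilon={posit32_eps:.9e}  digits={posit32_digits:.1f}")
print(f"ratio:   {ratio:.0f}x")
```

Four extra fraction bits account for the whole difference: 2^4 = 16.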
Example 6: Catastrophic Cancellation
Subtracting nearly equal quantities destroys significant digits. The
expression (1 + 1e-8) - 1 should yield 1e-8 but exercises catastrophic
cancellation:

    fp32> show (1 + 1e-8) - 1
      value: 0.00000000e+00
      binary: 0b0.00000000.00000000000000000000000
      components: sign: +, zero
      type: fp32 (IEEE-754 binary32)

IEEE single loses the 1e-8 term entirely: it is below the ULP at 1.0 (~1.2e-7). posit32 preserves part of the term:

    posit32> show (1 + 1e-8) - 1
      value: 7.450580597e-09
      binary: 0b0.00000001.01.000000000000000000000
      components: sign: +, regime: -7, exponent: 2, significand: 1
      type: posit< 32, 2, uint32_t>

The posit result (7.45e-9) is one posit32 ULP at 1.0: the sum 1 + 1e-8 rounds to the nearest representable value, 1 + 2^-27, so the subtraction returns 2^-27 rather than 1e-8. Double-double recovers nearly full accuracy:

    dd> show (1 + 1e-8) - 1
      value: 1.0000000000000000209225608301285e-08
      binary: 0b0.01111100100.0101011110011000111011100010001100001000110000111010|000...0
      components: double-double: 1e-08
      type: double-double

This pattern is critical in numerical methods where loss of significance in intermediate results cascades into large final errors.
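The same experiment can be sketched in plain Python without ucalc. This is an illustration under stated assumptions: binary32 is emulated by rounding through `struct` after every operation, and the posit32 case is modelled only as "a grid with 27 fraction bits just above 1.0" (the spacing reported by `precision`), not as a real posit implementation. The helper names are hypothetical.

```python
import struct
from fractions import Fraction


def to_f32(value: float) -> float:
    """Round a binary64 value to binary32 and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def round_to_grid(value: Fraction, fraction_bits: int) -> Fraction:
    """Round to the nearest multiple of 2**-fraction_bits, a stand-in for a
    format's spacing just above 1.0."""
    step = Fraction(1, 2 ** fraction_bits)
    return round(value / step) * step


tiny = Fraction(1, 10 ** 8)

# binary32: the small term is absorbed when the sum is rounded.
f32_result = to_f32(to_f32(1.0 + to_f32(1e-8)) - 1.0)

# binary64: most of the term survives, but not all of it.
f64_result = (1.0 + 1e-8) - 1.0

# 27 fraction bits at 1.0: the sum snaps to 1 + 2**-27.
grid_result = round_to_grid(1 + tiny, 27) - 1

print(f32_result, f64_result, float(grid_result))
```

In each case the subtraction itself is exact; the damage is done when the sum is rounded, and the subtraction merely exposes it.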
Example 7: Faithful Rounding Verification
A result is faithfully rounded if it equals one of the two representable
values adjacent to the exact answer. The faithful command checks this
against a quad-double reference:

    posit32> faithful sqrt(2)
      result: 1.414213561e+00
      reference: 1.414213562373095048801688724209698...e+00
      rounded: 1.414213561e+00
      neighbor: 1.414213568e+00
      faithful: yes

    fp32> faithful sqrt(2)
      result: 1.41421354e+00
      reference: 1.414213562373095048801688724209698...e+00
      rounded: 1.41421354e+00
      neighbor: 1.41421366e+00
      faithful: yes

Both posit32 and fp32 produce faithfully rounded sqrt(2). The rounded
value is the nearest representable, and neighbor is the next representable
in the opposite direction. The result must equal one of them.
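The definition above translates into a short check. Below is a minimal Python sketch for binary32 only, assuming a `decimal` reference at 60 digits in place of ucalc's quad-double reference. The helpers `to_f32`, `step_f32`, and `is_faithful` are names invented for this example.

```python
import math
import struct
from decimal import Decimal, getcontext

getcontext().prec = 60  # reference precision, far beyond binary32


def to_f32(value: float) -> float:
    """Round a binary64 value to binary32 and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def step_f32(value: float, direction: int) -> float:
    """Adjacent binary32 value; valid for positive, finite, normal inputs."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", bits + direction))[0]


def is_faithful(result: float, reference: Decimal) -> bool:
    """True if result is one of the two binary32 values bracketing reference."""
    exact = Decimal(result)  # float -> Decimal conversion is exact
    if exact == reference:
        return True
    if exact < reference:
        return Decimal(step_f32(result, +1)) > reference
    return Decimal(step_f32(result, -1)) < reference


reference = Decimal(2).sqrt()
result = to_f32(math.sqrt(2.0))
print(result, is_faithful(result, reference))
```

Both bracketing values pass the check, while anything two or more steps away fails, which matches the "one of the two adjacent values" wording.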
Example 8: Transcendental Error Profiles with Sweep
The sweep command evaluates an expression across a range and reports
ULP error vs a double-precision reference. This reveals where type-specific
approximation errors concentrate:

    posit32> sweep sin(x) for x in [0, 3.14159, 6]
    x          result            double ref            ULP error
    -------------------------------------------------------------------------------------
    0          0.000000000e+00   0                     0.00
    0.628318   5.877848230e-01   0.58778482293254253   0.04
    1.256636   9.510561898e-01   0.95105618829288086   0.44
    1.884954   9.510570094e-01   0.95105700829655349   0.31
    2.513272   5.877869688e-01   0.58778696973054001   0.85
    3.14159    2.654390983e-06   2.65358979335273e-06  20261.95

The error is sub-ULP through most of the range but explodes near pi where argument reduction subtracts nearly equal quantities. This is a fundamental limitation shared by all binary types: the result near sin(pi) depends on how many digits of pi the type can represent.
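A similar blow-up can be reproduced with IEEE binary32 standing in for the narrow type. The sketch below is an approximation of what a sweep measures, not ucalc's implementation: it assumes the dominant effect is that the argument x is rounded to the narrow type before sin is applied, and it expresses the error in binary32 ULPs at the reference value. All function names are hypothetical.

```python
import math
import struct


def to_f32(value: float) -> float:
    """Round a binary64 value to binary32 and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def ulp_f32(value: float) -> float:
    """Spacing of binary32 at |value| (normal range only)."""
    _, exponent = math.frexp(abs(value))  # |value| = m * 2**exponent, 0.5 <= m < 1
    return 2.0 ** (exponent - 24)


def ulp_error(x: float) -> float:
    """Error of a binary32 sin(x), in binary32 ULPs of the double reference."""
    reference = math.sin(x)               # double-precision reference
    result = to_f32(math.sin(to_f32(x)))  # input and output rounded to binary32
    return abs(result - reference) / ulp_f32(reference)


for x in (0.628318, 3.14159):
    print(f"x = {x}: {ulp_error(x):.2f} ULP")
```

Near pi the true result is tiny, so its ULP is tiny as well, while the rounding error carried in by x stays roughly the same absolute size. Dividing the one by the other produces the large ULP count.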
Example 9: Exact Decimal Arithmetic for Financial Calculations
Binary floating-point cannot represent 0.1, 0.01, or most decimal
fractions exactly. In financial software this causes accumulation errors
that violate accounting identities. Decimal fixed-point (dfixpnt)
uses BCD encoding and carries every decimal digit without rounding:

    double> show 0.1 + 0.2 - 0.3
      value: 5.5511151231257827e-17
      binary: 0b0.01111001001.0000000000000000000000000000000000000000000000000000
      components: sign: +, scale: -54, significand: 1
      type: double (IEEE-754 binary64)

    double> type dfixpnt16_8
    Active type: dfixpnt16_8 (dfixpnt< 16, 8, BCD, Modulo, uint8_t>)
    dfixpnt16_8> show 0.1 + 0.2 - 0.3
      value: 0.00000000
      binary: 0.00000000000000000000000000000000.00000000000000000000000000000000
      components: dfixpnt< 16, 8, BCD, Modulo, uint8_t>: 0
      type: dfixpnt< 16, 8, BCD, Modulo, uint8_t>

Double produces a non-zero residual (~5.55e-17) because 0.1 and 0.2
are rounded on entry. dfixpnt16_8 yields exactly zero.
This matters when totals must balance to the penny. Consider an invoice with three items at $19.99, two at $5.99, and one at $1.50, plus 7.25% sales tax:

    dfixpnt16_8> tax = 0.0725
    0.07250000
    dfixpnt16_8> subtotal = 19.99 * 3 + 5.99 * 2 + 1.50
    73.45000000
    dfixpnt16_8> show subtotal + subtotal * tax
      value: 78.77512500
      binary: 0.00000000000000000000000001111000.01110111010100010010010100000000
      components: dfixpnt< 16, 8, BCD, Modulo, uint8_t>: 78.7751
      type: dfixpnt< 16, 8, BCD, Modulo, uint8_t>

The subtotal is exactly $73.45, tax is exactly $5.325125, and the grand total is exactly $78.775125. The same calculation in double:

    double> tax = 0.0725
    0.072499999999999995
    double> subtotal = 19.99 * 3 + 5.99 * 2 + 1.50
    73.450000000000003
    double> show subtotal + subtotal * tax
      value: 78.775125000000003
      binary: 0b0.10000000101.0011101100011001101110100101111000110101001111111000
      components: sign: +, scale: 6, significand: 1.230861328125
      type: double (IEEE-754 binary64)

Double’s subtotal is already 73.450000000000003, off by 3e-15. These errors are invisible in a single calculation but accumulate across thousands of line items in a ledger, eventually causing reconciliation failures. Decimal fixed-point eliminates this class of error entirely.
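The contrast can be reproduced with Python's standard `decimal` module. This is only a stand-in: `decimal.Decimal` is an arbitrary-precision decimal float, not the BCD fixed-point `dfixpnt` type, but it shares the property that decimal literals are held exactly.

```python
from decimal import Decimal

# Binary floating point: the literals are rounded before any arithmetic happens.
binary_residual = 0.1 + 0.2 - 0.3

# Decimal arithmetic: the same literals are held exactly.
decimal_residual = Decimal("0.1") + Decimal("0.2") - Decimal("0.3")

# The invoice from the text: three items at 19.99, two at 5.99, one at 1.50.
tax = Decimal("0.0725")
subtotal = Decimal("19.99") * 3 + Decimal("5.99") * 2 + Decimal("1.50")
total = subtotal + subtotal * tax

print("binary residual: ", binary_residual)
print("decimal residual:", decimal_residual)
print("subtotal:", subtotal, " total:", total)
```

Note that the decimal values are constructed from strings. Writing `Decimal(0.1)` would import the binary rounding error instead of avoiding it.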
Example 10: Takum’s Uniform Precision Across the Dynamic Range
Posit arithmetic concentrates precision near 1.0 by using a variable-length regime field: values close to 1.0 get many fraction bits, but extreme values consume most bits on the regime, leaving few for the significand. Takum (Hunhold, 2024) replaces the variable-length regime with a bounded characteristic field, giving a more uniform precision distribution and a dramatically wider dynamic range.
At 32 bits, both types deliver identical precision near 1.0:

    takum32> precision
      type: takum< 32, 3, uint32_t>
      binary digits: 27
      decimal digits: 8.1
      epsilon: 7.450580597e-09
      minpos: 1.727235358e-77
      maxpos: 5.789601701e+76

    posit32> precision
      type: posit< 32, 2, uint32_t>
      binary digits: 27
      decimal digits: 8.1
      epsilon: 7.450580597e-09
      minpos: 7.523163845e-37
      maxpos: 1.329227996e+36

Same epsilon, same 27 binary digits at 1.0. But takum32 spans 10^77 while posit32 reaches only 10^36, over twice the dynamic range in decades.
The difference becomes dramatic away from 1.0. Compare the ULP at increasing scales:
| Scale | takum32 ULP | posit32 ULP | Relative ULP (takum) | Relative ULP (posit) |
|---|---|---|---|---|
| 1 | 7.45e-9 | 7.45e-9 | 7.45e-9 | 7.45e-9 |
| 1e5 | 5.96e-3 | 5.96e-3 | 5.96e-8 | 5.96e-8 |
| 1e10 | 1,192 | 9,537 | 1.19e-7 | 9.54e-7 |
| 1e15 | 1.19e8 | 1.53e10 | 1.19e-7 | 1.53e-5 |
| 1e20 | 2.38e13 | 2.44e16 | 2.38e-7 | 2.44e-4 |
| 1e30 | 2.38e23 | 6.44e28 | 2.38e-7 | 6.44e-2 |
Takum’s relative ULP stays within a factor of 32 (7.45e-9 to 2.38e-7) across 30 decades of scale. Posit’s degrades from 7.45e-9 at 1.0 to 0.064 at 1e30, a factor of 8.6 million. At 1e30, posit32 has barely one significant digit left.
This is visible in the representations themselves:

    takum32> show 1e20
      value: 1.00000002e+20
      binary: 0b0.1.110.000011.010110101111000111011
      components: ... Characteristic : 66 Scale : 66
      type: takum< 32, 3, uint32_t>

    posit32> show 1e20
      value: 1.000159405e+20
      binary: 0b0.111111111111111110.10.01011011000
      components: sign: +, regime: 16, exponent: 4, significand: 1.35546875
      type: posit< 32, 2, uint32_t>

Takum32 represents 1e20 to 8 significant digits (1.00000002e+20). Posit32 manages only 4 (1.000159405e+20, off by 1.6e16). The posit’s regime field has expanded to 18 bits, leaving 13 bits after the sign: 2 for the exponent and only 11 for the fraction. Takum’s characteristic field stays bounded, preserving fraction bits at every scale.
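The field widths in the posit32 bit pattern follow from the standard posit encoding, where the regime is a run of identical bits whose length grows with the magnitude of the scale. The Python sketch below computes those widths for a given value. It is an illustration of the encoding rule, not code from ucalc or Universal, it ignores rounding into a neighbouring binade, and the function name is invented here.

```python
import math


def posit_field_widths(x: float, nbits: int = 32, es: int = 2) -> tuple[int, int, int]:
    """Return (regime_bits, exponent_bits, fraction_bits) for the posit
    binade containing |x|, using the standard regime run-length encoding."""
    _, e = math.frexp(abs(x))
    scale = e - 1                          # floor(log2(|x|))
    k = scale >> es                        # regime value: floor(scale / 2**es)
    run = k + 1 if k >= 0 else -k          # length of the run of identical bits
    regime_bits = min(run + 1, nbits - 1)  # run plus terminating bit, if it fits
    remaining = nbits - 1 - regime_bits    # bits left after sign and regime
    exponent_bits = min(es, remaining)
    fraction_bits = remaining - exponent_bits
    return regime_bits, exponent_bits, fraction_bits


for x in (1.0, 1e20, 1e30):
    regime, exponent, fraction = posit_field_widths(x)
    print(f"{x:g}: regime={regime} exponent={exponent} fraction={fraction}")
```

For 1e20 this gives an 18-bit regime and 11 fraction bits, matching the bit pattern shown by `show 1e20`. Every additional factor of 16 in magnitude costs one more fraction bit.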
A sweep of sqrt(x) across a wide range confirms the pattern:

    takum16> sweep sqrt(x) for x in [0.001, 1e12, 8]
    x              result       double ref            ULP error
    -------------------------------------------------------------------------------------
    0.001          0.031616     0.031622776601683791  0.21
    1.4285714e+11  3.7888e+05   377964.47300922842    0.31
    4.2857143e+11  6.5536e+05   654653.67070797761    0.14
    8.5714286e+11  9.257e+05    925820.09977255156    0.03
    1e+12          9.9942e+05   1000000.0000000001    0.29

    posit16> sweep sqrt(x) for x in [0.001, 1e12, 8]
    x              result       double ref            ULP error
    -------------------------------------------------------------------------------------
    0.001          3.1616e-02   0.031622776601683791  0.43
    1.4285714e+11  3.7069e+05   377964.47300922842    2.46
    4.2857143e+11  6.4307e+05   654653.67070797761    2.26
    8.5714286e+11  9.0931e+05   925820.09977255156    4.56
    1e+12          9.7894e+05   1000000.0000000001    10.78

At 1e12, posit16’s sqrt has 10.78 ULP error vs takum16’s 0.29, a 37x improvement. Takum maintains sub-ULP accuracy across the entire range because it never runs out of fraction bits.
Available Types
ucalc registers 42 types spanning the major number system families:
| Family | Types |
|---|---|
| Integer | int8, int16, int32, int64 |
| Fixed-point | fixpnt16, fixpnt32 |
| Decimal fixed-point | dfixpnt8_4, dfixpnt16_8 |
| Native IEEE | float, double |
| Classic float | fp16, fp32, fp64, fp128 |
| Google Brain Float | bfloat16 |
| FP8 (Deep Learning) | fp8e2m5, fp8e3m4, fp8e4m3, fp8e5m2 |
| Logarithmic | lns8, lns16, lns32 |
| Posit | posit8, posit16, posit32, posit64 |
| Takum | takum8, takum16, takum32, takum64 |
| Decimal float | decimal32, decimal64 |
| Hexadecimal float | hfloat32, hfloat64 |
| Rational | rational8, rational16, rational32 |
| Multi-component | dd, dd_cascade, td_cascade, qd, qd_cascade |