
ucalc: Mixed-Precision REPL Calculator

ucalc is an interactive calculator for exploring and comparing arithmetic across Universal number types. Instead of writing, compiling, and running C++ for each experiment, you can compare representations, measure precision, and analyze errors interactively.

# Build
cmake -DUNIVERSAL_BUILD_TOOLS_UCALC=ON ..
make ucalc
# Interactive
ucalc
# One-shot
ucalc "type posit32; show 1/3"
# Pipe mode
echo "compare sqrt(2)" | ucalc
Command                               Description
---------------------------------------------------------------------------------
type <name>                           Set the active arithmetic type
types                                 List all available types
show <expr>                           Value + binary decomposition + components
compare <expr>                        Evaluate across all types in a table
range                                 Symmetric range: [maxneg … minneg] 0 [minpos … maxpos]
precision                             Binary/decimal digits, epsilon, minpos, maxpos
ulp <value>                           Unit in the last place at a given value
bits <expr>                           Raw bit pattern
sweep <expr> for <var> in [a, b, n]   Error analysis across a range
faithful <expr>                       Check faithful rounding vs a higher-precision reference
color on/off                          Toggle ANSI color-coded bit fields
vars                                  List defined variables
help                                  Command reference

Expressions support standard arithmetic (+, -, *, /, ^), parentheses, variables (x = expr), constants (pi, e, phi, ln2, ln10, sqrt2), and functions (sqrt, abs, log, exp, sin, cos, pow).

Constants are sourced at quad-double precision (~64 decimal digits) and converted to the active type at its native precision.


Example 1: Precision Near 1.0 — Where Posit Outshines IEEE


Posit’s tapered precision allocates more fraction bits near 1.0 than IEEE float does at the same bit width, so posit32 has a smaller epsilon (7.45e-9 vs 1.19e-7) and can resolve smaller perturbations.

double> type float
Active type: float (float (IEEE-754 binary32))
float> precision
type: float (IEEE-754 binary32)
binary digits: 23
decimal digits: 6.9
epsilon: 1.1920929e-07
minpos: 1.40129846e-45
maxpos: 3.40282347e+38
float> type posit32
Active type: posit32 (posit< 32, 2, uint32_t>)
posit32> precision
type: posit< 32, 2, uint32_t>
binary digits: 27
decimal digits: 8.1
epsilon: 7.450580597e-09
minpos: 7.523163845e-37
maxpos: 1.329227996e+36

The decimal value 0.1 cannot be represented exactly in binary floating-point. The compare command reveals how each type approximates it, grouped by bit width (small ≤32 bits, medium 33–80 bits, large >80 bits):

double> compare 1/10
Type Value Binary
----------------------------------------------------------------------
float 0.100000001 0b0.01111011.10011001100110011001101
posit8 1.02e-01 0b0.01.00.101
posit16 1.0001e-01 0b0.01.00.10011001101
posit32 1.000000001e-01 0b0.01.00.100110011001100110011001101
bfloat16 0.1 0b0.01111011.1001100
fp16 9.9976e-02 0b0.01011.1001100110
fp32 9.99999940e-02 0b0.01111011.10011001100110011001100
fp8e2m5 0.00e+00 0b0.00.00000
fp8e3m4 9.38e-02 0b0.000.0110
fp8e4m3 1.02e-01 0b0.0011.101
fp8e5m2 9.4e-02 0b0.01011.10
fixpnt16 0.10156250 0b00000000.00011010
fixpnt32 0.1000061035156250 0b0000000000000000.0001100110011010
lns8 0.10511 0b0.11100.11
int8 0 0b00000000
int16 0 0b0000000000000000
int32 0 0b00000000000000000000000000000000
takum8 0.125 0b0.0.110.0.00
takum16 0.099976 0b0.0.101.11.100110011
takum32 0.09999999963 0b0.0.101.11.1001100110011001100110011
hfloat32 0.099999964 0b0.1000000.000110011001100110011001
decimal32 0.1000000 0b0.01001.011110.00000000000000000000
rational8 0.1 0b0000'0001 / 0b0000'1010
rational16 0.1 0b0000'0000'0000'0001 / 0b0000'0000'0000'1010
rational32 0.1 0b0000'0000'0000'0000'0000'0000'0000'0001 / 0b0000'0000'0000'0000'0000'0000'0000'1010
Type Value Binary
--------------------------------------------------------------------------------
double 0.10000000000000001 0b0.01111111011.1001100110011001100110011001100110011001100110011010
posit64 1.0000000000000000002e-01 0b0.01.00.10011001100110011001100110011001100110011001100110011001101
fp64 9.99999999999999917e-02 0b0.01111111011.1001100110011001100110011001100110011001100110011001
lns16 0.100112047230168338396 0b0.1111100.10101110
int64 0 0b0000000000000000000000000000000000000000000000000000000000000000
takum64 0.10000000000000000555 0b0.0.101.11.100110011001100110011001100110011001100110011001101000000
dfixpnt8_4 0.1000 0.0000000000000000.0001000000000000
dfixpnt16_8 0.10000000 0.00000000000000000000000000000000.00010000000000000000000000000000
hfloat64 0.099999999999999992 0b0.1000000.00011001100110011001100110011001100110011001100110011001
decimal64 0.1000000000000000 0b0.01001.01111110.00000000000000000000000000000000000000000000000000
Type Value / Binary
--------------------------------------------------------------------------------
fp128 9.99999999999999999999999999999999928e-02
0b0.011111111111011.1001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001
lns32 0.09999987268604650092473917766255908645689487457275390625
0b0.111111111111100.1010110110010110
dd 1.0000000000000000000000000000000e-01
0b0.01111111011.1001100110011001100110011001100110011001100110011010|01100110011001100110011001100110011001100110011001101
dd_cascade 9.999999999999999999999999999999969e-02
0b0.01111111011.1001100110011001100110011001100110011001100110011010|01100110011001100110011001100110011001100110011001101
td_cascade 1.0000000000000000000000000000000000000000000000002e-01
0b0.01111111011.1001100110011001100110011001100110011001100110011010|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000
qd 1.000000000000000000000000000000000000000000000000000000000000000e-01
0b0.01111111011.1001100110011001100110011001100110011001100110011010|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000
qd_cascade 9.99999999999999999999999999999999999999999999999999999999999999991e-02
0b0.01111111011.1001100110011001100110011001100110011001100110011010|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000

Notice that decimal32 and rational32 represent 0.1 exactly — decimal uses base-10 encoding and rational stores the fraction 1/10 directly. Every binary type introduces rounding error, but the magnitude differs by orders of magnitude across types.

Now try:

double> compare 0.1

The results differ because we are now handing the number systems a double-precision floating-point value, 0.1, which itself only approximates 1/10; that initial approximation propagates through every number system.


Example 3: The Golden Ratio Identity — Measuring Arithmetic Fidelity


The golden ratio satisfies phi^2 - phi - 1 = 0. With native-precision arithmetic, each type reveals its true residual:

posit32> x = phi
1.618033990e+00
posit32> show x * x - x - 1
value: 0.000000000e+00
binary: 0b0.0000000000000000000000000000000..
components: sign: +, regime: -31, exponent: 1, significand: 1
type: posit< 32, 2, uint32_t>

Posit32 evaluates to exactly zero (lucky cancellation at this precision). IEEE single shows a residual of one ULP:

fp32> y = phi
1.61803401e+00
fp32> show y * y - y - 1
value: 1.19209290e-07
binary: 0b0.01101000.00000000000000000000000
components: sign: +, scale: -23, significand: 1.000000000e+00
type: fp32 (IEEE-754 binary32)

Double-double reveals a residual at its own machine epsilon (~1e-33):

dd> z = phi
1.6180339887498948482045868343656e+00
dd> show z * z - z - 1
value: -6.1629758220391547297791294162718e-33
binary: 0b1.01110010100.000...0|000...0
components: double-double: -6.16298e-33
type: double-double

Posit32 gets lucky in this expression: phi and phi^2 round in the same direction and yield values exactly 1.0 apart:

Active type: posit32 (posit< 32, 2, uint32_t>)
posit32> x = phi
1.618033990e+00
posit32> xsqr = phi * phi
2.618033990e+00
posit32> vars
posit32 x = 1.618033990e+00
posit32 xsqr = 2.618033990e+00
posit32> show x
value: 1.618033990e+00
color: 01000100111100011011101111001110
components: sign: +, regime: 0, exponent: 1, significand: 1.61803399026393890381
type: posit< 32, 2, uint32_t>
posit32> show xsqr
value: 2.618033990e+00
color: 01001010011110001101110111100111
components: sign: +, regime: 0, exponent: 2, significand: 1.3090169951319694519
type: posit< 32, 2, uint32_t>
posit32> show xsqr - x
value: 1.000000000e+00
color: 01000000000000000000000000000000
components: sign: +, regime: 0, exponent: 1, significand: 1
type: posit< 32, 2, uint32_t>

The Priest-based dd_cascade shows the same dynamic:

posit32> type dd_cascade
Active type: dd_cascade (double-double Priest)
dd_cascade> y = phi
1.618033988749894848204586834365637e+00
dd_cascade> ysqr = phi * phi
2.618033988749895023991527441632691e+00
dd_cascade> vars
dd_cascade y = 1.618033988749894848204586834365637e+00
dd_cascade ysqr = 2.618033988749895023991527441632691e+00
dd_cascade> show y
value: 1.618033988749894848204586834365637e+00
color: dd_cascade[ high: 1.61803, low: -5.43212e-17 ]
components: double-double Priest: 1.61803
type: double-double Priest
dd_cascade> show ysqr
value: 2.618033988749895023991527441632691e+00
color: dd_cascade[ high: 2.61803, low: 1.21466e-16 ]
components: double-double Priest: 2.61803
type: double-double Priest
dd_cascade> show ysqr - y
value: 1.000000000000000000000000000000000e+00
color: dd_cascade[ high: 1, low: 0 ]
components: double-double Priest: 1
type: double-double Priest

Example 4: Dynamic Range Comparison Across 16-bit Types


The range command reveals how different 16-bit types trade precision for range:

fp16> range
fp16 (IEEE-754 binary16)
[ -6.5504e+04 ... -5.9605e-08 0 5.9605e-08 ... 6.5504e+04 ]
bfloat16> range
bfloat16
[ -3.4e+38 ... -1.2e-38 0 1.2e-38 ... 3.4e+38 ]
posit16> range
posit< 16, 2, uint16_t>
[ -7.2058e+16 ... -1.3878e-17 0 1.3878e-17 ... 7.2058e+16 ]
lns16> range
lns< 16, 8, uint16_t, Saturating>
[ -18396865112328554496 ... -5.436e-20 0 5.436e-20 ... 18396865112328554496 ]

All four are 16 bits, but their tradeoffs are dramatic:

Type       Dynamic Range (decades)   Precision (digits)
--------------------------------------------------------
fp16       ~13                       3.0
bfloat16   ~76                       2.1
posit16    ~34                       3.3
lns16      ~39                       ~2.4

bfloat16 matches float’s range but sacrifices precision. posit16 delivers more precision than fp16 AND more range. lns16 achieves the widest range of any 16-bit format by encoding values as logarithms.


Example 5: Precision Ladder — From 8-bit to 32-bit


The precision command measures each type’s effective precision at 1.0. This reveals how bit-width translates to decimal accuracy:

fp8e4m3> precision
type: fp8e4m3 (OFP 8-bit e4m3)
binary digits: 3
decimal digits: 0.9
epsilon: 1.25e-01
minpos: 1.95e-03
maxpos: 4.16e+02
posit16> precision
type: posit< 16, 2, uint16_t>
binary digits: 11
decimal digits: 3.3
epsilon: 4.8828e-04
minpos: 1.3878e-17
maxpos: 7.2058e+16
fp32> precision
type: fp32 (IEEE-754 binary32)
binary digits: 23
decimal digits: 6.9
epsilon: 1.19209290e-07
minpos: 1.40129846e-45
maxpos: 3.40282347e+38
posit32> precision
type: posit< 32, 2, uint32_t>
binary digits: 27
decimal digits: 8.1
epsilon: 7.450580597e-09
minpos: 7.523163845e-37
maxpos: 1.329227996e+36

At 32 bits, posit delivers 27 binary digits near 1.0 compared to IEEE float’s 23 — a 16x smaller epsilon. This means posit32 can resolve perturbations near 1.0 that fp32 cannot (see Example 1).


Example 6: Catastrophic Cancellation

Subtracting nearly equal quantities destroys significant digits. The expression (1 + 1e-8) - 1 should yield 1e-8, but it exercises catastrophic cancellation:

fp32> show (1 + 1e-8) - 1
value: 0.00000000e+00
binary: 0b0.00000000.00000000000000000000000
components: sign: +, zero
type: fp32 (IEEE-754 binary32)

IEEE single loses the 1e-8 term entirely — it’s below the ULP at 1.0 (which is ~1.2e-7). posit32 preserves the term:

posit32> show (1 + 1e-8) - 1
value: 7.450580597e-09
binary: 0b0.00000001.01.000000000000000000000
components: sign: +, regime: -7, exponent: 2, significand: 1
type: posit< 32, 2, uint32_t>

The posit result (7.45e-9) is the nearest posit representable to 1e-8. Double-double recovers nearly full accuracy:

dd> show (1 + 1e-8) - 1
value: 1.0000000000000000209225608301285e-08
binary: 0b0.01111100100.0101011110011000111011100010001100001000110000111010|000...0
components: double-double: 1e-08
type: double-double

This pattern is critical in numerical methods where loss-of-significance in intermediate results cascades into large final errors.


Example 7: Faithful Rounding

A result is faithfully rounded if it equals one of the two representable values adjacent to the exact answer. The faithful command checks this against a quad-double reference:

posit32> faithful sqrt(2)
result: 1.414213561e+00
reference: 1.414213562373095048801688724209698...e+00
rounded: 1.414213561e+00
neighbor: 1.414213568e+00
faithful: yes
fp32> faithful sqrt(2)
result: 1.41421354e+00
reference: 1.414213562373095048801688724209698...e+00
rounded: 1.41421354e+00
neighbor: 1.41421366e+00
faithful: yes

Both posit32 and fp32 produce faithfully rounded sqrt(2). The rounded value is the nearest representable, and neighbor is the next representable in the opposite direction. The result must equal one of them.


Example 8: Transcendental Error Profiles with Sweep


The sweep command evaluates an expression across a range and reports ULP error vs a double-precision reference. This reveals where type-specific approximation errors concentrate:

posit32> sweep sin(x) for x in [0, 3.14159, 6]
x result double ref ULP error
-------------------------------------------------------------------------------------
0 0.000000000e+00 0 0.00
0.628318 5.877848230e-01 0.58778482293254253 0.04
1.256636 9.510561898e-01 0.95105618829288086 0.44
1.884954 9.510570094e-01 0.95105700829655349 0.31
2.513272 5.877869688e-01 0.58778696973054001 0.85
3.14159 2.654390983e-06 2.65358979335273e-06 20261.95

The error is sub-ULP through most of the range but explodes near pi where argument reduction subtracts nearly equal quantities. This is a fundamental limitation shared by all binary types — the result near sin(pi) depends on how many digits of pi the type can represent.


Example 9: Exact Decimal Arithmetic for Financial Calculations


Binary floating-point cannot represent 0.1, 0.01, or most decimal fractions exactly. In financial software this causes accumulation errors that violate accounting identities. Decimal fixed-point (dfixpnt) uses BCD encoding and carries every decimal digit without rounding:

double> show 0.1 + 0.2 - 0.3
value: 5.5511151231257827e-17
binary: 0b0.01111001001.0000000000000000000000000000000000000000000000000000
components: sign: +, scale: -54, significand: 1
type: double (IEEE-754 binary64)
double> type dfixpnt16_8
Active type: dfixpnt16_8 (dfixpnt< 16, 8, BCD, Modulo, uint8_t>)
dfixpnt16_8> show 0.1 + 0.2 - 0.3
value: 0.00000000
binary: 0.00000000000000000000000000000000.00000000000000000000000000000000
components: dfixpnt< 16, 8, BCD, Modulo, uint8_t>: 0
type: dfixpnt< 16, 8, BCD, Modulo, uint8_t>

Double produces a non-zero residual (~5.55e-17) because 0.1 and 0.2 are rounded on entry. dfixpnt16_8 yields exactly zero.

This matters when totals must balance to the penny. Consider an invoice with three items at $19.99, two at $5.99, and one at $1.50, plus 7.25% sales tax:

dfixpnt16_8> tax = 0.0725
0.07250000
dfixpnt16_8> subtotal = 19.99 * 3 + 5.99 * 2 + 1.50
73.45000000
dfixpnt16_8> show subtotal + subtotal * tax
value: 78.77512500
binary: 0.00000000000000000000000001111000.01110111010100010010010100000000
components: dfixpnt< 16, 8, BCD, Modulo, uint8_t>: 78.7751
type: dfixpnt< 16, 8, BCD, Modulo, uint8_t>

The subtotal is exactly $73.45, tax is exactly $5.325125, and the grand total is exactly $78.775125. The same calculation in double:

double> tax = 0.0725
0.072499999999999995
double> subtotal = 19.99 * 3 + 5.99 * 2 + 1.50
73.450000000000003
double> show subtotal + subtotal * tax
value: 78.775125000000003
binary: 0b0.10000000101.0011101100011001101110100101111000110101001111111000
components: sign: +, scale: 6, significand: 1.230861328125
type: double (IEEE-754 binary64)

Double’s subtotal is already 73.450000000000003 — off by 3e-15. These errors are invisible in a single calculation but accumulate across thousands of line items in a ledger, eventually causing reconciliation failures. Decimal fixed-point eliminates this class of error entirely.


Example 10: Takum’s Uniform Precision Across the Dynamic Range


Posit arithmetic concentrates precision near 1.0 by using a variable-length regime field: values close to 1.0 get many fraction bits, but extreme values consume most bits on the regime, leaving few for the significand. Takum (Hunhold, 2024) replaces the variable-length regime with a bounded characteristic field, giving a more uniform precision distribution and a dramatically wider dynamic range.

At 32 bits, both types deliver identical precision near 1.0:

takum32> precision
type: takum< 32, 3, uint32_t>
binary digits: 27
decimal digits: 8.1
epsilon: 7.450580597e-09
minpos: 1.727235358e-77
maxpos: 5.789601701e+76
posit32> precision
type: posit< 32, 2, uint32_t>
binary digits: 27
decimal digits: 8.1
epsilon: 7.450580597e-09
minpos: 7.523163845e-37
maxpos: 1.329227996e+36

Same epsilon, same 27 binary digits at 1.0. But takum32 spans 10^77 while posit32 reaches only 10^36 — over twice the dynamic range in decades.

The difference becomes dramatic away from 1.0. Compare the ULP at increasing scales:

Scale   takum32 ULP   posit32 ULP   Relative ULP (takum)   Relative ULP (posit)
-------------------------------------------------------------------------------
1       7.45e-9       7.45e-9       7.45e-9                7.45e-9
1e5     5.96e-3       5.96e-3       5.96e-8                5.96e-8
1e10    1,192         9,537         1.19e-7                9.54e-7
1e15    1.19e8        1.53e10       1.19e-7                1.53e-5
1e20    2.38e13       2.44e16       2.38e-7                2.44e-4
1e30    2.38e23       6.44e28       2.38e-7                6.44e-2

Takum’s relative precision stays nearly constant (~2e-7) across 30 decades of scale. Posit’s degrades from 7.45e-9 at 1.0 to 0.064 at 1e30 — a factor of 8.6 million. At 1e30, posit32 has barely one significant digit left.

This is visible in the representations themselves:

takum32> show 1e20
value: 1.00000002e+20
binary: 0b0.1.110.000011.010110101111000111011
components: ... Characteristic : 66 Scale : 66
type: takum< 32, 3, uint32_t>
posit32> show 1e20
value: 1.000159405e+20
binary: 0b0.111111111111111110.10.01011011000
components: sign: +, regime: 16, exponent: 4, significand: 1.35546875
type: posit< 32, 2, uint32_t>

Takum32 represents 1e20 to 8 significant digits (1.00000002e+20). Posit32 manages only 4 (1.000159405e+20 — off by 1.6e16). The posit’s regime field has expanded to 18 bits, leaving only 11 for exponent and significand. Takum’s characteristic field stays bounded, preserving fraction bits at every scale.

A sweep of sqrt(x) across a wide range confirms the pattern:

takum16> sweep sqrt(x) for x in [0.001, 1e12, 8]
x result double ref ULP error
-------------------------------------------------------------------------------------
0.001 0.031616 0.031622776601683791 0.21
1.4285714e+11 3.7888e+05 377964.47300922842 0.31
4.2857143e+11 6.5536e+05 654653.67070797761 0.14
8.5714286e+11 9.257e+05 925820.09977255156 0.03
1e+12 9.9942e+05 1000000.0000000001 0.29
posit16> sweep sqrt(x) for x in [0.001, 1e12, 8]
x result double ref ULP error
-------------------------------------------------------------------------------------
0.001 3.1616e-02 0.031622776601683791 0.43
1.4285714e+11 3.7069e+05 377964.47300922842 2.46
4.2857143e+11 6.4307e+05 654653.67070797761 2.26
8.5714286e+11 9.0931e+05 925820.09977255156 4.56
1e+12 9.7894e+05 1000000.0000000001 10.78

At 1e12, posit16’s sqrt has 10.78 ULP error vs takum16’s 0.29 — a 37x improvement. Takum maintains sub-ULP accuracy across the entire range because it never runs out of fraction bits.


ucalc registers 42 types spanning the major number system families:

Family                 Types
-----------------------------------------------------------------
Integer                int8, int16, int32, int64
Fixed-point            fixpnt16, fixpnt32
Decimal fixed-point    dfixpnt8_4, dfixpnt16_8
Native IEEE            float, double
Classic float          fp16, fp32, fp64, fp128
Google Brain Float     bfloat16
FP8 (Deep Learning)    fp8e2m5, fp8e3m4, fp8e4m3, fp8e5m2
Logarithmic            lns8, lns16, lns32
Posit                  posit8, posit16, posit32, posit64
Takum                  takum8, takum16, takum32, takum64
Decimal float          decimal32, decimal64
Hexadecimal float      hfloat32, hfloat64
Rational               rational8, rational16, rational32
Multi-component        dd, dd_cascade, td_cascade, qd, qd_cascade