ucalc: Mixed-Precision REPL Calculator

ucalc is an interactive calculator for exploring and comparing arithmetic across Universal number types. Instead of writing, compiling, and running C++ for each experiment, you can compare representations, measure precision, and analyze errors interactively.

Quick Start

# Build
cmake -DUNIVERSAL_BUILD_TOOLS_UCALC=ON ..
make ucalc

# Interactive
ucalc

# One-shot
ucalc "type posit32; show 1/3"

# Pipe mode
echo "compare sqrt(2)" | ucalc

Commands Reference

Command	Description
`type <name>`	Set the active arithmetic type
`types`	List all available types
`show <expr>`	Value + binary decomposition + components
`compare <expr>`	Evaluate across all types in a table
`range`	Symmetry range: [maxneg … minneg] 0 [minpos … maxpos]
`precision`	Binary/decimal digits, epsilon, minpos, maxpos
`ulp <value>`	Unit in the last place at a given value
`bits <expr>`	Raw bit pattern
`sweep <expr> for <var> in [a, b, n]`	Error analysis across a range
`faithful <expr>`	Check faithful rounding vs higher-precision reference
`color on/off`	Toggle ANSI color-coded bit fields
`vars`	List defined variables
`help`	Command reference

Expressions support standard arithmetic (+, -, *, /, ^), parentheses, variables (x = expr), constants (pi, e, phi, ln2, ln10, sqrt2), and functions (sqrt, abs, log, exp, sin, cos, pow).

Constants are sourced at quad-double precision (~64 decimal digits) and converted to the active type at its native precision.

Example 1: Precision Near 1.0 — Where Posit Outshines IEEE

Posit’s tapered precision allocates more fraction bits near 1.0 than IEEE float does at the same bit width. This means posit32 has a smaller epsilon (7.45e-9 vs 1.19e-7), and can resolve smaller perturbations.

double> type float
Active type: float (float (IEEE-754 binary32))
float> precision
  type:           float (IEEE-754 binary32)
  binary digits:  23
  decimal digits: 6.9
  epsilon:        1.1920929e-07
  minpos:         1.40129846e-45
  maxpos:         3.40282347e+38
float> type posit32
Active type: posit32 (posit< 32, 2, uint32_t>)
posit32> precision
  type:           posit< 32, 2, uint32_t>
  binary digits:  27
  decimal digits: 8.1
  epsilon:        7.450580597e-09
  minpos:         7.523163845e-37
  maxpos:         1.329227996e+36

Example 2: How 0.1 Looks Across 42 Types

The decimal value 0.1 cannot be represented exactly in binary floating-point. The compare command reveals how each type approximates it, grouped by bit width (small <=32, medium 33-80, large >80):

double> compare 1/10
Type                         Value  Binary
----------------------------------------------------------------------
float                  0.100000001  0b0.01111011.10011001100110011001101
posit8                    1.02e-01  0b0.01.00.101
posit16                 1.0001e-01  0b0.01.00.10011001101
posit32            1.000000001e-01  0b0.01.00.100110011001100110011001101
bfloat16                       0.1  0x0.01111011.1001100
fp16                    9.9976e-02  0b0.01011.1001100110
fp32                9.99999940e-02  0b0.01111011.10011001100110011001100
fp8e2m5                   0.00e+00  0b0.00.00000
fp8e3m4                   9.38e-02  0b0.000.0110
fp8e4m3                   1.02e-01  0b0.0011.101
fp8e5m2                    9.4e-02  0b0.01011.10
fixpnt16                0.10156250  0b00000000.00011010
fixpnt32        0.1000061035156250  0b0000000000000000.0001100110011010
lns8                       0.10511  0b0.11100.11
int8                             0  0b00000000
int16                            0  0b0000000000000000
int32                            0  0b00000000000000000000000000000000
takum8                       0.125  0b0.0.110.0.00
takum16                   0.099976  0b0.0.101.11.100110011
takum32              0.09999999963  0b0.0.101.11.1001100110011001100110011
hfloat32               0.099999964  0b0.1000000.000110011001100110011001
decimal32                0.1000000  0b0.01001.011110.00000000000000000000
rational8                      0.1  0b0000'0001 / 0b0000'1010
rational16                     0.1  0b0000'0000'0000'0001 / 0b0000'0000'0000'1010
rational32                     0.1  0b0000'0000'0000'0000'0000'0000'0000'0001 / 0b0000'0000'0000'0000'0000'0000'0000'1010

Type                            Value  Binary
--------------------------------------------------------------------------------
double            0.10000000000000001  0b0.01111111011.1001100110011001100110011001100110011001100110011010
posit64     1.0000000000000000002e-01  0b0.01.00.10011001100110011001100110011001100110011001100110011001101
fp64          9.99999999999999917e-02  0b0.01111111011.1001100110011001100110011001100110011001100110011001
lns16         0.100112047230168338396  0b0.1111100.10101110
int64                               0  0b0000000000000000000000000000000000000000000000000000000000000000
takum64        0.10000000000000000555  0b0.0.101.11.100110011001100110011001100110011001100110011001101000000
dfixpnt8_4                     0.1000  0.0000000000000000.0001000000000000
dfixpnt16_8                0.10000000  0.00000000000000000000000000000000.00010000000000000000000000000000
hfloat64         0.099999999999999992  0b0.1000000.00011001100110011001100110011001100110011001100110011001
decimal64          0.1000000000000000  0b0.01001.01111110.00000000000000000000000000000000000000000000000000

Type        Value / Binary
--------------------------------------------------------------------------------
fp128       9.99999999999999999999999999999999928e-02
            0b0.011111111111011.1001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001
lns32       0.09999987268604650092473917766255908645689487457275390625
            0b0.111111111111100.1010110110010110
dd          1.0000000000000000000000000000000e-01
            0b0.01111111011.1001100110011001100110011001100110011001100110011010|01100110011001100110011001100110011001100110011001101
dd_cascade  9.999999999999999999999999999999969e-02
            0b0.01111111011.1001100110011001100110011001100110011001100110011010|01100110011001100110011001100110011001100110011001101
td_cascade  1.0000000000000000000000000000000000000000000000002e-01
            0b0.01111111011.1001100110011001100110011001100110011001100110011010|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000
qd          1.000000000000000000000000000000000000000000000000000000000000000e-01
            0b0.01111111011.1001100110011001100110011001100110011001100110011010|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000
qd_cascade  9.99999999999999999999999999999999999999999999999999999999999999991e-02
            0b0.01111111011.1001100110011001100110011001100110011001100110011010|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000|00000000000000000000000000000000000000000000000000000

Notice that decimal32 and rational32 represent 0.1 exactly — decimal uses base-10 encoding and rational stores the fraction 1/10 directly. Every binary type introduces rounding error, but the magnitude differs by orders of magnitude across types.

Now try:

double> compare 0.1

The results are different as we are giving the number systems a double precision floating-point number, 0.1, which approximates 1/10, and that approximation propagates through all the number systems.

Example 3: The Golden Ratio Identity — Measuring Arithmetic Fidelity

The golden ratio satisfies phi^2 - phi - 1 = 0. With native-precision arithmetic, each type reveals its true residual:

posit32> x = phi
1.618033990e+00
posit32> show x * x - x - 1
  value:      0.000000000e+00
  binary:     0b0.0000000000000000000000000000000..
  components: sign: +, regime: -31, exponent: 1, significand: 1
  type:       posit< 32, 2, uint32_t>

Posit32 evaluates to exactly zero (lucky cancellation at this precision). IEEE single shows a residual of one ULP:

fp32> y = phi
1.61803401e+00
fp32> show y * y - y - 1
  value:      1.19209290e-07
  binary:     0b0.01101000.00000000000000000000000
  components: sign: +, scale: -23, significand: 1.000000000e+00
  type:       fp32 (IEEE-754 binary32)

Double-double reveals a residual at its own machine epsilon (~1e-33):

dd> z = phi
1.6180339887498948482045868343656e+00
dd> show z * z - z - 1
  value:      -6.1629758220391547297791294162718e-33
  binary:     0b1.01110010100.000...0|000...0
  components: double-double: -6.16298e-33
  type:       double-double

Posit32 is lucky in this expression as the rounding of phi and phi^2 are in the same direction and yield values exactly 1.0 apart:

Active type: posit32 (posit< 32, 2, uint32_t>)
posit32> x = phi
1.618033990e+00
posit32> xsqr = phi * phi
2.618033990e+00
posit32> vars
  posit32     x          = 1.618033990e+00
  posit32     xsqr       = 2.618033990e+00

posit32> show x
  value:      1.618033990e+00
  color:      01000100111100011011101111001110
  components: sign: +, regime: 0, exponent: 1, significand: 1.61803399026393890381
  type:       posit< 32, 2, uint32_t>
posit32> show xsqr
  value:      2.618033990e+00
  color:      01001010011110001101110111100111
  components: sign: +, regime: 0, exponent: 2, significand: 1.3090169951319694519
  type:       posit< 32, 2, uint32_t>
posit32> show xsqr - x
  value:      1.000000000e+00
  color:      01000000000000000000000000000000
  components: sign: +, regime: 0, exponent: 1, significand: 1
  type:       posit< 32, 2, uint32_t>

The Priest-based dd_cascade shows the same dynamic:

posit32> type dd_cascade
Active type: dd_cascade (double-double Priest)
dd_cascade> y = phi
1.618033988749894848204586834365637e+00
dd_cascade> ysqr = phi * phi
2.618033988749895023991527441632691e+00
dd_cascade> vars
  dd_cascade  y          = 1.618033988749894848204586834365637e+00
  dd_cascade  ysqr       = 2.618033988749895023991527441632691e+00

dd_cascade> show y
  value:      1.618033988749894848204586834365637e+00
  color:      dd_cascade[ high: 1.61803, low: -5.43212e-17 ]
  components: double-double Priest: 1.61803
  type:       double-double Priest
dd_cascade> show ysqr
  value:      2.618033988749895023991527441632691e+00
  color:      dd_cascade[ high: 2.61803, low: 1.21466e-16 ]
  components: double-double Priest: 2.61803
  type:       double-double Priest
dd_cascade> show ysqr - y
  value:      1.000000000000000000000000000000000e+00
  color:      dd_cascade[ high: 1, low: 0 ]
  components: double-double Priest: 1
  type:       double-double Priest

Example 4: Dynamic Range Comparison Across 16-bit Types

The range command reveals how different 16-bit types trade precision for range:

fp16> range
fp16 (IEEE-754 binary16)
[ -6.5504e+04 ... -5.9605e-08  0  5.9605e-08 ... 6.5504e+04 ]

bfloat16> range
bfloat16
[ -3.4e+38 ... -1.2e-38  0  1.2e-38 ... 3.4e+38 ]

posit16> range
posit< 16, 2, uint16_t>
[ -7.2058e+16 ... -1.3878e-17  0  1.3878e-17 ... 7.2058e+16 ]

lns16> range
lns< 16, 8, uint16_t, Saturating>
[ -18396865112328554496 ... -5.436e-20  0  5.436e-20 ... 18396865112328554496 ]

All four are 16 bits, but their tradeoffs are dramatic:

Type	Dynamic Range (decades)	Precision (digits)
fp16	~13 decades	3.0
bfloat16	~76 decades	2.1
posit16	~34 decades	3.3
lns16	~39 decades	~2.4

bfloat16 matches float’s range but sacrifices precision. posit16 delivers more precision than fp16 AND more range. lns16 achieves the widest range of any 16-bit format by encoding values as logarithms.

Example 5: Precision Ladder — From 8-bit to 32-bit

The precision command measures each type’s effective precision at 1.0. This reveals how bit-width translates to decimal accuracy:

fp8e4m3> precision
  type:           fp8e4m3 (OFP 8-bit e4m3)
  binary digits:  3
  decimal digits: 0.9
  epsilon:        1.25e-01
  minpos:         1.95e-03
  maxpos:         4.16e+02

posit16> precision
  type:           posit< 16, 2, uint16_t>
  binary digits:  11
  decimal digits: 3.3
  epsilon:        4.8828e-04
  minpos:         1.3878e-17
  maxpos:         7.2058e+16

fp32> precision
  type:           fp32 (IEEE-754 binary32)
  binary digits:  23
  decimal digits: 6.9
  epsilon:        1.19209290e-07
  minpos:         1.40129846e-45
  maxpos:         3.40282347e+38

posit32> precision
  type:           posit< 32, 2, uint32_t>
  binary digits:  27
  decimal digits: 8.1
  epsilon:        7.450580597e-09
  minpos:         7.523163845e-37
  maxpos:         1.329227996e+36

At 32 bits, posit delivers 27 binary digits near 1.0 compared to IEEE float’s 23 — a 16x smaller epsilon. This means posit32 can resolve perturbations near 1.0 that fp32 cannot (see Example 1).

Example 6: Catastrophic Cancellation

Subtracting nearly equal quantities destroys significant digits. The expression (1 + 1e-8) - 1 should yield 1e-8 but exercises catastrophic cancellation:

fp32> show (1 + 1e-8) - 1
  value:      0.00000000e+00
  binary:     0b0.00000000.00000000000000000000000
  components: sign: +, zero
  type:       fp32 (IEEE-754 binary32)

IEEE single loses the 1e-8 term entirely — it’s below the ULP at 1.0 (which is ~1.2e-7). posit32 preserves the term:

posit32> show (1 + 1e-8) - 1
  value:      7.450580597e-09
  binary:     0b0.00000001.01.000000000000000000000
  components: sign: +, regime: -7, exponent: 2, significand: 1
  type:       posit< 32, 2, uint32_t>

The posit result (7.45e-9) is the nearest posit representable to 1e-8. Double-double recovers nearly full accuracy:

dd> show (1 + 1e-8) - 1
  value:      1.0000000000000000209225608301285e-08
  binary:     0b0.01111100100.0101011110011000111011100010001100001000110000111010|000...0
  components: double-double: 1e-08
  type:       double-double

This pattern is critical in numerical methods where loss-of-significance in intermediate results cascades into large final errors.

Example 7: Faithful Rounding Verification

A result is faithfully rounded if it equals one of the two representable values adjacent to the exact answer. The faithful command checks this against a quad-double reference:

posit32> faithful sqrt(2)
  result:    1.414213561e+00
  reference: 1.414213562373095048801688724209698...e+00
  rounded:   1.414213561e+00
  neighbor:  1.414213568e+00
  faithful:  yes

fp32> faithful sqrt(2)
  result:    1.41421354e+00
  reference: 1.414213562373095048801688724209698...e+00
  rounded:   1.41421354e+00
  neighbor:  1.41421366e+00
  faithful:  yes

Both posit32 and fp32 produce faithfully rounded sqrt(2). The rounded value is the nearest representable, and neighbor is the next representable in the opposite direction. The result must equal one of them.

Example 8: Transcendental Error Profiles with Sweep

The sweep command evaluates an expression across a range and reports ULP error vs a double-precision reference. This reveals where type-specific approximation errors concentrate:

posit32> sweep sin(x) for x in [0, 3.14159, 6]
x                                      result               double ref      ULP error
-------------------------------------------------------------------------------------
0                             0.000000000e+00                        0           0.00
0.628318                      5.877848230e-01      0.58778482293254253           0.04
1.256636                      9.510561898e-01      0.95105618829288086           0.44
1.884954                      9.510570094e-01      0.95105700829655349           0.31
2.513272                      5.877869688e-01      0.58778696973054001           0.85
3.14159                       2.654390983e-06     2.65358979335273e-06       20261.95

The error is sub-ULP through most of the range but explodes near pi where argument reduction subtracts nearly equal quantities. This is a fundamental limitation shared by all binary types — the result near sin(pi) depends on how many digits of pi the type can represent.

Example 9: Exact Decimal Arithmetic for Financial Calculations

Binary floating-point cannot represent 0.1, 0.01, or most decimal fractions exactly. In financial software this causes accumulation errors that violate accounting identities. Decimal fixed-point (dfixpnt) uses BCD encoding and carries every decimal digit without rounding:

double> show 0.1 + 0.2 - 0.3
  value:      5.5511151231257827e-17
  binary:     0b0.01111001001.0000000000000000000000000000000000000000000000000000
  components: sign: +, scale: -54, significand: 1
  type:       double (IEEE-754 binary64)

double> type dfixpnt16_8
Active type: dfixpnt16_8 (dfixpnt< 16,   8, BCD,     Modulo, uint8_t>)
dfixpnt16_8> show 0.1 + 0.2 - 0.3
  value:      0.00000000
  binary:     0.00000000000000000000000000000000.00000000000000000000000000000000
  components: dfixpnt< 16,   8, BCD,     Modulo, uint8_t>: 0
  type:       dfixpnt< 16,   8, BCD,     Modulo, uint8_t>

Double produces a non-zero residual (~5.55e-17) because 0.1 and 0.2 are rounded on entry. dfixpnt16_8 yields exactly zero.

This matters when totals must balance to the penny. Consider an invoice with three items at $19.99, two at $5.99, and one at $1.50, plus 7.25% sales tax:

dfixpnt16_8> tax = 0.0725
0.07250000
dfixpnt16_8> subtotal = 19.99 * 3 + 5.99 * 2 + 1.50
73.45000000
dfixpnt16_8> show subtotal + subtotal * tax
  value:      78.77512500
  binary:     0.00000000000000000000000001111000.01110111010100010010010100000000
  components: dfixpnt< 16,   8, BCD,     Modulo, uint8_t>: 78.7751
  type:       dfixpnt< 16,   8, BCD,     Modulo, uint8_t>

The subtotal is exactly $73.45, tax is exactly $5.325125, and the grand total is exactly $78.775125. The same calculation in double:

double> tax = 0.0725
0.072499999999999995
double> subtotal = 19.99 * 3 + 5.99 * 2 + 1.50
73.450000000000003
double> show subtotal + subtotal * tax
  value:      78.775125000000003
  binary:     0b0.10000000101.0011101100011001101110100101111000110101001111111000
  components: sign: +, scale: 6, significand: 1.230861328125
  type:       double (IEEE-754 binary64)

Double’s subtotal is already 73.450000000000003 — off by 3e-15. These errors are invisible in a single calculation but accumulate across thousands of line items in a ledger, eventually causing reconciliation failures. Decimal fixed-point eliminates this class of error entirely.

Example 10: Takum’s Uniform Precision Across the Dynamic Range

Posit arithmetic concentrates precision near 1.0 by using a variable-length regime field: values close to 1.0 get many fraction bits, but extreme values consume most bits on the regime, leaving few for the significand. Takum (Hunhold, 2024) replaces the variable-length regime with a bounded characteristic field, giving a more uniform precision distribution and a dramatically wider dynamic range.

At 32 bits, both types deliver identical precision near 1.0:

takum32> precision
  type:           takum< 32, 3, uint32_t>
  binary digits:  27
  decimal digits: 8.1
  epsilon:        7.450580597e-09
  minpos:         1.727235358e-77
  maxpos:         5.789601701e+76

posit32> precision
  type:           posit< 32, 2, uint32_t>
  binary digits:  27
  decimal digits: 8.1
  epsilon:        7.450580597e-09
  minpos:         7.523163845e-37
  maxpos:         1.329227996e+36

Same epsilon, same 27 binary digits at 1.0. But takum32 spans 10^77 while posit32 reaches only 10^36 — over twice the dynamic range in decades.

The difference becomes dramatic away from 1.0. Compare the ULP at increasing scales:

Scale	takum32 ULP	posit32 ULP	Relative ULP (takum)	Relative ULP (posit)
1	7.45e-9	7.45e-9	7.45e-9	7.45e-9
1e5	5.96e-3	5.96e-3	5.96e-8	5.96e-8
1e10	1,192	9,537	1.19e-7	9.54e-7
1e15	1.19e8	1.53e10	1.19e-7	1.53e-5
1e20	2.38e13	2.44e16	2.38e-7	2.44e-4
1e30	2.38e23	6.44e28	2.38e-7	6.44e-2

Takum’s relative precision stays nearly constant (~2e-7) across 30 decades of scale. Posit’s degrades from 7.45e-9 at 1.0 to 0.064 at 1e30 — a factor of 8.6 million. At 1e30, posit32 has barely one significant digit left.

This is visible in the representations themselves:

takum32> show 1e20
  value:      1.00000002e+20
  binary:     0b0.1.110.000011.010110101111000111011
  components: ... Characteristic :    66 Scale :    66
  type:       takum< 32, 3, uint32_t>

posit32> show 1e20
  value:      1.000159405e+20
  binary:     0b0.111111111111111110.10.01011011000
  components: sign: +, regime: 16, exponent: 4, significand: 1.35546875
  type:       posit< 32, 2, uint32_t>

Takum32 represents 1e20 to 8 significant digits (1.00000002e+20). Posit32 manages only 4 (1.000159405e+20 — off by 1.6e16). The posit’s regime field has expanded to 18 bits, leaving only 11 for exponent and significand. Takum’s characteristic field stays bounded, preserving fraction bits at every scale.

A sweep of sqrt(x) across a wide range confirms the pattern:

takum16> sweep sqrt(x) for x in [0.001, 1e12, 8]
x                                      result               double ref      ULP error
-------------------------------------------------------------------------------------
0.001                                0.031616     0.031622776601683791           0.21
1.4285714e+11                      3.7888e+05       377964.47300922842           0.31
4.2857143e+11                      6.5536e+05       654653.67070797761           0.14
8.5714286e+11                       9.257e+05       925820.09977255156           0.03
1e+12                              9.9942e+05       1000000.0000000001           0.29

posit16> sweep sqrt(x) for x in [0.001, 1e12, 8]
x                                      result               double ref      ULP error
-------------------------------------------------------------------------------------
0.001                              3.1616e-02     0.031622776601683791           0.43
1.4285714e+11                      3.7069e+05       377964.47300922842           2.46
4.2857143e+11                      6.4307e+05       654653.67070797761           2.26
8.5714286e+11                      9.0931e+05       925820.09977255156           4.56
1e+12                              9.7894e+05       1000000.0000000001          10.78

At 1e12, posit16’s sqrt has 10.78 ULP error vs takum16’s 0.29 — a 37x improvement. Takum maintains sub-ULP accuracy across the entire range because it never runs out of fraction bits.

Available Types

ucalc registers 42 types spanning the major number system families:

Family	Types
Integer	int8, int16, int32, int64
Fixed-point	fixpnt16, fixpnt32
Decimal fixed-point	dfixpnt8_4, dfixpnt16_8
Native IEEE	float, double
Classic float	fp16, fp32, fp64, fp128
Google Brain Float	bfloat16
FP8 (Deep Learning)	fp8e2m5, fp8e3m4, fp8e4m3, fp8e5m2
Logarithmic	lns8, lns16, lns32
Posit	posit8, posit16, posit32, posit64
Takum	takum8, takum16, takum32, takum64
Decimal float	decimal32, decimal64
Hexadecimal float	hfloat32, hfloat64
Rational	rational8, rational16, rational32
Multi-component	dd, dd_cascade, td_cascade, qd, qd_cascade