Multi-Limb Arithmetic Types in Universal
The Universal library implements high-performance multi-limb arithmetic types that serve as the foundation for all custom number systems. These types enable arbitrary precision arithmetic while maintaining efficiency through block-based storage and operation-specific optimizations.
Overview
Section titled “Overview”Multi-limb arithmetic in Universal is built around several core components located in include/sw/universal/internal/:
blockbinary- General-purpose multi-limb integer arithmeticblockfraction- Floating-point fraction managementblocksignificand- Floating-point significand with encoding optimizationsblocktriple- Complete floating-point representation for arithmetic
These components work together to provide a flexible, efficient foundation for implementing various number systems.
Core Components
Section titled “Core Components”1. blockbinary<nbits, BlockType, NumberType>
Section titled “1. blockbinary<nbits, BlockType, NumberType>”A parameterized blocked binary number system representing integers in 2’s complement or unsigned format.
Template Parameters:
nbits- Total number of bits in the representationBlockType- Underlying storage type (uint8_t, uint16_t, uint32_t, etc.)NumberType- EitherBinaryNumberType::SignedorBinaryNumberType::Unsigned
Key Features:
- Efficient storage using blocks of the specified type
- Support for both signed (2’s complement) and unsigned arithmetic
- Comprehensive set of arithmetic operations
- Overflow and carry detection
- Long division with quotient and remainder
Usage Example:
#include <universal/internal/blockbinary/blockbinary.hpp>
// 48-bit signed integer using 16-bit blocks, yielding 3 limbsblockbinary<48, uint16_t, BinaryNumberType::Signed> big_int;
// 196-bit unsigned integer using 64-bit blocks, also yielding 3 limbsblockbinary<196, uint64_t, BinaryNumberType::Unsigned> unsigned_int;2. blockfraction<nbits, BlockType>
Section titled “2. blockfraction<nbits, BlockType>”Manages floating-point fraction bits with optimizations for different arithmetic operations.
Key Features:
- Fraction bits scaled appropriately for add, multiply, divide operations
- Radix point management for different operation contexts
- Efficient normalization and alignment
- Support for guard bits and sticky bits for accurate rounding
Design Philosophy: The fraction bits in floating-point need different representations for optimal performance:
- Addition/subtraction: 2’s complement encoding for efficient alignment
- Multiplication: 1’s complement encoding for simpler partial product accumulation
- Division: Specialized encoding for long division algorithms
3. blocksignificand<nbits, BlockType, BitEncoding>
Section titled “3. blocksignificand<nbits, BlockType, BitEncoding>”Represents floating-point significands with operation-specific bit encodings.
Template Parameters:
nbits- Number of bits in the significandBlockType- Storage block typeBitEncoding- One ofBitEncoding::Flex,BitEncoding::Ones, orBitEncoding::Twos
Encoding Types:
Flex- Flexible encoding for general useOnes- 1’s complement encoding (optimal for multiplication)Twos- 2’s complement encoding (optimal for addition/subtraction)
Key Features:
- Type-encoded bit representations for compile-time optimization
- Radix point management that adapts to operation requirements
- Efficient conversion between encoding types
- Support for denormalized and normalized forms
4. blocktriple<fbits, BlockTripleOperator, BlockType>
Section titled “4. blocktriple<fbits, BlockTripleOperator, BlockType>”Complete floating-point representation combining sign, exponent, and significand.
Components:
- NaN, Inf, Zero
- Sign bit
- Exponent
- Significand (using
blocksignificandinternally) - Scale factor for denormalized arithmetic results
Key Features:
- Intermediate representation for floating-point arithmetic
- Always normalized in triple form: (sign, exp, significand)
- Comprehensive rounding support (TBD: currently only nearest-tie-to-even)
Architecture:
The blocktriple creates an execution environment for floating-point arithmetic. Configured by the BlockTripleOperator it sets up a blocksignificant with the following structure:
ADD iii.ffffrrrrrrrrr 3 integer bits, f fraction bits, and 2*fhbits rounding bits MUL ii.ffff'ffff 2 integer bits, 2*f fraction bits DIV ii.ffff'ffff'ffff'rrrr 2 integer bits, 3*f fraction bits, and r rounding bitsThe arithmetic operators are presented as in-place methods, that is:
- add(lhs, rhs)
- sub(lhs, rhs)
- mul(lhs, rhs)
- div(lhs, rhs)
The basic usage pattern is:
blocktriple<fbits, BlockTripleOperator::ADD, uint32_t> a, b, c;a = normalize(src_a);b = normalize(src_b);c.add(a, b);convert(c, target);Integration with Number Systems
Section titled “Integration with Number Systems”The multi-limb types are used extensively throughout Universal’s number systems:
cfloat (Classic Floating-Point)
Section titled “cfloat (Classic Floating-Point)”// cfloat uses blockbinary for exponent and blocktriple for arithmetictemplate<unsigned nbits, unsigned es, typename bt, bool hasSubnormals, bool hasMaxExpValues, bool isSaturating>class cfloat { // Internal storage is raw blocks bt _blocks[nrBlocks]; // Arithmetic operations use blockbinary and blocktriple to // manipulate exponent and fraction bits};areal (a faithful Floating-Point format with an uncertainty bit)
Section titled “areal (a faithful Floating-Point format with an uncertainty bit)”Uses similar block-based approach with adaptive precision based on operand requirements.
integer (arbitrary fixed-sized Integer)
Section titled “integer (arbitrary fixed-sized Integer)”Direct use of blockbinary for arbitrary precision integer arithmetic.
lns (Logarithmic Number System)
Section titled “lns (Logarithmic Number System)”Uses blockbinary to encode and implement arithmetic operations.
fixpnt (Fixed-Point)
Section titled “fixpnt (Fixed-Point)”Employs blockbinary with implicit decimal point positioning.
Performance Considerations
Section titled “Performance Considerations”Block Size Selection
Section titled “Block Size Selection”Choose block types based on target architecture:
uint8_t- Minimal memory, good for embedded systemsuint16_t- Balanced performance/memoryuint32_t- Optimal for most 32/64-bit architecturesuint64_t- Maximum performance on 64-bit systems (with carry bit considerations)
Memory Layout
Section titled “Memory Layout”Block types use contiguous storage with little-endian bit ordering within blocks. Here is an example using uint32_t as BlockType:
Block 0: [bits 0-31]Block 1: [bits 32-63]Block 2: [bits 64-95]Operation-Specific Optimizations
Section titled “Operation-Specific Optimizations”- Addition: Uses 2’s complement
blocksignificandfor efficient alignment - Multiplication: Uses 1’s complement
blocksignificandfor simpler partial products - Division: Uses specialized algorithms in
blocksignificandfor quotient/remainder
Error Handling
Section titled “Error Handling”Multi-limb types provide comprehensive error detection:
- Overflow/underflow detection
- Division by zero handling
- Invalid operation identification
- Exception propagation to calling number systems
Extension Points
Section titled “Extension Points”The block-based architecture allows for:
- Custom block types for specialized hardware
- SIMD optimizations for vector operations
- Hardware-specific carry bit handling
- Custom rounding modes and behaviors
Best Practices
Section titled “Best Practices”- Choose appropriate block sizes for your target architecture
- Use type-appropriate encodings for different operations
- Leverage compile-time parameterization for optimal performance
- Handle exceptions appropriately in your number system implementations
- Test extensively with boundary conditions and edge cases
Future Directions
Section titled “Future Directions”- SIMD-optimized implementations
- Hardware-specific assembly optimizations
- KPU and NPU acceleration for parallel operations
- Enhanced compile-time optimizations
This multi-limb arithmetic foundation enables Universal to provide efficient, accurate implementations of diverse number systems while maintaining a clean, extensible architecture.