Mixed Precision Software Development Kit

Mixed-Precision SDK GAP Analysis Summary

| Category | Components | Location |
|---|---|---|
| Error Analysis | AbsoluteError, RelativeError, LogRelativeError, calculateNrOfValidBits | include/sw/universal/utility/error.hpp |
| Error-Free Ops | two_sum, two_prod, two_diff for exact arithmetic | include/sw/universal/numerics/error_free_ops.hpp |
| Quantization | qsnr() - Signal-to-Quantization-Noise Ratio | include/sw/universal/quantization/qsnr.hpp |
| Stability | condest() - condition number, nbe() - backward error | include/sw/blas/utes/ |
| Performance | PerformanceRunner benchmark framework | include/sw/universal/benchmark/ |
| BLAS | Mixed-precision CG variants (fdp_fdp, dot_fdp, etc.) | include/sw/blas/solvers/ |
| Examples | DNN inference, DSP, roots, interpolation, integration | mixedprecision/, applications/mixed-precision/ |


| Gap | Description | Impact |
|---|---|---|
| Type Selection Framework | No automated recommendation of posit vs cfloat vs fixpnt based on requirements | Users guess at type selection |
| Error Budget Analyzer | No error allocation across algorithm steps | Can't predict accuracy before running |
| Precision Scheduling | No framework for mixed-precision assignments | Manual trial-and-error optimization |
| Dynamic Range Analysis | No saturation/overflow/underflow detection | Silent precision loss |
| Algorithm Instrumentation API | No standard hooks for metrics collection | Can't profile mixed-precision codes |

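
To make the Dynamic Range Analysis gap concrete, here is a minimal sketch of an observer that counts how many values seen at some point in an algorithm would overflow or underflow in a target format. The names `FormatLimits`, `RangeAnalyzer`, and `limits_of` are hypothetical and not part of the library; the sketch assumes the target type is floating-point-like and specializes `std::numeric_limits`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical description of a target format: largest finite magnitude
// and smallest positive normal magnitude it can represent.
struct FormatLimits {
    double max_magnitude;
    double min_positive;
};

// Build limits from std::numeric_limits (assumes a floating-point-like T,
// where min() is the smallest positive normal value).
template <typename T>
FormatLimits limits_of() {
    return {static_cast<double>(std::numeric_limits<T>::max()),
            static_cast<double>(std::numeric_limits<T>::min())};
}

// Hypothetical observer: feed it values, then query overflow/underflow counts.
class RangeAnalyzer {
public:
    explicit RangeAnalyzer(FormatLimits limits) : limits_(limits) {
        assert(limits.min_positive > 0.0 &&
               limits.max_magnitude >= limits.min_positive);
    }

    void observe(double value) {
        ++samples_;
        const double mag = std::fabs(value);
        if (mag == 0.0) return;  // exact zero is representable
        largest_ = std::max(largest_, mag);
        smallest_ = std::min(smallest_, mag);
        if (mag > limits_.max_magnitude) ++overflows_;   // would saturate or overflow
        if (mag < limits_.min_positive) ++underflows_;   // would flush or lose precision
    }

    std::size_t samples() const { return samples_; }
    std::size_t overflows() const { return overflows_; }
    std::size_t underflows() const { return underflows_; }
    double largest() const { return largest_; }    // 0 if no nonzero value was seen
    double smallest() const { return smallest_; }  // +inf if no nonzero value was seen

private:
    FormatLimits limits_;
    std::size_t samples_ = 0;
    std::size_t overflows_ = 0;
    std::size_t underflows_ = 0;
    double largest_ = 0.0;
    double smallest_ = std::numeric_limits<double>::infinity();
};
```

A caller would construct `RangeAnalyzer{limits_of<float>()}`, call `observe()` on intermediate results computed in a wide reference type, and inspect the counters before committing to the narrower type. Infinities count as overflows; NaNs are counted as samples but otherwise ignored.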

| Gap | Description |
|---|---|
| Memory Footprint Analysis | No per-operation memory estimation |
| Bandwidth Prediction | No data movement quantification |
| Cache-Aware Tools | No cache line optimization |
| Auto-Tuning Framework | No search for optimal precision assignments |
| Regression Detection | No automated baseline comparison |


include/sw/universal/sdk/
├── type_selection/
│   ├── type_advisor.hpp           # Recommend types from requirements
│   └── precision_calibrator.hpp   # Calibrate for specific algorithms
├── error_analysis/
│   ├── error_budget.hpp           # Allocate error across operations
│   └── error_propagator.hpp       # Track error growth
├── quantization/
│   ├── range_analyzer.hpp         # Dynamic range analysis
│   └── overflow_detector.hpp      # Saturation detection
└── profiling/
    ├── instrumentation.hpp        # Metrics collection hooks
    └── performance_model.hpp      # Execution time prediction

  1. Type Advisor - Given range + error tolerance → recommend type
  2. Range Analyzer - Detect overflow/underflow risks
  3. Error Propagation Tracker - Wrap ops with cumulative error tracking
  4. Performance Predictor - Empirical model from benchmarks
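
As an illustration of the first item, the following sketch picks the narrowest format that satisfies a magnitude requirement and a relative error tolerance. It is an assumption-laden simplification: it only considers IEEE-754 style formats described by exponent and fraction bit counts (not posit or fixpnt), it ignores underflow and subnormals, and the names `FloatFormat`, `Requirements`, `standard_candidates`, and `recommend` are hypothetical, not library API.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of an IEEE-754 style binary format.
struct FloatFormat {
    std::string name;
    int exponent_bits;
    int fraction_bits;

    int total_bits() const { return 1 + exponent_bits + fraction_bits; }

    // Largest finite value: (2 - 2^-fraction_bits) * 2^emax.
    double max_value() const {
        const int emax = (1 << (exponent_bits - 1)) - 1;
        return std::ldexp(2.0 - std::ldexp(1.0, -fraction_bits), emax);
    }

    // Unit roundoff for round-to-nearest: 2^-(fraction_bits + 1).
    double unit_roundoff() const {
        return std::ldexp(1.0, -(fraction_bits + 1));
    }
};

// Hypothetical requirement set: largest magnitude to represent and the
// acceptable relative error of a single rounding.
struct Requirements {
    double max_magnitude;
    double relative_tolerance;
};

inline std::vector<FloatFormat> standard_candidates() {
    return {
        {"fp16", 5, 10},
        {"bfloat16", 8, 7},
        {"fp32", 8, 23},
        {"fp64", 11, 52},
    };
}

// Return the candidate with the fewest bits that meets both requirements,
// or std::nullopt if none does. Ties keep the earlier candidate.
inline std::optional<FloatFormat> recommend(
        const std::vector<FloatFormat>& candidates, const Requirements& req) {
    assert(req.max_magnitude > 0.0 && req.relative_tolerance > 0.0);
    std::optional<FloatFormat> best;
    for (const auto& f : candidates) {
        const bool fits = f.max_value() >= req.max_magnitude &&
                          f.unit_roundoff() <= req.relative_tolerance;
        if (fits && (!best || f.total_bits() < best->total_bits())) best = f;
    }
    return best;
}
```

For example, `recommend(standard_candidates(), {1e10, 1e-2})` selects `bfloat16`, because fp16 cannot represent 1e10 while bfloat16 meets the tolerance with the same bit count. A real advisor would also need to weigh accumulated error over many operations, not just a single rounding.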

| Tier | Description | Status |
|---|---|---|
| Foundational | Types, BLAS, basic error | ~70% complete |
| Workflow Support | Calibration, tuning | ~40% complete |
| Algorithm Design Support | Advisors, instrumentation | ~20% complete |

Current State: Named but Not Implemented

The benchmark/energy/ directory exists but does NOT measure energy - it only counts operations and measures accuracy. This is a critical gap since energy efficiency is a PRIMARY motivation for mixed-precision.

What EXISTS for Energy

| Component | Location | What It Does | Energy Support |
|---|---|---|---|
| Operation Counting | utility/occurrence.hpp | Counts add/mul/div/load/store | No cost model |
| Performance Runner | benchmark/performance_runner.hpp | Wall-clock timing | Time only, not Joules |
| Memory Footprint | applications/mixed-precision/dnn/conv2d | Tracks bytes used | No energy conversion |
| BLAS Benchmarks | benchmark/energy/blas/ | Dot, GEMM, MatVec | Misleading name - no energy |


| Gap | Description | Why Critical |
|---|---|---|
| Per-Operation Energy Costs | Cost tables mapping (operation + bit-width) → picojoules | 8-bit mul uses ~10x less energy than 32-bit - this IS mixed-precision's value proposition |
| Hardware Counter Integration | RAPL (Intel), NVML (GPU), ARM energy probes | Can't measure actual energy without hardware hooks |
| Memory Hierarchy Energy Model | L1/L2/L3/DRAM access costs | Memory dominates energy in most algorithms |
| Energy-Aware Type Selector | Given accuracy + energy budget → recommend types | Users can't make energy-informed decisions |

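
For the Hardware Counter Integration gap, a minimal sketch of reading RAPL on Linux could go through the powercap sysfs interface, which exposes a cumulative package energy counter in microjoules. The helper names `read_counter` and `energy_delta_uj` are hypothetical; the sketch assumes the package domain is exposed as `intel-rapl:0` and that the process is allowed to read it (many systems restrict this to privileged users).

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>

// Powercap sysfs files for RAPL package domain 0 (assumed to exist on the host).
constexpr const char* kRaplEnergyPath =
    "/sys/class/powercap/intel-rapl:0/energy_uj";
constexpr const char* kRaplMaxRangePath =
    "/sys/class/powercap/intel-rapl:0/max_energy_range_uj";

// Read a single unsigned integer from a sysfs-style text file.
// Returns std::nullopt if the file is missing, unreadable, or malformed.
inline std::optional<std::uint64_t> read_counter(const std::string& path) {
    std::ifstream in(path);
    std::uint64_t value = 0;
    if (in >> value) return value;
    return std::nullopt;
}

// Energy consumed between two readings, in microjoules. The counter wraps
// at max_range; this handles at most one wrap between the two readings.
inline std::uint64_t energy_delta_uj(std::uint64_t before, std::uint64_t after,
                                     std::uint64_t max_range) {
    assert(before <= max_range && after <= max_range);
    return after >= before ? after - before
                           : (max_range - before) + after;
}
```

A measurement would read `kRaplEnergyPath` before and after the kernel under test, read `kRaplMaxRangePath` once, and pass the three values to `energy_delta_uj`. The result covers the whole package, so short kernels need to be repeated enough times to rise above background activity.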
Energy Cost Reality (What's Missing)
// MISSING: This is what users need to make decisions
struct IntelSkylakeEnergy {
    // Operations (picojoules)
    double add_8bit = 0.5;   double add_32bit = 2.0;    // 4x ratio
    double mul_8bit = 2.0;   double mul_32bit = 10.0;   // 5x ratio
    double div_8bit = 10.0;  double div_32bit = 50.0;   // 5x ratio
    // Memory access (picojoules)
    double L1_access   = 10.0;
    double L2_access   = 50.0;
    double L3_access   = 200.0;
    double DRAM_access = 1000.0;  // 100x more than L1!
};
// constexpr applies to objects, not type declarations, so the table is
// exposed as a constant instance (the instance name is illustrative).
constexpr IntelSkylakeEnergy intel_skylake_energy{};

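
Given such a cost table, turning operation counts into an energy estimate is a weighted sum. The sketch below shows the idea for a dot product, reusing the per-operation figures quoted above. `OpCounts`, `OpCosts`, `estimate_pj`, and `dot_product_counts` are hypothetical names, the real interface of `occurrence.hpp` is not assumed, and the sketch charges every memory access at the L1 cost regardless of operand width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tally of operations executed by an algorithm.
struct OpCounts {
    std::uint64_t add = 0;
    std::uint64_t mul = 0;
    std::uint64_t div = 0;
    std::uint64_t mem_access = 0;
};

// Hypothetical per-operation costs in picojoules for one bit-width.
struct OpCosts {
    double add_pj;
    double mul_pj;
    double div_pj;
    double mem_access_pj;
};

// Figures taken from the cost table above; memory is charged at the L1 cost.
constexpr OpCosts costs_8bit_l1{0.5, 2.0, 10.0, 10.0};
constexpr OpCosts costs_32bit_l1{2.0, 10.0, 50.0, 10.0};

// Energy estimate: each count multiplied by its per-operation cost.
constexpr double estimate_pj(const OpCounts& n, const OpCosts& c) {
    return static_cast<double>(n.add) * c.add_pj +
           static_cast<double>(n.mul) * c.mul_pj +
           static_cast<double>(n.div) * c.div_pj +
           static_cast<double>(n.mem_access) * c.mem_access_pj;
}

// A length-n dot product: n multiplies, n adds, 2n operand loads.
constexpr OpCounts dot_product_counts(std::uint64_t n) {
    assert(n > 0);
    return OpCounts{n, n, 0, 2 * n};
}
```

Under these assumptions a 1000-element dot product comes to 22,500 pJ at 8 bits and 32,000 pJ at 32 bits. The arithmetic alone differs by 4.8x (2,500 pJ vs 12,000 pJ), but the flat 20,000 pJ of L1 traffic shrinks the overall ratio to about 1.4x, which is why a memory hierarchy model matters as much as the per-operation table.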

include/sw/universal/energy/
├── cost_models/
│   ├── intel_skylake.hpp          # Per-op energy costs
│   ├── arm_cortex_a.hpp
│   └── generic.hpp                # Conservative defaults
├── hw_counters/
│   ├── rapl.hpp                   # Intel/AMD energy counters
│   ├── nvml.hpp                   # NVIDIA GPU energy
│   └── perf_event.hpp             # Linux performance counters
├── profilers/
│   ├── operation_energy.hpp       # Extend occurrence.hpp with costs
│   ├── memory_energy.hpp          # Cache miss → energy
│   └── algorithm_energy.hpp       # Full algorithm profiling
└── optimization/
    ├── energy_type_advisor.hpp    # Recommend types for energy budget
    ├── energy_budget.hpp          # Allocate energy across steps
    └── pareto_explorer.hpp        # Energy vs accuracy tradeoffs


| Priority | Component | Effort | Impact |
|---|---|---|---|
| P1 | Energy cost tables (data) | 1 week | Enables all energy reasoning |
| P1 | RAPL integration (Linux/Intel) | 1 week | Actual energy measurement |
| P2 | occurrence_with_energy.hpp | 3 days | Maps counts → joules |
| P2 | Memory energy profiler | 1 week | Quantifies data movement cost |
| P3 | Energy-aware type advisor | 2 weeks | Automated type selection |
| P3 | Pareto front explorer | 1 week | Visualize tradeoffs |

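
The Pareto front explorer in the last row reduces to a small filtering step once each precision configuration has an energy estimate and a measured error. A minimal sketch, assuming both quantities are to be minimized; `Candidate`, `dominates`, and `pareto_front` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one precision configuration.
struct Candidate {
    std::string label;
    double energy_pj;  // estimated or measured energy, lower is better
    double error;      // e.g. relative error of the result, lower is better
};

// a dominates b when a is no worse on both axes and strictly better on one.
inline bool dominates(const Candidate& a, const Candidate& b) {
    return a.energy_pj <= b.energy_pj && a.error <= b.error &&
           (a.energy_pj < b.energy_pj || a.error < b.error);
}

// Keep only candidates that no other candidate dominates, sorted by energy.
// Quadratic in the number of candidates, which is fine for small sweeps.
inline std::vector<Candidate> pareto_front(const std::vector<Candidate>& candidates) {
    std::vector<Candidate> front;
    for (const auto& c : candidates) {
        assert(c.energy_pj >= 0.0 && c.error >= 0.0);  // also rejects NaN
        const bool dominated = std::any_of(
            candidates.begin(), candidates.end(),
            [&](const Candidate& other) { return dominates(other, c); });
        if (!dominated) front.push_back(c);
    }
    std::sort(front.begin(), front.end(),
              [](const Candidate& a, const Candidate& b) {
                  return a.energy_pj < b.energy_pj;
              });
    return front;
}
```

Every configuration left on the front is a defensible choice: moving along it trades energy for accuracy, while anything off the front is worse on at least one axis with no gain on the other.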

| Category | Previous Assessment | Revised with Energy Focus |
|---|---|---|
| Numerical Tools | 70% complete | 70% (unchanged) |
| Quantization Tools | 40% complete | 40% (unchanged) |
| Memory Tools | 20% complete | 20% (unchanged) |
| Energy Tools | Not assessed | 5% complete (only op counting exists) |
| Overall SDK | ~45% | ~35% (energy is foundational, not optional) |

The library cannot currently answer the fundamental question: “How much energy will my mixed-precision algorithm use?”