ZFP Block: Compressed Floating-Point for Scientific Data
Scientific simulations produce enormous datasets: a 3D fluid dynamics simulation on a 1024^3 grid stores 4 billion floating-point values per time step. At double precision, one snapshot is 32 GB. A 1000-step run generates 32 TB. Storing, transmitting, and analyzing this data is the bottleneck in modern computational science.
ZFP (developed at Lawrence Livermore National Laboratory) is a block-based floating-point compression codec designed specifically for scientific data. Unlike general-purpose compressors (gzip, lz4), ZFP exploits the structure of floating-point arrays: spatial coherence, smooth variation between neighboring values, and the specific bit layout of IEEE-754. The result is 2-10x compression with controllable accuracy, including a lossless mode.
The Universal zfpblock type brings ZFP’s compression pipeline into the type system, enabling compressed storage that integrates seamlessly with numerical computation.
zfpblock<Real, Dim> is a compressed floating-point block:
| Parameter | Type | Default | Description |
|---|---|---|---|
Real | typename | float | Base floating-point type (float or double) |
Dim | unsigned | 1 | Dimensionality (1D, 2D, or 3D) |
Block Sizes
Section titled “Block Sizes”| Dimension | Block Shape | Values per Block |
|---|---|---|
| 1D | 4 | 4 |
| 2D | 4 × 4 | 16 |
| 3D | 4 × 4 × 4 | 64 |
Standard Type Aliases
Section titled “Standard Type Aliases”using zfp1f = zfpblock<float, 1>; // 1D float blocksusing zfp1d = zfpblock<double, 1>; // 1D double blocksusing zfp2f = zfpblock<float, 2>; // 2D float blocksusing zfp2d = zfpblock<double, 2>; // 2D double blocksusing zfp3f = zfpblock<float, 3>; // 3D float blocksusing zfp3d = zfpblock<double, 3>; // 3D double blocksCompression Modes
Section titled “Compression Modes”| Mode | Control | Description |
|---|---|---|
fixed_rate | Bits per value | Constant bitstream size; predictable memory |
fixed_precision | Max bit planes | Maximum number of bit planes to encode |
fixed_accuracy | Error tolerance | Absolute error bound per value |
reversible | Lossless | All bit planes preserved; exact round-trip |
Key Properties
Section titled “Key Properties”- Block-based: compresses 4^Dim values at a time
- Progressive: can decompress partial bit planes for preview
- Lossy + lossless: configurable accuracy/size trade-off
- Spatial decorrelation: lifting transform exploits local smoothness
- Negabinary encoding: efficient sign handling without separate bit
- Random access: decompress any block independently
How It Works
Section titled “How It Works”The ZFP codec applies a multi-stage pipeline to each block:
Compression Pipeline
Section titled “Compression Pipeline”- Block-float exponent: extract shared exponent for the block (like block-float quantization)
- Lifting transform: decorrelate values using a forward wavelet-like transform that predicts each value from its neighbors
- Reorder by sequency: permute coefficients to group by frequency (low-frequency first)
- Negabinary encoding: convert to sign-preserving two’s complement where the sign is embedded in the least significant bits
- Bit-plane encoding: emit bits from most significant plane to least, achieving progressive precision
Decompression Pipeline
Section titled “Decompression Pipeline”The pipeline runs in reverse: decode bit planes → negabinary decode → inverse reorder → inverse lifting transform → restore exponent.
Compression Ratio
Section titled “Compression Ratio”Depends on data smoothness and mode:
- Smooth data (temperature fields, pressure): 4-10x lossy, 1.5-2x lossless
- Noisy data (turbulence, particle positions): 2-4x lossy, 1.1-1.5x lossless
- fixed_rate mode: exact control over bits per value (e.g., 8 bits/value = 4x for float)
How to Use It
Section titled “How to Use It”Include
Section titled “Include”#include <universal/number/zfpblock/zfpblock.hpp>using namespace sw::universal;Basic Block Compression
Section titled “Basic Block Compression”// Compress a 4×4 block of float datazfpblock<float, 2> block;
float data[4][4] = { { 1.0f, 1.1f, 1.2f, 1.3f }, { 1.1f, 1.2f, 1.3f, 1.4f }, { 1.2f, 1.3f, 1.4f, 1.5f }, { 1.3f, 1.4f, 1.5f, 1.6f }};
block.encode(&data[0][0], fixed_rate, 8); // 8 bits per value = 4x compression
float reconstructed[4][4];block.decode(&reconstructed[0][0]);
// Measure reconstruction errorfor (int i = 0; i < 4; ++i) for (int j = 0; j < 4; ++j) std::cout << "error[" << i << "][" << j << "] = " << std::abs(data[i][j] - reconstructed[i][j]) << std::endl;Compressed Array Container
Section titled “Compressed Array Container”// zfparray provides a compressed array interfacezfparray<float, 2> compressed_field(1024, 1024);
// Write values (transparently compressed in blocks)for (int i = 0; i < 1024; ++i) for (int j = 0; j < 1024; ++j) compressed_field(i, j) = std::sin(i * 0.01) * std::cos(j * 0.01);
// Read values (transparently decompressed)float val = compressed_field(512, 512);
// Storage: ~4x less than uncompressed float arraySimulation Data Checkpointing
Section titled “Simulation Data Checkpointing”// Save simulation state with ZFP compressiontemplate<typename Real, unsigned Dim>void checkpoint(const std::string& filename, const Real* data, size_t n, int bits_per_value) { // Compress blocks and write to file zfpblock<Real, Dim> block; // ... iterate over blocks, compress, write ...}
// 3D fluid dynamics: 64 values per block, 8 bits/value = 4x compression// 32 TB simulation → 8 TB compressed checkpointsProblems It Solves
Section titled “Problems It Solves”| Problem | How zfpblock Solves It |
|---|---|
| Scientific simulation data is too large to store | 2-10x compression with controllable accuracy |
| General compressors don’t understand floating-point | ZFP exploits IEEE-754 structure and spatial coherence |
| Need random access to compressed data | Each block is independently decompressible |
| Need both lossless and lossy modes | Reversible mode for exact round-trip, lossy modes for higher compression |
| Memory bandwidth limits simulation performance | Compressed in-memory storage reduces bandwidth needs |
| Checkpoint I/O dominates simulation runtime | 4-10x smaller checkpoints = proportionally faster I/O |
| Progressive visualization of large datasets | Bit-plane encoding enables multi-resolution preview |