rtc-digit-count | Nihilai Collective

What It Is

rtc-digit-count is an algorithm for counting the number of decimal digits in an unsigned 32- or 64-bit integer. It eliminates the log₂ → log₁₀ approximation step used by every prior state-of-the-art approach, replacing it with a direct lookup keyed on the value's leading-zero count.

Counting decimal digits is a load-bearing primitive in serialization. Every integer written to JSON, every number formatted to a string, every printf-style numeric conversion hits this code path. Shaving instructions off this kernel propagates through every workload that emits numbers as text — which is most of them.

Why

The problem reduces to: given x, find the smallest n such that x < 10ⁿ. The established techniques — Lemire's and the prior fast-digit-count — approximate log₁₀(x) from log₂(x) using a magic-constant multiplication ((9 * log2) >> 5, (77 * log2) >> 8), then correct the off-by-one error with a single threshold comparison.

Both halves of that pipeline are paid for on every call: the multiply, the shift, and the correction.

rtc-digit-count observes that the multiplication is unnecessary. The leading-zero count already partitions the integer space into log₂ bins, and the digit count for each bin is a constant known at compile time. A 33-byte (uint32) or 65-byte (uint64) lookup table maps the leading-zero count directly to the digit count. The only correction needed is the same threshold comparison the prior approaches already paid for — but now without the multiply in front of it.

Zero multiplication. Zero shift. One LZCNT, one L1 load, one comparison, one branchless add.

The Algorithm

uint64 path — the entire implementation

template<uns64_t value_type>
uint32_t rtc_digit_count(const value_type x) {
    static constexpr uint8_t digit_counts[] {
        19,19,19,19, 18,18,18, 17,17,17, 16,16,16,16,
        15,15,15, 14,14,14, 13,13,13,13, 12,12,12,
        11,11,11, 10,10,10,10, 9,9,9, 8,8,8,
        7,7,7,7, 6,6,6, 5,5,5, 4,4,4,4,
        3,3,3, 2,2,2, 1,1,1,1,1
    };
    static constexpr uint64_t thresholds[] {
        0ull, 9ull, 99ull, 999ull, 9999ull, 99999ull, 999999ull,
        9999999ull, 99999999ull, 999999999ull, 9999999999ull,
        99999999999ull, 999999999999ull, 9999999999999ull,
        99999999999999ull, 999999999999999ull, 9999999999999999ull,
        99999999999999999ull, 999999999999999999ull, 9999999999999999999ull
    };
    const uint32_t base = digit_counts[std::countl_zero(x)];
    return base + static_cast<uint32_t>(x > thresholds[base]);
}

The 32-bit path is identical in shape, with a 33-byte LUT and 11-entry threshold table. Both tables fit comfortably inside a single L1 cache line and stay hot across any realistic workload.

Pipeline Comparison

Lemire

countl_zero → subtract → multiply (×9) → shift (>>5) → load → compare → add

fast-digit-count

countl_zero → subtract → multiply (×77) → shift (>>8) → load → compare → add

rtc-digit-count

countl_zero → load → compare → add

Net

Two arithmetic operations eliminated from the critical path on every call. Shorter dependency chain, lower instruction count, better port pressure.

Features

Header-only, single C++20 template function
Specialized for uint32_t and uint64_t
Zero multiplication on the hot path
Single hardware LZCNT/CLZ instruction
L1-resident lookup tables (33 / 65 bytes)
Branchless correction step

Cross-platform CI: MSVC, GCC, Clang on Windows / Linux / macOS
Bit-exact validation against reference implementations on every run
Benchmarked at batch sizes from 1 to 1,000,000
Statistical tie detection via 95% confidence interval overlap
MIT licensed
No dependencies beyond <bit>

Quick Start

Drop in the function

#include <bit>
#include <cstdint>

// rtc-digit-count goes here

int main() {
    uint64_t x = 18446744073709551615ull;
    uint32_t digits = rtc_digit_count(x);
    // digits == 20
}

Build the benchmark suite

git clone https://github.com/nihilai-collective/rtc-digit-count.git
cd rtc-digit-count
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
./build/rtc-digit-count

CMake pulls BenchmarkSuite and rt-ut automatically via FetchContent. Requires C++20.

Correctness

Every benchmark run is preceded by a bit-exact validation pass: each candidate algorithm processes the same input set and its output is compared element-by-element against a reference implementation. Any divergence aborts the run with the exact failing input printed. No algorithm is benchmarked before it is proven correct on that machine, with that compiler, on that exact set of random inputs.

Validation runs across the full Windows MSVC, Ubuntu GCC, Ubuntu Clang, macOS Clang, and macOS GCC matrix on every push.

Benchmark Methodology

Throughput is measured in megabytes processed per second (MB/s), benchmarked against Lemire's digit-count and the prior fast-digit-count approach on identical hardware.

The harness uses adaptive epoch sampling: iterations begin at 60 and double each epoch (60 → 120 → 240 → …) up to a maximum of 1,200 iterations per epoch. Each epoch evaluates a trailing window of max(iterations / 10, 10) samples, capped at 100,000. Convergence requires RSE < 2.5% AND mean shift < 1% epoch-over-epoch simultaneously — the first epoch satisfying both conditions is retained as the canonical result.

If convergence is never reached before 5 seconds elapse or the iteration cap is hit, the result is marked non-converged and excluded from all rankings — only converged results participate in win/tie/loss tallying. Statistical ties are detected via Welch's t-test on Bessel-corrected variance, so a small delta inside the noise floor doesn't get dressed up as a victory.

The output of each call is consumed through do_not_optimize_away() so the compiler can't eliminate the work, and the random input vectors are regenerated per iteration to prevent branch predictors and data caches from learning the input distribution.

Tests cover both uint32 and uint64 at batch sizes of 1, 10, 100, 1,000, 10,000, and 100,000 — spanning the per-call regime (where memory latency dominates) through the large-batch regime (where instruction throughput dominates). The CPU, OS, and compiler for each run are detected at build time and recorded in that platform's result file.

Platforms: Windows, Linux, macOS — Compilers: MSVC, GCC, Clang — Harness: BenchmarkSuite

View the full benchmark repository →

Results

Platform / Compiler

Integer Width

Loading benchmarks

Acknowledgments

rtc-digit-count is a direct response to — and improvement on — prior work in this space. Credit where it's owed:

Daniel Lemire — the magic-constant log-multiplication approach that established the modern baseline for fast digit counting.
fast-digit-count — the refined 77/256 multiplier variant that improved on Lemire's constant.
BenchmarkSuite — the measurement harness powering the comparisons here.