
NIHILAI COLLECTIVE

Performance Engineering From First Principles

What do we do?
We push C++ as close to its performance ceiling as possible. Specifically, we take a given problem and map it to a given hardware target so that it executes as efficiently as that problem space allows.

The Method

Every library published here was built the same way: identify what can be moved to compile time, then what can be moved to "initialization time", and finally design the algorithm so that it maps to the target hardware as efficiently as possible. The benchmark is the proof.
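
As a rough illustration of the three stages, here is a minimal sketch. It is not taken from any of the libraries listed here; every name in it (PowerTable, host_info, scale_by_power_of_ten, items_per_worker) is hypothetical. The table is built by the compiler, the host query runs once on first use, and the runtime functions only read the results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Stage 1: compile time. The table of powers of ten is computed by the
// compiler, so no work is left to do when the program starts.
struct PowerTable {
    std::uint64_t values[20];
};

constexpr PowerTable make_power_table() {
    PowerTable table{};
    std::uint64_t value = 1;
    for (std::size_t i = 0; i < 20; ++i) {
        table.values[i] = value;
        if (i + 1 < 20) value *= 10;  // stop before overflowing past 10^19
    }
    return table;
}

constexpr PowerTable power_table = make_power_table();
static_assert(power_table.values[19] == 10000000000000000000ULL,
              "10^19 still fits in 64 bits");

// Stage 2: initialization time. Facts about the host are only known when
// the program runs, but they can be queried once and then reused.
struct HostInfo {
    unsigned worker_count;
};

inline const HostInfo& host_info() {
    static const HostInfo info = [] {
        const unsigned detected = std::thread::hardware_concurrency();
        return HostInfo{detected == 0 ? 1u : detected};  // 0 means "unknown"
    }();
    return info;
}

// Stage 3: runtime. The hot path only reads precomputed data.
inline std::uint64_t scale_by_power_of_ten(std::uint64_t value, std::size_t exponent) {
    assert(exponent < 20);
    return value * power_table.values[exponent];
}

inline std::size_t items_per_worker(std::size_t total_items) {
    const std::size_t workers = host_info().worker_count;
    return (total_items + workers - 1) / workers;  // ceiling division
}
```

The point of the split is that each piece of work is done at the earliest stage where its inputs are known, so the per-call cost shrinks to a table read and a multiply.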

Track Record

  • void-numerics

    Numeric utilities library. Against the C++ standard library across the full platform matrix: wins, ties, losses. Published to the Microsoft vcpkg registry.
  • Jsonifier

    JSON parser and serializer. Against simdjson and Glaze across the full platform matrix: wins, ties, losses.
  • Digit-Counting Algorithm

    Novel direct-lookup approach for counting digits in unsigned 64-bit integers. Against Lemire and fast-digit-count across the full platform matrix: wins, ties, losses.
  • Tested across
    Ubuntu/Clang, Ubuntu/GCC, macOS/Clang, macOS/GCC, Windows/MSVC
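
For readers unfamiliar with the digit-counting problem, the sketch below shows one widely known baseline: estimate the digit count from the bit length, then correct it with a single comparison against a power-of-ten table. This is not the direct-lookup algorithm described above, whose details are not given here; the names count_digits and bit_length are hypothetical, and the portable bit_length loop stands in for std::bit_width (C++20) or a compiler intrinsic.

```cpp
#include <cstdint>

// Powers of ten from 10^0 through 10^19, the largest that fits in 64 bits.
constexpr std::uint64_t powers_of_ten[20] = {
    1ULL,
    10ULL,
    100ULL,
    1000ULL,
    10000ULL,
    100000ULL,
    1000000ULL,
    10000000ULL,
    100000000ULL,
    1000000000ULL,
    10000000000ULL,
    100000000000ULL,
    1000000000000ULL,
    10000000000000ULL,
    100000000000000ULL,
    1000000000000000ULL,
    10000000000000000ULL,
    100000000000000000ULL,
    1000000000000000000ULL,
    10000000000000000000ULL,
};

// Portable bit length (position of the highest set bit, 1-based).
// Assumption: real code would use std::bit_width or an intrinsic instead.
inline unsigned bit_length(std::uint64_t value) {
    unsigned bits = 0;
    while (value != 0) {
        ++bits;
        value >>= 1;
    }
    return bits;
}

// Number of decimal digits needed to print `value`.
inline unsigned count_digits(std::uint64_t value) {
    if (value == 0) return 1;  // "0" is one digit
    // 1233 / 4096 approximates log10(2), so `estimate` equals
    // floor(bits * log10(2)) for every bit length from 1 to 64.
    const unsigned estimate = (bit_length(value) * 1233u) >> 12;
    // The true digit count is either `estimate` or `estimate + 1`;
    // one table comparison decides which.
    return estimate + (value >= powers_of_ten[estimate] ? 1u : 0u);
}
```

A benchmark against a baseline like this measures how much a different table layout saves over the estimate-then-compare step.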

Nihilus — LLM Inference Engine

The same compile-time methodology applied to transformer inference at scale.

During a transformer forward pass, only two parameters are truly runtime-mutable: batch size and sequence length. Nihilus uses this to determine the structure of execution at compile time (model topology, tensor layouts, memory plans, dispatch logic), collapsing it into a 13 KB architectural representation that is populated just before each forward pass and stays resident in GPU constant memory during execution.
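
To make the idea concrete, here is a minimal host-side sketch of sizes that are fixed by the model at compile time and only scaled by batch size and sequence length at runtime. It is an illustration under assumptions, not Nihilus code: TinyModelConfig, RuntimeDims, ForwardPlan, and the model dimensions are all hypothetical, and the 13 KB representation and GPU constant memory are not modeled.

```cpp
#include <cstddef>

// Hypothetical model description: every value is a compile-time constant.
struct TinyModelConfig {
    static constexpr std::size_t layer_count = 4;
    static constexpr std::size_t embedding_dim = 256;
    static constexpr std::size_t head_count = 8;
};

// The only two values that change between forward passes.
struct RuntimeDims {
    std::size_t batch_size;
    std::size_t sequence_length;
};

template <typename Config>
struct ForwardPlan {
    // Structural errors are caught by the compiler, not at runtime.
    static_assert(Config::embedding_dim % Config::head_count == 0,
                  "embedding_dim must be divisible by head_count");

    // Per-token sizes depend only on the model, so they are constants.
    static constexpr std::size_t head_dim =
        Config::embedding_dim / Config::head_count;
    static constexpr std::size_t qkv_elements_per_token =
        3 * Config::embedding_dim;
    static constexpr std::size_t kv_cache_elements_per_token =
        2 * Config::layer_count * Config::embedding_dim;

    // At runtime the only remaining work is scaling by batch and sequence.
    static constexpr std::size_t token_count(RuntimeDims dims) {
        return dims.batch_size * dims.sequence_length;
    }
    static constexpr std::size_t qkv_elements(RuntimeDims dims) {
        return token_count(dims) * qkv_elements_per_token;
    }
    static constexpr std::size_t kv_cache_elements(RuntimeDims dims) {
        return token_count(dims) * kv_cache_elements_per_token;
    }
};

using TinyPlan = ForwardPlan<TinyModelConfig>;
```

With this shape, a call such as TinyPlan::kv_cache_elements(RuntimeDims{2, 16}) reduces to two multiplications, because everything else was folded into constants by the compiler.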

A generation request enters the GPU once. The decode loop executes there. Per-token host orchestration overhead: structurally zero.

Nihilus Project Page →   

Projects
Benchmarked and unit-tested, ready for you to try.
Consulting
Available
Performance-critical C++ engagements. The benchmarks are the proof.