Verification¶

Mneme verifies replay correctness by comparing the device memory state after kernel execution (the replay epilogue) against the recorded epilogue snapshot. By default this comparison is an exact byte-wise check.

Annotations let you override this default on a per-buffer basis by attaching typed comparison metadata — data type, error threshold, norm, and an optional human-readable tag — to any device pointer before a kernel is recorded. The metadata is persisted inside the recorded snapshot and used automatically during every subsequent replay.

Why annotations?¶

Many GPU kernels produce floating-point results where bit-exact reproduction is neither expected nor required. Without annotations, any recompilation or launch-parameter change that introduces floating-point rounding differences will cause verification to fail, even when the result is numerically acceptable.

Annotations solve this by letting you declare, at record time, what "correct enough" means for each buffer.

The `mneme::annotate` API¶

The annotation API is declared in mneme/MnemeAnnotation.hpp, which is installed automatically when you build with Mneme.

Include and link¶

#include "mneme/MnemeAnnotation.hpp"

If your CMake target was created with add_mneme(my_target), the annotation runtime library (mnemert) is linked automatically and no extra target_link_libraries call is required.

Basic usage¶

double *d_output = nullptr;
cudaMalloc(&d_output, N * sizeof(double));

// Annotate *before* the kernel launch that will be recorded.
mneme::annotate(d_output, mneme::Metadata{
    .builtin        = mneme::BuiltinDType::F64,
    .threshold      = 1e-6,
    .threshold_kind = mneme::ThresholdKind::Relative,
    .norm           = mneme::Norm::Linf,
    .tag            = std::string("output_vector"),
});

my_kernel<<<grid, block>>>(d_output, ...);

The typed convenience overload deduces builtin from the pointer type:

mneme::annotate<double>(d_output, mneme::Metadata{
    .threshold      = 1e-6,
    .threshold_kind = mneme::ThresholdKind::Relative,
    .norm           = mneme::Norm::Linf,
});

Updating annotations between launches¶

You can call annotate on the same pointer more than once. Each kernel launch captures the annotation state that was active when recording began, so different dynamic instances of the same kernel can carry different verification policies:

mneme::annotate(d_out, mneme::Metadata{
    .builtin        = mneme::BuiltinDType::F64,
    .threshold      = 0.01,
    .threshold_kind = mneme::ThresholdKind::Relative,
    .norm           = mneme::Norm::Linf,
    .tag            = std::string("out_loose"),
});
vecAdd<<<4, 128>>>(d_in, d_out, N);

// Tighten the tolerance for the next invocation.
mneme::annotate(d_out, mneme::Metadata{
    .builtin        = mneme::BuiltinDType::F64,
    .threshold      = 0.005,
    .threshold_kind = mneme::ThresholdKind::Relative,
    .norm           = mneme::Norm::Linf,
    .tag            = std::string("out_tight"),
});
vecAdd<<<8, 64>>>(d_in, d_out, N);

Metadata fields¶

`BuiltinDType`¶

Tells Mneme how to interpret the raw bytes in the buffer.

Value	C++ type
`U8`	`uint8_t`
`I8`	`int8_t`
`U16`	`uint16_t`
`I16`	`int16_t`
`U32`	`uint32_t`
`I32`	`int32_t`
`U64`	`uint64_t`
`I64`	`int64_t`
`F16`	`__half`
`F32`	`float`
`F64`	`double`

When you use the typed overload mneme::annotate<T>(ptr, md), the builtin field is set automatically from the pointer type.

`threshold`¶

A non-negative double that defines the maximum acceptable error. Its exact meaning depends on threshold_kind and norm (see below).

`ThresholdKind`¶

Controls how the per-element error between the replayed value r and the recorded value e is computed.

Value	Formula
`Absolute`	$\lvert r - e \rvert$
`Relative`	$\frac{\lvert r - e \rvert}{\lvert e \rvert}$

`Norm`¶

Determines how per-element errors are aggregated across the buffer.

Value	Semantics
`None`	Per-element mode. Each element is individually compared against `threshold`. Verification fails if any element exceeds the threshold.
`L1`	$\sum_i \text{err}(i) \leq \text{threshold}$
`L2`	$\sqrt{\sum_i \text{err}(i)^2} \leq \text{threshold}$
`Linf`	$\max_i \text{err}(i) \leq \text{threshold}$

When norm is L1, L2, or Linf, the aggregated value is compared against threshold to determine pass/fail.

`tag` (optional)¶

A free-form string stored alongside the buffer snapshot. Tags are useful for identifying buffers in diagnostic output and multi-buffer kernels. They are not interpreted by Mneme.

How verification works at replay time¶

Mneme restores the recorded prologue device memory state.
The kernel is recompiled and executed.
For every recorded memory blob, Mneme dispatches a GPU comparison kernel that applies the BuiltinDType, ThresholdKind, and Norm stored in the blob's annotation metadata.
If all blobs pass, the replay result reports "verified": true.

Buffers that were not annotated fall back to the default byte-wise comparison (equivalent to threshold = 0, Norm::None).

Complete example¶

The following program annotates two device buffers with different tolerances and records two kernel launches, each capturing a distinct annotation state.

#include <cstdlib>
#include <iostream>
#include "mneme/MnemeAnnotation.hpp"

// Use DeviceTraits or plain CUDA/HIP runtime calls.
#ifdef __ENABLE_CUDA__
#include <cuda_runtime.h>
#define gpuMalloc   cudaMalloc
#define gpuMemset   cudaMemset
#define gpuFree     cudaFree
#define gpuSync     cudaDeviceSynchronize
#elif defined(__ENABLE_HIP__)
#include <hip/hip_runtime.h>
#define gpuMalloc   hipMalloc
#define gpuMemset   hipMemset
#define gpuFree     hipFree
#define gpuSync     hipDeviceSynchronize
#endif

template <typename T>
__global__ void vecAdd(T *in, T *out, size_t n) {
  auto tid = threadIdx.x + blockIdx.x * blockDim.x;
  if (tid < n)
    out[tid] += in[tid] + tid;
}

int main(int argc, const char *argv[]) {
  size_t N = std::atoi(argv[1]);
  double *in = nullptr, *out = nullptr;

  gpuMalloc((void **)&in,  N * sizeof(double));
  gpuMalloc((void **)&out, N * sizeof(double));
  gpuMemset(in,  0, N * sizeof(double));
  gpuMemset(out, 0, N * sizeof(double));

  // Annotate both pointers before the first recorded launch.
  mneme::annotate(in, mneme::Metadata{
      .builtin        = mneme::BuiltinDType::F64,
      .threshold      = 0.125,
      .threshold_kind = mneme::ThresholdKind::Absolute,
      .norm           = mneme::Norm::L2,
      .tag            = std::string("in_vec"),
  });

  mneme::annotate(out, mneme::Metadata{
      .builtin        = mneme::BuiltinDType::F64,
      .threshold      = 0.01,
      .threshold_kind = mneme::ThresholdKind::Relative,
      .norm           = mneme::Norm::Linf,
      .tag            = std::string("out_loose"),
  });

  vecAdd<<<4, 128>>>(in, out, N);
  gpuSync();

  // Update the annotation for the next launch.
  mneme::annotate(out, mneme::Metadata{
      .builtin        = mneme::BuiltinDType::F64,
      .threshold      = 0.005,
      .threshold_kind = mneme::ThresholdKind::Relative,
      .norm           = mneme::Norm::Linf,
      .tag            = std::string("out_tight"),
  });

  vecAdd<<<8, 64>>>(in, out, N);
  gpuSync();

  gpuFree(in);
  gpuFree(out);
  return 0;
}

Record, replay, and verify as usual:

mneme record  -rdb db/ -- ./vecAddAnnotations 1024
mneme replay  -rdb db/<static-hash>.json -rid <dynamic-hash> "default<O3>"

The replay JSON result will contain "verified": true when the replayed epilogue satisfies every annotated buffer's tolerance.

Summary¶

Concept	Description
`mneme::annotate(ptr, md)`	Attach verification metadata to a device pointer
`BuiltinDType`	Scalar type used to interpret buffer contents
`ThresholdKind`	Absolute or relative error formula
`Norm`	Per-element (`None`) or aggregate (`L1`, `L2`, `Linf`) comparison
`tag`	Optional label for diagnostics

Annotations are recorded once and applied automatically on every replay. Buffers without annotations continue to use the default byte-wise comparison.