Python API¶
mneme
¶
Core modules¶
Recorded execution database and memory snapshot bindings.
This module defines the Python-side representation of Mneme’s record/replay artifacts:
- MemStateRef: a lightweight wrapper over the native memory-snapshot object (prologue/epilogue) used for replay verification.
- RecordedExecution: a JSON-serializable description of a recorded kernel, including all observed dynamic instances and the LLVM IR modules required for replay.
The native snapshot API is accessed via ctypes/FFI (ffi.lib.MnemePy_*).
Instances of :class:MemStateRef behave as context managers: they load the
snapshot on entry and dispose the native handle on exit.
Notes
- This file is core to replay correctness: prologue/epilogue snapshots are used to verify that the replayed kernel produced the expected state.
- The JSON schema here is treated as a stable interchange format between record and replay tools.
MemStateRef
¶
Handle to a recorded memory snapshot (prologue/epilogue).
A :class:MemStateRef wraps Mneme’s native memory snapshot representation,
which encodes the recorded kernel argument pointers and the captured device
memory state. During replay, these snapshots serve two purposes:
- Inputs: The prologue snapshot provides the argument pointer list and initial memory contents required to execute the kernel deterministically.
- Verification: The epilogue snapshot represents the expected post-kernel state. Replay compares a reproduced epilogue against this snapshot to validate correctness.
Instances are context managers. Entering the context loads the snapshot into the native handle; leaving the context disposes it.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| fn | str | Path to the snapshot file on disk. | required |
| kernel_name | str | Kernel name associated with this snapshot (used by native layer). | required |
| snap_type | SnapshotType | Whether this snapshot is a prologue or epilogue capture. | required |
Raises:

| Type | Description |
|---|---|
| RuntimeError | If the snapshot file does not exist. |

Source code in python/mneme/recorded_execution.py
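For illustration, a minimal usage sketch, assuming the classes are importable from mneme.recorded_execution; the snapshot paths and kernel name are hypothetical:

from mneme.recorded_execution import MemStateRef, SnapshotType

# Hypothetical snapshot files and kernel name, for illustration only.
pre = MemStateRef("vector_add.pre.snapshot", "vector_add", SnapshotType.PROLOGUE)
post = MemStateRef("vector_add.post.snapshot", "vector_add", SnapshotType.EPILOGUE)

with pre, post:
    # Properties such as num_args/args are only valid while the snapshot is loaded.
    print("recorded argument count:", pre.num_args)
    # ... replay the kernel against the prologue state here ...
    # Equality/inequality delegates to the native comparison routine.
    print("prologue equals epilogue:", pre == post)

Comparing pre against post above only demonstrates the operator; in an actual replay the reproduced epilogue is compared against the recorded epilogue snapshot.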
args
property
¶
Return the kernel argument pointer array stored in the snapshot.
Returns:

| Type | Description |
|---|---|
| POINTER(c_void_p) | Pointer to an array of argument pointers as returned by the native API. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the snapshot has not been loaded via :meth:open. |
num_args
property
¶
Return the number of kernel arguments recorded in the snapshot.
Returns:

| Type | Description |
|---|---|
| int | Number of arguments. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the snapshot has not been loaded via :meth:open. |
__eq__(other)
¶
Compare two snapshots using the native comparison routine.
Returns:

| Type | Description |
|---|---|
| bool | True if the native layer considers the states equivalent. |

Source code in python/mneme/recorded_execution.py
__ne__(other)
¶
Compare two snapshots using the native comparison routine.
Returns:

| Type | Description |
|---|---|
| bool | True if the native layer considers the states different. |

Source code in python/mneme/recorded_execution.py
open()
¶
Initialize and load the snapshot into the native handle.
Returns:

| Type | Description |
|---|---|
| MemStateRef | Returns self for convenient chaining / context-manager usage. |

Source code in python/mneme/recorded_execution.py
reset()
¶
Reset the snapshot state in the native layer.
This is typically used to restore the device memory state to the recorded baseline (e.g., before re-running a replay) without reinitializing the snapshot handle.
Raises:

| Type | Description |
|---|---|
| RuntimeError | If the snapshot has not been loaded via :meth:open. |

Source code in python/mneme/recorded_execution.py
RecordedExecution
¶
Description of a recorded kernel execution and its dynamic instances.
A :class:RecordedExecution captures everything needed to replay and tune
a kernel that was observed during application execution:
- Kernel identity (static hash, name, demangled name)
- Argument names and specialization availability
- Virtual address space reservation information (VA base + size)
- LLVM IR module file paths required for linking
- A mapping of dynamic hash → KernelInstance, representing each observed launch instance (grid/block/shared-mem and snapshot paths)
The class behaves like a mapping over kernel instances and supports JSON
serialization via :meth:to_json / :meth:from_json.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| static_hash | str | Stable identifier for the kernel’s static code shape. | required |
| kernel_name | str | Mangled or runtime kernel symbol name. | required |
| demangled_name | str | Human-readable kernel name (if available). | required |
| llvm_files | list[str] | Paths to LLVM IR modules captured during recording. | required |
| arg_names | list[str] | Recorded kernel argument names (for display/debugging). | required |
| specializations | list[bool] | Per-argument specialization availability flags. | required |
| va_addr | str | Base virtual address (hex string) used by Mneme’s memory manager. | required |
| va_size | int | Virtual address space size in bytes (or recording-specific unit). | required |
| kernel_instances | dict[str, KernelInstance] | Mapping from dynamic hash to recorded launch instance descriptor. | required |

Source code in python/mneme/recorded_execution.py
KernelInstance
¶
Description of one dynamic kernel launch instance.
A kernel may be launched multiple times with different dynamic properties
(e.g., different grid/block sizes, argument values, or observed runtime hashes).
Each :class:KernelInstance stores:
- Launch parameters (grid, block, shared memory)
- Dynamic hash (identifies the runtime instance)
- Available specialization indices (derived from specialization flags)
- Snapshot file paths for prologue and epilogue
The prologue/epilogue snapshots are exposed via :class:MemStateRef objects,
which are opened on demand by the replay executor.
Source code in python/mneme/recorded_execution.py
to_dict()
¶
Convert this instance into a JSON-friendly dictionary.
Returns:

| Type | Description |
|---|---|
| dict | Serializable representation containing dims, shared memory, occurrence count, and snapshot file paths. |

Source code in python/mneme/recorded_execution.py
from_json(fn)
classmethod
¶
Load a :class:RecordedExecution database from JSON.
This reconstructs all :class:KernelInstance entries and validates that
referenced LLVM module paths exist.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| fn | str | Path to the recorded execution JSON file. | required |

Returns:

| Type | Description |
|---|---|
| RecordedExecution | Loaded record database. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the JSON file does not exist or referenced IR modules are missing. |

Source code in python/mneme/recorded_execution.py
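A minimal sketch of loading a record database and inspecting it, assuming the class is importable from mneme.recorded_execution; the JSON path is hypothetical and attribute access mirrors the constructor parameters listed above:

from mneme.recorded_execution import RecordedExecution

# Hypothetical record database written by the Mneme recorder.
record = RecordedExecution.from_json("records/vector_add.json")
print(record.kernel_name, record.static_hash)

# The database maps dynamic hashes to KernelInstance descriptors.
for dyn_hash, instance in record.kernel_instances.items():
    print(dyn_hash, instance.to_dict())

# Link the recorded IR modules once; the result is cached on the first call.
root_ir = record.link_llvm_modules(prune=True, internalize=True)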
link_llvm_modules(prune=True, internalize=True)
¶
Link recorded LLVM IR modules into a single module suitable for replay.
This is a convenience wrapper over the Proteus JIT linking layer. Results are cached on the first call and returned on subsequent calls.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| prune | bool | Whether to prune unused symbols/IR during linking. | True |
| internalize | bool | Whether to internalize symbols during linking. | True |

Returns:

| Type | Description |
|---|---|
| ModuleRef | Linked IR module produced by the JIT layer. |

Source code in python/mneme/recorded_execution.py
to_json(fn)
¶
Serialize this record database to a JSON file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| fn | str | Output JSON path. | required |

Source code in python/mneme/recorded_execution.py
SnapshotType
¶
Bases: Enum
Enumeration of memory snapshot roles within a recorded execution.
- PROLOGUE: Snapshot captured immediately before kernel execution.
- EPILOGUE: Snapshot captured immediately after kernel execution.
The snapshot type is used by the native snapshot loader to interpret the record format and to decide which parts of state are treated as inputs vs outputs during verification.
Source code in python/mneme/recorded_execution.py
find_non_jsonables(obj, where='$')
¶
Debug helper: print paths to fields that are not JSON-serializable.
This is used as a sanity check before writing the record database to JSON. It is intentionally permissive and prints to stdout rather than raising.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| obj | Any | Object graph to inspect. | required |
| where | str | JSONPath-like location used when printing offending fields. | '$' |

Source code in python/mneme/recorded_execution.py
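A small illustrative call; the object below is arbitrary, and the ctypes handle simply stands in for any non-serializable value:

from ctypes import c_void_p
from mneme.recorded_execution import find_non_jsonables

# Prints a JSONPath-like location for each field that cannot be serialized to JSON.
find_non_jsonables({"name": "vector_add", "native_handle": c_void_p(0)})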
Device runtime bindings for Mneme (Python ↔ native runtime).
This module provides thin ctypes/FFI wrappers around the Mneme native runtime for loading device code objects and launching / profiling kernels.
Main abstractions
- :class:DeviceModule loads a device object from a :class:MemBufferRef and provides access to kernel entry points.
- :class:DeviceFunction represents a kernel function handle and exposes launch and profile operations, along with basic resource-usage queries (registers, local/const memory).
Lifetime notes
The native runtime owns device-side resources. This module mirrors those
resources via ffi.ObjectRef and uses weak references to enforce correct
cleanup order:
- A :class:DeviceFunction keeps only a weakref to its parent module.
- When a :class:DeviceModule is disposed/unloaded, it invalidates all functions obtained from it to prevent use-after-free.
DeviceFunction
¶
Bases: ObjectRef
Handle to a device kernel function.
A DeviceFunction is obtained from a :class:DeviceModule via
:meth:DeviceModule.get_function. It provides:
- :meth:launch for a direct kernel launch (grid/block only)
- :meth:profile for record/replay execution with prologue/epilogue state
- Resource-usage queries via :attr:reg_usage, :attr:local_mem, and :attr:const_mem
Notes
Instances are tied to the lifetime of the parent :class:DeviceModule.
If the module is unloaded, the function becomes invalid and further usage
raises an error.
Source code in python/mneme/device.py
const_mem
property
¶
Constant memory usage for this kernel (bytes), as reported by the runtime.
The value is cached after the first query.
Returns:

| Type | Description |
|---|---|
| int | Constant memory usage in bytes. |
local_mem
property
¶
Local memory usage for this kernel (bytes), as reported by the runtime.
The value is cached after the first query.
Returns:

| Type | Description |
|---|---|
| int | Local memory usage in bytes. |
reg_usage
property
¶
Register usage for this kernel (registers per thread), as reported by the runtime.
The value is cached after the first query.
Returns:

| Type | Description |
|---|---|
| int | Register usage per thread. |
__del__()
¶
Best-effort cleanup on garbage collection.
Notes
Python finalizers are not guaranteed to run promptly (or at all at process
shutdown). Resource lifetime should be controlled through the parent
:class:DeviceModule context manager whenever possible.
Source code in python/mneme/device.py
__init__(func_ptr, module_ref, kernel_name)
¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| func_ptr | | Native function handle returned by the Mneme runtime. | required |
| module_ref | DeviceModule | Parent module that owns the function. A weak reference is stored. | required |
| kernel_name | str | Kernel symbol name (used for debugging/logging and attribution). | required |

Source code in python/mneme/device.py
invalidate()
¶
Mark this function handle as invalid.
This is called when the owning :class:DeviceModule is unloaded/disposed,
preventing use-after-free of the underlying native function handle.
Source code in python/mneme/device.py
launch(grid_dim, block_dim)
¶
Launch the kernel with the given grid/block configuration.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| grid_dim | dim3 | Grid dimensions for the launch. | required |
| block_dim | dim3 | Block dimensions for the launch. | required |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the function pointer is NULL, the parent module was unloaded, or the function has been invalidated. |

Source code in python/mneme/device.py
profile(grid_dim, block_dim, prologue_state, epilogue_state, shared_mem_size, iterations=5)
¶
Execute the kernel under Mneme record/replay profiling.
This method executes the kernel with the provided recorded prologue/epilogue state buffers and measures execution time across multiple iterations. The native runtime is responsible for timing, validation hooks, and any device synchronization required for consistent measurements.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| grid_dim | dim3 | Grid dimensions for the launch. | required |
| block_dim | dim3 | Block dimensions for the launch. | required |
| prologue_state | MemBufferRef | Recorded state buffer to initialize device memory / arguments prior to kernel execution. | required |
| epilogue_state | MemBufferRef | Recorded state buffer used to validate and/or capture post-state after kernel execution. | required |
| shared_mem_size | int | Dynamic shared memory size (bytes) for the launch. | required |
| iterations | int | Number of kernel executions to perform for profiling. | 5 |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the parent module has been garbage collected. |

Source code in python/mneme/device.py
DeviceModule
¶
Bases: ObjectRef
Loaded device module/object.
A DeviceModule is constructed from a compiled object buffer (a :class:MemBufferRef)
using :meth:from_MemBuffer. It owns the native device object handle and is
responsible for releasing it.
Functions obtained via :meth:get_function are tracked and invalidated when
the module is disposed to prevent use-after-free.
Source code in python/mneme/device.py
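Putting the module and function handles together, a minimal sketch of loading a generated object and launching a kernel. The obj_buffer value is assumed to be a MemBufferRef (e.g., produced by codegen_object in the proteus bindings below), the kernel symbol is hypothetical, and the dim3 import location and constructor signature are assumptions:

from mneme.device import DeviceModule, set_device, get_device_arch
from mneme.mneme_types import dim3  # assumed location of dim3

set_device(0)
print("device arch:", get_device_arch())

obj_buffer = ...  # MemBufferRef holding a compiled device object (see codegen_object below)

with DeviceModule.from_MemBuffer(obj_buffer) as module:
    kernel = module.get_function("vector_add")       # hypothetical kernel symbol
    print("registers per thread:", kernel.reg_usage)
    kernel.launch(dim3(256, 1, 1), dim3(128, 1, 1))   # assumed dim3(x, y, z) signature

Using the module as a context manager follows the lifetime notes above: functions obtained from it are invalidated once the module is disposed.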
__init__(module_ptr)
¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| module_ptr | | Native module handle returned by the Mneme runtime. | required |

Source code in python/mneme/device.py
from_MemBuffer(buffer)
classmethod
¶
Load a device module from an in-memory object buffer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| buffer | MemBufferRef | Buffer containing a device object suitable for loading by the Mneme runtime. | required |

Returns:

| Type | Description |
|---|---|
| DeviceModule | A module that owns the loaded native device object. |

Raises:

| Type | Description |
|---|---|
| TypeError | If |

Source code in python/mneme/device.py
get_function(kernel_name)
¶
Resolve a kernel function from the loaded module.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| kernel_name | str | Kernel symbol name to resolve. | required |

Returns:

| Type | Description |
|---|---|
| DeviceFunction | Function handle bound to this module. |
Notes
The returned function is tracked by the module and will be invalidated when the module is disposed.
Source code in python/mneme/device.py
get_device_arch()
¶
Return the current device architecture identifier.
Returns:

| Type | Description |
|---|---|
| str | Architecture string reported by the Mneme runtime (backend-defined). |

Source code in python/mneme/device.py
get_device_count()
¶
Return the number of visible devices.
Returns:

| Type | Description |
|---|---|
| int | Device count as reported by the Mneme runtime. |

Source code in python/mneme/device.py
set_device(dev_id)
¶
Set the active device for subsequent device operations.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dev_id | int | Device index to select. | required |

Source code in python/mneme/device.py
mneme.replay_executor¶
Core record–replay execution pipeline for Mneme.
This module provides the execution backbone used by both:
- the synchronous CLI execution path (via BaseExecutor subclasses), and
- the asynchronous tuning engine (via TuneWorker).
At a high level, an "experiment" in Mneme is:
1) Load a recorded kernel execution (RecordedExecution + KernelInstance).
2) Reconstruct the recorded GPU memory state (prologue/epilogue snapshots) into a managed virtual address space (PageManagerRef).
3) Link recorded LLVM IR modules into a single IR module suitable for replay.
4) Apply optional IR specializations (arguments, launch dims, launch bounds).
5) Run an optimization pipeline and generate a device object.
6) Load the object onto the GPU, run the kernel, and optionally profile.
7) Verify correctness by comparing epilogue vs prologue expectations.
The pipeline is intentionally organized so that:
- verification can be done with minimal instrumentation,
- tracked runs collect timing/resource metrics, and
- worker processes can amortize initialization costs by reusing a single executor.
Public API
BaseExecutor: Base class that owns GPU affinity, recorded state, and the build/run pipeline.
TuneWorker: Worker-process implementation used by the async tuning infrastructure.
BaseExecutor
¶
Base class for executing Mneme record–replay experiments.
A BaseExecutor instance is bound to:
- one recorded database file (record_db),
- one kernel instance inside that database (record_id),
- one GPU device (device_id),
- and an iteration count for measured runs.
Responsibilities
- Load the recorded execution metadata (RecordedExecution) and select the target KernelInstance (kernel_descr).
- Pin the current OS process to a specific GPU device (set_device()).
- Manage the replay address space and recorded snapshots:
  - PageManagerRef selects/initializes the virtual address space.
  - prologue/epilogue snapshots are opened and later compared.
- Provide a structured pipeline that takes IR -> object -> execution:
  - _preprocess_ir(): apply specialization transforms and compute a variant hash
  - _optimize(): run pass pipeline / O-level selection
  - _codegen(): lower to a device object (MemBufferRef)
  - _run(): load object, resolve kernel, execute and optionally profile
  - _execute(): orchestrate verification + cleanup + tracked run
Lifecycle
BaseExecutor is designed to be used as a context manager:
executor = MyExecutor(record_db=..., record_id=..., device_id=...)
root_ir = executor.link_ir()
with executor:
    res = executor.execute(...)
The context manager ensures GPU memory state (snapshots + page manager) is opened exactly once and released even when execution raises.
Notes / invariants
- A BaseExecutor instance is intended to be used within a single process. (Workers should construct one executor per worker process.)
- open() must be called before any execution; _execute() assumes prologue and epilogue states are loaded.
- link_ir() returns a linked IR module representing the recorded kernel; callers should clone before mutation if reusing across experiments.
Source code in python/mneme/replay_executor.py
TuneWorker
¶
Bases: BaseExecutor
Worker-side executor used by the asynchronous tuning infrastructure.
TuneWorker is a concrete :class:BaseExecutor specialization intended to run
inside a dedicated worker process. It owns the GPU affinity, prologue/epilogue
state, page manager, and JIT pipeline required to compile and replay a recorded
kernel under a given :class:ExperimentConfiguration.
A worker process typically:
1) Initializes profiling and selects a GPU device.
2) Loads the recorded execution (record DB + record ID).
3) Links the recorded LLVM IR into a single module (link_ir).
4) Enters a message-processing loop (see :meth:run) to evaluate configurations.
Notes
- The public entry point for the worker process is :meth:run, which is designed to be used as a multiprocessing target.
- Per-request execution is handled by :meth:process_payload, which builds, verifies, and runs the kernel according to the provided configuration.
Source code in python/mneme/replay_executor.py
__init__(*args, **kwargs)
¶
Construct a TuneWorker and initialize worker-local profiling.
This constructor initializes the Mneme profiler (for timing breakdowns) and
then delegates initialization to :class:BaseExecutor. The base class sets
device affinity, loads the recorded execution, and prepares prologue/epilogue
descriptors.
Notes
- The worker process should typically construct a single TuneWorker instance and reuse it for multiple requests to amortize startup overhead.
- init_profiler() is required to be executed exactly once by every OS process; executing it multiple times results in undefined behavior.
Source code in python/mneme/replay_executor.py
process_payload(ir_module, config)
¶
Execute one tuning request: build, verify, and run the kernel under config.
This method is the unit of work performed by a worker in response to a tuning request. It executes the full Mneme record–replay pipeline using the provided IR module and configuration:
1) Records the experiment start timestamp.
2) Invokes the base executor pipeline (see :meth:BaseExecutor._execute),
which performs verification, IR sanitization, compilation, and timed execution.
3) Records the experiment end timestamp and annotates the result with the GPU id.
4) Returns both the populated :class:ExperimentResult and the transformed IR.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| ir_module | ModuleRef | Root IR module (or clone) used as input for this experiment. The module is cloned internally and transformed as part of the execution pipeline. | required |
| config | ExperimentConfiguration | Configuration describing launch parameters, specialization options, and code-generation controls for this experiment. | required |

Returns:

| Type | Description |
|---|---|
| (ExperimentResult, ModuleRef) | A tuple containing the populated ExperimentResult and the transformed IR module. |
Notes
- This method is expected to be called repeatedly within the worker loop; callers should pass a cloned IR module to avoid cross-experiment mutation.
- Timestamps are recorded in ISO 8601 format using UTC time.
Source code in python/mneme/replay_executor.py
run(request_q, response_q, record_db, record_id, device_id, iterations, results_db_dir, state)
staticmethod
¶
Worker process entry point: initialize resources and serve requests from a queue.
This method is designed to be used as the target function for a worker
multiprocessing.Process. It performs one-time initialization and then
enters a blocking loop that processes messages from request_q.
Initialization performed once per worker:
1) Redirects stdout/stderr to a per-worker log file:
{results_db_dir}/Worker-{device_id}.log. This avoids interleaved
output across processes.
2) Constructs a :class:TuneWorker with the given recording and device id.
3) Links and caches the root IR module (root_ir) that will be cloned
per experiment.
4) Opens GPU memory/prologue/epilogue resources via the executor context
manager (with worker as Memory).
5) Signals readiness by setting state.
Message protocol:
- {"payload": "terminate", ...}:
Stop the worker loop and exit.
- {"payload": "process", "exp_id": <id>, "data": <config-dict>}:
Execute an experiment and respond on response_q with:
{"exp_id": <id>, "payload": "result", "data": <result-dict>, "llvm_ir": ""}.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| request_q | Queue | Queue from which the worker receives control messages and experiment requests. | required |
| response_q | Queue | Queue to which the worker publishes experiment results. | required |
| record_db | str | Path to the recorded execution database/file used to construct the executor. | required |
| record_id | str | Identifier of the recorded kernel instance inside record_db. | required |
| device_id | int | GPU device index to which this worker process is pinned. | required |
| iterations | int | Number of kernel iterations to execute during the tracked run (the full execution may include additional runs for verification/warmup depending on the executor pipeline). | required |
| results_db_dir | str | Directory where per-worker logs and output artifacts are written. | required |
| state | Event | Event used to signal to the parent process that initialization is complete and the worker is ready to accept requests. | required |
Notes
- The worker loop blocks on request_q.get() until a message arrives.
- The worker clones root_ir per request to avoid cross-request IR mutation.
- Exceptions raised inside the loop will currently propagate and terminate the worker process; higher-level infrastructure should treat this as a worker crash.
Source code in python/mneme/replay_executor.py
AsyncReplayExecutor
¶
Asynchronous record/replay executor backed by a pool of worker processes.
AsyncReplayExecutor provides a lightweight interface to evaluate
:class:ExperimentConfiguration objects using one or more worker processes.
Internally it manages:
- A global thread queue of pending :class:EvalFuture jobs.
- A set of :class:TuneWorkerHandle instances (one per worker process).
- A monotonic job id generator for mapping submissions to results.
Users may submit jobs asynchronously via :meth:submit, or synchronously
evaluate a configuration via :meth:evaluate (submit + wait).
Notes
- Each worker handle can execute at most one in-flight job at a time.
- The executor is intended for repeated evaluations; startup/teardown overhead may dominate for microbenchmarks.
Source code in python/mneme/async_executor.py
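A minimal sketch of driving the pool, assuming the class is importable from mneme.async_executor and that keyword arguments match the constructor parameters below; the paths and record id are hypothetical:

from mneme.async_executor import AsyncReplayExecutor

executor = AsyncReplayExecutor(
    record_db="records/vector_add.json",  # hypothetical record database
    record_id="<dynamic-hash>",           # kernel instance id inside record_db
    iterations=5,
    results_db_dir="results",
    num_workers=2,
)

config = ...  # an ExperimentConfiguration, see mneme_types below
future = executor.submit(config)    # asynchronous: returns an EvalFuture
result = executor.evaluate(config)  # synchronous: submit + wait
print(result.verified, result.exec_time)
executor.shutdown()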
__init__(record_db, record_id, iterations, results_db_dir, num_workers)
¶
Construct an asynchronous executor with a fixed-size worker pool.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| record_db | str | Path to the recorded execution database/file. | required |
| record_id | str | Identifier of the recorded kernel instance inside record_db. | required |
| iterations | int | Number of kernel iterations performed by each worker per tracked run. | required |
| results_db_dir | str | Directory where workers write logs and optional output artifacts. | required |
| num_workers | int | Number of worker processes to launch. | required |

Source code in python/mneme/async_executor.py
evaluate(config)
¶
Synchronously evaluate one configuration through the worker pool.
This convenience method submits a configuration and blocks until the
corresponding :class:EvalFuture completes.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| config | ExperimentConfiguration | Experiment configuration to evaluate. | required |

Returns:

| Type | Description |
|---|---|
| ExperimentResult | Result object containing verification status, execution time samples, and optional compilation/resource metrics, depending on worker settings. |

Source code in python/mneme/async_executor.py
shutdown()
¶
Gracefully shutdown all workers and their monitoring threads.
This method requests each :class:TuneWorkerHandle to stop, which causes:
- the worker loop to receive a terminate message,
- the worker process to exit,
- the monitor thread to join.
Notes
- After shutdown, submitting additional jobs is undefined behavior.
Source code in python/mneme/async_executor.py
submit(config)
¶
Submit a new experiment configuration for asynchronous evaluation.
The configuration is wrapped in an :class:EvalFuture and enqueued for
execution by the first available worker handle.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| config | ExperimentConfiguration | Experiment configuration to evaluate. | required |

Returns:

| Type | Description |
|---|---|
| EvalFuture | A future that will be resolved with an :class:ExperimentResult. |

Source code in python/mneme/async_executor.py
TuneWorkerHandle
¶
Thread-side controller for one worker process executing tuning experiments.
TuneWorkerHandle owns:
- A single worker :class:multiprocessing.Process running :meth:TuneWorker.run.
- A pair of IPC queues for requests/responses.
- A monitoring thread that drives a small state machine for submitting jobs
and receiving results.
- Crash detection and automatic worker respawn.
The handle consumes :class:EvalFuture objects from a shared thread queue
(global_q), forwards their configurations to the worker process, and
resolves each future when the corresponding result arrives.
Notes
- Each handle pins its worker process to a specific device id (GPU affinity is handled inside :class:BaseExecutor / :class:TuneWorker).
- Crash recovery is best-effort: if the worker dies while running an experiment, the active future is marked as failed and the worker is restarted.
Source code in python/mneme/async_executor.py
StateMachine
¶
Bases: IntEnum
Internal action state for the monitor loop.
- SUBMIT: Attempt to dequeue a new job from the global queue and send it to the worker.
- RECEIVE: Poll for a worker response and resolve the currently active future.
Source code in python/mneme/async_executor.py
__init__(idx, global_q, record_db, record_id, device_id, iterations, results_db_dir)
¶
Construct a worker handle and start the worker process + monitor thread.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| idx | int | Logical worker index (primarily used for logging/debugging). | required |
| global_q | Queue | Shared thread queue containing :class:EvalFuture jobs to execute. | required |
| record_db | str | Path to the recorded execution database/file. | required |
| record_id | str | Identifier of the recorded kernel instance inside record_db. | required |
| device_id | int | Device id (GPU index) assigned to the underlying worker process. | required |
| iterations | int | Number of kernel iterations used by the worker for the tracked execution. | required |
| results_db_dir | str | Directory where the worker writes logs and optional artifacts. | required |
Notes
- The worker process is spawned immediately during initialization.
- A background thread is started to monitor the worker process and drive job submission/result collection.
Source code in python/mneme/async_executor.py
join()
¶
Request shutdown of the monitor thread and wait for completion.
This method signals the monitor loop to terminate, which triggers graceful worker shutdown and process join. It then joins the monitor thread.
Source code in python/mneme/async_executor.py
pop(q, timeout)
¶
Pop an item from a queue with a timeout.
This helper provides a uniform interface for both thread-based queues
(:class:queue.Queue) and multiprocessing queues
(:class:multiprocessing.Queue). If the queue is empty at the end of the
timeout, None is returned.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| q | | Queue-like object providing a blocking get(timeout) method. | required |
| timeout | float | Timeout in seconds for the blocking get() call. | required |

Returns:

| Type | Description |
|---|---|
| Any or None | The retrieved item, or None if the queue was empty within the timeout. |

Source code in python/mneme/async_executor.py
ExperimentConfiguration
dataclass
¶
Configuration for a single Mneme record/replay experiment.
This object captures all knobs that control kernel launch configuration,
specialization strategy, and code generation behavior. It is intended to be
hashable (via :meth:hash) so the same configuration can be given a stable,
persistent identifier across runs.
Attributes:

| Name | Type | Description |
|---|---|---|
| grid | dim3 | Grid dimensions (x, y, z) of the kernel launch. |
| block | dim3 | Block dimensions (x, y, z) of the kernel launch. |
| shared_mem | int | Amount of dynamic shared memory to allocate for the launch. |
| specialize | bool | Whether to enable specialization based on the recorded execution (e.g., specializing on input sizes or recorded parameters). |
| set_launch_bounds | bool | Whether to explicitly set CUDA launch bounds for the generated kernel. |
| max_threads | int | Maximum number of threads per block to assume when setting launch bounds or during specialization. |
| min_blocks_per_sm | int | Minimum number of resident blocks per SM when computing launch bounds. |
| specialize_dims | bool | Whether to specialize based on the recorded grid/block dimensions. |
| passes | str | Optimization pass pipeline specification. |
| codegen_opt | int | Code generation optimization level (e.g., 0–3). |
| codegen_method | str | Code generation strategy. |
| prune | bool | Whether to enable IR pruning / dead-code elimination in the generated kernel. Currently this is mandatory and always true; its impact will be explored later. |
| internalize | bool | Whether to internalize symbols (e.g., limit symbol visibility) during code generation. Currently this is mandatory and always true; its impact will be explored later. |

Source code in python/mneme/mneme_types.py
from_dict(data)
classmethod
¶
Construct a configuration from a plain dictionary.
The dictionary is expected to contain JSON-/YAML-serializable
representations, with "grid" and "block" encoded as dictionaries
compatible with :meth:dim3.from_dict.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | dict | Dictionary containing configuration fields. | required |

Returns:

| Type | Description |
|---|---|
| ExperimentConfiguration | A new configuration instance initialized from data. |

Source code in python/mneme/mneme_types.py
ground()
¶
Set 'unused' fields to their default values. This helps when hashing configurations.

Returns:

| Type | Description |
|---|---|
| None | This function does not return anything; it modifies the instance in place. |

Source code in python/mneme/mneme_types.py
hash()
¶
Compute a stable SHA-256 hash of the full configuration.
The hash is computed from a normalized, JSON-serializable view of the configuration so that identical configurations produce the same digest across processes and runs.
Returns:

| Type | Description |
|---|---|
| str | Hex-encoded SHA-256 digest of the configuration. |

Source code in python/mneme/mneme_types.py
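A sketch tying from_dict, is_valid, ground, hash, and to_dict together. The field set shown and the grid/block dictionary encoding are assumptions based on the Attributes table and dim3.from_dict; the pass pipeline and codegen strings are hypothetical:

from mneme.mneme_types import ExperimentConfiguration

config = ExperimentConfiguration.from_dict({
    "grid": {"x": 1024, "y": 1, "z": 1},   # assumed dim3.from_dict encoding
    "block": {"x": 128, "y": 1, "z": 1},
    "shared_mem": 0,
    "specialize": True,
    "set_launch_bounds": True,
    "max_threads": 128,
    "min_blocks_per_sm": 1,
    "specialize_dims": True,
    "passes": "default<O3>",               # hypothetical pass pipeline string
    "codegen_opt": 3,
    "codegen_method": "serial",
    "prune": True,
    "internalize": True,
})
if config.is_valid():
    config.ground()           # normalize unused fields before hashing
    print(config.hash())      # stable SHA-256 identifier for this configuration
    print(config.to_dict())   # JSON-/YAML-serializable view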
is_valid()
¶
Check whether the configuration satisfies device constraints.

Returns:

| Type | Description |
|---|---|
| bool | A boolean value indicating whether this is a valid configuration. |

Source code in python/mneme/mneme_types.py
to_dict()
¶
Convert the configuration to a plain dictionary.
Returns:

| Type | Description |
|---|---|
| dict | A JSON-/YAML-serializable dictionary representation of the configuration, suitable for persistence or hashing. |

Source code in python/mneme/mneme_types.py
ExperimentResult
dataclass
¶
Result record for a single Mneme record/replay experiment.
This captures timing information, code size, resource usage, and basic execution outcome. It is designed to be easily serializable so that experiment runs can be logged and analyzed offline.
Attributes:

| Name | Type | Description |
|---|---|---|
| preprocess_ir_time | float | Time spent in the IR preprocessing/specialization phase of the experiment. |
| opt_time | float | Time spent in the optimization phase of the experiment. |
| codegen_time | float | Time spent in the code generation / compilation phase. |
| obj_size | int | Size of the generated object or binary artifact. |
| exec_time | list of int | Execution time measurements for the replayed kernel, one entry per run. |
| verified | bool | Whether the experiment matched the results of the recorded execution. |
| executed | bool | Whether the experiment was executed at least once (without a crash). |
| failed | bool | Whether the experiment ultimately failed (e.g., compilation or runtime error). |
| start_time | str | ISO 8601 timestamp for when the experiment started. |
| end_time | str | ISO 8601 timestamp for when the experiment finished. |
| gpu_id | int | Identifier of the GPU device on which the experiment ran. |
| const_mem_usage | int | Amount of constant memory used by the generated kernel. |
| local_mem_usage | int | Amount of local memory used by the generated kernel. |
| reg_usage | int | Number of registers used per thread by the generated kernel. |
| error | str | Error description, usually set by the TunerHandler on crash. |

Source code in python/mneme/mneme_types.py
from_dict(data)
classmethod
¶
Construct an experiment result from a plain dictionary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | dict | Dictionary containing result fields. | required |

Returns:

| Type | Description |
|---|---|
| ExperimentResult | A new result instance initialized from data. |

Source code in python/mneme/mneme_types.py
to_dict()
¶
Convert the result record to a plain dictionary.
Returns:

| Type | Description |
|---|---|
| dict | A JSON-/YAML-serializable dictionary representation of the result. |

Source code in python/mneme/mneme_types.py
GPU profiling helpers (lazy-loaded native profiler binding).
This module provides a small Python wrapper around Mneme's native profiling library. The profiler is loaded lazily to avoid premature initialization side-effects in GPU runtimes (notably HSA), which may perform tool initialization during shared-library load time.
Why lazy-load?
- Mneme spawns worker processes (fork). Some GPU profiling/tooling stacks
initialize at import / dlopen time, which is unsafe or undesirable pre-fork.
- By deferring the load until :func:init_profiler is called inside each worker,
the profiling runtime is initialized in the correct process context.
Public API:
- :func:init_profiler Initialize (load) the profiling library.
- :func:gpu_profile_start Start profiling for a kernel name, returns correlation id.
- :func:gpu_profile_stop Stop profiling and return recorded timestamps/records.
Notes:
- This is an internal module; callers are expected to call :func:init_profiler
once per process before using start/stop.
- The native library and its ABI are considered the source of truth.
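A minimal per-process usage sketch, assuming the functions are importable from mneme.profile and that the kernel launch happens between the start/stop calls:

from mneme.profile import init_profiler, gpu_profile_start, gpu_profile_stop

init_profiler()                          # once per (worker) process, post-fork
cid = gpu_profile_start("vector_add")    # label used by the profiling backend
# ... launch the kernel to be measured here ...
records = gpu_profile_stop(cid)          # timestamps/records from the native backend
print(records)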
gpu_profile_start(kernel_name)
¶
Start GPU profiling for a kernel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| kernel_name | str | Kernel name used as a label by the profiling backend. | required |

Returns:

| Type | Description |
|---|---|
| int | Correlation identifier used to match start/stop calls. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the profiling library has not been initialized via :func:init_profiler. |

Source code in python/mneme/profile.py
gpu_profile_stop(correlation_id)
¶
Stop GPU profiling and return recorded profiling values.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| correlation_id | int | Correlation identifier returned by :func:gpu_profile_start. | required |

Returns:

| Type | Description |
|---|---|
| list[int] | List of profiling records returned by the native backend (typically GPU timestamps or counter values, depending on the profiler implementation). |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the profiling library has not been initialized via :func:init_profiler. |

Source code in python/mneme/profile.py
init_profiler()
¶
Initialize the Mneme profiling backend for the current process.
This should be called inside each worker process (post-fork) before any calls
to :func:gpu_profile_start / :func:gpu_profile_stop.
Source code in python/mneme/profile.py
proteus¶
Python FFI bindings for the Proteus JIT transformation and code-generation pipeline.
This module provides a thin, Pythonic wrapper around Proteus’ C++ JIT infrastructure, exposing functionality for:
- Linking multiple LLVM IR modules into a single executable module
- Pruning dead IR and internalizing symbols
- Applying architecture-aware optimization pipelines
- Specializing kernels based on runtime arguments and launch dimensions
- Emitting device-specific executable objects (e.g., ELF / HSACO)
All operations are performed through a C FFI layer and operate directly on
LLVM modules represented by :class:~mneme.llvm.module.ModuleRef. Most
functions mutate the provided module in place and return either updated
metadata (such as a specialization hash) or compiled device artifacts.
This module forms the core of Mneme’s record–replay and autotuning workflow, bridging recorded execution metadata with dynamic compilation and execution on accelerator devices.
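A sketch of the replay compilation pipeline these functions form. Several details are assumptions: the import path (derived from python/mneme/proteus/jit.py), the initial mod_hash value, the opt_level string, the IR file names, and the dim3 values:

from mneme.proteus import jit            # assumed import path
from mneme.device import get_device_arch

arch = get_device_arch()
mod = jit.link_llvm_modules(["module_0.bc", "module_1.bc"], "vector_add",
                            prune=True, internalize=True)

mod_hash = 0                             # assumed starting hash value
grid = ...                               # dim3 grid from the recorded instance
block = ...                              # dim3 block from the recorded instance
mod_hash = jit.specialize_dims(mod, mod_hash, "vector_add", grid, block)
mod_hash = jit.set_launch_bounds(mod, mod_hash, "vector_add",
                                 max_threads_per_block=128, min_blocks_per_sm=1)

jit.optimize(mod, arch, opt_level="3", codegen_opt_level=3)   # opt_level string is a guess
obj_buffer = jit.codegen_object(mod, arch)                    # MemBufferRef with the device object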
codegen_object(mod, device_arch, codegen_type='serial', codegen_opt_level=3)
¶
Generate a compiled device code object from an LLVM module.
Invokes the Proteus backend code generator for the given architecture and
returns the produced binary wrapped in a :class:~mneme.llvm.buffer.MemBufferRef.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mod | ModuleRef | LLVM module to compile. | required |
| device_arch | str | Target architecture string. | required |
| codegen_type | str | Codegen mode (e.g., 'serial'). | 'serial' |
| codegen_opt_level | int | Backend optimization level. | 3 |

Returns:

| Type | Description |
|---|---|
| MemBufferRef | Memory buffer containing the produced code object. |

Raises:

| Type | Description |
|---|---|
| TypeError | If |
| RuntimeError | If |

Source code in python/mneme/proteus/jit.py
internalize(mod, kernel_name)
¶
Mark all symbols except the given kernel as internal.
This applies Proteus' internalization pass, restricting symbol visibility to reduce linking overhead and enable more aggressive optimization.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mod | ModuleRef | LLVM module to update (mutated in-place). | required |
| kernel_name | str | Name of the kernel whose symbol must remain externally visible. | required |

Raises:

| Type | Description |
|---|---|
| TypeError | If |

Source code in python/mneme/proteus/jit.py
link_llvm_modules(modules, kernel_name, prune, internalize)
¶
Link multiple LLVM IR modules into a single unified module.
This constructs a new module by invoking Proteus' linker. Optionally performs pruning and internalization during the link stage.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| modules | list[str] | Filesystem paths to LLVM IR modules to link. | required |
| kernel_name | str | Name of the kernel entry function to preserve. | required |
| prune | bool | Whether to prune dead IR after linking. | required |
| internalize | bool | Whether to internalize symbols except the kernel. | required |

Returns:

| Type | Description |
|---|---|
| ModuleRef | Newly linked module. |

Source code in python/mneme/proteus/jit.py
optimize(mod, device_arch, opt_level, codegen_opt_level)
¶
Run Proteus optimization passes on an LLVM module.
Applies middle-end optimization passes customized for a target device architecture and a chosen LLVM optimization level. Also configures the code-generation optimization intensity used later by the backend.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mod | ModuleRef | LLVM module to optimize (mutated in-place). | required |
| device_arch | str | Target device architecture string. | required |
| opt_level | str | LLVM optimization pipeline selector. | required |
| codegen_opt_level | int | Backend optimization level. | required |

Raises:

| Type | Description |
|---|---|
| TypeError | If |
| ValueError | If |

Source code in python/mneme/proteus/jit.py
pruneIR(mod)
¶
Remove unused functions, globals, and dead IR from an LLVM module.
This calls Proteus' C++ pruning pass through the FFI to eliminate dead IR and reduce module size before further specialization or optimization.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mod | ModuleRef | LLVM module to prune. | required |

Raises:

| Type | Description |
|---|---|
| TypeError | If |

Source code in python/mneme/proteus/jit.py
set_launch_bounds(mod, mod_hash, kernel_name, max_threads_per_block, min_blocks_per_sm)
¶
Apply CUDA/HIP-style launch-bounds metadata to the kernel.
Sets launch-bounds on the kernel to restrict maximum threads per block and communicate occupancy constraints, influencing register allocation and codegen decisions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mod
|
ModuleRef
|
LLVM module to annotate. |
required |
mod_hash
|
int
|
Current module hash. |
required |
kernel_name
|
str
|
Name of the kernel function. |
required |
max_threads_per_block
|
int
|
Maximum threads-per-block bound (must be |
required |
min_blocks_per_sm
|
int
|
Minimum required blocks per SM. |
required |
Returns:
| Type | Description |
|---|---|
int
|
Updated module hash. |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If |
Source code in python/mneme/proteus/jit.py, lines 404–452
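A minimal sketch of applying launch bounds, continuing the sketches above. The numeric values are illustrative, and recorded_hash stands for the module hash tracked by the caller; how that hash is first obtained is not shown in this excerpt.

```python
# Hedged sketch: bounds are illustrative occupancy choices; recorded_hash is assumed
# to be the module hash threaded through the specialization steps.
recorded_hash = jit.set_launch_bounds(
    mod=linked,                  # ModuleRef from the linking sketch above
    mod_hash=recorded_hash,
    kernel_name="saxpy_kernel",  # hypothetical kernel
    max_threads_per_block=256,
    min_blocks_per_sm=2,
)
```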
specialize_args(mod, mod_hash, kernel_name, kernel_args, num_args, specialize_indexes)
¶
Specialize a subset of kernel arguments inside an LLVM module.
Performs IR rewriting / constant propagation based on provided runtime arguments, and returns an updated hash reflecting the specialization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mod
|
ModuleRef
|
LLVM module to modify. |
required |
mod_hash
|
int
|
Current module hash before specialization. |
required |
kernel_name
|
str
|
Kernel whose arguments are specialized. |
required |
kernel_args
|
Raw pointers to argument values (FFI-compatible pointer array). |
required | |
num_args
|
int
|
Total number of kernel arguments. |
required |
specialize_indexes
|
Indices of arguments to specialize. |
required |
Returns:
| Type | Description |
|---|---|
int
|
Updated module hash after specialization. |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If more indices are requested than available arguments. |
Source code in python/mneme/proteus/jit.py, lines 276–332
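A minimal sketch of argument specialization. It assumes the pointer array comes from a loaded prologue snapshot (MemStateRef), which exposes args and num_args; the kernel name and index list are hypothetical.

```python
# Hedged sketch: `prologue` is assumed to be a loaded prologue MemStateRef.
recorded_hash = jit.specialize_args(
    mod=linked,
    mod_hash=recorded_hash,
    kernel_name="saxpy_kernel",   # hypothetical kernel
    kernel_args=prologue.args,    # FFI-compatible pointer array from the snapshot
    num_args=prologue.num_args,
    specialize_indexes=[0, 2],    # constant-fold only arguments 0 and 2
)
```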
specialize_dims(mod, mod_hash, kernel_name, grid_dim, block_dim)
¶
Specialize launch dimensions (grid/block) inside the LLVM module.
Embeds compile-time constants for launch configuration, enabling IR simplification and more aggressive optimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mod
|
ModuleRef
|
LLVM module to update. |
required |
mod_hash
|
int
|
Previous module hash. |
required |
kernel_name
|
str
|
Kernel to specialize. |
required |
grid_dim
|
dim3
|
Grid dimensions. |
required |
block_dim
|
dim3
|
Block dimensions. |
required |
Returns:
| Type | Description |
|---|---|
int
|
Updated module hash. |
Source code in python/mneme/proteus/jit.py, lines 335–366
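A minimal sketch of baking the launch configuration into the IR. The dim3 construction is an assumption about that helper's interface and import location; the documentation only states that grid_dim and block_dim are dim3 values.

```python
# Hedged sketch: dim3's constructor and import path are assumptions.
grid_dim = dim3(1024, 1, 1)   # assumed (x, y, z) constructor
block_dim = dim3(256, 1, 1)

recorded_hash = jit.specialize_dims(
    mod=linked,
    mod_hash=recorded_hash,
    kernel_name="saxpy_kernel",  # hypothetical kernel
    grid_dim=grid_dim,
    block_dim=block_dim,
)
```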
specialize_dims_assume(mod, mod_hash, kernel_name, grid_dim, block_dim)
¶
Add launch-dimension assumptions (grid/block) inside the LLVM module.
Similar to :func:specialize_dims, but emits assumptions rather than (or in
addition to) direct constant replacement, enabling downstream passes to
simplify based on assumed launch invariants.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mod
|
ModuleRef
|
LLVM module to update. |
required |
mod_hash
|
int
|
Previous module hash. |
required |
kernel_name
|
str
|
Kernel to specialize. |
required |
grid_dim
|
dim3
|
Grid dimensions. |
required |
block_dim
|
dim3
|
Block dimensions. |
required |
Returns:
| Type | Description |
|---|---|
int
|
Updated module hash. |
Source code in python/mneme/proteus/jit.py, lines 369–401
Tuning¶
ExhaustiveSamplingStrategy
¶
Bases: SamplingStrategy
Exhaustive sampling strategy over the entire search space.
This strategy enumerates all valid combinations of parameters as defined
by the associated :class:SearchSpace. It is intended primarily for small
search spaces where full enumeration is feasible.
Notes
- This strategy has not been tested at all and should be considered a proof of concept.
- Exhaustive enumeration may become prohibitively expensive for large or high-dimensional search spaces.
Source code in python/mneme/tuning/sample_strategy.py, lines 29–68
__init__(search_space)
¶
Construct an exhaustive sampler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
search_space
|
SearchSpace
|
Search space providing the parameter definitions and exhaustive enumeration logic. |
required |
Source code in python/mneme/tuning/sample_strategy.py, lines 45–55
__iter__()
¶
Yield all parameter combinations from the search space.
Returns:
| Type | Description |
|---|---|
Iterator[dict]
|
Iterator over parameter dictionaries produced by
:meth: |
Source code in python/mneme/tuning/sample_strategy.py, lines 57–68
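A minimal usage sketch. MySearchSpace and run_experiment are hypothetical (a concrete SearchSpace sketch appears later in this section), and because the documentation does not state whether the strategy filters invalid combinations itself, the sketch re-checks constraints explicitly.

```python
# Hedged sketch: MySearchSpace and run_experiment are hypothetical placeholders;
# the import path is assumed from the source location above.
from mneme.tuning.sample_strategy import ExhaustiveSamplingStrategy

space = MySearchSpace()
strategy = ExhaustiveSamplingStrategy(space)

for params in strategy:
    if not space.constraints(params):   # filtering behavior of the strategy is not documented
        continue
    config = space.derived(params)      # expand primaries into a full configuration
    run_experiment(config)              # hypothetical evaluation hook
```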
OptunaSamplingStrategy
¶
Bases: SamplingStrategy
Optuna-driven adaptive sampling strategy.
This strategy delegates sampling decisions to an Optuna Study object.
Parameter suggestions are generated by invoking the search space’s Optuna
sampling logic, which typically binds Optuna Trial objects to parameter
definitions.
The iterator yields samples until the requested number of trials has been reached, accounting for trials that may already exist in the study (e.g., when resuming from a persistent Optuna backend).
Source code in python/mneme/tuning/sample_strategy.py, lines 114–163
__init__(search_space, study, n_trials)
¶
Construct an Optuna-based sampling strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
search_space
|
SearchSpace
|
Search space providing Optuna-aware sampling logic. |
required |
study
|
Study
|
Optuna study object managing trials and optimization state. |
required |
n_trials
|
int
|
Total number of trials to execute (including any existing trials already present in the study). |
required |
Source code in python/mneme/tuning/sample_strategy.py, lines 128–147
__iter__()
¶
Yield Optuna-suggested parameter dictionaries.
Iteration continues until the number of trials in the associated study
reaches n_trials.
Returns:
| Type | Description |
|---|---|
Iterator[dict]
|
Iterator yielding parameter dictionaries produced via Optuna. |
Source code in python/mneme/tuning/sample_strategy.py, lines 149–163
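A minimal sketch of wiring the strategy to an Optuna study. Only the constructor arguments follow the documented signature; the study setup and evaluation hook are illustrative, and how objective values are reported back is not covered in this excerpt, so that step is left as a comment.

```python
# Hedged sketch: MySearchSpace and run_experiment are hypothetical placeholders.
import optuna
from mneme.tuning.sample_strategy import OptunaSamplingStrategy  # path assumed

space = MySearchSpace()
study = optuna.create_study(direction="minimize")
strategy = OptunaSamplingStrategy(space, study, n_trials=50)

for params in strategy:
    runtime = run_experiment(params)  # hypothetical evaluation returning a float
    # Reporting `runtime` back to the study (e.g., via study.tell) depends on how
    # trials are exposed and is not shown in this documentation excerpt.
```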
RandomSamplingStrategy
¶
Bases: SamplingStrategy
Random sampling strategy over the search space.
This strategy draws a fixed number of independent samples from the search space using the search space’s random sampling logic.
Notes
- This strategy has not been tested at all and should be considered a proof of concept.
- Sampling does not guarantee coverage or uniqueness of configurations.
Source code in python/mneme/tuning/sample_strategy.py, lines 71–111
__init__(search_space, num_samples)
¶
Construct a random sampler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
search_space
|
SearchSpace
|
Search space providing the parameter definitions and random sampling logic. |
required |
num_samples
|
int
|
Number of random samples to generate. |
required |
Source code in python/mneme/tuning/sample_strategy.py, lines 85–98
__iter__()
¶
Yield randomly sampled parameter dictionaries.
Returns:
| Type | Description |
|---|---|
Iterator[dict]
|
Iterator yielding |
Source code in python/mneme/tuning/sample_strategy.py, lines 100–111
SamplingStrategy
¶
Bases: ABC
Abstract base class for parameter sampling strategies.
A SamplingStrategy defines how candidate parameter dictionaries are
generated from a :class:SearchSpace. Concrete implementations determine
whether sampling is exhaustive, random, adaptive, or driven by an external
optimization framework (e.g., Optuna).
Implementations must provide an iterator interface that yields parameter
dictionaries compatible with the associated :class:SearchSpace.
Source code in python/mneme/tuning/sample_strategy.py, lines 10–26
__iter__()
abstractmethod
¶
Return an iterator that yields parameter configurations.
Source code in python/mneme/tuning/sample_strategy.py, lines 23–26
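A sketch of a custom strategy, assuming the only required method is __iter__ and relying on the sample_random helper documented under SearchSpace below; the class itself is hypothetical.

```python
# Hedged sketch: a toy strategy that draws a fixed number of random samples.
from mneme.tuning.sample_strategy import SamplingStrategy  # path assumed


class FirstNRandomStrategy(SamplingStrategy):
    def __init__(self, search_space, n):
        self.search_space = search_space
        self.n = n

    def __iter__(self):
        for _ in range(self.n):
            yield self.search_space.sample_random()
```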
BaseParam
¶
Bases: ABC
Abstract base class for tuning parameter definitions.
A BaseParam represents a single tunable dimension in a :class:SearchSpace.
Concrete subclasses define the domain (fixed, boolean, categorical, numeric range,
pass-pipeline, etc.) and provide metadata needed by sampling backends.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Logical name of the parameter. This name is typically used as the Optuna
parameter name when sampling via :class: |
Source code in python/mneme/tuning/search_space.py, lines 10–26
BoolParam
¶
Bases: BaseParam
A boolean parameter.
The domain is {True, False}.
Source code in python/mneme/tuning/search_space.py, lines 50–65
__init__(name)
¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Name of the parameter. |
required |
Source code in python/mneme/tuning/search_space.py, lines 57–65
CategoricalParam
¶
Bases: BaseParam
A parameter with an explicit finite set of choices.
The domain is the provided list of choices.
Source code in python/mneme/tuning/search_space.py, lines 68–92
__init__(name, choices)
¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Name of the parameter. |
required |
choices
|
list
|
Finite set of allowed values. |
required |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Source code in python/mneme/tuning/search_space.py, lines 75–92
FixedParam
¶
Bases: BaseParam
A parameter with a single fixed value.
This is useful for keeping a dimension present in the search space interface while effectively disabling tuning for that parameter.
Source code in python/mneme/tuning/search_space.py, lines 29–47
__init__(name, value)
¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Name of the parameter. |
required |
value
|
Any
|
Fixed value returned by all samplers. |
required |
Source code in python/mneme/tuning/search_space.py, lines 37–47
IntRangeParam
¶
Bases: BaseParam
Integer range parameter.
Represents an inclusive integer range [low, high] with a positive step.
Sampling produces values from the discrete set {low, low+step, ..., high} (assuming high - low is divisible by step).
Notes
- This class models a discrete domain (even though it is expressed as a range).
Source code in python/mneme/tuning/search_space.py, lines 95–134
__init__(name, low, high, step=1)
¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Name of the parameter. |
required |
low
|
int
|
Inclusive lower bound. |
required |
high
|
int
|
Inclusive upper bound. |
required |
step
|
int
|
Step size (must be positive). |
1
|
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Source code in python/mneme/tuning/search_space.py, lines 108–134
PipelineParam
¶
Bases: BaseParam
Parameter representing a compiler optimization pipeline / pass sequence.
This parameter is specialized: its domain is defined by the available passes
provided by :class:PipelineManager and an internal sampling scheme that can
select passes, order them, and optionally select multiple occurrences.
Attributes:
| Name | Type | Description |
|---|---|---|
pass_manager |
PipelineManager
|
Helper that provides available passes and serialization to pipeline strings. |
available_passes |
list of str
|
Sorted list of pass identifiers that may be selected. |
num_draws |
int
|
Upper bound on how many pass-selection decisions are made when sampling. (Interpretation depends on the sampling backend.) |
Notes
This parameter must be used cautiously: the pass pipeline is a combinatorial space in its own right, with more than 100 dimensions. It is provided mainly for completeness; applying TPE or NSGA-II to this space is not recommended.
Source code in python/mneme/tuning/search_space.py, lines 137–180
__init__(name, num_draws)
¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Name of the parameter (used as a logical key in the search space). |
required |
num_draws
|
int
|
Number of sampling "draws" used when constructing a pipeline. |
required |
Notes
- The available pass list is obtained from :class:PipelineManager and is sorted to ensure stable iteration order.
Source code in python/mneme/tuning/search_space.py, lines 162–180
RealRangeParam
¶
Bases: BaseParam
Real-valued range parameter.
Represents an inclusive real range [low, high]. This parameter is intended
for continuous sampling backends (e.g., Optuna suggest_float).
Notes
- This parameter is not exhaustively enumerable.
Source code in python/mneme/tuning/search_space.py, lines 183–215
__init__(name, low, high)
¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Name of the parameter. |
required |
low
|
float
|
Inclusive lower bound. |
required |
high
|
float
|
Inclusive upper bound. |
required |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Source code in python/mneme/tuning/search_space.py, lines 195–215
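The sketch below declares one parameter of each kind using the constructors documented above. All names, choices, and bounds are illustrative, and the import path is assumed from the source location.

```python
# Hedged sketch: parameter names and domains are illustrative, not documented values.
from mneme.tuning.search_space import (
    BoolParam,
    CategoricalParam,
    FixedParam,
    IntRangeParam,
    RealRangeParam,
)

dimensions = {
    "internalize": BoolParam("internalize"),
    "opt_level": CategoricalParam("opt_level", ["O1", "O2", "O3"]),
    "device_arch": FixedParam("device_arch", "gfx90a"),            # present but not tuned
    "block_size": IntRangeParam("block_size", 64, 1024, step=64),
    "occupancy_fraction": RealRangeParam("occupancy_fraction", 0.1, 1.0),
}
```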
SearchSpace
¶
Bases: ABC
Declarative representation of a tuning search space.
A :class:SearchSpace describes:
1) The primary tunable dimensions (see :meth:dimensions)
2) Any derived configuration computed from sampled parameters (see :meth:derived)
3) Constraints that determine whether a sampled assignment is valid (see :meth:constraints)
The base class also provides helper sampling routines for different backends (random sampling, Optuna sampling, and exhaustive enumeration of finite domains).
Notes
- Concrete subclasses should keep :meth:dimensions purely declarative and implement domain-specific logic inside :meth:derived and :meth:constraints.
Source code in python/mneme/tuning/search_space.py, lines 340–555
constraints(params)
abstractmethod
¶
Validate that a parameter assignment or derived configuration is legal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
params
|
dict
|
Parameter assignment to validate. Implementations may choose whether this expects only primary parameters or a derived configuration, depending on the calling context. |
required |
Returns:
| Type | Description |
|---|---|
bool
|
True if the parameter assignment satisfies the constraints, False otherwise. |
Source code in python/mneme/tuning/search_space.py, lines 395–412
derived(params)
abstractmethod
¶
Compute a full experiment configuration from sampled primary parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
params
|
dict
|
Dictionary mapping primary parameter names to sampled values. |
required |
Returns:
| Type | Description |
|---|---|
ExperimentConfiguration
|
Fully specified experiment configuration derived from the sampled values. |
Notes
- Derived configuration may include both original parameters and additional fields computed from them (e.g., mapping a normalized fraction to an integer launch-bounds parameter).
Source code in python/mneme/tuning/search_space.py, lines 372–393
dimensions()
abstractmethod
¶
Return the top-level tunable parameters of this search space.
Returns:
| Type | Description |
|---|---|
dict
|
Mapping from parameter name to a :class:BaseParam definition. |
Source code in python/mneme/tuning/search_space.py, lines 359–370
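A sketch of a concrete search space implementing the three abstract methods. For illustration the derived configuration is a plain dict; the documented return type is ExperimentConfiguration, whose construction is not shown in this excerpt.

```python
# Hedged sketch: a dict stands in for ExperimentConfiguration here.
from mneme.tuning.search_space import BoolParam, IntRangeParam, SearchSpace


class MySearchSpace(SearchSpace):
    def dimensions(self):
        # Purely declarative: parameter name -> parameter definition.
        return {
            "block_size": IntRangeParam("block_size", 64, 1024, step=64),
            "internalize": BoolParam("internalize"),
        }

    def derived(self, params):
        # Add fields computed from the sampled primaries.
        return {**params, "max_threads_per_block": params["block_size"]}

    def constraints(self, params):
        # Reject launch configurations the target device cannot run.
        return params["block_size"] % 64 == 0
```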
sample_exhaustive()
¶
Exhaustively enumerate all configurations for finite domains.
This helper enumerates the cartesian product of all dimension value lists for parameters with finite, enumerable domains (fixed, boolean, categorical, and integer ranges). Real-valued parameters are not enumerable.
Yields:
| Type | Description |
|---|---|
dict
|
Dictionaries of the form |
Raises:
| Type | Description |
|---|---|
ValueError
|
If an attempt is made to enumerate a non-enumerable parameter type. |
Source code in python/mneme/tuning/search_space.py, lines 507–555
sample_optuna(study)
¶
Generate a valid configuration using Optuna.
This method uses an Optuna study to create trials and sample primary dimensions, then computes the derived experiment configuration and enforces constraints. Invalid samples are immediately reported back to the study with a sentinel objective value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
study
|
Study
|
Optuna study used to create trials and manage search state. |
required |
Returns:
| Type | Description |
|---|---|
(ExperimentConfiguration, Trial)
|
A tuple containing the derived ExperimentConfiguration and the Optuna Trial used to sample it. |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If a valid configuration cannot be produced within the retry budget. |
Notes
- This routine uses study.ask()/study.tell() rather than Optuna’s higher-level objective API to support asynchronous evaluation.
Source code in python/mneme/tuning/search_space.py, lines 452–505
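A minimal sketch of the ask/tell flow around sample_optuna. Only the (configuration, trial) return shape follows the documentation; the study setup, the MySearchSpace subclass, and the evaluation hook are illustrative.

```python
# Hedged sketch: run_experiment is a hypothetical evaluation hook.
import optuna

space = MySearchSpace()   # concrete SearchSpace subclass (see the sketch above)
study = optuna.create_study(direction="minimize")

config, trial = space.sample_optuna(study)
runtime = run_experiment(config)  # measure the objective for this configuration
study.tell(trial, runtime)        # report the result back to the study
```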
sample_random()
¶
Generate one valid random sample from this search space.
This method repeatedly samples all primary dimensions using
:func:sample_random_param and applies :meth:constraints. Sampling is retried
up to MAX_RETRIES times.
Returns:
| Type | Description |
|---|---|
dict
|
A dictionary of the form |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If a valid configuration cannot be produced within the retry budget. |
Source code in python/mneme/tuning/search_space.py, lines 414–450
sample_optuna_param(trial, param)
¶
Sample a single parameter value using an Optuna trial.
This function maps :class:BaseParam subclasses to the appropriate Optuna
sampling primitive (e.g., suggest_int, suggest_float, or
suggest_categorical). For specialized parameter types (e.g.,
:class:PipelineParam), it implements a custom sampling scheme that encodes
selection and ordering via multiple Optuna decision variables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
trial
|
Trial
|
Optuna trial used to generate parameter suggestions. |
required |
param
|
BaseParam
|
Parameter definition describing the domain and sampling behavior. |
required |
Returns:
| Type | Description |
|---|---|
Any
|
Sampled value for the parameter. |
Raises:
| Type | Description |
|---|---|
TypeError
|
If the parameter type is not supported. |
Source code in python/mneme/tuning/search_space.py, lines 218–289
sample_random_param(param)
¶
Generate one random sample for a single parameter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
param
|
BaseParam
|
Parameter definition describing the domain. |
required |
Returns:
| Type | Description |
|---|---|
Any
|
Randomly sampled value. |
Raises:
| Type | Description |
|---|---|
TypeError
|
If the parameter type is not supported. |
Source code in python/mneme/tuning/search_space.py, lines 292–337
LLVM¶
ValueRef
¶
Bases: ObjectRef
A weak reference to an LLVM value.
Source code in python/mneme/llvm/value.py, lines 97–523
alignment
property
¶
The alignment property.
arguments
property
¶
Return an iterator over this function's arguments. The iterator will yield a ValueRef for each argument.
attributes
property
¶
Return an iterator over this value's attributes. The iterator will yield a string for each attribute.
block
property
¶
The block this instruction value was obtained from.
blocks
property
¶
Return an iterator over this function's blocks. The iterator will yield a ValueRef for each block.
function
property
¶
The function this argument or basic block value was obtained from.
has_initializer
property
¶
Returns True if a global variable has an initializer.
initializer
property
¶
Returns the initializer of a global variable.
instruction
property
¶
The instruction this operand value was obtained from.
instructions
property
¶
Return an iterator over this block's instructions. The iterator will yield a ValueRef for each instruction.
is_declaration
property
¶
Whether this value (presumably global) is defined in the current module.
memory_type
property
¶
The LLVM type of the memory accessed by this instruction.
module
property
¶
The module this function or global variable value was obtained from.
operands
property
¶
Return an iterator over this instruction's operands. The iterator will yield a ValueRef for each operand.
type
property
¶
This value's LLVM type.
add_function_attribute(attr)
¶
Only works on a function value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
attr
|
str
|
attribute name |
required |
Source code in python/mneme/llvm/value.py, lines 233–249
as_instruction()
¶
Returns a constant expression value as an instruction.
Source code in python/mneme/llvm/value.py, lines 515–523
get_constant_value(signed_int=False, round_fp=False)
¶
Return the constant value, either as a literal (when supported) or as a string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
signed_int
|
bool
|
if True and the constant is an integer, returns a signed version |
False
|
round_fp
|
bool
|
if True and the constant is a floating point value, rounds the result upon accuracy loss (e.g., when querying an fp128 value). By default, raises an exception on accuracy loss |
False
|
Source code in python/mneme/llvm/value.py, lines 439–513
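A self-contained sketch that reads a constant through a global's initializer and walks a function's instructions, using only the properties documented above. The IR string and kernel name are illustrative, and the parse_assembly import path is assumed from the source location shown later in this section.

```python
# Hedged sketch: the IR module below is illustrative.
from mneme.llvm.module import parse_assembly  # path assumed

llvm_ir_text = r"""
@threshold = global i32 42

define i32 @saxpy_kernel(i32 %n) {
entry:
  %m = add i32 %n, 4
  ret i32 %m
}
"""

mod = parse_assembly(llvm_ir_text)

gv = mod.get_global_variable("threshold")
if gv.has_initializer:
    print("threshold =", gv.initializer.get_constant_value(signed_int=True))  # 42

fn = mod.get_function("saxpy_kernel")
for block in fn.blocks:
    for inst in block.instructions:
        print("result type kind:", inst.type.type_kind)
```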
TypeRef
¶
Bases: ObjectRef
A weak reference to an LLVM type.
Source code in python/mneme/llvm/typeref.py, lines 32–171
alloc_type_width
property
¶
Returns the offset in bytes between successive objects of the specified type, including alignment padding.
If this is a scalable vector type, the scalable property will be set and the runtime size will be a positive integer multiple of the base size.
This is the amount that alloca reserves for this type. For example, returns 12 or 16 for x86_fp80, depending on alignment.
element_count
property
¶
Returns the number of elements in an array or a vector. For scalable vectors, returns the minimum number of elements. When the type is neither an array nor a vector, raises an exception.
elements
property
¶
Returns an iterator over the types contained in this type (e.g., struct fields or array/vector element types).
is_array
property
¶
Returns true if the type is an array type.
is_pointer
property
¶
Returns true if the type is a pointer type.
is_struct
property
¶
Returns true if the type is a struct type.
is_vector
property
¶
Returns true if the type is a vector type.
name
property
¶
Get type name
store_type_width
property
¶
Returns the maximum number of bytes that may be overwritten by storing the specified type.
If this is a scalable vector type, the scalable property will be set and the runtime size will be a positive integer multiple of the base size.
For example, returns 36 for i36 and 80 for x86_fp80. The type passed must have a size (Type::isSized() must return true).
system_type_width
property
¶
Return the basic size of this type if it is a primitive type. This is target-dependent. It will return zero if the type does not have a size or is not a primitive type.
If this is a scalable vector type, the scalable property will be set and the runtime size will be a positive integer multiple of the base size.
Note that this may not reflect the size of memory allocated for an instance of the type or the number of bytes that are written when an instance of the type is stored to memory.
type_kind
property
¶
Returns the LLVMTypeKind enumeration of this type.
type_width
property
¶
Return the basic size of this type if it is a primitive type. These are fixed by LLVM and are not target-dependent. This will return zero if the type does not have a size or is not a primitive type.
If this is a scalable vector type, the scalable property will be set and the runtime size will be a positive integer multiple of the base size.
Note that this may not reflect the size of memory allocated for an instance of the type or the number of bytes that are written when an instance of the type is stored to memory.
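A sketch of type introspection through the properties documented above. The IR string and symbol names are illustrative, and the parse_assembly import path is assumed from the source location shown below.

```python
# Hedged sketch: a struct argument is inspected field by field.
from mneme.llvm.module import parse_assembly  # path assumed

llvm_ir_text = r"""
%pair = type { i32, float }

define void @consume(%pair %p, i64 %n) {
entry:
  ret void
}
"""

mod = parse_assembly(llvm_ir_text)
fn = mod.get_function("consume")

for arg in fn.arguments:
    ty = arg.type
    print(ty.name, ty.type_kind)
    if ty.is_struct:
        for element in ty.elements:            # iterate the contained field types
            print("  field store width:", element.store_type_width)
```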
ModuleRef
¶
Bases: ObjectRef
A reference to an LLVM module.
Source code in python/mneme/llvm/module.py, lines 51–205
aliases
property
¶
Return an iterator over this module's function aliases. The iterator will yield a ValueRef for each alias.
data_layout
property
writable
¶
This module's data layout specification, as a string.
functions
property
¶
Return an iterator over this module's functions. The iterator will yield a ValueRef for each function.
global_variables
property
¶
Return an iterator over this module's global variables. The iterator will yield a ValueRef for each global variable.
Note that global variables don't include functions (a function is a "global value" but not a "global variable" in LLVM parlance)
ifuncs
property
¶
Return an iterator over this module's ifuncs. The iterator will yield a ValueRef for each ifunc.
name
property
writable
¶
The module's identifier.
source_file
property
¶
The module's original source file name
struct_types
property
¶
Return an iterator over the struct types defined in the module. The iterator will yield a TypeRef.
triple
property
writable
¶
This module's target "triple" specification, as a string.
get_function(name)
¶
Get a ValueRef pointing to the function named name. NameError is raised if the symbol isn't found.
Source code in python/mneme/llvm/module.py, lines 68–76
get_global_variable(name)
¶
Get a ValueRef pointing to the global variable named name. NameError is raised if the symbol isn't found.
Source code in python/mneme/llvm/module.py, lines 78–86
get_struct_type(name)
¶
Get a TypeRef pointing to a structure type named name. NameError is raised if the struct type isn't found.
Source code in python/mneme/llvm/module.py, lines 88–96
verify()
¶
Verify the module IR's correctness. RuntimeError is raised on error.
Source code in python/mneme/llvm/module.py, lines 98–104
parse_assembly(llvmir, context=None)
¶
Create a Module from an LLVM IR string.
Source code in python/mneme/llvm/module.py, lines 17–30
parse_bitcode(bitcode, context=None)
¶
Create a Module from LLVM bitcode (a bytes object).
Source code in python/mneme/llvm/module.py, lines 33–48
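A closing sketch that combines the module-level helpers: parse IR, verify it, and iterate the documented collections. The IR is illustrative and the import path is assumed from the source location.

```python
# Hedged sketch: a tiny module exercised through the documented ModuleRef API.
from mneme.llvm.module import parse_assembly  # path assumed

llvm_ir_text = r"""
@counter = global i32 0

define void @saxpy_kernel() {
entry:
  ret void
}
"""

mod = parse_assembly(llvm_ir_text)
mod.verify()                      # raises RuntimeError if the IR is malformed

print("triple:", mod.triple)
print("data layout:", mod.data_layout)

defined = sum(1 for fn in mod.functions if not fn.is_declaration)
print("defined functions:", defined)

gv = mod.get_global_variable("counter")
print("counter has initializer:", gv.has_initializer)
```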