Documentation v1.0.4

M8P Microprocessor

A virtual machine designed to build and run sophisticated AI systems. It represents an architectural shift: AI operations are treated as native instructions.


What is M8P?

M8P is a virtual machine (VM) designed to execute high-level AI operations—inference, vector search, matrix multiplication, and embedding generation—as native, first-class instructions.

Unlike traditional frameworks that rely on REST APIs and JSON serialization, M8P provides an atomic runtime environment. It is built on a robust C++ codebase, combining llama.cpp, an HNSW Vector DB, and AVX2/AVX512 optimizations.

The Architecture

M8P introduces a novel assembly language for AI. Instead of calling external services, you push data to registers and execute semantic opcodes.
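For example, a minimal sketch of this register-and-opcode style, using only the store, llm_instance, llm_instancestatus, and ret opcodes documented in the instruction set below (register and instance names are illustrative):

store <r1> Summarize the following text
llm_instance <r1> summarizer n_predict=24 temperature=0.5
llm_instancestatus summarizer <r2>
ret <r2>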

Legacy Stack

Python Script
↓ HTTP (Latency)
Vector DB Service
↓ JSON (Serialization)
Inference API

M8P Stack

Single C++ Runtime
Shared Memory Space
Zero-Copy Data Access

Instruction Set Reference

The M8P ISA primitives allow for high-level orchestration of AI workflows directly in VRAM.

01. Core & Math Operations

Basic Operations (ID: BasicOperation)

f32set <rage> 12.2
i32set <r2> 5
store <r1> ...string... # stores ...string... in register <r1>; store does not support newlines
store <r3> My age is <rage> and I have <r2> friends # store supports register interpolation
dup <r1> <r2> # duplicates register <r1> into <r2>
ret <r1> <r2> # multiple return values

Math Operations (F32/I32) (ID: MathOperations)

f32add <r10> 23.44533
f32sub <r10> 23.44533
f32mul <r10> 23.44533
i32set <r9> 123
i32add <r9> 123
i32mul <r9> 123

Interpolation & Returns (ID: TestStoreInterp)

store <r1> Ayrton
store <r2> 28
store <out> My name is <r1> and my age is <r2>
ret <r1> <r2> <out>

02. Matrix & Vector Operations

Basic Matrix Ops (ID: MatrixOperations)

matn <r1> 1 376 306 626 # variable width matrix
mat8 <r1> 10 20 30 40 50 60 70 89
matsub <r1> <r2> <r3> 
matadd <r1> <r2> <r3> 
matmul <r1> <r2> <r3> # Element-wise multiplication

Advanced Math (SIMD/AVX512) (ID: AdvMatrixOps)

matdot <r1> <r2> <result> # dot product
matcosim <r3> <r2> <result> # cosine similarity
matl2d <r1> <r2> <result> # L2 (Euclidean) distance
matnorm <r1> <out> # normalization
pad <r1> 8 # aligns/pads the vector in <r1> to length 8
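The similarity primitives above compose directly; as a sketch, cosine similarity between two 8-wide vectors (register contents illustrative):

mat8 <r1> 1 0 0 0 0 0 0 0
mat8 <r2> 0 1 0 0 0 0 0 0
matcosim <r1> <r2> <result> # orthogonal vectors, so similarity should be 0
ret <result>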

03. Inference & VectorDB

LLM Inference (ID: Inference)

store <r1> Tell me a joke
# llm_instance caches the response
llm_instance <r1> instname n_predict=24 temperature=0.5
llm_instancestatus instname <r3>

# Force generation (ignore the cache)
llm_instance <r1> instname force=true

Embeddings & Tokens (ID: EmbeddingsTokens)

llm_embed <r1> <rv2> dim=16 
llm_tokenize <r1> <r1tokens> 
llm_detokenize <r1tokens> <r4>

HNSW VectorDB (ID: VectorDB)

vdb_instance MYDB4 dim=16 max_elements=500 M=16 ef_construction=200
vdb_add MYDB4 <rv1> <r1> 
vdb_search MYDB4 <rv1> <rv37> distance=0.019 
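The embedding and VectorDB opcodes compose into a minimal semantic-search loop. A sketch, assuming illustrative DB name, dimension, and distance threshold:

vdb_instance DOCS dim=16 max_elements=500 M=16 ef_construction=200
store <r1> M8P executes AI operations as native instructions
llm_embed <r1> <rv1> dim=16
vdb_add DOCS <rv1> <r1>
store <r2> What does M8P execute?
llm_embed <r2> <rv2> dim=16
vdb_search DOCS <rv2> <rhits> distance=0.5
ret <rhits>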

04. Testing & Assertions

Unit Testing Primitives (ID: Assertions)

asserteq <r1> <r2>
assertcontains <r1> ...string...
assertnotempty <r1>
assertempty <r1>
assertnil <r1>
clr <r1> # clears register <r1>
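A sketch of a self-contained unit test built from these primitives (register contents illustrative):

store <r1> hello
dup <r1> <r2>
asserteq <r1> <r2>
assertcontains <r1> hell
assertnotempty <r1>
clr <r1>
assertempty <r1>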

Key Workflow Features

Stateful & Stateless Sessions

Fork VM states to explore multiple reasoning paths simultaneously without reloading model weights.

Hardware Saturation

Direct access to CUDA and AVX512 ensures that math operations do not bottleneck inference.