Speed up your AI workflows: cut latency and raise throughput with the raw performance of bare-metal C++, accelerating your tools, agents, search, and agentic workflows by up to 43.5% without leaving Python.
Inter-process communication delays create unacceptable lag in real-time loops.
Quantization errors and context window fragmentation degrade model outputs.
Maintaining separate microservices for tools, memory, and inference creates overhead.
* HOW DO WE DETERMINE COMPUTE EFFICIENCY? Simple: we take into account the top four factors optimized in the VM: memory efficiency, zero-copy data handling, streaming control, and persistence settings. A conservative estimate puts the minimum gain at 10% per factor (see the sketch after this list). Further benchmarks will refine this metric.
Complex agent chains incur massive serialization costs between steps.
GPUs stall while CPU-bound logic processes tool outputs and routing.
Maintaining separate microservices for tools, RAG, and inference creates debt.
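As a rough illustration of how the four conservative 10% factors above could combine (an assumption about how they aggregate, not a benchmark result), note that the headline 43.5% figure sits between the purely additive and the fully compounded case:

# Illustrative arithmetic only; the values are the conservative 10% per-factor estimates above.
factors = [0.10, 0.10, 0.10, 0.10]  # memory efficiency, zero-copy, streaming control, persistence

additive = sum(factors)             # 0.40 -> 40% if the gains simply add up
compounded = 1.0
for gain in factors:
    compounded *= 1.0 + gain        # 1.1 ** 4 = 1.4641
compounded -= 1.0                   # -> ~46.4% if the gains multiply

print(f"additive: {additive:.1%}, compounded: {compounded:.1%}")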
M8 compresses mission-critical agent workloads (inference + search + matrix ops) that need to scale and need to just work, and runs them as a single, hardware-accelerated operation.
50 concurrent autonomous agents optimizing booking routes. Sub-second decision latency per node.
Integrated vector search and inference. Eliminates data movement overhead for massive knowledge bases.
Self-healing code generation loops running on air-gapped hardware for security-critical environments.
Real-time introspection of active neural pathways and vector states during inference.
High-throughput token streaming with low-latency backpressure handling for real-time apps.
Infinite workspace for generative ideation and multi-modal synthesis (Image + Text).
Vector-based personalized content ranking with sub-millisecond retrieval times.
Persistent, semantic storage layer for long-term context retention across sessions.
3D projection mapping of high-dimensional vector clusters for data exploration.
Algorithmic terrain and geometry generation driven by semantic prompts.
Real-time aggregation of global indices, forex, and crypto feeds.
Automated detection of price inefficiencies across fragmented liquidity pools.
NLP-driven sentiment analysis of global news and geopolitical events.
Real-time Greek calculation and volatility surface modeling.
Predictive modeling of forward curves and open interest dynamics.
Real-time portfolio exposure calculation and market sentiment scoring.
Seamless integration into existing quantitative stacks. Offload compute-heavy reasoning loops to the M8 Engine while maintaining Pythonic control.
import m8_core as m8
# > DEFINE EXECUTION GRAPH
# The VM handles the loop internally, eliminating round-trips.
strategy_graph = {
    "model": "llama-3-70b-quantized-GGUF",
    "allocation": { "gpu_layers": 92, "ctx_size": 8192 },
    "pipeline": [
        {
            "op": "ingest",
            "buffer": "market_signals"
        },
        {
            "op": "infer",
            "constraints": { "grammar": "json_tool_call" }
        },
        {
            "op": "branch",
            "conditions": ["buy", "sell", "hold"]
        }
    ]
}
# > EXECUTE ON GRID
engine = m8.connect("wss://grid.m8.internal:8080")
result = engine.submit(strategy_graph, priority=1)
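A minimal sketch of keeping Pythonic control around the submission above; it reuses only the connect and submit calls shown and assumes they raise ordinary exceptions on transient grid errors (an assumption, not documented behavior):

import time

def submit_with_retry(graph, retries=3, backoff_s=1.0):
    # Reconnect and resubmit on failure, using only the calls from the snippet above.
    for attempt in range(1, retries + 1):
        try:
            engine = m8.connect("wss://grid.m8.internal:8080")
            return engine.submit(graph, priority=1)
        except Exception:                      # assumed failure mode, not documented
            if attempt == retries:
                raise
            time.sleep(backoff_s * attempt)    # simple linear backoff

result = submit_with_retry(strategy_graph)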
## Basic Operations
f32set <rage> 12.2
i32set <r2> 5
stream r2 is <r2> ## streams output (supports interpolation)
store <r1> ...string...
store <r3> My age is <rage> and i have <r2> friends
dup <r1> <r2> # duplicate r1 to r2
store <r2> ulala
ret <r1> <r2> # multiple returns
## Assertions
assertcontains <r1> ...string...
assertnotempty <r1>
assertempty <r1>
assertnil <r1>
asserteq <r1> <r2>
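## Illustrative composition of the opcodes above (same instructions, just combined):
store <r1> hello world
dup <r1> <r2>              # copy r1 to r2
asserteq <r1> <r2>         ## passes: both registers hold the same string
assertcontains <r1> hello
assertnotempty <r2>
ret <r1>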
## Generating Embeddings
store <r1> Hello there
llm_embed <r1> <rv2> dim=16 ## stores embedding in <rv2>
## Generate llm-dictionary tokens
llm_tokenize <r1> <r1tokens>
llm_detokenize <r1tokens> <r4>
ret <r4>
## Result stored in first register
f32add <r10> 23.44533                # <r10> += 23.44533
f32sub <r10> 23.44533                # <r10> -= 23.44533
f32mul <r10> 23.44533                # <r10> *= 23.44533
f32set <r10> 78                      # overwrite <r10> with 78
i32set <r9> 123
i32add <r9> 123                      # <r9> = 246
i32mul <r9> 123                      # <r9> = 30258
store <response> r10=<r10> r9=<r9>   # interpolates the register values
ret <response>
## Matrix operations
matn <r1> 1 376 306 626 ... # variable width matrix
mat8 <r1> 10 20 30 40 50 60 70 89
mat8 <r2> 12.3 20.23 30.23 40.23 50.23 60.23 70 89
matsub <r1> <r2> <r3>
matadd <r1> <r2> <r3>
matmul <r1> <r2> <r3>
matnorm <r1> <r1norm>
matdot <score_weights> <result_metrics> <result> ## dot product
matcosim <score_weights> <result_metrics> <result> ## cosine similarity
matl2d <score_weights> <result_metrics> <result> ## L2 distance
ret <r3>
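## Illustrative composition: embed two strings (ops shown above) and score their similarity.
store <r1> invoice processing
store <r2> billing automation
llm_embed <r1> <rv1> dim=16
llm_embed <r2> <rv2> dim=16
matcosim <rv1> <rv2> <score>   ## cosine similarity of the two embeddings
matl2d <rv1> <rv2> <dist>      ## L2 distance between the same vectors
store <response> cosine=<score> l2=<dist>
ret <response>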
## Inference (Cached)
store <r1> Tell me a joke
## llm_instance will cache the response
llm_instance <r1> instname n_predict=24 temperature=0.5
llm_instancestatus instname <r3> #store result into <r3>
stream <r3> ## realtime stream
## Force Inference (Ignore cache)
store <r1> Tell me a joke
llm_instance <r1> instname n_predict=24 temperature=0.5 force=true
llm_instancestatus instname <r3>
## Vectordb operations
vdb_instance MYDB4 dim=16 max_elements=500 M=16 ef_construction=200
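## Parameter notes (assuming the index follows standard HNSW conventions):
##   dim             - dimensionality of stored vectors (must match the llm_embed dim below)
##   max_elements    - maximum number of vectors the index will hold
##   M               - links per node in the HNSW graph (recall vs. memory trade-off)
##   ef_construction - candidate list size while building (higher = better recall, slower inserts)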
store <r1> DPR/XML Modelo IVA
llm_embed <r1> <rv1> dim=16
vdb_add MYDB4 <rv1> <r1>
vdb_search MYDB4 <rv1> <rv37> distance=0.019
llm_detokenize <rv37> <result_text>
ret <result_text>
## Alignment and Batch Add
mat8 <r1> 1 2 3 4 5 6 7 8
align <r1> 16
mat8 <r2> 10 20 30 40 50 60 70 80
align <r2> 16
vdb_instance MYDB dim=16 max_elements=500
vdb_add MYDB <r1> 1 2 3 4 5 6 7 8
vdb_add MYDB <r2> 10 20 30 40 50 60 70 80
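## Note: the mat8 vectors above are 8-wide while MYDB was created with dim=16;
## align <reg> 16 brings each vector to the index dimension (presumably by padding) before vdb_add.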