// DICTIONARY_MANUAL

AI Engineering Dictionary

A structured reference index explaining emerging AI terms, architecture patterns, failure modes, and developer conventions. Use the homepage console for instant interactive search, or browse the sections below.

§01

Section 1 — The Model

9 entries compiled

Artificial Intelligence (AI)verified

A general term describing computer systems that perform tasks historically requiring human intelligence — like writing code, reasoning through bugs, or understanding natural language.

Inferenceverified

The execution phase where a trained model processes input tokens to generate output tokens.

Model Providerverified

The host infrastructure that serves model inference, either via cloud-based APIs or local serving engines.

The compiled, frozen parameters that calculate next-token probability distributions. A model is completely stateless and pure.

Next-token Predictionverified

The core autoregressive mechanism of generative models, calculating token probabilities and sampling one token at a time.

Non-determinismverified

The operational characteristic where identical prompts output different token sequences due to temperature scaling, nucleus sampling, and floating-point math variations.

Parametersverified

The internal floating-point weights and biases of a neural network, optimized during training, that define the model's behavior.

The basic numerical chunk (character fragment or sub-word) that a model reads and writes, converted from text via a tokenizer.

Trainingverified

The computational process of optimizing a model's parameters by exposing it to datasets and adjusting weights via backpropagation.

§02

Section 2 — Model provider request

14 entries compiled

An autonomous software system that embeds an LLM within a stateful execution loop, enabling it to call tools, interact with files, and iteratively accomplish complex goals.

Cache Tokensverified

The portion of input tokens that matched an active prefix cache, resulting in significantly reduced bills and near-instant processing.

Context Windowverified

The absolute maximum token limit (capacity) that a model can process, read, and write in a single API request.

Contextverified

The combined body of system instructions, conversation logs, files, and schemas injected into the model request to guide its behavior and provide facts.

Harnessverified

The client-side application code that drives the model, parses tool calls, maintains conversation logs, and executes command operations on the local machine.

Input Tokensverified

The numerical text fragments (prompts, system rules, history, and schemas) sent in a request to the model provider.

Model Provider Requestverified

The network API payload containing prompt messages, system templates, parameters, and tool definitions sent to a model provider.

Output Tokensverified

The numerical text fragments generated by the model in response to a request, billed at a premium rate and processed sequentially.

Prefix Cacheverified

An optimization system that stores pre-processed prompt segments in GPU memory, skipping repetitive calculations for identical context prefixes.

Sessionverified

The active span of a conversation thread, representing the sequence of user queries, tool calls, and model responses accumulated in memory.

Statefulverified

The operational design where a client application (harness) maintains a persistent record of messages, files, and variables across multiple stateless model queries.

Statelessverified

The architectural characteristic of AI models where each API request has no memory of prior queries, requiring the client to send the entire conversation history in every call.

System Promptverified

The root-level instruction block in an API request that establishes the model's role, constraints, formatting rules, and tool access boundaries.

A single request-response exchange in a session, composed of user input (and potential tool results) followed by the model's output.

§03

Section 3 — Environment

7 entries compiled

Environmentverified

The boundary of directories, files, systems, databases, and network APIs that an agent can see and modify using its tools.

Filesystemverified

The storage interface where an agent reads source documents, inspects files, and writes edits using file-operation tools.

Permission Modeverified

The configuration tier (Bypass, Ask, or Strict) that dictates which tool classes run automatically and which require developer approval.

Permission Requestverified

A checkpoint in an agent loop that prompts the developer for approval before executing a sensitive tool call.

Tool Callverified

The structured request generated by a model during inference, specifying a tool name and arguments it wants the client harness to run.

Tool Resultverified

The execution output (data or error logs) sent back to the model provider by the client harness after running a tool call.

An external function or API made available to a model, defined via a JSON schema, allowing the agent loop to execute operations on the host system.

§03

Section 3 — Agent Tooling

1 entries compiled

Pluginsverified

A packaged bundle of tool schemas, prompts, rules, and runtime configurations that extends an AI agent's capability for a specific domain.

§04

Section 4 — Sandbox

4 entries compiled

Agent Modeverified

The runtime configuration (e.g. architect, builder, interpreter) that sets the model's role instructions and restricts the tools it can access.

Hallucinationverified

A model failure mode where the LLM generates factually false statements, non-existent code functions, or phantom API parameters that sound plausible.

Sandboxverified

An isolated computing environment (container, VM, or restricted shell) that restricts the files and commands an agent can access, limiting the damage of automated actions.

Sycophancyverified

A model failure mode where the LLM submissively agrees with a user's incorrect statements or preferences to appear cooperative, prioritizing sycophantic agreement over technical accuracy.

§05

Section 5 — Parametric knowledge

5 entries compiled

Attention Budgetverified

The finite mathematical capacity each token has to distribute influence across other context tokens, which dilutes as prompt length grows.

Attention Relationshipverified

The mathematical connection computed between pairs of tokens inside the context window that represents how they influence and depend on each other.

Contextual Knowledgeverified

The active facts, source code files, and logs loaded inside the model's context window that it can read directly at query time.

Knowledge Cutoffverified

The calendar date past which a model has no pre-trained parametric knowledge of events, codebase changes, or library updates.

Parametric Knowledgeverified

The frozen world facts and coding capability compiled directly into the model's parameters during training, which cannot be modified during inference.

§06

Section 6 — Attention degradation

5 entries compiled

Attention Degradationverified

The gradual decline in a model's constraint-following and reasoning performance as prompt context length increases, caused by attention budget dilution.

Clearingverified

Ending the current conversation session and starting a fresh one with a completely empty context window to wipe out accumulated noise.

Handoffverified

The process of transferring task progress, decisions, and next steps from a bloated chat session to a fresh one, preserving focus while resetting the context window.

Primary Sourceverified

The original, raw source of truth (e.g. active code files, terminal test logs, database rows) rather than summaries or descriptions of them.

Smart Zoneverified

The early phase of a session where the context window is small, keeping the model sharp, highly focused, and accurate.

§07

Section 7 — Secondary source

5 entries compiled

Compactionverified

An in-memory session reset where the active chat history is summarized by the model, throwing away the detailed transcript to free up context window space.

Handoff Artifactverified

A persistent file written to the environment by an agent to record plans, status, and decisions, used to brief a fresh successor session.

Secondary Sourceverified

A compiled, lossy description or summary of a primary source (e.g. readmes, design docs, compaction summaries) that trades detail for lower token costs.

A high-level handoff artifact (like a PRD or design doc) stored in the environment that defines a project's goals, constraints, and ticket checklist across multiple sessions.

A granular handoff artifact that scopes a single session of work, designed to be completed before the model drifts out of the smart zone.

§08

Section 8 — Autocompact

5 entries compiled

AGENTS.mdverified

A project brief file loaded by the harness at startup, detailing the project overview, folder layout, commands, and constraints for coding agents.

Autocompactverified

Compaction triggered automatically by the client harness when context size crosses a threshold (often 80%), risking the quiet loss of task constraints.

Context Pointerverified

A reference path or URL link in one document pointing to another, allowing the agent to load the detail only when the task requires it.

Memory Systemverified

The client-side database or filesystem infrastructure that saves user preferences and project facts across sessions to simulate stateful continuity.

Progressive Disclosureverified

The optimization technique of loading only the specific context required for the active task, hiding detailed files behind context pointers until needed.

§09

Section 9 — Skills and Subagents

6 entries compiled

Away From Keyboard (AFK)verified

A working pattern where the developer leaves the agent to run unattended, deferring all review to the end of the session.

Automated Checkverified

A deterministic verification tool run locally (lints, typechecks, builds, test suites) that gives the agent binary pass/fail logs to self-correct from.

Automated Reviewverified

The process where a secondary model (with a fresh context window) reviews the diff generated by a working agent to catch design flaws, security risks, or contract breaks.

Human-in-the-loopverified

A working pattern where the developer actively monitors, redirects, and collaborates with the agent in real time, catching mistakes before they build up.

A pre-packaged, teachable capability (instructions, scripts, templates) loaded into the context window dynamically using progressive disclosure.

Subagentverified

A secondary agent spawned by a parent agent to execute a specific sub-task in a separate, isolated context window, returning a brief summary result.

§10

Section 10 — Human and Vibe review

6 entries compiled

Design Conceptverified

The shared mental model of what is being built, held in common between developer and agent, separate from any physical file or code asset.

Developer Experience (DX)verified

The quality of the interaction between a human developer and a codebase toolchain, characterized by fast feedback, clean documentation, and ease of work.

Grillingverified

A planning technique where the agent Socratically interviews the developer, one decision at a time, to resolve ambiguities before committing to code or specs.

Human Reviewverified

The final verification gate where a developer reads the primary code diff produced by an agent to judge its correctness, architecture, and safety.

Prototypingverified

A development technique where the agent builds a quick, visual version of a feature, allowing you to react to a physical asset rather than discussing concepts in abstract text.

Vibe Codingverified

A working pattern where the developer accepts the agent's code modifications blindly without conducting code diff reviews, judging progress strictly by runtime behavior.

§11

Section 11 — Agent experience

1 entries compiled

Agent Experience (AX)verified

How well a codebase and its environment are set up to support autonomous agents, defined by fast deterministic checks, clean API boundaries, and low context overhead.

§99

agents

1 entries compiled

Agent Engineeringverified

An architectural design pattern where an LLM is embedded within a stateful loop, allowing it to deconstruct goals, invoke tools, inspect environment feedback, and self-correct.

§99

embeddings-vectors

2 entries compiled

Embeddingsverified

High-dimensional coordinate lists (vectors) that represent the semantic meaning of text, images, or audio, placing related concepts close to each other in a continuous geometric space.

Vector Databasesverified

Specialized database systems designed to store, index, and query high-dimensional vector embeddings rapidly using Approximate Nearest Neighbor (ANN) search algorithms.

§99

mcp-tooling

1 entries compiled

Model Context Protocol (MCP)verified

An open standard protocol that connects AI clients (like IDEs or chat interfaces) to external tools, databases, and data resources using a uniform client-server API structure.

§99

rag

1 entries compiled

Retrieval-Augmented Generation (RAG)verified

An architectural pattern that extends a model's knowledge by retrieving relevant snippets from a local document database and injecting them into the context window at query time.