AI Engineering Interview Prep

// SYSTEMS_DESIGN

When should you use Retrieval-Augmented Generation (RAG) vs. Fine-tuning?

[CLICK_TO_DECODE_ANSWER]ENTER ↵

// DECODE_RESULT

Use **RAG** for injecting dynamic, real-time facts, handling private datasets, and keeping records up-to-date (low cost, immediate, updates factual context). Use **Fine-tuning** for training the model on tone, custom formatting styles, specific formatting rules, specialized syntax, or learning a specific dialect/API grammar (expensive, requires retraining, updates model parameters).

[CLICK_TO_FLIP_BACK]

// CONTEXT_LIMITS

What is attention degradation (lost-in-the-middle) and how do you prevent context window bloating?

[CLICK_TO_DECODE_ANSWER]ENTER ↵

// DECODE_RESULT

Attention degradation occurs when a model loses the ability to follow instructions when the context window is stuffed to capacity. To prevent it, implement **context compaction** (summarizing older messages), **prompt truncation**, **semantic pruning** (retaining only top-k relevant embeddings), or **system prompt placement** (re-injecting instructions at the very end of the chat stream).

[CLICK_TO_FLIP_BACK]

// AGENT_TOOLING

Explain the Model Context Protocol (MCP) and how it solves integration problems.

[CLICK_TO_DECODE_ANSWER]ENTER ↵

// DECODE_RESULT

Model Context Protocol (MCP) is an open standard designed by Anthropic that acts like a USB port for AI models. Instead of writing custom API integration code for every tool, database, and IDE, developers build standard **MCP Servers** that expose resources, prompts, and tools. Any client (like an IDE or agent) can query the server over standard JSON-RPC 2.0 protocol via standard I/O (stdin/stdout) or SSE to run tool actions securely.

[CLICK_TO_FLIP_BACK]

// FAILURE_MODES

What is the difference between factuality hallucinations and faithfulness hallucinations?

[CLICK_TO_DECODE_ANSWER]ENTER ↵

// DECODE_RESULT

**Factuality hallucinations** are when the model lacks parametric knowledge and guesses an answer (e.g. citing a library that does not exist). Fix this by adding correct context. **Faithfulness hallucinations** occur when the information is in context, but the model fails to adhere to it due to attention degradation. Fix this by pruning or reducing the size of the context window.

[CLICK_TO_FLIP_BACK]

// SAMPLING_INFRA

How do temperature and top-p sampling affect LLM output generation?

[CLICK_TO_DECODE_ANSWER]ENTER ↵

// DECODE_RESULT

During next-token prediction, the model computes probabilities. **Temperature** scales the logits before softmax: a temperature near 0 makes the model select only the most likely token (deterministic), while higher values flatten the curve, making rare tokens more probable (creative/noisy). **Top-p (nucleus sampling)** limits selection to the top tokens whose cumulative probability is less than p (e.g., top-p=0.9 is the top 90% most likely options), pruning out the long tail of improbable choices.

[CLICK_TO_FLIP_BACK]

// SYSTEM_PROMPTS

Developer Prompt Templates

// COPY_AND_PASTE_INTO_YOUR_LLM

1. Socratic Coding Partner

Forces the LLM to guide you step-by-step through a programming problem without typing out the full solution, improving recall and comprehension.

Act as a strict, senior Socratic programming tutor. When I ask you for help with a coding problem, do NOT write the solution for me. Instead, explain the core computer science concepts behind the issue, ask me leading questions to help me identify where my code is going wrong, and guide me toward writing the solution myself one step at a time.

2. Context Window Compactor

Instructs the LLM to compress a long chat history or code base description into a dense memory summary, saving token usage in future prompts.

Analyze the chat history and code files provided. Compile a highly compressed summary of the current project state, active memory parameters, files modified, outstanding bugs, and structural rules. Omit generic conversational filler. Keep the summary under 300 words, formatted as a dense markdown list, optimized for ingestion by another AI context window.

3. Architectural Sandbox Validator

Audits your code implementations against project rules (like no backend databases or strict styling tokens) and outputs clean validation results.

Audit the attached code changes against the following strict constraints:
1. Zero server-side databases (no Supabase, Postgres, Firebase).
2. Zero user authentication or user state storage.
3. Pure static compilation flow (pre-rendered JSON databases only).
4. Symmetrical CSS utility conventions (Newsreader for serif, JetBrains Mono for monospace, Space Grotesk for headers).

Output a list of COMPLIANT, WARNING, or CRITICAL violations with file paths and line-level recommendations.

Interview Preparation

When should you use Retrieval-Augmented Generation (RAG) vs. Fine-tuning?

What is attention degradation (lost-in-the-middle) and how do you prevent context window bloating?

Explain the Model Context Protocol (MCP) and how it solves integration problems.

What is the difference between factuality hallucinations and faithfulness hallucinations?

How do temperature and top-p sampling affect LLM output generation?

Developer Prompt Templates

1. Socratic Coding Partner

2. Context Window Compactor

3. Architectural Sandbox Validator

Pillar-Based Study Checklist

Pillar 1: Core LLM Foundations

Pillar 2: Distributed Infra & Scaling

Pillar 3: Context Retrieval (RAG)

Pillar 4: Agent Design & Tooling