// PREPARATION_FLIPCARDS
Interview Preparation
Master high-frequency concepts asked in AI systems design and engineering loops. Click any card to run the answer decode.
When should you use Retrieval-Augmented Generation (RAG) vs. Fine-tuning?
Use **RAG** for injecting dynamic, real-time facts, handling private datasets, and keeping records up-to-date (low cost, immediate, updates factual context). Use **Fine-tuning** for training the model on tone, custom formatting styles, specific formatting rules, specialized syntax, or learning a specific dialect/API grammar (expensive, requires retraining, updates model parameters).
What is attention degradation (lost-in-the-middle) and how do you prevent context window bloating?
Attention degradation occurs when a model loses the ability to follow instructions when the context window is stuffed to capacity. To prevent it, implement **context compaction** (summarizing older messages), **prompt truncation**, **semantic pruning** (retaining only top-k relevant embeddings), or **system prompt placement** (re-injecting instructions at the very end of the chat stream).
Explain the Model Context Protocol (MCP) and how it solves integration problems.
Model Context Protocol (MCP) is an open standard designed by Anthropic that acts like a USB port for AI models. Instead of writing custom API integration code for every tool, database, and IDE, developers build standard **MCP Servers** that expose resources, prompts, and tools. Any client (like an IDE or agent) can query the server over standard JSON-RPC 2.0 protocol via standard I/O (stdin/stdout) or SSE to run tool actions securely.
What is the difference between factuality hallucinations and faithfulness hallucinations?
**Factuality hallucinations** are when the model lacks parametric knowledge and guesses an answer (e.g. citing a library that does not exist). Fix this by adding correct context. **Faithfulness hallucinations** occur when the information is in context, but the model fails to adhere to it due to attention degradation. Fix this by pruning or reducing the size of the context window.
How do temperature and top-p sampling affect LLM output generation?
During next-token prediction, the model computes probabilities. **Temperature** scales the logits before softmax: a temperature near 0 makes the model select only the most likely token (deterministic), while higher values flatten the curve, making rare tokens more probable (creative/noisy). **Top-p (nucleus sampling)** limits selection to the top tokens whose cumulative probability is less than p (e.g., top-p=0.9 is the top 90% most likely options), pruning out the long tail of improbable choices.
// SYSTEM_PROMPTS
Developer Prompt Templates
1. Socratic Coding Partner
Forces the LLM to guide you step-by-step through a programming problem without typing out the full solution, improving recall and comprehension.
Act as a strict, senior Socratic programming tutor. When I ask you for help with a coding problem, do NOT write the solution for me. Instead, explain the core computer science concepts behind the issue, ask me leading questions to help me identify where my code is going wrong, and guide me toward writing the solution myself one step at a time.
2. Context Window Compactor
Instructs the LLM to compress a long chat history or code base description into a dense memory summary, saving token usage in future prompts.
Analyze the chat history and code files provided. Compile a highly compressed summary of the current project state, active memory parameters, files modified, outstanding bugs, and structural rules. Omit generic conversational filler. Keep the summary under 300 words, formatted as a dense markdown list, optimized for ingestion by another AI context window.
3. Architectural Sandbox Validator
Audits your code implementations against project rules (like no backend databases or strict styling tokens) and outputs clean validation results.
Audit the attached code changes against the following strict constraints: 1. Zero server-side databases (no Supabase, Postgres, Firebase). 2. Zero user authentication or user state storage. 3. Pure static compilation flow (pre-rendered JSON databases only). 4. Symmetrical CSS utility conventions (Newsreader for serif, JetBrains Mono for monospace, Space Grotesk for headers). Output a list of COMPLIANT, WARNING, or CRITICAL violations with file paths and line-level recommendations.
// PREP_SYSTEMATIC_FRAMEWORK
Pillar-Based Study Checklist
Pillar 1: Core LLM Foundations
- Next-Token Prediction & TokenizationUnderstand character fragment encoding constraints and standard BPE vocabulary limits.
- Logits & Sampling MathMaster Temperature scaling and Top-p (nucleus sampling) softmax mathematical impacts.
- Context Compaction & PruningLearn to implement older message summaries and sliding context window sliding parameters.
Pillar 2: Distributed Infra & Scaling
- Data Parallelism (DDP vs. FSDP)Study batch distribution, gradient all-reduce synchronization, and collective execution ops.
- ZeRO Optimizer State ShardingAnalyze ZeRO-1, ZeRO-2, and ZeRO-3 sharded parameter communication overheads.
- Gradient Accumulation & FP16Understand virtual batch sizes, scaling factors, and mixed-precision gradient overflow.
Pillar 3: Context Retrieval (RAG)
- Chunking & EmbeddingsCompare semantic, recursive character, and overlapping boundary chunking strategies.
- Vector Similarity & RerankingCompare BM25 hybrid retrieval, dense cosine vector metrics, and cross-encoder reranking.
- RAG Evaluation MetricsStudy NDCG, MRR, Faithfulness, Context Recall, and Answer Relevance criteria.
Pillar 4: Agent Design & Tooling
- ReAct Loop & State PreservationMaster plan-action-observation cycles and window memory buffering states.
- Model Context Protocol (MCP)Understand Client/Server tool calls, JSON-RPC 2.0 schemas, and stdin/stdout security.
- Plugin Packaging & Custom SkillsCompare custom task rules, subagent tool mounts, and modular plugin delivery packages.