1. Design
1.1. LLM Choice
1.1.1. Model Size
1.1.1.1. Latency
1.1.1.2. Memory/Compute
1.1.1.3. Quantization
1.1.1.4. SLM vs. LLM choice
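The latency/memory/quantization trade-off above largely reduces to back-of-the-envelope arithmetic: weight memory is roughly parameter count times bytes per parameter. A minimal sketch (function name is illustrative; the estimate ignores KV cache, activations, and runtime overhead, which add more on top):

```python
def model_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Rough weight-memory estimate: parameters x bytes per parameter.

    Ignores KV cache, activations, and serving overhead.
    """
    bytes_per_param = bits_per_param / 8
    return params_billion * 1e9 * bytes_per_param / 1e9  # result in GB

# A 7B model needs ~14 GB of weights at FP16, ~3.5 GB at 4-bit quantization,
# which is the core of the SLM-vs-LLM and quantization decision.
fp16 = model_memory_gb(7, 16)
int4 = model_memory_gb(7, 4)
```

This is why quantization often decides whether a given model fits on available hardware at all.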
1.1.2. Context Window Size
1.1.2.1. Pre-defined limits for SaaS models
1.1.2.2. Context-scaling techniques for self-hosted models
1.1.3. Specialty
1.1.3.1. Code generation
1.1.3.1.1. Qwen2.5-Coder
1.1.3.1.2. DeepSeek-Coder 6.7B
1.1.3.2. General-Purpose
1.1.3.2.1. GPT-4o
1.1.3.2.2. Claude
1.1.3.3. Classification tasks
1.1.3.3.1. BERT
1.1.3.3.2. RoBERTa
1.1.3.4. Embeddings generation
1.1.3.4.1. Amazon Titan
1.1.3.4.2. E5-Large
1.1.3.4.3. OpenAI Text Embeddings
1.1.3.5. Function Calling
1.1.3.5.1. NaturalFunctions
1.1.3.5.2. Mixtral 8x22B
1.1.3.5.3. Octopus
1.1.3.6. Text-to-SQL
1.1.3.6.1. OmniSQL family
1.1.3.6.2. Prem-1B-SQL
1.1.3.7. Reasoning
1.1.3.7.1. Qwen3
1.1.3.7.2. DeepSeek-R1
1.1.3.8. Multi-modal capabilities
1.1.3.8.1. Meta Llama 4 Vision
1.1.3.8.2. GPT-4o
1.1.3.8.3. Qwen-VL
1.1.4. Availability / Customizability
1.1.4.1. Closed Source / Limited Customizability
1.1.4.1.1. Fine Tuning
1.1.4.1.2. Cost
1.1.4.2. Open-source / Full Customizability
1.1.4.2.1. Mistral
1.1.4.2.2. Gemma
1.1.4.2.3. Llama
1.2. Privacy
1.2.1. Full control
1.2.2. Provider guarantee
1.2.2.1. ChatGPT Enterprise
1.2.2.2. Anthropic
1.2.3. No guarantee
2. Development
2.1. Basic Techniques
2.1.1. Sampling parameters
2.1.1.1. Top-K
2.1.1.2. Top-P
2.1.1.3. Temperature
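The three sampling parameters above interact: temperature reshapes the distribution, then Top-K and Top-P (nucleus) filtering restrict which tokens survive before sampling. A toy decoder-step sketch in pure Python (real inference stacks do this over full vocabularies on the GPU):

```python
import math
import random

def sample_next_token(logits, k=None, p=None, temperature=1.0, rng=None):
    """Apply temperature, optional top-k and top-p filtering, then sample."""
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    scaled = [l / temperature for l in logits]
    # Softmax with max-subtraction for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    if k is not None:              # Top-K: keep only the k most likely tokens
        order = order[:k]
    if p is not None:              # Top-P: keep the smallest prefix with mass >= p
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= p:
                break
        order = kept
    # Renormalize over the surviving tokens and draw one.
    mass = sum(probs[i] for i in order)
    r = rng.random() * mass
    for i in order:
        r -= probs[i]
        if r <= 0:
            return i
    return order[-1]
```

With `k=1` this degenerates to greedy decoding; raising temperature flattens `probs` and makes low-ranked tokens more likely to survive the filters.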
2.1.2. Naive RAG
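Naive RAG is just: score stored chunks against the query, stuff the best ones into the prompt. A self-contained toy using bag-of-words cosine similarity (a real pipeline would use an embedding model and a vector store instead; the chunk texts are made up):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def naive_rag_prompt(question: str, chunks: list, top_k: int = 1) -> str:
    """Retrieve the top_k most similar chunks and build a grounded prompt."""
    q = Counter(question.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: cosine(q, Counter(c.lower().split())),
                    reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "The warranty period is 12 months from purchase.",
    "Shipping takes 3-5 business days.",
]
prompt = naive_rag_prompt("How long is the warranty period?", chunks)
```

Everything under Advanced RAG below is refinement of these two steps: better retrieval and better use of the retrieved context.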
2.1.3. Prompt Engineering
2.1.3.1. Chain-of-thought Prompting
2.1.3.2. Tree-of-thought prompting
2.1.3.3. Few-shot prompting
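Few-shot prompting is mechanically simple: prepend worked examples so the model continues the pattern. A minimal builder (the task and examples are invented for illustration):

```python
def few_shot_prompt(task: str, examples: list, query: str) -> str:
    """Build a few-shot prompt: task description, worked Q/A examples, query."""
    lines = [task, ""]
    for q, a in examples:
        lines += [f"Q: {q}", f"A: {a}", ""]
    lines += [f"Q: {query}", "A:"]  # trailing "A:" invites the completion
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("Great product!", "positive"), ("Arrived broken.", "negative")],
    "Works exactly as described.",
)
```

Chain-of-thought prompting uses the same structure but puts step-by-step reasoning in each example's answer, so the model imitates the reasoning as well as the format.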
2.1.4. Structured Output
2.1.4.1. Tool Calling
2.1.4.1.1. OpenAI Function Calling
2.1.4.1.2. Gorilla OpenFunctions
2.1.4.1.3. VertexAI Function Calling
2.1.4.2. JSON Mode
2.1.4.2.1. OpenAI JSON Mode
2.1.4.2.2. Fireworks AI JSON Mode
2.1.4.3. Grammar Mode
2.1.4.3.1. Llama.cpp
2.1.4.3.2. Fireworks AI Grammar Mode
2.1.4.3.3. Guidance
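Whichever structured-output mechanism is used, the application side still parses and validates the result before acting on it. A sketch of that validation step (the `REQUIRED` schema and field names are invented; JSON/grammar modes constrain decoding so parsing rarely fails, but checking remains cheap insurance):

```python
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED = {"name": str, "priority": int}

def parse_structured(raw: str) -> dict:
    """Parse a model response that was asked for JSON; validate the schema."""
    obj = json.loads(raw)  # raises on non-JSON output
    for key, typ in REQUIRED.items():
        if not isinstance(obj.get(key), typ):
            raise ValueError(f"missing or mistyped field: {key}")
    return obj

ticket = parse_structured('{"name": "reset password", "priority": 2}')
```

On a `ValueError`, a common pattern is to re-prompt the model with the error message appended, giving it one chance to repair its own output.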
2.2. Advanced Techniques
2.2.1. Advanced RAG
2.2.1.1. Data / Knowledge Sources
2.2.1.1.1. Source Type
2.2.1.1.2. Pre-processing
2.2.1.2. Pre-Retrieval
2.2.1.2.1. Query optimization and decomposition
2.2.1.2.2. Embedding optimization
2.2.1.2.3. Query routing
2.2.1.3. Retrieval
2.2.1.3.1. Search techniques
2.2.1.3.2. Parameters
2.2.1.4. Post-Retrieval
2.2.1.4.1. Context Expansion
2.2.1.4.2. Reranking and filtering
2.2.1.4.3. Response Generation Prompt Optimization
2.2.1.4.4. Response quality control
2.2.1.4.5. Citation Generation
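The post-retrieval reranking-and-filtering step above can be sketched independently of any particular model: rescore candidates with a relevance function (typically a cross-encoder in practice), drop weak matches, keep the best few. The token-overlap scorer here is a toy stand-in, not a real reranker:

```python
def rerank_and_filter(query, chunks, score_fn, min_score=0.2, top_n=3):
    """Rescore retrieved chunks, drop weak matches, keep the top few."""
    scored = [(score_fn(query, c), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s >= min_score]  # filtering
    scored.sort(key=lambda sc: sc[0], reverse=True)         # reranking
    return [c for _, c in scored[:top_n]]

def overlap(query, chunk):
    """Toy relevance score: fraction of query tokens present in the chunk."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q)

kept = rerank_and_filter(
    "what is the refund policy",
    ["refund policy details", "unrelated text here"],
    overlap,
)
```

The filtering threshold matters as much as the ranking: passing irrelevant chunks to the generator is a common source of hallucinated answers.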
2.2.2. Fine tuning (training)
2.2.2.1. Training Objective
2.2.2.1.1. Alignment (e.g. making the model less harmful and more helpful), usually done through Reinforcement Learning
2.2.2.1.2. Model compression through distillation (teaching a small model to mimic a large model), usually done through SFT
2.2.2.1.3. New skill/specialization, usually done through SFT
2.2.2.2. Types
2.2.2.2.1. LLM Distillation
2.2.2.2.2. Reinforcement Learning
2.2.2.2.3. Supervised Fine Tuning
2.2.2.3. Tools / Libraries
2.2.2.3.1. Open Source
2.2.2.3.2. Fully Managed / SaaS
2.2.2.4. Data Creation Tools
2.2.2.4.1. Lilac
2.2.2.4.2. Tuna
2.2.2.4.3. Gretel
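For the distillation objective listed above, the standard soft-label formulation minimizes KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. A pure-Python sketch of the per-token loss (real training computes this over batches with autograd):

```python
import math

def softmax(logits, t=1.0):
    """Temperature-softened softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((l - m) / t) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, t=2.0):
    """Soft-label distillation: t^2 * KL(teacher || student) at temperature t."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return t * t * kl
```

The loss is zero exactly when the student reproduces the teacher's distribution, and the higher temperature exposes the teacher's relative preferences among non-argmax tokens, which is where most of the transferred signal lives.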
2.2.3. Agents
2.2.3.1. Single Agent
2.2.3.1.1. Planning Strategy
2.2.3.1.2. Tool use
2.2.3.1.3. Memory
2.2.3.1.4. Planning and Execution Pattern
2.2.3.1.5. Error correction and quality control
2.2.3.2. Multi-agent
2.2.3.2.1. Topology and Control Flow
2.2.3.2.2. Communication Mechanisms (medium and timing model for information exchange)
2.2.3.2.3. Multi-agent frameworks
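The single-agent loop (plan, call a tool, observe, answer) can be shown with the model call stubbed out. Everything here is illustrative: `fake_llm` stands in for a real LLM that emits tool calls, and the calculator is a hypothetical tool:

```python
def calculator(expr: str) -> str:
    """Toy tool: evaluate an arithmetic expression with builtins disabled."""
    return str(eval(expr, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def fake_llm(history):
    """Stand-in for a real model: requests the calculator once, then answers."""
    if not any(msg.startswith("observation:") for msg in history):
        return "tool:calculator:2 + 2"
    return "answer:The result is " + history[-1].split(":", 1)[1]

def run_agent(question, llm, max_steps=5):
    """ReAct-style loop: act, observe, repeat until the model answers."""
    history = [f"user:{question}"]
    for _ in range(max_steps):
        action = llm(history)
        if action.startswith("answer:"):
            return action.removeprefix("answer:")
        _, tool, arg = action.split(":", 2)
        history.append(f"observation:{TOOLS[tool](arg)}")
    return "gave up"

answer = run_agent("What is 2 + 2?", fake_llm)
```

The `max_steps` cap is the simplest form of the error-correction concern above: an agent that never converges must be stopped, not retried forever. Multi-agent systems compose several such loops with a topology and a message-passing scheme between them.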
3. Test / Evaluate
3.1. Accuracy Testing
3.1.1. Areas
3.1.1.1. Response Consistency
3.1.1.2. Data Handling
3.1.1.3. Response Completeness
3.1.1.4. Context Handling
3.1.1.5. Hallucination Detection
3.1.1.6. E2E Response Accuracy
3.1.1.7. Contextual Relevance
3.1.2. Tools
3.1.2.1. RAGAS
3.1.2.2. UpTrain
3.1.2.3. Galileo
3.2. Performance Testing
3.2.1. Areas
3.2.1.1. Latency
3.2.1.2. Concurrent requests to LLM
3.2.1.3. First Chunk Latency (For streaming response like ChatGPT)
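First-chunk latency is measured per streaming response: the clock starts at the request and stops at the first yielded chunk, separately from total latency. A sketch with a simulated token stream standing in for a real streaming API:

```python
import time

def measure_streaming(stream):
    """Return (time-to-first-chunk, total latency, full text) for a stream."""
    start = time.perf_counter()
    first = None
    chunks = []
    for chunk in stream:
        if first is None:
            first = time.perf_counter() - start  # first-chunk latency
        chunks.append(chunk)
    total = time.perf_counter() - start
    return first, total, "".join(chunks)

def fake_stream():
    """Simulated token stream; a real client would iterate an SSE response."""
    for token in ["Hel", "lo", "!"]:
        time.sleep(0.01)
        yield token

ttfc, total, text = measure_streaming(fake_stream())
```

For concurrency testing, the same measurement is run across many parallel requests and the distribution (p50/p95/p99) is reported rather than a single number.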
3.2.2. Tools
3.2.2.1. LangSmith
3.2.2.2. DeepEval
3.2.2.3. AgentOps
3.3. Ethical Testing
3.3.1. Areas
3.3.1.1. Jailbreaking
3.3.1.2. Prompt Injection
3.3.1.3. Data Leakage and Privacy
3.3.1.4. Violence, Toxicity, and Bias
3.3.2. Tools
3.3.2.1. DeepEval
3.3.2.2. UpTrain
3.4. Monitoring
3.4.1. Areas
3.4.1.1. Detect Failures
3.4.1.2. User feedback Analysis
3.4.1.3. Alerting mechanism
3.4.2. Tools
3.4.2.1. MLflow
3.4.2.2. Arize Phoenix
3.4.2.3. Opik
4. Deployment / LLMOps
4.1. Model hosting
4.1.1. SaaS
4.1.1.1. Single Model
4.1.1.1.1. Azure OpenAI Service
4.1.1.1.2. OpenAI
4.1.1.2. Multi-Model
4.1.1.2.1. Amazon Bedrock
4.1.1.2.2. EdenAI
4.1.1.2.3. Groq
4.1.1.2.4. Lamini
4.1.1.2.5. Amazon SageMaker
4.1.2. Self-hosted
4.1.2.1. On-prem
4.1.2.1.1. LPU
4.1.2.1.2. GPU
4.1.2.2. Cloud
4.1.2.2.1. AWS EKS
4.1.2.2.2. AWS ECS
4.1.2.3. Tools
4.1.2.3.1. LLM Inference Tools
4.1.2.3.2. Model Serving Orchestrator
4.2. Model optimization
4.2.1. Model Pruning
4.2.2. Quantization
4.2.3. Model distillation
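Of the three optimizations above, quantization is the easiest to show end to end. A toy symmetric per-tensor int8 scheme (production systems use per-channel scales, calibration data, and formats like GPTQ or AWQ; this is only the core idea):

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: scale by max |w|, round, clamp to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid zero scale
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.9, -0.45, 0.05, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each weight now costs 1 byte instead of 2 (FP16) or 4 (FP32), at the price of a rounding error bounded by half the scale per weight; pruning and distillation trade accuracy for size along different axes.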
4.3. Ops Automation
4.3.1. Monitoring
4.3.1.1. TruEra LLM Observability Solution
4.3.1.2. DeepChecks LLM Evaluation
4.3.1.3. NeptuneAI
4.3.2. Model Lifecycle
4.3.2.1. KubeFlow
4.3.2.2. SageMaker Pipelines
4.3.2.3. MLflow