
1. Design / Architecture
1.1. LLM Choice
1.1.1. Model Size
1.1.1.1. Latency
1.1.1.2. Memory/Compute
1.1.2. Context Window Size
1.1.3. Specialty
1.1.3.1. Code generation
1.1.3.1.1. CodeLlama
1.1.3.1.2. Phind-70B
1.1.3.2. Conversation / Q&A
1.1.3.2.1. GPT
1.1.3.2.2. Claude
1.1.3.3. Classification tasks
1.1.3.3.1. BERT
1.1.3.3.2. RoBERTa
1.1.3.4. Embeddings generation
1.1.3.4.1. Amazon Titan
1.1.3.4.2. Ada-002
1.1.3.4.3. Jurassic
1.1.3.5. Function Calling
1.1.3.5.1. NaturalFunctions
1.1.3.5.2. Functionary
1.1.3.5.3. Mixtral 8x22B
1.2. Availability / Customizability
1.2.1. Open-source / Full Customizability
1.2.1.1. Mistral
1.2.1.2. Gemma
1.2.1.3. Llama
1.2.2. Closed Source / Limited Customizability
1.2.2.1. Fine Tuning
1.2.2.1.1. Fine-tuning access
1.2.2.1.2. No fine-tuning access
1.2.2.2. Cost
1.3. Privacy
1.3.1. Full control
1.3.2. Provider guarantee
1.3.3. No guarantee
2. Development
2.1. Basic Techniques
2.1.1. Sampling parameters (worked sketch below)
2.1.1.1. Top-K
2.1.1.2. Top-P
2.1.1.3. Temperature
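
To make the three sampling parameters concrete, here is a minimal, dependency-light sketch (NumPy only, with made-up logits) of how temperature, Top-K, and Top-P interact when decoding one token:

```python
# Minimal single-token decoding sketch showing how the three sampling
# parameters interact. Pure NumPy; the logits are made up.
import numpy as np

def sample_token(logits, temperature=1.0, top_k=50, top_p=0.9, rng=None):
    rng = rng or np.random.default_rng()
    # Temperature rescales logits: <1 sharpens, >1 flattens the distribution.
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    # Top-K: mask everything outside the K highest-scoring tokens.
    if top_k > 0:
        kth_largest = np.sort(logits)[-top_k] if top_k <= len(logits) else -np.inf
        logits = np.where(logits < kth_largest, -np.inf, logits)
    # Softmax over the survivors.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-P (nucleus): keep the smallest prefix of tokens, sorted by
    # probability, whose cumulative mass reaches p; renormalize and sample.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]
    return rng.choice(keep, p=probs[keep] / probs[keep].sum())

print(sample_token([2.0, 1.0, 0.5, -1.0], temperature=0.7, top_k=3, top_p=0.9))
```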
2.1.2. Naive RAG
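
Naive RAG (2.1.2) reduces to: embed chunks once, retrieve by similarity, stuff the hits into the prompt. A minimal sketch, assuming the sentence-transformers package; the final generation call is left to whatever chat client you use:

```python
# Naive RAG: embed chunks once, retrieve by cosine similarity, stuff the
# top hits into the prompt. Embeddings via sentence-transformers.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
chunks = [
    "Paris is the capital of France.",
    "The Loire is the longest river in France.",
]
# Normalized embeddings, so a dot product equals cosine similarity.
index = encoder.encode(chunks, normalize_embeddings=True)

def retrieve(question, k=2):
    q = encoder.encode([question], normalize_embeddings=True)[0]
    return [chunks[i] for i in np.argsort(index @ q)[::-1][:k]]

question = "What is the capital of France?"
context = "\n".join(retrieve(question))
# Hand this prompt to any chat model of your choice.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```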
2.1.3. Prompt Engineering (example below)
2.1.3.1. Chain-of-thought Prompting
2.1.3.2. Tree-of-thought prompting
2.1.3.3. Few-shot prompting
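
A small illustration of few-shot prompting (2.1.3.3) combined with chain-of-thought (2.1.3.1): a single worked example in the message history shows the model both the reasoning style and the answer format. The OpenAI-style chat schema is assumed:

```python
# One worked example (few-shot) whose assistant turn demonstrates
# step-by-step reasoning (chain of thought) and the expected answer format.
messages = [
    {"role": "system", "content": "You solve word problems. Think step by step, then give the answer on the last line."},
    {"role": "user", "content": "A pen costs $2 and a notebook costs $3. What do 2 pens and 1 notebook cost?"},
    {"role": "assistant", "content": "2 pens cost 2 * $2 = $4. 1 notebook costs $3. Total: $4 + $3 = $7.\nAnswer: $7"},
    # The real question; the model imitates the demonstrated format.
    {"role": "user", "content": "A ticket costs $12 and popcorn costs $5. What do 3 tickets and 2 popcorns cost?"},
]
# Send `messages` to any chat-completions-compatible endpoint.
```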
2.1.4. Structured Output
2.1.4.1. Tool Calling (example below)
2.1.4.1.1. OpenAI Function Calling
2.1.4.1.2. Gorilla OpenFunctions
2.1.4.1.3. Vertex AI Function Calling
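
A sketch of OpenAI Function Calling (2.1.4.1.1) with the v1 Python SDK; the get_weather tool and the model name are placeholders:

```python
# The model receives a JSON-schema description of a function and may
# return structured arguments instead of free text.
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# Assuming the model chose to call the tool rather than answer in text:
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)  # get_weather {"city": "Paris"}
```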
2.1.4.2. JSON Mode (example below)
2.1.4.2.1. OpenAI JSON Mode
2.1.4.2.2. Fireworks AI JSON Mode
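
JSON mode (2.1.4.2.1) is a one-parameter change in the same SDK; note OpenAI requires the word "JSON" to appear somewhere in the messages:

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": 'Return three EU capitals as JSON under the key "capitals".'}],
    # Constrains decoding so the reply is guaranteed to parse as JSON.
    response_format={"type": "json_object"},
)
print(resp.choices[0].message.content)
```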
2.1.4.3. Grammar Mode (example below)
2.1.4.3.1. Llama.cpp
2.1.4.3.2. Fireworks AI Grammar Mode
2.1.4.3.3. Guidance
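
Grammar mode generalizes JSON mode: any context-free grammar can constrain decoding. A sketch with llama-cpp-python (2.1.4.3.1), assuming a local GGUF model file:

```python
# A GBNF grammar restricts the sampler so only strings the grammar
# accepts can be generated; here, literally just "yes" or "no".
from llama_cpp import Llama, LlamaGrammar

grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')
llm = Llama(model_path="model.gguf")  # placeholder path to a local GGUF model
out = llm("Is Paris in France? Answer yes or no: ", grammar=grammar, max_tokens=4)
print(out["choices"][0]["text"])  # either "yes" or "no", never anything else
```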
2.2. Advanced Techniques
2.2.1. Fine-tuning (training; hosted SFT job sketch below)
2.2.1.1. Training Objective
2.2.1.1.1. Alignment (e.g., making the model less harmful and more helpful), usually done through reinforcement learning
2.2.1.1.2. Model Compression through Distillation (teaching a small model to mimic a large one), usually done through SFT
2.2.1.1.3. New skill/specialization (usually through SFT)
2.2.1.2. Types
2.2.1.2.1. LLM Distillation
2.2.1.2.2. Reinforcement Learning
2.2.1.2.3. Supervised Fine Tuning
2.2.1.3. Tools / Libraries
2.2.1.3.1. Open Source
2.2.1.3.2. Fully Managed / SaaS
2.2.1.4. Data Creation Tools
2.2.1.4.1. Lilac
2.2.1.4.2. Tuna
2.2.1.4.3. Gretel
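
As referenced in 2.2.1, a minimal sketch of launching a hosted SFT job through the OpenAI fine-tuning API; the training file and base model are placeholders:

```python
# Upload a JSONL file of chat-formatted examples, then start the job.
from openai import OpenAI

client = OpenAI()
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id, job.status)  # poll the job until it finishes and returns a model ID
```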
2.2.2. Advanced RAG
2.2.2.1. Advanced RAG Tools
2.2.2.1.1. LlamaIndex
2.2.2.1.2. Langchain
2.2.2.1.3. Chroma
2.2.2.2. Evaluation Tools
2.2.2.2.1. LlamaIndex Built-in Evaluations
2.2.2.2.2. TruLens RAG Evaluations
2.2.2.2.3. Uptrain
2.2.2.2.4. DeepEval
2.2.2.2.5. TonicValidate
2.2.2.2.6. RAGAS
2.2.2.3. Indexing Techniques
2.2.2.3.1. Knowledge Graph Index
2.2.2.3.2. Hierarchical Indexing
2.2.2.4. Chunking Optimization (example below)
2.2.2.4.1. Recursive splitting
2.2.2.4.2. Semantic splitting
2.2.2.4.3. Layout-aware / Intelligent splitting
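
Recursive splitting (2.2.2.4.1), sketched with LangChain's RecursiveCharacterTextSplitter (the langchain-text-splitters package is assumed); the sizes and separators shown are illustrative, not recommendations:

```python
# Tries separators in order (paragraphs, then lines, then sentences,
# then words), so chunk boundaries follow the document's structure.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,     # max characters per chunk
    chunk_overlap=50,   # shared context between neighboring chunks
    separators=["\n\n", "\n", ". ", " "],
)
chunks = splitter.split_text(open("doc.txt").read())
```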
2.2.2.5. Retrieval Strategy
2.2.2.5.1. Vector Search
2.2.2.5.2. Hybrid Search
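
Hybrid search (2.2.2.5.2) fuses a keyword ranking with a vector ranking. A sketch of Reciprocal Rank Fusion, one common fusion rule; the doc IDs are made up:

```python
# Reciprocal Rank Fusion: each ranking contributes 1/(k + rank) per
# document; documents that rank well in both lists float to the top.
def reciprocal_rank_fusion(rankings, k=60):
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranked = ["doc3", "doc1", "doc7"]  # e.g. from BM25
vector_ranked = ["doc1", "doc5", "doc3"]   # e.g. from a vector index
print(reciprocal_rank_fusion([keyword_ranked, vector_ranked]))  # doc1, doc3 first
```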
2.2.2.6. Relevance improvement
2.2.2.6.1. Context Expansion
2.2.2.6.2. Response Re-Ranking (example below)
2.2.2.6.3. Embeddings Adapter
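
Response re-ranking (2.2.2.6.2), sketched with a sentence-transformers cross-encoder; the checkpoint named is one common public choice, not a requirement:

```python
# A cross-encoder scores each (query, passage) pair jointly, which is
# slower but more accurate than comparing precomputed embeddings.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "capital of France"
candidates = [
    "France exports wine.",
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]
scores = reranker.predict([(query, c) for c in candidates])
reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])  # "Paris is the capital of France."
```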
2.2.2.7. Query Improvement
2.2.2.7.1. Query Expansion
2.2.2.7.2. HyDE (Hypothetical Document Embeddings)
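
A sketch of HyDE (2.2.2.7.2), which embeds a model-written hypothetical answer instead of the raw query; ask_llm and retrieve_by_embedding are stand-ins for your generation client and vector store:

```python
# Embed a hypothetical answer written by the LLM instead of the raw
# query: its embedding usually lands closer to real answer passages.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def hyde_retrieve(question, ask_llm, retrieve_by_embedding, k=5):
    hypothetical = ask_llm(f"Write a short passage that answers: {question}")
    return retrieve_by_embedding(encoder.encode(hypothetical), k=k)
```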
3. Test/evaluate
3.1. Accuracy and helpfulness testing
3.1.1. Techniques
3.1.1.1. Task Specific Accuracy Metrics
3.1.1.1.1. BLEU/COMET score for translation
3.1.1.1.2. ROUGE score for summarization
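
Both metrics have standard reference implementations; a sketch assuming the sacrebleu and rouge-score packages:

```python
from sacrebleu import corpus_bleu
from rouge_score import rouge_scorer

# BLEU for translation: n-gram precision of candidates against references.
bleu = corpus_bleu(["the cat sat on the mat"], [["the cat is on the mat"]])
print(bleu.score)

# ROUGE-L for summarization: longest-common-subsequence overlap with a reference.
scorer = rouge_scorer.RougeScorer(["rougeL"])
print(scorer.score("the cat is on the mat", "the cat sat on the mat"))
```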
3.1.1.2. Human Evaluation
3.1.1.3. LLM-based Evaluation
3.1.1.4. General-purpose benchmarks
3.1.1.4.1. HELM
3.1.1.4.2. MMLU
3.1.2. Accuracy Testing Tools
3.1.2.1. Libraries
3.1.2.1.1. Giskard
3.1.2.1.2. LangSmith
3.1.2.1.3. TruLens
3.1.2.2. Repos
3.1.2.2.1. Applied LLM Benchmark
3.1.2.2.2. EvalPlus
3.1.2.2.3. OpenAI Evals
3.1.2.2.4. DeepEval
3.2. Safety testing
3.2.1. Vulnerability scanning / Red teaming
3.2.1.1. Giskard vulnerability scanner
3.2.2. Jailbreak testing
3.2.2.1. LastLayer
3.2.2.2. Geiger
3.2.3. Ethical (bias, toxicity, PII)
3.2.3.1. LLMGuard
4. Deployment / LLMOps
4.1. Model hosting
4.1.1. SaaS
4.1.1.1. Single Model
4.1.1.1.1. Azure OpenAI Service
4.1.1.1.2. OpenAI
4.1.1.2. Multi-Model
4.1.1.2.1. Amazon Bedrock
4.1.1.2.2. EdenAI
4.1.1.2.3. Groq
4.1.1.2.4. Lamini
4.1.2. Self-hosted
4.1.2.1. On-prem
4.1.2.1.1. LPU
4.1.2.1.2. GPU
4.1.2.2. Cloud
4.1.2.3. Tools
4.1.2.3.1. Ollama
4.1.2.3.2. vLLM (serving sketch below)
4.1.2.3.3. OpenLLM
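
A self-hosting sketch with vLLM's offline Python API (4.1.2.3.2); the model name is one example, and vLLM can alternatively expose an OpenAI-compatible HTTP server:

```python
from vllm import LLM, SamplingParams

# Loads the weights onto local GPUs; the same sampling knobs from
# 2.1.1 (temperature, top_p, top_k) apply here.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)
outputs = llm.generate(["Explain KV caching in one paragraph."], params)
print(outputs[0].outputs[0].text)
```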
4.2. Model optimization
4.2.1. Model Pruning
4.2.2. Quantization (example below)
4.2.3. Model distillation
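
Quantization (4.2.2), sketched as load-time 4-bit quantization with Transformers and bitsandbytes; the model name is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Weights are quantized to 4-bit NF4 at load time, cutting memory
# roughly 4x versus fp16 at a modest quality cost.
config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=config,
    device_map="auto",
)
```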
4.3. Ops Automation
4.3.1. Monitoring
4.3.1.1. TruEra LLM Observability Solution
4.3.1.2. DeepChecks LLM Evaluation
4.3.1.3. NeptuneAI
4.3.2. Model Update
4.3.2.1. KubeFlow
4.3.2.2. AWS Sagemaker