LLM Integration

API Client Pattern

Streaming Responses

Function Calling (Tool Use)

RAG Pipeline

Document Chunking

Cost Optimization

Use the smallest model that achieves acceptable quality. Cache embeddings and responses where possible. Batch requests when latency is not critical.

Anti-Patterns

- Sending entire documents when only relevant chunks are needed
- Not implementing retry logic with exponential backoff for API calls
- Ignoring token usage tracking (leads to unexpected costs)
- Using the most expensive model for simple classification tasks
- Not validating or sanitizing LLM output bef…
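
One anti-pattern named above is calling an API without retry logic and exponential backoff. The following is a minimal sketch of that technique, not tied to any particular provider SDK. `TransientAPIError`, `backoff_delays`, and `call_with_retry` are hypothetical names introduced here for illustration, and the use of "full jitter" (sleeping a random time up to the backoff bound) is an assumption rather than something the text prescribes. In real code the `except` clause would catch whatever rate-limit or timeout exceptions your client library raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientAPIError(Exception):
    """Hypothetical stand-in for a retryable failure (rate limit, timeout, 5xx)."""


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Upper bound for each wait: base * 2**attempt, capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_retry(
    fn: Callable[[], T],
    max_retries: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on TransientAPIError, wait with jitter and try again.

    Makes at most max_retries + 1 attempts in total.
    """
    for bound in backoff_delays(max_retries, base, cap):
        try:
            return fn()
        except TransientAPIError:
            # Full jitter: sleep a random duration in [0, bound] so that
            # many clients failing at once do not retry in lockstep.
            sleep(random.uniform(0.0, bound))
    # Final attempt: let any error propagate to the caller.
    return fn()
```

A call site might look like `call_with_retry(lambda: client_call(prompt))`, where `client_call` is a placeholder for your own request function. The `sleep` parameter is injectable so that tests can record the waits instead of actually pausing.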