remote-ollama-gpu-scheduler

Remote Ollama GPU Scheduler

A skill for efficiently scheduling remote Ollama GPU compute for batch embedding.

Core problem

Node.js fetch does not respect NO_PROXY, so Tailscale traffic is intercepted by the local proxy.

Architecture options

Option 1: Ollama Remote Backend (stable but slow)

Configuration:

Performance:

- 0.6b: 2.5 chunks/s (1024-dim)
- 8b: 1.1 chunks/s (4096-dim)
- 73k chunks: 8h (0.6b) / 19h (8b)

Option 2: llama-server Backend (fast but unstable) ⚠️

Configuration:

Known issues:

- llama.cpp b8352 + Qwen3-0.6b embedding crashes at parallel 1
- Error:
- Workaround: use or switch to llama.cpp b8200

Mac Studio llama-server launch command

Performance benchmarks (Mac Studio M4 Max)

| Backend | Model | Parallelism | Speed | Dimensions | Stability |
|------|------|--------|------|------|--------|
| Ollama | qwen3:0.6b | 1 | 2.5 c/s | 1024 | ✅ |
| Oll…