Tesla P40 in a Homelab: 24GB of Inference on a Budget
Running a Tesla P40 for LLM inference. Why I ditched GPU passthrough for host-level drivers to stop the constant Proxmox crashes.
4 posts
How to build a routing layer for AI agents that ensures sensitive data stays on local hardware while leveraging cloud LLMs for non-private tasks.
Moving beyond prompt engineering to implement token-level schema enforcement, pre-execution gates, and shell-safe execution pipelines for AI agents.
Deploying Ollama on Kubernetes can lead to GPU deadlocks. Here's how to avoid them.