Local RAG with llama.cpp on Nvidia and LlamaIndex
I already wrote up
how I run RAG over my own documents on an AMD Ryzen AI box using
Lemonade. This post covers the same setup on an Nvidia laptop. Lemonade doesn’t
work with Nvidia’s proprietary drivers, so on this machine I drive three
llama-server processes directly instead. LlamaIndex still lives in a
libvirt VM and talks to them over virbr0.
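
A rough sketch of what launching those three servers might look like on the host. The model paths, ports, GPU-layer counts, and the chat/embedding/reranking split are assumptions for illustration, not the exact values from this setup; binding to 192.168.122.1 (the host side of libvirt's default virbr0 bridge) is what lets the VM reach them.

```shell
# Hypothetical model paths and ports -- adjust to your own layout.
# -ngl 99 offloads all layers to the Nvidia GPU via CUDA.
llama-server -m /models/chat.gguf   --host 192.168.122.1 --port 8080 -ngl 99 &
llama-server -m /models/embed.gguf  --host 192.168.122.1 --port 8081 -ngl 99 --embedding &
llama-server -m /models/rerank.gguf --host 192.168.122.1 --port 8082 -ngl 99 --reranking &
```

From inside the VM, LlamaIndex then points its OpenAI-compatible clients at `http://192.168.122.1:8080/v1` and the corresponding ports for embeddings and reranking.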