For years I’ve had a dream of building a rack-mounted PC capable of splitting its resources to host multiple GPU-intensive VMs:
- a few gaming VMs
- a VM for work that can run DaVinci Resolve and Blender renders
- an LLM server
- a Stable Diffusion server
- a media server
Just to name a few possibilities…
Every time I’ve looked into it, it seemed like the technology just wasn’t there yet. I remember a few years ago Linus Tech Tips took a shot at it, but in the end suggested the technology (for non-commercial entities) just wasn’t in a comfortable spot yet.
So how far off are we? Obviously AI-focused companies seem to make it work, but what possibilities exist for us self-hosters who might also want to run multiple displays in addition to LLM servers with a web GUI? And without forking out crazy money for GPU virtualization software licenses?
You can use Proxmox to do most of this. Currently my setup only passes the GPU through to one VM at a time. I have heard of splitting the GPU's power among VMs, but I have not gone down that rabbit hole. If I want to play with LLMs, I fire up that server; if I want to game, I shut that down and fire up my Windows 10 VM.
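A minimal sketch of that switch-over, assuming it runs on the Proxmox host where the `qm` CLI is available. The VM IDs 101 and 102 and the function names are hypothetical placeholders, not taken from the setup described above. By default it only prints the commands.

```python
import subprocess

LLM_VMID = 101      # hypothetical ID of the LLM server VM
GAMING_VMID = 102   # hypothetical ID of the Windows 10 gaming VM


def switch_commands(stop_vmid, start_vmid):
    """Return the qm commands that hand the GPU from one VM to another."""
    return [
        ["qm", "shutdown", str(stop_vmid)],  # ask the VM holding the GPU to shut down
        ["qm", "start", str(start_vmid)],    # then start the VM that should get the GPU
    ]


def switch(stop_vmid, start_vmid, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for cmd in switch_commands(stop_vmid, start_vmid):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if qm reports an error


if __name__ == "__main__":
    switch(LLM_VMID, GAMING_VMID)  # dry run: only prints the two commands
```

Calling `switch(LLM_VMID, GAMING_VMID, dry_run=False)` would actually run the commands. The GPU can only be attached to one running VM, so the first VM has to be down before the second one starts.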
Proxmox has VirGL GPU and VirtIO-GPU display options. They let VMs pass work to the host GPU without the GPU being dedicated to a single VM. I don’t think gaming was the intended use case, and I don’t know what kind of performance you would get. My uninformed guess is that it would not be great.
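For anyone who wants to try it, a rough sketch of selecting the VirGL display for a VM from the host. It assumes a Proxmox VE release that accepts `virtio-gl` as a value for the VM's `vga` option and a host with the OpenGL libraries it needs; check both against the documentation for your version. The VM ID 103 and the function names are hypothetical.

```python
import subprocess

VMID = 103  # hypothetical ID of the VM that should share the host GPU


def virgl_command(vmid):
    """Return the qm command that selects the VirGL display for a VM."""
    # Assumption: this Proxmox VE version accepts "virtio-gl" as a vga type.
    return ["qm", "set", str(vmid), "--vga", "virtio-gl"]


def apply_virgl(vmid, dry_run=True):
    """Print the command; run it only when dry_run is False."""
    cmd = virgl_command(vmid)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises if qm reports an error


if __name__ == "__main__":
    apply_virgl(VMID)  # dry run: only prints the command
```

The change is to the VM's display device, so it would normally take effect the next time the VM is started.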
I’ve always found the documentation around VirtIO-GPU and VirGL very lacking, and have never gotten them working. Would love to get pointers if anyone has a good source.