My Ollama Setup Runs 7 Models Simultaneously on \$2000 Hardware
-3
built a dedicated inference box for under $2k. it runs 7 different models concurrently. full parts list and config
the main thing i've noticed is my M3 Max handles 70B models surprisingly well
am i the only one who thinks this?
5 replies
6
wait really? i run everything through LM Studio now, way simpler than the command line
11
lmao i literally ran into this same problem yesterday. Ollama makes running local models so easy now
2
can confirm this works. VRAM is the real bottleneck. you need at least 24GB for serious work
4
I actually wrote something about this a few weeks ago. the key insight for me was that Ollama makes running local models so easy now. it changed how i think about the whole thing
just tried this and yeah it works. the privacy benefits alone make local worth it for my use case