Local & Open Source AI · Posted by Suki Watanabe · 2mo ago

My Ollama Setup Runs 7 Models Simultaneously on \$2000 Hardware

-3

built a dedicated inference box for under 2k. runs 7 different models concurrently. full parts list and config

the main thing ive noticed is my m3 max handles 70b models surprisingly well

am i the only one who thinks this?

5 replies

5 Replies

2mo ago

just tried this and yeah it works. the privacy benefits alone make local worth it for my use case

2mo ago

wait really? i run everything through lm studio now, way simpler than command line

2mo ago

lmao i literally ran into this same problem yesterday. ollama makes running local models so easy now

2mo ago

can confirm this works. vram is the real bottleneck. you need at least 24gb for serious work

2mo ago

I actually wrote something about this a few weeks ago. the key insight for me was ollama makes running local models so easy now. changed how i think about the whole thing