Local & Open Source AI · Posted by Raj Patel

Running Llama 3.1 405B on a Mac Studio: Complete Guide

Finally got Llama 3.1 405B running on my Mac Studio (M2 Ultra). Here's the full setup, quantization choices, and actual performance numbers.

GGUF has basically won the local model format war. Curious what everyone else thinks, and whether others have had similar results.
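For anyone wondering how a 405B model fits on a Mac Studio at all, the weights-only arithmetic is easy to sketch. This is a rough estimate assuming a dense model and counting only the weights (no KV cache, activations, or runtime overhead). `weight_gb` is a hypothetical helper, not part of any tool, and real GGUF quant types also store per-block scales, so their effective bits per weight sit somewhat above the nominal figure.

```python
# Rough weight-memory estimate for a dense model at a given bits-per-weight.
# Assumption: weights only. KV cache, activations, and runtime overhead are
# ignored, and nominal bit widths understate real quant formats slightly.


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB (1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Compare a few nominal bit widths for a 405B and a 70B model.
    for params in (405, 70):
        for bits in (16, 8, 4, 3, 2):
            print(f"{params}B @ {bits:>2}-bit: ~{weight_gb(params, bits):6.1f} GB")
```

At a nominal 4 bits the weights alone come to about 202.5 GB, which is already more than the 192 GB top unified-memory configuration of the M2 Ultra Mac Studio, so fitting 405B on that machine likely means going below 4 bits per weight. The same arithmetic puts a 70B model at roughly 35 GB at 4 bits.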

8 Replies

Honestly, yeah.

Sharing my experience here because I think it could help: GPU prices are still painful, but it's getting better. Took me way too long to figure this out on my own.

Huh, I never thought about it that way. My M3 Max handles 70B models surprisingly well.

Honestly, I've been going back and forth on this. The latency improvement from running locally is massive for real-time apps.

Does anyone know if this still works with the latest version, or has the API changed?

Can someone explain local AI a bit more? I'm not sure I fully get it.

OK, so I actually tested this pretty extensively last week, and here's what I found: quantized models have gotten insanely good. I barely notice the quality drop. For those cases, I had to modify the technique a bit. Happy to share details if anyone's interested.
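To make the "barely notice the quality drop" point a bit more concrete: quantization stores each weight as a small integer plus a scale shared across a block of weights. Below is a minimal, simplified sketch of symmetric 4-bit absmax block quantization in plain Python. It illustrates the general idea only and is not the actual GGUF/llama.cpp quant format; `BLOCK`, `quantize_block`, `dequantize_block`, and `roundtrip` are hypothetical names. It measures per-weight rounding error, not model quality.

```python
# Simplified symmetric 4-bit block quantization, for illustration only.
# Assumption: NOT the exact GGUF/llama.cpp scheme; it just shows the basic
# idea of storing small integers plus one scale per block.
from typing import List, Tuple

BLOCK = 32  # hypothetical block size; each block shares one scale


def quantize_block(xs: List[float]) -> Tuple[float, List[int]]:
    """Map floats to ints in [-7, 7] using one absmax scale for the block."""
    amax = max((abs(x) for x in xs), default=0.0)
    scale = amax / 7 if amax > 0 else 1.0  # avoid dividing by zero
    qs = [max(-7, min(7, round(x / scale))) for x in xs]
    return scale, qs


def dequantize_block(scale: float, qs: List[int]) -> List[float]:
    """Recover approximate floats from the stored ints and the block scale."""
    return [q * scale for q in qs]


def roundtrip(xs: List[float]) -> List[float]:
    """Quantize and dequantize a flat list of weights block by block."""
    out: List[float] = []
    for i in range(0, len(xs), BLOCK):
        scale, qs = quantize_block(xs[i:i + BLOCK])
        out.extend(dequantize_block(scale, qs))
    return out


if __name__ == "__main__":
    import random

    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(1024)]
    approx = roundtrip(weights)
    max_err = max(abs(a - b) for a, b in zip(weights, approx))
    print(f"max abs error: {max_err:.5f}")
```

Each weight's round-trip error is at most half its block's scale, so blocks of small, similar-magnitude weights lose very little, while a single large outlier inflates the scale, and therefore the error, for everything in its block.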

OK, real talk: GGUF has basically won the local model format war. I know that's not the popular opinion here, but someone had to say it.