Local & Open Source AI · Posted by Raj Patel · 3mo ago

Running Llama 3.1 405B on a Mac Studio: Complete Guide

finally got llama 3.1 405b running on my mac studio m2 ultra. here's the full setup, quantization choices, and actual performance numbers
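for anyone sizing this up before downloading, here's a rough back-of-the-envelope sketch of how much memory the weights alone need at different quantization levels. the function name and the bits-per-weight values are my own illustrative assumptions, not exact figures for any particular gguf quant type, and it ignores the kv cache and runtime overhead, so treat the results as a lower bound:

```python
# Rough estimate of the memory needed for model weights only.
# Assumption: every parameter is stored at the same average bits-per-weight.
# Real quantized files mix precisions and add metadata, so actual sizes differ.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 405e9  # Llama 3.1 405B parameter count

if __name__ == "__main__":
    # Illustrative bit widths only; not tied to specific quantization schemes.
    for bits in (16, 8, 4, 3, 2):
        size = weight_memory_gb(N_PARAMS, bits)
        print(f"{bits:>2} bits/weight: ~{size:.1f} GB for weights alone")
```

compare the number against your machine's unified memory and leave headroom for context and the os.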

curious what everyone else thinks. gguf format basically won the local model format war

curious if others have had similar results

8 replies

3mo ago

honestly yeah

2mo ago

sharing my experience here because i think it could help: gpu prices are still painful but it's getting better. took me way too long to figure this out on my own

3mo ago

huh i never thought about it that way. my m3 max handles 70b models surprisingly well

3mo ago

honestly i've been going back and forth on this. the latency improvement from local is massive for real-time apps

2mo ago

anyone know if this still works with the latest version, or has the api changed?

2mo ago

can someone explain local ai a bit more? i'm not sure i fully get it

2mo ago

ok so I actually tested this pretty extensively last week and here's what I found: quantized models have gotten insanely good. barely notice the quality drop. for those cases I had to modify the technique a bit. happy to share details if anyone's interested

2mo ago

ok real talk: gguf format basically won the local model format war. i know that's not the popular opinion here but someone had to say it