
This is the audio breakdown of our highly requested GPU benchmark: Ollama versus Llama.cpp for local Large Language Model (LLM) inference on the AMD Instinct MI60. We test a single, massive 70-billion-parameter model to measure which framework delivers the highest tokens per second (t/s), and the results aren't what you'd expect! If you're running local AI or looking at the AMD ROCm stack for performance, you need to hear this speed comparison. We discuss the granular control Llama.cpp gives you versus the ease of use of Ollama, and walk through exactly how to compile for peak efficiency (sketched below).
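For reference, here's a minimal sketch of the ROCm build, assuming a recent llama.cpp checkout (the CMake option names have changed across versions; older trees used LLAMA_HIPBLAS where current ones use GGML_HIP) and targeting gfx906, the MI60's GPU architecture. The model filename is a placeholder; check the repo's build docs for your version:

# Build llama.cpp with ROCm/HIP acceleration for the MI60 (gfx906)
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx906 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j $(nproc)

# Benchmark a 70B GGUF with all layers offloaded to the GPU
./build/bin/llama-bench -m ./models/llama-70b-q4_k_m.gguf -ngl 99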
For the full video and visual benchmark tables: https://youtube.com/live/CRqHIVR6PDk
Related: https://www.ojambo.com/web-ui-for-ai-deepseek-r1-32b-model
#AI #LLM #AMD #Ollama #LlamaCPP #ROCm #GPU #Benchmarking #LocalAI #TechPodcast