
Today I'm playing with my Raspberry Pi 5 16GB, and the new-fangled "AI Hat+ 2".

Category: Uncategorized
Tags: raspberrypi, linux, docker, ollama
8 Posts, 3 Posters, 31 Views
#1 Owl Eyes wrote:

    Today I'm playing with my Raspberry Pi 5 16GB, and the new-fangled "AI Hat+ 2". I don't really like docker, but will play along, following the documentation provided, to get a nice "Open WebUI" web interface for the chats:

    https://www.raspberrypi.com/documentation/computers/ai.html#step1-llm

    I'm curious if any of the models they provide are any good:
    - "deepseek_r1_distill_qwen:1.5b"
    - "llama3.2:3b"
    - "qwen2.5-coder:1.5b"
    - "qwen2.5-instruct:1.5b"
    - "qwen2:1.5b"

    Can anyone vouch for these models?

    #RaspberryPi #Linux #AI #docker #ollama #OpenSource
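Worth knowing if you end up not loving the docker route: once the containers are up, ollama answers plain HTTP on localhost:11434, so you can script against it without touching Open WebUI at all. A minimal sketch in Python, assuming the default port and one of the model tags from the list above:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ollama's default port

def build_generate_request(model: str, prompt: str) -> dict:
    """Payload for ollama's /api/generate; stream=False yields one JSON reply."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """POST a prompt and return the model's text. Needs the ollama server running."""
    data = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# usage, once the server is listening:
#   ask("llama3.2:3b", "Why is the sky blue? One sentence.")
```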

#2 Henri wrote:

      @d1 they are basically toy models

#3 Owl Eyes wrote:

        @slyecho woe is me, who spent too much on this toy

#4 Henri wrote:

@d1 well, nothing wrong with toys. I also have a lot of SBCs and Raspberry Pi stuff to play with. But yeah, these models can probably run on the CPU too.

#5 Owl Eyes wrote:

@slyecho I was hoping that the "whisper" audio-to-text models would gain hardware acceleration for the Hailo NPU (on that AI Hat+ 2), but alas, there's no mention of Hailo/Raspberry Pi hardware yet, here:

            https://github.com/ggml-org/whisper.cpp

            #RaspberryPi #LLM #AI #whisper
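Until a Hailo backend shows up, whisper.cpp runs CPU-only on the Pi. A hedged sketch of driving it from Python, assuming a stock `cmake -B build && cmake --build build` checkout (binary and model paths follow the repo's README layout, so double-check against your build):

```python
import subprocess
from pathlib import Path

def whisper_cli_cmd(repo: Path, model: str, wav: Path, threads: int = 4) -> list:
    """Argv for whisper.cpp's CLI doing CPU-only transcription.

    With no Hailo backend, the thread count (-t) is the main speed
    knob on a Pi 5 (4 cores).
    """
    return [
        str(repo / "build" / "bin" / "whisper-cli"),
        "-m", str(repo / "models" / f"ggml-{model}.bin"),
        "-f", str(wav),
        "-t", str(threads),
    ]

# usage, after downloading a ggml model into the checkout:
#   subprocess.run(whisper_cli_cmd(Path("whisper.cpp"), "base.en", Path("talk.wav")))
```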

#6 Anthropy wrote:

              @d1 tiny models are fun to play with and can do basic things, but I'd hardly qualify a 1.5-3b parameter model as a "large" language model. It starts getting interesting at like 30-70b+, or 400b+ if you have the VRAM for it. Stuff like ChatGPT5.2 and Gemini3 Pro and such are trillions and trillions of parameters. Gemini 3 Flash is ~30b ish if I recall correctly.

That said, you should definitely try playing with them; they can still do basic things in small contexts.
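The back-of-envelope arithmetic behind those cutoffs: weight memory is roughly parameters times bits-per-weight divided by eight, and ollama's default quantisations sit around 4-bit. A quick sketch (decimal GB, ignoring KV cache and runtime overhead):

```python
def model_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough memory for the weights alone; KV cache and overhead come on top."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# llama3.2:3b at ~4-bit: model_weight_gb(3, 4) -> 1.5 GB, easy on a 16 GB Pi
# a 70b model at 4-bit:  model_weight_gb(70, 4) -> 35.0 GB, not happening
```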

#7 Owl Eyes wrote:

                @anthropy thanks for the consolation

#8 Anthropy wrote:

@d1 IIRC people use these kinda models for things like Home Assistant automation, so you can tell a voice assistant to turn on/off lights and such. In tiny contexts like that they seem to work fine, although I personally haven't tried that yet.
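That use case matches the pattern of keeping the task tiny and forcing structured output. A sketch of what such a light-control request could look like against ollama's /api/chat; the entity schema and system prompt are made up for illustration (not Home Assistant's actual API), while "format": "json" is ollama's built-in JSON mode:

```python
def light_intent_payload(model: str, utterance: str) -> dict:
    """Chat payload that asks a small model for a single JSON intent.

    The intent schema here is a made-up example for illustration.
    """
    system = (
        "You control smart lights. Reply with only JSON shaped like "
        '{"action": "turn_on", "entity": "kitchen"}; action may also be "turn_off".'
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": utterance},
        ],
        "format": "json",  # ollama constrains the reply to valid JSON
        "stream": False,
    }

# POST this to http://localhost:11434/api/chat and json.loads() the message
# content; a 1.5b model copes because the whole answer is a few tokens.
```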
