Local LLM Llamafile

Try a local LLM using Mozilla’s Llamafile:

  1. Get the 4.3 GB file:
    curl -LO https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4-server.llamafile

  2. Make it executable.

  3. Run it. If you get errors at this point, see this post.

  4. Visit the web UI in your browser.
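The steps above, pulled together as one shell sketch. The chmod line and the localhost URL are not spelled out in the list; they are inferred from the filename in step 1 and the port-8080 server log quoted later in this thread:

```shell
# 1. Get the 4.3 GB file
curl -LO https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4-server.llamafile

# 2. Make it executable
chmod +x llava-v1.5-7b-q4-server.llamafile

# 3. Run it (starts a local web server)
./llava-v1.5-7b-q4-server.llamafile

# 4. Visit the web UI -- port 8080, per the server log later in the thread
#    e.g. open http://localhost:8080 in a browser
```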

H/t Simon Willison; more here: llamafile is the new best way to run an LLM on your own computer


It actually works pretty well. I got it a few days ago and it worked out of the box. Merging many of these models looks like it will be doable soon, too.


Have you asked it to repeat a word forever yet? I’m still downloading, but it’s the first thing I’m gonna try :imp:


No, not yet. It’s quite a good model, though, and helpful.


I think my favourite thing about llamafile is what it represents. This is a single binary file which you can download and then use, forever, on (almost) any computer.

Except for mine apparently (Linux Mint) :frowning_face:

user@dell:~/Documents/Llama$ ./llava-v1.5-7b-q4-server.llamafile 
run-detectors: unable to find an interpreter for ./llava-v1.5-7b-q4-server.llamafile

Sweary words…
Looks like I need to learn to get CUDA set up correctly first.

llama_new_context_with_model: compute buffer total size = 162.63 MB
Available slots:
 -> Slot 0 - max context: 2048

llama server listening at

*** stack smashing detected ***: terminated
Terminating on uncaught SIGSEGV. Pass --strace and/or ShowCrashReports() for details.

Try wine ./llava-v1.5-7b-q4-server.llamafile and then see what errors you get.

I get “you need to install wine…”



I went over to the dark side a few years back and moved to macOS. This stuff all works on Apple Metal, which does make things much simpler.


Well, that worked, more or less. Thanks @Southside

Some issue with the webserver though

failed to open in a browser tab using /c/windows/explorer.exe: Error 1359 (win32 error 6)
loading weights...
HTTP server listening","hostname":"","port":8080}

After getting into nvidia driver hell, I managed to make some progress. Still not seeing anything at localhost:8080 but the error msgs are starting to make more sense.

The GitHub repo (Mozilla-Ocho/llamafile: Distribute and run LLMs with a single file) may help.

EDIT: sorry, forget about wine.

./llava-v1.5-7b-q4-server.llamafile worked for me eventually, after a LOT of faffing with NVIDIA drivers. I have a GTX 1650, which was once upon a time a pretty good card.

It’s working out of the box on Ubuntu :man_shrugging:


Working now, but I had to bash the box into shape a wee bit.


It’s no use at counting people. Dunno where it gets 12 from.

Scene from the wedding of an ex-colleague of @JimCollinxson btw

EDIT: It’s still inventing people. I asked it to identify the car in the photo below.

User: identify this vehicle

Llama: The vehicle in the image is a red car, possibly an old-style sports car. It appears to be driving on dirt or gravel and has been described as being “in motion.” This suggests that it may have been racing or participating in some sort of off-road event. Additionally, there are two people visible near the vehicle; one person is closer to the left side while another person can be seen further back towards the right side of the image. The scene captures an exciting moment as the car navigates through its environment.


Their names being “A” and “B” LOL

I still cannot find the guy holding his Tie up to show it off


“All work and no play makes JackGPT a dull AI.”

Be careful with that! :laughing:

On some Linux systems, you might get errors relating to run-detectors or WINE. This is due to binfmt_misc registrations. You can fix that by adding an additional registration for the APE file format llamafile uses:

sudo wget -O /usr/bin/ape https://cosmo.zip/pub/cosmos/bin/ape-$(uname -m).elf
sudo chmod +x /usr/bin/ape
sudo sh -c "echo ':APE:M::MZqFpD::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"
sudo sh -c "echo ':APE-jart:M::jartsr::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"
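As a quick sanity check (my own sketch, not from the original post), you can confirm a downloaded file really is an APE binary by looking for the same magic strings those registrations match on, "MZqFpD" and "jartsr":

```shell
# is_ape FILE -- succeeds if FILE starts with one of the APE magic
# strings the binfmt_misc registrations above key on.
is_ape() {
  head -c 6 "$1" | grep -q -e '^MZqFpD' -e '^jartsr'
}
```

For example, is_ape ./llava-v1.5-7b-q4-server.llamafile should succeed for a complete download.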


That works. Thanks @TylerAbeoJordan !


For some reason the server does not seem as good as the CLI. I have not tried images, but that part is not so great. However, the difference between nearly there and beyond belief is just more compute in training, and that is the amazing part.


It sees dead people :skull:
