Local LLMs, Hardware, UI and Models


A place to discuss all things related to running LLMs locally.

Current hardware requirements and pricing, the best models to run, UIs and everything else.

To seed the topic, the video below compares the Mac M1 and M3 with the Nvidia 4090 running 7B, 13B and 70B models.

Also see the Mac M1/M2/M3 comparison chart for performance on Apple hardware, which seems to be the way to go at this time.


Can folk here who are running local LLMs share what systems you are running them on?
I just tried both `ollama run dbrx` and `ollama run wizardlm2:8x22b` and they are slow as can be; useless is a better word.
I didn't think my system was terrible, but it seems I need some serious upgrades if I want to run a decent LLM.

@dirvine pinging you because I know that you run these. :slight_smile:

I am running it on:

11th Gen Intel® Core™ i5-11600K × 12
AMD Radeon™ RX 6600

I want to have a great experience so if you are having one, tell me what I need to blow the wallet on.


I have 64GB of RAM on an Intel i7 CPU, so a decent machine, if a few years old.

I’ve tried a few models from 4GB to 40GB; all are too slow to be useful, and they get much slower with size. Because of that I didn’t evaluate much, but what I’ve tried so far hasn’t impressed me enough to think they’d help with what I do. I find web search, StackExchange and communities answer all my needs atm.

I’m waiting for those who think this is going to be useful in a few months to show what they have. I hear the words but they seem overly optimistic based on what I’ve tried, because to be useful this stuff needs to work on devices far inferior to my laptop.


I guess age comes into play here, especially for the CPU, but I expected a somewhat useful experience with what I have.

This is what I see online:

I know that the GPU I bought in the midst of the shortage for a million dollars needs replacing, and that is a bitter pill to swallow; still, if a GPU is optional I should get some benefit out of it.

I don’t have a GPU, but we’re talking about this being consumer-level, in theory “for everyone”, so that’s the context I use to judge whether this can deliver local AI as part of Autonomi’s fundamentals. Unless those have changed, it doesn’t seem credible. If the threshold is different it would be good to know, but IIRC we’ve been told this will work on mobile.


I see where you are coming from, definitely valid and crossed my mind.

Autonomi running LLMs does seem a stretch right now; that said, I am curious enough on a personal level to possibly upgrade my system to get a good experience.

I am confused by the system requirements I posted above; going by those, I should be getting the info I want faster than I could by driving to the library.


Aye, if you have the very latest mobile with zillions of RAM.

I find the small models somewhat usable with 32GB RAM and a two-year-old AMD Ryzen 5 3600 6-core processor, with a somewhat more elderly GeForce GTX 1650 SUPER.
The larger models will mostly run eventually, but so slowly as to be useless.

My prediction of “32GB RAM should be enough for anyone” has turned out to be kinda Gatesian.


Well, it is good to hear that I am not driving something so prehistoric that it is only me having this issue :slight_smile: I had not seen others complain, so I was thinking you all had the latest gaming rigs running these things in the background just for fun!

I don’t get it though; it runs like treacle, but when I look at system resources nothing is really struggling. :man_shrugging:


Same here, mystified.


I saw a post saying that if you can fit the entire LLM on your GPU it will be snappy. The models I am trying are 70GB; I am keen on a good user experience, but 24GB GPUs are eye-wateringly expensive and I’d need a mining rig to fit the full model :rofl:
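As a rough sanity check, you can estimate whether a model's weights fit in VRAM from the parameter count and the quantisation level. A back-of-the-envelope sketch (the 20% overhead factor is a crude assumption to cover the KV cache and activations, not an exact figure):

```python
def model_vram_gb(n_params_billion, bits_per_weight, overhead=1.2):
    """Rough VRAM needed to hold the weights, plus ~20% headroom
    for KV cache and activations (a crude rule of thumb)."""
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 70B model at FP16 vs 4-bit quantisation:
print(round(model_vram_gb(70, 16)))  # 168 GB, far beyond a 24GB card
print(round(model_vram_gb(70, 4)))   # 42 GB, still too big for one GPU
print(round(model_vram_gb(7, 4)))    # 4 GB, fits comfortably
```

Which is roughly why the 7B quantised models feel fine while anything 70B-class crawls unless it fits entirely in GPU memory.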


I run LM Studio on Windows: i9-13900K, 64GB RAM, MSI RTX 3090 Suprim X (24GB). The rig was mostly for audio, but I got the graphics card as a treat as I was mucking about with training some deep-learning models.

I’ve found LM Studio to be useful, as it can automatically download quantised models from Hugging Face. A guy called “TheBloke” takes large LLMs and quantises them for the good of the open-source community.

LM Studio is pretty user-friendly and can help set up an appropriate config for the model. It also allows you to offload some layers of the model to the GPU, and advises how many layers can be offloaded. Taking a 30B-parameter model and cramming as much of it as possible into the GPU took the average response speed from about a word per second to about 1.5 sentences per second.

Possibly not what people are looking for if they’re doing development, but really good for mucking about and experimenting. It can also spin up a server that surfaces an API you can call externally.
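That local server speaks the OpenAI-style chat-completions format, so calling it from a script is straightforward. A minimal sketch, assuming the server is running on its default port (1234, check LM Studio's server tab) with a model loaded:

```python
import json
import urllib.request

# LM Studio's local server exposes an OpenAI-compatible endpoint.
# Port 1234 is the default; adjust if you changed it.
API_URL = "http://localhost:1234/v1/chat/completions"

def build_payload(prompt, temperature=0.7):
    """Assemble an OpenAI-style chat request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask(prompt):
    """POST the prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# print(ask("Explain quantisation in one sentence."))  # needs the server running
```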

I found Code Llama to be OK; the 70B model performed worse (in terms of accuracy/results) than the smaller ones. I was amazed by the results of WizardCoder, worth a try (my use case is that I’m not a developer, and I benefit greatly from being signposted towards the right approach or approaches).


In short, what you are saying is I need a new system, not an upgrade :slightly_smiling_face:

Thank you @futuretrack, you have given me a wealth of information to go on there!
It seems that simple pull-and-play is not going to provide the experience I thought it would.

I need to curb my expectations and put a little more effort and resources in.


I think we are all going to have to wait for better hardware:

(further in the future):

and/or better LLM structure:

… and what I think would be the coup de grâce for speed and efficiency: stacking highly specialised (and so vastly smaller) LLMs, with a language-interpretation layer and a logic layer as the top two layers, wherein the logic layer can access any number of other specialised LLMs to get the skill or data it requires to serve the request.
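The "stacked specialists" idea above can be sketched as a dispatcher: a cheap routing layer classifies each request and hands it to a small specialised model. Everything below is hypothetical; the model names are made up, and the keyword routing is a stand-in for the proposed language/logic layers:

```python
# Hypothetical specialists; in a real system each entry would be a
# small, domain-tuned model rather than a lambda.
SPECIALISTS = {
    "code": lambda q: f"[code-7b] answer to: {q}",
    "maths": lambda q: f"[maths-3b] answer to: {q}",
    "general": lambda q: f"[chat-7b] answer to: {q}",
}

def route(query):
    """Stand-in for the 'logic layer': crude keyword routing
    instead of a real classifier model."""
    q = query.lower()
    if any(w in q for w in ("function", "compile", "bug")):
        return "code"
    if any(w in q for w in ("integral", "prove", "sum")):
        return "maths"
    return "general"

def answer(query):
    """Dispatch the request to whichever specialist the router picks."""
    return SPECIALISTS[route(query)](query)

print(answer("Why won't this function compile?"))  # dispatched to code-7b
```

The appeal is that each specialist could be small enough to fit in modest RAM, loaded on demand, instead of one behemoth model that has to know everything.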


Comes in two colors:




A box that runs 70B FP16 LLaMA-2 and Mixtral. $15,000 total price for red, $25,000 for green. Refundable until it ships.

See also:


We write and maintain tinygrad, the fastest growing neural network framework (over 23,000 GitHub stars)


Well, I may as well have tried on my Raspberry Pi then. :exploding_head: My PC is closer to a Pi than to a tinybox.

| Specification | Red Tinybox | Green Tinybox |
|---|---|---|
| TFLOPS | 738 FP16 | 991 FP16 |
| GPU RAM | 144 GB | 144 GB |
| GPU RAM Bandwidth | 5760 GB/s | 6050 GB/s |
| GPU Link Bandwidth | 6x PCIe 4.0 x16 | 6x PCIe 4.0 x16 |
| CPU | 32-core AMD EPYC | 32-core AMD EPYC |
| System RAM | 128 GB | 128 GB |
| System RAM Bandwidth | 204.8 GB/s | 204.8 GB/s |
| Disk Size | 4 TB RAID array + 1 TB boot | 4 TB RAID array + 1 TB boot |
| Disk Read Bandwidth | 28.7 GB/s | 28.7 GB/s |
| Networking | Dual 1 GbE + open x16 OCP 3.0 | Dual 1 GbE + open x16 OCP 3.0 |
| Noise | < 50 dB (31 low-speed fans) | < 50 dB (31 low-speed fans) |
| Power Supply | 2x 1600W | 2x 1600W |
| BMC | AST2500 | AST2500 |
| Operating System | Ubuntu 22.04 | Ubuntu 22.04 |
| Dimensions | 12U, 16.25" deep, 90 lbs | 12U, 16.25" deep, 90 lbs |
| Driver Quality | Mediocre | Great |
| Price | $15,000 | $25,000 |


I’ve been wondering for a while if anyone has made a wearable device that remaps the entire spectrum [1] into the visible spectrum. Like night-vision goggles, but for everything. And ideally it could view everything at once or be tuned to various frequency ranges.

Why? I think we might look at EMF radiation a lot differently if we could actually see the vibrating glow coming off the transmitters all around us: cell phones in our hands, wifi routers, cell towers, and so on. Some areas of the city would be very bright and other places dim. Living beings also emit frequencies/an aura that would be cool to look at. Fun teaching tool. Probably a thousand uses I’ve never thought of.

[1] Obviously with reasonable min and max cutoffs. VLF frequencies require miles long antennas, etc.


Honestly, it only makes sense for humans to view narrow ranges of the EMF spectrum at a time. Imagine choosing the WiFi band, especially the 5GHz WiFi band, and doing that in the city: it’d be a blur. 100kHz to 1GHz would also be a blur. Even infrared cameras collapse a large range of frequencies into a single signal.


LSD visuals without the “side effects” :smiley:


I feel as though I need to clarify that my opinion from yesterday has changed.

My issue appears to be that I went straight for the biggest models out there and had a terrible time.
This morning I am trying smaller models, with a far better experience.

I can run 7B models without issue. I have not tried 13B yet but suspect that will be OK.
With regard to personal AI on Autonomi, I would think the models run will simply need to be of a less adventurous size, not behemoth know-it-all models.


Not for everyone, which for me is a crucial point. Even small models are slow on my laptop, probably because I don’t have a GPU. Can any run on a phone as claimed, especially cheap phones?

I don’t doubt that these will run on some consumer devices, but we’re putting a lot on delivering personal AI for a project that has always been for everyone.

I don’t see that being achieved, so the question is: what is the target platform for this, and by when? And what level of accuracy/reliability and speed counts as ‘useful’?

These are important questions because we’re told this is going to be a big feature of the brand roll-out, enough for David to suggest we should stop thinking about providing a traditional filesystem or search, because LLMs would provide the UI and we’d not need that. Or at least almost everyone would just use an AI UI.

Now maybe he was talking longer term there but that wasn’t my impression because we were discussing mounting of a filesystem and he seemed to be suggesting I shouldn’t bother with that approach because it was soon to be obsolete.

At the moment it’s hard to know what the vision is for October (beyond the basics) and for the year or two following. Achievable timescales for things that Autonomi want to put front and centre will be important.