Local LLMs, Hardware, UIs and Models


The M3 comes to the Mac Studio in June (so say the internet rumors).
It looks like you should be able to get a beastly machine for around 3k…ish.
That assumes it retails near the current M2 models.

Still not cheap but for now it looks like the best deal for hardware that can do the job.

This is the current M2 version for $2999

2 Likes

But at least we’re getting to a similar area there…

130 GB/s (the Snapdragon laptop)

vs 100-400 GB/s (the Mac, depending on the chip)

So depending on your configuration the Mac could be slower and just support smaller models, I assume… since the main limiting factor seems to be the memory bandwidth :upside_down_face: :face_with_monocle:

From what I’ve read, the price for such a laptop is expected to be around 2k as well… That would be a 30-50% discount versus the Apple hardware at somewhat similar performance… Still not cheap, but at least a bit less excessive… But we’ll see…
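
A rough way to see why that memory figure matters so much: for dense models, every weight has to be streamed from memory for each generated token, so bandwidth divided by model size gives a ceiling on tokens per second. Here’s a minimal back-of-envelope sketch; the bandwidth and model-size numbers are just illustrative assumptions, not benchmarks:

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed for a dense, memory-bandwidth-bound LLM.

    Assumes every weight is read from memory once per generated token,
    the usual rule of thumb for single-stream local inference.
    """
    return bandwidth_gb_s / model_size_gb

# Illustrative numbers only: a ~130 GB/s laptop vs. a ~400 GB/s Mac,
# both running a ~26 GB quantized model.
for name, bw in [("~130 GB/s laptop", 130), ("~400 GB/s Mac", 400)]:
    print(f"{name}: at most ~{max_tokens_per_sec(bw, 26):.0f} tokens/s")
```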

4 Likes

But the limitation is 64 GB of memory. Why such a silly cap for a product claiming to be designed with AI in mind?

I still think the Mac Studio, for a grand more, is better value then.

I will start a dedicated thread on hardware and UIs. We have hijacked a thread not necessarily intended for these discussions.

What we are discussing is important and very relevant considering Autonomi’s AI ambitions so a dedicated thread is needed.

Edit: could the @moderators please move the relevant part of this thread to the new thread I just created.

8 Likes

Which is enough to run even the Q2 quant of DBRX

So yes, for sure not exactly as powerful as the Mac, but (together with an NVMe disk and the speed to swap out the models residing in RAM) already enough to get super powerful models onto your PC…

And I think this is where @dirvine’s half-a-year head start comes in… you’d already get an excellent Mixtral at 26 GB to write mails for you (even in German, not only good in English) or to do stuff in pretty good quality… I think that hardware would be sufficient a year from now to do pretty powerful stuff…
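
If anyone wants to try that today, one common route is a GGUF quant of Mixtral through llama.cpp’s Python bindings. A minimal sketch, assuming you’ve already downloaded a ~26 GB quant; the file name and the prompt are placeholders, not a recommendation:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct.Q4_K_M.gguf",  # hypothetical file name for a ~26 GB quant
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU/Metal if they fit; lower this otherwise
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Schreibe eine kurze, höfliche E-Mail an einen Kunden."}],
)
print(out["choices"][0]["message"]["content"])
```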

Edit/PS: oh and for Germans, the Mac Studio with 96 GB is 3.6k euros… 3.8k USD… So we’re probably talking more about getting two of the other machines for the price of one Studio… Nonetheless the Apple hardware is for sure pretty awesome and way more desirable :man_shrugging:

PPS: plus we’re comparing a laptop and a Studio here… The fairer comparison would be with a MacBook of those specs :wink: which is 4.7k euros right now… 5k USD… for us here in Germany at least…

4 Likes

And I wouldn’t be surprised if some Snapdragon X ultra laptops with 64 GB of RAM were available for under $1k, so it may be 1/4 the price of an equivalent MacBook.

While of course the quality of the hardware won’t be equal to a MacBook, it’ll be a much better entry price for running some serious LLMs.

2 Likes

While of course the higher-spec MacBooks will be way more powerful…

If you followed @Josh’s link, the tokens-per-second spread on Macs is very impressive.

So if you want a very good experience with local LLMs for less than the price of a car… it will still be the MacBook (/Studio).

… The Snapdragon will just be an actual option to run LLMs locally at usable speeds without going broke, I guess…
(And don’t let the relatively high tokens-per-second numbers fool you… they’re with a Llama 7B quantized down to 3.5 GB… so one of the smallest LLMs… Those numbers will go down a lot for larger models… and you probably don’t want to be in the lower-left corner…)
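
To put rough numbers on that, using the same bandwidth-per-token rule of thumb as the sketch earlier (pure upper bounds, not benchmarks): at ~130 GB/s, a 3.5 GB quant tops out around 37 tokens/s, while a ~26 GB Mixtral-class quant tops out around 5 tokens/s, before any other overhead.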

3 Likes

This is worth posting here as a reference: the 1-bit way of running LLMs on resource-constrained systems.

7 Likes

Credit to @TylerAbeoJordan for pointing this out on another thread: What’s up today? (Part 2) - #2496 by TylerAbeoJordan

5 Likes

BitNet, the first 1-bit LLM breakthrough paper: https://arxiv.org/pdf/2310.11453.pdf

An excerpt on the MS 1-bit paper from the TheAIDream site (https://www.theaidream.com/post/exploring-1-bit-llms-by-microsoft):

Significance of BitNet b1.58

This advancement holds groundbreaking significance for various reasons:

  1. Cost and Energy Efficiency: BitNet b1.58 achieves a paradigm shift by reducing the precision of weights to 1.58 bits. This substantial reduction leads to a drastic decrease in the energy and computational costs associated with operating Large Language Models (LLMs), establishing BitNet b1.58 as a more sustainable and efficient option.

  2. Model Performance: Despite the reduction in bit representation, BitNet b1.58 not only matches but often surpasses the performance of full-precision LLMs in terms of perplexity and task-specific metrics, especially when starting from a 3 billion model size.

  3. Scalability and Future Applications: BitNet b1.58 showcases outstanding scalability, paving the way for future applications. Its diminished computational requirements enable the deployment of more sophisticated AI models on edge and mobile devices, expanding the realm of possibilities for AI in various domains.
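
For a concrete sense of what “reducing the precision of weights to 1.58 bits” means, here is a rough sketch of the absmean quantization described in the b1.58 paper: each weight matrix is scaled by its mean absolute value, then rounded and clipped to {-1, 0, +1}. This is my own reading of the paper, not Microsoft’s reference code, and the demo values are arbitrary:

```python
import numpy as np

def absmean_ternary_quant(W: np.ndarray, eps: float = 1e-6):
    """Quantize a weight matrix to {-1, 0, +1} in the style of BitNet b1.58.

    Each weight is divided by the mean absolute value of the matrix,
    then rounded and clipped to the ternary set; the scale is returned
    so activations can be rescaled after the (multiplication-free) matmul.
    """
    scale = np.mean(np.abs(W)) + eps
    w_ternary = np.clip(np.round(W / scale), -1, 1).astype(np.int8)
    return w_ternary, scale

# Tiny demo with random weights
W = np.random.randn(4, 4).astype(np.float32)
W_q, s = absmean_ternary_quant(W)
print(W_q)  # entries are only -1, 0 or +1; log2(3) ≈ 1.58 bits of information each
```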

5 Likes

I’ve been wondering about this “a bit” :laughing:. It seems that the computational cost isn’t the bottleneck at the moment - it’s memory bandwidth - and I’m curious whether the 1-bit models are actually much larger in size and so need more memory and more bandwidth than existing models. Do you have any thoughts on that?

Looking at the paper, I found a graph (loss vs. model size) which seems to indicate that for the same degree of loss, the 1-bit models would need to be larger in size.

Given the memory bottlenecks with current hardware, I suspect that 1-bit models may actually be slower - although more efficient.

This makes sense to my mind: the current models are more information-dense than 1-bit models, but that comes at a higher computation cost - to sort of decompress the info - yet that higher density maybe leads to lower memory usage.

Then again, there may be some other factors here that I’m missing.
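
One way to put rough numbers on the footprint question (my own back-of-envelope, not from the paper): bits per weight dominate, so even if a ternary model needs more parameters to reach the same loss, it starts from a much smaller per-parameter cost.

```python
# Back-of-envelope weight storage for an illustrative 7B-parameter model.
# Ignores activations, KV cache and packing overhead.
bits_per_weight = {"fp16": 16, "int4 quant": 4, "b1.58 ternary (2-bit packed)": 2}
params = 7e9

for name, bits in bits_per_weight.items():
    print(f"{name}: ~{params * bits / 8 / 1e9:.1f} GB of weights")
```

That works out to roughly 14 GB, 3.5 GB and 1.75 GB of weights respectively, so whether a 1-bit model actually ends up needing more memory and bandwidth comes down to how many extra parameters it needs to match the loss of a higher-precision model.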

3 Likes

The BitNet b1.58 model has an 8x or more smaller memory requirement versus conventional 32-bit LLMs, and as such is better able to deliver responses to complex engineered prompts using fewer cores at the same memory bandwidth offered by processors with directly attached memory, using 8x less memory in the process. The accuracy of the responses to the complex prompts outlined in their paper is within 5% of the very best 32-bit models reduced to 8-bit transformer-stack models, which still use 4-6x more compute cores, memory and power.

I’ll cautiously say this is a seminal development by MS and CN, which needs multiple independent third-party validations AND needs commercial competition to emerge. I suspect ternary computing will be revived in new silicon forms to tackle the AI inference challenge more effectively at much higher speeds, perhaps 10x or more faster in the medium term, which would mean an 80-fold improvement or more over what we currently have.

N.B. - The folks at IOTA in Norway and Germany, as well as in Asia-Pac, were heading down this path a few years ago, then bailed on their plans for a ternary processor to handle Edge IoT gateway jobs serving lots of sensors, until the guy in Austria steered the IOTA Foundation into smart contracts and their Shimmer overlay network effort.

Maybe it’s time for Autonomi to revisit that earlier IOTA work, which was ternary math applied to their DAG PoW validation method - which imo is very similar conceptually to what Autonomi does with PoW today.

The point is that there are likely a few ternary-math people in the IOTA sphere who could be contacted and tapped as contractors to help @dirvine and co. create an Autonomi-specific LLM bolt-on inspired by BitNet b1.58, which could, in theory and likely in practice, run on everyday i5 gen10 notebook CPUs and 8 GB of memory.

The idea would be to create a private Autonomi LLM which trains on data the end user points it at - data found on their own local private storage and/or secure Autonomi-stored data - while interfacing with an Autonomi genAI Assistant client accessible via the CLI, or via a plugin that loads into any Chromium-based browser. (Brave Browser is a reasonable place to start.)

In this way the proposed Autonomi genAI Assistant plugin could be uploaded to the Chrome Web Store and downloaded to any Chromium-based browser, of which there are several. Once downloaded, the plugin could, in a user-permissioned way, trigger the Autonomi safe client, safenode and safenode-manager installs and the wallet setup…

Think of it as a Trojan-horse operation to boost adoption of the Autonomi Network if you like, which could go viral across Win11 and Linux desktops and even Apple Safari desktops.

It would probably mean making sure part of the dataset is made immutably public to support the genAI browser plugin’s use of a useful LLM found on the Autonomi Network, optionally hosted as a node by Autonomi users who each contribute some CPU clock ticks to autonomously train up portions of the proposed BitNet b1.58-like public Autonomi LLM model - a model which nobody could shut down, as the inference/training nodes which come and go can be replaced in the same fashion in which the store nodes work today. Anyway, it’s raw concept stuff, food for thought. :wink:

5 Likes