I also think small models that are fine-tuned with old memories (so no forgetting) can be useful. However, Titans etc. do have forgetting mechanisms built in. So there's a lot of work happening in that space, but it all leads toward open-ended AI, which is where it gets exciting, and dangerous too.
What is amazing is that each move forward gives us, as people, many more powers of collaboration and research to build on. So it's almost self-perpetuating, with humans in the loop for now. It's about to explode though.
Seems very interesting; it will be exciting to follow the progress in the coming years. I've been thinking about getting an RTX 5090 when ANT becomes a success, but it would be nice to get away with something cheaper and not throw $2,400 at Nvidia with their monopoly 60% margin.
Hopefully AMD Strix Halo's top chip will give OK performance on larger models, with up to 128 GB of RAM (96 GB usable by the GPU), for a lot less money than an RTX 5090 or a Mac with 96/128 GB of RAM… but we'll see.
Also, with Titans, I wonder whether limited memory won't be the same disadvantage as it is for current models? Or would that be unaffected?
It depends. GPU memory (shared between CPU and GPU on Strix Halo and Apple M platforms) is a bottleneck if you want to run larger models, hence AMD suggests Strix Halo will get more than 2x the performance of a 5090 on a 70B-parameter model. (edit: actually, the 4090 was the comparison AMD made, not the 5090)
But of course, if a model fits in 32 GB, the 5090 is a far more powerful GPU and would hugely outperform the APU.
I get the feeling that the Strix Halo is being compared to the laptop 4090, which is not in the same league as a desktop 4090; a laptop 4090 is more like a desktop 4080. But I hope the Strix Halo will be good and strong, I need to look into it.
Cooler than me, I only have a 3070 with 8 GB of VRAM, which Nvidia claimed was enough, but that turned out not to be the case. At least I bought it for $550 in the first weeks after release, as I understood that the covid toilet-paper zombies would soon buy them all and cause shortages.
It was the full-fat RTX 4090, but it’s a kind of unfair comparison because the model didn’t fit in the 4090’s VRAM.
Of course the GPU in the RTX 4090 is far more powerful, but when it has to slowly fetch loads of data from system RAM, that memory bandwidth becomes a bottleneck that makes raw GPU performance far less relevant… hence, in this case, a less powerful GPU with more fast memory available to it comes out ahead.
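The "does it fit in VRAM" question is just arithmetic. Here's a rough sketch; the parameter counts, quantization levels, and ~20% overhead factor are my assumptions for illustration, not exact figures:

```python
# Rough sketch: does a model fit in a desktop 4090's 24 GB of VRAM?
# Weight memory ~= params * bytes-per-weight, plus some overhead
# (KV cache, activations) -- the 1.2x factor is a guess.

def model_size_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Approximate total GPU memory needed, in GB."""
    return params_billion * (bits_per_weight / 8) * overhead

VRAM_GB = 24  # desktop RTX 4090

for params in (8, 32, 70):
    for bits in (16, 8, 4):
        size = model_size_gb(params, bits)
        verdict = "fits" if size <= VRAM_GB else "spills to system RAM"
        print(f"{params:>3}B @ {bits:>2}-bit: ~{size:5.1f} GB -> {verdict}")
```

Even at 4-bit quantization a 70B model comes out around 42 GB with that overhead guess, so on a 24 GB card the layers that don't fit get streamed over the (comparatively slow) PCIe/system-RAM path, which is exactly the scenario AMD's comparison exploited.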
Anyway, it’ll be good to have more choice, and hopefully the ability to run bigger models well for less money.
I hope at some point people will be able to contribute AI Compute / run LLM requests to Autonomi users anonymously on the network in exchange for ANT
For single-user use, memory bandwidth is usually the bottleneck. To saturate the compute, you usually need batched inference, i.e. multiple users chatting at the same time or parallel processing of lots of documents. GPUs are much faster for that.
Memory bandwidth of the 3090 is 936 GB/s, and the 4090 is about the same. Strix Halo should be around 256 GB/s, a bit less than Macs with 128 GB of RAM and about 1/4 of a 3090/4090. Should be fast enough to be usable. The 5090 has 1,792 GB/s, so quite a bit faster.
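You can turn those bandwidth numbers into a rough tokens-per-second ceiling: with batch-1 decoding, each generated token needs roughly one full read of the weights, so throughput ≈ bandwidth / model size. The 70B-at-4-bit size below is my assumption; the bandwidth figures are the ones above:

```python
# Back-of-envelope upper bound for single-user decoding:
# tokens/sec ~= memory bandwidth (GB/s) / model size (GB),
# since every token requires streaming all the weights once.

def tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

model_gb = 70 * 0.5  # ~35 GB: 70B parameters at 4-bit quantization (assumed)

for name, bw in [("RTX 3090/4090", 936), ("Strix Halo (est.)", 256), ("RTX 5090", 1792)]:
    print(f"{name}: ~{tokens_per_sec(bw, model_gb):.0f} tok/s upper bound")
```

That gives the Strix Halo roughly 7 tok/s on a 35 GB model, which is "usable" territory, while the 3090/4090 numbers only apply if the whole model actually sits in their 24 GB of VRAM, which a 35 GB model doesn't.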
Would it be possible to create some decentralized version of an operating system? E.g. a version of Linux married with nodes, with the whole filesystem hosted on Autonomi?
For AI you might want to wait for this: "NVIDIA Project DIGITS With New GB10 Superchip Debuts as World's Smallest AI Supercomputer Capable of Running 200B-Parameter Models". Price ~$3,000, should be available in May.