It was the full-fat RTX 4090, but it's kind of an unfair comparison because the model didn't fit in the 4090's VRAM.
Of course the GPU in the RTX 4090 is far more powerful, but when it has to fetch loads of data from slow system RAM, memory bandwidth becomes the bottleneck and raw GPU performance matters far less… hence in this case a less powerful GPU with more fast memory available to it comes out ahead.
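To put rough numbers on that, here's a back-of-envelope sketch (Python, purely illustrative figures, not measurements). If some fraction of the weights has to be streamed from a slower pool each token, the effective bandwidth works out to a weighted harmonic mean of the two pools, and it drops off fast:

```python
# Back-of-envelope: every generated token streams all the weights once, so if
# the model is split across two memory pools, effective bandwidth is a
# harmonic mean weighted by how much of the model sits in each pool.
# Treats sysram_bw_gbs as whatever rate the slow path actually delivers
# (PCIe transfer or CPU-side decode) -- an assumption, not a measurement.

def effective_bandwidth_gbs(frac_in_vram: float,
                            vram_bw_gbs: float,
                            sysram_bw_gbs: float) -> float:
    """Effective GB/s when frac_in_vram of the weights sit in VRAM."""
    time_per_gb = frac_in_vram / vram_bw_gbs + (1 - frac_in_vram) / sysram_bw_gbs
    return 1.0 / time_per_gb

# Hypothetical split: 24 GB of a 40 GB model in a 4090's VRAM (~1008 GB/s),
# the remaining 16 GB served from system RAM at ~80 GB/s.
print(effective_bandwidth_gbs(24 / 40, 1008.0, 80.0))  # ~179 GB/s
```

So even with most of the model in VRAM, the spillover drags the whole run down to a fraction of the card's headline bandwidth, which fits what you saw.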
Anyway, it’ll be good to have more choice, and hopefully the ability to run bigger models well for less money.
I hope at some point people will be able to contribute AI compute / run LLM requests for Autonomi users anonymously on the network in exchange for ANT.
For single-user use, memory bandwidth is usually the bottleneck. To saturate the compute, you usually need batch inference, i.e. multiple users chatting at the same time or parallel processing of lots of documents. GPUs are much faster for this.
Memory bandwidth of the 3090 is 936 GB/s, and the 4090 is about the same. Strix Halo should be around 256 GB/s, a bit less than Macs with 128 GB RAM and about 1/4 of a 3090/4090. Should be fast enough to be usable. The 5090 has 1,792 GB/s, so quite a bit faster.
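If you want a feel for what those numbers mean in practice, a crude upper bound for single-user decode speed is bandwidth divided by model size, since every weight gets read once per token. Quick sketch using the figures above (the 40 GB model size is just an assumption, roughly a 70B model at 4-bit):

```python
# Crude upper bound for single-user decode: every weight is read once per
# generated token, so tokens/sec <= memory bandwidth / model size.
# Bandwidth figures from the post above; the model size is an assumption.

MODEL_GB = 40  # e.g. a ~70B model quantized to 4-bit, roughly 40 GB

for name, bw_gbs in [("RTX 3090", 936), ("RTX 4090", 1008),
                     ("Strix Halo", 256), ("RTX 5090", 1792)]:
    print(f"{name}: ~{bw_gbs / MODEL_GB:.0f} tokens/s upper bound")
```

That puts Strix Halo around ~6 tokens/s on a model that size: usable, as you say, but nowhere near a discrete card that actually fits the model.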
Would it be possible to create some decentralized version of an operating system? E.g. a version of Linux married with nodes, with the whole filesystem hosted on Autonomi?
For AI you might want to wait for this: "NVIDIA Project DIGITS With New GB10 Superchip Debuts as World's Smallest AI Supercomputer Capable of Running 200B-Parameter Models". Price ~$3,000; should be available in May.