For single-user use, memory bandwidth is usually the bottleneck. To saturate the compute, you usually need batch inference, i.e. multiple users chatting at the same time or parallel processing of lots of documents. GPUs are much faster for this.
Memory bandwidth of the 3090 is 936 GB/s, and the 4090 is about the same. Strix Halo should be around 256 GB/s, a bit less than Macs with 128GB RAM and about 1/4 of the 3090/4090. Should be fast enough to be usable. The 5090 has 1,792 GB/s, so quite a bit faster.
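To put those bandwidth numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes that for single-user generation every token needs one full read of the model weights, so bandwidth divided by model size gives an upper bound on tokens per second. The 20 GB model size is a made-up example value, and the function name is just for illustration. Real speeds will be lower, and the model also has to fit in memory in the first place.

```python
# Rough upper bound on single-user decode speed.
# Assumption: each generated token requires reading all model weights
# from memory once, so tokens/s <= bandwidth / model size.

# Bandwidth figures quoted in the post above (GB/s).
BANDWIDTH_GB_S = {
    "RTX 3090": 936,
    "Strix Halo": 256,
    "RTX 5090": 1792,
}

# Hypothetical model size in GB (illustrative only, not a specific model).
MODEL_SIZE_GB = 20.0


def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Return the bandwidth-limited ceiling on tokens per second."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


for name, bandwidth in BANDWIDTH_GB_S.items():
    ceiling = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{name}: at most ~{ceiling:.1f} tokens/s for a {MODEL_SIZE_GB:.0f} GB model")
```

With batch inference the same weight read is shared across all requests in the batch, which is why this ceiling only applies to the single-user case.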
Would it be possible to create some decentralized version of an operating system? E.g. a version of Linux married with nodes, with the whole filesystem hosted on Autonomi?
For AI you might want to wait for this: "NVIDIA Project DIGITS With New GB10 Superchip Debuts as World's Smallest AI Supercomputer Capable of Running 200B-Parameter Models". Price ~$3,000, should be available in May.