I think we are all going to have to wait for better hardware (further in the future) and/or a better LLM structure …
… and what I think would be the coup de grâce for speed and efficiency: stacking highly specialized (and so vastly smaller) LLMs, with a language interpretation layer and a logic layer as the top two layers, wherein the logic layer can access any number of the other specialized LLMs to get the skill or data it requires to serve the request.
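
For what it's worth, here is a minimal toy sketch of what that layering could look like, assuming a simple dispatcher design. Nothing here is a real library or API: the names (Expert, interpret, serve) and the keyword classifier standing in for the language interpretation layer are all made up for illustration; in the actual architecture each layer would itself be a small model.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Expert:
    """A small, specialized model behind a uniform call interface."""
    name: str
    handle: Callable[[str], str]

def math_expert(prompt: str) -> str:
    # Stand-in for a small model trained only on math.
    return f"[math model answers: {prompt!r}]"

def code_expert(prompt: str) -> str:
    # Stand-in for a small model trained only on source code.
    return f"[code model answers: {prompt!r}]"

def general_expert(prompt: str) -> str:
    # Fallback when no specialist matches.
    return f"[general model answers: {prompt!r}]"

# --- Language interpretation layer (top layer 1) ----------------------
def interpret(request: str) -> str:
    """Classify the request into a skill label.

    In the real architecture this would itself be a small LLM;
    here a keyword check stands in for it.
    """
    lowered = request.lower()
    if any(w in lowered for w in ("sum", "integral", "solve")):
        return "math"
    if any(w in lowered for w in ("function", "bug", "compile")):
        return "code"
    return "general"

# --- Logic / routing layer (top layer 2) -------------------------------
EXPERTS: Dict[str, Expert] = {
    "math": Expert("math", math_expert),
    "code": Expert("code", code_expert),
    "general": Expert("general", general_expert),
}

def serve(request: str) -> str:
    """Interpret the request, then dispatch to whichever
    specialized model has the required skill."""
    skill = interpret(request)
    expert = EXPERTS.get(skill, EXPERTS["general"])
    return expert.handle(request)

if __name__ == "__main__":
    print(serve("Solve x^2 - 4 = 0"))
    print(serve("Why won't this function compile?"))
```

The point of the design is that only the two thin top layers ever see the raw request; everything below them is a pool of small interchangeable specialists, so adding a new skill is just registering one more entry in the routing table.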