AI News - from general AI to LLMs. All posters welcome

A general topic for AI news, from general AI to LLMs. All posters welcome.




Google Research unveiled TurboQuant, a compression algorithm for large language models and vector search engines that shrinks a major inference-memory bottleneck: it reduces an AI model’s memory footprint by 6x and makes inference up to 8x faster on the same number of GPUs, all while maintaining zero loss in accuracy and “redefining AI efficiency.”

Their paper is slated for presentation at ICLR 2026, but the reaction online was immediate: Cloudflare CEO Matthew Prince called it "Google’s DeepSeek moment."

I expect all major AI providers will adopt this posthaste, with lower inference prices to follow.
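For intuition about how a big memory reduction with near-zero accuracy loss is even possible, here’s a toy sketch of the plain round-to-grid quantization idea that methods like this build on. To be clear, this is not TurboQuant’s actual algorithm, just the baseline concept:

```python
import numpy as np

# Toy symmetric 4-bit quantization: store ~4 bits per value instead of 16,
# a 4x memory reduction, at the cost of a small rounding error.
# NOT TurboQuant itself - just the baseline idea such methods improve on.

rng = np.random.default_rng(0)
x = rng.normal(size=4096).astype(np.float32)  # stand-in for a KV-cache vector

scale = np.abs(x).max() / 7                   # int4 values live in [-8, 7]
q = np.clip(np.round(x / scale), -8, 7)       # snap each value to the 4-bit grid
x_hat = q * scale                             # dequantize for use at inference

rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_err:.2%}")
```

Published quantization papers do far better than this naive version (per-block scales, rotations, and so on), which is where claims like “zero loss” come from.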

7 Likes

That feels like it could lead to demand for GPUs and RAM going down, leading to prices going down.

2 Likes

According to the report, Anthropic is “developing and has begun testing with early access customers a new AI model more capable than any it has released previously,” following a data leak that revealed the model’s existence.

An Anthropic spokesperson said the new model represented “a step change” in AI performance and was “the most capable we’ve built to date.” The company said the model is currently being trialed by “early access customers.”

Descriptions of the model were inadvertently stored in a publicly accessible data cache and were reviewed by Fortune.

A draft blog post that was available in an unsecured and publicly searchable data store prior to Thursday evening said the new model is called “Claude Mythos” and that the company believes it poses unprecedented cybersecurity risks.

Hmmmmm

Jevons paradox: when efficiency improves, total consumption of the resource often rises rather than falls.

But doesn’t this mean prices stay the same, but I can use more memory?

2 Likes

Not any time soon. It’s not so hard to reconfigure software to incorporate Google’s new compression algo, but at least two major things hold back increasing the context window (using this compression method):

  1. Models have to be trained to use a larger context, and this will take many months at least.
  2. Even with a larger context, models have more and more trouble keeping track of what’s in a large context window - meaning there isn’t a lot of incentive here (not much payback).

So while this compression system does give much more flexibility in terms of context, there are still some limits that will take longer to resolve. But simply applying this compression yields large cost savings, ceteris paribus, for model providers; hence it’s almost certain they are all now scrambling to put this into play, and given the competition between them, much of this cost savings (not all) will be passed on to the consumer.

1 Like

I don’t think pricing will be affected by this, because it isn’t being determined by cost or even demand. LLM providers are competing for users and losing money on every new product they create, and they often withdraw a product once its purpose is served.

They need to keep hyping in this period because they are losing money and hoping to make it one day. To do this they need to change the environment in which these products operate. That’s quite a job, but we see many ways in which this is being attempted: law and regulation, business practices, software, and other endeavours.

Little of that change is likely to be good in general, because it isn’t motivated by things which benefit people in general. If they succeed, it will give those building these models - billionaires - yet more power. Vastly more power over everyone. :thinking:

Ah but local LLMs!

:man_facepalming:

Sorry, that’s not decentralising despite the claims.

Who builds those models? I’m setting up the first node of my own computing cluster right now, but unless I have tens of millions to spend, it ain’t going to be able to create an LLM in my lifetime.

We’re not in a market here. There is competition, but there’s also a joint need by these companies to change the game. We’ve seen how easily they’ve been able to do that already, but they need to reach a point of no going back - for users - before the money runs out.

3 Likes

Local models aren’t competitive for much yet IMO, and this compression technique isn’t really of any use for local models. It impacts the KV cache - both the amount of memory used and, consequently, the amount being loaded and unloaded - which makes a huge difference in per-user KV-cache cost. A single local user only has one KV cache to worry about; the big win is for providers packing many concurrent users’ caches onto shared GPUs.

All up, this is huge for providers and will almost certainly drop their costs a lot, as they can serve many more customers with the same hardware.

I asked an AI to give some cost estimates and the answers were pretty incredible. Below are a couple of clips of the conversation:

| Workload Type | Estimated Increase in Users/Server |
| --- | --- |
| Short context, high concurrency | 50-150% |
| Long context, moderate concurrency | 200-400% |
| Ultra-long context, low concurrency | 400-900% |
| Batch inference (offline) | 300-600% higher throughput |

The bottom line: for the workloads where the KV cache actually matters (which is most production deployments with meaningful context or concurrency), you’re looking at 2x to 6x more users per GPU, with the highest gains in exactly the use cases that are currently most expensive to serve.

TurboQuant suggests that current long-context pricing includes a significant “inefficiency premium” that can be eliminated. Whether that savings gets passed to customers immediately depends on competitive pressure, but over time, long context should become much cheaper relative to short context than it is today.

If you’re a heavy user of long-context AI, this is good news—your costs will likely drop substantially over the next 6-12 months.
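Those multipliers are easy to sanity-check with back-of-the-envelope arithmetic. Here’s a minimal sketch, where every model dimension is an illustrative assumption (roughly a 7B-class dense model) and the 4-bit figure is my assumption, not TurboQuant’s published compression ratio:

```python
# How many concurrent users' KV caches fit on one GPU, before and after
# quantizing the cache from fp16 to 4-bit. All numbers are assumptions
# for illustration, not measurements of any real deployment.

N_LAYERS = 32         # transformer layers (assumed)
N_KV_HEADS = 32       # K/V heads per layer (assumed; GQA models use fewer)
HEAD_DIM = 128        # dimension per head (assumed)
CONTEXT_LEN = 32_768  # tokens of context held per user (assumed)
GPU_FREE_GIB = 40     # GPU memory left for KV cache after weights (assumed)

def kv_cache_gib(bytes_per_elem: float) -> float:
    """KV-cache size for one user at full context, in GiB."""
    # Factor of 2 because both K and V are cached at every layer.
    elems = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * CONTEXT_LEN
    return elems * bytes_per_elem / 2**30

for label, bytes_per_elem in [("fp16", 2.0), ("4-bit", 0.5)]:
    per_user = kv_cache_gib(bytes_per_elem)
    print(f"{label}: {per_user:.1f} GiB per user -> "
          f"{int(GPU_FREE_GIB / per_user)} concurrent users per GPU")
```

With these assumptions you go from 2 full-context users per GPU to 10, which lands in the same ballpark as the long-context rows in the table above.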

3 Likes

Let’s check back. BTW, you have an advantage over me as I have no idea what they currently charge.

Once the trap is sprung and enshittification begins, though, expect prices to increase one way or another, because they have a staggeringly big investment to recoup. I’ve no idea when that will be, because they first have to ensure users can’t just switch back to their old ways.

2 Likes

A look at current prices per million tokens:

Source: [Comparison of AI Models across Intelligence, Performance, and Price]

2 Likes

MiniMax 2.7 is a new Chinese model that was released this month and has really great price/performance (as can be seen from the chart in the post above). I’ve not used it myself yet (I mostly use Kimi 2.5 and DeepSeek 3.2 - competitors to MiniMax).

I found this short video discussing MiniMax 2.7:

Google’s new compression algo is going to be a boon for all the Chinese model providers, who are very constrained by older hardware.

A 4x compression of the context window’s memory results in, simplified, 4 people on 1 GPU instead of 1 per GPU. Memory producers’ stocks went down on the new information. I can see that demand for memory has a chance to go down.

3 Likes

Let’s not forget the impact of Trump’s war. Semiconductor production is apparently particularly vulnerable if it continues.

4 Likes

Many variables in play. I hope my local AI can one day collect info and calculate the probabilities of different outcomes.

1 Like

It seems Sam Altman of OpenAI was hoarding RAM and, possibly, Google’s new memory algo has caused him to dump it.

IMO, get it while it’s cheap. The banks are in war mode (printing lots of new cash), so all prices will be going up, although there will be a delay (usually multi-month) between the money-supply increase and the price increase. Not to mention the cost of production going up due to energy restrictions.

1 Like

The AI job-pocalypse hasn’t even really begun yet, but it’s coming … no doubt about it.

You can now get an email address for your agent:

The internet is being burned to the ground before our eyes, and many are cheering this on rather than saying: hang on, what could possibly go wrong?

2 Likes

I don’t pretend to know the consequences of AI, but things are not looking good for future employment, which will necessarily crush the economy … which then raises the question: who will be able to buy the stuff the AI and robots produce? And hence: just how far can this AI revolution go before it effectively knocks the legs off its own stool?

1 Like

Radiologists are mediocre in general; the sooner they are replaced, the more people will be saved from suffering and death.

Local doctors are also mediocre in general; same thing with them - the sooner they are partially replaced, the more people will be saved.

3 Likes

The AIs are definitely superior in ability and are only getting better.

1 Like