Inference Models - Search News

OpenAI Halves Inference Costs With Software Alone: GPUs Drop to Hundreds

OpenAI inference cost reduction cut ChatGPT guest traffic from tens of thousands of Nvidia GPUs to just a couple hundred, ...

Morning Overview on MSN

OpenAI and Broadcom detailed a custom inference chip built to cut AI’s soaring costs

OpenAI partnered with Broadcom in October 2025 to design a custom inference chip aimed at reducing the growing expense of ...

1d

How AI Inference Sends Decision Making To The Edge

The next phase of AI infrastructure will not be defined by a single destination called “the cloud” or “the edge.” ...

AI Inference and World Model Startups Pull $1.8B in Two Days as Foundation Models Commoditize

AI inference infrastructure investment pulled $1.8 billion in 48 hours as Baseten’s $1.5B round at a $13B valuation and ...

2d

OpenAI reportedly reduced inference costs by more than half

According to a media report, OpenAI engineers have found optimizations that reduce the cost of operating existing AI models ...

9d

OpenAI unveils first custom AI inference chip, Jalapeño, with Broadcom — and its development was sped-up with OpenAI's own models

The companies attributed this speed to a deep software-hardware co-development process that actively used OpenAI’s own models ...

3d

China claims biggest AI model trained on local chips, as Meituan releases LongCat-2.0

LongCat-2.0 boasts 1.6 trillion parameters and a million-token context window, on par with DeepSeek’s latest flagship model.

8h

Waterloo's PAW compiles task specs into 23MB LoRA adapters that a 600M-parameter model runs entirely offline.

Local AI inference at 32B-parameter quality, no cloud API required: University of Waterloo researchers released PAW on July 2, 2026, a system that compiles any natural-language task spec into a 23MB ...

4d

Opinion

Why Chinese AI Models Should Worry Nvidia, Micron Stock Investors

Chinese models are quietly challenging the $600B+ AI infrastructure supercycle. Markets have glossed over it, but they ...

4d

DeepSeek open-sources DSpark, a new framework to speed up LLM inference by up to 85%

DSpark can make decoding faster, but acceptance quality still determines how much speed the system actually realizes.