Inference Model Architecture

DigitalOcean And AMD Deliver Doubled Inference Performance For Character.ai

As enterprises seek alternatives to concentrated GPU markets, demonstrations of production-grade performance with diverse ...

Quadric, Inference Engine for On-Device AI Chips, Raises $30M Series C as Design Wins Accelerate Across Edge LLMs, Automotive, and Enterprise

Quadric Chimera (TM) processor IP is designed for this reality. Unlike fixed-function NPUs locked to today's model architectures, Chimera is fully programmable: it runs any AI model, current or future ...

Computer Weekly

Nvidia unveils Vera Rubin architecture to power AI agents

The AI chip giant has taken the wraps off its latest compute platform designed for test-time scaling and reasoning models, alongside a slew of open source models for robotics and autonomous driving.

Network World

OpenAI turns to Cerebras in a mega deal to scale AI inference infrastructure

The multibillion-dollar deal shows how the growing importance of inference is changing the way AI data centers are designed ...

TechNode

ByteDance unveils UltraMem architecture to reduce large model inference costs by up to 83%

ByteDance’s Doubao Large ...

Business Wire

Vultr Launches Cloud Inference to Simplify Model Deployment and Automatically Scale AI Applications Globally

WEST PALM BEACH, Fla.--(BUSINESS WIRE)--Vultr, the world’s largest privately-held cloud computing platform, today announced the launch of Vultr Cloud Inference. This new serverless platform ...

The Next Platform

Cerebras Inks Transformative $10 Billion Inference Deal With OpenAI

If GenAI is going to go mainstream and not just be a bubble that helps prop up the global economy for a couple of years, AI ...

13d

New ‘Test-Time Training’ method lets AI keep learning without exploding inference costs

By allowing models to actively update their weights during inference, Test-Time Training (TTT) creates a "compressed memory" ...

13d

DGrid Launches First Web3 Decentralized Gateway Aggregation for AI Inference

DGrid, a next-generation decentralized AI infrastructure, today announced its official launch in 2026, introducing a ...

SDxCentral

AI IXP from Moonshot and QAI Moon brings inference closer to the edge

Moonshot Energy, QumulusAI (QAI Moon), and Connected Nation Internet Exchange Points (IXP.us) collaborated on a nationwide AI ...

VentureBeat

New transformer architecture can make language models faster and resource-efficient

Large language models like ChatGPT and Llama-2 are notorious for their extensive memory and computational demands, making them costly to run. Trimming even a small fraction of their size can lead to ...
