GLM-5.1 High-Speed API Sets New Record for Model Output Speed

22/05 11:23

The GLM-5.1 High-Speed API has launched with an output speed of 400 tokens per second, which the report describes as a new global record for end-to-end speed on an official large-model API. According to Odaily, the high-speed version retains the capabilities of the original flagship model and runs on a high-performance inference engine jointly developed by Zhipu and the TileRT team. The engine restructures GPU operator scheduling into a persistent Engine Kernel that stays resident on the GPU, cutting the kernel-launch and memory read/write delays of traditional inference. In multi-card scenarios, TileRT further specializes the GPUs in an 8-card NVL topology into Workers with distinct functions, improving attention-layer computation and cross-card communication efficiency. The high-speed version is currently available only to select enterprise clients on the Zhipu MaaS platform. Planned work includes optimizing FP8 inference and extended-context capabilities to support low-latency scenarios such as AI programming, real-time interaction, and real-time voice applications.
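To put the 400 tokens per second figure in perspective, the short Python sketch below converts a throughput into a per-token time budget and estimates how much of that budget kernel launches alone could consume. Only the 400 tokens per second value comes from the report. The function names, the launch count, and the per-launch cost are hypothetical, chosen purely for illustration, and are not measurements of TileRT or GLM-5.1.

```python
def per_token_budget_us(tokens_per_second: float) -> float:
    """Time available to produce one token, in microseconds."""
    return 1_000_000 / tokens_per_second


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream num_tokens at a steady output speed."""
    return num_tokens / tokens_per_second


def launch_overhead_fraction(num_launches: int, launch_us: float,
                             tokens_per_second: float) -> float:
    """Share of the per-token budget spent only on kernel launches.

    num_launches and launch_us are hypothetical inputs, not measured values.
    """
    return (num_launches * launch_us) / per_token_budget_us(tokens_per_second)


if __name__ == "__main__":
    speed = 400  # tokens per second, the figure quoted in the report
    print(per_token_budget_us(speed))      # budget per token in microseconds
    print(generation_time_s(2000, speed))  # seconds for a 2000-token reply
    # Hypothetical: 500 separate launches per decode step at 5 us each.
    print(launch_overhead_fraction(500, 5.0, speed))
```

Under these assumed numbers, launch overhead by itself would use up the entire per-token budget, which is the kind of cost a GPU-resident persistent kernel is meant to avoid.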
