
Musk’s xAI Unveils Grok-3: More Power, But Is It Breaking New Ground?

2025-02-18 07:24:15

Grok-3, developed by Elon Musk’s xAI, was unveiled on Monday, with the company making bold claims about its capabilities while showcasing a massive computing infrastructure that signals even bigger ambitions.

The announcement focused heavily on raw computational muscle, benchmark performance, and upcoming features, though many of the actual demonstrations felt like replays of what other AI companies have already achieved.

The star of the initial part of the show wasn't the AI itself, but rather "Colossus," a behemoth cluster of 200,000 GPUs that powers Grok-3's training. 

The system came together in two phases: 122 days of synchronous training on 100,000 GPUs, followed by 92 days of scaling up to the full 200,000. According to the xAI developers, building this infrastructure proved more challenging than developing the AI model itself. 

The company already has plans for an even more powerful cluster, with Musk saying they are aiming for five times the current capacity, effectively building what would be the most powerful GPU cluster on earth.

When it comes to performance, Grok-3 shows impressive results across standard AI benchmarks. The base model (the standard model, without built-in chain-of-thought reasoning) consistently tops the charts in math (AIME), science (GPQA), and coding (LCB) tests.

It also seems very promising in blind tests. 

xAI confirmed that the mysterious model codenamed “Chocolate” was actually an early test version of Grok-3 that had been uploaded to LMArena (Chatbot Arena).

During those tests, it achieved the highest Elo score among all LLMs, meaning users preferred its answers over those of every other AI model in head-to-head comparisons, without knowing which model they were evaluating.

This is probably the most accurate way to measure quality, because it gives developers no chance to game the results by training their models on benchmark datasets. The ranking is based purely on blind preference votes from thousands of anonymous users.
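To make the ranking mechanism concrete, here is a minimal sketch of how blind pairwise votes can be turned into Elo scores. It uses the classic Elo update with an assumed starting rating of 1000 and an assumed K-factor of 32. The function names and the sample votes are hypothetical, and the arena's actual scoring method may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one blind comparison."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    # Whatever one model gains, the other loses, so total rating is conserved.
    return rating_a + delta, rating_b - delta


def rank_models(votes, start: float = 1000.0, k: float = 32.0):
    """Process (winner, loser) votes in order and return models sorted by rating."""
    ratings = {}
    for winner, loser in votes:
        r_win = ratings.setdefault(winner, start)
        r_lose = ratings.setdefault(loser, start)
        ratings[winner], ratings[loser] = update_elo(r_win, r_lose, True, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical votes: each tuple is (preferred model, other model).
    sample_votes = [
        ("model_a", "model_b"),
        ("model_a", "model_c"),
        ("model_b", "model_c"),
    ]
    for name, rating in rank_models(sample_votes):
        print(f"{name}: {rating:.1f}")
```

Because each vote only records which anonymous answer a user preferred, a model climbs this kind of leaderboard only by winning comparisons, not by scoring well on a fixed test set.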

xAI team shows off Grok 3's benchmark tests during a live presentation. Image: xAI

A specialized "Reasoning Beta" variant of Grok-3, which employs internal chain-of-thought processing and additional compute at test time, pushes math scores even higher, reaching 93% on the AIME 2025 benchmark, while the other best-performing models score below 87%.

Interestingly, a smaller version called Grok-3 Mini Reasoning Beta sometimes outperforms its larger sibling, thanks to a longer training time.

In other words, the full-size Grok-3 still has room for improvement once it receives comparable training duration, which seems promising given its greater parameter count.

But when xAI moved to demonstrate Grok-3's capabilities live, the presentation felt more like a game of catch-up than innovation. The team showcased the model solving physics problems and writing game code from scratch—impressive feats that ChatGPT, Claude, and Google's Gemini mastered a while ago. 

They also introduced DeepSearch, a research agent that, like similar tools from OpenAI and Google, scours the web and generates extensive reports on given topics.

X Premium Plus subscribers get immediate access to Grok-3, but the most powerful and most recently updated versions will usually live in a dedicated standalone app or on Grok.com.

Voice interactions, similar to OpenAI’s “Advanced Voice Mode,” will arrive in the coming weeks, with Musk emphasizing this isn't simple text-to-speech but a genuine AI voice model capable of natural, expressive speech.

Developers will get API access in the coming weeks, along with audio transcription capabilities, making Grok-3 a powerful tool for third-party AI-powered apps.

Just after showcasing an example of a Tetris game generated by Grok, xAI also revealed plans for an AI gaming studio that will let developers build games powered by Grok-3. 

Right now, the model is being rolled out gradually. At the time of writing, Decrypt has yet to receive access to the model, but some enthusiasts have tried it and are so far pleased with the results.

Computer scientist Lex Fridman, one of the loudest voices in the AI space, praised Grok-3’s capabilities.

I got to use Grok 3 extensively (early). My mind is blown, very impressive model 🤯 Congrats to Elon and the team for bringing it to life 👊

— Lex Fridman (@lexfridman) February 18, 2025

Others compared it to leading market rivals.

“Grok 3 + Thinking feels somewhere around the state of art territory of OpenAI's strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking,” OpenAI co-founder Andrej Karpathy wrote in an extensive post on X. “For now, big congrats to the xAI team, they clearly have huge velocity and momentum.”

I was given early access to Grok 3 earlier today, making me I think one of the first few who could run a quick vibe check.

Thinking
✅ First, Grok 3 clearly has an around state of the art thinking model ("Think" button) and did great out of the box on my Settler's of Catan… pic.twitter.com/qIrUAN1IfD

— Andrej Karpathy (@karpathy) February 18, 2025

X user Penny2x shared a game built from scratch with Grok-3: a 2D platformer similar to Mario Bros.

The user appeared impressed by Grok’s ability to understand instructions and improve the game over several iterations.

“I just keep asking for adjustments, and it keeps spitting the game out in a single file that I can put on my desktop and run,” the user wrote in a post on X. “This is incredible. We live in the future. Everyone is a developer now.”

The game is available for testing at Thank Doge.

The company also confirmed plans to open-source Grok-2 once Grok-3 is fully mature and running correctly, which is expected to occur sometime in the coming months. 

xAI previously open-sourced Grok-1, so releasing Grok-2 would continue its practice of publishing older versions to spur innovation, though Grok-2 lags behind top-tier models.

For now, Grok-3 appears adept at matching what the best AI models can already do. 

The real test will come when xAI rolls out its promised voice features, gaming tools, and API access in the weeks ahead. For now, the ball is in the court of OpenAI, which is set to release GPT-4.5 soon.

Edited by Sebastian Sinclair
