Cerebras Systems

Hardware

USA · HQ Sunnyvale · Est. 2016

Wafer-scale silicon — entire model on a single chip.

7.0

our score

Our take

Cerebras is the leading wafer-scale AI chip challenger, betting that massive monolithic silicon beats GPU clusters for the biggest models.

At a glance

Best known for: WSE, the largest silicon chip ever built for AI compute
Biggest strength: Wafer-scale architecture eliminating chip-to-chip interconnect bottlenecks
Biggest risk: Capital intensity and proving its software ecosystem against NVIDIA's CUDA
Stage: IPO Filed (2024)
Primary revenue: Sales of CS-3 wafer-scale systems and Cerebras Inference cloud services

What they do

Cerebras Systems designs and builds the Wafer Scale Engine (WSE) — a single silicon chip the size of an entire wafer — and the CS-3 systems that house it. Unlike conventional AI accelerators that link thousands of smaller GPUs or chips across networks, Cerebras etches compute and memory onto one massive die, aiming to eliminate the latency, power, and programming complexity of chip-to-chip interconnects. The result is a fundamentally different architecture where an entire large language model can reside on a single piece of silicon, removing the need for external networking during inference and training.
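Whether a model can "reside on a single piece of silicon" comes down to simple arithmetic: the bytes needed for its weights versus the memory available on the chip. The sketch below shows that calculation only. The parameter counts, precision, and the 40 GB capacity are hypothetical placeholders, not Cerebras specifications, and the function names are made up for illustration.

```python
def weights_footprint_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache, and optimizer state, which add to the total.
    """
    return num_params * bytes_per_param / 1e9


def fits_on_chip(num_params: float, bytes_per_param: float, on_chip_gb: float) -> bool:
    """True if the weights alone fit within the given on-chip memory budget."""
    return weights_footprint_gb(num_params, bytes_per_param) <= on_chip_gb


if __name__ == "__main__":
    ASSUMED_ON_CHIP_GB = 40.0  # hypothetical capacity, for illustration only
    for label, params in [("8B-parameter model", 8e9), ("70B-parameter model", 70e9)]:
        size = weights_footprint_gb(params, bytes_per_param=2)  # 16-bit weights
        verdict = "fits" if fits_on_chip(params, 2, ASSUMED_ON_CHIP_GB) else "does not fit"
        print(f"{label}: {size:.0f} GB of weights, {verdict} in {ASSUMED_ON_CHIP_GB:.0f} GB")
```

Under these assumed numbers the smaller model fits and the larger one does not, which is why model size and numeric precision matter so much to any single-chip claim.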

The company operates a hybrid business model. It sells CS-3 hardware directly to national laboratories, defense agencies, and select cloud providers, and it also offers Cerebras Inference, a cloud API that delivers high-throughput LLM inference on WSE-3 backends without requiring customers to own the physical systems. Its software stack includes compilers and runtime tools that map PyTorch and other standard frameworks onto its tiled architecture, though reaching peak performance typically requires model-specific tuning. Cerebras targets the most demanding generative AI and high-performance computing workloads, where model size and memory bandwidth are the primary constraints, and positions itself as a specialized alternative to GPU clusters.

Origin story

Cerebras was founded in 2016 in Sunnyvale, California, by a team including Andrew Feldman, Sean Lie, and Gary Lauterbach, all veterans of SeaMicro, the microserver pioneer AMD acquired in 2012. Drawing on deep experience in dense system design, the founders set out to solve the interconnect bottleneck straining multi-chip AI clusters by building compute at wafer scale. After several years in stealth, the company unveiled the first Wafer Scale Engine in 2019, surprising the industry with a chip dozens of times larger in area than the largest GPU dies of the time.

Subsequent generations — WSE-2 and the 2024 WSE-3 — doubled down on the architecture, while partnerships with entities such as UAE-based G42 and U.S. government labs provided early revenue and validation. These deals proved the manufacturing feasibility of full-wafer processors and established Cerebras in the high-performance AI training market. The company filed for an IPO in 2024, targeting a roughly $4 billion valuation as it sought to scale from niche supercomputing deployments into a broader AI inference platform serving enterprise and cloud customers.

Key products

WSE-3

2024

The third-generation wafer-scale processor powering CS-3 systems, designed to fit large AI models entirely on a single chip.

CS-3

2024

The server system integrating WSE-3 with power and cooling infrastructure, sold to data centers and labs.

Cerebras Inference

2024

A cloud API and hosted service delivering high-throughput LLM inference on WSE-3 hardware.

Leadership

Andrew Feldman
CEO & Co-founder
Previously co-founded SeaMicro, which was acquired by AMD.

Sean Lie
Co-founder & Chief Hardware Architect
Veteran chip architect from the SeaMicro and AMD lineage.

Gary Lauterbach
Co-founder & CTO
Systems and silicon engineering leader; former SeaMicro/AMD.

Funding history

Year | Round | Amount | Lead investors
2024 | IPO | $4B (est. valuation) | Public markets (filing)

Strengths & risks

Strengths

+ Wafer-scale design removes network bottlenecks and delivers massive on-chip memory bandwidth
+ Ability to fit entire large models on a single chip simplifies programming and reduces latency
+ Strong traction in U.S. government, defense, and national lab segments
+ Differentiated hardware story in a market dominated by NVIDIA GPU clusters
+ Vertical integration from processor to system to inference cloud service

Risks

⚠ Intense competition from NVIDIA's entrenched CUDA ecosystem and Blackwell architecture
⚠ Extreme capital intensity and manufacturing yield risk at wafer scale
⚠ Customer concentration with a relatively small number of CS-3 deployments to date
⚠ Path to sustained profitability unproven post-IPO amid heavy R&D burn
⚠ Dependence on TSMC for advanced wafer production and complex packaging

Recent moves

Filed for IPO targeting roughly $4 billion valuation
2024
Cerebras filed to go public in 2024, aiming to raise capital to scale manufacturing and its inference cloud business.

Launched WSE-3 and CS-3 systems
Early 2024
The third-generation wafer-scale processor and accompanying CS-3 system increased core count and memory to handle larger models.

Introduced Cerebras Inference cloud service
Mid-2024
A hosted API offering high-throughput LLM inference on WSE-3 hardware, marking a shift toward recurring cloud revenue.

Competitive position

Cerebras competes directly with NVIDIA's GPU empire and indirectly with other AI silicon players such as Groq, SambaNova Systems, and AWS's Trainium and Inferentia chips. Where Cerebras wins is on raw memory bandwidth and the ability to map massive models onto a single piece of silicon without the latency and power penalties of chip-to-chip networking. This makes it especially compelling for government supercomputing centers and certain cloud inference workloads where throughput and simplicity matter more than flexibility.

Where it loses is ecosystem maturity. NVIDIA's CUDA stack, massive developer mindshare, and broad ISV support remain unmatched, while hyperscalers increasingly prefer their own custom silicon for cost efficiency. Cerebras is not trying to beat NVIDIA as a general-purpose platform; it is betting that a meaningful fraction of the AI market (the very largest models and the most latency-sensitive inference tasks) will pay a premium for wafer-scale simplicity. The risk is that this addressable market, while growing, remains narrow compared with the general-purpose AI infrastructure market, leaving Cerebras vulnerable to rapid GPU advances and the shift toward custom ASICs.

What to watch

01 Quarterly revenue growth and gross margins once the listing filed for in 2024 is completed
02 Customer count outside government/defense and the UAE ecosystem
03 Software stack maturity and ease of porting models from CUDA to Cerebras
04 WSE-3 manufacturing yields and TSMC capacity allocation
05 Inference pricing competitiveness versus GPU clouds and specialized ASICs

Frequently asked questions

What makes the Cerebras chip different from a GPU?

The Wafer Scale Engine is a single chip the size of an entire wafer, eliminating the need to connect thousands of smaller chips. This provides massive on-chip memory bandwidth and lets entire models run on one device.
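To see why memory bandwidth matters so much for inference, a common back-of-the-envelope rule is that single-stream token generation is memory-bound: producing each token requires reading every weight once, so throughput is capped at roughly bandwidth divided by model size. The sketch below applies that rule. The bandwidth figures and model size are hypothetical round numbers chosen for illustration, not measured values for any Cerebras or NVIDIA product.

```python
def decode_tokens_per_sec_upper_bound(
    num_params: float,
    bytes_per_param: float,
    bandwidth_bytes_per_sec: float,
) -> float:
    """Rough ceiling on batch-size-1 decode speed for a memory-bound model.

    Assumes every weight is read once per generated token and ignores compute
    time, KV-cache traffic, and any overlap or caching effects.
    """
    weight_bytes = num_params * bytes_per_param
    return bandwidth_bytes_per_sec / weight_bytes


if __name__ == "__main__":
    params, width = 8e9, 2  # hypothetical 8B-parameter model with 16-bit weights
    for label, bandwidth in [("1 TB/s memory", 1e12), ("10 TB/s memory", 1e13)]:
        ceiling = decode_tokens_per_sec_upper_bound(params, width, bandwidth)
        print(f"{label}: at most about {ceiling:.1f} tokens/s per stream")
```

Under this simplified model the ceiling scales linearly with bandwidth, which is the intuition behind keeping weights in fast on-chip memory rather than fetching them from external memory.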

What is the CS-3?

The CS-3 is Cerebras' server system that houses the WSE-3 processor, integrating power delivery and liquid cooling for data center deployment.

Who are Cerebras' typical customers?

National laboratories, defense agencies, and cloud providers running large-scale AI training and inference, though the company is expanding into enterprise via its cloud inference service.

Can I run PyTorch models on Cerebras hardware?

Yes. Cerebras provides a software stack that compiles standard frameworks like PyTorch for its tiled architecture, though models typically require optimization to exploit wafer-scale parallelism.

What is Cerebras Inference?

It is a hosted cloud API that delivers high-throughput LLM inference powered by WSE-3 chips, allowing developers to access wafer-scale compute without buying hardware.

Is Cerebras profitable?

As of its 2024 IPO filing, the company was not profitable; it is prioritizing R&D and market expansion in the capital-intensive AI silicon sector.

What are the main risks of wafer-scale manufacturing?

Manufacturing defects across a full wafer can lower yield, the chips require specialized liquid cooling, and supply is constrained by TSMC's advanced-node capacity.
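The yield concern can be made concrete with the textbook Poisson yield model, in which the chance of a defect-free die is exp(-area × defect density). The sketch below is illustrative only: the defect density and die areas are hypothetical round numbers, not TSMC or Cerebras data, and the model is a standard simplification rather than a description of how Cerebras actually handles defects.

```python
import math


def expected_defects(area_cm2: float, defects_per_cm2: float) -> float:
    """Average number of random defects landing on a die of the given area."""
    return area_cm2 * defects_per_cm2


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Probability that a die has zero defects under the Poisson yield model."""
    return math.exp(-expected_defects(area_cm2, defects_per_cm2))


if __name__ == "__main__":
    ASSUMED_DEFECT_DENSITY = 0.1  # defects per cm^2, hypothetical
    for label, area in [("conventional die, 8 cm^2", 8.0), ("wafer-scale die, 460 cm^2", 460.0)]:
        defects = expected_defects(area, ASSUMED_DEFECT_DENSITY)
        zero_defect = poisson_yield(area, ASSUMED_DEFECT_DENSITY)
        print(f"{label}: ~{defects:.1f} expected defects, "
              f"zero-defect probability {zero_defect:.3g}")
```

With these assumed inputs a perfect wafer-scale die is effectively impossible, so any full-wafer design has to tolerate defects, for example through spare, redundant compute units, rather than depend on defect-free silicon.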

How does Cerebras compare to NVIDIA?

Cerebras offers higher on-chip memory bandwidth and simpler scaling for massive models, but NVIDIA dominates in software ecosystem, developer adoption, and breadth of applications.

The bottom line

Cerebras occupies a unique position in the AI hardware stack with its wafer-scale approach, delivering unmatched on-chip memory bandwidth for training and inference of large models. Its 2024 IPO filing and push into inference-as-a-service signal ambitions to become a platform, not just a box vendor. However, the company faces the classic AI silicon challenge: proving sustained software ecosystem traction and profitability against NVIDIA's CUDA moat and hyperscalers' investments in custom silicon. If it can diversify beyond government and select cloud partners while ramping WSE-3 yields, it could cement itself as the go-to alternative for memory-bound AI workloads. If capital costs or customer adoption stall, the $4B valuation will face pressure.

Related companies

Nvidia

Runway

Meta AI

xAI