WED, 03 JUN 2026 · 18:34:47 UTC

Together AI

FlagshipInfrastructure

USA·HQ San Francisco·Est. 2022

Open-source model inference and training infrastructure.

7.0

our score

Our take

Together AI is the leading specialized inference cloud for open-source LLMs, but faces a brutal race against hyperscalers and model owners building their own stacks.

At a glance

Best known for
Fast, cheap inference API for open-source LLMs
Biggest strength
Price-performance leadership on open-weight model serving
Biggest risk
Margin compression from hyperscalers and model owners
Stage
Series B
Primary revenue
Usage-based fees for LLM inference and managed fine-tuning on dedicated/cloud GPUs

What they do

Together AI operates a specialized cloud infrastructure platform optimized for open-source generative AI models. Its core offering, Together Inference, provides API access to popular open-weight large language models—including Meta's Llama family, Mistral/Mixtral, DeepSeek, and Alibaba's Qwen—at price points that typically undercut general-purpose clouds. The platform abstracts away GPU cluster management, scheduling, and model-serving optimization, letting developers and enterprises run inference without building their own Kubernetes fleets. Complementing this is Together Fine-Tuning, a managed service that lets customers adapt these base models on proprietary data using Together's compute fabric.

The company targets a broad spectrum of AI builders: startups that need cost-efficient inference at scale, research labs running open-model evaluations, and enterprises experimenting with self-hosted alternatives to closed APIs like OpenAI's GPT-4. Together also invests in research around efficient inference—such as kernel optimizations, speculative decoding, and routing algorithms—to differentiate on latency and cost rather than model ownership. In effect, Together functions as an independent 'model-agnostic' AI cloud, betting that the open-source ecosystem will remain fragmented enough that developers need a neutral, optimized host rather than going directly to Meta, DeepSeek, or a hyperscaler.

Origin story

Together AI was founded in 2022 in San Francisco by a team with deep roots in AI research and distributed systems. The company emerged during the early generative AI boom with a thesis that open-source models would proliferate and require specialized infrastructure to run efficiently at scale—analogous to how the web needed CDNs and specialized databases. Its founders and early team included researchers with experience from Stanford, Berkeley, and major AI labs, giving it credibility in the research community.

The company quickly gained traction by offering inference endpoints for newly released open models faster than general clouds could, often within days of a model drop. This speed-to-market, combined with aggressive per-token pricing, made it a go-to for developers experimenting with Llama and Mistral variants. A pivotal moment came with its Series B, a $305 million round that valued the company at roughly $3.3 billion—one of the largest financings in the AI infrastructure layer at that stage. This capital signaled investor conviction that an independent inference layer could capture significant value even without owning frontier models. Since then, Together has expanded from pure inference into fine-tuning and enterprise-grade deployments, though it remains primarily known as an inference utility.

Key products

Together Inference

2022

API and dedicated-endpoint service for running open-source LLMs at scale, optimized for low latency and cost via custom scheduling and kernel optimizations.

Together Fine-Tuning

2023

Managed service for customizing open-weight models on customer datasets, supporting full-parameter and parameter-efficient methods like LoRA.

Leadership

  • VV

    Vipul Ved Prakash

    Co-founder & CEO

    Previously co-founded Topsy (acquired by Apple) and worked at Apple's AI/ML team; long-time open-source and search infrastructure advocate.

  • CZ

    Ce Zhang

    Co-founder

    Associate Professor at ETH Zurich with expertise in machine learning systems and distributed computing; drives technical research strategy.

  • CR

    Chris Ré

    Co-founder (public information limited on active operational role)

    Stanford professor and prominent ML systems researcher; known for work on data-centric AI and foundation model infrastructure.

Funding history

Year
Round
Amount
Lead investors
  • 2024
    Series B
    $305M
    General Catalyst, Salesforce Ventures, Coatue, Lux Capital, Prosperity7 (public information limited on full participant list)

Strengths & risks

Strengths

  • +Price-performance advantage on open-model inference vs general-purpose clouds
  • +Broad, fast model support capturing developer mindshare at release time
  • +Research credibility in ML systems and efficient serving optimizations
  • +Model-agnostic positioning hedges against any single open model losing relevance
  • +Strong balance sheet from $305M Series B enabling capacity expansion

Risks

  • Hyperscalers can replicate open-model endpoints and bundle with existing cloud spend
  • DeepSeek, Meta, and other model owners may offer cheaper or free hosted APIs
  • Commodity inference is a low-margin business at scale without differentiation
  • Enterprise buyers may prefer vertically integrated stacks from Nvidia or cloud providers
  • Rapid model release cycles require constant engineering investment to stay current

Recent moves

  1. Closed $305M Series B at ~$3.3B valuation

    2024

    Raised one of the largest infrastructure rounds of the year to scale GPU clusters and expand enterprise sales.

  2. Expanded model catalog to include DeepSeek and Qwen families

    2024

    Added high-performing Chinese open-weight models to maintain broadest coverage among independent inference providers.

Competitive position

Together AI competes on multiple fronts simultaneously. Against hyperscalers—AWS (Bedrock), Azure (Model Catalog), and GCP (Vertex AI)—it wins on price, speed of model availability, and lack of vendor lock-in, but loses on enterprise procurement integration and breadth of adjacent services. Against closed API providers like OpenAI and Anthropic, it offers model flexibility and data sovereignty, appealing to cost-conscious teams and privacy-sensitive enterprises, though it cannot match the raw capability of frontier proprietary models. Against other open-model clouds such as Fireworks AI, Groq, and Replicate, Together differentiates through its larger funding base, broader model catalog, and research-driven serving optimizations, though Groq has carved out a niche in ultra-low-latency with its LPU hardware.

The company's strategic challenge is that its advantages are largely operational and economic rather than structural. If AWS decides to subsidize Llama inference to retain GPU customers, or if DeepSeek offers its own API at cost, Together's pricing edge erodes quickly. Its best path to sustained differentiation is building proprietary efficiency layers—better speculative decoding, custom scheduling, or hardware co-design—that make its cost structure unmatchable even by giants, while deepening enterprise features like fine-tuning, security certifications, and multi-cloud portability.

What to watch

  • 01Gross margin trends as scale increases and hyperscalers compete on price
  • 02Enterprise customer concentration and average contract values moving upmarket
  • 03Release cadence of proprietary serving optimizations (kernels, routing, speculative decoding)
  • 04Strategic moves by Meta, DeepSeek, or Mistral to own their own hosted API channels
  • 05Capital expenditure efficiency: can it match GPU utilization rates of larger clouds

Frequently asked questions

How is Together AI different from OpenAI's API?

OpenAI hosts its own proprietary models like GPT-4, while Together specializes in open-weight models such as Llama and DeepSeek, offering more model choice and typically lower per-token pricing.

Can I use Together for production workloads or just experiments?

Together supports both; it offers pay-as-you-go APIs for prototyping and dedicated endpoints with SLAs for production traffic, though enterprise buyers should validate uptime and support tiers.

Does Together AI train its own foundation models?

No. Together is an infrastructure layer—it hosts, serves, and fine-tunes open-source models built by others, rather than developing proprietary foundation models from scratch.

How does pricing compare to AWS Bedrock or Azure?

Together generally undercuts hyperscalers on per-token pricing for comparable open models, though total cost of ownership depends on data transfer, storage, and existing cloud commitments.

What models are available on Together Inference?

The catalog includes Llama, Mixtral, DeepSeek, Qwen, and other popular open-weight LLMs, with new models typically added within days of release.

Is my data used to train Together's or the base models?

Together states that customer data submitted via API is not used to train foundation models, but enterprises should review the specific data processing agreement for fine-tuning workloads.

Can I run Together on my own cloud account or is it fully hosted?

Together is primarily a fully hosted service on its own GPU clusters; it does not currently offer a bring-your-own-cloud deployment model comparable to some Kubernetes-based alternatives.

Who should use Together Fine-Tuning vs doing it themselves?

Teams without dedicated ML infrastructure or GPU management expertise benefit most from Together's managed fine-tuning, which handles orchestration, checkpointing, and hyperparameter defaults.

The bottom line

Together AI has carved out a valuable position as the 'default cloud' for developers who want fast, cheap inference on open-weight models without managing GPUs themselves. Its aggressive pricing and broad model support (Llama, Mixtral, DeepSeek, Qwen) have driven rapid adoption among startups and research labs. However, the company sits in a high-risk middle layer: hyperscalers (AWS, Azure, GCP) are racing to add similar open-model endpoints, while model creators like Meta and DeepSeek are increasingly offering their own hosted APIs. Together's $3.3B valuation assumes it can maintain pricing power and expand into higher-margin enterprise workloads. The next 18 months will test whether it can evolve from a cost-efficient commodity inference layer into a stickier platform with proprietary optimizations, or whether it gets squeezed by giants above and below. Watch for enterprise traction, gross-margin trends, and whether it can build defensible moats in model-serving efficiency.

Visit Together AI

Key products

  • Together Inference
  • Together Fine-Tuning

Latest announcements

8 entries
  1. Together AI partners with Pearl Research Labs to launch a discounted Pearl-powered inference endpoint for Gemma-4-31B-it-pearl, using Proof of Useful Work to turn AI workloads into crypto emissions.

  2. Voice finder helps developers search, match, filter, and audition 600+ voices across Together AI TTS models using natural-language prompts or uploaded audio samples.

  3. Real-world inference benchmarks for coding agents: 31% more TPS than TensorRT-LLM, 2× better TTFT at saturation, and 76% lower cost than Claude Opus 4.6.

  4. What is an AI Native Cloud?

    announcementApr 7, 2026

    AI-native companies need infrastructure built for models, not legacy workloads. Learn what defines an AI Native Cloud and why it matters for the next platform shift.

  5. The team behind FlashAttention and ThunderKittens — how Together AI's kernel researchers close the gap between GPU hardware and production AI.

  6. Mamba-3

    researchMar 11, 2026

    Meet Mamba-3: the SSM built for inference. Faster than Transformers at decode, stronger than Mamba-2, and open-source from day one.

  7. As GPU throughput outpaces memory bandwidth, kernels must evolve. We introduce FlashAttention-4, featuring new pipelining for maximum overlap, 2-CTA MMA modes to reduce shared memory traffic, and a hardware-software hybrid approach to softmax exponentials.

  8. Together AI announces fine-tuning platform upgrades supporting larger models and longer contexts.

Related companies

All companies →