Together AI
FlagshipInfrastructureUSA·HQ San Francisco·Est. 2022
Open-source model inference and training infrastructure.
our score
Our take
Together AI is the leading specialized inference cloud for open-source LLMs, but faces a brutal race against hyperscalers and model owners building their own stacks.
At a glance
- Best known for
- Fast, cheap inference API for open-source LLMs
- Biggest strength
- Price-performance leadership on open-weight model serving
- Biggest risk
- Margin compression from hyperscalers and model owners
- Stage
- Series B
- Primary revenue
- Usage-based fees for LLM inference and managed fine-tuning on dedicated/cloud GPUs
What they do
Together AI operates a specialized cloud infrastructure platform optimized for open-source generative AI models. Its core offering, Together Inference, provides API access to popular open-weight large language models—including Meta's Llama family, Mistral/Mixtral, DeepSeek, and Alibaba's Qwen—at price points that typically undercut general-purpose clouds. The platform abstracts away GPU cluster management, scheduling, and model-serving optimization, letting developers and enterprises run inference without building their own Kubernetes fleets. Complementing this is Together Fine-Tuning, a managed service that lets customers adapt these base models on proprietary data using Together's compute fabric.
The company targets a broad spectrum of AI builders: startups that need cost-efficient inference at scale, research labs running open-model evaluations, and enterprises experimenting with self-hosted alternatives to closed APIs like OpenAI's GPT-4. Together also invests in research around efficient inference—such as kernel optimizations, speculative decoding, and routing algorithms—to differentiate on latency and cost rather than model ownership. In effect, Together functions as an independent 'model-agnostic' AI cloud, betting that the open-source ecosystem will remain fragmented enough that developers need a neutral, optimized host rather than going directly to Meta, DeepSeek, or a hyperscaler.
Origin story
Together AI was founded in 2022 in San Francisco by a team with deep roots in AI research and distributed systems. The company emerged during the early generative AI boom with a thesis that open-source models would proliferate and require specialized infrastructure to run efficiently at scale—analogous to how the web needed CDNs and specialized databases. Its founders and early team included researchers with experience from Stanford, Berkeley, and major AI labs, giving it credibility in the research community.
The company quickly gained traction by offering inference endpoints for newly released open models faster than general clouds could, often within days of a model drop. This speed-to-market, combined with aggressive per-token pricing, made it a go-to for developers experimenting with Llama and Mistral variants. A pivotal moment came with its Series B, a $305 million round that valued the company at roughly $3.3 billion—one of the largest financings in the AI infrastructure layer at that stage. This capital signaled investor conviction that an independent inference layer could capture significant value even without owning frontier models. Since then, Together has expanded from pure inference into fine-tuning and enterprise-grade deployments, though it remains primarily known as an inference utility.
Key products
Together Inference
2022API and dedicated-endpoint service for running open-source LLMs at scale, optimized for low latency and cost via custom scheduling and kernel optimizations.
Together Fine-Tuning
2023Managed service for customizing open-weight models on customer datasets, supporting full-parameter and parameter-efficient methods like LoRA.
Leadership
- VV
Vipul Ved Prakash
Co-founder & CEO
Previously co-founded Topsy (acquired by Apple) and worked at Apple's AI/ML team; long-time open-source and search infrastructure advocate.
- CZ
Ce Zhang
Co-founder
Associate Professor at ETH Zurich with expertise in machine learning systems and distributed computing; drives technical research strategy.
- CR
Chris Ré
Co-founder (public information limited on active operational role)
Stanford professor and prominent ML systems researcher; known for work on data-centric AI and foundation model infrastructure.
Funding history
- 2024Series B$305MGeneral Catalyst, Salesforce Ventures, Coatue, Lux Capital, Prosperity7 (public information limited on full participant list)
Strengths & risks
Strengths
- +Price-performance advantage on open-model inference vs general-purpose clouds
- +Broad, fast model support capturing developer mindshare at release time
- +Research credibility in ML systems and efficient serving optimizations
- +Model-agnostic positioning hedges against any single open model losing relevance
- +Strong balance sheet from $305M Series B enabling capacity expansion
Risks
- ⚠Hyperscalers can replicate open-model endpoints and bundle with existing cloud spend
- ⚠DeepSeek, Meta, and other model owners may offer cheaper or free hosted APIs
- ⚠Commodity inference is a low-margin business at scale without differentiation
- ⚠Enterprise buyers may prefer vertically integrated stacks from Nvidia or cloud providers
- ⚠Rapid model release cycles require constant engineering investment to stay current
Recent moves
Closed $305M Series B at ~$3.3B valuation
2024Raised one of the largest infrastructure rounds of the year to scale GPU clusters and expand enterprise sales.
Expanded model catalog to include DeepSeek and Qwen families
2024Added high-performing Chinese open-weight models to maintain broadest coverage among independent inference providers.
Competitive position
Together AI competes on multiple fronts simultaneously. Against hyperscalers—AWS (Bedrock), Azure (Model Catalog), and GCP (Vertex AI)—it wins on price, speed of model availability, and lack of vendor lock-in, but loses on enterprise procurement integration and breadth of adjacent services. Against closed API providers like OpenAI and Anthropic, it offers model flexibility and data sovereignty, appealing to cost-conscious teams and privacy-sensitive enterprises, though it cannot match the raw capability of frontier proprietary models. Against other open-model clouds such as Fireworks AI, Groq, and Replicate, Together differentiates through its larger funding base, broader model catalog, and research-driven serving optimizations, though Groq has carved out a niche in ultra-low-latency with its LPU hardware.
The company's strategic challenge is that its advantages are largely operational and economic rather than structural. If AWS decides to subsidize Llama inference to retain GPU customers, or if DeepSeek offers its own API at cost, Together's pricing edge erodes quickly. Its best path to sustained differentiation is building proprietary efficiency layers—better speculative decoding, custom scheduling, or hardware co-design—that make its cost structure unmatchable even by giants, while deepening enterprise features like fine-tuning, security certifications, and multi-cloud portability.
What to watch
- 01Gross margin trends as scale increases and hyperscalers compete on price
- 02Enterprise customer concentration and average contract values moving upmarket
- 03Release cadence of proprietary serving optimizations (kernels, routing, speculative decoding)
- 04Strategic moves by Meta, DeepSeek, or Mistral to own their own hosted API channels
- 05Capital expenditure efficiency: can it match GPU utilization rates of larger clouds
Frequently asked questions
How is Together AI different from OpenAI's API?
OpenAI hosts its own proprietary models like GPT-4, while Together specializes in open-weight models such as Llama and DeepSeek, offering more model choice and typically lower per-token pricing.
Can I use Together for production workloads or just experiments?
Together supports both; it offers pay-as-you-go APIs for prototyping and dedicated endpoints with SLAs for production traffic, though enterprise buyers should validate uptime and support tiers.
Does Together AI train its own foundation models?
No. Together is an infrastructure layer—it hosts, serves, and fine-tunes open-source models built by others, rather than developing proprietary foundation models from scratch.
How does pricing compare to AWS Bedrock or Azure?
Together generally undercuts hyperscalers on per-token pricing for comparable open models, though total cost of ownership depends on data transfer, storage, and existing cloud commitments.
What models are available on Together Inference?
The catalog includes Llama, Mixtral, DeepSeek, Qwen, and other popular open-weight LLMs, with new models typically added within days of release.
Is my data used to train Together's or the base models?
Together states that customer data submitted via API is not used to train foundation models, but enterprises should review the specific data processing agreement for fine-tuning workloads.
Can I run Together on my own cloud account or is it fully hosted?
Together is primarily a fully hosted service on its own GPU clusters; it does not currently offer a bring-your-own-cloud deployment model comparable to some Kubernetes-based alternatives.
Who should use Together Fine-Tuning vs doing it themselves?
Teams without dedicated ML infrastructure or GPU management expertise benefit most from Together's managed fine-tuning, which handles orchestration, checkpointing, and hyperparameter defaults.
The bottom line
Together AI has carved out a valuable position as the 'default cloud' for developers who want fast, cheap inference on open-weight models without managing GPUs themselves. Its aggressive pricing and broad model support (Llama, Mixtral, DeepSeek, Qwen) have driven rapid adoption among startups and research labs. However, the company sits in a high-risk middle layer: hyperscalers (AWS, Azure, GCP) are racing to add similar open-model endpoints, while model creators like Meta and DeepSeek are increasingly offering their own hosted APIs. Together's $3.3B valuation assumes it can maintain pricing power and expand into higher-margin enterprise workloads. The next 18 months will test whether it can evolve from a cost-efficient commodity inference layer into a stickier platform with proprietary optimizations, or whether it gets squeezed by giants above and below. Watch for enterprise traction, gross-margin trends, and whether it can build defensible moats in model-serving efficiency.
Key products
- Together Inference
- Together Fine-Tuning