
What 37signals’ Cloud Repatriation Taught Us About AI Infrastructure


Summary

  • Cloud repatriation can cut costs by millions at scale.
  • AI workloads amplify savings due to GPU and storage costs.
  • On-prem or hybrid suits predictable, high-volume inference.
  • Cloud still fits burst workloads like model training.
  • Hybrid strategies balance cost, performance, and compliance.

In 2023, 37signals announced that it had completely left the public cloud and followed up by publicly documenting its cloud repatriation process, providing one of the clearest real-world examples of on-premises economics at scale. By reversing its cloud migration and shifting workloads to private cloud infrastructure, the company cut its annual cloud infrastructure spend by almost $2 million.

The transparency of the numbers made the case compelling. In 2022, 37signals spent $3,201,564 on cloud services, which is about $266,797 per month. These detailed cost breakdowns, along with published hardware investment and payback timelines, provided a rare look into the financial mechanics of large-scale cloud repatriation.

For commodity SaaS workloads, the math was clear. But the same logic raises an important question for the next generation of compute-heavy systems: does the economic argument extend to AI infrastructure as well? In this article, we examine whether it does.

TL;DR

  • 37signals spent ~$3.2M/year on AWS in 2022.
  • After repatriating workloads to their own infrastructure, cloud spend dropped to ~$1.3M by 2024.
  • The company invested roughly $700K–$800K in servers and paid them off in under 18 months.
  • The entire infrastructure is still run by the same 10-person team. No additional operational overhead.
  • The key takeaway is that at a sustained scale, owning infrastructure can be dramatically cheaper than renting it.

The 37signals Playbook: What Hansson Actually Documented

In 2022, 37signals spent $3.2 million annually on AWS. After leaving the cloud in 2023, their annual costs had dropped to approximately $1.3 million by 2024, a reduction of almost $2 million per year.

The transition required a hardware investment of roughly $700,000 in Dell servers. The company fully recouped the investment in under 18 months, achieving complete payback in the second half of 2023 as its AWS reserved instance contracts expired. From that point forward, the savings flowed directly to operating margin rather than offsetting capital expense.

The final phase targeted storage. 37signals projected roughly $1.5 million in hardware costs and about $200,000 per year in operating expenses, replacing the remaining ~$1.3 million annual cloud storage bill with a one-time capital outlay plus a fraction of the ongoing operating cost. Over five years, 37signals revised its total savings projection upward from $7 million to more than $10 million.
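As a back-of-the-envelope check, the payback dynamic can be modeled in a few lines. This is a rough sketch using the article's round numbers, not 37signals' actual accounting; in practice payback took closer to 18 months because the savings only fully materialized once reserved-instance contracts expired.

```python
def payback_months(hardware_cost, annual_cloud_before, annual_cloud_after,
                   annual_opex_added):
    """Months until cumulative savings cover the hardware outlay
    (idealized: assumes savings start immediately at full rate)."""
    annual_savings = annual_cloud_before - annual_cloud_after - annual_opex_added
    return hardware_cost / (annual_savings / 12)

# Figures from the article: ~$700K in servers, cloud spend falling from
# $3.2M to $1.3M per year, plus ~$200K/year in new operating costs.
months = payback_months(700_000, 3_200_000, 1_300_000, 200_000)
print(f"Idealized payback: ~{months:.1f} months")
```

The idealized model lands well under a year; the gap between that and the reported 18 months is the ramp-up period while reserved-instance commitments ran out.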

37signals cloud exit financials by year

To illustrate the financial impact of 37signals’ cloud exit over time, the table below breaks down annual cloud spending, on-premises hardware investments, and operating costs, highlighting the resulting net savings and key operational notes.

| Year | Cloud spend | Hardware investment | Operating costs | Notes |
|---|---|---|---|---|
| 2022 | Baseline: ~$3.2M | $0 | Included in cloud spend | Full cloud dependency |
| 2023 | Migration: ~$2M | ~$700–800K | Moderate | Hardware fully recouped in under 18 months |
| 2024+ | Post-repatriation: ~$1.3M | ~$1.5M (storage) | ~$200K/year | ~$1.9M annual savings |
| 2025+ | Minimal AWS dependency | ~$1.5M (Pure Storage, 18PB) | ~$200K/year | $10M+ projected 5-year savings |

Notably, the migration did not require the team to expand operations. A 10-person infrastructure team handled the entire repatriation without adding new staff. Addressing a common concern about operational overhead, 37signals co-founder David Heinemeier Hansson noted:

We’ve been out for just over a year now, and the team managing everything is still the same. There were no hidden dragons of additional workload associated with the exit that required us to balloon the team, as some spectators speculated when we announced it. All the answers in our Big Cloud Exit FAQ continue to hold.

This directly challenges the common assumption that moving away from public cloud environments inevitably requires a significantly larger infrastructure team.

Execution followed a “criticality ladder” strategy: the team migrated lower-risk services first and more critical ones later. They moved the HEY email system in stages, starting with caching, then the database, and finally job services. To preserve rollback capability during the cloud repatriation process, they colocated infrastructure approximately one millisecond of latency away from the AWS region. After stabilizing the system, they replaced managed services with substantial recurring costs, including RDS and managed Elasticsearch, which together cost more than $500,000 annually.

What makes 37signals’ case study consequential is the publicly documented cost efficiency. For organizations questioning long-term cloud adoption assumptions, particularly with regard to storage costs and managed services, the 37signals documentation provides a rare baseline for comparison.

Why AI Infrastructure Economics Are Even More Extreme

The lessons from 37signals’ cloud repatriation take on a sharper edge when applied to AI infrastructure. Higher GPU costs, predictable inference workloads, massive embedding storage, and stricter data regulations create financial and operational pressures that amplify the advantages of on-premises or hybrid cloud solutions that allow you to move workloads where they make the most sense. Below, we break down the key drivers.

AI infrastructure cost comparison

To evaluate the cost implications of different AI infrastructure approaches, the table below compares upfront setup costs, monthly operating expenses at varying workloads, and expected break-even timelines for cloud, on-premises, and hybrid configurations.

| Setup | Setup cost | Monthly cost | Break-even |
|---|---|---|---|
| Cloud GPU rental (AWS/Azure) | $0 | $2,900–3,500 (8h/day × $4–8/hour × 15 days) | N/A |
| Cloud inference APIs (Lambda Labs) | $0 | $1,800–2,500 (8h/day × $3.67/hour × 15 days) | N/A |
| Self-hosted GPU (8×H100 server) | $200K–400K | $1,500–2,000 (power + maintenance) | <12 months |
| Hybrid (cloud training + on-prem inference) | $200K–400K | Training only; inference minimal | <12 months |

Note: For cloud GPU rental, we estimate monthly cost assuming eight hours/day per GPU. The cost scales linearly with utilization; it is not directly per-query.
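The linear scaling the note describes can be expressed as a simple estimator. The formula is a sketch of how such figures are typically derived; actual cloud bills add storage, networking, and instance overhead beyond the bare GPU-hour rate, so the inputs here are illustrative assumptions rather than a reconstruction of the table.

```python
def monthly_gpu_rental(hourly_rate, hours_per_day, gpus=1, days=30):
    """Estimated monthly GPU rental cost; scales linearly with usage."""
    return hourly_rate * hours_per_day * gpus * days

# Illustrative: one on-demand H100 at $4-8/GPU-hour, 8 hours/day.
low = monthly_gpu_rental(4, 8)    # 960
high = monthly_gpu_rental(8, 8)   # 1920
print(f"${low}-{high}/month for a single GPU at 8h/day")
```

Doubling daily usage doubles the bill, which is exactly why sustained, high-utilization workloads are the ones where rental premiums hurt most.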

  1. GPU cloud markups are high

AI workloads depend heavily on GPUs, and cloud providers charge far steeper premiums for GPU capacity than for typical CPU compute. On-demand AWS P5 instances with H100 GPUs cost roughly $4–8 per GPU-hour, while comparable Azure H100 instances are about $3.67 per hour. By contrast, spot markets and alternative providers such as Lambda Labs offer similar GPU capacity for $1–2 per hour, or $1.85–2.49 per hour with reserved commitments. 

The result is a 4–8× markup for on-demand hyperscaler GPU capacity relative to the spot or specialized GPU cloud market. In other words, the premium cloud providers charge for high-end AI compute is significantly larger than typical CPU cloud markups. For organizations running sustained inference workloads, this pricing gap quickly becomes the dominant cost driver in AI infrastructure.

  2. Predictable inference makes GPU ownership economical

High GPU pricing makes outright ownership worth evaluating. A single H100 costs roughly $25K–40K, while a complete 8×H100 server ranges from $200K to $400K. Lenovo’s analysis shows that six or more hours of sustained daily usage reaches payback against AWS within the first year.

The reason this break-even arrives so quickly is that AI inference workloads are unusually predictable. Unlike SaaS traffic, which fluctuates throughout the day, production AI systems such as recommendation engines tend to process steady volumes of requests.

Predictability changes the economics. When infrastructure runs at consistent utilization, owned hardware can be amortized efficiently across the workload. Paying cloud premiums for burst capacity that teams rarely use becomes unnecessary.

For organizations running inference continuously, the hardware investment is often recouped in under 12 months. From that point forward, the savings follow the same pattern documented by 37signals: fixed infrastructure replacing an ongoing rental bill.
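A rough break-even calculation, consistent with the Lenovo figure cited above, looks like this. The GPU price, three-year amortization window, power cost, and $5/hour rental rate are illustrative assumptions, not vendor quotes.

```python
def break_even_hours_per_day(gpu_price, lifetime_years, annual_power_cost,
                             cloud_rate_per_hour):
    """Daily rental hours at which owning a GPU costs the same as renting."""
    annual_owned_cost = gpu_price / lifetime_years + annual_power_cost
    return annual_owned_cost / (cloud_rate_per_hour * 365)

# Assumed inputs: $30K H100, 3-year amortization, ~$300/year in power,
# $5/hour on-demand rental rate.
hours = break_even_hours_per_day(30_000, 3, 300, 5)
print(f"Owning wins above ~{hours:.1f} GPU-hours/day")
```

Under these assumptions the crossover sits between five and six hours of daily use, in line with the "six or more hours" threshold in Lenovo's analysis.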

  3. Embedding storage requirements are massive

Even if GPU compute were optimized, AI systems introduce another rapidly growing cost layer: embedding storage. Vector databases store high-dimensional embeddings used for search, retrieval, and recommendation. As datasets scale into millions or billions of records, storage requirements expand quickly.

For instance, 10 million vectors at 1,536 dimensions require at least 58GB of raw storage, often 200–300GB with indexes and metadata. Cloud storage services like Pinecone charge $0.33/GB/month, meaning 500GB could cost $165/month before any queries. Self-hosted solutions like PostgreSQL with pgvector dramatically reduce cloud spending while keeping sensitive data under direct control. Over time, these storage requirements compound infrastructure costs alongside GPU compute, further reinforcing the economic advantages of self-hosted or hybrid architectures.
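The storage figures above follow from straightforward arithmetic. The 4× index-and-metadata multiplier is an assumption chosen to land in the 200–300GB range mentioned; real overhead depends on the index type and metadata volume.

```python
def embedding_storage_gib(num_vectors, dims, bytes_per_dim=4, overhead=1.0):
    """float32 embedding storage in GiB, with an optional
    index/metadata overhead multiplier."""
    return num_vectors * dims * bytes_per_dim * overhead / 2**30

raw = embedding_storage_gib(10_000_000, 1536)                    # ~57 GiB raw
indexed = embedding_storage_gib(10_000_000, 1536, overhead=4.0)  # ~229 GiB with indexes
monthly = 500 * 0.33  # $165/month for 500 GB at $0.33/GB/month
print(f"{raw:.1f} GiB raw, {indexed:.0f} GiB indexed, ${monthly:.0f}/month managed")
```

Because the cost is a flat per-GB rate, the monthly bill grows linearly with corpus size, which is why self-hosted storage pulls ahead as datasets scale.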

  4. Data sovereignty and compliance favor on-premises deployment

Data residency and broader compliance requirements are growing priorities as the AI industry becomes increasingly regulated. Notably, the EU AI Act introduced strict rules for AI systems, with prohibitions on certain AI use cases taking effect in February 2025. On-premises deployment simplifies compliance.

For financial organizations navigating complex regulatory environments, solutions like the Actian Data Intelligence Platform help enforce data governance and streamline compliance workflows.

The Cloud Infrastructure Case Studies 37signals Validated

The financial transparency of 37signals’ cloud exit was radical, but the repatriation itself was not an isolated occurrence. It was part of a growing trend of organizations trying to regain cost control and optimize their cloud infrastructure. Several high-profile case studies illustrate the scale and economics of moving workloads back from public clouds to owned or hybrid infrastructure.

Dropbox

Dropbox pioneered enterprise cloud repatriation as early as 2015, completing the migration between 2016 and 2018. The company moved roughly 90% of customer data, reportedly over 500 petabytes, off AWS to three owned colocation facilities. The infrastructure investment totaled $53 million, yet Dropbox reported $74.6 million in operational savings over two years per its 2018 S‑1 filing. A small portion of workloads, primarily European customers and specialized services, remain in AWS. Internally, the initiative was known as “Magic Pocket,” and it exemplifies how a well-executed hybrid cloud approach can deliver substantial savings while aligning with long-term business objectives.

Ahrefs

Ahrefs, the SEO tools company, relied on a Singapore colocation setup with 850 servers. Their reported savings from avoiding public cloud were approximately $400 million over 2.5 years. Actual infrastructure cost: $39.5 million for 850 servers (~$1,500/server/month), versus an estimated $447.7 million if hosted entirely on AWS (~$17,557/server/month equivalent). As Ahrefs put it: “We wouldn’t be profitable, or even exist, if our products were 100% on AWS.” While critics argue that Ahrefs inflated AWS estimates, the directional savings were undeniable, illustrating that cloud repatriation challenges can be surmounted at scale with careful planning.

GEICO

GEICO spent a decade migrating to multiple cloud providers, only for its costs to climb to 2.5× projections, reaching $300 million by 2022 across eight providers. In response, GEICO began moving workloads to a private cloud built on OpenStack and Kubernetes, targeting over 50% repatriation by 2029. Early results show a 50% reduction in compute costs and a 60% reduction per gigabyte of storage compared with public cloud services, demonstrating how a hybrid cloud architecture can deliver efficiency, compliance, and alignment with long-term business objectives.

Akamai

Akamai was on the path to spending over $100 million on third-party cloud services before migrating compute workloads to its own global edge network of 350,000+ servers. The migration delivered savings of roughly $100 million per year, a testament to the economics of repatriation when existing infrastructure and scale align.

What these cases share is the same economic pattern documented by 37signals. Predictable, high-volume workloads eventually become cheaper to run on owned infrastructure than on hyperscaler clouds. 

These examples reflect a broader shift across enterprise infrastructure strategies. Barclays’ CIO surveys show cloud repatriation trending upward in recent years, with sentiment peaking in the second half of 2024, when 86% of CIOs reported plans for some form of repatriation.


Barclays’ CIO survey showing 86% of CIOs planning cloud repatriation

However, this statistic does not mean that companies are abandoning public cloud environments completely. According to IDC, only 8–9% of companies favor full repatriation with most preferring a hybrid approach that combines public and private clouds. Hybrid cloud infrastructure allows organizations to optimize workload placement by strategically allocating sensitive data and mission-critical applications on-premises while leveraging public cloud services for less critical workloads. As such, it has become increasingly important for teams exploring similar transitions to understand the nuances of hybrid deployments and their associated risks.

Cloud Repatriation Statistics

Cloud repatriation is accelerating even as public cloud spending keeps climbing. IDC projects global public cloud spend will reach $1.6 trillion in 2028, roughly double 2024 levels. Yet, as mentioned earlier, 86% of CIOs are planning some form of repatriation according to Barclays. Both trends can be true because this is not a cloud exodus so much as a rebalancing: enterprises are leaning toward a hybrid cloud model.

AI is likely to accelerate that shift. AI workloads account for less than 10% of total cloud compute today, but Gartner projects that figure will approach 50% by 2029. Hyperscalers are responding with enormous capital investment: an estimated $600 billion in infrastructure spend in 2026, roughly three-quarters of it tied to AI. The assumption is clear: enterprises will rent that GPU capacity. But the 37signals math suggests that once AI workloads move from experimentation to steady production, ownership economics begin to dominate.

Cost pressure is already driving behavior. Flexera reports that 27% of cloud resources are wasted or underutilized, and 21% of workloads have already been repatriated. The primary reason cited is cost exceeding projections, followed by performance concerns. With GPUs, the margin for inefficiency is thinner. There are fewer optimization levers, higher hourly rates, and faster budget burn.

Regulation adds another layer. The EU AI Act, DORA for financial services, China’s PIPL, and India’s DPDP are tightening data governance requirements. Mimecast reports that 87% of organizations now factor data sovereignty into vendor decisions. For AI systems, sovereignty extends beyond data location to model provenance, audit trails, and compliance documentation. On-premises deployment does not eliminate regulatory complexity, but it centralizes control, and for many enterprises, that simplicity is becoming strategically attractive.


A bar chart showing why enterprises repatriate

The Counter-Arguments and When Cloud Providers Win

Not all observers agree that cloud repatriation is the best path for every organization, and public cloud environments still deliver clear value in certain circumstances. But these arguments often weaken when applied to AI workloads.

When cloud wins vs. when on-premises wins

| Component | Cloud advantage | On-prem advantage |
|---|---|---|
| Workload predictability | Handles spiky or unpredictable workloads | Predictable workloads are cheaper to self-host |
| Team expertise | Requires minimal in-house infrastructure skill | Strong IT teams can optimize and reduce vendor reliance |
| Scale and growth | Rapid scaling and global expansion | Predictable growth enables cost-efficient hardware |
| Regulatory requirements | Managed compliance, geo-redundancy | Direct control simplifies regulatory alignment |
| Cost and margins | Pay-as-you-go reduces upfront spend | Long-term savings from owned infrastructure |
| Service quality | Cloud SLAs ensure availability and performance | Dedicated resources guarantee predictable uptime |

Cloud “wrong usage” argument

Jeremy Daly, a serverless advocate, argues that “37signals was using the cloud wrong.” By treating cloud environments as virtual colocation, running VMs and Kubernetes, they were paying cloud premiums without capturing the value of serverless, managed services, and instant scaling. As Daly notes, “In the cloud, we should be renting services, not servers.” 

For SaaS workloads with highly variable or spiky traffic, this argument is compelling. Serverless infrastructure allows organizations to scale instantly and pay only for the compute they actually use. 

However, AI inference workloads often behave very differently. Production inference systems, such as recommendation models, copilots, and document processing pipelines, tend to run at steady, sustained utilization rather than unpredictable bursts. In these cases, the economic advantage of elastic cloud scaling diminishes. The premium paid for burst capacity still exists, but the workload itself rarely needs that burst capacity.

Daly’s argument, therefore, holds for variable SaaS workloads, where elasticity is critical. For sustained AI inference workloads running at high utilization, paying a premium for burst capacity that is rarely used can make dedicated infrastructure or hybrid deployments more cost-efficient.

Full cost critique

Some critics also question the financial assumptions behind 37signals’ approach. They point out that hardware and software typically account for only about 20% of IT costs, with the remainder covering electricity, cooling, physical security, racking, uninterruptible power supplies (UPS), and opportunity costs. David Heinemeier Hansson’s analysis did not include all of these overheads because 37signals uses colocation facilities rather than fully owned data centers. Even so, given 37signals’ figures, it is reasonable to conclude that renting colocation space can still be far cheaper than relying on cloud services.

Competence vs. growth framework

Forrest Brazeal’s IT competence versus growth aspirations framework provides additional nuance. He places 37signals in the high-competence, low-growth quadrant, which is ideal for self-hosting. “Not every company has the competence (high) or growth aspirations (low) of 37signals,” he observes. Startups with uncertain or spiky workloads benefit from cloud flexibility, but AI companies running production inference at scale often combine high operational competence with steady growth, a profile well-suited to repatriation.

Applying the Playbook to AI Infrastructure

If 37signals provided the economic blueprint, AI infrastructure makes the economics more concrete. The decision is no longer abstract. It becomes a structured assessment grounded in workload behavior, utilization, and regulatory exposure.

A practical four-question framework helps translate the 37signals logic into AI terms:

1. Is your inference workload predictable and sustained?

Unlike SaaS traffic spikes, most production AI systems such as recommendation engines, RAG pipelines, or fraud detection models process steady volumes with gradual growth.

2. Are projected GPU utilization rates above 60–70%?

At this threshold, owned hardware amortization typically undercuts public cloud GPU pricing within the first year.

3. Are you processing more than 10–50 million queries per month?

At this scale, per-token and per-query pricing from cloud APIs compound rapidly.

4. Do you face data sovereignty or strict compliance requirements?

For financial services, healthcare, or government workloads, regulatory mandates can tilt the decision toward controlled environments.

If the answer is “yes” to three or four of these, the repatriation economics tend to favor on-premises deployment for production inference.
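The four questions above can be reduced to a small decision helper. The three-of-four threshold mirrors the rule of thumb just stated; the function and parameter names are hypothetical, introduced only for illustration.

```python
def recommend_deployment(predictable_inference, high_gpu_utilization,
                         high_query_volume, sovereignty_requirements):
    """Apply the four-question framework: three or more 'yes'
    answers point toward on-prem/hybrid production inference."""
    yes_count = sum([predictable_inference, high_gpu_utilization,
                     high_query_volume, sovereignty_requirements])
    return "on-premises / hybrid" if yes_count >= 3 else "public cloud"

# Example: steady inference, >70% utilization, high volume, no mandates.
print(recommend_deployment(True, True, True, False))  # on-premises / hybrid
```

In practice the questions are weighted by business context rather than counted equally; the helper simply makes the rule of thumb explicit.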

Decision matrix

| Workload stage | Recommended environment | Rationale |
|---|---|---|
| Model training | Public cloud | Compute-intensive; cloud GPUs handle burst workloads cost-effectively |
| Experimentation and prototyping | Public cloud | Flexible, fast provisioning for early-stage iteration |
| Production inference | On-premises / hybrid | Steady workloads; owned hardware cheaper at 60–70%+ GPU utilization |
| Vector storage (embeddings) | On-premises | Reduces recurring managed-service costs and ensures data control |

The hybrid AI pattern

In practice, most AI organizations adopt a hybrid model rather than an all-or-nothing shift. Training remains in the cloud. Inference moves closer to owned infrastructure.

Lenovo documented that training Llama 3.1 at hyperscale (39.3 million GPU hours) in the cloud would exceed $483 million. That type of elastic, short-term scale is exactly where public cloud excels. Inference is different. Once a model is trained, serving it for three to five years becomes steady, predictable work. That is where amortized hardware economics has the upper hand.

This split architecture also simplifies data migration risk. Instead of relocating entire AI pipelines at once, organizations can migrate production inference workloads gradually while leaving experimentation and early-stage training in cloud environments. A controlled, phased migration process reduces operational disruption while ensuring seamless integration between cloud-based training and on-premises serving layers.

Self-hosted inference economics

The economics of self-hosted inference depend heavily on utilization and token volume. According to enterprise deployment benchmarks, a 7B-parameter model running on an H100 GPU at roughly 70% utilization costs about $10,000 per year in spot nodes or hardware amortization. Power costs about $300 annually, bringing the total costs to about $10,300.

Public LLM APIs, by contrast, typically charge per million tokens, with enterprise pricing in 2025 ranging from $0.25–$15 per million input tokens and $1.25–$75 per million output tokens depending on model tier and provider. 

At low usage levels, APIs remain the more economical option because infrastructure sits idle. However, the economics change as workloads scale. Industry analyses suggest that self-hosted deployment begins to break even at roughly two million tokens per day, after which the fixed cost of owned infrastructure is amortized across a large inference volume.
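A quick sketch of that break-even arithmetic: starting from the ~$10,300/year self-hosting figure above and an assumed blended API price of $14 per million tokens (a hypothetical mid-tier blend, not any provider's published rate), the crossover lands near the two-million-tokens-per-day mark.

```python
def break_even_tokens_per_day(annual_self_host_cost, api_price_per_million_tokens):
    """Daily token volume at which self-hosting matches API spend."""
    daily_budget = annual_self_host_cost / 365
    return daily_budget / api_price_per_million_tokens * 1_000_000

# Assumed: ~$10,300/year self-hosted and $14 per million tokens blended.
tokens = break_even_tokens_per_day(10_300, 14)
print(f"Break-even near ~{tokens / 1e6:.1f}M tokens/day")
```

A cheaper API tier pushes the break-even volume higher, which is why low-usage teams are usually better served by managed APIs.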

At high volumes, self-hosted inference can reduce costs by up to 78%. Artefact’s analysis found break-even around 8,000 conversations per day. Below that threshold, managed cloud APIs remain more economical. Above it, ownership compounds savings. The pattern mirrors 37signals: predictable workload plus high utilization equals rapid payback.

Vector databases

Instacart documented migrating from Elasticsearch plus FAISS to PostgreSQL with pgvector, achieving 80% cost savings and a 10× reduction in write amplification. Timescale’s pgvectorscale benchmarks show approximately 75% lower costs than managed vector services like Pinecone at comparable performance.

For RAG systems handling millions of queries monthly, self-hosted vector infrastructure produces savings that resemble the 37signals S3 case: large recurring storage bills replaced by amortized hardware and open-source tooling.

Data sovereignty as a structural driver

Grand View Research projects that the sovereign cloud market will reach USD 648.87 billion by 2033. Meanwhile, according to Gartner, around 60% of financial firms outside the United States are expected to adopt sovereign or on-premises deployments by 2028.

Frameworks such as the EU AI Act, China’s PIPL, and India’s DPDP mandate data localization and traceability. For organizations processing sensitive training datasets or proprietary inference logs, on-premises deployment inherently satisfies residency requirements because data never leaves jurisdictional boundaries.

The Bottom Line

37signals showed that cloud repatriation is a decision teams can measure, model, and defend with hard numbers. With AI infrastructure, the economics can be even more pronounced. If cloud repatriation saved 37signals roughly $10 million over five years, an equivalent AI company running production inference at comparable scale could save multiples of that amount, given the much higher cost of GPU compute and embedding infrastructure.

For organizations choosing to run AI workloads in controlled environments, platforms like Actian VectorAI DB provide a purpose-built vector database designed for high-volume vector search and AI inference workloads. It can be deployed on-premises or in the cloud, allowing organizations to place vector infrastructure where it best fits their operational and economic requirements.

Join the community and learn more about Actian.