Building a Hosting Platform from Scratch
Building a hosting platform from scratch is a systems problem more than a UI problem. You are not just building signup forms and dashboards. You are building provisioning flows, tenant isolation, billing states, service orchestration, and operational tooling that has to behave predictably under failure.
My stack for the hosting platform combined Next.js for the dashboard, Node.js services for orchestration, FastAPI for infrastructure-facing automation endpoints, and PostgreSQL for persistent state. The architecture was shaped around one question: how do we turn a user action in the control panel into a safe, observable backend workflow?
The Platform Needed Clear Service Boundaries
The first mistake in this kind of product is stuffing everything into one application. Provisioning, billing, identity, DNS operations, and UI rendering all evolve at different speeds. I separated them into focused services with clear contracts.
```ts
export interface ProvisioningRequest {
  tenantId: string
  product: 'shared-hosting' | 'vps' | 'database'
  region: string
  plan: string
  requestedBy: string
}

export interface ProvisioningResult {
  orderId: string
  status: 'queued' | 'running' | 'completed' | 'failed'
  resourceIds: string[]
}
```
That contract sat between the control panel and the orchestration layer. The frontend did not need to know how a VPS was actually created. It only needed a stable lifecycle to display.
Next.js Handled the Control Panel
The control panel focused on account management, resource visibility, and action dispatch. It was not responsible for long-running tasks.
```ts
export async function createServerAction(input: CreateServerInput) {
  const response = await fetch(`${process.env.API_URL}/provisioning/orders`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(input),
  })

  if (!response.ok) {
    throw new Error('Failed to create provisioning order')
  }

  return response.json()
}
```
That separation kept the web layer clean. A dashboard request should never block while infrastructure work runs for minutes.
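On the dashboard side, that means polling (or subscribing to) order status rather than holding a request open. A minimal polling sketch is below; `fetchStatus`, `waitForOrder`, and the option names are illustrative, not from the platform's actual codebase, and the status fetcher is injected so the logic stays testable.

```typescript
type OrderStatus = 'queued' | 'running' | 'completed' | 'failed'

// Poll an order until it reaches a terminal state. The fetcher is passed
// in rather than hard-coded, so a test can substitute a stub and the
// dashboard can substitute a real API call.
export async function waitForOrder(
  fetchStatus: (orderId: string) => Promise<OrderStatus>,
  orderId: string,
  opts = { intervalMs: 2000, maxAttempts: 30 },
): Promise<OrderStatus> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const status = await fetchStatus(orderId)
    // terminal states end the poll; everything else keeps waiting
    if (status === 'completed' || status === 'failed') return status
    await new Promise((resolve) => setTimeout(resolve, opts.intervalMs))
  }
  throw new Error(`Order ${orderId} still pending after ${opts.maxAttempts} attempts`)
}
```

In production you would likely cap total wait time and surface "still provisioning" to the user rather than throwing, but the shape is the same: the web layer observes state, it never executes infrastructure work.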
Orchestration Needed Durable State
The orchestration service turned business events into infrastructure tasks. The important part was durability: every order had to be resumable, debuggable, and safe to retry.
```ts
export async function enqueueProvisioningOrder(order: ProvisioningRequest) {
  const record = await db.order.create({
    data: {
      tenantId: order.tenantId,
      product: order.product,
      region: order.region,
      plan: order.plan,
      status: 'queued',
    },
  })

  await queue.add('provision-resource', { orderId: record.id })
  return record
}
```
Once the order existed in the database, the worker layer could process it asynchronously and update status transitions without losing visibility.
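The consuming side might look like the sketch below. An in-memory `Map` stands in for PostgreSQL, and `runProvisioning` is a placeholder for the real infrastructure calls; both names are assumptions for illustration. The important property is the idempotency guard: a redelivered queue job must not rerun work that already completed.

```typescript
type OrderStatus = 'queued' | 'running' | 'completed' | 'failed'

interface OrderRecord {
  id: string
  status: OrderStatus
}

// Process one queued provisioning job. The store and the provisioning
// function are injected so the worker logic can be exercised in isolation.
export async function processProvisioningJob(
  store: Map<string, OrderRecord>,
  runProvisioning: (orderId: string) => Promise<void>,
  orderId: string,
): Promise<OrderRecord> {
  const order = store.get(orderId)
  if (!order) throw new Error(`Unknown order ${orderId}`)

  // idempotency guard: a redelivered job must not redo completed work
  if (order.status === 'completed') return order

  store.set(orderId, { ...order, status: 'running' })
  try {
    await runProvisioning(orderId)
    const done: OrderRecord = { ...order, status: 'completed' }
    store.set(orderId, done)
    return done
  } catch {
    // failures land in a visible state instead of vanishing into logs
    const failed: OrderRecord = { ...order, status: 'failed' }
    store.set(orderId, failed)
    return failed
  }
}
```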
FastAPI Was a Good Fit for Infrastructure Automation
I used FastAPI for machine-facing automation endpoints because the ergonomics for background tasks, validation, and typed request bodies were strong, especially for infrastructure workflows.
```py
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class DnsRecordRequest(BaseModel):
    zone: str
    record_type: str
    name: str
    value: str

@app.post('/dns/records')
async def create_record(payload: DnsRecordRequest):
    # call provider SDK or internal DNS service
    return {'status': 'queued', 'name': payload.name}
```
That service did not own customer-facing UX. It owned deterministic infrastructure operations with clear request and response models.
Provisioning Was Built as a State Machine
Long-running workflows become much easier to debug when they move through explicit states instead of ad hoc logs.
```ts
export type OrderStatus =
  | 'queued'
  | 'validating'
  | 'provisioning'
  | 'configuring'
  | 'completed'
  | 'failed'
```
Each state transition emitted an event and a timestamp. That made the admin panel, customer notifications, and support tooling much easier to implement later.
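A sketch of what explicit transition checking can look like is below. The allowed-transition map is illustrative rather than the platform's exact rules; the point is that an illegal jump fails loudly instead of silently corrupting order state, and every accepted transition carries a timestamp for the timeline views.

```typescript
export type OrderStatus =
  | 'queued'
  | 'validating'
  | 'provisioning'
  | 'configuring'
  | 'completed'
  | 'failed'

// Which states each state may move to. Terminal states allow nothing.
const allowed: Record<OrderStatus, OrderStatus[]> = {
  queued: ['validating', 'failed'],
  validating: ['provisioning', 'failed'],
  provisioning: ['configuring', 'failed'],
  configuring: ['completed', 'failed'],
  completed: [],
  failed: [],
}

// Validate a transition and stamp it, so the admin panel, notifications,
// and support tooling all see the same event record.
export function transition(
  from: OrderStatus,
  to: OrderStatus,
): { status: OrderStatus; at: string } {
  if (!allowed[from].includes(to)) {
    throw new Error(`Illegal transition ${from} -> ${to}`)
  }
  return { status: to, at: new Date().toISOString() }
}
```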
PostgreSQL Was the Source of Truth
I used PostgreSQL for customer accounts, orders, invoices, service assignments, and audit trails. The key decision was treating infrastructure state as application data, not just log output.
That meant support staff could answer practical questions quickly:
- when did an order start?
- which step failed?
- was DNS configured?
- was billing activated before provisioning completed?
If those questions require grepping logs across multiple services, the platform becomes painful to operate.
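With transitions stored as rows, those support questions reduce to small queries over event data. The sketch below assumes a hypothetical event shape (`orderId`, `status`, ISO `at` timestamp); it is not the platform's actual schema, but it shows how "when did it start, where did it fail" becomes a pure function over rows instead of a log hunt.

```typescript
interface OrderEvent {
  orderId: string
  status: string
  at: string // ISO-8601 timestamp
}

// Answer the basic support questions for one order from its event rows.
export function summarizeOrder(events: OrderEvent[], orderId: string) {
  const own = events
    .filter((e) => e.orderId === orderId)
    .sort((a, b) => a.at.localeCompare(b.at))

  const failedIdx = own.findIndex((e) => e.status === 'failed')
  return {
    startedAt: own.length ? own[0].at : null,
    lastStatus: own.length ? own[own.length - 1].status : null,
    // the state preceding a 'failed' event is the step that failed
    failedDuring: failedIdx > 0 ? own[failedIdx - 1].status : null,
  }
}
```

In the real system this would be a SQL query over the events table, but the shape of the answer is identical.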
The Biggest Lesson: Operability Is a Feature
What made the hosting platform feel solid was not one technology choice. It was the discipline of keeping workflows observable, stateful, and resumable. Every provisioning task needed a visible status. Every failure needed a place to land. Every retry needed to be safe.
That is the difference between a demo platform and a real one. In a real platform, the hardest part is not creating resources. It is making resource creation understandable to both users and operators.