
# Scaling Self-Hosted Cloud Applications: From 1K to 100K+ Users


## Introduction

Self-hosting cloud applications gives you control over data sovereignty—but that control comes with scaling responsibility. Unlike SaaS platforms that abstract infrastructure away, self-hosted solutions like Nextcloud, OpenDesk, or Matrix require deliberate architecture decisions as user counts grow.

This guide maps four scaling tiers, using Nextcloud and OpenDesk as practical case studies:

| Tier | Users | Architecture |
| ------ | ------- | -------------- |
| Tier 1 | 1,000 | Single server, optimized |
| Tier 2 | 10,000 | Multi-service, load balanced |
| Tier 3 | 50,000 | Clustered, distributed |
| Tier 4 | 100,000+ | Multi-region, enterprise |

Figure: Scaling Tiers Overview


## Tier 1: 1,000 Users — Single Server, Optimized

At 1,000 users with ~10-15% concurrent usage, a well-tuned single server suffices. The focus is on optimization, not distribution.
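As a rough illustration of that concurrency figure, the sketch below turns a total user count into an estimated range of simultaneously active users. The 10-15% ratio comes from the text; applying the same ratio to the larger tiers is an assumption, and the function name is illustrative.

```python
def concurrent_users(total_users: int, low: float = 0.10, high: float = 0.15) -> tuple[int, int]:
    """Return a (low, high) estimate of simultaneously active users."""
    return round(total_users * low), round(total_users * high)


# Assumption: the same 10-15% ratio holds at every tier.
for total in (1_000, 10_000, 50_000, 100_000):
    low_est, high_est = concurrent_users(total)
    print(f"{total:>7} users -> {low_est}-{high_est} concurrent")
```

Measured concurrency from your own access logs should replace the default ratio as soon as it is available.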

Figure: Tier 1 Single Server Architecture

### Hardware Baseline

```text
CPU:     8 vCPU
RAM:     32 GB
Storage: 500 GB NVMe SSD (or S3-compatible object storage)
Network: 1 Gbps
```

### Database Configuration

**PostgreSQL** (recommended for performance):

```ini
# postgresql.conf
shared_buffers = 8GB
effective_cache_size = 24GB
max_connections = 200
work_mem = 64MB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
wal_buffers = 64MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
```

**MySQL/MariaDB** alternative:

```ini
[mysqld]
innodb_buffer_pool_size = 8G
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
max_connections = 200
# Query cache: MariaDB only. MySQL 8.0 removed it and fails to start
# if these two lines are present.
query_cache_type = 1
query_cache_size = 64M
transaction_isolation = READ-COMMITTED
```

### Caching Stack

Single-server caching uses APCu for local cache and Redis for locking:

```php
// Nextcloud config.php
'memcache.local' => '\OC\Memcache\APCu',
'memcache.locking' => '\OC\Memcache\Redis',
'redis' => [
    'host' => 'localhost',
    'port' => 6379,
],
```

### PHP-FPM Tuning

```ini
[www]
pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 10
pm.max_requests = 500
```

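**Memory calculation**: Each PHP-FPM worker consumes roughly 50-100 MB. With 32 GB RAM and 8 GB reserved for the database, memory allows roughly 150-200 workers once the OS and Redis are accounted for. The `pm.max_children = 50` above is therefore a conservative starting point; raise it if the PHP-FPM log warns that `pm.max_children` was reached.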
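The arithmetic behind that estimate can be sketched as follows. The 4 GB reserved for the OS, Redis, and OPcache is an assumption made for illustration, not a figure from this guide; the other numbers come from the hardware baseline and the PostgreSQL settings above.

```python
def max_fpm_workers(total_ram_mb: int, reserved_mb: int, worker_mb: int) -> int:
    """Number of PHP-FPM workers that fit in the RAM left after reservations."""
    if worker_mb <= 0:
        raise ValueError("worker_mb must be positive")
    return max(0, (total_ram_mb - reserved_mb) // worker_mb)


TOTAL_RAM_MB = 32 * 1024
DB_MB = 8 * 1024      # shared_buffers from the PostgreSQL config above
OTHER_MB = 4 * 1024   # assumption: OS, Redis, OPcache, and some headroom

reserved = DB_MB + OTHER_MB
print(max_fpm_workers(TOTAL_RAM_MB, reserved, 100))  # 204 workers at 100 MB each
print(max_fpm_workers(TOTAL_RAM_MB, reserved, 50))   # 409 workers at 50 MB each
```

Sizing against the 100 MB worst case keeps the result close to the 150-200 range quoted above.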

### OPcache Configuration

```ini
opcache.enable = 1
opcache.memory_consumption = 256
opcache.interned_strings_buffer = 16
opcache.max_accelerated_files = 10000
opcache.revalidate_freq = 60
; opcache.fast_shutdown was removed in PHP 7.2 and is omitted here
```

### Storage Strategy

**Option A: Local NVMe** — Fastest for small deployments

```text
Storage: /var/lib/nextcloud/data → 500GB NVMe
```

**Option B: Object Storage** — Better for growth, simpler backup

```php
'objectstore' => [
    'class' => '\\OC\\Files\\ObjectStore\\S3',
    'arguments' => [
        'bucket' => 'nextcloud-primary',
        'hostname' => 'minio.internal.example.com',
        'key' => 'access-key',
        'secret' => 'secret-key',
        'use_path_style' => true,
    ],
],
```

### Background Jobs

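Run background jobs from a systemd timer instead of the default AJAX mode. The timer below activates a matching `nextcloudcron.service` unit (not shown) that runs `cron.php`: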

```ini
# /etc/systemd/system/nextcloudcron.timer
[Unit]
Description = Run Nextcloud cron every 5 minutes

[Timer]
OnBootSec = 5min
OnUnitActiveSec = 5min

[Install]
WantedBy = timers.target
```

Enable: `systemctl enable --now nextcloudcron.timer`

---

## Tier 2: 10,000 Users — Multi-Service, Load Balanced

At 10,000 users, single-server bottlenecks emerge. You need load balancing and read replicas.

![Tier 2: Load Balanced Architecture](/images/diagrams/scaling-tier2-load-balanced.svg)

### Hardware Sizing

| Component | Spec | Count |
| ----------- | ------ | ------- |
| Web Nodes | 8 vCPU, 16GB RAM | 3 |
| Database Primary | 8 vCPU, 32GB RAM | 1 |
| Database Replicas | 4 vCPU, 16GB RAM | 2 |
| Redis | 4 vCPU, 16GB RAM | 3 (cluster) |
| Object Storage | MinIO cluster | 4+ nodes |

### Load Balancer Configuration (HAProxy)

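```haproxy
frontend nextcloud_https
    bind *:443 ssl crt /etc/ssl/nextcloud.pem
    acl url_discovery path /.well-known/caldav /.well-known/carddav
    http-request redirect location /remote.php/dav/ code 301 if url_discovery
    default_backend nextcloud_servers

backend nextcloud_servers
    balance leastconn
    # Health check syntax for HAProxy 2.2+; older releases embed the Host
    # header in the "option httpchk" line instead.
    option httpchk
    http-check send meth HEAD uri /status.php ver HTTP/1.1 hdr Host nextcloud.example.com
    http-check expect status 200
    # As written, these lines expect HTTP (nginx or Apache in front of PHP-FPM)
    # on port 9000, not the raw FastCGI listener that often uses this port.
    server web1 10.0.1.1:9000 check inter 5s fall 3 rise 2
    server web2 10.0.1.2:9000 check inter 5s fall 3 rise 2
    server web3 10.0.1.3:9000 check inter 5s fall 3 rise 2
```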
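To make the `fall 3 rise 2` and `leastconn` settings concrete, here is a simplified model of the behavior they describe: a server is marked down after three consecutive failed checks, marked up again after two consecutive successes, and new requests go to the healthy server with the fewest active connections. This is a sketch for intuition, not HAProxy's implementation; the class and function names are hypothetical.

```python
class ServerHealth:
    """Tracks up/down state from consecutive health check results."""

    def __init__(self, fall: int = 3, rise: int = 2) -> None:
        self.fall = fall
        self.rise = rise
        self.up = True
        self.streak = 0  # consecutive results that contradict the current state

    def record(self, check_ok: bool) -> bool:
        """Feed one check result and return the resulting state (True = up)."""
        if check_ok == self.up:
            self.streak = 0
        else:
            self.streak += 1
            threshold = self.rise if check_ok else self.fall
            if self.streak >= threshold:
                self.up = check_ok
                self.streak = 0
        return self.up


def pick_leastconn(active: dict[str, int], healthy: set[str]) -> str:
    """Choose the healthy server with the fewest active connections."""
    candidates = {name: count for name, count in active.items() if name in healthy}
    if not candidates:
        raise RuntimeError("no healthy servers available")
    return min(candidates, key=candidates.get)


health = ServerHealth(fall=3, rise=2)
print([health.record(False) for _ in range(3)])  # stays up until the third failure
print(pick_leastconn({"web1": 12, "web2": 4, "web3": 9}, {"web1", "web3"}))
```

With a 5 second check interval, this model takes about 15 seconds to eject a dead node and about 10 seconds to readmit a recovered one.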

### Database Read Replicas

Nextcloud supports native read/write splitting (since v29):

```php
// config.php
'dbreplica' => [
    ['user' => 'nc_user', 'password' => 'pass1', 'host' => 'db-replica-1', 'dbname' => 'nextcloud'],
    ['user' => 'nc_user', 'password' => 'pass2', 'host' => 'db-replica-2', 'dbname' => 'nextcloud'],
],
```

Read queries automatically route to replicas; writes go to primary.
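The routing idea can be sketched in a few lines. This is an illustration of read/write splitting in general, not Nextcloud's implementation: the `db-primary` hostname is hypothetical, statement classification is reduced to a `SELECT` prefix check, and the rule that reads stick to the primary after a write (to avoid reading stale data from a lagging replica) is an assumption about sensible behavior.

```python
import itertools


class ReadWriteRouter:
    """Routes reads to replicas round-robin and everything else to the primary."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None
        self._wrote = False  # set once this request has written anything

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read:
            self._wrote = True
            return self.primary
        # After a write, keep reading from the primary so the request
        # sees its own changes despite replication lag.
        if self._wrote or self._replicas is None:
            return self.primary
        return next(self._replicas)


router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
for statement in ("SELECT 1", "SELECT 2", "UPDATE t SET x = 1", "SELECT 3"):
    print(statement, "->", router.route(statement))
```

A production router would also need to treat locking reads such as `SELECT ... FOR UPDATE` as writes.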

### Redis Cluster

Distributed caching and file locking:

```php
'memcache.local' => '\OC\Memcache\APCu',
'memcache.distributed' => '\OC\Memcache\Redis',
'memcache.locking' => '\OC\Memcache\Redis',
'redis.cluster' => [
    'seeds' => [
        'redis-1:7000',
        'redis-2:7000',
        'redis-3:7000',
    ],
],
```

### Session Storage

Web nodes are stateless; sessions go to Redis:

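```ini
; php.ini (phpredis session handler)
session.save_handler = redis
session.save_path = "tcp://redis-1:6379?weight=1,tcp://redis-2:6379?weight=1"
; These are standalone Redis endpoints. To reuse the Redis Cluster above,
; phpredis offers a separate "rediscluster" handler with seed[]=host:port entries.
```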

### Critical: Shared Configuration

All web nodes must share:

- **Same Redis cluster** (distributed cache + locking)
- **Same database** (primary + replicas)
- **Same object storage** (not local disk)
- **Same config.php** (synced via rsync or shared volume)

---

## Tier 3: 50,000 Users — Kubernetes, Clustered

At 50,000 users, Kubernetes becomes essential for orchestration, auto-scaling, and resilience.

![Tier 3: Kubernetes Clustered Architecture](/images/diagrams/scaling-tier3-kubernetes.svg)

### Architecture Overview

```mermaid
flowchart TD
    subgraph K8s["Kubernetes Cluster"]
        subgraph Ingress["Ingress Layer"]
            ING["NGINX Ingress + cert-manager"]
        end
        subgraph App["Application Layer"]
            P1["Nextcloud Pod 1"]
            P2["Nextcloud Pod 2"]
            P3["Nextcloud Pod 3"]
            PN["Nextcloud Pod N"]
            HPA["HPA: min=5, max=20 (CPU > 70%)"]
        end
        subgraph Data["Data Layer"]
            PG["PostgreSQL Cluster<br/>(Patroni)"]
            RD["Redis Cluster<br/>(6 nodes)"]
            MN["MinIO Cluster<br/>(4+ nodes)"]
        end
        ING --> P1
        ING --> P2
        ING --> P3
        ING --> PN
        P1 --> PG
        P1 --> RD
        P1 --> MN
    end
```

### Kubernetes Deployment (OpenDesk Pattern)

OpenDesk demonstrates production Kubernetes architecture:

```yaml
# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nextcloud-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nextcloud
  minReplicas: 5
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```
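The replica count the autoscaler settles on follows the formula in the Kubernetes documentation: `desired = ceil(current * currentUtilization / targetUtilization)`, evaluated per metric, with the largest proposal winning and the result clamped to the min/max bounds. The sketch below applies that formula to the targets above. It ignores stabilization windows, pod readiness, and missing metrics, and the function name is illustrative.

```python
import math


def desired_replicas(
    current: int,
    utilization: dict[str, float],
    targets: dict[str, float],
    min_replicas: int = 5,
    max_replicas: int = 20,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA replica calculation for resource metrics."""
    proposals = []
    for metric, target in targets.items():
        ratio = utilization[metric] / target
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current)  # within tolerance: no change for this metric
        else:
            proposals.append(math.ceil(current * ratio))
    # The metric asking for the most replicas wins, then bounds apply.
    return max(min_replicas, min(max_replicas, max(proposals)))


targets = {"cpu": 70, "memory": 80}
print(desired_replicas(5, {"cpu": 90, "memory": 60}, targets))    # 7
print(desired_replicas(10, {"cpu": 200, "memory": 80}, targets))  # 20 (capped)
```

Utilization is measured against each container's resource requests, so the HPA only behaves sensibly when the Deployment sets CPU and memory requests.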

### Database Clustering (Patroni/PostgreSQL)

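```yaml
# PostgreSQL StatefulSet skeleton; Patroni itself (image, config, DCS) is omitted
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql
spec:
  serviceName: postgresql-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql  # must match spec.selector or the API server rejects it
    spec:
      containers:
      - name: postgresql
        image: registry.example.com/patroni-postgres:TAG  # placeholder, not a real image
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
          limits:
            cpu: "8"
            memory: "32Gi"
```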

### Multi-Bucket Object Storage

For 50K+ users, distribute files across multiple S3 buckets:

```php
// Multibucket mode uses its own top-level key instead of 'objectstore'
'objectstore_multibucket' => [
    'class' => '\\OC\\Files\\ObjectStore\\S3',
    'arguments' => [
        'num_buckets' => 64,
        'bucket' => 'nextcloud-',  // prefix; the bucket number is appended
        'hostname' => 'minio.internal.example.com',
        'key' => 'access-key',
        'secret' => 'secret-key',
        'use_path_style' => true,
    ],
],
```
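One way to picture the distribution is a stable hash from user ID to bucket number, so that each user's files always land in the same bucket. The sketch below illustrates that idea only; the hash choice and function name are assumptions, and Nextcloud's actual mapping may differ.

```python
import hashlib


def bucket_for(user_id: str, prefix: str = "nextcloud-", num_buckets: int = 64) -> str:
    """Map a user ID to one of `num_buckets` bucket names, deterministically."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    index = int(digest[:4], 16) % num_buckets  # first 16 bits of the hash
    return f"{prefix}{index}"


users = [f"user{i}" for i in range(1000)]
buckets = {bucket_for(u) for u in users}
print(f"{len(users)} users spread over {len(buckets)} buckets")
print(bucket_for("alice") == bucket_for("alice"))  # True: the mapping is stable
```

Note that a pure hash mapping would move users to different buckets if `num_buckets` changed, so any real deployment has to persist each user's assignment or fix the bucket count up front.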

### Component Resource Guidelines

Based on OpenDesk scaling documentation:

| Component | Per X Users | CPU | RAM | Notes |
| ----------- | ------------- | ----- | ----- | ------- |
| Nextcloud | 500 concurrent | 2 vCPU | 4 GB | Scale horizontally |
| Collabora | 15 active users | 1 vCPU | 50 MB | Stateful - sticky sessions |
| Jitsi (JVB) | 200 concurrent | 4 vCPU | 8 GB | Video routing (SFU, no transcoding) |
| Matrix/Element | 10K total | 15 vCPU | 12 GB | With federation on; 10 vCPU, 8 GB without |
| PostgreSQL | Cluster-wide | 16 vCPU | 64 GB | Primary + 2 replicas |
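The per-unit figures in the table translate into instance counts by rounding up. The sketch below does this for the two rows that scale with concurrent users; the dictionary keys and function name are illustrative, and the 7,500 concurrent users in the example assumes 15% of 50,000 users are active at once.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitSize:
    users_per_unit: int  # concurrent users one unit can serve
    vcpu: int
    ram_gb: int


# Figures copied from the table above.
SIZING = {
    "nextcloud": UnitSize(users_per_unit=500, vcpu=2, ram_gb=4),
    "jitsi_jvb": UnitSize(users_per_unit=200, vcpu=4, ram_gb=8),
}


def estimate(component: str, concurrent_users: int) -> tuple[int, int, int]:
    """Return (units, total vCPU, total RAM in GB) for a component."""
    unit = SIZING[component]
    units = math.ceil(concurrent_users / unit.users_per_unit)
    return units, units * unit.vcpu, units * unit.ram_gb


print(estimate("nextcloud", 7_500))  # (15, 30, 60)
print(estimate("jitsi_jvb", 450))    # (3, 12, 24)
```

Fifteen Nextcloud units sits inside the HPA range of 5 to 20 replicas used earlier in this tier.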

### Monitoring Stack

```yaml
# kube-prometheus-stack
prometheus:
  prometheusSpec:
    retention: 30d
    resources:
      requests:
        cpu: 500m
        memory: 2Gi

alertmanager:
  config:
    route:
      receiver: 'slack-notifications'
      routes:
      - match:
          severity: critical
        receiver: 'pagerduty'

grafana:
  additionalDataSources:
  - name: Loki
    type: loki
    url: http://loki:3100
```

### Key Metrics to Monitor

- **Pod scaling events** — HPA triggers indicate capacity pressure
- **Database connection pool saturation** — Approaching max_connections
- **Redis memory usage** — Cache eviction rates
- **Object storage latency** — S3/MinIO response times
- **PHP-FPM queue length** — Requests waiting for workers

---

## Tier 4: 100,000+ Users — Multi-Region, Enterprise

At 100K+ users, single-region deployments hit limits. You need multi-region architecture, global load balancing, and sophisticated failure handling.

![Tier 4: Multi-Region Enterprise Architecture](/images/diagrams/scaling-tier4-multi-region.svg)

### Architecture Overview

```mermaid
flowchart TD
    GSLB["Global Load Balancer (GSLB)<br/>Route53 / CloudFlare / PowerDNS"]

    subgraph EU["Region EU"]
        EU_K8s["K8s Cluster (20+ nodes)"]
        EU_PG["PostgreSQL Primary"]
        EU_MN["MinIO Cluster (sync)"]
    end

    subgraph US["Region US"]
        US_K8s["K8s Cluster (20+ nodes)"]
        US_PG["PostgreSQL Primary"]
        US_MN["MinIO Cluster (sync)"]
    end

    subgraph AP["Region AP"]
        AP_K8s["K8s Cluster (20+ nodes)"]
        AP_PG["PostgreSQL Primary"]
        AP_MN["MinIO Cluster (sync)"]
    end

    GSLB --> EU
    GSLB --> US
    GSLB --> AP
    EU_MN <--> US_MN
    US_MN <--> AP_MN
```

### Tenant Isolation Strategy

#### Option A: Namespace per Tenant (Kubernetes)

```yaml
# Each organization gets isolated namespace
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-acme-corp
  labels:
    tenant: acme-corp
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation
  namespace: tenant-acme-corp
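spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          tenant: acme-corp
  # Egress is listed in policyTypes, so outbound traffic not allowed here is
  # dropped, including DNS and shared databases; add rules for those.
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          tenant: acme-corp
```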

#### Option B: Database per Tenant

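```sql
-- Tenant isolation at database level
CREATE DATABASE nextcloud_acme;
CREATE DATABASE nextcloud_globex;

-- Alternative: row-level security in a shared database.
-- Illustrative only: "files" and "tenant_id" are placeholder names, and
-- app.tenant_id must be set per connection (SET app.tenant_id = 'acme').
ALTER TABLE files ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON files
    USING (tenant_id = current_setting('app.tenant_id'));
```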

### Global Database Strategy

**Synchronous replication within region, async between regions:**

```mermaid
flowchart TD
    subgraph RegionEU["Region EU"]
        EU_P["Primary"]
        EU_R1["Replica 1"]
        EU_R2["Replica 2"]
        EU_P --> EU_R1 --> EU_R2
    end

    subgraph RegionUS["Region US"]
        US_S["Standby (Promotable)"]
        US_R["Replica"]
        US_S --> US_R
    end

    EU_P -.->|"Async Stream"| US_S
```

### CDN and Edge Caching

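```yaml
# CloudFlare / Fastly CDN rules (illustrative pseudo-config, not a provider schema)
rules:
  - match:
      path: "/remote.php/dav/files/*"
    caching:
      enabled: false  # DAV is dynamic
  - match:
      path: "/apps/files/*"  # static assets only; never cache authenticated pages
    caching:
      enabled: true
      ttl: 3600
  - match:
      paths:  # a YAML mapping cannot repeat the "path" key
        - "/css/*"
        - "/js/*"
    caching:
      enabled: true
      ttl: 86400
```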
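Rule lists like this are typically evaluated top to bottom with the first match winning, which is why the DAV exclusion comes first. The sketch below models that evaluation with shell-style patterns. The first-match semantics and the "no caching when nothing matches" default are assumptions for illustration; check your CDN's documentation for its actual precedence rules.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (pattern, TTL in seconds); None means caching is disabled for that path.
RULES: list[tuple[str, Optional[int]]] = [
    ("/remote.php/dav/files/*", None),
    ("/apps/files/*", 3600),
    ("/css/*", 86400),
    ("/js/*", 86400),
]


def cache_ttl(path: str) -> Optional[int]:
    """Return the TTL of the first matching rule, or None if uncached."""
    for pattern, ttl in RULES:
        if fnmatchcase(path, pattern):
            return ttl
    return None  # assumption: unmatched paths bypass the cache


for path in ("/remote.php/dav/files/alice/report.pdf", "/js/main.js", "/index.php"):
    print(path, "->", cache_ttl(path))
```

Defaulting to "uncached" is the safer choice for an application that serves mostly authenticated, per-user responses.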

### Capacity Planning at Scale

**Per-region sizing for 100K users:**

| Component | Instances | Each Spec |
| ----------- | ----------- | ----------- |
| Web/API Pods | 50-100 | 4 vCPU, 8 GB |
| Database Primary | 1 | 32 vCPU, 128 GB |
| Database Replicas | 4-6 | 16 vCPU, 64 GB |
| Redis Cluster | 9 (3x3) | 8 vCPU, 32 GB |
| MinIO Nodes | 12+ | 16 vCPU, 64 GB, NVMe |
| Load Balancers | 3 | 8 vCPU, 16 GB |

### Failure Modes and Mitigations

| Failure | Impact | Mitigation |
| --------- | -------- | ------------ |
| Single pod crash | None | K8s recreates automatically |
| Node failure | Minimal | Pods reschedule; spread replicas across nodes (PDBs only cover voluntary evictions) |
| AZ failure | Degraded | Cross-AZ deployment, multi-AZ storage |
| Region failure | Failover | GSLB routes to healthy region |
| Database primary failure | Brief outage | Patroni failover to replica (<30s) |
| Object storage failure | Severe | Multi-region replication |

---

## OpenDesk vs Vanilla Nextcloud: Scaling Comparison

OpenDesk (German government's open-source workspace) provides a reference architecture for scaled deployments:

| Aspect | Vanilla Nextcloud | OpenDesk |
| -------- | ------------------- | ---------- |
| **Deployment** | VM or containers | Kubernetes-only |
| **Architecture** | Monolithic PHP | Modular microservices |
| **Database** | MySQL/MariaDB/PostgreSQL | PostgreSQL with clustering |
| **Auth** | Built-in or external | Keycloak + OpenLDAP (decoupled) |
| **Scaling** | Manual configuration | Helm charts with autoscaling |
| **Office Suite** | Optional app | Collabora/OnlyOffice integrated |
| **Video** | External (BigBlueButton) | Jitsi Meet (integrated) |

### OpenDesk Component Scaling

From official documentation:

```text
Collabora (Document Editing):
  Per 15 active users: 1 vCPU, 50 MB RAM

Jitsi (Video Conferencing):
  Per JVB (200 concurrent): 4 vCPU, 8 GB RAM
  Scale JVBs horizontally, use Octo for load balancing

Matrix/Element (Chat):
  Per 10K users (federation ON): 15 vCPU, 12 GB RAM
  Per 10K users (federation OFF): 10 vCPU, 8 GB RAM
  Federation adds roughly 50% in these figures; busy federated rooms can cost more
```

---

## Scaling Decision Matrix

| Factor | Scale Vertically | Scale Horizontally |
| -------- | ------------------ | ------------------- |
| **Application type** | Stateful, monolithic | Stateless, microservices |
| **Traffic pattern** | Steady, predictable | Variable, bursty |
| **Availability requirement** | Best effort | High availability (99.9%+) |
| **Data consistency** | Strong, immediate | Eventual acceptable |
| **Operational complexity** | Keep simple | Accept complexity |
| **Budget** | Limited | Flexible |
| **Geographic distribution** | Single region | Multi-region |

### General Principles

1. **Start vertical, then horizontal** — Optimize single instance before adding complexity
2. **Stateless first** — 12-factor app principles enable horizontal scaling
3. **Shared nothing** — Each process independent, state in backing services
4. **Cache aggressively** — Multi-layer: CDN → App → DB
5. **Monitor everything** — You can't scale what you can't measure
6. **Automate scaling** — HPA, VPA, Cluster Autoscaler
7. **Design for failure** — Components will fail; plan for it
8. **Right-size continuously** — Over-provisioning wastes money

---

## Quick Reference: Configuration by Tier

### Tier 1 (1K Users)

```php
// Minimal config
'memcache.local' => '\OC\Memcache\APCu',
'memcache.locking' => '\OC\Memcache\Redis',
```

### Tier 2 (10K Users)

```php
// Load balanced with replicas
'dbreplica' => [
    ['host' => 'db-replica-1'],
    ['host' => 'db-replica-2'],
],
'memcache.distributed' => '\OC\Memcache\Redis',
'redis.cluster' => ['seeds' => ['redis-1:7000', 'redis-2:7000', 'redis-3:7000']],
```

### Tier 3 (50K Users)

```php
// Kubernetes with multibucket object storage
'objectstore_multibucket' => [
    'class' => '\\OC\\Files\\ObjectStore\\S3',
    'arguments' => [
        'num_buckets' => 64,
        'bucket' => 'nextcloud-',
    ],
],
```

### Tier 4 (100K+ Users)

```php
// Multi-region with global load balancing
'objectstore_multibucket' => [
    'class' => '\\OC\\Files\\ObjectStore\\S3',
    'arguments' => [
        'num_buckets' => 128,
        'bucket' => 'nextcloud-',
        'region' => 'eu-west-1',
    ],
],
'trusted_proxies' => ['10.0.0.0/8', '172.16.0.0/12'],
'overwriteprotocol' => 'https',
```

---

## Conclusion

Scaling self-hosted cloud applications follows predictable patterns:

- **1K users**: Tune the stack, optimize single server
- **10K users**: Add load balancing and read replicas
- **50K users**: Kubernetes orchestration, clustered databases
- **100K+ users**: Multi-region, tenant isolation, global load balancing

The key insight: **scaling is architecture, not just hardware**. Decisions made at 1K users—storage backend, caching strategy, state management—determine how smoothly you reach 100K.

Start with 12-factor principles: stateless processes, attached resources, horizontal scaling. Then layer on product-specific optimizations (Nextcloud's Redis locking, OpenDesk's Kubernetes-native components).

Invest in observability early. You cannot scale what you cannot measure.