# SupportHub — Production Architecture Roadmap

> **Status:** Design phase. Apply these changes when moving from dev to production.
> **Last updated:** 2026-03-06

---

## Current State (Development)

| Layer | Current | Notes |
|-------|---------|-------|
| **Database** | SQLite (local) / MySQL (production host) | Single-writer bottleneck with SQLite; MySQL already configured |
| **Background tasks** | `threading.Thread(daemon=True)` | In-memory progress dicts, no persistence, no retry |
| **Task scheduling** | 2 `before_request` hooks | Embedding catch-up (24h), website checker (infinite loop) |
| **Web server** | Flask dev server (`--debug`) | Single process, auto-reload |
| **Production entry** | `passenger_wsgi.py` | cPanel/Plesk Passenger — single worker |
| **Session storage** | Flask-Login cookies | `SECRET_KEY` from `.env` |
| **Static assets** | `app.css` + `app.js` (no build step) | Served by Flask or Passenger |
| **Auth** | bcrypt + Flask-Login + CSRF | Single admin user in dev |
| **Multi-tenancy** | None | All data is global to one organization |

### Background Tasks Inventory

| Task | Type | Trigger | Duration |
|------|------|---------|----------|
| Website background checker | Persistent daemon thread | `before_request` (once) | Infinite loop, 20 URLs/60s |
| Embedding catch-up | One-shot thread | `before_request` (24h rate-limit) | Minutes |
| Full website check | Batch thread | Manual button | ~1-2h for 10K sites |
| Module rescan | Batch thread | Manual button | ~48 min for 7.5K sites |
| Order sync (daily) | API call handler | External Addons cron / manual | Minutes |
| Historical thread sync | Batch thread | Manual from Settings | Variable |
| Doc scraper | Batch thread | Manual from Settings | Variable |
| Re-embed thread | One-shot thread | After reply/answered_outside | Seconds |

### Models (26 total)

Core: `UsersAdmin`, `AddonsThreads`, `AddonsMessages`, `Tickets`, `Tags`, `AddonsProducts`, `AddonsOrder`, `AddonsCustomer`
AI: `AISuggestions`, `AISuggestionFeedback`, `ThreadEmbeddings`, `ProductKnowledge`, `GeneratedGuide`
Content: `PredefinedMessages`, `DocSource`, `DocPage`, `MessageTranslation`, `MessageTopicSegment`, `SegmentationState`
System: `AppSettings`, `SyncHistory`, `AuditLog`, `WebsiteCheck`
Join tables: `TicketTags`, `ProductPredefinedMessages`, `AddonsAttachments`

---

## Phase 1 — Database: SQLite → PostgreSQL

**Why:** SQLite has file-level locking — one write blocks all others. With 20-50 concurrent users running syncs, website checks, and imports, you'll get `database is locked` errors constantly. MySQL is configured but PostgreSQL is the better choice for concurrent workloads and JSON support.

**Changes:**

1. **Install `psycopg2-binary`** (or `psycopg2` for production)
   ```
   pip install psycopg2-binary
   ```

2. **Update `config.py`** — add PostgreSQL connection string option:
   ```python
   # .env
   DATABASE_URL=postgresql://user:pass@localhost:5432/supporthub
   ```

3. **Migrate data** — use `pgloader` or Alembic:
   ```bash
   # Option A: pgloader (fastest for bulk migration)
   pgloader supporthub_local.db postgresql://user:pass@localhost/supporthub

   # Option B: Fresh schema + re-sync
   alembic upgrade head  # create tables
   # Then re-sync from Addons API
   ```

4. **Review SQLAlchemy queries** — most work unchanged, but check:
   - `CAST(x AS TEXT)` → PostgreSQL prefers `CAST(x AS VARCHAR)`
   - Boolean columns: SQLite stores 0/1, PostgreSQL stores true/false (SQLAlchemy handles this)
   - `LIKE` is case-sensitive in PostgreSQL (use `ILIKE` where needed)
   - Binary blobs (embeddings): PostgreSQL uses `BYTEA` — works with SQLAlchemy `LargeBinary`

5. **Connection pooling** — add to `config.py`:
   ```python
   SQLALCHEMY_POOL_SIZE = 10
   SQLALCHEMY_MAX_OVERFLOW = 20
   SQLALCHEMY_POOL_RECYCLE = 1800
   ```

**Effort:** 1-2 days. Low risk — SQLAlchemy abstracts most differences.

---

## Phase 2 — Background Tasks: Threads → Celery + Redis

**Why:** Current daemon threads use in-memory dicts for progress. With multiple Gunicorn workers, each worker has its own memory — progress tracking breaks, duplicate tasks spawn, and there's no retry on failure.

**Changes:**

1. **Install Celery + Redis:**
   ```
   pip install celery redis
   ```

2. **Create `supporthub/app/celery_app.py`:**
   ```python
   from celery import Celery

   celery = Celery('supporthub')
   celery.config_from_object({
       'broker_url': 'redis://localhost:6379/0',
       'result_backend': 'redis://localhost:6379/0',
       'task_serializer': 'json',
       'result_serializer': 'json',
       'accept_content': ['json'],
       'task_track_started': True,
       'task_acks_late': True,       # Re-queue if worker dies
       'worker_prefetch_multiplier': 1,
   })
   ```

3. **Convert background tasks to Celery tasks:**

   ```python
   # Before (threading)
   t = threading.Thread(target=_run, daemon=True)
   t.start()

   # After (Celery)
   from supporthub.app.tasks import run_website_check_batch
   result = run_website_check_batch.delay(batch_size=50)
   # result.id → track progress via Celery result backend
   ```

4. **Progress tracking** — replace in-memory dicts with Redis or DB:
   ```python
   # In task:
   self.update_state(state='PROGRESS', meta={'checked': 150, 'total': 7500})

   # In API endpoint:
   result = AsyncResult(task_id)
   return jsonify(result.info)
   ```

5. **Scheduled tasks** — replace `before_request` hooks with Celery Beat:
   ```python
   celery.conf.beat_schedule = {
       'embedding-catchup': {
           'task': 'supporthub.app.tasks.check_embeddings',
           'schedule': 86400.0,  # daily
       },
       'website-background-check': {
           'task': 'supporthub.app.tasks.run_background_website_checks',
           'schedule': 60.0,  # every minute
       },
   }
   ```

6. **Run workers:**
   ```bash
   # Worker (handles tasks)
   celery -A supporthub.app.celery_app worker --loglevel=info --concurrency=4

   # Beat (schedules periodic tasks)
   celery -A supporthub.app.celery_app beat --loglevel=info
   ```

**Tasks to migrate:**

| Current | Celery Task | Schedule |
|---------|-------------|----------|
| `_bg_checker` infinite loop | `run_background_website_checks` | Every 60s (Beat) |
| `_embed_missing` before_request | `check_embeddings` | Daily (Beat) |
| Website check batch (manual) | `run_website_check_batch` | On-demand |
| Module rescan (manual) | `run_module_rescan_batch` | On-demand |
| Historical sync | `run_historical_sync` | On-demand |
| Doc scraper | `run_doc_scrape` | On-demand |
| Order sync | `sync_orders_for_period` | On-demand / Daily (Beat) |

**Effort:** 2-3 days. Medium risk — need to ensure DB sessions are properly scoped per task.

---

## Phase 3 — Web Server: Dev → Gunicorn + Nginx

**Why:** Flask dev server handles one request at a time. Passenger (current production) is limited. Gunicorn with multiple workers handles concurrent users properly.

**Changes:**

1. **Install Gunicorn:**
   ```
   pip install gunicorn
   ```

2. **Create `gunicorn.conf.py`:**
   ```python
   bind = '0.0.0.0:8000'
   workers = 4                  # 2 * CPU cores + 1
   worker_class = 'gthread'     # Threaded workers for I/O-bound app
   threads = 2
   timeout = 120
   accesslog = '-'
   errorlog = '-'
   loglevel = 'info'
   ```

3. **Nginx reverse proxy:**
   ```nginx
   server {
       listen 443 ssl;
       server_name support.smart-modules.com;

       location / {
           proxy_pass http://127.0.0.1:8000;
           proxy_set_header Host $host;
           proxy_set_header X-Real-IP $remote_addr;
           proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
           proxy_set_header X-Forwarded-Proto $scheme;
       }

       location /static/ {
           alias /path/to/supporthub/app/static/;
           expires 30d;
       }
   }
   ```

4. **Remove `before_request` thread spawning** — Celery Beat handles scheduling now.

5. **Update `passenger_wsgi.py`** or replace with Gunicorn systemd service:
   ```ini
   [Unit]
   Description=SupportHub Gunicorn
   After=network.target

   [Service]
   User=supporthub
   WorkingDirectory=/path/to/support.smart-modules.com
   ExecStart=/path/to/venv/bin/gunicorn -c gunicorn.conf.py supporthub.app.main:app
   Restart=always

   [Install]
   WantedBy=multi-user.target
   ```

**Effort:** 1 day. Low risk if Celery is already handling background tasks.

---

## Phase 4 — Security Hardening

**Before going live with multiple users:**

| Item | Current State | Fix |
|------|--------------|-----|
| `.env` in repo | Committed with real keys | Add to `.gitignore`, create `.env.example` |
| `SECRET_KEY` | Weak default | Generate with `python -c "import secrets; print(secrets.token_hex(32))"` |
| Rate limiting | None | Add `flask-limiter` (e.g., 60 req/min per IP) |
| HTTPS enforcement | Not enforced | Add `Talisman` or Nginx redirect |
| Password policy | None | Enforce min length, complexity |
| Session timeout | None | Set `PERMANENT_SESSION_LIFETIME` |
| CORS | Not configured | Add `flask-cors` if API is cross-origin |
| SQL injection | Safe (SQLAlchemy ORM) | Keep using ORM, avoid raw SQL |
| XSS | `bleach` sanitization on messages | Verify all user input is escaped in templates |

**Effort:** 1 day.

---

## Phase 5 — Multi-Tenancy (if needed)

Only needed if multiple Addons sellers share the same instance. Skip if each seller gets their own deployment.

**Option A: Schema-per-tenant** (simpler)
- Each tenant gets a separate PostgreSQL schema
- Flask middleware sets `search_path` based on logged-in user's org
- Shared `users` table in `public` schema

**Option B: Row-level isolation** (single schema)
- Add `tenant_id` to every table
- Enforce via SQLAlchemy event listeners or query filters
- More complex but single DB to manage

**Recommendation:** Start with Option A unless you need cross-tenant analytics.

**Effort:** 3-5 days. High risk — touches every query.

---

## Migration Order

```
Phase 1 (PostgreSQL)     ← Do first, unblocks everything
    ↓
Phase 2 (Celery+Redis)   ← Replaces fragile thread model
    ↓
Phase 3 (Gunicorn+Nginx) ← Multi-worker serving
    ↓
Phase 4 (Security)       ← Before opening to users
    ↓
Phase 5 (Multi-tenancy)  ← Only if shared instance
```

Each phase is independently deployable. Phase 1 alone gives the biggest improvement.

---

## Infrastructure Requirements (Production)

| Component | Minimum | Recommended (50 users) |
|-----------|---------|------------------------|
| **PostgreSQL** | 1 GB RAM, 10 GB disk | 2 GB RAM, 50 GB disk |
| **Redis** | 64 MB RAM | 256 MB RAM |
| **Celery worker** | 1 worker, 4 threads | 2 workers, 4 threads each |
| **Celery Beat** | 1 process | 1 process |
| **Gunicorn** | 2 workers | 4 workers, 2 threads each |
| **Nginx** | Minimal | Minimal |
| **Total server** | 2 GB RAM, 1 vCPU | 4 GB RAM, 2 vCPU |

A single VPS (Hetzner €10/mo, DigitalOcean $12/mo) handles 50 users comfortably.

---

## Quick Reference: Current Services

| Service | File | Key Functions |
|---------|------|---------------|
| Auth | `auth_service.py` | `verify_password()`, `hash_password()`, `log_audit()` |
| AI | `ai_service.py` | `summarize_ticket()`, `classify_ticket()`, `draft_reply()` |
| Embeddings | `embedding_service.py` | `embed_thread()`, `similarity_search()`, `check_and_embed_missing()` |
| Sync | `sync_service.py` | `sync_threads()`, `run_historical_sync()` |
| Orders | `order_sync_service.py` | `sync_orders_for_period()`, `sync_all_orders()`, `import_csv()` |
| Tickets | `ticket_service.py` | `update_status()`, `update_priority()` |
| Dashboard | `dashboard_stats.py` | `get_revenue_stats()`, `get_ticket_stats()` |
| Segmentation | `segmentation_service.py` | `segment_thread()`, `get_segments()` |
| Docs | `doc_scraper_service.py` | `scrape_source()`, `scrape_all()` |
| Website checks | `website_check_service.py` | `check_website()`, `check_module_installed()`, `run_continuous_checks()` |