# Website Checks & Module Detection — Guide

## Overview

The website check system monitors customer shop URLs for:
1. **Online status** — Is the site reachable? (DNS + HTTP check)
2. **Module detection** — Are our PrestaShop modules installed? (checks `/modules/{slug}/logo.png`)

Data lives in the `website_checks` table. Each row = one unique URL.

## Architecture

### URL Flow

```
Addons API (threads) ──→ AddonsThreads.customer_website
Addons CSV (orders)  ──→ AddonsOrder.customer_website_csv
                              ↓ (profile rebuild)
                        AddonsCustomer.website
                              ↓ (normalize_url)
                        WebsiteCheck.website_url
```

All URLs pass through `sanitize_url()` on write (strips `//` prefix, ensures `https://`).
`normalize_url()` is used for matching/dedup (lowercase host, strip trailing slash).
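
To illustrate the difference between the two helpers, here is a minimal sketch. The names `sanitize_url_sketch` and `normalize_url_sketch` are hypothetical stand-ins, not the real functions in `services/website_check_service.py`, which may handle more cases. In particular, leaving an existing `http://` scheme untouched is an assumption of this sketch.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_url_sketch(raw: str) -> str:
    """Hypothetical stand-in for sanitize_url(): strip a leading '//' and ensure a scheme."""
    url = raw.strip()
    if url.startswith("//"):
        url = url[2:]
    # Assumption: an explicit http:// or https:// scheme is kept as-is.
    if not url.lower().startswith(("http://", "https://")):
        url = "https://" + url
    return url


def normalize_url_sketch(url: str) -> str:
    """Hypothetical stand-in for normalize_url(): lowercase the host, drop a trailing slash."""
    parts = urlsplit(sanitize_url_sketch(url))
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, parts.fragment))
```

Under these assumptions, `//Shop.Example.com/` and `https://shop.example.com` normalize to the same key, which is what makes deduplication into one `website_checks` row possible.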

### Key Files

| File | Role |
|------|------|
| `services/website_check_service.py` | All check logic: online, module detection, batch runner |
| `main.py` routes | API endpoints, progress tracking, background thread |
| `templates/customers.html` | UI: buttons, progress banner, status dots, module pills |
| `models.py` → `WebsiteCheck` | DB model for check results |
| `models.py` → `AddonsProducts.module_slug` | Slug config per product |

### Module Slugs (configured in Products page)

| Product | Slug |
|---------|------|
| Product Videos | `productvideo` |
| Pixel Plus (Facebook) | `facebookconversiontrackingplus` |
| Estimated Delivery V3 | `estimateddelivery` |
| Products Alert | `productsalert` |
| Products Feed (Facebook) | `facebookproductsfeed` |
| Wire Transfer Reminder | `wiretransferreminder` |

Slugs are editable inline on the Products page (`/products`).

## How It Works

### Online Check (`check_website`)
1. DNS resolution check (skip HTTP if DNS fails)
2. HTTP HEAD → fallback to GET if 405/403/400
3. 2xx/3xx = online; 403/503 = online but bot-protected
4. Timeout / connection error = offline
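
The decision flow above can be sketched as follows. This is not the actual `check_website` code: `check_online_sketch`, the `request` callable, and the returned dictionary shape are all hypothetical, and the handling of status codes the guide does not mention (for example 404 or 500) is an assumption.

```python
import socket
from typing import Callable
from urllib.parse import urlsplit

# Status codes that trigger a GET retry after HEAD, per the steps above.
HEAD_FALLBACK_CODES = {400, 403, 405}
# Status codes treated as "online but bot-protected".
BOT_PROTECTED_CODES = {403, 503}


def dns_resolves(host: str) -> bool:
    """Return True if the hostname resolves (default resolver for this sketch)."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def check_online_sketch(
    url: str,
    request: Callable[[str, str], int],
    resolve: Callable[[str], bool] = dns_resolves,
) -> dict:
    """Classify a site as online or offline.

    `request(method, url)` is a hypothetical callable that returns an HTTP
    status code and raises OSError on timeout or connection failure.
    """
    host = urlsplit(url).hostname or ""
    if not resolve(host):
        # Step 1: skip HTTP entirely when DNS fails.
        return {"online": False, "reason": "DNS not found", "status": None}
    try:
        status = request("HEAD", url)
        if status in HEAD_FALLBACK_CODES:
            status = request("GET", url)
    except OSError:  # TimeoutError and ConnectionError are OSError subclasses
        return {"online": False, "reason": "Timeout or connection error", "status": None}
    if 200 <= status < 400:
        return {"online": True, "reason": "OK", "status": status}
    if status in BOT_PROTECTED_CODES:
        return {"online": True, "reason": "Bot-protected", "status": status}
    # Assumption: the guide does not say how other codes are classified.
    return {"online": False, "reason": f"HTTP {status}", "status": status}
```

Passing the HTTP and DNS calls in as parameters keeps the classification rules testable without network access.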

### Module Detection (`check_module_installed`)
1. GET `https://{site}/modules/{slug}/logo.png`
2. Check `Content-Type` header starts with `image/`
3. Real logo.png → `image/png` (~2-12KB); soft-404 → `text/html` (~90KB)
4. All 6 slugs checked in parallel (6 threads per site)
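
A minimal sketch of the detection steps above, assuming a hypothetical `fetch_content_type(url)` callable that performs the GET and returns the `Content-Type` header (or `None` on failure). The function names are illustrative and are not the real `check_module_installed` implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def is_real_logo(content_type: Optional[str]) -> bool:
    """A real logo.png is served as image/*; a soft-404 page comes back as text/html."""
    return bool(content_type) and content_type.strip().lower().startswith("image/")


def detect_modules_sketch(
    site: str,
    slugs: Iterable[str],
    fetch_content_type: Callable[[str], Optional[str]],
) -> List[str]:
    """Return the slugs whose logo URL responds with an image Content-Type."""
    slugs = list(slugs)
    if not slugs:
        return []
    urls = [f"{site.rstrip('/')}/modules/{slug}/logo.png" for slug in slugs]
    # One thread per slug, mirroring the "6 threads per site" note above.
    with ThreadPoolExecutor(max_workers=len(slugs)) as pool:
        content_types = list(pool.map(fetch_content_type, urls))
    return [slug for slug, ct in zip(slugs, content_types) if is_real_logo(ct)]
```

Checking the header rather than the status code is what separates a real logo from a soft-404, since a soft-404 page can still return 200.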

### Batch Processing
- **Check Websites** button: full check (online + modules), 50 per batch, 4 parallel workers
- **Rescan Modules** button: module-only scan (skips online check), 100 per batch, 4 workers
- Background daemon: 20 per batch, 60s between batches, prioritizes recently active customers
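
The batch sizes and worker counts above suggest a loop along these lines. This is a sketch only: `run_batches_sketch`, `check_one`, and the `threading.Event` stop flag are hypothetical names, and the real batch runner may be structured differently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batches_sketch(
    urls: Sequence[str],
    check_one: Callable[[str], object],
    stop_event: threading.Event,
    batch_size: int = 50,
    workers: int = 4,
) -> int:
    """Process URLs batch by batch and return how many were processed.

    A stop request is only honored between batches, so the batch that is
    already running always finishes.
    """
    processed = 0
    for batch in chunked(urls, batch_size):
        if stop_event.is_set():
            break
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() waits for every result before the next stop check.
            results = list(pool.map(check_one, batch))
        processed += len(results)
    return processed
```

With `batch_size=50` and `workers=4` this corresponds to the **Check Websites** settings; `batch_size=100` would correspond to **Rescan Modules**.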

## Buttons on Customers Page

| Button | What it does | Speed |
|--------|-------------|-------|
| **Check Websites** | Full online + module check for all due URLs | ~2 sites/sec |
| **Rescan Modules** | Module-only scan for online sites with `module_found=NULL` | ~5 sites/sec |
| **Stop Checking** | Stops after current batch finishes | — |

## UI Elements

### Online Column (green/red dots)
- **Green dot**: Online — hover shows `(200, 350ms)` or `(403 — Bot-protected)`
- **Red dot**: Offline — hover shows `Offline — DNS not found` or `Offline — Timeout`
- **--**: Not checked yet

### Module Column (colored pills)
- **Green pill**: Module found on site AND customer purchased it
- **Orange pill**: Module found on site but NOT in customer's purchases
- **No**: Site is online, checked, no modules found
- **--**: Not checked yet
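
The pill rules above reduce to a small mapping. The sketch below is illustrative only: `module_cell_sketch` and its return shape are hypothetical, and the actual rendering lives in `templates/customers.html`.

```python
from typing import Iterable, List, Optional, Tuple


def module_cell_sketch(
    found_slugs: Optional[Iterable[str]],
    purchased_slugs: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (label, colour) pairs for the Module column.

    `found_slugs` is None when the site has not been checked yet, and an
    empty collection when it was checked and no modules were found.
    """
    if found_slugs is None:
        return [("--", "none")]
    found = list(found_slugs)
    if not found:
        return [("No", "none")]
    purchased = set(purchased_slugs)
    # Green: found and purchased. Orange: found but not in the purchases.
    return [(slug, "green" if slug in purchased else "orange") for slug in found]
```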

### Summary Strip
Live-updated during checks: `9857 checked · 7510 online · 2347 offline · 471 modules · 1 queued`

### Progress Banner
Inline banner with progress bar + stat cards (Checked/Online/Offline/Modules).
Color states: blue (running), then green (complete) or amber (stopped).

## API Endpoints

| Method | Path | Purpose |
|--------|------|---------|
| POST | `/api/website-checks/run` | Start full check (all due URLs) |
| POST | `/api/website-checks/rescan-modules` | Start module-only rescan |
| POST | `/api/website-checks/stop` | Stop running check |
| GET | `/api/website-checks/progress` | Live progress (polled every 2s) |
| GET | `/api/website-checks/status` | Summary counts |
| POST | `/api/products/<id>/slug` | Set module slug for a product |
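
As a rough illustration of the polling pattern, a client could loop on the progress endpoint as sketched below. The response schema is not documented here, so the `running` field is purely an assumption, as are the `get_json` callable and the function name.

```python
import time
from typing import Callable, Dict

PROGRESS_PATH = "/api/website-checks/progress"


def wait_for_completion_sketch(
    get_json: Callable[[str], Dict],
    interval_s: float = 2.0,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll the progress endpoint until a hypothetical `running` flag is false.

    `get_json(path)` is a hypothetical callable that performs the GET and
    returns the decoded JSON body.
    """
    last: Dict = {}
    for _ in range(max_polls):
        last = get_json(PROGRESS_PATH)
        # Assumption: the payload exposes a boolean `running` field.
        if not last.get("running", False):
            break
        sleep(interval_s)
    return last
```

The `max_polls` cap is there so the loop cannot run forever if the flag never changes.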

## Re-check / Rebuild Procedures

### Reset and re-scan all modules (when detection logic changes)
```python
from supporthub.app.db import session_scope
from supporthub.app.models import WebsiteCheck

with session_scope() as s:
    # Reset module data
    s.query(WebsiteCheck).filter(
        WebsiteCheck.module_found.isnot(None)
    ).update({
        WebsiteCheck.module_found: None,
        WebsiteCheck.modules_found_slugs: None,
    })
    # Re-queue online sites
    s.query(WebsiteCheck).filter(
        WebsiteCheck.is_online.is_(True),
        WebsiteCheck.module_found.is_(None),
    ).update({WebsiteCheck.next_check_after: None})
```
Then click **Rescan Modules** on the Customers page.

### Force re-check all sites (online + modules)
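```python
from supporthub.app.db import session_scope
from supporthub.app.models import WebsiteCheck
with session_scope() as s:
    s.query(WebsiteCheck).update({WebsiteCheck.next_check_after: None})
```
Then click **Check Websites** on the Customers page. Warning: this re-checks all ~10K sites and takes roughly 1-2 hours at ~2 sites/sec.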

### Re-register URLs from all sources (new customers/orders)
```python
from supporthub.app.db import session_scope
from supporthub.app.services.website_check_service import ensure_all_urls_registered
with session_scope() as s:
    new = ensure_all_urls_registered(s)
    print(f'{new} new URLs registered')
```
This runs automatically at the start of every batch check.

### Fix bad URLs (// prefix, missing scheme)
```python
from supporthub.app.db import session_scope
from supporthub.app.models import AddonsCustomer, AddonsOrder, AddonsThreads
from supporthub.app.services.website_check_service import sanitize_url

with session_scope() as s:
    for model, field in [
        (AddonsCustomer, AddonsCustomer.website),
        (AddonsOrder, AddonsOrder.customer_website_csv),
        (AddonsThreads, AddonsThreads.customer_website),
    ]:
        # Select rows whose URL starts with `//` (protocol-relative, no scheme)
        for row in s.query(model).filter(field.like('//%')).all():
            setattr(row, field.key, sanitize_url(getattr(row, field.key)))
```

### Add a new module slug
1. Go to Products page (`/products`)
2. Find the product row
3. Click the Slug cell and type the PrestaShop module folder name
4. The slug is the folder name under `/modules/` in a PrestaShop install
5. New checks will automatically test this slug on all sites

## Scheduling

| Scenario | next_check_after |
|----------|-----------------|
| Module found | +30 days |
| Online, no module | +60 days |
| Offline | 2099-01-01 (effectively never) |
| Never checked | NULL (picked up immediately) |
| Module rescan needed | Set to NULL manually |
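
The scheduling table can be expressed as a small function. This is a sketch under the intervals listed above; `next_check_after_sketch` is a hypothetical name, and the real service may compute the value differently.

```python
from datetime import datetime, timedelta

# Sentinel used for offline sites: far enough in the future to mean "never".
NEVER = datetime(2099, 1, 1)


def next_check_after_sketch(is_online: bool, module_found: bool, now: datetime) -> datetime:
    """Return when a site becomes due again after a completed check."""
    if not is_online:
        return NEVER
    if module_found:
        return now + timedelta(days=30)
    return now + timedelta(days=60)
```

Rows that were never checked, or that were reset manually for a rescan, keep `next_check_after` as NULL and are therefore not covered by this function.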

## Stats (2026-03-06)

- 10,073 total URLs registered
- 7,680 online, 2,392 offline
- ~48% module detection rate on online sites (vs 1% before Content-Type fix)
- 6 product slugs configured
