Infrastructure & Operations
This section documents everything related to how RoundTrip is built, deployed, run, and maintained in production. It covers Azure resource architecture, deployment pipelines, request flow, code management, and operational runbooks.
If something is running in production or dev, it is documented here. If you make a change to infrastructure, update the relevant page before closing the ticket.
Who This Section Is For
This section is written for senior developers and DevOps engineers who need to understand, operate, or modify the RoundTrip infrastructure. It assumes familiarity with:
- Azure App Service, Azure SQL, Azure Key Vault
- CI/CD pipelines and Git-based deployment workflows
- .NET application hosting and configuration
- DNS, Cloudflare, and static site hosting
If you are a developer who only works on application code and never touches infrastructure, the most important pages for you are:
- Request Pipeline: understand what happens to your code at runtime
- Code Management: branch strategy, PR flow, pipeline behaviour
- The Critical Knowledge section below: mistakes that have caused production incidents
What's in This Section
| Page | What It Covers |
|---|---|
| Deployment Architecture | Azure resource inventory, environment overview, subscription layout, pending dev environment setup |
| Request Pipeline | End-to-end request flow from browser to database and back — middleware, auth, mediator, Dapper, SignalR, Hangfire |
| Code Management | Branch strategy, commit conventions, PR flow, ADO pipelines, Cloudflare Pages deployment, pipeline agent |
| Infrastructure Reference | Azure resource names, Key Vault secret mappings, App Service configuration, Cloudflare rules, runbooks |
System Overview
RoundTrip is a multi-tenant SaaS platform with two environments — production and development — both hosted on Azure with frontend delivery via Cloudflare Pages.
                 ┌─────────────────────────────────────┐
                 │           USERS & CLIENTS           │
                 │   Dispatchers · Technicians (PWA)   │
                 └──────────────────┬──────────────────┘
                                    │ HTTPS
                 ┌──────────────────▼──────────────────┐
                 │             CLOUDFLARE              │
                 │   DNS · CDN · Pages · Zero Trust    │
                 │ roundtrips.app / dev.roundtrips.app │
                 └──────────────────┬──────────────────┘
                                    │
            ┌───────────────────────┼───────────────────────┐
            │                       │                       │
┌───────────▼──────────┐  ┌─────────▼───────────┐  ┌────────▼───────────┐
│  REACT FRONTEND      │  │  AZURE APP SERVICE  │  │  ENTRA EXT ID      │
│  Cloudflare Pages    │  │  .NET 10 API        │  │  CIAM              │
│  React 19 · Vite     │  │  FastEndpoints      │  │  roundtripapp      │
│  TanStack Query      │  │  SignalR · Hangfire │  │  .onmicrosoft.com  │
│  MSAL · Tailwind     │  │  Cyrus Mediator     │  │                    │
└──────────────────────┘  └─────────┬───────────┘  └────────────────────┘
                                    │
                ┌───────────────────┼───────────────────┐
                │                   │                   │
     ┌──────────▼──────┐   ┌────────▼────────┐   ┌──────▼──────────────┐
     │  AZURE SQL      │   │  AZURE KEY      │   │  AZURE BLOB         │
     │  SQL Server     │   │  VAULT          │   │  STORAGE            │
     │  Production DB  │   │  kv-roundtrip   │   │  invoice-pdfs       │
     │  Dev DB         │   │  -production    │   │  container          │
     └─────────────────┘   └─────────────────┘   └─────────────────────┘
Environments
| | Production | Development |
|---|---|---|
| API | app-roundtrip-production | app-roundtrip-dev |
| Frontend | roundtrips.app | dev.roundtrips.app |
| Database | sqldb-roundtrip-production | sqldb-roundtrip-dev |
| Key Vault | kv-roundtrip-production | kv-roundtrip-dev |
| Branch | main | development |
| Pipeline trigger | Push to main | Push to development |
| ASPNETCORE_ENVIRONMENT | Production | Production ⚠️ |
:::warning Dev Environment Uses Production Mode
ASPNETCORE_ENVIRONMENT on app-roundtrip-dev must be set to Production — not Development. The dev environment uses production-style configuration (Key Vault references, full middleware stack) to accurately mirror production behaviour. Setting it to Development breaks Key Vault reference resolution.
:::
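The setting described in the warning above can be checked from the command line. The following is a minimal sketch, not an official runbook: `check_aspnetcore_env` is a hypothetical helper name, and the resource group of the dev App Service is not stated on this page, so it is passed in as an argument.

```shell
# Hypothetical helper: reads ASPNETCORE_ENVIRONMENT from an App Service's
# application settings and fails unless the value is exactly "Production".
# Usage: check_aspnetcore_env app-roundtrip-dev <rg>
check_aspnetcore_env() (
  app="$1"
  rg="$2"
  value=$(az webapp config appsettings list \
    --name "$app" --resource-group "$rg" \
    --query "[?name=='ASPNETCORE_ENVIRONMENT'].value | [0]" \
    --output tsv)
  if [ "$value" = "Production" ]; then
    echo "OK: $app runs with ASPNETCORE_ENVIRONMENT=Production"
  else
    echo "FAIL: $app has ASPNETCORE_ENVIRONMENT='$value' (expected Production)" >&2
    exit 1
  fi
)
```

The function body runs in a subshell, so its variables do not leak into the calling shell and `exit 1` only ends the check itself.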
How the Pieces Connect
Understanding how the major components interact prevents a whole class of debugging mistakes. Here is the short version — see Request Pipeline for the full detail.
Authentication flow:
- Frontend (MSAL) authenticates the user against Entra External ID CIAM
- Entra issues a JWT access token scoped to the RoundTrip API
- Every API request carries the JWT in the Authorization header
- TenantScopingBehavior extracts the TenantId from the JWT claim via a Dapper lookup
- All subsequent data access is automatically scoped to that tenant
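When debugging the authentication flow above, it can help to look at the claims an access token actually carries. The sketch below is a generic debugging aid, not part of RoundTrip: `jwt_payload` is a hypothetical helper, it does not verify the signature, and the name of the claim that TenantScopingBehavior reads is not documented on this page, so the output has to be inspected by eye.

```shell
# Hypothetical debugging helper: prints the decoded JSON payload of a JWT.
# It does NOT validate the signature; never use it for authorization decisions.
# Usage: jwt_payload "<access token>"
jwt_payload() (
  # Take the second dot-separated segment and map base64url to standard base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding omits.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
)
```

Only paste tokens from the dev environment into a terminal; a production access token is a live credential until it expires.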
Request flow:
Browser → Cloudflare → App Service → FastEndpoints → Auth Middleware
→ Pipeline Behaviors (Logging → Validation → TenantScoping → Transaction)
→ Mediator → Handler → Dapper / EF Core → Azure SQL → Response
Background job flow:
Domain Event → IDomainEventPublisher → Handler → Hangfire.Enqueue()
→ Hangfire Worker → Job Class → Infrastructure (SendGrid / Graph API / QuestPDF)
Real-time notifications:
Handler → INotificationPushService → SignalRNotificationPushService
→ SignalR Hub → Connected browser clients
Critical Knowledge
These are the infrastructure facts that have caused production incidents. Every developer and infrastructure engineer must know them.
:::danger Key Vault Changes Require Stop/Start, Not Restart
When a Key Vault secret reference is added or updated in App Service configuration, the App Service must be fully stopped then started. A restart does not flush the Key Vault reference cache. This has caused production outages. Always stop → start, never restart.
:::
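The stop → start sequence can be scripted so that nobody reaches for a restart by habit. This is a minimal sketch under stated assumptions: `stop_start_app` is a hypothetical helper name, and the app and resource group names are passed in rather than hard-coded.

```shell
# Hypothetical helper: performs a full stop followed by a start
# (deliberately not "az webapp restart"), then prints the resulting state.
# Usage: stop_start_app <app-name> <rg>
stop_start_app() (
  app="$1"
  rg="$2"
  az webapp stop  --name "$app" --resource-group "$rg" || exit 1
  az webapp start --name "$app" --resource-group "$rg" || exit 1
  # "Running" is the expected state once the start has completed.
  az webapp show --name "$app" --resource-group "$rg" --query state --output tsv
)
```

The helper stops at the first failing command, so a failed stop never leads to a start being issued against an app in an unknown state.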
:::danger Connection String Key Is ConnectionStrings__Default
The App Service connection string setting must be named ConnectionStrings__Default. Never ConnectionStrings__DefaultConnection or DefaultConnection. The wrong key means the app starts but cannot reach the database.
:::
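A quick way to catch the wrong key name is to check the list of configured setting names. The sketch below assumes the key is stored as an App Service application setting (the double-underscore form suggests this, but the page does not say so explicitly); `check_connection_key` is a hypothetical helper that reads names from standard input.

```shell
# Hypothetical helper: reads setting names (one per line) on stdin and checks
# that ConnectionStrings__Default is among them.
# Example input source (assumes the key is an application setting):
#   az webapp config appsettings list --name <app-name> --resource-group <rg> \
#     --query "[].name" --output tsv | check_connection_key
check_connection_key() (
  names=$(cat)
  if printf '%s\n' "$names" | grep -qx 'ConnectionStrings__Default'; then
    echo "OK: ConnectionStrings__Default is present"
  else
    echo "FAIL: ConnectionStrings__Default is missing" >&2
    # Report the near-miss names called out in the warning, if any are present.
    printf '%s\n' "$names" |
      grep -x -e 'ConnectionStrings__DefaultConnection' -e 'DefaultConnection' >&2
    exit 1
  fi
)
```

Because the check only looks at names, it never prints the connection string value itself.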
:::danger Cloudflare Pages Branch Mapping — TRA-195
The dev pipeline (azure-pipelines-dev.yml) must deploy with --branch development. Using --branch main in the dev pipeline overwrites the production Cloudflare Pages deployment regardless of project name. This has happened. Always verify the branch flag before touching pipeline files.
:::
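The branch flag can be verified mechanically before a pipeline change is merged. The following sketch is an assumption-laden guard, not an existing part of the pipeline: `check_dev_branch_flag` is a hypothetical name, and because the exact deploy command is not shown on this page, the pattern matches only the `--branch` flag itself.

```shell
# Hypothetical guard: fails if the given pipeline file deploys with
# --branch main, or if it contains no --branch development flag at all.
# Usage: check_dev_branch_flag azure-pipelines-dev.yml
check_dev_branch_flag() (
  file="$1"
  if grep -Eq -- '--branch[ =]+main([^[:alnum:]_-]|$)' "$file"; then
    echo "FAIL: $file deploys with --branch main" >&2
    exit 1
  fi
  if ! grep -Eq -- '--branch[ =]+development([^[:alnum:]_-]|$)' "$file"; then
    echo "FAIL: $file has no --branch development flag" >&2
    exit 1
  fi
  echo "OK: $file deploys with --branch development"
)
```

The trailing character class keeps a branch such as `main-hotfix` from being mistaken for `main`.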
:::danger EF Core Migrations Never Run at Startup
Never call MigrateDatabase() in Program.cs. This causes startup timeouts on Azure App Service. Migrations are run manually from a local machine against the target database connection string.
:::
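A manual migration run might look like the sketch below. It assumes the `dotnet ef` tool is installed and that the command is run from the directory of the project that holds the migrations; the `apply_migrations` helper and the `TARGET_CONNECTION_STRING` variable are hypothetical names, and the actual project layout is not described on this page.

```shell
# Hypothetical helper: lists migrations against the target database, then
# applies the pending ones. Run from the project that contains the migrations.
# Usage: TARGET_CONNECTION_STRING='<connection string>' apply_migrations
apply_migrations() (
  if [ -z "$TARGET_CONNECTION_STRING" ]; then
    echo "Set TARGET_CONNECTION_STRING first" >&2
    exit 1
  fi
  # Listing first makes the pending set visible before anything is changed.
  dotnet ef migrations list --connection "$TARGET_CONNECTION_STRING" || exit 1
  dotnet ef database update --connection "$TARGET_CONNECTION_STRING"
)
```

Passing the connection string through an environment variable keeps it out of the script, though it can still end up in shell history if set inline.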
:::caution GraphApi Config Uses b2c-extensions-app Credentials
GraphUserService must be configured with credentials from the b2c-extensions-app registration in the roundtripapp Entra tenant — not the RoundTrip API registration. Using the wrong app registration causes user invitation and role assignment to fail silently or with cryptic errors.
:::
:::caution WebSockets Must Be Enabled via CLI
WebSockets are required for SignalR. The Azure Portal does not expose the WebSockets toggle for Linux App Service plans. Always enable via CLI:
az webapp config set --name <app-name> --resource-group <rg> --web-sockets-enabled true
:::
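Since the toggle is not visible in the Portal, the setting is easiest to confirm from the CLI as well. This is a minimal sketch; `websockets_enabled` is a hypothetical helper name, and it relies on the `webSocketsEnabled` property of the site configuration returned by `az webapp config show`.

```shell
# Hypothetical helper: succeeds only if WebSockets are enabled on the app.
# Usage: websockets_enabled <app-name> <rg> && echo "WebSockets on"
websockets_enabled() (
  value=$(az webapp config show --name "$1" --resource-group "$2" \
    --query webSocketsEnabled --output json)
  # Accept either capitalisation so the check does not depend on output format.
  case "$value" in
    true|True) exit 0 ;;
    *) exit 1 ;;
  esac
)
```

Running the check after every new App Service is provisioned catches the case where SignalR silently falls back to a slower transport.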
Operational Runbooks
Quick links to the most commonly needed runbooks. Full runbooks are in the Infrastructure Reference.
| Situation | Runbook |
|---|---|
| App Service won't start (SiteStartupCancelled) | App Service Won't Start |
| Key Vault secret not resolving | Add a New Key Vault Secret |
| Rotate a client secret | Rotate a Client Secret |
| Hangfire jobs failing | Check Hangfire Failed Jobs |
| Dev site serving stale content | Check Cloudflare Pages branch mapping for roundtrip-dev project |
| User invitations not sending | Verify GraphApi__ClientSecret App Service setting — may need stop/start |
Monitoring & Observability
| Tool | What It Monitors | Access |
|---|---|---|
| Application Insights | API request traces, exceptions, performance, dependencies | Azure Portal → appi-roundtrip-production |
| Log Analytics | Raw logs, custom queries across environments | Azure Portal → log-roundtrip-production |
| Seq | Structured API logs (enabled only when a valid Seq URI is configured) | Internal: check App Service config |
| Hangfire Dashboard | Background job status, retry queues, failed jobs | Internal URL — check startup config |
| Azure App Service Log Stream | Live container stdout/stderr | az webapp log tail --name app-roundtrip-production --resource-group rg-roundtrip-production |
What's Not Done Yet
These are known infrastructure gaps — documented here so they are not forgotten and can be picked up as tickets.
| Gap | Notes |
|---|---|
| Dev environment not fully configured | Key Vault secrets, dev tenant seed data, Cloudflare Pages dev branch — see Deployment Architecture pending list |
| Tests not running in CI pipelines | Unit, integration, and Playwright tests must be added to ADO pipelines |
| ADO approval gate on production deployment | Production deploys should require manual approval before going live |
| Personal account Azure resources not cleaned up | Old kv-roundtrip-prod secrets must be verified against kv-roundtrip-production before deleting |
| Hangfire dashboard URL not documented | Find and document the internal path — add to Infrastructure Reference |
| Stripe webhook suspend/reinstate automation | Tenant suspension on payment failure not yet implemented |
| Data retention job | Purge cancelled tenant data after 30/90 days — Hangfire recurring job |