
Infrastructure & Operations

This section documents everything related to how RoundTrip is built, deployed, run, and maintained in production. It covers Azure resource architecture, deployment pipelines, request flow, code management, and operational runbooks.

If something is running in production or dev, it is documented here. If you make a change to infrastructure, update the relevant page before closing the ticket.


Who This Section Is For

This section is written for senior developers and DevOps engineers who need to understand, operate, or modify the RoundTrip infrastructure. It assumes familiarity with:

  • Azure App Service, Azure SQL, Azure Key Vault
  • CI/CD pipelines and Git-based deployment workflows
  • .NET application hosting and configuration
  • DNS, Cloudflare, and static site hosting

If you are a developer who only works on application code and never touches infrastructure, the most important pages for you are:

  • Request Pipeline — understand what happens to your code at runtime
  • Code Management — branch strategy, PR flow, pipeline behaviour
  • The Gotchas section below — mistakes that have caused production incidents

What's in This Section

| Page | What It Covers |
| --- | --- |
| Deployment Architecture | Azure resource inventory, environment overview, subscription layout, pending dev environment setup |
| Request Pipeline | End-to-end request flow from browser to database and back: middleware, auth, mediator, Dapper, SignalR, Hangfire |
| Code Management | Branch strategy, commit conventions, PR flow, ADO pipelines, Cloudflare Pages deployment, pipeline agent |
| Infrastructure Reference | Azure resource names, Key Vault secret mappings, App Service configuration, Cloudflare rules, runbooks |

System Overview

RoundTrip is a multi-tenant SaaS platform with two environments — production and development — both hosted on Azure with frontend delivery via Cloudflare Pages.

                      ┌─────────────────────────────────────┐
                      │           USERS & CLIENTS           │
                      │   Dispatchers · Technicians (PWA)   │
                      └──────────────┬──────────────────────┘
                                     │ HTTPS
                      ┌──────────────▼──────────────────────┐
                      │             CLOUDFLARE              │
                      │   DNS · CDN · Pages · Zero Trust    │
                      │ roundtrips.app / dev.roundtrips.app │
                      └──────────────┬──────────────────────┘
                                     │
            ┌────────────────────────┼──────────────────────┐
            │                        │                      │
┌───────────▼──────────┐  ┌──────────▼──────────┐  ┌────────▼───────────┐
│    REACT FRONTEND    │  │  AZURE APP SERVICE  │  │    ENTRA EXT ID    │
│   Cloudflare Pages   │  │     .NET 10 API     │  │        CIAM        │
│   React 19 · Vite    │  │    FastEndpoints    │  │    roundtripapp    │
│    TanStack Query    │  │ SignalR · Hangfire  │  │  .onmicrosoft.com  │
│   MSAL · Tailwind    │  │   Cyrus Mediator    │  │                    │
└──────────────────────┘  └──────────┬──────────┘  └────────────────────┘
                                     │
                 ┌───────────────────┼───────────────────┐
                 │                   │                   │
      ┌──────────▼──────┐   ┌────────▼────────┐   ┌──────▼──────────────┐
      │    AZURE SQL    │   │    AZURE KEY    │   │     AZURE BLOB      │
      │   SQL Server    │   │      VAULT      │   │       STORAGE       │
      │  Production DB  │   │  kv-roundtrip   │   │    invoice-pdfs     │
      │     Dev DB      │   │   -production   │   │      container      │
      └─────────────────┘   └─────────────────┘   └─────────────────────┘

Environments

| | Production | Development |
| --- | --- | --- |
| API | app-roundtrip-production | app-roundtrip-dev |
| Frontend | roundtrips.app | dev.roundtrips.app |
| Database | sqldb-roundtrip-production | sqldb-roundtrip-dev |
| Key Vault | kv-roundtrip-production | kv-roundtrip-dev |
| Branch | main | development |
| Pipeline trigger | Push to main | Push to development |
| ASPNETCORE_ENVIRONMENT | Production | Production ⚠️ |

:::warning Dev Environment Uses Production Mode
ASPNETCORE_ENVIRONMENT on app-roundtrip-dev must be set to Production, not Development. The dev environment uses production-style configuration (Key Vault references, full middleware stack) to accurately mirror production behaviour. Setting it to Development breaks Key Vault reference resolution.
:::
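One way to confirm the setting without opening the Portal is to read it back with the Azure CLI. The sketch below is illustrative only: the function names are hypothetical, and the dev resource group is passed in as an argument because its name is not stated on this page.

```shell
# Succeeds only when the value is the one the dev App Service requires.
check_aspnetcore_env() {
  local value="$1"
  if [ "$value" = "Production" ]; then
    echo "ok"
  else
    echo "ASPNETCORE_ENVIRONMENT is '$value', expected 'Production'" >&2
    return 1
  fi
}

# Reads the current app setting from app-roundtrip-dev and checks it.
# $1 is the dev resource group (not documented here, so supplied by the caller).
verify_dev_environment() {
  local rg="$1" value
  value="$(az webapp config appsettings list \
    --name app-roundtrip-dev --resource-group "$rg" \
    --query "[?name=='ASPNETCORE_ENVIRONMENT'].value | [0]" -o tsv)"
  check_aspnetcore_env "$value"
}
```

Run `verify_dev_environment <rg>` after any configuration change to the dev App Service; a non-zero exit status means the setting has drifted.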


How the Pieces Connect

Understanding how the major components interact prevents a whole class of debugging mistakes. Here is the short version — see Request Pipeline for the full detail.

Authentication flow:

  1. Frontend (MSAL) authenticates the user against Entra External ID CIAM
  2. Entra issues a JWT access token scoped to the RoundTrip API
  3. Every API request carries the JWT in the Authorization header
  4. TenantScopingBehavior resolves the TenantId by using a claim from the JWT in a Dapper lookup
  5. All subsequent data access is automatically scoped to that tenant
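When debugging the steps above, it often helps to see exactly which claims a token carries. A minimal sketch, assuming a shell with `cut`, `tr`, and `base64` available; `jwt_payload` is a hypothetical helper name. It only decodes the payload segment and does not verify the signature, so treat its output as diagnostic information only.

```shell
# Prints the decoded JSON payload (second dot-separated segment) of the JWT in $1.
jwt_payload() {
  local payload
  # Convert base64url characters to the standard base64 alphabet.
  payload="$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')"
  # base64url omits padding; restore it so base64 -d accepts the input.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}
```

Paste a token copied from the browser's network tab as the single argument, for example `jwt_payload "$TOKEN"`, and inspect the resulting JSON for the claims the API expects.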

Request flow:

Browser → Cloudflare → App Service → FastEndpoints → Auth Middleware
→ Pipeline Behaviors (Logging → Validation → TenantScoping → Transaction)
→ Mediator → Handler → Dapper / EF Core → Azure SQL → Response

Background job flow:

Domain Event → IDomainEventPublisher → Handler → Hangfire.Enqueue()
→ Hangfire Worker → Job Class → Infrastructure (SendGrid / Graph API / QuestPDF)

Real-time notifications:

Handler → INotificationPushService → SignalRNotificationPushService
→ SignalR Hub → Connected browser clients

Critical Knowledge

These are the infrastructure facts that have caused production incidents. Every developer and infrastructure engineer must know them.

:::danger Key Vault Changes Require Stop/Start, Not Restart
When a Key Vault secret reference is added or updated in App Service configuration, the App Service must be fully stopped then started. A restart does not flush the Key Vault reference cache. This has caused production outages. Always stop → start, never restart.
:::
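The stop → start sequence can be wrapped so that nobody reaches for restart by habit. This is a sketch using the standard `az webapp stop` and `az webapp start` commands; the function name `stop_start_app` is hypothetical.

```shell
# Fully stops, then starts, an App Service so Key Vault references are re-resolved.
# Deliberately does not use `az webapp restart`.
stop_start_app() {
  local app="$1" rg="$2"
  # Abort if the stop fails, so the start never runs against a half-stopped app.
  az webapp stop  --name "$app" --resource-group "$rg" || return 1
  az webapp start --name "$app" --resource-group "$rg"
}
```

Example: `stop_start_app app-roundtrip-production rg-roundtrip-production`.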

:::danger Connection String Key Is ConnectionStrings__Default
The App Service connection string setting must be named ConnectionStrings__Default. Never ConnectionStrings__DefaultConnection or DefaultConnection. The wrong key means the app starts but cannot reach the database.
:::
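A quick guard against the wrong key is to check the setting names before deploying. The sketch below assumes the key is stored as an ordinary App Service application setting; `check_connection_string_key` is a hypothetical helper that reads setting names from standard input, one per line.

```shell
# Fails if a known-wrong key is present or the required key is missing.
check_connection_string_key() {
  local names
  names="$(cat)"
  if printf '%s\n' "$names" | grep -qx \
      -e 'ConnectionStrings__DefaultConnection' -e 'DefaultConnection'; then
    echo "wrong connection string key present" >&2
    return 1
  fi
  if ! printf '%s\n' "$names" | grep -qx 'ConnectionStrings__Default'; then
    echo "ConnectionStrings__Default is missing" >&2
    return 1
  fi
}
```

Example: `az webapp config appsettings list --name app-roundtrip-production --resource-group rg-roundtrip-production --query "[].name" -o tsv | check_connection_string_key`.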

:::danger Cloudflare Pages Branch Mapping (TRA-195)
The dev pipeline (azure-pipelines-dev.yml) must deploy with --branch development. Using --branch main in the dev pipeline overwrites the production Cloudflare Pages deployment regardless of project name. This has happened. Always verify the branch flag before touching pipeline files.
:::
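The branch flag can also be checked mechanically before a pipeline change is merged. This is a sketch only: it assumes the deploy step passes the branch on the command line as `--branch <name>` (the form `wrangler pages deploy` accepts), and `check_dev_pipeline_branch` is a hypothetical helper name.

```shell
# Fails if the pipeline file deploys with --branch main,
# or never passes --branch development.
check_dev_pipeline_branch() {
  local file="$1"
  if grep -Eq -- '--branch[= ]+main([^[:alnum:]_-]|$)' "$file"; then
    echo "$file deploys with --branch main" >&2
    return 1
  fi
  if ! grep -Eq -- '--branch[= ]+development([^[:alnum:]_-]|$)' "$file"; then
    echo "$file does not pass --branch development" >&2
    return 1
  fi
}
```

Example: `check_dev_pipeline_branch azure-pipelines-dev.yml`. A text match like this is a safety net, not a substitute for reading the pipeline diff.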

:::danger EF Core Migrations Never Run at Startup
Never call MigrateDatabase() in Program.cs. This causes startup timeouts on Azure App Service. Migrations are run manually from a local machine against the target database connection string.
:::
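A manual migration run might look like the sketch below. It assumes the `dotnet-ef` tool is installed and that the command is run from the project containing the migrations; any `--project` or `--startup-project` arguments this repository needs are not shown. `apply_migrations` is a hypothetical wrapper name.

```shell
# Applies pending EF Core migrations to the database named by the connection string in $1.
apply_migrations() {
  local conn="$1"
  if [ -z "$conn" ]; then
    echo "connection string required" >&2
    return 1
  fi
  # --connection overrides whatever connection string the app would otherwise load.
  dotnet ef database update --connection "$conn"
}
```

Pass the connection string from a secure source, such as a variable populated from Key Vault, and avoid typing it inline where it would land in shell history.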

:::caution GraphApi Config Uses b2c-extensions-app Credentials
GraphUserService must be configured with credentials from the b2c-extensions-app registration in the roundtripapp Entra tenant, not the RoundTrip API registration. Using the wrong app registration causes user invitation and role assignment to fail silently or with cryptic errors.
:::

:::caution WebSockets Must Be Enabled via CLI
WebSockets are required for SignalR. The Azure Portal does not expose the WebSockets toggle for Linux App Service plans. Always enable via CLI:

az webapp config set --name <app-name> --resource-group <rg> --web-sockets-enabled true

:::
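Because the Portal does not show the toggle, the setting is easiest to verify from the CLI as well. A sketch that reads the `webSocketsEnabled` property back with `az webapp config show`; the function name `websockets_enabled` is hypothetical.

```shell
# Succeeds when WebSockets are enabled on the given App Service.
websockets_enabled() {
  local app="$1" rg="$2" state
  # Lowercase the output so the comparison does not depend on CLI casing.
  state="$(az webapp config show --name "$app" --resource-group "$rg" \
    --query webSocketsEnabled -o tsv | tr '[:upper:]' '[:lower:]')"
  [ "$state" = "true" ]
}
```

Example: `websockets_enabled app-roundtrip-production rg-roundtrip-production || echo "WebSockets are disabled"`.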


Operational Runbooks

Quick links to the most commonly needed runbooks. Full runbooks are in the Infrastructure Reference.

| Situation | Runbook |
| --- | --- |
| App Service won't start (SiteStartupCancelled) | App Service Won't Start |
| Key Vault secret not resolving | Add a New Key Vault Secret |
| Rotate a client secret | Rotate a Client Secret |
| Hangfire jobs failing | Check Hangfire Failed Jobs |
| Dev site serving stale content | Check Cloudflare Pages branch mapping for roundtrip-dev project |
| User invitations not sending | Verify GraphApi__ClientSecret App Service setting; may need stop/start |

Monitoring & Observability

| Tool | What It Monitors | Access |
| --- | --- | --- |
| Application Insights | API request traces, exceptions, performance, dependencies | Azure Portal → appi-roundtrip-production |
| Log Analytics | Raw logs, custom queries across environments | Azure Portal → log-roundtrip-production |
| Seq | Structured API logs (conditional on valid URI) | Internal: check App Service config |
| Hangfire Dashboard | Background job status, retry queues, failed jobs | Internal URL: check startup config |
| Azure App Service Log Stream | Live container stdout/stderr | az webapp log tail --name app-roundtrip-production --resource-group rg-roundtrip-production |

What's Not Done Yet

These are known infrastructure gaps — documented here so they are not forgotten and can be picked up as tickets.

| Gap | Notes |
| --- | --- |
| Dev environment not fully configured | Key Vault secrets, dev tenant seed data, Cloudflare Pages dev branch; see Deployment Architecture pending list |
| Tests not running in CI pipelines | Unit, integration, and Playwright tests must be added to ADO pipelines |
| ADO approval gate on production deployment | Production deploys should require manual approval before going live |
| Personal account Azure resources not cleaned up | Old kv-roundtrip-prod secrets must be verified against kv-roundtrip-production before deleting |
| Hangfire dashboard URL not documented | Find and document the internal path, then add it to Infrastructure Reference |
| Stripe webhook suspend/reinstate automation | Tenant suspension on payment failure not yet implemented |
| Data retention job | Purge cancelled tenant data after 30/90 days (Hangfire recurring job) |