# Translately — full documentation corpus

> Auto-generated by `scripts/gen-llms-txt.sh`. Do not hand-edit — every edit to a `docs/**.md` file regenerates this file. The generator walks `docs/` in stable order, skips `llms.txt` and `llms-full.txt` themselves, and concatenates each page with a header boundary so an LLM can slice it back into sections.

# docs/

The `docs/` directory is the source for the [GitHub Pages site](https://pratiyush.github.io/translately/). The [`pages.yml`](https://github.com/Pratiyush/translately/blob/master/.github/workflows/pages.yml) workflow builds it with Jekyll + just-the-docs on every push to `master`.

Per [CLAUDE.md rule #10](https://github.com/Pratiyush/translately/blob/master/CLAUDE.md), every PR that changes user-visible behaviour, the API surface, a config knob, or architecture lands its matching page update here in the same PR.

## Structure

- [`index.md`](index.md) — landing page (hero, roadmap, quickstart, download links). Rendered by Jekyll with the just-the-docs theme.
- [`_config.yml`](_config.yml) — Jekyll build config (theme, aux links, search, footer).
- [`product/`](product/) — user-facing feature walkthroughs, one page per flow, light + dark screenshots.
- [`api/`](api/) — API reference: committed `openapi.json`, scope matrix, error-code catalogue, rate-limit policy, versioning contract.
- [`architecture/`](architecture/) — module maps, data flow, multi-tenancy, crypto; Architecture Decision Records under [`architecture/decisions/`](architecture/decisions/).
- [`self-hosting/`](self-hosting/) — operator guides; [`hardening.md`](self-hosting/hardening.md) is the production-readiness checklist.
- [`llms.txt`](llms.txt) — LLM-ingestible link index per the [llmstxt.org](https://llmstxt.org) standard.
- [`llms-full.txt`](llms-full.txt) — every `.md` page concatenated for one-shot LLM ingestion. Regenerated by [`scripts/gen-llms-txt.sh`](https://github.com/Pratiyush/translately/blob/master/scripts/gen-llms-txt.sh).
- `sdks/` — per-SDK guides (added in Phase 5).
- `migration/` — migration guides for operators moving from another localization platform (added in Phase 7).

## Regenerating `llms-full.txt`

Every PR that adds or changes a `.md` page under `docs/` must regenerate the full corpus:

```bash
scripts/gen-llms-txt.sh          # rewrite docs/llms-full.txt
scripts/gen-llms-txt.sh --check  # fail if the committed file is stale
```

The script is deterministic — files are walked in `LC_ALL=C sort` order and separated by `` markers.

## Local preview

```bash
# Any static server works
python3 -m http.server -d docs 8000  # http://localhost:8000/
```

## Writing style

- Open source, technical, direct. No marketing fluff.
- Every code block is copy-paste runnable against the current `master`.
- Every page links to the `CHANGELOG.md` version that introduced the feature.
- Accessibility: semantic HTML, keyboard-nav friendly, respects `prefers-color-scheme`.
- Screenshots in both themes, named `*-light.png` / `*-dark.png`.

---
title: API reference
nav_order: 3
has_children: true
permalink: /api/
---

# API reference

Translately exposes a single versioned HTTP API under `/api/v1/`. This tree is the canonical reference. Per [CLAUDE.md rule #10](https://github.com/Pratiyush/translately/blob/master/CLAUDE.md), every PR that changes an endpoint, scope, error code, rate-limit, or versioning rule lands its matching page update here — including a regenerated [`openapi.json`](openapi.json) — in the same PR.

## Pages

- [`openapi.json`](openapi.json) — machine-readable OpenAPI 3.1 spec, auto-generated from the backend. Regenerate on every API change (see [Regenerating](#regenerating) below).
- [Scopes](scopes.md) — permission scope matrix (role → scope set), scope naming convention, `@RequiresScope` usage.
- [Error codes](errors.md) — stable catalogue of `error.code` strings, HTTP status mapping, troubleshooting.
- [Rate limits](rate-limits.md) — per-token and per-endpoint policy, headers, `429` retry semantics.
- [Versioning](versioning.md) — URL-path versioning, deprecation policy, breaking-change rule.
- [Authentication](auth.md) — JWT vs. PAT vs. API key, bearer-credential split, refresh rotation.
- [Organizations, projects, members](organizations-and-projects.md) — self-serve org creation, project CRUD inside an org, role-change + remove for members.
- [Keys, namespaces, translations](keys-and-namespaces.md) — project-scoped key CRUD, namespaces, per-language translation upsert.

## Conventions (mirror of [`.kiro/steering/api-conventions.md`](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/api-conventions.md))

- **Base path:** `/api/v1/`. v2 only lands when v1 cannot absorb a change without breaking clients.
- **Errors:** uniform envelope — `{"error":{"code":"ERROR_CODE","message":"human readable","details":{...}}}`. Codes are `SCREAMING_SNAKE_CASE`, stable across minor versions.
- **Pagination:** cursor-based, `?cursor=...&limit=...`, responses carry `nextCursor` (null when exhausted).
- **IDs:** ULIDs, 26-char base32, sortable. Exposed verbatim on the wire.
- **Times:** ISO-8601 UTC, always with a `Z` suffix.
- **Scopes:** every protected endpoint declares a minimum scope set via `@RequiresScope`. `INSUFFICIENT_SCOPE` is the 403 code.

## OpenAPI ingestion

The committed [`openapi.json`](openapi.json) is the source of truth for generated clients:

- The `@translately/js` SDK (Phase 5 — `sdks/js/`) is generated from this file via `openapi-typescript`.
- The webapp uses the same generated types for every network call (T120).

If you change a controller, regenerate the spec in the same PR. A stale `openapi.json` breaks the SDK build.

## Regenerating

`docs/api/openapi.json` is produced by the Quarkus SmallRye OpenAPI extension at build time.
Config lives in [`backend/app/src/main/resources/application.yml`](https://github.com/Pratiyush/translately/blob/master/backend/app/src/main/resources/application.yml) under `quarkus.smallrye-openapi.store-schema-directory`; two Gradle tasks manage the committed copy:

```bash
# Regenerate + copy into docs/api/openapi.json
./gradlew :backend:app:copyOpenApi

# Verify the committed file matches the build (CI runs this automatically)
./gradlew :backend:app:checkOpenApiUpToDate
```

`checkOpenApiUpToDate` is wired into `./gradlew check`, so CI fails with an actionable message if the committed schema drifts from the current controllers. Run `copyOpenApi` and commit the result to fix.

---
title: Authentication endpoints
parent: API reference
nav_order: 5
---

# API — authentication

This page covers **HTTP-level** authentication — the endpoints, the headers, and the payload shapes. For the design-level view (why rotation, why Argon2id, credential-type tradeoffs), see [auth architecture](../architecture/auth.md).

Introduced by: [T103](https://github.com/Pratiyush/translately/issues/21) (email + password + verify + refresh), [T104](https://github.com/Pratiyush/translately/issues/22) (JWT issuer), [T110](https://github.com/Pratiyush/translately/issues/28) (API keys + PATs), [T110-enforce](https://github.com/Pratiyush/translately/issues/149) (ApiKey + PAT authenticator filters).

Related: [scopes](scopes.md), [errors](errors.md), [auth architecture](../architecture/auth.md).
## Credential types

Translately accepts three kinds of credentials, plus OIDC in Phase 7:

| Credential | `Authorization` header (verbatim) | Token prefix | Identifies | TTL |
|---|---|---|---|---|
| Access JWT | `Authorization: Bearer ` | — (JWT is `header.payload.signature`) | a user | ~15 min |
| API key | `Authorization: ApiKey tr_ak_<8>.<43>` | `tr_ak_` | a project | until revoked / expired |
| PAT | `Authorization: Bearer tr_pat_<8>.<43>` | `tr_pat_` | a user | until revoked / expired |
| OIDC (Phase 7) | `Authorization: Bearer ` | — | a user | IdP-defined |

### Dispatch rules

The backend looks at the `Authorization` header and dispatches:

1. **`ApiKey `** — `ApiKeyAuthenticator` handles the request. Scopes come from the `api_keys.scopes` column verbatim.
2. **`Bearer tr_pat_`** — `PatAuthenticator` handles the request. Scopes are intersected with the owning user's current effective scope set at request time.
3. **`Bearer `** — the JWT auth layer handles it. A token that fails JWT parsing returns 401 at the HTTP layer, before any JAX-RS filter runs.
4. **No `Authorization` header** — the request proceeds anonymously. Protected endpoints answer 403 `INSUFFICIENT_SCOPE` (or 401 if the resource carries `@Authenticated`).

A request carrying more than one credential (e.g. a JWT `Authorization` header *plus* an `x-api-key` query parameter) is refused — we never merge grants across credentials. HTTP itself allows only one `Authorization` header, so the common mistake is double-sending the credential in two wire places; the extra will be logged and rejected.

### Token shape on the wire

All three non-JWT credential shapes look the same:

```
tr__<8-char-tail>.<43-char-base64url-secret>
└───────── prefix ─────┘ └──────────── secret ─────────┘
```

- The **prefix** is stored in the DB and is safe to show in UIs, logs, and audit trails.
- The **secret** half is Argon2id-hashed before persistence. It's returned to the mint caller exactly once, then discarded.
- The `.` separator lets us parse the two halves cleanly even though the base64url secret may contain `_` or `-`.

### Failure modes at authentication time

| HTTP | `error.code` | When |
|---|---|---|
| 401 | `UNAUTHENTICATED` | Unknown prefix, bad secret, malformed token shape, or (for `@Authenticated` endpoints) no credential at all |
| 401 | `CREDENTIAL_REVOKED` | Prefix matches and secret verifies, but `revoked_at IS NOT NULL` |
| 401 | `CREDENTIAL_EXPIRED` | Prefix matches and secret verifies, but `expires_at < NOW()` |
| 403 | `INSUFFICIENT_SCOPE` | Credential is valid but the target endpoint requires a scope not in its effective set |

Unknown prefix and bad secret **collapse to the same code** on purpose: exposing the distinction would let an attacker fingerprint the prefix space via timing or response content. Revoked / expired get their own codes because the client can act on the distinction (rotate vs. re-mint).

## Endpoints

All endpoints live under `/api/v1/auth/`. No scope is required to call them — they're the pre-auth surface.

### `POST /api/v1/auth/signup`

Create a new user. Returns 202 whether or not the email is already registered (see the forgot-password note below for the enumeration-avoidance rationale); validation failures are the exception, as listed below.

```http
POST /api/v1/auth/signup
Content-Type: application/json

{ "email": "me@example.com", "password": "correct horse battery staple", "fullName": "Me" }

HTTP/1.1 202 Accepted
```

- Side-effect: sends a verify-email message via Quarkus Mailer → Mailpit in dev.
- The verify link embeds a single-use Argon2id-hashed token; clicking it hits `POST /auth/verify-email`.
- Validation errors (too-short password, malformed email) return `VALIDATION_FAILED` (400).

### `POST /api/v1/auth/verify-email`

Consume the verification token from the email. On success, stamps `users.email_verified_at = NOW()`.

```http
POST /api/v1/auth/verify-email
Content-Type: application/json

{ "token": "" }

HTTP/1.1 204 No Content
```

- Wrong / expired token → `INVALID_CREDENTIALS` (401).
### `POST /api/v1/auth/login`

Exchange email + password for an access + refresh pair.

```http
POST /api/v1/auth/login
Content-Type: application/json

{ "email": "me@example.com", "password": "correct horse battery staple" }

HTTP/1.1 200 OK
Content-Type: application/json

{
  "accessToken": "eyJ...",
  "accessExpiresAt": "2026-04-18T11:00:00Z",
  "refreshToken": "eyJ...",
  "refreshExpiresAt": "2026-05-18T10:45:00Z"
}
```

- The refresh token is **also** set as an `HttpOnly; Secure; SameSite=Lax` cookie named `tr_refresh` for browser clients. CLI / server clients use the JSON body.
- Wrong credentials → `INVALID_CREDENTIALS` (401). Unverified email → `EMAIL_NOT_VERIFIED` (403).

### `POST /api/v1/auth/refresh`

Rotate the refresh token. **Single-use** — presenting a refresh token that was already consumed invalidates every refresh token for that user (session-wide kill switch on suspected replay).

```http
POST /api/v1/auth/refresh
Content-Type: application/json

{ "refreshToken": "eyJ..." }

HTTP/1.1 200 OK

{
  "accessToken": "eyJ...",
  "accessExpiresAt": "2026-04-18T11:15:00Z",
  "refreshToken": "eyJ...",
  "refreshExpiresAt": "2026-05-18T11:00:00Z"
}
```

- Replay / already-consumed → `REFRESH_TOKEN_REUSED` (401). See [auth architecture](../architecture/auth.md#rotation-and-replay-protection-t103) for the full flow.
- Missing / invalid signature → `TOKEN_INVALID` (401).
- Past `exp` → `TOKEN_EXPIRED` (401).

Browser clients may send the refresh token via the `tr_refresh` cookie instead of the body; the endpoint accepts whichever is present (not both).

### `POST /api/v1/auth/forgot-password`

Start the password-reset flow. **Always returns 202**, regardless of whether the email exists — this is deliberate so an attacker cannot enumerate valid accounts.

```http
POST /api/v1/auth/forgot-password
Content-Type: application/json

{ "email": "me@example.com" }

HTTP/1.1 202 Accepted
```

- Side-effect: if the email matches a user, sends a reset email with a single-use, Argon2id-hashed token.
- No error responses (rate-limited like every unauthenticated endpoint — see [rate-limits](rate-limits.md)).

### `POST /api/v1/auth/reset-password`

Consume the reset token and set a new password.

```http
POST /api/v1/auth/reset-password
Content-Type: application/json

{ "token": "", "newPassword": "new correct horse battery staple" }

HTTP/1.1 204 No Content
```

- Invalid / expired / already-used token → `INVALID_CREDENTIALS` (401).
- Weak password (policy: ≥ 12 chars) → `VALIDATION_FAILED` (400).
- On success, every active refresh token for the user is invalidated — the user must re-login.

## JWT structure

Access tokens are compact-serialized, RS256-signed JWTs. Claims:

| Claim | Type | Meaning |
|---|---|---|
| `iss` | string | `"translately"` by default |
| `aud` | string | `"translately-webapp"` by default |
| `sub` | string | user ULID |
| `upn` | string | user email (the User Principal Name) |
| `scope` | string | space-separated scope tokens |
| `groups` | string[] | same scope tokens, array form |
| `orgs` | object[] | `[{id, slug, role}, …]` |
| `typ` | `"access"` | distinguishes from refresh |
| `iat` / `exp` | int | epoch seconds |

Refresh tokens carry only `iss`, `aud`, `sub`, `jti`, `typ="refresh"`, `iat`, `exp`. See [auth architecture](../architecture/auth.md#jwt-format) for the full schema and rotation flow.
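
For debugging or display, a client can read the claims in the table above straight from the token's payload segment. The following is a minimal sketch, not Translately's own code: the helper names (`b64url_decode`, `decode_claims`, `scopes_of`) and the sample token are made up for illustration, and the signature is deliberately **not** verified, so the result must never drive an authorization decision.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs omit."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_claims(token: str) -> dict:
    """Return the payload claims of a compact JWT WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    return json.loads(b64url_decode(parts[1]))


def scopes_of(claims: dict) -> set:
    """Split the space-separated `scope` claim into a set of scope tokens."""
    return set(claims.get("scope", "").split())


def _b64url_json(obj: dict) -> str:
    """Encode a dict the way a JWT segment is encoded (demo helper only)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Made-up, unsigned token built only to exercise the helpers above.
sample_token = ".".join([
    _b64url_json({"alg": "RS256", "typ": "JWT"}),
    _b64url_json({"iss": "translately", "sub": "01HTEXAMPLE", "typ": "access",
                  "scope": "keys.read keys.write"}),
    "not-a-real-signature",
])

claims = decode_claims(sample_token)
print(claims["typ"], sorted(scopes_of(claims)))
```

Checking `claims["typ"]` is a cheap way to catch a refresh token being sent where an access token is expected; real verification still belongs to the server.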
## Using a credential on a protected endpoint

```bash
# Access JWT
curl -H "Authorization: Bearer $ACCESS_JWT" \
  https://api.example.com/api/v1/organizations/acme/projects

# API key — project-scoped, server-to-server
curl -H "Authorization: ApiKey tr_ak_k9c4n2xb.a1B2c3D4e5F6g7H8i9J0k1L2m3N4o5P6q7R8s9T0u1V" \
  https://api.example.com/api/v1/projects/01HT.../keys

# Personal Access Token — user-scoped, cross-project
curl -H "Authorization: Bearer tr_pat_k9c4n2xb.a1B2c3D4e5F6g7H8i9J0k1L2m3N4o5P6q7R8s9T0u1V" \
  https://api.example.com/api/v1/organizations/acme/projects
```

Responses always carry [rate-limit headers](rate-limits.md#response-headers), and on 403 emit the [`INSUFFICIENT_SCOPE` envelope](errors.md#insufficient_scope).

## Minting API keys and PATs

API keys are project-scoped and require the `api-keys.write` scope in the owning organization. PATs are user-scoped and require only a valid access JWT — users can always manage their own credentials. Scopes on the new credential are **intersected** with the caller's current scope set (you can't mint something you don't hold).

Both flows return the full secret **exactly once**, in the `201 Created` response. The secret is Argon2id-hashed before persistence; there is no "reveal" endpoint.

```http
POST /api/v1/projects/01HT.../api-keys
Authorization: Bearer
Content-Type: application/json

{
  "name": "CI publisher",
  "scopes": ["keys.read", "keys.write", "translations.write", "imports.write"]
}

HTTP/1.1 201 Created
Content-Type: application/json

{
  "id": "01HT...",
  "prefix": "tr_ak_9zF4n6ab",
  "secret": "tr_ak_9zF4n6ab.aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789_-AbCdEfG",
  "scopes": ["imports.write", "keys.read", "keys.write", "translations.write"],
  "createdAt": "2026-04-18T10:45:00Z"
}
```

Save the `secret` somewhere safe — the server never shows it again. Full product walkthrough at [API keys and PATs](../product/api-keys-and-pats.md).

## OpenAPI

The authoritative machine-readable spec is at [`openapi.json`](openapi.json).
Every endpoint on this page carries its `@Operation` + `@APIResponses` annotations; regenerating the spec is part of every API PR (T113).

---
title: Error codes
parent: API reference
nav_order: 2
---

# Error-code catalogue

Every 4xx / 5xx response from the Translately API carries a uniform envelope. `error.code` is **stable across minor versions** — CLIs, SDKs, and the webapp match on it rather than parsing the human-readable message.

Introduced by: [T108](https://github.com/Pratiyush/translately/issues/133) (scope authorization + envelope).

Related: [API conventions steering](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/api-conventions.md), [scopes](scopes.md), [auth](auth.md).

## Envelope shape

```json
{
  "error": {
    "code": "KEY_NAME_TAKEN",
    "message": "A key named \"home.title\" already exists in this namespace.",
    "details": { "keyName": "home.title", "namespaceId": "01HT7F8..." },
    "traceId": "01HT7F8..."
  }
}
```

- **`code`** — `SCREAMING_SNAKE_CASE`, stable, machine-readable. Never renamed; new codes are added, old codes are deprecated.
- **`message`** — human-readable English. The webapp i18ns from `code`, never from `message`.
- **`details`** — optional structured context. Shape varies per code; documented below.
- **`traceId`** — request id, always present, matches the server log entry. Pass this back when reporting issues.

Content type: `application/json; charset=utf-8`. HTTP status is driven by the code (see the status in each section heading below).
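
A minimal sketch of how a client might turn this envelope into a typed value, so that callers branch on `code` and never on `message`. The `ApiError` class and `parse_error` function are hypothetical names chosen for this example; only the field names come from the envelope described above.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ApiError:
    """Client-side view of the error envelope (hypothetical helper type)."""
    code: str
    message: str
    details: dict = field(default_factory=dict)
    trace_id: Optional[str] = None


def parse_error(body: str) -> ApiError:
    """Parse a response body that carries the uniform `error` envelope."""
    err = json.loads(body)["error"]
    return ApiError(
        code=err["code"],
        message=err["message"],
        details=err.get("details") or {},  # `details` is optional
        trace_id=err.get("traceId"),
    )


# Sample body mirroring the envelope shown in this section.
sample_body = json.dumps({
    "error": {
        "code": "KEY_NAME_TAKEN",
        "message": 'A key named "home.title" already exists in this namespace.',
        "details": {"keyName": "home.title", "namespaceId": "01HT7F8..."},
        "traceId": "01HT7F8...",
    }
})

error = parse_error(sample_body)
if error.code == "KEY_NAME_TAKEN":
    print("duplicate key:", error.details["keyName"], "trace:", error.trace_id)
```

Treating a missing `details` as an empty dict keeps call sites free of `None` checks; unknown `code` values pass through as plain strings.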
## Catalogue

### Authentication — 401

| Code | Meaning | Typical `details` | Introduced |
|---|---|---|---|
| `UNAUTHENTICATED` | No credential on the request | — | Phase 1 |
| `INVALID_CREDENTIALS` | Wrong email + password, bad API key, or expired PAT | — | Phase 1 / T103 |
| `TOKEN_EXPIRED` | Access JWT or refresh JWT past `exp` | `{ "expiredAt": "2026-04-18T10:00:00Z" }` | Phase 1 / T104 |
| `TOKEN_INVALID` | Signature mismatch or malformed JWT | — | Phase 1 / T104 |
| `REFRESH_TOKEN_REUSED` | Replay of a consumed refresh token — invalidates **every** refresh token belonging to the user | `{ "userId": "…" }` | Phase 1 / T103 |
| `EMAIL_NOT_VERIFIED` | Valid credential but the user hasn't clicked the verify link | `{ "email": "…" }` | Phase 1 / T103 |

### Authorization — 403

| Code | Meaning | Typical `details` | Introduced |
|---|---|---|---|
| `INSUFFICIENT_SCOPE` | Authenticated but missing a required scope | `{ "required": […], "missing": […] }` | Phase 1 / T108 |
| `FORBIDDEN` | Generic forbidden — caller has every required scope but the resource-level policy rejected (e.g. cross-org access) | — | Phase 1 |

#### `INSUFFICIENT_SCOPE` in detail

Exact response body from [`InsufficientScopeExceptionMapper`](https://github.com/Pratiyush/translately/blob/master/backend/api/src/main/kotlin/io/translately/api/security/InsufficientScopeExceptionMapper.kt):

```json
{
  "error": {
    "code": "INSUFFICIENT_SCOPE",
    "message": "Missing required scope(s): keys.write",
    "details": {
      "required": ["keys.read", "keys.write"],
      "missing": ["keys.write"]
    }
  }
}
```

- HTTP status: **403 Forbidden**.
- `Content-Type: application/json`.
- The [OAuth 2.0 `WWW-Authenticate: Bearer error="insufficient_scope" scope="…"` header](https://www.rfc-editor.org/rfc/rfc6750#section-3) is emitted by the filter when the caller authenticated via bearer JWT, so compliant OAuth clients can observe it.
- Both `required` and `missing` are sorted alphabetically for deterministic diffs in logs / tests.
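
The relationship between `required` and `missing` is a set difference followed by a sort. The sketch below illustrates that relationship only; it is written in Python for brevity and is not the code of the Kotlin mapper linked above. The function name `insufficient_scope_details` and the `held` parameter are assumptions made for this example.

```python
def insufficient_scope_details(required, held):
    """Build the `details` object: both lists sorted alphabetically.

    `required` is the scope set the endpoint declares, `held` is the
    caller's effective scope set. Illustrative only.
    """
    required_set = set(required)
    return {
        "required": sorted(required_set),
        "missing": sorted(required_set - set(held)),
    }


# A caller holding keys.read (plus an unrelated scope) hits an endpoint
# that requires both keys.read and keys.write.
details = insufficient_scope_details(
    required=["keys.write", "keys.read"],
    held=["keys.read", "translations.write"],
)
print(details)
```

Sorting both lists means two responses for the same request compare equal byte for byte, which is what makes the output diff-friendly in logs and tests.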
### Validation — 400 / 422

| Code | Meaning | Typical `details` | Introduced |
|---|---|---|---|
| `VALIDATION_FAILED` | Field-level validation errors on the request body | `{ "fields": [{ "path": "body.email", "code": "REQUIRED" }, …] }` | Phase 1 |
| `PAGE_TOO_LARGE` | `limit` exceeds 200 | `{ "limit": 500, "max": 200 }` | Phase 1 |
| `INVALID_SORT_FIELD` | `sort=field` references a non-whitelisted field | `{ "field": "createdAt" }` | Phase 1 |
| `ICU_MESSAGE_INVALID` | Semantic validation failure from `icu4j` (422) | `{ "position": 17, "reason": "Expected '}'" }` | Phase 2 |
| `MALFORMED_JSON` | Request body isn't valid JSON (400) | — | Phase 1 |

### Resource state — 404 / 409 / 410

| Code | Meaning | Typical `details` |
|---|---|---|
| `NOT_FOUND` | Resource not found, **or** auth prevents disclosing existence | — |
| `KEY_NAME_TAKEN` | Unique constraint: key name already exists in this namespace | `{ "keyName": "…" , "namespaceId": "…" }` |
| `ORG_SLUG_TAKEN` | Org slug already used globally | `{ "slug": "…" }` |
| `PROJECT_SLUG_TAKEN` | Project slug already used in this org | `{ "slug": "…", "orgId": "…" }` |
| `VERSION_CONFLICT` | Optimistic-locking: stored version differs from submitted | `{ "expected": 5, "actual": 7 }` |
| `GONE` | Resource soft-deleted and past retention (410) | — |

### Rate-limiting — 429

| Code | Meaning | Typical `details` |
|---|---|---|
| `RATE_LIMIT_EXCEEDED` | Per-token sliding-window cap hit | `{ "limit": 120, "windowSeconds": 60, "retryAfterSeconds": 12 }` |

Every 429 response carries the [`Retry-After` header](rate-limits.md#retry-headers).
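
A client that receives a 429 has two hints about how long to wait: the `Retry-After` header and `details.retryAfterSeconds`. The sketch below shows one way to combine them with an exponential-backoff fallback. The function name, the `base` and `cap` defaults, and the preference order are assumptions for illustration; only the delta-seconds form of `Retry-After` is handled, and `headers` is assumed to be a plain dict with exact-case keys.

```python
def retry_delay_seconds(headers, details, attempt, base=1.0, cap=60.0):
    """Pick a wait time (in seconds) before retrying a rate-limited request.

    Preference order (an assumption of this sketch):
      1. `Retry-After` header, delta-seconds form only
      2. `retryAfterSeconds` from the error envelope's `details`
      3. capped exponential backoff based on the attempt number
    """
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        return float(retry_after)
    if "retryAfterSeconds" in details:
        return float(details["retryAfterSeconds"])
    return min(cap, base * (2 ** attempt))


# Server told us exactly how long to wait.
print(retry_delay_seconds({"Retry-After": "12"}, {}, attempt=0))
# No hint at all: fall back to exponential backoff.
print(retry_delay_seconds({}, {}, attempt=3))
```

Honouring the server's hint first avoids retrying into a window that is known to be closed; the backoff branch only matters when neither hint is present.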
### Server / dependency — 500 / 503

| Code | Meaning | Typical `details` |
|---|---|---|
| `INTERNAL_ERROR` | Unhandled exception; always logs a stack trace with `traceId` | — |
| `DEPENDENCY_UNAVAILABLE` | Redis / S3 / Mailpit / DB unreachable | `{ "dependency": "redis" }` |

## Never-log rule

The uniform envelope **never** includes:

- `Authorization` header values, API keys, PATs, refresh tokens, password hashes, BYOK AI keys.
- Request bodies for authenticated endpoints.
- Webhook bodies.

See the steering rule in [`.kiro/steering/api-conventions.md`](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/api-conventions.md#logging).

## For SDK authors

- Expose `error.code` as a typed enum; tolerate unknown values (forward-compat).
- Surface `error.message` to humans untranslated; localise via the `code` using your own catalogue.
- Surface `error.details` in its structured form — don't try to flatten.
- On `RATE_LIMIT_EXCEEDED` / `DEPENDENCY_UNAVAILABLE`, implement exponential backoff; honour `Retry-After`.
- On `REFRESH_TOKEN_REUSED`, invalidate local state and force a re-login — this is a security signal, not a retryable error.

## Deprecation

Deprecated codes return with the body unchanged plus response headers:

```
Deprecation: true
Sunset: Sun, 01 Nov 2026 00:00:00 GMT
Link: ; rel="deprecation"
```

Announced in [`CHANGELOG.md`](https://github.com/Pratiyush/translately/blob/master/CHANGELOG.md) under `### Deprecated` at the release the deprecation lands, moved to `### Removed` at the sunset release.

---
title: Imports and exports (i18next JSON)
parent: API reference
nav_order: 5
permalink: /api/imports-and-exports.html
---

# Imports and exports — i18next JSON

Ships in Phase 3 (T301 + T302). One synchronous import endpoint, one synchronous export endpoint. Async + SSE progress streaming (T303) moved to Phase 4.

## POST /imports/json

Upsert translations into a project for a single language tag.
```
POST /api/v1/organizations/{orgSlug}/projects/{projectSlug}/imports/json
Authorization: Bearer  # or ApiKey / PAT with imports.write
Content-Type: application/json
```

### Request body

```json
{
  "languageTag": "en",
  "namespaceSlug": "default",
  "mode": "MERGE",
  "body": "{\"nav.signIn\":\"Sign in\"}"
}
```

- `languageTag` — BCP-47 tag. Required. Must be configured on the project (or the project has no configured languages).
- `namespaceSlug` — optional. Auto-created if it doesn't exist yet. Defaults to `default`.
- `mode` — `KEEP` / `OVERWRITE` / `MERGE`. Case-insensitive. Required.
- `body` — the raw i18next JSON as a string (so the server can auto-detect flat vs nested shape).

### Conflict modes

| Mode | Existing missing | Existing blank | Existing non-blank |
|---|---|---|---|
| `KEEP` | write | **skip** | **skip** |
| `OVERWRITE` | write | write | write |
| `MERGE` | write | write | **skip** |

Import is transactional — if any exception is raised, the whole call rolls back. Per-row ICU validation lands bad rows in the `errors[]` array without rolling back the clean rows.

### Response

```json
{ "total": 3, "created": 2, "updated": 0, "skipped": 1, "failed": 0, "errors": [] }
```

If one row has bad ICU:

```json
{
  "total": 2,
  "created": 1,
  "updated": 0,
  "skipped": 0,
  "failed": 1,
  "errors": [{"keyName":"broken","code":"INVALID_ICU_TEMPLATE","message":"Unmatched '{'..."}]
}
```

### Errors

- `400 VALIDATION_FAILED` — bad mode, empty body, malformed JSON, or unsupported shape (e.g. top-level array).
- `401 UNAUTHENTICATED`
- `404 NOT_FOUND` — project not found, caller not a member, or languageTag not configured.

## GET /exports/json

Download one language's translations as i18next JSON.
```
GET /api/v1/organizations/{orgSlug}/projects/{projectSlug}/exports/json
    ?languageTag=en
    &shape=FLAT
    [&namespaceSlug=default]
    [&tags=email,onboarding]   # keys must carry every listed tag
    [&minState=APPROVED]       # EMPTY < DRAFT < TRANSLATED < REVIEW < APPROVED
Authorization: Bearer  # or credential with exports.read
```

The response body is the JSON file itself. `Content-Disposition` carries a suggested filename (`{projectSlug}-{languageTag}-{shape}.json`), `X-Translately-Key-Count` carries the row count for progress bars.

### Errors

- `400 VALIDATION_FAILED` — missing `languageTag`, bad `shape`, bad `minState`.
- `401 UNAUTHENTICATED`
- `404 NOT_FOUND` — project not found or caller not a member.

## Scopes

| Endpoint | Scope |
|---|---|
| `POST /imports/json` | `imports.write` |
| `GET /exports/json` | `exports.read` |

See [scopes.md](scopes.md) for how scopes compose with roles.

---
title: Keys, namespaces, translations
parent: API reference
nav_order: 7
---

# Keys, namespaces, translations

Project-scoped CRUD for the localization data model. Every endpoint is `@Authenticated`; authorization runs in the service layer via `requireProjectAccess`, so non-members see `NOT_FOUND` (404) — the server never discloses whether a private project exists.

Introduced by T208 backend (closes nothing alone; paired with the T207+T208 webapp PR to close #48 + #49). All bodies are `application/json`.

Related: [scopes](scopes.md), [errors](errors.md), [organizations-and-projects](organizations-and-projects.md).

## Namespaces

`GET /api/v1/organizations/{orgSlug}/projects/{projectSlug}/namespaces` — list every namespace in the project.

```http
200 OK

{
  "data": [
    { "id": "01HT...", "slug": "web", "name": "Web", "description": null, "createdAt": "2026-04-19T10:00:00Z" }
  ]
}
```

`POST /api/v1/organizations/{orgSlug}/projects/{projectSlug}/namespaces` — create.
```http
{ "name": "Web", "slug": "web", "description": "Web app strings" }
```

- `slug` optional; if omitted it's derived from `name` (lowercase, non-alphanumeric → `-`).
- 409 `NAMESPACE_SLUG_TAKEN` if the slug collides inside the same project.
- 400 `VALIDATION_FAILED` if `name` is empty or >128 chars.

## Keys

`GET /api/v1/organizations/{orgSlug}/projects/{projectSlug}/keys` — list. Query params:

- `namespace=` — filter by namespace.
- `limit=` (default 50, max 200), `offset=` (default 0).

`POST /keys` — create.

```http
{ "namespaceSlug": "web", "keyName": "settings.save", "description": "Button label on the settings panel" }
```

Triggers an `Activity(actionType=CREATED)` row.

`GET /keys/{keyId}` — single key with translations + tags + recent activity.

`PATCH /keys/{keyId}` — rename, change description, change state, move namespace.

```http
{ "keyName": "settings.save.button", "state": "TRANSLATING" }
```

Writes either `UPDATED` or `STATE_CHANGED` Activity depending on which fields changed.

`DELETE /keys/{keyId}` — soft-delete (sets `softDeletedAt`). Idempotent. Writes `DELETED` Activity.

## Translations

`PUT /api/v1/organizations/{orgSlug}/projects/{projectSlug}/keys/{keyId}/translations/{languageTag}` — upsert the translation for a specific language.

```http
{ "value": "Save", "state": "DRAFT" }
```

- `languageTag` must match a configured `ProjectLanguage` on the project.
- `state` is optional; if omitted and `value` is non-empty, state flips to `DRAFT`. Explicit `APPROVED` requires reviewer scope (enforced at the resource).
- Writes `TRANSLATED` Activity.
- ICU validation of `value` is deferred — wire-up with T203 is a follow-up on this endpoint.
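
A minimal sketch of assembling the translation upsert request from its parts using only the Python standard library. The function name `build_translation_upsert`, the `https://api.example.com` base URL, and the slugs, key ID, and token in the usage lines are placeholders for illustration. The request is built but not sent, and `state` is left out of the body unless the caller passes one, so the server-side default described above applies.

```python
import json
from urllib.parse import quote
from urllib.request import Request

PATH_TEMPLATE = (
    "/api/v1/organizations/{}/projects/{}/keys/{}/translations/{}"
)


def build_translation_upsert(base_url, org_slug, project_slug, key_id,
                             language_tag, value, token, state=None):
    """Build (but do not send) the PUT request for one translation."""
    # Percent-encode every path segment so unusual slugs cannot break the URL.
    segments = [quote(s, safe="") for s in
                (org_slug, project_slug, key_id, language_tag)]
    body = {"value": value}
    if state is not None:
        body["state"] = state  # omit to let the server pick the state
    return Request(
        base_url.rstrip("/") + PATH_TEMPLATE.format(*segments),
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; substitute real slugs, a real key ID, and a real token.
request = build_translation_upsert(
    "https://api.example.com", "acme", "marketing", "01HTEXAMPLEKEY",
    "en", "Save", token="example-token",
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; any HTTP client that accepts a method, URL, headers, and body works the same way.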
## Error codes specific to this surface

| Code | HTTP | When |
|---|---|---|
| `NOT_FOUND` | 404 | Target project / key / namespace doesn't exist, or caller is not a member |
| `VALIDATION_FAILED` | 400 | Missing required field, bad field length, unknown state enum value |
| `NAMESPACE_SLUG_TAKEN` | 409 | Namespace slug collision inside this project |
| `KEY_NAME_TAKEN` | 409 | `(namespace, keyName)` collision inside this project |
| `LANGUAGE_NOT_CONFIGURED` | 409 | `languageTag` on a translation upsert is not in the project's configured languages |
| `UNAUTHENTICATED` | 401 | No bearer credential on the request |

## What's NOT here yet

- **Tag resource.** Backend CRUD for tags lands with the webapp PR that needs it.
- **Search & filter.** Free-text + tag-intersection search is the dedicated FTS path — see T206 / #47 (the architecture page at `docs/architecture/search.md` lands alongside that PR).
- **Activity timeline endpoint.** The Activity rows are written but no `GET /keys/{keyId}/activity` is exposed yet; deferred with #46 post-MVP.

## Changelog

First shipped in [Unreleased] under T208 backend.

---
title: Organizations, projects, members
parent: API reference
nav_order: 6
---

# Organizations, projects, and members

CRUD surface that v0.1.0 ships so the webapp's org/project pages have something to call. Every endpoint is `@Authenticated`; authorization is scoped to **organization membership** — non-members get `NOT_FOUND` (404) so the server never discloses whether a private org exists.

Introduced by T118 (orgs UI) + T119 (projects UI + member management) + the matching backend work. All bodies are `application/json`.

Related: [scopes](scopes.md), [errors](errors.md), [authentication endpoints](auth.md).

## Organizations

`GET /api/v1/organizations` — list every org the caller belongs to.
```http
200 OK

{
  "data": [
    { "id": "01HT...", "slug": "acme", "name": "Acme Corp", "callerRole": "OWNER", "createdAt": "2026-04-18T10:45:00Z" }
  ]
}
```

`POST /api/v1/organizations` — create a new org; the caller is added as OWNER.

```http
{ "name": "Acme Corp", "slug": "acme" }
```

- `slug` is optional; if omitted, we derive one from `name` (lowercase, non-alphanumeric → `-`, trimmed).
- 201 on success with the full body shown above.
- 409 `ORG_SLUG_TAKEN` if the slug is already in use — slugs are unique globally.
- 400 `VALIDATION_FAILED` if `name` is empty / >128 chars, or the derived slug is unusable.

`GET /api/v1/organizations/{orgSlug}` — single org (ULID or slug both accepted). 404 if you're not a member.

`PATCH /api/v1/organizations/{orgSlug}` — rename.

```http
{ "name": "Acme International" }
```

Returns the updated body. Other fields (billing, BYOK AI config) are not editable in v0.1.0.

## Members

`GET /api/v1/organizations/{orgSlug}/members` — list members. Any member can call.

```http
200 OK

{
  "data": [
    {
      "userId": "01HT...",
      "email": "alice@example.com",
      "fullName": "Alice Example",
      "role": "OWNER",
      "invitedAt": "2026-04-18T10:45:00Z",
      "joinedAt": "2026-04-18T10:45:00Z"
    }
  ]
}
```

`PATCH /api/v1/organizations/{orgSlug}/members/{userId}` — change a member's role. Caller must be OWNER or ADMIN.

```http
{ "role": "ADMIN" }
```

- 400 `VALIDATION_FAILED` (`body.role = INVALID`) if the role isn't one of `OWNER` / `ADMIN` / `MEMBER`.
- 409 `LAST_OWNER` if the change would leave the org with zero OWNERs.

`DELETE /api/v1/organizations/{orgSlug}/members/{userId}` — remove a member. Idempotent target (404 if they aren't a member).

- 409 `LAST_OWNER` if removing the target would leave the org with zero OWNERs.

### What about invites?

Explicitly **not in v0.1.0.** The invite-by-email + pending-acceptance lifecycle needs the token-email plumbing that SSO / SAML / LDAP (Phase 7) brings.
Until then, members grow through self-serve org creation — each user makes their own org and runs solo, or an existing member promotes them via the PATCH endpoint once their `sub` is known.

## Projects

`GET /api/v1/organizations/{orgSlug}/projects` — list every project in the org.

```http
200 OK

{
  "data": [
    {
      "id": "01HT...",
      "slug": "marketing",
      "name": "Marketing site",
      "description": "Website copy",
      "baseLanguageTag": "en",
      "createdAt": "2026-04-18T10:45:00Z"
    }
  ]
}
```

`POST /api/v1/organizations/{orgSlug}/projects` — create.

```http
{
  "name": "Marketing site",
  "slug": "marketing",
  "description": "Website copy",
  "baseLanguageTag": "en"
}
```

- `slug` optional; derived from `name` when absent.
- `description` optional.
- `baseLanguageTag` defaults to `"en"` when absent.
- 409 `PROJECT_SLUG_TAKEN` if the slug collides within the same org (slugs are unique per org, not globally).

`GET /api/v1/organizations/{orgSlug}/projects/{projectSlug}` — single project.

`PATCH /api/v1/organizations/{orgSlug}/projects/{projectSlug}` — rename / edit description.

```http
{ "name": "Marketing Site 2.0", "description": null }
```

Pass `null` / empty string on `description` to clear it. The `baseLanguageTag` is immutable in v0.1.0 (Phase 2 adds a migration path).

## Error responses

All endpoints use the [uniform error envelope](errors.md). The codes specific to this surface:

| Code | HTTP | When |
|---|---|---|
| `NOT_FOUND` | 404 | Target org / project / member doesn't exist, or caller is not a member of the target org |
| `VALIDATION_FAILED` | 400 | Name empty / too long, slug unparseable, role unknown |
| `ORG_SLUG_TAKEN` | 409 | Slug already in use globally |
| `PROJECT_SLUG_TAKEN` | 409 | Slug already in use inside this org |
| `LAST_OWNER` | 409 | Membership change would orphan the org |
| `UNAUTHENTICATED` | 401 | No JWT on the request |

## Changelog

First shipped in [v0.1.0](../../CHANGELOG.md) (Phase 1 close-out).

---
title: Rate limits
parent: API reference
nav_order: 3
---

# Rate limits

Translately rate-limits every request. Limits are **per-token** (JWT subject, API key, PAT, or anonymous IP) and enforced through a Redis sliding window. The goal is to keep self-hosted instances responsive under accidental loops and genuinely hostile traffic, not to nickel-and-dime legitimate callers.

Introduced by: the Phase 1 groundwork that lands alongside the auth endpoints; limits take effect as endpoints go live.

Related: [API conventions steering](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/api-conventions.md), [error codes](errors.md#rate-limiting--429), [scopes](scopes.md).

## Default limits

| Traffic class | Limit | Window |
|---|---|---|
| Authenticated read (`GET`, `HEAD`, `OPTIONS`) | **600 req/min** | 60 s sliding |
| Authenticated write (`POST`, `PUT`, `PATCH`, `DELETE`) | **120 req/min** | 60 s sliding |
| AI suggest (`POST /suggest-translation`, batch equivalents) | **60 req/min** | 60 s sliding + per-project monthly budget cap |
| Unauthenticated (signup, login, forgot-password, refresh) | **10 req/min per IP** | 60 s sliding |
| Webhook inbound (Phase 6, if enabled) | **60 req/min per webhook** | 60 s sliding |

Rate-limit state lives in Redis under the `rl:` key prefix, keyed by credential prefix + route group. Evictions run automatically via Redis TTLs; no application housekeeping required.

## Response headers

Every response (successful and rate-limited) carries:

```
X-RateLimit-Limit: 600
X-RateLimit-Remaining: 582
X-RateLimit-Reset: 17
```

- `X-RateLimit-Limit` — the current class's request cap.
- `X-RateLimit-Remaining` — requests still available in the current window.
- `X-RateLimit-Reset` — seconds until the window rolls forward.
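
The three headers above are enough for a client to pace itself before it ever receives a 429. The following is a minimal Python sketch, assuming the response headers arrive as a plain string mapping. `RateLimitState`, `parse_rate_limit`, and `suggested_delay` are hypothetical names used only for illustration, and the even-spacing policy is one possible choice, not something Translately prescribes.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitState:
    """Snapshot of the X-RateLimit-* headers from one response."""
    limit: int
    remaining: int
    reset_seconds: int


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitState(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_seconds=int(lowered["x-ratelimit-reset"]),
    )


def suggested_delay(state: RateLimitState) -> float:
    """Hypothetical pacing policy: spread the remaining budget evenly
    over the seconds left before the window rolls forward."""
    if state.remaining <= 0:
        # Budget exhausted: wait for the window to move.
        return float(state.reset_seconds)
    return state.reset_seconds / state.remaining
```

A caller could sleep for `suggested_delay(...)` seconds between requests in a bulk job; interactive clients would more likely ignore the value until `remaining` gets close to zero.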
## Retry headers

A rate-limited request returns HTTP `429 Too Many Requests` with the [error envelope](errors.md#rate-limiting--429):

```
HTTP/1.1 429 Too Many Requests
Retry-After: 12
Content-Type: application/json

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded.",
    "details": { "limit": 120, "windowSeconds": 60, "retryAfterSeconds": 12 }
  }
}
```

`Retry-After` is integer seconds, never a date (keeps parsing simple for CLIs).

## For SDKs and callers

- **Respect `Retry-After`.** Don't back off blindly; the server already tells you when the next slot opens.
- **Use a single persistent client per credential.** Short-lived curl scripts that reconnect per request hit the unauthenticated limit even when they don't mean to.
- **Batch where you can.** Write endpoints accept batch payloads (`POST /keys/bulk`, `POST /translations/bulk`) specifically so you don't have to burn a write slot per row.
- **Back off sensibly on 5xx.** A `429` is a rate-limit signal; a `500` / `503` means retry the request later with exponential backoff.

## Operator tuning

Each limit is a config property under `translately.rate-limit.*` in [`backend/app/src/main/resources/application.yml`](https://github.com/Pratiyush/translately/blob/master/backend/app/src/main/resources/application.yml). Override via env var:

```
TRANSLATELY_RATE_LIMIT_AUTHENTICATED_READ=1200
TRANSLATELY_RATE_LIMIT_AUTHENTICATED_WRITE=300
TRANSLATELY_RATE_LIMIT_AI_SUGGEST=120
TRANSLATELY_RATE_LIMIT_UNAUTHENTICATED=20
```

Set a limit to `0` to disable rate-limiting for that class (self-hosters on a trusted LAN sometimes want this).

Rate-limit metrics are exposed at `/q/metrics`:

- `http_server_requests_seconds_count{status="429"}` — rejected count per route group.
- `translately_rate_limit_remaining{class="…"}` — gauge of the **minimum** remaining budget across active keys per class.

## Why sliding window, not token bucket?
Token buckets are cheaper but give bursty callers a better experience than we want to advertise: a legitimate caller who paces their requests shouldn't be punished because someone else burst. The sliding window gives a fair view of the last minute at the cost of a handful of Redis commands per request — an acceptable trade on the authenticated path.

## BYOK AI budget interaction

AI Suggest requests consume two budgets:

1. **Rate limit** — 60 req/min per credential (this page).
2. **Per-project monthly cost cap** — set by `AI_CONFIG_WRITE`, enforced in service code before the provider call.

Hitting the budget cap returns `AI_BUDGET_EXCEEDED` (403) **before** any provider network call is made. The rate-limit and budget are independent; you can exhaust one without touching the other.

## See also

- Rate-limit headers on every response — not only on 429. Use them to self-pace proactively.
- [`docs/api/errors.md`](errors.md) for the full response shape.
- [Self-host hardening](../self-hosting/hardening.md) for the reverse-proxy configuration that caps payload size before it reaches Translately.

---
title: Scopes
parent: API reference
nav_order: 1
---

# Permission scopes

Every protected endpoint in the Translately API declares the scope(s) a caller must hold. Scopes are the atomic unit of authorization — API keys and PATs carry scope sets directly, user JWTs derive them from organization-role membership.

Introduced by: [T108](https://github.com/Pratiyush/translately/issues/133) (scope enum + `@RequiresScope` + filter), [T109](https://github.com/Pratiyush/translately/issues/27) (role → scope resolver).

Related: [authorization architecture](../architecture/authorization.md), [error codes](errors.md).

## Naming

Each scope is a dotted, lowercase token: `<resource>.<action>` where `<action>` is `read` or `write` (with one exception: `ai.suggest`).

- `write` **implies** `read` at resolver time: `keys.write` passes a `keys.read` check.
- Scopes are **stable across minor versions**.
Add new ones, deprecate old ones, remove one minor later. Never rename.
- On the wire, scopes serialize as a **space-separated string** (the same grammar OAuth 2.0 uses):
  - In JWTs — the `scope` claim (`"scope": "keys.read keys.write translations.write"`) plus a mirrored `groups` array for `@RolesAllowed` interop.
  - In API keys and PATs — the `api_keys.scopes` / `personal_access_tokens.scopes` `VARCHAR(512)` column.

## Catalogue

The full 31-token catalogue is defined in [`io.translately.security.Scope`](https://github.com/Pratiyush/translately/blob/master/backend/security/src/main/kotlin/io/translately/security/Scope.kt). Tokens are grouped by domain:

### Organization + membership

| Token | Purpose | Introduced |
|---|---|---|
| `org.read` | Read organization metadata | Phase 1 |
| `org.write` | Rename / update organization | Phase 1 |
| `members.read` | List organization members | Phase 1 |
| `members.write` | Invite / remove members, change roles | Phase 1 |
| `api-keys.read` | List API keys (prefix + metadata only) | Phase 1 |
| `api-keys.write` | Mint / revoke API keys | Phase 1 |
| `audit.read` | Read audit log entries | Phase 7 |

### Project-wide

| Token | Purpose | Introduced |
|---|---|---|
| `projects.read` | List and read projects in an org | Phase 1 |
| `projects.write` | Create / archive projects | Phase 1 |
| `project-settings.write` | Rename / reconfigure / delete a project | Phase 1 |

### Keys + translations (Phase 2)

| Token | Purpose |
|---|---|
| `keys.read` | List keys, read metadata |
| `keys.write` | Create / edit / delete keys, namespaces, tags |
| `translations.read` | Read translation values |
| `translations.write` | Author / edit translations |

### Imports + exports (Phase 3)

| Token | Purpose |
|---|---|
| `imports.write` | Upload, preview, and run JSON imports |
| `exports.read` | Generate export bundles |

### AI / MT + TM (Phase 4)

| Token | Purpose |
|---|---|
| `ai.suggest` | Invoke AI-suggest on a key or batch (BYOK) |
| `ai-config.write` | Configure provider, model, key, budget |
| `tm.read` | Read translation-memory matches |
| `glossaries.read` | Read glossary entries |
| `glossaries.write` | Create / edit glossary entries |

### Screenshots (Phase 5)

| Token | Purpose |
|---|---|
| `screenshots.read` | Read screenshots pinned to keys |
| `screenshots.write` | Upload + pin screenshots |

### Webhooks + CDN (Phase 6)

| Token | Purpose |
|---|---|
| `webhooks.read` | Read webhook configs + delivery log |
| `webhooks.write` | Create / edit / disable webhooks |
| `cdn.read` | Read CDN bundle config + URLs |
| `cdn.write` | Configure CDN content |

### Tasks + branching (Phase 7)

| Token | Purpose |
|---|---|
| `tasks.read` | Read translation tasks |
| `tasks.write` | Create / assign / close tasks |
| `branches.read` | Read translation branches |
| `branches.write` | Create / merge / delete branches |

## Role → scope mapping

The three built-in organization roles map to curated scope sets via `ScopeResolver`:

| Role | Scope set |
|---|---|
| **OWNER** | every scope in the catalogue — new scopes default to OWNER so we never forget to grant them |
| **ADMIN** | OWNER minus `project-settings.write`, `ai-config.write`, `api-keys.write` (retains `audit.read`) |
| **MEMBER** | every `*.read` scope plus `keys.write`, `translations.write`, `imports.write`, `ai.suggest` |

Invariant: `OWNER ⊃ ADMIN ⊃ MEMBER`. Enforced by `OrgRoleScopesTest`. See [`docs/architecture/authorization.md`](../architecture/authorization.md) for the rationale behind the ADMIN exclusion list.

## How a scope is checked

1. The authenticator (JWT / API key / PAT) resolves the caller's full scope set into `SecurityScopes`.
2. [`ScopeAuthorizationFilter`](https://github.com/Pratiyush/translately/blob/master/backend/api/src/main/kotlin/io/translately/api/security/ScopeAuthorizationFilter.kt) reads the `@RequiresScope(...)` annotation on the target resource method.
3.
   If `SecurityScopes ⊇ required`, the request continues; otherwise the filter throws `InsufficientScopeException`.
4. [`InsufficientScopeExceptionMapper`](https://github.com/Pratiyush/translately/blob/master/backend/api/src/main/kotlin/io/translately/api/security/InsufficientScopeExceptionMapper.kt) serialises that to a 403 with the [`INSUFFICIENT_SCOPE`](errors.md#insufficient_scope) envelope.

Multiple scopes on `@RequiresScope(A, B)` are an **AND** — the caller must hold every listed scope. If you need OR semantics, document it explicitly in the endpoint and express the alternative in code; don't overload the annotation.

## Minting API keys and PATs with scopes

When a user mints an API key (T110), they pick a subset of the scopes **they currently hold** (intersected with the org role). A MEMBER cannot mint an API key carrying `api-keys.write` — they don't have it themselves. This intersection rule is enforced service-side, not UI-side; the UI hints but the server decides.

## Forward compatibility

- **Adding a scope.** New scope lands in the enum and on OWNER by default. ADMIN and MEMBER pick up read scopes automatically (any `*.read` joins MEMBER) and opt-in for writes.
- **Deprecating a scope.** Mark it deprecated in the enum + `CHANGELOG` under the release it lands. Keep it in responses for one minor version; remove under `### Removed` in the sunset release.
- **Unknown scopes in JWTs.** `Scope.parse` silently drops tokens it doesn't recognize — forward-compat for the case where an older server verifies a token minted by a newer one.

## OpenAPI surface

Every endpoint in [`openapi.json`](openapi.json) carries an `x-required-scopes` extension listing the scopes its handler annotated with `@RequiresScope`. Generated SDK clients lift this into their types so IDE completion can surface the requirement at call sites.
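
The checking rules on this page (space-separated wire format, `write` implies `read`, AND semantics for multiple required scopes, unknown tokens dropped on parse) can be modelled in a few lines. This is an illustrative Python sketch, not the Kotlin code in `Scope.kt` or `ScopeAuthorizationFilter`; the function names and the abbreviated `KNOWN_SCOPES` set are assumptions made for the example.

```python
from typing import FrozenSet, Iterable

# Abbreviated, illustrative subset of the catalogue (the real one has 31 tokens).
KNOWN_SCOPES: FrozenSet[str] = frozenset({
    "keys.read", "keys.write",
    "translations.read", "translations.write",
    "ai.suggest",
})


def parse_scopes(raw: str) -> FrozenSet[str]:
    """Parse the space-separated wire format, silently dropping unknown tokens."""
    return frozenset(tok for tok in raw.split() if tok in KNOWN_SCOPES)


def expand_implied(granted: Iterable[str]) -> FrozenSet[str]:
    """Add the `x.read` scope implied by every held `x.write` scope."""
    held = frozenset(granted)
    implied = {s[: -len("write")] + "read" for s in held if s.endswith(".write")}
    return held | implied


def is_authorized(granted: Iterable[str], required: Iterable[str]) -> bool:
    """AND semantics: every required scope must be held, directly or by implication."""
    return frozenset(required) <= expand_implied(granted)
```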

---
title: Versioning
parent: API reference
nav_order: 4
---

# API versioning

The Translately REST API is versioned in the **URL path**: every endpoint lives under `/api/v1/`. This page documents what is and isn't a breaking change, how deprecations roll out, and the client-compatibility contract.

Related: [API conventions steering](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/api-conventions.md), [error codes](errors.md), [changelog](https://github.com/Pratiyush/translately/blob/master/CHANGELOG.md).

## Contract

> **One live major version at a time, with a one-minor-version overlap when a `v2` ships.** Additive changes never bump the version. Breaking changes bump the path.

- **`v1` is the current surface.** Every Phase 0–7 ticket lands here. A hypothetical `v2` happens only when a change cannot be expressed without breaking a published shape — and even then, we prefer a differently-named endpoint inside `v1` over a whole-API bump.
- **When `v2` ships**, `v1` remains served, unchanged, for at least one minor-version overlap. Clients get a `Deprecation: true` header on `v1` responses from that release forward.
- **No `Accept` header version negotiation.** The path is authoritative. Clients don't have to invent plumbing to pass a custom media type.

## What counts as breaking

| Change | Breaking? |
|---|---|
| Add an optional request field | **No** |
| Add a required request field with a default | **No** (treat as additive) |
| Add a required request field without a default | **Yes** |
| Add a new response field | **No** — clients must tolerate unknown fields |
| Add a new enum value | **No** — clients must tolerate unknown values (see below) |
| Remove a response field | **Yes** |
| Rename a response field | **Yes** |
| Change a response field's type or semantics | **Yes** |
| Add a new status code (e.g.
202 where 200 was returned) | **Yes** |
| Tighten validation on an existing endpoint | **Yes** |
| Add a new endpoint | **No** |
| Remove an endpoint | **Yes** |
| Rename an endpoint path | **Yes** (serve the old path for the deprecation window) |
| Add a new `error.code` value | **No** — clients must tolerate unknowns |
| Remove / rename an `error.code` value | **Yes** |

Additive enum and error-code values are *never* breaking. This is a hard contract — SDKs and the webapp are written with `default` branches and "unknown" mappings.

## Deprecation workflow

When a field, endpoint, or error code is on its way out:

1. **Mark deprecated in the source** — field-level `@Deprecated` / JSDoc comment on the SDK, matching entry in `CHANGELOG.md` under `### Deprecated` in the release the deprecation lands.
2. **Ship the [Deprecation / Sunset headers](https://www.rfc-editor.org/rfc/rfc8594)** on every affected response:

   ```
   Deprecation: true
   Sunset: Sun, 01 Nov 2026 00:00:00 GMT
   Link: ; rel="deprecation"
   ```

3. **Cross-link from the docs.** The migration path lives under [`docs/migration/`](https://github.com/Pratiyush/translately/blob/master/docs) when cross-version migration is needed.
4. **Sunset at the announced release.** Move the CHANGELOG entry from `### Deprecated` to `### Removed`.

Minimum deprecation window: **one minor version**. Longer for anything SDK consumers are likely to hit.

## Client forward-compatibility rules

- **Unknown response fields** — ignore silently. Do not fail on extra keys.
- **Unknown enum values** — map to a `"unknown"` / `"other"` sentinel in the client; do not throw. The webapp renders unknowns as "Unsupported value — update the app".
- **Unknown `error.code`** — log with the `code` string, surface the `message` to the user, treat as a generic failure. Don't branch on `code` without a default.
- **Unknown status codes** — treat `2xx` as success, `4xx` as client error, `5xx` as server error, with appropriate fallback messaging.
- **Missing optional fields** — treat as `null` / absent rather than the default of some prior version. Don't synthesize values you didn't receive.

These rules also apply to SDK regeneration: when the committed [`openapi.json`](openapi.json) gains a field, SDK callers pick it up automatically; losing a field is the breaking-change path above.

## OpenAPI compatibility

- The source of truth is Quarkus's Smallrye OpenAPI generation at build time.
- [`docs/api/openapi.json`](openapi.json) is committed and regenerated on every API change (T113).
- Every resource method must carry `@Operation`, `@APIResponses`, and request / response schemas — CI fails the build if any is missing.
- Client SDKs (`@translately/js` in Phase 5+) are generated from the committed `openapi.json` via `openapi-typescript`; regenerating the SDK as part of the API PR means types and runtime can't drift.

## Version in response bodies

Responses do **not** carry an explicit `apiVersion` field — the version is in the URL. Adding one would only be useful for negotiating between `v1` and `v2` at the payload level, which the path-versioning strategy explicitly avoids.

## Why path-versioning, not header?

Header-based versioning (`Accept: application/vnd.translately.v1+json`) is elegant in theory and a nightmare in practice:

- Browser tabs and `curl` commands can't hit it without plumbing.
- CDN caching layers ignore custom `Accept` values by default.
- Observability tools (log aggregators, Grafana, Sentry) don't see the version dimension.
- Clients get the version wrong more often than they get the path wrong.

Path-versioning keeps a curl one-liner copy-pasteable and makes the `v1` → `v2` cutover visible to everyone reading the logs. We optimize for operational clarity.

## See also

- [`CHANGELOG.md`](https://github.com/Pratiyush/translately/blob/master/CHANGELOG.md) — every release documents its Added / Changed / Deprecated / Removed / Fixed / Security sections.
- [Error catalogue](errors.md) — the stable-across-versions contract.
- [`.kiro/steering/api-conventions.md`](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/api-conventions.md) — the authoritative steering version of this page.

---
title: Architecture
nav_order: 4
has_children: true
permalink: /architecture/
---

# Architecture docs

High-level architecture of the Translately platform. The canonical steering doc is [`.kiro/steering/architecture.md`](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/architecture.md); this tree expands on it with diagrams, module maps, and Architecture Decision Records (ADRs).

Per [CLAUDE.md rule #10](https://github.com/Pratiyush/translately/blob/master/CLAUDE.md), every PR that introduces a non-trivial technical decision lands an ADR here in the same PR.

## Overview

- [Module map](modules.md) — Gradle module graph, dependencies, ownership.
- [Data model](data-model.md) — ER diagram, entity relationships, indexing.
- [Request lifecycle](request-lifecycle.md) — from HTTP → filter stack → controller → service → data layer.
- [Multi-tenancy](multi-tenancy.md) — `TenantContext`, `TenantRequestFilter`, row-level isolation.
- [Authentication](auth.md) — JWT vs. OIDC vs. LDAP, token formats, refresh rotation.
- [Authorization](authorization.md) — scopes, roles, `@RequiresScope`, per-resource permissions.
- [Crypto](crypto.md) — envelope encryption for BYOK secrets, key rotation, at-rest protections.
- [Search](search.md) — Postgres FTS (`tsvector` + `pg_trgm`) over keys and translations; no Elasticsearch in v1.
- [Webapp shell](webapp.md) — route tree, state stores, TanStack Query boundaries.
- [CI pipelines](ci.md) — the six GitHub Actions workflows, branch-protection gates, and how tagged releases become signed images.
- [ICU validation](icu-validation.md) — MessageFormat parse/validate contract, what's checked and not checked, consumers.
## ADRs

Architecture Decision Records live under [`decisions/`](decisions/). Every non-trivial technical choice (library swap, auth strategy, storage layout, algorithm, performance trade-off) gets an ADR. They're immutable once accepted — supersede rather than edit.

- [ADR index](decisions/README.md)
- [ADR template (MADR)](decisions/_template.md)

## Diagram conventions

- **Mermaid** in `.md` files — renders natively on GitHub + Pages.
- **Exported PNGs** under `diagrams/` for the llms-full.txt ingestion (text-only LLMs can't parse `mermaid` blocks).
- **Source alongside rendered** — keep the `.mmd` next to the `.png` so diagrams are reproducible.

---
title: Authentication architecture
parent: Architecture
nav_order: 5
---

# Authentication architecture

This page documents how Translately authenticates callers — the JWT format, refresh-token rotation, API-key and PAT validation, and the module boundaries that keep these pieces cleanly separable.

Introduced by: [T104](https://github.com/Pratiyush/translately/issues/131) (`JwtIssuer` + `JwtAuthentication`), [T105](https://github.com/Pratiyush/translately/issues/132) (`PasswordHasher` + `TokenGenerator`), T110 (API keys + PATs), [T110-enforce](https://github.com/Pratiyush/translately/issues/149) (API-key + PAT authenticator filters).

Related docs: [data-model](data-model.md), [authorization](authorization.md), [multi-tenancy](multi-tenancy.md).
## Credential types

| Credential | Header format | Issuer | Verifier | Scope source | Identifies | TTL |
|---|---|---|---|---|---|---|
| **Access JWT** | `Authorization: Bearer <jwt>` | `JwtIssuer` | Smallrye JWT → `JwtSecurityScopesFilter` | JWT `scope` + `groups` claims | a user | ~15 min |
| **Refresh JWT** | body / cookie at `/auth/refresh` only | `JwtIssuer` | `RefreshTokenParser` | — (refresh never bears scopes) | a user's session | ~30 days |
| **API key** | `Authorization: ApiKey <prefix>.<secret>` | `ApiKeyService` (project-scoped) | `ApiKeyAuthenticator` | `api_keys.scopes` verbatim | a project | until revoked / expired |
| **Personal Access Token** | `Authorization: Bearer tr_pat_<id>.<secret>` | `PatService` (user-scoped) | `PatAuthenticator` | `personal_access_tokens.scopes` ∩ owner's current effective scopes | a user | until revoked / expired |
| **OIDC token** | `Authorization: Bearer <token>` | Keycloak (Phase 7) | Quarkus OIDC | IdP groups → scopes | a user | IdP-defined |
| **LDAP bind** | basic auth | — | Quarkus Elytron LDAP | directory groups → scopes | a user | session-scoped |

Exactly one authenticator populates scopes per request. The filter chain tries each authenticator in `Priorities.AUTHENTICATION` order and the first that matches the header shape handles it; the rest return early. A request with two credentials on the same `Authorization` header is impossible (HTTP allows only one); presenting both a bearer token *and* an API-key header (say via `x-api-key`) is rejected at the HTTP layer — we never merge grants across credentials.

## JWT format

Translately uses **Smallrye JWT** with RS256 (RSA with SHA-256) signing. The key pair is configured via `translately.jwt.sign-key.private` and `translately.jwt.verify-key.public` (the Quarkus defaults); operators rotate by deploying a new key pair and keeping the old public key in the verifier for one refresh-TTL window.
### Access token

| Claim | Type | Meaning |
|---|---|---|
| `iss` | string | `translately.jwt.issuer`, default `translately` |
| `aud` | string | `translately.jwt.audience`, default `translately-webapp` |
| `sub` | string | User `external_id` (ULID) |
| `upn` | string | User email (the User Principal Name) |
| `scope` | string | Space-separated scope tokens — same grammar as `Scope.serialize` |
| `groups` | string[] | Same scope tokens as a JSON array — Smallrye uses this for `@RolesAllowed` interop |
| `orgs` | object[] | `[{id, slug, role}, …]` — one entry per org the user belongs to |
| `typ` | string | `"access"` |
| `iat` / `exp` | int | Issued-at and expiry (epoch seconds). Default TTL: **15 minutes** (`translately.jwt.access-ttl = PT15M`) |

Tokens are compact-serialized; the `orgs` claim means the webapp rarely needs a second round-trip to resolve membership when deciding what to render.

### Refresh token

Minimal claim set — everything the server needs to validate one request and then rotate:

| Claim | Type | Meaning |
|---|---|---|
| `iss`, `aud`, `sub` | string | Same as access |
| `jti` | string | Cryptographically-random ULID-like token (24-byte base32); single-use |
| `typ` | string | `"refresh"` |
| `iat` / `exp` | int | Default TTL: **30 days** (`translately.jwt.refresh-ttl = P30D`) |

### Rotation and replay protection (T103)

`/api/v1/auth/refresh` performs an atomic rotation:

1. Verify the inbound refresh JWT (signature, issuer, audience, `typ=refresh`, `exp`).
2. Look up `jti` in the `refresh_tokens` ledger (table introduced by T103's `V2` migration).
3. If the row is already marked `consumed_at IS NOT NULL` → **replay attempt.** Return `REFRESH_TOKEN_REUSED` and invalidate every refresh-token row linked to the same user (forces all sessions to re-login). This is the "what if an attacker has cloned my refresh token?" answer.
4. Otherwise: stamp `consumed_at = NOW()`, mint a fresh access+refresh pair, record the new `jti`.
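
Steps 2 to 4 above can be modelled with an in-memory ledger. The sketch below is a simplified Python illustration, not the service code: `RefreshLedger`, `LedgerRow`, and `RefreshTokenReused` are hypothetical names, JWT verification (step 1) is left out, invalidation is modelled as deleting the user's rows, and a real implementation would run the lookup and the `consumed_at` stamp inside one database transaction to keep the rotation atomic.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


class RefreshTokenReused(Exception):
    """Raised when an already-consumed jti is presented again."""


@dataclass
class LedgerRow:
    user_id: str
    consumed_at: Optional[datetime] = None


class RefreshLedger:
    def __init__(self) -> None:
        self._rows: Dict[str, LedgerRow] = {}

    def issue(self, user_id: str) -> str:
        """Record a fresh single-use jti for the user and return it."""
        jti = secrets.token_urlsafe(24)
        self._rows[jti] = LedgerRow(user_id)
        return jti

    def rotate(self, jti: str) -> str:
        """Consume `jti` and return its replacement; detect replays."""
        row = self._rows.get(jti)
        if row is None:
            raise KeyError("unknown jti")
        if row.consumed_at is not None:
            # Replay attempt: drop every row for this user, forcing re-login.
            self._rows = {
                k: r for k, r in self._rows.items() if r.user_id != row.user_id
            }
            raise RefreshTokenReused(row.user_id)
        row.consumed_at = datetime.now(timezone.utc)
        return self.issue(row.user_id)

    def active_count(self, user_id: str) -> int:
        """Number of unconsumed refresh tokens the user still holds."""
        return sum(
            1 for r in self._rows.values()
            if r.user_id == user_id and r.consumed_at is None
        )
```

Presenting the first token twice raises `RefreshTokenReused` and leaves the user with no usable refresh tokens, which is the behaviour step 3 describes.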
The ledger is the only DB read on the hot path; it carries a `(jti UNIQUE, consumed_at, user_id, expires_at)` shape.

### Bearer-credential split

`JwtSecurityScopesFilter` deliberately rejects refresh tokens on regular endpoints — refresh tokens are only valid at the `/api/v1/auth/refresh` controller. This prevents a stolen refresh token from being used to read data directly; it also means the refresh TTL can be longer than the access TTL without compromising the API surface.

### Claim reading

`JwtSecurityScopesFilter` reads the `typ` and `scope` claims with a **typed `String` generic** — `token.getClaim<String>(name)` rather than `token.getClaim(name)?.toString()`. Smallrye stores JSON string claims as `jakarta.json.JsonString` internally, and `JsonString.toString()` returns the quoted JSON literal (for example `"access"` with the quote characters included) rather than the underlying value. Reading through the typed generic makes Smallrye unwrap the claim via its internal converter and hand back the raw `String`. The filter additionally strips a leading/trailing `"` pair from the result as a belt-and-braces guard against any code path that still surfaces a quoted value. Root cause for [issue #151](https://github.com/Pratiyush/translately/issues/151); regression guarded by `JwtSecurityScopesFilterIT` (`:backend:app`).

## API key + PAT authentication

Both credential types follow the same shape and Argon2id-hash their secrets:

- **Prefix:** a stable `tr_<kind>_<id>` string (`tr_ak_…` for API keys, `tr_pat_…` for PATs). Stored in the `prefix` column of `api_keys` / `personal_access_tokens` with a unique index so lookup is O(1). The prefix is safe to display — it never encodes any part of the secret.
- **Secret:** 32 random bytes, base64url-encoded without padding (43 chars), **shown exactly once** at mint time. The DB stores `Argon2id(secret)` only.
- **Separator:** a single `.` between the prefix and secret on the wire.
The separator lets us split the two halves cleanly even when the base64url-encoded secret contains `_` or `-`.

On request:

1. Parse the `Authorization` header. `ApiKey <prefix>.<secret>` routes to `ApiKeyAuthenticator`; `Bearer tr_pat_…` routes to `PatAuthenticator`; any other `Bearer` payload routes to `JwtSecurityScopesFilter` via the smallrye-jwt auth layer.
2. `SELECT … FROM api_keys WHERE prefix = ?` (or `personal_access_tokens` for a PAT) — unique index, O(1) lookup.
3. Compare `Argon2.verify(secret, row.secret_hash)`. A prefix miss and a bad secret collapse into the same 401 `UNAUTHENTICATED` response so attackers can't probe the prefix space.
4. If `revoked_at IS NOT NULL` → 401 `CREDENTIAL_REVOKED`. If `expires_at < now()` → 401 `CREDENTIAL_EXPIRED`.
5. On success, record `last_used_at = NOW()` (synchronous for v0.1.0 — a follow-up moves this to a Quartz-backed batch update).

Argon2id parameters (see [ADR 0001](decisions/0001-argon2id-password-hashing.md)): `iterations=3`, `memory=64 MiB`, `parallelism=4`. Same settings for user passwords.

### Scope handling

- **API key.** `api_keys.scopes` (space-separated tokens) is pushed into [`SecurityScopes`](../api/scopes.md) verbatim. The minting admin already enforced that the requested scopes were a subset of their own; at request time we trust the row. The owning organization's slug is also bound into `TenantContext` so multi-tenant filters see the right tenant when the URL path doesn't provide one.
- **PAT.** `personal_access_tokens.scopes` is **intersected with the owning user's current effective scope set** before being granted. The user's scope set is computed from their `OrganizationMember` rows via `OrgRoleScopes`, so a PAT minted while the user was ADMIN of org X, whose role has since been demoted to MEMBER, can only exercise MEMBER-level scopes going forward. This intersection runs on every request — the stored `scopes` column is an upper bound, never a grant.
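
Two details from this section lend themselves to a short illustration: splitting a presented credential at the `.` separator, and treating a PAT's stored scopes as an upper bound. The Python sketch below is a simplified model under stated assumptions, not the authenticator code; `split_credential` and `effective_pat_scopes` are hypothetical helper names, and the Argon2id verification, revocation, and expiry checks are omitted.

```python
from typing import FrozenSet, Iterable, Tuple


def split_credential(token: str) -> Tuple[str, str]:
    """Split `<prefix>.<secret>` at the first `.`.

    base64url secrets may contain `_` or `-` but never `.`, so the
    first dot is an unambiguous boundary (assuming the prefix itself
    contains no dot).
    """
    prefix, sep, secret = token.partition(".")
    if not sep or not prefix or not secret:
        raise ValueError("malformed credential")
    return prefix, secret


def effective_pat_scopes(stored: str, owner_scopes: Iterable[str]) -> FrozenSet[str]:
    """Intersect the PAT's stored scopes with the owner's current scopes.

    The stored column is only an upper bound: a scope the owner no
    longer holds is not granted, even if the PAT was minted with it.
    """
    return frozenset(stored.split()) & frozenset(owner_scopes)
```

For example, a PAT stored with `keys.write members.write` whose owner now holds only `keys.read` and `keys.write` ends up with `keys.write` alone.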
### Coexistence with the JWT mechanism

Quarkus's proactive authentication layer hands every `Authorization: Bearer <token>` token to the smallrye-jwt mechanism. Smallrye-jwt treats any bearer token that isn't a parseable JWT as an authentication failure and returns 401 before JAX-RS filters run. That would short-circuit both the `ApiKey` scheme (wrong scheme — smallrye-jwt doesn't claim it, but some downstream checks still expected an auth path) and the `Bearer tr_pat_…` PAT shape (wrong JWT shape).

`NonJwtBearerAuthMechanism` (priority 2000, higher than smallrye-jwt's default 1000) intercepts exactly those two header shapes and returns a placeholder authenticated identity. It does **no** real credential verification — that happens downstream in the JAX-RS filter where it can share `SecurityScopes` with the JWT path. The mechanism exists only so proactive auth doesn't 401 a legitimate API-key or PAT request before our filter sees it. For every other header shape it defers to smallrye-jwt.

`JwtSecurityScopesFilter` is tolerant of non-JWT principals: if the active `SecurityIdentity` was produced by `NonJwtBearerAuthMechanism`, accessing the injected `JsonWebToken` throws `IllegalStateException`, which the filter catches and treats as "no JWT scopes to contribute" — the API-key / PAT authenticator owns the scope grants in that request.
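
The header-shape routing described above comes down to a small classification step. The Python sketch below illustrates only that decision; `classify_authorization` and its return labels are hypothetical, the scheme comparison is exact-match as the headers are written on this page (real HTTP auth schemes are case-insensitive), and no credential is verified here.

```python
from typing import Optional


def classify_authorization(header: Optional[str]) -> str:
    """Return which authenticator would claim the given Authorization header."""
    if not header:
        return "none"
    scheme, _, payload = header.partition(" ")
    if scheme == "ApiKey":
        return "api-key"   # handled by the API-key authenticator
    if scheme == "Bearer" and payload.startswith("tr_pat_"):
        return "pat"       # handled by the PAT authenticator
    if scheme == "Bearer":
        return "jwt"       # everything else goes to the JWT path
    return "other"
```

Because the three shapes are mutually exclusive, at most one authenticator ever claims a request, which is what keeps grants from being merged across credentials.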
## Module layout

```
:backend:security
  jwt/          JwtIssuer, JwtClaims, JwtTokens           ← T104 issue #131
  password/     PasswordHasher, TokenGenerator            ← T105 issue #132
  crypto/       CryptoService (envelope)                  ← T112 issue #136
  tenant/       TenantContext                             ← T111 issue #135
  rbac/         OrgRole, OrgRoleScopes, ScopeResolver     ← T109 issue #134
  Scope.kt, SecurityScopes.kt, RequiresScope.kt           ← T108 issue #133

:backend:service
  credentials/  ApiKeyService, PatService                 ← T110 issue #28
                CredentialAuthenticator                   ← T110-enforce issue #149

:backend:api
  tenant/       TenantRequestFilter                       ← T111 issue #135
  security/     NonJwtBearerAuthMechanism                 ← T110-enforce issue #149
                JwtSecurityScopesFilter
                ApiKeyAuthenticator                       ← T110-enforce
                PatAuthenticator                          ← T110-enforce
                ScopeAuthorizationFilter                  ← T108
                InsufficientScopeException + mapper
```

## Request lifecycle

```
Request arrives
      │
      ▼
┌───────────────────────────────────────────┐
│ Quarkus proactive auth │
│ • NonJwtBearerAuthMechanism (priority 2000) │
│   · Authorization: ApiKey ... → placeholder identity, falls through │
│   · Authorization: Bearer tr_pat_... → placeholder identity, falls through │
│   · anything else → defer to JWTAuthMechanism │
│ • JWTAuthMechanism (priority 1000) │
│   · Authorization: Bearer <jwt> → parsed JsonWebToken in SecurityIdentity │
└───────────────────────────────────────────┘
      │
      ▼
┌───────────────────────────────────────────┐
│ JAX-RS request filters │
│ • TenantRequestFilter (AUTHENTICATION - 100) │
│   · parses /api/v1/organizations/<slug>/... into TenantContext │
│ • ApiKeyAuthenticator (AUTHENTICATION) — claims ApiKey header, populates scopes │
│ • PatAuthenticator (AUTHENTICATION) — claims Bearer tr_pat_ header, populates scopes │
│ • JwtSecurityScopesFilter (AUTHENTICATION) — reads JWT claims, populates scopes │
│ • TestScopeHeaderFilter (AUTHENTICATION, test-only) — X-Test-Scopes header │
│   · All four run at the same priority; each short-circuits on header-shape mismatch so │
│     at most one populates SecurityScopes per request. │
│ │ • ScopeAuthorizationFilter (AUTHORIZATION) — enforces @RequiresScope │ └───────────────────────────────────────────┘ │ ▼ Resource method runs (or 401 / 403 rendered) ``` ## Test coverage - `JwtIssuerIT` / `JwtAuthenticationIT` (in `:backend:app`) — round-trip signed JWTs against a running Quarkus instance; assert every claim field and every rejection path. - `JwtSecurityScopesFilterIT` (`:backend:app`) — regression for [issue #151](https://github.com/Pratiyush/translately/issues/151): mints a JWT via `JwtIssuer`, presents it on a `@RequiresScope` probe endpoint, and asserts the filter unwraps `JsonString` claims correctly so `SecurityScopes.granted` is populated. - `ApiKeyAuthenticatorIT` / `PatAuthenticatorIT` — mint real credentials via `ApiKeyService` / `PatService`, present them on a probe endpoint, and assert the full chain (parse → Argon2id verify → revocation / expiry check → scope grant → `@RequiresScope` enforce). Covers: happy path, revoked, expired, bad secret, unknown prefix, malformed token, other-scheme header ignored, cross-org PAT scope intersection (MEMBER cannot exercise ADMIN-level scope). - `PasswordHasherTest` (`:backend:security`) — verifies Argon2id parameter constants, round-trip hash+verify, wrong-password rejection, and malformed-hash graceful failure. - `CryptoServiceTest` — envelope layout, tamper detection, KEK-size validation. - `ScopeResolverTest` / `OrgRoleScopesTest` — role-to-scope mapping invariants. Integration tests run under `./gradlew :backend:app:test`; unit tests under `./gradlew :backend:security:test` and don't require Docker. --- title: Authorization parent: Architecture nav_order: 6 --- # Authorization — scopes and roles Translately uses a **scope-based authorization model** layered on top of coarse **organization roles**. Scopes are the atomic permission token; roles are the human-facing shorthand that maps to a curated scope set. 

Introduced by: [T108](https://github.com/Pratiyush/translately/issues/133) (scope enum + `@RequiresScope` + JAX-RS filter), [T109](https://github.com/Pratiyush/translately/issues/134) (role → scope resolver). Related docs: [auth architecture](auth.md), [API scopes reference](../api/scopes.md).

## Scope naming

Every scope is a dotted, lowercase token: a resource name, a dot, and an action, where the action is one of `{read, write}`, plus the special `ai.suggest`.

- `write` **implies** `read` at the resolver level — a caller with `keys.write` passes a `keys.read` check.
- **Never rename an existing token.** API keys, PATs, and customer-issued credentials embed these strings in storage. Add a new scope and deprecate the old one; remove one minor version later.

The full catalogue lives in [`io.translately.security.Scope`](https://github.com/Pratiyush/translately/blob/master/backend/security/src/main/kotlin/io/translately/security/Scope.kt) and is surfaced in the [API scopes reference](../api/scopes.md).

## Role → scope mapping

The three built-in organization roles are deliberately coarse. Finer-grained rules (per-project roles, API-key scope intersection, PAT restriction) compose on top.

| Role | Read scopes | Write scopes | Notes |
|---|---|---|---|
| **OWNER** | every `*.read` | every `*.write` + `ai.suggest` + `audit.read` | Founder / destructive rights. New scopes default to OWNER so we never forget to grant them. |
| **ADMIN** | every `*.read` | OWNER minus `project-settings.write`, `ai-config.write`, `api-keys.write` | Can manage members and projects but cannot rename/archive projects, rotate BYOK keys, or mint org API keys. Retains `audit.read` so admins cannot hide their own tracks by rotating their credentials. |
| **MEMBER** | every `*.read` | `keys.write`, `translations.write`, `imports.write`, `ai.suggest` | "Day-job" authoring set. Cannot administer the org or configure project-wide toggles. |

Invariant: `OWNER ⊃ ADMIN ⊃ MEMBER`. Asserted by `OrgRoleScopesTest` — must be preserved as scopes are added.
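
The role table and the `write`-implies-`read` rule can be sketched with plain string sets. This is a rough illustration, not the real `OrgRoleScopes` or `ScopeResolver`: the resource list is an assumed subset of the catalogue, `audit.read` is left out for brevity, and `covers` is a hypothetical helper.

```kotlin
// Illustrative subset only: the real catalogue lives in io.translately.security.Scope.
val resources = listOf(
    "keys", "translations", "imports", "projects",
    "project-settings", "ai-config", "api-keys",
)
val allRead: Set<String> = resources.map { "${it}.read" }.toSet()
val allWrite: Set<String> = resources.map { "${it}.write" }.toSet()

val ownerScopes: Set<String> = allRead + allWrite + "ai.suggest"

// ADMIN is OWNER minus the three levers that stay with OWNER.
val adminScopes: Set<String> =
    ownerScopes - setOf("project-settings.write", "ai-config.write", "api-keys.write")

val memberScopes: Set<String> =
    allRead + setOf("keys.write", "translations.write", "imports.write", "ai.suggest")

/** `write` implies `read`: holding `x.write` satisfies a check for `x.read`. */
fun covers(held: Set<String>, required: String): Boolean =
    required in held ||
        (required.endsWith(".read") && (required.removeSuffix(".read") + ".write") in held)
```

With these sets, `ownerScopes.containsAll(adminScopes)` and `adminScopes.containsAll(memberScopes)` both hold, which is the shape of the invariant the real test asserts.
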
### Why this specific ADMIN exclusion list?

Three levers stay with OWNER:

1. **`project-settings.write`** — rename / archive / delete a project. These are org-reshape actions.
2. **`ai-config.write`** — attach or rotate a BYOK AI provider. Owners control billing-adjacent decisions because BYOK keys have a cost.
3. **`api-keys.write`** — mint or revoke org-level API keys. Kept with OWNER so an ADMIN can't mint themselves a long-lived credential and drop it outside the audit horizon.

ADMIN explicitly keeps `audit.read` — separation so admins can investigate incidents without being able to cover up their own traces.

## Runtime resolution

```mermaid
sequenceDiagram
    participant Client
    participant Tenant as TenantRequestFilter
    participant Auth as JwtSecurityScopesFilter
    participant Authz as ScopeAuthorizationFilter
    participant Resource as @RequiresScope resource
    Client->>Tenant: request (+ Authorization header)
    Tenant->>Auth: request (TenantContext bound)
    Auth->>Auth: verify JWT / API key / PAT
    Auth->>Auth: ScopeResolver.resolveFromMemberships(...)
    Auth->>Authz: SecurityScopes = {KEYS_READ, KEYS_WRITE, ...}
    Authz->>Resource: required = method's @RequiresScope
    alt scopes cover required
        Authz->>Resource: invoke
        Resource-->>Client: 200
    else insufficient
        Authz-->>Client: 403 INSUFFICIENT_SCOPE
    end
```

The filters in code:

- [`TenantRequestFilter`](https://github.com/Pratiyush/translately/blob/master/backend/api/src/main/kotlin/io/translately/api/tenant/TenantRequestFilter.kt) — runs first. Extracts the tenant identifier from the URL path.
- `JwtSecurityScopesFilter` — runs at `Priorities.AUTHENTICATION`. Verifies the credential, hydrates `SecurityScopes` with the resolved scope set.
- [`ScopeAuthorizationFilter`](https://github.com/Pratiyush/translately/blob/master/backend/api/src/main/kotlin/io/translately/api/security/ScopeAuthorizationFilter.kt) — runs at `Priorities.AUTHORIZATION`.
  Reads `@RequiresScope` off the resource method; fails fast with `INSUFFICIENT_SCOPE` if the request doesn't cover it.

## `@RequiresScope` usage

```kotlin
@Path("/api/v1/organizations/{orgId}/projects")
class ProjectResource {
    @GET
    @RequiresScope(Scope.PROJECTS_READ)
    fun list(@PathParam("orgId") orgId: String): List<ProjectDto> = ...

    @POST
    @RequiresScope(Scope.PROJECTS_WRITE, Scope.PROJECT_SETTINGS_WRITE)
    fun create(body: CreateProjectRequest): ProjectDto = ...
}
```

Multiple scopes in `@RequiresScope` are an **AND** — the caller must hold all of them. Document the `OR` case explicitly in the resource if you need it; don't overload the annotation.

`@RequiresScope` is only valid on JAX-RS resource methods (or the class, in which case it applies to every method). See the [`RequiresScope.kt`](https://github.com/Pratiyush/translately/blob/master/backend/security/src/main/kotlin/io/translately/security/RequiresScope.kt) annotation and [`ScopeAuthorizationFilter.kt`](https://github.com/Pratiyush/translately/blob/master/backend/api/src/main/kotlin/io/translately/api/security/ScopeAuthorizationFilter.kt) enforcer.

## Error contract

On failure the filter throws `InsufficientScopeException`, caught by `InsufficientScopeExceptionMapper` and serialized as:

```json
{
  "error": {
    "code": "INSUFFICIENT_SCOPE",
    "message": "This endpoint requires scope(s): keys.write",
    "details": {
      "required": ["keys.write"],
      "held": ["keys.read"]
    }
  }
}
```

HTTP status: `403`. Stable across minor versions — CLI, SDK, and webapp all match on `error.code`.

## Where roles meet scopes

`ScopeResolver.canResolveFor(userId, memberships, orgId)`:

- `orgId = null` — "cross-org view" (e.g. `GET /organizations` to list orgs you can see). Returns the union of every role across every org.
- `orgId ≠ null` — filter memberships by that org, then union. A user with no membership in the org receives the empty set → every downstream `@RequiresScope` fails closed.

The resolver is intentionally stateless and pure — no DB, no cache.
The service layer loads memberships once (at JWT mint or per-request authentication) and passes them in. Cache invalidation is therefore not a problem this layer solves.

---
title: CI pipelines
parent: Architecture
nav_order: 10
---

# CI pipelines

Translately's GitHub Actions live under [`.github/workflows/`](https://github.com/Pratiyush/translately/tree/master/.github/workflows). Six workflows handle PR validation, security scanning, link health, docs deploy, and signed-tag releases. Together with the `master` branch-protection rules, they form the full quality gate between a PR and a signed release tag.

## Workflow map

```mermaid
flowchart LR
    subgraph PR/Push
        A[ci-backend.yml]
        B[ci-webapp.yml]
        C[codeql.yml]
        D[link-checker.yml]
        E[pages.yml]
    end
    subgraph Tag push
        F[release.yml]
    end
    PR[Open/update PR] --> A
    PR --> B
    PR --> C
    PR --> D
    MASTER[Push to master] --> A
    MASTER --> B
    MASTER --> C
    MASTER --> D
    MASTER --> E
    TAG[Push signed tag v*] --> F
    F --> GHCR[ghcr.io images]
    F --> REL[GitHub Release]
    SCHED[Weekly schedule] -.-> C
    SCHED -.-> D
```

## Branch protection (master)

The workflows sit alongside branch-protection rules on `master`. Both together are the gate. Current settings (verify in repo settings or `gh api repos/Pratiyush/translately/branches/master/protection`):

- **Required signed commits.** Unsigned commits are rejected — no `--no-verify`, ever. CLAUDE.md rule #1 is mirrored as server-side enforcement.
- **Required linear history.** No merge commits on `master`; PRs merge via squash or rebase.
- **Required CODEOWNERS review.** Every matching path in [`CODEOWNERS`](https://github.com/Pratiyush/translately/blob/master/CODEOWNERS) pulls `@Pratiyush` in as a required reviewer.
- **Dismiss stale reviews on new commits.** Re-review required after any push.
- **Required conversation resolution.** Every review thread must be resolved before merge.
- **No force-push, no branch deletion.** `master` stays append-only.


CI workflows below are listed as required status checks — a red check blocks the merge button.

## `ci-backend.yml` — backend build + test

[File](https://github.com/Pratiyush/translately/blob/master/.github/workflows/ci-backend.yml). Owns JVM build, lint, test, coverage for the Gradle multi-module backend.

**Triggers:** `push` to `master` and every `pull_request`, filtered to paths under `backend/**`, `buildSrc/**`, `gradle/**`, the top-level Gradle files, or this workflow itself. Unrelated docs-only PRs don't spin it up.

**Concurrency:** `ci-backend-${{ github.ref }}` with `cancel-in-progress: true` — a fresh push supersedes the previous run.

**Permissions:** `contents: read`.

**Key steps:**

1. `actions/checkout@v4`.
2. `actions/setup-java@v4` — Temurin JDK **21**.
3. `gradle/actions/setup-gradle@v4` with cache cleanup.
4. `./gradlew projects --no-daemon` — sanity-check the 13-module wiring before spending time on anything heavier.
5. `./gradlew ktlintCheck detekt --no-daemon --continue` — ktlint + detekt (`--continue` so both lint reports emit even if one fails).
6. `./gradlew build --no-daemon --stacktrace` — compile, test, Jacoco coverage, archive.
7. `./gradlew jacocoTestReport --no-daemon` (always) — regenerates the HTML report even when tests fail.
8. Upload **`backend-coverage`** and **`backend-test-reports`** artifacts (14-day retention).

The `checkOpenApiUpToDate` Gradle task (wired into `build` via `check` in [`buildSrc/.../quarkus-app`](https://github.com/Pratiyush/translately/blob/master/backend/app/build.gradle.kts)) fails the job if the committed `docs/api/openapi.json` drifts from what the backend emits — this is how API-spec docs stay in sync with code on every PR.

**Required secrets:** none.

## `ci-webapp.yml` — webapp lint + test + build

[File](https://github.com/Pratiyush/translately/blob/master/.github/workflows/ci-webapp.yml). Handles pnpm-based build for `webapp/`.


**Triggers:** push to `master` / any PR, filtered to `webapp/**`, `pnpm-workspace.yaml`, `package.json`, `pnpm-lock.yaml`, or this workflow file.

**Concurrency:** `ci-webapp-${{ github.ref }}` with cancel-in-progress.

**Guard job (`guard`):** checks for `webapp/package.json` and outputs `present=true|false`. The build job is skipped when the webapp hasn't been scaffolded yet — this was important in early Phase 0 before the webapp existed, and it's still the defensive pattern if the tree is ever reshaped.

**Build job key steps:**

1. Checkout.
2. `pnpm/action-setup@v4` + `actions/setup-node@v4` (Node **22**, pnpm cache enabled).
3. `pnpm install --frozen-lockfile` — lockfile drift fails the job.
4. `pnpm --filter @translately/webapp lint` — ESLint + Prettier.
5. `pnpm --filter @translately/webapp codegen:check` — regenerates `src/lib/api/types.gen.ts` from the committed `docs/api/openapi.json` and fails if the output drifts. Mirrors the backend's `checkOpenApiUpToDate` so the generated TypeScript client stays pinned to the spec.
6. `pnpm --filter @translately/webapp test -- --run` — Vitest.
7. `pnpm --filter @translately/webapp build` — Vite production bundle.
8. Upload `webapp-bundle` artifact (`webapp/dist/`, 14-day retention).

**Required secrets:** none.

## `codeql.yml` — SAST across three languages

[File](https://github.com/Pratiyush/translately/blob/master/.github/workflows/codeql.yml). GitHub CodeQL static analysis with the `security-and-quality` query pack.

**Triggers:**

- `push` to `master`.
- `pull_request` targeting `master`.
- Scheduled: every **Tuesday 03:17 UTC** (catches new CodeQL rules without waiting for the next PR).

**Permissions:** `actions: read`, `contents: read`, `security-events: write` (so findings upload to the Security tab).

**Detect job:** probes for the presence of any JS/TS source root (`webapp/`, `sdks/js/`, `sdks/react/`, `cli/`). Sets `has_jsts=true|false`.
CodeQL's JS/TS extractor errors out with "no source" if it's pointed at an empty tree, so we skip it until there's something to scan.

**Analyze matrix:**

| Language | Build mode | Notes |
|---|---|---|
| `java-kotlin` | `autobuild` | Uses the Gradle wrapper. Temurin JDK 21 set up before init. |
| `javascript-typescript` | `none` | Gated on `has_jsts=true`. No build step needed — parses source directly. |
| `actions` | `none` | Scans the workflow YAML itself for action misuse / pinning issues. |

Each matrix leg runs the standard `codeql-action/init@v3` → `codeql-action/analyze@v3` pair with `queries: security-and-quality`. Timeout: 30 minutes. `fail-fast: false` so one language failing doesn't mask the others.

**Required secrets:** none (uses the default `GITHUB_TOKEN`).

## `link-checker.yml` — lychee

[File](https://github.com/Pratiyush/translately/blob/master/.github/workflows/link-checker.yml). Runs [lychee](https://github.com/lycheeverse/lychee-action) over every Markdown file and the built `docs/**/*.html` to catch link rot.

**Triggers:**

- Push / PR on `**/*.md`, anything under `docs/`, or this workflow file.
- Scheduled: **every Monday 06:13 UTC**.

Config: [`lychee.toml`](https://github.com/Pratiyush/translately/blob/master/lychee.toml) at the repo root. Notable bits:

- `exclude_path = ["_reference"]` — skip the gitignored third-party mirror.
- A list of known-future URLs (release tags that 404 until their tag lands, the Pages site during first-deploy cache, the docs-bundle ZIP that's built inside `pages.yml` rather than committed) is excluded so link-rot remains actionable.
- `accept = [200, 202, 206, 301, 302, 303, 307, 308, 403, 429]` — common redirect/throttle codes count as healthy.

**Key steps:**

1. Checkout.
2. `lycheeverse/lychee-action@v2` with `--config lychee.toml './**/*.md' 'docs/**/*.html'`, `fail: true`.
3. Always upload the `lychee-report.md` artifact (14-day retention).


**Permissions:** `contents: read`, `issues: write` (some lychee integrations can auto-open issues; we don't currently wire that on).

**Required secrets:** none.

## `pages.yml` — GitHub Pages deploy (docs site)

[File](https://github.com/Pratiyush/translately/blob/master/.github/workflows/pages.yml). Builds the Jekyll site under `docs/` with the `just-the-docs` remote theme and publishes it to GitHub Pages.

**Triggers:**

- `push` to `master` on `docs/**` or this workflow file.
- `workflow_dispatch` — manual re-deploy without a content change.

**Permissions:** `contents: read`, `pages: write`, `id-token: write` (required by `actions/deploy-pages`).

**Concurrency:** `group: pages`, `cancel-in-progress: true` — a newer deploy cancels an in-flight one.

**Two jobs:**

1. **`build`** (runs on `ubuntu-latest`):
   - Checkout.
   - Build the **downloadable docs bundle**: `docs/downloads/translately-docs.zip` containing every raw `.md` / `.txt` / image under `docs/` so offline users can consume the full corpus without cloning. Built **before** Jekyll so it ends up inside `_site`.
   - `actions/configure-pages@v5`.
   - `actions/jekyll-build-pages@v1` with `source: ./docs`, `destination: ./_site`.
   - `actions/upload-pages-artifact@v3`.
2. **`deploy`** (depends on `build`):
   - `actions/deploy-pages@v4` → .
   - Environment: `github-pages` (enforces the Pages-specific OIDC token).

**Required secrets:** none — GitHub-managed Pages tokens only.

## `release.yml` — signed-tag pipeline

[File](https://github.com/Pratiyush/translately/blob/master/.github/workflows/release.yml). The big one. Fires only when a `v*` tag is pushed.

**Trigger:** `push` with `tags: ['v*']`. Per the phase gate in [CLAUDE.md rule #6](https://github.com/Pratiyush/translately/blob/master/CLAUDE.md), Phase N ends with a signed tag `v0.N.0`; `v1.0.0` is Phase 7.

**Permissions:** `contents: write` (for the Release), `packages: write` (GHCR push), `id-token: write` (Cosign keyless).


**Concurrency:** `release-${{ github.ref }}`, **not cancel-in-progress** — releases run to completion.

**Jobs:**

### `verify` — signed tag + version extraction

- Extracts `VERSION="${TAG#v}"`.
- Flags `is_prerelease=true` when `VERSION` starts with `0.` (all Phase 0–7 tags are pre-1.0).
- `git tag -v` checks GPG signature. Currently a soft warning on CI (the ephemeral runner may not have the pubkey imported) — branch protection on `master` is the real enforcement that unsigned commits / tags can't land.

### `build-backend`

- Temurin JDK 21 + Gradle.
- `./gradlew :backend:app:build -x test -x checkOpenApiUpToDate --no-daemon` — builds the Quarkus **fast-jar**, skipping `test` and `checkOpenApiUpToDate` because both depend on a boot-time Quarkus test-mode run that PR-time CI has already enforced.
- Uploads `backend-fastjar-${VERSION}` artifact (30-day retention).

### `build-webapp`

- Guard-checks for `webapp/package.json` first (mirrors `ci-webapp.yml`).
- pnpm install + `pnpm --filter @translately/webapp build`.
- Uploads `webapp-bundle-${VERSION}` artifact (30-day retention).

### `docker` — multi-arch image build + push + sign

Depends on `verify`, `build-backend`, `build-webapp`. Skipped for `0.0.1` bootstraps (`if: needs.verify.outputs.version != '0.0.1'`).

- `docker/setup-qemu-action@v3` + `docker/setup-buildx-action@v3`.
- Log in to `ghcr.io` with the job's `GITHUB_TOKEN`.
- **Backend image:** `docker/build-push-action@v6` with `infra/docker/backend.Dockerfile`, platforms `linux/amd64,linux/arm64`, `provenance: mode=max`, `sbom: true`.
- **Webapp image:** same, conditional on `hashFiles('webapp/package.json') != ''`.
- Tags pushed: `ghcr.io/pratiyush/translately-{backend,webapp}:${VERSION}` and `:latest`.
- OCI labels pinned: `image.source`, `image.revision`, `image.version`, `image.licenses=MIT`.
- `sigstore/cosign-installer@v3` + `cosign sign` (keyless, `COSIGN_YES=true`) — signatures verifiable from the image tag alone.

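
The version handling that the `verify` and `docker` jobs rely on can be sketched as follows. The workflow itself does this in shell; the Kotlin below is only an illustration, and `ReleaseInfo`, `releaseInfoFor`, and `imageTags` are hypothetical names that do not exist in the repository.

```kotlin
// Hypothetical sketch of the tag rules described above; the real logic lives in release.yml.
data class ReleaseInfo(
    val version: String,
    val isPrerelease: Boolean,
    val buildsImages: Boolean,
)

fun releaseInfoFor(tag: String): ReleaseInfo {
    // Mirrors the shell expansion that strips one leading "v" from the tag.
    val version = tag.removePrefix("v")
    return ReleaseInfo(
        version = version,
        // Any 0.x version is flagged as a pre-release.
        isPrerelease = version.startsWith("0."),
        // The docker job is skipped for the 0.0.1 bootstrap tag.
        buildsImages = version != "0.0.1",
    )
}

/** Image references pushed for one component ("backend" or "webapp"). */
fun imageTags(component: String, version: String): List<String> =
    listOf(version, "latest").map { "ghcr.io/pratiyush/translately-$component:$it" }
```

For example, `releaseInfoFor("v0.3.0")` yields version `0.3.0`, flagged as a pre-release, with images built.
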

**Required secrets:** none (GHCR uses `GITHUB_TOKEN`; Cosign is OIDC/keyless via `id-token: write`).

### `release` — GitHub Release creation

- `fetch-depth: 0` so the CHANGELOG awk scrape can read history.
- Extracts the `## [${VERSION}]` section from `CHANGELOG.md` with `awk`; falls back to a stub.
- Downloads the backend + webapp artifacts and `tar -czf`'s them into `translately-backend-${VERSION}.tar.gz` / `translately-webapp-${VERSION}.tar.gz`.
- `softprops/action-gh-release@v2` creates or updates the Release, attaches both tarballs, and flags pre-release based on the `verify` output.

## How the workflows compose

- **Every PR** runs `ci-backend`, `ci-webapp` (when webapp paths changed), `codeql`, and `link-checker` (for Markdown/docs changes). All four must be green; CODEOWNERS approval + signed commits are required; linear history is enforced on merge.
- **Push to `master`** re-runs the same four and additionally deploys `docs/` via `pages.yml` when the docs tree changes.
- **Signed `v*` tag push** triggers `release.yml` — builds fast-jar + webapp bundle, publishes signed multi-arch images to GHCR, and creates the GitHub Release with CHANGELOG-sourced notes.
- **Scheduled runs** — CodeQL weekly (Tue 03:17 UTC) catches new rule updates; lychee weekly (Mon 06:13 UTC) catches external link rot that a code-free week would miss.

For the contributor-facing workflow that sits on top of these pipelines (branch naming, commit conventions, pre-merge checklist), see [`.kiro/steering/contributing-rules.md`](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/contributing-rules.md) and the [pull-request template](https://github.com/Pratiyush/translately/blob/master/.github/PULL_REQUEST_TEMPLATE.md).

---
title: Crypto — envelope encryption
parent: Architecture
nav_order: 7
---

# Crypto — envelope encryption

Translately encrypts at-rest secrets using AES-256-GCM envelope encryption.
The only secret outside the database is the **Key Encryption Key (KEK)**, injected by the operator at boot via an environment variable. Every secret in the database gets its own **Data Encryption Key (DEK)**, and the DEK itself is stored encrypted alongside the ciphertext.

Introduced by: [T112](https://github.com/Pratiyush/translately/issues/136). Primary consumer: Phase 4 — per-project BYOK AI API keys (`projects.ai_api_key_encrypted`). Related: [data-model](data-model.md), [hardening](../self-hosting/hardening.md).

## Why envelope encryption?

A single "encrypt with the master key" scheme forces every row to share a key. Rotating that key requires re-encrypting every row (expensive) and one compromised ciphertext weakens the whole dataset. Envelope encryption fixes both:

- Each row gets its own DEK → a compromised ciphertext leaks exactly that one secret.
- Rotating the KEK means re-wrapping each DEK, not re-encrypting each payload — O(envelopes) with constant-size work per envelope.
- The KEK never leaves the JVM's memory; it's not in the database.

## Envelope layout

Implemented in [`io.translately.security.crypto.CryptoService`](https://github.com/Pratiyush/translately/blob/master/backend/security/src/main/kotlin/io/translately/security/crypto/CryptoService.kt). Every envelope is a single `bytea`:

| Offset | Bytes | Meaning |
|---|---|---|
| 0 | 1 | `version` — currently `0x01` |
| 1 | 12 | IV for DEK encryption (GCM) |
| 13 | 48 | `AES-GCM(KEK, IV_dek, DEK)` — 32-byte DEK + 16-byte auth tag |
| 61 | 12 | IV for data encryption (GCM) |
| 73 | N + 16 | `AES-GCM(DEK, IV_data, plaintext)` — N payload bytes + 16-byte auth tag |

Minimum envelope length (empty plaintext): **89 bytes**. Constants are exposed via `CryptoService.Companion` for test assertions.

## Guarantees

- **Confidentiality** — AES-256-GCM at both layers.
- **Integrity + authentication** — the 128-bit GCM tag catches any single-bit flip, reorder, or truncation.
  `decrypt` throws `AEADBadTagException` on tampering.
- **Non-determinism** — two calls to `encrypt(plaintext)` with the same input produce different envelopes (fresh DEK and IVs each time). Observers of the `bytea` column cannot deduplicate or compare ciphertexts.
- **Forward version compatibility** — the leading version byte lets us migrate to a different scheme later (key-wrapped, post-quantum, HSM-backed) without breaking old rows. `decrypt` rejects unsupported versions.

## Operator setup

One env var at boot:

```bash
TRANSLATELY_CRYPTO_MASTER_KEY=
```

Generate with:

```bash
openssl rand -base64 32
```

A wrong-size key fails fast in the CDI producer (`CryptoServiceProducer`) with a clear error — the app refuses to start rather than silently picking up a truncated key.

**Rotate** by:

1. Generate a new KEK.
2. Deploy with both keys available (`TRANSLATELY_CRYPTO_MASTER_KEY_OLD` + `TRANSLATELY_CRYPTO_MASTER_KEY`) — the migration CLI decrypts with old, re-encrypts with new.
3. After the migration finishes, remove `_OLD` and restart.

The rotation tool itself ships alongside Phase 4 once real envelopes exist; Phase 1 only lays the primitive down.

## Usage from service code

```kotlin
class ProjectAiService(
    private val crypto: CryptoService,
    private val projects: ProjectRepository,
) {
    fun storeKey(projectId: Long, apiKey: String) {
        val envelope = crypto.encrypt(apiKey)
        projects.updateAiKeyEncrypted(projectId, envelope)
    }

    fun loadKey(projectId: Long): String? {
        val envelope = projects.findAiKeyEncrypted(projectId) ?: return null
        return crypto.decryptString(envelope)
    }
}
```

The service is the only layer allowed to hold plaintext secrets in memory. JAX-RS resources, controllers, and every module above `:backend:service` receive an opaque `bytea` or a domain object that never exposes plaintext.

## Defensive details

- The in-memory DEK buffer is zero-filled immediately after use. The JVM may have copied it during `ByteBuffer.put`, but we clear the reference we own.
  This is belt-and-braces; AES-GCM already makes ciphertext-only attacks computationally infeasible.
- `CryptoService` is **stateless** and safe for concurrent use. The underlying `Cipher` is per-call (`Cipher.getInstance("AES/GCM/NoPadding")`); JCE cipher instances are not thread-safe and must not be cached across calls.
- The KEK is held as a `SecretKeySpec` built from a defensive `kekBytes.copyOf()` so the operator's caller can zero its own buffer.

## Testing

`CryptoServiceTest` in `:backend:security` asserts:

- Round-trip for empty, short, and large payloads.
- Non-determinism (two encrypts of the same plaintext differ).
- Tamper detection (flip any byte → `AEADBadTagException`).
- Wrong-KEK rejection.
- Version byte rejection for unknown versions.
- Invalid KEK length → `IllegalArgumentException` at construction.

No Quarkus or Docker is required — these are pure JCE unit tests.

## Not in scope

- **KMS integration** (AWS KMS, GCP KMS, Vault Transit) — Phase 7 will add a `KekProvider` port so self-hosters who have a KMS can delegate key material. The envelope format stays the same.
- **Field-level deterministic encryption** — we don't encrypt anything we'd need to index, so a non-deterministic scheme is strictly better.
- **Transport encryption** — handled at the reverse proxy. See [hardening](../self-hosting/hardening.md).

---
title: Data model
parent: Architecture
nav_order: 2
---

# Data model

Translately's persistence layer is PostgreSQL 16 with Hibernate ORM + Panache (blocking JDBC). Schema evolution is driven by Flyway, plain-SQL migrations under `backend/data/src/main/resources/db/migration/`. This page is the narrative partner of `V1__auth_and_orgs.sql` — start here for the **why** and jump to the migration for the **how**.

Introduced by: [T101](https://github.com/Pratiyush/translately/issues/129) · First migration: `V1__auth_and_orgs.sql`.

## Identifier strategy

Every durable entity carries **two** identifiers:

- `id BIGSERIAL PRIMARY KEY` — monotonic, internal only.
  Foreign keys reference this.
- `external_id CHAR(26) NOT NULL UNIQUE` — [ULID](https://github.com/ulid/spec), Crockford base32, lexicographically time-sortable. This is the only identifier that leaves the database: every URL, JSON payload, webhook event, and API-key prefix uses `external_id`.

**Why both.** Using a bigserial for FKs keeps indexes tiny and joins fast; exposing ULIDs on the wire avoids leaking row counts, dodges integer-enumeration attacks, and lets callers sort by ID as a coarse creation-time sort.

Generation lives in `io.translately.data.Ulid`; a Hibernate `@PrePersist` hook assigns it if the entity is persisted without one.

## Conventions

| Convention | Rule |
|---|---|
| Table names | plural snake_case (`users`, `project_languages`) |
| Timestamps | `created_at`, `updated_at`, optional `deleted_at` — all `TIMESTAMPTZ NOT NULL` (`deleted_at` nullable) |
| Soft delete | only where retention matters (users, organizations, projects). Everything else hard-deletes on cascade |
| FK naming | `fk__` |
| Unique constraints | `uk__` |
| Indexes | `idx__` |
| Booleans | avoided; prefer nullable `TIMESTAMPTZ` so we keep the "when" for free (`email_verified_at` vs. `email_verified`) |
| Enums | `VARCHAR(n)` + `CHECK` constraint — keeps migrations cheap and readable in psql |

## V1 entity-relationship diagram

```mermaid
erDiagram
    USERS ||--o{ ORGANIZATION_MEMBERS : "is_member_via"
    USERS ||--o{ PERSONAL_ACCESS_TOKENS : "owns"
    ORGANIZATIONS ||--o{ ORGANIZATION_MEMBERS : "has"
    ORGANIZATIONS ||--o{ PROJECTS : "owns"
    PROJECTS ||--o{ PROJECT_LANGUAGES : "supports"
    PROJECTS ||--o{ API_KEYS : "issues"
    USERS {
        bigserial id PK
        char external_id UK "26 — ULID"
        varchar email UK "254"
        timestamptz email_verified_at
        varchar password_hash "Argon2id — nullable for SSO"
        varchar full_name "128"
        varchar locale "default en"
        varchar timezone "default UTC"
        timestamptz deleted_at
    }
    ORGANIZATIONS {
        bigserial id PK
        char external_id UK "26 — ULID"
        varchar slug UK "64 — kebab-case"
        varchar name "128"
        timestamptz deleted_at
    }
    ORGANIZATION_MEMBERS {
        bigserial id PK
        char external_id UK "26 — ULID"
        bigint organization_id FK
        bigint user_id FK
        varchar role "OWNER ADMIN MEMBER"
        timestamptz invited_at
        timestamptz joined_at
    }
    PROJECTS {
        bigserial id PK
        char external_id UK "26 — ULID"
        bigint organization_id FK
        varchar slug "unique per org"
        varchar name "128"
        varchar description "1024"
        varchar base_language_tag "default en"
        varchar ai_provider "ANTHROPIC OPENAI OPENAI_COMPATIBLE"
        varchar ai_model
        varchar ai_base_url
        bytea ai_api_key_encrypted "envelope-encrypted T112"
        numeric ai_budget_cap_usd_monthly
        timestamptz deleted_at
    }
    PROJECT_LANGUAGES {
        bigserial id PK
        char external_id UK "26 — ULID"
        bigint project_id FK
        varchar language_tag "BCP-47"
        varchar name "display name"
        varchar direction "LTR or RTL"
    }
    API_KEYS {
        bigserial id PK
        char external_id UK "26 — ULID"
        bigint project_id FK
        varchar prefix UK "16 — shown once"
        varchar secret_hash "Argon2id"
        varchar name "128"
        varchar scopes "space-separated tokens"
        timestamptz expires_at
        timestamptz last_used_at
        timestamptz revoked_at
    }
    PERSONAL_ACCESS_TOKENS {
        bigserial id PK
        char external_id UK "26 — ULID"
        bigint user_id FK
        varchar prefix UK "16"
        varchar secret_hash
        varchar name
        varchar scopes
        timestamptz expires_at
        timestamptz last_used_at
        timestamptz revoked_at
    }
```

## Notable per-entity decisions

### `users`

- `password_hash` is nullable — SSO-only users (Phase 7) do not have a local password. Login attempts with email + password against a null hash always fail on the same code path as "wrong password", so there is no enumeration signal.
- `email_verified_at` gates anything that isn't signup / verify / login / password-reset. Resource filters check this when T103's email-verify ships.
- `locale` and `timezone` are stored so server-rendered emails (Qute templates) and audit exports can respect the user's preferences without another round-trip.

### `organizations`

- `slug` is unique globally. A cheap sanity check — URLs are shorter and sharable when the slug is unique.
- No explicit billing fields: the platform is open-source self-host; SaaS operators fork and add billing tables on top.

### `organization_members`

- `(organization_id, user_id)` is UNIQUE — a user has exactly one role per org at any time.
- `invited_at` / `joined_at` split lets the service layer represent a pending invite without a separate table. A row with `joined_at IS NULL` is an outstanding invite.
- Roles are enforced by `CHECK (role IN ('OWNER','ADMIN','MEMBER'))` so bad enum values can't land even from a rogue INSERT.

### `projects`

- The five `ai_*` columns are all nullable — a project with zero AI columns set is perfectly functional; Suggest simply isn't offered in the UI. This is the schema shape that enforces CLAUDE.md's BYOK-optional rule.
- `ai_api_key_encrypted BYTEA` stores the envelope produced by [`CryptoService`](crypto.md) — never the plaintext API key.
- `(organization_id, slug)` is UNIQUE — slugs can collide across orgs, just not inside one.
### `api_keys` and `personal_access_tokens` - Secrets are never stored. The 16-character `prefix` is shown once to the caller so we can disambiguate in the UI; the hash is Argon2id (see [auth architecture](auth.md)). - `scopes` is a denormalized space-separated token list. A key with no scopes (`''`) can still be authenticated but will fail every scope-authorization check — a defensive default. - `expires_at`, `last_used_at`, `revoked_at` together give the admin UI everything it needs to surface key health and mint / rotate confidently. ## Soft-delete policy Only `users`, `organizations`, and `projects` soft-delete. Rationale: retention regulations (GDPR deletion requests, SOC2 audit history) push in opposite directions; we keep the row long enough that an undo is cheap and the audit trail intact, then periodically hard-delete. Anything referenced via `FOREIGN KEY … ON DELETE CASCADE` hard-deletes automatically — that's the default for tokens, memberships, project-languages, api-keys, and PATs. A soft-deleted `users` row does *not* cascade: the service layer decides when to fully purge. ## V1 migration story `V1__auth_and_orgs.sql` is the first migration, shipped with T102. It lands seven tables, every FK, every unique constraint, every enum `CHECK`, and the hot-path indexes. ### Creation order Flyway applies the migration as a single transaction. Tables are created top-down so child FKs always have their parent present: 1. `users` — no FK. 2. `organizations` — no FK. 3. `organization_members` — FKs to `organizations`, `users`. 4. `projects` — FK to `organizations`. 5. `project_languages` — FK to `projects`. 6. `api_keys` — FK to `projects`. 7. `personal_access_tokens` — FK to `users`. Indexes and `CHECK` constraints are declared inline with each table. No data seeding — the first user is created through the signup endpoint (T103), never via SQL. 
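
The ordering rule above (every parent exists before its child) can be checked mechanically. The sketch below is illustrative only: `v1Parents` restates the FK list from this section, and `firstOrderViolation` is a hypothetical helper, not part of the migration tooling.

```kotlin
// Parent tables each V1 table references, in the documented creation order.
val v1Parents: Map<String, List<String>> = mapOf(
    "users" to emptyList(),
    "organizations" to emptyList(),
    "organization_members" to listOf("organizations", "users"),
    "projects" to listOf("organizations"),
    "project_languages" to listOf("projects"),
    "api_keys" to listOf("projects"),
    "personal_access_tokens" to listOf("users"),
)

// Returns the first table created before one of its parents, or null if the order is valid.
fun firstOrderViolation(order: List<String>, parents: Map<String, List<String>>): String? {
    val created = mutableSetOf<String>()
    for (table in order) {
        if (parents[table].orEmpty().any { it !in created }) return table
        created += table
    }
    return null
}
```

Because `mapOf` preserves insertion order, `firstOrderViolation(v1Parents.keys.toList(), v1Parents)` checks the documented order directly.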
### `ON DELETE CASCADE` coverage Every child FK cascades: | Child table | Parent | Cascade effect | |---|---|---| | `organization_members` | `organizations.id` | Org hard-delete removes every membership row. | | `organization_members` | `users.id` | User hard-delete removes every membership row the user held. | | `projects` | `organizations.id` | Org hard-delete removes every project. | | `project_languages` | `projects.id` | Project delete removes every language. | | `api_keys` | `projects.id` | Project delete revokes every key by deletion. | | `personal_access_tokens` | `users.id` | User hard-delete revokes every PAT. | Three top-level tables (`users`, `organizations`, `projects`) carry `deleted_at TIMESTAMPTZ` for soft-delete. A soft delete does **not** cascade — the row sticks around, referencing data stays intact. A later hard-delete (triggered when the service layer decides retention has expired) cascades through the FKs above. ### `CHECK` constraints Enum-like columns use `VARCHAR(n)` plus a `CHECK` constraint. Migrating the enum values later is a cheap `ALTER TABLE … DROP CONSTRAINT … ADD CONSTRAINT`. | Table | Column | Allowed values | Rationale | |---|---|---|---| | `organization_members` | `role` | `OWNER`, `ADMIN`, `MEMBER` | Matches `io.translately.data.entity.OrganizationRole`; mirrored in security's `OrgRole`. | | `project_languages` | `direction` | `LTR`, `RTL` | UI-relevant only; covers every language in CLDR `v45`. | | `projects` | `ai_provider` | `NULL`, `ANTHROPIC`, `OPENAI`, `OPENAI_COMPATIBLE` | BYOK providers; `NULL` means no AI wired (the default, per CLAUDE.md). | | `projects` | `ai_budget_cap_usd_monthly` | `NULL` or `>= 0` | Prevent accidental negative budgets that would bypass the cap. | Every CHECK has at least one integration-test assertion in `MigrationV1Test` that inserts a violating row and confirms the database rejects it — the constraint is the contract, the test is the proof. 
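
If the service layer wants to reject bad values before they reach Postgres, it could mirror the constraints in plain Kotlin. This is a sketch under that assumption: the function names are hypothetical, the allowed values are copied from the table above, and the database `CHECK` remains the actual contract.

```kotlin
import java.math.BigDecimal

// Allowed values copied from the CHECK-constraint table.
val allowedRoles = setOf("OWNER", "ADMIN", "MEMBER")
val allowedDirections = setOf("LTR", "RTL")
val allowedAiProviders = setOf("ANTHROPIC", "OPENAI", "OPENAI_COMPATIBLE")

fun isValidRole(role: String): Boolean = role in allowedRoles

fun isValidDirection(direction: String): Boolean = direction in allowedDirections

// NULL means "no AI wired", so it is accepted alongside the three providers.
fun isValidAiProvider(provider: String?): Boolean =
    provider == null || provider in allowedAiProviders

// NULL (no cap) or any non-negative amount.
fun isValidBudgetCap(cap: BigDecimal?): Boolean = cap == null || cap.signum() >= 0
```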
### Unique constraints

| Table | Columns | Why |
|---|---|---|
| all tables | `external_id` | ULID is the public identifier; collisions would leak via URLs |
| `users` | `email` | One account per email |
| `organizations` | `slug` | Slugs are global; URL-sharability |
| `organization_members` | `(organization_id, user_id)` | A user has one role per org |
| `projects` | `(organization_id, slug)` | Slugs collide freely across orgs; not within |
| `api_keys` | `prefix` | Lookups are by prefix; must be unique globally |
| `personal_access_tokens` | `prefix` | Same as API keys |

### Hot-path indexes

Beyond the uniques, V1 creates six secondary indexes chosen for the first authenticated flows:

- `idx_users_email_verified` partial index `WHERE email_verified_at IS NOT NULL` — the "send-me-emails" query pattern.
- `idx_org_members_user` — "what orgs is this user in?" on login.
- `idx_projects_organization` — "list projects in this org" on dashboard.
- `idx_project_languages_project` — "what languages does this project support?" on the translation UI.
- `idx_api_keys_project` and `idx_pats_user` — credential listing for the UI.

Anything else can run off the uniques / PKs; we'll add more indexes in later phases as real query patterns emerge, never speculatively.

## V3 Phase 2 data model (keys + translations)

Phase 2 lands in `V3__keys_translations_icu.sql` (V2 was consumed by the Phase 1 auth-token tables). Eight new tables layer on top of V1's `projects`:

- **`namespaces`** — groups keys inside a project. Unique `(project_id, slug)`; lowercase kebab slugs.
- **`tags`** — freeform labels for keys. Unique `(project_id, slug)`; optional `#rrggbb` colour.
- **`keys`** — the atomic translation unit. Unique `(project_id, namespace_id, key_name)`; state enum (`NEW / TRANSLATING / REVIEW / DONE`); soft-delete via `soft_deleted_at`.
- **`key_meta`** — key/value side-table for platform-specific hints (Android `context`, iOS developer notes). Unique `(key_id, meta_key)`.
- **`key_tags`** — many-to-many join between `keys` and `tags`.
- **`translations`** — one per `(key_id, language_tag)`. ICU source in `value`; state enum (`EMPTY / DRAFT / TRANSLATED / REVIEW / APPROVED` — see [ADR 0002](decisions/0002-translation-state-machine.md)). `author_user_id` is nullable + `ON DELETE SET NULL` so deleting a user leaves the translation in place.
- **`key_comments`** — translator ⇄ reviewer conversations on a key. Required author; cascades on user delete.
- **`key_activity`** — append-only audit trail, one row per lifecycle event. `action_type` enum covers the seven current events; `diff_json` (JSONB) is reserved for the Phase 7 audit-log (T706) payload.

Cascades: every child → parent FK is `ON DELETE CASCADE` except the two `author` / `actor` links into `users`, which use `SET NULL` so translations and activity rows outlive the user who created them.

Hot-path indexes added in V3:

- `idx_keys_project`, `idx_keys_namespace`, `idx_keys_state` — list + filter by namespace and lifecycle state.
- `idx_translations_key`, `idx_translations_state` — per-key fan-out + state filter.
- `idx_key_activity_key`, `idx_key_activity_created` — the per-key timeline renders newest-first.

The state-machine rationale lives in [ADR 0002](decisions/0002-translation-state-machine.md).

### V4 — search index layer

`V4__keys_fts_trigram.sql` layers the search infrastructure on top of V3. It's index-only — no new tables, no data model change:

- Enables the `pg_trgm` extension via `CREATE EXTENSION pg_trgm` (a contrib module bundled with the standard Postgres 16 distribution, not part of the core server).
- Adds a generated `keys.search_vector` column of type `tsvector`. Populated from `key_name` (both as-is and with `[._-]+` runs replaced by spaces so identifier-style names tokenise segment-by-segment) plus `description`. `GENERATED ALWAYS ... STORED` keeps it in lock-step with its source columns without a trigger.
- `idx_keys_search_vector` — GIN index on the generated vector, powers the primary FTS path.
- `idx_translations_value_trgm` — GIN index with `gin_trgm_ops` on `translations.value`, powers the trigram fallback for fuzzy substring search over translation bodies. The `'simple'` text-search configuration gives up English-aware stemming on purpose — Translately is multilingual and callers search for identifier-like strings where stemming loses precision. A future migration can introduce a per-column language-specific configuration if the UX calls for it. See [ADR 0003](decisions/0003-postgres-fts-over-elasticsearch.md) and the [search architecture page](search.md) for the full rationale. ## Future migrations Phase 3 adds `import_jobs` / `export_jobs`. Phase 4 adds translation memory, budgets, per-provider audit rows. The ID and naming conventions above are inviolate across all of them. See [`.kiro/steering/architecture.md`](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/architecture.md) for the operational guardrails (forward-only migrations, no destructive changes without a deprecation window). --- title: '0001 — Argon2id for password hashing' parent: ADRs grand_parent: Architecture nav_order: 1 --- # 0001 — Argon2id for password and token hashing - **Status:** Accepted - **Date:** 2026-04-18 - **Deciders:** Pratiyush - **Context link:** [T105 — Argon2id password hasher](https://github.com/Pratiyush/translately/issues/132) ## Context and problem statement Translately authenticates users by email + password (T103) and issues two classes of server-verified token: API keys (T110, per-project) and personal access tokens (T110, per-user). All three carry a secret that the server must verify on every request, and none of them may be recoverable from the database if the DB is dumped. 
We need a password-hashing scheme that is (a) slow enough to frustrate offline brute force on a stolen `secret_hash` column, (b) tunable over time as CPU/GPU cost drops, (c) side-channel resistant, (d) available in a mature JVM library with an acceptable license, and (e) not so slow that the `verify` path becomes a request-latency problem. ## Decision drivers - **MIT-compatible licensing only.** The whole project is MIT (CLAUDE.md rule #4); dependencies must match. - **Java 21 LTS runtime** (Quarkus). No native-image gotchas. - **OWASP current guidance** — the scheme must be a recommendation in the current Password Storage Cheat Sheet, not a legacy workaround. - **Parameter upgradeability.** The stored hash must embed its own parameters so we can harden over time without a schema migration. - **Latency budget.** Targeted verify path: **30–60 ms** on modern server CPU. Above ~150 ms and the login endpoint becomes a DoS amplifier; below ~10 ms and the cost curve for an attacker is too gentle. ## Considered options 1. **bcrypt** — widely deployed, mature Java libraries (Spring Security, Bouncy Castle). Parameters embedded. Downside: max password length is 72 bytes (silently truncates); no memory-hardness, so GPU/ASIC attacks are cheap. 2. **scrypt** — memory-hard, predates Argon2. Works, but no longer the OWASP recommendation; fewer maintained JVM libraries; parameter tuning is finicky. 3. **PBKDF2** — FIPS-approved, simplest, broadly available. Not memory-hard; even high iteration counts do not stop a well-resourced attacker with GPUs. OWASP lists it as acceptable only when Argon2 / scrypt are unavailable. 4. **Argon2id** — winner of the [2015 Password Hashing Competition](https://password-hashing.net/); OWASP's current top recommendation; memory-hard; hybrid of data-independent (side-channel resistant) and data-dependent (brute-force resistant) variants. 
## Decision outcome

**Chosen option: Argon2id**, because it's the OWASP-recommended default, provides memory-hardness that bcrypt lacks, and has a mature MIT-licensed Java binding (`de.mkammerer:argon2-jvm`) backed by the reference C implementation.

### Parameters

The parameters below match the second recommended Argon2id option in RFC 9106 (t=3, p=4, m=64 MiB) and sit above the minimum configurations listed in the [OWASP Password Storage Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html):

- **iterations (t):** 3
- **memory (m):** 65 536 KiB (64 MiB)
- **parallelism (p):** 4
- **salt length:** 16 bytes (library default)
- **output length:** 32 bytes (library default)

These values give ~30–60 ms per hash on a modern server CPU — well inside our latency budget and well outside an attacker's comfort zone for offline brute force.

### Storage

The `argon2-jvm` output string is self-describing — it embeds algorithm, version, parameters, salt, and hash in a single compact form:

```
$argon2id$v=19$m=65536,t=3,p=4$<salt>$<hash>
```

We store the full string in a single `VARCHAR(256)` column (`users.password_hash`, `api_keys.secret_hash`, `personal_access_tokens.secret_hash`). This means:

- **Parameter upgrades require no schema migration.** Future hashes use new parameters; old hashes still verify because `verify` reads them from the encoded string.
- **No separate salt column.** The salt rides in the encoded output.

### Consequences

- **Good** — state-of-the-art defence against stolen-DB brute force; side-channel resistance; single `VARCHAR` column; parameter upgrades without schema migrations; OWASP-current.
- **Neutral** — Each hash allocates 64 MiB transiently. On a server sized for Quarkus this is fine; on a 512 MiB container it's worth configuring login concurrency limits.
- **Bad** — `argon2-jvm` depends on a native library (`libargon2`) bundled as a JAR resource. Container images must not strip this. A slower pure-Java implementation exists and would be our first choice if we ever needed to build a true statically-linked native image.
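
Because the encoded string is self-describing, the cost parameters can be read back from the stored value alone. The following is a rough sketch rather than the project's code: `Argon2Params`, `parseArgon2Params`, and `isWeakerThan` are hypothetical names, and the comparison deliberately looks only at memory and iterations as a simplification.

```kotlin
// Parameters embedded in an encoded Argon2 hash string.
data class Argon2Params(
    val variant: String,
    val version: Int,
    val memoryKiB: Int,
    val iterations: Int,
    val parallelism: Int,
)

// Parses "$argon2id$v=19$m=65536,t=3,p=4$<salt>$<hash>"; returns null on any unexpected shape.
fun parseArgon2Params(encoded: String): Argon2Params? {
    val parts = encoded.split('$')
    // The leading '$' yields an empty first element:
    // ["", variant, "v=..", "m=..,t=..,p=..", salt, hash]
    if (parts.size != 6 || parts[0].isNotEmpty()) return null
    val version = parts[2].removePrefix("v=").toIntOrNull() ?: return null
    val costs = parts[3].split(',').mapNotNull { pair ->
        val eq = pair.indexOf('=')
        if (eq <= 0) null else pair.substring(0, eq) to pair.substring(eq + 1).toIntOrNull()
    }.toMap()
    return Argon2Params(
        variant = parts[1],
        version = version,
        memoryKiB = costs["m"] ?: return null,
        iterations = costs["t"] ?: return null,
        parallelism = costs["p"] ?: return null,
    )
}

// Simplified comparison (assumption): a hash is "weaker" if it used less memory or fewer passes.
fun Argon2Params.isWeakerThan(target: Argon2Params): Boolean =
    memoryKiB < target.memoryKiB || iterations < target.iterations
```

Note that a Kotlin string literal containing an encoded hash needs each `$` escaped as `\$`, since `$name` is template syntax.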
### Implementation notes - Touched modules: `:backend:security` (`io.translately.security.password.PasswordHasher`). - Shared by: user login (T103), API key verification (T110), PAT verification (T110), password-reset and email-verification token hashing (T103). - Migration: none required — this is the first password hasher Translately ships. - Rollback: drop the `argon2-jvm` dependency and replace with bcrypt. Every encoded hash carries its algorithm prefix, so a `verify` could dispatch to bcrypt for `$2a$`-prefixed rows and Argon2 for `$argon2id$` rows during a rollout. ### Progressive hardening `PasswordHasher.verify` can return `true` **and** signal "this hash used weaker parameters than our current default" to the caller. On a successful login we then re-hash with current parameters and write back — users with old hashes transparently upgrade over time. Implementation of the signal is deferred until we first raise the defaults. ## Links - OWASP Password Storage Cheat Sheet: - Argon2 RFC 9106: - `argon2-jvm` library: (MIT) - [Auth architecture](../auth.md) - [PasswordHasher source](https://github.com/Pratiyush/translately/blob/master/backend/security/src/main/kotlin/io/translately/security/password/PasswordHasher.kt) --- title: 0002 — Translation state machine parent: ADRs grand_parent: Architecture nav_order: 2 --- # 0002 — Translation state machine - **Status:** Accepted - **Date:** 2026-04-19 - **Deciders:** Pratiyush - **Context link:** ## Context and problem statement Phase 2 introduces per-language translation cells. A given `Key` has one `Translation` row per configured language, and each cell needs a lifecycle state the UI can filter on, the importer can hydrate, and the audit log can diff. The state doubles as the export gate — only fully-APPROVED translations should land in a production bundle without a reviewer's explicit override. How rich should that state machine be? 
Too few states and reviewers can't distinguish "I wrote this and need someone to look" from "this is ready to ship." Too many and every transition becomes a UI decision the translator has to defend. ## Decision drivers - Covers the translator → reviewer → export loop with zero manual tagging. - Maps cleanly onto the `CHECK` constraint we ship in V3; string values persist over decades. - Plays well with bulk imports (rows often land already-translated with no human review). - Matches the idioms a translator moving from another localization tool will expect. - Expressible as a single `VARCHAR(16)` column with a CHECK constraint — no side-tables, no bitfields. ## Considered options 1. **3 states:** `EMPTY / TRANSLATED / APPROVED`. Minimal; no draft-vs-finished split, no review limbo. 2. **4 states:** `EMPTY / TRANSLATED / REVIEW / APPROVED`. Review as an explicit interstitial, but no draft state. 3. **5 states:** `EMPTY / DRAFT / TRANSLATED / REVIEW / APPROVED`. Draft = saved but not yet finished; TRANSLATED = the translator thinks it's done; REVIEW = someone else is looking; APPROVED = cleared for export. 4. **6 states:** add `OBSOLETE` or `REJECTED` to 5-state for rejected review outcomes. Every rejection re-enters the flow by reverting to DRAFT anyway, so the extra state is cosmetic. ## Decision outcome **Chosen option: Option 3 (5 states).** The DRAFT↔TRANSLATED split is the one the UI actually needs — autosave while typing lands as DRAFT, explicit "mark done" promotes to TRANSLATED. The REVIEW phase lets the reviewer hold the cell while working through it so edits don't race. Rejection uses the existing DRAFT transition (rejecting sends the cell back to DRAFT with a comment); no sixth state needed. 
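
As a rough illustration of the chosen option, the five states and the editor actions described above might be modelled like this. The enum mirrors the state names in this ADR; `EditorEvent`, `nextState`, and `isExportable` are hypothetical names introduced for the sketch, and the real `TranslationState.kt` may differ.

```kotlin
// The five states from Option 3, in lifecycle order.
enum class TranslationState { EMPTY, DRAFT, TRANSLATED, REVIEW, APPROVED }

// Hypothetical editor events, named after the actions in the decision text.
enum class EditorEvent { AUTOSAVE, MARK_DONE, START_REVIEW, APPROVE, REJECT }

// Maps each event to the state it lands the cell in.
fun nextState(event: EditorEvent): TranslationState = when (event) {
    EditorEvent.AUTOSAVE -> TranslationState.DRAFT          // autosave while typing
    EditorEvent.MARK_DONE -> TranslationState.TRANSLATED    // explicit "mark done"
    EditorEvent.START_REVIEW -> TranslationState.REVIEW     // reviewer holds the cell
    EditorEvent.APPROVE -> TranslationState.APPROVED        // cleared for export
    EditorEvent.REJECT -> TranslationState.DRAFT            // rejection reuses DRAFT, no sixth state
}

// Only APPROVED cells pass the export gate.
fun isExportable(state: TranslationState): Boolean = state == TranslationState.APPROVED
```

The mapping ignores the current state on purpose: it shows where each action lands, not which transitions a service layer would permit.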
### Consequences - **Good:** UI gets three meaningful filter chips (DRAFT, TRANSLATED, REVIEW) plus the outer bookends; exporters have one predicate (`state = 'APPROVED'`) to filter on; importers that can't know review status hydrate as TRANSLATED and let a reviewer promote. - **Neutral:** transitions are advisory (not enforced by DB), so every state is reachable from every other — the DB CHECK is the sole hard guarantee. Service layer can add transition rules later without schema churn. - **Bad:** "rejection" has no first-class representation beyond a comment + DRAFT revert. If the product grows a formal rejection UX later, we'd add `REJECTED` in a future migration. ### Implementation notes - Touched modules: `backend/data` (`TranslationState.kt`, V3 migration), `backend/service` (future Phase 2 work wiring the transitions). - Migration: new field, no backfill — this is greenfield. - Rollback: drop the `translations` table (V3 migration revert) undoes it entirely. ## Links - PR: - Entity source: `backend/data/src/main/kotlin/io/translately/data/entity/Translation.kt` (filesystem link omitted until the file lands on master) - [ADR index](README.md) --- title: 0003 — Postgres FTS over Elasticsearch for v1 parent: ADRs grand_parent: Architecture nav_order: 3 --- # 0003 — Postgres FTS over Elasticsearch for v1 - **Status:** Accepted - **Date:** 2026-04-19 - **Deciders:** Pratiyush - **Context link:** ## Context and problem statement Phase 2 ships key search — the UI needs to query across `keys.key_name`, `keys.description`, and translated strings in `translations.value`, combined with filters for namespace, tags, and lifecycle state. The Phase-2 corpus is dozens-to-thousands of keys per project, optionally filtered to a single namespace. Searching is a hot path: the translator clicks into a project and the filter box is the first thing they touch. Translately is meant to be self-hosted. 
Every added infrastructure dependency — a separate service, its RAM, its index-sync job, its upgrade cadence, its failure mode — is a tax on the operator. The question: do we lean on Postgres's built-in full-text search, or wire Elasticsearch / OpenSearch / Meilisearch as a secondary read-model from day one?

## Decision drivers

- Self-hosters run `docker compose up -d` and get a working stack. Every extra service widens the blast radius of that command.
- Phase-2 scale is measured in thousands of keys per project, not millions.
- We already have Postgres 16 in the stack; `tsvector` and GIN indexing are built in, and `pg_trgm` is a contrib extension bundled with the standard distribution, so there is no licensing overhead and no extra service to install.
- Schema evolution on a generated column is cheap; ripping out a secondary index is not.
- The search UX (`T207`) needs: exact-ish matching on key names, fuzzy substring matching on translations, filter composition (namespace · tags · state), ranking, pagination. All of this is within Postgres's native capabilities.
- MIT-only dependency policy — Elasticsearch's licensing split (Elastic v2 / SSPL) would force us onto OpenSearch, another integration surface.

## Considered options

1. **Postgres FTS (`tsvector`) + `pg_trgm`** — generated `tsvector` column on `keys`, trigram GIN on `translations.value`. Zero new infra.
2. **Elasticsearch / OpenSearch as primary search backend** — write-through index; Postgres remains system of record.
3. **Meilisearch** — lightweight, simpler than Elasticsearch; still a separate service with its own lifecycle.
4. **Trigram-only (no `tsvector`)** — skip FTS, rely on `pg_trgm` substring search for everything.

## Decision outcome

**Chosen option: Option 1 — Postgres FTS + `pg_trgm`.** The Phase-2 corpus is tiny by any search backend's standards, Postgres already sits on the critical path, and the generated-column approach keeps the FTS artefact in lock-step with `key_name` / `description` without triggers or application-side bookkeeping.
Trigram on `translations.value` is the fuzzy-match fallback when FTS returns no hits. If a deployment later outgrows this — tens of millions of keys, complex language-specific stemming, per-field scoring profiles — Elasticsearch / Meilisearch can land as a read-model in a Phase-8+ optimisation. The service-layer interface (`KeySearchService.search(...)` returning `KeySearchHit`) hides the storage choice, so swapping is a local concern. ### Consequences - **Good:** one fewer service for self-hosters; no index-sync job; transactional consistency between writes and search results by construction; no licensing carve-outs; Postgres upgrades carry search forward. - **Neutral:** search quality is "good enough, not great" — no per-language stemming, no typo tolerance beyond trigram similarity, no phrase proximity scoring. - **Bad:** doesn't scale to very large corpora (millions of translation rows with frequent updates). Mitigated by the Phase-8+ escape hatch above. - **Bad:** the `'simple'` text-search configuration gives up English-aware stemming. We accept this in exchange for uniform behaviour across every language a translator works in. See the [search architecture page](../search.md) for the full rationale. ### Implementation notes - Touched modules: `backend/data` (V4 migration), `backend/service/keys` (new `KeySearchService`), `backend/app` (integration test). - Migration: forward-only. `V4__keys_fts_trigram.sql` adds the generated column + indexes; no data backfill needed (`GENERATED ALWAYS` populates on insert). - Rollback: drop the generated column + the two indexes. The extension can stay; it's cheap. - Future: a per-language `tsvector` on `translations` can land in a later migration without touching V4. 
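
To give a feel for what the trigram fallback measures, here is a small Kotlin approximation of `pg_trgm`-style similarity: lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, and compare the resulting trigram sets. It is a teaching sketch, not the query `KeySearchService` runs; Postgres does this work inside the database, and its handling of non-ASCII text depends on the database locale.

```kotlin
// Trigram set for a string: lowercase, split into alphanumeric words,
// pad each word as "  word ", then take every 3-character window.
fun trigrams(text: String): Set<String> =
    text.lowercase()
        .split(Regex("[^\\p{L}\\p{N}]+"))
        .filter { it.isNotEmpty() }
        .flatMap { word -> "  $word ".windowed(3) }
        .toSet()

// Similarity = shared trigrams / distinct trigrams across both strings.
// Returns 0.0 when neither string contains any word characters.
fun similarity(a: String, b: String): Double {
    val ta = trigrams(a)
    val tb = trigrams(b)
    val union = (ta union tb).size
    return if (union == 0) 0.0 else (ta intersect tb).size.toDouble() / union
}
```

For example, `similarity("word", "two words")` shares 4 of 11 distinct trigrams, which is about 0.36.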
## Links - PR: - Migration source: `V4__keys_fts_trigram.sql` - Service source: `KeySearchService.kt` - Architecture page: [Search](../search.md) - [ADR index](README.md) --- title: ADRs parent: Architecture nav_order: 9 has_children: true permalink: /architecture/decisions/ --- # Architecture Decision Records (ADRs) This directory captures non-trivial technical decisions made during Translately's development. Each record is immutable once accepted — supersede rather than edit. Format: [MADR 3.0](https://adr.github.io/madr/) (Markdown Any Decision Record). See [`_template.md`](_template.md) to start a new one. ## Index | # | Title | Status | Date | |---|---|---|---| | [0001](0001-argon2id-password-hashing.md) | Argon2id for password and token hashing | Accepted | 2026-04-18 | | [0002](0002-translation-state-machine.md) | Translation state machine (5 states) | Accepted | 2026-04-19 | | [0003](0003-postgres-fts-over-elasticsearch.md) | Postgres FTS over Elasticsearch for v1 | Accepted | 2026-04-19 | ## Numbering - Four-digit, monotonic, zero-padded: `0001-`, `0002-`, ... - Filename: `NNNN-.md`. - Reserved: `0000-` is not used (avoid confusion with the template). ## When to write one - Swapping a locked-in library or framework. - Changing the auth / authorization / tenancy model. - Introducing a new storage backend or index structure. - Picking an algorithm with non-obvious trade-offs (crypto, search ranking, diff). - Any performance or scale decision that rules out a simpler alternative. When in doubt, write one — they're cheap to read and the audit trail is the whole point. --- title: ADR template (not an ADR) parent: ADRs grand_parent: Architecture nav_exclude: true --- # NNNN — Title of decision - **Status:** Proposed | Accepted | Superseded by `NNNN-…` | Deprecated - **Date:** YYYY-MM-DD - **Deciders:** Pratiyush - **Context link:** `https://github.com/Pratiyush/translately/issues/` (fill in on adoption) ## Context and problem statement What forced this decision? 
Describe the pressure: a requirement, a constraint, an incident, a performance bound. One or two paragraphs; no solution yet. ## Decision drivers - Constraint A (e.g. MIT-only dependencies) - Constraint B (e.g. Java 21 LTS) - Non-functional goal C (e.g. < 50 ms p95 on the hot path) ## Considered options 1. Option 1 — one-line description 2. Option 2 — one-line description 3. Option 3 — one-line description ## Decision outcome **Chosen option:** *Option N*, because *one-sentence rationale.* ### Consequences - **Good:** benefit 1, benefit 2. - **Neutral:** implication 1, implication 2. - **Bad:** drawback 1, drawback 2 — mitigated by *...*. ### Implementation notes - Touched modules: `backend/foo`, `webapp/src/bar`. - Migration: *how existing data / callers are moved over.* - Rollback: *what reverting looks like, if viable.* ## Links - Referenced PR: `https://github.com/Pratiyush/translately/pull/` - [Relevant docs](../README.md) --- title: ICU validation parent: Architecture nav_order: 11 --- # ICU MessageFormat validation Translations are stored as ICU MessageFormat source strings. Every write path validates the source before persisting so bad syntax never reaches the database. `IcuValidator` (in `:backend:service`, package `io.translately.service.translations`) is the single entry point. Introduced by T203 (#44) in Phase 2. ## Contract ```kotlin val result: ValidationResult = icuValidator.validate(source, locale) if (!result.ok) { // result.errors is a list of ValidationError(line, col, message, severity) } ``` - **`source: String`** — the ICU MessageFormat source as typed by the translator or pulled from an import. - **`locale: java.util.Locale`** — currently unused; accepted so future WARNING tiers can diff the plural branches against the locale's CLDR plural keywords. - **Return value** — a `ValidationResult` with a derived `ok` flag and a list of structured `ValidationError`s. Empty list ⇔ `ok = true`. Empty or blank sources are valid. 
The `TranslationState` enum on the Translation entity gates export; the validator only rejects malformed *input*, not empty rows. ## What the validator checks | Check | Rationale | |---|---| | Full ICU grammar — arguments, plural / selectordinal / select, apostrophe escapes, nested branches | Parse via `com.ibm.icu.text.MessagePattern` rather than `MessageFormat` because MessagePattern's parse exceptions carry a source offset we can turn into a line+column. | | Missing `other` branch in plural / selectordinal / select | MessagePattern enforces this at parse time — we catch the resulting `IllegalArgumentException` and surface it as a structured error. CLDR requires `other` on every selector. | | Unknown SIMPLE argument types (`{x, bogusType}`) | Parses cleanly at the grammar level but blows up at format time. Catching here lets the editor mark the problem as the user types. Accepted types: `number`, `date`, `time`, `spellout`, `ordinal`, `duration`. | ## What the validator does NOT check (on purpose) - **Cross-translation argument consistency.** "Does the German translation use the same `{name}` argument the English one does?" — belongs on a key-level diff, not a per-cell validator. - **Locale-specific plural coverage.** A Russian translation that supplies only `one` + `other` is technically under-specified (Russian needs `few` + `many`). Authors routinely ship half-translated cells during a translation sprint; making this an ERROR would gate save on every keystroke. Tracked as a future WARNING once the Severity pipeline lights up in T207. ## Library `com.ibm.icu:icu4j:76.1`. ICU is a Unicode-consortium permissive licence, MIT-compatible. No other ICU/CLDR dep is pulled in — this is the full tree-shaken package. ## Where it's consumed - **Editor autosave (T207)** — validates each keystroke batch on the webapp's PUT path before persisting. Errors surface via CodeMirror 6's linter lane. 
- **JSON importer (T301)** — validates every incoming translation value; bad ICU aborts the import job with a structured per-row error list.

## Tests

- `backend/service/src/test/kotlin/io/translately/service/translations/IcuValidatorTest.kt` — 17 cases covering happy path + Unicode/emoji + empty/blank, CLDR plural sets (English + Russian + Serbian), missing-`other` rejection on plural / selectordinal / select, unknown arg type, malformed single- and multi-line source with line+col recovery, nested plural-in-select, and nested errors surfaced inside well-formed outer structures.

No integration test is needed — the validator is a pure function and doesn't touch the database.

## Changelog

First shipped in `[Unreleased]` under T203 (#44).

---
title: Module map
parent: Architecture
nav_order: 1
---

# Module map

Translately's backend is a single Gradle (Kotlin DSL) build with 13 modules under `:backend:*`. Each module is a narrow library; the single runnable artefact is `:backend:app`, which wires them into a Quarkus application.

Webapp and future CLI / SDK packages live outside the Gradle build in `webapp/`, `sdks/`, `cli/` and use their own tooling (pnpm, Vite, TypeScript).

## Module graph

```mermaid
flowchart TD
    app[":backend:app<br/>Quarkus runtime<br/>integration tests"]
    api[":backend:api<br/>JAX-RS resources<br/>filters + mappers"]
    service[":backend:service<br/>use-cases<br/>AuthService, OrgService, ..."]
    data[":backend:data<br/>JPA entities<br/>Panache repos<br/>Flyway migrations"]
    security[":backend:security<br/>Scope, RBAC<br/>JWT, password, crypto<br/>TenantContext"]
    email[":backend:email<br/>Quarkus Mailer + Qute"]
    jobs[":backend:jobs<br/>Quartz job defs"]
    ai[":backend:ai<br/>AiTranslator port<br/>Claude / OpenAI adapters"]
    mt[":backend:mt<br/>MachineTranslator port<br/>DeepL / Google / AWS"]
    storage[":backend:storage<br/>S3 / MinIO adapter"]
    webhooks[":backend:webhooks<br/>HMAC sender, retries"]
    cdn[":backend:cdn<br/>content bundle builder"]
    audit[":backend:audit<br/>append-only log"]

    app --> api
    app --> service
    app --> data
    app --> jobs
    app --> ai
    app --> mt
    app --> storage
    app --> email
    app --> webhooks
    app --> cdn
    app --> audit
    api --> service
    api --> security
    service --> data
    service --> security
    service --> email
    service --> ai
    service --> mt
    service --> storage
    service --> audit
    service --> webhooks
    data --> security
    jobs --> service
```

*(Render via GitHub / Pages Mermaid support. A PNG export will land under `diagrams/` the first time someone edits this file in a content-rich PR.)*

## Ownership and rules

- **`:backend:security`** is a leaf library. It depends on no other `:backend:*` module. Keep it that way so every module — including `:data` — can use scopes, password hashing, crypto, and tenant context without circular dependencies. Enums that need to exist in both `:data` (as JPA entity state) and `:security` (as pure Kotlin) are duplicated — see `OrganizationRole` (data) vs `OrgRole` (security) with a test that asserts the name round-trip.
- **`:backend:data`** owns Flyway migrations and Panache entities. No service, filter, resource, or controller code lives here. Entities are plain data — they may carry `@PrePersist` / `@PreUpdate` for timestamp housekeeping but no business logic.
- **`:backend:service`** owns use-case orchestration. A service method is the transactional boundary (`@Transactional`) and is the only layer allowed to emit audit events or send email.
- **`:backend:api`** translates HTTP to services and back. Filters here are: `TenantRequestFilter` → authenticators → `ScopeAuthorizationFilter`. Exception mappers map `AuthException` / `InsufficientScopeException` to the uniform `{error:{code,message,details?}}` envelope.
- **`:backend:app`** is the only module with `quarkus-resteasy-reactive` at runtime scope. It wires CDI producers (`CryptoServiceProducer`, etc.), hosts the Quarkus test profile, and runs integration tests against Testcontainers Postgres + Mailpit.
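
The duplicated-enum rule for `OrganizationRole` and `OrgRole` can be illustrated with a small sketch. The two enums below are stand-ins declared locally (the real ones live in `:backend:data` and `:backend:security`), and `namesRoundTrip` and `toOrgRole` are hypothetical helper names; the actual test may be shaped differently.

```kotlin
// Stand-ins for the duplicated enums.
enum class OrganizationRole { OWNER, ADMIN, MEMBER } // JPA-side, :backend:data
enum class OrgRole { OWNER, ADMIN, MEMBER }          // pure Kotlin, :backend:security

// True when both enums declare the same constant names in the same order.
fun namesRoundTrip(): Boolean =
    OrganizationRole.values().map { it.name } == OrgRole.values().map { it.name }

// Conversion by name; throws IllegalArgumentException if the enums ever drift apart.
fun OrganizationRole.toOrgRole(): OrgRole = OrgRole.valueOf(name)
```

Converting by name rather than by ordinal means a reordering in one module cannot silently map `ADMIN` to `MEMBER`.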
## Why this shape - **Boot time matters.** Smaller JARs → faster dev loop → faster CI. The hard rule that `:security` has no heavy deps lets unit tests in that module avoid starting Quarkus at all. - **Testability.** Each leaf module is trivially unit-testable with MockK. Integration tests (`*IT`) live only in `:backend:app` and bring the full Quarkus + Testcontainers environment up; there is no "almost full" middle tier. - **Clean replacement.** Adapters in `:ai`, `:mt`, `:storage`, `:webhooks`, `:cdn` implement a port interface defined in their own package. Adding a new provider is a new class in the same module — no other modules change. See [`.kiro/steering/architecture.md`](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/architecture.md) for the authoritative steering rule. --- title: Multi-tenancy parent: Architecture nav_order: 4 --- # Multi-tenancy Translately is a single-process, multi-tenant server. Every business entity (project, key, translation, API key, screenshot, webhook) hangs off an organization; a request either scopes to exactly one organization or it's a cross-org endpoint (login, signup, organization listing, health, metrics). Introduced by: [T111](https://github.com/Pratiyush/translately/issues/135) (TenantContext + TenantRequestFilter). Related docs: [auth architecture](auth.md), [authorization](authorization.md), [request-lifecycle](request-lifecycle.md). ## Tenant identifier The URL path is the single source of truth: ``` /api/v1/organizations/{orgIdOrSlug}/projects/{projectId}/keys └── tenant identifier ─┘ ``` `{orgIdOrSlug}` is **either**: - a **ULID** (26 Crockford base32 chars), or - a **slug** (lowercase kebab-case, ≤64 chars, starts and ends with `[a-z0-9]`). The syntax check is done in the filter; resolution to an internal `organizations.id BIGINT` happens in service code once the request reaches a DB call. ### Why both? - **Slug** is friendly for URLs users share (`/organizations/acme/…`). 
- **ULID** is the stable external ID that won't change on rename.

Both are accepted at every endpoint; the service layer uses whichever hits first in `SELECT id FROM organizations WHERE (external_id = ? OR slug = ?) AND deleted_at IS NULL`. The parentheses matter: `AND` binds tighter than `OR`, so without them a soft-deleted organization would still match by `external_id`.

## `TenantContext`

[`io.translately.security.tenant.TenantContext`](https://github.com/Pratiyush/translately/blob/master/backend/security/src/main/kotlin/io/translately/security/tenant/TenantContext.kt) is a `@RequestScoped` CDI bean holding exactly the string the client sent, never the resolved internal id.

```kotlin
// API surface only; method bodies are omitted here.
@RequestScoped
open class TenantContext {
    open fun current(): String?          // raw URL identifier, or null
    open fun set(identifier: String?)    // filter calls this once per request
    open fun isBound(): Boolean
}
```

Two invariants:

1. **Set exactly once per request**, by `TenantRequestFilter`. Service code reads; nothing else writes.
2. **Never holds the resolved bigint id.** Resolution is DB-bound, cacheable, and happens inside services, not in the request filter chain.

`null` is a legitimate value (login, signup, `/q/health`, `GET /`). Resource methods that require a tenant assert `isBound()` or receive the identifier as a `@PathParam`.

## `TenantRequestFilter`

[`io.translately.api.tenant.TenantRequestFilter`](https://github.com/Pratiyush/translately/blob/master/backend/api/src/main/kotlin/io/translately/api/tenant/TenantRequestFilter.kt) is a JAX-RS `ContainerRequestFilter` at priority `AUTHENTICATION - 100`, so it runs **before** every authenticator. Pseudocode:

```kotlin
fun filter(ctx: ContainerRequestContext) {
    tenantContext.set(extractTenant(ctx.uriInfo.path))
}
```

`extractTenant` matches `^(api/v\d+/)?organizations/([^/]+)(/.*)?$`. The captured identifier is syntax-validated against the ULID / slug regex; a path that does not match, or an identifier that fails validation, is treated as "no tenant" (so that auth endpoints like `/api/v1/auth/login` leave `TenantContext` unbound).
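
The matching rule can be illustrated outside the backend. The following is a TypeScript sketch of the same logic, not the Kotlin implementation: the path regex is the one quoted above, while the `ULID` and `SLUG` patterns are assumptions derived from the prose description (26 Crockford base32 characters; lowercase kebab-case of at most 64 characters), and the path is assumed to arrive without a leading slash.

```typescript
// Path shape quoted in the text; assumed to be matched against a path with no leading slash.
const TENANT_PATH = /^(api\/v\d+\/)?organizations\/([^/]+)(\/.*)?$/;

// Assumption: upper-case Crockford base32 (digits plus letters, excluding I, L, O, U), 26 chars.
const ULID = /^[0-9A-HJKMNP-TV-Z]{26}$/;

// Assumption: 1 to 64 chars of [a-z0-9-], first and last character alphanumeric.
const SLUG = /^[a-z0-9](?:[a-z0-9-]{0,62}[a-z0-9])?$/;

// Returns the raw tenant identifier, or null when the path carries no valid tenant.
function extractTenant(path: string): string | null {
  const match = TENANT_PATH.exec(path);
  if (!match) return null; // not an organization-scoped path
  const candidate = match[2];
  return ULID.test(candidate) || SLUG.test(candidate) ? candidate : null;
}

console.log(extractTenant("api/v1/organizations/acme/projects/42/keys")); // "acme"
console.log(extractTenant("api/v1/auth/login")); // null
```

Note that `api/v1/organizations` (the org-listing endpoint) has no identifier segment, so it also yields `null`, consistent with the cross-organization endpoints described later on this page.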
**Why before auth?** Authenticators (T110 API-key, T103 JWT) need to know the tenant to scope their credential lookup. An API key is issued against a project; without a tenant in scope, the authenticator can't tell whether the credential is valid for this request. ## Row-level isolation Phase 1 does not activate Hibernate's native multi-tenancy strategy (`@TenantId`) — it uses an explicit `organization_id BIGINT NOT NULL` FK on every tenant-scoped table and Panache repository methods that accept an `organization_id` parameter. This is simpler to reason about and avoids Hibernate's schema-per-tenant gotchas. Phase 2 will layer a Hibernate `@Filter(name = "tenantFilter")` on the relevant entities, activated from the filter chain once the identifier is resolved. The switch is transparent to callers. ## Cross-organization endpoints Not every endpoint is tenant-scoped. Four classes of exception: 1. **Auth** — `/api/v1/auth/*` (signup, login, refresh, verify, reset). 2. **Org listing** — `GET /api/v1/organizations` (the user's own orgs; scoped by `ScopeResolver.canResolveFor(userId, memberships, orgId = null)`). 3. **Health / metrics** — `/q/health`, `/q/metrics`. 4. **Root** — `GET /`. For each, `TenantContext.current()` returns `null` and the resource method handles the non-scoped case explicitly. ## Testing Integration tests live in `:backend:app` under `tenant/`: - `TenantRequestFilterIT` drives a Quarkus request and asserts `TenantContext.current()` after the filter runs. - Pure-unit tests of `extractTenant` live in `:backend:api`'s test tree — they exercise every path shape without bootstrapping Quarkus. - Service-level tests use `@TestProfile` with a stub `TenantContext` so they can run without a full HTTP request. ## Operator implications - **Per-tenant resource caps** (rate-limit, storage quota) are enforced in service code by loading `organizations.limits` (to be added Phase 6) — the filter never makes a policy decision. 
- **Deleting an organization** soft-deletes the `organizations` row and cascades hard-delete on every FK with `ON DELETE CASCADE`. Multi-tenancy in Phase 1 does not provide a separate "freeze all tenants" switch; that would be a Phase 7 audit feature. --- title: Request lifecycle parent: Architecture nav_order: 3 --- # Request lifecycle A single Translately HTTP request flows through a fixed chain of JAX-RS filters before reaching the resource method, and a fixed chain of exception mappers on the way back. This page documents the order, the invariants each filter maintains, and the error envelope that leaves the server. Introduced by: [T108](https://github.com/Pratiyush/translately/issues/133) (`@RequiresScope` + `ScopeAuthorizationFilter`), [T111](https://github.com/Pratiyush/translately/issues/135) (`TenantRequestFilter`), T103 (auth endpoints), T104 (JWT issuer). Related: [auth architecture](auth.md), [authorization](authorization.md), [multi-tenancy](multi-tenancy.md), [API conventions](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/api-conventions.md). ## Filter chain (request side) ```mermaid sequenceDiagram autonumber participant C as Client participant T as TenantRequestFilter
@Priority(AUTHENTICATION - 100) participant A as JwtSecurityScopesFilter
@Priority(AUTHENTICATION) participant Z as ScopeAuthorizationFilter
@Priority(AUTHORIZATION) participant R as Resource method participant S as Service layer C->>T: HTTP request T->>T: extractTenant(uriInfo.path) T->>T: TenantContext.set(...) T->>A: continue A->>A: parse Authorization header A->>A: verify JWT / API key / PAT A->>A: SecurityScopes.set(resolved scopes) A->>Z: continue Z->>Z: read @RequiresScope on resource alt scopes cover required Z->>R: invoke R->>S: service call (within @Transactional) S-->>R: result R-->>C: 200 + JSON body else scopes insufficient Z-->>C: 403 INSUFFICIENT_SCOPE end ``` ### Why this order - **Tenant first.** Authenticators need to know the tenant to resolve project-scoped credentials (API keys, PATs). - **Auth second.** The scope authorization filter relies on `SecurityScopes` populated by a successful authentication; without a credential, it cannot answer "does this caller have the required scope?". - **Authorization third.** Deliberately last so all upstream context (tenant, principal, scopes) is available when the `@RequiresScope` check runs. Priorities are JAX-RS numeric — lower runs earlier. `Priorities.AUTHENTICATION = 1000` and `Priorities.AUTHORIZATION = 2000` come from `jakarta.ws.rs.Priorities`; the tenant filter's `-100` offset guarantees it runs before any authenticator regardless of future additions. ## Resource → service → data - Resource methods are thin: parse path / query / body, call the service, map the return value to a DTO. No transactions at this layer. - Services are the transactional boundary. `@Transactional` annotates the public method; nested service calls run in the same transaction. - Data access flows through Panache repositories. A single service method is free to issue multiple queries; Hibernate's first-level cache handles same-session identity. ## Exception mapping Uncaught exceptions land in a JAX-RS `ExceptionMapper`. 
The standard mappings:

| Throwable | HTTP | Error envelope `code` | Mapper |
|---|---|---|---|
| `AuthException.InvalidCredentials` | 401 | `INVALID_CREDENTIALS` | `AuthExceptionMapper` |
| `AuthException.EmailNotVerified` | 403 | `EMAIL_NOT_VERIFIED` | ″ |
| `AuthException.RefreshTokenReused` | 401 | `REFRESH_TOKEN_REUSED` | ″ |
| `AuthException.ValidationFailed` | 400 | `VALIDATION_FAILED` | ″ |
| `InsufficientScopeException` | 403 | `INSUFFICIENT_SCOPE` | `InsufficientScopeExceptionMapper` |
| `NotFoundException` | 404 | `NOT_FOUND` | `NotFoundMapper` |
| `ConstraintViolationException` (Jakarta Validation) | 400 | `VALIDATION_FAILED` | `ConstraintViolationMapper` |
| `WebApplicationException` | pass-through | varies | default |
| anything else | 500 | `INTERNAL_ERROR` | `FallbackExceptionMapper` (logs stack, hides detail) |

Every response uses the uniform envelope:

```json
{
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable summary",
    "details": { "optional": "extra context" }
  }
}
```

See [API errors reference](../api/errors.md) for the full catalogue of `code` strings.

## Observability hooks

- **Request ID**: Quarkus's default `x-request-id` propagates through the filter chain and lands in structured logs. If absent, the server generates one.
- **OpenTelemetry**: Quarkus OTel is enabled by default; the tenant identifier is added as a span attribute (`translately.tenant`) when bound.
- **Metrics**: Quarkus Micrometer exposes `/q/metrics` (Prometheus format). Every resource class gets `http.server.requests` counters with route + status labels.

## Cross-cutting tests

- `TenantRequestFilterIT` exercises the order: tenant must be bound before authenticators see `ContainerRequestContext`.
- `ScopeAuthorizationIT` covers every combination of present / missing scope and proves `INSUFFICIENT_SCOPE` wins over `NOT_FOUND` (we don't leak whether a resource exists).
- `AuthResourceIT` (T103) round-trips the full credential set against Testcontainers Postgres + Mailpit. ## When to add a new filter Rarely. Today's chain is three links — keeping it small matters because every filter runs on every request. Before adding one, consider: 1. Can this live in a service instead? 2. Can this be an `ExceptionMapper` triggered by a thrown business exception? 3. If it really must be a filter, what priority relative to the existing three? A new filter ships with a failing test that proves order (it must run before / after an existing one) and a line in this document. --- title: Search parent: Architecture nav_order: 8 --- # Search Translately searches keys and translations with Postgres's built-in full-text + trigram indexes — no Elasticsearch in v1. The steering doc ([`.kiro/steering/architecture.md`](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/architecture.md)) declares the "no Elasticsearch" rule; this page records the how and why. Introduced by: [T206](https://github.com/Pratiyush/translately/issues/47) · First migration: `V4__keys_fts_trigram.sql`. ## Why Postgres, not Elasticsearch Phase 2's search UX — a filter box over keys, optional tag / namespace / state chips, a few dozen results — is a cheap problem. Postgres 16 ships `tsvector` + GIN + `pg_trgm` in core; no separate service to boot, no secondary index to sync, no split-brain failure mode, no extra host or RAM to allocate. Self-hosters get search for free. The escape hatch is live: if a deployment grows past a corpus Postgres handles gracefully (tens of millions of keys, complex multi-language stemming), wiring Elasticsearch or Meilisearch as a read-model is a Phase-8+ optimisation, not a v1 constraint. ## The index layout `V4__keys_fts_trigram.sql` adds three artefacts on top of V3: 1. **`CREATE EXTENSION IF NOT EXISTS pg_trgm;`** — idempotent, ships with Postgres 16 core. 2. **`keys.search_vector`** — a generated `tsvector` column. 
```sql ALTER TABLE keys ADD COLUMN search_vector tsvector GENERATED ALWAYS AS ( to_tsvector( 'simple', COALESCE(key_name, '') || ' ' || regexp_replace(COALESCE(key_name, ''), '[._-]+', ' ', 'g') || ' ' || COALESCE(description, '') ) ) STORED; CREATE INDEX idx_keys_search_vector ON keys USING gin (search_vector); ``` The `GENERATED ALWAYS ... STORED` shape keeps the vector in lock-step with `key_name` / `description` without a trigger or app-side bookkeeping. `key_name` is also included with `[._-]+` runs replaced by spaces so identifier-style names like `settings.save.button` produce the lexemes `settings`, `save`, `button` instead of being swallowed by the default parser's `host`-token rule. 3. **`translations.value` trigram GIN** — ```sql CREATE INDEX idx_translations_value_trgm ON translations USING gin (value gin_trgm_ops); ``` Covers `ILIKE` and the `%` similarity operator for fuzzy substring matches over translated text. ## The text-search configuration: `simple` vs `english` Every `to_tsvector(...)` call picks a configuration. V4 uses `'simple'` — no stemming, no language-specific stopwords. Rationale: - Translately is multilingual by design. A single configuration must work identically for `en`, `de`, `ja`, `fr`. An `english` config would stem "running" → "run" but leave German strings alone, biasing ranking against non-English corpora. - Callers search for identifier-like strings — `login.button`, `settings.save.primary` — where stemming loses precision and merges unrelated tokens. - `simple` is cheap and deterministic; upgrade paths stay open. A future migration can introduce a per-column language-specific configuration (e.g. on a `translations.lang_config` generated column keyed off `language_tag`) if the UX calls for it. The generated-column approach keeps that change contained to a single `ALTER`. ## Query composition `io.translately.service.keys.KeySearchService` composes the WHERE clauses from a `KeySearchQuery`. 
The primary path is FTS on the key side; when FTS finds no hits and a query string was supplied, the service falls through to a trigram similarity match on `translations.value`. ```mermaid flowchart TD A[KeySearchQuery] --> B{Project resolves
for caller?} B -- no --> Z[NOT_FOUND] B -- yes --> C[Compose filters
namespace · tags · state] C --> D{Free-text query?} D -- no --> E[ORDER BY updated_at] D -- yes --> F[FTS on keys.search_vector] F -- hits > 0 --> G[ts_rank order] F -- hits = 0 --> H[Trigram % on translations.value] H --> I[similarity order] ``` Filter rules: - **Namespace** — single `namespace_id` equality; URL-safe kebab slug in, internal id at the boundary. - **Tag intersection** — `key_tags` join with `HAVING COUNT(DISTINCT tag_id) = :required`, so every requested tag must be present. - **State** — equality on `keys.state`; uses `idx_keys_state`. - **Pagination** — `LIMIT :lim OFFSET :off`; stable order via the composite `(rank DESC, id ASC)` or `(updated_at DESC, id ASC)`. ## Ranking - **FTS hits** return `ts_rank(search_vector, plainto_tsquery('simple', :q))` in `matchRank`. Higher is better; ~0.03–0.2 is typical for short corpora. - **Trigram hits** use the `<%` word-similarity operator and return `MAX(word_similarity(:q, value))` — `[0, 1]`, higher is better. Word-similarity is better suited than raw `similarity()` for the "query appears as a word inside a longer translation body" shape the UI typically wants. - **No-query browses** return `matchRank = 0f`. A single `KeySearchHit` carries the entity + its rank so the API layer can expose the score if the UX wants to surface it. ## Membership gating `search()` accepts a `callerExternalId` and requires org membership on the project. Non-members see `OrgException.NotFound("Project")` — the same shape as an unknown project, so the server never leaks project existence. Scope enforcement (e.g. `keys.read`) stays at the JAX-RS resource layer. ## Bench + smoke The integration test `backend/app/src/test/kotlin/io/translately/app/keys/KeySearchServiceIT.kt` seeds 10 keys / 3 tags / 7 translations and exercises every filter combination against a real Postgres container. No Postgres features are themselves tested — the goal is query composition, not PG correctness. 
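
The fall-through described under "Query composition" and the ordering described under "Ranking" can be sketched as plain control flow. This is an illustrative TypeScript model, not `KeySearchService` itself: the `Lookups` interface and its three functions are hypothetical stand-ins for the FTS, trigram, and browse queries.

```typescript
interface Hit {
  id: number;
  matchRank: number;
}

// Hypothetical stand-ins for the three database queries.
interface Lookups {
  fts(q: string): Hit[];
  trigram(q: string): Hit[];
  browse(): Hit[];
}

// Stable order: rank descending, then id ascending as the tie-breaker.
function byRank(a: Hit, b: Hit): number {
  return b.matchRank - a.matchRank || a.id - b.id;
}

function search(query: string | undefined, lookups: Lookups): Hit[] {
  const q = query ? query.trim() : "";
  if (q === "") {
    // No free-text query: plain browse, every hit carries matchRank 0.
    return lookups.browse().map((h) => ({ id: h.id, matchRank: 0 }));
  }
  const ftsHits = lookups.fts(q);
  // The trigram path runs only when FTS found nothing.
  const hits = ftsHits.length > 0 ? ftsHits : lookups.trigram(q);
  return hits.slice().sort(byRank);
}
```

The two scoring paths are never mixed in one result set, so `ts_rank` values and word-similarity values, which live on different scales, are never compared against each other.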
See also: [ADR 0003 — Postgres FTS over Elasticsearch for v1](decisions/0003-postgres-fts-over-elasticsearch.md). --- title: Webapp architecture parent: Architecture nav_order: 8 --- # Webapp architecture Translately's webapp is a single-page React application built with Vite, Tailwind, shadcn/ui primitives, and TanStack Query. It renders against the REST API described in [`docs/api/`](../api/) and persists nothing server-side that the API itself doesn't already persist. Introduced by: [T010](https://github.com/Pratiyush/translately/issues/10) (bootstrap), [T114](https://github.com/Pratiyush/translately/issues/24) (theme), [T115](https://github.com/Pratiyush/translately/issues/25) (app shell). Related: [product app-shell](../product/app-shell.md), [product theming](../product/theming.md), [auth architecture](auth.md). ## Stack | Layer | Choice | Why | |---|---|---| | Build | **Vite** | Fast dev server, ES modules, ubiquitous in modern React projects | | UI | **React + TypeScript** | Standard; paired with `strict: true` tsconfig | | Styling | **Tailwind + shadcn/ui primitives** | Token-driven theming without the maintenance tax of a CSS-in-JS runtime | | Icons | **Lucide** (only) | One coherent set; bans ad-hoc icon imports from mixing libraries | | Routing | **React Router** | Data loader story isn't needed yet; we use it for navigation + route gating | | Data | **TanStack Query** | Caches API responses, deduplicates fetches, retries on the happy path | | Forms | **React Hook Form + Zod** | Schema-first validation shared with backend where it makes sense | | Editor | **CodeMirror 6** (Phase 2) | ICU MessageFormat syntax support | | Motion | **Framer Motion** (respects `prefers-reduced-motion`) | | Tests | **Vitest + Testing Library + axe** for unit/component; Playwright for E2E (Phase 3+) | ## Directory layout ``` webapp/src ├── App.tsx, main.tsx, router.tsx entry points + route table ├── theme/ │ ├── ThemeProvider.tsx light/dark/system + persistence │ └── 
ThemeProvider.test.tsx ├── components/ │ ├── shell/ AppShell, TopBar, NavLinks, │ │ ├── AppShell.tsx OrgSwitcher, UserMenu (+ tests) │ │ ├── TopBar.tsx │ │ ├── NavLinks.tsx │ │ ├── OrgSwitcher.tsx │ │ └── UserMenu.tsx │ ├── routes/ One file per top-level route │ ├── ui/ Owned shadcn/Radix primitives │ │ (Avatar, Button, DropdownMenu, …) │ └── ThemeToggle.tsx Shell-adjacent but not shell-owned ├── lib/ │ ├── auth/ AuthStore + useAuth() hook │ └── utils.ts cn() and friends ├── i18n/ │ ├── en.json Canonical English strings │ └── index.ts t() helper ├── index.css Design-token declarations └── tests/ Test setup ``` ## State model The webapp intentionally keeps global state *small*. Three stores: 1. **`AuthStore`** — dependency-free external store implementing the `useSyncExternalStore` shape. Persists to `localStorage` and subscribes to cross-tab `storage` events so every open tab stays in sync. Holds `{ user, activeOrgId }` and nothing else. 2. **`ThemeProvider`** — React context; source of truth for `theme` (user-selected) and `resolved` (what's actually applied). See [product/theming](../product/theming.md) for the full flow. 3. **TanStack Query cache** — every API-backed thing. Components call `useQuery` / `useMutation`; the cache deduplicates, the retry policy is `retry: false` (fail fast, surface errors to the user). Nothing else sits in global state. Route-local state stays in components; ephemeral UI state stays in React state; server state lives in the Query cache. ## Routing [`router.tsx`](https://github.com/Pratiyush/translately/blob/master/webapp/src/router.tsx) declares: - `/signin` — public, renders outside the shell. - Everything else — inside ``. `RequireAuth` redirects to `/signin` preserving `location.state.from` so the real sign-in flow (T117) can return the user where they started. Phase 3 introduces org-scoped routes (`/{orgSlug}/…`) — that migration is owned by T306. 
Until then the shell is org-agnostic and the active org is held only in `AuthStore`. ## Component philosophy - **Own your primitives.** The `ui/` folder contains thin Radix wrappers (`Avatar`, `Button`, `DropdownMenu`). The webapp never imports `@radix-ui/*` outside this folder — keeping the API surface small and letting us swap behind the same public interface. - **i18n by default.** No user-visible string is hard-coded in a component. Every label / aria-label / error message goes through `t('…')`; the canonical catalogue is `webapp/src/i18n/en.json`. Tests assert against the English rendering for readability. - **Tokens, not colours.** All colour lives in `index.css` as HSL values behind `--*` custom properties. Components reach for Tailwind utility classes (`bg-background`, `text-foreground`) that resolve to `hsl(var(--…))` — theme switching is a `class="dark"` toggle on ``. ## Build + test ```bash pnpm --filter webapp dev # Vite dev server pnpm --filter webapp test # Vitest pnpm --filter webapp test:a11y # axe assertions under light + dark pnpm --filter webapp build # production build ``` Unit + component tests live next to the files they test. Playwright E2E (Phase 3) lives under `webapp/e2e/`. ## API client `webapp/src/lib/api/` houses the auto-generated TypeScript client (T120). The shape: - **`types.gen.ts`** — generated by [`openapi-typescript`](https://github.com/openapi-ts/openapi-typescript) from the committed `docs/api/openapi.json`. Never hand-edit. Regenerate with `pnpm --filter webapp codegen`; `pnpm codegen:check` fails CI if the committed file drifts. - **`client.ts`** — thin hand-written wrapper on top of [`openapi-fetch`](https://github.com/openapi-ts/openapi-typescript/tree/main/packages/openapi-fetch) (~1 KB gzipped). Exposes `createApiClient({ baseUrl, bearerToken, fetchImpl })` and a singleton `api` for the app; `unwrap()` converts the `{ data, error }` result tuple into a `data`-or-throw shape that feeds TanStack Query's error channel. 
- **`ApiRequestError`** — thrown by `unwrap()` on any 4xx / 5xx response; carries `status` + the uniform `error.code` / `error.message` / `error.details` envelope so components branch on `code`, not HTTP status. Editing a controller: regenerate the backend schema (`./gradlew :backend:app:copyOpenApi` — T113), regenerate webapp types (`pnpm --filter webapp codegen` — T120), commit both in the same PR. Two drift-checks (`./gradlew check` for the schema, `pnpm codegen:check` for the types) guarantee neither slips. ## Accessibility budget - Every route passes axe-clean under both light and dark themes — the global `App.test.tsx` asserts both. - Every icon-only control carries an explicit `aria-label`. - Focus rings are visible against every surface colour. - `prefers-reduced-motion` is honoured — we collapse transitions to 0 ms rather than removing transitions, so focus handling remains correct. - `prefers-color-scheme` is honoured when the user's theme choice is `system`. - Keyboard-only walk-throughs are part of the acceptance criteria on every user-facing ticket. ## Why this shape - **Static hosting is viable.** The webapp is a dumb SPA — it needs only a CDN and the API. This matters for self-hosters who want to front everything behind nginx / Caddy / Traefik. - **No server-side rendering.** We don't need SEO or cold-start latency: Translately is an authenticated tool. Avoiding SSR keeps the deploy story simple. - **No Redux, no Zustand, no MobX.** The three-store model above covers every real requirement; adding a redux-style tool would be a dependency without a user. - **Tailwind over CSS-in-JS.** Build-time class generation means zero-runtime styling; `class="dark"` toggling on `` means the theme switch is a single-pass repaint. See [`.kiro/steering/ui-conventions.md`](https://github.com/Pratiyush/translately/blob/master/.kiro/steering/ui-conventions.md) for the authoritative UI / accessibility steering rules. 
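
The `unwrap()` / `ApiRequestError` pairing from the "API client" section can be sketched as follows. This is a self-contained illustration, not the contents of `client.ts`: the `ApiResult` shape is an assumption modelled on the `{ data, error }` tuple described there, and the field layout of `ApiRequestError` is inferred from the prose.

```typescript
// Uniform error envelope returned by the API.
interface ErrorEnvelope {
  error: { code: string; message: string; details?: Record<string, unknown> };
}

// Assumed result shape: either data or error is set, plus the HTTP status.
interface ApiResult<T> {
  data?: T;
  error?: ErrorEnvelope;
  status: number;
}

class ApiRequestError extends Error {
  readonly status: number;
  readonly code: string;
  readonly details: Record<string, unknown> | undefined;

  constructor(status: number, envelope: ErrorEnvelope) {
    super(envelope.error.message);
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, ApiRequestError.prototype);
    this.name = "ApiRequestError";
    this.status = status;
    this.code = envelope.error.code;
    this.details = envelope.error.details;
  }
}

// Converts the result tuple into a data-or-throw shape.
function unwrap<T>(result: ApiResult<T>): T {
  if (result.error !== undefined) {
    throw new ApiRequestError(result.status, result.error);
  }
  return result.data as T;
}
```

A caller would then branch on `code` rather than on the HTTP status, for example `if (e instanceof ApiRequestError && e.code === "INSUFFICIENT_SCOPE")`, which is the behaviour the section describes.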
--- title: Home layout: default nav_order: 1 description: >- Translately — open-source, MIT-licensed, self-hosted localization and translation management. Keys, translations, ICU validation, i18next JSON import/export shipping in v0.3.0. permalink: / --- # Translately {: .fs-9 } **The open-source, self-hosted translation management platform for teams that ship in more than one language.** MIT. Every feature free. Keys, translations, ICU validation, JSON import/export shipping today. Bring-your-own-key AI arrives in Phase 4 — the platform runs end-to-end without it. {: .fs-5 .fw-300 } [Quickstart]({{ '/quickstart/' | relative_url }}){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } [API reference]({{ '/api/' | relative_url }}){: .btn .fs-5 .mb-4 .mb-md-0 .mr-2 } [GitHub](https://github.com/Pratiyush/translately){: .btn .fs-5 .mb-4 .mb-md-0 .mr-2 } [LLM corpus]({{ '/llms-full.txt' | relative_url }}){: .btn .fs-5 .mb-4 .mb-md-0 } --- ## What ships today — v0.3.0 (MVP) Translately v0.3.0 is the **end of the MVP**: Phases 0 through 3 are complete, the platform works end-to-end, every capability listed here is in `master` and running on a signed release. | Capability | What it means for you | |---|---| | **Email + password auth** with verified accounts, refresh-token rotation, forgot-password + reset | A translator can sign up, verify email, and sign in — nothing else needed to start. | | **Organizations, projects, members** with OWNER / ADMIN / MEMBER roles, last-owner protection, private-org semantics | Multi-tenant from day one. Each org scoped cleanly; non-members can't enumerate private orgs. | | **Keys, namespaces, translations** with the full 5-state lifecycle (`EMPTY → DRAFT → TRANSLATED → REVIEW → APPROVED`) | Translators get a sticky-col table with per-cell autosave; admins get namespaces to group keys by feature. | | **ICU MessageFormat validation** with CLDR plurals, line + col error reporting | Bad ICU is rejected at save time, not discovered in production. 
| | **Postgres full-text + trigram key search** | Find a key in a 10k-key project without dragging Elasticsearch into your self-host stack. | | **i18next JSON import + export** — flat and nested shapes, `KEEP` / `OVERWRITE` / `MERGE` conflict modes, per-row ICU validation | Paste your existing translations in, export them back out. One language per call; multi-language dumps as a scriptable GET. | | **API keys + Personal Access Tokens** with scope intersection computed on every request | CI pipelines + scripts authenticate with machine credentials; revocation is instant. | | **Light + dark + keyboard-first UI** — WCAG 2.1 AA, Radix Dialog + focus trap, `prefers-reduced-motion` respected | Accessible out of the box. No gated "enterprise a11y" SKU. | All of it MIT. All of it free. No paywalled tier exists. ## Coming next — v0.4.0 (Phase 4) - **Bring-your-own-key AI** — per-project OpenAI / Anthropic / Google / Azure / custom endpoints, envelope-encrypted at rest. **The platform runs end-to-end without this configured.** - **Machine translation** via the same BYOK layer for non-generative providers. - **Translation Memory** over `pgvector` + trigram. - **Async Quartz + SSE** for the bulk-import paths that v0.3.0 ships sync. [Full roadmap](#roadmap) below. --- ## Documentation by surface | Surface | Start here | |---|---| | [Quickstart]({{ '/quickstart/' | relative_url }}) | 10-minute path from `docker compose up` to your first exported JSON. | | [Product]({{ '/product/' | relative_url }}) | Walkthroughs of every user-visible flow — auth, orgs, keys table, editor, import wizard, export modal. | | [API reference]({{ '/api/' | relative_url }}) | OpenAPI spec + scope matrix + error catalogue + rate-limit policy + the imports/exports + keys endpoints. | | [Architecture]({{ '/architecture/' | relative_url }}) | Module map, data model (V1–V4), request lifecycle, multi-tenancy, crypto, ICU, search, ADRs. 
| | [Self-hosting]({{ '/self-hosting/' | relative_url }}) | Runtime profiles, dev compose, hardening checklist. Everything an operator needs. | Every PR ships its docs — see [CLAUDE.md rule #10](https://github.com/Pratiyush/translately/blob/master/CLAUDE.md#hard-rules-non-negotiable). Stale docs are worse than missing docs. ## Why Translately - **MIT, no gated tier.** SSO, SAML, LDAP, Tasks, Branching, Glossaries, Webhooks, CDN, custom storage, granular permissions, audit logs — all on the free shipping schedule. No "enterprise" upsell. - **BYOK is the only AI shape.** Per-project encryption key, envelope-sealed at rest, zero Translately-owned API keys in the loop. If AI is off, every feature except AI suggestions still works. - **Quarkus + Kotlin backend.** Fast boot, low memory, native-image friendly for the zero-cost deploy shape. - **Translator-first UI.** Sticky column, autosave, ⌘+↵ commit, Escape revert, 5-state badges, inline ICU errors. ## Roadmap Seven-phase plan. One signed minor-version tag per phase. **Phases 0–3 (MVP) are complete.** | Phase | Tag | Theme | Status | |---|---|---|---| | 0 | `v0.0.1` | Bootstrap — CI, repo, scaffolding | ✅ shipped 2026-04-17 | | 1 | `v0.1.0` | Auth + Org / Project + webapp shell | ✅ shipped 2026-04-18 | | 2 | `v0.2.0` | Keys + Translations + ICU | ✅ shipped 2026-04-19 | | 3 | `v0.3.0` | JSON import / export · **MVP** | ✅ shipped 2026-04-19 | | 4 | `v0.4.0` | BYOK AI + MT + Translation Memory | next | | 5 | `v0.5.0` | Screenshots + JS SDK + in-context editor | planned | | 6 | `v0.6.0` | Webhooks + CDN + CLI + glossaries | planned | | 7 | `v1.0.0` | Tasks + Branching + SSO / SAML / LDAP + audit | planned | See the [CHANGELOG](https://github.com/Pratiyush/translately/blob/master/CHANGELOG.md) for per-PR detail and [RELEASE-NOTES](https://github.com/Pratiyush/translately/blob/master/RELEASE-NOTES.md) for the long-form narratives. 
## Download - [Full docs bundle (ZIP)]({{ '/downloads/translately-docs.zip' | relative_url }}) — every page under `docs/`, deterministic snapshot of the latest `master`. - [LLM corpus (single file)]({{ '/llms-full.txt' | relative_url }}) — every `.md` concatenated with file-boundary markers, per [llmstxt.org](https://llmstxt.org). For Claude, Cursor, in-house assistants. - [Link index]({{ '/llms.txt' | relative_url }}) — the short llms.txt discovery file. - Container images: `ghcr.io/pratiyush/translately-backend:v0.3.0` + `ghcr.io/pratiyush/translately-webapp:v0.3.0` (published by `release.yml` on every signed tag). ## License - **Outbound:** [MIT](https://github.com/Pratiyush/translately/blob/master/LICENSE). Use it, fork it, ship it, sell it — no strings. - **Inbound:** [Contributor License Agreement](https://github.com/Pratiyush/translately/blob/master/CLA.md) (Apache-ICLA-adapted, copyright-license form — contributor retains ownership). Every PR carries a ticked CLA checkbox. --- title: Product nav_order: 2 has_children: true permalink: /product/ --- # Product docs User-facing documentation of every shipped feature. One page per user-visible flow, with light + dark screenshots and keyboard / a11y notes. Per [CLAUDE.md rule #10](https://github.com/Pratiyush/translately/blob/master/CLAUDE.md), every PR that changes a user-visible flow lands its matching page here in the same PR. 
## Index

*Pages are added per ticket — see the per-phase back-fill milestones for the current list.*

- [Application shell](app-shell.md) — nav, org switcher, user menu, routing _(added by T115)_
- [Theming](theming.md) — light / dark / system toggle, tokens, persistence _(added by T114)_
- [Authentication](auth.md) — signup, email verify, login, password reset _(added by T103)_
- [API keys & PATs](api-keys-and-pats.md) — project-scoped API keys and user-scoped Personal Access Tokens _(added by T110)_
- [Organizations, projects, and members](organizations-and-projects.md) — real CRUD UI for tenants, projects, and role-based membership _(added by T118 + T119)_
- [Screenshots](screenshots.md) — capture workflow, regen command, light + dark `` embed pattern

## Conventions

- **One page per user-visible capability.** Not per ticket — a single feature spanning multiple tickets gets one page that is updated in-place.
- **Screenshots in both themes.** Every screenshot is taken in light and dark modes. File names: `foo-light.png`, `foo-dark.png`.
- **Keyboard section.** Every page lists the keyboard shortcuts that touch the feature, plus any ARIA landmarks used.
- **Code blocks are runnable.** No pseudo-code. If a request body is shown, it works against the current API.
- **Link to the CHANGELOG entry** that first introduced the feature.

---
title: API keys & Personal Access Tokens
parent: Product
nav_order: 4
---

# API keys & Personal Access Tokens

Translately ships two long-lived credential types for server-to-server and CLI use:

| Credential | Scope of ownership | Used by |
|---|---|---|
| **API key** | a single project | CI jobs, deploy pipelines, anything that acts on behalf of a project |
| **Personal Access Token (PAT)** | a single user, across every project they belong to | the CLI, personal scripts, integrations that act "as the user" |

Both are minted from the Translately REST API (UI lands later in Phase 1).
Secrets are **shown exactly once** at mint time, stored only as Argon2id hashes, and can be revoked at any time without affecting other credentials.

Introduced by: [T110](https://github.com/Pratiyush/translately/issues/28) · Ships in `v0.1.0`.

Related: [API auth endpoints](../api/auth.md), [scopes](../api/scopes.md), [error codes](../api/errors.md).

## Token format

Both credential types share the same shape:

```
tr_<kind>_<8-char-prefix>.<43-char-secret>
└───── public prefix ───┘ └─── secret ───┘
```

- `tr_ak_…` — API key (project-scoped)
- `tr_pat_…` — Personal Access Token (user-scoped)

The **public prefix** is stored in the database and shown in listings so you can recognise your keys at a glance. The **secret** half is Argon2id-hashed before persistence; the plaintext is only in the response to the mint call.

A full token looks like:

```
tr_ak_k9c4n2xb.a1B2c3D4e5F6g7H8i9J0k1L2m3N4o5P6q7R8s9T0u1V
└── prefix ──┘ └─────────────── secret (43 chars) ────────┘
```

Present it on API requests as:

```
Authorization: ApiKey tr_ak_k9c4n2xb.a1B2c3D4…
Authorization: Bearer tr_pat_k9c4n2xb.a1B2c3D4…
```

## Minting an API key

`POST /api/v1/projects/{projectId}/api-keys` — requires the `api-keys.write` scope in the project's organization.

```bash
curl -X POST https://your-host/api/v1/projects/01HT…/api-keys \
  -H "Authorization: Bearer $ACCESS_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "CI publisher",
    "scopes": ["keys.read", "keys.write", "translations.write", "imports.write"]
  }'
```

Successful response (`201 Created`):

```json
{
  "id": "01HT…",
  "prefix": "tr_ak_k9c4n2xb",
  "secret": "tr_ak_k9c4n2xb.a1B2c3D4e5F6g7H8i9J0k1L2m3N4o5P6q7R8s9T0u1V",
  "name": "CI publisher",
  "scopes": ["imports.write", "keys.read", "keys.write", "translations.write"],
  "expiresAt": null,
  "createdAt": "2026-04-18T10:45:00Z"
}
```

**Save `secret` now.** The server will never show it again.
Store it in your CI provider's secret vault (GitHub Actions secret, GitLab CI variable, HashiCorp Vault, …).

### Scope intersection

You can only mint a key with scopes **you already hold**. Asking for a scope outside your current set returns:

```
403 Forbidden
{
  "error": {
    "code": "SCOPE_ESCALATION",
    "details": {
      "requested": ["audit.read", "keys.write"],
      "held": ["keys.write"],
      "missing": ["audit.read"]
    }
  }
}
```

A MEMBER minting an API key can only pass the MEMBER scope set. An ADMIN can pass any ADMIN scope. This rule keeps API keys from becoming an escalation vector.

### Optional expiry

Pass `expiresAt` (ISO-8601 UTC) to mint a key that self-revokes after the given time:

```json
{
  "name": "1-day CI token",
  "scopes": ["keys.read", "keys.write"],
  "expiresAt": "2026-04-19T10:00:00Z"
}
```

## Listing API keys

`GET /api/v1/projects/{projectId}/api-keys` — requires `api-keys.read`.

```json
{
  "data": [
    {
      "id": "01HT…",
      "prefix": "tr_ak_k9c4n2xb",
      "name": "CI publisher",
      "scopes": ["imports.write", "keys.read", "keys.write", "translations.write"],
      "expiresAt": null,
      "lastUsedAt": "2026-04-17T22:14:00Z",
      "revokedAt": null,
      "createdAt": "2026-04-18T10:45:00Z"
    }
  ]
}
```

Secrets are **never** in listings — only the public prefix. If you've lost the secret, revoke this key and mint a fresh one.

## Revoking an API key

`DELETE /api/v1/projects/{projectId}/api-keys/{keyId}` — requires `api-keys.write`. Returns `204 No Content`.

- **Idempotent.** Revoking a revoked key is a no-op — still `204`, no error.
- **Immediate.** Once revoked, the key fails authentication on the next request.

## Personal Access Tokens

Same shape, different audience. PATs belong to a **user** and span every project that user is a member of. They're what you'd use for a personal CLI setup or a one-off integration where a user's identity makes more sense than a project identity.
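
Because both credential types share one token shape, a client can tell them apart from the prefix alone and pick the matching `Authorization` scheme. The following is a minimal sketch of that idea, not Translately's own parser: `parse_token` and `authorization_header` are hypothetical helper names, and the character classes in the pattern are an assumption, since this page fixes only the lengths (8-character prefix, 43-character secret).

```python
import re
from typing import NamedTuple, Optional

# Assumption: the docs fix the lengths (8 + 43) but not the alphabet.
# Alphanumerics are assumed for the prefix, URL-safe base64 for the secret.
TOKEN_RE = re.compile(r"tr_(ak|pat)_([A-Za-z0-9]{8})\.([A-Za-z0-9_-]{43})")


class ParsedToken(NamedTuple):
    kind: str           # "ak" (API key) or "pat" (Personal Access Token)
    public_prefix: str  # e.g. "tr_ak_k9c4n2xb", the part shown in listings
    secret: str         # the 43-char half that is only returned at mint time


def parse_token(token: str) -> Optional[ParsedToken]:
    """Split a token into kind, public prefix, and secret; None if malformed."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        return None
    kind, prefix, secret = match.groups()
    return ParsedToken(kind, f"tr_{kind}_{prefix}", secret)


def authorization_header(token: str) -> str:
    """Build the header line: ApiKey for tr_ak_ tokens, Bearer for tr_pat_ tokens."""
    parsed = parse_token(token)
    if parsed is None:
        raise ValueError("not a Translately API key or PAT")
    scheme = "ApiKey" if parsed.kind == "ak" else "Bearer"
    return f"Authorization: {scheme} {token}"
```

Calling `parse_token` on the example token from the Token format section yields the public prefix `tr_ak_k9c4n2xb`, the same value the listing endpoint shows, which is handy for matching a stored secret to a row in the listing.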
### Minting a PAT

`POST /api/v1/users/me/pats` — no scope required beyond a valid access JWT; users can always manage their own credentials.

```bash
curl -X POST https://your-host/api/v1/users/me/pats \
  -H "Authorization: Bearer $ACCESS_JWT" \
  -H "Content-Type: application/json" \
  -d '{"name":"laptop cli", "scopes":["keys.read", "keys.write"]}'
```

Response is identical in shape to the API-key mint, except the prefix is `tr_pat_…`. Same **scope intersection** rule: the PAT's scopes must be a subset of the caller's JWT scopes.

### Listing / revoking PATs

- `GET /api/v1/users/me/pats` — list your own PATs (summaries only).
- `DELETE /api/v1/users/me/pats/{patId}` — revoke one of your PATs.

Trying to revoke someone else's PAT returns `404 NOT_FOUND` — the server never discloses whether the referenced PAT exists.

## Operational guidance

- **Rotate on a schedule.** Mint a new key, roll the CI secret, revoke the old. There's no "rotate-in-place" endpoint — the one-time-secret model makes a clean rotation trivially easier than a hot rename.
- **Prefer short-lived keys where possible.** Pass `expiresAt` in the CI flow so abandoned branches don't leave stale long-lived credentials behind.
- **Least-scoped keys.** A CI job that only publishes translations doesn't need `project-settings.write`. Grant the minimum.
- **Detect compromise.** Watch `lastUsedAt` in the UI (arrives in Phase 1's webapp). A key that hasn't been used in months + an unexpected `lastUsedAt` bump → revoke.

## Authentication on protected endpoints

Authentication is **live** — both credential types are accepted on every protected endpoint alongside access JWTs.
Present the full token exactly as it was returned at mint time:

```bash
# API key
curl -H "Authorization: ApiKey tr_ak_k9c4n2xb.a1B2c3D4…" \
  https://your-host/api/v1/projects/01HT…/keys

# Personal Access Token
curl -H "Authorization: Bearer tr_pat_k9c4n2xb.a1B2c3D4…" \
  https://your-host/api/v1/organizations/acme/projects
```

The backend dispatches on the header shape:

- `Authorization: ApiKey tr_ak_…` → API-key authenticator; scopes taken from the stored `api_keys.scopes` column.
- `Authorization: Bearer tr_pat_…` → PAT authenticator; scopes intersected with the owning user's **current** effective scopes (see below).
- `Authorization: Bearer …` (anything else) → normal JWT access-token flow.

Every request is scoped by exactly one credential. Presenting two credentials on the same request (e.g. a JWT header plus an API-key query parameter) is refused at the HTTP layer — there's no merging of grants.

### PAT scope intersection at request time

The scopes a PAT was minted with are an **upper bound**. On every request the authenticator recomputes the owning user's effective scope set from their current `OrganizationMember` rows and intersects. Practical consequence:

- Mint a PAT with `keys.write translations.write` while you're an ADMIN of org X.
- You are demoted to MEMBER of org X (ADMIN is a superset of MEMBER, and MEMBER *does* hold `keys.write` + `translations.write`) — the PAT keeps working.
- You are demoted to MEMBER of org Y where you originally held ADMIN, and the PAT also carried `api-keys.write` — `api-keys.write` is ADMIN-only, so that scope is dropped from the request's effective set, and any endpoint that requires it will 403. Other scopes the MEMBER still holds continue to work.
- You are removed from every org you belong to → the PAT's effective scope set collapses to empty, every protected endpoint returns 403. Revoke the PAT if you want a cleaner "not authenticated" answer.

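
The request-time intersection described above can be sketched as a plain set operation. This is an illustration under stated assumptions, not the backend's authenticator: the role table is hypothetical and models only the three scopes named in the bullets, a single organization is assumed, and `effective_scopes` and `authorize` are made-up helper names.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical, partial role -> scope table for illustration only.
# Only the scopes named in the bullets above are modelled; the real
# matrix is documented in the scopes reference.
MEMBER_SCOPES: FrozenSet[str] = frozenset({"keys.write", "translations.write"})
ADMIN_SCOPES: FrozenSet[str] = MEMBER_SCOPES | {"api-keys.write"}
ROLE_SCOPES: Dict[str, FrozenSet[str]] = {
    "MEMBER": MEMBER_SCOPES,
    "ADMIN": ADMIN_SCOPES,
}


def effective_scopes(minted: FrozenSet[str], current_role: Optional[str]) -> FrozenSet[str]:
    """Intersect the PAT's minted scopes with what the user's role grants now.

    current_role is None when the user no longer belongs to the organization.
    """
    held_now = ROLE_SCOPES.get(current_role, frozenset()) if current_role else frozenset()
    return minted & held_now


def authorize(required: str, minted: FrozenSet[str], current_role: Optional[str]) -> int:
    """Return the HTTP status a scope check would produce: 200 or 403."""
    return 200 if required in effective_scopes(minted, current_role) else 403


# A PAT minted while the user was an ADMIN.
pat = frozenset({"keys.write", "translations.write", "api-keys.write"})

print(sorted(effective_scopes(pat, "ADMIN")))      # all three minted scopes survive
print(sorted(effective_scopes(pat, "MEMBER")))     # api-keys.write is dropped
print(authorize("api-keys.write", pat, "MEMBER"))  # 403
print(authorize("keys.write", pat, None))          # 403: removed from the org
```

The minted set never grows at request time; it can only shrink as the owner's role changes, which is why it acts as an upper bound.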

API keys don't re-intersect — they're project-scoped, and the minting admin already enforced intersection at issue time. Revocation or a past `expires_at` is the only way to cut an API key off.

### Failure modes

| HTTP | `error.code` | Meaning |
|---|---|---|
| 401 | `UNAUTHENTICATED` | Unknown prefix, bad secret, or malformed token. Intentionally indistinguishable so attackers can't probe the prefix space. |
| 401 | `CREDENTIAL_REVOKED` | `revoked_at` has been stamped on the row. |
| 401 | `CREDENTIAL_EXPIRED` | `expires_at` has passed. |
| 403 | `INSUFFICIENT_SCOPE` | Credential is valid but lacks the scope(s) the endpoint requires. |

Introduced by: [T110-enforce](https://github.com/Pratiyush/translately/issues/149).

## Error-code reference

| HTTP | `error.code` | When |
|---|---|---|
| `201` | — | Credential minted successfully |
| `200` | — | Listing returned |
| `204` | — | Revoke succeeded (or was already revoked) |
| `400` | `VALIDATION_FAILED` | Missing name, empty scopes, past expiry |
| `400` | `UNKNOWN_SCOPE` | A requested scope token isn't in [Scope](../api/scopes.md) |
| `401` | `UNAUTHENTICATED` | No credential on the request, unknown prefix, or bad secret |
| `401` | `CREDENTIAL_REVOKED` | API key / PAT exists and secret matches, but `revoked_at` is set |
| `401` | `CREDENTIAL_EXPIRED` | API key / PAT exists and secret matches, but `expires_at` has passed |
| `403` | `SCOPE_ESCALATION` | Mint request asked for a scope the caller doesn't hold |
| `403` | `INSUFFICIENT_SCOPE` | Valid credential, but the scope required by the endpoint isn't in its effective set |
| `404` | `NOT_FOUND` | Project / PAT / API key not found (or not owned by caller) |

See the [full catalogue](../api/errors.md) for response envelopes.

## Changelog

Shipped in [Unreleased](https://github.com/Pratiyush/translately/blob/master/CHANGELOG.md) (Phase 1, T110). Lands with `v0.1.0`.

---
title: Application shell
parent: Product
nav_order: 1
# See docs/product/auth.md for why this permalink overrides pretty.
permalink: /product/app-shell.html
---

# Application shell

The application shell is the persistent chrome every authenticated route renders inside — the top bar, primary navigation, org switcher, and user menu. Route changes swap only the inner `<main>` region, so focus, scroll position, and transient state stay put.

Introduced by: [T115](https://github.com/Pratiyush/translately/issues/138) · Ships in `v0.1.0` · Source: [`webapp/src/components/shell/`](https://github.com/Pratiyush/translately/blob/master/webapp/src/components/shell/).

Related: [theming](theming.md), [authentication](auth.md), [webapp architecture](../architecture/webapp.md).

## Anatomy

```
┌──────────────────────────────────────────────────────────────────────┐
│ [✦ Translately] [▾ Acme Corp]    Dashboard Orgs Projects      [☼] [◌]│  ← <header> TopBar
└──────────────────────────────────────────────────────────────────────┘
│                                                                      │
│                           (route content)                            │  ← <main>
│                                                                      │
```

- **Left group** — brand link (returns to `/`), vertical divider, **OrgSwitcher**.
- **Center** — **NavLinks** (Dashboard, Orgs, Projects). Hidden below `md` breakpoint; the nav collapses in favour of the brand on small screens.
- **Right group** — **ThemeToggle**, **UserMenu**.

The `<header>` is the single [`banner`](https://www.w3.org/TR/wai-aria-1.2/#banner) landmark; the `<main>` is the single [`main`](https://www.w3.org/TR/wai-aria-1.2/#main) landmark with `tabIndex={-1}` so a "skip to content" link can focus it.

### Screenshot

Authenticated app shell on the dashboard route: brand, org switcher, primary nav, theme toggle, and user avatar in the top bar.

## OrgSwitcher

Replaces the brand-only header on every page. Three states:

1. **No orgs** — the component collapses to a `+ Create organization` CTA that routes to `/orgs`. Shown for brand-new signups until they either create or accept an invite.
2. **One or more orgs, one active** — trigger shows the active org's name and a chevron. Click / Enter opens a dropdown menu listing **every** org alphabetically; the active one has a check icon, each row shows a right-aligned role badge (OWNER / ADMIN / MEMBER).
3. **Org selected** — clicking a row sets it active in-memory (`AuthStore.setActiveOrg`). Phase 1 deliberately does not reflect the active org into the URL; org-scoped routes land in Phase 3 (T306).

Keyboard + a11y:

- The trigger is a real `