# Clutter

> Clutter generates believable synthetic companies and the documents, spreadsheets, emails, images and datasets they would really have. Use it to fill dev/test/demo systems (SharePoint, CRMs, file shares) with realistic content, or to give AI agents believable data to reason over — without using real data.

Clutter is fully operable over a documented REST API and a Model Context Protocol (MCP) server, so an AI agent can drive the whole pipeline end to end: invent a company, generate a batch of documents/data/metadata grounded in it, and download the files (with the folder tree preserved) ready to load into a target system.

Clutter is multi-tenant and pay-as-you-go. New accounts get 10 free documents and 100 free data rows; after that, usage draws down prepaid credit. "Ask the company", company builds and metadata generation are free.

## The five modes
- Company builder — prompt -> a fictional company "world bible" (prose story + structured org.json of teams/people). Free.
- Document generator — fan out a batch of believable documents (docx, pdf, xlsx, eml, jpg) across the company, organised into folders. $0.10 USD/document after the free allowance.
- Data generator — one tabular dataset (xlsx/csv/json) with the exact row count you ask for. $0.06 USD/10 rows.
- Metadata generator — one metadata record per document of an existing document run (maps to content-system columns). Free.
- Ask the company — synchronous Q&A grounded in the world bible. Free: never billed, although each call counts toward the per-key generation quota (see Limits).
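The prices above make it straightforward to budget a batch before running it. The following is a rough sketch only: it assumes rows are billed pro rata at $0.006 each (the source states only "$0.06 USD/10 rows"), that the new-account allowance of 10 documents and 100 rows is still untouched, and the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

DOC_PRICE = Decimal("0.10")       # USD per document, from the list above
ROW_PRICE = Decimal("0.06") / 10  # USD per row, ASSUMING pro-rata billing of "$0.06 / 10 rows"


def estimate_cost(docs: int, rows: int, free_docs: int = 10, free_rows: int = 100) -> Decimal:
    """Estimate the USD cost of a batch, assuming an unused new-account allowance."""
    billable_docs = max(0, docs - free_docs)
    billable_rows = max(0, rows - free_rows)
    total = billable_docs * DOC_PRICE + billable_rows * ROW_PRICE
    return total.quantize(Decimal("0.01"))


# 50 documents and 1000 rows: 40 billable documents plus 900 billable rows.
print(estimate_cost(docs=50, rows=1000))  # prints 9.40
```

Actual charges are determined by the service, so treat the result as a planning figure.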

## API
- [OpenAPI 3 spec](https://clutter.run/api/docs/json): the machine-readable definition of every endpoint — fetch this first.
- [Interactive API docs (Swagger UI)](https://clutter.run/api/docs): human-browsable reference.

## Authentication
All `/api/*` endpoints take `Authorization: Bearer <token>`, where the token is either an API key (a string of the form `clt_live_...`) or a browser Cognito ID token. For programmatic or agent use, mint an API key: sign in at https://clutter.run, open Settings, and create a key (it is shown only once). Send it as `Authorization: Bearer clt_live_...`.
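As a minimal sketch of the header described above, the helper below builds an authenticated request with Python's standard library without sending it. The base URL is inferred from the documentation links, and the helper name `build_request`, the key `clt_live_example` and the id `org_123` are placeholders, not real values.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://clutter.run"  # inferred from the API docs URLs; adjust if yours differs


def build_request(method: str, path: str, token: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request carrying the Bearer token."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Placeholder key and org id, for illustration only.
req = build_request("GET", "/api/orgs/org_123", "clt_live_example")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the result to `urllib.request.urlopen(req)` would send it; in practice, read the key from an environment variable rather than hard-coding it.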

## End-to-end lifecycle (REST)
The build/generate operations are asynchronous: POST returns 202 with an id, then you poll a GET until the status is terminal.
1. Build a company: `POST /api/orgs` `{ "prompt": "<describe the company>", "locale_language": "en-US", "target_systems": ["SharePoint"], "web_search": true }` -> `202 { orgId }`. (Omit projectId to auto-create a project. `target_systems` weaves the named systems into the company's story; `web_search` defaults to true and lets the builder ground details in real-world facts — set false to disable.)
2. Poll: `GET /api/orgs/{orgId}` every ~3-5s until `status` is `ready` (or `failed`).
3. Generate documents: `POST /api/runs` `{ "orgId", "kind": "doc_generator", "params": { "prompt": "<what to generate>", "doc_number": 20, "file_types": ["docx","pdf","xlsx","eml","jpg"], "structure": "nested", "target_system": "SharePoint" } }` -> `202 { runId }`.
4. Poll: `GET /api/runs/{runId}` every ~5-10s until `status` is `complete` | `partial` | `failed`.
5. List artefacts: `GET /api/runs/{runId}/documents`.
6. Download: per file `GET /api/documents/{documentId}/url` (presigned URL), or the whole run as a ZIP — `POST /api/runs/{runId}/zip` then `GET /api/runs/{runId}/zip/url`. The ZIP preserves the folder tree, ready to drop into SharePoint or a file share.
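Steps 2 and 4 share the same poll-until-terminal pattern, which can be sketched as one helper. This is an illustration under stated assumptions: `poll_until` is a hypothetical name, the timeout value is arbitrary, and the HTTP call is injected as a `fetch` callable so the loop can be shown (and tested) offline. The intermediate status `"building"` in the demonstration is a made-up placeholder; only the terminal statuses come from the steps above.

```python
import time
from typing import Callable, Iterable


def poll_until(fetch: Callable[[], dict], terminal: Iterable[str],
               interval: float = 5.0, timeout: float = 1800.0,
               sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() repeatedly until its 'status' field is one of the terminal values."""
    terminal_states = set(terminal)
    deadline = time.monotonic() + timeout
    while True:
        resource = fetch()
        if resource.get("status") in terminal_states:
            return resource
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no terminal status after {timeout} seconds")
        sleep(interval)


# Offline demonstration: a fake fetch that reports a terminal status on the third call.
fake_statuses = iter(["building", "building", "ready"])
org = poll_until(lambda: {"status": next(fake_statuses)},
                 terminal={"ready", "failed"},
                 sleep=lambda seconds: None)
print(org["status"])  # prints ready
```

For a company build, `fetch` would wrap `GET /api/orgs/{orgId}` with terminal statuses `ready` and `failed`; for a run it would wrap `GET /api/runs/{runId}` with `complete`, `partial` and `failed`. Check the returned status before continuing, since a terminal status is not necessarily a successful one.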

### Other run kinds (step 3)
- Data: `"kind": "data_generator"`, params `{ "prompt", "row_count": 200, "data_format": "xlsx"|"csv"|"json", "data_fields"?, "target_system"? }`.
- Metadata (1 record per doc of an existing doc run): `"kind": "doc_metadata_gen"`, params `{ "sourceRunId": "<a doc_generator runId>", "data_format", "data_fields"?, "target_system"? }`. `sourceRunId` is the document run to describe.
- Document continuity: a `doc_generator` run may include `"reference_run_id": "<a prior doc_generator runId of the same company>"` so the new batch fits alongside the earlier one (planning only — the reference is never copied into generated bodies).
- Ask the company (synchronous, free): `POST /api/orgs/{orgId}/query` `{ "prompt": "<question>" }`.
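The request bodies for the data and metadata run kinds can be assembled as plain dictionaries before posting to `/api/runs`. The sketch below assumes the same envelope as the document run in step 3 (`orgId`, `kind`, `params`), that metadata runs accept the same three `data_format` values as data runs, and that a run needs at least one row. The helper names and the ids `org_123` and `run_456` are hypothetical, and `data_fields` is omitted because its exact shape is not described here.

```python
import json
from typing import Optional

ALLOWED_FORMATS = {"xlsx", "csv", "json"}
MAX_ROWS_PER_RUN = 2000  # per-run limit stated in this document


def data_run_body(org_id: str, prompt: str, row_count: int,
                  data_format: str = "xlsx", target_system: Optional[str] = None) -> dict:
    """Body for POST /api/runs with kind data_generator."""
    if data_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported data_format: {data_format}")
    if not 1 <= row_count <= MAX_ROWS_PER_RUN:
        raise ValueError(f"row_count must be between 1 and {MAX_ROWS_PER_RUN}")
    params = {"prompt": prompt, "row_count": row_count, "data_format": data_format}
    if target_system:
        params["target_system"] = target_system
    return {"orgId": org_id, "kind": "data_generator", "params": params}


def metadata_run_body(org_id: str, source_run_id: str, data_format: str = "json",
                      target_system: Optional[str] = None) -> dict:
    """Body for POST /api/runs with kind doc_metadata_gen."""
    if data_format not in ALLOWED_FORMATS:  # ASSUMPTION: same formats as data runs
        raise ValueError(f"unsupported data_format: {data_format}")
    params = {"sourceRunId": source_run_id, "data_format": data_format}
    if target_system:
        params["target_system"] = target_system
    return {"orgId": org_id, "kind": "doc_metadata_gen", "params": params}


print(json.dumps(data_run_body("org_123", "customer orders for 2024", 200, "csv"), indent=2))
print(json.dumps(metadata_run_body("org_123", "run_456", target_system="SharePoint"), indent=2))
```

Validating locally avoids spending a quota-counted `POST /api/runs` call on a request that would be rejected anyway.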

Limits: up to 300 documents and 2000 data rows per run. Each API key has a rate limit of 60 requests/min, plus a rolling 24-hour quota (default 1000) that applies to generation operations only: `POST /api/orgs`, `POST /api/orgs/{id}/query` and `POST /api/runs`. GET polling, downloads and zip requests do NOT count toward the quota, so polling your own async runs is free (see the `X-RateLimit-*` / `X-Quota-*` response headers). A `402` on `POST /api/runs` means insufficient credit; top up at https://clutter.run/billing.
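Two small helpers illustrate working within these limits: splitting a large document request into runs of at most 300, and pulling the rate-limit and quota headers out of a response. Both names (`split_into_runs`, `limit_headers`) are hypothetical. Because the exact header names behind `X-RateLimit-*` and `X-Quota-*` are not listed here, the second helper matches on the prefix only.

```python
from typing import Dict, List, Mapping

MAX_DOCS_PER_RUN = 300  # per-run limit stated above


def split_into_runs(total_docs: int, per_run: int = MAX_DOCS_PER_RUN) -> List[int]:
    """Return the doc_number to request in each run so that no run exceeds the limit."""
    if total_docs < 0 or per_run < 1:
        raise ValueError("total_docs must be >= 0 and per_run must be >= 1")
    full_runs, remainder = divmod(total_docs, per_run)
    return [per_run] * full_runs + ([remainder] if remainder else [])


def limit_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Keep only the headers whose names start with X-RateLimit- or X-Quota-."""
    prefixes = ("x-ratelimit-", "x-quota-")
    return {name: value for name, value in headers.items()
            if name.lower().startswith(prefixes)}


print(split_into_runs(750))  # prints [300, 300, 150]
```

Each run produced by the split is one `POST /api/runs` call, so 750 documents would use three of the rolling 24-hour quota.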

## Recipe: load a generated run into SharePoint (Microsoft Graph)
A `doc_generator` run lays files out under `folderPath` and offers a whole-run ZIP that preserves the tree. To land them in a SharePoint document library:
1. Generate with `structure: "nested"`, `target_system: "SharePoint"`; wait until the run is `complete`.
2. `GET /api/runs/{runId}/documents` for the file list (title, fileFormat, folderPath); fetch bytes via `GET /api/documents/{id}/url`, or the whole tree via `POST /api/runs/{runId}/zip` then `GET /api/runs/{runId}/zip/url`.
3. With a Microsoft Graph token (app registration, `Sites.ReadWrite.All`): resolve the library `drive-id` (`GET /sites/{host}:/sites/{site}` → `GET /sites/{site-id}/drives`), then upload each file recreating its folderPath: `PUT https://graph.microsoft.com/v1.0/drives/{drive-id}/root:/{folderPath}/{filename}:/content` (Graph auto-creates folders; use an upload session for files >4 MB).
4. Optional: run `doc_metadata_gen` against the same runId for one metadata record per document, then `PATCH .../items/{item-id}/listItem/fields` to set the SharePoint columns (author, department, sensitivity, dates).
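The upload URL in step 3 needs each path segment percent-encoded, which is easy to get wrong for names with spaces. A minimal sketch follows, assuming `folderPath` uses forward slashes and that you have already decided on the filename (how a filename is derived from `title` and `fileFormat` is not specified here). The function name and the drive id `b!abc` are placeholders.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def graph_upload_url(drive_id: str, folder_path: str, filename: str) -> str:
    """Build the simple-upload URL for one file, preserving its folder path."""
    # Drop empty segments so leading/trailing slashes in folder_path are harmless.
    segments = [part for part in folder_path.split("/") if part] + [filename]
    encoded = "/".join(quote(segment, safe="") for segment in segments)
    return f"{GRAPH_BASE}/drives/{drive_id}/root:/{encoded}:/content"


print(graph_upload_url("b!abc", "Finance/2024 Budgets", "Q1 Report.docx"))
# prints https://graph.microsoft.com/v1.0/drives/b!abc/root:/Finance/2024%20Budgets/Q1%20Report.docx:/content
```

The file bytes are then sent with `PUT` to this URL using the Graph token; larger files go through an upload session instead, as noted in step 3.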

Metadata can do more than fill columns. `data_fields` is schema-by-example — request decision fields (`sensitivity`, `retention_label`, `access_group`, `site`, `doc_library`…) and use the coherent per-document values to drive automation: apply sensitivity labels, set item permissions, assign retention, or route flat files to the right site/library instead of fixed folders (flat + metadata is often better than nested folders for automation). You can even generate the information architecture itself as a dataset and provision sites/libraries from it. Worked examples (basic dump → columns → label/permission/retention triggers → flat+metadata routing → generated IA): https://clutter.run/sharepoint-cookbook.md
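The flat-plus-metadata routing idea can be sketched as a grouping step over the generated metadata records. Everything about the record shape here is an assumption: the records are taken to be dictionaries carrying the `site` and `doc_library` fields you requested via `data_fields`, and the `title` values and site names are made-up sample data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def route_by_metadata(records: Iterable[dict],
                      keys: Sequence[str] = ("site", "doc_library")) -> Dict[Tuple[str, ...], List[dict]]:
    """Group metadata records by destination; records missing a key go to 'unassigned'."""
    routes: Dict[Tuple[str, ...], List[dict]] = defaultdict(list)
    for record in records:
        destination = tuple(str(record.get(key, "unassigned")) for key in keys)
        routes[destination].append(record)
    return dict(routes)


# Made-up sample records, shaped as ASSUMED above.
sample = [
    {"title": "Budget FY24", "site": "Finance", "doc_library": "Budgets"},
    {"title": "Offer letter", "site": "HR", "doc_library": "Hiring"},
    {"title": "Forecast Q3", "site": "Finance", "doc_library": "Budgets"},
]
for destination, docs in route_by_metadata(sample).items():
    print(destination, [doc["title"] for doc in docs])
```

Each group can then be uploaded to its own site and library, with the remaining fields (such as `sensitivity` or `retention_label`) applied per item.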

## MCP server (recommended for agents)
The Clutter MCP server wraps this REST API as agent tools (create a company, generate documents/data/metadata, wait-for-ready / wait-for-complete waiters, list and download artefacts), so an assistant can operate Clutter natively. Package: `clutter-mcp` (run with `npx -y clutter-mcp`). Only `CLUTTER_API_KEY=clt_live_...` is required (`CLUTTER_API_URL` defaults to `https://clutter.run/api`).
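Many MCP clients register servers through a JSON file with an `mcpServers` object; whether your client uses that convention is an assumption, so check its documentation. Under that assumption, the sketch below generates such an entry from the command and environment variables named above. The function name and the server label `clutter` are hypothetical, and the key is left as the elided placeholder.

```python
import json
from typing import Optional


def mcp_client_config(api_key: str, api_url: Optional[str] = None) -> dict:
    """Build an mcpServers-style entry that launches clutter-mcp via npx."""
    env = {"CLUTTER_API_KEY": api_key}
    if api_url:  # optional override; the server has its own default
        env["CLUTTER_API_URL"] = api_url
    return {
        "mcpServers": {
            "clutter": {
                "command": "npx",
                "args": ["-y", "clutter-mcp"],
                "env": env,
            }
        }
    }


# Substitute your real key for the placeholder before using the output.
print(json.dumps(mcp_client_config("clt_live_..."), indent=2))
```

The printed JSON can be merged into the client's configuration file.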

## More
- [About Clutter](https://clutter.run/about)
- [Terms of Service (incl. billing & acceptable use)](https://clutter.run/legal/terms)
- [Privacy Policy](https://clutter.run/legal/privacy)