# Clutter → SharePoint cookbook

Worked patterns for loading Clutter-generated content into Microsoft 365 / SharePoint
Online via the **Microsoft Graph** API. These are illustrative, not exhaustive — start
with Pattern 1 (the common case) and reach for the others when you want metadata to do
more than fill columns.

**What Clutter gives you / where your agent takes over.** Clutter invents a believable
company and mass-produces its documents, datasets and per-document metadata, then hands
you download URLs (per file, or a whole-run ZIP that preserves the folder tree). The
upload into SharePoint is your agent's job — Clutter never touches your tenant. Every
Clutter step below maps to a tool on the [`clutter-mcp`](https://www.npmjs.com/package/clutter-mcp)
server (`build_org` → `wait_for_org` → `create_run` → `wait_for_run` →
`list_run_documents` / `get_document_url` / `build_zip` / `get_zip_url`) or to the REST
endpoints in the [OpenAPI spec](https://clutter.run/api/docs/json).

**Prerequisites for the Graph side (all patterns).** An Entra app registration with the
appropriate application permission (`Sites.ReadWrite.All` to write files;
`Sites.FullControl.All` if you also provision sites/libraries; add
`RecordsManagement.ReadWrite.All` / `InformationProtectionPolicy.Read.All` for the governance
patterns), admin consent, and a Graph token. Resolve a library's `drive-id` once:
`GET /sites/{hostname}:/sites/{site-path}?$select=id` →
`GET /sites/{site-id}/drives` → pick the document library.
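
The two-step lookup above can be sketched as small URL builders plus a picker over the `/drives` response. This is a minimal sketch, not a full client: the function names are hypothetical, and matching the library on the drive's `name` property (case-insensitively) is an assumption about how you identify the library you want.

```python
from urllib.parse import quote

GRAPH = "https://graph.microsoft.com/v1.0"


def site_lookup_url(hostname: str, site_path: str) -> str:
    """Step 1: URL that resolves a site id from hostname + site path."""
    path = quote(site_path.strip("/"))  # encode spaces etc., keep "/" separators
    return f"{GRAPH}/sites/{hostname}:/sites/{path}?$select=id"


def drives_url(site_id: str) -> str:
    """Step 2: URL that lists the site's drives (document libraries)."""
    return f"{GRAPH}/sites/{site_id}/drives"


def pick_drive_id(drives_response: dict, library_name: str) -> str:
    """Pick the drive whose name matches the library (assumed matching rule)."""
    for drive in drives_response.get("value", []):
        if drive.get("name", "").lower() == library_name.lower():
            return drive["id"]
    raise LookupError(f"no document library named {library_name!r}")
```

Resolve once per library and keep the resulting `drive-id`; every later call in this cookbook is addressed by it.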

---

## Pattern 1 — Drop a folder tree into one library (the common case)

Generate a nested batch and mirror its folder structure straight into a document library.

1. **Generate.** `create_run` with `kind: "doc_generator"`,
   `params: { prompt, doc_number, file_types: ["docx","pdf","xlsx","eml"], structure: "nested", target_system: "SharePoint" }`.
2. **Wait** (`wait_for_run`) until `complete`.
3. **List** (`list_run_documents`) — each item carries `title`, `fileFormat`, `folderPath`.
4. **Upload**, recreating `folderPath`. Small files (≤4 MB — most generated docs):
   ```
   PUT https://graph.microsoft.com/v1.0/drives/{drive-id}/root:/{folderPath}/{filename}:/content
   Authorization: Bearer {graph-token}
   Content-Type: application/octet-stream
   <file bytes>
   ```
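   Graph creates any missing intermediate folders. URL-encode each path segment. For files
   larger than 4 MB, open an upload session (`POST .../{path}:/createUploadSession`) and
   PUT the bytes in ranges to the returned `uploadUrl`. Alternatively, pull the whole run
   as a ZIP (`build_zip` → `get_zip_url`) and expand it locally before uploading.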
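
The upload step can be sketched as a request builder that encodes the path and chooses between the simple PUT and an upload session. This is a sketch under assumptions: the helper names are hypothetical, the 4 MB cutoff simply mirrors the text above, and the `conflictBehavior` value is an illustrative choice. The requests are only built here, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request

GRAPH = "https://graph.microsoft.com/v1.0"
SIMPLE_UPLOAD_LIMIT = 4 * 1024 * 1024  # the 4 MB threshold used in this pattern


def item_path_url(drive_id: str, folder_path: str, filename: str, action: str) -> str:
    """Build a path-addressed item URL, encoding each segment separately."""
    parts = [p for p in folder_path.split("/") if p] + [filename]
    encoded = "/".join(quote(p, safe="") for p in parts)
    return f"{GRAPH}/drives/{drive_id}/root:/{encoded}:/{action}"


def build_upload_request(drive_id: str, folder_path: str, filename: str,
                         data: bytes, token: str) -> Request:
    """Return a simple PUT for small files, else a createUploadSession POST."""
    headers = {"Authorization": f"Bearer {token}"}
    if len(data) <= SIMPLE_UPLOAD_LIMIT:
        headers["Content-Type"] = "application/octet-stream"
        url = item_path_url(drive_id, folder_path, filename, "content")
        return Request(url, data=data, method="PUT", headers=headers)
    # Large file: this only opens the session. The caller still has to PUT
    # the bytes in ranges to the uploadUrl that Graph returns.
    headers["Content-Type"] = "application/json"
    body = json.dumps({"item": {"@microsoft.graph.conflictBehavior": "replace"}})
    url = item_path_url(drive_id, folder_path, filename, "createUploadSession")
    return Request(url, data=body.encode("utf-8"), method="POST", headers=headers)
```

Pass the result to `urllib.request.urlopen` (or port the same logic to your HTTP client of choice) once you are happy with the URLs it produces.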

---

## Pattern 2 — Set library columns from generated metadata

SharePoint document libraries are most useful when their columns are populated. Clutter's
metadata generator emits **one record per document** of an existing doc run, shaped to the
fields you ask for.

1. After the doc run completes, `create_run` with `kind: "doc_metadata_gen"`,
   `params: { sourceRunId: "<the doc_generator runId>", data_format: "json", target_system: "SharePoint", data_fields: ["author","department","document_type","review_date","status"] }`.
   - `data_fields` is **schema-by-example**: pass a list of column names (or a small JSON
     object showing the shape) and the model mirrors those keys for every document,
     grounded in the company and each document's content.
2. For each record, find the uploaded item (the upload response returns its `id`) and PATCH its list-item fields:
   ```
   PATCH https://graph.microsoft.com/v1.0/drives/{drive-id}/items/{item-id}/listItem/fields
   { "author": "...", "department": "...", "document_type": "...", "review_date": "...", "status": "..." }
   ```
   (The keys must be the columns' internal names, and the library must already have those
   columns; create them first as list columns via `POST /sites/{site-id}/lists/{list-id}/columns`,
   as site columns via `POST /sites/{site-id}/columns`, or in the UI.)
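
Step 2 can be sketched as a payload filter plus a PATCH builder. This is an illustrative sketch: the function names are hypothetical, the record is assumed to be a flat dict keyed by the `data_fields` you requested, and dropping empty values is a design choice rather than anything Graph or Clutter requires.

```python
import json
from urllib.request import Request

GRAPH = "https://graph.microsoft.com/v1.0"
# The columns requested in data_fields above; adjust to your library.
COLUMNS = ("author", "department", "document_type", "review_date", "status")


def fields_payload(record: dict, columns=COLUMNS) -> dict:
    """Keep only known columns that actually have a value."""
    return {c: record[c] for c in columns if record.get(c) not in (None, "")}


def build_fields_patch(drive_id: str, item_id: str, record: dict, token: str) -> Request:
    """Build (not send) the PATCH that sets list-item fields on an uploaded file."""
    url = f"{GRAPH}/drives/{drive_id}/items/{item_id}/listItem/fields"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps(fields_payload(record)).encode("utf-8")
    return Request(url, data=body, method="PATCH", headers=headers)
```

Filtering to a known column list keeps stray keys in a record from failing the whole PATCH on a column the library does not have.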

---

## Pattern 3 — Metadata as automation triggers (labels, permissions, retention)

Metadata doesn't have to stop at descriptive columns. Ask Clutter for **decision fields**
and use their values to drive governance automation — exactly the kind of realistic,
varied input you want when testing a data-governance or compliance solution.

Request the fields you need to exercise:
```
create_run kind=doc_metadata_gen params={
  sourceRunId, data_format: "json", target_system: "SharePoint",
  data_fields: [
    "sensitivity",        // e.g. Public | Internal | Confidential | Highly Confidential
    "retention_label",    // e.g. None | Business-3yr | Legal-7yr
    "access_group",       // e.g. All-staff | Finance | Legal | Exec-only
    "contains_pii"        // true | false
  ]
}
```
Clutter returns coherent values per document (an invoice → Confidential / Finance;
a public brochure → Public / All-staff). Your automation then acts on them:

- **Sensitivity labels.** Map `sensitivity` → a published MIP label id and assign it:
  `POST /drives/{drive-id}/items/{item-id}/assignSensitivityLabel`
  `{ "sensitivityLabelId": "...", "assignmentMethod": "standard" }`
  (Graph beta) — or set the corresponding column so a Purview auto-labeling policy picks
  it up.
- **Permissions.** Map `access_group` → an Entra group and grant item scope:
  `POST /drives/{drive-id}/items/{item-id}/invite` (break inheritance first if needed).
- **Retention.** Map `retention_label` → a published label and set it on the item:
  `PATCH /drives/{drive-id}/items/{item-id}/retentionLabel` with `{ "name": "..." }`, or
  drive it through a Power Automate flow keyed off the column.
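
One way to wire the three mappings together is to turn each metadata record into a list of planned Graph calls and review that plan before sending anything. The sketch below makes several assumptions: every id in the lookup tables is a placeholder, the `invite` body (read role, no email) is just one reasonable choice, and retention is set through the driveItem `retentionLabel` endpoint. Paths are relative so you can prefix the Graph version you use.

```python
# Hypothetical lookup tables: replace with your tenant's published label ids,
# Entra group object ids and retention label names. Placeholder values only.
SENSITIVITY_LABEL_IDS = {
    "Public": "00000000-0000-0000-0000-0000000000a1",
    "Confidential": "00000000-0000-0000-0000-0000000000a2",
}
ACCESS_GROUP_IDS = {
    "Finance": "00000000-0000-0000-0000-0000000000b1",
}
RETENTION_LABEL_NAMES = {
    "Business-3yr": "Business-3yr",
    "Legal-7yr": "Legal-7yr",
}


def plan_actions(drive_id: str, item_id: str, record: dict) -> list:
    """Turn one metadata record into planned Graph calls (nothing is sent).

    Values with no entry in the lookup tables (e.g. retention "None") are skipped.
    """
    base = f"/drives/{drive_id}/items/{item_id}"
    actions = []

    label_id = SENSITIVITY_LABEL_IDS.get(record.get("sensitivity"))
    if label_id:
        actions.append({
            "method": "POST",
            "path": f"{base}/assignSensitivityLabel",
            "body": {"sensitivityLabelId": label_id, "assignmentMethod": "standard"},
        })

    group_id = ACCESS_GROUP_IDS.get(record.get("access_group"))
    if group_id:
        actions.append({
            "method": "POST",
            "path": f"{base}/invite",
            "body": {
                "recipients": [{"objectId": group_id}],
                "roles": ["read"],
                "requireSignIn": True,
                "sendInvitation": False,
            },
        })

    retention = RETENTION_LABEL_NAMES.get(record.get("retention_label"))
    if retention:
        actions.append({
            "method": "PATCH",
            "path": f"{base}/retentionLabel",
            "body": {"name": retention},
        })
    return actions
```

Keeping the plan as data makes it easy to log, diff between runs, or assert on in tests before any tenant state changes.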

The point isn't these exact endpoints — it's that one generated run gives you a realistic
spread of sensitivities, owners and retention needs to test the *whole* automation, instead
of hand-crafting edge cases.

---

## Pattern 4 — Flat files + rich metadata instead of folders

Folders bake one hierarchy in forever; **metadata-driven** organisation is more flexible
and is what enables the automation in Pattern 3. Generate `structure: "flat"` and let the
metadata carry the structure:

1. `create_run kind=doc_generator params={ ..., structure: "flat", target_system: "SharePoint" }`.
2. `create_run kind=doc_metadata_gen` requesting routing fields alongside descriptive ones:
   `data_fields: ["site","doc_library","content_type","department","sensitivity"]`.
3. **Route on upload.** Read each record and PUT the file into the drive that matches its
   `site` + `doc_library` (resolve each distinct pair to a `drive-id` once and cache it),
   rather than into a fixed folder path. Set the remaining fields as columns (Pattern 2)
   and trigger automation (Pattern 3).
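
The "resolve once and cache" routing in step 3 might look like the sketch below. `DriveRouter` and `column_fields` are hypothetical names, and the resolver is injected as a callable so the network lookup (site → drives → `drive-id`) stays your own code.

```python
from typing import Callable, Dict, Tuple

ROUTING_FIELDS = ("site", "doc_library")


class DriveRouter:
    """Map (site, doc_library) pairs to drive ids, resolving each pair once."""

    def __init__(self, resolve: Callable[[str, str], str]):
        self._resolve = resolve  # does the real Graph lookup; supplied by caller
        self._cache: Dict[Tuple[str, str], str] = {}

    def drive_for(self, record: dict) -> str:
        key = (record["site"], record["doc_library"])
        if key not in self._cache:
            self._cache[key] = self._resolve(*key)
        return self._cache[key]


def column_fields(record: dict) -> dict:
    """Everything except the routing fields, to be set as columns afterwards."""
    return {k: v for k, v in record.items() if k not in ROUTING_FIELDS}
```

For each record, upload to `router.drive_for(record)` and then PATCH `column_fields(record)` onto the new item.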

This turns a single Clutter run into content spread across a realistic **multi-site
architecture**, organised by metadata that views, search and Flows can pivot on.

---

## Pattern 5 — Generate the information architecture too

Take Pattern 4 further: have Clutter produce the *structure* as data, then provision it
before routing content in.

1. `create_run kind=data_generator params={ prompt: "a SharePoint information architecture for this company: one row per document library", row_count: 20, data_format: "json", data_fields: ["site","doc_library","content_type","default_sensitivity","owning_department"] }`.
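2. Provision from that dataset: create the sites/libraries/columns that don't exist yet
   (libraries via Graph `POST /sites/{site-id}/lists` with the `documentLibrary` template,
   plus content types/columns; sites via PnP or by creating a Microsoft 365 group, since
   Graph v1.0 does not expose a plain `POST /sites` call).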
3. Generate documents (Pattern 4) whose `site`/`doc_library` metadata references the same
   taxonomy, and route each file into the library you just provisioned.
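
The library part of step 2 can be sketched as a planner over the generated rows. It assumes the rows are dicts with the `site` and `doc_library` keys requested above and that `site_ids` (a hypothetical mapping from site name to Graph site id) has already been filled in by whatever provisions or resolves your sites. It only builds request descriptions and sends nothing.

```python
def plan_libraries(rows: list, site_ids: dict) -> list:
    """Build one create-library request per distinct (site, doc_library) pair."""
    seen = set()
    requests = []
    for row in rows:
        key = (row["site"], row["doc_library"])
        if key in seen:
            continue  # same library listed twice in the dataset
        seen.add(key)
        # A KeyError here means the site itself still needs provisioning.
        site_id = site_ids[row["site"]]
        requests.append({
            "method": "POST",
            "path": f"/sites/{site_id}/lists",
            "body": {
                "displayName": row["doc_library"],
                "list": {"template": "documentLibrary"},
            },
        })
    return requests
```

Before sending, check each planned library against the site's existing lists so reruns do not try to create duplicates.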

Now the synthetic tenant — sites, libraries, columns, content, labels and permissions —
is generated end to end, ready to point a governance or migration tool at.

---

## Notes

- `target_system: "SharePoint"` on any run nudges the model toward SharePoint-appropriate
  field names and shapes; it's a hint, not a hard schema.
- Metadata generation and company builds are **free**; documents are `$0.10` each and data
  rows `$0.06` per 10 after the free allowance (all prices USD), so iterating on metadata
  schemas is cheap.
- Clutter produces **synthetic** data only. None of it is real, which is the point: you can
  load it into any test/demo tenant without privacy or compliance exposure.

Full REST reference: <https://clutter.run/api/docs> · agent guide: <https://clutter.run/llms.txt>
