Due Diligence & Closing

Document-to-Warehouse Pipeline

Orchestration skill that assembles the OUTPUT of single-document extractors into validated, warehouse-ready tabular datasets.


No packaged download — skills install from the open-source plugin repo. Read the SKILL.md and bundled files below before you install.

01 · Problem

Orchestration skill that assembles the OUTPUT of single-document extractors into validated, warehouse-ready tabular datasets.

Derived from the skill’s “Skill description” section.

02 · Who & When

Trigger on any of these signals:

  • Explicit: "build the warehouse dataset," "assemble these extractions," "merge the fact tables," "validate the data room for the model," "make this deck-ready," "stage the extracted data for the warehouse," "what's the data quality on this deal package"
  • Implicit: the user already has one or more extractor outputs (data-room fact table, lease abstracts, normalized rent roll, normalized T-12) and needs them combined into a single queryable, validated dataset before underwriting, exhibit-mapping, or deck generation
  • Implicit: the user asks how clean the data is, which rows need review, or whether a figure is safe to put in front of an investment committee
  • Downstream: the user finished extraction and says "okay, now get this ready for the model" or "stage this for the deck"

Negative triggers (do NOT activate; redirect):

  • The user has a raw, unextracted single document (an OM, T-12, rent roll, PCA, ALTA survey, lease, or agency debt quote) and needs the facts pulled out of it for the first time -> use document-to-data-room-extractor. This skill consumes that extractor's output; it does not replace it. If you find yourself reading a PDF page or a spreadsheet cell to create facts, you are in the wrong skill — stop and route to document-to-data-room-extractor.
  • The user wants a single lease abstracted into economic structure -> use lease-abstract-extractor.
  • The user wants WALT, rollover, mark-to-market, and concentration on an already-extracted rent roll -> use rent-roll-analyzer.
  • The user wants management-fee restatement, tax reassessment, and a normalized NOI from a T-12 -> use t12-normalizer.
  • The user wants the validated dataset mapped to deck exhibit specs (table vs. chart, axes, slide binding) -> that is the next step; use warehouse-to-exhibit-mapper.
  • The user wants the full 10-year proforma and a go/no-go recommendation -> use acquisition-underwriting-engine.
  • The user wants a due-diligence workstream plan and third-party report ordering -> use dd-command-center.

Derived from the skill’s “When to Activate” section.

03 · How It's Done Today

Not documented yet for this skill.

04 · What This Skill Changes
# Warehouse-Ready Datasets -- {deal_id}
Boundary: assembled & validated already-extracted facts; no document extraction performed.
Validation profile: {validation_profile}   |   Deck scope: {deck_scope}   |   As-of: {as_of_date}
Datasets: {n}   |   Rows: {m}   |   needs-review: {k}   |   flagged: {f}   |   deck-ready: {d}
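The header counts can be rolled up from the row-level `review_status` and `deck_ready` columns. A minimal Python sketch, assuming each dataset is held as a list of dicts keyed by the schema's column names; `summarize` and `render_header` are hypothetical helpers, not part of the skill.

```python
from collections import Counter


def summarize(datasets: dict[str, list[dict]]) -> dict[str, int]:
    """Roll row-level status columns up into the report-header counts."""
    # Flatten every dataset's rows into one list (hypothetical in-memory layout).
    all_rows = [row for ds_rows in datasets.values() for row in ds_rows]
    status = Counter(row["review_status"] for row in all_rows)
    return {
        "datasets": len(datasets),
        "rows": len(all_rows),
        "needs_review": status["needs-review"],
        "flagged": status["flagged"],
        "deck_ready": sum(1 for row in all_rows if row["deck_ready"]),
    }


def render_header(counts: dict[str, int]) -> str:
    # Mirrors the "Datasets | Rows | needs-review | flagged | deck-ready" header line.
    return (
        f"Datasets: {counts['datasets']}   |   Rows: {counts['rows']}   |   "
        f"needs-review: {counts['needs_review']}   |   flagged: {counts['flagged']}   |   "
        f"deck-ready: {counts['deck_ready']}"
    )


# The three sample rows from the expense dataset, reduced to the status columns.
sample = {
    "cre_expense_lineitems_period": [
        {"line_item": "management_fee", "review_status": "accepted", "deck_ready": True},
        {"line_item": "real_estate_tax", "review_status": "needs-review", "deck_ready": False},
        {"line_item": "insurance", "review_status": "flagged", "deck_ready": False},
    ]
}
```

Deriving the header from the rows, rather than tracking counts separately, keeps the summary from drifting out of sync with the row statuses.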

## Dataset: cre_expense_lineitems_period
Schema (grain: one row per expense line item per period):
| column | type | unit | nullable |
|---|---|---|---|
| line_item | string | -- | no |
| amount | number | USD | no |
| period | string | -- | no |
| source_doc | string | -- | no |
| locator | string | -- | no |
| source_ref | string | -- | no |
| extracted_by | string | -- | no |
| classification | enum | -- | no |
| confidence | enum | -- | no |
| review_status | enum | -- | no |
| extracted_at | datetime | -- | no |
| deck_ready | bool | -- | no |
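One way to enforce this schema in code is a frozen dataclass that rejects nulls, since every column is marked non-nullable. This is a sketch under assumptions: the enum members are only the values that appear in the sample rows (the skill may define more), and `ExpenseLineItem` is a hypothetical class name.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Classification(Enum):
    SOURCE_FACT = "source-fact"
    CALCULATED = "calculated"


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class ReviewStatus(Enum):
    ACCEPTED = "accepted"
    NEEDS_REVIEW = "needs-review"
    FLAGGED = "flagged"


@dataclass(frozen=True)
class ExpenseLineItem:
    """One expense line item for one period (the dataset's grain)."""
    line_item: str
    amount: float  # USD
    period: str
    source_doc: str
    locator: str
    source_ref: str
    extracted_by: str
    classification: Classification
    confidence: Confidence
    review_status: ReviewStatus
    extracted_at: datetime
    deck_ready: bool

    def __post_init__(self) -> None:
        # Every column is non-nullable: reject None and blank strings at construction time.
        for name, value in vars(self).items():
            if value is None or (isinstance(value, str) and not value.strip()):
                raise ValueError(f"column {name!r} must be non-null")
```

Rejecting blanks at construction means a row with a missing `locator` never enters the dataset, instead of surfacing later as a failed provenance check.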

Rows (sample):
| line_item | amount | period | source_ref | extracted_by | classification | confidence | review_status | deck_ready |
|---|---|---|---|---|---|---|---|---|
| management_fee | 142,300 | 2025 TTM | data-room/T12-001#Summary!B18 | t12-normalizer | calculated | high | accepted | true |
| real_estate_tax | 410,000 | 2025 TTM | data-room/T12-001#Summary!B9 | document-to-data-room-extractor | source-fact | medium | needs-review | false |
| insurance | 88,000 | FY (OM) | data-room/OM-001#p22 | document-to-data-room-extractor | source-fact | low | flagged | false |
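In the sample, only the `accepted` row is deck-ready, and the gate report states that flagged rows are never deck-ready. A minimal sketch of that rule, assuming acceptance is the only path to `deck_ready = true` (the skill may also weigh `confidence`); `derive_deck_ready` and `inconsistent_rows` are hypothetical names.

```python
def derive_deck_ready(review_status: str) -> bool:
    # Assumed rule: only analyst-accepted rows may reach a committed deck.
    # "needs-review" rows wait for a decision; "flagged" rows never qualify.
    return review_status == "accepted"


def inconsistent_rows(rows: list[dict]) -> list[str]:
    """Return line items whose stored deck_ready disagrees with the assumed rule."""
    return [
        row["line_item"]
        for row in rows
        if row["deck_ready"] != derive_deck_ready(row["review_status"])
    ]


# The three sample rows, reduced to the columns the rule needs.
SAMPLE_ROWS = [
    {"line_item": "management_fee", "review_status": "accepted", "deck_ready": True},
    {"line_item": "real_estate_tax", "review_status": "needs-review", "deck_ready": False},
    {"line_item": "insurance", "review_status": "flagged", "deck_ready": False},
]
```

Running `inconsistent_rows` before handoff is a cheap way to confirm that no row was marked deck-ready by hand without passing review.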

## Cross-Dataset Conflicts
- NOI: OM broker-stated $4,210,000 (data-room/OM-001#p14, source-fact, low) vs. T-12-derived $3,961,000 (data-room/T12-001#Summary, calculated, high). Delta 6.3% of the T-12 value exceeds the 1% tolerance. dedupe_policy=prefer_verified -> retained the T-12 value; OM value kept in the conflicts list; both marked needs-review.
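The conflict above can be sketched as a small resolver. It assumes `prefer_verified` means "keep the highest-confidence candidate" and that the delta is measured against the retained value; both readings, and the names `Fact`, `Resolution`, and `resolve_prefer_verified`, are hypothetical, not the skill's documented API.

```python
from dataclasses import dataclass

CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Fact:
    value: float
    source_ref: str
    classification: str
    confidence: str


@dataclass(frozen=True)
class Resolution:
    retained: Fact
    conflicts: tuple[Fact, ...]
    max_delta: float    # largest relative gap vs. the retained value
    needs_review: bool  # True when any gap exceeds the tolerance


def relative_delta(value: float, base: float) -> float:
    # Guard against a zero base: any nonzero difference is an unbounded gap.
    if base == 0:
        return 0.0 if value == 0 else float("inf")
    return abs(value - base) / abs(base)


def resolve_prefer_verified(candidates: list[Fact], tolerance: float = 0.01) -> Resolution:
    # Assumed reading of prefer_verified: keep the highest-confidence candidate
    # (max() returns the first one on ties) and keep the rest as conflicts.
    retained = max(candidates, key=lambda f: CONFIDENCE_RANK[f.confidence])
    others = tuple(f for f in candidates if f is not retained)
    max_delta = max((relative_delta(f.value, retained.value) for f in others), default=0.0)
    return Resolution(retained, others, max_delta, max_delta > tolerance)
```

With the NOI figures above, the T-12 value is retained and the gap is 249,000 / 3,961,000, about 6.3%, well past the 1% tolerance. Measuring against the OM figure instead would give about 5.9%, so the denominator is worth pinning down in the validation profile.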

## Validation Results
| rule | rows checked | passed | flagged | needs-review |
|---|---|---|---|---|
| provenance_columns_nonnull | 214 | 214 | 0 | 0 |
| source_ref_resolves | 214 | 211 | 3 | 0 |
| occupancy_in_range | 14 | 14 | 0 | 0 |
| noi_cross_doc_reconcile | 1 | 0 | 0 | 1 |
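The results table reads like the output of a per-row rule runner. A minimal sketch covering the three row-level rules (the cross-document NOI check is shown separately with the conflict resolver). Assumptions: rules return `"passed"`, `"flagged"`, `"needs-review"`, or `None` when they do not apply; `source_ref` has the form `<doc path>#<locator>`; occupancy is a fraction in [0, 1]; `ctx["known_docs"]` is a hypothetical index of data-room documents.

```python
from collections import Counter
from typing import Callable, Optional

PROVENANCE_COLUMNS = ("source_doc", "locator", "source_ref", "extracted_by", "extracted_at")

# A rule maps (row, context) to an outcome, or None when the rule does not apply.
Rule = Callable[[dict, dict], Optional[str]]


def provenance_columns_nonnull(row: dict, ctx: dict) -> Optional[str]:
    missing = [c for c in PROVENANCE_COLUMNS if row.get(c) in (None, "")]
    return "flagged" if missing else "passed"


def source_ref_resolves(row: dict, ctx: dict) -> Optional[str]:
    # Assumption: only the document path before "#" is checked against the index.
    doc = (row.get("source_ref") or "").split("#", 1)[0]
    return "passed" if doc in ctx["known_docs"] else "flagged"


def occupancy_in_range(row: dict, ctx: dict) -> Optional[str]:
    if "occupancy" not in row:
        return None  # only rows that carry an occupancy figure are checked
    return "passed" if 0.0 <= row["occupancy"] <= 1.0 else "flagged"


RULES: dict[str, Rule] = {
    "provenance_columns_nonnull": provenance_columns_nonnull,
    "source_ref_resolves": source_ref_resolves,
    "occupancy_in_range": occupancy_in_range,
}


def run_validation(rows: list[dict], ctx: dict) -> dict[str, dict[str, int]]:
    """Apply every rule to every row and tally outcomes per rule."""
    report = {}
    for name, rule in RULES.items():
        outcomes = Counter(o for o in (rule(row, ctx) for row in rows) if o is not None)
        report[name] = {
            "rows_checked": sum(outcomes.values()),
            "passed": outcomes["passed"],
            "flagged": outcomes["flagged"],
            "needs_review": outcomes["needs-review"],
        }
    return report
```

Returning `None` for non-applicable rows is what lets `occupancy_in_range` check 14 rows while the provenance rules check all 214.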

## Gate Report (rows blocked from committed deck)
- real_estate_tax (data-room/T12-001#Summary!B9): needs-review (conflicting tax reassessment basis). Unblock: analyst acceptance, or supply the tax bill via document-to-data-room-extractor.
- insurance (data-room/OM-001#p22): flagged (OCR confidence 0.41, below the confidence floor; flagged rows are never deck-ready). Unblock: re-extract from a legible source.
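Each gate-report line follows from a row that is not deck-ready, plus its status and an unblock path. A minimal sketch, assuming a hypothetical `reason` field carries the status explanation (it is not in the published schema) and that unblock hints are keyed by `review_status`; the hint wording is illustrative only.

```python
# Illustrative unblock hints keyed by review_status (hypothetical wording).
UNBLOCK_HINTS = {
    "needs-review": "analyst acceptance, or supply a corroborating source via document-to-data-room-extractor",
    "flagged": "re-extract from a legible source",
}


def gate_report(rows: list[dict]) -> list[str]:
    """List every row blocked from the committed deck, with status, reason, and unblock path."""
    lines = []
    for row in rows:
        if row["deck_ready"]:
            continue  # deck-ready rows are not blocked
        status = row["review_status"]
        reason = row.get("reason", "no reason recorded")  # hypothetical column
        hint = UNBLOCK_HINTS.get(status, "resolve manually")
        lines.append(
            f"- {row['line_item']} ({row['source_ref']}): {status} ({reason}). Unblock: {hint}."
        )
    return lines
```

Keying hints by status keeps the report consistent: every flagged row points to re-extraction, and every needs-review row points to an analyst decision or a better source.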

## Freshness
- T12-001 period ends 2025-09-30; as_of 2026-05-29 -> 241 days old, outside the 90-day freshness window -> staleness flag carried; 19 revenue/expense rows marked needs-review.
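The freshness check is plain date arithmetic. A minimal sketch, assuming a source is stale when its age strictly exceeds the window, and that staleness downgrades only `accepted` rows from that document (flagged rows keep their stronger status); `freshness` and `mark_stale` are hypothetical helpers.

```python
from datetime import date


def freshness(period_end: date, as_of: date, window_days: int = 90) -> tuple[int, bool]:
    """Return (age in days, is_stale) for a source whose period ends on period_end."""
    age_days = (as_of - period_end).days
    return age_days, age_days > window_days


def mark_stale(rows: list[dict], doc_prefix: str) -> int:
    """Downgrade accepted rows sourced from a stale document; return how many changed."""
    changed = 0
    for row in rows:
        # Assumed policy: only accepted rows are downgraded to needs-review.
        if row["source_ref"].startswith(doc_prefix) and row["review_status"] == "accepted":
            row["review_status"] = "needs-review"
            row["deck_ready"] = False
            changed += 1
    return changed
```

For the T-12 above, `freshness(date(2025, 9, 30), date(2026, 5, 29))` gives an age of 241 days, which is past the 90-day window.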

## Handoff
Validated datasets ready for warehouse-to-exhibit-mapper. Missing: title_findings (no ALTA survey extracted) -> route survey to document-to-data-room-extractor before any title exhibit.

Derived from the skill’s “Output Format” section.

05 · Risks & Caveats

Not documented yet for this skill.