Security & Trust

Every skill is reviewed before it is listed.

A skill runs inside your agent, next to your most sensitive deal data. Before a skill appears in this catalog it is read against a published trust rubric — so you can see how it was reviewed, not just take our word that it was.

Why this matters for CRE

A skill extends your fiduciary surface.

A CRE skill rarely runs on toy data — you point it at rent rolls, T-12s, IC memos, tenant PII, and sponsor financials that sit under NDAs and fiduciary duty. The moment a skill can read that material, it extends your security and fiduciary surface to whatever agent runs it: a skill that quietly persisted data, reached for a credential, or carried a buried instruction could leak a sponsor's numbers or a tenant's identity as easily as it could miscalculate a cap rate. That is why this rubric weights what a skill is allowed to do — its purpose, its instruction scope, and how it got onto your machine — above whether it happens to write a file. Reviewing a skill before you feed it a deal is the same discipline as reviewing a vendor before you grant data-room access.

The rubric

Five dimensions, weighted by blast radius.

Each dimension below lists its weight, what it asks, and what a score of 5 looks like.

Purpose & Capability (weight ×3)
  What it asks: Is what the skill is allowed to do actually limited to what it claims to do? Does it request file, shell, network, or tool access beyond producing the analysis it advertises?
  What a score of 5 looks like: A read-and-reason skill that works only from the input you give it, requests no system, shell, or network capability, and whose stated purpose matches its actual footprint, e.g. screening a deal you paste in, with no hidden side effects.

Instruction Scope (weight ×3)
  What it asks: Are the skill's instructions narrowly scoped to its CRE task, with explicit do-not-trigger boundaries, and free of hidden directives, prompt-injection bait, or instructions to exfiltrate or misuse the data it sees?
  What a score of 5 looks like: Tightly scoped, on-topic instructions with clear activation and exclusion rules, no concealed system-prompt overrides, and no language that would coax the agent into leaking a rent roll, T-12, or PII it was given.

Install Mechanism (weight ×2)
  What it asks: How does the skill arrive on your machine? Does it run install-time hooks, post-install scripts, or fetch remote code, or is it plain, inspectable text installed through a transparent, version-pinned mechanism?
  What a score of 5 looks like: Ships as human-readable Markdown/YAML through an open-source, version-pinned package with no install-time code execution of its own, so you can read exactly what you installed before you run it.

Credentials (weight ×2)
  What it asks: Does the skill read environment variables, request API keys or tokens, embed secrets, or otherwise touch credentials? If so, is that handling minimal, disclosed, and necessary?
  What a score of 5 looks like: No credential surface at all: it reads no environment variables, asks for no keys or tokens, embeds no secrets, and needs none to do its job.

Persistence & Privilege (weight ×1)
  What it asks: Does the skill hold state between runs, write to disk outside your explicit output, escalate privileges, or run with more access than the task requires?
  What a score of 5 looks like: Stateless and least-privilege: nothing retained between runs, nothing written to disk unless you ask for it, and any bundled calculator is pure, standard-library code with no privileged access.

Weights reflect blast radius: a skill's purpose, instruction scope, and how it arrives on your machine matter more than whether it writes a file. Each dimension is scored 1–5; the weighted total is divided by the sum of the weights (11), which returns it to a 5-point scale (1.0 at worst, 5.0 at best). ≥ 4.0 = Verified, 3.0–3.99 = Caution, below 3.0 = Flagged. This is the rubric we publish; it is not a claim that any skill has completed a formal, scored audit against it.

The score

How the weighted score is computed.

Each dimension is scored 1–5. The scores are weighted, summed, and divided by the sum of the weights (11), which puts the result back on the same 5-point scale as the inputs:

score = [ (Purpose & Capability × 3) + (Instruction Scope × 3) + (Install Mechanism × 2) + (Credentials × 2) + (Persistence & Privilege × 1) ]
        ÷ (3 + 3 + 2 + 2 + 1)
      = weighted sum ÷ 11

Thresholds: ≥ 4.0 is Verified, 3.0–3.99 is Caution, and below 3.0 is Flagged.
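
The computation above can be sketched in a few lines of Python. This is a minimal illustration, not the site's implementation: the function and key names are hypothetical, and comparing the unrounded score against the thresholds is an assumption, since the text does not say whether rounding happens first. Only the weights, the 1–5 range, and the three thresholds come from the rubric.

```python
# Hypothetical names throughout; only the weights, the 1-5 score range,
# and the verdict thresholds are taken from the published rubric.
WEIGHTS = {
    "purpose_capability": 3,
    "instruction_scope": 3,
    "install_mechanism": 2,
    "credentials": 2,
    "persistence_privilege": 1,
}


def weighted_score(scores: dict[str, int]) -> float:
    """Return the weighted average of the five dimension scores."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five rubric dimensions")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be scored 1-5, got {value}")
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())  # the denominator is 11


def verdict(score: float) -> str:
    """Map a normalized score onto the three published verdicts.

    Assumption: the unrounded score is compared against the thresholds.
    """
    if score >= 4.0:
        return "Verified"
    if score >= 3.0:
        return "Caution"
    return "Flagged"
```

Because every dimension scores at least 1, the lowest possible result is 1.0, so a skill scoring 1 everywhere lands well inside Flagged rather than at zero.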

Verdicts

Three outcomes.

Verified (≥ 4.0)

Tightly scoped, no credential surface, transparent install. The kind of skill you can read and run with confidence.

Caution (3.0 – 3.99)

Useful but with a wider surface — broader instructions, an install hook, or some state. Read it before pointing it at sensitive data.

Flagged (< 3.0)

A capability, scope, or credential concern serious enough that the skill would not be listed without changes. Treat with care.

By the numbers

The catalog and the methodology.

These figures describe the catalog and the trust methodology. They are not live audit results — no per-skill security audits have been completed yet.

Skills in catalog: 127
Agent prompts: 54
Curated categories: 16
Trust dimensions in the rubric: 5
Weighted-score denominator: 11
License: Apache-2.0

One audit, end to end

How the rubric would score a real skill.

Representative example — illustrative methodology, not a live audit record. This shows how the rubric would score one skill. No score here reflects a completed audit, and this result is not stored, tracked, or shown on the skill's own page.

Illustrative: Verified-tier
Deal QuickScreen: 4.55 / 5 (50/55 weighted)

Subject: the deal-quick-screen skill's security and trust surface (what the skill itself can read, run, or persist) — not the quality of the deal verdicts it produces.

Purpose & Capability ×3: 5 / 5. A read-and-reason prompt library that produces a KEEP/KILL memo from user-supplied deal text; it requests no tool, file, or system capability beyond reading the input it is given.
Instruction Scope ×3: 4 / 5. Instructions are tightly bounded to deal screening with explicit do-NOT-trigger guardrails and a conservative-bias rule; one point withheld because, like any prompt, its output is steerable by adversarial input it is asked to summarize.
Install Mechanism ×2: 4 / 5. Ships as plain Markdown plus YAML references inside the Apache-2.0 plugin with no install-time hooks of its own; one point withheld because it installs as part of a larger plugin whose hooks the user should review once at the plugin level.
Credentials ×2: 5 / 5. Reads no environment variables, requests no API keys or tokens, and contains no secret material; it has no credential surface.
Persistence & Privilege ×1: 5 / 5. Holds no state between runs, writes nothing to disk on its own, and the optional calculator it references is pure standard-library Python that reads JSON and prints JSON with no privilege escalation.

deal-quick-screen is a stateless, local, prompt-only skill: it reads the deal text you give it, reasons over a published screening rubric, and returns a one-page KEEP/KILL memo, with no network calls, no credential use, and nothing written to disk on its own. Its trust profile is strong because it asks for no capability beyond reading its input and ships as inspectable Apache-2.0 Markdown. As with any LLM skill, the integrity of its output still depends on the host agent and the accuracy of the material you paste in, so treat its verdict as a fast triage signal, not investment advice, and review the underlying numbers before acting.

Scores above are illustrative judgments about this rubric's application, not the output of a completed third-party audit.
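
For readers who want to check the arithmetic behind the illustrative 4.55 figure, the short Python sketch below reproduces it from the five example scores. The dictionary layout and variable names are hypothetical; the weights and scores are copied from the example, and like the example itself the result is illustrative, not an audit finding.

```python
# (weight, illustrative score) per dimension, copied from the worked example.
example = {
    "Purpose & Capability": (3, 5),
    "Instruction Scope": (3, 4),
    "Install Mechanism": (2, 4),
    "Credentials": (2, 5),
    "Persistence & Privilege": (1, 5),
}

weighted_sum = sum(w * s for w, s in example.values())  # 15 + 12 + 8 + 10 + 5
max_sum = sum(w * 5 for w, _ in example.values())       # 11 * 5
weight_total = sum(w for w, _ in example.values())      # 3 + 3 + 2 + 2 + 1
normalized = weighted_sum / weight_total

print(f"{weighted_sum}/{max_sum} weighted")  # prints "50/55 weighted"
print(f"{normalized:.2f} / 5")               # prints "4.55 / 5"
```

The two withheld points cost 3 and 2 weighted points respectively, which is why the total is 50 rather than 55 and the normalized score still clears the 4.0 Verified threshold.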

Scope & limitations

What this review does and does not cover.

  • Reviews cover the catalog copy, metadata, and methodology presented on this site, not the upstream plugin source code.
  • The plugin is open source and changes independently; an upstream commit can alter a skill's behavior after any review here, so a review reflects a point in time, not a guarantee about the code you install today.
  • Skill behavior depends on the host agent, the model version, your inputs, and your local environment; identical instructions can produce different results across setups.
  • No per-skill security audit has been completed. The score, dimensions, and verdict shown anywhere on this site as a worked example are illustrative methodology, not findings.
  • Everything here is provided "as is," without warranty of any kind, under the Apache License 2.0.
  • You are solely responsible for any sensitive, confidential, or regulated client and portfolio data you choose to put in front of any skill or agent.
  • Nothing on this site is investment, legal, tax, or accounting advice.

The pathway

From methodology to audit-backed signals.

The trust system is built but deliberately neutral: no skill shows a score until a real, committed audit record exists for it. Here is what would have to change — and the order it happens in — for a skill to carry an evidence-backed trust signal.

  1. Today — methodology only

    Every skill is read against the rubric above before it is listed, but no per-skill audit record exists yet. So each skill shows “Not formally audited,” and the worked example above is illustrative methodology, not a finding. Nothing on the site reports a per-skill score.

  2. Next — audit pending

    When a skill is queued for a real review, the contract supports an “audit pending” state. It still shows no score or verdict — only that a review is in progress — so a pending review can never be mistaken for a completed one.

  3. Audit-backed

    A skill shows a score and verdict only when a committed audit record exists that pins to the exact reviewed commit and version, scores all five dimensions, and derives its verdict from that score. If the upstream plugin changes, the record stops matching and the skill returns to neutral, so a trust signal can never outlive the code it described.
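
The three states above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the site's actual contract: `AuditRecord`, `trust_signal`, the field names, and the exact label strings other than “Not formally audited” are hypothetical. The sketch only encodes the rules described in the steps: a score appears solely when a record matches the current commit and version and covers all five dimensions, and the verdict is derived from the score instead of being stored.

```python
from dataclasses import dataclass
from typing import Optional

# Rubric weights; the dimension keys are hypothetical identifiers.
WEIGHTS = {
    "purpose_capability": 3,
    "instruction_scope": 3,
    "install_mechanism": 2,
    "credentials": 2,
    "persistence_privilege": 1,
}


@dataclass
class AuditRecord:
    """Hypothetical shape of a committed audit record."""
    commit: str             # exact upstream commit that was reviewed
    version: str            # exact version that was reviewed
    scores: dict[str, int]  # one 1-5 score per rubric dimension


def trust_signal(
    record: Optional[AuditRecord],
    pending: bool,
    current_commit: str,
    current_version: str,
) -> str:
    """Return the label a skill page would show under the rules above."""
    matches = (
        record is not None
        and record.commit == current_commit
        and record.version == current_version
        and set(record.scores) == set(WEIGHTS)
    )
    if matches:
        total = sum(WEIGHTS[k] * record.scores[k] for k in WEIGHTS)
        score = total / sum(WEIGHTS.values())
        # The verdict is always recomputed from the score, never stored.
        if score >= 4.0:
            label = "Verified"
        elif score >= 3.0:
            label = "Caution"
        else:
            label = "Flagged"
        return f"{label} ({score:.2f} / 5)"
    if pending:
        return "Audit pending"  # no score or verdict while a review is in progress
    return "Not formally audited"  # neutral state, including stale records
```

A record that pins to an older commit falls through to the neutral label, which is the behavior the third step describes when the upstream plugin changes.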

Custom skills go through the same review.

Skills built for your firm are held to the same rubric before they ship to your team.
