---
description: "Conventions for consuming graphify-out/ knowledge-graph evidence inside the RPI workflow - Brought to you by microsoft/hve-core"
applyTo: '**/graphify-out/**'
---

# Graphify Output Conventions

These rules apply whenever Copilot reads, writes, or references files under any `graphify-out/` directory. The directory is generated by the third-party [`graphifyy`](https://github.com/safishamsi/graphify) CLI, an MIT-licensed personal project with no affiliation to HVE Core. The conventions govern only how to *consume* its output inside the HVE Core RPI workflow; installation and graph builds are out of scope and handled by upstream tooling.

## Working Directory

A `graphify-out/` directory is generated build output:

```text
graphify-out/
├── graph.json          # Canonical graph data — read-only for agents
├── graph.html          # Interactive visualization
├── GRAPH_REPORT.md     # God nodes, surprising connections, suggested questions
├── wiki/               # Per-community markdown articles
└── cache/              # SHA256 incremental cache (do not edit)
```

Rules:

* Treat every file under `graphify-out/` as build output. Do not edit by hand.
* The directory must be gitignored in the target repository before the first build.
* Prefer MCP queries (`mcp_graphify_*` tools, registered by `graphify vscode install`) over direct `graph.json` parsing. The MCP server applies confidence filtering and edge typing that raw JSON does not.
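
A minimal sketch of a write guard that agent-side tooling could apply to honor the read-only rule, assuming paths arrive as repo-relative strings. `is_graphify_output` and `assert_writable` are hypothetical helpers, not part of graphify.

```python
from pathlib import PurePosixPath

OUTPUT_DIR = "graphify-out"

def is_graphify_output(path: str) -> bool:
    """True if any path component is a graphify-out directory (at any depth)."""
    # Normalize Windows separators so PurePosixPath splits them too.
    return OUTPUT_DIR in PurePosixPath(path.replace("\\", "/")).parts

def assert_writable(path: str) -> None:
    """Refuse hand edits to generated graphify output."""
    if is_graphify_output(path):
        raise PermissionError(f"{path} is graphify build output; do not edit by hand")
```

Matching whole path components, rather than substrings, keeps a file such as `src/graphify-out-notes.md` writable.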

## Audit-Tag Reporting Discipline

Every edge in a graphify graph carries an audit tag. When summarizing graph evidence in chat, in research documents, or in commit messages, report the tag explicitly:

| Tag         | How to report                                                               |
|-------------|-----------------------------------------------------------------------------|
| `EXTRACTED` | State as fact: "X depends on Y."                                            |
| `INFERRED`  | Hedge with the confidence score: "X likely depends on Y (confidence 0.74)." |
| `AMBIGUOUS` | Surface as a question, not a claim: "It is unclear whether X depends on Y." |

A path through the graph that contains both `EXTRACTED` and `INFERRED` edges is an `INFERRED` path overall — a chain is only as strong as its weakest edge. Never collapse multiple audit tags into a single sentence without distinguishing them.
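
The reporting table and the weakest-edge rule can be sketched as follows. Ranking `AMBIGUOUS` below `INFERRED` is an assumption extrapolated from the table (a question is weaker than a hedged claim). `AuditTag`, `path_tag`, and `report_edge` are hypothetical names, not graphify APIs.

```python
from enum import IntEnum
from typing import Optional, Sequence

class AuditTag(IntEnum):
    # Ordered weakest to strongest so that min() returns the weakest link.
    # Assumption: AMBIGUOUS ranks below INFERRED.
    AMBIGUOUS = 0
    INFERRED = 1
    EXTRACTED = 2

def path_tag(edge_tags: Sequence[AuditTag]) -> AuditTag:
    """A path is only as strong as its weakest edge."""
    if not edge_tags:
        raise ValueError("a path needs at least one edge")
    return min(edge_tags)

def report_edge(src: str, dst: str, tag: AuditTag,
                confidence: Optional[float] = None) -> str:
    """Phrase a single edge according to its audit tag."""
    if tag is AuditTag.EXTRACTED:
        return f"{src} depends on {dst}."
    if tag is AuditTag.INFERRED:
        if confidence is None:
            raise ValueError("INFERRED edges must be reported with a confidence score")
        return f"{src} likely depends on {dst} (confidence {confidence:.2f})."
    return f"It is unclear whether {src} depends on {dst}."
```

Requiring a confidence value for `INFERRED` edges makes it harder to state a hedged relationship as fact by accident.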

When `graph_stats` shows that more than roughly 30% of the graph's edges are tagged `INFERRED` or `AMBIGUOUS` (combined), warn the reader that downstream conclusions are tentative.
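
A sketch of the 30% check, assuming per-tag edge counts have already been pulled out of the `graph_stats` result. The exact shape of that result is not specified here, so the `tag_counts` mapping is a hypothetical input.

```python
from typing import Mapping

# "More than ~30%" from the convention above.
TENTATIVE_THRESHOLD = 0.30

def weak_edge_share(tag_counts: Mapping[str, int]) -> float:
    """Fraction of edges tagged INFERRED or AMBIGUOUS, combined."""
    total = sum(tag_counts.values())
    if total == 0:
        return 0.0
    weak = tag_counts.get("INFERRED", 0) + tag_counts.get("AMBIGUOUS", 0)
    return weak / total

def needs_tentative_warning(tag_counts: Mapping[str, int],
                            threshold: float = TENTATIVE_THRESHOLD) -> bool:
    """True when conclusions drawn from this graph should be flagged as tentative."""
    return weak_edge_share(tag_counts) > threshold
```

For example, 60 `EXTRACTED`, 30 `INFERRED`, and 10 `AMBIGUOUS` edges give a weak share of 0.4, which triggers the warning.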

## Graph Beats Grep When…

Use graph evidence when the question is structural and the answer is not a literal string:

* "What other modules are implicitly affected if I change `auth_middleware.py`?"
* "What is the shortest dependency path between A and B?"
* "Which nodes are most central to the auth subsystem?"
* "What community or cluster does `feature_x` belong to?"

## Grep Beats Graph When…

Fall back to `grep`, `ripgrep`, or direct file reads when the question is lexical or specific:

* "Where is the literal string `TODO(perf)` used?"
* "Which files import `requests`?"
* "What changed in the last commit?"
* The repository contains only file types graphify cannot parse (verify with `graphify <path> --dry-run`).

An `INFERRED` graph edge is weaker evidence than a deterministic grep hit. When the two disagree, trust grep.

## Never Trust an Inferred Path Blindly

`INFERRED` and `AMBIGUOUS` edges are LLM hypotheses, not ground truth. Before acting on them:

* Cite the source file and line range the edge points at.
* Read the cited source directly and confirm the relationship.
* If the source does not support the edge, treat the edge as noise and continue the analysis without it.

`EXTRACTED` edges are derived from AST or tree-sitter and may be trusted without re-verification.
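
A sketch of the verification step. It assumes each edge has been loaded into a hypothetical `Edge` record carrying the cited file and a 1-based, inclusive line range; the field names are illustrative, not graphify's schema. The substring test in `keep_edge` is only a crude lexical pre-filter for "the source supports the edge". It does not replace reading the cited excerpt.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Edge:
    # Hypothetical edge record; field names are illustrative only.
    source: str
    target: str
    tag: str         # "EXTRACTED", "INFERRED", or "AMBIGUOUS"
    file: str        # cited source file
    line_start: int  # 1-based, inclusive
    line_end: int    # 1-based, inclusive

def cited_excerpt(edge: Edge, text: str) -> str:
    """Return the cited line range so it can be quoted alongside the claim."""
    lines = text.splitlines()
    return "\n".join(lines[edge.line_start - 1 : edge.line_end])

def keep_edge(edge: Edge, source_text: str) -> bool:
    """EXTRACTED edges pass; others must at least mention the target in the cited lines."""
    if edge.tag == "EXTRACTED":
        return True
    return edge.target in cited_excerpt(edge, source_text)

def keep_edge_from_disk(edge: Edge) -> bool:
    """Same check, reading the cited file from disk."""
    return keep_edge(edge, Path(edge.file).read_text(encoding="utf-8"))
```

Edges that fail the check are dropped from the analysis rather than reported, matching the "treat the edge as noise" rule.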

## Cost and Rebuild Discipline

The deep-mode rebuild path issues many parallel Claude API calls, so agents must not trigger rebuilds autonomously. When a user's question would benefit from a fresher graph, recommend a rebuild, give a rough sense of its scope and cost (for example, "roughly N files changed since last build, expect a partial rebuild"), and let the user decide.

If `GRAPH_REPORT.md` is older than the most recent commit on the default branch, recommend a user-initiated `graphify . --update` rebuild before relying on it.
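
A sketch of the staleness check and the user-facing recommendation. It assumes the default branch is named `main`, that `graphify-out/` sits at the repository root, and that the report's filesystem modification time is a usable proxy for build time (clones and checkouts reset it, so treat the result as a hint). `git log -1 --format=%ct` is a real git invocation; the helper names are hypothetical, and nothing here starts a rebuild.

```python
import os
import subprocess

def last_commit_epoch(repo: str, branch: str = "main") -> int:
    """Committer timestamp (Unix seconds) of the newest commit on `branch`."""
    # Assumption: the default branch is "main"; pass another name if it differs.
    out = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%ct", branch],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())

def report_is_stale(report_mtime: float, commit_epoch: int) -> bool:
    """True when the report predates the latest default-branch commit."""
    return report_mtime < commit_epoch

def should_recommend_rebuild(repo: str, branch: str = "main") -> bool:
    # Assumption: graphify-out/ lives at the repository root.
    report = os.path.join(repo, "graphify-out", "GRAPH_REPORT.md")
    return report_is_stale(os.path.getmtime(report), last_commit_epoch(repo, branch))

def rebuild_recommendation(changed_files: int) -> str:
    """Message for the user; the agent never runs the rebuild itself."""
    return (
        f"Roughly {changed_files} files changed since the last build; expect a "
        "partial rebuild. Run `graphify . --update` yourself if you want a fresher graph."
    )
```

Keeping the decision (`report_is_stale`) separate from the git and filesystem lookups makes the rule easy to test without a repository.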

## Upload Discipline for Sensitive Trees

Graphify's deep-extraction stage uploads file *contents* to the Claude API (or to Moonshot AI when `MOONSHOT_API_KEY` is set — a Beijing-based backend with separate data-residency implications). Before recommending a deep rebuild, check:

* Does the target tree contain secrets, credentials, or `.env` files that are not gitignored?
* Does the tree contain customer data, regulated material, or content covered by data-residency requirements?
* Is the target a Microsoft-internal repository where third-party upload requires explicit approval?

If any answer is yes, recommend `--mode fast` (AST-only, no LLM, no uploads) instead, and note the reduced fidelity in the conversation. Do not surface `MOONSHOT_API_KEY` as a configuration option in regulated contexts without explicit clearance.
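
The checklist above reduces to a simple decision, sketched here with a hypothetical `TreeAssessment` record whose answers a human or agent fills in. Treating only `regulated_content` as a "regulated context" for the Moonshot rule is an assumption. The `"deep"` return value is a label for the default deep-extraction path, not a claimed CLI flag value; `--mode fast` is the only flag these conventions name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TreeAssessment:
    # Hypothetical record; each field answers one checklist question.
    unignored_secrets: bool      # secrets, credentials, or .env files not gitignored
    regulated_content: bool      # customer data, regulated or data-residency material
    upload_needs_approval: bool  # internal repo where third-party upload needs approval

def recommend_mode(a: TreeAssessment) -> str:
    """'fast' (AST-only, no LLM, no uploads) if any answer is yes, else 'deep'."""
    if a.unignored_secrets or a.regulated_content or a.upload_needs_approval:
        return "fast"
    return "deep"

def may_mention_moonshot(a: TreeAssessment, clearance_granted: bool) -> bool:
    """Assumption: a 'regulated context' means regulated_content is True."""
    return clearance_granted or not a.regulated_content
```

When the result is `"fast"`, the agent recommends `--mode fast` and states in the conversation that the resulting graph has reduced fidelity.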

## Out of Scope

These conventions do not cover:

* How to install or configure `graphifyy` — see upstream `graphify vscode install` / `graphify copilot install`.
* How to register the MCP server with Copilot Chat — `graphify vscode install` writes the workspace config.
* General code-review or refactor practices — graph centrality is not a code-quality signal.

*🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.*