Architecture
Technical architecture of Graphize.
System Overview

```text
┌──────────────────────────────────────────────────────────────────────┐
│                          GRAPHIZE PIPELINE                           │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Step 1: Detect        Step 2: Extract           Step 3: Build       │
│  ┌──────────┐          ┌─────────────────┐       ┌──────────────┐    │
│  │  Scan    │          │ Part A: AST     │       │ Merge AST +  │    │
│  │ sources  │─────────▶│ (deterministic) │──┬───▶│ Semantic     │    │
│  │          │          ├─────────────────┤  │    │ results      │    │
│  └──────────┘          │ Part B: LLM     │  │    └──────────────┘    │
│                        │ (optional)      │──┘           │            │
│                        └─────────────────┘              ▼            │
│                                                  ┌──────────────┐    │
│  Step 4: Analyze       Step 5: Export            │   GraphFS    │    │
│  ┌──────────┐          ┌─────────────────┐       │    Store     │    │
│  │ Cluster  │◀─────────│ God nodes       │◀──────└──────────────┘    │
│  │ Detect   │          │ Surprises       │                           │
│  │ (Louvain)│          │ Questions       │                           │
│  └──────────┘          └─────────────────┘                           │
│       │                                                              │
│       ▼                                                              │
│  Step 6: Output                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │    HTML │ TOON │ JSON │ Neo4j │ Obsidian │ GraphML │ Report    │  │
│  └────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
```
Component Architecture

```text
graphize/
├── cmd/graphize/              # CLI entry point
│   ├── main.go
│   └── cmd/
│       ├── root.go            # Cobra root command
│       ├── init.go            # Initialize database
│       ├── add.go             # Add source repos
│       ├── status.go          # Check source currency
│       ├── analyze.go         # AST extraction
│       ├── enhance.go         # LLM semantic extraction prep
│       ├── merge.go           # Merge semantic edges
│       ├── query.go           # Graph queries
│       ├── report.go          # Analysis reports
│       ├── diff.go            # Graph comparison
│       ├── export.go          # Export (HTML, JSON)
│       ├── serve.go           # MCP server
│       └── summary.go         # Generate summary
│
├── pkg/
│   ├── source/                # Source tracking
│   │   ├── source.go          # Source struct, git integration
│   │   └── manifest.go        # Manifest persistence
│   │
│   ├── extract/               # Extraction engines
│   │   ├── multi.go           # Multi-language orchestrator
│   │   ├── golang/            # Go extractor (go/ast)
│   │   ├── java/              # Java extractor (tree-sitter)
│   │   ├── typescript/        # TypeScript extractor (tree-sitter)
│   │   └── swift/             # Swift extractor (tree-sitter)
│   │
│   ├── cache/                 # Extraction caching
│   │   └── cache.go           # SHA256-based per-file cache
│   │
│   ├── analyze/               # Graph analysis
│   │   ├── gods.go            # God nodes, isolated nodes
│   │   ├── cluster.go         # Community detection wrapper
│   │   ├── surprise.go        # Surprising connections
│   │   ├── questions.go       # Suggested questions
│   │   ├── group.go           # Edge grouping utilities
│   │   └── diff.go            # Graph comparison wrapper
│   │
│   ├── metrics/               # Measurement utilities
│   │   ├── formatter.go       # FormatBytes, FormatNumber
│   │   ├── walker.go          # Source file walking
│   │   └── tokens.go          # LLM token estimation
│   │
│   ├── exporters/             # Export format generators
│   │   ├── cypher/            # Neo4j Cypher statements
│   │   └── obsidian/          # Obsidian vault with wikilinks
│   │
│   └── output/                # Output formatters
│       ├── output.go          # TOON, JSON, YAML
│       ├── html.go            # Cytoscape.js export
│       └── summary.go         # Markdown summary
│
└── agents/                    # Agent infrastructure
    ├── specs/                 # multi-agent-spec definitions
    ├── plugins/               # assistantkit-generated plugins
    └── graph/                 # Graph artifacts
        ├── semantic-edges.json
        └── GRAPH_SUMMARY.md
```
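The tree lists `pkg/cache/cache.go` as a SHA256-based per-file cache. The following is a minimal sketch of that idea. It assumes each source file is keyed by the SHA-256 of its contents and is re-extracted only when that digest changes. The `FileCache` type and its methods are hypothetical, not Graphize's actual API, and the real cache persists entries on disk rather than in memory.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// FileCache is a hypothetical in-memory stand-in for a per-file extraction
// cache: it remembers the content digest each path had when last extracted.
type FileCache struct {
	digests map[string]string // path -> hex SHA-256 of file contents
}

func NewFileCache() *FileCache {
	return &FileCache{digests: make(map[string]string)}
}

// digest returns the hex-encoded SHA-256 of data.
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// NeedsExtract reports whether the file is new or its contents changed since
// the last recorded extraction.
func (c *FileCache) NeedsExtract(path string, contents []byte) bool {
	prev, ok := c.digests[path]
	return !ok || prev != digest(contents)
}

// Record stores the digest of contents after a successful extraction.
func (c *FileCache) Record(path string, contents []byte) {
	c.digests[path] = digest(contents)
}

func main() {
	c := NewFileCache()
	src := []byte("package main\nfunc Main() {}\n")

	fmt.Println(c.NeedsExtract("main.go", src)) // true: never seen
	c.Record("main.go", src)
	fmt.Println(c.NeedsExtract("main.go", src))                // false: unchanged
	fmt.Println(c.NeedsExtract("main.go", append(src, '\n'))) // true: edited
}
```

Content hashing, rather than comparing modification times, keeps the decision stable across checkouts and clones where timestamps change but contents do not.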
Provider Interface
Graphize uses a pluggable provider architecture for language extractors, so external packages can add support for new languages without modifying the core Graphize codebase.
LanguageExtractor Interface

```go
package provider

import "github.com/plexusone/graphfs/pkg/graph"

// FrameworkInfo is defined elsewhere in this package.
type LanguageExtractor interface {
	// Language returns the canonical language name (e.g., "go", "java").
	Language() string

	// Extensions returns the file extensions this extractor handles (e.g., ".go", ".java").
	Extensions() []string

	// CanExtract reports whether this extractor can handle the given file path.
	CanExtract(path string) bool

	// ExtractFile extracts nodes and edges from a source file.
	ExtractFile(path, baseDir string) ([]*graph.Node, []*graph.Edge, error)

	// DetectFramework returns detected framework info, or nil if none is detected.
	DetectFramework(path string) *FrameworkInfo
}
```
Priority-Based Registration
Extractors are registered with a priority level. When several extractors handle the same file extension, the higher-priority extractor overrides the lower-priority ones:

| Priority | Constant | Use Case |
|---|---|---|
| 0 | `PriorityDefault` | Built-in extractors |
| 10 | `PriorityThick` | SDK-based extractors (override default) |
| 100 | `PriorityCustom` | User-provided custom extractors |

A package registers its extractor from an `init` function by passing a factory function and a priority:

```go
func init() {
	provider.Register(func() provider.LanguageExtractor {
		return &MyExtractor{}
	}, provider.PriorityCustom)
}
```
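The override rule can be pictured as a registry keyed by file extension, in which a registration replaces an existing one only when its priority is higher. Here is a minimal sketch of that rule. The `registry` type, its `resolve` method, and the trimmed `Extractor` interface are hypothetical stand-ins, not the real `provider` package. Resolving equal priorities by keeping the first registration is also an assumption.

```go
package main

import "fmt"

// Priority constants mirror the table above.
const (
	PriorityDefault = 0
	PriorityThick   = 10
	PriorityCustom  = 100
)

// Extractor is a trimmed, hypothetical stand-in for provider.LanguageExtractor.
type Extractor interface {
	Language() string
	Extensions() []string
}

type entry struct {
	priority int
	factory  func() Extractor
}

// registry maps each file extension to the highest-priority factory seen so far.
type registry struct {
	byExt map[string]entry
}

func newRegistry() *registry { return &registry{byExt: map[string]entry{}} }

// Register instantiates the factory once to learn its extensions, then claims
// each extension unless an equal-or-higher priority entry already holds it.
func (r *registry) Register(factory func() Extractor, priority int) {
	for _, ext := range factory().Extensions() {
		if cur, ok := r.byExt[ext]; ok && cur.priority >= priority {
			continue
		}
		r.byExt[ext] = entry{priority: priority, factory: factory}
	}
}

// resolve returns a fresh extractor for ext, or nil if none is registered.
func (r *registry) resolve(ext string) Extractor {
	if e, ok := r.byExt[ext]; ok {
		return e.factory()
	}
	return nil
}

// named is a trivial Extractor used only for this demonstration.
type named struct {
	lang string
	exts []string
}

func (n named) Language() string     { return n.lang }
func (n named) Extensions() []string { return n.exts }

func main() {
	r := newRegistry()
	r.Register(func() Extractor { return named{"go-builtin", []string{".go"}} }, PriorityDefault)
	r.Register(func() Extractor { return named{"go-custom", []string{".go"}} }, PriorityCustom)
	r.Register(func() Extractor { return named{"go-thick", []string{".go"}} }, PriorityThick)

	fmt.Println(r.resolve(".go").Language()) // go-custom
	fmt.Println(r.resolve(".rs") == nil)     // true
}
```

Because the outcome depends only on priorities and not on registration order, the order in which `init` functions run across imported packages does not change which extractor wins (except, in this sketch, between equal priorities).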
Built-in Extractors

| Language | Package | Parser |
|---|---|---|
| Go | `pkg/extract/golang` | Native `go/ast` |
| Java | `pkg/extract/java` | Tree-sitter |
| TypeScript | `pkg/extract/typescript` | Tree-sitter |
| Swift | `pkg/extract/swift` | Tree-sitter |
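Custom Extractors
External packages can implement the LanguageExtractor interface and register with the global provider registry:

```go
package myextractor

import (
	"path/filepath"

	"github.com/plexusone/graphfs/pkg/graph"
	"github.com/plexusone/graphize/provider"
)

type Extractor struct{}

func New() provider.LanguageExtractor { return &Extractor{} }

func (e *Extractor) Language() string     { return "mylang" }
func (e *Extractor) Extensions() []string { return []string{".ml"} }

func (e *Extractor) CanExtract(path string) bool { return filepath.Ext(path) == ".ml" }

func (e *Extractor) ExtractFile(path, baseDir string) ([]*graph.Node, []*graph.Edge, error) {
	// Parse the file and build nodes and edges here.
	return nil, nil, nil
}

func (e *Extractor) DetectFramework(path string) *provider.FrameworkInfo { return nil }

func init() {
	provider.Register(New, provider.PriorityCustom)
}
```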
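To register the extractor, import its package from your main package. A blank import (`import _ "<your module path>/myextractor"`) is enough, because importing a package runs its `init` function, which calls `provider.Register`.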
Storage Layer
Graphize uses GraphFS for storage, which writes each node and each edge to its own JSON file under `.graphize/`:

```text
.graphize/
├── manifest.json          # Tracked sources
├── nodes/
│   ├── func_main.go.Main.json
│   ├── type_UserService.json
│   └── ...
├── edges/
│   ├── {hash}.json
│   └── ...
└── cache/
    ├── pkg_handlers_user.go.json
    └── ...
```
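Edge files are named by a hash (`{hash}.json`), which gives every edge a stable file name no matter which characters its node IDs contain. This document does not state which fields GraphFS hashes or which algorithm it uses. The following sketch assumes SHA-256 over the edge's `from`, `type`, and `to` values, truncated to 16 hex characters. Its only purpose is to show why such names are deterministic and git-friendly.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// edgeFileName is a hypothetical naming scheme: the same (from, type, to)
// always maps to the same file, and distinct edges almost never collide.
func edgeFileName(from, edgeType, to string) string {
	h := sha256.New()
	// A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
	for _, part := range []string{from, edgeType, to} {
		h.Write([]byte(part))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))[:16] + ".json"
}

func main() {
	name := edgeFileName("func_handler.go.HandleRequest", "calls", "func_db.go.Query")
	fmt.Println(filepath.Join(".graphize", "edges", name))
}
```

Re-running extraction on unchanged code regenerates identical file names, so `git diff` shows only edges that were actually added or removed.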
Node Format

```json
{
  "id": "func_handler.go.HandleRequest",
  "type": "function",
  "label": "HandleRequest",
  "attrs": {
    "source_file": "pkg/handlers/handler.go",
    "package": "handlers",
    "signature": "func HandleRequest(ctx context.Context, req *Request) error"
  }
}
```
Edge Format

```json
{
  "from": "func_handler.go.HandleRequest",
  "to": "func_db.go.Query",
  "type": "calls",
  "confidence": "EXTRACTED",
  "confidence_score": 1.0
}
```
Node Types

| Type | Description |
|---|---|
| `package` | Go package |
| `file` | Source file |
| `function` | Top-level function |
| `method` | Type method |
| `struct` | Struct type |
| `interface` | Interface type |
Edge Types

| Type | Confidence | Description |
|---|---|---|
| `contains` | EXTRACTED | Package/file contains entity |
| `imports` | EXTRACTED | Package imports another |
| `calls` | EXTRACTED | Function calls another |
| `implements` | EXTRACTED | Type implements interface |
| `embeds` | EXTRACTED | Type embeds another |
| `inferred_depends` | INFERRED | Implicit dependency |
| `implements_pattern` | INFERRED | Design pattern usage |
| `shared_concern` | INFERRED | Cross-cutting concern |
| `similar_to` | INFERRED | Semantic similarity |
| `rationale_for` | INFERRED | Design rationale |
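Analysis Algorithms
Community Detection
Community detection uses the Louvain algorithm (via gonum) to partition the graph by maximizing modularity. In this example, structural edges (`contains`, `imports`) and container nodes (`package`, `file`) are excluded from the computation:

```go
// From graphfs/pkg/analyze/louvain.go
result := DetectCommunitiesLouvain(nodes, edges, LouvainOptions{
	Resolution:       1.0,
	ExcludeEdgeTypes: []string{"contains", "imports"},
	ExcludeNodeTypes: []string{"package", "file"},
})
```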
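Louvain repeatedly moves nodes between communities and then collapses each community into a single node, keeping changes that raise modularity. For an undirected graph, modularity is `Q = Σ_c [ L_c/m − γ·(d_c/2m)² ]`. Here `m` is the number of edges, `L_c` is the number of edges inside community `c`, `d_c` is the total degree of its nodes, and `γ` is the resolution (presumably what `Resolution: 1.0` sets). The following sketch does not implement Louvain. It only scores a given partition, which shows what the algorithm optimizes and how excluding edge types changes the score. The function and types are hypothetical.

```go
package main

import "fmt"

type Edge struct{ From, To, Type string }

// modularity returns Q for an undirected, unweighted graph under the given
// community assignment. Edges whose type is in exclude, self-loops, and edges
// with an endpoint outside the assignment are ignored.
func modularity(edges []Edge, community map[string]int, gamma float64, exclude map[string]bool) float64 {
	internal := map[int]float64{} // L_c: edges with both endpoints in c
	degree := map[int]float64{}   // d_c: total degree of nodes in c
	m := 0.0
	for _, e := range edges {
		if exclude[e.Type] || e.From == e.To {
			continue
		}
		cu, okU := community[e.From]
		cv, okV := community[e.To]
		if !okU || !okV {
			continue
		}
		m++
		degree[cu]++
		degree[cv]++
		if cu == cv {
			internal[cu]++
		}
	}
	if m == 0 {
		return 0
	}
	q := 0.0
	for c, d := range degree {
		q += internal[c]/m - gamma*(d/(2*m))*(d/(2*m))
	}
	return q
}

func main() {
	// Two call triangles joined by one call, plus a structural edge.
	edges := []Edge{
		{"a", "b", "calls"}, {"b", "c", "calls"}, {"a", "c", "calls"},
		{"d", "e", "calls"}, {"e", "f", "calls"}, {"d", "f", "calls"},
		{"c", "d", "calls"},
		{"a", "f", "contains"},
	}
	exclude := map[string]bool{"contains": true, "imports": true}
	split := map[string]int{"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
	merged := map[string]int{"a": 0, "b": 0, "c": 0, "d": 0, "e": 0, "f": 0}

	fmt.Printf("two communities: Q = %.3f\n", modularity(edges, split, 1.0, exclude))  // 0.357
	fmt.Printf("one community:   Q = %.3f\n", modularity(edges, merged, 1.0, exclude)) // 0.000
}
```

Raising `gamma` above 1 penalizes large communities more heavily, which tends to yield more, smaller clusters.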
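Hub Detection
Hub detection identifies highly connected ("god") nodes by total degree, the number of incoming plus outgoing edges, and returns the top N. In this example, `package` and `file` nodes are excluded:

```go
// From graphfs/pkg/analyze/gods.go
hubs := FindHubs(nodes, edges, topN, []string{"package", "file"})
```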
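The ranking can be sketched in a few lines. This `findHubs` is a hypothetical stand-in, not GraphFS's `FindHubs`. It also ignores edges that touch an excluded node, so `contains` edges from packages and files do not add one to every member's degree. Whether the real function does this is an assumption. Ties are broken by ID so the output is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

type Node struct{ ID, Type string }
type Edge struct{ From, To string }

type Hub struct {
	ID     string
	Degree int
}

// findHubs ranks non-excluded nodes by in+out degree and returns the top N.
func findHubs(nodes []Node, edges []Edge, topN int, excludeTypes []string) []Hub {
	skip := map[string]bool{}
	for _, t := range excludeTypes {
		skip[t] = true
	}
	typeOf := map[string]string{}
	for _, n := range nodes {
		typeOf[n.ID] = n.Type
	}
	degree := map[string]int{}
	for _, e := range edges {
		if skip[typeOf[e.From]] || skip[typeOf[e.To]] {
			continue // edge touches an excluded node
		}
		degree[e.From]++
		degree[e.To]++
	}
	hubs := []Hub{}
	for _, n := range nodes {
		if !skip[n.Type] {
			hubs = append(hubs, Hub{n.ID, degree[n.ID]})
		}
	}
	sort.Slice(hubs, func(i, j int) bool {
		if hubs[i].Degree != hubs[j].Degree {
			return hubs[i].Degree > hubs[j].Degree
		}
		return hubs[i].ID < hubs[j].ID
	})
	if topN >= 0 && topN < len(hubs) {
		hubs = hubs[:topN]
	}
	return hubs
}

func main() {
	nodes := []Node{{"pkg", "package"}, {"A", "function"}, {"B", "function"}, {"C", "function"}, {"D", "function"}}
	edges := []Edge{{"pkg", "A"}, {"pkg", "B"}, {"A", "B"}, {"A", "C"}, {"D", "A"}, {"B", "C"}}
	fmt.Println(findHubs(nodes, edges, 2, []string{"package", "file"})) // [{A 3} {B 2}]
}
```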
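Graph Traversal
The query package provides breadth-first (BFS) and depth-first (DFS) traversal, plus path finding between two nodes, restricted to the given edge types:

```go
// From graphfs/pkg/query/traverse.go
traverser := NewTraverser(graph)
result := traverser.BFS(startNode, Outgoing, maxDepth, edgeTypes)
path := traverser.FindPath(from, to, edgeTypes)
```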
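A depth-limited BFS and a shortest-path search over filtered edge types fit in a small adjacency-list sketch. The `Traverser` here is hypothetical and simplified. It follows only outgoing edges, whereas the real `BFS` takes a direction argument. It also returns hop distances instead of GraphFS's result type. Passing an empty or nil type filter allows all edge types.

```go
package main

import "fmt"

type Edge struct{ From, To, Type string }

// Traverser indexes outgoing edges by source node (hypothetical type).
type Traverser struct {
	out map[string][]Edge
}

func NewTraverser(edges []Edge) *Traverser {
	t := &Traverser{out: map[string][]Edge{}}
	for _, e := range edges {
		t.out[e.From] = append(t.out[e.From], e)
	}
	return t
}

// allowed reports whether an edge type passes the filter (empty = all types).
func allowed(types map[string]bool, t string) bool {
	return len(types) == 0 || types[t]
}

// BFS maps every node reachable from start within maxDepth hops to its distance.
func (t *Traverser) BFS(start string, maxDepth int, types map[string]bool) map[string]int {
	dist := map[string]int{start: 0}
	queue := []string{start}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if dist[cur] >= maxDepth {
			continue
		}
		for _, e := range t.out[cur] {
			if _, seen := dist[e.To]; seen || !allowed(types, e.Type) {
				continue
			}
			dist[e.To] = dist[cur] + 1
			queue = append(queue, e.To)
		}
	}
	return dist
}

// FindPath returns a path with the fewest hops from -> to, or nil if none exists.
func (t *Traverser) FindPath(from, to string, types map[string]bool) []string {
	parent := map[string]string{from: ""}
	queue := []string{from}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == to {
			var path []string
			for n := to; n != ""; n = parent[n] {
				path = append([]string{n}, path...)
			}
			return path
		}
		for _, e := range t.out[cur] {
			if _, seen := parent[e.To]; seen || !allowed(types, e.Type) {
				continue
			}
			parent[e.To] = cur
			queue = append(queue, e.To)
		}
	}
	return nil
}

func main() {
	tr := NewTraverser([]Edge{
		{"main", "handle", "calls"}, {"handle", "query", "calls"},
		{"query", "db", "calls"}, {"handle", "log", "calls"}, {"main", "cfg", "imports"},
	})
	calls := map[string]bool{"calls": true}
	fmt.Println(len(tr.BFS("main", 2, calls)))     // 4
	fmt.Println(tr.FindPath("main", "db", calls)) // [main handle query db]
}
```

Both searches visit each node and edge at most once, which matches the O(V + E) entry in the performance table.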
MCP Server
The MCP server (`graphize serve`) exposes tools for AI agents:

| Tool | Purpose |
|---|---|
| `query_graph` | Search and traverse |
| `get_node` | Node details |
| `get_neighbors` | Adjacent nodes |
| `get_community` | Community members |
| `graph_summary` | Statistics |

See MCP Server for details.
Dependencies

| Package | Purpose |
|---|---|
| github.com/plexusone/graphfs | Graph storage |
| github.com/spf13/cobra | CLI framework |
| github.com/modelcontextprotocol/go-sdk | MCP server |
| gonum.org/v1/gonum | Graph algorithms |
| github.com/yaricom/goGraphML | GraphML export |
| github.com/grokify/cytoscape-go | Cytoscape.js export |
Performance Characteristics
| Operation | Complexity | Typical Time |
|---|---|---|
| AST extraction | O(files) | <30s for 20K nodes |
| Community detection | O(edges) | <5s for 70K edges |
| BFS traversal | O(V + E) | <100ms |
| HTML export | O(nodes + edges) | <3s for 20K nodes |
Design Decisions
Why GraphFS?
- Git-friendly (one file per entity)
- Deterministic serialization
- Schema validation
- Referential integrity
Why Louvain?
- Well-understood algorithm
- Available in gonum
- Good balance of quality and speed
- Hierarchical communities
Why TOON Output?
- ~8x more token-efficient than JSON
- Designed for AI agent consumption
- Preserves essential structure
- Human-readable
Why Two-Step Extraction?
- AST extraction is deterministic and fast
- LLM extraction is optional and expensive
- Separating them allows incremental updates
- Different confidence levels for different sources
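Keeping AST and LLM results separate means Step 3 has to merge them. The component tree shows semantic edges stored in `agents/graph/semantic-edges.json` and merged by `merge.go`. The following sketch shows one plausible merge policy. The policy itself is an assumption, not documented Graphize behavior:

- Edges are keyed by `(from, type, to)`.
- A deterministic `EXTRACTED` edge is never replaced by an `INFERRED` duplicate.
- Among duplicate inferred edges, the higher score wins.
- Inferred edges that reference unknown nodes are dropped to preserve referential integrity.

```go
package main

import "fmt"

type Edge struct {
	From, To, Type  string
	Confidence      string // "EXTRACTED" or "INFERRED"
	ConfidenceScore float64
}

type key struct{ from, typ, to string }

// mergeEdges keeps all AST edges, then layers semantic edges on top under the
// hypothetical policy described above.
func mergeEdges(nodeIDs map[string]bool, ast, semantic []Edge) []Edge {
	merged := make([]Edge, 0, len(ast)+len(semantic))
	index := map[key]int{}
	for _, e := range ast {
		index[key{e.From, e.Type, e.To}] = len(merged)
		merged = append(merged, e)
	}
	for _, e := range semantic {
		if !nodeIDs[e.From] || !nodeIDs[e.To] {
			continue // dangling reference: drop it
		}
		k := key{e.From, e.Type, e.To}
		if i, ok := index[k]; ok {
			if merged[i].Confidence != "EXTRACTED" && e.ConfidenceScore > merged[i].ConfidenceScore {
				merged[i] = e
			}
			continue
		}
		index[k] = len(merged)
		merged = append(merged, e)
	}
	return merged
}

func main() {
	ids := map[string]bool{"A": true, "B": true, "C": true}
	ast := []Edge{{From: "A", To: "B", Type: "calls", Confidence: "EXTRACTED", ConfidenceScore: 1.0}}
	semantic := []Edge{
		{From: "A", To: "B", Type: "calls", Confidence: "INFERRED", ConfidenceScore: 0.7},
		{From: "A", To: "C", Type: "similar_to", Confidence: "INFERRED", ConfidenceScore: 0.6},
		{From: "A", To: "C", Type: "similar_to", Confidence: "INFERRED", ConfidenceScore: 0.8},
		{From: "B", To: "X", Type: "inferred_depends", Confidence: "INFERRED", ConfidenceScore: 0.9},
	}
	for _, e := range mergeEdges(ids, ast, semantic) {
		fmt.Printf("%s -%s-> %s (%s %.1f)\n", e.From, e.Type, e.To, e.Confidence, e.ConfidenceScore)
	}
}
```

Because the AST pass never depends on the semantic file, re-running `analyze` after a code change only requires re-merging the existing semantic edges, not another LLM pass.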