Skip to content

Architecture

Technical architecture of Graphize.

System Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                         GRAPHIZE PIPELINE                                │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Step 1: Detect        Step 2: Extract           Step 3: Build          │
│  ┌──────────┐          ┌─────────────────┐       ┌──────────────┐       │
│  │ Scan     │          │ Part A: AST     │       │ Merge AST +  │       │
│  │ sources  │─────────▶│ (deterministic) │──┬───▶│ Semantic     │       │
│  │          │          ├─────────────────┤  │    │ results      │       │
│  └──────────┘          │ Part B: LLM     │  │    └──────────────┘       │
│                        │ (optional)      │──┘           │               │
│                        └─────────────────┘              ▼               │
│                                                  ┌──────────────┐       │
│  Step 4: Analyze       Step 5: Export           │ GraphFS      │       │
│  ┌──────────┐          ┌─────────────────┐      │ Store        │       │
│  │ Cluster  │◀─────────│ God nodes       │◀─────└──────────────┘       │
│  │ Detect   │          │ Surprises       │                              │
│  │ (Louvain)│          │ Questions       │                              │
│  └──────────┘          └─────────────────┘                              │
│       │                                                                  │
│       ▼                                                                  │
│  Step 6: Output                                                          │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ HTML │ TOON │ JSON │ Neo4j │ Obsidian │ GraphML │ Report       │    │
│  └─────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────┘

Component Architecture

graphize/
├── cmd/graphize/           # CLI entry point
│   ├── main.go
│   └── cmd/
│       ├── root.go         # Cobra root command
│       ├── init.go         # Initialize database
│       ├── add.go          # Add source repos
│       ├── status.go       # Check source currency
│       ├── analyze.go      # AST extraction
│       ├── enhance.go      # LLM semantic extraction prep
│       ├── merge.go        # Merge semantic edges
│       ├── query.go        # Graph queries
│       ├── report.go       # Analysis reports
│       ├── diff.go         # Graph comparison
│       ├── export.go       # Export (HTML, JSON)
│       ├── serve.go        # MCP server
│       └── summary.go      # Generate summary
├── pkg/
│   ├── source/             # Source tracking
│   │   ├── source.go       # Source struct, git integration
│   │   └── manifest.go     # Manifest persistence
│   │
│   ├── extract/            # Extraction engines
│   │   ├── multi.go        # Multi-language orchestrator
│   │   ├── golang/         # Go extractor (go/ast)
│   │   ├── java/           # Java extractor (tree-sitter)
│   │   ├── typescript/     # TypeScript extractor (tree-sitter)
│   │   └── swift/          # Swift extractor (tree-sitter)
│   │
│   ├── cache/              # Extraction caching
│   │   └── cache.go        # SHA256-based per-file cache
│   │
│   ├── analyze/            # Graph analysis
│   │   ├── gods.go         # God nodes, isolated nodes
│   │   ├── cluster.go      # Community detection wrapper
│   │   ├── surprise.go     # Surprising connections
│   │   ├── questions.go    # Suggested questions
│   │   ├── group.go        # Edge grouping utilities
│   │   └── diff.go         # Graph comparison wrapper
│   │
│   ├── metrics/            # Measurement utilities
│   │   ├── formatter.go    # FormatBytes, FormatNumber
│   │   ├── walker.go       # Source file walking
│   │   └── tokens.go       # LLM token estimation
│   │
│   ├── exporters/          # Export format generators
│   │   ├── cypher/         # Neo4j Cypher statements
│   │   └── obsidian/       # Obsidian vault with wikilinks
│   │
│   └── output/             # Output formatters
│       ├── output.go       # TOON, JSON, YAML
│       ├── html.go         # Cytoscape.js export
│       └── summary.go      # Markdown summary
└── agents/                 # Agent infrastructure
    ├── specs/              # multi-agent-spec definitions
    ├── plugins/            # assistantkit-generated plugins
    └── graph/              # Graph artifacts
        ├── semantic-edges.json
        └── GRAPH_SUMMARY.md

Provider Interface

Graphize uses a pluggable provider architecture for language extractors. This allows external packages to add support for new languages without modifying the core graphize codebase.

LanguageExtractor Interface

package provider

import "github.com/plexusone/graphfs/pkg/graph"

type LanguageExtractor interface {
    // Language returns the canonical language name (e.g., "go", "java")
    Language() string

    // Extensions returns file extensions this extractor handles (e.g., ".go", ".java")
    Extensions() []string

    // CanExtract returns true if this extractor can handle the given file path
    CanExtract(path string) bool

    // ExtractFile extracts nodes and edges from a source file
    ExtractFile(path, baseDir string) ([]*graph.Node, []*graph.Edge, error)

    // DetectFramework returns detected framework info, or nil if none detected
    DetectFramework(path string) *FrameworkInfo
}

Priority-Based Registration

Extractors are registered with a priority level. Higher priority extractors override lower priority ones for the same file extension:

Priority Constant Use Case
0 PriorityDefault Built-in extractors
10 PriorityThick SDK-based extractors (override default)
100 PriorityCustom User-provided custom extractors
func init() {
    provider.Register(func() provider.LanguageExtractor {
        return &MyExtractor{}
    }, provider.PriorityCustom)
}

Built-in Extractors

Language Package Parser
Go pkg/extract/golang Native go/ast
Java pkg/extract/java Tree-sitter
TypeScript pkg/extract/typescript Tree-sitter
Swift pkg/extract/swift Tree-sitter

Custom Extractors

External packages can implement the LanguageExtractor interface and register with the global provider registry:

package myextractor

import (
    "github.com/plexusone/graphize/provider"
    "github.com/plexusone/graphfs/pkg/graph"
)

type Extractor struct{}

func New() provider.LanguageExtractor { return &Extractor{} }

func (e *Extractor) Language() string { return "mylang" }
func (e *Extractor) Extensions() []string { return []string{".ml"} }
// ... implement remaining interface methods

func init() {
    provider.Register(New, provider.PriorityCustom)
}

Import the extractor in your main package to register it:

import _ "github.com/example/graphize-mylang"

Storage Layer

Graphize uses GraphFS for storage:

.graphize/
├── manifest.json           # Tracked sources
├── nodes/
│   ├── func_main.go.Main.json
│   ├── type_UserService.json
│   └── ...
├── edges/
│   ├── {hash}.json
│   └── ...
└── cache/
    ├── pkg_handlers_user.go.json
    └── ...

Node Format

{
  "id": "func_handler.go.HandleRequest",
  "type": "function",
  "label": "HandleRequest",
  "attrs": {
    "source_file": "pkg/handlers/handler.go",
    "package": "handlers",
    "signature": "func HandleRequest(ctx context.Context, req *Request) error"
  }
}

Edge Format

{
  "from": "func_handler.go.HandleRequest",
  "to": "func_db.go.Query",
  "type": "calls",
  "confidence": "EXTRACTED",
  "confidence_score": 1.0
}

Node Types

Type Description
package Go package
file Source file
function Top-level function
method Type method
struct Struct type
interface Interface type

Edge Types

Type Confidence Description
contains EXTRACTED Package/file contains entity
imports EXTRACTED Package imports another
calls EXTRACTED Function calls another
implements EXTRACTED Type implements interface
embeds EXTRACTED Type embeds another
inferred_depends INFERRED Implicit dependency
implements_pattern INFERRED Design pattern usage
shared_concern INFERRED Cross-cutting concern
similar_to INFERRED Semantic similarity
rationale_for INFERRED Design rationale

Analysis Algorithms

Community Detection

Uses the Louvain algorithm (via gonum) for modularity optimization:

// From graphfs/pkg/analyze/louvain.go
result := DetectCommunitiesLouvain(nodes, edges, LouvainOptions{
    Resolution: 1.0,
    ExcludeEdgeTypes: []string{"contains", "imports"},
    ExcludeNodeTypes: []string{"package", "file"},
})

Hub Detection

Identifies highly connected nodes by total degree:

// From graphfs/pkg/analyze/gods.go
hubs := FindHubs(nodes, edges, topN, []string{"package", "file"})

Graph Traversal

BFS and DFS traversal for path finding:

// From graphfs/pkg/query/traverse.go
traverser := NewTraverser(graph)
result := traverser.BFS(startNode, Outgoing, maxDepth, edgeTypes)
path := traverser.FindPath(from, to, edgeTypes)

MCP Server

The MCP server (graphize serve) exposes tools for AI agents:

Tool Purpose
query_graph Search and traverse
get_node Node details
get_neighbors Adjacent nodes
get_community Community members
graph_summary Statistics

See MCP Server for details.

Dependencies

Package Purpose
github.com/plexusone/graphfs Graph storage
github.com/spf13/cobra CLI framework
github.com/modelcontextprotocol/go-sdk MCP server
gonum.org/v1/gonum Graph algorithms
github.com/yaricom/goGraphML GraphML export
github.com/grokify/cytoscape-go Cytoscape.js export

Performance Characteristics

Operation Complexity Typical Time
AST extraction O(files) <30s for 20K nodes
Community detection O(edges) <5s for 70K edges
BFS traversal O(V + E) <100ms
HTML export O(nodes + edges) <3s for 20K nodes

Design Decisions

Why GraphFS?

  • Git-friendly (one file per entity)
  • Deterministic serialization
  • Schema validation
  • Referential integrity

Why Louvain?

  • Well-understood algorithm
  • Available in gonum
  • Good balance of quality and speed
  • Hierarchical communities

Why TOON Output?

  • ~8x more token-efficient than JSON
  • Designed for AI agent consumption
  • Preserves essential structure
  • Human-readable

Why Two-Step Extraction?

  • AST extraction is deterministic and fast
  • LLM extraction is optional and expensive
  • Separating them allows incremental updates
  • Different confidence levels for different sources