# Parser and Analysis Tool Landscape
Technical comparison of parsing approaches for security reachability analysis.
## Why Parser Choice Matters
Security-grade reachability analysis requires:
- Type resolution - Know the actual types being called
- Call graph construction - Trace execution paths
- Semantic understanding - Understand framework patterns
- Cross-package analysis - Follow imports and dependencies
Not all parsers provide these capabilities.
## Tree-sitter: Strengths and Limitations
Tree-sitter is widely used for multi-language parsing.
### What Tree-sitter IS Good For
- Fast incremental parsing
- Syntax trees (CST, not semantic AST)
- Editor tooling and syntax highlighting
- Code navigation
- Language-agnostic structural queries
### What Tree-sitter is NOT Good For
- Type resolution
- Call graph construction
- Semantic reachability
- Framework understanding
- Vulnerability analysis
Key insight: Tree-sitter is a front-end parser, not an analysis engine.
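For security reachability, this means Tree-sitter alone cannot determine whether a vulnerable function is actually called. It can find call expressions whose names match, but it cannot tell which declaration a given call resolves to.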
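To make the distinction concrete, the sketch below uses Go's own `go/ast` as a stand-in for a purely syntactic parser (Tree-sitter itself is not used here) and compares it with `go/types` resolution. Two hypothetical types both define `Decode`, and only `(*LegacyDecoder).Decode` is treated as vulnerable for illustration. A name-only match flags the `s.Decode()` call. The type-resolved match correctly reports that the vulnerable method is never called.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src declares two types with a method of the same name. Only
// (*LegacyDecoder).Decode is treated as "vulnerable" in this illustration.
const src = `package demo

type SafeDecoder struct{}

func (SafeDecoder) Decode() {}

type LegacyDecoder struct{}

func (*LegacyDecoder) Decode() {}

func handler() {
	var s SafeDecoder
	s.Decode()
}
`

// countMatches reports how many method calls a name-only (syntactic) match
// flags and how many actually resolve to target according to go/types.
func countMatches(target string) (syntactic, resolved int, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return 0, 0, err
	}
	info := &types.Info{Uses: make(map[*ast.Ident]types.Object)}
	var conf types.Config
	if _, err := conf.Check("demo", fset, []*ast.File{file}, info); err != nil {
		return 0, 0, err
	}
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		// Syntactic view: only the selector name is available.
		if sel.Sel.Name == "Decode" {
			syntactic++
		}
		// Semantic view: go/types records which function object is used.
		if fn, ok := info.Uses[sel.Sel].(*types.Func); ok && fn.FullName() == target {
			resolved++
		}
		return true
	})
	return syntactic, resolved, nil
}

func main() {
	syn, res, err := countMatches("(*demo.LegacyDecoder).Decode")
	if err != nil {
		panic(err)
	}
	fmt.Printf("name-only matches: %d, type-resolved matches: %d\n", syn, res)
}
```

Running it prints `name-only matches: 1, type-resolved matches: 0`. A name-only tool would report a false positive here.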
## Language-Specific Analysis Tools
Serious reachability analysis requires compiler-level APIs:
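### Go
Recommended: `go/ast` + `go/types` + `golang.org/x/tools/go/ssa`
Go is exceptionally well-suited for reachability analysis because it offers:
- Static typing with minimal reflection
- First-class analysis tooling: `go/ast` and `go/types` in the standard library, plus the Go team's `golang.org/x/tools` module
- `golang.org/x/tools/go/callgraph` for call graph construction
- `golang.org/x/tools/go/ssa` for SSA-form analysis
- A standardized module system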
Tools like govulncheck achieve high precision due to these capabilities.
### Java
Recommended: Eclipse JDT, Soot, WALA
| Tool | Use Case |
|---|---|
| Eclipse JDT | AST + compiler frontend |
| Soot | Bytecode analysis, call graphs |
| WALA | Points-to analysis, slicing |
| Spoon | AST transformation |
Java bytecode analysis (via Soot) often provides better precision than source analysis.
### TypeScript / JavaScript
Recommended: TypeScript Compiler API
The TypeScript compiler API provides:
- Full type information
- Symbol resolution
- Building blocks for call graph extraction (resolved symbols and call signatures from the type checker)
- Module resolution
For JavaScript without types, analysis precision degrades significantly.
### Python
Challenge: Dynamic typing limits static analysis precision.
| Tool | Capability |
|---|---|
| `ast` module | Basic AST (stdlib) |
| mypy internals | Type inference |
| Pyright | Type checking |
Python reachability often requires runtime analysis or conservative approximation.
### C / C++
Recommended: Clang/LLVM LibTooling
Clang AST is arguably best-in-class for C/C++:
- Rich semantic information
- Used by many SAST and linting tools (e.g., Clang Static Analyzer, clang-tidy)
- LLVM IR for deeper analysis
### Rust
Recommended: rustc internals, MIR
| Level | Use Case |
|---|---|
| `syn` crate | AST parsing |
| HIR | High-level IR |
| MIR | Mid-level IR (comparable to Go SSA) |
MIR provides the best precision for Rust reachability.
### C# / .NET
Recommended: Roslyn Compiler APIs
Roslyn provides:
- Full semantic model
- Symbol resolution
- Flow analysis
- Refactoring engines
Roslyn is one of the strongest analysis ecosystems outside of Go.
## Recommended Architecture
For a multi-language reachability system:
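### Layer 1: Parsing
Use language-specific parsers rather than a single universal parser: for example, `go/ast` for Go, the TypeScript Compiler API for TypeScript, Eclipse JDT for Java, and Roslyn for C#.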
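One way to enforce this layer is a registry that maps file extensions to language-specific front ends. The registry reports unsupported files instead of silently falling back to a generic parser. The sketch below is hypothetical. The `Parser` interface, `parserFor`, and the extension table are illustrative names, and the entries only record which front end would be used rather than wrapping the real APIs.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Parser is a hypothetical per-language front end. A real implementation
// would wrap go/ast, the TypeScript Compiler API, Roslyn, and so on.
type Parser interface {
	Name() string
}

// namedParser is a placeholder Parser that only carries a name.
type namedParser string

func (p namedParser) Name() string { return string(p) }

// registry maps file extensions to language-specific parsers.
var registry = map[string]Parser{
	".go":   namedParser("go/ast + go/types"),
	".ts":   namedParser("TypeScript Compiler API"),
	".java": namedParser("Eclipse JDT"),
	".cs":   namedParser("Roslyn"),
}

// parserFor selects a parser by extension and returns an error rather than
// falling back to a generic parser, so unsupported files stay visible.
func parserFor(path string) (Parser, error) {
	p, ok := registry[filepath.Ext(path)]
	if !ok {
		return nil, fmt.Errorf("no language-specific parser for %q", path)
	}
	return p, nil
}

func main() {
	for _, f := range []string{"cmd/main.go", "web/app.ts", "scripts/build.rb"} {
		p, err := parserFor(f)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %s\n", f, p.Name())
	}
}
```

Making unsupported files an explicit error, rather than a silent fallback, keeps coverage gaps visible to later layers and to the confidence scoring discussed below.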
### Layer 2: Semantic Enrichment
- Type resolution
- Import graph construction
- Symbol linking
- Framework pattern detection
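For Go, import graph construction needs only the standard library. `go/parser` with the `parser.ImportsOnly` mode stops after the import declarations. The sketch assumes one in-memory source file per package, keyed by hypothetical package paths. A real tool would load whole packages from disk.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"sort"
	"strconv"
)

// importGraph maps each package path to the sorted import paths of its
// source. It assumes one in-memory file per package.
func importGraph(files map[string]string) (map[string][]string, error) {
	fset := token.NewFileSet()
	graph := make(map[string][]string)
	for pkgPath, src := range files {
		// ImportsOnly stops parsing after the import declarations.
		f, err := parser.ParseFile(fset, pkgPath+".go", src, parser.ImportsOnly)
		if err != nil {
			return nil, err
		}
		for _, imp := range f.Imports {
			// imp.Path.Value is a quoted string literal, e.g. "\"fmt\"".
			path, err := strconv.Unquote(imp.Path.Value)
			if err != nil {
				return nil, err
			}
			graph[pkgPath] = append(graph[pkgPath], path)
		}
		sort.Strings(graph[pkgPath])
	}
	return graph, nil
}

func main() {
	files := map[string]string{
		"example.com/app": `package main

import (
	"fmt"
	"example.com/lib"
)
`,
		"example.com/lib": `package lib

import "encoding/json"
`,
	}
	graph, err := importGraph(files)
	if err != nil {
		panic(err)
	}
	pkgs := make([]string, 0, len(graph))
	for p := range graph {
		pkgs = append(pkgs, p)
	}
	sort.Strings(pkgs)
	for _, p := range pkgs {
		fmt.Println(p, "->", graph[p])
	}
}
```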
### Layer 3: IR Construction
- Call graph
- Control flow graph
- Data flow graph
- Dependency graph
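As a minimal sketch of call graph construction, the code below type-checks a single in-memory file with `go/types` and records an edge for every call whose callee resolves to a `*types.Func`. It only covers static calls. Calls through function values are skipped, and interface method calls map to the abstract interface method rather than to concrete targets; SSA-based builders in `golang.org/x/tools/go/callgraph` handle those cases. The package `app` and its functions are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"sort"
)

// src is a hypothetical package used to demonstrate the builder.
const src = `package app

func main() { handle() }

func handle() {
	parse()
	log()
}

func parse()  { decode() }
func decode() {}
func log()    {}
func unused() { decode() }
`

// buildCallGraph returns caller -> sorted, de-duplicated callees for every
// call whose target go/types resolves to a declared function or method.
func buildCallGraph(source string) (map[string][]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "app.go", source, 0)
	if err != nil {
		return nil, err
	}
	info := &types.Info{
		Defs: make(map[*ast.Ident]types.Object),
		Uses: make(map[*ast.Ident]types.Object),
	}
	var conf types.Config
	if _, err := conf.Check("app", fset, []*ast.File{file}, info); err != nil {
		return nil, err
	}
	graph := make(map[string][]string)
	for _, decl := range file.Decls {
		fd, ok := decl.(*ast.FuncDecl)
		if !ok || fd.Body == nil {
			continue
		}
		callerObj, ok := info.Defs[fd.Name].(*types.Func)
		if !ok {
			continue
		}
		caller := callerObj.FullName()
		seen := make(map[string]bool)
		ast.Inspect(fd.Body, func(n ast.Node) bool {
			call, ok := n.(*ast.CallExpr)
			if !ok {
				return true
			}
			// The callee identifier is f (plain call) or the Sel of x.f.
			var id *ast.Ident
			switch fun := call.Fun.(type) {
			case *ast.Ident:
				id = fun
			case *ast.SelectorExpr:
				id = fun.Sel
			}
			if id == nil {
				return true
			}
			// Builtins and type conversions resolve to other object kinds.
			if callee, ok := info.Uses[id].(*types.Func); ok && !seen[callee.FullName()] {
				seen[callee.FullName()] = true
				graph[caller] = append(graph[caller], callee.FullName())
			}
			return true
		})
		sort.Strings(graph[caller])
	}
	return graph, nil
}

func main() {
	graph, err := buildCallGraph(src)
	if err != nil {
		panic(err)
	}
	callers := make([]string, 0, len(graph))
	for c := range graph {
		callers = append(callers, c)
	}
	sort.Strings(callers)
	for _, c := range callers {
		fmt.Println(c, "->", graph[c])
	}
}
```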
### Layer 4: Reachability Engine
- Graph traversal (BFS/DFS)
- Path finding
- Taint analysis
- Vulnerability mapping
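Graph traversal and path finding can be combined. A breadth-first search from an entry point that remembers each node's parent yields the shortest call chain to a vulnerable function, and that chain doubles as evidence in a report. The sketch assumes the plain caller-to-callees map that Layer 3 would produce. The graph, the entry point `main`, and the `vulnlib.*` symbols are hypothetical.

```go
package main

import "fmt"

// shortestCallPath runs a breadth-first search from entry over a call graph
// (caller -> callees) and returns the shortest call chain to target, or nil
// if target is unreachable from entry.
func shortestCallPath(graph map[string][]string, entry, target string) []string {
	// parent records how each visited function was first reached;
	// "" marks the entry point.
	parent := map[string]string{entry: ""}
	queue := []string{entry}
	for len(queue) > 0 {
		fn := queue[0]
		queue = queue[1:]
		if fn == target {
			// Walk parent links back to the entry to rebuild the path.
			var path []string
			for cur := target; cur != ""; cur = parent[cur] {
				path = append([]string{cur}, path...)
			}
			return path
		}
		for _, callee := range graph[fn] {
			if _, visited := parent[callee]; !visited {
				parent[callee] = fn
				queue = append(queue, callee)
			}
		}
	}
	return nil
}

func main() {
	graph := map[string][]string{
		"main":       {"handler", "initConfig"},
		"handler":    {"parseInput"},
		"parseInput": {"vulnlib.Decode"},
		"adminTool":  {"vulnlib.Render"}, // not reachable from main
	}
	fmt.Println(shortestCallPath(graph, "main", "vulnlib.Decode"))
	if shortestCallPath(graph, "main", "vulnlib.Render") == nil {
		fmt.Println("vulnlib.Render: unreachable from main")
	}
}
```

This prints `[main handler parseInput vulnlib.Decode]` followed by `vulnlib.Render: unreachable from main`. BFS is chosen over DFS here because the first path it finds is also the shortest, which tends to be the easiest to review.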
## graphize Approach
graphize uses this hybrid architecture:
| Language | Parser | Semantic Analysis |
|---|---|---|
| Go | `go/ast` | `go/types`, SemanticExtractor |
| Java | Tree-sitter | Spring annotation detection |
| TypeScript | Tree-sitter | Import resolution |
| Swift | Tree-sitter | Protocol detection |
Go gets full semantic analysis via go/types and the SemanticExtractor.
Other languages use Tree-sitter for structure with framework-specific heuristics for semantic understanding.
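For Spring, a framework heuristic of this kind might treat methods carrying request-mapping annotations as entry points. The sketch below is not graphize's implementation. It uses a regular expression over Java source as a rough stand-in for a Tree-sitter query. It also misses some cases, such as another annotation placed between the mapping annotation and the method signature, or a `)` inside annotation arguments.

```go
package main

import (
	"fmt"
	"regexp"
)

// mappingRE matches a Spring request-mapping annotation with optional
// arguments, then optional modifiers, a return type, and the method name
// (capture group 1). It assumes no other annotation sits in between.
var mappingRE = regexp.MustCompile(
	`@(?:Get|Post|Put|Delete|Patch|Request)Mapping(?:\([^)]*\))?\s+` +
		`(?:(?:public|protected|private|static|final)\s+)*` +
		`[\w<>\[\],.? ]+?\s+(\w+)\s*\(`)

// sampleController is a hypothetical Spring controller.
const sampleController = `@RestController
@RequestMapping("/api")
public class UserController {
    @GetMapping("/users/{id}")
    public String getUser(@PathVariable String id) {
        return service.find(id);
    }

    @PostMapping("/users")
    public ResponseEntity<String> createUser(@RequestBody String body) {
        return ResponseEntity.ok(body);
    }

    private String helper() {
        return "";
    }
}
`

// springEntryPoints returns the names of annotated handler methods.
// The class-level @RequestMapping does not match because no "(" follows
// the class name before the opening brace.
func springEntryPoints(javaSrc string) []string {
	var names []string
	for _, m := range mappingRE.FindAllStringSubmatch(javaSrc, -1) {
		names = append(names, m[1])
	}
	return names
}

func main() {
	fmt.Println(springEntryPoints(sampleController))
}
```

This prints `[getUser createUser]`. The unannotated `helper` method is not treated as an entry point.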
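## Why Go is Special
Go provides uniquely strong guarantees for static analysis:
- Simple language - Limited metaprogramming, predictable call structure
- Analysis tooling - `go/ast` and `go/types` in the standard library, with `go/callgraph` and `go/ssa` in `golang.org/x/tools`
- Module system - Reproducible builds, clear dependencies
- Interface semantics - Interfaces are satisfied implicitly, but method sets are known at compile time, so an interface call can be resolved to the finite set of concrete types in the program that implement it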
This makes Go one of the easiest languages for:
- SBOM accuracy
- Reachability precision
- Dependency graph correctness
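The interface point can be illustrated with `go/types`. None of the types below declares that it implements `Store`, yet `types.Implements` decides satisfaction from method sets, so the candidate targets of a `Store.Save` call are computable. This is the idea behind class hierarchy analysis in `golang.org/x/tools/go/callgraph/cha`. The sketch below is a simplified single-package version, and `Store`, `DiskStore`, `MemStore`, and `Logger` are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"sort"
)

// src is a hypothetical package: DiskStore implements Store via a value
// receiver, MemStore only via a pointer receiver, Logger not at all.
const src = `package app

type Store interface{ Save(key string) error }

type DiskStore struct{}

func (DiskStore) Save(key string) error { return nil }

type MemStore struct{}

func (*MemStore) Save(key string) error { return nil }

type Logger struct{}

func (Logger) Log(msg string) {}
`

// implementers returns the package's concrete named types whose value type
// (or, failing that, pointer type, shown with a "*") implements ifaceName.
func implementers(ifaceName string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "app.go", src, 0)
	if err != nil {
		return nil, err
	}
	var conf types.Config
	pkg, err := conf.Check("app", fset, []*ast.File{file}, nil)
	if err != nil {
		return nil, err
	}
	obj := pkg.Scope().Lookup(ifaceName)
	if obj == nil {
		return nil, fmt.Errorf("%s not found", ifaceName)
	}
	iface, ok := obj.Type().Underlying().(*types.Interface)
	if !ok {
		return nil, fmt.Errorf("%s is not an interface", ifaceName)
	}
	var out []string
	for _, name := range pkg.Scope().Names() {
		tn, ok := pkg.Scope().Lookup(name).(*types.TypeName)
		if !ok || types.IsInterface(tn.Type()) {
			continue
		}
		// Satisfaction is structural: compare method sets, no declaration needed.
		switch {
		case types.Implements(tn.Type(), iface):
			out = append(out, name)
		case types.Implements(types.NewPointer(tn.Type()), iface):
			out = append(out, "*"+name)
		}
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	impls, err := implementers("Store")
	if err != nil {
		panic(err)
	}
	fmt.Println(impls)
}
```

This prints `[*MemStore DiskStore]`. A call through `Store.Save` can therefore be resolved to those two methods and nothing else in the package.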
## Implications for graphize-appsec
Given parser limitations:
| Language | Reachability Confidence |
|---|---|
| Go | High (type-resolved call graphs) |
| Java (Spring) | Medium-High (annotation patterns) |
| TypeScript | Medium (structural analysis) |
| Python | Low-Medium (dynamic typing) |
For languages without strong semantic analysis, graphize-appsec should:
- Report lower confidence scores
- Flag for manual review
- Use conservative (over-approximate) analysis
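Here is a minimal sketch of these three measures. It assumes a hypothetical `CallSite` record in which an empty `Target` means the callee could not be resolved (for example, a method call on an untyped Python object). Unresolved calls are linked to every known function with the same name, and those edges are marked as approximate. Any path that uses an approximate edge is downgraded to low confidence and flagged for manual review. The type names, the `"low"` label, and the Python-style symbols are illustrative assumptions, not graphize-appsec's actual scoring.

```go
package main

import (
	"fmt"
	"sort"
)

// CallSite is a hypothetical record of one call. Target holds the resolved
// callee, or "" when static analysis could not resolve it.
type CallSite struct {
	Caller string
	Name   string // bare function/method name seen at the call site
	Target string
}

// Edge is a call-graph edge; Approx marks edges added by over-approximation.
type Edge struct {
	From, To string
	Approx   bool
}

// conservativeEdges keeps resolved calls as-is and links each unresolved
// call to every known function with the same bare name, so a vulnerable
// callee is not silently dropped. If no candidate exists, no edge is added.
func conservativeEdges(calls []CallSite, byName map[string][]string) []Edge {
	var edges []Edge
	for _, c := range calls {
		if c.Target != "" {
			edges = append(edges, Edge{From: c.Caller, To: c.Target})
			continue
		}
		// Copy before sorting so the caller's index is not reordered.
		candidates := append([]string(nil), byName[c.Name]...)
		sort.Strings(candidates)
		for _, t := range candidates {
			edges = append(edges, Edge{From: c.Caller, To: t, Approx: true})
		}
	}
	return edges
}

// Finding is a hypothetical reachability result.
type Finding struct {
	Path         []Edge
	Confidence   string
	ManualReview bool
}

// grade keeps the language's base confidence for fully resolved paths and
// downgrades any path that relies on an approximate edge.
func grade(path []Edge, base string) Finding {
	for _, e := range path {
		if e.Approx {
			return Finding{Path: path, Confidence: "low", ManualReview: true}
		}
	}
	return Finding{Path: path, Confidence: base}
}

func main() {
	// Hypothetical index of known functions by bare name.
	byName := map[string][]string{"load": {"yaml_loader.load", "json_loader.load"}}
	calls := []CallSite{
		{Caller: "app.main", Name: "handle", Target: "app.handle"},
		{Caller: "app.handle", Name: "load"}, // e.g. obj.load() on an untyped obj
	}
	edges := conservativeEdges(calls, byName)
	for _, e := range edges {
		fmt.Printf("%s -> %s (approximate: %v)\n", e.From, e.To, e.Approx)
	}
	// The path app.main -> app.handle -> yaml_loader.load uses an approximate edge.
	f := grade([]Edge{edges[0], edges[2]}, "medium")
	fmt.Println(f.Confidence, f.ManualReview)
}
```

Over-approximating trades false positives for fewer missed vulnerabilities. The confidence downgrade makes that trade visible to whoever triages the finding.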