feat(extraction): add CFML language support (.cfc/.cfm/.cfs) (#1118)#1153

Merged: colbymchenry merged 4 commits into main from cfml-language-support-1118 on Jul 2, 2026
@colbymchenry (Owner)

Adds CFML support end-to-end. This carries @ghedwards' work from #1118 (his fork doesn't permit maintainer edits, so it's re-submitted here unchanged, with review fixes on top — full credit to him and the cfmleditor org, who also maintain the tree-sitter-cfml grammar).

From #1118 (ghedwards)

  • Three-grammar wiring (cfml tag-based / cfscript bare-script / cfquery SQL bodies) with a custom CfmlExtractor that dialect-switches per file and delegates <cfscript>/<cfquery> bodies at any tag nesting depth
  • Components/interfaces in both syntaxes, extends/implements, access-derived visibility, call edges, calls inside #hash# expressions in <cfquery> SQL
  • Vendored wasm grammars — verified bit-for-bit reproducible from cfmleditor/tree-sitter-cfml (master ≈ v0.26.29, tree-sitter CLI 0.26.9): SHA-256 identical on all three
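
The reproducibility check in the last bullet comes down to comparing SHA-256 digests of the vendored files against a fresh build. A minimal sketch of that comparison follows; the function names, directory layout, and file names are hypothetical and are not taken from this PR.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def grammars_match(vendored_dir: Path, rebuilt_dir: Path, names: list[str]) -> dict[str, bool]:
    """Map each grammar file name to whether both copies hash identically."""
    return {
        name: sha256_of(vendored_dir / name) == sha256_of(rebuilt_dir / name)
        for name in names
    }
```

Calling `grammars_match` with the vendored directory, a directory holding a fresh build, and the three grammar file names should report `True` for every entry if the build is bit-for-bit reproducible.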

Review fixes added on top

  • UTF-8 BOM routing: BOM-prefixed tag-based files (endemic in CFML's Windows-editor history) were sniffed as bare-script and extracted nothing — 110 of ColdBox's 646 CFML files (17%). The sniffer now skips a leading BOM.
  • Unquoted attribute values: <cffunction name=init> (legal, common in older CFML) silently dropped the function; tagAttr now reads the unquoted value shape.
  • Method kinds for component-level <cfscript>: function configure(){} inside <cfcomponent><cfscript> (the ColdBox ModuleConfig shape) now classifies as method, matching script-style CFCs; closures inside functions keep kind function.
  • Rebased onto current main (CHANGELOG/README/grammars conflicts with the Metal work resolved).
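
To illustrate the BOM routing fix, here is a minimal sketch of a tag-vs-script sniffer that skips a leading BOM. It assumes the dialect is decided by whether the first non-whitespace character is `<`; the actual heuristic in the extractor may differ, and `sniff_dialect` is a hypothetical name.

```python
UTF8_BOM = "\ufeff"  # how a UTF-8 BOM appears once the file is decoded as UTF-8


def sniff_dialect(source: str) -> str:
    """Classify CFML source as 'tag' or 'script' from its first significant character.

    Assumption: a file whose first non-whitespace character is '<' is tag-based.
    """
    # Drop a single leading BOM so it cannot hide the opening '<'.
    text = source[1:] if source.startswith(UTF8_BOM) else source
    return "tag" if text.lstrip().startswith("<") else "script"
```

Without the BOM skip, a file starting with `\ufeff<cfcomponent>` would fall through to the script branch, because U+FEFF is not treated as whitespace by `lstrip()`.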

Validation

Indexed three public CFML codebases:

| Repo | Result |
| --- | --- |
| FW/1 (305 files) | 203/203 .cfc → class, 820 methods, 1,262 call edges, 0.8s |
| ColdBox (646 files) | 512 classes + 10 interfaces (was 457 before the BOM fix; 1 remaining miss is a comment-led perf-harness script), 3,466 methods, ~9k call edges, 1.5s |
| CFWheels (676 files) | 482 classes, 2,683 methods, ~13k call edges, 4.0s (1 miss is a {{placeholder}} template, not real CFML) |

Full test suite: 1,991 passing (22 CFML tests; the one flake is the known pre-existing #662 daemon test, green on retry).

Follow-up tracked: #1152 (dotted-path extends="a.b.C" resolution — ~94% of ColdBox core inheritance needs it).

Supersedes #1118.

🤖 Generated with Claude Code

claude and others added 4 commits June 30, 2026 05:30
Adds CodeGraph extraction for ColdFusion Markup Language using
tree-sitter-cfml's cfml and cfscript grammars. Handles both the legacy
tag-based style (<cfcomponent>/<cffunction>) and modern bare-script
style (component { ... }), delegating embedded <cfscript> tag bodies
to the cfscript grammar. .cfs files are routed through the same
extractor so anonymous component names fall back to the filename
consistently across both extensions, since CFML never declares a
component's name in source.

Includes extraction tests covering both dialects, extends/implements,
visibility, and regression coverage for the implicit-end-tag walk and
file-node containment.
… cfquery SQL bodies

A <cfscript> block nested inside control-flow tags (<cfif>/<cfloop>/<cftry>)
inside a <cffunction> or at top-level component scope was silently skipped:
the implicit-end-tag walk only checked direct children/siblings for
cf_script_tag, missing it when wrapped a level deeper. A new recursive
delegateNestedTags helper finds <cfscript>/<cfquery> at any depth without
descending into nested <cffunction> scopes.
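
The depth-unbounded search described above can be sketched as a recursive walk that collects delegated tags and stops at function boundaries. This is an illustration on a toy node type, not the tree-sitter API or the real delegateNestedTags; `cf_script_tag` comes from this commit message, while `cf_query_tag`, `cf_function_tag`, and the `Node` class are assumed names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a syntax-tree node (not the tree-sitter API)."""
    type: str
    children: list["Node"] = field(default_factory=list)


DELEGATED = {"cf_script_tag", "cf_query_tag"}  # cf_query_tag is an assumed name
SCOPE_BOUNDARY = {"cf_function_tag"}           # assumed name


def find_delegated(node: Node) -> list[Node]:
    """Collect delegated tags at any depth below node, in document order."""
    found: list[Node] = []
    for child in node.children:
        if child.type in DELEGATED:
            found.append(child)
        elif child.type in SCOPE_BOUNDARY:
            continue  # a nested function owns its own bodies; do not descend
        else:
            found.extend(find_delegated(child))
    return found
```

The boundary check is what keeps a script block in a nested function from being attributed to the enclosing scope.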

<cfquery> SQL bodies were also entirely unhandled - the cf_query_content
node's text was opaque raw SQL, so any #hash# expression inside it (e.g. a
call like #getCurrentUser().getId()# in a WHERE clause) was dropped. Wires
the tree-sitter-cfml cfquery grammar in as a new minimal Language/extractor
(call expressions only - the grammar models no other CodeGraph symbols) and
delegates <cfquery> bodies to it the same way <cfscript> bodies are
delegated to the cfscript grammar.
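
For intuition about what the cfquery delegation recovers, the sketch below pulls call names out of #hash# expressions with a hand-written scanner and a regular expression. This is a deliberately simpler technique than the grammar-based extractor the commit wires in, and both function names are hypothetical.

```python
import re

# An identifier directly followed by '(' is treated as a call name.
CALL_NAME = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def hash_expressions(sql: str) -> list[str]:
    """Return the text between each #...# pair, treating '##' as an escaped '#'."""
    expressions = []
    i = 0
    while i < len(sql):
        if sql.startswith("##", i):
            i += 2  # escaped hash, not an expression delimiter
        elif sql[i] == "#":
            end = sql.find("#", i + 1)
            if end == -1:
                break  # unterminated expression: stop rather than guess
            expressions.append(sql[i + 1:end])
            i = end + 1
        else:
            i += 1
    return expressions


def calls_in_query(sql: str) -> list[str]:
    """List call names found inside the hash expressions of a SQL body."""
    return [name for expr in hash_expressions(sql) for name in CALL_NAME.findall(expr)]
```

For a WHERE clause containing `#getCurrentUser().getId()#`, this yields `getCurrentUser` and `getId`. A regex cannot handle hashes inside string literals within an expression, which is one reason a real grammar is the better fit.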
…ars conflicts)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…-level cfscript method kinds

- Skip a leading UTF-8 BOM in the tag-vs-script sniffer: BOM'd tag-based
  files (17% of ColdBox) were misrouted to the cfscript grammar and
  extracted nothing. ColdBox recovers 55 components (+211 methods).
- Read unquoted attribute values (<cffunction name=init>) — legal and
  common in older CFML; previously the function was silently dropped.
- Classify top-level functions in a component-level <cfscript> block as
  methods (the ColdBox ModuleConfig shape); closures keep kind function.
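
The unquoted-attribute case can be illustrated with a small reader that accepts double-quoted, single-quoted, and bare values. This is a regex sketch over raw tag text, not the node-based tagAttr in the extractor; `tag_attr` is a hypothetical name.

```python
import re
from typing import Optional


def tag_attr(tag_text: str, name: str) -> Optional[str]:
    """Read one attribute from an opening tag; the value may be quoted or bare."""
    pattern = re.compile(
        r"\b" + re.escape(name) + r"""\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""",
        re.IGNORECASE,  # CFML attribute names are case-insensitive
    )
    match = pattern.search(tag_text)
    if match is None:
        return None
    # Exactly one of the three capture groups matched; return that one.
    return next(group for group in match.groups() if group is not None)
```

With this shape, `<cffunction name=init>` and `<cffunction name="init">` both yield `init`, so the function is no longer dropped.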

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@colbymchenry colbymchenry merged commit 816bacb into main Jul 2, 2026
1 check passed
@colbymchenry colbymchenry deleted the cfml-language-support-1118 branch July 2, 2026 23:27