feat(objectscript): InterSystems IRIS ObjectScript language support#467
Open
isc-tdyar wants to merge 1 commit into
Add ObjectScript (InterSystems IRIS / Caché) as a supported language,
covering the UDL class format (.cls), MAC/INT routines (.mac/.int/.rtn),
include/macro files (.inc), and IRIS Studio Export XML.
Definition extraction (extract_defs.c): Class, Method, ClassMethod,
Property, Parameter, Index, Trigger (with body text), XData, Storage,
and Query members as graph nodes; base classes from the Extends clause.
Call dispatch resolution (extract_calls.c) — four ObjectScript patterns
that are structurally invisible to text search:
  1. ##class(Pkg.Class).Method()   explicit cross-class call
  2. ..Method()                    relative-dot self-call (the dominant
                                   intra-class form; large impact on
                                   CALLS completeness)
  3. $$$Macro                      macro expansion via a per-project
                                   table built from .inc files
  4. type inference from %New/%OpenId + declared return types
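As a rough illustration of why these forms need more than text search, the sketch below classifies a call expression by its leading token. It is not taken from extract_calls.c: the enum, the function name, and the choice of plain prefix matching on strings (rather than tree-sitter nodes) are all assumptions made for the example, and it only matches the lower-case `##class` spelling used in the list above.

```c
#include <string.h>

/* Hypothetical labels for the dispatch forms listed above; these names are
 * illustrative and do not come from extract_calls.c. */
typedef enum {
    OS_CALL_EXPLICIT_CLASS, /* ##class(Pkg.Class).Method() */
    OS_CALL_RELATIVE_DOT,   /* ..Method()                   */
    OS_CALL_MACRO,          /* $$$Macro                     */
    OS_CALL_INSTANCE        /* x.Method(): target depends on the type of x */
} os_call_kind;

/* Classify a call expression by its leading token (assumed heuristic).
 * Anything that is not one of the three marked forms is treated as an
 * instance call, whose target class must come from type inference. */
static os_call_kind os_classify_call(const char *expr)
{
    if (strncmp(expr, "##class(", 8) == 0) return OS_CALL_EXPLICIT_CLASS;
    if (strncmp(expr, "..", 2) == 0)       return OS_CALL_RELATIVE_DOT;
    if (strncmp(expr, "$$$", 3) == 0)      return OS_CALL_MACRO;
    return OS_CALL_INSTANCE;
}
```

Only the first form names its target class in the text itself. The other three need context (the enclosing class, the macro table, or the inferred type of the receiver), which is what the resolution step has to supply.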
Ensemble production topology (pass_ensemble_routing.c): EnsembleItem
nodes per production component and ROUTES_TO edges resolved from
ProductionDefinition XData, plus WorkMgr .Queue("##class(X).method")
dispatch — all parsed statically at index time, no live IRIS required.
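To make the WorkMgr case concrete, here is a minimal sketch of splitting a queued target string of the form "##class(X).method" into its class and method parts. The function name, buffer-based interface, and error handling are hypothetical; pass_ensemble_routing.c may well do this differently.

```c
#include <stddef.h>
#include <string.h>

/* Split a queued target such as "##class(Pkg.Worker).Run" into class and
 * method names. Returns 1 on success, 0 if the string does not have the
 * expected shape or an output buffer is too small.
 * Hypothetical helper, not part of the PR. */
int demo_parse_queue_target(const char *s,
                            char *cls, size_t cls_cap,
                            char *meth, size_t meth_cap)
{
    static const char prefix[] = "##class(";
    const size_t plen = sizeof prefix - 1;

    if (strncmp(s, prefix, plen) != 0) return 0;

    const char *cls_begin = s + plen;
    const char *close = strchr(cls_begin, ')');
    /* Need a non-empty class name followed by ")." */
    if (close == NULL || close == cls_begin || close[1] != '.') return 0;

    const char *m = close + 2;
    const size_t clen = (size_t)(close - cls_begin);
    const size_t mlen = strlen(m);
    if (mlen == 0 || clen >= cls_cap || mlen >= meth_cap) return 0;

    memcpy(cls, cls_begin, clen);
    cls[clen] = '\0';
    memcpy(meth, m, mlen + 1); /* includes the terminating NUL */
    return 1;
}
```

Because the target is an ordinary string literal at the call site, it can be read at index time without evaluating anything.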
Language detection (language.c): .mac/.int/.rtn map to ObjectScript
routine directly; .cls (shared with Apex) and .inc (shared with BitBake)
are disambiguated by content, defaulting to the existing language on any
doubt so neither Apex nor BitBake detection regresses.
The two new per-project tables (macros, return types) are threaded
through a new internal cbm_extract_file_ex() so the public
cbm_extract_file() signature is unchanged.
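The wrapper arrangement can be sketched as follows. The real parameter lists and table types of cbm_extract_file() and cbm_extract_file_ex() are not shown here, so every name and type in this block (the demo_ prefix, the two table structs, the result struct) is a hypothetical stand-in chosen only to show the shape of the delegation.

```c
#include <stddef.h>

/* Hypothetical per-project tables; the real types are not shown in this PR
 * description. */
struct demo_macro_table       { int n_macros;  };
struct demo_return_type_table { int n_methods; };

/* Hypothetical result: records which tables were available to extraction. */
typedef struct {
    int has_macros;
    int has_return_types;
} demo_extract_result;

/* Extended entry point: accepts the optional tables. Either may be NULL, in
 * which case the corresponding resolution step is simply skipped. */
demo_extract_result demo_extract_file_ex(const char *path,
                                         const struct demo_macro_table *macros,
                                         const struct demo_return_type_table *rtypes)
{
    (void)path; /* a real extractor would parse the file here */
    demo_extract_result r = { macros != NULL, rtypes != NULL };
    return r;
}

/* Original entry point: signature unchanged, delegates with NULL, NULL. */
demo_extract_result demo_extract_file(const char *path)
{
    return demo_extract_file_ex(path, NULL, NULL);
}
```

Existing callers keep calling the one-argument form and see no difference; only code that has built the tables needs to know the extended form exists.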
The tree-sitter grammars are NOT vendored in this PR; they are a
dependency to be vendored separately from
https://github.com/intersystems/tree-sitter-objectscript (MIT).
The build will not link until the grammar is present.
Refs DeusData#462
Signed-off-by: Thomas Dyar <tdyar@intersystems.com>
What does this PR do?
Adds ObjectScript (InterSystems IRIS / Caché) as a supported language, per the discussion in #462.
ObjectScript powers large healthcare, finance, and enterprise systems and has no support in CBM (or most code-graph tools). This PR makes those codebases indexable and resolves the call-dispatch patterns that are structurally invisible to text search.
Refs #462
Per your note on #462, this PR contains only the CBM source changes — not the vendored grammar. The two tree-sitter grammars come from intersystems/tree-sitter-objectscript (MIT licensed, maintained by the language vendor):
objectscript_udl (.cls) and objectscript_routine (.mac/.inc/.rtn/.int).
Consequence: the build will not link until the grammar is vendored at internal/cbm/vendored/grammars/objectscript_udl/ and …/objectscript_routine/; it fails on the missing tree_sitter_objectscript_udl()/_routine() symbols. CI will be red until then. That is intentional, matching your plan to audit and vendor the grammar independently. The grammar shims (grammar_objectscript_*.c) that declare those factories are included; only the generated parser.c/scanner.c are omitted.
What's in the PR
Definition extraction: Class, Method, ClassMethod, Property, Parameter, Index, Trigger (with body text), XData, Storage, and Query members become graph nodes; base classes come from the Extends clause.
Four call-dispatch patterns (all resolved statically at index time):
1. ##class(Pkg.Class).Method(): explicit cross-class call
2. ..Method(): relative-dot self-call
3. $$$Macro: macro expansion via a per-project table built from .inc files
4. Set x = ##class(P).%New() … x.Save(): type inference from %New/%OpenId + declared return types
Ensemble production topology (pass_ensemble_routing.c): EnsembleItem nodes per production component and ROUTES_TO edges resolved from ProductionDefinition XData, plus WorkMgr .Queue("##class(X).method") dispatch. All static; no live IRIS instance required.
Two design points for your review
Public API unchanged. ObjectScript needs two per-project tables (a $$$macro table and a method-return-type table) that single-file extraction can't build alone. Rather than widen the public cbm_extract_file() signature (which would ripple NULL, NULL through every call site), I added an internal cbm_extract_file_ex() that carries the tables; cbm_extract_file() is a thin wrapper that delegates with NULL, NULL. Only the pipeline passes that build the tables call _ex.
Extension collisions with Apex and BitBake (both added since I started this work). .mac/.int/.rtn map to ObjectScript routine directly. The two collisions are resolved by content sniffing, following the existing cbm_disambiguate_m() (.m MATLAB-vs-ObjC) pattern, and default to the existing language on any doubt so neither Apex nor BitBake regresses:
- .cls (vs Apex): a Class <Uppercase…> header line → ObjectScript UDL, else Apex. Edge case: a .cls whose Class line sits beyond the first 4 KB (e.g. a very large license banner) would fall through to Apex.
- .inc (vs BitBake): a ROUTINE <Uppercase> header or an ObjectScript preprocessor directive (#define/#def1arg/#;) → ObjectScript routine, else BitBake. (#define/#def1arg never collide with BitBake, which uses # only for # comment.)
These are the spots most likely to need your input; happy to adjust the heuristics or the generalization however you prefer.
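For reviewing the heuristics, the two sniffers described above can be sketched roughly as below. This is not the code in language.c: the names are hypothetical, and the sketch assumes that headers and directives start in column 0, that matching is case-sensitive, and that the 4 KB window applies to .inc as well as .cls.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SNIFF_LIMIT 4096 /* only the first 4 KB is inspected */

typedef enum { DEMO_APEX, DEMO_BITBAKE, DEMO_OS_UDL, DEMO_OS_ROUTINE } demo_lang;

/* Line starts with `kw`, then blanks, then an uppercase letter. */
static int demo_has_header(const char *p, size_t len, const char *kw)
{
    size_t i = strlen(kw);
    if (len < i + 2 || strncmp(p, kw, i) != 0) return 0;
    if (p[i] != ' ' && p[i] != '\t') return 0;
    while (i < len && (p[i] == ' ' || p[i] == '\t')) i++;
    return i < len && isupper((unsigned char)p[i]);
}

static int demo_starts_with(const char *p, size_t len, const char *s)
{
    size_t n = strlen(s);
    return len >= n && strncmp(p, s, n) == 0;
}

static int demo_is_udl_line(const char *l, size_t len)
{
    return demo_has_header(l, len, "Class");
}

static int demo_is_routine_line(const char *l, size_t len)
{
    return demo_has_header(l, len, "ROUTINE")
        || demo_starts_with(l, len, "#define")
        || demo_starts_with(l, len, "#def1arg")
        || demo_starts_with(l, len, "#;");
}

/* Apply `pred` to each line within the sniff window. */
static int demo_any_line(const char *buf, size_t n,
                         int (*pred)(const char *, size_t))
{
    if (n > DEMO_SNIFF_LIMIT) n = DEMO_SNIFF_LIMIT;
    size_t start = 0;
    while (start < n) {
        size_t end = start;
        while (end < n && buf[end] != '\n') end++;
        if (pred(buf + start, end - start)) return 1;
        start = end + 1;
    }
    return 0;
}

/* Both sniffers fall back to the pre-existing language on no match. */
demo_lang demo_sniff_cls(const char *buf, size_t n)
{
    return demo_any_line(buf, n, demo_is_udl_line) ? DEMO_OS_UDL : DEMO_APEX;
}

demo_lang demo_sniff_inc(const char *buf, size_t n)
{
    return demo_any_line(buf, n, demo_is_routine_line) ? DEMO_OS_ROUTINE
                                                       : DEMO_BITBAKE;
}
```

Under these assumptions an Apex file such as `public class Account {` never matches, because the line does not begin with a capitalized `Class`, and a BitBake `# comment` line never matches any of the three directive prefixes.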
EnsembleItem label (your Q2 on #462)
I used a domain-specific EnsembleItem node label and ROUTES_TO edge type. If you'd prefer a generic label (ServiceComponent/WorkflowNode) with ensemble_item as a property, I'm glad to rename; just let me know before merge.
Tests
tests/test_extraction.c gains the ObjectScript suite: UDL class/method extraction, all four dispatch patterns, Ensemble topology parsing, macro expansion, trigger body text, and Export-XML transcoding. The Export-XML transcoder tests are grammar-independent and pass today; the grammar-dependent tests pass once the grammar is vendored. No other test files are touched.
Scope / roadmap
This PR is the foundation. If it's well received, two separate follow-up PRs would complete the story (each with its own issue): (a) cross-version version_tag + diff_versions, and (b) ObjectScript-tuned semantic embeddings. They're deliberately excluded here to keep this reviewable.
Checklist
- Commits signed off (git commit -s): DCO
- Tests (make -f Makefile.cbm test): passes once the grammar is vendored; see note above
- Lint (make -f Makefile.cbm lint-ci)