Skip to content

Implement SCA Reachability runtime detection: report vulnerable classes and callsites via telemetry#11352

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 79 commits into
masterfrom
alejandro.gonzalez/sca-reachability
Jun 9, 2026
Merged

Implement SCA Reachability runtime detection: report vulnerable classes and callsites via telemetry#11352
gh-worker-dd-mergequeue-cf854d[bot] merged 79 commits into
masterfrom
alejandro.gonzalez/sca-reachability

Conversation

@jandro996

@jandro996 jandro996 commented May 12, 2026

Copy link
Copy Markdown
Member

What Does This Do

Implements the SCA Reachability subsystem for the Java APM agent. When DD_APPSEC_SCA_ENABLED=true, the agent detects at runtime which classes and methods from vulnerable library versions are actually loaded and invoked, and reports this via app-dependencies-loaded telemetry using the RFC stateful heartbeat model.

Build - Symbol database

  • New Gradle task generateScaCvesJson in dd-java-agent/appsec/build.gradle downloads GHSA enrichments from the private sca-reachability-database and bundles sca_cves.json in the agent JAR. The committed copy is used in CI (no network access required at build time). Maintainers regenerate with ./gradlew generateScaCvesJson -PrefreshSca.
  • GhsaEnrichmentParser (buildSrc Kotlin): converts the GHSA enrichment format to the internal format - one record per Maven artifact, class symbols in JVM internal format (slashes). Filters to JVM language and Maven ecosystem only.

Detection - ClassFileTransformer (two-phase design)

ScaReachabilityTransformer uses a two-phase approach to keep JAR I/O off the class-loading thread:

  • First load (classBeingRedefined == null): transform() only enqueues a PendingClass(className, jarURL) record and returns null immediately. No I/O on the class-loading thread. Matches the pattern used by LocationsCollectingTransformer.
  • Periodic processing (processPendingClassEvents()): runs on the telemetry heartbeat thread; drains the queue, resolves artifact versions from JAR pom.properties, evaluates version ranges, reports class-level hits, and schedules method-level retransformation.
  • Retransform (classBeingRedefined != null): injects ASM bytecode callbacks at vulnerable method entry points. Tradeoff: method calls during the ~10s startup window before the first heartbeat are not captured; there is no behavioral difference for class-level symbols (the only active symbols today).

Additional transformer details:

  • ScaReachabilitySystem: entry point called from Agent.java by reflection (same pattern as AppSecSystem).
  • ScaReachabilityCallback (bootstrap): dedup + dispatch bridge between application classloader (injected bytecode) and agent classloader (handler). Must live in bootstrap to be visible from any classloader.
  • Version resolution: JAR-local pom.properties first, then java.class.path scan as fallback for aggregator artifacts (e.g. Spring Boot starters). The java.class.path scan covers standard deployments (JARs passed via -cp) on both Java 8 and 9+. Known limitation: OSGi bundle JARs and Spring Boot fat JAR nested libraries loaded via LaunchedURLClassLoader are not listed in java.class.path, so version resolution for aggregator artifacts may fail in those environments (falls back to retrying on the next heartbeat). A URLClassLoader chain walk was not viable here because AgentThreadFactory sets the context classloader to null on all agent threads.
  • VersionRangeParser: evaluates GHSA version range conditions (<, <=, >=, >, =, comma-separated AND).
  • performPendingRetransforms(): uses contains()+removeAll() instead of remove() inside the getAllLoadedClasses() loop so that all classloader instances of the same class are retransformed (Spring Boot creates multiple LaunchedURLClassLoader instances).

Callsite detection

ScaReachabilitySystem.findCallsite() walks the thread stack to report the application frame that called the vulnerable method, not the vulnerable method itself:

  • StackWalkerFactory.INSTANCE.walk() is used for lazy stack evaluation on JDK9+ (avoids materializing the full stack array). Agent frames are pre-filtered by the walker.
  • ScaStackExclusionTrie (generated from sca_stack_exclusion.trie via tries.gradle) filters intermediate library frames so wrapper libraries are skipped and client code is reported.
  • Overload findCallsite(String, StackTraceElement[]) for unit-testable stack simulation.

Telemetry - RFC stateful heartbeat

  • ScaReachabilityDependencyRegistry (internal-api): stateful registry for CVE state per artifact@version. registerCve() creates an entry with reached:[]; recordHit() stores the first callsite via AtomicReference.compareAndSet (first-hit-wins).
  • ScaReachabilityPeriodicAction (telemetry): on each heartbeat, drains both DependencyService (new JARs) and the registry (CVE state changes), merges into one entry per dependency. Replaces DependencyPeriodicAction when SCA is enabled to avoid duplicate app-dependencies-loaded entries.
  • Dependency: new nullable reachabilityMetadata field; TelemetryRequestBody.writeDependency() writes the metadata array when present.

Benchmarks

The local benchmark/ folder has been removed from dd-trace-java (see #11502). The sca variant (-Ddd.appsec.enabled=true -Ddd.appsec.sca.enabled=true) was contributed to the new apm-sdks-benchmarks implementation by @sarahchen6 in DataDog/apm-sdks-benchmarks#161, based on the variant I added here originally.

Motivation

Implements APPSEC-62260 - SCA Reachability runtime detection per the RFC.

Additional Notes

  • The generateScaCvesJson Gradle plugin and the committed sca_cves.json are a temporary solution. In a future iteration the SCA symbol database will be delivered at runtime via Remote Config, at which point this plugin and the committed JSON file will be removed.

  • sca_cves.json is committed to the repo because sca-reachability-database is a private repo not accessible from CI without a token. Only the maintainer runs the refresh task.

  • Method-level symbols in sca_cves.json (snakeyaml load/loadAll, xstream fromXML, etc.) are manually curated for testing. The database format does not yet define method-level entries; when it does, the Gradle task will be updated and the manual entries removed.

  • AbstractStackWalker.isNotDatadogTraceStackElement visibility changed from package-private to public to allow callsite filtering from the appsec module.

  • sca_stack_exclusion.trie is a curated copy of the IAST exclusion trie adapted for SCA callsite detection (testing framework entries removed). The trie cannot be reused directly from IAST without a circular dependency.

  • Known limitation - aggregator artifacts in OSGi / Spring Boot fat JARs: version resolution for aggregator artifacts (e.g. spring-boot-starter-web) relies solely on java.class.path, which only covers JARs present on the standard classpath at JVM startup. OSGi bundle JARs and JARs nested inside a Spring Boot fat JAR (loaded by LaunchedURLClassLoader) are not listed there. In those environments, the version fallback for aggregator artifacts may return null and the CVE will be retried on the next heartbeat. Direct artifacts (those whose own JAR contains pom.properties) are unaffected.

  • The ScaReachabilitySmokeTest extends AbstractAppSecServerSmokeTest and remains Groovy until that base class is migrated to Java.

Contributor Checklist

Jira ticket: APPSEC-62260

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

@jandro996

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e607887e99

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@jandro996

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82ea8065d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread dd-java-agent/appsec/src/main/java/com/datadog/appsec/sca/ScaCveDatabase.java Outdated
@jandro996

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 👍

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@jandro996

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6008ac9ca0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@jandro996

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 750b3c3877

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@jandro996

Copy link
Copy Markdown
Member Author

@codex revew

@chatgpt-codex-connector

Copy link
Copy Markdown

I reviewed the PR context and the currently checked-out code for the flagged areas. I don’t see any new blocking issues beyond what has already been discussed/resolved.

Review result

Commands used

  • git status --short && git rev-parse --abbrev-ref HEAD
  • sed -n ... ScaReachabilityTransformer.java / ScaCveDatabase.java / ScaReachabilityCallback.java
  • rg -n "checkAlreadyLoadedClasses|findArtifactVersionInClasspath|java.class.path|protectionDomain == null|Path B|isJdk" dd-java-agent/appsec/src/main/java/com/datadog/appsec/sca/ScaReachabilityTransformer.java
  • nl -ba ... | sed -n ... (to capture exact line references)

No code changes were made, so no commit or follow-up PR was created.

View task →

@jandro996 jandro996 added type: enhancement Enhancements and improvements comp: asm waf Application Security Management (WAF) comp: telemetry Telemetry tag: override-groovy-enforcement Override the "Enforce Groovy Migration" check labels May 14, 2026
@jandro996 jandro996 changed the title Implement SCA Reachability: detect and report vulnerable library classes loaded at runtime Implement SCA Reachability runtime detection: report vulnerable classes and callsites via telemetry May 14, 2026
@jandro996

Copy link
Copy Markdown
Member Author

@codex review

1 similar comment
@jandro996

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 24c53f05cc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@datadog-official

This comment has been minimized.

@jandro996

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b449cff45

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread buildSrc/src/main/kotlin/datadog/gradle/sca/GhsaEnrichmentParser.kt
Comment thread dd-java-agent/appsec/src/main/java/com/datadog/appsec/sca/ScaCveDatabase.java Outdated
@jandro996 jandro996 marked this pull request as ready for review May 20, 2026 06:08
@jandro996 jandro996 requested review from a team as code owners May 20, 2026 06:08
@jandro996

jandro996 commented Jun 6, 2026

Copy link
Copy Markdown
Member Author

Thanks @jpbempel for all the review comments! all applied (FQN cleanup, catch scope, VisibleForTesting, Strings.getClassName, null guards on getAllLoadedClasses, and StackWalkerFactory)

Thanks @sarahchen6 for porting the SCA benchmark variant to apm-sdks-benchmarks and for the heads up on the migration (updated the PR description accordingly)

Sorry for the delay! :(

@sarahchen6 sarahchen6 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from LP side!

Comment thread dd-java-agent/appsec/src/main/java/com/datadog/appsec/sca/ScaCveDatabase.java Outdated
Comment thread telemetry/src/main/java/datadog/telemetry/TelemetrySystem.java
Comment thread buildSrc/src/main/kotlin/datadog/gradle/plugin/sca/ScaEnrichmentsPlugin.kt Outdated
- Extract ASM injection into ScaMethodCallbackInjector (separate concerns)
- Replace URLClassLoader walk with java.class.path scan (dead code on agent threads
  where AgentThreadFactory sets null context classloader; also not a URLClassLoader
  on Java 9+); add unit tests documenting both constraints
- Replace stream with plain loop in hasMethodLevelSymbolForClass (avoids
  spliterator allocation on each class-load check)
- Normalize class-level vs method-level symbol coexistence in ScaCveDatabase.toScaEntry():
  drop class-level (method=null) symbol when method-level symbols exist for the same
  class in the same entry, preventing first-hit-wins hitRef from discarding the
  more specific callsite
- Guard performPendingRetransforms with isModifiableClass() to avoid infinite retry
  loop for non-modifiable classes (JDK classes, primitive wrappers, etc.)
- Disable SCA when telemetry or dependency collection is disabled (Agent.java):
  ScaReachabilityPeriodicAction is never registered in those cases, so
  pendingClassEvents would grow unbounded with no drain
- Fix generateScaCvesJson up-to-date check: use upToDateWhen to force re-run when
  -PrefreshSca is set (onlyIf alone can be bypassed by Gradle's up-to-date check)
- Add custom URL support to ScaEnrichmentsPlugin via -PscaEnrichmentsUrl
- Add upper bound for ScaReachabilityDependencyRegistry (DD_APPSEC_SCA_MAX_TRACKED_DEPENDENCIES,
  default 1000) with warning log when cap is reached
@jandro996

jandro996 commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

@bric3 Thanks for the thorough and detailed review 💪
Every comment was actionable and helped improve the implementation significantly.

A few clarifications on the decisions that may look like shortcuts:

Class-level detection (match on class load, no callsite): this is a temporary compromise. The current sca_cves.json already contains method-level symbols for most entries, and the transformer injects callsite callbacks for those. The class-level path exists because a small number of entries in the database only have class-level data today - once the backend enrichment pipeline catches up and provides method-level symbols for all entries, class-level detection will be removed entirely.

sca_cves.json committed to the repo / ScaEnrichmentsPlugin: also temporary. The symbol database will be delivered via Remote Config in a future iteration, at which point the Gradle plugin, the committed JSON, and the classpathArtifactCache will all be removed. The plugin is intentionally minimal for that reason.

All 15 comments addressed: isModifiableClass guard, class/method normalization at parse time, telemetry guard in Agent.java, upToDateWhen+onlyIf combination, URL override in the plugin, and the rest.

Thanks again!!!

jandro996 added 2 commits June 9, 2026 12:24
…ngs.getClassName

- Extract duplicated cap-exceeded guard into private isCapExceeded(String key) helper,
  called from both registerCve and recordHit
- Add periodicWorkCallback = null to resetForTesting() to prevent test state leakage
- Replace inline replace('/', '.') with Strings.getClassName for consistency with processClass
…nDepForTesting

- ScaReachabilityDependencyRegistryTest: add registerCve, drain semantics,
  cap enforcement (two tests: new keys rejected, existing keys still updated),
  and resetForTesting clears periodicWorkCallback
- ScaReachabilityPeriodicAction: annotate addKnownDepForTesting with @VisibleForTesting
@jandro996

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a0cf4ce86d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

jandro996 added 3 commits June 9, 2026 13:03
…as drained

When DependencyService resolves a JAR in heartbeat N+1 but the CVE was
already drained in heartbeat N (pendingReport=false), Step 2 previously
emitted metadata:[] for that dep. If the backend uses last-write-wins
semantics, this would silently erase the CVE state reported in HB N.

Fix: add peekSnapshot() to ScaReachabilityDependencyRegistry (reads CVE
state without clearing pendingReport), and call it in Step 2 when the JAR
is newly resolved but no pending snapshot exists for that dep.

Class-level-only CVEs are permanently affected (no method hit to re-trigger
pendingReport), so the fix is necessary for correctness.

Adds regression test: cveRegisteredBeforeDepResolved_step2PeeksCveStateInLaterHeartbeat
…resetForTesting

entriesForClass() always returns a non-empty list or null (never an empty
list), so the || entries.isEmpty() checks in transform(), checkAlreadyLoadedClasses(),
and processPendingClassEvents() are dead code that could mislead readers.

Also annotate resetForTesting() with @VisibleForTesting, consistent with
the rest of the project's convention for package-private test helpers.
@jandro996 jandro996 requested a review from bric3 June 9, 2026 12:06
@jandro996 jandro996 enabled auto-merge June 9, 2026 12:58
@jandro996 jandro996 added this pull request to the merge queue Jun 9, 2026
@dd-octo-sts

dd-octo-sts Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

/merge

@gh-worker-devflow-routing-ef8351

gh-worker-devflow-routing-ef8351 Bot commented Jun 9, 2026

Copy link
Copy Markdown

View all feedbacks in Devflow UI.

2026-06-09 13:28:32 UTC ℹ️ Start processing command /merge


2026-06-09 13:28:37 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 1h (p90).


2026-06-09 14:34:37 UTC ℹ️ MergeQueue: This merge request was merged

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 9, 2026
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit 0e739a1 into master Jun 9, 2026
573 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the alejandro.gonzalez/sca-reachability branch June 9, 2026 14:34
@github-actions github-actions Bot added this to the 1.64.0 milestone Jun 9, 2026
jandro996 added a commit that referenced this pull request Jun 17, 2026
JAR I/O (resolveDependencies -> DependencyResolver -> JarReader) was
happening inside the ClassFileTransformer callback, i.e. while the JVM
holds its internal retransform locks. Libraries like snakeyaml that
trigger class loading during JAR resolution can deadlock because the
JVM class-loading lock and the retransform lock are both held
simultaneously.

Fix: in performPendingRetransforms(), resolve each class's JAR into
jarCache on the telemetry thread BEFORE calling retransformClasses().
When the transform callback fires, resolveDependencies() returns
immediately from cache with no I/O.

This restores the v1 (PR #11352) invariant: JAR I/O on the telemetry
thread only, bytecode injection only inside the callback.

Expose jarCache and resolveDependencies as @VisibleForTesting to support
the regression test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp: asm waf Application Security Management (WAF) comp: telemetry Telemetry tag: override-groovy-enforcement Override the "Enforce Groovy Migration" check type: enhancement Enhancements and improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants