Implement SCA Reachability runtime detection: report vulnerable classes and callsites via telemetry#11352
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e607887e99
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 82ea8065d9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
|
Codex Review: Didn't find any major issues. 👍 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6008ac9ca0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 750b3c3877
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex revew |
|
I reviewed the PR context and the currently checked-out code for the flagged areas. I don’t see any new blocking issues beyond what has already been discussed/resolved. Review result
Commands used
No code changes were made, so no commit or follow-up PR was created. |
|
@codex review |
1 similar comment
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 24c53f05cc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
This comment has been minimized.
This comment has been minimized.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b449cff45
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Thanks @jpbempel for all the review comments! all applied (FQN cleanup, catch scope, VisibleForTesting, Strings.getClassName, null guards on getAllLoadedClasses, and StackWalkerFactory) Thanks @sarahchen6 for porting the SCA benchmark variant to apm-sdks-benchmarks and for the heads up on the migration (updated the PR description accordingly) Sorry for the delay! :( |
sarahchen6
left a comment
There was a problem hiding this comment.
Looks good from LP side!
- Extract ASM injection into ScaMethodCallbackInjector (separate concerns) - Replace URLClassLoader walk with java.class.path scan (dead code on agent threads where AgentThreadFactory sets null context classloader; also not a URLClassLoader on Java 9+); add unit tests documenting both constraints - Replace stream with plain loop in hasMethodLevelSymbolForClass (avoids spliterator allocation on each class-load check) - Normalize class-level vs method-level symbol coexistence in ScaCveDatabase.toScaEntry(): drop class-level (method=null) symbol when method-level symbols exist for the same class in the same entry, preventing first-hit-wins hitRef from discarding the more specific callsite - Guard performPendingRetransforms with isModifiableClass() to avoid infinite retry loop for non-modifiable classes (JDK classes, primitive wrappers, etc.) - Disable SCA when telemetry or dependency collection is disabled (Agent.java): ScaReachabilityPeriodicAction is never registered in those cases, so pendingClassEvents would grow unbounded with no drain - Fix generateScaCvesJson up-to-date check: use upToDateWhen to force re-run when -PrefreshSca is set (onlyIf alone can be bypassed by Gradle's up-to-date check) - Add custom URL support to ScaEnrichmentsPlugin via -PscaEnrichmentsUrl - Add upper bound for ScaReachabilityDependencyRegistry (DD_APPSEC_SCA_MAX_TRACKED_DEPENDENCIES, default 1000) with warning log when cap is reached
|
@bric3 Thanks for the thorough and detailed review 💪 A few clarifications on the decisions that may look like shortcuts: Class-level detection (match on class load, no callsite): this is a temporary compromise. The current
All 15 comments addressed: isModifiableClass guard, class/method normalization at parse time, telemetry guard in Agent.java, upToDateWhen+onlyIf combination, URL override in the plugin, and the rest. Thanks again!!! |
…ngs.getClassName
- Extract duplicated cap-exceeded guard into private isCapExceeded(String key) helper,
called from both registerCve and recordHit
- Add periodicWorkCallback = null to resetForTesting() to prevent test state leakage
- Replace inline replace('/', '.') with Strings.getClassName for consistency with processClass
…nDepForTesting - ScaReachabilityDependencyRegistryTest: add registerCve, drain semantics, cap enforcement (two tests: new keys rejected, existing keys still updated), and resetForTesting clears periodicWorkCallback - ScaReachabilityPeriodicAction: annotate addKnownDepForTesting with @VisibleForTesting
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a0cf4ce86d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…as drained When DependencyService resolves a JAR in heartbeat N+1 but the CVE was already drained in heartbeat N (pendingReport=false), Step 2 previously emitted metadata:[] for that dep. If the backend uses last-write-wins semantics, this would silently erase the CVE state reported in HB N. Fix: add peekSnapshot() to ScaReachabilityDependencyRegistry (reads CVE state without clearing pendingReport), and call it in Step 2 when the JAR is newly resolved but no pending snapshot exists for that dep. Class-level-only CVEs are permanently affected (no method hit to re-trigger pendingReport), so the fix is necessary for correctness. Adds regression test: cveRegisteredBeforeDepResolved_step2PeeksCveStateInLaterHeartbeat
…resetForTesting entriesForClass() always returns a non-empty list or null (never an empty list), so the || entries.isEmpty() checks in transform(), checkAlreadyLoadedClasses(), and processPendingClassEvents() are dead code that could mislead readers. Also annotate resetForTesting() with @VisibleForTesting, consistent with the rest of the project's convention for package-private test helpers.
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in
|
JAR I/O (resolveDependencies -> DependencyResolver -> JarReader) was happening inside the ClassFileTransformer callback, i.e. while the JVM holds its internal retransform locks. Libraries like snakeyaml that trigger class loading during JAR resolution can deadlock because the JVM class-loading lock and the retransform lock are both held simultaneously. Fix: in performPendingRetransforms(), resolve each class's JAR into jarCache on the telemetry thread BEFORE calling retransformClasses(). When the transform callback fires, resolveDependencies() returns immediately from cache with no I/O. This restores the v1 (PR #11352) invariant: JAR I/O on the telemetry thread only, bytecode injection only inside the callback. Expose jarCache and resolveDependencies as @VisibleForTesting to support the regression test.
What Does This Do
Implements the SCA Reachability subsystem for the Java APM agent. When
DD_APPSEC_SCA_ENABLED=true, the agent detects at runtime which classes and methods from vulnerable library versions are actually loaded and invoked, and reports this viaapp-dependencies-loadedtelemetry using the RFC stateful heartbeat model.Build - Symbol database
generateScaCvesJsonindd-java-agent/appsec/build.gradledownloads GHSA enrichments from the privatesca-reachability-databaseand bundlessca_cves.jsonin the agent JAR. The committed copy is used in CI (no network access required at build time). Maintainers regenerate with./gradlew generateScaCvesJson -PrefreshSca.GhsaEnrichmentParser(buildSrc Kotlin): converts the GHSA enrichment format to the internal format - one record per Maven artifact, class symbols in JVM internal format (slashes). Filters to JVM language and Maven ecosystem only.Detection - ClassFileTransformer (two-phase design)
ScaReachabilityTransformeruses a two-phase approach to keep JAR I/O off the class-loading thread:classBeingRedefined == null):transform()only enqueues aPendingClass(className, jarURL)record and returnsnullimmediately. No I/O on the class-loading thread. Matches the pattern used byLocationsCollectingTransformer.processPendingClassEvents()): runs on the telemetry heartbeat thread; drains the queue, resolves artifact versions from JARpom.properties, evaluates version ranges, reports class-level hits, and schedules method-level retransformation.classBeingRedefined != null): injects ASM bytecode callbacks at vulnerable method entry points. Tradeoff: method calls during the ~10s startup window before the first heartbeat are not captured; there is no behavioral difference for class-level symbols (the only active symbols today).Additional transformer details:
ScaReachabilitySystem: entry point called fromAgent.javaby reflection (same pattern asAppSecSystem).ScaReachabilityCallback(bootstrap): dedup + dispatch bridge between application classloader (injected bytecode) and agent classloader (handler). Must live in bootstrap to be visible from any classloader.pom.propertiesfirst, thenjava.class.pathscan as fallback for aggregator artifacts (e.g. Spring Boot starters). Thejava.class.pathscan covers standard deployments (JARs passed via-cp) on both Java 8 and 9+. Known limitation: OSGi bundle JARs and Spring Boot fat JAR nested libraries loaded viaLaunchedURLClassLoaderare not listed injava.class.path, so version resolution for aggregator artifacts may fail in those environments (falls back to retrying on the next heartbeat). A URLClassLoader chain walk was not viable here becauseAgentThreadFactorysets the context classloader tonullon all agent threads.VersionRangeParser: evaluates GHSA version range conditions (<,<=,>=,>,=, comma-separated AND).performPendingRetransforms(): usescontains()+removeAll()instead ofremove()inside thegetAllLoadedClasses()loop so that all classloader instances of the same class are retransformed (Spring Boot creates multipleLaunchedURLClassLoaderinstances).Callsite detection
ScaReachabilitySystem.findCallsite()walks the thread stack to report the application frame that called the vulnerable method, not the vulnerable method itself:StackWalkerFactory.INSTANCE.walk()is used for lazy stack evaluation on JDK9+ (avoids materializing the full stack array). Agent frames are pre-filtered by the walker.ScaStackExclusionTrie(generated fromsca_stack_exclusion.trieviatries.gradle) filters intermediate library frames so wrapper libraries are skipped and client code is reported.findCallsite(String, StackTraceElement[])for unit-testable stack simulation.Telemetry - RFC stateful heartbeat
ScaReachabilityDependencyRegistry(internal-api): stateful registry for CVE state perartifact@version.registerCve()creates an entry withreached:[];recordHit()stores the first callsite viaAtomicReference.compareAndSet(first-hit-wins).ScaReachabilityPeriodicAction(telemetry): on each heartbeat, drains bothDependencyService(new JARs) and the registry (CVE state changes), merges into one entry per dependency. ReplacesDependencyPeriodicActionwhen SCA is enabled to avoid duplicateapp-dependencies-loadedentries.Dependency: new nullablereachabilityMetadatafield;TelemetryRequestBody.writeDependency()writes themetadataarray when present.Benchmarks
The local
benchmark/folder has been removed from dd-trace-java (see #11502). Thescavariant (-Ddd.appsec.enabled=true -Ddd.appsec.sca.enabled=true) was contributed to the newapm-sdks-benchmarksimplementation by @sarahchen6 in DataDog/apm-sdks-benchmarks#161, based on the variant I added here originally.Motivation
Implements APPSEC-62260 - SCA Reachability runtime detection per the RFC.
Additional Notes
The
generateScaCvesJsonGradle plugin and the committedsca_cves.jsonare a temporary solution. In a future iteration the SCA symbol database will be delivered at runtime via Remote Config, at which point this plugin and the committed JSON file will be removed.sca_cves.jsonis committed to the repo becausesca-reachability-databaseis a private repo not accessible from CI without a token. Only the maintainer runs the refresh task.Method-level symbols in
sca_cves.json(snakeyamlload/loadAll, xstreamfromXML, etc.) are manually curated for testing. The database format does not yet define method-level entries; when it does, the Gradle task will be updated and the manual entries removed.AbstractStackWalker.isNotDatadogTraceStackElementvisibility changed from package-private topublicto allow callsite filtering from theappsecmodule.sca_stack_exclusion.trieis a curated copy of the IAST exclusion trie adapted for SCA callsite detection (testing framework entries removed). The trie cannot be reused directly from IAST without a circular dependency.Known limitation - aggregator artifacts in OSGi / Spring Boot fat JARs: version resolution for aggregator artifacts (e.g.
spring-boot-starter-web) relies solely onjava.class.path, which only covers JARs present on the standard classpath at JVM startup. OSGi bundle JARs and JARs nested inside a Spring Boot fat JAR (loaded byLaunchedURLClassLoader) are not listed there. In those environments, the version fallback for aggregator artifacts may returnnulland the CVE will be retried on the next heartbeat. Direct artifacts (those whose own JAR containspom.properties) are unaffected.The
ScaReachabilitySmokeTestextendsAbstractAppSecServerSmokeTestand remains Groovy until that base class is migrated to Java.Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issueJira ticket: APPSEC-62260
Note: Once your PR is ready to merge, add it to the merge queue by commenting
/merge./merge -ccancels the queue request./merge -f --reason "reason"skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.