
[Refactor][Performance] Improve scalability and consistency of Admin list APIs #1732

Description

@robustmq

Have you checked the documentation for submitting an Issue?

  • Yes.

What type of enhancement is this?

  • Refactor
  • Performance

What does the enhancement do?

Background

Several Admin list APIs in admin-server currently follow the same query pattern:

  1. Load a full in-memory/local dataset into a Vec.
  2. Apply generic filtering and sorting.
  3. Apply pagination at the end.

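This works for small datasets, but it becomes expensive once the number of entries grows to hundreds of thousands or millions, because every request pays for the full scan and sort no matter how small the requested page is.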
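To make the cost concrete, here is a minimal Rust sketch of that three-step pattern. `ClientRow` and `list_clients_naive` are hypothetical names for illustration, not the actual admin-server types or handlers.

```rust
/// Hypothetical row type standing in for a client or session entry.
#[derive(Clone, Debug)]
struct ClientRow {
    client_id: String,
    username: String,
}

/// Sketch of the current pattern: clone the whole dataset, filter it,
/// sort it globally, and only then slice out one page.
fn list_clients_naive(
    all: &[ClientRow],
    filter: Option<&str>,
    page: usize, // 1-based page number
    limit: usize,
) -> Vec<ClientRow> {
    // 1. Load the full dataset into a Vec (O(n) clones and string allocations).
    let mut rows: Vec<ClientRow> = all.to_vec();
    // 2. Generic filtering ...
    if let Some(f) = filter {
        rows.retain(|r| r.client_id.contains(f) || r.username.contains(f));
    }
    // ... followed by a global O(n log n) sort, even if only one page is needed.
    rows.sort_by(|a, b| a.client_id.cmp(&b.client_id));
    // 3. Pagination is applied last.
    let start = page.saturating_sub(1).saturating_mul(limit);
    rows.into_iter().skip(start).take(limit).collect()
}
```

Every call clones and sorts all `n` rows, so a request for page 2 with `limit = 3` does the same work as a request for the entire list.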

Problem statement

  • High CPU cost: each request performs an O(n) full scan plus an O(n log n) global sort, even when only a single page is returned.
  • High memory pressure: large intermediate vectors and frequent string allocation.
  • Latency spikes: expensive requests can occupy runtime worker threads for long periods, reducing capacity for other requests and raising p95/p99 latency.
  • Inconsistent cluster view: some list APIs are node-local and may return incomplete data in multi-broker deployments.
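As a rough order-of-magnitude illustration (assuming a comparison-based sort and ignoring constant factors), at one million entries the global sort alone costs tens of millions of comparisons per request, independent of how many rows the page actually returns:

```latex
n = 10^6,\qquad \log_2 n \approx 19.9
\quad\Longrightarrow\quad
n \log_2 n \approx 2 \times 10^7 \ \text{comparisons per request}
```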

Candidate optimization directions (without changing core storage/data structures)

  • Add request guardrails: a maximum limit, a maximum for page * limit (the pagination window), and stricter behavior for no-filter queries.
  • Split each query into two phases: a lightweight scan first, then enrichment of only the rows on the requested page.
  • Reduce expensive sorting paths on large result sets (or restrict sortable fields).
  • Add timeout/degraded behavior for very large queries.
  • Add per-endpoint observability for query phases (scan/filter/sort/paginate/enrich/serialize) to identify bottlenecks.
  • For node-local endpoints, add cluster fan-out aggregation (at the Admin layer) where a global view is required.
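A minimal sketch of how the first three directions might combine, assuming rows are ordered by a single string key such as client_id. The request is validated against hypothetical `MAX_LIMIT`/`MAX_WINDOW` bounds; the scan then touches only borrowed keys and keeps the page * limit smallest in a bounded max-heap (a swapped-in top-k selection technique, O(n log k) instead of O(n log n)); finally only the rows on the requested page are enriched. All names and limit values are illustrative assumptions, not existing admin-server code.

```rust
use std::collections::BinaryHeap;

// Hypothetical guardrail values; real limits would be tuned and documented per endpoint.
const MAX_LIMIT: usize = 500;
const MAX_WINDOW: usize = 10_000; // upper bound on page * limit

#[derive(Debug, PartialEq)]
enum QueryError {
    InvalidLimit,
    WindowTooLarge,
}

/// Guardrail: reject oversized pages and deep offsets before any scanning.
/// Returns the pagination window (page * limit) on success.
fn validate(page: usize, limit: usize) -> Result<usize, QueryError> {
    if limit == 0 || limit > MAX_LIMIT {
        return Err(QueryError::InvalidLimit);
    }
    let window = page
        .max(1)
        .checked_mul(limit)
        .ok_or(QueryError::WindowTooLarge)?;
    if window > MAX_WINDOW {
        return Err(QueryError::WindowTooLarge);
    }
    Ok(window)
}

/// Phase 1: scan only borrowed keys and keep the `window` smallest in a
/// bounded max-heap, costing O(n log k) instead of a full O(n log n) sort.
fn smallest_keys<'a, I>(keys: I, window: usize) -> Vec<&'a str>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut heap: BinaryHeap<&'a str> = BinaryHeap::with_capacity(window);
    for k in keys {
        if heap.len() < window {
            heap.push(k);
        } else if heap.peek().map_or(false, |&top| k < top) {
            // k belongs in the window: evict the current largest key.
            heap.pop();
            heap.push(k);
        }
    }
    heap.into_sorted_vec() // ascending order
}

/// Phase 2: slice out the requested page and enrich only those rows.
fn page_and_enrich<T>(
    sorted_keys: &[&str],
    page: usize,
    limit: usize,
    enrich: impl Fn(&str) -> Option<T>,
) -> Vec<T> {
    let start = (page.max(1) - 1) * limit;
    sorted_keys
        .iter()
        .skip(start)
        .take(limit)
        .filter_map(|k| enrich(*k))
        .collect()
}

/// Entry point combining guardrails, bounded selection, and late enrichment.
fn list_page<'a, T>(
    keys: impl IntoIterator<Item = &'a str>,
    page: usize,
    limit: usize,
    enrich: impl Fn(&str) -> Option<T>,
) -> Result<Vec<T>, QueryError> {
    let window = validate(page, limit)?;
    let top = smallest_keys(keys, window);
    Ok(page_and_enrich(&top, page, limit, enrich))
}
```

Because the heap never holds more than page * limit keys, the window cap bounds both CPU and memory per request. Key-level filters could be applied to the iterator before it reaches `smallest_keys`, and `enrich` stands in for whatever per-row lookup builds the full response object.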

Interfaces that may have similar issues

  • /api/mqtt/client/list
  • /api/mqtt/session/list
  • /api/mqtt/subscribe/list
  • /api/mqtt/topic/list
  • /api/mqtt/system-alarm/list
  • /api/mqtt/ban-log/list
  • /api/mqtt/flapping_detect/list
  • /api/mqtt/connector/list (lower risk by data volume, same query pattern)
  • /api/mqtt/acl/list (lower risk by data volume, same query pattern)
  • /api/mqtt/blacklist/list (lower risk by data volume, same query pattern)
  • /api/mqtt/topic-rewrite/list (lower risk by data volume, same query pattern)
  • /api/mqtt/auto-subscribe/list (lower risk by data volume, same query pattern)

Suggested acceptance criteria

  • Define and document query guardrails for all Admin list endpoints.
  • Add per-endpoint performance metrics and logs for query phases.
  • Prioritize optimization rollout for high-risk endpoints (client_list, session_list).
  • Clarify which endpoints are node-local vs cluster-aggregated in API behavior docs.
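For the metrics and node-local vs cluster-aggregated criteria, a minimal sketch of two building blocks, both hypothetical: `PhaseTimer` records wall-clock time per query phase using `std::time::Instant`, and `merge_node_results` shows the shape of an Admin-layer fan-out merge once per-node responses have been collected (the RPC that collects them is not shown). The names, the log-line format, and the duplicate-key policy are assumptions.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Hypothetical per-request recorder that attributes latency to query phases
/// (scan, filter, sort, paginate, enrich, serialize).
struct PhaseTimer {
    endpoint: &'static str,
    last: Instant,
    phases: Vec<(&'static str, Duration)>,
}

impl PhaseTimer {
    fn start(endpoint: &'static str) -> Self {
        PhaseTimer {
            endpoint,
            last: Instant::now(),
            phases: Vec::new(),
        }
    }

    /// Close the phase that just finished under `name` and start timing the next one.
    fn mark(&mut self, name: &'static str) {
        let now = Instant::now();
        self.phases.push((name, now.duration_since(self.last)));
        self.last = now;
    }

    /// One structured log line; a real server might feed histograms instead.
    fn summary(&self) -> String {
        let parts: Vec<String> = self
            .phases
            .iter()
            .map(|(name, d)| format!("{}={}us", name, d.as_micros()))
            .collect();
        format!("endpoint={} {}", self.endpoint, parts.join(" "))
    }
}

/// Hypothetical Admin-layer aggregation for a node-local endpoint: each broker
/// returns its own (client_id, node_id) rows and the Admin layer merges them
/// into one de-duplicated view ordered by client_id.
fn merge_node_results(per_node: Vec<Vec<(String, u64)>>) -> Vec<(String, u64)> {
    let mut merged: BTreeMap<String, u64> = BTreeMap::new();
    for rows in per_node {
        for (client_id, node_id) in rows {
            // Assumed conflict policy: the first node to report a key wins.
            merged.entry(client_id).or_insert(node_id);
        }
    }
    merged.into_iter().collect()
}
```

A handler would call `mark` after each phase and log `summary()` (or export the durations as metrics) before returning, which makes it visible whether a slow request spent its time scanning, sorting, or enriching.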

Are you willing to submit a PR?

  • Yes. I would be willing to submit a PR with guidance from the RobustMQ community to improve.
  • No. I cannot submit a PR at this time.

Metadata


Assignees

No one assigned

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
