[refactor][CGS][entrance] move smart queue selection logic to entrance layer via rpc #991

Closed
v-kkhuang wants to merge 20 commits into dev-1.18.2-webank from temp-secondary-queue-for-webank
@v-kkhuang

What is the purpose of the change

Background/Problem:
The smart queue selection logic is currently embedded in the LinkisManager engine creation flow. This tightly couples queue selection to the engine creation process, making both harder to maintain and extend. In addition, the feature toggle, engine type filtering, and creator filtering are implemented at the LinkisManager layer, which is not the most appropriate architectural layer for these concerns.

Purpose of Change:
To address this architectural issue, this PR refactors the smart queue selection feature by moving the control logic from LinkisManager to the Entrance layer through an RPC-based approach. The Entrance layer now handles the feature toggle, engine type filtering, and creator filtering via a new interceptor, while LinkisManager exposes the queue selection decision as an RPC service.
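
A minimal sketch of the kind of gating the Entrance layer could apply before asking LinkisManager for a decision. The class name `QueueSelectionGate`, its fields, and the choice of allow-lists for engine types and creators are assumptions made for illustration; the PR text does not state how the filters are configured.

```java
import java.util.Locale;
import java.util.Set;

// Illustrative only: these names are hypothetical and do not come from the PR.
final class QueueSelectionGate {
    private final boolean featureEnabled;   // feature toggle
    private final Set<String> engineTypes;  // lower-case engine types eligible for smart selection (assumed allow-list)
    private final Set<String> creators;     // creators eligible for smart selection (assumed allow-list)

    QueueSelectionGate(boolean featureEnabled, Set<String> engineTypes, Set<String> creators) {
        this.featureEnabled = featureEnabled;
        this.engineTypes = engineTypes;
        this.creators = creators;
    }

    /** Returns true only when the request should be forwarded to LinkisManager over RPC. */
    boolean shouldRequestSelection(String engineType, String creator) {
        if (!featureEnabled || engineType == null || creator == null) {
            return false;
        }
        return engineTypes.contains(engineType.toLowerCase(Locale.ROOT))
                && creators.contains(creator);
    }
}
```

With this split, a request that fails any of the three checks never triggers an RPC call and simply keeps its original queue.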

Value/Impact:
This refactoring improves code maintainability and separation of concerns. The Entrance layer is now responsible for traffic control (feature toggle, filtering), while LinkisManager focuses on resource-based decision making. This makes the system more modular and easier to extend with additional queue selection strategies in the future.

Related issues/PRs

Related issues: close apache#5415
Related PR: none

Brief change log

  • Add QueueSelectionInterceptor at Entrance layer for smart queue selection control
  • Refactor DefaultEngineCreateService to provide queue selection via RPC
  • Add SecondaryYarnConf protocol for RPC communication between Entrance and LinkisManager
  • Move configuration items from RMConfiguration to Configuration for global access
  • Update EntranceSpringConfiguration to register the new interceptor
  • Add RPC timeout and error handling in Entrance interceptor
  • Improve logging with detailed resource usage percentages
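
The RPC timeout and error handling item can be sketched as follows. This is not the PR's code: `QueueSelectionClient`, `selectQueue`, and the `Supplier` standing in for the RPC call are hypothetical, and the fallback to the primary queue on any failure is an assumption consistent with the fallback behavior described in the commits.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Illustrative only: the Supplier stands in for the real RPC call to LinkisManager.
final class QueueSelectionClient {

    /** Returns the queue chosen by the RPC call, or primaryQueue on timeout, error, or empty reply. */
    static String selectQueue(Supplier<String> rpcCall, String primaryQueue, long timeoutMillis) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(rpcCall);
        try {
            String selected = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return (selected == null || selected.isEmpty()) ? primaryQueue : selected;
        } catch (TimeoutException e) {
            // Marks the future as cancelled; it does not interrupt the running task.
            future.cancel(true);
            return primaryQueue;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
            return primaryQueue;
        } catch (ExecutionException e) {
            // The RPC call itself failed; keep the task on its original queue.
            return primaryQueue;
        }
    }
}
```

The point of the design is that a slow or failing decision service never blocks or fails task submission; the worst case is that the task runs on the queue it would have used anyway.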

Checklist

  • I have read the Contributing Guidelines on pull requests.
  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible
  • If this is a code change: I have written unit tests to fully verify the new behavior.

v-kkhuang and others added 20 commits March 31, 2026 09:32
…eption (#964)

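* #AI commit# Development phase: fix bug where sr task retry caused an init_sql loading exception

* #AI commit# Development phase: fix bug where sr task retry caused an init_sql loading exception

* #AI commit# Development phase: fix bug where sr task retry caused an init_sql loading exception

* #AI commit# Fix: * extend the coverage of the task retry switch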
…t queue selection

- Translate all Chinese log messages to English for consistency
- Update comments and documentation to English
- No functional changes, only log message translation
Add permission validation before using secondary queue to prevent task submission failures:

Features:
- Add configuration SECONDARY_QUEUE_PERMISSION_CHECK_ENABLED to enable/disable permission check
- Add configuration SECONDARY_QUEUE_ALLOWED_USERS to configure user whitelist
- Modify performSmartQueueSelection method to accept user parameter
- Add checkQueuePermission method to validate user access to secondary queue
- If the user has no permission, log a warning and fall back to the primary queue
- Prevents task submission failures due to insufficient queue permissions

Configuration:
- wds.linkis.rm.secondary.yarnqueue.permission.check.enable (default: false)
- wds.linkis.rm.secondary.yarnqueue.allowed.users (default: empty)
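
A minimal sketch of how a whitelist check driven by these two keys might look. Only the key names and defaults come from the commit message; the class `SecondaryQueueWhitelist`, the comma-separated list format, and the choice to deny everyone when the check is enabled with an empty list are assumptions.

```java
import java.util.Arrays;
import java.util.Properties;

// Illustrative only: the parsing rules below are assumptions, not the PR's implementation.
final class SecondaryQueueWhitelist {
    static final String CHECK_ENABLED_KEY =
            "wds.linkis.rm.secondary.yarnqueue.permission.check.enable";
    static final String ALLOWED_USERS_KEY =
            "wds.linkis.rm.secondary.yarnqueue.allowed.users";

    /** Returns true when the user may be routed to the secondary queue. */
    static boolean mayUseSecondaryQueue(Properties conf, String user) {
        boolean checkEnabled =
                Boolean.parseBoolean(conf.getProperty(CHECK_ENABLED_KEY, "false"));
        if (!checkEnabled) {
            return true; // check disabled (the documented default): nobody is filtered out
        }
        String allowed = conf.getProperty(ALLOWED_USERS_KEY, "");
        // Assumed format: comma-separated user names, surrounding whitespace ignored.
        return Arrays.stream(allowed.split(","))
                .map(String::trim)
                .anyMatch(name -> !name.isEmpty() && name.equals(user));
    }
}
```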
…econdary queue

Replace configuration-based whitelist with actual Yarn permission verification:

Changes:
- Remove configuration items SECONDARY_QUEUE_PERMISSION_CHECK_ENABLED and SECONDARY_QUEUE_ALLOWED_USERS
- Rewrite checkQueuePermission method to use Yarn API for real permission validation
- Query Yarn app info via externalResourceService.getAppInfo to verify user access
- Detect permission errors (403/404/forbidden/unauthorized) and fall back to the primary queue
- Handle transient errors gracefully to avoid blocking legitimate users

Permission Check Logic:
1. Try to get app info from target queue using Yarn REST API
2. If successful (even with empty app list) → user has permission
3. If permission error (403/404) → log warning and return false
4. If other error (network/timeout) → assume OK to avoid blocking
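
The four steps above can be sketched as a small classifier. The `Callable` stands in for the Yarn app info query (the commit mentions `externalResourceService.getAppInfo`, which is not called here), and matching on the exception message is an assumption about how permission errors are detected; `QueuePermissionCheck` and its methods are hypothetical names.

```java
import java.util.Locale;
import java.util.concurrent.Callable;

// Illustrative only: the Callable stands in for the real Yarn app info query.
final class QueuePermissionCheck {

    /** Assumed detection rule: look for the markers listed in the commit message. */
    static boolean isPermissionError(Throwable t) {
        String msg = t.getMessage() == null ? "" : t.getMessage().toLowerCase(Locale.ROOT);
        return msg.contains("403") || msg.contains("404")
                || msg.contains("forbidden") || msg.contains("unauthorized");
    }

    /** Returns false only on a permission error; any other failure is treated as OK. */
    static boolean hasPermission(Callable<?> yarnAppInfoQuery) {
        try {
            yarnAppInfoQuery.call(); // steps 1-2: any successful reply, even an empty list, counts
            return true;
        } catch (Exception e) {
            if (isPermissionError(e)) {
                // step 3: caller should fall back to the primary queue
                System.err.println("No permission on secondary queue: " + e.getMessage());
                return false;
            }
            return true; // step 4: transient error, do not block the user
        }
    }
}
```

Treating transient errors as success trades strictness for availability: a Yarn outage cannot lock users out of the secondary queue, at the cost of occasionally routing a task to a queue the user cannot actually use.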
# Conflicts:
#	linkis-computation-governance/linkis-manager/linkis-application-manager/src/main/scala/org/apache/linkis/manager/am/service/engine/DefaultEngineCreateService.scala
@v-kkhuang v-kkhuang closed this Apr 17, 2026