Skip to content

feat: Extension installation module#2811

Merged
csmith49 merged 66 commits intomainfrom
feat/installed-extensions
Apr 28, 2026
Merged

feat: Extension installation module#2811
csmith49 merged 66 commits intomainfrom
feat/installed-extensions

Conversation

@csmith49
Copy link
Copy Markdown
Collaborator

@csmith49 csmith49 commented Apr 13, 2026

  • A human has tested these changes.

Why

Skills and plugins have repeated but slightly-divergent implementations of installing/loading. This PR attempts to unify both with one mechanism that can be reused for other extensions in the future.

This PR is a follow-up to #2801 and should be merged after.

Summary

  • New openhands.sdk.extensions.installation module that provides the interface for installing/loading extensions
  • Tests for the above
  • Updates to openhands.sdk.plugin.installed and openhands.sdk.skills.installed to use the new machinery while exposing the old interface

Breaking Changes

  • InstalledPluginInfo.install_path / InstalledSkillInfo.install_path type changed from str to Path.
    Downstream code doing string operations on info.install_path (e.g. info.install_path.startswith(...)) will need to use str(info.install_path) or switch to Path APIs. Path is the better type here and existing tests already use it correctly (e.g. install_path / "skills").

  • InstalledPluginsMetadata and InstalledSkillsMetadata removed from public exports.
    These were internal implementation details. The user-facing functions and InstalledPluginInfo / InstalledSkillInfo type aliases remain exported.

  • Skill-specific fields dropped from InstalledSkillInfo.
    The old InstalledSkillInfo had license, compatibility, metadata, allowed_tools fields. These are gone since it is now aliased to InstallationInfo. These fields were never accessed in production code.

How to Test

Unit/integration tests all use temp files to actually stress the saving/loading.

Type

  • Bug fix
  • Feature
  • Refactor
  • Breaking change
  • Docs / chore

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:907c4b8-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-907c4b8-python \
  ghcr.io/openhands/agent-server:907c4b8-python

All tags pushed for this build

ghcr.io/openhands/agent-server:907c4b8-golang-amd64
ghcr.io/openhands/agent-server:907c4b8-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:907c4b8-golang-arm64
ghcr.io/openhands/agent-server:907c4b8-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:907c4b8-java-amd64
ghcr.io/openhands/agent-server:907c4b8-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:907c4b8-java-arm64
ghcr.io/openhands/agent-server:907c4b8-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:907c4b8-python-amd64
ghcr.io/openhands/agent-server:907c4b8-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:907c4b8-python-arm64
ghcr.io/openhands/agent-server:907c4b8-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:907c4b8-golang
ghcr.io/openhands/agent-server:907c4b8-java
ghcr.io/openhands/agent-server:907c4b8-python

About Multi-Architecture Support

  • Each variant tag (e.g., 907c4b8-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 907c4b8-python-amd64) are also available if needed

csmith49 and others added 30 commits April 10, 2026 09:49
Cover parse_extension_source, get_cache_path, fetch,
fetch_with_resolution, SourceType enum, and repo_path handling
for the new shared extensions module.

Co-authored-by: openhands <openhands@all-hands.dev>
…h.py

- Add class docstring to SourceType enum describing each variant.
- Update parse_extension_source docstring: fix arg description,
  return types, and example function names.
- Fix _apply_subpath docstring example from 'plugin repository'
  to 'extension repository'.
- Fix get_cache_path docstring references from plugin to extension.
- Rename plugin_path variable to ext_path in fetch_with_resolution.

Co-authored-by: openhands <openhands@all-hands.dev>
Both modules now delegate all fetch logic to the shared
extensions.fetch module while preserving their public interfaces:

- plugin/fetch.py: parse_plugin_source, get_cache_path, fetch_plugin,
  fetch_plugin_with_resolution, PluginFetchError, DEFAULT_CACHE_DIR
- skills/fetch.py: fetch_skill, fetch_skill_with_resolution,
  SkillFetchError, DEFAULT_CACHE_DIR

ExtensionFetchError is caught and re-raised as the domain-specific
error type with 'extension' replaced by 'plugin'/'skill' in messages.

Co-authored-by: openhands <openhands@all-hands.dev>
- Remove parse_plugin_source and get_cache_path from plugin/fetch.py;
  callers should use extensions.fetch directly.
- Slim test_plugin_fetch.py from 1038 to 100 lines: keep only
  PluginFetchError wrapping, DEFAULT_CACHE_DIR, and Plugin.fetch() tests.
- Move git infrastructure tests (clone, update, checkout, locking,
  GitHelper errors, get_default_branch) to tests/sdk/git/test_cached_repo.py.
- Update test_installed_plugins.py to import from extensions.fetch.

Co-authored-by: openhands <openhands@all-hands.dev>
The except block was discarding the ExtensionFetchError message and
replacing it with a generic 'Failed to fetch plugin'. Now it carries
the original message through with 'extension' replaced by 'plugin'.

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst
Copy link
Copy Markdown
Collaborator

enyst commented Apr 23, 2026

@OpenHands /codereview this PR, and tell us directly on github: WDYT?

@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented Apr 23, 2026

I'm on it! enyst can track my progress at all-hands.dev

Copy link
Copy Markdown
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - the refactor itself looks sound after the latest fixes. I walked the new installation/metadata path, compared the wrapper behavior against the old plugin/skill implementations, and ran uv run pytest tests/sdk/extensions/installation/test_installation_info.py tests/sdk/extensions/installation/test_installation_manager.py tests/sdk/extensions/installation/test_installation_metadata.py tests/sdk/extensions/installation/test_installation_utils.py tests/sdk/plugin/test_installed_plugins.py tests/sdk/skills/test_installed_skills.py -q locally (80 passed). I don't currently see a code-level blocker in the diff.

[TESTING GAPS]

  • [PR description] No Evidence: the PR template explicitly asks for end-to-end runtime evidence, but the body still only says the unit/integration tests use temp files. Please add an Evidence section with the exact real install/load command(s) you ran and the resulting output. If this PR is also claiming to address the legacy metadata issue discussed in review, include a short before/after reproduction for that path too.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM
    Breaking SDK surface plus install/load refactor, so compatibility matters. The current implementation and tests look reasonably solid; the remaining concern is missing runtime proof in the PR description rather than the code itself.

VERDICT:
Worth merging once the missing runtime evidence is added

KEY INSIGHT:
The risky part here is API/persistence compatibility, and the current code/tests cover that path reasonably well; the remaining gap is proof in the PR description, not the implementation.

This review was generated by an AI assistant (OpenHands) on behalf of the user.

@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented Apr 23, 2026

Here’s the final summary of my work since the last summary:

  • I reviewed PR feat: Extension installation module #2811 in context by checking the PR metadata, linked issue, issue comments, review threads, and the current diff.
  • I inspected the refactored extension-installation code and compared the current plugin/skill wrapper behavior against the previous implementation.
  • I ran the targeted validation suite:
    • uv run pytest tests/sdk/extensions/installation/test_installation_info.py tests/sdk/extensions/installation/test_installation_manager.py tests/sdk/extensions/installation/test_installation_metadata.py tests/sdk/extensions/installation/test_installation_utils.py tests/sdk/plugin/test_installed_plugins.py tests/sdk/skills/test_installed_skills.py -q
    • Result: 80 passed
  • I posted a GitHub PR review comment here:

Status checklist:

  • Reviewed the current PR state and addressed the request directly on GitHub
  • Validated the relevant test coverage locally
  • Gave a clear verdict on the PR
  • Followed the instruction not to change code unless necessary
  • Included AI disclosure in the GitHub-posted review
  • Did not make any repository changes, commits, or pushes

Conciseness:

  • No code changes were made, so there are no extraneous edits to revert
  • The only new artifact was the GitHub review comment requested by the user

Bottom line:

  • The request was fully addressed. I reviewed the PR and posted my assessment on GitHub.
  • My conclusion was that the code looks acceptable after the latest fixes, but the PR description still lacks the explicit runtime evidence requested by the PR template, so I left a COMMENT review rather than approving.

@xingyaoww xingyaoww requested a review from all-hands-bot April 24, 2026 20:04
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable — Clean refactor with solid architecture, but requires human review + evals before merge per repo policy.

[EVAL RISK]

  • This refactors plugin/skill loading infrastructure
  • Per repo guidelines: plugin/skill loading changes require human review + lightweight evals
  • Recommendation: Have a human maintainer run evals to verify no behavior regression before merge

[ARCHITECTURAL ASSESSMENT]
Good taste: The generic InstallationManager[T] abstraction elegantly eliminates ~650 lines of duplicated code. The protocol-based interface with type parameters is exactly the right level of abstraction.

Data Structures: Using Pydantic models for metadata, Path for install_path (improvement over str), and context manager for metadata sessions - all solid choices.

Testing: Comprehensive coverage with real disk I/O testing the full persistence round-trip. Not just mocks.

Backward Compatibility: Legacy metadata migration properly handles old {"plugins": ...} and {"skills": ...} formats.

[BREAKING CHANGES]
Properly Documented: Breaking changes are reasonable API improvements (str→Path, removed internal metadata classes).

[RISK ASSESSMENT]
⚠️ Risk Assessment: 🟡 MEDIUM — Infrastructure refactor needs eval verification to ensure no behavior change.

VERDICT:
✅ Code quality is excellent, architecture is sound
❌ Cannot approve without evals per repo policy

KEY INSIGHT:
Well-executed refactor that successfully unifies plugin/skill installation through clean abstraction, but the infrastructure nature requires eval verification before merge.

@xingyaoww xingyaoww added the test-examples Run all applicable "examples/" files. Expensive operation. label Apr 24, 2026
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Functionally verified the extension installation module unification. All existing plugin and skill installation APIs work correctly through the new generic framework.

Does this PR achieve its stated goal?

Yes. The PR successfully unifies plugin and skill installation into a single generic openhands.sdk.extensions.installation module. The refactor consolidates previously duplicated code while maintaining full backward compatibility for existing public APIs. All documented breaking changes are verified and intentional.

Phase Result
Environment Setup ✅ All dependencies installed, project builds successfully
CI Status ✅ All core checks passing (sdk-tests, tools-tests, workspace-tests, agent-server-tests, pre-commit, coverage-report)
Functional Verification ✅ All installation operations verified end-to-end
Functional Verification

Test 1: Plugin Installation API (Backward Compatibility)

Verified the existing openhands.sdk.plugin.installed API continues to work:

Commands executed:

from openhands.sdk.plugin.installed import install_plugin, list_installed_plugins, uninstall_plugin, enable_plugin, disable_plugin, get_installed_plugin

# Install plugin from local path
info = install_plugin(str(test_plugin_source), installed_dir=installed_dir)

# Verify install_path is Path (breaking change)
assert isinstance(info.install_path, Path)  # ✓ PASS

# List, enable/disable, get, uninstall all work
plugins = list_installed_plugins(installed_dir=installed_dir)  # ✓ Returns 1 plugin
disable_plugin("test-plugin", installed_dir=installed_dir)  # ✓ Success
enable_plugin("test-plugin", installed_dir=installed_dir)  # ✓ Success
uninstall_plugin("test-plugin-2", installed_dir=installed_dir)  # ✓ Success

Result: All plugin installation operations work correctly. The breaking change (install_path type changed from str to Path) is verified.


Test 2: Skill Installation API (Backward Compatibility)

Verified the existing openhands.sdk.skills.installed API continues to work:

Commands executed:

from openhands.sdk.skills.installed import install_skill, list_installed_skills, load_installed_skills, disable_skill, enable_skill, uninstall_skill

# Install skill from local path
info = install_skill(str(test_skill_source), installed_dir=installed_dir)

# Verify install_path is Path (breaking change)
assert isinstance(info.install_path, Path)  # ✓ PASS

# Load installed skills (only enabled ones)
skills = load_installed_skills(installed_dir=installed_dir)  # ✓ Returns 1 skill

# Disable/enable filtering works
disable_skill("test-skill", installed_dir=installed_dir)  # ✓ Success
load_installed_skills(installed_dir=installed_dir)  # ✓ Returns 0 (disabled excluded)
enable_skill("test-skill", installed_dir=installed_dir)  # ✓ Success
load_installed_skills(installed_dir=installed_dir)  # ✓ Returns 1 (enabled included)

Result: All skill installation operations work correctly. The breaking change (install_path type changed from str to Path) is verified.


Test 3: Generic InstallationManager (New Unified Framework)

Verified the new generic framework works with custom extension types:

Commands executed:

from openhands.sdk.extensions.installation import InstallationManager, InstallationInterface
from pydantic import BaseModel

# Define custom extension type
class CustomExtension(BaseModel):
    name: str
    version: str
    description: str
    custom_field: str = "custom_value"

# Define loader
class CustomExtensionInterface(InstallationInterface[CustomExtension]):
    @staticmethod
    def load_from_dir(extension_dir: Path) -> CustomExtension:
        return CustomExtension.model_validate_json((extension_dir / "extension.json").read_text())

# Create manager
manager = InstallationManager(
    installation_dir=installed_dir,
    installation_interface=CustomExtensionInterface(),
)

# All operations work with custom type
info = manager.install(str(ext_source))  # ✓ Installed
extensions = manager.load_installed()  # ✓ Returns CustomExtension instances
assert extensions[0].custom_field == "test_value"  # ✓ Custom fields preserved

Result: The generic InstallationManager successfully works with custom extension types, proving the unification is complete and extensible.


Test 4: Metadata Migration (Legacy Compatibility)

Verified legacy metadata formats are migrated automatically:

Commands executed:

from openhands.sdk.extensions.installation import InstallationMetadata
import json

# Write old "plugins" format
legacy_plugins = {
    "plugins": {
        "my-plugin": {"name": "my-plugin", "version": "1.0.0", ...}
    }
}
metadata_file.write_text(json.dumps(legacy_plugins))

# Load - should migrate to "extensions"
metadata = InstallationMetadata.load_from_dir(installed_dir)
assert "my-plugin" in metadata.extensions  # ✓ PASS

# Test "skills" key migration
legacy_skills = {"skills": {"my-skill": {...}}}
# ✓ Also migrates to "extensions"

# Test conflict resolution (extensions wins)
conflict = {
    "extensions": {"shared": {"version": "3.0.0", ...}},
    "plugins": {"shared": {"version": "1.0.0", ...}}
}
# ✓ extensions key takes precedence

Result: Legacy plugins and skills metadata keys are automatically migrated to the new extensions key. Conflicts are resolved correctly (explicit extensions wins).


Test 5: Self-Healing Metadata

Verified metadata automatically syncs with filesystem:

Commands executed:

from openhands.sdk.plugin.installed import list_installed_plugins

# Manually create plugin directory without metadata entry
untracked_plugin = installed_dir / "untracked-plugin"
create_test_plugin(untracked_plugin, "untracked-plugin")

# list_installed_plugins discovers it
plugins = list_installed_plugins(installed_dir=installed_dir)
assert "untracked-plugin" in [p.name for p in plugins]  # ✓ Discovered
assert plugins[0].source == "local"  # ✓ Correct default source

# Add stale metadata entry (directory doesn't exist)
metadata["extensions"]["deleted-plugin"] = {...}

# list_installed_plugins removes stale entry
plugins = list_installed_plugins(installed_dir=installed_dir)
assert "deleted-plugin" not in [p.name for p in plugins]  # ✓ Cleaned up

# Invalid directory names are skipped
invalid_dir = installed_dir / "Invalid_Name"
invalid_dir.mkdir()
plugins = list_installed_plugins(installed_dir=installed_dir)
assert "Invalid_Name" not in [p.name for p in plugins]  # ✓ Skipped

Result: Metadata automatically discovers untracked extension directories, removes stale entries for deleted directories, and skips invalid directory names.

Issues Found

None.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 24, 2026

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-04-24 20:36:05 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 27.6s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 21.5s $0.03
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 12.7s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 27.9s $0.02
01_standalone_sdk/09_pause_example.py ✅ PASS 12.7s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 34.4s $0.03
01_standalone_sdk/11_async.py ✅ PASS 30.0s $0.04
01_standalone_sdk/12_custom_secrets.py ✅ PASS 10.6s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 50.6s $0.05
01_standalone_sdk/14_context_condenser.py ✅ PASS 2m 20s $0.17
01_standalone_sdk/17_image_input.py ✅ PASS 17.1s $0.01
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 17.2s $0.02
01_standalone_sdk/19_llm_routing.py ❌ FAIL
Exit code 1
4m 21s --
01_standalone_sdk/20_stuck_detector.py ✅ PASS 15.6s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 10.2s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 22.5s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 1m 7s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 3m 23s $0.23
01_standalone_sdk/25_agent_delegation.py ✅ PASS 51.7s $0.06
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 27.6s $0.06
01_standalone_sdk/28_ask_agent_example.py ✅ PASS 35.8s $0.04
01_standalone_sdk/29_llm_streaming.py ✅ PASS 40.0s $0.04
01_standalone_sdk/30_tom_agent.py ✅ PASS 9.7s $0.01
01_standalone_sdk/31_iterative_refinement.py ✅ PASS 3m 47s $0.26
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 22.2s $0.02
01_standalone_sdk/34_critic_example.py ❌ FAIL
Timed out after 600 seconds
10m 0s --
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 11.7s $0.01
01_standalone_sdk/37_llm_profile_store/main.py ✅ PASS 6.8s $0.00
01_standalone_sdk/38_browser_session_recording.py ✅ PASS 36.9s $0.03
01_standalone_sdk/39_llm_fallback.py ✅ PASS 10.7s $0.01
01_standalone_sdk/40_acp_agent_example.py ✅ PASS 30.2s $0.12
01_standalone_sdk/41_task_tool_set.py ✅ PASS 29.7s $0.03
01_standalone_sdk/42_file_based_subagents.py ✅ PASS 1m 41s $0.09
01_standalone_sdk/43_mixed_marketplace_skills/main.py ✅ PASS 6.9s $0.00
01_standalone_sdk/44_model_switching_in_convo.py ✅ PASS 9.0s $0.01
01_standalone_sdk/45_parallel_tool_execution.py ✅ PASS 3m 11s $0.42
01_standalone_sdk/46_agent_settings.py ✅ PASS 10.8s $0.01
01_standalone_sdk/47_defense_in_depth_security.py ✅ PASS 3.6s $0.00
01_standalone_sdk/48_conversation_fork.py ✅ PASS 15.0s $0.00
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 36.0s $0.03
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 1m 50s $0.09
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 49.3s $0.05
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ✅ PASS 1m 42s $0.05
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 30.4s $0.03
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ✅ PASS 3m 40s $0.03
02_remote_agent_server/09_acp_agent_with_remote_runtime.py ✅ PASS 51.9s $0.14
02_remote_agent_server/10_cloud_workspace_share_credentials.py ✅ PASS 33.2s $0.05
02_remote_agent_server/11_conversation_fork.py ✅ PASS 45.5s $0.00
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 30.1s $0.03
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 44.2s $0.06
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 19.1s $0.02
05_skills_and_plugins/02_loading_plugins/main.py ❌ FAIL
Exit code 1
20.4s --

❌ Some tests failed

Total: 52 | Passed: 49 | Failed: 3 | Total Cost: $2.52

Failed examples:

  • examples/01_standalone_sdk/19_llm_routing.py: Exit code 1
  • examples/01_standalone_sdk/34_critic_example.py: Timed out after 600 seconds
  • examples/05_skills_and_plugins/02_loading_plugins/main.py: Exit code 1

View full workflow run

csmith49 and others added 2 commits April 27, 2026 09:08
The installation metadata format was unified under "extensions" but the
loading_plugins example still referenced the old "plugins" key.

Co-authored-by: openhands <openhands@all-hands.dev>
@csmith49 csmith49 added test-examples Run all applicable "examples/" files. Expensive operation. and removed test-examples Run all applicable "examples/" files. Expensive operation. labels Apr 27, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 27, 2026

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-04-27 16:21:58 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 24.1s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 25.3s $0.03
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 12.3s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 30.4s $0.03
01_standalone_sdk/09_pause_example.py ✅ PASS 17.7s $0.02
01_standalone_sdk/10_persistence.py ✅ PASS 35.8s $0.03
01_standalone_sdk/11_async.py ✅ PASS 34.8s $0.04
01_standalone_sdk/12_custom_secrets.py ✅ PASS 9.9s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 34.3s $0.03
01_standalone_sdk/14_context_condenser.py ✅ PASS 2m 24s $0.15
01_standalone_sdk/17_image_input.py ✅ PASS 17.5s $0.01
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 20.9s $0.02
01_standalone_sdk/19_llm_routing.py ✅ PASS 18.7s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 15.3s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 10.4s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 14.7s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 2m 32s $0.02
01_standalone_sdk/24_planning_agent_workflow.py ❌ FAIL
Timed out after 600 seconds
10m 0s --
01_standalone_sdk/25_agent_delegation.py ✅ PASS 1m 5s $0.08
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 19.3s $0.02
01_standalone_sdk/28_ask_agent_example.py ✅ PASS 32.1s $0.03
01_standalone_sdk/29_llm_streaming.py ✅ PASS 34.0s $0.03
01_standalone_sdk/30_tom_agent.py ✅ PASS 9.1s $0.01
01_standalone_sdk/31_iterative_refinement.py ✅ PASS 4m 3s $0.31
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 21.8s $0.02
01_standalone_sdk/34_critic_example.py ✅ PASS 2m 11s $0.17
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 11.3s $0.01
01_standalone_sdk/37_llm_profile_store/main.py ✅ PASS 4.0s $0.00
01_standalone_sdk/38_browser_session_recording.py ✅ PASS 28.8s $0.04
01_standalone_sdk/39_llm_fallback.py ✅ PASS 11.4s $0.01
01_standalone_sdk/40_acp_agent_example.py ✅ PASS 26.4s $0.12
01_standalone_sdk/41_task_tool_set.py ✅ PASS 27.0s $0.03
01_standalone_sdk/42_file_based_subagents.py ✅ PASS 1m 25s $0.09
01_standalone_sdk/43_mixed_marketplace_skills/main.py ✅ PASS 3.3s $0.00
01_standalone_sdk/44_model_switching_in_convo.py ✅ PASS 8.8s $0.01
01_standalone_sdk/45_parallel_tool_execution.py ❌ FAIL
Timed out after 600 seconds
10m 0s --
01_standalone_sdk/46_agent_settings.py ✅ PASS 11.4s $0.01
01_standalone_sdk/47_defense_in_depth_security.py ✅ PASS 3.4s $0.00
01_standalone_sdk/48_conversation_fork.py ✅ PASS 17.6s $0.01
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 31.7s $0.03
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 1m 45s $0.06
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 1m 29s $0.08
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ❌ FAIL
Exit code 1
1m 40s --
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 39.2s $0.04
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ✅ PASS 3m 11s $0.02
02_remote_agent_server/09_acp_agent_with_remote_runtime.py ✅ PASS 3m 20s $0.12
02_remote_agent_server/10_cloud_workspace_share_credentials.py ✅ PASS 30.1s $0.05
02_remote_agent_server/11_conversation_fork.py ✅ PASS 35.4s $0.00
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 23.5s $0.02
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 38.4s $0.07
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 14.4s $0.02
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 16.3s $0.02

❌ Some tests failed

Total: 52 | Passed: 49 | Failed: 3 | Total Cost: $1.97

Failed examples:

  • examples/01_standalone_sdk/24_planning_agent_workflow.py: Timed out after 600 seconds
  • examples/01_standalone_sdk/45_parallel_tool_execution.py: Timed out after 600 seconds
  • examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1

View full workflow run

@csmith49
Copy link
Copy Markdown
Collaborator Author

Looks like the only blocker from the (numerous) AI reviews is evidence that this works end-to-end. I've evaluated these changes locally while developing, and the relevant example is now successful. I'd encourage one of the potential reviewers to check out this branch and test as well.

There are also several failing examples -- to the best of my knowledge, these are unrelated to the changes in this branch. They're due to timeouts and other stochastic failure modes.

@csmith49 csmith49 requested a review from all-hands-bot April 27, 2026 17:50
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

All functional verification passed. The PR successfully unifies plugin and skill installation into a generic extension mechanism while maintaining backward compatibility.

Does this PR achieve its stated goal?

Yes. The PR set out to unify plugin and skill installation/loading mechanisms into a single reusable framework, and it delivers on that goal. The new openhands.sdk.extensions.installation module provides a generic InstallationManager that works with any extension type (plugins, skills, or custom types), and both the plugin and skill APIs have been successfully refactored to delegate to this unified machinery. All existing functionality works, backward compatibility is maintained (except for documented breaking changes), and legacy metadata files are automatically migrated.

Phase Result
Environment Setup ✅ Build succeeded, all dependencies installed
CI Status ✅ All 30 checks passing
Functional Verification ✅ All 5 verification tests passed
Functional Verification

Test 1: Generic InstallationManager with Custom Extension Type

Objective: Verify the new generic InstallationManager works with arbitrary extension types.

Setup: Created a custom Widget extension type with its own loader interface.

Ran:

uv run python /tmp/test_generic_installation.py

Result:

[1] Installing widget from local path...
✓ Installed: my-widget v1.0.0
  Source: /tmp/tmpu3ioolfw/source/my-widget
  Install path: /tmp/tmpu3ioolfw/installed/my-widget
  Enabled: True

[2] Listing installed extensions...
✓ Found 1 extension(s)
  - my-widget v1.0.0 (enabled=True)

[3] Loading installed extensions...
✓ Loaded 1 widget(s)
  - my-widget: A test widget

[4] Disabling extension...
✓ Disable succeeded
  Loaded after disable: 0 widget(s)

[5] Re-enabling extension...
✓ Enable succeeded
  Loaded after enable: 1 widget(s)

[6] Getting specific extension info...
✓ Found extension: my-widget
  Version: 1.0.0

[7] Uninstalling extension...
✓ Uninstall succeeded
  Remaining extensions: 0

Interpretation: The generic InstallationManager successfully handles a custom extension type, demonstrating the PR's goal of creating a reusable installation framework.


Test 2: Plugin Installation API Backward Compatibility

Objective: Verify existing plugin installation code continues to work after the refactor.

Ran:

uv run python /tmp/test_plugin_backward_compat.py

Result:

[1] Installing plugin...
✓ Installed: test-plugin v1.0.0
  Type of install_path: PosixPath
  install_path value: /tmp/tmpiwmz3whs/installed/test-plugin
  ✓ Confirmed: install_path is now Path (breaking change documented)

[2] Listing installed plugins...
✓ Found 1 plugin(s)
  - test-plugin v1.0.0

[3] Loading installed plugins...
✓ Loaded 1 plugin(s)
  - test-plugin: Test plugin

[4] Disabling plugin...
✓ Disable succeeded

[5] Re-enabling plugin...
✓ Enable succeeded

[6] Getting plugin info...
✓ Found plugin: test-plugin

[7] Uninstalling plugin...
✓ Uninstall succeeded

Interpretation: All plugin installation APIs (install_plugin, list_installed_plugins, load_installed_plugins, enable_plugin, disable_plugin, get_installed_plugin, uninstall_plugin) work correctly. The documented breaking change (install_path is now Path instead of str) is confirmed.


Test 3: Skill Installation API Backward Compatibility

Objective: Verify existing skill installation code continues to work after the refactor.

Ran:

uv run python /tmp/test_skill_backward_compat.py

Result:

[1] Installing skill...
✓ Installed: test-skill
  Type of install_path: PosixPath
  ✓ Confirmed: install_path is now Path (breaking change documented)

[2] Listing installed skills...
✓ Found 1 skill(s)
  - test-skill

[3] Loading installed skills...
✓ Loaded 1 skill(s)

[4] Disabling skill...
✓ Disable succeeded

[5] Re-enabling skill...
✓ Enable succeeded

[6] Getting skill info...
✓ Found skill: test-skill

[7] Uninstalling skill...
✓ Uninstall succeeded

Interpretation: All skill installation APIs work correctly with the same functionality as before the refactor.


Test 4: Updated Plugin Loading Example

Objective: Verify the example code updated in the PR runs successfully.

Ran:

uv run python examples/05_skills_and_plugins/02_loading_plugins/main.py

Result:

DEMO 5: Disabling/Enabling plugins
============================================================

After disable
-------------
Tracked plugins:
  - code-quality (enabled=False)
  - skills-237e139ce4d65e8f (enabled=True)

After re-enable
---------------
Tracked plugins:
  - code-quality (enabled=True)
  - skills-237e139ce4d65e8f (enabled=True)

Metadata file:
{
  "extensions": {
    "code-quality": { ... },
    "skills-237e139ce4d65e8f": { ... }
  }
}

============================================================
EXAMPLE COMPLETED SUCCESSFULLY
============================================================
EXAMPLE_COST: 0.0631

Interpretation: The example runs successfully end-to-end. The PR's change to the example (updating assertions from metadata["plugins"] to metadata["extensions"]) is correct and the metadata file now uses the new "extensions" key.


Test 5: Legacy Metadata Migration

Objective: Verify old metadata files with "plugins" or "skills" keys are automatically migrated to the new "extensions" format.

Ran:

uv run python /tmp/test_metadata_migration.py

Result:

[1] Testing migration of legacy 'plugins' key...
WARNING: Migrating legacy 'plugins' key to 'extensions'
✓ Legacy 'plugins' key successfully migrated to 'extensions'

[2] Testing migration of legacy 'skills' key...
WARNING: Migrating legacy 'skills' key to 'extensions'
✓ Legacy 'skills' key successfully migrated to 'extensions'

[3] Testing migration with both legacy keys...
✓ Both 'plugins' and 'skills' keys migrated to 'extensions'

[4] Testing load from file with legacy key...
✓ Legacy file successfully loaded and migrated
✓ Saved file uses new 'extensions' format

Interpretation: The PR includes automatic migration logic that converts old metadata formats to the new unified format. Existing users' installed plugins and skills will be automatically upgraded when the metadata is loaded.

Issues Found

None.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable — Clean refactor eliminating 655 lines of duplication, but has critical data loss issue in legacy migration and several important correctness/security gaps that should be addressed before merge.

Eval Risk Flag: This PR refactors plugin/skill loading infrastructure. Per repo guidelines, changes to plugin/skill loading require human review + lightweight evals before merge to verify no behavior regression.

Comment thread openhands-sdk/openhands/sdk/extensions/installation/metadata.py
Comment thread openhands-sdk/openhands/sdk/extensions/installation/manager.py
Comment thread openhands-sdk/openhands/sdk/extensions/installation/manager.py
Comment thread openhands-sdk/openhands/sdk/extensions/installation/info.py
Comment thread openhands-sdk/openhands/sdk/extensions/installation/info.py
Comment thread tests/sdk/extensions/installation/test_installation_utils.py
Comment thread tests/sdk/extensions/installation/test_installation_metadata.py
Comment thread openhands-sdk/openhands/sdk/plugin/__init__.py
Comment thread openhands-sdk/openhands/sdk/skills/__init__.py
Comment thread openhands-sdk/openhands/sdk/extensions/installation/README.md
@VascoSch92
Copy link
Copy Markdown
Contributor

It seems good for me...

csmith49 and others added 2 commits April 27, 2026 12:20
- Resolve installation_dir to absolute path in InstallationManager.__post_init__
  to prevent breakage if working directory changes
- Add parametrized path traversal tests for validate_extension_name
- Add test for legacy key conflict when both plugins and skills share a name

Co-authored-by: openhands <openhands@all-hands.dev>
@csmith49 csmith49 merged commit 0719217 into main Apr 28, 2026
32 checks passed
@csmith49 csmith49 deleted the feat/installed-extensions branch April 28, 2026 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test-examples Run all applicable "examples/" files. Expensive operation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants