fix(medtronic): Medtronic pump stuck in PumpUnreachable state with no automatic recovery #4530

Open
mifi100 wants to merge 1 commit into nightscout:dev from mifi100:dev

mifi100 commented Feb 5, 2026

Preface

I've been facing stability issues with Medtronic + RileyLink connectivity for quite a long time. I found an issue in the RL queue mechanism (see #3897), and fixing it improved stability. However, I could still see that once the pump became unreachable, it could not connect again until AAPS was restarted. Log analysis shows that the root cause is that the system stops making real connection attempts once RileyLink reaches the PumpConnectorError state, although it should keep trying. I reviewed all my findings with Claude Code, which helped to document the change, review the RCA thoroughly, and finalize the solution.
I haven't tested this change yet, but I believe it should be safe and useful. Please review the proposal.

Summary

This fix addresses a deadlock condition where a Medtronic pump becomes permanently stuck in an unreachable state, requiring an application restart to recover. The issue occurs due to an interaction between two state machines (PumpDeviceState and RileyLinkServiceState) that prevents the regular connection retry mechanism from executing.


Problem Description

Background

The Medtronic pump driver uses two independent state machines:

  1. PumpDeviceState: tracks pump communication status (Sleeping, WakingUp, Active, PumpUnreachable, etc.)
  2. RileyLinkServiceState: tracks RileyLink device status (PumpConnectorReady, PumpConnectorError, RileyLinkReady, etc.)

The Deadlock Scenario

Step 1: Pump becomes unreachable

When isDeviceReachable() in MedtronicCommunicationManager.kt fails to connect after 5 attempts:

  • PumpDeviceState is set to PumpUnreachable
  • If the pump has been unreachable for more than 15 minutes, WakeAndTuneTask is triggered to re-tune the radio frequency

Step 2: Frequency tuning fails

When WakeAndTuneTask executes and the tuning process fails (e.g., pump is out of range or powered off):

// RileyLinkService.kt:139-141
if (newFrequency == 0.0) {
    rileyLinkServiceData.setServiceState(RileyLinkServiceState.PumpConnectorError, 
        RileyLinkError.TuneUpOfDeviceFailed)
}

Step 3: Recovery path is blocked

The application uses KeepAliveWorker to periodically check pump status (approximately every 5 minutes). This triggers the following call chain:

KeepAliveWorker.checkPump()
  → commandQueue.readStatus()
    → MedtronicPumpPlugin.getPumpStatus()
      → refreshAnyStatusThatNeedsToBeRefreshed()
        → isPumpNotReachable property

The isPumpNotReachable property contained this logic:

// MedtronicPumpPlugin.kt (BEFORE fix)
private val isPumpNotReachable: Boolean
    get() {
        val rileyLinkServiceState = rileyLinkServiceData.rileyLinkServiceState
        if (rileyLinkServiceState != RileyLinkServiceState.PumpConnectorReady
            && rileyLinkServiceState != RileyLinkServiceState.RileyLinkReady
            && rileyLinkServiceState != RileyLinkServiceState.TuneUpDevice
        ) {
            aapsLogger.debug(LTag.PUMP, "RileyLink unreachable.")
            return false  // "pump is NOT unreachable" = "pump is reachable"
        }
        return isDeviceReachable() != true
    }

When RileyLinkServiceState is PumpConnectorError:

  • All three conditions in the if statement evaluate to true
  • The method returns false without calling isDeviceReachable()
  • No reconnection attempt is made

Step 4: Permanent deadlock

The system is now stuck in an unrecoverable state:

  • PumpDeviceState = PumpUnreachable
  • RileyLinkServiceState = PumpConnectorError
  • The regular KeepAliveWorker check never attempts to reconnect to the pump
  • The only way to recover is through a Bluetooth reconnection event or application restart

State Diagram

Normal Operation
       │
       ▼
┌─────────────────────────┐
│ RileyLinkServiceState:  │
│   PumpConnectorReady    │
│ PumpDeviceState:        │
│   Sleeping              │
└───────────┬─────────────┘
            │ Connection fails (5 retries)
            ▼
┌─────────────────────────┐
│ RileyLinkServiceState:  │
│   PumpConnectorReady    │
│ PumpDeviceState:        │
│   PumpUnreachable       │──► WakeAndTuneTask triggered (after 15 min)
└───────────┬─────────────┘           │
            │                         │ TuneUp fails (frequency = 0.0)
            │                         ▼
            │              ┌──────────────────────────┐
            └─────────────►│ RileyLinkServiceState:   │
                           │   PumpConnectorError     │
                           │ PumpDeviceState:         │
                           │   PumpUnreachable        │
                           └──────────┬───────────────┘
                                      │
                           ┌──────────▼───────────────┐
                           │ KeepAliveWorker (5 min)  │
                           │ → isPumpNotReachable     │
                           │ → returns false          │◄──┐
                           │ → NO isDeviceReachable() │   │
                           │ → no reconnection attempt│───┘
                           └──────────────────────────┘
                                      │
                           DEADLOCK - only exit:
                           Bluetooth reconnect or app restart

The Fix

This fix consists of two changes, both in the Medtronic-specific code:

Change 1: Allow reconnection attempts when in PumpConnectorError state

File: pump/medtronic/src/main/kotlin/app/aaps/pump/medtronic/MedtronicPumpPlugin.kt

// BEFORE
if (rileyLinkServiceState != RileyLinkServiceState.PumpConnectorReady
    && rileyLinkServiceState != RileyLinkServiceState.RileyLinkReady
    && rileyLinkServiceState != RileyLinkServiceState.TuneUpDevice
) {

// AFTER
if (rileyLinkServiceState != RileyLinkServiceState.PumpConnectorReady
    && rileyLinkServiceState != RileyLinkServiceState.RileyLinkReady
    && rileyLinkServiceState != RileyLinkServiceState.TuneUpDevice
    && rileyLinkServiceState != RileyLinkServiceState.PumpConnectorError
) {

This check is meant to skip the real pump check when the RileyLink device is not ready for it. In the PumpConnectorError state the RileyLink itself is reachable, so this state was simply missing from the list.

Effect: When RileyLinkServiceState is PumpConnectorError, the code now proceeds to call isDeviceReachable(), which attempts up to 5 connection retries. This allows the system to recover when the pump becomes reachable again.
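
The effect of Change 1 can be illustrated with a small stand-alone model of the gate. This is only a sketch, not the driver code: the enum is a reduced stand-in for the real RileyLinkServiceState, OtherNotReady is a hypothetical placeholder for every state not named in this PR, and probe is a hypothetical stand-in for isDeviceReachable().

```kotlin
// Reduced stand-in for the real enum. OtherNotReady is a hypothetical
// placeholder for all states that are not named in this PR.
enum class RileyLinkServiceState {
    OtherNotReady, RileyLinkReady, TuneUpDevice, PumpConnectorReady, PumpConnectorError
}

// States that let the getter continue to the real reachability check before the fix.
val allowedBefore: Set<RileyLinkServiceState> = setOf(
    RileyLinkServiceState.PumpConnectorReady,
    RileyLinkServiceState.RileyLinkReady,
    RileyLinkServiceState.TuneUpDevice
)

// Change 1 adds PumpConnectorError to that set.
val allowedAfter: Set<RileyLinkServiceState> =
    allowedBefore + RileyLinkServiceState.PumpConnectorError

// Same shape as the isPumpNotReachable getter. probe is a hypothetical
// stand-in for isDeviceReachable().
fun isPumpNotReachable(
    state: RileyLinkServiceState,
    allowed: Set<RileyLinkServiceState>,
    probe: () -> Boolean
): Boolean {
    if (state !in allowed) return false // early return: the pump is never probed
    return !probe()
}
```

With allowedBefore, a call in the PumpConnectorError state returns without invoking probe at all. With allowedAfter, probe is invoked once per call, which is the behaviour Change 1 is after.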

Change 2: Restore RileyLinkServiceState upon successful connection

File: pump/medtronic/src/main/kotlin/app/aaps/pump/medtronic/comm/MedtronicCommunicationManager.kt

// BEFORE
if (valid) {
    if (state === PumpDeviceState.PumpUnreachable)
        medtronicPumpStatus.pumpDeviceState = PumpDeviceState.WakingUp
    else
        medtronicPumpStatus.pumpDeviceState = PumpDeviceState.Sleeping
    rememberLastGoodDeviceCommunicationTime()
    return true
}

// AFTER
if (valid) {
    if (rileyLinkServiceData.rileyLinkServiceState == RileyLinkServiceState.PumpConnectorError) {
        rileyLinkServiceData.setServiceState(RileyLinkServiceState.PumpConnectorReady)
    }
    if (state === PumpDeviceState.PumpUnreachable)
        medtronicPumpStatus.pumpDeviceState = PumpDeviceState.WakingUp
    else
        medtronicPumpStatus.pumpDeviceState = PumpDeviceState.Sleeping
    rememberLastGoodDeviceCommunicationTime()
    return true
}

Effect: When a connection is successfully established (valid pump model received), and the previous state was PumpConnectorError, the RileyLinkServiceState is restored to PumpConnectorReady. This ensures:

  • The UI displays the correct "connected" status
  • The isInitialized property returns the correct value
  • The system state is fully consistent after recovery

Why this change is safe:

  • The code path through connectToDevice() with RileyLinkServiceState.PumpConnectorError was previously unreachable (due to the early return in isPumpNotReachable)
  • Now that Change 1 allows this path, it's logical and correct to restore the state upon successful connection
  • The change only affects PumpConnectorError state, not any other states
  • connectToDevice() performs a real connection test (sends a request and validates the pump model response), so a successful result genuinely indicates working communication
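
As a rough illustration of the points above, the success branch can be modelled in isolation. DriverState is a hypothetical holder introduced only for this sketch (the real driver keeps the two states in rileyLinkServiceData and medtronicPumpStatus), and both enums are reduced to the values needed here.

```kotlin
enum class RileyLinkServiceState { PumpConnectorReady, PumpConnectorError, TuneUpDevice }
enum class PumpDeviceState { Sleeping, WakingUp, PumpUnreachable }

// Hypothetical holder for the two state machines, used only in this sketch.
class DriverState(
    var serviceState: RileyLinkServiceState,
    var deviceState: PumpDeviceState
)

// Models the "valid pump model received" branch after Change 2.
fun onValidPumpResponse(s: DriverState) {
    // Only PumpConnectorError is rewritten; every other service state is left alone.
    if (s.serviceState == RileyLinkServiceState.PumpConnectorError) {
        s.serviceState = RileyLinkServiceState.PumpConnectorReady
    }
    // Unchanged logic from the BEFORE version.
    s.deviceState =
        if (s.deviceState == PumpDeviceState.PumpUnreachable) PumpDeviceState.WakingUp
        else PumpDeviceState.Sleeping
}
```

Starting from PumpConnectorError / PumpUnreachable, one call leaves the model in PumpConnectorReady / WakingUp, while a state such as TuneUpDevice passes through untouched.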

Impact Assessment

| Aspect | Assessment |
|---|---|
| Scope | Medtronic pump driver only |
| Risk | Low: minimal, targeted changes |
| Backward compatibility | Fully compatible |
| Other pump drivers | No impact (Omnipod Eros uses separate OmnipodRileyLinkCommunicationManager) |

Testing Recommendations

  1. Recovery test: Simulate pump unreachable state (e.g., move pump out of range), wait for PumpConnectorError state, then bring pump back in range. Verify automatic recovery without app restart.

  2. Normal operation test: Verify normal pump communication is not affected by the changes.

  3. UI verification: Confirm the RileyLink status indicator shows correct state after recovery.


Files Changed

  1. pump/medtronic/src/main/kotlin/app/aaps/pump/medtronic/MedtronicPumpPlugin.kt
    • Added PumpConnectorError to the list of states that allow reconnection attempts

  2. pump/medtronic/src/main/kotlin/app/aaps/pump/medtronic/comm/MedtronicCommunicationManager.kt
    • Added import for RileyLinkServiceState
    • Added state restoration logic in connectToDevice() method

…ecovery

When pump becomes unreachable and WakeAndTuneTask fails, RileyLinkServiceState
transitions to PumpConnectorError. Previously, isPumpNotReachable would return
false without calling isDeviceReachable(), blocking all automatic reconnection
attempts. This deadlock required app restart to recover.

Changes:
- Add PumpConnectorError to states that allow reconnection attempts in
  isPumpNotReachable (MedtronicPumpPlugin.kt)
- Restore RileyLinkServiceState to PumpConnectorReady upon successful
  connection when recovering from PumpConnectorError state
  (MedtronicCommunicationManager.kt)

Fixes automatic recovery when pump becomes reachable again after being
in PumpUnreachable + PumpConnectorError state.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mifi100 commented Feb 6, 2026

Update: Further analysis has shown that the current changes are insufficient to resolve the issue. Therefore, this PR is not yet ready for merge and requires additional work. Below, I've outlined several possible fixes. Thoughts and reviews from the community are welcome.

Problem: Current Fix Is Unreachable Through the Normal Recovery Path

Context

Commit cce6720 introduced two changes to fix a deadlock where the Medtronic pump gets permanently stuck in PumpUnreachable / PumpConnectorError state:

  • Change 1 (MedtronicPumpPlugin.kt): Added PumpConnectorError to the list of states that allow isPumpNotReachable to proceed to isDeviceReachable()
  • Change 2 (MedtronicCommunicationManager.kt): Restores RileyLinkServiceState to PumpConnectorReady upon successful connectToDevice()

Both changes are logically correct, but they are unreachable through the normal KeepAliveWorker recovery path.

Root Cause

The recovery chain is:

KeepAliveWorker.checkPump()                         [KeepAliveWorker.kt:226]
  → commandQueue.readStatus(...)                    adds CommandReadStatus to queue
    → QueueWorker.doWorkAndLog()                    [QueueWorker.kt:48]
      → while (true) loop checks pump state:
          if (pump.isConnecting()) {                [QueueWorker.kt:112]
              sleep(1000); continue                 ← STUCK HERE
          }
          ...
          // Never reached:
          queue.pickup()
          it.execute()                              [QueueWorker.kt:143]
            → CommandReadStatus.execute()
              → pump.getPumpStatus()
                → refreshAnyStatusThatNeedsToBeRefreshed()
                  → isPumpNotReachable              ← Change 1 lives here
                    → isDeviceReachable()           ← Change 2 lives here

QueueWorker requires isConnected() == true to proceed to command execution. Before that, it loops while isConnecting() == true. For Medtronic:

// MedtronicPumpPlugin.kt:317-325
override fun isConnected(): Boolean =
    isServiceSet && rileyLinkMedtronicService?.isInitialized == true

override fun isConnecting(): Boolean =
    !isServiceSet || rileyLinkMedtronicService?.isInitialized != true

Where isInitialized is:

// RileyLinkMedtronicService.kt:126-127
val isInitialized: Boolean
    get() = rileyLinkServiceData.rileyLinkServiceState.isReady()

// RileyLinkServiceState.kt:42-44
fun isReady(): Boolean = (this == PumpConnectorReady)

When RileyLinkServiceState == PumpConnectorError:

  • isReady() = false
  • isInitialized = false
  • isConnected() = false
  • isConnecting() = true
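
The derivation above can be reproduced for any state with a few lines of Kotlin. This is a sketch under two simplifications: the enum lists only some of the states named in this PR, and the nullable rileyLinkMedtronicService is assumed to be non-null so that isInitialized is a plain Boolean.

```kotlin
// Reduced enum; isReady() is copied from the snippet quoted above.
enum class RileyLinkServiceState {
    RileyLinkReady, TuneUpDevice, PumpConnectorError, PumpConnectorReady;

    fun isReady(): Boolean = this == PumpConnectorReady
}

data class ConnectionFlags(val isConnected: Boolean, val isConnecting: Boolean)

// Mirrors the two overrides quoted above, assuming the service object is non-null.
fun flagsFor(state: RileyLinkServiceState, isServiceSet: Boolean = true): ConnectionFlags {
    val isInitialized = state.isReady()
    return ConnectionFlags(
        isConnected = isServiceSet && isInitialized,
        isConnecting = !isServiceSet || !isInitialized
    )
}

// One line per state, for quick inspection.
fun flagTable(): String =
    RileyLinkServiceState.values().joinToString("\n") { s ->
        val f = flagsFor(s)
        "$s: isConnected=${f.isConnected}, isConnecting=${f.isConnecting}"
    }
```

In this model the two flags are exact complements for every input, so there is no state in which both are false. As far as these two overrides go, a branch that requires "not connecting and not connected" can therefore never be taken.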

QueueWorker spins in the isConnecting() loop for up to 119 seconds (PUMP_MAX_CONNECTION_TIME_IN_SECONDS), then times out without ever executing CommandReadStatus. The fix code is never reached.

This is confirmed by the log — QueueWorker outputs "connecting N" every second with no actual connection attempt:

02:25:47.529 D/PUMPQUEUE: [QueueWorker.doWorkAndLog():112]: connecting 102
02:25:48.532 D/PUMPQUEUE: [QueueWorker.doWorkAndLog():112]: connecting 103
02:25:49.534 D/PUMPQUEUE: [QueueWorker.doWorkAndLog():112]: connecting 104
02:25:50.535 D/PUMPQUEUE: [QueueWorker.doWorkAndLog():112]: connecting 105
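
The spinning behaviour seen in the log can be reproduced with a simplified model of the loop. This is a sketch only: PumpView, Outcome, LoopResult and runQueueLoop are hypothetical names, the real QueueWorker has more branches, and the 1-second sleep is replaced by a simulated clock so the code finishes instantly.

```kotlin
// Minimal view of the pump used by the loop; method names follow this PR.
interface PumpView {
    fun isConnected(): Boolean
    fun isConnecting(): Boolean
    fun connect()
}

enum class Outcome { Executed, TimedOut }

data class LoopResult(val outcome: Outcome, val secondsWaited: Int, val connectCalls: Int)

// Simulated clock: every pass that would sleep advances time by one second.
fun runQueueLoop(pump: PumpView, maxSeconds: Int = 119): LoopResult {
    var elapsed = 0
    var connectCalls = 0
    while (true) {
        if (!pump.isConnected() && elapsed > maxSeconds) {
            return LoopResult(Outcome.TimedOut, elapsed, connectCalls)
        }
        if (pump.isConnecting()) {        // the "connecting N" branch from the log
            elapsed++
            continue
        }
        if (!pump.isConnected()) {        // only reachable when not connecting
            pump.connect()
            connectCalls++
            elapsed++
            continue
        }
        return LoopResult(Outcome.Executed, elapsed, connectCalls) // command would run here
    }
}

// A pump stuck in PumpConnectorError, as seen through the current predicates.
val stuckPump: PumpView = object : PumpView {
    override fun isConnected() = false
    override fun isConnecting() = true
    override fun connect() {}
}
```

Running runQueueLoop(stuckPump) ends in Outcome.TimedOut with connectCalls equal to 0, which matches the log: the loop only waits and never reaches connect() or command execution.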

Additional context

Key Medtronic methods inherited from PumpPluginAbstract that are not overridden and have empty implementations:

  • connect(reason: String) — does nothing
  • disconnect(reason: String) — does nothing
  • stopConnecting() — does nothing

This means QueueWorker's timeout handler (lines 70-104) cannot help either: calling pump.stopConnecting() and pump.disconnect() has no effect.


Proposed Solutions

All solutions are scoped to Medtronic/RileyLink classes only. No changes to QueueWorker, KeepAliveWorker, or other shared infrastructure.

Solution A: Treat PumpConnectorError as "connected" in MedtronicPumpPlugin

Idea: When RileyLinkServiceState is PumpConnectorError, report the pump as "connected" so that QueueWorker proceeds to execute commands. The actual reachability check happens inside getPumpStatus() → isPumpNotReachable → isDeviceReachable(), which is already protected by the existing fix.

Changes in MedtronicPumpPlugin.kt:

override fun isConnected(): Boolean {
    if (displayConnectionMessages) aapsLogger.debug(LTag.PUMP, "MedtronicPumpPlugin::isConnected")
    return isServiceSet && (rileyLinkMedtronicService?.isInitialized == true
        || rileyLinkServiceData.rileyLinkServiceState == RileyLinkServiceState.PumpConnectorError)
}

override fun isConnecting(): Boolean {
    if (displayConnectionMessages) aapsLogger.debug(LTag.PUMP, "MedtronicPumpPlugin::isConnecting")
    return !isServiceSet || (rileyLinkMedtronicService?.isInitialized != true
        && rileyLinkServiceData.rileyLinkServiceState != RileyLinkServiceState.PumpConnectorError)
}

Recovery flow after fix:

  1. KeepAliveWorker enqueues CommandReadStatus
  2. QueueWorker sees isConnected() == true, proceeds to execute
  3. CommandReadStatus.execute() → getPumpStatus() → isPumpNotReachable
  4. Change 1 allows isDeviceReachable() to run (5 retries)
  5. If pump responds: Change 2 restores PumpConnectorReady → full recovery
  6. If pump doesn't respond: isPumpNotReachable returns true, command finishes, QueueWorker exits normally
  7. Next KeepAliveWorker cycle (5 min) retries
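
The flow above can be walked through with a toy model. Everything here is an assumption made for illustration: SolutionAModel and keepAliveCycle are hypothetical names, pumpInRange stands in for the radio link, isServiceSet is taken to be true, and the private isDeviceReachable() is a stand-in that folds in Change 2.

```kotlin
enum class RileyLinkServiceState { PumpConnectorReady, PumpConnectorError }

// Hypothetical model of the plugin under Solution A (service assumed to be set).
class SolutionAModel(var state: RileyLinkServiceState, var pumpInRange: Boolean) {
    var reachabilityChecks = 0

    // Solution A: PumpConnectorError is also reported as "connected".
    fun isConnected(): Boolean =
        state == RileyLinkServiceState.PumpConnectorReady ||
            state == RileyLinkServiceState.PumpConnectorError

    fun isConnecting(): Boolean = !isConnected()

    // Stand-in for isDeviceReachable(), including the state restore from Change 2.
    private fun isDeviceReachable(): Boolean {
        reachabilityChecks++
        if (pumpInRange && state == RileyLinkServiceState.PumpConnectorError) {
            state = RileyLinkServiceState.PumpConnectorReady
        }
        return pumpInRange
    }

    // One KeepAliveWorker cycle; returns true when the pump answered.
    fun keepAliveCycle(): Boolean {
        if (!isConnected()) return false // QueueWorker would keep waiting instead
        return isDeviceReachable()       // getPumpStatus() -> isPumpNotReachable
    }
}
```

In this model a cycle with the pump out of range performs exactly one reachability check and leaves the state at PumpConnectorError. Once pumpInRange is set to true, the next cycle restores PumpConnectorReady.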

Pros:

  • Minimal change (2 methods, 2 lines each)
  • Makes the existing fix (Changes 1 & 2) reachable
  • One recovery attempt per KeepAliveWorker cycle — not aggressive
  • Safe: getPumpStatus() checks reachability before doing anything

Cons:

  • Semantically imprecise: reports "connected" during an error state
  • Other commands in the queue (if any) would also attempt to execute, though they'd fail safely at the communication layer

Solution B: Fix isConnecting() semantics + implement connect() with recovery

Idea: PumpConnectorError is not a "connecting" state — it's an error state. Fix isConnecting() to return false, and implement connect() to attempt recovery when in this state. This follows QueueWorker's design: when isConnecting() == false and isConnected() == false, it calls pump.connect().

Changes in MedtronicPumpPlugin.kt:

override fun isConnecting(): Boolean {
    if (displayConnectionMessages) aapsLogger.debug(LTag.PUMP, "MedtronicPumpPlugin::isConnecting")
    if (!isServiceSet) return true
    if (rileyLinkServiceData.rileyLinkServiceState == RileyLinkServiceState.PumpConnectorError) return false
    return rileyLinkMedtronicService?.isInitialized != true
}

override fun connect(reason: String) {
    if (rileyLinkServiceData.rileyLinkServiceState == RileyLinkServiceState.PumpConnectorError) {
        aapsLogger.debug(LTag.PUMP, "Attempting recovery from PumpConnectorError")
        rileyLinkMedtronicService?.deviceCommunicationManager?.isDeviceReachable()
    }
}

Recovery flow after fix:

  1. QueueWorker sees isConnecting() == false and isConnected() == false
  2. Calls pump.connect() → isDeviceReachable() (5 retries)
  3. If pump responds: Change 2 restores PumpConnectorReady, next loop iteration isConnected() == true, commands execute
  4. If pump doesn't respond: connect() returns, QueueWorker retries after 1s sleep, calls connect() again
  5. After 119 seconds total, QueueWorker times out

Pros:

  • Semantically correct: error state is not "connecting"; connect() is the right place for connection logic
  • Follows QueueWorker's intended design pattern
  • Clear separation of concerns

Cons:

  • More aggressive retry: isDeviceReachable() (5 retries) is called every ~1 second by QueueWorker loop, resulting in continuous retry attempts for up to 119 seconds
  • isDeviceReachable() failure path triggers WakeAndTuneTask when timeout exceeded, which could be called repeatedly within a single QueueWorker session
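
To put a number on "more aggressive", the following back-of-the-envelope estimate counts radio attempts in one QueueWorker session. It is a sketch, not a measurement: estimateAttempts is a hypothetical helper, and callDurationSeconds (how long one failed isDeviceReachable() call blocks, including all of its retries) is an assumed parameter that would have to be taken from real logs.

```kotlin
// Rough upper-bound estimate of pump connection attempts in one QueueWorker
// session under Solution B. All parameters are assumptions for illustration.
fun estimateAttempts(
    timeoutSeconds: Int = 119,   // PUMP_MAX_CONNECTION_TIME_IN_SECONDS
    sleepSeconds: Int = 1,       // sleep between loop passes
    retriesPerCall: Int = 5,     // retries inside one isDeviceReachable() call
    callDurationSeconds: Int = 0 // hypothetical blocking time of one failed call
): Int {
    var elapsed = 0
    var attempts = 0
    // The loop keeps calling connect() until the timeout check fires.
    while (elapsed <= timeoutSeconds) {
        attempts += retriesPerCall
        elapsed += callDurationSeconds + sleepSeconds
    }
    return attempts
}
```

If a failed call returned instantly, the defaults give 600 attempts per session. With an assumed 9 seconds per failed call the figure drops to 60. Either way it is well above the 5 attempts per 5-minute KeepAliveWorker cycle of Solution A.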

Solution C: Fix isConnecting() + implement connect() with single-attempt recovery

Idea: Same as Solution B, but connect() performs a single-shot state reset instead of calling isDeviceReachable(). This lets QueueWorker proceed to command execution, where the existing fix handles the full recovery.

Changes in MedtronicPumpPlugin.kt:

override fun isConnecting(): Boolean {
    if (displayConnectionMessages) aapsLogger.debug(LTag.PUMP, "MedtronicPumpPlugin::isConnecting")
    if (!isServiceSet) return true
    if (rileyLinkServiceData.rileyLinkServiceState == RileyLinkServiceState.PumpConnectorError) return false
    return rileyLinkMedtronicService?.isInitialized != true
}

override fun connect(reason: String) {
    if (rileyLinkServiceData.rileyLinkServiceState == RileyLinkServiceState.PumpConnectorError) {
        aapsLogger.debug(LTag.PUMP, "Resetting PumpConnectorError to PumpConnectorReady for recovery attempt")
        rileyLinkServiceData.setServiceState(RileyLinkServiceState.PumpConnectorReady)
    }
}

Recovery flow after fix:

  1. QueueWorker sees isConnecting() == false and isConnected() == false
  2. Calls pump.connect() → resets state to PumpConnectorReady
  3. Next iteration: isConnected() == true, proceeds to execute CommandReadStatus
  4. getPumpStatus() → isPumpNotReachable → isDeviceReachable() (5 retries)
  5. If pump responds: Change 2 confirms PumpConnectorReady → full recovery
  6. If pump doesn't respond: state may cycle back to PumpConnectorError via WakeAndTuneTask
  7. Next KeepAliveWorker cycle (5 min) retries
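
The sequence of states that other readers could see during this flow can be traced with a toy model. This is a sketch under stated assumptions: SolutionCModel and runOnce are hypothetical names, pumpInRange stands in for the radio link, and a failed probe is assumed to always lead back to PumpConnectorError, whereas step 6 above only says it may.

```kotlin
enum class RileyLinkServiceState { PumpConnectorReady, PumpConnectorError }

// Hypothetical model of Solution C with the service assumed to be set.
class SolutionCModel(var state: RileyLinkServiceState, val pumpInRange: Boolean) {
    // Every state value a concurrent reader could have seen, in order.
    val observed = mutableListOf<RileyLinkServiceState>()

    fun isConnected(): Boolean = state == RileyLinkServiceState.PumpConnectorReady

    // With only these two states modelled, Solution C never reports "connecting".
    fun isConnecting(): Boolean = false

    fun connect() {
        if (state == RileyLinkServiceState.PumpConnectorError) {
            state = RileyLinkServiceState.PumpConnectorReady // optimistic reset
        }
    }

    // One QueueWorker pass followed by the status command; true if the pump answered.
    fun runOnce(): Boolean {
        observed += state
        if (!isConnecting() && !isConnected()) connect()
        observed += state // visible before any real probe has happened
        if (!pumpInRange) {
            // Assumption: the failed probe ends in the error state again.
            state = RileyLinkServiceState.PumpConnectorError
        }
        observed += state
        return pumpInRange
    }
}
```

With the pump out of range the trace is PumpConnectorError, PumpConnectorReady, PumpConnectorError. The middle entry is the unverified "ready" window listed under Cons below.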

Pros:

  • Semantically correct (same as B)
  • Single recovery attempt per QueueWorker invocation (same as A)
  • Follows QueueWorker's connect() pattern
  • Delegates actual reachability check to existing getPumpStatus() flow

Cons:

  • Temporarily sets PumpConnectorReady before actual connection is verified — brief inconsistency between reported state and reality
  • If something reads RileyLinkServiceState between connect() and isDeviceReachable(), it would see a misleading PumpConnectorReady

Comparison

| Criterion | Solution A | Solution B | Solution C |
|---|---|---|---|
| Number of methods changed | 2 | 2 | 2 |
| Semantic correctness | Low | High | Medium |
| Retry aggressiveness | 1x / 5 min | Continuous / 119s | 1x / 5 min |
| Temporary state inconsistency | No | No | Yes (brief) |
| Uses existing fix path (Changes 1&2) | Yes | Partially | Yes |
| Risk of side effects | Low | Medium | Low |
