Skip to content

observer: MWOCP67 PMBus Support#2549

Open
evan-oxide wants to merge 15 commits into
masterfrom
evan/observer-pmbus
Open

observer: MWOCP67 PMBus Support#2549
evan-oxide wants to merge 15 commits into
masterfrom
evan/observer-pmbus

Conversation

@evan-oxide

@evan-oxide evan-oxide commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

This PR implements oxidecomputer/pmbus#28. It:

  • Adds support for PMBus communication with the MWOCP67 PSUs.
  • Combines the psc-seq-server and observer-seq-server back into one task, since the differences can be managed by a few cfgs. (I considered using an Mwocp6X trait instead to make it generic over PSU models, but that's hard because of the methods that return device-specific CommandData types).
  • Polls the onboard tmp117 temperature sensor.
  • Adds the framulator task to dev builds, to test that the FRAM works.

Known issues / remaining todos:

  • It's still unclear how many power rails the MWOCP67 actually has, what their voltages are, and what pages are used to access them. It looks suspiciously like it's reporting stats for the same 50V rail on both pages 0 and 1. There's also insufficient documentation about which other registers are paged.
  • The MWOCP67 has 2 new temperature sensors on the positive and negative bus bar clips and they both report ~170°C. This is probably incorrect.
  • I wasn't able to manually control the fan speed. I don't know if we care.
  • I'm not sure what model of EEPROM is built into the PSUs. I'm assuming it's still the M24C02, like the MWOCP68 had, and I was able to read some bytes off it.
  • I skipped implementing the blackbox feature because it looks complicated and we weren't using the (somewhat different) blackbox feature on the PSC.
  • PSU firmware updates aren't supported yet.

Testing:

  • Basic PMBus reads:
    • Can read model number, manufacturer ID, and serial number
    • Can read temperatures 1, 2 and 3
    • Can read both bus bar clip temperatures
    • Can read fan speed
    • Can read status_word
  • humility dashboard shows 5 temps and 1 fan speed per PSU, plus the onboard tmp117
  • humility power shows stats for power rails (though they might be wrong, see above)
  • When someone removes and reinserts a PSU, I see appropriate events in the ringbuf
  • Can read something from the at24csw080 VPD eeprom
  • Can read something from the PSUs' m24c02 eeproms
  • rail name looks right in ereports

@evan-oxide evan-oxide added the Observer PSC + 5.5 kW fun. label Jun 4, 2026
@evan-oxide evan-oxide marked this pull request as ready for review June 4, 2026 18:34
@evan-oxide evan-oxide requested review from hawkw and mkeeter June 4, 2026 19:58
Comment thread app/observer/base.toml
Comment on lines +418 to +421
# This one device actually appears at 8 different i2c addresses. 0b1010_000 -
# 0b1010_011 let you access the main memory registers, where the 2 LSBs of the
# device address are the 2 MSBs of the memory address. 0b1011_000 - 0b1011_011
# let you access the security registers and write protection registers.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW --- and I am not saying that we should get rid of this comment, for the record --- this is an EEPROM we use kinda all over the place (there's one on pretty much every board), so perhaps we should document this in a more centralized location.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts about where it should go? This info is in the datasheet, I just added a note here because I was confused about why I saw 8 devices on the bus.

Comment thread app/observer/dev.toml Outdated
Comment thread drv/i2c-devices/src/mwocp67.rs Outdated
Comment thread drv/i2c-devices/src/mwocp6x/mwocp67.rs Outdated
Comment thread drv/i2c-devices/src/mwocp67.rs Outdated
Comment on lines +411 to +448
const REVISION_LEN: usize = 14;

let mut data = [0u8; REVISION_LEN];
let expected = b"XXXX-YYYY-0000";

let len = self
.device
.read_block(CommandCode::MFR_REVISION as u8, &mut data)
.map_err(|code| Error::BadFirmwareRevRead { code })?;

//
// Per ACAN-157, we are expecting this to be of the format:
//
// XXXX-YYYY-0000
//
// Where XXXX is the firmware revision on the primary MCU (AC input
// side) and YYYY is the firmware revision on the secondary MCU (DC
// output side). We aren't going to be rigid about the format of
// either revision, but we will be rigid about the rest of the format.
//
if len != REVISION_LEN {
return Err(Error::BadFirmwareRevLength);
}

for index in 0..len {
if expected[index] == b'X' || expected[index] == b'Y' {
continue;
}

if data[index] != expected[index] {
return Err(Error::BadFirmwareRev { index: index as u8 });
}
}

//
// Return the primary MCU version
//
Ok(FirmwareRev([data[0], data[1], data[2], data[3]]))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the same format as the one used by the MWOCP68, right? That observation kinda makes me want to push for pulling out the parsing code into a common "Murata FRU ID" thingy. But, well...I don't think that actually gains us all that much, I just have some aesthetic desire to not duplicate things.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled it out into a mwocp6x module along with the shared types. What do you think of the new module structure? It might need to change later if the firmware update process ends up being significantly different, but hopefully not.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that we can't share more of the implementations - they look similar but use different types from the pmbus crate.

Comment on lines +490 to +491
// ACAN-157 doesn't specify what page this is on.
// Assume it's on page 0, as it is on the better-documented mwocp68.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, have we tested what happens when we read this (and the other status registers) on page 1? Does it also work, or does the MCU get mad at us?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried reading a bunch of these registers on both pages and always got the same value on both. I don't know if that's because the page is ignored or they happened to be the same on both pages.

Comment thread drv/psc-seq-server/src/main.rs Outdated
Comment thread drv/psc-seq-server/src/main.rs Outdated
Slot::Psu5 => '5',
};
FixedString::try_from_utf8(&v54_psu[..]).unwrap_lite()
Mwocp6x::rail_name(psu_label)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the decision to move this into the drivers a bit weird at first once I saw it, but now that I see how it's being used, it's not unreasonable. However, I don't love putting the rail name in the drv-i2c-devices crate, since we could imagine a universe where we have a MWOCP6x PSU on a board where we use different rail names (although admittedly, that's unlikely). Personally, I might have considered just leaving all the code here, and having one of two different "base" strings ("V54_PSUx" or "V50_PSUx") based on the board cfg. I'm not super bothered by the current approach, but I feel like it might be a bit simpler that way. Up to you!

@evan-oxide evan-oxide Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't love this approach either. On one hand it seems like the Mwocp67 class shouldn't need to know about rail naming conventions, and on the other hand scattering too many cfgs in the psc-seq-server could get confusing. So is this what you're suggesting?

let rail = {
    #[cfg(any(target_board = "psc-b", target_board = "psc-c"))]
    let mut rail_name = *b"V54_PSUx";
    #[cfg(target_board = "observer-a")]
    let mut rail_name = *b"V50_MAIN_PSUx";

    rail_name[rail_name.len() - 1] = match self.slot {
        Slot::Psu0 => b'0',
        ...
    };
    FixedString::try_from_utf8(&rail_name[..]).unwrap_lite()
};

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's a problem that the rail names are different lengths now but EreportFields is still using FixedString<8>.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we will have to either just make it longer and null pad it on the MWOCP68 version, or make the const size depend on which board we're building. But I do think having all of this right there in that file feels better to me, rather than hiding it in the driver.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I agree that the names should be here in psc-seq-server, rather than in the I2C driver (which should be rail-agnostic).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I should probably test this... any suggestions for how to generate an ereport and see if the rail name looks good?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested! I generated a spurious ereport and read it with humility. The rail name looks good.

Comment on lines +141 to +155
Device::Tmp117 => {
for &s in self.temperature_sensors.iter() {
let m = Tmp117::new(&dev);
match m.read_temperature() {
Ok(v) => sensor_api.post_now(s, v.0),
Err(e) => {
let e = Error::Tmp117Error(e);
ringbuf_entry!(Trace::TemperatureReadFailed(s, e));
sensor_api.nodata_now(s, e.into())
}
}
}
// The tmp117 is just a temperature sensor, not a speed sensor.
assert_eq!(self.speed_sensors.len(), 0);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct that, after this PR merges, if we were to add a temperature sensor to the TMP117 in the PSC config, we would also be able to fix #2547?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you'd need to update the PSC's base.toml, add an entry to the SENSORS array in sensor-polling, and test that the ComponentDetails part works (not just humility sensors).

Comment thread drv/i2c-devices/src/mwocp6x/mod.rs
@evan-oxide evan-oxide requested a review from hawkw June 8, 2026 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Observer PSC + 5.5 kW fun.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants