Add column mux ratio for ratio = 1, 2, 4#10018
Add column mux ratio for ratio = 1, 2, 4#10018tnguy19 wants to merge 20 commits intoThe-OpenROAD-Project:masterfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements support for column muxing (ratios 1, 2, and 4) in the RAM generator. It introduces logic to partition address bits into row and word selection, utilizes AOI22 cells for read-path muxing, and updates the grid placement logic to accommodate the additional mux and selection components. Feedback highlights a potential row misalignment in the decoder column due to the placement order of word selection logic and a need for more robust filler cell calculation in the inverter column to ensure consistent grid height across different address bit configurations.
a024df6 to
b4bb84d
Compare
60c1025 to
f628127
Compare
|
clang-tidy review says "All clean, LGTM! 👍" |
4 similar comments
|
clang-tidy review says "All clean, LGTM! 👍" |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
Ready for review @rovinski |
rovinski
left a comment
There was a problem hiding this comment.
As discussed in meeting, please increase the power grid pitch to be at least 4 bits wide (whatever that translates to in um)
| @@ -13,6 +13,7 @@ generate_ram \ | |||
| -mask_size 8 \ | |||
| -word_size 8 \ | |||
| -num_words 8 \ | |||
| -column_mux_ratio 1 \ | |||
There was a problem hiding this comment.
This shouldn't be needed, the default col/mux ratio should be 1.
There was a problem hiding this comment.
all of the 8x8 vok files are identical. Just keep the one that's already there and compare against that one.
| if (num_words % column_mux_ratio != 0) { | ||
| logger_->error(RAM, | ||
| 34, | ||
| "num_words ({}) must be divisible by column_mux_ratio ({}).", | ||
| num_words, | ||
| column_mux_ratio); | ||
| } |
There was a problem hiding this comment.
Add a TODO comment about adding support for non-divisible word counts.
It would mean that the last row has some empty spaces instead of bits, but it should be possible and won't force the user to use any more words than they actually want.
|
clang-tidy review says "All clean, LGTM! 👍" |
|
@tnguy19 CI still failing |
|
@gemini-code-assist /review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for column multiplexing (with ratios 1, 2, and 4) in the RAM generator. Key changes include updating the generate, makeSlice, and makeWord functions to incorporate the column_mux_ratio and word_idx parameters, adding logic for shared select nets and AOI muxes for word selection, and integrating a new aoi22_cell_ type. Address inputs have been renamed to addr_rw, and pin placement and grid calculations are adjusted accordingly. A high-severity issue was noted regarding a potential null pointer dereference when deleting port_iter in RamGen::findMasters.
|
clang-tidy review says "All clean, LGTM! 👍" |
|
clang-tidy review says "All clean, LGTM! 👍" |
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
…e approach Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
|
There was an update that introduce PDK-independent pin mapping, some part of the code has been updated to follow this particularly the logic to determine aoi gates used in mux is now also PDK-independent |
rovinski
left a comment
There was a problem hiding this comment.
First pass, don't have time to fully re-review today
| record_tests { | ||
| make_8x8_sky130 | ||
| make_7x7_nangate45 | ||
| make_8x8 |
There was a problem hiding this comment.
no make_8x8 anymore, that got moved to make_8x8_sky130
| inputs.push_back(p->libertyPort()); | ||
| } | ||
| } | ||
| delete port_iter; |
There was a problem hiding this comment.
Use smart pointer and no delete
std::unique_ptr port_iter = liberty->portIterator();
… finder Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
…en files Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
|
clang-tidy review says "All clean, LGTM! 👍" |
| bool is_inverted | ||
| = !port_name.empty() | ||
| && (port_name.back() == 'N' || port_name.back() == 'B'); |
There was a problem hiding this comment.
This isn't going to be robust enough. Is there no way to determine an inverted output from STA?
There was a problem hiding this comment.
I found 2 functions in the sta's Sequential.hh file ( /OpenROAD/src/sta/include/sta/Sequential.hh) that returns the output and inverted output port (output() and outputInv()) . Do you want me to use that to check here?
There was a problem hiding this comment.
I tested Sequential::output() and outputInv() to distinguish Q and Qn ports. Found that these return internal state variable names (IQ/IQN) rather than the actual Liberty port names (Q/QN), so they can't be used directly for port name matching. Also tested using function()->op() to check if the output is a not_ expression to identify QN, but both Q and QN returned the same function op, so that approach is also unreliable. The port iterator gives Q/QN, but neither method can distinguish them without relying on the port name directly.
There was a problem hiding this comment.
@maliberty @povik do you have any tips on how to identify an inverted vs. non-inverted output of a flip-flop?
There was a problem hiding this comment.
You need to ask the output port for its function and see if it matches up with the internal pos/neg variable. For example in ng45
cell (DFFRS_X1) {
ff ("IQ" , "IQN") ...
pin (Q) {
direction : output;
function : "IQ";
pin (QN) {
direction : output;
function : "IQN";
| // collect all input ports for this cell | ||
| std::vector<sta::LibertyPort*> inputs; | ||
| std::unique_ptr<sta::LibertyCellPortIterator> port_iter( | ||
| std::unique_ptr<sta::ConcreteCellPortIterator> port_iter( |
There was a problem hiding this comment.
Any particular reason for the change to ConcereteCell?
There was a problem hiding this comment.
This is because it triggers a clang-tidy suggestion from last commit warning: no matching constructor for initialization of 'std::unique_ptrsta::LibertyCellPortIterator' [clang-diagnostic-error]
|
@gemini-code-assist /review |
There was a problem hiding this comment.
Code Review
This pull request implements column muxing support (ratios 1, 2, and 4) in the RAM generator, utilizing AOI22 cells for read-path selection and shared select signals to minimize routing congestion. The review identifies critical bugs where multiple read ports are incorrectly shorted together and muxing logic is hardcoded for only the first port. Key feedback includes adding an error check to limit column muxing to single-port configurations, moving redundant gate instantiations out of per-slice loops to optimize the design, and restoring global pin placement settings to avoid unintended side effects.
| vector<vector<dbNet*>> word_output_nets(read_ports); | ||
| for (int port = 0; port < read_ports; ++port) { | ||
| for (int bit = 0; bit < word_size; ++bit) { | ||
| word_output_nets[port].push_back(word_q_nets[word_idx][bit]); |
| vector<vector<dbNet*>> word_q_nets(column_mux_ratio, | ||
| vector<dbNet*>(word_size)); | ||
| if (column_mux_ratio == 1) { | ||
| for (int bit = 0; bit < word_size; ++bit) { | ||
| word_q_nets[0][bit] = q_outputs_[0][bit]->getNet(); | ||
| } | ||
| } else { | ||
| for (int w = 0; w < column_mux_ratio; ++w) { | ||
| for (int bit = 0; bit < word_size; ++bit) { | ||
| word_q_nets[w][bit] = makeNet("word_q", fmt::format("w{}_b{}", w, bit)); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The word_q_nets structure and initialization do not account for multiple read ports when column_mux_ratio > 1. Currently, it only creates a 2D vector indexed by word and bit, which means all read ports will be shorted to the same intermediate net. This should be a 3D structure indexed by [port][word_idx][bit] to support multi-port RAMs.
| if (column_mux_ratio == 2) { | ||
| // mux placement | ||
| // col base_col+0: buffer | ||
| // col base_col+1: AOI22 + inverter side by side in same cell/column | ||
| auto aoi_out = makeNet(prefix, "aoi_out"); | ||
| auto mux_cell = std::make_unique<Cell>(); | ||
| makeInst(mux_cell.get(), | ||
| prefix, | ||
| "aoi", | ||
| aoi22_cell_, | ||
| {{aoi22_in_a1_, inv_addr[0]}, | ||
| {aoi22_in_a2_, bit_word_q_nets[0]}, | ||
| {aoi22_in_b1_, addr_inputs_[0]->getNet()}, | ||
| {aoi22_in_b2_, bit_word_q_nets[1]}, | ||
| {aoi22_out_, aoi_out}}); | ||
| makeInst(mux_cell.get(), | ||
| prefix, | ||
| "inv", | ||
| inv_cell_, | ||
| {{inv_ports_[{PortRoleType::DataIn, 0}], aoi_out}, | ||
| {inv_ports_[{PortRoleType::DataOut, 0}], | ||
| q_outputs_[0][global_bit]->getNet()}}); | ||
| ram_grid_.addCell(std::move(mux_cell), base_col + 1); | ||
|
|
||
| } else if (column_mux_ratio == 4) { | ||
| // mux placement: | ||
| // col base_col+0: buffer | ||
| // col base_col+1: s1_AOI_0 (w0+w1) | ||
| // col base_col+2: s2_AOI final (even stages so no inverter needed) | ||
| // col base_col+3: s1_AOI_1 (w2+w3) | ||
| auto s1_out_0 = makeNet(prefix, "s1_out_0"); | ||
| auto s1_out_1 = makeNet(prefix, "s1_out_1"); | ||
|
|
||
| // col base_col+1: stage1 AOI for word0+word1 | ||
| auto s1_cell_0 = std::make_unique<Cell>(); | ||
| makeInst(s1_cell_0.get(), | ||
| prefix, | ||
| "s1_aoi_0", | ||
| aoi22_cell_, | ||
| {{aoi22_in_a1_, inv_addr[0]}, | ||
| {aoi22_in_a2_, bit_word_q_nets[0]}, | ||
| {aoi22_in_b1_, addr_inputs_[0]->getNet()}, | ||
| {aoi22_in_b2_, bit_word_q_nets[1]}, | ||
| {aoi22_out_, s1_out_0}}); | ||
| ram_grid_.addCell(std::move(s1_cell_0), base_col + 1); | ||
|
|
||
| // col base_col+3: stage1 AOI for word2 + word3 | ||
| // NOTE: must be placed before s2 so s1_out_1 net exists for s2 input | ||
| auto s1_cell_1 = std::make_unique<Cell>(); | ||
| makeInst(s1_cell_1.get(), | ||
| prefix, | ||
| "s1_aoi_1", | ||
| aoi22_cell_, | ||
| {{aoi22_in_a1_, inv_addr[0]}, | ||
| {aoi22_in_a2_, bit_word_q_nets[2]}, | ||
| {aoi22_in_b1_, addr_inputs_[0]->getNet()}, | ||
| {aoi22_in_b2_, bit_word_q_nets[3]}, | ||
| {aoi22_out_, s1_out_1}}); | ||
| ram_grid_.addCell(std::move(s1_cell_1), base_col + 3); | ||
|
|
||
| // col base_col+2: stage2 AOI combining stage 1 outputs and drives Q | ||
| // directly | ||
| auto s2_cell = std::make_unique<Cell>(); | ||
| makeInst(s2_cell.get(), | ||
| prefix, | ||
| "s2_aoi", | ||
| aoi22_cell_, | ||
| {{aoi22_in_a1_, inv_addr[1]}, | ||
| {aoi22_in_a2_, s1_out_0}, | ||
| {aoi22_in_b1_, addr_inputs_[1]->getNet()}, | ||
| {aoi22_in_b2_, s1_out_1}, | ||
| {aoi22_out_, q_outputs_[0][global_bit]->getNet()}}); | ||
| ram_grid_.addCell(std::move(s2_cell), base_col + 2); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
| if (column_mux_ratio != 1 && column_mux_ratio != 2 && column_mux_ratio != 4) { | ||
| logger_->error(RAM, | ||
| 33, | ||
| "The ram generator currently only supports column_mux_ratio " | ||
| "values of 1, 2, or 4."); | ||
| } |
There was a problem hiding this comment.
Since the current implementation of column muxing only supports single-port RAMs correctly, an explicit error check should be added to prevent generating invalid netlists for multi-port configurations with column_mux_ratio > 1. This ensures critical failure conditions are logged as errors rather than proceeding with an invalid design.
// error checking for column_mux_ratio
if (column_mux_ratio != 1 && column_mux_ratio != 2 && column_mux_ratio != 4) {
logger_->error(RAM,
33,
"The ram generator currently only supports column_mux_ratio "
"values of 1, 2, or 4.");
}
if (column_mux_ratio > 1 && read_ports > 1) {
logger_->error(RAM,
36,
"Column muxing is currently only supported for single-port RAMs.");
}References
- Unexpected failures during the flow should be logged as an error, not a warning, to ensure critical failure messages are not missed by the user.
| write_sel = makeNet(prefix, "write_sel"); | ||
| makeInst(sel_cell.get(), | ||
| prefix, | ||
| "word_and", | ||
| and2_cell_, | ||
| {{and2_ports_[{PortRoleType::DataIn, 0}], selects[0]}, | ||
| {and2_ports_[{PortRoleType::DataIn, 1}], word_select}, | ||
| {and2_ports_[{PortRoleType::DataOut, 0}], write_sel}}); | ||
| } |
There was a problem hiding this comment.
The word_and instance, which combines row_select and word_select, is being created inside makeSlice. Since word_select and selects[0] are constant for all slices within a word, this gate is redundantly instantiated for every slice. Following the principle of analyzing the entire loop structure, this operation should be moved to the beginning of the word generation (makeWord) and the resulting net passed to makeSlice to reduce area and power.
References
- When analyzing code within a loop, consider the entire loop structure to identify and move redundant operations from the loop body to the beginning of the loop.
| auto pin_tech = block_->getDb()->getTech(); | ||
| io_placer_->addHorLayer(pin_tech->findLayer(hor_name)); | ||
| io_placer_->addVerLayer(pin_tech->findLayer(ver_name)); | ||
| io_placer_->getParameters()->setCornerAvoidance(0); |
There was a problem hiding this comment.
This can be ignored because the OR session will not be reused.
|
@tnguy19 I will give you the option of either:
|
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
|
An error check for when |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
Waiting to find out if there's a better way to distinguish register outputs before approving, but that's the last thing. |
Signed-off-by: Thinh Nguyen <nguyenthinh19011@gmail.com>
Summary
Add
-column_mux_ratioparameter togenerate_ramandgenerate_ram_netlistwith default value = 1.The column_mux_ratio value indicate the number of words that will share the same physical row, and currently this implementation will work correctly for values = 1, 2, and 4*.
Muxes are implemented using AOI22s for ratio =2 (one AOI22+inverter) and ratio =4 (2 stage-1 AOI22s feeding a stage-2 AOI22).
Some changes to the code to accommodate this feature additions are also added.
Configs tested and passed: RAM8x8, 16x8, 32x6, 64x8, 128x8 with
column_mux_ratio= 1, 2, 4Type of Change
Impact
generate_ramandgenerate_ram_netlistcan be use with option-column_mux_ratiowith default value = 1. Scripts that do not havecolumn_mux_ratioare not affectednum_wordstonum_rowsto indicate the number of physical row that will be created. Forcolumn_mux_ratio>1,num_rows = num_words / column_mux_ratiophysical rows are generated. Column mux are placed in the buffer rowstorage_{row}_{bit}tostorage_{row}_{word}_{bit}to reflect the placement of multiple words in one row whencolumn_mux_ratio> 1Verification
./etc/Build.sh).Related Issues
Fixes #9843
@rovinski