Optimize rule-matching by reordering the OutSection-Rule-Sections loop #820

Draft
parth-07 wants to merge 1 commit into qualcomm:main from parth-07:OptimizeRuleMatching

@parth-07 (Contributor) commented Feb 12, 2026

This commit improves the performance of linker script rule-matching by reordering the nested OutputSection-LinkerScriptRule-InputSections loop.

Currently, the nested loop is as follows:

for (auto O : OutSections) {
  for (auto R : O.rules()) {
    for (auto S : InputSections) {
      // ...
    }
  }
}

This patch changes the nested loop to:

for (auto S : InputSections) {
  for (auto O : OutSections) {
    for (auto R : O.rules()) {
      // ...
    }
  }
}

The new loop structure has the following benefits:

  • Once an input section is matched to a rule, no further iterations are spent on it. By contrast, the old design iterates over all the input sections for EACH rule.
  • Each section requires a significant amount of computation. In the old design, that computation is repeated for EACH rule, whereas in the new design it happens once per section in the outermost loop.

On a contrived test case that makes heavy use of linker script rule-matching, this patch results in a ~15% performance improvement for the rule-matching phase.
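
The two benefits listed above can be illustrated with a small, self-contained sketch. It is not the eld implementation: `Rule`, `OutSection`, `InputSection`, `Stats`, `hashName`, `matches`, and the `make*` helpers are hypothetical stand-ins, rules match by exact name rather than by glob pattern, and `hashName` merely represents the per-section work. The sketch counts how many (section, rule) pairs each loop order examines and how often the per-section work runs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the linker's data structures (not the real eld types).
struct Rule { std::string Pattern; };
struct OutSection { std::vector<Rule> Rules; };
struct InputSection {
  std::string Name;
  int Out = -1; // index of the matched output section, -1 if unmatched
};
struct Stats {
  std::size_t PairVisits = 0; // (section, rule) pairs examined
  std::size_t HashCalls = 0;  // times the per-section work ran
};

// Stand-in for the expensive per-section computation.
std::size_t hashName(const std::string &Name, Stats &St) {
  ++St.HashCalls;
  return std::hash<std::string>{}(Name);
}

// Exact-name matching only; real linker script rules use glob patterns.
bool matches(const Rule &R, const std::string &Name, std::size_t NameHash) {
  return std::hash<std::string>{}(R.Pattern) == NameHash && R.Pattern == Name;
}

// Old order: output sections, then rules, then every input section.
Stats matchOldOrder(const std::vector<OutSection> &Outs,
                    std::vector<InputSection> &Ins) {
  Stats St;
  for (std::size_t O = 0; O < Outs.size(); ++O)
    for (const Rule &R : Outs[O].Rules)
      for (InputSection &S : Ins) {
        ++St.PairVisits;
        std::size_t H = hashName(S.Name, St); // repeated for each rule
        if (S.Out < 0 && matches(R, S.Name, H))
          S.Out = static_cast<int>(O);
      }
  return St;
}

// New order: input sections outermost, stop at the first matching rule.
Stats matchNewOrder(const std::vector<OutSection> &Outs,
                    std::vector<InputSection> &Ins) {
  Stats St;
  for (InputSection &S : Ins) {
    std::size_t H = hashName(S.Name, St); // once per section
    for (std::size_t O = 0; O < Outs.size() && S.Out < 0; ++O)
      for (const Rule &R : Outs[O].Rules) {
        ++St.PairVisits;
        if (matches(R, S.Name, H)) {
          S.Out = static_cast<int>(O);
          break; // the outer loop condition ends the search for this section
        }
      }
  }
  return St;
}

// Helpers for building test data: one rule per output section.
std::vector<OutSection> makeOutputs(const std::vector<std::string> &Names) {
  std::vector<OutSection> Outs;
  for (const std::string &N : Names) {
    OutSection O;
    O.Rules.push_back(Rule{N});
    Outs.push_back(O);
  }
  return Outs;
}

std::vector<InputSection> makeInputs(const std::vector<std::string> &Names) {
  std::vector<InputSection> Ins;
  for (const std::string &N : Names) {
    InputSection S;
    S.Name = N;
    Ins.push_back(S);
  }
  return Ins;
}
```

With output sections `.text`, `.data`, `.bss` and input sections `.text`, `.text`, `.data`, `.bss`, the old order examines 12 pairs and runs the per-section work 12 times, while the new order examines 7 pairs and runs it 4 times; both produce the same assignments. These counts describe only this toy model and say nothing about the real linker's timings.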

@parth-07 force-pushed the OptimizeRuleMatching branch from 93158a0 to 81a7bc3 on February 12, 2026 18:32
Signed-off-by: Parth Arora <partaror@qti.qualcomm.com>
@parth-07 force-pushed the OptimizeRuleMatching branch from 81a7bc3 to 60b7158 on February 12, 2026 18:32
@quic-seaswara (Contributor)

I like this approach and the potential savings.

for (Section *Section : Sections) {
ELFSection *ELFSect = llvm::dyn_cast<eld::ELFSection>(Section);
// For each input section.
for (Section *Section : Sections) {
Contributor

Change the variable Section to Sec or something like that.
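
The reviewer does not state a reason for the rename, but one plausible motivation is that a variable named after its own type hides the type name inside the loop body. The sketch below uses a hypothetical one-field `Section` struct (not the real eld class) to show the difference.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal type; the real eld Section class is far richer.
struct Section { int Size; };

// The variable shares the type's name: legal C++, but inside the loop body
// the identifier `Section` refers to the pointer, not the type.
int totalSizeShadowed(const std::vector<Section *> &Sections) {
  int Total = 0;
  for (Section *Section : Sections) {
    // Naming the type here requires an elaborated specifier (`struct Section`).
    const struct Section *Same = Section;
    Total += Same->Size;
  }
  return Total;
}

// With a distinct variable name, the type name stays usable as-is.
int totalSizeRenamed(const std::vector<Section *> &Sections) {
  int Total = 0;
  for (Section *Sec : Sections) {
    const Section *Same = Sec;
    Total += Same->Size;
  }
  return Total;
}
```

Both functions compute the same result; the renamed version simply avoids the need for `struct Section` and is easier to read.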

if (In->isSpecial())
if (In->isSpecial()) {
IsRetrySect = true;
RetrySections.insert(std::make_pair(Section, true));
Contributor

We don't need this map.

InputSectionHash = llvm::hash_combine(SectName);
}
if (Section->getOutputSection() && !IsRetrySect) {
shouldBreak = false;
Contributor

shouldSkipMatch instead of shouldBreak?

break;
}
}
bool shouldBreak = false;
Contributor

Suggested change
bool shouldBreak = false;

This shouldBreak is currently a no-op.

}
// For all output sections.
for (auto *Out : SectionMap) {
if (ELFSect) {
Contributor

Suggested change
if (ELFSect) {
if (Section->getOutputSection() && !IsRetrySect)
break;
if (ELFSect) {
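
The suggested change above places the "already assigned and not a retry section" check at the top of the output-section loop, which makes a separate break flag unnecessary. A minimal sketch of that control flow follows. `SecState`, `offerToOutputs`, and `MatchAt` are hypothetical names, and the sketch assumes that a retry section should keep being offered to later output sections, which is what the suggested condition implies.

```cpp
#include <cassert>

// Hypothetical stand-in for an input section's matching state.
struct SecState {
  int Out = -1;         // assigned output section index, -1 if none
  bool IsRetry = false; // assumption: retry sections are offered to every output
};

// Offers the section to output sections 0..NumOutputs-1 and returns how many
// were examined. MatchAt is the index of the only output whose rule matches.
int offerToOutputs(SecState &S, int NumOutputs, int MatchAt) {
  int Examined = 0;
  for (int O = 0; O < NumOutputs; ++O) {
    // Guard at the top of the loop: no separate flag is needed.
    if (S.Out >= 0 && !S.IsRetry)
      break;
    ++Examined;
    if (O == MatchAt)
      S.Out = O;
  }
  return Examined;
}
```

In this model, a regular section that matches output 1 of 5 stops after examining two outputs, a retry section is offered to all five, and a section that already has an output is not examined at all.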

// RuleMatchTimes[In] += std::chrono::system_clock::now() - Start;
} // end each rule
if (shouldBreak)
break;
Contributor

Suggested change
break;

} // end each output section
// RuleMatchTimes[In] += std::chrono::system_clock::now() - Start;
} // end each rule
if (shouldBreak)
Contributor

Suggested change
if (shouldBreak)

}

void ObjectBuilder::assignInputFromOutput(eld::InputFile *Obj) {
std::unordered_map<Section *, bool> RetrySections;
Contributor

Is this still used? We write to it, but do we read it anywhere?
