Skip to content

Implement Tail Call Optimize into ReturnCall Instruction#400

Open
makachanm wants to merge 2 commits intoSamsung:mainfrom
makachanm:impl_tco
Open

Implement Tail Call Optimize into ReturnCall Instruction#400
makachanm wants to merge 2 commits intoSamsung:mainfrom
makachanm:impl_tco

Conversation

@makachanm
Copy link
Copy Markdown

I implemented a tail call optimize into WebAssembly 3's ReturnCall recursive instruction.
It currently unfinished, (the issue with type checking in the standard's spec, indirect call is not completed..) but it passes Return Call Test code.

@clover2123
Copy link
Copy Markdown
Collaborator

clover2123 commented Apr 8, 2026

Please run tools/check_tidy.py --update to format the code according to the style guidelines.

@makachanm
Copy link
Copy Markdown
Author

I applied. thanks.

Copy link
Copy Markdown
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a good idea. I would simplify the parameter storing: the parameters are simply stored in a linear buffer, and copied by a memcpy.

Comment thread src/interpreter/Interpreter.h Outdated
Comment thread src/runtime/ExecutionState.h Outdated
Comment thread src/runtime/ExecutionState.h Outdated
}

std::vector<size_t> tco_paramStore;
size_t tco_paramSize;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector should have a size.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend using the predefined Vector structure as shown below.
Also, regarding naming conventions, the Walrus project typically uses member variables that start with m_ and avoids underscores in the middle of names.

Vector<size_t> m_tcoParamStore;

If you are using a fixed-length vector, please use VectorWithFixedSize instead:

VectorWithFixedSize<size_t> m_tcoParamStore;
m_tcoParamStore.reserve(newSize);  // allocate array
m_tcoParamStore.clear();           // reset array

return reinterpret_cast<ByteCodeStackOffset*>(reinterpret_cast<size_t>(this) + sizeof(Call));
}

uint16_t parameterOffsetsSize() const
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I wrong, but as far as I remember parameters are always stored in a linear memory area. This means their offsets do not need to be stored, they can be computed during running. The size of the tco buffer could be stored so the buffer could be pre-allocated.

@zherczeg
Copy link
Copy Markdown
Collaborator

zherczeg commented Apr 9, 2026

Parameters are stored in a linear area:
https://github.com/Samsung/walrus/blob/main/src/interpreter/Interpreter.h#L57

Then the tco buffer could be copied here without changes.

@makachanm
Copy link
Copy Markdown
Author

I'll rework parameter recover code with memcpy and Walrus's own vector implementation. Thanks for the feedback.

@makachanm
Copy link
Copy Markdown
Author

I think lack of JIT version of TCO implement causes CI failure. I'll seperate it for future works.

Comment thread src/runtime/ExecutionState.h Outdated
void destroyTCO()
{
m_tcoparamStore.clear();
m_tcofunctionTarget = nullptr;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names are wrong. They should be m_tcoParamStore and m_tcoFunction store.

@makachanm
Copy link
Copy Markdown
Author

makachanm commented Apr 13, 2026

I've verified that all tests in the walrus suite are passing in the x64 local machine at interpret mode.

@clover2123
Copy link
Copy Markdown
Collaborator

There is a conflict in ByteCode.h file.
Please rebase it and push this pr again

@makachanm makachanm force-pushed the impl_tco branch 3 times, most recently from 67bc7f8 to daa44c1 Compare April 13, 2026 09:30
@makachanm makachanm marked this pull request as draft April 13, 2026 09:30
@makachanm makachanm marked this pull request as ready for review April 13, 2026 10:13
@makachanm
Copy link
Copy Markdown
Author

makachanm commented Apr 14, 2026

CI is failing when Clang compiler is used. I had no problem with GCC.
I'll fix it with reordering some parameter initialize order.

Table* table = instance->table(code->tableIndex());

uint32_t idx = readValue<uint32_t>(bp, code->calleeOffset());
if (idx >= table->size()) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add UNLIKELY to this check code

Trap::throwException(state, "uninitialized element " + std::to_string(idx));
}
const FunctionType* ft = target->functionType();
if (!ft->equals(code->functionType())) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UNLIKELY here too

Trap::throwException(state, "null function reference");
}
const FunctionType* ft = target->functionType();
if (!ft->equals(code->functionType())) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment on lines +113 to +118
if (newState.hasTCO()) {
state.m_tcoParamStore.reserve(newState.m_tcoParamStore.size());
VectorCopier<size_t>::copy(state.m_tcoParamStore.data(), newState.m_tcoParamStore.data(), newState.m_tcoParamStore.size());
state.m_tcoFunctionTarget = newState.m_tcoFunctionTarget;
return;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, the vector needs to be copied into the upper ExecutionState.
Instead of this copy, what about allocating one global vector and reusing this vector?
This approach could reduce copy overhead.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a idea for where I declare a global TCO buffer. I think addition of new execution state manager will be affect the code's consistency. and It can be harder to decide when the timing the destroy a TCO buffer.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that Store would be a good place to keep the TCO buffer.
Store can be accessed through instance->module()->store() in the interpreter.

uint8_t* bp,
Instance* instance);

static void returnCallOperation(ExecutionState& state,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this function is declared but never used

Comment on lines +55 to +57
if (state.hasTCO()) {
VectorCopier<size_t>::copy((size_t*)functionStackBase, state.m_tcoParamStore.data(), state.m_tcoParamStore.size());
state.destroyTCO();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When interpreter is called again due to tail call,
state.destroyTCO(); has been called so state doesn't have any valid TCO data at this point.
Would you check it again?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check it later.

@clover2123
Copy link
Copy Markdown
Collaborator

Lastly, have you been able to test this patch for recursive tail calls?

@makachanm
Copy link
Copy Markdown
Author

Lastly, have you been able to test this patch for recursive tail calls?

I'll do it through recursive fibonacci test. Thanks for the mention.

@makachanm
Copy link
Copy Markdown
Author

I found fibonacci test makes call stack exhaustion. I'm analyzing what's problem with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants