Updated aliases#2326
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
|
|
||
| term context_process_alias_message_signal(Context *ctx, struct TermSignal *signal) | ||
| { | ||
| // signal->signal_term is a 2-tuple {Ref, Message}. The alias lookup runs here, in the owner's own |
There was a problem hiding this comment.
Do we want comments that verbose? The drawback is that eventually they end up being unsync with code, and LLM or humans tend to read comments instead of code.
There was a problem hiding this comment.
Not at all, I'm leaving them for helping with review activity, but I will cleanup code from all that narration.
| // Target cannot be NULL as we processed Demonitor signals | ||
| assert(target != NULL); | ||
| int required_terms = REF_SIZE + TUPLE_SIZE(5); | ||
| int required_terms = TERM_BOXED_REFERENCE_PROCESS_SIZE + TUPLE_SIZE(5); |
There was a problem hiding this comment.
This size could be conditional of monitor->monitor_type == CONTEXT_MONITOR_MONITORED_LOCAL_ALIAS, couldn't it?
| // lookups are no longer skipped for this process, trading the optimization for | ||
| // correctness instead of wrapping to 0 and silently deactivating every alias. | ||
| unsigned int active_alias_count : 16; | ||
| unsigned int alias_count_saturated : 1; |
There was a problem hiding this comment.
Not sure this optimization is worth the 17 bits it costs. We could have only one bit and check if there is any alias left when removing an alias. Alternatively, we could reduce it to 8 bits and drop the optimization when the number of aliases reaches (forever) 255.
There was a problem hiding this comment.
Fixed, using just 8 bit stored in the free bitfield area should be a reasonable compromise. So we don't have to scan everything every time an alias is removed.
| { | ||
| Context *p = globalcontext_get_process_lock(glb, process_id); | ||
| if (p) { | ||
| // Post the alias message as a signal carrying {Ref, Message}. The owner validates the alias |
There was a problem hiding this comment.
I don't think this comment adds any value
There was a problem hiding this comment.
All these comments/narration will be removed while squashing, after review.
| * | ||
| * @details Posts an AliasMessageSignal carrying {Ref, Message} to the owner process. The owner | ||
| * validates the alias against its own monitors when it drains signals and either delivers Message | ||
| * as a normal message or drops it. This avoids touching the owner's monitor list from the sender. |
There was a problem hiding this comment.
Last sentence sounds obvious here.
| set_error(ctx, jit_state, 0, BADARG_ATOM); | ||
| return false; | ||
| } else { | ||
| // A reference that is not a local process reference (short/resource/external ref): the |
There was a problem hiding this comment.
A TODO would be welcome for distributed aliases.
Also I would put the term_is_reference first.
| } | ||
|
|
||
| int local_process_id; | ||
| if (term_is_local_pid_or_port(recipient_term)) { |
There was a problem hiding this comment.
Does this test still make sense?
|
AMP: PR Review — Process aliases for AtomVMRange reviewed: Diffstat: ~40 files, +1975 / −173. The series adds OTP process aliases: VerdictStrong, ship-quality work. The design is sound: aliases are validated in The findings below are mostly pre-existing OOM-robustness gaps that this Findings1. (Medium) OOM in
|
| ), | ||
| P ! quit, | ||
| ok = wait_registered(down_batch_relay, 5000000), | ||
| %% The huge bound absorbs valgrind's unfair thread scheduling, which can starve the relay |
Implement Erlang process aliases together with the reference-representation
rework they require, rebased onto release-0.7 and squashed into one commit.
Public API (libs/estdlib/src/erlang.erl):
- alias/0, unalias/1
- monitor/3 with {alias, explicit_unalias | demonitor | reply_demonitor}
- spawn_opt {monitor, [monitor_option()]}
Runtime:
- New process-reference term carrying ref_ticks + owner process id (RefData),
alongside short/local references; BEAM-compliant reference ordering and
binary_to_term/term_to_binary round-tripping (term.{c,h}, external_term.c).
- Monitors store RefData; alias monitors (CONTEXT_MONITOR_ALIAS) with the three
alias modes and their lifecycle (context.{c,h}).
- send/2, the send opcode and the JIT send path resolve an alias to its target
process and drop reply_demonitor aliases after first use (nifs.c,
opcodesswitch.h, jit.c).
- Reference size macros: REF_SIZE -> TERM_BOXED_REFERENCE_SHORT_SIZE, plus
TERM_BOXED_REFERENCE_PROCESS_SIZE and TERM_BOXED_REFERENCE_RESOURCE_SIZE.
Tests: alias coverage added to test_monitor, test_refs_ordering and
test_binary_to_term.
Signed-off-by: Davide Bettio <davide@uninstall.it>
Sending to a process reference resolved and deactivated the alias from the
sending process: it walked and mutated the target's monitor list
(context_find_alias / context_unalias) while holding only the processes_table
read lock. That lock only guards against the target being freed, not against
the target mutating its own monitor list (via alias/0, unalias/1, demonitor or
process teardown), so a send racing with a monitor change could corrupt the
list or free a node used by another scheduler -- a data race / use-after-free
on SMP.
Route alias sends through the mailbox instead, the same way monitor and
demonitor operations are already delivered across processes. The send paths now
post an AliasMessageSignal carrying {Ref, Message} via the shared helper
globalcontext_send_message_to_alias; the owner validates the alias against its
own monitor list when it drains signals (context_process_alias_message_signal)
and either delivers the message or drops it for an inactive alias. All
monitor-list access now happens in the owning process, and the three send paths
(erlang:send/2, the send opcode and the JIT send) share one helper.
Signed-off-by: Davide Bettio <davide@uninstall.it>
A monitor created with {alias, reply_demonitor} previously only deactivated the
alias on the first reply through it; the underlying monitor stayed active, so
the owner could still receive a stale 'DOWN'. Match OTP: on the first message
received through the alias, also remove the monitor and signal the monitored
process to drop its side, as erlang:demonitor/1 would.
Add a regression test asserting that, after a reply through a reply_demonitor
alias, no 'DOWN' is delivered when the monitored process later exits.
Signed-off-by: Davide Bettio <davide@uninstall.it>
Aliases require OTP 24, so the alias tests guarded their alias-specific cases behind a runtime OTP-version check (plus a make_ref-based fallback for older BEAM). AtomVM no longer supports OTP < 26, where aliases are always available, so those checks are dead code: run the alias cases unconditionally and drop the is_atomvm_or_otp_version_at_least/1 helper and the mock-alias fallback. Signed-off-by: Davide Bettio <davide@uninstall.it>
For {alias, demonitor} and {alias, reply_demonitor}, the alias is deactivated
when the monitor is removed, including the automatic removal when a 'DOWN' is
delivered (as erlang:demonitor/1 already does for an explicit demonitor).
context_process_monitor_down_signal removed the monitor but left the alias
active, so a third party could still reach the owner through the alias after
the monitored process had exited.
Deactivate the alias (unless it is explicit_unalias) in both the local and
registered-name 'DOWN' paths, and add a regression test.
Signed-off-by: Davide Bettio <davide@uninstall.it>
nif_erlang_monitor locks the target process, then raises badarg when the object type does not match the target (monitor(process, Port, ...) or vice versa). That error path returned without releasing the processes-table lock, unlike the three sibling out-of-memory error paths in the same function, leaking the lock and deadlocking the next writer on SMP builds. Unlock the target before raising. (A behavioural test would deadlock rather than fail; verified by inspection against the sibling error paths.) Signed-off-by: Davide Bettio <davide@uninstall.it>
Both nif_erlang_monitor and do_spawn published the monitor/alias state (queuing a MonitorSignal to the target and adding monitors to the contexts) and only then reserved heap for the returned reference. An out-of-memory at that last step raised, leaving the monitor half-installed: the target could later send a 'DOWN' for a reference the caller never received. Reserve the result space before publishing, freeing the monitor structs on failure. GC at the new point is safe -- the monitor structs hold only an immediate pid and a plain RefData, none reachable from the heap. Unwinding the separate spawn_opt link side effect on a later OOM is left out of scope. Signed-off-by: Davide Bettio <davide@uninstall.it>
An alias send is delivered via an AliasMessageSignal so the owner validates the alias in its own context (race-free). But mailbox_process_outer_list moved normal messages straight to the inner mailbox while signals were processed afterwards and re-posted, so `Alias ! m1, Pid ! m2` to the same receiver could deliver m2 before m1 -- violating the single-sender message ordering guarantee. Convert alias messages to normal messages in place during the outer-list reverse (new mailbox_process_outer_list_with_aliases), prepending them into the normal stream so they keep send order relative to plain messages from the same sender. context_process_alias_message_signal now returns the payload instead of re-posting it. Teardown and the port scheduler keep the plain mailbox_process_outer_list (which returns alias signals to drop): they must not run the reply_demonitor path, which takes the processes-table lock that context_destroy already holds. Add a regression test (alias send then pid send keep order). Signed-off-by: Davide Bettio <davide@uninstall.it>
term_compare sizes a process reference as 3 words (process_id + ref_ticks) in the local-vs-local branch, but the mixed local-vs-external branch still sized it as 2 words, dropping process_id. A process reference was therefore ordered differently depending on whether it was compared against a local or an external reference, which can make the term order inconsistent. (The reviewer's "two process refs with equal ref_ticks" case cannot actually occur: ref_ticks come from a global monotonic counter, so they are unique.) Size process references as 3 words in the mixed branch too and add the matching extraction, mirroring the local-vs-local logic. Exercising this path requires an external reference (distribution), so it is verified by inspection. Signed-off-by: Davide Bettio <davide@uninstall.it>
binary_to_term of a NEWER_REFERENCE_EXT with len == 3 turned the third word straight into a process reference id without bounding it. Note this was not a memory-safety issue as the reviewer suggested: the id flows into globalcontext_get_process_lock, which performs a linear search of the process table (no out-of-bounds indexing), so an out-of-range value simply matches no process. Still, validate it like a local pid (TERM_MAX_LOCAL_PROCESS_ID) to reject clearly malformed input early. Signed-off-by: Davide Bettio <davide@uninstall.it>
monitor(process, X, [{alias, ...}]) returned a plain short reference (no alias)
when X was self() or an already-dead process, so the returned reference could not
be used to send to the process or be passed to unalias/1. OTP returns a usable
alias in both cases.
Install the alias and return a process reference on both early-return paths. For
self the alias stays active (self never sends a DOWN). On the noproc path the
monitor is immediately removed by the noproc DOWN, so only an explicit_unalias
alias stays active (demonitor / reply_demonitor are deactivated right away, as at
a real DOWN). Add regression tests for both.
Signed-off-by: Davide Bettio <davide@uninstall.it>
A send to a reference that is not an active local process-reference alias (a plain/short ref, a resource ref, or an external ref) is silently dropped, which matches OTP for local references. Distributed aliases (external references) are not supported, so a message addressed to a remote alias is lost rather than routed over distribution. Add comments at the three send sites (erlang:send/2, the send opcode and the JIT send); no behaviour change. Signed-off-by: Davide Bettio <davide@uninstall.it>
On 32-bit, TERM_BOXED_REFERENCE_RESOURCE_SIZE is 5 (one more than header + refc + mso cons) so reference shapes stay size-distinguishable, but term_from_resource only wrote 4 words, leaving the trailing padding word uninitialized. It is not read by comparison or serialization, but it is copied during GC, which is an uninitialized-memory read (sanitizer / valgrind). Initialize it to nil. No effect on 64-bit, where the resource size is exactly 4 words. Signed-off-by: Davide Bettio <davide@uninstall.it>
…onitor The CONTEXT_MONITOR_RESOURCE case fell through to the following monitor cases when the ref_ticks did not match. It is benign today (the next cases only break), but the alias commit added CONTEXT_MONITOR_ALIAS to the same switch, so tighten the fall-through. No behaviour change. Signed-off-by: Davide Bettio <davide@uninstall.it>
globalcontext_init_process appends every new context to the scheduler waiting queue, but context_destroy never removed it. Destroying a process that was never scheduled -- as the spawn NIFs do on option errors, or port drivers on initialization failures -- left a dangling queue entry pointing into freed memory, corrupting the queue on the next scheduler operation. Pre-existing upstream; exposed by the spawn_opt atomicity test added in the next commit, the first to exercise such a path. Dequeue in context_destroy under the processes spinlock, and reset the queue item at the scheduler's own dequeue sites so the destroy-time dequeue is a no-op there. A concurrent re-queue cannot happen: scheduler_make_ready refuses Killed and Spawning processes under the same spinlock, and one of these flags is set on every destroy path. Signed-off-by: Davide Bettio <davide@uninstall.it>
do_spawn published the link before the monitor phase, which can still fail parsing the
monitor options or on out-of-memory, including the result-space reservation. The error
paths destroyed the new context but left the caller's link entry installed, so the
destroyed context's teardown sent the caller a spurious {'EXIT', Pid, normal} -- delivered
as a message when trapping exits -- and process_info(links) reported a link to a dead pid,
for a spawn_opt call that raised. OTP's spawn_opt is atomic: either it returns and the
link exists, or it raises without side effects.
Run every fallible step -- option parsing, link and monitor allocations, and the result
space reservation -- before publishing anything; publishing itself cannot fail. This
closes the unwind residual left out of scope by the result-space commit. Add a regression
test checking that spawn_opt(F, [link, {monitor, [Bad]}]) raises badarg without leaving a
link behind or delivering an 'EXIT' message.
Signed-off-by: Davide Bettio <davide@uninstall.it>
The BEAM conformance CI run (test-erlang -b) showed that on OTP,
monitor(process, self(), [{alias, _}]) is a no-op apart from returning a fresh reference:
no monitor is installed and no alias either -- sends to the returned reference are
dropped, and unalias/1 and demonitor(Ref, [info]) return false (verified on OTP 29).
This reverts the self half of "Return a real alias from monitor/3 on the self and noproc
paths", whose premise was wrong for self; the noproc half is conformant (its test passes
on the BEAM). Return a plain short reference and install nothing, matching the existing
monitor/2 self path, and update the test to assert the OTP behaviour.
Signed-off-by: Davide Bettio <davide@uninstall.it>
On the BEAM the test process is the erl_eval process, which is linked to init, so the
{links, []} assertion failed when running the tests with test-erlang -b. Snapshot the
links before the spawn_opt call and assert they are unchanged after; this still catches
a leaked link on AtomVM and is independent of the harness process's pre-existing links.
Signed-off-by: Davide Bettio <davide@uninstall.it>
The monitor_*_new constructors return a pointer to the struct Monitor embedded in their container struct, and the nifs.c error paths free()d that pointer directly. That is only correct because the monitor happens to be the first member of every container struct; recover the container with CONTAINER_OF instead, through a new monitor_destroy that switches on the monitor type, so a layout change cannot silently corrupt the heap. No functional change. Signed-off-by: Davide Bettio <davide@uninstall.it>
The monitor/3 and alias/1 @doc comments each closed their option-list paragraph with a one-line caveat about an OTP option AtomVM does not support: {tag, Term} for monitor/3, the OTP 28 priority option for alias/1. Tucked onto the end of the bullet list, both were easy to miss. Lift each caveat into its own paragraph, led by a bold "Note:" and reworded to open with "Unlike Erlang/OTP", so the divergence stands out when scanning the generated documentation. Comment-only change; the compiled module is unaffected. Signed-off-by: Davide Bettio <davide@uninstall.it>
The process-reference decode path rejects a process id that is INVALID_PROCESS_ID or above TERM_MAX_LOCAL_PROCESS_ID. The three-line comment above the check only restated it -- the constant names already say what is rejected and why. Remove it. Signed-off-by: Davide Bettio <davide@uninstall.it>
The ordering of alias references was exercised only indirectly, by test_refs_ordering, whose aliases all share one owner. Add a focused test_monitor case pinning the portable, OTP-conformant properties: a plain reference sorts before an alias whichever was created first; aliases of one owner order by creation; the owner pid is encoded, so a process reference round-trips through term_to_binary equal; and two owners' aliases are distinct and strictly ordered. The cross-owner direction follows the internal pid, not the pid term order (checked on OTP 29), so it is asserted only as a strict total order rather than pinned. The module also runs on the BEAM; the test passes there and on the interpreter and JIT flavors. Signed-off-by: Davide Bettio <davide@uninstall.it>
The two len == 3 branches in term_compare each carried a multi-line comment explaining that a local process reference is laid out [ticks_hi, ticks_lo, process_id] to match the wire/external word order. The layout is plain in the three assignments that follow, and each branch is already labelled by its else if (len == 3), like the commentless len == 2 sibling. Remove the narration. Signed-off-by: Davide Bettio <davide@uninstall.it>
test_encode_process_ref only asserted that binary_to_term(term_to_binary(R)) returns R, which a compiler is free to fold to R = R, testing nothing. Route the binary through ?MODULE:id/1 (already used elsewhere in this module) so the round-trip is opaque to the compiler and actually runs. Also match the serialized form against its known bytes: the version byte, the NEWER_REFERENCE_EXT tag, the nonode@nohost node atom and the zero creation. The id-word count differs between implementations (AtomVM 3 words, the BEAM 5), so the length word and the id bytes are left unmatched -- the pattern holds on both, where this module also runs. Signed-off-by: Davide Bettio <davide@uninstall.it>
Drop the "(OTP >= 26 semantics)" parenthetical; the entry now reads "Added support for process aliases: ...". Signed-off-by: Davide Bettio <davide@uninstall.it>
erlang:send/2, the send opcode and the JIT send dispatched on term_is_process_reference before the term_is_reference guard that rejects a non-reference recipient with badarg. Put the guard first, then the process-reference (active alias) branch, then the silent drop for any other reference. A process reference is always a reference (TERM_BOXED_REFERENCE_PROCESS_HEADER carries TERM_BOXED_REF in its low bits), so the three outcomes -- badarg, alias send, drop -- are unchanged. Also turn the note about the unsupported outbound distributed alias into a TODO at each of the three drop sites. Signed-off-by: Davide Bettio <davide@uninstall.it>
active_alias_count was a 16-bit field plus a 1-bit alias_count_saturated flag (17 bits) used only to skip the monitor-list walk for alias-free processes. That skip only distinguishes count == 0 from count > 0, so saturation needs no distinct value: shrink the count to 8 bits and let 0xFF double as the saturated sentinel. It pins at 0xFF once 255 aliases coexist and is never decremented again, so lookups keep walking the list rather than wrapping to 0 and silently deactivating every alias -- the same guarantee as before, in one fewer field and a little less code. The read gates (context_find_alias and the mailbox fast path) are unchanged; the increment loses its saturation branch and the two decrements guard on the 0xFF sentinel instead of the removed flag. Signed-off-by: Davide Bettio <davide@uninstall.it>
context_monitors_handle_terminate reserved TERM_BOXED_REFERENCE_PROCESS_SIZE for every monitor's DOWN, but only a CONTEXT_MONITOR_MONITORED_LOCAL_ALIAS monitor builds a process reference; a plain monitor builds a short reference, one word smaller on 64-bit and two on 32-bit. Plain monitors are the common case, so reserve by the flavor instead of always taking the larger size. Hoist the monitor_type test into a single is_monitored_alias bool that drives both the reservation size and the ref build, so the reserved size cannot drift from what is allocated -- the reservation now exactly matches the allocation on each branch. Signed-off-by: Davide Bettio <davide@uninstall.it>
…send In jit_send's atom branch the registered-name lookup result was re-tested with term_is_local_pid_or_port, raising badarg in the else. That else is unreachable: nif_erlang_register_2 is the only writer of the registry and validates term_is_local_pid_or_port before storing, so globalcontext_get_registered_process returns either a local pid/port or UNDEFINED_ATOM, and UNDEFINED_ATOM is already handled just above. Resolve straight to the process id and send, matching jit_send's own direct-pid branch and nif_erlang_send_2. Signed-off-by: Davide Bettio <davide@uninstall.it>
The priority option (OTP 28) was the only recognized-but-unimplemented OTP
option in the alias/monitor option parsing that raised badarg; the monitor/3
{tag, _} option raises unsupported, the release-0.7 convention for options
AtomVM recognizes but does not implement. Match it: recognize priority
explicitly and raise unsupported, so both report the same error. Genuinely
unknown options still fall through to badarg.
Add the priority atom and update the alias/1 @doc note accordingly.
Signed-off-by: Davide Bettio <davide@uninstall.it>
The alias/process-reference code added by this branch left a few error and not-taken branches without the UNLIKELY / IS_NULL_PTR hint that the surrounding code already applies to the same paths: the send paths guard a registered-name miss with UNLIKELY but not the adjacent non-reference badarg, and nif_erlang_unalias hints the alias lookup with IS_NULL_PTR but not the external-reference rejection just above it. Hint them the same way: the non-reference send target (the send/2 NIF, the send opcode and jit_send) and the external-reference unalias/1 with UNLIKELY before their badarg/false, the out-of-range owner pid in the process-reference decode with UNLIKELY, and the drop of a message to an inactive alias with IS_NULL_PTR. Branch-prediction hints only; no behavior change. Signed-off-by: Davide Bettio <davide@uninstall.it>
The process-aliases work added a number of comments that restate what the code or the header doxygen already says, repeat the same fact across files, or narrate platform internals. The per-function contracts are documented in context.h, globalcontext.h and mailbox.h, so the inline copies in context.c and globalcontext.c added nothing but a second place to drift out of sync. Drop the inline blocks that duplicate the header docs (the alias-signal contract, the send-to-alias rationale), the comment that restated the reference-size ternary right below it, the active_alias_count saturation note already carried by the field declaration, and the glibc/pthread EDEADLK narration on the processes-table lock. Tighten the still-useful comments in the mailbox drain, do_spawn, monitor/3 and the three send sites, keeping the non-obvious reasoning (received order, in-place re-typing, OOM atomicity) but not the restatement. Also remove the duplicated "reference types are distinguished by size" clause from the 32-bit process-reference size comment. Fix a misplaced comment in nif_erlang_dist_ctrl_put_data along the way: the "dist handle" label sat on roots[1] (argv[1], the decoded binary) instead of roots[0] (argv[0], the connection resource), where the other operations in the same function correctly put it. The code itself is unchanged. In the tests, drop narration that the test function name already conveys, and extract the eavmlib file select-handle alias-rejection check out of test_fifo_select into its own test_alias_select_handle_rejected/1. Comment and test-refactor only; no behavior change. Both flavors build and the alias modules plus the eavmlib and estdlib test packs pass. Signed-off-by: Davide Bettio <davide@uninstall.it>
The comments added by the alias work leaned on `--` em-dashes and a few long sentences joined by `;`, which reads as machine-generated and is out of step with the rest of the codebase. Rewrite them as plain, short sentences: the parenthetical and trailing `--` become periods, commas or parentheses, and the long `;` sentences are split in two. Touches the alias comments in context.c, dist_nifs.c, mailbox.c, nifs.c, opcodesswitch.h, otp_socket.c and resources.c, plus the test_monitor.erl docstrings. The pre-existing base `--` uses (the sub-binary doxygen in term.h, the timeout fall-through comments in opcodesswitch.h) are left alone. Comment-only; no behavior change. Both flavors build and the alias test modules pass. Signed-off-by: Davide Bettio <davide@uninstall.it>
The saturation value 0xFF appeared as a bare literal in the three active_alias_count guards, which is easy to misread. Give it a name, ACTIVE_ALIAS_COUNT_SATURATED, defined in context.c next to the other file-local constants, and carry the saturation rationale (why saturate instead of wrapping to 0) in the comment on the define. The field comment in context.h shrinks to the essentials: it is an optimization and it saturates at a sticky sentinel, with a pointer to the define. No behavior change. Both flavors build and test_monitor passes. Signed-off-by: Davide Bettio <davide@uninstall.it>
The arity check for DOP_ALIAS_SEND / DOP_ALIAS_SEND_TT inlined the expected value as a ternary inside the condition, on a long hard-to-scan line. Pull it out into a named expected_arity (size_t, matching arity so the comparison stays sign-clean) so the check reads plainly: DOP_ALIAS_SEND carries 3 control elements, DOP_ALIAS_SEND_TT 4 (the extra trace token). No behavior change. Both flavors build and test_net_kernel (the inbound DOP_ALIAS_SEND test) passes. Signed-off-by: Davide Bettio <davide@uninstall.it>
The FIXME spelled out the leak and the suggested fix over three lines and used the "pre-existing" review tag, which does not belong in a code comment. Reduce it to a one-line note of the actual defect: the void return hides the failure from the caller. Comment-only; no behavior change. Signed-off-by: Davide Bettio <davide@uninstall.it>
process_outer_list took both a Context and a Mailbox, with ctx == NULL used as an alias-blind mode flag. The two arguments had to agree: a caller could pass process_outer_list(ctx1, &ctx2->mailbox) and nothing would catch it. The fast path used only active_alias_count and global, never the whole Context. Split the worker into its two real operations. mailbox_process_outer_list now takes a Mailbox again and is free of any Context dependency: it is the alias-blind drain used by ports, teardown and the crashdump, which leave any alias signal in the signal list for the caller to drop. mailbox_process_outer_list_with_aliases takes the Context and derives its own mailbox, so the mismatched-pair case is gone. The shared mechanics, the CAS detach and the inner-list splice, move into two static inline helpers. The alias-active slow path still needs the whole Context because it mutates the alias table. The hot path stays a single pass and the helpers are inlined, so the generated code is equivalent. No behavior change; both JIT flavors pass the full suite and the full valgrind run is clean. Signed-off-by: Davide Bettio <davide@uninstall.it>
A few comments left by the alias work restated something the reader already has: a function's own doxygen, a neighbouring comment, or the size macro a few lines up. Drop the redundant part and keep whatever is genuinely non-obvious. - mailbox.h: mailbox_process_outer_list already returns "the signal messages", so it need not add that alias messages come back as signals. - mailbox.c: the alias-blind drain need not restate where an alias signal goes, only why the simple handling is safe (its callers never own an active alias). - nifs.c: the self-monitor branch already says no alias is installed, and a SHORT_SIZE reservation followed by term_from_ref_ticks is the usual pattern. - term.h: the process-reference pad-init comment keeps only the GC reason. The size rationale already lives on TERM_BOXED_REFERENCE_PROCESS_SIZE. - erlang.erl: send/2 already lists an alias as a Target, so the doc keeps only the non-obvious rule that a send to an inactive alias is dropped. - test_monitor.erl: the ordering test's header already lists every property, so two inline comments that repeated it are gone (those adding the tie-break and round-trip detail stay). Comment and documentation only; no behavior change. Signed-off-by: Davide Bettio <davide@uninstall.it>
The alias tests carried many comments that narrate what the next lines or the test function name already say (e.g. a header restating the function name, or a note describing the very send/receive below it). Keep comments minimal: only facts a reader cannot infer from the code stay. Kept across the suite: the timing and ordering rationales (why a fence makes an `after 0` race-free, why a selective receive leaves a 'DOWN' queued), the BEAM-vs-AtomVM divergences (stale 'EXIT' under trap_exit, the eval process linked to init, the group-leader setup, opaque reference words), the magic-value choices (the large spin bound, the refc-binary size that forces an mso sweep), and the monotonic-pid plus unique-ticks invariant the stale-alias test relies on. - test_net_kernel.erl: four narration blocks reduced to the one fence + stale 'EXIT' note. - test_monitor.erl: 17 comment blocks dropped, 15 tightened. Comment only; no behavior change. Signed-off-by: Davide Bettio <davide@uninstall.it>
Both select validation sites explained at length why a process alias is rejected as a select handle. That the alias is rejected, and that the select machinery only carries ref ticks, is clear from the surrounding code. Keep only the part a reader cannot infer here: that the BEAM accepts an alias where AtomVM does not. Comment only; no behavior change. Signed-off-by: Davide Bettio <davide@uninstall.it>
A leftover comment joined two clauses with a semicolon. The comment style asks for plain separate sentences instead. Comment only; no behavior change. Signed-off-by: Davide Bettio <davide@uninstall.it>
Both console gen_call comments narrated what the code already shows. The reply reservation comment re-derived the 2 * TUPLE_SIZE(2) size that the very next line reserves and compared it to the other reply shapes; keep only the reason the reservation exists at all (the reply builders below do not ensure_free). The io_reply comment led with "echo ReplyAs verbatim", which the term_put_tuple_element line shows; keep only the reason not to rebuild it from ref ticks (an alias would become a plain reference the requester would not match). Comment only; no behavior change. Signed-off-by: Davide Bettio <davide@uninstall.it>
The spawn, send and DOWN paths carried comments that narrated the adjacent code: re-listing the steps a block is about to run, restating the size the next line reserves, or re-describing a branch condition. Keep only what a reader cannot get from the code. - do_spawn: drop the step enumeration and the result-size restatement; keep the spurious-EXIT atomicity reason, the publish ordering, and the GC-safety note. - The three send paths (nif_erlang_send_2, jit_send, OP_SEND): drop the "not a local process reference" lead, keep the OTP silent-drop divergence and the outbound-unsupported note. - context.c: drop the "remove from the scheduler queue" narration; delete the alias-deactivation comment duplicated within the same function (it is stated at the first site); keep the capture-pid-before-demonitor ordering note. - context.h: trim the active_alias_count field comment to its two real facts. - dist_nifs.c: keep why a stale inbound ref is dropped rather than crashing the distribution connection. Comment only; no behavior change. Signed-off-by: Davide Bettio <davide@uninstall.it>
Signed-off-by: Davide Bettio <davide@uninstall.it>
| // Reverse the LIFO list into received order (oldest first), splitting it into a normal sublist | ||
| // and a signal sublist in one pass. This entry is alias-blind: its callers (ports, teardown and | ||
| // crashdump) never own an active alias. | ||
| MailboxMessage *normal_first = NULL; |
There was a problem hiding this comment.
I'm ok with the renaming, but keeping the old names would reduce the diff.
| TERM_DEBUG_ASSERT(ctx->active_alias_count > 0); | ||
| if (LIKELY(ctx->active_alias_count != ACTIVE_ALIAS_COUNT_SATURATED)) { | ||
| ctx->active_alias_count--; | ||
| } |
There was a problem hiding this comment.
I would put this after the list_remove and if the count is saturated but there is no monitor left (after the remove) I would reset it to 0, so the alias count gets a chance to be reset.
| CONTEXT_MONITOR_LINK_REMOTE, | ||
| CONTEXT_MONITOR_MONITORING_LOCAL_REGISTEREDNAME, | ||
| CONTEXT_MONITOR_ALIAS, | ||
| // Like CONTEXT_MONITOR_MONITORED_LOCAL, but the 'DOWN' carries a process reference (an alias). |
There was a problem hiding this comment.
I find this comment confusing
| * must only be used on monitors that were never passed to context_add_monitor, | ||
| * e.g. on error paths. | ||
| * | ||
| * @param monitor the monitor to free, or NULL in which case nothing is done |
There was a problem hiding this comment.
I checked few calls below, monitor was never NULL. Do we really want this semantic?
|
|
||
| void monitor_destroy(struct Monitor *monitor) | ||
| { | ||
| if (monitor == NULL) { |
There was a problem hiding this comment.
This sounds over defensive.
|
|
||
| term refcopy = term_from_ref_ticks(ref_ticks, &ctx->heap); | ||
|
|
||
| term reply = term_alloc_tuple(3, &ctx->heap); |
There was a problem hiding this comment.
This is a tuple 3 but we alloc for two tuple 2 above.
We had 12 and 2 * TUPLE(2) is 6 I believe, and we need at least 7 in this branch, don't we?
I wonder why we copied the ref and no longer need to. With the ref, the branch was 9. Is there another branch that required 12?
| mbox->inner_last = last; | ||
| } | ||
|
|
||
| MailboxMessage *mailbox_process_outer_list(Mailbox *mbox) |
There was a problem hiding this comment.
I am not convinced this works.
There are several calls of mailbox_process_outer_list that haven't been replaced with the _with_alias variant, so Alias is considered as a signal and order is lost. For example get_process_info it seems. This would break message ordering semantics, wouldn't it?
I'm not sure we should have the variant in the first place.
This revamps PR #2027
These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).
SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later