Skip to content

Updated aliases#2326

Open
bettio wants to merge 87 commits into
atomvm:release-0.7from
bettio:updated-aliases
Open

Updated aliases#2326
bettio wants to merge 87 commits into
atomvm:release-0.7from
bettio:updated-aliases

Conversation

@bettio

@bettio bettio commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

This revamps PR #2027

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@bettio bettio changed the base branch from main to release-0.7 June 2, 2026 12:10
@petermm

This comment was marked as outdated.

@petermm

This comment was marked as outdated.

Comment thread src/libAtomVM/nifs.c Fixed
Comment thread tests/test-enif.c Fixed
@petermm

This comment was marked as outdated.

@bettio bettio force-pushed the updated-aliases branch from ce79b10 to f6af023 Compare June 7, 2026 19:53
@petermm

This comment was marked as outdated.

@bettio bettio force-pushed the updated-aliases branch from 64e7fb6 to 73bb408 Compare June 11, 2026 13:43
Comment thread src/libAtomVM/context.c Outdated

term context_process_alias_message_signal(Context *ctx, struct TermSignal *signal)
{
// signal->signal_term is a 2-tuple {Ref, Message}. The alias lookup runs here, in the owner's own

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want comments that verbose? The drawback is that eventually they end up being unsync with code, and LLM or humans tend to read comments instead of code.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all, I'm leaving them for helping with review activity, but I will cleanup code from all that narration.

Comment thread src/libAtomVM/context.c Outdated
// Target cannot be NULL as we processed Demonitor signals
assert(target != NULL);
int required_terms = REF_SIZE + TUPLE_SIZE(5);
int required_terms = TERM_BOXED_REFERENCE_PROCESS_SIZE + TUPLE_SIZE(5);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This size could be conditional of monitor->monitor_type == CONTEXT_MONITOR_MONITORED_LOCAL_ALIAS, couldn't it?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread src/libAtomVM/context.h Outdated
// lookups are no longer skipped for this process, trading the optimization for
// correctness instead of wrapping to 0 and silently deactivating every alias.
unsigned int active_alias_count : 16;
unsigned int alias_count_saturated : 1;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this optimization is worth the 17 bits it costs. We could have only one bit and check if there is any alias left when removing an alias. Alternatively, we could reduce it to 8 bits and drop the optimization when the number of aliases reaches (forever) 255.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, using just 8 bit stored in the free bitfield area should be a reasonable compromise. So we don't have to scan everything every time an alias is removed.

Comment thread src/libAtomVM/globalcontext.c Outdated
{
Context *p = globalcontext_get_process_lock(glb, process_id);
if (p) {
// Post the alias message as a signal carrying {Ref, Message}. The owner validates the alias

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this comment adds any value

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these comments/narration will be removed while squashing, after review.

*
* @details Posts an AliasMessageSignal carrying {Ref, Message} to the owner process. The owner
* validates the alias against its own monitors when it drains signals and either delivers Message
* as a normal message or drops it. This avoids touching the owner's monitor list from the sender.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last sentence sounds obvious here.

Comment thread src/libAtomVM/jit.c Outdated
set_error(ctx, jit_state, 0, BADARG_ATOM);
return false;
} else {
// A reference that is not a local process reference (short/resource/external ref): the

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A TODO would be welcome for distributed aliases.
Also I would put the term_is_reference first.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread src/libAtomVM/jit.c Outdated
}

int local_process_id;
if (term_is_local_pid_or_port(recipient_term)) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test still make sense?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was deadcode.

@petermm

petermm commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

AMP:

PR Review — Process aliases for AtomVM

Range reviewed: f90e0ee9a..HEAD (62 commits), branched from
9402220d7 (Merge pull request #2331 … valgrind-spinlock-sched).

Diffstat: ~40 files, +1975 / −173. The series adds OTP process aliases:
erlang:alias/0,1, erlang:unalias/1, erlang:monitor/3 with the
{alias, Mode} option, spawn_opt {monitor, MonitorOpts}, sending to an
alias reference (local + inbound distribution), a new boxed process
reference
term, and per-process saturating alias counters.

Verdict

Strong, ship-quality work. The design is sound: aliases are validated in
the owner's own context against the owner's own monitor list (the same
single-owner discipline as alias/0/unalias/1/demonitor), so there is no
cross-process race on alias state. Error paths in do_spawn/monitor are
carefully "allocate-everything-then-publish", the 32-bit reference padding and
size-distinctness are guarded by _Static_assert, and the commit history shows
the tricky cases (same-batch ordering, saturation, wire ordering in
term_compare, REF_SIZE deprecation) were each addressed deliberately. Test
coverage is substantial (tests/erlang_tests/test_monitor.erl +672).

The findings below are mostly pre-existing OOM-robustness gaps that this
feature widens slightly, plus one portability note. None block merge; I'd fix
#1 in this series since the alias code adds to the half-published surface.


Findings

1. (Medium) OOM in mailbox_send_monitor_signal leaves a half-published monitor/alias

nifs.c:5187,
helper at mailbox.c:338.

mailbox_send_monitor_signal() returns void and silently drops the signal on
malloc failure (the FIXME at mailbox.c:340 acknowledges this). In
nif_erlang_monitor the call is the publish step for the target's half of the
monitor. If it fails:

  • other_monitor is leaked,
  • self_monitor and (new in this PR) alias_monitor are still added right
    after at nifs.c:5190-5192,
  • the caller still receives a reference,
  • the target never gets the monitor.

So the monitoring process believes it monitors / has an alias to a target that
knows nothing about it. The alias addition makes the inconsistency worse (a
live alias with no monitored half).

Suggested fix — make the helper report failure and clean up at the call site:

--- a/src/libAtomVM/mailbox.h
+++ b/src/libAtomVM/mailbox.h
-void mailbox_send_monitor_signal(Context *c, enum MessageType type, struct Monitor *monitor);
+bool mailbox_send_monitor_signal(Context *c, enum MessageType type, struct Monitor *monitor);
--- a/src/libAtomVM/mailbox.c
+++ b/src/libAtomVM/mailbox.c
-void mailbox_send_monitor_signal(Context *c, enum MessageType type, struct Monitor *monitor)
+bool mailbox_send_monitor_signal(Context *c, enum MessageType type, struct Monitor *monitor)
 {
     struct MonitorPointerSignal *monitor_signal = malloc(sizeof(struct MonitorPointerSignal));
     if (IS_NULL_PTR(monitor_signal)) {
-        // FIXME (pre-existing): this function returns void, so an out-of-memory here is silently
-        // dropped -- the monitor is leaked and the caller believes it was installed. It should return
-        // bool so the caller can free the monitor and raise out_of_memory.
         fprintf(stderr, "Failed to allocate memory: %s:%i.\n", __FILE__, __LINE__);
-        return;
+        return false;
     }
     ...
     mailbox_post_message(c, &monitor_signal->base);
+    return true;
 }
--- a/src/libAtomVM/nifs.c
+++ b/src/libAtomVM/nifs.c
-    mailbox_send_monitor_signal(target, MonitorSignal, other_monitor);
-    globalcontext_get_process_unlock(ctx->global, target);
+    if (UNLIKELY(!mailbox_send_monitor_signal(target, MonitorSignal, other_monitor))) {
+        monitor_destroy(alias_monitor);
+        monitor_destroy(self_monitor);
+        monitor_destroy(other_monitor);
+        globalcontext_get_process_unlock(ctx->global, target);
+        RAISE_ERROR(OUT_OF_MEMORY_ATOM);
+    }
+    globalcontext_get_process_unlock(ctx->global, target);

The other callers (nifs.c:5275 link, dist_nifs.c, resources.c) then need
to either check the return or explicitly ignore it with cleanup. If a full
audit is out of scope for this PR, at minimum guard the new alias-bearing path
above so the alias is not left dangling.

2. (Low) Same OOM class on mailbox_send / mailbox_send_ref_signal

  • noproc monitor branch: nifs.c:5133 installs the
    explicit_unalias alias before mailbox_send(ctx, down_message_tuple); if
    that send OOMs the DOWN is lost but the alias remains (tolerable, since an
    explicit-unalias alias lives until unalias/1).
  • reply_demonitor: context.c:527
    mailbox_send_ref_signal(target, DemonitorSignal, …) can drop the remote
    demonitor on OOM, leaving the monitored side installed until target exit.

Same root cause as #1 (void mailbox-send helpers). Worth a tracking note rather
than a blocker.

3. (Low, pre-existing) Registered-name DOWN writes a temp-heap tuple into the retained signal

context.c:470-476:

BEGIN_WITH_STACK_HEAP(TUPLE_SIZE(2), temp_heap)
term name_tuple = term_alloc_tuple(2, &temp_heap);
...
term_put_tuple_element(signal->signal_term, 3, name_tuple);  // temp-heap term into the signal
mailbox_send(ctx, signal->signal_term);
END_WITH_STACK_HEAP(temp_heap, ctx->global);

mailbox_send copies the term out before the temp heap is torn down, so the
delivered message is fine. But signal->signal_term now holds a pointer into
freed stack-heap storage, and the original signal fragment is later appended to
ctx->heap by mailbox_message_dispose, where a future GC could scan the
dangling slot.

This code is unchanged by this PR (it predates f90e0ee9a; the series only
renamed ref_data.ref_ticksref_ticks here), so it is out of scope — noted
for a follow-up. Fix is to build a fresh 5-tuple on the temp heap and send that
instead of mutating signal->signal_term.

4. (Low / portability) In-place re-type of TermSignalMessage

mailbox.c:458-459
(and the CONTAINER_OF(current, Message, base) disposal casts at lines 403,
472) re-type an AliasMessageSignal in place. The _Static_asserts above the
function prove identical layout, so this works on every current toolchain, but
identical layout does not by itself satisfy C effective-type / strict-aliasing
rules. Low risk; the clean long-term fix is to unify Message and TermSignal
into one heap-backed mailbox term with a union { term message; term signal_term; }. Not needed for merge.


Things verified as correct (not bugs)

  • Unlocked active_alias_count read in process_outer_list
    (mailbox.c:388):
    owner-written only, runs in owner context, stable for the batch. Fast path
    (no alias) preserves the original single-pass reversal; alias path reverses
    to received order first. Sound.
  • Same-batch ordering: pre-deactivating an alias on a MonitorDownSignal
    seen earlier in the batch (mailbox.c:483)
    so a later same-batch alias send is dropped; idempotent with the later
    context_process_monitor_down_signal. Matches OTP.
  • Saturating counter (context.h): pins at 0xFFFF and stops decrementing,
    so context_find_alias never wrongly short-circuits — correctness over the
    optimization. Sound.
  • Process-reference sizing (term.h): 64-bit = SHORT+1, 32-bit = SHORT+2
    with the trailing pad word initialized to nil; all reference kinds stay
    size-distinguishable, guarded by _Static_assert. Sound.
  • reply_demonitor lock discipline (context.c): monitor_pid captured as
    an immediate before context_demonitor frees the alias; target taken under
    the process-table lock; context_destroy deliberately uses the alias-blind
    drain to avoid taking that lock while holding the write lock. No UAF/deadlock.
  • Inbound distributed alias send (dist_nifs.c, external_term.c): control
    tuple kept rooted across payload-decode GC; decode rejects process_id == 0
    (the INVALID_PROCESS_ID short-ref sentinel) and > TERM_MAX_LOCAL_PROCESS_ID;
    delivery only for node+creation match.
  • term_compare wire ordering: local process refs laid out as
    [ticks_hi, ticks_lo, process_id] to agree with the serialized form and the
    external-ref path; the mixed local/external len==3 branch never has both
    operands local (guarded by the outer is_external check), so the shared
    local_data scratch buffer is not a hazard.
  • do_spawn: all parsing/allocation/result reservation before any publish,
    monitor_destroy on every error path, monitor_destroy(NULL) no-op. No leak
    or half-published state found.
  • REF_SIZE deprecation: no remaining REF_SIZE uses in the tree, so the
    _Pragma("GCC warning …") will not fire (safe even under -Werror).
    CHANGELOG documents both the feature and the deprecation.

Suggested follow-up tests

  • Same-batch: alias-send-before-DOWN (delivered) vs DOWN-before-alias-send
    (dropped); multiple sends to one reply/reply_demonitor alias in a batch
    (first delivered, rest dropped).
  • OOM fault injection on mailbox_send_monitor_signal at nifs.c:5187 (if a
    malloc hook is available) to lock in the bifs_hash.h not found #1 fix.

Comment thread tests/erlang_tests/test_monitor.erl Outdated
),
P ! quit,
ok = wait_registered(down_batch_relay, 5000000),
%% The huge bound absorbs valgrind's unfair thread scheduling, which can starve the relay

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably is fixed with #2331

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked this branch history (787f573) and #2311 is already there, so I think the issue has been fixed before. Also the commits have been done before, so quite likely this issue hasn't been fixed by #2331.

mat-hek and others added 19 commits June 17, 2026 15:25
Implement Erlang process aliases together with the reference-representation
rework they require, rebased onto release-0.7 and squashed into one commit.

Public API (libs/estdlib/src/erlang.erl):
- alias/0, unalias/1
- monitor/3 with {alias, explicit_unalias | demonitor | reply_demonitor}
- spawn_opt {monitor, [monitor_option()]}

Runtime:
- New process-reference term carrying ref_ticks + owner process id (RefData),
  alongside short/local references; BEAM-compliant reference ordering and
  binary_to_term/term_to_binary round-tripping (term.{c,h}, external_term.c).
- Monitors store RefData; alias monitors (CONTEXT_MONITOR_ALIAS) with the three
  alias modes and their lifecycle (context.{c,h}).
- send/2, the send opcode and the JIT send path resolve an alias to its target
  process and drop reply_demonitor aliases after first use (nifs.c,
  opcodesswitch.h, jit.c).
- Reference size macros: REF_SIZE -> TERM_BOXED_REFERENCE_SHORT_SIZE, plus
  TERM_BOXED_REFERENCE_PROCESS_SIZE and TERM_BOXED_REFERENCE_RESOURCE_SIZE.

Tests: alias coverage added to test_monitor, test_refs_ordering and
test_binary_to_term.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Sending to a process reference resolved and deactivated the alias from the
sending process: it walked and mutated the target's monitor list
(context_find_alias / context_unalias) while holding only the processes_table
read lock. That lock only guards against the target being freed, not against
the target mutating its own monitor list (via alias/0, unalias/1, demonitor or
process teardown), so a send racing with a monitor change could corrupt the
list or free a node used by another scheduler -- a data race / use-after-free
on SMP.

Route alias sends through the mailbox instead, the same way monitor and
demonitor operations are already delivered across processes. The send paths now
post an AliasMessageSignal carrying {Ref, Message} via the shared helper
globalcontext_send_message_to_alias; the owner validates the alias against its
own monitor list when it drains signals (context_process_alias_message_signal)
and either delivers the message or drops it for an inactive alias. All
monitor-list access now happens in the owning process, and the three send paths
(erlang:send/2, the send opcode and the JIT send) share one helper.

Signed-off-by: Davide Bettio <davide@uninstall.it>
A monitor created with {alias, reply_demonitor} previously only deactivated the
alias on the first reply through it; the underlying monitor stayed active, so
the owner could still receive a stale 'DOWN'. Match OTP: on the first message
received through the alias, also remove the monitor and signal the monitored
process to drop its side, as erlang:demonitor/1 would.

Add a regression test asserting that, after a reply through a reply_demonitor
alias, no 'DOWN' is delivered when the monitored process later exits.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Aliases require OTP 24, so the alias tests guarded their alias-specific cases
behind a runtime OTP-version check (plus a make_ref-based fallback for older
BEAM). AtomVM no longer supports OTP < 26, where aliases are always available,
so those checks are dead code: run the alias cases unconditionally and drop the
is_atomvm_or_otp_version_at_least/1 helper and the mock-alias fallback.

Signed-off-by: Davide Bettio <davide@uninstall.it>
For {alias, demonitor} and {alias, reply_demonitor}, the alias is deactivated
when the monitor is removed, including the automatic removal when a 'DOWN' is
delivered (as erlang:demonitor/1 already does for an explicit demonitor).
context_process_monitor_down_signal removed the monitor but left the alias
active, so a third party could still reach the owner through the alias after
the monitored process had exited.

Deactivate the alias (unless it is explicit_unalias) in both the local and
registered-name 'DOWN' paths, and add a regression test.

Signed-off-by: Davide Bettio <davide@uninstall.it>
nif_erlang_monitor locks the target process, then raises badarg when the object
type does not match the target (monitor(process, Port, ...) or vice versa). That
error path returned without releasing the processes-table lock, unlike the three
sibling out-of-memory error paths in the same function, leaking the lock and
deadlocking the next writer on SMP builds.

Unlock the target before raising. (A behavioural test would deadlock rather than
fail; verified by inspection against the sibling error paths.)

Signed-off-by: Davide Bettio <davide@uninstall.it>
Both nif_erlang_monitor and do_spawn published the monitor/alias state (queuing a
MonitorSignal to the target and adding monitors to the contexts) and only then
reserved heap for the returned reference. An out-of-memory at that last step
raised, leaving the monitor half-installed: the target could later send a 'DOWN'
for a reference the caller never received.

Reserve the result space before publishing, freeing the monitor structs on
failure. GC at the new point is safe -- the monitor structs hold only an
immediate pid and a plain RefData, none reachable from the heap. Unwinding the
separate spawn_opt link side effect on a later OOM is left out of scope.

Signed-off-by: Davide Bettio <davide@uninstall.it>
An alias send is delivered via an AliasMessageSignal so the owner validates the
alias in its own context (race-free). But mailbox_process_outer_list moved normal
messages straight to the inner mailbox while signals were processed afterwards and
re-posted, so `Alias ! m1, Pid ! m2` to the same receiver could deliver m2 before
m1 -- violating the single-sender message ordering guarantee.

Convert alias messages to normal messages in place during the outer-list reverse
(new mailbox_process_outer_list_with_aliases), prepending them into the normal
stream so they keep send order relative to plain messages from the same sender.
context_process_alias_message_signal now returns the payload instead of re-posting
it. Teardown and the port scheduler keep the plain mailbox_process_outer_list
(which returns alias signals to drop): they must not run the reply_demonitor path,
which takes the processes-table lock that context_destroy already holds.

Add a regression test (alias send then pid send keep order).

Signed-off-by: Davide Bettio <davide@uninstall.it>
term_compare sizes a process reference as 3 words (process_id + ref_ticks) in the
local-vs-local branch, but the mixed local-vs-external branch still sized it as 2
words, dropping process_id. A process reference was therefore ordered differently
depending on whether it was compared against a local or an external reference,
which can make the term order inconsistent. (The reviewer's "two process refs with
equal ref_ticks" case cannot actually occur: ref_ticks come from a global
monotonic counter, so they are unique.)

Size process references as 3 words in the mixed branch too and add the matching
extraction, mirroring the local-vs-local logic. Exercising this path requires an
external reference (distribution), so it is verified by inspection.

Signed-off-by: Davide Bettio <davide@uninstall.it>
binary_to_term of a NEWER_REFERENCE_EXT with len == 3 turned the third word
straight into a process reference id without bounding it. Note this was not a
memory-safety issue as the reviewer suggested: the id flows into
globalcontext_get_process_lock, which performs a linear search of the process
table (no out-of-bounds indexing), so an out-of-range value simply matches no
process. Still, validate it like a local pid (TERM_MAX_LOCAL_PROCESS_ID) to
reject clearly malformed input early.

Signed-off-by: Davide Bettio <davide@uninstall.it>
monitor(process, X, [{alias, ...}]) returned a plain short reference (no alias)
when X was self() or an already-dead process, so the returned reference could not
be used to send to the process or be passed to unalias/1. OTP returns a usable
alias in both cases.

Install the alias and return a process reference on both early-return paths. For
self the alias stays active (self never sends a DOWN). On the noproc path the
monitor is immediately removed by the noproc DOWN, so only an explicit_unalias
alias stays active (demonitor / reply_demonitor are deactivated right away, as at
a real DOWN). Add regression tests for both.

Signed-off-by: Davide Bettio <davide@uninstall.it>
A send to a reference that is not an active local process-reference alias (a
plain/short ref, a resource ref, or an external ref) is silently dropped, which
matches OTP for local references. Distributed aliases (external references) are
not supported, so a message addressed to a remote alias is lost rather than
routed over distribution. Add comments at the three send sites (erlang:send/2,
the send opcode and the JIT send); no behaviour change.

Signed-off-by: Davide Bettio <davide@uninstall.it>
On 32-bit, TERM_BOXED_REFERENCE_RESOURCE_SIZE is 5 (one more than header + refc +
mso cons) so reference shapes stay size-distinguishable, but term_from_resource
only wrote 4 words, leaving the trailing padding word uninitialized. It is not
read by comparison or serialization, but it is copied during GC, which is an
uninitialized-memory read (sanitizer / valgrind). Initialize it to nil. No effect
on 64-bit, where the resource size is exactly 4 words.

Signed-off-by: Davide Bettio <davide@uninstall.it>
…onitor

The CONTEXT_MONITOR_RESOURCE case fell through to the following monitor cases
when the ref_ticks did not match. It is benign today (the next cases only break),
but the alias commit added CONTEXT_MONITOR_ALIAS to the same switch, so tighten
the fall-through. No behaviour change.

Signed-off-by: Davide Bettio <davide@uninstall.it>
globalcontext_init_process appends every new context to the scheduler waiting queue, but
context_destroy never removed it. Destroying a process that was never scheduled -- as the
spawn NIFs do on option errors, or port drivers on initialization failures -- left a
dangling queue entry pointing into freed memory, corrupting the queue on the next
scheduler operation. Pre-existing upstream; exposed by the spawn_opt atomicity test added
in the next commit, the first to exercise such a path.

Dequeue in context_destroy under the processes spinlock, and reset the queue item at the
scheduler's own dequeue sites so the destroy-time dequeue is a no-op there. A concurrent
re-queue cannot happen: scheduler_make_ready refuses Killed and Spawning processes under
the same spinlock, and one of these flags is set on every destroy path.

Signed-off-by: Davide Bettio <davide@uninstall.it>
do_spawn published the link before the monitor phase, which can still fail parsing the
monitor options or on out-of-memory, including the result-space reservation. The error
paths destroyed the new context but left the caller's link entry installed, so the
destroyed context's teardown sent the caller a spurious {'EXIT', Pid, normal} -- delivered
as a message when trapping exits -- and process_info(links) reported a link to a dead pid,
for a spawn_opt call that raised. OTP's spawn_opt is atomic: either it returns and the
link exists, or it raises without side effects.

Run every fallible step -- option parsing, link and monitor allocations, and the result
space reservation -- before publishing anything; publishing itself cannot fail. This
closes the unwind residual left out of scope by the result-space commit. Add a regression
test checking that spawn_opt(F, [link, {monitor, [Bad]}]) raises badarg without leaving a
link behind or delivering an 'EXIT' message.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The BEAM conformance CI run (test-erlang -b) showed that on OTP,
monitor(process, self(), [{alias, _}]) is a no-op apart from returning a fresh reference:
no monitor is installed and no alias either -- sends to the returned reference are
dropped, and unalias/1 and demonitor(Ref, [info]) return false (verified on OTP 29).

This reverts the self half of "Return a real alias from monitor/3 on the self and noproc
paths", whose premise was wrong for self; the noproc half is conformant (its test passes
on the BEAM). Return a plain short reference and install nothing, matching the existing
monitor/2 self path, and update the test to assert the OTP behaviour.

Signed-off-by: Davide Bettio <davide@uninstall.it>
On the BEAM the test process is the erl_eval process, which is linked to init, so the
{links, []} assertion failed when running the tests with test-erlang -b. Snapshot the
links before the spawn_opt call and assert they are unchanged after; this still catches
a leaked link on AtomVM and is independent of the harness process's pre-existing links.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The monitor_*_new constructors return a pointer to the struct Monitor
embedded in their container struct, and the nifs.c error paths free()d
that pointer directly. That is only correct because the monitor happens
to be the first member of every container struct; recover the container
with CONTAINER_OF instead, through a new monitor_destroy that switches
on the monitor type, so a layout change cannot silently corrupt the
heap. No functional change.

Signed-off-by: Davide Bettio <davide@uninstall.it>
bettio added 25 commits June 17, 2026 15:25
The monitor/3 and alias/1 @doc comments each closed their option-list
paragraph with a one-line caveat about an OTP option AtomVM does not
support: {tag, Term} for monitor/3, the OTP 28 priority option for
alias/1. Tucked onto the end of the bullet list, both were easy to miss.

Lift each caveat into its own paragraph, led by a bold "Note:" and
reworded to open with "Unlike Erlang/OTP", so the divergence stands out
when scanning the generated documentation. Comment-only change; the
compiled module is unaffected.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The process-reference decode path rejects a process id that is
INVALID_PROCESS_ID or above TERM_MAX_LOCAL_PROCESS_ID. The three-line
comment above the check only restated it -- the constant names already
say what is rejected and why. Remove it.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The ordering of alias references was exercised only indirectly, by
test_refs_ordering, whose aliases all share one owner. Add a focused
test_monitor case pinning the portable, OTP-conformant properties: a plain
reference sorts before an alias whichever was created first; aliases of one
owner order by creation; the owner pid is encoded, so a process reference
round-trips through term_to_binary equal; and two owners' aliases are
distinct and strictly ordered.

The cross-owner direction follows the internal pid, not the pid term order
(checked on OTP 29), so it is asserted only as a strict total order rather
than pinned. The module also runs on the BEAM; the test passes there and on
the interpreter and JIT flavors.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The two len == 3 branches in term_compare each carried a multi-line comment
explaining that a local process reference is laid out [ticks_hi, ticks_lo,
process_id] to match the wire/external word order. The layout is plain in the
three assignments that follow, and each branch is already labelled by its
else if (len == 3), like the commentless len == 2 sibling. Remove the
narration.

Signed-off-by: Davide Bettio <davide@uninstall.it>
test_encode_process_ref only asserted that binary_to_term(term_to_binary(R))
returns R, which a compiler is free to fold to R = R, testing nothing. Route
the binary through ?MODULE:id/1 (already used elsewhere in this module) so the
round-trip is opaque to the compiler and actually runs.

Also match the serialized form against its known bytes: the version byte, the
NEWER_REFERENCE_EXT tag, the nonode@nohost node atom and the zero creation. The
id-word count differs between implementations (AtomVM 3 words, the BEAM 5), so
the length word and the id bytes are left unmatched -- the pattern holds on
both, where this module also runs.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Drop the "(OTP >= 26 semantics)" parenthetical; the entry now reads "Added
support for process aliases: ...".

Signed-off-by: Davide Bettio <davide@uninstall.it>
erlang:send/2, the send opcode and the JIT send dispatched on
term_is_process_reference before the term_is_reference guard that rejects a
non-reference recipient with badarg. Put the guard first, then the
process-reference (active alias) branch, then the silent drop for any other
reference. A process reference is always a reference
(TERM_BOXED_REFERENCE_PROCESS_HEADER carries TERM_BOXED_REF in its low bits),
so the three outcomes -- badarg, alias send, drop -- are unchanged.

Also turn the note about the unsupported outbound distributed alias into a
TODO at each of the three drop sites.

Signed-off-by: Davide Bettio <davide@uninstall.it>
active_alias_count was a 16-bit field plus a 1-bit alias_count_saturated flag
(17 bits) used only to skip the monitor-list walk for alias-free processes.
That skip only distinguishes count == 0 from count > 0, so saturation needs no
distinct value: shrink the count to 8 bits and let 0xFF double as the saturated
sentinel. It pins at 0xFF once 255 aliases coexist and is never decremented
again, so lookups keep walking the list rather than wrapping to 0 and silently
deactivating every alias -- the same guarantee as before, in one fewer field
and a little less code.

The read gates (context_find_alias and the mailbox fast path) are unchanged;
the increment loses its saturation branch and the two decrements guard on the
0xFF sentinel instead of the removed flag.

Signed-off-by: Davide Bettio <davide@uninstall.it>
context_monitors_handle_terminate reserved TERM_BOXED_REFERENCE_PROCESS_SIZE for
every monitor's DOWN, but only a CONTEXT_MONITOR_MONITORED_LOCAL_ALIAS monitor
builds a process reference; a plain monitor builds a short reference, one word
smaller on 64-bit and two on 32-bit. Plain monitors are the common case, so
reserve by the flavor instead of always taking the larger size.

Hoist the monitor_type test into a single is_monitored_alias bool that drives
both the reservation size and the ref build, so the reserved size cannot drift
from what is allocated -- the reservation now exactly matches the allocation on
each branch.

Signed-off-by: Davide Bettio <davide@uninstall.it>
…send

In jit_send's atom branch the registered-name lookup result was re-tested with
term_is_local_pid_or_port, raising badarg in the else. That else is unreachable:
nif_erlang_register_2 is the only writer of the registry and validates
term_is_local_pid_or_port before storing, so globalcontext_get_registered_process
returns either a local pid/port or UNDEFINED_ATOM, and UNDEFINED_ATOM is already
handled just above. Resolve straight to the process id and send, matching
jit_send's own direct-pid branch and nif_erlang_send_2.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The priority option (OTP 28) was the only recognized-but-unimplemented OTP
option in the alias/monitor option parsing that raised badarg; the monitor/3
{tag, _} option raises unsupported, the release-0.7 convention for options
AtomVM recognizes but does not implement. Match it: recognize priority
explicitly and raise unsupported, so both report the same error. Genuinely
unknown options still fall through to badarg.

Add the priority atom and update the alias/1 @doc note accordingly.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The alias/process-reference code added by this branch left a few error and
not-taken branches without the UNLIKELY / IS_NULL_PTR hint that the
surrounding code already applies to the same paths: the send paths guard a
registered-name miss with UNLIKELY but not the adjacent non-reference
badarg, and nif_erlang_unalias hints the alias lookup with IS_NULL_PTR but
not the external-reference rejection just above it.

Hint them the same way: the non-reference send target (the send/2 NIF, the
send opcode and jit_send) and the external-reference unalias/1 with
UNLIKELY before their badarg/false, the out-of-range owner pid in the
process-reference decode with UNLIKELY, and the drop of a message to an
inactive alias with IS_NULL_PTR. Branch-prediction hints only; no behavior
change.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The process-aliases work added a number of comments that restate what the
code or the header doxygen already says, repeat the same fact across files,
or narrate platform internals. The per-function contracts are documented in
context.h, globalcontext.h and mailbox.h, so the inline copies in context.c
and globalcontext.c added nothing but a second place to drift out of sync.

Drop the inline blocks that duplicate the header docs (the alias-signal
contract, the send-to-alias rationale), the comment that restated the
reference-size ternary right below it, the active_alias_count saturation note
already carried by the field declaration, and the glibc/pthread EDEADLK
narration on the processes-table lock. Tighten the still-useful comments in
the mailbox drain, do_spawn, monitor/3 and the three send sites, keeping the
non-obvious reasoning (received order, in-place re-typing, OOM atomicity) but
not the restatement. Also remove the duplicated "reference types are
distinguished by size" clause from the 32-bit process-reference size comment.

Fix a misplaced comment in nif_erlang_dist_ctrl_put_data along the way: the
"dist handle" label sat on roots[1] (argv[1], the decoded binary) instead of
roots[0] (argv[0], the connection resource), where the other operations in the
same function correctly put it. The code itself is unchanged.

In the tests, drop narration that the test function name already conveys, and
extract the eavmlib file select-handle alias-rejection check out of
test_fifo_select into its own test_alias_select_handle_rejected/1.

Comment and test-refactor only; no behavior change. Both flavors build and the
alias modules plus the eavmlib and estdlib test packs pass.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The comments added by the alias work leaned on `--` em-dashes and a few long
sentences joined by `;`, which reads as machine-generated and is out of step
with the rest of the codebase. Rewrite them as plain, short sentences: the
parenthetical and trailing `--` become periods, commas or parentheses, and the
long `;` sentences are split in two.

Touches the alias comments in context.c, dist_nifs.c, mailbox.c, nifs.c,
opcodesswitch.h, otp_socket.c and resources.c, plus the test_monitor.erl
docstrings. The pre-existing base `--` uses (the sub-binary doxygen in term.h,
the timeout fall-through comments in opcodesswitch.h) are left alone.

Comment-only; no behavior change. Both flavors build and the alias test
modules pass.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The saturation value 0xFF appeared as a bare literal in the three
active_alias_count guards, which is easy to misread. Give it a name,
ACTIVE_ALIAS_COUNT_SATURATED, defined in context.c next to the other file-local
constants, and carry the saturation rationale (why saturate instead of wrapping
to 0) in the comment on the define.

The field comment in context.h shrinks to the essentials: it is an optimization
and it saturates at a sticky sentinel, with a pointer to the define.

No behavior change. Both flavors build and test_monitor passes.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The arity check for DOP_ALIAS_SEND / DOP_ALIAS_SEND_TT inlined the expected
value as a ternary inside the condition, on a long hard-to-scan line. Pull it
out into a named expected_arity (size_t, matching arity so the comparison stays
sign-clean) so the check reads plainly: DOP_ALIAS_SEND carries 3 control
elements, DOP_ALIAS_SEND_TT 4 (the extra trace token).

No behavior change. Both flavors build and test_net_kernel (the inbound
DOP_ALIAS_SEND test) passes.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The FIXME spelled out the leak and the suggested fix over three lines and used
the "pre-existing" review tag, which does not belong in a code comment. Reduce
it to a one-line note of the actual defect: the void return hides the failure
from the caller.

Comment-only; no behavior change.

Signed-off-by: Davide Bettio <davide@uninstall.it>
process_outer_list took both a Context and a Mailbox, with ctx == NULL used as
an alias-blind mode flag. The two arguments had to agree: a caller could pass
process_outer_list(ctx1, &ctx2->mailbox) and nothing would catch it. The fast
path used only active_alias_count and global, never the whole Context.

Split the worker into its two real operations. mailbox_process_outer_list now
takes a Mailbox again and is free of any Context dependency: it is the
alias-blind drain used by ports, teardown and the crashdump, which leave any
alias signal in the signal list for the caller to drop.
mailbox_process_outer_list_with_aliases takes the Context and derives its own
mailbox, so the mismatched-pair case is gone. The shared mechanics, the CAS
detach and the inner-list splice, move into two static inline helpers. The
alias-active slow path still needs the whole Context because it mutates the
alias table.

The hot path stays a single pass and the helpers are inlined, so the generated
code is equivalent. No behavior change; both JIT flavors pass the full suite and
the full valgrind run is clean.

Signed-off-by: Davide Bettio <davide@uninstall.it>
A few comments left by the alias work restated something the reader already
has: a function's own doxygen, a neighbouring comment, or the size macro a few
lines up. Drop the redundant part and keep whatever is genuinely non-obvious.

- mailbox.h: mailbox_process_outer_list already returns "the signal messages",
  so it need not add that alias messages come back as signals.
- mailbox.c: the alias-blind drain need not restate where an alias signal goes,
  only why the simple handling is safe (its callers never own an active alias).
- nifs.c: the self-monitor branch already says no alias is installed, and a
  SHORT_SIZE reservation followed by term_from_ref_ticks is the usual pattern.
- term.h: the process-reference pad-init comment keeps only the GC reason. The
  size rationale already lives on TERM_BOXED_REFERENCE_PROCESS_SIZE.
- erlang.erl: send/2 already lists an alias as a Target, so the doc keeps only
  the non-obvious rule that a send to an inactive alias is dropped.
- test_monitor.erl: the ordering test's header already lists every property, so
  two inline comments that repeated it are gone (those adding the tie-break and
  round-trip detail stay).

Comment and documentation only; no behavior change.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The alias tests carried many comments that narrate what the next lines or the
test function name already say (e.g. a header restating the function name, or a
note describing the very send/receive below it). Keep comments minimal: only
facts a reader cannot infer from the code stay.

Kept across the suite: the timing and ordering rationales (why a fence makes an
`after 0` race-free, why a selective receive leaves a 'DOWN' queued), the
BEAM-vs-AtomVM divergences (stale 'EXIT' under trap_exit, the eval process
linked to init, the group-leader setup, opaque reference words), the magic-value
choices (the large spin bound, the refc-binary size that forces an mso sweep),
and the monotonic-pid plus unique-ticks invariant the stale-alias test relies on.

- test_net_kernel.erl: four narration blocks reduced to the one fence + stale
  'EXIT' note.
- test_monitor.erl: 17 comment blocks dropped, 15 tightened.

Comment only; no behavior change.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Both select validation sites explained at length why a process alias is rejected
as a select handle. That the alias is rejected, and that the select machinery
only carries ref ticks, is clear from the surrounding code. Keep only the part a
reader cannot infer here: that the BEAM accepts an alias where AtomVM does not.

Comment only; no behavior change.

Signed-off-by: Davide Bettio <davide@uninstall.it>
A leftover comment joined two clauses with a semicolon. The comment style asks
for plain separate sentences instead.

Comment only; no behavior change.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Both console gen_call comments narrated what the code already shows. The reply
reservation comment re-derived the 2 * TUPLE_SIZE(2) size that the very next line
reserves and compared it to the other reply shapes; keep only the reason the
reservation exists at all (the reply builders below do not ensure_free). The
io_reply comment led with "echo ReplyAs verbatim", which the term_put_tuple_element
line shows; keep only the reason not to rebuild it from ref ticks (an alias would
become a plain reference the requester would not match).

Comment only; no behavior change.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The spawn, send and DOWN paths carried comments that narrated the adjacent code:
re-listing the steps a block is about to run, restating the size the next line
reserves, or re-describing a branch condition. Keep only what a reader cannot get
from the code.

- do_spawn: drop the step enumeration and the result-size restatement; keep the
  spurious-EXIT atomicity reason, the publish ordering, and the GC-safety note.
- The three send paths (nif_erlang_send_2, jit_send, OP_SEND): drop the "not a
  local process reference" lead, keep the OTP silent-drop divergence and the
  outbound-unsupported note.
- context.c: drop the "remove from the scheduler queue" narration; delete the
  alias-deactivation comment duplicated within the same function (it is stated at
  the first site); keep the capture-pid-before-demonitor ordering note.
- context.h: trim the active_alias_count field comment to its two real facts.
- dist_nifs.c: keep why a stale inbound ref is dropped rather than crashing the
  distribution connection.

Comment only; no behavior change.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Signed-off-by: Davide Bettio <davide@uninstall.it>
@bettio bettio force-pushed the updated-aliases branch from 787f573 to 6990a17 Compare June 17, 2026 13:26
@bettio bettio requested a review from pguyot June 17, 2026 15:26
Comment thread src/libAtomVM/mailbox.c
// Reverse the LIFO list into received order (oldest first), splitting it into a normal sublist
// and a signal sublist in one pass. This entry is alias-blind: its callers (ports, teardown and
// crashdump) never own an active alias.
MailboxMessage *normal_first = NULL;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with the renaming, but keeping the old names would reduce the diff.

Comment thread src/libAtomVM/context.c
TERM_DEBUG_ASSERT(ctx->active_alias_count > 0);
if (LIKELY(ctx->active_alias_count != ACTIVE_ALIAS_COUNT_SATURATED)) {
ctx->active_alias_count--;
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this after the list_remove and if the count is saturated but there is no monitor left (after the remove) I would reset it to 0, so the alias count gets a chance to be reset.

Comment thread src/libAtomVM/context.h
CONTEXT_MONITOR_LINK_REMOTE,
CONTEXT_MONITOR_MONITORING_LOCAL_REGISTEREDNAME,
CONTEXT_MONITOR_ALIAS,
// Like CONTEXT_MONITOR_MONITORED_LOCAL, but the 'DOWN' carries a process reference (an alias).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this comment confusing

Comment thread src/libAtomVM/context.h
* must only be used on monitors that were never passed to context_add_monitor,
* e.g. on error paths.
*
* @param monitor the monitor to free, or NULL in which case nothing is done

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked few calls below, monitor was never NULL. Do we really want this semantic?

Comment thread src/libAtomVM/context.c

void monitor_destroy(struct Monitor *monitor)
{
if (monitor == NULL) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds over defensive.

Comment thread src/libAtomVM/nifs.c

term refcopy = term_from_ref_ticks(ref_ticks, &ctx->heap);

term reply = term_alloc_tuple(3, &ctx->heap);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tuple 3 but we alloc for two tuple 2 above.

We had 12 and 2 * TUPLE(2) is 6 I believe, and we need at least 7 in this branch, don't we?

I wonder why we copied the ref and no longer need to. With the ref, the branch was 9. Is there another branch that required 12?

Comment thread src/libAtomVM/mailbox.c
mbox->inner_last = last;
}

MailboxMessage *mailbox_process_outer_list(Mailbox *mbox)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced this works.
There are several calls of mailbox_process_outer_list that haven't been replaced with the _with_alias variant, so Alias is considered as a signal and order is lost. For example get_process_info it seems. This would break message ordering semantics, wouldn't it?

I'm not sure we should have the variant in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants