Skip to content
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
259 changes: 259 additions & 0 deletions release-notes/11.0/preview/preview5/runtime.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,259 @@
# .NET Runtime in .NET 11 Preview 5 - Release Notes

.NET 11 Preview 5 includes new runtime features and performance work:

- [Runtime-async suspension and OSR are faster](#runtime-async-suspension-and-osr-are-faster)
- [JIT optimizations](#jit-optimizations)
- [Arm intrinsics add native integer and SVE2 predicates](#arm-intrinsics-add-native-integer-and-sve2-predicates)
- [GC trimming and compaction improvements](#gc-trimming-and-compaction-improvements)
- [NativeAOT diagnostics use cDAC data descriptors](#nativeaot-diagnostics-use-cdac-data-descriptors)
- [Browser/WebAssembly CoreCLR enablement](#browserwebassembly-coreclr-enablement)
- [Diagnostics and loader messages](#diagnostics-and-loader-messages)
- [Bug fixes](#bug-fixes)
- [Community contributors](#community-contributors)

.NET Runtime updates in .NET 11:

- [What's new in .NET 11 runtime](https://learn.microsoft.com/dotnet/core/whats-new/dotnet-11/runtime)

<!-- Verified against Microsoft.NETCore.App.Ref@11.0.0-preview.5.26276.113 -->

## Runtime-async suspension and OSR are faster

Runtime-async suspension and resumption continue to get faster in Preview 5. The biggest win is for async methods that resume inside OSR-compiled code: resumption now jumps directly from tier 0 code into the OSR method instead of going through the patchpoint helper. The PR reports that the patchpoint-helper overhead was around 10-20x, and the sample suspension-heavy benchmark improved from `Took 6357.1 ms` to `Took 457.1 ms` ([dotnet/runtime #127074](https://github.com/dotnet/runtime/pull/127074)).

```diff
- mov ecx, dword ptr [rbp-0x8C]
- call CORINFO_HELP_PATCHPOINT_FORCED
+ jmp qword ptr [rbp-0x90]
int3
```

Other runtime-async changes reduce the cost and size of generated suspension code:

- The JIT now calls `FinishSuspensionNoContinuationContext` and `FinishSuspensionWithContinuationContext` helpers for the common suspension paths. The PR reports an approximately 8% improvement on a suspension-heavy microbenchmark, with five runs moving from `284.2 ms`, `286.2 ms`, `285.4 ms`, `286.6 ms`, and `286.7 ms` to `264.6 ms`, `265.4 ms`, `266.3 ms`, `265.9 ms`, and `265.6 ms`. The generated code in the PR sample shrank from 766 bytes to 751 bytes ([dotnet/runtime #126041](https://github.com/dotnet/runtime/pull/126041)).
- Suspension/resumption now stores `Thread.CurrentThread` inside `RuntimeAsyncAwaitState`, moves TLS object fields into a stack-allocated `ref struct` on the hot paths, and removes several write barriers. The PR sample improved from `Took 350.3 ms` to `Took 291.3 ms` ([dotnet/runtime #127336](https://github.com/dotnet/runtime/pull/127336)).
- Runtime-async callable thunks for `ValueTask` now cache and reuse a custom continuation when an `IValueTaskSource`-backed `ValueTask` suspends. This removes the allocation that previously happened on that path ([dotnet/runtime #127973](https://github.com/dotnet/runtime/pull/127973)).

## JIT optimizations

Several JIT optimizations landed this preview that benefit normal C# without source changes.

### Redundant span and null checks

The JIT now removes more range checks from span loops that repeatedly slice off a fixed-width prefix. In this loop, the `data.Length >= Vector128<int>.Count` guard proves that the next `Vector128.Create(data)` and `data.Slice(Vector128<int>.Count)` are in range across the loop back edge:

```csharp
int Sum(ReadOnlySpan<int> data)
{
Vector128<int> sum = default;
while (data.Length >= Vector128<int>.Count)
{
sum += Vector128.Create(data);
data = data.Slice(Vector128<int>.Count);
}

int result = Vector128.Sum(sum);
foreach (int t in data)
{
result += t;
}

return result;
}
```

The extra range-check block is gone, and the PR sample shrank from 113 bytes to 79 bytes ([dotnet/runtime #127117](https://github.com/dotnet/runtime/pull/127117)):

```diff
G_M38854_IG03:
- cmp ecx, 4
- jl SHORT G_M38854_IG09
vpaddd xmm0, xmm0, xmmword ptr [rax]
add rax, 16
add ecx, -4
cmp ecx, 4
jge SHORT G_M38854_IG03
...
-G_M38854_IG09:
- mov ecx, 6
- call [System.ThrowHelper:ThrowArgumentOutOfRangeException(int)]
- int3
-; Total bytes of code 113
+; Total bytes of code 79
```

A related value-numbering and range-check improvement removes redundant checks for `span.Slice(span.Length - constant)`. This makes patterns such as reading the final four bytes of a span compile directly to the load after the initial length guard ([dotnet/runtime #127488](https://github.com/dotnet/runtime/pull/127488)):

```csharp
int Test(ReadOnlySpan<byte> span)
{
if (span.Length >= sizeof(int))
{
return BinaryPrimitives.ReadInt32BigEndian(span.Slice(span.Length - sizeof(int)));
}

return -1;
}
```

```diff
+ mov rax, bword ptr [rcx]
+ mov ecx, dword ptr [rcx+0x08]
+ cmp ecx, 4
+ jl SHORT G_M6173_IG05
+ add ecx, -4
+ add rax, rcx
+ movbe eax, dword ptr [rax]
+ ret
mov eax, -1
- lea eax, [rdx-0x04]
- cmp eax, edx
- ja SHORT G_M27777_IG07
- ...
- call [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
- int3
```

Null-check propagation also looks through PHIs that merge a newly-created value with an existing non-null value. For `(_inner ??= new Inner()).Do(n)`, the explicit null check before the branch is removed ([dotnet/runtime #127810](https://github.com/dotnet/runtime/pull/127810)):

```diff
G_M*_IG04:
- cmp byte ptr [rax], al
test ebx, ebx
jg SHORT G_M*_IG06
```

### Smaller arithmetic and faster casts

The JIT can now transform `CONST - x` into `x ^ CONST` when range information proves the identities are equivalent. For `255 - byte`, the generated code changes from `neg` + `add` to a single `xor`; for `-1 - x`, it changes to `not` ([dotnet/runtime #126529](https://github.com/dotnet/runtime/pull/126529)). Thank you [@BoyBaykiller](https://github.com/BoyBaykiller) for this contribution!

```diff
movzx rax, dl
- neg eax
- add eax, 255
+ xor eax, 255
```

On x86 processors with AVX-512 or AVX10.2, floating-point to `long`/`ulong` casts now use hardware conversion instructions for all non-overflow casts. The typical `double` to `long` diff replaces `CORINFO_HELP_DBL2LNG` with `vcvttpd2qq` and mask handling ([dotnet/runtime #125180](https://github.com/dotnet/runtime/pull/125180)). Thank you [@saucecontrol](https://github.com/saucecontrol) for this contribution!

```diff
- call CORINFO_HELP_DBL2LNG
+ vcmpordsd k1, xmm0, xmm0
+ vcmpge_oqsd k2, xmm0, qword ptr [@RWD00]
+ vcvttpd2qq xmm0 {k1}{z}, xmm0
+ vpblendmq xmm0 {k2}, xmm0, qword ptr [@RWD08] {1to2}
+ vmovd eax, xmm0
+ vpextrd edx, xmm0, 1
```

## Arm intrinsics add native integer and SVE2 predicates

`System.Runtime.Intrinsics.Arm` now has native-integer overloads for several scalar Arm intrinsics. `ArmBase.LeadingZeroCount`, `ArmBase.ReverseElementBits`, `Crc32.ComputeCrc32`, and `Crc32.ComputeCrc32C` accept `nint` or `nuint`, and the JIT lowers the 64-bit form directly to the Arm64 instruction width ([dotnet/runtime #127327](https://github.com/dotnet/runtime/pull/127327)).

```csharp
if (Crc32.IsSupported)
{
nuint value = (nuint)0x1234_5678;
uint crc = Crc32.ComputeCrc32(0, value);
}
```

SVE2 gained `CreateWhileGreaterThanMask*` and `CreateWhileReadAfterWriteMask*` predicate-generation intrinsics. The new `CreateWhileGreaterThanMask*` methods cover byte, signed byte, 16-bit, 32-bit, 64-bit, `double`, and `single` element masks, while `CreateWhileReadAfterWriteMask*` methods create masks from pointer ranges ([dotnet/runtime #127538](https://github.com/dotnet/runtime/pull/127538)).

## GC trimming and compaction improvements

A new GC configuration switch, `DOTNET_GCTrimYoungestKeepPercent`, lets memory-footprint latency mode keep a configurable percentage of the youngest generation during trimming. This gives applications another way to balance memory trimming against startup cost when using `DOTNET_GCLatencyLevel=0` ([dotnet/runtime #109863](https://github.com/dotnet/runtime/pull/109863)). Thank you [@ashaurtaev](https://github.com/ashaurtaev) for this contribution!

```powershell
$env:DOTNET_GCLatencyLevel = "0"
$env:DOTNET_GCTrimYoungestKeepPercent = "0xF"
```

GC compaction now keeps the `heap_segment_used` watermark accurate after relocating objects into a region. The fix avoids a stale gap that could retain dirty data when large pages or `never_decommit_p` caused `decommit_region` to clear only up to `used`. In the large-pages repro from the PR, the optimized fix was compared with a safe clear-all baseline over five-minute runs on .NET 11 Release standalone GC ([dotnet/runtime #128217](https://github.com/dotnet/runtime/pull/128217)). Thank you [@cshung](https://github.com/cshung) for this contribution!

| Metric | Clear-all | Optimized | Diff |
| ------ | --------: | --------: | ---: |
| Avg throughput (entries) | 3,322,843 | 3,391,784 | +2.1% |
| Peak throughput | 4,708,191 | 5,152,893 | +9.4% |
| OOM count | 52 | 34 | -35% |

## NativeAOT diagnostics use cDAC data descriptors

NativeAOT now emits cDAC data descriptors so diagnostic tools can inspect NativeAOT runtime state through the same contract-based model used by CoreCLR ([dotnet/runtime #126972](https://github.com/dotnet/runtime/pull/126972)). The descriptor data covers thread state, allocation contexts, method-table fields, exception traversal, stress log data, GC descriptors, and managed type descriptors produced by ILC.

The PR validation reported `1586/1586` cDAC reader tests passing and verified all three NativeAOT descriptor sources in dump inspection: the main descriptor, the managed descriptor for `System.Threading.Thread`, and the GC descriptor.

## Browser/WebAssembly CoreCLR enablement

Browser CoreCLR continues its bring-up in Preview 5:

- **JS initializer hooks.** The browser CoreCLR loader now implements `invokeLibraryInitializers`, enabling the same library-initializer hook used by the Mono WebAssembly runtime ([dotnet/runtime #127551](https://github.com/dotnet/runtime/pull/127551)).
- **Download retry is on by default.** The browser loader now enables download retry by default and improves retry sequencing for framework assets ([dotnet/runtime #127559](https://github.com/dotnet/runtime/pull/127559)).
- **Reverse P/Invoke and `UnmanagedCallersOnly` codegen.** WASM RyuJIT can now generate code for reverse P/Invoke and `UnmanagedCallersOnly` paths ([dotnet/runtime #127751](https://github.com/dotnet/runtime/pull/127751)).
- **Interpreter code resolution on portable-entrypoint platforms.** Browser CoreCLR can resolve interpreter code when `FEATURE_PORTABLE_ENTRYPOINTS` is enabled, which supports diagnostics paths such as sampling and profiler lookups ([dotnet/runtime #127370](https://github.com/dotnet/runtime/pull/127370)).

## Diagnostics and loader messages

Heap dumps now use HEAP2 by default. This path unifies memory-region enumeration and includes code heaps, the global loader allocator, app-domain loader allocators, assembly loader allocators, module loader allocators, and collectible loader heaps. The old environment-variable workarounds for slow dumps are deprecated ([dotnet/runtime #127321](https://github.com/dotnet/runtime/pull/127321)).

Assembly version conflicts now include details about the already-loaded assembly in `FileLoadException` messages. When a lower-versioned assembly is already loaded, the message can now include the loaded assembly identity and path ([dotnet/runtime #123969](https://github.com/dotnet/runtime/pull/123969)):

```text
A different copy of assembly 'LibA' is already loaded. Loaded assembly: 'LibA, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' from 'C:\repos\helloworld\bin\Debug\net10.0\LibA.dll'
```

<!-- Filtered features (significant engineering work, but too niche for release notes):
- cDAC DacDbi API parity: many debugger-contract methods landed this preview. Kept as bug-fix/diagnostics bullets because each individual API primarily serves debugger implementation parity rather than a broad developer-facing feature.
- WASM R2R and generated-callhelper infrastructure: important CoreCLR-on-WASM enablement, but most changes are build/linking internals without an independently usable behavior change.
- Additional JIT assertion, liveness, SVE unknown-size-frame, and APX fixes: valuable codegen work, but most entries are narrow correctness fixes without benchmark data or broad behavioral impact.
-->

## Bug fixes

- **JIT / code generation**
- Fixed data breakpoint handling after `CORINFO_HELP_ARRADDR_ST` inlining on x64 ([dotnet/runtime #127251](https://github.com/dotnet/runtime/pull/127251)).
- Fixed profile inconsistency asserts in flow-graph optimization ([dotnet/runtime #127357](https://github.com/dotnet/runtime/pull/127357)).
- Fixed register allocation around implicit kills and local liveness ([dotnet/runtime #127184](https://github.com/dotnet/runtime/pull/127184), [dotnet/runtime #127932](https://github.com/dotnet/runtime/pull/127932), [dotnet/runtime #127910](https://github.com/dotnet/runtime/pull/127910)).
- Fixed runtime-async with ReadyToRun on RISC-V64 ([dotnet/runtime #128066](https://github.com/dotnet/runtime/pull/128066)).
- Fixed Arm64 stack-allocated vector and mask loads/stores ([dotnet/runtime #128037](https://github.com/dotnet/runtime/pull/128037)).
- Fixed SIGILL on ARM64 platforms with SME but no SVE ([dotnet/runtime #127398](https://github.com/dotnet/runtime/pull/127398)).
- **GC**
- Fixed the remaining >1024 CPU affinity case in the GC Unix environment layer ([dotnet/runtime #127572](https://github.com/dotnet/runtime/pull/127572)).
- Disabled an aggressive large-page collection mode for ReadyToRun/Crossgen2 scenarios ([dotnet/runtime #127571](https://github.com/dotnet/runtime/pull/127571)).
- **NativeAOT**
- Fixed NativeAOT GC roots after universal transition; the PR stress loop improved from 69 runs with 6 fail-fast crashes to 132 runs with 132 successes, 0 crashes, and 0 test failures ([dotnet/runtime #127640](https://github.com/dotnet/runtime/pull/127640)).
- Fixed dependent-handle secondary access with standalone GC ([dotnet/runtime #128118](https://github.com/dotnet/runtime/pull/128118)).
- Preserved execution-aborted state in NativeAOT GC info ([dotnet/runtime #127680](https://github.com/dotnet/runtime/pull/127680)).
- Fixed NativeAOT hexadecimal config parsing for `0x` and `0X` prefixes ([dotnet/runtime #127644](https://github.com/dotnet/runtime/pull/127644)).
- **Diagnostics / cDAC**
- Implemented cDAC support for debugger attach state, compiler flags, heap segments, generic type context APIs, vararg signatures, type layouts, array layouts, partial user state, collectible type statics, and additional metadata enumeration APIs ([dotnet/runtime #126794](https://github.com/dotnet/runtime/pull/126794), [dotnet/runtime #127244](https://github.com/dotnet/runtime/pull/127244), [dotnet/runtime #128054](https://github.com/dotnet/runtime/pull/128054), [dotnet/runtime #128263](https://github.com/dotnet/runtime/pull/128263), [dotnet/runtime #128106](https://github.com/dotnet/runtime/pull/128106), [dotnet/runtime #127877](https://github.com/dotnet/runtime/pull/127877), [dotnet/runtime #127848](https://github.com/dotnet/runtime/pull/127848), [dotnet/runtime #128288](https://github.com/dotnet/runtime/pull/128288), [dotnet/runtime #127471](https://github.com/dotnet/runtime/pull/127471)).
- Fixed stale `lastThrownObjectHandle` data in DAC `GetThreadData` during active exception dispatch, affecting commands such as `!PrintException`, `!Threads`, and `!clrstack -a` ([dotnet/runtime #127741](https://github.com/dotnet/runtime/pull/127741)).
- Fixed `createdump` SIGSEGV when heap dumps include active interpreter frames ([dotnet/runtime #128163](https://github.com/dotnet/runtime/pull/128163)).
- **Runtime / VM**
- Fixed possible race conditions in thread-static variable initialization ([dotnet/runtime #127843](https://github.com/dotnet/runtime/pull/127843)).
- Fixed Linux runtime initialization when CPU hotplug is enabled ([dotnet/runtime #128069](https://github.com/dotnet/runtime/pull/128069)).
- Handled the generic-context argument in runtime signature-key computation ([dotnet/runtime #128171](https://github.com/dotnet/runtime/pull/128171)).
- **Mono / interpreter**
- Restored signal handlers during crash chaining on Mono ([dotnet/runtime #125835](https://github.com/dotnet/runtime/pull/125835)).
- Added interpreter support for stack walking and diagnostics in cDAC ([dotnet/runtime #126520](https://github.com/dotnet/runtime/pull/126520)).
- Fixed interpreter breakpoint handling for first-chance native exceptions ([dotnet/runtime #127592](https://github.com/dotnet/runtime/pull/127592)).

## Community contributors

Thank you contributors! ❤️

- [@am11](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3Aam11)
- [@ashaurtaev](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3Aashaurtaev)
- [@BoyBaykiller](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3ABoyBaykiller)
- [@clamp03](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3Aclamp03)
- [@cshung](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3Acshung)
- [@dovydenkovas](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3Adovydenkovas)
- [@giritrivedi](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3Agiritrivedi)
- [@hez2010](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3Ahez2010)
- [@jbevain](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3Ajbevain)
- [@Ruihan-Yin](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3ARuihan-Yin)
- [@saucecontrol](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3Asaucecontrol)
- [@SingleAccretion](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3ASingleAccretion)
- [@snickolls-arm](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3Asnickolls-arm)
- [@SwapnilGaikwad](https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Amerged+author%3ASwapnilGaikwad)
Loading