Add SIMD alloca* scratch allocation APIs with ParparVM stack lowering and benchmark/compliance coverage #4772

Draft
Copilot wants to merge 29 commits into simd-revisite from copilot/add-alloca-special-method-simd-api

Contributor

Copilot AI commented Apr 18, 2026

Continue the SIMD fusion work started with blendByMaskTestNonzero. Two remaining Image.java SIMD paths still issue multiple primitive calls (and therefore multiple native dispatches + multiple passes over the buffer). Fuse them into one new SIMD primitive each.

Plan

  • #1: Fuse applyMask (4 passes → 1)

    • Add new SIMD primitive replaceTopByteFromUnsignedBytes(int[] rgbSrc, int rgbSrcOff, byte[] alphaSrc, int alphaSrcOff, int[] dst, int dstOff, int len) computing dst[i] = (rgbSrc[i] & 0x00ffffff) | ((alphaSrc[i] & 0xff) << 24) to Simd.java (Java fallback).
    • Add validating override + native binding to IOSSimd.java.
    • Add NEON-vectorised native implementation in IOSSimd.m (vmovl_u8 → vshlq_n_u32(24) → vorrq(vandq, …)).
    • Add validating override to JavaSESimd.java.
    • Rewrite the SIMD branch of Image.applyMask(Object) to a single call to the new primitive (eliminating the unpack/shl/and/or chain and the int-scratch alloca).
    • Add a unit test in SimdTest.java (fallback semantics + registered-array round-trip).
  • #2: Fuse removeColor path (3 passes → 1)

    • Add new SIMD primitive blendByMaskTestNonzeroSubstituteOnKeepEq(int[] src, int srcOff, int testMask, int trueKeepMask, int trueOrValue, int removeMatch, int removeValue, int[] dst, int dstOff, int len) computing dst[i] = (src[i] & testMask) == 0 ? src[i] : ((src[i] & trueKeepMask) == removeMatch ? removeValue : (src[i] & trueKeepMask) | trueOrValue) (Java fallback in Simd.java, validating overrides, NEON impl on iOS).
    • Rewrite Image.replaceAlphaPreserveTransparentRemoveColorSimd to a single call (eliminating the tmp/removeMask scratch buffers and the cmpEq + select follow-up passes).
    • Add a unit test in SimdTest.java.
  • Validation

    • mvn clean verify -DunitTests=true -pl core-unittests -am -Dmaven.javadoc.skip=true -Plocal-dev-javase → BUILD SUCCESS.
    • 11 SimdTest tests pass (was 9 before — confirms both new tests run); 106 image-related tests across ImageTest, IndexedImageTest, DynamicImageTest, RGBImageTest, LabelFeatureTest, ComponentImageTest, SimdTest pass.
    • No new Checkstyle / SpotBugs / PMD errors in any changed file.
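
As a rough illustration of the two fused primitives in the plan above, the scalar fallbacks might look like the following sketch. The method names, parameter lists, and per-element formulas come from the plan; the class name `SimdFallbackSketch` and the way the offsets are added to each index are assumptions, and this is not the code from the PR.

```java
// Sketch only: SimdFallbackSketch is a hypothetical class name. Applying the
// *Off parameters to every index is an assumption; the plan states the
// formulas with a bare [i].
final class SimdFallbackSketch {

    // dst[i] = (rgbSrc[i] & 0x00ffffff) | ((alphaSrc[i] & 0xff) << 24)
    static void replaceTopByteFromUnsignedBytes(int[] rgbSrc, int rgbSrcOff,
            byte[] alphaSrc, int alphaSrcOff, int[] dst, int dstOff, int len) {
        for (int i = 0; i < len; i++) {
            int rgb = rgbSrc[rgbSrcOff + i] & 0x00ffffff;   // drop the old top byte
            int alpha = alphaSrc[alphaSrcOff + i] & 0xff;   // treat the byte as unsigned
            dst[dstOff + i] = rgb | (alpha << 24);
        }
    }

    // dst[i] = (src[i] & testMask) == 0 ? src[i]
    //        : ((src[i] & trueKeepMask) == removeMatch ? removeValue
    //        : (src[i] & trueKeepMask) | trueOrValue)
    static void blendByMaskTestNonzeroSubstituteOnKeepEq(int[] src, int srcOff,
            int testMask, int trueKeepMask, int trueOrValue,
            int removeMatch, int removeValue, int[] dst, int dstOff, int len) {
        for (int i = 0; i < len; i++) {
            int v = src[srcOff + i];
            int out;
            if ((v & testMask) == 0) {
                out = v;                                    // test failed: copy through
            } else {
                int kept = v & trueKeepMask;
                out = (kept == removeMatch) ? removeValue : (kept | trueOrValue);
            }
            dst[dstOff + i] = out;
        }
    }
}
```

Each method makes one pass over the buffer and needs no intermediate scratch array, which is the point of the fusion: the per-element work that previously took several primitive calls happens inside a single loop body.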

Copilot AI and others added 4 commits April 18, 2026 04:18
Agent-Logs-Url: https://github.com/codenameone/CodenameOne/sessions/fe7ec42c-9a02-4f55-a18b-2f038764b2ef

Co-authored-by: shai-almog <67850168+shai-almog@users.noreply.github.com>

Collaborator

@shai-almog shai-almog left a comment

We have a verification stage for bytecodes in the maven build that checks bytecode isn't using methods it isn't allowed to. We can use that code to verify that calls to alloca methods don't return that value from the method, don't store it into a variable and don't invoke any method that isn't on the Simd class.


/// Allocates a scratch byte array for temporary SIMD work.
/// On ParparVM this may be lowered to a stack-backed faux array, so callers
/// should keep it method-local and avoid returning or storing it beyond the

Collaborator

Fix wording to MUST and mark the methods as deprecated with a note that developers need to be cautious about using them. Don't use the @Deprecated annotation which I don't think we support.

Contributor Author

Updated in 4f33c2c. The wording now uses MUST, and the methods are marked deprecated via JavaDoc only with a cautionary note instead of using the @Deprecated annotation.

Comment thread CodenameOne/src/com/codename1/util/Base64.java
Comment thread vm/ByteCodeTranslator/src/cn1_globals.h Outdated
extern JAVA_OBJECT allocArray(CODENAME_ONE_THREAD_STATE, int length, struct clazz* type, int primitiveSize, int dim);
extern JAVA_OBJECT allocArrayAligned(CODENAME_ONE_THREAD_STATE, int length, struct clazz* type, int primitiveSize, int dim, int alignment);
extern JAVA_OBJECT allocMultiArray(int* lengths, struct clazz* type, int primitiveSize, int dim);
#define CN1_SIMD_STACK_ARRAY(length, arrayClass, primitiveSize, dim, alignment) \

Collaborator

alloca arrays are always single dimension and are always primitive. Does it make sense to align them always on the same value too?

This code should be simplified.

Contributor Author

Simplified in e16c0ee. The stack-allocation macro is now fixed to a single primitive, single-dimension layout with a named SIMD alignment constant instead of carrying dimension/alignment parameters.

Collaborator

shai-almog commented Apr 18, 2026

Compared 37 screenshots: 37 matched.

Native Android coverage

  • 📊 Line coverage: 7.81% (4133/52893 lines covered) [HTML preview] (artifact android-coverage-report, jacocoAndroidReport/html/index.html)
    • Other counters: instruction 6.14% (20444/333000), branch 2.98% (956/32114), complexity 3.64% (1119/30721), method 6.41% (918/14329), class 10.63% (202/1900)
    • Lowest covered classes
      • kotlin.collections.kotlin.collections.ArraysKt___ArraysKt – 0.00% (0/6327 lines covered)
      • kotlin.collections.unsigned.kotlin.collections.unsigned.UArraysKt___UArraysKt – 0.00% (0/2384 lines covered)
      • org.jacoco.agent.rt.internal_b6258fc.asm.org.jacoco.agent.rt.internal_b6258fc.asm.ClassReader – 0.00% (0/1519 lines covered)
      • kotlin.collections.kotlin.collections.CollectionsKt___CollectionsKt – 0.00% (0/1148 lines covered)
      • org.jacoco.agent.rt.internal_b6258fc.asm.org.jacoco.agent.rt.internal_b6258fc.asm.MethodWriter – 0.00% (0/923 lines covered)
      • kotlin.sequences.kotlin.sequences.SequencesKt___SequencesKt – 0.00% (0/730 lines covered)
      • kotlin.text.kotlin.text.StringsKt___StringsKt – 0.00% (0/623 lines covered)
      • org.jacoco.agent.rt.internal_b6258fc.asm.org.jacoco.agent.rt.internal_b6258fc.asm.Frame – 0.00% (0/564 lines covered)
      • kotlin.collections.kotlin.collections.ArraysKt___ArraysJvmKt – 0.00% (0/495 lines covered)
      • kotlinx.coroutines.kotlinx.coroutines.JobSupport – 0.00% (0/423 lines covered)

✅ Native Android screenshot tests passed.

Benchmark Results

Detailed Performance Metrics

| Metric | Value |
| --- | --- |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 native encode | 1026.000 ms |
| Base64 CN1 encode | 132.000 ms |
| Base64 encode ratio (CN1/native) | 0.129x (87.1% faster) |
| Base64 native decode | 931.000 ms |
| Base64 CN1 decode | 224.000 ms |
| Base64 decode ratio (CN1/native) | 0.241x (75.9% faster) |
| Image encode benchmark status | skipped (SIMD unsupported) |

Copilot AI and others added 5 commits April 18, 2026 04:50
Agent-Logs-Url: https://github.com/codenameone/CodenameOne/sessions/90bc6a2c-a647-42c2-b5fb-b753d4f84d2e

Co-authored-by: shai-almog <67850168+shai-almog@users.noreply.github.com>

Contributor Author

Copilot AI commented Apr 18, 2026

We have a verification stage for bytecodes in the maven build that checks bytecode isn't using methods it isn't allowed to. We can use that code to verify that calls to alloca methods don't return tha...

Implemented in 9d015d3. I extended the Maven bytecode-compliance scan with SIMD alloca usage checks so alloca scratch arrays are flagged if they escape via returns/field stores/object-array stores or are passed to non-Simd methods.
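
To make the rule concrete, here is a minimal sketch of one usage pattern that stays within the described constraints, with the flagged patterns listed as comments. `SimdStandIn`, `allocaByte`, `fill`, and `sum` are hypothetical stand-ins invented for this example (the real Simd API is not shown in this thread), and treating a method-local variable as acceptable is an assumption based on the "keep it method-local" wording in the doc comment.

```java
// Sketch only: every name here is hypothetical and exists just for this example.
final class AllocaUsageSketch {

    // Stand-in for the Simd class. On a plain JVM the scratch is a heap array;
    // on ParparVM the thread describes it as possibly stack-backed.
    static final class SimdStandIn {
        static byte[] allocaByte(int size) {
            return new byte[size];
        }

        static void fill(byte[] dst, int off, int len, byte value) {
            for (int i = 0; i < len; i++) {
                dst[off + i] = value;
            }
        }

        static int sum(byte[] src, int off, int len) {
            int total = 0;
            for (int i = 0; i < len; i++) {
                total += src[off + i];
            }
            return total;
        }
    }

    // Compliant shape: the scratch array never leaves this method and is only
    // handed to methods on the Simd stand-in. The return value is a derived
    // int, not the array itself.
    static int compliant(int len) {
        byte[] scratch = SimdStandIn.allocaByte(len);
        SimdStandIn.fill(scratch, 0, len, (byte) 2);
        return SimdStandIn.sum(scratch, 0, len);
    }

    // Shapes the scan is described as flagging (kept as comments on purpose):
    //   return scratch;                          // escapes via return
    //   this.cache = scratch;                    // escapes via field store
    //   holder[0] = scratch;                     // escapes via object-array store
    //   java.util.Arrays.fill(scratch, (byte) 0); // passed to a non-Simd method
}
```

The restriction follows from the lowering: if the array may live in the caller's stack frame, any reference that outlives the method would point at memory that has already been reclaimed.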

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • www.codenameone.com
    • Triggering command: /usr/lib/jvm/temurin-8-jdk-amd64/bin/java /usr/lib/jvm/temurin-8-jdk-amd64/bin/java -classpath /usr/share/apache-maven-3.9.14/boot/plexus-classworlds-2.9.0.jar -Dclassworlds.conf=/usr/share/apache-maven-3.9.14/bin/m2.conf -Dmaven.home=/usr/share/apache-maven-3.9.14 -Dlibrary.jansi.path=/usr/share/apache-maven-3.9.14/lib/jansi-native -Dmaven.multiModuleProjectDirectory=/home/REDACTED/work/CodenameOne/CodenameOne/maven org.codehaus.plexus.classworlds.launcher.Launcher -pl codenameone-maven-plugin -am -DfailIfNoTests=false -Dtest=BytecodeComplianceMojoTest test (dns block)

Copilot AI requested a review from shai-almog April 18, 2026 04:55
Contributor

github-actions bot commented Apr 18, 2026

✅ ByteCodeTranslator Quality Report

Test & Coverage

  • Tests: 381 total, 0 failed, 2 skipped

Benchmark Results

  • Execution Time: 10708 ms

  • Hotspots (Top 20 sampled methods):

    • 23.60% java.lang.String.indexOf (446 samples)
    • 17.67% com.codename1.tools.translator.Parser.isMethodUsed (334 samples)
    • 12.38% java.util.ArrayList.indexOf (234 samples)
    • 5.93% com.codename1.tools.translator.Parser.addToConstantPool (112 samples)
    • 4.76% java.lang.Object.hashCode (90 samples)
    • 4.44% com.codename1.tools.translator.ByteCodeClass.markDependent (84 samples)
    • 3.28% java.lang.System.identityHashCode (62 samples)
    • 1.85% com.codename1.tools.translator.ByteCodeClass.updateAllDependencies (35 samples)
    • 1.59% com.codename1.tools.translator.Parser.getClassByName (30 samples)
    • 1.43% com.codename1.tools.translator.Parser.generateClassAndMethodIndexHeader (27 samples)
    • 1.22% com.codename1.tools.translator.ByteCodeClass.calcUsedByNative (23 samples)
    • 1.22% com.codename1.tools.translator.ByteCodeClass.fillVirtualMethodTable (23 samples)
    • 1.11% com.codename1.tools.translator.BytecodeMethod.appendMethodSignatureSuffixFromDesc (21 samples)
    • 1.11% com.codename1.tools.translator.BytecodeMethod.appendCMethodPrefix (21 samples)
    • 0.95% com.codename1.tools.translator.Parser.cullMethods (18 samples)
    • 0.90% java.lang.StringBuilder.append (17 samples)
    • 0.90% com.codename1.tools.translator.BytecodeMethod.appendMethodC (17 samples)
    • 0.85% com.codename1.tools.translator.BytecodeMethod.optimize (16 samples)
    • 0.85% java.lang.StringCoding.encode (16 samples)
    • 0.58% com.codename1.tools.translator.ByteCodeClass.markDependencies (11 samples)
  • ⚠️ Coverage report not generated.

Static Analysis

  • ✅ SpotBugs: no findings (report was not generated by the build).
  • ⚠️ PMD report not generated.
  • ⚠️ Checkstyle report not generated.

Generated automatically by the PR CI workflow.

}

/**
* @deprecated This is a special scratch-allocation API. On ParparVM this may be lowered to a

Collaborator

Use Java 25 markdown Javadoc comments like the rest of the code.

Contributor Author

Updated in bf3dde7. The alloca*() docs now use the Java 25 markdown JavaDoc /// style used in this codebase, while keeping the MUST/caution wording and the deprecation guidance.

Comment thread vm/ByteCodeTranslator/src/cn1_globals.h
Agent-Logs-Url: https://github.com/codenameone/CodenameOne/sessions/bb5fa00f-a32c-4546-b631-d21141725593

Co-authored-by: shai-almog <67850168+shai-almog@users.noreply.github.com>
Copilot AI requested a review from shai-almog April 18, 2026 05:09
Collaborator

shai-almog commented Apr 18, 2026

Compared 37 screenshots: 37 matched.
✅ Native iOS screenshot tests passed.

Benchmark Results

  • VM Translation Time: 0 seconds
  • Compilation Time: 75 seconds

Build and Run Timing

| Metric | Duration |
| --- | --- |
| Simulator Boot | 0 ms |
| Simulator Boot (Run) | 1000 ms |
| App Install | 2000 ms |
| App Launch | 5000 ms |
| Test Execution | 154000 ms |

Detailed Performance Metrics

| Metric | Value |
| --- | --- |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 native encode | 1078.000 ms |
| Base64 CN1 encode | 1478.000 ms |
| Base64 encode ratio (CN1/native) | 1.371x (37.1% slower) |
| Base64 native decode | 737.000 ms |
| Base64 CN1 decode | 1184.000 ms |
| Base64 decode ratio (CN1/native) | 1.607x (60.7% slower) |
| Base64 SIMD encode | 392.000 ms |
| Base64 encode ratio (SIMD/native) | 0.364x (63.6% faster) |
| Base64 encode ratio (SIMD/CN1) | 0.265x (73.5% faster) |
| Base64 SIMD decode | 393.000 ms |
| Base64 decode ratio (SIMD/native) | 0.533x (46.7% faster) |
| Base64 decode ratio (SIMD/CN1) | 0.332x (66.8% faster) |
| Image encode benchmark iterations | 100 |
| Image createMask (SIMD off) | 60.000 ms |
| Image createMask (SIMD on) | 13.000 ms |
| Image createMask ratio (SIMD on/off) | 0.217x (78.3% faster) |
| Image applyMask (SIMD off) | 190.000 ms |
| Image applyMask (SIMD on) | 59.000 ms |
| Image applyMask ratio (SIMD on/off) | 0.311x (68.9% faster) |
| Image modifyAlpha (SIMD off) | 133.000 ms |
| Image modifyAlpha (SIMD on) | 81.000 ms |
| Image modifyAlpha ratio (SIMD on/off) | 0.609x (39.1% faster) |
| Image modifyAlpha removeColor (SIMD off) | 155.000 ms |
| Image modifyAlpha removeColor (SIMD on) | 79.000 ms |
| Image modifyAlpha removeColor ratio (SIMD on/off) | 0.510x (49.0% faster) |
| Image PNG encode (SIMD off) | 1917.000 ms |
| Image PNG encode (SIMD on) | 810.000 ms |
| Image PNG encode ratio (SIMD on/off) | 0.423x (57.7% faster) |
| Image JPEG encode | 440.000 ms |
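
For readers checking the ratio rows, the figures appear consistent with dividing the first-named duration by the second and reporting the distance from 1.0 as a percentage. The helper below is a sketch of that arithmetic under that assumption; `BenchmarkRatioSketch` and `ratio` are hypothetical names and not part of the benchmark harness.

```java
import java.util.Locale;

// Sketch only: reproduces the "0.217x (78.3% faster)" style of the report,
// assuming ratio = candidate / baseline.
final class BenchmarkRatioSketch {

    static String ratio(double candidateMs, double baselineMs) {
        double r = candidateMs / baselineMs;
        // Below 1.0 the candidate took less time than the baseline.
        double percent = Math.abs(1.0 - r) * 100.0;
        String direction = (r <= 1.0) ? "faster" : "slower";
        return String.format(Locale.ROOT, "%.3fx (%.1f%% %s)", r, percent, direction);
    }

    public static void main(String[] args) {
        // createMask: SIMD on 13 ms against SIMD off 60 ms
        System.out.println(ratio(13.0, 60.0));
        // Base64 encode: CN1 1478 ms against native 1078 ms
        System.out.println(ratio(1478.0, 1078.0));
    }
}
```

Read this way, a ratio under 1.0x means the first-named variant finished sooner, so "SIMD on/off" at 0.217x means the SIMD path took roughly a fifth of the scalar time.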

Collaborator

@shai-almog shai-almog left a comment

This looks generally good in terms of code but the performance is regressed:
| Metric | Value |
| --- | --- |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 native encode | 916.000 ms |
| Base64 CN1 encode | 1418.000 ms |
| Base64 encode ratio (CN1/native) | 1.548x (54.8% slower) |
| Base64 native decode | 678.000 ms |
| Base64 CN1 decode | 914.000 ms |
| Base64 decode ratio (CN1/native) | 1.348x (34.8% slower) |
| Base64 SIMD encode | 416.000 ms |
| Base64 encode ratio (SIMD/native) | 0.454x (54.6% faster) |
| Base64 encode ratio (SIMD/CN1) | 0.293x (70.7% faster) |
| Base64 SIMD decode | 419.000 ms |
| Base64 decode ratio (SIMD/native) | 0.618x (38.2% faster) |
| Base64 decode ratio (SIMD/CN1) | 0.458x (54.2% faster) |
| Image encode benchmark iterations | 250 |
| Image PNG encode (SIMD off) | 359.000 ms |
| Image PNG encode (SIMD on) | 454.000 ms |
| Image PNG encode ratio (SIMD on/off) | 1.265x (26.5% slower) |
| Image JPEG encode (SIMD off) | 215.000 ms |
| Image JPEG encode (SIMD on) | 308.000 ms |
| Image JPEG encode ratio (SIMD on/off) | 1.433x (43.3% slower) |

Agent-Logs-Url: https://github.com/codenameone/CodenameOne/sessions/9a8f2cf5-e212-48b0-ab89-e71cd2519b4a

Co-authored-by: shai-almog <67850168+shai-almog@users.noreply.github.com>

Contributor Author

Copilot AI commented Apr 18, 2026

This looks generally good in terms of code but the performance is regressed:
Metric Duration
Base64 payload size 8192 bytes...

Fixed in 6e7b209. The image SIMD paths were still reading the zeroOffset lane from alloca-backed scratch after the stack memset removal, so ParparVM could compare/select against undefined values. I now explicitly initialize those zero lanes in the affected Image/RGBImage SIMD loops.
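
The hazard described here can be sketched in plain Java by simulating scratch memory that is not cleared on allocation. Everything in the block is hypothetical: `ZeroLaneSketch`, `scratchWithoutMemset`, the garbage fill value, and the lane layout are assumptions made for illustration, and the actual Image/RGBImage loops are not shown in this thread.

```java
import java.util.Arrays;

// Sketch only: mimics a scratch buffer whose contents are undefined until
// written, then shows why the zero lane has to be initialised explicitly.
final class ZeroLaneSketch {
    static final int LANES = 4;

    // Stand-in for stack-backed scratch with no memset: filled with garbage
    // so that forgetting to initialise it is visible on a JVM too.
    static int[] scratchWithoutMemset(int len) {
        int[] a = new int[len];
        Arrays.fill(a, 0xDEADBEEF);
        return a;
    }

    // Counts zero elements by comparing each one against a "zero lane" that
    // lives in scratch memory at zeroOffset.
    static int countZeros(int[] src, boolean initZeroLane) {
        int[] scratch = scratchWithoutMemset(LANES);
        final int zeroOffset = 0;
        if (initZeroLane) {
            // The explicit initialisation the fix describes.
            for (int i = 0; i < LANES; i++) {
                scratch[zeroOffset + i] = 0;
            }
        }
        int count = 0;
        for (int i = 0; i < src.length; i++) {
            if (src[i] == scratch[zeroOffset + (i % LANES)]) {
                count++;
            }
        }
        return count;
    }
}
```

With the lane initialised, `countZeros(new int[] {0, 5, 0, 7, 0}, true)` counts the three zeros; without it, the comparison runs against whatever the scratch happened to contain, which is the kind of undefined compare/select input the comment refers to.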

Copilot AI and others added 2 commits April 18, 2026 14:19
Agent-Logs-Url: https://github.com/codenameone/CodenameOne/sessions/f3eb3bb6-bb5f-492d-80ff-adc83cce9d51

Co-authored-by: shai-almog <67850168+shai-almog@users.noreply.github.com>
Agent-Logs-Url: https://github.com/codenameone/CodenameOne/sessions/220180d6-f744-495c-9d75-8899303421b8

Co-authored-by: shai-almog <67850168+shai-almog@users.noreply.github.com>
… benchmark image to 256x256

Agent-Logs-Url: https://github.com/codenameone/CodenameOne/sessions/9aba781a-23d2-4935-b086-6e0334cecb8f

Co-authored-by: shai-almog <67850168+shai-almog@users.noreply.github.com>