[Feature] Discuss DoS Protection Strategy for Constant Call APIs #6682

@warku123

Description

Summary

Nodes with vm.supportConstant = true expose triggerConstantContract and estimateEnergy APIs that are free for callers but consume node CPU. We ran pressure tests on a private network and found that a single machine with zero TRX can degrade a 16-vCPU node to 1.5% success rate within seconds using concurrent requests.

This issue presents test data and aims to discuss: should java-tron provide built-in concurrency protection for these APIs, and if so, should it be enabled by default or configured by operators?

Problem

Motivation

triggerConstantContract is widely used for reading smart contract state without broadcasting transactions. Because it is free, attackers can send high volumes of complex calls to exhaust node CPU. We investigated whether the existing configuration parameters effectively mitigate this.

Current State

Background: Real-world Energy Consumption

We measured the Top 100 most-called contracts on TRON mainnet (456 view/pure functions):

| Statistic | Energy |
|-----------|--------|
| Min       | 261    |
| Average   | 648    |
| P50       | 471    |
| P90       | 1,024  |
| P95       | 1,458  |
| Max       | 4,818  |

The current default maxEnergyLimitForConstant = 100M provides a ~20,000x safety margin over the observed maximum. However, as shown in Test 3 below, this parameter is not the real execution boundary — CPU time is.
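For reference, the knobs discussed here live in the node's `config.conf`. A sketch, under the assumption that the key names below match current java-tron releases (verify against your node's version):

```hocon
# Sketch of the relevant constant-call settings (key names per our reading
# of java-tron's config.conf; confirm against your release before relying on them)
vm {
  supportConstant = true                  # exposes triggerConstantContract / estimateEnergy
  maxEnergyLimitForConstant = 100000000   # 100M default; Test 3 shows CPU time, not this, is the binding limit
  estimateEnergy = true
}
```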


We ran a controlled DoS test suite on an AWS EC2 c6a.4xlarge instance running java-tron 4.8.1.

Test environment

| Item | Value |
|------|-------|
| Instance | AWS EC2 c6a.4xlarge (16 vCPU, 32 GB RAM) |
| java-tron version | 4.8.1 |
| Test contract | `EnergyLevelFlexible.consumeWithCount(uint256)` (see Appendix) |
| Test duration | 10 seconds per concurrency level |

Test 1: Baseline Concurrent Attack

Default configuration with supportConstant=true and no concurrent-request limiting. The test contract executes a configurable number of hash operations:

  • Light payload (count=1000): ~0.7M Energy, ~25ms per call
  • Heavy payload (count=1500): ~1.0M Energy, ~36ms per call — closer to the CPU time limit

Light payload (count=1000)

| Concurrency | Success Rate | Throughput (req/s) | Avg Latency (ms) | P99 Latency (ms) |
|-------------|--------------|--------------------|------------------|------------------|
| 1  | 100.0% | 40.7  | 25  | 34  |
| 10 | 100.0% | 303.4 | 33  | 45  |
| 20 | 99.7%  | 300.3 | 66  | 85  |
| 30 | 100.0% | 300.4 | 100 | 118 |

Heavy payload (count=1500)

| Concurrency | Success Rate | Throughput (req/s) | Avg Latency (ms) | P99 Latency (ms) |
|-------------|--------------|--------------------|------------------|------------------|
| 1  | 99.9% | 28.0  | 36  | 43  |
| 10 | 56.9% | 123.6 | 46  | 52  |
| 20 | 49.9% | 106.3 | 94  | 112 |
| 30 | 49.5% | 105.4 | 140 | 155 |

With a heavy payload, success rate drops to ~50% at concurrency 10+. An attacker only needs to increase the count parameter to maximize per-request CPU cost.


Test 2: GlobalPreemptibleAdapter Protection

GlobalPreemptibleAdapter uses a semaphore (tryAcquire(2, TimeUnit.SECONDS)) to limit concurrent execution. Excess requests queue for up to 2 seconds rather than being rejected immediately.

| Config | Concurrency | Success Rate | Throughput (req/s) | Avg Latency (ms) | P99 Latency (ms) |
|--------|-------------|--------------|--------------------|------------------|------------------|
| Baseline (no protection) | 300 | 1.5% | 10.4 | 348 | 4212 |
| GlobalPreemptibleAdapter permit=10 | 300 | 100.0% | 285.1 | 546 | 3933 |

Without protection, 300 concurrent connections collapse the node to 1.5% success. With permit=10, the node stays stable at ~285 req/s with 100% success. No requests were rejected because each call completes in ~35 ms, freeing permits quickly enough for every queued request to acquire one within the 2-second window.
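The semaphore pattern can be sketched with plain `java.util.concurrent` (this is an illustration of the mechanism, not java-tron's actual GlobalPreemptibleAdapter class; the class and method names below are ours):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the GlobalPreemptibleAdapter idea: a fixed permit pool
// guards constant-call execution; excess requests wait for a slot up to a
// bounded window instead of piling onto the VM thread pool.
class ConstantCallGuard {
    private final Semaphore permits;

    ConstantCallGuard(int permitCount) {
        this.permits = new Semaphore(permitCount);
    }

    /** Runs the call if a permit frees up within waitMs, else rejects it. */
    boolean tryExecute(Runnable call, long waitMs) throws InterruptedException {
        if (!permits.tryAcquire(waitMs, TimeUnit.MILLISECONDS)) {
            return false; // bounded wait: reject instead of queueing indefinitely
        }
        try {
            call.run();
            return true;
        } finally {
            permits.release(); // a ~35 ms call frees its permit quickly
        }
    }
}
```

Because each constant call holds its permit only for the call's runtime, a small permit count still sustains high throughput, matching the permit=10 result above.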


Test 3: maxEnergyLimitForConstant Is Not the Real Limit

We swept the count parameter under different maxEnergyLimitForConstant configurations:

| Energy Limit | Count | Success Rate | Avg Energy | Avg Latency (ms) | Error |
|--------------|-------|--------------|------------|------------------|-------|
| 100M | 100  | 100% | 67636   | 37 | - |
| 100M | 500  | 100% | 340287  | 30 | - |
| 100M | 1000 | 100% | 689012  | 47 | - |
| 100M | 1500 | 100% | 1046525 | 43 | - |
| 10M  | 100  | 100% | 67636   | 34 | - |
| 10M  | 500  | 100% | 340287  | 26 | - |
| 10M  | 1000 | 100% | 689012  | 48 | - |
| 10M  | 1500 | 100% | 1046525 | 46 | - |
| 5M   | 100  | 100% | 67636   | 15 | - |
| 5M   | 500  | 100% | 340287  | 39 | - |
| 5M   | 1000 | 100% | 689012  | 27 | - |
| 5M   | 1500 | 100% | 1046525 | 37 | - |
| 3M   | 100  | 100% | 67636   | 34 | - |
| 3M   | 500  | 100% | 340287  | 29 | - |
| 3M   | 1000 | 100% | 689012  | 54 | - |
| 3M   | 1500 | 100% | 1046525 | 41 | - |

Regardless of maxEnergyLimitForConstant (100M, 10M, 5M, or 3M), the Energy limit is never the binding constraint. CPU timeout (OutOfTimeException) is always what stops execution first. Lowering the Energy limit is a semantic cleanup, not a security fix.


Test 4: QPS Blocking Mode Does Not Help

The default QpsRateLimiterAdapter uses Guava RateLimiter.acquire() which blocks but never rejects:

| Config | Concurrency | Success Rate | Throughput (req/s) | Rejected |
|--------|-------------|--------------|--------------------|----------|
| Default (global.qps=50000, blocking) | 300 | 1.5%   | 10.4  | 0 |
| Low QPS (global.qps=100, blocking)   | 30  | 10.7%  | 58.8  | 0 |
| GlobalPreemptibleAdapter permit=10   | 300 | 100.0% | 285.1 | 0 |

Even with global.qps=100, the blocking limiter still only achieves 10.7% success. The fundamental issue: RateLimiter.acquire() queues all requests and exhausts the thread pool, regardless of the QPS setting. Only GlobalPreemptibleAdapter effectively limits concurrent execution.
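The failure mode can be reproduced in miniature with a bounded worker pool and a blocking gate standing in for `RateLimiter.acquire()` (illustrative only, not java-tron code): once every worker thread is parked inside the blocking limiter, all further requests can only queue.

```java
import java.util.concurrent.*;

// Sketch of the Test 4 failure mode: a blocking rate limiter parks worker
// threads instead of rejecting requests, so a bounded pool fills with
// parked threads and every additional request lands in the queue.
class BlockingLimiterDemo {
    static int queuedUnderBlockingLimiter(int workers, int requests) throws Exception {
        CountDownLatch gate = new CountDownLatch(1); // stands in for RateLimiter.acquire()
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                try { gate.await(); } catch (InterruptedException ignored) { }
            });
        }
        Thread.sleep(200); // let the workers pick up tasks and park on the gate
        int queued = pool.getQueue().size(); // everything beyond `workers` is stuck waiting
        gate.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return queued;
    }
}
```

With 4 workers and 20 requests, 16 requests sit in the queue no matter what the QPS setting is; only a fail-fast or bounded-wait acquire (as in GlobalPreemptibleAdapter) sheds load.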


Test 5: estimateEnergy CPU Amplification

Comparing triggerConstantContract vs estimateEnergy with identical contract calls (count=1000, 10 concurrent):

| Endpoint | Success Rate | Throughput (req/s) | Avg Latency (ms) |
|----------|--------------|--------------------|------------------|
| triggerConstantContract | 99.0%  | 292.4 | 34  |
| estimateEnergy          | 100.0% | 33.3  | 299 |

estimateEnergy is ~9x slower due to binary search retries (estimateEnergyMaxRetry=3), each executing the full EVM contract. Any concurrency protection should cover both endpoints.
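The amplification can be sketched as a binary search in which every probe re-runs the contract in full. This is our simplified model, not java-tron's exact `estimateEnergy` implementation (the observed ~9x suggests the real path does additional work per retry beyond this minimal count):

```java
// Simplified sketch: estimateEnergy-style binary search where each probe
// executes the full contract, so CPU cost is a multiple of one plain
// triggerConstantContract call. Retry count mirrors estimateEnergyMaxRetry=3.
class EstimateSketch {
    static int executions = 0;

    // Pretend EVM run: succeeds iff the trial limit covers the actual cost.
    static boolean execute(long limit, long actualCost) {
        executions++;
        return limit >= actualCost;
    }

    static long estimate(long low, long high, long actualCost, int maxRetry) {
        execute(high, actualCost); // initial run at the energy cap
        for (int i = 0; i < maxRetry; i++) { // each retry re-executes the contract
            long mid = (low + high) / 2;
            if (execute(mid, actualCost)) high = mid; else low = mid;
        }
        return high;
    }
}
```

Even in this minimal model, one estimate costs maxRetry + 1 full executions, which is why a lower permit value for `EstimateEnergyServlet` is proposed below.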


Test 6: maxConnectionAge and maxConcurrentCallsPerConnection (Code Analysis)

These gRPC-only parameters were investigated via code analysis (empirical testing planned as follow-up):

| Parameter | Default | Risk |
|-----------|---------|------|
| maxConcurrentCallsPerConnection | Integer.MAX_VALUE | Unlimited concurrent calls per connection via HTTP/2 multiplexing |
| maxConnectionAgeInMillis | Long.MAX_VALUE | Connections never expire |

A single gRPC connection can bypass maxConnectionsWithSameIp=2 and exhaust all rpcThread workers.
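If these parameters are exposed the way other rpc settings are, tightening them would look roughly like the fragment below. The placement and key names are our assumption from the code analysis, not verified config syntax; the values are arbitrary examples:

```hocon
# Sketch only: parameter names from the code analysis above; placement in
# config.conf and value choices are assumptions to be verified per version
node {
  rpc {
    maxConcurrentCallsPerConnection = 100   # default Integer.MAX_VALUE
    maxConnectionAgeInMillis = 60000        # default Long.MAX_VALUE (connections never expire)
  }
}
```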

Limitations or Risks

If left unaddressed, publicly accessible API nodes remain vulnerable to trivial DoS attacks that require no TRX and no on-chain transactions.

Proposed Solution

Reference: Comparison with Ethereum Geth

Ethereum's Geth faces the same problem with eth_call and eth_estimateGas. Geth provides per-call protections (RPCGasCap = 50M, RPCEVMTimeout = 5s, BatchRequestLimit = 1000) but no native concurrent-execution limit, likely because Go's goroutine model (a few KB per goroutine) tolerates concurrent load better than Java's thread-per-request model (~1 MB of stack per thread). In practice, Geth operators rely on external infrastructure (Nginx, cloud load balancers) for concurrency control.

java-tron already has a built-in mechanism — GlobalPreemptibleAdapter — that can provide this protection natively.

Proposed Design

Based on our test results, GlobalPreemptibleAdapter is the most effective built-in defense. We'd like community input on the deployment strategy.

Key Changes

| # | Item | Current | Suggestion |
|---|------|---------|------------|
| 1 | GlobalPreemptibleAdapter for TriggerConstantContract | Not configured | Enable with configurable permit value |
| 2 | GlobalPreemptibleAdapter for EstimateEnergyServlet | Not configured | Enable with a lower permit (higher CPU cost per request) |
| 3 | maxEnergyLimitForConstant | 100M | Consider lowering for semantic alignment (not a security fix) |

Impact

  • Security: Limits the CPU that free constant calls can consume concurrently.
  • Stability: Prevents node collapse under concurrent load.
  • Performance: Normal queries are unaffected — permit=10 still delivers ~285 req/s.

Compatibility

  • Breaking Change: No.
  • Default Behavior Change: Depends on discussion outcome.
  • Migration Required: No.

Additional Notes

We'd like to discuss the following questions with the community:

  1. Default-on vs operator-configured? Should GlobalPreemptibleAdapter be enabled by default, or should operators enable it when needed? Enabling by default protects out-of-the-box but may affect high-concurrency use cases; leaving it off means most operators remain unprotected unless they discover the option.

  2. Node-level vs external protection? Some operators may prefer to handle rate limiting externally (Nginx, cloud LB). Should java-tron focus on providing the capability and documenting it, rather than enabling it by default?

  3. What is a reasonable permit value? Our test used permit=10 on 16 vCPU, achieving ~285 req/s with 100% success. Should the default be tied to CPU cores (e.g., cores / 2), or a fixed conservative value?

  4. Should maxEnergyLimitForConstant be lowered? The Top 100 contracts peak at only 4,818 Energy vs the 100M default (~20,000x gap). Lowering doesn't improve DoS resilience but better reflects real-world usage.
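For question 3, a cores-derived default could be as simple as the sketch below (the derivation rule is purely a discussion starter, not an agreed design):

```java
// Hypothetical default for the constant-call permit count: half the
// available vCPUs, floored at 1 so small nodes still serve requests.
class PermitDefault {
    static int defaultPermits(int vCpus) {
        return Math.max(1, vCpus / 2);
    }
}
```

On the 16-vCPU test machine this yields 8, close to the permit=10 value that held 100% success in Test 2.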

  • Do you have ideas regarding implementation? Yes
  • Are you willing to implement this feature? Yes

Appendix: Test Contract

The consumeWithCount(uint) function performs count iterations of keccak256 hashing and arithmetic, allowing precise control over Energy consumption and CPU time per call.

```solidity
// Solidity ^0.4.24 (abi.encodePacked requires >= 0.4.24; compatible with TRON TVM)
pragma solidity ^0.4.24;

contract EnergyLevelFlexible {
    function consumeWithCount(uint count) public pure returns (uint) {
        uint result = 1;
        // Each iteration hashes and mutates `result`, so per-call CPU time
        // scales linearly with `count` (capped at 500,000 iterations).
        for (uint i = 0; i < count && i < 500000; i++) {
            result = uint(keccak256(abi.encodePacked(result, i)));
            result = result * 3 + i;
            if (result > 1000000000000) {
                result = result / 2;
            }
        }
        return result;
    }
}
```
