Java Log Analytics Engine

Deterministic Real-Time Anomaly Detection with Concurrency Benchmarking

Overview

This repository presents a high-throughput Java-based log analytics engine designed to perform real-time anomaly detection using incremental statistical methods.

The objective of this project is to characterize the performance impact of different Java concurrency models on a deterministic, real-time statistical workload.

Rather than introducing machine learning complexity, the system uses incremental statistical techniques to isolate and measure concurrency behavior under controlled, reproducible conditions.

Core Research Question

What is the throughput and consistency impact of different Java concurrency models when executing identical real-time statistical anomaly detection workloads?

This project evaluates:

Single-threaded execution (baseline)
Thread pool–based multithreading (ExecutorService)
Parallel stream execution (ForkJoinPool)
Partitioned local processing (isolated statistical windows)

All engines execute the same deterministic anomaly detection pipeline. Only the execution strategy changes.

System Architecture

Each log entry flows through a fixed processing pipeline:

Raw Log Entry
    ↓
LogRecord Object
    ↓
Sliding Window Buffer (bounded, circular)
    ↓
Incremental Statistics (Welford’s algorithm)
    ↓
Z-score Anomaly Detection
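The bounded, circular window stage can be sketched as a fixed-size ring of doubles. RingWindow and its methods are hypothetical names, not the repository's SlidingWindowBuffer API. The sketch assumes that inserting into a full window should return the evicted value, so that the statistics stage can remove it in O(1).

```java
import java.util.OptionalDouble;

/** Hypothetical fixed-capacity circular window; not the repository's SlidingWindowBuffer API. */
final class RingWindow {
    private final double[] slots;
    private int head = 0; // next write position; also the oldest value once the ring is full
    private int size = 0; // number of valid values stored

    RingWindow(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.slots = new double[capacity];
    }

    /**
     * Appends a value in O(1). When the window is already full, the oldest value is
     * overwritten and returned so that running statistics can remove it.
     */
    OptionalDouble push(double value) {
        OptionalDouble evicted = OptionalDouble.empty();
        if (size == slots.length) {
            evicted = OptionalDouble.of(slots[head]);
        } else {
            size++;
        }
        slots[head] = value;
        head = (head + 1) % slots.length;
        return evicted;
    }

    int size() { return size; }

    boolean isFull() { return size == slots.length; }
}
```

Memory stays at capacity values no matter how many logs pass through, which is what the bounded-memory property relies on.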

Key Properties

O(1) statistical updates per log
No full-history scans
Bounded memory usage
Deterministic output
No machine learning
No distributed systems

Statistical Model

Anomalies are detected using a Z-score–based approach:

Z = (value - mean) / standard_deviation

If:

|Z| > threshold

the log entry is classified as an anomaly.
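The following is a direct translation of the rule, written as a minimal sketch. The class name ZScore is hypothetical, and the threshold of 3.0 in the example is illustrative rather than the repository's configured value. Returning Z = 0 when the standard deviation is zero is also an assumption, because the README does not say how a constant window is handled.

```java
/** Hypothetical helper illustrating the |Z| > threshold rule. */
final class ZScore {
    private ZScore() {}

    /** Z = (value - mean) / stdDev; returns 0 when there is no spread (assumed policy). */
    static double of(double value, double mean, double stdDev) {
        return stdDev > 0.0 ? (value - mean) / stdDev : 0.0;
    }

    /** Classifies a value as anomalous when |Z| exceeds the threshold. */
    static boolean isAnomaly(double value, double mean, double stdDev, double threshold) {
        return Math.abs(of(value, mean, stdDev)) > threshold;
    }

    public static void main(String[] args) {
        // mean 100, std dev 10: value 135 gives Z = 3.5, value 80 gives Z = -2.0
        System.out.println(isAnomaly(135, 100, 10, 3.0)); // true
        System.out.println(isAnomaly(80, 100, 10, 3.0));  // false
    }
}
```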

Statistics are maintained incrementally using Welford’s algorithm to ensure:

Numerical stability
Constant-time updates
No need to retain full history
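Welford's update adds one value in O(1). Because the pipeline feeds a bounded window, the statistics also have to drop the value that falls out of the window. The remove method below is the algebraic inverse of the Welford update. WelfordWindowStats is a hypothetical name, and whether the repository's WindowedStatistics and IncrementalStatistics work this way is an assumption.

```java
/**
 * Welford running mean/variance, extended with a removal step so it can track a
 * sliding window. The removal step is an assumption about how windowed stats are kept.
 */
final class WelfordWindowStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the current mean

    /** Standard Welford update: O(1), no history needed. */
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    /** Inverse of add(): removes a value known to be in the window (e.g. an evicted one). */
    void remove(double x) {
        if (count == 0) throw new IllegalStateException("no values to remove");
        if (count == 1) {
            count = 0;
            mean = 0.0;
            m2 = 0.0;
            return;
        }
        double oldMean = mean;
        count--;
        mean = oldMean - (x - oldMean) / count;
        m2 -= (x - mean) * (x - oldMean);
        if (m2 < 0.0) m2 = 0.0; // clamp tiny negative values caused by rounding
    }

    long count() { return count; }

    double mean() { return mean; }

    /** Population variance (divides by n); the repository may use the sample form instead. */
    double variance() { return count > 0 ? m2 / count : 0.0; }

    double stdDev() { return Math.sqrt(variance()); }
}
```

Removal is somewhat less forgiving numerically than insertion. Over very long runs, periodically recomputing the statistics from the buffered values is one possible safeguard.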

Concurrency Models Compared

1. SingleThreadEngine

Sequential processing
Baseline performance reference
Minimal overhead

2. ThreadPoolEngine

Uses ExecutorService
Configurable worker threads
Shared sliding window with synchronization
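Here is a minimal sketch of the thread-pool pattern. It assumes that workers receive contiguous chunks and that all of them update one detector guarded by a lock. SharedDetectorPool and SharedDetector are hypothetical names, and to stay short the detector keeps cumulative rather than windowed statistics. The lock serializes updates, but the order in which values reach the shared statistics follows thread scheduling. This sketch therefore makes no claim to reproduce the repository's deterministic counts with more than one thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: a fixed pool of workers sharing one lock-protected detector. */
final class SharedDetectorPool {

    /** Shared state; cumulative (not windowed) statistics keep the sketch short. */
    static final class SharedDetector {
        private final double threshold;
        private long n = 0;
        private double mean = 0.0, m2 = 0.0;

        SharedDetector(double threshold) { this.threshold = threshold; }

        /** Scores x against the statistics so far, then folds x in; serialized by the lock. */
        synchronized boolean process(double x) {
            boolean anomaly = false;
            if (n > 1) {
                double sd = Math.sqrt(m2 / n);
                anomaly = sd > 0.0 && Math.abs((x - mean) / sd) > threshold;
            }
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
            return anomaly;
        }
    }

    /** Submits one task per chunk to a fixed pool and sums the per-task anomaly counts. */
    static long run(double[] data, int threads, double threshold) {
        SharedDetector detector = new SharedDetector(threshold);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            int chunk = (data.length + threads - 1) / threads;
            for (int t = 0; t < threads; t++) {
                int from = Math.min(data.length, t * chunk);
                int to = Math.min(data.length, from + chunk);
                Callable<Long> task = () -> {
                    long local = 0;
                    for (int i = from; i < to; i++) {
                        if (detector.process(data[i])) local++;
                    }
                    return local;
                };
                futures.add(pool.submit(task));
            }
            long anomalies = 0;
            for (Future<Long> f : futures) anomalies += f.get();
            return anomalies;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```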

3. ParallelStreamEngine

Uses parallelStream()
Backed by ForkJoinPool
High-level implicit parallelism
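Anomaly scoring depends on the values seen before each entry. A parallel stream is therefore simplest to apply to a stateless stage, such as turning raw lines into records. The sketch below parallelizes only that parse step. The "timestamp,value" line format, the LogRecord fields, and the ParallelParse class are all assumptions, and the README does not describe how ParallelStreamEngine actually divides its work.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

/** Hypothetical sketch: only the stateless parse stage runs on a parallel stream. */
final class ParallelParse {

    /** Hypothetical record shape; the repository's LogRecord fields may differ. */
    static final class LogRecord {
        final long timestamp;
        final double value;

        LogRecord(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }

        @Override
        public String toString() { return "LogRecord[" + timestamp + ", " + value + "]"; }
    }

    /** Parses a line in the assumed "timestamp,value" format. */
    static LogRecord parse(String line) {
        int comma = line.indexOf(',');
        if (comma < 0) throw new IllegalArgumentException("expected timestamp,value: " + line);
        return new LogRecord(Long.parseLong(line.substring(0, comma).trim()),
                             Double.parseDouble(line.substring(comma + 1).trim()));
    }

    /**
     * parallelStream() splits the list across ForkJoinPool workers (the common pool by
     * default). Collecting an ordered source with toList() preserves encounter order,
     * so the output order matches the input order even though parsing runs concurrently.
     */
    static List<LogRecord> parseAll(List<String> lines) {
        return lines.parallelStream()
                    .map(ParallelParse::parse)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
        System.out.println(parseAll(List.of("1,10.5", "2,11.0", "3,42.0")));
    }
}
```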

4. PartitionedLocalEngine

Dataset partitioned across threads
Each partition maintains its own sliding window
Results merged after processing

Note: The partitioned model may introduce statistical drift due to isolated windows. This is an intentional trade-off and part of the performance analysis.
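The partitioned model can be sketched by assuming contiguous slices, each with its own statistics. PartitionedDetection is a hypothetical name, and for brevity it uses cumulative statistics in place of a bounded window. Each slice starts with empty statistics, which is one concrete way the drift mentioned in the note can arise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of partitioned processing with per-partition statistics. */
final class PartitionedDetection {

    /** Scores data[from, to) with statistics that start empty for this partition. */
    static long countPartition(double[] data, int from, int to, double threshold) {
        long n = 0, anomalies = 0;
        double mean = 0.0, m2 = 0.0;
        for (int i = from; i < to; i++) {
            double x = data[i];
            if (n > 1) {
                double sd = Math.sqrt(m2 / n);
                if (sd > 0.0 && Math.abs((x - mean) / sd) > threshold) anomalies++;
            }
            n++; // cumulative Welford update (a real engine would use a bounded window)
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        return anomalies;
    }

    /** Splits data into contiguous slices, scores them concurrently, then sums the counts. */
    static long run(double[] data, int partitions, double threshold) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int size = (data.length + partitions - 1) / partitions;
            for (int p = 0; p < partitions; p++) {
                int from = Math.min(data.length, p * size);
                int to = Math.min(data.length, from + size);
                Callable<Long> task = () -> countPartition(data, from, to, threshold);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get(); // merge step after every slice finishes
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With a single partition, the count equals a sequential pass over the whole array. With several partitions, entries near the start of each slice are scored against little or no history.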

Determinism & Reproducibility

Reproducibility is a core design goal.

Synthetic datasets generated using a fixed random seed (42)
Dataset created once per execution
All engines process identical data
Consistency verification mode checks that anomaly counts match across runs

These guarantees mean that performance differences reflect only the execution strategy, not variation in the data.
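The java.util.Random documentation states that two instances created with the same seed, and given the same sequence of method calls, produce identical sequences of numbers. A fixed seed such as 42 relies on this property. The generator below is a hypothetical sketch. The Gaussian baseline (mean 100, standard deviation 10) and the 0.1% spike rate are illustrative assumptions, not the repository's SyntheticDataGenerator settings.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical seeded generator; the distribution and spike rate are illustrative assumptions. */
final class SeededGenerator {
    static double[] generate(int size, long seed) {
        Random rng = new Random(seed); // same seed and same call sequence => same numbers
        double[] values = new double[size];
        for (int i = 0; i < size; i++) {
            double v = 100.0 + 10.0 * rng.nextGaussian(); // baseline: mean 100, std dev 10
            if (rng.nextDouble() < 0.001) v += 80.0;      // occasional injected spike
            values[i] = v;
        }
        return values;
    }

    public static void main(String[] args) {
        double[] a = generate(1_000_000, 42);
        double[] b = generate(1_000_000, 42);
        System.out.println("identical: " + Arrays.equals(a, b)); // identical: true
    }
}
```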

To verify determinism:

java -cp src Main consistency 1000000

If anomaly counts remain identical across executions, determinism is verified.

Benchmark Mode

To compare engine performance:

java -cp src Main benchmark 1000000

Example output:

================ ENGINE BENCHMARK =================

Dataset Size: 1,000,000
Random Seed: 42

----------------------------------------------------------
Engine                  Time (ms)    Anomalies
----------------------------------------------------------
SingleThreadEngine      38           3229
ThreadPoolEngine(4)     37           3229
ParallelStreamEngine    30           3229
PartitionedLocal(4)     54           3232
----------------------------------------------------------

Fastest Engine: ParallelStreamEngine
Slowest Engine: PartitionedLocal(4)

Deterministic anomaly results verified.

This mode ensures:

Fair comparison
Single dataset generation
Structured performance summary
Deterministic anomaly counts
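One way to structure a single-dataset benchmark is to generate the data once and then time each engine on the same array. BenchmarkLoop is a hypothetical name. Modelling each engine as a function from the dataset to an anomaly count is an assumption about how EngineBenchmarkRunner is wired. The toy engines in main simply count values above a cutoff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.ToLongFunction;

/** Hypothetical single-dataset benchmark loop; not the repository's EngineBenchmarkRunner. */
final class BenchmarkLoop {

    /** One row of the summary table. */
    static final class Row {
        final String engine;
        final long millis;
        final long anomalies;

        Row(String engine, long millis, long anomalies) {
            this.engine = engine;
            this.millis = millis;
            this.anomalies = anomalies;
        }
    }

    /** Times every engine on the same array, in map insertion order. */
    static List<Row> run(double[] data, Map<String, ToLongFunction<double[]>> engines) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, ToLongFunction<double[]>> e : engines.entrySet()) {
            long start = System.nanoTime();
            long anomalies = e.getValue().applyAsLong(data);
            long millis = (System.nanoTime() - start) / 1_000_000;
            rows.add(new Row(e.getKey(), millis, anomalies));
        }
        return rows;
    }

    static Row fastest(List<Row> rows) {
        return rows.stream().min(Comparator.comparingLong((Row r) -> r.millis)).orElseThrow();
    }

    /** True when every engine reported the same anomaly count. */
    static boolean countsAgree(List<Row> rows) {
        return rows.stream().mapToLong(r -> r.anomalies).distinct().count() <= 1;
    }

    public static void main(String[] args) {
        double[] data = new Random(42).doubles(1_000_000).toArray(); // generated once
        Map<String, ToLongFunction<double[]>> engines = new LinkedHashMap<>();
        engines.put("sequential", d -> Arrays.stream(d).filter(v -> v > 0.999).count());
        engines.put("parallel", d -> Arrays.stream(d).parallel().filter(v -> v > 0.999).count());
        List<Row> rows = run(data, engines);
        for (Row r : rows) System.out.println(r.engine + "\t" + r.millis + " ms\t" + r.anomalies);
        System.out.println("fastest: " + fastest(rows).engine + ", counts agree: " + countsAgree(rows));
    }
}
```

Each engine is timed once with System.nanoTime and no warm-up, so figures in the tens of milliseconds are sensitive to JIT compilation and garbage collection. A harness such as JMH is the usual choice when those effects need to be controlled.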

Performance Metrics

The following metrics are evaluated:

Total execution time (ms)
Throughput (logs processed per second)
Anomaly count consistency
Relative speedup vs single-thread baseline

No engine-specific optimizations are applied. All results reflect raw execution behavior under identical workloads.
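Throughput and relative speedup follow directly from the timing column. The sketch below applies both metrics to the example benchmark output (1,000,000 logs). These are single-run figures from that example, not averaged measurements, and BenchmarkMetrics is a hypothetical name.

```java
import java.util.Locale;

/** Derived metrics for the benchmark table; the inputs below come from the example output. */
final class BenchmarkMetrics {
    /** Logs per second = logs / (milliseconds / 1000). */
    static double throughputPerSecond(long logs, double millis) {
        return logs / (millis / 1000.0);
    }

    /** Speedup relative to the single-thread baseline; below 1.0 means slower. */
    static double speedup(double baselineMillis, double engineMillis) {
        return baselineMillis / engineMillis;
    }

    public static void main(String[] args) {
        long logs = 1_000_000;
        double baseline = 38; // SingleThreadEngine time in ms
        System.out.printf(Locale.ROOT, "SingleThreadEngine:   %.1f M logs/s%n",
                throughputPerSecond(logs, baseline) / 1e6);                  // 26.3 M logs/s
        System.out.printf(Locale.ROOT, "ParallelStreamEngine: %.1f M logs/s, %.2fx%n",
                throughputPerSecond(logs, 30) / 1e6, speedup(baseline, 30)); // 33.3, 1.27x
        System.out.printf(Locale.ROOT, "PartitionedLocal(4):  %.1f M logs/s, %.2fx%n",
                throughputPerSecond(logs, 54) / 1e6, speedup(baseline, 54)); // 18.5, 0.70x
    }
}
```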

Usage

Compile

javac src/*.java

Run

java -cp src Main

Available Modes

single
threadpool
parallel
partitioned
consistency
benchmark

Examples

java -cp src Main single 100000
java -cp src Main threadpool 1000000
java -cp src Main benchmark 1000000
java -cp src Main consistency 500000

If dataset size is omitted, a default value is used.

Experimental Setup

Datasets tested:

50,000 logs
100,000 logs
1,000,000+ logs

All experiments use:

Identical hardware
Same JVM
Same random seed
Same anomaly detection threshold
Identical workload

Only the concurrency model changes.

Key Observations

Based on benchmark runs:

ParallelStreamEngine often performs best under high load
ThreadPoolEngine performs comparably with explicit control
Single-threaded execution remains competitive at smaller dataset sizes
PartitionedLocalEngine trades statistical consistency for potential scalability gains
Deterministic anomaly detection is feasible at high throughput without ML overhead

Project Structure

src/
  Main.java
  SyntheticDataGenerator.java
  ConsistencyChecker.java
  EngineBenchmarkRunner.java
  ExecutionEngine.java
  SingleThreadEngine.java
  ThreadPoolEngine.java
  ParallelStreamEngine.java
  PartitionedLocalEngine.java
  SlidingWindowBuffer.java
  WindowedStatistics.java
  IncrementalStatistics.java
  AnomalyDetector.java
  LogAnomalyEngine.java
  ...

The design separates:

Data representation
Statistical logic
Execution strategy
Benchmarking utilities
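The separation between statistical logic and execution strategy can be illustrated with two small interfaces. Detector, Engine, SequentialEngine, and SlicedEngine are hypothetical names, not the repository's ExecutionEngine or AnomalyDetector signatures. Passing a Supplier of detectors lets each strategy decide how many independent detector instances it needs.

```java
import java.util.function.Supplier;

/** Statistical logic (hypothetical): consumes values in order and flags anomalies. */
interface Detector {
    boolean offer(double value);
}

/** Execution strategy (hypothetical): decides how values reach Detector instances. */
interface Engine {
    String name();

    long countAnomalies(double[] data, Supplier<Detector> detectors);
}

/** Baseline strategy: one detector sees every value in order. */
final class SequentialEngine implements Engine {
    @Override
    public String name() { return "Sequential"; }

    @Override
    public long countAnomalies(double[] data, Supplier<Detector> detectors) {
        Detector detector = detectors.get();
        long count = 0;
        for (double v : data) if (detector.offer(v)) count++;
        return count;
    }
}

/** Partitioned strategy: a fresh detector per contiguous slice (run sequentially for clarity). */
final class SlicedEngine implements Engine {
    private final int slices;

    SlicedEngine(int slices) {
        if (slices <= 0) throw new IllegalArgumentException("slices must be positive");
        this.slices = slices;
    }

    @Override
    public String name() { return "Sliced(" + slices + ")"; }

    @Override
    public long countAnomalies(double[] data, Supplier<Detector> detectors) {
        int size = (data.length + slices - 1) / slices;
        long count = 0;
        for (int from = 0; from < data.length; from += size) {
            Detector detector = detectors.get(); // isolated state per slice
            int to = Math.min(data.length, from + size);
            for (int i = from; i < to; i++) if (detector.offer(data[i])) count++;
        }
        return count;
    }
}
```

With a stateless toy detector, both strategies return the same count. With a stateful detector such as a Z-score window, the sliced strategy can diverge, which mirrors the PartitionedLocalEngine trade-off.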

What This Project Is

Systems engineering study
Concurrency performance comparison
Deterministic real-time statistical processing
Reproducible benchmarking framework

What This Project Is Not

Machine learning research
Distributed systems framework
Big data platform
Production log aggregation service

Research Framing

This project contributes:

A reproducible experimental framework for evaluating Java concurrency models
Evidence that incremental statistics enable real-time anomaly detection
Insight into performance trade-offs between abstraction and manual thread control
Analysis of consistency vs scalability trade-offs in partitioned processing

Conclusion

This repository provides a controlled experimental environment for studying concurrency behavior in Java under statistically grounded real-time workloads.

It demonstrates that meaningful performance insights can be obtained without distributed infrastructure or machine learning complexity, provided that experimental rigor and determinism are enforced.

Future Work

Potential extensions include:

Support for streaming input sources
Integration with real log datasets
Adaptive threshold mechanisms
Visualization dashboards
Distributed execution experiments

License

This project is intended for academic and educational use.

Author

Abhinav Sai Gunnampalli (Abhiix0)
Yashwanth Abhishek Guvvala (Yashabhi0)
