A multi-stage AI security assessment platform for Android applications combining static analysis, dynamic sandboxing, LLM-powered behavioral analysis, and explainable machine learning.
Features • Architecture • Installation • Usage • API Reference • Development
- Overview
- Features
- Architecture
- Technology Stack
- Installation
- Configuration
- Usage
- API Reference
- Pipeline Components
- Machine Learning
- Frontend Architecture
- Testing
- Deployment
- Project Structure
- Contributing
- License
MobileGuard AI is an enterprise-grade Android malware detection system designed for financial institutions, security operations centers (SOCs), and cybersecurity agencies. It provides comprehensive threat analysis through a 7-stage pipeline:
- Static Analysis - APK decompilation, permission analysis, API usage patterns, certificate validation, code obfuscation detection
- YARA Signature Scanning - Multi-rule malware family detection with severity-based scoring
- MITRE ATT&CK Mapping - Permission and API call mapping to mobile attack techniques
- Malware Family Classification - Rule-based family detection (BankBot, Spyware, RAT, Ransomware)
- Dynamic Analysis - Sandbox execution with runtime event collection, behavioral anomaly detection, and ADB-based monitoring
- LLM Analysis - Resilient multi-tier LLM routing (Gemini 2.5 Flash → GPT-4o) with smart truncation
- Risk Scoring - XGBoost-based ML classifier with SHAP explainability and context-aware boost rules
Key Differentiators:
- Explainable AI - SHAP values show which features drove the risk score
- Regional Threat Intelligence - India-specific banking malware patterns (UPI, OTP interception)
- Real-time Streaming - Server-sent events provide live analysis progress
- Multi-modal Analysis - Combines YARA signatures, MITRE mapping, ML, and LLM approaches
- Resilient LLM Infrastructure - Automatic failover between Gemini and OpenAI with context window management
- Production-Ready Sandbox - Full ADB + Frida integration with runtime event collection
- VirusTotal Integration - Hash-based reputation checking
- Audit Trail - JSONL-based audit logging with SQLite feature caching
- APK Parsing - Androguard-based decompilation with manifest extraction and DEX analysis
- Permission Profiling - 22+ dangerous permission detection with combo risk scoring (READ_SMS + INTERNET = +10 points)
- API Fingerprinting - Call graph analysis with 14+ suspicious API pattern matching (sendTextMessage, Runtime.exec, DexClassLoader)
- Obfuscation Detection - Shannon entropy analysis on strings (>4.5 threshold), base64 pattern matching
- Certificate Validation - Self-signed cert detection, validity period analysis, issuer verification, debug cert flagging
- Native Code Inspection - `.so` library enumeration with known malware signature matching (libfrida-gadget.so, libinject.so)
- Call Graph Construction - NetworkX-based control flow analysis with graph density metrics
- VirusTotal Integration - SHA256 hash reputation checking with malicious/suspicious count extraction
- C2 Domain Detection - URL extraction with threat intelligence feed matching
- Execution Modes - Live sandbox (ADB + Frida + logcat) with automatic device detection and fallback to emulated mode
- Behavioral Monitoring - SMS send attempts, accessibility service abuse, silent install detection, overlay detection
- Runtime Event Collection - CollectorOrchestrator with multiple specialized collectors (Frida hooks, dumpsys, logcat parsing)
- Network Traffic Analysis - Domain extraction from logcat, C2 server detection with threat intel matching
- System State Analysis - dumpsys inspection for accessibility services, device admin, foreground services, overlay windows
- Device Admin Detection - Privilege escalation attempt monitoring via device_policy dumpsys
- Behavioral Scoring - Evidence-based anomaly score (0-100) calculated from observed runtime events
- Safe Failure Handling - Graceful degradation to emulated mode when ADB unavailable or device unreachable
- Malware Family Matching - Behavioral pattern correlation against known family signatures
- Resilient Multi-Tier Routing - Primary: Gemini 2.5 Flash → Fallback: GPT-4o with automatic failover
- Smart Truncation - Center-out code truncation (top 40% + bottom 40%) for context window management
- Native JSON Output - Structured response format with strict schema validation and markdown stripping
- Contextual Analysis - Decompiled code interpretation with malicious behavior extraction and evidence citations
- Evidence-Based Reasoning - Cites specific class names, methods, API calls, and permissions in findings
- Zero-Day Hypothesis Generation - Novel threat detection for unknown malware families (triggered when severity > 0.6 AND family_similarity < 0.4)
- India-Specific Risk Assessment - UPI, BHIM, PhonePe, Paytm targeting detection with regional threat patterns
- Structured JSON Output - Confidence scores, verdict classification (APPROVE/MONITOR/ESCALATE/BLOCK), executive summaries
- Security Analyst Persona - 15+ years malware analysis experience, national SOC-level assessment standards
- Automatic Retry Logic - 2-attempt system with progressive truncation on context window errors
- XGBoost Classifier - 300 tree ensemble with early stopping, class imbalance handling, and CPU-optimized hist tree method
- Dataset Feature Extraction - 37 engineered features from Drebin/CIC-AndMal2017 compatible format
- SHAP Explainability - TreeExplainer integration with top-5 feature attribution and waterfall visualization
- Synthetic Training Data - Statistical distribution matching for benign (n=800) and malicious (n=800) samples
- Multi-Dimensional Scoring - 6 weighted risk dimensions:
- Permission Abuse (10%)
- Obfuscation (10%)
- Behavioral Anomaly (15%)
- ML Malware (45%)
- Developer Trust (10%)
- LLM Severity (10%)
- Context-Aware Boost Rules - Dynamic score amplification:
- SMS send attempts (+15)
- C2 domains contacted (+20)
- Accessibility abuse (+12)
- Static C2 IPs (+10)
- LLM verdict BLOCK (+10)
- Silent install attempt (+15)
- Root activity detected (+15)
- Shell execution (+10)
- Dynamic code loading (+10)
- VirusTotal malicious ≥5 (+25)
- YARA signature matches (+3 to +25)
- Threat Reports - Structured JSON with verdict, forensic indicators, recommended actions, MITRE techniques
- Audit Logging - ISO 8601 timestamped JSONL logs with dimension scores, SHAP values, and YARA matches
- Feature Store - SQLite-based result caching with model version tracking for duplicate APK detection
- CERT-In Compliance - Reporting format aligned with Indian cybersecurity standards
- YARA Match Reporting - Matched rule names, families, severity levels, and specific string matches
- MITRE ATT&CK Coverage - Tactic-grouped technique IDs with evidence traceability
- Family Attribution - Confidence-scored malware family classification with matched behavioral signals
- Real-time Updates - Server-sent events with live progress tracking across 7 pipeline stages
- Interactive Visualizations - Recharts-based risk gauge, 6-axis dimension radar charts, SHAP waterfall plots
- Framer Motion Animations - Smooth page transitions and component mounting with spring physics
- Tailwind CSS Design System - Dark mode with glassmorphism effects and responsive 12-column grid
- Responsive Layout - Mobile-first design with adaptive breakpoints
- Lucide Icons - Shield, Activity, AlertTriangle, FileSearch, Brain, BarChart, and 50+ security icons
- Component Library - 8 specialized components:
- UploadZone (drag-and-drop with 150MB client-side validation)
- ProgressTracker (5-stage pipeline visualization)
- RiskGauge (circular gauge with dynamic gradient coloring)
- DimensionChart (6-axis radar with tooltips)
- ShapExplainer (top-5 feature attribution bars)
- ThreatReport (collapsible sections with copy-to-clipboard)
- ActionBanner (verdict display with color-coded badges)
- AuditLog (paginated table with filters)
graph TB
subgraph "Client Layer"
UI[React Frontend<br/>Vite + Tailwind CSS]
Browser[Web Browser<br/>Chrome/Firefox/Safari]
end
subgraph "API Gateway"
API[FastAPI Server<br/>Uvicorn ASGI]
CORS[CORS Middleware]
SSE[Server-Sent Events<br/>Streaming]
end
subgraph "Core Pipeline"
Orch[Pipeline Orchestrator<br/>Event Coordinator]
Cache{Cache Check<br/>SHA256 + Version}
end
subgraph "Analysis Engines"
Static[Static Analyzer<br/>Androguard + NetworkX]
YARA[YARA Engine<br/>Signature Scanner]
MITRE[MITRE Mapper<br/>ATT&CK Techniques]
Family[Family Classifier<br/>Rule-Based Detection]
Dynamic[Dynamic Analyzer<br/>ADB + Frida + Logcat]
LLM[LLM Analyzer<br/>Resilient Router]
Scorer[Risk Scorer<br/>XGBoost + SHAP]
Report[Report Generator<br/>Intelligence Synthesis]
end
subgraph "External Services"
Gemini[Google Gemini 2.5 Flash<br/>Primary LLM]
GPT[OpenAI GPT-4o<br/>Fallback LLM]
VT[VirusTotal API<br/>Hash Reputation]
ADB[Android Device<br/>Live Sandbox]
end
subgraph "Data Layer"
FS[(SQLite<br/>Feature Store)]
Audit[(JSONL<br/>Audit Logs)]
Model[(XGBoost Model<br/>SHAP Explainer)]
Rules[(YARA Rules<br/>.yar Files)]
Intel[(Threat Intel<br/>C2 Blocklists)]
end
Browser -->|Upload APK| UI
UI <-->|REST API<br/>SSE Stream| API
API --> CORS
CORS --> Orch
Orch --> Cache
Cache -->|Hit| UI
Cache -->|Miss| Static
Static --> YARA
Static -->|Hash Check| VT
YARA --> MITRE
MITRE --> Family
Family --> Dynamic
Dynamic -->|If Available| ADB
Dynamic --> LLM
LLM -->|Primary| Gemini
LLM -->|Fallback| GPT
LLM --> Scorer
Scorer --> Report
Static -.->|Load| Model
Scorer -.->|Load| Model
YARA -.->|Load| Rules
Static -.->|Query| Intel
Dynamic -.->|Query| Intel
Report -->|Cache| FS
Report -->|Log| Audit
Cache -.->|Query| FS
Report -->|Final Result| SSE
SSE --> UI
style UI fill:#3B82F6,stroke:#1E40AF,color:#fff
style API fill:#10B981,stroke:#047857,color:#fff
style Orch fill:#F59E0B,stroke:#D97706,color:#fff
style Gemini fill:#8B5CF6,stroke:#6D28D9,color:#fff
style GPT fill:#8B5CF6,stroke:#6D28D9,color:#fff
style FS fill:#EF4444,stroke:#B91C1C,color:#fff
style Model fill:#EF4444,stroke:#B91C1C,color:#fff
sequenceDiagram
participant User
participant Frontend
participant API
participant Orchestrator
participant Analyzers
participant LLM
participant Database
User->>Frontend: Upload APK File (150MB max)
Frontend->>API: POST /analyze (multipart/form-data)
API->>Orchestrator: Initialize Pipeline
Orchestrator->>Database: Check Cache (SHA256)
alt Cache Hit
Database-->>Orchestrator: Return Cached Result
Orchestrator-->>Frontend: SSE: cache_hit (100%)
else Cache Miss
Orchestrator-->>Frontend: SSE: static_analysis (10%)
Orchestrator->>Analyzers: Static Analysis
Analyzers-->>Orchestrator: Static Features
Orchestrator-->>Frontend: SSE: yara_scan (25%)
Orchestrator->>Analyzers: YARA + MITRE + Family
Analyzers-->>Orchestrator: Signatures + Techniques
Orchestrator-->>Frontend: SSE: dynamic_analysis (40%)
Orchestrator->>Analyzers: Dynamic Analysis
Analyzers-->>Orchestrator: Runtime Events
alt Composite Score > 40
Orchestrator-->>Frontend: SSE: llm_analysis (80%)
Orchestrator->>LLM: Analyze Code
LLM->>LLM: Try Gemini 2.5 Flash
alt Gemini Success
LLM-->>Orchestrator: Analysis Result
else Gemini Fail
LLM->>LLM: Fallback to GPT-4o
LLM-->>Orchestrator: Analysis Result
end
else Score ≤ 40
Orchestrator-->>Frontend: SSE: llm_skipped (80%)
end
Orchestrator-->>Frontend: SSE: risk_scoring (90%)
Orchestrator->>Analyzers: ML Score + SHAP
Analyzers-->>Orchestrator: Risk Score + Attribution
Orchestrator->>Database: Cache Result
Orchestrator->>Database: Log Audit Entry
Orchestrator-->>Frontend: SSE: complete (100%)
end
Frontend->>User: Display Risk Gauge + Report
graph TD
A[Frontend React SPA] -->|HTTP POST /analyze| B[FastAPI Backend]
B -->|Stream SSE Events| A
B --> C[Pipeline Orchestrator]
C --> D[Static Analyzer]
D -->|Androguard| D1[APK Decompilation]
D1 --> D2[Permission Analysis]
D2 --> D3[API Call Graph]
D3 --> D4[Certificate Validation]
D4 --> D5[Obfuscation Detection]
D5 --> D6[VirusTotal Hash Check]
C --> Y[YARA Engine]
Y -->|Scan DEX/Manifest/.so| Y1[Signature Matching]
Y1 --> Y2[Severity Scoring]
C --> M[MITRE Mapper]
M --> M1[Permission β Technique]
M1 --> M2[API β Technique]
C --> FC[Family Classifier]
FC --> FC1[BankBot/Spyware/RAT Rules]
C --> E[Dynamic Analyzer]
E -->|ADB + Frida| E1[Sandbox Execution]
E1 --> E2[Runtime Collectors]
E2 --> E3[Behavioral Scoring]
E3 --> E4[Event Mapping]
C --> F[LLM Analyzer]
F -->|Resilient Router| F1[Gemini 2.5 Flash]
F1 -->|Fallback| F2[GPT-4o]
F2 --> F3[Smart Truncation]
F3 --> F4[JSON Validation]
C --> G[Risk Scorer]
G -->|Dataset Features| G1[XGBoost Model]
G1 -->|SHAP| G2[Feature Attribution]
G2 --> G3[6 Dimension Scores]
G3 --> G4[Boost Rules]
C --> H[Report Generator]
H --> I[Threat Report]
G --> J[Feature Store]
J -->|SQLite| K[(Cache DB)]
H --> L[Audit Logger]
L -->|JSONL| N[(Audit Logs)]
- User uploads APK → Frontend sends multipart/form-data POST
- Backend validates → Size check (150MB max), magic byte verification (PK header)
- Orchestrator streams events → Each stage emits SSE with progress percentage
- Static analysis → 14 numeric features + graph topology metrics
- Dynamic analysis → Sandbox execution (if enabled) or emulated mode
- LLM analysis → Gemini API call with decompiled code context
- Risk scorer builds feature vector → 37-dimensional array for XGBoost
- SHAP explainer → Top-5 feature contributions extracted
- Report generated → JSON with verdict (APPROVE/MONITOR/ESCALATE/BLOCK)
- Results cached → SQLite feature store + JSONL audit log
- Frontend renders → Risk gauge, dimension chart, SHAP waterfall, threat report
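The backend validation step in the flow above can be sketched as follows. This is a minimal illustration, not the project's actual code: `validate_apk_upload`, `MAX_APK_SIZE_BYTES`, and `ZIP_MAGIC` are hypothetical names, and reading "150MB" as 150 × 1024 × 1024 bytes is an assumption.

```python
# Hypothetical sketch of the upload checks; names are illustrative only.
MAX_APK_SIZE_BYTES = 150 * 1024 * 1024  # ASSUMPTION: "150MB" read as 150 MiB
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header signature (an APK is a ZIP archive)


def validate_apk_upload(data: bytes, max_size: int = MAX_APK_SIZE_BYTES) -> tuple[bool, str]:
    """Return (ok, reason) after the size check and the PK header check."""
    if len(data) > max_size:
        return False, "file too large"
    if not data.startswith(ZIP_MAGIC):
        return False, "missing PK header"
    return True, "ok"


print(validate_apk_upload(b"PK\x03\x04...rest of archive"))
print(validate_apk_upload(b"not a zip"))
```

A passing magic-byte check only shows that the file looks like a ZIP archive; it does not prove the archive is a well-formed APK.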
| Component | Technology | Version | Purpose |
|---|---|---|---|
| API Framework | FastAPI | 0.111.0 | Async REST API with OpenAPI docs |
| ASGI Server | Uvicorn | 0.30.1 | Production ASGI server with WebSocket support |
| APK Analysis | Androguard | 3.3.5 | DEX decompilation, manifest parsing |
| ML Framework | XGBoost | 2.0.3 | Gradient boosting classifier |
| Explainability | SHAP | 0.45.1 | TreeExplainer for feature attribution |
| LLM API | Google Gemini + OpenAI | 2.5 Flash / GPT-4o | Resilient multi-tier routing |
| Graph Analysis | NetworkX | 3.3 | Call graph construction |
| Data Processing | Pandas + NumPy | 2.2.2 + 1.26.4 | Feature engineering |
| Database | SQLAlchemy | 2.0.30 | ORM for SQLite feature store |
| File Type Detection | python-magic | 0.4.27 | APK validation |
| YARA Scanner | yara-python | 4.5.1 | Signature-based detection |
| Testing | pytest + httpx | 8.2.2 + 0.27.0 | Unit/integration tests |
| Component | Technology | Version | Purpose |
|---|---|---|---|
| Framework | React | 19.2.6 | Component-based UI |
| Build Tool | Vite | 8.0.12 | Fast HMR development server |
| Styling | Tailwind CSS | 3.4.19 | Utility-first CSS framework |
| Charts | Recharts | 3.8.1 | D3-based data visualization |
| Animations | Framer Motion | 12.40.0 | Declarative animations |
| Icons | Lucide React | 1.20.0 | SVG icon library |
| HTTP Client | Fetch API | Native | Server-sent events streaming |
- Containerization - Docker + Docker Compose
- Reverse Proxy - Nginx (frontend static serving)
- Storage - SQLite (feature cache), JSONL (audit logs)
# Python 3.11+
python --version # Should be >= 3.11
# Node.js 20+
node --version # Should be >= 20
# Docker & Docker Compose (optional)
docker --version
docker-compose --version
# Java Runtime (for Androguard)
java -version # Required for APK decompilation
# ADB (for live sandbox mode)
adb version # Optional - only if USE_LIVE_SANDBOX=true
# 1. Clone the repository
git clone https://github.com/yourusername/mobileguard-ai.git
cd mobileguard-ai
# 2. Configure environment variables
cp .env.example .env
nano .env # Add your GEMINI_API_KEY
# 3. Build and run with Docker Compose
docker-compose up --build
# 4. Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docs
cd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Train the XGBoost model (generates models/xgboost_mobileguard.json)
python -m backend.training.train_xgboost
# Start the API server
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
cd frontend
# Install dependencies
npm install
# Start development server with HMR
npm run dev
# Build for production
npm run build
npm run preview # Preview production build
# Required
GEMINI_API_KEY="your-gemini-api-key-here"
OPENAI_API_KEY="your-openai-api-key-here" # For LLM fallback tier
# Optional
VIRUSTOTAL_API_KEY="your-vt-api-key" # For threat intelligence enrichment
USE_LIVE_SANDBOX="false" # Enable ADB-based sandbox (requires devices)
MAX_APK_SIZE_MB="150" # Max upload size
SANDBOX_TIMEOUT_SECS="90" # Dynamic analysis timeout
# Paths (auto-configured)
FEATURE_CACHE_DB="data/feature_cache.sqlite"
AUDIT_LOG_PATH="data/audit.jsonl"
MODEL_PATH="models/xgboost_mobileguard.json"
Backend Configuration (backend/config.py):
- `LLM_MODEL` - Gemini model name (default: `gemini-2.0-flash`)
- `RISK_THRESHOLDS` - Score boundaries for APPROVE/MONITOR/ESCALATE/BLOCK
- `DANGEROUS_PERMISSIONS` - Permission risk weights (0-5 scale)
- `SUSPICIOUS_API_PATTERNS` - Regex patterns for malicious API detection
Frontend Configuration (frontend/vite.config.js):
- Build settings for production optimization
- Proxy configuration for local development
Tailwind Config (frontend/tailwind.config.js):
- Custom color palette (background, accent, danger, success)
- Animation keyframes for glow effects
- Navigate to `http://localhost:3000`
- Check System Status - Verify API health and model loading
- Upload APK - Drag & drop or click to select `.apk` file (max 150MB)
- Monitor Progress - Watch real-time analysis stages:
- Static Analysis (0-30%)
- Dynamic Analysis (30-50%)
- LLM Analysis (50-70%)
- Risk Scoring (70-90%)
- Report Generation (90-100%)
- Review Results:
- Risk Gauge - Composite score with action recommendation
- Dimension Chart - 6 risk dimension breakdown
- SHAP Explainer - Top-5 features driving the score
- Threat Report - Executive summary with forensic indicators
- Audit Log - View historical analyses with scores and timestamps
curl -X POST "http://localhost:8000/analyze" \
-H "Content-Type: multipart/form-data" \
-F "file=@sample.apk" \
--no-buffer # Required for SSE streaming
Response (Server-Sent Events):
data: {"stage":"static_analysis","status":"running","progress":10}
data: {"stage":"yara_scan","status":"running","progress":25}
data: {"stage":"cache_hit","status":"done","progress":20} # If APK previously analyzed
data: {"stage":"dynamic_analysis","status":"running","progress":40}
data: {"stage":"llm_skipped","status":"done","progress":80} # If composite_score ≤ 40
data: {"stage":"llm_analysis","status":"running","progress":80} # If composite_score > 40
data: {"stage":"risk_scoring","status":"running","progress":60}
data: {"stage":"report_generation","status":"running","progress":90}
data: {"stage":"complete","status":"done","progress":100,"result":{...}}
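A client can consume this stream by reading it line by line and decoding each `data:` payload. The sketch below is a rough illustration that assumes each event carries one JSON object on a single `data:` line, as in the examples above; `iter_sse_events` is a hypothetical helper, and the sample lines stand in for a real HTTP response body.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of every `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and other SSE fields
        yield json.loads(line[len("data:"):].strip())


# Stand-in for the lines of a streamed response body.
sample = [
    'data: {"stage":"static_analysis","status":"running","progress":10}',
    "",
    'data: {"stage":"complete","status":"done","progress":100,"result":{}}',
]

for event in iter_sse_events(sample):
    print(event["stage"], event["progress"])
```

The SSE format also allows one event to span several `data:` lines; a general-purpose client would need to join those before decoding.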
curl http://localhost:8000/health
Response:
{
"status": "ok",
"version": "1.0.0",
"model_loaded": true,
"llm_available": true,
"sandbox_available": false
}
curl http://localhost:8000/analysis/{apk_sha256_hash}
curl "http://localhost:8000/audit-log?limit=10&offset=0"
Analyze an APK file with full pipeline execution.
Request:
- Body - `multipart/form-data`
- Field - `file` (APK binary, max 150MB)
Response:
- Content-Type - `text/event-stream`
- Events - JSON objects with `stage`, `status`, `progress`, `error`, `result` fields
Error Codes:
- `422` - Invalid APK format (not a ZIP/PK header)
- `413` - File too large (> 150MB)
- `500` - Analysis pipeline failure
System health and component availability.
Response:
{
"status": "ok",
"version": "1.0.0",
"model_loaded": true,
"sandbox_available": false
}
Retrieve cached analysis by SHA256 hash.
Response: Full AnalysisResult JSON
Error Codes:
- `404` - Hash not found in cache
- `503` - Feature store unavailable
Fetch audit log entries.
Query Parameters:
- `limit` (int) - Max entries (default: 50)
- `offset` (int) - Pagination offset (default: 0)
Response:
{
"entries": [
{
"apk_hash": "abc123...",
"filename": "sample.apk",
"score": 68.5,
"action": "ESCALATE",
"analyzed_at": "2026-06-18T10:30:45Z"
}
]
}
Remove cached analysis result.
Response: {"status": "ok"}
File: backend/pipeline/orchestrator.py
The orchestrator coordinates all 7 analysis stages with intelligent caching and conditional LLM invocation:
- Cache Check - Query feature store by SHA256 hash + model version
- Static Analysis - APK decompilation and feature extraction
- YARA Scanning - Signature-based malware family detection
- MITRE Mapping - Permission/API → ATT&CK technique mapping
- Family Classification - Rule-based family detection
- Dynamic Analysis - Runtime behavior monitoring (if sandbox available)
- Conditional LLM Analysis - Only invoked if `composite_score > 40` (cost optimization)
- Risk Scoring - ML prediction + SHAP + boost rules
- Report Generation - Threat report with actionable intelligence
Key Optimizations:
- Smart LLM Gating - Skip expensive LLM calls for low-risk apps (saves 70% of API costs)
- Hash-Based Caching - Instant results for previously analyzed APKs
- Model Version Tracking - Cache invalidation when model updates
- SSE Streaming - Real-time progress updates to frontend
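The cache check and the LLM gate described above can be sketched roughly as follows. The function names, the key format, and the `LLM_SCORE_GATE` constant are assumptions made for illustration; only the SHA256 + model version pairing and the `> 40` threshold come from the text.

```python
import hashlib

LLM_SCORE_GATE = 40  # from the text: the LLM runs only if composite_score > 40


def cache_key(apk_bytes: bytes, model_version: str) -> str:
    """Combine the APK's SHA256 with the model version.

    Including the version means a model update produces a different key,
    so stale results are never served (hypothetical key format).
    """
    digest = hashlib.sha256(apk_bytes).hexdigest()
    return f"{digest}:{model_version}"


def should_invoke_llm(composite_score: float) -> bool:
    """Skip the LLM call for low-risk apps."""
    return composite_score > LLM_SCORE_GATE


print(cache_key(b"example apk bytes", "1.0.0"))
print(should_invoke_llm(40), should_invoke_llm(41))
```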
Implementation: backend/pipeline/static_analyzer.py
Extracts 27+ static features using Androguard with VirusTotal enrichment.
Core Capabilities:
- APK decompilation and manifest parsing
- Permission risk scoring with combo detection
- Call graph construction using NetworkX
- Certificate chain validation
- Obfuscation detection via Shannon entropy
- Native library inspection
- C2 domain/IP extraction
- VirusTotal hash reputation check
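As an illustration of the entropy-based obfuscation check, the sketch below computes Shannon entropy per string and counts those above the 4.5 threshold mentioned earlier. It assumes the threshold is measured in bits per character; the function names are hypothetical and not taken from `static_analyzer.py`.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 4.5  # ASSUMPTION: bits per character


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def count_high_entropy(strings: list[str]) -> int:
    """Count strings whose entropy exceeds the threshold."""
    return sum(1 for s in strings if shannon_entropy(s) > ENTROPY_THRESHOLD)


candidates = ["hello", "abcdefghijklmnopqrstuvwxyz012345"]
print([round(shannon_entropy(s), 2) for s in candidates])
print(count_high_entropy(candidates))
```

Short strings can never reach the threshold (a string needs more than 22 distinct characters to exceed 4.5 bits), so this check mostly fires on long encoded or encrypted blobs.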
Implementation: backend/detection/yara_engine.py
Production-grade signature scanning with metadata-aware severity scoring.
Architecture:
- Independent rule compilation (one broken rule doesn't disable engine)
- Unpacked content scanning (DEX, AndroidManifest.xml, .so files)
- Cross-component deduplication
- Severity-based score weighting (CRITICAL: 100, HIGH: 70, MEDIUM: 40, LOW: 15)
- Action prioritization (BLOCK > ESCALATE > MONITOR > APPROVE)
- Safe failure handling with scan error reporting
Score Contribution:
- `severity_score` - Highest single rule weight (0-100)
- `score_boost` - Additive boost for risk composite (capped at 40)
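The two score contributions can be sketched as below. The severity weights and the cap of 40 are taken from the text. The per-match boost values are an assumption: the document only gives a +3 to +25 range, so `BOOST_PER_MATCH` is a hypothetical mapping chosen to fit that range.

```python
SEVERITY_WEIGHTS = {"CRITICAL": 100, "HIGH": 70, "MEDIUM": 40, "LOW": 15}
# ASSUMPTION: illustrative per-match boosts within the documented +3 to +25 range.
BOOST_PER_MATCH = {"CRITICAL": 25, "HIGH": 15, "MEDIUM": 8, "LOW": 3}
SCORE_BOOST_CAP = 40


def score_yara_matches(severities: list[str]) -> tuple[int, int]:
    """Return (severity_score, score_boost) for the severities of matched rules."""
    if not severities:
        return 0, 0
    severity_score = max(SEVERITY_WEIGHTS[s] for s in severities)
    score_boost = min(SCORE_BOOST_CAP, sum(BOOST_PER_MATCH[s] for s in severities))
    return severity_score, score_boost


print(score_yara_matches(["LOW", "HIGH"]))
print(score_yara_matches(["CRITICAL", "CRITICAL"]))
```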
Implementation: backend/intel/mitre_mapper.py
Maps permissions and APIs to MITRE Mobile ATT&CK techniques.
Coverage:
- 20+ permission mappings (SMS → T1636.004, Accessibility → T1417)
- 10+ API mappings (DexClassLoader → T1407, Runtime.exec → T1406)
- Tactic-grouped findings (Collection, Persistence, Defense Evasion, etc.)
- Evidence traceability (each technique links to triggering signal)
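A lookup-table version of this mapping might look like the sketch below. The technique IDs are the ones listed above; the exact permission strings and the substring matching on API names are assumptions, and `map_to_techniques` is a hypothetical function name.

```python
# Technique IDs come from the coverage list; permission strings are ASSUMED examples.
PERMISSION_TO_TECHNIQUE = {
    "android.permission.READ_SMS": "T1636.004",
    "android.permission.BIND_ACCESSIBILITY_SERVICE": "T1417",
}
API_TO_TECHNIQUE = {
    "DexClassLoader": "T1407",
    "Runtime.exec": "T1406",
}


def map_to_techniques(permissions: list[str], api_calls: list[str]) -> dict[str, list[str]]:
    """Return {technique_id: [triggering signals]} so each finding stays traceable."""
    findings: dict[str, list[str]] = {}
    for perm in permissions:
        technique = PERMISSION_TO_TECHNIQUE.get(perm)
        if technique:
            findings.setdefault(technique, []).append(perm)
    for api in api_calls:
        for pattern, technique in API_TO_TECHNIQUE.items():
            if pattern in api:  # ASSUMPTION: simple substring match
                findings.setdefault(technique, []).append(api)
    return findings


print(map_to_techniques(["android.permission.READ_SMS"], ["java.lang.Runtime.exec"]))
```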
Implementation: backend/intel/family_classifier.py
Rule-based classification with confidence scoring.
Supported Families:
- BankBot-like - SMS + Accessibility overlay (confidence: 0.75+)
- Spyware-like - Contacts + Location exfiltration (confidence: 0.75+)
- RAT-like - Dynamic code loading + shell execution (confidence: 0.70+)
- Ransomware-like - Storage encryption + device locking (confidence: 0.70+)
Algorithm:
- Required signals (all must match)
- Trigger signals (any N must match)
- Bonus signals (each adds +0.05 confidence)
- Threshold filtering (default: 0.8)
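The four-step algorithm can be sketched as follows. The `FamilyRule` structure, the signal names, and the use of a per-family base confidence are assumptions made for illustration; the +0.05 bonus and the 0.8 default threshold come from the list above.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class FamilyRule:
    name: str
    base_confidence: float          # ASSUMPTION: starting confidence per family
    required: Set[str]              # all must be present
    triggers: Set[str]              # at least `min_triggers` must be present
    min_triggers: int = 1
    bonus: Set[str] = field(default_factory=set)  # each adds +0.05


def classify(rule: FamilyRule, signals: Set[str], threshold: float = 0.8) -> Optional[float]:
    """Return the confidence if the rule matches and clears the threshold, else None."""
    if not rule.required <= signals:
        return None
    if len(rule.triggers & signals) < rule.min_triggers:
        return None
    confidence = min(1.0, rule.base_confidence + 0.05 * len(rule.bonus & signals))
    return confidence if confidence >= threshold else None


# Hypothetical rule and signal names, for illustration only.
BANKBOT = FamilyRule(
    name="BankBot-like",
    base_confidence=0.75,
    required={"sms_access"},
    triggers={"accessibility_service", "overlay_window"},
    bonus={"c2_domain", "otp_keyword"},
)

print(classify(BANKBOT, {"sms_access", "overlay_window", "c2_domain", "otp_keyword"}))
print(classify(BANKBOT, {"sms_access", "overlay_window"}))
```

Under these assumed numbers, a family with a base confidence below the threshold is only reported once enough bonus signals are observed.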
Implementation: backend/pipeline/dynamic_analyzer.py
Full ADB + Frida + logcat integration with automatic fallback.
Live Sandbox Mode:
- APK installation via ADB
- Logcat capture with signal extraction
- UI interaction via `adb shell monkey`
- dumpsys inspection (accessibility, device_policy, window, activity services)
- Runtime event collection (Frida hooks, system state)
- Behavioral anomaly scoring (0-100)
Emulated Mode:
- Graceful degradation when no device available
- Returns neutral values instead of blocking analysis
- Allows ML/LLM tiers to carry the decision weight
Implementation: backend/pipeline/llm_analyzer.py + backend/pipeline/resilient_router.py
Multi-tier routing with smart truncation and strict JSON validation.
Routing Strategy:
- Primary Tier - Gemini 2.5 Flash with native JSON output
- Fallback Tier - GPT-4o with structured response format
- Retry Logic - 2 attempts with progressive truncation on context errors
Smart Truncation:
- Center-out strategy (top 40% + bottom 40%)
- Preserves class headers/imports and execution tails
- Triggered automatically on context window errors
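One way to read the center-out strategy is sketched below: keep the first 40% and the last 40% of the decompiled source and drop the middle. This assumes the percentages apply to the character count of the input; `truncate_center_out` and the marker text are hypothetical.

```python
TRUNCATION_MARKER = "\n// ... truncated ...\n"


def truncate_center_out(code: str, keep_ratio: float = 0.4) -> str:
    """Keep the head and tail of `code` (keep_ratio each) and drop the middle."""
    keep = int(len(code) * keep_ratio)
    if keep == 0:
        return code  # too short to truncate meaningfully
    return code[:keep] + TRUNCATION_MARKER + code[-keep:]


source = "A" * 40 + "B" * 20 + "C" * 40
shortened = truncate_center_out(source)
print(len(source), len(shortened))
```

Calling the function again on its own output shrinks the text further, which is one simple way to get progressive truncation across retries.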
Security Analyst Persona:
- 15+ years malware analysis experience
- Evidence-based reasoning (no speculation)
- Distinguishes capability from intent
- Maps to MITRE ATT&CK Mobile
- India-specific threat assessment
Implementation: backend/pipeline/risk_scorer.py
XGBoost classifier with SHAP explainability and context-aware boost rules.
Multi-Dimensional Scoring:
dimension_scores = {
"permission_abuse": 10% weight, # Dangerous permissions
"obfuscation": 10% weight, # Code obfuscation
"behavioral_anomaly": 15% weight, # Runtime behavior
"ml_malware": 45% weight, # XGBoost prediction
"developer_trust": 10% weight, # Certificate validation
"llm_severity": 10% weight, # LLM assessment
}
composite_score = Σ(dimension_score × weight)
Boost Rules (Context-Aware Amplification):
- SMS send attempts → +15 points
- C2 domains contacted → +20 points
- Accessibility service abuse → +12 points
- Static C2 IPs found → +10 points
- LLM verdict BLOCK → +10 points
- Silent install attempt → +15 points
- Root activity detected → +15 points
- Shell execution → +10 points
- Dynamic code loading → +10 points
- VirusTotal malicious ≥5 → +25 points
- YARA matches → +3 to +25 points (capped at +40)
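A runnable sketch of the weighted sum, the additive boosts, and the mapping to an action is shown below. The weights and boost values are taken from this section and the score bands from `RISK_THRESHOLDS`. Clamping the result to 0-100 and treating fractional scores between bands (for example 25.5) as belonging to the higher band are assumptions; the function names are hypothetical.

```python
DIMENSION_WEIGHTS = {
    "permission_abuse": 0.10,
    "obfuscation": 0.10,
    "behavioral_anomaly": 0.15,
    "ml_malware": 0.45,
    "developer_trust": 0.10,
    "llm_severity": 0.10,
}


def composite_score(dimension_scores: dict[str, float], boosts: list[float]) -> float:
    """Weighted sum of 0-100 dimension scores plus boost points."""
    base = sum(
        weight * dimension_scores.get(name, 0.0)
        for name, weight in DIMENSION_WEIGHTS.items()
    )
    return max(0.0, min(100.0, base + sum(boosts)))  # ASSUMPTION: clamp to 0-100


def action_for(score: float) -> str:
    """Map a composite score to the documented action bands."""
    if score <= 25:
        return "APPROVE"
    if score <= 50:
        return "MONITOR"
    if score <= 75:
        return "ESCALATE"
    return "BLOCK"


scores = {name: 50.0 for name in DIMENSION_WEIGHTS}
total = composite_score(scores, boosts=[20])  # e.g. C2 domains contacted
print(total, action_for(total))
```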
Action Thresholds:
RISK_THRESHOLDS = {
"LOW": 0-25 → APPROVE,
"MEDIUM": 26-50 → MONITOR,
"HIGH": 51-75 → ESCALATE,
"CRITICAL": 76-100 → BLOCK
}
SHAP Explainability:
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(feature_vector)
# Extract top 5 contributors
top_features = [
("permission_danger", +18.5),
("obfuscation_score", +12.3),
("c2_hit_count", +9.7),
("has_native_code", +5.2),
("graph_density", -3.1)
]
File: backend/pipeline/static_analyzer.py
Features Extracted:
@dataclass
class StaticFeatures:
apk_hash: str # SHA256 hash
package_name: str # com.example.app
permission_list: List[str] # Manifest permissions
permission_danger_score: float # 0-100 weighted risk
dangerous_permission_count: int # Count of high-risk perms
suspicious_api_count: int # Matches against SUSPICIOUS_API_PATTERNS
api_suspicion_score: float # 0-100 API risk
top_apis: List[str] # Most called methods
high_entropy_count: int # Shannon > 4.5
obfuscation_score: float # 0-100 code obfuscation
suspicious_urls: List[str] # Extracted HTTP(S) URLs
c2_hit_count: int # C2 IP matches
is_self_signed: bool # Certificate issuer = subject
cert_trust_score: float # 0-100 certificate trust
has_native_code: bool # .so libraries present
native_risk_score: float # 0-100 native lib risk
receiver_list: List[str] # Broadcast receivers
service_list: List[str] # Background services
graph_density: float # NetworkX call graph density
graph_node_count: int # Methods in call graph
graph_edge_count: int # Method calls
min_sdk: int # Minimum Android SDK
target_sdk: int # Target Android SDK
Risk Calculation:
- Permission Combo Bonus - READ_SMS + INTERNET = +10 points
- Self-Signed Penalty - -40 cert_trust_score
- Native Library Check - Matches KNOWN_MALICIOUS_LIBS (libfrida-gadget.so, etc.)
File: backend/pipeline/dynamic_analyzer.py
Sandbox Modes:
- Live Mode (`USE_LIVE_SANDBOX=true`) - Requires ADB + Frida + mitmproxy
- Installs APK on connected device
- Injects Frida hooks for API monitoring
- Captures network traffic with mitmproxy
- Runs `monkey` for UI interaction
- Emulated Mode (default) - Returns neutral values when no sandbox available
Features Extracted:
@dataclass
class DynamicFeatures:
sandbox_mode: str # "live" or "emulated"
sms_send_attempts: int # sendTextMessage() calls
network_domains_contacted: List[str]
c2_domains_hit: int # Known C2 matches
data_exfil_bytes: int # Total outbound traffic
accessibility_service_abused: bool # Overlay attack detection
clipboard_hijack_detected: bool # ClipboardManager hooks
silent_install_attempted: bool # PackageInstaller calls
camera_accessed: bool # Camera.open() detected
microphone_accessed: bool # MediaRecorder usage
location_accessed: bool # GPS provider access
device_admin_requested: bool # DevicePolicyManager
behavioural_anomaly_score: float # 0-100 runtime risk
matched_malware_family: str # e.g. "BankBot", "Unknown"
family_similarity_score: float # 0.0-1.0 confidence
Live Sandbox Requirements:
# Android Debug Bridge
adb devices # Must show at least one device
# Frida (optional - for runtime hooking)
pip install frida-tools
frida-ps -U # List processes on USB device
# mitmproxy (optional - for network capture)
pip install mitmproxy
mitmdump --version
File: backend/pipeline/llm_analyzer.py
System Prompt:
You are an elite Android malware analyst at a national cybersecurity agency. You have 15 years of experience with banking trojans, spyware, SMS stealers, and overlay attack frameworks. Never speculate without evidence from the code. Never produce generic statements; cite specific class names, method names, API calls, or string literals from the code.
Features Extracted:
@dataclass
class LLMFeatures:
primary_function: str # "What this app really does"
malicious_behaviors: List[str] # Specific behaviors with evidence
data_collection: List[str] # Data exfiltration methods
obfuscation_techniques: List[str] # Code obfuscation patterns
attack_vectors: List[str] # Technical attack chains
india_specific_risks: List[str] # UPI/OTP/Banking risks
severity_score: float # 0.0-1.0 LLM confidence
confidence: float # 0.0-1.0 verdict confidence
verdict: str # CRITICAL/HIGH/MEDIUM/LOW/UNKNOWN
recommended_action: str # Next steps for analyst
executive_summary: str # 2-3 sentence summary
zero_day_hypotheses: List[str] # Novel threat theories
Zero-Day Detection:
- Triggered when `severity_score > 0.6` AND `family_similarity_score < 0.4`
- Generates 3 ranked threat hypotheses for unknown malware
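The trigger condition reduces to a small predicate. The sketch below uses the two thresholds stated above; the function and constant names are hypothetical.

```python
SEVERITY_TRIGGER = 0.6     # LLM severity must exceed this
SIMILARITY_CEILING = 0.4   # known-family similarity must stay below this


def should_generate_zero_day_hypotheses(
    severity_score: float, family_similarity_score: float
) -> bool:
    """True when the app looks severe but does not resemble a known family."""
    return (
        severity_score > SEVERITY_TRIGGER
        and family_similarity_score < SIMILARITY_CEILING
    )


print(should_generate_zero_day_hypotheses(0.8, 0.2))
print(should_generate_zero_day_hypotheses(0.8, 0.7))
```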
API Configuration:
import google.generativeai as genai
genai.configure(api_key=GEMINI_API_KEY)
model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content(prompt)
File: backend/pipeline/risk_scorer.py
Multi-Dimensional Scoring:
dimension_scores = {
"permission_abuse": 10% weight, # Dangerous permissions
"obfuscation": 10% weight, # Code obfuscation
"behavioral_anomaly": 15% weight, # Runtime behavior
"ml_malware": 45% weight, # XGBoost prediction
"developer_trust": 10% weight, # Certificate validation
"llm_severity": 10% weight, # LLM assessment
}
composite_score = Σ(dimension_score × weight)
Boost Rules (Context-Aware Amplification):
- SMS send attempts → +15 points
- C2 domains contacted → +20 points
- Accessibility service abuse → +12 points
- Static C2 IPs found → +10 points
- LLM verdict CRITICAL → +10 points
- Silent install attempt → +15 points
File: backend/pipeline/report_generator.py
Report Structure:
VERDICT: BLOCK - App exhibits clear signs of malicious intent.
RISK SCORE: 82.3/100 - Score driven by: permission_danger (+18.5), c2_hit_count (+12.0)
TECHNICAL FINDINGS:
- Permission Analysis: 8 dangerous permissions requested.
- Code Behaviour: Banking overlay with SMS interception. Accessibility service abuse for OTP capture.
- Network Activity: Contacted 3 domains. C2 hits: 1.
- Obfuscation: 127 high entropy strings detected (Score: 64.2).
INDIA-SPECIFIC THREAT: UPI transaction overlay, OTP SMS interception targeting Bank of India users.
RECOMMENDED ACTIONS:
1. Immediate: Block application execution and network access.
2. Investigation: Identify affected devices and reset credentials.
3. Reporting: File a formal report with CERT-In.
EVIDENCE SUMMARY:
* Hardcoded C2 IPs detected in code
* Requests Accessibility Service (Overlay/Keylogger potential)
* LLM identified: SMS interception with runtime code injection
* Network traffic to known C2 domains
* Suspicious API usage: sendTextMessage, getDeviceId, Runtime.exec
Forensic Indicators: Top 5 evidence items ranked by criticality, with technical citations (class names, method names, API calls).
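The first two report lines can be derived from the composite score and the top SHAP contributors. The sketch below assumes the score bands from the Action Thresholds list and treats a fractional score as belonging to the first band whose upper bound it does not exceed. `classify` and `report_header` are hypothetical names, not functions from report_generator.py.

```python
# (upper bound, risk level, action), following the Action Thresholds list.
RISK_BANDS = [
    (25, "LOW", "APPROVE"),
    (50, "MEDIUM", "MONITOR"),
    (75, "HIGH", "ESCALATE"),
    (100, "CRITICAL", "BLOCK"),
]


def classify(score: float) -> tuple:
    """Return (risk level, action) for a composite score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for upper, level, action in RISK_BANDS:
        if score <= upper:
            return level, action


def report_header(score: float, top_features: list) -> str:
    """Format the VERDICT and RISK SCORE lines from a score and (name, value) pairs."""
    _, action = classify(score)
    drivers = ", ".join(f"{name} ({value:+.1f})" for name, value in top_features)
    return f"VERDICT: {action}\nRISK SCORE: {score:.1f}/100 - Score driven by: {drivers}"


print(report_header(82.3, [("permission_danger", 18.5), ("c2_hit_count", 12.0)]))
```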
Dataset Generation:
python -m backend.training.train_xgboost
Synthetic Data Distribution:
- Benign Apps (n=800)
  - Permission danger: μ=15, σ=10
  - API suspicion: μ=16, σ=8
  - Obfuscation: μ=12, σ=8
  - Self-signed: 60%
- Malicious Apps (n=800)
  - Permission danger: μ=70, σ=18
  - API suspicion: μ=72, σ=18
  - Obfuscation: μ=65, σ=20
  - Self-signed: 90%
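A distribution like the one above can be sampled with the standard library alone. This is a rough sketch rather than the project's generator: the feature keys, the clipping of scores to 0-100, and the fixed seed are assumptions made for illustration.

```python
import random

# (mean, standard deviation) per feature and self-signed rate, copied from the list above.
PROFILES = {
    "benign": {"permission_danger": (15, 10), "api_suspicion": (16, 8),
               "obfuscation": (12, 8), "self_signed_rate": 0.60},
    "malicious": {"permission_danger": (70, 18), "api_suspicion": (72, 18),
                  "obfuscation": (65, 20), "self_signed_rate": 0.90},
}


def sample_app(label: str, rng: random.Random) -> dict:
    """Draw one synthetic app; Gaussian scores are clipped to 0-100 (assumed range)."""
    profile = PROFILES[label]
    row = {"label": int(label == "malicious")}
    for feature in ("permission_danger", "api_suspicion", "obfuscation"):
        mean, std = profile[feature]
        row[feature] = min(100.0, max(0.0, rng.gauss(mean, std)))
    row["self_signed"] = int(rng.random() < profile["self_signed_rate"])
    return row


def generate_dataset(n_per_class: int = 800, seed: int = 42) -> list:
    """Build a balanced dataset with n_per_class benign and malicious rows."""
    rng = random.Random(seed)
    return [sample_app(label, rng)
            for label in ("benign", "malicious")
            for _ in range(n_per_class)]


dataset = generate_dataset()
print(len(dataset), sum(row["label"] for row in dataset))
```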
Feature Engineering (backend/training/feature_engineering.py):
- Missing value imputation (median strategy)
- Column removal (>40% missing)
- StandardScaler normalization
- SMOTE oversampling (if class imbalance > 5:1)
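The steps above can be sketched in plain Python to show their order and thresholds. This is not the code in feature_engineering.py, which the list says uses StandardScaler and SMOTE; here standardization is done by hand and the oversampling step is reduced to a flag. The function name and the row format (dicts with `None` for missing values) are assumptions.

```python
import statistics


def engineer(rows, labels, missing_threshold=0.40, imbalance_threshold=5.0):
    """Drop sparse columns, impute medians, standardize, and flag class imbalance."""
    columns = sorted({key for row in rows for key in row})
    kept, scaled_columns = [], []
    for col in columns:
        values = [row.get(col) for row in rows]
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction > missing_threshold:
            continue  # column removal (>40% missing)
        median = statistics.median(v for v in values if v is not None)
        filled = [median if v is None else v for v in values]
        mean = statistics.fmean(filled)
        std = statistics.pstdev(filled) or 1.0  # constant columns keep a unit divisor
        scaled_columns.append([(v - mean) / std for v in filled])
        kept.append(col)
    matrix = [list(row) for row in zip(*scaled_columns)]
    counts = [labels.count(0), labels.count(1)]
    needs_oversampling = max(counts) / max(1, min(counts)) > imbalance_threshold
    return matrix, kept, needs_oversampling


rows = [
    {"a": 1.0, "b": None, "c": 10.0},
    {"a": 2.0, "b": None, "c": None},
    {"a": 3.0, "b": 5.0, "c": 30.0},
    {"a": 4.0, "b": None, "c": 40.0},
]
matrix, kept, needs_oversampling = engineer(rows, [0, 0, 0, 1])
print(kept, needs_oversampling)
```

In a real training run the medians, means, and standard deviations would be fitted on the training split only and reused for validation and inference.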
Model Hyperparameters:
XGBClassifier(
n_estimators=300, # 300 boosting rounds
max_depth=6, # Tree depth
learning_rate=0.05, # Step size shrinkage
subsample=0.8, # Row sampling
colsample_bytree=0.8, # Column sampling
scale_pos_weight=ratio, # Class imbalance weight
eval_metric=["logloss", "auc"],
early_stopping_rounds=20, # Validation patience
tree_method="hist" # CPU-optimized
)
Evaluation Metrics (backend/training/evaluate.py):
- Precision, Recall, F1-Score
- ROC-AUC
- Confusion Matrix
- Feature Importance (gain/weight/cover)
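For reference, precision, recall, and F1 all follow from the confusion matrix counts. The sketch below computes them in plain Python for binary labels (1 = malicious); it illustrates the definitions and is not the code in evaluate.py. The function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1, returning 0.0 where a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
```

For a malware detector, recall on the malicious class is usually the number to watch, since a false negative lets a harmful app through.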
Model Artifacts:
models/
├── xgboost_mobileguard.json        # Trained XGBoost model
├── scaler.pkl                      # StandardScaler object
├── feature_columns.json            # 37 feature names
└── shap_feature_importance.png     # SHAP summary plot
Replace synthetic data with:
- Drebin Dataset - 5,560 malware samples, 123K+ benign apps
- CIC-AndMal2017 - 426 malware samples across 4 categories (adware, ransomware, scareware, SMS malware)
- AndroZoo - 10M+ APKs with VirusTotal labels
import pandas as pd

# Example: Load Drebin parquet
df = pd.read_parquet("data/drebin_features.parquet")
X, y, feature_columns, scaler = engineer_features(df)
App.jsx
├── Header (System Status)
│   ├── API Health Indicator
│   ├── Analysis Engine Status
│   └── Last Scan Timestamp
│
├── Left Panel (4 cols)
│   ├── UploadZone (Drag & Drop)
│   └── ProgressTracker (5 stages with icons)
│
└── Right Panel (8 cols)
    ├── ActionBanner (Verdict + Score)
    ├── RiskGauge (Circular gauge with gradient)
    ├── DimensionChart (Radar chart with 6 axes)
    ├── ShapExplainer (Waterfall plot)
    ├── ThreatReport (Collapsible sections)
    └── AuditLog (Paginated table)
UploadZone (src/components/UploadZone.jsx):
- Drag-and-drop zone with hover state
- File type validation (.apk only)
- Size validation (150MB max client-side check)
- Lucide Upload icon with animation
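Client-side checks can be bypassed, so the same rules are worth repeating on the backend. The sketch below mirrors the extension and size checks in Python and adds a ZIP signature check, which is an extra not described in this README (APKs are ZIP archives and typically begin with `PK\x03\x04`). The function name, the error strings, and the reading of 150MB as binary megabytes are assumptions.

```python
from pathlib import Path

MAX_APK_BYTES = 150 * 1024 * 1024  # 150MB limit from the list above (binary MB assumed)
ZIP_MAGIC = b"PK\x03\x04"          # local file header signature at the start of a ZIP


def validate_upload(filename: str, size_bytes: int, header: bytes) -> list:
    """Return validation errors for an upload; an empty list means it passes.

    header is the first few bytes of the file, so the whole upload need not be in memory.
    """
    errors = []
    if Path(filename).suffix.lower() != ".apk":
        errors.append("file extension must be .apk")
    if size_bytes > MAX_APK_BYTES:
        errors.append("file exceeds the 150MB limit")
    if not header.startswith(ZIP_MAGIC):
        errors.append("file does not start with a ZIP signature")
    return errors


print(validate_upload("banking.apk", 4_200_000, b"PK\x03\x04\x14\x00"))
print(validate_upload("notes.txt", 10, b"Not an APK"))
```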
ProgressTracker (src/components/ProgressTracker.jsx):
- 5 stages with icons (FileSearch, Activity, Brain, BarChart, FileText)
- Progress bar with gradient fill
- Real-time status updates from SSE
- Error state with AlertTriangle icon
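The real-time updates arrive as Server-Sent Events, where each message is a block of `field: value` lines ending in a blank line. The sketch below shows how a backend might serialize progress messages in that wire format. The `progress` event name, the payload keys, and the stage names are placeholders, not the project's actual schema.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one SSE message: an event name, a JSON data line, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def progress_events(stages):
    """Yield one SSE message per completed stage (stage names are placeholders)."""
    total = len(stages)
    for index, stage in enumerate(stages, start=1):
        payload = {"stage": stage, "percent": round(100 * index / total)}
        yield format_sse("progress", payload)


for message in progress_events(["static", "dynamic", "llm", "scoring", "report"]):
    print(message, end="")
```

On the client, an `EventSource` listener for the chosen event name would parse the JSON payload and advance the progress bar.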
RiskGauge (src/components/RiskGauge.jsx):
- Recharts RadialBarChart
- Dynamic color gradient (green → yellow → orange → red)
- Score label with action badge
- Animated arc fill with easeElastic
DimensionChart (src/components/DimensionChart.jsx):
- Recharts RadarChart with 6 dimensions
- Permission Abuse, Obfuscation, Behavioral Anomaly, ML Malware, Developer Trust, LLM Severity
- Gradient fill with opacity
- Tooltip with dimension explanations
ShapExplainer (src/components/ShapExplainer.jsx):
- Top 5 feature contributions
- Positive values (red) vs Negative values (green)
- Horizontal bar chart with labels
- Explanation text from risk_scorer
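The data behind this component can be prepared by ranking features on absolute SHAP value and tagging each with a display color. A minimal sketch, assuming the backend sends a list of name/value pairs; the function name, the output keys, and the `cert_age_days` feature are hypothetical.

```python
def top_contributors(feature_names, shap_values, k=5):
    """Rank features by absolute SHAP value and tag each with a display color.

    Positive contributions raise the risk score (red); negative ones lower it (green).
    """
    ranked = sorted(zip(feature_names, shap_values),
                    key=lambda pair: abs(pair[1]), reverse=True)
    return [
        {"feature": name, "value": value, "color": "red" if value > 0 else "green"}
        for name, value in ranked[:k]
    ]


names = ["graph_density", "has_native_code", "permission_danger",
         "c2_hit_count", "obfuscation_score", "cert_age_days"]
values = [-3.1, 5.2, 18.5, 9.7, 12.3, 0.4]

for item in top_contributors(names, values):
    print(item)
```

Ranking on absolute value keeps strong negative contributors such as `graph_density` in the top five instead of dropping them behind weak positive ones.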
ThreatReport (src/components/ThreatReport.jsx):
- Collapsible sections (Executive Summary, Technical Findings, Evidence)
- Copy-to-clipboard functionality
- Malware family badge
- India-specific risk flag
- Forensic indicators with checkboxes
ActionBanner (src/components/ActionBanner.jsx):
- Color-coded by action (APPROVE=green, MONITOR=yellow, ESCALATE=orange, BLOCK=red)
- Large composite score display
- Icon (Shield, AlertTriangle, XCircle)
- Framer Motion slide-in animation
Colors (Tailwind config):
colors: {
  background: "#07111F",           // Deep navy
  card: "rgba(255,255,255,0.04)",  // Glassmorphism
  accent: "#3B82F6",               // Blue
  success: "#22C55E",              // Green
  warning: "#F59E0B",              // Amber
  danger: "#EF4444",               // Red
  muted: "#64748B",                // Slate
  textPrimary: "#F8FAFC",          // Off-white
  textSecondary: "#94A3B8"         // Light slate
}
Animations:
- Page load: Staggered fade-in (Framer Motion)
- Card mount: Scale + opacity transition
- Progress bar: Smooth width animation with spring physics
- Gauge fill: Arc sweep with easeElastic timing
Typography:
- Font: Inter (variable font for optimal performance)
- Heading: 4xl/5xl bold with tight tracking
- Body: Base/lg with relaxed line height
- Code: Monospace (JetBrains Mono fallback)
cd backend
pytest tests/ -v --cov=backend --cov-report=html
Test Files:
- tests/test_api.py - FastAPI endpoint tests
- tests/test_static.py - Static analyzer unit tests
- tests/test_scorer.py - Risk scoring validation
Example Test:
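from fastapi.testclient import TestClient

from backend.main import app  # assumes the FastAPI instance is exported as `app`

client = TestClient(app)


def test_health_endpoint_returns_ok():
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json()["status"] == "ok"


def test_analyze_rejects_non_apk_files(tmp_path):
    # Write a throwaway non-APK file into pytest's temporary directory
    bogus = tmp_path / "test.txt"
    bogus.write_text("Not an APK")
    with bogus.open("rb") as f:
        response = client.post("/analyze", files={"file": f})
    assert response.status_code == 422
cd frontend
npm run test # Vitest + React Testing Library
Testing Strategy: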
- Unit tests for API client functions
- Component tests with mocked API responses
- Integration tests for upload flow
- Visual regression tests (optional - with Playwright)
# docker-compose.yml
services:
  backend:
    build:
      context: .
      dockerfile: Dockerfile.backend
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      - ./data:/app/data       # Persistent cache & logs
      - ./models:/app/models   # Pre-trained model
    restart: unless-stopped
  frontend:
    build:
      context: .
      dockerfile: Dockerfile.frontend
    ports:
      - "3000:80"
    depends_on:
      - backend
    restart: unless-stopped
Deployment Commands:
docker-compose up -d # Start in detached mode
docker-compose logs -f backend # View backend logs
docker-compose down # Stop all services
- Environment Variables:
  - Use Docker secrets or AWS Secrets Manager for API keys
  - Never commit .env to version control
- Reverse Proxy:
  - Configure Nginx for SSL termination
  - Set up rate limiting (e.g., 10 uploads/minute per IP)
  - Enable CORS only for trusted origins
- Database:
  - Replace SQLite with PostgreSQL for multi-node deployments
  - Use connection pooling (SQLAlchemy pool_size=20)
- Storage:
  - Mount persistent volumes for data/ and models/
  - Use S3 for audit log archival
- Monitoring:
  - Prometheus metrics for API latency, error rates
  - Grafana dashboards for pipeline stage durations
  - Sentry for exception tracking
- Security:
  - Run containers as non-root user
  - Scan Docker images with Trivy
  - Enable AppArmor/SELinux profiles
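The rate limit suggested under Reverse Proxy (10 uploads per minute per IP) is meant to be enforced in Nginx. To illustrate the policy itself, here is an application-level sliding-window limiter in Python. It is a different technique from Nginx's own limiter and only a sketch: the class name is hypothetical, and state is kept in process memory, so it would not be shared across replicas.

```python
import time
from collections import defaultdict, deque


class UploadRateLimiter:
    """Allow at most max_uploads per client within a sliding time window."""

    def __init__(self, max_uploads=10, window_seconds=60.0, clock=time.monotonic):
        self._max = max_uploads
        self._window = window_seconds
        self._clock = clock                # injectable so tests can control time
        self._hits = defaultdict(deque)    # client IP -> timestamps of recent uploads

    def allow(self, client_ip: str) -> bool:
        """Record the upload and return True if the client is under the limit."""
        now = self._clock()
        hits = self._hits[client_ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True


limiter = UploadRateLimiter()
decisions = [limiter.allow("203.0.113.7") for _ in range(11)]
print(decisions.count(True), decisions.count(False))
```

A multi-replica deployment would need the counters in a shared store, or would leave the limit to the reverse proxy as the list recommends.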
AWS ECS (Fargate):
# Build and push to ECR
aws ecr get-login-password | docker login --username AWS --password-stdin <ecr-repo>
docker build -f Dockerfile.backend -t mobileguard-backend .
docker tag mobileguard-backend:latest <ecr-repo>/mobileguard-backend:latest
docker push <ecr-repo>/mobileguard-backend:latest
# Deploy with Fargate task definition
aws ecs update-service --cluster prod --service mobileguard --force-new-deployment
Kubernetes (GKE/EKS):
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mobileguard-backend
spec:
  replicas: 3
  selector:                          # required by apps/v1; must match the pod template labels
    matchLabels:
      app: mobileguard-backend
  template:
    metadata:
      labels:
        app: mobileguard-backend
    spec:
      containers:
        - name: api
          image: gcr.io/project/mobileguard-backend:v1.0.0
          env:
            - name: GEMINI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: gemini-key
mobileguard-ai/
├── backend/                          # Python FastAPI backend
│   ├── config.py                     # Environment config & constants
│   ├── main.py                       # FastAPI app & endpoints
│   ├── requirements.txt              # Python dependencies
│   ├── dataset_feature_extractor.py  # 37-feature extraction for ML model
│   │
│   ├── pipeline/                     # Analysis pipeline modules
│   │   ├── orchestrator.py           # Pipeline coordinator with SSE streaming
│   │   ├── static_analyzer.py        # Androguard-based APK analysis
│   │   ├── dynamic_analyzer.py       # ADB + Frida sandbox execution
│   │   ├── llm_analyzer.py           # LLM analysis with structured output
│   │   ├── resilient_router.py       # Multi-tier LLM routing (Gemini → GPT-4o)
│   │   ├── risk_scorer.py            # XGBoost + SHAP scoring
│   │   ├── report_generator.py       # Threat report generation
│   │   ├── behavior_scorer.py        # Runtime behavior anomaly scoring
│   │   ├── runtime_collectors.py     # Frida hooks + dumpsys collectors
│   │   ├── runtime_events.py         # Event data structures
│   │   └── event_mapper.py           # Event → feature mapping
│   │
│   ├── detection/                    # Signature-based detection
│   │   ├── yara_engine.py            # YARA scanner with metadata scoring
│   │   └── yara_rules/               # .yar signature files
│   │
│   ├── intel/                        # Threat intelligence
│   │   ├── mitre_mapper.py           # ATT&CK technique mapping
│   │   └── family_classifier.py      # Malware family classification
│   │
│   ├── data/                         # Data management
│   │   ├── feature_store.py          # SQLite caching layer
│   │   ├── audit_logger.py           # JSONL audit logging
│   │   └── threat_intel.py           # C2 blocklist + VirusTotal integration
│   │
│   ├── training/                     # ML model training
│   │   ├── train_xgboost.py          # Model training script
│   │   ├── feature_engineering.py    # SMOTE + StandardScaler
│   │   └── evaluate.py               # Metrics & SHAP plots
│   │
│   └── tests/                        # Pytest test suite
│       ├── test_api.py               # FastAPI endpoint tests
│       ├── test_static.py            # Static analyzer tests
│       └── test_scorer.py            # Risk scorer tests
│
├── frontend/                         # React + Vite frontend
│   ├── src/
│   │   ├── App.jsx                   # Main application component
│   │   ├── main.jsx                  # React entry point
│   │   ├── index.css                 # Tailwind base styles
│   │   │
│   │   ├── api/
│   │   │   └── client.js             # Fetch API wrapper (SSE support)
│   │   │
│   │   └── components/               # React components
│   │       ├── UploadZone.jsx        # Drag & drop file upload
│   │       ├── ProgressTracker.jsx   # 5-stage progress indicator
│   │       ├── RiskGauge.jsx         # Recharts radial gauge
│   │       ├── DimensionChart.jsx    # 6-axis radar chart
│   │       ├── ShapExplainer.jsx     # Feature attribution viz
│   │       ├── ThreatReport.jsx      # Collapsible report card
│   │       ├── ActionBanner.jsx      # Verdict display banner
│   │       └── AuditLog.jsx          # Paginated log table
│   │
│   ├── public/
│   │   ├── favicon.svg
│   │   └── icons.svg
│   │
│   ├── package.json                  # NPM dependencies
│   ├── vite.config.js                # Vite build config
│   ├── tailwind.config.js            # Tailwind theme
│   └── postcss.config.js             # PostCSS plugins
│
├── models/                           # ML model artifacts
│   ├── xgboost_mobileguard.json      # Trained XGBoost model
│   ├── scaler.pkl                    # StandardScaler object
│   ├── feature_columns.json          # 37 feature names
│   └── shap_feature_importance.png   # Feature importance plot
│
├── data/                             # Runtime data storage
│   ├── feature_cache.sqlite          # APK analysis cache
│   ├── audit_2026-06-18.jsonl        # Daily audit logs
│   └── certin_iocs.json              # Threat intel feed
│
├── docker-compose.yml                # Multi-container orchestration
├── Dockerfile.backend                # Backend container image
├── Dockerfile.frontend               # Frontend container image
├── nginx.conf                        # Nginx config for frontend
├── .env                              # Environment variables
└── README.md                         # This file
- DEX2JAR Integration - Decompile to Java bytecode for deeper semantic analysis
- Control Flow Graph (CFG) Analysis - Detect code reachability and dead code patterns
- Data Flow Tracking - Trace sensitive data from source to sink (taint analysis)
- String Encryption Detection - Pattern matching for common encryption libraries (AES, RSA)
- Anti-Analysis Detection - Identify emulator checks, debugger detection, root detection
- Resource Analysis - Inspect assets, raw files, and embedded payloads
- Automated Device Farm - Integrate with AWS Device Farm or BrowserStack
- Multi-Device Testing - Test across Android 8-14 with different screen sizes
- Kernel-Level Monitoring - eBPF-based syscall tracing for privilege escalation detection
- UI Automation - Selenium-like APK interaction for permission dialog testing
- Memory Dump Analysis - Extract runtime strings, loaded libraries, decrypted payloads
- SSL Pinning Bypass - Automatic certificate unpinning for network analysis
- Multi-Model Ensemble - Combine Gemini, GPT-4, Claude for consensus scoring
- Code Summarization - Generate human-readable pseudocode from smali/DEX
- Threat Actor Attribution - Link malware samples to known APT groups
- Natural Language Queries - "Show me all apps that access SMS and call APIs"
- Automated IOC Extraction - Extract IPs, domains, file hashes from analysis
- Fine-Tuned Security Model - Train Gemini on labeled malware corpus
- Celery Task Queue - Asynchronous APK processing with Redis backend
- Horizontal Scaling - Load balancer with 3+ API replicas
- Database Migration - PostgreSQL with read replicas for feature store
- Caching Layer - Redis for hot APK hashes (< 1ms retrieval)
- Batch Analysis API - Upload 100+ APKs with parallel processing
- GraphQL API - Flexible querying for frontend/integrations
- Model Quantization - Reduce XGBoost model size by 60% (int8 inference)
- Lazy Feature Extraction - Extract only features needed by ML model
- Incremental Analysis - Cache intermediate results (static → dynamic → LLM)
- APK Deduplication - SHA256-based early termination for known samples
- Streaming Decompilation - Process APK classes incrementally
- CDN Integration - Serve frontend assets via CloudFront/Cloudflare
- VirusTotal Integration - Cross-reference hashes with 70+ AV engines
- MISP Integration - Ingest IOCs from Malware Information Sharing Platform
- AlienVault OTX - Community threat intelligence feed
- CERT-In Feed - Official Indian government threat bulletins
- Custom IOC Management - Upload enterprise-specific C2 domains/IPs
- Threat Actor Profiles - Link samples to known groups (Lazarus, APT28)
- Drebin Feature Vectors - Train classifier on Drebin's eight feature sets (the dataset covers 179 malware families)
- Signature Database - 500+ malware family YARA rules
- Similarity Hashing - SSDeep/TLSH for variant detection
- Behavioral Clustering - Group unknown samples by runtime behavior
- Family Evolution Tracking - Detect new variants of known families
- UPI Deep Inspection - Detect PhonePe/Paytm/Google Pay overlay attacks
- Aadhaar OTP Monitoring - Flag apps intercepting UIDAI SMS
- Banking App Whitelist - Trusted app signatures for 30+ Indian banks
- Regional Language Support - Hindi/Tamil/Bengali UI translations
- RBI Compliance Reporting - Generate reports aligned with RBI guidelines
- NPCI Notification Integration - Alert on suspicious UPI transaction apps
- SIEM Integration - Export logs to Splunk/ELK/QRadar
- SOAR Playbooks - Automated response workflows (quarantine, alert, block)
- Active Directory SSO - LDAP/SAML authentication
- Multi-Tenancy - Isolated workspaces for different business units
- Role-Based Access Control (RBAC) - Analyst/Admin/Auditor roles
- Compliance Reports - SOC 2, ISO 27001, GDPR audit trails
- MalConv - 1D CNN for raw APK byte sequence classification
- DexRay - Graph neural network on call graphs
- Transformer-Based Classifier - BERT fine-tuned on decompiled code
- Generative Adversarial Network (GAN) - Synthetic malware generation for training
- Reinforcement Learning Sandbox - AI-driven APK interaction for maximum coverage
- LIME Integration - Local interpretable model-agnostic explanations
- Counterfactual Analysis - "What changes would flip the verdict?"
- Feature Interaction Plots - 2D SHAP dependence plots
- Natural Language Explanations - LLM-generated risk summaries
- Interactive Decision Trees - Visualize XGBoost tree paths
- Active Learning Pipeline - Flag uncertain samples for analyst review
- Model Drift Detection - Monitor prediction distribution shifts
- Online Learning - Update model with new labeled samples
- A/B Testing Framework - Compare model versions in production
- AutoML Integration - Hyperparameter tuning with Optuna/Ray Tune
- 3D Call Graph Visualization - Three.js interactive network diagram
- Timeline View - Chronological analysis stage progression
- Comparison Mode - Side-by-side analysis of 2+ APKs
- Dark/Light Mode Toggle - User preference persistence
- Export Reports - PDF/DOCX generation with branding
- Mobile App - React Native companion for on-the-go analysis
- Team Comments - Annotate analysis results with threaded discussions
- Shared Workspaces - Collaborative investigations
- Notification System - Email/Slack alerts for high-risk APKs
- Analyst Dashboard - Personal queue, statistics, leaderboard
- API Webhooks - Push notifications to external systems
- Plugin System - Custom analyzers via Python entry points
- YARA Rule Repository - Community-contributed malware signatures
- Threat Hunt Queries - Sigma-style detection rules
- Sample Exchange - Secure APK sharing platform (hashed uploads)
- Public API - Rate-limited free tier for researchers
- Documentation Portal - Interactive API explorer, tutorials, blog
- Academic Partnerships - Collaborate with universities on novel techniques
- Conference Papers - Publish findings at BlackHat, DEF CON, USENIX
- Bug Bounty Program - Reward security researchers for vulnerabilities
- Open Dataset Release - Anonymized analysis results for research
- Benchmark Suite - Standard test set for comparing malware detectors
We welcome contributions! Please follow these guidelines:
- Fork the repository and create a feature branch
  git checkout -b feature/your-feature-name
- Make changes with clear commit messages
  git commit -m "feat(static): Add native library signature matching"
- Write tests for new features
  pytest tests/test_your_feature.py -v
- Update documentation if adding public APIs
- Submit a pull request with:
  - Description of changes
  - Test results
  - Screenshots (for UI changes)
Commit Convention:
- feat: New feature
- fix: Bug fix
- docs: Documentation update
- refactor: Code refactoring
- test: Test additions/updates
- chore: Build/tooling changes
MIT License - See LICENSE file for details.
- Androguard - APK analysis framework
- XGBoost - Gradient boosting library
- SHAP - Explainable AI toolkit
- Google Gemini - LLM API for contextual analysis
- Drebin Dataset - Android malware research dataset
- CERT-In - Indian cybersecurity standards
- Recharts - React charting library
- Framer Motion - Animation library
- Documentation: docs.mobileguard.ai
- Issues: GitHub Issues
- Email: indiser01@gmail.com
Built with ❤️ for cybersecurity professionals