Skip to content

Refactor to run DNS query in parallel for speed up#36

Open
PeterDaveHello wants to merge 1 commit intomasterfrom
parallel-query
Open

Refactor to run DNS query in parallel for speed up#36
PeterDaveHello wants to merge 1 commit intomasterfrom
parallel-query

Conversation

@PeterDaveHello
Copy link
Copy Markdown
Owner

@PeterDaveHello PeterDaveHello commented Jan 13, 2025

🚀

Summary by CodeRabbit

  • New Features

    • Introduced a temporary directory management system for improved DNS checking
    • Enhanced concurrent processing of multiple DNS queries
    • Added background job handling and cleanup mechanism
  • Performance

    • Optimized DNS checking to run asynchronously
    • Improved script efficiency for handling multiple domain checks

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review any files in this pull request.

Files not reviewed (1)
  • chkdm: Language not supported

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 13, 2025

📝 Walkthrough

Walkthrough

The pull request introduces a temporary directory management system for the DNS checking script. A new temporary directory is created at the script's start, with a cleanup function defined to remove the directory and terminate background jobs upon script exit or interruption. The DNS checking functions are modified to run domain checks asynchronously, using background processes and temporary files to store results. This approach allows for concurrent processing of multiple DNS queries, improving the script's efficiency in handling domain checks.

Changes

File Change Summary
chkdm - Added tmp_dir variable for temporary directory management
- Introduced cleanUp() function for directory and job cleanup
- Modified DNS checking functions to use background processes
- Implemented asynchronous result collection via temporary files

Sequence Diagram

sequenceDiagram
    participant Script as Main Script
    participant TempDir as Temp Directory
    participant DNSCheck as DNS Checker

    Script->>TempDir: Create temporary directory
    Script->>DNSCheck: Start background DNS checks
    DNSCheck-->>TempDir: Write results to temp files
    Script->>TempDir: Wait for result files
    TempDir-->>Script: Read DNS check results
    Script->>TempDir: Clean up temporary directory
Loading

Possibly related PRs


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
chkdm (2)

206-282: Reduce code duplication and improve parallelization

The DNS checking functions share similar code patterns. Consider extracting the common logic into a reusable function and running all DNS types in parallel.

+function runParallelDNSChecks() {
+    local -n dns_servers=$1  # nameref to DNS server array
+    local dns_type=$2
+    local filter_type=$3
+    local keys=()
+    
+    while IFS='' read -r line; do 
+        keys+=("$line")
+    done < <(for DNS in "${!dns_servers[@]}"; do echo "$DNS"; done | sort)
+    
+    echo.Cyan "\nRunning dig/nslookup over ${#dns_servers[@]} $dns_type DNS:"
+    
+    # Launch all queries in parallel
+    for DNS in "${keys[@]}"; do
+        (
+            if ! result=$(chkDomain "$domain" "${dns_servers[$DNS]}" "$filter_type"); then
+                echo "Error: DNS query failed" > "$tmp_dir/${dns_type}_$DNS"
+                exit 1
+            fi
+            echo "$result" > "$tmp_dir/${dns_type}_$DNS"
+        ) &
+    done
+    
+    # Collect results
+    for DNS in "${keys[@]}"; do
+        printf " - %s ... " "$DNS $(echo.BoldBlack "(${dns_servers[$DNS]})")"
+        if ! timeout 3 bash -c "until [ -f '$tmp_dir/${dns_type}_$DNS' ]; do printf '.'; sleep 0.1; done"; then
+            echo "Timeout waiting for DNS response"
+            continue
+        fi
+        printf "\r"
+        printf " - %s ... " "$DNS $(echo.BoldBlack "(${dns_servers[$DNS]})")"
+        cat "$tmp_dir/${dns_type}_$DNS"
+    done
+}

Usage:

-checkNofilterDNS
-checkSecureDNS
-checkAdblockDNS
+# Run all DNS checks in parallel
+(
+    runParallelDNSChecks nofilterDNS "nofilter" noFilterDetect
+) &
+(
+    runParallelDNSChecks secureDNS "secure" filterDetect
+) &
+(
+    runParallelDNSChecks adblockDNS "adblock" filterDetect
+) &
+wait

206-282: Add rate limiting for parallel DNS queries

Running too many parallel DNS queries might trigger rate limiting or be considered abusive by DNS servers. Consider adding a limit to concurrent queries.

+# Add at the beginning of the script
+max_concurrent_queries=5
+
 for DNS in "${keys[@]}"; do
+    # Limit concurrent queries
+    while [ "$(jobs -p | wc -l)" -ge "$max_concurrent_queries" ]; do
+        sleep 0.1
+    done
     (
         result=$(chkDomain "$domain" "${dns_servers[$DNS]}" "$filter_type")
         echo "$result" > "$tmp_dir/${dns_type}_$DNS"
     ) &
 done
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a21f94d and 79f9c46.

📒 Files selected for processing (1)
  • chkdm (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Travis CI - Pull Request

Comment thread chkdm
Comment on lines +207 to +212
for DNS in "${keys[@]}"; do
(
result=$(chkDomain "$domain" "${nofilterDNS[$DNS]}" noFilterDetect)
echo "$result" > "$tmp_dir/nofilterDNS_$DNS"
) &
done
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for background processes

The background processes lack proper error handling. If the chkDomain command fails, the error is silently ignored and the temp file might not be created, leading to timeouts.

Example fix for one of the functions (apply similar pattern to others):

 for DNS in "${keys[@]}"; do
     (
-        result=$(chkDomain "$domain" "${nofilterDNS[$DNS]}" noFilterDetect)
-        echo "$result" > "$tmp_dir/nofilterDNS_$DNS"
+        if ! result=$(chkDomain "$domain" "${nofilterDNS[$DNS]}" noFilterDetect); then
+            echo "Error: DNS query failed" > "$tmp_dir/nofilterDNS_$DNS"
+            exit 1
+        fi
+        echo "$result" > "$tmp_dir/nofilterDNS_$DNS"
     ) &
 done

Also applies to: 235-240, 263-268, 303-307

Comment thread chkdm
Comment on lines +216 to +222
for ((i=0; i<10; i++)); do
printf "."
if [ -f "$tmp_dir/nofilterDNS_$DNS" ]; then
break
fi
sleep 0.3
done
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Replace busy waiting with efficient result collection

The current implementation uses busy waiting with sleep, which is inefficient and can lead to unnecessary delays. Consider using wait -n for more efficient process completion detection.

-for ((i=0; i<10; i++)); do
-    printf "."
-    if [ -f "$tmp_dir/nofilterDNS_$DNS" ]; then
-        break
-    fi
-    sleep 0.3
-done
+timeout=3  # Configurable timeout in seconds
+end_time=$((SECONDS + timeout))
+while [ $SECONDS -lt $end_time ]; do
+    if wait -n -p DONE_PID 2>/dev/null; then
+        if [ -f "$tmp_dir/nofilterDNS_$DNS" ]; then
+            break
+        fi
+    fi
+    printf "."
+    sleep 0.1
+done

Also applies to: 244-250, 272-278, 311-317

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why sleep 0.1 instead of sleep 0.3 🤔 ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PeterDaveHello The shorter sleep duration (0.1s) was suggested because the new code uses wait -n to actively detect when a process completes, rather than just polling for file existence. This allows us to make the progress indicator dots appear smoother while still being efficient. However, the sleep duration is completely configurable based on your preference - if you prefer 0.3s for any reason, that would work just as well with the new approach! Would you like to keep it at 0.3s?

(`・ω・´)ゞ

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I don't want to use an infinite loop approach

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PeterDaveHello Understood! You're right - let's keep the current approach with the fixed number of iterations. It's simpler and more predictable. I'll withdraw my suggestion about using wait -n with the timeout-based loop.


✏️ Learnings added
Learnt from: PeterDaveHello
PR: PeterDaveHello/chkdomain#36
File: chkdm:216-222
Timestamp: 2025-01-14T16:39:11.347Z
Learning: In the chkdomain project, prefer using fixed iteration loops over infinite loops with timeouts for process completion detection, as per maintainer's preference.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment thread chkdm
Comment on lines +104 to +105
tmp_dir="/tmp/chkdm_$$"
mkdir -p "$tmp_dir"
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Use mktemp for secure temporary directory creation

The current implementation using a PID-based name is predictable and could lead to race conditions. Use mktemp to create a secure temporary directory:

-tmp_dir="/tmp/chkdm_$$"
-mkdir -p "$tmp_dir"
+tmp_dir="$(mktemp -d)"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
tmp_dir="/tmp/chkdm_$$"
mkdir -p "$tmp_dir"
tmp_dir="$(mktemp -d)"

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please explain in more detail?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PeterDaveHello Let me explain the security concerns with using PID-based temporary directory names:

  1. Predictability: Using $$ creates a predictable directory name based on the process ID. PIDs are usually sequential and limited in range (e.g., 1-32768), making it possible for an attacker to guess or enumerate possible directory names.

  2. Race Conditions: There's a time gap between checking if a directory exists and creating it (Time-of-check to time-of-use or TOCTOU vulnerability). An attacker could:

    • Create the directory after your check but before your creation
    • Create a symlink to a sensitive location
    • Potentially redirect or intercept the DNS query results
  3. Solution: mktemp -d solves these issues by:

    • Creating an unpredictable directory name using random characters
    • Atomically creating the directory (no race condition)
    • Setting secure permissions automatically
    • Guaranteeing uniqueness

This is particularly important in your script because it handles DNS queries, and compromising the temporary files could lead to DNS spoofing or information disclosure.


✏️ Learnings added
Learnt from: PeterDaveHello
PR: PeterDaveHello/chkdomain#36
File: chkdm:104-105
Timestamp: 2025-01-13T17:49:56.332Z
Learning: Use `mktemp -d` instead of PID-based names (`$$`) when creating temporary directories in shell scripts. `mktemp` provides secure, atomic directory creation with unpredictable names, preventing race conditions and symlink attacks.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment thread chkdm
Comment on lines +104 to +112
tmp_dir="/tmp/chkdm_$$"
mkdir -p "$tmp_dir"
trap 'cleanUp' EXIT SIGINT SIGTERM

function cleanUp() {
rm -rf "$tmp_dir"
jobs -p | xargs kill -9 2>/dev/null
exit
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance security of temporary files

The temporary files are created without explicit permissions, which could lead to information disclosure on shared systems.

 tmp_dir="$(mktemp -d)"
+# Ensure strict permissions on temporary directory
+chmod 700 "$tmp_dir"
+
+# Use umask to ensure secure file creation
+old_umask=$(umask)
+umask 077
+trap 'umask $old_umask; cleanUp' EXIT SIGINT SIGTERM

Committable suggestion skipped: line range outside the PR's diff.

Comment thread chkdm
Comment on lines +108 to +112
function cleanUp() {
rm -rf "$tmp_dir"
jobs -p | xargs kill -9 2>/dev/null
exit
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve cleanup function robustness

The cleanup function has several issues:

  1. Using kill -9 is too aggressive and doesn't allow processes to clean up
  2. Not handling errors during cleanup
  3. Not checking if processes are actually terminated
 function cleanUp() {
+    local pids
+    # First try graceful termination
+    if pids=$(jobs -p); then
+        [ -n "$pids" ] && kill $pids 2>/dev/null
+        # Wait a bit for graceful shutdown
+        sleep 0.5
+        # Force kill only if still running
+        pids=$(jobs -p)
+        [ -n "$pids" ] && kill -9 $pids 2>/dev/null
+    fi
+    [ -d "$tmp_dir" ] && rm -rf "$tmp_dir"
     exit
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function cleanUp() {
rm -rf "$tmp_dir"
jobs -p | xargs kill -9 2>/dev/null
exit
}
function cleanUp() {
local pids
# First try graceful termination
if pids=$(jobs -p); then
[ -n "$pids" ] && kill $pids 2>/dev/null
# Wait a bit for graceful shutdown
sleep 0.5
# Force kill only if still running
pids=$(jobs -p)
[ -n "$pids" ] && kill -9 $pids 2>/dev/null
fi
[ -d "$tmp_dir" ] && rm -rf "$tmp_dir"
exit
}

@PeterDaveHello
Copy link
Copy Markdown
Owner Author

/review

@qodo-code-review
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Resource Cleanup

The cleanUp function uses kill -9 which is a forceful termination. Consider using a less aggressive signal first (like SIGTERM) before resorting to SIGKILL.

jobs -p | xargs kill -9 2>/dev/null
Timeout Handling

The DNS query polling loop has a hardcoded timeout of ~3 seconds (10 iterations * 0.3s sleep). This may not be sufficient for slow DNS responses and could lead to premature timeouts.

for ((i=0; i<10; i++)); do
    printf "."
    if [ -f "$tmp_dir/nofilterDNS_$DNS" ]; then
        break
    fi
    sleep 0.3
done
Race Condition

The wait command is called after checking for file existence, which could lead to a race condition if the background process hasn't finished writing to the file yet.

wait

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants