Fix age range mismatch in create_cachar_inspired_data() test helper #4

Closed
Copilot wants to merge 2 commits into master from copilot/fix-age-range-mismatch


Copilot AI commented Feb 25, 2026

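sample() in create_cachar_inspired_data() was called with a 53-element range (18:70) and a 73-element prob vector. sample() requires prob to have the same length as the vector being sampled, so the call errored and caused 15 test failures.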

Change

# Before
age = sample(18:70, n, replace = TRUE,
             prob = c(rep(0.8, 22), rep(1.2, 30), rep(0.6, 21)))  # 53 ≠ 73 ❌

# After
age = sample(18:90, n, replace = TRUE,
             prob = c(rep(0.8, 22), rep(1.2, 30), rep(0.6, 21)))  # 73 == 73 ✓

18:90 yields 73 elements, matching the intended probability distribution across the three age bands.
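
The length arithmetic and the three age bands implied by the fix can be checked directly in R. This is a minimal sketch: the band boundaries are derived from the rep() counts above, and the variable names (ages, weights, band, bands) are illustrative rather than taken from the package.

```r
ages <- 18:90
weights <- c(rep(0.8, 22), rep(1.2, 30), rep(0.6, 21))

# Both vectors must have the same length for sample() to accept them
stopifnot(length(ages) == length(weights))

# Label each age with the weight band it falls into, in band order
band <- factor(rep(c("0.8", "1.2", "0.6"), times = c(22, 30, 21)),
               levels = c("0.8", "1.2", "0.6"))

# Lowest and highest age in each band (one column per band)
bands <- sapply(split(ages, band), range)
bands
```

With the fixed range, the 0.8 weight covers ages 18 to 39, the 1.2 weight covers 40 to 69, and the 0.6 weight covers 70 to 90.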

Original prompt

Fix create_cachar_inspired_data() Age Range Mismatch in Tests

The create_cachar_inspired_data() helper function in tests/testthat/test-calc-risk-diff.R has a critical bug causing 15 test failures.

The Problem:

In the create_cachar_inspired_data() function (used in ~9 tests), the age sampling has mismatched vector lengths:

# BROKEN CODE:
age = sample(18:70, n, replace = TRUE,
             prob = c(rep(0.8, 22), rep(1.2, 30), rep(0.6, 21)))

Why it fails:

  • 18:70 creates a range with 53 elements (70-18+1 = 53)
  • prob vector has 73 elements (22+30+21 = 73)
  • R's sample() function requires prob to have the same length as the vector being sampled: 53 ≠ 73
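
The failure described above can be reproduced outside the test suite. The sketch below wraps the broken call in tryCatch() so the error is captured as text instead of stopping the script; the variable name broken is illustrative only.

```r
# Reproduce the mismatch: 53 values to sample from, 73 weights
broken <- tryCatch(
  sample(18:70, 5, replace = TRUE,
         prob = c(rep(0.8, 22), rep(1.2, 30), rep(0.6, 21))),
  error = function(e) conditionMessage(e)
)

# 'broken' now holds the error message rather than sampled ages
broken
```

In base R the captured message should report an incorrect number of probabilities, which is the error every affected test hits before reaching its own assertions.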

The Solution:

The probability vector has 73 entries (offsets 0-72 from the starting age of 18), so the sampled age range must also contain 73 values. Change the upper bound of the range to match:

# FIXED CODE:
age = sample(18:90, n, replace = TRUE,
             prob = c(rep(0.8, 22), rep(1.2, 30), rep(0.6, 21)))

Why this works:

  • 18:90 creates a range with 73 elements (90-18+1 = 73) ✓
  • prob vector has 73 elements
  • Maintains the intended epidemiological age distribution
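
One way to keep the two lengths from drifting apart again is to derive both vectors from a single band definition. The sketch below is not the code in this PR: make_age_weights() and the bands data frame are hypothetical, and the band boundaries (18-39, 40-69, 70-90) are an assumption inferred from the rep() counts 22, 30, and 21.

```r
# Hypothetical helper: build the age vector and its weights from one table,
# so their lengths agree by construction.
make_age_weights <- function(bands) {
  ages <- unlist(Map(seq, bands$from, bands$to))
  weights <- rep(bands$weight, times = bands$to - bands$from + 1)
  stopifnot(length(ages) == length(weights))
  list(ages = ages, weights = weights)
}

# Assumed band boundaries, inferred from rep(0.8, 22), rep(1.2, 30), rep(0.6, 21)
bands <- data.frame(from = c(18, 40, 70),
                    to = c(39, 69, 90),
                    weight = c(0.8, 1.2, 0.6))

aw <- make_age_weights(bands)

set.seed(42)
age <- sample(aw$ages, 100, replace = TRUE, prob = aw$weights)
```

Changing a band boundary in this form updates both vectors together, whereas editing 18:70 and the rep() counts separately is what allowed the original mismatch.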

Affected Tests (15 total):

All tests using create_cachar_inspired_data() fail with the same error:

  1. "calc_risk_diff works with Cachar-inspired tobacco/areca data" (line 226)
  2. "calc_risk_diff works with tobacco_areca_both combination variable" (line 243)
  3. "calc_risk_diff compares single vs combined tobacco/areca exposures" (line 260)
  4. "calc_risk_diff handles age-adjusted analysis with tobacco/areca data" (line 280)
  5. "calc_risk_diff handles sex-stratified analysis with tobacco/areca data" (line 295)
  6. "calc_risk_diff handles residence-stratified analysis" (line 311)
  7. "calc_risk_diff works with head/neck specific outcomes" (line 326)
  8. "calc_risk_diff produces epidemiologically plausible results with tobacco data" (line 568)
  9. "calc_risk_diff shows expected sex differences in tobacco use patterns" (line 590)
  10. "calc_risk_diff handles moderately large datasets efficiently" (line 661)
  11. Plus additional failures from test data generator usage
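
Because every test listed above builds its data through the same helper, a small test on the sampling step alone would report this kind of mismatch as one failure instead of many identical ones. The sketch below assumes testthat is installed; sample_ages() is a hypothetical stand-in for the age-sampling step, not the real create_cachar_inspired_data().

```r
library(testthat)

# Hypothetical stand-in for the age-sampling step of the real helper
sample_ages <- function(n) {
  ages <- 18:90
  weights <- c(rep(0.8, 22), rep(1.2, 30), rep(0.6, 21))
  # Fail fast with a clear message if the two vectors drift apart
  stopifnot(length(ages) == length(weights))
  sample(ages, n, replace = TRUE, prob = weights)
}

test_that("age sampling returns n ages inside 18-90", {
  set.seed(1)
  ages <- sample_ages(500)
  expect_length(ages, 500)
  expect_true(all(ages >= 18 & ages <= 90))
})
```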

File to Modify:

  • tests/testthat/test-calc-risk-diff.R: the line containing age = sample(18:70, inside the create_cachar_inspired_data() function

Expected Result:

  • Before: 15 failures, 548 passing
  • After: 1 failure (boundary detection test only), 562 passing

This pull request was created from Copilot chat.



Co-authored-by: jackmurphy2351 <120122776+jackmurphy2351@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix age range mismatch in create_cachar_inspired_data tests" to "Fix age range mismatch in create_cachar_inspired_data() test helper" Feb 25, 2026
@jackmurphy2351 jackmurphy2351 marked this pull request as ready for review February 25, 2026 21:17