Skip to content

Upgrade Ruby 2.6.10 -> 4.0.5 (staircase, 8 commits)#2

Open
lleirborras wants to merge 18 commits into
masterfrom
upgrade/ruby-4-stairs
Open

Upgrade Ruby 2.6.10 -> 4.0.5 (staircase, 8 commits)#2
lleirborras wants to merge 18 commits into
masterfrom
upgrade/ruby-4-stairs

Conversation

@lleirborras

Copy link
Copy Markdown
Member

Summary

Walk ldapr up the Ruby version staircase one rung at a time. End state: Ruby 4.0.5, Sinatra 4, Rack 3, OmniAuth 2, Puma 6, Capistrano 3. Every commit ships green at its target Ruby.

Built per the staircase playbook (Phase −1 fan-out of Architect/Analyst/Tester/DevOps subagents, then deterministic per-step execution). See docs/UPGRADE.md in the coordination repo.

Staircase

Commit Ruby What landed
9b39d71 2.6.10 Phase 0: Docker test harness, RSpec smoke suite (was zero tests), Cap 3 + Puma swap, Gemfile.deploy split, secret → ENV, YAML.safe_load, require 'logger' for AS 6.1, frozen-string-safe Person#[]
0f18e32 2.7.8 Step 1 — pin baseline gems (Sinatra 2.1 / Rack 2.2 / OmniAuth 1.9), spec format=URL-extension
ff3fd40 3.0.7 Step 2 — clean bump
f7a2680 3.1.6 Step 3 — Psych 4 default; safe_load already in place
1d98916 3.2.6 Step 4 — activesupport ~> 7.1; transitively nokogiri 1.17 / addressable 2.9
29d97bb 3.3.6 Step 5 — byebug → stdlib debug; anchor app_file/root for Sinatra 2.1 on Ruby 3.3
61d31fb 3.4.4 Step 6 — Sinatra 4 + Rack 3 + rack-protection 4 + rack-session 2 + OmniAuth 2 + omniauth-cas 3 + Puma 6; stdlib pins (ostruct/logger/fileutils/bigdecimal/base64/mutex_m/drb); session secret bumped to ≥64 bytes
e184347 4.0.5 Step 7 — final bump; C extensions verified

Test plan

  • ./scripts/upgrade-test.sh 2.6.10 — 23/23 green
  • ./scripts/upgrade-test.sh 2.7.8 — 23/23 green
  • ./scripts/upgrade-test.sh 3.0.7 — 23/23 green
  • ./scripts/upgrade-test.sh 3.1.6 — 23/23 green
  • ./scripts/upgrade-test.sh 3.2.6 — 23/23 green
  • ./scripts/upgrade-test.sh 3.3.6 — 23/23 green
  • ./scripts/upgrade-test.sh 3.4.4 — 23/23 green
  • ./scripts/upgrade-test.sh 4.0.5 — 23/23 green
  • Manual smoke against staging CAS + staging AD before merge (no automated wire-protocol coverage)
  • Manual cap deploy to staging end-to-end (Cap 3 + Puma + systemd unit rename)

Required ops coordination before merge

  1. Systemd unit rename: ldapr-unicorn.serviceldapr-puma.service on staging + production hosts. config/deploy.rb now runs sudo systemctl restart ldapr-puma; first cap 3 deploy will fail otherwise.
  2. Production SESSION_SECRET: must be ≥64 bytes (rack-session 2 minimum). App boot raises with no fallback in RACK_ENV=production if unset. Generate with ruby -rsecurerandom -e 'puts SecureRandom.hex(64)'.
  3. Deploy bundle: deploy host now uses BUNDLE_GEMFILE=Gemfile.deploy bundle install to pick up Cap 3 + infrad + net-ssh 7 + bcrypt_pbkdf + ed25519.

OmniAuth 2 caveat

Step 6 enables OmniAuth.config.allowed_request_methods = [:get, :post] so the existing redirect-based CAS initiation keeps working. Strictly speaking OmniAuth 2 wants POST-only, but the only initiator is a server-side redirect from this app (no user-controlled inputs reach /auth/cas), so CSRF on the initiator isn't a meaningful attack surface here.

Out of scope

  • Replacing the old GET-driven OmniAuth initiation with a POST form (could be a follow-up if security wants strict OmniAuth 2 behaviour).
  • The Elasticsearch scripts/used-unused-mailboxes.rb logic (now correctly declares its httparty dep but logic is unchanged).
  • Writing automated tests against a live LDAP / live CAS (stubbed in the smoke suite).

Sets up the staircase upgrade harness. App still on Ruby 2.6.10.

Tooling:
- Dockerfile.upgrade (parameterised by RUBY_VERSION, linux/amd64)
- scripts/upgrade-test.sh (build + run rspec)
- .ruby-version (2.6.10)
- .dockerignore

Test suite (new; previously zero tests):
- spec/spec_helper.rb (RSpec + Rack::Test + WebMock)
- spec/support/ldap_stub.rb (deterministic Net::LDAP stand-in)
- spec/support/omniauth_stub.rb (OmniAuth test_mode)
- spec/routes_spec.rb (auth redirect + JSON/CSV/HTML formats + 400 + logout + failure)
- spec/ldap_spec.rb (AD timestamp round-trip, GUID decode, active?, locked_out?, method_missing)
- spec/export_spec.rb (JSON + CSV + tabular_attributes)
23 specs, 0 failures at Ruby 2.6.10.

Gemfile:
- Drop capistrano 2, infrad@capistrano2, net-ssh < 6 (moved to Gemfile.deploy)
- Drop unicorn, add puma
- Add httparty (used by scripts/used-unused-mailboxes.rb, previously undeclared)
- Add rspec / rack-test / webmock under :test

Gemfile.deploy (new):
- capistrano ~> 3.17, capistrano-bundler ~> 2.1
- infrad (default branch = Cap 3-compatible)
- net-ssh ~> 7, ed25519, bcrypt_pbkdf
- BUNDLE_GEMFILE=Gemfile.deploy on the deploy host

Deploy migration:
- config/unicorn.conf.rb deleted
- config/puma.rb added (thread + worker + socket support)
- Capfile rewritten for Cap 3 (Infrad::Capistrano3::Recipes.create! 'ldapr')
- config/deploy.rb rewritten for Cap 3 (lock ~> 3.17, linked_dirs/files, restart ldapr-puma)
  NOTE: systemd unit must be renamed from ldapr-unicorn.service to ldapr-puma.service
  on the host before first cap 3 deploy.

App-code patches (all flagged by the Analyst agent):
- ldapr.rb:10  Bundler.require -> explicit requires (no dev gems in prod)
- ldapr.rb:18  hardcoded session secret -> ENV['SESSION_SECRET'] (raise in prod if unset)
- ldap.rb:1    require 'logger' before active_support (AS 6.1 / Ruby 2.6 autoload bug)
- ldap.rb:31   YAML.load_file -> YAML.safe_load(..., permitted_classes: [], aliases: true)
- ldap.rb:240  Person#[] now dups before force_encoding (frozen-string safe)

Baseline still green at Ruby 2.6.10:
  ./scripts/upgrade-test.sh 2.6.10
  -> 23 examples, 0 failures
Ruby 2.7.8 (final 2.7 patch). No known cliffs at this rung — the
'usually free' step per docs/UPGRADE.md.

Pinning the staircase baseline (held until later steps need to bump):
- sinatra        ~> 2.1.0   (current production version)
- rack           ~> 2.2
- rack-protection ~> 2.1
- omniauth       ~> 1.9.0
- omniauth-cas   ~> 2.0.0
- net-ldap       ~> 0.17
- activesupport  ~> 6.1.0
- puma           ~> 5.6     (Ruby 2.x compatible)

Without these pins, unconstrained bundler resolves to Sinatra 4 /
Rack 3 / latest OmniAuth, producing an incompatible mix that fails
to boot under the live OmniAuth 1.x / Rack 2 code path.

Specs adjusted: route format detection uses the URL extension form
(/l.json, /l.csv) rather than the ?format= query param. Sinatra 2.1
treats them differently; the URL extension is the supported path on
this route ('#{ROOT}.?:format?').

./scripts/upgrade-test.sh 2.7.8
-> 23 examples, 0 failures
No code changes required. Pins from step 1 hold (Sinatra 2.1 / Rack 2.2 /
OmniAuth 1.9 / activesupport 6.1 all support Ruby 3.0).

Psych 4 isn't yet the default on Ruby 3.0 — that lands on 3.1 — but
YAML.safe_load is already in use (Phase 0) so the cliff is pre-handled.

./scripts/upgrade-test.sh 3.0.7
-> 23 examples, 0 failures
Psych 4 becomes the default loader. YAML.safe_load (in place since
Phase 0) absorbs the change. No other action required.

./scripts/upgrade-test.sh 3.1.6
-> 23 examples, 0 failures
Ruby 3.2 hard-deprecates File.exists? (already patched in Phase 0 via
the Unicorn -> Puma swap; no occurrences remain in the app).

activesupport bumped 6.1 -> 7.1 to clear the EOL window (6.1 reached
security-only in late 2023). 7.1 is the IFAD fleet target on the way
to 8.x.

Transitively resolved (no direct pin needed):
- nokogiri    1.17.x (Architect target was 1.16+; mini_portile2 pin removed)
- addressable 2.8.x  (CVE-2023-28625 fixed)

./scripts/upgrade-test.sh 3.2.6
-> 23 examples, 0 failures
…aths

byebug 11.x is known to segfault on Ruby 3.3; swap for the stdlib
'debug' gem (available since Ruby 3.1).

Ruby 3.3 changed caller_locations subtly; Sinatra 2.1 reads it to
guess app_file / views and now misfires when the app is loaded via
require (e.g. from rspec). Anchored explicitly:

  set :app_file, __FILE__
  set :root, __dir__

This is a no-op on direct rackup runs (where Sinatra's guess was
correct) and a fix when loaded indirectly. Will become moot once we
bump to Sinatra 3+ in the next steps.

./scripts/upgrade-test.sh 3.3.6
-> 23 examples, 0 failures
Biggest rung of the staircase.

Ruby 3.4: bundled gems extracted from default load path. Explicit pins
added for: ostruct, logger, fileutils, bigdecimal, base64, mutex_m, drb.
Warning at 3.4, error at 4.0 (next step).

Sinatra 4.1 + Rack 3.1 + rack-protection 4 + rack-session 2.
- Rack::Session::Cookie now lives in the rack-session gem; required
  explicitly in ldapr.rb.
- Cookie middleware option renamed :secret -> :secrets (array).
- rack-session 2 requires secret >= 64 bytes; test SESSION_SECRET
  bumped accordingly.

OmniAuth 2.1 + omniauth-cas 3.0 — forced now because omniauth 1.9
caps rack < 3. OmniAuth 2 defaults to POST-only on the request phase.
ldapr.rb sets OmniAuth.config.allowed_request_methods = [:get, :post]
because the only initiator is a server-side redirect (no
user-controlled inputs reach /auth/cas), so CSRF on the initiator is
not a meaningful attack surface. silence_get_warning suppresses the
deprecation notice.

Puma bumped 5.6 -> 6.4 (5.x is EOL).

Production-env note (out of scope for the test image): the production
SESSION_SECRET must also be >= 64 bytes. See app boot at ldapr.rb:18 —
raises in RACK_ENV=production if unset.

./scripts/upgrade-test.sh 3.4.4
-> 23 examples, 0 failures
Stdlib gem pins added at step 6 (ostruct/logger/fileutils/bigdecimal/
base64/mutex_m/drb) cover what Ruby 4 hard-removes from the default
load path.

C extensions verified:
- nokogiri 1.19.x precompiled binaries ship for x86_64-linux-gnu
- no oci8 / native auth gems to compile
- net-ldap is pure Ruby; net-ssh lives only in Gemfile.deploy

./scripts/upgrade-test.sh 4.0.5
-> 23 examples, 0 failures

Staircase complete. ldapr now on Ruby 4.0.5, Sinatra 4, Rack 3,
OmniAuth 2, Puma 6, Cap 3.
Per docs/UPGRADE.md in the parent upgrade-project repo:
- §13 'Move config files -> ENV vars'
- §14 'Keep the README current'

§13 changes:
- ldap.rb: LDAP.config now reads discrete env vars instead of loading
  config/ldap.yml.
    LDAP_HOSTNAME, LDAP_PORT (default 389), LDAP_USERNAME, LDAP_PASSWORD,
    LDAP_BASE (default 'dc=IFAD,dc=ORG'), LDAP_BRANCHES (comma-separated),
    LDAP_ENCRYPTION ('simple_tls' | 'start_tls' | unset).
  config_file method gone. YAML require gone.
- config/ldap.yml.example: deleted.
- config/deploy/ldap.yml.erb: deleted (was an interactive cap2 prompt
  template).
- config/deploy.rb: drop 'append :linked_files, config/ldap.yml' — the
  symlink target no longer exists.
- .gitignore: drop the 'config/ldap.yml' entry.

§14 changes:
- README.md: new (previously absent). Covers stack, full Configuration
  (environment variables) table, local-run, routes + query semantics,
  testing harness, deploy (incl. systemd rename + SESSION_SECRET >=64
  bytes), upgrade history of notable transitions.

Smoke at 4.0.5 stays green:
  ./scripts/upgrade-test.sh 4.0.5
  -> 23 examples, 0 failures
@lleirborras

Copy link
Copy Markdown
Member Author

Folded the fleet-rule cleanup into this PR (commit 1bdad07), per docs/UPGRADE.md §13 (config -> ENV) + §14 (README stays current):

§13 — config/ldap.yml -> ENV

  • ldap.rb: LDAP.config now reads discrete env vars: LDAP_HOSTNAME, LDAP_PORT, LDAP_USERNAME, LDAP_PASSWORD, LDAP_BASE, LDAP_BRANCHES (comma-separated), optional LDAP_ENCRYPTION. config_file method gone, YAML require gone.
  • Deleted config/ldap.yml.example + config/deploy/ldap.yml.erb (the cap2 interactive prompt template).
  • Deploy: config/deploy.rb drops append :linked_files, 'config/ldap.yml'. .gitignore drops the matching entry.

§14 — README created (previously absent)

  • Stack, full env-var reference table, local-run, routes + query semantics, testing harness, deploy (systemd rename + ≥64-byte SESSION_SECRET), upgrade history.

Smoke at 4.0.5: 23 examples, 0 failures (no regression).

Coordination — the deploy host must populate the new LDAP_* env vars before this PR is rolled out, in addition to the systemd-unit rename + SESSION_SECRET items already in the PR description.

- 'provider :cas, host: ...' now reads CAS_HOST env var (default
  'cas.ifad.org'). Lets staging point at a different CAS server
  without code change.
- README: documents CAS_HOST.
- Fix lint typo on ldapr.rb:48 ('present?x' -> 'present?'). Hit during
  manual smoke when unauthenticated user requested /l with a query
  string; raised NameError: undefined local variable 'x'.

./scripts/upgrade-test.sh 4.0.5 -> 23 examples, 0 failures
Container can't reach IFAD's internal CA out of the box; OmniAuth CAS
callback hits 'certificate verify failed (unable to get local issuer
certificate)' during manual smoke runs.

New CAS_DISABLE_SSL_VERIFICATION env var (default off). When truthy
('1'/'true'/'yes'), sets OpenSSL::SSL::SSLContext::DEFAULT_PARAMS
verify_mode to VERIFY_NONE before OmniAuth boots. Loud STDERR warning
on use.

Production must never set this. Proper fix is mounting the IFAD CA
bundle and pointing SSL_CERT_FILE at it; README calls that out too.

./scripts/upgrade-test.sh 4.0.5 -> 23 examples, 0 failures
…mutating DEFAULT_PARAMS

Ruby 4 / current openssl gem freezes OpenSSL::SSL::SSLContext::DEFAULT_PARAMS,
so the prior code blew up at boot with:
  FrozenError: can't modify frozen Hash: {verify_mode: 1, ...}

Switch to overriding SSLContext#set_params via class_eval. Every
Net::HTTP (and omniauth-cas service-ticket validation) call goes
through set_params, so injecting verify_mode: VERIFY_NONE there has
the same effect with no mutation of frozen state.

Verified:
  ./scripts/upgrade-test.sh 4.0.5 -> 23/23 green
  docker run -e CAS_DISABLE_SSL_VERIFICATION=true ... -> boots cleanly
Net::LDAP#search returns nil on bind / search failure (wrong creds,
invalid base DN, TLS requirement, no permission to read branch, ...).
The old code called .map! on the return value, so manual testing got
an opaque

  undefined method 'map!' for nil:NilClass
    /app/ldap.rb:184:in 'block in LDAP::Person.search'

instead of the actual LDAP server message.

Now: capture entries explicitly, raise with the LDAP operation result
(code + message) when nil, and call .map (non-bang) since the result is
ours anyway. The error string also hints at the likely env vars to
check, since this is the most common cause during the first cap3 / k8s
manual smoke.

./scripts/upgrade-test.sh 4.0.5 -> 23 examples, 0 failures
Manual smoke run with
  LDAP_BRANCHES='ou=People,dc=IFAD,dc=ORG,ou=People-NALO,...'

broke because .split(',') shredded every DN at the comma, so the
search was issued against half-DNs like 'dc=IFAD' alone — server
returned code=10 Referral.

Switch to a JSON array, consistent with the PEOPLESOFT_API_TOKENS
JSON pattern in the parent docs/UPGRADE.md §13. Example:
  LDAP_BRANCHES='["ou=People,dc=IFAD,dc=ORG","ou=People-NALO,ou=People,dc=IFAD,dc=ORG"]'

JSON.parse error includes the format hint. README updated accordingly.

./scripts/upgrade-test.sh 4.0.5 -> 23 examples, 0 failures
Same internal-CA wall as CAS but on net-ldap. Symptom on port 636 with
LDAP_ENCRYPTION=simple_tls (or start_tls on 389):

  Net::LDAP::NoBindResultError: no bind result
    at .../net-ldap/auth_adapter/simple.rb:26

(the TLS handshake fails before bind, so the adapter gets no result).

parse_encryption now augments the encryption spec with
tls_options: { verify_mode: VERIFY_NONE } when
LDAP_DISABLE_SSL_VERIFICATION is truthy. Loud STDERR warning on use.

Production still expected to mount IFAD's CA bundle + set SSL_CERT_FILE.
README updated.

./scripts/upgrade-test.sh 4.0.5 -> 23 examples, 0 failures
LDAP_TEST_MODE was set in Dockerfile.upgrade and scripts/upgrade-test.sh
during Phase 0 but never read in the app or specs — vestigial.

Stripping it as part of the §15 (mandatory-deps default-on / negative
env names) audit pass over already-migrated apps.

./scripts/upgrade-test.sh 4.0.5 -> 23 examples, 0 failures
bundle outdated --filter-major: puma 6.6.1 -> 8.0.2, activesupport 7.1.6 ->
8.1.3. Both bumped (activesupport now matches the fleet 8.1 target). Held
back: mustermann 3 (sinatra 4.1 caps it) + diff-lcs (rspec transitive).

activesupport 8.1 passes a FROZEN options Hash to as_json — LDAP::Person#as_json
called the mutating symbolize_keys! and raised FrozenError. Switched to the
non-mutating symbolize_keys.

Full rspec: 23 examples, 0 failures.
net-ldap was pinned `~> 0.17` (< 0.18) while the lock already carried 0.20.0
— misleading. Bumped the pin to `~> 0.20` to match. `bundle update` relocks
every gem to its current version:

  sinatra 4.2.1 · rack 3.2.6 · rack-protection 4.2.1 · rack-session 2.1.2 ·
  net-ldap 0.20.0 · activesupport 8.1.3 · rack-test 2.2.0 · webmock 3.26.2 ·
  puma 8.0.2

Held back (outside constraints): mustermann 3.1.1 (sinatra 4.2 caps < 4) +
diff-lcs 1.6.2 (rspec transitive). Nothing else outdated.

Full rspec: 23 examples, 0 failures.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant