
create sample script for ISSUE930#1113

Merged
suketa merged 2 commits into main from example_of_issue920
Feb 21, 2026

Conversation


@suketa (Owner) commented Feb 21, 2026

Summary by CodeRabbit

  • New Features

    • Register in-memory CSV data (from IO-like objects) as queryable tables for ad-hoc queries.
  • Samples

    • Example demonstrating how to register and query CSV-backed tables from memory.
    • Performance benchmark comparing a streaming table-function approach versus creating & populating a table, with timings.


coderabbitai bot commented Feb 21, 2026

📝 Walkthrough

Adds sample Ruby methods that register CSV-backed tables for DuckDB connections: a table-valued-function based registrar and a CREATE TABLE + insert variant, plus a benchmark comparing the two approaches using in-memory CSV (StringIO).

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **CSV registration sample**<br>`sample/issue930.rb` | Adds `DuckDB::Connection#register_as_table(name, io, csv_options: {})`, which builds and registers a CSV-backed table-valued function from an IO object, inferring columns from headers and streaming rows. |
| **Benchmark & alternate registration**<br>`sample/issue930_benchmark.rb` | Adds `register_as_table_with_table_function` and `register_as_table_with_create_table` to compare table-function streaming vs. CREATE TABLE + inserts; includes a benchmark script that generates a large StringIO CSV and measures elapsed times. |
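The timing pattern the benchmark script uses (`Process.clock_gettime` with a monotonic clock, so wall-clock adjustments cannot skew the measurement) can be sketched in plain Ruby; the workload below is a stand-in, not the PR's actual benchmark:

```ruby
# Measure elapsed wall time using a monotonic clock, the same
# pattern as the start_time/end_time pairs in issue930_benchmark.rb.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

seconds = elapsed { (1..10_000).sum } # stand-in workload
puts format('%.6f s', seconds)
```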

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Script as Client Script
    participant Conn as DuckDB::Connection
    participant CSV as CSV Reader (StringIO)
    participant TF as Table Function / Table Storage
    Script->>Conn: register_as_table(name, io, csv_options)
    Conn->>CSV: open/parse headers
    CSV-->>Conn: column names/types
    Conn->>TF: define/register table-valued function with schema
    loop per CSV row
        Conn->>CSV: read next row
        CSV-->>Conn: row values
        Conn->>TF: output.set_value(...); emit row
    end
    Script->>Conn: execute "SELECT * FROM <registered_function>()"
    Conn->>TF: stream rows to query engine
    TF-->>Conn: result rows
    Conn-->>Script: query results
```
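Stripped of the DuckDB specifics, the per-row loop in the diagram reduces to this stdlib-only sketch (CSV + StringIO; the 1/0 return value mimics the table function's "rows emitted" contract, and `sink` stands in for the DuckDB output object):

```ruby
require 'csv'
require 'stringio'

io = StringIO.new("id,name\n1,Alice\n2,Bob")
csv = CSV.new(io, headers: true)
headers = csv.first.headers # reading one row materializes the headers
csv.rewind

# One invocation of the scan block: copy one row's values into `sink`,
# returning the number of rows emitted (1), or 0 at end of input.
read_row = lambda do |sink|
  row = csv.readline
  if row
    # iterating a CSV::Row yields [header, value] pairs
    row.each_with_index { |(_, value), index| sink[index] = value }
    1
  else
    0
  end
end

sink = {}
read_row.call(sink) # first data row lands in sink
```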

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

🐰 I nibble bytes from StringIO's lair,
streaming rows with elegant flair.
Tables appear with a hop and a cheer,
benchmarks hum — the results draw near. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'create sample script for ISSUE930' accurately describes the main change (adding sample/example scripts for issue 930), which aligns with the PR's primary objective of creating demonstration and benchmarking scripts. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch example_of_issue920

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (3)
sample/issue930.rb (2)

40-47: Connection and database are never closed.

con and db remain open for the lifetime of the process. Either close them explicitly or use DuckDB::Database.open with a block (the block form in lib/duckdb/database.rb ensures db.close is called in ensure).

♻️ Proposed cleanup
-con = db.connect
-con.query('SET threads=1')
-con.register_as_table('csv_io', csv_io, csv_options: { headers: true })
-result = con.query('SELECT * FROM csv_io()').to_a
-
-p result
-puts result.first.first == '1'
-puts result.first[1] == 'Alice'
+DuckDB::Database.open do |db|
+  con = db.connect
+  con.query('SET threads=1')
+  con.register_as_table('csv_io', csv_io, csv_options: { headers: true })
+  result = con.query('SELECT * FROM csv_io()').to_a
+  con.close
+
+  p result
+  puts result.first.first == '1'
+  puts result.first[1] == 'Alice'
+end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sample/issue930.rb` around lines 40 - 47, The code opens a DuckDB Database
and Connection (db and con) but never closes them; either wrap the database
creation in DuckDB::Database.open with a block (which ensures db.close) and
perform con = db.connect inside that block, or add explicit cleanup by calling
con.close and db.close in an ensure/finally to guarantee they are closed even on
errors; update the area around con = db.connect and any use of register_as_table
/ con.query to ensure closure.

10-30: Metrics/MethodLength pipeline failure — 19 lines vs. the 10-line limit.

Either add an inline disable comment or extract the table-function block into a named helper lambda to bring the method within the limit.

♻️ Option: extract block into a named lambda
 def register_as_table(name, io, csv_options: {})
+  raise ArgumentError, 'csv_options must include headers: true' unless csv_options[:headers]
   csv = CSV.new(io, **csv_options)
   headers = csv.first.headers
   csv.rewind
   columns = headers.each_with_object({}) { |header, hash| hash[header] = LogicalType::VARCHAR }
-  tf = DuckDB::TableFunction.create(
-    name:,
-    columns:
-  ) do |_func_info, output|
-    line = csv.readline
-    if line
-      line.each_with_index do |cell, index|
-        output.set_value(index, 0, cell[1])
-      end
-      1
-    else
-      0
-    end
-  end
+  scanner = ->(_, output) {
+    line = csv.readline
+    if line
+      line.each_with_index { |cell, idx| output.set_value(idx, 0, cell[1]) }
+      1
+    else
+      0
+    end
+  }
+  tf = DuckDB::TableFunction.create(name:, columns:, &scanner)
   register_table_function(tf)
 end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sample/issue930.rb` around lines 10 - 30, The method register_as_table is too
long; extract the anonymous table-function block passed to
DuckDB::TableFunction.create into a named helper (e.g., row_provider_lambda) so
register_as_table returns under the 10-line limit; move the logic that reads
csv.readline, iterates cells and calls output.set_value into that lambda
(ensuring it closes over csv and headers as needed), then pass the named lambda
to DuckDB::TableFunction.create and keep the final register_table_function(tf)
call in register_as_table.
sample/issue930_benchmark.rb (1)

10-31: register_as_table_with_table_function duplicates register_as_table from issue930.rb.

The only behavioral difference is the csv.rewind in the else branch. Consider requiring issue930.rb and reusing its method, or moving the shared logic to a common module.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sample/issue930_benchmark.rb` around lines 10 - 31, The method
register_as_table_with_table_function is duplicating register_as_table from
issue930.rb; refactor by either requiring issue930.rb and calling the existing
register_as_table to avoid duplication, or extract the shared
CSV-to-table-function logic into a common module/class (e.g., a new
CsvTableHelper with a method used by both register_as_table and
register_as_table_with_table_function) and make the differing behavior (the
csv.rewind in the else branch) configurable via an argument or callback; update
register_as_table_with_table_function to delegate to the shared implementation
(or call register_as_table) so only the small behavioral difference is
implemented without copying the whole method.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@sample/issue930_benchmark.rb`:
- Line 1: The file header contains a typo "frozen_sring_literal: true"; update
the magic comment to the correct "frozen_string_literal: true" (fix the
misspelled token "sring" to "string") so the top-level frozen string literal
pragma is applied correctly; locate and correct the comment line in the file
(the existing "frozen_sring_literal" occurrence).
- Around line 59-61: The second benchmark silently depends on the first run
having advanced csv_io to EOF; before calling
con.register_as_table_with_create_table('csv_ct', csv_io, ...) or re-running
con.query('SELECT * FROM csv_ct') you must explicitly reset the IO position
(e.g., csv_io.rewind or csv_io.seek(0)) so the CSV is read from the start; add
that reset between the first run and the block that uses csv_io again to ensure
register_as_table_with_create_table and con.query always see the full CSV input.
- Around line 37-40: The SQL construction in register_as_table_with_create_table
is vulnerable: stop interpolating raw table names (name), column names
(headers), and cell values (cell[1]) directly into execute; instead quote/escape
identifiers (use the DB driver's identifier-quoting API or wrap identifiers in
double quotes and escape embedded quotes) when building the CREATE TABLE and
INSERT column list, and avoid string interpolation for values by using
parameterized queries if the DuckDB client supports bind parameters (or at
minimum escape single quotes in cell values with cell[1]&.gsub("'", "''") before
embedding). Ensure you reference the execute/execute("CREATE OR REPLACE TABLE
..."), headers.map block, and the csv.each/values construction so all three
injection surfaces are fixed consistently.

In `@sample/issue930.rb`:
- Line 1: Fix the malformed magic comment: replace the typo
"frozen_sring_literal: true" with the correct "frozen_string_literal: true" as
the first line of the file so Ruby recognizes it and RuboCop's
Style/FrozenStringLiteralComment passes; locate the existing magic comment (the
token frozen_sring_literal) and correct the spelling to frozen_string_literal.
- Around line 37-43: Replace the literal 30000 with an underscore-separated
numeric literal (30_000) in the csv_data construction and change the
double-quoted SQL string to a single-quoted string; specifically update the
expression that builds csv_data (the 30000.times.map call that appends to
csv_data) to use 30_000 and update the con.query call that invokes "SELECT *
FROM csv_io()" to use a single-quoted 'SELECT * FROM csv_io()' string to satisfy
RuboCop for csv_data, csv_io, db, and con.register_as_table/con.query usages.
- Around line 10-28: The method register_as_table assumes CSV rows are CSV::Row
(headers present) which causes NoMethodError otherwise; ensure headers are
enforced and iteration is robust: when creating CSV (CSV.new(io,
**csv_options)), set csv_options[:headers] = true if not provided, and update
the row/field handling in the DuckDB::TableFunction block (the csv.first.headers
and the loop using cell[1]) to treat each yielded cell as a [header,value] pair
consistently (or normalize to value via cell.is_a?(Array) ? cell[1] : cell),
referencing register_as_table, csv_options, CSV.new, csv.first.headers, and
output.set_value to locate the changes.

---

Duplicate comments:
In `@sample/issue930_benchmark.rb`:
- Around line 10-13: The method register_as_table_with_table_function currently
calls csv.first.headers which will raise if csv_options lacks headers: true;
update this method to ensure headers are enabled before accessing headers by
either merging a default (e.g. csv_options = {headers: true}.merge(csv_options))
when constructing CSV.new or explicitly checking csv_options[:headers] (or
csv.headers?) before calling csv.first.headers and handling the non-headers
case; refer to the register_as_table_with_table_function method, the csv
variable, and csv_options to locate and apply the guard/fix.


Comment on lines +37 to +40
execute("CREATE OR REPLACE TABLE #{name} (#{headers.map { |h| "#{h} VARCHAR" }.join(', ')})")
csv.each do |row|
  values = row.map { |cell| "'#{cell[1]}'" }.join(', ')
  execute("INSERT INTO #{name} VALUES (#{values})")

⚠️ Potential issue | 🟠 Major

SQL injection on table name, column names, and cell values.

Three distinct injection surfaces in register_as_table_with_create_table:

  1. Line 37 — table identifier: #{name} is interpolated directly into both DDL and DML with no quoting.
  2. Line 37 — column identifiers: CSV header names are also interpolated unquoted.
  3. Lines 39-40 — values: '#{cell[1]}' wraps values in single quotes but applies no escaping. A CSV cell containing O'Brien produces a syntax error; a cell containing '; DROP TABLE csv_ct; -- executes arbitrary SQL.

Even in sample scripts, this pattern propagates dangerous habits. Quote identifiers and use parameterized queries or at least escape values:

🛡️ Proposed fix
 def register_as_table_with_create_table(name, io, csv_options: {})
   csv = CSV.new(io, **csv_options)
   headers = csv.first.headers
   csv.rewind
-  execute("CREATE OR REPLACE TABLE #{name} (#{headers.map { |h| "#{h} VARCHAR" }.join(', ')})")
+  quoted_name = "\"#{name.gsub('"', '""')}\""
+  col_defs = headers.map { |h| "\"#{h.gsub('"', '""')}\" VARCHAR" }.join(', ')
+  execute("CREATE OR REPLACE TABLE #{quoted_name} (#{col_defs})")
   csv.each do |row|
-    values = row.map { |cell| "'#{cell[1]}'" }.join(', ')
-    execute("INSERT INTO #{name} VALUES (#{values})")
+    placeholders = row.fields.map.with_index(1) { |_, i| "$#{i}" }.join(', ')
+    query("INSERT INTO #{quoted_name} VALUES (#{placeholders})", *row.fields)
   end
 end

Note: Verify whether DuckDB::Connection#query accepts positional bind parameters in this form for your ruby-duckdb version; if not, at minimum escape single quotes with cell[1]&.gsub("'", "''").

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sample/issue930_benchmark.rb` around lines 37 - 40, The SQL construction in
register_as_table_with_create_table is vulnerable: stop interpolating raw table
names (name), column names (headers), and cell values (cell[1]) directly into
execute; instead quote/escape identifiers (use the DB driver's
identifier-quoting API or wrap identifiers in double quotes and escape embedded
quotes) when building the CREATE TABLE and INSERT column list, and avoid string
interpolation for values by using parameterized queries if the DuckDB client
supports bind parameters (or at minimum escape single quotes in cell values with
cell[1]&.gsub("'", "''") before embedding). Ensure you reference the
execute/execute("CREATE OR REPLACE TABLE ..."), headers.map block, and the
csv.each/values construction so all three injection surfaces are fixed
consistently.
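The two escaping rules the comment relies on are standard SQL: identifiers are wrapped in double quotes with embedded double quotes doubled, and string literals are wrapped in single quotes with embedded single quotes doubled. They can be sanity-checked without a database; `quote_ident` and `quote_value` are illustrative helper names, not part of ruby-duckdb:

```ruby
# Quote a SQL identifier (table/column name): wrap in double quotes,
# doubling any embedded double quotes.
def quote_ident(name)
  %("#{name.gsub('"', '""')}")
end

# Quote a SQL string literal: wrap in single quotes, doubling any
# embedded single quotes, so O'Brien and injection payloads stay inert.
def quote_value(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

quote_ident('my"table')                 # identifier with an embedded quote
quote_value("O'Brien")                  # would otherwise break the INSERT
quote_value("'; DROP TABLE csv_ct; --") # stays inside one literal
```

Parameterized queries remain preferable where the client supports them; escaping is the floor, not the ceiling.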

Comment on lines +59 to +61
start_time2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
con.register_as_table_with_create_table('csv_ct', csv_io, csv_options: { headers: true })
con.query('SELECT * FROM csv_ct').to_a

⚠️ Potential issue | 🟡 Minor

Second benchmark silently depends on the first run's csv.rewind having fired.

The csv.rewind inside the else branch (line 26-27) is called only when DuckDB exhausts all rows with no LIMIT. If the query is ever changed to return a partial result, csv_io stays at EOF and register_as_table_with_create_table will process zero rows without any error. Reset the IO position explicitly between runs for robustness:

🛡️ Proposed fix
 end_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)

+csv_io.rewind
+
 start_time2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sample/issue930_benchmark.rb` around lines 59 - 61, The second benchmark
silently depends on the first run having advanced csv_io to EOF; before calling
con.register_as_table_with_create_table('csv_ct', csv_io, ...) or re-running
con.query('SELECT * FROM csv_ct') you must explicitly reset the IO position
(e.g., csv_io.rewind or csv_io.seek(0)) so the CSV is read from the start; add
that reset between the first run and the block that uses csv_io again to ensure
register_as_table_with_create_table and con.query always see the full CSV input.
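The EOF pitfall described above is easy to reproduce with StringIO alone: once one consumer drains the IO, a second consumer sees nothing until the position is reset explicitly.

```ruby
require 'csv'
require 'stringio'

io = StringIO.new("id,name\n1,Alice\n2,Bob")

first_pass  = CSV.new(io, headers: true).to_a # drains io to EOF
second_pass = CSV.new(io, headers: true).to_a # io still at EOF: empty

io.rewind                                     # the explicit reset the review asks for
third_pass  = CSV.new(io, headers: true).to_a # full data again
```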

Comment on lines 10 to 28
def register_as_table(name, io, csv_options: {})
  csv = CSV.new(io, **csv_options)
  headers = csv.first.headers
  csv.rewind
  columns = headers.each_with_object({}) { |header, hash| hash[header] = LogicalType::VARCHAR }
  tf = DuckDB::TableFunction.create(
    name:,
    columns:
  ) do |_func_info, output|
    line = csv.readline
    if line
      line.each_with_index do |cell, index|
        output.set_value(index, 0, cell[1])
      end
      1
    else
      0
    end
  end

⚠️ Potential issue | 🟠 Major

headers: true is an undocumented, unenforced requirement that crashes without it.

Line 12 calls csv.first.headers: CSV#shift returns a CSV::Row when headers: true, but a plain Array otherwise. Array has no #headers method, so omitting headers: true from csv_options raises NoMethodError. The cell[1] access on line 22 also silently depends on this — iterating a CSV::Row yields [header, value] pairs, but iterating a plain row Array yields plain values, making cell[1] index into a string rather than fetch the field value.

Add a guard at the method boundary:

🛡️ Proposed fix
 def register_as_table(name, io, csv_options: {})
+  raise ArgumentError, 'csv_options must include headers: true' unless csv_options[:headers]
   csv = CSV.new(io, **csv_options)
🧰 Tools
🪛 GitHub Actions: Linter

[error] 10-10: Metrics/MethodLength: Method has too many lines. [19/10]

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sample/issue930.rb` around lines 10 - 28, The method register_as_table
assumes CSV rows are CSV::Row (headers present) which causes NoMethodError
otherwise; ensure headers are enforced and iteration is robust: when creating
CSV (CSV.new(io, **csv_options)), set csv_options[:headers] = true if not
provided, and update the row/field handling in the DuckDB::TableFunction block
(the csv.first.headers and the loop using cell[1]) to treat each yielded cell as
a [header,value] pair consistently (or normalize to value via cell.is_a?(Array)
? cell[1] : cell), referencing register_as_table, csv_options, CSV.new,
csv.first.headers, and output.set_value to locate the changes.
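The CSV::Row-vs-Array distinction driving this finding can be observed directly in stdlib CSV: with `headers: true` each shifted row is a `CSV::Row` whose cells iterate as `[header, value]` pairs (hence `cell[1]`), while without it each row is a plain Array with no `#headers` method.

```ruby
require 'csv'
require 'stringio'

data = "id,name\n1,Alice"

with_headers = CSV.new(StringIO.new(data), headers: true).shift
with_headers.headers    # header names from the first line
with_headers.to_a.first # ["id", "1"]: cells are [header, value] pairs, so cell[1] is the field value

without_headers = CSV.new(StringIO.new(data)).shift
without_headers         # plain Array of the first line's fields; calling .headers would raise NoMethodError
```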

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
sample/issue930.rb (1)

35-44: Database and connection are never explicitly closed.

For a sample that others will use as a reference, the idiomatic block form of DuckDB::Database.open (which auto-closes via ensure) better demonstrates safe resource management.

♻️ Suggested idiomatic rewrite
-db = DuckDB::Database.open
-
-csv_data = "id,name,age\n1,Alice,30\n2,Bob,25\n3,Charlie,35"
-csv_data += 30_000.times.map { |i| "\n#{i + 4},Name#{i + 4},#{rand(0..100)}" }.join
-csv_io = StringIO.new(csv_data)
-
-con = db.connect
-con.query('SET threads=1')
-con.register_as_table('csv_io', csv_io, csv_options: { headers: true })
-result = con.query('SELECT * FROM csv_io()').to_a
-
-p result
-puts result.first.first == '1'
-puts result.first[1] == 'Alice'
+csv_data = "id,name,age\n1,Alice,30\n2,Bob,25\n3,Charlie,35"
+csv_data += 30_000.times.map { |i| "\n#{i + 4},Name#{i + 4},#{rand(0..100)}" }.join
+csv_io = StringIO.new(csv_data)
+
+DuckDB::Database.open do |db|
+  con = db.connect
+  con.query('SET threads=1')
+  con.register_as_table('csv_io', csv_io, csv_options: { headers: true })
+  result = con.query('SELECT * FROM csv_io()').to_a
+
+  p result
+  puts result.first.first == '1'
+  puts result.first[1] == 'Alice'
+end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sample/issue930.rb` around lines 35 - 44, Replace the manual opens with the
idiomatic block forms so the Database and Connection are auto-closed: wrap the
existing logic that uses DuckDB::Database.open and con = db.connect inside
DuckDB::Database.open do |db| ... end and db.connect do |con| ... end (keeping
your con.query('SET threads=1'), con.register_as_table('csv_io', csv_io, ...),
and the SELECT call inside) so both db and con are closed via ensure;
alternatively, if you prefer not to use block form, explicitly call con.close
and db.close after the query.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@sample/issue930.rb`:
- Around line 18-29: The closure captures and mutates the CSV object (csv) but
register_as_table is being used while DuckDB may be running with multiple
threads, causing concurrent invocations and races; update register_as_table to
assert the DuckDB thread count is 1 at runtime (e.g., query PRAGMA threads or
equivalent via the active connection and raise a clear error if >1), and add a
short inline comment/docs note near the register_as_table implementation
explaining the SET threads=1 precondition so callers know to set threads=1
before registering a mutable CSV-backed table.

---

Duplicate comments:
In `@sample/issue930.rb`:
- Line 12: The code calls csv.first.headers and later uses cell[1] assuming CSV
rows have headers, but there is no guard that csv was parsed with headers: true;
add a check in the method to detect/require header-enabled CSV parsing (e.g.,
inspect csv.first's type or csv_options) and raise or convert to headered rows
if missing, then use csv.first.headers and access fields with header-aware
indexing (cell[1]) safely; update the function that reads the CSV (the
invocation that uses csv.first.headers and later cell[1]) so it either enforces
csv_options[:headers] == true or transforms the plain Array rows into
CSV::Row-like headered rows before accessing headers and cell[1].


Comment on lines +18 to +29
) do |_func_info, output|
  line = csv.readline
  if line
    line.each_with_index do |cell, index|
      output.set_value(index, 0, cell[1])
    end
    1
  else
    csv.rewind
    0
  end
end

⚠️ Potential issue | 🟡 Minor

csv shared across threads — SET threads=1 is a required but unenforced precondition.

The csv object is captured in the closure and mutated on every block invocation. DuckDB's default thread count is greater than 1, so parallel scans will call this block concurrently, causing races on the shared CSV state (wrong rows, crashes). The demo script correctly calls SET threads=1 before registering the table, but register_as_table neither documents nor enforces this requirement.

At minimum, add a guard comment (or a runtime check):

🛡️ Suggested documentation / guard
 def register_as_table(name, io, csv_options: {}) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize
+  # NOTE: The CSV IO object is not thread-safe. Ensure DuckDB is configured
+  # with SET threads=1 (or a thread-safe IO subclass) before calling this,
+  # otherwise concurrent table scans will produce data races.
   csv = CSV.new(io, **csv_options)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sample/issue930.rb` around lines 18 - 29, The closure captures and mutates
the CSV object (csv) but register_as_table is being used while DuckDB may be
running with multiple threads, causing concurrent invocations and races; update
register_as_table to assert the DuckDB thread count is 1 at runtime (e.g., query
PRAGMA threads or equivalent via the active connection and raise a clear error
if >1), and add a short inline comment/docs note near the register_as_table
implementation explaining the SET threads=1 precondition so callers know to set
threads=1 before registering a mutable CSV-backed table.

@suketa suketa merged commit 2ac9be5 into main Feb 21, 2026
40 of 41 checks passed
@suketa suketa deleted the example_of_issue920 branch February 21, 2026 09:16
