Skip to content

feat(perf): skip redundant DDL checks in SQLSampleStore.__init__#670

Open
michael-johnston wants to merge 6 commits intomainfrom
maj_skip_redundant_ddl_checks
Open

feat(perf): skip redundant DDL checks in SQLSampleStore.__init__#670
michael-johnston wants to merge 6 commits intomainfrom
maj_skip_redundant_ddl_checks

Conversation

@michael-johnston
Copy link
Member

Previously, SQLSampleStore.init called _create_source_table() unconditionally on every construction. _create_source_table() calls meta.create_all(checkfirst=True), which issues a separate table-existence query for each of the four backing tables even when the tables already exist — which they always do on read-only CLI commands. Locally this cost was ~1.45s. The changes reduce this to 0.24s.

The main fix is to probe for table existence instead of blinding issuing _create_source_tables(). This removes the 1.45 and adds the probe. We use a direct SQL probe (taking 240ms locally measured) instead of sqlalchemy.inspect() which took 800ms locally.

The PR also introduces a process-level _source_tables_verified that records tablename prefixes whose backing tables have already been confirmed to exist.
On the second and subsequent SQLSampleStore constructions for the same store within the same process the probe is skipped entirely (0 round-trips).
The main impact of this is in the tests.

Previously, SQLSampleStore.__init__ called _create_source_table()
unconditionally on every construction.  _create_source_table() calls
meta.create_all(checkfirst=True), which issues a separate table-existence
query for each of the four backing tables even when
the tables already exist — which they always do on read-only CLI commands.
Locally this cost was ~1.45s. The changes reduce this to 0.24s.

The main fix is to probe for table existence instead of blinding issuing _create_source_tables(). This removes the 1.45 and adds the probe. We use a direct SQL probe (taking 240ms locally measured) instead of sqlalchemy.inspect() which took 800ms locally.

 The PR also introduces a process-level _source_tables_verified that records tablename prefixes whose backing tables have already been confirmed to exist.
  On the second and subsequent SQLSampleStore constructions for the same store within the same process the probe is skipped entirely (0 round-trips).
  The main impact of this is in the tests.
Copy link
Member

@AlessandroPomponio AlessandroPomponio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be careful with this: the use of inspect is dialect independent and is the suggested way to check for table existence. Where do you see the 15 round trips? It should be a DESCRIBE of the 4 tables releated to the sqlsource

I'm all for caching the existence of the tables (like we added in SQLResourceStore) but I'd keep what we have in term of inspection.

Signed-off-by: Michael Johnston <66301584+michael-johnston@users.noreply.github.com>
@michael-johnston
Copy link
Member Author

Where do you see the 15 round trips?

That was a typo. Fixed

I'd be careful with this: the use of inspect is dialect independent and is the suggested way to check for table existence.

This is the typical optimisation trade off. If we keep the generic method, inspect, we have to pay the ~1.2 secs for it.

@AlessandroPomponio
Copy link
Member

This is the typical optimisation trade off. If we keep the generic method, inspect, we have to pay the ~1.2 secs for it.

By not checking what the table structure is, though, we will not be able to perform migrations down the line

@michael-johnston
Copy link
Member Author

michael-johnston commented Mar 11, 2026

By not checking what the table structure is, though, we will not be able to perform migrations down the line

Could we not add it back if necessary?

@AlessandroPomponio
Copy link
Member

This should be updated now that #672 has been merged

@DRL-NextGen
Copy link
Member

DRL-NextGen commented Mar 18, 2026

No vulnerabilities found.

Comment on lines +395 to +403
if self.engine.dialect.name == "sqlite":
existence_query = sqlalchemy.text(
"SELECT 1 FROM sqlite_master WHERE type='table' AND name=:name"
).bindparams(name=self._tablename)
else:
existence_query = sqlalchemy.text(
"SELECT 1 FROM information_schema.tables"
" WHERE table_schema = DATABASE() AND table_name = :name LIMIT 1"
).bindparams(name=self._tablename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same as what we added for the resources table (which is why I was asking to update from main, because at first glance I thought the code was the same).

At this point it's worth having a parametrized statement in orchestrator/metastore/sql/statements.py for this

@DRL-NextGen
Copy link
Member

Checks Summary

Last run: 2026-03-24T20:48:15.306Z

Mend Unified Agent vulnerability scan found 1 vulnerabilities:

Severity Identifier Package Details Fix
🔸 Low CVE-2026-4539 pygments-2.19.2-py3-none-any.whl
A security flaw has been discovered in pygments up to 2.19.2. The impacted element is the function A...A security flaw has been discovered in pygments up to 2.19.2. The impacted element is the function AdlLexer of the file pygments/lexers/archetype.py. The manipulation results in inefficient regular expression complexity. The attack is only possible with local access. The exploit has been released to the public and may be used for attacks. The project was informed of the problem early through an issue report but has not responded yet.
Not Available

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants