Skip to content

Feature: add \shards command to track relocations#494

Open
mannreis wants to merge 6 commits intocrate:mainfrom
mannreis:main
Open

Feature: add \shards command to track relocations#494
mannreis wants to merge 6 commits intocrate:mainfrom
mannreis:main

Conversation

@mannreis
Copy link
Copy Markdown

Summary of the changes / Why this is an improvement

As requested in #493 this PR aims adds a new command \shards to conveniently query for shards undergoing relocation. It computes the the aggregated progress per table based on the sum of the sizes of relocating shards divided by the total size of the table shards.

Optionally, state and relocating can be provided as an argument to respectively:

  • get a short summary of shards aggregated by state
  • view details of which ones are currently relocating

Integration testing

I resorted to docker compose and scripting to move shards withing a cluster in order to test the what's implemented here. In order integrate that type of testing it in this code base, I spent sometime looking at cratedb_toolkit.testing.testcontainers.cratedb but I believe it would require another PR. For now, I resort to single-node integration testing.

Checklist

@amotl amotl requested review from mfussenegger and seut March 24, 2026 23:05
Copy link
Copy Markdown
Member

@seut seut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some suggestions related to the used statements, otherwise looks fine.
I had a bit in mind that having some ongoing progress visible with some kind of progress bar, which one could get out e.g. by ctrl-c would be awesome, but also probably better to work on this as a followup, now that you've done the ground work :)

About real integration testing: Right, this would require at least 2 nodes to make it work. Utilizing the testcontainers to make this work would be really great, but agree, lets look into this as followup as well.

Comment on lines +241 to +258
SELECT
table_name,
COUNT(*)
AS total_shards,
SUM(num_docs)
AS total_num_docs,
SUM(size)
As total_sum_shard_size,
SUM(CASE WHEN routing_state = 'RELOCATING' THEN 1 ELSE 0 END)
AS relocating_shards,
SUM(CASE WHEN routing_state = 'RELOCATING' THEN size ELSE 0 END)
AS relocating_size,
100.0 * SUM(CASE WHEN routing_state != 'RELOCATING' THEN size ELSE 0 END) / CAST(SUM(size) as DOUBLE)
AS relocated_percent
FROM sys.shards
WHERE routing_state != 'UNASSIGNED'
GROUP BY table_name
ORDER BY relocated_percent, table_name;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be rather the relocating statement and the state maybe the default one?

I think adding schema_name and partition_ident would be helpful.
Additionally, we could make use of filters on aggregations to ease the statement (also improves performance afaik). If we treat this as the relocating statement, I'd leave out number of docs as not really interesting (keep focus on the relevant parts).

Suggested change
SELECT
table_name,
COUNT(*)
AS total_shards,
SUM(num_docs)
AS total_num_docs,
SUM(size)
As total_sum_shard_size,
SUM(CASE WHEN routing_state = 'RELOCATING' THEN 1 ELSE 0 END)
AS relocating_shards,
SUM(CASE WHEN routing_state = 'RELOCATING' THEN size ELSE 0 END)
AS relocating_size,
100.0 * SUM(CASE WHEN routing_state != 'RELOCATING' THEN size ELSE 0 END) / CAST(SUM(size) as DOUBLE)
AS relocated_percent
FROM sys.shards
WHERE routing_state != 'UNASSIGNED'
GROUP BY table_name
ORDER BY relocated_percent, table_name;
SELECT
schema_name,
table_name,
partition_ident,
COUNT(*)
AS total_shards,
SUM(size)
As total_size,
COUNT(*) FILTER (WHERE routing_state = 'RELOCATING')
AS relocating_shards,
SUM(size) FILTER (WHERE routing_state = 'RELOCATING')
AS relocating_size,
100.0 * SUM(size) FILTER(WHERE routing_state != 'RELOCATING') / SUM(size)
AS relocated_percent
FROM sys.shards
WHERE routing_state != 'UNASSIGNED'
GROUP BY schema_name, table_name, partition_ident
ORDER BY relocated_percent, schema_name, table_name, partition_ident;

Comment on lines +262 to +272
SELECT
routing_state,
COUNT(*)
AS shard_count,
SUM(num_docs)
AS num_docs,
SUM(size) / 1073741824.0
AS size_gb
FROM sys.shards
GROUP BY routing_state
ORDER BY routing_state;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said, I'd suggest to use this as the default statement for \shards.
Adding the primary column may be helpful, especially when UNASSIGNED shards exists as unassigned replica shards may be expected while unassigned primaries are critical.

Suggested change
SELECT
routing_state,
COUNT(*)
AS shard_count,
SUM(num_docs)
AS num_docs,
SUM(size) / 1073741824.0
AS size_gb
FROM sys.shards
GROUP BY routing_state
ORDER BY routing_state;
SELECT
state,
primary,
COUNT(*)
AS shard_count,
SUM(num_docs)
AS num_docs,
SUM(size) / 1024^3
AS size_gb
FROM sys.shards
GROUP BY state, primary
ORDER BY state, primary;

Comment on lines +275 to +292
RELOC_STMT = """
SELECT
table_name,
node['name'],
id,
recovery['stage'],
size,
routing_state,
state,
primary,
relocating_node,
size / 1024.0
AS size_kb,
partition_ident
FROM sys.shards
WHERE routing_state = 'RELOCATING'
ORDER BY table_name, id, node['name'];
"""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this statement won't help much as it won't give us any information on the relocation progress. I've played around a bit and currently I see no way how we can get the progress per shard as CrateDB does not expose this (the recovery['size'] only exposes real shard recovery and not the relocation).
Thus, we can only show a overall progress, counting complete shard sizes as you did in the existing DEFAULT_STMT unfortunately. Seems like we need to improve something on CrateDB to have real, per shard progresses.
For now, I suggest to remove it.

def setUp(self):
node.reset()

def test_shards_command(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should name this a bit more concrete to reflect whats really tested (only output format, no values).

Suggested change
def test_shards_command(self):
def test_shards_command_output_format(self):

Same for the other test methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants