Skip to content
This repository was archived by the owner on Jan 5, 2023. It is now read-only.

Unit tests for shuffler#22

Open
krfricke wants to merge 3 commits into
ray-project:mainfrom
krfricke:test-shuffler
Open

Unit tests for shuffler#22
krfricke wants to merge 3 commits into
ray-project:mainfrom
krfricke:test-shuffler

Conversation

@krfricke

Copy link
Copy Markdown
Contributor

No description provided.

@clarkzinzow clarkzinzow left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just a few nits and questions.



class DataLoaderShuffleTest(unittest.TestCase):
"""This test suite validates core RayDMatrix functionality."""

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""This test suite validates core RayDMatrix functionality."""
"""This test suite validates core shuffle functionality."""

def tearDownClass(cls):
ray.shutdown()

def testShuffleMap(self):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Tests should be snake case.

Suggested change
def testShuffleMap(self):
def test_shuffle_map(self):

assert len(set(all_keys)) == len(all_keys), \
"Keys in full dataset are not distinct."

def testShuffleReduce(self):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Tests should be snake case.

Suggested change
def testShuffleReduce(self):
def test_shuffle_reduce(self):

assert set(unshuffled) == set(shuffled), \
"Key mismatch between unshuffled and shuffled parts"

def testShuffleEndToEnd(self):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Tests should be snake case.

Suggested change
def testShuffleEndToEnd(self):
def test_shuffle_end_to_end(self):

Comment on lines +59 to +61
# 3sd = 99.7% chance of passing
assert mean - 3 * sd < len(part_keys) < mean + 3 * sd, \
f"Not enough rows in partition {i}"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! How should we interpret the outliers when this assertion eventually fails?

assert len(all_keys) == self.num_rows, "Not all rows were returned."

assert len(set(all_keys)) == len(all_keys), \
"Keys in full dataset are not distinct."

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can wait, but we may want to confirm that none of the actual data was unintentionally mutated, e.g. due to type coercion. That would probably require a slight refactor of (or utility added to) the data generation code.


shuffled = ray.get(
shuffle_reduce.remote(
0,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't matter much (I actually don't think it's even used in the reducer anymore), but maybe we should set the reducer_index here.

Suggested change
0,
i,

for tid, epoch_batches in consumer.rank_epoch_batches.items():
for i in range(len(epoch_batches) - 1):
assert len(epoch_batches[i]) == len(
epoch_batches[+1]) == num_epochs, \

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
epoch_batches[+1]) == num_epochs, \
epoch_batches[1]) == num_epochs, \

"Keys in dataset are not distinct."

assert set1 == set2, \
"Shuffled key sets are not equal."

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great e2e test!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants