Unit tests for shuffler by krfricke · Pull Request #22 · ray-project/ray_shuffling_data_loader

krfricke · 2021-07-15T14:05:58Z

No description provided.

clarkzinzow

LGTM overall, just a few nits and questions.

clarkzinzow · 2021-07-15T18:58:02Z

+
+
+class DataLoaderShuffleTest(unittest.TestCase):
+    """This test suite validates core RayDMatrix functionality."""


Suggested change

"""This test suite validates core RayDMatrix functionality."""

"""This test suite validates core shuffle functionality."""

clarkzinzow · 2021-07-15T18:58:43Z

+    def tearDownClass(cls):
+        ray.shutdown()
+
+    def testShuffleMap(self):


Nit: Tests should be snake case.

Suggested change

def testShuffleMap(self):

def test_shuffle_map(self):

clarkzinzow · 2021-07-15T18:58:59Z

+        assert len(set(all_keys)) == len(all_keys), \
+            "Keys in full dataset are not distinct."
+
+    def testShuffleReduce(self):


Nit: Tests should be snake case.

Suggested change

def testShuffleReduce(self):

def test_shuffle_reduce(self):

clarkzinzow · 2021-07-15T18:59:20Z

+            assert set(unshuffled) == set(shuffled), \
+                "Key mismatch between unshuffled and shuffled parts"
+
+    def testShuffleEndToEnd(self):


Nit: Tests should be snake case.

Suggested change

def testShuffleEndToEnd(self):

def test_shuffle_end_to_end(self):

clarkzinzow · 2021-07-15T19:02:35Z

+            # 3sd = 99.7% chance of passing
+            assert mean - 3 * sd < len(part_keys) < mean + 3 * sd, \
+                f"Not enough rows in partition {i}"


Nice! How should we interpret the outliers when this assertion eventually fails?
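One way to reason about that question: suppose each row is assigned independently and uniformly at random to one of `num_partitions` reducers. This is an assumption; the test's actual `mean`/`sd` computation is not shown in this hunk. Under it, a partition's row count is binomial. A single 3-sigma check passes about 99.73% of the time under the normal approximation, but the chance that some partition trips it grows with the number of partitions checked per run. The helper names and the example sizes below are hypothetical.

```python
import math
from typing import Tuple


def partition_size_stats(num_rows: int, num_partitions: int) -> Tuple[float, float]:
    """Mean and standard deviation of one partition's row count, assuming each
    row lands in one of num_partitions partitions independently and uniformly,
    i.e. the count is Binomial(num_rows, 1 / num_partitions)."""
    p = 1.0 / num_partitions
    mean = num_rows * p
    sd = math.sqrt(num_rows * p * (1.0 - p))
    return mean, sd


def prob_any_partition_fails(num_partitions: int, k: float = 3.0) -> float:
    """Approximate probability that at least one partition falls outside
    mean +/- k * sd in a single test run.

    Uses the normal approximation per partition and treats partitions as
    independent, which is only approximate: their sizes must sum to num_rows.
    """
    single_pass = math.erf(k / math.sqrt(2.0))  # P(|Z| < k), ~0.9973 for k=3
    return 1.0 - single_pass ** num_partitions


# Hypothetical sizes, chosen only for illustration.
mean, sd = partition_size_stats(num_rows=1000, num_partitions=4)
print(f"mean={mean:.1f}, sd={sd:.2f}")
print(f"P(some partition fails) ~ {prob_any_partition_fails(4):.4f}")
```

With 4 partitions this works out to roughly a 1% chance of a spurious failure per run, so an occasional failure just past the bound is expected noise, while a count far outside it, or failures that repeat across runs, would point more toward a real imbalance in the shuffle.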

clarkzinzow · 2021-07-15T19:04:47Z

+        assert len(all_keys) == self.num_rows, "Not all rows were returned."
+
+        assert len(set(all_keys)) == len(all_keys), \
+            "Keys in full dataset are not distinct."


This can wait, but we may want to confirm that none of the actual data was unintentionally mutated, e.g. due to type coercion. That would probably require a slight refactor of (or utility added to) the data generation code.
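A minimal sketch of such a utility follows. It assumes rows can be materialized as dicts that share a unique `key` column. The function names and row shape are hypothetical and are not part of ray_shuffling_data_loader. The type check matters because Python treats `1 == 1.0` as true, so a plain equality comparison would miss an int column that was silently coerced to float.

```python
import math
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def _same_value(a: Any, b: Any) -> bool:
    """True only for equal values of identical type; NaN matches NaN."""
    if type(a) is not type(b):
        return False
    if isinstance(a, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b


def find_mutated_keys(original: Iterable[Row], shuffled: Iterable[Row],
                      key: str = "key") -> List[Any]:
    """Return keys of shuffled rows that are missing from, or differ in value
    or type from, the corresponding original row (hypothetical helper)."""
    expected_by_key = {row[key]: row for row in original}
    mutated = []
    for row in shuffled:
        expected = expected_by_key.get(row[key])
        if expected is None or expected.keys() != row.keys():
            mutated.append(row[key])
        elif not all(_same_value(row[col], expected[col]) for col in row):
            mutated.append(row[key])
    return mutated


original = [{"key": 0, "value": 1}, {"key": 1, "value": 2.5}]
shuffled = [{"key": 1, "value": 2.5}, {"key": 0, "value": 1.0}]  # int 1 became float 1.0
print(find_mutated_keys(original, shuffled))  # [0]
```

This only inspects rows present in the shuffled output; the existing row-count and distinct-key assertions already cover dropped or duplicated rows.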

clarkzinzow · 2021-07-15T21:23:24Z

+
+            shuffled = ray.get(
+                shuffle_reduce.remote(
+                    0,


It shouldn't matter much (I actually don't think it's even used in the reducer anymore), but maybe we should set the reducer_index here.

Suggested change

0,

i,

clarkzinzow · 2021-07-15T21:24:29Z

+        for tid, epoch_batches in consumer.rank_epoch_batches.items():
+            for i in range(len(epoch_batches) - 1):
+                assert len(epoch_batches[i]) == len(
+                    epoch_batches[+1]) == num_epochs, \


Suggested change

epoch_batches[+1]) == num_epochs, \

epoch_batches[1]) == num_epochs, \
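In Python, `epoch_batches[+1]` already means `epoch_batches[1]` (unary plus on the literal 1), so the suggestion removes a confusing character without changing behavior. Note, though, that with `range(len(epoch_batches) - 1)` every iteration compares against index 1, so the last entry is never checked once there are more than two entries. Comparing `[i]` with `[i + 1]` would cover it, as would checking each entry directly. The sketch below assumes `rank_epoch_batches` maps each trainer ID to a sequence whose entries should each hold `num_epochs` items; both function names are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple


def loop_with_fixed_index(epoch_batches: Sequence[Sequence[Any]], num_epochs: int) -> bool:
    """Mirrors the loop in the diff with index [1]: only entries 0..n-2 and
    entry 1 are ever inspected."""
    for i in range(len(epoch_batches) - 1):
        if not len(epoch_batches[i]) == len(epoch_batches[1]) == num_epochs:
            return False
    return True


def find_bad_epoch_entries(rank_epoch_batches: Dict[Any, Sequence[Sequence[Any]]],
                           num_epochs: int) -> List[Tuple[Any, int, int]]:
    """Return (trainer_id, index, length) for every entry whose length is not
    num_epochs. Checking each entry directly means none can be skipped."""
    bad = []
    for tid, epoch_batches in rank_epoch_batches.items():
        for i, entry in enumerate(epoch_batches):
            if len(entry) != num_epochs:
                bad.append((tid, i, len(entry)))
    return bad


short_last = [[1, 2], [3, 4], [5]]  # final entry has the wrong length
print(loop_with_fixed_index(short_last, num_epochs=2))         # True: the problem is missed
print(find_bad_epoch_entries({0: short_last}, num_epochs=2))   # [(0, 2, 1)]
```

In the test itself, `assert not find_bad_epoch_entries(consumer.rank_epoch_batches, num_epochs)` would then report exactly which trainer and entry failed.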

clarkzinzow · 2021-07-15T21:24:56Z

+                    "Keys in dataset are not distinct."
+
+                assert set1 == set2, \
+                    "Shuffled key sets are not equal."


Great e2e test!

Kai Fricke added 3 commits July 14, 2021 13:41

test shuffle_map · 8331037

test shuffle_reduce · 49e88c1

Shuffle end to end test · cee22d2

krfricke requested a review from clarkzinzow July 15, 2021 14:06

krfricke assigned clarkzinzow Jul 15, 2021

krfricke mentioned this pull request Jul 15, 2021 in [Testing] Add unit tests. #15 (Open)

clarkzinzow approved these changes Jul 15, 2021

krfricke wants to merge 3 commits into ray-project:main from krfricke:test-shuffler