Allow cosmos tests to run in parallel #37696

Draft
JoasE wants to merge 5 commits into dotnet:main from JoasE:feature/cosmos-test-parallelism

@JoasE
Contributor

JoasE commented Feb 13, 2026

Allow cosmos tests to run in parallel
Use guid for store name

  • I've read the guidelines for contributing and seen the walkthrough
  • I've posted a comment on an issue with a detailed description of how I am planning to contribute and got approval from a member of the team
  • The code builds and tests pass locally (also verified by our automated build checks)
  • Commit messages follow this format:
        Summary of the changes
        - Detail 1
        - Detail 2

        Fixes #bugnumber
  • Tests for the changes have been added (for bug fixes / features)
  • Code follows the same patterns and style as existing code in this repo

Use guid for store name
@roji
Member

roji commented Feb 13, 2026

@JoasE the Cosmos emulator in general has had stability issues (and/or rate limiting problems) when we tried to run the tests in parallel. Is it working for you locally?

@JoasE
Contributor Author

JoasE commented Feb 13, 2026

@roji Thanks for the heads-up! I was just poking around a little bit. I did notice some instability locally, but mostly at a high level of parallelism. It appears to be worse on the pipeline (which was to be expected). Not sure if this will really turn into something, but I needed the PR to test the changes, hope that's ok!

@AndriySvyryd
Member

@JoasE To speed up local runs you can change CosmosTestStore.DisposeAsync to not delete the database when running on the emulator (TestEnvironment.IsEmulator). I was meaning to do this, but haven't had time.

@JoasE
Contributor Author

JoasE commented Feb 13, 2026

@AndriySvyryd Thanks for the tip! I will look into that. I was also looking into recreating the containers because of this statement in the docs. But I will try and see what performs better if I have the time.

@AndriySvyryd
Member

Oh yeah, perhaps I had tried this already and the results were disappointing.

@roji
Member

roji commented Feb 14, 2026

Since we're looking at Cosmos test running time, here are the running times of the 20 slowest tests. Note that the top ones are the new session token-related tests, at about 20 seconds each (seems odd)...

1    00:00:21.1775984   Microsoft.EntityFrameworkCore.CosmosSessionTokensTest+CosmosNonSharedSessionTokenTests.Read_item_session_not_found_throws_CosmosException
2    00:00:20.8897695   Microsoft.EntityFrameworkCore.CosmosSessionTokensTest+CosmosNonSharedSessionTokenTests.UseSessionTokens_uses_session_tokens
3    00:00:12.4439730   Microsoft.EntityFrameworkCore.EndToEndCosmosTest.Can_add_update_delete_with_nested_collections(transactionalBatch: False)
4    00:00:08.4849110   Microsoft.EntityFrameworkCore.Update.CosmosBulkEndToEndTestNoBatching.Can_add_update_delete_with_nested_collections(transactionalBatch: False)
5    00:00:08.2154638   Microsoft.EntityFrameworkCore.Update.CosmosBulkEndToEndTest.Can_add_update_delete_with_nested_collections(transactionalBatch: False)
6    00:00:07.3356782   Microsoft.EntityFrameworkCore.Update.CosmosBulkEndToEndTestNoBatching.Can_add_update_delete_with_collections(transactionalBatch: False)
7    00:00:07.2064352   Microsoft.EntityFrameworkCore.Update.CosmosBulkEndToEndTest.Can_add_update_delete_with_collections(transactionalBatch: False)
8    00:00:06.7284424   Microsoft.EntityFrameworkCore.CosmosTransactionalBatchTest.SaveChanges_update_id_contains_special_chars_which_makes_request_larger_than_2_mib_splits_into_2_batches(isIdSpecialChar: True)
9    00:00:06.5989839   Microsoft.EntityFrameworkCore.CosmosTransactionalBatchTest.SaveChanges_create_id_contains_special_chars_which_would_make_request_larger_than_2_mib_on_update_does_not_split_into_2_batches_for_create(isIdSpecialChar: False)
10   00:00:06.3263467   Microsoft.EntityFrameworkCore.Scaffolding.CompiledModelCosmosTest.BigModel
11   00:00:05.5321270   Microsoft.EntityFrameworkCore.CosmosTransactionalBatchTest.SaveChanges_succeeds_for_101_entities_in_same_partition
12   00:00:05.4970919   Microsoft.EntityFrameworkCore.CosmosTransactionalBatchTest.SaveChanges_transaction_behavior_always_update_entities_payload_can_be_exactly_cosmos_limit_and_throws_when_1byte_over(oneByteOver: True)
13   00:00:05.4817949   Microsoft.EntityFrameworkCore.CosmosTransactionalBatchTest.SaveChanges_transaction_behavior_never_fails_for_duplicate_key_in_same_partition_writes_all_staged_before_error
14   00:00:05.4674573   Microsoft.EntityFrameworkCore.CosmosTransactionalBatchTest.SaveChanges_transaction_behavior_always_create_entities_payload_can_be_exactly_cosmos_limit_and_throws_when_1byte_over(oneByteOver: True)
15   00:00:05.4668164   Microsoft.EntityFrameworkCore.EndToEndCosmosTest.Can_add_update_delete_with_nested_collections(transactionalBatch: True)
16   00:00:05.4523636   Microsoft.EntityFrameworkCore.CosmosTransactionalBatchTest.SaveChanges_id_does_not_count_double_toward_request_size_on_create(oneByteOver: True)
17   00:00:05.4294677   Microsoft.EntityFrameworkCore.CosmosTransactionalBatchTest.SaveChanges_create_id_contains_special_chars_which_would_make_request_larger_than_2_mib_on_update_does_not_split_into_2_batches_for_create(isIdSpecialChar: True)
18   00:00:05.4087656   Microsoft.EntityFrameworkCore.Update.CosmosBulkEndToEndTest.Can_add_update_delete_with_nested_collections(transactionalBatch: True)
19   00:00:05.3604424   Microsoft.EntityFrameworkCore.CosmosTransactionalBatchTest.SaveChanges_exactly_2_mib_does_not_split_and_one_byte_over_splits(oneByteOver: True)
20   00:00:04.9312782   Microsoft.EntityFrameworkCore.EndToEndCosmosTest.Can_add_update_delete_with_collections(transactionalBatch: False)

// Since the databases are deleted, they can't be shared across test runs, so we generate a new name for each run.
// https://learn.microsoft.com/en-us/azure/cosmos-db/emulator#differences-between-the-emulator-and-cloud-service
return Guid.NewGuid().ToString();
return "EF-" + Guid.NewGuid().ToString();
Member

@AndriySvyryd do you think we can get rid of the GUID-named containers? I'm not sure what emulator limitations make this necessary and why we need to delete etc.

Contributor Author

@JoasE JoasE Feb 14, 2026

It's documented that the emulator will show performance degradation with >10 containers, in this statement in the docs.

I think that is why it's deleting the containers after every test.

Because different test classes might use the same test store, using a GUID was a quick fix to allow parallel execution (otherwise we would start deleting something while it's still being used). There might be a better way to do this, or perhaps the performance with >10 containers isn't that bad (or the limit is more about >10 active containers). I'm still looking into that.

Member

Ah thanks, I missed that note. I'm guessing that's also a good reason to not parallelize (or at least to keep the degree of parallelization down)...

But I'm still not sure why it's useful to include a random GUID inside the container name. That feels more like a practice for when a shared cloud database is used potentially in parallel, to ensure you don't have conflicts - not relevant for the emulator.

Contributor Author

@roji
The GUID makes parallel execution safe by ensuring every test gets its own isolated container.
Since different test classes might use the same fixture, and therefore the same test store name, conflicts will occur with parallel test execution. The tests sharing fixtures are (usually, or as far as I have seen so far) read-only tests, but since we are deleting the containers, one test might finish and delete the container while another is still using it. Using a GUID here is a quick fix. Another option would be to group these tests and create a collection fixture, tying the container lifetime to the collection fixture, but that might require significant changes in the setup of the specification tests or throughout the entire Cosmos test project. I am not sure that is worth the effort at this moment, as I am still trying to determine whether running tests in parallel is actually feasible with the emulator.

It currently seems to be stable locally with no limit on the parallelism (which means the processor count, 16 on my laptop), with no significant performance improvement compared to a limit of 3 threads. By stable I mean running the complete test suite 15 times with no errors.

If I don't delete the shared containers, the emulator will stop responding to create collection requests at some point, at around 20 databases, each with 1-3 containers.

I haven't tested this against a real Cosmos DB though, and enabling multithreading is not conditional, so I still have to do that. Maybe I could handle the parallelism myself entirely within the test store using semaphores, but it might be kinda funky, and it would also affect run times for tests that don't use a fixture but call InitializeAsync in the test.
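
A rough sketch of what that semaphore idea could look like, just to make it concrete. `StoreThrottle`, `RunAsync`, and the limit of 3 are made-up names and values for illustration; none of this exists in the EF Core test infrastructure, and I haven't tried it against the emulator:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the EF Core tests): caps how many
// store operations may run against the emulator at the same time.
public static class StoreThrottle
{
    // Assumed limit; 3 simply mirrors the thread limit mentioned above.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(3, 3);

    public static async Task<T> RunAsync<T>(Func<Task<T>> operation)
    {
        // Wait for a free slot before touching the emulator.
        await Gate.WaitAsync();
        try
        {
            return await operation();
        }
        finally
        {
            // Always free the slot, even if the operation throws.
            Gate.Release();
        }
    }
}
```

The test store would wrap its create and delete calls in `StoreThrottle.RunAsync(...)`. The downside is the one mentioned above: tests that call InitializeAsync themselves would also queue behind the gate, so their measured run times would include the wait.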

I just want to mention that this is still an experiment, and I’m not sure yet whether it will succeed or if I’ll even complete it. Please don’t feel like you need to limit your questions though, I really appreciate the input and enjoy discussing it. I just don’t want you to feel like you have to spend time on it, unless you are ok with that!
