

@nickita-khylkouski
Contributor

Summary

This PR makes the existing bitSize() method in BloomFilter public. Previously it was package-private with @VisibleForTesting.

Motivation

Users who embed Bloom filters in custom file formats need to know the serialization size before writing, so they can pre-allocate space (e.g., for memory-mapped files or fixed-size records). Currently the only way to determine the size is to actually serialize the filter (e.g., into a throwaway buffer) and count the bytes, which is wasteful, especially for large filters.

Changes

  • Changed bitSize() visibility from package-private to public
  • Removed @VisibleForTesting annotation
  • Added comprehensive Javadoc explaining:
    • The method's purpose
    • How to calculate serialization size: bitSize() / 8 + 6 bytes
    • The 6-byte header breakdown (1 byte strategy, 1 byte hash functions, 4 bytes array length)
  • Added @since 35.0 annotation
  • Applied changes to both JRE and Android flavors
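To sanity-check the formula, here is a minimal Java sketch that writes the layout described above (1-byte strategy, 1-byte hash-function count, 4-byte array length, then the 64-bit words) with a plain DataOutputStream and compares the byte count against bitSize() / 8 + 6. It does not call Guava: predictedSize and writeMock are hypothetical helpers that only mimic the described format, and the sketch assumes bitSize() is always a multiple of 64 because the bits are stored in an array of longs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class BloomFilterSizeSketch {
  // Header layout described in the PR: 1 byte strategy ordinal,
  // 1 byte hash-function count, 4 bytes for the long-array length.
  static final int HEADER_BYTES = 1 + 1 + 4;

  // The PR's formula: payload bytes (bitSize / 8) plus the 6-byte header.
  static long predictedSize(long bitSize) {
    return bitSize / 8 + HEADER_BYTES;
  }

  // Hypothetical writer that mimics the described layout; it is NOT Guava's writeTo().
  static byte[] writeMock(int strategyOrdinal, int numHashFunctions, long[] words) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeByte(strategyOrdinal);
      out.writeByte(numHashFunctions);
      out.writeInt(words.length);
      for (long word : words) {
        out.writeLong(word);
      }
    } catch (IOException e) {
      // ByteArrayOutputStream never throws, but DataOutputStream declares IOException.
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    for (int wordCount : new int[] {1, 16, 1000}) {
      // Bits are stored in 64-bit words, so bitSize is always a multiple of 64.
      long bitSize = (long) wordCount * Long.SIZE;
      byte[] serialized = writeMock(1, 5, new long[wordCount]); // example values only
      System.out.println(wordCount + " words: predicted=" + predictedSize(bitSize)
          + ", actual=" + serialized.length);
    }
  }
}
```

For 1, 16, and 1000 words the predicted and actual sizes agree (14, 134, and 8006 bytes), since every word contributes exactly 8 bytes on top of the fixed header.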

Tests

Added testBitSizeMatchesSerializationSize() test that verifies bitSize() correctly predicts the writeTo() output size across various configurations.

Test plan

  • New test testBitSizeMatchesSerializationSize() validates the size calculation formula
  • Existing testBitSize() test continues to pass (now testing public API)
  • Run full Guava test suite

Fixes #6866

This change makes the existing bitSize() method public (previously
package-private with @VisibleForTesting) to allow users to determine
the serialization size of a BloomFilter without actually serializing it.

This is useful when pre-allocating space in file formats that embed
Bloom filters. The serialization size can be calculated as:
bitSize() / 8 + 6 bytes (for the header).

The method is documented with:
- Javadoc explaining its purpose and the serialization size formula
- @since 35.0 annotation

Also adds testBitSizeMatchesSerializationSize() test to verify that
bitSize() correctly predicts the writeTo() output size.

Fixes google#6866
@kluever self-assigned this Feb 6, 2026
@kluever added the labels type=enhancement (Make an existing feature better), package=hash, status=triaged, and P3 no SLO on Feb 6, 2026
copybara-service bot pushed a commit that referenced this pull request Feb 7, 2026
See #8198

RELNOTES=n/a
PiperOrigin-RevId: 866534945
copybara-service bot pushed a commit that referenced this pull request Feb 7, 2026
See #8198

RELNOTES=n/a
PiperOrigin-RevId: 866662371
@kluever
Member

kluever commented Feb 7, 2026

I wonder if it'd be more convenient for users if we exposed an API that tells you how many bytes (instead of bits) the underlying AtomicLongArray is? We'd want to figure out a good name for it (I'm not convinced byteSize() is very good :-))

@jrtom
Contributor

jrtom commented Feb 7, 2026

waves

I'm not convinced that byteSize() is the best name, but at least it's easily digestible. ;)

More serially seriously: I do agree that providing the size in bytes is better than the size in bits, at least for the proposed purpose. That said: if the sole purpose of providing it is to inform serialization users of the size of the object to be serialized, maybe the reported size should both (a) be in bytes and (b) include the header as well, and be named something like serializationSize().

@nickita-khylkouski
Contributor Author

Great suggestions! I agree — returning bytes with the header included is a cleaner API. Users shouldn't need to do the bitSize() / 8 + 6 math themselves.

For naming, a few options:

  • serializationSize(): @jrtom's suggestion, clean and direct
  • serializedSizeInBytes(): explicit about units, but verbose
  • writtenSize(): ties to writeTo(), but less clear in isolation

I lean toward serializationSize() for simplicity. Happy to update the PR once you settle on a name — or if you'd prefer a different approach entirely.
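To make the proposal concrete, here is a hedged Java sketch of what a bytes-including-header method could look like. SerializationSizeSketch is a toy stand-in, not Guava's BloomFilter, and serializationSize() is only the name floated in this thread, not an existing API. The sketch assumes the serialized form stays as described in the PR: a 6-byte header followed by 8 bytes per 64-bit word.

```java
import java.nio.ByteBuffer;

public class SerializationSizeSketch {
  // 1 byte strategy + 1 byte hash-function count + 4 bytes array length.
  private static final int HEADER_BYTES = 6;

  private final long[] words;
  private final byte strategyOrdinal;
  private final byte numHashFunctions;

  SerializationSizeSketch(int wordCount, byte strategyOrdinal, byte numHashFunctions) {
    this.words = new long[wordCount];
    this.strategyOrdinal = strategyOrdinal;
    this.numHashFunctions = numHashFunctions;
  }

  // Mirrors the existing bitSize(): total bits in the backing array.
  long bitSize() {
    return (long) words.length * Long.SIZE;
  }

  // Hypothetical method in the spirit of the proposed serializationSize():
  // bytes, header included, so callers never redo the bitSize() / 8 + 6 math.
  long serializationSize() {
    return HEADER_BYTES + (long) words.length * Long.BYTES;
  }

  // Writes the described layout. ByteBuffer is big-endian by default,
  // the same byte order java.io.DataOutputStream uses.
  void writeTo(ByteBuffer buffer) {
    buffer.put(strategyOrdinal);
    buffer.put(numHashFunctions);
    buffer.putInt(words.length);
    for (long word : words) {
      buffer.putLong(word);
    }
  }

  public static void main(String[] args) {
    SerializationSizeSketch filter = new SerializationSizeSketch(128, (byte) 1, (byte) 7);
    // Pre-allocate exactly the required space, e.g. a fixed-size record slot.
    ByteBuffer record = ByteBuffer.allocate(Math.toIntExact(filter.serializationSize()));
    filter.writeTo(record);
    System.out.println("size=" + filter.serializationSize() + ", remaining=" + record.remaining());
  }
}
```

Running main prints size=1030, remaining=0: 128 words give 8,192 bits, and 8192 / 8 + 6 = 1030, so the pre-allocated buffer is filled exactly. Returning bytes with the header folded in means a caller sizing a record never needs to know the header layout at all.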


Labels

P3 no SLO package=hash status=triaged type=enhancement Make an existing feature better


Development

Successfully merging this pull request may close these issues.

Make BloomFilter.bitSize() public

3 participants