Skip to content

Redesing of PooledArray internals #64

@bkamins

Description

@bkamins

Things to do:

  • treat missing as a special value that is not pooled, probably with level 0. This would work the same as in CategoricalArrays.jl; the benefit is that two PooledArrays differing only in the fact if they allow Missing or not could share pool
  • add locking for setindex! but make sure that we support batch operations of adding levels (both in setindex! and in e.g. copyto!); this will allow to fully drop Copy-On-Write and never copy pool and invpool by default; tentatively unsafe_setindex! would be an alternative that does not use lock
  • stress in documentation that using invpool is not safe if potentially other threads are modifying it (this should not be a problem)
  • add droplevels! to DataAPI.jl and to PooledArrays.jl (this requires also a change in CategoricalArrays.jl); this function would reduce pool and invpool to only used levels and also at the same time make a fresh copy of them (as a way to detach pool and invpool between PooledArray-s)

I think this design is better than global pool. It will still cost us a bit in H2O benchmarks, but at least we avoid a global pool that is not reclaimable.

@nalimilan + @quinnj : any additional comments on this?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions