make file_type a settable attribute #170

@daler

Imagine we have two bed files, and we do this:

z = x.intersect(y, wao=True)

The resulting file could look like this, and it is incorrectly guessed to be SAM format:

chr1  0      11447  0  none  0  0  0  chr1  0      11447  0  11447
chr1  11447  11502  1  1     1  0  0  chr1  11447  11502  2  55
chr1  11502  11675  0  none  0  0  0  chr1  11502  11675  0  173
chr1  31291  31431  0  none  0  0  0  .     -1     -1     .  0
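To illustrate why a row like the second one can look SAM-like, here is a minimal sketch. `looks_like_sam` is a hypothetical, simplified check written for this example (it assumes the guess keys on having at least 11 fields with integers in the SAM FLAG, POS and MAPQ columns); it is not pybedtools' actual detection code.

```python
def looks_like_sam(fields):
    """Hypothetical, simplified SAM check (an assumption, not pybedtools code)."""
    return (
        len(fields) >= 11
        and fields[1].isdigit()  # would be FLAG in SAM
        and fields[3].isdigit()  # would be POS in SAM; this is the BED name field
        and fields[4].isdigit()  # would be MAPQ in SAM; this is the BED score field
    )


# The four rows from the intersect output above, whitespace-normalized.
rows = [
    "chr1 0 11447 0 none 0 0 0 chr1 0 11447 0 11447",
    "chr1 11447 11502 1 1 1 0 0 chr1 11447 11502 2 55",
    "chr1 11502 11675 0 none 0 0 0 chr1 11502 11675 0 173",
    "chr1 31291 31431 0 none 0 0 0 . -1 -1 . 0",
]

flags = [looks_like_sam(row.split()) for row in rows]
print(flags)
```

Under this assumed check, only the row whose name and score are both integers is flagged, which is consistent with a non-integer name field avoiding the problem.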

When we try to iterate over it, we get an OverflowError. Currently the workaround is to make the name field a non-integer before doing the intersection:

def fix(f):
    f.name = f.name + '.'
    return f

z = x.each(fix).intersect(y, wao=True)

While we could get fancier with SAM detection, I don't want to go the route of checking a regex against every field in every line of a file just to handle pathological cases like this. Instead, it would be useful to set file_type on the BedTool object and use that to short-circuit the create_interval_from_fields heuristics. Then you could do this:

z = x.intersect(y, wao=True)
z.file_type = 'bed'
print(z)  # no longer raises OverflowError
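A rough sketch of the short-circuit idea, under stated assumptions: `IntervalParser`, `_guess` and `parse` are hypothetical names made up for this example, and the guessing rule is a simplified placeholder. None of this is the existing pybedtools API; it only shows how an explicitly set `file_type` could bypass per-line guessing.

```python
class IntervalParser:
    """Hypothetical stand-in for a BedTool-like object (not pybedtools' API)."""

    def __init__(self, file_type=None):
        # None means "guess the format for each line".
        self.file_type = file_type

    def _guess(self, fields):
        # Simplified placeholder heuristic (an assumption for this sketch).
        if (
            len(fields) >= 11
            and fields[1].isdigit()
            and fields[3].isdigit()
            and fields[4].isdigit()
        ):
            return "sam"
        return "bed"

    def parse(self, line):
        fields = line.split()
        # An explicitly set file_type wins; guess only when it is unset.
        kind = self.file_type or self._guess(fields)
        if kind == "bed":
            return kind, fields[0], int(fields[1]), int(fields[2])
        # SAM-style reading: RNAME is column 3, POS is column 4.
        return kind, fields[2], int(fields[3]), None


line = "chr1 11447 11502 1 1 1 0 0 chr1 11447 11502 2 55"

parser = IntervalParser()
guessed = parser.parse(line)   # falls through to the heuristic

parser.file_type = "bed"       # settable attribute, as proposed
forced = parser.parse(line)    # heuristic is skipped
```

The design choice here is that the override is checked once per line before any guessing, so no extra per-field validation is needed for the common case.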
