Imagine we have two bed files, and we do this:
z = x.intersect(y, wao=True)
The resulting file could look like this, is incorrectly guessed to be SAM format:
chr1 0 11447 0 none 0 0 0 chr1 0 11447 0 11447
chr1 11447 11502 1 1 1 0 0 chr1 11447 11502 2 55
chr1 11502 11675 0 none 0 0 0 chr1 11502 11675 0 173
chr1 31291 31431 0 none 0 0 0 . -1 -1 . 0
When we try to iterate over it, we get an OverflowError. Currently the fix is to make the name field a non-integer before doing the intersection:
def fix(f):
f.name = f.name + '.'
return f
z = x.each(fix).intersect(y, wao=True)
While we could get more fancy with detecting SAM, I don't want to go the route of checking against a regex for every field in every line of a file for pathological cases like this. Instead, it would be useful to set the filetype on the BedTool object and use that to short-circuit the create_interval_from_fields heuristics. So then you could do this:
z = x.intersect(y, wao=True)
z.file_type = 'bed'
print(z) # no longer raises OverflowError
Imagine we have two bed files, and we do this:
The resulting file could look like this, is incorrectly guessed to be SAM format:
When we try to iterate over it, we get an
OverflowError. Currently the fix is to make the name field a non-integer before doing the intersection:While we could get more fancy with detecting SAM, I don't want to go the route of checking against a regex for every field in every line of a file for pathological cases like this. Instead, it would be useful to set the filetype on the
BedToolobject and use that to short-circuit thecreate_interval_from_fieldsheuristics. So then you could do this: