Merged
30 commits
346a87d
skip biopython integration test if it's not installed
daler Jan 17, 2026
a2d5cc5
bump version of artifact action
daler Jan 17, 2026
b6a3331
mambaforge -> miniforge for CI
daler Jan 17, 2026
03ff505
parametrize test (rather than yield)
daler Jan 17, 2026
141c074
fix pybedtools_integration doctests
daler Jan 17, 2026
8a8942a
don't need module name on api docs autosummary
daler Jan 17, 2026
7f6cc51
don't use now-deprecated doc theme options
daler Jan 17, 2026
f71eb0d
escape backslash in docstring
daler Jan 17, 2026
c3d0800
update cli to address #224
daler Jan 17, 2026
8b470c8
update GHA to run on more PR situations
daler Jan 26, 2026
ad0aa0e
update .gitignore
daler Jan 26, 2026
3e3fa7d
parser overhaul
daler Jan 26, 2026
862f39c
fix example data to not have leading spaces on attributes
daler Jan 26, 2026
dd7dad4
rm some residual py2/py3 dual support
daler Jan 26, 2026
e27ca05
add more parser tests, and test dialect simultaneously
daler Jan 26, 2026
96420fc
add semicolon in quotes test for #212
daler Jan 26, 2026
906aab5
more detailed test for repeated keys and comma inside quotes
daler Jan 26, 2026
aa5fe7c
update docs for parser and dialect changes
daler Jan 26, 2026
79b3127
update changelog
daler Jan 26, 2026
56fcaa7
a round of comments on the overhauled parser.py
daler Mar 29, 2026
69b827c
address #213, and in general convert Path -> str where possible
daler Mar 29, 2026
0bba482
address #242 (parents/children docstring consistency)
daler Mar 29, 2026
0c32316
update link to GFF spec (mentioned in #238)
daler Mar 29, 2026
6e482a5
bump version
daler Mar 30, 2026
9ed9b9c
updates to setup.py
daler Mar 30, 2026
78027f8
migrate to using pyproject.toml
daler Mar 31, 2026
74db9b1
address some pytest warnings
daler Mar 31, 2026
a3c7b14
explicitly export all in __init__.py
daler Mar 31, 2026
3453c2e
exclude docs from python package
daler Mar 31, 2026
4a87738
updates to GitHub Actions config
daler Mar 31, 2026
43 changes: 23 additions & 20 deletions .github/workflows/main.yml
@@ -1,13 +1,21 @@
name: main
on: [push]
on:
push:
branches:
- master
pull_request:
types:
- opened
- reopened
- synchronize
jobs:
build-and-test:
strategy:
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
python-version: ["3.10", "3.11", "3.12"]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v6
- name: git setup
id: git-setup
run: |
@@ -18,35 +26,33 @@ jobs:

- name: conda env
run: |
wget -O Mambaforge.sh "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge.sh -b -p "${HOME}/conda"
wget -O Miniforge3.sh "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3.sh -b -p "${HOME}/conda"
source "${HOME}/conda/etc/profile.d/conda.sh"
source "${HOME}/conda/etc/profile.d/mamba.sh"
which conda
conda config --system --add channels defaults
conda config --system --add channels bioconda
conda config --system --add channels conda-forge
conda config --system --set channel_priority strict
mamba create -y -n gffutils-env \
conda create -y -n gffutils-env \
python=${{ matrix.python-version }} \
bedtools

conda activate gffutils-env
python setup.py clean sdist
(cd dist && pip install gffutils-*.tar.gz)
python -m pip install build
python -m build
(cd dist && python -m pip install gffutils-*.tar.gz)
cd $TMPDIR
python -c "import gffutils; print(gffutils.__file__)"
conda deactivate

- name: run unit tests
run: |
source "${HOME}/conda/etc/profile.d/conda.sh"
source "${HOME}/conda/etc/profile.d/mamba.sh"

conda activate gffutils-env
pip install pytest hypothesis biopython pybedtools
pytest -v --doctest-modules gffutils
conda install -y bedtools
python -m pip install -e '.[optional,test]'
pytest
conda deactivate

- name: doctests
@@ -61,9 +67,8 @@ jobs:
if: ${{ (matrix.python-version != 3.8) }}
run: |
source "${HOME}/conda/etc/profile.d/conda.sh"
source "${HOME}/conda/etc/profile.d/mamba.sh"
mamba install -y -n gffutils-env --file docs-requirements.txt
conda activate gffutils-env
python -m pip install -e '.[docs]'
(cd doc && make clean doctest)
conda deactivate

@@ -72,7 +77,6 @@ jobs:
if: ${{ (matrix.python-version != 3.8) }}
run: |
source "${HOME}/conda/etc/profile.d/conda.sh"
source "${HOME}/conda/etc/profile.d/mamba.sh"
conda activate gffutils-env
(cd doc && make html)
conda deactivate
@@ -83,7 +87,6 @@ jobs:
--branch gh-pages "https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/$GITHUB_REPOSITORY" \
/tmp/docs


# clean it out and add newly-built docs
rm -rf /tmp/docs/*
cp -r doc/build/html/* /tmp/docs
@@ -102,15 +105,15 @@ jobs:

- name: push artifact
if: ${{ (matrix.python-version == 3.9) }}
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@v6
with:
name: doc
path: /tmp/docs

- name: push docs to gh-pages branch
# Push docs to gh-pages if this test is running on master branch, and
# restrict to a single Python version.
if: ${{ (github.ref == 'refs/heads/master') && (matrix.python-version == 3.9) }}
if: ${{ (github.ref == 'refs/heads/master') && (matrix.python-version == 3.12) }}
run: |
cd /tmp/docs
git push "https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/$GITHUB_REPOSITORY" gh-pages
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
env/
*.swo
*gfffeature.so
*.swp
6 changes: 0 additions & 6 deletions MANIFEST.in
@@ -1,12 +1,6 @@
include README.rst
include requirements.txt
include LICENSE
recursive-include docs/source *.rst
recursive-include docs/source *.py
recursive-include docs/source/images *
recursive-include doc/source/_templates *
include docs/Makefile
include docs/make.bat
include gffutils/test/data/c_elegans_WS199_ann_gff.txt
include gffutils/test/data/c_elegans_WS199_dna_shortened.fa
include gffutils/test/data/c_elegans_WS199_shortened_gff.txt
22 changes: 11 additions & 11 deletions doc/source/api.rst
@@ -117,10 +117,10 @@ Integration with other tools
:toctree: autodocs
:nosignatures:

gffutils.biopython_integration.to_seqfeature
gffutils.biopython_integration.from_seqfeature
gffutils.pybedtools_integration.tsses
gffutils.pybedtools_integration.to_bedtool
biopython_integration.to_seqfeature
biopython_integration.from_seqfeature
pybedtools_integration.tsses
pybedtools_integration.to_bedtool



@@ -131,10 +131,10 @@ Utilities
:toctree: autodocs
:nosignatures:

gffutils.helpers.asinterval
gffutils.helpers.merge_attributes
gffutils.helpers.sanitize_gff_db
gffutils.helpers.annotate_gff_db
gffutils.helpers.infer_dialect
gffutils.helpers.example_filename
gffutils.inspect.inspect
helpers.asinterval
helpers.merge_attributes
helpers.sanitize_gff_db
helpers.annotate_gff_db
helpers.infer_dialect
helpers.example_filename
inspect.inspect
46 changes: 46 additions & 0 deletions doc/source/changelog.rst
@@ -3,6 +3,52 @@
Change log
==========


v0.14
-----

- If a value contained a semicolon there would be unexpected behavior (reported
in `#212 <https://github.com/daler/gffutils/issues/212>`__). This is solved
by adding a new entry to the dialect, ``semicolon in quotes``, and running
the necessary regular expression only when inferring dialect, or, if
``semicolon in quotes`` is ``True``, on every feature. In the latter case,
this can dramatically increase the parsing time, since in Python regular
expressions are relatively slow, but it does correctly parse. Thanks to
@DevangThakkar for the fix.
- While working on that, refactored the attribute parsing to make it easier
to follow, and added more tests. The refactoring fixed some subtle bugs
in corner cases:
- Previously, for features with repeated keys, the ``order`` key of dialects
would list the repeated keys each time they appeared (i.e., the list had
duplicates) which could result in undetermined behavior. The ``order`` key
is now unique and only the first occurrence of a repeated key will be added
to the order.
- Previously, the ``ensembl_gtf.txt`` example file had a leading *space* in
front of the attributes. This looks to be an error in the creation of the
example file in the first place, but had previously parsed fine. The new
parser complains about the leading space. Since I'm unaware of any cases in
the wild that have a leading space, I consider the new parsing, which
complains about the space, to be more correct.
- Added tests to directly inspect the inferred dialects for the test cases.
- Preserve GFF directives when ``create_db()`` imports from a file path,
matching the behavior for string-backed iterators and fixing
`#213 <https://github.com/daler/gffutils/issues/213>`__. This was due to
a different path through the code when using a ``pathlib.Path`` object. In
addition to this fix, ``pathlib.Path`` objects are now converted to ``str``
throughout the code base with ``os.fspath`` where appropriate.
- CI, testing, and docs infrastructure updates (miniforge instead of
mambaforge; GitHub Action version bumps; skip biopython test if it's not
installed (`#233 <https://github.com/daler/gffutils/issues/233>`__); reduce build errors for docs)
- Fix `#224 <https://github.com/daler/gffutils/issues/224>`__, which was caused
by changes to the ``argh`` package used for the command-line tool.
- Address `#242 <https://github.com/daler/gffutils/issues/242>`__ (typo in docstring)
- Migrate to using ``pyproject.toml`` for packaging. This changes how versions
are calculated and reported, and reduces ``setup.py`` to a stub. The version
is recorded only in ``pyproject.toml``; ``version.py`` reports the installed
version, or parses the TOML if the package is not installed; ``setup.py`` just
calls ``setup()`` with no arguments since everything has been migrated to
``pyproject.toml``.


v0.13
-----

2 changes: 0 additions & 2 deletions doc/source/conf.py
@@ -53,5 +53,3 @@
templates_path = ['_templates']
exclude_patterns = []
html_theme = 'sphinx_rtd_theme'
html_static_path = ['_static']
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
7 changes: 5 additions & 2 deletions doc/source/dialect.rst
@@ -38,7 +38,8 @@ A GTF dialect might look like this::
'multival separator': ',',
'quoted GFF2 values': True,
'repeated keys': False,
'trailing semicolon': True}
'trailing semicolon': True,
'semicolon in quotes': False}

In contrast, a GFF dialect might look like this::

@@ -49,7 +50,9 @@ In contrast, a GFF dialect might look like this::
'multival separator': ',',
'quoted GFF2 values': False,
'repeated keys': False,
'trailing semicolon': False}
'trailing semicolon': False,
'semicolon in quotes': False}


As other real-world files are brought to the attention of the developers, it's
likely that more entries will be added to the dialect.
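
To make the role of the dialect entries concrete, here is a minimal sketch of how such a dictionary could drive the formatting of an attributes string. ``format_attributes`` is a hypothetical helper, not gffutils' own writer, and the ``'field separator'`` and ``'keyval separator'`` values below are assumed for illustration.

```python
def format_attributes(attributes, dialect):
    """Hypothetical formatter: render {key: [values]} according to a dialect."""
    parts = []
    for key, values in attributes.items():
        joined = dialect["multival separator"].join(values)
        if dialect["quoted GFF2 values"]:
            joined = f'"{joined}"'
        parts.append(key + dialect["keyval separator"] + joined)
    text = dialect["field separator"].join(parts)
    if dialect["trailing semicolon"]:
        text += ";"
    return text


# Assumed, abbreviated dialects; only the keys used above are included.
gtf_like = {"field separator": "; ", "keyval separator": " ",
            "multival separator": ",", "quoted GFF2 values": True,
            "trailing semicolon": True}
gff_like = {"field separator": ";", "keyval separator": "=",
            "multival separator": ",", "quoted GFF2 values": False,
            "trailing semicolon": False}

gtf_text = format_attributes({"gene_id": ["GENE1"], "transcript_id": ["T1"]}, gtf_like)
gff_text = format_attributes({"ID": ["001"], "Alias": ["a", "b"]}, gff_like)
```

The same attribute data yields ``gene_id "GENE1"; transcript_id "T1";`` under the GTF-like dialect and ``ID=001;Alias=a,b`` under the GFF-like one.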
6 changes: 3 additions & 3 deletions doc/source/examples.rst
@@ -235,7 +235,7 @@ data upon import into the database:
... return x


Now we can supply this tranform function to :func:`create_db`:
Now we can supply this transform function to :func:`create_db`:

>>> fn = gffutils.example_filename('ensembl_gtf.txt')
>>> db = gffutils.create_db(fn, ":memory:",
@@ -643,8 +643,8 @@ attributes to have the same format. To help with this, we can use the
>>> dialect = helpers.infer_dialect(
... 'Transcript "B0019.1" ; WormPep "WP:CE40797" ; Note "amx-2" ; Prediction_status "Partially_confirmed" ; Gene "WBGene00000138" ; CDS "B0019.1" ; WormPep "WP:CE40797" ; Note "amx-2" ; Prediction_status "Partially_confirmed" ; Gene "WBGene00000138"',
... )
>>> print(dialect)
{'leading semicolon': False, 'trailing semicolon': False, 'quoted GFF2 values': True, 'field separator': ' ; ', 'keyval separator': ' ', 'multival separator': ',', 'fmt': 'gtf', 'repeated keys': True, 'order': ['Transcript', 'WormPep', 'Note', 'Prediction_status', 'Gene', 'CDS', 'WormPep', 'Note', 'Prediction_status', 'Gene']}
>>> print({k: v for k, v in sorted(dialect.items())})
{'field separator': ' ; ', 'fmt': 'gtf', 'keyval separator': ' ', 'leading semicolon': False, 'multival separator': ',', 'order': ['Transcript', 'WormPep', 'Note', 'Prediction_status', 'Gene', 'CDS'], 'quoted GFF2 values': True, 'repeated keys': True, 'semicolon in quotes': False, 'trailing semicolon': False}

>>> db.dialect = dialect
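
The updated doctest output shows ``order`` listing each repeated key once, at its first occurrence. One common way to get that behavior in Python is sketched below; this is an illustration of the idea, not necessarily how the overhauled parser does it.

```python
# Keys in the order they appear in the example attribute string above.
keys = ["Transcript", "WormPep", "Note", "Prediction_status", "Gene",
        "CDS", "WormPep", "Note", "Prediction_status", "Gene"]

# dict preserves insertion order, so this keeps the first occurrence of each key.
order = list(dict.fromkeys(keys))

# A shorter de-duplicated list implies at least one key was repeated.
has_repeated_keys = len(order) != len(keys)
```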

6 changes: 3 additions & 3 deletions doc/source/index.rst
@@ -6,9 +6,9 @@
Introduction
============
:mod:`gffutils` is a Python package for working with `GFF
<http://www.sanger.ac.uk/resources/software/gff/spec.htm>`_ and `GTF
<http://mblab.wustl.edu/GTF22.html>`_ files in a hierarchical manner. It
allows operations which would be complicated or time-consuming using
<https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md>`__
and `GTF <http://mblab.wustl.edu/GTF22.html>`_ files in a hierarchical manner.
It allows operations which would be complicated or time-consuming using
a text-file-only approach.

Below is a short demonstration of :mod:`gffutils`. For the full documentation,
11 changes: 11 additions & 0 deletions gffutils/__init__.py
@@ -5,3 +5,14 @@
from gffutils.helpers import example_filename
from gffutils.exceptions import FeatureNotFoundError, DuplicateIDError
from gffutils.version import version as __version__

__all__ = [
"__version__",
"create_db",
"FeatureDB",
"Feature",
"DataIterator",
"example_filename",
"FeatureNotFoundError",
"DuplicateIDError",
]
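
For context on what an explicit ``__all__`` changes: it controls which names a star-import exposes. The sketch below uses a throwaway module named ``demo_pkg`` (hypothetical, built in memory) rather than gffutils itself.

```python
import sys
import types

# Build a tiny in-memory module to stand in for a package's __init__.py.
demo = types.ModuleType("demo_pkg")
demo.create_db = lambda: "db"
demo.helper_not_listed = lambda: "not exported by star-import"
demo.__all__ = ["create_db"]
sys.modules["demo_pkg"] = demo

# Star-import into a fresh namespace and see which public names arrive.
namespace = {}
exec("from demo_pkg import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
```

Here ``exported`` contains only ``create_db``; without ``__all__``, every name not starting with an underscore would have been imported.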
6 changes: 6 additions & 0 deletions gffutils/constants.py
@@ -127,6 +127,12 @@
# vs
# ID=001; Name=gene1
"field separator": ";",
# Sometimes there are semicolons inside quotes that break things, e.g.,
#
# note "Evidence 1a: Function1, Function2"
# vs
# note "Evidence 1a: Function; PubMedId: 123, 456"
"semicolon in quotes": False,
# Usually "=" for GFF3; " " for GTF, e.g.,
#
# gene_id "GENE1"
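
To illustrate why a semicolon inside quotes breaks a naive split, and what a quote-aware split looks like, here is a small sketch. The regular expression and ``split_attributes`` are illustrative assumptions, not the pattern used in gffutils.

```python
import re

# Match a semicolon only if an even number of double quotes follows it,
# i.e. the semicolon is not inside a quoted value.
_SPLIT_OUTSIDE_QUOTES = re.compile(r';(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_attributes(attr_string):
    """Hypothetical helper: split fields on semicolons outside quotes."""
    fields = _SPLIT_OUTSIDE_QUOTES.split(attr_string)
    return [field.strip() for field in fields if field.strip()]


text = 'note "Evidence 1a: Function; PubMedId: 123, 456"; gene_id "GENE1"'
naive = text.split(";")          # cuts the quoted note in two
aware = split_attributes(text)   # keeps the quoted note intact
```

The lookahead rescans the rest of the string at every semicolon, which is consistent with the changelog's remark that running a regular expression on every feature can noticeably slow parsing.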
9 changes: 6 additions & 3 deletions gffutils/contrib/plotting.py
@@ -1,11 +1,11 @@
import warnings

from gffutils.helpers import asinterval

try:
from pybedtools.contrib.plotting import Track
except ImportError:
import warnings

warnings.warn("Please install pybedtools for plotting.")
Track = None


class Gene(object):
@@ -49,6 +49,9 @@ def __init__(
UTRs, CDSs are. Padding is essentially "full" minus the largest height
(CDS, 0.9, by default).
"""
if Track is None:
warnings.warn("Please install pybedtools for plotting.")
raise ImportError("pybedtools is required for gffutils.contrib.plotting")

self.heights = {"transcript": 0.2, "utrs": 0.5, "cds": 0.9, "full": 1.0}
self.kwargs = kwargs
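
The pattern in this hunk (import lazily, fail only when the feature is used) can be sketched in isolation as follows. The module name is deliberately one that should not exist, and ``optional_import`` is a hypothetical helper, not part of gffutils.

```python
import importlib


def optional_import(module_name, attr):
    """Hypothetical helper: return module.attr, or None if the module is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr)


# Assumed-missing dependency, so importing this file never fails.
Track = optional_import("surely_not_installed_plotting_backend", "Track")


class Gene:
    def __init__(self):
        # Defer the failure to first use, with an actionable message.
        if Track is None:
            raise ImportError("the plotting backend is required to build a Gene")
```

Importing the module stays cheap and quiet for users who never plot, while anyone who instantiates ``Gene`` without the dependency gets an explicit ``ImportError``.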
3 changes: 3 additions & 0 deletions gffutils/create.py
@@ -76,6 +76,9 @@ def __init__(
Base class for _GFFDBCreator and _GTFDBCreator; see create_db()
function for docs
"""
if isinstance(dbfn, os.PathLike):
dbfn = os.fspath(dbfn)

self._keep_tempfiles = _keep_tempfiles
if force_merge_fields is None:
force_merge_fields = []
3 changes: 3 additions & 0 deletions gffutils/feature.py
@@ -1,4 +1,5 @@
from pyfaidx import Fasta
import os
import simplejson as json
from gffutils import constants
from gffutils import helpers
@@ -383,6 +384,8 @@ def sequence(self, fasta, use_strand=True):
-------
string
"""
if isinstance(fasta, os.PathLike):
fasta = os.fspath(fasta)
if isinstance(fasta, str):
fasta = Fasta(fasta, as_raw=False)

3 changes: 3 additions & 0 deletions gffutils/gffwriter.py
@@ -3,6 +3,7 @@
##
import tempfile
import shutil
import os
from time import strftime, localtime
from gffutils.version import version

@@ -33,6 +34,8 @@ class GFFWriter:
"""

def __init__(self, out, with_header=True, in_place=False):
if isinstance(out, os.PathLike):
out = os.fspath(out)
self.out = out
self.with_header = with_header
self.in_place = in_place
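
The same ``os.PathLike`` check appears in ``create.py``, ``feature.py`` and ``gffwriter.py``. A minimal sketch of that normalization as a standalone function is shown below; ``as_str_path`` is a hypothetical name used only for illustration.

```python
import os
from pathlib import Path


def as_str_path(obj):
    """Hypothetical helper: convert path-like objects to str, pass others through."""
    if isinstance(obj, os.PathLike):
        return os.fspath(obj)
    return obj


from_path = as_str_path(Path("data") / "annotation.gff")  # Path becomes str
from_str = as_str_path(":memory:")                         # str is unchanged
```

Normalizing at the boundary lets later ``isinstance(x, str)`` checks treat ``Path`` and ``str`` inputs the same way, while non-path objects such as open file handles pass through untouched.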