Commit dbe5fc5

Add support for discussions
Closes #290
1 parent ed29a91 commit dbe5fc5

6 files changed

Lines changed: 961 additions & 41 deletions

CHANGES.rst

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@ Changelog
 
 Unreleased
 ----------
+- Add GitHub Discussions backups via GraphQL, including comments, replies,
+optional attachment downloads, and per-repository incremental checkpoints.
 - Add ``--token-from-gh`` to read authentication from ``gh auth token``.
 
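The changelog says ``--token-from-gh`` reads authentication from ``gh auth token``. A minimal sketch of such a lookup, assuming the option shells out to the GitHub CLI and uses its trimmed stdout as the token; ``token_from_gh`` and its error messages are hypothetical, not the project's actual code:

```python
import subprocess


def token_from_gh(run=subprocess.run):
    """Return the token printed by `gh auth token` (hypothetical helper).

    `run` is injectable so the lookup can be exercised without the gh CLI.
    """
    try:
        result = run(
            ["gh", "auth", "token"],
            capture_output=True,
            text=True,
            check=True,
        )
    except FileNotFoundError as exc:
        # gh is not installed or not on PATH.
        raise RuntimeError("gh CLI not found on PATH") from exc
    except subprocess.CalledProcessError as exc:
        # gh ran but failed, e.g. because no account is logged in.
        raise RuntimeError("gh auth token failed: " + (exc.stderr or "").strip()) from exc
    token = result.stdout.strip()  # drop the trailing newline gh prints
    if not token:
        raise RuntimeError("gh auth token returned an empty token")
    return token
```

Reading the token through a subprocess keeps it out of shell history and environment variables, which matches the motivation the README gives for piping a token from stdin.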

README.rst

Lines changed: 23 additions & 11 deletions
@@ -4,7 +4,7 @@ github-backup
 
 |PyPI| |Python Versions|
 
-The package can be used to backup an *entire* `Github <https://github.com/>`_ organization, repository or user account, including starred repos, issues and wikis in the most appropriate format (clones for wikis, json files for issues).
+The package can be used to backup an *entire* `Github <https://github.com/>`_ organization, repository or user account, including starred repos, issues, discussions and wikis in the most appropriate format (clones for wikis, json files for issues and discussions).
 
 Requirements
 ============
@@ -44,8 +44,9 @@ CLI Help output::
 [--issues] [--issue-comments] [--issue-events] [--pulls]
 [--pull-comments] [--pull-commits] [--pull-details]
 [--labels] [--hooks] [--milestones] [--security-advisories]
-[--repositories] [--bare] [--no-prune] [--lfs] [--wikis]
-[--gists] [--starred-gists] [--skip-archived] [--skip-existing]
+[--discussions] [--repositories] [--bare] [--no-prune]
+[--lfs] [--wikis] [--gists] [--starred-gists]
+[--skip-archived] [--skip-existing]
 [-L [LANGUAGES ...]] [-N NAME_REGEX] [-H GITHUB_HOST]
 [-O] [-R REPOSITORY] [-P] [-F] [--prefer-ssh] [-v]
 [--keychain-name OSX_KEYCHAIN_ITEM_NAME]
@@ -104,6 +105,7 @@ CLI Help output::
 --milestones include milestones in backup
 --security-advisories
 include security advisories in backup
+--discussions include discussions in backup
 --repositories include repository clone in backup
 --bare clone bare repositories
 --no-prune disable prune option for git fetch
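The help output adds ``--discussions`` as a plain on/off switch, and the README states that it is also enabled by ``--all``. A minimal argparse sketch of that wiring, assuming ``--all`` switches on a set of per-resource flags after parsing; the ``dest`` names and the abbreviated ``ALL_FLAGS`` tuple are hypothetical, not the tool's real option table:

```python
import argparse

# Hypothetical, abbreviated list of destinations that --all turns on.
ALL_FLAGS = ("include_issues", "include_pulls", "include_discussions")


def build_parser():
    parser = argparse.ArgumentParser(prog="github-backup")
    parser.add_argument("--all", action="store_true", dest="include_everything")
    parser.add_argument("--issues", action="store_true", dest="include_issues")
    parser.add_argument("--pulls", action="store_true", dest="include_pulls")
    parser.add_argument(
        "--discussions",
        action="store_true",
        dest="include_discussions",
        help="include discussions in backup",
    )
    return parser


def parse_args(argv):
    args = build_parser().parse_args(argv)
    if args.include_everything:
        # --all expands to every individual resource flag, including discussions.
        for name in ALL_FLAGS:
            setattr(args, name, True)
    return args
```

Expanding ``--all`` after parsing keeps each resource flag independently usable while guaranteeing that new resources such as discussions only need one entry in the list.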
@@ -144,8 +146,8 @@ CLI Help output::
 applies if including releases
 --skip-assets-on [SKIP_ASSETS_ON ...]
 skip asset downloads for these repositories
---attachments download user-attachments from issues and pull
-requests
+--attachments download user-attachments from issues, pull requests,
+and discussions
 --throttle-limit THROTTLE_LIMIT
 start throttling of GitHub API requests after this
 amount of API requests remain
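With this change ``--attachments`` also covers discussions. A discussion nests replies under comments, so an attachment scan has one more level to walk than for issues. A minimal sketch, assuming attachments are recognised by ``https://github.com/user-attachments/assets/`` (images) and ``.../files/`` (documents) URLs in Markdown bodies, and that each discussion is a dict shaped like a GraphQL response (``body``, ``comments.nodes``, ``replies.nodes``); the regex and field layout are assumptions, not the tool's actual parser:

```python
import re

# Assumed URL shapes for GitHub user attachments: /assets/ images, /files/ documents.
ATTACHMENT_RE = re.compile(
    r"https://github\.com/user-attachments/(?:assets|files)/[^\s)\"'<>]+"
)


def iter_bodies(discussion):
    """Yield the discussion body, then every comment body and its reply bodies."""
    yield discussion.get("body") or ""
    for comment in (discussion.get("comments") or {}).get("nodes", []):
        yield comment.get("body") or ""
        for reply in (comment.get("replies") or {}).get("nodes", []):
            yield reply.get("body") or ""


def attachment_urls(discussion):
    """Return unique attachment URLs in first-seen order."""
    seen = {}  # dicts preserve insertion order, giving an ordered set
    for body in iter_bodies(discussion):
        for url in ATTACHMENT_RE.findall(body):
            seen.setdefault(url, None)
    return list(seen)
```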
@@ -184,7 +186,7 @@ Customise the permissions for your use case, but for a personal account full bac
 
 **User permissions**: Read access to followers, starring, and watching.
 
-**Repository permissions**: Read access to contents, issues, metadata, pull requests, and webhooks.
+**Repository permissions**: Read access to contents, discussions, issues, metadata, pull requests, and webhooks.
 
 
 GitHub Apps
@@ -265,9 +267,9 @@ LFS objects are fetched for all refs, not just the current checkout, ensuring a
 About Attachments
 -----------------
 
-When you use the ``--attachments`` option with ``--issues`` or ``--pulls``, the tool will download user-uploaded attachments (images, videos, documents, etc.) from issue and pull request descriptions and comments. In some circumstances attachments contain valuable data related to the topic, and without their backup important information or context might be lost inadvertently.
+When you use the ``--attachments`` option with ``--issues``, ``--pulls`` or ``--discussions``, the tool will download user-uploaded attachments (images, videos, documents, etc.) from issue, pull request and discussion descriptions and comments. In some circumstances attachments contain valuable data related to the topic, and without their backup important information or context might be lost inadvertently.
 
-Attachments are saved to ``issues/attachments/{issue_number}/`` and ``pulls/attachments/{pull_number}/`` directories, where ``{issue_number}`` is the GitHub issue number (e.g., issue #123 saves to ``issues/attachments/123/``). Each attachment directory contains:
+Attachments are saved to ``issues/attachments/{issue_number}/``, ``pulls/attachments/{pull_number}/`` and ``discussions/attachments/{discussion_number}/`` directories, where ``{issue_number}`` is the GitHub issue number (e.g., issue #123 saves to ``issues/attachments/123/``). Each attachment directory contains:
 
 - The downloaded attachment files (named by their GitHub identifier with appropriate file extensions)
 - If multiple attachments have the same filename, conflicts are resolved with numeric suffixes (e.g., ``report.pdf``, ``report_1.pdf``, ``report_2.pdf``)
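The two layout rules in this hunk, one directory per item and numeric suffixes for clashing filenames, can be sketched as follows. This assumes paths are built relative to a repository's backup folder; ``attachment_dir`` and ``unique_name`` are hypothetical names for illustration, not functions from the tool:

```python
import os

# Resource kinds named in the README; each maps to "<kind>/attachments/<number>/".
KINDS = ("issues", "pulls", "discussions")


def attachment_dir(repo_dir, kind, number):
    """Build e.g. <repo_dir>/discussions/attachments/7 for discussion #7."""
    if kind not in KINDS:
        raise ValueError("unknown resource kind: %s" % kind)
    return os.path.join(repo_dir, kind, "attachments", str(number))


def unique_name(filename, taken):
    """Return filename, or the first free name_1, name_2, ... variant.

    `taken` is the set of names already used in the directory; the chosen
    name is added to it.
    """
    stem, ext = os.path.splitext(filename)
    candidate, n = filename, 0
    while candidate in taken:
        n += 1
        candidate = "%s_%d%s" % (stem, n, ext)
    taken.add(candidate)
    return candidate
```

Splitting on the extension before adding the suffix keeps ``report_1.pdf`` openable by type, which matters because the README notes that extensions are derived from HTTP headers.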
@@ -287,6 +289,16 @@ The tool automatically extracts file extensions from HTTP headers to ensure file
 **Fine-grained token limitation:** Due to a GitHub platform limitation, fine-grained personal access tokens (``github_pat_...``) cannot download attachments from private repositories directly. This affects both ``/assets/`` (images) and ``/files/`` (documents) URLs. The tool implements a workaround for image attachments using GitHub's Markdown API, which converts URLs to temporary JWT-signed URLs that can be downloaded. However, this workaround only works for images - document attachments (PDFs, text files, etc.) will fail with 404 errors when using fine-grained tokens on private repos. For full attachment support on private repositories, use a classic token (``-t``) instead of a fine-grained token (``-f``). See `#477 <https://github.com/josegonzalez/python-github-backup/issues/477>`_ for details.
 
 
+About Discussions
+-----------------
+
+GitHub Discussions are backed up with GitHub's GraphQL API because the REST API does not expose discussions. Use ``--discussions`` to save each discussion as JSON under ``repositories/{repo}/discussions/{number}.json``. Discussion backups include the discussion body and metadata, category information, comments, and comment replies.
+
+``--discussions`` is included in ``--all``. Unlike most REST API-backed resources, discussions require authentication because GitHub's GraphQL API requires a token. Fine-grained personal access tokens and GitHub Apps need read access to the repository's Discussions permission.
+
+Incremental backups use a per-repository checkpoint at ``repositories/{repo}/discussions/last_update`` based on discussion ``updatedAt`` timestamps. This is separate from the repository-level ``last_update`` file so discussion activity is not missed if the repository's own update timestamp does not change. If you enable ``--discussions`` on an existing incremental backup, the first run performs a full discussions backup for each repository and creates the discussions checkpoint for future runs.
+
+
 About security advisories
 -------------------------
 
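The new "About Discussions" section says discussions come from GraphQL, which always needs a token. A minimal standard-library sketch of a paginated fetch, assuming the public ``https://api.github.com/graphql`` endpoint and newest-first ordering by ``UPDATED_AT``. The query, the page sizes, and the helper names are assumptions, not the tool's actual query; nested comments and replies beyond the first page are not paginated here:

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"

# Trimmed query: comments/replies beyond the first 50 are not fetched.
DISCUSSIONS_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    discussions(first: 50, after: $cursor,
                orderBy: {field: UPDATED_AT, direction: DESC}) {
      pageInfo { hasNextPage endCursor }
      nodes {
        number title body updatedAt
        category { name }
        comments(first: 50) {
          nodes { body replies(first: 50) { nodes { body } } }
        }
      }
    }
  }
}
"""


def build_request(token, owner, name, cursor=None):
    if not token:
        # GraphQL has no anonymous access, unlike many REST endpoints.
        raise ValueError("--discussions requires an authentication token")
    payload = {
        "query": DISCUSSIONS_QUERY,
        "variables": {"owner": owner, "name": name, "cursor": cursor},
    }
    return urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": "bearer " + token,
                 "Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Perform the HTTP POST and decode the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def fetch_discussions(token, owner, name, transport=send):
    """Yield discussion nodes page by page; `transport` is injectable for testing."""
    cursor = None
    while True:
        data = transport(build_request(token, owner, name, cursor))
        if data.get("errors"):
            raise RuntimeError(data["errors"][0].get("message", "GraphQL error"))
        page = data["data"]["repository"]["discussions"]
        yield from page["nodes"]
        if not page["pageInfo"]["hasNextPage"]:
            return
        cursor = page["pageInfo"]["endCursor"]
```

Ordering by ``UPDATED_AT`` descending is what makes an ``updatedAt`` checkpoint useful: a backup can stop paging as soon as it reaches discussions older than the last run.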
@@ -419,14 +431,14 @@ Quietly and incrementally backup useful Github user data (public and private rep
 export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
 GH_USER=YOUR-GITHUB-USER
 
-github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --security-advisories --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER
+github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --security-advisories --discussions --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER
 
 Debug an error/block or incomplete backup into a temporary directory. Omit "incremental" to fill a previous incomplete backup. ::
 
 export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
 GH_USER=YOUR-GITHUB-USER
 
-github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER
+github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --discussions --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER
 
 Pipe a token from stdin to avoid storing it in environment variables or command history (Unix-like systems only)::
 
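The incremental example in this hunk combines ``-i`` with ``--discussions``. Per the README, discussions keep their own checkpoint at ``repositories/{repo}/discussions/last_update`` keyed on ``updatedAt``, and a missing checkpoint triggers a full first run. A minimal sketch of that logic, assuming the file holds one ISO 8601 UTC timestamp in GitHub's ``YYYY-MM-DDTHH:MM:SSZ`` form (so string comparison orders correctly) and that discussions arrive newest-first; the function names are hypothetical:

```python
import json
import os


def read_checkpoint(discussions_dir):
    """Return the stored updatedAt timestamp, or None if there is no checkpoint."""
    path = os.path.join(discussions_dir, "last_update")
    if not os.path.exists(path):
        return None  # No checkpoint yet: back up every discussion.
    with open(path, encoding="utf-8") as fh:
        return fh.read().strip() or None


def backup_discussions(discussions_dir, discussions):
    """Write changed discussions as {number}.json and advance the checkpoint.

    Returns the discussion numbers that were written.
    """
    os.makedirs(discussions_dir, exist_ok=True)
    since = read_checkpoint(discussions_dir)
    newest, written = since, []
    for d in discussions:  # assumed newest-first by updatedAt
        # Stop at the first item strictly older than the checkpoint; ties are
        # rewritten so a same-second update is never skipped.
        if since is not None and d["updatedAt"] < since:
            break
        path = os.path.join(discussions_dir, "%d.json" % d["number"])
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(d, fh, indent=2, sort_keys=True)
        written.append(d["number"])
        if newest is None or d["updatedAt"] > newest:
            newest = d["updatedAt"]
    if newest is not None:
        with open(os.path.join(discussions_dir, "last_update"), "w", encoding="utf-8") as fh:
            fh.write(newest)
    return written
```

Keeping this checkpoint next to the discussion files, rather than reusing the repository-level ``last_update``, is what lets new comments be picked up even when the repository's own timestamp has not moved.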
@@ -442,7 +454,7 @@ This tool creates backups only, there is no inbuilt restore command.
 cd /tmp/white-house/repositories/petitions/repository
 git push --mirror git@github.com:WhiteHouse/petitions.git
 
-**Issues, pull requests, comments, and other metadata** are saved as JSON files for archival purposes. The GitHub API does not support recreating this data faithfully, creating issues via the API has limitations:
+**Issues, pull requests, discussions, comments, and other metadata** are saved as JSON files for archival purposes. The GitHub API does not support recreating this data faithfully, creating issues via the API has limitations:
 
 - New issue/PR numbers are assigned (original numbers cannot be set)
 - Timestamps reflect creation time (original dates cannot be set)
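Because the discussion JSON is archival and there is no restore command, inspecting it is left to the user. A minimal sketch that lists archived discussions under a backup's ``repositories/{repo}/discussions/`` folders. It assumes each ``{number}.json`` file holds a discussion object with ``number``, ``title`` and a GraphQL-style ``comments.nodes`` list; those keys are assumptions about the file contents, not a documented format:

```python
import glob
import json
import os


def list_discussions(backup_root):
    """Return (repo, number, title, comment_count) tuples sorted by repo, then number."""
    # Only *.json matches, so the extensionless last_update checkpoint is skipped.
    pattern = os.path.join(backup_root, "repositories", "*", "discussions", "*.json")
    rows = []
    for path in glob.glob(pattern):
        repo = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path, encoding="utf-8") as fh:
            d = json.load(fh)
        comments = (d.get("comments") or {}).get("nodes", [])
        rows.append((repo, d.get("number"), d.get("title"), len(comments)))
    # Sort numerically so discussion #10 follows #2.
    return sorted(rows, key=lambda r: (r[0], r[1] or 0))
```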
