diff --git a/README.md b/README.md index 6271da0..d2b28fe 100644 --- a/README.md +++ b/README.md @@ -30,6 +30,19 @@ Complete workflow for finding relevant Filecoin Slack messages related to specif This is particularly useful for investigating storage provider issues and tracking community discussions. +### 📋 GitHub Projects v2 Exporter (FOC GA Tracker) +**Directory:** [foc-ga-tracker/](foc-ga-tracker/) + +Export and interact with GitHub Projects v2 data, specifically designed for the [FilOzone FOC GA Tracker project](https://github.com/orgs/FilOzone/projects/14/views/22). + +Features: +- Export project items to CSV/TSV/JSON formats +- Preserve all custom fields and metadata +- Compatible with spreadsheet applications +- Full GraphQL API integration + +See [foc-ga-tracker/README.md](foc-ga-tracker/README.md) for detailed usage instructions. + ## Getting Started 1. **Clone the repository** diff --git a/foc-ga-tracker/README.md b/foc-ga-tracker/README.md new file mode 100644 index 0000000..d12579a --- /dev/null +++ b/foc-ga-tracker/README.md @@ -0,0 +1,166 @@ +# FOC GA Tracker + +Utilities and scripts to interact with GitHub Projects v2 (specifically the [FilOzone FOC GA Tracker project](https://github.com/orgs/FilOzone/projects/14/views/22)). + +## Overview + +This directory contains tools for: +- Exporting GitHub Projects v2 data to CSV/TSV/JSON formats (like GitHub's built-in export) +- Querying and analyzing project items +- Filtering and processing project data + +## Setup + +1. **Install dependencies**: + ```bash + pip install gql requests + ``` + +2. **Set up GitHub token**: + ```bash + export GITHUB_TOKEN=your_github_personal_access_token + ``` + + Your token must have the `read:project` scope for organization projects. You can create a token at: https://github.com/settings/tokens + +## Usage + +### Export Project to CSV/TSV + +The main tool is `github_project_exporter.py`, which exports GitHub Projects v2 data to spreadsheet-friendly formats. 
+ +**Basic usage**: +```bash +# Export using organization name and project number +python github_project_exporter.py --org FilOzone --project-number 14 --output foc_ga_tracker.csv + +# Export using URL (easiest) +python github_project_exporter.py --url https://github.com/orgs/FilOzone/projects/14 --output foc_ga_tracker.csv + +# Export to TSV format (tab-separated, better for Excel/Google Sheets) +python github_project_exporter.py --url https://github.com/orgs/FilOzone/projects/14 --format tsv --output foc_ga_tracker.tsv + +# Export to JSON (full data structure) +python github_project_exporter.py --url https://github.com/orgs/FilOzone/projects/14 --format json --output foc_ga_tracker.json +``` + +**Options**: +- `--org`: Organization name (e.g., `FilOzone`); use `--user` instead for user-owned projects +- `--project-number`: Project number (e.g., `14`); required with `--org` or `--user` +- `--url`: Full GitHub Projects URL (easiest option) +- `--project-id`: Direct project node ID (if you already have it) +- `--output`, `-o`: Output file path (default: `project_export.csv`) +- `--format`: Output format: `csv`, `tsv`, or `json` (default: `csv`) +- `--max-items`: Maximum number of items to fetch (default: 1000) +- `--token`: GitHub token (or use `GITHUB_TOKEN` env var) + +### Exporting Specific Views + +GitHub Projects v2 URLs often include a view number (e.g., `/views/22`). The exporter ignores the view and fetches every item in the project (up to `--max-items`), not just the items visible in a specific view. Views are filters and groupings applied in the web interface; the API query used here returns all project items. + +If you need to filter items to match a view, you can: +1. Export all items to CSV/JSON +2. Apply filters manually or via script based on the view's filter criteria +3. 
Future versions of this tool may support view-specific filtering + +## Understanding the Export Format + +### CSV/TSV Columns + +The exported CSV/TSV includes: + +**Standard columns** (always present): +- `Type`: Item type (`Issue`, `PullRequest`, or `DraftIssue`; `Draft` if the item's content is unavailable) +- `Title`: Item title +- `Number`: Issue/PR number (empty for draft issues) +- `State`: State (`OPEN`, `CLOSED`, `MERGED`) +- `URL`: Link to the item +- `Assignees`: Comma-separated list of assignee usernames +- `Labels`: Comma-separated list of labels +- `Created At`: ISO 8601 timestamp +- `Updated At`: ISO 8601 timestamp + +**Dynamic columns** (based on your project's custom fields): +- Any custom fields you've added to the project (Status, Priority, Iteration, etc.) +- Field values are flattened into columns with the field name as the header + +### JSON Format + +The JSON export contains the project item nodes as returned by the GraphQL query, including: +- All queried field values, nested under each item's `fieldValues` +- Assignee logins and label names (up to 10 assignees and 20 labels per item) +- Item metadata such as number, state, URL, and timestamps + +Use JSON format if you need to process the data programmatically or need the nested structure. 
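The view-matching workflow described above can be scripted against the JSON export. The following is a minimal sketch, assuming the export is the list of item nodes written by `export_to_json`, each carrying a `fieldValues.nodes` list as selected in the script's GraphQL query. The helper names (`custom_fields`, `filter_items`, `load_export`) and the `Status` / `In Progress` values are illustrative assumptions, not part of the tool; substitute the field and value your view filters on.

```python
import json
from typing import Any, Dict, List


def load_export(path: str) -> List[Dict[str, Any]]:
    """Load a JSON export (assumed to be a list of project item nodes)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def custom_fields(item: Dict[str, Any]) -> Dict[str, Any]:
    """Map field name -> value for the text and single-select values of one item."""
    fields: Dict[str, Any] = {}
    nodes = (item.get("fieldValues") or {}).get("nodes") or []
    for value in nodes:
        if not value:
            continue  # empty node: a value type the query did not select
        name = (value.get("field") or {}).get("name")
        if not name:
            continue  # no field name available for this value
        if "text" in value:
            fields[name] = value["text"]
        elif "name" in value:
            fields[name] = value["name"]  # single-select option name
    return fields


def filter_items(items: List[Dict[str, Any]], field: str, wanted: str) -> List[Dict[str, Any]]:
    """Keep only the items whose custom field `field` equals `wanted`."""
    return [item for item in items if custom_fields(item).get(field) == wanted]
```

For example, `filter_items(load_export('foc_ga_tracker.json'), 'Status', 'In Progress')` would return the items whose `Status` is `In Progress`, assuming the project has such a field. Other value types (dates, iterations, users) would need their own branches.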
+ +## Examples + +### Weekly Report Export + +```bash +# Export current state of FOC GA Tracker +python github_project_exporter.py \ + --url https://github.com/orgs/FilOzone/projects/14 \ + --format tsv \ + --output foc_ga_tracker_$(date +%Y%m%d).tsv +``` + +### Processing with Python + +```python +import csv +from collections import Counter + +# Read exported CSV (written as UTF-8 by the exporter) +with open('foc_ga_tracker.csv', newline='', encoding='utf-8') as f: + reader = csv.DictReader(f) + items = list(reader) + +# Count items by status; the column is empty for items without a status +statuses = Counter(item.get('Status') or 'No Status' for item in items) +print("Items by Status:") +for status, count in statuses.most_common(): + print(f" {status}: {count}") +``` + +## Troubleshooting + +### "Project not found or not accessible" + +- Verify your GitHub token has the `read:project` scope +- Ensure you have access to the FilOzone organization +- Check that the project number is correct + +### "Rate limit exceeded" + +The GitHub GraphQL API has rate limits. The script pauses 0.5 seconds between pages. If you hit rate limits: +- Wait a few minutes and try again +- Check that other tools running under the same account are not consuming the quota +- Lower `--max-items` to fetch fewer pages per run + +### Missing fields in export + +- Custom fields must exist on the project for them to appear in the export +- Only text, single-select, number, date, iteration, milestone, repository, and user field values are exported; other field types are not supported yet +- Check the JSON export if you need to see the raw values returned by the API + +## Future Enhancements + +Potential additions to this toolkit: +- [ ] View-specific filtering (export only items matching a view's filters) +- [ ] Update project items via API +- [ ] Bulk operations (update status, assignees, etc.) 
+- [ ] Change history tracking +- [ ] Integration with other FilOz utilities + +## Related Tools + +- [`github_pr_report.py`](../github_pr_report.py) - Generate PR reports across repositories +- [`team_pr_report.py`](../team_pr_report.py) - Track team member PRs +- [`slack_search.py`](../slack_search.py) - Search Slack conversations + +## Resources + +- [GitHub Projects v2 API Documentation](https://docs.github.com/en/graphql/reference/objects#projectv2) +- [GitHub GraphQL Explorer](https://docs.github.com/en/graphql/overview/explorer) - Test queries interactively +- [GitHub Projects Documentation](https://docs.github.com/en/issues/planning-and-tracking-with-projects) diff --git a/foc-ga-tracker/github_project_exporter.py b/foc-ga-tracker/github_project_exporter.py new file mode 100755 index 0000000..7733f84 --- /dev/null +++ b/foc-ga-tracker/github_project_exporter.py @@ -0,0 +1,503 @@ +#!/usr/bin/env python3 +""" +GitHub Projects v2 Exporter + +This script queries GitHub Projects v2 (via the GraphQL API) and exports +project items to CSV, TSV, or JSON, similar to GitHub's built-in export. + +Usage: + python github_project_exporter.py --org FilOzone --project-number 14 + python github_project_exporter.py --project-id + python github_project_exporter.py --url https://github.com/orgs/FilOzone/projects/14 +""" + +import os +import sys +import csv +import json +import argparse +from typing import List, Dict, Any, Optional +import time + +try: + from gql import gql, Client + from gql.transport.requests import RequestsHTTPTransport +except ImportError: + print("Error: gql library required. 
Install with: pip install gql") + sys.exit(1) + + +class GitHubProjectExporter: + def __init__(self, token: str): + self.token = token + transport = RequestsHTTPTransport( + url='https://api.github.com/graphql', + headers={'Authorization': f'Bearer {token}'}, + use_json=True, + ) + self.client = Client(transport=transport, fetch_schema_from_transport=False) + + def get_project_id_from_url(self, url: str) -> Optional[str]: + """ + Extract project number from GitHub Projects URL and convert to node ID. + URL format: https://github.com/orgs/{org}/projects/{number} + """ + parts = url.rstrip('/').split('/') + try: + if 'orgs' in parts: + org_idx = parts.index('orgs') + org = parts[org_idx + 1] + project_num_idx = parts.index('projects') + project_number = int(parts[project_num_idx + 1]) + return self.get_project_node_id(org=org, project_number=project_number) + elif 'users' in parts: + user_idx = parts.index('users') + user = parts[user_idx + 1] + project_num_idx = parts.index('projects') + project_number = int(parts[project_num_idx + 1]) + return self.get_project_node_id(user=user, project_number=project_number) + except (ValueError, IndexError) as e: + print(f"Error parsing URL: {e}") + return None + + def get_project_node_id(self, org: Optional[str] = None, user: Optional[str] = None, project_number: int = None) -> Optional[str]: + """Convert organization/user + project number to GitHub node ID.""" + if not org and not user: + return None + + query_str = """ + query($owner: String!, $projectNumber: Int!) { + organization(login: $owner) { + projectV2(number: $projectNumber) { + id + } + } + } + """ + + if user: + query_str = """ + query($owner: String!, $projectNumber: Int!) 
{ + user(login: $owner) { + projectV2(number: $projectNumber) { + id + } + } + } + """ + + query = gql(query_str) + variables = { + "owner": org or user, + "projectNumber": project_number + } + + try: + result = self.client.execute(query, variable_values=variables) + if org: + project_id = result.get('organization', {}).get('projectV2', {}).get('id') + else: + project_id = result.get('user', {}).get('projectV2', {}).get('id') + + if not project_id: + print(f"Error: Project {project_number} not found for {'org' if org else 'user'} {org or user}") + return None + + return project_id + except Exception as e: + print(f"Error fetching project ID: {e}") + return None + + def get_project_items(self, project_id: str, max_items: int = 1000) -> List[Dict[str, Any]]: + """Fetch all items from a GitHub Project v2.""" + query = gql(""" + query($projectId: ID!, $first: Int!, $after: String) { + node(id: $projectId) { + ... on ProjectV2 { + title + items(first: $first, after: $after) { + pageInfo { + hasNextPage + endCursor + } + nodes { + id + content { + ... on DraftIssue { + title + body + type: __typename + } + ... on Issue { + id + title + number + state + url + assignees(first: 10) { + nodes { + login + } + } + labels(first: 20) { + nodes { + name + } + } + createdAt + updatedAt + type: __typename + } + ... on PullRequest { + id + title + number + state + url + assignees(first: 10) { + nodes { + login + } + } + labels(first: 20) { + nodes { + name + } + } + createdAt + updatedAt + type: __typename + } + } + fieldValues(first: 50) { + nodes { + ... on ProjectV2ItemFieldTextValue { + text + field { + ... on ProjectV2Field { + name + } + ... on ProjectV2IterationField { + name + } + ... on ProjectV2SingleSelectField { + name + } + } + } + ... on ProjectV2ItemFieldSingleSelectValue { + name + field { + ... on ProjectV2SingleSelectField { + name + } + } + } + ... on ProjectV2ItemFieldNumberValue { + number + field { + ... on ProjectV2Field { + name + } + } + } + ... 
on ProjectV2ItemFieldDateValue { + date + field { + ... on ProjectV2Field { + name + } + } + } + ... on ProjectV2ItemFieldIterationValue { + title + startDate + duration + field { + ... on ProjectV2IterationField { + name + } + } + } + ... on ProjectV2ItemFieldMilestoneValue { + title + field { + ... on ProjectV2Field { + name + } + } + } + ... on ProjectV2ItemFieldRepositoryValue { + nameWithOwner + field { + ... on ProjectV2Field { + name + } + } + } + ... on ProjectV2ItemFieldUserValue { + users(first: 10) { + nodes { + login + } + } + field { + ... on ProjectV2Field { + name + } + } + } + } + } + } + } + } + } + } + """) + + all_items = [] + cursor = None + page_size = 100 # GitHub allows up to 100 items per page + fetched = 0 + + print(f"Fetching project items...", end="", flush=True) + + while fetched < max_items: + variables = { + "projectId": project_id, + "first": min(page_size, max_items - fetched), + "after": cursor + } + + try: + result = self.client.execute(query, variable_values=variables) + project = result.get('node') + + if not project: + print(f"\nError: Project not found or not accessible") + break + + items_data = project.get('items', {}) + items = items_data.get('nodes', []) + page_info = items_data.get('pageInfo', {}) + + if not items: + break + + all_items.extend(items) + fetched += len(items) + + print(".", end="", flush=True) + + if not page_info.get('hasNextPage', False): + break + + cursor = page_info.get('endCursor') + + # Rate limiting - be nice to GitHub API + time.sleep(0.5) + + except Exception as e: + print(f"\nError fetching items: {e}") + if "rate limit" in str(e).lower(): + print("Rate limit exceeded. 
Please wait and try again.") + break + + print(f" Found {len(all_items)} items") + return all_items + + def flatten_item(self, item: Dict[str, Any]) -> Dict[str, Any]: + """Flatten a project item into a flat dictionary suitable for CSV export.""" + flattened = {} + + content = item.get('content', {}) + + # Basic content fields + if content: + flattened['Type'] = content.get('type', 'Unknown') + flattened['Title'] = content.get('title', '') + flattened['Number'] = content.get('number', '') + flattened['State'] = content.get('state', '') + flattened['URL'] = content.get('url', '') + flattened['Created At'] = content.get('createdAt', '') + flattened['Updated At'] = content.get('updatedAt', '') + + # Assignees + assignees = content.get('assignees', {}).get('nodes', []) + flattened['Assignees'] = ', '.join([a['login'] for a in assignees]) if assignees else '' + + # Labels + labels = content.get('labels', {}).get('nodes', []) + flattened['Labels'] = ', '.join([l['name'] for l in labels]) if labels else '' + else: + flattened['Type'] = 'Draft' + flattened['Title'] = '' + flattened['Number'] = '' + flattened['State'] = '' + flattened['URL'] = '' + flattened['Created At'] = '' + flattened['Updated At'] = '' + flattened['Assignees'] = '' + flattened['Labels'] = '' + + # Field values + field_values = item.get('fieldValues', {}).get('nodes', []) + for field_value in field_values: + field = field_value.get('field', {}) + field_name = field.get('name', 'Unknown Field') + + # Handle different field types + if 'text' in field_value: + flattened[field_name] = field_value['text'] + elif 'name' in field_value: + flattened[field_name] = field_value['name'] + elif 'number' in field_value: + flattened[field_name] = field_value['number'] + elif 'date' in field_value: + flattened[field_name] = field_value['date'] + elif 'title' in field_value: + # Iteration or Milestone + iteration = field_value.get('title', '') + if 'startDate' in field_value: + # It's an iteration + start = 
field_value.get('startDate', '') + flattened[field_name] = f"{iteration} ({start})" if start else iteration + else: + flattened[field_name] = iteration + elif 'nameWithOwner' in field_value: + flattened[field_name] = field_value['nameWithOwner'] + elif 'users' in field_value: + users = field_value['users'].get('nodes', []) + flattened[field_name] = ', '.join([u['login'] for u in users]) if users else '' + else: + flattened[field_name] = '' + + return flattened + + def export_to_csv(self, items: List[Dict[str, Any]], output_file: str, delimiter: str = ','): + """Export project items to CSV/TSV file.""" + if not items: + print("No items to export") + return + + flattened_items = [self.flatten_item(item) for item in items] + + # Get all unique field names + all_fields = set() + for item in flattened_items: + all_fields.update(item.keys()) + + # Sort fields for consistent output (standard fields first, then custom) + standard_fields = ['Type', 'Title', 'Number', 'State', 'URL', 'Assignees', 'Labels', 'Created At', 'Updated At'] + field_order = [f for f in standard_fields if f in all_fields] + field_order.extend(sorted([f for f in all_fields if f not in standard_fields])) + + # Write CSV + with open(output_file, 'w', newline='', encoding='utf-8') as f: + writer = csv.DictWriter(f, fieldnames=field_order, delimiter=delimiter, extrasaction='ignore') + writer.writeheader() + writer.writerows(flattened_items) + + print(f"Exported {len(items)} items to {output_file}") + + def export_to_json(self, items: List[Dict[str, Any]], output_file: str): + """Export project items to JSON file.""" + with open(output_file, 'w', encoding='utf-8') as f: + json.dump(items, f, indent=2, default=str) + + print(f"Exported {len(items)} items to {output_file}") + + +def main(): + parser = argparse.ArgumentParser( + description='Export GitHub Projects v2 data to CSV/TSV/JSON', + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + # Export using organization and project 
number + python github_project_exporter.py --org FilOzone --project-number 14 + + # Export using URL + python github_project_exporter.py --url https://github.com/orgs/FilOzone/projects/14 + + # Export using project node ID + python github_project_exporter.py --project-id PVT_kwDO... + + # Export to TSV format + python github_project_exporter.py --url --format tsv --output project.tsv + + # Export to JSON + python github_project_exporter.py --url --format json --output project.json + """ + ) + + # Project identification (mutually exclusive) + project_group = parser.add_mutually_exclusive_group(required=True) + project_group.add_argument('--org', help='Organization name') + project_group.add_argument('--user', help='User name (for user projects)') + project_group.add_argument('--url', help='GitHub Projects URL') + project_group.add_argument('--project-id', help='Project node ID (e.g., PVT_kwDO...)') + + parser.add_argument('--project-number', type=int, help='Project number (required with --org or --user)') + parser.add_argument('--token', help='GitHub personal access token (or set GITHUB_TOKEN env var)') + parser.add_argument('--output', '-o', default='project_export.csv', help='Output file (default: project_export.csv)') + parser.add_argument('--format', choices=['csv', 'tsv', 'json'], default='csv', help='Output format (default: csv)') + parser.add_argument('--max-items', type=int, default=1000, help='Maximum number of items to fetch (default: 1000)') + + args = parser.parse_args() + + # Validate arguments + if (args.org or args.user) and not args.project_number: + parser.error("--project-number is required when using --org or --user") + + # Get GitHub token + token = args.token or os.getenv('GITHUB_TOKEN') + if not token: + print("Error: GitHub token required. 
Set GITHUB_TOKEN environment variable or use --token flag.") + print("Token must have 'read:project' scope for organization projects.") + sys.exit(1) + + exporter = GitHubProjectExporter(token) + + # Get project ID + project_id = None + if args.project_id: + project_id = args.project_id + elif args.url: + project_id = exporter.get_project_id_from_url(args.url) + elif args.org: + project_id = exporter.get_project_node_id(org=args.org, project_number=args.project_number) + elif args.user: + project_id = exporter.get_project_node_id(user=args.user, project_number=args.project_number) + + if not project_id: + print("Error: Could not determine project ID") + sys.exit(1) + + print(f"Project ID: {project_id}") + + try: + # Fetch items + items = exporter.get_project_items(project_id, max_items=args.max_items) + + if not items: + print("No items found in project") + sys.exit(0) + + # Export + if args.format == 'json': + exporter.export_to_json(items, args.output) + else: + delimiter = '\t' if args.format == 'tsv' else ',' + exporter.export_to_csv(items, args.output, delimiter=delimiter) + + except Exception as e: + print(f"Error: {e}") + import traceback + traceback.print_exc() + sys.exit(1) + + +if __name__ == '__main__': + main() diff --git a/requirements.txt b/requirements.txt index 37912b8..5f42a38 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1 +1,2 @@ -requests>=2.31.0 \ No newline at end of file +requests>=2.31.0 +gql>=3.4.0 \ No newline at end of file
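The URL handling in `get_project_id_from_url` can be checked offline, without a token. The sketch below is a hypothetical standalone helper (`parse_project_url` is not part of the script) that uses a regular expression instead of the script's split-and-index approach, and only extracts the owner and project number; resolving them to a node ID still requires the GraphQL lookup. It assumes URLs of the form `https://github.com/orgs/{org}/projects/{number}` or the `users` equivalent, and ignores a trailing `/views/...` segment.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for illustration: owner kind, owner login, project number.
_PROJECT_URL = re.compile(r"github\.com/(orgs|users)/([^/]+)/projects/(\d+)")


def parse_project_url(url: str) -> Optional[Tuple[str, str, int]]:
    """Return (owner_kind, owner, project_number), or None if the URL does not match."""
    match = _PROJECT_URL.search(url)
    if match is None:
        return None
    kind, owner, number = match.groups()
    return kind, owner, int(number)
```

With the tracker URL used in the README, `parse_project_url("https://github.com/orgs/FilOzone/projects/14/views/22")` yields the `orgs` kind, the `FilOzone` owner, and project number 14, so the view suffix does not affect which project is exported.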