feat(graph): add subset command to extract package-related subgraphs #915
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,123 @@ | ||
| Extracting Graph Subsets | ||
| ======================== | ||
|
|
||
| The ``fromager graph subset`` command extracts a focused subgraph containing only the dependencies and dependents of a specific package. This is useful for understanding the impact scope of a particular package, debugging specific dependency issues, or creating smaller, more manageable graphs for analysis. | ||
|
|
||
| Basic Usage | ||
| ----------- | ||
|
|
||
| To extract a subset graph for a specific package: | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| fromager graph subset <graph-file> <package-name> | ||
|
|
||
| Example | ||
| ------- | ||
|
|
||
| Using the example graph file from the e2e test, let's extract a subset for the ``keyring`` package: | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| fromager graph subset e2e/build-parallel/graph.json keyring | ||
|
|
||
| This command will output a JSON graph containing: | ||
|
|
||
| - The ``keyring`` package itself | ||
| - All packages that depend on ``keyring`` (dependents) | ||
| - All packages that ``keyring`` depends on (dependencies) | ||
| - The ROOT node if ``keyring`` is a top-level dependency | ||
|
|
||
| The resulting subset will include packages like: | ||
|
|
||
| - ``keyring==25.6.0`` (the target package) | ||
| - ``imapautofiler==1.14.0`` (depends on keyring) | ||
| - ``jaraco-classes==3.4.0`` (keyring dependency) | ||
| - ``jaraco-context==6.0.1`` (keyring dependency) | ||
| - ``jaraco-functools==4.1.0`` (keyring dependency) | ||
| - And their transitive dependencies | ||
|
|
||
| Version Filtering | ||
| ----------------- | ||
|
|
||
| You can limit the subset to a specific version of the target package using the ``--version`` flag: | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| fromager graph subset e2e/build-parallel/graph.json setuptools --version 80.8.0 | ||
|
|
||
| This is particularly useful when dealing with packages that have multiple versions in the graph, allowing you to focus on the relationships of a specific version. | ||
|
|
||
| File Output | ||
| ----------- | ||
|
|
||
| Save the subset graph to a file instead of printing to stdout: | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| fromager graph subset e2e/build-parallel/graph.json jinja2 -o jinja2-subset.json | ||
|
|
||
| The output file will be in the same JSON format as the original graph file and can be used as input to other ``fromager graph`` commands. | ||
|
|
||
| Use Cases | ||
| --------- | ||
|
|
||
| **Debugging Dependency Issues** | ||
| When a specific package is causing build problems, extract its subset to focus on just the relevant dependencies without the noise of the full graph. | ||
|
|
||
| **Impact Analysis** | ||
| Before upgrading or removing a package, understand what other packages would be affected by examining its dependents. | ||
|
|
||
| **Creating Focused Build Graphs** | ||
| Generate smaller graphs for specific components of your application, making it easier to understand and manage complex dependency trees. | ||
|
|
||
| **Documentation and Communication** | ||
| Create focused dependency diagrams for specific packages when documenting or explaining system architecture to team members. | ||
|
|
||
| **Performance Optimization** | ||
| When working with very large dependency graphs, extract subsets to improve performance of analysis tools and reduce memory usage. | ||
|
|
||
| Example Workflow | ||
| ---------------- | ||
|
|
||
| Here's a typical workflow for investigating a package's dependencies: | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| # Extract subset for a problematic package | ||
| fromager graph subset my-project-graph.json problematic-package -o debug-subset.json | ||
|
|
||
| # Visualize the subset | ||
| fromager graph to-dot debug-subset.json -o debug-subset.dot | ||
| dot -Tpng debug-subset.dot -o debug-subset.png | ||
|
|
||
| # Analyze why specific dependencies appear | ||
| fromager graph why debug-subset.json some-unexpected-dependency | ||
|
|
||
| This workflow helps you quickly isolate and understand issues within a complex dependency tree. | ||
|
|
||
| Output Format | ||
| ------------- | ||
|
|
||
| The subset command preserves the original graph structure and format. The output is a valid dependency graph that: | ||
|
|
||
| - Maintains all edge relationships between included nodes | ||
| - Preserves requirement specifications and constraint information | ||
| - Can be used as input to other graph commands | ||
| - Is compatible with existing fromager workflows | ||
|
|
||
| Error Handling | ||
| -------------- | ||
|
|
||
| The command will report an error if: | ||
|
|
||
| - The specified package is not found in the graph | ||
| - The specified version of a package is not found | ||
| - The graph file is invalid or corrupted | ||
|
|
||
| Example error output: | ||
|
|
||
| .. code-block:: console | ||
|
|
||
| $ fromager graph subset e2e/build-parallel/graph.json nonexistent-package | ||
| Error: Package nonexistent-package not found in graph |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -459,6 +459,190 @@ def why( | |
| find_why(graph, node, depth, 0, requirement_type) | ||
|
|
||
|
|
||
| @graph.command() | ||
| @click.option( | ||
| "-o", | ||
| "--output", | ||
| type=clickext.ClickPath(), | ||
| help="Output file path for the subset graph", | ||
| ) | ||
| @click.option( | ||
| "--version", | ||
| type=clickext.PackageVersion(), | ||
| help="Limit subset to specific version of the package", | ||
| ) | ||
| @click.argument( | ||
| "graph-file", | ||
| type=str, | ||
| ) | ||
| @click.argument("package-name", type=str) | ||
| @click.pass_obj | ||
| def subset( | ||
| wkctx: context.WorkContext, | ||
| graph_file: str, | ||
| package_name: str, | ||
| output: pathlib.Path | None, | ||
| version: Version | None, | ||
| ) -> None: | ||
| """Extract a subset of a build graph related to a specific package. | ||
|
|
||
| Creates a new graph containing only nodes that depend on the specified package | ||
| and the dependencies of that package. By default includes all versions of the | ||
| package, but can be limited to a specific version with --version. | ||
| """ | ||
| try: | ||
| graph = DependencyGraph.from_file(graph_file) | ||
| subset_graph = extract_package_subset(graph, package_name, version) | ||
|
|
||
| if output: | ||
| with open(output, "w") as f: | ||
| subset_graph.serialize(f) | ||
| else: | ||
| subset_graph.serialize(sys.stdout) | ||
| except ValueError as e: | ||
| raise click.ClickException(str(e)) from e | ||
|
|
||
|
|
||
| def extract_package_subset( | ||
| graph: DependencyGraph, | ||
| package_name: str, | ||
| version: Version | None = None, | ||
| ) -> DependencyGraph: | ||
| """Extract a subset of the graph containing nodes related to a specific package. | ||
|
|
||
| Creates a new graph containing: | ||
| - All nodes matching the package name (optionally filtered by version) | ||
| - All nodes that depend on the target package (dependents) | ||
| - All dependencies of the target package | ||
|
|
||
| Args: | ||
| graph: The source dependency graph | ||
| package_name: Name of the package to extract subset for | ||
| version: Optional version to filter target nodes | ||
|
|
||
| Returns: | ||
| A new DependencyGraph containing only the related nodes | ||
|
|
||
| Raises: | ||
| ValueError: If package not found in graph | ||
| """ | ||
| # Find target nodes matching the package name | ||
| target_nodes = graph.get_nodes_by_name(package_name) | ||
| if version: | ||
| target_nodes = [node for node in target_nodes if node.version == version] | ||
|
|
||
| if not target_nodes: | ||
| version_msg = f" version {version}" if version else "" | ||
| raise ValueError(f"Package {package_name}{version_msg} not found in graph") | ||
|
|
||
| # Collect all related nodes | ||
| related_nodes: set[str] = set() | ||
|
|
||
| # Add target nodes | ||
| for node in target_nodes: | ||
| related_nodes.add(node.key) | ||
|
|
||
| # Traverse up to find dependents (what depends on our package) | ||
| visited_up: set[str] = set() | ||
| for target_node in target_nodes: | ||
| _collect_dependents(target_node, related_nodes, visited_up) | ||
|
|
||
| # Traverse down to find dependencies (what our package depends on) | ||
| visited_down: set[str] = set() | ||
| for target_node in target_nodes: | ||
| _collect_dependencies(target_node, related_nodes, visited_down) | ||
|
|
||
| # Always include ROOT if any target nodes are top-level dependencies | ||
| for target_node in target_nodes: | ||
| for parent_edge in target_node.parents: | ||
| if parent_edge.destination_node.key == ROOT: | ||
| related_nodes.add(ROOT) | ||
| break | ||
|
Member
The code at lines 555 to 560 is most likely dead code. Here is the analysis from an AI agent: By line 555, _collect_dependents (line 548) has already run. That function walks every ancestor of the target node all the way up to ROOT. ROOT is _collect_dependents(pyyaml==6.0.2): So by the time execution reaches line 555, ROOT is already in related_nodes. The add() on line 559 is a no-op on a set that already contains the value.
Member
Author
OK, I'll take a look through and see, thanks for catching that. |
||
|
|
||
| # Create new graph with only related nodes | ||
| subset_graph = DependencyGraph() | ||
| _build_subset_graph(graph, subset_graph, related_nodes) | ||
|
|
||
| return subset_graph | ||
|
|
||
|
|
||
| def _collect_dependents( | ||
| node: DependencyNode, | ||
| related_nodes: set[str], | ||
| visited: set[str], | ||
| ) -> None: | ||
| """Recursively collect all nodes that depend on the given node.""" | ||
| if node.key in visited: | ||
| return | ||
| visited.add(node.key) | ||
|
|
||
| for parent_edge in node.parents: | ||
| parent_node = parent_edge.destination_node | ||
| related_nodes.add(parent_node.key) | ||
| _collect_dependents(parent_node, related_nodes, visited) | ||
|
|
||
|
|
||
| def _collect_dependencies( | ||
| node: DependencyNode, | ||
| related_nodes: set[str], | ||
| visited: set[str], | ||
| ) -> None: | ||
| """Recursively collect all dependencies of the given node.""" | ||
| if node.key in visited: | ||
| return | ||
| visited.add(node.key) | ||
|
|
||
| for child_edge in node.children: | ||
| child_node = child_edge.destination_node | ||
| related_nodes.add(child_node.key) | ||
| _collect_dependencies(child_node, related_nodes, visited) | ||
|
|
||
|
|
||
| def _build_subset_graph( | ||
| source_graph: DependencyGraph, | ||
| target_graph: DependencyGraph, | ||
| included_nodes: set[str], | ||
| ) -> None: | ||
| """Build the subset graph with only the included nodes and their edges.""" | ||
| # First pass: add all included nodes | ||
| for node_key in included_nodes: | ||
| source_node = source_graph.nodes[node_key] | ||
| if node_key == ROOT: | ||
| continue # ROOT is already created in the new graph | ||
|
|
||
| # Add the node to target graph | ||
| target_graph._add_node( | ||
| req_name=source_node.canonicalized_name, | ||
| version=source_node.version, | ||
| download_url=source_node.download_url, | ||
| pre_built=source_node.pre_built, | ||
| constraint=source_node.constraint, | ||
| ) | ||
|
|
||
| # Second pass: add edges between included nodes | ||
| for node_key in included_nodes: | ||
| source_node = source_graph.nodes[node_key] | ||
| for child_edge in source_node.children: | ||
| child_key = child_edge.destination_node.key | ||
| # Only add edge if both parent and child are in the subset | ||
| if child_key in included_nodes: | ||
| child_node = child_edge.destination_node | ||
| target_graph.add_dependency( | ||
| parent_name=source_node.canonicalized_name | ||
| if source_node.canonicalized_name | ||
| else None, | ||
| parent_version=source_node.version | ||
| if source_node.canonicalized_name | ||
| else None, | ||
| req_type=child_edge.req_type, | ||
| req=child_edge.req, | ||
| req_version=child_node.version, | ||
| download_url=child_node.download_url, | ||
| pre_built=child_node.pre_built, | ||
| constraint=child_node.constraint, | ||
| ) | ||
|
|
||
|
|
||
| def find_why( | ||
| graph: DependencyGraph, | ||
| node: DependencyNode, | ||
|
|
||
Can we please add documentation for this new command?
The documentation for the CLI reference section is generated automatically, but I can add a how-to.
I added some additional docs.
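The inline review comment about the ROOT check can be illustrated with a toy model. This is only a sketch, not fromager's implementation: it uses plain dictionaries instead of `DependencyGraph`, an iterative walk instead of the recursive helpers, and a hypothetical `reachable` function. The `ROOT` value, the `ROOT -> imapautofiler` edge, and the `unrelated==1.0` node are assumptions made for the example. It also assumes every node is reachable from ROOT, which is what makes the upward walk end there.

```python
from collections import defaultdict

ROOT = ""  # assumption: stand-in key for the root node in this toy model

# (parent, child) pairs; the ROOT edges and "unrelated==1.0" are hypothetical
edges = [
    (ROOT, "imapautofiler==1.14.0"),
    ("imapautofiler==1.14.0", "keyring==25.6.0"),
    ("keyring==25.6.0", "jaraco-classes==3.4.0"),
    ("keyring==25.6.0", "jaraco-context==6.0.1"),
    (ROOT, "unrelated==1.0"),
]

# Index the edges in both directions so we can walk up and down
parents = defaultdict(set)
children = defaultdict(set)
for parent, child in edges:
    parents[child].add(parent)
    children[parent].add(child)


def reachable(start, neighbors):
    """Return every key reachable from start by following neighbors."""
    seen = set()
    stack = [start]
    while stack:
        key = stack.pop()
        for nxt in neighbors.get(key, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


target = "keyring==25.6.0"
dependents = reachable(target, parents)     # walk up toward ROOT
dependencies = reachable(target, children)  # walk down to the leaves
related = {target} | dependents | dependencies

print(sorted(related))
```

In this model the upward walk already collects `ROOT`, so a later explicit `related.add(ROOT)` would not change the set, which matches the reviewer's reasoning. Whether that holds for every real graph depends on the reachability assumption above.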