feat(dist-git): Construct dist-git with lock file history#121
feat(dist-git): Construct dist-git with lock file history#121Tonisal-byte wants to merge 2 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR changes synthetic dist-git history generation to derive synthetic commits from project lock file history (based on input-fingerprint changes) rather than scanning project commit messages for Affects: <component> trailers.
Changes:
- Add a
git.RunInDirhelper to executegit -C <dir> ...via the repo command factory. - Rework
synthistoryto discoverFingerprintChangeentries by reading lock files from project git history and to rebuild dist-git history with interleaved upstream/synthetic commits. - Update source preparation to pass a
CmdFactoryfor git-based fingerprint detection and adjust call sites accordingly.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/utils/git/git.go | Adds RunInDir helper for running git commands in a specific directory. |
| internal/app/azldev/core/sources/synthistory.go | Implements lock-file fingerprint change detection and interleaved dist-git history replay. |
| internal/app/azldev/core/sources/sourceprep.go | Switches synthetic history support to require a CmdFactory for git operations. |
| internal/app/azldev/core/sources/synthistory_test.go | Replaces Affects-trailer tests with tests for lock-path, interleaving, orphan handling, and metadata parsing. |
| internal/app/azldev/cmds/component/render.go | Updates WithGitRepo call to pass env as the command factory. |
| internal/app/azldev/cmds/component/preparesources.go | Updates WithGitRepo call to pass env as the command factory. |
| internal/app/azldev/cmds/component/build.go | Updates WithGitRepo call to pass env as the command factory. |
| // that touched the given file path in the repository at repoDir. Each commit's | ||
| // metadata is separated by a NUL byte in the git log output. | ||
| func gitLogFileMetadata( | ||
| ctx context.Context, cmdFactory opctx.CmdFactory, repoDir, filePath string, | ||
| ) ([]CommitMetadata, error) { | ||
| output, err := git.RunInDir(ctx, cmdFactory, repoDir, | ||
| "log", "--format=%H%n%an%n%ae%n%at%n%s%x00", "--follow", "--", filePath) |
There was a problem hiding this comment.
gitLogFileMetadata uses 'git log --follow -- ' (tracking renames), but gitShowLockFile always attempts to read ':'. If the file was renamed, the log can include commits where the lock file path differs, and gitShowLockFile will fail and be skipped. Either drop '--follow' (if renames aren't supported) or make the show step rename-aware so history is not partially lost.
| // that touched the given file path in the repository at repoDir. Each commit's | |
| // metadata is separated by a NUL byte in the git log output. | |
| func gitLogFileMetadata( | |
| ctx context.Context, cmdFactory opctx.CmdFactory, repoDir, filePath string, | |
| ) ([]CommitMetadata, error) { | |
| output, err := git.RunInDir(ctx, cmdFactory, repoDir, | |
| "log", "--format=%H%n%an%n%ae%n%at%n%s%x00", "--follow", "--", filePath) | |
| // that touched the given file path in the repository at repoDir, using the | |
| // path exactly as provided and without following renames. Each commit's | |
| // metadata is separated by a NUL byte in the git log output. | |
| func gitLogFileMetadata( | |
| ctx context.Context, cmdFactory opctx.CmdFactory, repoDir, filePath string, | |
| ) ([]CommitMetadata, error) { | |
| output, err := git.RunInDir(ctx, cmdFactory, repoDir, | |
| "log", "--format=%H%n%an%n%ae%n%at%n%s%x00", "--", filePath) |
| // WithGitRepo returns a [PreparerOption] that enables dist-git repository | ||
| // creation during source preparation. When set, the upstream .git directory | ||
| // is preserved and synthetic commit history is generated on top of it. This | ||
| // requires the project configuration to reside inside a git repository. | ||
| // Without this option, no dist-git is created and synthetic history is skipped. | ||
| // | ||
| // The defaultAuthorEmail is used for synthetic changelog entries and commits | ||
| // when no author email is available from git history. | ||
| func WithGitRepo(defaultAuthorEmail string) PreparerOption { | ||
| // The cmdFactory is used to shell out to git for fingerprint change detection. | ||
| func WithGitRepo( | ||
| cmdFactory opctx.CmdFactory, | ||
| ) PreparerOption { | ||
| return func(p *sourcePreparerImpl) { | ||
| p.withGitRepo = true | ||
| p.defaultAuthorEmail = defaultAuthorEmail | ||
| p.cmdFactory = cmdFactory | ||
| } | ||
| } |
There was a problem hiding this comment.
WithGitRepo enables synthetic history generation but does not validate that cmdFactory is non-nil. If a caller accidentally passes nil, later code will panic when attempting to run git commands. Consider validating this in WithGitRepo (e.g., ignore the option) or in NewPreparer (return an error if withGitRepo is set but cmdFactory is nil).
| // RunInDir executes a git command in the given directory and returns its | ||
| // trimmed stdout output. The dir argument is passed via 'git -C dir'. | ||
| func RunInDir( | ||
| ctx context.Context, cmdFactory opctx.CmdFactory, dir string, args ...string, | ||
| ) (string, error) { | ||
| var stderr bytes.Buffer | ||
|
|
||
| fullArgs := make([]string, 0, len(args)+2) | ||
| fullArgs = append(fullArgs, "-C", dir) | ||
| fullArgs = append(fullArgs, args...) | ||
|
|
||
| rawCmd := exec.CommandContext(ctx, "git", fullArgs...) | ||
| rawCmd.Stderr = &stderr | ||
|
|
||
| cmd, err := cmdFactory.Command(rawCmd) | ||
| if err != nil { | ||
| return "", fmt.Errorf("failed to create git command:\n%w", err) | ||
| } | ||
|
|
||
| output, err := cmd.RunAndGetOutput(ctx) | ||
| if err != nil { | ||
| return "", fmt.Errorf("git %s failed:\n%v\n%w", args[0], stderr.String(), err) | ||
| } |
There was a problem hiding this comment.
RunInDir formats the failure message using args[0], which will panic if called with no args (even though current call sites pass at least one). Please add an explicit guard for len(args)==0 and produce a clearer error (e.g., include the full arg list) rather than indexing blindly.
| // LockFilePath returns the relative path to a component's lock file within the | ||
| // project repository. The path follows the same letter-prefix convention used by | ||
| // [components.RenderedSpecDir]: specs/<letter>/<name>/<name>.lock. | ||
| func LockFilePath(componentName string) string { | ||
| prefix := strings.ToLower(componentName[:1]) | ||
|
|
||
| return filepath.Join("specs", prefix, componentName, componentName+".lock") | ||
| } |
There was a problem hiding this comment.
LockFilePath slices componentName[:1] and uses componentName directly in a path without any validation. This can panic on empty component names and also allows unsafe names (path separators/traversal) to influence the git path. Consider validating with fileutils.ValidateFilename (similar to components.RenderedSpecDir) and returning an error if invalid.
| func CommitInterleavedHistory( | ||
| repo *gogit.Repository, | ||
| commits []CommitMetadata, | ||
| changes []FingerprintChange, | ||
| importCommit string, | ||
| ) error { | ||
| // The latest fingerprint change's UpstreamCommit is the commit we're | ||
| // pinned to — use it as the upper bound for the upstream walk instead | ||
| // of HEAD, which may be ahead (e.g., at the branch tip). | ||
| upstreamCommit := changes[len(changes)-1].UpstreamCommit | ||
|
|
||
| // Collect upstream commits BEFORE staging, so the temporary commit | ||
| // created by stageAndCaptureOverlayTree is not included. | ||
| upstreamCommits, err := collectUpstreamCommits(repo, importCommit, upstreamCommit) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
CommitInterleavedHistory indexes changes[len(changes)-1] without checking len(changes)>0. Since this is an exported helper, it should defensively return an error on empty input rather than panicking (callers may not always pre-check).
| // stageAndCaptureOverlayTree stages all working tree changes and creates a | ||
| // temporary commit to capture the resulting tree hash. The tree hash is used | ||
| // later to set the content of the final synthetic commit. | ||
| func stageAndCaptureOverlayTree(repo *gogit.Repository) (plumbing.Hash, error) { | ||
| worktree, err := repo.Worktree() | ||
| if err != nil { | ||
| return fmt.Errorf("failed to get worktree:\n%w", err) | ||
| return plumbing.ZeroHash, fmt.Errorf("failed to get worktree:\n%w", err) | ||
| } | ||
|
|
||
| // Stage all working tree changes once — overlays have already been applied. | ||
| if err := worktree.AddWithOptions(&gogit.AddOptions{All: true}); err != nil { | ||
| return fmt.Errorf("failed to stage changes:\n%w", err) | ||
| return plumbing.ZeroHash, fmt.Errorf("failed to stage changes:\n%w", err) | ||
| } | ||
|
|
||
| tempHash, err := worktree.Commit("temp: capture overlay tree", &gogit.CommitOptions{ | ||
| AllowEmptyCommits: true, | ||
| Author: &object.Signature{Name: "azldev", When: time.Unix(0, 0).UTC()}, | ||
| }) | ||
| if err != nil { | ||
| return plumbing.ZeroHash, fmt.Errorf("failed to create temporary commit:\n%w", err) | ||
| } | ||
|
|
||
| tempCommit, err := repo.CommitObject(tempHash) | ||
| if err != nil { | ||
| return plumbing.ZeroHash, fmt.Errorf("failed to read temporary commit:\n%w", err) | ||
| } | ||
|
|
||
| return tempCommit.TreeHash, nil |
There was a problem hiding this comment.
stageAndCaptureOverlayTree creates a temporary commit, which advances HEAD. If any later step fails (e.g., replayInterleavedHistory returns an error), the repo will be left pointing at this temp commit and the caller gets a failure with a mutated repo state. Please preserve the original HEAD ref and restore it on error (or avoid creating a temp commit by deriving the tree hash directly).
| if entry.upstreamCommit != nil { | ||
| hash, err := createCommitObject(repo, | ||
| entry.upstreamCommit.TreeHash, lastHash, | ||
| entry.upstreamCommit.Author, entry.upstreamCommit.Committer, | ||
| entry.upstreamCommit.Message) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to replay upstream commit:\n%w", err) | ||
| } | ||
|
|
||
| lastHash = hash | ||
| lastTreeHash = entry.upstreamCommit.TreeHash | ||
|
|
||
| continue | ||
| } |
There was a problem hiding this comment.
replayInterleavedHistory rewrites upstream commits by chaining them with a single parent (lastHash). This implicitly assumes upstream history is strictly linear; merge commits (multiple parents) would be flattened/incorrectly replayed. Please either explicitly detect and error on merges, or extend the replay logic to preserve multiple parents.
| for _, meta := range metas { | ||
| fields, err := gitShowLockFile(ctx, cmdFactory, projectRepoDir, meta.Hash, lockFileRelPath) | ||
| if err != nil { | ||
| slog.Warn("Failed to read lock file at commit; skipping", | ||
| "commit", meta.Hash, "error", err) | ||
|
|
||
| continue | ||
| } |
There was a problem hiding this comment.
FindFingerprintChanges currently logs and skips any error reading/parsing the lock file at a commit. This can silently drop real failures (e.g., corrupt TOML, unexpected git errors) and produce an incomplete synthetic history. Consider only skipping the specific "file not present at that commit" case and returning an error for parse/command failures.
Replaces the Affects: commit trailer approach for synthetic dist-git history with lock file fingerprint change detection. Instead of scanning commit messages for
Affects: <component>lines, synthetic commits are now derived from changes to a component'sinput-fingerprintfield in its lock file (specs///.lock).