Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions NEXT_CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@

### CLI
* Show a once-per-day notice after a command when a newer CLI release is available, with a link to the release and the upgrade command for the detected install method. Suppressed for non-interactive/CI runs, JSON output, the Databricks Runtime, and development builds, and can be disabled with `DATABRICKS_CLI_DISABLE_UPDATE_CHECK` ([#5470](https://github.com/databricks/cli/pull/5470)).
* `databricks labs list` now only shows projects that can be installed (those shipping a `labs.yml` manifest), and `databricks labs install` explains when a project does not provide one instead of failing with a generic "not found" error ([#5559](https://github.com/databricks/cli/pull/5559), [#5560](https://github.com/databricks/cli/pull/5560)).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bullet also promises the improved labs install error, which is #5559, still open. If this PR merges and a release cuts before #5559 lands, the changelog overpromises. Suggest trimming the bullet to the labs list half and letting #5559 carry its own line when it merges (or coordinating the merge order).


### Bundles
* Remove API enum values and types that are still in development from the `databricks-bundles` Python package; these were never accepted by the backend ([#5484](https://github.com/databricks/cli/pull/5484)).
Expand Down
75 changes: 66 additions & 9 deletions cmd/labs/list.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,20 @@ package labs

import (
"context"
"errors"
"time"

"github.com/databricks/cli/cmd/labs/github"
"github.com/databricks/cli/cmd/labs/localcache"
"github.com/databricks/cli/cmd/labs/project"
"github.com/databricks/cli/libs/cmdio"
"github.com/spf13/cobra"
"golang.org/x/sync/errgroup"
)

const (
labsOrg = "databrickslabs"
installableCacheTTL = 24 * time.Hour
)

type labsMeta struct {
Expand All @@ -20,14 +29,68 @@ func allRepos(ctx context.Context) (github.Repositories, error) {
if err != nil {
return nil, err
}
cache := github.NewRepositoryCache("databrickslabs", cacheDir)
cache := github.NewRepositoryCache(labsOrg, cacheDir)
return cache.Load(ctx)
}

// installableRepos returns the org repositories that `databricks labs install` can
// actually install. Most repositories in the org don't ship a labs.yml manifest
// (e.g. libraries published to package indexes), so listing them would only
// advertise projects that fail to install.
func installableRepos(ctx context.Context) (github.Repositories, error) {
cacheDir, err := project.PathInLabs(ctx)
if err != nil {
return nil, err
}
cache := localcache.NewLocalCache[github.Repositories](cacheDir, labsOrg+"-installable-repositories", installableCacheTTL)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

labs clear-cache (cmd/labs/clear_cache.go:22) only removes databrickslabs-repositories.json and the per-project caches, so this new cache file survives the command whose help says it clears "everywhere relevant". The moment a user reaches for clear-cache under this feature is when a project just added a labs.yml and they want it to show up, and this is exactly the file that needs purging.

clear_cache.go is in the same package: suggest extracting the cache name (labsOrg + "-installable-repositories") into a const next to installableCacheTTL, reusing it in both places, and adding a second os.Remove. There is no clear-cache test today; worth adding one that asserts both org-level cache files are gone.

return cache.Load(ctx, func() (github.Repositories, error) {
repos, err := allRepos(ctx)
if err != nil {
return nil, err
}
return filterInstallable(ctx, repos)
})
Comment on lines +46 to +52

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An offline cold start caches an empty list for 24 hours:

  1. With no repos cache on disk and the network down, the inner repos cache hits refreshCache, the fetch fails with a *url.Error, and the offline branch (localcache/jsonfile.go:59-62) returns the zero value with no error and no cache write. For github.Repositories that zero value is nil.
  2. filterInstallable(ctx, nil) iterates zero repos and returns (nil, nil), a success.
  3. This outer cache sees a successful refresh and writes the empty list with a fresh timestamp (jsonfile.go:66).

Result: labs list renders an empty table for the next 24h even after connectivity returns. Pre-PR, the same scenario rendered an empty table once and cached nothing. The clear-cache gap flagged in the other comment compounds this, since the documented remedy will not purge the file.

Suggested fix: hoist allRepos out of the cache.Load closure and return early when it comes back empty, so nothing is written. That preserves the current offline UX (empty table, exit 0); the cost is consulting the repos cache freshness on every list run, which is a disk read while it is fresh. Returning an error from the closure when len(repos) == 0 also works but turns the offline cold start into a failure. Either way, a test with no caches on disk plus an unreachable server, asserting that no databrickslabs-installable-repositories.json is created, would lock the behavior in.

}

// filterInstallable keeps repositories that have a root labs.yml manifest on their
// default branch. The manifest is fetched from raw.githubusercontent.com, which is
// not subject to the low unauthenticated GitHub API rate limit.
func filterInstallable(ctx context.Context, repos github.Repositories) (github.Repositories, error) {
installable := make([]bool, len(repos))
g, gctx := errgroup.WithContext(ctx)
g.SetLimit(10)
for i, repo := range repos {
if repo.IsArchived || repo.IsFork {
continue
}
g.Go(func() error {
_, err := github.ReadFileFromRef(gctx, labsOrg, repo.Name, repo.DefaultBranch, "labs.yml")
if errors.Is(err, github.ErrNotFound) {
return nil
}
if err != nil {
return err
}
installable[i] = true
return nil
})
}
if err := g.Wait(); err != nil {
return nil, err
}
var out github.Repositories
for i, repo := range repos {
if installable[i] {
out = append(out, repo)
}
}
return out, nil
}

func newListCommand() *cobra.Command {
return &cobra.Command{
Use: "list",
Short: "List all labs",
Short: "List labs that can be installed",
Annotations: map[string]string{
"template": cmdio.Heredoc(`
Name Description
Expand All @@ -37,18 +100,12 @@ func newListCommand() *cobra.Command {
},
RunE: func(cmd *cobra.Command, args []string) error {
ctx := cmd.Context()
repositories, err := allRepos(ctx)
repositories, err := installableRepos(ctx)
if err != nil {
return err
}
var info []labsMeta
for _, v := range repositories {
if v.IsArchived {
continue
}
if v.IsFork {
continue
}
description := v.Description
if len(description) > 50 {
description = description[:50] + "..."
Expand Down
39 changes: 39 additions & 0 deletions cmd/labs/list_test.go
Original file line number Diff line number Diff line change
@@ -1,10 +1,14 @@
package labs_test

import (
"net/http"
"net/http/httptest"
"testing"

"github.com/databricks/cli/cmd/labs/github"
"github.com/databricks/cli/internal/testcli"
"github.com/databricks/cli/libs/env"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)

Expand All @@ -15,4 +19,39 @@ func TestListingWorks(t *testing.T) {
stdout, _, err := c.Run()
require.NoError(t, err)
require.Contains(t, stdout.String(), "ucx")
// blueprint is in the repositories cache fixture but not in the
// installable-repositories cache fixture, proving the latter is rendered.
require.NotContains(t, stdout.String(), "blueprint")
}

func TestListingFiltersReposWithoutLabsYml(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
switch r.URL.Path {
case "/users/databrickslabs/repos":
_, err := w.Write([]byte(`[
{"name": "ucx", "description": "Unity Catalog Migrations", "default_branch": "main"},
{"name": "brickster", "description": "R interface to Databricks", "default_branch": "main"}
]`))
assert.NoError(t, err)
case "/databrickslabs/ucx/main/labs.yml":
_, err := w.Write([]byte("name: ucx"))
assert.NoError(t, err)
case "/databrickslabs/brickster/main/labs.yml":
w.WriteHeader(http.StatusNotFound)
default:
t.Logf("Requested: %s", r.URL.Path)
t.FailNow()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: t.FailNow() from the handler goroutine is technically illegal (the testing docs require it on the goroutine running the test; here it Goexits the connection goroutine). It does mark the test failed in practice, and installer_test.go uses the same catch-all idiom, so fine to leave for consistency. If you touch it anyway, t.Errorf plus http.Error(w, ..., http.StatusInternalServerError) is the correct shape.

}
}))
defer server.Close()
ctx := t.Context()
ctx = github.WithApiOverride(ctx, server.URL)
ctx = github.WithUserContentOverride(ctx, server.URL)
ctx = env.WithUserHomeDir(ctx, t.TempDir())

c := testcli.NewRunner(t, ctx, "labs", "list")
stdout, _, err := c.Run()
require.NoError(t, err)
require.Contains(t, stdout.String(), "ucx")
require.NotContains(t, stdout.String(), "brickster")
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
{
"refreshed_at": "2033-01-01T00:00:00.92857+02:00",
"data": [
{
"name": "ucx",
"description": "Unity Catalog Migrations",
"language": "Python",
"default_branch": "main",
"stargazers_count": 100500,
"fork": false,
"archived": false,
"topics": [],
"html_url": "https://github.com/databrickslabs/ucx",
"clone_url": "https://github.com/databrickslabs/ucx.git",
"ssh_url": "git@github.com:databrickslabs/ucx.git",
"license": {
"name": "Other"
}
}
]
}
Loading