Skip to content

feat: add init and upload in kernel-builder cli#378

Draft
drbh wants to merge 15 commits intomainfrom
kernel-builder-init-command
Draft

feat: add init and upload in kernel-builder cli#378
drbh wants to merge 15 commits intomainfrom
kernel-builder-init-command

Conversation

@drbh
Copy link
Collaborator

@drbh drbh commented Mar 18, 2026

This PR is a WIP to add the init command to the kernel-builder rust cli. This branch mainly ports the init command from the python cli, with some improvements to the interface and updates to the template

This PR adds an the init and upload command to the rust kernel-builder cli similar to implementation in the current python cli. It also adds a build command that is a facade for the nix run so devs can interact with a single tool for the full creation lifecycle of a kernel.

Mainly it aims to have better defaults to make the commands easier to use

  1. the init command only requires the name of the kernels (folder) and the owner is pulled via a whoami request if the user is already logged into hf
  2. the init command creates a new dir if a name is specified, if called within a dir, the cli will use the dir name as the kernel name
  3. the upload command does not require a repo id if the id can be read from the build
  4. the upload command does not require a path to the build folder and will default to looking for the build dir if no explicit path is specified

example usage

init from inside of a existing dir

mkdir some-kernel
cd some-kernel
kernel-builder init
# Downloading template from kernels-community/template...
# Initialized `drbh/some-kernelt` at /home/drbh/Projects/kernels/kernel-builder/some-kernel

init from outside of a dir

kernel-builder init some-kernel
# Downloading template from kernels-community/template...
# Initialized `drbh/some-kernelt` at /home/drbh/Projects/kernels/kernel-builder/some-kernel

build

kernel-builder build
# ...
# some-kernel-torch-ext> no Makefile or custom installCheckPhase, doing nothing
# some-kernel-torch-ext> Checking of ABI compatibility
# some-kernel-torch-ext> 🐍 Checking for compatibility with manylinux_2_28 and Python ABI version 3.9
# some-kernel-torch-ext> ✅ No compatibility issues found
# some-kernel-torch-ext> Checking loading kernel with get_kernel
# some-kernel-torch-ext> Check whether the kernel can be loaded with get-kernel: some_kernel
# some-kernel-torch-ext> Running phase: removeBytecodeHook
# some-kernel-torch-ext> Removing Python bytecode

upload

kernel-builder upload
# Found 7 build variant(s) in /home/drbh/Projects/kernels/kernel-builder/some-kernel/build
# Using branch `v1` (new)
# Uploading 36 operations...
# Kernel upload successful. Find the kernel at: https://hf.co/drbh/some-kernel

@drbh drbh changed the title feat: support init in builder cli feat: add init and upload in kernel-builder cli Mar 19, 2026
@drbh drbh force-pushed the kernel-builder-init-command branch from 534af0b to 368adad Compare March 19, 2026 15:25
@drbh drbh marked this pull request as ready for review March 19, 2026 16:42
@sayakpaul
Copy link
Member

sayakpaul commented Mar 20, 2026

Some high-level thoughts first:

the init command only requires the name of the kernels (folder) and the owner is pulled via a whoami request if the user is already logged into hf

Should we somehow inform the users about it and take confirmation before proceeding?

# Kernel upload successful. Find the kernel at: https://hf.co/drbh/some-kernel

Should this be Find the kernel at: https://hf.co/drbh/some-kernel/tree/{version}? This way, users can directly use that link to inspect files, etc.

Copy link
Member

@sayakpaul sayakpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor comments. No RUST expertise yet.

template/CARD.md Outdated
@@ -0,0 +1,38 @@
---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


/// Nix flake target to run.
#[arg(long, default_value = "build-and-copy")]
pub target: BuildTarget,
Copy link
Member

@danieldk danieldk Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think top-level

kernel-builder build
kernel-builder build-and-copy
kernel-builder build-and-upload

are nicer, easier to discover. I think we can use #[command(flatten)] to share arguments between different build commands, something like:

// ...
Build {
  #[command(flatten)]
  pub common_build_args: CommonBuildArgs,
  // ...
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, updated in latest and added those three commands at the top level


/// Additional arguments passed through to `nix run`.
#[arg(last = true)]
pub nix_args: Vec<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we want this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I guess if the user wants more advance flags they should call nix run directly. removed in latest

let bare_name = path.file_name().and_then(OsStr::to_str).ok_or_else(|| {
eyre::eyre!("Cannot determine directory name from `{path_str}`")
})?;
let owner = hf::whoami_username()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

for (from, to) in replacements {
text = text.replace(from, to);
}
fs::write(&destination, text).wrap_err_with(|| {
Copy link
Member

@danieldk danieldk Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I possible (not sure if it supports everything needed), I think it would be nice to use the fileset API that is also used for generating pyproject files. The nice thing is that it is atomic, if something fails while still preparing the output, nothing is written.

I think we would have to extend it with the notion of files that should not be overwritten (if a project already exists and you only want to add new files). But I think making it impossible to have an operation half-completed is a nice feature.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yea thats a nice improvement, updated to use the file set api in the latest changes. this is a better ux and pattern so we avoid getting into a strange state

.wrap_err_with(|| format!("Cannot parse TOML in `{}`", build_toml_path.display()))?;

// Update [general].backends
if let Some(general) = document.get_mut("general").and_then(|v| v.as_table_mut()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget if we discussed this before, but it would be worthwhile putting jinja templates in the template, so that we can render everything editing TOML (but maybe parse + write to format).

This has the benefit that if we ever change the build.toml format, we don't have to change it both in the template and here in the code.

E.g., suppose that backends moves somewhere else, we have to change it in the template and here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes thats a great improvement. updated the kernel template to use jinja in the latest changes

Comment on lines +498 to +512
let mut impl_block = String::new();
for (i, (condition, device, ref_prefix)) in conditions.iter().enumerate() {
let directive = if i == 0 { "#if" } else { "#elif" };
impl_block.push_str(directive);
impl_block.push(' ');
impl_block.push_str(condition);
impl_block.push_str("\n ops.impl(\"");
impl_block.push_str(func_name);
impl_block.push_str("\", ");
impl_block.push_str(device);
impl_block.push_str(", ");
impl_block.push_str(ref_prefix);
impl_block.push_str(func_name);
impl_block.push_str(");\n");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a jinja template with if blocks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to prefer jinja in latest

Comment on lines +526 to +554
let marker = "ops.def(\"";
let start = content.find(marker)? + marker.len();
let rest = &content[start..];
let end = rest.find('(')?;
Some(&rest[..end])
}

/// Replace `#if defined(CPU_KERNEL)...#endif` block with new content
fn replace_ifdef_block(content: &str, replacement: &str) -> String {
const START_MARKER: &str = "#if defined(CPU_KERNEL)";
const END_MARKER: &str = "#endif";

let Some(start) = content.find(START_MARKER) else {
return content.to_owned();
};

let search_region = &content[start..];
let Some(end_offset) = search_region.find(END_MARKER) else {
return content.to_owned();
};

let end = start + end_offset + END_MARKER.len();

let mut result = String::with_capacity(content.len());
result.push_str(&content[..start]);
result.push_str(replacement);
result.push_str(&content[end..]);
result
}
Copy link
Member

@danieldk danieldk Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be easier to maintain with a jinja template and some if blocks I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to prefer jinja in latest

/// Upload kernel build artifacts to the Hugging Face Hub.
Upload(UploadArgs),

#[command(hide = true)]
Copy link
Member

@danieldk danieldk Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this removes the subcommand from help. Sounds bad? Leftover from testing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed in latest

Copy link
Member

@danieldk danieldk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the upload subcommand now as well. Really excited to get batched uploads!

Comment on lines +22 to +23
#[arg(value_name = "KERNEL_DIR", default_value = ".")]
pub kernel_dir: PathBuf,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a small nit, I'd use

Suggested change
#[arg(value_name = "KERNEL_DIR", default_value = ".")]
pub kernel_dir: PathBuf,
#[arg(value_name = "KERNEL_DIR")]
pub kernel_dir: Option<PathBuf>,

It expresses the semantics a bit better, there is now a helper function to get the path from that. (No strong opinions though.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed this is a more idiomatic, updated in latest

let mut operations: Vec<CommitOperation> = Vec::new();

collect_benchmark_ops(&kernel_dir, &existing_files, is_new_branch, &mut operations)?;
collect_readme_ops(&kernel_dir, &mut operations);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

README should go to the main branch. I think probably the most elegant way is to make operations a HashMap<String, Vec<CommitOperation>> with the branch name as the key. That way we can have a nested loop to handle everything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For later: we may want to do the card generation and upload in one go as part of upload, since we cannot do the card generation in the Nix sandbox iff we need to fetch something from the main repo to do it (depends a bit on whether we still need to enumerate all the build variants or whether that information is handled by the Hub itself through the new kernel type, if the remaining information in the card is relatively static, we can handle it during build as well).

Not for this PR though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

// Collect benchmark file operations: add matching files, delete stale ones.
fn collect_benchmark_ops(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should call them commit_ops? I was initially a bit confused, because we already use the term ops for ops in the kernel itself. Adding the commit_ prefix may make it a bit clearer that we are referring to different ops here.

Suggested change
fn collect_benchmark_ops(
fn collect_benchmark_commit_ops(

Comment on lines +322 to +342
fn read_repo_id_from_build_toml(kernel_dir: &Path) -> Result<String> {
let build_toml_path = kernel_dir.join("build.toml");
let content = fs::read_to_string(&build_toml_path)
.wrap_err_with(|| format!("Cannot read `{}`", build_toml_path.display()))?;
let parsed: toml::Value = toml::from_str(&content)
.wrap_err_with(|| format!("Cannot parse `{}`", build_toml_path.display()))?;

parsed
.get("general")
.and_then(|g| g.get("hub"))
.and_then(|h| h.get("repo-id"))
.and_then(|v| v.as_str())
.filter(|s| !s.is_empty())
.map(|s| s.to_owned())
.ok_or_else(|| {
eyre::eyre!(
"No `general.hub.repo-id` in `{}`. Use --repo-id to specify it.",
build_toml_path.display()
)
})
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use util::parse_build. We could also implement a fn repo_id(&self) -> Option<&str>) on Build if it is helpful and then we probably do not need this function at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea thats much cleaner, updated in latest! thanks

Comment on lines +388 to +410
for variant in variants {
let metadata_path = variant.join("metadata.json");
let data = fs::read_to_string(&metadata_path)
.wrap_err_with(|| format!("Cannot read `{}`", metadata_path.display()))?;
let parsed: serde_json::Value = serde_json::from_str(&data)
.wrap_err_with(|| format!("Cannot parse `{}`", metadata_path.display()))?;
let version = parsed.get("version").and_then(|v| v.as_u64());
versions.insert(version);
}

if versions.len() > 1 {
let version_strs: Vec<String> = versions
.iter()
.map(|v| match v {
Some(n) => n.to_string(),
None => "none".to_owned(),
})
.collect();
bail!(
"Found multiple versions in build variants: {}",
version_strs.join(", ")
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pyproject::Metadata to deserialize metadata.json. If helpful, we could add a function similar to parse_build.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice point, added a parse_metadata function in latest commits.

Comment on lines +426 to +439
fn walk_recursive(dir: &Path, files: &mut Vec<PathBuf>) -> Result<()> {
for entry in
fs::read_dir(dir).wrap_err_with(|| format!("Cannot read directory `{}`", dir.display()))?
{
let entry = entry?;
let path = entry.path();
if path.is_dir() {
walk_recursive(&path, files)?;
} else {
files.push(path);
}
}
Ok(())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_dir follows symbolic links, so if a build directory contains a link like ln -s . foo, this will get stuck in an infinite loop (until it runs out of stack or heap). So we either need to skip symbolic links or we need loop detection. Maybe it's worth considering BurntSushi's walkdir, since it covers a lot of edge cases and doesn't follow symlinks by default. It also returns an iterator, which is more efficient than pushing everything into a Vec. Not that we care too much about performance here, but it is a nice property.

https://docs.rs/walkdir/latest/walkdir/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very good catch, updated to use walkdir in latest. thanks for pointing out

@drbh drbh force-pushed the kernel-builder-init-command branch from 57523a2 to f88776a Compare March 23, 2026 21:19
@drbh drbh marked this pull request as draft March 23, 2026 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants