Conversation
534af0b to
368adad
Compare
|
Some high-level thoughts first:
Should we somehow inform the users about it and take confirmation before proceeding?
Should this be Find the kernel at: https://hf.co/drbh/some-kernel/tree/{version}? This way, users can directly use that link to inspect files, etc. |
sayakpaul
left a comment
There was a problem hiding this comment.
Very minor comments. No RUST expertise yet.
template/CARD.md
Outdated
| @@ -0,0 +1,38 @@ | |||
| --- | |||
There was a problem hiding this comment.
Do we want to maintain two copies? We have one in https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/cli/card_template.md
kernel-builder/src/build.rs
Outdated
|
|
||
| /// Nix flake target to run. | ||
| #[arg(long, default_value = "build-and-copy")] | ||
| pub target: BuildTarget, |
There was a problem hiding this comment.
I think top-level
kernel-builder build
kernel-builder build-and-copy
kernel-builder build-and-upload
are nicer, easier to discover. I think we can use #[command(flatten)] to share arguments between different build commands, something like:
// ...
Build {
#[command(flatten)]
pub common_build_args: CommonBuildArgs,
// ...
}
There was a problem hiding this comment.
makes sense, updated in latest and added those three commands at the top level
kernel-builder/src/build.rs
Outdated
|
|
||
| /// Additional arguments passed through to `nix run`. | ||
| #[arg(last = true)] | ||
| pub nix_args: Vec<String>, |
There was a problem hiding this comment.
yea I guess if the user wants more advance flags they should call nix run directly. removed in latest
| let bare_name = path.file_name().and_then(OsStr::to_str).ok_or_else(|| { | ||
| eyre::eyre!("Cannot determine directory name from `{path_str}`") | ||
| })?; | ||
| let owner = hf::whoami_username()?; |
kernel-builder/src/init.rs
Outdated
| for (from, to) in replacements { | ||
| text = text.replace(from, to); | ||
| } | ||
| fs::write(&destination, text).wrap_err_with(|| { |
There was a problem hiding this comment.
I possible (not sure if it supports everything needed), I think it would be nice to use the fileset API that is also used for generating pyproject files. The nice thing is that it is atomic, if something fails while still preparing the output, nothing is written.
I think we would have to extend it with the notion of files that should not be overwritten (if a project already exists and you only want to add new files). But I think making it impossible to have an operation half-completed is a nice feature.
There was a problem hiding this comment.
oh yea thats a nice improvement, updated to use the file set api in the latest changes. this is a better ux and pattern so we avoid getting into a strange state
kernel-builder/src/init.rs
Outdated
| .wrap_err_with(|| format!("Cannot parse TOML in `{}`", build_toml_path.display()))?; | ||
|
|
||
| // Update [general].backends | ||
| if let Some(general) = document.get_mut("general").and_then(|v| v.as_table_mut()) { |
There was a problem hiding this comment.
I forget if we discussed this before, but it would be worthwhile putting jinja templates in the template, so that we can render everything editing TOML (but maybe parse + write to format).
This has the benefit that if we ever change the build.toml format, we don't have to change it both in the template and here in the code.
E.g., suppose that backends moves somewhere else, we have to change it in the template and here.
There was a problem hiding this comment.
ah yes thats a great improvement. updated the kernel template to use jinja in the latest changes
kernel-builder/src/init.rs
Outdated
| let mut impl_block = String::new(); | ||
| for (i, (condition, device, ref_prefix)) in conditions.iter().enumerate() { | ||
| let directive = if i == 0 { "#if" } else { "#elif" }; | ||
| impl_block.push_str(directive); | ||
| impl_block.push(' '); | ||
| impl_block.push_str(condition); | ||
| impl_block.push_str("\n ops.impl(\""); | ||
| impl_block.push_str(func_name); | ||
| impl_block.push_str("\", "); | ||
| impl_block.push_str(device); | ||
| impl_block.push_str(", "); | ||
| impl_block.push_str(ref_prefix); | ||
| impl_block.push_str(func_name); | ||
| impl_block.push_str(");\n"); | ||
| } |
There was a problem hiding this comment.
Could be a jinja template with if blocks.
There was a problem hiding this comment.
updated to prefer jinja in latest
kernel-builder/src/init.rs
Outdated
| let marker = "ops.def(\""; | ||
| let start = content.find(marker)? + marker.len(); | ||
| let rest = &content[start..]; | ||
| let end = rest.find('(')?; | ||
| Some(&rest[..end]) | ||
| } | ||
|
|
||
| /// Replace `#if defined(CPU_KERNEL)...#endif` block with new content | ||
| fn replace_ifdef_block(content: &str, replacement: &str) -> String { | ||
| const START_MARKER: &str = "#if defined(CPU_KERNEL)"; | ||
| const END_MARKER: &str = "#endif"; | ||
|
|
||
| let Some(start) = content.find(START_MARKER) else { | ||
| return content.to_owned(); | ||
| }; | ||
|
|
||
| let search_region = &content[start..]; | ||
| let Some(end_offset) = search_region.find(END_MARKER) else { | ||
| return content.to_owned(); | ||
| }; | ||
|
|
||
| let end = start + end_offset + END_MARKER.len(); | ||
|
|
||
| let mut result = String::with_capacity(content.len()); | ||
| result.push_str(&content[..start]); | ||
| result.push_str(replacement); | ||
| result.push_str(&content[end..]); | ||
| result | ||
| } |
There was a problem hiding this comment.
Would be easier to maintain with a jinja template and some if blocks I think.
There was a problem hiding this comment.
updated to prefer jinja in latest
kernel-builder/src/main.rs
Outdated
| /// Upload kernel build artifacts to the Hugging Face Hub. | ||
| Upload(UploadArgs), | ||
|
|
||
| #[command(hide = true)] |
There was a problem hiding this comment.
If I understand correctly, this removes the subcommand from help. Sounds bad? Leftover from testing?
danieldk
left a comment
There was a problem hiding this comment.
Reviewed the upload subcommand now as well. Really excited to get batched uploads!
kernel-builder/src/upload.rs
Outdated
| #[arg(value_name = "KERNEL_DIR", default_value = ".")] | ||
| pub kernel_dir: PathBuf, |
There was a problem hiding this comment.
Only a small nit, I'd use
| #[arg(value_name = "KERNEL_DIR", default_value = ".")] | |
| pub kernel_dir: PathBuf, | |
| #[arg(value_name = "KERNEL_DIR")] | |
| pub kernel_dir: Option<PathBuf>, |
It expresses the semantics a bit better, there is now a helper function to get the path from that. (No strong opinions though.)
There was a problem hiding this comment.
agreed this is a more idiomatic, updated in latest
kernel-builder/src/upload.rs
Outdated
| let mut operations: Vec<CommitOperation> = Vec::new(); | ||
|
|
||
| collect_benchmark_ops(&kernel_dir, &existing_files, is_new_branch, &mut operations)?; | ||
| collect_readme_ops(&kernel_dir, &mut operations); |
There was a problem hiding this comment.
README should go to the main branch. I think probably the most elegant way is to make operations a HashMap<String, Vec<CommitOperation>> with the branch name as the key. That way we can have a nested loop to handle everything.
There was a problem hiding this comment.
For later: we may want to do the card generation and upload in one go as part of upload, since we cannot do the card generation in the Nix sandbox iff we need to fetch something from the main repo to do it (depends a bit on whether we still need to enumerate all the build variants or whether that information is handled by the Hub itself through the new kernel type, if the remaining information in the card is relatively static, we can handle it during build as well).
Not for this PR though.
There was a problem hiding this comment.
Some relevant internal discussions: https://huggingface.slack.com/archives/C090JN2P8NB/p1774236408982349
kernel-builder/src/upload.rs
Outdated
| } | ||
|
|
||
| // Collect benchmark file operations: add matching files, delete stale ones. | ||
| fn collect_benchmark_ops( |
There was a problem hiding this comment.
Maybe we should call them commit_ops? I was initially a bit confused, because we already use the term ops for ops in the kernel itself. Adding the commit_ prefix may make it a bit clearer that we are referring to different ops here.
| fn collect_benchmark_ops( | |
| fn collect_benchmark_commit_ops( |
kernel-builder/src/upload.rs
Outdated
| fn read_repo_id_from_build_toml(kernel_dir: &Path) -> Result<String> { | ||
| let build_toml_path = kernel_dir.join("build.toml"); | ||
| let content = fs::read_to_string(&build_toml_path) | ||
| .wrap_err_with(|| format!("Cannot read `{}`", build_toml_path.display()))?; | ||
| let parsed: toml::Value = toml::from_str(&content) | ||
| .wrap_err_with(|| format!("Cannot parse `{}`", build_toml_path.display()))?; | ||
|
|
||
| parsed | ||
| .get("general") | ||
| .and_then(|g| g.get("hub")) | ||
| .and_then(|h| h.get("repo-id")) | ||
| .and_then(|v| v.as_str()) | ||
| .filter(|s| !s.is_empty()) | ||
| .map(|s| s.to_owned()) | ||
| .ok_or_else(|| { | ||
| eyre::eyre!( | ||
| "No `general.hub.repo-id` in `{}`. Use --repo-id to specify it.", | ||
| build_toml_path.display() | ||
| ) | ||
| }) | ||
| } |
There was a problem hiding this comment.
Use util::parse_build. We could also implement a fn repo_id(&self) -> Option<&str>) on Build if it is helpful and then we probably do not need this function at all.
There was a problem hiding this comment.
yea thats much cleaner, updated in latest! thanks
| for variant in variants { | ||
| let metadata_path = variant.join("metadata.json"); | ||
| let data = fs::read_to_string(&metadata_path) | ||
| .wrap_err_with(|| format!("Cannot read `{}`", metadata_path.display()))?; | ||
| let parsed: serde_json::Value = serde_json::from_str(&data) | ||
| .wrap_err_with(|| format!("Cannot parse `{}`", metadata_path.display()))?; | ||
| let version = parsed.get("version").and_then(|v| v.as_u64()); | ||
| versions.insert(version); | ||
| } | ||
|
|
||
| if versions.len() > 1 { | ||
| let version_strs: Vec<String> = versions | ||
| .iter() | ||
| .map(|v| match v { | ||
| Some(n) => n.to_string(), | ||
| None => "none".to_owned(), | ||
| }) | ||
| .collect(); | ||
| bail!( | ||
| "Found multiple versions in build variants: {}", | ||
| version_strs.join(", ") | ||
| ); | ||
| } |
There was a problem hiding this comment.
Use pyproject::Metadata to deserialize metadata.json. If helpful, we could add a function similar to parse_build.
There was a problem hiding this comment.
nice point, added a parse_metadata function in latest commits.
kernel-builder/src/upload.rs
Outdated
| fn walk_recursive(dir: &Path, files: &mut Vec<PathBuf>) -> Result<()> { | ||
| for entry in | ||
| fs::read_dir(dir).wrap_err_with(|| format!("Cannot read directory `{}`", dir.display()))? | ||
| { | ||
| let entry = entry?; | ||
| let path = entry.path(); | ||
| if path.is_dir() { | ||
| walk_recursive(&path, files)?; | ||
| } else { | ||
| files.push(path); | ||
| } | ||
| } | ||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
is_dir follows symbolic links, so if a build directory contains a link like ln -s . foo, this will get stuck in an infinite loop (until it runs out of stack or heap). So we either need to skip symbolic links or we need loop detection. Maybe it's worth considering BurntSushi's walkdir, since it covers a lot of edge cases and doesn't follow symlinks by default. It also returns an iterator, which is more efficient than pushing everything into a Vec. Not that we care too much about performance here, but it is a nice property.
There was a problem hiding this comment.
very good catch, updated to use walkdir in latest. thanks for pointing out
57523a2 to
f88776a
Compare
This PR is a WIP to add the init command to the kernel-builder rust cli. This branch mainly ports the init command from the python cli, with some improvements to the interface and updates to the templateThis PR adds an the
initanduploadcommand to the rust kernel-builder cli similar to implementation in the current python cli. It also adds abuildcommand that is a facade for thenix runso devs can interact with a single tool for the full creation lifecycle of a kernel.Mainly it aims to have better defaults to make the commands easier to use
initcommand only requires the name of the kernels (folder) and the owner is pulled via awhoamirequest if the user is already logged into hfinitcommand creates a new dir if a name is specified, if called within a dir, the cli will use the dir name as the kernel nameuploadcommand does not require a repo id if the id can be read from the builduploadcommand does not require a path to the build folder and will default to looking for thebuilddir if no explicit path is specifiedexample usage
init from inside of a existing dir
init from outside of a dir
build
upload