Add developer and user guides for JIT #1876
Advanced Topics
===============

- `Just-in-Time Compilation`_

Just-in-Time Compilation
------------------------

cuVS uses the Just-in-Time (JIT) `Link-Time Optimization (LTO) <https://developer.nvidia.com/blog/cuda-12-0-compiler-support-for-runtime-lto-using-nvjitlink-library/>`_ compilation technology to compile certain kernels. When a JIT compilation is triggered, cuVS compiles the kernel for your architecture and automatically caches it in memory and on disk. The validity of the cache is as follows:

1. The in-memory cache is valid for the lifetime of the process.
2. The on-disk cache is valid until a CUDA driver upgrade is performed. The cache can be portably shared between machines in network or cloud storage, and we strongly recommend that you store it in a persistent location. For details on configuring the on-disk cache, see the CUDA documentation on `JIT Compilation <https://docs.nvidia.com/cuda/cuda-programming-guide/05-appendices/environment-variables.html#jit-compilation>`_. The environment variables of interest are ``CUDA_CACHE_PATH`` and ``CUDA_CACHE_MAXSIZE``.

Thus, JIT compilation is a one-time cost, and you can expect no loss in performance after the first compilation. We recommend running a "warmup" that triggers the JIT compilation before actual usage.
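For example, the cache-related environment variables can be set before launching any cuVS process. A minimal shell sketch (the path and size limit here are illustrative choices, not defaults):

```shell
# Point the CUDA JIT cache at a persistent location so compiled kernels
# survive process restarts (the path here is an example, not a default).
export CUDA_CACHE_PATH="$PWD/cuda-jit-cache"
mkdir -p "$CUDA_CACHE_PATH"

# Raise the on-disk cache size limit to 1 GiB (value is in bytes;
# this is an example value, not a recommendation).
export CUDA_CACHE_MAXSIZE=$((1024 * 1024 * 1024))
```

In a containerized deployment, pointing ``CUDA_CACHE_PATH`` at a mounted volume keeps the cache across container restarts.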
Currently, the following capabilities will trigger a JIT compilation:

- IVF Flat search APIs: :doc:`cuvs::neighbors::ivf_flat::search() <cpp_api/neighbors_ivf_flat>`

.. toctree::
   :maxdepth: 2

   jit_lto_guide
## Using Just-in-Time Link-Time Optimization
cuVS is moving to using link-time optimization for new kernels, and this requires some changes to the way kernels are written. Instead of compiling all kernel variants at build time (which leads to binary-size explosion), JIT LTO compiles kernel fragments separately and links them together at runtime based on the specific configuration needed.
Member: Can we link somewhere in the CUDA docs in this paragraph? Maybe for "link-time optimization"?

Member: Also, can you provide an ever-so-brief summary of the perf implications? Maybe link to the CUDA docs where appropriate for expectations?

Author: First-run perf implications are very kernel- and hardware-dependent; the CUDA docs make no guarantees about that.
This approach ultimately enables:

- **Reduced binary size**: Compile fragments once, combine many ways
- **User Defined Functions**: Link UDFs in cuVS CUDA kernels

For more information on JIT LTO, see [Advanced Topics](advanced_topics). For a complete guide on implementing JIT LTO kernels, including step-by-step examples, see the [JIT LTO Guide](jit_lto_guide.md).
Member: Do you want to make it super mega obvious to people who deploy services based on cuVS in containers that "we really, really strongly recommend you make sure the cache is stored in a persistent location so that containers don't have to warm up the cache after each restart"? Is it possible to include something that warms up the cache in my Dockerfile, so that the cache is built into the image? I am not sure I'd make the connection from reading the current docs, hence wondering if a really explicit "hit people over the head with it" call-out would be useful.

Author: I think that's a great idea. Let me add some phrasing to convey that very clearly. You mean automatically?

Author: How does it read now?

Member: Sounds good now. Let's see if people get it; if not, we can always tune this later. Wasn't thinking of something automatic, more a command I can include in my Dockerfile as a RUN command.

Author: If you see the link that I added now, you can control where the cache is written with an environment variable. I'm hoping Docker-savvy users can now figure out the volume-mount and environment-variable connection.
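The volume-mount-plus-environment-variable pattern discussed above could look something like the following. This is only a sketch: the image name, host path, and mount point are hypothetical and not part of cuVS or its documentation.

```shell
# Keep the CUDA JIT cache on a host volume so a restarted container reuses
# previously compiled kernels instead of re-running JIT compilation.
# (image name "my-cuvs-service" and host path "/srv/cuda-jit-cache" are
# hypothetical examples)
docker run \
  -v /srv/cuda-jit-cache:/cache \
  -e CUDA_CACHE_PATH=/cache \
  my-cuvs-service:latest
```

A one-off warmup run with the same mount would populate the cache before the service is first started.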
Member: Works for me.