12 changes: 6 additions & 6 deletions media/docs/pythonDSL/cute_dsl_general/compile_with_tvm_ffi.rst
@@ -59,7 +59,7 @@ To maximize performance benefits, we recommend setting up your workflow as follo
- **Declare shape constraints using fake tensors** and reuse the compiled function
throughout your execution.
- **Pass PyTorch tensors directly** to the compiled function to avoid explicit DLPack conversion.
-- **Use the environment stream flag** to implicitly synchronize with the current PyTorch stream.
+- **Use the environment stream flag** to implicitly pass the current PyTorch stream.
- **Rely on compiled argument validation** instead of Python-side attribute validation,
as TVM FFI functions perform fast compiled checks.

@@ -246,11 +246,11 @@ The following example demonstrates this approach; the function accepts ``torch.c
Using Environment Stream
~~~~~~~~~~~~~~~~~~~~~~~~

-The second option is to rely on the environment-stream flag.
-Pass ``use_tvm_ffi_env_stream=True`` to ``make_fake_stream`` to mark the argument as an
-environment stream so it no longer has to be provided explicitly.
-TVM FFI will reuse its environment stream, synchronizing it with ``torch.cuda.current_stream()``
-before each call. The example below shows this flow:
+The second option is to rely on the environment stream flag.
+Pass ``use_tvm_ffi_env_stream=True`` to ``make_fake_stream`` to mark the stream argument as an
+environment stream, which means it no longer needs to be provided explicitly.
+TVM FFI will automatically use its environment stream (i.e., the current PyTorch stream)
+as the stream argument. The example below demonstrates this flow:

.. code-block:: python
