diff --git a/docs/guides/checkpointing_solutions/gcs_checkpointing.md b/docs/guides/checkpointing_solutions/gcs_checkpointing.md index b80a9c536b..e78a6939ef 100644 --- a/docs/guides/checkpointing_solutions/gcs_checkpointing.md +++ b/docs/guides/checkpointing_solutions/gcs_checkpointing.md @@ -28,30 +28,30 @@ startup. The first valid condition met is the one executed: ### MaxText configuration -Flag | Description | Type | Default -:------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :-------- | :------ -`enable_checkpointing` | A master switch to enable (`True`) or disable (`False`) saving checkpoints during the training run. | `boolean` | `False` -`async_checkpointing` | When set to (`True`), this flag makes checkpoint saving asynchronous. The training step is only blocked for the minimal time needed to capture the model's state, and the actual writing to storage happens in a background thread. This is highly recommended for performance. It's enabled by default. | `boolean` | `True` -`checkpoint_period` | The interval, in training steps, for how often a checkpoint is saved. | `integer` | `10000` -`enable_single_replica_ckpt_restoring` | If `True`, one replica reads the checkpoint from storage and then broadcasts it to all other replicas. This can significantly speed up restoration on multi-host systems by reducing redundant reads from storage.
**Note**: This feature is only compatible with training jobs that utilize a Distributed Data Parallel (DDP) strategy. | `boolean` | `False` -`checkpoint_todelete_subdir` | Subdirectory to move checkpoints to before deletion. For example: `".todelete"` (Ignored if directory is prefixed with gs://) | `string` | `""` -`checkpoint_todelete_full_path` | Full path to move checkpoints to before deletion. | `string` | `""` -`load_parameters_path` | Specifies a path to a checkpoint directory to load a parameter only checkpoint.
**Example**: `"gs://my-bucket/my-previous-run/checkpoints/items/1000"` | `string` | `""` (disabled) -`load_full_state_path` | Specifies a path to a checkpoint directory to load a full checkpoint including optimizer state and step count from a specific directory.
**Example**: `"gs://my-bucket/my-interrupted-run/checkpoints/items/500"` | `string` | `""` (disabled) -`lora_input_adapters_path` | Specifies a parent directory containing LoRA (Low-Rank Adaptation) adapters. | `string` | `""` (disabled) -`force_unroll` | If `True`, unrolls the loop when generating a parameter-only checkpoint. | `boolean` | `False` +| Flag | Description | Type | Default | +| :------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :-------- | :-------------- | +| `enable_checkpointing` | A master switch to enable (`True`) or disable (`False`) saving checkpoints during the training run. | `boolean` | `False` | +| `async_checkpointing` | When set to (`True`), this flag makes checkpoint saving asynchronous. The training step is only blocked for the minimal time needed to capture the model's state, and the actual writing to storage happens in a background thread. This is highly recommended for performance. It's enabled by default. | `boolean` | `True` | +| `checkpoint_period` | The interval, in training steps, for how often a checkpoint is saved. | `integer` | `10000` | +| `enable_single_replica_ckpt_restoring` | If `True`, one replica reads the checkpoint from storage and then broadcasts it to all other replicas. This can significantly speed up restoration on multi-host systems by reducing redundant reads from storage.
**Note**: This feature is only compatible with training jobs that utilize a Distributed Data Parallel (DDP) strategy. | `boolean` | `False` |
+| `checkpoint_todelete_subdir` | Subdirectory to move checkpoints to before deletion. For example: `".todelete"` (Ignored if directory is prefixed with `gs://`) | `string` | `""` |
+| `checkpoint_todelete_full_path` | Full path to move checkpoints to before deletion. | `string` | `""` |
+| `load_parameters_path` | Specifies a path to a checkpoint directory to load a parameter-only checkpoint.
**Example**: `"gs://my-bucket/my-previous-run/checkpoints/items/1000"` | `string` | `""` (disabled) | +| `load_full_state_path` | Specifies a path to a checkpoint directory to load a full checkpoint including optimizer state and step count from a specific directory.
**Example**: `"gs://my-bucket/my-interrupted-run/checkpoints/items/500"` | `string` | `""` (disabled) | +| `lora_input_adapters_path` | Specifies a parent directory containing LoRA (Low-Rank Adaptation) adapters. | `string` | `""` (disabled) | +| `force_unroll` | If `True`, unrolls the loop when generating a parameter-only checkpoint. | `boolean` | `False` | ## Storage and format configuration These settings control the underlying storage mechanism ([Orbax](https://orbax.readthedocs.io)) for performance and compatibility. -Flag | Description | Type | Default -:----------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------- | :------ -`checkpoint_storage_target_data_file_size_bytes` | Sets a target file size for Orbax to chunk large arrays into smaller physical files. This can dramatically speed up loading over a network and in distributed environments. | `integer` | `2147483648` (2 GB) -`checkpoint_storage_use_ocdbt` | If `True`, uses the TensorStore **OCDBT** (Optionally-Cooperative Distributed B+ Tree)) key-value store as the underlying storage format for checkpointing. Set to `0` for Pathways. | `boolean` | `True` -`checkpoint_storage_use_zarr3` | If `True`, uses the Zarr v3 storage format within Orbax, which is optimized for chunked, compressed, N-dimensional arrays. Set to `0` for Pathways. | `boolean` | `True` -`checkpoint_storage_concurrent_gb` | Controls the concurrent I/O limit in gigabytes for the checkpointer. Larger models may require increasing this value to avoid I/O bottlenecks. | `integer` | `96` -`enable_orbax_v1` | A boolean flag to explicitly enable features and behaviors from Orbax version 1. 
| `boolean` | `False` -`source_checkpoint_layout` | Specifies the format of the checkpoint being **loaded**. This tells the system how to interpret the files at the source path.
**Options**: `"orbax"`, `"safetensors"` | `string` | `"orbax"`
-`checkpoint_conversion_fn` | A user-defined function to process a loaded checkpoint dictionary into a format that the model can understand. This is essential for loading checkpoints from different frameworks or formats (e.g., converting keys from a Hugging Face SafeTensors file). | `function` or `None` | `None`
+| Flag | Description | Type | Default |
+| :--- | :--- | :--- | :--- |
+| `checkpoint_storage_target_data_file_size_bytes` | Sets a target file size for Orbax to chunk large arrays into smaller physical files. This can dramatically speed up loading over a network and in distributed environments. | `integer` | `2147483648` (2 GB) |
+| `checkpoint_storage_use_ocdbt` | If `True`, uses the TensorStore **OCDBT** (Optionally-Cooperative Distributed B+ Tree) key-value store as the underlying storage format for checkpointing. Set to `0` for Pathways. | `boolean` | `True` |
+| `checkpoint_storage_use_zarr3` | If `True`, uses the Zarr v3 storage format within Orbax, which is optimized for chunked, compressed, N-dimensional arrays. Set to `0` for Pathways. | `boolean` | `True` |
+| `checkpoint_storage_concurrent_gb` | Controls the concurrent I/O limit in gigabytes for the checkpointer. Larger models may require increasing this value to avoid I/O bottlenecks. | `integer` | `96` |
+| `enable_orbax_v1` | A boolean flag to explicitly enable features and behaviors from Orbax version 1. | `boolean` | `False` |
+| `source_checkpoint_layout` | Specifies the format of the checkpoint being **loaded**. This tells the system how to interpret the files at the source path.
**Options**: `"orbax"`, `"safetensors"` | `string` | `"orbax"` | +| `checkpoint_conversion_fn` | A user-defined function to process a loaded checkpoint dictionary into a format that the model can understand. This is essential for loading checkpoints from different frameworks or formats (e.g., converting keys from a Hugging Face SafeTensors file). | `function` or `None` | `None` | diff --git a/docs/guides/data_input_pipeline/data_input_grain.md b/docs/guides/data_input_pipeline/data_input_grain.md index 5a7d66981d..5522020b3f 100644 --- a/docs/guides/data_input_pipeline/data_input_grain.md +++ b/docs/guides/data_input_pipeline/data_input_grain.md @@ -110,10 +110,10 @@ Note that `FILE_PATH` is optional; when provided, the script runs `ls -R` for pr ```sh bash src/dependencies/scripts/setup_gcsfuse.sh \ -DATASET_GCS_BUCKET=maxtext-dataset \ +DATASET_GCS_BUCKET=gs:// \ MOUNT_PATH=/tmp/gcsfuse && \ python3 -m maxtext.trainers.pre_train.train \ -run_name= base_output_directory=gs:// \ +run_name= base_output_directory=gs:// \ dataset_type=grain \ grain_file_type=arrayrecord # or parquet \ grain_train_files=/tmp/gcsfuse/array-record/c4/en/3.0.1/c4-train.array_record* \ diff --git a/docs/guides/monitoring_and_debugging/features_and_diagnostics.md b/docs/guides/monitoring_and_debugging/features_and_diagnostics.md index a6952fae04..4a0f5efbef 100644 --- a/docs/guides/monitoring_and_debugging/features_and_diagnostics.md +++ b/docs/guides/monitoring_and_debugging/features_and_diagnostics.md @@ -87,7 +87,7 @@ export LIBTPU_INIT_ARGS="--xla_enable_async_all_gather=true" python3 -m maxtext.trainers.pre_train.train run_name=example_load_compile \ compiled_trainstep_file=my_compiled_train.pickle \ global_parameter_scale=16 per_device_batch_size=4 steps=10000 learning_rate=1e-3 \ - base_output_directory=gs://my-output-bucket dataset_path=gs://my-dataset-bucket + base_output_directory=gs:// dataset_path=gs:// ``` In the save step of example 2 above we included exporting the compiler 
flag `LIBTPU_INIT_ARGS` and `learning_rate` because those affect the compiled object `my_compiled_train.pickle.` The sizes of the model (e.g. `global_parameter_scale`, `max_sequence_length` and `per_device_batch`) are fixed when you initially compile via `compile_train.py`, you will see a size error if you try to run the saved compiled object with different sizes than you compiled with. However a subtle note is that the **learning rate schedule** is also fixed when you run `compile_train` - which is determined by both `steps` and `learning_rate`. The optimizer parameters such as `adam_b1` are passed only as shaped objects to the compiler - thus their real values are determined when you run `train.py`, not during the compilation. If you do pass in different shapes (e.g. `per_device_batch`), you will get a clear error message reporting that the compiled signature has different expected shapes than what was input. If you attempt to run on different hardware than the compilation targets requested via `compile_topology`, you will get an error saying there is a failure to map the devices from the compiled to your real devices. Using different XLA flags or a LIBTPU than what was compiled will probably run silently with the environment you compiled in without error. However there is no guaranteed behavior in this case; you should run in the same environment you compiled in. @@ -125,7 +125,7 @@ export XLA_FLAGS="--xla_gpu_enable_async_collectives=true" python3 -m maxtext.trainers.pre_train.train run_name=example_load_compile \ compiled_trainstep_file=my_compiled_train.pickle \ attention=dot_product global_parameter_scale=16 per_device_batch_size=4 steps=10000 learning_rate=1e-3 \ - base_output_directory=gs://my-output-bucket dataset_path=gs://my-dataset-bucket + base_output_directory=gs:// dataset_path=gs:// ``` As in the TPU case, note that the compilation environment must match the execution environment, in this case by setting the same `XLA_FLAGS`. 
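To make the "fixed shapes" behavior above concrete, here is a small illustrative sketch (plain Python, not MaxText's or JAX's actual implementation): a saved compiled train step effectively records the input shapes it was traced with, and invoking it with different shapes fails immediately with a signature error.

```python
# Illustrative toy only -- JAX/MaxText implement this internally. A compiled
# train step records the batch shapes fixed at compile time; running it with
# a different per_device_batch_size or max_target_length is rejected.
class CompiledTrainStep:
    def __init__(self, batch_shape):
        self.batch_shape = batch_shape  # fixed when the compile step runs

    def __call__(self, batch_shape):
        if batch_shape != self.batch_shape:
            raise ValueError(
                f"compiled signature expects {self.batch_shape}, got {batch_shape}"
            )
        return "train step ok"

# Compiled for per_device_batch_size=4, max_target_length=2048:
compiled = CompiledTrainStep(batch_shape=(4, 2048))
compiled((4, 2048))   # matching shapes run
# compiled((8, 2048)) would raise ValueError, mirroring the size error you
# see when running a saved compiled object with different sizes.
```

Optimizer values such as `adam_b1`, by contrast, are only shape-traced, which is why they can differ at run time without recompilation.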
diff --git a/docs/guides/monitoring_and_debugging/ml_workload_diagnostics.md b/docs/guides/monitoring_and_debugging/ml_workload_diagnostics.md index 81206bff6b..dccaebc02f 100644 --- a/docs/guides/monitoring_and_debugging/ml_workload_diagnostics.md +++ b/docs/guides/monitoring_and_debugging/ml_workload_diagnostics.md @@ -37,8 +37,8 @@ MaxText has integrated the ML Diagnostics [SDK](https://github.com/AI-Hypercompu ``` python3 -m maxtext.trainers.pre_train.train \ run_name=${USER}-tpu-job \ - base_output_directory="gs://your-output-bucket/" \ - dataset_path="gs://your-dataset-bucket/" \ + base_output_directory="gs:///" \ + dataset_path="gs:///" \ steps=100 \ log_period=10 \ managed_mldiagnostics=True @@ -49,8 +49,8 @@ MaxText has integrated the ML Diagnostics [SDK](https://github.com/AI-Hypercompu ``` python3 -m maxtext.trainers.pre_train.train \ run_name=${USER}-tpu-job \ - base_output_directory="gs://your-output-bucket/" \ - dataset_path="gs://your-dataset-bucket/" \ + base_output_directory="gs:///" \ + dataset_path="gs:///" \ steps=100 \ log_period=10 \ profiler=xplane \ @@ -62,8 +62,8 @@ MaxText has integrated the ML Diagnostics [SDK](https://github.com/AI-Hypercompu ``` python3 -m maxtext.trainers.pre_train.train \ run_name=${USER}-tpu-job \ - base_output_directory="gs://your-output-bucket/" \ - dataset_path="gs://your-dataset-bucket/" \ + base_output_directory="gs:///" \ + dataset_path="gs:///" \ steps=100 \ log_period=10 \ profiler=xplane \ diff --git a/docs/guides/monitoring_and_debugging/understand_logs_and_metrics.md b/docs/guides/monitoring_and_debugging/understand_logs_and_metrics.md index e381e7e8ac..e9833d7e6e 100644 --- a/docs/guides/monitoring_and_debugging/understand_logs_and_metrics.md +++ b/docs/guides/monitoring_and_debugging/understand_logs_and_metrics.md @@ -20,11 +20,11 @@ When you run a training job, MaxText produces detailed output logs. 
This guide shows you how to interpret these logs to understand your configuration and monitor performance. -To start, run a simple pretraining job on a single-host TPU. For instance, we can run the following command on TPU v5p-8. The resulting log is used as an example throughout this guide. +To start, run a simple pretraining job on a single-host TPU. For instance, we can run the following command on TPU v5p-8. The resulting log is used as an example throughout this guide. Replace `` in the command below to point to the GCS bucket you want to use for your output. ```bash python3 -m maxtext.trainers.pre_train.train \ -base_output_directory=gs://runner-maxtext-logs run_name=demo \ +base_output_directory=gs:// run_name=demo \ model_name=deepseek2-16b \ per_device_batch_size=24 max_target_length=2048 steps=10 dataset_type=synthetic enable_checkpointing=false ``` @@ -80,23 +80,23 @@ Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequen This also includes the **output paths** for your run artifacts. ``` -Config param base_output_directory: gs://runner-maxtext-logs +Config param base_output_directory: gs:// Config param run_name: demo -Config param metrics_dir: gs://runner-maxtext-logs/demo/metrics/ -Config param tensorboard_dir: gs://runner-maxtext-logs/demo/tensorboard/ -Config param checkpoint_dir: gs://runner-maxtext-logs/demo/checkpoints/ +Config param metrics_dir: gs:///demo/metrics/ +Config param tensorboard_dir: gs:///demo/tensorboard/ +Config param checkpoint_dir: gs:///demo/checkpoints/ ``` ### Understanding output paths -MaxText organizes all of your run's artifacts into a main output directory. The primary location for your run is constructed by combining the `base_output_directory` and the `run_name` you specify in your command. Based on the logs above, the base path for this specific run is `gs://runner-maxtext-logs/demo`. +MaxText organizes all of your run's artifacts into a main output directory. 
The primary location for your run is constructed by combining the `base_output_directory` and the `run_name` you specify in your command. Based on the logs above, the base path for this specific run is `gs:///demo`. Within this base path, MaxText creates several subdirectories for different types of artifacts. Many of these are optional and only created if you enable them with a specific flag. - **TensorBoard logs (`tensorboard/`)** - Flag: `enable_tensorboard=True` (default) - - Path: `gs://runner-maxtext-logs/demo/tensorboard/` + - Path: `gs:///demo/tensorboard/` - **Profiler traces (`tensorboard/plugins/profile/`)** @@ -106,17 +106,17 @@ Within this base path, MaxText creates several subdirectories for different type - **Metrics in plain text (`metrics/`)** - Flag: `gcs_metrics=True` - - Path: `gs://runner-maxtext-logs/demo/metrics/` + - Path: `gs:///demo/metrics/` - **Configuration file (`config.yml`)** - Flag: `save_config_to_gcs=True` - - Path: `gs://runner-maxtext-logs/demo/config.yml` + - Path: `gs:///demo/config.yml` - **Checkpoints (`checkpoints/`)** - Flag: `enable_checkpointing=True` - - Path: `gs://runner-maxtext-logs/demo/checkpoints/` + - Path: `gs:///demo/checkpoints/` To generate all optional artifacts in one run, you can set the corresponding flags in the command line, like in the example below. @@ -124,7 +124,7 @@ This command enables tensorboard, profiler, text metrics, config saving, and che ```bash python3 -m maxtext.trainers.pre_train.train \ -base_output_directory=gs://runner-maxtext-logs run_name=demo2 \ +base_output_directory=gs:// run_name=demo2 \ model_name=deepseek2-16b \ per_device_batch_size=24 max_target_length=2048 steps=10 dataset_type=synthetic \ enable_tensorboard=True \ diff --git a/docs/install_maxtext.md b/docs/install_maxtext.md index 47d31e93cb..ec7ab961ed 100644 --- a/docs/install_maxtext.md +++ b/docs/install_maxtext.md @@ -112,7 +112,7 @@ environment to avoid dependency conflicts. 
cd maxtext ``` -:::\{only} is_not_latest +````{only} is_not_latest By default, cloning the repository provides the latest version (**HEAD**). If you wish to use the latest features, please follow the [latest guide](https://maxtext.readthedocs.io/en/latest/install_maxtext.html). @@ -126,7 +126,7 @@ before proceeding with the installation. git checkout |version| ``` -::: +```` 2. Create virtual environment: diff --git a/docs/reference/core_concepts/quantization.md b/docs/reference/core_concepts/quantization.md index dae117a85a..1b2ef6518d 100644 --- a/docs/reference/core_concepts/quantization.md +++ b/docs/reference/core_concepts/quantization.md @@ -87,7 +87,7 @@ Common options for the `quantization` flag when using Qwix include: Here is an example of how to run a training job with int8 quantization enabled via Qwix: ```bash -python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?} base_output_directory=gs:// dataset_type=synthetic use_qwix_quantization=true quantization='int8' +python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?} base_output_directory=gs:// dataset_type=synthetic use_qwix_quantization=true quantization='int8' ``` #### The Qwix Interception API @@ -142,7 +142,7 @@ When using AQT, you can pass one of the following values to the `quantization` f #### Example command for AQT ```bash -python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?} base_output_directory=gs:// dataset_type=synthetic use_qwix_quantization=false quantization='int8' +python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?} base_output_directory=gs:// dataset_type=synthetic use_qwix_quantization=false quantization='int8' ``` Note that `use_qwix_quantization` is not set to `True`. 
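For intuition about what `quantization='int8'` means numerically, here is a minimal symmetric abs-max quantize/dequantize sketch. This is illustrative only: Qwix and AQT use their own calibration logic and fused low-precision kernels, but the core round-trip looks like this.

```python
# Symmetric abs-max int8 quantization sketch (illustrative only; Qwix and
# AQT implement their own calibration and fused low-precision kernels).
def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127.0  # largest magnitude maps to 127
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize_int8(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.1, -0.5, 2.54, -1.27]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Each value is recovered to within half a quantization step (scale / 2);
# that rounding error is the accuracy/memory trade-off int8 exploits.
```

The per-tensor `scale` is why outliers matter: one large weight stretches the quantization step for every other value in the tensor.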
diff --git a/docs/run_maxtext/run_maxtext_localhost.md b/docs/run_maxtext/run_maxtext_localhost.md
index 843c52a5f3..1a148cc6d8 100644
--- a/docs/run_maxtext/run_maxtext_localhost.md
+++ b/docs/run_maxtext/run_maxtext_localhost.md
@@ -26,7 +26,11 @@ MaxText uses a primary YAML file, `configs/base.yml`, to manage its settings. Th
 - `learning_rate`: The core hyperparameter for the optimizer.
 - Mode shape parameters: `base_num_decoder_layers`, `base_emb_dim`, `base_num_query_heads`, `base_num_kv_heads`, and `head_dim`.
 - **Override settings (optional):** You can modify training parameters in two ways: by editing `configs/base.yml` directly or by passing them as command-line arguments to the training script which is the recommended method. For example, to change the number of training steps, you can pass `--steps=500` when running `train.py`.
-- **Note**: You **must** update the variable `base_output_directory` which is initialized in `configs/base.yml` to point to a folder within the GCS bucket you just created (e.g., `gs://your-bucket-name/maxtext-output`).
+- **Note**: You **must** update the variable `base_output_directory` which is initialized in `configs/base.yml` to point to a folder within the GCS bucket you just created (e.g., `gs:///maxtext-output`). You can set an environment variable for that:
+
+```bash
+export BASE_OUTPUT_DIRECTORY=gs://
+```
 
 ## Development
 
@@ -40,12 +44,12 @@ Local development on a single host TPU/GPU VM is a convenient way to run MaxText
 
 #### Run a Test Training Job
 
-After the installation is complete, run a short training job using synthetic data to confirm everything is working correctly. This command trains a model for just 10 steps. Remember to replace `$YOUR_JOB_NAME` with a unique name for your run and `gs://` with the path to the GCS bucket you configured in the prerequisites.
+After the installation is complete, run a short training job using synthetic data to confirm everything is working correctly.
This command trains a model for just 10 steps. Remember to replace `$YOUR_JOB_NAME` with a unique name for your run and to set `$BASE_OUTPUT_DIRECTORY` with the path to the GCS bucket you configured in the prerequisites. ```bash python3 -m maxtext.trainers.pre_train.train \ run_name=${YOUR_JOB_NAME?} \ - base_output_directory=gs:// \ + base_output_directory=${BASE_OUTPUT_DIRECTORY?} \ dataset_type=synthetic \ steps=10 ``` @@ -59,7 +63,7 @@ To demonstrate model output, run the following command: ```bash python3 -m maxtext.inference.decode \ run_name=${YOUR_JOB_NAME?} \ - base_output_directory=gs:// \ + base_output_directory=${BASE_OUTPUT_DIRECTORY?} \ per_device_batch_size=1 ``` @@ -80,7 +84,7 @@ To use a pre-configured model for TPUs, you override the `model_name` parameter, python3 -m maxtext.trainers.pre_train.train \ model_name=llama3-8b \ run_name=${YOUR_JOB_NAME?} \ - base_output_directory=gs:// \ + base_output_directory=${BASE_OUTPUT_DIRECTORY?} \ dataset_type=synthetic \ steps=10 ``` @@ -94,7 +98,7 @@ python3 -m maxtext.trainers.pre_train.train \ python3 -m maxtext.trainers.pre_train.train \ model_name=qwen3-4b \ run_name=${YOUR_JOB_NAME?} \ - base_output_directory=gs:// \ + base_output_directory=${BASE_OUTPUT_DIRECTORY?} \ dataset_type=synthetic \ steps=10 ``` @@ -111,7 +115,7 @@ To use a GPU-optimized configuration, you should specify the path to the model's ```bash python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/gpu/models/mixtral_8x7b.yml \ run_name=${YOUR_JOB_NAME?} \ - base_output_directory=gs:// \ + base_output_directory=${BASE_OUTPUT_DIRECTORY?} \ dataset_type=synthetic \ steps=10 ``` @@ -126,7 +130,7 @@ This will load `gpu/mixtral_8x7b.yml`, which inherits from `base.yml`. 
```bash python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/gpu/models/llama3-8b.yml \ run_name=${YOUR_JOB_NAME?} \ - base_output_directory=gs:// \ + base_output_directory=${BASE_OUTPUT_DIRECTORY?} \ dataset_type=synthetic \ steps=10 ``` diff --git a/docs/run_maxtext/run_maxtext_via_xpk.md b/docs/run_maxtext/run_maxtext_via_xpk.md index c800366aee..12c3a733cc 100644 --- a/docs/run_maxtext/run_maxtext_via_xpk.md +++ b/docs/run_maxtext/run_maxtext_via_xpk.md @@ -144,8 +144,8 @@ This guide focuses on submitting workloads to an existing cluster. Cluster creat # region as your TPUs to minimize latency and costs. # You can list your buckets and their locations in the # [Cloud Console](https://console.cloud.google.com/storage/browser). - export BASE_OUTPUT_DIRECTORY= # e.g., gs://my-bucket/maxtext-runs - export DATASET_PATH="gs://your-dataset-bucket/" + export BASE_OUTPUT_DIRECTORY= # e.g., gs:///maxtext-runs + export DATASET_PATH="gs:///" ``` 2. **Configure gcloud CLI** diff --git a/docs/tutorials/first_run.md b/docs/tutorials/first_run.md index f04c6acb9c..3bc0ca4e68 100644 --- a/docs/tutorials/first_run.md +++ b/docs/tutorials/first_run.md @@ -24,7 +24,11 @@ This topic provides a basic introduction to get your MaxText workload up and run 1. To store logs and checkpoints, [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) in your project. To run MaxText, the TPU or GPU VMs must have read/write permissions for the bucket. These permissions are granted by service account roles, such as the `STORAGE ADMIN` role. -2. MaxText reads a yaml file for configuration. We also recommend reviewing the configurable options in `configs/base.yml`. This file includes a decoder-only model of ~1B parameters. The configurable options can be overwritten from the command line. 
For instance, you can change the `steps` or `log_period` by either modifying `configs/base.yml` or by passing in `steps` and `log_period` as additional arguments to the `train.py` call. Set `base_output_directory` to a folder in the bucket you just created.
+2. MaxText reads a yaml file for configuration. We also recommend reviewing the configurable options in `configs/base.yml`. This file includes a decoder-only model of ~1B parameters. The configurable options can be overwritten from the command line. For instance, you can change the `steps` or `log_period` by either modifying `configs/base.yml` or by passing in `steps` and `log_period` as additional arguments to the `train.py` call. Set `base_output_directory` to a folder in the bucket you just created. You can set an environment variable for that:
+
+```bash
+export BASE_OUTPUT_DIRECTORY=gs://
+```
 
 ## Local development for single host
 
@@ -42,7 +46,7 @@ multiple hosts but is a good way to learn about MaxText.
 ```sh
 python3 -m maxtext.trainers.pre_train.train \
 run_name=${YOUR_JOB_NAME?} \
-base_output_directory=gs:// \
+base_output_directory=${BASE_OUTPUT_DIRECTORY?} \
 dataset_type=synthetic \
 steps=10
 ```
@@ -54,7 +58,7 @@ Optional: If you want to try training on a Hugging Face dataset, see [Data Input
 ```sh
 python3 -m maxtext.inference.decode \
 run_name=${YOUR_JOB_NAME?} \
-base_output_directory=gs:// \
+base_output_directory=${BASE_OUTPUT_DIRECTORY?} \
 per_device_batch_size=1
 ```
@@ -76,7 +80,7 @@ You can use [demo_decoding.ipynb](https://github.com/AI-Hypercomputer/maxtext/bl
 ```sh
 python3 -m maxtext.trainers.pre_train.train \
 run_name=${YOUR_JOB_NAME?} \
-base_output_directory=gs:// \
+base_output_directory=${BASE_OUTPUT_DIRECTORY?} \
 dataset_type=synthetic \
 steps=10
 ```
@@ -86,7 +90,7 @@ python3 -m maxtext.trainers.pre_train.train \
 ```sh
 python3 -m maxtext.inference.decode \
 run_name=${YOUR_JOB_NAME?} \
-base_output_directory=gs:// \
+base_output_directory=${BASE_OUTPUT_DIRECTORY?} \
per_device_batch_size=1 ``` diff --git a/docs/tutorials/posttraining/multimodal.md b/docs/tutorials/posttraining/multimodal.md index 8590761be6..8fff8da747 100644 --- a/docs/tutorials/posttraining/multimodal.md +++ b/docs/tutorials/posttraining/multimodal.md @@ -127,8 +127,8 @@ Supervised Fine-Tuning (SFT) of multimodal LLMs in MaxText focuses specifically Here, we use [ChartQA](https://huggingface.co/datasets/HuggingFaceM4/ChartQA) as an example to demonstrate SFT functionality: ```shell -export MAXTEXT_CKPT_PATH=... # either set to an already available MaxText ckpt or to the one we just converted in the previous step -export BASE_OUTPUT_DIRECTORY=gs://... +export MAXTEXT_CKPT_PATH= # either set to an already available MaxText ckpt or to the one we just converted in the previous step +export BASE_OUTPUT_DIRECTORY=gs:// export STEPS=1000 python -m maxtext.trainers.post_train.sft.train_sft_deprecated \ src/maxtext/configs/post_train/sft-vision-chartqa.yml \ diff --git a/docs/tutorials/pretraining.md b/docs/tutorials/pretraining.md index a1ae985db0..e19c04b83e 100644 --- a/docs/tutorials/pretraining.md +++ b/docs/tutorials/pretraining.md @@ -40,7 +40,7 @@ We can use this **command** for pretraining: ```bash # replace base_output_directory with your bucket python3 -m maxtext.trainers.pre_train.train \ -base_output_directory=gs://runner-maxtext-logs run_name=demo \ +base_output_directory=gs:// run_name=demo \ model_name=deepseek2-16b per_device_batch_size=1 steps=10 max_target_length=2048 enable_checkpointing=false \ dataset_type=hf hf_path=allenai/c4 hf_data_dir=en train_split=train \ tokenizer_type=huggingface tokenizer_path=deepseek-ai/DeepSeek-V2-Lite @@ -93,9 +93,9 @@ Grain is a library for reading data for training and evaluating JAX models. 
It i
 **Data preparation**: You need to download data to a Cloud Storage bucket, and read data via Cloud Storage Fuse with [setup_gcsfuse.sh](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/dependencies/scripts/setup_gcsfuse.sh).
 
-- For example, we can mount the bucket `gs://maxtext-dataset` on the local path `/tmp/gcsfuse` before training
+- For example, we can mount the bucket `gs://` on the local path `/tmp/gcsfuse` before training
   ```bash
-  bash setup_gcsfuse.sh DATASET_GCS_BUCKET=maxtext-dataset MOUNT_PATH=/tmp/gcsfuse
+  bash setup_gcsfuse.sh DATASET_GCS_BUCKET=gs:// MOUNT_PATH=/tmp/gcsfuse
   ```
 - After training, we unmount the local path
   ```bash
@@ -107,7 +107,7 @@ This **command** shows pretraining with Grain pipeline, along with evaluation: 
 ```bash
 # replace DATASET_GCS_BUCKET and base_output_directory with your buckets
 python3 -m maxtext.trainers.pre_train.train \
-base_output_directory=gs://runner-maxtext-logs run_name=demo \
+base_output_directory=gs:// run_name=demo \
 model_name=deepseek2-16b per_device_batch_size=1 steps=10 max_target_length=2048 enable_checkpointing=false \
 dataset_type=grain grain_file_type=arrayrecord grain_train_files=/tmp/gcsfuse/array-record/c4/en/3.0.1/c4-train.array_record* grain_worker_count=2 \
 eval_interval=5 eval_steps=10 grain_eval_files=/tmp/gcsfuse/array-record/c4/en/3.0.1/c4-validation.array_record* \
@@ -144,7 +144,7 @@ This **command** shows pretraining with TFDS pipeline, along with evaluation: 
 ```bash
 # replace base_output_directory and dataset_path with your buckets
 python3 -m maxtext.trainers.pre_train.train \
-base_output_directory=gs://runner-maxtext-logs run_name=demo \
+base_output_directory=gs:// run_name=demo \
 model_name=deepseek2-16b per_device_batch_size=1 steps=10 max_target_length=2048 enable_checkpointing=false \
 dataset_type=tfds dataset_path=gs://maxtext-dataset dataset_name='c4/en:3.0.1' train_split=train \
 eval_interval=5 eval_steps=10 eval_dataset_name='c4/en:3.0.1' eval_split=validation \
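As a rough mental model of what the `grain_train_files` glob and `grain_worker_count=2` do together, the pipeline expands the pattern into concrete record files and divides the work among workers. The sketch below only mirrors that idea with a round-robin split over synthetic file names; it is not Grain's actual sharding code.

```python
import glob
import os
import tempfile

# Illustrative only: expand a grain_train_files-style glob and split the
# matching record files round-robin across grain_worker_count workers.
def shard_files(pattern, worker_count):
    files = sorted(glob.glob(pattern))
    return [files[i::worker_count] for i in range(worker_count)]

# Synthetic stand-ins for c4-train.array_record-* shards:
tmp = tempfile.mkdtemp()
for i in range(4):
    open(os.path.join(tmp, f"c4-train.array_record-{i:05d}"), "w").close()

shards = shard_files(os.path.join(tmp, "c4-train.array_record*"), worker_count=2)
# 4 files across 2 workers -> 2 files each, with no overlap.
```

Sorting before splitting keeps the assignment deterministic across hosts, which is the property a real data pipeline needs for reproducible epochs.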