@@ -50,7 +50,7 @@ data:
--kv_cache_free_gpu_mem_fraction 0.95 > $output_file

cat $output_file
- gsutil cp $output_file /gcs/benchmark_logs/
+ gcloud storage cp $output_file /gcs/benchmark_logs/

rm -rf $engine_dir
rm -f $dataset_file
2 changes: 1 addition & 1 deletion src/launchers/trtllm-launcher.sh
@@ -213,7 +213,7 @@ run_benchmark() {
fi

cat $output_file
- gsutil cp $output_file /gcs/benchmark_logs/trtllm/
+ gcloud storage cp $output_file /gcs/benchmark_logs/trtllm/

rm -rf $engine_dir
rm -f $dataset_file
5 changes: 2 additions & 3 deletions src/utils/data_processing/waymo_dataset/README.md
@@ -26,7 +26,7 @@ Before running the script, ensure you have the following prerequisites installed

#### Google Cloud SDK

- The `gsutil` command-line tool is required to download the dataset from Google Cloud Storage.
+ The `gcloud storage` command-line tool is required to download the dataset from Google Cloud Storage.

1. Install the Google Cloud SDK.
2. Authenticate with Google Cloud:
@@ -103,12 +103,11 @@ print(processed_dataset[0])

### 5. Common Issues

- 1. **`gsutil` Command Not Found**: This error occurs if the Google Cloud SDK is not installed or not in your system's `PATH`. Please follow the installation instructions in the Prerequisites section.
+ 1. **`gcloud storage` Command Not Found**: This error occurs if the Google Cloud SDK is not installed or not in your system's `PATH`. Please follow the installation instructions in the Prerequisites section.

2. **GCS Access Denied / 401 Errors**: This indicates an authentication or permission issue.
- Ensure you have registered for the Waymo dataset.
- Run `gcloud auth login` and `gcloud auth application-default login` to authenticate.
- Make sure your GCP user or service account has `Storage Object Viewer` permissions on the `gs://waymo_open_dataset_v_2_0_1/` bucket.

3. **Corrupted Files**: If a specific Parquet file fails to process, it might be corrupted. The script is designed to be robust and will log an error and skip the corrupted segment, continuing with the rest of the data.
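
Reviewer note: the first issue in the list above (command not found) could be detected before any download is attempted. The following is a minimal sketch, not part of this PR; the helper names `find_gcloud` and `describe_gcloud_status` are hypothetical, and it only checks that an executable named `gcloud` is resolvable on `PATH`, not that the user is authenticated.

```python
import shutil
from typing import Optional


def find_gcloud(executable: str = "gcloud") -> Optional[str]:
    """Return the absolute path of `executable`, or None if it is not on PATH."""
    return shutil.which(executable)


def describe_gcloud_status(executable: str = "gcloud") -> str:
    """Build a human-readable message about whether the CLI is available."""
    path = find_gcloud(executable)
    if path is None:
        # Mirrors the README advice: install the Google Cloud SDK or fix PATH.
        return f"{executable} not found on PATH; install the Google Cloud SDK"
    return f"{executable} found at {path}"


if __name__ == "__main__":
    print(describe_gcloud_status())
```

Calling a check like this at the start of the download step would turn a late `FileNotFoundError` from `subprocess` into an early, descriptive message.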

@@ -153,7 +153,7 @@ def _download_dataset_locally(input_dir: str):
# If PARQUET_ID is empty, download all parquets in the directory
source_for_gsutil = os.path.join(remote_path_item, "*.parquet")

- gsutil_command = ["gsutil", "-m", "cp", "-r", source_for_gsutil, local_path_dir]
+ gsutil_command = ["gcloud", "storage", "cp", "--recursive", source_for_gsutil, local_path_dir]

logger.info(
f"[DATALOADER] Downloading dataset. Command: {' '.join(gsutil_command)}"
@@ -164,9 +164,9 @@ def _download_dataset_locally(input_dir: str):
)
logger.info(f"[DATALOADER] Successfully downloaded to {local_path_dir}.")
if result.stdout:
- logger.info(f"[DATALOADER] gsutil stdout: {result.stdout}")
+ logger.info(f"[DATALOADER] gcloud stdout: {result.stdout}")

Reviewer: nit: should this be "gcloud storage"?

Author: gcloud should be fine

if result.stderr: # gsutil often prints status to stderr even on success
- logger.info(f"[DATALOADER] gsutil stderr: {result.stderr}")
+ logger.info(f"[DATALOADER] gcloud stderr: {result.stderr}")
except subprocess.CalledProcessError as e:
logger.error(
f"[Fatal][DATALOADER] Failed to download from {source_for_gsutil} to {local_path_dir}. "
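
Reviewer note: the hunks above show only fragments of `_download_dataset_locally`, so here is a self-contained sketch of the pattern they imply: build the `gcloud storage cp` argument list, run it, and capture stdout and stderr for logging. The function names `build_download_command` and `run_download` and the bucket path in the test are hypothetical, and the exact `subprocess.run` arguments used in the real file are not visible in this diff, so treat them as assumptions. The `-m` flag from the old `gsutil` command has no counterpart here because `gcloud storage` parallelizes transfers by default.

```python
import subprocess
from typing import List


def build_download_command(remote_dir: str, local_dir: str) -> List[str]:
    """Build the argument list for copying every Parquet file under remote_dir."""
    # Join with "/" explicitly: gs:// URLs always use forward slashes,
    # whereas os.path.join would use the platform separator.
    source = remote_dir.rstrip("/") + "/*.parquet"
    return ["gcloud", "storage", "cp", "--recursive", source, local_dir]


def run_download(command: List[str]) -> subprocess.CompletedProcess:
    """Run the command, raising CalledProcessError on a non-zero exit code."""
    # capture_output + text give str stdout/stderr that can be logged afterwards.
    return subprocess.run(command, check=True, capture_output=True, text=True)
```

Passing the command as a list rather than a single shell string avoids shell quoting issues, and the wildcard is interpreted by `gcloud storage` itself rather than by a local shell.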