I think the setup script in the SayCan Robot Pick and Place Tabletop Environment notebook loads pre-trained ViLD weights.
https://github.com/google-research/google-research/blob/master/saycan/SayCan-Robot-Pick-Place.ipynb
The cell labeled "ViLD pretrained model weights." runs:
!gsutil cp -r gs://cloud-tpu-checkpoints/detection/projects/vild/colab/image_path_v2 ./
Are the weights here pre-trained by Google Research?
I am asking because I could not find any weights named image_path_v2 in the official ViLD GitHub repository:
https://github.com/tensorflow/tpu/tree/master/models/official/detection/projects/vild
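For reference, it should be possible to inspect both the bucket and the downloaded artifact directly. A minimal sketch, assuming the bucket allows anonymous reads and that image_path_v2 is a TF SavedModel directory (the notebook appears to load it as one):

# List what else is hosted alongside image_path_v2 in the checkpoint bucket:
!gsutil ls gs://cloud-tpu-checkpoints/detection/projects/vild/colab/

# Show the SavedModel's tag-sets, signatures, and tensor names
# (saved_model_cli ships with TensorFlow):
!saved_model_cli show --dir ./image_path_v2 --all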