Reproducing PACO and LVIS #28
Description
Hello.
I am trying to reproduce the semantic similarity and semantic IoU results for your LVIS and PACO experiments. My results are worse than reported, so I suspect I might be doing something wrong.
I have both the 1.5B and 3B checkpoints and can run the notebooks on them, getting outputs similar to the provided ones. I am trying to reproduce the paper's region image classification results. I could not find an eval script for them, so I referred to the Osprey repo, in particular its LVIS and PACO eval script. That script uses BERT to encode both the ground-truth label and the decoded generated prediction, then compares the two.
To get the numbers, I simply used the prompt from PAM for the recognize_en task, then compared the outputs as in the reference scripts, using the bounding boxes provided in the data as auxiliary visual prompts. Since both my numbers and the qualitative predictions are much worse than expected, I must be doing something wrong. Can you help me figure out what? Specifically:

- Was there any extra fine-tuning on top of the provided 3B/1.5B checkpoints?
- Is the generation configuration changed in any way compared to the notebook?
- Was there any extra preprocessing not in the current repo?
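For reference, this is roughly how I compute semantic IoU in my eval, based on my reading of the Osprey scripts (the function name and tokenization details are my own, so this may be where I diverge from your setup):

```python
# Sketch of the semantic-IoU metric as I understand it from the Osprey
# eval scripts: word-set IoU between prediction and ground-truth name.
def semantic_iou(pred: str, gt: str) -> float:
    """IoU between the word sets of a predicted label and the GT category."""
    pred_words = set(pred.lower().replace("_", " ").split())
    gt_words = set(gt.lower().replace("_", " ").split())
    if not pred_words and not gt_words:
        return 1.0
    return len(pred_words & gt_words) / len(pred_words | gt_words)

# For semantic similarity I encode both strings with a BERT-style sentence
# encoder and take cosine similarity, e.g. (assumed, not confirmed to match
# the paper's exact model):
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   sim = cos_sim(model.encode(pred), model.encode(gt))
```

If my metric implementation differs from yours, that alone could explain part of the gap, so please correct me if this is not equivalent.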