Reproducing PACO and LVIS #28
Description
Hello.
I am trying to reproduce the semantic similarity and semantic IoU results for your LVIS and PACO experiments. My results are worse than reported, so I suspect I might be doing something wrong.
I have both the 1.5B and 3B checkpoints and can run the notebooks on them, getting outputs similar to the provided ones. I am trying to reproduce the paper's region image classification results. I could not find an eval script for them, so I referred to the Osprey repo, in particular its LVIS and PACO eval script. That script uses BERT to encode both the ground-truth label and the decoded generated prediction, then compares the two.
To get the numbers, I simply used the prompt from PAM for the recognize_en task, then compared the outputs as in the reference scripts, using the bounding boxes provided in the data as auxiliary visual prompts. Since both my numbers and the qualitative predictions are much worse than expected, I must be doing something wrong. Can you help me figure out what? Specifically:

- Was there any extra fine-tuning on top of the provided 3B/1.5B checkpoints?
- Is the generation configuration changed in any way compared to the notebook?
- Was there any extra preprocessing not in the current repo?
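For reference, this is roughly how I compute semantic IoU in my eval, based on my reading of the Osprey scripts (the function name and tokenization details are my own, so this may be where I diverge from your setup):

```python
# Sketch of the semantic-IoU metric as I understand it from the Osprey
# eval scripts: word-set IoU between prediction and ground-truth name.
def semantic_iou(pred: str, gt: str) -> float:
    """IoU between the word sets of a predicted label and the GT category."""
    pred_words = set(pred.lower().replace("_", " ").split())
    gt_words = set(gt.lower().replace("_", " ").split())
    if not pred_words and not gt_words:
        return 1.0
    return len(pred_words & gt_words) / len(pred_words | gt_words)

# For semantic similarity I encode both strings with a BERT-style sentence
# encoder and take cosine similarity, e.g. (assumed, not confirmed to match
# the paper's exact model):
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   sim = cos_sim(model.encode(pred), model.encode(gt))
```

If my metric implementation differs from yours, that alone could explain part of the gap, so please correct me if this is not equivalent.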