
Reproducing PACO and LVIS #28

Description

@hsilva664

Hello.

I am trying to reproduce the semantic similarity and semantic IoU results for your LVIS and PACO experiments. My results are worse than reported, so I suspect I might be doing something wrong.

I have both the 1.5B and 3B checkpoints and can run the provided notebooks on them, getting outputs similar to the reference ones. I am now trying to reproduce the region image classification results from the paper. I could not find an eval script for them in this repo, so I referred to the Osprey repo, in particular its LVIS and PACO eval script. That script uses BERT to encode both the ground-truth label and the decoded prediction, then compares the two.
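For reference, here is a minimal sketch of how I understand the comparison in those eval scripts. The helper names are mine, the word-level IoU definition is my reading of the semantic IoU metric, and the embeddings here are toy vectors standing in for the BERT sentence encodings:

```python
# Sketch of the two metrics as I understand them (assumptions, not the
# official eval code): semantic IoU as word-set intersection-over-union,
# and semantic similarity as cosine similarity between embeddings.
import math


def semantic_iou(pred: str, gt: str) -> float:
    """Word-level IoU between a predicted and a ground-truth label."""
    pred_words = set(pred.lower().split())
    gt_words = set(gt.lower().split())
    if not pred_words and not gt_words:
        return 0.0
    return len(pred_words & gt_words) / len(pred_words | gt_words)


def cosine_similarity(a, b) -> float:
    """Cosine similarity; in the real script, a and b would be BERT
    sentence embeddings of the prediction and the ground truth."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


print(semantic_iou("sports ball", "ball"))        # 0.5
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```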

To get the numbers, I simply used the prompt from PAM for the recognize_en task, then compared the outputs as in the reference scripts. I used bounding boxes as auxiliary visual prompts, since they are provided in the data. Because both my numbers and my qualitative predictions are much worse than they should be, I must be doing something wrong. Can you help me figure out what? Was there any extra fine-tuning on top of the provided 3B/1.5B checkpoints? Is the generation configuration changed in any way compared to the notebook? Was there any extra preprocessing not in the current repo?
