human_ratings #4

Description

@kyleeea

Thank you for your work. I have a question about the human ratings section. Do rater1, rater2, and rater3 each represent scores given by a different person for the same image?

Regarding the computation of the inter-rater Spearman correlation: from your code, it appears to be calculated as the Spearman correlation between one rater's scores and the average of the other two raters' scores. Is that correct?
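For concreteness, here is a minimal sketch of that scheme as I understand it, on made-up toy scores (three raters, five samples; nothing here comes from the actual data):

    import numpy as np
    from scipy.stats import spearmanr

    # Hypothetical ratings: rows are samples, columns are the three raters.
    scores = np.array([
        [1.0, 1.0, 0.5],
        [0.5, 1.0, 0.5],
        [0.0, 0.5, 0.0],
        [1.0, 0.5, 1.0],
        [0.5, 0.0, 0.5],
    ])

    rhos = []
    for r in range(3):
        # Correlate rater r against the average of the other two raters.
        others = np.delete(scores, r, axis=1).mean(axis=1)
        rho, _ = spearmanr(scores[:, r], others)
        rhos.append(rho)

    print(np.mean(rhos))  # the reported inter-rater ρ, if I read the code correctly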

Since the human ratings are fixed, I believe the inter-rater Spearman correlation should also be a fixed value. However, when I computed the Spearman correlation for the SC score under the Text-Guided_IG task, I obtained a value of 0.55178, whereas the value reported in your paper is 0.5044. Could you kindly clarify this discrepancy?
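One guess on my part (possibly not the whole story): the final number can depend on how the three per-rater correlations are aggregated. For instance, averaging through Fisher's z-transform instead of taking a plain mean gives a slightly different value. The numbers below are hypothetical:

    import numpy as np

    rhos = [0.50, 0.55, 0.60]  # hypothetical per-rater Spearman correlations

    plain_mean = np.mean(rhos)                         # 0.55
    fisher_mean = np.tanh(np.mean(np.arctanh(rhos)))   # ≈ 0.5513

Other preprocessing choices (e.g. which samples get dropped by the try/except in my code below) could also shift the result, so I may simply be filtering differently than you do.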

If possible, would you mind sharing a brief implementation snippet for this part of the calculation?

Here is the code I used for the calculation:

    import ast
    import os

    import numpy as np
    import pandas as pd
    from scipy.stats import spearmanr


    def inter_rater_rho(human_ratings_dir, task, model):
        # e.g. task = "ImagenHub_Text-Guided_IG" -> "Text-Guided_IG"
        task_name = task.replace("ImagenHub_", "")

        # One TSV per rater; rows are samples (uid), columns are models.
        df1 = pd.read_csv(os.path.join(human_ratings_dir, f"{task_name}_rater1.tsv"), sep="\t")
        df2 = pd.read_csv(os.path.join(human_ratings_dir, f"{task_name}_rater2.tsv"), sep="\t")
        df3 = pd.read_csv(os.path.join(human_ratings_dir, f"{task_name}_rater3.tsv"), sep="\t")

        samples = df1["uid"].tolist()

        # Collect each rater's SC score (first element of the rating tuple)
        # for every sample that all three raters scored.
        score_dict = {}
        for sample in samples:
            try:
                s1 = ast.literal_eval(df1.loc[df1["uid"] == sample, model].values[0])[0]
                s2 = ast.literal_eval(df2.loc[df2["uid"] == sample, model].values[0])[0]
                s3 = ast.literal_eval(df3.loc[df3["uid"] == sample, model].values[0])[0]
                score_dict[sample] = [s1, s2, s3]
            except (IndexError, ValueError, SyntaxError):
                # Skip samples missing from a rater's file or with unparsable entries.
                continue

        def get_rho(target_idx):
            # Spearman correlation between one rater's scores and the
            # average of the other two raters' scores.
            target_scores = []
            avg_other_scores = []
            for scores in score_dict.values():
                target_scores.append(scores[target_idx])
                avg_other_scores.append(
                    np.mean([scores[i] for i in range(3) if i != target_idx])
                )
            if len(target_scores) < 2:
                return np.nan
            rho, _ = spearmanr(target_scores, avg_other_scores)
            return rho

        # Plain mean of the three leave-one-out correlations.
        avg_rho = np.nanmean([get_rho(i) for i in range(3)])
        print(f"Inter-rater Spearman ρ (each vs others' avg, SC only): {avg_rho:.4f}")
        return avg_rho
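
For reference, a hypothetical call of the function above (the directory path and model column name are placeholders, not taken from your repo):

    avg_rho = inter_rater_rho(
        human_ratings_dir="human_ratings/",   # placeholder path
        task="ImagenHub_Text-Guided_IG",
        model="SD",                           # hypothetical model column name
    )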
