human_ratings #4

Description

@kyleeea

Thank you for your work. I have a question about the human ratings section. Do rater1, rater2, and rater3 each represent scores given by a different person for the same image?

Regarding the computation of the inter-rater Spearman correlation: from your code, it appears to be calculated as the Spearman correlation between one rater's scores and the average of the other two raters' scores. Is that correct?
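For concreteness, here is a minimal sketch of that scheme as I understand it, on made-up toy scores (three raters, five samples; nothing here comes from the actual data):

    import numpy as np
    from scipy.stats import spearmanr

    # Hypothetical ratings: rows are samples, columns are the three raters.
    scores = np.array([
        [1.0, 1.0, 0.5],
        [0.5, 1.0, 0.5],
        [0.0, 0.5, 0.0],
        [1.0, 0.5, 1.0],
        [0.5, 0.0, 0.5],
    ])

    rhos = []
    for r in range(3):
        # Correlate rater r against the average of the other two raters.
        others = np.delete(scores, r, axis=1).mean(axis=1)
        rho, _ = spearmanr(scores[:, r], others)
        rhos.append(rho)

    print(np.mean(rhos))  # the reported inter-rater ρ, if I read the code correctly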

Since the human ratings are fixed, I believe the inter-rater Spearman correlation should also be a fixed value. However, when I computed the Spearman correlation for the SC score under the Text-Guided_IG task, I obtained a value of 0.55178, whereas the value reported in your paper is 0.5044. Could you kindly clarify this discrepancy?
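One guess on my part (possibly not the whole story): the final number can depend on how the three per-rater correlations are aggregated. For instance, averaging through Fisher's z-transform instead of taking a plain mean gives a slightly different value. The numbers below are hypothetical:

    import numpy as np

    rhos = [0.50, 0.55, 0.60]  # hypothetical per-rater Spearman correlations

    plain_mean = np.mean(rhos)                         # 0.55
    fisher_mean = np.tanh(np.mean(np.arctanh(rhos)))   # ≈ 0.5513

Other preprocessing choices (e.g. which samples get dropped by the try/except in my code below) could also shift the result, so I may simply be filtering differently than you do.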

If possible, would you mind sharing a brief implementation snippet for this part of the calculation?

Here is the code I used for the calculation:

    import ast
    import os

    import numpy as np
    import pandas as pd
    from scipy.stats import spearmanr


    def inter_rater_rho(human_ratings_dir, task, model):
        # e.g. task = "ImagenHub_Text-Guided_IG" -> "Text-Guided_IG"
        task_name = task.replace("ImagenHub_", "")

        # One TSV per rater; rows are samples (uid), columns are models.
        df1 = pd.read_csv(os.path.join(human_ratings_dir, f"{task_name}_rater1.tsv"), sep="\t")
        df2 = pd.read_csv(os.path.join(human_ratings_dir, f"{task_name}_rater2.tsv"), sep="\t")
        df3 = pd.read_csv(os.path.join(human_ratings_dir, f"{task_name}_rater3.tsv"), sep="\t")

        samples = df1["uid"].tolist()

        # Collect each rater's SC score (first element of the rating tuple)
        # for every sample that all three raters scored.
        score_dict = {}
        for sample in samples:
            try:
                s1 = ast.literal_eval(df1.loc[df1["uid"] == sample, model].values[0])[0]
                s2 = ast.literal_eval(df2.loc[df2["uid"] == sample, model].values[0])[0]
                s3 = ast.literal_eval(df3.loc[df3["uid"] == sample, model].values[0])[0]
                score_dict[sample] = [s1, s2, s3]
            except (IndexError, ValueError, SyntaxError):
                # Skip samples missing from a rater's file or with unparsable entries.
                continue

        def get_rho(target_idx):
            # Spearman correlation between one rater's scores and the
            # average of the other two raters' scores.
            target_scores = []
            avg_other_scores = []
            for scores in score_dict.values():
                target_scores.append(scores[target_idx])
                avg_other_scores.append(
                    np.mean([scores[i] for i in range(3) if i != target_idx])
                )
            if len(target_scores) < 2:
                return np.nan
            rho, _ = spearmanr(target_scores, avg_other_scores)
            return rho

        # Plain mean of the three leave-one-out correlations.
        avg_rho = np.nanmean([get_rho(i) for i in range(3)])
        print(f"Inter-rater Spearman ρ (each vs others' avg, SC only): {avg_rho:.4f}")
        return avg_rho
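
For reference, a hypothetical call of the function above (the directory path and model column name are placeholders, not taken from your repo):

    avg_rho = inter_rater_rho(
        human_ratings_dir="human_ratings/",   # placeholder path
        task="ImagenHub_Text-Guided_IG",
        model="SD",                           # hypothetical model column name
    )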
