Thank you for your work. I would like to inquire about the human rating section. Do rater1, rater2, and rater3 each represent scores given by three different people on the same image?
Regarding the computation of inter-rater Spearman correlation: your code appears to calculate the Spearman correlation between one rater's scores and the average of the other two raters' scores. Is that correct?
Since the human ratings are fixed, I believe the inter-rater Spearman correlation should also be a fixed value. However, when I computed the Spearman correlation for the SC score on the Text-Guided_IG task, I obtained 0.55178, whereas the value reported in your paper is 0.5044. Could you kindly clarify this discrepancy?
If possible, would you mind sharing a brief implementation snippet for this part of the calculation?
Here is the code I used:

```python
import ast
import os

import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def inter_rater_rho(human_ratings_dir, model):
    # Load the three raters' TSVs for the Text-Guided_IG task.
    dfs = [
        pd.read_csv(
            os.path.join(human_ratings_dir, f"Text-Guided_IG_rater{i}.tsv"), sep="\t"
        )
        for i in (1, 2, 3)
    ]
    samples = dfs[0]["uid"].tolist()

    # Collect the SC score (first element of the stored list) from each rater.
    score_dict = {}
    for sample in samples:
        try:
            scores = [
                ast.literal_eval(df.loc[df["uid"] == sample, model].values[0])[0]
                for df in dfs
            ]
        except (IndexError, ValueError, SyntaxError):
            continue  # skip samples missing or malformed for any rater
        score_dict[sample] = scores

    def get_rho(target_idx):
        # Spearman correlation between one rater and the mean of the other two.
        target_scores, avg_other_scores = [], []
        for scores in score_dict.values():
            target_scores.append(scores[target_idx])
            avg_other_scores.append(
                np.mean([scores[i] for i in range(3) if i != target_idx])
            )
        if len(target_scores) < 2:
            return np.nan
        rho, _ = spearmanr(target_scores, avg_other_scores)
        return rho

    avg_rho = np.nanmean([get_rho(i) for i in range(3)])
    print(f"Inter-rater Spearman ρ (each vs others' avg, SC only): {avg_rho:.4f}")
    return avg_rho
```
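For clarity, here is a minimal self-contained sketch of the "each rater vs. the average of the other two" computation I am assuming, run on made-up toy ratings (the values are purely illustrative, not from the dataset):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical ratings: 3 raters x 5 samples.
ratings = np.array([
    [1.0, 0.5, 1.0, 0.0, 0.5],  # rater 1
    [1.0, 0.5, 0.5, 0.0, 1.0],  # rater 2
    [0.5, 1.0, 1.0, 0.0, 0.5],  # rater 3
])

rhos = []
for target in range(3):
    # Mean of the two raters other than `target`, per sample.
    others_avg = np.delete(ratings, target, axis=0).mean(axis=0)
    rho, _ = spearmanr(ratings[target], others_avg)
    rhos.append(rho)

avg_rho = np.nanmean(rhos)
print(f"average each-vs-others rho: {avg_rho:.4f}")
```

If this matches your intended definition, the remaining difference may come from how ties or skipped samples are handled, which is exactly what I would like to confirm.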