Evaluation Protocol for synchronization accuracy in Perfect Match Paper

Hello,

I have a few questions regarding the 75.8% synchronization accuracy reported in the Perfect Match paper: https://ieeexplore.ieee.org/abstract/document/9067055/

Perfect Match evaluation protocol, as I understand it from the paper: the task is to determine the correct synchronisation within a ±15 frame window, and the synchronisation is considered correct if the predicted offset is within 1 video frame of the ground truth. A random prediction would therefore yield 9.7% accuracy.
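To make sure I am reading the protocol correctly, here is a minimal sketch of how I understand the chance level and the accuracy metric. It assumes a uniform random guess over the 31 candidate offsets in the ±15 frame window and a ground-truth offset that is not at the edge of the window. The function names `chance_accuracy` and `sync_accuracy` are my own, not from the paper or the repository.

```python
from typing import Sequence


def chance_accuracy(max_offset: int = 15, tolerance: int = 1) -> float:
    """Accuracy of a uniformly random offset guess (assumed setup).

    Candidates are the integer offsets in [-max_offset, +max_offset];
    a guess counts as correct if it lies within `tolerance` frames of
    the ground truth, which is assumed not to sit at the window edge.
    """
    num_candidates = 2 * max_offset + 1   # 31 offsets for a +/-15 window
    num_correct = 2 * tolerance + 1       # 3 offsets within +/-1 frame
    return num_correct / num_candidates


def sync_accuracy(predicted: Sequence[int],
                  ground_truth: Sequence[int],
                  tolerance: int = 1) -> float:
    """Fraction of clips whose predicted offset is within `tolerance` frames."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - g) <= tolerance for p, g in zip(predicted, ground_truth))
    return hits / len(predicted)


if __name__ == "__main__":
    print(f"chance level: {chance_accuracy() * 100:.1f}%")
```

Under these assumptions the chance level is 3/31, which matches the 9.7% quoted above. Please correct me if the actual evaluation counts candidates or tolerances differently.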

1. How does changing M affect the model?
2. Training is a 46-way classification. How exactly do you go from the 46-way classification to a prediction over the ±15 frame window?
3. Do you have the class split for your evaluation data? Aren't all the test samples in sync? Where do you get out-of-sync ground-truth frames from?
4. The accuracy for N-way classification reported [here](https://arxiv.org/pdf/2002.08742.pdf) is 49%, but your numbers are much higher. What explains the large discrepancy between the two?
5. The visual stream uses whole-face pixels and not just mouth crops. Is that correct?

Thank you!
