I use a multi-step process to rank papers based on OpenReview data:
Review Data Collection:
I start by gathering all available reviews for each paper at the specified venue from the OpenReview platform. These include numerical scores (such as overall rating, confidence, and soundness) and any textual comments from the reviewers. Each review is linked to a specific paper and reviewer.
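One way to hold the collected reviews in memory is sketched below. The `Review` record and its field names are hypothetical, and the snippet does not call the OpenReview API; it only shows reviews tied to a paper and a reviewer, with per-aspect scores and an optional comment, grouped by paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    """One collected review (hypothetical record layout, not OpenReview's schema)."""
    paper_id: str
    reviewer_id: str
    # Numerical scores keyed by aspect name, e.g. {"rating": 6, "soundness": 3}.
    scores: dict
    comment: str = ""


def group_by_paper(reviews):
    """Collect the reviews that belong to each paper."""
    by_paper = defaultdict(list)
    for review in reviews:
        by_paper[review.paper_id].append(review)
    return dict(by_paper)


reviews = [
    Review("P1", "R1", {"rating": 6, "soundness": 3}),
    Review("P1", "R2", {"rating": 8, "soundness": 4}),
    Review("P2", "R1", {"rating": 3, "soundness": 2}, "Needs more experiments."),
]
grouped = group_by_paper(reviews)
print({pid: len(rs) for pid, rs in grouped.items()})  # {'P1': 2, 'P2': 1}
```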
Normalization with Z-scores:
Reviewers often have different grading tendencies. Some might consistently give high scores, while others are more critical. To make the scores comparable, I use a technique called z-score normalization. A z-score tells you how far away a particular score is from the average score, measured in units of standard deviation. The standard deviation represents the typical spread or variability in the scores.
$$ z = \frac{x - \mu_a}{\sigma_a} $$

where x is the reviewer's score for aspect a, μ_a is the average score for that aspect, and σ_a is the standard deviation of scores for that aspect.
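A minimal sketch of the z-score step follows, assuming reviews are given as `(paper_id, {aspect: score})` pairs and that μ and σ for an aspect are pooled over every review reporting that aspect. The use of the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumption, as is mapping a zero spread to a z-score of 0.

```python
from statistics import mean, pstdev


def zscore_reviews(reviews):
    """Replace each raw score with its z-score for that aspect.

    reviews: list of (paper_id, {aspect: score}) pairs, one per review.
    """
    aspects = {a for _, scores in reviews for a in scores}
    stats = {}
    for a in aspects:
        values = [scores[a] for _, scores in reviews if a in scores]
        stats[a] = (mean(values), pstdev(values))

    normalized = []
    for paper_id, scores in reviews:
        z = {}
        for a, x in scores.items():
            mu, sigma = stats[a]
            # Assumed guard: if every reviewer gave the same score, z is 0.
            z[a] = (x - mu) / sigma if sigma > 0 else 0.0
        normalized.append((paper_id, z))
    return normalized


reviews = [
    ("P1", {"rating": 6, "soundness": 3}),
    ("P1", {"rating": 8, "soundness": 4}),
    ("P2", {"rating": 4, "soundness": 2}),
]
normalized = zscore_reviews(reviews)
print(normalized[1])  # P1's second review: both z-scores are about 1.22
```

Because the statistics here are pooled per aspect, an individual reviewer's leniency is not removed; correcting for that would mean computing μ and σ per reviewer instead, a variation not described above.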
Aggregation of Normalized Scores:
After calculating z-scores for each aspect of each review, I combine the scores for each paper. For each aspect (like "rating" or "confidence"), I average the z-scores across all reviewers who reviewed that paper. This gives me a single, aggregated z-score for each aspect, for each paper.
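The aggregation step can be sketched as a per-paper, per-aspect average. The input format (one `(paper_id, {aspect: z})` pair per review) and the sample values are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_paper(normalized):
    """Average z-scores for each (paper, aspect) pair.

    normalized: list of (paper_id, {aspect: z}) pairs, one per review.
    Returns {paper_id: {aspect: mean z across that paper's reviewers}}.
    """
    collected = defaultdict(lambda: defaultdict(list))
    for paper_id, zs in normalized:
        for aspect, z in zs.items():
            collected[paper_id][aspect].append(z)
    return {
        paper_id: {aspect: mean(values) for aspect, values in aspects.items()}
        for paper_id, aspects in collected.items()
    }


normalized = [
    ("P1", {"rating": 0.0, "soundness": 0.5}),
    ("P1", {"rating": 1.0, "soundness": 1.5}),
    ("P2", {"rating": -1.0, "soundness": -2.0}),
]
result = aggregate_by_paper(normalized)
print(result)
# {'P1': {'rating': 0.5, 'soundness': 1.0}, 'P2': {'rating': -1.0, 'soundness': -2.0}}
```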
Beta Distribution Modeling:
To represent a paper's quality in each aspect while acknowledging the uncertainty inherent in the review process, I model each aggregated, normalized score using a Beta distribution.
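For each aspect, I write t for transformed_z, a transformed version of that aspect's aggregated z-score (the 1 − t term implies that t lies between 0 and 1), and set the parameters as:

$$ \alpha = 1 + t, \qquad \beta = 1 + (1 - t) $$

The mean of this Beta distribution is then calculated as:

$$ \frac{\alpha}{\alpha + \beta} $$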
Final Score Calculation:
To arrive at a single final score for each paper, I multiply together the means of the Beta distributions calculated for each aspect (e.g., rating, soundness, contribution). This combines the individual aspect scores into one overall quality metric. Because the scores are multiplied rather than averaged, a paper must score well in every aspect to receive a high final score; a low score in even one aspect pulls the product down substantially.
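A minimal sketch of the product and the resulting ranking follows. The function names and the per-aspect Beta means are made-up illustrative values. The two example papers have the same arithmetic mean across aspects (0.55), yet the product ranks the balanced one higher, which is the behaviour described above.

```python
import math


def final_score(aspect_means):
    """Multiply the per-aspect Beta means into one overall score."""
    return math.prod(aspect_means.values())


def rank_papers(papers):
    """Return (paper_id, final score) pairs sorted from highest to lowest.

    papers: {paper_id: {aspect: Beta mean}}.
    """
    scores = {pid: final_score(means) for pid, means in papers.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Same arithmetic mean (0.55), but one paper has a weak aspect.
balanced = {"rating": 0.55, "soundness": 0.55, "contribution": 0.55}
uneven = {"rating": 0.65, "soundness": 0.65, "contribution": 0.35}
ranking = rank_papers({"P_balanced": balanced, "P_uneven": uneven})
print(ranking)  # P_balanced (about 0.166) ranks above P_uneven (about 0.148)
```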