Scoring Metric

Peer summaries will be automatically compared against the reference (model) summaries using the ROUGE toolkit. The evaluation metrics are character-based ROUGE-2 and ROUGE-SU4, each reported as recall and F-measure.
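To make "character-based ROUGE-2" concrete, the following is a minimal sketch of how bigram recall and F-measure could be computed over characters rather than words. It is not the ROUGE toolkit and will not necessarily reproduce the official scores. The function names `char_bigrams` and `rouge2_char` are hypothetical, and the sketch assumes a single reference summary, that whitespace is ignored, and that the F-measure weights precision and recall equally. ROUGE-SU4 (skip-bigrams plus unigrams) is not covered here.

```python
from collections import Counter


def char_bigrams(text):
    """Count adjacent character pairs, ignoring whitespace (an assumption)."""
    chars = [c for c in text if not c.isspace()]
    return Counter(zip(chars, chars[1:]))


def rouge2_char(peer, reference):
    """Return (recall, precision, f_measure) for character-bigram overlap.

    Illustrative only: one reference, balanced F-measure.
    """
    peer_counts = char_bigrams(peer)
    ref_counts = char_bigrams(reference)

    # Clipped overlap: each bigram counts at most as often as it
    # appears in both the peer and the reference.
    overlap = sum((peer_counts & ref_counts).values())
    ref_total = sum(ref_counts.values())
    peer_total = sum(peer_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / peer_total if peer_total else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure


if __name__ == "__main__":
    r, p, f = rouge2_char("abcd", "abce")
    print(f"recall={r:.3f} precision={p:.3f} f={f:.3f}")
```

Working on characters avoids the need for word segmentation, which is why a character-based variant is a common choice for languages written without spaces between words.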


Because of the large amount of computation involved, evaluating your results takes roughly 4 to 8 minutes (depending on your connection quality). Please be patient.