For the update summarization task, we experimented with different sets of features. When we averaged feature values from the summary-background and summary-update comparisons, we obtained lower correlations with manual scores than when we used features based only on the update input. The summary-update features also outperform a linear regression metric that combines individual features from the comparisons with the background and update inputs (Table 4). This result is not intuitive given the task definition: the background input is an important factor in the decision to include a particular content unit from the update set of documents. Further analysis is needed to ascertain the relative importance of the two input sets and how best to combine their features.
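To make the contrast concrete, the following is a minimal sketch (not the authors' code; all feature values and scores are invented for illustration) of the two combination strategies: averaging a feature's background and update comparison scores versus weighting the two scores with least-squares linear regression.

```python
import numpy as np

# Hypothetical per-summary feature scores: one value from comparing the
# summary with the background input, one from the update input.
f_background = np.array([0.42, 0.55, 0.31, 0.60])
f_update     = np.array([0.50, 0.48, 0.35, 0.66])
manual_score = np.array([0.45, 0.52, 0.30, 0.70])  # e.g. pyramid scores

# Strategy 1: average the two comparison scores.
avg_feature = (f_background + f_update) / 2.0

# Strategy 2: learn weights (plus an intercept) by least squares.
X = np.column_stack([f_background, f_update, np.ones_like(f_update)])
weights, *_ = np.linalg.lstsq(X, manual_score, rcond=None)
combined = X @ weights

# Correlation of each strategy, and of the update-only feature,
# with the manual scores.
for name, scores in [("average", avg_feature),
                     ("regression", combined),
                     ("update only", f_update)]:
    r = np.corrcoef(scores, manual_score)[0, 1]
    print(f"{name:12s} r = {r:.2f}")
```

In the actual experiments the regression would be fit on held-out data; the sketch only shows the mechanics of the two combinations.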
We also plan to expand our suite of features. Several other distributional similarity functions remain unexplored for our task and would form a readily accessible set of features: Euclidean distance, Jaccard's coefficient, L1 norm, confusion probability, and skew divergence (Lee, 1999).
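As an illustration of what such features would look like, here is a minimal sketch of several of the Lee (1999) measures over unigram probability distributions; the smoothing constant and toy distributions are our own assumptions, not values from the paper.

```python
import numpy as np

def euclidean(p, q):
    return float(np.sqrt(np.sum((p - q) ** 2)))

def l1_norm(p, q):
    return float(np.sum(np.abs(p - q)))

def jaccard(p, q):
    # Set-based overlap of the words with non-zero probability.
    a, b = set(np.nonzero(p)[0]), set(np.nonzero(q)[0])
    return len(a & b) / len(a | b)

def skew_divergence(p, q, alpha=0.99):
    # KL(p || alpha*q + (1-alpha)*p); mixing q with a little of p keeps
    # the divergence finite when q assigns zero probability to a word.
    mix = alpha * q + (1 - alpha) * p
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / mix[mask])))

# Toy distributions over a four-word vocabulary.
p = np.array([0.5, 0.3, 0.2, 0.0])
q = np.array([0.4, 0.2, 0.2, 0.2])
print(euclidean(p, q), l1_norm(p, q), jaccard(p, q), skew_divergence(p, q))
```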
9 Conclusion
Summarization evaluation has always involved human effort, which limits its scale and repeatability. In this paper, we have presented a successful framework for moving towards model-free evaluation, using the input as the reference.
We have analyzed a variety of features for input/summary comparisons and demonstrated that the strengths of different features vary, with certain features better suited for content comparisons. Low divergence from the input and diverse use of topic signatures in the summary are highly indicative of good content. We also find that preprocessing such as stemming helps leverage the capability of some features.
Very good results were obtained from a correlation analysis with human judgements, showing that the input can indeed substitute for model summaries and manual effort in summary evaluation. The best correlations were obtained by a single feature, JS divergence (0.9 with pyramid scores and 0.7 with responsiveness).
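For reference, the JS divergence feature can be computed as in the following sketch, our own minimal illustration over toy token lists rather than the experimental code.

```python
import numpy as np
from collections import Counter

def unigram_dist(tokens, vocab):
    counts = Counter(tokens)
    total = sum(counts.values())
    return np.array([counts[w] / total for w in vocab])

def js_divergence(p, q):
    # JSD(P, Q) = 0.5*KL(P||M) + 0.5*KL(Q||M), with M = (P + Q) / 2.
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

input_tokens   = "the storm hit the coast the storm moved north".split()
summary_tokens = "the storm hit the coast".split()
vocab = sorted(set(input_tokens) | set(summary_tokens))
p = unigram_dist(input_tokens, vocab)
q = unigram_dist(summary_tokens, vocab)
print(f"JS divergence = {js_divergence(p, q):.3f}")  # lower is better
```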
We have shown that the power of model-free evaluation generalizes across at least two summarization tasks: the input proves useful in evaluating both query-focused and update summaries. We have also discussed the questions on optimization and evaluation that arise from this work, along with future directions for input-based evaluation.
References
Breck Baldwin, Robert Donaway, Eduard Hovy, Elizabeth
Liddy, Inderjeet Mani, Daniel Marcu, Kathleen McKeown,
Vibhu Mittal, Marc Moens, Dragomir Radev, Karen Sparck Jones,
Beth Sundheim, Simone Teufel, Ralph Weischedel,
and Michael White. 2000. An Evaluation Road Map for
Summarization Research. The Summarization Roadmap.
Ronald Brandow, Karl Mitze, and Lisa F. Rau. 1995. Automatic
condensation of electronic publications by sentence
selection. Information Processing and Management,
31(5):675–685.
John Conroy, Judith Schlesinger, and Dianne O’Leary. 2006.
Topic-focused multi-document summarization using an approximate
oracle score. In Proceedings of ACL, short paper.
Ido Dagan, Fernando Pereira, and Lillian Lee. 1994.
Similarity-based estimation of word cooccurrence probabilities.
In Proceedings of the 32nd annual meeting of the Association
for Computational Linguistics, pages 272–278.
Robert L. Donaway, Kevin W. Drummey, and Laura A. Mather.
2000. A comparison of rankings produced by summarization
evaluation measures. In NAACL-ANLP Workshop on Automatic
Summarization.
Donna Harman and Paul Over. 2004. The effects of human
variation in DUC summarization evaluation. In ACL Text
Summarization Branches Out Workshop.
Martin Hassel and Jonas Sjöbergh. 2006. Towards holistic
summarization: Selecting summaries, not sentences. In Proceedings
of LREC 2006, Genoa, Italy.
Hongyan Jing, Regina Barzilay, Kathleen McKeown, and
Michael Elhadad. 1998. Summarization evaluation methods:
Experiments and analysis. In AAAI Symposium on Intelligent
Summarization.
Mirella Lapata and Regina Barzilay. 2005. Automatic evaluation
of text coherence: Models and representations. In IJCAI'05.
Maria Lapata. 2000. The automatic interpretation of nominalizations.
In Proceedings of the Seventeenth National Conference
on Artificial Intelligence and Twelfth Conference on
Innovative Applications of Artificial Intelligence, pages 716–
721.
Lillian Lee. 1999. Measures of distributional similarity. In
Proceedings of the 37th annual meeting of the Association
for Computational Linguistics, pages 25–32.