and background inputs do not lead to better
correlations than those obtained using the update
input only. The best performance from combined
features is given by the linear regression metric.
Although the correlation of this regression feature
with pyramid scores (0.80) is comparable to JS divergence
with update inputs, its correlation with
responsiveness (0.67) is clearly lower. These results
show that the term distributions in the update
input are sufficiently good predictors of content
for update summaries. The role of the background
input appears to be negligible.
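The JS divergence feature discussed above compares the unigram term distribution of the (update) input with that of the summary; lower divergence predicts better content. A minimal sketch of that computation, with illustrative input and summary strings (tokenization here is plain whitespace splitting; the assumption is that stemming and stop-word handling happen upstream):

```python
import math
from collections import Counter

def term_distribution(text):
    """Unigram probability distribution over lowercased tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence between two term distributions.

    JS(P||Q) = 0.5*KL(P||M) + 0.5*KL(Q||M) with M = (P+Q)/2.
    Unlike KL divergence, it is defined even when a term occurs
    in only one of the two texts.
    """
    vocab = set(p) | set(q)
    js = 0.0
    for w in vocab:
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        mw = 0.5 * (pw + qw)
        if pw > 0:
            js += 0.5 * pw * math.log(pw / mw, 2)
        if qw > 0:
            js += 0.5 * qw * math.log(qw / mw, 2)
    return js

# Lower divergence -> the summary's term distribution is closer
# to the input's. Texts below are hypothetical.
update_input = "the storm moved north and flooding continued in the region"
summary = "flooding continued as the storm moved north"
score = js_divergence(term_distribution(update_input),
                      term_distribution(summary))
```

With log base 2 the value is bounded in [0, 1], which makes scores comparable across inputs of different sizes.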
8 Discussion
We have presented a successful framework for
model-free evaluations of content which uses the
input as reference. The power of model-free evaluations
generalizes across at least two summarization
tasks: query focused and update summarization.
We have analyzed a variety of features for input-summary
comparison and demonstrated that the
strength of different features varies considerably.
Similar term distributions in the input and the summary
and diverse use of topic signatures in the
summary are highly indicative of good content.
We also find that preprocessing like stemming improves
the performance of KL and JS divergence
features.
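The topic-signature feature mentioned above rewards summaries that use the input's topic-signature terms diversely. One simple instantiation, assuming the signature set has already been extracted (typically via a log-likelihood ratio test against a background corpus), is the fraction of signature terms the summary covers. The signature set below is hypothetical:

```python
def signature_coverage(summary, topic_signatures):
    """Fraction of the input's topic-signature terms that appear
    in the summary: a rough proxy for diverse signature use."""
    if not topic_signatures:
        return 0.0
    summary_terms = set(summary.lower().split())
    covered = topic_signatures & summary_terms
    return len(covered) / len(topic_signatures)

# Hypothetical signature terms for a flooding-related input.
signatures = {"storm", "flooding", "evacuation", "rainfall"}
print(signature_coverage("flooding worsened as the storm stalled",
                         signatures))  # → 0.5
```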
Very good results were obtained from a correlation
analysis with human judgements, showing
that input can indeed substitute for model summaries
and manual efforts in summary evaluation.
The best correlations were obtained by a single
feature, JS divergence (0.88 with pyramid scores
and 0.73 with responsiveness at system level).
Our best features can therefore be used to evaluate
the content selection performance of systems
in a new domain where model summaries are unavailable.
However, like all other content evaluation
metrics, our features must be accompanied by
judgements of linguistic quality to obtain complete
indicators of summary quality and system
performance. Evidence for this need is provided
by the lower correlations with responsiveness than
with the content-only pyramid evaluations.
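The system-level correlations reported above are obtained by averaging each system's feature value over all inputs and correlating these averages with the systems' manual scores. A sketch of that step, using Pearson correlation and hypothetical per-system numbers (JS divergence is negated so that higher is better on both axes; the paper's exact correlation statistic may differ):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-system averages over all test inputs.
neg_js = [-0.42, -0.35, -0.51, -0.29]   # negated JS divergence
pyramid = [0.55, 0.61, 0.40, 0.70]      # manual pyramid scores
r = pearson(neg_js, pyramid)
```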
The results of our analysis zero in on JS divergence
and topic signature as desirable objectives to
optimize during content selection. On the macro
level, they are powerful predictors of content quality.
These findings again emphasize the need for
always including linguistic quality as a component
of evaluation.
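Using JS divergence as a content-selection objective, as suggested above, can be instantiated as a greedy extractive procedure: repeatedly add the sentence that most reduces the divergence between the summary-so-far and the input. A minimal self-contained sketch, with illustrative sentences and budget (not the paper's own system):

```python
import math
from collections import Counter

def distribution(tokens):
    """Unigram probability distribution over a token list."""
    total = len(tokens)
    counts = Counter(tokens)
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) of two distributions."""
    vocab = set(p) | set(q)
    js = 0.0
    for w in vocab:
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        mw = 0.5 * (pw + qw)
        if pw > 0:
            js += 0.5 * pw * math.log(pw / mw, 2)
        if qw > 0:
            js += 0.5 * qw * math.log(qw / mw, 2)
    return js

def greedy_select(sentences, input_text, max_sentences=3):
    """Greedily pick sentences that most reduce the JS divergence
    between the growing summary and the full input."""
    input_dist = distribution(input_text.lower().split())
    chosen, pool = [], list(sentences)
    while pool and len(chosen) < max_sentences:
        def cost(s):
            tokens = " ".join(chosen + [s]).lower().split()
            return js_divergence(distribution(tokens), input_dist)
        best = min(pool, key=cost)
        chosen.append(best)
        pool.remove(best)
    return chosen
```

Greedy selection is only one way to optimize the objective; as the surrounding text stresses, it says nothing about the linguistic quality of the resulting summary.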
Observations from our input-based evaluation
also have important implications for the design of
novel summarization tasks. We find that high correlations
with manual evaluations are obtained by
comparing query-focused summaries with the entire
input and making no use of the query at all.
Similarly in the update summarization task, the
best predictions of content for update summaries
were obtained using only the update input. The
uncertain role of background inputs and queries
exposes possible problems with the task designs.
Under such conditions, it is not clear whether
query-focused content selection or the ability to
compile updates is appropriately captured by any
evaluation.