those reported in [20] is not possible. However, the lists of
top attributes shown in Table 3 for City and River can be
compared against their equivalent lists reported in [20] for
the classes Town and River, shown below for reference:
Town: [population, history, home page, sightseeing, info,
finance, facility, heritage, environment, hot spring];
River: [water level, upstream, name, environment, water
quality, history, head stream, picture, water, surface] [20].
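As a rough illustration of such a side-by-side comparison, the Python sketch
below measures the overlap between a top-ranked attribute list and the
corresponding reference list from [20]. The overlap metric, the helper
overlap_at_k, and the placeholder names for the Table 3 lists are assumptions
made here for illustration; they are not the comparison procedure used in the
paper.

def overlap_at_k(extracted, reference, k=10):
    # Fraction of the top-k extracted attributes that also appear among the
    # top-k reference attributes, after lowercasing.
    top_e = {a.lower() for a in extracted[:k]}
    top_r = {a.lower() for a in reference[:k]}
    return len(top_e & top_r) / float(k)

# Reference lists copied from the text above (reported in [20]):
town_ref = ["population", "history", "home page", "sightseeing", "info",
            "finance", "facility", "heritage", "environment", "hot spring"]
river_ref = ["water level", "upstream", "name", "environment", "water quality",
             "history", "head stream", "picture", "water", "surface"]

# The ranked lists from Table 3 (e.g., city_attrs, river_attrs) are not
# reproduced here and would need to be supplied, e.g.:
#   overlap_at_k(city_attrs, town_ref)
#   overlap_at_k(river_attrs, river_ref)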
Although query logs have received much attention in the task
of improving information retrieval, they have been explored as a
resource for acquiring explicit relations in information ex-
traction only recently [13]. Comparatively, the attribute ex-
traction method introduced here replaces handcrafted pat-
terns with seed attributes, pursues a larger-scale evaluation
over 40 instead of only 5 target classes, and operates with
significantly higher accuracy as described in Section 3.
5. CONCLUSION
Traditional wisdom suggests that textual documents tend
to assert information (statements or facts) about the world
in the form of expository text. Comparatively, search queries
can be thought of as being nothing more than noisy, keyword-
based approximations of often-underspecified user informa-
tion needs (interrogations). Despite this apparent disad-
vantage, and in a departure from previous approaches to
large-scale information extraction from the Web, this pa-
per introduces a weakly-supervised extraction framework for
mining useful knowledge from query logs, rather than Web
documents. The framework lends itself to a concrete Web
mining task, namely class attribute extraction. In an eval-
uation of attributes extracted for a variety of domains of
interest to Web search users, the quality of the resulting at-
tributes exceeds previously reported results by 25% at rank
10, and 43% at rank 50, thus holding the promise of a new
path in research in information extraction from query logs.
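For concreteness, precision at a given rank over a list of extracted
attributes can be computed as in the Python sketch below. The three-way
judgment labels and their fractional credit are an assumed scoring scheme
introduced here for illustration; the evaluation protocol actually used is
the one described in Section 3.

def precision_at_rank(judgments, k):
    # judgments: ranked list of per-attribute labels for one target class.
    # The credit values (1.0 / 0.5 / 0.0) are an assumed scheme.
    credit = {"vital": 1.0, "okay": 0.5, "wrong": 0.0}
    return sum(credit[label] for label in judgments[:k]) / float(k)

# Hypothetical judgments for the top 10 attributes of one target class:
judged = ["vital", "vital", "okay", "wrong", "vital",
          "okay", "vital", "wrong", "vital", "okay"]
print(precision_at_rank(judged, 10))  # 0.65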
Since the extracted attributes correspond to types of facts
collected from actual search queries submitted by Web users,
they constitute a building block towards acquiring large
repositories of facts from Web documents, and exploiting
them during Web search. Ongoing work spans diversification
towards other languages; the development of open-
domain methods for populating attributes corresponding to
non-traditional types of facts (e.g., side effects for Drug and
seating arrangement for AircraftModel, rather than tradi-
tionally studied cases like population for Country or height
for Mountain); an exploration of the role of query logs in
other information extraction tasks; and the integration of
signals from documents in addition to query logs.
6. ACKNOWLEDGMENTS
The author would like to thank Hang Cui, for comments
on an early draft; Nikesh Garera and Benjamin Van Durme,
for implementing the computation of pairwise similarities
of search-signature vectors, writing scripts for determining
precision scores and manually evaluating a subset of the can-
didate attributes extracted for 5 target classes; and Dekang
Lin, for collecting and providing access to distributionally
similar phrases.
7. REFERENCES
[1] A. Budanitsky and G. Hirst. Evaluating WordNet-based
measures of semantic distance. Computational Linguistics,
2006.
[2] M. Cafarella, D. Downey, S. Soderland, and O. Etzioni.
KnowItNow: Fast, scalable information extraction from the
Web. In Proceedings of the Human Language Technology
Conference (HLT-EMNLP-05), pages 563–570, Vancouver,
Canada, 2005.
[3] T. Chklovski and Y. Gil. An analysis of knowledge
collected from volunteer contributors. In Proceedings of the
20th National Conference on Artificial Intelligence
(AAAI-05), pages 564–571, Pittsburgh, Pennsylvania, 2005.
[4] H. Cui, J. Wen, J. Nie, and W. Ma. Probabilistic query
expansion using query logs. In Proceedings of the 11th
World Wide Web Conference (WWW-02), pages 325–332,
Honolulu, Hawaii, 2002.
[5] S. Dumais, M. Banko, E. Brill, J. Lin, and A. Ng. Web
question answering: Is more always better? In Proceedings
of the 24th ACM Conference on Research and
Development in Information Retrieval (SIGIR-02), pages
207–214, Tampere, Finland, 2002.
[6] L. Lee. Measures of distributional similarity. In Proceedings
of the 37th Annual Meeting of the Association of
Computational Linguistics (ACL-99), pages 25–32, College
Park, Maryland, 1999.
[7] M. Li, M. Zhu, Y. Zhang, and M. Zhou. Exploring
distributional similarity based models for query spelling