functionality is an excellent first step towards extracting the
World Wide Web of facts:
Step One (mining from Web documents, as described
in [12]): for a target fact type (e.g., birth years of people),
starting from as few as 10 seed facts such as (John Lennon,
1941), mine a collection of textual Web documents to ac-
quire sets in the order of a million facts of the same type.
To fully take advantage of this first step, however, one
would need to identify the types of facts or class attributes
that are of common interest to people in general, and to
Web search users in particular:
Step Two (mining from query logs, as introduced in
this paper): for a target class (e.g., Drug or AircraftModel),
starting from as few as 5 seed attributes (e.g., side effects
and maximum dose for Drug, or seating arrangement and
wingspan for AircraftModel) and/or 10 seed instances (e.g.,
Vicodin and Xanax for Drug, or Boeing 747 and Airbus 380
for AircraftModel), mine a collection of Web search queries
to acquire large sets of attributes for the same class.
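
The inputs and output of Step Two can be pictured with a small sketch. It is not the extraction framework of this paper (that is the subject of Section 2); it is a deliberately naive stand-in that treats whatever remains of a query after removing a seed instance as a candidate attribute and counts it. The function name candidate_attributes and the sample queries are assumptions made for illustration only.

```python
from collections import Counter


def candidate_attributes(queries, seed_instances):
    """Count the query remainders left after removing a seed instance.

    Naive illustration only: a query counts when it starts or ends with
    a seed instance, and the rest of the query is taken as the candidate.
    """
    counts = Counter()
    for query in queries:
        q = query.lower().strip()
        for instance in seed_instances:
            inst = instance.lower()
            if q.startswith(inst + " "):
                remainder = q[len(inst):].strip()
            elif q.endswith(" " + inst):
                remainder = q[: -len(inst)].strip()
            else:
                continue
            if remainder:
                counts[remainder] += 1
    return counts


# Hypothetical query log; only the seed instances come from the text.
queries = [
    "vicodin side effects",
    "xanax side effects",
    "xanax maximum dose",
    "boeing 747 wingspan",
]
ranked = candidate_attributes(queries, ["Vicodin", "Xanax"]).most_common()
```

With the seed instances for Drug, the query about Boeing 747 is ignored, and the remaining phrases are ranked by how many queries support them.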
1.2 Contributions
The seed-based identification of prominent class attributes
from unstructured text, without any further domain knowl-
edge, corresponds to a second, more general step towards
building the World Wide Web of facts. In this light, the
main contributions of this paper are twofold:
1) We introduce a weakly supervised framework in Sec-
tion 2.2, for mining Web search queries in order to explic-
itly extract open-domain knowledge that is expected to be
meaningful and suitable for later use. In contrast, previ-
ous work that looks at query logs as a useful resource does
so only to implicitly derive signals improving the quality of
various tasks such as information retrieval, whether through
re-ranking of the retrieved documents [22], query expan-
sion [4], or the development of spelling correction models [7].
Conversely, previous studies in large-scale information extraction
uniformly choose to capitalize on document collections [11]
rather than queries as the preferred data source,
thus failing to take advantage of the wisdom of the (search)
crowds, to which millions of Web users contribute daily.
WWW 2007 / Track: Data Mining Session: Mining Textual Data
2) We illustrate how the proposed generic extraction framework
applies to the concrete task of class attribute extraction
in Section 2.3. To ensure varied experimentation
along several dimensions, the evaluation in Section 3 computes
the precision of attributes extracted for as many as 40
different target classes (CarModel, City, Drug, VideoGame,
etc.), chosen liberally from a wide range of domains of interest.
As an illustration of the scope and time-intensive nature
of the evaluation, one of its pre-requisites is the manual as-
sessment of the correctness of more than 18,000 candidate
attributes. The precision numbers over the target classes are
excellent both in absolute value (0.90 for prec@10, 0.85 for
prec@20, and 0.76 for prec@50) and relative to the quality
of attributes extracted with handcrafted patterns from
query logs (with precision increasing by 25% for prec@10,
32% for prec@20, and 43% for prec@50).
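
For readers unfamiliar with the prec@N notation, the following minimal sketch shows one common way to compute it. It assumes binary correctness judgments, so each attribute is simply correct or not; the grading scheme actually used in Section 3 may differ. The function name precision_at and the sample data are hypothetical.

```python
def precision_at(ranked_attributes, correct, n):
    """Fraction of the top-n ranked attributes that were judged correct.

    Assumes binary judgments. If fewer than n attributes are available,
    the fraction is taken over the attributes actually inspected.
    """
    top = ranked_attributes[:n]
    if not top:
        return 0.0
    return sum(1 for attribute in top if attribute in correct) / len(top)


# Hypothetical ranked output and manual judgments for one class.
ranked = ["side effects", "maximum dose", "dosage", "pictures", "lyrics"]
judged_correct = {"side effects", "maximum dose", "dosage"}

p_at_2 = precision_at(ranked, judged_correct, 2)
p_at_5 = precision_at(ranked, judged_correct, 5)
```

In this toy case both of the top two attributes are correct, while only three of the top five are, so precision drops as N grows, the same trend as in the figures reported above.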
1.3 Potential Applications
Besides their intended role in assembling a high-coverage
World Wide Web of facts, the extracted class attributes have
an array of other applications. In Web publishing, the at-
tributes constitute topics (e.g., radius, surface gravity, or-
bital velocity) to be suggested automatically, as human con-
tributors manually add new entries (e.g., for a newly discov-
ered celestial body) to resources such as Wikipedia [16]. In
open-domain question answering, the attributes are useful
in expanding and calibrating existing answer type hierarchies
[8] towards frequent information needs. In Web search,
the results returned to a query that refers to a named entity
(e.g., Pink Floyd) can be augmented with a compilation of
specific facts, based on the set of attributes extracted in advance
for the class to which the named entity belongs. Moreover,
the original query can be refined into semantically
justified query suggestions, by concatenating it with one of
the top extracted attributes for the corresponding class (e.g.,
Pink Floyd albums for Pink Floyd).
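
The query refinement described above amounts to simple string concatenation, as the sketch below suggests. It assumes that the class of the named entity is already known and that its attributes are ranked. The function name suggest_queries and the attributes other than albums are hypothetical.

```python
def suggest_queries(query, ranked_class_attributes, k=3):
    """Append each of the top-k class attributes to the original query."""
    return [f"{query} {attribute}" for attribute in ranked_class_attributes[:k]]


# "albums" comes from the example in the text; the other attributes
# are made up for illustration.
band_attributes = ["albums", "members", "songs"]
suggestions = suggest_queries("Pink Floyd", band_attributes, k=2)
```

Here the first suggestion is Pink Floyd albums, matching the example in the text.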
Attribute extraction is a powerful tool in building new
search verticals in Web search semi-automatically, for exam-
ple to improve or provide alternative views of search results
for popular topics such as health, travel and so forth.
2. EXTRACTION FROM SEARCH QUERIES
2.1 Mining Queries Rather Than Documents