                   Religion         gods, goddesses
                   Stadium          layout, design
                   Stadium          turf, grass
Hypernyms          BasicFood        nutrients, vitamins
                   BasicFood        nutrients, antioxidants
                   Painter          artwork, paintings
                   Stadium          events, concerts
                   TerroristGroup   attacks, bombings
Probably useless:
Siblings           BasicFood        calories, carbs
                   Mountain         longitude, latitude
                   NationalPark     birds, animals
                   Painter          paintings, drawings
                   River            length, width
Incorrect          Actor            ethnicity, nationality
                   Disease          symptoms, complications
                   Disease          pathophysiology, etiology
                   NationalPark     camping, hiking
                   SportEvent       winners, finalists
                   Stadium          renovation, construction
Table 5: Manual correctness judgments for various pairs of Top-50 extracted attributes found to be strongly related to one another based on distributional similarities
ities are a useful but limited criterion for finding similar attributes. As shown in the lower part of Table 5, some of the pairs of attributes automatically deemed to be similar are in fact useless when their correctness is judged manually, either because the attributes are siblings (e.g., length and width for River) or simply because they are not equivalent (e.g., symptoms and complications for Disease). In contrast, the upper part of Table 5 shows pairs of attributes whose automatic identification as similar is potentially useful, including hypernyms (e.g., attacks is a hypernym of bombings for TerroristGroup), strongly related phrases (e.g., gods and goddesses for Religion) and, ideally, synonyms (e.g., climate and weather for NationalPark). The manual judgment of all pairs of attributes found to be similar based on distributional similarities, across the top 50 attributes extracted for each of the 40 target classes, indicates that 50% of the pairs contain actual synonyms. Moreover, 79% of the pairs are potentially useful, as they contain synonyms, hypernyms or strongly related attributes. The results suggest that although distributional similarities alone are insufficient for finding similar attributes, they constitute an attractive, data-driven alternative to manually constructed resources such as WordNet. In fact, distributional similarities identify pairs of attributes such as climate and weather for NationalPark, as well as emblem and logo for SoccerClub, as similar, even though the respective pairs are not listed as synonyms in WordNet.
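This section does not spell out how the distributional similarities themselves are computed. Purely as an illustration, the sketch below assumes that each attribute is represented by a sparse vector of context-word counts (a hypothetical representation with made-up counts, not the paper's data) and flags attribute pairs whose cosine similarity reaches a hypothetical threshold. It then computes the two shares reported above from a list of manual labels: the share of synonym pairs, and the share of potentially useful pairs (synonyms, hypernyms or strongly related). The function names, label strings and threshold are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_pairs(vectors: dict[str, dict[str, float]], threshold: float):
    """Return (a, b, score) for every attribute pair scoring at or above threshold."""
    names = sorted(vectors)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


# Hypothetical manual labels: these three count as "potentially useful",
# while "sibling" and "incorrect" count as "probably useless" (cf. Table 5).
USEFUL_LABELS = {"synonym", "hypernym", "related"}


def judgment_shares(labels: list[str]) -> tuple[float, float]:
    """Return (share of synonym pairs, share of potentially useful pairs)."""
    if not labels:
        return 0.0, 0.0
    counts = Counter(labels)
    synonym_share = counts["synonym"] / len(labels)
    useful_share = sum(counts[c] for c in USEFUL_LABELS) / len(labels)
    return synonym_share, useful_share


# Made-up context counts for three NationalPark attributes.
EXAMPLE_VECTORS = {
    "climate": {"average": 3, "temperature": 5, "summer": 2},
    "weather": {"temperature": 4, "summer": 3, "forecast": 1},
    "camping": {"permits": 4, "fees": 2},
}

if __name__ == "__main__":
    print(similar_pairs(EXAMPLE_VECTORS, threshold=0.5))
    print(judgment_shares(["synonym", "hypernym", "sibling", "incorrect"]))
```

With these made-up counts, only climate and weather share context words, so they form the only pair flagged at a threshold of 0.5, and the label list yields (0.25, 0.5). As the text notes, such a similarity score cannot by itself tell synonyms apart from siblings such as length and width, which is why the pairs were judged manually.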
4. RELATED WORK
In contrast to previous approaches to large-scale information extraction, which rely exclusively on large document collections for mining pre-specified relations, we explore the role of query logs in extracting unrestricted types of relations, namely class attributes. A related recent approach [18] pursues the goal of unrestricted relation discovery from textual documents.
Our extracted attributes are relations between objects in the given class and objects or values from other, "hidden" classes. Determining the type of the "hidden" argument of each attribute (e.g., Person and Location for the attributes chief executive officer and headquarters of the class Company) is beyond the scope of this paper. Nevertheless, the lists of extracted attributes have direct benefits in gauging existing methods for harvesting pre-specified semantic relations [2, 14] towards the acquisition of relations that are of real-world interest to a wide set of Web users, e.g., towards finding mechanisms of action for drugs.
In [3], the acquisition of attributes and other knowledge relies on Web users who explicitly specify it by hand. In contrast, our approach can be thought of as Web users implicitly giving us the same type of information, outside of any systematic attempt to collect knowledge of general use from the users. The method proposed in [20] applies handcrafted lexico-syntactic patterns to text within a small collection of Web documents. The resulting attributes are evaluated through a notion of question answerability, wherein an attribute is judged to be valid if a question can be formulated about it. More precisely, evaluation consists of users manually assessing how natural the resulting candidate attributes are when placed in a wh- question. Comparatively, our evaluation is stricter. Indeed, many attributes, such as long term uses and users for the class Drug, are marked as wrong in our evaluation, although they would easily pass the question answerability test (e.g., "What are the long term uses of Prilosec?") used in [20]. Because our evaluation is stricter, a direct comparison of our precision numbers with