
Holiday         0.65 0.90  0.60 0.65  0.36 0.52   VideoGame       0.70 0.90  0.57 0.90  0.44 0.90
Hurricane       1.00 0.95  0.87 0.95  0.76 0.73   Wine            0.40 1.00  0.42 0.87  0.29 0.57
Mountain        0.90 1.00  0.82 0.92  0.62 0.88   WorldWarBattle  0.00 0.85  0.00 0.82  0.00 0.66
Average-Class   0.72 0.90  0.64 0.85  0.53 0.76
Table 4: Detailed relative performance of pattern-based extraction based on handcrafted patterns (Pold) as
proposed in previous work, vs. seed-based extraction proposed in this paper (Snew) using Jensen-Shannon as
similarity function
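As a reminder of what the similarity function named in the caption computes, the following is a minimal sketch of the standard Jensen-Shannon divergence between two discrete distributions. How the distributions are built from the data is not shown in this part of the text, so the function name `jensen_shannon` and the toy count dictionaries are illustrative assumptions only.

```python
import math

def jensen_shannon(p_counts, q_counts):
    """Jensen-Shannon divergence (log base 2) between two count dictionaries.

    The result lies in [0, 1]: 0 for identical distributions, 1 for
    distributions that share no keys. Lower values mean more similar.
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both inputs need at least one nonzero count")

    divergence = 0.0
    for key in set(p_counts) | set(q_counts):
        p = p_counts.get(key, 0) / p_total
        q = q_counts.get(key, 0) / q_total
        m = 0.5 * (p + q)  # mixture of the two distributions
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence

# Toy context counts, invented for illustration.
seed = {"population of": 8, "capital of": 5, "map of": 2}
candidate = {"population of": 6, "capital of": 4, "flag of": 3}
print(jensen_shannon(seed, candidate))
```

Because the measure is a divergence, a ranking that uses it as a similarity function would sort candidates by increasing value.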
[Figure 6, left panel. Class: Average-Class; x-axis: Rank (0 to 50);
y-axis: Precision (0.6 to 1); curves: 10 inst/class, 20 inst/class,
All inst/class]
[Figure 6, right panel. Class: Average-Class; x-axis: Rank (0 to 50);
y-axis: Precision (0.6 to 1); curves: 2 seeds/class, 3 seeds/class,
4 seeds/class, 5 seeds/class]
Figure 6: Impact of separately varying the number of input instances
per class (left) or the number of seed attributes per class (right),
on the precision of the ranked list of attributes extracted using
Jensen-Shannon as similarity function, as an average over all target
classes
duces the required amount of supervision, from 200 (5 times
40) seed attributes for all classes in the standard method, to
5 seed attributes for all classes in the tweaked experiment.
The resulting precision values, as an average over all target
classes, are 0.75 (at rank 10), 0.69 (at rank 20) and 0.56
(at rank 50). A quick comparison of these numbers with
the last row of Table 4 shows that the precision is lower
than with the standard configuration (Snew in Table 4), but
still higher than the performance of the older, handcrafted-
pattern method from [13] (Pold in Table 4).
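The comparison above can be checked with a few lines of arithmetic. The sketch below assumes that the Average-Class row of Table 4 lists (Pold, Snew) pairs at ranks 10, 20 and 50, which is the reading consistent with the text; the variable and function names are illustrative.

```python
RANKS = (10, 20, 50)

# Average-Class precision at ranks 10, 20 and 50.
P_OLD = (0.72, 0.64, 0.53)    # handcrafted patterns, last row of Table 4
S_NEW = (0.90, 0.85, 0.76)    # standard seed-based configuration, Table 4
TWEAKED = (0.75, 0.69, 0.56)  # 5 seed attributes shared by all classes

# Supervision: seed attributes needed by each configuration.
SEEDS_STANDARD = 5 * 40       # "200 (5 times 40)" in the text
SEEDS_TWEAKED = 5

def compare(tweaked, s_new, p_old):
    """Return (rank, tweaked - Snew, tweaked - Pold) for each rank."""
    rows = []
    for rank, t, s, p in zip(RANKS, tweaked, s_new, p_old):
        rows.append((rank, round(t - s, 2), round(t - p, 2)))
    return rows

for rank, vs_new, vs_old in compare(TWEAKED, S_NEW, P_OLD):
    print(f"rank {rank}: {vs_new:+.2f} vs. Snew, {vs_old:+.2f} vs. Pold")
print(f"seed attributes: {SEEDS_STANDARD} -> {SEEDS_TWEAKED}")
```

At every rank the difference to Snew is negative and the difference to Pold is positive, which is the ordering the paragraph describes.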
3.5 Identifying Similar Attributes
The cumulative precision of each individual attribute, in
a list of attributes extracted for a given class, is correlated
with, but not a definitive measure of, the overall quality
of the list. Attributes with a significant semantic overlap,
such as the pairs features and functions for CellPhoneModel,
or photographs and photos for NationalPark, or tenets and
principles for Religion in Table 3, are relevant separately
but less useful together, as they provide (near-)duplicate
information. Even if a list of attributes has high precision,
its diversity improves if attributes that are strongly related
to each other are identified and grouped into equivalence (or
semantic relatedness) classes.
To estimate the semantic relatedness between two given
phrases, most approaches rely on external lexical resources
created manually by experts, which organize concepts into
rich thesauri (e.g., Roget's Thesaurus) or hierarchically (e.g.,
WordNet). The latter is the de-facto standard in computing
semantic relatedness [1], although resources compiled
collaboratively by non-experts (e.g., Wikipedia) represent
intriguing alternatives [19]. Because WordNet is not designed
to accommodate noisy, arbitrarily specific phrases (in this
case, attributes) that occur in search queries, and to avoid
relying on manually compiled resources in general, it is
preferable to identify similar attributes by exploiting lexical
resources acquired from text by unsupervised methods. One
such resource consists of pairs of phrases associated with
distributional similarity scores, which measure the extent to
which the component phrases occur in similar contexts in
documents [9]. The scores have values in the interval [0,1].
For example, tenets and principles have a relatively high
distributional similarity score, namely 0.39. The pairs and
their scores are collected offline from 50 million news articles
maintained by the Google search engine.
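To make the notion of a distributional similarity score concrete, here is a minimal sketch that scores two single-word phrases by the cosine of their context-word count vectors. This is a swapped-in, simplified measure for illustration; it is not claimed to be the scoring method of [9], and the toy sentences, the window size and the function names are assumptions.

```python
import math
from collections import Counter

def context_vector(phrase, sentences, window=2):
    """Count the words occurring within `window` tokens of `phrase`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != phrase:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

def cosine(u, v):
    """Cosine similarity of two nonnegative count vectors, in [0, 1]."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus, invented for illustration.
SENTENCES = [
    "the core tenets of the faith",
    "the core principles of the faith",
    "heavy rain and cold weather today",
]

tenets = context_vector("tenets", SENTENCES)
principles = context_vector("principles", SENTENCES)
weather = context_vector("weather", SENTENCES)

print(cosine(tenets, principles))  # shared contexts, high score
print(cosine(tenets, weather))     # no shared contexts, zero score
```

On a real corpus the context vectors would be far larger and typically reweighted, so absolute scores such as the 0.39 quoted above are not comparable with the values this toy produces.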
In an experiment designed as an initial exploration of
whether distributional similarities could be used to identify
similar attributes within a class, any two extracted attributes
are automatically deemed to be similar if their distributional
similarity score is higher than 0.2 and each attribute is among
the top 10 phrases that are most similar to the other attribute.
As expected, distributional similar-
WWW 2007 / Track: Data Mining Session: Mining Textual Data
Manual            Examples of Attribute Pairs
Judgment          Class            Attributes
Potentially useful:
Synonyms          CartoonChar      quotations, sayings
                  Country          gdp, gross domestic product
                  Empire           administration, government
                  Hurricane        path, route
                  NationalPark     climate, weather
                  Religion         gods, deities
                  SoccerClub       emblem, logo
                  WorldWarBattle   significance, importance
Strongly Related  AircraftModel    details, information
                  Company          ceo, chairman
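The similarity test described in Section 3.5 (a score above 0.2, with each attribute among the top 10 most similar phrases of the other) can be sketched as follows. The nested-dictionary layout of the scores, the function names, and the use of union-find to merge similar pairs into groups are assumptions made for illustration; only the 0.39 score for tenets and principles comes from the text, and the other scores are invented.

```python
def top_k_neighbors(phrase, scores, k=10):
    """Return the k phrases most similar to `phrase`, best first."""
    neighbors = scores.get(phrase, {})
    ranked = sorted(neighbors, key=neighbors.get, reverse=True)
    return ranked[:k]

def are_similar(a, b, scores, threshold=0.2, k=10):
    """True if the score exceeds the threshold and the two phrases
    appear in each other's top-k neighbor lists."""
    score = scores.get(a, {}).get(b, 0.0)
    if score <= threshold:
        return False
    return (b in top_k_neighbors(a, scores, k)
            and a in top_k_neighbors(b, scores, k))

def group_similar(attributes, scores, threshold=0.2, k=10):
    """Merge pairwise-similar attributes into groups (union-find)."""
    parent = {a: a for a in attributes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(attributes):
        for b in attributes[i + 1:]:
            if are_similar(a, b, scores, threshold, k):
                parent[find(a)] = find(b)

    groups = {}
    for a in attributes:
        groups.setdefault(find(a), []).append(a)
    return list(groups.values())

# Symmetric toy scores; only 0.39 is taken from the text.
SCORES = {
    "tenets": {"principles": 0.39, "history": 0.05},
    "principles": {"tenets": 0.39, "history": 0.08},
    "history": {"tenets": 0.05, "principles": 0.08},
}

print(group_similar(["tenets", "principles", "history"], SCORES))
```

With these toy scores, tenets and principles end up in one group and history stays on its own. Requiring the top-10 condition in both directions filters out pairs where one phrase is merely a frequent, generic neighbor of the other.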
　