classes and the same set of search queries as data source.
Figure 5 shows comparative precision curves for the classes
ProgLanguage, SkyBody, SoccerClub and Wine. To avoid
clutter, the performance of Snew is shown only for the two
similarity functions that were shown earlier in Figure 4 to
perform the best (Jensen-Shannon) and the worst (Jaccard).
The graphs in Figure 5 indicate that Pold has lower per-
formance than Snew for the classes SoccerClub and Wine,
and lower precision than Snew using Jensen-Shannon for the
class SkyBody. In the case of the class ProgLanguage, Pold
has higher accuracy than Snew for ranks 4 through 12, after
which precision degrades more quickly as the rank in the
list increases. To precisely quantify the differences between
Pold and Snew using Jensen-Shannon, Table 4 provides an
in-depth comparison of precision at ranks 10, 20 and 50, for
each of the 40 target classes and as an average over all target
classes. The seed-based approach Snew proposed in this pa-
per clearly outperforms the previous method Pold from [13],
with relative precision boosts of 25% (0.90 vs. 0.72) at rank
10, 32% (0.85 vs. 0.64) at rank 20, and 43% (0.76 vs. 0.53)
at rank 50.
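
As a quick check of the arithmetic, the relative boosts can be recomputed from the average precision values quoted above. The following is a minimal sketch; the function name relative_boost is ours and purely illustrative. Because it starts from the rounded two-decimal averages, a result may differ by a point from the percentage reported in the text.

```python
def relative_boost(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Average precision (Pold, Snew) at each rank, as quoted in the text.
averages = {10: (0.72, 0.90), 20: (0.64, 0.85), 50: (0.53, 0.76)}

for rank, (p_old, s_new) in averages.items():
    # Print the boost of Snew over Pold as a percentage.
    print(f"rank {rank}: {relative_boost(p_old, s_new):.1%}")
```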
3.4 Minimizing the Amount of Supervision
The extraction of attributes from query logs requires a
relatively small amount of supervision, which in our experi-
ments is provided in the form of 5 seed attributes and a me-
dian of 172 instances per target class. While we believe that
such a requirement is not at all unreasonable or impractical,
quantifying the exact impact of reducing the amount of su-
pervision on the quality of extracted attributes is still useful
for at least two reasons. First, it offers an insight into the
robustness of the method, and into its ability to perform the
task at hand even in non-optimal conditions (that is, with
scarce input data). More importantly, it gives a trustworthy
estimate of the expected accuracy level in what could
be called a fast-track development scenario, when the
attributes for a new search vertical (with new target classes)
need to be collected quickly with minimum effort and minimum
input data.
Figure 6 illustrates the impact of providing less input data
on the output quality. The graphs show the precision plots
corresponding to reducing the number of instances, from
the (regular) all down to 20 and then to 10 per class (left
graph); and to reducing the number of seed attributes from
the regular 5 down to 4, 3 and then 2 per class (right graph).
As expected, the precision gradually decreases in both cases,
but the decrease is small, especially at lower ranks (1 through
20). In other words, the extraction method proposed in this
paper can extract attributes of high quality even if it is given
as few as 10 instances and 2 seed attributes per target class.
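
The reduced-supervision settings of Figure 6 could be set up along the following lines. This is only a sketch: the helper reduce_supervision, the placeholder instance and seed names, and the choice to subsample instances uniformly at random and to keep the first seeds in the list are all assumptions made for illustration, since the selection procedure is not specified here.

```python
import random


def reduce_supervision(instances, seeds, n_instances=None, n_seeds=None, rng=None):
    """Return a reduced (instances, seeds) pair for one target class.

    None means "keep all". Random sampling of instances and prefix
    truncation of seeds are assumptions of this sketch.
    """
    rng = rng or random.Random(0)
    instances, seeds = list(instances), list(seeds)
    if n_instances is not None and n_instances < len(instances):
        instances = rng.sample(instances, n_instances)
    if n_seeds is not None:
        seeds = seeds[:n_seeds]
    return instances, seeds


# Hypothetical placeholder data: 172 instances and 5 seed attributes.
all_instances = [f"instance_{i}" for i in range(172)]
all_seeds = [f"seed_attribute_{i}" for i in range(5)]

# Left graph of Figure 6: vary the instances, keep all 5 seeds.
instance_settings = [
    reduce_supervision(all_instances, all_seeds, n_instances=n)
    for n in (None, 20, 10)
]

# Right graph of Figure 6: vary the seeds, keep all instances.
seed_settings = [
    reduce_supervision(all_instances, all_seeds, n_seeds=k)
    for k in (5, 4, 3, 2)
]
```

Each resulting pair would then be fed to the extraction method in place of the full input, and precision measured at the same ranks as before.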
In a separate experiment, Steps 10 through 15 from the
earlier Figure 1 are temporarily tweaked to generate a refer-
ence search-signature vector for only one of the target classes
(randomly chosen to be Country), instead of generating one
such reference vector for each class. This change further re-
WWW 2007 / Track: Data Mining Session: Mining Textual Data
Class           Precision @10 Precision @20 Precision @50
                Pold  Snew    Pold  Snew    Pold  Snew
Actor           0.85  1.00    0.82  1.00    0.74  0.96
AircraftModel   0.80  0.80    0.77  0.85    0.68  0.71
Award           0.30  0.95    0.15  0.77    0.24  0.69
BasicFood       1.00  1.00    0.90  0.95    0.65  0.86
CarModel        0.95  1.00    0.77  1.00    0.77  0.89
CartoonChar     0.45  0.70    0.47  0.67    0.39  0.64
CellPhoneModel  0.55  0.90    0.57  0.87    0.23  0.78
ChemicalElem    0.90  0.80    0.67  0.80    0.71  0.83
City            0.20  0.75    0.20  0.75    0.31  0.68
Company         0.90  1.00    0.82  0.97    0.79  0.85
Country         0.85  1.00    0.82  0.97    0.88  0.95
Currency        0.50  0.80    0.25  0.67    0.16  0.36
DigitalCamera   0.50  0.90    0.25  0.82    0.10  0.87
Disease         1.00  0.95    1.00  0.92    0.77  0.87
Drug            1.00  0.90    0.87  0.90    0.84  0.81
Empire          0.85  0.90    0.77  0.87    0.66  0.82
Flower          0.90  0.75    0.70  0.65    0.59  0.58
Movie           0.95  1.00    0.90  0.95    0.72  0.85
NationalPark    0.70  0.85    0.80  0.85    0.66  0.88
NbaTeam         0.60  0.80    0.40  0.77    0.33  0.78
Newspaper       0.85  0.90    0.62  0.80    0.39  0.72
Painter         1.00  1.00    0.95  0.97    0.86  0.90
ProgLanguage    0.95  0.90    0.77  0.90    0.50  0.85
Religion        1.00  1.00    0.92  0.95    0.78  0.95
River           0.75  1.00    0.70  0.75    0.55  0.73
SearchEngine    0.50  0.65    0.50  0.55    0.48  0.39
SkyBody         1.00  1.00    0.97  1.00    0.77  0.96
Skyscraper      0.85  0.95    0.60  0.87    0.48  0.74
SoccerClub      0.55  1.00    0.42  0.90    0.21  0.90
SportEvent      0.60  1.00    0.42  0.95    0.42  0.84
Stadium         0.75  0.90    0.72  0.85    0.57  0.83
TerroristGroup  0.55  0.90    0.62  0.82    0.43  0.49
Treaty          0.20  0.95    0.40  0.87    0.46  0.64
University      0.90  0.85    0.82  0.85    0.65  0.74