Date: 2010-09-06 01:00  Source: 蓝天飞行翻译  Author: admin

River location, length, mouth, tributaries, width, source, physical features, headwaters, depth, origin
SearchEngine market share, share price, phone book, net worth, submit url, mission statement, owner, submissions, inventor,
SkyBody distance, size, age, volume, diameter, radius, mass, surface gravity, orbital velocity, period of revolution
Skyscraper height, architect, location, floors, dimensions, address, history, pics, floor plans, architecture
SoccerClub league, capacity, chairman, titles, official site, official website, managers, tours, seating plan, past players
SportEvent winners, events, champions, results, champs, dates, matchups, official site, official website, locations
Stadium location, seating capacity, architect, address, seating map, dimensions, tours, pics, poster, box office
TerroristGroup attacks, leader, goals, meaning, website, leadership, photos, images, definition, flag
Treaty countries, ratification, date, definition, summary, purpose, pros, cons, members, picture
University alumni, mascot, dean, economics department, career center, graduation 2005, department of psychology, school colors, tuition costs, campus map
VideoGame price, system requirements, creator, official site, official website, free game download, concept art, download demo, pc cheat codes, reviews
Wine vintage, color, cost, style, taste, vintage chart, pronunciation, shelf life, wine ratings, wine reviews
WorldWarBattle date, location, significance, images, importance, timeline, summary, pics, maps, photographs
Table 3: Top attributes extracted using Jensen-Shannon as similarity function
WWW 2007 / Track: Data Mining Session: Mining Textual Data
[Four plots, one per class (ProgLanguage, SkyBody, SoccerClub, Wine). Each plots Precision (y-axis, 0.2 to 1) against Rank (x-axis, 0 to 50) for three methods: Jaccard, Jensen-Shannon, and handcrafted patterns.]
Figure 5: Relative performance of pattern-based extraction based on handcrafted patterns as proposed in
previous work, vs. seed-based extraction proposed in this paper using Jaccard and Jensen-Shannon as
similarity functions, for a few target classes
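The curves in Figure 5 plot precision as a function of rank. A minimal Python sketch of that measure follows, assuming each extracted attribute has simply been judged correct or incorrect. The function name `precision_at_rank` and the labels are hypothetical, and the grading scheme actually used is not described in this section.

```python
from typing import Sequence


def precision_at_rank(labels: Sequence[bool], rank: int) -> float:
    """Fraction of correct attributes among the top `rank` entries.

    `labels[i]` is True when the attribute at position i of the ranked
    list was judged correct (an assumed binary judgment). If the list is
    shorter than `rank`, the missing positions count as incorrect.
    """
    if rank <= 0:
        raise ValueError("rank must be positive")
    return sum(1 for ok in labels[:rank] if ok) / rank


# Hypothetical judgments for a ranked list of five extracted attributes.
judged = [True, True, False, True, False]
curve = [precision_at_rank(judged, n) for n in range(1, len(judged) + 1)]
```

Evaluating the function at every rank from 1 to 50 would give one curve of the kind plotted per method and class.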
as similarity function, the attributes extracted for the class
Painter are better than for River, which in turn are better
than for CartoonChar. Second, not all similarity functions
produce the same level of accuracy. The rightmost graph in
Figure 4 shows the precision as an average over all target
classes. Although no similarity function outperforms the
others on every target class, on average Jensen-Shannon
performs best and Jaccard worst, with L1-Norm, Cosine and
Skew Divergence in between. For a more detailed view of the
extracted attributes, Table 3 shows the top attributes
extracted with Jensen-Shannon for each of the 40 target
classes. Third, regardless of which of the five similarity
functions is used, the precision averaged over all target
classes is higher than 0.8 at rank 10, higher than 0.7 at
rank 30, and higher than 0.6 at rank 50, as shown by the
rightmost graph in Figure 4. These numbers are high both in
absolute value and relative to the quality of attributes
extracted with handcrafted patterns from query logs by a
previously proposed method [13], as explained in the
following.
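To make the contrast between the best and worst performing similarity functions concrete, here is a minimal Python sketch of Jaccard and Jensen-Shannon. It assumes each attribute is represented by a sparse vector of non-negative context counts; that representation, the function names, and the toy vectors are illustrative assumptions, not details taken from this section. Jaccard looks only at which contexts are shared, while Jensen-Shannon also takes their relative weights into account.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # context -> non-negative count (assumed representation)


def jaccard(a: Vector, b: Vector) -> float:
    """Jaccard coefficient over the sets of contexts with non-zero weight."""
    keys_a = {k for k, v in a.items() if v > 0}
    keys_b = {k for k, v in b.items() if v > 0}
    union = keys_a | keys_b
    if not union:
        return 0.0
    return len(keys_a & keys_b) / len(union)


def _normalize(v: Vector) -> Vector:
    """Turn raw counts into a probability distribution."""
    total = sum(x for x in v.values() if x > 0)
    if total <= 0:
        raise ValueError("vector must have positive total weight")
    return {k: x / total for k, x in v.items() if x > 0}


def jensen_shannon(a: Vector, b: Vector) -> float:
    """Jensen-Shannon divergence (log base 2): 0 = identical, 1 = disjoint.

    Lower values mean the two vectors are more similar.
    """
    p, q = _normalize(a), _normalize(b)
    divergence = 0.0
    for k in set(p) | set(q):
        pk, qk = p.get(k, 0.0), q.get(k, 0.0)
        mk = 0.5 * (pk + qk)  # mixture distribution
        if pk > 0:
            divergence += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            divergence += 0.5 * qk * math.log2(qk / mk)
    return divergence


# Toy vectors (hypothetical counts of contexts in which an attribute occurs).
seed = {"ctx1": 5.0, "ctx2": 3.0}
candidate = {"ctx2": 1.0, "ctx3": 4.0}
overlap = jaccard(seed, candidate)
distance = jensen_shannon(seed, candidate)
```

Since Jensen-Shannon is a divergence, candidates would be ranked by ascending value, whereas Jaccard is ranked by descending value.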
3.3 Comparison to Previous Results
A recent study in information extraction from the Web [13]
describes a method similar to the one in this paper in its
goal (extraction of class attributes) and textual data source
(query logs). The main difference is that, in that study,
the extraction is fully supervised (rather than weakly
supervised), using handcrafted extraction patterns (rather
than seed attributes) to extract candidate attributes from
queries that match those patterns. In the following, we
denote the older, handcrafted-pattern method from [13] by
P_old, and the seed-based method introduced in this paper by
S_new. For a direct comparison of P_old and S_new, and since
the evaluation of P_old in [13] reports results on only 5
target classes, the older method P_old was re-run in the
experimental setting defined in this paper, with the same 40 target
 