Time: 2010-09-06 01:00  Source: 蓝天飞行翻译  Author: admin

ing to the query template [ ] (prefix) [company one year] (infix)
[target] (postfix). After combining the vectors associated with
the seed attributes (stock price, headquarters, etc.) into a
reference vector for the class, the relevance of each candidate
attribute for the class is computed as the similarity
score of the vector associated with the candidate attribute,
with respect to the reference vector for the class.
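
The scoring step described above can be illustrated with a minimal Python sketch. It assumes each search-signature vector is a sparse mapping from a (prefix, infix, postfix) template to a weight, combines the seed vectors by summing them, and uses cosine similarity as a stand-in for the similarity function, which this passage does not name. The function names and the example weights are hypothetical.

```python
from collections import Counter
from math import sqrt


def combine(vectors):
    """Sum sparse vectors (template -> weight) into one reference vector."""
    reference = Counter()
    for vec in vectors:
        reference.update(vec)  # Counter.update adds the weights
    return reference


def cosine(u, v):
    """Cosine similarity of two sparse vectors; an assumed stand-in measure."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(seed_vectors, candidate_vectors):
    """Score each candidate attribute against the class reference vector."""
    reference = combine(seed_vectors.values())
    scored = {a: cosine(v, reference) for a, v in candidate_vectors.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Templates taken from Figure 2; the weights are made up for illustration.
T_TARGET = ("", "company one year", "target")
T_CORP = ("where is the world", "for the", "corporation")
T_IMPACT = ("", "new", "impact")
T_SOLARIS = ("", "", "8.1-7 on solaris 8")

seeds = {"stock price": {T_TARGET: 2, T_CORP: 1},
         "headquarters": {T_CORP: 1, T_IMPACT: 1}}
candidates = {"ceo": {T_CORP: 3}, "installing": {T_SOLARIS: 5}}
ranking = rank_candidates(seeds, candidates)
```

In this toy input, a candidate that shares templates with the seed attributes ranks above one that shares none, which is the behavior the paragraph describes.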
3. EVALUATION
3.1 Experimental Setting
Data: The input to the experiments is a random sample of
around 50 million unique, fully-anonymized queries in English
submitted by Web users to the Google search engine
in 2006. All queries are considered independently of one
another, whether they were submitted by the same user or
different users, within the same or different search sessions.
Figure 3 shows the distribution of the queries from the
random sample, according to the number of words in each
query. If multiple occurrences of identical queries count
towards the computation of the distribution, a fraction of
14.5%, 31.3%, 26.5%, 13.2% and 7.5% of the queries from
the random sample contain 1, 2, 3, 4 and 5 words respectively,
as shown by the solid line in Figure 3. Put differently,
93% of the queries from the sample contain 5 words
or less. Only 0.7% of the queries consist of more than 10
words. If the computation ignores the frequency in the logs
of each query, which corresponds to the dotted line in the
figure, the distribution moderately shifts to indicate longer
unique queries on average. Only 2.4% of the unique queries
contain 1 word, whereas 14.3%, 27.8%, 22.9% and 14.9%
WWW 2007 / Track: Data Mining Session: Mining Textual Data
[Figure 2 diagram; panel contents recovered from the extracted text. Each template is written as [prefix] [infix] [postfix].]

Target classes
Company: {Delphi, Apple Computer, Honda, Motorola, Google, Coca Cola,
Toyota, General Motors, Canon, Reuters, Time Warner, Target, ...}
Drug: {Paxil, Lithium, Zoloft, Vioxx, Medrol, Zithromax, Sectral, Vicodin,
Lipitor, Zyrtec, Prilosec, Cipro, Oxycontin, Avandia, Imitrex, Albuterol, ...}

Seed attributes
Company: {headquarters, stock price, ceo, location, chairman}
Drug: {price, dosage, side effects, color, chemical name}

Query logs
installing delphi; honda accord; apple computer headquarters; stock price motorola;
mission statement google; clinical paxil; duracell lithium; zoloft generic equivalent;
side effects vioxx; dosage medrol; mechanism of action zithromax; order sectral;
honda accord 1989 sei; installing toyota cressida waterpump; new honda accord;
installing oracle 8.1-7 on solaris 8; coca cola company one year stock price target;
washington mutual new headquarters impact; mission statement for delta airlines;
where is the world headquarters for delphi corporation;
delta air lines stock price history; mission statement for the oracle corporation

Pool of candidate attributes
Company: {installing, stock price, accord, headquarters, mission statement, ...}
Drug: {side effects, clinical, generic equivalent, duracell, order, dosage, viral, ...}

Search-signature vectors (one per candidate attribute)
Company: installing
  [ ] [ ] [cressida water pump]
  [ ] [ ] [8.1-7 on solaris 8]
Company: stock price
  [ ] [company one year] [target]
  [ ] [air lines] [history]
Company: accord
  [ ] [ ] [1989 sei]
  [new] [ ] [ ]
Company: headquarters
  [where is the world] [for the] [corporation]
  [ ] [new] [impact]
Company: mission statement
  [ ] [for the] [corporation]
  [ ] [for] [airlines]

Reference search-signature vectors (one per class)
Company
  [ ] [new] [impact]
  [where is the world] [for the] [corporation]
  [ ] [company one year] [target]
  [ ] [air lines] [history]

Ranked list of class attributes
Company: {headquarters, mission statement, stock price, ceo, code of conduct,
stock symbol, organizational structure, corporate address, cio, ...}
Drug: {side effects, withdrawal symptoms, generic equivalent, half life, dosage,
mechanism of action, contraindications, ld50, clinical uses, cost, ...}

Step labels in the diagram: (1), (2-9), (10-15), (16-19)

Figure 2: Overview of weakly supervised extraction of attributes from query logs
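
The first stages shown in Figure 2 can be sketched in Python as follows. This is a simplified illustration, not the paper's algorithm: it assumes that a candidate attribute is whatever remains of a query that starts or ends with a class instance, and that a template is formed by cutting a query around the instance and the attribute. The helper names `candidate_attribute` and `split_template` are hypothetical.

```python
def candidate_attribute(query, instance):
    """Return the rest of a query that starts or ends with the instance."""
    words, inst = query.lower().split(), instance.lower().split()
    n = len(inst)
    if len(words) <= n:
        return None
    if words[:n] == inst:
        return " ".join(words[n:])
    if words[-n:] == inst:
        return " ".join(words[:-n])
    return None


def split_template(query, instance, attribute):
    """Return (prefix, infix, postfix) around the instance and attribute."""
    words = query.lower().split()

    def find(phrase):
        sub = phrase.lower().split()
        for i in range(len(words) - len(sub) + 1):
            if words[i:i + len(sub)] == sub:
                return i, i + len(sub)
        return None

    spans = [find(instance), find(attribute)]
    if None in spans:
        return None
    (s1, e1), (s2, e2) = sorted(spans)
    if e1 > s2:  # the two phrases overlap, so no template can be formed
        return None
    return (" ".join(words[:s1]),
            " ".join(words[e1:s2]),
            " ".join(words[e2:]))


pool = [candidate_attribute(q, i) for q, i in [
    ("installing delphi", "Delphi"),
    ("apple computer headquarters", "Apple Computer"),
    ("stock price motorola", "Motorola"),
]]
template = split_template(
    "coca cola company one year stock price target", "Coca Cola", "stock price")
```

With the queries from Figure 2, the last call yields an empty prefix, the infix `company one year` and the postfix `target`, matching the template shown for the stock price attribute.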
[Plot: x-axis ticks 0 to 10; y-axis "Percentage of queries", ticks 0 to 35]
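
The two ways of computing the query-length distribution in Section 3.1 (counting every occurrence of a query versus counting each unique query once) can be sketched as below. The function name `length_distribution`, the whitespace tokenization, and the toy log are assumptions for illustration; they are not taken from the paper.

```python
from collections import Counter


def length_distribution(query_counts, weighted=True):
    """Percentage of queries per word count.

    query_counts maps each unique query to its frequency in the logs.
    weighted=True counts every occurrence; weighted=False counts each
    unique query once.
    """
    histogram = Counter()
    for query, freq in query_counts.items():
        histogram[len(query.split())] += freq if weighted else 1
    total = sum(histogram.values())
    return {n: 100.0 * c / total for n, c in sorted(histogram.items())}


# Toy log: short queries are frequent, the long query is rare.
toy_log = {"google": 6,
           "honda accord": 3,
           "delta air lines stock price history": 1}
by_occurrence = length_distribution(toy_log, weighted=True)
by_unique = length_distribution(toy_log, weighted=False)
```

In the toy log, ignoring frequencies moves weight toward the longer query, the same direction of shift the text reports for unique queries.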
 