  . small set of seed phrases {K} expected in output
  . large repository of search queries {Q}
Output: ranked list of phrases for C
Variables: {P} = pool of candidate phrases
  . TQ = query template
  . FQ = query frequency in logs
  . VP = search-signature vector
  . {V} = search-signature vectors, one per P
  . {S} = vector of scores, one score per P
  . VK = reference search-signature vector
Steps:
00. {P} = ∅; TQ = nil; FQ = 0; {V} = ∅; {S} = ∅; VK = nil
01. Collect {P} from {Q} based on {I}
02. For each query Q in {Q}
03.   For each candidate phrase P in {P}
04.     For each instance I in {I}
05.       If (Q contains both P and I)
06.         TQ = QueryRemainderTemplate(Q, P, I)
07.         FQ = QueryFrequency(Q)
08.         VP = Find/create entry for P in {V}
09.         Update weight of TQ in VP based on FQ
10. For each candidate phrase P in {P}
11.   For each seed phrase K in {K}
12.     If (P == K)
13.       VP = Find entry for P in {V}
14.       If (VP != nil)
15.         Merge elements of VP into VK
16. For each candidate phrase P in {P}
17.   VP = Find entry for P in {V}
18.   {S}.At(P) = ComputeSimScore(VP, VK)
19. Return SortedList({P}, {S})
Figure 1: Generic Framework for Weakly Supervised Information Extraction from Queries
both an instance I and a candidate phrase P. The remainder of a matching query, that is, the concatenation of the remaining prefix, infix and postfix (any of which may be empty), becomes an entry in a query template vector that acts as a search signature of the candidate phrase with respect to the class. For example, given the instance Intel in the class Company and the candidate phrases headquarters and mission statement, the queries "where is the world headquarters for intel corporation" and "mission statement for intel" respectively produce the templates:
[where is the world]prefix [for]infix [corporation]postfix;
[ ]prefix [for]infix [ ]postfix.
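Step 06 (QueryRemainderTemplate) can be sketched in Python as follows. This is a minimal sketch that assumes lowercase, whitespace-tokenized queries and matches P and I as whole-token sequences. The helper names `find_span` and `query_remainder_template` are hypothetical, and the figure does not say how matching or tokenization is actually performed. Like the bracket notation above, the returned (prefix, infix, postfix) triple does not record whether P or I came first.

```python
from typing import Optional, Tuple

# Hypothetical representation of a query template: (prefix, infix, postfix).
Template = Tuple[str, str, str]


def find_span(tokens: list, phrase_tokens: list) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first whole-token occurrence of phrase_tokens, or None."""
    n = len(phrase_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == phrase_tokens:
            return start, start + n
    return None


def query_remainder_template(query: str, phrase: str, instance: str) -> Optional[Template]:
    """Remove candidate phrase P and instance I from query Q and return the
    remaining (prefix, infix, postfix), or None if Q does not contain both."""
    tokens = query.lower().split()
    p_span = find_span(tokens, phrase.lower().split())
    i_span = find_span(tokens, instance.lower().split())
    if p_span is None or i_span is None:
        return None
    first, second = sorted([p_span, i_span])  # order the two spans by position
    if first[1] > second[0]:  # P and I overlap: not treated as a match in this sketch
        return None
    prefix = " ".join(tokens[:first[0]])
    infix = " ".join(tokens[first[1]:second[0]])
    postfix = " ".join(tokens[second[1]:])
    return prefix, infix, postfix


print(query_remainder_template(
    "where is the world headquarters for intel corporation", "headquarters", "intel"))
# ('where is the world', 'for', 'corporation')
print(query_remainder_template("mission statement for intel", "mission statement", "intel"))
# ('', 'for', '')
```

Both calls reproduce the two templates listed above.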
Query templates are thus added as weighted elements to the search-signature vector of each candidate phrase. The weights aggregate the frequency-based contributions of distinct queries (via distinct instances) to the same template; for example, a query asking about the world headquarters of Oracle, rather than Intel, contributes to the same template as the Intel query above. Steps 10 through 15 in Figure 1 introduce weak supervision into the extraction process. They identify the vectors associated with the seed phrases K, which are known a priori to be part of the desired output. Those vectors are merged into a reference search-signature vector that can be thought of as a loose search fingerprint of the desired output with respect to the class. In steps 16 through 19, the similarity scores between the search-signature vector of each candidate phrase, on the one hand, and the reference search-signature vector, on the other hand, induce a ranking over the candidate phrases and determine the list of candidate phrases returned as output.
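Steps 09 through 19 can be sketched in Python under several assumptions:

- A search-signature vector is a dict from template triples to weights.
- Step 09 adds the raw query frequency to a template's weight.
- Step 15 merges seed vectors by summing their weights.
- ComputeSimScore is cosine similarity.

Figure 1 leaves all of these choices open, so cosine similarity in particular is a swapped-in placeholder rather than the method used here. The helper names and the frequencies in the usage lines are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

Template = Tuple[str, str, str]  # (prefix, infix, postfix)
Vector = Dict[Template, float]   # search-signature vector: template -> weight


def add_template(vectors: Dict[str, Vector], phrase: str,
                 template: Template, frequency: float) -> None:
    """Step 09 (assumed weighting): add the query frequency to the template's weight."""
    vec = vectors.setdefault(phrase, {})
    vec[template] = vec.get(template, 0.0) + frequency


def reference_vector(vectors: Dict[str, Vector], seeds: Iterable[str]) -> Vector:
    """Steps 10-15: merge seed vectors by summing weights; seeds without a vector are skipped."""
    ref: Vector = {}
    for seed in seeds:
        for template, weight in vectors.get(seed, {}).items():
            ref[template] = ref.get(template, 0.0) + weight
    return ref


def cosine(u: Vector, v: Vector) -> float:
    """Assumed ComputeSimScore: cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(candidates: List[str], vectors: Dict[str, Vector],
                    seeds: Iterable[str]) -> List[Tuple[str, float]]:
    """Steps 16-19: score each candidate against the reference vector, best first."""
    ref = reference_vector(vectors, seeds)
    scored = [(p, cosine(vectors.get(p, {}), ref)) for p in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage with made-up frequencies.
vectors: Dict[str, Vector] = {}
add_template(vectors, "headquarters", ("where is the world", "for", "corporation"), 3)  # via Intel
add_template(vectors, "headquarters", ("where is the world", "for", "corporation"), 2)  # via Oracle
add_template(vectors, "headquarters", ("", "for", ""), 1)
add_template(vectors, "mission statement", ("", "for", ""), 4)
add_template(vectors, "installing", ("", "", "8.1-7 on solaris 8"), 5)
ranked = rank_candidates(["headquarters", "mission statement", "installing"],
                         vectors, ["mission statement"])
# -> mission statement (1.0), headquarters (1/sqrt(26), about 0.196), installing (0.0)
```

Summing raw weights in step 15 lets frequent seeds dominate the reference vector. Normalizing each seed vector before merging would be one alternative, and this section does not say which choice is made.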
2.3 Class Attribute Extraction
An immediate application of the proposed extraction framework is the extraction of class attributes. Specifically, given a set of target classes, each specified as a set of representative instances, and a set of seed attributes for each class, the goal is to extract relevant class attributes from query logs without relying on any further domain knowledge.
Figure 2 illustrates the extraction of class attributes from queries. The numbered arrows correspond to steps of the generic algorithm in Figure 1. As shown in the upper part of Figure 2, it is straightforward to collect a very large (and extremely noisy) pool of candidate attributes. This is done by identifying the queries that contain one of the class instances (e.g., delphi and apple computer) at the beginning or end of the query (e.g., "installing delphi" and "apple computer headquarters"), and collecting the remainders of those queries as candidate attributes (e.g., installing and headquarters for the class Company).
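The candidate-collection pass (step 01) can be sketched as follows. This minimal sketch assumes lowercase whitespace tokenization and keeps a count of how many queries yield each candidate. The function name and the counting are illustrative assumptions, since the text says only that the remainders are collected.

```python
from collections import Counter
from typing import Iterable


def collect_candidate_attributes(queries: Iterable[str], instances: Iterable[str]) -> Counter:
    """Collect the remainder of every query that starts or ends with a class instance."""
    instance_tokens = [inst.lower().split() for inst in instances]
    pool: Counter = Counter()
    for query in queries:
        tokens = query.lower().split()
        for inst in instance_tokens:
            n = len(inst)
            if n == 0 or len(tokens) <= n:
                continue  # the instance alone leaves no remainder
            if tokens[:n] == inst:        # instance at the start of the query
                pool[" ".join(tokens[n:])] += 1
            elif tokens[-n:] == inst:     # instance at the end of the query
                pool[" ".join(tokens[:-n])] += 1
    return pool


queries = ["installing delphi", "apple computer headquarters", "where is delphi located"]
print(collect_candidate_attributes(queries, ["delphi", "apple computer"]))
# "installing" and "headquarters" are each counted once; the third query is skipped
# because "delphi" appears in the middle rather than at either end.
```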
The search-signature vectors are populated for each candidate attribute in a second pass over the query logs. For instance, the query "installing oracle 8.1-7 on solaris 8", which contains both a class instance (Oracle) and a candidate attribute (installing), produces an entry in the search-signature vector of installing with respect to the class Company. The entry corresponds to the unique query template [ ]prefix [ ]infix [8.1-7 on solaris 8]postfix. Similarly, the query "coca cola company one year stock price target" results in a new entry being added to the search-signature vector of stock
price with respect to the same class Company, correspond-