DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 2, 5, 9, 12, 16, and 19 have been amended by Applicant. Claim 1 is cancelled and no new claims have been added. Claims 2-21 are currently pending.
Response to Arguments
Claim Rejections under 35 U.S.C. 103
The rejection of claims 2, 5, 7-9, 12, 14-16, 19, and 21 under 35 U.S.C. 103 has been withdrawn in view of Applicant’s amendments to independent claims 2, 9, and 16. However, upon further consideration and in view of said amendments, a new ground of rejection has been made herein.
The rejection of claims 3-4, 10-11, and 17-18 under 35 U.S.C. 103 has been withdrawn in view of Applicant’s amendments to independent claims 2, 9, and 16. However, upon further consideration and in view of said amendments, a new ground of rejection has been made herein.
The rejection of claims 6, 13, and 20 under 35 U.S.C. 103 has been withdrawn in view of Applicant’s amendments to independent claims 2, 9, and 16. However, upon further consideration and in view of said amendments, a new ground of rejection has been made herein.
Applicant argues (on page 9 of the remarks) that the combination in view of Jiwei Li does not teach the limitation “selecting a first one of the nodes of the taxonomic hierarchy based on (a) the normalized distribution of scores and (b) an allow list of allowable nodes associated with connected ones of the child nodes, the allow list based on a first output from a previous position that precedes the respective position”, as recited in claim 2 (as amended).
As set forth in the instant Office Action, in view of Applicant’s amendment to the claims, Eberhardt is cited as teaching the limitation “selecting a first one of the nodes of the taxonomic hierarchy based on … and (b) an allow list of allowable nodes associated with connected ones of child nodes, …”. To this effect, Eberhardt, Paragraph [0065] teaches a data pipeline has at least one node that includes information pertaining to how the objects are described, manipulated, and/or displayed to the user. For example, TABLE 1 illustrates exemplary characteristics of a node. The node includes items such as a name, logic steps, inputs, outputs, flow control choices, data type, and type of server that can execute the node. For each item of the node, TABLE 1 includes a description, type, and cardinality. In addition, the inputs include user inputs that are defined by the user at the UI. The user also defines dependencies between the nodes in the data pipelines at the pipeline editor. For example, the user defines one or more parent nodes that can pass control to one or more dependent children nodes based on the logic steps, user inputs, and flow control choices.; Paragraph [0066] teaches FIG. 10 is an exemplary illustration of a pipeline editor ontology 1100; Paragraph [0068] teaches FIGS. 11A-11C are exemplary illustrations of pipe editing, according to certain embodiments.; See Fig. 10 illustrating pipeline node (1108), edges (1102), parent node (1104), and child node (1106); Eberhardt, Paragraph [0069] further teaches FIGS. 11B and 11C are exemplary illustrations of a data pipeline 1210 with a choice set, according to certain embodiments. In certain implementations, the choice set is defined by the logic steps in a pipe node or by user input. During pipe editing, for pipe nodes having two or more items in the choice set, when edges are drawn to connect two pipe nodes, the user is presented with a dropdown list that includes one or more choices of the choice set.
For example, for data pipeline 1210, when drawing edges between a parent node P2 and child nodes C3, C4, and C5, a user is presented with choice set 1212 as a dropdown list that includes choices C3, C4, and C5 [i.e., understood as an allow list of nodes]. In an implementation, the user selects choice C3 from the dropdown list to draw an edge connecting parent node P2 and child node C3. Since choice C3 was selected from the choice set 1212, in FIG. 11C, choice C3 is removed from the choice set 1214, and the user can select choice C4 or C5 from the dropdown list [i.e., understood as a block list of nodes]; See Figs. 11A, 11B, and 11C; Eberhardt, Abstract, teaches a data analytics system includes processing circuitry that receives one or more objects from one or more data sources, and the one or more objects are described based on a common ontology that defines the one or more objects as data objects, manipulation objects, visualization objects, and utility objects.; Eberhardt, Paragraph [0029] further teaches each object has specific documentation that allows the objects to self-validate, which supports workflows that are configured by a user and are reusable. In certain embodiments, the XACT.TM. DTD 206 involves assigning sub-document types to each object, which can include a descriptive document 402, a semantic document 404, and an access document 406.
Although Applicant argues with reference to Jiwei Li, it is Damle that was cited as teaching the limitation “selecting a first one of the taxonomic hierarchy of nodes based on (a) the normalized distribution of scores…”. To this effect, Damle, [claim 1] teaches “means for determining the grouping of concepts in the document; means for determining a hierarchy among the concepts [i.e., understood to read on selecting one of a taxonomic hierarchy of nodes where the concepts are the nodes]; means for determining how information is distributed through the hierarchy; and means for determining, based on the determined information distribution, what portion of the document is attempting to convey information; and normalizing means, in communication with the testing means, for normalizing semantic relevance scores assigned to each concept, across the total number of concepts; … and summary generating means, in communication with the determining means, for creating, based on the vectors, a summary of at least one document in the set of received documents,”.
Jiwei Li was cited in the Non-Final Office Action and in the instant Office Action as teaching the portion of the limitation reciting “the allow list based on a first output from a previous position that precedes the respective position.” Although Applicant did not argue this portion of the limitation, it is noted that Jiwei Li does teach it. Specifically, Jiwei Li, Section 3.4 teaches “attention models adopt a look-back strategy by linking the current decoding stage with the input sentences…During decoding suppose that $e_t^s$ denotes the sentence-level embedding at current step and that $h_{t-1}^{s}(\mathrm{dec})$ denotes the hidden vector outputted from $\mathrm{LSTM}^{\mathrm{sentence}}_{\mathrm{decode}}$ at previous time step $t-1$ [i.e., as in from a previous position]. Attention models would first link the current decoding information… which is outputted from $\mathrm{LSTM}^{\mathrm{sentence}}_{\mathrm{decode}}$ with each of the input sentences, characterized by a strength indicator.” Jiwei Li, Section 4 further teaches implementing the proposed autoencoder [i.e., that adopts a look-back strategy as described above] on two datasets, including a hotel review dataset, and further teaches considering only reviews consisting of sentences ranging from 50 to 250 words and keeping [i.e., as in selecting] the vocabulary set consisting of the most frequent words [i.e., as in allowable nodes]. Jiwei Li further teaches at Section 4 that a special <unk> token is used to denote all the remaining less frequent words and that reviews consisting of more than 2 percent unknown words are discarded [i.e., as in a block list of disallowed nodes].
Applicant’s remaining arguments with respect to claim(s) 2 and analogous claims 9 and 16 (as amended) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 2, 5, 7-9, 12, 14-16, 19, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Vinyals et al. (US 20160180215 A1, published Jun. 23, 2016) in view of Paulus (US 20180300400 A1, published Oct. 18, 2018), Eberhardt, III et al. (US 20150120644 A1), Jiwei Li et al., “A Hierarchical Neural Autoencoder for Paragraphs and Documents” (June 2015), Damle (U.S. Patent No. 7,571,177, published Aug. 4, 2009), and Li et al., “Joint Embedding of Hierarchical Categories and Entities for Concept Categorization and Dataless Classification” (2016).
Regarding claim 2, Vinyals teaches an apparatus to classify text, comprising:
interface circuitry to obtain a text block (Vinyals, Abstract, teaches one of the methods includes obtaining an input text segment; See also Vinyals, Paragraphs [0006] and [0073], and [claim 1]);
machine readable instructions (Vinyals, Paragraph [0066] teaches Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory program carrier for execution by, or to control the operation of, data processing apparatus.); and
at least one processor circuit to be programmed by the machine readable instructions (Vinyals, Paragraph [0067] teaches the term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include special purpose logic circuitry.; See also Vinyals, Paragraph [0066].) to at least:
generate encoder hidden states for the text block by sequentially processing a sequence of inputs corresponding to the text block with an encoder recurrent neural network (RNN) (Vinyals, Paragraph [0032] teaches the encoder LSTM neural network 110 has been configured, e.g., through training, to process each word in a given input text segment to generate the alternative representation of the input text segment in accordance with a set of parameters. In particular, the encoder LSTM neural network 110 is configured to receive each word in the input text segment in the input order and, for a given received input, to update the current hidden state of the encoder LSTM neural network 110 by processing the received input, i.e., to modify the current hidden state of the encoder LSTM neural network 110 that has been generated by processing previous inputs from the input text segment by processing the current received input.; [Note: an LSTM neural network is a type of RNN.]; [Note: training to generate in Paragraph [0032] of Vinyals has been understood to read on “generate…” as claimed.]);
However, Vinyals does not distinctly disclose the remaining limitations.
Nevertheless, Paulus teaches iteratively select an output for a respective position in a sequence of outputs generated by a decoder RNN based on encoder hidden states by: generating an attention vector of scores based on first attention scores that correspond to a first decoder hidden state and the encoder hidden states, the first decoder hidden state to correspond to the respective position in the sequence of outputs, the first attention scores based on dot products between the first decoder hidden state and respective ones of the encoder hidden states (Paulus, Abstract, teaches that disclosed RNN-implemented methods and systems for abstractive text summarization process input token embeddings of a document through an encoder that produces encoder hidden states; apply the decoder hidden state to encoder hidden states to produce encoder attention scores for encoder hidden states; and generate encoder temporal scores for the encoder hidden states by exponentially normalizing a particular encoder hidden state's encoder attention score over its previous encoder attention scores; Paulus, Paragraph [0045] teaches calculating attention scores between the current decoder hidden state and the encoder hidden state, and further teaches that in other implementations a simple dot-product between the two vectors can be utilized; See also, Paulus, Paragraphs [0015] and [0064]);
generating a normalized distribution of scores … based on the attention vector and the first decoder hidden state (Paulus, Abstract and Paragraph [0015], teaches normalizing a particular encoder hidden state's encoder attention score over its previous encoder attention scores; generating normalized encoder temporal scores by unity normalizing the temporal scores…; and producing the current intra-decoder attention vector as a convex combination of the previous decoder hidden states scaled by the corresponding current normalized decoder attention scores and processing the vector to emit a summary token; See also, Paulus, Paragraph [0040]; Paulus, Paragraph [0044] further teaches at each decoding step, the decoder emits a summary token using a current intra-temporal encoder attention vector, a current intra-decoder attention vector, and a current decoder hidden state, applying the current decoder hidden state to each of the encoder hidden states to produce current encoder attention scores for each of the encoder hidden states; Paulus, Paragraph [0047] further teaches producing the current intra-temporal encoder attention vector as a convex combination of the encoder hidden states scaled by the corresponding current normalized encoder temporal score; generating current normalized decoder attention scores for each of the previous decoder hidden states by exponentially normalizing each of the current decoder attention scores. The intra-temporal attention context vector is calculated as follows [equation not reproduced], producing the current intra-decoder attention vector as a convex combination of the previous decoder hidden states scaled by the corresponding current normalized decoder attention scores and processing the vector to emit a summary token.); and
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the recurrent neural network for generating parse trees, as taught by Vinyals, with the attention scores, as taught by Paulus, because the attention is modulated to ensure that the model uses different parts of the input when generating the output text, hence increasing information coverage of the summary (Paulus, Paragraph [0050]).
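For illustration only, the operations mapped above (sequential encoding, dot-product attention scores, and a temporal normalization of the kind described in the Paulus Abstract) may be sketched in Python as follows. This is a hypothetical sketch, not code from Vinyals, Paulus, or the present application. The function names are assumptions, a plain tanh recurrent cell stands in for the LSTM of Vinyals, and the normalization assumes that each encoder state's current exponentiated score is divided by the sum of its exponentiated scores at earlier decoding steps and then unity normalized.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def encode(inputs: List[Vector], w_in: Matrix, w_rec: Matrix) -> List[Vector]:
    """Hypothetical tanh RNN (stand-in for an LSTM): one hidden state per input."""
    h = [0.0] * len(w_rec)
    states = []
    for x in inputs:  # inputs are consumed in input order
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        h = [math.tanh(p) for p in pre]  # new state depends on the previous state
        states.append(h)
    return states


def attention_scores(decoder_state: Vector, encoder_states: List[Vector]) -> Vector:
    """Dot product between the decoder hidden state and each encoder hidden state."""
    return [dot(decoder_state, h) for h in encoder_states]


def temporal_normalize(history: List[Vector]) -> Vector:
    """Normalize the latest row of attention scores against the earlier rows.

    history[t][i] is the score of encoder state i at decoding step t.
    """
    current = history[-1]
    temporal = []
    for i, e in enumerate(current):
        past = sum(math.exp(row[i]) for row in history[:-1])
        # First step: plain exponential; later steps: divide by past exponentials.
        temporal.append(math.exp(e) / past if past else math.exp(e))
    total = sum(temporal)
    return [s / total for s in temporal]  # unity normalization: weights sum to 1


def context_vector(weights: Vector, encoder_states: List[Vector]) -> Vector:
    """Convex combination of encoder states scaled by the normalized weights."""
    size = len(encoder_states[0])
    return [sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(size)]
```

Because the temporal step divides by past exponentials, an encoder state that drew high scores at earlier steps receives a smaller weight at the current step, which is the coverage effect relied upon in the motivation statement above.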
However, the combination of Vinyals in view of Paulus does not distinctly disclose:
…scores corresponding to nodes of a taxonomic hierarchy…
selecting a first one of the nodes of the taxonomic hierarchy based on … and (b) an allow list of allowable nodes associated with connected ones of child nodes, …
Nevertheless, Eberhardt teaches:
…scores corresponding to nodes of a taxonomic hierarchy… (Eberhardt, Paragraph [0077] teaches FIG. 13 is an exemplary illustration of a data pipeline execution, according to certain embodiments. In the example of FIG. 15, the Edge Effect framework is used to determine commonalities in naming girls who are born through a number of years. For example, a data pipeline 1500 is accessed by an end user through URL 1502, which initiates execution of the data pipeline 1500 that is created by a user, such as a pipe editor. The data pipeline includes a node 1504 that retrieves a data file that includes a table of one hundred names most frequently given to girls born in the year 1990 along with a frequency count and a frequency ranking for the one hundred names. For example, the name "Jessica" was most frequently given to girls in the year 1990 and has a frequency count of 46,463 and a frequency ranking of one, according to certain embodiments. In addition, node 1506 retrieves a data file that includes a table of one hundred names most frequently given to girls born in the year 2000 along with the corresponding frequency count and frequency ranking. Due to the common lexicon that describes the objects within the Edge Effect framework based on the content ontology 202, taxonomy 204, and XACT.TM. DTD 206, the data files retrieved at nodes 1504 and 1506 can have different formats, languages, and the like.; Eberhardt, Paragraph [0079] teaches, at node 1512, the logic steps determine the names that are common between 1990, 2000, and 2010. The execution of node 1512 results in an output of a table of the common names between the years 1990, 2000, and 2010 along with the corresponding frequency counts and frequency rankings for 1990, 2000, and 2010. At node 1514, a column filter is applied that sorts columns of the table output from node 1512 based on name and the frequency ranking and frequency count for the years 1990, 2000, and 2010.
At node 1516, the columns of the table output from node 1514 are sorted in descending order based on the 1990 frequency ranking. [NOTE: the distribution of frequency ranking as disclosed in Eberhardt is being understood as a distribution of scores of a taxonomic hierarchy. See Figs. 10 and 11A-11C for the taxonomic hierarchy of nodes.] In certain embodiments, node 1516 is the end node, and the table output from 1516 is returned to the end user via an application on an external machine.)
selecting a first one of the nodes of the taxonomic hierarchy based on … and (b) an allow list of allowable nodes associated with connected ones of child nodes, … (Eberhardt, Paragraph [0065] teaches a data pipeline has at least one node that includes information pertaining to how the objects are described, manipulated, and/or displayed to the user. For example, TABLE 1 illustrates exemplary characteristics of a node. The node includes items such as a name, logic steps, inputs, outputs, flow control choices, data type, and type of server that can execute the node. For each item of the node, TABLE 1 includes a description, type, and cardinality. In addition, the inputs include user inputs that are defined by the user at the UI. The user also defines dependencies between the nodes in the data pipelines at the pipeline editor. For example, the user defines one or more parent nodes that can pass control to one or more dependent children nodes based on the logic steps, user inputs, and flow control choices.; Paragraph [0066] teaches FIG. 10 is an exemplary illustration of a pipeline editor ontology 1100; Paragraph [0068] teaches FIGS. 11A-11C are exemplary illustrations of pipe editing, according to certain embodiments.; See Fig. 10 illustrating pipeline node (1108), edges (1102), parent node (1104), and child node (1106); Eberhardt, Paragraph [0069] further teaches FIGS. 11B and 11C are exemplary illustrations of a data pipeline 1210 with a choice set, according to certain embodiments. In certain implementations, the choice set is defined by the logic steps in a pipe node or by user input. During pipe editing, for pipe nodes having two or more items in the choice set, when edges are drawn to connect two pipe nodes, the user is presented with a dropdown list that includes one or more choices of the choice set.
For example, for data pipeline 1210, when drawing edges between a parent node P2 and child nodes C3, C4, and C5, a user is presented with choice set 1212 as a dropdown list that includes choices C3, C4, and C5 [i.e., understood as an allow list of nodes]. In an implementation, the user selects choice C3 from the dropdown list to draw an edge connecting parent node P2 and child node C3. Since choice C3 was selected from the choice set 1212, in FIG. 11C, choice C3 is removed from the choice set 1214, and the user can select choice C4 or C5 from the dropdown list [i.e., understood as a block list of nodes]; See Figs. 11A, 11B, and 11C; Eberhardt, Abstract, teaches a data analytics system includes processing circuitry that receives one or more objects from one or more data sources, and the one or more objects are described based on a common ontology that defines the one or more objects as data objects, manipulation objects, visualization objects, and utility objects.; Eberhardt, Paragraph [0029] further teaches each object has specific documentation that allows the objects to self-validate, which supports workflows that are configured by a user and are reusable. In certain embodiments, the XACT.TM. DTD 206 involves assigning sub-document types to each object, which can include a descriptive document 402, a semantic document 404, and an access document 406.)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the recurrent neural network for generating parse trees, as taught by Vinyals in view of Paulus, to further include the data analytics system, as taught by Eberhardt, in order to enable implementation of algorithms and data manipulations on a single platform, which results in better performance, reliability, and/or cost (Eberhardt, Paragraphs [0023] and [0086]).
However, the combination does not distinctly disclose: … the allow list based on a first output from a previous position that precedes the respective position.
Nevertheless, Jiwei Li teaches the allow list based on a first output from a previous position that precedes the respective position (Jiwei Li, Section 3.4 teaches “attention models adopt a look-back strategy by linking the current decoding stage with the input sentences…During decoding suppose that $e_t^s$ denotes the sentence-level embedding at current step and that $h_{t-1}^{s}(\mathrm{dec})$ denotes the hidden vector outputted from $\mathrm{LSTM}^{\mathrm{sentence}}_{\mathrm{decode}}$ at previous time step $t-1$ [i.e., as in from a previous position]. Attention models would first link the current decoding information… which is outputted from $\mathrm{LSTM}^{\mathrm{sentence}}_{\mathrm{decode}}$ with each of the input sentences, characterized by a strength indicator.”; Jiwei Li, Section 4 further teaches implementing the proposed autoencoder [i.e., that adopts a look-back strategy as described above] on two datasets, including a hotel review dataset, and further teaches considering only reviews consisting of sentences ranging from 50 to 250 words and keeping [i.e., as in selecting] the vocabulary set consisting of the most frequent words [i.e., as in allowable nodes]. Jiwei Li further teaches at Section 4 that a special <unk> token is used to denote all the remaining less frequent words and that reviews consisting of more than 2 percent unknown words are discarded [i.e., as in a block list of disallowed nodes]).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the recurrent neural network for generating parse trees, as taught by Vinyals in view of Paulus and Eberhardt, to further include the LSTM autoencoder with attention that allows only reviews within a certain range and a vocabulary set consisting of the most frequent words, as taught by Jiwei Li, because the hierarchical model that considers sentence-level structure outperforms standard sequence-to-sequence models, and attention models at the sentence level introduce a performance boost over vanilla hierarchical models (Jiwei Li, Section 4.4).
However, the combination does not distinctly disclose selecting a first one of the taxonomic hierarchy of nodes based on (a) the normalized distribution of scores….
Nevertheless, Damle teaches selecting a first one of the taxonomic hierarchy of nodes based on (a) the normalized distribution of scores… (Damle, [claim 1] teaches “means for determining the grouping of concepts in the document; means for determining a hierarchy among the concepts [i.e., understood to read on selecting one of a taxonomic hierarchy of nodes where the concepts are the nodes]; means for determining how information is distributed through the hierarchy; and means for determining, based on the determined information distribution, what portion of the document is attempting to convey information; and normalizing means, in communication with the testing means, for normalizing semantic relevance scores assigned to each concept, across the total number of concepts; … and summary generating means, in communication with the determining means, for creating, based on the vectors, a summary of at least one document in the set of received documents,”).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the recurrent neural network for generating parse trees, as taught by Vinyals in view of Paulus, Eberhardt, and Jiwei Li, to further include the means for determining a hierarchy among the concepts based on normalized semantic relevance scores assigned to each concept, as taught by Damle, in order to “remove ‘noise’ from a given document by an analysis of the structure of a document. This is especially useful in the context of the web, where an article is usually presented with a large amount of unrelated information (i.e., ‘noise’). The system accomplishes this by … determining the local hierarchy of such, determining how the information is distributed through this structure, and then deciding what part of the document is actually attempting to convey information. This can be enhanced by an original pass of the ‘semantic relevance filter’ to understand the semantically relevant concepts, and then to see how they are distributed through the local hierarchy of the document structure.” (Damle, col. 2, line 58 to col. 3, line 4).
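For illustration only, the selecting limitation as mapped above, i.e., choosing a node based on (a) a normalized distribution of scores and (b) an allow list of child nodes connected to the output at the preceding position, may be sketched in Python as follows. The sketch is hypothetical and is not drawn from any cited reference or from the application. The taxonomy dictionary, the "ROOT" node name, and the function names are assumptions, and a softmax is used merely as one example of a normalized distribution.

```python
import math
from typing import Dict, List, Optional

Taxonomy = Dict[str, List[str]]  # hypothetical: parent node -> connected child nodes


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax over per-node scores, yielding a normalized distribution."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {n: math.exp(s - peak) for n, s in scores.items()}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}


def select_node(scores: Dict[str, float], taxonomy: Taxonomy,
                previous: Optional[str], root: str = "ROOT") -> str:
    """Pick the most probable node among the children of the previous output."""
    probs = normalize(scores)
    # Allow list: child nodes connected to the output at the preceding position
    # (the root's children at the first position).
    allow_list = taxonomy.get(previous if previous is not None else root, [])
    candidates = [n for n in allow_list if n in probs]
    if not candidates:
        raise ValueError(f"no allowable child node after {previous!r}")
    return max(candidates, key=probs.__getitem__)
```

With a hypothetical taxonomy {"ROOT": ["Sports", "Science"], "Sports": ["Soccer", "Tennis"], "Science": ["Physics"]}, a node such as "Physics" can carry the highest overall score yet still be excluded when the preceding output was "Sports", because it is not on that position's allow list.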
However the combination does not distinctly disclose:
convert the sequence of outputs to an output classification based on a first dictionary having mappings between ones of the connected child nodes in the taxonomic hierarchy and corresponding ones of class labels, the sequence of outputs corresponding to a directed hierarchical sequence of outputs representing a first directed classification path for the text block in a multi-level hierarchical classification taxonomy.
Nevertheless, Li teaches convert the sequence of outputs to an output classification based on a first dictionary having mappings between ones of the… nodes in the taxonomic hierarchy and corresponding ones of class labels (Li, Section 4.1, teaches, given a set of concepts and a set of candidate categories, converting all concepts to concept vectors and all candidate categories to category vectors; Li, Section 1, par. 2, teaches in this paper we propose two models to simultaneously learn entity and category representation from large-scale knowledge bases (KBs) [knowledge bases reading on dictionary]; Li, Section 3.1 teaches in KBs [i.e., dictionary], each entity is labeled with one or more categories (c1, c2, …, ck), k ≥ 1, and described by an article containing other context entities (See Data in Figure 1); See also Fig. 1 “concept categorization” – concept clustering, and Fig. 2 – Category and entity embedding visualization of the DOTA-all data set, wherein t-SNE algorithms are used to map vectors into a 2-dimensional space. Labels with the same color are entities belonging to the same category. Labels surrounded by a box are category vectors); the sequence of outputs corresponding to a directed hierarchical sequence of outputs representing a first directed classification path for the text block in a multi-level hierarchical classification taxonomy (Li, Section 1, par. 1 teaches hierarchies, most commonly represented as Tree or Directed Acyclic Graph (DAG) structures [i.e., Tree or DAG structures reading on directed hierarchical sequence and directed classification path], provide a natural way to categorize and locate knowledge in large knowledge bases (KBs) [i.e., dictionaries]. For example, WordNet, Freebase, and Wikipedia use hierarchical taxonomy to organize entities into category hierarchies. These hierarchical categories could benefit applications such as document classification.; See also Fig. 1 teaching parent node and child nodes c2, c3, and c4; Li, Section 3.2 teaches in a category hierarchy, the categories at lower layers will cover fewer but more specific concepts than categories at upper layers. To capture this feature, we extend the CE model to further incorporate the ancestor categories of the target entity when predicting the context entities (see HCE Model in Figure 1).; Li, Section 5 (Experiments) teaches preprocessing the category hierarchy by pruning administrative categories and deleting bottom-up edges to construct a DAG. [Note: Eberhardt has also been shown to teach connected child nodes as stated above.]).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the recurrent neural network for generating parse trees, as taught by Vinyals in view of Paulus, Eberhardt, Jiwei Li, and Damle, to further include the Tree or Directed Acyclic Graph (DAG) structures, as taught by Li, because tree or DAG structures provide a natural way to categorize and locate knowledge in large knowledge bases (KBs) (Li, Section 1, par. 1).
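For illustration only, the converting limitation (mapping a directed sequence of node outputs to class labels through a first dictionary) may be sketched in Python as below. The sketch is hypothetical and is not taken from Li or from the application. The node identifiers, the labels dictionary, the taxonomy dictionary, and the "ROOT" node are assumptions, and the parent-child check is included only to show what a directed classification path means: each output must be a connected child of the one before it.

```python
from typing import Dict, List


def to_classification(outputs: List[str], labels: Dict[str, str],
                      taxonomy: Dict[str, List[str]], root: str = "ROOT") -> List[str]:
    """Map a directed node sequence to class labels, checking each parent-child link."""
    path = []
    parent = root
    for node in outputs:
        if node not in taxonomy.get(parent, []):
            raise ValueError(f"{node!r} is not a child of {parent!r}")
        path.append(labels[node])  # first dictionary: node -> class label
        parent = node              # the next output must descend from this node
    return path
```

For example, with a hypothetical taxonomy {"ROOT": ["n1", "n4"], "n1": ["n2", "n3"]} and labels {"n1": "Sports", "n3": "Tennis", ...}, the output sequence ["n1", "n3"] converts to the path ["Sports", "Tennis"], whereas ["n1", "n4"] is rejected because "n4" is not connected below "n1".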
Regarding claim 5, the combination of Vinyals in view of Paulus, Eberhardt, Jiwei Li, Damle, and Li teaches all of the limitations of claim 2, and the combination further teaches wherein one or more of the at least one processor circuit is to select the first one of the taxonomic hierarchy of the nodes based on a block list of disallowed nodes, the block list based on the first output from the previous/preceding position that precedes the respective position (Jiwei Li, Section 3.4 teaches “attention models adopt a look-back strategy by linking the current decoding stage with the input sentences…During decoding suppose that $e_t^s$ denotes the sentence-level embedding at current step and that $h_{t-1}^{s}(\mathrm{dec})$ denotes the hidden vector outputted from $\mathrm{LSTM}^{\mathrm{sentence}}_{\mathrm{decode}}$ at previous time step $t-1$ [i.e., as in a previous position]. Attention models would first link the current decoding information… which is outputted from $\mathrm{LSTM}^{\mathrm{sentence}}_{\mathrm{decode}}$ with each of the input sentences, characterized by a strength indicator.”; Jiwei Li, Section 4 further teaches implementing the proposed autoencoder [i.e., that adopts a look-back strategy as described above] on two datasets, including a hotel review dataset, and further teaches considering only reviews consisting of sentences ranging from 50 to 250 words and keeping the vocabulary set consisting of the most frequent words [i.e., as in allowable nodes]. Jiwei Li further teaches at Section 4 that a special <unk> token is used to denote all the remaining less frequent words and that reviews consisting of more than 2 percent unknown words are discarded [i.e., all the remaining less frequent words consisting of unknown words marked with an <unk> token and discarded are understood as a block list of disallowed nodes]; [Note: Eberhardt was also shown to teach an allow list of nodes and a block list of nodes in the rejection of claim 2 above. Specifically, in FIG. 11C, choice C3 is removed from the choice set 1214, and the user can select choice C4 or C5 from the dropdown list [i.e., understood as a block list of nodes]]).
Motivation to combine same as stated above for claim 2.
Regarding claim 7, the combination of Vinyals in view of Paulus, Eberhardt, Jiwei Li, Damle and Li, teaches all of the limitations of claim 2, and the combination further teaches wherein the encoder RNN and the decoder RNN are long short-term memory (LSTM) networks (Vinyals, Paragraph [0029]: The system includes an encoder long short-term memory (LSTM) neural network and a decoder LSTM neural network 120. Also see Paragraph [0031]).
Motivation to combine same as stated for claim 2.
Regarding claim 8, the combination of Vinyals in view of Paulus, Eberhardt, Jiwei Li, Damle and Li, teaches all of the limitations of claim 2, and the combination further teaches wherein the sequence of outputs is a first sequence of outputs, one or more of the at least one processor circuit is to process the ones of the encoder hidden states with the decoder RNN to produce a second sequence of outputs that is different than the first sequence of outputs (Vinyals, Abstract, teaches obtaining an input text segment, processing the input text segment using a first long short term memory (LSTM) neural network to convert the input text segment into an alternative representation for the input text segment, and processing the alternative representation for the input text segment using a second LSTM neural network to generate a linearized representation of a parse tree for the input text segment), the second sequence of outputs representing a second directed classification path for the text block in the multi-level hierarchical classification taxonomy that is different from the first directed classification path (Li, Section 3.1 teaches In knowledge bases such as Wikipedia, category hierarchies are usually given as DAG or tree structures, entities are categorized into one or more categories as leaves.).
Motivation to combine same as stated for claim 2.
Regarding claim 9,
Claim 9 (as amended) recites the same or similar remaining limitations as claim 2 (as amended) and therefore it is rejected under the same rationale and for the same reasons as claim 2.
Vinyals further teaches at least one non-transitory machine-readable storage medium comprising machine-readable instructions to cause at least one processor circuit to at least: (Vinyals, [claim 20] teaches A computer program product encoded on one or more non-transitory computer storage media, the computer program product comprising instruction that, when executed by one or more computers, cause the one or more computers to perform operations)
Vinyals further teaches pass encoder hidden states generated by an encoder recurrent neural network (RNN) to a decoder RNN (Vinyals, Fig. 1, element 110 encoder LSTM Network going into element 120 decoder LSTM Network; See also Vinyals, Fig. 2 elements 204 and 206 – and corresponding paragraphs [0040], [0042], [0043])
Regarding claim 12,
Claim 12 (as amended) recites the same or similar limitations as claim 5 (as amended) and therefore it is rejected under the same rationale and for the same reasons as claim 5.
Regarding claim 14,
Claim 14 recites the same or similar limitations as claim 7 and therefore it is rejected under the same rationale and for the same reasons as claim 7.
Regarding claim 15,
Claim 15 recites the same or similar limitations as claim 8 and therefore it is rejected under the same rationale and for the same reasons as claim 8.
Regarding claim 16,
Claim 16 recites the same or similar limitations as claim 2 and therefore it is rejected under the same rationale and for the same reasons as claim 2.
Vinyals further teaches a method to classify text (Vinyals, Paragraph [0006] teaches in general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of obtaining an input text segment; processing the input text segment using a first long short term memory (LSTM) neural network to convert the input text segment into an alternative representation for the input text segment; and processing the alternative representation for the input text segment using a second LSTM neural network to generate a linearized representation of a parse tree for the input text segment.)
Regarding claim 19,
Claim 19 (as amended) recites the same or similar limitations as claim 5 (as amended) and therefore it is rejected under the same rationale and for the same reasons as claim 5.
Regarding claim 21,
Claim 21 recites the same or similar limitations as claim 8 and therefore it is rejected under the same rationale and for the same reasons as claim 8.
Claims 3-4, 10-11, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Vinyals et al. in view of Paulus, Eberhardt, Jiwei Li et al., Damle, and Li et al., as applied to claim 2, and further in view of Hill et al., “Learning to Understand Phrases by Embedding the Dictionary” (published Feb. 2016).
Regarding claim 3, the combination of Vinyals in view of Paulus, Eberhardt, Jiwei Li, Damle, and Li teaches all of the limitations of claim 2. However, the combination does not distinctly disclose wherein one or more of the at least one processor circuit is to: generate the sequence of inputs corresponding to the text block by iteratively replacing a word in the text block with a respective input word embedding based on mappings stored in a second dictionary of input words.
Nevertheless, Hill teaches wherein one or more of the at least one processor circuit is to: generate the sequence of inputs corresponding to the text block by iteratively replacing a word in the text block with a respective input word embedding based on mappings stored in a second dictionary of input words (Hill, Section 3.6 teaches RNN architecture modified to create a bilingual reverse dictionary. To create the bilingual variant, we simply replace the Word2Vec target embeddings with those from a bilingual embedding space. Bilingual embedding models use bilingual corpora to learn a space of representations of the words in two languages, such that words from either language that have similar meanings are close together (Hermann and Blunsom, 2013…). We trained the RNN model to map from English definitions to English words in the bilingual space.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the recurrent neural network for generating parse trees, as taught by Vinyals in view of Paulus, Eberhardt, Jiwei Li, Damle, and Li, to further include the target word embedding, as taught by Hill, in order to lead to improved output from more general QA and dialog systems and information retrieval engines in general. (Hill, Section 5)
Regarding claim 4, the combination of Vinyals in view of Paulus, Eberhardt, Jiwei Li, Damle, Li, and Hill teaches all of the limitations of claim 3, and the combination further teaches wherein one or more of the at least one processor circuit is to apply, during training of the encoder RNN and the decoder RNN (Vinyals, Paragraph [0065] teaches in order to configure the encoder LSTM neural network and the decoder LSTM neural network, the system can train the networks using conventional machine learning training techniques, e.g., using Stochastic Gradient Descent with backpropagation through time. In particular, the system can train the networks jointly by backpropagating gradients computed for the decoder LSTM neural network back to the encoder LSTM neural network to adjust the values of the parameters of the encoder LSTM neural network during the training technique.; Vinyals, Paragraph [0032] teaches the encoder LSTM neural network 110 has been configured, e.g., through training, to process each word in a given input text segment to generate the alternative representation of the input text segment in accordance with a set of parameters.), an embedded layer to learn the respective embeddings for respective ones of the class labels in the first dictionary and respective ones of the input words in the second dictionary (Li, Introduction, teaches training the category and entity vectors on Wikipedia, and then evaluating the methods on two applications: concept categorization and dataless hierarchical classification; Li, Introduction, further teaches in this paper we propose two models to simultaneously learn entity and category representation from large-scale knowledge bases [i.e., the first dictionary]. The category embedding model extends the entity embedding method of (Hu et al. 2015) by using category information with entities to learn entity and category embeddings. The hierarchical category embedding model extends the category embedding model by integrating categories’ hierarchical structure…The final learned entity and category vectors can capture meaningful semantic relatedness between entities and categories.; Li, Section 5.2.1 teaches three controlled vocabularies [note: understood to read on first and second dictionaries]).
Motivation to combine same as stated above for claim 3.
Regarding claim 10,
Claim 10 recites the same or similar limitations as claim 3 and therefore it is rejected under the same rationale and for the same reasons as claim 3.
Regarding claim 11,
Claim 11 recites the same or similar limitations as claim 4 and therefore it is rejected under the same rationale and for the same reasons as claim 4.
Regarding claim 17,
Claim 17 recites the same or similar limitations as claim 3 and therefore it is rejected under the same rationale and for the same reasons as claim 3.
Regarding claim 18,
Claim 18 recites the same or similar limitations as claim 4 and therefore it is rejected under the same rationale and for the same reasons as claim 4.
Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Vinyals in view of Paulus, Eberhardt, Jiwei Li, Damle, and Li, as applied to claim 2, and further in view of Chung et al., “Empirical evaluation of gated recurrent neural networks on sequence modeling” (2014).
Regarding claim 6, the combination of Vinyals in view of Paulus, Eberhardt, Jiwei Li, Damle and Li, teaches all of the limitations of claim 2. However, the combination does not distinctly disclose wherein the encoder RNN and the decoder RNN are gated recurrent unit (GRU) neural networks.
Nevertheless, Chung teaches wherein the encoder RNN and the decoder RNN are gated recurrent unit (GRU) neural networks (Chung, Abstract, teaches in this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU).; Chung, Discussion § 3.3: both LSTM unit and Gated Recurrent Unit are similar in a way, both can keep the existing content and add the new content on top of it. It is easy for each unit to remember the existence of a specific feature in the input stream for a long series of steps.).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the recurrent neural network for generating parse trees, as taught by Vinyals in view of Paulus, Eberhardt, Jiwei Li, Damle and Li, to further include the Gated Recurrent Unit (GRU), as taught by Chung. The motivation would be that GRU recurrent neural networks are indeed better than more traditional recurrent units (e.g., LSTM), as convergence in CPU time may be reached faster and the final solutions tend to be better (Chung, Abstract and Section 5).
Regarding claim 13,
Claim 13 recites the same or similar limitations as claim 6 and therefore it is rejected under the same rationale and for the same reasons as claim 6.
Regarding claim 20,
Claim 20 recites the same or similar limitations as claims 6 and 7 and therefore it is rejected under the same rationale and for the same reasons as claims 6 and 7.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEATRIZ RAMIREZ BRAVO whose telephone number is 571-272-2156. The examiner can normally be reached Mon. - Fri. 7:30 a.m. - 5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, USMAAN SAEED can be reached at 571-272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/B.R.B./Examiner, Art Unit 2146
/USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146