Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/29/2025 has been entered.
Response to Amendment
The amendment filed 12/29/2025 has been entered. The status of the claims is as follows:
Claims 1-19 remain pending in the application.
Claim 20 is canceled.
Claims 1, 17-19 are amended.
Response to Arguments
In reference to the Claim Rejections under 35 U.S.C. 103:
Argument 1:
Applicant asserts in Remarks pg. 2 that the Office Action acknowledges that Lu does not disclose deep neural network layers and asserts on page 15 that "Fig.1 discloses the AND-OR Grammar (AOG) structure operates hierarchically, with nodes combining or selecting components at different levels. This is conceptually similar to how layers in neural network process and combine features." Applicant further submits that Lu's AOG structure is not analogous to Applicant's claimed one or more compositional grammatical neural network node layers, wherein at least one of the one or more compositional grammatical neural network node layers comprises an AND-OR grammar building block. Accordingly, Applicant concludes that Lu fails to disclose the features for which it is cited.
Response 1:
Examiner respectfully disagrees and notes that Applicant’s argument is inconsistent with Applicant’s own disclosure in the Specification. Claim 1 recites, in substance, “compositional grammatical neural network node layer”. In the Instant Specification, ¶[0049] explicitly identifies the “compositional grammatical neural network node layer” as the structure labeled 102 in FIG. 1 of the Drawings (i.e., the AOG building-block depiction comprising AND-nodes, OR-nodes, and terminal-nodes arranged in layered form). As shown in FIG. 1 of the Lu reference, the block labeled “AND-OR Grammatical Building Block” depicts a compositional grammar structure with AND-nodes, OR-nodes, and terminal-nodes and a corresponding layered grouping (e.g., node layers denoted along the right side), operating on output feature maps and producing an output feature map. This depiction is the same type of node-layer architecture that ¶[0049] identifies as the claimed “compositional grammatical neural network node layer” (drawing element 102). Accordingly, FIG. 1 of Lu does disclose the claimed compositional grammatical neural network node layers, at least because Applicant’s Specification expressly defines that claimed feature as the AOG node-layer structure shown in the drawings.
Applicant’s arguments filed on 12/29/2025 have been fully considered but they are not persuasive.
Argument 2:
Applicant asserts in Remarks pg. 3 that Tu also fails to disclose one or more compositional grammatical neural network node layers, wherein at least one of the one or more compositional grammatical neural network node layers comprises an AND-OR grammar building block as claimed. Further, Applicant submits that Tu is silent regarding channel-wise concatenation, element-wise summation, or per-channel learnable gates as recited in Claim 1.
In response, the Examiner maintains the rejection as set forth in the previous Office action. Applicant is attempting to argue the references in a vacuum. Tu was not cited for all the limitations Applicant discussed, and is not required to meet those limitations. The Tu reference was only used to show the limitation “wherein the AND-OR grammar building block comprises a graph of stacked and interconnected plurality of AND nodes configured to concatenate features from connected child nodes, plurality of OR nodes configured to element-wise sum features from connected child nodes, and plurality of terminal-nodes each configured to select or output a channel-wise slice of a given input feature channel that connects in a set of combinations of AND nodes and OR nodes to the N groups of inputted features of each of the plurality of feature channels.” This is all that Tu needs to disclose, and it is not required to meet limitations for which it has not been cited. Regarding the limitation that Applicant is arguing, the Examiner has explained above how the Lu reference teaches the “one or more compositional grammatical neural network node layers, wherein at least one of the one or more compositional grammatical neural network node layers comprises an AND-OR grammar building block”.
In reference to Applicant’s argument that Tu is silent regarding channel-wise concatenation, element-wise summation, or per-channel learnable gates as recited in Claim 1, Examiner respectfully disagrees and notes that in the previous Office action, Examiner clearly showed how Tu teaches the AND-node specifying string concatenation (See Tu Pg. 6, Section 2.1, ¶[2]). Tu at Pg. 7, ¶[3] also teaches “Or-nodes and And-nodes of the AOG can be used to represent sum nodes and product nodes in the SPN respectively”. Further, Examiner notes that ¶[0018] of the Specification also discloses “wherein the AND-OR grammar building block comprises an input that maps N groups of input-able features (e.g., via a terminal node configured to extract a word, sub-word, phrase, or sub-phrase) from one or more feature channel”. Accordingly, Tu explicitly teaches concatenating all the substrings using the AND-node and performing summation using the OR-node. Regarding the newly amended elements in claim 1 reciting the per-channel learnable gates, Tu also teaches at Pg. 20, ¶[3]: “A sum node computes a weighted sum of its child nodes”, and Pg. 21, ¶[1] also discloses “an Or-rule with the sum node as the left-hand side, the child node as the right-hand side, and the normalized weight of the child node as the conditional probability”. Therefore, the rejection is maintained as set forth in the previous Office action.
Applicant’s arguments filed on 12/29/2025 have been fully considered but they are not persuasive.
Argument 3:
Applicant asserts in Remarks pg. 3-5 that the 103 rejection is improper because the Office Action allegedly fails to consider claim 1 “as a whole” and instead uses a piecemeal approach by picking unrelated features from different references without a reference-based reason to combine them.
Response 3:
Applicant’s allegation that the Office Action provides only “conclusory” motivation is not persuasive because the proposed combinations are supported by a concrete, reference-based technical rationale, not merely a citation to MPEP 2143. Lu expressly provides a compositional And-Or graph framework for joint tracking/learning/parsing, and Harang teaches implementing multiple related tasks using a multi-layer neural network with multiple nodes per layer; one of ordinary skill in the art would have been motivated to incorporate Harang’s multi-task NN into Lu’s AOG-based framework to perform Lu’s multiple, related recognition tasks using a known and routinely adopted neural network implementation, yielding predictable improvements (e.g., shared feature learning and improved task performance) with a reasonable expectation of success. Likewise, Lu and Tu are directed to the same class of structured And-Or representations, and Tu teaches stochastic And-Or grammars for modeling images and events data; one of ordinary skill in the art would therefore have been motivated to apply Tu’s stochastic grammar formulation to Lu’s hierarchical AOG framework to provide a known probabilistic extension for modeling uncertainty in the same domain, again representing a predictable variation of a known technique for use in related systems. Accordingly, the Office Action’s rationale is tied to the references’ teachings and is not an improper “piecemeal” reconstruction.
Applicant’s arguments filed on 12/29/2025 have been fully considered but they are not persuasive.
Claim Rejections - 35 USC § 112(a) – New Matter
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-19 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
The recitation of “plurality of OR-nodes configured to perform an element-wise sum with learned per-channel gating weights applied to features from concatenated child nodes” in claims 1, 18 and 19 is not supported by the Specification. In Remarks pg. 2, Applicant points to the following paragraphs of the Specification as support for the newly amended elements of claim 1: ¶[0053]-[0054], ¶[0079], ¶[0121]-[0122] and ¶[0173]. However, the Examiner notes that the Specification at no time discusses the “learned per-channel gating weights” (at least in the cited paragraphs above), let alone performing an element-wise sum with them. At no other point in the Specification is there any discussion of “learned per-channel gating weights” applied to features from concatenated child nodes, and therefore the claims are rejected under 35 U.S.C. 112(a) as new matter. In the next response, please point to the portions of the Specification supporting these limitations.
Dependent claims 2-17 are also rejected under the same rationale as they inherit the deficiency from their independent claims 1, 18 and 19.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-8 and 12-19 are rejected under 35 U.S.C. 103 as being unpatentable over Lu et al. (“Online Object Tracking, Learning and Parsing with And-Or Graphs”) (hereafter referred to as “Lu”) in view of Harang (US 2019/0266492 A1) and further in view of Kewei Tu (“Stochastic And-Or Grammars: A Unified Framework and Logic Perspective”) (hereafter referred to as “Tu”).
Regarding Claim 1, Lu explicitly discloses:
instantiating one or more compositional grammatical neural network node layers,
wherein at least one of the one or more compositional grammatical neural network node layers comprises an AND-OR grammar building block having a plurality of split inputs that span across a plurality of feature channels of an input feature map. (Lu, Page 3462, Col. 2, Section i.): “an AOG represents an object in a hierarchical and compositional manner which has three types of nodes: an And-node represents the rule of decomposing a complex structure (e.g., a walking person or a running basketball player) into simple ones; an Or-node represents alternative structures at both object and part levels which can capture different poses and viewpoints and partial occlusion; and a Terminal-node grounds the representational symbol to image data using different appearance templates to capture local appearance change.”, Page 3463, Fig. 1:
[media_image1.png: greyscale reproduction of Lu, Fig. 1]
) [Examiner’s note: Fig.1 discloses the AND-OR Grammar (AOG) structure operates hierarchically, with nodes combining or selecting components at different levels. This is conceptually similar to how layers in neural network process and combine features.]
wherein the plurality of split inputs of the AND-OR grammar building block maps N groups of input-able features from the plurality of feature channels, and (Lu, Page 3463, Fig. 1:
[media_image2.png: greyscale reproduction of Lu, Fig. 1]
, Page 3463, Col. 1, Section 1.2.i): “i) The AOG for modeling the tracked object. Given the input bounding box of the object in the first frame (top-left), we divide the bounding box into a r × c cells (3 × 3 here). The set of primitive parts are then enumerated in the r × c cells, which quantize the hypothesis space of AOG using the method proposed in [31]. The quantization is capable of exploring a large number of latent part configurations (capturing discriminative and stable parts at different frames), meanwhile it makes the problem of online learning AOG feasible.”) [Examiner’s note: The process begins with an input bounding box of the object (top-left in the image). The bounding box is divided into smaller cells (e.g., r x c), and “primitive parts” are enumerated within these cells. This is conceptually similar to extracting features from different regions of an input image, akin to feature channels in a convolutional network]
Lu fails to disclose:
neural network node layers
wherein the AND-OR grammar building block comprises a graph of stacked and interconnected plurality of AND nodes configured to concatenate features from connected child nodes, plurality of OR nodes configured to element-wise sum features from connected child nodes, and plurality of terminal-nodes each configured to select or output a channel-wise slice of a given input feature channel that connects in a set of combinations of AND nodes and OR nodes to the N groups of inputted features of each of the plurality of feature channels.
However, Harang explicitly discloses:
neural network node layers (Harang, ¶[0060]: “FIGS. 4a, 4b and 4c are schematic illustrations of a neural network model (e.g., similar to the neural network model 113 shown and described with respect to FIG.1), according to an embodiment. In FIGS. 4a, 4b and 4c, the neural network model includes an input layer 410, a hidden layer-1 420, a hidden layer-2 430 and an output layer 440. The input layer 410 includes input nodes that are used for receiving features associated with one or more input files ( e.g., images, documents and/or the like). The hidden layer-1 420 and the hidden layer-2 430 are used for classification of one or more input files. The hidden layer-1 420 and the hidden layer-2 430 include a set of interconnected nodes.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Lu and Harang. Lu teaches a framework for simultaneously tracking, learning and parsing objects with a hierarchical and compositional And-Or graph (AOG) representation. Harang teaches performing multiple tasks using a neural network which has multiple different layers, where each layer includes multiple nodes. One of ordinary skill would have been motivated to combine Lu and Harang because MPEP 2143 sets forth the Supreme Court rationales for obviousness including: (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) “Obvious to try”: choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
However, Tu explicitly discloses:
wherein the AND-OR grammar building block comprises a graph of stacked and interconnected plurality of AND nodes configured to concatenate features from connected child nodes, plurality of OR nodes configured to perform an element-wise sum with learned per-channel gating weights applied to features from connected child nodes, and plurality of terminal-nodes each configured to select or output a channel-wise slice of a given input feature channel that connects in a set of combinations of AND nodes and OR nodes to the N groups of inputted features of each of the plurality of feature channels. (Tu, Pg. 5, Figure 1: “(a) A graphical representation of an example stochastic AOG of line drawings of animal faces. Each And-rule is represented by an And-node and all of its child nodes in the graph. The spatial relations within each And-rule are not shown for clarity. Each Or-rule is represented by an Or-node and one of its child nodes, with its probability shown on the corresponding edge. (b) A line drawing image and its compositional structure generated from the example AOG. Again, the spatial relations between nodes are not shown for clarity. The probability of the compositional structure is partially computed at the top right.
[media_image3.png: greyscale reproduction of Tu, Figure 1]
”, Tu, Pg. 4, ¶[1]: “An Or-rule, parameterized by an ordered pair <r, p>, represents an alternative configuration of a pattern. The Or-rule specifies a production r : O->x, where O is an Or-node and x is either a terminal or a nonterminal node representing a possible configuration.”, Pg. 4, ¶[3]: “Fig. 1(a) shows an example stochastic context-free AOG of line drawings. Each terminal or nonterminal node represents an image patch and its parameter is a 2D vector representing the position of the patch in the image. Each terminal node denotes a line segment of a specific orientation while each nonterminal node denotes a class of line drawing patterns.”, Tu, Pg. 6, Section 2.1, ¶[2]: “In a stochastic AOG representing a SCFG, each node represents a string and the parameter of a node is the start/end positions of the string in the complete sentence; the parameter relation and parameter function in an And-rule specify string concatenation, i.e., the substrings must be adjacent and the concatenation of all the substrings forms the composite string represented by the parent And-node”, Pg. 20, ¶[3]: “A sum node computes a weighted sum of its child nodes”, Pg. 21, ¶[1]: “an Or-rule with the sum node as the left-hand side, the child node as the right-hand side, and the normalized weight of the child node as the conditional probability”)
such that the AND-OR grammar building block defines a phrase structure grammar and dependency grammar in a bottom-up configuration (Tu, Pg. 7, Section 2.2, ¶[1]: “Our algorithm is based on bottom-up dynamic programming and can be seen as a generalization of several previous exact inference algorithms designed for special cases of stochastic AOGs (such as the CYK algorithm for text parsing).”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Lu and Tu. Lu teaches a framework for simultaneously tracking, learning and parsing objects with a hierarchical and compositional And-Or graph (AOG) representation. Tu teaches stochastic And-Or grammars extending traditional stochastic grammars of language to model images or events data. One of ordinary skill would have been motivated to combine Lu and Tu because MPEP 2143 sets forth the Supreme Court rationales for obviousness including: (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) “Obvious to try”: choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
Regarding Claim 2, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above)
Lu in view of Harang and Tu further discloses:
wherein the graph of interconnected plurality of AND nodes and plurality of OR
nodes are configured in a plurality of stacked stages, including a first stage followed by a second stage, wherein the first stage comprises at least one AND-node, and wherein the second stage comprises at least one OR-node. (Lu, Page 3463, Fig.1:
[media_image1.png: greyscale reproduction of Lu, Fig. 1]
, Page 3464, Col. 1, ¶[2]: “In the subsequent frames, the AOG is re-learned iteratively with two steps: The first step collects the false positives and false negatives of the current AOG in a new frame by exploring the temporal and spatial constraints in the trajectory, similar to the P-N learning proposed in TLD [19], and the second step updates the structure of the AOG (e.g., adding a new object template and/or some part configurations and corresponding templates) if necessary, and re-estimates the parameters based on the augmented training dataset.”) [Examiner’s note: Figure 1 discloses that the parsing and re-learning processes occur iteratively, where each stage builds on the previous one by incorporating new information and refining the graph structure. The iterative nature of this process (initial stage -> re-learning with updates) demonstrates a stacked-stage configuration.]
Regarding Claim 3, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above)
Lu in view of Harang and Tu further discloses:
wherein the graph of interconnected plurality of AND nodes and plurality of OR nodes are configured in a plurality of stacked stages, including a first stage followed by a second stage, wherein the first stage comprises at least one OR-node, and wherein the second stage comprises at least one AND-node. (Lu, Page 3463, Fig.1:
[media_image1.png: greyscale reproduction of Lu, Fig. 1]
, Page 3464, Col. 1, ¶[2]: “In the subsequent frames, the AOG is re-learned iteratively with two steps: The first step collects the false positives and false negatives of the current AOG in a new frame by exploring the temporal and spatial constraints in the trajectory, similar to the P-N learning proposed in TLD [19], and the second step updates the structure of the AOG (e.g., adding a new object template and/or some part configurations and corresponding templates) if necessary, and re-estimates the parameters based on the augmented training dataset.”) [Examiner’s note: Figure 1 discloses that the parsing and re-learning processes occur iteratively, where each stage builds on the previous one by incorporating new information and refining the graph structure. The iterative nature of this process (initial stage -> re-learning with updates) demonstrates a stacked-stage configuration.]
Regarding Claim 4, the combination of Lu, Tu and Harang discloses all the limitations of Claim 3 (as shown in the rejections above)
Lu in view of Harang and Tu further discloses:
wherein the first stage comprises a first OR-node and a second OR-node, wherein the first OR-node is connected to a portion of the input, and wherein the second OR-node is connected to another portion of the input and to the first OR-node (Lu, Page 3463, Fig.1:
[media_image1.png: greyscale reproduction of Lu, Fig. 1]
) [Examiner’s note: The “Initial AOG” on the left of Figure 1 is being interpreted as the first stage. With the OR-node being shown by the green circle, the Examiner interprets there to be 4 OR-nodes (those at the bottom of the tree) connected to parts of the input image, wherein these child OR-nodes are also connected to the parent OR-node]
Regarding Claim 5, the combination of Lu, Tu and Harang discloses all the limitations of Claim 2 (as shown in the rejections above)
Lu in view of Harang and Tu further discloses:
wherein the first stage comprises a first OR-node and a second OR-node, wherein the first OR-node is connected to a portion of the input, and wherein the second OR-node is connected to another portion of the input. (Lu, Page 3463, Fig.1:
[media_image1.png: greyscale reproduction of Lu, Fig. 1]
) [Examiner’s note: The “Initial AOG” on the left of Figure 1 is being interpreted as the first stage. With the OR-node being shown by the green circle, the Examiner interprets there to be 4 OR-nodes (those at the bottom of the tree) connected to parts of the input image, wherein these child OR-nodes are also connected to the parent OR-node]
Regarding Claim 6, the combination of Lu, Tu and Harang discloses all the limitations of Claim 2 (as shown in the rejections above)
Lu in view of Harang and Tu further discloses:
wherein the first stage comprises a first OR-node and a second OR-node, wherein the first OR-node is connected to a portion of the input, and wherein the second OR-node is connected to another portion of the input. (Lu, Page 3463, Fig.1:
[media_image1.png: greyscale reproduction of Lu, Fig. 1]
) [Examiner’s note: The “Initial AOG” on the left of Figure 1 is being interpreted as the first stage. With the OR-node being shown by the green circle, the Examiner interprets there to be 4 OR-nodes (those at the bottom of the tree) connected to parts of the input image, wherein these child OR-nodes are also connected to the parent OR-node]
Regarding Claim 7, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above)
Lu in view of Harang and Tu further discloses:
wherein the AND-OR grammar building block comprises a first hyper-parameter associated with a number of N groups of input-able features. (Lu, Fig.1:
[media_image1.png: greyscale reproduction of Lu, Fig. 1]
Page 3463, Col. 2, Section ii): “We maintain a DP table memoizing the candidate object states generated by the spatial DP algorithm in the past n frames (e.g., 20 in our experiments). The temporal DP algorithm is then used to find the optimal solutions for the n frames, which can help correct tracking errors (i.e., false negatives and false positives collected online) by leveraging more spatial-temporal information.”) [Examiner’s note: “hyper-parameter associated with a number of N groups of input-able features” is being interpreted as n frames of the input image in the initial stage of AOG]
Regarding Claim 8, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above)
Lu in view of Harang and Tu further discloses:
wherein the AND-OR grammar building block comprises a second hyper-parameter associated with a branching factor for each AND-nodes in the AND-OR grammar building block. (Lu, Page 3466, Col. 1, Section 4.1, ¶[1-2]: “To compute
[media_image4.png: greyscale equation image], we do parsing inside ΛBi with the current AOG G with the optimal configuration C_i* being sought. We denote this parsing process by [media_image5.png: greyscale equation image]
which is given in Algorithm 1 in the supplementary material. The basic idea is that for a given candidate Bi, we want to find the best of all possible parse trees in the AOG, and for each parse tree we want to find the best part configuration (through local deformation of the Terminal-nodes)… If the DP solution has high confidence matching score based on the online learned threshold, it will be accepted. Otherwise, it keeps all the candidates with scores greater than some threshold (e.g., 70% of the high confidence threshold), and then run DP algorithm in the “surround” with all the candidates kept in the similar manner, followed by running the temporal DP algorithm.”, Page 3466, Col.2, Section 5: “In this paper, we do not use the quadratic deformation term as done in the DPM, instead we use local max when summing the scores over child nodes for an And-node (as written in Algorithm.1 in the supplementary). The local deformation range is proportional to the side lengths of a terminal-node (e.g., 0.1 in this paper).”) [Examiner’s note: “hyper-parameter associated with a branching factor for each AND-nodes” is being interpreted as the confidence matching score threshold (e.g., 70%) as it limits the branching factor by pruning less likely branches]
Regarding Claim 12, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above)
Lu in view of Harang and Tu further discloses:
wherein the one or more compositional grammatical neural network node layers define a deep neural network structure that comprises a second compositional grammatical neural network node layer, wherein the second compositional grammatical neural network node layer comprises an AND-OR grammar building block, (Lu, Page 3462, Col. 2, Section i.): “an AOG represents an object in a hierarchical and compositional manner which has three types of nodes: an And-node represents the rule of decomposing a complex structure (e.g., a walking person or a running basketball player) into simple ones; an Or-node represents alternative structures at both object and part levels which can capture different poses and viewpoints and partial occlusion; and a Terminal-node grounds the representational symbol to image data using different appearance templates to capture local appearance change.”, Page 3463, Fig. 1:
[media_image2.png: greyscale reproduction of Lu, Fig. 1]
) [Examiner’s note: Fig.1 discloses the AND-OR Grammar (AOG) structure operates hierarchically, with nodes combining or selecting components at different levels. This is conceptually similar to how layers in neural network process and combine features.]
wherein the AND-OR grammar building block comprises an input that maps N groups of input-able features from one or more feature channels, and (Lu, Page 3463, Fig. 1:
[media_image2.png: greyscale reproduction of Lu, Fig. 1]
, Page 3463, Col. 1, Section 1.2.i): “i) The AOG for modeling the tracked object. Given the input bounding box of the object in the first frame (top-left), we divide the bounding box into a r × c cells (3 × 3 here). The set of primitive parts are then enumerated in the r × c cells, which quantize the hypothesis space of AOG using the method proposed in [31]. The quantization is capable of exploring a large number of latent part configurations (capturing discriminative and stable parts at different frames), meanwhile it makes the problem of online learning AOG feasible.”) [Examiner’s note: The process begins with an input bounding box of the object (top-left in the image). The bounding box is divided into smaller cells (e.g., r x c), and “primitive parts” are enumerated within these cells. This is conceptually similar to extracting features from different regions of an input image, akin to feature channels in a convolutional network]
wherein the AND-OR grammar building block comprises a graph of stacked and interconnected plurality of AND nodes and plurality of OR nodes that connects in a set of combinations of AND nodes and OR nodes to the N groups of inputted features of each of the one or more feature channels (Lu, Page 3465, Col. 1, ¶[2]: “The AOG is a directed acyclic graph, denoted by G = (V,E). The node set V consists of three subsets of Or-nodes, And-nodes and Terminal-nodes respectively, which represent different aspects of modeling objects in a grammatical manner [39]. From the top to bottom, the AOG consists of: The object Or-node (plotted by green circles), which represents alternative object configurations; A set of And-nodes (solid blue circles), each of which represents a typical configuration of the tracked object; A set of part Or-nodes, which handle local variations and configurations in a recursive manner; A set of Terminal-nodes (red rectangles), which link the whole object and parts to the image data (i.e., grounding the symbols), and take into account appearance Or-node (i.e., local appearance mixture) and occlusions (e.g., the head-shoulder of a walking person before and after opening a sun umbrella).”, Page 3463, Fig. 1:
)
Regarding Claim 13, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above).
Li in view of Harang and Tu further discloses:
wherein the one or more compositional grammatical neural network node layers define a deep neural network structure that comprises one or more Conv-BatchNorm-ReLu stage that connects to a first instantiated compositional grammatical neural network node layer. (Li, Page 2, Figure 2:
) [Examiner’s note: The highlighted portion of Figure 2 indicates one or more Conv-BatchNorm-ReLu stages that connect to a first instantiated compositional grammatical neural network node layer]
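For context on the “Conv-BatchNorm-ReLu stage” terminology discussed above, the following is an illustrative sketch only; it is hypothetical NumPy code prepared for exposition (a 1-D, single-channel simplification) and does not appear in Li or in any other cited reference:

```python
import numpy as np

def conv1d(x, w):
    # Convolution stage: 'valid' cross-correlation of signal x with kernel w
    return np.array([np.dot(x[i:i + len(w)], w)
                     for i in range(len(x) - len(w) + 1)])

def batch_norm(x, eps=1e-5):
    # BatchNorm stage: normalize to zero mean and unit variance
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    # ReLU stage: clamp negative activations to zero
    return np.maximum(x, 0.0)

# Toy input and kernel (hypothetical values for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 1.0])

# One Conv-BatchNorm-ReLU stage applied in sequence
y = relu(batch_norm(conv1d(x, w)))
```

In such a stage the output `y` would then feed the first compositional grammatical node layer; the kernel values and signal above are arbitrary choices for the sketch.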
Regarding Claim 14, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above).
Lu in view of Harang and Tu further discloses:
wherein the one or more compositional grammatical neural network nodes comprise a second AND-OR grammar building block. (Lu, Page 3462, Col. 2, Section i.): “an AOG represents an object in a hierarchical and compositional manner which has three types of nodes: an And-node represents the rule of decomposing a complex structure (e.g., a walking person or a running basketball player) into simple ones; an Or-node represents alternative structures at both object and part levels which can capture different poses and viewpoints and partial occlusion; and a Terminal-node grounds the representational symbol to image data using different appearance templates to capture local appearance change.”, Page 3463, Fig. 1:
) [Examiner’s note: Fig. 1 discloses that the AND-OR Grammar (AOG) structure operates hierarchically, with nodes combining or selecting components at different levels. This is conceptually similar to how layers in a neural network process and combine features.]
Regarding Claim 15, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above).
Lu in view of Harang and Tu further discloses:
classifying an image using the instantiated one or more neural network nodes. (Lu, Page 3462, Col. 1, Section 1.1: “Given a specified object in the first frame of a video, the objective of online object tracking is to locate it in the subsequent frames with bounding boxes.”, Page 3462, Col. 2, Section i: “and a Terminal-node grounds the representational symbol to image data using different appearance templates to capture local appearance change. Both the structure and appearance of the AOG will be discriminatively trained online to account for the variations of a tracked object against its scene backgrounds.”) [Examiner’s note: locating an object in subsequent image/video frames is being interpreted as classifying an image]
Regarding Claim 16, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above).
Lu in view of Harang and Tu further discloses:
classifying a linguistic text body using the instantiated one or more neural network nodes. (Harang, [0070]: “Thus, the nodes used to perform a first task (such as classifying a first set of documents having a first characteristic) are substantially independent from the nodes used to perform a second task (such as classifying a second set of documents having a second characteristic) and the neural network is effectively operating as two separate neural networks.”)
using the instantiated one or more neural network nodes (Lu, Page 3465, Col. 1, ¶[2-3]: “The AOG is a directed acyclic graph, denoted by G = (V,E). The node set V consists of three subsets of Or-nodes, And-nodes and Terminal-nodes respectively… A parse tree is an instantiation of the AOG with the best child node of each encountered Or-node being selected.”)
Regarding Claim 17, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above).
Lu in view of Harang and Tu further discloses:
wherein an N group of inputted features of at least one of the one or more feature channels includes at least two groups. (Lu, Page 3464, Col. 1, Section iii): “The learning is done incrementally as time evolves, starting with a small set of positive examples (bootstrapped based on the given bounding box) and a set of negative examples (mined from outside of the given bounding box) to train the initial AOG.”) [Examiner’s note: 2 groups: a set of positive examples and a set of negative examples]
Regarding Claim 18, Lu explicitly discloses:
instantiate one or more compositional grammatical neural network node layers, having a plurality of split inputs that span across a plurality of feature channels of an input feature map, wherein at least one of the one or more compositional grammatical neural network node layers comprises an AND-OR grammar building block, (Lu, Page 3462, Col. 2, Section i.): “an AOG represents an object in a hierarchical and compositional manner which has three types of nodes: an And-node represents the rule of decomposing a complex structure (e.g., a walking person or a running basketball player) into simple ones; an Or-node represents alternative structures at both object and part levels which can capture different poses and viewpoints and partial occlusion; and a Terminal-node grounds the representational symbol to image data using different appearance templates to capture local appearance change.”, Page 3463, Fig. 1:
) [Examiner’s note: Fig. 1 discloses that the AND-OR Grammar (AOG) structure operates hierarchically, with nodes combining or selecting components at different levels. This is conceptually similar to how layers in a neural network process and combine features.]
wherein the plurality of split inputs of the AND-OR grammar building block maps N groups of input-able features from the plurality of feature channels, and (Lu, Page 3463, Fig. 1:
, Page 3463, Col. 1, Section 1.2.i): “i) The AOG for modeling the tracked object. Given the input bounding box of the object in the first frame (top-left), we divide the bounding box into a r × c cells (3 × 3 here). The set of primitive parts are then enumerated in the r × c cells, which quantize the hypothesis space of AOG using the method proposed in [31]. The quantization is capable of exploring a large number of latent part configurations (capturing discriminative and stable parts at different frames), meanwhile it makes the problem of online learning AOG feasible.”) [Examiner’s note: The process begins with an input bounding box of the object (top-left in the image). The bounding box is divided into smaller cells (e.g., r × c), and “primitive parts” are enumerated within these cells. This is conceptually similar to extracting features from different regions of an input image, akin to feature channels in a convolutional network]
Lu fails to disclose:
a processor;
and a memory having instructions stored thereon, wherein execution of the instructions by the processor causes the processor to
wherein the AND-OR grammar building block comprises a graph of stacked and interconnected plurality of AND nodes configured to concatenate features from connected child nodes, plurality of OR nodes configured to element-wise sum features from connected child nodes, and plurality of terminal-nodes each configured to select or output a channel-wise slice of a given input feature channel that connects in a set of combinations of AND nodes and OR nodes to the N groups of inputted features of each of the plurality of feature channels.
However, Harang explicitly discloses:
a processor; (Harang [0078]: “Hardware modules may include, for example, a general-purpose processor,”)
and a memory having instructions stored thereon, wherein execution of the instructions by the processor causes the processor to (Harang [0076]: “Some embodiments described herein relate to a computer storage product with a non-transitory computer readable medium (also can be referred to as a non-transitory processor-readable medium) having instructions or computer code thereon for performing various computer-implemented operations.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Lu and Harang. Lu teaches a framework for simultaneously tracking, learning and parsing objects with a hierarchical and compositional And-Or graph (AOG) representation. Harang teaches training a network with multiple connected nodes to classify files, documents, images and the like. One of ordinary skill would have been motivated to combine Lu and Harang in order to enable quick access to the instructions, reducing delays during execution (Harang [0014]).
However, Tu explicitly discloses:
wherein the AND-OR grammar building block comprises a graph of stacked and interconnected plurality of AND nodes configured to concatenate features from connected child nodes, plurality of OR nodes configured to element-wise sum features from connected child nodes, and plurality of terminal-nodes each configured to select or output a channel-wise slice of a given input feature channel that connects in a set of combinations of AND nodes and OR nodes to the N groups of inputted features of each of the plurality of feature channels. (Tu, Pg. 5, Figure 1: “(a) A graphical representation of an example stochastic AOG of line drawings of animal faces. Each And-rule is represented by an And-node and all of its child nodes in the graph. The spatial relations within each And-rule are not shown for clarity. Each Or-rule is represented by an Or-node and one of its child nodes, with its probability shown on the corresponding edge. (b) A line drawing image and its compositional structure generated from the example AOG. Again, the spatial relations between nodes are not shown for clarity. The probability of the compositional structure is partially computed at the top right.
”, Tu, Pg. 4, ¶[1]: “An Or-rule, parameterized by an ordered pair <r, p>, represents an alternative configuration of a pattern. The Or-rule specifies a production r : O->x, where O is an Or-node and x is either a terminal or a nonterminal node representing a possible configuration.”, Pg. 4, ¶[3]: “Fig. 1(a) shows an example stochastic context-free AOG of line drawings. Each terminal or nonterminal node represents an image patch and its parameter is a 2D vector representing the position of the patch in the image. Each terminal node denotes a line segment of a specific orientation while each nonterminal node denotes a class of line drawing patterns.”, Tu, Pg. 6, Section 2.1, ¶[2]: “In a stochastic AOG representing a SCFG, each node represents a string and the parameter of a node is the start/end positions of the string in the complete sentence; the parameter relation and parameter function in an And-rule specify string concatenation, i.e., the substrings must be adjacent and the concatenation of all the substrings forms the composite string represented by the parent And-node”, Pg. 20, ¶[3]: “A sum node computes a weighted sum of its child nodes”, Pg. 21, ¶[1]: “an Or-rule with the sum node as the left-hand side, the child node as the right-hand side, and the normalized weight of the child node as the conditional probability”)
such that the AND-OR grammar building block defines a phrase structure grammar and dependency grammar in a bottom-up configuration (Tu, Pg. 7, Section 2.2, ¶[1]: “Our algorithm is based on bottom-up dynamic programming and can be seen as a generalization of several previous exact inference algorithms designed for special cases of stochastic AOGs (such as the CYK algorithm for text parsing).”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Lu and Tu. Lu teaches a framework for simultaneously tracking, learning and parsing objects with a hierarchical and compositional And-Or graph (AOG) representation. Tu teaches stochastic And-Or grammars that extend traditional stochastic grammars of language to model image or event data. One of ordinary skill would have been motivated to combine Lu and Tu because MPEP 2143 sets forth the Supreme Court rationales for obviousness, including: (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) “Obvious to try” choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
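For clarity of the record regarding the recited node operations (AND-nodes concatenating child features, OR-nodes element-wise summing child features, and Terminal-nodes selecting a channel-wise slice of an input feature channel), the following is an illustrative sketch only; it is hypothetical NumPy code prepared for exposition and does not appear in Lu, Tu, or Harang:

```python
import numpy as np

def terminal_node(feature_map, start, width):
    # Terminal-node: select/output a channel-wise slice of the input feature map
    return feature_map[start:start + width]

def and_node(child_features):
    # AND-node: concatenate features from connected child nodes along the channel axis
    return np.concatenate(child_features, axis=0)

def or_node(child_features):
    # OR-node: element-wise sum of features from connected child nodes (equal shapes)
    return np.sum(np.stack(child_features), axis=0)

# Toy input feature map (hypothetical): 4 channels, each a length-3 feature vector
x = np.arange(12.0).reshape(4, 3)

t1 = terminal_node(x, 0, 2)   # channels 0-1, shape (2, 3)
t2 = terminal_node(x, 2, 2)   # channels 2-3, shape (2, 3)
o = or_node([t1, t2])         # element-wise sum, shape (2, 3)
y = and_node([t1, o])         # concatenation, shape (4, 3)
```

The channel indices and group sizes above are arbitrary choices for the sketch; in the claims, N groups of inputted features would be wired to such combinations of AND nodes and OR nodes.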
Regarding Claim 19, Lu explicitly discloses:
instantiate one or more compositional grammatical neural network node layers, having a plurality of split inputs that span across a plurality of feature channels of an input feature map, wherein at least one of the one or more compositional grammatical neural network node layers comprises an AND-OR grammar building block, (Lu, Page 3462, Col. 2, Section i.): “an AOG represents an object in a hierarchical and compositional manner which has three types of nodes: an And-node represents the rule of decomposing a complex structure (e.g., a walking person or a running basketball player) into simple ones; an Or-node represents alternative structures at both object and part levels which can capture different poses and viewpoints and partial occlusion; and a Terminal-node grounds the representational symbol to image data using different appearance templates to capture local appearance change.”, Page 3463, Fig. 1:
) [Examiner’s note: Fig. 1 discloses that the AND-OR Grammar (AOG) structure operates hierarchically, with nodes combining or selecting components at different levels. This is conceptually similar to how layers in a neural network process and combine features.]
wherein the plurality of split inputs of the AND-OR grammar building block maps N groups of input-able features from the plurality of feature channels, and (Lu, Page 3463, Fig. 1:
, Page 3463, Col. 1, Section 1.2.i): “i) The AOG for modeling the tracked object. Given the input bounding box of the object in the first frame (top-left), we divide the bounding box into a r × c cells (3 × 3 here). The set of primitive parts are then enumerated in the r × c cells, which quantize the hypothesis space of AOG using the method proposed in [31]. The quantization is capable of exploring a large number of latent part configurations (capturing discriminative and stable parts at different frames), meanwhile it makes the problem of online learning AOG feasible.”) [Examiner’s note: The process begins with an input bounding box of the object (top-left in the image). The bounding box is divided into smaller cells (e.g., r × c), and “primitive parts” are enumerated within these cells. This is conceptually similar to extracting features from different regions of an input image, akin to feature channels in a convolutional network]
Lu fails to disclose:
A non-transitory computer readable medium comprising instructions stored
thereon, wherein execution of the instructions by a processor causes the processor to:
wherein the AND-OR grammar building block comprises a graph of stacked and interconnected plurality of AND nodes configured to concatenate features from connected child nodes, plurality of OR nodes configured to element-wise sum features from connected child nodes, and plurality of terminal-nodes each configured to select or output a channel-wise slice of a given input feature channel that connects in a set of combinations of AND nodes and OR nodes to the N groups of inputted features of each of the plurality of feature channels.
However, Harang explicitly discloses:
A non-transitory computer readable medium comprising instructions stored
thereon, wherein execution of the instructions by a processor causes the processor to: (Harang [0076]: “Some embodiments described herein relate to a computer storage product with a non-transitory computer readable medium (also can be referred to as a non-transitory processor-readable medium) having instructions or computer code thereon for performing various computer-implemented operations.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Lu and Harang. Lu teaches a framework for simultaneously tracking, learning and parsing objects with a hierarchical and compositional And-Or graph (AOG) representation. Harang teaches training a network with multiple connected nodes to classify files, documents, images and the like. One of ordinary skill would have been motivated to combine Lu and Harang in order to enable quick access to the instructions, reducing delays during execution (Harang [0014]).
However, Tu explicitly discloses:
wherein the AND-OR grammar building block comprises a graph of stacked and interconnected plurality of AND nodes configured to concatenate features from connected child nodes, plurality of OR nodes configured to element-wise sum features from connected child nodes, and plurality of terminal-nodes each configured to select or output a channel-wise slice of a given input feature channel that connects in a set of combinations of AND nodes and OR nodes to the N groups of inputted features of each of the plurality of feature channels. (Tu, Pg. 5, Figure 1: “(a) A graphical representation of an example stochastic AOG of line drawings of animal faces. Each And-rule is represented by an And-node and all of its child nodes in the graph. The spatial relations within each And-rule are not shown for clarity. Each Or-rule is represented by an Or-node and one of its child nodes, with its probability shown on the corresponding edge. (b) A line drawing image and its compositional structure generated from the example AOG. Again, the spatial relations between nodes are not shown for clarity. The probability of the compositional structure is partially computed at the top right.
”, Tu, Pg. 4, ¶[1]: “An Or-rule, parameterized by an ordered pair <r, p>, represents an alternative configuration of a pattern. The Or-rule specifies a production r : O->x, where O is an Or-node and x is either a terminal or a nonterminal node representing a possible configuration.”, Pg. 4, ¶[3]: “Fig. 1(a) shows an example stochastic context-free AOG of line drawings. Each terminal or nonterminal node represents an image patch and its parameter is a 2D vector representing the position of the patch in the image. Each terminal node denotes a line segment of a specific orientation while each nonterminal node denotes a class of line drawing patterns.”, Tu, Pg. 6, Section 2.1, ¶[2]: “In a stochastic AOG representing a SCFG, each node represents a string and the parameter of a node is the start/end positions of the string in the complete sentence; the parameter relation and parameter function in an And-rule specify string concatenation, i.e., the substrings must be adjacent and the concatenation of all the substrings forms the composite string represented by the parent And-node”, Pg. 20, ¶[3]: “A sum node computes a weighted sum of its child nodes”, Pg. 21, ¶[1]: “an Or-rule with the sum node as the left-hand side, the child node as the right-hand side, and the normalized weight of the child node as the conditional probability”)
such that the AND-OR grammar building block defines a phrase structure grammar and dependency grammar in a bottom-up configuration (Tu, Pg. 7, Section 2.2, ¶[1]: “Our algorithm is based on bottom-up dynamic programming and can be seen as a generalization of several previous exact inference algorithms designed for special cases of stochastic AOGs (such as the CYK algorithm for text parsing).”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Lu and Tu. Lu teaches a framework for simultaneously tracking, learning and parsing objects with a hierarchical and compositional And-Or graph (AOG) representation. Tu teaches stochastic And-Or grammars that extend traditional stochastic grammars of language to model image or event data. One of ordinary skill would have been motivated to combine Lu and Tu because MPEP 2143 sets forth the Supreme Court rationales for obviousness, including: (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) “Obvious to try” choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
Claim(s) 9 is rejected under 35 U.S.C. 103 as being unpatentable over Lu et al. (“Online Object Tracking, Learning and Parsing with And-Or Graphs”) (hereafter referred to as “Lu”), in view of Harang (US 2019/0266492 A1), Kewei Tu (“Stochastic And-Or Grammars: A Unified Framework and Logic Perspective”) (hereafter referred to as “Tu”) and in further view of Wang & Zong (“Phrase Structure Parsing with Dependency Structure”) (hereafter referred to as “Wang”)
Regarding Claim 9, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above).
Lu in view of Harang and Tu fails to disclose:
wherein the AND-OR grammar building block comprises a third hyper-parameter associated with i) phrase structure grammar only and ii) a combination of phrase structure grammar and dependency grammar.
However, Wang explicitly discloses:
wherein the AND-OR grammar building block comprises a third hyper-parameter associated with i) phrase structure grammar only and ii) a combination of phrase structure grammar and dependency grammar. (Wang, Page 1294, Figure 1:
, Page 1297, Section 4.3, Col. 2, ¶[1]: “In this subsection we explore feasibility and effectiveness of phrase parsing with the help of dependency trees generated automatically… So in order to make our system more robust we use N-best dependency structures to guide phrase parsing procedure.”, Page 1298, Section 4.3, Col. 1, ¶[2]: “Considering the number of dependency structures (N-best) will affect the final result, we make use of the development set shown in Table1 to turning parameters. We parse the development set many times with different number of dependency structures.”) [Examiner’s note: the hyper-parameter associated with phrase structure grammar and the combination of phrase structure grammar and dependency grammar, i.e., N, the number of best dependency structures]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Lu, Harang, Tu and Wang. Lu teaches a framework for simultaneously tracking, learning and parsing objects with a hierarchical and compositional And-Or graph (AOG) representation. Harang teaches performing multiple tasks using a neural network that has multiple layers, where each layer includes multiple nodes. Tu teaches that stochastic And-Or grammars (AOGs) extend traditional stochastic grammars of language to model other types of data such as images and events. Wang teaches a novel phrase structure parsing approach with the help of dependency structure. One of ordinary skill would have been motivated to combine Lu, Harang, Tu and Wang in order to improve the phrase parsing procedure by applying dependency grammar structures to bring more linguistic a priori knowledge into phrase structures and make the parsing procedure more flexible (Wang, Page 1293, Section 2, Col. 2, ¶[3]).
Claim(s) 10 is rejected under 35 U.S.C. 103 as being unpatentable over Lu et al. (“Online Object Tracking, Learning and Parsing with And-Or Graphs”) (hereafter referred to as “Lu”), in view of Harang (US 2019/0266492 A1), Kewei Tu (“Stochastic And-Or Grammars: A Unified Framework and Logic Perspective”) (hereafter referred to as “Tu”) and in further view of Wu et al. (“Online Object Tracking, Learning and Parsing with And-Or Graphs”) (hereafter referred to as “Wu”)
Regarding Claim 10, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above).
Lu in view of Harang and Tu fails to disclose:
wherein the AND-OR grammar building block comprises a fourth hyper-parameter associated with i) full phrase structure and ii) a partial phrase structure that do not include syntactically symmetric child nodes for OR-nodes.
However, Wu explicitly discloses:
wherein the AND-OR grammar building block comprises a fourth hyper-parameter associated with i) full phrase structure and ii) a partial phrase structure that do not include syntactically symmetric child nodes for OR-nodes. (Wu, Page 10, Section 6, Col. 2, ¶[2]: “When re-learning structure and parameters, we could use all the frames with valid tracking results. To reduce the time complexity, the number of frames used in relearning is at most 100 in our experiments.”, and Page 9, Figure 7:
) [Examiner’s note: Figure 7 discloses the full structure of the AND-OR graph (AOG) in (a) and a partial phrase structure that does not include syntactically symmetric child nodes for OR-nodes in (b), wherein the hyper-parameter associated with both structures is the maximum number of frames]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Lu, Harang, Tu and Wu. Lu teaches a framework for simultaneously tracking, learning and parsing objects with a hierarchical and compositional And-Or graph (AOG) representation. Harang teaches performing multiple tasks using a neural network that has multiple layers, where each layer includes multiple nodes. Wu teaches a method called AOGTracker for simultaneously tracking, learning, and parsing unknown objects in sequences with a hierarchical and compositional AND-OR Graph (AOG). Tu teaches that stochastic And-Or grammars (AOGs) extend traditional stochastic grammars of language to model other types of data such as images and events. One of ordinary skill would have been motivated to combine Lu, Harang, Tu and Wu in order to achieve more representational power, more robust tracking and online learning strategies, and fine-grained tracking results (Wu, Page 3, Section 2, Col. 2, ¶[8-9], Page 4, Section 2, Col. 1, ¶[2]).
Claim(s) 11 is rejected under 35 U.S.C. 103 as being unpatentable over Lu et al. (“Online Object Tracking, Learning and Parsing with And-Or Graphs”) (hereafter referred to as “Lu”), in view of Harang (US 2019/0266492 A1), Kewei Tu (“Stochastic And-Or Grammars: A Unified Framework and Logic Perspective”) (hereafter referred to as “Tu”) and in further view of Liang et al. (“WPNets and PWNets: From the Perspective of Channel Fusion”) (hereafter referred to as “Liang”) and Hu et al. (“Channel-wise and Spatial Feature Modulation Network for Single Image Super-Resolution”) (hereafter referred to as “Hu”)
Regarding Claim 11, the combination of Lu, Tu and Harang discloses all the limitations of Claim 1 (as shown in the rejections above).
Lu in view of Harang and Tu fails to disclose:
wherein the one or more compositional grammatical neural network node layers are instantiated in a convolutional neural network selected from the group consisting of GoogLeNets, ResNets, ResNeXts, DenseNets, and DualPathNets.
However, Liang explicitly discloses:
wherein the one or more compositional grammatical neural network node layers are instantiated in a convolutional neural network selected from the group consisting of GoogLeNets, ResNets, ResNeXts, DenseNets, and DualPathNets. (Liang, Page 34226, Section I., Col. 1, ¶[1]: “There are a lot of networks that have achieved very good performance by applying new architectures… GoogLeNets [3] use different convolution kernels to establish more connections and more diverse representations between adjacent layers. ResNets [4] and Highway Networks [5] add the front layer information to the back layer through the bypass structure, which is more conducive to the backpropagation of the gradient, thus further deepening the depth of the network. ResNeXts [6] combine group convolution into ResNets [4], which perform split-transform-merge operations on features to improve network performance while reducing parameters. DenseNets [7] pass the features of each preceding layer to all of its subsequent layers to alleviate the vanishing/exploding gradient problem [8], [9] and to facilitate information fusion between layers.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Lu, Harang, Tu and Liang. Lu teaches a framework for simultaneously tracking, learning and parsing objects with a hierarchical and compositional And-Or graph (AOG) representation. Harang teaches performing multiple tasks using a neural network that has multiple layers, where each layer includes multiple nodes. Tu teaches that stochastic And-Or grammars (AOGs) extend traditional stochastic grammars of language to model other types of data such as images and events. Liang teaches the relationship between the whole and the part of a neural network's channels, including whole-to-part and part-to-whole connection relations. One of ordinary skill would have been motivated to combine Lu, Harang, Tu and Liang because MPEP 2143 sets forth the Supreme Court rationales for obviousness, including: (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) “Obvious to try” choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
However, Hu explicitly discloses:
and DualPathNets. (Hu, Page 4, Col. 1, Section C, ¶[1]: “Considering these, Chen et al. [41] combined the insights of ResNets [23] and DenseNets [24] and proposed a DualPathNet which utilized both concatenation and summation for previous features.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Lu, Harang, Tu and Hu. Lu teaches a framework for simultaneously tracking, learning and parsing objects with a hierarchical and compositional And-Or graph (AOG) representation. Harang teaches performing multiple tasks using a neural network that has multiple different layers, where each layer includes multiple nodes. Tu teaches that stochastic And-Or grammars (AOG) extend traditional stochastic grammars of language to model other types of data, such as images and events. Hu teaches a channel-wise and spatial feature modulation (CSFM) network for modeling the process of single-image super-resolution. One of ordinary skill would have had motivation to combine Lu, Harang, Tu and Hu because MPEP 2143 sets forth the Supreme Court rationales for obviousness including: (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) "Obvious to try" – choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMY TRAN whose telephone number is (571)270-0693. The examiner can normally be reached Monday - Friday 7:30 am - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AMY TRAN/Examiner, Art Unit 2126
/DAVID YI/Supervisory Patent Examiner, Art Unit 2126