Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
1. This Office Action is in response to the application filed on 11/14/2022.
Claims 1-24 are pending.
Priority
2. Priority to Provisional Application No. 63/280,102, filed on 11/16/2021, is acknowledged and has been considered.
Information Disclosure Statement
3. The information disclosure statement (IDS) filed on 04/13/2023 complies with the provisions of MPEP § 609 and has been considered by the examiner.
Claim Rejections - 35 USC § 101
4. 35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Step 1:
Claims 1-6, 7-12, 13-18 and 19-24 are directed to a statutory category, i.e., processes, machines and manufactures.
Step 2A, prong 1:
Independent claims 1, 7, 13 and 19 recite, “defining a search space of student neural network …operators” and “performing trust-region Bayesian optimization to select a student neural network … based on a pre-defined teacher model”, which are directed to a judicial exception. These activities merely employ mathematical relationships to match or derive candidate models. This idea is similar to the basic concept of comparing or matching information using mathematical relationships (e.g., converting numerical representations in Gottschalk v. Benson), which the courts have found to be an abstract idea. Further, the claims do not include additional elements beyond the abstract idea of classification by comparison. Therefore, the claims do not amount to more than the abstract idea itself. The claims are not patent eligible.
Step 2A, prong 2:
The judicial exception is not integrated into a practical application. In particular, the claims only recite the additional elements of certain operators and models that are used to implement the steps of “defining” and “performing”. These elements are recited at a high level of generality and amount to no more than mere instructions to apply the exception using generic operations. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)).
Step 2B:
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional elements amount to no more than operators and models applied to the implementation of selecting certain models based on certain pre-defined models.
5. Claims 2-6, 8-12, 14-18 and 20-24 are rejected under 35 U.S.C. 101 because they fail to resolve the deficiencies of claims 1, 7, 13 and 19, from which they respectively depend.
Invocation of 35 USC § 112 (f)
6. The following is a quotation of 35 U.S.C. 112(f):
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
7. Claims 19-24 recite "means for", and thus invoke 35 U.S.C. § 112(f). Independent claim 19 uses the term “means” modified by functional language and linked by the transition word “for”, but the term is not modified by sufficient structure, material, or acts for performing the claimed function. Therefore, claims 19-24 are "construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof", and will be interpreted accordingly.
Examiner’s Note
8. Preliminary mappings of some pertinent arts:
Student neural network (According to Google): “A student neural network is a smaller, simpler network trained in a process called knowledge distillation to mimic a larger, more complex "teacher" network. Its architecture is typically similar to the teacher's but with fewer layers and parameters, and it learns by combining the teacher's soft probability outputs with the original data's hard targets to improve its performance compared to a model trained only on the hard targets.”
Teacher neural network (According to Google): “A teacher neural network architecture is a system where a large "teacher" network transfers its knowledge to a smaller "student" network, a process often called knowledge distillation. This allows the compact student model to achieve performance comparable to the larger teacher by learning from its soft predictions and intermediate feature representations. The teacher-student approach is used for model compression, improving the student's efficiency while maintaining accuracy.”
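For illustration only, the soft-target/hard-target blend described in the two definitions above can be sketched as follows; the function names and the hyperparameters `T` (temperature) and `alpha` (mixing weight) are assumed, conventional choices rather than anything taken from the claims or the cited references:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T gives softer probabilities.
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Blend of (a) KL divergence from the teacher's softened outputs
    (the "soft targets") and (b) cross-entropy against the hard label."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s))
    ce = -math.log(softmax(student_logits)[hard_label])
    # T**2 rescales the soft term so its magnitude matches the hard term.
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

When the student's logits exactly match the teacher's, the soft-target (KL) term vanishes, reflecting that the student then mimics the teacher perfectly.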
A search space in AI (According to Google): “In AI, a search space is the set of all possible solutions to a problem, which an algorithm explores to find the best or optimal one. It's like a map where every possible path is a potential solution, and the AI uses a search algorithm to navigate this map from a starting point to a goal state. This set can be visualized as a graph or tree, with each node representing a state and each branch representing an action or operation.”
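As a sketch of the "set of all possible solutions" idea described above, a small architecture search space can be written as a dictionary of choices whose Cartesian product enumerates every candidate; the operator names and value ranges below are hypothetical illustrations, not taken from the application or the cited art:

```python
from itertools import product

# A toy architecture search space: every combination of these choices is
# one candidate "solution" that a search algorithm may visit.
SEARCH_SPACE = {
    "operator": ["conv3x3", "conv5x5", "self_attention"],
    "depth":    [2, 4, 8],
    "width":    [64, 128],
}

def enumerate_candidates(space):
    # Yield every point in the search space as a concrete configuration.
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))
```

A search algorithm would typically explore only a small fraction of these candidates rather than enumerating them all, since real search spaces are far too large for exhaustive traversal.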
A convolutional operator (According to Google): “A convolutional operator is a mathematical operation that combines two functions or matrices to produce a third, which represents a modified version of the input. In fields like image processing and deep learning, it is a core component of a convolutional neural network (CNN) that uses a small matrix called a "kernel" to slide across an input image, detecting features like edges and corners. The operator calculates the dot product at each position to create a new, "filtered" output image, known as a feature map.”
Examples of a convolutional operators in an AI model (According to Google): “An example of a convolutional operator in an AI model is a kernel used to detect features like edges in an image. The kernel, a small matrix, slides across the image, multiplying its values with the corresponding pixel values in the image and summing them to produce a single output value. This process is repeated to create an "activation map" that highlights specific features, such as vertical edges, when using a specific kernel like a Sobel kernel.”
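The sliding-kernel computation described above can be sketched as follows (illustrative only: a plain "valid" convolution with a Sobel-style vertical-edge kernel, as in the example):

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution: slide the kernel over the image and take
    the elementwise product-sum at each position, producing a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    H, W = len(image), len(image[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# A Sobel-style vertical-edge kernel: responds where intensity changes
# from left to right, and is zero over flat regions.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
```

Applied to an image whose left half is dark and right half is bright, the resulting feature map is zero over the flat regions and peaks along the vertical edge.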
A transformer operator (According to Google): “A "transformer operator" is not a standard, singular term, but rather a functional component within a transformer neural network. It acts as a specialized mathematical operation or layer that transforms an input sequence by creating new, contextual representations of each element. This definition is based on the transformer architecture, a type of neural network that revolutionized natural language processing (NLP) and is fundamental to large language models (LLMs) like GPT and BERT.”
Examples of transformer operators in AI models (According to Google): “Examples of transformer operators in AI models include self-attention for understanding word relationships, the Transformer architecture which uses multiple transformer blocks, and specific models like BERT and GPT that leverage the architecture for tasks like translation, text generation, and question answering. Other operators include tokenization, positional encoding, and the softmax function for converting output into probabilities.”
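As a simplified illustration of the self-attention operator mentioned above (the learned query/key/value projection matrices are deliberately omitted, so this is a sketch rather than a faithful transformer layer):

```python
import math

def self_attention(X):
    """Single-head self-attention with the projection matrices omitted,
    so queries = keys = values = the input rows (one row per token)."""
    n, d = len(X), len(X[0])
    out = []
    for i in range(n):
        # Scaled dot-product scores of token i against every token j.
        scores = [sum(X[i][k] * X[j][k] for k in range(d)) / math.sqrt(d)
                  for j in range(n)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [v / z for v in w]  # softmax: the attention weights
        # Each output row is an attention-weighted mix of all input rows,
        # giving token i a contextual representation.
        out.append([sum(w[j] * X[j][k] for j in range(n)) for k in range(d)])
    return out
```

Each output row is a convex combination of the input rows, which is what makes the representation "contextual": every token's output depends on every other token.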
Trust-Region Bayesian Optimization (According to Google): “Trust-Region Bayesian Optimization (TuRBO) is an advanced optimization algorithm designed to efficiently solve expensive, high-dimensional black-box optimization problems. It combines the strengths of Bayesian Optimization (BO) with trust region methods to enhance scalability and efficiency.
The core idea behind TuRBO is to perform local optimization within dynamically adjusted "trust regions" in the search space. Instead of attempting to model the entire, potentially complex, objective function globally, TuRBO focuses on approximating the function within these local regions using a Bayesian surrogate model, typically a Gaussian Process (GP).”
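The trust-region behavior described above, expanding the region on successful steps and shrinking it on failures, can be caricatured as follows. Note this is a heavy simplification: it samples uniformly inside the region instead of fitting the Gaussian Process surrogate that actual TuRBO uses, it is one-dimensional, and all names are illustrative:

```python
import random

def turbo_like_minimize(f, x0, lo, hi, iters=200, seed=0):
    """Toy trust-region search: propose candidates inside a box around
    the incumbent best point, expand the box on improvement, shrink it
    otherwise.  Real TuRBO models f inside the region with a GP surrogate
    rather than sampling uniformly."""
    rng = random.Random(seed)
    best_x, best_y = x0, f(x0)
    radius = (hi - lo) / 4          # initial trust-region half-width
    for _ in range(iters):
        cand = min(hi, max(lo, best_x + rng.uniform(-radius, radius)))
        y = f(cand)
        if y < best_y:              # success: accept and expand the region
            best_x, best_y = cand, y
            radius = min(radius * 1.5, hi - lo)
        else:                       # failure: shrink the region
            radius *= 0.7
    return best_x, best_y
```

The shrinking region concentrates the search near the incumbent, which is the "local optimization" aspect; the expansion on success lets the region follow a descent direction across the search space.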
Yin et al, US 20220284283, [Paragraph 60 (“a third phase comprises use of training dataset 106 to transfer learning from teacher neural network 102 to student neural network 108. In at least one embodiment, this transferring is done by having teacher neural network 102 and student neural network 104 perform corresponding tasks based on input from training dataset 106. In at least one embodiment, student neural network 108 is trained based on a loss factor that indicates how well student network 108 is mimicking teacher network”)] [Paragraph 105 (“it is desired to transfer this teacher network's knowledge to another neural network. In at least one embodiment, this other neural network is referred to as a student neural network”)] [Paragraph 82 (“consider a two-dimensional convolutional operator”)] [Paragraphs 62, 71 and 86 (“weight transpose operator” AND “standard three-step transformation”, i.e., transformer operators)] [Paragraphs 69, 73 and 524 (“inverted network is reconnected back to its target model, forming a cycle path” AND “computational graph of a pre-trained model F” AND “a tree traversal unit configured to traverse a hierarchical tree data structure”, i.e., search space)].
Carlucci et al, US 20230237337, [Paragraphs 35 and 45 (“The Bayesian optimisation may have one or more objectives, wherein at least one of said objectives refers to (i) improved classification accuracy of the second candidate neural network and/or (ii) reduced computational intensiveness of the second candidate neural network. This may assist in forming a student neural network that is accurate and less computationally intensive that the teacher neural network.”)] [Paragraphs 8 and 10 (“base neural network”, i.e., teacher neural network, AND “candidate neural network to be trained that can emulate a larger based network”, i.e., selecting student neural network)] [Paragraphs 11 and 31 (“Bayesian optimization is able to take advantage of the full information provided by the history of the optimization to make the search efficient”, i.e., search space)].
Claim Rejections - 35 USC § 103
9. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
10. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
11. Claims 1-24 are rejected under 35 U.S.C. 103 as being unpatentable over Yin et al. (US 20220284283), in view of Carlucci et al. (US 20230237337).
Claim 1:
Yin suggests a processor-implemented method, comprising: defining a search space of student neural network architectures for knowledge distillation, the search space including a plurality of convolutional operators and a plurality of transformer operators [Yin: Paragraph 60 (“a third phase comprises use of training dataset 106 to transfer learning from teacher neural network 102 to student neural network 108. In at least one embodiment, this transferring is done by having teacher neural network 102 and student neural network 104 perform corresponding tasks based on input from training dataset 106. In at least one embodiment, student neural network 108 is trained based on a loss factor that indicates how well student network 108 is mimicking teacher network”)] [Yin: Paragraph 105 (“it is desired to transfer this teacher network's knowledge to another neural network. In at least one embodiment, this other neural network is referred to as a student neural network”)] [Yin: Paragraph 82 (“consider a two-dimensional convolutional operator”)] [Yin: Paragraphs 62, 71 and 86 (“weight transpose operator” AND “standard three-step transformation”, i.e., transformer operators)] [Yin: Paragraphs 69, 73 and 524 (“inverted network is reconnected back to its target model, forming a cycle path” AND “computational graph of a pre-trained model F” AND “a tree traversal unit configured to traverse a hierarchical tree data structure”, i.e., search space)].
Carlucci suggests performing trust-region Bayesian optimization to select a student neural network architecture from the search space based on a pre-defined teacher model [Carlucci: Paragraphs 35 and 45 (“The Bayesian optimisation may have one or more objectives, wherein at least one of said objectives refers to (i) improved classification accuracy of the second candidate neural network and/or (ii) reduced computational intensiveness of the second candidate neural network. This may assist in forming a student neural network that is accurate and less computationally intensive that the teacher neural network.”)] [Carlucci: Paragraphs 8 and 10 (“base neural network”, i.e., teacher neural network, AND “candidate neural network to be trained that can emulate a larger based network”, i.e., selecting student neural network)] [Carlucci: Paragraphs 11 and 31 (“Bayesian optimization is able to take advantage of the full information provided by the history of the optimization to make the search efficient”, i.e., search space)].
Both references (Yin and Carlucci) are directed to analogous art in the same field of endeavor, namely neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Yin and Carlucci before him/her, to modify the system of Yin with the teaching of Carlucci in order to generate student neural networks [Carlucci: Paragraphs 35 and 45].
Claim 2:
The combined teachings of Yin and Carlucci suggest the limitation in which performing the trust-region Bayesian optimization comprises performing a plurality of simultaneous local optimizations with a plurality of competing objectives [Carlucci: Paragraphs 35 and 45 (“The Bayesian optimisation may have one or more objectives, wherein at least one of said objectives refers to (i) improved classification accuracy of the second candidate neural network and/or (ii) reduced computational intensiveness of the second candidate neural network. This may assist in forming a student neural network that is accurate and less computationally intensive that the teacher neural network.”)] [Trust-Region Bayesian Optimization (According to Google): “Trust-Region Bayesian Optimization (TuRBO) is an advanced optimization algorithm designed to efficiently solve expensive, high-dimensional black-box optimization problems. It combines the strengths of Bayesian Optimization (BO) with trust region methods to enhance scalability and efficiency. The core idea behind TuRBO is to perform local optimization within dynamically adjusted "trust regions" in the search space. Instead of attempting to model the entire, potentially complex, objective function globally, TuRBO focuses on approximating the function within these local regions using a Bayesian surrogate model, typically a Gaussian Process (GP).”].
Both references (Yin and Carlucci) are directed to analogous art in the same field of endeavor, namely neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Yin and Carlucci before him/her, to modify the system of Yin with the teaching of Carlucci in order to generate student neural networks [Carlucci: Paragraphs 35 and 45].
Claim 3:
The combined teachings of Yin and Carlucci suggest in which the plurality of competing objectives includes one or more of model accuracy, a number of parameters, operations per second, and latency [Yin: Paragraph 79 (“operations type associated with those layers. For example, in at least one embodiment, different transposition operations are performed for linear and fully-connected layers, convolutional layers, batch normalization layers, and activation layers. In at least one embodiment, to invert a block in a teacher mode”)].
Claim 4:
The combined teachings of Yin and Carlucci suggest in which the search space assigns the convolutional operators to visual processing and the transformer operators to representation learning [Yin: Paragraph 82 (“consider a two-dimensional convolutional operator”)] [Yin: Paragraphs 62, 71 and 86 (“weight transpose operator” AND “standard three-step transformation”, i.e., transformer operators)].
Claim 5:
The combined teachings of Yin and Carlucci suggest regularizing kernel orthogonality for pointwise convolution operations [Yin: Paragraphs 81 and 83 (“weights at initialization are drawn from a normal distribution which provides orthogonality, … being input and output channels, and K, S.sub.inv, S.sub.out denoting kernel size, input and output spatial sizes” AND “windows sliding position of a kernel during convolution and can be obtained by unfolding kernel into a vector and fill kernel uncovered regions by zero”)].
Claim 6:
The combined teachings of Yin and Carlucci suggest regularizing kernel orthogonality for feed-forward network layers in the transformer operators [Yin: Paragraphs 81 and 83 (“weights at initialization are drawn from a normal distribution which provides orthogonality, … being input and output channels, and K, S.sub.inv, S.sub.out denoting kernel size, input and output spatial sizes” AND “windows sliding position of a kernel during convolution and can be obtained by unfolding kernel into a vector and fill kernel uncovered regions by zero”)] [Yin: Paragraphs 62, 71 and 86 (“weight transpose operator” AND “standard three-step transformation”, i.e., transformer operators)] [Yin: Paragraphs 70 and 419 (“Without loss of generality, a feed-forward network with L-blocks can be considered” AND “layer 2510 may be referred to as a “feed-forward layer””)].
Claim 7:
Claim 7 is essentially the same as claim 1 except that it sets forth the claimed invention as an apparatus rather than a method, and is rejected for the same reasons as applied above.
Claim 8:
Claim 8 is essentially the same as claim 2 except that it sets forth the claimed invention as an apparatus rather than a method, and is rejected for the same reasons as applied above.
Claim 9:
Claim 9 is essentially the same as claim 3 except that it sets forth the claimed invention as an apparatus rather than a method, and is rejected for the same reasons as applied above.
Claim 10:
Claim 10 is essentially the same as claim 4 except that it sets forth the claimed invention as an apparatus rather than a method, and is rejected for the same reasons as applied above.
Claim 11:
Claim 11 is essentially the same as claim 5 except that it sets forth the claimed invention as an apparatus rather than a method, and is rejected for the same reasons as applied above.
Claim 12:
Claim 12 is essentially the same as claim 6 except that it sets forth the claimed invention as an apparatus rather than a method, and is rejected for the same reasons as applied above.
Claim 13:
Claim 13 is essentially the same as claim 1 except that it sets forth the claimed invention as a program product rather than a method, and is rejected for the same reasons as applied above.
Claim 14:
Claim 14 is essentially the same as claim 2 except that it sets forth the claimed invention as a program product rather than a method, and is rejected for the same reasons as applied above.
Claim 15:
Claim 15 is essentially the same as claim 3 except that it sets forth the claimed invention as a program product rather than a method, and is rejected for the same reasons as applied above.
Claim 16:
Claim 16 is essentially the same as claim 4 except that it sets forth the claimed invention as a program product rather than a method, and is rejected for the same reasons as applied above.
Claim 17:
Claim 17 is essentially the same as claim 5 except that it sets forth the claimed invention as a program product rather than a method, and is rejected for the same reasons as applied above.
Claim 18:
Claim 18 is essentially the same as claim 6 except that it sets forth the claimed invention as a program product rather than a method, and is rejected for the same reasons as applied above.
Claim 19:
Claim 19 is essentially the same as claim 1 except that it sets forth the claimed invention as an apparatus rather than a method, and is rejected for the same reasons as applied above.
Claim 20:
Claim 20 is essentially the same as claim 2 except that it sets forth the claimed invention as an apparatus rather than a method, and is rejected for the same reasons as applied above.
Claim 21:
Claim 21 is essentially the same as claim 3 except that it sets forth the claimed invention as an apparatus rather than a method, and is rejected for the same reasons as applied above.
Claim 22:
Claim 22 is essentially the same as claim 4 except that it sets forth the claimed invention as an apparatus rather than a method, and is rejected for the same reasons as applied above.
Claim 23:
Claim 23 is essentially the same as claim 5 except that it sets forth the claimed invention as an apparatus rather than a method, and is rejected for the same reasons as applied above.
Claim 24:
Claim 24 is essentially the same as claim 6 except that it sets forth the claimed invention as an apparatus rather than a method, and is rejected for the same reasons as applied above.
12. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hung D. Le, whose telephone number is 571-270-1404. The examiner can normally be reached Monday to Friday, 9:00 A.M. to 5:00 P.M.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Apu Mofiz, can be reached at 571-272-4080. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, contact 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Hung Le
10/30/2025
/HUNG D LE/Primary Examiner, Art Unit 2161