Last updated: May 29, 2026
Application No. 17/475,330
METHOD FOR SPARSIFICATION OF FEATURE MAPS IN SELF-ATTENTION MECHANISMS

Non-Final OA §101§103
Filed
Sep 14, 2021
Priority
Jul 15, 2021 — provisional 63/222,405
Examiner
MCINTOSH, ANDREW T
Art Unit
2144
Tech Center
2100 — Computer Architecture & Software
Assignee
Samsung Electronics Co., Ltd.
OA Round
3 (Non-Final)
Interview Optional

— +18.0% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 77% grant rate with +18.0% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 515 resolved cases, 2023–2026
Examiner Intelligence

MCINTOSH, ANDREW T View full profile →
Grants 77% — above average
Career Allowance Rate
397 granted / 515 resolved
+22.1% vs TC avg
Strong +18% interview lift
Without
With
+18.0%
Interview Lift
resolved cases with interview
Typical timeline
3y 0m
Avg Prosecution
13 currently pending
Career history
540
Total Applications
across all art units
Statute-Specific Performance

§101
1.7%
-38.3% vs TC avg
§103
88.8%
+48.8% vs TC avg
§102
5.7%
-34.3% vs TC avg
§112
0.8%
-39.2% vs TC avg
Black line = Tech Center average estimate • Based on career data from 515 resolved cases
Office Action

§101 §103
DETAILED ACTION
	This action is in response to Applicant’s Request for Continued Examination ("Response”) received on November 18, 2025 in response to the Office Action dated August 19, 2025. This action is made Non-Final.
	Claims 1-20 are pending.
Claims 1, 8, and 15 are independent claims.
	Claims 1-20 are rejected.

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Applicant’s Response
	In Applicant’s Response, Applicant amended claims 1, 2, 7, 8, 9, 14-16, and 20, and submitted arguments against the prior art in the Office Action dated August 19, 2025.
	Based on the Applicant’s amendments and remarks, the Examiner withdraws the rejection of claims under 35 USC §101 abstract idea analysis.

Information Disclosure Statement
The information disclosure statement (IDS(s)) submitted on 01/15/2025 is/are in compliance with the provisions of 37 C.F.R. 1.97. Accordingly, the IDS(s) is/are being considered by the examiner.


	Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 8-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Claims 8-14:
	In summary, Claim 8 recites a “transforming self-attention model” for encoding and decoding. The components of the “transforming self-attention model” recited can be embodied in the form of software are, and are not necessarily hardware components (e.g. a processor). 
	Accordingly, the recited “transforming self-attention model” is computer software per se and not a “process,” a “machine,” a “manufacture,” or a “composition of matter,” as defined in 35 U.S.C. 101. Claims 9-14 depend from Claim 8 and merely further define the “transforming self-attention model.” Thus, Claims 8-14 fail to recite statutory subject matter.











Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-7 and 15-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Clement et al., US Publication 2021/0357187 (“Clement”), in view of Choi et al., US Publication 2023/0153631 (“Choi”), in view of Georgiadis, US Publication 2020/0143226 (“Georgiadis”), and further in view of Zheng et al., US Publication 2022/0351087 (“Zheng”).
Claim 1:
	Clement teaches or suggests a method comprising:
	training a self-attention model (see Fig. 2-5; para. 0006 - neural transformer model is trained through multi-modal learning which integrates source code and natural language text. The neural transformer model is pre-trained on a large corpus of unsupervised source code methods from various programming languages in order to learn the structure a method signature and a method body and the relationships between them. The neural transformer model is then fine-tuned on various translation tasks using combinations of the features which include a method signature, method body and natural language text describing the method (i.e., method docstring) in order to learn to translate an input sequence to an output sequence; para. 0022 - sequence-to-sequence neural transformer model with attention; para. 0026 - based on a neural transformer model with attention trained on multiple modalities, such as source code and natural language text (e.g., documentation in source code); para. 0027 - neural transformer model is pre-trained on a large unsupervised training dataset of source code; para. 0058 - training iteration includes forward propagation, loss calculation, backpropagation steps followed by updating the weights; para. 0064 - self-attention and normalization layers; para. 0065 - neural networks in the encoder blocks 622 and the decoder blocks 624 are trained iteratively, making multiple passes over the training dataset before converging to a minimum. Each training iteration includes forward propagation, loss calculation, backpropagation steps followed by updating the weights; para. 0077 - computation at training time can be parallelized using masked self-attention.);
	... weights of the trained self-attention model (see para. 0033 - weighs the relevance of each subtoken represented in the context tensor to each other by generating attention weights for each subtoken in the input embedding; para. 0058 - training iteration includes forward propagation, loss calculation, backpropagation steps followed by updating the weights; para. 0065 - neural networks in the encoder blocks 622 and the decoder blocks 624 are trained iteratively, making multiple passes over the training dataset before converging to a minimum. Each training iteration includes forward propagation, loss calculation, backpropagation steps followed by updating the weights.);
	receiving an input sequence including a natural language sentence (see Fig. 1B, 2, 9A, 9B, 10, and 11; para. 0005 - complete a method signature using a neural transformer model with attention. neural transformer model predicts the programming language instructions that implement a method based on a given method signature or on a given combination of a method signature and natural language text describing the method; para. 0077 - neural transformer model factorizes the probability of the target sub tokens in an input sequence into a product of conditional probabilities for each subtoken; para. 0078 - probability distribution generated by the neural transformer model to identify the top k subtokens likely to be the next subtoken in a candidate sequence; para. 0079 - updates the current context sequence with the selected subtoken to input into the neural transformer model to generate an additional probability distribution for the next subtoken in a sequence. This process is repeated until the end of a method token is predicted as being the next likely subtoken candidate; para. 0080 - extract tokens and/or subtokens in an ordered sequence.);
	applying the trained self-attention model for inference to the input sequence, comprising ... of the trained self-attention model (see Fig. 1B, 2, 9A, 9B, 10, and 11; para. 0005 - complete a method signature using a neural transformer model with attention. neural transformer model predicts the programming language instructions that implement a method based on a given method signature or on a given combination of a method signature and natural language text describing the method; para. 0077 - neural transformer model factorizes the probability of the target sub tokens in an input sequence into a product of conditional probabilities for each subtoken; para. 0078 - probability distribution generated by the neural transformer model to identify the top k subtokens likely to be the next subtoken in a candidate sequence; para. 0079 - updates the current context sequence with the selected subtoken to input into the neural transformer model to generate an additional probability distribution for the next subtoken in a sequence. This process is repeated until the end of a method token is predicted as being the next likely subtoken candidate; para. 0080 - extract tokens and/or subtokens in an ordered sequence.),
	wherein the self-attention model includes a self-attention mechanism to determine a weight of a part of the input sequence and calculate an attention score (see Fig. 1B, 2, 9A, 9B, 10, and 11; para. 0005 - complete a method signature using a neural transformer model with attention. neural transformer model predicts the programming language instructions that implement a method based on a given method signature or on a given combination of a method signature and natural language text describing the method; para. 0033 - weighs the relevance of each subtoken represented in the context tensor to each other by generating attention weights for each subtoken in the input embedding. attention function is scaled dot-product attention which is described mathematically; para. 0042 – softmax layer 236 then turns the scores of the logits vector into probabilities for each subtoken in the vocabulary which are positive and normalized; para. 0077 - neural transformer model factorizes the probability of the target sub tokens in an input sequence into a product of conditional probabilities for each subtoken; para. 0078 - probability distribution generated by the neural transformer model to identify the top k subtokens likely to be the next subtoken in a candidate sequence; para. 0079 - updates the current context sequence with the selected subtoken to input into the neural transformer model to generate an additional probability distribution for the next subtoken in a sequence. This process is repeated until the end of a method token is predicted as being the next likely subtoken candidate; para. 0080 - extract tokens and/or subtokens in an ordered sequence.).
	Clement does not explicitly disclose comprising adding a feature-map regularization term to a loss function to reduce activation values of feature maps; quantizing weights; adding a filter layer to change at least one feature having a low value from at least one feature map.
	Choi teaches or suggests adding a feature-map regularization term to a loss function to reduce activation values of feature maps (see para. 0005 - performing training a target model for fine-tuning by adding, to a loss function, a regularization term that reduces the difference between parameters of the source model 110 (refer to non-patent reference 1), a regularization term that reduces the difference between the activation levels of the source model 110 and the target model 100 (refer to non-patent reference 2), and a regularization term that suppresses activation of a feature causing a singular value with small magnitude.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include adding a feature-map regularization term to a loss function to reduce activation values of feature maps for the purpose of efficiently suppressing activation of a feature by fine-tuning a loss function using a regularization term, improving model performance, as taught by Choi (0005). 
	Georgiadis further teaches or suggests quantizing weights (see para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0029 - the non-quantized values of the sparsified activation map 111 may be quantized by the quantizer 108 into integer values having any bit width (i.e., 8 bits, 12 bits, 16 bits, etc.) to form a sparsified and quantized activation map 112. Quantizing by the quantizer 108, if needed, may also be considered to be a way to introduce additional compression.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include quantizing weights for the purpose of efficiently simplifying activation or feature map values, improving model leanness, as taught by Georgiadis (0029). 
	Zheng further teaches or suggests adding a filter layer to change at least one feature having a low value from at least one feature map (see para. 0019 - careful feature pruning and algorithm selection prior to the training phase may result in a more optimized machine learning model; para. 0020 – “special feature" may refer to any feature that is unlikely to improve the training of a machine learning model. Example special features may include ... less important features. pre-processing system may evaluate a performance of each mapping and may remove the values associated with a special feature if the mapping of the pruned dataset excluding such values exceeds a threshold performance level; para. 0037 - generate a pruned dataset based on correlated features by removing, from the input dataset, any feature sets associated with correlated features that are determined to be less important. score is below a threshold score. generate a pruned dataset based on less important numerical features by removing, from the input dataset, any feature sets associated with less important numerical features.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include adding a filter layer to change at least one feature having a low value from at least one feature map for the purpose of efficiently removing features based on comparing feature values to a threshold, improving model optimization, as taught by Zheng (0019, 0020, and 0037).

Claim 2:
	Choi further teaches or suggests wherein adding the feature-map regularization term modifies activation values of feature maps (see para. 0005 - performing training a target model for fine-tuning by adding, to a loss function, a regularization term that reduces the difference between parameters of the source model 110 (refer to non-patent reference 1), a regularization term that reduces the difference between the activation levels of the source model 110 and the target model 100 (refer to non-patent reference 2), and a regularization term that suppresses activation of a feature causing a singular value with small magnitude.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include wherein adding the feature-map regularization term modifies activation values of feature maps for the purpose of efficiently suppressing activation of a feature by fine-tuning a loss function using a regularization term, improving model performance, as taught by Choi (0005). 

Claim 3:
	Zheng further teaches or suggests wherein adding a filter layer to change the at least one feature comprising setting ... based on the low value being less than a predetermined threshold (see para. 0019 - careful feature pruning and algorithm selection prior to the training phase may result in a more optimized machine learning model; para. 0020 – “special feature" may refer to any feature that is unlikely to improve the training of a machine learning model. Example special features may include ... less important features. pre-processing system may evaluate a performance of each mapping and may remove the values associated with a special feature if the mapping of the pruned dataset excluding such values exceeds a threshold performance level; para. 0037 - generate a pruned dataset based on correlated features by removing, from the input dataset, any feature sets associated with correlated features that are determined to be less important. score is below a threshold score. generate a pruned dataset based on less important numerical features by removing, from the input dataset, any feature sets associated with less important numerical features.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include wherein adding a filter layer to change the at least one feature comprising setting ... based on the low value being less than a predetermined threshold for the purpose of efficiently removing features based on comparing feature values to a threshold, improving model optimization, as taught by Zheng (0019, 0020, and 0037).
Georgiadis more specifically teaches or suggests the at least one feature to be equal to zero (see para. 0004 - para. 0005 - target this sparsity to reduce the memory requirements of the training algorithm; para. 0008 - Zero-encoding. sparsifying, using the processor, a number of non-zero values of the activation map; para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0022 - techniques disclosed herein reduce memory consumption by a neural network during training and/or as embedded in a dedicated device; para. 0027 - sparsified by the sparsifier 107 to form sparsified activation maps 111 that have an increased number of values that are equal to zero so that the lossless compression performed by the encoder 110 will be more effective; para. 0031 – Zero-encoding; para. 0052 - Zero-encoding compression mode checks whether the compress unit is formed entirely of zeros and, if so, an empty bitstream is returned; para. 0061 - values that are equal to zero so that the lossless compression performed later will be more effective; para. 0063 - Zero-encoding.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include the at least one feature to be equal to zero for the purpose of efficiently simplifying activation or feature map values, improving model leanness and reducing memory usage, as taught by Georgiadis (0029). 

Claim 4:
	As indicated above, Clement teaches or suggests of the self-attention model.
	Georgiadis further teaches or suggests quantizing feature maps ...; and compressing quantized feature maps (see para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0029 - the non-quantized values of the sparsified activation map 111 may be quantized by the quantizer 108 into integer values having any bit width (i.e., 8 bits, 12 bits, 16 bits, etc.) to form a sparsified and quantized activation map 112. Quantizing by the quantizer 108, if needed, may also be considered to be a way to introduce additional compression; para. 0030 - facilitate compression, the HxWxC sparsified and quantized activation map 112 may be formatted by the formatter 109 into blocks of values, in which each block is referred to herein as "compress units" 113).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include quantizing feature maps ...; and compressing quantized feature maps for the purpose of efficiently simplifying activation or feature map values, improving model leanness, as taught by Georgiadis (0029). 

Claim 5:
	Georgiadis further teaches or suggests wherein quantizing weights comprises using at least 8 bit quantization (see para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0029 - the non-quantized values of the sparsified activation map 111 may be quantized by the quantizer 108 into integer values having any bit width (i.e., 8 bits, 12 bits, 16 bits, etc.) to form a sparsified and quantized activation map 112. Quantizing by the quantizer 108, if needed, may also be considered to be a way to introduce additional compression; para. 0030 - facilitate compression, the HxWxC sparsified and quantized activation map 112 may be formatted by the formatter 109 into blocks of values, in which each block is referred to herein as "compress units" 113).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include wherein quantizing weights comprises using at least 8 bit quantization for the purpose of efficiently simplifying activation or feature map values, improving model leanness, as taught by Georgiadis (0029).

Claim 6:
	Georgiadis further teaches or suggests compressing quantized weights based on an Exponential-Golomb coding technique (see para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0029 - the non-quantized values of the sparsified activation map 111 may be quantized by the quantizer 108 into integer values having any bit width (i.e., 8 bits, 12 bits, 16 bits, etc.) to form a sparsified and quantized activation map 112. Quantizing by the quantizer 108, if needed, may also be considered to be a way to introduce additional compression; para. 0030 - facilitate compression, the HxWxC sparsified and quantized activation map 112 may be formatted by the formatter 109 into blocks of values, in which each block is referred to herein as "compress units" 113; para. 0031 - lossless compression modes include, but are not limited to, ExponentialGolomb; para. 0032 - Exponential-Golomb encoding is a wellknown compression mode that assigns variable length codes in which smaller numbers are assigned shorter codes. The number of bits used to encode numbers increases exponentially, and one parameter, commonly referred to as the order k parameter, controls the rate at which the number of bits increases).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include compressing quantized weights based on an Exponential-Golomb coding technique for the purpose of efficiently simplifying activation or feature map values, improving model leanness, as taught by Georgiadis (0029).

Claim 7:
	Clement further teaches or suggests wherein applying the trained self-attention model further comprises determining a token in an output sequence (see Fig. 1B, 2, 9A, 9B, 10, and 11; para. 0005 - complete a method signature using a neural transformer model with attention. neural transformer model predicts the programming language instructions that implement a method based on a given method signature or on a given combination of a method signature and natural language text describing the method; para. 0077 - neural transformer model factorizes the probability of the target sub tokens in an input sequence into a product of conditional probabilities for each subtoken; para. 0078 - probability distribution generated by the neural transformer model to identify the top k subtokens likely to be the next subtoken in a candidate sequence; para. 0079 - updates the current context sequence with the selected subtoken to input into the neural transformer model to generate an additional probability distribution for the next subtoken in a sequence. This process is repeated until the end of a method token is predicted as being the next likely subtoken candidate; para. 0080 - extract tokens and/or subtokens in an ordered sequence.)

Claim 15:
	Clement teaches or suggests a method comprising:
	training a self-attention model, comprising ... loss function of the self-attention model ... the self-attention model comprising an encoder and a decoder (see Fig. 2-5; para. 0006 - neural transformer model is trained through multi-modal learning which integrates source code and natural language text. The neural transformer model is pre-trained on a large corpus of unsupervised source code methods from various programming languages in order to learn the structure a method signature and a method body and the relationships between them. The neural transformer model is then fine-tuned on various translation tasks using combinations of the features which include a method signature, method body and natural language text describing the method (i.e., method docstring) in order to learn to translate an input sequence to an output sequence; para. 0022 - sequence-to-sequence neural transformer model with attention; para. 0026 - based on a neural transformer model with attention trained on multiple modalities, such as source code and natural language text (e.g., documentation in source code); para. 0027 - neural transformer model is pre-trained on a large unsupervised training dataset of source code; para. 0058 - training iteration includes forward propagation, loss calculation, backpropagation steps followed by updating the weights; para. 0064 - self-attention and normalization layers; para. 0065 - neural networks in the encoder blocks 622 and the decoder blocks 624 are trained iteratively, making multiple passes over the training dataset before converging to a minimum. Each training iteration includes forward propagation, loss calculation, backpropagation steps followed by updating the weights; para. 0077 - computation at training time can be parallelized using masked self-attention.);
	... weights of the trained self-attention model (see para. 0033 - weighs the relevance of each subtoken represented in the context tensor to each other by generating attention weights for each subtoken in the input embedding; para. 0058 - training iteration includes forward propagation, loss calculation, backpropagation steps followed by updating the weights; para. 0065 - neural networks in the encoder blocks 622 and the decoder blocks 624 are trained iteratively, making multiple passes over the training dataset before converging to a minimum. Each training iteration includes forward propagation, loss calculation, backpropagation steps followed by updating the weights.);
	receiving an input sequence including a natural language sentence (see Fig. 1B, 2, 9A, 9B, 10, and 11; para. 0005 - complete a method signature using a neural transformer model with attention. neural transformer model predicts the programming language instructions that implement a method based on a given method signature or on a given combination of a method signature and natural language text describing the method; para. 0077 - neural transformer model factorizes the probability of the target sub tokens in an input sequence into a product of conditional probabilities for each subtoken; para. 0078 - probability distribution generated by the neural transformer model to identify the top k subtokens likely to be the next subtoken in a candidate sequence; para. 0079 - updates the current context sequence with the selected subtoken to input into the neural transformer model to generate an additional probability distribution for the next subtoken in a sequence. This process is repeated until the end of a method token is predicted as being the next likely subtoken candidate; para. 0080 - extract tokens and/or subtokens in an ordered sequence.);
	applying the trained self-attention model for inference to the input sequence, comprising ... of at least one of the encoder and the decoder (see Fig. 1B, 2, 9A, 9B, 10, and 11; para. 0005 - complete a method signature using a neural transformer model with attention. neural transformer model predicts the programming language instructions that implement a method based on a given method signature or on a given combination of a method signature and natural language text describing the method; para. 0077 - neural transformer model factorizes the probability of the target sub tokens in an input sequence into a product of conditional probabilities for each subtoken; para. 0078 - probability distribution generated by the neural transformer model to identify the top k subtokens likely to be the next subtoken in a candidate sequence; para. 0079 - updates the current context sequence with the selected subtoken to input into the neural transformer model to generate an additional probability distribution for the next subtoken in a sequence. This process is repeated until the end of a method token is predicted as being the next likely subtoken candidate; para. 0080 - extract tokens and/or subtokens in an ordered sequence.),
	wherein the self-attention model includes a self-attention mechanism to determine a weight part of the input sequence and calculate an attention score (see Fig. 1B, 2, 9A, 9B, 10, and 11; para. 0005 - complete a method signature using a neural transformer model with attention. neural transformer model predicts the programming language instructions that implement a method based on a given method signature or on a given combination of a method signature and natural language text describing the method; para. 0033 - weighs the relevance of each subtoken represented in the context tensor to each other by generating attention weights for each subtoken in the input embedding. attention function is scaled dot-product attention which is described mathematically; para. 0042 – softmax layer 236 then turns the scores of the logits vector into probabilities for each subtoken in the vocabulary which are positive and normalized; para. 0077 - neural transformer model factorizes the probability of the target sub tokens in an input sequence into a product of conditional probabilities for each subtoken; para. 0078 - probability distribution generated by the neural transformer model to identify the top k subtokens likely to be the next subtoken in a candidate sequence; para. 0079 - updates the current context sequence with the selected subtoken to input into the neural transformer model to generate an additional probability distribution for the next subtoken in a sequence. This process is repeated until the end of a method token is predicted as being the next likely subtoken candidate; para. 0080 - extract tokens and/or subtokens in an ordered sequence.).
	Clement does not explicitly disclose comprising adding a feature-map regularization term to a loss function ... to reduce activation values of feature maps; quantizing weights; adding a filter layer to change at least on feature having a low value from at least one feature map.
Choi teaches or suggests adding a feature-map regularization term to a loss function ... to reduce activation values of feature maps (see para. 0005 - performing training a target model for fine-tuning by adding, to a loss function, a regularization term that reduces the difference between parameters of the source model 110 (refer to non-patent reference 1), a regularization term that reduces the difference between the activation levels of the source model 110 and the target model 100 (refer to non-patent reference 2), and a regularization term that suppresses activation of a feature causing a singular value with small magnitude.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include adding a feature-map regularization term to a loss function ... to reduce activation values of feature maps for the purpose of efficiently suppressing activation of a feature by fine-tuning a loss function using a regularization term, improving model performance, as taught by Choi (0005). 
	Georgiadis further teaches or suggests quantizing weights (see para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0029 - the non-quantized values of the sparsified activation map 111 may be quantized by the quantizer 108 into integer values having any bit width (i.e., 8 bits, 12 bits, 16 bits, etc.) to form a sparsified and quantized activation map 112. Quantizing by the quantizer 108, if needed, may also be considered to be a way to introduce additional compression.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include quantizing weights for the purpose of efficiently simplifying activation or feature map values, improving model leanness, as taught by Georgiadis (0029). 
	Zheng further teaches or suggests comprising adding a filter layer to change at least on feature having a low value from at least one feature map (see para. 0019 - careful feature pruning and algorithm selection prior to the training phase may result in a more optimized machine learning model; para. 0020 – “special feature" may refer to any feature that is unlikely to improve the training of a machine learning model. Example special features may include ... less important features. pre-processing system may evaluate a performance of each mapping and may remove the values associated with a special feature if the mapping of the pruned dataset excluding such values exceeds a threshold performance level; para. 0037 - generate a pruned dataset based on correlated features by removing, from the input dataset, any feature sets associated with correlated features that are determined to be less important. score is below a threshold score. generate a pruned dataset based on less important numerical features by removing, from the input dataset, any feature sets associated with less important numerical features.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include adding a filter layer to change at least on feature having a low value from at least one feature map for the purpose of efficiently removing features based on comparing feature values to a threshold, improving model optimization, as taught by Zheng (0019, 0020, and 0037).

Claim 16:
	As indicated above, Clement teaches or suggests of the encoder and decoder.
	Choi further teaches or suggests wherein adding the feature-map regularization term modifies activation values of the at least one feature map (see para. 0005 - performing training a target model for fine-tuning by adding, to a loss function, a regularization term that reduces the difference between parameters of the source model 110 (refer to non-patent reference 1), a regularization term that reduces the difference between the activation levels of the source model 110 and the target model 100 (refer to non-patent reference 2), and a regularization term that suppresses activation of a feature causing a singular value with small magnitude.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include wherein adding the feature-map regularization term modifies activation values of the at least one feature map for the purpose of efficiently suppressing activation of a feature by fine-tuning a loss function using a regularization term, improving model performance, as taught by Choi (0005). 

Claim 17:
	As indicated above, Clement teaches or suggests of the at least one of the encoder and decoder.
	Zheng further teaches or suggests wherein changing the at least one feature from at least one feature map ... comprises setting the low value based on the low value being less than a predetermined threshold (see para. 0019 - careful feature pruning and algorithm selection prior to the training phase may result in a more optimized machine learning model; para. 0020 – “special feature" may refer to any feature that is unlikely to improve the training of a machine learning model. Example special features may include ... less important features. pre-processing system may evaluate a performance of each mapping and may remove the values associated with a special feature if the mapping of the pruned dataset excluding such values exceeds a threshold performance level; para. 0037 - generate a pruned dataset based on correlated features by removing, from the input dataset, any feature sets associated with correlated features that are determined to be less important. score is below a threshold score. generate a pruned dataset based on less important numerical features by removing, from the input dataset, any feature sets associated with less important numerical features.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include wherein changing the at least one feature from at least one feature map ... comprises setting the low value based on the low value being less than a predetermined threshold for the purpose of efficiently removing features based on comparing feature values to a threshold, improving model optimization, as taught by Zheng (0019, 0020, and 0037).
Georgiadis more specifically teaches or suggests to be equal to zero (see para. 0004 - para. 0005 - target this sparsity to reduce the memory requirements of the training algorithm; para. 0008 - Zero-encoding. sparsifying, using the processor, a number of non-zero values of the activation map; para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0022 - techniques disclosed herein reduce memory consumption by a neural network during training and/or as embedded in a dedicated device; para. 0027 - sparsified by the sparsifier 107 to form sparsified activation maps 111 that have an increased number of values that are equal to zero so that the lossless compression performed by the encoder 110 will be more effective; para. 0031 – Zero-encoding; para. 0052 - Zero-encoding compression mode checks whether the compress unit is formed entirely of zeros and, if so, an empty bitstream is returned; para. 0061 - values that are equal to zero so that the lossless compression performed later will be more effective; para. 0063 - Zero-encoding.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include to be equal to zero for the purpose of efficiently simplifying activation or feature map values, improving model leanness and reducing memory usage, as taught by Georgiadis (0029). 

Claim 18:
	Georgiadis further teaches or suggests wherein quantizing the weights of at least one of the encoder and the decoder comprises using at least 8 bit quantization (see para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0029 - the non-quantized values of the sparsified activation map 111 may be quantized by the quantizer 108 into integer values having any bit width (i.e., 8 bits, 12 bits, 16 bits, etc.) to form a sparsified and quantized activation map 112. Quantizing by the quantizer 108, if needed, may also be considered to be a way to introduce additional compression; para. 0030 - facilitate compression, the HxWxC sparsified and quantized activation map 112 may be formatted by the formatter 109 into blocks of values, in which each block is referred to herein as "compress units" 113).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include wherein quantizing the weights of at least one of the encoder and the decoder comprises using at least 8 bit quantization for the purpose of efficiently simplifying activation or feature map values, improving model leanness, as taught by Georgiadis (0029). 

Claim 19:
Georgiadis further teaches or suggests compressing quantized weights of at least one of the encoder and decoder (see para. 0004 - para. 0005 - target this sparsity to reduce the memory requirements of the training algorithm; para. 0008 - Zero-encoding. sparsifying, using the processor, a number of non-zero values of the activation map; para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0022 - techniques disclosed herein reduce memory consumption by a neural network during training and/or as embedded in a dedicated device; para. 0027 - sparsified by the sparsifier 107 to form sparsified activation maps 111 that have an increased number of values that are equal to zero so that the lossless compression performed by the encoder 110 will be more effective; para. 0031 – Zero-encoding; para. 0052 - Zero-encoding compression mode checks whether the compress unit is formed entirely of zeros and, if so, an empty bitstream is returned; para. 0061 - values that are equal to zero so that the lossless compression performed later will be more effective; para. 0063 - Zero-encoding.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include compressing quantized weights of at least one of the encoder and decoder for the purpose of efficiently simplifying activation or feature map values, improving model leanness and reducing memory usage, as taught by Georgiadis (0029).

Claim 20:
	Clement further teaches or suggests wherein applying the trained self-attention model further comprises determining a token in an output sequence (see Fig. 1B, 2, 9A, 9B, 10, and 11; para. 0005 - complete a method signature using a neural transformer model with attention. neural transformer model predicts the programming language instructions that implement a method based on a given method signature or on a given combination of a method signature and natural language text describing the method; para. 0077 - neural transformer model factorizes the probability of the target sub tokens in an input sequence into a product of conditional probabilities for each subtoken; para. 0078 - probability distribution generated by the neural transformer model to identify the top k subtokens likely to be the next subtoken in a candidate sequence; para. 0079 - updates the current context sequence with the selected subtoken to input into the neural transformer model to generate an additional probability distribution for the next subtoken in a sequence. This process is repeated until the end of a method token is predicted as being the next likely subtoken candidate; para. 0080 - extract tokens and/or subtokens in an ordered sequence.)

Claim(s) 8-14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Clement, in view of Ren et al., US Publication 2022/0310068 (“Ren”), in view of Choi, in view of Zheng, and further in view of Georgiadis.
Claim 8:
	Clement teaches or suggests a transforming self-attention model, comprising:
	an encoder having multiple layers, ... a multi-head attention sublayer that receives a first input sequence including a natural language sentence and processes an encoder query feature map Q, an encoder key feature map K, and an encoder value map V, an encoder layer being trained for inference based on the first input sequence ... for the encoder ... having at least one feature ... from at least one of the encoder Q, K and V feature maps (see Fig. 2 and 6; para. 0030 – initial inputs to an encoder block 202 are the input embeddings 206 of an input sequence of the training dataset. In order to retain the order of the tokens in the input sequence; para. 0031 - encoder block 202 consists of two layers. The first layer includes a multi-head attention component 210; para. 0034 - where the input consists of queries Q and keys K of dimension d"' and values V of dimension dv. Q is a matrix that contains the query or vector representation of one subtoken in a sequence, K is the vector representations of all subtokens in the sequence, and V is the vector representations of all the subtokens in the sequence; para. 0063 - first encoder block 622 of the neural transformer model 620 takes the context tensor 618 as input and passes it through the multiple layers of multi-head attention; para. 0065 - neural networks in the encoder blocks 622 and the decoder blocks 624 are trained iteratively, making multiple passes over the training dataset before converging to a minimum. Each training iteration includes forward propagation, loss calculation, backpropagation steps followed by updating the weights.); and
	a decoder having multiple layers, ... including a decoder multi-head attention sublayer that receives a second input sequence including a natural language sentence and processes a decoder query feature map Q, a decoder key feature map K, and a decoder value feature map V, and each decoder layer being trained for inference on the second input sequence ... for the decoder ...  having at least one feature ... from at least one decoder Q, K and V feature maps (see Fig. 2 and 6; para. 0034 - where the input consists of queries Q and keys K of dimension d"' and values V of dimension dv. Q is a matrix that contains the query or vector representation of one subtoken in a sequence, K is the vector representations of all subtokens in the sequence, and V is the vector representations of all the subtokens in the sequence; para. 0040 - first layer includes a masked multi-head attention component 222 followed by a layer normalization component 224. The output of the layer normalization component 224 is input into the encoder-decoder multi-head attention component 226 with a residual connection to layer normalization component 228. The second layer includes an encoder-decoder multi-head attention component 226 followed by a layer normalization component 228; para. 0041 - multi-head attention layer 226 receives queries from the previous decoder layer 225 and the memory keys and values 217 from the output of the encoder block 202. In this manner, the decoder block 204 can attend to every position of the input sequence; para. 0065 - neural networks in the encoder blocks 622 and the decoder blocks 624 are trained iteratively, making multiple passes over the training dataset before converging to a minimum. Each training iteration includes forward propagation, loss calculation, backpropagation steps followed by updating the weights.),
	wherein the decoder includes a linear classifier that produces a probability corresponding to a determination of a token in an output sequence (see Fig. 2, 9B; para. 0042 – linear layer 234 projects the vector produced by the stack of decoders into a logits vector. The softmax layer 236 then turns the scores of the logits vector into probabilities for each subtoken in the vocabulary which are positive and normalized; para. 0083 - decoder block outputs a vector of floating point numbers that is projected by the linear layer 936 into unnormalized predictions or logits.).
Clement does not explicitly disclose each encoder layer including an encoder multi-head attention sublayer ... adding a feature-map regularization term to a loss function ... to reduce activation values of feature maps ... feature having a low value changed ... having weights of the encoder quantized; each decoder layer including a decoder multi-head attention sublayer ... adding a feature-map regularization term to a loss function ... to reduce activation values of feature maps ... having a low value changed ... feature having weights of the decoder quantized.
	Ren further teaches or suggests each encoder layer including an encoder multi-head attention sublayer; each decoder layer including a decoder multi-head attention sublayer (see Fig. 9, 10; para. 0040 - Each encoder layer 501-i (i is a positive integer between 1 and N) may include two feed forward modules, a multi-head self-attention module; para. 0041 - each decoder layer 601-j may include a multi-head cross attention module, a multi-head self-attention module.). 
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include each encoder layer including an encoder multi-head attention sublayer; each decoder layer including a decoder multi-head attention sublayer for the purpose of efficiently capturing different relationships and dependency in the input data using attention mechanisms, improving model performance, as taught by Ren (0002, 0041, and 0042).
Choi teaches or suggests adding a feature-map regularization term to a loss function ... to reduce activation values of feature maps; adding a feature-map regularization term to a loss function ... to reduce activation values of feature maps (see para. 0005 - performing training a target model for fine-tuning by adding, to a loss function, a regularization term that reduces the difference between parameters of the source model 110 (refer to non-patent reference 1), a regularization term that reduces the difference between the activation levels of the source model 110 and the target model 100 (refer to non-patent reference 2), and a regularization term that suppresses activation of a feature causing a singular value with small magnitude.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include adding a feature-map regularization term to a loss function ... to reduce activation values of feature maps; adding a feature-map regularization term to a loss function ... to reduce activation values of feature maps for the purpose of efficiently suppressing activation of a feature by fine-tuning a loss function using a regularization term, improving model performance, as taught by Choi (0005). 
Zheng further teaches or suggests comprising feature having a low value changed; feature having a low value changed (see para. 0019 - careful feature pruning and algorithm selection prior to the training phase may result in a more optimized machine learning model; para. 0020 – “special feature" may refer to any feature that is unlikely to improve the training of a machine learning model. Example special features may include ... less important features. pre-processing system may evaluate a performance of each mapping and may remove the values associated with a special feature if the mapping of the pruned dataset excluding such values exceeds a threshold performance level; para. 0037 - generate a pruned dataset based on correlated features by removing, from the input dataset, any feature sets associated with correlated features that are determined to be less important. score is below a threshold score. generate a pruned dataset based on less important numerical features by removing, from the input dataset, any feature sets associated with less important numerical features.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include feature having a low value changed; feature having a low value changed for the purpose of efficiently removing features based on comparing feature values to a threshold, improving model optimization, as taught by Zheng (0019, 0020, and 0037).
Georgiadis further teaches or suggests having weights of the encoder quantized; having weights of the decoder quantized (see Fig. 1A, 1B, 2A, 2B, 3; para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0029 - the non-quantized values of the sparsified activation map 111 may be quantized by the quantizer 108 into integer values having any bit width (i.e., 8 bits, 12 bits, 16 bits, etc.) to form a sparsified and quantized activation map 112. Quantizing by the quantizer 108, if needed, may also be considered to be a way to introduce additional compression.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include having weights of the encoder quantized; having weights of the decoder quantized for the purpose of efficiently simplifying activation or feature map values, improving model leanness, as taught by Georgiadis (0029). 

Claim 9:
	As indicated above, Clement teaches or suggests for the encoder ... of the encoder ... for the decoder ... of the decoder.
	Choi further teaches or suggests wherein adding the feature map regularization term to the loss function ... modifies activation values ..., and adding the feature map regularization term to the loss function ... modifies activation values ... (see para. 0005 - performing training a target model for fine-tuning by adding, to a loss function, a regularization term that reduces the difference between parameters of the source model 110 (refer to non-patent reference 1), a regularization term that reduces the difference between the activation levels of the source model 110 and the target model 100 (refer to non-patent reference 2), and a regularization term that suppresses activation of a feature causing a singular value with small magnitude.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include wherein adding the feature map regularization term to the loss function ... modifies activation values ..., and adding the feature map regularization term to the loss function ... modifies activation values ... for the purpose of efficiently suppressing activation of a feature by fine-tuning a loss function using a regularization term, improving model performance, as taught by Choi (0005). 

Claim 10:
	As indicated above, Clement teaches or suggests from the at least one of the encoder and decoder.
	Zheng further teaches or suggests wherein the at least one feature changed ... is changed by setting the low value ... based on the low value being less than a predetermined threshold (see para. 0019 - careful feature pruning and algorithm selection prior to the training phase may result in a more optimized machine learning model; para. 0020 – “special feature" may refer to any feature that is unlikely to improve the training of a machine learning model. Example special features may include ... less important features. pre-processing system may evaluate a performance of each mapping and may remove the values associated with a special feature if the mapping of the pruned dataset excluding such values exceeds a threshold performance level; para. 0037 - generate a pruned dataset based on correlated features by removing, from the input dataset, any feature sets associated with correlated features that are determined to be less important. score is below a threshold score. generate a pruned dataset based on less important numerical features by removing, from the input dataset, any feature sets associated with less important numerical features.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include wherein the at least one feature changed ... is changed by setting the low value ... based on the low value being less than a predetermined threshold for the purpose of efficiently removing features based on comparing feature values to a threshold, improving model optimization, as taught by Zheng (0019, 0020, and 0037).
Georgiadis more specifically teaches or suggests to be equal to zero (see para. 0004 - para. 0005 - target this sparsity to reduce the memory requirements of the training algorithm; para. 0008 - Zero-encoding. sparsifying, using the processor, a number of non-zero values of the activation map; para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0022 - techniques disclosed herein reduce memory consumption by a neural network during training and/or as embedded in a dedicated device; para. 0027 - sparsified by the sparsifier 107 to form sparsified activation maps 111 that have an increased number of values that are equal to zero so that the lossless compression performed by the encoder 110 will be more effective; para. 0031 – Zero-encoding; para. 0052 - Zero-encoding compression mode checks whether the compress unit is formed entirely of zeros and, if so, an empty bitstream is returned; para. 0061 - values that are equal to zero so that the lossless compression performed later will be more effective; para. 0063 - Zero-encoding.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include to be equal to zero for the purpose of efficiently simplifying activation or feature map values, improving model leanness and reducing memory usage, as taught by Georgiadis (0029). 

Claim 11:
Georgiadis further teaches or suggests wherein weights of at least one of the encoder and the decoder are quantized (see Fig. 1A, 1B, 2A, 2B, 3; para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0029 - the non-quantized values of the sparsified activation map 111 may be quantized by the quantizer 108 into integer values having any bit width (i.e., 8 bits, 12 bits, 16 bits, etc.) to form a sparsified and quantized activation map 112. Quantizing by the quantizer 108, if needed, may also be considered to be a way to introduce additional compression.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include wherein weights of at least one of the encoder and the decoder are quantized for the purpose of efficiently simplifying activation or feature map values, improving model leanness, as taught by Georgiadis (0029).

Claim 12:
Georgiadis further teaches or suggests wherein weights of at least one of the encoder and the decoder are compressed (see para. 0004 - para. 0005 - target this sparsity to reduce the memory requirements of the training algorithm; para. 0008 - Zero-encoding. sparsifying, using the processor, a number of non-zero values of the activation map; para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0022 - techniques disclosed herein reduce memory consumption by a neural network during training and/or as embedded in a dedicated device; para. 0027 - sparsified by the sparsifier 107 to form sparsified activation maps 111 that have an increased number of values that are equal to zero so that the lossless compression performed by the encoder 110 will be more effective; para. 0031 – Zero-encoding; para. 0052 - Zero-encoding compression mode checks whether the compress unit is formed entirely of zeros and, if so, an empty bitstream is returned; para. 0061 - values that are equal to zero so that the lossless compression performed later will be more effective; para. 0063 - Zero-encoding.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include weights of at least one of the encoder and the decoder are compressed for the purpose of efficiently simplifying activation or feature map values, improving model leanness and reducing memory usage, as taught by Georgiadis (0029).

Claim 13:
	Georgiadis further teaches or suggests wherein weights of at least one of the encoder and the decoder are quantized based on an Exponential-Golomb coding technique (see para. 0019 - activation maps of a neural network may be sparsified to reduce the number of non-zero values of the activation map. In the quantization stage, the activation maps of each layer are quantized; para. 0029 - the non-quantized values of the sparsified activation map 111 may be quantized by the quantizer 108 into integer values having any bit width (i.e., 8 bits, 12 bits, 16 bits, etc.) to form a sparsified and quantized activation map 112. Quantizing by the quantizer 108, if needed, may also be considered to be a way to introduce additional compression; para. 0030 - facilitate compression, the HxWxC sparsified and quantized activation map 112 may be formatted by the formatter 109 into blocks of values, in which each block is referred to herein as "compress units" 113; para. 0031 - lossless compression modes include, but are not limited to, ExponentialGolomb; para. 0032 - Exponential-Golomb encoding is a wellknown compression mode that assigns variable length codes in which smaller numbers are assigned shorter codes. The number of bits used to encode numbers increases exponentially, and one parameter, commonly referred to as the order k parameter, controls the rate at which the number of bits increases).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Clement, to include wherein weights of at least one of the encoder and the decoder are quantized based on an Exponential-Golomb coding technique for the purpose of efficiently simplifying activation or feature map values, improving model leanness, as taught by Georgiadis (0029).

Claim 14:
	Clement further teaches or suggests wherein the at least one of the encoder and the decoder determines the token in the output sequence based on a softmax function (see Fig. 1B, 2, 9A, 9B, 10, and 11; para. 0005 - complete a method signature using a neural transformer model with attention. neural transformer model predicts the programming language instructions that implement a method based on a given method signature or on a given combination of a method signature and natural language text describing the method; para. 0042 – softmax layer 236 then turns the scores of the logits vector into probabilities for each subtoken in the vocabulary which are positive and normalized; para. 0077 - neural transformer model factorizes the probability of the target sub tokens in an input sequence into a product of conditional probabilities for each subtoken; para. 0078 - probability distribution generated by the neural transformer model to identify the top k subtokens likely to be the next subtoken in a candidate sequence; para. 0079 - updates the current context sequence with the selected subtoken to input into the neural transformer model to generate an additional probability distribution for the next subtoken in a sequence. This process is repeated until the end of a method token is predicted as being the next likely subtoken candidate; para. 0080 - extract tokens and/or subtokens in an ordered sequence; para. 0083 - logits 942 are normalized using the softmax function 944 to generate the softmax prediction.).

Response to Arguments
Applicant’s further arguments have been considered but are not persuasive because the arguments do not correspond to the rationales as used in the current rejection.

	
	
	
	
	
	Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew T McIntosh whose telephone number is (571)270-7790. The examiner can normally be reached M-Th 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle can be reached at 571-272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANDREW T MCINTOSH/Primary Examiner, Art Unit 2144
Read full office action
Prosecution Timeline

Show 2 earlier events
Apr 09, 2025
Applicant Interview (Telephonic)
Apr 10, 2025
Examiner Interview Summary
Apr 18, 2025
Response Filed
Aug 19, 2025
Final Rejection mailed — §101, §103
Nov 14, 2025
Examiner Interview Summary
Nov 18, 2025
Request for Continued Examination
Nov 28, 2025
Response after Non-Final Action
Mar 17, 2026
Non-Final Rejection mailed — §101, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

16/959,302
Patent 12632647
ARTIFICIAL INTELLIGENCE ASSISTED RECOGNITION METHOD AND DEVICE
5y 10m to grant Granted May 19, 2026
18/169,793
Patent 12632669
GENERATIVE COLLABORATIVE PUBLISHING SYSTEM
3y 3m to grant Granted May 19, 2026
17/807,039
Patent 12626166
PREDICTING THE NEED FOR XAI IN ARTIFICIAL INTELLIGENCE SYSTEMS
3y 11m to grant Granted May 12, 2026
17/867,958
Patent 12619983
MACHINE LEARNING CLASSIFIER BASED ON CATEGORY MODELING
3y 9m to grant Granted May 05, 2026
15/415,693
Patent 12602534
Method and System to Display Content from a PDF Document on a Small Screen
9y 2m to grant Granted Apr 14, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
77%
Grant Probability
95%
With Interview (+18.0%)
3y 0m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 515 resolved cases by this examiner. Grant probability derived from career allowance rate.