DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 1 is objected to because of the following informalities: The limitation “the second subregion” has no antecedent basis and should be corrected to “a second subregion.” Appropriate correction is required.
Claim Rejections - 35 USC § 101 - Alice
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-4, 8-11, and 14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
[CLAIM 1]
A processor, comprising: one or more circuits to cause one or more machine learning processes to process at least a portion of a feature map, a first subregion of the feature map to be associated with a plurality of values, the portion of the feature map to be processed based, at least in part, on the plurality of values, a first value determined based at least in part on the plurality of values, and a second value representing a second subregion.
Claim interpretation: Under the broadest reasonable interpretation, the terms of the claim are presumed to have their plain meaning consistent with the specification as it would be interpreted by one of ordinary skill in the art. See Manual of Patent Examining Procedure (MPEP) 2111.
Regarding the limitation “A processor, comprising: one or more circuits,” the claim does not provide any details about the processor comprising the one or more circuits. The recited processor is recited at a high level of generality, i.e., as a generic computer performing generic computer functions.
Regarding the limitation “one or more machine learning processes,” the claim also does not provide any details about how the one or more machine learning processes are implemented. The one or more machine learning processes are recited at a high level of generality, i.e., as generic machine learning processes performed by generic machine learning models.
Regarding the limitation “process at least a portion of a feature map, a first subregion of the feature map to be associated with a plurality of values, the portion of the feature map to be processed based, at least in part, on the plurality of values, a first value determined based at least in part on the plurality of values, and a second value representing a second subregion,” the first subregion is associated with a plurality of values, and the second subregion is represented by a second value. Machine learning is used to generate a first value using the plurality of values as input. The claim does not provide any details about what the features represent, how the features are divided, what any of the values indicate, how the machine learning generates the first value based on the plurality of values, and whether the second subregion is a subregion of the feature map.
The broadest reasonable interpretation of claim 1 is a processor comprising one or more circuits executing one or more machine learning processes to generate a first value from a plurality of values, where the plurality of values is associated with a first subregion of a feature map. There is also a second subregion that is represented by a second value.
Step 1: This part of the eligibility analysis evaluates whether the claim falls within any statutory category. See MPEP 2106.03. The claim recites a processor comprising one or more circuits. Thus, the claim is to a machine, which is one of the statutory categories of invention. (Step 1: YES, also applicable to claims 2-4, 8, and 9).
Step 2A, Prong One: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04, subsection II, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim.
process at least a portion of a feature map, a first subregion of the feature map to be associated with a plurality of values, the portion of the feature map to be processed based, at least in part, on the plurality of values, a first value determined based at least in part on the plurality of values, and a second value representing a second subregion – the processing may be practically performed in the human mind using observation, evaluation, judgment, and/or opinion. For example, a feature map can be processed mentally (or with the aid of pen and paper) by dividing the feature map into two subregions, associating a first subregion with a plurality of values (e.g., pick 3 numbers), determining a first value based on the plurality of values (e.g., add the 3 numbers together), then representing a second subregion with a second value (e.g., pick a number). (Step 2A, Prong One: YES).
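For illustration only, the pen-and-paper example above can be expressed as trivial arithmetic. The feature map contents, subregion split, and choice of sum/selection below are hypothetical placeholders, not limitations drawn from the claim or the specification:

```python
# Hypothetical illustration of the examiner's pen-and-paper example.
# The "feature map" is modeled as a flat list of numbers split into
# two subregions (the values and the split point are arbitrary).
feature_map = [2, 5, 7, 4, 1, 9]

first_subregion = feature_map[:3]   # plurality of values, e.g., [2, 5, 7]
second_subregion = feature_map[3:]

first_value = sum(first_subregion)  # e.g., add the 3 numbers together
second_value = second_subregion[0]  # e.g., pick a number to represent it
```

The point of the sketch is only that each recited operation reduces to observation and simple arithmetic of the kind performable mentally or with pen and paper.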
Step 2A, Prong Two: This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. This evaluation is performed by (1) identifying whether there are any additional elements recited in the claim beyond the judicial exception, and (2) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application. See MPEP 2106.04(d).
The claim recites an additional limitation: one or more machine learning processes to perform the processing.
MPEP 2106.05(f) provides the following considerations for determining whether a claim simply recites a judicial exception with the words “apply it” (or an equivalent), such as mere instructions to implement an abstract idea on a computer: (1) whether the claim recites only the idea of a solution or outcome, i.e., the claim fails to recite details of how a solution to a problem is accomplished; (2) whether the claim invokes computers or other machinery merely as a tool to perform an existing process; and (3) the particularity or generality of the application of the judicial exception.
As explained above, the claim is directed to a processor comprising one or more circuits executing one or more machine learning processes to generate a first value from a plurality of values, where the plurality of values is associated with a first subregion of a feature map. There is also a second subregion that is represented by a second value. Since none of the steps in the claim are directed to a solution to a problem, the claim fails to recite details of how a solution to a problem is accomplished.
The claim also does not provide any details about how the one or more machine learning processes are implemented. The recited machine learning processes are recited at a high level of generality, i.e., as generic machine learning processes performed by generic machine learning models. Thus, the additional limitations invoke the machine learning processes merely as tools to generally apply the abstract idea without placing any limits on how they function. Therefore, the additional limitations provide nothing more than mere instructions to implement an abstract idea on a “machine” (i.e., generic computer). See MPEP 2106.05(f).
Further, the processing is recited as being performed by “a processor, comprising: one or more circuits.” The processor is recited at a high level of generality and used as a tool to perform generic computer functions. See MPEP 2106.05(f). In these limitations, the processor is used to perform an abstract idea, as discussed above in Step 2A, Prong One, such that it amounts to no more than mere instructions to apply the exception using a generic computer. See MPEP 2106.05(f).
Even when viewed in combination, the above-noted additional limitations do not integrate the recited judicial exception into a practical application (Step 2A, Prong Two: NO), and the claim is directed to the judicial exception. (Step 2A: YES).
Step 2B: This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. See MPEP 2106.05.
As explained with respect to Step 2A, Prong Two, the additional limitation involving the machine learning processes is at best mere instructions to “apply” the abstract idea, which cannot provide an inventive concept. See MPEP 2106.05(f).
Further, as discussed in Step 2A, Prong Two above, the recitation of a processor to execute the machine learning processes amounts to no more than mere instructions to apply the exception using a generic computer.
Even when considered in combination, the above-noted additional limitations represent mere instructions to implement an abstract idea or other exception on a computer, which do not provide an inventive concept. (Step 2B: NO).
[CLAIM 2]
Step 2A, Prong One: wherein the one or more circuits are to determine relation values between the first value and the plurality of values, and are to process the portion of the feature map based, at least in part, on the relation values – this elaborates on the processing recited in claim 1 with the addition of relation values. (Step 2A, Prong One: YES).
Step 2A, Prong Two: No further additional limitations recited. (Step 2A, Prong Two: NO). (Step 2A: YES).
Step 2B: No further additional limitations recited. (Step 2B: NO).
[CLAIM 3]
Step 2A, Prong One: wherein the one or more circuits are to determine relation values between the first value and the second value, and are to process the portion of the feature map based, at least in part, on the relation values – this elaborates on the processing recited in claim 1 with the addition of relation values. (Step 2A, Prong One: YES).
Step 2A, Prong Two: No further additional limitations recited. (Step 2A, Prong Two: NO). (Step 2A: YES).
Step 2B: No further additional limitations recited. (Step 2B: NO).
[CLAIM 4]
Step 2A, Prong One: wherein the one or more circuits are to: determine at least one first relation value indicating relatedness between the first value and the plurality of values; determine at least one second relation value indicating relatedness between the first value and the second value; and process the portion of the feature map based, at least in part, on the at least one first relation value and the at least one second relation value – this elaborates on the processing recited in claim 1 with the addition of first and second relation values. (Step 2A, Prong One: YES).
Step 2A, Prong Two: No further additional limitations recited. (Step 2A, Prong Two: NO). (Step 2A: YES).
Step 2B: No further additional limitations recited. (Step 2B: NO).
[CLAIM 8]
Step 2A, Prong One: wherein the one or more circuits are to determine the first value, based, at least in part, on an average value of the plurality of values – this elaborates on the processing recited in claim 1 but specifies taking the average of the plurality of values. (Step 2A, Prong One: YES).
Step 2A, Prong Two: No further additional limitations recited. (Step 2A, Prong Two: NO). (Step 2A: YES).
Step 2B: No further additional limitations recited. (Step 2B: NO).
[CLAIM 9]
Step 2A, Prong One: wherein, the second subregion is associated with a second plurality of values, and the one or more circuits are to determine the second value, based, at least in part, on an average value of the second plurality of values – this elaborates on the processing recited in claim 1 but specifies taking the average of a second plurality of values. (Step 2A, Prong One: YES).
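The averaging recited in claims 8 and 9 is ordinary arithmetic of the kind performable with pen and paper; a minimal sketch, using hypothetical values not drawn from the claims or the specification, is:

```python
# Hypothetical values for the two subregions (claims 8 and 9).
first_plurality = [2.0, 5.0, 7.0]
second_plurality = [4.0, 1.0, 10.0]

# Claim 8: the first value is based on an average of the plurality of values.
first_value = sum(first_plurality) / len(first_plurality)

# Claim 9: the second value is based on an average of the second plurality.
second_value = sum(second_plurality) / len(second_plurality)
```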
Step 2A, Prong Two: No further additional limitations recited. (Step 2A, Prong Two: NO). (Step 2A: YES).
Step 2B: No further additional limitations recited. (Step 2B: NO).
[CLAIM 10]
A system, comprising: at least one processor; and memory storing instructions that when executed by the at least one processor cause the at least one processor to:
(a) determine at least one first relation value indicating relatedness between a plurality of first values obtained from a first subsection of data and a first metric calculated based at least in part on the plurality of first values;
(b) determine at least one second relation value indicating relatedness between the first metric and a second metric calculated based at least in part on a plurality of second values associated with a second subsection of the data; and
(c) at least one of classify, detect, or segment at least a portion of the data based at least in part on at least one of the at least one first relation value or the at least one second relation value.
Claim interpretation: Under the broadest reasonable interpretation, the terms of the claim are presumed to have their plain meaning consistent with the specification as it would be interpreted by one of ordinary skill in the art. See Manual of Patent Examining Procedure (MPEP) 2111.
Regarding steps (a) and (b), first and second relation values are determined based on other values. The claim does not provide any details about the plurality of first values, the first subsection of data, the first metric, the plurality of second values, the second subsection of data, and the second metric.
Regarding step (c), the data is classified, detected, or segmented based on the first or second relation values. The claim does not provide any details about how the classifying, detecting, or segmenting are performed.
Steps (a)-(c) are all recited as being implemented by a system comprising at least one processor and memory storing instructions (i.e., a computer). The computer is recited at a high level of generality, i.e., as a generic computer performing generic computer functions.
The broadest reasonable interpretation of claim 10 is a generic computer executing instructions to determine first and second relation values, where the first relation value indicates relatedness between a plurality of first values obtained from a first subsection of data and a first metric calculated from those values, and the second relation value indicates relatedness between the first metric and a second metric calculated from a plurality of second values associated with a second subsection of the data. The data is then classified, detected, or segmented based on the first or second relation values.
Step 1: This part of the eligibility analysis evaluates whether the claim falls within any statutory category. See MPEP 2106.03. The claim recites a system comprising a processor and memory. Thus, the claim is to a machine, which is one of the statutory categories of invention. (Step 1: YES, also applicable to claims 11 and 14).
Step 2A, Prong One: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04, subsection II, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim.
(a) determine at least one first relation value indicating relatedness between a plurality of first values obtained from a first subsection of data and a first metric calculated based at least in part on the plurality of first values – Step (a) may be practically performed in the human mind using observation, evaluation, judgment, and/or opinion. For example, the determining can be performed mentally (or with the aid of pen and paper) by dividing data into two subsections, selecting a plurality of first values from the first subsection (e.g., pick 3 numbers), calculating a first metric (e.g., add the 3 numbers), and deciding on first relation values that indicate relatedness (e.g., variance or deviation). (Step 2A, Prong One: YES);
(b) determine at least one second relation value indicating relatedness between the first metric and a second metric calculated based at least in part on a plurality of second values associated with a second subsection of the data – Step (b) may be practically performed in the human mind using observation, evaluation, judgment, and/or opinion. For example, the determining can be performed mentally (or with the aid of pen and paper) by dividing data into two subsections, selecting a plurality of second values from the second subsection (e.g., pick 3 numbers), calculating a second metric (e.g., add the 3 numbers), and deciding on second relation values that indicate relatedness (e.g., distance or similarity). (Step 2A, Prong One: YES); and
(c) at least one of classify, detect, or segment at least a portion of the data based at least in part on at least one of the at least one first relation value or the at least one second relation value – Step (c) may be practically performed in the human mind using observation, evaluation, judgment, and/or opinion. For example, the classifying, detecting, or segmenting can be performed mentally (or with the aid of pen and paper) by reviewing the first/second relation values, then deciding how to classify, detect, or segment the data based on the review. (Step 2A, Prong One: YES).
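Steps (a)-(c) can be illustrated with a short sketch. The data values, the use of a mean as the metric, deviations as relation values, and the threshold-based classification below are all hypothetical choices made for illustration; they are not limitations of claim 10:

```python
# Hypothetical data split into two subsections.
first_values = [2.0, 5.0, 7.0]      # plurality of first values
second_values = [4.0, 1.0, 10.0]    # plurality of second values

first_metric = sum(first_values) / len(first_values)     # e.g., a mean
second_metric = sum(second_values) / len(second_values)

# (a) relation values between the first values and the first metric
#     (here, deviations from the mean).
first_relations = [v - first_metric for v in first_values]

# (b) a relation value between the two metrics (here, their distance).
second_relation = abs(first_metric - second_metric)

# (c) classify the data based on a relation value (hypothetical threshold).
label = "similar" if second_relation < 1.0 else "different"
```

Each step reduces to picking numbers, simple arithmetic, and a judgment call, which is the basis for treating the steps as practically performable in the human mind.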
Step 2A, Prong Two: This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. This evaluation is performed by (1) identifying whether there are any additional elements recited in the claim beyond the judicial exception, and (2) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application. See MPEP 2106.04(d).
Steps (a)-(c) are recited as being performed by a system comprising at least one processor; and memory storing instructions (i.e., a computer). The computer is recited at a high level of generality and used as a tool to perform generic computer functions. See MPEP 2106.05(f). In these limitations, the processor is used to perform an abstract idea, as discussed above in Step 2A, Prong One, such that it amounts to no more than mere instructions to apply the exception using a generic computer. See MPEP 2106.05(f).
Even when viewed in combination, the above-noted additional limitations do not integrate the recited judicial exception into a practical application (Step 2A, Prong Two: NO), and the claim is directed to the judicial exception. (Step 2A: YES).
Step 2B: This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. See MPEP 2106.05.
As discussed in Step 2A, Prong Two above, the recitation of a processor and memory amounts to no more than mere instructions to apply the exception using a generic computer.
Even when considered in combination, the above-noted additional limitations represent mere instructions to implement an abstract idea or other exception on a computer, which do not provide an inventive concept. (Step 2B: NO).
[CLAIM 11]
Step 2A, Prong One: wherein the instructions, when executed by the at least one processor, cause at least one processor to: determine at least one third relation value indicating relatedness between the plurality of second values and the second metric; and determine at least one measure of relatedness between at least a portion of the plurality of first values and at least a portion of the plurality of second values based at least in part on the at least one first relation value, the at least one second relation value, and the at least one third relation value – this may be practically performed in the human mind using observation, evaluation, judgment, and/or opinion. For example, the determining of the third relation value and at least one measure of relatedness can be performed mentally (or with the aid of pen and paper) by reviewing and deciding. (Step 2A, Prong One: YES).
Step 2A, Prong Two: No further additional limitations recited. (Step 2A, Prong Two: NO). (Step 2A: YES).
Step 2B: No further additional limitations recited. (Step 2B: NO).
[CLAIM 14]
Step 2A, Prong One: wherein the first metric is calculated, based, at least in part, on an average value of the plurality of first values, and the second metric is calculated, based, at least in part, on an average value of the plurality of second values – this elaborates on the first/second metrics recited in claim 10 but specifies that the metrics are averages. (Step 2A, Prong One: YES).
Step 2A, Prong Two: No further additional limitations recited. (Step 2A, Prong Two: NO). (Step 2A: YES).
Step 2B: No further additional limitations recited. (Step 2B: NO).
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-6, 10-13, 15-18, 21, and 22 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Chen et al. (US Pub. 20230386197).
Referring to claim 1, Chen discloses A processor, comprising: one or more circuits [fig. 2, processor(s) 204] to cause one or more machine learning processes [fig. 2; par. 18; vision tool 220 comprises one or more neural networks] to process at least a portion of a feature map [pars. 43-45; the vision tool generates (i.e., processes) at least one feature map for a visual content item], a first subregion of the feature map to be associated with a plurality of values [pars. 43-45; each region (i.e., a first region and a second region) of the visual content item is associated with a plurality of local tokens], the portion of the feature map to be processed based, at least in part, on the plurality of values, a first value determined based at least in part on the plurality of values, and a second value representing the second subregion [pars. 29, 31, and 43-45; the feature map is generated based on a plurality of local tokens associated with the first region, a regional token (i.e., a first value) covering the plurality of local features, and a regional token (i.e., a second value) associated with the second region].
Referring to claim 2, Chen discloses The processor of claim 1, wherein the one or more circuits are to determine relation values between the first value and the plurality of values, and are to process the portion of the feature map based, at least in part, on the relation values [par. 36; local self-attention (LSA) combines features from both regional and local tokens associated with the first region].
Referring to claim 3, Chen discloses The processor of claim 1, wherein the one or more circuits are to determine relation values between the first value and the second value, and are to process the portion of the feature map based, at least in part, on the relation values [par. 36; regional self-attention (RSA) combines features from regional tokens associated with both the first and second regions].
Referring to claim 4, Chen discloses The processor of claim 1, wherein the one or more circuits are to: determine at least one first relation value indicating relatedness between the first value and the plurality of values [par. 36; local self-attention (LSA) combines features from both regional and local tokens associated with the first region]; determine at least one second relation value indicating relatedness between the first value and the second value; and process the portion of the feature map based, at least in part, on the at least one first relation value and the at least one second relation value [par. 36; regional self-attention (RSA) combines features from regional tokens associated with both the first and second regions].
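The regional/local token interaction that the rejection maps to Chen's LSA and RSA can be sketched, very loosely, as toy scalar attention. This is an illustrative assumption, not Chen's actual R2L implementation: the token values, the `softmax` and `attend` helpers, and the use of a mean as the regional token are all hypothetical:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys, values):
    """Single-head, scalar-feature attention: weight each value by the
    softmax of query*key similarity scores (a convex combination)."""
    weights = softmax([query * k for k in keys])
    return sum(w * v for w, v in zip(weights, values))

# Hypothetical tokens: each region has local tokens and one regional token.
region1_local = [2.0, 5.0, 7.0]
region2_local = [4.0, 1.0, 10.0]
regional1 = sum(region1_local) / len(region1_local)  # regional token = mean
regional2 = sum(region2_local) / len(region2_local)

# LSA-like step: a regional token attends over the regional and local
# tokens of its own region (cf. the mapping for claim 2 / first relation values).
lsa_out = attend(regional1, region1_local + [regional1],
                 region1_local + [regional1])

# RSA-like step: regional tokens attend over the regional tokens of all
# regions (cf. the mapping for claim 3 / second relation values).
rsa_out = attend(regional1, [regional1, regional2], [regional1, regional2])
```

The sketch only shows the structure the rejection relies on: one set of relation values among a regional token and its local tokens, and a second set among regional tokens of different regions.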
Referring to claim 5, Chen discloses The processor of claim 4, wherein the one or more circuits are to determine the at least one first relation value using at least one convolutional neural network [pars. 18 and 31-37; note the convolutions and neural network structures (e.g., weights, parameters, and layers)] and at least one transformer neural network [pars. 18 and 31-37; note the transformers and neural network structures (e.g., weights, parameters, and layers)].
Referring to claim 6, Chen discloses The processor of claim 5, wherein the one or more circuits are to downsample data output by the at least one convolutional neural network before providing the data to the at least one transformer neural network [fig. 3; pars. 31-37; note the convolutions, the transformers, the neural network structures (e.g., weights, parameters, and layers), and downsampling after each stage].
Referring to claim 10, Chen discloses A system, comprising: at least one processor; and memory storing instructions that when executed by the at least one processor cause the at least one processor to [fig. 2, processor(s) 204, memory 206, vision tool 220]:
determine at least one first relation value indicating relatedness between a plurality of first values obtained from a first subsection of data and a first metric calculated based at least in part on the plurality of first values [pars. 35-37 and 43-45; local self-attention (LSA) combines features from both a regional token (i.e., first metric) and local tokens (i.e., a plurality of first values) associated with a first region of a visual content item];
determine at least one second relation value indicating relatedness between the first metric and a second metric calculated based at least in part on a plurality of second values associated with a second subsection of the data [pars. 35-37 and 43-45; regional self-attention (RSA) combines features from regional tokens (i.e., first and second metrics) associated with both the first region and a second region of the visual content item]; and
at least one of classify, detect, or segment at least a portion of the data based at least in part on at least one of the at least one first relation value or the at least one second relation value [pars. 18-20; vision tool 220 uses R2L, which includes RSA and LSA, to perform vision tasks such as image classification, object detection, and semantic/instance segmentation].
Referring to claim 11, Chen discloses The system of claim 10, wherein the instructions, when executed by the at least one processor, cause at least one processor to: determine at least one third relation value indicating relatedness between the plurality of second values and the second metric [pars. 35-37 and 43-45; this would be LSA as applied to a third region of the visual content item]; and determine at least one measure of relatedness between at least a portion of the plurality of first values and at least a portion of the plurality of second values based at least in part on the at least one first relation value, the at least one second relation value, and the at least one third relation value [pars. 35-37 and 43-45; this would be R2L as applied to regional and local tokens associated with first, second, and third regions of the visual content item].
Referring to claim 12, Chen discloses The system of claim 10, wherein the instructions, when executed by the at least one processor, cause at least one processor to determine the at least one first relation value using at least one convolutional neural network [pars. 18 and 31-37; note the convolutions and neural network structures (e.g., weights, parameters, and layers)] and at least one transformer neural network [pars. 18 and 31-37; note the transformers and neural network structures (e.g., weights, parameters, and layers)].
Referring to claim 13, Chen discloses The system of claim 12, wherein the instructions, when executed by the at least one processor, cause at least one processor to downsample data output by the at least one convolutional neural network before providing the data to the at least one transformer neural network [fig. 3; pars. 31-37; note the convolutions, the transformers, the neural network structures (e.g., weights, parameters, and layers), and downsampling after each stage].
Referring to claim 15, Chen discloses A method using at least one processor, the method comprising:
generating first and second subregions of an image, the first subregion being associated with a plurality of first values [pars. 43-45; each region (i.e., a first region and a second region) of a visual content item is associated with a plurality of local tokens];
generating a first metric based at least in part on the plurality of first values [pars. 29, 31, and 43-45; each region (e.g., the first region) is associated with a regional token (e.g., a first metric) covering the plurality of local tokens (e.g., first values)];
generating a second metric representing the second subregion [pars. 29, 31, and 43-45; each region (e.g., the second region) is associated with a regional token (e.g., a second metric) covering the plurality of local tokens (e.g., second values)]; and
processing the image, using a neural network, based, at least in part, on the plurality of first values, the first metric, and the second metric [pars. 18-20 and 35-37; vision tool 220 uses R2L, which includes RSA and LSA, to perform vision tasks; local self-attention (LSA) combines features from both regional and local tokens associated with each region; regional self-attention (RSA) combines features from regional tokens associated with all regions].
Referring to claim 16, Chen discloses The method of claim 15 further comprising: determining one or more relation values based, at least in part, on the first metric and the plurality of first values, the image being processed based, at least in part, on the one or more relation values [par. 36; local self-attention (LSA) combines features from both regional and local tokens associated with the first region].
Referring to claim 17, Chen discloses The method of claim 15, further comprising: determining one or more relation values based, at least in part, on the first metric and the second metric, the image being processed based, at least in part, on the one or more relation values [par. 36; regional self-attention (RSA) combines features from regional tokens associated with both the first and second regions].
Referring to claim 18, Chen discloses The method of claim 15, further comprising: determining at least one first relation value based, at least in part, on the first metric and the plurality of first values [par. 36; local self-attention (LSA) combines features from both the regional and local tokens associated with the first region]; and determining at least one second relation value based, at least in part, on the first metric and the second metric, the image being processed based, at least in part, on the at least one first relation value and the at least one second relation value [par. 36; regional self-attention (RSA) combines features from regional tokens associated with both the first and second regions].
Referring to claim 21, Chen discloses The method of claim 15, wherein processing the image comprises at least one of classifying, detecting, or segmenting at least a portion of the image [pars. 18-20; the vision tasks include image classification, object detection, and semantic/instance segmentation].
Referring to claim 22, Chen discloses The method of claim 15, further comprising: determining at least one first relation value indicating relatedness between the plurality of first values and the first metric [pars. 35-37; note LSA as applied to the first region]; determining at least one second relation value indicating relatedness between a plurality of second values associated with the second subregion and the second metric, which was calculated based at least in part on the plurality of second values [pars. 35-37; note LSA as applied to the second region]; determining at least one third relation value indicating relatedness between the first and second subregions [pars. 35-37; note RSA as applied to the first and second regions]; and determining at least one measure of relatedness between at least a portion of the plurality of first values and at least a portion of the plurality of second values based at least in part on the at least one first relation value, the at least one second relation value, and the at least one third relation value [pars. 35-37; note R2L, which includes RSA and LSA from the first and second regions].
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 7 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Fan et al. (US Pub. 20210049452).
Referring to claim 7, Chen discloses The processor of claim 1, wherein the one or more machine learning processes comprise one or more neural networks, the one or more circuits are to implement the one or more neural networks when the one or more neural networks process the portion of the feature map [pars. 18 and 31-37; note the neural network structures (e.g., weights, parameters, and layers)], the portion of the feature map to be processed by the one or more neural networks based, at least in part, on: a first set of relation values determined, by a first...layer of the one or more neural networks, based, at least in part, on the first value, the plurality of values, and the second value; and a second set of relation values determined, by a second...layer of the one or more neural networks, based, at least in part, on the first set of relation values [pars. 35-37; local self-attention (LSA) combines features from both regional and local tokens associated with each region; regional self-attention (RSA) combines features from regional tokens associated with all regions; R2L includes RSA and LSA with weights and layer normalization].
Chen does not appear to explicitly disclose that the first and second layers for determining the first and second set of relation values are hidden layers.
However, Fan discloses that the first and second layers for determining the first and second set of relation values are hidden layers [par. 34; an attention mechanism is applied to a hidden layer to determine which step is more relevant (i.e., attention values)].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the generation of RSA and LSA values taught by Chen so that they are determined by an attention mechanism in a hidden layer as taught by Fan, with a reasonable expectation of success. The motivation for doing so would have been that hidden layers enable neural networks to model non-linear relationships between inputs and outputs, thereby improving the accuracy of predictions.
Referring to claim 19, Chen discloses The method of claim 15, further comprising: generating, using a first...layer of a neural network, a first set of relation values based, at least in part, on the first metric, the plurality of first values, and the second metric; and generating, using a second...layer of the neural network, a second set of relation values based, at least in part, on the first set of relation values, the image being processed based, at least in part, on the first set of relation values and the second set of relation values [pars. 18 and 31-37; note the neural network structures (e.g., weights, parameters, and layers); local self-attention (LSA) combines features from both regional and local tokens associated with each region; regional self-attention (RSA) combines features from regional tokens associated with all regions; R2L includes RSA and LSA with weights and layer normalization].
Chen does not appear to explicitly disclose that the first and second layers for determining the first and second set of relation values are hidden layers.
However, Fan discloses that the first and second layers for determining the first and second set of relation values are hidden layers [par. 34; an attention mechanism is applied to a hidden layer to determine which step is more relevant (i.e., attention values)].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the generation of RSA and LSA values taught by Chen so that they are determined by an attention mechanism in a hidden layer as taught by Fan, with a reasonable expectation of success. The motivation for doing so would have been that hidden layers enable neural networks to model non-linear relationships between inputs and outputs, thereby improving the accuracy of predictions.
Claims 8, 9, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Lee et al. (US Pub. 20230206645).
Referring to claim 8, Chen does not appear to explicitly disclose The processor of claim 1, wherein the one or more circuits are to determine the first value, based, at least in part, on an average value of the plurality of values.
However, Lee discloses The processor of claim 1, wherein the one or more circuits are to determine the first value, based, at least in part, on an average value of the plurality of values [pars. 56 and 77; global average pooling is applied to local feature points to generate pooled feature vectors].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the regional tokens taught by Chen so that they are generated via global average pooling as taught by Lee, with a reasonable expectation of success. The motivation for doing so would have been to provide global context more efficiently and compensate for information loss due to downsampling at the end of each stage [Lee, par. 59].
Referring to claim 9, Chen discloses The processor of claim 1, wherein the second subregion is associated with a second plurality of values [pars. 43-45; note that each region (i.e., the first region and the second region) of the visual content item is associated with a plurality of local tokens].
Chen does not appear to explicitly disclose the one or more circuits are to determine the second value, based, at least in part, on an average value of the second plurality of values.
However, Lee discloses the one or more circuits are to determine the second value, based, at least in part, on an average value of the second plurality of values [pars. 56 and 77; global average pooling is applied to local feature points to generate pooled feature vectors].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the regional tokens taught by Chen so that they are generated via global average pooling as taught by Lee, with a reasonable expectation of success. The motivation for doing so would have been to provide global context more efficiently and compensate for information loss due to downsampling at the end of each stage [Lee, par. 59].
Referring to claim 14, Chen does not appear to explicitly disclose The system of claim 10, wherein the first metric is calculated, based, at least in part, on an average value of the plurality of first values, and the second metric is calculated, based, at least in part, on an average value of the plurality of second values.
However, Lee discloses The system of claim 10, wherein the first metric is calculated, based, at least in part, on an average value of the plurality of first values, and the second metric is calculated, based, at least in part, on an average value of the plurality of second values [pars. 56 and 77; global average pooling is applied to local feature points to generate pooled feature vectors].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the regional tokens taught by Chen so that they are generated via global average pooling as taught by Lee, with a reasonable expectation of success. The motivation for doing so would have been to provide global context more efficiently and compensate for information loss due to downsampling at the end of each stage [Lee, par. 59].
Referring to claim 20, Chen does not appear to explicitly disclose The method of claim 15, wherein the first metric is generated based at least in part on an average value of the plurality of first values.
However, Lee discloses The method of claim 15, wherein the first metric is generated based at least in part on an average value of the plurality of first values [pars. 56 and 77; global average pooling is applied to local feature points to generate pooled feature vectors].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the regional tokens taught by Chen so that they are generated via global average pooling as taught by Lee, with a reasonable expectation of success. The motivation for doing so would have been to provide global context more efficiently and compensate for information loss due to downsampling at the end of each stage [Lee, par. 59].
Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Fu et al. (US Pub. 20200160124) discloses image recognition using a convolutional neural network and extracting global and local features.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GRACE PARK whose telephone number is (571)270-7727. The examiner can normally be reached M-F 8AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, TAMARA KYLE can be reached at (571)272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Grace Park/Primary Examiner, Art Unit 2144