Prosecution Insights
Last updated: April 19, 2026
Application No. 18/005,804

METHODS AND APPARATUS TO DYNAMICALLY NORMALIZE DATA IN NEURAL NETWORKS

Non-Final OA: §101, §102, §103, §112
Filed: Jan 17, 2023
Examiner: GORMLEY, AARON PATRICK
Art Unit: 2148
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Intel Corporation
OA Round: 1 (Non-Final)
Grant Probability: 60% (Moderate)
Predicted OA Rounds: 1-2
Predicted Time to Grant: 4y 4m
Grant Probability with Interview: 0%

Examiner Intelligence

Career Allow Rate: 60% (grants 60% of resolved cases: 3 granted / 5 resolved; +5.0% vs TC avg)
Interview Lift: -60.0% (minimal lift, based on resolved cases with interview)
Avg Prosecution: 4y 4m (typical timeline; 30 applications currently pending)
Total Applications: 35 (career history, across all art units)

Statute-Specific Performance

§101: 30.2% (-9.8% vs TC avg)
§102: 8.4% (-31.6% vs TC avg)
§103: 36.0% (-4.0% vs TC avg)
§112: 21.5% (-18.5% vs TC avg)

Figures are compared against Tech Center average estimates and are based on career data from 5 resolved cases.

Office Action

Rejections: §101, §102, §103, §112
DETAILED ACTION

This action is in response to the application filed 01/17/2023. Claims 1-25 are pending and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 05/11/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Specification

The abstract of the disclosure is objected to because “Different ones of the alternate normalized outputs based on different normalization techniques” is improper grammar. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.
The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and

(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C.
112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action; this includes claims 23-25. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are:

Claim 1: “at least one normalization calculator”; “a soft weighting engine”; “a normalized output generator”
Claim 2: “the normalized output generator”
Claim 3: “the soft weighting engine”
Claim 4: “the soft weighting engine” (preamble); “an aggregation analyzer”; “a mapping analyzer”
Claim 5: “the soft weighting engine”
Claim 7: “the soft weighting engine”
Claim 8: “the normalization calculator”; “the normalized output generator”
Claim 9: “the soft weighting engine”

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-10 and 23-25 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Limitations reciting the use of a “means” or equivalent generic placeholder that is modified by functional language, and not modified by sufficient structure within the claim, are interpreted as means-plus-function limitations under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (MPEP 2181(I) A.). For limitations interpreted under 35 U.S.C.
112(f) using means-plus-function language, the structure of the “means” or the equivalent generic placeholder substitute must be disclosed in the specification itself in a way that one skilled in the art will understand what structure will perform the recited function (MPEP 2181 (II.) A.). Additionally, for a computer-implemented means-plus-function limitation interpreted under 35 U.S.C. 112(f), the specification must disclose an algorithm for performing the claimed specific computer function (MPEP 2181 (II.) A.). Failure to adequately disclose either the structure or algorithm in sufficient detail in the specification for a computer-implemented means-plus-function limitation renders the claim indefinite under 35 U.S.C. 112(b).

As noted in the claim interpretation section above, claims 1-5 and 7-9 recite computer-implemented means-plus-function limitations incorporating the use of generic placeholders substituting for “means”. Additionally, claims 23-25 recite computer-implemented means-plus-function limitations incorporating the use of “means”. The instant specification fails to disclose any meaningful structure for these generic placeholders and means, and its disclosure would be insufficient for one of ordinary skill in the art to understand what structures could perform the recited functions. Thus, claims 1-5, 7-9, and 23-25 are considered indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. This deficiency is inherited by dependent claims 6 and 10.

The following is a quotation of the first paragraph of 35 U.S.C.
112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-10 and 23-25 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Limitations reciting the use of a “means” or equivalent generic placeholder that is modified by functional language, and not modified by sufficient structure within the claim, are interpreted as means-plus-function limitations under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (MPEP 2181(I) A.). For limitations interpreted under 35 U.S.C. 112(f) using means-plus-function language, the written description under 35 U.S.C.
112(a) must adequately link or associate particular structure, material, or acts to perform the function, or it must be clear based on the facts of the application that one skilled in the art would have known what structure, material, or acts disclosed in the specification perform the recited function (MPEP 2163(II) A. (3)).

Claims 1-5 and 7-9 recite computer-implemented means-plus-function limitations incorporating the use of generic placeholders substituting for “means”. Additionally, claims 23-25 recite computer-implemented means-plus-function limitations incorporating the use of “means”. As noted above, these claims are rejected under 35 U.S.C. 112(b) as being indefinite for failing to adequately disclose the corresponding structures or algorithms in sufficient detail in the specification. When a claim containing a computer-implemented 35 U.S.C. 112(f) claim limitation is found to be indefinite under 35 U.S.C. 112(b) for failure to disclose sufficient corresponding structure in the specification that performs the entire claimed function, it will also lack written description under 35 U.S.C. 112(a). See MPEP § 2163.03, subsection VI. Thus, these claims are rejected under 35 U.S.C. 112(a) for lack of written description.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed inventions are directed to non-statutory subject matter without significantly more.
Claim 1

Step 1: The claim recites “An apparatus” and is therefore directed to the statutory category of machine.

Step 2A Prong 1: The claim recites the following judicial exception(s):

at least one normalization calculator to generate a plurality of alternate normalized outputs associated with input data for the machine learning model, different ones of the alternate normalized outputs based on different normalization techniques: This can be performed as a mental process. One can merely calculate the outputs of various normalization techniques on the input data.

a soft weighting engine to generate a plurality of weights based on the input data: This can be performed as a mental process. One can merely decide on weights based on the input data.

a normalized output generator to generate a final normalized output based on the plurality of alternate normalized outputs and the plurality of weights: This can be performed as a mental process. One can merely calculate a weighted sum of the alternate normalized outputs with the generated weights.

Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the following additional element(s):

An apparatus for use with a machine learning model: This is mere instruction to execute the judicial exceptions with generic computer components and a generic data structure (MPEP 2106.05(f)).

at least one normalization calculator to generate a plurality of alternate normalized outputs associated with input data for the machine learning model, different ones of the alternate normalized outputs based on different normalization techniques: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

a soft weighting engine to generate a plurality of weights based on the input data: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).
a normalized output generator to generate a final normalized output based on the plurality of alternate normalized outputs and the plurality of weights: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Step 2B: The following additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s):

An apparatus for use with a machine learning model: This is mere instruction to execute the judicial exceptions with generic computer components and a generic data structure (MPEP 2106.05(f)).

at least one normalization calculator to generate a plurality of alternate normalized outputs associated with input data for the machine learning model, different ones of the alternate normalized outputs based on different normalization techniques: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

a soft weighting engine to generate a plurality of weights based on the input data: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

a normalized output generator to generate a final normalized output based on the plurality of alternate normalized outputs and the plurality of weights: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Claim 2

Step 1: The claim recites a machine, as in claim 1.

Step 2A Prong 1: The claim recites the following further judicial exception(s):

wherein the normalized output generator is to generate the final normalized output as a sum of a product of ones of the plurality of weights and respective ones of the plurality of alternate normalized outputs: This can be performed as a mental process. One can merely calculate a weighted sum with the alternate normalized outputs and plurality of weights.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s):

wherein the normalized output generator is to generate the final normalized output as a sum of a product of ones of the plurality of weights and respective ones of the plurality of alternate normalized outputs: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s):

wherein the normalized output generator is to generate the final normalized output as a sum of a product of ones of the plurality of weights and respective ones of the plurality of alternate normalized outputs: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Claim 3

Step 1: The claim recites a machine, as in claim 1.

Step 2A Prong 1: The claim recites the following further judicial exception(s):

wherein the input data is first input data and the plurality of weights is a first plurality of weights: Generating a plurality of alternate normalized outputs associated with the input data and generating a plurality of weights based on the first input data can still be performed as mental processes.

the soft weighting engine to generate a second plurality of weights based on second input data different than the first input data, the second plurality of weights different than the first plurality of weights due to distinctions between the first input data and the second input data: This can be performed as a mental process. One can merely decide on weights based on the second input data.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s):

the soft weighting engine to generate a second plurality of weights based on second input data different than the first input data, the second plurality of weights different than the first plurality of weights due to distinctions between the first input data and the second input data: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s):

the soft weighting engine to generate a second plurality of weights based on second input data different than the first input data, the second plurality of weights different than the first plurality of weights due to distinctions between the first input data and the second input data: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Claim 4

Step 1: The claim recites a machine, as in claim 1.

Step 2A Prong 1: The claim recites the following further judicial exception(s):

an aggregation analyzer to aggregate the input data into a first vector: This can be performed as a mental process. One can merely imagine a vector containing the input data.

a mapping analyzer to map the first vector to a second vector, a number of elements in the second vector being the same as a number of the different normalization techniques: This can be performed as a mental process. One can calculate each of several normalization methods on the first vector, storing each output in the second vector.

the plurality of weights based on values in the second vector: Generating the plurality of weights can still be performed as a mental process. One can merely imagine a weight for each element in the second vector.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s):

an aggregation analyzer to aggregate the input data into a first vector: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

a mapping analyzer to map the first vector to a second vector, a number of elements in the second vector being the same as a number of the different normalization techniques: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s):

an aggregation analyzer to aggregate the input data into a first vector: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

a mapping analyzer to map the first vector to a second vector, a number of elements in the second vector being the same as a number of the different normalization techniques: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Claim 5

Step 1: The claim recites a machine, as in claim 4.

Step 2A Prong 1: The claim recites the following further judicial exception(s):

wherein the soft weighting engine includes a scaling analyzer to scale the values in the second vector: This can be performed as a mental process. One can merely multiply the second vector values by a scalar.

Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s):

wherein the soft weighting engine includes a scaling analyzer to scale the values in the second vector: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).
Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s):

wherein the soft weighting engine includes a scaling analyzer to scale the values in the second vector: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Claim 6

Step 1: The claim recites a machine, as in claim 1.

Step 2A Prong 1: The claim recites the following further judicial exception(s):

wherein the machine learning model is a neural network with multiple layers: Generating a plurality of alternate normalized outputs associated with input data for the machine learning model can still be performed as a mental process.

Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s).

Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s).

Claim 7

Step 1: The claim recites a machine, as in claim 6.

Step 2A Prong 1: The claim recites the following further judicial exception(s):

wherein the input data is first input data for a first layer in the neural network and the plurality of weights is a first plurality of weights: Generating a plurality of alternate normalized outputs associated with the input data and generating a plurality of weights based on the first input data can still be performed as mental processes.

the soft weighting engine to generate a second plurality of weights based on second input data for a second layer in the neural network, the second input data based on the final normalized output: This can be performed as a mental process. One can merely decide weights based on the second input data associated with some layer in the neural network subsequent to the first layer.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s):

the soft weighting engine to generate a second plurality of weights based on second input data for a second layer in the neural network, the second input data based on the final normalized output: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s):

the soft weighting engine to generate a second plurality of weights based on second input data for a second layer in the neural network, the second input data based on the final normalized output: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Claim 8

Step 1: The claim recites a machine, as in claim 7.

Step 2A Prong 1: The claim recites the following further judicial exception(s):

wherein the plurality of alternate normalized outputs is a first plurality of alternate normalized outputs associated with the first layer in the neural network and the final normalized output is a first final normalized output associated with the first layer in the neural network: Calculating the plurality of alternate normalized outputs and the final normalized output can still be performed as a mental process.

the at least one normalization calculator to generate a second plurality of alternate normalized outputs associated with second input data: This can be performed as a mental process. One can merely calculate the outputs of various normalization techniques on the second input data.

the normalized output generator to generate a second final normalized output based on the second plurality of alternate normalized outputs and the second plurality of weights: This can be performed as a mental process.
One can merely calculate a weighted sum of the second plurality of alternate normalized outputs with the generated second plurality of weights.

Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s):

the at least one normalization calculator to generate a second plurality of alternate normalized outputs associated with second input data: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

the normalized output generator to generate a second final normalized output based on the second plurality of alternate normalized outputs and the second plurality of weights: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s):

the at least one normalization calculator to generate a second plurality of alternate normalized outputs associated with second input data: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

the normalized output generator to generate a second final normalized output based on the second plurality of alternate normalized outputs and the second plurality of weights: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Claim 9

Step 1: The claim recites a machine, as in claim 1.

Step 2A Prong 1: The claim recites the following further judicial exception(s):

wherein the soft weighting engine is to generate the plurality of weights independent of the alternate normalized outputs: Generating the plurality of weights can still be performed as a mental process. They simply must be calculated using a process that does not require the alternate normalized outputs as input.
Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the further additional element(s):

wherein the soft weighting engine is to generate the plurality of weights independent of the alternate normalized outputs: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Step 2B: The further additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s):

wherein the soft weighting engine is to generate the plurality of weights independent of the alternate normalized outputs: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Claim 10

Step 1: The claim recites a machine, as in claim 1.

Step 2A Prong 1: The claim recites the following further judicial exception(s):

wherein the plurality of weights corresponds to soft weights with values that may differ along a range from 0 to 1: Generating the weights can still be performed as a mental process.

Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the additional element(s).

Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s).

Claim 11

Step 1: The claim recites “At least one non-transitory computer readable medium” and is therefore directed to the statutory category of article of manufacture.

Step 2A Prong 1: The claim recites the following judicial exception(s):

generate a plurality of alternate normalized outputs associated with input data for the machine learning model, different ones of the alternate normalized outputs based on different normalization techniques: This can be performed as a mental process. One can merely calculate the outputs of various normalization techniques on the input data.
generate a plurality of weights based on the input data: This can be performed as a mental process. One can merely decide on weights based on the input data.

generate a final normalized output based on the plurality of alternate normalized outputs and the plurality of weights: This can be performed as a mental process. One can merely calculate a weighted sum of the alternate normalized outputs with the generated weights.

Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the following additional element(s):

At least one non-transitory computer readable medium comprising instructions that, when executed, cause at least one processor to at least: This is mere instruction to execute the judicial exceptions with generic computer hardware (MPEP 2106.05(f)).

Step 2B: The following additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s):

At least one non-transitory computer readable medium comprising instructions that, when executed, cause at least one processor to at least: This is mere instruction to execute the judicial exceptions with generic computer hardware (MPEP 2106.05(f)).

Claims 12-18

Step 1: Claims 12-18 recite an article of manufacture, as in claim 11.

Step 2A Prong 1: Claims 12-18 recite the same judicial exception(s) as claims 2-8, respectively.

Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through any additional elements. The analysis of claims 12-18 at this step mirrors that of claims 2-8, respectively, with the exception that claims 12-18 are directed to “At least one non-transitory computer readable medium comprising instructions that, when executed, cause at least one processor to at least”, said processor performing the operations of claims 2-8. This is a mere instruction to apply the exceptions using generic computer equipment (MPEP 2106.05(f)).
Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s). The analysis of claims 12-18 at this step mirrors that of claims 2-8, with the exception that claims 12-18 are directed to “At least one non-transitory computer readable medium comprising instructions that, when executed, cause at least one processor to at least”, said processor performing the operations of claims 2-8. This is a mere instruction to apply the exceptions using generic computer equipment (MPEP 2106.05(f)).

Claim 19

Step 1: The claim recites “A method”, and is therefore directed to the statutory category of process.

Step 2A Prong 1: The claim recites the following judicial exception(s):

generating a plurality of alternate normalized outputs associated with input data for the machine learning model, different ones of the alternate normalized outputs based on different normalization techniques: This can be performed as a mental process. One can merely calculate the outputs of various normalization techniques on the input data.

generating a plurality of weights based on the input data: This can be performed as a mental process. One can merely decide on weights based on the input data.

generating a final normalized output based on the plurality of alternate normalized outputs and the plurality of weights: This can be performed as a mental process. One can merely calculate a weighted sum of the alternate normalized outputs with the generated weights.

Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the following additional element(s):

A method for using a machine learning model: This merely links the recited judicial exceptions to a field of use (machine learning) (MPEP 2106.05(h)).
Step 2B: The following additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s):

A method for using a machine learning model: This merely links the recited judicial exceptions to a field of use (machine learning) (MPEP 2106.05(h)).

Claims 20-22

Step 1: Claims 20-22 recite a process, as in claim 19.

Step 2A Prong 1: Claims 20-22 recite the same judicial exception(s) as claims 2-4, respectively.

Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through any additional elements. The limitations of claims 20-22 analyzed at this step are disclosed in their entirety by claims 2-4. Thus, claims 20-22 are found not to be integrated into a practical application under the same basis as claims 2-4, respectively.

Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s). The limitations of claims 20-22 analyzed at this step are disclosed in their entirety by claims 2-4. Thus, claims 20-22 are found not to amount to significantly more under the same basis as claims 2-4, respectively.

Claim 23

Step 1: The claim recites “An apparatus”, and is therefore directed to the statutory category of machine.

Step 2A Prong 1: The claim recites the following judicial exception(s):

means for generating a plurality of alternate normalized outputs associated with input data for the machine learning model, different ones of the alternate normalized outputs based on different normalization techniques: This can be performed as a mental process. One can merely calculate the outputs of various normalization techniques on the input data.

means for generating a plurality of weights based on the input data: This can be performed as a mental process. One can merely decide on weights based on the input data.
means for generating a final normalized output based on the plurality of alternate normalized outputs and the plurality of weights: This can be performed as a mental process. One can merely calculate a weighted sum of the alternate normalized outputs with the generated weights.

Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through the following additional element(s):

An apparatus for use with a machine learning model: This is mere instruction to execute the judicial exceptions with generic computer components and generic data structure (MPEP 2106.05(f)).

means for generating a plurality of alternate normalized outputs associated with input data for the machine learning model, different ones of the alternate normalized outputs based on different normalization techniques: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

means for generating a plurality of weights based on the input data: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

means for generating a final normalized output based on the plurality of alternate normalized outputs and the plurality of weights: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Step 2B: The following additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s):

An apparatus for use with a machine learning model: This is mere instruction to execute the judicial exceptions with generic computer components and generic data structure (MPEP 2106.05(f)).
means for generating a plurality of alternate normalized outputs associated with input data for the machine learning model, different ones of the alternate normalized outputs based on different normalization techniques: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

means for generating a plurality of weights based on the input data: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

means for generating a final normalized output based on the plurality of alternate normalized outputs and the plurality of weights: This is mere instruction to execute a judicial exception with a generic computing component (MPEP 2106.05(f)).

Claims 24-25

Step 1: Claims 24-25 recite a machine, as in claim 23.

Step 2A Prong 1: Claims 24-25 recite the same judicial exception(s) as claims 2-3, respectively.

Step 2A Prong 2: The judicial exception(s) are not integrated into a practical application through any additional elements. The limitations of claims 24-25 analyzed at this step are disclosed in their entirety by claims 2-3. Thus, claims 24-25 are found not to be integrated into a practical application under the same basis as claims 2-3, respectively.

Step 2B: The additional element(s) of the claim, taken alone or in combination, do not amount to significantly more than the recited judicial exception(s). The limitations of claims 24-25 analyzed at this step are disclosed in their entirety by claims 2-3. Thus, claims 24-25 are found not to amount to significantly more under the same basis as claims 2-3, respectively.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C.
102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3, 6-10, 19-21, and 23-25 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Luo et al. (Switchable Normalization for Learning-to-Normalize Deep Representation, published July 22nd, 2019, arXiv:1907.10473v1), hereafter referred to as Luo.

Regarding claim 1, Luo discloses [a]n apparatus for use with a machine learning model, the apparatus comprising:

at least one normalization calculator to generate a plurality of alternate normalized outputs associated with input data for the machine learning model, different ones of the alternate normalized outputs based on different normalization techniques:

“We take CNN (machine learning model) as an illustrative example. Let h be the input data of an arbitrary normalization layer represented by a 4D tensor (N, C, H, W), indicating number of samples, number of channels, height and width of a channel respectively.” (Luo, page 3, left column, paragraph 6)

“Let h_ncij (input data) … be a pixel before … normalization” (Luo, page 3, right column, paragraph 1)

“we compare SN with five popular normalization methods, i.e. BN, IN, LN, GN and WN” (Luo, page 2, right column, paragraph 4)

“In general, we have

μ_k = (1/|I_k|) Σ_{(n,c,i,j)∈I_k} h_ncij,  σ_k² = (1/|I_k|) Σ_{(n,c,i,j)∈I_k} (h_ncij − μ_k)²  (Eqn. (2))

where k ∈ {in, ln, bn} is used to distinguish different methods.
I_k is a set pixels [sic] and |I_k| denotes the number of pixels. Specifically, I_in, I_ln, and I_bn are the sets of pixels used to compute statistics in different approaches” (Luo, page 3, right column, paragraph 3). μ_k and σ_k are a plurality of alternate normalized outputs.

a soft weighting engine to generate a plurality of weights based on the input data:

“SN adapts to various scenarios by changing its importance weights. For example, SN prefers BN when the minibatch is sufficiently large (size of the input data), while it selects LN instead when small minibatch is presented” (Luo, page 6, left column, paragraph 4)

“The importance weights in each SN layer are visualized in Fig.5. We have several observations to answer what factors that impact the choices of normalizers. First, for the same batch size, the importance weights of μ and σ could have notable differences … this is because the variance (statistical properties of the input data) estimated in a minibatch produces larger noise than the mean, making training instable … Second, the SN layers in different places of a network (source of the input data) may select distinct operations … Third, deeper layers (depth of the input data) prefer LN and IN more than BN” (Luo, page 6, right column, paragraph 2)

a normalized output generator to generate a final normalized output based on the plurality of alternate normalized outputs and the plurality of weights:

“Let … ĥ_ncij (final normalized output) be a pixel … after normalization” (Luo, page 3, right column, paragraph 1)

“SN has an intuitive expression

ĥ_ncij = γ · (h_ncij − Σ_{k∈Ω} w_k μ_k) / √(Σ_{k∈Ω} w_k′ σ_k² + ε) + β  (Eqn. (3))

where Ω is a set of statistics estimated in different ways. In this work, we define Ω = {in, ln, bn} the same as above where μ_k and σ_k (plurality of alternate normalized outputs) can be calculated by following Eqn. (2).” (Luo, page 4, left column, paragraph 1)

“Furthermore, w_k and w_k′ (plurality of weights) in Eqn.
(3) are importance ratios used to weighted average the means and variances respectively … There are 3 × 2 = 6 importance weights in SN” (Luo, page 4, left column, paragraph 2). Weights, importance weights, and importance ratios are used interchangeably in Luo.

Regarding claim 2, the rejection of claim 1 in view of Luo is incorporated. Luo further discloses an apparatus, wherein the normalized output generator is to generate the final normalized output as a sum of a product of ones of the plurality of weights and respective ones of the plurality of alternate normalized outputs:

“SN has an intuitive expression [Luo, Eqn. (3)] where Ω is a set of statistics estimated in different ways. In this work, we define Ω = {in, ln, bn} the same as above where μ_k and σ_k (plurality of alternate normalized outputs) can be calculated by following Eqn. (2).” (Luo, page 4, left column, paragraph 1)

“Furthermore, w_k and w_k′ (plurality of weights) in Eqn. (3) are importance ratios used to weighted average the means and variances respectively” (Luo, page 4, left column, paragraph 2).

Examiner’s note: As can be seen in Eqn. (3), the final normalized output is produced by two weighted sums of weights and alternate normalized outputs.

Regarding claim 3, the rejection of claim 1 in view of Luo is incorporated. Luo further discloses an apparatus, wherein the input data is first input data and the plurality of weights is a first plurality of weights, the soft weighting engine to generate a second plurality of weights based on second input data different than the first input data, the second plurality of weights different than the first plurality of weights due to distinctions between the first input data and the second input data:

Examiner’s note: As discussed regarding parent claim 1, weights are generated based on input data.

[Luo, Fig. 5] ”There are 53 SN layers.
(a, b) show the importance weights for μ and σ of (8, 32), while (c, d) show those of (8, 2). The y-axis represents the importance weights that sum to 1, while the x-axis shows different residual blocks of ResNet50. The SN layers in different places are highlighted differently. For example, the SN layers follow the 3 × 3 conv layers are outlined by shaded color, those in the shortcuts are marked with ‘■’, while those follow the 1 × 1 conv layers are in flat color. The first SN layer follows a 7 × 7 conv layer. We see that SN learns distinct importance weights for different normalization methods as well as μ and σ, adapting to different batch sizes, places, and depths of a deep network” (Luo, page 7, Fig. 5).

Each layer of the network has a unique input. As shown in figure 5, this results in each layer having a unique plurality of weights. Any two layers can be labeled as having a first and second plurality of weights.

Regarding claim 6, the rejection of claim 1 in view of Luo is incorporated. Luo further discloses an apparatus, wherein the machine learning model is a neural network with multiple layers:

“By enabling each normalization layer in a deep network (neural network) to have its own operation, SN helps ease the usage of normalizers, pushes the frontier of normalization in deep learning” (Luo, page 2, left column, paragraph 2); “There are 53 SN layers.” (Luo, page 7, Fig. 5).

Regarding claim 7, the rejection of claim 6 in view of Luo is incorporated. Luo further discloses an apparatus, wherein the input data is first input data for a first layer in the neural network and the plurality of weights is a first plurality of weights, the soft weighting engine to generate a second plurality of weights based on second input data for a second layer in the neural network, the second input data based on the final normalized output:

Examiner’s note: As discussed regarding parent claim 1, weights are generated based on input data.
[Luo, Fig. 5] ”There are 53 SN layers. (a, b) show the importance weights for μ and σ of (8, 32), while (c, d) show those of (8, 2). The y-axis represents the importance weights that sum to 1, while the x-axis shows different residual blocks of ResNet50. The SN layers in different places are highlighted differently. For example, the SN layers follow the 3 × 3 conv layers are outlined by shaded color, those in the shortcuts are marked with ‘■’, while those follow the 1 × 1 conv layers are in flat color. The first SN layer follows a 7 × 7 conv layer. We see that SN learns distinct importance weights for different normalization methods as well as μ and σ, adapting to different batch sizes, places, and depths of a deep network” (Luo, page 7, Fig. 5).

Each layer of the network has a unique input. As shown in figure 5, this results in each layer having a unique plurality of weights. Any two layers can be labeled as having a first and second plurality of weights. The input of one layer is based on the normalized outputs of the previous layers.

Regarding claim 8, the rejection of claim 7 in view of Luo is incorporated. Luo further discloses an apparatus, wherein the plurality of alternate normalized outputs is a first plurality of alternate normalized outputs associated with the first layer in the neural network and the final normalized output is a first final normalized output associated with the first layer in the neural network, the at least one normalization calculator to generate a second plurality of alternate normalized outputs associated with second input data, the normalized output generator to generate a second final normalized output based on the second plurality of alternate normalized outputs and the second plurality of weights:

[Luo, Fig. 5] ”There are 53 SN layers” (Luo, page 7, Fig. 5).
In this example, Luo’s method (switchable normalization) is performed 53 times, for each of 53 layers. That necessarily includes calculating 53 final normalized outputs accordingly, including a first and second calculation.

“SN has an intuitive expression [Luo, Eqn. (3)] where Ω is a set of statistics estimated in different ways. In this work, we define Ω = {in, ln, bn} the same as above where μ_k and σ_k (plurality of alternate normalized outputs) can be calculated by following Eqn. (2).” (Luo, page 4, left column, paragraph 1). The formula for calculating a final normalized output with Luo’s method, SN.

As discussed regarding parent claim 7, each layer has a unique plurality of weights resulting from its unique input.

“In general, we have [Luo, Eqn. (2)]” (Luo, page 3, right column, paragraph 3). The value of the alternate normalized outputs is dependent on the input of the layer being normalized (h).

Regarding claim 9, the rejection of claim 1 in view of Luo is incorporated. Luo further discloses an apparatus, wherein the soft weighting engine is to generate the plurality of weights independent of the alternate normalized outputs:

“In general, we have [Luo, Eqn. (2)]” (Luo, page 3, right column, paragraph 3). The formula for calculating the alternate normalized outputs.

w_k = e^{λ_k} / Σ_{z∈{in,ln,bn}} e^{λ_z} (Luo, page 4, left column, paragraph 2). The formula for generating the plurality of weights.

Examiner’s note: The formula for weight generation doesn’t include any alternate normalized outputs, nor do the formulas for the alternate normalized outputs include weights. Thus, these operations are independent.

Regarding claim 10, the rejection of claim 1 in view of Luo is incorporated.
Luo further discloses an apparatus, wherein the plurality of weights corresponds to soft weights with values that may differ along a range from 0 to 1:

w_k = e^{λ_k} / Σ_{z∈{in,ln,bn}} e^{λ_z} ”Here each w_k is computed by using a softmax function” (Luo, page 4, left column, paragraph 2).

Examiner’s note: Continuous weight values, as shown here, fall within the definition given by paragraph [0028] of the instant specification: “As used herein, the term ‘soft,’ used in the context of ‘soft weights,’ means that the weights are given a value on a continuous scale rather than being defined as one of different discrete values”.

Examiner’s note: As known to one of ordinary skill in the art, softmax scales values between 0 and 1.

Regarding claim 19, Luo discloses [a] method for using a machine learning model, the method comprising:

generating a plurality of alternate normalized outputs associated with input data for the machine learning model, different ones of the alternate normalized outputs based on different normalization techniques:

“We take CNN (machine learning model) as an illustrative example. Let h be the input data of an arbitrary normalization layer represented by a 4D tensor (N, C, H, W), indicating number of samples, number of channels, height and width of a channel respectively.” (Luo, page 3, left column, paragraph 6)

“Let h_ncij (input data) … be a pixel before … normalization” (Luo, page 3, right column, paragraph 1)

“we compare SN with five popular normalization methods, i.e. BN, IN, LN, GN and WN” (Luo, page 2, right column, paragraph 4)

“In general, we have [Luo, Eqn. (2)] where k ∈ {in, ln, bn} is used to distinguish different methods. I_k is a set pixels [sic] and |I_k| denotes the number of pixels. Specifically, I_in, I_ln, and I_bn are the sets of pixels used to compute statistics in different approaches” (Luo, page 3, right column, paragraph 3).
μ_k and σ_k are a plurality of alternate normalized outputs.

generating a plurality of weights based on the input data:

“SN adapts to various scenarios by changing its importance weights. For example, SN prefers BN when the minibatch is sufficiently large (size of the input data), while it selects LN instead when small minibatch is presented” (Luo, page 6, left column, paragraph 4)

“The importance weights in each SN layer are visualized in Fig.5. We have several observations to answer what factors that impact the choices of normalizers. First, for the same batch size, the importance weights of μ and σ could have notable differences … this is because the variance (statistical properties of the input data) estimated in a minibatch produces larger noise than the mean, making training instable … Second, the SN layers in different places of a network (source of the input data) may select distinct operations … Third, deeper layers (depth of the input data) prefer LN and IN more than BN” (Luo, page 6, right column, paragraph 2)

generating a final normalized output based on the plurality of alternate normalized outputs and the plurality of weights:

“Let … ĥ_ncij (final normalized output) be a pixel … after normalization” (Luo, page 3, right column, paragraph 1)

“SN has an intuitive expression [Luo, Eqn. (3)] where Ω is a set of statistics estimated in different ways. In this work, we define Ω = {in, ln, bn} the same as above where μ_k and σ_k (plurality of alternate normalized outputs) can be calculated by following Eqn. (2).” (Luo, page 4, left column, paragraph 1)

“Furthermore, w_k and w_k′ (plurality of weights) in Eqn. (3) are importance ratios used to weighted average the means and variances respectively … There are 3 × 2 = 6 importance weights in SN” (Luo, page 4, left column, paragraph 2). Weights, importance weights, and importance ratios are used interchangeably in Luo.
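Annotator's note: The three generating steps the examiner maps to claim 19 can be sketched concretely. The following is an illustrative NumPy sketch of the switchable-normalization computation described in Luo (per-technique statistics as in Eqn. (2), softmax importance weights, blended output as in Eqn. (3)); the function and variable names are the annotator's own and do not appear in Luo or the claims.

```python
import numpy as np

def switchable_norm(h, lam_mu, lam_var, gamma=1.0, beta=0.0, eps=1e-5):
    """Illustrative switchable normalization over a 4D tensor h of shape
    (N, C, H, W): compute alternate statistics under IN, LN, and BN,
    blend them with softmax importance weights, and emit one output."""
    # Alternate normalized outputs: each technique averages over a
    # different pixel set I_k (cf. Luo, Eqn. (2)).
    stats = {
        "in": (h.mean(axis=(2, 3), keepdims=True), h.var(axis=(2, 3), keepdims=True)),
        "ln": (h.mean(axis=(1, 2, 3), keepdims=True), h.var(axis=(1, 2, 3), keepdims=True)),
        "bn": (h.mean(axis=(0, 2, 3), keepdims=True), h.var(axis=(0, 2, 3), keepdims=True)),
    }
    # Importance weights w_k and w_k' from softmax over control
    # parameters -- generated independently of the statistics above.
    w_mu = np.exp(lam_mu) / np.exp(lam_mu).sum()
    w_var = np.exp(lam_var) / np.exp(lam_var).sum()
    keys = ["in", "ln", "bn"]
    # Weighted sums of the alternate statistics (cf. Luo, Eqn. (3)).
    mu = sum(w * stats[k][0] for w, k in zip(w_mu, keys))
    var = sum(w * stats[k][1] for w, k in zip(w_var, keys))
    # Final normalized output.
    return gamma * (h - mu) / np.sqrt(var + eps) + beta
```

With the control parameters set to zero, the softmax yields uniform weights and the output is an equal blend of the three normalizers, matching the structure (but not any trained values) of the reference.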
The methods of claims 20-21 mirror the apparatus operations of claims 2-3, respectively. Thus, claims 20-21 are rejected under the same rationales used for claims 2-3, respectively.

Regarding claim 23, Luo discloses [a]n apparatus for use with a machine learning model, the apparatus comprising:

means for generating a plurality of alternate normalized outputs associated with input data for the machine learning model, different ones of the alternate normalized outputs based on different normalization techniques:

“We take CNN (machine learning model) as an illustrative example. Let h be the input data of an arbitrary normalization layer represented by a 4D tensor (N, C, H, W), indicating number of samples, number of channels, height and width of a channel respectively.” (Luo, page 3, left column, paragraph 6)

“Let h_ncij (input data) … be a pixel before … normalization” (Luo, page 3, right column, paragraph 1)

“we compare SN with five popular normalization methods, i.e. BN, IN, LN, GN and WN” (Luo, page 2, right column, paragraph 4)

“In general, we have [Luo, Eqn. (2)] where k ∈ {in, ln, bn} is used to distinguish different methods. I_k is a set pixels [sic] and |I_k| denotes the number of pixels. Specifically, I_in, I_ln, and I_bn are the sets of pixels used to compute statistics in different approaches” (Luo, page 3, right column, paragraph 3). μ_k and σ_k are a plurality of alternate normalized outputs.

means for generating a plurality of weights based on the input data:

“SN adapts to various scenarios by changing its importance weights. For example, SN prefers BN when the minibatch is sufficiently large (size of the input data), while it selects LN instead when small minibatch is presented” (Luo, page 6, left column, paragraph 4)

“The importance weights in each SN layer are visualized in Fig.5. We have several observations to answer what factors that impact the choices of normalizers.
First, for the same batch size, the importance weights of μ and σ could have notable differences … this is because the variance (statistical properties of the input data) estimated in a minibatch produces larger noise than the mean, making training instable … Second, the SN layers in different places of a network (source of the input data) may select distinct operations … Third, deeper layers (depth of the input data) prefer LN and IN more than BN” (Luo, page 6, right column, paragraph 2)

means for generating a final normalized output based on the plurality of alternate normalized outputs and the plurality of weights:

“Let … ĥ_ncij (final normalized output) be a pixel … after normalization” (Luo, page 3, right column, paragraph 1)

“SN has an intuitive expression [Luo, Eqn. (3)] where Ω is a set of statistics estimated in different ways. In this work, we define Ω = {in, ln, bn} the same as above where μ_k and σ_k (plurality of alternate normalized outputs) can be calculated by following Eqn. (2).” (Luo, page 4, left column, paragraph 1)

“Furthermore, w_k and w_k′ (plurality of weights) in Eqn. (3) are importance ratios used to weighted average the means and variances respectively … There are 3 × 2 = 6 importance weights in SN” (Luo, page 4, left column, paragraph 2). Weights, importance weights, and importance ratios are used interchangeably in Luo.

The apparatus operations of claims 24-25 mirror the apparatus operations of claims 2-3, respectively. Thus, claims 24-25 are rejected under the same rationales used for claims 2-3, respectively.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 4-5 & 22 are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Switchable Normalization for Learning-to-Normalize Deep Representation, published July 22nd, 2019, arXiv:1907.10473v1), hereafter referred to as Luo, in view of Wang et al.
(Semi-Supervised Domain Adaptation for Weakly Labeled Semantic Video Object Segmentation, published 6/7/2016, arXiv:1606.02280v1), hereafter referred to as Wang.

Regarding claim 4, the rejection of claim 1 in view of Luo is incorporated. Luo further discloses an apparatus, wherein the soft weighting engine includes: … a mapping analyzer to map the first vector to a second vector, a number of elements in the second vector being the same as a number of the different normalization techniques, the plurality of weights based on values in the second vector:

“Let h_ncij (first vector) … be a pixel before … normalization” (Luo, page 3, right column, paragraph 1). h_ncij across all channels, samples, and pixels is a vector of all the input data.

“In general, we have [Luo, Eqn. (2)] where k ∈ {in, ln, bn} is used to distinguish different methods. I_k is a set pixels [sic] and |I_k| denotes the number of pixels. Specifically, I_in, I_ln, and I_bn are the sets of pixels used to compute statistics in different approaches” (Luo, page 3, right column, paragraph 3). These functions together comprise a mapping between the first vector (input pixels) and the second vector (k pairs of μ_k and σ_k across all k normalization techniques).

While Luo fails to disclose the further limitations of the claim, Wang discloses an aggregation analyzer to aggregate the input data into a first vector:

“Spatial Average Pooling After the initial discovery, a large number of region proposals are positively detected with regard to a class label, which include overlapping regions on the same objects and spurious detections. We adopt a simple weighted spatial average pooling strategy to aggregate the region-wise score, confidence as well as their spatial extent. For each proposal r_i, we rescore it by multiplying its score and classification confidence, which is denoted by s̃_ri = s_ri ∙ c_ri.
We then generate score map S_ri of the size of image frame, which is composited as the binary map of current region proposal multiplied by its score s̃_ri. We perform an average pooling over the score maps of all the proposals to compute a confidence map” (Wang, page 6, paragraph 1). As noted by paragraph [0030] of the instant specification, spatial average pooling is a viable implementation of the aggregation operation.

[Wang, Fig. 2] ”An illustration of the weighted spatial average pooling strategy” (Wang, page 6, Fig. 2)

Wang relates to adaptive convolutional neural networks and is analogous to the claimed invention. Luo teaches an apparatus that has unique composite normalization for data at each layer. Wang teaches an apparatus that performs spatial average pooling on CNN data. It would have been obvious to one of ordinary skill in the art to combine Luo and Wang by pooling Luo’s input data before further processing. This would achieve the predictable result of emphasizing the most important aspects of the images while ensuring their compatibility with further processing in the CNN, with Luo’s normalization and Wang’s pooling performing the same together as they did separately. (MPEP 2143 I. (A) Combining prior art elements according to known methods to yield predictable results).

Regarding claim 5, the rejection of claim 4 in view of Luo and Wang is incorporated. Luo further discloses an apparatus, wherein the soft weighting engine includes a scaling analyzer to scale the values in the second vector:

“γ and β are a scale and a shift parameter respectively“ (Luo, page 3, right column, paragraph 1); “SN has an intuitive expression [Luo, Eqn. (3)] where Ω is a set of statistics estimated in different ways. In this work, we define Ω = {in, ln, bn} the same as above where μ_k and σ_k (second vector) can be calculated by following Eqn.
(2).” (Luo, page 4, left column, paragraph 1)

The method of claim 22 mirrors the apparatus operations of claim 4. Thus, claim 22 is rejected under the same rationales used for claim 4.

Claims 11-13 & 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Switchable Normalization for Learning-to-Normalize Deep Representation, published July 22nd, 2019, arXiv:1907.10473v1), hereafter referred to as Luo, in view of Nemlekar et al. (FUSED CONVOLUTION AND BATCH NORMALIZATION FOR NEURAL NETWORKS, published 6/18/2020, US 20200192631 A1), hereafter referred to as Nemlekar.

Regarding claim 11, Luo discloses instructions to: generate a plurality of alternate normalized outputs associated with input data for the machine learning model, different ones of the alternate normalized outputs based on different normalization techniques: “We take CNN (machine learning model) as an illustrative example. Let h be the input data of an arbitrary normalization layer represented by a 4D tensor (N, C, H, W), indicating number of samples, number of channels, height and width of a channel respectively.” (Luo, page 3, left column, paragraph 6) “Let h_ncij (input data) … be a pixel before … normalization” (Luo, page 3, right column, paragraph 1) “we compare SN with five popular normalization methods, i.e. BN, IN, LN, GN and WN” (Luo, page 2, right column, paragraph 4) “In general, we have

μ_k = (1/|I_k|) Σ_{(n,i,j)∈I_k} h_ncij,  σ_k² = (1/|I_k|) Σ_{(n,i,j)∈I_k} (h_ncij − μ_k)²  (Eqn. 2)

where k ∈ {in, ln, bn} is used to distinguish different methods. I_k is a set pixels [sic] and |I_k| denotes the number of pixels. Specifically, I_in, I_ln, and I_bn are the sets of pixels used to compute statistics in different approaches” (Luo, page 3, right column, paragraph 3). μ_k and σ_k are a plurality of alternate normalized outputs.

generate a plurality of weights based on the input data: “SN adapts to various scenarios by changing its importance weights.
For example, SN prefers BN when the minibatch is sufficiently large (size of the input data), while it selects LN instead when small minibatch is presented” (Luo, page 6, left column, paragraph 4) “The importance weights in each SN layer are visualized in Fig. 5. We have several observations to answer what factors that impact the choices of normalizers. First, for the same batch size, the importance weights of μ and σ could have notable differences … this is because the variance (statistical properties of the input data) estimated in a minibatch produces larger noise than the mean, making training instable … Second, the SN layers in different places of a network (source of the input data) may select distinct operations … Third, deeper layers (depth of the input data) prefer LN and IN more than BN” (Luo, page 6, right column, paragraph 2)

generate a final normalized output based on the plurality of alternate normalized outputs and the plurality of weights: “Let … ĥ_ncij (final normalized output) be a pixel … after normalization” (Luo, page 3, right column, paragraph 1) “SN has an intuitive expression

ĥ_ncij = γ · (h_ncij − Σ_{k∈Ω} w_k μ_k) / √(Σ_{k∈Ω} w′_k σ_k² + ε) + β  (Eqn. 3)

where Ω is a set of statistics estimated in different ways. In this work, we define Ω = {in, ln, bn} the same as above where μ_k and σ_k (plurality of alternate normalized outputs) can be calculated by following Eqn. (2).” (Luo, page 4, left column, paragraph 1) “Furthermore, w_k and w′_k (plurality of weights) in Eqn. (3) are importance ratios used to weighted average the means and variances respectively … There are 3 × 2 = 6 importance weights in SN” (Luo, page 4, left column, paragraph 2). Weights, importance weights, and importance ratios are used interchangeably in Luo.
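To make the mapped-to citations concrete, the switchable-normalization computation described in the quoted passages (per-method statistics μ_k, σ_k over pixel sets I_in, I_ln, I_bn, then a weighted combination using importance weights w_k and w′_k) can be sketched as below. This is a minimal NumPy illustration based only on the passages quoted from Luo, not Luo's implementation; the `softmax` helper and the ε value are assumptions added for a runnable example.

```python
import numpy as np

def softmax(x):
    """Turn importance logits into weights that sum to 1 (an assumed parameterization)."""
    e = np.exp(x - x.max())
    return e / e.sum()

def switchable_norm(h, mean_logits, var_logits, gamma=1.0, beta=0.0, eps=1e-5):
    """Sketch of SN over an (N, C, H, W) tensor h.

    Per the quoted Eqn. (2), each method k computes mean/variance over its own
    pixel set I_k: IN over (H, W); LN over (C, H, W); BN over (N, H, W).
    """
    mu = {
        "in": h.mean(axis=(2, 3), keepdims=True),     # I_in: per sample & channel
        "ln": h.mean(axis=(1, 2, 3), keepdims=True),  # I_ln: per sample
        "bn": h.mean(axis=(0, 2, 3), keepdims=True),  # I_bn: per channel
    }
    var = {
        "in": h.var(axis=(2, 3), keepdims=True),
        "ln": h.var(axis=(1, 2, 3), keepdims=True),
        "bn": h.var(axis=(0, 2, 3), keepdims=True),
    }
    keys = ["in", "ln", "bn"]
    w = softmax(mean_logits)   # w_k: importance weights for the means
    wp = softmax(var_logits)   # w'_k: importance weights for the variances
    mu_mix = sum(w[i] * mu[k] for i, k in enumerate(keys))
    var_mix = sum(wp[i] * var[k] for i, k in enumerate(keys))
    # Quoted Eqn. (3): scale by gamma, shift by beta after normalizing.
    return gamma * (h - mu_mix) / np.sqrt(var_mix + eps) + beta

# 3 means x 2 statistics = 6 importance weights, matching Luo's count.
h = np.random.randn(4, 3, 8, 8)
out = switchable_norm(h, mean_logits=np.zeros(3), var_logits=np.zeros(3))
```

With zero logits the three normalizers are weighted equally; training would instead learn the logits so that, as Luo observes, different layers can prefer different normalizers.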
While Luo fails to disclose the further limitations of the claim, Nemlekar discloses [a]t least one non-transitory computer readable medium comprising instructions that, when executed, cause at least one processor to at least: “In some embodiments, certain aspects of the techniques described above may implemented by one or more processors of a processing system executing software. The software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium. The software can include the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer readable storage medium can include, for example, a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM) or other non-volatile memory device or devices, and the like. The executable instructions stored on the non-transitory computer readable storage medium may be in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executable by one or more processors.” (Nemlekar, [0029])

Nemlekar relates to normalization in convolutional neural networks and is analogous to the claimed invention. Luo teaches an apparatus for performing normalization in convolutional neural networks. The claimed invention improves upon this method by storing it in the form of instructions on computer hardware. Nemlekar teaches computer hardware for normalization in CNNs, applicable to Luo.
A person of ordinary skill in the art would have recognized that storing Luo’s method as computer instructions on Nemlekar’s hardware would lead to the predictable result of the method being executable by a computing system, and would improve the known device by allowing it to be performed with real data (MPEP 2143 I.(D): applying a known technique to a known device (method, or product) ready for improvement to yield predictable results).

The analysis of claims 12-13 & 16-18 mirrors that of claims 2-3 & 6-8, with the exception that claims 12-13 & 16-18 are directed to generic computer hardware which executes the methods of claims 2-3 & 6-8. This generic hardware is taught by Nemlekar, as discussed regarding claim 11. Thus, claims 12-13 & 16-18 are rejected under the same rationales used for claims 2-3 & 6-8, respectively.

Claims 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. (Switchable Normalization for Learning-to-Normalize Deep Representation, published July 22nd, 2019, arXiv:1907.10473v1), hereafter referred to as Luo, in view of Nemlekar et al. (FUSED CONVOLUTION AND BATCH NORMALIZATION FOR NEURAL NETWORKS, published 6/18/2020, US 20200192631 A1), hereafter referred to as Nemlekar, and further in view of Wang et al. (Semi-Supervised Domain Adaptation for Weakly Labeled Semantic Video Object Segmentation, published 6/7/2016, arXiv:1606.02280v1), hereafter referred to as Wang.

The analysis of claims 14-15 mirrors that of claims 4-5, with the exception that claims 14-15 are directed to generic computer hardware which executes the methods of claims 4-5. This generic hardware is taught by Nemlekar, as discussed regarding claim 11. Thus, claims 14-15 are rejected under the same rationales used for claims 4-5, respectively.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Luo et al.
(NORMALIZATION METHOD AND APPARATUS FOR DEEP NEURAL NETWORK, AND STORAGE MEDIA, filed 4/29/2020, US 2020/0257979 A1) describes a system highly similar to Luo’s “Switchable Normalization” method.

Jia et al. (Instance-Level Meta Normalization, published 4/6/2019, arXiv:1904.03516v1) discloses a method of normalizing data in a CNN with dynamic, separately calculated normalization parameters that can be combined with existing normalization methods.

Li et al. (Attentive Normalization, published 11/23/2019, arXiv:1908.01259v2) discloses normalization through a weighted sum of different normalization parameters based on attention.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Aaron P Gormley whose telephone number is (571) 272-1372. The examiner can normally be reached Monday - Friday 12:00 PM - 8:00 PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle T Bechtold, can be reached at (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AG/
Examiner, Art Unit 2148

/MICHELLE T BECHTOLD/
Supervisory Patent Examiner, Art Unit 2148
Prosecution Timeline

Jan 17, 2023
Application Filed
Feb 04, 2026
Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585955
Minimal Trust Data Sharing
2y 5m to grant Granted Mar 24, 2026
Patent 12579440
Training Artificial Neural Networks Using Context-Dependent Gating with Weight Stabilization
2y 5m to grant Granted Mar 17, 2026
Study what changed to get past this examiner. Based on 2 most recent grants.

Prosecution Projections

1-2
Expected OA Rounds
60%
Grant Probability
0%
With Interview (-60.0%)
4y 4m
Median Time to Grant
Low
PTA Risk
Based on 5 resolved cases by this examiner. Grant probability derived from career allow rate.
