Prosecution Insights
Last updated: April 19, 2026
Application No. 17/902,076

INFORMATION PROCESSING APPARATUS, MACHINE LEARNING MODEL, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM

Final Rejection §103
Filed: Sep 02, 2022
Examiner: BUDISALICH, ANDREW STEVEN
Art Unit: 2662
Tech Center: 2600 — Communications
Assignee: Canon Kabushiki Kaisha
OA Round: 4 (Final)
Grant Probability: 78% (Favorable)
Expected OA Rounds: 5-6
Time to Grant: 2y 9m
Grant Probability with Interview: 87%

Examiner Intelligence

Career Allow Rate: 78% (above average; 36 granted / 46 resolved; +16.3% vs Tech Center average)
Interview Lift: +8.9% (moderate; resolved cases with vs. without interview)
Avg Prosecution: 2y 9m (typical timeline); 35 applications currently pending
Total Applications: 81 (career history, across all art units)

Statute-Specific Performance

§101: 14.5% (-25.5% vs TC avg)
§103: 65.6% (+25.6% vs TC avg)
§102: 5.2% (-34.8% vs TC avg)
§112: 13.0% (-27.0% vs TC avg)
Deltas are measured against an estimated Tech Center average. Based on career data from 46 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 0 has been entered.

Status of Claims

Claims 1, 3, and 6-20 are pending. Claims 2, 4, and 5 are canceled.

Response to Arguments

Applicant's arguments (see pp. 11-14, filed 02/02/2026) with respect to the rejections of Claims 1, 3, and 6-20 under 35 U.S.C. 103 have been fully considered but are moot because Applicant's amendments to the independent claims have altered the scope of the claims and therefore necessitated the new grounds of rejection presented below. Accordingly, THIS ACTION IS MADE FINAL.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3, 10, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Taheri et al. (US 20210117724 A1) in view of Hiasa (US 20200111198 A1), Kim et al. (US 12249154 B2), Koivisto et al. (US 20190251442 A1), and Ravuna et al. (US 20210369174 A1).

Regarding Claim 1, Taheri teaches "An information processing apparatus including a machine learning model configured to perform a recognition process on a recognition target in a captured image, based on pixel information of the captured image, and information about the captured image other than the pixel information comprising: a processor; a memory, including instructions, which when executed by the processor, cause the information processing apparatus to:" (Taheri, Paras. 58, 70, 82, and 137, teach a data processor that receives instructions from memory that comprises a machine learning model configured to perform object detection, i.e., a recognition process on a target, within an image based on pixel data and other appropriate data of the image, i.e., target recognition using pixel information and information other than pixel information to determine depicted objects).
However, Taheri does not explicitly teach "input the pixel information to a first portion of the machine learning model; input the information about the captured image other than the pixel information to a second portion of the machine learning model following the first portion; and perform the recognition process by inputting correction information obtained by correcting an output of the first portion of the machine learning model by using the information about the captured image other than the pixel information, to a second portion of the machine learning model, which follows the first portion, wherein the machine learning model is a convolutional neural network including an intermediate layer between the first portion and the second portion, and wherein the information about the captured image other than the pixel information is used as a bias in a convolutional calculation in the intermediate layer, multiplied by the output of the first portion for each element, or connected to the output of the first portion in a channel direction, and wherein before being used in the convolutional calculation in the intermediate layer, the information about the captured image other than the pixel information undergoes a process of multiplying the information by a previously learned weight, or a process of adding a previously learned bias to the information".

In an analogous field of endeavor, Hiasa teaches "input the pixel information to a first portion of the machine learning model" (Hiasa, Fig. 1 [201-202] and Para. 53, teaches inputting an image to a first convolution layer of the machine learning model, i.e., a first portion, wherein the input image includes pixel information); "input the information about the captured image other than the pixel information to a second portion of the machine learning model following the first portion" (Hiasa, Paras. 34, 43, 57, and 78, teaches a multilayer convolutional neural network which includes a processing step for learning a filter of a network and a processing step for correcting a blur using the learned filter, wherein the image processor specifies a filter to be read based on information on a lens state when a captured image is obtained, in which a lens state refers to the zoom, the F-number, and the in-focus distance, and wherein a first feature map is a summary of the results calculated for each filter and is input to the second convolution layer, i.e., inputting information about the captured image other than the pixel information, being the lens-state filter summary as the first feature map, into a second portion of the machine learning model, being the second convolution layer, which follows a first convolution layer); "and perform the recognition process by inputting correction information obtained by correcting an output of the first portion of the machine learning model by using the information about the captured image other than the pixel information, to a second portion of the machine learning model, which follows the first portion" (Hiasa, FIG. 13 and Paras. 34, 43, 57, and 78, teaches the filters to be read based on information on a lens state when a captured image is obtained, in which a lens state refers to the zoom, the F-number, and the in-focus distance, wherein a blurred image and an intermediate corrected image are input to a first convolution layer that outputs a first feature map which is a summary of the results calculated for each filter and inputs the first feature map into the second convolution layer, wherein the result obtained by repeating this operation by inputting an (N-1)th feature map to an N-th convolutional layer is a corrected component which is added to the blurred image to obtain a corrected image, i.e., inputting correction information, obtained by correcting the output of the first portion by using non-pixel image information, into a second portion of the model that follows the first portion); "wherein the machine learning model is a convolutional neural network including an intermediate layer between the first portion and the second portion" (Hiasa, Fig. 1 and Para. 34, teaches a multilayer convolutional neural network comprising a learning step and a correcting step, wherein the CNN comprises N convolution layers between the first layer and the output image, i.e., the machine learning model is a convolutional neural network including at least one intermediate layer between a first and a second portion); "and wherein the information about the captured image other than the pixel information is used as a bias in a convolutional calculation in the intermediate layer, multiplied by the output of the first portion for each element, or connected to the output of the first portion in a channel direction" (Hiasa, FIG. 8 and Paras. 5, 57-58, and 78, teaches the image processor reading out a filter corresponding to the acquired captured image, wherein the filter to be read out is based on information on a lens state when a captured image is obtained from header information of the image, in which lens state refers to zoom, F-number, and the in-focus distance, and wherein a first feature map, being the summary of the results calculated for each filter, is output from the first convolutional layer having the blurred image and the intermediate corrected image stacked in the channel direction, i.e., information about the image other than the pixel information is connected to the output of the first portion in a channel direction).

It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri by including the inputting of pixel information, the inputting of non-pixel image information into a second portion of a model, the correcting of the output of the first portion using the non-pixel image information, and the non-pixel information being connected to the output in a channel direction, as taught by Hiasa. One of ordinary skill in the art would be motivated to combine the references since it improves the model (Hiasa, Para. 78, teaches the motivation of combination to be to update the CNN filter and bias, which would improve the model).

However, the combination of Taheri in view of Hiasa does not explicitly teach "and perform the recognition process by inputting correction information obtained by correcting an output of the first portion of the machine learning model ... to a second portion of the machine learning model, which follows the first portion", "wherein the information about the captured image other than the pixel information is used as a bias in a convolutional calculation in the intermediate layer", or "wherein before being used in the convolutional calculation in the intermediate layer, the information about the captured image other than the pixel information undergoes a process of multiplying the information by a previously learned weight, or a process of adding a previously learned bias to the information".
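To make the channel-direction connection in the Hiasa mapping above concrete, here is a minimal PyTorch sketch; it is our illustration only, and the tensor shapes and metadata values are hypothetical, not taken from any cited reference.

    import torch

    # Output of the "first portion" (hypothetical shape) and a per-image
    # lens-state vector (zoom, F-number, in-focus distance; made-up values).
    feat = torch.randn(1, 8, 64, 64)
    lens_state = torch.tensor([[2.0, 2.8, 1.5]])

    # Broadcast each non-pixel element to a constant plane and connect it
    # to the first portion's output in the channel direction.
    planes = lens_state.view(1, -1, 1, 1).expand(-1, -1, 64, 64)
    stacked = torch.cat([feat, planes], dim=1)  # shape (1, 8 + 3, 64, 64)

The stacked tensor would then feed the next ("intermediate") convolution layer, which is the sense in which the non-pixel information reaches the second portion.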
In an analogous field of endeavor, Kim teaches "and perform the recognition process by inputting correction information obtained by correcting an output of the first portion of the machine learning model ... to a second portion of the machine learning model, which follows the first portion" (Kim, Claim 13, teaches classifying the at least one object into the class based on the extracted feature point, wherein a first layer corrects the input image, a third layer extracts the feature point from the corrected image, and the second layer relates to the classifying of the object, i.e., performing the recognition process, being the classification of the objects, by inputting correction information of the input image of the first portion of the model to a second portion of the model that follows the first portion). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri and Hiasa, wherein the correction information obtained by correcting an output of the first portion of the model uses the information about the captured image other than the pixel information, by including the performance of a recognition process by inputting the correction information to a second portion, as taught by Kim. One of ordinary skill in the art would be motivated to combine the references since it improves the recognition rate (Kim, Col. 1, lines 18-25, teaches the motivation of combination to be to improve recognition rate).

However, the combination of Taheri in view of Hiasa and Kim does not explicitly teach "and wherein the information about the captured image other than the pixel information is used as a bias in a convolutional calculation in the intermediate layer, and wherein before being used in the convolutional calculation in the intermediate layer, the information about the captured image other than the pixel information undergoes a process of multiplying the information by a previously learned weight, or a process of adding a previously learned bias to the information".

In an analogous field of endeavor, Koivisto teaches "and wherein the information about the captured image other than the pixel information is used as a bias in a convolutional calculation in the intermediate layer" (Koivisto, Claim 10, teaches adjusting a bias associated with the second layer of the CNN based on a first bias associated with the filter, i.e., non-pixel image information, being the filter, is used as a bias in a convolutional calculation in the intermediate second layer of the CNN). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri, Hiasa, and Kim, wherein filters are read based on information on a lens state when a captured image is obtained, in which a lens state refers to the zoom, the F-number, and the in-focus distance, by including the filters being used as a bias in the convolutional calculation of intermediate layers, as taught by Koivisto. One of ordinary skill in the art would be motivated to combine the references since it increases efficiency (Koivisto, Para. 93, teaches the motivation of combination to be to increase the efficiency of training).
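The Koivisto mapping reads the non-pixel information as supplying the bias of a convolutional calculation. A hedged sketch of that reading (the Linear mapping from metadata to per-channel biases is our assumption for illustration, not the reference's code):

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=False)  # intermediate layer
    to_bias = nn.Linear(3, 16)   # hypothetical map from metadata to a per-channel bias

    feat = torch.randn(1, 8, 64, 64)        # output of the first portion
    meta = torch.tensor([[2.0, 2.8, 1.5]])  # non-pixel information

    # The metadata-derived values stand in for the convolution's bias term.
    out = conv(feat) + to_bias(meta).view(1, 16, 1, 1)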
However, the combination of references of Taheri in view of Hiasa, Kim, and Koivisto does not explicitly teach "and wherein before being used in the convolutional calculation in the intermediate layer, the information about the captured image other than the pixel information undergoes a process of multiplying the information by a previously learned weight, or a process of adding a previously learned bias to the information". In an analogous field of endeavor, Ravuna teaches "and wherein before being used in the convolutional calculation in the intermediate layer, the information about the captured image other than the pixel information undergoes a process of multiplying the information by a previously learned weight, or a process of adding a previously learned bias to the information" (Ravuna, FIGs. 24 and 25 and Paras. 165 and 195-207, teaches a scalar distance input being provided to an activation function after multiplying by the weight and adding the bias, wherein the input of the softmax layer may be multiplied by the output of the activation function and the distance is provided as an input to the second network prior to use in the convolutional calculations of the intermediate layers, i.e., the scalar distance input, being the information about the captured image other than the pixel information, undergoes a process of multiplying the information by a previously learned weight and of adding a previously learned bias to the information before being used in the convolutional calculation of the softmax layer, being the intermediate layer, or before being input to a second network for convolutional calculations within its intermediate layers). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri, Hiasa, Kim, and Koivisto by including the non-pixel image information undergoing a process of multiplying the information by a weight or adding a bias to the information before use in the convolutional calculation of an intermediate layer, as taught by Ravuna. One of ordinary skill in the art would be motivated to combine the references since it improves outputs (Ravuna, Para. 190, teaches the motivation of combination to be to train the network to achieve improved outputs). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.

Regarding Claim 3, the combination of references of Taheri in view of Hiasa, Kim, Koivisto, and Ravuna teaches "The apparatus according to claim 1, wherein the information about the captured image other than the pixel information is used in a convolutional calculation in some channels of the intermediate layer" (Hiasa, Para. 78, teaches the first convolution layer outputting a first feature map that is a summary of the results calculated for each filter based on the lens state, wherein the blurred image and the intermediate corrected image are stacked in the channel direction with a total of eight channels, i.e., the filter based on the lens state, containing non-pixel image information, is used in a convolutional calculation in channels of intermediate layers). The proposed combination, as well as the motivation for combining the Taheri, Hiasa, Kim, Koivisto, and Ravuna references, presented in the rejection of Claim 1 applies to claim 3. Thus, the apparatus recited in claim 3 is met by Taheri in view of Hiasa, Kim, Koivisto, and Ravuna.
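The Ravuna mapping, as summarized above, amounts to an affine preprocessing of the non-pixel input: multiply by a learned weight, add a learned bias, and only then let the value enter the convolutional calculation. A sketch under those assumptions (names and dimensions are illustrative, not from the reference):

    import torch
    import torch.nn as nn

    class MetaAffine(nn.Module):
        # Multiplies the non-pixel information by a previously learned
        # weight and adds a previously learned bias before any
        # convolutional use.
        def __init__(self, dim):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(dim))
            self.bias = nn.Parameter(torch.zeros(dim))

        def forward(self, meta):
            return meta * self.weight + self.bias

    affine = MetaAffine(3)
    meta = torch.tensor([[2.0, 2.8, 1.5]])
    conditioned = affine(meta)  # this, not raw metadata, reaches the intermediate layer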
Regarding Claim 10, the combination of references of Taheri in view of Hiasa, Kim, Koivisto, and Ravuna teaches "The apparatus according to claim 1, wherein the captured image is one of a plurality of temporally continuous images, and wherein the instructions, when executed by the processor, further cause the information processing apparatus to: track a recognition target in the plurality of images, as the recognition process" (Taheri, Paras. 23 and 26, teach capturing video, wherein the video includes multiple sequential images or frames, i.e., a plurality of temporally continuous images, and wherein objects are tracked in the sequence of images, i.e., tracking a recognition target in the plurality of images).

Claim 18 recites a method with steps corresponding to the elements of the system recited in Claim 1. Therefore, the recited steps of this claim are mapped to the proposed combination in the same manner as the corresponding elements in its corresponding system claim. Additionally, the rationale and motivation to combine the Taheri, Hiasa, Kim, Koivisto, and Ravuna references, presented in the rejection of Claim 1, apply to this claim.

Claim 20 recites a computer-readable storage medium storing a program with instructions corresponding to the steps recited in Claim 1. Therefore, the recited programming instructions of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Taheri, Hiasa, Kim, Koivisto, and Ravuna references, presented in the rejection of Claim 1, apply to this claim. Finally, the combination of the Taheri, Hiasa, Kim, Koivisto, and Ravuna references discloses a computer-readable storage medium (for example, see Taheri, Claim 18).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Taheri in view of Hiasa, Kim, Koivisto, Ravuna, and Chen et al. (US 20210182077 A1).

Regarding Claim 6, the combination of references of Taheri in view of Hiasa, Kim, Koivisto, and Ravuna does not explicitly teach "The apparatus according to claim 1, wherein the information about the captured image other than the pixel information is a scalar value, a one-dimensional vector, or a two-dimensional vector". In an analogous field of endeavor, Chen teaches "The apparatus according to claim 1, wherein the information about the captured image other than the pixel information is a scalar value, a one-dimensional vector, or a two-dimensional vector" (Chen, Para. 2782, teaches that data from the images processed by the neural network includes scalars, one-dimensional vectors, or two-dimensional vectors, i.e., non-pixel information). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri in view of Hiasa, Kim, Koivisto, and Ravuna, wherein image data includes a lens state when a captured image is obtained, in which a lens state refers to the zoom, the F-number, and the in-focus distance, by including the information being a scalar, 1-D vector, or 2-D vector, as taught by Chen. One of ordinary skill in the art would be motivated to combine the references since it improves efficiency (Chen, Abstract, teaches the motivation of combination to be to improve information processing efficiency). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Taheri in view of Hiasa, Kim, Koivisto, Ravuna, and Nobori et al. (US 20180316906 A1).

Regarding Claim 7, the combination of references of Taheri in view of Hiasa, Kim, Koivisto, and Ravuna does not explicitly teach "The apparatus according to claim 1, wherein the information about the captured image other than the pixel information is calculated from an image capturing parameter of an image capturing device that captures the captured image or from the pixel information". In an analogous field of endeavor, Nobori teaches "The apparatus according to claim 1, wherein the information about the captured image other than the pixel information is calculated from an image capturing parameter of an image capturing device that captures the captured image" (Nobori, Fig. 10 [S1001-S1004], teaches calculating image information from camera parameter sets, i.e., calculating non-pixel information from an image capturing parameter of an image capturing device that captures the captured image); "or from the pixel information" (Nobori, Para. 123, teaches calculating further evaluation image information from calculated pixel information). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri in view of Hiasa, Kim, Koivisto, and Ravuna by including the calculation of non-pixel image information from image capturing parameters or from the pixel information, as taught by Nobori. One of ordinary skill in the art would be motivated to combine the references since it enables accurate calibration (Nobori, Paras. 4-5, teach the motivation of combination to be to enable accurate self-calibration of the cameras using the parameter and image information). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Taheri in view of Hiasa, Kim, Koivisto, Ravuna, Nobori, Nishide et al. (US 20210398274 A1), and Sheikh et al. (US 20200186710 A1).

Regarding Claim 8, the combination of references of Taheri in view of Hiasa, Kim, Koivisto, Ravuna, and Nobori does not explicitly teach "The apparatus according to claim 7, wherein the information about the captured image other than the pixel information is a coefficient of white balance processing, an aperture value, a focal length, an evaluation value of automatic exposure, an evaluation value of a subject distance, or a motion vector". In an analogous field of endeavor, Nishide teaches "The apparatus according to claim 7, wherein the information about the captured image other than the pixel information is a coefficient of white balance processing" (Nishide, Para. 72, teaches that captured image information includes a coefficient of white balance for processing, i.e., non-pixel information includes a coefficient of white balance processing). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri in view of Hiasa, Kim, Koivisto, Ravuna, and Nobori by including the non-pixel image information being a coefficient of white balance processing, as taught by Nishide. One of ordinary skill in the art would be motivated to combine the references since it outputs more accurate recognition (Nishide, Para. 7, teaches the motivation of combination to be to output a more accurate recognition result using the trained model).
However, the combination of references of Taheri in view of Hiasa, Kim, Koivisto, Ravuna, Nobori, and Nishide does not explicitly teach "an aperture value, a focal length, an evaluation value of automatic exposure, an evaluation value of a subject distance, or a motion vector". In an analogous field of endeavor, Sheikh teaches "an aperture value, a focal length, an evaluation value of automatic exposure, an evaluation value of a subject distance, or a motion vector" (Sheikh, Para. 64, teaches information about the image including aperture, focal distance, and exposure). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri in view of Hiasa, Kim, Koivisto, Ravuna, Nobori, and Nishide by including the non-pixel image information including aperture, focal distance, and exposure, as taught by Sheikh. One of ordinary skill in the art would be motivated to combine the references since it allows multiple configurations (Sheikh, Para. 64, teaches the motivation of combination to be to allow for multiple options of camera configuration). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Taheri in view of Hiasa, Kim, Koivisto, Ravuna, and Korobov et al. (US 20200210511 A1).

Regarding Claim 9, the combination of references of Taheri in view of Hiasa, Kim, Koivisto, and Ravuna does not explicitly teach "The apparatus according to claim 1, wherein the instructions, when executed by the processor, further cause the information processing apparatus to: perform a process of classifying a partial region in the captured image, or a process of detecting a recognition target in the captured image, as the recognition process". In an analogous field of endeavor, Korobov teaches "The apparatus according to claim 1, wherein the instructions, when executed by the processor, further cause the information processing apparatus to: perform a process of classifying a partial region in the captured image, or a process of detecting a recognition target in the captured image, as the recognition process" (Korobov, Paras. 18 and 20, teach classifying parts of a region of interest of an image, i.e., classifying a partial region in a captured image, and performing object detection on a captured image with a neural net, i.e., detecting a recognition target in a captured image). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri in view of Hiasa, Kim, Koivisto, and Ravuna by including the classifying of a partial region of the image and the detecting of a target in the image, as taught by Korobov. One of ordinary skill in the art would be motivated to combine the references since it increases the quality of processing (Korobov, Para. 3, teaches the motivation of combination to be to increase the quality of information processing and decrease extraction time). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Taheri in view of Hiasa, Kim, Koivisto, Ravuna, Kim et al. (US 20230143687 A1, hereinafter Kim’687), and Arberet et al. (US 20230298162 A1).
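Before turning to claim 11, a sketch of the kind of non-pixel information recited in claims 6-8: a one-dimensional vector assembled from image-capturing parameters. The field names and values below are hypothetical, not drawn from the cited references.

    import torch

    # Hypothetical image-capturing parameters of the kind listed in claim 8.
    params = {
        "white_balance_coeff": 1.02,
        "aperture_value": 2.8,       # F-number
        "focal_length_mm": 50.0,
        "ae_evaluation": 0.65,       # automatic-exposure evaluation value
        "subject_distance_m": 1.5,
    }

    # Claim 6's "scalar value, a one-dimensional vector, or a
    # two-dimensional vector": here, a 1-D vector computed from the
    # capture parameters (claim 7).
    meta = torch.tensor([list(params.values())], dtype=torch.float32)  # shape (1, 5)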
Regarding Claim 11, the combination of references of Taheri in view of Hiasa, Kim, Koivisto, and Ravuna does not explicitly teach "The apparatus according to claim 1, wherein the number of dimensions of the information about the captured image other than the pixel information is smaller than that of the pixel information; and the number of dimensions of the correction information is larger than that of the information about the captured image other than the pixel information". In an analogous field of endeavor, Kim’687 teaches "The apparatus according to claim 1, wherein the number of dimensions of the information about the captured image other than the pixel information is smaller than that of the pixel information" (Kim'687, Para. 21, teaches calculating three-dimensional pixel information from two-dimensional image information, i.e., the number of dimensions of the image information is smaller than that of the pixel information. For further clarification, Kim'687, Paras. 18-21, teaches using scalars such as camera height, vertical viewing angle, and an azimuth angle, i.e., lower-dimensional non-pixel information, to estimate three-dimensional pixel information of an image using linear interpolation, i.e., pixel information with more dimensions). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri in view of Hiasa, Kim, Koivisto, and Ravuna by including the number of dimensions of the non-pixel image information being smaller than that of the pixel information, as taught by Kim’687. One of ordinary skill in the art would be motivated to combine the references since it helps estimate object distance (Kim'687, Para. 22, teaches the motivation of combination to be to estimate object distance).

However, the combination of references of Taheri in view of Hiasa, Kim, Koivisto, Ravuna, and Kim’687 does not explicitly teach "and the number of dimensions of the correction information is larger than that of the information about the captured image other than the pixel information". In an analogous field of endeavor, Arberet teaches "and the number of dimensions of the correction information is larger than that of the information about the captured image other than the pixel information" (Arberet, Paras. 47, 94, and 96, teach correction information being applied to an output of reconstruction, wherein the output includes a two-dimensional pixel distribution and a three-dimensional voxel distribution, compared to just the two-dimensional input image information and scalar values used to display color values, i.e., the number of dimensions of the correction information is larger than that of the non-pixel scalar image information). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri in view of Hiasa, Kim, Koivisto, Ravuna, and Kim’687 by including the number of dimensions of the correction information being larger than that of the non-pixel image information, as taught by Arberet. One of ordinary skill in the art would be motivated to combine the references since it improves the reconstruction of images by using the correction information (Arberet, Para. 5, teaches the motivation of combination to be to better reconstruct images by using the correction information). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Taheri in view of Hiasa, Kim, Koivisto, Ravuna, and Arberet.

Regarding Claim 12, the combination of references of Taheri in view of Hiasa, Kim, Koivisto, Ravuna, and Arberet teaches "The apparatus according to claim 1, wherein learning is performed on the machine learning model by using first ground truth data representing ground truth of the correction information, with respect to a parameter used when the output of the first portion is corrected" (Arberet, Paras. 12 and 46, teach training a machine learning model using corrected ground truth data, i.e., first ground truth data of the correction information, wherein the learnable parameters are set in an optimized way for the correction data for reconstruction, i.e., with respect to a parameter used when correcting the output). The proposed combination, as well as the motivation for combining the Taheri, Hiasa, Kim, Koivisto, Ravuna, Kim’687, and Arberet references, presented in the rejection of Claim 11 applies to claim 12. Thus, the apparatus recited in claim 12 is met by Taheri in view of Hiasa, Kim, Koivisto, Ravuna, and Arberet.

Claims 13 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Taheri in view of Arberet, Hiasa, and Ravuna.

Regarding Claim 13, the combination of references of Taheri in view of Arberet, Hiasa, and Ravuna teaches "An information processing apparatus for performing learning of a machine learning model configured to perform a recognition process on a recognition target in a captured image, based on pixel information of the captured image, and information about the captured image other than the pixel information, comprising: a processor; a memory, including instructions, which when executed by the processor, cause the information processing apparatus to: acquire second ground truth data indicating ground truth of an output of the machine learning model with respect to the captured image" (Taheri, Paras. 34-35, 58, and 82, teach one or more processors capable of acquiring a second ground truth label, i.e., second ground truth data, which indicates the ground truth of the output of a machine learning object detection model with respect to captured training images, i.e., target recognition in a captured image, wherein pixel data and other appropriate data are used); "form first ground truth data indicating ground truth of correction information obtained by correcting an output of a first portion of the machine learning model that receives the pixel information by using the information about the captured image other than the pixel information" (Arberet, Paras. 5, 12-15, 29, 30, and 39, teaches generating an averaged ground truth from the phase-corrected ground truths, i.e., forming first ground truth data indicating ground truth of correction information, wherein image or object domain data is input, i.e., the model receives pixel information of the image, and wherein a phase map is extracted from the output of the model in training or extracted from the ground truth of the training data for phase correction based on the phase map, in which the phase correction is applied to the ground truth and/or the output of the model in training, i.e., correcting an output of a first portion of the model using non-pixel image information, being the phase map); "and perform learning of the machine learning model based on an error between the correction information and the first ground truth data" (Arberet, Para. 12, teaches training the machine learning model based on the loss between the correction information and the corrected ground truth data, i.e., learning of the model based on the error between correction and ground truth); "and an error between the second ground truth data and an output when the correction information is input to a second portion of the machine learning model, which follows the first portion" (Hiasa, Para. 78, teaches a second convolution layer, i.e., a second portion of the machine learning model which follows a first portion, wherein an error between corrected data and ground truth data is calculated and the correction information is input into a given layer, i.e., an error between second ground truth data and the output of a second portion which has correction data input); "wherein before being used in a convolutional calculation in an intermediate layer, the information about the captured image other than the pixel information undergoes a process of multiplying the information by a previously learned weight, or a process of adding a previously learned bias to the information" (Ravuna, FIGs. 24 and 25 and Paras. 165 and 195-207, teaches a scalar distance input being provided to an activation function after multiplying by the weight and adding the bias, wherein the input of the softmax layer may be multiplied by the output of the activation function and the distance is provided as an input to the second network prior to use in the convolutional calculations of the intermediate layers, i.e., the scalar distance input, being the information about the captured image other than the pixel information, undergoes a process of multiplying the information by a previously learned weight and of adding a previously learned bias to the information before being used in the convolutional calculation of the softmax layer, being the intermediate layer, or before being input to a second network for convolutional calculations within its intermediate layers). The proposed combination, as well as the motivation for combining the Taheri, Hiasa, Kim, Koivisto, Ravuna, Kim’687, and Arberet references, presented in the rejections of Claims 1 and 11 applies to claim 13. Thus, the apparatus recited in claim 13 is met by Taheri in view of Arberet, Hiasa, and Ravuna.

Claim 19 recites a method with steps corresponding to the elements of the system recited in Claim 13. Therefore, the recited steps of this claim are mapped to the proposed combination in the same manner as the corresponding elements in its corresponding system claim. Additionally, the rationale and motivation to combine the Taheri, Arberet, Hiasa, and Ravuna references, presented in the rejections of Claims 1 and 11, apply to this claim.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Taheri in view of Arberet, Hiasa, Ravuna, and Kao et al. (US 20200202516 A1).

Regarding Claim 14, the combination of references of Taheri in view of Arberet, Hiasa, and Ravuna teaches "The apparatus according to claim 13, wherein the instructions, when executed by the processor, further cause the information processing apparatus to: evaluate accuracy of the recognition process when using a set of the information about the captured image other than the pixel information and the first ground truth data" (Taheri, Paras. 72 and 82, teaches evaluating the accuracy of the object detection model, i.e., the recognition process, using information about the input image, such as other appropriate data other than pixel data, and ground truth data, i.e., first ground truth data and non-pixel information). However, the combination of references of Taheri in view of Arberet, Hiasa, and Ravuna does not explicitly teach "wherein the learning of the machine learning model is performed by using a set having a highest evaluation value of the accuracy, from a plurality of sets". In an analogous field of endeavor, Kao teaches "wherein the learning of the machine learning model is performed by using a set having a highest evaluation value of the accuracy, from a plurality of sets" (Kao, Para. 45, teaches training the machine learning model by using the feature path, i.e., set of information, that has the highest accuracy of the plurality of feature paths). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri, Arberet, Hiasa, and Ravuna by including the use of the set with the highest evaluated accuracy for learning, as taught by Kao. One of ordinary skill in the art would be motivated to combine the references since it reduces prediction error (Kao, Paras. 5-6, teaches the motivation of combination to be to reduce prediction error and increase accuracy). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Taheri in view of Hiasa, Kim, Koivisto, Ravuna, Arberet, and Rhodes et al. (US 20200074185 A1).

Regarding Claim 15, the combination of references of Taheri in view of Hiasa, Kim, Koivisto, Ravuna, and Arberet does not explicitly teach "The apparatus according to claim 12, wherein the first ground truth data is an RGB value of the captured image before white balance processing is applied, a defocus map based on an aperture value or a focal length, a map indicating an absolute value of light intensity obtained by automatic exposure, a depth map based on a subject distance, or an optical flow based on a motion vector". In an analogous field of endeavor, Rhodes teaches "The apparatus according to claim 12, wherein the first ground truth data is an RGB value of the captured image before white balance processing is applied, a defocus map based on an aperture value or a focal length, a map indicating an absolute value of light intensity obtained by automatic exposure, a depth map based on a subject distance, or an optical flow based on a motion vector" (Rhodes, Paras. 36 and 53, teaches ground truth segmentation data being RGB channel values of input images before calculating optical flow, which is based on pixel-wise motion vectors between the segmentation and a current frame, i.e., the first ground truth RGB data is taken before processing and used to calculate optical flow). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri in view of Hiasa, Kim, Koivisto, Ravuna, and Arberet by including the ground truth being an RGB value before processing is applied, as taught by Rhodes. One of ordinary skill in the art would be motivated to combine the references since it improves segmentation (Rhodes, Para. 2, teaches the motivation of combination to be to automate and improve dense fine-grain object segmentation).
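As an aside before the claim 15 conclusion: the two-error training mapped for claims 13-14 above can be sketched as a two-term loss. The tensors and loss functions below are placeholders of our choosing; the claims do not specify them.

    import torch
    import torch.nn.functional as F

    # Stand-ins for the model pieces named in claim 13.
    correction = torch.randn(1, 8, 64, 64, requires_grad=True)  # corrected first-portion output
    output = torch.randn(1, 10, requires_grad=True)             # second-portion recognition output
    gt_correction = torch.randn(1, 8, 64, 64)                   # "first ground truth data"
    gt_label = torch.randint(0, 10, (1,))                       # "second ground truth data"

    # Learning is driven by both errors: correction vs. first ground
    # truth, and recognition output vs. second ground truth.
    loss = F.mse_loss(correction, gt_correction) + F.cross_entropy(output, gt_label)
    loss.backward()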
Thus, the subject matter claimed in claim 15 would have been obvious to a person having ordinary skill in the art before the effective filing date.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Taheri in view of Hiasa, Kim, Koivisto, Ravuna, Arberet, Rhodes, and Li; Ruei et al. (US 20200013190 A1, hereinafter Li’190).

Regarding Claim 16, the combination of references of Taheri in view of Hiasa, Kim, Koivisto, Ravuna, Arberet, and Rhodes does not explicitly teach "The apparatus according to claim 15, wherein the motion vector is calculated from a captured image at first time and a captured image at second time following the first time; and the optical flow is calculated from the captured image at the second time and a captured image at third time following the second time". In an analogous field of endeavor, Li’190 teaches "The apparatus according to claim 15, wherein the motion vector is calculated from a captured image at first time and a captured image at second time following the first time" (Li'190, Para. 13, teaches calculating a motion vector from a first image and a second image, wherein the second image has a time that follows the first image's time); "and the optical flow is calculated from the captured image at the second time and a captured image at third time following the second time" (Li'190, Para. 63, teaches calculating optical flow between the third image and the second image, wherein the third image has a time that follows the second image's time). It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Taheri in view of Hiasa, Kim, Koivisto, Ravuna, Arberet, and Rhodes by including the motion vector being calculated from images at a first and a second time and the optical flow being calculated from images at the second and a third time, as taught by Li’190. One of ordinary skill in the art would be motivated to combine the references since it tracks the position of the object over time (Li'190, Para. 63, teaches the motivation of combination to be to track the position of the object between images or times). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Taheri in view of Hiasa, Kim, and Ravuna.

Regarding Claim 17, the combination of references of Taheri in view of Hiasa and Kim teaches "An apparatus comprising: a processor; and a memory including instructions stored thereon, which when executed cause the apparatus to: provide a machine learning model, which has been trained, configured to perform a recognition process on a recognition target in a captured image, based on pixel information of the captured image, and information about the captured image other than the pixel information, the machine learning model consisting of:" (Taheri, Paras. 58, 70, 82, and 137, teach a data processor that receives instructions from memory that comprises a machine learning model trained to perform object detection, i.e., a recognition process on a target, within an image based on pixel data and other appropriate data of the image, i.e., target recognition using pixel information and information other than pixel information to determine depicted objects); "a first portion, on which learning is performed so as to extract and output characteristic of the pixel information using the pixel information as input" (Hiasa, Fig. 1 [201-202] and Para. 53, teaches inputting an image to a first convolution layer of the machine learning model, i.e., a first portion which is trained, wherein the input image includes pixel information and a feature map is output, i.e., extracting and outputting a characteristic of the pixel information); "and a second portion which follows the first portion, on which learning is performed ..." (Hiasa, FIG. 13 and Paras. 34, 43, 57, and 78, teaches the filters to be read based on information on a lens state when a captured image is obtained, in which a lens state refers to the zoom, the F-number, and the in-focus distance, wherein a blurred image and an intermediate corrected image are input to a first convolution layer that outputs a first feature map which is a summary of the results calculated for each filter and inputs the first feature map into the second convolution layer, wherein the result obtained by repeating this operation by inputting an (N-1)th feature map to an N-th convolutional layer is a corrected component which is added to the blurred image to obtain a corrected image, i.e., inputting correction information, obtained by correcting the output of the first portion by using non-pixel image information, into a second portion of the model that follows the first portion); "and a second portion which follows the first portion, on which learning is performed so as to perform the recognition process using correction information obtained by correcting an output of the first portion ..." (Kim, Claim 13, teaches classifying the at least one object into the class based on the extracted feature point, wherein a first layer corrects the input image, a third layer extracts the feature point from the corrected image, and the second layer relates to the classifying of the object, i.e., performing the recognition process, being the classification of the objects, by inputting correction information of the input image of the first portion of the model to a second portion of the model that follows the first portion); "wherein before being used in a convolutional calculation in an intermediate layer, the information about the captured image other than the pixel information undergoes a process of multiplying the information by a previously learned weight, or a process of adding a previously learned bias to the information" (Ravuna, FIGs. 24 and 25 and Paras. 165 and 195-207, teaches a scalar distance input being provided to an activation function after multiplying by the weight and adding the bias, wherein the input of the softmax layer may be multiplied by the output of the activation function and the distance is provided as an input to the second network prior to use in the convolutional calculations of the intermediate layers, i.e., the scalar distance input, being the information about the captured image other than the pixel information, undergoes a process of multiplying the information by a previously learned weight and of adding a previously learned bias to the information before being used in the convolutional calculation of the softmax layer, being the intermediate layer, or before being input to a second network for convolutional calculations within its intermediate layers). The proposed combination, as well as the motivation for combining the Taheri, Hiasa, Kim, Koivisto, and Ravuna references, presented in the rejection of Claim 1 applies to claim 17. Thus, the apparatus recited in claim 17 is met by Taheri in view of Hiasa, Kim, and Ravuna.

Conclusion

THIS ACTION IS MADE FINAL.
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW STEVEN BUDISALICH, whose telephone number is (703) 756-5568. The examiner can normally be reached Monday - Friday, 8:30am-5:00pm EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Amandeep Saini, can be reached at (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANDREW S BUDISALICH/
Examiner, Art Unit 2662

/AMANDEEP SAINI/
Supervisory Patent Examiner, Art Unit 2662

Prosecution Timeline

Sep 02, 2022
Application Filed
Nov 22, 2024
Non-Final Rejection — §103
Dec 13, 2024
Examiner Interview (Telephonic)
Dec 13, 2024
Examiner Interview Summary
Apr 07, 2025
Response Filed
Apr 21, 2025
Final Rejection — §103
Jul 29, 2025
Response after Non-Final Action
Aug 21, 2025
Applicant Interview (Telephonic)
Aug 21, 2025
Examiner Interview Summary
Aug 28, 2025
Response after Non-Final Action
Sep 29, 2025
Request for Continued Examination
Oct 02, 2025
Response after Non-Final Action
Oct 29, 2025
Non-Final Rejection — §103
Jan 23, 2026
Interview Requested
Jan 29, 2026
Applicant Interview (Telephonic)
Jan 29, 2026
Examiner Interview Summary
Feb 02, 2026
Response Filed
Feb 27, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602820
METHOD AND APPARATUS WITH ATTENTION-BASED OBJECT ANALYSIS
2y 5m to grant Granted Apr 14, 2026
Patent 12597106
METHOD AND APPARATUS FOR IDENTIFYING DEFECT GRADE OF BAD PICTURE, AND STORAGE MEDIUM
2y 5m to grant Granted Apr 07, 2026
Patent 12592078
VIDEO MONITORING DEVICE, VIDEO MONITORING SYSTEM, VIDEO MONITORING METHOD, AND STORAGE MEDIUM STORING VIDEO MONITORING PROGRAM
2y 5m to grant Granted Mar 31, 2026
Patent 12586232
METHOD FOR OBJECT DETECTION USING CROPPED IMAGES
2y 5m to grant Granted Mar 24, 2026
Patent 12567151
Microscopy System and Method for Instance Segmentation
2y 5m to grant Granted Mar 03, 2026
Based on this examiner's 5 most recent grants in similar technology.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 78%
With Interview: 87% (+8.9%)
Median Time to Grant: 2y 9m
PTA Risk: High

Based on 46 resolved cases by this examiner. Grant probability derived from career allow rate.
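A note on the arithmetic, which is our reading rather than a documented formula of the tool: the interview lift appears additive in percentage points, since 78% + 8.9 points ≈ 87%, matching the "With Interview" figure, whereas a multiplicative reading (78% × 1.089 ≈ 85%) would not.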
