Prosecution Insights
Last updated: April 19, 2026
Application No. 18/643,349

IMAGE PROCESSING APPARATUS, METHOD OF CONTROLLING IMAGE PROCESSING APPARATUS, AND STORAGE MEDIUM

Non-Final OA §103
Filed: Apr 23, 2024
Examiner: MILLER, RONDE LEE
Art Unit: 2663
Tech Center: 2600 — Communications
Assignee: Canon Kabushiki Kaisha
OA Round: 1 (Non-Final)
Grant Probability: 73% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 73% (16 granted / 22 resolved; +10.7% vs TC avg, above average)
Interview Lift: +37.5% (strong; allowance rate with vs. without an interview, among resolved cases with an interview)
Avg Prosecution: 2y 11m (typical timeline; 26 applications currently pending)
Total Applications: 48 (career history, across all art units)

Statute-Specific Performance

§101: 11.2% (-28.8% vs TC avg)
§103: 46.5% (+6.5% vs TC avg)
§102: 20.8% (-19.2% vs TC avg)
§112: 19.5% (-20.5% vs TC avg)
"vs TC avg" is the difference from the Tech Center average estimate • Based on career data from 22 resolved cases
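The per-statute deltas above are simple differences against a flat Tech Center baseline: each displayed rate plus its delta works out to 40.0%. A quick sketch of that arithmetic, with the baseline value inferred from the displayed figures rather than documented anywhere:

```python
# Examiner's per-statute rates (percent), as shown above
examiner_rates = {"101": 11.2, "103": 46.5, "102": 20.8, "112": 19.5}
TC_AVG = 40.0  # inferred baseline: every displayed rate + delta sums to 40.0

def delta_vs_tc(rate: float) -> float:
    """Signed gap (percentage points) between the examiner's rate and the TC average."""
    return round(rate - TC_AVG, 1)

deltas = {s: delta_vs_tc(r) for s, r in examiner_rates.items()}
# deltas == {"101": -28.8, "103": 6.5, "102": -19.2, "112": -20.5}
```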

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. The IDS filed 12/16/2024 has been received and considered. Claims 4 and 9 have been objected to. Claims 1-3, 5-8, and 10-11, all of the remaining claims pending in this application, have been rejected.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-8, and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over US Publication No. 2022/0207904 A1 to Uchiyama et al. (hereinafter Uchiyama) in view of Non-Patent Literature "Learning Clip Representations for Skeleton-Based 3D Action Recognition" to Ke et al. (hereinafter Ke).
Claim 1

Regarding Claim 1, an independent apparatus claim, Uchiyama teaches an image processing apparatus, comprising: at least one processor (Paragraph [0136]); and a memory coupled to the at least one processor (Paragraph [0136]), the memory storing instructions that, when executed by the at least one processor (Paragraph [0136]), cause the at least one processor to: detect a region of a human body from an image (Figure 9A);

[Image: media_image1.png (grayscale)]

detect joint points of the human body from the first clipped image to generate joint point information on the human body ("In step S402, a plurality of feature points associated with a plurality of parts of an object is detected from a captured image of the object including the plurality of parts in the image frame acquired in step S401 (first detection method). This step corresponds to an operation of the first detection unit 102 illustrated in FIG. 1. In step S402, the image frame is input, and a plurality of feature points of a person in the image and the reliability of each feature point are extracted. For each of the detected feature points, the reliability indicating a likelihood that the feature point is present in the image is acquired. If an image processing target is a person, a position of each joint of a human body can be used as a feature point.
In this step, five feature points, namely, a head vertex, a neck, a waist, a right ankle, and a left ankle of a person are detected. In the detection of the feature points", Paragraph [0045]);

Uchiyama does not teach: generate, from the image, a first clipped image including the region of the human body and a second clipped image different from the first clipped image and including the region of the human body; convert the joint point information so as to have spatial information coincident with spatial information on the second clipped image; or estimate a posture of the human body based on the converted joint point information and the second clipped image.

However, Ke teaches generate, from the image, a first clipped image including the region of the human body and a second clipped image different from the first clipped image and including the region of the human body (Figure 5; "The main steps of the proposed method are: 1) the generation of clips from skeleton sequences using two different methods, i.e., clip generation by computing the relative positions between joints (RelaClips) and clip generation by rotating the skeletons (RotClips). 2) the use of Multi-task CNN (MTCNN) to jointly train the multiple frames of the generated clips to learn the spatial temporal information of the skeleton sequences for action recognition… RelaFrames+CNN: In this baseline, only one single frame of each clip (that is generated with the relative positions between the skeleton joints) is used to train a CNN for classification… RotFrames+CNN: This baseline is similar to RelaFrames+CNN except that the frame is generated by rotating the skeletons."; Section IV - Experiments);

[Image: media_image2.png (grayscale)]

convert the joint point information so as to have spatial information coincident with spatial information on the second clipped image ("The entire clip aggregates multiple frames with different spatial relationships to provide important information of the spatial structure of the human skeleton. The first method is based on the relative coordinates between different skeleton joints. This is inspired by the observation that the relative positions between skeleton joints generally provide discriminant information of different body postures, which can be used to describe the spatial structure of the skeleton. Previous works have also used pairs of joints to design relational features such as joint distance for interaction detection and recognition [41], [42]. The second method of clip generation is based on the coordinates of the skeleton joints in different viewpoints. Clip generation from different viewpoints is inspired by the observation that humans usually obtain a good understanding of an object by observing the object from different viewpoints. Each frame of a generated clip captures the information of the skeleton from one particular direction. All of the frames can be aggregated to obtain the spatial structural information of the skeleton.", Section I - Introduction); and

estimate a posture of the human body based on the converted joint point information and the second clipped image ("This paper is an extension of our previous conference paper [44]. In [44], the clip representation is learned without taking viewpoint variations into account. In this paper, we introduce a new clip generation method, which is more robust to viewpoint variations. We also extensively evaluate the proposed method on three additional datasets and extend the experimental analysis. The main contributions of this paper include the following. First, we introduce two different methods of transforming each skeleton sequence to a new representation, i.e., three clips, to allow spatial temporal feature learning using deep networks. Second, the additional clip representation that is proposed in this paper is robust to viewpoint variations. Third, we introduce an MTCNN to process the generated clips for action recognition.
MTCNN utilizes the intrinsic relationships between the different frames of the clips and improves the overall performance. The proposed method achieves state-of-the-art performance on six skeleton datasets, including the large scale NTU RGB+D dataset", Section I - Introduction), where the list of actions (or postures) can be found in Section IV - Experiments, Part A - Datasets: for example, the "UTKinect-Action3D Dataset" contains 200 action sequences performed by 10 actors, with 10 action classes, i.e., sit down, stand up, walk, carry, pick up, pull, push, clap hands, wave, and throw; the 3D coordinates of 20 joints are provided for each skeleton.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Uchiyama to incorporate generating a second clipped image from the captured image and using the second clipped image to obtain spatial information and determine the action (posture) of the detected human body, as disclosed by Ke. The suggestion/motivation for doing so would have been to make it easier for a machine learning model to learn and output results by using generated clips, which are much smaller than the originally captured image.

Claim 2

Regarding Claim 2, dependent on claim 1, Uchiyama, in view of Ke, teaches the invention as claimed in claim 1. Uchiyama further teaches wherein the joint point information is a map indicating positions of joint points as likelihood ("In this method, a reliability map indicating joint positions on the image is calculated using a trained model (neural network). The reliability map is a two-dimensional map. Where the number of joint points is represented by P, (P+1) maps are present (one map corresponds to a background). In a reliability map indicating a certain joint point, a position with a high reliability is regarded as a position where the joint point is present.
The reliability is indicated by a real number in a range from 0 to 1 representing the likelihood that the feature point is present in the image.", Paragraph [0045]). Uchiyama, in view of Ke, further teaches wherein the instructions cause the at least one processor to convert the map into a map having spatial information coincident with the spatial information on the second clipped image by performing at least one type of conversion processing among types of conversion processing of rotation, resizing, and addition of margins, on the map (Rejected as applied to claim 1).

Claim 3

Regarding Claim 3, dependent on claim 1, Uchiyama, in view of Ke, teaches the invention as claimed in claim 1. Uchiyama further teaches wherein the instructions cause the at least one processor to convert the joint point information into information having a weight assigned to each of the detected joint points ("In this case, weighting is performed to reflect the feature amounts of the respective parts in the entire feature amount depending on the reliability of each feature point. In other words, a feature amount extracted from a part corresponding to a feature point with a low reliability is prevented from contributing to a final recognition result. This is because the feature point with the low reliability may indicate that an object is occluded or much noise is generated, and thus the feature amount extracted from the part does not always indicate the feature of the part of the object.", Paragraph [0088]).

Claim 5

Regarding Claim 5, dependent on claim 1, Uchiyama, in view of Ke, teaches the invention as claimed in claim 1.
Uchiyama does not teach wherein the instructions cause the at least one processor to estimate the posture of the human body by using a deep learning model including a first feature extraction layer configured to output a first feature amount from the converted joint point information, a second feature extraction layer configured to output a second feature amount from the second clipped image, a connected layer configured to connect the first feature amount and the second feature amount, and an identification layer configured to estimate a posture based on the connected feature amounts. However, Ke teaches this limitation (Introduction).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the teachings of Uchiyama to incorporate feature extraction layers for the first and second clipped images being connected to estimate the action (posture), as disclosed by Ke. The suggestion/motivation for doing so would have been to achieve a higher level of accuracy when determining exactly which action (posture) of the human body is being exemplified, based on the image clips derived from the captured image.

Claim 6

Regarding Claim 6, dependent on claim 1, Uchiyama, in view of Ke, teaches the invention as claimed in claim 1.
Uchiyama further teaches wherein the instructions cause the at least one processor to: detect the region of the human body by rotating the image (Figure 9B; "In step S406 illustrated in FIG. 4, the image extraction unit 107 clips the partial image area determined in step S405 as a person image from the image frame. If the rectangle of the partial image area determined in step S405 is inclined, the image is rotated so that the rectangle is in an upright position. FIG. 9B illustrates an example where the area is clipped from the image frame 903 illustrated in FIG. 9A. The operation of step S406 corresponds to an operation of the image extraction unit 107 illustrated in FIG. 1.", Paragraph [0073]); and

[Image: media_image3.png (grayscale)]

generate the first clipped image from the image rotated by a rotation amount at which the region of the human body is detected (Rejected as applied directly above).

Claim 7

Regarding Claim 7, dependent on claim 1, Uchiyama, in view of Ke, teaches the invention as claimed in claim 1. Uchiyama, in view of Ke, further teaches wherein the instructions cause the at least one processor to: estimate a depth of an imaging environment from the image (Rejected as applied to claim 1; specifically Ke, where acquiring spatial information between the 3D skeletal joints requires depth information); and convert the joint point information based on the estimated depth (Rejected as applied directly above).

Claim 8

Regarding Claim 8, dependent on claim 1, Uchiyama, in view of Ke, teaches the invention as claimed in claim 1. Uchiyama, in view of Ke, further teaches: estimate a depth of an imaging environment from the image (Rejected as applied to claim 7); and generate the second clipped image based on the estimated depth (Rejected as applied to claim 1; specifically Ke, where acquiring spatial information between the 3D skeletal joints requires depth information).
Claim 10, an independent method claim, is rejected for the same reasons as applied to claim 1. Claim 11, an independent non-transitory computer-readable storage medium claim, is rejected for the same reasons as applied to claim 1.

Allowable Subject Matter

Claims 4 and 9 have been objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

US Publication No. 2021/0407264 A1 to Kawano et al. (hereinafter Kawano)
Non-Patent Literature "High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification" to Wang et al. (hereinafter Wang)
Non-Patent Literature "Pose-Guided Feature Alignment for Occluded Person Re-Identification" to Miao et al. (hereinafter Miao)
Non-Patent Literature "Region Generation and Assessment Network for Occluded Person Re-Identification" to He et al. (hereinafter He)
Non-Patent Literature "Pose-guided Visible Part Matching for Occluded Person ReID" to Gao et al. (hereinafter Gao)

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ronde Miller, whose telephone number is (703) 756-5686. The examiner can normally be reached Monday-Friday, 8:00-4:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Gregory Morse, can be reached at (571) 272-3838. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RONDE LEE MILLER/
Examiner, Art Unit 2663

/GREGORY A MORSE/
Supervisory Patent Examiner, Art Unit 2698
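For context on the claim-1 limitation at the center of this rejection: converting joint point information "so as to have spatial information coincident with spatial information on the second clipped image" amounts, in the simplest reading, to re-expressing keypoint coordinates from one crop's pixel frame in another crop's pixel frame. A minimal sketch of that idea, assuming axis-aligned crops with uniform scale factors (all function names and values here are illustrative, not taken from the application or the cited references):

```python
import numpy as np

def crop_to_image(points, origin, scale):
    """Map (x, y) keypoints from a crop's pixel frame to the full-image frame.
    origin: crop top-left corner in full-image pixels; scale: image pixels per crop pixel."""
    return np.asarray(points, dtype=float) * scale + np.asarray(origin, dtype=float)

def image_to_crop(points, origin, scale):
    """Inverse mapping: full-image keypoints into a crop's pixel frame."""
    return (np.asarray(points, dtype=float) - np.asarray(origin, dtype=float)) / scale

# Joints detected in a first clip taken at image offset (50, 40), scale 2.0
joints_clip1 = np.array([[10.0, 20.0], [30.0, 40.0]])
in_image = crop_to_image(joints_clip1, (50, 40), 2.0)  # -> [[70, 80], [110, 120]]
# Re-express them in a second, different clip at offset (60, 30), scale 1.5
joints_clip2 = image_to_crop(in_image, (60, 30), 1.5)
```

The rotation and margin padding recited in claim 2 would extend this from a pure offset-and-scale mapping to a full 2x3 affine transform per crop, but the coordinate-frame change is the same idea.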

Prosecution Timeline

Apr 23, 2024
Application Filed
Feb 18, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12573215
LEARNING APPARATUS, LEARNING METHOD, OBJECT DETECTION APPARATUS, OBJECT DETECTION METHOD, LEARNING SUPPORT SYSTEM AND LEARNING SUPPORT METHOD
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12548114
METHOD FOR CODE-LEVEL SUPER RESOLUTION AND METHOD FOR TRAINING SUPER RESOLUTION MODEL THEREFOR
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12524833
X-RAY DIAGNOSIS APPARATUS, MEDICAL IMAGE PROCESSING APPARATUS, AND STORAGE MEDIUM
Granted Jan 13, 2026 (2y 5m to grant)
Patent 12502905
SECURE DOCUMENT AUTHENTICATION
Granted Dec 23, 2025 (2y 5m to grant)
Patent 12505581
ONLINE TRAINING COMPUTER VISION TASK MODELS IN COMPRESSION DOMAIN
Granted Dec 23, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 73%
With Interview: 99% (+37.5%)
Median Time to Grant: 2y 11m
PTA Risk: Low
Based on 22 resolved cases by this examiner. Grant probability derived from career allow rate.
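The headline projections are consistent with a straightforward derivation from the career data (this is an inference from the displayed figures, not a documented formula): 16 grants over 22 resolved cases gives the 73% base probability, and applying the +37.5% interview lift with a 99% display cap gives the with-interview figure.

```python
granted, resolved = 16, 22        # examiner's resolved career cases
allow_rate = granted / resolved   # ~0.727, displayed as 73%

INTERVIEW_LIFT = 0.375            # +37.5% relative lift from an interview
CAP = 0.99                        # displayed probabilities are capped at 99%

with_interview = min(allow_rate * (1 + INTERVIEW_LIFT), CAP)
# round(allow_rate * 100) == 73; round(with_interview * 100) == 99
```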
