Prosecution Insights
Last updated: April 19, 2026
Application No. 18/643,349

IMAGE PROCESSING APPARATUS, METHOD OF CONTROLLING IMAGE PROCESSING APPARATUS, AND STORAGE MEDIUM

Non-Final OA §103
Filed: Apr 23, 2024
Examiner: MILLER, RONDE LEE
Art Unit: 2663
Tech Center: 2600 — Communications
Assignee: Canon Kabushiki Kaisha
OA Round: 1 (Non-Final)
Grant Probability: 73% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 73% (16 granted / 22 resolved; +10.7% vs TC avg, above average)
Interview Lift: +37.5% (strong; allowance rate with vs. without an interview, among resolved cases with an interview)
Avg Prosecution: 2y 11m (typical timeline; 26 applications currently pending)
Total Applications: 48 (career history, across all art units)

Statute-Specific Performance

§101: 11.2% (-28.8% vs TC avg)
§103: 46.5% (+6.5% vs TC avg)
§102: 20.8% (-19.2% vs TC avg)
§112: 19.5% (-20.5% vs TC avg)
"vs TC avg" is the difference from the Tech Center average estimate • Based on career data from 22 resolved cases
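The per-statute deltas above are simple differences against a flat Tech Center baseline: each displayed rate plus its delta works out to 40.0%. A quick sketch of that arithmetic, with the baseline value inferred from the displayed figures rather than documented anywhere:

```python
# Examiner's per-statute rates (percent), as shown above
examiner_rates = {"101": 11.2, "103": 46.5, "102": 20.8, "112": 19.5}
TC_AVG = 40.0  # inferred baseline: every displayed rate + delta sums to 40.0

def delta_vs_tc(rate: float) -> float:
    """Signed gap (percentage points) between the examiner's rate and the TC average."""
    return round(rate - TC_AVG, 1)

deltas = {s: delta_vs_tc(r) for s, r in examiner_rates.items()}
# deltas == {"101": -28.8, "103": 6.5, "102": -19.2, "112": -20.5}
```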

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. The IDS filed 12/16/2024 has been received and considered. Claims 4 and 9 have been objected to. Claims 1-3, 5-8, and 10-11, all of the remaining claims pending in this application, have been rejected.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-8, and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over US Publication No. 2022/0207904 A1 to Uchiyama et al. (hereinafter Uchiyama) in view of Non-Patent Literature "Learning Clip Representations for Skeleton-Based 3D Action Recognition" to Ke et al. (hereinafter Ke).
Claim 1

Regarding Claim 1, an independent apparatus claim, Uchiyama teaches an image processing apparatus, comprising: at least one processor (Paragraph [0136]); and a memory coupled to the at least one processor (Paragraph [0136]), the memory storing instructions that, when executed by the at least one processor (Paragraph [0136]), cause the at least one processor to: detect a region of a human body from an image (Figure 9A);

[Image: media_image1.png (grayscale)]

detect joint points of the human body from the first clipped image to generate joint point information on the human body ("In step S402, a plurality of feature points associated with a plurality of parts of an object is detected from a captured image of the object including the plurality of parts in the image frame acquired in step S401 (first detection method). This step corresponds to an operation of the first detection unit 102 illustrated in FIG. 1. In step S402, the image frame is input, and a plurality of feature points of a person in the image and the reliability of each feature point are extracted. For each of the detected feature points, the reliability indicating a likelihood that the feature point is present in the image is acquired. If an image processing target is a person, a position of each joint of a human body can be used as a feature point.
In this step, five feature points, namely, a head vertex, a neck, a waist, a right ankle, and a left ankle of a person are detected. In the detection of the feature points", Paragraph [0045]);

Uchiyama does not teach: generate, from the image, a first clipped image including the region of the human body and a second clipped image different from the first clipped image and including the region of the human body; convert the joint point information so as to have spatial information coincident with spatial information on the second clipped image; or estimate a posture of the human body based on the converted joint point information and the second clipped image.

However, Ke teaches generate, from the image, a first clipped image including the region of the human body and a second clipped image different from the first clipped image and including the region of the human body (Figure 5; "The main steps of the proposed method are: 1) the generation of clips from skeleton sequences using two different methods, i.e., clip generation by computing the relative positions between joints (RelaClips) and clip generation by rotating the skeletons (RotClips). 2) the use of Multi-task CNN (MTCNN) to jointly train the multiple frames of the generated clips to learn the spatial temporal information of the skeleton sequences for action recognition… RelaFrames+CNN: In this baseline, only one single frame of each clip (that is generated with the relative positions between the skeleton joints) is used to train a CNN for classification… RotFrames+CNN: This baseline is similar to RelaFrames+CNN except that the frame is generated by rotating the skeletons."; Section IV - Experiments);

[Image: media_image2.png (grayscale)]

convert the joint point information so as to have spatial information coincident with spatial information on the second clipped image ("The entire clip aggregates multiple frames with different spatial relationships to provide important information of the spatial structure of the human skeleton. The first method is based on the relative coordinates between different skeleton joints. This is inspired by the observation that the relative positions between skeleton joints generally provide discriminant information of different body postures, which can be used to describe the spatial structure of the skeleton. Previous works have also used pairs of joints to design relational features such as joint distance for interaction detection and recognition [41], [42]. The second method of clip generation is based on the coordinates of the skeleton joints in different viewpoints. Clip generation from different viewpoints is inspired by the observation that humans usually obtain a good understanding of an object by observing the object from different viewpoints. Each frame of a generated clip captures the information of the skeleton from one particular direction. All of the frames can be aggregated to obtain the spatial structural information of the skeleton.", Section I - Introduction); and

estimate a posture of the human body based on the converted joint point information and the second clipped image ("This paper is an extension of our previous conference paper [44]. In [44], the clip representation is learned without taking viewpoint variations into account. In this paper, we introduce a new clip generation method, which is more robust to viewpoint variations. We also extensively evaluate the proposed method on three additional datasets and extend the experimental analysis. The main contributions of this paper include the following. First, we introduce two different methods of transforming each skeleton sequence to a new representation, i.e., three clips, to allow spatial temporal feature learning using deep networks. Second, the additional clip representation that is proposed in this paper is robust to viewpoint variations. Third, we introduce an MTCNN to process the generated clips for action recognition.
MTCNN utilizes the intrinsic relationships between the different frames of the clips and improves the overall performance. The proposed method achieves state-of-the-art performance on six skeleton datasets, including the large scale NTU RGB+D dataset", Section I - Introduction), where the list of actions (or postures) can be found in Section IV - Experiments, Part A - Datasets: for example, the "UTKinect-Action3D Dataset" contains 200 action sequences performed by 10 actors, with 10 action classes, i.e., sit down, stand up, walk, carry, pick up, pull, push, clap hands, wave, and throw; the 3D coordinates of 20 joints are provided for each skeleton.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Uchiyama to incorporate generating a second clipped image from the captured image and using the second clipped image to obtain spatial information and determine the action (posture) of the detected human body, as disclosed by Ke. The suggestion/motivation for doing so would have been to make it easier for a machine learning model to learn and output results by using generated clips, which are much smaller than the originally captured image.

Claim 2

Regarding Claim 2, dependent on claim 1, Uchiyama, in view of Ke, teaches the invention as claimed in claim 1. Uchiyama further teaches wherein the joint point information is a map indicating positions of joint points as likelihood ("In this method, a reliability map indicating joint positions on the image is calculated using a trained model (neural network). The reliability map is a two-dimensional map. Where the number of joint points is represented by P, (P+1) maps are present (one map corresponds to a background). In a reliability map indicating a certain joint point, a position with a high reliability is regarded as a position where the joint point is present.
The reliability is indicated by a real number in a range from 0 to 1 representing the likelihood that the feature point is present in the image.", Paragraph [0045]). Uchiyama, in view of Ke, further teaches wherein the instructions cause the at least one processor to convert the map into a map having spatial information coincident with the spatial information on the second clipped image by performing at least one type of conversion processing among types of conversion processing of rotation, resizing, and addition of margins, on the map (Rejected as applied to claim 1).

Claim 3

Regarding Claim 3, dependent on claim 1, Uchiyama, in view of Ke, teaches the invention as claimed in claim 1. Uchiyama further teaches wherein the instructions cause the at least one processor to convert the joint point information into information having a weight assigned to each of the detected joint points ("In this case, weighting is performed to reflect the feature amounts of the respective parts in the entire feature amount depending on the reliability of each feature point. In other words, a feature amount extracted from a part corresponding to a feature point with a low reliability is prevented from contributing to a final recognition result. This is because the feature point with the low reliability may indicate that an object is occluded or much noise is generated, and thus the feature amount extracted from the part does not always indicate the feature of the part of the object.", Paragraph [0088]).

Claim 5

Regarding Claim 5, dependent on claim 1, Uchiyama, in view of Ke, teaches the invention as claimed in claim 1.
Uchiyama does not teach wherein the instructions cause the at least one processor to estimate the posture of the human body by using a deep learning model including a first feature extraction layer configured to output a first feature amount from the converted joint point information, a second feature extraction layer configured to output a second feature amount from the second clipped image, a connected layer configured to connect the first feature amount and the second feature amount, and an identification layer configured to estimate a posture based on the connected feature amounts. However, Ke teaches this limitation (Introduction).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the teachings of Uchiyama to incorporate feature extraction layers for the first and second clipped images being connected to estimate the action (posture), as disclosed by Ke. The suggestion/motivation for doing so would have been to achieve a higher level of accuracy when determining exactly which action (posture) of the human body is being exemplified, based on the image clips derived from the captured image.

Claim 6

Regarding Claim 6, dependent on claim 1, Uchiyama, in view of Ke, teaches the invention as claimed in claim 1.
Uchiyama further teaches wherein the instructions cause the at least one processor to: detect the region of the human body by rotating the image (Figure 9B; "In step S406 illustrated in FIG. 4, the image extraction unit 107 clips the partial image area determined in step S405 as a person image from the image frame. If the rectangle of the partial image area determined in step S405 is inclined, the image is rotated so that the rectangle is in an upright position. FIG. 9B illustrates an example where the area is clipped from the image frame 903 illustrated in FIG. 9A. The operation of step S406 corresponds to an operation of the image extraction unit 107 illustrated in FIG. 1.", Paragraph [0073]); and

[Image: media_image3.png (grayscale)]

generate the first clipped image from the image rotated by a rotation amount at which the region of the human body is detected (Rejected as applied directly above).

Claim 7

Regarding Claim 7, dependent on claim 1, Uchiyama, in view of Ke, teaches the invention as claimed in claim 1. Uchiyama, in view of Ke, further teaches wherein the instructions cause the at least one processor to: estimate a depth of an imaging environment from the image (Rejected as applied to claim 1; specifically Ke, where acquiring spatial information between the 3D skeletal joints requires depth information); and convert the joint point information based on the estimated depth (Rejected as applied directly above).

Claim 8

Regarding Claim 8, dependent on claim 1, Uchiyama, in view of Ke, teaches the invention as claimed in claim 1. Uchiyama, in view of Ke, further teaches: estimate a depth of an imaging environment from the image (Rejected as applied to claim 7); and generate the second clipped image based on the estimated depth (Rejected as applied to claim 1; specifically Ke, where acquiring spatial information between the 3D skeletal joints requires depth information).
Claim 10, an independent method claim, is rejected for the same reasons as applied to claim 1. Claim 11, an independent non-transitory computer-readable storage medium claim, is rejected for the same reasons as applied to claim 1.

Allowable Subject Matter

Claims 4 and 9 have been objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

US Publication No. 2021/0407264 A1 to Kawano et al. (hereinafter Kawano)
Non-Patent Literature "High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification" to Wang et al. (hereinafter Wang)
Non-Patent Literature "Pose-Guided Feature Alignment for Occluded Person Re-Identification" to Miao et al. (hereinafter Miao)
Non-Patent Literature "Region Generation and Assessment Network for Occluded Person Re-Identification" to He et al. (hereinafter He)
Non-Patent Literature "Pose-guided Visible Part Matching for Occluded Person ReID" to Gao et al. (hereinafter Gao)

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ronde Miller, whose telephone number is (703) 756-5686. The examiner can normally be reached Monday-Friday, 8:00-4:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Gregory Morse, can be reached at (571) 272-3838. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RONDE LEE MILLER/
Examiner, Art Unit 2663

/GREGORY A MORSE/
Supervisory Patent Examiner, Art Unit 2698
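For context on the claim-1 limitation at the center of this rejection: converting joint point information "so as to have spatial information coincident with spatial information on the second clipped image" amounts, in the simplest reading, to re-expressing keypoint coordinates from one crop's pixel frame in another crop's pixel frame. A minimal sketch of that idea, assuming axis-aligned crops with uniform scale factors (all function names and values here are illustrative, not taken from the application or the cited references):

```python
import numpy as np

def crop_to_image(points, origin, scale):
    """Map (x, y) keypoints from a crop's pixel frame to the full-image frame.
    origin: crop top-left corner in full-image pixels; scale: image pixels per crop pixel."""
    return np.asarray(points, dtype=float) * scale + np.asarray(origin, dtype=float)

def image_to_crop(points, origin, scale):
    """Inverse mapping: full-image keypoints into a crop's pixel frame."""
    return (np.asarray(points, dtype=float) - np.asarray(origin, dtype=float)) / scale

# Joints detected in a first clip taken at image offset (50, 40), scale 2.0
joints_clip1 = np.array([[10.0, 20.0], [30.0, 40.0]])
in_image = crop_to_image(joints_clip1, (50, 40), 2.0)  # -> [[70, 80], [110, 120]]
# Re-express them in a second, different clip at offset (60, 30), scale 1.5
joints_clip2 = image_to_crop(in_image, (60, 30), 1.5)
```

The rotation and margin padding recited in claim 2 would extend this from a pure offset-and-scale mapping to a full 2x3 affine transform per crop, but the coordinate-frame change is the same idea.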

Prosecution Timeline

Apr 23, 2024
Application Filed
Feb 18, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12573215
LEARNING APPARATUS, LEARNING METHOD, OBJECT DETECTION APPARATUS, OBJECT DETECTION METHOD, LEARNING SUPPORT SYSTEM AND LEARNING SUPPORT METHOD
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12548114
METHOD FOR CODE-LEVEL SUPER RESOLUTION AND METHOD FOR TRAINING SUPER RESOLUTION MODEL THEREFOR
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12524833
X-RAY DIAGNOSIS APPARATUS, MEDICAL IMAGE PROCESSING APPARATUS, AND STORAGE MEDIUM
Granted Jan 13, 2026 (2y 5m to grant)
Patent 12502905
SECURE DOCUMENT AUTHENTICATION
Granted Dec 23, 2025 (2y 5m to grant)
Patent 12505581
ONLINE TRAINING COMPUTER VISION TASK MODELS IN COMPRESSION DOMAIN
Granted Dec 23, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 73%
With Interview: 99% (+37.5%)
Median Time to Grant: 2y 11m
PTA Risk: Low
Based on 22 resolved cases by this examiner. Grant probability derived from career allow rate.
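The headline projections are consistent with a straightforward derivation from the career data (this is an inference from the displayed figures, not a documented formula): 16 grants over 22 resolved cases gives the 73% base probability, and applying the +37.5% interview lift with a 99% display cap gives the with-interview figure.

```python
granted, resolved = 16, 22        # examiner's resolved career cases
allow_rate = granted / resolved   # ~0.727, displayed as 73%

INTERVIEW_LIFT = 0.375            # +37.5% relative lift from an interview
CAP = 0.99                        # displayed probabilities are capped at 99%

with_interview = min(allow_rate * (1 + INTERVIEW_LIFT), CAP)
# round(allow_rate * 100) == 73; round(with_interview * 100) == 99
```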
