Prosecution Insights
Last updated: April 19, 2026
Application No. 18/133,185

SYSTEMS AND METHODS FOR MULTI-PERSON POSE ESTIMATION

Status: Final Rejection (§103)
Filed: Apr 11, 2023
Examiner: KOETH, MICHELLE M
Art Unit: 2671
Tech Center: 2600 (Communications)
Assignee: Shanghai United Imaging Intelligence Co. Ltd.
OA Round: 2 (Final)

Grant Probability: 77% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 4m
Grant Probability With Interview: 94%

Examiner Intelligence

Career Allow Rate: 77% (above average; 331 granted / 429 resolved; +15.2% vs TC avg)
Interview Lift: +16.7%, over resolved cases with interview
Typical Timeline: 2y 4m average prosecution; 34 applications currently pending
Career History: 463 total applications across all art units
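The card figures above follow directly from the raw counts (a minimal sketch; the dashboard's exact rounding and its 94% with-interview figure are taken as given rather than recomputed):

```python
# Reproduce the examiner-statistics cards from the raw counts above.
# The 94% with-interview figure comes from the dashboard, not from this data.
granted, resolved = 331, 429

allow_rate = granted / resolved           # career allow rate, ~77.2% (shown as 77%)
tc_avg = allow_rate - 0.152               # "+15.2% vs TC avg" implies this baseline
with_interview = 0.94
interview_lift = with_interview - allow_rate  # ~+16.8 pts (dashboard shows +16.7%)

print(f"allow rate {allow_rate:.1%}, TC baseline {tc_avg:.1%}, lift {interview_lift:+.1%}")
```

The small discrepancy with the displayed +16.7% lift presumably comes from rounding in the displayed 94% and 77% figures.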

Statute-Specific Performance

§101: 7.4% (-32.6% vs TC avg)
§103: 62.2% (+22.2% vs TC avg)
§102: 8.5% (-31.5% vs TC avg)
§112: 14.7% (-25.3% vs TC avg)
Deltas are measured against the Tech Center average estimate. Based on career data from 429 resolved cases.
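As a quick consistency check, subtracting each statute's delta from the examiner's percentage recovers the same Tech Center baseline for all four statutes, which suggests every delta is measured against a single ~40% TC average estimate (an inference from the displayed numbers, not a disclosed figure):

```python
# (examiner %, delta vs TC average) pairs from the statute chart above
rows = {
    "§101": (7.4, -32.6),
    "§103": (62.2, +22.2),
    "§102": (8.5, -31.5),
    "§112": (14.7, -25.3),
}

# Subtracting each delta from the examiner's figure recovers the TC baseline.
baselines = {statute: round(pct - delta, 1) for statute, (pct, delta) in rows.items()}
print(baselines)  # every statute implies the same 40.0% Tech Center average
```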

Office Action (§103)

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant’s arguments and amendments in the Amendment filed 10/1/2025 (herein “Amendment”), with respect to the objection to claim 19, have been fully considered and are persuasive. The objection to claim 19 has been withdrawn.

Applicant’s arguments and amendments in the Amendment with respect to the rejections of claims 1–20 under 35 U.S.C. 103 have been fully considered and are persuasive in part. Therefore, the rejection has been withdrawn. However, upon further consideration, new grounds of rejection are made in view of Akbas, U.S. Patent No. 11,321,868 B1.

For clarity of the record, the following additional remarks in response to Applicant’s arguments are made. Applicant sets forth on page 8 of the Amendment that “The Office Action alleges that Fang teaches [limitations partially recited by claims 3–5 previously pending] claim limitations.” However, it is noted that the Office Action issued 7/1/2025 relied upon the combination of Fang with Ramani, with more reliance upon Ramani than Fang for the limitations previously recited in claims 3–5. Therefore, to the extent the Ramani reference was relied upon for limitations from previously pending claims 3–5, now amended into the independent claims, that reliance upon Ramani is maintained in the updated rejection rationale provided below. Further, to the extent Applicant argues, on pages 8–9 of the Amendment, against the motivation to combine Fang with Ramani on the asserted ground that Fang’s “deterministic” refinement model is not combinable with Ramani’s refinement architecture, such arguments are not found persuasive.
As provided in further detail in the updated rejection rationale below, Ramani’s refinement architecture operates on features determined by a Pose Relation Transformer, disclosed as being a trained model. Therefore, at least Ramani teaches using features output from a trained model. The stated motivation to combine Fang with Ramani is likewise to increase the reliability and interoperability of keypoint detection. Because Applicant has not addressed the stated motivation to combine of record in contending that “there is no teaching or suggestion for the claimed recovery step,” Applicant’s arguments, while having been fully considered, are not persuasive. The rejection maintains the combination of Fang with Ramani, and now adds into the combination the teachings of newly cited Akbas directed to the newly amended limitations.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1–2, 6, 10–12, 16, and 19–20 are rejected under 35 U.S.C. 103 as being unpatentable over Fang et al., US Patent Application Publication No. US 2024/0290034 A1 (herein “Fang”), in view of Ramani et al., US Patent Application Publication No. US 2024/0296582 A1 (herein “Ramani”), further in view of Akbas et al., US Patent No. 11,321,868 B1 (herein “Akbas”).

Regarding claims 1 and 11, with significant differences between the claims noted in curly brackets {}, with deficiencies of Fang noted in square brackets [], and with claim 1 as exemplary, Fang teaches {an apparatus, comprising: at least one processor configured to: - claim 1 (Fang Abstract, ¶¶123 and 128, image processing system including processors to perform the disclosed operations) / a method of image processing, the method comprising: - claim 11 (Fang Abstract, method of image processing)} obtain an image that depicts at least a first person and a second person in a scene (Fang ¶43, video stream of images obtained showing one or more beings from a scene of a team sports event with two or more players on a play field); determine, based on a first machine learning (ML) model, a plurality of joint locations in the image, wherein the first ML model has been pre-trained to extract a first plurality of features from the image and the plurality of joint locations is determined based on the first plurality of features (Fang ¶¶70–71, use of neural networks (first ML model) to detect joints from a 2D bounding box, from extracted pose points (first plurality of features)) [without identifying a person to whom each of the plurality of joint locations belongs]; further determine, based on the first ML model, that a first group of joint locations of the plurality of joint locations belongs to a first person and a second group of joint locations of the plurality of joint locations belongs to a second person (Fang ¶¶70–71, use of neural networks (first ML model) to detect joints from a 2D bounding box, where ¶91 teaches that there can be multiple bounding boxes, some of which are from “a different person” due to athletes being grouped together within the image) [wherein the first ML model has been further pre-trained to generate a heatmap that indicates the person to whom each of the plurality of joint locations belongs and wherein the further determination is made based on the heatmap]; recover one or more joint locations [missing from] the first group of joint locations or the second group of joint locations (Fang ¶¶101–102, fig. 12A, disclosing a refinement operation that takes joints with erroneous locations and adjusts them into the correct location) [based at least on a second ML model, wherein the second ML model has been pre-trained to extract a second plurality of features from at least one of the first group of joint locations or the second group of joint locations, and wherein the one or more joint locations are recovered based on a combination of the first plurality of features and the second plurality of features]; and perform a task using the one or more joint locations recovered based on the second ML model and at least one of the first group of joint locations or the second group of joint locations determined based on the first ML model (Fang ¶55, skeletons determined from the refined joint locations are used in a task of motion analysis for surveillance, event detection, automatic driving, or other applications).
While Fang teaches identifying the joints from bounding boxes and refining the joint points, Fang does not explicitly teach that the recovering is of joint locations missing from the first or second group of joint locations, nor does Fang explicitly teach that the recovering is “based at least on a second ML model, wherein the second ML model has been pre-trained to extract a second plurality of features from at least one of the first group of joint locations or the second group of joint locations, and wherein the one or more joint locations are recovered based on a combination of the first plurality of features and the second plurality of features.” Fang further does not explicitly teach the claimed “without identifying a person to whom each of the plurality of joint locations belongs,” and “wherein the first ML model has been further pre-trained to generate a heatmap that indicates the person to whom each of the plurality of joint locations belongs and wherein the further determination is made based on the heatmap.”

Ramani teaches recovering joint locations that are missing from the first group of joint locations or second group of joint locations (Ramani ¶32, occluded joints from an image are reconstructed from visible joints), and Ramani further teaches recovering based at least on a second ML model (Ramani ¶34, the occlusion refinement architecture being its own neural network (second ML model)), wherein the second ML model has been pre-trained (Ramani ¶¶32 and 58, PORT, a pose relation transformer (ML model), is trained to reconstruct occluded (missing) joints before being used for inferencing, thus pre-trained) to extract a second plurality of features from at least one of the first group of joint locations or the second group of joint locations, and wherein the one or more joint locations are recovered based on a combination of the first plurality of features and the second plurality of features (Ramani ¶¶32 and 39–40, after the keypoint detector estimates joints, the joint features are transformed into feature embeddings, from which skeleton features are extracted (second plurality of features), and where ¶¶36 and 51 teach determining a plurality of reconstructed keypoints (including the one or more missing joints) from processing the transformed feature embeddings).

Akbas teaches “without identifying a person to whom each of the plurality of joint locations belongs” (Akbas fig. 1, col. 9, ll. 34–50, keypoint estimation subnet 30 taking hierarchical CNN features and outputting keypoint heatmaps 38 organized by keypoint class, thus without identifying the person having the joints, as the heatmaps are per joint class and consider all joints (and people) in the aggregate), and “wherein the first ML model has been further pre-trained to generate a heatmap that indicates the person to whom each of the plurality of joint locations belongs and wherein the further determination is made based on the heatmap” (Akbas fig. 1, col. 9, ll. 34–50, col. 11, ll. 31–55, keypoint estimation subnet 30 taking hierarchical CNN features and outputting a person segmentation mask 39 at the last layer of heatmaps that encodes the pixelwise spatial layout of people in the image, where the pose residual network uses the person segmentation mask as the last layer of heatmaps 38 to determine bounding boxes of each person and their respective joints).

Therefore, taking the teachings of Fang and Ramani together as a whole, it would have been obvious to a person having ordinary skill in the art (herein “PHOSITA”) before the effective filing date of the claimed invention to have modified the skeleton reconstruction system and method of Fang with the refinement of joints to include occluded joints in its own neural network as disclosed by Ramani, at least because doing so would provide a model-agnostic manner to refine keypoints, thereby increasing the reliability and interoperability of keypoint detection. See Ramani ¶¶20–22.
Further, taking the teachings of Fang and Akbas together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the skeleton reconstruction system and method of Fang with the aspects of the heatmaps, person segmentation mask, and pose residual network operations specifically cited above, as disclosed by Akbas, at least because doing so would provide a bottom-up approach to joint detection, which is known to be faster at test time and smaller in size than other approaches to joint detection. See Akbas col. 2, ll. 61–64.

Regarding claims 2 and 12, Fang does not explicitly teach, but Ramani teaches, wherein the one or more joint locations that are missing from the first group of joint locations or second group of joint locations include a joint location that is obstructed in the image (Ramani ¶32, occluded joints (obstructed in the image) are reconstructed using the visible joints). Therefore, taking the teachings of Fang and Ramani together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the skeleton reconstruction system and method of Fang with the refinement of joints to include occluded joints as disclosed by Ramani, at least because doing so would provide a model-agnostic manner to refine keypoints, thereby increasing the reliability and interoperability of keypoint detection. See Ramani ¶¶20–22.
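For orientation only, the two operations in dispute can be sketched in a few lines of toy code: assigning detected joints to people via a person-id map (the role the rejection maps onto Akbas’s segmentation-mask heatmap) and filling in a joint missing from a person’s group (the role mapped onto Ramani’s pre-trained refinement model). This is a hypothetical illustration, not the claimed invention or any cited reference’s actual algorithm; all names are invented, and a fixed offset prior stands in for a trained model.

```python
def assign_joints(peaks, person_map):
    """peaks: list of (joint_class, row, col) detections.
    person_map[r][c] -> person id, playing the role of a heatmap that
    indicates the person to whom each joint location belongs."""
    groups = {}
    for cls, r, c in peaks:
        groups.setdefault(person_map[r][c], {})[cls] = (r, c)
    return groups

def recover_missing(group, skeleton_prior):
    """Fill joints absent from `group` using fixed offsets from a reference
    joint, standing in for a second, pre-trained refinement model."""
    recovered = dict(group)
    for cls, (ref, dr, dc) in skeleton_prior.items():
        if cls not in recovered and ref in recovered:
            r, c = recovered[ref]
            recovered[cls] = (r + dr, c + dc)
    return recovered

# Two people side by side in a 3x4 image; person 1's knee went undetected.
person_map = [[1, 1, 2, 2]] * 3
peaks = [("head", 0, 0), ("head", 0, 3), ("hip", 2, 1), ("hip", 2, 2)]
groups = assign_joints(peaks, person_map)
full = recover_missing(groups[1], {"knee": ("hip", 1, 0)})  # knee sits below hip
```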
Regarding claims 6 and 16, with claim 6 as exemplary, Fang does not explicitly teach, but Ramani teaches, wherein the at least one processor is configured to recover the one or more joint locations based further on (Ramani ¶¶36 and 52, as part of forming a refined plurality of keypoints using the masking vector M (extracted second plurality of features)) a third ML model (Ramani ¶53, a pose relation transformer (a type of machine learning model) is used for the keypoint refinement) pre-trained for receiving a set of incomplete joint locations of a person, extracting features from the set of incomplete joint locations, and predicting one or more joint locations of the person that are missing from the set of incomplete joint locations based on the extracted features (Ramani figs. 1 and 3, ¶¶32, 52–53, and 58, keypoint refinement 50 by the pose relation transformer (PORT) forms a refined plurality of keypoints (to predict and then include the missing joint locations) from the masking vector M using the received plurality of keypoints J (incomplete joint locations) based on the confidence values of the keypoints in J (extracting features), where PORT is trained to reconstruct occluded (missing) joints before being used for inferencing, thus pre-trained).

Therefore, taking the teachings of Fang and Ramani together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the skeleton reconstruction system and method of Fang with the refinement of joints to include further feature extraction for obtaining joint keypoints as disclosed by Ramani, at least because doing so would provide a model-agnostic manner to refine keypoints, thereby increasing the reliability and interoperability of keypoint detection. See Ramani ¶¶20–22.
Regarding claims 10 and 19, with claim 10 as exemplary, Fang teaches wherein the task performed by the at least one processor includes determination of a pose of the first person or the second person (Fang ¶47, process including determining the pose of sports players).

Regarding claim 20, Fang teaches a non-transitory computer-readable medium comprising instructions that, when executed by a processor included in a computing device, cause the processor to implement the method of claim 11 (Fang ¶119, operations of the disclosed figures undertaken in response to instructions provided by computer program products on computer-readable media, such as a non-transitory computer-readable medium).

Claims 7–8 and 17–18 are rejected under 35 U.S.C. 103 as being unpatentable over Fang in view of Ramani in view of Akbas, further in view of Xia et al., "Joint Multi-Person Pose Estimation and Semantic Part Segmentation," arXiv:1708.03383v1 [cs.CV], https://doi.org/10.48550/arXiv.1708.03383 (herein “Xia”).

Regarding claims 7 and 17, with claim 7 as exemplary, Fang as modified by Ramani above does not explicitly teach, but Xia teaches, wherein the second ML model is trained at least to fuse the first plurality of features and the second plurality of features, and to determine the one or more joint locations missing from the first group of joint locations or the second group of joint locations based on the fused features (Xia page 2, an FCRF (ML model) is used to fuse various joint score maps (first and second pluralities of features) to refine the joint locations, including details where appearance cues are missing (joint locations missing from the first group)).
Therefore, taking the teachings of Fang as modified by Ramani and Akbas above, together with Xia as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the skeleton reconstruction system including joint refinement of Fang as modified by Ramani with the fusion of joint maps as disclosed by Xia, at least because doing so would reduce the learning difficulty in training neural networks for pose estimation (Xia page 2, left column).

Regarding claims 8 and 18, with claim 8 as exemplary, Fang as modified by Ramani teaches combining the first plurality of features and the second plurality of features by averaging the first plurality of features and the second plurality of features (Fang ¶¶50–53, reconstruction algorithm and refining process both determine joints, including those that are out of position, by using the average distance to a joint location (averaging)). Fang as modified by Ramani does not teach wherein the second ML model is trained to fuse. Xia teaches a model trained to fuse (Xia page 2, an FCRF (ML model) is used to fuse various joint score maps (first and second pluralities of features) to refine the joint locations). Therefore, taking the teachings of Fang as modified by Ramani above, together with Xia as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the skeleton reconstruction system including joint refinement of Fang as modified by Ramani with the fusion as disclosed by Xia, at least because doing so would reduce the learning difficulty in training neural networks for pose estimation (Xia page 2, left column).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Fang in view of Ramani in view of Akbas, further in view of Kabaria et al., US Patent Application Publication No. US 2022/0189195 A1 (herein “Kabaria”).
Regarding claim 9, although Fang is directed primarily towards detecting the skeletons/joint locations of players on a sports field, given the breadth of the limitations recited in claim 9, such images of players on a sports field could arguably become “a medical environment” when players are injured and need medical attention, particularly since Fang teaches monitoring players on the field with its inventive skeleton detecting algorithm. Nonetheless, Fang does not explicitly teach that the images analyzed are associated with a medical environment. Kabaria, however, teaches wherein the scene depicted by the image is associated with a medical environment and wherein the at least one processor is configured to obtain the image from a sensing device installed in the medical environment (Kabaria Abstract, ¶¶3–5 and 25, joints detected from a patient’s hand for use by medical providers in telehealth scenarios). Therefore, taking the teachings of Fang as modified by Ramani above, together with Kabaria as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the skeleton reconstruction system including joint refinement of Fang as modified by Ramani with the medical environment application as disclosed by Kabaria, at least because doing so would allow medical providers and patients to utilize their time and resources more effectively, since patient information regarding joints can be easily transferred to a patient’s medical chart (Kabaria ¶25).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH, whose telephone number is (571) 272-5908. The examiner can normally be reached Monday-Thursday 09:00-17:00 and Friday 09:00-13:00, EDT/EST.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHELLE M KOETH/
Primary Examiner, Art Unit 2671

Prosecution Timeline

Apr 11, 2023
Application Filed
Jun 27, 2025
Non-Final Rejection — §103
Oct 01, 2025
Response Filed
Oct 22, 2025
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586221: METHOD AND APPARATUS FOR ESTIMATING DEPTH INFORMATION OF IMAGES (2y 5m to grant; granted Mar 24, 2026)
Patent 12579651: IMPEDED DIFFUSION FRACTION FOR QUANTITATIVE IMAGING DIAGNOSTIC ASSAY (2y 5m to grant; granted Mar 17, 2026)
Patent 12567241: Method For Generating Training Data Used To Learn Machine Learning Model, System, And Non-Transitory Computer-Readable Storage Medium Storing Computer Program (2y 5m to grant; granted Mar 03, 2026)
Patent 12567177: METHOD, ELECTRONIC DEVICE, AND COMPUTER PROGRAM PRODUCT FOR IMAGE PROCESSING (2y 5m to grant; granted Mar 03, 2026)
Patent 12566493: METHODS AND SYSTEMS FOR EYE-GAZE LOCATION DETECTION AND ACCURATE COLLECTION OF EYE-GAZE DATA (2y 5m to grant; granted Mar 03, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 77%
With Interview: 94% (+16.7%)
Median Time to Grant: 2y 4m
PTA Risk: Moderate
Based on 429 resolved cases by this examiner. Grant probability derived from career allow rate.

Free tier: 3 strategy analyses per month