Last updated: May 29, 2026
Application No. 17/888,507
ACTION RECOGNITION DEVICE AND METHOD AND ELECTRONIC DEVICE

Final Rejection §103
Filed
Aug 16, 2022
Priority
Sep 18, 2021 — CN 202111097743.1
Examiner
TRAN, DUY ANH
Art Unit
2674
Tech Center
2600 — Communications
Assignee
Fujitsu Limited
OA Round
4 (Final)
Interview Optional

— +18.4% interview lift. Examiner has a relatively high allowance rate (80%); +18.4% interview lift. A written response may suffice.
Based on 133 resolved cases, 2023–2026
Examiner Intelligence

TRAN, DUY ANH View full profile →
Grants 80% — above average
Career Allowance Rate
107 granted / 133 resolved
+18.5% vs TC avg
Strong +18% interview lift
Without
With
+18.4%
Interview Lift
resolved cases with interview
Typical timeline
2y 10m
Avg Prosecution
21 currently pending
Career history
162
Total Applications
across all art units
Statute-Specific Performance

§101
1.0%
-39.0% vs TC avg
§103
81.7%
+41.7% vs TC avg
§102
12.2%
-27.8% vs TC avg
§112
3.4%
-36.6% vs TC avg
Black line = Tech Center average estimate • Based on career data from 133 resolved cases
Office Action

§103
DETAILED ACTION
This Action is in response to Applicant’s response filed on 11/12/2025.  Claims 1-10 and newly adding claims 11-12 are still pending in the present application.  This Action is made FINAL.
Response to Arguments
Applicant's arguments filed on 07/31/2024 have been fully considered but are moot in view of the new ground(s) rejection in view of Cao Zhe et al (“OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields”; Cao).

Claims Status
Claim(s) 1-2, 7 and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Cao Zhe et al (“OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields”; Cao), in view of Qin Yongyin et al ( CN-111310625 A; Qin).
Claim(s) 3-6, 8-9 and 11-12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Cao Zhe et al (“OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields”; Cao), in view of Qin Yongyin et al ( CN-111310625 A; Qin), and in further view of Ning et al (U.S. 20210090284 A1; Ning).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-2, 7 and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Cao Zhe et al (“OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields”; Cao), in view of Qin Yongyin et al ( CN-111310625 A; Qin).

Regarding claim 1, Cao discloses an action recognition device, (4. OpenPose: “ OpenPose [4], the first real-time multiperson system to jointly detect human body, foot, hand, and facial keypoints (in total 135 keypoints) on single images.”) characterized in that the device comprises: a processor coupled to a memory and configured to, (4.1 System: “OpenPose overcome all of these problems. It can run on different platforms, including Ubuntu, Windows, Mac OSX, and embedded systems (e.g., Nvidia Tegra TX2). It also provides support for different hardware, such as CUDA GPUs, OpenCL GPUs, and CPU only devices. The user can select an input between images, video, webcam, and IP camera streaming. He can also select whether to display the results or save them on disk, enable or disable each detector (body, foot, face, and hand), enable pixel coordinate normalization, control how many GPUs to use,”)
perform key point recognition on one or more objects in a video frame (1. Introduction: “we consider a core component in obtaining a detailed understanding of people in images and videos: human 2D pose estimation”) by using a neural network to obtain key point information and to calculate part affinity field (PAF) score(s) between any two key points according to part affinity field information and the key point information; (Figs. 2-3 ; 3. Method – 3.1. Network Architecture – 3.2. Simultaneous Detection and Association: “First, a feedforward network predicts a set of 2D confidence maps S of body part locations (Fig. 2b) and a set of 2D vector fields L of part affinity fields (PAFs), which encode the degree of association between parts (Fig. 2c). …  The image is analyzed by a CNN (initialized by the first 10 layers of VGG-19 [53] and fine-tuned), generating a set of feature maps F that is input to the first stage. At this stage, the network produces a set of part affinity fields (PAFs) L1 ¼ f1ðFÞ, where f1 refers to the CNNs for inference at Stage 1.”; 3.5 Multi-Person Parsing Using PAFs: “ We perform non-maximum suppression on the detection confidence maps to obtain a discrete set of part candidate locations. For each part, we may have several candidates, due to multiple people in the image or false positives (Fig. 6b). These part candidates define a large set of possible limbs. We score each candidate limb using the line integral computation on the PAF, defined in Eq. (11).”)
perform key point connection according to the key point information and the part affinity field score(s); (Figs. 5-6 and 3.4 Part Affinity Fields for Part Association - 3.5 Multi-Person Parsing Using PAFs: “we measure association between candidate part detections by computing the line integral over the corresponding PAF along the line segment connecting the candidate part locations … We score each candidate limb using the line integral computation on the PAF, defined in Eq. (11) … This case is shown in Fig. 5b. In this graph matching problem, nodes of the graph are the body part detection candidates Dj1 and Dj2 , and the edges are all possible connections between pairs of detection candidates.”)
generate multiple key point connection candidates after traversing all relation pairs (all limb connection candidates), wherein key point connection relationship pairs are determined according to the key point information and a final relationship pair is determined according to the part affinity field score(s); (3.5 Multi-Person Parsing Using PAFs: “we first obtain a set of body part detection candidates DJ for multiple people, … These part detection candidates still need to be associated with other parts from the same person—in other words, we need to find the pairs of part detections that are in fact connected limbs. We define a variable to indicate whether two detection candidates dmj1 and dnj2 are connected, and the goal is to find the optimal assignment for the set of all possible connections, … the limb connection candidates for each limb type independently using Eqs. (13), (14), and (15). With all limb connection candidates, we can assemble the connections that share the same part detection candidates into full-body poses of multiple people … While the original approach started from a root component, our algorithm sorts all pairwise possible connections by their PAF score.”)
for at least two of the multiple key point connection candidates, determine whether one of the at least two key point connection candidates is valid, to perform selection on the multiple key point connection candidates; (Fig. 5: “Part association strategies. (a) The body part detection candidates (red and blue dots) for two body part types and all connection candidates (grey lines). (b) The connection results using the midpoint (yellow dots) representation: correct connections (black lines) and incorrect connections (green lines) that also satisfy the incidence constraint. (c) The results using PAFs (yellow arrows). By encoding position and orientation over the support of the limb, PAFs eliminate false associations.” ; 3.4 Part Affinity Fields for Part Association: “ when people crowd together—as they are prone to do—these midpoints are likely to support false associations (shown as green lines in Fig. 5b) … Part Affinity Fields (PAFs) address these limitations. They preserve both location and orientation information across the region of support of the limb (as shown in Fig. 5c). … This case is shown in Fig. 5b. In this graph matching problem, nodes of the graph are the body part detection candidates Dj1 and Dj2 , and the edges are all possible connections between pairs of detection candidates …obtain the limb connection candidates for each limb type independently using Eqs. (13), (14), and (15)”)  and 
perform action recognition on the one or more objects according to the selected key point connection candidates wherein key points belong to a same object are connected and/or key points for multiple objects are grouped respectively. (Fig. 2e: Parsing Result: full body poses for all people in the image; Fig. 5(c):The results using PAFs (yellow arrows) ; 3. Method: “The system takes, as input, a color image of size w x h (Fig. 2a) and produces the 2D locations of anatomical keypoints for each person in the image (Fig. 2e) … Finally, the confidence maps and the PAFs are parsed by greedy inference (Fig. 2d) to output the 2D keypoints for all people in the image” ; 3.5 Multi-Person Parsing Using PAFs: “This case is shown in Fig. 5b. In this graph matching problem, nodes of the graph are the body part detection candidates Dj1 and Dj2 , and the edges are all possible connections between pairs of detection candidates …obtain the limb connection candidates for each limb type independently using Eqs. (13), (14), and (15). With all limb connection candidates, we can assemble the connections that share the same part detection candidates into full-body poses of multiple people.”)
However, Cao does not disclose generate multiple key point connection candidates after traversing all relation pairs according to final relationship pairs, wherein key point connection relationship pairs are determined according to the key point information and the final relationship pair is a relationship pair with a smallest part affinity field score(s) score between two key points;
Qin discloses perform key point recognition on one or more objects an object in a video frame by using a neural network to obtain key point information and to calculate part affinity field score(s) between any two key points according to part affinity field information and the key point information; (Paragraph 46-47: “Key point generation module: Extract the key point positions of the bone features in the image data to generate a feature map. The key point production module is realized by convolutional neural network. … input image data acquisition equipment to collect images of multi-person human torso or directly use the picture through the VGG19 network to use bottom-up The method first detects the bone features of the human body to extract key points; after the key points are extracted, the key points are connected to form a multi-person posture display. In the process of key point connection, part of the affinity domain PAFs is used, and it is used to learn the body part”; Paragraph 68: “NMS non-maximum suppression is performed on the detection confidence map, and a set of discrete candidate positions are obtained. For each limb part, we may have several candidates, which define a large number of possible branches. Use the line integral calculation on PAF to score each candidate limb to find the optimal solution. In order to guide the network to repeatedly predict the confidence of key points, predict the confidence of the body part in the network module and the affinity of the key point.”; Paragraphs 81-83)
perform key point connection according to the key point information and the part affinity field score(s); (Paragraph 47: “The method first detects the bone features of the human body to extract key points; after the key points are extracted, the key points are connected to form a multi-person posture display. In the process of key point connection, part of the affinity domain PAFs is used, and it is used to learn the body part”; Paragraph 51: “Key point affinity vector field prediction network module: According to the feature map, establish the association between the feature points, generate the confidence level and generate the vector field L of each limb. Determining the connection between key points not only only looks at the detection results of all key points, but also finds an explicit feature expression based on the visual characteristics of the image to find the key point information of the human body, and proposes one of the key points of the human body affinity field prediction. … Each keypoint affinity field PAF is a two-dimensional vector field of each limb. For each pixel belonging to a specific limb area, the two-dimensional vector encodes the direction that one part of the limb points to another part. Each type of limb has a corresponding PAF connecting its two related body parts.”; Paragraph 84)
generate multiple key point connection candidates after traversing all relation pairs according to final relationship pairs, wherein key point connection relationship pairs are determined according to the key point information (Key points are used as effective key points) and the final relationship pair is a relationship pair with a smallest part affinity field score(s) score between two key points; Paragraphs 76-77: “the key point clustering module applies a threshold (for example, 0.1) to the confidence map for the key points of each type of limb to eliminate the key points whose confidence is lower than the threshold, and generates a binary graph to obtain the number of persons corresponding to this type of limb Key points are used as effective key points. Find the relationship pair connecting a key point to all other key points, and use the relationship pair with the smallest affinity between the two key points as the final relationship pair. Traverse the key point affinity vector field to predict the relationship pair finally output by the network module”; Paragraph 85)
Therefore, it would been obvious to one having ordinary skill in the art before the effective filling date of the claimed invention to modify the invention of  Cao by including key point clustering module uses the multi-objective evolutionary algorithm to optimize the connection between the key point for global optimization that is taught by Qin, to make the invention that A Openpose-based Multi-person Posture Detecting Method And System; thus, one of ordinary skilled in the art would have been motivated to combine the references since this will improving the detection efficiency as well as enhancing run-time performance and precision of maintaining high recognition accuracy.
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filling date of the claimed invention.

Regarding claim 2, Cao, as modified by Qin, discloses all the claims invention. Cao further discloses the processor is configured to: obtain part affinity field information and confidence map information of the object according to the key point recognition; (3. Method: “The system takes, as input, a color image of size w-h (Fig. 2a) and produces the 2D locations of anatomical keypoints for each person in the image (Fig. 2e). First, a feedforward network predicts a set of 2D confidence maps S of body part locations (Fig. 2b) and a set of 2D vector fields L of part affinity fields (PAFs), which encode the degree of association between parts (Fig. 2c).”)
 and obtain the key point information according to the confidence map information. (3.3. Confidence Maps for Part Detection: “we generate the groundtruth confidence maps S from the annotated 2D keypoints. Each confidence map is a 2D representation of the belief that a particular body part can be located in any given pixel. Ideally, if a single person appears in the image, a single peak should exist in each confidence map if the corresponding part is visible; if multiple people are in the image, there should be a peak corresponding to each visible part j for each person k”)

Regarding claim 7, Cao discloses an action recognition method, characterized in that the method (Abstract: “we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image”) comprises: 
performing key point recognition on one or more objects in a video frame (1. Introduction: “we consider a core component in obtaining a detailed understanding of people in images and videos: human 2D pose estimation”) by using a neural network to obtain key point information and to calculate part affinity field (PAF) score(s) between any two key points according to part affinity field information and the key point information; (Figs. 2-3 ; 3. Method – 3.1. Network Architecture – 3.2. Simultaneous Detection and Association: “First, a feedforward network predicts a set of 2D confidence maps S of body part locations (Fig. 2b) and a set of 2D vector fields L of part affinity fields (PAFs), which encode the degree of association between parts (Fig. 2c). …  The image is analyzed by a CNN (initialized by the first 10 layers of VGG-19 [53] and fine-tuned), generating a set of feature maps F that is input to the first stage. At this stage, the network produces a set of part affinity fields (PAFs) L1 ¼ f1ðFÞ, where f1 refers to the CNNs for inference at Stage 1.”; 3.5 Multi-Person Parsing Using PAFs: “ We perform non-maximum suppression on the detection confidence maps to obtain a discrete set of part candidate locations. For each part, we may have several candidates, due to multiple people in the image or false positives (Fig. 6b). These part candidates define a large set of possible limbs. We score each candidate limb using the line integral computation on the PAF, defined in Eq. (11).”)
performing key point connection according to the key point information and the part affinity field score(s); (Figs. 5-6 and 3.4 Part Affinity Fields for Part Association - 3.5 Multi-Person Parsing Using PAFs: “we measure association between candidate part detections by computing the line integral over the corresponding PAF along the line segment connecting the candidate part locations … We score each candidate limb using the line integral computation on the PAF, defined in Eq. (11) … This case is shown in Fig. 5b. In this graph matching problem, nodes of the graph are the body part detection candidates Dj1 and Dj2 , and the edges are all possible connections between pairs of detection candidates.”)
generating multiple key point connection candidates after traversing all relation pairs (all limb connection candidates), wherein key point connection relationship pairs are determined according to the key point information and a final relationship pair is determined according to the part affinity field score(s); (3.5 Multi-Person Parsing Using PAFs: “we first obtain a set of body part detection candidates DJ for multiple people, … These part detection candidates still need to be associated with other parts from the same person—in other words, we need to find the pairs of part detections that are in fact connected limbs. We define a variable to indicate whether two detection candidates dmj1 and dnj2 are connected, and the goal is to find the optimal assignment for the set of all possible connections, … the limb connection candidates for each limb type independently using Eqs. (13), (14), and (15). With all limb connection candidates, we can assemble the connections that share the same part detection candidates into full-body poses of multiple people … While the original approach started from a root component, our algorithm sorts all pairwise possible connections by their PAF score.”)
for at least two of the multiple key point connection candidates, determine whether one of the at least two key point connection candidates is valid, to perform selection on the multiple key point connection candidates; (Fig. 5: “Part association strategies. (a) The body part detection candidates (red and blue dots) for two body part types and all connection candidates (grey lines). (b) The connection results using the midpoint (yellow dots) representation: correct connections (black lines) and incorrect connections (green lines) that also satisfy the incidence constraint. (c) The results using PAFs (yellow arrows). By encoding position and orientation over the support of the limb, PAFs eliminate false associations.” ; 3.4 Part Affinity Fields for Part Association: “ when people crowd together—as they are prone to do—these midpoints are likely to support false associations (shown as green lines in Fig. 5b) … Part Affinity Fields (PAFs) address these limitations. They preserve both location and orientation information across the region of support of the limb (as shown in Fig. 5c). … This case is shown in Fig. 5b. In this graph matching problem, nodes of the graph are the body part detection candidates Dj1 and Dj2 , and the edges are all possible connections between pairs of detection candidates …obtain the limb connection candidates for each limb type independently using Eqs. (13), (14), and (15)”)  and 
performing action recognition on the one or more objects according to the selected key point connection candidates wherein key points belong to a same object are connected and/or key points for multiple objects are grouped respectively. (Fig. 2e: Parsing Result: full body poses for all people in the image; Fig. 5(c):The results using PAFs (yellow arrows) ; 3. Method: “The system takes, as input, a color image of size w x h (Fig. 2a) and produces the 2D locations of anatomical keypoints for each person in the image (Fig. 2e) … Finally, the confidence maps and the PAFs are parsed by greedy inference (Fig. 2d) to output the 2D keypoints for all people in the image” ; 3.5 Multi-Person Parsing Using PAFs: “This case is shown in Fig. 5b. In this graph matching problem, nodes of the graph are the body part detection candidates Dj1 and Dj2 , and the edges are all possible connections between pairs of detection candidates …obtain the limb connection candidates for each limb type independently using Eqs. (13), (14), and (15). With all limb connection candidates, we can assemble the connections that share the same part detection candidates into full-body poses of multiple people.”)
	However, Cao does not disclose generating multiple key point connection candidates after traversing all relation pairs according to final relationship pairs, wherein key point connection relationship pairs are determined according to the key point information and the final relationship pair is a relationship pair with a smallest part affinity field score(s) score between two key points;
Qin discloses performing key point recognition on one or more objects an object in a video frame by using a neural network to obtain key point information and to calculate part affinity field score(s) between any two key points according to part affinity field information and the key point information; (Paragraph 46-47: “Key point generation module: Extract the key point positions of the bone features in the image data to generate a feature map. The key point production module is realized by convolutional neural network. … input image data acquisition equipment to collect images of multi-person human torso or directly use the picture through the VGG19 network to use bottom-up The method first detects the bone features of the human body to extract key points; after the key points are extracted, the key points are connected to form a multi-person posture display. In the process of key point connection, part of the affinity domain PAFs is used, and it is used to learn the body part”; Paragraph 68: “NMS non-maximum suppression is performed on the detection confidence map, and a set of discrete candidate positions are obtained. For each limb part, we may have several candidates, which define a large number of possible branches. Use the line integral calculation on PAF to score each candidate limb to find the optimal solution. In order to guide the network to repeatedly predict the confidence of key points, predict the confidence of the body part in the network module and the affinity of the key point.”; Paragraphs 81-83)
performing key point connection according to the key point information and the part affinity field score(s); (Paragraph 47: “The method first detects the bone features of the human body to extract key points; after the key points are extracted, the key points are connected to form a multi-person posture display. In the process of key point connection, part of the affinity domain PAFs is used, and it is used to learn the body part”; Paragraph 51: “Key point affinity vector field prediction network module: According to the feature map, establish the association between the feature points, generate the confidence level and generate the vector field L of each limb. Determining the connection between key points not only only looks at the detection results of all key points, but also finds an explicit feature expression based on the visual characteristics of the image to find the key point information of the human body, and proposes one of the key points of the human body affinity field prediction. … Each keypoint affinity field PAF is a two-dimensional vector field of each limb. For each pixel belonging to a specific limb area, the two-dimensional vector encodes the direction that one part of the limb points to another part. Each type of limb has a corresponding PAF connecting its two related body parts.”; Paragraph 84)
generating multiple key point connection candidates after traversing all relation pairs according to final relationship pairs, wherein key point connection relationship pairs are determined according to the key point information (Key points are used as effective key points) and the final relationship pair is a relationship pair with a smallest part affinity field score(s) score between two key points; Paragraphs 76-77: “the key point clustering module applies a threshold (for example, 0.1) to the confidence map for the key points of each type of limb to eliminate the key points whose confidence is lower than the threshold, and generates a binary graph to obtain the number of persons corresponding to this type of limb Key points are used as effective key points. Find the relationship pair connecting a key point to all other key points, and use the relationship pair with the smallest affinity between the two key points as the final relationship pair. Traverse the key point affinity vector field to predict the relationship pair finally output by the network module”; Paragraph 85)
Therefore, it would been obvious to one having ordinary skill in the art before the effective filling date of the claimed invention to modify the invention of  Cao by including key point clustering module uses the multi-objective evolutionary algorithm to optimize the connection between the key point for global optimization that is taught by Qin, to make the invention that A Openpose-based Multi-person Posture Detecting Method And System; thus, one of ordinary skilled in the art would have been motivated to combine the references since this will improving the detection efficiency as well as enhancing run-time performance and precision of maintaining high recognition accuracy.
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filling date of the claimed invention.

Regarding claim 10, Cao discloses an electronic device, (4. OpenPose: “ OpenPose [4], the first real-time multiperson system to jointly detect human body, foot, hand, and facial keypoints (in total 135 keypoints) on single images.”) comprising a memory and a processor, the memory storing a computer program, and the processor being configured to execute the computer program to (4.1 System: “OpenPose overcome all of these problems. It can run on different platforms, including Ubuntu, Windows, Mac OSX, and embedded systems (e.g., Nvidia Tegra TX2). It also provides support for different hardware, such as CUDA GPUs, OpenCL GPUs, and CPU only devices. The user can select an input between images, video, webcam, and IP camera streaming. He can also select whether to display the results or save them on disk, enable or disable each detector (body, foot, face, and hand), enable pixel coordinate normalization, control how many GPUs to use,”) carry out
 perform key point recognition on one or more objects in a video frame (1. Introduction: “we consider a core component in obtaining a detailed understanding of people in images and videos: human 2D pose estimation”) by using a neural network to obtain key point information and to calculate part affinity field (PAF) score(s) between any two key points according to part affinity field information and the key point information; (Figs. 2-3 ; 3. Method – 3.1. Network Architecture – 3.2. Simultaneous Detection and Association: “First, a feedforward network predicts a set of 2D confidence maps S of body part locations (Fig. 2b) and a set of 2D vector fields L of part affinity fields (PAFs), which encode the degree of association between parts (Fig. 2c). …  The image is analyzed by a CNN (initialized by the first 10 layers of VGG-19 [53] and fine-tuned), generating a set of feature maps F that is input to the first stage. At this stage, the network produces a set of part affinity fields (PAFs) L1 ¼ f1ðFÞ, where f1 refers to the CNNs for inference at Stage 1.”; 3.5 Multi-Person Parsing Using PAFs: “ We perform non-maximum suppression on the detection confidence maps to obtain a discrete set of part candidate locations. For each part, we may have several candidates, due to multiple people in the image or false positives (Fig. 6b). These part candidates define a large set of possible limbs. We score each candidate limb using the line integral computation on the PAF, defined in Eq. (11).”)
perform key point connection according to the key point information and the part affinity field score(s); (Figs. 5-6 and 3.4 Part Affinity Fields for Part Association - 3.5 Multi-Person Parsing Using PAFs: “we measure association between candidate part detections by computing the line integral over the corresponding PAF along the line segment connecting the candidate part locations … We score each candidate limb using the line integral computation on the PAF, defined in Eq. (11) … This case is shown in Fig. 5b. In this graph matching problem, nodes of the graph are the body part detection candidates Dj1 and Dj2 , and the edges are all possible connections between pairs of detection candidates.”)
generate multiple key point connection candidates after traversing all relation pairs (all limb connection candidates), wherein key point connection relationship pairs are determined according to the key point information and a final relationship pair is determined according to the part affinity field score(s); (3.5 Multi-Person Parsing Using PAFs: “we first obtain a set of body part detection candidates DJ for multiple people, … These part detection candidates still need to be associated with other parts from the same person—in other words, we need to find the pairs of part detections that are in fact connected limbs. We define a variable to indicate whether two detection candidates dmj1 and dnj2 are connected, and the goal is to find the optimal assignment for the set of all possible connections, … the limb connection candidates for each limb type independently using Eqs. (13), (14), and (15). With all limb connection candidates, we can assemble the connections that share the same part detection candidates into full-body poses of multiple people … While the original approach started from a root component, our algorithm sorts all pairwise possible connections by their PAF score.”)
for at least two of the multiple key point connection candidates, determine whether one of the at least two key point connection candidates is valid, to perform selection on the multiple key point connection candidates; (Fig. 5: “Part association strategies. (a) The body part detection candidates (red and blue dots) for two body part types and all connection candidates (grey lines). (b) The connection results using the midpoint (yellow dots) representation: correct connections (black lines) and incorrect connections (green lines) that also satisfy the incidence constraint. (c) The results using PAFs (yellow arrows). By encoding position and orientation over the support of the limb, PAFs eliminate false associations.” ; 3.4 Part Affinity Fields for Part Association: “ when people crowd together—as they are prone to do—these midpoints are likely to support false associations (shown as green lines in Fig. 5b) … Part Affinity Fields (PAFs) address these limitations. They preserve both location and orientation information across the region of support of the limb (as shown in Fig. 5c). … This case is shown in Fig. 5b. In this graph matching problem, nodes of the graph are the body part detection candidates Dj1 and Dj2 , and the edges are all possible connections between pairs of detection candidates …obtain the limb connection candidates for each limb type independently using Eqs. (13), (14), and (15)”)  and 
perform action recognition on the one or more objects according to the selected key point connection candidates wherein key points belong to a same object are connected and/or key points for multiple objects are grouped respectively. (Fig. 2e: Parsing Result: full body poses for all people in the image; Fig. 5(c):The results using PAFs (yellow arrows) ; 3. Method: “The system takes, as input, a color image of size w x h (Fig. 2a) and produces the 2D locations of anatomical keypoints for each person in the image (Fig. 2e) … Finally, the confidence maps and the PAFs are parsed by greedy inference (Fig. 2d) to output the 2D keypoints for all people in the image” ; 3.5 Multi-Person Parsing Using PAFs: “This case is shown in Fig. 5b. In this graph matching problem, nodes of the graph are the body part detection candidates Dj1 and Dj2 , and the edges are all possible connections between pairs of detection candidates …obtain the limb connection candidates for each limb type independently using Eqs. (13), (14), and (15). With all limb connection candidates, we can assemble the connections that share the same part detection candidates into full-body poses of multiple people.”)
	However, Cao does not disclose generate multiple key point connection candidates after traversing all relation pairs according to final relationship pairs, wherein key point connection relationship pairs are determined according to the key point information and the final relationship pair is a relationship pair with a smallest part affinity field score(s) score between two key points;
Qin discloses perform key point recognition on one or more objects an object in a video frame by using a neural network to obtain key point information and to calculate part affinity field score(s) between any two key points according to part affinity field information and the key point information; (Paragraph 46-47: “Key point generation module: Extract the key point positions of the bone features in the image data to generate a feature map. The key point production module is realized by convolutional neural network. … input image data acquisition equipment to collect images of multi-person human torso or directly use the picture through the VGG19 network to use bottom-up The method first detects the bone features of the human body to extract key points; after the key points are extracted, the key points are connected to form a multi-person posture display. In the process of key point connection, part of the affinity domain PAFs is used, and it is used to learn the body part”; Paragraph 68: “NMS non-maximum suppression is performed on the detection confidence map, and a set of discrete candidate positions are obtained. For each limb part, we may have several candidates, which define a large number of possible branches. Use the line integral calculation on PAF to score each candidate limb to find the optimal solution. In order to guide the network to repeatedly predict the confidence of key points, predict the confidence of the body part in the network module and the affinity of the key point.”; Paragraphs 81-83)
perform key point connection according to the key point information and the part affinity field score(s); (Paragraph 47: “The method first detects the bone features of the human body to extract key points; after the key points are extracted, the key points are connected to form a multi-person posture display. In the process of key point connection, part of the affinity domain PAFs is used, and it is used to learn the body part”; Paragraph 51: “Key point affinity vector field prediction network module: According to the feature map, establish the association between the feature points, generate the confidence level and generate the vector field L of each limb. Determining the connection between key points not only only looks at the detection results of all key points, but also finds an explicit feature expression based on the visual characteristics of the image to find the key point information of the human body, and proposes one of the key points of the human body affinity field prediction. … Each keypoint affinity field PAF is a two-dimensional vector field of each limb. For each pixel belonging to a specific limb area, the two-dimensional vector encodes the direction that one part of the limb points to another part. Each type of limb has a corresponding PAF connecting its two related body parts.”; Paragraph 84)
generate multiple key point connection candidates after traversing all relation pairs according to final relationship pairs, wherein key point connection relationship pairs are determined according to the key point information (Key points are used as effective key points) and the final relationship pair is a relationship pair with a smallest part affinity field score(s) score between two key points; Paragraphs 76-77: “the key point clustering module applies a threshold (for example, 0.1) to the confidence map for the key points of each type of limb to eliminate the key points whose confidence is lower than the threshold, and generates a binary graph to obtain the number of persons corresponding to this type of limb Key points are used as effective key points. Find the relationship pair connecting a key point to all other key points, and use the relationship pair with the smallest affinity between the two key points as the final relationship pair. Traverse the key point affinity vector field to predict the relationship pair finally output by the network module”; Paragraph 85)
Therefore, it would been obvious to one having ordinary skill in the art before the effective filling date of the claimed invention to modify the invention of  Cao by including key point clustering module uses the multi-objective evolutionary algorithm to optimize the connection between the key point for global optimization that is taught by Qin, to make the invention that A Openpose-based Multi-person Posture Detecting Method And System; thus, one of ordinary skilled in the art would have been motivated to combine the references since this will improving the detection efficiency as well as enhancing run-time performance and precision of maintaining high recognition accuracy.
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filling date of the claimed invention.

Claim(s) 3-6, 8-9 and 11-12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Cao Zhe et al (“OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields”; Cao), in view of Qin Yongyin et al ( CN-111310625 A; Qin), and in further view of Ning et al (U.S. 20210090284 A1; Ning).

Regarding claim 3, Cao, as modified by Qin, discloses all the claims invention except wherein the at least two key point connection candidates include a first key point connection candidate and a second key point connection candidate; and the processor is configured to determine whether the first key point connection candidate is valid according to a position and size of a bounding box of the first key point connection candidate and a position and size of a bounding box of the second key point connection candidate.
Ning discloses the at least two key point connection candidates include a first key point connection candidate and a second key point connection candidate; (Paragraph 134: “The re-identification module 180 has a SGCN structure, the input is bounding boxes and keypoints for comparison, and the output are two pose feature vectors or specifically a distance between the two pose feature vectors, and a determination of whether the distance is small enough that the two objects defined by the bounding boxes and keypoints are the same object.”)  and the processor is configured to determine whether the first key point connection candidate is valid according to a position and size of a bounding box of the first key point connection candidate and a position and size of a bounding box of the second key point connection candidate. (Paragraph 92: “ The keypoints defines pose of the object, the number of the keypoints for each object is 15, and the 15 keypoints are, for example, “right knee,” “left knee, … the inferred bounding box is inferred by: defining an enclosing box (or namely minimum bounding box or smallest bounding box) with the smallest measure within which all the points lie, and enlarging the rectangular box by 20% in horizontal direction and vertical direction to obtain the inferred bounding box. The enclosing box is also called minimum bounding box or smallest bounding box.”)
Therefore, it would been obvious to one having ordinary skill in the art before the effective filling date of the claimed invention to modify the invention of  Cao and Quin by including a system for pose tracking that is taught by Ning, to make the invention that A system and a method for pose tracking, particularly for top-down, online, multi-person pose tracking; thus, one of ordinary skilled in the art would have been motivated to combine the references since this will improving the pose tracking performance such as conveniently track specific targets as well as the tracking is performed using keypoints based on enlarged regions, which is fast yet accurate.
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filling date of the claimed invention.

Regarding claim 4, Cao, as modified by Qin and Ning, discloses all the claims invention. Ning further discloses the processor is further configured to: according to the number of key points of the first key point connection candidate and the number of key points of the second key point connection candidate, (Paragraph 140: “At procedure 706, the pose estimation module 162 determines keypoints for each of the bounding box. Now each object is featured with frame ID and its bounding box coordinates, and the keypoints are featured with category ID and their coordinates. The category ID of the keypoints corresponds to parts of a human body, such as 15 keypoints corresponding to head, shoulder, elbow, wrist, waist, knee, ankle, etc. Each of the keypoints has a confidence score.”) determine whether the first key point connection candidate is valid. (Paragraph 141: “At procedure 708, the re-identification module 180 matches pose of the detected object (with bounding box/keypoints) to poses of stored objects (with bounding boxes/keypoints). When there is a match, the re-identification module 180 assigns the detected object an object ID the same as the object ID of the matched object.”)

Regarding claim 5, Cao, as modified by Qin and Ning, discloses all the claims invention. Ning further discloses the processor is configured to: in a case where the bounding box of the first key point connection candidate (Fig.5: 510- total 15 key points) is covered by the bounding box of the second key point connection candidate, (Fig.5: connection of point 514 to 518b) determine whether the number of key points of the first key point connection candidate is less than a first proportion of a total number of key points of the object and determine whether the number of key points of the second key point connection candidate is greater than a second proportion of the total number of the key points of the object, (Paragraph 105: “The sequential keypoints, each corresponding to a specific part of human body, indicate pose of the human object. The result may include the frame ID, the object ID, coordinates of the bounding box, coordinates of the keypoints, and confidence score (or average confidence score) of the keypoints. The pose estimation module 162 takes the advantages that an object's coarse location defined by the bounding box helps the determination of the keypoints of the object, and the keypoints of the object indicates rough location of the object in the frame. By recurrently estimating the object location and the keypoints of the object, the pose estimation module 162 may estimate object pose and determine the object keypoints accurately and efficiently.”; Paragraph 106: “the object state module 164 determines the possibility the determined keypoints reside in the region of the current frame that is covered by the inferred bounding box. … An average of the confidence scores s of the estimated keypoints of the object is calculated and compared to a standard error τ.sub.s.”) and 
in a case where the number of key points of the first key point connection candidate is less than the first proportion of the total number of the key points of the object and the number of key points of the second key point connection candidate is greater than the second proportion of the total number of the key points of the object, determine that the first key point connection candidate is invalid and discard the first key point connection candidate. (Fig.5 and Fig. 7; Paragraph 140: “The category ID of the keypoints corresponds to parts of a human body, such as 15 keypoints corresponding to head, shoulder, elbow, wrist, waist, knee, ankle, etc. Each of the keypoints has a confidence score.” ; Paragraph 144: “At procedure 712, the pose estimation module 162 uses the inferred bounding box and the (j+1)-th frame to estimate keypoints in the (j+1)-th frame in the area covered by the inferred bounding box. The pose estimation module 162 output heatmaps for estimating keypoints, and each keypoint has a confidence score based on the heatmap.”; Paragraph 145: “At procedure 714, the object state module 164 calculates object state based on the confidence score of the estimated keypoints in the (j+1)-th frame. When the averaged confidence score is large, the object state is “tracked.” … When the averaged confidence score is small, the object state is “lost.”)

Regarding claim 6, Cao, as modified by Qin and Ning, discloses all the claims invention. Ning further discloses the bounding box of the first key point connection candidate is a smallest first rectangular box containing all key points in the first key point connection candidate;(Paragraph 92: “defining an enclosing box (or namely minimum bounding box or smallest bounding box) with the smallest measure within which all the points lie, and enlarging the rectangular box by 20% in horizontal direction and vertical direction to obtain the inferred bounding box. The enclosing box is also called minimum bounding box or smallest bounding box. … The enclosing box is provided by picking up the top-most keypoint, the lowest keypoint, the leftmost keypoint, and the rightmost keypoint from the 15 keypoints, and draw a rectangular box.”) and 
the bounding box of the second key point connection candidate is a smallest second rectangular box containing all key points in the second key point connection candidate, or the bounding box of the second key point connection candidate is a rectangular box obtained by expanding longer sides of the second rectangular box by a proportion and/or by expanding shorter sides of the second rectangular box by a proportion. (Paragraph 92: “The pose tracking module 160 is then configured to infer an inferred bounding box for each object from the determined keypoints. In certain embodiments, the inferred bounding box is inferred by: defining an enclosing box (or namely minimum bounding box or smallest bounding box) with the smallest measure within which all the points lie, and enlarging the rectangular box by 20% in horizontal direction and vertical direction to obtain the inferred bounding box. The enclosing box is also called minimum bounding box or smallest bounding box.”)

Regarding claim 8, Cao, as modified by Qin, discloses all the claims invention except wherein the at least two key point connection candidates comprise a first key point connection candidate and a second key point connection candidate; and the performing selection on the multiple key point connection candidates comprises: determining whether the first key point connection candidate is valid according to a position and size of a bounding box of the first key point connection candidate and a position and size of a bounding box of the second key point connection candidate.
Ning discloses the at least two key point connection candidates comprise a first key point connection candidate and a second key point connection candidate; (Paragraph 134: “The re-identification module 180 has a SGCN structure, the input is bounding boxes and keypoints for comparison, and the output are two pose feature vectors or specifically a distance between the two pose feature vectors, and a determination of whether the distance is small enough that the two objects defined by the bounding boxes and keypoints are the same object.”) and the performing selection on the multiple key point connection candidates comprises: determining whether the first key point connection candidate is valid according to a position and size of a bounding box of the first key point connection candidate and a position and size of a bounding box of the second key point connection candidate. (Paragraph 92: “ The keypoints defines pose of the object, the number of the keypoints for each object is 15, and the 15 keypoints are, for example, “right knee,” “left knee, … the inferred bounding box is inferred by: defining an enclosing box (or namely minimum bounding box or smallest bounding box) with the smallest measure within which all the points lie, and enlarging the rectangular box by 20% in horizontal direction and vertical direction to obtain the inferred bounding box. The enclosing box is also called minimum bounding box or smallest bounding box.”)
Therefore, it would been obvious to one having ordinary skill in the art before the effective filling date of the claimed invention to modify the invention of  Cao and Quin by including a system for pose tracking that is taught by Ning, to make the invention that A system and a method for pose tracking, particularly for top-down, online, multi-person pose tracking; thus, one of ordinary skilled in the art would have been motivated to combine the references since this will improving the pose tracking performance such as conveniently track specific targets as well as the tracking is performed using keypoints based on enlarged regions, which is fast yet accurate.
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filling date of the claimed invention.
Regarding claim 9, Cao, as modified by Qin and Ning, discloses all the claims invention. Ning further discloses the performing selection on the multiple key point connection candidates further comprises: according to the number of key points of the first key point connection candidate and the number of key points of the second key point connection candidate, (Paragraph 140: “At procedure 706, the pose estimation module 162 determines keypoints for each of the bounding box. Now each object is featured with frame ID and its bounding box coordinates, and the keypoints are featured with category ID and their coordinates. The category ID of the keypoints corresponds to parts of a human body, such as 15 keypoints corresponding to head, shoulder, elbow, wrist, waist, knee, ankle, etc. Each of the keypoints has a confidence score.”) determining whether the first key point connection candidate is valid. (Paragraph 141: “At procedure 708, the re-identification module 180 matches pose of the detected object (with bounding box/keypoints) to poses of stored objects (with bounding boxes/keypoints). When there is a match, the re-identification module 180 assigns the detected object an object ID the same as the object ID of the matched object.”)

Regarding claim 11, Cao, as modified by Qin, discloses all the claims invention except wherein the at least two key point connection candidates include a first key point connection candidate and a second key point connection candidate: (Paragraph 134: “The re-identification module 180 has a SGCN structure, the input is bounding boxes and keypoints for comparison, and the output are two pose feature vectors or specifically a distance between the two pose feature vectors, and a determination of whether the distance is small enough that the two objects defined by the bounding boxes and keypoints are the same object.”) and the processor is configured to determine whether the first key point connection candidate is valid according to a position and size of a bounding box of the first key point connection candidate and a position and size of a bounding box of the second key point connection candidate. (Paragraph 92: “ The keypoints defines pose of the object, the number of the keypoints for each object is 15, and the 15 keypoints are, for example, “right knee,” “left knee, … the inferred bounding box is inferred by: defining an enclosing box (or namely minimum bounding box or smallest bounding box) with the smallest measure within which all the points lie, and enlarging the rectangular box by 20% in horizontal direction and vertical direction to obtain the inferred bounding box. The enclosing box is also called minimum bounding box or smallest bounding box.”)
Therefore, it would been obvious to one having ordinary skill in the art before the effective filling date of the claimed invention to modify the invention of  Cao and Quin by including a system for pose tracking that is taught by Ning, to make the invention that A system and a method for pose tracking, particularly for top-down, online, multi-person pose tracking; thus, one of ordinary skilled in the art would have been motivated to combine the references since this will improving the pose tracking performance such as conveniently track specific targets as well as the tracking is performed using keypoints based on enlarged regions, which is fast yet accurate.
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filling date of the claimed invention.

Regarding claim 12, Cao, as modified by Qin and Ning, discloses all the claims invention. Ning further discloses the processor is further configured to: according to the number of key points of the first key point connection candidate and the number of key points of the second key point connection candidate, (Paragraph 140: “At procedure 706, the pose estimation module 162 determines keypoints for each of the bounding box. Now each object is featured with frame ID and its bounding box coordinates, and the keypoints are featured with category ID and their coordinates. The category ID of the keypoints corresponds to parts of a human body, such as 15 keypoints corresponding to head, shoulder, elbow, wrist, waist, knee, ankle, etc. Each of the keypoints has a confidence score.”)  determine whether the first key point connection candidate is valid. (Paragraph 141: “At procedure 708, the re-identification module 180 matches pose of the detected object (with bounding box/keypoints) to poses of stored objects (with bounding boxes/keypoints). When there is a match, the re-identification module 180 assigns the detected object an object ID the same as the object ID of the matched object.”)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Black et al (U.S. 20100111370 A1), “Method and Apparatus for Estimating Body Shape”, teaches about estimation of human body shape using a low-dimensional 3D model using sensor data and other forms of input data that may be imprecise, ambiguous or partially obscured. It also teaches about a system and method to estimate human body shape from sensor data where that data is imprecise, ambiguous or partially obscured is described. To make this possible, a low-dimensional 3D model of the human body is employed that accurately captures details of the human form. The method fits the body model to sensor measurements and, because it is low-dimensional, many fewer and less accurate measurements are needed.
Yoo et al (U.S. 20130028517 A1), “Apparatus, Method and Medium Detecting Object Pose”, teaches about An apparatus and method detecting an object pose are provided. Key joint data of an object may be extracted, a candidate pose may be generated based on the extracted key joint data, and a most likely pose may be retrieved using a database, based on the generated candidate pose.
Fang et al (U.S. 20190279014 A1), “Method and Apparatus for Detecting Object Keypoint, and Electronic Device”, teaches about a method and an apparatus for detecting an object keypoint, an electronic device, a computer-readable storage medium, and a computer program. It also teaches about a method and an apparatus for detecting an object keypoint include: obtaining a respective feature map of at least one local regional proposal box of an image to be detected, the at least one local regional proposal box corresponding to at least one target object; and separately performing target object keypoint detection on a corresponding local regional proposal box of the image to be detected according to the feature map of the at least one local regional proposal box.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Duy A Tran whose telephone number is (571)272-4887. The examiner can normally be reached Monday-Friday 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ONEAL R MISTRY can be reached at (313)-446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DUY TRAN/            Examiner, Art Unit 2674                                                                                                                                                                                            

/ONEAL R MISTRY/            Supervisory Patent Examiner, Art Unit 2674
Read full office action
Prosecution Timeline

Show 2 earlier events
Jan 29, 2025
Response Filed
Mar 13, 2025
Final Rejection mailed — §103
Jun 05, 2025
Response after Non-Final Action
Jul 09, 2025
Request for Continued Examination
Jul 10, 2025
Response after Non-Final Action
Aug 12, 2025
Non-Final Rejection mailed — §103
Nov 12, 2025
Response Filed
Jan 15, 2026
Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/332,927
Patent 12632960
CIGAR TOBACCO LEAF HARVESTING MATURITY IDENTIFICATION METHOD AND SYSTEM BASED ON INTEGRATED LEARNING
2y 11m to grant Granted May 19, 2026
18/085,007
Patent 12614277
OUTPUT DEVICE, METHOD, NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM AND DISPLAY DEVICE
3y 4m to grant Granted Apr 28, 2026
18/035,858
Patent 12608979
GESTURE RECOGNITION APPARATUS AND METHOD FOR RECOGNIZING GESTURE
2y 11m to grant Granted Apr 21, 2026
18/176,497
Patent 12608797
MEDICAL IMAGE DETECTION SYSTEM, TRAINING METHOD AND MEDICAL ANALYZATION METHOD
3y 1m to grant Granted Apr 21, 2026
17/947,989
Patent 12573024
IMAGE AUGMENTATION FOR MACHINE LEARNING BASED DEFECT EXAMINATION
3y 5m to grant Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

5-6
Expected OA Rounds
80%
Grant Probability
99%
With Interview (+18.4%)
2y 10m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 133 resolved cases by this examiner. Grant probability derived from career allowance rate.