Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 2/4/2026 with respect to the applied prior art of Jun and Wang have been fully considered, but they are not persuasive for the following reasons. Applicant argues that Jun does not appear to disclose or suggest applying the same convolution (the “second convolution”) in both the higher- and lower-resolution branches. Furthermore, applicant argues that applying, to the “first output” and “first suboutput”, the same “second convolution” to extract a “second set of features” and “second set of subfeatures” cannot be found in the prior art of record. Additionally, applicant argues that Jun and Wang do not disclose applying a first convolution and a second convolution to generate the joint and bone heatmaps. Lastly, applicant argues that neither Jun nor Wang discloses generating “(i) one or more first joint heatmaps of the first object, (ii) one or more second joint heatmaps of the second object, (iii) one or more first bone heatmaps of the first object, and (iv) one or more second bone heatmaps of the second object” from merging the second output and the second suboutput.
While the examiner concedes that Jun et al. fails to fully disclose a neural network engine that includes a higher-resolution branch and a lower-resolution branch, the backbone HRNet used by Jun is derived from Wang. Thus, Jun and Wang in combination still disclose generating the claimed heatmaps using at least two convolutions. Two convolutions are used in the neural network with a function analogous to that of the claimed invention, namely to generate heatmaps representative of the skeleton, joints, and bones.
Applicant’s arguments with respect to amended claims 1, 8, 9 and 17 have been considered but are moot in view of the additional prior art to Choutas et al. (European Patent EP 3547211 A1). Choutas et al. addresses the limitations discussed above that applicant argues Jun and Wang fail to disclose, beginning with the proposed architecture of the CNN. Broken down, the CNN used advantageously in Choutas et al. comprises 6 convolutional layers CONV, 3 blocks with 2 convolutional layers CONV each, and 1 fully-connected layer FC, as represented by Figure 7 of Choutas. The heatmaps are generated by applying an “activation function” in the non-linear layer to the inputted information, a first normalized image and a second normalized image, which are used as a representation of the evolution of the position estimate of a keypoint over the course of the video. In this process, the input image has a higher resolution than the heatmap produced; the spatial resolution of a heatmap can be lower than that of the input frame due to the stride of the network. Additionally, the first server (learning server) may be merged with the second server (classification server). Furthermore, the steps of training the CNN involve equivalent sub-steps within each server to generate the joint and bone heatmaps distinctly. Finally, for each keypoint (bone, joint, or body part of interest), the heatmaps are aggregated into at least one image. Thus, Choutas effectively addresses all the limitations, in terms of CNN architecture and function, whose details are not included in the combination of Jun and Wang.
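For purposes of illustration only, per-keypoint heatmap generation and aggregation of the kind described above can be sketched as follows. This is a generic Gaussian-heatmap sketch, not the actual implementation of Choutas et al.; all names, grid sizes, and keypoint positions are hypothetical.

```python
import numpy as np

def keypoint_heatmap(h, w, cx, cy, sigma=1.5):
    """Gaussian heatmap peaking at keypoint (cx, cy) on an h x w grid."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

# one heatmap per keypoint of interest (e.g., a joint), then aggregate
keypoints = [(3, 4), (10, 12)]                  # hypothetical (x, y) positions
maps = [keypoint_heatmap(16, 16, x, y) for x, y in keypoints]
aggregated = np.max(np.stack(maps), axis=0)     # aggregate into one image
```

Each per-keypoint map peaks (value 1.0) at its keypoint; taking the per-pixel maximum aggregates all keypoints into a single image.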
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 8-14, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Jun et al. (Jun, J., Lee, J., & Kim, C. (2020). Human Pose Estimation Using Skeletal Heatmaps. 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 1287-1292.), in view of Wang et al. (Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., & Xiao, B. (2019). Deep High-Resolution Representation Learning for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 3349-3364.), and further in view of Choutas et al. (European Patent EP 3547211 A1).
Regarding independent claims 1 and 9, Jun et al. discloses an apparatus and method (Jun et al.: implicitly uses a computer for training and inference of the HRNet, figure 2) comprising:
a communications interface at which to receive, from an image source, raw image data (Jun et al.: first paragraph in section II; the input image in figures 1a and 2) that includes a representation of a first object and a second object (Jun et al.: section III.A; figure 5);
a memory storage unit in which to store the raw image data (Jun et al.: implicit in section III.A, storing the dataset for training and inference).
However, Jun et al. fails to fully disclose a neural network engine that includes a higher-resolution branch and a lower-resolution branch and that is configured to:
apply, by the higher-resolution branch to the raw image data, a first convolution to extract a first set of features that is representative of a first output, downsample the first output to extract a first set of subfeatures that is representative of a first suboutput, apply, by the higher-resolution branch to the first output, a second convolution to extract a second set of features that is representative of a second output, apply, by the lower-resolution branch to the first suboutput, the second convolution to extract a second set of subfeatures that is representative of a second suboutput, and merge the second output and the second suboutput to generate
one or more first joint heatmaps of the first object,
one or more second joint heatmaps of the second object,
one or more first bone heatmaps of the first object, and
one or more second bone heatmaps of the second object.
Jun et al. incorporates the HRNet implementation details of Wang et al.
Jun et al. and Wang et al. disclose a neural network engine to apply a first convolution to the raw data to extract first features from a first output (Jun et al.: first blue blocks before downsampling in the first line in figure 2; Wang et al.: the stem in the first paragraph in section 3 and the yellow blocks before downsampling in the first line in figure 2), to downsample the first output to extract a first set of subfeatures from a first suboutput (Jun et al.: first blue block in the second line in figure 2; Wang et al.: the first orange block in the second line in figure 2), to apply a second convolution to the first output to extract a second set of features from a second output (Jun et al.: second blue blocks before the first fusion step in the first line in figure 2; Wang et al.: second yellow blocks before the first fusion step in the first line in figure 2), and to apply the second convolution to the first suboutput to extract a second set of subfeatures from a second suboutput (Jun et al.: second blue block in the second line in figure 2; Wang et al.: the last orange block before the first fusion step in the second line in figure 2), wherein the second output and the second suboutput are merged (Jun et al.: first fusion step in figure 2; Wang et al.: first fusion step in figure 2, sections 3.1-3.2, figure 3) to generate joint heatmaps of the first object and the second object (Jun et al.: figure 1f, output heatmaps in figure 2, first paragraph in section II, multiple objects in figure 5) and bone heatmaps of the first object and the second object (Jun et al.: paragraph 6 in section I: "Given an image in Figure. 1(a), skeletal attention module produces skeletal heatmaps as shown in Fig. 1(b)"; HS in figure 3, section II.C).
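For purposes of illustration only, the multi-resolution flow mapped above (first convolution, downsampling into a parallel branch, the same second convolution applied in each branch, and fusion of the two results) can be sketched as follows. This is a minimal single-channel sketch, not the actual network of Jun et al. or Wang et al.; the kernels and sizes are placeholders.

```python
import numpy as np

def conv2d(x, k):
    """Minimal single-channel 'valid' 2-D convolution (illustrative only)."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

img = np.random.rand(32, 32)              # stand-in for the raw image data
k1 = np.full((3, 3), 1.0 / 9.0)           # "first convolution" (placeholder kernel)
k2 = np.full((3, 3), 1.0 / 9.0)           # "second convolution" (shared by both branches)

first_output = conv2d(img, k1)            # higher-resolution branch
first_suboutput = first_output[::2, ::2]  # downsample into the lower-resolution branch

second_output = conv2d(first_output, k2)        # second convolution, higher branch
second_suboutput = conv2d(first_suboutput, k2)  # same second convolution, lower branch

# fusion: upsample the lower-resolution result and add it to the higher one
up = np.kron(second_suboutput, np.ones((2, 2)))  # nearest-neighbour upsampling
h, w = up.shape
merged = second_output[:h, :w] + up              # crop-and-add merge
```

The point of the sketch is structural: one kernel (`k2`) is applied in both branches at different resolutions before the fusion step, mirroring the claimed "second convolution".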
Neither Jun nor Wang discloses applying a first convolution and a second convolution to generate the joint and bone heatmaps, nor generating “(i) one or more first joint heatmaps of the first object, (ii) one or more second joint heatmaps of the second object, (iii) one or more first bone heatmaps of the first object, and (iv) one or more second bone heatmaps of the second object” from merging the second output and the second suboutput. Wang discloses an HRNet that utilizes four stages.
Choutas et al. teaches a communications interface to receive raw image data (Choutas et al.: para. 0059) that includes a representation of a first object and a second object (Choutas et al.: figure 2); a memory storage unit in which to store the raw data; and a neural network engine (Choutas et al.: para. 0002) that includes a higher-resolution branch and a lower-resolution branch configured to: apply a first convolution to extract a first set of features that is representative of a first output; downsample the first output to extract a first set of subfeatures that is representative of a first suboutput; apply, by the higher-resolution branch to the first output, a second convolution to extract a second set of features that is representative of a second output; apply, by the lower-resolution branch to the first suboutput, the second convolution to extract a second set of subfeatures that is representative of a second suboutput (Choutas et al.: para. 0014-0019); and merge the second output and the second suboutput to generate joint heatmaps (Choutas et al.: para. 0044) of the first object and the second object, and bone heatmaps (Choutas et al.: para. 0043) of the first object and the second object (Choutas et al.: para. 0014-0019, figure 5). Choutas et al. also teaches a neural network engine that includes a higher-resolution branch and a lower-resolution branch (Choutas et al.: para. 0044; the spatial resolution of a heatmap can be lower than that of the input frame due to the stride of the network).
Dependent claims 2-6 and 8 rely on the apparatus of claim 1, and dependent claims 10-14 rely on the method of claim 9.
Claim 8 recites the apparatus of claim 1, wherein the second output and the second suboutput are merged to generate (i) a first plurality of joint heatmaps of the first object, wherein each of the first plurality of joint heatmaps corresponds to a different joint of the first object, (ii) a second plurality of joint heatmaps of the second object, wherein each of the second plurality of joint heatmaps corresponds to a different joint of the second object, (iii) the one or more first bone heatmaps of the first object, wherein each of the one or more first bone heatmaps corresponds to a different pair of the first plurality of joint heatmaps, and (iv) the one or more second bone heatmaps of the second object, wherein each of the one or more second bone heatmaps corresponds to a different pair of the second plurality of joint heatmaps.
Jun et al. and Wang et al. further disclose that the second output and the second suboutput are merged (Jun et al.: first fusion step in figure 2; Wang et al.: first fusion step in figure 2, sections 3.1-3.2, figure 3) to generate joint heatmaps of the first object and the second object (Jun et al.: figure 1f, output heatmaps in figure 2, first paragraph in section II, multiple objects in figure 5) and bone heatmaps of the first object and the second object (Jun et al.: paragraph 6 in section I: "Given an image in Figure. 1(a), skeletal attention module produces skeletal heatmaps as shown in Fig. 1(b)"; HS in figure 3, section II.C).
Claims 2 and 10 pertain to generating a first merged output by upsampling the second suboutput and merging it with the second output. Jun et al. discloses the claimed first merged output, which corresponds to the fifth blue block in the first line in figure 2.
Claims 3 and 11 recite generating a first merged suboutput by downsampling the second output and merging it with the second suboutput. Jun et al. shows the third blue block in the second line in figure 2, which corresponds to the claimed first merged suboutput.
Claims 4 and 12 recite applying a third convolution to the first merged output to generate a third output, and applying the third convolution to the first merged suboutput to generate a third suboutput. This is disclosed by the sixth blue block in the first line and the fourth blue block in the second line of figure 2 in Jun et al., respectively.
Claims 5-6 and 13-14 specify that the first set of features are low-level features and, more specifically, edges. This subject matter is implicitly disclosed in Jun et al. because a CNN's initial layers, operating at high resolution, are generally responsible for extracting low-level features such as edges and textures from the input. Additionally, the specification states in paragraph [0027] that the "initial convolution may be carried out on the initial STEM outputs to extract low level features such as edges in the image".
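As a brief illustration of why early convolutional layers respond to edges, consider the classic Sobel kernel, a standard edge-sensitive filter. This toy example is illustrative only and is not drawn from any of the applied references; the image and patch positions are hypothetical.

```python
import numpy as np

# classic Sobel kernel: a low-level, edge-sensitive convolution filter
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

# toy image: dark left half, bright right half (a vertical edge at column 4)
img = np.zeros((8, 8))
img[:, 4:] = 1.0

# filter response on a patch straddling the edge vs. a uniform patch
resp_edge = np.sum(img[2:5, 3:6] * sobel_x)  # strong response at the edge
resp_flat = np.sum(img[2:5, 0:3] * sobel_x)  # zero response in a flat region
```

The kernel produces a large response only where intensity changes sharply, which is the sense in which high-resolution early layers extract "edges" as low-level features.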
Regarding claim 17, Choutas et al. teaches a non-transitory computer readable medium encoded with codes, wherein the codes are to direct a processor (Choutas et al.: para. 0019) to: receive raw image data (Choutas et al.: para. 0059) from an image source via a communications interface, wherein the raw image data includes a representation of a first object and a second object (Choutas et al.: figure 2); store the raw image data in a memory storage unit; apply, by the higher-resolution branch to the raw image data, a first convolution to extract a first set of features that is representative of a first output; downsample the first output to extract a first set of subfeatures that is representative of a first suboutput; apply, by the higher-resolution branch to the first output, a second convolution to extract a second set of features that is representative of a second output; apply, by the lower-resolution branch to the first suboutput, the second convolution to extract a second set of subfeatures that is representative of a second suboutput; and merge the second output (Choutas et al.: para. 0014-0019) and the second suboutput to generate (i) joint heatmaps (Choutas et al.: para. 0044) of the first object and the second object and (ii) bone heatmaps (Choutas et al.: para. 0043) of the first object and the second object (Choutas et al.: para. 0014-0019; figure 5).
Jun et al. and Wang et al. fail to disclose a non-transitory computer readable medium encoded with codes, wherein the codes are to direct a processor to carry out these tasks at the scale of the claimed invention. It is critical to the claimed invention to have a machine that allows for the storage of the instructions for the tasks carried out by the CNN model. Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) to combine the teachings of Jun et al. and Wang et al. with the teachings of Choutas et al. to allow for the storage of the computer instructions of Jun et al. and Wang et al.
Claims 18-20 recite the non-transitory computer readable medium of claim 17, wherein the codes are to direct the processor to carry out the tasks performed by the apparatus as outlined in claims 2-4.
Claim 18 recites wherein the codes are to direct the processor to upsample the second suboutput and to merge the second suboutput with the second output to generate a first merged output. Choutas et al. discloses this limitation in paragraphs 0014-0019.
Claim 19 recites wherein the codes are to direct the processor to downsample the second output and to merge the second output with the second suboutput to generate a first merged suboutput. Choutas et al. discloses this limitation in paragraphs 0014-0019.
Claim 20 recites wherein the codes are to direct the processor to apply a third convolution to the first merged output to generate a third output, and to apply the third convolution to the first merged suboutput to generate a third suboutput. Choutas et al. discloses this limitation in paragraphs 0061-0062.
Claims 7 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Jun et al., Wang et al., and Choutas et al. as applied to claims 1 and 9 above, and further in view of Wang Fei (US 2019/311223 A1).
Claims 7 and 15-16 recite that downsampling comprises maximum pooling and upsampling comprises a deconvolution operation. Wang et al. discloses that HRNet performs downsampling by a strided convolution and upsampling by bilinear interpolation (Wang et al.: section 3.2, figure 3). However, using maximum pooling and deconvolution for down- and upsampling, respectively, is regarded as an obvious alternative in the context of CNNs. Wang Fei performs downsampling (Wang Fei: paragraph [0067]) and upsampling (Wang Fei: paragraph [0052]) as claimed.
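For purposes of illustration only, the two claimed operations can be sketched generically as follows. This is not the implementation of Wang et al. or Wang Fei; the array values and the 2x2 kernel are hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 maximum pooling with stride 2 (downsampling)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def deconv_2x2(x, k):
    """Transposed ('de-') convolution, 2x2 kernel, stride 2 (upsampling)."""
    h, w = x.shape
    out = np.zeros((h * 2, w * 2))
    for i in range(h):
        for j in range(w):
            out[2 * i:2 * i + 2, 2 * j:2 * j + 2] += x[i, j] * k
    return out

x = np.arange(16.0).reshape(4, 4)
pooled = max_pool_2x2(x)                         # 4x4 -> 2x2, keeps each window's maximum
upsampled = deconv_2x2(pooled, np.ones((2, 2)))  # 2x2 -> 4x4
```

Both operations change spatial resolution by a factor of two, which is why they are interchangeable, as a design choice, with the strided convolution and bilinear interpolation used in HRNet.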
Response to Amendment
Applicant’s amendments to claim 1 effectively resolve the lack of antecedent basis issue under 35 U.S.C. 112(b) previously noted in the non-final rejection.
Applicant's arguments filed 2/4/2026 have been fully considered, but they are not persuasive. The prior art of record in combination addresses all the limitations and features of the claims as amended, which cover the solution of the invention as a whole. The amendments do not overcome the prior art rejections of the previous non-final rejection.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA YIFANG LIN whose telephone number is (571)272-6435. The examiner can normally be reached M-F 7:00am-6:15pm, with an optional day off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le can be reached at 571-272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JESSICA YIFANG LIN/
Examiner, Art Unit 2668
March 10, 2026
/VU LE/Supervisory Patent Examiner, Art Unit 2668