Prosecution Insights
Last updated: April 19, 2026
Application No. 18/696,147

APPARATUS FOR GENERATING 3-DIMENSIONAL OBJECT MODEL AND METHOD THEREOF

Status: Non-Final OA (§103)
Filed: Mar 27, 2024
Examiner: TUNG, KEE M
Art Unit: 2611
Tech Center: 2600 (Communications)
Assignee: Nextdoor Co. Ltd.
OA Round: 1 (Non-Final)

Grant Probability: 8% (At Risk)
Expected OA Rounds: 1-2
Time to Grant: 3y 0m
With Interview: 18%

Examiner Intelligence

This examiner grants only 8% of cases.

Career Allow Rate: 8% (15 granted / 189 resolved; -54.1% vs TC avg)
Interview Lift: +10.6% (moderate, roughly +11%) among resolved cases with an interview
Typical Timeline: 3y 0m average prosecution; 12 applications currently pending
Career History: 201 total applications across all art units

Statute-Specific Performance

§101: 9.3% (-30.7% vs TC avg)
§103: 56.3% (+16.3% vs TC avg)
§102: 17.8% (-22.2% vs TC avg)
§112: 11.2% (-28.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 189 resolved cases.

Office Action

§103
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Status of Claims

Claims 1-14 are currently pending in this application.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on March 27, 2024 is hereby acknowledged. All references have been considered by the examiner. Initialed copies of the PTO-1449 are included in this correspondence.

Claim Objections

Claim 10 is objected to due to minor informalities: a) Claim 10, line 1 recites “the deep learning module is two or more”; it should read “there are two or more deep learning modules”. Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 4-5, 7, 11 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Brookshire et al. (2021/0019507) in view of Awai et al. (2022/0083771).

Regarding claim 1, Brookshire teaches an apparatus for generating a 3-dimensional object model (e.g., A skeleton fitting technique, such as a nonlinear least squares (NONLINLSQ) skeleton optimization further reduces this error to around 3 cm. The skeleton is then converted to a skinned multi-person linear (SMPL) model representation via skeleton conversion. The SMPL model attaches a “flesh” mesh to the skeleton, allowing the skeleton to be further refined against 3D point cloud data from Dense Stereo reconstruction, reducing the average error to around 1 cm. Brookshire: [0049] L.21-27), the apparatus comprising:

a memory for storing one or more instructions (e.g., an apparatus including a processor and a memory, coupled to the processor, the memory having stored therein at least one of programs or instructions executable by the processor. Brookshire: [0021] L.4-7); and

a processor configured to execute the stored instructions (e.g., the processor executes the programs or instructions, Brookshire: [0021] L.8-9) to perform:

a motion of acquiring two-dimensional skeleton information extracted from a two-dimensional image of a target object (e.g., to synchronously capture images of a human moving through an area from a plurality of different points of view, for each of the plurality of captured images, determine a bounding box that bounds the human in the captured image and identify pixel locations of the bounding box in the image, for each of the plurality of captured images, determine at least one of a 2D skeleton and a single-view 3D skeleton from the identified pixel locations of the bounding box, Brookshire: [0021] L.9-17; see 1_1 below);

a motion of converting the two-dimensional skeleton information into three-dimensional skeleton information through a deep learning module (e.g., determine at least one of a 2D skeleton and a single-view 3D skeleton from the identified pixel locations of the bounding box, Brookshire: [0021] L.9-17; see 1_2 below); and

a motion of generating a three-dimensional model for the target object based on the three-dimensional skeleton information (e.g., The skeleton is then converted to a skinned multi-person linear (SMPL) model representation via skeleton conversion. The SMPL model attaches a “flesh” mesh to the skeleton, allowing the skeleton to be further refined against 3D point cloud data from Dense Stereo reconstruction, reducing the average error to around 1 cm. The final output is the 3D position of each joint in the skeleton. Brookshire: [0049] L.21-28. In some embodiments, the optimized 3D skeleton from the skeleton fitting module 160 can be converted to a Skinned Multi-Person Linear (SMPL) representation at the skeleton conversion module 170. That is, in some embodiments, the skeleton conversion module 170 provides a way to associate "flesh" with a skeletal model by, in some embodiments, using a machine learning approach to produce a linear function mapping from joint angles to mesh vertices. Brookshire: [0071] L.8-15).

While Brookshire does not explicitly teach the following, Awai teaches:

(1_1) acquiring two-dimensional skeleton information extracted from a two-dimensional image of a target object (e.g., two-dimensional skeleton coordinates obtained through skeleton detection on a two-dimensional video; Awai: [0006] L.2-4);

(1_2) converting the two-dimensional skeleton information into three-dimensional skeleton information through a deep learning module (e.g., There is known a technique called skeleton detection for detecting a skeleton of a person from a video. For example, there is a deep learning (DL) framework that converts two-dimensional (2D) skeleton coordinates obtained from a 2D video into three-dimensional (3D) skeleton coordinates. Awai: [0003]. As one embodiment, the object recognition unit 13 recognizes an object for each frame of a video. As described above, “recognition” mentioned herein may include recognition of a region where an object is present, which is so-called object detection, in addition to recognition of individual objects or a class of an object. Such object recognition may be implemented by a model that has learned objects in accordance with an arbitrary machine learning algorithm, for example, deep learning or the like, which is merely an example. Awai: [0071] L.4-13. Therefore, the 3D skeleton coordinates (information) are obtained by converting the 2D skeleton coordinates (information) using a deep learning framework.)

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Awai into the teaching of Brookshire so that the 3D skeleton coordinates are obtained from the 2D coordinates using a deep learning framework.
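[Editor's note: neither the application nor the cited references publishes source code, but it may help to see what a 2D-to-3D "deep learning module" of the kind at issue in claim 1 typically looks like. The sketch below is a minimal, hypothetical PyTorch lifting network; all names and layer sizes are invented for illustration, and this is not the applicant's or any reference's implementation.]

```python
# Hypothetical sketch only: a minimal fully connected "lifting" network
# that maps J two-dimensional keypoints (x, y) to J three-dimensional
# joints (x, y, z), in the spirit of the DL framework Awai describes.
import torch
import torch.nn as nn

class Lift2Dto3D(nn.Module):
    def __init__(self, num_joints: int = 17, hidden: int = 1024):
        super().__init__()
        self.num_joints = num_joints
        self.net = nn.Sequential(
            nn.Linear(num_joints * 2, hidden),   # flattened 2D skeleton in
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_joints * 3),   # flattened 3D skeleton out
        )

    def forward(self, kp2d: torch.Tensor) -> torch.Tensor:
        # kp2d: (batch, J, 2) -> returns (batch, J, 3)
        out = self.net(kp2d.flatten(1))
        return out.view(-1, self.num_joints, 3)

model = Lift2Dto3D()
kp3d = model(torch.randn(4, 17, 2))  # a batch of four 2D skeletons
print(kp3d.shape)                    # torch.Size([4, 17, 3])
```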
Regarding claim 4, the combined teaching of Brookshire and Awai teaches the apparatus according to claim 1, wherein the processor further acquires other object information, other than the two-dimensional skeleton information, from the two-dimensional image, and the converting motion comprises a motion of inputting the two-dimensional skeleton information and the other object information into the deep learning module and acquiring the three-dimensional skeleton information (e.g., The spatial state recognition function 5 recognizes a spatial state in accordance with whether or not a relationship between a person determined by object recognition and a space around the person satisfies a certain condition. The “space” mentioned herein may be a so-called region of interest (ROI) defined in a video or may be a region in which an object is recognized through object recognition. Awai: [0035]. In one aspect, the spatial state recognition function 5 is capable of recognizing a spatial state by performing threshold-based determination for a distance between a person and a space. For example, a case will be exemplified where a skeleton into which 3D skeleton coordinates of a person are modeled and a region where a certain object, for example, a chair is present are obtained as an example of an object recognition result. In this case, a spatial state “chair” is recognized through determination as to whether a distance between a center position of the hip, which is calculated from a right hip position and the left hip position among joints included in the skeleton, and a barycenter position of the region of the object is less than or equal to a certain threshold. In another aspect, the spatial state recognition function 5 is capable of recognizing a spatial state by performing determination as to whether the position of the target is inside or outside a region between a person and a space. For example, a case will be exemplified where a skeleton into which 3D skeleton coordinates of a person are modeled and a region where a certain object, for example, a keyboard is present are obtained as an example of an object recognition result. In this case, a spatial state “keyboard” is recognized through determination as to whether a position of the left wrist among joints included in the skeleton is inside a region of the keyboard. The example in which a space is defined by an object has been described merely as an example. However, a space does not necessarily have to be defined by an object, and a space may be defined by a ROI or the like set in a video. Awai: [0036]. Therefore, an ROI is defined so that related objects, such as a “chair” or a “keyboard”, are recognized.)

Regarding claim 5, the combined teaching of Brookshire and Awai teaches the apparatus according to claim 4, wherein the other object information comprises at least one of: bone information comprising a bone length (e.g., optimizing a position of each joint of the first, multi-view 3D skeleton by maximizing a likelihood from neural network detections used to determine the at least one of the 2D skeletons and the single-view 3D skeletons and keeping bone lengths of the first, multi-view 3D skeleton fixed, Brookshire: [0026] L.3-7. The distance of the joint i from the hip is then normalized by a length of a hip section and a length of a neck section from an aspect of making the scale of the subject in different videos substantially uniform when the 3D skeleton coordinates obtained from the different videos are used in calculation of the coefficient k. Awai: [0057] L.11-16); joint information comprising a joint angle (e.g., using a machine learning approach to produce a linear function mapping from angles of joints of a determined human skeleton to mesh vertices determined by the skeleton conversion module, Brookshire: [0026] L.8-11); and body part information comprising an area of a body part (e.g., The basic motion recognition function 4 is a function for recognizing a basic motion from 3D skeleton coordinates in each frame. The “basic motions” mentioned herein may include a “whole body action” in which a motion appears in the whole body of a person, a “partial action” in which a motion appears in a part of the body of a person, and so on. Among these, examples of the “whole body action” include actions such as “walking”, “running”, and “staying still”. Examples of the “partial action” include actions such as “raising the right hand”, “looking down”, and “looking straight”. Since the “whole body action” and the “partial action” are “basic motions” performed in the daily life, the “whole body action” and the “partial action” are simple motions as compared with the “higher-level action”. Awai: [0034] L.1-14).

Regarding claim 7, the combined teaching of Brookshire and Awai teaches the apparatus according to claim 1, wherein the deep learning module is trained based on an error between three-dimensional skeleton information predicted from two-dimensional skeleton information for learning and correct answer information, and the error comprises at least one of an error in a center of weight, a bone length error and a joint angle error (e.g., The optimization then proceeds by adjusting the rigid body transform at the f-th frame and the joint angles at the f-th frame {T_f^w, θ_f} until the error between the stereo points and the mesh is minimized. The shape parameters β are fixed as the shape parameters were previously optimized as described above and it is not expected that the height/weight/etc. of the subject human will change significantly between the SMPL and stacked hourglass models. Brookshire: [0080] L.7-10).
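[Editor's note: the composite training error recited in claim 7 is concrete enough to sketch. The following is a hypothetical illustration only; the record discloses no implementation, and the function name, bone-list format, and weights are invented. It combines the three recited terms: a center-of-weight error, a bone-length error, and a joint-angle error between predicted and ground-truth 3D skeletons.]

```python
# Hypothetical sketch of a loss combining the error terms claim 7 recites.
import torch
import torch.nn.functional as F

def skeleton_loss(pred, gt, bones, w=(1.0, 1.0, 1.0)):
    """pred, gt: (batch, J, 3); bones: list of (parent, child) joint index pairs."""
    # Center-of-weight error: distance between mean joint positions.
    center_err = (pred.mean(dim=1) - gt.mean(dim=1)).norm(dim=-1).mean()

    # Bone-length error: per-bone difference in segment lengths.
    p_idx = torch.tensor([p for p, _ in bones])
    c_idx = torch.tensor([c for _, c in bones])
    pred_len = (pred[:, c_idx] - pred[:, p_idx]).norm(dim=-1)
    gt_len = (gt[:, c_idx] - gt[:, p_idx]).norm(dim=-1)
    bone_err = (pred_len - gt_len).abs().mean()

    # Joint-angle error: cosine distance between bone direction vectors.
    pred_dir = F.normalize(pred[:, c_idx] - pred[:, p_idx], dim=-1)
    gt_dir = F.normalize(gt[:, c_idx] - gt[:, p_idx], dim=-1)
    angle_err = (1.0 - (pred_dir * gt_dir).sum(dim=-1)).mean()

    return w[0] * center_err + w[1] * bone_err + w[2] * angle_err
```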
Regarding claim 11, the combined teaching of Brookshire and Awai teaches the apparatus according to claim 1, wherein the converting motion comprises a motion of inputting the two-dimensional skeleton information and domain information of the target object into the deep learning module to acquire the three-dimensional skeleton information, wherein the domain is defined to be distinguished based on motion features of the object (e.g., FIG. 4 is a diagram illustrating an example of a 2D-to-3D conversion result. FIG. 4 illustrates a conversion result of the 2D skeleton coordinates illustrated in FIG. 3 into 3D skeleton coordinates. FIG. 4 illustrates 3D skeleton coordinates that are defined by a three-dimensional coordinate system in which the left-right direction of the camera 2 is set as an X axis, the up-down direction of the camera 2 is set as a Y axis, and the depth direction of the camera 2 is set as a Z axis. FIG. 4 illustrates, side by side, a skeleton into which the 3D skeleton coordinates are modeled on an XY plane and a skeleton into which the 3D skeleton coordinates are modeled on a YZ plane. As illustrated in FIG. 4, when the XY plane is viewed from the front direction of the camera 2, it is difficult to observe the abnormal values of the 3D skeleton coordinates that appear in the front-rear direction of the camera 2. On the other hand, when the YZ plane is viewed from the lateral surface direction of the camera 2, the abnormal values of the 3D skeleton coordinates that represent the body axis of the person inclined in a direction toward the front of the camera 2 are observed. For such abnormal values of the 3D skeleton coordinates, a correction value for correcting the inclination of the axis in the depth direction of the camera 2 may not be accurately calculated, partly because the 2D-to-3D conversion itself is not correctly performed. Therefore, even when axis correction is performed by the perspective projection transform function 3D described above, it is difficult to correct the abnormal values of the 3D skeleton coordinates to normal values. Awai: [0047]).

Regarding claim 13, the claim is a method claim corresponding to apparatus claim 1. The claim is similar in scope to claim 1 and is rejected under a similar rationale. Brookshire teaches that “Embodiments of the present principles generally relate to the estimation of a pose of a human skeleton, and more particularly, to methods, apparatuses, and systems for estimating the pose of a human skeleton to sub-centimeter accuracy.” (Brookshire: [0003]).

Regarding claim 14, the claim is a program claim corresponding to apparatus claim 1. The claim is similar in scope to claim 1 and is rejected under a similar rationale. Brookshire teaches that “Embodiments in accordance with the disclosure can be implemented in hardware, firmware, software, or any combination thereof. Embodiments can also be implemented as instructions stored using one or more machine-readable media, which may be read and executed by one or more processors.” (Brookshire: [0118] L.1-).

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Brookshire in view of Awai as applied to claim 1, and further in view of Wang et al. (“Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images”, Computer Vision - ECCV 2018. https://doi.org/10.48550/arXiv.1804.01654).

Regarding claim 2, the combined teaching of Brookshire and Awai teaches the apparatus according to claim 1, wherein the deep learning module is a Graph Convolutional Networks (GCN)-based module, and comprises an encoder configured to receive the two-dimensional skeleton information and extract feature data; and a decoder configured to decode the extracted feature data and output the three-dimensional skeleton information (see 2_1 below). While the combined teaching of Brookshire and Awai does not explicitly teach this, Wang teaches:
(2_1) the deep learning module is a Graph Convolutional Networks (GCN)-based module (e.g., We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image. Limited by the nature of deep neural network, previous methods usually represent a 3D shape in volume or point cloud, and it is non-trivial to convert them to the more ready-to-use mesh model. Unlike the existing methods, our network represents 3D mesh in a graph-based convolutional neural network and produces correct geometry by progressively deforming an ellipsoid, leveraging perceptual features extracted from the input image. Wang: Abstract L.1-8), and comprises an encoder configured to receive the two-dimensional skeleton information and extract feature data; and a decoder configured to decode the extracted feature data and output the three-dimensional skeleton information (e.g., The first challenge is how to represent a mesh model, which is essentially an irregular graph, in a neural network and still be capable of extracting shape details effectively from a given color image represented in a 2D regular grid. It requires the integration of the knowledge learned from two data modalities. On the 3D geometry side, we directly build a graph based fully convolutional network (GCN) [3,8,18] on the mesh model, where the vertices and edges in the mesh are directly represented as nodes and connections in a graph. Network feature encoding information for 3D shape is saved on each vertex. Through forward propagation, the convolutional layers enable feature exchanging across neighboring nodes, and eventually regress the 3D location for each vertex. On the 2D image side, we use a VGG-16 like architecture to extract features as it has been demonstrated to be successful for many tasks [10,20]. To bridge these two, we design a perceptual feature pooling layer which allows each node in the GCN to pool image features from its 2D projection on the image, which can be readily obtained by assuming known camera intrinsic matrix. The perceptual feature pooling is enabled once after several convolutions (i.e. a deformation block described in Sec. 3.4) using updated 3D locations, and hence the image features from correct locations can be effectively integrated with 3D shapes. Wang: sec. 1 para. 3).

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Wang into the combined teaching of Brookshire and Awai so that the mesh information is represented as a shape in a graph with known connectivity, which allows higher-order loss functions to be defined across neighboring nodes, which are important to regularize 3D shapes (Wang: sec. 1 para. 5 L.1-3).
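[Editor's note: for readers unfamiliar with Wang's terminology, the basic graph-convolution operation underlying a Pixel2Mesh-style GCN can be sketched in a few lines. This is a generic textbook formulation, not Wang's released code; the class and variable names are invented.]

```python
# Hypothetical sketch: each vertex updates its feature from itself and
# its mesh neighbors, which is the building block of a GCN-based module.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)    # transform of own feature
        self.w_neigh = nn.Linear(in_dim, out_dim)   # transform of neighbor average

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (V, in_dim) vertex features; adj: (V, V) row-normalized adjacency.
        return torch.relu(self.w_self(x) + self.w_neigh(adj @ x))

# Toy usage on a 4-vertex graph (e.g., a tiny mesh patch).
adj = torch.tensor([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]], dtype=torch.float32)
adj = adj / adj.sum(dim=1, keepdim=True)            # average over neighbors
feats = GraphConv(3, 16)(torch.randn(4, 3), adj)    # -> (4, 16)
```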
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Brookshire in view of Awai and Wang as applied to claim 2, and further in view of Guler et al. (2021/0241522).

Regarding claim 3, the combined teaching of Brookshire, Awai and Wang teaches the apparatus according to claim 2, wherein the encoder performs a down-sampling process to extract a plurality of feature data with different abstraction levels, and the decoder performs an up-sampling process using the plural feature data (see 3_1 below). While the combined teaching of Brookshire, Awai and Wang does not explicitly teach this, Guler teaches:

(3_1) the encoder performs a down-sampling process to extract a plurality of feature data with different abstraction levels, and the decoder performs an up-sampling process using the plural feature data (e.g., It [is] advantageous to reparametrize the body surface with a locally Cartesian coordinate system. This allows the above process to be replaced with bilinear interpolation and use of a Spatial Transformer Layer to efficiently handle large numbers of points. In order to perform this re-parametrization, Multi-Dimensional Scaling is first performed to flatten parts of the parametric model surface to two dimensions and then these parts are sampled uniformly on a grid. Guler: [0064]).

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Guler into the combined teaching of Brookshire, Awai and Wang so that reduced sampling is implemented by uniform sampling on a grid after multi-dimensional scaling to two dimensions is performed.
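[Editor's note: the encoder/decoder structure recited in claim 3, down-sampling to several abstraction levels and then up-sampling while reusing those levels, is the familiar U-Net pattern. The sketch below is hypothetical, with invented names and sizes, applied to a 1D sequence of joint coordinates; no reference discloses this exact code.]

```python
# Hypothetical U-Net-style sketch of claim 3's recited structure.
import torch
import torch.nn as nn

class SkeletonEncoderDecoder(nn.Module):
    def __init__(self, ch: int = 2):  # 2 input channels: (x, y) per joint
        super().__init__()
        self.enc1 = nn.Conv1d(ch, 32, 3, padding=1)
        self.enc2 = nn.Conv1d(32, 64, 3, stride=2, padding=1)    # down-sample
        self.enc3 = nn.Conv1d(64, 128, 3, stride=2, padding=1)   # down-sample
        self.up2 = nn.ConvTranspose1d(128, 64, 4, stride=2, padding=1)
        self.up1 = nn.ConvTranspose1d(128, 32, 4, stride=2, padding=1)
        self.out = nn.Conv1d(64, 3, 3, padding=1)  # 3 output channels: (x, y, z)

    def forward(self, x):
        f1 = torch.relu(self.enc1(x))   # finest abstraction level
        f2 = torch.relu(self.enc2(f1))  # mid abstraction level
        f3 = torch.relu(self.enc3(f2))  # coarsest abstraction level
        d2 = torch.relu(self.up2(f3))
        d1 = torch.relu(self.up1(torch.cat([d2, f2], dim=1)))  # reuse f2
        return self.out(torch.cat([d1, f1], dim=1))            # reuse f1

joints3d = SkeletonEncoderDecoder()(torch.randn(1, 2, 16))  # -> (1, 3, 16)
```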
Allowable Subject Matter

Claims 6, 8-10 and 12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter in claim 6: the prior art of record, either individually or in combination, fails to teach "a first deep learning module for receiving first object information and a second deep learning module for receiving second object information among the additional object information, and the acquiring motion comprises a motion of combining first skeleton information outputted through the first deep learning module and second skeleton information outputted through the second deep learning module to acquire the three-dimensional skeleton information" as recited in claim 6.

The following is a statement of reasons for the indication of allowable subject matter in claim 8: the prior art of record, either individually or in combination, fails to teach "the deep learning module is trained using two-dimensional skeleton information corrected based on domain information of an object, the correcting comprises at least one of adding new connection lines between key points that make up a skeleton and strengthening connection lines, and the domain is defined to be distinguished based on motion features of the object" as recited in claim 8.

The following is a statement of reasons for the indication of allowable subject matter in claim 9: the prior art of record, either individually or in combination, fails to teach "wherein two-dimensional skeleton information for learning of the deep learning module is generated by correcting a connection line between key points, based on a movement speed of the key points, with two-dimensional skeleton information extracted from consecutive frame images" as recited in claim 9.

The following is a statement of reasons for the indication of allowable subject matter in claim 10: the prior art of record, either individually or in combination, fails to teach "wherein the deep learning module is two or more, and the converting motion comprises: a motion of determining a deep learning module corresponding to a domain of the target object among the plural deep learning modules; and a motion of converting the two-dimensional skeleton information into the three-dimensional skeleton information through the determined deep learning module, wherein the domain is defined to be distinguished based on motion features of the object" as recited in claim 10.

The following is a statement of reasons for the indication of allowable subject matter in claim 12: the prior art of record, either individually or in combination, fails to teach "the processor further acquires other object information, other than the two-dimensional skeleton information, from the two-dimensional image, and further performs a motion of correcting a three-dimensional model generated based on the other object information, wherein the correcting motion comprises: a motion of extracting three-dimensional skeleton information from the generated three-dimensional model; a motion of correcting the extracted three-dimensional skeleton information according to the other object information; and a motion of re-generating a three-dimensional model for the target object based on the corrected three-dimensional skeleton information" as recited in claim 12.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: a) Lim (2012/0162217) teaches that "Disclosed herein is a 3D model shape transformation apparatus. The 3D model shape transformation apparatus includes a camera unit, a shape restoration unit, a skeleton structure generation unit, and a skeleton transformation unit. The camera unit obtains a plurality of 2D images in a single frame by capturing the shape of an object. The shape restoration unit generates a 3D volume model by restoring the shape of the object based on the plurality of 2D images. The skeleton structure generation unit generates the skeleton structure of the 3D volume model. The skeleton transformation unit transforms the size and posture of the 3D volume model into those of a template model by matching the skeleton structure of the template model with the skeleton structure of the 3D volume model." (Lim: Abstract).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SING-WAI WU, whose telephone number is (571) 270-5850. The examiner can normally be reached 9:00am - 5:30pm (Central Time). Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SING-WAI WU/
Primary Examiner, Art Unit 2611

Prosecution Timeline

Mar 27, 2024
Application Filed
Nov 07, 2025
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597174: METHOD AND APPARATUS FOR DELIVERING 5G AR/MR COGNITIVE EXPERIENCE TO 5G DEVICES (granted Apr 07, 2026; 2y 5m to grant)
Patent 12591304: SYSTEMS AND METHODS FOR CONTEXTUALIZED INTERACTIONS WITH AN ENVIRONMENT (granted Mar 31, 2026; 2y 5m to grant)
Patent 12586311: APPARATUS AND METHOD FOR RECONSTRUCTING 3D HUMAN OBJECT BASED ON MONOCULAR IMAGE WITH DEPTH IMAGE-BASED IMPLICIT FUNCTION LEARNING (granted Mar 24, 2026; 2y 5m to grant)
Patent 12537877: MANAGING CONTENT PLACEMENT IN EXTENDED REALITY ENVIRONMENTS (granted Jan 27, 2026; 2y 5m to grant)
Patent 12530797: PERSONALIZED SCENE IMAGE PROCESSING METHOD, APPARATUS AND STORAGE MEDIUM (granted Jan 20, 2026; 2y 5m to grant)
Study what changed in these cases to get past this examiner. Based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 8%
With Interview: 18% (+10.6%)
Median Time to Grant: 3y 0m
PTA Risk: Low
Based on 189 resolved cases by this examiner. Grant probability derived from career allow rate.
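The displayed figures are consistent with simple arithmetic on the career counts above, assuming the tool rounds the raw allow rate and adds the interview lift directly (our reading of the footnote, not a documented formula):

```python
# Sanity check of the dashboard's headline numbers (assumed derivation).
allow_rate = 15 / 189                # 0.0794 -> shown as "8%"
with_interview = allow_rate + 0.106  # plus interview lift -> 0.185, shown as "18%"
tc_average = allow_rate + 0.541      # implied by "-54.1% vs TC avg" -> ~62%
print(f"{allow_rate:.1%} {with_interview:.1%} {tc_average:.1%}")  # 7.9% 18.5% 62.0%
```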
