Last updated: May 29, 2026
Application No. 18/354,073
PREDICTION METHOD FOR TARGET OBJECT, COMPUTER DEVICE, AND STORAGE MEDIUM

Final Rejection §103
Filed
Jul 18, 2023
Priority
Jul 20, 2022 — CN 202210853238.3
Examiner
FELIX, BRADLEY OBAS
Art Unit
2671
Tech Center
2600 — Communications
Assignee
BEIJING TUSEN ZHITU TECHNOLOGY CO., LTD.
OA Round
2 (Final)
This examiner grants 11% of cases after interview

— +66.7% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 18 resolved cases, 2023–2026
Examiner Intelligence

FELIX, BRADLEY OBAS View full profile →
Grants only 11% of cases
Career Allowance Rate
2 granted / 18 resolved
-50.9% vs TC avg
Strong +67% interview lift
Without
With
+66.7%
Interview Lift
resolved cases with interview
Typical timeline
3y 2m
Avg Prosecution
11 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§103
99.0%
+59.0% vs TC avg
§112
1.1%
-38.9% vs TC avg
Black line = Tech Center average estimate • Based on career data from 18 resolved cases
Office Action

§103
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Response to Arguments
Applicant’s arguments, see Remarks page 10, filed 12/10/2025, with respect to claims 6, 9, and 19 have been fully considered and are persuasive. The objections of claims 6, 9, and 19 have been withdrawn. 

Applicant’s arguments, see Remarks page 10, filed 12/10/2025, with respect to claims 16-20 have been fully considered and are persuasive. The 35 U.S.C. 101 rejections of claims 16-20 have been withdrawn. 

Applicant’s arguments, see Remarks page 11, filed 12/10/2025, with respect to claims 10 and 20 have been fully considered and are persuasive. The 35 U.S.C. 112 rejections of claims 10 and 20 have been withdrawn. 

Applicant’s arguments, see Remarks pages 11-12, filed 12/10/2025, with respect to the rejections of independent claims 1, 11, and 16 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Zhang, in further view of Huang. Therefore, this action is made FINAL.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-4, 11-14, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Learning Scale-Adaptive Representations by Tongfeng Zhang et al., hereinafter Zhang included in IDS received 1/31/2024, in further view of Text-Guided Graph Neural Networks for Referring 3D Instance Segmentation by Pin-Hao Huang et al.

As per claim 1, Zhang discloses a method of predicting a target object, comprising:performing a voxelization processing on point cloud data to obtain a plurality of voxels (see Zhang FIG. 2, wherein their proposed network inputs a point cloud and the point cloud is converted to voxel representation),wherein the plurality of voxels corresponds to a plurality of points in the point cloud data (see Zhang page 922 section 3, voxels which contains a varying number of points within each voxel), and at least a portion of the plurality of voxels forms a voxel set;extracting a plurality of voxel features from voxels in the voxel set (see Zhang page 922 section 3.1, wherein a 3D U-Net encoder is used to acquire a voxel grid, i.e., voxel set);mapping the plurality of voxel features to a plurality of points comprised in the plurality of voxels, respectively, to obtain a plurality of point features of the plurality of points (see Zhang page 924 section 3.2 and FIGS. 4-5, obtain point-wise prediction: wherein the voxel features are projected back to the 3D space. The LPR module assigns voxel features to their corresponding centroids, then queries                         
                            k
                        
                     closest centroids for each point in the original point cloud to get the aggregated point feature                         
                            
                                            x
                                        
                                        ^
                                    
                                            p
                                        
                                            i
                                        
                     for each voxel centroid, i.e., plurality of point features for a plurality of points);and predicting, according to the plurality of point features, the target object (see Zhang FIG. 2, wherein the output point-wise label is acquired from the LPR. See further page 926 Visualization and FIG. 8 and Table 3, wherein the predicted objects are labeled).
While Zhang does not disclose a singular target object, it would have been obvious to include it. The reason is Zhang discloses at the bottom of page 7-8 section 5, wherein the LPR module is capable of discriminate useful information for different objects on various categories. By doing this, it allows the capability of determining specific and non-specific target objects. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang's method by predicting a singular target object from the labeled objects in order to further specify an object to predict.
Additionally, Zhang fails to explicitly disclose where Huang teaches:obtaining a first prediction point set belonging to the target object, based on the plurality of point features (see Huang page 1610 Introduction and FIG. 1, wherein the target object to be predicted is described by language expression/query. See more specifically Huang 1612 Phase 1, wherein a plurality of point features belonging to the same predicted semantic class of the target are grouped, i.e., first prediction point set of the target object); calculating, based on a first weight coefficient, a first instance feature of points in the first prediction point set (see Huang page 1612 Feature Embedding, wherein instance segmentation is calculated using a loss function with the weighting parameter                         
                            α
                        
                     within the point cluster); andpredicting the target object based on a difference between the point features of the points in the first prediction point set and the first instance feature (see Huang page 1612 Method and FIGS. 2-3, wherein the instance features and the clustered, or grouped, points are fed into the Text-Guided Graph Neural Network to predict the result utilizing the cross entropy loss                         
                            
                                    L
                                
                                    s
                                    s
                                
                     for choosing the target with highest score, to yield the final prediction of the referred 3D instance).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang’s method by using Huang’s teaching by including a first prediction point set, a first weight coefficient, and a first instance feature to the objects in order to improve the object detection consistencies by using the instance features to confirm that a plurality of point features belong to the target object.

As per claim 2, Zhang discloses the method according to claim 1, wherein the performing the voxelization processing on the point cloud data to obtain the plurality of voxels comprises:dividing, according to a spatial coordinate range of the point cloud data, the point cloud data into the plurality of voxels by a specified resolution (see Zhang page 922 section 3 and FIG. 2, wherein the 3D space covered by a point cloud is divided into a regular 3D grid in the resolution of                         
                            H
                             
                            x
                             
                            W
                             
                            x
                             
                            D
                        
                    ); and extracting, based on the point cloud data, initial voxel features of the plurality of voxels (see Zhang page 922 section 3, wherein after the point cloud is transferred into voxel-based representation, aggregate features are extracted at different scales. See also page 922-923 section 3.1 and FIG. 2, wherein the 3D U-Net extracts the multi-scale features, and a 3D backbone network is used to extract voxel features with different grid sizes after voxelization from the input point cloud).

As per claim 3, Zhang discloses the method according to claim 2, wherein the extracting the plurality of voxel features from the voxels in the voxel set, comprises:inputting the initial voxel features into a preset sparse voxel encoder to extract the plurality of voxel features (see Zhang page 922 section 3.1 and FIG. 2, wherein a 3D U-Net encoder is used for voxel features).

As per claim 4, Zhang discloses the method according to claim 1, wherein the mapping the plurality of voxel features to the plurality of points comprised in the plurality of voxels, respectively, to obtain the plurality of point features of the plurality of points, comprise:mapping the voxel features of the voxels in the voxel set to points comprised in voxels corresponding to the voxel features to obtain initial point features (see Zhang page 924 section 3.2 and FIGS. 4-5, wherein the voxel features are projected back into the 3D point space to obtain initial point features);calculating geometric features between the points and centers of the voxels corresponding to the points (see Zhang page 924 section 3.2, wherein distances, i.e., geometric features, between voxel centroids                         
                            
                                    v
                                
                                    j
                                
                     to a point                         
                            
                                    p
                                
                                    i
                                
                     is disclosed); andgenerating the point features of the points based on the initial point features and the geometric features (see Zhang page 924 section 3.2, wherein a weighted sum is calculated to acquire the aggregated point feature                         
                            
                                            x
                                        
                                        ^
                                    
                                            p
                                        
                                            i
                                        
                    . The aggregated voxel features are concatenated with the initial point features and passed to a point convolution block).

As per claims 11-14, the rationale provided in claims 1-4 are incorporated herein. In addition, the computer device of claims 11-14 corresponds to the method of claims 1-4.

As per claims 16, the rationale provided in claims 1 is incorporated herein. In addition, the non-transitory computer-readable medium of claim 16 corresponds to the method of claim 1.

Claims 5, 7-8, 15, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, in combination with Huang, in further view of Density Based Clustering by Syeda Mariam Ahmed et al., hereinafter Ahmed, and Xiaobin Xu Object Detection Based on Fusion of Sparse Point Cloud and Image Information, hereinafter Xu.

As per claim 5, while Zhang, in combination with Huang, fails to explicitly disclose where Ahmed teaches:The method according to claim 1, wherein the obtaining the first prediction point set belonging to the target object, based on the plurality of point features comprises:predicting a plurality of points belonging to a foreground of each object, to obtain a plurality of foreground points (see Ahmed page 2/10 section 1 and FIG. 1, wherein background-foreground segmentation is disclosed, wherein a plurality of points for the foreground are acquired using the input point cloud 3D coordinates and RGB values, i.e., point features. See more specifically page 3-4/10 section 3.1, wherein                         
                            S
                            
                                            y
                                        
                                            1
                                        
                            ,
                            …
                            S
                            (
                            
                                    y
                                
                                    k
                                
                            )
                        
                     are the predicted labels); andperforming a clustering processing on the plurality of foreground points to obtain the first prediction point set belonging to each object, wherein the first prediction point set comprises a plurality of points corresponding to each object (see Ahmed page 2/10 section 1 and FIG. 1, wherein clustering segmentation using the foreground points from the previous stage is disclosed. See more specifically Ahmed page 4/10 section 3.2 and FIGS. 2-3, wherein the cluster is comprised of core points belonging to an object, i.e., prediction point set).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang’s, in combination with Huang, method by using Ahmed’s teaching by including foreground prediction and clustering to predicting the point features in order to more precisely group points together for object recognition.
However, Zhang, in combination with Huang and Ahmed, fails to explicitly disclose where Xu teaches:predicting, based on the plurality of point features, a plurality of points belonging to a foreground of the target object, to obtain a plurality of foreground points (see Xu pages 5-6/12, wherein the target point cloud, which is the point cloud of the target object as clarified on page 2/12, is estimated, or predicted, and extracted from the background point clouds. The point clouds are a plurality of points corresponding to the image coordinate system, and the point features are projected point cloud points);performing a clustering processing on the plurality of foreground points to obtain the first prediction point set belonging to the target object, wherein the first prediction point set comprises a plurality of points corresponding to the target object (see Xu page 6/12, wherein density clustering is performed on the target point cloud data. A bounding box is fit around the target according to the clustered points. See also FIG. 15, wherein a bounding box around the person, i.e., target, is shown).
Xu discloses semantic segmentation of the target object and the other obstacles in a given environment, as disclosed on page 2/12. The semantic segmentation as tied to obtaining a target object among a group of objects correlate to the objects of Ahmed, and the semantic segmentation of Zhang, in combination with Huang. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang’s, in combination with Huang and Ahmed, method by using Xu’s teaching by including a target object to the prediction of the foreground points in order to more specifically determine an object in a given environment of objects and obstacles.

As per claim 6, Zhang, in combination with Huang, Ahmed and Xu, discloses the method according to claim 5, wherein the step of predicting the plurality of points belonging to the foreground of the target object, to obtain the plurality of foreground points, comprises:predicting a type of the target object corresponding to the plurality of points by using the plurality of point features (see Xu page 6/12 section Target Point Cloud Clustering, wherein the semantic label of the target object (i.e., person or car as shown in FIG. 11) is assigned using the clustered points, which were obtained from the point cloud projection, i.e., point features); anddetermining the plurality of foreground points based on the type of the target object (see Xu page 6/12 section Fit Bounding Box and FIG. 7, wherein the extracted target point cloud, i.e., foreground points, are fitted inside a bounding box representing the target object).

As per claim 7, Zhang, in combination with Huang, Ahmed, and Xu, discloses the method according to claim 5, wherein the performing the clustering processing on the plurality of foreground points to obtain the first prediction point set belonging to the target object comprises (see Ahmed page 2/10 section 1 and FIG. 1, wherein clustering segmentation using the foreground points from the previous stage is disclosed. See more specifically Ahmed page 4/10 section 3.2 and FIGS. 2-3, wherein the cluster is comprised of core points belonging to an object, i.e., prediction point set. Xu page 6/12 clarifies that the first prediction point set belongs to the target object):determining a plurality of center points representing the target object, wherein the center point represent a center of the target object to which the foreground points are predicted to belong (see Ahmed page 4/10 section 3.2 and FIGS. 2-3, wherein the clustering algorithm acquires centroid and core points in the object from the once obtained foreground points. Xu page 6/12 clarifies that the first prediction point set belongs to the target object);separately calculating distances among the plurality of center points (see Ahmed page 4/10 section 3.2, wherein the DBSCAN algorithm specifies the range of neighborhood for each point); and adding at least one of the plurality of foreground points meeting a condition to the first prediction point set, wherein the condition is that the distances are less than or equal to a preset distance (see Ahmed page 4/10 section 3.2, wherein the algorithm searches for connected component of the core points to form clusters using these points within the range                         
                            ϵ
                        
                    , as further shown in FIG. 3. These clusters, as further described on page 5/10 section 3, are used to predict the respective object’s bounding box and class label).

As per claim 8, Zhang, in combination with Huang, Ahmed, and Xu, discloses the method according to claim 5, wherein the predicting the target object based on the difference between the point features of the points in the first prediction point set and the first instance feature (see Huang page 1614 and FIGS. 2-3, wherein the instance features and the clustered, or grouped, points are fed into the Text-Guided Graph Neural Network to predict the result utilizing the cross entropy loss                         
                            
                                    L
                                
                                    s
                                    s
                                
                    , i.e., difference calculation, in page 1612 for choosing the target with highest score) comprises:calculating, based on a first weight coefficient, a first instance feature of the points in the first prediction point set (see Huang page 1612 Feature Embedding, wherein instance segmentation is calculated using a loss function with the weighting parameter                         
                            α
                        
                     within the point cluster);obtaining a first target feature based on the difference between the point features of the points in the first prediction point set and the first instance feature, wherein the first target feature is used as the first prediction point set feature (see Huang page 1614 and FIGS. 2-3, wherein the instance features and the clustered, or grouped, points are fed into the Text-Guided Graph Neural Network to predict the result utilizing the cross entropy loss                         
                            
                                    L
                                
                                    s
                                    s
                                
                    in page 1612 for choosing the target with highest score); andpredicting the target object based on the first prediction point set feature (see Huang FIG. 2, wherein the TGNN acquires the final prediction result using the clustered points).

As per claims 15 and 17, the rationale provided in claim 5 is incorporated herein. In addition, the computer device of claim 15 and the computer-readable medium of claim 17 correspond with the method of claim 5.

As per claim 18, the rationale provided in claim 8 is incorporated herein. In addition, the computer-readable medium of claim 18 corresponds with the method of claim 8.

Claims 9-10 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, in combination with Huang, Ahmed, and Xu, in further view of YING CHEN CN-114445633-A, hereinafter CHEN, and Multi-Feature Fusion Target Re-Location Tracking Based on Correlation Filters by Qingzhong Shu, hereinafter Shu.

As per claim 9, Zhang, in combination with Huang, Ahmed, and Xu, fails to explicitly disclose where CHEN teaches:The method according to claim 8, further comprising:calculating, based on a second weight coefficient, weighted coordinates of the points in the first prediction point set (see CHEN page 34/100 step 209, wherein the attention weight of preset coordinates in the common view area, i.e., first prediction point set as disclosed in CHEN page 21/100, wherein the predicted area is disclosed, using the associate feature is disclosed. See prior CHEN bottom of page 33/100, wherein the associated weight, i.e., second weight, is disclosed); determining difference between coordinates of the points in the first prediction point set and the weighted coordinates of the points in the first prediction point set to obtain a second target feature (see CHEN page 36/100, wherein the relative central point offset of the common view area, i.e., difference in the point set, is obtained).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang’s, in combination with Huang, Ahmed, and Xu, method by using CHEN’s teaching by including a second target feature to the first prediction point set in order to acquire a further confirmation as to the points of the object.
While Zhang, in combination with Huang, Ahmed, Xu, and CHEN, teaches first and second target features, Shu also teaches a first and second target features (see Shu page 3/11, wherein the different filters, which contain the features of the target object, i.e., first and second target features). Additionally, Zhang, in combination with Ahmed and CHEN, fails to explicitly disclose where Shu discloses:fusing the first target feature and the second target feature to obtain a third target feature, wherein the third target feature is used as the first prediction point set feature (see Shu page 3/11, wherein the different filters, i.e., first and second target features, are used for feature fusion using each feature’s weighting coefficients and acquires an optimal value as shown in FIG. 1, wherein the new optimal value is the new predicted coordinates, or points).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Zhang’s, in combination with Huang, Ahmed, Xu, and CHEN, method by using Shu’s teaching by including fusion to the first and second target features in order to acquire the prediction map of the two features which can provided a more accurate target feature.

As per claim 10, Zhang, in combination with Huang, Ahmed, Xu, CHEN, and Shu, discloses the method according to claim 9, further comprising:generating, based on the first prediction point set feature, a prediction box indicating that an object is predicted to be the target object, wherein at least one of the plurality of foreground points within the prediction box is used as a second prediction point set (see Ahmed page 5/10 section 3.4, wherein an object and its amodal bounding box are predicted using the segmented foreground points extracted in section 3.2. Huang page 1612 Phase 1 clarifies that the first prediction point set belongs to the target object); calculating a second instance feature of the points in the second prediction point set by using a third weight coefficient (see Shu page 3/11 and FIG. 1, wherein a weighing coefficient is used in the prediction of the target, wherein the second instance feature would be the second layer of FIG. 1); obtaining a fourth target feature based on a difference between the point features of the points in the second prediction point set and the second instance feature (see Shu page 4/11, wherein a coordinate different                 
                    d
                    ’
                
             is used to predict the target position offset, and reacquire the target position in frame                 
                    
                            d
                        
                            t
                            +
                            1
                        
            ); and adjusting, according to the fourth target feature, the predicted target object (see Shu page 4/11, wherein the target is required in frame                 
                    
                            d
                        
                            t
                            +
                            1
                        
             ).

As per claims 19-20, the rationale provided in claims 9-10 is incorporated herein. In addition, the computer-readable medium of claim 19-20 corresponds to the method of claims 9-10.

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Bradley Obas Felix whose telephone number is (703)756-1314. The examiner can normally be reached M-F 8-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph can be reached at 5712728243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRADLEY O FELIX/Examiner, Art Unit 2671                                                                                                                                                                                                        

/VINCENT RUDOLPH/Supervisory Patent Examiner, Art Unit 2671
Read full office action
Prosecution Timeline

Jul 18, 2023
Application Filed
Sep 17, 2025
Non-Final Rejection mailed — §103
Dec 10, 2025
Response Filed
Mar 17, 2026
Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/331,399
Patent 12608780
IMAGE PROCESSING APPARATUS AND METHOD, IMAGE CAPTURING APPARATUS AND STORAGE MEDIUM
2y 10m to grant Granted Apr 21, 2026
17/729,277
Patent 12592076
OBJECT IDENTIFICATION SYSTEM AND METHOD
3y 11m to grant Granted Mar 31, 2026
17/774,868
Patent 12340540
AN IMAGING SENSOR, AN IMAGE PROCESSING DEVICE AND AN IMAGE PROCESSING METHOD
3y 1m to grant Granted Jun 24, 2025
Study what changed to get past this examiner. Based on 3 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
11%
Grant Probability
78%
With Interview (+66.7%)
3y 2m (~3m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 18 resolved cases by this examiner. Grant probability derived from career allowance rate.