Prosecution Insights
Last updated: April 19, 2026
Application No. 18/211,522

SYSTEM AND METHOD OF GENERATING BOUNDING POLYGONS

Status: Non-Final OA (§103)
Filed: Jun 19, 2023
Examiner: TRAN, TAN H
Art Unit: 2141
Tech Center: 2100 — Computer Architecture & Software
Assignee: Plainsight Technologies Inc.
OA Round: 3 (Non-Final)

Grant Probability: 60% (Moderate)
Expected OA Rounds: 3-4
Expected Time to Grant: 3y 6m
Grant Probability with Interview: 92%

Examiner Intelligence

Career Allow Rate: 60% (184 granted / 307 resolved; +4.9% vs TC avg)
Interview Lift: +31.8% (strong; allowance rate of resolved cases with an interview vs. without)
Typical Timeline: 3y 6m avg prosecution
Career History: 367 total applications across all art units; 60 currently pending
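The headline figures above follow from simple arithmetic on the underlying counts. A minimal sketch (the counts are taken from this report; the helper name is ours):

```python
# Recompute the examiner statistics shown above from the raw counts
# reported here: 184 granted of 307 resolved, 367 total applications.

def allow_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved

rate = allow_rate(184, 307)   # 59.9% once rounded, displayed as 60%
pending = 367 - 307           # total applications minus resolved cases
print(round(rate, 1), pending)  # 59.9 60
```

The 60-case gap between total applications and resolved cases matches the "60 currently pending" figure, which suggests the dashboard treats every non-resolved application as pending.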

Statute-Specific Performance

§101: 14.4% (-25.6% vs TC avg)
§103: 55.3% (+15.3% vs TC avg)
§102: 19.2% (-20.8% vs TC avg)
§112: 6.1% (-33.9% vs TC avg)

Tech Center averages are estimates (shown as the black line in the original chart). Based on career data from 307 resolved cases.
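Subtracting each "vs TC avg" delta from the corresponding statute rate recovers the Tech Center baseline the chart plots. A hedged check (values read off the table above; the interpretation that delta = rate − baseline is our assumption):

```python
# Recover the Tech Center average estimate for each statute from the
# reported rate and its "vs TC avg" delta. If delta = rate - baseline,
# then baseline = rate - delta.

rates  = {"§101": 14.4, "§103": 55.3, "§102": 19.2, "§112": 6.1}
deltas = {"§101": -25.6, "§103": +15.3, "§102": -20.8, "§112": -33.9}

tc_avg = {s: round(rates[s] - deltas[s], 1) for s in rates}
print(tc_avg)  # every statute implies the same 40.0 baseline
```

All four statutes back out to the same 40.0% estimate, which suggests the report benchmarks against a single TC-wide figure rather than per-statute averages.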

Office Action

§103
Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Continued Examination Under 37 CFR 1.114

2. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 5/13/2025 has been entered. Claims 2, 6, and 15 have been amended. Claim 22 has been added. Claims 2-22 remain pending in the application.

Information Disclosure Statement

3. The information disclosure statement (IDS) submitted on 5/13/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments

4. Applicant’s arguments with respect to the claims have been considered but are moot in view of the new ground of rejection. See the rejections below for details.

Claim Rejections – 35 USC § 103

5. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6. Claims 2, 6-7, 10, 15-17, and 19 are rejected under 35 U.S.C.
103 as being unpatentable over Cao et al. (U.S. Patent Application Pub. No. US 20210365717 A1) in view of Chen et al. (U.S. Patent Application Pub. No. US 20180253622 A1), and further in view of Farooqi et al. (U.S. Patent Application Pub. No. US 20180150713 A1).

Claim 2: Cao teaches a system comprising: at least one processor (i.e. a non-transitory computer-readable storage medium storing a plurality of processor executable instructions; para. [0009]); a first memory with instructions stored thereon that, when executed by the at least one processor, cause the at least one processor to (i.e. The apparatus may include a memory operable to store computer-readable instructions and a processor operable to read the computer-readable instructions; para. [0007]): extract, from a first image, a first portion that is representative of content within a bounding shape that is arranged around a depiction of a first object of a particular type (i.e. FIG. 1, an example in which an apparatus for segmenting a medical image is integrated in an electronic device 100 is used. The electronic device 100 may obtain a slice pair 102 (the slice pair including two slices 103 and 104 by sampling a to-be-segmented medical image 101), perform feature extraction 105 on each slice in the slice pair by using different receptive fields, to obtain high-level feature information 107 and 108 and low-level feature information 106 and 109 of the each slice. In one aspect, the electronic device 100 then segments, for each slice in the slice pair, a target object in the slice according to the low-level feature information 106 and 109 and the high-level feature information 107 and 108 of the slice, to obtain an initial segmentation result 111 and 113 of the slice; para. [0036]); establish one or more high-level features and one or more low-level features of the first portion (i.e. FIG. 1, an example in which an apparatus for segmenting a medical image is integrated in an electronic device 100 is used. The electronic device 100 may obtain a slice pair 102 (the slice pair including two slices 103 and 104 by sampling a to-be-segmented medical image 101), perform feature extraction 105 on each slice in the slice pair by using different receptive fields, to obtain high-level feature information 107 and 108 and low-level feature information 106 and 109 of the each slice. In one aspect, the electronic device 100 then segments, for each slice in the slice pair, a target object in the slice according to the low-level feature information 106 and 109 and the high-level feature information 107 and 108 of the slice, to obtain an initial segmentation result 111 and 113 of the slice; para. [0036]); apply Atrous Spatial Pyramid Pooling (ASPP) to the one or more high-level features of the first portion to aggregate the one or more high-level features as aggregate features (i.e. The high-level feature information corresponding to the first slice sample and the high-level feature information corresponding to the second slice sample may be further processed by using ASPP, to obtain high-level feature information in more different dimensions, referring to FIG. 8; para. [0141]); up-sample the aggregate features (i.e. upsamples the high-level feature information; para. [0173]); apply a convolution to the one or more low-level features (i.e. The electronic device performs convolution with a convolution kernel of “1×1” on the low-level feature information 806 and the high-level feature information 808 of the first slice 801 by using the first segmentation network branch 803, upsamples the high-level feature information obtained after convolution to have the same size as the low-level feature information obtained after convolution, concatenates the upsampled high-level feature information and low-level feature information obtained after convolution, to obtain the concatenated feature information of the first slice 801; para.
[0173]); concatenate the aggregate features after upsampling with the one or more low-level features after convolution to form combined features (i.e. concatenates the upsampled high-level feature information and low-level feature information obtained after convolution, to obtain the concatenated feature information of the first slice 801; para. [0173]); and segment the combined features to generate a first polygonal shape outline along first outer boundaries of the first object in the first portion (i.e. figs. 8-10, performs convolution with a convolution kernel of “3×3” on the concatenated feature information, and then upsamples the concatenated feature information obtained after convolution to obtain a size of the first slice, so that the initial segmentation result 814 of the first slice 801 can be obtained; para. [0173]); and a second memory with instructions stored thereon that, when executed by the at least one processor, cause the at least one processor to (i.e. The apparatus may include a memory operable to store computer-readable instructions and a processor operable to read the computer-readable instructions; para. [0007]): receive a second image that includes a depiction of a second object of the particular type (i.e. FIG. 1, an example in which an apparatus for segmenting a medical image is integrated in an electronic device 100 is used. The electronic device 100 may obtain a slice pair 102 (the slice pair including two slices 103 and 104 by sampling a to-be-segmented medical image 101), perform feature extraction 105 on each slice in the slice pair by using different receptive fields, to obtain high-level feature information 107 and 108 and low-level feature information 106 and 109 of the each slice. 
In one aspect, the electronic device 100 then segments, for each slice in the slice pair, a target object in the slice according to the low-level feature information 106 and 109 and the high-level feature information 107 and 108 of the slice, to obtain an initial segmentation result 111 and 113 of the slice; para. [0036]); apply at least a first convolutional neural network to the second image to generate one or more feature maps (i.e. The electronic device performs convolution with a convolution kernel of “1×1” on the low-level feature information 806 and the high-level feature information 808 of the first slice 801 by using the first segmentation network branch 803, upsamples the high-level feature information obtained after convolution to have the same size as the low-level feature information obtained after convolution, concatenates the upsampled high-level feature information and low-level feature information obtained after convolution, to obtain the concatenated feature information of the first slice 801; para. [0173]); generate a plurality of regions of interest, each of which is representative of a non-rectangular, polygonal shape (i.e. The image segmentation is a technology and a process of segmenting an image into several particular regions having special properties, and specifying a target of interest. This embodiment of this disclosure is mainly to segment a three-dimensional medical image and find a required target object. For example, a 3D medical image is divided in a z-axis direction into a plurality of single-frame slices (referred to as slices for short). A liver region or the like is then segmented from the slices. After segmentation results of all slices in the 3D medical image are obtained, these segmentation results are combined in the z-axis direction, so that a 3D segmentation result corresponding to a 3D medical image may be obtained. That is, the target object is, for example, a 3D form of the liver region. 
The segmented target object may be subsequently analyzed by a medical care person or another medical expert for further operation; para. [0035]); predict segmentation masks on at least a subset of the plurality of regions of interest in a pixel-to-pixel manner (i.e. A pixel belonging to a target object in the first slice sample is then selected according to the concatenated feature information, to obtain the first predicted segmentation value of the slice sample. For example, convolution with a convolution kernel of “3×3” may be specifically performed on the concatenated feature information, and upsampling is performed to obtain a size of the first slice sample, so that the first predicted segmentation value of the slice sample can be obtained; para. [0116]). Cao does not explicitly teach slide a first window across the one or more feature maps to obtain a plurality of anchor shapes using a region proposal network; determine whether each anchor shape of the plurality of anchor shapes contains an object to generate a plurality of regions of interest, each of which is representative of a non-rectangular, polygonal shape; produce classifications for objects, if any, in each region of interest using a second convolutional neural network, trained in part using the first polygonal shape outline of the first object; and identify individual objects of the second image based on the classifications and the segmentation masks; wherein the first portion is a cropped portion of the first image; wherein the first polygonal shape outline is defined by lines and/or curves that collectively surround the first object. However, Chen teaches slide a first window across the one or more feature maps to obtain a plurality of anchor shapes using a region proposal network (i.e. 
in the instance-level semantic segmentation sub-network, technologies such as R-FCN, MNC (e.g., or ROI-warping from MNC, as described herein), and/or the like, can be used, as described, for region proposals, to generate the instance masks, etc. For example, region proposals (e.g., regions of interest (RoI) 232) for the layer 202 can be generated from RPN 230, and a customized RoI classifier 210 is provided to classify the region proposals. In an example, a last layer 240 of the convolutional blocks (e.g., conv5, which may include 2048 channels in an example) can be convolved with a 1×1 convolutional layer to generate a feature map (e.g., a 1024-channel feature map). Then, k.sup.2 (C+1) channels feature maps, also referred to as detection position-sensitive score maps 250, can be generated, where the +1 can be for the background class and a total of C categories. The k.sup.2 can correspond to a k×k spatial grid, where the cell in the grid encodes the relative positions (e.g., top-left and bottom-right). In one example, k can be set to 7. In an example, the detection position-sensitive score maps can be generated for each RoI 232 in the image provided as output from the RPN 230. A pooling operation (e.g., position sensitive pooling 242) can be applied to the detection position-sensitive score maps to obtain a C+1-dimensional vector for each RoI 232; para. [0037]); determine whether each anchor shape of the plurality of anchor shapes contains an object to generate a plurality of regions of interest, each of which is representative of a non-rectangular, polygonal shape (i.e. in the instance-level semantic segmentation sub-network, technologies such as R-FCN, MNC (e.g., or ROI-warping from MNC, as described herein), and/or the like, can be used, as described, for region proposals, to generate the instance masks, etc. 
For example, region proposals (e.g., regions of interest (RoI) 232) for the layer 202 can be generated from RPN 230, and a customized RoI classifier 210 is provided to classify the region proposals. In an example, a last layer 240 of the convolutional blocks (e.g., conv5, which may include 2048 channels in an example) can be convolved with a 1×1 convolutional layer to generate a feature map (e.g., a 1024-channel feature map). Then, k.sup.2 (C+1) channels feature maps, also referred to as detection position-sensitive score maps 250, can be generated, where the +1 can be for the background class and a total of C categories. The k.sup.2 can correspond to a k×k spatial grid, where the cell in the grid encodes the relative positions (e.g., top-left and bottom-right). In one example, k can be set to 7. In an example, the detection position-sensitive score maps can be generated for each RoI 232 in the image provided as output from the RPN 230. A pooling operation (e.g., position sensitive pooling 242) can be applied to the detection position-sensitive score maps to obtain a C+1-dimensional vector for each RoI 232; para. [0037]); produce classifications for objects, if any, in each region of interest using a second convolutional neural network, trained in part using the first polygonal shape outline of the first object (i.e. method 400 may optionally include, at block 414, training the convolutional network based on the segmentations and/or the feedback. In an aspect, segmentation component 306, e.g., in conjunction with processor 302, memory 304, etc., can train the convolutional network based on the segmentations and/or the feedback. As described, segmentation component 306 can incorporate the feature maps 212, instance masks 214, etc. into the fully convolutional network to provide additional comparisons for determining categories and/or identifiable regions of input images; para. 
[0057]); predict segmentation masks on at least a subset of the plurality of regions of interest in a pixel-to-pixel manner (i.e. generate masks of instances of objects in the image, etc. As described above, and further herein, category-level semantic segmentation can relate to a process for analyzing pixels in an image and assigning a label for each pixel, where the label may be indicative of an object type or category; para. [0017]); and identify individual objects of the second image based on the classifications and the segmentation masks (i.e. the classification of each instance mask is determined by the RoI classifier 210. To further boost the performance of the ROI classifier 210, the feature maps 212 can be stacked into the layers of RoI classifier 210, which may include stacking the feature maps 212 using a pooling operation (e.g., a position sensitive pooling (PSP) 238, a compact bilinear pooling 244 or other pooling or fusion operation, etc.); para. [0038]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the invention of Cao to include the feature of Chen. One would have been motivated to make this modification because it improves detection and classification accuracy. However, Farooqi teaches wherein the first portion is a cropped portion of the first image (i.e. FIG. 2 is a process flow diagram 200 that illustrates a variation of FIG. 1 in which the cropped RGB image 160 is subject to further processing. It will be appreciated that with the example of FIG. 2, similar processes can be applied to the RGB-D image 170 and the example uses only RGB image data 160 to simplify the explanation. Similar to with FIG. 1, RGB-D data is received 110 and then bifurcated into an RGB image 150 and a depth channel image 120 so that depth segmentation 130 can be applied to the depth channel image 120. 
This depth segmentation 130 is used to define bounding polygons 140 are then subsequently applied to the RGB image 150 so that the RGB image 150 can be made into a cropped RGB image 160; para. [0032]); wherein the first polygonal shape outline (i.e. more than two object localization techniques can be used. Further, in some variations, the object localization techniques can be performed in sequence and/or partially in parallel. The first and second set of proposed bounding polygons (in some cases only one bounding polygon is identified by one of the localization techniques) are then analyzed to determine an intersection of union or other overlap across the first and second sets of proposed bounding polygons 230. Based on this determination, at least one optimal bounding polygon 240 is determined. This optimal bounding polygon 240 can then be used for subsequent image processing including classification of any encapsulated objects within the optimal bounding polygon 240 as applied to the cropped RGB image 160; para. [0032-0034]) is defined by lines and/or curves that collectively surround the first object (i.e. fig. 3, As is illustrated in image 340, a bounding polygon 342 can then be generated that encapsulates the foreground object. The image data encapsulated by the various edges of the bounding polygon 342 can then be subjected to further image processing including, without limitation, classification of the objects; para. [0035]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Cao and Chen to include the feature of Farooqi. One would have been motivated to make this modification because the outline captures the true geometry of the object more accurately, so the CNN learns from precise, noise-reduced labels.
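The decoder steps the examiner maps onto Cao for claim 2 (up-sample the aggregated high-level features, convolve the low-level features, concatenate, then segment into an outline) can be illustrated with a toy sketch. This is not the applicant's or Cao's code: ASPP and the learned 1×1/3×3 convolutions are replaced by hand-made feature maps and a fixed threshold, purely to show the data flow on plain 2D lists.

```python
# Toy sketch of the claim 2 decoder data flow: coarse high-level features
# are upsampled to the low-level resolution, combined pixel-wise with the
# low-level features, and thresholded into a binary mask whose boundary
# stands in for the polygonal outline. All values are illustrative.

def upsample2x(fm):
    """Nearest-neighbor 2x upsampling of a 2D feature map."""
    out = []
    for row in fm:
        stretched = [v for v in row for _ in (0, 1)]
        out.append(stretched)
        out.append(list(stretched))
    return out

def concat(a, b):
    """Pixel-wise concatenation of two equally sized maps (two 'channels')."""
    return [[(a[i][j], b[i][j]) for j in range(len(a[0]))]
            for i in range(len(a))]

def segment(fm, thresh=1.0):
    """Mark a pixel as foreground when its summed channels exceed thresh."""
    return [[1 if sum(px) > thresh else 0 for px in row] for row in fm]

high = [[0.9, 0.1],            # coarse "aggregate" high-level features
        [0.1, 0.1]]
low = [[0.8, 0.7, 0.0, 0.0],   # fine low-level features (post-convolution)
       [0.6, 0.5, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0]]

mask = segment(concat(upsample2x(high), low))
print(mask)  # [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

The foreground region lands only where high-level evidence (object presence) and low-level evidence (edge detail) agree, which is the rationale for the concatenation step recited in the claim.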
Claim 6: Cao teaches a computer-implemented method comprising: acquiring one or more high-level features and one or more low-level features that are established for an image that depicts an object of a given type (i.e. FIG. 1, an example in which an apparatus for segmenting a medical image is integrated in an electronic device 100 is used. The electronic device 100 may obtain a slice pair 102 (the slice pair including two slices 103 and 104 by sampling a to-be-segmented medical image 101), perform feature extraction 105 on each slice in the slice pair by using different receptive fields, to obtain high-level feature information 107 and 108 and low-level feature information 106 and 109 of the each slice. In one aspect, the electronic device 100 then segments, for each slice in the slice pair, a target object in the slice according to the low-level feature information 106 and 109 and the high-level feature information 107 and 108 of the slice, to obtain an initial segmentation result 111 and 113 of the slice; para. [0036]); applying Atrous Spatial Pyramid Pooling (ASPP) to the one or more high-level features of the image to aggregate the one or more high-level features as aggregate features (i.e. The high-level feature information corresponding to the first slice sample and the high-level feature information corresponding to the second slice sample may be further processed by using ASPP, to obtain high-level feature information in more different dimensions, referring to FIG. 8; para. [0141]); concatenating the aggregate features with the one or more low-level features to form combined features (i.e. 
The electronic device performs convolution with a convolution kernel of “1×1” on the low-level feature information 806 and the high-level feature information 808 of the first slice 801 by using the first segmentation network branch 803, upsamples the high-level feature information obtained after convolution to have the same size as the low-level feature information obtained after convolution, concatenates the upsampled high-level feature information and low-level feature information obtained after convolution, to obtain the concatenated feature information of the first slice 801; para. [0173]); segmenting the combined features to generate a polygonal shape outline along outer boundaries of the object in the image (i.e. figs. 8-10, performs convolution with a convolution kernel of “3×3” on the concatenated feature information, and then upsamples the concatenated feature information obtained after convolution to obtain a size of the first slice, so that the initial segmentation result 814 of the first slice 801 can be obtained; para. [0173]), and incorporating the polygonal shape outline into a dataset that is used to train a convolutional neural network that is used to classify objects of the given type upon being applied to images (i.e. the preset image segmentation model may be converged by using the true values annotated in the slice sample pair, the predicted segmentation values of the slice samples in the slice sample pair, and the predicted correlation information, to obtain the trained image segmentation model; para. [0144]). Cao does not explicitly teach incorporating the outline into a dataset that is used to train a convolutional neural network that is used to classify objects; wherein the polygonal shape outline is defined by lines and/or curves that collectively surround the object in the image. However, Chen teaches incorporating the polygonal shape outline into a dataset that is used to train a convolutional neural network that is used to classify objects of the given type upon being applied to images (i.e.
method 400 may optionally include, at block 414, training the convolutional network based on the segmentations and/or the feedback. In an aspect, segmentation component 306, e.g., in conjunction with processor 302, memory 304, etc., can train the convolutional network based on the segmentations and/or the feedback. As described, segmentation component 306 can incorporate the feature maps 212, instance masks 214, etc. into the fully convolutional network to provide additional comparisons for determining categories and/or identifiable regions of input images; para. [0057]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cao to include the feature of Chen. One would have been motivated to make this modification because it improves detection and classification accuracy. However, Farooqi teaches wherein the polygonal shape outline (i.e. more than two object localization techniques can be used. Further, in some variations, the object localization techniques can be performed in sequence and/or partially in parallel. The first and second set of proposed bounding polygons (in some cases only one bounding polygon is identified by one of the localization techniques) are then analyzed to determine an intersection of union or other overlap across the first and second sets of proposed bounding polygons 230. Based on this determination, at least one optimal bounding polygon 240 is determined. This optimal bounding polygon 240 can then be used for subsequent image processing including classification of any encapsulated objects within the optimal bounding polygon 240 as applied to the cropped RGB image 160; para. [0032-0034]) is defined by lines and/or curves that collectively surround the object in the image (i.e. fig. 3, As is illustrated in image 340, a bounding polygon 342 can then be generated that encapsulates the foreground object. The image data encapsulated by the various edges of the bounding polygon 342 can then be subjected to further image processing including, without limitation, classification of the objects; para. [0035]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Cao and Chen to include the feature of Farooqi. One would have been motivated to make this modification because the outline captures the true geometry of the object more accurately, so the CNN learns from precise, noise-reduced labels.

Claim 7: Cao, Chen, and Farooqi teach the computer-implemented method of claim 6. Cao further teaches comprising: applying a convolution to the one or more low-level features (i.e. The electronic device performs convolution with a convolution kernel of “1×1” on the low-level feature information 806 and the high-level feature information 808 of the first slice 801 by using the first segmentation network branch 803, upsamples the high-level feature information obtained after convolution to have the same size as the low-level feature information obtained after convolution, concatenates the upsampled high-level feature information and low-level feature information obtained after convolution, to obtain the concatenated feature information of the first slice 801; para. [0173]); and concatenating the one or more low-level features after convolution with the aggregate features to form combined features (i.e. concatenates the upsampled high-level feature information and low-level feature information obtained after convolution, to obtain the concatenated feature information of the first slice 801; para. [0173]).

Claim 10: Cao, Chen, and Farooqi teach the computer-implemented method of claim 6. Cao further teaches up-sampling the polygonal shape outline (i.e. the upsampled high-level feature information and low-level feature information obtained after convolution are concatenated, to obtain concatenated feature information of the first slice sample; para. [0116]).

Claim 15: Cao teaches a non-transitory, computer-readable storage medium with instructions stored thereon that, when executed by at least one data processor of a system (i.e. a non-transitory computer-readable storage medium storing a plurality of processor executable instructions; para. [0009]), cause the system to: receive a first image that includes a depiction of a first object of a particular type (i.e. the to-be-segmented medical image may be provided to the apparatus for segmenting a medical image after each medical image acquisition device performs image acquisition on biological tissue (for example, heart or liver). The medical image acquisition device may include an electronic device such as a magnetic resonance imaging (MRI) scanner, a CT scanner, a colposcope, or an endoscope; para. [0043]); generate a set of regions of interest, wherein each region of interest is representative of a non-rectangular, polygonal shape (i.e. The image segmentation is a technology and a process of segmenting an image into several particular regions having special properties, and specifying a target of interest. This embodiment of this disclosure is mainly to segment a three-dimensional medical image and find a required target object. For example, a 3D medical image is divided in a z-axis direction into a plurality of single-frame slices (referred to as slices for short). A liver region or the like is then segmented from the slices. After segmentation results of all slices in the 3D medical image are obtained, these segmentation results are combined in the z-axis direction, so that a 3D segmentation result corresponding to a 3D medical image may be obtained. That is, the target object is, for example, a 3D form of the liver region. The segmented target object may be subsequently analyzed by a medical care person or another medical expert for further operation; para. [0035]); the polygonal shape outline having been generated by: establishing one or more high-level features and one or more low-level features of the second image (i.e. FIG. 1, an example in which an apparatus for segmenting a medical image is integrated in an electronic device 100 is used. The electronic device 100 may obtain a slice pair 102 (the slice pair including two slices 103 and 104 by sampling a to-be-segmented medical image 101), perform feature extraction 105 on each slice in the slice pair by using different receptive fields, to obtain high-level feature information 107 and 108 and low-level feature information 106 and 109 of the each slice. In one aspect, the electronic device 100 then segments, for each slice in the slice pair, a target object in the slice according to the low-level feature information 106 and 109 and the high-level feature information 107 and 108 of the slice, to obtain an initial segmentation result 111 and 113 of the slice; para. [0036]), applying Atrous Spatial Pyramid Pooling (ASPP) to the one or more high-level features of the second image to aggregate the one or more high-level features as aggregate features (i.e. The high-level feature information corresponding to the first slice sample and the high-level feature information corresponding to the second slice sample may be further processed by using ASPP, to obtain high-level feature information in more different dimensions, referring to FIG. 8; para. [0141]), concatenating the aggregate features with the one or more low-level features to form combined features (i.e.
The electronic device performs convolution with a convolution kernel of “1×1” on the low-level feature information 806 and the high-level feature information 808 of the first slice 801 by using the first segmentation network branch 803, upsamples the high-level feature information obtained after convolution to have the same size as the low-level feature information obtained after convolution, concatenates the upsampled high-level feature information and low-level feature information obtained after convolution, to obtain the concatenated feature information of the first slice 801; para. [0173]), and segmenting the combined features to generate the polygonal shape outline along outer boundaries of the second object in the second image (i.e. figs. 8-10, performs convolution with a convolution kernel of “3×3” on the concatenated feature information, and then upsamples the concatenated feature information obtained after convolution to obtain a size of the first slice, so that the initial segmentation result 814 of the first slice 801 can be obtained; para. [0173]), predict segmentation masks on at least a subset of the set of regions of interest in a pixel- to-pixel manner (i.e. A pixel belonging to a target object in the first slice sample is then selected according to the concatenated feature information, to obtain the first predicted segmentation value of the slice sample. For example, convolution with a convolution kernel of “3×3” may be specifically performed on the concatenated feature information, and upsampling is performed to obtain a size of the first slice sample, so that the first predicted segmentation value of the slice sample can be obtained; para. [0116]). 
Cao does not explicitly teach generate a set of regions of interest; produce classifications for objects, if any, in each region of interest using a convolutional neural network trained in part using a polygonal shape outline of a second object depicted in a second image; identify objects of the first image based on classifications; wherein the first portion is a cropped portion of the first image; wherein the polygonal shape outline is defined by lines and/or curves that collectively surround the second object. However, Chen teaches generate a set of regions of interest, wherein each region of interest is representative of a non-rectangular, polygonal shape (i.e. in the instance-level semantic segmentation sub-network, technologies such as R-FCN, MNC (e.g., or ROI-warping from MNC, as described herein), and/or the like, can be used, as described, for region proposals, to generate the instance masks, etc. For example, region proposals (e.g., regions of interest (RoI) 232) for the layer 202 can be generated from RPN 230, and a customized RoI classifier 210 is provided to classify the region proposals; para. [0037]); produce classifications for objects, if any, in each region of interest using a convolutional neural network trained in part using a polygonal shape outline of a second object depicted in a second image (i.e. in the instance-level semantic segmentation sub-network, technologies such as R-FCN, MNC (e.g., or ROI-warping from MNC, as described herein), and/or the like, can be used, as described, for region proposals, to generate the instance masks, etc. For example, region proposals (e.g., regions of interest (RoI) 232) for the layer 202 can be generated from RPN 230, and a customized RoI classifier 210 is provided to classify the region proposals; para. [0037-0040]), predict segmentation masks on at least a subset of the set of regions of interest in a pixel-to-pixel manner (i.e. generate masks of instances of objects in the image, etc. 
As described above, and further herein, category-level semantic segmentation can relate to a process for analyzing pixels in an image and assigning a label for each pixel, where the label may be indicative of an object type or category; para. [0017]), identify objects of the first image based on classifications and the segmentation masks (i.e. the classification of each instance mask is determined by the RoI classifier 210. To further boost the performance of the ROI classifier 210, the feature maps 212 can be stacked into the layers of RoI classifier 210, which may include stacking the feature maps 212 using a pooling operation (e.g., a position sensitive pooling (PSP) 238, a compact bilinear pooling 244 or other pooling or fusion operation, etc.); para. [0038]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cao to include the feature of Chen. One would have been motivated to make this modification because it improves detection and classification accuracy. However, Farooqi teaches wherein the polygonal shape outline (i.e. more than two object localization techniques can be used. Further, in some variations, the object localization techniques can be performed in sequence and/or partially in parallel. The first and second set of proposed bounding polygons (in some cases only one bounding polygon is identified by one of the localization techniques) are then analyzed to determine an intersection over union or other overlap across the first and second sets of proposed bounding polygons 230. Based on this determination, at least one optimal bounding polygon 240 is determined. This optimal bounding polygon 240 can then be used for subsequent image processing including classification of any encapsulated objects within the optimal bounding polygon 240 as applied to the cropped RGB image 160; para. 
[0032-0034]) is defined by lines and/or curves that collectively surround the second object (i.e. fig. 3, As is illustrated in image 340, a bounding polygon 342 can then be generated that encapsulates the foreground object. The image data encapsulated by the various edges of the bounding polygon 342 can then be subjected to further image processing including, without limitation, classification of the objects; para. [0035]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Cao and Chen to include the feature of Farooqi. One would have been motivated to make this modification because the outline captures the true geometry of the object more accurately, so the CNN learns from precise, noise-reduced labels.

Claim 16: Cao, Chen, and Farooqi teach the non-transitory, computer-readable storage medium of claim 15. Cao further teaches generating feature maps from the first image by applying at least a second convolutional neural network to at least a portion of the first image (i.e. the receptive field determines a region size of an input layer corresponding to an element in an output result of a layer. That is, the receptive field is a size, mapped on an input image, of an element point of an output result of a layer in the convolutional neural network (that is, a feature map, also referred to as feature information). For example, for details, refer to FIG. 3. Generally, a receptive field size of an output feature map pixel of a first convolutional layer (for example, C1) is equal to a convolution kernel size (a filter size), while a receptive field size of a high convolutional layer (for example, C4) is related to convolution kernel sizes and step sizes of all layers before the high convolutional layer. Therefore, different levels of information may be captured based on different receptive fields, to extract the feature information of different scales. 
That is, after feature extraction is performed on a slice by using different receptive fields, high-layer feature information of different scales and low-layer feature information of different scales of the slice may be obtained; para. [0045]). Cao does not explicitly teach sliding a window across the feature maps to obtain a plurality of anchor shapes using a region proposal network; and determining if each anchor shape of the plurality of anchor shapes contains an object to generate a set of regions of interest. However, Chen further teaches where generating a set of regions of interest comprises: generating feature maps from the first image by applying at least a second convolutional neural network to at least a portion of the first image; sliding a window across the feature maps to obtain a plurality of anchor shapes using a region proposal network; and determining if each anchor shape of the plurality of anchor shapes contains an object to generate a set of regions of interest (i.e. in the instance-level semantic segmentation sub-network, technologies such as R-FCN, MNC (e.g., or ROI-warping from MNC, as described herein), and/or the like, can be used, as described, for region proposals, to generate the instance masks, etc. For example, region proposals (e.g., regions of interest (RoI) 232) for the layer 202 can be generated from RPN 230, and a customized RoI classifier 210 is provided to classify the region proposals. In an example, a last layer 240 of the convolutional blocks (e.g., conv5, which may include 2048 channels in an example) can be convolved with a 1×1 convolutional layer to generate a feature map (e.g., a 1024-channel feature map). Then, k²(C+1) channels feature maps, also referred to as detection position-sensitive score maps 250, can be generated, where the +1 can be for the background class and a total of C categories. 
The k² can correspond to a k×k spatial grid, where the cell in the grid encodes the relative positions (e.g., top-left and bottom-right). In one example, k can be set to 7. In an example, the detection position-sensitive score maps can be generated for each RoI 232 in the image provided as output from the RPN 230. A pooling operation (e.g., position sensitive pooling 242) can be applied to the detection position-sensitive score maps to obtain a C+1-dimensional vector for each RoI 232; para. [0037]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Cao and Farooqi to include the feature of Chen. One would have been motivated to make this modification because it improves detection and classification accuracy.

Claim 17: Cao, Chen, and Farooqi teach the non-transitory, computer-readable storage medium of claim 16. Cao does not explicitly teach wherein each anchor shape is a non-rectangular, polygonal shape. However, Chen further teaches wherein each anchor shape is a non-rectangular, polygonal shape (i.e. FIG. 1 illustrates examples of images and semantic segmentations according to one aspect of the disclosure. Given an image in (a), a ground truth of category-level semantic segmentation is shown in (b), where each pixel is labeled with its corresponding category, which is represented by reference numerals 110 for sidewalk pixels, 112 for pedestrian pixels, 114 for automobile pixels, etc. in the representation of the image shown in (b). In FIG. 1, an example of instance-level semantic segmentation ground truth is shown in (c), where each object in the image is localized based on one or more masks, and are shown as represented using reference numerals 120 and 122 for different instances of pedestrians, 124 for an instance of an automobile, etc., to denote the segmentation of the objects (or instances). In FIG. 
1, the expected output of joint category-level and instance-level semantic segmentation, as described herein, is shown in (d). In (d), instances of traffic participants (e.g., cars, pedestrians and riders) are localized using masks, and categorized using category-level semantic segmentation, which can be denoted using different colors in the segmentation for categories with each instance being separately outlined, but are shown here in black and white; para. [0025]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Cao and Farooqi to include the feature of Chen. One would have been motivated to make this modification because it improves detection and classification accuracy.

Claim 19: Cao, Chen, and Farooqi teach the non-transitory, computer-readable storage medium of claim 15. Cao does not explicitly teach wherein each segmentation mask encodes an associated object's spatial layout. However, Chen further teaches wherein each segmentation mask encodes an associated object's spatial layout (i.e. in the instance-level semantic segmentation sub-network, technologies such as R-FCN, MNC (e.g., or ROI-warping from MNC, as described herein), and/or the like, can be used, as described, for region proposals, to generate the instance masks, etc. For example, region proposals (e.g., regions of interest (RoI) 232) for the layer 202 can be generated from RPN 230, and a customized RoI classifier 210 is provided to classify the region proposals. In an example, a last layer 240 of the convolutional blocks (e.g., conv5, which may include 2048 channels in an example) can be convolved with a 1×1 convolutional layer to generate a feature map (e.g., a 1024-channel feature map). Then, k²(C+1) channels feature maps, also referred to as detection position-sensitive score maps 250, can be generated, where the +1 can be for the background class and a total of C categories. 
The k² can correspond to a k×k spatial grid, where the cell in the grid encodes the relative positions (e.g., top-left and bottom-right). In one example, k can be set to 7. In an example, the detection position-sensitive score maps can be generated for each RoI 232 in the image provided as output from the RPN 230. A pooling operation (e.g., position sensitive pooling 242) can be applied to the detection position-sensitive score maps to obtain a C+1-dimensional vector for each RoI 232; para. [0037]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Cao and Farooqi to include the feature of Chen. One would have been motivated to make this modification because it enhances object representation.

7. Claims 3-4, 8-9, 11-13, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Cao, Chen, Farooqi, and further in view of Agarwal et al. (U.S. Patent Pub. No. US 11468675 B1).

Claim 3: Cao, Chen, and Farooqi teach the system of claim 2. Cao does not explicitly teach provide a user interface displaying the first image including the bounding shape. However, Agarwal teaches a user interface displaying the first image including the bounding shape (i.e. fig. 5, the user may utilize an input device (e.g., a mouse, a finger for touch input, etc.) to draw a bounding box 514 (or otherwise provide user input such as selecting an object, for example, by drawing bounding box 516 around an object (e.g., a dress)). Location data (e.g., dimensions of the bounding box, coordinates, etc.) for the bounding box (or selection) within the video frame may be identified; col. 11, lines 41-50). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Cao, Chen, and Farooqi to include the feature of Agarwal. 
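The position-sensitive score maps quoted from Chen — k²(C+1) channel maps pooled down to a (C+1)-dimensional vector per RoI — can be sketched as follows. This is an illustrative NumPy toy under stated assumptions: the score maps are random rather than learned, mean pooling stands in for the position sensitive pooling 242, and `ps_roi_pool` is a name coined here for illustration, not from the reference.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k=3):
    """Position-sensitive RoI pooling over detection score maps (simplified).

    score_maps: (k*k, C+1, H, W) -- one group of C+1 channels per cell of
    the k x k grid, so each bin of an RoI reads only "its own" maps.
    roi: (y0, x0, y1, x1) in pixel coordinates.
    Returns a (C+1,) score vector for the RoI (average vote over the bins).
    """
    y0, x0, y1, x1 = roi
    bin_h = (y1 - y0) / k
    bin_w = (x1 - x0) / k
    votes = []
    for i in range(k):          # grid row
        for j in range(k):      # grid column
            ys0, ys1 = int(y0 + i * bin_h), int(np.ceil(y0 + (i + 1) * bin_h))
            xs0, xs1 = int(x0 + j * bin_w), int(np.ceil(x0 + (j + 1) * bin_w))
            # Bin (i, j) reads only its own group of score maps.
            cell = score_maps[i * k + j, :, ys0:ys1, xs0:xs1]
            votes.append(cell.mean(axis=(-2, -1)))  # pool within the bin
    return np.mean(votes, axis=0)

C, k, H, W = 4, 3, 24, 24                  # C categories + 1 background class
rng = np.random.default_rng(0)
score_maps = rng.random((k * k, C + 1, H, W))
scores = ps_roi_pool(score_maps, roi=(3, 3, 18, 18), k=k)
print(scores.shape)                        # (5,)
```

The position sensitivity comes from the indexing `score_maps[i * k + j]`: the top-left bin of every RoI only ever reads the maps trained to respond to top-left object parts, which is what lets a fully convolutional detector encode an object's relative spatial layout.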
One would have been motivated to make this modification because it allows for more precise and focused training data preparation.

Claim 4: Cao, Chen, Farooqi, and Agarwal teach the system of claim 3. Cao does not explicitly teach receive input that is indicative of the bounding shape being placed by a user through the user interface. However, Agarwal teaches receive input that is indicative of the bounding shape being placed by a user through the user interface (i.e. fig. 5, the user may utilize an input device (e.g., a mouse, a finger for touch input, etc.) to draw a bounding box 514 (or otherwise provide user input such as selecting an object, for example, by drawing bounding box 516 around an object (e.g., a dress)). Location data (e.g., dimensions of the bounding box, coordinates, etc.) for the bounding box (or selection) within the video frame may be identified; col. 11, lines 41-50). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Cao, Chen, and Farooqi to include the feature of Agarwal. One would have been motivated to make this modification because it allows for more precise and focused training data preparation.

Claim 8: Cao, Chen, and Farooqi teach the computer-implemented method of claim 6. Cao does not explicitly teach wherein the image is representative of a rectangular portion of a second image, and wherein the rectangular portion is defined by a user. However, Agarwal teaches wherein the image is representative of a rectangular portion of a second image, and wherein the rectangular portion is defined by a user (i.e. The user may utilize an input device (e.g., a mouse, a finger for touch input, etc.) to draw a bounding box 514 (or otherwise provide user input such as selecting an object, for example, by drawing bounding box 516 around an object (e.g., a dress)). Location data (e.g., dimensions of the bounding box, coordinates, etc.) 
for the bounding box (or selection) within the video frame may be identified; col. 11, lines 41-50). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Cao, Chen, and Farooqi to include the feature of Agarwal. One would have been motivated to make this modification because it allows for more precise and focused training data preparation.

Claim 9: Cao, Chen, Farooqi, and Agarwal teach the computer-implemented method of claim 8. Cao does not explicitly teach wherein the second image is presented on a user interface, and wherein a bounding shape is placed by the user onto the second image to define the image. However, Agarwal further teaches wherein the second image is presented on a user interface, and wherein a bounding shape is placed by the user onto the second image to define the image (i.e. The user may utilize an input device (e.g., a mouse, a finger for touch input, etc.) to draw a bounding box 514 (or otherwise provide user input such as selecting an object, for example, by drawing bounding box 516 around an object (e.g., a dress)). Location data (e.g., dimensions of the bounding box, coordinates, etc.) for the bounding box (or selection) within the video frame may be identified; col. 11, lines 41-50). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Cao, Chen, and Farooqi to include the feature of Agarwal. One would have been motivated to make this modification because it allows for more precise and focused training data preparation.

Claim 11: Cao, Chen, and Farooqi teach the computer-implemented method of claim 6. Cao does not explicitly teach presenting, on a user interface, at least a portion of the image and the polygonal shape outline. 
However, Agarwal teaches presenting, on a user interface, at least a portion of the image and the polygonal shape outline (i.e. fig. 5, the user may utilize an input device (e.g., a mouse, a finger for touch input, etc.) to draw a bounding box 514 (or otherwise provide user input such as selecting an object, for example, by drawing bounding box 516 around an object (e.g., a dress)). Location data (e.g., dimensions of the bounding box, coordinates, etc.) for the bounding box (or selection) within the video frame may be identified; col. 11, lines 41-50). Therefore, it would have been obvious to one of ordinary skill in the art

Prosecution Timeline

Jun 19, 2023
Application Filed
May 04, 2024
Non-Final Rejection — §103
Nov 08, 2024
Response Filed
Feb 09, 2025
Final Rejection — §103
Apr 22, 2025
Interview Requested
Apr 30, 2025
Applicant Interview (Telephonic)
May 01, 2025
Examiner Interview Summary
May 13, 2025
Request for Continued Examination
May 18, 2025
Response after Non-Final Action
Oct 20, 2025
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12594668
BRAIN-LIKE DECISION-MAKING AND MOTION CONTROL SYSTEM
2y 5m to grant Granted Apr 07, 2026
Patent 12579420
Analog Hardware Realization of Trained Neural Networks
2y 5m to grant Granted Mar 17, 2026
Patent 12579421
Analog Hardware Realization of Trained Neural Networks
2y 5m to grant Granted Mar 17, 2026
Patent 12572850
METHOD FOR IMPLEMENTING MODEL UPDATE AND DEVICE THEREOF
2y 5m to grant Granted Mar 10, 2026
Patent 12572326
DIGITAL ASSISTANT FOR MOVING AND COPYING GRAPHICAL ELEMENTS
2y 5m to grant Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
60%
Grant Probability
92%
With Interview (+31.8%)
3y 6m
Median Time to Grant
High
PTA Risk
Based on 307 resolved cases by this examiner. Grant probability derived from career allow rate.
