Prosecution Insights
Last updated: April 19, 2026
Application No. 17/496,156

SEGMENTING AN IMAGE USING A NEURAL NETWORK

Final Rejection under §103
Filed: Oct 07, 2021
Examiner: GORMLEY, AARON PATRICK
Art Unit: 2148
Tech Center: 2100 — Computer Architecture & Software
Assignee: Applied Materials Israel Ltd.
OA Round: 2 (Final)

Grant Probability: 60% (Moderate)
Estimated OA Rounds: 3-4
Estimated Time to Grant: 4y 4m
Grant Probability With Interview: 0%

Examiner Intelligence

Career Allow Rate: 60% (grants 60% of resolved cases; 3 granted / 5 resolved; +5.0% vs TC avg)
Interview Lift: -60.0% (minimal sample of resolved cases with interview)
Avg Prosecution (typical timeline): 4y 4m
Career History: 35 total applications across all art units; 30 currently pending

Statute-Specific Performance

§101: 30.2% (-9.8% vs TC avg)
§103: 36.0% (-4.0% vs TC avg)
§102: 8.4% (-31.6% vs TC avg)
§112: 21.5% (-18.5% vs TC avg)

Tech Center averages shown are estimates. Based on career data from 5 resolved cases.

Office Action (§103)

DETAILED ACTION

This action is in response to the application filed 10/07/2021. Claims 1-6, 9-14, and 17-19 are pending and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification

The disclosure is objected to because of the following informalities:
- It is unclear what paragraph amended paragraph [0005] is intended to replace, as no paragraph labeled [0005] existed prior to the instant amendments. It is likely this was intended to be paragraph [005].
- [005] “an method” should be “a method”.
- [0058] In Formula 1, “enpropy” should be “entropy”. While one instance of “enpropy” was corrected in the instant amendments, the other was not.
- [0058] The relation “entropy_Si > entropy_Ei”, used twice in Formula 1, cannot be summed, multiplied, or used as a divisor, as it does not represent a value. It is likely that an Iverson bracket was intended, [entropy_Si > entropy_Ei], which would represent a value. While an Iverson bracket was added to the first use of the relation in the instant amendments, it was not added to the second.
- [0078] “for example by two parameter” should be “for example by two parameters”.
- [0078] “these limited number of parameters” should be “this limited number of parameters” or “these limited numbers of parameters”.

Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention. Claim(s) 1-3, 8, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 10,311,578 B1) in view of Naito (US 2011/0019928 A1). 
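For context on the Specification objection above: an Iverson bracket [P] maps a relation to a number, 1 when the relation holds and 0 otherwise, which is what makes the entropy comparison usable inside a formula. A minimal sketch (the variable names are illustrative, not taken from the application):

```python
def iverson(condition: bool) -> int:
    """Iverson bracket: map a relation to 1 (true) or 0 (false)."""
    return 1 if condition else 0

# The bare relation entropy_s > entropy_e is a truth value, not a number;
# wrapping it in an Iverson bracket yields 0 or 1, which can be summed,
# multiplied, or used as a divisor, as the objection explains.
entropy_s, entropy_e = 0.9, 0.4
term = iverson(entropy_s > entropy_e)  # 1 here, since 0.9 > 0.4
total = sum(iverson(s > e) for s, e in [(0.9, 0.4), (0.2, 0.7), (0.5, 0.1)])
```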
Regarding claim 1, Kim teaches a method, comprising: [R]eceiving, by a processing device, an image: “[T]here is provided a learning method for segmenting an input image having one or more lanes, including steps of: (a) a learning device (processing device), if the input image is acquired (receiv[ed]), instructing a convolutional neural network ...” (Kim, Col. 2, lines 15-19). [A]pplying a machine learning model to the image, wherein the machine-learning model is trained by a training process comprising evaluating training outputs generated during the training process using a loss function: “[T]here is provided a testing method for segmenting a test image having one or more lanes, including steps of: (a) on condition that a learning device ... (2) has instructed the CNN (machine learning model) module to apply at least one Softmax operation to each of the segmentation scores for training to thereby generate each of Softmax scores for training (training outputs); and (3) has instructed the CNN module to (I) (i) apply at least one multinomial logistic loss operation (loss function) to each of the Softmax scores for training to thereby generate each of Softmax losses ... and then (II) learn at least one parameter of the CNN module through backpropagation by using each of the Softmax losses and each of the embedding losses” (Kim, Col. 
3, lines 12-37) wherein the training process further comprises forming different clusters of machine learning model results for pixels that belong to different classes of segments: “there is provided a testing method for segmenting a test image having one or more lanes, including steps of: (a) on condition that a learning device … (3) has instructed the CNN module to (I) (i) apply at least one multinomial logistic loss operation to each of the Softmax scores for training to thereby generate each of Softmax losses” (Kim, column 3, paragraph 2); “Each of the multinomial logistic losses is calculated by using an equation: [equation reproduced as an image in the original] Herein, s denotes the number of the pixels included in the one input image, l denotes a one-hot-encoding vector indicating to which cluster an i-th pixel belongs on its corresponding ground truth (GT) (class / segment) label, and P(i) denotes each of the Softmax scores corresponding to each of the pixels. Herein, each of the Softmax scores indicates the cluster to which each pixel belongs through the largest element value within the vector” (Kim, Col.
8, lines 25-40) and modifying the different clusters, wherein the modifying comprises performing expansion operations, to move at least a portion of a first cluster to a second cluster, and shrinking operations, to increase a distance between the different clusters: “The present disclosure relates to a learning method for segmenting an input image having one or more lanes … and more particularly, to the learning method for segmenting the input image having the lanes including steps of: … (c) instructing the CNN module to (I) (i) apply at least one multinomial logistic loss operation to each of the Softmax scores to thereby generate each of Softmax losses and (ii) apply at least one pixel embedding operation to each of the Softmax scores to thereby generate each of embedding losses which causes a learning of the CNN module to increase each of inter-lane differences (distance[s]) among respective averages of the segmentation scores of the respective lanes (different clusters) and decrease (shrink) each of intra-lane variances among the segmentation scores of the respective lanes, and then (II) learn at least one parameter of the CNN module through backpropagation by using each of the Softmax losses and each of the embedding losses, and a learning device, a testing method and a testing device using the same” (Kim, column 1, paragraph 1). Loss and backpropagation are used to gradually increase the distance of different lane clusters in the segmentation / scoring space as the model is trained. [O]btaining, for each pixel of multiple pixels of the image, an output of the machine learning model within a multi-dimensional domain: “[T]here is provided a learning method for ... the learning device instructing the CNN (machine-learning model) module to ... generate each of Softmax scores (output[s]); and (c) the learning device instructing the CNN module to ... (I) (i) apply at least one multinomial logistic loss operation to each of the Softmax scores” (Kim, Col. 
2, lines 16-31); “Herein, each of the Softmax scores indicates the cluster to which each pixel belongs through the largest element value within the vector (output ... within a multi-dimensional domain)” (Kim, Col. 8, lines 38-40). wherein the output is obtained by providing the machine-learning model with pixels of different classes of segments of the image (“Each of the embedding losses is calculated ... [a]ssuming that the input image includes a plurality of clusters having the lanes and one or more background parts (pixels of different classes of segments)” (Kim, Col. 7, lines 37-50)) that are mapped to spaced apart clusters associated with different axes of the multi-dimensional domain (“Each of the multinomial logistic losses is calculated by using an equation: [equation reproduced as an image in the original] Herein, s denotes the number of the pixels included in the one input image, l denotes a one-hot-encoding vector indicating to which cluster an i-th pixel belongs on its corresponding ground truth (GT) label (spaced apart clusters associated with different axes of the multi-dimensional domain), and P(i) denotes each of the Softmax scores corresponding to each of the pixels. Herein, each of the Softmax scores indicates the cluster to which each pixel belongs through the largest element value within the vector” (Kim, Col. 8, lines 25-40)). Examiner’s note: As described in paragraph [0060] of the instant Specification, “The one-hot regulation loss function attempts to center the clusters about value ‘1’ on each axis”. By using one-hot encoding outputs to represent clusters, Kim is associating each cluster (class) with a single dimension (axis) of a multi-dimensional vector encoding (multi-dimensional domain) to space them out for ground truth. The Softmax scores are an approximation of this one-hot ground truth, still associating each pixel with a cluster based on the highest-valued dimension of a vector.
[D]etermining, using the machine-learning model and for each pixel of multiple pixels of the image, a class of a segment that comprises the pixel by finding a closest axis to the output: “[T]here is provided a testing method for segmenting a test image having one or more lanes, including steps of: (a) on condition that a learning device (1) has instructed a convolutional neural network (CNN) (machine learning model) module to ... generate each of segmentation scores for training of each of pixels on the training image; (2) has instructed the CNN module to ... generate each of Softmax scores (output) for training” (Kim, Col. 3, lines 12-24); “[E]ach of the Softmax scores indicates the cluster (class of segment) to which each pixel belongs through the largest element value (closest axis) within the vector” (Kim, Col. 8, lines 38-40). While Kim fails to disclose the further limitations of the claim, Naito discloses a method of modifying the different clusters, wherein the modifying comprises performing expansion operations, to move at least a portion of a first cluster to a second cluster, and shrinking operations, to increase a distance between the different clusters: “the clustering processing unit 302 merges a cluster (first cluster) having a small number of allocated pixels to another cluster (second cluster) every time it processes pixels for L lines” (Naito, [0025]). The larger cluster(s) are expanded by having new pixels added to their classification. Kim and Naito relate to image segmentation and are analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kim to merge small clusters into larger ones, as disclosed by Naito. Doing so would remove noise from the clustering. See Naito, [0044]. Regarding claim 2, the rejection of claim 1 in view of Kim and Naito is incorporated.
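The mapping the rejection draws for claim 1, from Kim’s Softmax scores to the claimed “closest axis” determination, can be illustrated with a minimal sketch. Each pixel’s score vector lives in a domain with one axis per cluster, the one-hot ground-truth label sits on a single axis, the multinomial logistic loss is a cross-entropy against that label, and classification picks the largest element. All values below are illustrative, not from the record:

```python
import math

def softmax(scores):
    """Map raw segmentation scores to a probability vector (one axis per cluster)."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multinomial_logistic_loss(softmax_scores, one_hot):
    """Cross-entropy of a pixel's Softmax vector against its one-hot GT label."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, softmax_scores))

def assigned_cluster(softmax_scores):
    """'Closest axis': the cluster whose axis carries the largest element."""
    return max(range(len(softmax_scores)), key=lambda i: softmax_scores[i])

# One pixel, three clusters (e.g. two lanes plus background); values illustrative.
raw = [2.0, 0.5, -1.0]
p = softmax(raw)
gt = [1, 0, 0]                 # one-hot GT: the pixel belongs to cluster 0 (axis 0)
loss = multinomial_logistic_loss(p, gt)
cluster = assigned_cluster(p)  # axis 0 carries the largest element
```

Averaging this per-pixel loss over all pixels gives the “added and then divided by the number of pixels” normalization quoted from Kim, Col. 8.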
Kim also teaches a method, wherein the machine-learning model comprises a neural network: “[T]here is provided a learning method for segmenting an input image having one or more lanes, including steps of ... instructing a convolutional neural network (CNN) (neural network) module to ... generate each of segmentation scores of each of pixels on the input image” (Kim, Col. 2, lines 16-25). Regarding claim 3, the rejection of claim 1 in view of Kim and Naito is incorporated. Kim also teaches a method, wherein the loss function comprises at least one of a normalized classification loss function, a one-hot regulation loss function, or a cluster mean perpendicular inducing loss function: “Each of the multinomial logistic losses is calculated by using an equation (classification loss function): [equation reproduced as an image in the original] ... Herein, each of the Softmax scores indicates the cluster to which each pixel belongs ... [E]ach of the loss values calculated as such per pixel is added and then divided by the number of pixels (normaliz[ation])” (Kim, Col. 8, lines 25-48). Regarding claim 17, Kim teaches [a] system comprising: a memory; and a processing device operatively coupled with the memory, to perform operations: “[T]here is provided a learning device for segmenting an input image having one or more lanes, including: at least one memory that stores instructions; and at least one processor configured to execute the instructions” (Kim, Col. 3, lines 47-51). Kim also teaches operations comprising... [P]roviding an image as input to a machine-learning model: “[T]here is provided a learning method for segmenting an input image having one or more lanes, including steps of ... if the input image is acquired, instructing a convolutional neural network (CNN) module (machine-learning model) to apply at least one convolution operation to the input image” (Kim, Col. 2, lines 16-21).
wherein the machine-learning model is trained by a training process comprising evaluating training outputs generated during the training process using a loss function: “[T]here is provided a testing method for segmenting a test image having one or more lanes, including steps of: (a) on condition that a learning device ... (2) has instructed the CNN (machine learning model) module to apply at least one Softmax operation to each of the segmentation scores for training to thereby generate each of Softmax scores for training (training outputs); and (3) has instructed the CNN module to (I) (i) apply at least one multinomial logistic loss operation (loss function) to each of the Softmax scores for training to thereby generate each of Softmax losses ... and then (II) learn at least one parameter of the CNN module through backpropagation by using each of the Softmax losses and each of the embedding losses” (Kim, Col. 3, lines 12-37). wherein the training process further comprises forming different clusters of machine learning model results for pixels that belong to different classes of segments: “there is provided a testing method for segmenting a test image having one or more lanes, including steps of: (a) on condition that a learning device … (3) has instructed the CNN module to (I) (i) apply at least one multinomial logistic loss operation to each of the Softmax scores for training to thereby generate each of Softmax losses” (Kim, column 3, paragraph 2); “Each of the multinomial logistic losses is calculated by using an equation: [equation reproduced as an image in the original] Herein, s denotes the number of the pixels included in the one input image, l denotes a one-hot-encoding vector indicating to which cluster an i-th pixel belongs on its corresponding ground truth (GT) (class / segment) label, and P(i) denotes each of the Softmax scores corresponding to each of the pixels.
Herein, each of the Softmax scores indicates the cluster to which each pixel belongs through the largest element value within the vector” (Kim, Col. 8, lines 25-40) and modifying the different clusters, wherein the modifying comprises performing expansion operations, to move at least a portion of a first cluster to a second cluster, and shrinking operations, to increase a distance between the different clusters: “The present disclosure relates to a learning method for segmenting an input image having one or more lanes … and more particularly, to the learning method for segmenting the input image having the lanes including steps of: … (c) instructing the CNN module to (I) (i) apply at least one multinomial logistic loss operation to each of the Softmax scores to thereby generate each of Softmax losses and (ii) apply at least one pixel embedding operation to each of the Softmax scores to thereby generate each of embedding losses which causes a learning of the CNN module to increase each of inter-lane differences (distance[s]) among respective averages of the segmentation scores of the respective lanes (different clusters) and decrease (shrink) each of intra-lane variances among the segmentation scores of the respective lanes, and then (II) learn at least one parameter of the CNN module through backpropagation by using each of the Softmax losses and each of the embedding losses, and a learning device, a testing method and a testing device using the same” (Kim, column 1, paragraph 1). Loss and backpropagation are used to gradually increase the distance of different lane clusters in the segmentation / scoring space as the model is trained. [O]btaining, for each pixel of multiple pixels of the image, an output of the machine learning model within a multi-dimensional domain: “[T]here is provided a learning method for ... the learning device instructing the CNN (machine-learning model) module to ... 
generate each of Softmax scores (output[s]); and (c) the learning device instructing the CNN module to ... (I) (i) apply at least one multinomial logistic loss operation to each of the Softmax scores” (Kim, Col. 2, lines 16-31); “Herein, each of the Softmax scores indicates the cluster to which each pixel belongs through the largest element value within the vector (output ... within a multi-dimensional domain)” (Kim, Col. 8, lines 38-40). wherein the output is obtained by providing the machine-learning model with pixels of different classes of segments (“Each of the embedding losses is calculated ... [a]ssuming that the input image includes a plurality of clusters having the lanes and one or more background parts (pixels of different classes of segments)” (Kim, Col. 7, lines 37-50)) mapped to spaced apart clusters that are associated with different axes of the multi-dimensional domain: (“Each of the multinomial logistic losses is calculated by using an equation: [equation reproduced as an image in the original] Herein, s denotes the number of the pixels included in the one input image, l denotes a one-hot-encoding vector indicating to which cluster an i-th pixel belongs on its corresponding ground truth (GT) label (spaced apart clusters associated with different axes of the multi-dimensional domain), and P(i) denotes each of the Softmax scores corresponding to each of the pixels. Herein, each of the Softmax scores indicates the cluster to which each pixel belongs through the largest element value within the vector” (Kim, Col. 8, lines 25-40)). Examiner’s note: As described in paragraph [0060] of the instant Specification, “The one-hot regulation loss function attempts to center the clusters about value ‘1’ on each axis”. By using one-hot encoding outputs to represent clusters, Kim is associating each cluster (class) with a single dimension (axis) of a multi-dimensional vector encoding (multi-dimensional domain) to space them out for ground truth.
The Softmax scores are an approximation of this one-hot ground truth, still associating each pixel with a cluster based on the highest-valued dimension of a vector. [D]etermining, using the machine-learning model and for each pixel of multiple pixels of the image, a class of a segment that comprises the pixel by finding a closest axis to the output: “[T]here is provided a testing method for segmenting a test image having one or more lanes, including steps of: (a) on condition that a learning device (1) has instructed a convolutional neural network (CNN) (machine learning model) module to ... generate each of segmentation scores for training of each of pixels on the training image; (2) has instructed the CNN module to ... generate each of Softmax scores (output) for training” (Kim, Col. 3, lines 12-24); “[E]ach of the Softmax scores indicates the cluster (class of segment) to which each pixel belongs through the largest element value (closest axis) within the vector” (Kim, Col. 8, lines 38-40). While Kim fails to disclose the further limitations of the claim, Naito discloses a method of modifying the different clusters, wherein the modifying comprises performing expansion operations, to move at least a portion of a first cluster to a second cluster, and shrinking operations, to increase a distance between the different clusters: “the clustering processing unit 302 merges a cluster (first cluster) having a small number of allocated pixels to another cluster (second cluster) every time it processes pixels for L lines” (Naito, [0025]). The larger cluster(s) are expanded by having new pixels added to their classification. Kim and Naito relate to image segmentation and are analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kim to merge small clusters into larger ones, as disclosed by Naito.
Doing so would remove noise from the clustering. See Naito, [0044]. Regarding claim 18, the rejection of claim 17 in view of Kim and Naito is incorporated. Kim also teaches a method, wherein the loss function comprises at least one of a normalized classification loss function, a one-hot regulation loss function, or a cluster mean perpendicular inducing loss function: “Each of the multinomial logistic losses is calculated by using an equation (classification loss function): [equation reproduced as an image in the original] ... Herein, each of the Softmax scores indicates the cluster to which each pixel belongs ... [E]ach of the loss values calculated as such per pixel is added and then divided by the number of pixels (normaliz[ation])” (Kim, Col. 8, lines 25-48). Claim(s) 4, 9-12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 10,311,578 B1) in view of Naito (US 2011/0019928 A1) and further in view of Redmon et al. (US 2019/0102646 A1). Regarding claim 4, the rejection of claim 1 in view of Kim and Naito is incorporated. While Kim and Naito fail to teach the further limitations of claim 4, Redmon teaches a method, wherein the training process comprises feeding the machine-learning model with images from a plurality of datasets, wherein images of at least two of the datasets of the plurality of datasets are labeled independently from each other and without using a common taxonomy: “Some implementations use a hierarchical view of object classification that enables the combination of distinct datasets together for training of convolutional neural networks for object detection and classification (feeding the machine-learning model with images from a plurality of datasets)” (Redmon, [0025]); “FIG. 10A is a diagram of examples of single level tree representations of classes represented in two training datasets of images with different class labeling schemes (images of at least two of the datasets ...
labeled independently from each other and without using a common taxonomy)” (Redmon, [0103]). An image of Figure 10A is included below. [Figure 10A reproduced as an image in the original] Redmon relates to image classification with machine learning and is analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Kim and Naito to use image data from independently labeled datasets, as disclosed by Redmon. Doing so would harness the larger amount of classification data already available to expand the scope and accuracy of Kim’s object detection. See Redmon, [0025]. Regarding claim 9, Kim teaches a method, comprising: [P]roviding an image as input to a machine-learning model: “[T]here is provided a learning method for segmenting an input image having one or more lanes, including steps of ... if the input image is acquired, instructing a convolutional neural network (CNN) module (machine-learning model) to apply at least one convolution operation to the input image” (Kim, Col. 2, lines 16-21).
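The multi-dataset training Redmon is cited for can be illustrated with a hypothetical sketch: datasets whose labels were assigned under different taxonomies are folded into one training set over a union label space. Redmon’s actual implementations build a hierarchical tree of classes; the flat union below is a simplification, and every name and value is illustrative:

```python
def combine_datasets(datasets):
    """Merge datasets with independent label vocabularies into one training
    set over the union label space (flat stand-in for Redmon's hierarchy)."""
    label_to_id = {}
    combined = []
    for samples in datasets:
        for image, label in samples:
            # Assign each previously unseen label the next integer id.
            if label not in label_to_id:
                label_to_id[label] = len(label_to_id)
            combined.append((image, label_to_id[label]))
    return combined, label_to_id

# Two toy datasets labeled without a common taxonomy; "car" happens to recur.
detection_set = [("img0", "car"), ("img1", "dog")]
classification_set = [("img2", "terrier"), ("img3", "car")]
combined, vocab = combine_datasets([detection_set, classification_set])
```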
wherein the training process further comprises forming different clusters of machine learning model results for pixels that belong to different classes of segments: “there is provided a testing method for segmenting a test image having one or more lanes, including steps of: (a) on condition that a learning device … (3) has instructed the CNN module to (I) (i) apply at least one multinomial logistic loss operation to each of the Softmax scores for training to thereby generate each of Softmax losses” (Kim, column 3, paragraph 2); “Each of the multinomial logistic losses is calculated by using an equation: [equation reproduced as an image in the original] Herein, s denotes the number of the pixels included in the one input image, l denotes a one-hot-encoding vector indicating to which cluster an i-th pixel belongs on its corresponding ground truth (GT) (class / segment) label, and P(i) denotes each of the Softmax scores corresponding to each of the pixels. Herein, each of the Softmax scores indicates the cluster to which each pixel belongs through the largest element value within the vector” (Kim, Col.
8, lines 25-40) and modifying the different clusters, wherein the modifying comprises performing expansion operations, to move at least a portion of a first cluster to a second cluster, and shrinking operations, to increase a distance between the different clusters: “The present disclosure relates to a learning method for segmenting an input image having one or more lanes … and more particularly, to the learning method for segmenting the input image having the lanes including steps of: … (c) instructing the CNN module to (I) (i) apply at least one multinomial logistic loss operation to each of the Softmax scores to thereby generate each of Softmax losses and (ii) apply at least one pixel embedding operation to each of the Softmax scores to thereby generate each of embedding losses which causes a learning of the CNN module to increase each of inter-lane differences (distance[s]) among respective averages of the segmentation scores of the respective lanes (different clusters) and decrease (shrink) each of intra-lane variances among the segmentation scores of the respective lanes, and then (II) learn at least one parameter of the CNN module through backpropagation by using each of the Softmax losses and each of the embedding losses, and a learning device, a testing method and a testing device using the same” (Kim, column 1, paragraph 1). Loss and backpropagation are used to gradually increase the distance of different lane clusters in the segmentation / scoring space as the model is trained. [O]btaining, for each pixel of multiple pixels of the image, an output of the machine learning model within a multi-dimensional domain: “[T]here is provided a learning method for ... the learning device instructing the CNN (machine-learning model) module to ... generate each of Softmax scores (output[s]); and (c) the learning device instructing the CNN module to ... (I) (i) apply at least one multinomial logistic loss operation to each of the Softmax scores” (Kim, Col. 
2, lines 16-31); “Herein, each of the Softmax scores indicates the cluster to which each pixel belongs through the largest element value within the vector (output ... within a multi-dimensional domain)” (Kim, Col. 8, lines 38-40). wherein the output is obtained by providing the machine-learning model with pixels of different classes of segments (“Each of the embedding losses is calculated ... [a]ssuming that the input image includes a plurality of clusters having the lanes and one or more background parts (pixels of different classes of segments)” (Kim, Col. 7, lines 37-50)) mapped to spaced apart clusters that are associated with different axes of the multi-dimensional domain (“Each of the multinomial logistic losses is calculated by using an equation: [equation reproduced as an image in the original] Herein, s denotes the number of the pixels included in the one input image, l denotes a one-hot-encoding vector indicating to which cluster an i-th pixel belongs on its corresponding ground truth (GT) label (spaced apart clusters associated with different axes of the multi-dimensional domain), and P(i) denotes each of the Softmax scores corresponding to each of the pixels. Herein, each of the Softmax scores indicates the cluster to which each pixel belongs through the largest element value within the vector” (Kim, Col. 8, lines 25-40)). Examiner’s note: As described in paragraph [0060] of the instant Specification, “The one-hot regulation loss function attempts to center the clusters about value ‘1’ on each axis”. By using one-hot encoding outputs to represent clusters, Kim is associating each cluster (class) with a single dimension (axis) of a multi-dimensional vector encoding (multi-dimensional domain) to space them out for ground truth. The Softmax scores are an approximation of this one-hot ground truth, still associating each pixel with a cluster based on the highest-valued dimension of a vector.
[D]etermining, using the machine-learning model and for each pixel of multiple pixels of the image, a class of a segment that comprises the pixel by finding a closest axis to the output: “[T]here is provided a testing method for segmenting a test image having one or more lanes, including steps of: (a) on condition that a learning device (1) has instructed a convolutional neural network (CNN) (machine learning model) module to ... generate each of segmentation scores for training of each of pixels on the training image; (2) has instructed the CNN module to ... generate each of Softmax scores (output) for training” (Kim, Col. 3, lines 12-24); “[E]ach of the Softmax scores indicates the cluster (class of segment) to which each pixel belongs through the largest element value (closest axis) within the vector” (Kim, Col. 8, lines 38-40). While Kim fails to disclose the further limitations of the claim, Naito discloses a method of modifying the different clusters, wherein the modifying comprises performing expansion operations, to move at least a portion of a first cluster to a second cluster, and shrinking operations, to increase a distance between the different clusters: “the clustering processing unit 302 merges a cluster (first cluster) having a small number of allocated pixels to another cluster (second cluster) every time it processes pixels for L lines” (Naito, [0025]). The larger cluster(s) are expanded by having new pixels added to their classification. Kim and Naito relate to image segmentation and are analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kim to merge small clusters into larger ones, as disclosed by Naito. Doing so would remove noise from the clustering.
See Naito, [0044]. While Naito doesn’t teach the further limitations of the claim, Redmon teaches a method, wherein the machine-learning model is trained by a training process comprising inputting into the machine-learning model a plurality of datasets, wherein images of at least two of the plurality of datasets are labeled independently from each other and without using a common taxonomy: “Some implementations use a hierarchical view of object classification that enables the combination of distinct datasets together for training of convolutional neural networks for object detection and classification (inputting into the machine-learning model a plurality of datasets)” (Redmon, [0025]); “FIG. 10A is a diagram of examples of single level tree representations of classes represented in two training datasets of images with different class labeling schemes (images ... are labeled independently from each other and without using a common taxonomy)” (Redmon, [0103]). Redmon relates to image classification with machine learning and is analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Kim and Naito to use image data from independently labeled datasets, as disclosed by Redmon. Doing so would harness the larger amount of classification data already available to expand the scope and accuracy of Kim’s object detection. See Redmon, [0025]. Regarding claim 10, the rejection of claim 9 in view of Kim, Naito, and Redmon is incorporated. Kim further teaches a method, wherein the machine-learning model comprises a neural network: “[T]here is provided a learning method for segmenting an input image having one or more lanes, including steps of ... instructing a convolutional neural network (CNN) (neural network) module to ... generate each of segmentation scores of each of pixels on the input image” (Kim, Col. 2, lines 16-25).
Regarding claim 11, the rejection of claim 9 in view of Kim, Naito, and Redmon is incorporated. Kim further teaches a method, wherein the training process comprises evaluating training outputs generated during the training process using a loss function: “[T]here is provided a testing method for segmenting a test image having one or more lanes, including steps of: (a) on condition that a learning device ... (2) has instructed the CNN (machine learning model) module to apply at least one Softmax operation to each of the segmentation scores for training to thereby generate each of Softmax scores for training (training outputs); and (3) has instructed the CNN module to (I) (i) apply at least one multinomial logistic loss operation (loss function) to each of the Softmax scores for training to thereby generate each of Softmax losses ... and then (II) learn at least one parameter of the CNN module through backpropagation by using each of the Softmax losses and each of the embedding losses” (Kim, Col. 3, lines 12-37). Regarding claim 12, the rejection of claim 11 in view of Kim, Naito, and Redmon is incorporated. Kim further teaches a method, wherein the loss function comprises at least one of a normalized classification loss function, a one-hot regulation loss function, or a cluster mean perpendicular inducing loss function: “Each of the multinomial logistic losses is calculated by using an equation (classification loss function): [equation image] ... Herein, each of the Softmax scores indicates the cluster to which each pixel belongs ... [E]ach of the loss values calculated as such per pixel is added and then divided by the number of pixels (normaliz[ation])” (Kim, Col. 8, lines 25-48). Regarding claim 19, the rejection of claim 17 in view of Kim and Naito is incorporated.
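The normalization the examiner points to for claim 12 (per-pixel losses summed and then divided by the number of pixels) can be sketched as follows. This is an illustrative cross-entropy form under the definitions quoted from Kim (one-hot label l, Softmax score P(i)), not a reproduction of Kim's actual equation, and the function name is hypothetical:

```python
# Illustrative sketch of a normalized multinomial logistic loss: the loss on
# each pixel is -log of the Softmax score on the ground-truth axis; the
# per-pixel losses are added and divided by the pixel count (normalization).
import math

def multinomial_logistic_loss(one_hot_labels, softmax_scores):
    """Mean over pixels of -log(P) on each pixel's ground-truth axis."""
    total = 0.0
    for l, p in zip(one_hot_labels, softmax_scores):
        true_axis = l.index(1.0)        # the single axis holding value 1
        total += -math.log(p[true_axis])
    return total / len(one_hot_labels)  # division by pixel count normalizes

# Two pixels: the first belongs to cluster 0, the second to cluster 1.
labels = [[1.0, 0.0], [0.0, 1.0]]
scores = [[0.9, 0.1], [0.2, 0.8]]
print(multinomial_logistic_loss(labels, scores))
```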
While Kim doesn’t disclose the further limitations of the claim, Redmon teaches a method, wherein the training process comprises feeding the machine-learning model with images from a plurality of datasets, wherein images of at least two of the datasets of the plurality of datasets are labeled independently from each other and without using a common taxonomy: “Some implementations use a hierarchical view of object classification that enables the combination of distinct datasets together for training of convolutional neural networks for object detection and classification (feeding the machine-learning model with images from a plurality of datasets)” (Redmon, [0025]); “FIG. 10A is a diagram of examples of single level tree representations of classes represented in two training datasets of images with different class labeling schemes (images of at least two of the datasets ... labeled independently from each other and without using a common taxonomy)” (Redmon, [0103]). Redmon relates to image classification with machine learning and is analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Kim and Naito to use image data from independently labeled datasets, as disclosed by Redmon. Doing so would harness the larger amount of classification data already available to expand the scope and accuracy of Kim’s object detection. See Redmon, [0025]. Claim(s) 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 10,311,578 B1) in view of Naito (US 2011/0019928 A1) and further in view of Sharma (The Most Comprehensive Guide to K-Means Clustering You’ll Ever Need, 2019, Analytics Vidhya). Regarding claim 5, the rejection of claim 1 in view of Kim and Naito is incorporated.
Kim further teaches a method, wherein the training process further comprises forming different clusters of machine-learning model results for pixels that belong to different classes of segments: “[T]here is provided a testing method for segmenting a test image having one or more lanes, including steps of: (a) on condition that a learning device (1) has instructed a convolutional neural network (CNN) (machine learning model) module to ... generate each of segmentation scores for training of each of pixels on the training image; (2) has instructed the CNN module to apply at least one Softmax operation to each of the segmentation scores for training to thereby generate each of Softmax scores (model results) for training” (Kim, Col. 3, lines 12-24); “[E]ach of the Softmax scores indicates the cluster to which each pixel belongs” (Kim, Col. 8, lines 38-39). Examiner’s Note: By generating Softmax scores for each pixel, each pixel is being assigned to a cluster. The process of finding the pixels that constitute each cluster can be considered forming clusters. Kim and Naito fail to disclose the further limitations of the claim, but Sharma teaches a method, wherein the training process further comprises ... modifying the different clusters, wherein the modifying comprises performing expansion operations and shrinking operations without reducing distances between the different clusters: “K-means is a centroid-based algorithm, or a distance-based algorithm, where we calculate the distances to assign a point to a cluster (modifying the different clusters). In K-Means, each cluster is associated with a centroid ... The first step in k-means is to pick the number of clusters, k” (Sharma, Introduction to K-Means Clustering); “We can stop the algorithm if the centroids of newly formed clusters are not changing (without reducing distances between the different clusters). 
Even after multiple iterations, if we are getting the same centroids for all the clusters, we can say that the algorithm is not learning any new pattern and it is a sign to stop the training” (Sharma, Stopping Criteria for K-Means Clustering). Examiner’s note: K-means is a clustering algorithm that finds k clusters within a set of data. This algorithm can be used to divide and recombine existing clusters into more numerous, smaller clusters by setting k to a value greater than the number of existing clusters, a shrinking operation. It can also be used to recombine existing clusters into fewer, larger clusters by setting k to a value smaller than the number of existing clusters, an expansion operation. As taught by Sharma, when constant centroids are being used as stopping criteria, no distances between clusters change during the final few iterations of the algorithm, as cluster centroids remain constant. Sharma relates to machine learning methods for clustering data and is analogous to the claimed invention. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Kim and Naito to modify clusters with expansion and shrinking operations, as disclosed by Sharma. K-means minimizes the intracluster distances of data clusters, resulting in groups likely to be similar, for an arbitrary number of clusters. See Sharma, Property 1, Inertia, and Introduction to K-Means Clustering. Regarding claim 6, the rejection of claim 5 in view of Kim, Naito, and Sharma is incorporated. Kim further teaches a method wherein at least one of the forming and the modifying is based, at least in part, on applying a normalized classification loss function: “[E]ach of the Softmax scores indicates the cluster to which each pixel belongs” (Kim, Col. 8, lines 38-39); “Each of the Softmax losses (normalized classification loss) ... 
may be used for learning at least one parameter of the CNN module through the backpropagation” (Kim, Col. 8, lines 52-55). Examiner’s note: The process of finding the pixels that constitute each cluster can be considered forming clusters. Training a model that forms these clusters can absolutely be considered part of the cluster forming process. Claims 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 10,311,578 B1) in view of Naito (US 2011/0019928 A1) in view of Redmon et al. (US 2019/0102646 A1) and further in view of Sharma (The Most Comprehensive Guide to K-Means Clustering You’ll Ever Need, 2019, Analytics Vidhya). Regarding claim 13, the rejection of claim 9 in view of Kim, Naito, and Redmon is incorporated. Kim further teaches a method of forming different clusters of machine-learning model results for pixels that belong to different classes of segments: “[T]here is provided a testing method for segmenting a test image having one or more lanes, including steps of: (a) on condition that a learning device (1) has instructed a convolutional neural network (CNN) (machine learning model) module to ... generate each of segmentation scores for training of each of pixels on the training image; (2) has instructed the CNN module to apply at least one Softmax operation to each of the segmentation scores for training to thereby generate each of Softmax scores (model results) for training” (Kim, Col. 3, lines 12-24); “[E]ach of the Softmax scores indicates the cluster to which each pixel belongs” (Kim, Col. 8, lines 38-39). Examiner’s note: By generating Softmax scores for each pixel, each pixel is being assigned to a cluster. The process of finding the pixels that constitute each cluster can be considered forming clusters. 
While Kim, Naito, and Redmon don’t teach the further limitations of the claim, Sharma teaches a method of modifying the different clusters, wherein the modifying comprises performing expansion operations and shrinking operations without reducing distances between the different clusters: “K-means is a centroid-based algorithm, or a distance-based algorithm, where we calculate the distances to assign a point to a cluster (modifying the different clusters). In K-Means, each cluster is associated with a centroid ... The first step in k-means is to pick the number of clusters, k” (Sharma, Introduction to K-Means Clustering); “We can stop the algorithm if the centroids of newly formed clusters are not changing (without reducing distances between the different clusters). Even after multiple iterations, if we are getting the same centroids for all the clusters, we can say that the algorithm is not learning any new pattern and it is a sign to stop the training” (Sharma, Stopping Criteria for K-Means Clustering). Examiner’s note: K-means is a clustering algorithm that finds k clusters within a set of data. This algorithm can be used to divide and recombine existing clusters into more numerous, smaller clusters by setting k to a value greater than the number of existing clusters, a shrinking operation. It can also be used to recombine existing clusters into fewer, larger clusters by setting k to a value smaller than the number of existing clusters, an expansion operation. As taught by Sharma, when constant centroids are being used as stopping criteria, no distances between clusters change during the final few iterations of the algorithm, as cluster centroids remain constant. Sharma relates to machine learning methods for clustering data and is analogous to the claimed invention. 
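The stopping criterion quoted from Sharma (halt once the centroids of the newly formed clusters stop changing between iterations) can be sketched with a minimal one-dimensional k-means. The data, k, and initial centroids are illustrative assumptions, and this is not Sharma's or the applicant's code:

```python
# Illustrative 1-D k-means sketch: iterate assignment and centroid update,
# stopping when centroids no longer change (Sharma's stopping criterion),
# at which point no distances between clusters are changing either.
def kmeans_1d(points, centroids, max_iters=100):
    for _ in range(max_iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Recompute centroids; keep the old centroid for an empty cluster.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # centroids unchanged: stop the training
            break
        centroids = new
    return centroids

# Two well-separated groups of points converge to two stable centroids.
print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], centroids=[0.0, 10.0]))
```

Choosing k larger or smaller than the current number of clusters is what the examiner's note characterizes as shrinking or expansion operations, respectively.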
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Kim, Naito, and Redmon to modify clusters with expansion and shrinking operations, as disclosed by Sharma. K-means minimizes the intracluster distances of data clusters, resulting in groups likely to be similar, for an arbitrary number of clusters. See Sharma, Property 1, Inertia, and Introduction to K-Means Clustering. Regarding claim 14, the rejection of claim 13 in view of Kim, Naito, Redmon, and Sharma is incorporated. Kim further teaches a method, wherein at least one of the forming or the modifying is based, at least in part, on applying a normalized classification loss function: “[E]ach of the Softmax scores indicates the cluster to which each pixel belongs” (Kim, Col. 8, lines 38-39); “Each of the Softmax losses (normalized classification loss) ... may be used for learning at least one parameter of the CNN module through the backpropagation” (Kim, Col. 8, lines 52-55). Examiner’s note: The process of finding the pixels that constitute each cluster can be considered forming clusters. Training a model that forms these clusters can absolutely be considered part of the cluster forming process. Response to Arguments The following responses address argu

Prosecution Timeline

Oct 07, 2021
Application Filed
Mar 07, 2025
Non-Final Rejection — §103
Jun 18, 2025
Applicant Interview (Telephonic)
Jun 18, 2025
Examiner Interview Summary
Aug 19, 2025
Response Filed
Sep 26, 2025
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585955
Minimal Trust Data Sharing
2y 5m to grant (granted Mar 24, 2026)
Patent 12579440
Training Artificial Neural Networks Using Context-Dependent Gating with Weight Stabilization
2y 5m to grant (granted Mar 17, 2026)
Study what changed to get past this examiner. Based on 2 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
60%
Grant Probability
0%
With Interview (-60.0%)
4y 4m
Median Time to Grant
Moderate
PTA Risk
Based on 5 resolved cases by this examiner. Grant probability derived from career allow rate.
