Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Information Disclosure Statement
The information disclosure statement (IDS) submitted on February 29, 2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claim 6 is objected to because of the following informalities: In claim 6, lines 5-6, “pixel values of the a coarse segmentation mask” should read “pixel values of a coarse segmentation mask”. Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6-7, 9-13, 15-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (US 20220245824 A1) in view of Lai et al. ("Denoising diffusion semantic segmentation with mask prior modeling.", 2023; Copy provided by examiner).
Regarding Claim 1:
Liu et al. teaches: A computing system, comprising: a processor; and a storage device holding instructions executable by the processor to (Abstract “The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate refined segmentation masks”; ¶ [0136] “the components 902-928 can include one or more instructions stored on a computer-readable storage medium and executable by processors”):
receive an initial image segmentation mask for an image, (¶ [0007] “system utilizes a neural network to iteratively refine a segmentation mask for an input digital visual media item (e.g., image or video frame)”; ¶ [0022] “…the segmentation refinement system utilizes a segmentation refinement neural network to generate an initial segmentation mask for the digital visual media item by determining whether the plurality of pixels correspond to the one or more objects.”);
input the initial image segmentation mask to a (Liu et al. teaches that the initial segmentation mask is input to a trained model (i.e., a segmentation refinement neural network) to generate a refined image segmentation mask by generating one or more refined segmentation masks based on the initial mask, ¶ [0041] “a segmentation refinement neural network refers to a neural network that analyzes a digital visual media item, generates an initial segmentation mask for the digital visual media item, and generates one or more refined segmentation masks based on the initial segmentation mask.” Liu et al. further teaches an iterative segmentation process in which each iteration uses a neural network to change pixel values by identifying uncertain pixels and reclassifying at least some of them, ¶ [0007] “…system utilizes a neural network to iteratively refine a segmentation mask for an input digital visual media item (e.g., image or video frame). In some embodiments, for each iteration, the system utilizes the neural network to identify uncertain pixels (e.g., pixels that have been uncertainly classified) from a previously-generated segmentation mask and reclassifies at least some of the uncertain pixels.”, and by identifying pixels associated with objects and updating the pixels’ probabilities that they are associated with the objects based on feature values, ¶ [0082] “the segmentation refinement neural network 300 generates the refined segmentation mask 318 by further redetermining whether the certain pixels correspond to the one or more objects depicted in the digital visual media item 304 based on the feature values associated with the certain pixels (e.g., those feature values extracted from the final feature map 308). Indeed, in some implementations, the segmentation refinement neural network 300 reclassifies all pixels.
In other words, the segmentation refinement neural network 300 generates updated probabilities that the pixels correspond to the one or more objects depicted in the digital visual media item 304.” A series of intermediary segmentation masks is generated (refer to Fig. 3; ¶ [0086] “Fig. 3 illustrates two refinement iterations (i.e., generation of two refined segmentation masks), the segmentation refinement neural network 300 performs more or less iterations in various embodiments” and ¶ [0087] “In one or more embodiments, the segmentation refinement system 106 modifies the number of refinement iterations performed by the segmentation refinement neural network 300 based on the capabilities of the implementing computing device.”), and the changes in pixel value taught above result in an increase in pixel resolution during each refinement iteration, ¶ [0086] “In some implementations, with each refinement iteration, the segmentation refinement neural network 300 increases the resolution of the generated refined segmentation mask. For example, in some embodiments, the segmentation refinement neural network 300 doubles the resolution of the refined segmentation mask generated in each iteration.”);
and output the refined image segmentation mask (Abstract “the disclosed systems utilize a segmentation refinement neural network to generate an initial segmentation mask for a digital visual media item. The disclosed systems further utilize the segmentation refinement neural network to generate one or more refined segmentation masks based on uncertainly classified pixels identified from the initial segmentation mask.”; ¶ [0038] “the term “neural network” refers to a machine learning model that can be tuned (e.g., trained) based on inputs to approximate unknown functions used for generating the corresponding outputs”; The purpose of the system Liu et al. teaches is to generate a refined image segmentation mask, not merely to use a refined image segmentation mask as an intermediate tool or representation. Thus, the examiner interprets the refined image segmentation mask as an output of the system.).
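For illustration only, the iterative refinement Liu et al. describes (identify uncertainly classified pixels, reclassify at least some of them, repeat) can be sketched as follows (examiner's minimal sketch; the thresholds and the stand-in `refine_fn` are hypothetical and not taken from the reference):

```python
import numpy as np

def refine_mask(probs, refine_fn, iters=2, low=0.4, high=0.6):
    """Iteratively reclassify uncertainly classified pixels of a
    probability mask, keeping confidently classified pixels fixed.

    probs     : HxW array of object probabilities (initial mask).
    refine_fn : callable returning updated probabilities for the
                uncertain pixels (stands in for the neural network).
    """
    mask = probs.copy()
    for _ in range(iters):
        # Pixels whose probability is near 0.5 are "uncertain".
        uncertain = (mask > low) & (mask < high)
        if not uncertain.any():
            break
        # Reclassify only the uncertain pixels.
        mask[uncertain] = refine_fn(mask)[uncertain]
    return (mask > 0.5).astype(np.uint8)  # binary refined mask
```

Here `refine_fn` stands in for the segmentation refinement neural network; only near-0.5 probabilities are touched on each pass, mirroring the reference's reclassification of uncertain pixels.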
Liu et al. fails to teach a discrete diffusion model.
In a related art, Lai et al. teaches using a previously segmented input image in a discrete diffusion model (Abstract “we propose to ameliorate the semantic segmentation quality of existing discriminative approaches with a mask prior modeled by a recently developed denoising diffusion generative model. Beginning with a unified architecture that adapts diffusion models for mask prior modeling, we focus this work on a specific instantiation with discrete diffusion”).
Refer to Fig. 1 (see below) depicting a denoising diffusion generative model (seen on the upper right of the figure) taught by Lai et al. that uses a “segmentation mask prior p(x)” input from an image mask previously segmented.
[media_image1.png (greyscale): Fig. 1 of Lai et al.]
Refer to Fig. 3 (see below) that teaches a Denoising Diffusion Prior Modeling for semantic Segmentation (DDPS) which uses an image representation from a base semantic segmentation model input in the diffusion model to ultimately create a refined mask.
[media_image2.png (greyscale): Fig. 3 of Lai et al.]
Liu et al. and Lai et al. are analogous art to the instant application because both teach methods of refining image segmentation masks based on input images and previously segmented masks. One seeking to learn how segmentation models can be used to refine pixels and improve the quality of an image would therefore have consulted Liu et al. and Lai et al.
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the
effective filing date of the claimed invention to have modified the teachings of Liu et al. to incorporate the teachings of Lai et al. and use a discrete diffusion model to refine and output a segmented image mask. Doing so “could achieve competitively quantitative performance and more appealing visual quality” (Lai et al. Abstract).
Regarding Claim 11:
Claim 11 mirrors the limitations of claim 1 and is therefore rejected over the prior art applied to claim 1, including a method for image segmentation mask refinement (see Liu et al. Abstract).
Regarding Claims 2 and 12:
Liu et al. and Lai et al. teach the limitations of claims 1 and 11.
Liu et al. further teaches: wherein the (¶ [0028] “the segmentation refinement neural network generates multiple refined segmentation masks for a digital visual media item via multiple refinement iterations”. A digital visual media item is cited as an image, see ¶ [0007] “digital visual media item (e.g., image or video frame)”.);
changing pixel values of one or more mask pixels of a preceding image segmentation mask generated on a preceding iteration cycle (Liu et al teaches, ¶ [0007] “for each iteration, the system utilizes the neural network to identify uncertain pixels (e.g., pixels that have been uncertainly classified) from a previously-generated segmentation mask and reclassifies at least some of the uncertain pixels”, thereby iteratively updating mask pixel values across iterations.).
Liu et al. does not teach using a discrete diffusion model.
In a related art, Lai et al. teaches: using a denoising diffusion generative model to improve segmentation mask quality by conditioning on a mask prior (Abstract). Lai et al. further teaches a discrete diffusion model, “Though a variety of diffusion models are applicable, we focus this work on a specific instantiation with discrete diffusion. In a nutshell, starting from a stationary state x_T, e.g., random mask or blank mask with all pixels belonging to a special [MASK] category, the diffusion model with mask prior then iteratively denoises the current state x_t to the next state x_{t−1}, conditioned on initial predictions of image representations.” (See p. 4, Sec. 3.2 “Overall Architecture”, sub-section “Denoising Diffusion Mask Prior”).
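The quoted discrete-diffusion inference procedure (start from a blank mask whose pixels all carry the special [MASK] category, then iteratively commit pixels conditioned on an initial prediction) can be illustrated with the following sketch (examiner's hypothetical code; the unmasking schedule and function names are illustrative assumptions, not from Lai et al.):

```python
import numpy as np

MASK = -1  # stand-in for the special [MASK] category

def discrete_denoise(prior_logits, steps=4, rng=None):
    """Sketch of discrete-diffusion inference: begin at the stationary
    state x_T (all pixels [MASK]) and iteratively denoise x_t to
    x_{t-1} by committing a fraction of pixels to labels drawn from
    the conditioning prediction (argmax of base-segmenter logits)."""
    rng = rng or np.random.default_rng(0)
    h, w, _ = prior_logits.shape
    state = np.full((h, w), MASK, dtype=int)   # stationary state x_T
    labels = prior_logits.argmax(axis=-1)      # conditioning prediction
    for t in range(steps):                     # x_t -> x_{t-1}
        still_masked = np.argwhere(state == MASK)
        # Unmask a growing share of the remaining pixels each step.
        k = int(np.ceil(len(still_masked) / (steps - t)))
        chosen = still_masked[rng.permutation(len(still_masked))[:k]]
        for i, j in chosen:
            state[i, j] = labels[i, j]
    return state  # x_0: no [MASK] pixels remain
```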
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the
effective filing date of the claimed invention to have modified the teachings of Liu et al. to incorporate the teachings of Lai et al. and use a discrete diffusion model when iterating through the segmentation masks and changing pixel values of previous cycles. Doing so would avoid continuous approximations of pixel values by conditioning each iteration on discrete pixel states, thus making the system more efficient.
Regarding Claims 3 and 13:
Liu et al. and Lai et al. teach the limitations of claims 2 and 12.
Liu et al. further teaches: wherein a trained neural network is used to output the series of intermediary image segmentation masks (Abstract “the disclosed systems utilize a segmentation refinement neural network to generate an initial segmentation mask for a digital visual media item. The disclosed systems further utilize the segmentation refinement neural network to generate one or more refined segmentation masks based on uncertainly classified pixels identified from the initial segmentation mask.”; ¶ [0039] “a neural network layer includes a collection of artificial neurons that processes an input to the neural network layer, which includes an input to the neural network or an output of a previous neural network layer”; ¶ [0041] “a multi-layer perceptron renderer includes a feedforward neural network that generates one or more refined segmentation masks for a digital visual media item based on an initial segmentation mask corresponding to the digital visual media item.”; In summary, Liu et al. teaches a system that creates an initial segmentation mask then utilizes the feedforward neural network (i.e. segmentation refinement neural network) to generate one or more intermediary outputs of refined image segmentation masks based on the initial segmentation mask and proceeding layers.).
Regarding Claim 4:
Liu et al. and Lai et al. teach the limitations of claim 3.
Lai et al. further teaches: using a U-Net architecture. Lai et al. first teaches a diffusion model being used (DDPS), “…we first construct a unified architecture adapting Denoising Diffusion Prior modeling for semantic Segmentation (DDPS)…” (See Sec. 1, p. 2, left column, ¶ (04)), then describes the variant of discrete diffusion model architecture being trained and the implementation of a U-Net denoiser, “Though a variety of diffusion models are applicable, we focus this work on a specific instantiation with discrete diffusion. In a nutshell, starting from a stationary state x_T, e.g., random mask or blank mask with all pixels belonging to a special [MASK] category, the diffusion model with mask prior then iteratively denoises the current state x_t to the next state x_{t−1}, conditioned on initial predictions of image representations. Following, we adopt a variant of U-Net as denoiser. It is trained to predict the unknown clean mask x_0 for deriving x_{t−1} instead of directly predicting x_{t−1}, which is also known as x_0 parameterization.” (See p. 4, Sec. 3.2 “Overall Architecture”, sub-section “Denoising Diffusion Mask Prior”). Based on the teachings of Lai et al., DDPS is interpreted to be a trained neural network and the U-Net denoiser is interpreted as a U-Net architecture.
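The quoted x_0 parameterization (predict the clean mask x_0, then derive x_{t−1} from it rather than predicting x_{t−1} directly) can be illustrated by the following sketch (examiner's hypothetical code; the simple re-corruption rule is a simplifying assumption, not the posterior used by Lai et al.):

```python
import numpy as np

def x0_param_step(denoiser, x_t, t, T, rng):
    """x_0 parameterization: the denoiser predicts the clean mask x_0
    directly; x_{t-1} is then derived by re-corrupting that prediction
    to noise level t-1 (here: keep a pixel clean with probability
    1 - (t-1)/T, otherwise leave it at its current state)."""
    x0_pred = denoiser(x_t, t)                       # predicted clean mask
    keep = rng.random(x0_pred.shape) >= (t - 1) / T  # pixels left clean
    x_prev = np.where(keep, x0_pred, x_t)            # derived x_{t-1}
    return x_prev, x0_pred
```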
Regarding Claims 6 and 15:
Liu et al. and Lai et al. teach the limitations of claims 1 and 12.
Lai et al. further teaches: wherein the diffusion model is trained in a two-phase training process including a forward diffusion phase and a reverse diffusion phase, wherein the forward diffusion phase includes iteratively adding noise to a ground truth image segmentation mask to generate a training coarse segmentation mask, and wherein the reverse diffusion phase includes iteratively changing pixel values of a coarse segmentation mask to generate a training refined segmentation mask during inference (Refer to Fig. 2, an excerpt from Lai et al., found below).
[media_image3.png (greyscale): Fig. 2 of Lai et al.]
Fig. 2 teaches the two-phase diffusion model that uses a Markov chain, target distribution, and a transition distribution to gradually (i.e., iteratively) add noise (interpreted as changing pixel values to one of ordinary skill in the art) during the forward diffusion phase (phase 1) and apply a denoiser during the reverse diffusion phase (phase 2). Lai et al. also teaches that the diffusion process generates the noisy samples by corrupting the ground-truth data (refer to p. 5, Section “3.3. Training”, paragraph 2), therefore the image segmentation mask is interpreted to be a ground truth image segmentation mask. Additionally, Lai et al. teaches the inference process uses a gradual process of denoising, see p. 6, Section 3.4, paragraph 2, “The inference process of diffusion models follows a denoising trajectory that starts from a pure noisy mask x_T and gradually predicts a less noisy mask x_{t−1} given the current mask x_t.”. The forward phase of generating samples and adding noise to the ground truth image segmentation mask is interpreted as generating a training coarse segmentation mask, and the reverse phase of learning and reversing the transition (denoising) is interpreted as iteratively changing pixel values of a coarse segmentation mask to generate a training refined segmentation mask during inference.
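The two training phases discussed above can be illustrated as follows (examiner's hypothetical sketch; the uniform pixel-resampling noise and the function names are simplifying assumptions, not the transition distribution of Lai et al.):

```python
import numpy as np

def forward_corrupt(gt_mask, t, T, n_classes, rng):
    """Forward phase: corrupt the ground-truth mask by resampling a
    t/T fraction of its pixels to random classes (the added noise)."""
    noisy = gt_mask.copy()
    flip = rng.random(gt_mask.shape) < t / T
    noisy[flip] = rng.integers(0, n_classes, flip.sum())
    return noisy

def training_pairs(gt_mask, T, n_classes, seed=0):
    """Yield (noisy, target) pairs: the reverse phase trains a denoiser
    to map each corrupted mask back toward the clean ground truth."""
    rng = np.random.default_rng(seed)
    for t in range(1, T + 1):
        yield forward_corrupt(gt_mask, t, T, n_classes, rng), gt_mask
```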
Regarding Claims 7 and 16:
Liu et al. and Lai et al. teach the limitations of claims 6 and 15.
Lai et al. further teaches: wherein the forward diffusion phase is a unidirectional process in which every mask pixel of the ground truth image segmentation mask is transitioned from a fine state to a coarse state. First, Lai et al. teaches the diffusion process generates noisy samples by corrupting the ground-truth data (refer to p. 5, Section “3.3. Training”, paragraph 2), therefore the image segmentation mask is based on ground truth data. Next, Lai et al. teaches how the forward diffusion phase uses a Markov chain to progressively add noise at each step and a transition distribution to progress the steps (refer to Fig. 2 found above). The forward diffusion process is considered unidirectional by the examiner because the image mask only moves in one direction, from a fine state to a coarse state, in this phase. The forward diffusion process is interpreted as unidirectional, and the step of adding noise to each pixel beginning with a clean ground-truth-based image segmentation mask is interpreted as transitioning every mask pixel from a fine state to a coarse state.
Regarding Claims 9 and 18:
Liu et al. and Lai et al. teach the limitations of claims 1 and 11.
Liu et al. further teaches: wherein the initial image segmentation mask is output by an image segmentation model trained to output image segmentation masks for input images (Abstract “the disclosed systems utilize a segmentation refinement neural network to generate an initial segmentation mask for a digital visual media item.”; ¶ [0007] “system utilizes a neural network to iteratively refine a segmentation mask for an input digital visual media item (e.g., image or video frame)”; ¶ [0082] “the segmentation refinement neural network 300 provides the output of the one or more convolutional layers to a classifier to generate the refined segmentation mask.”).
Regarding Claims 10 and 19:
Liu et al. and Lai et al. teach the limitations of claims 9 and 18.
Liu et al. further teaches: wherein the image segmentation model is a convolutional neural network (CNN) (¶ [0023] “the segmentation refinement neural network includes a backbone neural network component, such as a convolutional neural network, that generates initial segmentation masks for digital visual media items.”).
Regarding Claim 20:
Liu et al. teaches: A computing system, comprising: a processor; and a storage device holding instructions executable by the processor to (Abstract “The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate refined segmentation masks”; ¶ [0136] “the components 902-928 can include one or more instructions stored on a computer-readable storage medium and executable by processors”):
receive an initial image segmentation mask for an image, the initial image segmentation mask output by a trained image segmentation model (¶ [0007] “system utilizes a neural network to iteratively refine a segmentation mask for an input digital visual media item (e.g., image or video frame)”; ¶ [0022] “…the segmentation refinement system utilizes a segmentation refinement neural network to generate an initial segmentation mask for the digital visual media item by determining whether the plurality of pixels correspond to the one or more objects.”; ¶ [0038] “…the term “neural network” refers to a machine learning model that can be tuned (e.g., trained)”);
input the initial image segmentation mask to a preceding image segmentation mask generated on a preceding iteration cycle (Liu et al. teaches that the initial segmentation mask is input to the trained model (i.e., a segmentation refinement neural network) to generate a refined image segmentation mask by generating one or more refined segmentation masks based on the initial mask, ¶ [0041] “a segmentation refinement neural network refers to a neural network that analyzes a digital visual media item, generates an initial segmentation mask for the digital visual media item, and generates one or more refined segmentation masks based on the initial segmentation mask.” Liu et al. further teaches an iterative segmentation process in which each iteration uses a neural network to change pixel values by identifying uncertain pixels and reclassifying at least some of them, ¶ [0007] “…system utilizes a neural network to iteratively refine a segmentation mask for an input digital visual media item (e.g., image or video frame). In some embodiments, for each iteration, the system utilizes the neural network to identify uncertain pixels (e.g., pixels that have been uncertainly classified) from a previously-generated segmentation mask and reclassifies at least some of the uncertain pixels.”, and by identifying pixels associated with objects and updating the pixels’ probabilities that they are associated with the objects based on feature values, ¶ [0082] “the segmentation refinement neural network 300 generates the refined segmentation mask 318 by further redetermining whether the certain pixels correspond to the one or more objects depicted in the digital visual media item 304 based on the feature values associated with the certain pixels (e.g., those feature values extracted from the final feature map 308). Indeed, in some implementations, the segmentation refinement neural network 300 reclassifies all pixels.
In other words, the segmentation refinement neural network 300 generates updated probabilities that the pixels correspond to the one or more objects depicted in the digital visual media item 304.” A series of intermediary segmentation masks is generated (refer to Fig. 3; ¶ [0086] “Fig. 3 illustrates two refinement iterations (i.e., generation of two refined segmentation masks), the segmentation refinement neural network 300 performs more or less iterations in various embodiments” and ¶ [0087] “In one or more embodiments, the segmentation refinement system 106 modifies the number of refinement iterations performed by the segmentation refinement neural network 300 based on the capabilities of the implementing computing device.”), and the changes in pixel value taught above result in an increase in pixel resolution during each refinement iteration, ¶ [0086] “In some implementations, with each refinement iteration, the segmentation refinement neural network 300 increases the resolution of the generated refined segmentation mask. For example, in some embodiments, the segmentation refinement neural network 300 doubles the resolution of the refined segmentation mask generated in each iteration.”);
and output the refined image segmentation mask (Abstract “the disclosed systems utilize a segmentation refinement neural network to generate an initial segmentation mask for a digital visual media item. The disclosed systems further utilize the segmentation refinement neural network to generate one or more refined segmentation masks based on uncertainly classified pixels identified from the initial segmentation mask.”; ¶ [0038] “the term “neural network” refers to a machine learning model that can be tuned (e.g., trained) based on inputs to approximate unknown functions used for generating the corresponding outputs”; The purpose of the system Liu et al. teaches is to generate a refined image segmentation mask, not merely to use a refined image segmentation mask as an intermediate tool or representation. Thus, the examiner interprets the refined image segmentation mask as an output of the system.).
Liu et al. fails to teach a discrete diffusion model.
In a related art, Lai et al. teaches using a previously segmented input image in a discrete diffusion model (Abstract “we propose to ameliorate the semantic segmentation quality of existing discriminative approaches with a mask prior modeled by a recently developed denoising diffusion generative model. Beginning with a unified architecture that adapts diffusion models for mask prior modeling, we focus this work on a specific instantiation with discrete diffusion”). Refer to Fig. 1 (also found above, on page 6 of the instant Office action) depicting a denoising diffusion generative model (seen on the upper right of the figure) taught by Lai et al. that uses a “segmentation mask prior p(x)” input from an image mask previously segmented. Refer to Fig. 3 (also found above, on page 7 of the instant Office action) that teaches Denoising Diffusion Prior Modeling for semantic Segmentation (DDPS), which uses an image representation from a base semantic segmentation model input into the diffusion model to ultimately create a refined mask.
Liu et al. and Lai et al. are analogous art to the instant application because both teach methods of refining image segmentation masks based on input images and previously segmented masks. One seeking to learn how segmentation models can be used to refine pixels and improve the quality of an image would therefore have consulted Liu et al. and Lai et al.
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the
effective filing date of the claimed invention to have modified the teachings of Liu et al. to incorporate the teachings of Lai et al. and use a discrete diffusion model to refine and output a segmented image mask. Doing so “could achieve competitively quantitative performance and more appealing visual quality” (Lai et al. Abstract).
Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (US 20220245824 A1) in view of Lai et al. ("Denoising diffusion semantic segmentation with mask prior modeling.", 2023; Copy provided by examiner), and in further view of Krähenbühl et al. (“Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials”, 2011; Copy provided by examiner).
Regarding Claims 5 and 14:
For the purpose of this rejection, a “state” of a mask pixel is interpreted as the pixel’s assignment at a particular segmentation granularity, including an initial (coarse) segmentation state and a refined (fine) segmentation state. A “state transition probability” is interpreted as a probability indicating whether the pixel is assigned one segmentation state or another during refinement, as described in the instant application’s specification (p. 10-11).
Liu et al. and Lai et al. teach the limitations of claims 2 and 12.
Liu et al. further teaches wherein the pixel values of the one or more mask pixels are changed based at least in part on a
Liu et al. specifically teaches changing pixel values of one or more segmentation masks by refining a segmentation mask across multiple stages in an iterative process (¶ [0028] “the segmentation refinement neural network generates multiple refined segmentation masks for a digital visual media item via multiple refinement iterations”; ¶ [0007] “for each iteration, the system utilizes the neural network to identify uncertain pixels (e.g., pixels that have been uncertainly classified) from a previously-generated segmentation mask and reclassifies at least some of the uncertain pixels”.), as described in rejection for claims 2 and 12. Liu et al. goes on to teach the refinement is based on probability (¶ [0082] “the segmentation refinement neural network 300 generates the refined segmentation mask 318 by redetermining whether the uncertain pixels correspond to the one or more objects depicted in the digital visual media item 304 based on the feature values associated with the uncertain pixels… the segmentation refinement neural network 300 generates updated probabilities that the pixels correspond to the one or more objects depicted in the digital visual media item”). In summary, Liu et al. teaches changing pixel values of an initial image segmentation mask to a refined image segmentation mask based on updated pixel probabilities and iterative classification.
Liu et al. and Lai et al. fail to explicitly teach determining a probability based on relationships with other mask pixels that indicates whether a pixel changes its assignment (either a coarse or fine pixel) between the initial image segmentation mask and the refined image segmentation mask.
In a related art, Krähenbühl et al. teaches: methods for image segmentation, including a pixel state transition probability model conditioned on neighboring pixels and iteratively updating pixel labels through inference over unary and pairwise potentials (Refer to Abstract).
Krähenbühl et al. first teaches determining probabilities associated with different individual pixel assignments and refining those assignments across iterations (See p. 1-p. 2, Sec. “1 Introduction”, [0001]-[0003] “The goal is to label every pixel in the image with one of several predetermined object categories, thus concurrently performing recognition and segmentation of multiple object classes… Basic CRF models are composed of unary potentials on individual pixels…and pairwise potentials on neighboring pixels or patches… We use a fully connected CRF that establishes pairwise potentials on all pairs of pixels in the image… our model connects all pairs of individual pixels in the image, enabling greatly refined segmentation and labeling.”). Krähenbühl et al. goes on to teach that the pixel-level probabilities are calculated and updated across refinement iterations (p. 3, Section “2 The Fully Connected CRF Model”, ¶ [0003] “The unary potential ψ_u(x_i) is computed independently for each pixel by a classifier that produces a distribution over the label assignment x_i given image features.”; p. 3, Section “3.1 Mean Field Approximation”, ¶ [0001] “the mean field approximation computes a distribution Q(X)”; the approximation or distribution “is iteratively optimized through a series of message passing steps, each of which updates a single variable by aggregating information from all other variables.” (p. 2, Section “1 Introduction”, ¶ [0004])). In summary, Krähenbühl et al. teaches determining, for each pixel, a probability associated with alternative pixel assignments and refining those probabilities based on information from other pixels, resulting in changed pixel assignments between an initial (coarse) image segmentation mask and a refined (fine) image segmentation mask.
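The quoted mean-field update (each pixel aggregates information from all other pixels through pairwise potentials) can be illustrated with the following sketch (examiner's hypothetical code; the efficient Gaussian filtering central to Krähenbühl et al.'s inference is omitted, and a plain Potts compatibility is assumed):

```python
import numpy as np

def mean_field_step(unary, pairwise, Q):
    """One mean-field message-passing update for a fully connected CRF:
    every pixel aggregates label beliefs from all other pixels through
    the pairwise compatibility, combines them with its unary potential,
    and renormalizes its label distribution Q.

    unary    : (N, L) negative log unary potentials per pixel/label.
    pairwise : (L, L) label-compatibility penalty matrix.
    Q        : (N, L) current per-pixel label distributions.
    """
    # Message to pixel i = beliefs of all other pixels, transformed
    # by the label compatibility (Potts-style penalty).
    messages = (Q.sum(axis=0, keepdims=True) - Q) @ pairwise
    logits = -(unary + messages)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    Q_new = np.exp(logits)
    return Q_new / Q_new.sum(axis=1, keepdims=True)
```

An ambiguous pixel with a flat unary term is pulled toward the label favored by the other pixels after a few updates, which is the sense in which pixel assignments are refined by relationships with other pixels.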
Liu et al., Lai et al., Krähenbühl et al., and the instant application are considered analogous art because they are all directly related to image segmentation and refinement based on individual pixels. The teachings of Liu et al. and Krähenbühl et al. are further relevant to claims 5 and 14 because both use probability information for refining segmentation masks. Therefore, one of ordinary skill in the art would reasonably have consulted Krähenbühl et al. to learn how to improve the image segmentation refinement methods taught by Liu et al. and Lai et al.
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the
effective filing date of the claimed invention to have modified the teachings of Liu et al. and Lai et al. to incorporate the teachings of Krähenbühl et al. and apply the pixel probability refinement model taught by Krähenbühl et al. to the segmentation refinement process taught by Liu et al. While Liu et al. teaches reclassifying pixels across refinement iterations based on pixel probabilities, Krähenbühl et al. teaches refining the probabilities based on relationships between pixels to determine whether a pixel maintains or changes its assignment during the refinement process. The resulting combination would provide a model that changes pixel values of one or more masks based on a probability indicating whether the mask pixel changes its assignment between an initial segmentation mask and a refined segmentation mask, as claimed. Applying Krähenbühl et al.’s probability approach to the refinement process taught by Liu et al. would have been a predictable use of a known technique to those of ordinary skill in the art prior to the effective filing date to improve the consistency and accuracy of a refined image segmentation mask.
Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (US 20220245824 A1) in view of Lai et al. ("Denoising diffusion semantic segmentation with mask prior modeling.", 2023; Copy provided by examiner), and in further view of Wang et al. (“DFormer: Diffusion-guided Transformer for Universal Image Segmentation”, 2023, Copy provided by examiner).
Regarding Claims 8 and 17:
Liu et al. and Lai et al. teach the limitations of claims 1 and 11.
Liu et al. and Lai et al. fail to teach: wherein the image is input to the diffusion model with the initial image segmentation mask.
In a related art, Wang et al. teaches an “image segmentation task as a denoising process using a diffusion model” (Abstract). Wang et al. further teaches wherein the image is input to the diffusion model with the initial image segmentation mask (see Abstract “…we take deep pixel-level features along with the noisy masks as inputs to generate mask features and attention masks, employing diffusion-based decoder to perform mask prediction gradually.” and “DFormer first adds various levels of Gaussian noise to ground-truth masks, and then learns a model to predict denoising masks from corrupted masks.”; The “deep pixel-level features” are interpreted as the image at a pixel level, and the “noisy mask”, representing a corrupt or noisy version of the “ground-truth mask” at a segmentation level, is interpreted as the initial image segmentation mask.).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified the teachings of Liu et al. and Lai et al. to incorporate the teachings of Wang et al. and input the original image along with the segmented image mask into the diffusion model. Doing so would have provided a “universal architecture that performs different image segmentation tasks through a unified framework”, solving the “end-to-end set prediction problem” that previous segmentation tasks struggled with when performed individually (see Wang et al., Section 1, paragraphs 2-3), and would improve the output’s accuracy by providing two input parameters (the original image and the segmented image mask) to the diffusion model for segmenting an image.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMUEL DAVID BAYNES whose telephone number is (571)272-0607. The examiner can normally be reached Monday - Friday 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen R Koziol can be reached at (408)918-7630. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.D.B/
Samuel D. Baynes
Examiner, Art Unit 2665
/Stephen R Koziol/Supervisory Patent Examiner, Art Unit 2665