Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. KR10-2023-0008665, filed on 2/13/2024.
Should applicant desire to obtain the benefit of foreign priority under 35 U.S.C. 119(a)-(d) prior to declaration of an interference, a certified English translation of the foreign application must be submitted in reply to this action. See 37 CFR 41.154(b) and 41.202(e).
Failure to provide a certified translation may result in no benefit being accorded for the non-English application.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 4/30/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement has been considered by the examiner.
Specification
The disclosure is objected to because of the following informalities:
Page 4, line 19, “o n” should read “on.”
Page 5, line 14, “dinoising” should read “denoising.”
Page 16, line 1, “encode-decoder” should read “encoder-decoder.”
Appropriate correction is required.
Claim Objections
Claims 5 and 6 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim 8 is objected to because of the following informalities:
Page 25, line 7, “o n” should read “on.”
Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Bear et al. (NPL “Learning Physical Graph Representations from Visual Scenes,” 2020, hereafter referred to as Bear) in view of Ter Haar Romenij et al. (U.S. Patent No. 10,713,563, hereafter referred to as Ter Haar Romenij).
Regarding Claim 1, Bear teaches a method for generating a task-specific scene structure using a neural network model (Abstract, Bear discloses PSGNet, a neural network architecture that learns to extract “Physical Scene Graphs” (PSGs), which represent scenes as hierarchical graphs, with nodes in the hierarchy corresponding intuitively to object parts at different scales, and edges to physical connections between parts. Bound to each node is a vector of latent attributes that intuitively represent object properties such as surface shape and texture.) applied to a baseline network (S3 Comparing PSGNet to Baseline Models, “Assessing the dependence of baselines on geometric feature maps,” Bear teaches providing the depth and normal maps as inputs in addition to the RGB image to MONet and IODINE, which are CNN baseline models.) performing an image processing task by a plug-and-play scheme by using the scene structure (Section 3 Experiments and Analysis, Fig. 2A, Bear teaches using the baseline networks, MONet and IODINE, to predict scene segmentations, with inputs being depth and normal maps in addition to the RGB image. The examiner interprets predicting scene segmentations to be an image processing task since the claim is silent to the specifics of the image processing task.) and outputting the scene structure to the baseline network (S3 Comparing PSGNet to Baseline Models, “Assessing the dependence of baselines on geometric feature maps,” Fig. S4, Bear teaches providing depth and normal maps as inputs in addition to the RGB image to each baseline network, MONet and IODINE. PSGNet generates depth and normal maps, shown in Fig. S4, which provide detailed geometric information. The examiner interprets depth and normal maps to be scene structures because they include detailed information representing the boundaries, shapes, structures, and textures of objects in images.).
Bear does not explicitly disclose the method comprising: generating, by a processor, a plurality of eigenvectors for an image according to an affinity matrix of the image; and generating, by the processor, the scene structure by convolutioning the plurality of eigenvectors.
Ter Haar Romenij is in the same field of art of image analysis for an image processing task using a neural network model. Further, Ter Haar Romenij teaches the method comprising: generating, by a processor (Col. 5, lines 62-67, Ter Haar Romenij teaches implementing a CNN with a graphics processing unit (GPU).), a plurality of eigenvectors for an image according to an affinity matrix of the image (Col. 5, lines 15-24, Ter Haar Romenij discloses an affinity matrix constructed by computing the outer products of all pixels and patches. The square affinity matrix is expressed as the correlation (dot product) between the feature vectors and represents how similar pairs of feature stacks per pixel are to each other. Next, PCA is performed on the affinity matrix of the image to produce a set of eigenvectors.); and generating, by the processor (Col. 5, lines 62-67, Ter Haar Romenij teaches implementing a CNN with a graphics processing unit (GPU).), the scene structure by convolutioning the plurality of eigenvectors (Col. 4, lines 44-48, Ter Haar Romenij teaches partitioning eigenvectors into δxδ kernels, or filters, giving, after convolution of the image with these kernels, a rich set of features per pixel. These kernels detect edges, corners, lines, and other primitives. They are δxδ patches. The kernels are visualized as square filters and typically show the next conceptual perceptual groups, the parts (for example, in an image of a face, elements of mouths, noses, and eyes). The examiner interprets a “set of features” to be a scene structure in light of the instant application specification, which states “the scene structure … may include arbitrary features representing the texture of the image, the boundary of the object in the image, a structure, a shape, etc., … and may include all concepts expressed as … a structure feature” (Jeon et al., Page 11, lines 3-9).).
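For clarity of record, the eigenvector-kernel technique cited above (an affinity matrix of patch-feature dot products, PCA, and partitioning of eigenvectors into δxδ kernels) can be sketched in simplified form. This is an illustrative sketch only, not the reference’s implementation; the function name, patch size, stride, and the use of a patch-level Gram matrix as the affinity are assumptions:

```python
import numpy as np

def eigenvector_kernels(image, patch=5, n_kernels=8):
    """Build patch feature vectors, form their affinity (Gram) matrix of
    dot products, and reshape its principal eigenvectors into kernels."""
    h, w = image.shape
    feats = []
    for i in range(0, h - patch + 1, 2):        # subsample patch positions
        for j in range(0, w - patch + 1, 2):
            feats.append(image[i:i + patch, j:j + patch].ravel())
    F = np.asarray(feats)                       # (n_patches, patch*patch)
    F = F - F.mean(axis=0)                      # center features for PCA
    A = F.T @ F                                 # symmetric affinity of dot products
    vals, vecs = np.linalg.eigh(A)              # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:n_kernels]]
    # Partition the principal eigenvectors into patch x patch kernels
    # (cf. the δxδ kernels of the cited reference).
    return [top[:, k].reshape(patch, patch) for k in range(n_kernels)]
```

Because the kernels come from eigenvectors of a symmetric matrix, they are mutually orthogonal unit filters; convolving the image with them yields a feature stack per pixel of the kind the reference describes.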
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Bear by incorporating, in the preprocessing steps, the generation of eigenvectors for an image based on pixel affinities and the partitioning of the eigenvectors into kernels that perform convolutions on the image and detect features such as edges, corners, and lines, as taught by Ter Haar Romenij, to make the invention that uses eigenvectors according to an affinity matrix to learn and detect complex and abstract features of the image, including low-level features such as edges and lines and high-level features such as shapes and arrangements of shapes, to produce a scene structure for an image (Ter Haar Romenij, Col. 1, lines 31-38); thus, one of ordinary skill in the art would have been motivated to combine the references because they are both in the field of image analysis for an image processing task using a neural network model (Ter Haar Romenij, Abstract; Bear, Abstract).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
In regards to Claim 7, Bear in view of Ter Haar Romenij discloses the method of claim 1, wherein the generating of the scene structure includes generating the scene structure by inputting the plurality of eigenvectors into a single convolution layer (Col. 4, lines 57-62, Fig. 2, reference characters 202 and 204, Col. 5, lines 4-8, Ter Haar Romenij discloses selecting eigenvectors associated with the largest eigenvalues to produce a set of convolutional filters associated with a first layer of a convolutional neural network. The examiner interprets “a first layer” to be a single convolution layer.).
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Bear et al. (NPL “Learning Physical Graph Representations from Visual Scenes,” 2020, hereafter referred to as Bear) in view of Ter Haar Romenij et al. (U.S. Patent No. 10,713,563, hereafter referred to as Ter Haar Romenij) in further view of Tachella et al. (NPL “The Neural Tangent Link Between CNN Denoisers and Non-Local Filters,” 2020, hereafter referred to as Tachella).
Regarding Claim 2, Bear in view of Ter Haar Romenij discloses the method of claim 1.
Bear in view of Ter Haar Romenij does not explicitly disclose wherein the baseline network performs at least one task of denoising, image deblurring, image super-resolution, image inpainting, or depth upsampling, and depth completion.
Tachella is in the same field of art of image processing using a neural network. Further, Tachella teaches wherein the baseline network performs at least one task of denoising, image deblurring, image super-resolution, image inpainting, or depth upsampling, and depth completion (Fig. 1, Introduction, Tachella teaches that a convolutional neural network (CNN) trained with gradient descent on a single corrupted image achieves powerful denoising. CNNs are used to perform denoising steps, either in unrolled schemes or in the context of plug-and-play methods.).
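The plug-and-play usage referenced above (a denoiser slotted into an iterative restoration loop) can be sketched generically. This is a hedged sketch under assumed names, not Tachella’s exact method; `denoiser` stands in for any CNN or other denoising operator, and the loop is a plain proximal-gradient-style scheme in which the denoiser replaces the proximal step of the prior:

```python
import numpy as np

def plug_and_play_restore(y, forward, adjoint, denoiser, step=0.5, iters=20):
    """Alternate a gradient step on the data-fidelity term
    0.5 * ||forward(x) - y||^2 with an off-the-shelf denoiser acting as
    the prior (the plug-and-play idea)."""
    x = adjoint(y)                          # common initialization: A^T y
    for _ in range(iters):
        grad = adjoint(forward(x) - y)      # gradient of the fidelity term
        x = denoiser(x - step * grad)       # denoiser replaces the proximal map
    return x
```

With `forward` the identity and `denoiser` a CNN, each iteration simply denoises a gradient-corrected estimate, which is the structure of the plug-and-play schemes the passage mentions.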
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Bear in view of Ter Haar Romenij by incorporating the post-processing step of denoising the image with a CNN that is taught by Tachella to make the invention that generates a scene structure based on an image using a neural network model and outputs the scene structure to a CNN for performing an image processing task, specifically the task of denoising an image; thus, one of ordinary skill in the art would have been motivated to combine the references to achieve competitive image processing results with a CNN trained on a single corrupted image compared to a fully trained network, reducing the computationally and time-intensive process of training a large neural network (Tachella, Introduction).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Claims 3-5 are rejected under 35 U.S.C. 103 as being unpatentable over Bear et al. (NPL “Learning Physical Graph Representations from Visual Scenes,” 2020, hereafter referred to as Bear) in view of Ter Haar Romenij et al. (U.S. Patent No. 10,713,563, hereafter referred to as Ter Haar Romenij) in further view of Melas-Kyriazi et al. (NPL “Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization,” 2022, hereafter referred to as Melas-Kyriazi).
Regarding Claim 3, Bear in view of Ter Haar Romenij discloses the method of claim 1.
Bear in view of Ter Haar Romenij does not explicitly disclose wherein the generating of the eigenvector includes generating an eigenvector corresponding to a structure for each region of the image clustered according to the affinity matrix through an encoder/decoder in the neural network model.
Melas-Kyriazi is in the same field of art of generating a scene structure from an image using a neural network. Further, Melas-Kyriazi teaches wherein the generating of the eigenvector includes generating an eigenvector corresponding to a structure for each region of the image clustered according to the affinity matrix through an encoder/decoder in the neural network model (Abstract, Section 3.2 Semantic Spectral Decomposition, Melas-Kyriazi discloses examining the eigenvectors of the Laplacian of a feature affinity matrix. The eigenvectors decompose an image into meaningful segments. By clustering the features associated with these segments across a dataset, well-delineated, namable regions can be obtained. The features are extracted using a neural network. Additionally, for the case where the network is a transformer architecture, the transformer inherently contains an encoder and decoder.).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Bear in view of Ter Haar Romenij by incorporating the pre-processing step of generating eigenvectors corresponding to each meaningful segment in the image and clustering the features associated with the segments to identify distinct regions in the image, as taught by Melas-Kyriazi, to make the invention that is able to segment multiple regions in an image and outperforms prior methods (Melas-Kyriazi, “Introduction”); thus, one of ordinary skill in the art would have been motivated to combine the references because they are both in the field of generating scene structures from an image using a neural network.
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
In regards to Claim 4, Bear in view of Ter Haar Romenij discloses the method of claim 1.
Bear in view of Ter Haar Romenij does not explicitly disclose wherein the generating of the eigenvector includes transforming the affinity matrix, and deriving a Laplacian matrix, and generating an eigenvector which makes a value of a quadratic form of the Laplacian matrix for the eigenvector become the minimum.
Melas-Kyriazi is in the same field of art of performing image processing using a neural network. Further, Melas-Kyriazi discloses wherein the generating of the eigenvector includes transforming the affinity matrix (Fig. 2 Caption, Melas-Kyriazi teaches, given an image, extracting dense features from a network and using these to construct a semantic affinity matrix. The affinity matrix is then fused with low-level color information. Next, the image is decomposed into soft segments by computing the eigenvectors of the Laplacian of the affinity matrix.), and deriving a Laplacian matrix (Section 3.1 Background, Melas-Kyriazi teaches calculating the Laplacian matrix L of a graph, given by L = D − W.), and generating an eigenvector which makes a value of a quadratic form of the Laplacian matrix for the eigenvector become the minimum (Section 3.1 Background, Melas-Kyriazi teaches that the Laplacian corresponds to the quadratic form y^T L y = (1/2) Σ_ij W_ij (y_i − y_j)^2. The eigenvalues and eigenvectors of L (the Laplacian matrix) are the central objects of study in graph spectral theory. The eigenvectors y_i span an orthogonal basis on G (the graph), that is, the smoothest possible orthogonal basis: y_i = argmin { y^T L y : ||y|| = 1, y ⊥ y_0, …, y_(i−1) }. The ‘argmin’ returns the eigenvector at which the quadratic form of the Laplacian reaches its minimum value.).
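The Laplacian derivation and quadratic-form minimization recited above can be sketched numerically. This is an illustrative sketch, not code from Melas-Kyriazi; the function name and the dense eigendecomposition are assumptions:

```python
import numpy as np

def laplacian_eigenvectors(W, k=3):
    """From an affinity matrix W, derive the graph Laplacian L = D - W and
    return the k eigenvectors with the smallest eigenvalues, i.e. those that
    minimize the quadratic form y^T L y subject to ||y|| = 1 and mutual
    orthogonality."""
    W = 0.5 * (W + W.T)                 # ensure symmetry
    D = np.diag(W.sum(axis=1))          # degree matrix
    L = D - W                           # graph Laplacian
    vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return vals[:k], vecs[:, :k]
```

For a connected graph the smallest eigenvalue is 0 with a constant eigenvector; the next eigenvector minimizes the quadratic form among directions orthogonal to it and gives the coarsest soft segmentation of the corresponding image.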
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Bear in view of Ter Haar Romenij by deriving the Laplacian matrix from the affinity matrix and finding the smallest non-zero eigenvector, which makes the quadratic form of the Laplacian matrix a minimum, as taught by Melas-Kyriazi, to make the invention that takes the eigenvectors of the Laplacian of the affinity matrix and decomposes an image into soft segments, or “eigensegments,” which correspond to semantically meaningful image regions and have well-delineated boundaries. As such, localization and segmentation tasks are natural immediate applications of this approach (Melas-Kyriazi, Section 3.2 Semantic Spectral Decomposition); thus, one of ordinary skill in the art would have been motivated to combine the references to enable identification of the most critical semantic regions in the image, where the first eigenvector usually identifies the most salient object in the image (Melas-Kyriazi, Fig. 3 Caption), enabling image post-processing tasks.
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Bear et al. (NPL “Learning Physical Graph Representations from Visual Scenes,” 2020, hereafter referred to as Bear) in view of Ter Haar Romenij et al. (U.S. Patent No. 10,713,563, hereafter referred to as Ter Haar Romenij) in further view of Kanakis et al. (NPL “Reparametrizing Convolutions for Incremental Multi-Task Learning without Task Interference,” 2020, hereafter referred to as Kanakis).
Regarding Claim 8, Bear in view of Ter Haar Romenij discloses the method of claim 7.
Bear in view of Ter Haar Romenij does not explicitly disclose wherein the single convolution layer convolutions the plurality of eigenvectors based on a weight learned according to a task of the baseline network.
Kanakis is in the same field of art of using a neural network to complete one or more image processing tasks. Further, Kanakis discloses wherein the single convolution layer convolutions the plurality of eigenvectors based on a weight learned according to a task of the baseline network (Introduction, Section B Reparameterization Details, Section 4.7 Incremental learning for multi-tasking, Kanakis teaches using a set of task-specific parameters for each convolutional layer. Each convolution is decomposed into a shared part that acts as a filter bank encoding common knowledge, and a task-specific modulator that adapts this common knowledge uniquely for each task. The task-specific parameters W_i^t, optimized independently for each task i, are initialized by U (an orthonormal matrix with eigenvectors on the columns) and implemented as a 1 x 1 convolution. The tasks include (i) edge detection and surface normals (low-level tasks), (ii) saliency (mid-level task), and (iii) semantic segmentation and human parts segmentation (high-level tasks).).
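The decomposition described above (a shared filter bank modulated by a task-specific 1 x 1 convolution) can be sketched as follows. This is an illustrative sketch with assumed names and a naive valid convolution, not the reference’s implementation:

```python
import numpy as np

def task_conv(x, shared_bank, task_modulator):
    """Convolution decomposed into a shared filter bank plus a task-specific
    1x1 modulator. x: (C_in, H, W); shared_bank: (C_mid, C_in, kh, kw);
    task_modulator: (C_out, C_mid), acting as a 1x1 convolution."""
    c_mid, c_in, kh, kw = shared_bank.shape
    _, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    # Shared part: ordinary (valid) convolution with the common filter bank.
    mid = np.zeros((c_mid, out_h, out_w))
    for m in range(c_mid):
        for i in range(out_h):
            for j in range(out_w):
                mid[m, i, j] = np.sum(shared_bank[m] * x[:, i:i + kh, j:j + kw])
    # Task-specific part: a 1x1 convolution is a per-pixel linear map,
    # adapting the shared responses for the task at hand.
    return np.einsum('om,mhw->ohw', task_modulator, mid)
```

Training a separate `task_modulator` per task while freezing `shared_bank` is what allows new tasks to be added without interfering with previously learned ones, which is the point of the passage above.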
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Bear in view of Ter Haar Romenij by adjusting the weights of the convolutions depending on the task to be performed by the neural network, as taught by Kanakis, to make the invention that performs convolutions on the eigenvectors with varying weights depending on the type of image processing task to be performed by the neural network; thus, one of ordinary skill in the art would have been motivated to combine the references to enable the model to solve more complex real-world problems, as well as perform multiple tasks on-demand without significantly compromising each task’s performance (Kanakis, Introduction).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SYDNEY L BLACKSTEN whose telephone number is (571)272-7651. The examiner can normally be reached 8:30am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Oneal Mistry can be reached at 313-446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SYDNEY L BLACKSTEN/Examiner, Art Unit 2674
/ONEAL R MISTRY/Supervisory Patent Examiner, Art Unit 2674