Prosecution Insights
Last updated: April 19, 2026
Application No. 17/949,527

AUTO-GROUPING GALLERY WITH IMAGE SUBJECT CLASSIFICATION

Status: Non-Final OA (§103)
Filed: Sep 21, 2022
Examiner: PHAM, NHUT HUY
Art Unit: 2674
Tech Center: 2600 — Communications
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)

Grant Probability: 79% (Favorable)
Predicted OA Rounds: 1-2
Predicted Time to Grant: 3y 0m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 79% (above average; 42 granted / 53 resolved; +17.2% vs TC avg)
Interview Lift: +26.8% (strong), comparing resolved cases with vs. without an interview
Avg Prosecution: 3y 0m (31 applications currently pending)
Career History: 84 total applications, across all art units

Statute-Specific Performance

§101: 9.4% (-30.6% vs TC avg)
§103: 62.2% (+22.2% vs TC avg)
§102: 11.9% (-28.1% vs TC avg)
§112: 14.5% (-25.5% vs TC avg)

Black line = Tech Center average estimate • Based on career data from 53 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

The United States Patent & Trademark Office appreciates the application submitted by the inventor/assignee. The Office has reviewed the application and makes the comments below.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 09/21/2022 has been considered and is attached.

Claim Status

Claims 1, 3-8, 10-15, and 17-20 are rejected under 35 U.S.C. § 103:
Claims 1, 4, 7-8, 11, 14-15, and 18 are rejected over Vaduva in view of Black, in view of Zhao, and in view of Fox.
Claims 3, 10, and 17 are rejected over Vaduva in view of Black, in view of Zhao, in view of Fox, and further in view of Wikipedia.
Claims 5-6, 12-13, and 19-20 are rejected over Vaduva in view of Black, in view of Zhao, in view of Fox, and further in view of Cai.
Claims 2, 9, and 16 are objected to.

35 U.S.C. § 101

In regards to Claim 8, the Examiner has reviewed Applicant's specification, paragraph 0017 ("A computer readable storage medium, as that term is used in the present disclosure, is not to be construed as storage in the form of transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide, light pulses passing through a fiber optic cable, electrical signals communicated through a wire, and/or other transmission media"), and determined that the claim does not encompass transitory signals. Thus, a § 101 rejection is not necessary.

Claim Rejections - 35 U.S.C. § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 7-8, 11, 14-15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Vaduva et al. (Văduva, Corina, Inge Gavăt, and Mihai Datcu. "Latent Dirichlet allocation for spatial analysis of satellite images." IEEE, 2012; hereinafter Vaduva) in view of Black et al. (US-5802203-A, published 1998; hereinafter Black), further in view of Zhao et al. (Zhao, Li-Jun, Ping Tang, and Lian-Zhi Huo. "Land-use scene classification using a concentric circle-structured multiscale bag-of-visual-words model." IEEE, 2014; hereinafter Zhao), and further in view of Fox et al. (Fox, James, et al. "Concentric spherical gnn for 3d representation learning." arXiv, published 2021; hereinafter Fox).

CLAIM 1

In regards to Claim 1, Vaduva teaches a computer-implemented method (Vaduva, Abstract: "The method presented in this paper … human inductive learning and reasoning in high-level scene understanding and content extraction"; section VI: "The source of the LDA code applied for experiments can be found in [21]") comprising:

replacing visual words of an unsupervised machine learning classification model (Vaduva, page 2771, right col: "to model the spatial positioning of objects inside the scene, we propose latent Dirichlet allocation (LDA)") with visual objects of an image (Vaduva, page 2771, right col: "The novelty of this paper consists in an adequate analysis at the scene level: groups of objects are treated together, as a whole, being considered spatial visual words (blobs), unlike [15] and [16] where visual words are depicted by pixels or small patches of the scene"; see FIG. 2 and 3. Vaduva teaches using image regions of objects as visual words);

combining at least two co-occurring single visual objects adjacent to each other, based on threshold adjacency (Vaduva, page 2776, left col: "Inside every document, we compute the spatial signature corresponding to all the possible two by two object combinations, considering that the distance between them should not exceed a desired threshold"; page 2782: "The objects inside every tile were grouped in pairs and spatial signatures computed for each pair of objects. With the purpose of engendering proper pairs of objects, a threshold was established for the distance between the centroids of the two objects envisaged. It had to be less than 1 kilometer (500 pixels)."), in pixels of the image to obtain a compound visual object (Vaduva, page 2771: "For this reason, the idea of an image processing chain is first to extract objects (pixel level analysis) and then to define and understand configurations of regions (object level analysis) based on the relative positioning of objects in the configurations… Once they are extracted, the regions can be grouped in different configurations with different interpretations, according to their relative position". Vaduva teaches extracting regions of objects and grouping them together based on their relative position).

Vaduva does not explicitly disclose modeling the image as a mixture of horizontal layers. Black is in the same field of art of image segmentation. Further, Black teaches modeling the image as a mixture of horizontal layers (Black, col 4, 4th paragraph: "This invention provides a system which segments images into component elements by modelling the image as a series of combined layers. Each pixel in each layer corresponds to a pixel in the image being segmented. The brightness of the pixels within each layer is modelled as a parametric function of pixel position"; cols 5-6).

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vaduva by incorporating the method of modeling an image as a mixture of layers taught by Black, to make a machine learning model that can segment an image into a mixture of layers. One of ordinary skill in the art would be motivated to combine the references since, among its several aspects, the present invention recognizes there is a need to improve image segmentation ("an image segmentation process which has improved segmentation capabilities is required").
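The centroid-distance pairing quoted from Vaduva above (objects whose centroids lie less than 1 km, i.e. 500 pixels, apart are combined into a compound visual object) reduces to a single pass over centroid pairs. A minimal sketch; the function and data names are illustrative, not taken from any cited reference:

```python
import math
from itertools import combinations

def pair_by_threshold(centroids, threshold=500.0):
    """Combine co-occurring objects into compound visual objects when the
    distance between their centroids is below a pixel threshold
    (Vaduva uses 1 km = 500 pixels on the satellite tiles)."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(centroids), 2):
        if math.dist(a, b) < threshold:
            pairs.append((i, j))  # index pair forming one compound visual object
    return pairs

# toy example: three object centroids, in pixels
centroids = [(0, 0), (30, 40), (2000, 2000)]
print(pair_by_threshold(centroids))  # → [(0, 1)]; only the first two are close enough
```

Each returned index pair would then be handed to the spatial-signature computation that Vaduva's section III describes.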
The combination of Vaduva and Black does not explicitly disclose visual objects in a mixture of concentric circles centering on a mixture of intersections. Zhao is in the same field of art of bag-of-visual-words models. Further, Zhao teaches visual objects in a mixture of concentric circles (Zhao, pages 4622-4623, subsection C: "To acquire the spatial information of rotation-invariance, this paper proposes to use the concentric circle structure to extract the spatial distribution of visual words"; see modified figure 2 (c) and (d) below) centering on a mixture of intersections (Zhao, pages 4622-4623, subsection C: "Let U be the set of coordinates (x, y) of all the keypoints in the image"; see FIG. 2-3, circles are centered on keypoints (coordinates) of the object. The Examiner notes a coordinate is an intersection of latitude and longitude).

[Annotated figure: modified Zhao Figure 2(c) and (d), grayscale]

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vaduva and Black by incorporating the concentric circle image representation taught by Zhao, to make a method of representing an image with concentric circles centered on objects. One of ordinary skill in the art would be motivated to combine the references since, among its several aspects, the present invention recognizes there is a need to improve bag-of-visual-words models (Zhao, page 4621, right col, 2nd paragraph: "To achieve rotation-invariance, this paper proposes a concentric circle-structured multiscale BOVW model for landuse scene classification, which is a simple but effective way to incorporate rotation-invariant spatial layout information of scene images into the original BOVW model…The proposed method exploits the concentric circle strategy to improve the multiresolution representation-based BOVW model").
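Zhao's concentric-circle structure, as quoted above, bins each keypoint's visual-word label by the distance ring it falls in around a center point; rotating the image about that center changes a keypoint's angle but not its ring, which is the source of the rotation invariance. A hedged toy sketch (names and the ring-assignment details are assumptions, not Zhao's code):

```python
import math
from collections import Counter

def concentric_circle_histogram(keypoints, center, radii, n_words):
    """Rotation-invariant spatial layout in the style of Zhao's
    concentric-circle BOVW model: each keypoint (x, y, word_id) is assigned
    to the innermost circle of `radii` that contains it, and a per-ring
    word histogram is accumulated. Keypoints outside the outermost circle
    are ignored in this sketch."""
    rings = [Counter() for _ in radii]
    for x, y, word in keypoints:
        d = math.dist((x, y), center)
        for ring, r in enumerate(radii):
            if d <= r:
                rings[ring][word] += 1
                break
    # flatten to a fixed-length descriptor: n_rings * n_words
    return [ring[w] for ring in rings for w in range(n_words)]

# three keypoints with word ids 0/1; two rings of radius 2 and 10
kps = [(1, 0, 0), (0, 5, 1), (9, 0, 1)]
print(concentric_circle_histogram(kps, (0, 0), [2, 10], n_words=2))  # → [1, 0, 0, 2]
```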
The combination of Vaduva, Black and Zhao teaches augmenting the unsupervised machine learning classification model to model the image as a mixture of subjects, where each subject is represented through placements of the visual objects in a mixture of concentric layered circles (Zhao, pages 4622-4623, subsection C: "To acquire the spatial information of rotation-invariance, this paper proposes to use the concentric circle structure to extract the spatial distribution of visual words"; see FIG. 3) (Black, col 4, 4th paragraph: "This invention provides a system which segments images into component elements by modelling the image as a series of combined layers. Each pixel in each layer corresponds to a pixel in the image being segmented. The brightness of the pixels within each layer is modelled as a parametric function of pixel position"; cols 5-6) (the Examiner notes layers of concentric circles are one representation of a sphere) centering on a mixture of intersections (Zhao, pages 4622-4623, subsection C; the Examiner notes a coordinate is an intersection of latitude and longitude) on a mixture of horizontal layers (see Black, col 4, 4th paragraph, above); and

learning latent relationships between the placements of the visual objects in a three-dimensional space depicted in the image and image semantics (Vaduva, pages 2775-2776, section V: "By counting the spatial visual words occurrences, we get a word frequency vector for each tile. LDA models each word in a document as a sample from a mixture model, where the mixture components can be viewed as representations of latent topics. Thus, LDA leaves flexibility to assign a different topic to every observed word in a document". Vaduva teaches using LDA to learn the latent topic of a pair of objects in an image, based on their relative position), wherein the learning trains (Vaduva, section VI: the system is trained with different datasets) the unsupervised machine learning classification model to perform image subject classification through the placements of the visual objects (Vaduva, page 2780, right col: "The proposed method, however, returns topics with semantics, given the spatial similarity between pairs of regions"; see Fig. 23, images are classified with corresponding topics) in a new image (Vaduva, pages 2772-2773, section III: "We automatically group the selected objects in pairs, two by two, and for every possible combination, a spatial signature is computed… By performing a k-means clustering on the image spatial signatures database, spatial visual words, 'blobs' [19] are defined. In further processing, these spatial visual words will constitute inputs for a LDA model". Vaduva teaches objects (or image regions of objects) are grouped and input to the LDA model; the Examiner notes an image region of objects in an image is different from the initial image).

The combination of Vaduva, Black and Zhao does not explicitly disclose visual objects in a mixture of concentric spheres. Fox is in the same field of art of image data classification. Further, Fox teaches visual objects in a mixture of concentric spheres (Fox, pages 4-5, section 4.1: "discretizing space by concentric spheres, which serves as the input spatial dimensions to the proposed architecture. We then explain our method for mapping arbitrary point cloud data to this discretization"; see FIG. 2-4. Fox discloses using concentric spheres to represent point cloud data, and point cloud data can be extracted from 2D images).

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vaduva, Black and Zhao by incorporating the concentric spheres representation taught by Fox, to make a method of representing image data using concentric spheres. One of ordinary skill in the art would be motivated to combine the references since, among its several aspects, the present invention recognizes there is a need for an image representation that captures more information about objects (Fox, pages 2-3, section 3: "Concentric spherical discretization, made up of multiple spheres at different radii, to more natively capture data according to their distribution in 3D space"). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 4

In regards to Claim 4, the combination of Vaduva, Black, Zhao and Fox teaches the method of Claim 1. In addition, the combination teaches that the horizontal layers are defined based on gray values of the pixels of the image (Black, col 4, 4th paragraph: "This invention provides a system which segments images into component elements by modelling the image as a series of combined layers. Each pixel in each layer corresponds to a pixel in the image being segmented. The brightness of the pixels within each layer is modelled as a parametric function of pixel position"; cols 5-6. The Examiner notes a pixel's brightness is analogous to its gray value).

CLAIM 7

In regards to Claim 7, the combination of Vaduva, Black, Zhao and Fox teaches the method of Claim 1.
In addition, the combination of Vaduva, Black, Zhao and Fox teaches using the trained unsupervised machine learning classification model to perform the subject image classification on a given new image (Vaduva, pages 2772-2773, section III: "We automatically group the selected objects in pairs, two by two, and for every possible combination, a spatial signature is computed… By performing a k-means clustering on the image spatial signatures database, spatial visual words, 'blobs' [19] are defined. In further processing, these spatial visual words will constitute inputs for a LDA model". Vaduva teaches objects (or image regions of objects) are grouped and input to the LDA model; the Examiner notes an image region of objects in an image is different from the initial image).

CLAIM 8

In regards to Claim 8, Vaduva teaches a computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions readable by a device to cause the device to perform operations (Vaduva, Abstract: "The method presented in this paper … human inductive learning and reasoning in high-level scene understanding and content extraction"; section VI: "The source of the LDA code applied for experiments can be found in [21]". The Examiner notes computer program code implies the existence of a computer). In addition, Black teaches a computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions readable by a device (Black, col 8, 3rd paragraph: "a standard general purpose computer … a software module in the general purpose computer … a microprocessor … a memory").

The recited operations are the same as the steps of Claim 1, and the combination of Vaduva, Black, Zhao and Fox teaches them, and renders them obvious, for the same reasons and on the same citations set forth for Claim 1 above. Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 11

In regards to Claim 11, the combination of Vaduva, Black, Zhao and Fox teaches the computer program product of Claim 8.
In addition, the combination of Vaduva, Black, Zhao and Fox teaches the horizontal layers are defined based on gray values of the pixels of the image. (Black, col 4, 4th paragraph: “This invention provides a system which segments images into component elements by modelling the image as a series of combined layers. Each pixel in each layer corresponds to a pixel in the image being segmented. The brightness of the pixels within each layer is modelled as a parametric function of pixel position”; col 5-6. The Examiner notes pixel’s brightness is analogous with pixel’s gray value) CLAIM 14 In regards to Claim 14, the combination of Vaduva, Black, Zhao and Fox teaches the computer program product of Claim 8. In addition, the combination of Vaduva, Black, Zhao and Fox teaches using the unsupervised machine learning classification model that is trained, to perform the subject image classification on a given new image. (Vaduva, page 2772-2773, section III: “We automatically group the selected objects in pairs, two by two, and for every possible combination, a spatial signature is computed… By performing a k-means clustering on the image spatial signatures database, spatial visual words, “blobs” [19] are defined. In further processing, these spatial visual words will constitute inputs for a LDA model”. Vaduva teaches objects (or image regions of objects) are grouped and inputted to LDA model. The Examiner notes image region of objects in an image is different with the initial image) CLAIM 15 In regards to Claim 15, Vaduva teaches a system comprising: at least one processor; a memory device coupled with the at least one processor (Vaduva, Abstract: “The method presented in this paper … human inductive learning and reasoning in high-level scene understanding and content extraction”; section VI: “The source of the LDA code applied for experiments can be found in [21]”. 
The Examiner notes computer program code implies the existence of a computer); the at least one processor configured at least to: replacing visual words of an unsupervised machine learning classification model (Vaduva, page 2771, right col: “to model the spatial positioning of objects inside the scene, we propose latent Dirichlet allocation (LDA)”) with visual objects of an image. (Vaduva, page 2771, right col: “The novelty of this paper consists in an adequate analysis at the scene level: groups of objects are treated together, as a whole, being considered spatial visual words (blobs), unlike [15] and [16] where visual words are depicted by pixels or small patches of the scene”, see FIG. 2 and 3; Vaduva teaches using image regions of objects as visual words) combining at least two co-occurring single visual objects adjacent to each other, based on threshold adjacency (Vaduva, Page 2776, left col: “Inside every document, we compute the spatial signature corresponding to all the possible two by two object combinations, considering that the distance between them should not exceed a desired threshold”; Page 2782: “. The objects inside every tile were grouped in pairs and spatial signatures computed for each pair of objects. With the purpose of engendering proper pairs of objects, a threshold was established for the distance between the centroids of the two objects envisaged. 
It had to be less than 1 kilometer (500 pixels).”), in pixels of the image to obtain a compound visual object (Vaduva, Page 2771: “For this reason, the idea of an image processing chain is first to extract objects (pixel level analysis) and then to define and understand configurations of regions (object level analysis) based on the relative positioning of objects in the configurations… Once they are extracted, the regions can be grouped in different configurations with different interpretations, according to their relative position”; Vaduva teaches extracting regions of objects and grouping them together based on their relative position); Vaduva does not explicitly disclose model the image as a mixture of horizontal layers. Black is in the same field of art of image segmentation. Further, Black teaches model the image as a mixture of horizontal layers. (Black, col 4, 4th paragraph: “This invention provides a system which segments images into component elements by modelling the image as a series of combined layers. Each pixel in each layer corresponds to a pixel in the image being segmented. The brightness of the pixels within each layer is modelled as a parametric function of pixel position”; col 5-6). In addition, Black teaches a system comprising: at least one processor; a memory device coupled with the at least one processor. 
(Black, col 8, 3rd paragraph: “a standard general purpose computer … a microprocessor … a memory”) Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vaduva by incorporating method to model an image as a mixture of layers that is taught by Black, to make a machine learning model that can segment image into a mixture of layers; thus, one of ordinary skilled in the art would be motivated to combine the references since among its several aspects, the present invention recognizes there is a need to improve image segmentation (“an image segmentation process which has improved segmentation capabilities is required”). The combination of Vaduva and Black does not explicitly disclose visual objects in a mixture of concentric circles centering on a mixture of intersections. PNG media_image1.png 419 826 media_image1.png Greyscale Zhao is in the same field of art of bag of visual words models. Further, Zhao teaches visual objects in a mixture of concentric circles (Zhao, page 4622-4623, subsection C: “To acquire the spatial information of rotation-invariance, this paper proposes to use the concentric circle structure to extract the spatial distribution of visual words”, see modified figure 2 (c) and (d) below) centering on a mixture of intersections. (Zhao, page 4622-4623, subsection C: “Let U be the set of coordinates (x, y) of all the keypoints in the image”, see FIG. 2-3, circles are centered on keypoints (coordinate) of the object. 
The Examiner notes a coordinate is an intersection of latitude and longitude) Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vaduva and Black by incorporating the concentric circle image representation that is taught by Zhao, to make a method that represents an image with concentric circles centered on objects; thus, one of ordinary skill in the art would be motivated to combine the references since among its several aspects, the present invention recognizes there is a need to improve bag of visual word models (Zhao, page 4621, right col, 2nd paragraph: “To achieve rotation-invariance, this paper proposes a concentric circle-structured multiscale BOVW model for landuse scene classification, which is a simple but effective way to incorporate rotation-invariant spatial layout information of scene images into the original BOVW model…The proposed method exploits the concentric circle strategy to improve the multiresolution representation-based BOVW model”). The combination of Vaduva, Black and Zhao teaches augmenting the unsupervised machine learning classification model to model the image as a mixture of subjects, where each subject is represented through placements of the visual objects in a mixture of concentric layered circles (Zhao, page 4622-4623, subsection C: “To acquire the spatial information of rotation-invariance, this paper proposes to use the concentric circle structure to extract the spatial distribution of visual words”, see FIG. 3) (Black, col 4, 4th paragraph: “This invention provides a system which segments images into component elements by modelling the image as a series of combined layers. Each pixel in each layer corresponds to a pixel in the image being segmented.
The brightness of the pixels within each layer is modelled as a parametric function of pixel position”; col 5-6) (The Examiner notes layers of concentric circles are one representation of a sphere) centering on a mixture of intersections (Zhao, page 4622-4623, subsection C. The Examiner notes a coordinate is an intersection of latitude and longitude) on a mixture of horizontal layers (see Black, col 4, 4th paragraph above); and learning latent relationships between the placements of the visual objects in a three-dimensional space depicted in the image and image semantics (Vaduva, Page 2775-2776, section V: “By counting the spatial visual words occurrences, we get a word frequency vector for each tile. LDA models each word in a document as a sample from a mixture model, where the mixture components can be viewed as representations of latent topics. Thus, LDA leaves flexibility to assign a different topic to every observed word in a document”. Vaduva teaches using LDA to learn the latent topics of pairs of objects in an image, based on their relative position), wherein the learning trains (Vaduva, see section VI, system is trained with different datasets) the unsupervised machine learning classification model to perform image subject classification through the placements of the visual objects (Vaduva, Page 2780, right col: “The proposed method, however, returns topics with semantics, given the spatial similarity between pairs of regions”, see Fig. 23. Images are classified with corresponding topics) in a new image. (Vaduva, page 2772-2773, section III: “We automatically group the selected objects in pairs, two by two, and for every possible combination, a spatial signature is computed… By performing a k-means clustering on the image spatial signatures database, spatial visual words, “blobs” [19] are defined. In further processing, these spatial visual words will constitute inputs for a LDA model”.
Vaduva teaches objects (or image regions of objects) are grouped and inputted to the LDA model. The Examiner notes an image region of an object is different from the initial image) The combination of Vaduva, Black and Zhao does not explicitly disclose visual objects in a mixture of concentric spheres. Fox is in the same field of art of image data classification. Further, Fox teaches visual objects in a mixture of concentric spheres. (Fox, page 4-5, section 4.1: “discretizing space by concentric spheres, which serves as the input spatial dimensions to the proposed architecture. We then explain our method for mapping arbitrary point cloud data to this discretization”, see FIG. 2-4. Fox discloses using concentric spheres to represent point cloud data; point cloud data can be extracted from 2D images) Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vaduva, Black and Zhao by incorporating the concentric spheres representation that is taught by Fox, to make a method that represents image data using concentric spheres; thus, one of ordinary skill in the art would be motivated to combine the references since among its several aspects, the present invention recognizes there is a need for an image representation that captures more information about objects (Fox, page 2-3, section 3: “Concentric spherical discretization, made up of multiple spheres at different radii, to more natively capture data according to their distribution in 3D space”). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 18
In regards to Claim 18, the combination of Vaduva, Black, Zhao and Fox teaches the system of Claim 15. In addition, the combination of Vaduva, Black, Zhao and Fox teaches the horizontal layers are defined based on gray values of the pixels of the image.
(Black, col 4, 4th paragraph: “This invention provides a system which segments images into component elements by modelling the image as a series of combined layers. Each pixel in each layer corresponds to a pixel in the image being segmented. The brightness of the pixels within each layer is modelled as a parametric function of pixel position”; col 5-6. The Examiner notes a pixel’s brightness is analogous to its gray value) Claim(s) 3, 10 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vaduva in view of Black in view of Zhao in view of Fox and further in view of Wikipedia (Wikipedia, "Latent Dirichlet allocation", a 2021 archived copy of this document is attached, https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation).

CLAIM 3
In regards to Claim 3, the combination of Vaduva, Black, Zhao and Fox teaches the method of Claim 1. The combination of Vaduva, Black, Zhao and Fox does not explicitly disclose extending Gibbs sampling equation for unsupervised machine learning classification model. Wikipedia is in the same field of art of Latent Dirichlet Allocation model. Further, Wikipedia teaches extending Gibbs sampling equation for unsupervised machine learning classification model. (Wikipedia, page 4, section Alternative approaches: “Recent research has been focused on speeding up the inference of latent Dirichlet allocation to support the capture of a massive number of topics in a large number of documents.
The update equation of the collapsed Gibbs sampler mentioned in the earlier section has a natural sparsity within it that can be taken advantage of”) Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vaduva, Black, Zhao and Fox by incorporating Gibbs sampling that is taught by Wikipedia, to make an LDA model that uses Gibbs sampling to assign topics; thus, one of ordinary skill in the art would be motivated to combine the references since among its several aspects, the present invention recognizes there is a need to improve the speed of the LDA model (Wikipedia, page 4, section Alternative approaches: “Recent research has been focused on speeding up the inference of latent Dirichlet allocation to support the capture of a massive number of topics in a large number of documents. The update equation of the collapsed Gibbs sampler mentioned in the earlier section has a natural sparsity within it that can be taken advantage of”). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 10
In regards to Claim 10, the combination of Vaduva, Black, Zhao and Fox teaches the computer program product of Claim 8. The combination of Vaduva, Black, Zhao and Fox does not explicitly disclose extending Gibbs sampling equation for unsupervised machine learning classification model. Wikipedia is in the same field of art of Latent Dirichlet Allocation model. Further, Wikipedia teaches extending Gibbs sampling equation for unsupervised machine learning classification model. (Wikipedia, page 4, section Alternative approaches: “Recent research has been focused on speeding up the inference of latent Dirichlet allocation to support the capture of a massive number of topics in a large number of documents.
The update equation of the collapsed Gibbs sampler mentioned in the earlier section has a natural sparsity within it that can be taken advantage of”) Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vaduva, Black, Zhao and Fox by incorporating Gibbs sampling that is taught by Wikipedia, to make an LDA model that uses Gibbs sampling to assign topics; thus, one of ordinary skill in the art would be motivated to combine the references since among its several aspects, the present invention recognizes there is a need to improve the speed of the LDA model (Wikipedia, page 4, section Alternative approaches: “Recent research has been focused on speeding up the inference of latent Dirichlet allocation to support the capture of a massive number of topics in a large number of documents. The update equation of the collapsed Gibbs sampler mentioned in the earlier section has a natural sparsity within it that can be taken advantage of”). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 17
In regards to Claim 17, the combination of Vaduva, Black, Zhao and Fox teaches the system of Claim 15. The combination of Vaduva, Black, Zhao and Fox does not explicitly disclose extending Gibbs sampling equation for unsupervised machine learning classification model. Wikipedia is in the same field of art of Latent Dirichlet Allocation model. Further, Wikipedia teaches extending Gibbs sampling equation for unsupervised machine learning classification model. (Wikipedia, page 4, section Alternative approaches: “Recent research has been focused on speeding up the inference of latent Dirichlet allocation to support the capture of a massive number of topics in a large number of documents.
The update equation of the collapsed Gibbs sampler mentioned in the earlier section has a natural sparsity within it that can be taken advantage of”) Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vaduva, Black, Zhao and Fox by incorporating Gibbs sampling that is taught by Wikipedia, to make an LDA model that uses Gibbs sampling to assign topics; thus, one of ordinary skill in the art would be motivated to combine the references since among its several aspects, the present invention recognizes there is a need to improve the speed of the LDA model (Wikipedia, page 4, section Alternative approaches: “Recent research has been focused on speeding up the inference of latent Dirichlet allocation to support the capture of a massive number of topics in a large number of documents. The update equation of the collapsed Gibbs sampler mentioned in the earlier section has a natural sparsity within it that can be taken advantage of”). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention. Claim(s) 5-6, 12-13 and 19-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vaduva in view of Black in view of Zhao in view of Fox, and further in view of Cai (Cai, Wenjie, et al. "Panoptic segmentation-based attention for image captioning." Applied Sciences 10.1 (2020), hereinafter Cai).

CLAIM 5
In regards to Claim 5, the combination of Vaduva, Black, Zhao and Fox teaches the method of Claim 1. The combination of Vaduva, Black, Zhao and Fox does not explicitly disclose the visual objects of the image are determined using panoptic segmentation. Cai is in the same field of art of generating textual descriptions for images. Further, Cai teaches the visual objects of the image are determined using panoptic segmentation.
(Cai, section 4: “… Note that in panoptic segmentation m contains things and stuff classes while in instance segmentation m only contains things classes”, see FIG. 1; section 6.2: “As there is no current available model to jointly perform both elements of the panoptic segmentation, we used Mask R-CNN [7] to perform instance segmentation for things classes and DeepLab [36] to perform semantic segmentation for stuff classes”; section 2 Related work: “Latent Dirichlet allocation”. Cai teaches extracting image features using panoptic segmentation and instance segmentation, then utilizing an LSTM to generate textual information. Cai also mentions systems that use LDA instead of an LSTM) Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vaduva, Black, Zhao and Fox by incorporating the panoptic and instance segmentation that is taught by Cai, to make a system that extracts objects using panoptic and instance segmentation, then generates latent relationships between the extracted objects; thus, one of ordinary skill in the art would be motivated to combine the references since among its several aspects, the present invention recognizes there is a need to improve the performance of recognizing objects and to better understand an image’s scene (Cai, Abstract: “Experimental results showed that our model could recognize the overlapped objects and understand the scene better. Our approach achieved competitive performance against state-of-the-art methods”). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 6
In regards to Claim 6, the combination of Vaduva, Black, Zhao and Fox teaches the method of Claim 1. The combination of Vaduva, Black, Zhao and Fox does not explicitly disclose the visual objects of the image are determined using instance segmentation.
Cai is in the same field of art of generating textual descriptions for images. Further, Cai teaches the visual objects of the image are determined using instance segmentation. (Cai, section 4: “… Note that in panoptic segmentation m contains things and stuff classes while in instance segmentation m only contains things classes”, see FIG. 1; section 6.2: “As there is no current available model to jointly perform both elements of the panoptic segmentation, we used Mask R-CNN [7] to perform instance segmentation for things classes and DeepLab [36] to perform semantic segmentation for stuff classes”; section 2 Related work: “Latent Dirichlet allocation”. Cai teaches extracting image features using panoptic segmentation and instance segmentation, then utilizing an LSTM to generate textual information. Cai also mentions systems that use LDA instead of an LSTM) Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vaduva, Black, Zhao and Fox by incorporating the panoptic and instance segmentation that is taught by Cai, to make a system that extracts objects using panoptic and instance segmentation, then generates latent relationships between the extracted objects; thus, one of ordinary skill in the art would be motivated to combine the references since among its several aspects, the present invention recognizes there is a need to improve the performance of recognizing objects and to better understand an image’s scene (Cai, Abstract: “Experimental results showed that our model could recognize the overlapped objects and understand the scene better. Our approach achieved competitive performance against state-of-the-art methods”). Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
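For readers unfamiliar with the technique the Cai rejections turn on: the two-stream setup Cai describes in section 6.2 (Mask R-CNN for "things" classes, DeepLab for "stuff" classes) still has to merge its two outputs into a single panoptic label map. The sketch below is purely illustrative and is not code from Cai or any other cited reference; the function name, the id scheme, and the inputs are invented here to show the merge step in miniature.

```python
import numpy as np

def merge_panoptic(semantic, instance_masks, instance_labels, thing_offset=1000):
    """Toy panoptic merge: paint 'thing' instances over a 'stuff' semantic map.

    semantic: (H, W) int array of stuff class ids (semantic segmentation output).
    instance_masks: list of (H, W) bool arrays (instance segmentation output).
    instance_labels: class id for each instance mask.
    Returns an (H, W) array where stuff pixels keep their class id and each
    thing pixel gets a unique id = thing_offset * class_id + instance_index.
    """
    panoptic = semantic.copy()
    for idx, (mask, cls) in enumerate(zip(instance_masks, instance_labels)):
        panoptic[mask] = thing_offset * cls + idx  # later instances overwrite earlier ones
    return panoptic

# Tiny example: a 2x2 "sky" (class 0) background with one "person" (class 7) pixel.
semantic = np.zeros((2, 2), dtype=int)
person = np.array([[True, False], [False, False]])
print(merge_panoptic(semantic, [person], [7]))
```

A production merge would sort instances by detection confidence and resolve mask overlaps before painting; this sketch simply lets later instances overwrite earlier ones.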
CLAIM 12
In regards to Claim 12, the combination of Vaduva, Black, Zhao and Fox teaches the computer program product of Claim 8. The combination of Vaduva, Black, Zhao and Fox does not explicitly disclose the visual objects of the image are determined using panoptic segmentation. Cai is in the same field of art of generating textual descriptions for images. Further, Cai teaches the visual objects of the image are determined using panoptic segmentation. (Cai, section 4: “… Note that in panoptic segmentation m contains things and stuff classes while in instance segmentation m only contains things classes”, see FIG. 1; section 6.2: “As there is no current available model to jointly perform both elements of the panoptic segmentation, we used Mask R-CNN [7] to perform instance segmentation for things classes and DeepLab [36] to perform semantic segmentation for stuff classes”; section 2 Related work: “Latent Dirichlet allocation”. Cai teaches extracting image features using panoptic segmentation and instance segmentation, then utilizing an LSTM to generate textual information. Cai also mentions systems that use LDA instead of an LSTM) Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Vaduva, Black, Zhao and Fox by incorporating the panoptic and instance segmentation that is taught by Cai, to make a system that extracts objects using panoptic and instance segmentation, then generates latent relationships between the extracted objects; thus, one of ordinary skill in the art would be motivated to combine the reference
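The collapsed Gibbs sampler that the Claim 3/10/17 rejections cite (via the Wikipedia LDA article) is compact enough to sketch. The following toy implementation is illustrative only, not code from any cited reference; the function name and hyperparameter defaults are invented for this sketch, and it implements the standard collapsed update p(k | rest) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ).

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_vocab, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of word-id lists.

    Returns (doc_topic, topic_word) count matrices after n_iter sweeps.
    """
    rng = np.random.default_rng(seed)
    # z[d][i] = topic currently assigned to the i-th word of document d
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    ndk = np.zeros((len(docs), n_topics))  # document-topic counts
    nkw = np.zeros((n_topics, n_vocab))    # topic-word counts
    nk = np.zeros(n_topics)                # total words per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # remove this word's assignment, then resample from the
                # collapsed conditional p(k | rest)
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw
```

In the Vaduva setting the "words" would be spatial visual words per tile rather than text tokens; the sparsity-based speedups the rejection quotes exploit the fact that most entries of ndk and nkw are zero, so the per-word resampling loop need not touch every topic.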

Prosecution Timeline

Sep 21, 2022: Application Filed
Oct 05, 2023: Response after Non-Final Action
Nov 10, 2025: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598397: DIRT DETECTION METHOD AND DEVICE FOR CAMERA COVER (granted Apr 07, 2026; 2y 5m to grant)
Patent 12598074: FACIAL RECOGNITION METHOD AND APPARATUS, DEVICE, AND MEDIUM (granted Apr 07, 2026; 2y 5m to grant)
Patent 12597254: TRACKING OPERATING ROOM PHASE FROM CAPTURED VIDEO OF THE OPERATING ROOM (granted Apr 07, 2026; 2y 5m to grant)
Patent 12592087: IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM (granted Mar 31, 2026; 2y 5m to grant)
Patent 12579622: METHOD AND APPARATUS FOR PROCESSING IMAGE SIGNAL, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM (granted Mar 17, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 79%
With Interview (+26.8%): 99%
Median Time to Grant: 3y 0m
PTA Risk: Low
Based on 53 resolved cases by this examiner. Grant probability derived from career allow rate.
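The headline grant probability is a simple ratio of the examiner's career counts shown above (42 granted of 53 resolved); how the interview-adjusted 99% figure is derived from the +26.8 point lift is not stated on this page, so it is not reproduced here. A quick check of the base rate:

```python
granted, resolved = 42, 53      # career counts reported on this page
allow_rate = granted / resolved
print(f"{allow_rate:.1%}")      # displayed above rounded to 79%
```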
