Last updated: May 29, 2026
Application No. 18/283,736
IMAGE LAYERING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM

Final Rejection §101§102§103
Filed
Sep 22, 2023
Priority
Aug 12, 2022 — CN 202210970146.3 +1 more
Examiner
SATCHER, DION JOHN
Art Unit
2676
Tech Center
2600 — Communications
Assignee
BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.
OA Round
2 (Final)
Interview Optional

Interview lift is +14.1%, which is below the 15.0% threshold. A written response is recommended.
Based on 42 resolved cases, 2023–2026
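The recommendation above can be read as a simple threshold comparison. The following is a minimal sketch only: the function name `recommend_response` is hypothetical, and treating a lift exactly equal to the threshold as favoring an interview is an assumption, since the page only states the outcome for a lift below 15.0%.

```python
def recommend_response(interview_lift_pct: float, threshold_pct: float = 15.0) -> str:
    """Return the suggested response type for a given interview lift.

    Assumption: a lift at or above the threshold favors an interview;
    the source only states that a lift below it favors a written response.
    """
    if interview_lift_pct < threshold_pct:
        return "written response"
    return "interview"


# Figures taken from the page: +14.1% lift against a 15.0% threshold.
print(recommend_response(14.1))
```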
Examiner Intelligence

SATCHER, DION JOHN
Grants 86% — above average
Career Allowance Rate
36 granted / 42 resolved
+23.7% vs TC avg
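As a check on the figures above, the career allowance rate follows directly from the granted and resolved counts. This is a minimal sketch; the helper name `allowance_rate` is hypothetical, and it assumes the displayed 86% is the ratio rounded to the nearest whole percent.

```python
def allowance_rate(granted: int, resolved: int) -> float:
    """Allowance rate in percent: granted cases divided by resolved cases."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved


rate = allowance_rate(36, 42)   # counts shown on the page
print(f"{rate:.1f}%")           # unrounded rate to one decimal place
print(f"{round(rate)}%")        # rounded to a whole percent, as displayed
```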
Interview Lift: +14.1% (moderate), comparing resolved cases with and without an interview
Typical timeline
2y 10m
Avg Prosecution
21 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101: 2.5% (-37.5% vs TC avg)
§103: 94.2% (+54.2% vs TC avg)
§102: 1.7% (-38.3% vs TC avg)
Based on career data from 42 resolved cases
Office Action

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s Amendments filed on 03/10/2026 have been entered and made of record.
Currently pending Claim(s): 1–12 and 14–21
Independent Claim(s): 1, 14 and 15
Amended Claim(s): 1–3, 6, 9, 10, 14–17 and 20
Cancelled Claim(s): 13

Response to Applicant’s Arguments
This office action is responsive to Applicant’s Arguments/Remarks Made in an Amendment received on 03/10/2026.
In view of the amendments filed on 03/10/2026 to claim(s) 9 and 10, the claim objections to claim(s) 9 and 10 are withdrawn.
In view of the amendments filed on 03/10/2026 to the specification, the specification objection is withdrawn.
In view of Applicant’s Arguments/Remarks and amendment filed on 03/10/2026 with respect to independent claims 1, 14 and 15 under 35 U.S.C. 101 (abstract idea), the claim rejection has been fully considered and the arguments are found to be persuasive (See Page(s) 12–14); therefore, the claim rejection under 35 U.S.C. 101 (abstract idea) is withdrawn.
In view of Applicant’s Arguments/Remarks and amendment filed on 03/10/2026 with respect to independent claim 15 under 35 U.S.C. 101 (CRM), the claim rejection has been fully considered and the arguments are found to be not persuasive (See Page(s) 13); therefore, the claim rejection under 35 U.S.C. 101 (CRM) still applies. Applicant states on page 13 that claim 15 recites a “non-transitory storage medium”, but claim 15 seems to be missing that recitation. Perhaps this is a typo or a mistake, in which case Applicant can address the issue at their earliest convenience by amending claim 15 to recite the “non-transitory storage medium”.
In view of Applicant’s Arguments/Remarks and amendment filed on 03/10/2026 with respect to independent claims 1, 14 and 15 under 35 U.S.C. 103, the claim rejection has been fully considered and the arguments are found to be not persuasive (See Page(s) 15 and 16); therefore, the claim rejection under 35 U.S.C. 103 still applies.
Applicant’s Reply (March 10, 2026) includes substantive amendments to the claims. This Office action has been updated with new grounds of rejection addressing those amendments. Further, Applicant’s Arguments/Remarks with respect to independent claims 1, 14 and 15 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection, and the amended claims are now rejected over newly cited art ‘Zhao et al. (US 20150189239 A1)’ as explained in the body of the rejection below.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter because they cover both statutory and non-statutory embodiments (under the broadest reasonable interpretation of the claim when read in light of the specification and in view of one skilled in the art) and embraces subject matter that is not eligible for patent protection and therefore is directed to non-statutory subject matter.
“[a] transitory, propagating signal … is not a “process, machine, manufacture, or composition of matter.”  Those four categories define the explicit scope and reach of subject matter patentable under 35 U.S.C. § 101; thus, such a signal cannot be patentable subject matter.”  (In re Petrus A.C.M. Nuijten; Fed Cir, 2006-1371, 9/20/2007).
Specifically, Applicant’s specification at [Pg. 30, ln. 15–18] recites: “The computer-readable signal medium may further be any computer-readable medium other than the computer-readable storage medium. The computer-readable signal medium may send, propagate, or transmit a program used by or in conjunction with an instruction execution system, apparatus, or device”, and as a result the claim is drawn to a recording medium that covers a signal per se. Thus, the claims are not eligible subject matter. It is recommended to amend and narrow the claims to cover only statutory embodiments, and thereby avoid a rejection under 35 U.S.C. § 101, by adding the limitation "non-transitory" to the claims.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
Claim(s) 1, 2, 8, 11, 12, 14, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Gross et al. (US 20150235408 A1, hereafter, "Gross") in view of Liu et al. (See NPL attached, "Collaborative Deconvolutional Neural Networks for Joint Depth Estimation and Semantic Segmentation", hereafter, "Liu") further in view of Zhao et al. (US 20150189239 A1, hereafter, “Zhao”).
Regarding claim 1, Gross discloses an image layering method, performed by an electronic device (See Gross, [Abstract], A pseudo-three dimensional image may be created from a two dimensional image by segmenting the two dimensional image, adjusting the scale of individual segments of the two dimensional image, then superimposing the scaled segment as layers of the pseudo-three dimensional image), comprising: 
acquiring a to-be-processed two-dimensional scene image (See Gross, ¶ [0031], For example, this image may be a conventional 2D image captured by a single camera, a stereo image captured with 2 or more 2D cameras, or even a synthetically rendered image (2D or 3D)), and determining a target segmentation image of the two-dimensional scene image and a target depth image of the two-dimensional scene image, wherein the target segmentation image comprises at least one target segmentation region (See Gross, ¶ [0041], In block 810, a depth map is generated for the image, using any desired technique. Then in block 820, the depth map is used to segment the image into 2 or more segments. See also [FIG. 8], 810 and 820);
determining a depth level corresponding to at least part of the at least one target segmentation region in the target segmentation image according to the target segmentation image and the target depth image (See Gross, ¶ [0041], In block 830, the segments are rendered as a multilayer image. When rendering the image, the layers are arranged based on the depth map, so that the more foreground layers are layered on top of the more background layer(s), in one embodiment, to produce a pseudo-3D image that appears to bring the more foreground layers toward the observer. Note: They are layered based on their depth in the map on whether they are a foreground or background. So the examiner is interpreting that as determining its depth level); and
generating a target layering image corresponding to the two-dimensional scene image based on the at least one target segmentation region and the depth level corresponding to the at least part of the at least one target segmentation region (See Gross, ¶ [0041], In block 830, the segments are rendered as a multilayer image. When rendering the image, the layers are arranged based on the depth map, so that the more foreground layers are layered on top of the more background layer(s), in one embodiment, to produce a pseudo-3D image that appears to bring the more foreground layers toward the observer. Note: the Examiner is interpreting the multilayer image or rather the layering of the segments as the target layering image).
wherein determining the target segmentation image of the two-dimensional scene image (See Gross, ¶ [0041], In block 810, a depth map is generated for the image, using any desired technique. Then in block 820, the depth map is used to segment the image into 2 or more segments. See also [FIG. 8], 810 and 820) comprises: 
[determining a semantic segmentation image of the two-dimensional scene image based on a pre-trained semantic segmentation model, wherein preliminary segmentation regions are marked in the semantic segmentation image, and each of the preliminary segmentation regions corresponds to a semantic category; 
wherein a to-be-processed segmentation region is determined from at least one preliminary segmentation region of the preliminary segmentation regions, the to-be-processed segmentation region is adjacent to a plurality of adjacent preliminary segmentation regions, and a semantic category of an adjacent preliminary segmentation region of the plurality of adjacent preliminary segmentation regions is used to update a semantic category of the to-be-processed segmentation region].
However, Gross fail(s) to teach determining a semantic segmentation image of the two-dimensional scene image based on a pre-trained semantic segmentation model, wherein preliminary segmentation regions are marked in the semantic segmentation image, and each of the preliminary segmentation regions corresponds to a semantic category; wherein a to-be-processed segmentation region is determined from at least one preliminary segmentation region of the preliminary segmentation regions, the to-be-processed segmentation region is adjacent to a plurality of adjacent preliminary segmentation regions, and a semantic category of an adjacent preliminary segmentation region of the plurality of adjacent preliminary segmentation regions is used to update a semantic category of the to-be-processed segmentation region.
Liu, working in the same field of endeavor, teaches: determining a semantic segmentation image of the two-dimensional scene image based on a pre-trained semantic segmentation model, wherein preliminary segmentation regions are marked in the semantic segmentation image, and each of the preliminary segmentation regions corresponds to a semantic category (See Liu, [Pg. 5657, Col. 2, ln. 8–11], In the third stage, a fully connected CRF is used to further improve the performance of semantic segmentation using the predicted depth and semantic labels. See also [FIG. 1]. Note: the Examiner is interpreting the CRF as the semantic segmentation model and the semantic labels as the segmentation regions and categories).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include determining a semantic segmentation image of the two-dimensional scene image based on a pre-trained semantic segmentation model, wherein preliminary segmentation regions are marked in the semantic segmentation image, and each of the preliminary segmentation regions corresponds to a semantic category, based on the method of Liu’s reference. The suggestion/motivation would have been to improve the accuracy of the segmentation and the depth estimation (See Liu, [Pg. 5661, Col. 2, ln. 13–20]).
However, Gross and Liu fail(s) to teach wherein a to-be-processed segmentation region is determined from at least one preliminary segmentation region of the preliminary segmentation regions, the to-be-processed segmentation region is adjacent to a plurality of adjacent preliminary segmentation regions, and a semantic category of an adjacent preliminary segmentation region of the plurality of adjacent preliminary segmentation regions is used to update a semantic category of the to-be-processed segmentation region.
Zhao, working in the same field of endeavor, teaches: wherein a to-be-processed segmentation region is determined from at least one preliminary segmentation region of the preliminary segmentation regions, the to-be-processed segmentation region is adjacent to a plurality of adjacent preliminary segmentation regions (See Zhao, ¶ [0093], On the same image, small regions that pertain to a same category and are adjacent to or interconnected with each other are merged into a large region, which is referred to as a cluster. Note: Examiner is interpreting the to-be-processed segmentation region as one of the regions in the adjacent regions), and a semantic category of an adjacent preliminary segmentation region of the plurality of adjacent preliminary segmentation regions is used to update a semantic category of the to-be-processed segmentation region (See Zhao, ¶ [0093], On the same image, small regions that pertain to a same category and are adjacent to or interconnected with each other are merged into a large region, which is referred to as a cluster. Each cluster includes at least one super pixel. However, a cluster typically includes tens to even hundreds of super pixels. Note: Examiner is interpreting the updating as merging the regions of the similar categories).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include the limitation wherein a to-be-processed segmentation region is determined from at least one preliminary segmentation region of the preliminary segmentation regions, the to-be-processed segmentation region is adjacent to a plurality of adjacent preliminary segmentation regions, and a semantic category of an adjacent preliminary segmentation region of the plurality of adjacent preliminary segmentation regions is used to update a semantic category of the to-be-processed segmentation region, based on the method of Zhao’s reference. The suggestion/motivation would have been to automatically segment and accurately inspect items (See Zhao, ¶ [0002–0005]).
Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Liu and Zhao with Gross to obtain the invention as specified in claim 1.
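For orientation, the depth-level and layering steps discussed for claim 1 can be sketched as follows. This is an illustrative sketch only, not the method of the application or of Gross: the function names, the use of the median depth per region, and the convention that smaller depth values are nearer to the observer are all assumptions.

```python
from statistics import median


def depth_levels(seg, depth):
    """Assign each segmentation region a depth level (0 = nearest).

    seg   -- 2D list of region ids (a stand-in for the target segmentation image)
    depth -- 2D list of depth values (a stand-in for the target depth image)
    Assumption: smaller depth values mean closer to the observer.
    """
    samples = {}
    for seg_row, depth_row in zip(seg, depth):
        for region, value in zip(seg_row, depth_row):
            samples.setdefault(region, []).append(value)
    # Summarize each region by its median depth (an assumed statistic).
    typical = {region: median(values) for region, values in samples.items()}
    ordered = sorted(typical, key=typical.get)  # nearest region first
    return {region: level for level, region in enumerate(ordered)}


def layering_image(seg, levels):
    """Replace every region id with its depth level to form a layering image."""
    return [[levels[region] for region in row] for row in seg]


seg = [[1, 1, 2],
       [1, 3, 2],
       [3, 3, 2]]
depth = [[0.90, 0.80, 0.20],
         [0.85, 0.50, 0.25],
         [0.55, 0.45, 0.30]]

levels = depth_levels(seg, depth)
print(levels)
print(layering_image(seg, levels))
```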
Regarding claim 2, Gross in view of Liu further in view of Zhao teaches the image layering method according to claim 1, wherein determining the target segmentation image of the two-dimensional scene image (See Gross, ¶ [0041], Then in block 820, the depth map is used to segment the image into 2 or more segments. See also [FIG. 8], 820) comprises: 
[processing, according to at least part of the preliminary segmentation regions and a semantic category to which each of the preliminary segmentation regions belongs, the semantic segmentation image to obtain the target segmentation image].
However, Gross and Zhao fail(s) to teach processing, according to at least part of the preliminary segmentation regions and a semantic category to which each of the preliminary segmentation regions belongs, the semantic segmentation image to obtain the target segmentation image.
Liu, working in the same field of endeavor, teaches: processing, according to at least part of the preliminary segmentation regions and a semantic category to which each of the preliminary segmentation regions belongs, the semantic segmentation image to obtain the target segmentation image (See Liu, [Pg. 5657, Col. 2, ln. 8–11], In the third stage, a fully connected CRF is used to further improve the performance of semantic segmentation using the predicted depth and semantic labels. See also [FIG. 1], Fully Connected CRF and output segmentation image. Note: Examiner is interpreting the semantic segmentation image as the semantic segmentation with the labels and output image as the target segmentation image).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include processing, according to at least part of the preliminary segmentation regions and a semantic category to which each of the preliminary segmentation regions belongs, the semantic segmentation image to obtain the target segmentation image, based on the method of Liu’s reference. The suggestion/motivation would have been to improve the accuracy of the segmentation and the depth estimation (See Liu, [Pg. 5661, Col. 2, ln. 13–20]).
Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Liu with Gross and Zhao to obtain the invention as specified in claim 2.
Regarding claim 8, Gross in view of Liu further in view of Zhao teaches the image layering method according to claim 1, wherein determining the target depth image of the two-dimensional scene image (See Gross, ¶ [0041], In block 810, a depth map is generated for the image, using any desired technique. See also [FIG. 8], 810) comprises:
[determining the target depth image of the two-dimensional scene image based on a pre-trained depth estimation model, wherein the depth estimation model is trained according to a sample two-dimensional image and an expected depth image corresponding to the sample two-dimensional image].
However, Gross and Zhao fail(s) to teach determining the target depth image of the two-dimensional scene image based on a pre-trained depth estimation model, wherein the depth estimation model is trained according to a sample two-dimensional image and an expected depth image corresponding to the sample two-dimensional image.
Liu, working in the same field of endeavor, teaches: determining the target depth image of the two-dimensional scene image based on a pre-trained depth estimation model, wherein the depth estimation model is trained according to a sample two-dimensional image and an expected depth image corresponding to the sample two-dimensional image (See Liu, [Pg. 5658, Col. 2, ln. 39–43], Depth estimation is essentially a regression problem, since the depth values are continuous. In this paper, we transform it into a classification problem and map the continuous depth values to discrete depth labels as the ground truth. Note: the network uses supervised learning, that is, learning based on labeled datasets that include the expected depth image).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include determining the target depth image of the two-dimensional scene image based on a pre-trained depth estimation model, wherein the depth estimation model is trained according to a sample two-dimensional image and an expected depth image corresponding to the sample two-dimensional image, based on the method of Liu’s reference. The suggestion/motivation would have been to improve the accuracy of the segmentation and the depth estimation (See Liu, [Pg. 5661, Col. 2, ln. 13–20]).
Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Liu with Gross and Zhao to obtain the invention as specified in claim 8.
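The passage of Liu quoted for claim 8 describes mapping continuous depth values to discrete depth labels. A minimal sketch of such a mapping is given below. The bin edges and the function name `depth_to_label` are hypothetical; this Office action does not state how Liu chooses its bins.

```python
from bisect import bisect_right


def depth_to_label(depth_value, bin_edges):
    """Map a continuous depth value to a discrete label index.

    bin_edges must be sorted ascending. A value below the first edge gets
    label 0; a value at or above the last edge gets label len(bin_edges).
    """
    return bisect_right(bin_edges, depth_value)


# Hypothetical bin edges (for example, in metres); not taken from Liu.
edges = [1.0, 2.0, 4.0]
labels = [depth_to_label(d, edges) for d in [0.5, 1.0, 3.2, 9.0]]
print(labels)
```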
Regarding claim 11, Gross teaches the image layering method according to claim 1, wherein generating the target layering image corresponding to the two-dimensional scene image based on the depth level corresponding to the at least part of the at least one target segmentation region (See Gross, ¶ [0041], In block 830, the segments are rendered as a multilayer image. When rendering the image, the layers are arranged based on the depth map, so that the more foreground layers are layered on top of the more background layer(s), in one embodiment, to produce a pseudo-3D image that appears to bring the more foreground layers toward the observer. Note: the Examiner is interpreting the multilayer image as the target layering image) comprises:
marking each of the at least one target segmentation region in the target segmentation image based on a preset region marking manner corresponding to each depth level and a depth level corresponding to the each of the at least one target segmentation region to obtain the target layering image corresponding to the two-dimensional scene image (See Gross, ¶ [0042], Then in block 830, the pseudo-3D image may be presented to simulate depth of field with parallax. In one embodiment, the scaling is performed corresponding to a depth ordering of the layers, with more foreground layers are scaled monotonically greater than the more background layers, so that each foreground layer is larger relative to its immediately more background layer than in the original image. In one embodiment, color grading or other color differentiation techniques may be used as desired to help the foreground objects "pop out" better. Note: Examiner is interpreting the color grading or differentiation as the marking manner).
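The marking step recited in claim 11 can be illustrated with a lookup from depth level to a preset mark. This is a sketch under stated assumptions: the RGB colors in `PRESET_MARKS`, the function name `mark_regions`, and the sample inputs are hypothetical and are not drawn from Gross or from the application.

```python
# Hypothetical preset marking manner: one RGB color per depth level.
PRESET_MARKS = {
    0: (255, 0, 0),   # nearest level
    1: (0, 255, 0),
    2: (0, 0, 255),   # farthest level
}


def mark_regions(seg, region_levels, marks=PRESET_MARKS):
    """Paint every pixel with the preset mark of its region's depth level.

    seg           -- 2D list of region ids
    region_levels -- mapping from region id to depth level
    """
    return [[marks[region_levels[region]] for region in row] for row in seg]


seg_example = [[1, 2],
               [3, 2]]
levels_example = {1: 2, 2: 0, 3: 1}
marked = mark_regions(seg_example, levels_example)
print(marked)
```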
Regarding claim 12, Gross teaches the image layering method according to claim 1, wherein acquiring the to-be-processed two-dimensional scene image (See Gross, ¶ [0031], For example, this image may be a conventional 2D image captured by a single camera, a stereo image captured with 2 or more 2D cameras, or even a synthetically rendered image (2D or 3D)) comprises:
in response to an image conversion trigger operation, acquiring the to-be-processed two-dimensional scene image (See Gross, ¶ [0031], For example, this image may be a conventional 2D image captured by a single camera, a stereo image captured with 2 or more 2D cameras, or even a synthetically rendered image (2D or 3D). Note: a camera can be triggered to capture an image); and
wherein after generating the target layering image corresponding to the two-dimensional scene image based on the at least one target segmentation region and the depth level corresponding to the at least part of the at least one target segmentation region (See Gross, ¶ [0041], In block 830, the segments are rendered as a multilayer image. When rendering the image, the layers are arranged based on the depth map, so that the more foreground layers are layered on top of the more background layer(s), in one embodiment, to produce a pseudo-3D image that appears to bring the more foreground layers toward the observer. Note: the Examiner is interpreting the multilayer image as the target layering image), the method further comprises:
generating a three-dimensional scene image based on the two-dimensional scene image and the target layering image, and displaying the three-dimensional scene image (See Gross, ¶ [0042], Then in block 830, the pseudo-3D image may be presented to simulate depth of field with parallax. In one embodiment, the scaling is performed corresponding to a depth ordering of the layers, with more foreground layers are scaled monotonically greater than the more background layers, so that each foreground layer is larger relative to its immediately more background layer than in the original image).
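The quoted passage of Gross, ¶ [0042], states that more foreground layers are scaled monotonically greater than more background layers. One way to satisfy that ordering is sketched below. The linear step size and the function name `layer_scales` are assumptions, since the quoted passage only requires a monotonic relationship.

```python
def layer_scales(num_layers, step=0.05):
    """Return one scale factor per layer, index 0 being the most background.

    Each more-foreground layer receives a strictly larger factor than the
    layer immediately behind it. The linear step is an assumed choice.
    """
    if step <= 0:
        raise ValueError("step must be positive for a strictly increasing scale")
    return [1.0 + step * index for index in range(num_layers)]


scales = layer_scales(4)
print(scales)
# Confirm the monotonic ordering from background to foreground.
print(all(back < front for back, front in zip(scales, scales[1:])))
```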
Regarding claim 14, claim 14 is rejected the same as claim 1 and the arguments similar to that presented above for claim 1 are equally applicable to the claim 14, and all of the other limitations similar to claim 1 are not repeated herein, but incorporated by reference. Furthermore, Gross teaches an electronic device, comprising: at least one processor; and a memory configured to store at least one program, wherein the at least one program, when executed by the at least one processor, cause the at least one processor to implement (See Gross, [FIG. 2], 216 PROCESSOR, 212 MEMORY).
Regarding claim 15, claim 15 is rejected the same as claim 1 and the arguments similar to that presented above for claim 1 are equally applicable to the claim 15, and all of the other limitations similar to claim 1 are not repeated herein, but incorporated by reference. Furthermore, Gross teaches a storage medium comprising a computer-executable instruction, wherein the computer-executable instruction, when executed by a processor of a computer, is configured to execute (See Gross, [FIG. 2], 216 PROCESSOR, 212 MEMORY).
Regarding claim 16, claim 16 is rejected the same as claim 2 and the arguments similar to that presented above for claim 2 are equally applicable to the claim 16, and all of the other limitations similar to claim 2 are not repeated herein, but incorporated by reference. 
Claim(s) 3–6 and 17–20 are rejected under 35 U.S.C. 103 as being unpatentable over Gross et al. (US 20150235408 A1, hereafter, "Gross") in view of Liu et al. (See NPL attached, "Collaborative Deconvolutional Neural Networks for Joint Depth Estimation and Semantic Segmentation", hereafter, "Liu") further in view of Zhao et al. (US 20150189239 A1, hereafter, “Zhao”) and further in view of Yin et al. (US 20220327312 A1, hereafter, "Yin").
Regarding claim 3, Gross in view of Liu further in view of Zhao teaches the image layering method according to claim 2, [wherein processing, according to the at least part of the preliminary segmentation regions and the semantic category to which each of the preliminary segmentation regions belongs, the semantic segmentation image to obtain the target segmentation image comprises:
determining the to-be-processed segmentation region in at least one preliminary segmentation region of the preliminary segmentation regions marked in the semantic segmentation image;
updating the semantic category of the to-be-processed segmentation region according to the semantic category of the adjacent preliminary segmentation region of the plurality of adjacent segmentation regions; and
in a case where a semantic category of each to-be-processed segmentation region has been updated, determining a corresponding target segmentation region in the semantic segmentation image according to the semantic category of each of the preliminary segmentation regions, and using an image composed of target segmentation regions corresponding to all the preliminary segmentation regions as the target segmentation image].
However, Gross fail(s) to teach wherein processing, according to the at least part of the preliminary segmentation regions and the semantic category to which each of the preliminary segmentation regions belongs, the semantic segmentation image to obtain the target segmentation image comprises: determining the to-be-processed segmentation region in at least one preliminary segmentation region of the preliminary segmentation regions marked in the semantic segmentation image; updating the semantic category of the to-be-processed segmentation region according to the semantic category of the adjacent preliminary segmentation region of the plurality of adjacent segmentation regions; and in a case where a semantic category of each to-be-processed segmentation region has been updated, determining a corresponding target segmentation region in the semantic segmentation image according to the semantic category of each of the preliminary segmentation regions, and using an image composed of target segmentation regions corresponding to all the preliminary segmentation regions as the target segmentation image.
Liu, working in the same field of endeavor, teaches: wherein processing, according to the at least part of the preliminary segmentation regions and the semantic category to which each of the preliminary segmentation regions belongs, the semantic segmentation image to obtain the target segmentation image comprises (See Liu, [Pg. 5657, Col. 2, ln. 8–11], In the third stage, a fully connected CRF is used to further improve the performance of semantic segmentation using the predicted depth and semantic labels):
determining the to-be-processed segmentation region in at least one preliminary segmentation region of the preliminary segmentation regions marked in the semantic segmentation image (See Liu, [Pg. 5657, Col. 2, ln. 8–11], In the third stage, a fully connected CRF is used to further improve the performance of semantic segmentation using the predicted depth and semantic labels. See also [FIG. 1], Fully Connected CRF).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include the limitation wherein processing, according to the at least part of the preliminary segmentation regions and the semantic category to which each of the preliminary segmentation regions belongs, the semantic segmentation image to obtain the target segmentation image comprises: determining the to-be-processed segmentation region in at least one preliminary segmentation region of the preliminary segmentation regions marked in the semantic segmentation image, based on the method of Liu’s reference. The suggestion/motivation would have been to improve the accuracy of the segmentation and the depth estimation (See Liu, [Pg. 5661, Col. 2, ln. 13–20]).
However, Gross and Liu fail(s) to teach updating the semantic category of the to-be-processed segmentation region according to the semantic category of the adjacent preliminary segmentation region of the plurality of adjacent segmentation regions; and in a case where a semantic category of each to-be-processed segmentation region has been updated, determining a corresponding target segmentation region in the semantic segmentation image according to the semantic category of each of the preliminary segmentation regions, and using an image composed of target segmentation regions corresponding to all the preliminary segmentation regions as the target segmentation image.
Zhao, working in the same field of endeavor, teaches updating the semantic category of the to-be-processed segmentation region according to the semantic category of the adjacent preliminary segmentation region of the plurality of adjacent segmentation regions (See Zhao, ¶ [0093], On the same image, small regions that pertain to a same category and are adjacent to or interconnected with each other are merged into a large region, which is referred to as a cluster. Note: Examiner is interpreting the to-be-processed segmentation region to be one of the regions in the adjacent regions);
in a case where a semantic category of each to-be-processed segmentation region has been updated (See Zhao, ¶ [0093], On the same image, small regions that pertain to a same category and are adjacent to or interconnected with each other are merged into a large region, which is referred to as a cluster. Each cluster includes at least one super pixel. However, a cluster typically includes tens to even hundreds of super pixels. Note: Examiner is interpreting the updating as merging the regions of the similar categories).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include updating the semantic category of the to-be-processed segmentation region according to the semantic category of the adjacent preliminary segmentation region of the plurality of adjacent segmentation regions; in a case where a semantic category of each to-be-processed segmentation region has been updated, based on the method of Zhao’s reference. The suggestion/motivation would have been to automatically segment and accurately inspect items (See Zhao, ¶ [0002–0005]).
However, Gross, Liu and Zhao fail(s) to teach determining a corresponding target segmentation region in the semantic segmentation image according to the semantic category of each of the preliminary segmentation regions, and using an image composed of target segmentation regions corresponding to all the preliminary segmentation regions as the target segmentation image.
Yin, working in the same field of endeavor, teaches: determining a corresponding target segmentation region in the semantic segmentation image according to the semantic category of each of the preliminary segmentation regions, and using an image composed of target segmentation regions corresponding to all the preliminary segmentation regions as the target segmentation image (See Yin, ¶ [0045], In an embodiment, annotation program 101 sorts all regions in an initial map according to their scales in ascending order. In an embodiment, if a region is below a certain threshold, the region is merged with its nearest region and their colors are averaged. In an embodiment, annotation program 101 generates one or more layers with one or more different thresholds. In an embodiment, annotation program 101 computes a scale value. In an embodiment, annotation program 101 determines the scale value based on shape uniformities to further determine region sizes for merging regions. In an embodiment, annotation program 101 labels each layer to produce a resulting regional map. Note: the final output of the merging is being interpreted as the target segmentation image).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include determining a corresponding target segmentation region in the semantic segmentation image according to the semantic category of each of the preliminary segmentation regions, and using an image composed of target segmentation regions corresponding to all the preliminary segmentation regions as the target segmentation image, based on the method of Yin’s reference. The suggestion/motivation would have been to accurately annotate a large number of images and merge and group labels (See Yin, ¶ [0002–0007]).
Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Liu, Zhao and Yin with Gross to obtain the invention as specified in claim 3.
Regarding claim 4, Gross in view of Liu further in view of Zhao and further in view of Yin teaches the image layering method according to claim 3, [wherein determining the to-be-processed segmentation region in the at least one preliminary segmentation region of the preliminary segmentation regions marked in the semantic segmentation image comprises:
determining the to-be-processed segmentation region in the at least one preliminary segmentation region marked in the semantic segmentation image according to a region area of each of the preliminary segmentation regions].
However, Gross and Zhao fail(s) to teach wherein determining the to-be-processed segmentation region in the at least one preliminary segmentation region of the preliminary segmentation regions marked in the semantic segmentation image comprises;
Liu, working in the same field of endeavor, teaches: wherein determining the to-be-processed segmentation region in the at least one preliminary segmentation region of the preliminary segmentation regions marked in the semantic segmentation image comprises (See Liu, [Pg. 5657, Col. 2, ln. 8–11], In the third stage, a fully connected CRF is used to further improve the performance of semantic segmentation using the predicted depth and semantic labels):
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include the limitation wherein determining the to-be-processed segmentation region in the at least one preliminary segmentation region of the preliminary segmentation regions marked in the semantic segmentation image comprises, based on the method of Liu’s reference. The suggestion/motivation would have been to improve the accuracy of the segmentation and the depth estimation (See Liu, [Pg. 5661, Col. 2, ln. 13–20]).
However, Gross, Zhao and Liu fail(s) to teach determining the to-be-processed segmentation region in the at least one preliminary segmentation region marked in the semantic segmentation image according to a region area of each of the preliminary segmentation regions.
Yin, working in the same field of endeavor, teaches: determining the to-be-processed segmentation region in the at least one preliminary segmentation region marked in the semantic segmentation image according to a region area of each of the preliminary segmentation regions (See Yin, ¶ [0045], In an embodiment, annotation program 101 sorts all regions in an initial map according to their scales in ascending order. In an embodiment, if a region is below a certain threshold, the region is merged with its nearest region and their colors are averaged); 
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include determining the to-be-processed segmentation region in the at least one preliminary segmentation region marked in the semantic segmentation image according to a region area of each of the preliminary segmentation regions, based on the method of Yin’s reference. The suggestion/motivation would have been to accurately annotate a large number of images and merge and group labels (See Yin, ¶ [0002–0007]).
Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Liu and Yin with Gross and Zhao to obtain the invention as specified in claim 4.
Regarding claim 5, Gross in view of Liu further in view of Zhao and further in view of Yin teaches the image layering method according to claim 4, [wherein determining the to-be-processed segmentation region in the at least one preliminary segmentation region of the preliminary segmentation regions marked in the semantic segmentation image according to the region area of each of the preliminary segmentation regions comprises:
for each of the preliminary segmentation regions, using a preliminary segmentation region of the at least one preliminary segmentation region marked in the semantic segmentation image whose region area is less than or equal to a preset area threshold as the to-be-processed segmentation region; or,
for each of the preliminary segmentation regions, using a preliminary segmentation region of the at least one preliminary segmentation region marked in the semantic segmentation image whose proportion of region area to image area of the semantic segmentation image does not exceed a preset proportion threshold as the to-be-processed segmentation region].
However, Gross, Liu and Zhao fail(s) to teach wherein determining the to-be-processed segmentation region in the at least one preliminary segmentation region of the preliminary segmentation regions marked in the semantic segmentation image according to the region area of each of the preliminary segmentation regions comprises: for each of the preliminary segmentation regions, using a preliminary segmentation region of the at least one preliminary segmentation region marked in the semantic segmentation image whose region area is less than or equal to a preset area threshold as the to-be-processed segmentation region; or, for each of the preliminary segmentation regions, using a preliminary segmentation region of the at least one preliminary segmentation region marked in the semantic segmentation image whose proportion of region area to image area of the semantic segmentation image does not exceed a preset proportion threshold as the to-be-processed segmentation region.
Yin, working in the same field of endeavor, teaches: wherein determining the to-be-processed segmentation region in the at least one preliminary segmentation region of the preliminary segmentation regions marked in the semantic segmentation image according to the region area of each of the preliminary segmentation regions (See Yin, ¶ [0045], In an embodiment, annotation program 101 sorts all regions in an initial map according to their scales in ascending order. In an embodiment, if a region is below a certain threshold, the region is merged with its nearest region and their colors are averaged. In an embodiment, annotation program 101 generates one or more layers with one or more different thresholds. In an embodiment, annotation program 101 computes a scale value. In an embodiment, annotation program 101 determines the scale value based on shape uniformities to further determine region sizes for merging regions. In an embodiment, annotation program 101 labels each layer to produce a resulting regional map) comprises:
for each of the preliminary segmentation regions, using a preliminary segmentation region of the at least one preliminary segmentation region marked in the semantic segmentation image whose region area is less than or equal to a preset area threshold as the to-be-processed segmentation region (See Yin, ¶ [0045], In an embodiment, annotation program 101 sorts all regions in an initial map according to their scales in ascending order. In an embodiment, if a region is below a certain threshold, the region is merged with its nearest region and their colors are averaged. Note: the claim recites two alternatives: a preliminary segmentation region whose region area is less than or equal to a preset area threshold, OR a preliminary segmentation region whose proportion of region area to image area does not exceed a preset proportion threshold. Examiner only needs to show that one of the alternatives is taught); or,
for each of the preliminary segmentation regions, using a preliminary segmentation region of the at least one preliminary segmentation region marked in the semantic segmentation image whose proportion of region area to image area of the semantic segmentation image does not exceed a preset proportion threshold as the to-be-processed segmentation region. 
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include the limitation wherein determining the to-be-processed segmentation region in the at least one preliminary segmentation region of the preliminary segmentation regions marked in the semantic segmentation image according to the region area of each of the preliminary segmentation regions comprises: for each of the preliminary segmentation regions, using a preliminary segmentation region of the at least one preliminary segmentation region marked in the semantic segmentation image whose region area is less than or equal to a preset area threshold as the to-be-processed segmentation region, based on the method of Yin’s reference. The suggestion/motivation would have been to accurately annotate a large number of images and merge and group labels (See Yin, ¶ [0002–0007]).
Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Yin with Gross, Liu and Zhao to obtain the invention as specified in claim 5.
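For illustration only, the two alternatives recited in claim 5 (an absolute area threshold, or a threshold on the proportion of region area to image area) can be sketched as below. This is a minimal Python sketch under stated assumptions and is not taken from Yin or from the application: the function name `select_small_regions`, the representation of the segmentation as a 2-D map of region ids, and the threshold values are all hypothetical.

```python
from collections import Counter


def select_small_regions(region_map, area_threshold=None, proportion_threshold=None):
    """Return ids of regions at or below an area threshold or an area-share threshold.

    region_map is a hypothetical 2-D list in which each cell holds a region id.
    Exactly one of the two thresholds must be given.
    """
    if (area_threshold is None) == (proportion_threshold is None):
        raise ValueError("pass exactly one of area_threshold or proportion_threshold")

    # Region area is approximated here as the pixel count per region id.
    areas = Counter(region_id for row in region_map for region_id in row)
    image_area = sum(areas.values())

    if area_threshold is not None:
        return sorted(r for r, a in areas.items() if a <= area_threshold)
    return sorted(r for r, a in areas.items() if a / image_area <= proportion_threshold)


# Toy 3x4 map: region 1 has 8 pixels, region 2 has 3, region 3 has 1.
region_map = [
    [1, 1, 1, 2],
    [1, 1, 3, 2],
    [1, 1, 1, 2],
]
small_by_area = select_small_regions(region_map, area_threshold=1)
small_by_share = select_small_regions(region_map, proportion_threshold=0.25)
```

With this toy map, the area alternative selects only region 3, while the proportion alternative also selects region 2, whose 3 pixels are exactly a quarter of the 12-pixel image.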
Regarding claim 6, Gross in view of Liu further in view of Zhao and further in view of Yin teaches the image layering method according to claim 3, [wherein updating the semantic category of the to-be-processed segmentation region according to the semantic category of the adjacent preliminary segmentation region of the plurality of adjacent preliminary segmentation regions comprises:
acquiring the adjacent preliminary segmentation region adjacent to the to-be-processed segmentation region in the semantic segmentation image as a reference adjacent region; and
determining a target reference adjacent region corresponding to the to-be-processed segmentation region according to the reference adjacent region, and updating a semantic category to which the target reference adjacent region belongs to the semantic category of the to-be-processed segmentation region].
However, Gross, Liu and Yin fail(s) to teach wherein updating the semantic category of the to-be-processed segmentation region according to the semantic category of the adjacent preliminary segmentation region of the plurality of adjacent preliminary segmentation regions comprises: acquiring the adjacent preliminary segmentation region adjacent to the to-be-processed segmentation region in the semantic segmentation image as a reference adjacent region; and determining a target reference adjacent region corresponding to the to-be-processed segmentation region according to the reference adjacent region, and updating a semantic category to which the target reference adjacent region belongs to the semantic category of the to-be-processed segmentation region.
Zhao, working in the same field of endeavor, teaches: wherein updating the semantic category of the to-be-processed segmentation region according to the semantic category of the adjacent preliminary segmentation region of the plurality of adjacent preliminary segmentation regions comprises (See Zhao, ¶ [0093], On the same image, small regions that pertain to a same category and are adjacent to or interconnected with each other are merged into a large region, which is referred to as a cluster. Note: Examiner is interpreting the to-be-processed segmentation region to be one of the regions in the adjacent regions):
acquiring the adjacent preliminary segmentation region adjacent to the to-be-processed segmentation region in the semantic segmentation image as a reference adjacent region (See Zhao, ¶ [0093], On the same image, small regions that pertain to a same category and are adjacent to or interconnected with each other are merged into a large region, which is referred to as a cluster. Note: Examiner is interpreting the adjacent to-be-processed segmentation region to be one of the small regions in the adjacent regions); and
determining a target reference adjacent region corresponding to the to-be-processed segmentation region according to the reference adjacent region, and updating a semantic category to which the target reference adjacent region belongs to the semantic category of the to-be-processed segmentation region (See Zhao, ¶ [0093], On the same image, small regions that pertain to a same category and are adjacent to or interconnected with each other are merged into a large region, which is referred to as a cluster. Each cluster includes at least one super pixel. However, a cluster typically includes tens to even hundreds of super pixels. Note: Examiner is interpreting the updating as merging the regions of the similar categories into a cluster).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include the limitation wherein updating the semantic category of the to-be-processed segmentation region according to the semantic category of the adjacent preliminary segmentation region of the plurality of adjacent preliminary segmentation regions comprises: acquiring the adjacent preliminary segmentation region adjacent to the to-be-processed segmentation region in the semantic segmentation image as a reference adjacent region; and determining a target reference adjacent region corresponding to the to-be-processed segmentation region according to the reference adjacent region, and updating a semantic category to which the target reference adjacent region belongs to the semantic category of the to-be-processed segmentation region, based on the method of Zhao’s reference. The suggestion/motivation would have been to automatically segment and accurately inspect items (See Zhao, ¶ [0002–0005]).
Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Zhao with Gross, Liu and Yin to obtain the invention as specified in claim 6.
Regarding claim 17, claim 17 is rejected on the same basis as claim 3. The arguments presented above for claim 3 are equally applicable to claim 17, and the limitations similar to those of claim 3 are not repeated herein but are incorporated by reference.
Regarding claim 18, claim 18 is rejected on the same basis as claim 4. The arguments presented above for claim 4 are equally applicable to claim 18, and the limitations similar to those of claim 4 are not repeated herein but are incorporated by reference.
Regarding claim 19, claim 19 is rejected on the same basis as claim 5. The arguments presented above for claim 5 are equally applicable to claim 19, and the limitations similar to those of claim 5 are not repeated herein but are incorporated by reference.
Regarding claim 20, claim 20 is rejected on the same basis as claim 6. The arguments presented above for claim 6 are equally applicable to claim 20, and the limitations similar to those of claim 6 are not repeated herein but are incorporated by reference.
Claim(s) 7 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Gross et al. (US 20150235408 A1, hereafter, "Gross") in view of Liu et al. (See NPL attached, "Collaborative Deconvolutional Neural Networks for Joint Depth Estimation and Semantic Segmentation", hereafter, "Liu") further in view of Zhao et al. (US 20150189239 A1, hereafter, “Zhao”) and further in view of Yin et al. (US 20220327312 A1, hereafter, "Yin") and further in view of Meng et al. (CN 107945183 A, hereafter, “Meng”).
Regarding claim 7, Gross in view of Liu further in view of Yin teaches the image layering method according to claim 6, [wherein determining the target reference adjacent region corresponding to the to-be-processed segmentation region according to the reference adjacent region comprises:
calculating an overlapping length of a boundary of the to-be-processed segmentation region and boundaries of a plurality of reference adjacent regions, separately, and using a reference adjacent region of the plurality of reference adjacent regions that has a longest overlapping length with the boundary of the to-be-processed segmentation region as the target reference adjacent region corresponding to the to-be-processed segmentation region].
However, Gross, Liu and Yin fail(s) to teach wherein determining the target reference adjacent region corresponding to the to-be-processed segmentation region according to the reference adjacent region comprises:
Zhao, working in the same field of endeavor, teaches: wherein determining the target reference adjacent region corresponding to the to-be-processed segmentation region according to the reference adjacent region comprises (See Zhao, ¶ [0093], On the same image, small regions that pertain to a same category and are adjacent to or interconnected with each other are merged into a large region, which is referred to as a cluster. Note: Examiner is interpreting the to-be-processed segmentation region to be one of the regions in the adjacent regions):
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include the limitation wherein determining the target reference adjacent region corresponding to the to-be-processed segmentation region according to the reference adjacent region comprises, based on the method of Zhao’s reference. The suggestion/motivation would have been to automatically segment and accurately inspect items (See Zhao, ¶ [0002–0005]).
However, Gross, Liu, Yin and Zhao fail(s) to teach calculating an overlapping length of a boundary of the to-be-processed segmentation region and boundaries of a plurality of reference adjacent regions, separately, and using a reference adjacent region of the plurality of reference adjacent regions that has a longest overlapping length with the boundary of the to-be-processed segmentation region as the target reference adjacent region corresponding to the to-be-processed segmentation region.
Meng, working in the same field of endeavor, teaches: calculating an overlapping length of a boundary of the to-be-processed segmentation region and boundaries of a plurality of reference adjacent regions, separately, and using a reference adjacent region of the plurality of reference adjacent regions that has a longest overlapping length with the boundary of the to-be-processed segmentation region as the target reference adjacent region corresponding to the to-be-processed segmentation region (See Meng, ¶ [0030], This technique effectively reduces the oversegmentation phenomenon of watershed segmentation, laying a good foundation for effective image merging. The "FastLambda Algorithm Based on Controllable Common Boundary Length Penalty" incorporates the ratio of the common boundary length between two regions to the square root of the smaller region's area into the formula, solving the problem of the algorithm's own shape element, namely the ratio of the common boundary length. Note: Examiner is interpreting the common boundary length as the overlapping length).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include calculating an overlapping length of a boundary of the to-be-processed segmentation region and boundaries of a plurality of reference adjacent regions, separately, and using a reference adjacent region of the plurality of reference adjacent regions that has a longest overlapping length with the boundary of the to-be-processed segmentation region as the target reference adjacent region corresponding to the to-be-processed segmentation region, based on the method of Meng’s reference. The suggestion/motivation would have been to accurately and timely segment images (See Meng, ¶ [0002–0010]).
Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Meng and Zhao with Gross, Liu and Yin to obtain the invention as specified in claim 7.
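For illustration only, the limitation of claim 7 (selecting the adjacent region that shares the longest boundary with the to-be-processed region, then adopting that region's semantic category) can be sketched as below. This is a minimal Python sketch under stated assumptions and is not Meng's FastLambda algorithm or the applicant's implementation: boundary overlap is approximated as the count of 4-connected pixel edges, and the names `shared_boundary_lengths`, `relabel_by_longest_boundary`, and the category labels are hypothetical.

```python
from collections import Counter


def shared_boundary_lengths(region_map, target_id):
    """Count 4-connected pixel edges between target_id and each neighbouring region."""
    rows, cols = len(region_map), len(region_map[0])
    lengths = Counter()
    for r in range(rows):
        for c in range(cols):
            if region_map[r][c] != target_id:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    other = region_map[nr][nc]
                    if other != target_id:
                        lengths[other] += 1
    return lengths


def relabel_by_longest_boundary(region_map, categories, target_id):
    """Return a copy of categories in which target_id takes the category of the
    neighbour with the longest shared boundary (ties go to the smallest id)."""
    lengths = shared_boundary_lengths(region_map, target_id)
    if not lengths:
        return dict(categories)  # isolated region: nothing to adopt
    best = max(sorted(lengths), key=lengths.__getitem__)
    updated = dict(categories)
    updated[target_id] = categories[best]
    return updated


# Region 3 touches region 1 along five pixel edges and region 2 along one.
region_map = [
    [1, 1, 1, 2],
    [1, 3, 3, 2],
    [1, 1, 1, 2],
]
categories = {1: "sky", 2: "building", 3: "tree"}
lengths = shared_boundary_lengths(region_map, 3)
updated = relabel_by_longest_boundary(region_map, categories, 3)
```

In this toy map, region 3 adopts the category of region 1, the neighbour with the longer shared boundary; the original `categories` mapping is left unchanged.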
Regarding claim 21, claim 21 is rejected on the same basis as claim 7. The arguments presented above for claim 7 are equally applicable to claim 21, and the limitations similar to those of claim 7 are not repeated herein but are incorporated by reference.
Claim(s) 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Gross et al. (US 20150235408 A1, hereafter, "Gross") in view of Liu et al. (See NPL attached, "Collaborative Deconvolutional Neural Networks for Joint Depth Estimation and Semantic Segmentation", hereafter, "Liu") further in view of Zhao et al. (US 20150189239 A1, hereafter, “Zhao”) and further in view of Baruch et al. (US 9,741,125 B2, hereafter, "Baruch").
Regarding claim 9, Gross in view of Liu further in view of Zhao teaches the image layering method according to claim 1, determining the depth level corresponding to the at least part of the at least one target segmentation region in the two-dimensional scene image according to the target segmentation image and the target depth image (See Gross, ¶ [0041], In block 830, the segments are rendered as a multilayer image. When rendering the image, the layers are arranged based on the depth map, so that the more foreground layers are layered on top of the more background layer(s), in one embodiment, to produce a pseudo-3D image that appears to bring the more foreground layers toward the observer. Note: They are layered based on their depth in the map on whether they are a foreground or background. So the examiner is interpreting that as determining its depth level) comprises:
[for each target segmentation region whose depth level is to be determined, determining a depth information feature of the each target segmentation region according to the target depth image, wherein the depth information feature comprises at least one of depth mean or depth variance of at least part of pixels in the each target segmentation region; and
performing, according to the depth information feature of each target segmentation region, clustering processing on the each target segmentation region to obtain a depth level corresponding to the each target segmentation region in the target segmentation image].
However, Gross, Liu and Zhao fail(s) to teach for each target segmentation region whose depth level is to be determined, determining a depth information feature of the each target segmentation region according to the target depth image, wherein the depth information feature comprises at least one of depth mean or depth variance of at least part of pixels in the each target segmentation region; and performing, according to the depth information feature of each target segmentation region, clustering processing on the each target segmentation region to obtain a depth level corresponding to the each target segmentation region in the target segmentation image.
Baruch, working in the same field of endeavor, teaches: for each target segmentation region whose depth level is to be determined, determining a depth information feature of the each target segmentation region according to the target depth image, wherein the depth information feature comprises at least one of depth mean or depth variance of at least part of pixels in the each target segmentation region (See Baruch, [Col. 7, ln. 50–53], Thus, the image depth data, except for the areas of the planar components when provided, are used in algorithms that group pixels into clusters or components by finding some common characteristic for the pixels. [Col. 7, ln. 60–67], Thus, the mean-shift algorithm may involve using a distance function for measuring the distances between depth values at the pixels where all pixels within the radius (measured according to the distance) will be accounted for in the cluster to establish a depth value difference (based on pixel data) so that only those pixels that have data within this depth value difference are used for calculating the mean. Note: the clustering is on segmentation components); and
performing, according to the depth information feature of each target segmentation region, clustering processing on the each target segmentation region to obtain a depth level corresponding to the each target segmentation region in the target segmentation image (See Baruch, [Col. 7, ln. 50–53], Thus, the image depth data, except for the areas of the planar components when provided, are used in algorithms that group pixels into clusters or components by finding some common characteristic for the pixels. [Col. 7, ln. 60–67], Thus, the mean-shift algorithm may involve using a distance function for measuring the distances between depth values at the pixels where all pixels within the radius (measured according to the distance) will be accounted for in the cluster to establish a depth value difference (based on pixel data) so that only those pixels that have data within this depth value difference are used for calculating the mean. Note: the Examiner is interpreting mean-shift as the clustering).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference to include, for each target segmentation region whose depth level is to be determined, determining a depth information feature of the each target segmentation region according to the target depth image, wherein the depth information feature comprises at least one of depth mean or depth variance of at least part of pixels in the each target segmentation region; and performing, according to the depth information feature of each target segmentation region, clustering processing on the each target segmentation region to obtain a depth level corresponding to the each target segmentation region in the target segmentation image, based on the method of Baruch’s reference. The suggestion/motivation would have been to accurately segment an image using depth information (See Baruch, [Col. 1, ln. 7–47]).
Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Baruch with Gross, Liu and Zhao to obtain the invention as specified in claim 9.
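For illustration only, the two steps recited in claim 9 (computing a per-region depth mean from the depth image, then clustering regions into depth levels on that feature) can be sketched as below. This is a minimal Python sketch under stated assumptions: it uses a simple one-dimensional gap-based grouping as a stand-in for the clustering step, and it is not Baruch's mean-shift procedure or the applicant's implementation. The names `region_depth_means`, `depth_levels`, and the `max_gap` parameter are hypothetical.

```python
from statistics import fmean


def region_depth_means(region_map, depth_map):
    """Return {region_id: mean depth} for aligned 2-D region-id and depth maps."""
    samples = {}
    for id_row, depth_row in zip(region_map, depth_map):
        for region_id, depth in zip(id_row, depth_row):
            samples.setdefault(region_id, []).append(depth)
    return {region_id: fmean(values) for region_id, values in samples.items()}


def depth_levels(means, max_gap):
    """Assign depth levels by walking the sorted means and starting a new level
    whenever the gap to the previous mean exceeds max_gap (assumed parameter)."""
    levels = {}
    level = 0
    previous = None
    for region_id, mean in sorted(means.items(), key=lambda item: item[1]):
        if previous is not None and mean - previous > max_gap:
            level += 1
        levels[region_id] = level
        previous = mean
    return levels


# Toy data: regions 1 and 3 are near the camera, region 2 is far away.
region_map = [
    [1, 1, 2],
    [3, 3, 2],
]
depth_map = [
    [1.0, 1.0, 5.0],
    [1.5, 1.5, 5.0],
]
means = region_depth_means(region_map, depth_map)
levels = depth_levels(means, max_gap=1.0)
```

Here regions 1 and 3 fall into level 0 and region 2 into level 1. A depth variance feature, or a different unsupervised clustering algorithm, could be substituted without changing the overall two-step structure.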
Regarding claim 10, Gross in view of Liu further in view of Zhao and further in view of Baruch teaches the image layering method according to claim 9, [wherein performing, according to the depth information feature of each target segmentation region, the clustering processing on the each target segmentation region comprises:
performing the clustering processing on the each target segmentation region according to the depth information feature of the each target segmentation region and a preset unsupervised clustering algorithm, wherein the preset unsupervised clustering algorithm comprises at least one of a Euclidean distance-based clustering algorithm, a hierarchical clustering algorithm, a non- linear dimensionality reduction clustering algorithm, or a density-based clustering algorithm].
However, Gross, Liu and Zhao fail(s) to teach wherein performing, according to the depth information feature of each target segmentation region, the clustering processing on the each target segmentation region comprises: performing the clustering processing on the each target segmentation region according to the depth information feature of the each target segmentation region and a preset unsupervised clustering algorithm, wherein the preset unsupervised clustering algorithm comprises at least one of a Euclidean distance-based clustering algorithm, a hierarchical clustering algorithm, a non- linear dimensionality reduction clustering algorithm, or a density-based clustering algorithm.
Baruch, working in the same field of endeavor, teaches: wherein performing, according to the depth information feature of each target segmentation region, the clustering processing on the each target segmentation region (See Baruch, [Col. 7, ln. 50–53], Thus, the image depth data, except for the areas of the planar components when provided, are used in algorithms that group pixels into clusters or components by finding some common characteristic for the pixels) comprises:
performing the clustering processing on the each target segmentation region according to the depth information feature of the each target segmentation region and a preset unsupervised clustering algorithm, wherein the preset unsupervised clustering algorithm comprises at least one of a Euclidean distance-based clustering algorithm, a hierarchical clustering algorithm, a non-linear dimensionality reduction clustering algorithm, or a density-based clustering algorithm (See Baruch, [Col. 7, ln. 50–60], Thus, the image depth data, except for the areas of the planar components when provided, are used in algorithms that group pixels into clusters or components by finding some common characteristic for the pixels. By one example, a mean-shift algorithm is formed that replaces the depth value of individual pixels with the depth value mean of the pixels in a range-r neighborhood (or cluster) and is included in the cluster, in part and generally, depending on whether the pixel with a certain depth value is within a certain distance (or window) from the location of the mean value within a range of depth values. Note: Examiner is interpreting the certain distance as a Euclidean distance).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gross’s reference such that performing, according to the depth information feature of each target segmentation region, the clustering processing on the each target segmentation region comprises: performing the clustering processing on the each target segmentation region according to the depth information feature of the each target segmentation region and a preset unsupervised clustering algorithm, wherein the preset unsupervised clustering algorithm comprises at least one of a Euclidean distance-based clustering algorithm, a hierarchical clustering algorithm, a non-linear dimensionality reduction clustering algorithm, or a density-based clustering algorithm, based on the method of Baruch’s reference. The suggestion/motivation would have been to accurately segment an image using depth information (See Baruch, [Col. 1, ln. 7–47]).
Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Baruch with Gross, Liu and Zhao to obtain the invention as specified in claim 10.
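To illustrate the kind of distance-windowed mean-shift grouping described in the Baruch passage quoted above, here is a minimal sketch. It is not Baruch's implementation and not the applicant's. It assumes each region is summarized by a single scalar depth feature (for example, its depth mean) and uses a flat window, with the absolute difference serving as the one-dimensional Euclidean distance. The function name `mean_shift_1d` and the `bandwidth` parameter are hypothetical.

```python
def mean_shift_1d(values, bandwidth, max_iter=100, tol=1e-6):
    """Cluster scalar depth features with a flat-window mean shift.

    Returns (labels, centers): one cluster label per input value and the
    converged center of each cluster.
    """
    modes = []
    for start in values:
        x = float(start)
        for _ in range(max_iter):
            # Keep only the values within `bandwidth` of the current estimate
            # (1D Euclidean distance), then shift to their mean.
            window = [v for v in values if abs(v - x) <= bandwidth]
            shifted = sum(window) / len(window)
            if abs(shifted - x) < tol:
                break
            x = shifted
        modes.append(x)

    # Values whose converged modes lie within `bandwidth` share a depth level.
    centers, labels = [], []
    for mode in modes:
        for index, center in enumerate(centers):
            if abs(mode - center) <= bandwidth:
                labels.append(index)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels, centers


# Toy per-region depth means (illustrative only).
region_depth_means = [1.0, 1.2, 0.9, 5.0, 5.3, 9.8]
labels, centers = mean_shift_1d(region_depth_means, bandwidth=1.0)
# labels == [0, 0, 0, 1, 1, 2]
```

In this sketch the cluster index plays the role of a depth level. The number of levels is not fixed in advance but follows from the chosen window size, which is one reason mean shift is commonly classed as an unsupervised method.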
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Dai et al. (US 9865042 B2) teaches that, in implementations of the subject matter described therein, feature maps are obtained by convolving an input image using a plurality of layers of convolution filters. The feature maps record semantic information for respective regions on the image and only need to be computed once. Segment features of the image are extracted from the convolutional feature maps. Particularly, binary masks may be obtained from a set of candidate segments of the image. The binary masks are used to mask the feature maps instead of the raw image. The masked feature maps define the segment features. The semantic segmentation of the image is done by determining a semantic category for each pixel in the image at least in part based on the resulting segment features.
Isola et al. (US 9437027 B2) teaches layered image understanding by which a layered scene representation is generated for an image. Providing such a scene representation explains a scene being presented in the image by defining that scene's semantic structure. To generate the layered scene representation, the disclosure recognizes objects within the image by combining objects sampled from annotated image data and determining whether that combination is both semantically well-formed and matches the visual appearance of the image. The objects are transformed and then can be used to modify the query image. The disclosure models the objects into semantic segments that form a portion of the scene representation.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
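The reply periods described above can be worked through as simple calendar arithmetic. The sketch below is an assumption-laden illustration only: it takes the mailing date from the Prosecution Timeline entry on this page (May 15, 2026, Final Rejection mailed), it does not account for deadlines that fall on a weekend or federal holiday, and it does not model the advisory-action scenario or extension fees. The helper `add_months` is a hypothetical name.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month when needed.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Assumed mailing date, taken from the timeline entry on this page.
mailing_date = date(2026, 5, 15)

two_month_mark = add_months(mailing_date, 2)        # first-reply window
shortened_period_end = add_months(mailing_date, 3)  # three-month shortened period
statutory_maximum = add_months(mailing_date, 6)     # six-month outer limit

for name, value in [("Two-month mark", two_month_mark),
                    ("Shortened period ends", shortened_period_end),
                    ("Six-month maximum", statutory_maximum)]:
    print(f"{name}: {value.isoformat()}")
```

Any date produced this way should be checked against the mailing date printed on the official action itself before it is relied on.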
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DION J SATCHER whose telephone number is (703)756-5849. The examiner can normally be reached Monday - Thursday 5:30 am - 2:30 pm, Friday 5:30 am - 9:30 am PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Henok Shiferaw, can be reached at (571) 272-4637. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DION J SATCHER/ Patent Examiner, Art Unit 2676

/Henok Shiferaw/ Supervisory Patent Examiner, Art Unit 2676
Prosecution Timeline

Sep 22, 2023: Application Filed
Dec 10, 2025: Non-Final Rejection mailed (§101, §102, §103)
Mar 10, 2026: Response Filed
May 15, 2026: Final Rejection mailed (§101, §102, §103; current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/254,733 (Patent 12639835): METHOD AND APPARATUS OF FUSION OF MULTIMODAL IMAGES TO FLUOROSCOPIC IMAGES. 3y 0m to grant; granted May 26, 2026.
17/991,368 (Patent 12620070): DETERMINING AND USING POINT SPREAD FUNCTION FOR IMAGE DEBLURRING. 3y 5m to grant; granted May 05, 2026.
18/462,579 (Patent 12620053): IMAGE PROCESSING METHOD AND APPARATUS, MEDIUM, DEVICE AND DRIVING SYSTEM. 2y 8m to grant; granted May 05, 2026.
18/148,405 (Patent 12611552): METHODS AND SYSTEMS FOR RADIATION THERAPY GUIDANCE. 3y 4m to grant; granted Apr 28, 2026.
18/340,043 (Patent 12614262): IMAGE PROCESSING APPARATUS, ENDOSCOPIC APPARATUS, IMAGE PROCESSING METHOD, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM. 2y 10m to grant; granted Apr 28, 2026.
Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 86%
With Interview (+14.1%): 99%
Median Time to Grant: 2y 10m (~2m remaining)
PTA Risk: Moderate
Based on 42 resolved cases by this examiner. Grant probability derived from career allowance rate.