Last updated: May 29, 2026
Application No. 18/293,363
PANORAMIC DEPTH IMAGE SYNTHESIS METHOD, STORAGE MEDIUM, AND SMARTPHONE

Final Rejection §103
Filed
Jan 30, 2024
Priority
Aug 19, 2021 — CN 202110953450.2 +1 more
Examiner
LE, JOHNNY TRAN
Art Unit
2614
Tech Center
2600 — Communications
Assignee
Huizhou TCL Cloud Internet Corporation Technology Co., Ltd.
OA Round
2 (Final)
This examiner grants 60% of cases after interview (+50.0% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 5 resolved cases, 2023–2026
Examiner Intelligence

LE, JOHNNY TRAN
Grants 60% of resolved cases
Career Allowance Rate
3 granted / 5 resolved
-2.0% vs TC avg
Strong +50% interview lift
Interview lift: +50.0% (resolved cases with interview vs. without)
Typical timeline
2y 7m
Avg Prosecution
22 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§103
98.2%
+58.2% vs TC avg
§112
1.8%
-38.2% vs TC avg
Based on career data from 5 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/14/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement has been considered by the examiner.

Response to Amendment
1	This action is in response to the amendment filed on 1/9/2026. Claims 1, 6-8, 10-12, and 16 have been amended, and claims 5, 9, and 17 have been cancelled. The drawings have also been amended to overcome an objection. Claims 1-4, 6-8, 10-16, and 18-20 remain rejected.

Response to Arguments
2	Applicant’s arguments filed on 1/9/2026 with respect to the rejection of independent claim 1 under 35 USC § 102 assert that the prior art does not teach limitations including, but not limited to, “capturing two images through dual-focus operation (face focus and far-distance focus); obtaining a portrait area image using an image segmentation neural network trained with annotated face data; inferring portrait position through section partitioning and ROI feature-based fixed-height feature vectors; and performing pixel-level alignment and fusion between the portrait area image and the far-focus image.” New elements, as well as elements of claims 5, 9, and 17, were added to claim 1, and those claims were cancelled accordingly. The arguments have been considered but are moot in view of the new grounds of rejection under 35 USC § 103, which are similar to the previous grounds with some additional clarification.

3	Regarding the arguments for claims 2-4, 6-8, 10-16, and 18-20, these claims depend directly or indirectly on independent claim 1. Applicant presents no arguments beyond those made for independent claim 1. The limitations of these claims, in combination, were mostly established previously as explained, with some modification to account for the new grounds of rejection made for claim 1.

4	Regarding the arguments for claim 16 with respect to the 35 USC § 112 rejection, the claim has been amended, thus overcoming the rejection.

5	Claims 5, 9, and 17 have been cancelled by the applicant as mentioned previously; therefore, these claims will not be reviewed further.

Claim Rejections - 35 USC § 103
6	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
  
7	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

8	Claim(s) 1, 6, 8, 10-16, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wadhwa, N., Garg, R., Jacobs, D. E., Feldman, B. E., Kanazawa, N., Carroll, R., ... & Levoy, M. (2018). Synthetic depth-of-field with a single-camera mobile phone. ACM Transactions on Graphics (ToG), 37(4), 1-13. (hereinafter Wadhwa) in view of Zhe et al. (CN 108986127 A).

9	Regarding claim 1, Wadhwa teaches a panoramic depth image synthesis method ([Page 1; Abstract] reciting “We present a system to computationally synthesize shallow depth-of-field images with a single mobile camera and a single button press.”), wherein the method comprises the step of: 
obtaining a first image and a second image ([Page 1; Abstract] reciting “We present a system to computationally synthesize shallow depth-of-field images with a single mobile camera and a single button press.”) correspondingly by locking focus in a face and a farthest point of a lens respectively when a photo is taken ([Section 1; Page 2] reciting “However, for the constrained category of selfie images, we found it sufficient to only segment out people using the trained segmentation model and to apply a uniform blur to the background...Second, if available, we use a sensor with dual-pixel (DP) auto-focus hardware, which effectively gives us a 2-sample light field with a narrow ∼1 millimeter baseline”; [Section 4; Page 5] reciting “Dual-pixel (DP) auto-focus systems work by splitting pixels in half, such that the left half integrates light over the right half of the aperture and vice versa (Fig. 5). Because image content is optically blurred based on distance from the focal plane, there is a shift, or disparity, between the two views that depends on depth and on the shape of the blur kernel.”; [Page 8; Fig.10 Description] reciting “All disparities are negative for this focus distance of 2.3m, which is past the farthest target.”); 
obtaining a portrait area image by performing portrait segmentation processing on the first image ([Page 5; Section 3.5] reciting “We compare the accuracy of our model against the PortraitFCN+ model … by computing the mean Intersection-over-Union (IoU), i.e., area(output ∩ ground truth) / area(output ∪ ground truth), over their evaluation dataset…We also compare against a state-of-the-art semantic segmentation model Mask-RCNN”); 
and obtaining a merged image by aligning and merging the portrait area image and the second image ([Page 10; Section 5.3] reciting “After the individual layers have been blurred, they are upsampled to full-resolution and composited together, back to front. One special in-focus layer is inserted over the sub-image containing dfocus, taken directly from the full resolution input, to avoid any quality loss from the downsample/upsample round trip imposed by the blur pipeline”); 
wherein the step of obtaining the portrait area image by performing the portrait segmentation processing on the first image comprises the steps of ([Page 5; Section 3.5] reciting “We compare the accuracy of our model against the PortraitFCN+ model … by computing the mean Intersection-over-Union (IoU), i.e., area(output ∩ ground truth) / area(output ∪ ground truth), over their evaluation dataset…We also compare against a state-of-the-art semantic segmentation model Mask-RCNN”):
using annotated image data containing a face as a training sample to train an image segmentation neural network, and obtaining a trained image segmentation neural network ([Page 2; Section 1] reciting “We run a face detector on an input color image and identify the faces of the subjects being photographed. A neural network uses the color image and the identified faces to infer a low-resolution mask that segments the people that the faces belong to. The mask is then upsampled to full resolution using edge-aware filtering. This mask can be used to uniformly blur the background while keeping the subject sharp.”; [Page 3-4; Section 3.1] reciting “To train our neural network, we downloaded 122k images from Flickr (www.flickr.com) that contain between 1 to 5 faces and annotated a polygon mask outlining the people in the image. The mask is refined using the filtering approach…With each improvement we made over a 9-month period in our training data, we observed the quality of our defocused portraits to improve commensurately.”); 
and obtaining the portrait area image by inputting the first image into the trained image segmentation neural network to perform image segmentation ([Page 4; Section 3.3] reciting “At inference time, we are provided with an RGB image and face rectangles output by a face detector. Our model is trained to predict the segmentation mask corresponding to the face location in the input (Fig. 3).”); 
wherein the step of using the annotated image data containing the face as the training sample to train the image segmentation neural network and obtaining the trained image segmentation neural network comprises the steps of (see above): 
using the image segmentation network for obtaining a target area where at least one portrait in a training image is located, and obtaining position information of the portrait which needs to be segmented in the target area, wherein the training image is marked with position annotation information of the portrait which needs to be segmented; and training the image segmentation neural network to obtain the trained image segmentation neural network based on the position information of the portrait which needs to be segmented and the position annotation information of the portrait which needs to be segmented ([Page 4; Sections 3.2 & 3.4] reciting “We use a two stage training process. In the first stage, we train with cross entropy losses for both segmentation and pose, which are weighted by a 1 : 5 ratio. After the first stage of training has converged, we remove the pose loss and prune training images for which the model predictions had large L1 error for pixels in the interior of the mask, i.e., we only trained using examples with errors near the object boundaries. Large errors distant from the object boundary can be attributed to either annotation error or model error. It is obviously beneficial to remove training examples with annotation error. In the case of model error, we sacrifice performance on a small percentage of images to focus on improving near the boundaries for a large percentage of images…We also use this filtering to refine the ground truth masks used for training—this enables human annotators to only provide approximate mask edges, thus improving the quality of annotation given a fixed human annotation time.”; [See also Fig. 4]);

wherein the step of inputting the first image into the trained image segmentation neural network to perform image segmentation and obtaining the portrait area image comprises (see above): 

10	Although Wadhwa arguably teaches obtaining the first image; and using the trained image segmentation neural network to obtain the target area where the at least one portrait in the first image is located, to obtain the position information of the portrait which needs to be segmented in the target area, and to obtain the portrait area image based on the location information of the portrait ([Page 3-4; Section 3.1] reciting “To train our neural network, we downloaded 122k images from Flickr (www.flickr.com) that contain between 1 to 5 faces and annotated a polygon mask outlining the people in the image. The mask is refined using the filtering approach…With each improvement we made over a 9-month period in our training data, we observed the quality of our defocused portraits to improve commensurately.”; [Page 7; Section 4.2] reciting “To calibrate for variations in blur size, we place a mobile phone camera on a tripod in front of a textured fronto-parallel planar test target (Fig. 10(a-b)) that is at a known constant depth. We capture images spanning a range of focus distances and target distances”), Zhe teaches this limitation more explicitly. Wadhwa also does not explicitly teach dividing the first image into a plurality of sections, respectively performing feature extraction on a region of interest (ROI) of any one of the plurality of sections, generating a fixed-height feature vector based on results of the feature extraction, and obtaining the position information of the portrait to be segmented in the corresponding section based on the fixed-height feature vector.

11	Zhe teaches obtaining the first image; and using the trained image segmentation neural network to obtain the target area where the at least one portrait in the first image is located, to obtain the position information of the portrait which needs to be segmented in the target area ([Abstract] reciting “…acquiring a target area where at least one strip-shaped object in a sample image is located by using an image segmentation neural network…”), and to obtain the portrait area image based on the location information of the portrait ([Page 1; Lines 47-49] reciting “Obtaining, by using an image segmentation neural network, a target area where at least one strip object in the sample image is located, and acquiring location information of the strip object to be segmented in the target area; wherein the sample image is labeled with the Position labeling information of long strip objects that need to be segmented”); and
dividing the first image into a plurality of sections ([Page 2; Lines 1-2] reciting “Dividing the sample image into a plurality of segments according to a target direction, the target directions including at least a vertical direction or a horizontal direction”), respectively performing feature extraction on a region of interest (ROI) of any one of the plurality of sections, generating a fixed-height feature vector based on results of the feature extraction, and obtaining the position information of the portrait to be segmented in the corresponding section based on the fixed-height feature vector ([Page 2; Lines 14-16, 18-20] reciting “Feature extraction is performed on the ROI in any one of the plurality of segments, and a fixed height feature vector is generated based on the feature extraction result; and the feature vector is obtained based on the fixed height feature vector Position information of the elongated object in the segment that needs to be divided. In the embodiment of the present invention, the image segmentation neural network is used to acquire the target region where the at least one strip object in the sample image is located, and the location information of the strip object to be segmented in the target region is obtained…”).
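
The "fixed-height feature vector" limitation discussed above can be illustrated with a short sketch. The quoted passages of Zhe do not say how the vector is produced, so the row-binned average pooling below is only an assumption chosen for illustration, and `fixed_height_vector` is a hypothetical name. The point it shows is that ROIs of different sizes map to vectors of the same length.

```python
import numpy as np


def fixed_height_vector(roi: np.ndarray, height: int) -> np.ndarray:
    """Pool a 2-D ROI feature patch into a vector with exactly `height` entries.

    Assumption for illustration: each row is averaged across its width, then
    the rows are grouped into `height` nearly equal bins and each bin is averaged.
    """
    if roi.ndim != 2 or roi.shape[0] == 0 or height < 1:
        raise ValueError("roi must be a non-empty 2-D array and height >= 1")
    row_means = roi.mean(axis=1)  # collapse the width of the ROI
    edges = np.linspace(0, roi.shape[0], height + 1)  # bin edges over the rows
    out = np.empty(height)
    for i in range(height):
        lo = int(np.floor(edges[i]))
        hi = max(int(np.ceil(edges[i + 1])), lo + 1)  # every bin keeps >= 1 row
        out[i] = row_means[lo:hi].mean()
    return out


# ROIs with different numbers of rows yield vectors of the same fixed height.
small_roi = np.arange(8, dtype=float).reshape(4, 2)
large_roi = np.ones((37, 5))
vec_small = fixed_height_vector(small_roi, 2)
vec_large = fixed_height_vector(large_roi, 2)
```

A downstream position regressor can then accept one input size regardless of how large each section's ROI happens to be.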

12	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the method taught by Wadhwa to incorporate the teachings of Zhe to provide a method that can obtain a type of image with segmentation rules, perform feature extraction on a region of interest, and generate a fixed-height feature vector using the image segmentation neural network and the portrait image taught by Wadhwa. Doing so would allow the adjustment of a loss function value, as stated by Zhe ([Page 2; Lines 51-52]).
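
For reference, the mean Intersection-over-Union metric quoted from Wadhwa Section 3.5 in the claim 1 rejection, area(output ∩ ground truth) / area(output ∪ ground truth), can be computed as in the following minimal sketch. The function names and the convention of scoring two empty masks as 1.0 are assumptions made here, not details taken from the reference.

```python
import numpy as np


def iou(output, ground_truth) -> float:
    """Intersection-over-Union of two binary masks of the same shape."""
    out = np.asarray(output, dtype=bool)
    gt = np.asarray(ground_truth, dtype=bool)
    union = np.logical_or(out, gt).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(np.logical_and(out, gt).sum() / union)


def mean_iou(pairs) -> float:
    """Average IoU over an iterable of (output, ground_truth) mask pairs."""
    return float(np.mean([iou(o, g) for o, g in pairs]))


predicted = [[1, 1], [0, 0]]
annotated = [[1, 0], [0, 0]]
score = iou(predicted, annotated)  # 1 overlapping pixel / 2 pixels in the union
```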

13	Regarding claim 6, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 1 (see claim 1 rejection above), wherein the image segmentation neural network is an end-to-end trainable neural network (Zhe; [Page 6; Lines 50-51] reciting “In the technical solution of the embodiment of the present invention, an end-to-end trainable neural network is proposed to implement image segmentation processing, and the end-to-end trainable neural network is referred to as an image segmentation neural network.”).

14	Regarding claim 8, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 1, wherein the obtaining the portrait area image by inputting the first image into the trained image segmentation neural network to perform the image segmentation comprises (see claim 1 rejection above), inputting the first image into a local computer device to indicate the computer device to use the image segmentation neural network to segment the first image to obtain the portrait area image (Zhe; [Page 7; Lines 9-11] reciting “The user inputs the collected image containing the long strip object into the local computer device, and the computer device uses the image segmentation neural network of the embodiment of the present invention to segment the image.”); and 
receiving the portrait area image returned by the local computer device (Zhe; [Page 7; Lines 28-30] reciting “In the embodiment of the present invention, the more the number of sample images, the better the training result of the image segmentation neural network, and on the other hand, the more the number of sample images, the more computer resources need to be consumed.”).

15	Regarding claim 10, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 1 (see claim 1 rejection above), wherein a format of the position annotation information is a heat map (Wadhwa; [Page 4; Section 3.2] reciting “Each of the three stages outputs a segmentation mask — a 256 × 256 × 1 output of a layer with sigmoid activation, and a 64 × 64 × 17 output containing heatmaps corresponding to the locations of the 17 keypoint…Large errors distant from the object boundary can be attributed to either annotation error or model error. It is obviously beneficial to remove training examples with annotation error.”).  

16	Regarding claim 11, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 1 (see claim 1 rejection above), wherein a format of the position annotation information is a coordinate point (Wadhwa; [Page 7; Section 4.2] reciting “To correct peripheral disparities, we use Sz (x) and Iz (x) to solve for inverse depth. Then we apply Sz (0) and Iz (0), where 0 is the image center coordinates…Since focus distance z varies continuously, we calibrate 20 different focus distance and linearly interpolate Sz and Iz between them.”).
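
Claims 10 and 11 recite two formats for the position annotation information, a heat map and a coordinate point. The sketch below shows one common way to convert between the two: rendering a coordinate as a Gaussian peak and recovering it as the location of the maximum. The 64 × 64 grid matches the heatmap size quoted from Wadhwa Section 3.2, while the `sigma` value and the function names are assumptions for illustration.

```python
import numpy as np


def point_to_heatmap(x: int, y: int, width: int, height: int, sigma: float = 2.0) -> np.ndarray:
    """Render the coordinate (x, y) as a Gaussian heat map that peaks at 1.0."""
    xs = np.arange(width)[None, :]   # column indices, shape (1, width)
    ys = np.arange(height)[:, None]  # row indices, shape (height, 1)
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))


def heatmap_to_point(heatmap: np.ndarray) -> tuple:
    """Recover the (x, y) coordinate as the location of the heat map maximum."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)


heatmap = point_to_heatmap(10, 20, width=64, height=64)
recovered = heatmap_to_point(heatmap)
```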

17	Regarding claim 12, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 1 (see claim 1 rejection above), wherein the image segmentation neural network comprises at least a first sub-network, a second sub-network, and a third sub-network (Zhe; [Page 7; Lines 40-42] reciting “In the embodiment of the present invention, the image segmentation neural network includes at least a first sub-network and a second sub-network, and the first sub-network in the image segmentation neural network is used to acquire a target region where at least one long-shaped object in the sample image is located.”; [Page 7; Lines 50-51] reciting “A feature map of the sample image is acquired by using a third sub-network in the image segmentation neural network.”); 
the using the image segmentation network for obtaining the target area where the at least one portrait in the training image is located comprises: using the first sub-network for obtaining a feature map of the training image (Zhe; [Page 7; Lines 41-42] reciting “…the first sub-network in the image segmentation neural network is used to acquire a target region where at least one long-shaped object in the sample image is located…”; [Page 7; Lines 50-51] reciting “A feature map of the sample image is acquired by using a third sub-network in the image segmentation neural network.”); and 
using the second sub-network for processing the feature map of the training image to obtain the target area where the at least one portrait in the training image is located (Zhe; [Page 7; Lines 42-44] reciting “The second sub-network in the image segmentation neural network acquires location information of the elongated object in the target region that needs to be segmented.”); and 
the obtaining the position information of the portrait which needs to be segmented in the target area, comprises: using the third sub-network for obtaining the position information of the portrait which needs to be segmented in the target area (Zhe; [Page 6; Lines 14-17, 18-22] reciting “In the technical solution of the embodiment of the present invention, an image segmentation neural network is used to acquire a target region where at least one strip object in the sample image is located, and position information of the strip object to be segmented in the target region is acquired; And the sample image is marked with the position labeling information of the long strip object to be divided… The image segmentation neural network is trained. Based on the trained image segmentation neural network, the first sub-network in the image segmentation neural network is used to acquire a target region where at least one strip object in the target image is located, and the second sub-network in the image segmentation neural network is used to obtain Position information of the elongated object in the target area that needs to be divided.”).

18	Regarding claim 13, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 12, wherein the using the second sub-network for processing the feature map of the training image to obtain the target area where the at least one portrait in the training image is located comprises (see claims 1 and 12 rejections above):
using the second sub-network for dividing the sample image into multiple sections according to a target direction, wherein the target direction at least comprises a vertical direction or a horizontal direction (Zhe; [Page 6; Lines 21-22] reciting “…and the second sub-network in the image segmentation neural network is used to obtain Position information of the elongated object in the target area that needs to be divided.”; [Page 5; Lines 20-22] reciting “In the embodiment of the present invention, the second acquiring unit is configured to divide the target image into a plurality of segments according to a target direction, where the target direction includes at least a vertical direction or a horizontal direction”);
for any one of the multiple sections, determining an ROI corresponding to the at least one portrait in any one of the multiple sections, and determining via a first boundary and a second boundary by the ROI, wherein directions of the first boundary and the second boundary are perpendicular to the target direction (Zhe; [Page 5; Lines 22-25] reciting “Determining, in any of the segments, an ROI corresponding to the at least one elongated object, the ROI being determined by the first boundary and the second boundary, the first boundary and the second boundary The direction is perpendicular to the target direction; based on the ROI in the plurality of segments, a target region in which at least one elongated object is located is determined”);
and determining the target area where the at least one portrait is located based on the ROI in the multiple sections (Zhe; [Page 5; Lines 24-25] reciting “…based on the ROI in the plurality of segments, a target region in which at least one elongated object is located is determined.”).

19	Regarding claim 14, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 13 (see claims 1 and 12-13 rejections above), wherein the multiple sections are equal-width sections arranged in the vertical direction (Zhe; [Page 8; Lines 21-22] reciting “Here, the plurality of segments may be equal-width segments arranged in the vertical direction, or equal-width segments arranged in the horizontal direction;”).

20	Regarding claim 15, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 13 (see claims 1 and 12-13 rejections above), wherein the multiple sections are equal-width sections arranged in the horizontal direction (Zhe; [Page 8; Lines 21-22] reciting “Here, the plurality of segments may be equal-width segments arranged in the vertical direction, or equal-width segments arranged in the horizontal direction;”).
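
The sectioning steps addressed for claims 13-15 can be sketched as follows. This is one possible reading rather than the method of either reference: the image is represented by a binary mask, sections are equal-size slices stacked along the target axis (`axis=0` for vertical, `axis=1` for horizontal), each ROI is bounded by the first and last occupied index along that axis (so the boundary lines run perpendicular to it), and the target area is the span covering all ROIs. All function names are hypothetical.

```python
import numpy as np


def split_sections(mask: np.ndarray, n_sections: int, axis: int) -> list:
    """Split the mask into equal-size sections stacked along `axis`."""
    if mask.shape[axis] % n_sections != 0:
        raise ValueError("mask size must be divisible by n_sections")
    return np.split(mask, n_sections, axis=axis)


def roi_bounds(section: np.ndarray, axis: int, offset: int):
    """First and second ROI boundary along `axis`, or None if the section is empty."""
    occupied = np.flatnonzero(section.any(axis=1 - axis))
    if occupied.size == 0:
        return None
    return offset + int(occupied[0]), offset + int(occupied[-1])


def target_area(mask: np.ndarray, n_sections: int, axis: int = 0):
    """Span along `axis` covering the ROIs found in all sections."""
    size = mask.shape[axis] // n_sections
    bounds = []
    for i, section in enumerate(split_sections(mask, n_sections, axis)):
        b = roi_bounds(section, axis, i * size)
        if b is not None:
            bounds.append(b)
    if not bounds:
        return None
    return min(b[0] for b in bounds), max(b[1] for b in bounds)


mask = np.zeros((8, 4), dtype=bool)
mask[2:7, 1] = True  # a subject occupying rows 2 through 6
sections = split_sections(mask, 4, axis=0)
area = target_area(mask, 4, axis=0)
```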

21	Regarding claim 16, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 1, wherein the training the image segmentation neural network to obtain the trained image segmentation neural network based on the position information of the portrait which needs to be segmented and the position annotation information of the portrait which needs to be segmented comprises (see claim 1 rejection above), 
obtaining a first loss function value based on the position information of the portrait which needs to be segmented and the position annotation information of the portrait which needs to be segmented (Zhe; [Page 8; Lines 47-51] reciting “Specifically, the first loss function value is obtained based on the position information of the long strip object to be divided and the position label information of the long strip object to be divided; and determining whether the first loss function value satisfies the first a preset condition; in response to the first loss function value not satisfying the first preset condition, adjusting a parameter value of the image segmentation neural network based on the first loss function value…”; [Page 11, Lines 48-50] reciting “In addition, the L2 loss function only guarantees the similarity between the prediction and the annotation map, which is a Gaussian peak, but there is almost no supervision of the position of the highest value.”); 
determining whether the first loss function value satisfies a first preset condition (Zhe; [Page 2; Line 49] reciting “Determining whether the first loss function value satisfies a first preset condition”); and 
in response to the first loss function value not meeting the first preset condition and adjusting the parameter value of the image segmentation neural network based on the first loss function value, the following operation is performed iteratively until the first loss function value meets the first preset condition (Zhe; [Page 2; Lines 51-53] reciting “Responding to the first loss function value does not satisfy the first preset condition, adjusting the parameter value of the image segmentation neural network based on the first loss function value, and then performing the following operations iteratively until the first loss”): 
a second sub-network is used in the image segmentation neural network to obtain the target area where the at least one portrait in the training image is located, and a third sub-network is used in the image segmentation neural network to obtain the position information of the portrait which needs to be segmented in the target area (Zhe; [Page 2; Lines 53-56] reciting “The function value satisfies the first preset condition: acquiring, by using the first sub-network in the image segmentation neural network, a target region where at least one strip object in the sample image is located, and acquiring the second sub-network in the image segmentation neural network Position information of the elongated object in the target area that needs to be divided.”).  
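
The iterative training limitation of claim 16 (compute a loss, test it against a preset condition, adjust parameters, repeat) can be sketched on a toy problem. The single-weight linear model, the mean squared error loss, the learning rate, and the threshold-style preset condition below are all assumptions chosen for illustration; neither reference is quoted as specifying them.

```python
def train_until(xs, ys, loss_threshold, lr=0.1, max_iters=1000):
    """Fit y ~ w * x by gradient descent until the loss meets the preset condition.

    Returns (w, loss, iterations). The preset condition is assumed here to be
    `loss <= loss_threshold`.
    """
    w = 0.0
    for iteration in range(max_iters):
        preds = [w * x for x in xs]
        # First loss function value: mean squared error against the annotations.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        if loss <= loss_threshold:  # preset condition satisfied: stop
            return w, loss, iteration
        # Otherwise adjust the parameter value based on the loss gradient.
        grad = sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        w -= lr * grad
    raise RuntimeError("preset condition not met within max_iters")


weight, final_loss, n_iters = train_until([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 1e-4)
```

The `max_iters` guard is a design choice added for safety, so that a condition that is never satisfied raises an error instead of looping forever.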

22	Regarding claim 20, Wadhwa teaches a smartphone, wherein the smartphone comprises a processor ([Page 1; Abstract] reciting “Our system can process a 5.4 megapixel image in 4 seconds on a mobile phone, is fully automatic, and is robust enough to be used by non-experts.”; [Page 12; Section 6] reciting “Our code was implemented in Halide [Ragan-Kelley et al. 2013], then manually scheduled for the CPU.”);
and a storage medium adapted to store a plurality of instructions; and wherein the instructions are adapted to be loaded by the processor and executed in the steps (Zhe; [Page 6; Lines 5-7] reciting “The storage medium provided by the embodiment of the present invention stores executable instructions, and when the executable instructions are executed by the processor, the training method or the image segmentation method of the image segmentation neural network described above is implemented.”) of the panoramic depth image synthesis method of claim 1 (see claim 1 rejection above).

23	Claim(s) 2-3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wadhwa, N., Garg, R., Jacobs, D. E., Feldman, B. E., Kanazawa, N., Carroll, R., ... & Levoy, M. (2018). Synthetic depth-of-field with a single-camera mobile phone. ACM Transactions on Graphics (ToG), 37(4), 1-13. (hereinafter Wadhwa) in view of Zhe et al. (CN 108986127 A) as applied to claim 1 above, further in view of W. Song et al. (US 20170118404 A1).

24	Regarding claim 2, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 1 (see claim 1 rejection above), wherein the first image is a face focus image (Wadhwa; [Page 4; Section 3.2] reciting “The fourth channel encodes the location of the face as a posterior distribution of an isotropic Gaussian centered on the face detection box with a standard deviation of 21 pixels and scaled to be 1 at the mean location.”) and the step of obtaining the face focus image comprises:
activating a camera and detecting whether a camera preview lens contains face data (Wadhwa; [Page 4; Section 3.3] reciting “At inference time, we are provided with an RGB image and face rectangles output by a face detector. Our model is trained to predict the segmentation mask corresponding to the face location in the input (Fig. 3). As a heuristic to avoid including bystanders in the segmentation mask, we seed the network only with faces that are at least one third the area of the largest face and larger than 1.3% the area of the image.”);


25	Wadhwa in view of Zhe does not explicitly teach obtaining the face focus image by activating a face focus mode to take a photo when it is detected that the camera preview lens contains the face data.

W. Song teaches obtaining the face focus image by activating a face focus mode to take a photo when it is detected that the camera preview lens contains the face data ([Abstract] reciting “An embodiment of the disclosure provides a digital image processing device including a camera and a processor configured to detect a face area in a preview image obtained by the camera, obtain a size of the face area, determine a focus area based on the size of the face area, and set focus based on the focus area.”; [0076] reciting “According to an embodiment of the present disclosure, the camera module 291 may include one or more image sensors (e.g., a front sensor or a back sensor), a lens, an Image Signal Processor (ISP) or a flash (e.g., LED or xenon lamp).”).

26	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the method taught by Wadhwa in view of Zhe to incorporate the teachings of W. Song to provide a method that can detect whether the camera preview contains face data and obtain that face data, which can be the data taught by Wadhwa in view of Zhe. Doing so would enhance the accuracy of the focus set through the auto-focusing function, as stated by W. Song ([0152]).
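
The face-location channel quoted from Wadhwa Section 3.2 in the claim 2 rejection (an isotropic Gaussian centered on the face detection box, with a standard deviation of 21 pixels, scaled to 1 at the mean) can be sketched as below. The 256 × 256 channel size, the `(x0, y0, x1, y1)` box convention, and the function name are assumptions for illustration.

```python
import numpy as np


def face_channel(height: int, width: int, box, sigma: float = 21.0) -> np.ndarray:
    """Gaussian map centered on a face box, equal to 1.0 at the box center.

    `box` is assumed to be (x0, y0, x1, y1) in pixel coordinates.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    ys, xs = np.mgrid[0:height, 0:width]  # per-pixel row and column indices
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


channel = face_channel(256, 256, box=(100, 80, 140, 120))  # center at (120, 100)
```

One standard deviation (21 pixels) away from the center, the value falls to exp(-0.5), roughly 0.61.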

27	Regarding claim 3, Wadhwa in view of Zhe and W. Song teaches the panoramic depth image synthesis method of claim 2 (see claims 1-2 rejections above) wherein the second image is a distance focus image (Wadhwa; [Page 8; Fig. 10 Description] reciting “Results from our calibration procedure. We use a mobile camera to capture images of a fronto-parallel textured target sweeping through all focus and target distances (a). One such image (b). Disparity vs. target distance and best-fit lines for several spatial locations for one focus setting (c).”), and the step of obtaining the distance focus image comprises: 
obtaining the distance focus image by activating a distance focus mode to take a photo, after the face focus mode is activated to take the photo to obtain the face focus image (Wadhwa; [0034] reciting “When a scene includes multiple faces as illustrated at FIG. 5, the multiple faces in the frame may be detected. The distances that correspond to the sizes of each of these faces are then calculated. The distances are sorted and stored for the multiple faces. A divide-et-impera style search may be performed across the focus distances…The outcome of these searches will be a focus distance that would theoretically maximize sharpness across all faces in the photo. ”).

28	Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Wadhwa, N., Garg, R., Jacobs, D. E., Feldman, B. E., Kanazawa, N., Carroll, R., ... & Levoy, M. (2018). Synthetic depth-of-field with a single-camera mobile phone. ACM Transactions on Graphics (ToG), 37(4), 1-13. (hereinafter Wadhwa) in view of Zhe et al. (CN 108986127 A) and W. Song et al. (US 20170118404 A1) as applied to claims 1-2 above, and further in view of Nakamura et al. (JP 2013013119 A).

29	Regarding claim 4, Wadhwa in view of Zhe and W. Song teaches the panoramic depth image synthesis method of claim 2 (see claims 1-2 rejections above), but does not explicitly teach the further step of obtaining a close-up image by activating a close-up focus mode when it is detected that the camera preview lens does not contain the face data.

30	Nakamura teaches obtaining a close-up image by activating a close-up focus mode when it is detected that the camera preview lens does not contain the face data ([0094] reciting “On the other hand, when it is determined that the face is not detected, it is determined which photographing mode is set as the "scene position photographing mode" at the time of photographing the target image (step S22). Here, in a case where it is determined that the landscape imaging mode is set as the imaging mode, the target image is grouped into a landscape group in step S24, in a case where it is determined that the close-up imaging mode is set, the target image is grouped into a close-up group in step S26, and in other cases, the target image is grouped into an image group that is not to be grouped in step S28.”).

31	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the method (taught by Wadhwa in view of Zhe and W. Song) to incorporate the teachings of Nakamura, so that a close-up image is obtained when no face is detected by the camera methods taught by Wadhwa in view of Zhe and W. Song. Doing so would allow other shooting modes, such as "night scene shooting mode", "sports shooting mode", "underwater shooting mode", "sunset shooting mode", and "snow shooting mode", as taught by Nakamura ([0095]).
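
The capture sequence recited across claims 2-4 can be illustrated with a minimal Python sketch. The function name `plan_capture` and the mode labels are hypothetical, chosen only for illustration; they do not appear in the application or in any cited reference, and the sketch makes no claim about how an actual camera pipeline is implemented.

```python
def plan_capture(preview_has_face: bool) -> list[str]:
    """Return the ordered focus modes for one capture (hypothetical labels)."""
    if preview_has_face:
        # Claims 2-3: the face focus shot is taken first, then the distance focus shot.
        return ["face_focus", "distance_focus"]
    # Claim 4: no face data in the preview, so a close-up shot is taken instead.
    return ["close_up_focus"]


print(plan_capture(True))   # ['face_focus', 'distance_focus']
print(plan_capture(False))  # ['close_up_focus']
```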

32	Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Wadhwa, N., Garg, R., Jacobs, D. E., Feldman, B. E., Kanazawa, N., Carroll, R., ... & Levoy, M. (2018). Synthetic depth-of-field with a single-camera mobile phone. ACM Transactions on Graphics (ToG), 37(4), 1-13. (hereinafter Wadhwa) in view of Zhe et al. (CN 108986127 A) as applied to claim 1 above, and further in view of Xu et al. (US 20200357095 A1).

33	Regarding claim 7, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 5 (see claim 1 rejection above), wherein the obtaining the portrait area image by inputting the first image into the trained image segmentation neural network to perform the image segmentation comprises: transmitting the first image to a cloud through a network to indicate the cloud to use the image segmentation neural network to segment the first image to obtain the portrait area image (Zhe; [Page 7; Lines 5-7] reciting “The user transmits the collected image containing the long strip object to the cloud through the network, and the image segmentation processing is performed by the cloud using the image segmentation neural network of the embodiment of the present invention.”).

34	Wadhwa in view of Zhe does not explicitly teach receiving the portrait area image returned by the cloud.

35	Xu teaches receiving the portrait area image returned by the cloud ([0107] reciting “In this embodiment, after receiving the high resolution layer image returned by the cloud server, the displaying terminal performs a pixel filling processing on the to-be-processed layer image according to the pixels of the high resolution layer image to obtain a high resolution target layered image, and performs a smoothing processing on the high resolution target layered image so as to avoid an occurrence of sharpness distortion and jagged image, etc., thereby improving the resolution of the image.”).

36	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the method (taught by Wadhwa in view of Zhe) to incorporate the teachings of Xu, so that the cloud server returns the processed image (here, the portrait area image), using the cloud server and the portrait images taught by Wadhwa and Zhe. Doing so would improve the resolution of the images, as stated by Xu ([0107]).

37	Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Wadhwa, N., Garg, R., Jacobs, D. E., Feldman, B. E., Kanazawa, N., Carroll, R., ... & Levoy, M. (2018). Synthetic depth-of-field with a single-camera mobile phone. ACM Transactions on Graphics (ToG), 37(4), 1-13. (hereinafter Wadhwa) in view of Zhe et al. (CN 108986127 A) as applied to claim 1 above, and further in view of Ansorregui et al. (US 20200380739 A1).
 
38	Regarding claim 18, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 1, wherein the step of aligning and merging the portrait area image and the second image to obtain the merged image comprises (see claim 1 rejection above):
using a pixel alignment algorithm, calculating an offset of the portrait area image relative to the second image (Wadhwa; [Page 10; Fig. 13 Description] reciting “Large disk blur kernels are generated by offseting and truncating a distance function. These ideal kernels can be well approximated by a discretized disk. The sparsity of the discrete kernel’s gradient in y—shown with red and blue representing positive and negative values, respectively— allows us to perform a scatter blur with far fewer operations per pixel.”; [Page 10; Section 5.3] reciting “After the individual layers have been blurred, they are upsampled to full-resolution and composited together, back to front. One special in-focus layer is inserted over the sub-image containing dfocus, taken directly from the full resolution input, to avoid any quality loss from the downsample/upsample round trip imposed by the blur pipeline.”), 

39	Wadhwa in view of Zhe does not explicitly teach performing a pixel replacement of the portrait area image on the corresponding pixels of the second image to obtain the merged image.

40	Ansorregui teaches performing a pixel replacement of the portrait area image on the corresponding pixels of the second image to obtain the merged image ([0005] reciting “According to various embodiments of the present disclosure, a method for processing an image is provided. The method comprises: comparing a first image and a modified second image, the modified second image comprising a copy of the first image in which at least one pixel value of at least one replaced pixel has been modified; and based on the result of the comparison, generating first image reconstruction information which can be combined with the modified second image to reconstruct the first image.”).

41	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the method (taught by Wadhwa in view of Zhe) to incorporate the teachings of Ansorregui, so that the method performs a pixel replacement of the portrait area image using the “pixel alignment method” taught by Wadhwa. Doing so would provide efficient reconstruction of original images from modified images, as stated by Ansorregui ([0010]).
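
For orientation, the two steps recited in claim 18 (calculating an offset of the portrait area image relative to the second image, then replacing the corresponding pixels) can be sketched as follows. This is an illustrative sketch only: the brute-force mean-absolute-difference search is a stand-in chosen for brevity, not the alignment algorithm of the application, Wadhwa, or Ansorregui, and the names `estimate_offset` and `replace_pixels` are hypothetical. Images are assumed to be small grayscale grids stored as nested lists.

```python
def estimate_offset(portrait, mask, target, max_shift=2):
    """Search shifts (dy, dx) and return the one with the lowest mean absolute
    difference between masked portrait pixels and the target image."""
    h, w = len(target), len(target[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cost, count = 0, 0
            for y, row in enumerate(portrait):
                for x, value in enumerate(row):
                    ty, tx = y + dy, x + dx
                    if mask[y][x] and 0 <= ty < h and 0 <= tx < w:
                        cost += abs(value - target[ty][tx])
                        count += 1
            # Skip shifts where no masked pixel lands inside the target.
            if count and cost / count < best_cost:
                best, best_cost = (dy, dx), cost / count
    return best


def replace_pixels(portrait, mask, target, offset):
    """Copy masked portrait pixels onto a copy of the target at the given offset."""
    dy, dx = offset
    merged = [row[:] for row in target]
    for y, row in enumerate(portrait):
        for x, value in enumerate(row):
            ty, tx = y + dy, x + dx
            if mask[y][x] and 0 <= ty < len(merged) and 0 <= tx < len(merged[0]):
                merged[ty][tx] = value
    return merged


# Toy data: a sharp 2x2 "portrait" and a second image holding a softer copy of it.
portrait = [[50, 90, 0, 0], [70, 30, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
mask = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
second = [[0, 0, 0, 0], [0, 0, 48, 88], [0, 0, 68, 32], [0, 0, 0, 0]]

offset = estimate_offset(portrait, mask, second)
merged = replace_pixels(portrait, mask, second, offset)
```

With this toy data the search settles on a shift of one row down and two columns right, and the merged image carries the sharp portrait values at that location while the rest of the second image is unchanged.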

42	Regarding claim 19, Wadhwa in view of Zhe teaches the panoramic depth image synthesis method of claim 1 (see claim 1 rejection above).

43	Wadhwa does not explicitly teach a non-transitory storage medium, wherein the storage medium stores one or more programs, and the one or more programs are executed by one or more processors to implement the steps in the panoramic depth image synthesis method of claim 1.

44	Ansorregui teaches a non-transitory storage medium, wherein the storage medium stores one or more programs, and the one or more programs are executed by one or more processors to implement the steps in ([0008] reciting “According to various embodiments of the present disclosure, an apparatus for processing an image is provided. The apparatus comprises: at least one processor; and memory adapted to store computer program instructions which, when executed by the at least one processors, cause the image processing apparatus to compare a first image and a modified second image, the modified second image comprising a copy of the first image in which at least one pixels have been replaced, and based on the result of the comparison, generate image reconstruction information which can be combined with the modified second image to reconstruct the first image.”; [0078] reciting “According to various embodiments of the present disclosure, a non-transitory computer readable storage medium adapted to store computer program instructions which, when executed, perform a method according to any one of the preceding methods.”) the panoramic depth image synthesis method of claim 1.

45	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the method (taught by Wadhwa in view of Zhe) to incorporate the teachings of Ansorregui, so that instructions such as those taught by Wadhwa in view of Zhe are stored on a non-transitory computer-readable storage medium for execution. Doing so would allow the encryption of the reconstruction information, as stated by Ansorregui ([0072]).

Conclusion
46	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
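
As a rough aid to reading the reply periods above, the month arithmetic can be sketched in Python. This is an illustrative sketch, not legal advice: the mailing date of March 27, 2026 is taken from the prosecution timeline on this page, the helper `add_months` is hypothetical, and the sketch ignores weekend and holiday adjustments as well as the advisory-action scenario described above.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Same day-of-month `months` later, clamped to the end of shorter months."""
    index = start.month - 1 + months
    year, month = start.year + index // 12, index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


mailing_date = date(2026, 3, 27)  # assumption: mailing date from the timeline

two_month_mark = add_months(mailing_date, 2)        # first-reply window: 2026-05-27
shortened_period_end = add_months(mailing_date, 3)  # three-month period: 2026-06-27
statutory_limit = add_months(mailing_date, 6)       # six-month outer limit: 2026-09-27

print(two_month_mark, shortened_period_end, statutory_limit)
```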

47	Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNNY TRAN LE whose telephone number is (571)272-5680. The examiner can normally be reached Mon-Thu: 7:30am-5pm; First Fridays Off; Second Fridays: 7:30am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JOHNNY T LE/ Examiner, Art Unit 2614
/KENT W CHANG/ Supervisory Patent Examiner, Art Unit 2614
Prosecution Timeline

Jan 30, 2024
Application Filed
Oct 14, 2025
Non-Final Rejection mailed — §103
Jan 09, 2026
Response Filed
Mar 27, 2026
Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/602,009
Patent 12614243
GPU Processor System
2y 1m to grant Granted Apr 28, 2026
Based on the 1 most recent grant.
Prosecution Projections

3-4
Expected OA Rounds
60%
Grant Probability
99%
With Interview (+50.0%)
2y 7m (~3m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 5 resolved cases by this examiner. Grant probability derived from career allowance rate.