Prosecution Insights
Last updated: April 19, 2026
Application No. 18/645,194

FULLY AUTOMATED ESTIMATION OF SCENE PARAMETERS

Final Rejection — §101, §103, §112
Filed: Apr 24, 2024
Examiner: DU, HAIXIA
Art Unit: 2611
Tech Center: 2600 — Communications
Assignee: Rembrand Inc.
OA Round: 2 (Final)
Grant Probability: 86% (Favorable)
OA Rounds: 3-4
To Grant: 2y 6m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 86% (above average; 477 granted / 553 resolved; +24.3% vs TC avg)
Interview Lift: +18.0% among resolved cases with an interview (strong)
Typical Timeline: 2y 6m average prosecution; 22 applications currently pending
Career History: 575 total applications across all art units

Statute-Specific Performance

§101: 10.0% (-30.0% vs TC avg)
§103: 50.1% (+10.1% vs TC avg)
§102: 8.4% (-31.6% vs TC avg)
§112: 20.2% (-19.8% vs TC avg)
Deltas are measured against a Tech Center average estimate • Based on career data from 553 resolved cases

Office Action

§101, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This is in response to Applicant's Amendments and Remarks filed on 2/3/2026. Claims 1, 8, and 15 have been amended. Claims 1-20 are present for examination.

Response to Arguments

Applicant's arguments filed 2/3/2026 regarding the 35 USC 101 rejections have been fully considered but they are not persuasive. Applicant submits: "Similar to the claim language of Example 39 discussed in the August Memo, even if, as alleged by the Examiner, the present 'calculating' and 'generating' claim limitations involve or rely upon mathematical concepts, these claim limitations do not 'set forth or describe any mathematical relationships, calculations, formulas, or equations using words or mathematical symbols.' Accordingly, these claim limitations do not, individually or collectively, recite a judicial exception." (See Remarks filed on 2/3/2026, p. 10, 2nd para.)

The examiner respectfully disagrees. The limitations referred to in the above arguments are "calculating a relative depth value for each of one or more pixels included in the input scene that correspond to the one or more bounding boxes; calculating an average relative head size based on the one or more bounding boxes and relative depth values associated with the one or more pixels; and generating a depth scale based on the average relative head size and a known real-world dimension of an average human head." (Emphasis added.) Calculating a relative depth value is clearly a mathematical calculation because it calculates a mathematical value (i.e., a relative depth value). Similarly, calculating an average relative head size is clearly a mathematical calculation because it calculates a mathematical value (i.e., an average relative head size). Generating a depth scale based on the average relative head size and a known real-world dimension of an average human head can be interpreted as calculating a mathematical value (i.e., a depth scale) using two other mathematical values (i.e., (1) the average relative head size and (2) the known real-world dimension of an average human head). These steps differ from the steps of Example 39, which recites "applying one or more transformations to each digital facial image… training the neural network in a first stage … training the neural network in a second stage…" The steps in Example 39 do not recite any mathematical relationships, formulas, or calculations, whereas the steps of claim 1 of the present application discussed above clearly recite mathematical calculations. Therefore, these steps are abstract ideas, and Applicant's arguments are not persuasive.

Applicant further submits that "the human mind is not equipped to convert a 2D input scene into a latent feature representation of the 2D input scene." Applicant also submits that "the human mind is not equipped to identify depictions of human faces included in the latent feature space representation of the 2D input scene based on a search of the latent feature space representation and one or more latent feature vectors describing facial features." Applicant further contends that "the human mind is not equipped to determine pixel-level measurements of bounding boxes included in an input scene captured by a camera." (See Remarks filed on 2/3/2026, p. 11.) The examiner respectfully disagrees.
The disclosure does not provide a definition for "latent feature space" or "latent feature representation." A person skilled in the art would understand that a latent feature space usually has a lower dimension than the input data. Here the input scene is 2D; therefore, the latent feature space has been interpreted as 1D or at most 2D. In addition, claim 1 does not recite how the 2D input scene is converted into the latent feature space representation. Under the broadest reasonable interpretation (BRI), converting a 2D input scene into a latent feature space representation can be interpreted as identifying 1D or 2D features in the 2D input scene and using the 1D or 2D features as the latent feature space representation. A human mind can easily convert a 2D input scene into a 1D or 2D latent feature space by identifying one or more features in the 2D input scene and using the identified features as the latent feature space representation of the input scene. Because a human mind can easily identify features in a 2D input scene, it can also easily search the latent feature space representations and latent feature vectors describing facial features to identify depictions of human faces in the identified features included in the latent feature space representation by comparing the latent feature vectors describing facial features with each of the latent feature space representations. A human mind can also easily determine bounding boxes associated with the input scene and the heights and widths of the bounding boxes. Even though a human mind cannot accurately determine the sizes in measurements of pixels, it can estimate the sizes in pixels based on the size of the input scene. It can also use a generic computer as a tool to determine the height and width of a bounding box in measurements of pixels. According to MPEP 2106.04(a)(2)(III)(C), using a computer as a tool to perform a mental process is still directed to a mental process. Therefore, Applicant's arguments above are not persuasive.

Applicant's arguments with respect to claim 1 regarding the 35 USC 103 rejections have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. MPEP 2106 III provides a flowchart for the subject matter eligibility test for products and processes. The claim analysis following the flowchart is as follows:

Regarding claim 1:

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. It recites a method, which is a process.

Step 2A, Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes.
Claim 1 recites the steps of "converting a two-dimensional (2D) input scene captured by a camera into a latent feature space representation of the input scene; identifying one or more depictions of human faces included in the latent feature space representation of the input scene, based on a search of the latent feature space representation and one or more latent feature vectors describing facial features; determining, for each of one or more bounding boxes associated with the input scene, a height and a width each associated with the bounding box and expressed in measurements of pixels, wherein each bounding box represents a head size associated with a different one of the one or more identified depictions of human faces; calculating a relative depth value for each of one or more pixels included in the input scene that correspond to the one or more bounding boxes; calculating an average relative head size based on the one or more bounding boxes and relative depth values associated with the one or more pixels; and generating a depth scale based on the average relative head size and a known real-world dimension of an average human head."

The step of converting can be performed as a mental process. Under the broadest reasonable interpretation (BRI), converting a 2D input scene into a latent feature space representation can be interpreted as identifying 1D or 2D features in the 2D input scene and using the 1D or 2D features as the latent feature space representation. A human mind can easily convert a 2D input scene into a 1D or 2D latent feature space by identifying one or more features in the 2D input scene and using the identified features as the latent feature space representation of the input scene.

The step of identifying one or more depictions of human faces can be performed as a mental process. Because a human mind can easily identify features in a 2D input scene, it can also easily search the latent feature space representations and latent feature vectors describing facial features to identify depictions of human faces in the identified features included in the latent feature space representation by comparing the latent feature vectors describing facial features with each of the latent feature space representations.

The step of determining bounding boxes, and the heights and widths associated with the bounding boxes expressed in measurements of pixels, can be performed as a mental process with the simple aid of pen and paper and/or a computer as a tool, because a person can generate bounding boxes by drawing rectangles surrounding human faces in an image using pen and paper, and can easily determine the size of a bounding box, expressed in measurements of pixels, using a computer as a tool. According to MPEP 2106.04(a)(2)(III)(C), using a computer as a tool to perform a mental process is still directed to a mental process.

The calculating steps can be mathematical concepts because they involve mathematical calculations. The step of generating a depth scale based on the average relative head size and a known real-world dimension of an average human head can be a mathematical concept because it involves mathematical calculations using mathematical values.

Step 2A, Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The claim does not recite any additional elements besides the abstract ideas.
Therefore, this judicial exception is not integrated into a practical application because no additional elements are recited in the claim.

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. The claim does not recite any additional elements besides the abstract ideas. Therefore, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because no additional elements are recited in the claim.

Therefore, claim 1 is not eligible subject matter under 35 USC 101.

Regarding claim 2, it depends from claim 1 with additional limitations further limiting the input scene and the step of calculating the average relative head size, which are not additional elements but further limit the mental process and the mathematical concept, respectively. Therefore, no additional elements are recited in claim 2 to integrate the abstract ideas into a practical application or amount to significantly more than the abstract ideas. Therefore, claim 2 is not eligible subject matter under 35 USC 101.

Regarding claim 3, it depends from claim 1 with an additional limitation further limiting the step of calculating the average relative head size, which is not an additional element but further limits the mathematical concept. Therefore, no additional elements are recited in claim 3 to integrate the abstract ideas into a practical application or amount to significantly more than the abstract ideas. Therefore, claim 3 is not eligible subject matter under 35 USC 101.

Regarding claim 4, it depends from claim 1 with the additional limitation that the known real-world dimension of the average human head comprises a menton-crinion distance, which is a mathematical value that further limits the mathematical concept. Therefore, no additional elements are recited in claim 4 to integrate the abstract ideas into a practical application or amount to significantly more than the abstract ideas. Therefore, claim 4 is not eligible subject matter under 35 USC 101.

Regarding claim 5, it depends from claim 1 with the additional limitation of estimating one or more real-world dimensions for a scene object included in the input scene based on the depth scale and one or more pixel dimensions associated with the scene object. The estimating step can be a mental process, because estimating dimensions of a scene object can be performed mentally, or a mathematical concept, because it can be done with mathematical calculations. Therefore, no additional elements are recited in claim 5 to integrate the abstract ideas into a practical application or amount to significantly more than the abstract ideas. Therefore, claim 5 is not eligible subject matter under 35 USC 101.

Regarding claim 6, it depends from claim 1 with the additional limitation of estimating pixel dimensions associated with a world object based on the one or more real-world object dimensions, the specified insertion point, and the depth scale. The estimating step can be a mental process, because estimating pixel dimensions associated with a world object can be performed mentally, or a mathematical concept, because it can be done with mathematical calculations. Therefore, no additional elements are recited in claim 6 to integrate the abstract ideas into a practical application or amount to significantly more than the abstract ideas. Therefore, claim 6 is not eligible subject matter under 35 USC 101.
Regarding claim 7, it depends from claim 6 with the additional limitation of generating a modified scene based on the input scene, the world object, and the specified insertion point, which can be a mental process with the aid of pen and paper because a person can draw a sketch of an object based on the world object at the insertion point on an image of the input scene. Therefore, no additional elements are recited in claim 7 to integrate the abstract ideas into a practical application or amount to significantly more than the abstract ideas. Therefore, claim 7 is not eligible subject matter under 35 USC 101.

Regarding claims 8-14, they respectively recite limitations similar to those of claims 1-7 but in a non-transitory computer-readable media form. The non-transitory computer-readable media are generic computer components that do not integrate the recited abstract ideas into a practical application or amount to significantly more than the abstract ideas. Therefore, claims 8-14 are not eligible subject matter under 35 USC 101.

Regarding claims 15-20, they respectively recite limitations similar to those of claims 1-7 but in a system form with additional elements of memories and processors. Memories and processors are generic computer components that do not integrate the recited abstract ideas into a practical application or amount to significantly more than the abstract ideas. Therefore, claims 15-20 are not eligible subject matter under 35 USC 101.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding claim 1, it recites "determining, for each of one or more bounding boxes associated with the input scene, a height and a width each associated with the bounding box…" However, claim 1 only recites a two-dimensional (2D) input scene captured by a camera, without any prior recitation of bounding boxes. It is not clear whether the bounding boxes are physically placed in the scene and captured by the camera or generated after the 2D input scene is captured by the camera. If the bounding boxes are generated after the 2D input scene is captured, such steps are missing from claim 1, rendering claim 1 incomplete, and such omission amounts to a gap between steps. See MPEP § 2172.01. Therefore, claim 1 is indefinite.

Claims 2-7 depend from claim 1 but fail to cure the deficiencies of claim 1. Claims 8 and 15 respectively recite limitations similar to those of claim 1 discussed above. Claims 9-14 depend from claim 8 but fail to cure the deficiencies of claim 8. Claims 16-20 depend from claim 15 but fail to cure the deficiencies of claim 15. Therefore, claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite.
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 8, 9, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent Publication No. 20240185449 A1 to Bhatt et al. in view of US Patent No. 11514714 B1 to Soltani et al. and Japanese Patent Publication JP2012244466A to Shotaro et al.

Regarding claim 1, Bhatt discloses a computer-implemented method for estimating a real-world size of an object included in an input scene (Bhatt, Abstract, para. [0036], disclosing that a vertical head height measure can be calculated), the computer-implemented method comprising: identifying one or more depictions of human faces included in a two-dimensional (2D) input scene captured by a camera (Bhatt, FIG. 5, showing a 2D input scene where one or more depictions of human faces are identified; para. [0005], disclosing that a camera in the video conference meeting room may be provided with a human head detector machine learning model which is trained to detect or classify human heads from input camera video frame or image data, and to identify, for each detected human head, a head bounding box with specified image plane coordinate and dimension information in a head box data structure); determining, for each of one or more bounding boxes associated with the input scene, a height and a width each associated with the bounding box and expressed in measurements of pixels, wherein each bounding box represents a head size associated with a different one of the one or more identified depictions of human faces (Bhatt, Abstract, disclosing extracting a pixel width measure and a pixel height measure from each head bounding box; FIG. 5, showing a 2D input scene where one or more depictions of human faces are identified with a bounding box, each bounding box representing a head size associated with a different one of the one or more depictions of human faces; para. [0003], disclosing that a camera in the video conference meeting room may be provided with a human head detector machine learning model which is trained to detect or classify human heads from input camera video frame or image data, and to identify, for each detected human head, a head bounding box with specified image plane coordinate and dimension information in a head box data structure {xbox, ybox, width, height};
para. [0035], disclosing the number of pixels for the head height and width measures); calculating a relative depth value for each of one or more pixels included in the input scene that correspond to the one or more bounding boxes (Bhatt, para. [0035], disclosing computing a look-up table for min-median-max head sizes at various distances and detecting the location of each head in a 2D viewing plane with specified image plane coordinates and associated width and height measures for a head bounding box; by using the reverse look-up table operation, the distance can be determined between the camera and each head that is located on the center line of sight of the camera focal point, and the distance can correspond to a relative depth value for each of one or more pixels included in the input scene that correspond to the one or more bounding boxes); and calculating an average relative head size based on the one or more bounding boxes and relative depth values associated with the one or more pixels (Bhatt, para. [0036], disclosing that a vertical head height measure can be calculated based on an angular extent angle of the upper half of the vertical head height and the distance between the camera and the head; the human head has a height corresponding to the vertical dimension of a head bounding box, and the vertical head height measure can correspond to an average relative head size calculated based on one or more bounding boxes and the distance as the relative depth values associated with the one or more pixels in the bounding box of the head).

However, Bhatt does not expressly disclose converting a two-dimensional (2D) input scene captured by a camera into a latent feature space representation of the input scene; identifying one or more depictions of human faces included in the latent feature space representation of the input scene, based on a search of the latent feature space representation and one or more latent feature vectors describing facial features; or generating a depth scale based on the average relative head size and a known real-world dimension of an average human head.

On the other hand, Soltani discloses converting a two-dimensional (2D) input scene captured by a camera into a latent feature space representation of the input scene (Soltani, FIG. 1, showing an image data acquisition system getting input from a set of cameras and providing image data to an inference server including a recognition model, detection model, and face inference subsystem; col. 3, lines 60-66, disclosing using the recognition model to generate face vectors in a latent space based on the features of a sub-image); and identifying one or more depictions of human faces included in the latent feature space representation of the input scene, based on a search of the latent feature space representation and one or more latent feature vectors describing facial features (Soltani, col. 4, lines 3-5, disclosing using the recognition model to recognize a face and track the face between different images; lines 55-59, disclosing providing face vectors to recognize faces and detect the presence of individuals;
col. 9, lines 49-60, disclosing determining a set of matched faces based on the search parameters of a search request, transforming the search parameters into a search vector within a latent space of the face vectors, and determining a latent distance between the search vector and a set of representative vectors to find a match, indicating that determining a match can correspond to identifying one or more depictions of human faces included in the latent feature space representation of the input scene, based on a search of the latent feature space representation and one or more latent feature vectors describing facial features, using the search vector and the set of representative vectors describing facial features).

Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine Bhatt and Soltani. The suggestion/motivation would have been to provide a set of face vectors and associated bounding boxes, as suggested by Soltani (see Soltani, col. 4, lines 55-63).

However, Bhatt and Soltani do not expressly disclose generating a depth scale based on the average relative head size and a known real-world dimension of an average human head. On the other hand, Shotaro discloses generating a depth scale based on the average relative head size and a known real-world dimension of an average human head (Shotaro, Translation, para. [0018], disclosing detecting the sizes h1 and w1 of the face in the image, which are the vertical dimension and the horizontal dimension of the face; para. [0019], disclosing calculating the ratio between the size of the face detected by the face detection unit and the average face model, where the average face model has data on the average sizes ha and wa of actual human faces, and ultimately calculating the display screen distance; para. [0020], disclosing calculating the ratio between the face sizes h1, w1 in the image and the average face sizes ha, wa in the average face model, indicating that the detected face size can correspond to an average relative head size in the image and the face size of the average face model can correspond to a known real-world dimension of an average human head, because the face is a part of the human head, and the ratio between the two sizes can correspond to a depth scale because it can be used to calculate the display screen distance as a depth).

Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine Bhatt in view of Soltani with Shotaro. The suggestion/motivation would have been to provide natural stereoscopic display, as suggested by Shotaro (see Shotaro, Translation, para. [0011]).

Regarding claim 2, the combination of Bhatt, Soltani, and Shotaro discloses the computer-implemented method of claim 1, wherein the input scene includes a 2D representation of a three-dimensional (3D) scene captured by a camera (Bhatt, FIG. 5, showing a 2D input scene where one or more depictions of human faces are identified;
para. [0005], disclosing that a camera in the video conference meeting room may be provided with a human head detector machine learning model which is trained to detect or classify human heads from input camera video frame or image data, indicating that the video conference meeting room is a 3D scene captured by the camera and the 2D input scene includes a 2D representation of the video conference meeting room as the 3D scene), and calculating the average relative head size is further based on a relative focal length associated with the camera (Bhatt, para. [0031], disclosing that a meeting participant located on the center line of sight of the camera focal point at a distance of d=0.5 meters will appear larger to the camera than a meeting participant located on the center line of sight of the camera focal point at a larger distance of d=1.0; para. [0036], disclosing calculating the vertical head height measure based on the angular extent angle of the upper half of the vertical head height and the distance between the camera and the head, where, from the vantage of the camera, the vertical head height makes an angle extending from the bottom to the top of the head, indicating that the distance between the camera and the participant located on the center line of sight of the camera focal point is based on a relative focal length associated with the camera, and calculating the average relative head size for the participant is based on the distance and thus based on the relative focal length).

Regarding claim 8, it recites limitations similar to those of claim 1 but in a non-transitory computer-readable media form. The rationale of the claim 1 rejection is applied to reject claim 8. In addition, Bhatt discloses memory (see Bhatt, FIG. 21).

Regarding claim 9, it recites limitations similar to those of claim 2 but in a non-transitory computer-readable media form. The rationale of the claim 2 rejection is applied to reject claim 9. In addition, Bhatt discloses memory (see Bhatt, FIG. 21).

Regarding claim 15, it recites limitations similar to those of claim 1 but in a system form. The rationale of the claim 1 rejection is applied to reject claim 15. In addition, Bhatt discloses memory and a processor (see Bhatt, FIG. 21).

Regarding claim 16, it recites limitations similar to those of claim 2 but in a system form. The rationale of the claim 2 rejection is applied to reject claim 16. In addition, Bhatt discloses memory and a processor (see Bhatt, FIG. 21).

Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Bhatt, Soltani, and Shotaro as applied to claims 1, 8, and 15 above, and further in view of Chinese Patent Publication No. CN110287361B to Han.

Regarding claim 3, the combination of Bhatt, Soltani, and Shotaro discloses the computer-implemented method of claim 1 and one or more confidence values associated with the one or more bounding boxes (Bhatt, para. [0052], disclosing computing, for each head bounding box, a corresponding confidence measure or score; para. [0054], disclosing that the human head data structure includes the image plane coordinates of the detected head, the width and height of the head bounding box, and the confidence score). However, the combination of Bhatt, Soltani, and Shotaro does not expressly disclose wherein calculating the average relative head size is further based on one or more confidence values associated with the one or more bounding boxes.
On the other hand, Han discloses that calculating the average relative head size is further based on one or more confidence values associated with the one or more bounding boxes (Han, Translation, para. [n0073], disclosing that when the face confidence score is greater than the preset confidence score threshold, the face length and face width of the face region in the image of the person are obtained, indicating that the face length and face width of the face region can correspond to the average relative head size calculated based on the one or more confidence values associated with the face region corresponding to the bounding boxes). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Bhatt, Soltani, and Shotaro with Han. The suggestion/motivation would have been to filter out images of persons with low face confidence scores, reduce the image labeling workload, and improve efficiency, as suggested by Han (see Han, Translation, paras. [n0061] and [n0080]).

Regarding claim 10, it recites limitations similar to those of claim 3 but in a non-transitory computer-readable media form. The rationale of the claim 3 rejection is applied to reject claim 10. In addition, Bhatt discloses memory (see Bhatt, FIG. 21).

Regarding claim 17, it recites limitations similar to those of claim 3 but in a system form. The rationale of the claim 3 rejection is applied to reject claim 17. In addition, Bhatt discloses memory and a processor (see Bhatt, FIG. 21).

Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Bhatt, Soltani, and Shotaro as applied to claims 1, 8, and 15 above, and further in view of International Application Publication No. WO 2017212490 A1 to Siboni et al.

Regarding claim 4, the combination of Bhatt, Soltani, and Shotaro discloses the computer-implemented method of claim 1. However, the combination of Bhatt, Soltani, and Shotaro does not expressly disclose wherein the known real-world dimension of the average human head comprises a menton-crinion distance. On the other hand, Siboni discloses that the known real-world dimension of the average human head comprises a menton-crinion distance (Siboni, p. 11, last para., disclosing that the nominal facial height is generally the menton-crinion length). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Bhatt, Soltani, and Shotaro with Siboni. The suggestion/motivation would have been to determine the user's looking direction, as suggested by Siboni (see Siboni, p. 11, last para.).

Regarding claim 11, it recites limitations similar to those of claim 4 but in a non-transitory computer-readable media form. The rationale of the claim 4 rejection is applied to reject claim 11. In addition, Bhatt discloses memory (see Bhatt, FIG. 21).

Regarding claim 18, it recites limitations similar to those of claim 4 but in a system form. The rationale of the claim 4 rejection is applied to reject claim 18. In addition, Bhatt discloses memory and a processor (see Bhatt, FIG. 21).

Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Bhatt, Soltani, and Shotaro as applied to claims 1, 8, and 15 above, and further in view of US Publication No. 20250000476 A1 to Hansroul.

Regarding claim 5, the combination of Bhatt, Soltani, and Shotaro discloses the computer-implemented method of claim 1.
However, the combination of Bhatt, Soltani, and Shotaro does not expressly disclose estimating one or more real-world dimensions for a scene object included in the input scene based on the depth scale and one or more pixel dimensions associated with the scene object. On the other hand, Hansroul discloses estimating one or more real-world dimensions for a scene object included in the input scene based on the depth scale and one or more pixel dimensions associated with the scene object (Hansroul, FIG. 11, showing an object in a scene; para. [0053], disclosing that the actual dimension of an object is determined from the image dimension of the object and the magnification factor obtained using the SOD from the depth map, indicating that the actual dimension of an object can correspond to the one or more real-world dimensions for the object corresponding to the scene object included in the input scene, based on the magnification factor obtained using the SOD from the depth map corresponding to the depth scale and the image dimension of the object as the one or more pixel dimensions associated with the scene object). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Bhatt, Soltani, and Shotaro with Hansroul. The suggestion/motivation would have been to determine dimensions of a feature of a subject in an image, as suggested by Hansroul (see Hansroul, Abstract).

Regarding claim 12, it recites limitations similar to those of claim 5 but in a non-transitory computer-readable media form. The rationale of the claim 5 rejection is applied to reject claim 12. In addition, Bhatt discloses memory (see Bhatt, FIG. 21).

Regarding claim 19, it recites limitations similar to those of claim 5 but in a system form. The rationale of the claim 5 rejection is applied to reject claim 19. In addition, Bhatt discloses memory and a processor (see Bhatt, FIG. 21).

Claims 6, 7, 13, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Bhatt, Soltani, and Shotaro as applied to claims 1, 8, and 15 above, and further in view of US Patent No. 11494997 B1 to Ho.

Regarding claim 6, the combination of Bhatt, Soltani, and Shotaro discloses the computer-implemented method of claim 1. However, the combination of Bhatt, Soltani, and Shotaro does not expressly disclose estimating, for a world object including one or more real-world object dimensions and a specified insertion point within the input scene, one or more pixel dimensions associated with the world object based on the one or more real-world object dimensions, the specified insertion point, and the depth scale. On the other hand, Ho discloses estimating, for a world object including one or more real-world object dimensions and a specified insertion point within the input scene, one or more pixel dimensions associated with the world object based on the one or more real-world object dimensions, the specified insertion point, and the depth scale (Ho,
col. 2, lines 44-52, disclosing calculating display dimensions for the second object based on the distance scaling factor and the real-world dimensions of the second object, generating a display object image by scaling the second object image to the display dimensions, and displaying the display object image onto the display based on a comparison of the first distance for the first object and the second distance for the second object, where the display object image is placed at the 2D display position, indicating that the second object can correspond to a world object including one or more real-world object dimensions, with the 2D display position as a specified insertion point within the input scene, and the display dimensions as the one or more pixel dimensions associated with the second object as the world object, estimated based on the one or more real-world object dimensions, the specified insertion point, and the distance scaling factor as the depth scale). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Bhatt, Soltani, and Shotaro with Ho. The suggestion/motivation would have been to display objects with real-world dimensions in an augmented reality (AR) system, as suggested by Ho (see Ho, col. 2, lines 10-11).

Regarding claim 7, the combination of Bhatt, Soltani, Shotaro, and Ho discloses the computer-implemented method of claim 6, further comprising generating a modified scene based on the input scene, the world object, and the specified insertion point (Ho, col. 2, lines 44-52, disclosing calculating display dimensions for the second object based on the distance scaling factor and the real-world dimensions of the second object, generating a display object image by scaling the second object image to the display dimensions, and displaying the display object image onto the display based on a comparison of the first distance for the first object and the second distance for the second object, where the display object image is placed at the 2D display position, indicating that displaying the display object image onto the display can correspond to generating a modified scene based on the original display as the input scene, the second object as the world object, and the 2D display position as the specified insertion point). Before the invention was effectively filed, it would have been obvious for a person skilled in the art to combine the combination of Bhatt, Soltani, and Shotaro with Ho. The suggestion/motivation would have been to display objects with real-world dimensions in an augmented reality (AR) system, as suggested by Ho (see Ho, col. 2, lines 10-11).

Regarding claim 13, it recites limitations similar to those of claim 6 but in a non-transitory computer-readable media form. The rationale of the claim 6 rejection is applied to reject claim 13. In addition, Bhatt discloses memory (see Bhatt, FIG. 21).

Regarding claim 14, it recites limitations similar to those of claim 7 but in a non-transitory computer-readable media form. The rationale of the claim 7 rejection is applied to reject claim 14. In addition, Bhatt discloses memory (see Bhatt, FIG. 21).

Regarding claim 20, it recites limitations similar to those of claims 6 and 7 but in a system form. The rationale of the claims 6 and 7 rejections is applied to reject claim 20. In addition, Bhatt discloses memory and a processor (see Bhatt, FIG. 21).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL.
See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAIXIA DU, whose telephone number is (571) 270-5646. The examiner can normally be reached Monday - Friday, 8:00 am - 4:00 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HAIXIA DU/
Primary Examiner, Art Unit 2611
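
Illustration (editorial, not part of the Office Action): the rejections above turn on the claimed chain of calculations — relative depth per head bounding box, an average relative head size, and a depth scale anchored to a known real-world head dimension, which is then used to size other scene objects. The Python sketch below shows one plausible reading of those steps. It is not taken from the application or any cited reference; the class, function names, and constants (e.g., HeadBox, AVG_HEAD_DIM_M) are hypothetical placeholders chosen for illustration only.

```python
# Minimal sketch of the claimed depth-scale idea, under assumed inputs.
# Nothing here is from the application or the cited art; all names and
# numbers are hypothetical.

from dataclasses import dataclass
from typing import Sequence

# Assumed "known real-world dimension of an average human head," in meters.
# Placeholder value only; the claim does not fix a particular number.
AVG_HEAD_DIM_M = 0.23


@dataclass
class HeadBox:
    """A detected face/head bounding box with pixel dimensions and a
    unitless relative depth value for its pixels (hypothetical inputs)."""
    width_px: float
    height_px: float
    relative_depth: float


def average_relative_head_size(boxes: Sequence[HeadBox]) -> float:
    """Average head size after normalizing each box height by its relative
    depth, so nearer (larger-appearing) and farther heads are comparable."""
    sizes = [box.height_px * box.relative_depth for box in boxes]
    return sum(sizes) / len(sizes)


def depth_scale(boxes: Sequence[HeadBox],
                real_head_dim_m: float = AVG_HEAD_DIM_M) -> float:
    """Meters per (pixel x relative-depth) unit, anchored to the average head."""
    return real_head_dim_m / average_relative_head_size(boxes)


def estimate_real_dimension(pixel_dim: float, relative_depth: float,
                            scale: float) -> float:
    """Estimate a scene object's real-world dimension from its pixel size,
    the relative depth at its location, and the depth scale."""
    return pixel_dim * relative_depth * scale


if __name__ == "__main__":
    # Two hypothetical head detections at different relative depths.
    detections = [
        HeadBox(width_px=42, height_px=58, relative_depth=1.0),
        HeadBox(width_px=30, height_px=40, relative_depth=1.45),
    ]
    scale = depth_scale(detections)
    # A hypothetical scene object 120 px tall at relative depth 1.2.
    print(f"Estimated object height: {estimate_real_dimension(120, 1.2, scale):.2f} m")
```

Under these made-up inputs the sketch prints an object height of roughly 0.57 m; the point is only to make concrete how a single depth scale, anchored to an average head dimension, could convert pixel measurements into real-world estimates.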

Prosecution Timeline

Apr 24, 2024
Application Filed
Oct 31, 2025
Non-Final Rejection — §101, §103, §112
Feb 03, 2026
Response Filed
Mar 05, 2026
Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602857
GENERATING IMAGE DATA
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12597204
MODEL GENERATING DEVICE, MODEL GENERATING SYSTEM, MODEL GENERATING METHOD, AND PROGRAM
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12573137
System and Method for Unsupervised and Autonomous 4D Dynamic Scene and Objects Interpretation, Segmentation, 3D Reconstruction, and Streaming
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12561882
IMAGE RENDERING METHOD AND APPARATUS
Granted Feb 24, 2026 (2y 5m to grant)
Patent 12555304
RAY TRACING USING INDICATIONS OF RE-ENTRY POINTS IN A HIERARCHICAL ACCELERATION STRUCTURE
Granted Feb 17, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 86%
With Interview: 99% (+18.0%)
Median Time to Grant: 2y 6m
PTA Risk: Moderate
Based on 553 resolved cases by this examiner. Grant probability derived from career allow rate.
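As a check on that derivation, the headline figure follows directly from the counts in the Examiner Intelligence panel: 477 granted / 553 resolved ≈ 0.863, which rounds to the 86% career allow rate and grant probability shown above.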
