Prosecution Insights
Last updated: April 19, 2026
Application No. 17/966,152

INTERACTIVE IMAGE GENERATION

Status: Non-Final OA — §103, §DP

Filed: Oct 14, 2022
Examiner: DEMETER, HILINA K
Art Unit: 2617
Tech Center: 2600 — Communications
Assignee: Outward Inc.
OA Round: 5 (Non-Final)

Grant Probability: 72% (Favorable)
Expected OA Rounds: 5-6
Time to Grant: 3y 1m
With Interview: 91%

Examiner Intelligence

Career Allow Rate: 72% (472 granted / 659 resolved; +9.6% vs TC avg — above average)
Interview Lift: +19.4% (allow rate in resolved cases with an interview vs. without — a strong lift)
Avg Prosecution: 3y 1m typical timeline (27 applications currently pending)
Career History: 686 total applications across all art units
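
These headline figures are internally consistent: 472 / 659 ≈ 71.6%, displayed as 72%, and adding the +19.4% interview lift gives ≈ 91%, the "With Interview" projection shown later on the page. Below is a minimal Python sketch of that arithmetic, assuming the dashboard computes the figures as simple proportions; the underlying with/without-interview case split is not published on the page, so the lift is applied as the given delta rather than recomputed.

```python
# A minimal sketch, assuming the dashboard's headline figures are simple
# proportions over the examiner's resolved cases. Counts and the interview
# lift are taken from the page; the with/without-interview case split is not
# shown, so the lift is applied as the published delta rather than recomputed.

granted, resolved = 472, 659                  # career counts from the page

allow_rate = granted / resolved               # 0.7162... -> displayed as "72%"
interview_lift = 0.194                        # "+19.4%" interview lift (given)
with_interview = allow_rate + interview_lift  # 0.9102... -> displayed as "91%"

print(f"Career allow rate:     {allow_rate:.1%}")     # 71.6%
print(f"With interview (est.): {with_interview:.1%}") # 91.0%
```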

Statute-Specific Performance

§101:  8.7%  (-31.3% vs TC avg)
§103: 61.0%  (+21.0% vs TC avg)
§102: 14.5%  (-25.5% vs TC avg)
§112:  6.7%  (-33.3% vs TC avg)
Deltas are measured against an estimated Tech Center average (shown as a black reference line in the original chart). Based on career data from 659 resolved cases.
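
Notably, all four displayed deltas are mutually consistent with a single flat reference of 40% (e.g., 61.0% - 40% = +21.0%). The sketch below reproduces the table under that assumption; both the 40% baseline and the interpretation of the per-statute rates are inferences from the displayed numbers, not values stated on the page.

```python
# Sketch reproducing the statute table. The single 40% reference is an
# inference: every displayed delta equals the statute rate minus 40%. The page
# calls this only a "Tech Center average estimate" and does not define the
# underlying per-statute metric.

TC_AVG_ESTIMATE = 0.40  # assumed flat reference implied by the deltas

examiner_rates = {"§101": 0.087, "§103": 0.610, "§102": 0.145, "§112": 0.067}

for statute, rate in examiner_rates.items():
    delta = rate - TC_AVG_ESTIMATE
    print(f"{statute}: {rate:.1%} ({delta:+.1%} vs TC avg)")
```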

Office Action

§103, §DP
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/23/2025 has been entered.

Response to Arguments

Applicant’s argument with respect to the Double Patenting rejection is moot, as that rejection has been withdrawn in light of the claim amendment. Applicant's arguments filed 12/23/2025 have been fully considered but they are not persuasive.

On page 8, Applicant argues that the prior art does not teach “generating an interactive image of the scene comprising a plurality of interactive features modifiable by an end user according to user preference, wherein the generated interactive image of the scene comprises a two-dimensional image with at least partial three-dimensional capabilities but without having an underlying three-dimensional model and wherein the at least partial three-dimensional capabilities are facilitated by the one or more machine learning based networks that have at least in part been trained on training images generated from three-dimensional models and wherein the at least partial three-dimensional capabilities are facilitated by the one or more machine learning based on networks that have at least in part been trained on training images generated from three-dimensional models”.

In response: As disclosed in Beall, para. [0064], a multi-view interactive digital media representation can use a series of 2-D images of a physical object taken from multiple viewpoints. When the 2-D images are output to a display, the physical object can appear to undergo a 3-D transformation, such as a rotation in 3-D space. This discloses generating an interactive image of the scene comprising a plurality of interactive features modifiable by an end user according to user preference. Para. [0064] further notes that this embodiment of the multi-view interactive digital media representation approach differs from using a full 3-D model of the physical object. Also see para. [0069], noting that in this embodiment of the multi-view interactive digital media representation approach, because of the elimination of the 3-D modeling steps, user-selected objects from user generated 2-D images can be converted quickly to a multi-view interactive digital media representation and then output to a display in real-time. This discloses wherein the generated interactive image of the scene comprises a two-dimensional image with at least partial three-dimensional capabilities but without having an underlying three-dimensional model, and wherein the at least partial three-dimensional capabilities are facilitated by the one or more machine learning based networks that have at least in part been trained on training images generated from three-dimensional models. Also see para. [0227] and FIGS. 12 to 17, which describe that, prior to training a machine learning algorithm to recognize landmarks on a type of object, the landmarks can be selected. Therefore, the stated argument is not persuasive.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9, 11, 13-14, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Eder et al. (US Publication Number 2021/0279957 A1, hereinafter “Eder”) in view of Beall et al. (US Publication Number 2019/0332866 A1, hereinafter “Beall”).

(1) Regarding claim 1: As shown in FIG. 1, Eder disclosed a method (para. [0049], note that FIG. 1 illustrates a block diagram providing a high-level overview of components and functions of system(s) and method(s)), comprising: processing one or more sets of captured images of a scene using one or more machine learning based networks (para. [0059], note that a machine learning model may be trained to use a 3D model and metadata as inputs, and trained to spatially localize the metadata based on semantic or instance segmentation of the 3D model; also see para. [0061], note that a user may attach high-resolution images of the scene and associated comments to a spatially localized annotation in the virtual representation in order to better indicate a feature of the location; also see FIG. 6B, para. [0119]).

Eder disclosed most of the subject matter as described above except for specifically teaching generating an interactive image of the scene comprising a plurality of interactive features modifiable by an end user according to user preference, wherein the generated interactive image of the scene comprises a two-dimensional image with at least partial three-dimensional capabilities but without having an underlying three-dimensional model and wherein the at least partial three-dimensional capabilities are facilitated by the one or more machine learning based networks that have at least in part been trained on training images generated from three-dimensional models and wherein the at least partial three-dimensional capabilities are facilitated by the one or more machine learning based on networks that have at least in part been trained on training images generated from three-dimensional models.

However, Beall disclosed generating an interactive image of the scene comprising a plurality of interactive features modifiable by an end user according to user preference (para. [0064], note that a multi-view interactive digital media representation can use a series of 2-D images of a physical object taken from multiple viewpoints; when the 2-D images are output to a display, the physical object can appear to undergo a 3-D transformation, such as a rotation in 3-D space), wherein the generated interactive image of the scene comprises a two-dimensional image with at least partial three-dimensional capabilities but without having an underlying three-dimensional model (para. [0064], note that this embodiment of the multi-view interactive digital media representation approach differs from using a full 3-D model of the physical object; also see para. [0069], note that in this embodiment of the multi-view interactive digital media representation approach, because of the elimination of the 3-D modeling steps, user-selected objects from user generated 2-D images can be converted quickly to a multi-view interactive digital media representation and then output to a display in real-time), and wherein the at least partial three-dimensional capabilities are facilitated by the one or more machine learning based networks that have at least in part been trained on training images generated from three-dimensional models (para. [0043], note that a 3-D skeleton of an object is constructed from image data using machine learning algorithms and structure from motion algorithms), and wherein the at least partial three-dimensional capabilities are facilitated by the one or more machine learning based on networks that have at least in part been trained on training images generated from three-dimensional models (para. [0223], note that one aspect of the methods can be associated with using a trained machine learning algorithm (TMLA) to recognize landmarks on an object, such as a car, in the image data, such as the frames associated with an MVIDMR or another sequence of frames including the object).

At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to teach generating an interactive image of the scene comprising a plurality of interactive features modifiable by an end user according to user preference, wherein the generated interactive image of the scene comprises a two-dimensional image with at least partial three-dimensional capabilities but without having an underlying three-dimensional model and wherein the at least partial three-dimensional capabilities are facilitated by the one or more machine learning based networks that have at least in part been trained on training images generated from three-dimensional models and wherein the at least partial three-dimensional capabilities are facilitated by the one or more machine learning based on networks that have at least in part been trained on training images generated from three-dimensional models. The suggestion/motivation for doing so would have been in order to acquire an interactive input system and allow manipulation of image objects (para. [0002]). Therefore, it would have been obvious to combine Eder with Beall to obtain the invention as specified in claim 1.

(2) Regarding claim 2: Eder further disclosed the method of claim 1, wherein the generated interactive image comprises a merging of actual reality captured with one or more imaging devices and virtual reality generated using the one or more machine learning based networks (para. [0061], note that a user may attach high-resolution images of the scene and associated comments to a spatially localized annotation in the virtual representation in order to better indicate a feature of the location; in another example, a user can interactively indicate the sequence of corners and walls corresponding to the layout of the location to create a floor plan).

(3) Regarding claim 3: Eder further disclosed the method of claim 1, wherein the generated interactive image is indistinguishable from imagery generated using physically based rendering techniques (para. [0080], note that metric posed RGB-D images or video with pose information within a location (e.g., a physical scene of a room) may be used as inputs to construct the 3D model by integrating posed depth images into a persistent representation of distance-to-surface function).

(4) Regarding claim 4: Eder further disclosed the method of claim 1, wherein one or more of the plurality of interactive features are manipulated to edit the generated interactive image according to user preferences (para. [0052], note that the graphical user interface provides multiple capabilities for users to view, edit, augment, and otherwise modify the virtual representation VIR and its associated information).

(5) Regarding claim 5: Eder further disclosed the method of claim 1, wherein the plurality of interactive features is customizable or user configurable (para. [0052], note that the graphical user interface further enables a user to review previously captured scenes 4001, merge captured scenes, add new images and videos to a scene, and mark out a floor plan of a scene, among other capabilities).

(6) Regarding claim 6: Eder further disclosed the method of claim 1, wherein the plurality of interactive features comprises one or more image editing options (para. [0061], note the software used to edit the image, or other information related to the image, the location or the camera).

(7) Regarding claim 7: Eder further disclosed the method of claim 1, wherein the plurality of interactive features comprises options to modify one or more of size, orientation, and placement location of an object comprising the scene (para. [0052], note that the graphical user interface further enables a user to review previously captured scenes 4001, merge captured scenes, add new images and videos to a scene, and mark out a floor plan of a scene, among other capabilities).

(8) Regarding claim 8: Eder further disclosed the method of claim 1, wherein the plurality of interactive features comprises options to modify one or more of camera pose and zoom level with respect to the scene (para. [0060], note that computing camera poses associated with the additional images with respect to the existing plurality of images and the 3D model using a geometric estimation or a machine learning model configured to estimate camera poses).

(9) Regarding claim 9: Eder further disclosed the method of claim 1, wherein the plurality of interactive features comprises an option to modify an environment or a background of the scene (para. [0062], note that the graphical user interface may provide a top-down virtual view (e.g., 3D model) of a scene; at operation 5303, a user may select and join corners of a floor in the 3D model using the graphical user interface).

(10) Regarding claim 11: Eder further disclosed the method of claim 1, wherein the plurality of interactive features comprises an option to modify a texture or a material of a surface comprising the scene (para. [0140], note that the first graphical user interface also enables the user to choose to further refine the 3D model by applying algorithms that may improve the camera pose estimation and color texturing, as well as other aspects of the 3D model, in order to better inspect the quality of the 3D model).

(11) Regarding claim 13: Eder further disclosed the method of claim 1, wherein the one or more sets of captured images of the scene are captured in a known and controlled physical imaging environment (para. [0049], note that the location may be a physical scene, a room, a warehouse, a classroom, an office space, an office room, a restaurant room, a coffee shop, etc.; in an embodiment, operation 4001 involves capturing a scene to receive captured data 4000).

(12) Regarding claim 14: Eder further disclosed the method of claim 1, wherein the one or more sets of captured images of the scene comprise one or more different views of the scene (para. [0125], note that if a user specifies a region of interest through a graphical user interface, the user could view the images associated with that region of the 3D model; similarly, a camera position and orientation within the context of the 3D model is also available from this process, allowing a user to specify a specific image to view based on its spatial location in the virtual representation).

(13) Regarding claim 17: Eder further disclosed the method of claim 1, wherein the generated interactive image comprises a still image, a video frame, or a view of a three-dimensional space (para. [0143], note that the video stream and real-time construction of the 3D model is a continuous process and both the video stream and corresponding update of the 3D model can be seen).

(14) Regarding claim 18: Eder further disclosed the method of claim 1, further comprising generating a web page or an interactive application comprising a plurality of differently generated images, including the generated interactive image, embedded in the web page or the interactive application (para. [0204], note that FIG. 19 illustrates details of how a deployment server 300 running an AI framework may be architected; it may include one or more of a consumer interaction module 302, a service provider interaction module 304, an AI improvement engine 306, a database 308, and/or other elements).

The proposed rejection of claim 1 renders obvious the steps of the system of claim 19 (para. [0049], a system is disclosed; also see para. [0054], a mobile computing device associated with a user and transmitted to the one or more processors) and claim 20 (para. [0012], computer medium), because these steps occur in the operation of the proposed rejection as discussed above. Thus, arguments similar to those presented above for claim 1 are equally applicable to claims 19-20.

Claims 10, 12, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Eder and Beall, and further in view of Sadalgi et al. (US Publication Number 2022/0084296 A1, hereinafter “Sadalgi”).

(1) Regarding claim 10: Eder disclosed most of the subject matter as described above except for specifically teaching wherein the plurality of interactive features comprises options to modify one or more of lighting and shadows of the scene. However, Sadalgi disclosed wherein the plurality of interactive features comprises options to modify one or more of lighting and shadows of the scene (para. [0111], note that the lighting information indicates ambient light intensity in the physical scene, and setting lighting in the 3D scene in accordance with the lighting information and using the indication of the plane in the physical scene to generate lighting effects (e.g., shadows and/or reflections on a surface) in the 3D scene). At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein the plurality of interactive features comprises options to modify one or more of lighting and shadows of the scene.
The suggestion/motivation for doing so would have been in order to provide more accurate, photorealistic visualizations while reducing the overall computational resources needed by the user's device (para. [0102]). Therefore, it would have been obvious to combine Eder and Beall with Sadalgi to obtain the invention as specified in claim 10.

(2) Regarding claim 12: Eder disclosed most of the subject matter as described above except for specifically teaching wherein the plurality of interactive features comprises options to modify one or more of file type or format, file size, and file resolution of the generated interactive image. However, Sadalgi disclosed wherein the plurality of interactive features comprises options to modify one or more of file type or format, file size, and file resolution of the generated interactive image (para. [0166], note that the lower resolution 3D model may allow the XR system 102A of the computing device 102 to efficiently render the 3D model of the product in the XR Scene; also see para. [0199] for size and para. [0116] for format). At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein the plurality of interactive features comprises options to modify one or more of file type or format, file size, and file resolution of the generated interactive image. The suggestion/motivation for doing so would have been in order to provide more accurate, photorealistic visualizations while reducing the overall computational resources needed by the user's device (para. [0102]). Therefore, it would have been obvious to combine Eder and Beall with Sadalgi to obtain the invention as specified in claim 12.

(3) Regarding claim 15: Eder disclosed most of the subject matter as described above except for specifically teaching wherein a set of captured images of the one or more sets of captured images of the scene comprises images captured with different lighting conditions, different camera poses, or both. However, Sadalgi disclosed wherein a set of captured images of the one or more sets of captured images of the scene comprises images captured with different lighting conditions, different camera poses, or both (para. [0154], note that FIG. 1B shows a physical scene 108; the physical scene 108 includes object 1 108C and object 2 108D, as well as light source 1 108A and light source 2 108B; in one example, the physical scene 108 may be a space (e.g., a room or portion thereof) in a home of the user 110). At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein a set of captured images of the one or more sets of captured images of the scene comprises images captured with different lighting conditions, different camera poses, or both. The suggestion/motivation for doing so would have been in order to provide more accurate, photorealistic visualizations while reducing the overall computational resources needed by the user's device (para. [0102]). Therefore, it would have been obvious to combine Eder and Beall with Sadalgi to obtain the invention as specified in claim 15.

(4) Regarding claim 16: Eder disclosed most of the subject matter as described above except for specifically teaching wherein the generated interactive image is photorealistic. However, Sadalgi disclosed wherein the generated interactive image is photorealistic (para. [0100], note that a product is visualized in a physical scene by generating a high-quality, photorealistic two-dimensional (2D) image of the product within the physical scene (rather than a higher-resolution 3D image of the product)). At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein the generated interactive image is photorealistic. The suggestion/motivation for doing so would have been in order to provide more accurate, photorealistic visualizations while reducing the overall computational resources needed by the user's device (para. [0102]). Therefore, it would have been obvious to combine Eder and Beall with Sadalgi to obtain the invention as specified in claim 16.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Totty et al. (US Publication Number 2020/0302686 A1) disclosed a method for determining a visual scene virtual representation and a highly accurate visual scene-aligned geometric representation for virtual interaction. Rowell et al. (US Publication Number 2020/0342652 A1) disclosed systems and methods for generating synthetic image data including synthetic images, depth information, and optical flow data; embodiments of the invention assemble image scenes from virtual objects and capture realistic perspectives of image scenes as synthetic images, where realistic perspectives captured in synthetic images are defined by camera views created from camera settings files.

Any inquiry concerning this communication or earlier communication from the examiner should be directed to Hilina K Demeter, whose telephone number is (571) 270-1676. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Y. Poon, can be reached at (571) 270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HILINA K DEMETER/
Primary Examiner, Art Unit 2617

Prosecution Timeline

Oct 14, 2022: Application Filed
Aug 24, 2024: Non-Final Rejection — §103, §DP
Oct 24, 2024: Applicant Interview (Telephonic)
Oct 24, 2024: Examiner Interview Summary
Nov 27, 2024: Response Filed
Dec 10, 2024: Final Rejection — §103, §DP
Mar 13, 2025: Examiner Interview Summary
Mar 13, 2025: Request for Continued Examination
Mar 13, 2025: Applicant Interview (Telephonic)
Mar 15, 2025: Response after Non-Final Action
Apr 10, 2025: Non-Final Rejection — §103, §DP
Jul 15, 2025: Response Filed
Jul 16, 2025: Examiner Interview Summary
Jul 16, 2025: Applicant Interview (Telephonic)
Sep 23, 2025: Final Rejection — §103, §DP
Dec 16, 2025: Applicant Interview (Telephonic)
Dec 17, 2025: Examiner Interview Summary
Dec 23, 2025: Request for Continued Examination
Jan 09, 2026: Response after Non-Final Action
Jan 16, 2026: Non-Final Rejection — §103, §DP
Mar 31, 2026: Examiner Interview Summary
Mar 31, 2026: Applicant Interview (Telephonic)
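
The "OA Round 5 (Non-Final)" status at the top of the page is consistent with simply counting the rejection events in this timeline. A minimal Python sketch under that assumption follows; how the dashboard actually derives the round number is not stated.

```python
# A minimal sketch, assuming "OA Round 5 (Non-Final)" is derived by counting
# rejection events in the timeline (each non-final or final rejection
# advancing the round). Events transcribed from the page.

rejections = [
    ("2024-08-24", "Non-Final"),
    ("2024-12-10", "Final"),
    ("2025-04-10", "Non-Final"),
    ("2025-09-23", "Final"),
    ("2026-01-16", "Non-Final"),
]

oa_round = len(rejections)        # 5
latest_kind = rejections[-1][1]   # type of the most recent rejection
print(f"OA Round {oa_round} ({latest_kind})")  # -> "OA Round 5 (Non-Final)"
```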

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602864 — EVENT ROUTING IN 3D GRAPHICAL ENVIRONMENTS
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12592042 — SYSTEMS AND METHODS FOR MAINTAINING SECURITY OF VIRTUAL OBJECTS IN A DISTRIBUTED NETWORK
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12586297 — INTERACTIVE IMAGE GENERATION
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12579724 — EXPRESSION GENERATION METHOD AND APPARATUS, DEVICE, AND MEDIUM
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12561906 — METHOD FOR GENERATING AT LEAST ONE GROUND TRUTH FROM A BIRD'S EYE VIEW
Granted Feb 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 72%
With Interview: 91% (+19.4%)
Median Time to Grant: 3y 1m
PTA Risk: High
Based on 659 resolved cases by this examiner. Grant probability derived from career allow rate.
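
The page does not define how "PTA Risk" is scored. One plausible, purely illustrative input is whether pendency to date already exceeds the examiner's median time to grant; the sketch below assumes exactly that and nothing more, using dates shown on the page.

```python
# Sketch only: the page does not define "PTA Risk". Shown here, as an
# assumption, is one plausible input: whether pendency to date already
# exceeds the examiner's median time to grant. Dates taken from the page.

from datetime import date

filed = date(2022, 10, 14)   # Application Filed
as_of = date(2026, 4, 19)    # "Last updated" date on the page
median_months = 37           # "3y 1m" median time to grant

elapsed_months = (as_of.year - filed.year) * 12 + (as_of.month - filed.month)
print(f"Pendency to date: {elapsed_months} months (median to grant: {median_months})")
print("PTA Risk: High" if elapsed_months > median_months else "PTA Risk: Lower")
```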
