Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on October 9, 2025 has been considered by the examiner.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-5 and 7-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kim et al (US 2017/0212585).
As per claim 1 Kim et al depicts in figure 1 and discloses: An electronic device 100 comprising:
a display 160; a camera 180 configured to obtain an image; a memory 130 storing at least one instruction; and at least one processor 120 / 301 configured to execute the at least one instruction to cause the electronic device 100 to {figure 1}:
obtain spatial information about a real-world space based on the image obtained through the camera 180 {figure 3B: 301};
obtain user inputs based on the image obtained through the camera 180; obtain object characteristic information from the user inputs { [0052] For example, the data about the real object can include location information of each of real objects included in an image collected by the camera module 180 and distance information and relative distance information from the camera module 180, and the like. The data about the virtual object can include a size and a location of each of virtual objects output together with real objects generated by an input of the user, information about correlation with the real objects, and the like.};
obtain object generation information for generating a virtual object, based on the spatial information and the object characteristic information {figure 3B: 302};
generate the virtual object for the object generation information by inputting the object generation information to a generative artificial intelligence (AI) model trained to generate a three-dimensional (3D) virtual object based on information about a space and an object {figure 3B: 304 & [0087] The virtual object modeling unit 301e can model a virtual model operated by the operation object used by the user to generate a 3D virtual rendering space including the modeled virtual object. Location information of the virtual object modeled by the virtual object modeling unit 301e can be represented as a 3D coordinate on the 3D rendering space. Also, the virtual object modeling unit 301e can add physical characteristic information in a weightless state of a virtual object to model the virtual object.}; and
control the display 160 to display the virtual object {figure 3B: 305}.
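For illustration only, the claim 1 processing chain mapped above can be reduced to the following minimal sketch. Every function, name, and value is hypothetical; neither claim 1 nor Kim et al recites any particular implementation.

```python
# Hypothetical sketch of the claim 1 chain: image -> spatial information
# and user inputs -> object characteristic information -> object
# generation information -> trained generative model -> virtual object.

def extract_spatial_info(image):
    """Stand-in for scene analysis (type, category, color, theme, ...)."""
    return {"space_type": "living room", "theme": "modern"}

def extract_object_characteristics(user_inputs):
    """Stand-in for deriving shape/location/size from the user inputs."""
    return {"shape": "cube", "size_m": 0.5}

def generate_virtual_object(image, user_inputs, generative_model):
    spatial_info = extract_spatial_info(image)
    object_info = extract_object_characteristics(user_inputs)
    generation_info = {**spatial_info, **object_info}
    # The claim requires only that a trained generative model map the
    # object generation information to a 3D virtual object for display.
    return generative_model(generation_info)

# A trivial stand-in "model" that echoes its conditioning information.
virtual_object = generate_virtual_object(
    image=None, user_inputs=None,
    generative_model=lambda info: {"mesh": info})
```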
As per claim 2 Kim et al discloses: The electronic device 100 of claim 1, wherein the camera 180 comprises a first camera 180 configured to obtain a spatial image of the real-world space by capturing an image of the real-world space, and wherein the at least one processor 120 / 301 is further configured to execute the at least one instruction to cause the electronic device 100 to:
obtain the spatial information regarding at least one of a type, a category, a color, a theme, or an atmosphere of the real-world space from the spatial image obtained through the first camera 180 {[0069] The camera module 180 can collect image data in real time from an AR output. The camera module 180 can include at least one red, green, and blue (RGB) cameras, an infrared (IR) camera, a depth camera, or a thermal camera. In various embodiments, the image data can be generated by combining data collected by a plurality of cameras.}.
As per claim 3 Kim et al discloses: The electronic device 100 of claim 1, wherein the camera 180 comprises a second camera 180 configured to obtain a hand image by capturing an image of a hand of a user, and wherein the at least one processor 120 / 301 is further configured to execute the at least one instruction to cause the electronic device 100 to:
recognize a gesture input from the user in the hand image obtained through the second camera 180; and extract the object characteristic information regarding at least one of a shape, a location, or a size of the object from the gesture input {[0077] In various embodiments, an image collected by the camera module 211 can be used to generate a depth map and recognize a gesture of the user. Also, the camera module 211 can be used for 3D modeling/mapping of an object around the electronic device 201.}.
As per claim 4 Kim et al discloses: The electronic device 100 of claim 3, wherein the second camera 211 is configured as a depth camera 211 comprising at least one of a time-of-flight (ToF) camera, a stereo vision camera 211, or a light detection and ranging (LiDAR) sensor, and is configured to obtain a depth image by capturing the image of the hand of the user {[0076] In an embodiment, the camera module 211 can be a stereo camera. The stereo camera can capture one object to simultaneously obtain two images by mounting two capturing lenses. In another embodiment, the camera module 211 can include an IR output device and an IR camera. The IR output device can emit infrared rays to an object around the electronic device 201, and the IR camera can sense infrared rays reflected from the object.}, and wherein the at least one processor 120 / 301 is further configured to execute the at least one instruction to cause the electronic device 100 to:
recognize the gesture input from the user in the depth image obtained through the second camera 211 {[0077] In various embodiments, an image collected by the camera module 211 can be used to generate a depth map and recognize a gesture of the user.}.
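For illustration only, one way the claimed extraction of location and size from a gesture captured by a depth camera could be realized is sketched below. The bounding-box heuristic and all names are hypothetical assumptions, not a teaching of Kim et al.

```python
# Hypothetical sketch: object characteristic information (location,
# size) derived from a 3D gesture path recovered from depth images.
import numpy as np

def characteristics_from_gesture(path_xyz: np.ndarray) -> dict:
    """path_xyz: (N, 3) fingertip positions in camera coordinates."""
    lo, hi = path_xyz.min(axis=0), path_xyz.max(axis=0)
    return {
        "location": ((lo + hi) / 2).tolist(),  # center of the gesture
        "size": (hi - lo).tolist(),            # extent along each axis
    }

# Example: a gesture roughly sweeping out a 0.4 x 0.3 x 0.2 m volume.
path = np.random.rand(100, 3) * np.array([0.4, 0.3, 0.2])
print(characteristics_from_gesture(path))
```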
As per claim 5 Kim et al discloses: The electronic device 100 of claim 1, further comprising: a touch screen configured to receive a touch input from a user, wherein the at least one processor 120 / 301 is further configured to execute the at least one instruction to cause the electronic device 100 to:
recognize a gesture input from the touch input received through the touch screen { [0059] The display 160 can include a touch screen, and can receive, for example, a touch, a gesture, proximity, or a hovering input using an electronic pen or part of a body of the user.}; and
extract the object characteristic information regarding at least one of a shape, a location, or a size of the object from the gesture input {[0103] For example, the user input can be menu selection on a graphic user interface (GUI). For another example, the user input can be an input according to a specific gesture or a specific input pattern of the user.}.
As per claim 7 Kim et al discloses: The electronic device 100 of claim 1, wherein the at least one processor 120 / 301 is further configured to execute the at least one instruction to cause the electronic device 100 to:
obtain a two-dimensional (2D) guide image 731 {figure 7}; and
extract the object characteristic information 741 comprising at least one of a type, a shape, a color, or a theme of the object from the 2D guide image 731 {figure 7}.
As per claim 8 Kim et al discloses: The electronic device 100 of claim 1, wherein the object characteristic information comprises at least one of first object characteristic information obtained from spatial information, second object characteristic information obtained from a gesture input, third characteristic information obtained from a voice input, or fourth characteristic information obtained from a 2D guide image {Note: the fourth characteristic set forth by way of alternative language is disclosed}, wherein the at least one processor 120 / 301 is further configured to execute the at least one instruction to cause the electronic device 100 to:
convert the first object characteristic information into first feature data by performing vector embedding on the spatial information;
convert the second object characteristic information into second feature data by performing vector embedding on the second object characteristic information obtained from the gesture input;
convert the third object characteristic information into third feature data by performing vector embedding on the third object characteristic information obtained from the voice input;
convert the fourth object characteristic information into fourth feature data by performing vector embedding on the fourth object characteristic information obtained from the 2D guide image; and obtain feature data representing the object generation information based on the first to fourth feature data { [0190] In an embodiment, the processor 120 can extract a second portion in the 3D space information included in a range corresponding to a path of a user input and can measure a distance between the second portion and the path of the user input. For example, if a plane normal vector corresponding to the path of the user input is x and if a plane normal vector corresponding to the second portion is y, a distance between the path of the user input and the second portion can be calculated through a vector inner product (x·y).}.
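For illustration only, the per-modality vector embedding recited in claim 8, together with the inner-product computation quoted from Kim at [0190], can be sketched as follows. The toy embedding and all names are hypothetical; neither the claim nor Kim fixes a particular embedding.

```python
# Hypothetical sketch: four modality-specific characteristic sets are
# vector-embedded and combined into feature data representing the
# object generation information.
import numpy as np

DIM = 8

def embed(tokens, dim=DIM):
    """Toy deterministic stand-in for a learned vector embedding."""
    rng = np.random.default_rng(abs(hash(tuple(tokens))) % (2**32))
    return rng.standard_normal(dim)

f1 = embed(["living room", "modern"])  # first: from spatial information
f2 = embed(["cube", "0.5 m"])          # second: from the gesture input
f3 = embed(["blue", "chair"])          # third: from the voice input
f4 = embed(["wooden", "armrest"])      # fourth: from the 2D guide image

# Feature data representing the object generation information.
feature_data = np.concatenate([f1, f2, f3, f4])

# Kim [0190]: distance between a plane normal vector x for the input
# path and a plane normal vector y for a portion of the 3D space,
# calculated through a vector inner product (x . y).
x, y = np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.6, 0.8])
distance = float(x @ y)
```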
As per claim 9 Kim et al discloses: The electronic device 100 of claim 8, wherein the at least one processor 120 / 301 is further configured to execute the at least one instruction to cause the electronic device 100 to: modify the object generation information based on the user inputs {figure 3C: 350 & figure 7}.
As per claim 10, since the alternative language is set forth, Kim et al discloses: The electronic device 100 of claim 9, wherein the at least one processor 120 / 301 is further configured to execute the at least one instruction to cause the electronic device 100 to: modify the object generation information by adjusting, based on the user inputs, a weight value assigned to each of the first object characteristic information, the second object characteristic information extracted from the gesture input, the third object characteristic information obtained from the voice input, and the fourth object characteristic information obtained from the 2D guide image {figure 3C: 350 & figure 7}.
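For illustration only, the weight adjustment of claim 10 can be sketched as a weighted fusion of the four modality feature vectors, with a user input raising or lowering a per-modality weight. The weight values and names below are hypothetical.

```python
# Hypothetical sketch: object generation information modified by
# adjusting the weight value assigned to each modality's features.
import numpy as np

def fuse(features, weights):
    """Weighted combination of the four modality feature vectors."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # keep the relative emphasis normalized
    return sum(wi * fi for wi, fi in zip(w, features))

features = [np.ones(4), 2 * np.ones(4), 3 * np.ones(4), 4 * np.ones(4)]
weights = [1.0, 1.0, 1.0, 1.0]   # initial, equal emphasis
weights[3] *= 2.0                # user input: emphasize the 2D guide
generation_features = fuse(features, weights)
```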
As per claim 11 Kim et al discloses: A method, performed by an electronic device 100, of generating a virtual object, the method comprising:
obtaining an image through a camera 180; obtaining user inputs based on the image obtained through the camera 180 {figure 1};
obtaining spatial information about a real-world space based on the image {figure 3B: 301};
obtaining object characteristic information from the user inputs { [0052] For example, the data about the real object can include location information of each of real objects included in an image collected by the camera module 180 and distance information and relative distance information from the camera module 180, and the like. The data about the virtual object can include a size and a location of each of virtual objects output together with real objects generated by an input of the user, information about correlation with the real objects, and the like.};
obtaining object generation information for generating the virtual object, based on the spatial information and the object characteristic information {figure 3B: 302};
generating the virtual object for the object generation information by inputting the object generation information to a generative artificial intelligence (AI) model trained to generate a three-dimensional (3D) virtual object based on information about a space and an object {figure 3B: 304 & [0087] The virtual object modeling unit 301e can model a virtual model operated by the operation object used by the user to generate a 3D virtual rendering space including the modeled virtual object. Location information of the virtual object modeled by the virtual object modeling unit 301e can be represented as a 3D coordinate on the 3D rendering space. Also, the virtual object modeling unit 301e can add physical characteristic information in a weightless state of a virtual object to model the virtual object.}; and
displaying, with a display 160, the virtual object {figure 3B: 305}.
As per claim 12 Kim et al discloses: The method of claim 11, wherein the object characteristic information comprises at least one of first object characteristic information obtained from spatial information, second object characteristic information obtained from a gesture input, third characteristic information obtained from a voice input, or fourth characteristic information obtained from a 2D guide image {Note: the fourth characteristic set forth by way of alternative language is disclosed}, and wherein the generating of the object generation information comprises:
converting the first object characteristic information into first feature data by performing vector embedding on the spatial information;
converting the second object characteristic information into second feature data by performing vector embedding on the second object characteristic information obtained from the gesture input;
converting the third object characteristic information into third feature data by performing vector embedding on the third object characteristic information obtained from the voice input;
converting the fourth object characteristic information into fourth feature data by performing vector embedding on the fourth object characteristic information obtained from the 2D guide image; and obtaining feature data representing the object generation information based on the first to fourth feature data { [0190] In an embodiment, the processor 120 can extract a second portion in the 3D space information included in a range corresponding to a path of a user input and can measure a distance between the second portion and the path of the user input. For example, if a plane normal vector corresponding to the path of the user input is x and if a plane normal vector corresponding to the second portion is y, a distance between the path of the user input and the second portion can be calculated through a vector inner product (x·y).}.
As per claim 13 Kim et al discloses: The method of claim 12, further comprising: receiving the user inputs for modifying the object generation information; and modifying the object generation information based on the user inputs {figure 3C: 350 & figure 7}.
As per claim 14 Kim et al discloses: The method of claim 13, wherein the modifying of the object generation information comprises modifying the object generation information by adjusting, based on the user inputs, a weight value assigned to each of the first object characteristic information, the second object characteristic information obtained from the gesture input, the third object characteristic information obtained from the voice input, and the fourth object characteristic information obtained from the 2D guide image {figure 3C: 350 & figure 7}.
As per claim 15 Kim et al discloses: A computer program product comprising a computer-readable storage medium, wherein the computer-readable storage medium comprises instructions that are readable by an electronic device 100 to:
obtain an image through a camera 180; obtain user inputs based on the image obtained through the camera 180 {figure 1};
obtain spatial information about a real-world space based on the image {figure 3B: 301};
obtain object characteristic information from the user inputs { [0052] For example, the data about the real object can include location information of each of real objects included in an image collected by the camera module 180 and distance information and relative distance information from the camera module 180, and the like. The data about the virtual object can include a size and a location of each of virtual objects output together with real objects generated by an input of the user, information about correlation with the real objects, and the like.};
obtain object generation information for generating a virtual object, based on the spatial information and the object characteristic information {figure 3B: 302};
generate the virtual object for the object generation information by inputting the object generation information to a generative artificial intelligence (AI) model trained to generate a three-dimensional (3D) virtual object based on information about a space and an object {figure 3B: 304 & [0087] The virtual object modeling unit 301e can model a virtual model operated by the operation object used by the user to generate a 3D virtual rendering space including the modeled virtual object. Location information of the virtual object modeled by the virtual object modeling unit 301e can be represented as a 3D coordinate on the 3D rendering space. Also, the virtual object modeling unit 301e can add physical characteristic information in a weightless state of a virtual object to model the virtual object.}; and
display, with a display 160, the virtual object {figure 3B: 305}.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Kim et al (US 2017/0212585) in view of Fallon (US 2018/0004481).
As per claim 6 Kim et al discloses: The electronic device 100 of claim 1, further comprising: a microphone configured to receive a voice input from a user, wherein the at least one processor 120 / 301 is further configured to execute the at least one instruction to cause the electronic device 100 to:
obtain a speech signal from the voice input { [0246] The processor 120 can use ambient information such as a voice during an input operation.} received through the microphone { [0335] The audio module 1980 can interchangeably convert a sound into an electric signal. At least some of components of the audio module 1980 can be included in, for example, an I/O interface 150 shown in FIG. 1. The audio module 1980 can process sound information input or output through, for example, a speaker 1982, a receiver 1984, an earphone 1986, or the microphone 1988, and the like. };
Regarding claim 6 Kim et al is silent as to: converting the speech signal into text; and extracting the object characteristic information comprising at least one of a type, a shape, a color, or a theme of the object by analyzing the text using a natural language understanding (NLU) model. With respect to claim 6 Fallon depicts in figure 2: converting the speech signal into text {figure 2: 226}; and extracting the object characteristic information comprising at least one of a type, a shape, a color, or a theme of the object by analyzing the text using a natural language understanding (NLU) model {figure 2: 228}.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have the electronic device of Kim et al convert the speech signal into text and extract the object characteristic information comprising at least one of a type, a shape, a color, or a theme of the object by analyzing the text using a natural language understanding (NLU) model, as taught by Fallon. One of ordinary skill in the art would have been motivated to make this combination so as to provide a user-friendly, hands-free input by using voice commands.
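For illustration only, the claim 6 chain, speech signal to text to NLU extraction of a type, shape, color, or theme, can be sketched as below. The keyword matching is a toy stand-in for the trained recognizer and NLU model of Fallon (figure 2: 226 and 228); all names are hypothetical.

```python
# Hypothetical sketch: speech -> text -> object characteristics.
COLORS = {"red", "blue", "green", "white"}
SHAPES = {"cube", "sphere", "round", "square"}
TYPES = {"chair", "table", "lamp", "sofa"}

def speech_to_text(speech_signal: bytes) -> str:
    """Stand-in for an automatic speech recognition step (Fallon 226)."""
    return "add a blue round table"

def extract_characteristics(text: str) -> dict:
    """Toy NLU stand-in (Fallon 228): slot-fill color/shape/type."""
    words = set(text.lower().split())
    return {
        "color": next(iter(words & COLORS), None),
        "shape": next(iter(words & SHAPES), None),
        "type": next(iter(words & TYPES), None),
    }

print(extract_characteristics(speech_to_text(b"...")))
# {'color': 'blue', 'shape': 'round', 'type': 'table'}
```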
Response to Arguments
Applicant's arguments filed November 10, 2025 have been fully considered but they are not persuasive. In the second full paragraph on page 2, applicant asserts the following:
The citation to process step 304 in Fig. 3B is merely a general statement of modeling a 3D virtual object. Neither paragraph [0087] nor step 304 describes a generative AI model trained to generate the virtual object.
First, the claims only require a model generating a three-dimensional virtual object, which Kim et al discloses, since applicant has not set forth anything in the claims regarding the “generative AI” architecture. Also, figure 3 of the instant application merely shows a box numbered 146. The instant specification in [0094] discloses:
The generative AI model 146 is an AI model trained to generate a 3D virtual object based on spatial information and object characteristic information. In one or more embodiments of the disclosure, the generative AI model 146 may be implemented as a GAN or a diffusion model based on multi-modalities. A diffusion model based on multi-modalities may be implemented, for example, as Stable Diffusion or SDFusion. However, the generative AI model 146 is not limited thereto, and may include any known AI model in the art trained to generate a 3D virtual object based on spatial information and object characteristic information.
The description in the specification, which simply discloses known and proprietary AI models, does not further enable model 146 shown in figure 3 or the claims. Therefore, the model in the claimed invention is not unlike the model of Kim et al shown in figure 3B: 304 and [0087].
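For illustration only, the claim-level interface discussed above, conditioning information in and a 3D virtual object out, can be sketched as follows. Consistent with the breadth noted above, a GAN or a multi-modal diffusion backbone (the specification's own examples) could equally sit behind this signature; the stub body is hypothetical.

```python
# Hypothetical sketch of the interface the claim language requires of
# the generative model; the internal architecture is unconstrained.
from typing import Protocol

class Generative3DModel(Protocol):
    def __call__(self, spatial_info: dict, object_info: dict) -> dict:
        """Return a 3D virtual object (e.g., a mesh representation)."""
        ...

def stub_model(spatial_info: dict, object_info: dict) -> dict:
    # Placeholder satisfying the interface; a trained model would
    # synthesize geometry conditioned on both inputs.
    return {"vertices": [], "faces": [],
            "conditioning": {**spatial_info, **object_info}}

obj = stub_model({"theme": "modern"}, {"shape": "cube"})
```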
Additionally in the paragraph bridging pages 2 and 3 applicant asserts the following:
Until the Examiner cites an express teaching of a trained generative AI model in Kim or explains why the gesture-manipulated model of Kim is not merely a user-edited 3D rendering done in real-time, the anticipation rejection cannot stand. In other words, the mention of a 3D model is not expressly or inherently a generative artificial intelligence (AI) model trained to generate a three-dimensional (3D) virtual object. The above comments explain why the Examiner's citations to Kim does not expressly or inherently teach a trained generative AI model. It is the Examiner's burden to show where the trained generative AI model is taught or suggested. Until then, the anticipation rejection is incomplete, improper and is based on unsupportable assumption.
Applicant's adversarial and inaccurate assertion with respect to the rejection supra fails to clearly and specifically point out why the generative AI model in the claimed invention is novel. The claimed invention of claim 1 does not preclude rendering done in real time, and the claimed invention does not provide limiting detail as to what the “generative artificial intelligence (AI) model” is “trained” to do beyond rendering a virtual object. Therefore, the claimed invention is not unlike the modeling Kim et al discloses and teaches.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID D DAVIS whose telephone number is (571)272-7572. The examiner can normally be reached Monday - Friday, 8 a.m. - 4 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ke Xiao, can be reached at 571-272-7776. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DAVID D DAVIS/Primary Examiner, Art Unit 2627
DDD