Prosecution Insights
Last updated: April 19, 2026
Application No. 18/279,752

VIRTUAL OBJECT PLACEMENT BASED ON REFERENTIAL EXPRESSIONS

Status: Non-Final OA (§103)
Filed: Aug 31, 2023
Examiner: GALERA, PATRICK PAUL CONTRER
Art Unit: 2617
Tech Center: 2600 — Communications
Assignee: Apple Inc.
OA Round: 3 (Non-Final)

Grant Probability: 86% (Favorable)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 2y 5m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 86% (above average; 6 granted / 7 resolved; +23.7% vs TC avg)
Interview Lift: strong, +16.7% among resolved cases with interview
Typical Timeline: 2y 5m avg prosecution; 21 applications currently pending
Career History: 28 total applications across all art units

Statute-Specific Performance

§101: 2.1% (-37.9% vs TC avg)
§103: 72.9% (+32.9% vs TC avg)
§102: 18.8% (-21.2% vs TC avg)
§112: 5.2% (-34.8% vs TC avg)

Tech Center averages are estimates. Based on career data from 7 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments/Amendments

A corrected abstract of the disclosure filed on February 5, 2026 has been fully considered. The objection to the abstract of the disclosure is withdrawn. The objection to claim 10 is withdrawn in accordance with the applicant's amendment filed on February 5, 2026. Applicant's arguments/amendment filed on February 5, 2026 with respect to independent claims 1, 22, and 23 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-7, 9-10, 13-14, 18, 22-23, and 25-26 are rejected under 35 U.S.C. 103 as being unpatentable over Fisher (US 20200151277 A1, hereinafter “Fisher”) in view of Moon et al. (US 20170316268 A1, hereinafter “Moon”) further in view of Ren et al. (US 20220207872 A1, hereinafter “Ren”).

Regarding claim 1: Fisher teaches: An electronic device, comprising: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors (Fisher: ¶113, "Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments within the scope of the present disclosure also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. In particular, one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer-readable medium and executable by one or more computing devices (e.g., any of the media content access devices described herein).
In general, a processor (e.g., a microprocessor) receives instructions, from a non-transitory computer-readable medium, (e.g., a memory, etc.), and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein"), the one or more programs including instructions for: receiving a speech input (Fisher: ¶42, "In one or more embodiments, the series of acts 200 includes a first act 202 of identifying a natural language phrase. Specifically, the 3D modeling system 102 identifies a natural language phrase based on a user input by a user. The user input can include a text input (e.g., via keyboard or touchscreen) or a voice input (e.g., via microphone). The 3D modeling system 102 can also determine that a user input includes a plurality of natural language phrases and then process each phrase individually. Alternatively, the 3D modeling system 102 can determine that a plurality of natural language phrases combine to form a single request and process the plurality of natural language phrases together") including a referenced virtual object (Fisher: ¶58, "FIG. 3A illustrates a dependency tree 300 that the 3D modeling system 102 generates for a natural language phrase. Specifically, FIG. 3A illustrates a natural language phrase that reads “Put some books on the desk.” The dependency tree illustrates dependencies (including dependency/relationship types) involving the different phrase components in the natural language phrase. The dependency tree provides a consistent lexical structure that the 3D modeling system can use to identify different words in a phrase"); (NOTE 1A: The “books” in “Put some books on the desk” is the referenced virtual object.) obtaining, based on the speech input, a first reference set (Fisher: ¶70, "While FIG. 3B illustrates an entity-command representation 302 for the phrase of FIG. 3A, the 3D modeling system 102 can generate entity-command representations for more complex phrases involving any number of nouns and commands. For instance, a natural language phrase can include a plurality of requests. The 3D modeling system 102 can parse the natural language phrase to determine each separate request and then generate separate entity-command representations for each request. To illustrate, for a natural language phrase that states, “Move the chairs around the dining table farther apart and transfer some of the books on the desk to a table,” the 3D modeling system 102 can determine that the natural language phrase includes a first request to move the chairs around the table farther apart and a second request to transfer some books from a desk to a table"; Fisher: ¶73, "After generating the entity-command representation 302, the 3D modeling system 102 generates a semantic scene graph 316 corresponding to the entity-command representation 302, as illustrated in FIG. 3C. In one or more embodiments, the 3D modeling system 102 generates the semantic scene graph 316 to create a representation that the 3D modeling system 102 can use to easily identify a previously generated three-dimensional scene that most closely corresponds to the requested three-dimensional scene. As described below, the semantic scene graph 316 includes a plurality of nodes and edges that indicate spatial relationships between objects in a three-dimensional scene"); (NOTE 1B: The semantic scene graph 316 is the first reference set. Also see Fisher: Fig. 3C.)
comparing the first reference set to a plurality of second reference sets (Fisher: ¶87, "FIG. 5 illustrates a plurality of available three-dimensional scenes from a content database identified based on the semantic scene graph 316 of FIG. 3C. Specifically, a first scene 500 includes a table with a single book on top of the table. A second scene 502 includes a desk with six books on top of the desk. A third scene 504 includes a bookcase with three books on a shelf within the bookcase. The 3D modeling system 102 compares the semantic scene graph 316 to the semantic scene graphs of each scene and then selects at least one scene that most closely match the intended request of the natural language phrase"; Fisher: ¶35, "In one or more embodiments, the 3D modeling system 102 also accesses the content database 106 to obtain semantic scene graph information for the plurality of available three-dimensional scenes. In particular, the 3D modeling system 102 can obtain semantic scene graphs of the plurality of available three-dimensional scenes in the content database 106. The 3D modeling system 102 can obtain the semantic scene graphs by analyzing the three-dimensional scenes and then generating the semantic scene graphs. Alternatively, the 3D modeling system 102 obtains pre-generated semantic scene graphs from the content database 106 or third-party system"); (NOTE 1C: Fisher teaches a plurality of second reference sets (the semantic scene graphs of each three-dimensional scene; scenes 500, 502, and 504) as discussed in Fisher: ¶35 and ¶87, and compares the first reference set, which is the semantic scene graph 316, to the semantic scene graphs of each three-dimensional scene (scenes 500, 502, and 504).) identifying an object based on the second reference set (Fisher: ¶88, “The 3D modeling system 102 selects the second scene 502 (note: the selected second reference set) for use in generating the three-dimensional scene of the natural language phrase. The 3D modeling system selects the second scene 502 because the second scene 502 is most similar to the semantic scene graph 316 (i.e., the second scene 502 includes a desk with books on top of the desk). In particular, even though the second scene 502 has some differences based on the semantic scene graph of the second scene 502 (i.e., six books instead of three books as in the semantic scene graph 316), the second scene 502 is still closer to the requested scene than other available scenes (e.g., books on a desk, rather than books on a table or in a bookcase). Furthermore, because the requested scene included an imprecise count (“some”) with reference to the books entity, the 3D modeling system 102 can determine that the second scene 502 is an acceptable deviation from the semantic scene graph 316. Additionally, the 3D modeling system 102 may require certain entities, commands, and/or other properties to be shared for a scene to be a close match”; Fisher: ¶89, "As briefly mentioned previously, the 3D modeling system 102 can generate the three-dimensional scene for the natural language phrase using an available three-dimensional scene by inserting objects of the selected, available three-dimensional scene (or from a plurality of available three-dimensional scenes) into a workspace of the user. Specifically, the 3D modeling system 102 can select an object in the available three-dimensional scene based on an object identifier that corresponds to the object. The 3D modeling system 102 can then copy the object, along with one or more properties of the object, and paste/duplicate the object in the user's workspace.
The 3D modeling system 102 can perform copy-paste operations for a plurality of objects in a selected, available three-dimensional scene until the 3D modeling system 102 determines that the requested three-dimensional scene is complete. To illustrate, the 3D modeling system 102 can copy the desk and books of the second scene 502 of FIG. 4 and paste the copied objects into a user workspace at a client device of the user"); (NOTE 1E: Because the 3D modeling system can copy the desk and books of the second scene 502 (second reference set), it can identify objects based on the second reference set.) and displaying, based on the identified object, the referenced virtual object (Fisher: ¶89, "As briefly mentioned previously, the 3D modeling system 102 can generate the three-dimensional scene for the natural language phrase using an available three-dimensional scene by inserting objects of the selected, available three-dimensional scene (or from a plurality of available three-dimensional scenes) into a workspace of the user. Specifically, the 3D modeling system 102 can select an object in the available three-dimensional scene based on an object identifier that corresponds to the object. The 3D modeling system 102 can then copy the object, along with one or more properties of the object, and paste/duplicate the object in the user's workspace. The 3D modeling system 102 can perform copy-paste operations for a plurality of objects in a selected, available three-dimensional scene until the 3D modeling system 102 determines that the requested three-dimensional scene is complete. To illustrate, the 3D modeling system 102 can copy the desk (note: identified object) and books (note: the referenced virtual object) of the second scene 502 of FIG. 4 and paste the copied objects into a user workspace at a client device of the user"; Fisher: ¶90, "In alternative embodiments, the 3D modeling system 102 presents the selected three-dimensional scene to the user to allow the user to insert the objects into a workspace. For example, the 3D modeling system 102 can cause the client device of the user to open a new workspace within a client application. The user can then select one or more of the objects in the client application and move the objects into a previously existing workspace or leave the objects in the new workspace to start a new project"). (NOTE 1F: When the desk and books are pasted into a user workspace, the desk and books are visually displayed.) updating the plurality of second reference sets (Fisher: ¶39, “. . . After generating the three-dimensional scene, the 3D modeling system 102 can store the three-dimensional scene at the content database 106 for using to generate future three-dimensional scenes”).
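For orientation, the Fisher pipeline mapped above (speech input to semantic scene graph, graph comparison, scene selection; Fisher ¶¶73, 85-89) can be pictured with the following minimal sketch. The triple representation, the names (SceneGraph, match_score), and the scoring formula are illustrative assumptions made for this Insights page, not Fisher's actual data structures.

```python
# Minimal sketch of the scene-graph matching Fisher describes (¶¶73, 85-89).
# The triple representation and scoring are illustrative assumptions only.
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (object, relation, object), e.g. ("books", "on", "desk")

@dataclass
class SceneGraph:
    scene_id: str
    triples: set[Triple] = field(default_factory=set)

def match_score(request: SceneGraph, candidate: SceneGraph) -> int:
    """Comparison score per Fisher ¶86: count overlapping nodes and edges."""
    shared_triples = request.triples & candidate.triples
    request_nodes = {n for s, _, o in request.triples for n in (s, o)}
    candidate_nodes = {n for s, _, o in candidate.triples for n in (s, o)}
    return 2 * len(shared_triples) + len(request_nodes & candidate_nodes)

# First reference set, derived from "Put some books on the desk".
request = SceneGraph("request", {("books", "on", "desk")})

# Plurality of second reference sets (scenes 500/502/504 in Fisher Fig. 5).
candidates = [
    SceneGraph("scene_500", {("book", "on", "table")}),
    SceneGraph("scene_502", {("books", "on", "desk")}),
    SceneGraph("scene_504", {("books", "in", "bookcase")}),
]

# Obtain the second reference set with the highest matching score (¶¶86, 112).
best = max(candidates, key=lambda c: match_score(request, c))
assert best.scene_id == "scene_502"  # the desk-with-books scene is closest
```

As in Fisher ¶88, the desk-with-books scene wins under this toy scoring because it shares the most nodes and edges with the requested graph.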
However, Fisher fails to teach: receiving image information associated with a device environment; identifying a plurality of objects from the image information; identifying a plurality of relationships between objects in the plurality of objects; generating a plurality of second reference sets based on the identified objects and identified plurality of relationships; detecting a change in the image information associated with the device environment; updating the plurality of second reference sets based on the change in the image information; (NOTE: Fisher's second reference sets are pre-generated three-dimensional scenes from a content database.) The analogous art Moon teaches: receiving image information (paragraph 4, upload video (image); NOTE: The image information received includes object information of tracked objects in the input video; the input video itself is made up of image frames, which are also image information) associated with a device environment (NOTE: Moon's invention is designed to interpret diverse events from videos or images captured from a user's smartphone, as described in Moon paragraph 4. When a user records a video, the frames of the video depict the scene surrounding the device, and because the captured frames depict objects and spatial relationships within the field of view of the device's image sensor or camera (which Moon's system identifies), the image information is inherently associated with a device environment. When a user records a video, it captures the environment associated with the device.) (Moon: ¶42, “The object information generation unit 110 may generate object information based on objects in an input video. Here, the object information generation unit 110 may track the objects in the video and generate object information corresponding to the tracked objects. The object information may include the ID, the object type, the time interval, and the spatial region of each of the objects. The time interval of an object may consist of start and end frame numbers or start and end time corresponding to each of the objects. The spatial region may be represented by a Minimum Bounding Polygon (MBP) including each of the objects, for each frame, during the time interval. Therefore, the spatial region may correspond to MBP information ranging from the start frame to the end frame of each of the objects. In this case, when the spatial region is stored in the shape of a rectangle, selected from among polygons, it may include the spatial region frame, X axis coordinate, Y axis coordinate, horizontal length (width), and vertical length (height) of each object. Here, when the spatial region is stored in the shape of a typical polygon, it may correspond to a set of coordinate points constituting the polygon. For example, the set of coordinate points may be {(x.sub.1, y.sub.1), (x.sub.2, y.sub.2), . . . , (x.sub.n, y.sub.n)}”); identifying a plurality of objects from the image information (Moon: ¶42, “The object information generation unit 110 may generate object information based on objects in an input video. Here, the object information generation unit 110 may track the objects in the video and generate object information corresponding to the tracked objects. The object information may include the ID, the object type, the time interval, and the spatial region of each of the objects. The time interval of an object may consist of start and end frame numbers or start and end time corresponding to each of the objects.
The spatial region may be represented by a Minimum Bounding Polygon (MBP) including each of the objects, for each frame, during the time interval. Therefore, the spatial region may correspond to MBP information ranging from the start frame to the end frame of each of the objects. In this case, when the spatial region is stored in the shape of a rectangle, selected from among polygons, it may include the spatial region frame, X axis coordinate, Y axis coordinate, horizontal length (width), and vertical length (height) of each object. Here, when the spatial region is stored in the shape of a typical polygon, it may correspond to a set of coordinate points constituting the polygon. For example, the set of coordinate points may be {(x.sub.1, y.sub.1), (x.sub.2, y.sub.2), . . . , (x.sub.n, y.sub.n)}”); identifying a plurality of relationships between objects in the plurality of objects (Moon: ¶44-45, “[0044] The dynamic spatial relation generation unit 120 may generate dynamic spatial relations between the objects based on the object information. Each dynamic spatial relation may include a relation type which is based on variation in a spatial relation between the objects, time interval information, and a spatial region of a relation. [0045] For example, Into(A, B) may be a dynamic spatial relation in which object A is present near object B and then enters object B and is not seen any more. Further, On(A, B) may be a dynamic spatial relation in which object A is disposed on the surface of object B. That is, On(A, B) may be the dynamic spatial relation in which spatial regions for object A and object B intersect each other.”); generating a plurality of second reference sets (Moon: Figs. 3-4; the image frames; NOTE: an image frame contains the object together with other objects and the background; the second reference set is equivalent to the images in the image frames) based on the identified plurality of objects (Moon: ¶63, “. . . object information . . . ID . . . type . . . spatial region . . .”) and identified plurality of relationships (Moon: ¶45, “For example, Into(A, B) may be a dynamic spatial relation in which object A is present near object B and then enters object B and is not seen any more. Further, On(A, B) may be a dynamic spatial relation in which object A is disposed on the surface of object B. That is, On(A, B) may be the dynamic spatial relation in which spatial regions for object A and object B intersect each other.”; Moon: ¶65, “The general event information may store a relation ID (EventID), a generation type (EventType), generation interval information (StartFrame and EndFrame), video information including a sentence (Semantics), and video information including an event description (EventTracks). Here, one or more pieces of video information (Semantics 1, . . . ∞), each including a sentence, may be stored. Further, the video information including a sentence may store a verb or a verb phrase (Verb), a subject (Subject), and objects (Object1 and Object2). Here, video information including an event description (EventTracks) may store one or more pieces of generation area information (EventTracks 1, . . . , ∞). Here, the generation area information (EventTrack) may include the generation area frame (frameNum), X axis coordinate (X), Y axis coordinate (Y), width (W), and height (H) of each general event.”) (Moon: ¶63, “[0063] Referring to FIG. 3, the data structure of the video descriptor storage unit 150 may be implemented as a data structure corresponding to objects and events.
Here, the video descriptor storage unit 150 may store objects such that one or more pieces of object information (objects 1, . . . , ∞) are stored. The object information may include an object ID (ObjectID), object type information (ObjectType), start frame and start time information (StartFrame), end frame and end time information (EndFrame), and a spatial region (ObjectTracks). For each frame, one or more pieces of the spatial region (ObjectTracks) (e.g. Object Tracks 1, . . . , ∞) may be stored. Here, the spatial region (ObjectTracks) may include the spatial region frame (frameNum), X axis coordinate (X), Y axis coordinate (Y), width (W), and height (H) of each object”; [0065] The general event information may store a relation ID (EventID), a generation type (EventType), generation interval information (StartFrame and EndFrame), video information including a sentence (Semantics), and video information including an event description (EventTracks). Here, one or more pieces of video information (Semantics 1, . . . ∞), each including a sentence, may be stored. Further, the video information including a sentence may store a verb or a verb phrase (Verb), a subject (Subject), and objects (Object1 and Object2). Here, video information including an event description (EventTracks) may store one or more pieces of generation area information (EventTracks 1, . . . , ∞). Here, the generation area information (EventTrack) may include the generation area frame (frameNum), X axis coordinate (X), Y axis coordinate (Y), width (W), and height (H) of each general event”); detecting a change in the image information associated with the device environment (Moon: ¶72, “. . . dynamic spatial relations between objects may be generated based on the object information. Each dynamic spatial relation may include a relation type which is based on variation in a spatial relation between the objects, time interval information, and a spatial region of a relation”; Moon: ¶73, “For example, Into(A, B) may be a dynamic spatial relation in which object A is present near object B and then enters object B and is not seen any more. Further, On(A, B) may be a dynamic spatial relation in which object A is disposed on the surface of object B. That is, On(A, B) may be the dynamic spatial relation in which spatial regions for object A and object B intersect each other”; Moon: ¶49, “For example, the general event information may include GoIn(A, B)(‘A goes into B’) corresponding to Into1(A, B), HeadFor(A, B)(‘A heads for B’) corresponding to Into2(A, B), and CrashInto(A, B)(‘A crashes into B’) corresponding to Into3(A, B)”) (NOTE: Moon's invention is designed to interpret a video and can identify dynamic spatial relations and event information in an input video. For example, Moon's system can identify that an object A is present near object B, and that object A then enters object B and is not seen any more; it can therefore detect a change in the image information, as object A is visible in one frame and then is not seen any more in another frame. As discussed above, an input video recorded using a smartphone depicts the scenes surrounding the image sensor or camera and is therefore inherently associated with the device environment.); updating the plurality of second reference sets based on the change in the image information (Moon: ¶63, “. . . For each frame, one or more pieces of the spatial region (ObjectTracks) (e.g. Object Tracks 1, . . . , ∞) may be stored . . .”; Moon: ¶65, “The general event information may store a relation ID (EventID), a generation type (EventType), generation interval information (StartFrame and EndFrame), video information including a sentence (Semantics), and video information including an event description (EventTracks). Here, one or more pieces of video information (Semantics 1, . . . ∞), each including a sentence, may be stored. Further, the video information including a sentence may store a verb or a verb phrase (Verb), a subject (Subject), and objects (Object1 and Object2). Here, video information including an event description (EventTracks) may store one or more pieces of generation area information (EventTracks 1, . . . , ∞). Here, the generation area information (EventTrack) may include the generation area frame (frameNum), X axis coordinate (X), Y axis coordinate (Y), width (W), and height (H) of each general event”) (NOTE: the video continuously shows the changes in relationships between objects and the background in the image frames; each new frame is an update to the older frames as time moves on.)
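As a rough illustration of how the Moon citations above fit together: track per-frame bounding regions, derive spatial relations from how the regions intersect, and refresh the stored relations as frames change. The sketch below works under those assumptions; the Box format, intersection rule, and names (Track-style dicts, derive_relations) are simplifications for illustration, not Moon's actual MBP handling.

```python
# Illustrative sketch of Moon-style dynamic spatial relations (¶¶42-45).
# The rectangle format and the On(A, B) rule are simplified assumptions.
from dataclasses import dataclass

@dataclass
class Box:
    x: float  # per-frame spatial region (MBP stored as a rectangle)
    y: float
    w: float
    h: float

def overlaps(a: Box, b: Box) -> bool:
    return not (a.x + a.w < b.x or b.x + b.w < a.x or
                a.y + a.h < b.y or b.y + b.h < a.y)

def derive_relations(frame: dict[str, Box]) -> set[tuple[str, str, str]]:
    """On(A, B)-style relation per Moon ¶45: spatial regions intersect."""
    rels = set()
    names = sorted(frame)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if overlaps(frame[a], frame[b]):
                rels.add((a, "on", b))
    return rels

# Each new frame updates the plurality of second reference sets as objects move.
frames = [
    {"book": Box(0, 0, 1, 1), "desk": Box(5, 0, 4, 2)},  # apart: no relation
    {"book": Box(6, 0, 1, 1), "desk": Box(5, 0, 4, 2)},  # book placed on desk
]
reference_sets = [derive_relations(f) for f in frames]
print(reference_sets)  # [set(), {('book', 'on', 'desk')}]
```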
It would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date of the claimed invention to combine Fisher and Moon and implement Moon's video interpretation apparatus to identify object relationships dynamically, and to include the second reference sets (object relationships) identified by Moon for each frame of a video input for use by Fisher's system as its second reference sets, in combination with the pre-generated 3D scenes of Fisher. The reason for doing so is “to interpret a video using object information about objects in an input video, dynamic spatial relations between the objects, and information about a general event” (Moon: ¶7). Although Fisher teaches obtaining a respective second reference set (scene 502 is selected, as described in paragraph 88 of Fisher) from a plurality of second reference sets (scenes 500, 502, and 504, Fig. 5 of Fisher), where the respective second reference set is obtained based on a similarity score between the objects and relationships of the first reference set from the speech input and the second reference set, as described in Fisher paragraphs 86-88, the combination of Fisher and Moon still fails to teach: wherein comparing includes identifying a user request frequency corresponding to a respective second reference set of the plurality of second reference sets; obtaining, based on the comparison, a second reference set from the plurality of second reference sets, wherein the second reference set is identified based on a matching score between the first reference set and the second reference set. The analogous art Ren teaches: wherein comparing includes identifying a user request frequency corresponding to a respective second reference set of the plurality of second reference sets (Ren: ¶201, “As shown in FIG. 8, for an AR scene, when the system obtains a plurality of selectable real object options for displaying prompt information, that is, when there are multiple selectable objects, or for a VR scene, when the system obtains a plurality of selectable virtual object options for displaying prompt information, the system may use the preference selector to establish weights for respective selectable objects according to the user's preference.
As shown in the figure, assuming that the number of selectable real objects is M, W2_1 shown in the figure represents the weight of the first selectable real object, W2_M represents the weight of the Mth selectable real object; similarly, W1_1 represents the weight of the first selectable virtual object, and W1_N represents the weight of the nth selectable virtual object, wherein the preference selector may set the above weights based on the analysis result of the user behavior habit analysis, that is, the weights are set according to the user's habit, and the user behavior habit information may be obtained from the user relevant information stored in the database module (the user data shown in the figure). After that, when the system encounters fuzzy demonstration, the system may make recommendations according to the user's historical weights, and update the weights and save these weights into the database after the user finally makes a choice. The initial value of the weight may be given an initial value by counting the behavioral habits of most users”; NOTE: The identified user request frequency is the counted behavioral habits used to determine weights for the respective selectable objects. The respective second reference set is the respective selectable object among the plurality of selectable virtual object options to be displayed as prompt information. The prompt information can be notes, drawings, books, televisions, animals, characters, etc., as described in Ren paragraphs 87-91. The plurality of second reference sets is the plurality of selectable virtual object options, which are the Virtual objects 1 through N and Real objects 1 through M. The user request frequency is based on voice input using an automatic speech recognition module, as described in paragraph 203 and shown in Fig. 8. Speech input from the user, which contains the first reference set >> preference selector updates the weights of the respective second reference set (respective selectable object) of the plurality of second reference sets (plurality of selectable virtual object options). When the weight is updated for a respective selectable object, the request frequency for the respective selectable object is updated.) obtaining, based on the comparison, a second reference set from the plurality of second reference sets, wherein the second reference set is identified based on a matching score between the first reference set and the second reference set (Ren: ¶201, “As shown in FIG. 8, for an AR scene, when the system obtains a plurality of selectable real object options for displaying prompt information, that is, when there are multiple selectable objects, or for a VR scene, when the system obtains a plurality of selectable virtual object options for displaying prompt information, the system may use the preference selector to establish weights for respective selectable objects according to the user's preference. . . wherein the preference selector may set the above weights based on the analysis result of the user behavior habit analysis, that is, the weights are set according to the user's habit, . . . the system may make recommendations according to the user's historical weights. . . The initial value of the weight may be given an initial value by counting the behavioral habits of most users”; NOTE: The obtained second reference set is the recommended object that is based on the user's historical weights according to the counted behavioral habits of users. The system inherently makes a comparison of weights so it can make a recommendation. Therefore, the obtained second reference set is based on the comparison of historical weights. The matching score is the weight given to the selectable objects corresponding to the user request and counted behavioral habits. First reference set (speech input data representation) >> matching score between the first reference set and the second reference set (weights assigned to respective selectable objects, which are updated according to user behavior habits; the plurality of second reference sets is the plurality of respective selectable objects) >> identified second reference set (the recommended object).) It would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date of the claimed invention to combine Fisher, Moon, and Ren and include wherein comparing includes identifying a user request frequency corresponding to a respective second reference set of the plurality of second reference sets; obtaining, based on the comparison, a second reference set from the plurality of second reference sets, wherein the second reference set is identified based on a matching score between the first reference set and the second reference set. The reason for doing so is to provide “an output that most satisfies the user intent in combination with the scene” (Ren: ¶101) and “to learn whether the user habitually refers to one object by using another appellation”, “for eliminating ambiguity”, and to “accurately map the user instruction with the actual scene, and finally obtain the result of the object recognition and the speech recognition accurately, and at the same time it may also screen objects unrelated to the instruction, and output useful information” (Ren: ¶189).
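The Ren mapping above can likewise be sketched as a weighting layer over the matching scores: request counts stand in for the counted behavioral habits, and the weight update after a choice mirrors ¶201. The blending formula and names below (preference_weights, recommend) are illustrative assumptions, not Ren's disclosed implementation.

```python
# Illustrative sketch of Ren's preference selector (¶201): habit-based weights
# act as a recommendation signal over the selectable objects.
from collections import Counter

# User request frequency: counted behavioral habits per selectable object.
request_counts: Counter[str] = Counter({"desk": 4, "table": 1, "bookcase": 1})

def preference_weights(counts: Counter[str]) -> dict[str, float]:
    total = sum(counts.values())
    return {obj: n / total for obj, n in counts.items()}

def recommend(graph_scores: dict[str, float], counts: Counter[str]) -> str:
    """Blend the graph-matching score with the habit weight, pick the best."""
    weights = preference_weights(counts)
    return max(graph_scores, key=lambda o: graph_scores[o] * weights.get(o, 0.0))

# Matching scores from the scene-graph comparison (first vs. second reference sets).
scores = {"desk": 0.9, "table": 0.8, "bookcase": 0.3}
choice = recommend(scores, request_counts)
request_counts[choice] += 1  # update the weights after the user's final choice (¶201)
print(choice)  # "desk"
```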
Regarding claim 2, depending on claim 1: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 1, the one or more programs including instructions for: identifying, from the speech input, a plurality of words (Fisher: ¶58, "FIG. 3A illustrates a dependency tree 300 that the 3D modeling system 102 generates for a natural language phrase. Specifically, FIG. 3A illustrates a natural language phrase that reads “Put some books on the desk.” The dependency tree illustrates dependencies (including dependency/relationship types) involving the different phrase components in the natural language phrase. The dependency tree provides a consistent lexical structure that the 3D modeling system can use to identify different words in a phrase"); providing the plurality of words to an input layer (Fisher: ¶70, "While FIG. 3B illustrates an entity-command representation 302 for the phrase of FIG. 3A, the 3D modeling system 102 can generate entity-command representations for more complex phrases involving any number of nouns and commands. For instance, a natural language phrase can include a plurality of requests (note: which may include a plurality of words). The 3D modeling system 102 can parse the natural language phrase to determine each separate request and then generate separate entity-command representations for each request.
To illustrate, for a natural language phrase that states, “Move the chairs around the dining table farther apart and transfer some of the books on the desk to a table,” the 3D modeling system 102 can determine that the natural language phrase includes a first request to move the chairs around the table farther apart and a second request to transfer some books from a desk to a table"); NOTE 2A: The natural language (voice input) is being parsed by the 3D modeling system. Therefore, the plurality of words are being used as an input. obtaining, from an output layer, a plurality of tokens based on the plurality of words (Fisher: ¶59, "In one or more embodiments, the 3D modeling system 102 identifies a plurality of tokens in the natural language phrase corresponding to a plurality of character strings the have identified meanings. As constructed, the natural language phrase includes a request for the 3D modeling system to place some books on a desk within a three-dimensional modeling environment. The 3D modeling system 102 first identifies each character string that has an identified meaning such that, in this case, the 3D modeling system 102 identifies every character string in the natural language phrase as a token"; Fisher: ¶106, "The series of acts 800 also includes an act 804 of generating an entity-command representation of the natural language phrase. For example, act 804 involves generating an entity-command representation of the natural language phrase using the determined dependencies between the one or more entities and the one or more commands. For instance, act 804 can first involve generating a dependency tree comprising a plurality of tokens representing words in the natural language phrase and dependency relationships corresponding to the plurality of tokens. Act 804 can then involve converting the dependency tree into: an entity list comprising one or more entities annotated with one or more attributes and one or more relationships corresponding to the one or more entities; and a command list comprising one or more command verbs operating over the one or more entities"); NOTE 2B: Since the tokens are generated (outputted), then the tokens are obtained from an output layer. obtaining, based on the plurality of tokens, the first reference set (Fisher: ¶67, "In one or more embodiments, the 3D modeling system 102 uses pattern matching to determine and assign the other properties of the natural language phrase to the corresponding entity or command in the entity-command representation 302. For example, the 3D modeling system 102 determines that amod(noun : A, adjective : B) assigns token B as either an attribute or a count of the entity seeded at A, if one exists. When pattern matching, the 3D modeling system 102 augments the standard parts of speech used by the natural language processor to create the dependency tree 300 with four classes used for scene understanding. A first class includes spatial nouns, which are spatial regions relative to entities (e.g., “right,” “center,” “side”). A second class includes counting adjectives, which are adjectives representing object count or general qualifiers (e.g., “all,” “many”). A third class includes group nouns that embody special meaning over a collection of objects (e.g., “stack,” “arrangement”). A fourth class includes adjectival verbs, which the 3D modeling system 102 can model as an attribute modification over the direct object (e.g., “clean,” “brighten”)"). 
NOTE 2A: The semantic scene graph 316 (first reference set) is based on the entity-command representation 302 (Fisher: ¶73), which is based on tokens (Fisher: ¶67). Therefore, the first reference set (semantic scene graph 316) is obtained based on the plurality of tokens. Regarding claim 3, depending on claim 2: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 2, wherein the plurality of words include the referenced virtual object, a relational object, and a landmark object (Fisher: ¶64, "In one or more embodiments, the 3D modeling system 102 generates the entity-command representation by defining a set of entities and a set of commands for the natural language phrase based on the dependency tree. In particular, an entity includes a category of the base noun and is associated with any attributes of the base noun, a count of the noun, relationships connecting the noun to another entity within the sentence, and any determiners corresponding to the noun. An entity category includes the base noun used to describe an object in the three-dimensional scene (e.g., “table,” “plate,” “arrangement”). As mentioned, base nouns includes that are not in a compound dependency relationship with another noun, are not abstract concepts, and do not represent spatial regions. An attribute of a base noun includes one or more modifier words to modify the base noun (e.g., “modern,” “blue (very, dark)”). A count of the noun includes either an integer representing the number of entities in a group or a qualitative descriptor (e.g., “2,” “three,” “many,” “some”). A relationship connecting the noun to another entity includes a set of (string, entity) pairs that describe a connection to another specific entity in the sentence (e.g., “on:desk,” “left-of:keyboard”). A determiner corresponding to a noun includes a word, phrase, or affix that expresses the reference of the noun in the context (e.g., “a,” “the,” “another,” “each”)"; Fisher: ¶66, "As illustrated in FIG. 3B, the 3D modeling system 102 converts the dependency tree 300 of FIG. 3A into the entity-command representation 302 by placing the identified entities and commands in a logical configuration that connects the entities and commands based on the determined dependencies from the dependency tree 300. For instance, to convert the phrase, “Put some books on the desk,” into an entity-command representation, the 3D modeling system 102 identifies two separate entities—a first entity 304a (“books”) and a second entity 304b (“desk”)"). NOTE 3A: Referencing the phrase "Put some books on the desk" (Fisher: ¶66), it may be parsed wherein the word "books" refers to the referenced virtual object, the words "on the" refers to the relational object, and the word "desk" refers to the landmark object. Regarding claim 4, depending on claim 1: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 1, the one or more programs including instructions for: obtaining, from an output layer, a plurality of tokens based on the speech input (Fisher: ¶59, also see claim 2 rejection for reference); identifying, for each token of the plurality of tokens, a parent index and a label classifier (Fisher: ¶45, "In one or more embodiments, the 3D modeling system converts a natural language phrase into a dependency tree by first identifying tokens of the natural language phrase. As previously mentioned, tokens include character strings that have an identified meaning (e.g., words). 
The 3D modeling system then assigns an annotation label to each token in the natural language phrase. This includes identifying nouns, verbs, adjectives, etc., in the natural language phrase and then determining the dependencies of each component of the phrase relative to the other components in the phrase. An example of assigning parent tokens and annotation labels to each token in a phrase is described in more detail below with respect to FIG. 3A. According to at least some embodiments, the 3D modeling system uses an established framework for determining lexical dependencies (e.g., the Universal Dependencies framework)"); and obtaining, based on the plurality of tokens, the first reference set (Fisher: ¶67, also see the claim 2 rejection for reference, and NOTE 2A).

Regarding claim 6, depending on claim 1: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 1, wherein the first reference set includes a first object, a second object, and a first relationship object (referencing again the phrase “Put some books on the desk”, “books” is the first object, the “desk” is the second object, and “on the” is the first relational object), and each reference set of the plurality of second reference sets includes a respective first object, a respective second object, and a respective first relationship object (Fisher: ¶85, "In response to generating a semantic scene graph for a natural language phrase, the 3D modeling system 102 generates a three-dimensional scene using the semantic scene graph. In one or more embodiments, the 3D modeling system 102 identifies one or more available scenes that most closely resemble the requested scene from the natural language phrase. For example, the 3D modeling system 102 compares the semantic scene graph to semantic scene graphs for a plurality of available three-dimensional scenes to identify one or more scenes that are similar to the requested scene. To illustrate, the 3D modeling system 102 can compare the semantic scene graph of the natural language phrase to each semantic scene graph of the available scenes by comparing the object nodes, relationship nodes, and edges of the semantic scene graph of the natural language phrase with the object nodes, relationship nodes, and edges of the semantic scene graph of the available scene. A higher number of overlapping nodes and edges indicates a closer match, while a lower number of overlapping nodes and edges indicates less of a match"; Fisher: ¶86, "Additionally, the 3D modeling system 102 can rank the available scenes by determining how closely the respective semantic scene graphs match the semantic scene graph of the natural language phrase. For instance, the 3D modeling system 102 can assign a comparison score to each semantic scene graph of the available three-dimensional scenes based on the number and similarity of nodes and edges with the semantic scene graph of the natural language phrase. The 3D modeling system 102 can then use the comparison scores to generate a ranked list of scenes with higher comparison scores being higher on the ranked list of scenes. The 3D modeling system 102 may also use a threshold score to determine whether any of the available scenes closely align with the natural language phrase"). (NOTE 6A: The semantic scene graphs include object nodes (first and second objects) and relationship nodes (relation object). Therefore, both the first reference set and each reference set of the plurality of second reference sets include a respective first object, a respective second object, and a respective relationship object.) wherein comparing comprises: comparing, for each reference set of the plurality of second reference sets (see the claim 1 rejection and NOTE 1B): a first semantic similarity between the first object and the respective first object (see Fisher: ¶86, comparing object nodes); a second semantic similarity between the second object and the respective second object (see Fisher: ¶86, comparing object nodes); a third semantic similarity between the first relationship object and the respective first relationship object (see Fisher: ¶86, comparing relationship nodes).
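A toy sketch of the claim 6 comparison follows, reading each reference set as an (object, relationship, object) triple and scoring the three semantic similarities separately. The miniature hand-made embeddings and the averaging are stand-ins for whatever word representation a real system would use; none of this is taken from Fisher, Moon, or Ren.

```python
# Toy sketch of the component-wise comparison in claim 6: three semantic
# similarities (first object, second object, relationship). The 2-D
# "embeddings" are placeholder assumptions purely for illustration.
import math

EMBED = {
    "books": (0.9, 0.1), "book": (0.88, 0.12),
    "desk": (0.2, 0.95), "table": (0.25, 0.9),
    "on": (0.5, 0.5), "in": (0.4, 0.6),
}

def cos(a: str, b: str) -> float:
    va, vb = EMBED[a], EMBED[b]
    dot = sum(x * y for x, y in zip(va, vb))
    return dot / (math.hypot(*va) * math.hypot(*vb))

def reference_set_similarity(first: tuple[str, str, str],
                             second: tuple[str, str, str]) -> float:
    """Average the three semantic similarities (objects and relationship)."""
    s1 = cos(first[0], second[0])  # first object vs. respective first object
    s2 = cos(first[2], second[2])  # second object vs. respective second object
    s3 = cos(first[1], second[1])  # relationship vs. respective relationship
    return (s1 + s2 + s3) / 3

print(reference_set_similarity(("books", "on", "desk"), ("book", "on", "table")))
```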
Regarding claim 7, depending on claim 1: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 1, wherein comparing comprises: determining a distance between an object of the first reference set and an object of the plurality of second reference sets; and comparing the first reference set to the plurality of second reference sets based on the determined distance (Fisher: ¶86, "Additionally, the 3D modeling system 102 can rank the available scenes by determining how closely the respective semantic scene graphs (plurality of second reference sets) match the semantic scene graph of the natural language phrase (first reference set). For instance, the 3D modeling system 102 can assign a comparison score to each semantic scene graph of the available three-dimensional scenes based on the number and similarity of nodes and edges with the semantic scene graph of the natural language phrase. The 3D modeling system 102 can then use the comparison scores to generate a ranked list of scenes with higher comparison scores being higher on the ranked list of scenes. The 3D modeling system 102 may also use a threshold score to determine whether any of the available scenes closely align with the natural language phrase"). (NOTE 7A: The distance is the "comparison score" that the 3D modeling system uses to generate a ranked list of scenes.)

Regarding claim 9, depending on claim 1: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 1, wherein obtaining a second reference set from the plurality of second reference sets comprises: obtaining a ranked list of reference sets from the plurality of second reference sets, wherein each reference set of the ranked list is associated with a matching score (Fisher: ¶86, "Additionally, the 3D modeling system 102 can rank the available scenes by determining how closely the respective semantic scene graphs (plurality of second reference sets) match the semantic scene graph of the natural language phrase (first reference set). For instance, the 3D modeling system 102 can assign a comparison score to each semantic scene graph of the available three-dimensional scenes based on the number and similarity of nodes and edges with the semantic scene graph of the natural language phrase. The 3D modeling system 102 can then use the comparison scores to generate a ranked list of scenes with higher comparison scores being higher on the ranked list of scenes. The 3D modeling system 102 may also use a threshold score to determine whether any of the available scenes closely align with the natural language phrase"); and selecting a second reference set having a highest matching score from the ranked list of reference sets
(Fisher: ¶86, “. . . use the comparison scores to generate a ranked list of scenes with higher comparison scores being higher on the ranked list of scenes. . .”; Fisher: ¶112, “Act 806 can also involve identifying, from the plurality of available three-dimensional scenes, a three-dimensional scene that has a semantic scene graph matching the semantic scene graph of the natural language phrase. For example, act 806 can involve selecting, from the plurality of available three-dimensional scenes, a three-dimensional scene that has a highest rank”).

Regarding claim 10, depending on claim 9: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 9, wherein the second reference set having a highest matching score is determined based on an arguments of the maxima function (Fisher: ¶86, “. . . use the comparison scores to generate a ranked list of scenes with higher comparison scores being higher on the ranked list of scenes. . .”; Fisher: ¶112, “Act 806 can also involve identifying, from the plurality of available three-dimensional scenes, a three-dimensional scene that has a semantic scene graph matching the semantic scene graph of the natural language phrase. For example, act 806 can involve selecting, from the plurality of available three-dimensional scenes, a three-dimensional scene that has a highest rank”). (NOTE 10A: The 3D modeling system selects the second reference set as the 3D scene having the highest rank (the maxima), which is determined from the comparison-score function.)

Regarding claim 13, depending on claim 1: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 1, wherein identifying an object based on the second reference set comprises: identifying, from the second reference set, a first respective object, a second respective object, and a relationship between the first respective object and the second respective object, wherein the relationship defines a location of the first respective object relative to the second respective object (Fisher: ¶88, "The 3D modeling system 102 selects the second scene 502 (second reference set) for use in generating the three-dimensional scene of the natural language phrase. The 3D modeling system selects the second scene 502 because the second scene 502 is most similar to the semantic scene graph 316 (i.e., the second scene 502 includes a desk with books on top of the desk). . ."; Fisher: ¶89, "As briefly mentioned previously, the 3D modeling system 102 can generate the three-dimensional scene for the natural language phrase using an available three-dimensional scene by inserting objects of the selected, available three-dimensional scene (or from a plurality of available three-dimensional scenes) into a workspace of the user. Specifically, the 3D modeling system 102 can select an object in the available three-dimensional scene based on an object identifier that corresponds to the object. The 3D modeling system 102 can then copy the object, along with one or more properties of the object, and paste/duplicate the object in the user's workspace.
The 3D modeling system 102 can perform copy-paste operations for a plurality of objects in a selected, available three-dimensional scene until the 3D modeling system 102 determines that the requested three-dimensional scene is complete. To illustrate, the 3D modeling system 102 can copy the desk and books of the second scene 502 of FIG. 4 and paste the copied objects into a user workspace at a client device of the user"). (NOTE 13A: The scene 502 is the second reference set, as it is selected from a plurality of second reference sets (closest match to the voice input, "Put some books on the desk"). The 3D modeling system identifies that scene 502 has books on a desk (books: first respective object; desk: second respective object; "on the": location of the first respective object relative to the second respective object) and determines that it is the closest match to the first reference set.) and the first respective object corresponds to the object identified based on the second reference set. (NOTE 13B: The books (first respective object) are identified by the 3D modeling system in scene 502 (with the books being on a desk). Therefore, the first respective object corresponds to the object identified based on the second reference set.)

Regarding claim 14, depending on claim 1: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 1, wherein identifying an object based on the second reference set comprises: identifying, from the second reference set, a first respective object, a second respective object, and a relationship between the first respective object and the second respective object (NOTE 14A: Because a comparison is made between the first reference set and the second reference set (selected from the plurality of second reference sets, Fisher: ¶85-86, 112), the 3D modeling system identifies a first respective object, a second respective object, and a relationship between the first respective object and the second respective object from scene 502 (second reference set); also see NOTE 6A and NOTE 13A); and obtaining a region associated with the first respective object and the second respective object (Fisher: ¶67, "In one or more embodiments, the 3D modeling system 102 uses pattern matching to determine and assign the other properties of the natural language phrase to the corresponding entity or command in the entity-command representation 302. For example, the 3D modeling system 102 determines that amod(noun : A, adjective : B) assigns token B as either an attribute or a count of the entity seeded at A, if one exists. When pattern matching, the 3D modeling system 102 augments the standard parts of speech used by the natural language processor to create the dependency tree 300 with four classes used for scene understanding. A first class includes spatial nouns, which are spatial regions relative to entities (e.g., “right,” “center,” “side”). A second class includes counting adjectives, which are adjectives representing object count or general qualifiers (e.g., “all,” “many”). A third class includes group nouns that embody special meaning over a collection of objects (e.g., “stack,” “arrangement”). A fourth class includes adjectival verbs, which the 3D modeling system 102 can model as an attribute modification over the direct object (e.g., “clean,” “brighten”)").

Regarding claim 18, depending on claim 17: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 17, the one or more programs including instructions for: identifying a first respective object and a second respective object from the plurality of objects (Moon: ¶44, “The dynamic spatial relation generation unit 120 may generate dynamic spatial relations between the objects based on the object information. Each dynamic spatial relation may include a relation type which is based on variation in a spatial relation between the objects, time interval information, and a spatial region of a relation”; Moon: ¶45, “For example, Into(A, B) may be a dynamic spatial relation in which object A is present near object B and then enters object B and is not seen any more. Further, On(A, B) may be a dynamic spatial relation in which object A is disposed on the surface of object B. That is, On(A, B) may be the dynamic spatial relation in which spatial regions for object A and object B intersect each other”) (NOTE: Moon's system can distinguish objects within the input video: a first respective object (object A) and a second respective object (object B)); and identifying a relationship between the first respective object and the second respective object (Moon: ¶44, “The dynamic spatial relation generation unit 120 may generate dynamic spatial relations between the objects based on the object information”), wherein the relationship defines a location of the first respective object relative to the second respective object (Moon: ¶44, “The dynamic spatial relation generation unit 120 may generate dynamic spatial relations between the objects based on the object information. Each dynamic spatial relation may include a relation type which is based on variation in a spatial relation between the objects, time interval information, and a spatial region of a relation”; Moon: ¶45, “For example, Into(A, B) may be a dynamic spatial relation in which object A is present near object B and then enters object B and is not seen any more. Further, On(A, B) may be a dynamic spatial relation in which object A is disposed on the surface of object B. That is, On(A, B) may be the dynamic spatial relation in which spatial regions for object A and object B intersect each other”)).

Regarding CRM claim 22: CRM claim 22 is drawn to the CRM corresponding to the instructions of using same as claimed in the apparatus of claim 1. Therefore, CRM claim 22 corresponds to the instructions in the apparatus of claim 1, and is rejected for the same reasons of obviousness as used above.

Regarding method claim 23: Method claim 23 is drawn to the method corresponding to the instructions of using same as claimed in the apparatus of claim 1. Therefore, method claim 23 corresponds to the instructions in the apparatus of claim 1, and is rejected for the same reasons of obviousness as used above.

Regarding claim 25, depending on claim 1: The combination of Fisher and Moon teaches: The electronic device of claim 1, wherein the change in the image information is based on movement of an image sensor of the electronic device (NOTE: this is an inherent feature of a video camera capturing videos while the image sensor or camera is moving. Moon's invention is to interpret an input video by analyzing spatial relations and events of objects in each frame of the input video; when the camera is moving, each frame will change according to the movement of the camera).

Regarding claim 26, depending on claim 1: The electronic device of claim 1, wherein the change in the image information is based on movement of an object within the device environment (NOTE: Moon's system is designed to recognize “dynamic spatial relations” in an input video) (Moon: ¶45, “For example, Into(A, B) may be a dynamic spatial relation in which object A is present near object B and then enters object B and is not seen any more.
Regarding claim 26, depending on claim 1: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 1, wherein the change in the image information is based on movement of an object within the device environment (NOTE: Moon’s system is designed to recognize “dynamic spatial relations” in an input video) (Moon: ¶45, “For example, Into(A, B) may be a dynamic spatial relation in which object A is present near object B and then enters object B and is not seen any more. Further, On(A, B) may be a dynamic spatial relation in which object A is disposed on the surface of object B. That is, On(A, B) may be the dynamic spatial relation in which spatial regions for object A and object B intersect each other”; Moon: ¶49, “For example, the general event information may include GoIn(A, B)(‘A goes into B’) corresponding to Into1(A, B), HeadFor(A, B)(‘A heads for B’) corresponding to Into2(A, B), and CrashInto(A, B)(‘A crashes into B’) corresponding to Into3(A, B)”).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Fisher in view of Moon, further in view of Ren, and further in view of Williams et al. (US 20090265160 A1, hereinafter “Williams”).

Regarding claim 8, depending on claim 1: The combination of Fisher, Moon, and Ren teaches the electronic device of claim 1. (NOTE: Fisher further teaches obtaining a comparison score to generate a ranked list of scenes to determine whether any of the available scenes closely align with the speech input (Fisher: ¶96). Additionally, Fisher also teaches converting voice input to text using speech-to-text analysis (Fisher: ¶37).)

However, the combination of Fisher, Moon, and Ren does not teach what the analogous art Williams teaches. Williams teaches obtaining vector representations of text-based documents, including a first and a second document, and comparing the alignment of the vector representations to produce a score of the similarity of the second document to the first document (Williams: ¶15-20, "[0015] According to a second aspect of the present invention there is provided a system for comparing text based documents comprising: [0016] means for lexically normalising each word of the text of a first document to form a first normalised representation; [0017] means for building a vector representation of the first document from the first normalised representation; [0018] means for lexically normalising each word of the text of a second document to form a second normalised representation; [0019] means for building a vector representation of the second document from the second normalised representation; means for lexically normalising the text of a first document; [0020] means for comparing the alignment of the vector representations to produce a score of the similarity of the second document to the first document").

It would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date of the claimed invention to modify Fisher and combine the teachings of Williams by obtaining, for each reference set of the plurality of second reference sets, a vector representation (Williams: ¶15-20, the first and second documents are regarded as the reference sets to be compared); and comparing a vector representation of the first reference set to each vector representation obtained from the plurality of second reference sets (Williams: ¶15-20, comparing the vector representations) to produce a score of the similarity of the second document to the first document (Williams: ¶20).
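Williams' vector-comparison step can be made concrete with a minimal sketch. Lexical normalisation is approximated below by lowercasing and stripping a trailing "s", and "alignment" is taken to be cosine similarity between bag-of-words vectors; both choices are assumptions for illustration, since the quoted passage does not fix a specific scheme.

    # Illustrative sketch; the normalisation and similarity choices are
    # assumptions, not the specific method disclosed by Williams.
    import math
    from collections import Counter

    def normalise(text: str) -> list[str]:
        # Crude lexical normalisation: lowercase and strip a plural "s".
        return [w.lower().rstrip("s") for w in text.split()]

    def vectorise(text: str) -> Counter:
        # Bag-of-words vector built from the normalised representation.
        return Counter(normalise(text))

    def cosine(u: Counter, v: Counter) -> float:
        # Alignment of the two vectors, yielding a similarity score in [0, 1].
        dot = sum(u[t] * v[t] for t in u)
        norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
        return dot / norm if norm else 0.0

    first = "Put some books on the desk"
    second = "Books placed on a desk"
    print(f"{cosine(vectorise(first), vectorise(second)):.2f}")  # similarity of second to first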
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Fisher in view of Moon, further in view of Ren, further in view of Miller et al. (US 10979535 B1, hereinafter “Miller”), and further in view of Kotzin (US 20060077897 A1, hereinafter “Kotzin”).

Regarding claim 11, depending on claim 9: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 9, the one or more programs including instructions for: in accordance with a determination that a reference set from the plurality of second reference sets is associated with a highest score (associated with ranking, Fisher: ¶86, 112; also see the claim 9 rejection).

However, the combination of Fisher, Moon, and Ren does not teach what the analogous art Miller teaches. Miller teaches a content selection server that can determine that two or more pieces of content of the set of candidate content have equal ranking scores and may rank the two or more pieces of content based on one or more secondary factors, wherein the secondary factors include content history (Miller: col. 9 ln. 50-67 to col. 10 ln. 1-14, "The content selection server 110 may determine the respective bid amounts for the eligible pieces of content of the set of candidate content at the time of delivery of the pieces of content to the semi-connected device using the content history. For example, the content selection server 110 may determine the respective bid amounts for the eligible pieces of content at the time of delivery of the pieces of content based on the bid amounts and the download timestamps provided by the content history. The content selection server 110 then may determine a cost for the impression interaction based on the respective bid amounts for the eligible pieces of content at the time of delivery. The content selection server 110 may rank the eligible pieces of content based on their respective ranking scores and determine the cost for the impression interaction based on the bid amount for the second-ranked eligible piece of content. For example, the content selection server 110 may determine the cost for the impression interaction to be a sum of the bid amount for the second-ranked eligible piece of content and an additional value. In some embodiments, the additional value may be 1. In some embodiments, the content selection server 110 may determine that two or more pieces of content of the set of candidate content have equal ranking scores and may rank the two or more pieces of content based on one or more secondary factors. The secondary factors may include flight times, conversion rates, budget consumption amounts, pacing constraints, and/or other settings or characteristics of the two or more pieces of content having equal ranking scores. In an example embodiment, the content selection server 110 and/or the semi-connected device may randomly select one of multiple pieces of content that have equal ranking scores or bid amounts").

It would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date of the claimed invention to modify Fisher and combine the teachings of Miller to determine that two or more reference sets of the plurality of second reference sets are associated with equal highest matching scores and, in response, to select, from the two or more reference sets, a second reference set from the ranked list of reference sets based on content history. The combination of Fisher and Miller provides a tie-breaker for situations in which comparison scores are equal, enabling the 3D modeling system of Fisher to select a 3D scene from which to generate the requested scene.
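The tie-breaking rationale drawn from Miller (with Kotzin's request history substituted for content history, as discussed next) reduces to a few lines. The record layout and field names below are hypothetical, chosen only to illustrate the idea.

    # Illustrative sketch; the fields and data are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class CandidateSet:
        name: str
        score: float         # primary matching/ranking score
        request_count: int   # secondary factor (e.g., request history)

    def pick(candidates: list[CandidateSet]) -> CandidateSet:
        best = max(c.score for c in candidates)
        tied = [c for c in candidates if c.score == best]
        # If two or more candidates share the highest score, break the tie
        # with the secondary factor instead of choosing arbitrarily.
        return max(tied, key=lambda c: c.request_count)

    refs = [CandidateSet("scene_A", 0.92, 14),
            CandidateSet("scene_B", 0.92, 3),
            CandidateSet("scene_C", 0.80, 99)]
    print(pick(refs).name)  # -> scene_A: tied on score, wins on request history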
However, the combination of Fisher, Moon, Ren, and Miller still does not teach what the analogous art Kotzin teaches. Kotzin teaches a content server that tracks requests to establish metrics such as a request frequency, a request history, and a request profile for prioritization (Kotzin: ¶2, prioritization) (Kotzin: ¶26, “As noted, the content server 101 or other infrastructure component may maintain a count or otherwise track the requests made for web pages to establish various metrics such as a request frequency, a request history, a request profile, or the like, based on tracking all requests from all network clients or based on tracking requests from selected groups of network clients or individual clients”).

It would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date of the claimed invention to combine Fisher as modified by Miller with the teachings of Kotzin, substituting the request history taught by Kotzin for the content history taught by Miller, to prioritize selecting, from the two or more reference sets, a second reference set from the ranked list of reference sets based on a request history.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Fisher in view of Moon, further in view of Ren, and further in view of Schwarz et al. (US 20180286126 A1, hereinafter “Schwarz”).

Regarding claim 15, depending on claim 14: The combination of Fisher, Moon, and Ren teaches: The electronic device of claim 14, the one or more programs including instructions for: identifying a first region associated with the first respective object (Fisher: ¶67; also see the claim 14 rejection and NOTE 14A).

However, the combination of Fisher, Moon, and Ren does not teach what the analogous art Schwarz teaches. Schwarz teaches: wherein a region includes a first top boundary, a first bottom boundary, a first left boundary, and a first right boundary (Schwarz: ¶48, "With reference to FIG. 4, in some examples a virtual object may be displayed within a virtual bounding box or other virtual container. In the example of FIG. 4, the holographic motorcycle 244 is displayed within virtual bounding box 400. In this example, a user may manipulate the holographic motorcycle 244 by interacting with the virtual bounding box 400. Also in this example, the 4 virtual UI elements 76 are displayed on one face of the bounding box 400, such as the face that is oriented toward the HMD device 18"; also see FIG. 4, which illustrates, as the sides of a box, a first top boundary, a first bottom boundary, a first left boundary, and a first right boundary); and displaying the referenced virtual object within the identified first region (Schwarz: ¶48, ". . . a virtual object may be displayed within a virtual bounding box . . .").

It would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date of the claimed invention to modify Fisher and combine the teachings of Schwarz so that the identified region associated with the first respective object includes a first top boundary, a first bottom boundary, a first left boundary, and a first right boundary, and so that the referenced virtual object is displayed within the identified first region, in order to determine that one or more of the virtual object and the one or more user interface elements are within a predetermined distance of a physical surface in the environment (Schwarz: ¶3).
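At bottom, Schwarz's four-boundary region is a containment-and-clamping check. The sketch below is a 2D simplification (Schwarz's bounding box is three-dimensional) with hypothetical names, offered only to illustrate the claim language.

    # Illustrative sketch; a 2D simplification of a 3D bounding region.
    from dataclasses import dataclass

    @dataclass
    class Region:
        top: float
        bottom: float
        left: float
        right: float

        def contains(self, x: float, y: float) -> bool:
            # The virtual object must sit within all four boundaries.
            return self.left <= x <= self.right and self.bottom <= y <= self.top

    def place_virtual_object(region: Region, x: float, y: float) -> tuple[float, float]:
        # Clamp the requested position so the object is displayed inside the region.
        x = min(max(x, region.left), region.right)
        y = min(max(y, region.bottom), region.top)
        return x, y

    desk_top = Region(top=1.2, bottom=1.0, left=0.0, right=2.0)
    print(desk_top.contains(2.5, 1.1))               # -> False: right of the region
    print(place_virtual_object(desk_top, 2.5, 1.1))  # -> (2.0, 1.1), clamped inside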
Allowable Subject Matter

Claims 5, 12, and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PATRICK GALERA, whose telephone number is (571) 272-5070. The examiner can normally be reached Mon-Fri, 0800-1700 ET.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon, can be reached at 571-270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PATRICK P GALERA/
Examiner, Art Unit 2617

/KING Y POON/
Supervisory Patent Examiner, Art Unit 2617

Prosecution Timeline

Aug 31, 2023
Application Filed
Oct 18, 2024
Response after Non-Final Action
Jun 26, 2025
Non-Final Rejection — §103
Aug 21, 2025
Applicant Interview (Telephonic)
Aug 21, 2025
Examiner Interview Summary
Aug 28, 2025
Response Filed
Nov 03, 2025
Final Rejection — §103
Feb 04, 2026
Examiner Interview Summary
Feb 04, 2026
Applicant Interview (Telephonic)
Feb 05, 2026
Request for Continued Examination
Feb 20, 2026
Response after Non-Final Action
Feb 25, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications with similar technology granted by this examiner

Patent 12602567
SYSTEM AND METHOD FOR RENDERING A VIRTUAL MODEL-BASED INTERACTION
2y 5m to grant • Granted Apr 14, 2026
Patent 12597184
IMAGE PROCESSING METHOD AND APPARATUS, DEVICE AND READABLE STORAGE MEDIUM
2y 5m to grant • Granted Apr 07, 2026
Patent 12586549
Image conversion apparatus and method having timing reconstruction mechanism
2y 5m to grant • Granted Mar 24, 2026
Patent 12579921
ELECTRONIC DEVICE HAVING FLEXIBLE DISPLAY AND METHOD FOR CONTROLLING THE SAME
2y 5m to grant • Granted Mar 17, 2026
Patent 12491085
SYSTEMS AND METHODS FOR ORTHOPEDIC IMPLANT FIXATION
2y 5m to grant • Granted Dec 09, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
86%
Grant Probability
99%
With Interview (+16.7%)
2y 5m
Median Time to Grant
High
PTA Risk
Based on 7 resolved cases by this examiner. Grant probability derived from career allow rate.
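
As a quick sanity check on the headline figure, assuming the grant probability is the simple grants-over-resolved ratio the note above describes (6 of 7 resolved cases granted):

    # Assumes grant probability = granted / resolved, per the note above.
    granted, resolved = 6, 7
    print(f"{granted / resolved:.0%}")  # -> 86%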
