Last updated: May 04, 2026

Application No. 18/652,201

SCENE CREATION USING LANGUAGE MODELS

Final Rejection §103

Filed

May 01, 2024

Priority

Nov 07, 2023 — provisional 63/596,729

Examiner

TRAN, JENNY NGAN

Art Unit

2615

Tech Center

2600 — Communications

Assignee

Roblox Corporation

OA Round

2 (Final)

Interview Optional

— +50.0% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 20% grant rate with +50.0% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.

Based on 5 resolved cases, 2023–2026

Examiner Intelligence

TRAN, JENNY NGAN View full profile →

Grants only 20% of cases

Career Allowance Rate

1 granted / 5 resolved

-42.0% vs TC avg

Strong +50% interview lift

Without

With

+50.0%

Interview Lift

resolved cases with interview

Typical timeline

2y 6m

Avg Prosecution

31 currently pending

Career history

Total Applications

across all art units

Statute-Specific Performance

§101

8.3%

-31.7% vs TC avg

§103

52.8%

+12.8% vs TC avg

§102

20.2%

-19.8% vs TC avg

§112

17.0%

-23.0% vs TC avg

Black line = Tech Center average estimate • Based on career data from 5 resolved cases

Office Action

§103

DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Status of the Claims
	Claims 1-20 are currently pending in the present application, with claims 1, 9, and 16 being independent.
Response to Arguments / Amendments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-11, 13-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. "Voyager: An open-ended embodied agent with large language models." arXiv preprint arXiv:2305.16291 (2023), hereinafter referred to as “Wang”, in view of Beauchamp et al. (US 20230410436 A1), hereinafter referred to as “Beauchamp”.
	Regarding claim 1, Wang discloses a computer-implemented method, the method comprising:
receiving a user prompt (Pg. 4, Section 2.1; The input prompt to GPT-4. Fig. 10; "Build Nether Portal", "Build House"…human input is demonstrated from left to right. Pg. 19-20; User: A detailed instruction that guides the assistant for the next immediate response), the user prompt comprising text criteria specifying for generation or modification of a virtual experience (Pg. 4, Section 2.1; (1) Directives encouraging diverse behaviors and imposing constraints…(2) The agent's current state…(3) Previously completed and failed tasks…(4) Additional context…Pg. 9, Section 3.5; (2) Human as a curriculum (equivalent to VOYAGER's automatic curriculum module)…) , wherein the user prompt is a natural language prompt that includes at least one of text data, audio data, or video data (Pg. 3-4, Section 2.1; automatic curriculum capitalizes on the internet-scale knowledge contained within GPT-4 by prompting it to provide a steady stream of new tasks or challenges…Fig. 4; Task: Craft Iron Pickaxe "How to craft an iron pickaxe in Minecraft?"),
identifying one or more objects in the virtual experience having one or more attributes that correspond to the text criteria (Fig. 4 Retrieve Top-5 Relevant Skills; Skill retrieval. When faced with a new task proposed by the automatic curriculum, we first leverage GPT-3.5 to generate a general suggestion for solving the task, which is combined with environment feedback as the query context. Subsequently, we perform querying to identify the top-5 relevant skills. Pg. 9, Section 3.5 and Fig. 10; VOYAGER is able to construct complex 3D structures in Minecraft, such as a Nether Portal and a house…break down a complex building task into smaller steps, guiding VOYAGER to complete them incrementally), the one or more objects being identified by a large language model (Pg. 2, Section 1; VOYAGER, the first LLM-powered embodied lifelong learning agent…VOYAGER is made possible through three key modules (Fig. 2): 1) an automatic curriculum that maximizes exploration; 2) a skill library for storing and retrieving complex behaviors; and 3) a new iterative prompting mechanism that generates executable code for embodied control),
determining spatial placement information in the virtual experience for the one or more objects using the large language model to interpret the text criteria to determine locations for the one or more objects in the virtual experience (Pg. 9, Section 3.5; (1) Human as a critic (equivalent to VOYAGER’s self-verification module): humans provide visual critique to VOYAGER, allowing it to modify the code from the previous round. This feedback is essential for correcting certain errors in the spatial details of a 3D structure that VOYAGER cannot perceive directly),
and placing the one or more objects in the virtual experience based on the spatial placement information (Fig. 10; VOYAGER builds 3D structures with human feedback. The progress of building designs that integrate human input is demonstrated from left to right).
Wang does not seem to explicitly disclose wherein the placing comprises placing objects such that overlap between objects is avoided.
In the same art of virtual object placement in augmented reality environments, Beauchamp discloses wherein the placing comprises placing objects such that overlap between objects is avoided 
(Par. 0210; determine positioning data to situate the new virtual object in a contextually realistic and appropriate location (e.g., place a vase virtual object on a table, not on a sofa; avoid collisions or overlaps) based upon, for example, attributes of the objects and/or the region (e.g., surfaces detected in the region, types of objects, position and spatial information of the other objects). In this way, the client app may identify and avoid “collisions” of overlapping virtual objects).


	It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Wang’s placement of objects in the virtual experience to include overlap/collision positioning techniques as taught by Beauchamp. Doing so prevents collisions between virtual objects and improve spatial realism of the generated environment. Applying known collision-avoidance placement techniques is a predictable enhancement to virtual experience, yielding predictable results in improved immersive and interactive environments that maintain realism and interactivity expected by users. 

Regarding claim 2, Wang discloses the computer-implemented method of claim 1, and further discloses comprising modifying the virtual experience by changing an attribute of a specified object of the one or more objects in the virtual experience based on the text criteria (Fig. 2, 4, and Pg. 24-25; Retrieved skills from the skill library), wherein the attribute comprises at least one of an appearance (Pg. 24, Section A.4.1.; craftItem(bot, name, count = 1): Craft the item with a crafting table nearby), a behavior (Pg. 25, Section A.4.1; (3) Control primitive APIs provided by Mineflayer:…Examiner's note: these correlate to the bot's action or behaviors), a position (Pg. 24; Section A.4.1; placeItem(bot, name, position): Place the block at the specified position), an orientation (Pg. 25, Section A.4.1; newGoalLookAtBlock(position, bot.world, {}): Path towards a position where a face of the block at position is visible), a style (Pg. 24, Section A.4.1.; craftItem(bot, name, count = 1): Craft the item with a crafting table nearby), a material (Pg. 24, Section A.4.1; smeltItem(bot, itemName, fuelName, count = 1): Smelt the item with the specified fuel), a texture, a cost, a property, or another modifiable aspect of the specified object (Pg. 2, Section 1, VOYAGER incrementally builds a skill library by storing the action programs that help solve a task successfully).
Wang and Beauchamp are combined for the reasons set forth above with respect to claim 1.

Regarding claim 3, Wang discloses the computer-implemented method of claim 1, and further discloses wherein the identifying of the one or more objects in the virtual experience comprises:
generating one or more keywords using the large language model (Fig. 2; Add new Skill. Fig. 4; GPT-4 generates and verifies a new skill, we add it to the skill library, represented by a vector database) 
and performing a keyword search based on the keywords (Fig. 2; Skill Retrieval. Fig. 4; we perform querying to identify the top-5 relevant skills).
 Wang and Beauchamp are combined for the reasons set forth above with respect to claim 1.

Regarding claim 4, Wang discloses the computer-implemented method of claim 1, but does not appear to explicitly disclose wherein the placing comprises placing objects such that overlap between objects is avoided based on using object dimensions.
 In the same art of virtual object placement in augmented reality environments, Beauchamp discloses wherein the placing comprises placing objects such that overlap between objects is avoided based on using object dimensions (Par. 0210; determine positioning data to situate the new virtual object in a contextually realistic and appropriate location (e.g., place a vase virtual object on a table, not on a sofa; avoid collisions or overlaps) based upon, for example, attributes of the objects and/or the region (e.g., surfaces detected in the region, types of objects, position and spatial information of the other objects). In this way, the client app may identify and avoid “collisions” of overlapping virtual objects).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to determine object placement in Wang such that overlap between objects is avoided using object dimension as taught by Beauchamp. A person of ordinary skill in the art would have understood that object dimensions represent fundamental object attributes used to determine whether objects intersect within a virtual space, and incorporating object-dimension information into Wang’s placement determination would therefore have been yielded predictable results in enabled accurate collision avoidance when placing objects in virtual scenes. 

Regarding claim 5, Wang discloses the computer-implemented method of claim 1, and further discloses wherein the user prompt comprises an updated prompt (Pg. 5, Section 2.3; Iterative Prompting Mechanism. Pg. 9, Section 3.5; humans provide visual critique to VOYAGER, allowing it to modify the code from the previous round…humans break down a complex building task into smaller steps, guiding VOYAGER to complete them incrementally).
 Wang and Beauchamp are combined for the reasons set forth above with respect to claim 1.

Regarding claim 6, Wang discloses the computer-implemented method of claim 1, and further discloses comprising providing, to a user, at least one of a view of the virtual experience including the one or more objects (Fig. 2, Fig. 10 and Pg. 1; Minecraft World Environment. Pg. 25, Section A.4.1; await bot.pathfinder.goto(goal): Go to a specific position) as placed or a summary of changes made to the virtual experience (Pg. 5, Section 2.3; (1) Environment feedback, which illustrates the intermediate progress of program execution (Fig. 5, left)…We use bot.chat() inside control primitive APIs to generate environment feedback and prompt GPT-4 to use this function as well during code generation).
 Wang and Beauchamp are combined for the reasons set forth above with respect to claim 1.

Regarding claim 7, Wang discloses the computer-implemented method of claim 1, and further discloses wherein the large language model uses at least one of scene context (Fig. 3 Tasks proposed by the automatic curriculum. Pg. 20, Section A.3; input prompt to GPT-4 consists of…(2) The agent's current state: Inventory…Equipment…Nearby blocks…Nearby entities…Biome…Time…Position) and a history of user prompts (Pg. 4, Section 2.1-2.2; (3) Previously completed and failed tasks, reflecting the agent's current exploration progress and capabilities frontier… (3) The generated code from the last round, environment feedback, execution errors, and critique) to perform at least one of identifying the one or more objects or determining the spatial placement information (Pg. 9, Section 4.5 and Fig. 10; VOYAGER builds 3D structure with human feedback…Human as a critic…feedback is essential for correcting certain errors in the spatial details of a 3D structure…Human as a curriculum…).
 Wang and Beauchamp are combined for the reasons set forth above with respect to claim 1.

Regarding claim 8, Wang discloses the computer-implemented method of claim 1, and further discloses wherein the large language model uses at least one macro obtained from the natural language prompt (Pg. 24-25; (2) Control primitive APIs implemented by us…exploreUntil(bot, direction, maxTime=60, callback)…mineBlock(bot, name, count = 1)…craftItem(bot, name, count = 1…placeItem(bot, name, position)…(3) Control primitive APIs provided by Mineflayer…See Pg. 26-31 for Full system prompt for code generation) to perform at least one of identifying the one or more objects or determining the spatial placement information (Pg. 9, Section 4.5 and Fig. 10; VOYAGER builds 3D structure with human feedback…Human as a critic…feedback is essential for correcting certain errors in the spatial details of a 3D structure…Human as a curriculum…).
 Wang and Beauchamp are combined for the reasons set forth above with respect to claim 1.

Regarding claim 9, claim 9 is the CRM claim of method claim 1 and is accordingly rejected using substantially similar rationale as to that which is set for with respect to claim 1. 
 
Regarding claim 16, claim 16 is the system claim of method claim 1 and is accordingly rejected using substantially similar rationale as to that which is set for with respect to claim 1. 
 
Regarding claim 10, claim 10 has similar limitations as of claim 2, except it is a CRM claim, therefore it is rejected under the same rationale as claim 2.
 
Regarding claims 11 and 18, claims 11 and 18 has similar limitations as of claim 3, except claim 11 is the CRM claim and claim 18 is the system claim, therefore it is rejected under the same rationale as claim 3.
 
Regarding claims 14 and 20, claims 14 and 20 has similar limitations as of claim 6, except claim 14 is the CRM claim and claim 20 is the system claim, therefore it is rejected under the same rationale as claim 6.
 
Regarding claim 15, claim 15 has similar limitations as of claim 7, except it is a CRM claim, therefore it is rejected under the same rationale as claim 7.
 
Regarding claims 13 and 17, claims 13 and 17 has similar limitations as of claim 8, except claim 13 is the CRM claim and claim 17 is the system claim, therefore it is rejected under the same rationale as claim 8.
	
Regarding claim 19, Wang in view of Beauchamp discloses the system of claim 16, but Wang does not appear to explicitly disclose wherein the placing comprises placing objects such that overlap between objects is avoided by detecting whether any of the placed objects would overlap and correcting object placement if potential overlap would occur
	In the same art of virtual object placement in augmented reality environments, Beauchamp discloses wherein the placing comprises placing objects such that overlap between objects is avoided by detecting whether any of the placed objects would overlap and correcting object placement if potential overlap would occur (Par. 0210; determine positioning data to situate the new virtual object in a contextually realistic and appropriate location (e.g., place a vase virtual object on a table, not on a sofa; avoid collisions or overlaps) based upon, for example, attributes of the objects and/or the region (e.g., surfaces detected in the region, types of objects, position and spatial information of the other objects). In this way, the client app may identify and avoid “collisions” of overlapping virtual objects).
Wang and Beauchamp are combined for the reasons set forth above with respect to claim 1.

Claim(s) 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. "Voyager: An open-ended embodied agent with large language models." arXiv preprint arXiv:2305.16291 (2023), hereinafter referred to as “Wang”, in view of Beauchamp et al. (US 20230410436 A1), hereinafter referred to as “Beauchamp”, in further view of Shekhar et al. (US 10665030 B1).
Regarding claims 12, Wang in view of Beauchamp discloses the non-transitory computer-readable medium of claim 9, but does not appear to explicitly disclose wherein the placing comprises placing objects such that overlap between objects is avoided based on rules in the user prompt for proper placement of objects
In the same art of natural language in 3D scenes, Shekhar discloses disclose wherein the placing comprises placing objects such that overlap between objects is avoided based on rules in the user prompt for proper placement of objects (Column 2, lines, 61-64; The natural language input can describe static and dynamic (animated) scenes both explicitly, where the spatial and size relationships between objects are defined by the textual description. Column 6, lines 35-43; A scene graph is a graphical representation of natural language text, where each node in the graph corresponds to objects or other entities referenced in the text…relation edges, which describe spatial and size relationships between objects/entities. Column 10, lines 43-56; to predict the position and size of each object from those of an object that has already been positioned either manually or through another relation, and then the scene is composed from these predictions. To avoid overlap between objects, a simple heuristic is used for comparing the predicted size and location of the current object with the size and location of objects whose positions have been fixed and removing any overlap by shifting the object along the direction that requires the least possible shift (by magnitude). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to determine Wang’s object placement such that overlap between objects is avoided based on rules derived from user prompt as taught by Shekhar. Incorporating prompt-derived spatial rules/constraint when determining object placement improves the consistency between the generated scene and the user’s natural language description, while also preventing error in object arrangements. Constraints within user prompt is a common technique and predictable enhancement to improve realism and usability of automatically generated virtual experiences.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JENNY NGAN TRAN whose telephone number is (571)272-6888. The examiner can normally be reached Mon-Thurs 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alicia Harrington can be reached at (571) 272-2330. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/JENNY N TRAN/
Examiner, Art Unit 2615     

/ALICIA M HARRINGTON/Supervisory Patent Examiner, Art Unit 2615

Read full office action

Prosecution Timeline

May 01, 2024

Application Filed

Oct 16, 2025

Non-Final Rejection — §103

Nov 25, 2025

Interview Requested

Dec 03, 2025

Examiner Interview Summary

Dec 03, 2025

Applicant Interview (Telephonic)

Jan 20, 2026

Response Filed

Mar 05, 2026

Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/333,695

Patent 12499589

SYSTEMS AND METHODS FOR IMAGE GENERATION VIA DIFFUSION

2y 6m to grant Granted Dec 16, 2025

Study what changed to get past this examiner. Based on 1 most recent grants.

Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.

Typically takes 5-10 seconds — AI-generated, attorney review required before filing

Prosecution Projections

3-4

Expected OA Rounds

20%

Grant Probability

70%

With Interview (+50.0%)

2y 6m (~6m remaining)

Median Time to Grant

Moderate

PTA Risk

Based on 5 resolved cases by this examiner. Grant probability derived from career allowance rate.