DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see Remarks page 1, filed 03/16/2026, with respect to the rejections of claims 1-16 under 35 U.S.C. 112(b) have been fully considered and are persuasive. The rejections of claims 1-16 have been withdrawn.
Applicant’s arguments, see Remarks pages 1-3, filed 03/16/2026, with respect to the rejections of claims 1-16 under 35 U.S.C. 101 have been fully considered and are persuasive. The rejections of claims 1-16 have been withdrawn.
Applicant’s arguments, see Remarks pages 4-6, filed 03/16/2026, with respect to the rejections of amended claim(s) 1 & 9 under 35 U.S.C. 102(a)(1) have been fully considered and are moot in view of the new grounds of rejection (detailed in the rejections below) necessitated by Applicant’s amendment to the claim(s).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 7, 9-12, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Moon et al. (PDDL Planning with Natural Language-Based Scene Understanding for UAV-UGV Cooperation), hereinafter referenced as Moon, in view of Im et al. (KR102185777B1), hereinafter referenced as Im.
Regarding claim 1, Moon discloses: A method for organizing and merging a scene of an Artificial Intelligence (AI) agent (Moon: Figure 2; Abstract), comprising: acquiring an image of a first space (Moon: Figure 3; 4.1. Experiment Setting: “Each Smart Cookie has 2D laser sensors and an RGB-D camera. Beyond is equipped with an RGB-D camera. The laser sensor is used for navigation on the execution part while the RGB-D cameras are used for cognition for the natural language-based scene understanding.”);
recognizing objects in the image; structuring information about a relationship between the objects and information about states of the objects in a form of a graph (Moon: Figure 8b; 3.1. Natural Language-Based Cognition: “In this study, we assume that the robots use a graph map (for motion planning) generated using semantic SLAM, which is a widely used environment representation method in robotics [7]. To utilize the graph map G = (V, E) that contains features and positions of the detected object as nodes vi ∈ V and their relationships as edges eij = (vi , vj) ∈ Eij, we closely follow Moon and Lee [41] for generating the language description and graph inference phase of Xu et al. [42] for the scene graph generation.”; Wherein the scene graph generated from the graph map contains the information regarding object relationships and object states.); and
merging the structured information with information received from another AI agent (Moon: 3. Architecture: “our framework entails natural language-based cognition and a knowledge engine for multiple agents…sensor information obtained from environments is continuously passed to cognition. During cognition, scene understanding-based natural language is created by generating language description and scene understanding using deep learning techniques. Then, the generated semantic information is passed on to the knowledge engine while raw sensor data are sent to episodic memory storage.”; 3.2. Knowledge Engine: “The knowledge engine obtains semantic environmental information in XML and stores it in triple store, which uses a resource description framework (RDF) such as ”subject-predicate-object” or ”resource-property type-value” unlike the conventional relational database that saves data in ”key-value.” Triple store uses the SPARQL protocol and RDF query language (SPARQL) to create, read, update, and delete the graph data that contain relations between objects.”; Wherein the multiple agents are continuously providing semantic environmental data, which is stored and shared between all agents.),
wherein merging the structured information comprises: detecting a change in a scene at a specific time point (Moon: Section: 4.2: “Every robot was required to report the current situation to the DICQ.R as well as if an unusual situation occurred. During the mission, we surmised what may happen if a dynamic obstacle, which a robot could not approach, were to suddenly appear at the POI. In this situation, the robot will generate natural language to report the current situation to the DICQ.R.”);
generating a summary sentence describing the detected change in the scene using a language-vision model based on words and visual features corresponding to the detected change (Moon: 3.1. Natural Language-Based Cognition: “Then, an RNN is used to generate a language description over the graph. The RNN takes the encoded graph features concatenated with a word vector and predicts the probabilistic distribution of the target word vector. Given that we also back-propagate the GCN when training the RNN, we can expect that graph features that fit the generated sentence will be extracted. The generated description can be used to understand the surrounding environment when an unexpected situation occurs.”); and
exchanging the generated summary sentence with the other AI agent (Moon: Figure 7: “• Perform ‘find missing child’ mission received from DICQ.R (A2)
1. DICQ.R command Smart cookie1,2,3, and Beyond-pf1 to find a missing child
2. Smart cookie1,2,3, and Beyond-pf1 visit every POI to find the missing child
3. Beyond-pf1 find a human
4. Beyond-pf1 create POI at human position
5. DICQ.R generate new mission for Smart cookie1,2,3 to go to created POI to check the found human is the missing child…
•Smart cookie cannot approach POI due to a dynamic obstacle (A3)
1. Smart cookie generates natural language to describe the current situation to DICQ.R
2. Replanning”;
3. Architecture: “our framework entails natural language-based cognition and a knowledge engine for multiple agents…sensor information obtained from environments is continuously passed to cognition. During cognition, scene understanding-based natural language is created by generating language description and scene understanding using deep learning techniques. Then, the generated semantic information is passed on to the knowledge engine while raw sensor data are sent to episodic memory storage. Using the episodic memory and knowledge collected from multiple robots, the PDDL planning agent builds a sequence of actions for each agent.”;
Wherein the generated semantic information, such as natural language and scene graphs, from each robot/agent is collected and processed in order to form a collective scene understanding for action coordination.).
Moon does not disclose expressly: wherein merging the structured information comprises: inputting labels of respective objects in the scene to a pretrained language model configured to determine contextually appropriate state-related candidate words combinable with the object labels; and generating a summary sentence describing the detected change in the scene using a language-vision model based on the candidate words and visual features corresponding to the detected change.
Im discloses: inputting labels of respective objects in the scene to a pretrained language model configured to determine contextually appropriate state-related candidate words combinable with the object labels (Im: 0001: “The present invention relates to a method for recognizing semantic relationships of image objects based on deep learning and PLSI using a computer, and more specifically, to a method for recognizing semantic relationships of image objects based on deep learning and PLSI that introduces PLSI into the recognition of semantic relationships of image objects using deep learning to recognize objects contained in an image and find relationships between the recognized objects.”);
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to substitute the semantic graph generation method for the generation of natural language descriptions and scene graphs disclosed by Moon with the deep learning based scene graph generation method taught by Im. The suggestion/motivation for doing so would have been “existing scene graph generation methods primarily rely on locating objects within an image and then identifying their relationships; consequently, they have disadvantages such as the possibility of incorrectly detecting relationships between objects and the inability to detect relationships more precisely” (Im: 0017). Further, one skilled in the art could have substituted the elements as described above by known methods with no change in their respective functions, and the substitution would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Moon with Im to obtain the invention as specified in claim 1.
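The “subject-predicate-object” triple representation quoted from Moon’s knowledge engine (Section 3.2) can be illustrated with the following minimal sketch. This is an illustrative sketch only; all identifiers are hypothetical and the code is not drawn from the cited reference.

```python
# Minimal sketch of a "subject-predicate-object" triple store of the kind
# Moon describes (Section 3.2). All names here are hypothetical
# illustrations, not code from the cited reference.

class TripleStore:
    def __init__(self):
        self.triples = set()  # {(subject, predicate, object), ...}

    def add(self, subj, pred, obj):
        self.triples.add((subj, pred, obj))

    def query(self, subj=None, pred=None, obj=None):
        # Wildcard match: None matches any term (analogous to a SPARQL variable).
        return [t for t in self.triples
                if (subj is None or t[0] == subj)
                and (pred is None or t[1] == pred)
                and (obj is None or t[2] == obj)]

# One agent's scene graph, expressed as triples.
store = TripleStore()
store.add("cup", "on", "table")
store.add("cup", "state", "empty")

# Merging structured information received from another agent.
other_agent = [("door", "state", "open"), ("cup", "near", "door")]
for t in other_agent:
    store.add(*t)
```

In this sketch, merging two agents’ scene information reduces to taking the union of their triple sets, and a wildcard query plays the role of a SPARQL read over the shared graph.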
Regarding claim 2, Moon in view of Im discloses: The method of claim 1, wherein: the objects are recognized based on a first AI neural network that is pretrained by receiving an image of a specific space as input (Im: 0019-0020: “To achieve the objective of the present invention, a method for recognizing semantic relationships of image objects based on deep learning and PLSI according to the present invention comprises: a step of receiving an image input; A step of detecting objects in the image using a deep learning-based object detection method”), and the information about the states of the objects is acquired by inputting the objects to a second AI neural network that is trained based on training data formed by labeling objects with respective states (Im: 0072-0073: “Then, PLSI is used to detect the context of the image. As shown in Fig. 6, the object named person is replaced by the player object due to the contextual situation, and the object named ball is materialized into the basketball object. In other words, objects within a comprehensive upper scope are limited to objects within a more detailed lower scope. Then, using context, existing deep learning-based relationship detection methods, such as spatial relationship detection utilizing spatial features and ontology methods, are used to find relationships between objects.”).
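The two-network arrangement recited in claim 2 (a first network recognizing objects in an image, and a second network, trained on state-labeled objects, assigning a state to each recognized object) can be illustrated by the following sketch, in which both networks are replaced by hypothetical stand-in stubs:

```python
# Hedged sketch of a two-stage recognize-then-classify pipeline.
# Both "networks" below are stand-in stubs; all names and outputs are
# hypothetical illustrations, not code from the cited references.

def detect_objects(image):
    # Stand-in for a pretrained object detector (the first AI neural network).
    return ["cup", "table"]

def classify_state(obj):
    # Stand-in for a state classifier trained on state-labeled objects
    # (the second AI neural network).
    states = {"cup": "empty", "table": "clear"}
    return states.get(obj, "unknown")

image = object()  # placeholder image input
# Structured result: each recognized object paired with its inferred state.
scene = {obj: classify_state(obj) for obj in detect_objects(image)}
```

The point of the sketch is only the data flow: the detector’s object labels become the inputs to the state classifier, yielding the object-state pairs that are later structured into a graph.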
Regarding claim 3, Moon in view of Im discloses: The method of claim 1, wherein merging the structured information comprises merging information corresponding to the first space with information corresponding to a second space of the other AI agent (Moon: 3. Architecture: “our framework entails natural language-based cognition and a knowledge engine for multiple agents…sensor information obtained from environments is continuously passed to cognition. During cognition, scene understanding-based natural language is created by generating language description and scene understanding using deep learning techniques. Then, the generated semantic information is passed on to the knowledge engine while raw sensor data are sent to episodic memory storage.”; Wherein the agents each share information regarding their current location).
Regarding claim 4, Moon in view of Im discloses: The method of claim 3, wherein the merged information is configured such that a positional relationship between the first space and the second space is represented in a form of a link (Moon: 3. Architecture: “our framework entails natural language-based cognition and a knowledge engine for multiple agents…sensor information obtained from environments is continuously passed to cognition. During cognition, scene understanding-based natural language is created by generating language description and scene understanding using deep learning techniques. Then, the generated semantic information is passed on to the knowledge engine while raw sensor data are sent to episodic memory storage.”;
3.2. Knowledge Engine: “The knowledge engine obtains semantic environmental information in XML and stores it in triple store…The triple store facilitates the reasoning process by using the relations and attributes between objects to find new relations.”;
4.2. Scenario: “we surmised what may happen if a dynamic obstacle, which a robot could not approach, were to suddenly appear at the POI. In this situation, the robot will generate natural language to report the current situation to the DICQ.R.”; Wherein in the case that a robot/agent’s mission were to fail due to a dynamic object at its location, the knowledge engine, and thus the DICQ.R is updated for the replanning of robot/agents’ actions, thus indicating a link between each robot’s location based on the information in the knowledge engine.).
Regarding claim 7, Moon in view of Im discloses: The method of claim 1, wherein merging the structured information comprises exchanging information about an object having a state that is updated over time, with the other AI agent (Moon: Figure 7: “• Perform ‘find missing child’ mission received from DICQ.R (A2)
1. DICQ.R command Smart cookie1,2,3, and Beyond-pf1 to find a missing child
2. Smart cookie1,2,3, and Beyond-pf1 visit every POI to find the missing child
3. Beyond-pf1 find a human
4. Beyond-pf1 create POI at human position
5. DICQ.R generate new mission for Smart cookie1,2,3 to go to created POI to check the found human is the missing child…
•Smart cookie cannot approach POI due to a dynamic obstacle (A3)
1. Smart cookie generates natural language to describe the current situation to DICQ.R
2. Replanning”;
3. Architecture: “our framework entails natural language-based cognition and a knowledge engine for multiple agents…sensor information obtained from environments is continuously passed to cognition. During cognition, scene understanding-based natural language is created by generating language description and scene understanding using deep learning techniques. Then, the generated semantic information is passed on to the knowledge engine while raw sensor data are sent to episodic memory storage. Using the episodic memory and knowledge collected from multiple robots, the PDDL planning agent builds a sequence of actions for each agent.”).
As per claim 9, arguments made in rejecting claim 1 are analogous. In addition, Section 4.1 (Experiment Setting) of Moon discloses the experiment and its components, which comprise a memory in which at least one program is recorded and a processor for executing the program.
As per claim 10, arguments made in rejecting claim 2 are analogous.
As per claim 11, arguments made in rejecting claim 3 are analogous.
As per claim 12, arguments made in rejecting claim 4 are analogous.
As per claim 15, arguments made in rejecting claim 7 are analogous.
Claim(s) 5-6 and 13-14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Moon in view of Im, and further in view of Dorr et al. (Cooperative Longterm SLAM for Navigating Mobile Robots in Industrial Applications) hereinafter referenced as Dorr.
Regarding claim 5, Moon in view of Im discloses: The method of claim 1.
Moon in view of Im does not disclose expressly: wherein merging the structured information comprises determining whether to merge the structured information with the information received from the other AI agent based on metadata of the other AI agent, and the metadata includes timestamp information corresponding to the information about the states of the objects.
Thus, Moon in view of Im does not disclose merging the stored environmental data collected by its agents based on metadata, including timestamp information.
Dorr discloses: the merging of environmental data, using data collected by mobile agents, by dividing the environmental map into a grid and merging the cells based on the cell’s most recent observation (Dorr: III. D. Map Transmission: “the map upstream from the agent to the server is realized by a periodic request by the server. This enables efficient work load management. In order to reduce bandwidth requirements, cells are tagged, that have been observed since the last map upload. As a result, only these cells have to be transmitted and merged, which substantially increases the efficiency.”; III. E. Map Merging: “Map merging is needed when agents receive new map information from the server or when detected changes of the agents have to be merged with the server's global map. In both cases, we use a timestamp-based strategy where each cell keeps the timestamp of its most current observation. Following this approach, maps can easily be merged by only using the latest updated cell and discarding all obsolete ones”; Wherein the observation timestamp of a cell is updated based on when a robot observes the cell.).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement the known technique of merging map information based on the most recent observation disclosed by Dorr by merging/processing the robot/agent generated semantic information stored in the Knowledge Engine disclosed by Moon in view of Im based on each object’s most recent observation. The suggestion/motivation for doing so would have been “Following this approach, maps can easily be merged by only using the latest updated cell and discarding all obsolete ones. This produces almost no overhead and leverages prior available information” (Dorr: III. E. Map Merging). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Moon in view of Im with Dorr to obtain the invention as specified in claim 5.
Regarding claim 6, Moon in view of Im and Dorr discloses: The method of claim 5, wherein merging the structured information comprises merging the information about the states of the objects determined to have been updated within a preset time period back from a current time using the timestamp information (Dorr: III. E. Map Merging: “Map merging is needed when agents receive new map information from the server or when detected changes of the agents have to be merged with the server's global map. In both cases, we use a timestamp-based strategy where each cell keeps the timestamp of its most current observation. Following this approach, maps can easily be merged by only using the latest updated cell and discarding all obsolete ones”; Wherein the time between the last updated timestamp and the current time constitutes a preset time period.).
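The timestamp-based merge strategy quoted from Dorr (keep the most recent observation per cell, discard obsolete ones) can be illustrated with this minimal sketch; the map layout and all names are hypothetical and not drawn from the cited reference:

```python
# Illustrative sketch of the timestamp-based merge strategy Dorr describes:
# each cell keeps the timestamp of its most recent observation, and merging
# keeps only the newest observation per cell. All names are hypothetical.

def merge_maps(local, remote):
    """Merge two {cell: (timestamp, value)} maps, keeping the newest entry."""
    merged = dict(local)
    for cell, (ts, value) in remote.items():
        if cell not in merged or ts > merged[cell][0]:
            merged[cell] = (ts, value)
    return merged

agent_map  = {(0, 0): (10, "free"), (0, 1): (12, "occupied")}
server_map = {(0, 0): (15, "occupied"), (1, 1): (9, "free")}

merged = merge_maps(agent_map, server_map)
```

In the example, the server’s newer observation of cell (0, 0) replaces the agent’s older one, while cells observed by only one side are carried over unchanged.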
As per claim 13, arguments made in rejecting claim 5 are analogous.
As per claim 14, arguments made in rejecting claim 6 are analogous.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTHONY J RODRIGUEZ whose telephone number is (703) 756-5821. The examiner can normally be reached Monday-Friday 10am-7pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANTHONY J RODRIGUEZ/Examiner, Art Unit 2672
/SUMATI LEFKOWITZ/Supervisory Patent Examiner, Art Unit 2672