Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment, filed on 12/16/2025, has been entered and acknowledged by the Examiner. Claims 1-11 are pending.
The rejection under 35 U.S.C. § 101 is withdrawn in light of the cancellation of claims 12-20.
Response to Arguments
Applicant's arguments filed 12/16/2025 have been fully considered but they are not persuasive.
Issue: The applicant argues that neither Cella nor Liu teaches or suggests fine-tuning visual foundation models with remote-sensing imagery datasets for remote sensing-specific tasks. The applicant contends that Cella is a broad disclosure of AI/ML platforms and digital twins, but nowhere discloses or suggests fine-tuning vision models specifically with remote-sensing imagery datasets for remote-sensing applications. Cella states: "learning may be used for adapting or tuning data collected for one task on another related task. For example, transfer learning may reuse a model developed for one task as the starting point for a model on a second related task." The applicant characterizes this as a vague, generic statement about staging the output of one model as the input to the next, which does not disclose or suggest the claimed step of fine-tuning VFMs with remote-sensing imagery datasets to adapt or optimize them for remote-sensing image processing.
Response: The examiner respectfully disagrees and submits that Cella at least discloses that an asset management application 814 may use robotic process automation 1442 for automation of an asset inspection process that is normally performed or supervised by a human, such as by automating a process involving visual inspection using video or still images from a camera or other device that displays images of an entity 652, where the robotic process automation 1442 system is trained to automate the inspection by observing interactions of a set of human inspectors or supervisors with an interface that is used to identify, diagnose, measure, parameterize, or otherwise characterize possible defects or favorable characteristics of a facility or other asset (¶ [0262]). The examiner submits that this passage discloses fine-tuning with visual data for remote-sensing applications.
Issue: The applicant argues that Liu does mention using "Segment Anything" and "Grounding DINO" as part of its toolkit (see Table 1, p. 7), but these are not fine-tuned or adapted for remote-sensing imagery. The cited models in Liu are used for general computer vision tasks, not for the remote-sensing-specific applications recited in the claims. Similarly, Cella does not mention OV-DETR, Grounding Dino, or Segment Anything Model, nor does it discuss hydrological or geomorphological catastrophes. The Office Action's citation to [0726] ("hydrodynamic changes") and [0600] ("catastrophes") in Cella refers to general digital twin modeling of maritime assets, not to remote-sensing image analysis or fine-tuning of vision models for such tasks as recited in the claims and disclosed in the present application.
Response: The examiner respectfully disagrees and submits that Cella discloses that a machine learning system can learn, using the training data set, to identify the same characteristics, which in turn can be used to automate the inspection process such that defects or favorable properties are automatically classified and detected in a set of video or still images (¶ [0262]), which can be utilized with the mentioned toolkits.
Issue: The applicant submits in traversal that while both references describe cloud-based systems and user interfaces, neither reference discloses or suggests the specific architecture recited in the claims, where a prompt manager forwards a user's natural language query to an LLM, which then determines and invokes a matched VFM that has been fine-tuned for remote-sensing imagery. Further, the applicant respectfully submits that the claimed invention addresses a specific technical problem in remote-sensing image analysis: enabling non-expert users to perform complex image-related tasks on remote-sensing images via natural language, by leveraging a combination of LLMs and VFMs that have been fine-tuned with remote-sensing datasets. The applicant argues that the cited references do not address or solve this problem.
Response: The examiner respectfully disagrees and submits that, as discussed above, Cella and Liu, in combination, disclose fine-tuning with remote-sensing video and image data collected from sensor systems to monitor, analyze, and inspect assets for management purposes. Therefore, Applicant's arguments are not persuasive.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-11 are rejected under 35 U.S.C. 103 as being unpatentable over Cella et al. (US Pub. 2023/0123322) in view of Liu (InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language).
Regarding claim 1, Cella discloses a computer-implemented method for chatting with a user in natural language and performing an image-related task mentioned or hinted by the user in a query, the image-related task being performed on one or more remote-sensing images (¶ [0375]; remotely ¶ [0993]), the method comprising:
setting up, in software, a platform used for bidirectionally communicating with the user and performing the image-related task, wherein the platform comprises a prompt manager for communicating and prompting a large language model (LLM) (Fig. 4), the prompt manager being arranged to:
forward the query to the LLM so as to cause the LLM to identify the image-related task from the query and determine one or more image-processing actions to be performed on the one or more remote-sensing images for accomplishing the image-related task (¶ [0158], [0375], active learning may include interactively querying, by the machine learning model AILD102T, a user and/or an information source to label new data points with desired outputs; ¶ [0260], For example, the artificial intelligence system 1160 may process image frames of the video feed to find markings (such as produce labels, SKUs, images, logos, or the like));
prompt the LLM to provoke a matched visual foundation model (VFM) selected from a predetermined set of one or more VFMs to perform an individual image-processing action if the LLM determines that the individual image-processing action matches an image-processing operation performable by the matched VFM (¶ [1286], an AI system 10212 may automate one or more of the design, configuration, scheduling, coordination and/or execution of a set of robotic jobs and a set of additive manufacturing jobs, such that the capabilities of an integrated mobile robotic and additive manufacturing unit are coordinated across the various jobs in time (e.g., where an interior 3D printer or other additive manufacturing unit 10102 prints a tool, workpiece, part or the like for a later job while the robotic unit performs a current job) and/or wherein jobs are coordinated across a fleet or workforce of robotic units, additive manufacturing units, and integrated combinations thereof (such as where units are matched to jobs according to locations, robotic capabilities, additive manufacturing capabilities, and other factors); and
receive from the LLM a reply to the user on any outcome of the image-related task (¶ [1286], matched to jobs);
using the platform to receive the query from the user, process the query and forward the reply to the user (¶ 1299], produce an output); and
before the platform is used to process the query, fine-tuning one or more selected VFMs with one or more remote-sensing imagery datasets (¶ [0262]) such that one or more respective image-processing operations performable by the one or more selected VFMs are adapted to or optimized for remote sensing-related image processing, wherein the one or more selected VFMs are selected from the predetermined set of one or more VFMs (¶ [1918], learning may be used for adapting or tuning data collected for one task on another related task. For example, transfer learning may reuse a model developed for one task as the starting point for a model on a second related task).
Cella does not explicitly disclose a visual foundation model (VFM); however, Liu discloses a visual foundation model (p. 4, connecting ChatGPT with visual foundation models to generate and edit images during chatting).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Liu into Cella to integrate an LLM with a pool of vision experts for multimodal reasoning and action.
Regarding claim 2, Cella in view of Liu discloses the method of claim 1, wherein the one or more selected VFMs are fine-tuned with the one or more remote-sensing imagery datasets in a self-supervised manner (p. 3, fine-tuned ability).
Regarding claim 3, Cella in view of Liu discloses the method of claim 1, wherein the one or more respective image-processing operations performed by the one or more selected VFMs include one or more operations selected from an image classification operation, an object detection operation and an image segmentation operation (¶ [1118], object detection model).
Regarding claim 4, Cella in view of Liu discloses the method of claim 1, wherein the one or more selected VFMs include one or more of machine-learning models selected from OV-DETR, Grounding Dino and Segment Anything Model (p. 7, DINOv2; p. 11, Segment Anything).
Regarding claim 5, Cella in view of Liu discloses the method of claim 1, wherein the one or more respective image-processing operations performed by the one or more selected VFMs include one or more operations for detecting or identifying one or more types of hydrological or geomorphological catastrophes (¶ [0726], hydrodynamic changes; ¶ [0600], catastrophes).
Regarding claim 6, Cella in view of Liu discloses the method of claim 5, wherein the one or more types of hydrological or geomorphological catastrophes include flooding, landsliding, or both (¶ [0600], catastrophes).
Regarding claim 7, Cella in view of Liu discloses the method of claim 1, wherein the platform is set up in a cloud-computing environment (¶ [0158], cloud computing capability).
Regarding claim 8, Cella in view of Liu discloses the method of claim 1, wherein the platform further comprises a command interface for interfacing with the user, the command interface being arranged to receive the query from the user, forward the received query to the prompt manager and forward the reply to the user (¶ [0528], series of regular prompts that may ask and receive, reading off of event logs or feeds, and the like).
Regarding claim 9, Cella in view of Liu discloses the method of claim 8, wherein the command interface is further arranged to support sending and receiving image files during chatting (¶ [0641]).
Regarding claim 10, Cella in view of Liu discloses the method of claim 1, wherein the LLM is selected to be ChatGPT (p. 4, ChatGPT).
Regarding claim 11, Cella in view of Liu discloses the method of claim 1, wherein the platform further comprises the LLM and the predetermined set of one or more VFMs such that the platform is self-contained with a visual language model (p. 2, Husky is a large-scale visual language model).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Shao discloses rig state detection using video data (US Pub. 2023/0186627).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TUANKHANH D PHAN whose telephone number is (571)270-3047. The examiner can normally be reached Mon-Fri, 10:00 AM - 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Boris Gorney, can be reached at 571-270-5626. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 or 571-272-1000.
/TUANKHANH D PHAN/ Examiner, Art Unit 2154