Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in Japan on 09/26/2023. It is noted, however, that applicant has not filed a certified copy of the JP2023-162954 application as required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 08/06/2024, 02/04/2025, and 09/05/2025 have been considered by the examiner.
Drawings
The drawings submitted on 08/06/2024 have been considered by the examiner.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 11-30 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by William et al., “Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language”.
Regarding Claims 11 and 21, William teaches: An explanation generating system comprising a database (Fig. 2, frozen LLM and a set of “vision modules”) which can be searched by a worker in natural language; a processor which stores a sentence to said database (Introduction: Large Language Models (LLMs) … capabilities in semantic understanding, question answering and text generation … Figure 2: The LENS framework. LENS executes computer vision and visual reasoning tasks through a frozen LLM and a set of “vision modules”. LENS leverages these vision modules to retrieve a textual description for an image, which is used by the “reasoning module” (LLM) to generate a response for a given query); wherein said processor: receives scene information indicating a scene recognized by a generative model (LENS) which can analyze an image from a camera and output in natural language (3.2, LENS Components: LENS consists of 3 distinct vision modules and 1 reasoning module, each serving a specific purpose based on the task at hand. Tag Module. Given an image, this module identifies and assigns tags to the image. To accomplish this, we employ a vision encoder (CLIP) that selects the most suitable tags for each image. In our work, we adopt a common prompt: "A photo of {classname}" … Attributes Module. We utilize this module to identify and assign relevant attributes to the objects present in the image. Intensive Captioner. We utilize an image captioning model called BLIP and apply stochastic top-k sampling [12] to generate N captions per image. Reasoning Module.
We adopt a frozen LLM as our reasoning module, which is capable of generating answers based on the textual descriptions fed by the vision modules, along with the task-specific instructions.); inputs a prompt to said generative model, wherein said prompt corresponds to said scene information and said prompt is generated based on explanation necessity of an object set for each scene; receives a situation explanatory sentence generated by said generative model according to said prompt (3.3 Prompt Design: With the textual information obtained from the vision modules, we construct complete prompts for the LLM by combining them. We format the tags module as Tags: {Top-k tags}, the attributes module as Attributes: {Top-K attributes}, and the intensive captioning module as Captions: {Top-N Captions}. In particular, for the hateful-memes task, we incorporate an OCR prompt as OCR: this is an image with written "{meme text}" on it. Finally, we append the specific question prompt: Question: {task-specific prompt} \n Short Answer: at the end. Also see Fig. 4, where a user queries a scene/image with “Tell me something about the history of this place,” and LENS generates an answer explaining the object in the scene: “The Great Wall of China is a fortification built by the ancient Chinese to keep out invaders.”); and associates said situation explanatory sentence with said image, and stores said situation explanatory sentence into said database (3.2 LENS Components: LENS consists of 3 distinct vision modules and 1 reasoning module, each serving a specific purpose based on the task at hand. 4.2 Implementation Details: We use OpenCLIP-H/14 and CLIP-L/14 as our default vision encoders in both tags and attributes modules.
We adopt a BLIP-large captioning checkpoint finetuned on COCO [36] in the intensive captioning module. In this module, we perform top-k sampling [12], where k represents the desired number of captions, and generate a maximum of k = 50 captions per image. Finally, we adopt Flan-T5 models as our default family of frozen LLMs [37]. To generate answers in line with the evaluation tasks, we employ beam search with the number of beams equal to 5.).
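The prompt format quoted from Section 3.3 of the cited reference can be sketched as follows. This is an illustrative reconstruction for clarity only; the function and variable names are hypothetical and are not part of the reference or the record:

```python
def build_lens_prompt(tags, attributes, captions, question, meme_text=None):
    """Assemble a LENS-style prompt from vision-module outputs,
    following the format quoted from Section 3.3 of the reference."""
    parts = [
        "Tags: " + ", ".join(tags),            # Tags: {Top-k tags}
        "Attributes: " + ", ".join(attributes),  # Attributes: {Top-K attributes}
        "Captions: " + " ".join(captions),     # Captions: {Top-N Captions}
    ]
    # For the hateful-memes task, the reference inserts an OCR line as well.
    if meme_text is not None:
        parts.append(f'OCR: this is an image with written "{meme_text}" on it.')
    # The task-specific question prompt is appended at the end.
    parts.append(f"Question: {question} \nShort Answer:")
    return "\n".join(parts)


prompt = build_lens_prompt(
    tags=["wall", "fortification"],
    attributes=["ancient", "stone"],
    captions=["a long stone wall winding over hills"],
    question="Tell me something about the history of this place.",
)
```

The assembled string is what the reference's frozen LLM (reasoning module) receives in place of any visual input.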
Regarding Claims 12 and 22, William teaches: The explanation generating system according to claim 11, wherein said generative model includes a language model (LLM) (See rejection of claim 11 and 3.2, LENS Components, Reasoning Module: We adopt a frozen LLM as our reasoning module, which is capable of generating answers based on the textual descriptions fed by the vision modules, along with the task-specific instructions.).
Regarding Claims 13 and 23, William teaches: The explanation generating system according to claim 11, wherein said prompt is generated based on said explanation necessity and recognition necessity (See rejection of claim 11; also see Fig. 4, where a user queries a scene/image with “Tell me something about the history of this place.”).
Regarding Claims 14 and 24, William teaches: The explanation generating system according to claim 11, wherein said prompt includes an image explanatory sentence (See rejection of claim 11 and Fig. 4).
Regarding Claims 15 and 25, William teaches: The explanation generating system according to claim 11, wherein said prompt includes a control explanatory sentence (See rejection of claim 11; also see Fig. 4, where a user queries a scene/image with “Tell me something about the history of this place.”).
Regarding Claims 16 and 26, William teaches: The explanation generating system according to claim 11, wherein said prompt includes a GPS (China) explanatory sentence (See rejection of claim 11; also see Fig. 4, where a user queries a scene/image with “Tell me something about the history of this place,” and LENS generates an answer explaining the object in the scene: “The Great Wall of China is a fortification built by the ancient Chinese to keep out invaders.”).
Regarding Claims 17 and 27, William teaches: The explanation generating system according to claim 12, wherein said prompt is generated based on said explanation necessity and recognition necessity (See rejection of claim 11; also see Fig. 4, where a user queries a scene/image with “Tell me something about the history of this place.”).
Regarding Claims 18 and 28, William teaches: The explanation generating system according to claim 17, wherein said prompt includes an image explanatory sentence (See rejection of claim 11; also see Fig. 4, where a user queries a scene/image with “Tell me something about the history of this place.”).
Regarding Claims 19 and 29, William teaches: The explanation generating system according to claim 18, wherein said prompt includes a control explanatory sentence (See rejection of claim 11; also see Fig. 4, where a user queries a scene/image with “Tell me something about the history of this place.”).
Regarding Claims 20 and 30, William teaches: The explanation generating system according to claim 19, wherein said prompt includes a GPS (China) explanatory sentence (See rejection of claim 11; also see Fig. 4, where a user queries a scene/image with “Tell me something about the history of this place,” and LENS generates an answer explaining the object in the scene: “The Great Wall of China is a fortification built by the ancient Chinese to keep out invaders.”).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record Chae et al. (KR 102785215 B1) teaches: System And Method For Providing Conversational Artificial Intelligence Service Using Complex Analysis Of Image And Query.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571) 270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Paras Shah, can be reached at 571-270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2653