Prosecution Insights
Last updated: April 19, 2026
Application No. 18/795,614

IMAGE EXPLANATION SYSTEM, IMAGE ANALYSIS DEVICE, AND IMAGE EXPLANATION METHOD

Non-Final OA §102
Filed: Aug 06, 2024
Examiner: ISLAM, MOHAMMAD K
Art Unit: 2653
Tech Center: 2600 — Communications
Assignee: Hitachi, Ltd.
OA Round: 1 (Non-Final)
Grant Probability: 83% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 9m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 83% — above average (+21.1% vs TC avg)
1070 granted / 1288 resolved
Interview Lift: +16.5% — strong (allow rate with vs. without an interview, across resolved cases with an interview)
Avg Prosecution: 2y 9m typical timeline
Currently Pending: 83 applications
Total Applications: 1371, across all art units

Statute-Specific Performance

§101: 21.4% (-18.6% vs TC avg)
§103: 32.6% (-7.4% vs TC avg)
§102: 25.0% (-15.0% vs TC avg)
§112: 14.6% (-25.4% vs TC avg)
Tech Center averages are estimates • Based on career data from 1288 resolved cases

Office Action

§102
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Priority

Acknowledgment is made of applicant's claim for foreign priority based on an application filed in Japan on 09/26/2023. It is noted, however, that applicant has not filed a certified copy of the JP2023-162954 application as required by 37 CFR 1.55.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 08/06/2024, 02/04/2025, and 09/05/2025 have been considered by the examiner.

Drawings

The drawings submitted on 08/06/2024 have been considered by the examiner.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 11-30 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by William et al., "Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language."

Regarding Claims 11 and 21, William teaches: An explanation generating system comprising a database which can be searched by a worker in natural language (Fig. 2: frozen LLM and a set of "vision modules"); a processor which stores a sentence to said database (Introduction: "Large Language Models (LLMs) … capabilities in semantic understanding, question answering and text generation"; Figure 2: "The LENS framework. LENS executes computer vision and visual reasoning tasks through a frozen LLM and a set of 'vision modules'. LENS leverages these vision modules to retrieve a textual description for an image which is used by the 'reasoning module' (LLM) to generate a response for a given query.");

wherein said processor: receives scene information indicating a scene recognized by a generative model (LENS) which can analyze an image from a camera and output in natural language (Section 3.2, LENS Components: "LENS consists of 3 distinct vision modules and 1 reasoning module, each serving a specific purpose based on the task at hand. Tag Module. Given an image, this module identifies and assigns tags to the image. To accomplish this, we employ a vision encoder (CLIP) that selects the most suitable tags for each image. In our work, we adopt a common prompt: 'A photo of {classname}' … Attributes Module. We utilize this module to identify and assign relevant attributes to the objects present in the image. Intensive Captioner. We utilize an image captioning model called BLIP and apply stochastic top-k sampling [12] to generate N captions per image. Reasoning Module. We adopt a frozen LLM as our reasoning module, which is capable of generating answers based on the textual descriptions fed by the vision modules, along with the task-specific instructions.");

inputs a prompt to said generative model, wherein said prompt corresponds to said scene information and said prompt is generated based on explanation necessity of an object set for each scene; receives a situation explanatory sentence generated by said generative model according to said prompt (Section 3.3, Prompt Design: "With the textual information obtained from the vision modules, we construct complete prompts for the LLM by combining them. We formatted the tags module as Tags: {Top-k tags}, the attributes module as Attributes: {Top-K attributes}, the intensive captioning module as Captions: {Top-N Captions}. In particular, for the hateful-memes task, we incorporate an OCR prompt as OCR: this is an image with written '{meme text}' on it. Finally, we append the specific question prompt: Question: {task-specific prompt} \n Short Answer: at the end." See also Fig. 4, where a user queries a scene/image with "Tell me something about the history of this place," and LENS generates an answer explaining the object in the scene: "The Great Wall of China is a fortification built by the ancient Chinese to keep out invaders.");

and associates said situation explanatory sentence with said image, and stores said situation explanatory sentence into said database (Section 3.2, LENS Components; Section 4.2, Implementation Details: "We use OpenCLIP-H/14 and CLIP-L/14 as our default vision encoders in both tags and attributes modules. We adopt BLIP-large captioning checkpoint finetuned on COCO [36] in intensive captioning module. In this module, we perform a top-k sampling [12], where k represents the desired number of captions and generates a maximum of k = 50 captions per image. Finally, we adopt Flan-T5 models as our default family of frozen LLMs [37]. To generate answers in line with the evaluation tasks, we employ beam search with number of beams equal to 5.").

Regarding Claims 12 and 22, William teaches: The explanation generating system according to claim 11, wherein said generative model includes a language model (LLM) (see rejection of claim 11 and Section 3.2, LENS Components, Reasoning Module).

Regarding Claims 13 and 23, William teaches: The explanation generating system according to claim 11, wherein said prompt is generated based on said explanation necessity and recognition necessity (see rejection of claim 11 and Fig. 4, where a user queries a scene/image with "Tell me something about the history of this place.").

Regarding Claims 14 and 24, William teaches: The explanation generating system according to claim 11, wherein said prompt includes an image explanatory sentence (see rejection of claim 11 and Fig. 4).

Regarding Claims 15 and 25, William teaches: The explanation generating system according to claim 11, wherein said prompt includes a control explanatory sentence (see rejection of claim 11 and Fig. 4).

Regarding Claims 16 and 26, William teaches: The explanation generating system according to claim 11, wherein said prompt includes a GPS (China) explanatory sentence (see rejection of claim 11 and Fig. 4, where LENS answers the query with "The Great Wall of China is a fortification built by the ancient Chinese to keep out invaders.").

Regarding Claims 17 and 27, William teaches: The explanation generating system according to claim 12, wherein said prompt is generated based on said explanation necessity and recognition necessity (see rejection of claim 11 and Fig. 4).

Regarding Claims 18 and 28, William teaches: The explanation generating system according to claim 17, wherein said prompt includes an image explanatory sentence (see rejection of claim 11 and Fig. 4).

Regarding Claims 19 and 29, William teaches: The explanation generating system according to claim 18, wherein said prompt includes a control explanatory sentence (see rejection of claim 11 and Fig. 4).

Regarding Claims 20 and 30, William teaches: The explanation generating system according to claim 19, wherein said prompt includes a GPS (China) explanatory sentence (see rejection of claim 11 and Fig. 4, where LENS answers with "The Great Wall of China is a fortification built by the ancient Chinese to keep out invaders.").

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record, Chae et al. (KR 102785215 B1), teaches a System and Method for Providing Conversational Artificial Intelligence Service Using Complex Analysis of Image and Query.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM, whose telephone number is (571) 270-5878. The examiner can normally be reached Monday-Friday, EST (IFP). Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Paras Shah, can be reached at 571-270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MOHAMMAD K ISLAM/
Primary Examiner, Art Unit 2653
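The LENS prompt format the examiner quotes from Section 3.3 (Tags / Attributes / Captions / optional OCR / Question) amounts to simple string assembly. A minimal Python sketch of that format follows; the function name and example values are illustrative placeholders, not taken from the cited reference:

```python
# Sketch of the LENS-style prompt assembly described in Section 3.3 of the
# cited reference. The example tags, attributes, and captions below are
# hypothetical placeholders, not output from the actual vision modules.

def build_lens_prompt(tags, attributes, captions, question, ocr_text=None):
    """Combine vision-module outputs into one text prompt for a frozen LLM."""
    parts = [
        "Tags: " + ", ".join(tags),
        "Attributes: " + ", ".join(attributes),
        "Captions: " + " ".join(captions),
    ]
    if ocr_text is not None:  # used only for the hateful-memes task
        parts.append(f'OCR: this is an image with written "{ocr_text}" on it.')
    parts.append(f"Question: {question} \nShort Answer:")
    return "\n".join(parts)

prompt = build_lens_prompt(
    tags=["wall", "fortification", "mountain"],
    attributes=["stone structure", "ancient"],
    captions=["a long wall winding over green hills"],
    question="Tell me something about the history of this place.",
)
print(prompt)
```

The frozen LLM never sees the image itself, only this assembled text, which is why the examiner maps the vision modules plus LLM onto the claimed "generative model" receiving a prompt that "corresponds to said scene information."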

Prosecution Timeline

Aug 06, 2024: Application Filed
May 13, 2025: Response after Non-Final Action
Feb 11, 2026: Non-Final Rejection — §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12601849: SYSTEMS AND METHODS FOR PLANNING SEISMIC DATA ACQUISITION WITH REDUCED ENVIRONMENTAL IMPACT (granted Apr 14, 2026; 2y 5m to grant)
Patent 12596361: FAILURE DIAGNOSIS METHOD, METHOD OF MANUFACTURING DISK DEVICE, AND RECORDING MEDIUM (granted Apr 07, 2026; 2y 5m to grant)
Patent 12596872: HOLISTIC EMBEDDING GENERATION FOR ENTITY MATCHING (granted Apr 07, 2026; 2y 5m to grant)
Patent 12596868: CREATING A DIGITAL ASSISTANT (granted Apr 07, 2026; 2y 5m to grant)
Patent 12597434: CONTROL OF SPEECH PRESERVATION IN SPEECH ENHANCEMENT (granted Apr 07, 2026; 2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 83%
With Interview: 99% (+16.5%)
Median Time to Grant: 2y 9m
PTA Risk: Low
Based on 1288 resolved cases by this examiner. Grant probability derived from career allow rate.
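The headline figures can be checked against the underlying counts. The rounding and the way the interview lift is combined below are assumptions about how the tool computes its display values, not documented behavior:

```python
# Sketch of how the dashboard's figures appear to be derived from the
# examiner's career counts. Rounding and the interview-lift combination
# are assumptions, not documented by the tool.
granted, resolved = 1070, 1288          # career counts shown above
allow_rate = 100 * granted / resolved   # career allow rate, in percent
print(round(allow_rate))                # 83  -> displayed as 83%

interview_lift = 16.5                   # percentage points, per the dashboard
with_interview = allow_rate + interview_lift
print(int(with_interview))              # 99  -> displayed as 99%
```

The raw allow rate is about 83.1%, so adding the 16.5-point interview lift gives roughly 99.6%, consistent with the displayed 99% if the tool truncates rather than rounds.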
