Last updated: April 19, 2026

Application No. 18/694,898

DESCRIPTION GENERATION DEVICE, METHOD, AND PROGRAM

Non-Final OA §103

Filed: Mar 22, 2024
Examiner: ABDI, AMARA
Art Unit: 2668
Tech Center: 2600 (Communications)
Assignee: Kyoto University
OA Round: 1 (Non-Final)

Interview Optional: -7.5% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 816 resolved cases, 2023–2026

Examiner Intelligence

ABDI, AMARA
Grants 83% (above average)
Career Allow Rate: 677 granted / 816 resolved (+21.0% vs TC avg)
Interview Lift: -7.5% (minimal), from resolved cases with and without an interview
Typical Timeline: 2y 7m average prosecution; 33 currently pending
Career History: 849 total applications across all art units

Statute-Specific Performance
§101: 9.8% (-30.2% vs TC avg)
§103: 60.7% (+20.7% vs TC avg)
§102: 10.2% (-29.8% vs TC avg)
§112: 10.0% (-30.0% vs TC avg)
Tech Center averages are estimates. Based on career data from 816 resolved cases.

Office Action

§103

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
-- “an acquiring section configured to acquire, …”, “an updating section configured to … specify actions”, and “a generating section configured to, … generate sentences”, in claim 1;
-- “a training section configured to train …”, in claim 5.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US-PGPUB 20210118447) in view of Wallace (US Patent 10,480,990).

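Regarding claim 1, Kim et al. discloses a description generation device (see at least: Fig. 1, Par. 0057, “AI device (or an AI apparatus) 100”), comprising:
an acquiring section configured to acquire, for a task including a plurality of steps, video characteristic amounts extracted from respective videos of each of the steps that capture the task (see at least: Par. 0181-0183, the recipe information may include … description of steps of a cooking process, [i.e., a task, “cooking process”, including a plurality of steps, “implicit by description of steps of a cooking process”]. Further, from Par. 0162, if the cooking content is a cooking video explaining a food recipe, the image 701 in the cooking content may be a frame-by-frame image in the cooking image, [i.e., acquiring video characteristic amounts, “implicit by the cooking content”, extracted from respective videos, “cooking videos”, of each of the steps that capture the task including a plurality of steps, “implicit by the cooking video explaining a food recipe for the cooking process”]);
an updating section configured to, based on the video characteristic amounts of the respective videos of each of the steps, specify actions with respect to materials that are included in the videos of each of the steps (see at least: Par. 0185, when the word “egg” about the cooking ingredient in the description text is included in cooking ingredient information, the processor 180 may determine the description text “Boil an egg” as recipe information of cooking, [i.e., based on the video characteristic amounts of the respective videos of each of the steps, “when the word egg is included in the cooking content”, specify actions with respect to materials that are included in the videos of each of the steps, “determining the text ‘Boil an egg’ as recipe information of cooking”]); and
a generating section configured to, based on the specified actions and the video characteristic amounts, generate sentences describing procedures of the task for each of the steps (see at least: Par. 0170-0173, the learning processor 130 may provide the cooking content image 604, “i.e., video characteristic amounts”, to the text recognition model 605 as input data to generate the image description text 606 as output data, “i.e., generate sentences describing procedures of the task for each of the steps”; and from Par. 0174, if the text … is “Trim oyster mushrooms,” the text recognition model may output the image description text 803 “Trim oyster mushrooms”, [i.e., based on the specified actions, “Trim oyster mushrooms”, and the video characteristic amounts, “based on the video characteristic amounts”, generate sentences describing procedures of the task for each of the steps, “generate the image description text 606 as output data”]).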
Kim et al. does not expressly disclose acquiring material characteristic amounts expressing respective materials used in the task, and updating the material characteristic amounts of specified materials in accordance with specified actions; and generating, based on the updated material characteristic amounts, sentences describing procedures of the task for each of the steps.
Wallace discloses acquiring material characteristic amounts expressing respective materials used in the task, (see at least: col. 4, lines 9-11, implicit by displaying the recipe or one or more ingredients such that the desired amount of information is visible on a display field at the same time);

updating the material characteristic amounts of specified materials in accordance with specified actions, (see at least: col. 2, line 53 through col. 3, line 2, implicit by adjustment to the amount of the ingredient, “updating the material characteristic amounts of specified materials”, through preselecting an adjusted amount or adding less or more, “specified actions”, than the targeted amount of the ingredient as predetermined in the recipe); and
generating, based on the updated material characteristic amounts, sentences describing procedures of the task for each of the steps, (see at least: col. 11, lines 56-64, performing adjustments to ingredients including highlighting each recipe ingredient block 112. Such highlighting can include textual, visual, audio, video or other notification of operations being performed regarding that recipe ingredient block, [i.e., generating sentences describing procedures of the task for each of the steps, “implicit by highlighting including textual notification”, based on the updated material characteristic amounts, “implicitly based on the adjustments to ingredients”]).
Kim and Wallace are combinable because they are both concerned with cooking recipe description. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Kim to use textual notification, as taught by Wallace, in order to highlight the adjustments to ingredients (Wallace, col. 11, lines 56-64).

Regarding claim 2, the combined teaching of Kim and Wallace as a whole discloses the limitations of claim 1.

Wallace further discloses wherein the updating section is configured to use, as the material characteristic amounts that are targets of updating, material characteristic amounts that have been updated with respect to a video of a previous step, in chronological order of the steps in the task (see at least: col. 2, line 53 through col. 3, line 2, implicit by adjustment to the amount of the ingredient. Furthermore, the chronological order of the steps in the task is well known in the art).

Regarding claim 3, the combined teaching of Kim and Wallace as a whole discloses the limitations of claim 1.
Wallace further discloses wherein the updating section is configured to carry out at least one of addition, deletion or merging of material characteristic amounts with respect to the updated material characteristic amounts, (col. 3, lines 1-2, implicit by adding less or more, “at least one of addition”, than the targeted amount of the ingredient as predetermined in the recipe).
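The material-tracking limitations of claims 1-3 (updating material characteristic amounts per step, carrying the updated amounts forward in chronological order, and adding, deleting, or merging materials) can be illustrated with the minimal Python sketch below. It is purely hypothetical and is not drawn from the application, Kim, or Wallace: the feature representation (a short list of floats per material name), the action verbs, and the names `Action`, `apply_action`, and `run_steps` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical representation: each material's "characteristic amount" is a
# fixed-length feature vector (list of floats) keyed by the material's name.
Features = dict[str, list[float]]


@dataclass
class Action:
    verb: str             # hypothetical vocabulary: "add", "remove", "mix", or any other verb
    materials: list[str]  # materials the action is applied to
    new_name: str = ""    # name of the merged material, used only by "mix"


def apply_action(state: Features, action: Action, dim: int = 4) -> Features:
    """Return an updated copy of the material state after one action."""
    state = {name: vec[:] for name, vec in state.items()}  # never mutate the input
    if action.verb == "add":        # addition: introduce a new material
        for m in action.materials:
            state.setdefault(m, [0.0] * dim)
    elif action.verb == "remove":   # deletion: drop a material
        for m in action.materials:
            state.pop(m, None)
    elif action.verb == "mix":      # merging: average inputs into one new entry
        vecs = [state.pop(m) for m in action.materials if m in state]
        if vecs:
            state[action.new_name] = [sum(col) / len(vecs) for col in zip(*vecs)]
    else:                           # any other verb updates the named materials
        for m in action.materials:
            if m in state:
                state[m][0] += 1.0  # hypothetical "processed" counter in slot 0
    return state


def run_steps(initial: Features, steps: list[list[Action]]) -> list[Features]:
    """Process steps in chronological order; each step starts from the state
    produced by the previous step. Returns one snapshot per step."""
    history, state = [], initial
    for actions in steps:
        for a in actions:
            state = apply_action(state, a)
        history.append(state)
    return history


if __name__ == "__main__":
    init = {"egg": [0.0] * 4, "oyster mushroom": [0.0] * 4}
    steps = [
        [Action("trim", ["oyster mushroom"])],
        [Action("add", ["salt"])],
        [Action("mix", ["egg", "oyster mushroom"], "omelette mix")],
    ]
    print(sorted(run_steps(init, steps)[-1]))  # ['omelette mix', 'salt']
```

Because every action works on a copy, each per-step snapshot stays unchanged after later steps, which keeps the "previous step's updated amounts" of claim 2 inspectable.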

Regarding claim 9, claim 9 recites substantially similar limitations as set forth in claim 1. As such, claim 9 is rejected for at least similar rationale.
The Examiner further acknowledged the following additional limitation(s): “description generation method executed by a computer”. However, Kim discloses the “description generation method executed by a computer”, (see at least: Fig. 4, and Par. 0011, “method”).

Regarding claim 10, claim 10 recites substantially similar limitations as set forth in claim 1. As such, claim 10 is rejected for at least similar rationale.
The Examiner further acknowledged the following additional limitation(s): “a non-transitory storage medium storing a description generation program that is executable by a computer”. However, Kim discloses the “non-transitory storage medium storing a description generation program that is executable by a computer” (see at least: Par. 0211, “computer-readable recording medium”).

Claims 4 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. and Wallace, as applied to claim 1, and further in view of Goldberg (US-PGPUB 20200167722).

Regarding claim 4, the combined teaching of Kim and Wallace as a whole discloses the limitations of claim 1.
Wallace further discloses that the updating section is configured to specify actions from video characteristic amounts, (see at least: col. 4, lines 7-9, implicit by presenting a recipe or one or more ingredients to a user via one or more formats that include text, video, graphics, audio, and text).
On the other hand, Kim discloses that the generating section is configured to generate the sentences by using a second model that has been trained in advance so as to generate sentences describing procedures of the task for each of the steps, based on material characteristic amounts, actions and video characteristic amounts (see at least: Abstract, and Par. 0013, implicit by using an artificial intelligence apparatus including a learning processor configured to generate recipe text including at least one of cooking ingredient information or description text of cooking from cooking content).
The combined teaching of Kim and Wallace as a whole does not expressly disclose updating the material characteristic amounts by using a first model that has been trained in advance so as to update the material characteristic amounts based on the specified actions.
However, Goldberg discloses updating the material characteristic amounts by using a first model that has been trained in advance so as to update the material characteristic amounts based on the specified actions (see at least: Par. 0114, 0118, the machine learning system 106 may receive further training using updated historical ingredient use information 402 …. and predicts future supply need of one or more ingredients 202, [i.e., updating the material characteristic amounts, “future supply need of one or more ingredients”, by using a first model, “machine learning system 106”, that has been trained in advance, “trained using updated historical ingredient”, so as to update the material characteristic amounts based on the specified actions, “implicit by predicting future supply need of one or more ingredients 202”]).
Kim, Wallace, and Goldberg are combinable because they are all concerned with cooking recipe description. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Kim and Wallace to apply the machine learning system 106, as taught by Goldberg, in order to predict the future supply need of one or more ingredients 202 (Goldberg, Par. 0114).

Regarding claim 5, the combined teaching of Kim, Wallace, and Goldberg as a whole discloses the limitations of claim 1.
Kim further discloses a training section configured to train the second model by using, as training data, a material list and videos for each of the steps, and sentences of correct answers that correspond to the material list and the videos for each of the steps (see at least: Par. 0145-0148, implicit by providing cooking content, including video and audio of the cooking process, to a recipe text generation model to generate recipe text).
Further, Goldberg discloses training the first model using, as training data, a material list and videos for each of the steps, and sentences of correct answers that correspond to the material list and the videos for each of the steps (see at least: Par. 0114, 0118, the machine learning system 106 may receive further training using updated historical ingredient use information 402 …. and predicts future supply need of one or more ingredients 202).

Allowable Subject Matter
Claims 6-8 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Regarding claim 6, the combined teaching of Kim, Wallace, and Goldberg as a whole discloses the underlined limitations of claim 1.

“wherein the training section is configured to train the first model and the second model so as to minimize a total loss that includes a first loss, which is based on comparison of sentences generated by the generating section and the sentences of the correct answers, and a second loss, which is based on comparison of the actions and the material characteristic amounts specified at the updating section, and actions and materials of correct answers included in the videos for each of the steps”
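The structure of the quoted claim 6 limitation, a single total loss formed from a sentence-comparison loss and an action/material-comparison loss, can be sketched as follows. This is an illustrative assumption, not the applicant's or Peters' formulation: cross-entropy is assumed for every term, the second loss is assumed to be the sum of an action term and a material term, and the `weight` parameter and function names are hypothetical.

```python
import math


def cross_entropy(probs: list[list[float]], targets: list[int]) -> float:
    """Mean negative log-likelihood of each target index under its distribution."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def total_loss(word_probs, word_targets,
               action_probs, action_targets,
               material_probs, material_targets,
               weight: float = 1.0) -> float:
    # First loss: generated sentence tokens vs. correct-answer sentence tokens.
    first_loss = cross_entropy(word_probs, word_targets)
    # Second loss (assumed split): predicted actions and materials vs. the
    # correct-answer actions and materials for the same step video.
    second_loss = (cross_entropy(action_probs, action_targets)
                   + cross_entropy(material_probs, material_targets))
    # A training loop (not shown) would minimize this single value over the
    # parameters of both the first and the second model.
    return first_loss + weight * second_loss
```

Setting `weight=0.0` reduces the sketch to the sentence loss alone; the claimed total loss requires both terms.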

Kim et al. (US-PGPUB 20210118447) discloses a description generation device (see at least: Fig. 1, Par. 0057, “AI device (or an AI apparatus) 100”), comprising:
an acquiring section configured to acquire, for a task including a plurality of steps, video characteristic amounts extracted from respective videos of each of the steps that capture the task (see at least: Par. 0181-0183, the recipe information may include … description of steps of a cooking process, [i.e., a task, “cooking process”, including a plurality of steps, “implicit by description of steps of a cooking process”]. Further, from Par. 0162, if the cooking content is a cooking video explaining a food recipe, the image 701 in the cooking content may be a frame-by-frame image in the cooking image, [i.e., acquiring video characteristic amounts, “implicit by the cooking content”, extracted from respective videos, “cooking videos”, of each of the steps that capture the task including a plurality of steps, “implicit by the cooking video explaining a food recipe for the cooking process”]);
an updating section configured to, based on the video characteristic amounts of the respective videos of each of the steps, specify actions with respect to materials that are included in the videos of each of the steps (see at least: Par. 0185, when the word “egg” about the cooking ingredient in the description text is included in cooking ingredient information, the processor 180 may determine the description text “Boil an egg” as recipe information of cooking, [i.e., based on the video characteristic amounts of the respective videos of each of the steps, “when the word egg is included in the cooking content”, specify actions with respect to materials that are included in the videos of each of the steps, “determining the text ‘Boil an egg’ as recipe information of cooking”]); and
a generating section configured to, based on the specified actions and the video characteristic amounts, generate sentences describing procedures of the task for each of the steps (see at least: Par. 0170-0173, the learning processor 130 may provide the cooking content image 604, “i.e., video characteristic amounts”, to the text recognition model 605 as input data to generate the image description text 606 as output data, “i.e., generate sentences describing procedures of the task for each of the steps”; and from Par. 0174, if the text … is “Trim oyster mushrooms,” the text recognition model may output the image description text 803 “Trim oyster mushrooms”, [i.e., based on the specified actions, “Trim oyster mushrooms”, and the video characteristic amounts, “based on the video characteristic amounts”, generate sentences describing procedures of the task for each of the steps, “generate the image description text 606 as output data”]).
Kim et al. further discloses an artificial neural network whose model parameters may be determined so as to minimize a loss function, the loss function serving as an index for determining optimal model parameters in the learning process of the artificial neural network (Par. 0039).

However, while disclosing the use of an artificial neural network to determine the model parameters that minimize a loss function, Kim et al. fails to teach or suggest, either alone or in combination with the other cited references, “training the first model and the second model so as to minimize a total loss that includes a first loss, which is based on comparison of sentences generated by the generating section and the sentences of the correct answers, and a second loss, which is based on comparison of the actions and the material characteristic amounts specified at the updating section, and actions and materials of correct answers included in the videos for each of the steps.”

A further prior art reference of record, Peters et al. (US-PGPUB 20210086753), discloses training the first model and the second model so as to minimize a total loss (see at least: Fig. 7, and Par. 0075, first model portion 404 and the second model portion 410 may be trained by reducing, for example minimizing, the total loss value 614), but fails to teach or suggest, either alone or in combination with the other cited references, that “the total loss includes a first loss, which is based on comparison of sentences generated by the generating section and the sentences of the correct answers, and a second loss, which is based on comparison of the actions and the material characteristic amounts specified at the updating section, and actions and materials of correct answers included in the videos for each of the steps.”

Regarding claims 7 and 8, claims 7 and 8 are in condition for allowance based at least on their dependency from claim 6.

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMARA ABDI whose telephone number is (571)272-0273. The examiner can normally be reached 9:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le can be reached at (571) 272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/AMARA ABDI/
Primary Examiner, Art Unit 2668
02/07/2026


Prosecution Timeline
Mar 22, 2024: Application Filed
Feb 07, 2026: Non-Final Rejection, §103 (current)

Precedent Cases
Applications granted by this same examiner with similar technology (based on the 5 most recent grants):
18/569,692 (Patent 12602822): METHOD DEVICE AND STORAGE MEDIUM FOR BACK-END OPTIMIZATION OF SIMULTANEOUS LOCALIZATION AND MAPPING. 2y 5m to grant; granted Apr 14, 2026.
18/962,814 (Patent 12597252): METHOD OF TRACKING OBJECTS. 2y 5m to grant; granted Apr 07, 2026.
18/288,713 (Patent 12576595): SYSTEMS AND METHODS FOR IMPROVED VOLUMETRIC ADDITIVE MANUFACTURING. 2y 5m to grant; granted Mar 17, 2026.
18/222,744 (Patent 12574469): VIDEO SURVEILLANCE SYSTEM, VIDEO PROCESSING APPARATUS, VIDEO PROCESSING METHOD, AND VIDEO PROCESSING PROGRAM. 2y 5m to grant; granted Mar 10, 2026.
18/222,360 (Patent 12563154): VIDEO SURVEILLANCE SYSTEM, VIDEO PROCESSING APPARATUS, VIDEO PROCESSING METHOD, AND VIDEO PROCESSING PROGRAM. 2y 5m to grant; granted Feb 24, 2026.


Prosecution Projections
Expected OA Rounds: 1-2
Grant Probability: 83%
With Interview: 76% (-7.5%)
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 816 resolved cases by this examiner. Grant probability derived from career allow rate.