Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claim(s) 1-8, 10-15, and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kyllonen et al (20150269529).
As per claim 1, Kyllonen et al (20150269529) teaches a computer-implemented method comprising:
providing a graphical user interface (GUI) through which information describing an interviewee to be interviewed is specified, wherein the GUI is accessible to an interviewer conducting the interview;
generating a prompt for a large language model (LLM) that requests a customized set of questions to ask the interviewee during the interview based at least in part on the information describing the interviewee (examiner notes that the LLM is defined as a generative artificial intelligence learning model – see applicant's specification; the scoring models in Kyllonen et al (20150269529) are disclosed as machine learning models – see abstract, para 0005, 0029; the question selection module chooses the question set – Fig. 1, subblock 195 to subblock 110 – based on the response score 170, which is derived from the delivery features model 145 and the content features 157 of Figure 1; a purely illustrative sketch of such prompt assembly appears after the claim 1 mapping below);
obtaining an output from the LLM in response to the generated prompt, wherein the output provides the customized set of questions to ask the interviewee during the interview (as the machine learning model generates a response score – Figure 1 – that feeds the question selection module 195, which selects questions from the question set and outputs them to the interviewee – Fig. 1, subblocks 110, 120, 130);
recording a plurality of segments of the interviewee answering questions from the customized set of questions, wherein a segment corresponds to a video recording of the interviewee while answering a given question from the customized set of questions (as capturing the interviewee's response using audiovisual equipment – para 0018; the predetermined sequence of questions comes from the question selection module – para 0017);
and generating an interactive video of the interview, wherein the interactive video comprises the plurality of segments that are video recordings of the interviewee answering questions from the customized set of questions (as the multimodal features of the interviewee's response – para 0019; examiner notes that 'multimodal' refers to video and speech).
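For clarity only, the following is a minimal, hypothetical sketch of how interviewee information entered through a GUI could be assembled into a prompt requesting a customized question set from a generative model; it is not taken from applicant's specification or from Kyllonen et al (20150269529), and every name and field in it is an assumption made solely for illustration.

    # Hypothetical illustration only; names, fields, and structure are
    # assumptions, not drawn from applicant's specification or from Kyllonen.
    from dataclasses import dataclass

    @dataclass
    class IntervieweeInfo:
        name: str
        biography: str
        topics: list[str]
        audience: str
        num_questions: int

    def build_question_prompt(info: IntervieweeInfo) -> str:
        """Assemble a natural-language prompt asking a generative model
        for a customized set of interview questions."""
        return (
            f"Generate {info.num_questions} interview questions for "
            f"{info.name} ({info.biography}). Focus on the topics: "
            f"{', '.join(info.topics)}. Tailor the questions to an "
            f"audience of {info.audience}."
        )

    # Example usage (hypothetical values):
    # prompt = build_question_prompt(IntervieweeInfo(
    #     name="Jane Doe", biography="robotics researcher",
    #     topics=["autonomy", "ethics"], audience="graduate students",
    #     num_questions=5))
    # The resulting prompt string would then be submitted to an LLM.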
As per claim 2, Kyllonen et al (20150269529) teaches the computer-implemented method of claim 1, wherein the GUI (as a display interface with microphone and keyboard – Fig. 5c, para 0039) includes a form to specify at least one of: a name of the interviewee, a biography of the interviewee, topics of interest, a target user audience, a number of questions to be generated by the LLM, question length, or question tone (as the selected topics – para 0017, bottom – sufficiently cover this alternatively recited limitation; examiner notes that the LLM is defined as a generative artificial intelligence learning model – see applicant's specification; the scoring models in Kyllonen et al (20150269529) are disclosed as machine learning models – see abstract, para 0005, 0029).
As per claim 3, Kyllonen et al (20150269529) teaches the computer-implemented method of claim 1, wherein the prompt generated to request the customized set of questions from the LLM identifies at least a name of the interviewee, a biography of the interviewee, a topic of interest, a target user audience, and a number of questions to be generated by the LLM (examiner notes that the claim scope is recited in an "at least" format, and Kyllonen et al (20150269529) meets that scope with the question set – para 0017 – wherein the question sets are of a particular type – para 0033; examiner further notes that the LLM is defined as a generative artificial intelligence learning model – see applicant's specification; the scoring models in Kyllonen et al (20150269529) are disclosed as machine learning models – see abstract, para 0005, 0029).
As per claim 4, Kyllonen et al (20150269529) teaches the computer-implemented method of claim 1, wherein the customized set of questions generated by the LLM are provided in the GUI (as a display interface with microphone and keyboard – Fig. 5c, para 0039), and wherein the GUI is accessible to the interviewer while conducting the interview (as the interviewer has the choice of which questions to ask – para 0016, very end; the machine learning models of Figure 1 – subblocks 145 and 157, feeding into 160 – choose a set of questions – Figure 1, subblock 110 – which is reviewed by a human scorer – Figure 3, subblock 370).
As per claim 5, Kyllonen et al (20150269529) teaches the computer-implemented method of claim 1, wherein recording a plurality of segments of the interviewee answering questions from the customized set of questions comprises:
storing a segment associated with a question answered by the interviewee, wherein the segment corresponds to a video recording of the interviewee while answering the question (as performing video/image analysis of the interviewee to determine emotion and other parameters – para 0020 – and then performing transcription conversion on the same segment – see para 0021, 0022);
generating a second prompt for the LLM that requests one or more follow-up questions to ask the interviewee in response to the question answered by the interviewee; and obtaining a second output from the LLM in response to the second prompt, wherein the second output provides the one or more follow-up questions to ask the interviewee (as determining, by the response scoring model, that additional questions should be administered and then administering follow-up questions – para 0028, last half; examiner notes that the 'follow-up' questions can be generated by the automated question stack – see Fig. 2, subblock 280 to subblock 290, back to subblocks 200 to 210; examiner notes that the LLM is defined as a generative artificial intelligence learning model – see applicant's specification; the scoring models in Kyllonen et al (20150269529) are disclosed as machine learning models – see abstract, para 0005, 0029).
As per claim 6, Kyllonen et al (20150269529) teaches the computer-implemented method of claim 5, comprising:
determining a transcription of the segment (as performing video/image analysis of the interviewee to determine emotion and other parameters – para 0020 – and then performing transcription conversion on the same segment – see para 0021, 0022; and determining content from the transcription by comparing it to model responses – para 0022),
wherein the second prompt to request the one or more follow-up questions includes at least the transcription of the segment associated with the question answered by the interviewee (as determining, by the response scoring model, that additional questions should be administered and then administering follow-up questions – para 0028, last half; examiner notes that the 'follow-up' questions can be generated by the automated question stack – see Fig. 2, subblock 280 to subblock 290, back to subblocks 200 to 210).
As per claim 7, Kyllonen et al (20150269529) teaches the computer-implemented method of claim 1, wherein recording a plurality of segments of the interviewee answering questions from the customized set of questions comprises:
determining a transcription of a segment, wherein the segment corresponds to a video recording of the interviewee while answering a question (as performing video/image analysis of the interviewee to determine emotion and other parameters – para 0020 – and then performing transcription conversion on the same segment – see para 0021, 0022);
analyzing the transcription of the segment to determine a tone of the interviewee while answering the question; determining, based on the tone of the interviewee, to re-phrase the customized set of questions (as considering, for example, disfluencies in the interviewee's speech or body language/mood – para 0030 – and determining a set of follow-up questions based on the delivery features and the content features – para 0028: "the instant response score and/or previously assigned response scores may be analyzed to adaptively determine which follow-up question should be asked"; see para 0020 for examples of the 'delivery features');
generating a second prompt for the LLM that requests a re-phrasing of the customized set of questions based at least in part on the tone of the interviewee (as the follow-up questions are based on delivery features and content features – para 0028); and obtaining a second output from the LLM in response to the second prompt, wherein the second output provides the customized set of questions that are re-phrased based on the tone of the interviewee (as generating a new question set based upon the recognized features and the scoring of those features – Fig. 2, subblock 290 to subblock 200; examiner notes that the LLM is defined as a generative artificial intelligence learning model – see applicant's specification; the scoring models in Kyllonen et al (20150269529) are disclosed as machine learning models – see abstract, para 0005, 0029).
As per claim 8, Kyllonen et al (20150269529) teaches the computer-implemented method of claim 1, wherein recording a plurality of segments of the interviewee answering questions from the customized set of questions comprises:
determining a transcription of a segment, wherein the segment corresponds to a video recording of the interviewee while answering a question (as performing video/image analysis of the interviewee to determine emotion and other parameters – para 0020 – and then performing transcription conversion on the same segment – see para 0021, 0022);
analyzing the transcription of the segment to determine one or more ambiguities in the answer provided by the interviewee (as performing a transcription and determining content from the transcription by comparing it to model responses – para 0022);
generating a second prompt for the LLM that requests one or more clarifying questions based at least in part on one or more ambiguities in the answer provided by the interviewee; and obtaining a second output from the LLM in response to the second prompt, wherein the second output provides the one or more clarifying questions (as determining, by the response scoring model, that additional questions should be administered and then administering follow-up questions – para 0028, last half; examiner notes that the 'follow-up' questions can be generated by the automated question stack – see Fig. 2, subblock 280 to subblock 290, back to subblocks 200 to 210; examiner notes that the LLM is defined as a generative artificial intelligence learning model – see applicant's specification; the scoring models in Kyllonen et al (20150269529) are disclosed as machine learning models – see abstract, para 0005, 0029).
As per claim 10, Kyllonen et al (20150269529) teaches the computer-implemented method of claim 1, wherein generating the interactive video of the interview comprises:
generating an index for the interactive video based on segments recorded during the interview (as generating a log of the video recording of segments of interest that determines emotion – para 0020 – as well as verbal analysis – para 0021),
wherein the index maps a segment, one or more semantic vector encodings of questions answered during the segment (as developing a vector of values toward the meaning/intent of the interviewee – para 0020 – with speech-based intent values in para 0021),
and a timestamp corresponding to the segment in the interactive video (as timestamping the location of the recording/gesture data that measures the emotion/meaning of the interviewee at that moment in time – para 0020).
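For clarity regarding the claimed index, the following is a minimal, hypothetical sketch of a mapping from each recorded segment to the semantic vector encoding(s) of the question(s) answered in it and to its timestamp in the interactive video; it is not taken from applicant's specification or from Kyllonen et al (20150269529), and all names are assumptions for illustration only.

    # Hypothetical illustration of the claimed segment index; all names are
    # assumptions, not drawn from the cited reference or the specification.
    from dataclasses import dataclass, field

    @dataclass
    class SegmentIndexEntry:
        segment_id: str                      # identifier of the recorded segment
        question_vectors: list[list[float]]  # semantic vector encodings of the answered questions
        timestamp_seconds: float             # start time of the segment within the video

    @dataclass
    class InterviewIndex:
        entries: list[SegmentIndexEntry] = field(default_factory=list)

        def add(self, segment_id: str, vectors: list[list[float]], ts: float) -> None:
            self.entries.append(SegmentIndexEntry(segment_id, vectors, ts))

    # Example usage (hypothetical values):
    # index = InterviewIndex()
    # index.add("seg-001", [[0.12, 0.98, 0.05]], 42.0)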
As per claim 11, Kyllonen et al (20150269529) teaches the computer-implemented method of claim 1, further comprising:
identifying interviewee information from the interview, assessing digital content for data related to the interviewee information, generating at least one question based in part on the assessed digital content (as questions/question stacks are selected based upon the interviewee's previous responses – see para 0017),
and providing the at least one question to the interviewer (as, in the embodiment in which a human interviewer conducts the interview – para 0016 – using the pre-selected questions – para 0018 – the selected questions are presented to the interviewee; see further para 0018, "read by a human").
Claims 12-15 are method claims whose steps are found throughout claims 1-8, 10, and 11 above; as such, the commonly recited features in claims 12-15 are similar in scope and content to claims 1-8, 10, and 11 above, and claims 12-15 are therefore rejected under a similar rationale as presented against claims 1-8, 10, and 11. Further to claim 12, Kyllonen et al (20150269529) teaches a GUI (Figure 5C, subblocks 568, 570, 572, 574, 576), wherein the interviewer has access to the video segments, the image and speech analysis, and the suggested follow-up questions (para 0018, 0016, and para 0006). Further to claim 13, Kyllonen et al (20150269529) teaches a "text" form on the display – para 0018. Further to claim 14, see the mapping shown for claim 10, toward vector features/scoring.
Further to claim 15, Kyllonen et al (20150269529) teaches the computer-implemented method of claim 14, wherein matching the semantic vector encoding of the question provided by the user to a segment in the plurality of segments comprises:
accessing an index associated with the interactive video, wherein the index maps segments, one or more semantic vector encodings of questions answered during the segments (as developing a vector of values toward the meaning/intent of the interviewee – para 0020 – with speech-based intent values in para 0021),
and timestamps corresponding to the segments in the interactive video (as timestamping the location of the recording/gesture data that measures the emotion/meaning of the interviewee at that moment in time – para 0020);
and determining a shortest cosine similarity distance between the semantic vector encoding of the question provided by the user and a semantic vector encoding associated with the segment (as using latent semantic analysis between the topic models and the responses to the selected questions – para 0023; examiner notes that it is notoriously old and well known to use cosine distance measurements in LSA – e.g., see Grefenstette et al (20030069877), which performs latent semantic indexing using a cosine distance measurement (para 0243)).
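For clarity regarding the cosine-distance matching noted above, the following is a brief, hypothetical sketch of computing cosine similarity between a question's semantic vector encoding and stored segment encodings and selecting the segment with the shortest cosine distance; it is offered only as an illustration of the well-known measurement and is not taken from Kyllonen et al (20150269529), Grefenstette et al (20030069877), or applicant's specification.

    # Hypothetical illustration of nearest-segment matching by cosine distance.
    import math

    def cosine_similarity(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def closest_segment(question_vec: list[float],
                        segment_vecs: dict[str, list[float]]) -> str:
        # The smallest cosine distance (1 - similarity) is the best match.
        return min(segment_vecs,
                   key=lambda seg: 1.0 - cosine_similarity(question_vec, segment_vecs[seg]))

    # Example (hypothetical values):
    # closest_segment([0.9, 0.1], {"seg-001": [0.8, 0.2], "seg-002": [0.1, 0.9]})
    # returns "seg-001"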
Claims 17-20 are system claims that perform steps found throughout method claims 1-8, 10, and 11 above; as such, claims 17-20 are similar in scope and content to claims 1-8, 10, and 11 above, and claims 17-20 are therefore rejected under a similar rationale as presented against claims 1-8, 10, and 11 above. Furthermore, Kyllonen et al (20150269529) teaches a processor/memory performing the disclosed steps – para 0034.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Kyllonen et al (20150269529) in view of Hazan (20170213190).
As per claim 9, Kyllonen et al (20150269529) teaches the computer-implemented method of claim 1, as mapped above, as well as the claim elements toward storing segments of the interviewee answering questions and altering/differing the question sets based on the interviewee's response (see the mappings above for claims 8, 10, and 11); however, Kyllonen et al (20150269529) does not explicitly teach altering/differing the question sets based on the amount of time remaining for the interview. Hazan (20170213190) teaches machine learning models that score interview/interviewee interactions (para 0060) and that take into account and alter the ordering of questions based on the time of the interview – beginning, middle, or end – para 0149. Therefore, it would have been obvious to one of ordinary skill in the art of machine learning models for interview processes to modify the models found in Kyllonen et al (20150269529) by altering the question set based on the time period of the interview (beginning/middle/end), as taught by Hazan (20170213190), because it would advantageously mimic traditional interview styles (monologue type) and thereby provide an authentic interview experience to returning users (Hazan (20170213190), para 0151).
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Kyllonen et al (20150269529) in view of Singer et al (20220101096).
As per claim 16, Kyllonen et al (20150269529) teaches the computer-implemented method of claim 12, as mapped above; furthermore, Kyllonen et al (20150269529) teaches accessing top matches from the trained models (as mapped above for claims 1-8 and 9-12); however, Kyllonen et al (20150269529) does not teach using a retrieval-augmented generation technique that attempts to "fill-in-the-blank" when an answer is not available. Singer et al (20220101096) teaches accessing a deep knowledge base (Fig. 1, subblock 108) that provides an answer to the reasoned extraction circuitry (Fig. 1, subblock 106) using retrieval-augmented generation (para 0077). Therefore, it would have been obvious to one of ordinary skill in the art of knowledge bases/models to expand the system of Kyllonen et al (20150269529) with a deep knowledge base and an implementation of retrieval-augmented generation, as taught by Singer et al (20220101096), because it would advantageously improve the accuracy of the machine models (para 0024, 0025).
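For clarity regarding the retrieval-augmented generation technique discussed above, the following is a minimal, hypothetical sketch of retrieving the knowledge-base passages most relevant to an unanswered question and conditioning a generative model on them; it is not taken from Singer et al (20220101096), Kyllonen et al (20150269529), or applicant's specification, and the overlap-based retriever and the generate() callable are assumptions for illustration only.

    # Hypothetical sketch of retrieval-augmented generation: retrieve the
    # passages most relevant to the question, then condition a generative
    # model on them. All names here are illustrative assumptions.
    from typing import Callable

    def overlap_score(question: str, passage: str) -> int:
        # Simple word-overlap score, standing in for a real semantic retriever.
        return len(set(question.lower().split()) & set(passage.lower().split()))

    def rag_answer(question: str,
                   knowledge_base: list[str],
                   generate: Callable[[str], str],
                   top_k: int = 3) -> str:
        """Retrieve the top-k passages and pass them, with the question,
        to a caller-supplied generative model."""
        ranked = sorted(knowledge_base,
                        key=lambda p: overlap_score(question, p),
                        reverse=True)
        context = "\n".join(ranked[:top_k])
        prompt = ("Using only the context below, answer the question.\n\n"
                  f"Context:\n{context}\n\nQuestion: {question}")
        return generate(prompt)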
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See references cited on the PTO-892 form.
Furthermore, the following references were found to contain features disclosed in applicant's specification/claims:
Lam et al (20230177275) teaches machine learning and natural language processing of input data such as recordings/transcripts, wherein the build engine generates a list of questions for the interviewee (para 0037).
Nunamaker Jr et al (20130266925) teaches the use of a smart agent (via machine learning) to conduct an automated interview using images and speech (103, 126, 159, 173).
Foster (20100161503) teaches the creation of virtual dossiers based on questionnaire responses (figure 4, subblock 102).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
2/22/2026