Prosecution Insights
Last updated: April 19, 2026
Application No. 18/066,365

INTELLIGENT CAPTION EDGE COMPUTING

Status: Non-Final OA (§103)
Filed: Dec 15, 2022
Examiner: LEE, JANGWOEN
Art Unit: 2656
Tech Center: 2600 — Communications
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)
Grant Probability: 82% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 82% (36 granted / 44 resolved; +19.8% vs TC avg), above average
Interview Lift: +24.2% among resolved cases with interview
Avg Prosecution: 2y 11m (typical timeline)
Total Applications: 67 across all art units; 23 currently pending
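As a quick sanity check (not the vendor's actual formula), the headline examiner metrics above can be recomputed directly from the counts shown on this page:

```python
# Hedged sketch: recompute the examiner metrics from the page's counts
# (36 granted of 44 resolved cases; 67 total applications).
granted, resolved, total_apps = 36, 44, 67

career_allow_rate = granted / resolved      # 36/44 = 0.818..., shown as 82%
currently_pending = total_apps - resolved   # 67 - 44 = 23 pending

print(f"Career allow rate: {career_allow_rate:.0%}")  # Career allow rate: 82%
print(f"Currently pending: {currently_pending}")      # Currently pending: 23
```

The displayed 82% is simply the rounded grant share of resolved cases; the "23 currently pending" figure is the remainder of the 67 career filings.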

Statute-Specific Performance

§101: 26.5% (-13.5% vs TC avg)
§103: 54.6% (+14.6% vs TC avg)
§102: 11.0% (-29.0% vs TC avg)
§112: 4.1% (-35.9% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 44 resolved cases
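Assuming each "vs TC avg" figure is a simple difference against the Tech Center baseline, the panel is internally consistent; a minimal sketch using the displayed values:

```python
# Consistency check on the per-statute panel: each examiner rate minus
# its "vs TC avg" delta should recover the Tech Center baseline that
# the black line estimates. Values copied from the panel above.
examiner_rate = {"101": 26.5, "103": 54.6, "102": 11.0, "112": 4.1}
vs_tc_delta   = {"101": -13.5, "103": 14.6, "102": -29.0, "112": -35.9}

implied_tc_avg = {s: round(examiner_rate[s] - vs_tc_delta[s], 1)
                  for s in examiner_rate}
print(implied_tc_avg)  # every statute's implied baseline is 40.0 here
```

Every statute's implied baseline comes out to 40.0, which suggests the black line marks a single Tech Center average estimate across the four statutes.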

Office Action

§103
DETAILED ACTION

This communication is in response to the Application filed on 12/15/2022. Claims 1-20 are pending and have been examined. Claims 1, 8 and 15 are independent. This Application was published as U.S. Pub. No. 2024/0203417 A1.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 12/15/2022 was filed. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Diamant et al. (US Pub. No. 2019/0341050, hereinafter "Diamant") in view of Raanani et al. (US Pub. No. 2019/0057079, hereinafter "Raanani").
Regarding Claim 1, Diamant discloses a method of real-time caption generation in an edge computing environment, executable by a processor (Diamant, par [004], "…A method for facilitating a remote conference includes receiving a digital video and a computer-readable audio signal...a speech recognition machine is operated to translate the computer-readable audio signal into a first text..."; Figs. 11, 22, par [062], "…speech recognition machine 130a-n..."; par [115], "…transcribed speech and/or speaker identity information may be gathered by computerized intelligent assistant 1300 in real time, in order to build the transcript in real time..."; Fig. 22, paras [153, 158], "…Computing system 1300 may take the form of one or more...Internet of Things (IoT) devices, embedded computing devices, and/or other computing devices...a speech recognition machine 1022, an attribution machine 1024, a transcription machine 1026..."), comprising:

selecting an edge device from among a plurality of edge devices based on the determined personal characteristics, the selected edge device being configured to perform caption conversion for a participant from among the one or more participants in the web conference service (Diamant, Fig. 11, par [062], "…speech recognition machines 130a-n are downstream from diarization machine 132...Each speech recognition machine 130 optionally may be tuned for a particular individual speaker (e.g., Bob) or species of speakers (e.g., Chinese language speaker, or English speaker with Chinese accent)...");

customizing a lightweight user accent-oriented caption edge module associated with the selected edge device for the participant (Diamant, par [059], "…Speech recognition machine 130 may be trained with regard to an individual, a plurality of individuals, and/or a population…taking into account possible distinct characteristics of speech that may occur more frequently within the population (e.g., different languages of speech, speaking accents, vocabulary, and/or any other distinctive characteristics of speech that may vary between members of populations)..."); and

deploying the customized lightweight user accent-oriented caption edge module to the selected edge device based on a corpus of captions associated with the user accent-oriented caption edge module most closely matching the determined personal characteristics (Diamant, par [059], "…Training speech recognition machine 130 with regard to an individual and/or with regard to a plurality of individuals may further tune recognition of speech to take into account further differences in speech characteristics of the individual and/or plurality of individuals…"; par [062], "…the speech from each different speaker may be processed independent of the speech of all other speakers, the grammar and/or acoustic model of all speakers may be dynamically updated in parallel on the fly...each speech recognition machine may be configured to output text 800 with labels 608 for downstream operations, such as transcription...").

Diamant discloses "context of the conference participant" (e.g., paras [077, 128]), but does not explicitly disclose the limitations "monitoring contexts related to one or more participants in a web conference service, the one or more participants using a caption service...determining personal characteristics associated with each of the participants based on the monitored contexts." However, Raanani, in the analogous field of endeavor, discloses monitoring contexts related to one or more participants in a web conference service, the one or more participants using a caption service (Raanani, Abstract, "…A call assistant device is used to command a call management system to perform a specified task in association with a specified call. The call assistant device can be an Internet of Things (IoT) based device..."; Fig. 1, par [040], "…The real-time analysis component 130 receives real-time call data 150 of an ongoing conversation between a customer and a representative and analyzes the real-time call data 150 to generate a set of features..."); and determining personal characteristics associated with each of the participants based on the monitored contexts (Raanani, Fig. 2, par [044], "...the feature generation component 111 includes an ASR component 210, an NLP component 225, an affect component 215 and a metadata component 220...The affect component 215 can analyze the call data 205 for emotional signals and personality traits as well as general personal attributes such as gender, age, and accent of the participants...").

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the speech recognition and transcription machine of the intelligent conference system of Diamant with the feature generation component of the IoT-based call assistant device of Raanani, with a reasonable expectation of success, to extract the characteristics of the conversation, e.g., the voices of the users/participants, to guide downstream processing or outcomes (Raanani, paras [002, 007]).

Regarding Claim 2, Diamant in view of Raanani discloses the method of claim 1.
Diamant further discloses providing a caption conversion service from the deployed edge device to the participant from among the one or more participants (Diamant, par [023], "…in-person participants who are physically present at a conference location, as well as remote participants who participate via remote audio, video, textual, and/or multi-modal interaction with the in-person participants..."; Fig. 22, paras [153, 158], "…Computing system 1300 may take the form of one or more...Internet of Things (IoT) devices...a speech recognition machine 1022, an attribution machine 1024, a transcription machine 1026...").

Regarding Claim 3, Diamant in view of Raanani discloses the method of claim 2. Diamant further discloses causing the provided caption conversion service to display captions on one or more devices associated with the participant from among the one or more participants (Diamant, par [069], "…Although the following description includes examples of displayed content (e.g., notifications, transcripts, and results of analysis) at a remote user device 172, such displayed content may be displayed at any companion device..."; Fig. 22, par [162], "…display subsystem 1008 may be used to present a visual representation of data held by storage subsystem 1004... Display subsystem 1008 may include one or more display devices utilizing virtually any type of technology…").
Regarding Claim 4, Diamant in view of Raanani discloses the method of claim 1, further comprising defining a framework to maintain and customize edge generation (Diamant, Fig. 22, paras [153, 158], "…Computing system 1300 may take the form of one or more...Internet of Things (IoT) devices, embedded computing devices, and/or other computing devices...a speech recognition machine 1022, an attribution machine 1024, a transcription machine 1026...") based on dynamically monitoring the contexts (Raanani, Fig. 1, par [040], "…The real-time analysis component 130 receives real-time call data 150 of an ongoing conversation between a customer and a representative and analyzes the real-time call data 150 to generate a set of features..."; Fig. 2, par [044], "...the feature generation component 111 includes an ASR component 210, an NLP component 225, an affect component 215 and a metadata component 220...The affect component 215 can analyze the call data 205 for emotional signals and personality traits as well as general personal attributes such as gender, age, and accent of the participants...").

Regarding Claim 5, Diamant in view of Raanani discloses the method of claim 4, further comprising defining a data structure for saving and tracking the caption service associated with each participant (Diamant, Fig. 22, par [158], "…logic subsystem 1002 and storage subsystem 1004 of computing system 1300 are configured to instantiate a face identification machine 1020, a speech recognition machine 1022, an attribution machine 1024, a transcription machine 1026, and a gesture recognition machine 1028…").
Regarding Claim 6, Diamant in view of Raanani discloses the method of claim 4, further comprising maintaining and updating user caption conversion service profiles according to the determined personal characteristics (Raanani, par [039], "…The offline analysis component 110 can store the features 115 and the classifiers 120 in a storage system 125..."; Diamant, par [062], "…a user profile may specify a speech recognition machine (or parameters thereof) suited for the particular user, and that speech recognition machine (or parameters) may be used when the user is identified...").

Regarding Claim 7, Diamant in view of Raanani discloses the method of claim 1, wherein the personal characteristics correspond to one or more from among location, native language, secondary language (Diamant, par [065], "…computerized intelligent assistant 1300 is able to collect at least some audiovisual and/or other relevant data in order to observe conference participants within the conference environment (e.g., a conference room, office, or any other suitable location for holding a meeting)..."; par [059], "…speech recognition machine 130 to robustly recognize speech by members of the population, taking into account possible distinct characteristics of speech that may occur more frequently within the population (e.g., different languages of speech, speaking accents, vocabulary, and/or any other distinctive characteristics of speech that may vary between members of populations)...").

Claim 8 is a system claim with limitations similar to the limitations of Claim 1 and is rejected under similar rationale.
Additionally, Diamant discloses a computer system for real-time caption generation in an edge computing environment, the computer system comprising: one or more computer-readable non-transitory storage media configured to store computer program code; and one or more computer processors configured to access said computer program code and operate as instructed by said computer program code, said computer program code including (Diamant, Fig. 22, par [154], "…Computing system 1300 includes a logic subsystem 1002 and a storage subsystem 1004..."; par [155], "…The logic subsystem 1002 may include one or more hardware processors configured to execute software instructions…"; par [156], "…Storage subsystem 1004 includes one or more physical devices configured to temporarily and/or permanently hold computer information such as data and instructions executable by the logic subsystem...") … Rationale for combination is similar to that provided for Claim 1.

Claims 9-14 are system claims with limitations similar to the limitations of Claims 2-7, respectively, and are rejected under similar rationale.

Claim 15 is a non-transitory computer readable medium claim with limitations similar to the limitations of Claim 1 and is rejected under similar rationale.
Additionally, Diamant discloses a non-transitory computer readable medium having stored thereon a computer program for real-time caption generation in an edge computing environment, the computer program configured to cause one or more computer processors to (Diamant, Fig. 22, par [155], "…The logic subsystem 1002 may include one or more hardware processors configured to execute software instructions…"; par [156], "…Storage subsystem 1004 includes one or more physical devices configured to temporarily and/or permanently hold computer information such as data and instructions executable by the logic subsystem...") … Rationale for combination is similar to that provided for Claim 1.

Claims 16-20 are non-transitory computer readable medium claims with limitations similar to the limitations of Claims 2-6, respectively, and are rejected under similar rationale.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Matula et al. (US Pub. No. 2022/0084523) discloses a computing system and method for utilizing user endpoint devices to perform speech-to-text (STT) transcriptions of calls between a user/customer and an agent of an enterprise, enabling more productive STT transcription since the user endpoint device may be better tuned, configured, and/or customized for transcription of the user's particular language and accent (Matula, paras [031-036]). Moy et al. (US Pub. No. 2025/0322181) discloses systems and methods for providing one-to-one audio and video calls, or for providing multi-party audio or video conferences, that also provide language translation services.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JANGWOEN LEE, whose telephone number is (703) 756-5597. The examiner can normally be reached Monday-Friday, 8:00 am - 5:00 pm ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, BHAVESH MEHTA, can be reached at (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/JANGWOEN LEE/
Examiner, Art Unit 2656

/BHAVESH M MEHTA/
Supervisory Patent Examiner, Art Unit 2656

Prosecution Timeline

Dec 15, 2022: Application Filed
Nov 08, 2023: Response after Non-Final Action
Jan 23, 2026: Non-Final Rejection (§103)
Apr 06, 2026: Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597432: HUM NOISE DETECTION AND REMOVAL FOR SPEECH AND MUSIC RECORDINGS
2y 5m to grant; granted Apr 07, 2026

Patent 12586571: EFFICIENT SPEECH TO SPIKES CONVERSION PIPELINE FOR A SPIKING NEURAL NETWORK
2y 5m to grant; granted Mar 24, 2026

Patent 12573381: SPEECH RECOGNITION METHOD AND APPARATUS, STORAGE MEDIUM, AND ELECTRONIC DEVICE
2y 5m to grant; granted Mar 10, 2026

Patent 12567430: METHOD AND DEVICE FOR IMPROVING DIALOGUE INTELLIGIBILITY DURING PLAYBACK OF AUDIO DATA
2y 5m to grant; granted Mar 03, 2026

Patent 12566930: CONDITIONING OF PRODUCTIVITY APPLICATION FILE CONTENT FOR INGESTION BY AN ARTIFICIAL INTELLIGENCE MODEL
2y 5m to grant; granted Mar 03, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 82%
With Interview: 99% (+24.2%)
Median Time to Grant: 2y 11m
PTA Risk: Low
Based on 44 resolved cases by this examiner. Grant probability derived from career allow rate.
