Prosecution Insights
Last updated: April 19, 2026
Application No. 18/066,365

INTELLIGENT CAPTION EDGE COMPUTING

Status: Non-Final OA (§103)
Filed: Dec 15, 2022
Examiner: LEE, JANGWOEN
Art Unit: 2656
Tech Center: 2600 — Communications
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)
Grant Probability: 82% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 82% (36 granted / 44 resolved; +19.8% vs TC avg), above average
Interview Lift: +24.2% among resolved cases with interview
Avg Prosecution: 2y 11m (typical timeline)
Total Applications: 67 across all art units; 23 currently pending
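As a quick sanity check (not the vendor's actual formula), the headline examiner metrics above can be recomputed directly from the counts shown on this page:

```python
# Hedged sketch: recompute the examiner metrics from the page's counts
# (36 granted of 44 resolved cases; 67 total applications).
granted, resolved, total_apps = 36, 44, 67

career_allow_rate = granted / resolved      # 36/44 = 0.818..., shown as 82%
currently_pending = total_apps - resolved   # 67 - 44 = 23 pending

print(f"Career allow rate: {career_allow_rate:.0%}")  # Career allow rate: 82%
print(f"Currently pending: {currently_pending}")      # Currently pending: 23
```

The displayed 82% is simply the rounded grant share of resolved cases; the "23 currently pending" figure is the remainder of the 67 career filings.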

Statute-Specific Performance

§101: 26.5% (-13.5% vs TC avg)
§103: 54.6% (+14.6% vs TC avg)
§102: 11.0% (-29.0% vs TC avg)
§112: 4.1% (-35.9% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 44 resolved cases
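Assuming each "vs TC avg" figure is a simple difference against the Tech Center baseline, the panel is internally consistent; a minimal sketch using the displayed values:

```python
# Consistency check on the per-statute panel: each examiner rate minus
# its "vs TC avg" delta should recover the Tech Center baseline that
# the black line estimates. Values copied from the panel above.
examiner_rate = {"101": 26.5, "103": 54.6, "102": 11.0, "112": 4.1}
vs_tc_delta   = {"101": -13.5, "103": 14.6, "102": -29.0, "112": -35.9}

implied_tc_avg = {s: round(examiner_rate[s] - vs_tc_delta[s], 1)
                  for s in examiner_rate}
print(implied_tc_avg)  # every statute's implied baseline is 40.0 here
```

Every statute's implied baseline comes out to 40.0, which suggests the black line marks a single Tech Center average estimate across the four statutes.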

Office Action

§103
DETAILED ACTION

This communication is in response to the Application filed on 12/15/2022. Claims 1-20 are pending and have been examined. Claims 1, 8 and 15 are independent. This Application was published as U.S. Pub. No. 2024/0203417 A1.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 12/15/2022 was filed. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Diamant et al. (US Pub. No. 2019/0341050, hereinafter "Diamant") in view of Raanani et al. (US Pub. No. 2019/0057079, hereinafter "Raanani").
Regarding Claim 1, Diamant discloses a method of real-time caption generation in an edge computing environment, executable by a processor (Diamant, par [004], "…A method for facilitating a remote conference includes receiving a digital video and a computer-readable audio signal...a speech recognition machine is operated to translate the computer-readable audio signal into a first text..."; Figs. 11, 22, par [062], "…speech recognition machine 130a-n..."; par [115], "…transcribed speech and/or speaker identity information may be gathered by computerized intelligent assistant 1300 in real time, in order to build the transcript in real time..."; Fig. 22, paras [153, 158], "…Computing system 1300 may take the form of one or more...Internet of Things (IoT) devices, embedded computing devices, and/or other computing devices...a speech recognition machine 1022, an attribution machine 1024, a transcription machine 1026..."), comprising:

selecting an edge device from among a plurality of edge devices based on the determined personal characteristics, the selected edge device being configured to perform caption conversion for a participant from among the one or more participants in the web conference service (Diamant, Fig. 11, par [062], "…speech recognition machines 130a-n are downstream from diarization machine 132...Each speech recognition machine 130 optionally may be tuned for a particular individual speaker (e.g., Bob) or species of speakers (e.g., Chinese language speaker, or English speaker with Chinese accent)...");

customizing a lightweight user accent-oriented caption edge module associated with the selected edge device for the participant (Diamant, par [059], "…Speech recognition machine 130 may be trained with regard to an individual, a plurality of individuals, and/or a population…taking into account possible distinct characteristics of speech that may occur more frequently within the population (e.g., different languages of speech, speaking accents, vocabulary, and/or any other distinctive characteristics of speech that may vary between members of populations)..."); and

deploying the customized lightweight user accent-oriented caption edge module to the selected edge device based on a corpus of captions associated with the user accent-oriented caption edge module most closely matching the determined personal characteristics (Diamant, par [059], "…Training speech recognition machine 130 with regard to an individual and/or with regard to a plurality of individuals may further tune recognition of speech to take into account further differences in speech characteristics of the individual and/or plurality of individuals…"; par [062], "…the speech from each different speaker may be processed independent of the speech of all other speakers, the grammar and/or acoustic model of all speakers may be dynamically updated in parallel on the fly...each speech recognition machine may be configured to output text 800 with labels 608 for downstream operations, such as transcription...").

Diamant discloses "context of the conference participant" (e.g., paras [077, 128]), but does not explicitly disclose the limitations "monitoring contexts related to one or more participants in a web conference service, the one or more participants using a caption service...determining personal characteristics associated with each of the participants based on the monitored contexts." However, Raanani, in the analogous field of endeavor, discloses monitoring contexts related to one or more participants in a web conference service, the one or more participants using a caption service (Raanani, Abstract, "…A call assistant device is used to command a call management system to perform a specified task in association with a specified call. The call assistant device can be an Internet of Things (IoT) based device..."; Fig. 1, par [040], "…The real-time analysis component 130 receives real-time call data 150 of an ongoing conversation between a customer and a representative and analyzes the real-time call data 150 to generate a set of features..."); and determining personal characteristics associated with each of the participants based on the monitored contexts (Raanani, Fig. 2, par [044], "...the feature generation component 111 includes an ASR component 210, an NLP component 225, an affect component 215 and a metadata component 220...The affect component 215 can analyze the call data 205 for emotional signals and personality traits as well as general personal attributes such as gender, age, and accent of the participants...").

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the speech recognition and transcription machine of the intelligent conference system of Diamant with the feature generation component of the IoT-based call assistant device of Raanani, with a reasonable expectation of success, to extract the characteristics of the conversation, e.g., the voices of the users/participants, to guide downstream processing or outcomes (Raanani, paras [002, 007]).

Regarding Claim 2, Diamant in view of Raanani discloses the method of claim 1.
Diamant further discloses providing a caption conversion service from the deployed edge device to the participant from among the one or more participants (Diamant, par [023], "…in-person participants who are physically present at a conference location, as well as remote participants who participate via remote audio, video, textual, and/or multi-modal interaction with the in-person participants..."; Fig. 22, paras [153, 158], "…Computing system 1300 may take the form of one or more...Internet of Things (IoT) devices...a speech recognition machine 1022, an attribution machine 1024, a transcription machine 1026...").

Regarding Claim 3, Diamant in view of Raanani discloses the method of claim 2. Diamant further discloses causing the provided caption conversion service to display captions on one or more devices associated with the participant from among the one or more participants (Diamant, par [069], "…Although the following description includes examples of displayed content (e.g., notifications, transcripts, and results of analysis) at a remote user device 172, such displayed content may be displayed at any companion device..."; Fig. 22, par [162], "…display subsystem 1008 may be used to present a visual representation of data held by storage subsystem 1004... Display subsystem 1008 may include one or more display devices utilizing virtually any type of technology…").
Regarding Claim 4, Diamant in view of Raanani discloses the method of claim 1, further comprising defining a framework to maintain and customize edge generation (Diamant, Fig. 22, paras [153, 158], "…Computing system 1300 may take the form of one or more...Internet of Things (IoT) devices, embedded computing devices, and/or other computing devices...a speech recognition machine 1022, an attribution machine 1024, a transcription machine 1026...") based on dynamically monitoring the contexts (Raanani, Fig. 1, par [040], "…The real-time analysis component 130 receives real-time call data 150 of an ongoing conversation between a customer and a representative and analyzes the real-time call data 150 to generate a set of features..."; Fig. 2, par [044], "...the feature generation component 111 includes an ASR component 210, an NLP component 225, an affect component 215 and a metadata component 220...The affect component 215 can analyze the call data 205 for emotional signals and personality traits as well as general personal attributes such as gender, age, and accent of the participants...").

Regarding Claim 5, Diamant in view of Raanani discloses the method of claim 4, further comprising defining a data structure for saving and tracking the caption service associated with each participant (Diamant, Fig. 22, par [158], "…logic subsystem 1002 and storage subsystem 1004 of computing system 1300 are configured to instantiate a face identification machine 1020, a speech recognition machine 1022, an attribution machine 1024, a transcription machine 1026, and a gesture recognition machine 1028…").
Regarding Claim 6, Diamant in view of Raanani discloses the method of claim 4, further comprising maintaining and updating user caption conversion service profiles according to the determined personal characteristics (Raanani, par [039], "…The offline analysis component 110 can store the features 115 and the classifiers 120 in a storage system 125..."; Diamant, par [062], "…a user profile may specify a speech recognition machine (or parameters thereof) suited for the particular user, and that speech recognition machine (or parameters) may be used when the user is identified...").

Regarding Claim 7, Diamant in view of Raanani discloses the method of claim 1, wherein the personal characteristics correspond to one or more from among location, native language, secondary language (Diamant, par [065], "…computerized intelligent assistant 1300 is able to collect at least some audiovisual and/or other relevant data in order to observe conference participants within the conference environment (e.g., a conference room, office, or any other suitable location for holding a meeting)..."; par [059], "…speech recognition machine 130 to robustly recognize speech by members of the population, taking into account possible distinct characteristics of speech that may occur more frequently within the population (e.g., different languages of speech, speaking accents, vocabulary, and/or any other distinctive characteristics of speech that may vary between members of populations)...").

Claim 8 is a system claim with limitations similar to the limitations of Claim 1 and is rejected under similar rationale.
Additionally, Diamant discloses a computer system for real-time caption generation in an edge computing environment, the computer system comprising: one or more computer-readable non-transitory storage media configured to store computer program code; and one or more computer processors configured to access said computer program code and operate as instructed by said computer program code, said computer program code including (Diamant, Fig. 22, par [154], "…Computing system 1300 includes a logic subsystem 1002 and a storage subsystem 1004..."; par [155], "…The logic subsystem 1002 may include one or more hardware processors configured to execute software instructions…"; par [156], "…Storage subsystem 1004 includes one or more physical devices configured to temporarily and/or permanently hold computer information such as data and instructions executable by the logic subsystem...") … Rationale for combination is similar to that provided for Claim 1.

Claims 9-14 are system claims with limitations similar to the limitations of Claims 2-7, respectively, and are rejected under similar rationale.

Claim 15 is a non-transitory computer readable medium claim with limitations similar to the limitations of Claim 1 and is rejected under similar rationale.
Additionally, Diamant discloses a non-transitory computer readable medium having stored thereon a computer program for real-time caption generation in an edge computing environment, the computer program configured to cause one or more computer processors to (Diamant, Fig. 22, par [155], "…The logic subsystem 1002 may include one or more hardware processors configured to execute software instructions…"; par [156], "…Storage subsystem 1004 includes one or more physical devices configured to temporarily and/or permanently hold computer information such as data and instructions executable by the logic subsystem...") … Rationale for combination is similar to that provided for Claim 1.

Claims 16-20 are non-transitory computer readable medium claims with limitations similar to the limitations of Claims 2-6, respectively, and are rejected under similar rationale.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Matula et al. (US Pub. No. 2022/0084523) discloses a computing system and method for utilizing user endpoint devices to perform speech-to-text (STT) transcriptions of calls between a user/customer and an agent of an enterprise, enabling more productive STT transcription since the user endpoint device may be better tuned, configured, and/or customized for transcription of the user's particular language and accent (Matula, paras [031-036]). Moy et al. (US Pub. No. 2025/0322181) discloses systems and methods for providing one-to-one audio and video calls, or for providing multi-party audio or video conferences, that also provide language translation services.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JANGWOEN LEE, whose telephone number is (703) 756-5597. The examiner can normally be reached Monday-Friday, 8:00 am - 5:00 pm ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, BHAVESH MEHTA, can be reached at (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/JANGWOEN LEE/
Examiner, Art Unit 2656

/BHAVESH M MEHTA/
Supervisory Patent Examiner, Art Unit 2656

Prosecution Timeline

Dec 15, 2022: Application Filed
Nov 08, 2023: Response after Non-Final Action
Jan 23, 2026: Non-Final Rejection (§103)
Apr 06, 2026: Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597432: HUM NOISE DETECTION AND REMOVAL FOR SPEECH AND MUSIC RECORDINGS
2y 5m to grant; granted Apr 07, 2026

Patent 12586571: EFFICIENT SPEECH TO SPIKES CONVERSION PIPELINE FOR A SPIKING NEURAL NETWORK
2y 5m to grant; granted Mar 24, 2026

Patent 12573381: SPEECH RECOGNITION METHOD AND APPARATUS, STORAGE MEDIUM, AND ELECTRONIC DEVICE
2y 5m to grant; granted Mar 10, 2026

Patent 12567430: METHOD AND DEVICE FOR IMPROVING DIALOGUE INTELLIGIBILITY DURING PLAYBACK OF AUDIO DATA
2y 5m to grant; granted Mar 03, 2026

Patent 12566930: CONDITIONING OF PRODUCTIVITY APPLICATION FILE CONTENT FOR INGESTION BY AN ARTIFICIAL INTELLIGENCE MODEL
2y 5m to grant; granted Mar 03, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 82%
With Interview: 99% (+24.2%)
Median Time to Grant: 2y 11m
PTA Risk: Low
Based on 44 resolved cases by this examiner. Grant probability derived from career allow rate.
