Detailed Action
This communication is in response to the Request for Continued Examination filed on 12/11/2025.
Claims 1-16 and 21-24 are pending and have been examined.
Claims 1-16 and 21-24 are rejected. Claims 17-20 are cancelled.
Any previous objection/rejection not mentioned in this Office Action has been withdrawn by the Examiner.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Applicant has amended the independent claims to recite “generative adversarial networking.”
Regarding the Claim Rejections - 35 U.S.C. § 103, Applicant notes that the cited portions of Gang describe an electronic device with a processor that detects a disruption or a change in a connection state between the electronic device and an external microphone (e.g., USB or Bluetooth) and switches recording devices based on detected disruptions, the operational status of microphones, reconnections, or user input. Gang further describes detecting an abnormal operation or disconnection of a first microphone and activating a second microphone in response. Thus, Gang describes reactive switching between microphones based on detected disruptions, the operational status of microphones, reconnections, or user input. Gang also outlines combinations of various microphones and describes a transitional mixing process of audio signals using deterministic, pre-defined signal operations such as dynamic range compression, delay, and fade-in/fade-out.
Applicant notes that although Gang describes mixing fade-in and fade-out processed audio signals and applying cross-fading while mixing the first audio signals and the delayed second audio signals, Gang fails to teach using machine learning to refine a combination of an original input and an additional audio input to maximize clarity. The Examiner notes that the amended claims recite a GAN, which is taught by CHEN. See the claim mappings below.
Applicant notes that Kwatra fails to teach or suggest any voice clarity improvement mode. The modes described in Kwatra correspond to network-level strategies such as switching networks, layering networks, or using voice-to-text as a fallback. Thus, Applicant argues, the combination fails to teach a system that enables selecting an Internet of Things (IoT)-assisted voice clarity improvement mode from a plurality of voice clarity improvement modes, obtaining an additional audio input from a nearby device in response to selecting the IoT-assisted voice clarity improvement mode, and using machine learning for refining a combination of an original input and the additional audio input, as recited in amended independent claim 1. The Examiner notes that the amended claims recite a GAN, which is taught by CHEN. See the claim mappings below.
Regarding the Claim Rejections - 35 U.S.C. § 103, Applicant's arguments with respect to the independent claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Hence, new grounds of rejection have been made over Kwatra (US Patent Application Publication US 20220086724 A1), in view of Gang (US Patent US 11997460 B2), and further in view of CHEN (US Patent Application Publication US 20210201887 A1).
Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 4, 8, 9, 11, 15, 16, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Kwatra (US Patent Application Publication US 20220086724 A1), in view of Gang (US Patent US 11997460 B2), and further in view of CHEN (US Patent Application Publication US 20210201887 A1).
Regarding claim 1, Kwatra teaches 1. A processor-implemented method, the method comprising: predicting a change in network connectivity within the audio interaction; (see Kwatra [0014] “The exemplary embodiments are directed to a method, computer program product, and system for predictively compensating for expected audio communication issues through a set of rules that define a compensation measure based on conditions that are expected or are present. The exemplary embodiments may provide an intelligent mechanism by which seamless communication may be accomplished through a dynamic mode change, layering of available mechanisms, etc. to improve a communication link and prevent an interruption in the communication in a seamless manner. With the wide variety of reasons that a communication may lack a seamless quality that users may expect, the exemplary embodiments may utilize a dynamic and modular layering of input methods for a communication to proceed. Key benefits of the exemplary embodiments may include providing a seamless mechanism for a user to perform a voice communication to receive incoming voice messages in a predictive manner to minimize actions required from the user. Detailed implementation of the exemplary embodiments follows.”) in response to the predicted change in network connectivity, determining whether to perform a voice clarity improvement; (see Kwatra [0053] “During a voice communication session (e.g., a telephonic communication), the compensation client 116 installed in the primary smart device 110 may analyze a received audio quality of incoming voice communications from a further user of the communication session and may verify whether the quality of the audio satisfies a specified threshold limit. The conditions client 114 may also track any external noise (e.g., ambient noise condition) and may validate if the user is having difficulty to understand the spoken content (e.g., utterances from the further user). 
The compensation client 116 may utilize historical data (e.g., as analyzed by a historical data analysis program (not shown)) to identify a pattern of network strength reduction. Based on this information, the compensation client 116 may initiate a parallel mode of communication in a proactive manner to ensure seamless communication when such a compensation measure is to be used.”)
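For context only (not part of the record), the kind of proactive trigger Kwatra describes, identifying a pattern of network-strength reduction and initiating a parallel mode of communication before a drop occurs, can be sketched as follows. All names, values, and thresholds here are hypothetical illustrations, not Kwatra's implementation.

```python
# Toy sketch: decide whether to proactively start a parallel
# communication mode based on a falling trend in signal strength.
# The floor and window values are hypothetical, not from Kwatra.

def should_start_parallel_mode(strength_history, floor=0.3, window=3):
    """Return True if signal strength is monotonically falling over the
    last `window` samples and has dropped below `floor`."""
    if len(strength_history) < window:
        return False
    recent = strength_history[-window:]
    falling = all(b < a for a, b in zip(recent, recent[1:]))
    return falling and recent[-1] < floor

# Example: strength sliding from 0.9 toward 0.2 triggers compensation.
print(should_start_parallel_mode([0.9, 0.6, 0.4, 0.2]))  # True
```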
Kwatra does not specifically teach in response to determining to perform the voice clarity improvement, obtaining an additional input from a nearby device and performing the voice clarity improvement for the audio interaction with the additional input from the nearby device.
However, Gang does teach this limitation (see Gang (25:62-26:19) “(150) In operation 1310, the processor 120 may determine that an operation stop is detected due to operational failure in the first microphone or disconnection of the same. In an embodiment, the first microphone may be the external microphone 220 included in the external electronic device 102, and the processor 120 may determine whether connection with the external microphone 220 is disrupted through the wireless communication module 192 or the connecting terminal 178. In another embodiment, the first microphone may be the internal microphone 205 included in the electronic device 101, and the processor 120 may determine that the operation of the internal microphone 205 has become abnormal, based on a user input or internal determination. (151) In operation 1315, the processor 120 may activate the second microphone. In an embodiment, the second microphone may be different from the first microphone, which may be included in the electronic device 101 or connected to the electronic device 101. In operation 1320, the processor 120 may output the first audio signal stored in the recovery buffer 960 and, in operation 1325, mix the first audio signal read from the recovery buffer 960 and the second audio signal input from the second microphone during a designated time period. In an embodiment, the recovery buffer 960 may store the first audio signal, at least, by the same length as the designated time period.”)
Kwatra and Gang are in the same field of endeavor of signal processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Kwatra to incorporate the teachings of Gang to include, in response to determining to perform the voice clarity improvement, obtaining an additional input from a nearby device and performing the voice clarity improvement for the audio interaction with the additional input from the nearby device. Doing so allows for continuous capturing of audio even when one device has stopped, as recognized by Gang (1:64-2:45).
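For illustration only, the deterministic transitional mixing Gang describes (fading out the buffered first-microphone signal while fading in the second-microphone signal over a designated period) can be sketched as a sample-wise linear cross-fade. This is a hypothetical toy, not Gang's code.

```python
# Toy sketch of a linear cross-fade: mix a buffered first signal with a
# second signal so output starts at the first signal's level and ends
# at the second's. Equal-length sample lists are assumed.

def crossfade_mix(first, second):
    n = len(first)
    out = []
    for i in range(n):
        # Fade-in weight for the second signal rises from 0.0 to 1.0.
        w = i / (n - 1) if n > 1 else 1.0
        out.append((1.0 - w) * first[i] + w * second[i])
    return out

mixed = crossfade_mix([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
# Starts at the first signal's level (1.0) and ends at the second's (0.0).
```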
The combination of Kwatra and Gang does not specifically teach wherein the voice clarity improvement includes refining, using generative adversarial networking, a combination of an original input and the additional audio input to maximize clarity. However, CHEN does teach this limitation (see CHEN [0031] “According to a universal approximation theorem for neural networks, the adversarial loss function model can approximate a loss function, and it is, in essence, a deep learning model for generative adversarial networks, and can be trained based on an adversarial learning method to characterize a loss function. The loss function can characterize the characteristics of the spectrum sequence in essence, and can be responsible for training the clarity of the spectrum sequence generated by the speech spectrum generation model. [0032] In this step, as shown in FIG. 2, the analog spectrum sequence generated in step S101 is input into the adversarial loss function model so that a second loss value can be output. The second loss value represents a loss in the clarity of the analog spectrum sequence relative to the real spectrum sequence.”)
Kwatra, Gang, and CHEN are in the same field of endeavor of signal processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Kwatra and Gang to incorporate the teachings of CHEN to include that the voice clarity improvement includes refining, using generative adversarial networking, a combination of an original input and the additional audio input to maximize clarity. Doing so improves intelligibility, stability, and tone, as recognized by CHEN in [0036]-[0037].
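For illustration only, the standard adversarial losses underlying an “adversarial loss function model” of the kind CHEN describes can be sketched numerically. The discriminator scores `d_real` and `d_fake` and all values below are hypothetical; this is the textbook GAN loss form, not CHEN's specific model.

```python
import math

# Toy sketch of standard GAN losses. d_real / d_fake are the
# discriminator's estimated probabilities that a sample is real.

def discriminator_loss(d_real, d_fake):
    # Discriminator wants d_real -> 1 and d_fake -> 0.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating generator loss: generator wants d_fake -> 1.
    return -math.log(d_fake)

# A fooled discriminator (d_fake near 1) yields a small generator loss,
# which is what drives the generator toward clearer output.
print(generator_loss(0.9) < generator_loss(0.1))  # True
```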
As to Claim 2, Kwatra in view of Gang and further in view of CHEN teaches 2. The method of claim 1, (see Claim 1).
Furthermore, Kwatra teaches wherein the predicting is performed using a process of machine learning. (see Kwatra [0016] “As those skilled in the art will appreciate, network availability may be spotty, or there may be excessive crowds and poor bandwidth through a momentary overload of signals to switch. Furthermore, lousy weather and even a software glitch may exist which result in dropped calls leading to a poor user experience. In light of these issues and the shortcomings of conventional approaches, the exemplary embodiments provide a mechanism for dynamically projecting and/or predicting patterns of movement for a user and/or groups of people. Using data derived and based on static and/or dynamic user clustering and densities from location determining methods, the exemplary embodiments may compile statistic and machine learning inputs that are used to compensate for issues that exist or are predicted. Based on pattern analysis and other available analysis approaches, the exemplary embodiments may proactively determine when a compensation measure is to be affected (e.g., between cellular and VoIP communication systems) and provide forecasting inputs for pre-demand bandwidth expansion to enable providers insight into infrastructure needs and planned purchasing power, thereby reducing cost for vendor service providers and a high level of call quality for users. Furthermore, the exemplary embodiments may utilize a compensation measure involving a layering of both cellular and VoIP if both signal qualities are relatively weak on its own while reserving an option to utilize a compensation measure based on user preferences across multiple devices associated with the user. Still further, the exemplary embodiments may provide a compensation measure involving voice transcription that transcribes voice to text in real time, particularly due to surrounding conditions that do not permit the user from properly deciphering incoming voice communications.”)
As to Claim 4, Kwatra in view of Gang and further in view of CHEN teaches The method of claim 1, (see Claim 1).
Furthermore, Gang teaches wherein the voice clarity improvement comprises supplementing the audio interaction with the additional audio input. (see Gang (2:28-31) “(7)…synchronize and mix the first audio signal and the second audio signal during a designated first time period, and deactivate the external microphone upon lapse of the designated first time period.”)
Kwatra, Gang, and CHEN are in the same field of endeavor of signal processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined method of Kwatra, Gang, and CHEN to incorporate the teachings of Gang to include that the voice clarity improvement comprises supplementing the audio interaction with the additional audio input. Doing so allows for continuous capturing of audio even when one device has stopped, as recognized by Gang (1:64-2:45).
Regarding claim 8, Claim 8 is a system claim with limitations similar to that of claim 1 and is rejected under the same rationale. Furthermore, Kwatra teaches A computer system, the computer system comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage medium, and program instructions stored on at least one of the one or more tangible storage medium for execution by at least one of the one or more processors via at least one of the one or more memories, (see Kwatra, [0067] “Devices used herein may include one or more processors 02, one or more computer-readable RAMs 04, one or more computer-readable ROMs 06, one or more computer readable storage media 08, device drivers 12, read/write drive or interface 14, network adapter or interface 16, all interconnected over a communications fabric 18. Communications fabric 18 may be implemented with any architecture designed for passing data and/or control information between processors (such as microprocessors, communications and network processors, etc.), system memory, peripheral devices, and any other hardware components within a system.”)
As to Claim 9, claim 9 is a system claim with limitations similar to that of claim 2 and is rejected under the same rationale.
As to Claim 11, claim 11 is a system claim with limitations similar to that of claim 4 and is rejected under the same rationale.
Regarding claim 15, claim 15 is a device claim with limitations similar to that of claim 1 and is rejected under the same rationale. Furthermore, Kwatra teaches A computer program product, the computer program product comprising: one or more computer-readable tangible storage medium and program instructions stored on at least one of the one or more tangible storage medium, the program instructions executable by a processor capable of performing a method, (see Kwatra [0067] “Devices used herein may include one or more processors 02, one or more computer-readable RAMs 04, one or more computer-readable ROMs 06, one or more computer readable storage media 08, device drivers 12, read/write drive or interface 14, network adapter or interface 16, all interconnected over a communications fabric 18. Communications fabric 18 may be implemented with any architecture designed for passing data and/or control information between processors (such as microprocessors, communications and network processors, etc.), system memory, peripheral devices, and any other hardware components within a system.”)
As to Claim 16, claim 16 is a device claim with limitations similar to that of claim 2 and is rejected under the same rationale.
As to Claim 22, Kwatra in view of Gang and further in view of CHEN teaches The method of claim 1 (see Claim 1).
Furthermore, Gang teaches wherein the additional audio input is input via an internet of things device. (see Gang Figure 1) (see Gang (8:66-67) “(44) In another embodiment, the external electronic device 104 may include an internet-of-things (IoT) device.”)
Kwatra and Gang are in the same field of endeavor of signal processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined method of Kwatra and Gang to incorporate the teachings of Gang to include that the additional audio input is input via an internet of things device. Doing so allows for continuous capturing of audio even when one device has stopped, as recognized by Gang (1:64-2:45).
Claims 3, 5, 10, 12, 21, 23, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Kwatra (US Patent Application Publication US 20220086724 A1), in view of Gang (US Patent US 11997460 B2), further in view of CHEN (US Patent Application Publication US 20210201887 A1), and further in view of Jin (US Patent US 11514925 B2).
As to Claim 3, Kwatra in view of Gang and further in view of CHEN teaches The method of claim 1, (see Claim 1).
Kwatra in view of Gang and further in view of CHEN does not teach wherein the voice clarity improvement comprises an improvement using generative adversarial networking. However, Jin does teach this limitation. (see Jin (8:53-9:2) “(33) Prior to being put into operation in the prediction subsystem 220, an embodiment of the training subsystem 210 trains the prediction model 230. FIG. 3 is a block diagram of the training subsystem 210 according to some embodiments. As shown in FIG. 3, the training subsystem 210 includes a generative adversarial network 310, which is used to train the prediction model 230 as well as to train a discriminator 320. In this GAN 310, the prediction model 230 itself is a neural network acting as a generator, which is trained jointly with the discriminator 320 in an adversarial manner. The GAN 310 of the training subsystem 210 further includes an evaluation tool 330, which applies one or more objective functions, also referred to as loss functions, to modify the prediction model 230, the discriminator 320, or both. One of skill in the art will understand how to construct a GAN 310 in general and, thus, given this disclosure, how to construct a GAN 310 as described herein.”) (see Jin (17:3-11) “(68) FIG. 6 is a flow diagram of a process 600 of utilizing the prediction model 230 after training, according to some embodiments. In some embodiments, the prediction subsystem 220 performs part or all of the process 600 described below. Further, the training subsystem 210 performs the process 400 of FIG. 4 prior to the process 600 of FIG. 6 being performed, thus causing the prediction model 230 to have previously been trained as part of the GAN 310 before being placed into operation.”) (see Jin (17:24-38) “(70) At block 610, the process 600 involves receiving a request to enhance the source audio 110. For instance, as described with respect to FIG. 1, an example of the interface 100 includes a button or link enabling the user to request enhancement of the source audio 110. In some embodiments, the request can be made through a single click or selection made by the user, such as by selecting a button labeled “Go,” “Enhance,” or “Convert” in the interface 100. If the request is received through the interface 100 and the interface 100 is displayed on a first computing device other than a second computing device running the prediction subsystem 220, as shown in FIG. 2, then the first computing device transmits the request to the prediction subsystem 220 at the second computing device, where the request is received and processed.”)
Kwatra, Gang, CHEN, and Jin are in the same field of endeavor of speech processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined method of Kwatra, Gang, and CHEN to incorporate the teachings of Jin to include that the voice clarity improvement comprises an improvement using generative adversarial networking. Doing so mitigates background noise, poor microphone quality, and corruption of data, as recognized by Jin (1:22-24).
As to Claim 5, Kwatra in view of Gang and further in view of CHEN teaches The method of claim 1, (see Claim 1).
Kwatra in view of Gang and further in view of CHEN does not teach wherein the voice clarity improvement uses both generative adversarial networking and connected devices concurrently. However, Jin does teach this limitation (see Jin (8:53-9:2), (17:3-11), and (17:24-38), reproduced in the rejection of claim 3 above).
Kwatra, Gang, CHEN, and Jin are in the same field of endeavor of speech processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined method of Kwatra, Gang, and CHEN to incorporate the teachings of Jin to include that the voice clarity improvement uses both generative adversarial networking and connected devices concurrently. Doing so mitigates background noise, poor microphone quality, and corruption of data, as recognized by Jin (1:22-24).
As to Claim 10, claim 10 is a system claim with limitations similar to that of claim 3 and is rejected under the same rationale.
As to Claim 12, claim 12 is a system claim with limitations similar to that of claim 5 and is rejected under the same rationale.
As to Claim 21, Kwatra in view of Gang and further in view of CHEN teaches The method of claim 1 (see Claim 1).
Kwatra in view of Gang and further in view of CHEN does not teach wherein a generative adversarial network is the machine learning used for the refining of the combination. However, Jin does teach this limitation (see Jin (8:53-9:2), (17:3-11), and (17:24-38), reproduced in the rejection of claim 3 above).
Kwatra, Gang, CHEN, and Jin are in the same field of endeavor of speech processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined method of Kwatra, Gang, and CHEN to incorporate the teachings of Jin to include that a generative adversarial network is the machine learning used for the refining of the combination. Doing so mitigates background noise, poor microphone quality, and corruption of data, as recognized by Jin (1:22-24).
As to Claim 23, Kwatra in view of Gang and further in view of CHEN teaches The method of claim 1 (see Claim 1).
Kwatra in view of Gang and further in view of CHEN does not teach further comprising: selecting the additional audio input from the group consisting of an internet of things-enabled improvement and a generative adversarial network-enabled improvement. However, Jin does teach this limitation (see Jin [0086] “The computing system 700 may also include a number of external or internal devices, such as input or output devices. For example, the computing system 700 is shown with one or more input/output (“I/O”) interfaces 708. An I/O interface 708 can receive input from input devices or provide output to output devices. One or more buses 706 are also included in the computing system 700. The bus 706 communicatively couples one or more components of a respective one of the computing system 700.”) (see Jin [0089] “The computing system 700 also includes a network interface device 710. The network interface device 710 includes any device or group of devices suitable for establishing a wired or wireless data connection to one or more data networks. Non-limiting examples of the network interface device 710 include an Ethernet network adapter, a modem, and the like. The computing system 700 is able to communicate with one or more other computing devices (e.g., a computing device acting as a client 240) via a data network using the network interface device 710.”)
Kwatra, Gang, CHEN, and Jin are in the same field of endeavor of speech processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined method of Kwatra, Gang, and CHEN to incorporate the teachings of Jin to include selecting the additional audio input from the group consisting of an internet of things-enabled improvement and a generative adversarial network-enabled improvement. Doing so mitigates background noise, poor microphone quality, and corruption of data, as recognized by Jin (1:22-24).
As to Claim 24, Kwatra in view of Gang and further in view of CHEN teaches The method of claim 1 (see Claim 1).
Kwatra in view of Gang and further in view of CHEN does not teach wherein the voice clarity improvement includes one improvement selected from the group consisting of audio latency, normalization of audio volume, and improving audio quality. However, Jin does teach this limitation (see Jin [0075] “Generally, the discriminator 320 takes in the log mel-spectrogram of target audios 345 and predicted audios 130 and, based on the log mel-spectrogram, outputs a prediction for each. For instance, that prediction is a score indicating a believed likelihood that the target audio 345 or predicted audio 130 is authentic (i.e., is a target audio 345). … The example of the discriminator 320 is a gated convolutional neural network (CNN) with several stacks of convolutional layers, a batch normalization layer, and a Gated Linear Unit (GLU). In some embodiments, the discriminator 320 is fully convolutional, thus allowing inputs of arbitrary temporal length (i.e., audios of various lengths).”)
Kwatra in view of Gang and further in view of CHEN and Jin are in the same field of endeavor of speech processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of the combination of Kwatra, Gang, and CHEN to incorporate the teachings of Jin to include that the voice clarity improvement includes one improvement selected from the group consisting of audio latency, normalization of audio volume, and improving audio quality. Doing so allows for reducing background noise, poor microphone quality, and corruption of data, as recognized by Jin (1:22-24).
Claims 6, 7, 13, and 14 are rejected under 35 U.S.C. § 103 as being unpatentable over Kwatra (US Patent Application Publication No. US 2022/0086724 A1), in view of Gang (US Patent No. 11,997,460 B2), further in view of CHEN (US Patent Application Publication No. US 2021/0201887 A1), and further in view of Hedqvist (US Patent No. 9,179,387 B2).
As to Claim 6, Kwatra in view of Gang and further in view of CHEN teaches the method of claim 1 (see Claim 1).
Kwatra in view of Gang and further in view of CHEN does not teach wherein the predicting is performed based on a speed and a direction of a moving device. However, Hedqvist does teach this limitation. (see Hedqvist (7:48-65) “(19) FIGS. 3b-3f further exemplifies scenarios how the invented system works in different situations. FIG. 3b shows an area known to supply insufficient coverage for the user the “IC” (“Insufficient Coverage”) area. The server tracks the portable communication device 22 and extrapolates the speed and direction vectors and out of this adjust the handover border 65 so when the terminal passes this “border” the handover is initiated. The distance from the insufficient coverage area IC and the virtual handover border 65 is large enough to ensure that the handover mechanism is completed before insufficient coverage are is reached when initiated at the handover border line for the current speed and direction. Therefore the size or placement of the handover border 65 can be dynamic depending on user speed, direction, heuristics, statistics and potentially individual settings or knowledge about that individual correlated with the authentication of the session in the terminal. 
There is a minimal “handover border” area that should not be smaller than the insufficient coverage area IC.”) (see Hedqvist (2:56-3:6) “(15) According to a first aspect of the present invention, this object is achieved by a method of performing vertical handover of a wireless voice connection, which is part of a voice connection set up between a portable communication device and another communication device, said handover being performed for the portable communication device between a local wireless network and a wireless wide area network, comprising the steps of: determining a handover situation for the wireless connection to the portable communication device based on a set of handover factors at least comprising the position and movement of the portable communication device in an area of the local network and structural layout information of the area together with knowledge of where in this area there is insufficient coverage, and handing over the connection from the local network to the wide area network or from the wide area network to the local network based on the handover situation.”)
Kwatra in view of Gang and further in view of CHEN and Hedqvist are in the same field of endeavor of speech processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of the combination of Kwatra, Gang, and CHEN to incorporate the teachings of Hedqvist to include that the predicting is performed based on a speed and a direction of a moving device. Doing so allows for an improved connection with a portable communication device, as recognized by Hedqvist (2:38).
As to Claim 7, Kwatra in view of Gang and further in view of CHEN and further in view of Hedqvist teach the method of claim 6 (see Claim 6).
Furthermore, Kwatra teaches wherein the predicting is performed based on a projection that the moving device will enter an area with poor network connectivity, (see Kwatra [0040] “The rules program 134 may also incorporate information from the identification program 132 to the rules and the corresponding compensation measures. For example, the rules program 134 may define a rule for when a future event that the user is planning to attend having an ambient noise that will likely prevent the user from properly deciphering incoming voice communications from the further user. The rule may indicate that the compensation measure for the conversion from voice to text is to be prepared. In another example, the rules program 134 may define a rule for when the user may constantly change location from an origin to a destination along a planned or likely route. The rule may indicate that the compensation measure for the association to a further network is to be prepared where the rule may also set forth likely further networks that may provide the sufficient signal strength to perform the communication based on the location. The rules program 134 may have access to a variety of databases or crowd-sourced information that indicates various networks that are available at select locations and corresponding signal strengths experienced by devices having substantially similar technical characteristics as the primary smart device 110.”) (see Kwatra [0062] “As a result of the ambient noise condition being within acceptable limits (decision 214, “NO” branch), the primary smart device 110 may determine whether a further network is available that provides a sufficient signal strength to perform the communication (decision 218). In performing this operation, the primary smart device 110 may have determined that the current signal strength to a first network is not sufficient to perform the communication so that the user is provided a satisfactory user experience. 
Thus, the primary smart device 110 may identify further networks such as WiFi networks or unsecured networks that may be used in performing the communication. As a result of the signal strength to the first network being poor but the signal strength to a second network being good (decision 218, “YES” branch), the primary smart device 110 may utilize a compensation measure in which the primary smart device 110 associated with the second network to perform the communication (step 220).(60) As a result of the further network also having a signal strength that is poor (decision 218, “NO” branch), the primary smart device 110 may utilize a compensation measure in which networks may be layered to strength an overall signal strength to perform the communication (step 222). In such a scenario, the signal strength to the first network and to the second network may be poor but the layering may result in an overall signal strength that may satisfy a minimum threshold to perform the communication. For example, the primary smart device 110 may layer the cellular network over the WiFi network or vice versa.”) (see Kwatra [0016] “As those skilled in the art will appreciate, network availability may be spotty, or there may be excessive crowds and poor bandwidth through a momentary overload of signals to switch. Furthermore, lousy weather and even a software glitch may exist which result in dropped calls leading to a poor user experience. In light of these issues and the shortcomings of conventional approaches, the exemplary embodiments provide a mechanism for dynamically projecting and/or predicting patterns of movement for a user and/or groups of people. Using data derived and based on static and/or dynamic user clustering and densities from location determining methods, the exemplary embodiments may compile statistic and machine learning inputs that are used to compensate for issues that exist or are predicted. 
Based on pattern analysis and other available analysis approaches, the exemplary embodiments may proactively determine when a compensation measure is to be affected (e.g., between cellular and VoIP communication systems) and provide forecasting inputs for pre-demand bandwidth expansion to enable providers insight into infrastructure needs and planned purchasing power, thereby reducing cost for vendor service providers and a high level of call quality for users.”)
As to Claim 13, claim 13 is a system claim with limitations similar to that of claim 6 and is rejected under the same rationale.
As to Claim 14, claim 14 is a system claim with limitations similar to that of claim 7 and is rejected under the same rationale.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KRISTEN MICHELLE MASTERS whose telephone number is (703)756-1274. The examiner can normally be reached M-F 8:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Louis Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KRISTEN MICHELLE MASTERS/Examiner, Art Unit 2659
/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659