Last updated: May 29, 2026
Application No. 18/548,949
Machine Learning Based Enhancement of Audio for a Voice Call

Non-Final OA §103
Filed
Sep 05, 2023
Priority
Mar 05, 2021 — nonprovisional of PCTUS2021021161
Examiner
LAM, PHILIP HUNG FAI
Art Unit
2656
Tech Center
2600 — Communications
Assignee
Google LLC
OA Round
3 (Non-Final)
Interview Optional

— +47.7% interview lift. Examiner has a relatively high allowance rate (84%); +47.7% interview lift. A written response may suffice.
Based on 143 resolved cases, 2023–2026
Examiner Intelligence

LAM, PHILIP HUNG FAI View full profile →
Grants 84% — above average
Career Allowance Rate
120 granted / 143 resolved
+21.9% vs TC avg
Strong +48% interview lift
Without
With
+47.7%
Interview Lift
resolved cases with interview
Typical timeline
2y 6m
Avg Prosecution
13 currently pending
Career history
160
Total Applications
across all art units
Statute-Specific Performance

§101
0.7%
-39.3% vs TC avg
§103
96.4%
+56.4% vs TC avg
§102
2.3%
-37.7% vs TC avg
§112
0.3%
-39.7% vs TC avg
Black line = Tech Center average estimate • Based on career data from 143 resolved cases
Office Action

§103
DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 2/12/26 has been entered.
 
Response to Amendment and Arguments
35 U.S.C. 103 Rejections
Applicant’s amendments and arguments are considered but are either unpersuasive or moot in view of the new grounds of rejection that, if presented, were necessitated by the amendments to the Claims.
Applicant’s arguments are directed to material that is added by the most recent amendments to the Claims.  Response, p. 10. 
Applicant has amended independent claims 1, 17 and 21.  Claims 1-21 are pending and have been examined.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.


Claim(s) 1-4, 6-11, 13, and 16-21 are rejected under 35 U.S.C. 103 as being unpatentable over Jose, in view Anderson (US 20190222943), and further in view of Serra (US 20230245674).
Jose discloses: 1. (Currently Amended) A computer-implemented method, comprising: (section III pg.2, Methodology.)
receiving, by a computing device and via a communications network interface, a compressed version of an audio data frame, wherein the compressed version is received after transmission over a communications network; ([sect I, pg. 1, right col.] mentions phone calls and various speech encoding standards used to transmit speech by coded audio data frames) [the disclosure is about speech enhancement using Convolutional Neural Network to enhance encoded speech in phone calls, it would implied the use of a computer device and communication network interface for transmission of voice. See fig. 1)
decompressing the compressed version to extract an audio waveform; ([sect III, pg 2 right col.] The encoder compresses the coded speech samples 𝑿 into a latent vector 𝒁. 𝑔𝜽(𝑿) can implicitly learn important features about 𝑿 such as phonemes, syllables, pitch, and timbre, and encode this into 𝒁. The decoder takes the latent code and generates speech with a higher quality and sampling rate. The U-Net architecture includes skip connections for each corresponding level of hierarchy in the encoder and decoder. This reframes the problem of generating enhanced speech samples from the latent vector, into the problem of modifying coded speech samples into enhanced speech samples. The skip connections also allow the model to transfer features common to both the low-quality and high-quality speech. Hence, 𝑔𝜽(𝒁,𝑿) also takes the coded speech samples 𝑿 as an input in addition to the latent vector Z.) [speech sample are being encoded and then decoded it, subsequently decoding it is decompressing or recovering the original signal/sample]
predicting, by applying the neural network to the audio waveform, an enhanced version of the audio waveform, ([sect III, pg 2 right col.] mention model architecture used to predict speech samples.  Also see fig.1 AMRConvNet neural network architecture.)
wherein the neural network has been trained on (i) a ground truth sample comprising unencoded audio waveforms prior to compression by ([sect III, pg 2 right col.] Dataset: High-quality speech samples were obtained from the VCTK dataset. These high-quality speech utterances originally have a sampling rate of 48 kHz. We down sampled this to 16 kHz .wav PCM files and used this as the reference ground truth speech.) [Down-sampling is a resampling process that reduces the sampling rate. While this is a form of data modification, it is not compression by an audio encoder in the typical sense (like creating an MP3 file). The resulting data is still in an uncompressed format.]
and (ii) a training dataset comprising decoded audio waveforms after compression of the unencoded audio waveforms by the plurality of different audio codecs, wherein the neural network is([sect III, pg 2 right col.] given 𝑿 coded speech samples and 𝒀 high-quality ground truth speech samples, we would like to learn a transformation function 𝑝𝜽(𝑿) that outputs 𝒀̂ predicted speech samples. 𝑝𝜽 is parameterized by non-linear weights and biases 𝜽:) [Coded speech samples is the output of the audio compression or encoder. Y (high quality ground truth), this is the uncompressed, original version of the audio waveform. It serves as the reference for what the neural network should be trying to produce. The goal is to learn a function, 𝑝𝜽(𝑿), that takes the coded speech (X) as input and produces a predicted speech sample (Y) that is as close as possible to the high-quality ground truth speech (Y), which is a regression task.  For the model to be trained to predict the high-quality waveform (Y) from the coded samples (X), the coded samples must first be converted back into a waveform format. This process is decoding. The input to the regression model is therefore the decoded audio waveform, which is inherently a distorted, lower-quality version of the original audio because the information is lost during the initial compression.]
 and providing, by an audio output component of the computing device, the enhanced version of the audio waveform. (see fig. 1, Output, and also sect IV, pg. 4. Mention evaluating the enhanced speech)
Jose does not explicitly disclose recognizing, using a neural network, whether the audio waveform needs to be enhanced upon determining that the audio waveform needs to be enhanced.
Anderson discloses: recognizing, using a neural network, whether the audio waveform needs to be enhanced upon determining that the audio waveform needs to be enhanced: ([0077] The estimate of the user's intelligibility of the speech components is provided by a first deep neural network which has been trained in a supervised procedure with predefined time segments comprising speech components and/or noise components and corresponding measured speech intelligibilities. The training is conducted under a constraint of minimizing a cost function.) Also see para 0072-0076 where the system is described used the output of one neural network to guide the training and operation of the other for audio enhancement. The first neural network determines or estimates the level of intelligibility and the second neural network use this estimate to learn the optimal way to provide an enhanced signal.
Jose and Anderson are considered analogous art.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Jose to combine the teaching of Anderson for the above mentioned features, because the system described can improve quality of life by enhancing the user’s listening experience (Anderson, [0072-0076]).
Jose/Anderson does not explicitly disclose training involving a plurality of different audio codecs, plurality of different audio codecs, wherein the neural network is configured to compensate for losses due to compression aggregated across the plurality of different audio codecs.
Serra in the related art discloses: trained on: a plurality of different audio codecs, training data set comprising: plurality of different audio codecs, wherein the neural network is configured to compensate for losses due to compression aggregated across the plurality of different audio codecs. ([0154] …The third data set is TCD-VoIP, which consists of 384 recordings and 0.7 h of audio, featuring a number of VoIP degradations. Another data set that we use is the JND data set, which consists of 20,797 pairs of recordings and 28 h of audio. More details for the training set can be found for example in section B of the enclosed appendix. For the programmatic generation of data, the present disclosure generally uses a pool of internal and public data sets, and generates 70,000 quadruples conforming 78 h audio. Further, a total of 37 possible degradations are employed, including additive background noise, hum noise, clipping, sound effects, packet losses, phase distortions, and a number of audio codecs (more details can be found for example in section C of the enclosed appendix). The present disclosure is then compared with ITU-P563, two approaches based on feature losses, one using JND (FL-JND) and another one using PASE (FL-PASE), SRMR, AutoMOS, Quality-Net, WEnets, CNN-ELM, and NISQA. For evaluation purpose, some of them have been re-implemented to fit the training and evaluation pipelines of the present disclosure and have been adapted to work at 48 kHz, if needed/possible.) 
Jose/Anderson/Serra are considered analogous art.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Jose/Anderson to combine the teaching of Serra for the above-mentioned features, because the system described can ensures the model learns to identify specific perceptual issues rather than just generic quality loss (Serra, [0154]).

Regarding Claim 2, Jose/Anderson/Serra discloses: 2. The computer-implemented method of claim 1, wherein the neural network is a symmetric encoder-decoder network with skip connections. ([sect III, pg 2 right col.] The decoder takes the latent code and generates speech with a higher quality and sampling rate. The U-Net architecture includes skip connections for each corresponding level of hierarchy in the encoder and decoder. This reframes the problem of generating enhanced speech samples from the latent vector, into the problem of modifying coded speech samples into enhanced speech samples. The skip connections also allow the model to transfer features common to both the low-quality and high-quality speech.) Also see fig. 1, which is reproduced below for easy reference.

    PNG
    media_image1.png
    311
    921
    media_image1.png
    Greyscale


Regarding Claim 3, Jose/Anderson/Serra discloses: 3. The computer-implemented method of claim 1, further comprising: initially training the neural network based on the ground truth sample and the training dataset. ([sect III, pg 2 right col.]Dataset: High-quality speech samples were obtained from the VCTK dataset. These high-quality speech utterances originally have a sampling rate of 48 kHz. We down sampled this to 16 kHz .wav PCM files and used this as the reference ground truth speech. ([sect III, pg 2 right col.] B Model Architecture: given 𝑿 coded speech samples and 𝒀 high-quality ground truth speech samples, we would like to learn a transformation function 𝑝𝜽(𝑿) that outputs 𝒀̂ predicted speech samples. 𝑝𝜽 is parameterized by non-linear weights and biases 𝜽:))

Regarding Claim 4, Jose/Anderson/Serra discloses: 4. The computer-implemented method of claim 3, wherein the initial training of the neural network is performed on one or more of adaptive multi-rate narrowband (AMR-NB), adaptive multi-rate wideband (AMR-WB), Voice over Internet Protocol (VoIP), or Enhanced Voice Services (EVS) codecs. ([sect I, pg. 1, right col.] The objective of this paper is to find ways of improving voice call quality encoded using AMR. In this paper, we design and train a convolutional neural network model which performs artificial bandwidth expansion and speech enhancement on AMR-coded speech.) Also see Conclusion, pg. 5 The model enhances narrowband speech encoded using Adaptive Multi-Rate (AMR) speech coder. [The claim only required one of the features recited]

Regarding Claim 6, Jose/Anderson/Serra discloses: 6. The computer-implemented method of claim 1, wherein the enhanced version of the audio waveform comprises a waveform with an audio frequency range that was removed during compression by the audio encoder. ([sect III, pg 2 right col.] B Model Architecture: We posed this problem as a multivariate regression task: given 𝑿 coded speech samples and 𝒀 high-quality ground truth speech samples, we would like to learn a transformation function 𝑝𝜽(𝑿) that outputs 𝒀̂ predicted speech samples. 𝑝𝜽 is parameterized by non-linear weights and biases 𝜽:) [coded speech gone through compression process, resulting in loss of quality, which includes high frequency components that is removed to reduce file size or bitrate. The goal or objective of the transformation function is to reverse the compression process or decompress, recovering or generating the missing high frequency components. The predicted output is the bandwidth extended version of the coded speech.]

Regarding Claim 7, Jose/Anderson/Serra discloses: 7. The computer-implemented method of claim 1, wherein the enhanced version of the audio waveform comprises a waveform with a reduced number of one or more speech artifacts that were introduced during the compression by the audio encoder. ([sect 3, pg. 3, right col.] We used a combination of time-domain L2 loss and frequency-domain L2 loss for encouraging AMRConvNet to converge. Mean-squared error (MSE) on the time-domain signals encourage AMRConvNet to regress the original higher frequency and undistorted time-domain signal. This time domain MSE is termed reconstruction loss, as we are trying to reconstruct the original time-domain signals.) [Compressing a signal introduces distortion or artifacts to the original, undistorted version. By training a model to reconstruct the original signal, the model learns to identify and remove those distortions. The minimization of the reconstruction loss is the process by which the artifacts are reduced.]

Regarding claim 8, Jose/Anderson/Serra discloses: 8.The computer-implemented method of claim 1, wherein the enhanced version of the audio waveform comprises a waveform with a reduced amount of signal noise, and wherein the signal noise was introduced during the compression by the audio encoder. ([sect 3, pg. 3, right col.] We used a combination of time-domain L2 loss and frequency-domain L2 loss for encouraging AMRConvNet to converge. Mean-squared error (MSE) on the time-domain signals encourage AMRConvNet to regress the original higher frequency and undistorted time-domain signal. This time domain MSE is termed reconstruction loss, as we are trying to reconstruct the original time-domain signals.) [Minimizing reconstruction loss is how the model learns to remove the noise occurred during the compression and output a enhanced audio]

Regarding Claim 9, Jose/Anderson/Serra discloses: 9. The computer-implemented method of claim 1, wherein the audio waveform comprises audio at a first frequency bandwidth, and the enhanced version of the audio waveform comprises a second frequency bandwidth greater than the first frequency bandwidth. ([sect III, pg 2 right col.] B Model Architecture: We posed this problem as a multivariate regression task: given 𝑿 coded speech samples and 𝒀 high-quality ground truth speech samples, we would like to learn a transformation function 𝑝𝜽(𝑿) that outputs 𝒀̂ predicted speech samples. 𝑝𝜽 is parameterized by non-linear weights and biases 𝜽: ) [coded speech gone through compression process, resulting in loss of quality, which includes high frequency components that is removed to reduce file size or bitrate. The goal or objective of the transformation function is to reverse the compression process or decompress, recovering or generating the missing high frequency components. The predicted output is the bandwidth extended version of the coded speech.]

Regarding Claim 10, Jose/Anderson/Serra discloses: 10. The computer-implemented method of claims 1, wherein the audio waveform comprises one or more frequency bandwidths, and the enhanced version of the audio waveform comprises enhanced audio content in at least one frequency bandwidth of the one or more frequency bandwidths. ([sect III, pg 2 right col.] B Model Architecture: We posed this problem as a multivariate regression task: given 𝑿 coded speech samples and 𝒀 high-quality ground truth speech samples, we would like to learn a transformation function 𝑝𝜽(𝑿) that outputs 𝒀̂ predicted speech samples. 𝑝𝜽 is parameterized by non-linear weights and biases 𝜽: ) [coded speech gone through compression process, resulting in loss of quality, which includes high frequency components that is removed to reduce file size or bitrate. The goal or objective of the transformation function is to reverse the compression process or decompress, recovering or generating the missing high frequency components. The predicted output is the bandwidth extended version of the coded speech.]


Regarding Claim 11, Jose/Anderson/Serra discloses: all the elements of claim 1.
Anderson further discloses: adjusting the enhanced version of the audio waveform based on a user profile. ([0023] Thereby the first and second algorithms may be optimized to a particular hearing profile.)
Where the rationale for the combination would be similar to the one already provided.

Regarding Claim 13, Jose/Anderson/Serra discloses: 13. The computer-implemented method of claim 1, wherein the predicting of the enhanced version of the audio waveform further comprises: obtaining a trained neural network at the computing device; ([sect 4, pg. 4 right col.] trained AMRConvNet and evaluated its performance with multiple speakers and on the multiple bitrates supported by AMR.)
and applying the trained neural network as obtained to the predicting of the enhanced version of the audio waveform. ([sect 4, pg. 4 right col.] We also checked the quality of the enhanced speech by inspecting their magnitude spectrograms generated using short time Fourier transform (STFT). Most samples show an extension of the frequency content of the speech waveforms beyond the original AMR coded speech (Fig. 4). We can infer that the model was able to recognize patterns from the lowfrequency coded speech and mapped them to the higher frequency speech, since the waveforms look similar in structure to the ground truth speech.)

Regarding Claim 16, Jose/Anderson/Serra discloses: 16. The computer-implemented method of claim 3, wherein the training dataset comprises decoded audio waveforms that are decoded after transmission over one or more communications networks. ([sect I, pg. 1, right col.] mentions phone calls and various speech encoding standards used to transmit speech by coded audio data frames in cellphone systems) [the speech coding standards imply transmission over network]

Regarding Claim 17, Jose discloses: 17. A computing device, comprising: a communications network interface; an audio output component; and one or more processors operable to perform operations, the operations comprising: ([the disclosure is about speech enhancement using Convolutional Neural Network to enhance encoded speech in phone calls, it would implied the use of a computer device(which would contain one or more processors) and communication network interface for transmission of voice, an audio output component to play enhance audio. Also see fig. 1)
As for the rest of the claim, they recite the elements of Claim 1, and therefore the rationale applied in the rejection of claim 1 is equally applicable.  
Claim 18 is a computer device claim that correspond to claim 3 and is rejected under similar rationale.
Claim 19 is a computer device claim that correspond to claim 4 and is rejected under similar rationale.
Claim 20 is a computer device claim that correspond to claim 9 and is rejected under similar rationale.

Regarding Claim 21, Jose discloses: 21. An article of manufacture comprising one or more computer readable media having computer-readable instructions stored thereon that, when executed by one or more processors of a computing device, cause the computing device to carry out functions comprising: (the disclosure is about speech enhancement using Convolutional Neural Network to enhance encoded speech in phone calls, it would implied the use of a computer device(which would contain computer readable media having computer-readable instructions stored) See fig. 1.)
As for the rest of the claim, they recite the elements of Claim 1, and therefore the rationale applied in the rejection of claim 1 is equally applicable.  

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Jose/Anderson/Serra, and furthermore in view Applicant supplied prior art of Zhao, Z., Liu, H., & Fingscheidt, T. (2018). Convolutional neural networks to enhance coded speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(4), 663-678.
Regarding Claim 5, Jose/Anderson/Serra discloses: all the elements of claim 1.
Jose/Anderson/Serra does not explicitly disclose an exponential linear unit (ELU) function.
Zhao (in the same field of using CNN for enhancing coded speech) discloses: wherein the neural network utilizes an exponential linear unit (ELU) function. ([sect V, pg. 8 left col] mentions using a scaled exponential linear unit (SELU)
Jose/Anderson/Serra/Zhao are considered analogous art.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of system of teaching with the teaching of Zhao for the above mentioned features, because the technique used provides significant speech quality improvements for low quality coded speech without modifying existing codecs (Zhao, [Conclusion]).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Jose/Anderson/Serra, and furthermore in view Siami (US 20150269953).
Regarding Claim 12, Jose in view of Anderson/Serra discloses: 12. The computer-implemented method of claim 11, 
Jose/Anderson/Serra does not explicitly disclose the following feature.
Siami further discloses: further comprising: receiving, via a display component of the computing device, a user indication of the user profile. ([0126] FIG. 6 is a flow diagram that illustrates the DSE Telephony Profile Setup Method--Online. In the first operation, corresponding to box 602, the User obtains a hearing test from a third party, e.g. an Audiologist, hearing test application, or equivalent. Next, at box 604, the User signs in to an account on a network, such as a Web site or a web-based application or equivalent interface, and selects a Profile to set up or to update. In the last operation, at box 606, the User enters the hearing test results in an on-line form and/or directly adjusts an Audiogram chart, or submits the data by other means.)
Jose/Anderson/Serra/Siami are considered analogous art.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of teachings to combine with the teaching of Siami for the above mentioned features, because the display and user interface would allow user convenient way to input their hearing test data to set up a personalized telephony profile (Siami, [0126]).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Jose/Anderson/Serra, and furthermore in view Busch (US 20210224649).
Regarding Claim 14, Jose/Anderson/Serra discloses: 14. The computer-implemented method of claim 3, 
Jose/Anderson/Serra does not explicitly disclose initial training of the neural network comprises training the neural network at the computing device.
Busch (in the related field of neural network computing, an apparatus and a method for augmenting neural network training with locally captured training data) discloses: wherein the initial training of the neural network comprises training the neural network at the computing device. ([0062] the initial training of the neural network can be accomplished upon initialization of the computing device by the user and/or via an initial training session. Often, the initial neural network training is done to give the computing device a base level of functionality that can be further refined in response to interactions with the user.)
Jose/Anderson/Serra/Busch are considered analogous art.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of teachings to combine with the teaching of Busch for the above mentioned features, because the initial training on local device may enhance privacy and security as well as providing faster response (Busch, [0062]).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Jose/Anderson/Serra, and furthermore in view Gruenstein (US 20160379113).
Regarding Claim 15, Jose/Anderson/Serra discloses: 15. The computer-implemented method of claim 1, 
Jose/Anderson/Serra does not explicitly disclose the neural network is a pre-processing network for a second neural network.
Gruenstein (in the related field of training neural network) discloses:
wherein the neural network is a pre-processing network for a second neural network. ([0010] In some implementations, the use of a coarse deep neural network for an initial analysis of feature vectors and a deep neural network for a second analysis of the feature vectors that the coarse deep neural network indicates meet a threshold level of relevance may reduce central processing unit (CPU) usage, power consumption, and/or network bandwidth usage.)
Jose/Anderson/Serra/Gruenstein are considered analogous art.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of teachings to combine with the teaching of Gruenstein for the above mentioned features, because the method may reduce central processing unit (CPU) usage, power consumption, and/or network bandwidth usage (Gruenstein, [0010]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
	Xiao US 20190172479 – disclose method for evaluating voice quality. See Abstract, and para 0015, 0024, 0030, 0040, 0049-0052 and fig. 1a, 1b and 1c for additional details.
	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Phillip H Lam whose telephone number is (571)272-1721. The examiner can normally be reached 9 AM-3 PM Pacific Time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/PHILIP H LAM/            Examiner, Art Unit 2656
Read full office action
Prosecution Timeline

Sep 05, 2023
Application Filed
Oct 08, 2025
Non-Final Rejection mailed — §103
Nov 24, 2025
Response Filed
Dec 12, 2025
Final Rejection mailed — §103
Feb 12, 2026
Request for Continued Examination
Feb 23, 2026
Response after Non-Final Action
Mar 17, 2026
Non-Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/509,678
Patent 12639529
ENHANCING LARGE LANGUAGE MODELS USING IN-CONTEXT LEARNING AND ONLINE KNOWLEDGE
2y 6m to grant Granted May 26, 2026
18/669,069
Patent 12620396
ELECTRONIC DEVICE AND CONTROLLING METHOD OF ELECTRONIC DEVICE
1y 11m to grant Granted May 05, 2026
17/643,239
Patent 12591626
SEARCH STRING ENHANCEMENT
4y 3m to grant Granted Mar 31, 2026
18/329,990
Patent 12572735
DOMAIN-SPECIFIC DOCUMENT VALIDATION
2y 9m to grant Granted Mar 10, 2026
18/377,570
Patent 12572747
MULTI-TURN DIALOGUE RESPONSE GENERATION WITH AUTOREGRESSIVE TRANSFORMER MODELS
2y 5m to grant Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
84%
Grant Probability
99%
With Interview (+47.7%)
2y 6m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 143 resolved cases by this examiner. Grant probability derived from career allowance rate.