Prosecution Insights
Last updated: May 29, 2026
Application No. 18/360,981

SELECTIVE PROCESSING OF SEGMENTS OF TIME-SERIES DATA BASED ON SEGMENT CLASSIFICATION

Non-Final OA §103
Filed: Jul 28, 2023
Examiner: SIRJANI, FARIBA
Art Unit: 2659
Tech Center: 2600 (Communications)
Assignee: Qualcomm Incorporated
OA Round: 3 (Non-Final)
Grant Probability: 76% (Favorable)
OA Rounds: 3-4
Est. Remaining: 0m
With Interview: 99%

Examiner Intelligence

Career Allowance Rate: 76% (above average; 419 granted / 554 resolved; +13.6% vs TC avg)
Interview Lift: +31.5% (strong; based on resolved cases with interview)
Typical timeline: 2y 9m avg prosecution; 19 currently pending
Career history: 580 total applications across all art units

Statute-Specific Performance

§101: 1.5% (-38.5% vs TC avg)
§103: 91.0% (+51.0% vs TC avg)
§102: 3.9% (-36.1% vs TC avg)
§112: 1.3% (-38.7% vs TC avg)
TC avg = Tech Center average estimate. Based on career data from 554 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claims 1-30 are pending. Claims 1, 15, 28, and 30 are independent and are amended. Some of the dependent Claims have also been amended to specify the input. This Application was published as U.S. 20250037734. Apparent priority: 28 July 2023. Applicant’s arguments have been considered but are either unpersuasive or moot in view of the new grounds of rejection.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 2/9/2026 has been entered.

Response to Arguments

Amendments and arguments are moot in view of the new grounds of rejection. Claim 1 provides as follows, and the other independent Claims are amended similarly.

1. A device comprising: a memory configured to store one or more segments of time-series data; and one or more processors configured to: generate, using a feature extractor, a latent-space representation of a segment of the time-series data, a dimensionality of the latent space representation reduced from a dimensionality of the segment; provide one or more classifier inputs to a classifier, the one or more classifier inputs including at least one classifier input based on the latent-space representation; and generate, based on output of the classifier, a processing control signal for the segment.
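For readers less familiar with the claimed architecture, the data flow recited in Claim 1 (segment, feature extractor producing a lower-dimensional representation, classifier, processing control signal) can be sketched as follows. This is a minimal illustration under assumed details only: the two hand-picked features (log energy and zero-crossing rate) stand in for a latent-space representation, and the logistic weights, the threshold, and the names `extract_latent`, `classify`, and `control_signal` are all hypothetical. None of them is taken from the application, Atti, or Sorensen.

```python
import math
from typing import List, Sequence

# Hypothetical sketch of the Claim 1 data flow:
# segment -> feature extractor (reduced dimensionality) -> classifier -> control signal.

def extract_latent(segment: Sequence[float]) -> List[float]:
    """Map an N-sample segment to 2 values (log energy, zero-crossing rate),
    so the representation has lower dimensionality than the segment."""
    energy = sum(x * x for x in segment) / len(segment)
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(segment) - 1)
    return [math.log10(energy + 1e-12), zcr]

def classify(classifier_inputs: Sequence[float]) -> float:
    """Toy logistic classifier with arbitrary placeholder weights.
    Returns a score in (0, 1) for a hypothetical target signal class."""
    log_energy, zcr = classifier_inputs
    z = 2.0 * log_energy - 8.0 * zcr + 3.0
    return 1.0 / (1.0 + math.exp(-z))

def control_signal(score: float, threshold: float = 0.5) -> str:
    """Processing control signal for the segment, e.g. which coder to route it to."""
    return "first_encoder" if score >= threshold else "second_encoder"
```

As a usage example, a 160-sample low-frequency tone such as `[0.5 * math.sin(2 * math.pi * k / 80) for k in range(160)]` is routed to `"first_encoder"`, while a low-level alternating signal `[0.05 * (-1) ** k for k in range(160)]` is routed to `"second_encoder"`. In this toy, the reduced dimensionality comes from hand-crafted features rather than a learned bottleneck.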
The Interview Summary of 12/16/2025 includes:
[Image from the Interview Summary not reproduced]

As provided as part of the response to the arguments in the Final Rejection of 11/6/2025, one emblematic characteristic of a latent space is the reduction of dimensionality to those features that are of interest to a particular task. As provided in the Advisory Action of 12/23/2025, the distinguishing aspect of this Application may be that the output of the bottleneck layer is input directly to the classifier without having to go through the decoder stage. However, the Claim is still too broadly stated to convey a direct and uninterrupted input of the reduced-dimension output of the bottleneck layer to the classifier. Even in this respect (which is not included in the Claim), see the following:

Cho (US 20250239367):
[Figure from Cho not reproduced]
[0012] Furthermore, the autoencoder may include an encoder that maps the input data into a latent space dimension to output the compressed data toward a bottleneck layer, and a decoder that reconstructs the compressed data of the bottleneck layer into the input data, wherein the classification layer is configured with a multi-layer perceptron structure connected to the bottleneck layer to predict whether the disease of interest has developed through supervised learning that inputs compressed data of the bottleneck layer and outputs the correct answer data.

Ceccaldi (U.S. 20190046068):
[0048] … The encoder network 401 may generate features maps at each level including the output latent space 431. The features data (e.g. maps, latent space 431) is input into the discriminator network 411 which attempts to discern from which domain the features maps were generated from. The discriminator network 411 classifies feature data as either from one domain or another and then provides feedback for the encoder network 401.
… [Two further figures not reproduced]

Please also refer to the Advisory Action of 12/23/2025 and the Interview Summary of 12/16/2025. In response to the arguments by the Applicant (Response 8-9), the amendments may overcome Atti, but that does not make the Claim allowable. As indicated in the Interview Summary, Sorensen teaches this limitation. Further, as provided at length in the previous Office action, and considering the references included in the Conclusion section of this Office action, reduced dimensionality is a known defining characteristic of a latent space. In amended Claim 1, the first limitation adds a definition of latent space, and the second limitation provides classifier inputs to the classifier that are, broadly, in some way based on the latent-space representation of the input. The previous mapping to Atti teaches the Claim; the only thing it lacked was the definition of the latent space.

Patentability of the other independent Claims is argued based on their similarity to Claim 1. Accordingly, the above provides a reply to those arguments as well. Patentability of the dependent Claims is argued based on their dependence from their base independent Claims. Accordingly, the above provides a reply to those arguments as well.

35 U.S.C. 112(f) Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: the various “means for” limitations in Claim 30. These limitations are generic in the context of the art and do not refer to any specific structure; they only serve as placeholders for the structure that performs the associated function(s) without providing any information about what that structure is.

MPEP 2181 I A says: For a term to be considered a substitute for "means," and lack sufficient structure for performing the function, it must serve as a generic placeholder and thus not limit the scope of the claim to any specific manner or structure for performing the claimed function. It is important to remember that there are no absolutes in the determination of terms used as a substitute for "means" that serve as generic placeholders. The examiner must carefully consider the term in light of the specification and the commonly accepted meaning in the technological art. Every application will turn on its own facts.
Based on the ordinary skill in the art and the description of the functions of these components in the Specification, they refer to processors or a combination of processor and memory, as provided in the counterpart system Claim 1, which includes similar limitations only without the “means-plus-function” language.

PLEASE NOTE: This is NOT a rejection. Please don’t address it as a rejection. If the Applicant does not agree with the INTERPRETATION, he may argue or amend to replace the terms interpreted under 112(f) with structural terms such as “memory” or “processor” as appropriately supported by the Specification. In the alternative, he may let the interpretation stand if the intent was to include a means-plus-function limitation in the Claim.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5-6, 9-16, 19-20, 22-26, and 28-30 are rejected under 35 U.S.C. 103 as being unpatentable over Atti (U.S. 20160293175) in view of Sorensen (U.S. 20240374136).

Regarding Claim 1, Atti teaches:
1. A device comprising: a memory configured to store one or more segments of time-series data; and [Atti, Figure 7, “memory 732.”] one or more processors configured to: [Atti, Figure 7, “processor 706.”] generate, using a feature extractor, a latent-space representation of a segment of the time-series data, [Atti, Figure 1, “input speech 110” / “time-series data” comes in, is divided into “audio frames 112” / “segments,” and is input to the “selector 120,” which is expanded in Figure 2. Figure 2 shows that the “input frame” / “segment” is fed into a “short term feature extraction 226” / “feature extractor.”] a dimensionality of the latent space representation reduced from a dimensionality of the segment; [The definition of latent space generally includes a reduction in dimensionality, but Atti does not expressly state this.] provide one or more classifier inputs to a classifier, the one or more classifier inputs including at least one classifier input based on the latent-space representation; and [Atti, Figure 1, the “selector 120” includes a “first classifier 122.” Figure 2 shows that the output of the “feature extraction 226” is fed to a “model based classifier” / “first classifier 122,” and the output of the first classifier goes into a second “open loop classifier” / “second classifier 124.” “Updated classification decision 248” is output.] generate, based on output of the classifier, a processing control signal for the segment. [Atti, Figure 1, the output of the “selector 120,” which includes the first and second classifiers (122, 124), goes to a “switch 130” and teaches the “processing control signal” of the Claim because it controls the switch.]

The definition of latent space generally includes a reduction in dimensionality, but Atti does not expressly state this. Sorensen teaches: generate, using a feature extractor, a latent-space representation of a segment of the time-series data, a dimensionality of the latent space representation reduced from a dimensionality of the segment; [Sorensen provides a link between latent space and reduction in dimensionality: “[0215] The variational autoencoder is a unsupervised generative model, that consists of two neural networks, an inference model, the encoder and a generative model, the decoder. The encoder maps the input sample into a lower dimensional latent variable, which the decoder maps into a reconstruction of the input sample….”]

Atti and Sorensen pertain to classification of signals, and it would have been obvious to apply the definition of latent space from Sorensen to Atti. Atti includes a reference to latent space, and Sorensen is added only to provide an express indication that latent space refers to a reduction in dimensionality. This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 2, Atti teaches:
2. The device of claim 1, wherein the classifier is a one-class classifier or a binary classifier and the output indicates whether the segment is assigned to a target signal class. [Atti, Figure 1 shows a one-class classifier that determines whether the frame/segment is speech or not. “A device includes a first classifier and a second classifier coupled to the first classifier. The first classifier is configured to output first decision data that indicates a classification of an audio frame as a speech frame or a non-speech frame, the first decision data determined based on first probability data associated with a first likelihood of the audio frame being the speech frame and based on second probability data associated with a second likelihood of the audio frame being the non-speech frame….” Abstract. Figure 4 shows a binary classifier that determines whether the signal is speech or music.]

Regarding Claim 5, Atti teaches:
5. The device of claim 1, wherein the time-series data represents audio content, and [Atti, Figure 1, the time-series data is “input speech 110.”] wherein the output indicates whether the segment includes an audio data type associated with a first audio encoder.
[Atti, “output of the classifier” is taught by the “Second decision data (e.g., first encoder or second encoder) 148” in Figure 1 or the “updated classification decision 248” of Figure 2, which determine whether the first or the second encoder is to be used.]

Regarding Claim 6, Atti teaches:
6. The device of claim 1, wherein the one or more processors are configured to selectively route the segment to one of two or more audio coders based on the processing control signal. [Atti, Figure 1, the “switch 130” routes the frame/segment to the first encoder 132 or the second encoder 134 based on the control signal determined by the classification. “[0037] The switch 130 is coupled to the selector 120 and may be configured to receive the second decision data 148. The switch 130 may be configured to select the first encoder 132 or the second encoder 134 according to the second decision data 148. The switch 130 may be configured to provide the audio frame 112 to the first encoder 132 or the second encoder 134 according to (e.g., based on) the second decision data 148. In other implementations, the switch 130 provides or routes a signal to a selected encoder to activate or enable an output of the selected encoder.”]

Claims 9-14, 22-26, and 29 pertain to the use of Variational Autoencoders for performing the classification and are directed to standard aspects of VAEs.

Regarding Claim 9, Atti does not teach the use of a neural network for feature extraction. Sorensen teaches:
9. The device of claim 1, wherein the feature extractor includes an inference network portion and generation network portion of an autoencoder. [Sorensen uses a variational autoencoder (VAE) as its signal classifier, and a VAE by definition includes an inference network/engine/model and a generation network/engine/model as its encoder and decoder, respectively. Figure 26 shows the Inference Model Q and the Generative Model P. Figure 39 shows the Feature Extraction from Training Data and Testing Data for input to the training and classification parts of the model. “[0215] The variational autoencoder is a unsupervised generative model, that consists of two neural networks, an inference model, the encoder and a generative model, the decoder. The encoder maps the input sample into a lower dimensional latent variable, which the decoder maps into a reconstruction of the input sample. The variational autoencoder builds upon probability theory and Bayes' rule. In the variational autoencoder the inference model is defined as q_ϕ(z|x) and the generative model as p(x|z). By including the label variable, y into the model, a semi-supervised generative probabilistic model can be achieved….” “[0174] … The feature input to the model was extracted from time series of four vital sign parameters HR, RR, SpO2 and sysBP, from where clinical deterioration events were extracted as trends in the data time series. However, the model could equally well have been trained based on features selected from one or more of the specific clinical deterioration events disclosed herein. …” “[0234] 2) Feature extraction: Selection of discriminative features is normally important for the prediction of SAE. One or more clinical deterioration events are often preceded with SAE and can be extracted from vital signs as demonstrated in the presently disclosed approach….”]

Atti and Sorensen pertain to classification of signals, and it would have been obvious to replace the classifier of Atti with the variational autoencoder of Sorensen as a newer and more effective method. This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
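The VAE aspects relied on for Claims 9, 10, and 13 (an inference network that outputs the mean and standard deviation of a latent Gaussian, the reparameterization trick quoted from Sorensen [0216], a generation network, and a reconstruction error used as an additional classifier input) can be illustrated with the minimal sketch below. It is a hypothetical, untrained toy: the fixed placeholder weights, the 4-sample input, the 2-dimensional latent, and all function names are assumptions made for illustration and are not drawn from Sorensen or the application. The classifier inputs are taken from the encoder (bottleneck) output directly; the decoder is used only to compute the reconstruction error.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Placeholder weights standing in for trained inference/generation networks (hypothetical).
W_MU: Matrix = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]   # 4-D input -> 2-D latent mean
LOGVAR_BIAS = [-2.0, -2.0]                                       # constant log-variance, sigma = e^-1
W_DEC: Matrix = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # 2-D latent -> 4-D output

def inference_network(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Encoder: returns the mean and standard deviation of a 2-D Gaussian latent."""
    mean = matvec(W_MU, x)
    std = [math.exp(0.5 * lv) for lv in LOGVAR_BIAS]
    return mean, std

def sample_latent(mean: Sequence[float], std: Sequence[float], rng: random.Random) -> List[float]:
    """Reparameterization trick, as used during training: z = mu + sigma * eps, eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]

def generation_network(z: Sequence[float]) -> List[float]:
    """Decoder: maps a latent vector back to a 4-sample synthesized segment."""
    return matvec(W_DEC, z)

def classifier_inputs(x: Sequence[float]) -> List[float]:
    """Latent mean and std plus a mean-squared reconstruction error (cf. Claims 10 and 13)."""
    mean, std = inference_network(x)
    x_hat = generation_network(mean)  # decode the mean for a deterministic error value
    err = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return mean + std + [err]
```

For the segment `[1.0, -1.0, 2.0, 2.0]`, `classifier_inputs` returns the latent mean `[0.0, 2.0]`, a standard deviation of e^-1 in each dimension, and a reconstruction error of 0.5; a segment the toy decoder can reproduce exactly, such as `[1.0, 1.0, 2.0, 2.0]`, gives an error of 0.0. In a real VAE the weights would be learned, and the error would typically be lower for segments from the class the autoencoder was trained to reproduce.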
Regarding Claim 10, Atti teaches that the classification decision data is generated based on probability data: “[0027] In some implementations, the first classifier may be associated with a maximum-likelihood algorithm (e.g., based on Gaussian mixture models, based on hidden Markov models, or based on neural networks). To generate the first decision data, the first classifier may generate one or more probability values, such as a first probability value (e.g., first probability data) associated with a first likelihood of the audio frame being the speech frame, a second probability value (e.g., second probability data), associated with a second likelihood of the audio frame being the non-speech frame, or a combination thereof. The first classifier may include a state machine that receives the first probability data, the second probability data, or a combination thereof, and that generates the first decision data. The first decision data may be output by the state machine and received by the second classifier.” However, Atti does not teach that the probability values are based on a distribution including a mean and a standard deviation (SD). Sorensen teaches:
10. The device of claim 9, wherein the autoencoder is a variational autoencoder and the latent-space representation includes a mean and a standard deviation of a probability distribution. [Sorensen is directed to the use of a variational autoencoder (VAE). VAEs use statistical distributions, and the mean and SD are parameters of a Gaussian (normal) distribution and of some other types of distributions. “[0240] Random forest with 200 trees was applied for estimation of DBP and SBP, but other models can be used. First, 3 hours' time series of HR, RR, SpO2 and PR before BP measurements were extracted, from which descriptive statistics such as mean, standard deviation and range were calculated as features. Then, the regression model was trained with first day's data of each patient. The trained model was tested by the following days' data. The mean absolute error (MAE) and standard deviation (STD) of the error were used for evaluating estimation performance.” “[0263] For the model, the priors for the parameters were kept uninformative and given by normal distributions. As the intercept, α, is used as a global baseline, values for this were chosen to reflect common baseline values for the heart rate and respiration rate. For heart rate the mean was set to 70 and for respiration rate it was set to 12. All parameters in β had priors set to follow a standard normal distribution….”]

Rationale as provided for Claim 9. These are aspects of any variational autoencoder.

Regarding Claim 11, Atti does not teach the VAE. Sorensen teaches:
11. The device of claim 9, wherein the autoencoder is trained to reproduce data segments from a target signal class, and the classifier is configured to distinguish the data segments from the target signal class and data segments that are not from the target signal class based on separation of latent-space representations between the data segments from the target signal class and the data segments that are not from the target signal class. [Sorensen, Figure 30, the “training data” are provided to “model training …” in order to make the “classifier” classify the test and input data into SAE or not SAE. SAE = serious adverse effect, which is the “target signal class” of the Claim. “[0232] Prediction of SAE can be seen as a classification problem aiming to classify “SAE” versus “no SAE” over a time period (prediction window), e.g. few hours, based on last recordings (observation window). The prediction window was chosen to be two hours and the observation window was chosen to be ten hours as shown in FIG. 31A. In this study samples of SAEs resulting from neurologic, respiratory, circulatory, infectious and other complications were extracted from the patients' database. These extracted SAEs' samples were regarded as “SAE class”….” Figure 25 shows the separation of the latent space.
Figure 28 shows the division of the “latent distribution” along principal components and the division of the data into AF and Non-AF. “[0212] The input to the model is a 10 second segment from single lead ECG. The classification model used after completed training of the model includes the encoder (cf. FIG. 25) and the classifier. The output of the classification model is the probability that the ECG signal from the input segment is showing atrial fibrillation rhythm. The Latent space and the Decoder (cf. FIG. 25), is only used for training the unsupervised part of the model.” “[0226] … The input segment and corresponding reconstruction of chosen samples is shown in FIG. 28, and the distribution of the samples for the test set in the latent space is shown in FIG. 29….”]

Rationale as provided for Claim 9. This Claim expands on a feature of the VAEs.

Regarding Claim 12, Atti teaches:
12. The device of claim 9, wherein the autoencoder is trained to reproduce speech data and the classifier is configured to distinguish audio data segments that include speech from audio data segments that do not include speech. [Atti, Figure 1, the “first classifier 122” generates the “first probability data (e.g. speech) 142,” the “second probability data (e.g. non-speech) 144,” and the “first decision data (e.g. speech or non-speech) 146.” The “switch 130” separates “speech” from “non-speech” and sends each to its own encoder (132, 134).]

Regarding Claim 13, Atti does not teach the VAE or its aspects. Sorensen teaches:
13. The device of claim 9, wherein the one or more processors are configured to: provide generation network input, based on the latent-space representation, to the generation network portion to generate a synthesized segment of time-series data; and [Sorensen, Figure 25 shows the “deep generative model (DGM).” “Generation network input” is provided to the Encoder / DGM. “[0062] FIG. 25 shows a diagram of the deep generative model (DGM) used in example 1.” “[0063] FIG. 26 shows a diagram of (a) the inference model and (b) the generative model of the proposed network used in example 1.” See [0213].] determine a reconstruction error value based on comparison of the segment and the synthesized segment, [Sorensen, Figure 27 shows the “Input Samples” and the corresponding “Reconstruction.” “[0216] … The reconstruction loss p(x|z; y) is defined as a Gaussian distribution with μ_θ being the reconstruction and σ_θ^2=2.” “[0219] Besides the lower bounds defined in equations (5) and (6), an extra loss was introduced where the standard deviations of the input signal and the reconstructions were subtracted and the absolute value was taken of the difference. This was introduced to help the decoder to make better reconstructions. For the classifier, binary cross-entropy loss was used.” The various “loss” values teach the “error value” of the Claim.] wherein at least one of the one or more classifier inputs provided to the classifier is based on the reconstruction error value. [Sorensen, [0215]-[0220] teach that the reconstruction loss is minimized by maximizing log p(x) (log loss).]

Rationale as provided for Claim 9. This Claim expands on a feature of the VAEs.

Regarding Claim 14, Atti does not teach the VAE or its aspects. Sorensen teaches:
14. The device of claim 9, wherein the one or more processors are configured to: provide generation network input, based on the latent-space representation, to the generation network portion to generate a probability distribution; and [Sorensen, Figure 26 shows the generative model / generation network. “[0063] FIG. 26 shows a diagram of (a) the inference model and (b) the generative model of the proposed network used in example 1.” “[0215] The variational autoencoder is a unsupervised generative model, that consists of two neural networks, an inference model, the encoder and a generative model, the decoder. The encoder maps the input sample into a lower dimensional latent variable, which the decoder maps into a reconstruction of the input sample. The variational autoencoder builds upon probability theory and Bayes' rule….” “[0216] The Gaussian distribution q(z|x; y) is achieved by splitting the last layer of the model into two channels representing the mean, μ_ϕ, and the log variance, log σ_ϕ^2 of the distributions, from which z is sampled using the reparameterization trick….”] determine an error value based on the segment and the probability distribution, wherein at least one of the one or more classifier inputs provided to the classifier is based on the error value. [Sorensen, the goal is to minimize the error/loss between the input (training or test data) and the output prediction generated by the model. The parameters are changed to optimize the loss / minimize the error. “[0217] The objective of optimizing the parameters, θ and ϕ, is to maximize the log-likelihood log p(x)….” See [0215]-[0220].]

Rationale as provided for Claim 9. This Claim expands on a feature of the VAEs.

Claim 15 is a method claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. Claim 16 is a method claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale. Claim 19 is a method claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale. Claim 20 is a method claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale. Claim 22 is a method claim with limitations corresponding to the limitations of Claim 9 and is rejected under similar rationale. Claim 23 is a method claim with limitations corresponding to the limitations of Claim 11 and is rejected under similar rationale.
Claim 24 is a method claim with limitations corresponding to the limitations of Claim 12 and is rejected under similar rationale. Claim 25 is a method claim with limitations corresponding to the limitations of Claim 13 and is rejected under similar rationale. Claim 26 is a method claim with limitations corresponding to the limitations of Claim 14 and is rejected under similar rationale. Claim 28 is a computer program product system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. Additionally, Atti teaches: “[0013] In another particular aspect, a computer-readable storage device storing instructions that, when executed by a processor, cause the processor to perform …” Claim 29 is a computer program product system claim with limitations corresponding to the limitations of Claim 9 and is rejected under similar rationale. Claim 30 is a means plus function system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. The means are processors and memory which were mapped in Claim 1. Claims 3-4 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Atti and Sorensen in view of Ashley (U.S. 20110161087). Regarding Claim 3, Atti teaches: 3. The device of claim 2, wherein the one or more processors are configured to selectively, based on the processing control signal, perform first encoding operations to encode the segment or perform second encoding operations to encode the segment, [Atti, Figure 1 shows that the output of the “switch 130” sends the signal to the “First encoder 132” or the “second encoder 134” based on whether the frame is speech or non-speech. 
“…The second classifier is configured to output second decision data based on the first probability data, the second probability data, and the first decision data, the second decision data includes an indication of a selection of a particular encoder of multiple encoders available to encode the audio frame.” Abstract.] wherein the first encoding operations provide higher quality encoding of the target signal class than do the second encoding operations. Atti’s choice of encoder is based on the classification of the audio frame as speech or non-speech, and Atti does not teach that one of the encoders provides higher quality and the other higher efficiency. Sorensen was cited for the definition of latent space and does not teach this either. Ashley teaches: wherein the first encoding operations provide higher quality encoding of the target signal class than do the second encoding operations. [Ashley classifies the input signal as speech or generic audio and uses a different encoder for each. See Figure 1, “speech encoder 230” and “generic audio encoder 240.” The “speech encoder 230” generates a higher-quality coded signal for speech, and the “generic audio encoder 240” produces better-quality output for non-speech signals. “[0002] Speech coders based on source-filter models are known to have quality problems processing generic audio input signals such as music, tones, background noise, and even reverberant speech. Such codecs include Linear Predictive Coding (LPC) processors like Code Excited Linear Prediction (CELP) coders. Speech coders tend to process speech signals low bit rates. Conversely, generic audio coding systems based on auditory models typically don't process speech signals very well to sensitivities to distortion in human speech coupled with bit rate limitations.
One solution to this problem has been to provide a classifier to determine, on a frame-by-frame basis, whether an input signal is more or less speech like, and then to select the appropriate coder, i.e., a speech or generic audio coder, based on the classification. An audio signal processer capable of processing different signal types is sometimes referred to as a hybrid core codec.”]

Atti/Sorensen and Ashley pertain to encoding audio signals with different encoders according to the characteristics of the signals. It would have been obvious to combine the feature of Ashley, whose encoders produce higher quality or more efficient encoding of the target signal, with the system of Atti/Sorensen, which performs the same speech/non-speech classification but does not elaborate on how the selected encoder provides higher quality and greater efficiency with respect to the target signal. The evident goal of the classification is to select the more suitable encoder. This combination falls under combining prior art elements according to known methods to yield predictable results, or use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 4, Atti teaches: 4. The device of claim 2, wherein the one or more processors are configured to selectively, based on the processing control signal, perform first encoding operations to encode the segment or perform second encoding operations to encode the segment, [Atti, Figure 1 shows that the “switch 130” routes the signal to the “First encoder 132” or the “second encoder 134” based on whether the frame is speech or non-speech.
“…The second classifier is configured to output second decision data based on the first probability data, the second probability data, and the first decision data, the second decision data includes an indication of a selection of a particular encoder of multiple encoders available to encode the audio frame.” Abstract.] wherein the first encoding operations provide more efficient encoding of the target signal class than do the second encoding operations. Atti’s choice of encoder is based on the classification of the audio frame as speech or non-speech, and Atti does not teach that one of the encoders provides higher quality and the other higher efficiency. Sorensen was cited for the definition of latent space and does not teach this either. Ashley teaches: wherein the first encoding operations provide more efficient encoding of the target signal class than do the second encoding operations. [Ashley, “[0002] … Speech coders tend to process speech signals low bit rates. …” A lower bit rate is more efficient.] The rationale for combination is as provided for Claim 3.

Claim 17 is a method claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale. Claim 18 is a method claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.

Claims 7-8 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Atti and Sorensen in view of Breebaart (U.S. 20080281590).

Regarding Claim 7, Atti teaches: 7. The device of claim 1, wherein the segment corresponds to a segment of audio data and a feature extractor input to the feature extractor includes a frequency-domain representation of the segment of audio data. [Atti, Figure 1: the “Audio Frames 112” are obtained by dividing the “input speech.” The “selector 120” is shown in Figure 2 to include short-term and long-term features that include spectral/frequency-domain features.
“[0039] During operation, the input speech 110 may be processed on a frame-by-frame basis, and a set of features may be extracted from the input speech 110 at the encoder 104 (e.g., in the selector 120)….”] Atti mentions frequency bands, but the features of Figure 2 of Atti are not frequency-domain features. Sorensen evaluates frequency content, but its features may or may not be in the frequency domain. Breebaart teaches: wherein the segment corresponds to a segment of audio data and the input to the feature extractor includes a frequency-domain representation of the segment of audio data. [Breebaart divides the signal into frames/segments ([0013]) and extracts both time-domain and frequency-domain features used for classification of audio. “[0057] FIG. 3 illustrates a third embodiment of the invention where features extracted from an input signal contain both time-domain and frequency-domain information. …. For every filter-bank output y[m, k], features f_a[m, k], f_b[m, k] are calculated. The feature type f_a[m, k] in this case can be the power spectral value of its input y[m, k], while the feature type f_b[m, k] is the power spectral value calculated for the previous sample. …” “…The invention further describes a method of classifying an audio input signal (M) into a group, and a method of comparing audio input signals (M, M') to determine a degree of similarity between the audio input signals (M, M'). The invention also describes a system (1) for deriving a set of features (S) of an audio input signal (M), a classifying system (4) for classifying an audio input signal (M) into a group, and a comparison system (5) for comparing audio input signals (M, M') to determine a degree of similarity between the audio input signals (M, M').” Abstract.]
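The frame-then-transform path discussed for Claims 7 and 8 (audio divided into frames, a frequency-domain representation per frame, and a power spectrum such as Breebaart’s “power spectral value”) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Breebaart’s filter bank or the applicant’s feature extractor: `frame_signal` and `power_spectrum` are hypothetical helpers, and a direct O(N²) DFT stands in for the FFT or filter bank a practical system would use.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into fixed-length segments (frames) starting every `hop` samples.

    Hypothetical helper: trailing samples that do not fill a whole frame are dropped.
    """
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Return |X[k]|^2 for k = 0..N/2 using a direct DFT (a stand-in for an FFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        # X[k] = sum_t x[t] * exp(-2*pi*j*k*t/N)
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(x_k) ** 2)
    return spectrum


if __name__ == "__main__":
    # A pure tone completing exactly 4 cycles per 32-sample frame.
    n = 32
    tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(4 * n)]
    frames = frame_signal(tone, frame_len=n, hop=n // 2)
    spec = power_spectrum(frames[0])
    peak_bin = max(range(len(spec)), key=spec.__getitem__)
    print(len(frames), peak_bin)  # prints: 7 4
```

A practical feature extractor would typically also apply a window (for example, a Hann window) before the transform to reduce spectral leakage; it is omitted here to keep the example short.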
Atti/Sorensen and Breebaart pertain to audio classification for further processing according to the class, and it would have been obvious to modify Atti/Sorensen to include the frequency-domain transform before feature extraction in order to operate the classifiers on the frequency-domain features of the input audio. This combination falls under combining prior art elements according to known methods to yield predictable results, or use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 8, Atti teaches: 8. The device of claim 7, wherein the frequency-domain representation includes a power spectrum of the segment of audio data. [Breebaart: one type of extracted feature is the “power spectral value.” “[0057]... The feature type f_a[m, k] in this case can be the power spectral value of its input y[m, k], while the feature type f_b[m, k] is the power spectral value calculated for the previous sample. …” “[0058] In FIG. 4, a simplified block diagram of a system 4 for classification of an audio signal M is shown. ….”] The rationale for combination is as provided for Claim 7.

Claim 21 is a method claim with limitations corresponding to the limitations of Claim 7 with different terminology (frame instead of segment and spectral representation instead of frequency-domain representation) and is rejected under similar rationale.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Amoh (U.S. 20210244377), Figure 3: the sequence “record audio … 404” to “perform FFT 406” to “extract features with e-GRU from time and frequency domain sounds 408” to “classify features to detect candidate events of interest 410” shows that the audio features are extracted after transformation of the audio data into the frequency domain. Baker (U.S. 20190095798) is directed to an improvement on a variational autoencoder (VAE), which is used for classification of signals and includes an inference network and a generation network. “Computer systems and methods generate a stochastic categorical autoencoder learning network (SCAN). The SCAN is trained to have an encoder network that outputs, subject to one or more constraints, parameters for parametric probability distributions of sample random variables from input data. The parameters comprise measures of central tendency and measures of dispersion….” Abstract. “[0002] Two methods, Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), have emerged as leading techniques for generative modeling within artificial intelligence and deep learning. Generative modeling is an active and growing subfield within unsupervised and semi-supervised machine learning. The goal of generative models is to represent the latent variables that describe a data distribution that can produce or reproduce data examples in high-dimensional space….” The “latent variables” are the features that are extracted and whose distribution is determined. “[0017] Both the encoder 104 and decoder 106 may be implemented with neural networks. The statistics 122, 123, 124 (if any) are the output layer of the encoder 104 and the node activation values in blocks 122, 123 and 124 (if any) can also be called “latent variables” because their role is similar to that of latent variables in probabilistic inference. The sample random variables 105 (akin to a bottleneck layer) that satisfy the statistics 122-124 are then decoded by a decoder network 106 to produce an output that is as close as possible to a copy of the input 103….” “[0234] … Then four descriptive statistics (maximum, minimum, mean, and standard deviation) were calculated from the trend of each modality as features.
…” “[0017] T… A SCAN is similar to a VAE, except it uses a different regularization error term and introduces many hyperparameters for detailed control of the regularization.” Figures 1, 2, and 4A show the role of “means 122/222” and “standard deviation 123/223” and the use of “Gaussians 411” as the distribution. “… The SCAN is trained to have an encoder network that outputs, subject to one or more constraints, parameters for parametric probability distributions of sample random variables from input data. The parameters comprise measures of central tendency and measures of dispersion….” Abstract. Here, central tendency is the mean and dispersion is the standard deviation.

See the following descriptions of latent space from related art; each reference arrives at its latent space, and condenses the information, in its own way. Wikipedia: “In most cases, the dimensionality of the latent space is chosen to be lower than the dimensionality of the feature space from which the data points are drawn, making the construction of a latent space an example of dimensionality reduction, which can also be viewed as a form of data compression. Latent spaces are usually fit via machine learning, and they can then be used as feature spaces in machine learning models, including classifiers and other supervised predictors.” Latent space - Wikipedia. The qualifier “In most cases” means that the option of the latent space being the same as the feature space remains available unless contradicted by the claim language. Kang (U.S. 20240321265): [0004] For example, an audio signal processing device may use dynamic time warping (DTW), which is a type of forced alignment algorithm, to synchronize text and speech included in an audio signal.
In DTW, the audio signal processing device compares two pieces of time-series data, and sequentially finds, starting from a first time point of first time-series data, a time point at which data of time points of the first time-series data and data of time points of the second time-series data have the most similarity. A distance between a time point of the first time series and a time point of the second time series corresponding thereto is called a warping distance, and the audio signal processing device finds an optimal warping path that minimizes the warping distance. The audio signal processor aligns speech and text according to the optimal warping path. The audio signal processor allows the input audio and input text to be converted into the same latent feature space. The audio signal processing device may use non-negative matrix factorization (NMF) to extract audio features and may use vowel class tokens as text features. Lyons (U.S. 20230080736): [0016] In some embodiments, an embedding model, using deep representation learning, is trained using a quadruplet loss technique, where the embedding model learns to project the input sensor's time series data into an embedding vector. Latent space data from similar activities are grouped together while dissimilar activities are far apart. In some embodiments, a quadruplet variational autoencoder (VAE) is used to provide the mean embedding vector (e.g., mean component) and also its covariance (e.g., variance component). Using a reparameterization technique, a VAE decoder learns to reconstruct the filtered (e.g., noise removed) input examples across all the four encoders. The quadruplet loss is implemented on the mean embedding vector. Once trained and during inference, the embedding vector from the learned model is fed into a tracker, which uses a constant velocity to track the embedding vector over time. In some embodiments, classification gating is applied to handle spurious predictions from one-off input data. 
A classifier unit, such as a linear classifier, operates on the tracked embedding vector to predict and reject unknown classes using distance thresholding techniques. The activity classification system can handle sensor artifacts (e.g., noise due to sensor degradation) and environment uncertainties important for practical activity recognition solutions. Ceccaldi (U.S. 10,624,558): “The generator includes an encoder that generates a latent space, e.g. a compact representation of the input MR data. The latent space includes values that describe distilled features of the input MR data. The generator also includes a decoder that uses the latent space generated by the encoder to reconstruct the object and masks, for example brain and tissue masks.” 6:25-31. Oishi (U.S. 20250069614): [0034] An embedding vector of an image and an embedding vector of a voice are obtained on the basis of a large amount of pair data of the image and the voice describing the content of the image. In the learning stage before the estimation stage, the encoding unit 121a performs deep distance learning so that the embedding vector of the image and the embedding vector of the voice are arranged close to each other in the latent space (audiovisual embedding space). Sorensen (U.S. 20240374136): [0218] In the lower bounds the contribution of z and y in the unlabeled case and z in the labeled case is marginalized out. For the unlabeled case y is treated as latent variable and is sampled by summing over the two classes, and for z the integral is approximated by sampling from the Gaussian distribution in the latent space. In the case of labeled data, optimization for the labels y is done using binary crossentropy. Liu (U.S. 20240005942): (figure not reproduced)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499.
The examiner can normally be reached on 9 to 5, M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Fariba Sirjani/
Primary Examiner, Art Unit 2659
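The Sorensen passages relied on above for the error-value limitation ([0215]-[0220]) describe a last layer split into a mean channel and a log-variance channel, sampling of z with the reparameterization trick, and an objective based on the log-likelihood. The Python sketch below shows those pieces with plain lists. It is an illustration under assumptions, not Sorensen’s model or the applicant’s claimed error value: the helper names are hypothetical, the “last layer” is just a list of numbers, and the Gaussian negative log-likelihood of the segment is only one assumed choice of “error value.”

```python
import math
import random


def split_last_layer(outputs):
    """Split a final-layer output vector into (mean, log-variance) halves.

    Mirrors the "two channels" idea in Sorensen [0216]; the even split is an assumption.
    """
    half = len(outputs) // 2
    return outputs[:half], outputs[half:]


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def gaussian_nll(segment, mu, log_var):
    """Negative log-likelihood of `segment` under a diagonal Gaussian N(mu, exp(log_var)).

    Used here as a hypothetical "error value": larger means the segment fits the
    predicted distribution less well.
    """
    return 0.5 * sum(lv + (x - m) ** 2 / math.exp(lv) + math.log(2 * math.pi)
                     for x, m, lv in zip(segment, mu, log_var))


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, exp(log_var)) to N(0, 1), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m ** 2 - 1.0 - lv for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = split_last_layer([0.2, -0.1, 0.0, 0.0])  # toy "network output"
    z = reparameterize(mu, log_var, rng)                    # latent sample
    error_value = gaussian_nll([0.25, -0.05], mu, log_var)  # could feed a classifier input
    print(len(z), round(kl_to_standard_normal(mu, log_var), 4), error_value > 0)
```

Predicting the log variance rather than the variance lets the network output any real number while σ = exp(0.5 · log σ²) stays positive, which is why the sketch exponentiates it.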

Prosecution Timeline

Dec 11, 2025: Applicant Interview (Telephonic)
Dec 16, 2025: Response after Non-Final Action
Feb 09, 2026: Request for Continued Examination
Feb 17, 2026: Response after Non-Final Action
Apr 01, 2026: Non-Final Rejection mailed (§103)
May 11, 2026: Examiner Interview Summary
May 11, 2026: Interview Requested
May 11, 2026: Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12640143
UTILIZING GENERATIVE MODEL IN GENERATING SUMMARY OF LONG-FORM CONTENT
2y 5m to grant; granted May 26, 2026
Patent 12640159
VOICE SIGNAL PROCESSING DEVICE, VOICE SIGNAL PROCESSING METHOD, AND NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM STORING VOICE SIGNAL PROCESSING PROGRAM
1y 10m to grant; granted May 26, 2026
Patent 12614558
Method and Apparatus for Detecting Correctness of Pitch Period
2y 8m to grant; granted Apr 28, 2026
Patent 12605109
APPARATUS AND METHOD FOR DETERMINING BRAIN LANGUAGE AREA INVASION BASED ON SPEECH DATA
2y 3m to grant; granted Apr 21, 2026
Patent 12603099
SELF-ADJUSTING ASSISTANT LLMS ENABLING ROBUST INTERACTION WITH BUSINESS LLMS
2y 7m to grant; granted Apr 14, 2026
Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 76%
With Interview: 99% (+31.5%)
Median Time to Grant: 2y 9m (~0m remaining)
PTA Risk: High
Based on 554 resolved cases by this examiner. Grant probability derived from career allowance rate.
