Last updated: May 29, 2026
Application No. 18/589,806
GENERATING NATURAL LANGUAGE DESCRIPTION OF A SOFTWARE CODE

Non-Final OA §103
Filed
Feb 28, 2024
Examiner
LOUIE, HOWARD H
Art Unit
2494
Tech Center
2400 — Computer Networks
Assignee
CYLANCE INC.
OA Round
2 (Non-Final)
Interview Optional

— +60.1% interview lift. Examiner has a relatively high allowance rate (83%); +60.1% interview lift. A written response may suffice.
Based on 186 resolved cases, 2023–2026
Examiner Intelligence

LOUIE, HOWARD H View full profile →
Grants 83% — above average
Career Allowance Rate
154 granted / 186 resolved
+24.8% vs TC avg
Strong +60% interview lift
Without
With
+60.1%
Interview Lift
resolved cases with interview
Typical timeline
2y 8m
Avg Prosecution
12 currently pending
Career history
199
Total Applications
across all art units
Statute-Specific Performance

§101
3.3%
-36.7% vs TC avg
§103
87.8%
+47.8% vs TC avg
§102
1.2%
-38.8% vs TC avg
§112
4.9%
-35.1% vs TC avg
Black line = Tech Center average estimate • Based on career data from 186 resolved cases
Office Action

§103
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . 

Information Disclosure Statement
	The information disclosure statement(s) (IDS) submitted on 1/21/2026 and 2/4/2026 is/are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement(s) is/are being considered by the examiner.

Response to Amendment
This communication is in response to the amendment filed on 12/17/2025. The Examiner acknowledges amended claims 1, 3-8, 10-15, and 17-20. Claims 2, 9, and 16 have been canceled. Claims 1, 3-8, 10-15, and 17-20 are pending and claims 1, 3-8, 10-15, and 17-20 are rejected.  Claims 1, 8, and 15 is/are independent. 
The rejection(s) of claims under 35 U.S.C. § 101 are withdrawn in view of Applicant's amendments.
The rejection(s) of claims under 35 U.S.C. § 103 have been updated based on new grounds of rejection as indicated below. 
The rejections of claims 3, 10, and 17 have been corrected.
	
Response to Arguments
Applicant's arguments filed 12/17/2025 have been fully considered.  Applicant argues (see Remarks, page 5, third paragraph through page 7, bottom paragraph) that the references cited in the previous rejection fail to disclose the newly amended claim features.  This argument is persuasive. Therefore, the rejections are withdrawn. However, upon further consideration, a new ground of rejection is made in view of Acharya et al. U.S. Publication 20250045148 (hereinafter “Acharya”) in view of Salem et al. U.S. Publication 20200372150 (hereinafter “Salem”), further in view of Harang et al. U.S. Publication 20220156372 20240020116 (hereinafter “Harang”).
Harang teaches training a model using software instance samples with labels indicating a respective sample is malicious (para. 7 and 109).
Independent claims 8 and 15 recite limitations analogous to the limitations of claim 1 and are also rejected for similar reasons. Regarding applicant’s arguments with respect to the dependent claims, applicant’s amendments to the independent claims have necessitated a new ground of rejection with respect to the independent claims from which the dependent claims depend, thereby requiring new grounds of rejection for the dependent claims also. 
Accordingly, Applicant's argument is persuasive, the rejection is withdrawn, and new ground(s) of rejection are presented herein. Note that this action is made FINAL. See MPEP § 706.07(a).	
	
Claim Rejections - 35 USC § 103
	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

	The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
	
	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

	Claims 1, 3, 7-8, 10, 14-15, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Acharya et al. U.S. Publication 20250045148 (hereinafter “Acharya”) in view of Salem et al. U.S. Publication 20200372150 (hereinafter “Salem”), further in view of Harang et al. U.S. Publication 20220156372 (hereinafter “Harang”).

As per claim 1, Acharya discloses 
A method, comprising: 
[0004] Examples of the present disclosure describe systems and methods1 for automatically detecting and repairing reliability issues in operating systems and applications using a generative artificial intelligence (“AI”) system. 

processing a code [software code file, para. 4, 17, 33, 34] by using a file encoder model [instruction identifier, para. 59; because instruction identifier accepts file input (e.g., software code file, para. 4, 17) and produces output that is a vector, the operational logic of instruction identifier discloses a file encoder model] to obtain a file embedding vector; [vector representations, para. 59] and
para. 59 instruction identifier 208 creates (or provide instructions for creating) vector representations of the context and/or the lines of software code 

[0034] Code failure explanation API 116 provides functionality that enables a user to request a natural language explanation of the cause of failure for an indicated portion of software code. In some examples, code failure explanation API 116 generates or causes the generation of a context for a failing portion of software code and extracts lines of software code corresponding to the failing portion of software code from a corresponding software code file, as discussed above with respect to code explanation API 114. 

processing the file embedding vector to obtain a text description [natural language explanation, para. 44] of the code by using a large language model (LLM).[ language model 120 ]

Para. 14 the language model is a large language model (“LLM”). 
Acharya [0044] Upon receiving input, language model 120 processes the input and outputs a response corresponding to a user request associated with the input. For instance, in response to receiving input from code explanation API 114 that is associated with a request for a natural language explanation of an indicated portion of software code, language model 120 outputs the natural language explanation to code explanation API 114.

	However, Acharya does not expressly disclose binary code
wherein the file encoder model is trained based on a training set of description sample pairs, wherein each description sample pair in the training set includes a binary code sample and a description text sample describing security risk of the binary code sample
Salem discloses processing binary code [binary code in original file, para. 136] using a model [embedding layer (e.g. Word2Vec), para. 148; neural network, co-occurrence matrix, or probabilistic models, para. 147] to obtain an embedding vector [map to vector representations, para. 148; undergo embedding, para. 147]

[0147] However, in some embodiments, prior to downsampling, the code may undergo embedding, which can refer to a modeling technique used for mapping the code to vectors of real numbers. It can represent the code in vector space with one or more dimensions. Embeddings can be generated using various methods like neural networks, co-occurrence matrix, or probabilistic models. For example, Word2Vec consists of models for generating word embedding. These models can be shallow two layer neural networks having one input layer, one hidden layer and one output layer. In some embodiments, embedding reformats the code such that code that is present in a similar context tends to be closer to each other in a produced vector space. In some embodiments, the embedding results in a four dimension vector space. The embedding step may be necessary because the neural network functions using numerical values as inputs. In some embodiments, the neural network takes as input numerical values which may be received from convolutions, additions, applications, and/or numerical transformations. In some embodiments, the neural network is not configured to use the raw code as an input. In some embodiments, in order to transform the code into meaningful numerical values which can then be down sampled and inputted into the neural network, embedding must be utilized.

[0148] Referring to FIG. 8, each of a plurality of channels 802A, 802B can represent one of the vector layers of the embedded vector space. In some embodiments, each vector layer is very large and must be further consolidated into a plurality of blocks 804A, 804B. In some embodiments, each channel 802A, 802B may be consolidated into N blocks. Once the channels are separated into blocks, a filter 806 may be used to produce a response value or sample which may represent that channel and specific blocks of the channel. The purpose of the filtering mechanism can be to provide a fixed size vector input to the neural network. In some embodiments the code is inputted into an embedding layer (e.g. Word2Vec), as described above, which may store an embedding table to map code fragments represented by indexes to vector representations. In some embodiments, the embedding may comprise a representation of the code where similar code fragments are assigned similar representations.

Salem [0136] Compared to approaches of converting a malware binary file to a two-dimensional image before doing classification, this approach may be simpler since the height and width of the image do not need to be determined. In some embodiments, converting to a byte stream preserves the order of the binary code in the original file, and this sequential representation of the data makes it natural to apply a neural network architecture to the data. In some embodiments, each byte stream is scaled to a predetermined size. In some embodiments, the scaled code corresponds to a sequence of 1-byte values.

[0153] result of the embedding and filtering steps is a down sampled, embedded code sample, which can be input into the neural network for feature generation.

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Acharya with the technique for processing binary code using a model to obtain an embedding vector of Salem to include processing a binary code by using a file encoder model to obtain a file embedding vector;
One of ordinary skill in the art would have made this modification to improve the ability of the system to utilize a large language model to analyze the binary file. The system of the primary reference can be modified to apply a model to extract information from binary code to generate a vector, in order to facilitate processing by a large language model.
However, the combination of Acharya and Salem does not expressly disclose wherein the file encoder model is trained based on a training set of description sample pairs, wherein each description sample pair in the training set includes a binary code sample and a description text sample describing security risk of the binary code sample

Harang discloses wherein the file encoder model is trained based on a training set of description sample pairs, wherein each description sample pair in the training set includes a binary code sample and a description text sample describing security risk of the binary code sample

[0007] Implementations may include one or more of the following features. The corresponding set of numbers of detection may be each within a predetermined threshold of the first number of detections. The predetermined threshold may be an absolute numerical threshold. The predetermined threshold may be a relative threshold scaled according to a ratio of a size of the new data set to the size of each of the synthetic data sets. The new data set may include live samples analyzed for an enterprise by the malware detection system. Evaluating the true positive rate and the false positive rate for the malware detection system may include measuring the true positive rate and the false positive rate for the malware detection system when applied to a base data set having a known composition of malware instances and benign instances. The malware detection system may be a machine learning model trained to detect malware based on a training data set, further where each software instance in the training data set is labeled to indicate a malware status. The computer executable code may further perform the step of updating the true positive rate and the false positive rate based on additional software instances received by the malware detection system and automatically labeled by the malware detection system as safe or malicious.
[0109] As shown in step 904, the method 900 may include updating the detection system using live data. For example, this may include revising a prior probability distribution function or other distribution derived from the base data set as automated or manual malware distribution data becomes available from the detection system on live data. In one aspect, this may include updating the true positive rate and the false positive rate based on additional software instances received by the malware detection system and automatically labeled by the malware detection system as safe or malicious.

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Acharya and Salem with the technique for training a model using software instance samples with labels indicating a respective sample is malicious of Harang to include wherein the file encoder model is trained based on a training set of description sample pairs, wherein each description sample pair in the training set includes a binary code sample and a description text sample describing security risk of the binary code sample
One of ordinary skill in the art would have made this modification to improve the ability of the system to train a model to detect potentially malicious executable files. The system of the primary reference can be modified to train the model with software instance samples with labels that indicate whether a respective sample is malicious. The indication of whether the sample is malicious would be the description text sample.

As per claim 3, the rejection of claim 1 is incorporated herein. 
However, Acharya does not expressly disclose wherein the file encoder model comprises a pretrained embedding model in sequence with a translator model.
Salem discloses wherein the file encoder model comprises a pretrained embedding model [embedding layer (e.g. Word2Vec), para. 148; neural network, co-occurrence matrix, or probabilistic models, para. 147; the Salem models are pre-trained because an untrained model is not usable] in sequence with a translator model.[ channel filtering system, para. 149; filtering mechanisms, para. 153; downsampling, para. 147; another separate model (e.g. Word2Vec), para. 138;  resize a matrix to a fixed size at 606, para. 139; using linear interpolation to resize the matrix to a fixed size, para. 139]
[0138] In some embodiments, linear interpolation uses the fact that similar byte values have a similar semantic meaning. For example, this makes sense for images: a pixel with value 230 and a pixel with value 228 look very similar in color. However, in some embodiments, this is not the case in executable code: two byte values that are close can represent completely different opcodes. Thus, in some embodiments, an embedding table is utilized before rescaling the data by training another separate model (e.g. Word2Vec) on sections of executable code. In some embodiments, the separate model transforms the data into a numerical form that the neural network can understand. In some embodiments, each byte in the data can be translated to a fixed-size vector using the learned embedding table, and vectors in this new dimension maintain the required property for linear interpolation: Euclidean similarity indicates semantic similarity.

Salem [0147] However, in some embodiments, prior to downsampling, the code may undergo embedding, which can refer to a modeling technique used for mapping the code to vectors of real numbers…  in order to transform the code into meaningful numerical values which can then be down sampled and inputted into the neural network, embedding must be utilized.

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Acharya with the technique for utilizing an embedding model followed by another model such as a model for resizing a matrix of Salem to include wherein the file encoder model comprises a pretrained embedding model in sequence with a translator model
One of ordinary skill in the art would have made this modification to improve the ability of the system to modify the training data as needed before input into the large language model. The system of the primary reference can be modified to utilize an embedding model followed by another model such as a model for resizing a matrix.

As per claim 7, the rejection of claim 1 is incorporated herein. 
Acharya discloses
outputting the text description of the binary code.[ outputs the natural language explanation to code explanation API 114, para. 44]

Acharya [0044] Upon receiving input, language model 120 processes the input and outputs a response corresponding to a user request associated with the input. For instance, in response to receiving input from code explanation API 114 that is associated with a request for a natural language explanation of an indicated portion of software code, language model 120 outputs the natural language explanation to code explanation API 114.

As per claim 8, the claim(s) is/are directed to a non-transitory computer readable medium with limitations which correspond to limitations of claim 1, and is/are rejected for the reasons detailed with respect to claim 1.  Claim 8 also recites A non-transitory computer-readable medium containing instructions which, when executed, cause an electronic device to perform operations comprising:
Acharya discloses A computer-readable medium containing instructions [computer-readable media, para. 82; computer-readable instructions, para. 82] which, when executed, [, para. 26 executed by hardware components] cause an electronic device to perform operations comprising:[ system 100 are implemented on a single computing device, para. 26;]
[0082] The term computer readable media as used herein may include computer storage media. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules. The system memory 504, the removable storage device 507, and the non-removable storage device 510 are all computer storage media examples (e.g., memory storage). Computer storage media includes RAM, ROM, electrically erasable ROM (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture which can be used to store information, and which can be accessed by the computing device 500. Any such computer storage media may be part of the computing device 500. Computer storage media does not include a carrier wave or other propagated or modulated data signal.
Acharya [0026] FIG. 1 illustrates an example system for automatically detecting and repairing reliability issues in operating systems and applications using a generative AI system. System 100, as presented, is a combination of interdependent components that interact to form an integrated whole. Components of system 100 may be hardware components or software components (e.g., APIs, modules, runtime libraries) implemented on and/or executed by hardware components of system 100. In one example, components of system 100 are implemented on a single computing device. In another example, components of system 100 are distributed across multiple computing devices and/or computing systems.

As per claim 10, the claim(s) is/are directed to a computer readable medium with limitations which correspond to limitations of claim 3, and is/are rejected for the reasons detailed with respect to claim 3.

As per claim 14, the claim(s) is/are directed to a computer readable medium with limitations which correspond to limitations of claim 7, and is/are rejected for the reasons detailed with respect to claim 7.  

As per claim 15, the claim(s) is/are directed to a computer-implemented system with limitations which correspond to limitations of claim 1, and is/are rejected for the reasons detailed with respect to claim 1.  Claim 8 also recites  A computer-implemented system, comprising:
one or more computers; and
one or more computer memory devices interoperably coupled with the one or more computers and having tangible, non-transitory, machine-readable media storing one or more instructions that, when executed by the one or more computers, perform one or more operations comprising:
Acharya discloses A computer-implemented system, comprising:
one or more computers; and [computing device 500, para. 82]
one or more computer memory devices [system memory, para. 82] interoperably coupled with the one or more computers and having tangible, non-transitory, machine-readable media storing one or more instructions [computer-readable media, para. 82; computer-readable instructions, para. 82] that, when executed [, para. 26 executed by hardware components] cause by the one or more computers, [ system 100 are implemented on a single computing device, para. 26;]perform one or more operations comprising: [perform operations, para. 86]
[0082] The term computer readable media as used herein may include computer storage media. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules. The system memory 504, the removable storage device 507, and the non-removable storage device 510 are all computer storage media examples (e.g., memory storage). Computer storage media includes RAM, ROM, electrically erasable ROM (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture which can be used to store information, and which can be accessed by the computing device 500. Any such computer storage media may be part of the computing device 500. Computer storage media does not include a carrier wave or other propagated or modulated data signal.
Acharya [0026] FIG. 1 illustrates an example system for automatically detecting and repairing reliability issues in operating systems and applications using a generative AI system. System 100, as presented, is a combination of interdependent components that interact to form an integrated whole. Components of system 100 may be hardware components or software components (e.g., APIs, modules, runtime libraries) implemented on and/or executed by hardware components of system 100. In one example, components of system 100 are implemented on a single computing device. In another example, components of system 100 are distributed across multiple computing devices and/or computing systems.
[0086] In another example, the technology discussed herein relates to a device comprising: a processing system; and memory coupled to the processing system, the memory comprising computer executable instructions that, when executed, perform operations comprising

As per claim 17, the claim(s) is/are directed to a computer-implemented system with limitations which correspond to limitations of claim 3, and is/are rejected for the reasons detailed with respect to claim 3.   
  
Claims 4-6, 11-13, and 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Acharya in view of Salem, further in view of Harang, further in view of Zweig et al. U.S. Publication 20140229158 (hereinafter “Zweig”).
As per claim 4, the rejection of claim 1 is incorporated herein. 
	However, the combination of Acharya, Salem, and Harang does not expressly disclose wherein the LLM is trained based on a training set of embedding sample pairs, wherein each embedding sample pair in the training set includes a description text sample and a text embedding vector.
Zweig discloses training a model [neural network model, para. 40] with a sample pair that includes a description text sample [input vector, para. 55; the input vector includes text from the input information ] and a text embedding vector [feature vector, para. 46; Feature vector is generated based on the words of input information]
Zweig [0003] A system is described herein which uses a neural network having an input layer that accepts an input vector and a feature vector. The input vector represents at least part of input information, such as, but not limited to, a word in sequence of words. 

[0040] The neural network 104 operates on the basis of a model 106. The model 106 may correspond to a set of weighting matrices that are applied to various vectors in the neural network 104 (to be described in greater detail below). A training system 108 produces the model 106 based on a corpus of training examples in a data store 110. The training system 108 can produce the model 106 using any technique, such as a standard back projection technique. Each training example constitutes an established case in which an instance of input information, together with an instance of context information, maps into a particular instance of output information.

[0055] A neural network 214 of any type accepts the input vector and the feature vector, and, in response, generates an output vector. The output information can be interpreted in different ways for different respective applications of the system 202, where each application employs an appropriately trained model. Section C sets forth several illustrative applications.

[0046] The GCEM 208 can formulate a feature vector based on the words in the input information. It is characterized as a "given" extraction module insofar as the input information is given, and not obtained from a separate source. For example, the GCEM 208 may analyze a sliding block 212 of words in an input sequence of words. The block 212 of words may be regarded in aggregate as an input document. In the simplified case of FIG. 2, the sliding block 212 includes words w.sub.2-w.sub.7. Here, the block 212 includes the word w.sub.7 that is used to encode the input vector; but in other implementations, the block 212 can include just the words which precede the word w.sub.7, excluding the word w.sub.7. Further, in other implementations, the block 212 may include many more words than six, such as 50 words in one implementation.

Para. 93 the system 102 generates an input vector that represents a word or words in the sequence of words.

Para. 99 input vector that represents language information that is input or otherwise provided while the user is engaging in a particular activity, such as playing a game.

Zweig Para. 84 CIPM 114 computes a feature vector f(t) associated with the new document, which expresses a distribution of topics associated with the new document.

Para. 72 first implementation uses the Latent Dirichlet Allocation (LDA) technique to generate a feature vector. The second implementation uses the Latent Semantic Analysis (LSA) to generate a feature vector. 

Para. 94 feature vector that indicates that a user resides in the state of Washington (as opposed to Wisconsin, for example), or a feature vector that indicates that many of the user's previous queries pertain to the state of Washington, etc.

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Acharya, Salem, and Harang  with the technique for training a model with a sample pair that includes a description text sample and a text embedding vector of Zweig to include wherein the LLM is trained based on a training set of embedding sample pairs, wherein each embedding sample pair in the training set includes a description text sample and a text embedding vector.
One of ordinary skill in the art would have made this modification to improve the ability of the system to train a model, such as a large language model, using text information. The system of the primary reference can be modified to generate a training set of sample pairs that each includes a description text sample and a text embedding vector, to facilitate training of the LLM.

As per claim 5, the rejection of claim 4 is incorporated herein. 
	However, the combination of Acharya, Salem, and Harang does not expressly disclose wherein the text embedding vector is generated by using a text language model based on the description text sample.
Zweig discloses generating a text embedding vector [feature vector, para. 46, 72, 84, 94] by using a text language model [GCEM 208, para. 46; CIPM 114] based on descriptive text sample [words in the input information, para. 46]
[there is no definition of text embedding vector in the specification; Feature vector may have a distribution of topics, para. 84; the different content that can be included in the feature vector includes text]

[0072] This Section sets forth two implementations of the context information-providing module (CIPM) 114 of FIG. 1, and, more specifically, the given context extraction module (GCEM) 208 of FIG. 2. The first implementation uses the Latent Dirichlet Allocation (LDA) technique to generate a feature vector. The second implementation uses the Latent Semantic Analysis (LSA) to generate a feature vector.

[0046] The GCEM 208 can formulate a feature vector based on the words in the input information. It is characterized as a "given" extraction module insofar as the input information is given, and not obtained from a separate source. For example, the GCEM 208 may analyze a sliding block 212 of words in an input sequence of words. The block 212 of words may be regarded in aggregate as an input document. In the simplified case of FIG. 2, the sliding block 212 includes words w.sub.2-w.sub.7. Here, the block 212 includes the word w.sub.7 that is used to encode the input vector; but in other implementations, the block 212 can include just the words which precede the word w.sub.7, excluding the word w.sub.7. Further, in other implementations, the block 212 may include many more words than six, such as 50 words in one implementation.

Para. 84 CIPM 114 computes a feature vector f(t) associated with the new document, which expresses a distribution of topics associated with the new document.

Para. 72 first implementation uses the Latent Dirichlet Allocation (LDA) technique to generate a feature vector. The second implementation uses the Latent Semantic Analysis (LSA) to generate a feature vector. 

Zweig Para. 94 feature vector that indicates that a user resides in the state of Washington (as opposed to Wisconsin, for example), or a feature vector that indicates that many of the user's previous queries pertain to the state of Washington, etc.

[0067] The training system 108 of FIG. 1 may train the neural network 402 of FIG. 4 using stochastic gradient descent and back-propagation of errors, based on the training examples in the training corpus. Each example constitutes a case in which a particular input vector w(t) and a particular feature vector f(t) are mapped to a particular output vector y(t). The use of the feature vectors in the training operation helps eliminate or reduce difficulties that might otherwise arise in modeling long span dependencies in a recurrent neural network (commonly referred to as the vanishing gradient problem). This is because each feature vector may itself capture aspects of a long span dependency. The model 106 produced by the training system 108 may correspond to a particular instantiation of the matrices U, F, W, V, and G described above.
Zweig [0073] FIG. 5 describes one implementation of the CIPM 114 using LDA. In general, LDA corresponds to a generative technique in which each document in a corpus of training documents is considered, for modeling purposes, to have been generated by the following procedure.

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Acharya, Salem, and Harang with the technique for generating a text embedding vector by using a model based on descriptive text sample of Zweig to include wherein the text embedding vector is generated by using a text language model based on the description text sample.
One of ordinary skill in the art would have made this modification to improve the ability of the system to generate a vector containing text for a large language model, such as when training the large language model. The system of the primary reference can be modified to perform generating a text embedding vector based on descriptive text sample by using a text language model.

As per claim 6, the rejection of claim 5 is incorporated herein. 
	However, the combination of Acharya, Salem, and Harang does not expressly disclose wherein the text language model is selected among a plurality of candidate text language models during a training process of the file encoder model.
Zweig discloses the selection of a text language model [GCEM 208, para. 46 or CIPM 114; Latent Dirichlet Allocation (LDA) or Latent Semantic Analysis (LSA), para. 72; the Zweig system chooses from these models] while training a model

[0072] This Section sets forth two implementations of the context information-providing module (CIPM) 114 of FIG. 1, and, more specifically, the given context extraction module (GCEM) 208 of FIG. 2. The first implementation uses the Latent Dirichlet Allocation (LDA) technique to generate a feature vector. The second implementation uses the Latent Semantic Analysis (LSA) to generate a feature vector.

[0046] The GCEM 208 can formulate a feature vector based on the words in the input information. It is characterized as a "given" extraction module insofar as the input information is given, and not obtained from a separate source. For example, the GCEM 208 may analyze a sliding block 212 of words in an input sequence of words. The block 212 of words may be regarded in aggregate as an input document. In the simplified case of FIG. 2, the sliding block 212 includes words w.sub.2-w.sub.7. Here, the block 212 includes the word w.sub.7 that is used to encode the input vector; but in other implementations, the block 212 can include just the words which precede the word w.sub.7, excluding the word w.sub.7. Further, in other implementations, the block 212 may include many more words than six, such as 50 words in one implementation.

Zweig Para. 84 CIPM 114 computes a feature vector f(t) associated with the new document, which expresses a distribution of topics associated with the new document.

Para. 72 first implementation uses the Latent Dirichlet Allocation (LDA) technique to generate a feature vector. The second implementation uses the Latent Semantic Analysis (LSA) to generate a feature vector. 

Para. 94 feature vector that indicates that a user resides in the state of Washington (as opposed to Wisconsin, for example), or a feature vector that indicates that many of the user's previous queries pertain to the state of Washington, etc.

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Acharya, Salem, and Harang with the technique for generating a text embedding vector by using a model based on descriptive text sample of Zweig to include wherein the text language model is selected among a plurality of candidate text language models during a training process of the file encoder model.
One of ordinary skill in the art would have made this modification to improve the ability of the system to generate a vector containing text for a large language model, such as when training the large language model. The system of the primary reference can be modified to perform generating a text embedding vector based on descriptive text sample by using a selected text language model.
As per claim 11, the claim(s) is/are directed to a computer readable medium with limitations which correspond to limitations of claim 4, and is/are rejected for the reasons detailed with respect to claim 4.  
As per claim 12, the claim(s) is/are directed to a computer readable medium with limitations which correspond to limitations of claim 5, and is/are rejected for the reasons detailed with respect to claim 5.  
As per claim 13, the claim(s) is/are directed to a computer readable medium with limitations which correspond to limitations of claim 6, and is/are rejected for the reasons detailed with respect to claim 6.  
As per claim 18, the claim(s) is/are directed to a computer-implemented system with limitations which correspond to limitations of claim 4, and is/are rejected for the reasons detailed with respect to claim 4.  
As per claim 19, the claim(s) is/are directed to a computer-implemented system with limitations which correspond to limitations of claim 5, and is/are rejected for the reasons detailed with respect to claim 5. 
As per claim 20, the claim(s) is/are directed to a computer-implemented system with limitations which correspond to limitations of claim 6, and is/are rejected for the reasons detailed with respect to claim 6.  

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HOWARD H LOUIE whose telephone number is (571)272-0036.  The examiner can normally be reached on Monday-Friday 9 AM-5 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jung W. Kim can be reached on 571-272-3804.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HOWARD H. LOUIE/Examiner, Art Unit 2494                                                                                                                                                                                                        
/ROBERT B LEUNG/Primary Examiner, Art Unit 2494                                                                                                                                                                                                        

        1 Emphasis is additional throughout.
Read full office action
Prosecution Timeline

Feb 28, 2024
Application Filed
Oct 16, 2025
Non-Final Rejection mailed — §103
Dec 17, 2025
Response Filed
Mar 09, 2026
Final Rejection mailed — §103
Apr 15, 2026
Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

18/410,168
Patent 12639419
CLOUD MANAGED CONFIDENTIAL WORKLOAD ERROR RECOVERY AND REPORTING
2y 4m to grant Granted May 26, 2026
18/697,910
Patent 12591657
METHOD FOR ACQUIRING IDENTITY AUTHENTICATION INFORMATION, APPARATUS, STORAGE MEDIUM AND SYSTEM
1y 12m to grant Granted Mar 31, 2026
18/663,439
Patent 12579230
Multi-Factor Authentication with Increased Security
1y 10m to grant Granted Mar 17, 2026
18/686,631
Patent 12579262
SYSTEMS AND METHODS FOR NEUTRALIZING MALICIOUS CODE WITH NESTED EXECUTUION CONTEXTS
2y 0m to grant Granted Mar 17, 2026
17/719,175
Patent 12574413
SYSTEMS AND METHODS FOR IMPLEMENTING A FAMILY POLICY USING A COOPERATIVE SECURITY FABRIC
3y 11m to grant Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

2-3
Expected OA Rounds
83%
Grant Probability
99%
With Interview (+60.1%)
2y 8m (~5m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 186 resolved cases by this examiner. Grant probability derived from career allowance rate.