Last updated: May 04, 2026
Application No. 17/872,211
MULTI-OUTPUT HEADED ENSEMBLES FOR PRODUCT CLASSIFICATION

Non-Final OA §103
Filed
Jul 25, 2022
Examiner
HONORE, EVEL NMN
Art Unit
2142
Tech Center
2100 — Computer Architecture & Software
Assignee
Rakuten Group Inc.
OA Round
3 (Non-Final)
Interview Optional

— +33.8% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 48% grant rate with +33.8% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 21 resolved cases, 2023–2026
Examiner Intelligence

HONORE, EVEL NMN View full profile →
Grants 48% of resolved cases
Career Allowance Rate
10 granted / 21 resolved
-7.4% vs TC avg
Strong +34% interview lift
Without
With
+33.8%
Interview Lift
resolved cases with interview
Typical timeline
4y 1m
Avg Prosecution
35 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101
42.4%
+2.4% vs TC avg
§103
50.0%
+10.0% vs TC avg
§102
6.6%
-33.4% vs TC avg
§112
1.1%
-38.9% vs TC avg
Black line = Tech Center average estimate • Based on career data from 21 resolved cases
Office Action

§103
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

DETAILED ACTION
This action is responsive to the Application filed on 09/05/2025
Claims 1-3, 5, 11-13, 15 and 20-23 are pending in the case. Claims 1, 11 and 20 are
independent claims. Claims 4, 6-10, 14 and 16-19 have been canceled



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 5, 11-13, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable
over PASSBAN et al. (US Pub No.: 20220343139 A1), hereinafter referred to as
PASSBAN, in view of Leeman-Munk et al. (US Pub No.: 20160350650 A1), hereinafter referred to as Leeman-Munk.

With respect to claim 1, PASSBAN disclose:
An item classification method using multi-output headed ensembles, the method performed by at least one processor and comprising: receiving four or more text input sequences at four or more estimator threads corresponding to the four or more text input sequences (In paragraph [0043], PASSBAN discloses methods for training a neural network to perform tasks across multiple domains, the adapter network receives an embedding vector that is an encoded representation of the input data and gives back the chances that the input data belongs to each of several possible categories. In paragraph [0064], PASSBAN discloses that the multi-domain training means teaching the neural network model to translate between languages in different technical areas. Each area may use different terms, or the same term may mean different things in different fields. The training uses a dataset called custom-character, which has sentences in the source language (X) and their translations in the target language. Each data sample is a pair (x, y), where x is the text in the source language and y is the translation in the target language. In paragraph [0155], PASSBAN discloses that the computing system includes at least one processing unit, such as a processor.)
tokenizing the four or more text input sequences into one or more first tokens within the four or more estimator threads (In paragraph [0070], PASSBAN discloses a sentence in the source language that is taken from a training dataset called custom-character. Each sentence has a matching translated sentence in the target language. We also know the domain of the original sentence. The sentence is changed into a list of n tokens (w1, w2, ..., wn) using a tokenization method. These tokens are given to the encoder, which turns each token into a vector (h_w1, h_w2, ..., h_wn).)
outputting one or more item classifications based on an output of the four or more estimator threads (In paragraph [0072], PASSBAN discloses that the adapter network generates and outputs domain probabilities representing the likelihood that the unique embedding vector h.sub.<CLS> belongs to each domain (out of a defined set of domains).)
wherein the four or more estimator threads comprise two or more classification text threads operating on text to be classified and two or more metadata text threads operating on metadata corresponding to the text to be classified, wherein each thread of the four or more estimator threads comprises a stack comprising an embedding, an encoding, and a classifying (In paragraph [0079], PASSBAN discloses a neural network model (called M) with certain settings (called θ.sub. M) using a special training dataset (called custom-character) that includes data from different areas. The model can be either model 100a (which has an encoder 102 and a decoder 104) or model 100b (which has an encoder 102 and a classifier 106). The training dataset custom-character is made up of several smaller datasets (called custom-character.sub.i), each representing a different area, noted by the subscript i∈{1 ...d}. Each smaller dataset custom-character.sub.i has data samples {(x.sub.i.sup.1, y.sub.i.sup.1), ..., (x.sub.i.sup. N, y.sub.i.sup. N)}, where each sample has input data x.sub.i.sup. N and a correct output y.sub.i.sup. N (like the correct translation or class label). Sometimes, instead of taking samples from the multi-domain dataset, samples can be taken from multiple smaller datasets. In both cases, training uses multi-domain samples, and both methods are considered the same.)
wherein the embedding for each stack comprises determining one or more coordinates for one or more tokens within an embedding space (In paragraph [0083], PASSBAN discloses a data sample x is tokenized into a set of tokens including the unique token: {<CLS>, w.sub.1, w.sub.2,  w.sub.n}. The encoder 102 processes the set of tokens and generates the set of embedding vectors {h.sub.<CLS>, h.sub.w.sub.1,  h.sub.w.sub.n}. Each embedding vector is a vector representation of the respective token in an embedding latent space (i.e., the latent space defined by all possible embedding vectors generated by the encoder 102).)
wherein the method further comprises: encoding the determined one or more coordinates using one or more convolutional neural network (CNN) weights with a dropout layer, thereby resulting in one or more vectors (In paragraph [0080], PASSBAN discloses the parameters of the adapter network 112 are also set up. These parameters include a weights matrix W, which has d rows (one for each domain in the training data) and a length of dim for the embedding vectors made by the encoder 102. The weights matrix W can also be seen as a set of domain embedding vectors E, where each row e.sub. I represent the i-th domain, and E is made up of all these vectors: E=[e.sub.1|e.sub.2| ...|e.sub’d].)
calculating one or more posterior class probabilities for one or more output heads corresponding to the four or more estimator threads (In paragraph [0084], PASSBAN discloses that the adapter network 112 computes a set of domain probabilities.)
obtaining the one or more item classifications based on the one or more posterior class probabilities at the output heads for the four or more estimator threads (In paragraph [0072], PASSBAN discloses a convolutional neural network (CNN) that takes a special vector called h.sub.<CLS> and gives a list of probabilities. These probabilities show how likely it is that the vector h.sub.<CLS> fits into different categories (called domains). The output probabilities come from a process called softmax. The difference between the output probabilities and the correct category (called custom-character.sub.DM) is calculated’T).
With respect to claim 1, PASSBAN do not specifically disclose:
applying a layer normalizer to the one or more vectors
sending the normalized one or more vectors to an aggregator
However, Leeman-Munk is known to disclose:
Applying a layer normalizer to the one or more vectors (In paragraph [0112], Leeman-Munk discloses that electronic communications can be normalized using neural networks. The neural network can receive a single vector at an input layer of the neural network and transform an output of a hidden layer of the neural network into multiple values that sum to a total value of one.)
Sending the normalized one or more vectors to an aggregator (In paragraph [0111], Leeman-Munk said the processor concatenates all the numerical vector representations for each character into a single vector. In paragraph [0112], Leeman-Munk discloses that the processor transmits the single vector to a neural network. The neural network can be the neural network of the normalizer or the neural network of the flagger. In an example where the processor determined the numerical vector representations using the normalizer, the processor can transmit the single vector to the normalizer. The neural network of the normalizer can use a single vector as an input at an input layer.)

PASSBAN and Leeman-Munk are analogous pieces of art because both references concern the neural network that works across multiple domains. Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify PASSBAN, with using special “unique token” to represent domain/context information and encoding that token inro an embedding vector as taught by PASSBAN, with normalizing using neural network to determine version of the noncanonical communication based on the multiple values as taught by Leeman-Munk. The motivation for doing so would have been to  improve performance of the trained task for data samples across different domains (See[0005] of Leeman-Munk).

Regarding claim 2, PASSBAN in view of Leeman-Munk disclose the elements of claim 1. In addition, PASSBAN disclose:
The method of claim 1, further comprising: applying a backpropagation algorithm to update one or more network weights connecting one or more neural layers in the four or more estimator threads (In paragraph [0072], PASSBAN discloses updating the values of the parameters of the neural network model 100a and the adapter network 112 in backpropagation. )
defining an optimal setting of network parameters using cross-validation with respect to the four or more estimator threads (In paragraph [0053], PASSBAN discloses another training technique that is referred to as contrastive learning. Contrastive learning is a way of learning distinctiveness, and has been mainly used for self-supervised learning. The concept behind contrastive learning is that data samples from the same class (referred to as positive samples) are pulled closer together in an embedding space (i.e., the latent space defined by all possible embeddings generated by the encoder of a transformer-based neural network model), and data samples from different classes (referred to negative samples) are pushed apart in the embedding space.)
 mapping the one or more first tokens to an embedding space (In paragraph [0053], PASSBAN discloses that data samples from different classes (referred to as negative samples) are pushed apart in the embedding space.)

Regarding claim 3, PASSBAN in view of Leeman-Munk disclose the elements of claim 1. In addition, PASSBAN disclose:
The method of claim 1, further comprising: defining one or more hyper parameters using an efficient hyperparameter search technique with respect to the four or more first estimator threads (In paragraph [0049], PASSBAN disclose  hyperparameter that is selected to control the balance between the two loss terms.)

Regarding claim 5, PASSBAN in view of Leeman-Munk disclose the elements of claim 1. In addition, PASSBAN disclose:
The method of claim 1, further comprising: wherein the embedding for each stack comprises determining one or more coordinates for one or more tokens within an embedding space (In paragraph [0083], PASSBAN discloses a data sample x is tokenized into a set of tokens including the unique token: {<CLS>, w.sub.1, w.sub.2, w.sub.n}. The encoder 102 processes the set of tokens and generates the set of embedding vectors {h.sub.<CLS>, h.sub.w.sub.1, h.sub.w.sub.n}. Each embedding vector is a vector representation of the respective token in an embedding latent space (i.e., the latent space defined by all possible embedding vectors generated by the encoder 102).)
With respect to claim 11, PASSBAN disclose:
An apparatus for classifying items using multi-output headed ensembles, the apparatus comprising: a memory storage storing computer program code (In paragraph [0160], PASSBAN disclose the memory may store instructions for implementing any of the architectures and methods disclosed herein for training a neural network model.)
at least one processor communicatively coupled to the memory storage, wherein the processor is configured to execute the computer program code, the program code comprising: receive code configured to cause the at least one processor to receive four or more text input sequences at four or more estimator threads corresponding to the four or more text input sequences (In paragraph [0043], PASSBAN discloses methods for training a neural network to perform tasks across multiple domains, the adapter network receives an embedding vector that is an encoded representation of the input data and gives back the chances that the input data belongs to each of several possible categories. In paragraph [0064], PASSBAN discloses that the multi-domain training means teaching the neural network model to translate between languages in different technical areas. Each area may use different terms, or the same term may mean different things in different fields. The training uses a dataset called custom-character, which has sentences in the source language (X) and their translations in the target language. Each data sample is a pair (x, y), where x is the text in the source language and y is the translation in the target language. In paragraph [0155], PASSBAN discloses that the computing system includes at least one processing unit, such as a processor.)
tokenization code configured to cause the at least one processor to tokenize the four or more text input sequences into four or more first tokens within the four or more estimator threads (In paragraph [0070], PASSBAN discloses a sentence in the source language that is taken from a training dataset called custom-character. Each sentence has a matching translated sentence in the target language. We also know the domain of the original sentence. The sentence is changed into a list of n tokens (w1, w2, ..., wn) using a tokenization method. These tokens are given to the encoder, which turns each token into a vector (h_w1, h_w2, ..., h_wn).)
output code configured to cause the at least one processor to output one or more item classifications based on an output of the four or more first estimator threads (In paragraph [0072], PASSBAN discloses that the adapter network generates and outputs domain probabilities representing the likelihood that the unique embedding vector h.sub.<CLS> belongs to each domain (out of a defined set of domains).)
wherein the four or more estimator threads comprises two or more classification text threads operating on text to be classified and two or more metadata text threads operating on metadata corresponding to the text to be classified, wherein each thread of the four or more estimator threads comprises a stack comprising an embedding, an encoding, and a classifying (In paragraph [0079], PASSBAN discloses a neural network model (called M) with certain settings (called θ.sub. M) using a special training dataset (called custom-character) that includes data from different areas. The model can be either model 100a (which has an encoder 102 and a decoder 104) or model 100b (which has an encoder 102 and a classifier 106). The training dataset custom-character is made up of several smaller datasets (called custom-character.sub.i), each representing a different area, noted by the subscript i∈{1 ...d}. Each smaller dataset custom-character.sub.i has data samples {(x.sub.i.sup.1, y.sub.i.sup.1), ..., (x.sub.i.sup. N, y.sub.i.sup. N)}, where each sample has input data x.sub.i.sup. N and a correct output y.sub.i.sup. N (like the correct translation or class label). Sometimes, instead of taking samples from the multi-domain dataset, samples can be taken from multiple smaller datasets. In both cases, training uses multi-domain samples, and both methods are considered the same.)
wherein the embedding for each stack comprises determining one or more coordinates for one or more tokens within an embedding space (In paragraph [0083], PASSBAN discloses a data sample x is tokenized into a set of tokens including the unique token: {<CLS>, w.sub.1, w.sub.2,  w.sub.n}. The encoder 102 processes the set of tokens and generates the set of embedding vectors {h.sub.<CLS>, h.sub.w.sub.1,  h.sub.w.sub.n}. Each embedding vector is a vector representation of the respective token in an embedding latent space (i.e., the latent space defined by all possible embedding vectors generated by the encoder 102).)
wherein the program code further comprises: encoder code configured to cause the at least one processor to encode the determined one or more coordinates using one or more convolutional neural network (CNN) weights with a dropout layer, thereby resulting in one or more vectors (In paragraph [0080], PASSBAN discloses the parameters of the adapter network 112 are also set up. These parameters include a weights matrix W, which has d rows (one for each domain in the training data) and a length of dim for the embedding vectors made by the encoder 102. The weights matrix W can also be seen as a set of domain embedding vectors E, where each row e.sub. I represent the i-th domain, and E is made up of all these vectors: E=[e.sub.1|e.sub.2| ...|e.sub’d].)
classifier code configured to cause the at least one processor to calculate one or more posterior class probabilities for one or more output heads corresponding to the four or more estimator threads (In paragraph [0084], PASSBAN discloses that the adapter network 112 computes a set of domain probabilities.)
obtain the one or more item classifications based on the one or more posterior class probabilities at the output heads for the four or more estimator threads (In paragraph [0072], PASSBAN discloses a convolutional neural network (CNN) that takes a special vector called h.sub.<CLS> and gives a list of probabilities. These probabilities show how likely it is that the vector h.sub.<CLS> fits into different categories (called domains). The output probabilities come from a process called softmax. The difference between the output probabilities and the correct category (called custom-character.sub.DM) is calculated’T).
With respect to claim 11, PASSBAN do not specifically disclose:
apply a layer normalizer to the one or more vectors
send the normalized one or more vectors to an aggregator
However, Leeman-Munk is known to disclose:
Apply a layer normalizer to the one or more vectors (In paragraph [0112], Leeman-Munk discloses that electronic communications can be normalized using neural networks. The neural network can receive a single vector at an input layer of the neural network and transform an output of a hidden layer of the neural network into multiple values that sum to a total value of one.)
Send the normalized one or more vectors to an aggregator (In paragraph [0111], Leeman-Munk said the processor concatenates all the numerical vector representations for each character into a single vector. In paragraph [0112], Leeman-Munk discloses that the processor transmits the single vector to a neural network. The neural network can be the neural network of the normalizer or the neural network of the flagger. In an example where the processor determined the numerical vector representations using the normalizer, the processor can transmit the single vector to the normalizer. The neural network of the normalizer can use a single vector as an input at an input layer.)

PASSBAN and Leeman-Munk are analogous pieces of art because both references concern the neural network that works across multiple domains. Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify PASSBAN, with using special “unique token” to represent domain/context information and encoding that token inro an embedding vector as taught by PASSBAN, with normalizing using neural network to determine version of the noncanonical communication based on the multiple values as taught by Leeman-Munk. The motivation for doing so would have been to  improve performance of the trained task for data samples across different domains (See[0005] of Leeman-Munk).

Regarding claim 12, PASSBAN in view of Leeman-Munk disclose the elements of claim 11. In addition, PASSBAN disclose:
The apparatus of claim 11, the computer program code further comprising: training code configured to cause the at least one processor to: apply a backpropagation algorithm to update one or more network weights connecting one or more neural layers in the four or more estimator threads (In paragraph [0072], PASSBAN discloses updating the values of the parameters of the neural network model 100a and the adapter network 112 in backpropagation.)
define an optimal setting of network parameters using cross-validation with respect to the four or more estimator threads (In paragraph [0053], PASSBAN discloses another training technique that is referred to as contrastive learning. Contrastive learning is a way of learning distinctiveness, and has been mainly used for self-supervised learning. The concept behind contrastive learning is that data samples from the same class (referred to as positive samples) are pulled closer together in an embedding space (i.e., the latent space defined by all possible embeddings generated by the encoder of a transformer-based neural network model), and data samples from different classes (referred to negative samples) are pushed apart in the embedding space.)
map code configured to cause the at least one processor to map the one or more first tokens to an embedding space (In paragraph [0053], PASSBAN discloses that data samples from different classes (referred to as negative samples) are pushed apart in the embedding space.)

Regarding claim 13, PASSBAN in view of Leeman-Munk disclose the elements of claim 11. In addition, PASSBAN disclose:
The apparatus of claim 11, the computer program code further comprising: training code configured to cause the at least one processor to define one or more hyper parameters using an efficient hyperparameter search technique with respect to the four or more estimator threads (In paragraph [0049], PASSBAN disclose  hyperparameter that is selected to control the balance between the two loss terms.)

Regarding claim 15, PASSBAN in view of Leeman-Munk disclose the elements of claim 11. In addition, PASSBAN disclose:
The apparatus of claim 11, wherein the computer program code further comprises: determination code configured to cause the at least one processor to determine one or more coordinates for one or more tokens within an embedding space (In paragraph [0083], PASSBAN discloses a data sample x is tokenized into a set of tokens including the unique token: {<CLS>, w.sub.1, w.sub.2, w.sub.n}. The encoder 102 processes the set of tokens and generates the set of embedding vectors {h.sub.<CLS>, h.sub.w.sub.1, h.sub.w.sub.n}. Each embedding vector is a vector representation of the respective token in an embedding latent space (i.e., the latent space defined by all possible embedding vectors generated by the encoder 102).)

With respect to claim 20, PASSBAN disclose:
A non-transitory computer-readable medium comprising computer program code for classifying items using multi-output headed ensembles by an apparatus, the computer program code to be executed by at least one processor of the apparatus, the computer program code comprising: receive code configured to cause the at least one processor to receive one or more text input sequences at one or more first estimator threads corresponding to the one or more text input sequences (In paragraph [0043], PASSBAN discloses methods for training a neural network to perform tasks across multiple domains, the adapter network receives an embedding vector that is an encoded representation of the input data and gives back the chances that the input data belongs to each of several possible categories. In paragraph [0064], PASSBAN discloses that the multi-domain training means teaching the neural network model to translate between languages in different technical areas. Each area may use different terms, or the same term may mean different things in different fields. The training uses a dataset called custom-character, which has sentences in the source language (X) and their translations in the target language. Each data sample is a pair (x, y), where x is the text in the source language and y is the translation in the target language. In paragraph [0155], PASSBAN discloses that the computing system includes at least one processing unit, such as a processor.)
tokenization code configured to cause the at least one processor to tokenize the one or more text input sequences into one or more first tokens within the one or more first estimator threads (In paragraph [0070], PASSBAN discloses a sentence in the source language that is taken from a training dataset called custom-character. Each sentence has a matching translated sentence in the target language. We also know the domain of the original sentence. The sentence is changed into a list of n tokens (w1, w2, ..., wn) using a tokenization method. These tokens are given to the encoder, which turns each token into a vector (h_w1, h_w2, ..., h_wn).)
output code configured to cause the at least one processor to output one or more item classifications based on an output of the one or more first estimator threads (In paragraph [0072], PASSBAN discloses that the adapter network generates and outputs domain probabilities representing the likelihood that the unique embedding vector h.sub.<CLS> belongs to each domain (out of a defined set of domains).)
wherein the four or more estimator threads comprise two or more classification text threads operating on text to be classified and two or more metadata text threads operating on metadata corresponding to the text to be classified, wherein each thread of the four or more estimator threads comprises a stack comprising an embedding, an encoding, and a classifying (In paragraph [0079], PASSBAN discloses a neural network model (called M) with certain settings (called θ.sub. M) using a special training dataset (called custom-character) that includes data from different areas. The model can be either model 100a (which has an encoder 102 and a decoder 104) or model 100b (which has an encoder 102 and a classifier 106). The training dataset custom-character is made up of several smaller datasets (called custom-character.sub.i), each representing a different area, noted by the subscript i∈{1 ...d}. Each smaller dataset custom-character.sub.i has data samples {(x.sub.i.sup.1, y.sub.i.sup.1), ..., (x.sub.i.sup. N, y.sub.i.sup. N)}, where each sample has input data x.sub.i.sup. N and a correct output y.sub.i.sup. N (like the correct translation or class label). Sometimes, instead of taking samples from the multi-domain dataset, samples can be taken from multiple smaller datasets. In both cases, training uses multi-domain samples, and both methods are considered the same.)
wherein the embedding for each stack comprises determining one or more coordinates for one or more tokens within an embedding space (In paragraph [0083], PASSBAN discloses a data sample x is tokenized into a set of tokens including the unique token: {<CLS>, w.sub.1, w.sub.2,  w.sub.n}. The encoder 102 processes the set of tokens and generates the set of embedding vectors {h.sub.<CLS>, h.sub.w.sub.1,  h.sub.w.sub.n}. Each embedding vector is a vector representation of the respective token in an embedding latent space (i.e., the latent space defined by all possible embedding vectors generated by the encoder 102).)
wherein the program code further comprises: encoder code configured to cause the at least one processor to encode the determined one or more coordinates using one or more convolutional neural network (CNN) weights with a dropout layer, thereby resulting in one or more vectors (In paragraph [0080], PASSBAN discloses the parameters of the adapter network 112 are also set up. These parameters include a weights matrix W, which has d rows (one for each domain in the training data) and a length of dim for the embedding vectors made by the encoder 102. The weights matrix W can also be seen as a set of domain embedding vectors E, where each row e.sub. I represent the i-th domain, and E is made up of all these vectors: E=[e.sub.1|e.sub.2| ...|e.sub’d].)
classifier code configured to cause the at least one processor to calculate one or more posterior class probabilities for one or more output heads corresponding to the four or more estimator threads (In paragraph [0084], PASSBAN discloses that the adapter network 112 computes a set of domain probabilities.)
obtain the one or more item classifications based on the one or more posterior class probabilities at the output heads for the four or more estimator threads (In paragraph [0072], PASSBAN discloses a convolutional neural network (CNN) that takes a special vector called h.sub.<CLS> and gives a list of probabilities. These probabilities show how likely it is that the vector h.sub.<CLS> fits into different categories (called domains). The output probabilities come from a process called softmax. The difference between the output probabilities and the correct category (called custom-character.sub.DM) is calculated’T).
With respect to claim 20, PASSBAN do not specifically disclose:
apply a layer normalizer to the one or more vectors
send the normalized one or more vectors to an aggregator
However, Leeman-Munk is known to disclose:
Apply a layer normalizer to the one or more vectors (In paragraph [0112], Leeman-Munk discloses that electronic communications can be normalized using neural networks. The neural network can receive a single vector at an input layer of the neural network and transform an output of a hidden layer of the neural network into multiple values that sum to a total value of one.)
Send the normalized one or more vectors to an aggregator (In paragraph [0111], Leeman-Munk said the processor concatenates all the numerical vector representations for each character into a single vector. In paragraph [0112], Leeman-Munk discloses that the processor transmits the single vector to a neural network. The neural network can be the neural network of the normalizer or the neural network of the flagger. In an example where the processor determined the numerical vector representations using the normalizer, the processor can transmit the single vector to the normalizer. The neural network of the normalizer can use a single vector as an input at an input layer.)

PASSBAN and Leeman-Munk are analogous pieces of art because both references concern the neural network that works across multiple domains. Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify PASSBAN, with using special “unique token” to represent domain/context information and encoding that token inro an embedding vector as taught by PASSBAN, with normalizing using neural network to determine version of the noncanonical communication based on the multiple values as taught by Leeman-Munk. The motivation for doing so would have been to improve performance of the trained task for data samples across different domains (See[0005] of Leeman-Munk).
Claims 21 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over PASSBAN in view of Leeman-Munk and further in view of JIN et al. (US Pub No.: 20220129497 A1), hereinafter referred to as JIN.

Regarding claim 21,  PASSBAN in view of Leeman-Munk disclose the element of claim 1. PASSBAN in view of Leeman-Munk do not explicitly disclose:
The method of claim 1, wherein the first piece of metadata is a shop identifier or a brand identifier
However, JIN disclose the limitation (In paragraph [0069], JIN discloses that a product identifier may be data that uniquely identifies a product stored in a database.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention having the teachings of PASSBAN in view of Leeman-Munk before them, to include JIN’s , with receiving image data representing an image, the image being associated with a product identifier as taught by JIN. The motivation for doing so would have been to increase the computational power required in machine learning (See[0078] of JIN).

Regarding claim 23,  PASSBAN in view of Leeman-Munk disclose the element of claim 1. PASSBAN in view of Leeman-Munk do not explicitly disclose:
The method of claim 1, wherein the text to be classified comprises a product title
However, JIN disclose the limitation (In paragraph [0036], JIN disclose with an identifier on the package (e.g., a barcode, an image, a text).
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention having the teachings of PASSBAN in view of Leeman-Munk before them, to include JIN’s , with receiving image data representing an image, the image being associated with a product identifier as taught by JIN. The motivation for doing so would have been to increase the computational power required in machine learning (See[0078] of JIN).



Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over PASSBAN in view of Leeman-Munk and further in view of Bourdev et al. (US Pub No.: 20230018461 A1), hereinafter referred to as Bourdev.

Regarding claim 22,  PASSBAN in view of Leeman-Munk disclose the element of claim 21. PASSBAN in view of Leeman-Munk do not explicitly disclose:
The method of claim 1, wherein each encoder associated with a respective one of the two or more metadata text threads has a kernel size of one 
However, Bourdev disclose the limitation (In paragraph [0060], Bourdev disclose one encoded filter associated with content (e.g., text, image, audio, video, etc.))
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention having the teachings of PASSBAN in view of Leeman-Munk before them, to include Bourdev’s encoding and decoding to increase computing capability as taught by Bourdev. The motivation for doing so would have been to significantly improve the model results (e.g. the process may also be referred to as convergence) (See [0059] of Bourdev)

Response to Arguments
Applicant's arguments filed on 09/05/2025 have been fully considered, and in part are persuasive.

Pertaining to Rejection under 101
Claims 1-3, 5, 11-13, 15 and 20-23 are withdrawn under 35 USC § 101.



Pertaining to Rejection under 103
Applicant’s arguments in regard to the examiner’s rejections under 35 USC 103 are moot in view of the new grounds of rejection.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EVEL HONORE whose telephone number is (703)756-1179. The examiner can normally be reached Monday-Friday 8 a.m. -5:30 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela D Reyes can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

EVEL HONORE
Examiner
Art Unit 2142



/Mariela Reyes/Supervisory Patent Examiner, Art Unit 2142
Read full office action
Prosecution Timeline

Show 5 earlier events
May 14, 2025
Examiner Interview Summary
Jun 04, 2025
Final Rejection — §103
Aug 22, 2025
Interview Requested
Aug 28, 2025
Examiner Interview Summary
Aug 28, 2025
Applicant Interview (Telephonic)
Sep 05, 2025
Request for Continued Examination
Sep 19, 2025
Response after Non-Final Action
Feb 04, 2026
Non-Final Rejection — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/399,470
Patent 12566942
System and Method For Generating Parametric Activation Functions
4y 6m to grant Granted Mar 03, 2026
17/484,623
Patent 12547946
SYSTEMS AND METHODS FOR FIELD EXTRACTION FROM UNLABELED DATA
4y 4m to grant Granted Feb 10, 2026
17/687,918
Patent 12547906
METHOD, DEVICE, AND PROGRAM PRODUCT FOR TRAINING MODEL
3y 11m to grant Granted Feb 10, 2026
17/189,160
Patent 12536156
UPDATING METADATA ASSOCIATED WITH HISTORIC DATA
4y 11m to grant Granted Jan 27, 2026
17/331,332
Patent 12406483
ONLINE CLASS-INCREMENTAL CONTINUAL LEARNING WITH ADVERSARIAL SHAPLEY VALUE
4y 3m to grant Granted Sep 02, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
48%
Grant Probability
81%
With Interview (+33.8%)
4y 1m (~4m remaining)
Median Time to Grant
High
PTA Risk
Based on 21 resolved cases by this examiner. Grant probability derived from career allowance rate.