Last updated: May 29, 2026

Application No. 18/216,237

USER INTENT LEARNING WITH SESSION SEQUENCE DATA

Non-Final OA §103

Filed

Jun 29, 2023

Examiner

CHEN, ALAN S

Art Unit

2125

Tech Center

2100 — Computer Architecture & Software

Assignee

Microsoft Technology Licensing, LLC

OA Round

1 (Non-Final)

Interview Optional

— +6.3% interview lift. Interview lift (+6.3%) is below the 15.0% threshold. A written response is recommended.

Based on 1134 resolved cases, 2023–2026

Examiner Intelligence

CHEN, ALAN S View full profile →

Grants 91% — above average

Career Allowance Rate

1033 granted / 1134 resolved

+36.1% vs TC avg

Moderate +6% lift

Without

With

+6.3%

Interview Lift

resolved cases with interview

Typical timeline

2y 9m

Avg Prosecution

22 currently pending

Career history

1152

Total Applications

across all art units

Statute-Specific Performance

§101

8.1%

-31.9% vs TC avg

§103

30.8%

-9.2% vs TC avg

§102

42.6%

+2.6% vs TC avg

§112

12.5%

-27.5% vs TC avg

Black line = Tech Center average estimate • Based on career data from 1134 resolved cases

Office Action

§103

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: ‘SYSTEM AND METHOD FOR PREDICTING USER INTENT VIA DUAL-TOKENIZER TRANSFORMER PROCESSING OF HETEROGENEOUS SESSION SEQUENCES’.

The disclosure is objected to because of the following informalities:
On the title page, inventor “Lingie Weng” appears to be misspelling of ‘Lingjie Weng’
In ¶18…”a single machine learning model is rare, but even those that do utilize an aggregation approach” should be, ‘a single machine learning model is rare, but even one that does utilize an aggregation approach’
In ¶20…”heterogenous items/domains” should be, ‘heterogeneous items/domains”
Appropriate correction is required.

Claim Objections
Claim 6 is objected to because of the following informalities: extra period at the end of the claim sentence.  Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.CO

Claims 1-3, 8-11 and 16-19 are rejected under 35 USC 103 as being unpatentable over Behavior Sequence Transformer for E-commerce Recommendation in Alibaba to Chen et al. (hereinafter Chen) in view of TransAct: Transformer-based Realtime User Action Model for Recommendation at Pinterest to Xia et al. (hereinafter Xia).

Per claim 1, Chen discloses A system comprising:
	a non-transitory computer-readable medium having instructions stored thereon, which, when executed by a processor (Section 3.1 and Section 5…Behavior Sequence Transformer  (BST) implemented with Python 2.7 and TensorFlow 1.4, deployed at Taobao serving hundreds of millions of users, necessitating a computer system executing stored instructions, “Our model is implemented with Python 2.7 and Tensorflow 1.4, and the “Adagrad” is chosen as the optimizer… we also present the detail of deploying the proposed model in production environment at Taobao, which provides recommendation service for hundreds of millions of users in China”), cause the system to perform operations comprising:
	identifying one or more actions performed by a user on a plurality of items across multiple sessions between the user and an online portal (Fig. 1 and Section 2… identifying user click actions on items at Taobao which is an online portal, where behavior sequence S(u) = {v1 ,v2, … ,vn} spans 8 days of behaviors across multiple sessions, “In the rank stage, we model the recommendation task as Click-Through Rate (CTR) prediction problem, which can be defined as follows: given a user’s behavior sequence S(u) = {v1,v2, ...,vn} clicked by a user u, we need to learn a function, F, to predict the probability of u clicking the target item vt , i.e., the candidate one. Other Features include user profile, context, item, and cross features”; Section 3.1…. 8 days of user behavioral data on Taoboa online portal spans multiple sessions…”The dataset is constructed from the log of Taobao App 2.  We construct an offline dataset based on users’ behaviors in eight days. We use the first seven days as training data, and the last day as test data”), the plurality of items including items of different item type (Section 2.1 and Table 1...items include different categories via category_id (Table 1), “In our scenarios, there are various features, like the user profile features, item features, context features, and the combination of different features, i.e., the cross features1. Since this work is focused on modeling the behavior sequence with transformer, we denote all these features as “Other Features” for simplicity, and give some examples in Table 1”);	
	creating a user session sequence data structure, containing identifications of the one or more actions and the plurality of items, organized in order of when the one or more actions were performed (Fig. 1 and Sections 2, 2.1 …a temporally ordered user session click behavior sequence data structure is created, e.g., S(u) = {v1,v2, ...,vn}, with item identifications (item_id, category_id) as well as positional encoding using timestamps pos(vi) = t(vt) − t(vi), “As shown in Figure 1, we use two types of features to represent an item, “Sequence Item Features”(in red) and “Positional Features” (in dark blue), where “Sequence Item Features” include item_id and category_id….we add the “position” as an input feature of each item in the bottom layer before it is projected as a low-dimensional vector. Note that the position value of item vi is computed as pos(vi ) = t(vt ) −t(vi ), where t(vt) represents the recommending time and t(vi) the timestamp when user click item vi”);
…
passing the user session sequence data structure to a sequence encoder (Fig. 1…Embedding Layer), which embeds each token in the user session sequence data structure to a vector embedding (Fig. 1 and Sections 2.1-2.2…embedding layer creates embedding matrix WV converting each item to a low-dimensional vector, “an item tends to have hundreds of features, while it is too expensive to choose all to represent the item in a behavior sequence.  As introduced in our previous work [14], the item_id and category_id are good enough for the performance, we choose these two as sparse features to represent each item in embedding the user’s behavior sequences.  The “Positional Features” corresponds the following “positional embedding”. Then for each item, we concatenate Sequence Item Features and Positional Features, and create an embedding matrix WV”), the vector embedding including a set of coordinates in an n-dimensional space (Section 2.1…“create an embedding matrix WV ∈ R|V |×dv , where dV is the dimension size of the embedding, and |V | is the number of items”), wherein distance between vector embeddings in the n-dimensional space is indicative of similarity of data represented by corresponding tokens (Section 2.1… distance between embedded item vectors in embedding space reflects item similarity);
	passing the vector embeddings to a neural network (Fig. 1…Transformer Layer has “self-attention” layer and “feed-forward” neural network) to produce a final
representation (Section 2.3…passing Transformer output + Other Features through a three layer MLP  with LeakyReLU to produce a final representation, “By concatenating the embeddings of Other Features and the output of the Transformer layer applying to the target item, we then use three fully connected layers to further learn the interactions among the dense features”); and 
	feeding the final representation to a machine learning model trained to make a prediction for the online portal (Section 2.3 and Equation 5… applies sigmoid to predict CTR for Taobao (online portal), machine learning model trained with cross-entropy loss on click/non-click labels, “To predict whether a user will click the target item vt ,we model it as a binary classification problem, thus we use the sigmoid function as the output unit. To train the model, we use the cross-entropy loss…(5)…p(x) is the output of the network after the sigmoid unit, representing the predicted probability of sample x being clicked”).

Chen does not expressly disclose, but with Xia does teach:
using a first tokenizer to modify the user session sequence data structure to encode each of the one or more actions to a token unique to a corresponding action type (Xia: Fig. 2 and Sections 3-3.1…TransAct uses a dedicated action type embedding table that encodes each user action (click, repin, hide) to a unique learned embedding vector, “we introduce TransAct, our realtime-batch hybrid ranking model…we model the recommendation task as a pointwise multi-task prediction problem, which can be defined as follows: given a user 𝑢 and a pin 𝑝, we build a function to predict the probabilities of user 𝑢 performing different actions on the candidate pin 𝑝. The set of different actions contains both positive and negative actions, e.g. click, repin2 and hide”; Xia: Section 3.3….the embedding vector is a separate encoding step applied to each entry in the sequence to represent the action type, “The user action type sequence is then projected to a user action embedding matrix 𝑾𝑎𝑐𝑡𝑖𝑜𝑛𝑠”; Fig. 3…”action type embedding”); 
using a second tokenizer to modify the user session sequence data structure to encode each of the one or more items to a token unique to a corresponding item type (Xia: Section 3. …TransAct uses a separate PinSage embedding table (a learned GraphSage-based representation) to encode each engaged pin (item) to a dense vector that captures the item’s characteristics and type, this item encoding is performed as a distinct step from the action-type encoding, forming a dual-encoding architecture, “…the content of pins in the user action sequence is represented by PinSage embeddings [38]. Therefore, the content of all pins in the user action sequence is a matrix 𝑾𝑝𝑖𝑛”; Fig. 3…”action pin embedding”);

Chen and Xia are analogous art because they are from the same field of endeavor in transformer-based sequential user behavior modeling for recommendation systems deployed on large-scale online platforms (e-commerce and content discovery, respectively). Both references apply self-attention/Transformer architectures to user behavior sequences to predict user engagement (CTR prediction in Chen; click/repin/hide prediction in Xia), both operating in the domain of industrial-scale recommendation systems serving hundreds of millions of users on online portals (Taobao for Chen and Pinterest for Xia).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to incorporate the two tokenizers of Xia that generate embedding matrices Wactions and Wpin with Chen’s BST architecture.
The suggestion/motivation for doing so would have been Chen’s BST models user behavior sequences, but treats all behaviors as a single action type (clicks only) and does not separately encode the type of action or the type of item with distinct tokenization steps. Xia’s TransAct explicitly identifies this as a limitation: BST does not distinguish between the importance of action type because it does not encode action embeddings separately (Xia: Section 1…” User embedding features are often generated as batch features (e.g. generated daily), which are cost-effective to serve across multiple applications with low latency. The limitation of existing sequential recommendation is that they either only use realtime user actions, or only use a batch user representation learned from long-term user action history”). Xia’s TransAct demonstrates that adding separate action type encoding and item encoding outperforms BST in both offline and online experiments (Xia: Section 4). A person of ordinary skill in the art would recognize that the dual-encoding approach (separate tokenization of actions and items) directly addresses a known shortcoming of the BST architecture.

	Per claim 2, Chen combined with Xia discloses claim 1, Chen further disclosing the neural network (Fig. 1…BST has neural network) is a transformer (Fig. 1…”Transformer Layer”; Section 2.2…”Transformer Layer”) including a self-attention mechanism (Fig. 1…”Multi-Head Self-Attention”; Section 2.2…”Self-attention layer”) and a feed-forward neural network (Fig. 1…”Feed Forward”; Section 2.2…”Point-wise Feed-Forward Networks”), the self-attention mechanism acting to weight importance of the tokens (Section 1…using self-attention to "capture the dependency among words... and... items", being weighting importance) and the feed-forward neural network applying non-linear transformations to each token's representation from the self-attention mechanism (Section 2.2…The FFN includes LeakyReLU to "enhance the model with non-linearity").

Per claim 3, Chen combined with Xia discloses claim 1, Chen combined with Xia further disclosing the sequence encoder includes a first embedding layer corresponding to actions (Xia: Section 3.3…trainable embedding tables to project action types construed as the first embedding layer, "we use trainable embedding tables to project action types to low-dimensional vectors.  The user action type sequence is then projected to a user action embedding matrix 𝑾𝑎𝑐𝑡𝑖𝑜𝑛") and a second embedding layer corresponding to items (Chen: Fig. 1 and Section 2.1…the "Embedding Layer" represents embedding of items, "Then for each item, we concatenate Sequence Item Features and Positional Features, and create an embedding matrix WV").  The rationale to combine the teachings of Xia with Chen are the same as provided in the parent claim.

Per claim 8, Chen combined with Xia discloses claim 3, further disclosing the sequence encoder includes a third embedding layer corresponding to user feature (Section 2.1… the "Embedding Layer" as a unified component that processes all inputs, "The first component is the embedding layer, which embeds all input features... Other Features include user profile... For these features, we create an embedding matrix..." where the embedding matrix Wo is created specifically for user profile features construed as the "third embedding layer" that is distinct from the action and item layers), and wherein the operations further comprise passing user features corresponding to the user to the third embedding layer for embedding in the vector embedding (Section 2.1 and Table 1… specific user features are listed, including "gender", "age", "city", and "tag", these being features that are passed to the embedding layer to be projected into vectors).

Claims 9-11 and 16 are substantially similar in scope and spirit to claims 1-3 and 8, respectively.  Therefore, the rejections of claims 1-3 and 8 are applied accordingly.

Claims 17-19 are substantially similar in scope and spirit to claims 1-3, respectively.  Therefore, the rejections of claims 1-3 are applied accordingly.

Allowable Subject Matter
Claims 4-7, 12-15 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is the statement of reasons for the indication of allowable subject matter:  The prior art disclosed by the applicant and cited by the Examiner fail to teach or suggest, alone or in combination, all the limitations of the independent and intervening claims (claims 3, 11 and 19), further including the particular notable limitations provided below:
Claims 4-5, 12-13 and 20: identifying one or more services in which the one or more actions were performed; wherein the creating includes creating a session sequence data structure containing identifications of the one or more services, the one or more actions and the plurality of items, organized in order of when the one or more actions were performed; wherein the first tokenizer further modifies the user session sequence data structure to encode each combination of service and action to a token unique to a corresponding combination of service and action type.

Claims 6-7, 14 and 15: identifying one or more actors associated with the one or more items; and wherein the creating includes creating a session sequence data structure containing identifications of the one or more actions, the plurality of items, and the one or more actors, organized in order of when the one or more actions were performed.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Patents and/or related publications are cited in the Notice of References Cited (Form PTO-892) attached to this action to further show the state of the art with respect to predicting user intent using dual-token transformers.
		
	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALAN CHEN whose telephone number is (571)272-4143. The examiner can normally be reached M-F 10-7.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ALAN CHEN/Primary Examiner, Art Unit 2125

Read full office action

Prosecution Timeline

Jun 29, 2023

Application Filed

Feb 12, 2026

Non-Final Rejection mailed — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/559,163

Patent 12632725

NEURAL NETWORK PROCESSING

4y 4m to grant Granted May 19, 2026

18/075,521

Patent 12632796

TRAINING MACHINE LEARNING MODELS TO PREDICT CHARACTERISTICS OF ADVERSE EVENTS USING INTERMITTENT DATA

3y 5m to grant Granted May 19, 2026

18/110,830

Patent 12632757

FIRST-QUANTIZATION BLOCK ENCODING FOR QUANTUM EMULATION

3y 3m to grant Granted May 19, 2026

17/798,038

Patent 12626090

HIERARCHICAL NEUROMORPHIC SENSOR ARRAY WITH INTEGRATED LEARNING FOR PHYSICOCHEMICAL PROPERTY PREDICATION

3y 9m to grant Granted May 12, 2026

17/843,801

Patent 12626175

SPECTRAL CLUSTERING OF GRAPHS ON FAULT TOLERANT AND NOISY QUANTUM DEVICES

3y 11m to grant Granted May 12, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.

Typically takes 5-10 seconds — AI-generated, attorney review required before filing

Prosecution Projections

1-2

Expected OA Rounds

91%

Grant Probability

97%

With Interview (+6.3%)

2y 9m (~0m remaining)

Median Time to Grant

Low

PTA Risk

Based on 1134 resolved cases by this examiner. Grant probability derived from career allowance rate.