Last updated: May 29, 2026
Application No. 17/735,116
SYSTEM AND METHOD USING ATTENTION LAYERS TO ENHANCE REAL TIME BIDDING ENGINE

Non-Final OA (§101, §103)
Filed: May 03, 2022
Priority: Apr 30, 2021 (provisional 63/182,745)
Examiner: SHELDEN, BION A
Art Unit: 3685
Tech Center: 3600 (Transportation & Electronic Commerce)
Assignee: Zeta Global Corp.
OA Round: 7 (Non-Final)
Interview: Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 22% grant rate with a +19.3% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 316 resolved cases, 2023–2026
Examiner Intelligence

SHELDEN, BION A
Career Allowance Rate: 22% (69 granted / 316 resolved), -30.2% vs TC avg
Interview Lift: +19.3% (resolved cases with interview)
Avg Prosecution: 3y 11m (typical timeline); 30 currently pending
Total Applications: 363 (career history, across all art units)
Statute-Specific Performance

§101: 10.8% (-29.2% vs TC avg)
§103: 67.1% (+27.1% vs TC avg)
§102: 3.2% (-36.8% vs TC avg)
§112: 11.0% (-29.0% vs TC avg)
Office Action

DETAILED ACTION
Status of Claims
This is a non-final office action on the merits in response to the arguments and/or amendments filed on 14 April 2026 and the request for continued examination filed on 14 April 2026. 
Claims 1, 10, and 19 are amended.
Claims 1-7, 9-16, and 18-21 are currently pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 14 April 2026 has been entered.

Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. Applicant has not complied with one or more conditions for receiving the benefit of an earlier filing date under 35 U.S.C. 119(e) as follows:
The later-filed application must be an application for a patent for an invention which is also disclosed in the prior application (the parent or original nonprovisional application or provisional application). The disclosure of the invention in the parent application and in the later-filed application must be sufficient to comply with the requirements of 35 U.S.C. 112(a) or the first paragraph of pre-AIA 35 U.S.C. 112, except for the best mode requirement. See Transco Products, Inc. v. Performance Contracting, Inc., 38 F.3d 551, 32 USPQ2d 1077 (Fed. Cir. 1994).
The disclosure of the prior-filed application, Provisional Application No. 63/182745, fails to provide adequate support or enablement in the manner provided by 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph for one or more claims of this application.
Claim 1 recites “identifying a plurality of event types in an online user journey, the plurality of event types including an email event, a click event, and a website visit; assigning a plurality of encoders to the plurality of event types, each encoder being configured to process a corresponding event type in the plurality of event types”. Claims 10 and 19 recite similar language. The prior-filed application does not appear to disclose or support journey event types, identifying a series of journey event types, the listed types of journey event types, encoders, assigning encoders, or encoding event types using assigned encoders.
Additionally, Claim 1 recites “predicting an alignment vector for a local attention mechanism implemented in the one or more attention layers, using the alignment vector to select a subset of the one or more hidden states for consideration by the local attention mechanism.” Claims 10 and 19 recite similar language. The prior-filed application does not appear to disclose or support local attention. Thus one of ordinary skill in the art would not recognize the prior-filed application as supporting the identified limitations. Accordingly, claims 1-7, 9-16, and 18-21 are not entitled to the benefit of the prior application.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-7, 9-16, and 18-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Claim 1, which is representative of claims 10 and 19, recites: a system comprising: 
identifying a plurality of event types in an online user journey, the plurality of event types including an email event, a click event, and a website visit; 
assigning a plurality of encoders to the plurality of event types, each encoder being configured to process a corresponding event type in the plurality of event types;
encoding, using the plurality of encoders, the plurality of event types to generate an encoded vector for each event type, each encoded vector being representative of at least a portion of the online user journey relating to the corresponding event type;
aggregating the encoded vectors for each event type to create a plurality of encoded vectors, the plurality of encoded vectors including a first plurality of encoded vectors associated with impression events and a second plurality of encoded vectors associated with conversion events; 


generating a prediction using the 
The preceding recitation of the claim has had strikethroughs applied to the additional elements beyond the abstract idea to more clearly demonstrate the limitations setting forth the abstract idea. The remaining limitations describe a concept of processing event data and using a model based on processed event data to generate a prediction. This concept describes a mental process that a marketer should follow to estimate touchpoint outcomes similar to the “mental process that a neurologist should follow when testing a patient for nervous system malfunctions” given in MPEP 2106.04(a)(2)(II)(C) as an example of managing personal behavior in the methods of organizing human activity sub-grouping. As such, these limitations set forth a method of organizing human activity. Alternatively, this concept is analogous to the examples of “observation”, “evaluation”, and “judgement” given in MPEP 2106.04(a)(2)(III). Further, this concept as claimed does not require a scale of data beyond the mental faculties of a human being, and the operations of the abstract idea can be practically performed in the human mind. As such, these limitations are determined to recite a mental process. Therefore the claims are determined to recite an abstract idea.

MPEP 2106, reflecting the 2019 PEG, directs examiners at Step 2A Prong Two to consider whether the additional elements of the claims integrate a recited abstract idea into a practical application.
Claim 1 recites one or more processors and a memory. Claim 19 recites the additional element of a non-transitory computer-readable medium. These additional elements are recited at an extremely high level of generality, and are interpreted as generic computing devices used to implement the abstract idea. Per MPEP 2106.05(f), implementing an abstract idea on a generic computing device does not integrate an abstract idea into a practical application in Step 2A Prong Two, similar to how the recitation of the computer in the claim in Alice amounted to mere instructions to apply the abstract idea on a generic computer.  As such, these additional elements do not integrate the abstract idea into a practical application.
The claims further recite an additional element of training a plurality of network layers of a machine learning model, the training of the plurality of network layers comprising: identifying a subset of encoded vectors; and generating one or more hidden states based on the one or more encoded vectors and training one or more attention layers of the machine learning model, the training comprising: feeding vector representations included in the one or more hidden states, predicting an alignment vector for a local attention mechanism implemented in the one or more attention layers, using the alignment vector to select a subset of the one or more hidden states for consideration by the local attention mechanism, and determining a vector by combining the selected subset of hidden states based on the trained attention weights of the one or more attention layers. This additional element reflects no technological improvement, as indicated by the fact that the specification does not provide a technical explanation of how to implement this functionality (MPEP 2106.05(a), “If it is asserted that the invention improves upon conventional functioning of a computer, or upon conventional technology or technological processes, a technical explanation as to how to implement the invention should be present in the specification.”). This additional element does not require any particular machine, as local attention based neural networks can be implemented and trained on a variety of computing devices. This additional element clearly does not effect a transformation of any article. As the additional element only generally describes the training of a generic local attention based neural network, this additional element does not meaningfully limit the implementation of the abstract idea, but rather only generally links the abstract idea to an environment of local attention based neural networks. As such, this additional element does not integrate the abstract idea into a practical application.
There are no further additional elements. When considered as a combination, the preceding additional elements only generally link the abstract idea to an environment of local attention based neural networks. As such, the combination of additional elements does not integrate the abstract idea into a practical application. Therefore the claims are determined to be directed to an abstract idea. 

At Step 2B of the Mayo/Alice analysis, examiners are to consider whether the additional elements amount to significantly more than the abstract idea.
As previously noted, the claims recite additional elements which may be interpreted as generic computing devices used to implement the abstract idea. However, per MPEP 2106.05(f), implementing an abstract idea on a generic computing device does not add significantly more in Step 2B, similar to how the recitation of the computer in the claim in Alice amounted to mere instructions to apply the abstract idea on a generic computer. As such, this additional element does not amount to significantly more.
As previously noted, the claims recite an additional element of a neural network using local attention. Chiu et al. (US 2020/0026760 A1) (“it has become standard to use an attention mechanism, which treats the hidden state sequence as a (soft-)addressable memory whose entries are used to compute the context vector c.sub.i” [0043]) expressly notes the conventionality of attention mechanisms. Further, Ganu et al. (US 10380236 B1) (“In a local attention mechanism, a small subset of source positions is chosen for each output label. The local attention mechanism may selectively focus on a small window of context and is differentiable” Column 10, Lines 56-59), Bellegarda (US 2020/0104369 A1) (“The attention mechanism is, for example, a global, local, or self-attention mechanism” [0262]), Meng et al. (US 2020/0335108 A1) (“the local attention selectively focuses on a small window of context centered at the current time” [0024]), Zhang et al. (US 2021/0042475 A1) (“The attention layer may include a local attention model … The local attention model may predict a single aligned position for the current word being translated, and a window centered around the single aligned position to determine a context vector” [0109]), and Mohanty et al. (US 2023/0117224 A1) (“when only a portion of the hidden states are accessed, the attention mechanism is referred to as a local attention mechanism” [0054]) collectively demonstrate that local attention techniques were conventional prior to the priority date of the claimed invention. Further, Examiner notes that the disclosure describes the use of local attention in a manner consistent with reference to a well-known technique not requiring description (e.g., at a highly functional level and without providing any technical details regarding implementation). Therefore the Examiner concludes that this additional element is well-understood, routine, and conventional, and as such does not amount to significantly more than the abstract idea. 
There are no further additional elements. When considered as a combination, the preceding additional elements only generally link the abstract idea to an environment of local attention based neural networks. As such, the combination of additional elements does not amount to significantly more than the abstract idea. Therefore, when considered individually and as a combination, the additional elements of the independent claims do not amount to significantly more than the abstract idea. Thus the independent claims are not patent eligible.

Claims 2-6, 9, 11-16, 18, 20, and 21 further narrow the abstract idea, but the claims continue to set forth an abstract idea, albeit a narrowed one. Claims 2-4 and 11-13 recite no further additional elements. The previously identified additional elements, individually and as a combination, do not integrate the narrowed abstract idea into a practical application and do not amount to significantly more than the narrowed abstract idea for the same reasons given above. Claims 5, 6, 9, 14, 15, 18, 20, and 21 further describe a previously discussed additional element. However, the further described additional element, individually and in combination with the other additional elements, still does not either integrate the narrowed abstract idea into a practical application or amount to significantly more than the narrowed abstract idea for the same reasons given above. Claims 7 and 16 recite the additional element of LSTM units. This additional element only generally links the abstract idea to a technological environment involving LSTM neural network units. When considered in combination with the previously identified additional elements, the combination of additional elements only generally links the abstract idea to an environment of LSTMs and local attention based neural networks. As such, this additional element, individually and in combination with the prior identified additional elements, does not integrate the abstract idea into a practical application. At Step 2B, Examiner notes that Verma et al. (US 2019/0197397 A1) demonstrates (“conventional LSTM networks” [0013]) that LSTM units were conventional before the priority date of the claimed invention. And when considered in combination with the previously identified additional elements, the combination of additional elements only generally links the abstract idea to an environment of LSTMs and local attention based neural networks. As such, this additional element, individually and in combination with the prior identified additional elements, does not amount to significantly more than the abstract idea. Thus as the dependent claims remain directed to a judicial exception, and as the additional elements of the claims do not amount to significantly more, the dependent claims are not patent eligible.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-7, 9-16, and 18-21 are rejected under 35 U.S.C. 103 as being unpatentable over Yan et al. (US 2019/0278378 A1) in view of Khan et al. (MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering) and Ganu et al. (US 10380236 B1). 

Regarding Claims 1, 10, and 19: Yan discloses a system comprising: one or more processors (See at least [0208]); and a memory storing instructions that, when executed by at least one processor in the one or more processors, cause the at least one processor to perform (See at least [0208]) operations comprising:
identifying a plurality of event types in an online user journey, the plurality of event types including an email event, a click event, and a website visit (Using the touchpoint sequence and the conversion information, the deep learning attribution system 104 can generate one or more touchpoint paths. As described above, a touchpoint path includes touchpoint sequence for a user combined with a conversion indication (e.g., a conversion or non-conversion). To illustrate, FIG. 3 shows various touchpoint paths that include touchpoints (e.g., “DI” or display impression, “DC” or display click, “ES” or email sent, “EO” or email opened, “EC” or email clicked, “FT” or free trial sign-up, and “PS” or paid search) as well as conversion indicators (e.g., “C” or conversion and “NC” or non-conversion). See at least [0080]);
encoding, using an encoder, the plurality of event types to generate an encoded vector for each event type, each encoded vector being representative of at least a portion of the online user journey relating to the corresponding event type (In one or more embodiments, the touchpoint encoding layer 402 encodes the touchpoints using one-hot encoding representation. For example, the touchpoint encoding layer 402 creates a vector that includes entries for each touchpoint type. Each entry is initialized to zero (i.e., 0). To encode a touchpoint in a training touchpoint sequence, the touchpoint encoding layer 402 changes the entry corresponding to the touchpoint to one (i.e., 1) while leaving the other entries at zero. In this manner, the touchpoint encoding layer 402 converts each touchpoint in a training touchpoint sequence into a separately encoded vector. See at least [0096]);
aggregating the encoded vectors for each event type to create a plurality of encoded vectors, the plurality of encoded vectors including a first plurality of encoded vectors associated with impression events and a second plurality of encoded vectors associated with conversion events (the touchpoint encoding layer 402 outputs encoded touchpoint vectors 404, shown as x.sub.1, x.sub.2, . . . x.sub.T in FIG. 4A, which is a sequential time series of the training touchpoint sequence. See at least [0097]. Also: Using the encoded touchpoint vectors 404, the deep learning attribution system 104 can continue to train the touchpoint attribution attention neural network 400a. In particular, in various embodiments, the deep learning attribution system 104 performs the act 404 of providing the encoded touchpoint vectors as input to the embedding layer 406. See at least [0098]. Also: the touchpoint encoding layer 402 converts each touchpoint in a training touchpoint sequence into a separately encoded vector. See at least [0096]. Also: FIG. 3 shows various touchpoint paths that include touchpoints (e.g., “DI” or display impression, “DC” or display click, “ES” or email sent, “EO” or email opened, “EC” or email clicked, “FT” or free trial sign-up, and “PS” or paid search) as well as conversion indicators (e.g., “C” or conversion and “NC” or non-conversion). See at least [0080]);
training a plurality of network layers of a machine learning model based on the first plurality of encoded vectors associated with the impression events, the training of the plurality of network layers comprising: identifying a subset of the first plurality of encoded vectors that includes only one or more encoded vectors associated with one or more impression events that contributed to one or more conversions of the online user journey, and generating one or more hidden states based on the one or more encoded vectors associated with the one or more impression events (As shown, the deep learning attribution system 104 feeds the training touchpoint paths 434 into the touchpoint attribution attention neural network 400a as part of training. See at least [0095]. Also: the deep learning attribution system 104 generates a touchpoint path that includes a training touchpoint sequence of touchpoint interactions between a given user and an entity. In addition, the touchpoint path includes a conversion indicator of whether (and when) the training touchpoint sequence resulted in a conversion. See at least [0094]. Also: The deep learning attribution system 104 can use the dense vectors 408 output from the embedding layer 406 as input to the RNN/LSTM layer 410. As shown, the RNN/LSTM layer 410 includes a LSTM neural network, which is a type of RNN network. See at least [0101]. Also: More particularly, in various embodiments, the RNN/LSTM layer 410 transforms the dense vectors 408 to create hidden state vectors 412, shown as h.sub.1, h.sub.2, . . . h.sub.T in FIG. 4A, based on the dense vectors 408 (e.g., e.sub.t) and the hidden state vectors from previous touchpoints in a training touchpoint sequence (e.g., h.sub.t−1). See at least [0103] and Fig. 4A). 
training one or more attention layers of the machine learning model, each attention layer in the one or more attention layers corresponding to an impression event in the online user journey, the training comprising: feeding vector representations included in the one or more hidden states for one or more impression events and a conversion event into each of the one or more attention layers, predicting an alignment vector for an attention mechanism implemented in the one or more attention layers, using the alignment vector to select a subset of the one or more hidden states for consideration by the attention mechanism, and determining a context vector by combining the selected subset of the hidden states based on trained attention weights of the one or more attention layers (As shown in FIG. 4A, the deep learning attribution system 104 trains the attention layer 414 by providing the hidden state vectors 412 (i.e., h.sub.1, h.sub.2, . . . , h.sub.T) to the attention layer 414. Using each of the hidden state vectors 412 in combination with a touchpoint context vector 416 (i.e., u), the deep learning attribution system 104 determines attention weights 418 (i.e., a.sub.1, a.sub.2, . . . , a.sub.T) for each touchpoint in the training touchpoint sequence. In general, the attention weights 418 are fractional values ranging between zero and one (i.e., 0-1). In some embodiments, the attention weights 418 together sum to one or near one. In alternative embodiments, the attention weights 418 do not add to one. See at least [0109]. Also: In addition, the deep learning attribution system 104 trains the attention layer 414 by combining the attention weights 418 for each touchpoint with the corresponding hidden state vectors 412 to obtain weighted hidden state vectors 420 (i.e., a.sub.1h.sub.1, a.sub.2h.sub.2, . . . , a.sub.Th.sub.T). Each of the weighted hidden state vectors 420 reflects a more accurate representation of a touchpoint's conversion significance with respect to a user's conversion given the specific sequence of touchpoints. See at least [0112]. Also: the deep learning attribution system 104 can aggregate the representation of the weighted hidden state vectors 420 to form a touchpoint sequence representation 422 (i.e., s). See at least [0113]); and
generating a prediction using the trained machine learning model that comprises the trained plurality of network layers and the trained one or more attention layers (As shown in FIG. 4A, the deep learning attribution system 104 classifies the touchpoint sequence representation 422. For instance, the deep learning attribution system 104 feeds the touchpoint sequence representation 422 to the classification layer 424, which predicts whether the input training touchpoint sequence results in a conversion based on the touchpoint sequence representation 422 (e.g., based on the weighted combination of all touchpoint input states). See at least [0118]. And Fig. 4A. Also: the classification layer 424 transforms the touchpoint sequence representation 422 to a number ranging between zero and one (i.e., 0-1). The transformed number indicates the probability that the training touchpoint sequence resulted in conversion. In some embodiments, the transformed number is a conversion prediction 426 (i.e., p). In alternative embodiments, the conversion prediction 426 includes a touchpoint (and/or the media channel to trigger the touchpoint) that, when added to the input training touchpoint sequence, has the highest conversion probability. See at least [0119]).
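
For orientation, the data flow in the Yan passages cited above (one-hot touchpoint encoding, attention weights over hidden states, a weighted aggregate, and a classification output between zero and one) can be sketched roughly as follows. This is an illustrative sketch only, not code from Yan or from the application. The function names, the dot-product scoring, the softmax normalization, and the numeric values are assumptions, and the one-hot vectors stand in for the hidden states that Yan derives from its embedding and LSTM layers.

```python
import math

# Touchpoint codes taken from the Yan passage quoted above (FIG. 3).
TOUCHPOINT_TYPES = ["DI", "DC", "ES", "EO", "EC", "FT", "PS"]


def one_hot(touchpoint):
    """Return a vector with a 1 at the touchpoint's index and 0 elsewhere."""
    vec = [0] * len(TOUCHPOINT_TYPES)
    vec[TOUCHPOINT_TYPES.index(touchpoint)] = 1
    return vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Normalize scores into weights in (0, 1) that sum to one (assumed form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Score each hidden state h_t against the context vector u, then return
    the attention weights a_t and the weighted sum s = sum_t a_t * h_t."""
    weights = softmax([dot(h, context) for h in hidden_states])
    dim = len(hidden_states[0])
    pooled = [sum(a * h[i] for a, h in zip(weights, hidden_states))
              for i in range(dim)]
    return weights, pooled


def conversion_probability(sequence_repr, class_weights, bias=0.0):
    """Logistic output layer mapping the sequence representation to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(dot(sequence_repr, class_weights) + bias)))


# Hypothetical touchpoint sequence; one-hot vectors stand in for h_t here.
path = ["DI", "ES", "EO", "EC"]
hidden = [[float(v) for v in one_hot(t)] for t in path]
u = [0.1, 0.2, 0.0, 0.5, 1.0, 0.3, 0.2]  # placeholder context vector
w = [0.2, 0.1, 0.0, 0.4, 0.9, 0.5, 0.3]  # placeholder classifier weights

attn, s = attention_pool(hidden, u)
p = conversion_probability(s, w)
```

In this sketch every hidden state receives a weight, which corresponds to attending over the whole sequence rather than over a selected subset.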

Yan does not explicitly disclose assigning a plurality of encoders to the plurality of event types, each encoder being configured to process a corresponding event type in the plurality of event types;
Khan teaches assigning a plurality of encoders to the plurality of data types, each encoder being configured to process a corresponding data type in a plurality of data types, and encoding using the plurality of encoders (“Our method starts with independent processing of modalities and the joint understanding happens at a later stage. Therefore, our method is one step forward toward better joint understanding of multiple modalities. We use separate BERT encoders to process each of the input modalities namely Q-BERT, V-BERT and S-BERT to process question (Q), video (V), and subtitles (S) respectively. Each BERT encoder takes an input source with question and candidate answer paired together.” See at least Page 2 and Figure 2). 
Yan provides an attention based neural network system which encodes user information to make predictions regarding user behavior, upon which the claimed invention’s use of type-based encoders can be seen as an improvement. However, Khan demonstrates that the prior art already knew of using data type specific encoders in attention based neural network systems. One of ordinary skill in the art could have applied the techniques of Khan to the system of Yan by incorporating encoders specific to each of Yan’s event types. Further, one of ordinary skill in the art would have recognized that such an application of Khan would have resulted in an improved system which would customize encoding processes according to each type of data, resulting in better use of multi-modal data types (Khan, at least page 2). As such the application of Khan would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Yan and the teachings of Khan. 
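
The combination discussed above, in which each event type is routed to its own encoder, might look roughly like the sketch below. It is a hedged illustration rather than Khan's BERT-based encoders or the claimed implementation: the event fields (`opened`, `clicked`, `position`, `pages`, `seconds`), the scaling constants, and all function names are hypothetical.

```python
from collections import defaultdict


def encode_email(event):
    # Hypothetical features: [opened flag, clicked flag]
    return [float(event.get("opened", False)), float(event.get("clicked", False))]


def encode_click(event):
    # Hypothetical feature: position of the clicked item, capped and scaled to 0..1
    return [min(event.get("position", 0), 10) / 10.0]


def encode_visit(event):
    # Hypothetical features: pages viewed and seconds on site, capped and scaled
    return [min(event.get("pages", 0), 20) / 20.0,
            min(event.get("seconds", 0), 600) / 600.0]


# One encoder assigned per event type.
ENCODERS = {"email": encode_email, "click": encode_click, "visit": encode_visit}


def encode_journey(events):
    """Route each event to the encoder for its type and group the encoded
    vectors by event type."""
    encoded = defaultdict(list)
    for event in events:
        encoder = ENCODERS[event["type"]]
        encoded[event["type"]].append(encoder(event))
    return dict(encoded)


journey = [
    {"type": "email", "opened": True, "clicked": False},
    {"type": "click", "position": 3},
    {"type": "visit", "pages": 5, "seconds": 300},
]
vectors_by_type = encode_journey(journey)
```

A dictionary keyed by event type keeps the assignment of encoders explicit, so adding an event type means registering one more function.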

Yan does not explicitly disclose local attention. 
Ganu teaches local attention (Then, given the decoder hidden state h.sub.t and the context vector c.sub.t, a concatenation layer may be employed to combine the information from h.sub.t and c.sub.t to produce an attentional vector, which is used to selectively focus on certain positions in the input text position. In a global attention mechanism, the decoding process attends to all tokens in the input text for each output label. This approach is computationally expensive and can potentially render it impractical to translate longer sequences. Thus, in some embodiments, a local attention mechanism may be used, to focus only on a small subset of the source tokens in the input text. See at least Column 8, Lines 33-44). 
Yan and Khan suggest an attention based neural network system, which differs from the claimed invention by the substitution of a local attention mechanism for Yan’s unspecified attention mechanism. Ganu demonstrates that the prior art already knew of using local attention mechanisms and that such attention was less computationally expensive and more practical (Ganu, See at least Column 8, Lines 33-44). One of ordinary skill in the art could have easily substituted Ganu’s local attention mechanism into the system of Yan and Khan, and further one of ordinary skill in the art would have recognized that such a substitution would have resulted in a less computationally expensive method of generating predictions. As such, the identified substitution and the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Yan and the teachings of Khan and Ganu.
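
To make the distinction concrete, a local attention step of the general kind described in the quoted references (predict an aligned position, then attend only to a small window of hidden states around it) could be sketched as follows. This is an assumption-laden illustration, not Ganu's method or the claimed one: the sigmoid-based position predictor, the dot-product scoring, the fixed half-width window, and the use of a single aligned position in place of the claimed alignment vector are all simplifications, and every name is hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict_aligned_position(query, v, length):
    """Map a learned score to an index in [0, length - 1] (assumed form)."""
    sig = 1.0 / (1.0 + math.exp(-dot(v, query)))
    return int((length - 1) * sig + 0.5)


def local_attention(hidden_states, query, aligned_pos, half_width):
    """Attend only to hidden states within half_width of aligned_pos."""
    lo = max(0, aligned_pos - half_width)
    hi = min(len(hidden_states), aligned_pos + half_width + 1)
    window = hidden_states[lo:hi]  # the selected subset of hidden states

    # Softmax over the window only; states outside it are never scored.
    scores = [dot(query, state) for state in window]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: weighted combination of the selected hidden states.
    dim = len(query)
    context = [sum(w * state[i] for w, state in zip(weights, window))
               for i in range(dim)]
    return (lo, hi), weights, context


# Placeholder hidden states (six time steps, two dimensions each).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
query = [1.0, 0.0]
v = [0.0, 0.0]  # placeholder predictor weights

pos = predict_aligned_position(query, v, len(hidden))
span, weights, context = local_attention(hidden, query, pos, half_width=1)
```

Because only the window is scored, the per-step cost depends on the window size rather than on the full sequence length, which is the computational saving the rejection attributes to local attention.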

Regarding Claims 2 and 11: Yan in view of Khan and Ganu teaches the above limitations. Additionally, Yan discloses wherein the prediction comprises an occurrence probability for at least one further event in the online user journey, and wherein the operations comprises: selecting a channel for distributing a piece of targeted content based on the occurrence probability (If the target touchpoint sequence does not include a conversion (e.g., a negative touchpoint path), the deep learning attribution system 104 can feed the target touchpoint sequence through the trained touchpoint attribution attention neural network and determine a conversion prediction for the target touchpoint sequence. As shown, the deep learning attribution system 104 provides 310 a conversion prediction based on the target touchpoint sequence using the trained touchpoint attribution attention neural network. See at least [0090]. Also: In providing a recommended touchpoint type the conversion prediction 514 can also indicate a media channel. For example, if the deep learning attribution system 104 recommends an email touchpoint, the conversion prediction 514 can indicate to send content to the target user via email. In another example, if the deep learning attribution system 104 recommends a display impression, the conversion prediction 514 can recommend one or more digital content media channels (e.g., browser, in-app, push notification) to utilize to best trigger the recommended touchpoint. Further, in some embodiments, the deep learning attribution system 104 can automatically send content to the target user via the one or more recommended media channels. See at least [0166]). 

Regarding Claims 3 and 12: Yan in view of Khan and Ganu teaches the above limitations. Additionally, Yan discloses determining an attribution probability based on an impression event associated with the prediction, the impression event being included in the impression events associated with the first plurality of encoded vectors; reserving a placement for targeted content at a location or domain based on the attribution probability; selecting a channel for distributing the targeted content based on an occurrence probability associated with the prediction; and distributing the targeted content at the placement via the selected channel (If the target touchpoint sequence does not include a conversion (e.g., a negative touchpoint path), the deep learning attribution system 104 can feed the target touchpoint sequence through the trained touchpoint attribution attention neural network and determine a conversion prediction for the target touchpoint sequence. As shown, the deep learning attribution system 104 provides 310 a conversion prediction based on the target touchpoint sequence using the trained touchpoint attribution attention neural network. See at least [0090]. Also: In providing a recommended touchpoint type the conversion prediction 514 can also indicate a media channel. For example, if the deep learning attribution system 104 recommends an email touchpoint, the conversion prediction 514 can indicate to send content to the target user via email. In another example, if the deep learning attribution system 104 recommends a display impression, the conversion prediction 514 can recommend one or more digital content media channels (e.g., browser, in-app, push notification) to utilize to best trigger the recommended touchpoint. Further, in some embodiments, the deep learning attribution system 104 can automatically send content to the target user via the one or more recommended media channels. See at least [0166]).

Regarding Claims 4 and 13: Yan in view of Khan and Ganu teaches the above limitations. Additionally, Yan discloses wherein the operations comprise: determining an attribution probability; and completing a transaction based on the attribution probability exceeding an attribution threshold (If the target touchpoint sequence does not include a conversion (e.g., a negative touchpoint path), the deep learning attribution system 104 can feed the target touchpoint sequence through the trained touchpoint attribution attention neural network and determine a conversion prediction for the target touchpoint sequence. As shown, the deep learning attribution system 104 provides 310 a conversion prediction based on the target touchpoint sequence using the trained touchpoint attribution attention neural network. See at least [0090]. Also: The conversion prediction 220 can identify a touchpoint that, if added to the second target touchpoint sequence 214 (e.g., the fourth touchpoint 212d) would have the highest probability of resulting in a conversion 206b. For example, in one or more embodiments, the deep learning attribution system 104 determines the conversion probability for each potential touchpoint that can be added to the second target touchpoint sequence 214. The deep learning attribution system 104 then utilizes the touchpoint with the highest conversion probability as the conversion prediction 220. See at least [0073]. Also: In some embodiments, if the highest conversion likelihood is below a predetermined conversion probability threshold (e.g., <50%), the deep learning attribution system 104 can repeat the process to identify additional touchpoints to add to the second target touchpoint sequence 214 to improve the likelihood of conversion. For example, upon adding a second email touchpoint, the second target touchpoint sequence 214 has a conversion probability of 40%. Further adding an in-app notification touchpoint further increases the conversion probability to 60%. See at least [0074]. Also: In identifying a potential touchpoint for the target user, the deep learning attribution system 104 can indicate one or more media channels. Indeed, the deep learning attribution system 104 can select a media channel that is most likely to result in a conversion. Thus, in some embodiments, the conversion prediction 220 includes which digital media channels to employ when serving digital content (either directly or indirectly) to a target user. See at least [0075]. Also: In providing a recommended touchpoint type the conversion prediction 514 can also indicate a media channel. For example, if the deep learning attribution system 104 recommends an email touchpoint, the conversion prediction 514 can indicate to send content to the target user via email. In another example, if the deep learning attribution system 104 recommends a display impression, the conversion prediction 514 can recommend one or more digital content media channels (e.g., browser, in-app, push notification) to utilize to best trigger the recommended touchpoint. Further, in some embodiments, the deep learning attribution system 104 can automatically send content to the target user via the one or more recommended media channels. See at least [0166]).
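For illustration only, the threshold-driven selection quoted from Yan at [0073]-[0074] can be sketched as a greedy loop: score every candidate touchpoint, append the one with the highest conversion probability, and repeat while the probability stays below the threshold. Everything named here is hypothetical: `predict_conversion` is a toy logistic scorer standing in for a trained network, and the weights, bias, and `max_added` cap are assumptions chosen so the example runs. It is not Yan's implementation.

```python
import math

# Hypothetical per-touchpoint weights; a stand-in for a trained model.
TOY_WEIGHTS = {"email": 0.4, "display": 0.2, "in_app": 0.8}
TOY_BIAS = -1.5


def predict_conversion(sequence):
    """Toy scorer: logistic function of the summed touchpoint weights."""
    score = TOY_BIAS + sum(TOY_WEIGHTS[t] for t in sequence)
    return 1.0 / (1.0 + math.exp(-score))


def recommend_touchpoints(sequence, candidates, threshold=0.5, max_added=5):
    """Greedily append the best-scoring touchpoint until the predicted
    conversion probability reaches the threshold (or the cap is hit)."""
    sequence = list(sequence)  # work on a copy, leave the input untouched
    added = []
    prob = predict_conversion(sequence)
    while prob < threshold and len(added) < max_added:
        # Pick the candidate that maximizes the predicted probability.
        best = max(candidates, key=lambda c: predict_conversion(sequence + [c]))
        sequence.append(best)
        added.append(best)
        prob = predict_conversion(sequence)
    return added, prob


added, prob = recommend_touchpoints(["email"], ["email", "display", "in_app"])
print(added, round(prob, 3))
```

With these toy weights the loop adds two touchpoints before the probability crosses 0.5, which mirrors the 40% then 60% progression in the quoted passage only in shape, not in value.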

Regarding Claims 5, 14, and 20: Yan in view of Khan and Ganu teaches the above limitations. Additionally, Yan discloses wherein the operations comprise: generating a training dataset that includes the one or more impression events contributed to the conversion of the online user journey; comparing events having an attribution probability above an attribution threshold to the one or more impression events contributed to the conversion of the online user journey using a loss function; and adjusting the plurality of trained weights of the plurality of network layers to minimize an error value generated by the loss function (As shown in relation to the embodiment of FIG. 3, the deep learning attribution system 104 performs an act 302 of generating training touchpoint paths based on user interactions. See at least [0076]. Also: As shown, the series of acts also includes an act 830 of modifying the touchpoint attribution attention neural network based on the conversion prediction. The act 830 can include modifying parameters of the touchpoint attribution attention neural network based on a comparison between the conversion prediction for the first training touchpoint sequence and the first conversion indication. In one or more embodiments, comparing the conversion prediction for the first training touchpoint sequence and the first conversion indication includes utilizing a loss function to determine a training loss based on the conversion prediction and the first conversion indication, and modifying the touchpoint attention layer based on the training loss. See at least [0217]. Also: the deep learning attribution system can employ a loss layer that includes a loss function or loss model to train the touchpoint attribution attention neural network. As used herein, the term “loss function” or “loss model” refers to a function that indicates training loss. In some embodiments, a machine-learning algorithm can repetitively train to minimize total overall loss. For example, the loss function determines an amount of loss with respect to a training touchpoint path by analyzing the conversion prediction and the conversion indication. The loss function then provides feedback, via back propagation, to one or more layers of the touchpoint attribution attention neural network to tune/fine-tune those layers. See at least [0047]). 
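As a hedged illustration of the quoted training language (compare a conversion prediction to a conversion indication with a loss function, then adjust weights to reduce the loss), the sketch below trains a single logistic unit with binary cross-entropy and gradient descent. The cited passages do not name a specific loss function or optimizer, so both are assumptions, as are the toy features, the learning rate, and every function name.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(p, y):
    """Binary cross-entropy between prediction p and label y (0 or 1)."""
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def train(features, labels, lr=0.5, epochs=200):
    """Full-batch gradient descent on a single logistic unit.
    Returns the learned weights, bias, and per-epoch mean loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    history = []
    n = len(features)
    for _ in range(epochs):
        total = 0.0
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            total += bce_loss(p, y)
            err = p - y  # gradient of BCE w.r.t. the pre-sigmoid score
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        # Move the parameters against the mean gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
        history.append(total / n)
    return weights, bias, history


# Toy data: the label (conversion) is 1 exactly when the first feature is 1.
X = [[1, 0], [0, 1], [1, 1], [0, 0]]
y = [1, 0, 1, 0]
weights, bias, history = train(X, y)
print(round(history[0], 3), round(history[-1], 3))
```

The recorded loss falls from its initial value as the weights are updated, which is the only behavior the sketch is meant to show; a multi-layer network would propagate the same error signal backward through each layer.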

Regarding Claims 6, 15, and 21: Yan in view of Khan and Ganu teaches the above limitations. Additionally, Yan discloses wherein the operations comprise: generating a training dataset that includes the one or more impression events contributed to the conversion of the online user journey; comparing events having an attribution probability above an attribution threshold to the one or more impression events contributed to the conversion of the online user journey using a loss function; and adjusting a plurality of trained weights of the one or more attention layers to minimize an error value generated by the loss function (As shown in relation to the embodiment of FIG. 3, the deep learning attribution system 104 performs an act 302 of generating training touchpoint paths based on user interactions. See at least [0076]. Also: As shown, the series of acts also includes an act 830 of modifying the touchpoint attribution attention neural network based on the conversion prediction. The act 830 can include modifying parameters of the touchpoint attribution attention neural network based on a comparison between the conversion prediction for the first training touchpoint sequence and the first conversion indication. In one or more embodiments, comparing the conversion prediction for the first training touchpoint sequence and the first conversion indication includes utilizing a loss function to determine a training loss based on the conversion prediction and the first conversion indication, and modifying the touchpoint attention layer based on the training loss. See at least [0217]. Also: the deep learning attribution system can employ a loss layer that includes a loss function or loss model to train the touchpoint attribution attention neural network. As used herein, the term “loss function” or “loss model” refers to a function that indicates training loss. In some embodiments, a machine-learning algorithm can repetitively train to minimize total overall loss. For example, the loss function determines an amount of loss with respect to a training touchpoint path by analyzing the conversion prediction and the conversion indication. The loss function then provides feedback, via back propagation, to one or more layers of the touchpoint attribution attention neural network to tune/fine-tune those layers. See at least [0047]). 

Regarding Claims 7 and 16: Yan in view of Khan and Ganu teaches the above limitations. Additionally, Yan discloses wherein the plurality of network layers comprises a plurality of LSTM units (The deep learning attribution system 104 can use the dense vectors 408 output from the embedding layer 406 as input to the RNN/LSTM layer 410. As shown, the RNN/LSTM layer 410 includes a LSTM neural network, which is a type of RNN network. In alternative embodiments, the deep learning attribution system 104 employs another type of RNN neural network, such as another type of memory-based neural network, as the RNN/LSTM layer 410 of the touchpoint attribution attention neural network 400a. See at least [0101]. Also: Each illustrated layer can represent one or more types of neural network layers and/or include an embedded neural network. See at least [0093]).

Regarding Claims 9 and 18: Yan in view of Khan and Ganu teaches the above limitations. Additionally, Yan discloses wherein each attention layer comprises a plurality of attention units, each attention corresponds to a corresponding event type in the online user journey (See at least Fig. 4A, Element 414), and wherein the operations further comprise: predicting an alignment vector for an attention mechanism implemented in the attention layer; using the alignment vector to select hidden states for consideration by the attention mechanism; feeding a vector representation included in the hidden states to the attention layer, wherein the attention layer is included in a number of attention layers equal to a number of impression events in the online user journey, and wherein the hidden states include a hidden state for an impression event and a hidden state for a conversion event; and determining a context vector by combining the selected hidden states based on a plurality of trained weights of the one or more attention layers (As shown in FIG. 4A, the deep learning attribution system 104 classifies the touchpoint sequence representation 422. For instance, the deep learning attribution system 104 feeds the touchpoint sequence representation 422 to the classification layer 424, which predicts whether the input training touchpoint sequence results in a conversion based on the touchpoint sequence representation 422 (e.g., based on the weighted combination of all touchpoint input states). See at least [0118]. And Fig. 4A. Also: the classification layer 424 transforms the touchpoint sequence representation 422 to a number ranging between zero and one (i.e., 0-1). The transformed number indicates the probability that the training touchpoint sequence resulted in conversion. In some embodiments, the transformed number is a conversion prediction 426 (i.e., p). In alternative embodiments, the conversion prediction 426 includes a touchpoint (and/or the media channel to trigger the touchpoint) that, when added to the input training touchpoint sequence, has the highest conversion probability. See at least [0119]). As previously noted in combination with Yan, Ganu teaches local attention (Then, given the decoder hidden state h.sub.t and the context vector c.sub.t, a concatenation layer may be employed to combine the information from h.sub.t and c.sub.t to produce an attentional vector, which is used to selectively focus on certain positions in the input text position. In a global attention mechanism, the decoding process attends to all tokens in the input text for each output label. This approach is computationally expensive and can potentially render it impractical to translate longer sequences. Thus, in some embodiments, a local attention mechanism may be used, to focus only on a small subset of the source tokens in the input text. See at least Column 8, Lines 33-44). The motivation to combine Yan, Khan, and Ganu is the same as explained under claim 1 above, and is incorporated herein.
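To make the global versus local distinction in the quoted Ganu passage concrete, the following is a minimal sketch of windowed (local) attention: only hidden states inside a window around an aligned position are scored, and the context vector is the softmax-weighted sum of those selected states. Neither the claims nor the cited passages give formulas, so the dot-product scoring, the fixed window, and all names (`local_attention`, `center`, `half_width`) are assumptions. The aligned position is passed in directly here rather than predicted by a learned layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def local_attention(hidden_states, query, center, half_width):
    """Attend only to hidden states within half_width of center.
    Returns the context vector, the attention weights, and the window span."""
    lo = max(0, center - half_width)
    hi = min(len(hidden_states), center + half_width + 1)
    window = hidden_states[lo:hi]  # the selected hidden states
    weights = softmax([dot(query, h) for h in window])
    dim = len(query)
    # Context vector: weighted combination of the selected states.
    context = [sum(w * h[i] for w, h in zip(weights, window)) for i in range(dim)]
    return context, weights, (lo, hi)


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 0.0]]
context, weights, span = local_attention(states, [1.0, 0.0], center=2, half_width=1)
print(span, [round(w, 3) for w in weights], [round(c, 3) for c in context])
```

A global mechanism would score all five states for every output; the windowed version scores three, and the saving grows with sequence length, which is the computational point the Ganu passage makes.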

Response to Arguments
Applicant’s Argument Regarding 112(a) Rejections of claims 1-7, 9-16, and 18-21: Claims 1-7, 9-16, and 18-21 are presently amended for clarity.
Examiner’s Response: Applicant's amendments filed 14 April 2026 have been fully considered and resolve the identified issue through the removal of the unsupported material. The rejections under 112(a) are withdrawn. 

Applicant’s Argument Regarding 101 Rejections of claims 1-7, 9-16, and 18-21: 
The claims recite elements that reflect an improvement to a technology or technical field. Specifically, the claims reflect an improvement to machine learning model operation, particularly in how a machine learning model processes sequential event data and determines relationships between impression events and conversion events using a structured attention-based architecture. 
Similar to Desjardins, the present claims are directed to improvements in how a machine learning model itself functions. The claims do not merely recite performing a prediction, but instead recite a specific architecture and training methodology that improves the model’s internal operation.
Paragraphs 0018 and 0019 describe limitations of prior approaches, including that traditional feature vectors are “unable to extract the sequential and ordered context from a user’s activity,” that existing time-series models “are limited to a short lookback period,” and that such models “lack explainability.” 
When viewed as an ordered combination, the claims define a particular way of structuring and training a machine learning model that improves its operation. 
Examiner’s Response: Applicant's arguments filed 14 April 2026 have been fully considered but they are not persuasive.
Per MPEP 2106.05(a), “If it is asserted that the invention improves upon conventional functioning of a computer, or upon conventional technology or technological processes, a technical explanation as to how to implement the invention should be present in the specification.” Here, the disclosure does not provide a technical explanation of how to implement “a structured attention-based architecture”, indicating that there is no technical improvement present. 
As noted in the Desjardins decision, that specification “identifie[d] improvements in training the machine learning model itself.” Here, the disclosure only applies existing machine learning techniques to particular data in a non-specific manner. Thus the present claims are not meaningfully analogous to those in Desjardins.
Examiner notes that the asserted improvement does not appear to address the problem of vectors being “unable to extract the sequential and ordered context from a user’s activity”, models being “limited to a short lookback period”, or the “explainability” of models. For example, note that at [0018] the disclosure states that “These features are typically static features which are unable to extract the sequential and ordered context from a user’s activity.” The claims do not appear to address the problem of “static” features, and one of ordinary skill in the art would understand the “encoded vector for each event type” to be a static vector of feature data. 
The claims do not appear to recite “a particular way of structuring and training a machine learning model.” The claims merely generically describe the training of an existing type of machine learning model. One of ordinary skill in the art would not consider this either a particular structure for a machine learning model or a particular way of training a machine learning model. 

Additional Considerations
The prior art made of record and not relied upon that is considered pertinent to applicant’s disclosure can be found in the PTO-892 Notice of References Cited. 
Negi et al. (US 2020/0327444 A1) describes analyzing user purchase event sequences with neural networks. 
Additionally, Examiner notes that Zhou et al. (Understanding Consumer Journey Using Attention based Recurrent Neural Networks), previously cited on 14 February 2024, teaches (See at least Fig. 3) a neural network involving feeding user activity data into an embedding layer, generating hidden states, passing those hidden states to a local attention layer, and forming a context vector from the attention outputs, which is extremely close to the techniques of the current claims. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Bion A Shelden whose telephone number is (571)270-0515. The examiner can normally be reached M-F, 12pm-10pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kambiz Abdi, can be reached at (571) 272-6702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Bion A Shelden/
Primary Examiner, Art Unit 3685
2026-05-02
Prosecution Timeline

(22 earlier events not shown)
Jun 17, 2025: Response Filed
Jun 25, 2025: Final Rejection mailed (§101, §103)
Oct 21, 2025: Response after Non-Final Action
Oct 21, 2025: Notice of Allowance
Nov 06, 2025: Response after Non-Final Action
Apr 14, 2026: Request for Continued Examination
Apr 23, 2026: Response after Non-Final Action
May 06, 2026: Non-Final Rejection mailed (§101, §103) (current)
Precedent Cases

Applications granted by this same examiner with similar technology (based on the 5 most recent grants):

18/143,551 (Patent 12620475): INTERGRATED MEDICAL MANAGEMENT SYSTEM FOR INTERGRATING AND MANAGING DATA INCLUDING DATA LOCATED ON EXTERNAL SERVERS. 3y 0m to grant; granted May 05, 2026.
17/342,003 (Patent 12591880): Terminal Data Encryption. 4y 9m to grant; granted Mar 31, 2026.
16/006,850 (Patent 12450631): Advanced techniques to improve content presentation experiences for businesses and users. 7y 4m to grant; granted Oct 21, 2025.
18/225,684 (Patent 12412202): APPARATUS AND METHOD FOR PROVIDING CUSTOMIZED SERVICE. 2y 1m to grant; granted Sep 09, 2025.
12/287,594 (Patent 12363199): Systems and methods for mobile wireless advertising platform part 1. 16y 9m to grant; granted Jul 15, 2025.
Prosecution Projections

Expected OA Rounds: 7-8
Grant Probability: 22%
With Interview (+19.3%): 41%
Median Time to Grant: 3y 11m (~0m remaining)
PTA Risk: High
Based on 316 resolved cases by this examiner. Grant probability derived from career allowance rate.