DETAILED ACTION
Response to Arguments
Applicant argues that the prior art of Zhuang in view of Mu does not teach the claim limitations of a bidirectional long short-term memory (LSTM) or a transformer-based
encoder and concurrently trains the one or more models for event mention recognition,
trigger extraction, argument extraction or role extraction using an optimization algorithm
with training data from which to learn, as found in claims 1, 8 and 15. See pg. 7 of Applicant’s Remarks submitted on 12/03/2025.
Respectfully, Examiner disagrees. See the current Office action for Zhuang’s teaching of the claim limitations of a bidirectional long short-term memory (LSTM) or a transformer-based encoder and concurrently trains the one or more models for event mention recognition, trigger extraction, argument extraction or role extraction using an optimization algorithm with training data from which to learn. Accordingly, the 103 rejection has not been withdrawn.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/03/2025 has been entered.
Claim Objections
Claims 1 and 8 are objected to because of the following informalities: in the limitation “wherein the training,” the “w” in “wherein” needs to be capitalized. Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhuang, Yan, et al., “An active learning based hybrid neural network for joint information extraction,” Web Information Systems Engineering–WISE 2020: 21st International Conference, Amsterdam, The Netherlands, October 20–24, 2020, Proceedings, Part II 21, Springer International Publishing, 2020 (“Zhuang”) in view of Mu, Y., et al., “A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning,” Communications Medicine, 2021 Jul 5 (“Mu”).
Regarding claim 1, Zhuang teaches a system for event extraction, comprising:
a memory that stores computer executable components; and a processor that executes the computer executable components stored in the memory(Zhuang, pgs. 85-88, see also fig. 1, “We propose an unified human-machine framework… [w]e deploy Stanford CoreNLP to preprocess the data, including tokenizing, sentence splitting, pos-tagging and dependency parsing tree generating.”)1
wherein the computer executable components comprise:
a training component that trains one or more models to recognize one or more events from data input into the system(Zhuang, pg. 89, see also fig. 2, “The joint model shares the same embedding layer… bidirectional LSTM encoding layer… GAT layer… and is trained by respective sub-models in joint extraction layer…as shown [below] in fig. 2[a training component that trains one or more models to recognize one or more events from data input into the system].
[Zhuang, Fig. 2 (joint model architecture); media_image1.png, greyscale]” ),2
extract one or more triggers from the data input into the system, extract one or more arguments from the data input into the system, or extract one or more roles from the data input into the system(Zhuang, pg. 86, “[L]et G = {g_1, g_2, g_3, …} be the set of triggers with specified event sub-types[extract one or more triggers from the data input into the system], and A = {a_11, a_12, …, a_ij, …} be the argument set between trigger g_i and its argument e_j[extract one or more arguments from the data input into the system]. Here argument e_j can be an entity mention, temporal expression or value with a specific role(type)[or extract one or more roles from the data input into the system]….” & Zhuang, pgs. 91-93, “For EE, the processing step of trigger detection is similar to NER and argument role prediction as RE, while executes in another independent sub-process.”)3
wherein the training component comprise [one or more optimizers] and trains the one or more models [employing the one or more optimizers], and wherein the one or more models comprise a shared encoded layer followed by individual self-attention layers and a feed forward network(Zhuang, pgs. 89-93, see also fig. 2, “The joint model shares the same embedding layer...bidirectional LSTM encoding layer...and is trained by respective sub-models in joint extraction layer...as shown in Fig. 2 [wherein the training component comprise and trains the one or more models]... BiLSTM...is used to encapsulate the context information over the whole sentence... h→_i and h←_i to represent the encoded information, [g]raph attention network (GAT)... is an attention-based architecture of GNN which is to compute the hidden representations of each node by aggregating the feature information of multi-order neighborhood, followed by a self-attention strategy... M_ij can be calculated by the attention mechanism as the following equation: M_ij = softmax_{j∈N_i}(δ(a^T · [h_i || h_j]))... the input h_i obtained from GAT layer will be fed into an FC layer[and wherein the one or more models comprise a shared encoded layer followed by individual self-attention layers and a feed forward network]....”);4,5
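For orientation only (this sketch is not part of Zhuang’s disclosure or the prosecution record), the quoted attention equation M_ij = softmax_{j∈N_i}(δ(a^T · [h_i || h_j])) can be illustrated in plain Python; the hidden states, attention vector, and the choice of LeakyReLU for δ are assumed for illustration:

```python
import math

def leaky_relu(x, slope=0.2):
    # δ in the quoted equation (assumed here to be LeakyReLU)
    return x if x >= 0 else slope * x

def gat_attention(h, a, neighbors_of_i):
    """Attention coefficients M_ij = softmax_{j in N_i}(δ(aᵀ · [h_i || h_j]))."""
    M = {}
    for i, nbrs in neighbors_of_i.items():
        scores = []
        for j in nbrs:
            concat = h[i] + h[j]  # [h_i || h_j]: concatenated hidden states
            scores.append(leaky_relu(sum(w * x for w, x in zip(a, concat))))
        z = sum(math.exp(s) for s in scores)  # softmax over the neighborhood N_i
        M[i] = {j: math.exp(s) / z for j, s in zip(nbrs, scores)}
    return M

# Toy example: 3 nodes with 2-dimensional hidden states (values invented)
h = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0]}
a = [0.3, -0.2, 0.1, 0.4]  # attention vector aᵀ, length 2 * hidden dim
M = gat_attention(h, a, {0: [1, 2]})
```

The coefficients for each node normalize to one over its neighborhood, which is what the softmax in the quoted equation guarantees.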
and a learning-based sample component that determines a sampled dataset by applying a learning-based sample to the one or more models(Zhuang, pg. 93, “Pool-based active learning selects the most valuable samples by designing sampling rules. Our pool-based strategy is a multi-round method. We denote L_t and U_t as the training set and the unlabeled set at round t. Let θ_t be the classifier trained on L_t…[g]iven budget B, the objective is to select a batch in T training rounds of Q sentences at a time…from U_t[and a learning-based sample component that determines a sampled dataset by applying a learning-based sample]in such a way that the classifier w_{t+1} trained on L_t ⋃ Q has the maximum performance gain[to the one or more models] under limited budget C.”),
wherein the training component concurrently trains the one or more models for event mention recognition, trigger extraction, argument extraction or role extraction using an optimization algorithm with training data from which to learn(Zhuang, pgs. 91-93, see also fig. 2, “For NER, each word in the sentence will be assigned an entity type label following the BIO tagging scheme... and the final softmax layer computes normalized probabilities over the possible entity types in I_e: ŷ_t^(e_i) = P(E_t^(e_i) | S; Θ_E)[for event mention recognition]...[f]or convenience of exhibition, we integrate NER and trigger detection in the same part in Fig. 2… and can be predicted by the following equation: ŷ_t^(g_i) = P(V_g,t^(g_i) | S; Θ_V_g)[trigger extraction]… [i]n the argument role prediction step… [w]e aggregate the hidden vectors of token subsequences belong to one entity(trigger) and feed the concatenation of (g_i, e_j) into a FC layer to predict the argument role as: ŷ_t^(a_ij) = P(V_a,t^(a_ij) | S; Θ_V_a)[argument extraction or role extraction]….[w]e first train the network by MLE to initialize the model parameters, which minimizes a joint negative log-likelihood function F... the objective of MLE is to select the optimal parameters Θ̂_MLE by minimizing F on training samples[concurrently trains the one or more models using an optimization algorithm with training data from which to learn]”).
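As an illustrative gloss on the quoted MLE objective (not Zhuang’s code), the joint negative log-likelihood F = -log P(E, R, V | S; Θ) decomposes, under the factorization into sub-models, into a sum of per-subtask negative log-probabilities of the gold labels; the subtask names and probabilities below are invented:

```python
import math

def joint_nll(probs_per_subtask):
    """Joint negative log-likelihood F = -log P(E, R, V | S; Θ).
    With independent sub-models sharing the encoder, F is the sum of
    each subtask's negative log-probability of its gold labels."""
    return -sum(math.log(p)
                for probs in probs_per_subtask.values()
                for p in probs)

# Hypothetical gold-label probabilities from the three sub-models
probs = {"entity": [0.9, 0.8], "trigger": [0.7], "argument_role": [0.6]}
F = joint_nll(probs)
```

Minimizing F over Θ jointly is what makes the training of the sub-models concurrent: one gradient step on F updates the shared encoder and every sub-model head at once.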
Zhuang does not teach: one or more optimizers; employing one or more optimizers.
However, Mu teaches:
[wherein the training component comprise]one or more optimizers [and trains the one or more models] employing the one or more optimizers [, and wherein the one or more models comprise a shared encoded layer followed by individual self-attention layers and a feed forward network](Mu, pg. 4, “With the loss value, we used the Adam algorithm with weight decay fix (weight decay = 1e−2, learning rate = 1e−3)[one or more optimizers; employing the one or more optimizers] to finetune the network weights interconnecting the layers (Fig. 1)....”)6,7
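For context on the cited optimizer, a single AdamW update (Adam with decoupled weight decay) can be sketched as follows; this is a generic textbook formulation, not Mu’s implementation, and the scalar parameter and gradient are invented:

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-3, wd=1e-2,
               b1=0.9, b2=0.999, eps=1e-8):
    """One AdamW update, using the hyperparameters quoted from Mu
    (learning rate = 1e-3, weight decay = 1e-2)."""
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    # decoupled weight decay: decay term added outside the adaptive ratio
    theta -= lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * theta)
    return theta, m, v

theta, m, v = 1.0, 0.0, 0.0
theta, m, v = adamw_step(theta, grad=0.5, m=m, v=v, t=1)
```

The “weight decay fix” in the quotation refers to this decoupling: the decay is applied directly to the weight rather than folded into the gradient, as in plain Adam with L2 regularization.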
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Zhuang with the teachings of Mu. The motivation to do so would be to incorporate a bidirectional encoded transformer and a finetuned classifier to extract and classify semantic labels related to medical synopses for better diagnoses(Mu, pg. 1, “Pathology synopses consist of semi-structured or unstructured text summarizing visual information by observing human tissue. Experts write and interpret these
synopses with high domain-specific knowledge to extract tissue semantics and formulate a
diagnosis in the context of ancillary testing and clinical information. The limited number of
specialists available to interpret pathology synopses restricts the utility of the inherent
information. Deep learning offers a tool for information extraction and automatic feature
generation from complex datasets.”).
Regarding claim 2, Zhuang in view of Mu teaches the system of claim 1, wherein the computer executable components further comprise:
a joint loss component that generates a joint loss of the one or more models based on one or more outputs of the one or more models(Zhuang, pgs. 91-93, see also fig. 2, “We first train the network by MLE to initialize the model parameters, which minimizes a joint negative log-likelihood function F as following: F = -log P(E, R, V | S; Θ)…[as detailed by equation (7)] where Θ with different subscript are parameter sets of different sub-model[a joint loss component that generates a joint loss of the one or more models based on one or more outputs of the one or more models]… [a]fter that, we train the network by MRT to fine-tune the parameters…[t]he loss function can be approximated as: l_MRT(Θ) = Σ_{k=1}^{N} Σ_{ŷ∈S(S)} [P(ŷ | S; Θ)^ϵ / Σ_{y*∈S(S)} P(y* | S; Θ)^ϵ] · Δ(ŷ, y)[a joint loss component that generates a joint loss of the one or more models based on one or more outputs of the one or more models]”);
and an uncertainty scores component that determines one or more uncertainty scores based on the joint loss(Zhuang, pg. 93-94, see also algorithm 1, “We are prefer the sentence which have larger effect on the model quality when it is mispredicted. We use the expectation of model loss l_ŷ_i(θ) under different ŷ_i ∈ S(S_i)[based on the joint loss]to substitute for uncertainty C_i as: C_i' = -(1/|Y|) Σ_{ŷ_i∈S(S_i)} log P'(ŷ_i | S_i; θ) · l_ŷ_i(θ)[and an uncertainty scores component that determines one or more uncertainty scores]”).8
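As a worked illustration of the quoted expected-loss uncertainty C_i' (not Zhuang’s code; the candidate probabilities and losses below are invented):

```python
import math

def expected_loss_uncertainty(candidates, Y_size):
    """C_i' = -(1/|Y|) · Σ_{ŷ_i ∈ S(S_i)} log P'(ŷ_i | S_i; θ) · l_{ŷ_i}(θ):
    each sampled prediction's loss is weighted by its log-probability,
    so sentences whose likely mispredictions are costly score higher."""
    return -(1.0 / Y_size) * sum(math.log(p) * loss
                                 for p, loss in candidates)

# Hypothetical (probability, loss) pairs for the sampled predictions S(S_i)
cands = [(0.6, 0.2), (0.3, 0.8), (0.1, 1.0)]
C = expected_loss_uncertainty(cands, Y_size=3)
```

Because log p is negative and the loss is non-negative, the leading minus sign makes C_i' non-negative; low-probability, high-loss candidates dominate the score, which is the stated preference for sentences that hurt the model most when mispredicted.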
Regarding claim 3, Zhuang in view of Mu teaches the system of claim 2, wherein the learning-based sample component selects one or more events that jointly maximize the one or more uncertainty scores obtained from the one or more models(Zhuang, pg. 86, “In the EE model, it needs to predict the event…[where] the output is ŷ = (Ê, R̂, V̂)… [in which the] event set [is] V̂[one or more events]” & Zhuang, pg. 93-94, see also algorithm 1, “Pool-based active learning selects the most valuable samples by designing sampling rules… the objective is to select a batch in T training rounds of Q sentences at a time…from U_t…we explore the uncertainty…sampling strategy… the uncertainty of S_i can be formulated by: C_i = -(1/|Y|) max_{ŷ_i∈S(S_i)} log P'(ŷ_i | S_i; θ)[wherein the learning-based sample component selects one or more events that jointly maximize the one or more uncertainty scores obtained from the one or more models]”).9
Regarding claim 4, Zhuang in view of Mu teaches the system of claim 3, wherein a plurality of subtasks comprises two or more of the following subtasks: event mention recognition, trigger extraction, argument extraction or role extraction, wherein the event mention recognition, trigger extraction, argument extraction or role extraction are the one or more models(Zhuang, pgs. 91-93, see also fig. 2, “For EE[event mention recognition], the processing step of trigger detection is similar to NER and argument role prediction as RE, while executes in another independent sub-process. For convenience of exhibition, we integrate NER and trigger detection in the same part in Fig. 2… and can be predicted by the following equation: ŷ_t^(g_i) = P(V_g,t^(g_i) | S; Θ_V_g)[trigger extraction]… [i]n the argument role prediction step… [w]e aggregate the hidden vectors of token subsequences belong to one entity(trigger) and feed the concatenation of (g_i, e_j) into a FC layer[wherein the event mention recognition, trigger extraction, argument extraction or role extraction are the one or more models] to predict the argument role as: ŷ_t^(a_ij) = P(V_a,t^(a_ij) | S; Θ_V_a)[argument extraction or role extraction]….”).10
Regarding claim 5, Zhuang in view of Mu teaches the system of claim 1, wherein the one or more uncertainty scores is measured between existing events in a training data and a present event(Zhuang, pg. 95, as Algorithm 1 details:
[Zhuang, Algorithm 1; media_image2.png, greyscale]
The initial model Model_0 has been trained using the labeled training dataset[wherein the one or more uncertainty scores is measured between existing events in a training data]; then in step 3 a sentence S_i ∈ U_t is sampled from the unlabeled dataset and in step 4 fed into the initial trained model Model_0 to output P'(ŷ_i | S_i; Θ)[and a present event] which represents the uncertainty of each S_i).11
Regarding claim 6, Zhuang in view of Mu teaches the system of claim 1, wherein the training component trains the one or more models concurrently to extract the one or more triggers, the one or more arguments or the one or more roles, and wherein the shared encoded layer comprises a bidirectional long short-term memory (LSTM) or a transformer-based encoder (Zhuang, pg. 97, see also table 2 “The combination performs the best because it can leverage the advantages of the two training methods. In our framework we first pre-train the model with MLE, then optimize the local loss and the global loss with MRT. By integrating the two losses, all three extraction models are unaware of the loss from the other side and have a tighter connection among each other [as detailed by the F1-score of table 2][ wherein the training component trains the one or more models concurrently to extract the one or more triggers, the one or more arguments or the one or more roles].” & Zhuang, pg. 89, see also fig. 2, “The joint model shares the same embedding layer… bidirectional LSTM encoding layer[a bidirectional long short-term memory (LSTM)]… GAT layer… and is trained by respective sub-models in joint extraction layer…as shown in fig. 2”).12
Regarding claim 7, Zhuang in view of Mu teaches the system of claim 1, wherein the training component comprises functions to obtain probabilities of the events being misclassified(Zhuang, pgs. 92-93, “MRT uses the loss function Δ(ŷ, y)…[w]e utilize the F1 score of all the annotations of a sentence to compute Δ(ŷ, y)[functions to obtain probabilities of the events being misclassified]”).
Regarding claim 8, Zhuang teaches a computer-implemented method for event extraction comprising:
training, by a system operatively coupled to a processor(Zhuang, pgs. 85-88, see also fig. 1, “We propose an unified human-machine framework… [w]e deploy Stanford CoreNLP to preprocess the data, including tokenizing, sentence splitting, pos-tagging and dependency parsing tree generating.”),13
one or more models to recognize one or more events from data input into the system(Zhuang, pg. 89, see also fig. 2, “The joint model shares the same embedding layer… bidirectional LSTM encoding layer… GAT layer… and is trained by respective sub-models in joint extraction layer…as shown [below] in fig. 2[training, by a system one or more models to recognize one or more events from data input into the system].
[Zhuang, Fig. 2 (joint model architecture); media_image1.png, greyscale]” ),14
extract one or more triggers from the data input into the system, extract one or more arguments from the data input into the system, or extract one or more roles from the data input into the system(Zhuang, pg. 86, “[L]et G = {g_1, g_2, g_3, …} be the set of triggers with specified event sub-types[extract one or more triggers from the data input into the system], and A = {a_11, a_12, …, a_ij, …} be the argument set between trigger g_i and its argument e_j[extract one or more arguments from the data input into the system]. Here argument e_j can be an entity mention, temporal expression or value with a specific role(type)[or extract one or more roles from the data input into the system]….” & Zhuang, pgs. 91-93, “For EE, the processing step of trigger detection is similar to NER and argument role prediction as RE, while executes in another independent sub-process.”),15
wherein the training comprises [one or more optimizers] and training the one or more models [employing the one or more optimizers], and wherein the one or more models comprise a shared encoded layer followed by individual self-attention layers and a feed forward network and wherein the shared encoded layer comprises a bidirectional long short-term memory (LSTM) or a transformer-based encoder (Zhuang, pgs. 89-93, see also fig. 2, “The joint model shares the same embedding layer...bidirectional LSTM encoding layer...and is trained by respective sub-models in joint extraction layer...as shown in Fig. 2 [wherein the training component comprise and trains the one or more models]... BiLSTM...is used to encapsulate the context information over the whole sentence... h→_i and h←_i to represent the encoded information[a bidirectional long short-term memory (LSTM)], [g]raph attention network (GAT)... is an attention-based architecture of GNN which is to compute the hidden representations of each node by aggregating the feature information of multi-order neighborhood, followed by a self-attention strategy... M_ij can be calculated by the attention mechanism as the following equation: M_ij = softmax_{j∈N_i}(δ(a^T · [h_i || h_j]))... the input h_i obtained from GAT layer will be fed into an FC layer[and wherein the one or more models comprise a shared encoded layer followed by individual self-attention layers and a feed forward network]....”);16,17
and determining, by the system, a sampled dataset by applying a learning-based sample to the one or more models(Zhuang, pg. 93, “Pool-based active learning selects the most valuable samples by designing sampling rules. Our pool-based strategy is a multi-round method. We denote L_t and U_t as the training set and the unlabeled set at round t. Let θ_t be the classifier trained on L_t…[g]iven budget B, the objective is to select a batch in T training rounds of Q sentences at a time…from U_t[and determining, by the system, a sampled dataset by applying a learning-based sample]in such a way that the classifier w_{t+1} trained on L_t ⋃ Q has the maximum performance gain[to the one or more models] under limited budget C.”).
Zhuang does not teach: one or more optimizers; employing one or more optimizers.
However, Mu teaches:
[wherein the training component comprise]one or more optimizers [and trains the one or more models] employing the one or more optimizers [, and wherein the one or more models comprise a shared encoded layer followed by individual self-attention layers and a feed forward network](Mu, pg. 4, “With the loss value, we used the Adam algorithm with weight decay fix (weight decay = 1e−2, learning rate = 1e−3)[one or more optimizers; employing the one or more optimizers] to finetune the network weights interconnecting the layers (Fig. 1)....”)18,19
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Zhuang with the teachings of Mu. The motivation to do so would be to incorporate a bidirectional encoded transformer and a finetuned classifier to extract and classify semantic labels related to medical synopses for better diagnoses(Mu, pg. 1, “Pathology synopses consist of semi-structured or unstructured text summarizing visual information by observing human tissue. Experts write and interpret these
synopses with high domain-specific knowledge to extract tissue semantics and formulate a
diagnosis in the context of ancillary testing and clinical information. The limited number of
specialists available to interpret pathology synopses restricts the utility of the inherent
information. Deep learning offers a tool for information extraction and automatic feature
generation from complex datasets.”).
Referring to dependent claims 9-14, they are rejected on the same basis as dependent claims 2-7 since they are analogous claims.
Referring to independent claim 15, Zhuang teaches a computer program product facilitating a process to extract event data from heterogenous data sources, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor(Zhuang, pgs. 85-88, see also fig. 1, “We propose an unified human-machine framework… [w]e deploy Stanford CoreNLP to preprocess the data, including tokenizing, sentence splitting, pos-tagging and dependency parsing tree generating.”)20 and for all other claim limitations they are rejected on the same basis as independent claim 8 since they are analogous claims.
Referring to dependent claims 16-20, they are rejected on the same basis as dependent claims 2-4 and 6-7 since they are analogous claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Grams, US 20190303800 A1(details a training procedure to extract various metadata from documents with respect to a changing threshold)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ADAM C STANDKE whose telephone number is (571)270-1806. The examiner can normally be reached generally M-F 9AM-9PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Adam C Standke/
Primary Examiner
Art Unit 2129
1 Examiner Remarks: Because Stanford CoreNLP is software that requires a computer for execution it is inherent that a computer contains a processor and a memory within.
2 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
3 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
4 Examiner Remarks: The claim limitations that are not in bold and contained in square brackets i.e. [ ] are claim limitations that are not taught by Zhuang
5 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
6 Examiner Remarks: The claim limitations that are not in bold and contained in square brackets i.e. [ ] are claim limitations taught by Zhuang
7 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
8 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
9 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
10 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
11 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
12 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
13 Examiner Remarks: Because Stanford CoreNLP is software that requires a computer for execution it is inherent that a computer contains a processor and a memory within.
14 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
15 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
16 Examiner Remarks: The claim limitations that are not in bold and contained in square brackets i.e. [ ] are claim limitations that are not taught by Zhuang
17 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
18 Examiner Remarks: The claim limitations that are not in bold and contained in square brackets i.e. [ ] are claim limitations taught by Zhuang
19 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
20 Examiner Remarks: Because Stanford CoreNLP is software that requires a computer for execution it is inherent that a computer contains a processor and a memory within.