Prosecution Insights
Last updated: April 18, 2026
Application No. 18/635,699

STRUCTURE SELF-AWARE MODEL FOR DISCOURSE PARSING ON MULTI-PARTY DIALOGUES

Non-Final OA (§103, §112, Double Patenting)
Filed: Apr 15, 2024
Examiner: SERRAGUARD, SEAN ERIN
Art Unit: 2657
Tech Center: 2600 — Communications
Assignee: Tencent America LLC
OA Round: 1 (Non-Final)

Grant Probability: 69% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 2m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 69% (above average; 92 granted / 134 resolved; +6.7% vs TC avg)
Interview Lift: +33.6% (strong), measured over resolved cases with interview
Typical Timeline: 3y 2m avg prosecution; 43 applications currently pending
Career History: 177 total applications across all art units

Statute-Specific Performance

§101: 9.4% (-30.6% vs TC avg)
§103: 49.7% (+9.7% vs TC avg)
§102: 18.6% (-21.4% vs TC avg)
§112: 19.2% (-20.8% vs TC avg)

Based on career data from 134 resolved cases; Tech Center averages are estimates.

Office Action

Grounds: §103, §112, Double Patenting
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement filed 15 April 2024 fails to comply with 37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; each non-patent literature publication or that portion which caused it to be listed; and all other information or that portion which caused it to be listed. It has been placed in the application file, but the information referred to therein has not been considered.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1)-706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-6 and 8-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-5 and 7-17 of U.S. Patent No. 12,032,916. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the issued patent are narrower in scope than those of the instant application. Therefore, the claims of the issued patent anticipate the claims of the instant application. Please see the claim mapping below.
Instant Application | US Patent No. 12,032,916

Claim 1: A method of dialogue parsing, executable by a processor, comprising: Claim 1: A method of dialogue parsing, executable by a processor, comprising: receiving dialogue data receiving dialogue data having one or more elementary discourse units; ; initializing nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the dialogue data initializing nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the one or more elementary discourse units; {elementary discourse units are dialogue data} ; determining a local representation, by multiple bidirectional gated recurrent units (GRUs) consuming utterances represented by the dialogue data determining a local representation, by multiple bidirectional gated recurrent units (GRUs) consuming each utterance represented by the one or more elementary discourse units and then concatenating a last hidden state in the multiple GRUs and then concatenating a last hidden state in the multiple GRUs, and a global representation, by applying another bidirectional GRU on the local representations, and a global representation, by applying another bidirectional GRU on the local representations, for each of the elementary discourse units; generating, in a neural network, at least one edge-specific vector representing an edge between a pair of elementary discourse units, the at least one edge-specific vector being generated based on the determined local and global representations, the at least one edge-specific vector capturing relation information for the pair of elementary discourse units; ; identifying relationships between the elementary discourse units identifying relationships between the elementary discourse units based on the at least one edge-specific vector generated in the neural network based on structure-aware scaled dot-product attention of the SSA-GNN and based on structure-aware scaled dot-product attention of the SSA-GNN and a layer-wise classifier on edge hidden states and a layer-wise classifier on edge hidden states of each SSA-GNN layer; and ; and predicting a contextual link between non-adjacent elementary discourse units predicting a contextual link between non-adjacent elementary discourse units based on the identified relationships based on the identified relationships

Claim 2: The method of claim 1, Claim 2: The method of claim 1, wherein the determining of the local representations comprises: processing each elementary discourse unit of the dialogue data through a first bidirectional gated recurrence unit wherein the determining of the local representations comprises: processing each elementary discourse unit through a first bidirectional gated recurrence unit; and concatenating a hidden state generated by the first bidirectional gated recurrence unit in two directions for each elementary discourse unit. ; and concatenating a hidden state generated by the first bidirectional gated recurrence unit in two directions for each elementary discourse unit

Claim 3: The method of claim 2 Claim 3: The method of claim 2, further comprising updating the hidden state, further comprising updating the hidden state based on structure-aware scaled dot-product attention. based on structure-aware scaled dot-product attention.

Claim 4: The method of claim 2, Claim 4: The method of claim 2, wherein determining the global representations comprises: processing each local representation through a second bidirectional gated recurrence unit. wherein determining the global representations comprises: processing each local representation through a second bidirectional gated recurrence unit.

Claim 5: The method of claim 1, Claim 5: The method of claim 1, wherein the relationships between elementary discourse units of the dialogue data are identified wherein the relationships between the elementary discourse units are identified based on capturing implicit structural information corresponding to: a node-specific vector for each elementary discourse unit based on capturing implicit structural information corresponding to: a node-specific vector for each elementary discourse unit, and an edge-specific vector, of the at least one edge-specific vector generated in the neural network, for each pair of elementary discourse units. , and an edge-specific vector, of the at least one edge-specific vector generated in the neural network, for each pair of elementary discourse units.

Claim 6: The method of claim 1 Claim 7: The method of claim 1, further comprising training the neural network based on a layer-wise relation classification on edge hidden states of each layer of the neural network for each pair of elementary discourse units. , further comprising training the neural network based on a layer-wise relation classification on edge hidden states of each layer of the neural network for each pair of elementary discourse units.
Claim 8: The method of claim 1, Claim 1 (cont): (Limitations of claim 1 are incorporated by reference) wherein the dialogue data comprises one or more elementary discourse units, receiving dialogue data having one or more elementary discourse units; wherein initializing the nodes and the edges of the SSA-GNN based on the dialogue data comprises initializing the nodes and the edges of the SSA-GNN based on the one or more elementary discourse units of the dialogue data initializing nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the one or more elementary discourse units; , and wherein determining the local representation, by the multiple GRUs consuming the utterances represented by the dialogue data and then concatenating the last hidden state in the multiple GRUs, and the global representation, by applying the other bidirectional GRU on the local representations comprises determining the local representation, by the multiple GRUs consuming each utterance represented by the one or more elementary discourse units of the dialogue data determining a local representation, by multiple bidirectional gated recurrent units (GRUs) consuming each utterance represented by the one or more elementary discourse units and then concatenating the last hidden state in the multiple GRUs and then concatenating a last hidden state in the multiple GRUs, and the global representation, by applying the other bidirectional GRU on the local representations, and a global representation, by applying another bidirectional GRU on the local representations for each of the elementary discourse units. for each of the elementary discourse units.

Claim 9: The method of claim 8 Claim 1 (cont): (Limitations of claim 1 are incorporated by reference), further comprising generating, in a neural network, at least one edge-specific vector representing an edge between a pair of elementary discourse units generating, in a neural network, at least one edge-specific vector representing an edge between a pair of elementary discourse units, the at least one edge-specific vector being generated based on the determined local and global representations, the at least one edge-specific vector being generated based on the determined local and global representations, the at least one edge-specific vector capturing relation information for the pair of elementary discourse units. , the at least one edge-specific vector capturing relation information for the pair of elementary discourse units;

Claim 10: The method of claim 9, Claim 1 (cont): (Limitations of claim 1 are incorporated by reference) wherein identifying the relationships between the elementary discourse units based on the structure-aware scaled dot-product attention of the SSA-GNN and the layer-wise classifier on the edge hidden states comprises identifying the relationships between the elementary discourse units identifying relationships between the elementary discourse units based on the at least one edge specific vector generated in the neural network and based on the at least one edge-specific vector generated in the neural network based on the structure-aware scaled dot-product attention of the SSA-GNN and based on structure-aware scaled dot-product attention of the SSA-GNN and the layer-wise classifier on the edge hidden states of each SSA-GNN layer.
and a layer-wise classifier on edge hidden states of each SSA-GNN layer…

Claim 11: A computer system for dialogue parsing Claim 8: A computer system for dialogue parsing, the computer system comprising: one or more computer-readable non-transitory storage media configured to store computer program code the computer system comprising: one or more computer-readable non-transitory storage media configured to store computer program code; and ; and one or more computer processors configured to access said computer program code and operate as instructed by said computer program code one or more computer processors configured to access said computer program code and operate as instructed by said computer program code, said computer program code including: receiving code configured to cause the one or more computer processors to receive dialogue data, said computer program code including: receiving code configured to cause the one or more computer processors to receive dialogue data having one or more elementary discourse units; ; initializing code configured to cause the one or more computer processors to initialize nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the dialogue data initializing code configured to cause the one or more computer processors to initialize nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the one or more elementary discourse units; {elementary discourse units are dialogue data} ; determining code configured to cause the one or more computer processors to determine a local representation, by multiple bidirectional gated recurrent units (GRUs) consuming utterances represented by the dialogue data and then concatenating a last hidden state in the multiple GRUs determining code configured to cause the one or more computer processors to determine a local representation, by multiple bidirectional gated recurrent units (GRUs) consuming each utterance represented by the one or more elementary discourse units {elementary discourse units are dialogue data} and then concatenating a last hidden state in the multiple GRUs, and a global representation, by applying another bidirectional GRU on the local representations, and a global representation, by applying another bidirectional GRU on the local representations, for each of the elementary discourse units; network code configured to cause the one or more computer processors to generate, in a neural network, at least one edge-specific vector representing an edge between a pair of elementary discourse units, the at least one edge-specific vector being generated based on the determined local and global representations, the at least one edge-specific vector capturing relation information for the pair of elementary discourse units; ; identifying code configured to cause the one or more computer processors to identify relationships between the elementary discourse units identifying code configured to cause the one or more computer processors to identify relationships between the elementary discourse units based on the at least one edge-specific vector generated in the neural network based on structure-aware scaled dot-product attention of the SSA-GNN and based on structure-aware scaled dot-product attention of the SSA-GNN and a layer-wise classifier on edge hidden states and a layer-wise classifier on edge hidden states of each SSA-GNN layer; and ; and predicting code configured to cause the one or more computer processors to predict a contextual link between non-adjacent elementary discourse units predicting code configured to cause the one or more computer processors to predict a contextual link between non-adjacent elementary discourse units based on the identified relationships. based on the identified relationships

Claim 12: The computer system of claim 11, Claim 9: The computer system of claim 8, wherein the determining code comprises wherein the determining code comprises: processing code configured to cause the one or more computer processors to process each elementary discourse unit of the dialogue data through a first bidirectional gated recurrence unit : processing code configured to cause the one or more computer processors to process each elementary discourse unit through a first bidirectional gated recurrence unit; and concatenating code configured to cause the one or more computer processors to concatenate a hidden state generated by the first bidirectional gated recurrence unit in two directions for each elementary discourse unit. ; and concatenating code configured to cause the one or more computer processors to concatenate a hidden state generated by the first bidirectional gated recurrence unit in two directions for each elementary discourse unit.

Claim 13: The computer system of claim 12 Claim 10: The computer system of claim 9, further comprising updating code configured to cause the one or more computer processors to update the hidden state, further comprising updating code configured to cause the one or more computer processors to update the hidden state based on structure-aware scaled dot-product attention. based on structure-aware scaled dot-product attention.

Claim 14: The computer system of claim 12, Claim 11: The computer system of claim 9, wherein the determining code further comprises wherein the determining code further comprises: second processing code configured to cause the one or more computer processors to process each local representation through a second bidirectional gated recurrence unit. : second processing code configured to cause the one or more computer processors to process each local representation through a second bidirectional gated recurrence unit.

Claim 15: The computer system of claim 11, Claim 12: The computer system of claim 8, wherein the relationships between the elementary discourse units of the dialogue data are identified wherein the relationships between the elementary discourse units {elementary discourse units are dialogue data} are identified based on capturing implicit structural information corresponding to based on capturing implicit structural information corresponding to: a node-specific vector for each elementary discourse unit : a node-specific vector for each elementary discourse unit, and an edge-specific vector, of the at least one edge-specific vector generated in the neural network, for each pair of elementary discourse units. , and an edge-specific vector, of the at least one edge-specific vector generated in the neural network, for each pair of elementary discourse units.

Claim 16: The computer system of claim 11 Claim 13: The computer system of claim 8, further comprising training code configured to cause the one or more computer processors to train the neural network, further comprising training code configured to cause the one or more computer processors to train the neural network based on a layer-wise relation classification on edge hidden states of each layer of the neural network for each pair of elementary discourse units. based on a layer-wise relation classification on edge hidden states of each layer of the neural network for each pair of elementary discourse units.

Claim 17: The computer system of claim 16, Claim 14: The computer system of claim 13, wherein the layer-wise relation classification is based on at least one interim edge-specific vector wherein the layer-wise relation classification is based on at least one interim edge-specific vector at each layer of the neural network. at each layer of the neural network.
Claim 18: A non-transitory computer readable medium having stored thereon a computer program for dialogue parsing Claim 15: A non-transitory computer readable medium having stored thereon a computer program for dialogue parsing, the computer program configured to cause one or more computer processors to, the computer program configured to cause one or more computer processors to: : receive dialogue data receive dialogue data having one or more elementary discourse units; ; initialize nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the dialogue data initialize nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the one or more elementary discourse units; {elementary discourse units are part of the dialogue data} ; determine a local representation, by multiple bidirectional gated recurrent units (GRUs) consuming utterances represented by the dialogue data and then concatenating a last hidden state in the multiple GRUs determine a local representation, by multiple bidirectional gated recurrent units (GRUs) consuming each utterance represented by the one or more elementary discourse units {elementary discourse units are part of the dialogue data} and then concatenating a last hidden state in the multiple GRUs, and a global representation, by applying another bidirectional GRU on the local representations, and a global representation, by applying another bidirectional GRU on the local representations, for each of the elementary discourse units; generate, in a neural network, at least one edge-specific vector representing an edge between a pair of elementary discourse units, the at least one edge-specific vector being generated based on the determined local and global representations, the at least one edge-specific vector capturing relation information for the pair of elementary discourse units; ; identify relationships between the elementary discourse units identify relationships between the elementary discourse units based on the at least one edge-specific vector generated in the neural network and based on structure-aware scaled dot-product attention of the SSA-GNN and a layer-wise classifier on edge hidden states based on structure-aware scaled dot-product attention of the SSA-GNN and a layer-wise classifier on edge hidden states of each SSA-GNN layer; and ; and predict a contextual link between non-adjacent elementary discourse units predict a contextual link between non-adjacent elementary discourse units based on the identified relationships. based on the identified relationships.

Claim 19: The computer readable medium of claim 15, Claim 16: The computer readable medium of claim 15, wherein the computer program is further configured to cause one or more computer processors to wherein the computer program is further configured to cause one or more computer processors to: process each elementary discourse unit through a first bidirectional gated recurrence unit : process each elementary discourse unit through a first bidirectional gated recurrence unit; and concatenate a hidden state generated by the first bidirectional gated recurrence unit in two directions for each elementary discourse unit. ; and concatenate a hidden state generated by the first bidirectional gated recurrence unit in two directions for each elementary discourse unit.

Claim 20: The computer readable medium of claim 16, Claim 17: The computer readable medium of claim 16, wherein the computer program is further configured to cause one or more computer processors to update the hidden state wherein the computer program is further configured to cause one or more computer processors to update the hidden state based on structure-aware scaled dot-product attention. based on structure-aware scaled dot-product attention.

Claim 7 is rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 and 6 of U.S. Patent No.
12,032,916 in view of Mao (CN110751038A, hereinafter Mao). The claims of the issued patent match those of the instant application but do not teach “wherein the layer-wise relation classification is based on at least one interim edge-specific vector at each layer of the neural network”. The layer-wise classification described in Mao relies on the edge vectors produced by the preceding layer to perform the transformation for the current layer. The output of layer i-1 is an "edge feature matrix". The rows in this matrix are the interim edge-specific vectors for that layer. These interim vectors are the direct input to the attention components of layer i, and the classification/transformation performed at layer i is based on said vectors. (Mao, pg. 10, paras. 1-2). It would have been obvious to one of ordinary skill in the art to have modified the issued patent with the graph attention mechanisms of Mao, as the proposed recognition method “achieves the best results in both table structure recognition data sets” such as those used in dialogue systems, “especially in complex table structure recognition, the effect is obviously improved,” as recognized by Mao. (Mao, pg. 1, para. 1; pg. 3, para. 2). Please see the claim mapping below.
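As a rough illustration only (this is neither Mao's disclosed implementation nor the claimed method, and the function names, weight matrices, and toy sizes are all assumptions), the layer-wise pattern described above, in which the edge feature matrix from layer i-1 supplies the interim edge-specific vectors that feed the attention of layer i while a relation classifier is applied at every layer, can be sketched as:

```python
import numpy as np

def ssa_layer(edge_feats, W_attn, W_cls):
    """One illustrative layer: the edge feature matrix from the previous
    layer (the "interim edge-specific vectors") is (a) classified by a
    layer-wise relation classifier and (b) transformed by scaled
    dot-product attention into the next layer's edge vectors."""
    d = edge_feats.shape[1]
    scores = (edge_feats @ edge_feats.T) / np.sqrt(d)   # scaled dot-product
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over edges
    next_feats = weights @ edge_feats @ W_attn          # interim vectors for layer i+1
    logits = edge_feats @ W_cls                         # layer-wise classification
    return next_feats, logits

rng = np.random.default_rng(0)
n_edges, d, n_rel, n_layers = 6, 8, 4, 3                # toy sizes (assumed)
feats = rng.normal(size=(n_edges, d))
layer_logits = []
for _ in range(n_layers):
    W_attn = rng.normal(size=(d, d)) / np.sqrt(d)
    W_cls = rng.normal(size=(d, n_rel)) / np.sqrt(d)
    feats, logits = ssa_layer(feats, W_attn, W_cls)
    layer_logits.append(logits)                         # one classification per layer

print(len(layer_logits), layer_logits[0].shape)         # prints: 3 (6, 4)
```

The point of the sketch is only the data flow: each layer both classifies and re-encodes the same interim edge vectors, so a relation prediction is available at every layer rather than only at the output.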
Instant Application | US Patent No. 12,032,916

Claim 1: A method of dialogue parsing, executable by a processor, comprising: Claim 1: A method of dialogue parsing, executable by a processor, comprising: receiving dialogue data receiving dialogue data having one or more elementary discourse units; ; initializing nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the dialogue data initializing nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the one or more elementary discourse units; {elementary discourse units are dialogue data} ; determining a local representation, by multiple bidirectional gated recurrent units (GRUs) consuming utterances represented by the dialogue data determining a local representation, by multiple bidirectional gated recurrent units (GRUs) consuming each utterance represented by the one or more elementary discourse units and then concatenating a last hidden state in the multiple GRUs and then concatenating a last hidden state in the multiple GRUs, and a global representation, by applying another bidirectional GRU on the local representations, and a global representation, by applying another bidirectional GRU on the local representations, for each of the elementary discourse units; generating, in a neural network, at least one edge-specific vector representing an edge between a pair of elementary discourse units, the at least one edge-specific vector being generated based on the determined local and global representations, the at least one edge-specific vector capturing relation information for the pair of elementary discourse units; ; identifying relationships between the elementary discourse units identifying relationships between the elementary discourse units based on the at least one edge-specific vector generated in the neural network based on structure-aware scaled dot-product attention of the SSA-GNN and based on structure-aware scaled dot-product attention of the SSA-GNN and a layer-wise classifier on edge hidden states and a layer-wise classifier on edge hidden states of each SSA-GNN layer; and ; and predicting a contextual link between non-adjacent elementary discourse units predicting a contextual link between non-adjacent elementary discourse units based on the identified relationships based on the identified relationships

Claim 6: The method of claim 1 Claim 7: The method of claim 1, further comprising training the neural network based on a layer-wise relation classification on edge hidden states of each layer of the neural network for each pair of elementary discourse units. , further comprising training the neural network based on a layer-wise relation classification on edge hidden states of each layer of the neural network for each pair of elementary discourse units.

Claim 7: The method of claim 6, Claims 1 and 6 as mapped above wherein the layer-wise relation classification is based on at least one interim edge-specific vector at each layer of the neural network. See Mao

Claim Objections

Claims 19 and 20 are objected to because of the following informalities:

Regarding claim 19, the preamble of claim 19 recites “The computer readable medium of claim 15” which is understood to be a clerical error, as claim 15 is directed to a computer system, not a computer readable medium, and claim 19 appears to more appropriately depend from claim 18. In light of the above, the following proposed claim amendment to claim 19, if accepted by the applicant, would overcome the objection: “The computer readable medium of claim [[15]]18”

Regarding claim 20, the preamble of claim 20 recites “The computer readable medium of claim 16” which is understood to be a clerical error, as claim 16 is directed to a computer system, not a computer readable medium, and claim 20 appears to more appropriately depend from claim 19.
In light of the above, the following proposed claim amendment to claim 20, if accepted by the applicant, would overcome the objection: “The computer readable medium of claim [[16]]19”. For purposes of compact prosecution, claims 19 and 20 will be analyzed as if dependent from claims 18 and 19, respectively. Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 1, and mutatis mutandis claims 11 and 18, recites the limitation "the elementary discourse units" in line 9. There is insufficient antecedent basis for this limitation in the claim. Claims 2-10, 12-17, and 19-20 depend from claims 1, 11, and 18, and incorporate all limitations therefrom. Therefore, claims 2-10, 12-17, and 19-20 are rejected under 35 U.S.C. 112(b) for at least the same reasons as claims 1, 11, and 18. Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C.
102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Non-Patent Literature to Shi (Shi, Zhouxing, and Minlie Huang. “A deep sequential model for discourse parsing on multi-party dialogues.” Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, No. 01, 2019; hereinafter Shi) in view of Non-Patent Literature to Lin (Lin, Xiang, Shafiq Joty, Prathyusha Jwalapuram, and M. Saiful Bari. "A unified linear-time framework for sentence-level discourse parsing." arXiv preprint arXiv:1905.05682 (2019); hereinafter Lin) and Mao (CN110751038A, hereinafter Mao).

Regarding claim 1, Shi discloses A method of dialogue parsing, executable by a processor (systems and methods described with reference to the "deep sequential model for discourse parsing on multi-party dialogues"; Shi, p. 7008, Col. 1, lines 12-13), comprising: receiving dialogue data ("proposed model... makes a sequential scan of the Elementary Discourse Units (EDUs) in a dialogue."; Shi, p. 7008, Col. 1, lines 16-17); determining a local representation, by...
[a bidirectional gated recurrent unit (GRUs)] consuming utterances represented by the dialogue data ("In our model, we use two categories of discourse representations" including "local representations {determining a local representation}... [which] are non-structured and encode the local information of EDUs individually" where "Our model first computes the non-structured representations of the EDUs with hierarchical Gated Recurrent Unit (GRU) (Cho et al. 2014) encoders" where "the model makes a sequential scan of the EDUs {consuming utterances represented by the dialogue data}"; Shi, ¶ p. 7009, col. 1, lines 15-18; col. 2, lines 25-33) and then concatenating a last hidden state in the multiple GRUs ("For each EDU ui, a bidirectional GRU (bi-GRU) encoder is applied on the word sequence, and the last hidden states in two directions are concatenated as the local representation of ui,"; Shi, ¶ p. 7009, Col. 2, lines 34-37), and a global representation, by applying another bidirectional GRU on the local representations (The second of the "two categories of discourse representations" is "global representations {...and a global representation...} [which] encode the global information of the EDU sequence or the predicted discourse structure {... for each of the elementary discourse units}" where the system uses the "local representations of the EDUs...as input to a GRU encoder and the hidden states are viewed as the non-structured global representations of the EDUs"; Shi, ¶ p. 7009, Col. 2, lines 25-33); identifying relationships between the elementary discourse units… ("These non-structured representations are used for predicting dependency relations and encoding structured representations."; Shi, ¶ p. 7009, col. 
1, lines 15-20); and predicting a contextual link between non-adjacent elementary discourse units (The system "compute[s] the structured representation of ui once its parent and the corresponding relation type are decided" for "predicting a dependency relation {predicting a contextual link} linking from uj to ui {between... elementary discourse units}"; Shi, ¶ p. 7010, Col. 1, lines 24-25; Col. 1, lines 33-34) based on the identified relationships (The system computes the structured representations using "the embedding vector of relation type rji {based on the identified relationships}"; Shi, ¶ p. 7010, Col. 1, line 42-Col. 2, line 5). However, Shi fails to expressly recite initializing nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the dialogue data; determining a local representation, by multiple bidirectional gated recurrent units (GRUs) consuming utterances represented by the dialogue data; identifying relationships between the elementary discourse units based on structure-aware scaled dot-product attention of the SSA-GNN and a layer-wise classifier on edge hidden states. Lin teaches a “neural framework for sentence-level discourse analysis.” (Lin, Abstract). Regarding claim 1, Lin teaches determining a local representation, by multiple bidirectional gated recurrent units (GRUs) consuming utterances represented by the dialogue data ("our encoder uses six (6) recurrent layers of BiGRU cells, and generates hidden states H = (h1, …, hn) by composing the word representations {local representations} sequentially from left-to-right and from right-to-left."; Lin, ¶ pg. 4, col. 1, para. 3). 
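The bi-GRU encoding the rejection maps to Shi and Lin (concatenating the last hidden states of a forward and a backward pass per EDU, then running a second GRU over the EDU-level vectors) can be illustrated with a minimal NumPy sketch. The GRU equations follow Cho et al. 2014, but all function names, parameter shapes, and the `make_params` helper are illustrative assumptions, not code from any cited reference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_params(d_in, d_h, rng):
    # Illustrative random weights: (W, U) pairs for the update gate,
    # reset gate, and candidate state of a GRU (Cho et al. 2014).
    shapes = [(d_h, d_in), (d_h, d_h)] * 3
    return tuple(0.1 * rng.normal(size=s) for s in shapes)

def gru_sequence(xs, Wz, Uz, Wr, Ur, Wh, Uh):
    # Single-direction GRU scan over a sequence of input vectors;
    # returns every hidden state.
    h = np.zeros(Uz.shape[0])
    states = []
    for x in xs:
        z = sigmoid(Wz @ x + Uz @ h)              # update gate
        r = sigmoid(Wr @ x + Ur @ h)              # reset gate
        h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
        h = (1 - z) * h + z * h_cand
        states.append(h)
    return states

def local_representation(edu_word_vecs, word_params):
    # Bi-GRU over one EDU's word vectors; concatenate the last hidden
    # states of the two directions (per Shi, p. 7009).
    fwd = gru_sequence(edu_word_vecs, *word_params)
    bwd = gru_sequence(edu_word_vecs[::-1], *word_params)
    return np.concatenate([fwd[-1], bwd[-1]])

def global_representations(local_reps, sent_params):
    # Second GRU over the EDU-level local representations; each hidden
    # state serves as a non-structured global representation.
    return gru_sequence(local_reps, *sent_params)
```

With word vectors of size 4 and hidden size 3, each local representation has length 6 and each global representation length 3; the split between word-level and EDU-level parameter sets mirrors the hierarchical encoder Shi describes.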
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the discourse parsing model of Shi to incorporate the teachings of Lin to include determining a local representation, by multiple bidirectional gated recurrent units (GRUs) consuming utterances represented by the dialogue data. The “recurrent neural network (RNN) based on bidirectional Gated Recurrent Units or BiGRU” of Lin is based on the same original model relied on by Shi (both authors cite the same original author, Cho, and the same paper, as the source for the bidirectional GRUs applied in their respective systems), and can “capture long range dependencies” while using “fewer parameters” than prior art cells, thus reducing processing overhead without sacrificing quality of results in an attention mechanism, as recognized by Lin. (Lin, p. 4, col. 1, para. 3). However, Shi and Lin fail to expressly recite initializing nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the dialogue data; identifying relationships between the elementary discourse units based on structure-aware scaled dot-product attention of the SSA-GNN and a layer-wise classifier on edge hidden states. Mao teaches systems and methods for “table structure recognition” as applied to “identified machine-understandable tables” for “dialogue systems”. (Mao, pg. 3, para. 2). Regarding claim 1, Mao teaches initializing nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the dialogue data (Discloses creating a graph where the table cells are nodes and their relationships are edges, and where the table and cells can correspond to "machine-understandable tables" as used for dialogue in "dialogue systems"; Mao, ¶ pg. 5, para. 1 (step 2. and 2.); pg. 3, para. 
2); identifying relationships between the elementary discourse units based on structure-aware scaled dot-product attention of the SSA-GNN (Discloses a "graph attention layer" that employs a scaled dot-product attention (as understood from the explicit use of the associated formula). Further, the attention is structure aware in that it is masked by the adjacency matrix B, forcing it to consider existing connections in the graph.; Mao, ¶ pg. 14, para. 4; pg. 15, Formula at top of page. ([0060] in the original document)) and a layer-wise classifier on edge hidden states (Discloses an "edge classification model" that uses "2N graph attention components" to process and refine an "edge feature matrix {edge hidden states}," where each of the N components is a learnable layer-wise function which implicitly classifies or transforms the edge states to make them more separable for later application {a layer-wise classifier}; Mao, ¶ pg. 10, paras. 1-2). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the discourse parsing model of Shi, as modified by the discourse analysis of Lin, to incorporate the teachings of Mao to include initializing nodes and edges of a structural self-aware graph neural network (SSA-GNN) based on the dialogue data; identifying relationships between the elementary discourse units based on structure-aware scaled dot-product attention of the SSA-GNN and a layer-wise classifier on edge hidden states. The graph attention mechanisms of Mao “achieves the best results in both table structure recognition data sets” such as those used in dialogue systems, “especially in complex table structure recognition, the effect is obviously improved,” as recognized by Mao. (Mao, pg. 1, para. 1, pg. 3, para. 2). Regarding claim 2, the rejection of claim 1 is incorporated. 
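The structure-aware attention the rejection attributes to Mao (scaled dot-product attention whose score matrix is masked by the adjacency matrix B) can be sketched as follows. The mask value, weight shapes, and function names are assumptions for illustration, not Mao's actual formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structure_aware_attention(H, B, Wq, Wk, Wv):
    # H: (n, d) node hidden states; B: (n, n) adjacency matrix.
    # Standard scaled dot-product attention, except that scores for
    # node pairs with no edge in B are masked out, so each node
    # attends only along existing graph connections.
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = np.where(B > 0, scores, -1e9)   # structure mask
    return softmax(scores, axis=-1) @ V
```

As a sanity check on the masking: with B equal to the identity (self-loops only), each node attends solely to itself, and the output reduces to the value projection H @ Wv.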
Shi further discloses wherein the determining of the local representations comprises: processing each elementary discourse unit of the dialogue data through a first bidirectional gated recurrence unit ("For each EDU ui, a bidirectional GRU (bi-GRU) encoder is applied on the word sequence"; Shi, ¶ p. 7009, Col. 2, lines 34-35); and concatenating a hidden state generated by the first bidirectional gated recurrence unit in two directions for each elementary discourse unit ("the last hidden states in two directions are concatenated as the local representation of ui, denoted as hi."; Shi, ¶ p. 7009, Col. 2, lines 36-37). Regarding claim 3, the rejection of claim 2 is incorporated. Shi discloses all of the elements of the current invention as stated above. However, Shi fails to expressly recite further comprising updating the hidden state based on structure-aware scaled dot-product attention. The relevance of Mao is described above with relation to claim 1. Regarding claim 3, Mao teaches further comprising updating the hidden state based on structure-aware scaled dot-product attention (The purpose of the N layers of the "graph attention components" is to update the hidden states (the "point feature matrix" and the "edge feature matrix"), which the system performs by using the scaled dot-product attention formula. As previously indicated, the attention is structure aware because it is masked by the adjacency matrix, which encodes the graph’s structure.; Mao, ¶ pg. 3, para. 15-16). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the discourse parsing model of Shi, as modified by the discourse analysis of Lin, as modified by the graph attention mechanisms of Mao, to further incorporate the teachings of Mao to include further comprising updating the hidden state based on structure-aware scaled dot-product attention. 
The graph attention mechanisms of Mao “achieves the best results in both table structure recognition data sets” such as those used in dialogue systems, “especially in complex table structure recognition, the effect is obviously improved,” as recognized by Mao. (Mao, pg. 1, para. 1, pg. 3, para. 2). Regarding claim 4, the rejection of claim 2 is incorporated. Shi further discloses wherein determining the global representations comprises: processing each local representation through a second bidirectional gated recurrence unit ("The local representations of the EDUs... are taken as input to a GRU encoder and the hidden states are viewed as the non-structured global representations of the EDUs" where the GRUs are described as bidirectional GRUs.; Shi, ¶ p. 7010, col. 1, lines 1-5). Regarding claim 5, the rejection of claim 1 is incorporated. Shi discloses all of the elements of the current invention as stated above. However, Shi fails to expressly recite wherein the relationships between elementary discourse units of the dialogue data are identified based on capturing implicit structural information corresponding to: a node-specific vector for each elementary discourse unit, and an edge-specific vector, of the at least one edge-specific vector generated in the neural network, for each pair of elementary discourse units. The relevance of Mao is described above with relation to claim 1. Regarding claim 5, Mao teaches wherein the relationships between elementary discourse units of the dialogue data are identified (Mao identifies relationships between the units by using the neural network model to predict the adjacency relationship between points (cells) and edges, where a cell is analogous to an elementary discourse unit.; Mao, ¶ pg. 6, para. 1) based on capturing implicit structural information (the model uses adjacency matrix B to "Record the structural information of the undirected graph"; Mao, ¶ pg. 10, para. 
1) corresponding to: a node-specific vector for each elementary discourse unit (The model creates and uses a feature vector for each node, which it calls a "point" or "cell". The method includes extracting "the feature information of each cell (node)" which it stores in a "point feature matrix"; Mao, ¶ pg. 5, para. 8 (step 3), para. 10), and an edge-specific vector (The method further includes extracting "the feature information of… each edge" which is stored in an "edge feature matrix" by taking the "eigenvector of each edge as a row"; Mao, ¶ pg. 5, para. 8 (step 3); pg. 6, para. 1), of the at least one edge-specific vector generated in the neural network, for each pair of elementary discourse units (The model’s neural network generates new, updated edge vectors at each of the N layers where "The point to edge attention component is responsible for integrating the feature information" where the "latent representation of the edge feature matrix… is represented by H’E" and which is performed on "all edges"; Mao, ¶ pg. 10, para. 2; pg. 15, para. 1). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the discourse parsing model of Shi, as modified by the discourse analysis of Lin, as modified by the graph attention mechanisms of Mao, to further incorporate the teachings of Mao to include wherein the relationships between elementary discourse units of the dialogue data are identified based on capturing implicit structural information corresponding to: a node-specific vector for each elementary discourse unit, and an edge-specific vector, of the at least one edge-specific vector generated in the neural network, for each pair of elementary discourse units. 
The graph attention mechanisms of Mao “achieves the best results in both table structure recognition data sets” such as those used in dialogue systems, “especially in complex table structure recognition, the effect is obviously improved,” as recognized by Mao. (Mao, pg. 1, para. 1, pg. 3, para. 2). Regarding claim 6, the rejection of claim 1 is incorporated. Shi discloses all of the elements of the current invention as stated above. However, Shi fails to expressly recite further comprising training the neural network based on a layer-wise relation classification on edge hidden states of each layer of the neural network for each pair of elementary discourse units. The relevance of Mao is described above with relation to claim 1. Regarding claim 6, Mao teaches further comprising training the neural network (Discloses training a neural network in section "(A) model training"; Mao, ¶ pg. 18, para. 3) based on a layer-wise relation classification (Discloses an "edge classification model" that uses "2N graph attention components" to process and refine an "edge feature matrix {edge hidden states}," where each of the N layers performs one step in a distributed layer-wise classification process, where each layer’s function is an implicit classification or transformation which makes the final classification possible {a layer-wise relation classifier}; Mao, ¶ pg. 10, paras. 1-2) on edge hidden states of each layer of the neural network for each pair of elementary discourse units. (The model takes an initial "edge feature matrix" as input, where, for the i-th layer the input is the "edge feature matrix obtained by the previous layer" and "output gets the current edge feature matrix". These matrices are the edge hidden states for each layer, which, in light of Shi, is performed for each pair of elementary discourse units.; Mao, ¶ pg. 17 (inclusive)-pg. 18, para. 1; pg. 14, para. 1). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the discourse parsing model of Shi, as modified by the discourse analysis of Lin, as modified by the graph attention mechanisms of Mao, to further incorporate the teachings of Mao to include further comprising training the neural network based on a layer-wise relation classification on edge hidden states of each layer of the neural network for each pair of elementary discourse units. The graph attention mechanisms of Mao “achieves the best results in both table structure recognition data sets” such as those used in dialogue systems, “especially in complex table structure recognition, the effect is obviously improved,” as recognized by Mao. (Mao, pg. 1, para. 1, pg. 3, para. 2). Regarding claim 7, the rejection of claim 6 is incorporated. Shi disclose all of the elements of the current invention as stated above. However, Shi fail(s) to expressly recite wherein the layer-wise relation classification is based on at least one interim edge-specific vector at each layer of the neural network. The relevance of Mao is described above with relation to claim 1. Regarding claim 7, Mao teaches wherein the layer-wise relation classification is based on at least one interim edge-specific vector at each layer of the neural network (The layer-wise classification described in Mao relies on the edge vectors produced by the preceding layer to perform the transformation for the current layer. The output of layer i-1 is an "edge feature matrix". The rows in this matrix are the interim edge-specific vectors for that layer. These interim vectors are the direct input to the attention components of layer i, and the classification/ transformation performed at layer i is based on said vectors.; Mao, ¶ pg. 10, paras. 1-2). 
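The layer-wise treatment of edge hidden states mapped to Mao in claims 6 and 7 can be sketched as: each layer consumes the edge feature matrix produced by the previous layer (the interim edge-specific vectors), refines it, and a per-layer classifier reads relation logits off that layer's edge states so that training can supervise every layer. The tanh update and per-layer linear classifiers below are simplifying assumptions, not Mao's actual architecture:

```python
import numpy as np

def layer_wise_edge_classification(E0, layer_weights, clf_weights):
    # E0: (num_edges, d) initial edge feature matrix.
    # Each layer i takes the edge matrix produced by layer i-1
    # (the interim edge-specific vectors), applies a simplified
    # update, and a per-layer linear classifier maps the refined
    # edge states to relation logits.
    E = E0
    per_layer_logits = []
    for W, Wc in zip(layer_weights, clf_weights):
        E = np.tanh(E @ W)                # simplified edge update
        per_layer_logits.append(E @ Wc)   # layer-wise relation logits
    return E, per_layer_logits
```

During training, a cross-entropy loss would typically be summed or averaged over `per_layer_logits`, which is what makes the classification layer-wise rather than final-layer-only.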
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the discourse parsing model of Shi, as modified by the discourse analysis of Lin, as modified by the graph attention mechanisms of Mao, to further incorporate the teachings of Mao to include wherein the layer-wise relation classification is based on at least one interim edge-specific vector at each layer of the neural network. The graph attention mechanisms of Mao “achieves the best results in both table structure recognition data sets” such as those used in dialogue systems, “especially in complex table structure recognition, the effect is obviously improved,” as recognized by Mao. (Mao, pg. 1, para. 1, pg. 3, para. 2). Regarding claim 8, the rejection of claim 1 is incorporated. Shi discloses all of the elements of the current invention as stated above.

Prosecution Timeline

Apr 15, 2024
Application Filed
Sep 06, 2025
Non-Final Rejection — §103, §112, §DP
Dec 05, 2025
Response after Non-Final Action
Dec 05, 2025
Response Filed
Mar 31, 2026
Response Filed

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603095
Stereo Audio Signal Delay Estimation Method and Apparatus
2y 5m to grant; Granted Apr 14, 2026
Patent 12598250
SYSTEMS AND METHODS FOR COHERENT AND TIERED VOICE ENROLLMENT
2y 5m to grant; Granted Apr 07, 2026
Patent 12597429
PACKET LOSS CONCEALMENT
2y 5m to grant; Granted Apr 07, 2026
Patent 12512093
Sensor-Processing Systems Including Neuromorphic Processing Modules and Methods Thereof
2y 5m to grant; Granted Dec 30, 2025
Patent 12505835
HOME APPLIANCE AND SERVER
2y 5m to grant; Granted Dec 23, 2025
Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 69%
With Interview (+33.6%): 99%
Median Time to Grant: 3y 2m
PTA Risk: Low
Based on 134 resolved cases by this examiner. Grant probability derived from career allow rate.
