DETAILED ACTION
This communication is in response to Application No. 18/119,494, filed on March 9, 2023, in which claims 1-20 are presented for examination.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in the People’s Republic of China on 09/27/2022. Acknowledgment is also made of receipt of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.
Information Disclosure Statement
The information disclosure statements submitted on 07/12/2023 and 09/05/2023 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements were considered by the examiner.
Specification
The contents of the specification are sufficient for examination purposes.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Claim Objections
Claims 2-3, 5, 7, 9-10, 12, 14, 16-17, and 19 are objected to because of the following informalities:
“the full content-level feature are obtained” should be “the full content-level feature is obtained” or “the full content-level features are obtained” (Claim 2, ln. 7-8; Claim 9, ln. 7-8; Claim 16, ln. 7-8) (objection applies equally to dependent Claims 5, 7, 12, 14, and 19).
“one vertical reasoning layers for each reasoning focuses” should be “one vertical reasoning layer for each reasoning focus” or “one vertical reasoning layer for each of the reasoning focuses” (Claim 3, ln. 14; Claim 10, ln. 15; Claim 17, ln. 15).
“the number of the spliced candidate answers” should be amended to more clearly indicate that the number is the quantity of spliced candidate answers, such as “the quantity of spliced candidate answers in the long answer” (Claim 5, ln. 16; Claim 12, ln. 16; Claim 19, ln. 17) (objection applies equally to dependent Claims 7 and 14).
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Regarding Claim 1, the claim recites the limitation “preset” to describe multiple elements in the claim (ln. 7 and ln. 11). In each instance, it is unclear what the described element is “preset” in advance of.
As a result, the scope of the claim is indefinite because one cannot reasonably ascertain what qualifies as “preset”. Therefore, the claim is rejected. The claim should be amended to clarify, for each instance where “preset” is recited, what the associated element is “preset” in advance of.
Regarding Claim 2, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 3, the claim recites the limitation “preset” to describe multiple elements in the claim (ln. 2 and ln. 4), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Additionally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 4, the claim recites the limitation “preset” to describe an element of the claim (ln. 8), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Additionally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 5, the claim recites the limitation “preset” to describe an element of the claim (ln. 8), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Additionally, the claim recites the limitation “feature extracting module”, which has been evaluated under the three-prong test set forth in MPEP § 2181, subsection I, but the result is inconclusive. Thus, it is unclear whether this limitation should be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the generic placeholder is modified by a word, which is ambiguous regarding whether it conveys structure or function. The boundaries of this claim limitation are ambiguous; therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
In response to this rejection, applicant must clarify whether this limitation should be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. Mere assertion regarding applicant’s intent to invoke or not invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph is insufficient. Applicant may:
(a) Amend the claim to clearly invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, by reciting “means” or a generic placeholder for means, or by reciting “step.” The “means,” generic placeholder, or “step” must be modified by functional language, and must not be modified by sufficient structure, material, or acts for performing the claimed function;
(b) Present a sufficient showing that 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, should apply because the claim limitation recites a function to be performed and does not recite sufficient structure, material, or acts to perform that function;
(c) Amend the claim to clearly avoid invoking 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, by deleting the function or by reciting sufficient structure, material or acts to perform the recited function; or
(d) Present a sufficient showing that 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, does not apply because the limitation does not recite a function or does recite a function along with sufficient structure, material or acts to perform that function.
Furthermore, the claim recites the term “consistent with” (ln. 15), which is a relative term that renders the claim indefinite. The term “consistent with” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. As a result, one cannot reasonably ascertain what qualifies as “a number that is consistent with the number” (ln. 15-16). Therefore, the claim is rejected.
Finally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 6, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 7, the claim recites the limitation “preset” to describe multiple elements of the claim (ln. 4, 8, 12, 13, 15, 20, 25, 27, 36, 45, 48, 51, 52, and 54), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Additionally, the claim recites the limitations “feature extracting module”, “horizontal reasoning module”, “vertical reasoning module”, and “feature matching module”, which have been evaluated under the three-prong test set forth in MPEP § 2181, subsection I, but the result is inconclusive. Thus, it is unclear whether these limitations should be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the generic placeholders are modified by words that are ambiguous regarding whether they convey structure or function. The boundaries of these claim limitations are ambiguous; therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant must clarify whether these limitations should be interpreted under 35 U.S.C. 112(f), in the manner described in regard to the rejection of Claim 5 above.
Furthermore, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 8, the claim recites the limitation “preset” to describe multiple elements in the claim (ln. 12 and ln. 16), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Regarding Claim 9, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 10, the claim recites the limitation “preset” to describe multiple elements in the claim (ln. 2 and ln. 4), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Additionally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 11, the claim recites the limitation “preset” to describe an element of the claim (ln. 10), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Additionally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 12, the claim recites the limitation “preset” to describe an element of the claim (ln. 8), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Additionally, the claim recites the limitation “feature extracting module”, which has been evaluated under the three-prong test set forth in MPEP § 2181, subsection I, but the result is inconclusive. Thus, it is unclear whether this limitation should be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the generic placeholder is modified by a word, which is ambiguous regarding whether it conveys structure or function. The boundaries of this claim limitation are ambiguous; therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant must clarify whether this limitation should be interpreted under 35 U.S.C. 112(f), in the manner described in regard to the rejection of Claim 5 above.
Furthermore, the claim recites the limitation “consistent with” (ln. 15-16), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 5 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Finally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 13, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 14, the claim recites the limitation “preset” to describe multiple elements of the claim (ln. 4, 8, 12, 13, 15, 20, 25, 27, 36, 45, 48, 51, 52, and 54), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Additionally, the claim recites the limitations “feature extracting module”, “horizontal reasoning module”, “vertical reasoning module”, and “feature matching module”, which have been evaluated under the three-prong test set forth in MPEP § 2181, subsection I, but the result is inconclusive. Thus, it is unclear whether these limitations should be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the generic placeholders are modified by words that are ambiguous regarding whether they convey structure or function. The boundaries of these claim limitations are ambiguous; therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant must clarify whether these limitations should be interpreted under 35 U.S.C. 112(f), in the manner described in regard to the rejection of Claim 5 above.
Furthermore, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 15, the claim recites the limitation “preset” to describe multiple elements in the claim (ln. 9 and ln. 13), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Regarding Claim 16, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 17, the claim recites the limitation “preset” to describe multiple elements in the claim (ln. 3 and ln. 4), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Additionally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 18, the claim recites the limitation “preset” to describe an element of the claim (ln. 11), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Additionally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 19, the claim recites the limitation “preset” to describe an element of the claim (ln. 9), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Additionally, the claim recites the limitation “feature extracting module”, which has been evaluated under the three-prong test set forth in MPEP § 2181, subsection I, but the result is inconclusive. Thus, it is unclear whether this limitation should be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the generic placeholder is modified by a word, which is ambiguous regarding whether it conveys structure or function. The boundaries of this claim limitation are ambiguous; therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant must clarify whether this limitation should be interpreted under 35 U.S.C. 112(f), in the manner described in regard to the rejection of Claim 5 above.
Furthermore, the claim recites the limitation “consistent with” (ln. 15-16), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 5 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Finally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 20, the claim is rejected because it is dependent upon a rejected claim.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4, 8, 11, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Tran et al. (hereinafter Tran) (Pub. No. US 2021/0081503 A1) in view of Zhang et al. (hereinafter Zhang) (“A Densely Connected Criss-Cross Attention Network for Document-level Relation Extraction”).
Regarding Claim 1, Tran teaches a method for determining an answer to a question, the method comprising (Abstract, “The present disclosure relates to systems, methods, and non-transitory computer-readable media that can determine an answer to a query”):
splicing an acquired to-be-queried question with each candidate answer into each question-answer pair (Para. [0047], “one or more components can pre-process the query 208 and any contextual information (e.g., for conversion to an embedding via a word-vector representation model). Once the query 208 is in vector form (e.g., a single vector for a smaller query 208 or a sequence of sub-vectors for a larger query 208), the answer selection system 106 can combine the query 208 with one or more of the candidate answers 210 (also converted to vector form) to form an input vector 211 representing both the query 208 and a given candidate answer of the candidate answers 210”, where “combin[ing]” the question “query” with “a given candidate answer” into an “input vector representing both the query 208 and a given candidate answer” is within the broadest reasonable interpretation of splicing an acquired to-be-queried question with a candidate answer; see also Para. [0048], “the answer selection system 106 can generate multiple unique input vectors by combining the query 208 with respective candidate answers of the candidate answers 210”, where the “respective candidate answers” from the database “of the candidate answers 210” are the candidate answers, each of which is spliced into a question-answer pair, the “multiple unique input vectors”; see generally Fig. 2, where the “Query 208” is an acquired to-be-queried question, such as “Hey Assistant, How Do Killer Whales Coordinate Attacks On Prey”, which can be acquired through input, see Para. [0052], “one or both of the query 208 and contextual information can be in textual format from the outset . . . For example, the user 202 may choose to type the query 208 into the computing device 206”);
performing reasoning operations of feature combination parameters on different granularity features of each question-answer pair (Fig. 3B and Para. [0060], “FIG. 3B illustrates a process flow 320 for using the gated-self attention mechanism 303 to determine a gated self-attention output vector 306 for determining a candidate answer match probability 310, in accordance with one or more embodiments of the present disclosure . . . utilizing the memory network in tandem with gate features . . .”, where the “process flow 320” comprises reasoning operations, such as “the gated-self attention mechanism 303” and the “utilizing the memory network”, which are performed using parameters, see Para. [0016], “the gated-self attention mechanism can use one or more functions with learned parameters to determine, based on the above inputs, self-attention outputs of the gated self-attention output vector”, and performed on each question-answer pair “314” and its different features, “ck” and “xk1” through “xkn”, contained in the inputs “302”, see Para. [0054], “The inputs 302 as shown in FIG. 3A [where 3B is a variant including the memory network] include an input vector denoted as x.sub.1.sup.k . . . x.sub.n.sup.k that can each include a combination of a query (or a portion thereof) and a candidate answer as described above”; see also Para. [0032], “the answer selection system can refine one or more learned parameters (e.g., learned via the question-answer dataset) by applying the one or more learned parameters at the target dataset. In some embodiments, the target dataset can include one or more candidate answers to a query, while in other embodiments, arbitrary and/or unrelated to the query”, where the “learned parameters” are feature combination parameters at least because they are “learned” and “refine[d]” using the combined features of the “question-answer dataset” and the “target dataset”; see also Para. [0047], “form an input vector 211 representing both the query 208 and a given candidate answer of the candidate answers 210. As mentioned, for larger queries 208 (e.g., a paragraph, column, page, etc.), the input vector 211 may include a sequence of input sub-vectors” and Para. [0054], “the answer selection system 106 can feed the inputs 302 to the gated self-attention mechanism 303 for attention to the query, a given candidate answer, and contextual information . . . as opposed to utilizing the context vector or a single input vector alone”, where features analyzed by the reasoning operations are of different granularities, such as features from “input sub-vectors” of varying “query” sizes and broader “context vector” features)
at a preset number of steps in a horizontal direction based on recurrent characteristics of a recurrent neural network (Para. [0060] – [0061], “By utilizing the memory network in tandem with gate features of the gated self-attention output vector 306, the answer selection system 106 can remember important and/or relevant information, forget unimportant and/or irrelevant information, etc. by updating the cell state of the memory network. In turn, the updated cell state of the memory network can influence a next context vector state 318 (denoted as c.sub.k+1) relative to a current context vector state 315 (denoted as c.sub.k) . . . x.sub.i.sup.k+1 represents the input vector 316 as the next cell state of the memory network at the k+1 reasoning hop; and x.sub.i.sup.k represents the input vector 314 as the current cell state of the memory network at the kth reasoning hop”, where the “memory network” allows for the repeated “reasoning hop[s]” to “update” values, which is within the broadest reasonable interpretation of steps in a horizontal direction, and can be based on the recurrent characteristics of a “recurrent neural network”, see Para. [0028], “the memory network can include a recurrent neural network”; see also Para. [0070], “For consistency of experimentation, the data shown in the table 500 was obtained by setting the number of reasoning hops to be two”, where “setting the number of reasoning hops to be two” for “consistency of experimentation” is a preset number of steps);
determining feature combination weights of the different granularity features using [reasoning components] (Para. [0055], “a gated self-attention matrix 304 with its associated values is depicted in matrix form (albeit other forms are herein contemplated). To generate or otherwise populate the values of the gated self-attention matrix 304, the gated self-attention mechanism 303 receives the inputs 302 and can execute one or more of the following example algorithms [Algorithm 00001] where W and b represent learned parameters shared among functions”, where the reasoning components, the “gated self-attention mechanism”, use weights “W”, which, as shown in Fig. 3B and Algorithm 1, appear in different combinations with the features to generate the multiple “values of the self-attention matrix”, to determine the “gated self-attention matrix 304” of “the inputs 302”, which as discussed above includes the different granularity features, see Para. [0047], “form an input vector 211 representing both the query 208 and a given candidate answer of the candidate answers 210. As mentioned, for larger queries 208 (e.g., a paragraph, column, page, etc.), the input vector 211 may include a sequence of input sub-vectors” and Para. [0054], “the answer selection system 106 can feed the inputs 302 to the gated self-attention mechanism 303 for attention to the query, a given candidate answer, and contextual information . . . as opposed to utilizing the context vector or a single input vector alone”; see also Fig. 3B and Para. [0056], “Thus, according to the above expression for generating the gated self-attention output vector 306, the answer selection system 106 can aggregate an input vector and the context vector both weighted by various self-attention outputs a, which can include values depicted in the gated self-attention matrix 304”, where “self-attention outputs a” can also be viewed as feature combination weights, which act on the different granularity features, “302”, to generate the feature combination “306”)
. . . at each step of the reasoning operations in the horizontal direction (Para. [0060] – [0061], “By utilizing the memory network in tandem with gate features of the gated self-attention output vector 306, the answer selection system 106 can remember important and/or relevant information, forget unimportant and/or irrelevant information, etc. by updating the cell state of the memory network. In turn, the updated cell state of the memory network can influence a next context vector state 318 (denoted as c.sub.k+1) relative to a current context vector state 315 (denoted as c.sub.k) . . . x.sub.i.sup.k+1 represents the input vector 316 as the next cell state of the memory network at the k+1 reasoning hop; and x.sub.i.sup.k represents the input vector 314 as the current cell state of the memory network at the kth reasoning hop”, where, as discussed above, the reasoning operations of Fig. 3B occur for each repeated “reasoning hop” to “update” values, which is within the broadest reasonable interpretation of steps in a horizontal direction, and can be based on the recurrent characteristics of a “recurrent neural network”);
obtaining a candidate answer feature corresponding to each question-answer pair, respectively, through a final step of the reasoning operations (Fig. 3B and Para. [0062], “the answer selection system 106 can determine the next context vector state 318 based on the current gated self-attention output vector state 307 of the GSAM, the current context vector state 315, and the input vector 316 . . . The answer selection system 106 can then pass the next context vector state 318 to the probability function 308 for determining the candidate answer match probability 310”, where a candidate answer feature, “the next context vector state 318”, is iteratively obtained by the “answer selection system”, which is passed to “the probability function 308” after the final step of reasoning of “utilizing the memory network in tandem with gate features of the gated self-attention output vector 306”, see Para. [0060] – [0062], and which corresponds with each question-answer pair, because it is iteratively generated from horizontal reasoning outputs, “based on the current gated self-attention output vector state 307 of the GSAM, the current context vector state 315, and the input vector 316”, from the input “302” containing the pairs “314”, which is used “for determining the candidate answer match”); and
determining a target candidate answer matching the to-be-queried question (Fig. 2, where a target candidate answer is determined, such as “Candidate Answer 345-MX From Naturegeographic.com”, as matching the to-be-queried question, “Selected Answer Match 216”; see also Para. [0051], “For example, as shown, the answer selection system 106 can return the selected answer match 216 that corresponds to a candidate answer 345-MX of the candidate answers 210, in which the example website nationalgeographic.com includes a best match response to the query 208 out of the candidate answers 210”)
based on a feature similarity between a question feature of the to-be-queried question and each candidate answer feature (Para. [0062], “The answer selection system 106 can then pass the next context vector state 318 to the probability function 308 for determining the candidate answer match probability 310 that includes a matching probability between the query and the respective candidate answer as described above in conjunction with FIG. 3A”, where the “context vector state 318” is used to generate the “candidate answer match probability”, based on a feature similarity between the “respective candidate answer” component of each candidate answer feature, “318”, and features of the “query”, as further outlined “as described above in conjunction with FIG. 3A”, see Para. [0054] - [0058], “in FIG. 3A . . . the answer selection system 106 can feed the inputs 302 to the gated self-attention mechanism 303 for attention to the query, a given candidate answer, and contextual information. In so doing, as opposed to utilizing the context vector or a single input vector alone, the answer selection system 106 can improve accuracy and flexibility for determining the candidate answer match probability 310 . . . The answer selection system 106 can then pass the current gated self-attention output vector state 307 to the probability function 308 for determining the candidate answer match probability 310 that includes a matching probability between the query and the respective candidate answer”).
Tran does not explicitly disclose . . . using multiple preset vertical reasoning layers at different reasoning focuses respectively . . . wherein the vertical reasoning layers are in serial connection to each other . . . .
However, Zhang teaches . . . [performing attention-based reasoning on text pairs] using multiple preset vertical reasoning layers at different reasoning focuses respectively (Pg. 1, Col. 1, Abstract, “Document-level relation extraction (RE) aims to identify relations between two entities in a given document . . . the Dense-CCNet performs entity-pair-level logical reasoning through the Criss-Cross Attention (CCA), which can collect contextual information in horizontal and vertical directions on the entity-pair matrix to enhance the corresponding entity-pair representation”, where “vertical” components of “logical reasoning” include multiple preset layers, see Pg. 5, Col. 2, Para. 4, “We set the number of layers of Dense-CCNet to 3”, and are at different reasoning focuses, which are also preset such that the “lower layers can capture local interdependence” and “the upper layers can capture global interdependence”, see Pg. 2, Col. 2, Para. 1, “The lower layers in DenseCCNet can capture local interdependence among entity-pairs and complete single-hop logical reasoning, while the upper layers can capture global interdependence among entity-pairs and complete multi-hop logical reasoning”) . . .
[at each step of the reasoning operations in the horizontal direction,] wherein the vertical reasoning layers are in serial connection to each other . . . (Pg. 4, Col. 1, Para. 3, “the CCA module can complete entity-pair-level one-hop reasoning on the entity-pair matrix, and it is possible to complete multi-hop reasoning by stacking multiple layers of the CCA module” and Pg. 2, Col. 1, Para. 1, “To fully capture the features of single-hop and multi-hop reasoning, we stack the multi-layer modules CCA modules by the densely connected framework. The lower layers in DenseCCNet can capture local interdependence among entity-pairs and complete single-hop logical reasoning, while the upper layers can capture global interdependence among entity-pairs and complete multi-hop logical reasoning”, where “stacked” “lower” and “upper” layers are within the broadest reasonable interpretation of serial connection of vertical layers and where each horizontal reasoning step, “single-hop”, uses multiple vertical reasoning layers, an individualized “lower . . . single-hop” layer and an “upper . . . multi-hop” layer used by multiple horizontal reasoning layers in the aggregate; see generally Pg. 3, Fig. 2, where the “overall architecture of our Dense-CCNet-based document-level RE model” is depicted).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the method for determining an answer to a question by performing horizontal reasoning operations on a question and candidate answer pair, wherein feature combination weights are determined for different granularity features at each step in the horizontal reasoning operations of Tran with the performing of attention-based reasoning on text pairs using multiple preset vertical reasoning layers with different reasoning focuses at each step of horizontal reasoning operations, wherein the vertical layers are in serial connection of Zhang in order to improve model reasoning by capturing enhanced entity-pair representations with increased ability to capture relevance relationships directly and efficiently (Zhang, Pg. 1, Col. 1, Abstract, “the Dense-CCNet performs entity-pair-level logical reasoning through the Criss-Cross Attention (CCA), which can collect contextual information in horizontal and vertical directions on the entity-pair matrix to enhance the corresponding entity-pair representation”; Zhang, Pg. 1-2, Col. 2-1, Para. 2-3, “capturing the relevance of the relationships is essential to improve the reasoning ability of document-level RE models . . . In this paper, we use the information transfer between the entity-pairs to capture the correlation between relationships more efficiently and directly”), which contributes to improved performance (Zhang, Pg. 1, Col. 1, Abstract, “Experimental results demonstrate that our model achieves state-of-the-art performance on these three datasets”).
Regarding Claim 4, Tran in view of Zhang teach the method according to claim 1, wherein determining a target candidate answer matching the to-be-queried question (Tran, Fig. 2, where a target candidate answer is determined, such as “Candidate Answer 345-MX From Naturegeographic.com”, as matching the to-be-queried question, “Selected Answer Match 216”; see also Tran, Para. [0051], “For example, as shown, the answer selection system 106 can return the selected answer match 216 that corresponds to a candidate answer 345-MX of the candidate answers 210, in which the example website nationalgeographic.com includes a best match response to the query 208 out of the candidate answers 210”)
based on the feature similarity between the question feature of the to-be-queried question and the candidate answer feature, comprises: calculating an actual feature similarity between the question feature and each candidate answer feature, respectively (Tran, Para. [0062], “The answer selection system 106 can then pass the next context vector state 318 to the probability function 308 for determining the candidate answer match probability 310 that includes a matching probability between the query and the respective candidate answer as described above in conjunction with FIG. 3A”, where the “context vector state 318” is used to generate the “candidate answer match probability”, which is within the broadest reasonable interpretation of an actual feature similarity, based on a feature similarity between the “respective candidate answer” component of each candidate answer feature, “318”, and features of the “query”, as further outlined “as described above in conjunction with FIG. 3A”, see Tran, Para. [0054] - [0058], “in FIG. 3A . . . the answer selection system 106 can feed the inputs 302 to the gated self-attention mechanism 303 for attention to the query, a given candidate answer, and contextual information. In so doing, as opposed to utilizing the context vector or a single input vector alone, the answer selection system 106 can improve accuracy and flexibility for determining the candidate answer match probability 310 . . . The answer selection system 106 can then pass the current gated self-attention output vector state 307 to the probability function 308 for determining the candidate answer match probability 310 that includes a matching probability between the query and the respective candidate answer”),
determining a candidate answer feature having the actual feature similarity greater than a preset similarity as the target candidate answer feature (Tran, Para. [0030], “the answer selection system generates matching probabilities for combinations of respective candidate answers and the query” and Tran, Para. [0062], “The answer selection system 106 can then pass the next context vector state 318 to the probability function 308 for determining the candidate answer match probability 310 that includes a matching probability between the query and the respective candidate answer”, where the target answer feature of the “select[ed]” “answer” is determined based on the candidate answer feature, “318”, that produces the greatest actual feature similarity, “the candidate match probability 310” that produces the “best query-candidate answer match”, see Tran, Para. [0059], “such that the answer selection system 106 can select the best query-candidate answer match”; which, in view of Zhang, is greater than a preset threshold, see Zhang, Pg. 5, Col. 1, Para. 4, “we use adaptive thresholding loss (Zhou et al., 2021), which learns an adaptive threshold for each entity pair . . . where TH is an introduced class to separate positive classes and negative classes: positive classes would have higher probabilities than TH, and negative classes would have lower probabilities than TH, PD and ND are the positive classes set and negative classes set in document D respectively”, where the “adaptive” nature of the threshold “TH” does not preclude it being preset because it is initially an “introduced class” set in advance, which “adaptively” changes from its initial preset value); and
determining a candidate answer corresponding to the target candidate answer feature as the target candidate answer matching the to-be-queried question (Tran, Fig. 2, where a target candidate answer is determined, such as “Candidate Answer 345-MX From Naturegeographic.com”, as matching the to-be-queried question, “Selected Answer Match 216”; see also Tran, Para. [0051], “For example, as shown, the answer selection system 106 can return the selected answer match 216 that corresponds to a candidate answer 345-MX of the candidate answers 210, in which the example website nationalgeographic.com includes a best match response to the query 208 out of the candidate answers 210”; see also Tran, Para. [0059], “such that the answer selection system 106 can select the best query-candidate answer match . . . the answer selection system 106 can utilize a gated self-attention mechanism 303 to account for each of the query, the contextual information, and a set of candidate answers while also accurately and flexibly controlling the flow of information used to generate the candidate answer match probability 310 and ultimately provide a response to the query”, where the candidate answer corresponding to the target candidate answer feature with the “best query-candidate answer match” is selected).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the method for determining a target candidate answer to a question by determining that a candidate answer feature has an actual feature similarity greater than the actual feature similarities of other candidate features of Tran in view of Zhang with the preset threshold of Zhang in order to separate low similarity question-answer pairs from high similarity question-answer pairs, which will prevent the determination of an objectively bad answer as the target candidate answer, even if it is comparatively good relative to the other candidate answers (Zhang, Pg. 2, Col. 2, Para. 2, “we found that more than 90% of the entity pairs are irrelevant (that is, there is no relationship between two entities) in the document, and these entity pairs may limit the model’s reasoning ability”, where a dataset with “more than 90% . . . entity pairs . . . [that are] irrelevant” has an increased probability that the most relevant answer will not be objectively good; see also Zhang, Pg. 4, Para. 1, “We design a clustering loss function that separates the related entity pairs and the unrelated entity pairs”, where “unrelated entity pairs” should not be selected as an answer even if they are the best candidate), which would contribute to improved performance (Zhang, Pg. 1, Col. 1, Abstract, “Experimental results demonstrate that our model achieves state-of-the-art performance on these three datasets”).
Regarding Claim 8, Tran teaches an apparatus for determining an answer to a question, the apparatus comprising: at least one processor; and a memory storing instructions, wherein the instructions when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising . . . (Fig. 8 and Para. [0105], “The computing device 800 includes memory 804, which is coupled to the processor(s) 802”, where the apparatus, “computing device 800” comprises “processor(s) 802” and “memory 804”; see also Para. [0104], “As an example, and not by way of limitation, to execute instructions, the processor(s) 802 may retrieve (or fetch) the instructions from an internal register, an internal cache, memory 804, or a storage device 806 and decode and execute them”, where the “memory 804” stores “instructions” which can be executed by the “processor(s) 802”;
where the instructions are stored in memory and executed by the processor; see also Para. [0082], “Each of the components 602-616 of the answer selection system 106 can include software, hardware, or both. For example, the components 602-616 can include one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices, such as a client device or server device. When executed by the one or more processors, the computer-executable instructions of the answer selection system 106 can cause the computing device(s) to perform the methods described herein”, where the “processors” “cause the computing device(s) to perform the methods described herein”, which includes determining an answer to a question, see Abstract, “The present disclosure relates to systems, methods, and non-transitory computer-readable media that can determine an answer to a query”).
The remaining limitations are substantially the same as limitations of Claim 1, therefore it is rejected under the same rationale.
Regarding Claim 11, the additional elements of the dependent claim are substantially the same as limitations of Claim 4, therefore it is rejected under the same rationale.
Regarding Claim 15, Tran teaches a non-transitory computer readable storage medium storing computer instructions, wherein, the computer instructions are used to cause the computer to perform operations comprising . . . (Para. [0093], “one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer-readable medium and executable by one or more computing devices . . . thereby performing one or more processes, including one or more of the processes described herein”).
The remaining limitations are substantially the same as limitations of Claim 1, therefore it is rejected under the same rationale.
Regarding Claim 18, the additional elements of the dependent claim are substantially the same as limitations of Claim 4, therefore it is rejected under the same rationale.
Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Zhang and Yang et al. (hereinafter Yang) (“Hierarchical Attention Networks for Document Classification”).
Regarding Claim 2, Tran in view of Zhang teach the method according to claim 1, wherein the different granularity features comprise . . . (Tran, Para. [0047], “form an input vector 211 representing both the query 208 and a given candidate answer of the candidate answers 210. As mentioned, for larger queries 208 (e.g., a paragraph, column, page, etc.), the input vector 211 may include a sequence of input sub-vectors” and Tran, Para. [0054], “the answer selection system 106 can feed the inputs 302 to the gated self-attention mechanism 303 for attention to the query, a given candidate answer, and contextual information . . . as opposed to utilizing the context vector or a single input vector alone”),
[input documents organized such that] . . . words contained in a sentence in a sequence in which the words form the sentence, and . . . sentences contained in full question-answering content in a sequence in which the sentences form the full question-answering content (Tran, Para. [0023], “the query can include words . . . For instance, the query can include a sequence of words (either written or spoken). For example, a query can include one or more words that form one or more sentences in the form of statements and/or questions”, where the sentence-level features are “a sequence of words” to form the sentence and the content-level features are “one or more sentences in the form of statements and/or questions”, which form the full provided question-answer content).
Tran in view of Zhang do not explicitly disclose . . . a word-level feature, a sentence-level feature, and a full content-level feature, the sentence-level feature is obtained by splicing the word-level features of . . . and the full content-level feature are obtained by splicing sentence-level features of . . .
However, Yang teaches [text-based reasoning] (Pg. 1, Col. 1, Abstract, “We propose a hierarchical attention network for document classification”)
[on] . . . a word-level feature, a sentence-level feature, and a full content-level feature (Pg. 2, Col. 1, Para. 1, “our model includes two levels of attention mechanisms (Bahdanau et al., 2014; Xu et al., 2015) — one at the word level and one at the sentence level — that let the model to pay more or less attention to individual words and sentences when constructing the representation of the document”, where the “word level” “attention to individual words” evaluates word-level granularity features, the “sentence level” “attention to . . . sentences” evaluates sentence-level granularity features, and the “representation of the document” is the full content-level feature; see also Pg. 2, Col. 2, Fig. 2, where the “word encoder” encodes words to form word-level features and the “sentence encoder” encodes sentences to form sentence-level features, for use in generating the full content-level feature),
the sentence-level feature is obtained by splicing the word-level features of [words contained in a sentence in a sequence in which the words form the sentence] (Pg. 1, Col. 2, Para. 2, “First, since documents have a hierarchical structure (words form sentences, sentences form a document), we likewise construct a document representation by first building representations of sentences and then aggregating those into a document representation” and Pg. 3, Col. 1, Para. 2-5, “we first embed the words to vectors . . . and aggregate the representation of those informative words to form a sentence vector”, where the “sentence vector” sentence-level features are obtained by “aggregating” the “word . . . vector” word-level features, which is within the broadest reasonable interpretation of splicing),
and the full content-level feature are obtained by splicing sentence-level features of [sentences contained in full question-answering content in a sequence in which the sentences form the full question-answering content] (Pg. 1, Col. 2, Para. 2, “First, since documents have a hierarchical structure (words form sentences, sentences form a document), we likewise construct a document representation by first building representations of sentences and then aggregating those into a document representation” and Pg. 3, Col. 2, Para. 2, “Given the sentence vectors si , we can get a document vector in a similar way . . . The document vector v is a high level representation of the document and can be used as features”, where the “high level representation” “document vector” “feature” is obtained “in a similar way” to the sentence vectors, which as discussed above are within the broadest reasonable interpretation of splicing).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the method for determining an answer to a question by performing reasoning operations on features of different granularities, wherein the inputs are organized such that sentences are sequences of words and full content inputs are sequences of sentences of Tran in view of Zhang with the text-based reasoning on word-level features, sentence-level features, and full content-level features, wherein the sentence-level features are obtained by splicing the word-level features into a sequence to form a sentence and the full content-level features are obtained by splicing the sentence-level features into a sequence to form a full context of Yang in order to maintain the hierarchical structure of the to-be-queried question (Yang, Pg. 1-2, Col. 1-2, Para. 2-1, “the Hierarchical Attention Network (HAN) that is designed to capture two basic insights about document structure. First, since documents have a hierarchical structure (words form sentences, sentences form a document), we likewise construct a document representation by first building representations of sentences and then aggregating those into a document representation”), which allows for context-based relevancy determinations (Yang, Pg. 2, Col. 1, Para. 1-2, “Second, it is observed that different words and sentences in a documents are differentially informative Moreover, the importance of words and sentences are highly context dependent . . . The key difference to previous work is that our system uses context to discover when a sequence of tokens is relevant rather than simply filtering for (sequences of) tokens, taken out of context”) and contributes to improved performance (Yang, Pg. 1, Abstract, “Experiments conducted on six large scale text classification tasks demonstrate that the proposed architecture outperform previous methods by a substantial margin”).
Regarding Claim 9, the additional elements of the dependent claim are substantially the same as limitations of Claim 2, therefore it is rejected under the same rationale.
Regarding Claim 16, the additional elements of the dependent claim are substantially the same as limitations of Claim 2, therefore it is rejected under the same rationale.
Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Zhang, Graves (“Adaptive Computation Time for Recurrent Neural Networks”), and Mihaylov et al. (hereinafter Mihaylov) (“SemanticZ at SemEval-2016 Task 3: Ranking Relevant Answers in Community Question Answering Using Semantic Similarity Based on Fine-tuned Word Embeddings”).
Regarding Claim 3, Tran in view of Zhang teach the method according to claim 1, wherein the method further comprises: pre-constructing a preset number of vertical reasoning layers, wherein the pre-constructing a preset number of vertical reasoning layers, comprises (Zhang, Pg. 2, Col. 2, Para. 1, “To fully capture the features of single-hop and multi-hop reasoning, we stack the multi-layer modules CCA modules by the densely connected framework. The lower layers in DenseCCNet can capture local interdependence among entity-pairs and complete single-hop logical reasoning, while the upper layers can capture global interdependence among entity-pairs and complete multi-hop logical reasoning” and Zhang, Pg. 4, Col. 1, Para. 2, “the Dense-CCNet module consists of densely connected N identical layers”, where a number of preset vertical reasoning layers “lower layers” and “upper layers” are pre-constructed to form the “Dense-CCNet”, which can be preset to a number “N”, such as “3”, see Zhang, Pg. 5, Col. 2, Para. 4, “We set the number of layers of Dense-CCNet to 3”):
. . .
generating one vertical reasoning layers for each reasoning focuses, respectively (Zhang, Pg. 2, Col. 2, Para. 1, “The lower layers in DenseCCNet can capture local interdependence among entity-pairs and complete single-hop logical reasoning, while the upper layers can capture global interdependence among entity-pairs and complete multi-hop logical reasoning”, where at least one “lower layer” and at least one “upper layer” are generated for the reasoning focuses of “local” and “global”).
The reasons for obviousness, in regard to the combination of Tran with Zhang, were discussed in regard to the rejection of Claim 1 above and remain applicable here.
Tran in view of Zhang do not explicitly disclose . . . determining a first corpus length of the to-be-queried question and a domain complexity of a domain to which the to-be-queried question belongs; determining a second corpus length of each candidate answer in a candidate answer base of the domain corresponding to the to-be-queried question; determining, based on the domain complexity, the first corpus length and the second corpus length, an actual number of reasoning focuses; and . . .
However, Graves teaches . . . determining . . . a domain complexity of a domain to which the to-be-queried question belongs (Pg. 1, Abstract, “Overall, performance is dramatically improved by the use of ACT, which successfully adapts the number of computational steps to the requirements of the problem . . . with more computation allocated to harder-to-predict transitions, such as spaces between words and ends of sentences”, where “the requirements of the problem” is within the broadest reasonable interpretation of complexity, which is determined from input data, “with more computation allocated to harder-to-predict transitions, such as spaces between words and ends of sentences”, and is specific to the domain to which the input to-be-queried question belongs, see Pg. 1, Para. 2, “we expect the effort required to find a satisfactory route between two cities, or the number of queries needed to check a particular fact, to vary greatly, and unpredictably, from case to case”, where “find a satisfactory route between two cities” and “check a particular fact” are within the broadest reasonable interpretation of a to-be-queried question)
. . . [and] determining, based on the domain complexity . . . an actual number of reasoning focuses (Pg. 2, Para. 1, “the effective depth of the network at each step along the sequence becomes a dynamic function of the inputs received so far”, where the focuses of vertical reasoning, “depth of the network”, are “dynamic[ally]” determined by the features of the “inputs”, which, as discussed above, include the domain complexity; see also Pg. 1, Abstract, “This paper introduces Adaptive Computation Time (ACT), an algorithm that allows recurrent neural networks to learn how many computational steps to take between receiving an input and emitting an output”).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the method for determining an answer to a question, comprising pre-constructing a preset number of vertical reasoning layers by generating one vertical reasoning layer for each reasoning focus of Tran in view of Zhang with the determining the domain complexity of a to-be-queried question and determining an actual number of reasoning focuses based on the domain complexity of Graves in order to dramatically improve performance of the question answering method (Graves, Pg. 1, Abstract, “performance is dramatically improved by the use of ACT, which successfully adapts the number of computational steps to the requirements of the problem”) by limiting the number of reasoning focuses to what is required to answer the question (Graves, Pg. 2, Para. 4, “We would like the network to be parsimonious in its use of computation, ideally limiting itself to the minimum number of steps necessary to solve the problem”).
Additionally, Mihaylov teaches . . . [a method of determining answers to questions, comprising] (Pg. 1, Col. 1, Abstract, “We describe our system for finding good answers in a community forum, as defined in SemEval-2016, Task 3 on Community Question Answering. Our approach relies on several semantic similarity features”)
[determining] a first corpus length of the to-be-queried question (Pg. 3, Section “4.2 Features”, Col. 2, Para. 7, “Question length. If the question is longer, it may be more clear, which may help users give a more relevant answer”, where “Question length” is a first corpus length)
. . . determining a second corpus length of each candidate answer in a candidate answer base of the domain corresponding to the to-be-queried question . . . (Pg. 3, Section “4.2 Features”, Col. 2, Para. 6, “Answer length. The assumption here is that longer answers could bring more useful detail”, where “Answer length” is a second corpus length, which is done for each candidate answer in the corresponding answer domain “find an answer that already exists in the forum”, see Pg. 1, Col. 1, Para. 2, “The main subtask (Subtask C) asks to find an answer that already exists in the forum and will be appropriate as a response to a newly posted question”; see also Pg. 2, Col. 1, Para. 2, “Subtask C, there are 317 original questions, 3,169 related questions, and 31,690 comments”)
[and a determination related to focus based on] the first corpus length and the second corpus length [and other factors] . . . (Pg. 2, Col. 2, Para. 2, “For each comment, we extract variety of features from both the question and the comment, and we train a classifier to label comments as Good or Bad with respect to the thread question. We rank the comments in each question according to the classifier’s score of being classified as Good with respect to the question”, where the “features from both the question and the comment”, which, as discussed above, includes the first and second corpus length, are used to “rank the comments”, which is a determination related to focus).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the method for determining an answer to a question, comprising pre-constructing a preset number of vertical reasoning layers by determining the domain complexity of a to-be-queried question, determining an actual number of reasoning focuses based on the domain complexity, and generating one vertical reasoning layer for each reasoning focus of Tran in view of Zhang and Graves with the method of determining answers to questions, comprising determining corpus lengths of a question and each domain-associated candidate answer and making a determination related to focus based on the corpus lengths and other factors of Mihaylov in order to limit the number of reasoning focuses to what is required to answer the question (Graves, Pg. 2, Para. 4, “We would like the network to be parsimonious in its use of computation, ideally limiting itself to the minimum number of steps necessary to solve the problem”), which will vary based on the lengths of the questions and candidate answers (Mihaylov, Pg. 3, Section “4.2 Features”, Col. 2, Para. 6-7, “Answer length. The assumption here is that longer answers could bring more useful detail. Question length. If the question is longer, it may be more clear, which may help users give a more relevant answer”) and to expend greater computational resources on better candidate answer pairs (compare Graves, Pg. 1, Abstract, “which successfully adapts the number of computational steps to the requirements of the problem . . . with more computation allocated to harder-to-predict transitions”, where the characteristics of the inputs can be used to adjust allocated resources, with Mihaylov, Pg. 2, Col. 2, Para. 2, “For each comment, we extract variety of features from both the question and the comment, and we train a classifier to label comments as Good or Bad with respect to the thread question. We rank the comments in each question according to the classifier’s score of being classified as Good with respect to the question”, where “Good” answers are prioritized over “Bad” answers).
Regarding Claim 10, the additional elements of the dependent claim are substantially the same as limitations of Claim 3, therefore it is rejected under the same rationale.
Regarding Claim 17, the additional elements of the dependent claim are substantially the same as limitations of Claim 3, therefore it is rejected under the same rationale.
Claims 5, 7, 12, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Zhang, Yang, and Devlin et al. (hereinafter Devlin) (“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”).
Regarding Claim 5, Tran in view of Zhang and Yang teach the method according to claim 2, wherein the method further comprises:
generating a word-level feature for a candidate answer, wherein generating the word-level feature for the candidate answer, comprises (Tran, Fig. 3B; Tran, Para. [0060], “FIG. 3B illustrates a process flow 320 for . . . determining a candidate answer match probability 310, in accordance with one or more embodiments of the present disclosure . . . utilizing the memory network in tandem with gate features . . .”; and Tran, Para. [0054], “The inputs 302 as shown in FIG. 3A [where 3B is a variant including the memory network] include an input vector denoted as x.sub.1.sup.k . . . x.sub.n.sup.k that can each include a combination of a query (or a portion thereof) and a candidate answer as described above”, where the features “c.sup.k” and “x.sub.1.sup.k” through “x.sub.n.sup.k” of the candidate answers contained in the pairs, “314”, are generated for use as inputs, “302”, in the “process flow 320”; which, in view of Yang, includes word-level features, see Yang, Pg. 3, Col. 1, Para. 2, “we first embed the words to vectors”):
splicing multiple candidate answers into a long candidate answer . . . (Tran, Para. [0047], “the answer selection system 106 can combine the query 208 with one or more of the candidate answers 210 (also converted to vector form) to form an input vector 211”, where an “input vector 211” with more than one “candidate answer” is a long candidate answer, which is obtained through splicing, “combin[ing]”);
inputting the long candidate answer to a preset feature extracting module to generate a word-level long-answer feature (Tran, Fig. 3B, where each candidate answer “314”, including the long candidate answers discussed above, is input, as a component of “302”, to the process flow “320”; which, in view of Yang, includes a feature extraction module, see Yang, Pg. 2, Col. 2, Fig. 2, where the “Hierarchical Attention Network” must have a feature extraction module to extract the word “w”, sentence “s”, and content level features “v” of the long-answer, including required hardware and software, which must be preset before operation; see also Yang, Pg. 3, Col. 1, Para. 2, “we first embed the words to vectors . . . We obtain an annotation for a given word wit by concatenating the forward hidden state →hit and backward hidden state ←hit, i.e., hit = [→hit, ←hit], which summarizes the information of the whole sentence centered around wit”, where word-level long-answer features, “summarizes the information of the whole sentence centered around wit”, are generated by the feature extraction module);
determining, in the word-level long-answer feature . . . by the feature extracting module . . . [in order to perform] splitting the word-level long-answer feature into short-answer features with a number that is consistent with the number of the spliced candidate answers . . . (Yang, Pg. 3, Col. 1-2, Para. 2-1, “We obtain an annotation for a given word wit by concatenating the forward hidden state →hit and backward hidden state ←hit, i.e., hit = [→hit, ←hit], which summarizes the information of the whole sentence centered around wit . . . Not all words contribute equally to the representation of the sentence meaning. Hence, we introduce attention mechanism to extract such words that are important to the meaning of the sentence . . . Specifically, [Equation 5 – Equation 7] That is, we first feed the word annotation hit through a one-layer MLP to get uit as a hidden representation of hit, then we measure the importance of the word as the similarity of uit with a word level context vector uw and get a normalized importance weight αit through a softmax function”, where the feature extraction module splits the word-level long-answer features, “summarizes the information of the whole sentence centered around wit”, into short-answer features, “a normalized importance weight αit through a softmax function”, which are consistent with the number of the spliced candidate answers because it corresponds with the number of “relevant” words in the spliced candidate answers, see Yang, Pg. 2, Col. 1, Para. 1-2, “Second, it is observed that different words and sentences in a documents are differentially informative. Moreover, the importance of words and sentences are highly context dependent . . . The key difference to previous work is that our system uses context to discover when a sequence of tokens is relevant”);
and obtaining a word-level feature corresponding to each candidate answer, based on the short answer features corresponding to the candidate answers (Tran, Fig. 3B, where each candidate answer “314”, including the long candidate answers formed by splicing the candidate answers discussed above, is input, as a component of “302”, to the process flow “320”; which, in view of Yang, obtains word-level features corresponding to the candidate answers based on the short answer features “αit”, which represent word “importance” in the obtained sentence-level feature, see Yang, Pg. 3, Col. 2, Para. 1, “a word level context vector uw and get a normalized importance weight αit through a softmax function. After that, we compute the sentence vector si (we abuse the notation here) as a weighted sum of the word annotations based on the weights”; see also Yang, Pg. 3, Col. 1, Para. 2-5, “we first embed the words to vectors . . . and aggregate the representation of those informative words to form a sentence vector” and Yang, Pg. 1-2, Col. 1-2, Para. 2-1, “the Hierarchical Attention Network (HAN) that is designed to capture two basic insights about document structure. First, since documents have a hierarchical structure (words form sentences, sentences form a document), we likewise construct a document representation by first building representations of sentences and then aggregating those into a document representation”).
The reasons for obviousness for the combination of Tran and Zhang with Yang were discussed in the rejection of Claim 2 above and remain applicable here.
Tran in view of Zhang and Yang do not explicitly disclose . . . by attaching respective splicing position marks . . . mark features obtained by processing the splicing position marks . . . based on the mark features . . . .
However, Devlin teaches . . . [splicing text] by attaching respective splicing position marks (Pg. 4, Col. 1, Para. 2, “The first token of every sequence is always a special classification
token ([CLS]) . . . Sentence pairs are packed together into a single sequence. We differentiate the sentences in two ways. First, we separate them with a special token ([SEP])”, where “token[s]” to “differentiate the sentences” are splicing position marks for the spliced long text, “Sentence pairs are packed together into a single sequence”)
. . . [determining] mark features obtained by processing the splicing position marks (Pg. 5, Fig. 2, where the splicing position marks, such as “[SEP]”, are processed to determine mark features, such as “Token Embeddings . . . E[SEP]”; see also Pg. 4, Col. 1, Para. 2-3, “First, we separate them with a special token ([SEP]) . . . For a given token, its input representation is constructed by summing the corresponding token, segment, and position embeddings. A visualization of this construction can be seen in Figure 2”)
. . . [and performing operations] based on the mark features . . . (Pg. 5, Fig. 2, “The input embeddings are the sum of the token embeddings, the segmentation embeddings and the position embeddings”, where the mark features, “token embeddings”, are used to generate “input embeddings”, which are then used for “down-stream tasks”, see Pg. 4, Col. 1, Para. 1, “To make BERT handle a variety of down-stream tasks, our input representation is able to unambiguously represent both a single sentence and a pair of sentences (e.g., ⟨Question, Answer⟩) in one token sequence”).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the generating of a word-level feature for each candidate answer to a question, comprising splicing multiple candidate answers into a long candidate answer, inputting the long candidate answer to a preset feature extraction module to generate a word-level long-answer feature, splitting the word-level long-answer feature into short-answer features of a number consistent with the number of spliced candidate answers and obtaining a word-level feature based on the short answer features of Tran in view of Zhang and Yang with the splicing text by attaching respective splicing position marks, determining mark features obtained by processing the splicing position marks, and performing operations based on the mark features of Devlin in order to differentiate between spliced candidate answers (Devlin, Pg. 4, Col. 1, Para. 2, “Sentence pairs are packed together into a single sequence. We differentiate the sentences in two ways. First, we separate them with a special token ([SEP]). Second, we add a learned embedding to every token indicating whether it belongs to sentence A or sentence B”), which allows for unambiguous representations of multiple candidate answers in one token sequence (Devlin, Pg. 4, Col. 1, Para. 1, “To make BERT handle a variety of down-stream tasks, our input representation is able to unambiguously represent both a single sentence and a pair of sentences (e.g., ⟨Question, Answer⟩) in one token sequence”) and contributes to state-of-the-art results on question answering tasks (Devlin, Pg. 1, Col. 1, Abstract, “BERT is conceptually simple and empirically powerful. 
It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement)”).
Regarding Claim 7, Tran in view of Zhang, Yang and Devlin teach the method according to claim 5, wherein performing reasoning operations of feature combination parameters on different granularity features of each question-answer pair (Tran, Fig. 3B and Tran, Para. [0060], “FIG. 3B illustrates a process flow 320 for using the gated-self attention mechanism 303 to determine a gated self-attention output vector 306 for determining a candidate answer match probability 310, in accordance with one or more embodiments of the present disclosure . . . utilizing the memory network in tandem with gate features . . .”, where the “process flow 320” comprises reasoning operations, such as “the gated-self attention mechanism 303” and the “utilizing the memory network”, which are performed using parameters, see Tran, Para. [0016], “the gated-self attention mechanism can use one or more functions with learned parameters to determine, based on the above inputs, self-attention outputs of the gated self-attention output vector”, and performed on each question-answer pair “314” and their different features, “c.sub.k” and “x.sub.1.sup.k” through “x.sub.n.sup.k”, contained in the inputs “302”, see Tran, Para. [0054], “The inputs 302 as shown in FIG. 3A [where 3B is a variant including the memory network] include an input vector denoted as x.sub.1.sup.k . . . x.sub.n.sup.k that can each include a combination of a query (or a portion thereof) and a candidate answer as described above”; see also Tran, Para. [0032], “the answer selection system can refine one or more learned parameters (e.g., learned via the question-answer dataset) by applying the one or more learned parameters at the target dataset. 
In some embodiments, the target dataset can include one or more candidate answers to a query, while in other embodiments, arbitrary and/or unrelated to the query”, where the “learned parameters” are feature combination parameters at least because they are “learned” and “refine[d]” using the combined features of the “question-answer dataset” and the “target dataset”; see also Tran, Para. [0047], “form an input vector 211 representing both the query 208 and a given candidate answer of the candidate answers 210. As mentioned, for larger queries 208 (e.g., a paragraph, column, page, etc.), the input vector 211 may include a sequence of input sub-vectors” and Tran, Para. [0054], “the answer selection system 106 can feed the inputs 302 to the gated self-attention mechanism 303 for attention to the query, a given candidate answer, and contextual information . . . as opposed to utilizing the context vector or a single input vector alone”, where features analyzed by the reasoning operations are of different granularities, such as features from “input sub-vectors” of varying “query” sizes and broader “context vector” features)
at a preset number of steps in a horizontal direction based on recurrent characteristics of a recurrent neural network, comprises (Tran, Para. [0060] – [0061], “By utilizing the memory network in tandem with gate features of the gated self-attention output vector 306, the answer selection system 106 can remember important and/or relevant information, forget unimportant and/or irrelevant information, etc. by updating the cell state of the memory network. In turn, the updated cell state of the memory network can influence a next context vector state 318 (denoted as c.sub.k+1) relative to a current context vector state 315 (denoted as c.sub.k) . . . x.sub.i.sup.k+1 represents the input vector 316 as the next cell state of the memory network at the k+1 reasoning hop; and x.sub.i.sup.k represents the input vector 314 as the current cell state of the memory network at the kth reasoning hop”, where the “memory network” allows for the repeated “reasoning hop[s]” to “update” values, which is within the broadest reasonable interpretation of steps in a horizontal direction, and can be based on the recurrent characteristics of a “recurrent neural network”, see Tran, Para. [0028], “the memory network can include a recurrent neural network”; see also Tran, Para. [0070], “For consistency of experimentation, the data shown in the table 500 was obtained by setting the number of reasoning hops to be two”, where “setting the number of reasoning hops to be two” for “consistency of experimentation” is a preset number of steps):
obtaining the different granularity features of each question-answer pair using the preset feature extracting module (Tran, Fig. 3B, where each candidate answer “314”, including the candidate answer pairs discussed above, is input, as a component of “302”, to the process flow “320”; which, in view of Yang, includes a feature extraction module, see Yang, Pg. 2, Col. 2, Fig. 2, where the “Hierarchical Attention Network” must have a feature extraction module to extract the word “w”, sentence “s”, and content level features “v” of the different granularity features, including required hardware and software, which must be preset before operation);
performing reasoning operations of feature combination parameters on different granularity features of each question-answer pair (Tran, Fig. 3B and Tran, Para. [0060], “FIG. 3B illustrates a process flow 320 for using the gated-self attention mechanism 303 to determine a gated self-attention output vector 306 for determining a candidate answer match probability 310, in accordance with one or more embodiments of the present disclosure . . . utilizing the memory network in tandem with gate features . . .”, where the “process flow 320” comprises reasoning operations, such as “the gated-self attention mechanism 303” and the “utilizing the memory network”, which are performed using parameters, see Tran, Para. [0016], “the gated-self attention mechanism can use one or more functions with learned parameters to determine, based on the above inputs, self-attention outputs of the gated self-attention output vector”, and performed on each question-answer pair “314” and their different features, “c.sub.k” and “x.sub.1.sup.k” through “x.sub.n.sup.k”, contained in the inputs “302”, see Tran, Para. [0054], “The inputs 302 as shown in FIG. 3A [where 3B is a variant including the memory network] include an input vector denoted as x.sub.1.sup.k . . . x.sub.n.sup.k that can each include a combination of a query (or a portion thereof) and a candidate answer as described above”; see also Tran, Para. [0032], “the answer selection system can refine one or more learned parameters (e.g., learned via the question-answer dataset) by applying the one or more learned parameters at the target dataset. In some embodiments, the target dataset can include one or more candidate answers to a query, while in other embodiments, arbitrary and/or unrelated to the query”, where the “learned parameters” are feature combination parameters at least because they are “learned” and “refine[d]” using the combined features of the “question-answer dataset” and the “target dataset”; see also Tran, Para. 
[0047], “form an input vector 211 representing both the query 208 and a given candidate answer of the candidate answers 210. As mentioned, for larger queries 208 (e.g., a paragraph, column, page, etc.), the input vector 211 may include a sequence of input sub-vectors” and Tran, Para. [0054], “the answer selection system 106 can feed the inputs 302 to the gated self-attention mechanism 303 for attention to the query, a given candidate answer, and contextual information . . . as opposed to utilizing the context vector or a single input vector alone”, where features analyzed by the reasoning operations are of different granularities, such as features from “input sub-vectors” of varying “query” sizes and broader “context vector” features)
at a preset number of steps in a horizontal direction using a preset horizontal reasoning module, wherein the reasoning operations provided by the preset horizontal reasoning module are constructed based on the recurrent characteristics of the recurrent neural network (Tran, Para. [0060] – [0061], “By utilizing the memory network in tandem with gate features of the gated self-attention output vector 306, the answer selection system 106 can remember important and/or relevant information, forget unimportant and/or irrelevant information, etc. by updating the cell state of the memory network. In turn, the updated cell state of the memory network can influence a next context vector state 318 (denoted as c.sub.k+1) relative to a current context vector state 315 (denoted as c.sub.k) . . . x.sub.i.sup.k+1 represents the input vector 316 as the next cell state of the memory network at the k+1 reasoning hop; and x.sub.i.sup.k represents the input vector 314 as the current cell state of the memory network at the kth reasoning hop”, where the “memory network” allows for the repeated “reasoning hop[s]” to “update” values, which is within the broadest reasonable interpretation of steps in a horizontal direction, and can be based on the recurrent characteristics of a “recurrent neural network”, see Tran, Para. [0028], “the memory network can include a recurrent neural network”; see also Tran, Para. [0070], “For consistency of experimentation, the data shown in the table 500 was obtained by setting the number of reasoning hops to be two”, where “setting the number of reasoning hops to be two” for “consistency of experimentation” is a preset number of steps; see also Tran, Fig. 3B and Tran, Para. [0060], “the answer selection system 106 can use a gated-self attention memory network (GSAMN) that combines a memory network and the gated self-attention mechanism (i.e., the GSAM described above in relation to FIG. 
3A) to generate gated self-attention output vectors in determining match probabilities for candidate answers . . . [as depicted in] FIG. 3B”, where the “memory network” and its associated hardware and software comprise the horizontal reasoning module, which must be preset before operation, see generally Tran, Para. [0093], “Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory”);
correspondingly, determining feature combination weights of the different granularity features (Tran, Para. [0055], “a gated self-attention matrix 304 with its associated values is depicted in matrix form (albeit other forms are herein contemplated). To generate or otherwise populate the values of the gated self-attention matrix 304, the gated self-attention mechanism 303 receives the inputs 302 and can execute one or more of the following example algorithms [Algorithm 00001] where W and b represent learned parameters shared among functions”, where the reasoning components, i.e., the “gated self-attention mechanism”, use weights “W”, which, as shown in Tran, Fig. 3B and Tran, Algorithm 1, are applied in different combinations with the features to generate the multiple “values of the self-attention matrix” and to determine the “gated self-attention matrix 304” of “the inputs 302”, which as discussed above includes the different granularity features, see Tran, Para. [0047], “form an input vector 211 representing both the query 208 and a given candidate answer of the candidate answers 210. As mentioned, for larger queries 208 (e.g., a paragraph, column, page, etc.), the input vector 211 may include a sequence of input sub-vectors” and Tran, Para. [0054], “the answer selection system 106 can feed the inputs 302 to the gated self-attention mechanism 303 for attention to the query, a given candidate answer, and contextual information . . . as opposed to utilizing the context vector or a single input vector alone”; see also Tran, Fig. 3B and Tran, Para. 
[0056], “Thus, according to the above expression for generating the gated self-attention output vector 306, the answer selection system 106 can aggregate an input vector and the context vector both weighted by various self-attention outputs a, which can include values depicted in the gated self-attention matrix 304”, where “self-attention outputs a” can also be viewed as feature combination weights, which act on the different granularity features, “302” to generate the feature combination “306”)
using multiple preset vertical reasoning layers at different reasoning focuses respectively (Zhang, Pg. 1, Col. 1, Abstract, “Document-level relation extraction (RE) aims to identify relations between two entities in a given document . . . the Dense-CCNet performs entity-pair-level logical reasoning through the Criss-Cross Attention (CCA), which can collect contextual information in horizontal and vertical directions on the entity-pair matrix to enhance the corresponding entity-pair representation”, where “vertical” components of “logical reasoning” include multiple preset layers, see Zhang, Pg. 5, Col. 2, Para. 4, “We set the number of layers of Dense-CCNet to 3”, and are at different reasoning focuses, which are also preset such that the “lower layers can capture local interdependence” and “the upper layers can capture global interdependence”, see Zhang, Pg. 2, Col. 2, Para. 1, “The lower layers in DenseCCNet can capture local interdependence among entity-pairs and complete single-hop logical reasoning, while the upper layers can capture global interdependence among entity-pairs and complete multi-hop logical reasoning”),
at each step of the reasoning operations in the horizontal direction, comprises (Zhang, Pg. 4, Col. 1, Para. 3, “the CCA module can complete entity-pair-level one-hop reasoning on the entity-pair matrix, and it is possible to complete multi-hop reasoning by stacking multiple layers of the CCA module” and Zhang, Pg. 2, Col. 1, Para. 1, “To fully capture the features of single-hop and multi-hop reasoning, we stack the multi-layer modules CCA modules by the densely connected framework. The lower layers in DenseCCNet can capture local interdependence among entity-pairs and complete single-hop logical reasoning, while the upper layers can capture global interdependence among entity-pairs and complete multi-hop logical reasoning”):
determining the feature combination weights of the different granularity features (Tran, Para. [0055], “a gated self-attention matrix 304 with its associated values is depicted in matrix form (albeit other forms are herein contemplated). To generate or otherwise populate the values of the gated self-attention matrix 304, the gated self-attention mechanism 303 receives the inputs 302 and can execute one or more of the following example algorithms [Algorithm 00001] where W and b represent learned parameters shared among functions”, where the reasoning components, i.e., the “gated self-attention mechanism”, use weights “W”, which, as shown in Tran, Fig. 3B and Tran, Algorithm 1, are applied in different combinations with the features to generate the multiple “values of the self-attention matrix” and to determine the “gated self-attention matrix 304” of “the inputs 302”, which as discussed above includes the different granularity features, see Tran, Para. [0047], “form an input vector 211 representing both the query 208 and a given candidate answer of the candidate answers 210. As mentioned, for larger queries 208 (e.g., a paragraph, column, page, etc.), the input vector 211 may include a sequence of input sub-vectors” and Tran, Para. [0054], “the answer selection system 106 can feed the inputs 302 to the gated self-attention mechanism 303 for attention to the query, a given candidate answer, and contextual information . . . as opposed to utilizing the context vector or a single input vector alone”; see also Tran, Fig. 3B and Tran, Para. 
[0056], “Thus, according to the above expression for generating the gated self-attention output vector 306, the answer selection system 106 can aggregate an input vector and the context vector both weighted by various self-attention outputs a, which can include values depicted in the gated self-attention matrix 304”, where “self-attention outputs a” can also be viewed as feature combination weights, which act on the different granularity features, “302” to generate the feature combination “306”)
using multiple preset vertical reasoning layers at different reasoning focuses respectively using a preset vertical reasoning module (Zhang, Pg. 1, Col. 1, Abstract, “Document-level relation extraction (RE) aims to identify relations between two entities in a given document . . . the Dense-CCNet performs entity-pair-level logical reasoning through the Criss-Cross Attention (CCA), which can collect contextual information in horizontal and vertical directions on the entity-pair matrix to enhance the corresponding entity-pair representation”, where “vertical” components of “logical reasoning”, which, in association with the hardware and software required to operate the “Dense-CCNet”, are the vertical reasoning module, which is preset with “3” “layers”, and include multiple preset layers, see Zhang, Pg. 5, Col. 2, Para. 4, “We set the number of layers of Dense-CCNet to 3”, and are at different reasoning focuses, which are also preset such that the “lower layers can capture local interdependence” and “the upper layers can capture global interdependence”, see Zhang, Pg. 2, Col. 2, Para. 1, “The lower layers in DenseCCNet can capture local interdependence among entity-pairs and complete single-hop logical reasoning, while the upper layers can capture global interdependence among entity-pairs and complete multi-hop logical reasoning”),
at each step of the reasoning operations in the horizontal direction (Zhang, Pg. 4, Col. 1, Para. 3, “the CCA module can complete entity-pair-level one-hop reasoning on the entity-pair matrix, and it is possible to complete multi-hop reasoning by stacking multiple layers of the CCA module” and Zhang, Pg. 2, Col. 1, Para. 1, “To fully capture the features of single-hop and multi-hop reasoning, we stack the multi-layer modules CCA modules by the densely connected framework. The lower layers in DenseCCNet can capture local interdependence among entity-pairs and complete single-hop logical reasoning, while the upper layers can capture global interdependence among entity-pairs and complete multi-hop logical reasoning”),
wherein the different vertical reasoning layers correspond to different reasoning focuses (Zhang, Pg. 1, Col. 1, Abstract, “Document-level relation extraction (RE) aims to identify relations between two entities in a given document . . . the Dense-CCNet performs entity-pair-level logical reasoning through the Criss-Cross Attention (CCA), which can collect contextual information in horizontal and vertical directions on the entity-pair matrix to enhance the corresponding entity-pair representation”, where “vertical” components of “logical reasoning” include multiple preset layers that are at different reasoning focuses, such that the “lower layers can capture local interdependence” and “the upper layers can capture global interdependence”, see Zhang, Pg. 2, Col. 2, Para. 1, “The lower layers in DenseCCNet can capture local interdependence among entity-pairs and complete single-hop logical reasoning, while the upper layers can capture global interdependence among entity-pairs and complete multi-hop logical reasoning”);
correspondingly, obtaining a candidate answer feature corresponding to each question-answer pair, respectively, through a final step of the reasoning operations, comprises (Tran, Fig. 3B and Tran, Para. [0062], “the answer selection system 106 can determine the next context vector state 318 based on the current gated self-attention output vector state 307 of the GSAM, the current context vector state 315, and the input vector 316 . . . The answer selection system 106 can then pass the next context vector state 318 to the probability function 308 for determining the candidate answer match probability 310”, where a candidate answer feature, “the next context vector state 318”, is iteratively obtained by the “answer selection system”, which is passed to “the probability function 308” after the final step of reasoning of “utilizing the memory network in tandem with gate features of the gated self-attention output vector 306”, see Tran, Para. [0060] - [0062], and which corresponds with each question-answer pair, because it is iteratively generated from horizontal reasoning outputs, “based on the current gated self-attention output vector state 307 of the GSAM, the current context vector state 315, and the input vector 316”, from the input “302” containing the pairs “314”, which is used “for determining the candidate answer match”):
outputting, by the preset horizontal reasoning module, the candidate answer feature corresponding to each question-answer pair, respectively (Tran, Fig. 3B and Tran, Para. [0060], “the answer selection system 106 can use a gated-self attention memory network (GSAMN) that combines a memory network and the gated self-attention mechanism (i.e., the GSAM described above in relation to FIG. 3A) to generate gated self-attention output vectors in determining match probabilities for candidate answers . . . [as depicted in] FIG. 3B”, where, as discussed above, the “memory network” and its associated hardware and software comprise the preset horizontal reasoning module, which outputs the candidate answer feature for each pair, “318”, see Tran, Para. [0060], “the memory network can influence a next context vector state 318”);
correspondingly, determining a target candidate answer matching the to-be-queried question (Tran, Fig. 2, where a target candidate answer is determined, such as “Candidate Answer 345-MX From Naturegeographic.com”, as matching the to-be-queried question, “Selected Answer Match 216”; see also Tran, Para. [0051], “For example, as shown, the answer selection system 106 can return the selected answer match 216 that corresponds to a candidate answer 345-MX of the candidate answers 210, in which the example website nationalgeographic.com includes a best match response to the query 208 out of the candidate answers 210”)
based on a feature similarity between the question feature of the to-be-queried question and each candidate answer feature, comprises (Tran, Para. [0062], “The answer selection system 106 can then pass the next context vector state 318 to the probability function 308 for determining the candidate answer match probability 310 that includes a matching probability between the query and the respective candidate answer as described above in conjunction with FIG. 3A”, where the “context vector state 318” is used to generate the “candidate answer match probability”, based on a feature similarity between the “respective candidate answer” component of each candidate answer feature, “318”, and features of the “query”, as further outlined “as described above in conjunction with FIG. 3A”, see Tran, Para. [0054] - [0058], “in FIG. 3A . . . the answer selection system 106 can feed the inputs 302 to the gated self-attention mechanism 303 for attention to the query, a given candidate answer, and contextual information. In so doing, as opposed to utilizing the context vector or a single input vector alone, the answer selection system 106 can improve accuracy and flexibility for determining the candidate answer match probability 310 . . . The answer selection system 106 can then pass the current gated self-attention output vector state 307 to the probability function 308 for determining the candidate answer match probability 310 that includes a matching probability between the query and the respective candidate answer”):
obtaining the question feature of the to-be-queried question using the preset feature extracting module (Tran, Fig. 3B, where each candidate answer “314”, including the candidate answer pairs discussed above, is input, as a component of “302”, to the process flow “320”, in order to iteratively incorporate the question feature, data from “the query”, into the “contextual information”, see Tran, Para. [0054] - [0058], “in FIG. 3A . . . the answer selection system 106 can feed the inputs 302 to the gated self-attention mechanism 303 for attention to the query, a given candidate answer, and contextual information. In so doing, as opposed to utilizing the context vector or a single input vector alone, the answer selection system 106 can improve accuracy and flexibility for determining the candidate answer match probability 310 . . . The answer selection system 106 can then pass the current gated self-attention output vector state 307 to the probability function 308 for determining the candidate answer match probability 310 that includes a matching probability between the query and the respective candidate answer”; which, in view of Yang, includes a feature extraction module, see Yang, Pg. 2, Col. 2, Fig. 2, where the “Hierarchical Attention Network” must have a feature extraction module to extract the word “w”, sentence “s”, and content level features “v” of the different granularity features, including required hardware and software, which are components comprising the question feature); and
calculating the feature similarity between the question feature and each candidate answer feature (Tran, Para. [0062], “The answer selection system 106 can then pass the next context vector state 318 to the probability function 308 for determining the candidate answer match probability 310 that includes a matching probability between the query and the respective candidate answer as described above in conjunction with FIG. 3A”, where the “context vector state 318” is used to generate the “candidate answer match probability”, based on a feature similarity between the “respective candidate answer” component of each candidate answer feature, “318”, and features of the “query”, as further outlined “as described above in conjunction with FIG. 3A”, see Tran, Para. [0054] - [0058], “in FIG. 3A . . . the answer selection system 106 can feed the inputs 302 to the gated self-attention mechanism 303 for attention to the query, a given candidate answer, and contextual information. In so doing, as opposed to utilizing the context vector or a single input vector alone, the answer selection system 106 can improve accuracy and flexibility for determining the candidate answer match probability 310 . . . The answer selection system 106 can then pass the current gated self-attention output vector state 307 to the probability function 308 for determining the candidate answer match probability 310 that includes a matching probability between the query and the respective candidate answer”)
based on a preset feature matching module (Tran, Fig. 3B, where the probability function “P(A/Q) 308” and its associated hardware and software comprise the preset feature matching module, which must be preset before operation, see generally Tran, Para. [0093], “Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory”),
and outputting the target candidate answer matching the to-be-queried question based on the feature similarity (Tran, Fig. 2, where a target candidate answer is output, such as “Candidate Answer 345-MX From Naturegeographic.com”, as matching the to-be-queried question, “Selected Answer Match 216”; see also Tran, Para. [0051], “For example, as shown, the answer selection system 106 can return the selected answer match 216 that corresponds to a candidate answer 345-MX of the candidate answers 210, in which the example website nationalgeographic.com includes a best match response to the query 208 out of the candidate answers 210”; Tran, Para. [0062], “The answer selection system 106 can then pass the next context vector state 318 to the probability function 308 for determining the candidate answer match probability 310 that includes a matching probability between the query and the respective candidate answer” and Tran, Para. [0059], “such that the answer selection system 106 can select the best query-candidate answer match”, where the outputted target candidate answer is the “best query-candidate answer match”, as determined by “the probability function 308”),
wherein, the feature extracting module, the preset horizontal reasoning module, the preset vertical reasoning module, and the feature matching module are all used as parts forming a preset answer query model (Tran, Fig. 3B and Tran, Abstract, “The present disclosure relates to systems, methods, and non-transitory computer-readable media that can determine an answer to a query”, where the “process flow 320” of Fig. 3B is a preset answer query model, “determine an answer to a query”, which, as discussed above, includes the preset feature matching module, “P(A/Q) 308”, and the preset horizontal reasoning module, the memory network performing operations related to “312”, “318”, and “316”, which, in view of Zhang, includes the preset vertical reasoning module, see Zhang, Pg. 1, Col. 1, Abstract, “Document-level relation extraction (RE) aims to identify relations between two entities in a given document . . . the Dense-CCNet performs entity-pair-level logical reasoning through the Criss-Cross Attention (CCA), which can collect contextual information in horizontal and vertical directions on the entity-pair matrix to enhance the corresponding entity-pair representation”, where the “vertical” components of “logical reasoning”, in association with the hardware and software required to operate the “Dense-CCNet”, are the vertical reasoning module, and, in view of Yang, includes the preset feature extraction module, see Yang, Pg. 2, Col. 2, Fig. 2, where the “Hierarchical Attention Network” must have a feature extraction module to extract the word “w”, sentence “s”, and content level features “v” of the long-answer, including required hardware and software, which must be preset before operation; see also Tran, Para. [0060], “the answer selection system 106 can use a gated-self attention memory network (GSAMN) that combines a memory network and the gated self-attention mechanism (i.e., the GSAM described above in relation to FIG. 
3A) to generate gated self-attention output vectors in determining match probabilities for candidate answers . . . [as depicted in] FIG. 3B”, where the “memory network” and its associated hardware and software comprise the horizontal reasoning module; see also Tran, Para. [0093], “Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory”).
The reasons for obviousness were discussed in regard to the rejection of Claim 1, for the combination of Tran with Zhang, and in regard to the rejection of Claim 2, for the combination of Tran and Zhang with Yang.
Regarding Claim 12, the additional elements of the dependent claim are substantially the same as limitations of Claim 5, therefore it is rejected under the same rationale.
Regarding Claim 14, the additional elements of the dependent claim are substantially the same as limitations of Claim 7, therefore it is rejected under the same rationale.
Regarding Claim 19, the additional elements of the dependent claim are substantially the same as limitations of Claim 5, therefore it is rejected under the same rationale.
Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Zhang and Pal et al. (hereinafter Pal) (“MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering”).
Regarding Claim 6, Tran in view of Zhang teach the method according to claim 1, wherein in response to the to-be-queried question belonging to a . . . knowledge domain, . . . and the candidate answers comprise: . . . [the knowledge domain] knowledge evidences (Tran, Fig. 2, where, in response to a to-be-queried question belonging to a knowledge domain of natural sciences, “Hey Assistant, How Do Killer Whales Coordinate Attacks on Prey?”, the candidate answers comprise at least one candidate answer belonging to the natural sciences knowledge evidences, “Candidate Answer 345-MX From Naturegeographic.com”).
Tran in view of Zhang do not explicitly disclose . . . medical . . . the to-be-queried question comprises: a combination of a to-be-queried medical question and candidate options . . . medical . . . .
However, Pal teaches . . . [using reasoning operations to provide an answer to a question, wherein in response to a question belonging to a] medical [knowledge domain] the to-be-queried question comprises: a combination of a to-be-queried medical question and candidate options [and the candidate answers comprise] medical [knowledge evidences] (Pg. 7, Col. 1, Para. 2, “PubMedBERT . . . is used to evaluate the performance of a fully in-domain pre-trained model on the dataset” and Pg. 1, Abstract, “This paper introduces MedMCQA, a new large-scale, Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions”, where “evaluat[ing] the performance” of a “model” on “Multiple-Choice” “medical entrance exam questions” requires using reasoning operations of the model to answer the “medical” knowledge domain “questions” with one of the “medical” knowledge evidences “[a]nswer[s]”, where the questions include candidate “Multiple-Choice” options; see also Pg. 12, Appendix A.1.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the method for determining an answer to a question, wherein in response to a to-be-queried question belonging to a particular knowledge domain the candidate answers comprise knowledge evidences from that particular knowledge domain, of Tran in view of Zhang, with the use of reasoning operations to provide an answer to a question, wherein in response to a question belonging to a medical knowledge domain the to-be-queried question comprises a combination of a to-be-queried medical question and candidate options and the candidate answers comprise medical knowledge evidences, of Pal, in order to utilize reasoning operations to answer expert-level and domain-specific medical questions (Pal, Pg. 6, Col. 2, Para. 1, “The primary motivation of the baseline experiments is to understand the adequacy of the current models in answering multiple-choice questions meant for human domain experts (post-graduate medical students) and to understand the level of domain specificity required in the models”), which could assist medical experts in real medical examinations if trained to generate answers based on an understanding of the medical domain (Pal, Pg. 1, Col. 2, Para. 2, “automatic questions answering for real medical examination is still a challenge that is less explored . . . the requirement of a comprehensive understanding of the domain, matching human experts, makes them appealing for research pursuits”, where “automatic questions answering for real medical examination” is a “less explored” but “appealing . . . research pursuit”; Pal, Pg. 8, Col. 1, Para. 2, “The error analysis details on a sample set of mispredictions by the best baseline model (PubMedBERT) is given in this section”, where the model pretrained on the medical domain, “PubMedBERT”, had the “best” performance, which suggests future utility of this research pursuit).
Regarding Claim 13, the additional elements of the dependent claim are substantially the same as limitations of Claim 6, therefore it is rejected under the same rationale.
Regarding Claim 20, the additional elements of the dependent claim are substantially the same as limitations of Claim 6, therefore it is rejected under the same rationale.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW BRYCE GOLAN whose telephone number is (571)272-5159. The examiner can normally be reached Monday through Friday, 8:00 AM to 5:00 PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached at (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MATTHEW BRYCE GOLAN/Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123