DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. The claims have been accorded the priority date of 9/16/2023 based on the filing of provisional application 63/583,247.
Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because reference characters "130" and "200" have both been used to designate the Repetition Detector in FIGS. 1 and 2. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1 to 20 are rejected under 35 U.S.C. 101 as being directed to patent-ineligible subject matter.
The independent claim 1 is directed to a method, which is a statutory category of invention. The recited method performs a series of steps that could be considered mental processes, i.e., steps that a person could carry out mentally or with pen and paper. In this claim:
accessing textual content that was generated by a large language model (LLM); a human can access text generated by a generic LLM or generated by another person.
wherein the textual content comprises a plurality of sub-components including a first sub-component and a second sub-component; a human can obtain multiple pieces of text and classify them as first and second sub-components.
generating a first embedding that represents the first sub-component; a human can allocate the words in the sub-component in a lookup table where each word is represented by a number, so that the sub-component becomes a vector containing the numbers that represent its words (illustrated in the sketch following this claim analysis).
generating a second embedding that represents the second sub-component; a human can repeat the previously stated process for the second sub-component.
based on a similarity between the first embedding and the second embedding, determining whether the second sub-component is repetitious with respect to the first sub-component; a human can compare the simple vectors generated from the sub-components and determine whether they are similar.
in response to determining that the second sub-component is repetitious with respect to the first sub-component, removing at least a portion of the second sub-component from the textual content; a human can remove at least a portion of the second sub-component.
These steps constitute an abstract idea directed to a mental process that can be executed by a human mentally or using pen and paper, which affects their patent eligibility. Additional elements recited in the claim, such as the one or more computing devices that perform the method and the large language model (LLM), are generic computing devices and generic software used as tools to implement the mental process; they do not integrate the mental process into a practical application and do not add significantly more to the mental steps.
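For illustration only, the following minimal sketch shows how the lookup-table embedding and vector comparison described above could be carried out; the example strings, the shared-word similarity measure, and the 0.8 threshold are assumptions of this sketch, not features of the claim.

def embed(sub_component, lookup):
    # Assign each new word the next free number in the lookup table, so the
    # sub-component becomes a vector of numbers representing its words.
    return [lookup.setdefault(word, len(lookup) + 1)
            for word in sub_component.lower().split()]

lookup = {}
first_embedding = embed("the model repeats this point", lookup)
second_embedding = embed("the model repeats this point again", lookup)

# Compare the two vectors with a simple shared-number ratio.
shared = set(first_embedding) & set(second_embedding)
similarity = len(shared) / max(len(set(first_embedding)), len(set(second_embedding)))
is_repetitious = similarity > 0.8  # illustrative threshold only

Each of these operations is a table lookup or a count that a person could perform with pen and paper.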
The dependent claim 2 inherits the rejection of claim 1 and it further recites: wherein the textual content is first textual content, wherein removing the portion of the second sub-component from the textual content results in modified textual content, further comprising: submitting, to a second LLM to generate a second textual content that is different than the first textual content, the modified textual content and a prompt to re-phrase the modified textual content. A human could re-phrase the modified textual content in order to generate a second textual content that is different from the first textual content. The second large language model (LLM) described in the claim is being used as a generic software tool to implement the mental process, not integrating the mental process into a practical application and not adding significantly more to the mental steps.
The dependent claim 3 inherits the rejection of claim 1 and it further recites:
further comprising, prior to generating the first embedding and the second embedding: determining whether the first sub-component matches the second sub-component at a target level of textual granularity; Wherein a human can obtain textual components and determine whether they match at a target level of textual granularity.
wherein generating the first and second embeddings are performed in response to determining that the first sub-component matches the second sub-component at the target level of textual granularity. Wherein a human can compare the first and second sub-components and determine whether they match at the target level of granularity.
The dependent claim 4 inherits the rejection of claim 1 and it further recites:
further comprising: submitting, to a second LLM, the first sub-component and a first prompt to summarize the first sub-component, wherein the second LLM outputs, based on the first sub-component and the first prompt, a first summary of the first sub-component; Wherein a human can summarize a text.
submitting, to the second LLM, the second sub-component and a second prompt to summarize the second sub-component, wherein the second LLM outputs, based on the second sub-component and the second prompt, a second summary of the second sub-component; Wherein a human can summarize a second text.
wherein generating the first embedding comprises inputting the first summary into a language model that generates the first embedding based on the first summary; Wherein a human can generate embeddings using the method discussed in claim 1.
wherein generating the second embedding comprises inputting the second summary into the language model that generates the second embedding based on the second summary. Wherein a human can generate embeddings using the method discussed in claim 1.
The second large language model (LLM) described in the claim is being used as a generic software tool to implement the mental process, not integrating the mental process into a practical application and not adding significantly more to the mental steps.
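A minimal sketch of the summarize-then-embed pipeline recited in claim 4 follows; summarize_with_llm and embed_text are hypothetical stand-ins (no particular vendor's API is implied), with trivial bodies so the sketch runs as written.

def summarize_with_llm(text, prompt):
    # Hypothetical second-LLM call; a real system would submit `prompt`
    # together with `text` to an LLM. Here we crudely keep the first sentence.
    return text.split(".")[0].strip() + "."

def embed_text(summary):
    # Hypothetical embedding step; here a toy vector of word lengths.
    return [len(word) for word in summary.lower().split()]

first_summary = summarize_with_llm("First sub-component. More detail.", "Summarize the following text.")
second_summary = summarize_with_llm("Second sub-component. Other detail.", "Summarize the following text.")
first_embedding = embed_text(first_summary)    # embedding generated from the first summary
second_embedding = embed_text(second_summary)  # embedding generated from the second summary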
The dependent claim 5 inherits the rejection of claim 1 and it further recites:
wherein: the plurality of sub-components includes a third sub-component and a fourth sub-component; Wherein a human can obtain two more textual elements.
the first sub-component and the second sub-component correspond to a first level of granularity of a plurality of levels of granularity; Wherein a human can determine if the elements correspond to a certain level of granularity.
the third sub-component and the fourth sub-component correspond to a second level of granularity, of the plurality of levels of granularity, that is different than the first level of granularity; Wherein a human can select a different granularity than the one selected for the first two sub-components.
the plurality of levels of granularity comprises one or more of a word, a phrase, a sentence, or a paragraph; Wherein a human can select any level of granularity.
wherein the method further comprises: generating a third embedding that represents the third sub-component; Wherein a human can generate the embeddings as discussed in claim 1.
generating a fourth embedding that represents the fourth sub-component; Wherein a human can generate embeddings as discussed in claim 1.
based on a similarity between the third embedding and the fourth embedding, determining whether the fourth sub-component is repetitious with respect to the third sub-component; Wherein a human can evaluate the difference between two simple vectors and determine if they are repetitious.
in response to determining that the fourth sub-component is repetitious with respect to the third sub-component, removing at least a portion of the fourth sub-component from the textual content. Wherein a human can remove a portion of one of the sub-components.
The dependent claim 6 inherits the rejection of claim 1 and it further recites:
further comprising: determining a frequency, in the textual content, of a particular word; Wherein a human can determine the frequency of a word.
and in response to determining that the frequency meets one or more content modification criteria, removing one or more occurrences of the particular word from the textual content. Wherein a human can remove one of the occurrences of a particular word.
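The claim 6 frequency check can likewise be expressed as a short sketch; the five-occurrence cutoff is an assumption chosen only to make the example concrete, not a claimed value.

from collections import Counter

def remove_overused_word(text, word, max_occurrences=5):
    # Determine the frequency of `word` in the textual content.
    tokens = text.split()
    if Counter(t.lower() for t in tokens)[word.lower()] <= max_occurrences:
        return text  # frequency does not meet the modification criterion
    # Remove occurrences of the word beyond the allowed count.
    kept, seen = [], 0
    for t in tokens:
        if t.lower() == word.lower():
            seen += 1
            if seen > max_occurrences:
                continue
        kept.append(t)
    return " ".join(kept)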
The dependent claim 7 inherits the rejection of claim 1 and it further recites:
further comprising: computing a cosine similarity value based on the first embedding and the second embedding; Wherein a human can compute the cosine similarity by hand, calculating the dot product of the vectors and dividing it by the product of their magnitudes (worked in the sketch following this claim analysis).
determining whether the cosine similarity value exceeds a particular threshold value; Wherein a human can determine a threshold for such evaluation.
wherein removing is performed in response to determining that the cosine similarity value exceeds the particular threshold value. Wherein a human can compare the obtained value with the threshold.
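As a worked illustration of the claim 7 computation (dot product divided by the product of the magnitudes), consider the following sketch; the example vectors and the 0.9 threshold are assumptions, not values recited in the claim.

import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))     # dot product of the vectors
    norm_a = math.sqrt(sum(x * x for x in a))  # magnitude of the first vector
    norm_b = math.sqrt(sum(y * y for y in b))  # magnitude of the second vector
    return dot / (norm_a * norm_b)

first_embedding = [1.0, 2.0, 3.0]
second_embedding = [1.0, 2.0, 2.5]
THRESHOLD = 0.9
# Removal is performed only if the similarity exceeds the threshold.
remove_second = cosine_similarity(first_embedding, second_embedding) > THRESHOLD

With these example vectors the dot product is 12.5 and the magnitudes are sqrt(14) and sqrt(11.25), giving a cosine similarity of about 0.996, which exceeds the 0.9 threshold.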
The dependent claim 8 inherits the rejection of claim 1 and it further recites:
wherein: generating the first embedding comprises inputting the first sub-component to a machine-learned model; Wherein the machine-learned model could be replaced by a human, and that human can receive a textual component and create the vectors described in claim 1.
and generating the second embedding at least by applying the machine-learned model to the second sub-component. The same logic applies as in the previous limitation.
The machine-learned model described in the claim is being used as a tool to implement the mental process, not integrating the mental process into a practical application and not adding significantly more to the mental steps of claim 1.
The independent claim 9 is directed to a method, which is a statutory category of invention. The recited method performs a series of steps that could be considered mental processes, i.e., steps that a person could carry out mentally or with pen and paper. In this claim:
accessing first textual content that was generated by a large language model (LLM), wherein the first textual content comprises a plurality of sub-components including a first sub-component and a second sub-component; a human can obtain content generated by a large language model (LLM) and organize it into a plurality of sub-components.
based on a similarity between the first sub-component and the second sub-component, determining whether the second sub-component is repetitious with respect to the first sub-component; a human can evaluate the similarity between two textual components and determine if the components are repetitious.
in response to determining that the second sub-component is repetitious with respect to the first sub-component, removing at least a portion of the second sub-component from the first textual content to generate modified textual content; a human can remove a certain portion of the sub-component to generate a modified textual content.
submitting, to a second LLM, the modified textual content and a prompt to rephrase the modified textual content; a human can rephrase the modified textual content without the need for a second LLM.
accessing second textual content that is generated by the second LLM based on the modified textual content and the prompt; similarly to the first limitation, a human can access the text generated by an LLM or, following the logic of the previous limitation, can access the textual content previously rephrased by a human.
These steps constitute an abstract idea directed to a mental process that can be executed by a human mentally or using pen and paper, which affects their patent eligibility. Additional elements recited in the claim, such as the one or more computing devices that perform the method and the first and second large language models (LLMs), are generic computing devices and generic software used as tools to implement the mental process; they do not integrate the mental process into a practical application and do not add significantly more to the mental steps.
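A minimal end-to-end sketch of the claim 9 flow (detect repetition, remove the repetitious portion, then ask a second LLM to rephrase) is given below; rephrase_with_llm is a hypothetical stand-in for the second-LLM call, and exact-match detection stands in for the similarity determination.

def rephrase_with_llm(text, prompt):
    # Hypothetical second-LLM call; a real system would submit `prompt`
    # and `text` to an LLM and return its output. Here we return text as-is.
    return text

def deduplicate_and_rephrase(sub_components):
    kept = []
    for sc in sub_components:
        # Determine repetition; exact match stands in for embedding similarity.
        if sc not in kept:
            kept.append(sc)
    modified = " ".join(kept)  # modified textual content after removal
    # Submit the modified content and a rephrasing prompt to the second LLM.
    return rephrase_with_llm(modified, "Re-phrase the following text.")

second_textual_content = deduplicate_and_rephrase(
    ["The result is stable.", "The result is stable.", "It converges quickly."])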
The dependent claim 10 inherits the rejection of claim 9 and it further recites:
wherein: the first sub-component and the second sub-component are two instances of a particular word; Wherein a human can locate two instances of a particular word.
removing the portion of the second sub-component comprises removing one of the two instances of the particular word. Wherein a human can remove one of the two instances of the particular word.
The dependent claim 11 inherits the rejection of claim 9 and it further recites:
wherein: the first sub-component and the second sub-component are different phrases, different sentences, or different paragraphs within the first textual content; Wherein a human can select and obtain different components and different component granularities.
the method further comprising: generating a first embedding that represents the first sub-component; Wherein a human can generate simple embeddings as discussed in claim 1.
generating a second embedding that represents the second sub-component;
the similarity is a similarity between the first embedding and the second embedding. Wherein a human can compare two vectors (embeddings) and determine similarity by hand using the mathematical equation described in claim 7.
The independent claim 12, reciting one or more non-transitory storage media storing instructions, is directed to a device, which is a statutory category of invention. The stored instructions perform a series of steps that could be considered mental processes, i.e., steps that a person could carry out mentally or with pen and paper. In this claim:
accessing textual content that was generated by a large language model (LLM), wherein the textual content comprises a plurality of sub-components including a first sub-component and a second sub-component; a human can obtain content generated by a large language model (LLM) and organize it into a plurality of sub-components.
generating a first embedding that represents the first sub-component; a human can allocate the words in the sub-component in a lookup table where each word is represented by a number, so that the sub-component becomes a vector containing the numbers that represent its words.
generating a second embedding that represents the second sub-component; a human can apply the same logic used in the previous limitation.
based on a similarity between the first embedding and the second embedding, determining whether the second sub-component is repetitious with respect to the first sub-component; a human can evaluate the similarity between two textual components and determine if the components are repetitious.
in response to determining that the second sub-component is repetitious with respect to the first sub-component, removing at least a portion of the second sub-component from the textual content. A human can remove at least a portion of the second sub-component.
These steps constitute an abstract idea directed to a mental process that can be executed by a human mentally or using pen and paper, which affects their patent eligibility. The one or more non-transitory storage media storing instructions which, when executed by one or more computing devices, cause performance of the steps listed above, together with the large language model (LLM), are generic computing elements used as tools to implement the mental process; they do not integrate the mental process into a practical application and do not add significantly more to the mental steps.
The dependent claim 13 inherits the rejection of claim 12 and it further recites: wherein the textual content is first textual content, wherein removing the portion of the second sub-component from the textual content results in modified textual content, wherein the instructions, when executed by the one or more computing devices, further cause: submitting, to a second LLM to generate a second textual content that is different than the first textual content, the modified textual content and a prompt to re-phrase the modified textual content. Wherein a human can re-phrase the modified textual content in order to make the second textual content different from the first textual content.
The dependent claim 14 inherits the rejection of claim 12 and analogous arguments to the ones discussed in claim 3 are applicable.
The dependent claim 15 inherits the rejection of claim 12 and analogous arguments to the ones discussed in claim 4 are applicable.
The dependent claim 16 inherits the rejection of claim 12 and analogous arguments to the ones discussed in claim 5 are applicable.
The dependent claim 17 inherits the rejection of claim 12 and analogous arguments to the ones discussed in claim 6 are applicable.
The dependent claim 18 inherits the rejection of claim 12 and analogous arguments to the ones discussed in claim 7 are applicable.
The dependent claim 19 inherits the rejection of claim 12 and analogous arguments to the ones discussed in claim 8 are applicable.
Claim 20 inherits the rejection of claim 9, and it further recites: One or more non-transitory storage media storing instructions which, when executed by one or more computing devices, cause performance of the method recited in Claim 9. Wherein the non-transitory storage media and the one or more computing devices are a generic computer system executing the method of claim 9, not specialized computer equipment, and are being used as tools to implement the mental process, not integrating the mental process into a practical application and not adding significantly more to the mental steps.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 to 5, 8, 9, 11 to 16, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yun; Zhen Ou et al. (US-20220358154-A1) hereinafter "YUN" in view of Enomoto; Masafumi et al. (US-20250045525-A1, relying on support from provisional app 63/522,470, which was filed 6/22/2023), hereinafter "ENOMOTO".
Regarding claim 1, YUN teaches:
generating a first embedding that represents the first sub-component;
“In some embodiments, the similarity calculation module 402 may use part of speech (POS) information of words in the first text statement to calculate the similarity value. Typically, the POS information may be determined based on the grammar of the language using natural language processing (NLP) algorithms now known or to be developed. As an example, NLP algorithms may determine POS information of the word “book” in a statement “Please input a book name” as noun, and may determine POS information of the word “book” in a statement “Please book a meeting room” as verb. Referring now to FIG. 5, example parameters used in the similarity calculation are depicted according to embodiments of the present invention. An example input may be a statement “The password is weak, please input a secure one”. In this example, besides token embeddings, segment embeddings and position embeddings of words or symbols in the statement obtained by NLP algorithms, POS information (which may also be referred to as POS embeddings) of words in the statement may also be applied by the similarity calculation module 402 in the similarity calculation…” (YUN [0058]).
generating a second embedding that represents the second sub-component;
“In some embodiments, the similarity calculation module 402 may use part of speech (POS) information of words in the first text statement to calculate the similarity value. Typically, the POS information may be determined based on the grammar of the language using natural language processing (NLP) algorithms now known or to be developed. As an example, NLP algorithms may determine POS information of the word “book” in a statement “Please input a book name” as noun, and may determine POS information of the word “book” in a statement “Please book a meeting room” as verb. Referring now to FIG. 5, example parameters used in the similarity calculation are depicted according to embodiments of the present invention. An example input may be a statement “The password is weak, please input a secure one”. In this example, besides token embeddings, segment embeddings and position embeddings of words or symbols in the statement obtained by NLP algorithms, POS information (which may also be referred to as POS embeddings) of words in the statement may also be applied by the similarity calculation module 402 in the similarity calculation…” (YUN [0058]).
based on a similarity between the first embedding and the second embedding, determining whether the second sub-component is repetitious with respect to the first sub-component;
“The present disclosure provides a computer-implemented method, computer system and computer program product for text processing. The present in invention may include obtaining an original text input from a collaborative development environment. The present invention may include extracting a first text statement from the original input text. The present invention may include calculating a similarity value between the first text statement and a second text statement, wherein the second text statement is obtained from a statement database. The present invention may include comparing the similarity value to a pre-set threshold.” (YUN [Abstract]). “At block 610, a similarity value between a first text statement obtained from text inputs and a second text statement obtained from a statement database may be calculated. Then, at block 612, if the calculated similarity value is larger than a pre-set threshold, the method 600 moves to block 614. At block 614, if keywords of the first text statement map with keywords of the second text statement, the method 600 moves to block 616. At block 616, the first text statement may be determined as redundant to the second text statement.” (YUN [0072]).
in response to determining that the second sub-component is repetitious with respect to the first sub-component, removing at least a portion of the second sub-component from the textual content;
“In some embodiments, the text processing system 400 may further comprise a text processing module 404. If the text processing system 400 classifies the first text statement as redundant to the second text statement, the text processing module 404 may apply the second text statement in a relevant file to replace the first text statement. For example, the first text statement to be analyzed may be “Please enter a good user id and password.” The text processing system 400 may determine the first text statement “Please enter a good user id and password.” as a non-standard statement relevant to the second text statement “Please enter a valid user id and password.” stored in the statement database 411. The text processing module 404 may replace the first text statement “Please enter a good user id and password.” with the second text statement “Please enter a valid user id and password.” in any or all relevant text files to remove redundancy or non-standardization in the collaborative development environment.” (YUN [0068]).
wherein the method is performed by one or more computing devices.
“The present disclosure provides a computer-implemented method, computer system and computer program product for text processing. The present in invention may include obtaining an original text input from a collaborative development environment. The present invention may include extracting a first text statement from the original input text. The present invention may include calculating a similarity value between the first text statement and a second text statement, wherein the second text statement is obtained from a statement database. The present invention may include comparing the similarity value to a pre-set threshold.” (YUN [Abstract]).
YUN does not teach, but ENOMOTO teaches:
A method comprising: accessing textual content that was generated by a large language model (LLM), wherein the textual content comprises a plurality of sub-components including a first sub-component and a second sub-component;
“FIG. 1 schematically illustrates a method and overall system architecture 100 for generating text summaries in accordance with an embodiment of the present invention. At least one document is taken as (a) input 102. This is then passed to the (1) extractive summarizer 104. Next, the (1) extractive summarizer 104 selects a subset of sentences from the (a) input 102. Then, the (3) preprocessor 108 adds context to the extracted sentences and removes meaningless words and phrases to generate the prompt for the abstractive summarizer as a (c) preprocessed summary 110. The (2) abstractive summarizer 112 (which also could be an LLM) then takes the (c) preprocessed summary 110 as input and generates (d) fluent summary 114 as output. Finally, the (4) explainer 120 receives three different summaries ((b) extractive summary 106, (c) preprocessed summary 110, (d) fluent summary 114) for the (a) input 102 and generates a transparent summary view for another AI system and/or a user.” (ENOMOTO [0032], as supported by [0016] in provisional).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in YUN the capability to access textual content, comprising a plurality of sub-components, generated by a large language model, as described by ENOMOTO in: “In a first aspect, the present invention provides a computer-implemented, machine learning method for generating explainable text summaries includes extracting a subset of sentences from at least one input document as an extractive summary and adding context to the extracted sentences to generate a prompt. ...” (ENOMOTO [0017], as supported by [0016] and [0034] in provisional). Wherein the various sub-components are represented by the various summaries generated from different sentences of a document.
Regarding claim 2, the rejection of claim 1 is incorporated. YUN does not teach, but ENOMOTO teaches:
The method of Claim 1, wherein the textual content is first textual content, wherein removing the portion of the second sub-component from the textual content results in modified textual content, further comprising: submitting, to a second LLM to generate a second textual content that is different than the first textual content, the modified textual content and a prompt to re-phrase the modified textual content.
“In a first aspect, the present invention provides a computer-implemented, machine learning method for generating explainable text summaries includes extracting a subset of sentences from at least one input document as an extractive summary and adding context to the extracted sentences to generate a prompt. A fluent summary is generated by using the prompt as input to a generative language model. Source information for a sentence from the fluent summary is determined by mapping the sentence from the fluent summary to at least one sentence in the extractive summary and the at least one sentence from the extractive summary to at least one sentence from the at least one input document. A transparent summary view is generated showing the sentence from the fluent summary along with the source information from the extractive summary and the at least one input document for display on a user interface.” (ENOMOTO [0017], as supported by [0015] and [0016] in provisional).
“One implementation is to use a neural network that receives the reduced (c) preprocessed summary 110 and outputs a paraphrase of the content it contains as the (d) fluent summary 114. For example, by fine-tuning a pre-trained language model with a summarization dataset, a summarizer can be obtained. Alternatively, a language model already trained on a generic task (e.g., ChatGPT) can be used. Given a reduced summary (c) and task instructions (e.g., “summarize the document”), it generates an abstract summary as the (d) fluent summary 114. The (2) abstractive summarizer 112 can be run via a web application processing interface (API) or via a local computing system.” (ENOMOTO [0049], as supported by [0024] in provisional).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in YUN the capability to modify a sentence in a way that is textually different from the original text, rewriting the content in an alternative textual representation as described by ENOMOTO in: “Embodiments of the present invention provide solutions that address both of these shortcomings of existing technology jointly, and provide to reduce the input length in a secure and reliable manner before asking the LLM to generate a fluent summary.” (ENOMOTO [0016], as supported by [0015] and [0016] in provisional). Wherein the rephrasing is done as part of the summarizing process and the result is a modified textual content.
Regarding claim 3, the rejection of claim 1 is incorporated. Furthermore, YUN teaches:
The method of Claim 1, further comprising, prior to generating the first embedding and the second embedding: determining whether the first sub-component matches the second sub-component at a target level of textual granularity;
“In some embodiments, the text processing system 400 may further comprise a classifier 421. The classifier 421 may classify the first text statement into multiple categories based on multiple dimensions. As an example, the first text statement may be classified based on sentence structures. The categories of the first text statement based on sentence structures may include simple sentence, complex sentence, and words/phrases, etc. ...” (YUN [0053]).
Regarding claim 4, the rejection of claim 1 is incorporated. YUN does not teach, but ENOMOTO teaches:
The method of Claim 1, further comprising: submitting, to a second LLM, the first sub-component and a first prompt to summarize the first sub-component, wherein the second LLM outputs, based on the first sub-component and the first prompt, a first summary of the first sub-component;
“In an embodiment, the present invention provides a computer-implemented, machine learning method for generating explainable text summaries. A subset of sentences are extracted from at least one input document as an extractive summary. Context is added to the extracted sentences to generate a prompt. A fluent summary is generated by using the prompt as input to a generative language model. Source information for a sentence from the fluent summary is determined by mapping the sentence from the fluent summary to at least one sentence in the extractive summary and the at least one sentence from the extractive summary to at least one sentence from the at least one input document. A transparent summary view is generated showing the sentence from the fluent summary along with the source information from the extractive summary and the at least one input document for display on a user interface. The method has applications including, but not limited to medical AI, public safety and other machine learning applications for reliable and explainable document summarization.” (ENOMOTO [0004], as supported by [0011] and [0016] in provisional).
submitting, to the second LLM, the second sub-component and a second prompt to summarize the second sub-component, wherein the second LLM outputs, based on the second sub-component and the second prompt, a second summary of the second sub-component;
“In a second aspect, the present invention provides the method according to the first aspect, wherein mapping the sentence from the fluent summary to the at least one sentence in the extractive summary and/or mapping the at least one sentence from the extractive summary to the at least one sentence from the at least one input document is performed using a natural language inference model that predicts for the mapping whether a respective one of the sentences is entailed by another one of the sentences.” (ENOMOTO [0018], as supported by [0034] and [0035] in provisional).
“FIG. 2 schematically illustrates an overall architecture of a preprocessor 200, and steps performed by the preprocessor 200, in accordance with an embodiment of the present invention. The preprocessor 200 is configured to generate a prompt for the abstractive summarizer (e.g., an LLM system such as ChatGPT) such that (1) it reduces the number of tokens of the prompt to save the computational cost and inference time, and (2) adds context to extracted sentences in order to mitigate hallucinations of fact in the final summary....” (ENOMOTO [0037], as supported by [0016] in provisional).
wherein generating the first embedding comprises inputting the first summary into a language model that generates the first embedding based on the first summary;
“In a third aspect, the present invention provides the method according to the first or second aspect, wherein mapping the sentence from the fluent summary to the at least one sentence in the extractive summary is performed by embedding the sentence from the fluent summary and each respective one of the extracted sentences as a numerical vector using a sentence embedding model, and selecting a number k of the extracted sentences that are nearest neighbors to the sentence from the fluent summary as evidence in the extractive summary.” (ENOMOTO [0019], as supported by [0022] (4) in provisional).
wherein generating the second embedding comprises inputting the second summary into the language model that generates the second embedding based on the second summary.
“In a third aspect, the present invention provides the method according to the first or second aspect, wherein mapping the sentence from the fluent summary to the at least one sentence in the extractive summary is performed by embedding the sentence from the fluent summary and each respective one of the extracted sentences as a numerical vector using a sentence embedding model, and selecting a number k of the extracted sentences that are nearest neighbors to the sentence from the fluent summary as evidence in the extractive summary.” (ENOMOTO [0019], as supported by [0022] (4) in provisional).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in YUN the capability to summarize the textual sub-components of claim 1 and to generate embeddings of those summaries. The benefits and motivation for such a modification are described by ENOMOTO in: “… By definition, in summarization, the LLM is provided with very long (or even multiple) documents. This is particularly useful for reducing the reading time and cognitive burden on a human, which is very high when humans need to read large amounts of text. Embodiments of the present invention enable to reduce this cognitive load with the use of improved AI technology. At the same time, embodiments of the present invention enable to significantly reduce the cost of using LLM systems, by reducing the computational load, thereby enabling to conserve computational resources and/or compute time. In particular, embodiments of the present invention provide to reduce the input length of the LLM query, and enable to even make it as short as possible.” (ENOMOTO [0015], as supported by [0014] in provisional). Additionally, further motivation can be found in: “In a ninth aspect, the present invention provides the method according to any of the first to eighth aspects, further comprising checking whether one or more of the extracted sentences is a duplicate by semantically comparing embeddings of the extracted sentences using a similarity threshold, and excluding the one or more of the extracted sentences from the prompt based on a determination that the one or more of the extracted sentences is within the similarity threshold to another one of the extracted sentences.” (ENOMOTO [0025], as supported by [0022] (4) in provisional). Wherein the embeddings of the summarized sentences are utilized to compare the similarity of the sentences.
Regarding claim 5, the rejection of claim 1 is incorporated. Furthermore, YUN teaches:
the first sub-component and the second sub-component correspond to a first level of granularity of a plurality of levels of granularity;
“In some embodiments, the text processing system 400 may further comprise a classifier 421. The classifier 421 may classify the first text statement into multiple categories based on multiple dimensions. As an example, the first text statement may be classified based on sentence structures. The categories of the first text statement based on sentence structures may include simple sentence, complex sentence, and words/phrases, etc.” (YUN [0053]).
the third sub-component and the fourth sub-component correspond to a second level of granularity, of the plurality of levels of granularity, that is different than the first level of granularity;
“In some embodiments, the text processing system 400 may further comprise a classifier 421. The classifier 421 may classify the first text statement into multiple categories based on multiple dimensions. As an example, the first text statement may be classified based on sentence structures. The categories of the first text statement based on sentence structures may include simple sentence, complex sentence, and words/phrases, etc.” (YUN [0053]).
the plurality of levels of granularity comprises one or more of a word, a phrase, a sentence, or a paragraph;
“In some embodiments, the text processing system 400 may further comprise a classifier 421. The classifier 421 may classify the first text statement into multiple categories based on multiple dimensions. As an example, the first text statement may be classified based on sentence structures. The categories of the first text statement based on sentence structures may include simple sentence, complex sentence, and words/phrases, etc.” (YUN [0053]).
wherein the method further comprises: generating a third embedding that represents the third sub-component;
“ ...In this example, besides token embeddings, segment embeddings and position embeddings of words or symbols in the statement obtained by NLP algorithms, POS information (which may also be referred to as POS embeddings) of words in the statement may also be applied by the similarity calculation module 402 in the similarity calculation....” (YUN [0058]).
generating a fourth embedding that represents the fourth sub-component;
“ ...In this example, besides token embeddings, segment embeddings and position embeddings of words or symbols in the statement obtained by NLP algorithms, POS information (which may also be referred to as POS embeddings) of words in the statement may also be applied by the similarity calculation module 402 in the similarity calculation....” (YUN [0058]).
based on a similarity between the third embedding and the fourth embedding, determining whether the fourth sub-component is repetitious with respect to the third sub-component;
“The present disclosure provides a computer-implemented method, computer system and computer program product for text processing. The present in invention may include obtaining an original text input from a collaborative development environment. The present invention may include extracting a first text statement from the original input text. The present invention may include calculating a similarity value between the first text statement and a second text statement, wherein the second text statement is obtained from a statement database. The present invention may include comparing the similarity value to a pre-set threshold.” (YUN [Abstract]).
“At block 610, a similarity value between a first text statement obtained from text inputs and a second text statement obtained from a statement database may be calculated. Then, at block 612, if the calculated similarity value is larger than a pre-set threshold, the method 600 moves to block 614. At block 614, if keywords of the first text statement map with keywords of the second text statement, the method 600 moves to block 616. At block 616, the first text statement may be determined as redundant to the second text statement.” (YUN [0072]).
in response to determining that the fourth sub-component is repetitious with respect to the third sub-component, removing at least a portion of the fourth sub-component from the textual content.
“In some embodiments, the text processing system 400 may further comprise a text processing module 404. If the text processing system 400 classifies the first text statement as redundant to the second text statement, the text processing module 404 may apply the second text statement in a relevant file to replace the first text statement. For example, the first text statement to be analyzed may be “Please enter a good user id and password.” The text processing system 400 may determine the first text statement “Please enter a good user id and password.” as a non-standard statement relevant to the second text statement “Please enter a valid user id and password.” stored in the statement database 411. The text processing module 404 may replace the first text statement “Please enter a good user id and password.” with the second text statement “Please enter a valid user id and password.” in any or all relevant text files to remove redundancy or non-standardization in the collaborative development environment.” (YUN [0068]).
YUN does not teach, but ENOMOTO teaches:
wherein: the plurality of sub-components includes a third sub-component and a fourth sub-component;
“FIG. 1 schematically illustrates a method and overall system architecture 100 for generating text summaries in accordance with an embodiment of the present invention. At least one document is taken as (a) input 102. This is then passed to the (1) extractive summarizer 104. Next, the (1) extractive summarizer 104 selects a subset of sentences from the (a) input 102. Then, the (3) preprocessor 108 adds context to the extracted sentences and removes meaningless words and phrases to generate the prompt for the abstractive summarizer as a (c) preprocessed summary 110. The (2) abstractive summarizer 112 (which also could be an LLM) then takes the (c) preprocessed summary 110 as input and generates (d) fluent summary 114 as output. Finally, the (4) explainer 120 receives three different summaries ((b) extractive summary 106, (c) preprocessed summary 110, (d) fluent summary 114) for the (a) input 102 and generates a transparent summary view for another AI system and/or a user.” (ENOMOTO [0032], as supported by [0016] in provisional).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in YUN the capability to add further components of textual content, generated by a large language model, that are different from the initially selected components. Note that the same result could be obtained from YUN in view of ENOMOTO by executing the described process more than once. The benefits and motivations for the modification are described by ENOMOTO in: “In a first aspect, the present invention provides a computer-implemented, machine learning method for generating explainable text summaries includes extracting a subset of sentences from at least one input document as an extractive summary and adding context to the extracted sentences to generate a prompt. ...” (ENOMOTO [0017], as supported by [0016] and [0034] in provisional). Wherein those different summaries of the sentences represent the different textual components generated.
Regarding claim 8, the rejection of claim 1 is incorporated. YUN does not teach, but ENOMOTO teaches:
wherein: generating the first embedding comprises inputting the first sub-component to a machine-learned model;
“Second implementation 500b: Alternatively or additionally, the (5) tracer for a sentence in fluent summary 410 can be implemented with a sentence embedding model (dense retriever), such as the SentenceBERT model, wherein each input sentence is represented as an n-dimensional vector. A sentence in the (f) fluent summary 508b and each sentence in the (b) extractive summary 506b are converted into numerical vectors that are embeddings 520 in a latent space.” (ENOMOTO [0054], as supported by [0026] (1) Method 2 in provisional).
and generating the second embedding at least by applying the machine-learned model to the second sub-component.
“Second implementation 500b: Alternatively or additionally, the (5) tracer for a sentence in fluent summary 410 can be implemented with a sentence embedding model (dense retriever), such as the SentenceBERT model, wherein each input sentence is represented as an n-dimensional vector. A sentence in the (f) fluent summary 508b and each sentence in the (b) extractive summary 506b are converted into numerical vectors that are embeddings 520 in a latent space.” (ENOMOTO [0054], as supported by [0026] (1) Method 2 in provisional).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in YUN the capability to apply a machine-learned model, such as a dense retriever, to generate the embeddings of the textual components. Benefits and motivation to obtain those embeddings are described by ENOMOTO in: “In a ninth aspect, the present invention provides the method according to any of the first to eighth aspects, further comprising checking whether one or more of the extracted sentences is a duplicate by semantically comparing embeddings of the extracted sentences using a similarity threshold, and excluding the one or more of the extracted sentences from the prompt based on a determination that the one or more of the extracted sentences is within the similarity threshold to another one of the extracted sentences.” (ENOMOTO [0025], as supported by [0022] (4) in provisional).
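For context, a SentenceBERT-style dense-retriever comparison of the kind quoted above could look like the following sketch; it assumes the open-source sentence-transformers library and the all-MiniLM-L6-v2 model, and is illustrative only, not ENOMOTO's actual implementation.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
# Each input sentence is represented as an n-dimensional numerical vector.
embeddings = model.encode([
    "Please enter a valid user id and password.",
    "Please enter a good user id and password.",
])
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
is_duplicate = similarity > 0.9  # illustrative similarity threshold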
Regarding claim 9, YUN teaches:
based on a similarity between the first sub-component and the second sub-component, determining whether the second sub-component is repetitious with respect to the first sub-component;
“The present disclosure provides a computer-implemented method, computer system and computer program product for text processing. The present in invention may include obtaining an original text input from a collaborative development environment. The present invention may include extracting a first text statement from the original input text. The present invention may include calculating a similarity value between the first text statement and a second text statement, wherein the second text statement is obtained from a statement database. The present invention may include comparing the similarity value to a pre-set threshold.” (YUN [Abstract]). “At block 610, a similarity value between a first text statement obtained from text inputs and a second text statement obtained from a statement database may be calculated. Then, at block 612, if the calculated similarity value is larger than a pre-set threshold, the method 600 moves to block 614. At block 614, if keywords of the first text statement map with keywords of the second text statement, the method 600 moves to block 616. At block 616, the first text statement may be determined as redundant to the second text statement.” (YUN [0072]).
in response to determining that the second sub-component is repetitious with respect to the first sub-component, removing at least a portion of the second sub-component from the first textual content to generate modified textual content;
“In some embodiments, the text processing system 400 may further comprise a text processing module 404. If the text processing system 400 classifies the first text statement as redundant to the second text statement, the text processing module 404 may apply the second text statement in a relevant file to replace the first text statement.” (YUN [0068]).
wherein the method is performed by one or more computing devices.
“The present disclosure provides a computer-implemented method, computer system and computer program product for text processing. The present in invention may include obtaining an original text input from a collaborative development environment. The present invention may include extracting a first text statement from the original input text. The present invention may include calculating a similarity value between the first text statement and a second text statement, wherein the second text statement is obtained from a statement database. The present invention may include comparing the similarity value to a pre-set threshold.” (YUN [Abstract]).
YUN does not teach, but ENOMOTO teaches:
A method comprising: accessing first textual content that was generated by a large language model (LLM), wherein the first textual content comprises a plurality of sub-components including a first sub-component and a second sub-component;
“FIG. 1 schematically illustrates a method and overall system architecture 100 for generating text summaries in accordance with an embodiment of the present invention. At least one document is taken as (a) input 102. This is then passed to the (1) extractive summarizer 104. Next, the (1) extractive summarizer 104 selects a subset of sentences from the (a) input 102. Then, the (3) preprocessor 108 adds context to the extracted sentences and removes meaningless words and phrases to generate the prompt for the abstractive summarizer as a (c) preprocessed summary 110. The (2) abstractive summarizer 112 (which also could be an LLM) then takes the (c) preprocessed summary 110 as input and generates (d) fluent summary 114 as output. Finally, the (4) explainer 120 receives three different summaries ((b) extractive summary 106, (c) preprocessed summary 110, (d) fluent summary 114) for the (a) input 102 and generates a transparent summary view for another AI system and/or a user.” (ENOMOTO [0032], as supported by [0016] in provisional).
submitting, to a second LLM, the modified textual content and a prompt to rephrase the modified textual content;
“In a first aspect, the present invention provides a computer-implemented, machine learning method for generating explainable text summaries includes extracting a subset of sentences from at least one input document as an extractive summary and adding context to the extracted sentences to generate a prompt. A fluent summary is generated by using the prompt as input to a generative language model. Source information for a sentence from the fluent summary is determined by mapping the sentence from the fluent summary to at least one sentence in the extractive summary and the at least one sentence from the extractive summary to at least one sentence from the at least one input document. A transparent summary view is generated showing the sentence from the fluent summary along with the source information from the extractive summary and the at least one input document for display on a user interface.” (ENOMOTO [0017], as supported by [0015] and [0016] in provisional).
“One implementation is to use a neural network that receives the reduced (c) preprocessed summary 110 and outputs a paraphrase of the content it contains as the (d) fluent summary 114. For example, by fine-tuning a pre-trained language model with a summarization dataset, a summarizer can be obtained. Alternatively, a language model already trained on a generic task (e.g., ChatGPT) can be used. Given a reduced summary (c) and task instructions (e.g., “summarize the document”), it generates an abstract summary as the (d) fluent summary 114. The (2) abstractive summarizer 112 can be run via a web application processing interface (API) or via a local computing system.” (ENOMOTO [0049], as supported by [0024] in provisional).
accessing second textual content that is generated by the second LLM based on the modified textual content and the prompt;
"Thus, in total, the system according to an embodiment of the present invention creates three different summaries ((b) extractive summary 106, (c) preprocessed summary 110, (d) fluent summary 114) for the (a) input 102. Then, the (e) transparent summary view 125 links these four texts ((a) input 102, (b) extractive summary 106, (c) preprocessed summary 110, (d) fluent summary 114) together in a transparent and, therefore, secure, reliable and trustworthy manner..." (ENOMOTO [0033], as supported by [0033] in provisional).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in YUN the capability to add various components of textual content generated by a large language model as described by ENOMOTO in: “In a first aspect, the present invention provides a computer-implemented, machine learning method for generating explainable text summaries includes extracting a subset of sentences from at least one input document as an extractive summary and adding context to the extracted sentences to generate a prompt. ...” (ENOMOTO [0017], as supported by [0016] and [0034] in provisional). Further motivation to submit the text to an LLM and access the textual content generated can be found in: “Embodiments of the present invention provide solutions that address both of these shortcomings of existing technology jointly, and provide to reduce the input length in a secure and reliable manner before asking the LLM to generate a fluent summary.” (ENOMOTO [0016], as supported by [0015] and [0016] in provisional).
Regarding claim 11, the rejection of claim 9 is incorporated. Furthermore, YUN teaches:
The method of Claim 9, wherein: the first sub-component and the second sub-component are different phrases, different sentences, or different paragraphs within the first textual content;
“ In some embodiments, the text processing system 400 may further comprise a classifier 421. The classifier 421 may classify the first text statement into multiple categories based on multiple dimensions. As an example, the first text statement may be classified based on sentence structures. The categories of the first text statement based on sentence structures may include simple sentence, complex sentence, and words/phrases, etc.” (YUN [0053]).
the method further comprising: generating a first embedding that represents the first sub-component;
“ ...In this example, besides token embeddings, segment embeddings and position embeddings of words or symbols in the statement obtained by NLP algorithms, POS information (which may also be referred to as POS embeddings) of words in the statement may also be applied by the similarity calculation module 402 in the similarity calculation....” (YUN [0058]).
generating a second embedding that represents the second sub-component;
“...In this example, besides token embeddings, segment embeddings and position embeddings of words or symbols in the statement obtained by NLP algorithms, POS information (which may also be referred to as POS embeddings) of words in the statement may also be applied by the similarity calculation module 402 in the similarity calculation...” (YUN [0058]).
the similarity is a similarity between the first embedding and the second embedding.
“In some embodiments, the similarity calculation module 402 may use part of speech (POS) information of words in the first text statement to calculate the similarity value. Typically, the POS information may be determined based on the grammar of the language using natural language processing (NLP) algorithms now known or to be developed. As an example, NLP algorithms may determine POS information of the word “book” in a statement “Please input a book name” as noun, and may determine POS information of the word “book” in a statement “Please book a meeting room” as verb. Referring now to FIG. 5, example parameters used in the similarity calculation are depicted according to embodiments of the present invention. An example input may be a statement “The password is weak, please input a secure one”. In this example, besides token embeddings, segment embeddings and position embeddings of words or symbols in the statement obtained by NLP algorithms, POS information (which may also be referred to as POS embeddings) of words in the statement may also be applied by the similarity calculation module 402 in the similarity calculation. ...” (YUN [0058]).
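For illustration only, the following is a minimal Python sketch of the embedding construction enumerated in YUN [0058] (token, segment, position, and POS embeddings combined per statement); the lookup tables, their sizes, and the embedding dimension are illustrative assumptions, not details taken from the reference.

import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # illustrative embedding dimension

# Randomly initialized lookup tables stand in for learned embeddings;
# the table sizes are assumptions, not taken from YUN.
token_table = rng.standard_normal((1000, DIM))    # one row per vocabulary token
segment_table = rng.standard_normal((2, DIM))     # one row per segment
position_table = rng.standard_normal((128, DIM))  # one row per token position
pos_tag_table = rng.standard_normal((20, DIM))    # one row per part-of-speech tag

def embed_statement(token_ids, segment_ids, pos_tag_ids):
    # Sum the four embedding types per token, as enumerated in YUN [0058],
    # then mean-pool over the statement to obtain one vector.
    positions = np.arange(len(token_ids))
    vectors = (token_table[token_ids] + segment_table[segment_ids]
               + position_table[positions] + pos_tag_table[pos_tag_ids])
    return vectors.mean(axis=0)

# First and second sub-components as made-up token/segment/POS id sequences.
first_embedding = embed_statement([5, 17, 42], [0, 0, 0], [3, 7, 1])
second_embedding = embed_statement([5, 17, 99], [0, 0, 0], [3, 7, 1])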
Regarding claim 12, YUN teaches:
generating a first embedding that represents the first sub-component;
“…In this example, besides token embeddings, segment embeddings and position embeddings of words or symbols in the statement obtained by NLP algorithms, POS information (which may also be referred to as POS embeddings) of words in the statement may also be applied by the similarity calculation module 402 in the similarity calculation…” (YUN [0058]).
generating a second embedding that represents the second sub-component;
“…In this example, besides token embeddings, segment embeddings and position embeddings of words or symbols in the statement obtained by NLP algorithms, POS information (which may also be referred to as POS embeddings) of words in the statement may also be applied by the similarity calculation module 402 in the similarity calculation…” (YUN [0058]).
based on a similarity between the first embedding and the second embedding, determining whether the second sub-component is repetitious with respect to the first sub-component;
“The present disclosure provides a computer-implemented method, computer system and computer program product for text processing. The present invention may include obtaining an original text input from a collaborative development environment. The present invention may include extracting a first text statement from the original input text. The present invention may include calculating a similarity value between the first text statement and a second text statement, wherein the second text statement is obtained from a statement database. The present invention may include comparing the similarity value to a pre-set threshold.” (YUN [Abstract]). “At block 610, a similarity value between a first text statement obtained from text inputs and a second text statement obtained from a statement database may be calculated. Then, at block 612, if the calculated similarity value is larger than a pre-set threshold, the method 600 moves to block 614. At block 614, if keywords of the first text statement map with keywords of the second text statement, the method 600 moves to block 616. At block 616, the first text statement may be determined as redundant to the second text statement.” (YUN [0072]).
in response to determining that the second sub-component is repetitious with respect to the first sub-component, removing at least a portion of the second sub-component from the textual content.
“In some embodiments, the text processing system 400 may further comprise a text processing module 404. If the text processing system 400 classifies the first text statement as redundant to the second text statement, the text processing module 404 may apply the second text statement in a relevant file to replace the first text statement. For example, the first text statement to be analyzed may be “Please enter a good user id and password.” The text processing system 400 may determine the first text statement “Please enter a good user id and password.” as a non-standard statement relevant to the second text statement “Please enter a valid user id and password.” stored in the statement database 411. The text processing module 404 may replace the first text statement “Please enter a good user id and password.” with the second text statement “Please enter a valid user id and password.” in any or all relevant text files to remove redundancy or non-standardization in the collaborative development environment.” (YUN [0068]).
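For illustration only, the following is a minimal Python sketch of the redundancy test and replacement described in YUN [0072] and [0068]; the threshold value and the keyword extractor are stand-ins assumed for the sketch, not details taken from the reference.

THRESHOLD = 0.9  # the "pre-set threshold" of YUN [0072]; the value is assumed

def keywords(statement: str) -> set:
    # Stand-in keyword extractor: lower-cased words longer than three characters.
    return {w.strip(".,").lower() for w in statement.split() if len(w) > 3}

def is_redundant(first: str, second: str, similarity: float) -> bool:
    # Per YUN [0072]: redundant when the similarity value exceeds the
    # threshold and keywords of the two statements map to each other.
    return similarity > THRESHOLD and bool(keywords(first) & keywords(second))

def replace_redundant(text: str, first: str, second: str, similarity: float) -> str:
    # Per YUN [0068]: replace the non-standard first statement with the
    # standard second statement in the relevant text.
    if is_redundant(first, second, similarity):
        return text.replace(first, second)
    return text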
YUN does not teach, but ENOMOTO teaches:
One or more non-transitory storage media storing instructions which, when executed by one or more computing devices, cause: accessing textual content that was generated by a large language model (LLM), wherein the textual content comprises a plurality of sub-components including a first sub-component and a second sub-component;
“FIG. 1 schematically illustrates a method and overall system architecture 100 for generating text summaries in accordance with an embodiment of the present invention. At least one document is taken as (a) input 102. This is then passed to the (1) extractive summarizer 104. Next, the (1) extractive summarizer 104 selects a subset of sentences from the (a) input 102. Then, the (3) preprocessor 108 adds context to the extracted sentences and removes meaningless words and phrases to generate the prompt for the abstractive summarizer as a (c) preprocessed summary 110. The (2) abstractive summarizer 112 (which also could be an LLM) then takes the (c) preprocessed summary 110 as input and generates (d) fluent summary 114 as output. Finally, the (4) explainer 120 receives three different summaries ((b) extractive summary 106, (c) preprocessed summary 110, (d) fluent summary 114) for the (a) input 102 and generates a transparent summary view for another AI system and/or a user.” (ENOMOTO [0032], as supported by [0016] in provisional).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in YUN the capability to add various components of textual content generated by a large language model, as described by ENOMOTO in: “In a first aspect, the present invention provides a computer-implemented, machine learning method for generating explainable text summaries includes extracting a subset of sentences from at least one input document as an extractive summary and adding context to the extracted sentences to generate a prompt. ...” (ENOMOTO [0017], as supported by [0016] and [0034] in provisional).
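For illustration only, the following is a minimal Python sketch of the pipeline ENOMOTO [0032] describes (input, extractive summarizer, preprocessor, abstractive summarizer, explainer); every function is a stub standing in for the corresponding module, and none of the code is taken from the reference.

def extractive_summarizer(document: str) -> str:
    # (1) Select a subset of sentences from the (a) input (stub: first two).
    return ". ".join(document.split(". ")[:2])

def preprocessor(extractive_summary: str) -> str:
    # (3) Add context and strip filler to build the prompt (stub).
    return "Summarize the document: " + extractive_summary

def abstractive_summarizer(preprocessed_summary: str) -> str:
    # (2) An LLM would paraphrase the prompt into a fluent summary (stub).
    return "[fluent summary of: " + preprocessed_summary + "]"

def explainer(extractive: str, preprocessed: str, fluent: str) -> dict:
    # (4) Link the summaries into a transparent summary view (stub).
    return {"extractive": extractive, "preprocessed": preprocessed, "fluent": fluent}

document = "Sentence one. Sentence two. Sentence three."
b = extractive_summarizer(document)   # (b) extractive summary
c = preprocessor(b)                   # (c) preprocessed summary
d = abstractive_summarizer(c)         # (d) fluent summary
view = explainer(b, c, d)             # (e) transparent summary view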
Regarding claim 13, the rejection of claim 12 is incorporated; furthermore, YUN does not teach, but ENOMOTO teaches:
The one or more storage media of Claim 12, wherein the textual content is first textual content, wherein removing the portion of the second sub-component from the textual content results in modified textual content, wherein the instructions, when executed by the one or more computing devices, further cause: submitting, to a second LLM to generate a second textual content that is different than the first textual content, the modified textual content and a prompt to re-phrase the modified textual content.
“In a first aspect, the present invention provides a computer-implemented, machine learning method for generating explainable text summaries includes extracting a subset of sentences from at least one input document as an extractive summary and adding context to the extracted sentences to generate a prompt. A fluent summary is generated by using the prompt as input to a generative language model. Source information for a sentence from the fluent summary is determined by mapping the sentence from the fluent summary to at least one sentence in the extractive summary and the at least one sentence from the extractive summary to at least one sentence from the at least one input document. A transparent summary view is generated showing the sentence from the fluent summary along with the source information from the extractive summary and the at least one input document for display on a user interface.” (ENOMOTO [0017], as supported by [0015] and [0016] in provisional). “One implementation is to use a neural network that receives the reduced (c) preprocessed summary 110 and outputs a paraphrase of the content it contains as the (d) fluent summary 114. For example, by fine-tuning a pre-trained language model with a summarization dataset, a summarizer can be obtained. Alternatively, a language model already trained on a generic task (e.g., ChatGPT) can be used. Given a reduced summary (c) and task instructions (e.g., “summarize the document”), it generates an abstract summary as the (d) fluent summary 114. The (2) abstractive summarizer 112 can be run via a web application processing interface (API) or via a local computing system.” (ENOMOTO [0049], as supported by [0024] in provisional).
Regarding claim 14, the rejection of claim 12 is incorporated; furthermore, arguments analogous to the ones presented for claim 3 are applicable.
Regarding claim 15, the rejection of claim 12 is incorporated; furthermore, arguments analogous to the ones presented for claim 4 are applicable.
Regarding claim 16, the rejection of claim 12 is incorporated; furthermore, arguments analogous to the ones presented for claim 5 are applicable.
Regarding claim 19, the rejection of claim 12 is incorporated; furthermore, arguments analogous to the ones presented for claim 8 are applicable.
Regarding claim 20, arguments analogous to the ones presented for claim 3 are applicable; furthermore, YUN does not teach, but ENOMOTO teaches:
One or more non-transitory storage media storing instructions which, when executed by one or more computing devices, cause performance of the method recited in Claim 9.
“Referring to FIG. 7, a processing system 700 can include one or more processors 702, memory 704, one or more input/output devices 706, one or more sensors 708, one or more user interfaces 710, and one or more actuators 712. Processing system 700 can be representative of each computing system disclosed herein.” (ENOMOTO [0080], as supported by [0042] in provisional).
“Memory 704 can include volatile memory, non-volatile memory, and any other medium capable of storing data. Each of the volatile memory, non-volatile memory, and any other type of memory can include multiple different memory devices, located at multiple distinct locations and each having a different structure. Memory 704 can include remotely hosted (e.g., cloud) storage.” (ENOMOTO [0084], as supported by [0044] in provisional).
Motivation is found in: “...Processors 702 can perform operations embodying the function, method, or operation by, for example, executing code (e.g., interpreting scripts) stored on memory ...” (ENOMOTO [0082], as supported by [0042] in provisional).
Claims 6, 7, 10, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over YUN in view of ENOMOTO and further in view of Miller; Travis J. et al. (US-20220253871-A1), hereinafter "MILLER".
Regarding claim 6, the rejection of claim 1 is incorporated; furthermore, the combination of YUN and ENOMOTO does not teach, but MILLER teaches:
The method of Claim 1, further comprising: determining a frequency, in the textual content, of a particular word;
“Another such method is Term Frequency-Inverse Document Frequency (TF-IDF), which involves scoring of word frequency in a document/DS versus the inverse rarity scoring of a word across a collection of documents/Record as a method of identifying possible keywords.” (MILLER [0279]).
and in response to determining that the frequency meets one or more content modification criteria, removing one or more occurrences of the particular word from the textual content.
“In aspects, the invention provides the method of any one or more of aspects 1-14, wherein the method comprises the processor analyzing the data harmonized evaluation dataset for the presence of undesirable duplicate characters or undesirable duplicate system-identified according to preprogrammed data deduplication standards and removing any identified undesirable duplicate characters or identified undesirable system-identified terms according to a data deduplication protocol to generate a deduplicated dataset and subjecting the deduplicated dataset to further processing to generate semantic vectors and lexical vectors therefrom (aspect 15).” (MILLER [0478]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in YUN in view of ENOMOTO the capability to determine the frequency of a particular word in textual content; benefits and motivation are discussed by MILLER in: “In aspects, methods also comprise the processor analyzing a data harmonized evaluation dataset for the presence of undesirable duplicate characters or undesirable duplicate system-identified according to preprogrammed data deduplication standards and removing any identified undesirable duplicate characters or identified undesirable system-identified terms according to a data deduplication protocol to generate a deduplicated dataset and subjecting the deduplicated dataset to further processing to generate semantic vectors and lexical vectors therefrom. In aspects, deduplication is repeated one or more times. In aspects, counts of frequency are made before application of deduplication step(s).” (MILLER [0134]).
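For illustration only, the following is a minimal Python sketch of the claim 6 mapping: a TF-IDF-style frequency score as named in MILLER [0279], and removal of a word whose frequency meets an assumed content modification criterion; the cutoff and tokenizer are illustrative assumptions, not details taken from the reference.

import math

def tf_idf(word: str, doc: list, corpus: list) -> float:
    # Frequency of the word in the document versus the inverse of its rarity
    # across the collection, per the TF-IDF scoring MILLER [0279] names.
    tf = doc.count(word) / len(doc)
    df = sum(1 for d in corpus if word in d)
    return tf * math.log(len(corpus) / (1 + df))

def remove_frequent_word(text: str, word: str, max_frequency: float = 0.05) -> str:
    # If the word's frequency meets the content modification criterion
    # (an assumed cutoff), remove its occurrences from the textual content.
    tokens = text.split()
    if tokens.count(word) / len(tokens) > max_frequency:
        tokens = [t for t in tokens if t != word]
    return " ".join(tokens)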
Regarding claim 7, the rejection of claim 1 is incorporated; furthermore, the combination of YUN and ENOMOTO does not teach, but MILLER teaches:
The method of Claim 1, further comprising: computing a cosine similarity value based on the first embedding and the second embedding;
“Lexical similarity methods are known in the art and any suitable lexical similarity method or methods can be utilized by engine(s) or other components of systems in the performance of methods. Lexical similarity typically provides a measure of the similarity of two texts based on the intersection of the word sets of same or different languages. There are several different ways of evaluating lexical similarity such as Jaccard Similarity, Cosine Similarity, Levenshtein Distance, etc. A lexical similarity of 1 typically suggests that there is complete overlap between the vocabularies while a score of 0 suggests that there are no common words in the two texts....” (MILLER [0293]). “In case of cosine similarity, typically two data elements/records/documents are represented in a n-dimensional vector space with each word represented in a vector form. Thus, the cosine similarity metric measures the cosine of the angle between two n-dimensional vectors projected in a multi-dimensional space. As is known, the cosine similarity ranges from 0 to 1. A value closer to 0 indicates less similarity whereas a score closer to 1 indicates more similarity...” (MILLER [0296]).
determining whether the cosine similarity value exceeds a particular threshold value;
“In aspects, the query comprises selecting an evaluation submission semantic vector and measuring the cosine distance between the evaluation submission semantic vector and each PIDC semantic vector. In aspects, the query further comprises identifying any PIDC semantic vectors having a cosine distance that meets or exceeds a preprogrammed semantic vector similarity threshold as similar semantic vectors.” (MILLER [0067]).
wherein removing is performed in response to determining that the cosine similarity value exceeds the particular threshold value.
“In aspects, the invention provides the method of any one or more of aspects 1-14, wherein the method comprises the processor analyzing the data harmonized evaluation dataset for the presence of undesirable duplicate characters or undesirable duplicate system-identified according to preprogrammed data deduplication standards and removing any identified undesirable duplicate characters or identified undesirable system-identified terms according to a data deduplication protocol to generate a deduplicated dataset and subjecting the deduplicated dataset to further processing to generate semantic vectors and lexical vectors therefrom (aspect 15).” (MILLER [0478]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to explicitly include in YUN in view of ENOMOTO the use of cosine similarity as the function to calculate the similarity between the sentence embeddings and the definition of a threshold for such function; benefits and motivation are discussed by MILLER in: “Lexical similarity methods are known in the art and any suitable lexical similarity method or methods can be utilized by engine(s) or other components of systems in the performance of methods. Lexical similarity typically provides a measure of the similarity of two texts based on the intersection of the word sets of same or different languages. There are several different ways of evaluating lexical similarity such as Jaccard Similarity, Cosine Similarity, Levenshtein Distance, etc....” (MILLER [0293]).
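For illustration only, the following is a minimal Python sketch of the claim 7 mapping: a cosine similarity value computed from two embeddings and compared against a particular threshold value before removal; the embeddings and the threshold value are illustrative assumptions, not details taken from the references.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two n-dimensional embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

THRESHOLD = 0.95  # the "particular threshold value" of claim 7; the value is assumed

first_embedding = np.array([0.2, 0.8, 0.1])     # made-up embeddings
second_embedding = np.array([0.25, 0.75, 0.05])

if cosine_similarity(first_embedding, second_embedding) > THRESHOLD:
    print("second sub-component is repetitious; remove at least a portion of it")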
Regarding claim 10, the rejection of claim 9 is incorporated; furthermore, arguments analogous to the ones presented for claim 6 are applicable.
Regarding claim 17, the rejection of claim 12 is incorporated; furthermore, arguments analogous to the ones presented for claim 6 are applicable.
Regarding claim 18, the rejection of claim 12 is incorporated; furthermore, arguments analogous to the ones presented for claim 7 are applicable.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HECTOR J. CRESPO FEBLES whose telephone number is (571)272-4512. The examiner can normally be reached Mon - Fri 7:30 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/H.J.C./ Examiner, Art Unit 2657
/DANIEL C WASHBURN/ Supervisory Patent Examiner, Art Unit 2657