DETAILED ACTION
Receipt of Applicant’s Amendment, filed August 7, 2025, is acknowledged.
Claims 1, 6, 10, 15, 19 and 22 were amended.
Claims 3, 7, 12, 16, 21 and 23 have been canceled.
Claims 1, 2, 4-6, 8-11, 13-15, 17-20, and 22 are pending in this office action.
Claim Interpretation
To facilitate discussion, labels written in the format “[a#]” have been assigned to specific stanzas within the independent claims. The same label has been applied consistently to limitations reciting similar concepts throughout the independent claims. These labels do not modify the scope of the claims and are intended to be used for discussion purposes only. The labels are assigned as follows:
1. (Currently amended) An alphanumeric string similarity analysis system implemented via a back-end application computer server, comprising:
(a) an input data store that contains electronic records, each electronic record being associated with enterprise data and including an electronic record identifier and an alphanumeric string, the alphanumeric string including a phrase of two or more words;
(b) the back-end application computer server, coupled to the input data store, the back-end application computer server including:
[b1] a computer processor, and a computer memory, coupled to the computer processor, storing instructions that, when executed by the computer processor cause the back-end application computer server to:
[b2] receive, from the input data store, information about electronic records to be analyzed, including alphanumeric strings;
[b3] store each of the alphanumeric strings in its own row of a plurality of rows of an initial single column;
[b4] compute a length of each alphanumeric string, wherein the length is a value representing a count of characters in the alphanumeric string;
[b5] construct, in association with a PySpark instruction associated with a user-defined function, a two-string-column result table via a cross-join on the initial single column, the two-string-column result table including:
[b.5.1] i. at least an alpha control text column, an alpha control id column, a beta control text column, a beta control id column, and a similarity score column, ii. a plurality of result rows, each result row including a single string in the alpha control text column and a corresponding single string in the beta control text column, wherein the plurality of result rows is based on a pairing of each alphanumeric string in the plurality of rows of the initial single column with each other; and
[b.5.2] wherein for each result row:
[b.5.2.1] each single string in the alpha control text column has an alpha control id;
[b.5.2.2] each single string in the beta control text column has a beta control id;
[b.5.2.3] the result rows are sorted by alpha control id; and
[b.5.2.4] wherein the result rows having a same alpha control id have a different beta control id;
[b.5.3] modify the constructed two-string-column result table by:
in a case the length of the single string in the alpha control text column is greater than the length of the corresponding single string in the beta control text column, removing the result row from the two-string-column result table, and
in a case the length of the single string in the alpha control text column is less than the length of the corresponding single string in the beta control text column, retaining the result row in the two-string-column result table, and
the single string in the alpha control text column represents a longer single string in the beta control text column;
[b6] automatically analyze the two-string-column result table using cosine similarity to generate, for each result row, the similarity score for the single string in alpha control text column and the corresponding single string in the beta control text column of the two-string-column result table, wherein the analysis comprises:
creating a first multi-dimensional vector representing the single string in the alpha control text column,
creating a second multi-dimensional vector representing the corresponding single string in the beta control text column,
calculating, via execution of the PySpark column-wise functions, cosine of an angle between the first multi-dimensional vector and the second multi-dimensional vector;
[b7] compare the similarity scores to a threshold level;
[b8] automatically generate at least two string families, each string family having similarity scores above the threshold level, wherein each string family includes a plurality of rows having a same value in the alpha control text column and a different value in the beta control text column;
[b9] arrange to output indications of the similarity scores for each string family;
[b10.1] receive selection of an element on an interactive user interface display; and
(c) a communication port coupled to the back-end application computer server transmitting data through a distributed communication network to remote user devices, wherein the remote user devices comprise the interactive user interface display; and
(d) the display device receiving the transmitted data, via the communication port and the distributed communication network, and based on the transmitted data, the interactive user interface display illustrating analysis results.
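For reference, the cosine computation recited at label [b6] can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the claimed PySpark implementation: the claim does not say how the multi-dimensional vectors are built, so word-count vectors are assumed here, and the function name and sample strings are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(alpha: str, beta: str) -> float:
    """Cosine of the angle between word-count vectors of two strings.

    Assumption: each string is vectorized as a bag of lowercase words.
    """
    a = Counter(alpha.lower().split())
    b = Counter(beta.lower().split())
    # Dot product over the words the two strings share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical alpha/beta pair: two shared words out of two and three.
score = cosine_similarity("access review", "quarterly access review")
```

Under this assumed vectorization, a score near 1 indicates nearly identical word content and a score of 0 indicates no shared words; the score is what label [b7] compares against the threshold level.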
For examination purposes:
The “two-string-column result table” (See label [b5]) recited in the claim has been interpreted in light of Figure 6B.
Within this example, the claimed “families” (See label [b8]) have been construed as being exemplified by the groupings within the dashed lines, wherein the alpha control column has the same value and the beta control column has distinct values, as shown in Figure 6B.
With regard to [b.5.3] “in a case the length of the single string in the alpha control text column is greater than the length of the corresponding single string in the beta control text column, removing the result row from the two-string-column result table, and
in a case the length of the single string in the alpha control text column is less than the length of the corresponding single string in the beta control text column, retaining the result row in the two-string-column result table”.
Within the instant specification, Paragraph [0042] recites “A single column of strings can be cross-joined with itself to create all pairing combinations, resulting in two columns of strings. Furthermore, such a cross-join result can be specified to retain only shorter strings on the left.” The only recitations of “remove” in the instant specification are with regard to the removal of stop words (Paragraphs [0029] and [0049]) and unnecessary punctuation marks (Paragraph [0033]). There is no support in the instant specification for generating results, then removing or retaining the results based on conditions. When read in light of the instant specification (specifically, Paragraph [0042]), the system produces a table having two columns of strings, wherein each resulting row of the table has the “shorter string” on the left. This is the interpretation that has been applied for examination purposes.
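The interpretation above can be illustrated with a minimal Python sketch. It is plain Python rather than PySpark, and the sample strings and function names are hypothetical; they do not come from the instant specification. The sketch assumes a strict "shorter than" test, since the claim language does not address equal-length pairs. Under those assumptions, generating every pairing and then removing rows yields the same table as constructing only the rows with the shorter string on the left.

```python
from itertools import product

# Hypothetical sample strings (lengths 13, 23, and 22 characters).
strings = ["access review", "quarterly access review", "user access review log"]


def cross_join_then_remove(values):
    """Build every ordered pairing, then keep only rows whose left string is shorter."""
    all_rows = list(product(values, repeat=2))
    return [(a, b) for a, b in all_rows if len(a) < len(b)]


def construct_with_condition(values):
    """Emit a row only when the left string is shorter than the right string."""
    return [(a, b) for a in values for b in values if len(a) < len(b)]


removed = cross_join_then_remove(strings)
constructed = construct_with_condition(strings)
```

Both functions return the same three rows in the same order, which is why the "removing" and "retaining" language has been read as describing the properties of the resulting table rather than a separate processing step.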
Drawings
The drawings are objected to under 37 CFR 1.83(a). The drawings must show every feature of the invention specified in the claims. Therefore, the limitations of [b.5.3] which recite “removing the result” in the case that the alpha control text is greater in length than the beta control text must be shown or the feature(s) canceled from the claim(s). No new matter should be entered. At best, Figure 2, S250 recites constructing the table, which generates the same result, but this does not support the explicit recitation of “removing” or “retaining” specific data elements. The claim limitations imply that all rows are generated (whatever that even means within the claimed context) and specific rows are “removed” or “retained”. This concept is not shown in the instant drawings.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Claims 1, 10 and 19 are objected to because of the following informalities. Appropriate correction is required.
With regard to claims 1, 10 and 19, it should be noted that the tabbing and indentations that applicant has applied to the claim language do not appear to be in line with the punctuation present in the claim language (see the underlined sections in the claim interpretation above). MPEP 608.01(m) states “There may be plural indentations to further segregate subcombinations or related steps. In general, the printed patent copies will follow the format used but printing difficulties or expense may prevent the duplication of unduly complex claim formats” (emphasis added). In general, semi-colons (“;”) are to be used to separate steps, while colons (“:”) are to be used to indicate subcombinations. The end of a subcombination is denoted by the use of the term “and”. It is suggested that the claims be amended to bring the tabbing/indentations in line with the punctuation to ensure that the scope of the claim is properly captured in the event that printing difficulties or expense prevent the duplication of the claim format. Please see the claim interpretation section, which has denoted where this occurs in limitations [b.5.3] and [b6]. It is unclear whether the discrepancy between the punctuation and the tabbing/indentations within these limitations will affect the scope of the claims during publication.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 4-6, 8-11, 13-15, 17-20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Maan [2020/0272692] in view of Khayyat [2017/0060944], CSS-ProfK [How can I style a list of name-value pairs to appear like an HTML table?], Kumaresan [2002/0171800] in view of Apache-Spark v2.1.0 [Unified engine for large-scale data analytics].
Examiner’s Note Regarding Apache-Spark [Unified engine for large-scale data analytics]: The proposed rejection on record is made in view of Apache-Spark v2.1.0 specifically, which was released December 28, 2016. The cited NPL document is a collection of screenshots of the Apache Spark framework, which provides evidence of what functions are available within Apache-Spark v2.1.0. Each function lists the version of Apache Spark in which it was first made available and last changed. Page numbers were added to the top right of the collection of documents to facilitate claim mapping. All page numbers listed herein refer to the Page # placed at the top right of the printed screenshot.
With regard to claim 1 Maan teaches An alphanumeric string as the words in the patent (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text. It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words”) similarity analysis system as identifying pairings of words to substitute words (¶48 “for each of the identified one or more words sequences within the patent document text, a text box 506 shows a substitute word-sequence that would be used to replace an associated original word-sequence in the input patent document text”; Figure 5A, 506, see the pairing of “electronic message => message”, etc.) implemented via a back-end application computer server (Maan, Figure 1, 102 “Computing Device” which is located behind the “network” 106), comprising:
(a) an input data store as memory (Maan, Figure 7, 730) that contains electronic records as patents (Maan, ¶25 “The data repository creating module 202 may receive a plurality of patent documents”), each electronic record as a patent (Id) being associated with enterprise data as the patent is associated with chemical or electronical engineering data (Maan, ¶45 “by way of an example, for a chemical patent… in a patent from the field of electronical engineering”) and including an electronic record identifier as each patent has an application number (Maan, ¶25 “The data repository creating module 202 may receive a plurality of patent documents”) and an alphanumeric string as the word-sequence (Maan, ¶29 “In other words, the word-sequence identifying module 206 may identify one or more word-sequences which occur more than once within the patent document text. Additionally, each of the one or more word-sequences includes a unique last word.”), the alphanumeric string including a phrase of two or more words as a plurality of elements, such as “charging devices” (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text. It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words.”; ¶52 “By way of an example, for the long word-sequences "charging units," "charging stations," "charging devices," "charging apparatuses," "charging equipments," and "charging modules" the associated substitute short word determined is "chargers".”);
(b) the back-end application computer server (Maan, Figure 1, 102 “Computing Device” which is located behind the “network” 106; ¶59 “a general-purpose computer system, such as a personal computer (PC) or server computer”), coupled to the input data store (Maan, Figure 7, see that the memory 730 is part of the computer system 702), the back-end application computer server (Maan, Figure 1, 102) including:
[b1] a computer processor (Maan, Figure 7, 704; Figure 1, 110), and a computer memory (Maan, Figure 7, see 726, 728; Figure 1, 112), coupled to the computer processor (Maan, Figure 7, storage interface 724 connecting processor 704 to the RAM 726 and ROM 728), storing instructions (Maan, ¶22 “The memory 112 may store instructions that, when executed by the processor 110, cause the processor to…”) that, when executed by the computer processor cause the back-end application computer server (Id) to:
[b2] receive, from the input data store, information about electronic records to be analyzed, including alphanumeric strings as generating an array of words from the patent document (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text. It may be appreciated by those skilled in the art that the tuples are a finite ordered sequence of elements, such as words”);
[b3] store each of the alphanumeric strings as the sequence of words (Maan, ¶29) in its own row as a respective tuple in the array (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text.”) of a plurality of rows as the plurality of tuples in the array (Id) of an initial single column as an array (Id);
[b4] compute a [[ as determining that a word sequence is long or short (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”), [[
[b5] [[ as the array of words (Maan, ¶28), [[as the mapping association (Maan, ¶52 “At step 604, an associated substitute short word-sequence may be determined for each of the one or more long word-sequences based on a mapping dictionary”):
i. at least an alpha control text column as the short word-sequence (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”), [[ as the long word-sequence (Maan, ¶52), [[
ii. a plurality of result rows as each separate association (Maan, ¶52), each result row including a single string in the alpha control text column as the short word sequence (Maan, ¶52) and a corresponding single string in the beta control text column as the long word-sequence (Maan ¶52), wherein the plurality of result rows is based on a pairing as the association (Maan, ¶52) of each alphanumeric string in the plurality of rows of the initial single column with each other (Maan, ¶53 “At step 606, each of the identified one or more long word-sequences may be replaced with the associated substitute short word-sequence in the patent document text to generate the patent document summary.”); and
[b.5.2] wherein for each result row as each association (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”):
[b.5.2.1] each single string in the alpha control text column as the value of short word sequence such as “software” (Maan, ¶52; ¶53 “By way of another example, "computer program" may be replaced with "software," "software product" may be replaced with "software," and "computer processor" may be replaced with "CPU."”) has [[
[b.5.2.2] each single string in the beta control text column as the value of long word sequence such as “computer program” (Maan, ¶52; ¶53 “By way of another example, "computer program" may be replaced with "software," "software product" may be replaced with "software," and "computer processor" may be replaced with "CPU."”) has [[
[b.5.2.3] [[
[b.5.2.4] [[
[b.5.3] modify the constructed two-string-column result table by:
[[ as the shorter word-sequence (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”) is less than the length as being shorter than (Id) of the corresponding single string in the beta control text column as the long word-sequence (Id), retaining the result row as identifying the association of word-sequences that have this property (Id) in [[ as the short-word sequence may be used as a substitute for the long word-sequence (Id);
[b6] [[ as the fact that the word is highlighted as a replacement indicates its similarity (Maan, ¶48 “for each of the identified one or more words sequences within the patent document text, a text box 506 shows a substitute word-sequence that would be used to replace an associated original word-sequence in the input patent document text”; Figure 5A, 506, see the pairing of “electronic message => message”, etc.), wherein the analysis comprises:
[[
[b7] [[
[b8] automatically generate at least two string families as a listing of alternative word-sequence suggestions, including the exemplified word-sequence suggestions (Maan, ¶44 “drop-down menu may be provided that may include a list of alternative word-sequence suggestions”; ¶53 “By way of yet another example, "the present invention" may be replaced with "this invention" and "the present disclosure" with "this disclosure," and "at least one" with "one."”), [[ as the listing of alternative substitute words (Id) includes a plurality of rows having a same value in the alpha control text column as the original word (Maan, ¶43 “the substitute word-sequence used to replace the original word sequence”) and a different value in the beta control text column as the substitute word-sequence (Id);
[b9] arrange to output indications (Maan, ¶40 “the highlighting attribute may be displayed so as to indicate to the user that the substitute word-sequence that has the highlighting attribute is not the original word-sequence of the patent document text, but is replaced text.”) of [[ as the fact that the word is highlighted as a replacement indicates its similarity (Maan, ¶48 “for each of the identified one or more words sequences within the patent document text, a text box 506 shows a substitute word-sequence that would be used to replace an associated original word-sequence in the input patent document text”; Figure 5A, 506, see the pairing of “electronic message => message”, etc.) for each string family as a listing of alternative word-sequence suggestions, including the exemplified word-sequence suggestions (Maan, ¶44, ¶53);
[b10.1] receive selection (Maan, ¶41 “the predefined action may include at least one of clicking on the highlighting attribute, hovering a mouse pointer over the highlighting attribute, clicking on the highlighting attribute for a predefined duration through a mouse, or performing a right-click on the highlighting attribute.”) of an element (Maan, ¶41 “a predefined action may be received through an external interface on a highlighting attribute of a substitute word-sequence”) on an interactive user interface display (Maan, ¶64 “user interfaces may provide computer interaction interface elements on a display system operatively connected to computer system 702, such as cursors, icons, check boxes, menus, scrollers, windows, widgets, etc. Graphical user interfaces (GUIs) may be employed”); and
(c) a communication port (Maan, Figure 7, 714) coupled to the back-end application computer server (Maan, Figure 1, 102; Figure 7, 702) transmitting data as sending the data to be displayed (Maan, ¶42 “the original word-sequence may be displayed to the user in a display window”; ¶44 “a drop-down menu may be provided that may include a list of alternative word-sequence suggestions”) through a distributed communication network (Maan, Figure 7, 716; Figure 1, 106) to remote user devices (Maan, Figure 1, 104 a-c; Figure 7, 718, 720, 722), wherein the remote user devices (Maan, Figure 1, 104 a-c; Figure 7, 718, 720, 722) comprise the interactive user interface display (Maan, ¶64 “user interfaces may provide computer interaction interface elements on a display system operatively connected to computer system 702, such as cursors, icons, check boxes, menus, scrollers, windows, widgets, etc. Graphical user interfaces (GUIs) may be employed”); and
(d) the display device (Maan, Figure 1, 104 a-c; Figure 7, 718, 720, 722; ¶61) receiving the transmitted data as sending the data to be displayed (Maan, ¶42 “the original word-sequence may be displayed to the user in a display window”; ¶44 “a drop-down menu may be provided that may include a list of alternative word-sequence suggestions”), via the communication port (Maan, Figure 7, 714) and the distributed communication network (Maan, Figure 7, 716; Figure 1, 106), and based on the transmitted data as the data sent to be displayed (Maan, ¶42 “the original word-sequence may be displayed to the user in a display window”; ¶44 “a drop-down menu may be provided that may include a list of alternative word-sequence suggestions”), the interactive user interface display (Maan, ¶64 “user interfaces may provide computer interaction interface elements on a display system operatively connected to computer system 702, such as cursors, icons, check boxes, menus, scrollers, windows, widgets, etc. Graphical user interfaces (GUIs) may be employed”) illustrating analysis results (Maan, ¶42 “the original word-sequence may be displayed to the user in a display window”; ¶44 “a drop-down menu may be provided that may include a list of alternative word-sequence suggestions”).
Maan does not explicitly teach [b5] construct, in association with a PySpark instruction …, a two-string-column result table via a cross-join on the initial single column, the two-string-column result table including… an alpha control id column… a beta control id column … in a case the length of the single string in the alpha control text column is greater than the length of the corresponding single string in the beta control text column, removing the result row from the two-string-column result table, and … the two-string-column result table…
Table of Result Pairs:
Si (i)   Si (time)   Sj (j)   Sj (time)
S2       140         S1       100
S2       140         S3       80
S2       140         S4       90
S1       100         S3       80
S1       100         S4       90
S4       90          S3       80
Khayyat teaches [b5] construct, in association with a PySpark instruction (Khayyat, ¶99 “(1) Spark SQL-IEJOIN. We implemented IEJOIN inside Spark SQL v1.0.2 (see the web page at spark.apache.org/sql/).”) [[, a two-string-column result table as the set of (si,sj) pairs returned by the calculation (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,s1), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”; Please note that one of ordinary skill in the art would recognize the returned set of pairs as the mathematical representation of the “Table of Result Pairs” herein.) via a cross-join on (Khayyat, ¶60 “(ii) apply Cartesian product (or self-Cartesian product for a single relation input) on the data blocks”; ¶33 “an inequality self-join query Qs”) the initial single column as t_id when input as a single relation input (Khayyat, ¶34 “Qs: SELECT s1.t_id, s2.t_id”; ¶60 “(ii) apply Cartesian product (or self-Cartesian product for a single relation input) on the data blocks”; Please note that within the 103 combination the single input is the array of strings taught by Maan), the two-string-column result table as the table of result pairs (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,s1), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”; see the Table depicted above) including:
[b.5.1] i. at least an alpha control text column as the values in the Si column, see the time values for Si (Id), an alpha control id column as the Si column (Id), a beta control text column as value in the Sj column, see the time value for Sj (Id), a beta control id column as the Sj column (Id), and [[
ii. a plurality of result rows as each pair (Id), each result row including a single string in the alpha control text column as the values in the Si column, see the time values for Si (Id) and a corresponding single string in the beta control text column as value in the Sj column, see the time value for Sj (Id), wherein the plurality of result rows is based on a pairing as the self-join operation (Khayyat, ¶60 “(ii)apply Cartesian product (or self-Cartesian product for a single relation input) on the data blocks”; ¶33 “an inequality self-join query Qs”) of each alphanumeric string in the plurality of rows of the initial single column as t_id when input as a single relation input (Khayyat, ¶34 “Qs: SELECT s1.t_id, s2.t_id”; ¶60 “(ii) apply Cartesian product (or self-Cartesian product for a single relation input) on the data blocks”) with each other as the self-join operation results in the pairing of the values in the table with the other values in the table (Khayyat, ¶60; ¶33); and
[b.5.2] wherein for each result row as each pair (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,s1), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”; see the Table depicted above):
[b.5.2.1] each single string in the alpha control text column as the value in the si column (Id) has an alpha control id as the si Id itself (Id);
[b.5.2.2] each single string in the beta control text column as the value in the sj column (Id) has a beta control id as the sj Id itself (Id);
[b.5.2.3] the result rows are sorted as sorting the results (Khayyat, ¶37 “A natural idea to handle an inequality join on one attribute is to leverage a sorted array. For instance, we sort west's tuples on time in ascending order into an array L1: <s3, s4, s1, s2>. We denote by L[i] the i-th element in array L, and L[i,j] its sub-array from position i to position j. Given a tuple s, any tuple at L1[k] (k ∈ {1, i-1}) has a time value that is less than L1[i], the position of s in L1. Consider Example 1, tuple s1 in position L1[3] joins with tuples in positions L1[1, 2], namely s3 and s4.”) by alpha control id as “i” (Id); and
[b.5.2.4] wherein the result rows as see the three rows in bold, and the two rows in italics (See the Result table ¶36) having a same alpha control id as 140 in the Si bold rows and the value of 100 for the Si italicized rows (Id) have a different beta control id as the values 100, 80, and 90 for the Sj in the bold rows and 80 and 90 for the Sj italicized rows (Id);
[b.5.3] modify the constructed two-string-column result table by:
in a case the length of the single string in the alpha control text column as the value being tested from S1 in the query example (Khayyat, ¶34-¶36 “Qs: SELECT S1.t_id, S2.t_id FROM west s1, west s2 WHERE s1.time>s2.time”) is greater than as condition operator not being satisfied in the WHERE clause, for example “>” (Id; The example conditions in ¶41 of Khayyat are evaluating a greater than and a less than condition. One of ordinary skill would recognize that any appropriate conditional check (i.e., <, >, <= or >=) may be set as a matter of device optimization. These conditionals are recognized as substantially equivalent conditionals, and one of ordinary skill in the art would recognize which condition is best used for any particular desired check.) the [[ as the variable being tested, for example ‘time’ (Khayyat, ¶34-¶36) of the corresponding single string in the beta control text column as value being tested from S2 (Id), removing the result row from the two-string-column result table as performing the post selection operations, e.g. the WHERE clause conditions (¶34-36), on the output of the join conditions (Khayyat, ¶71 “in case of additional inequality join conditions, it evaluates them as a post selection operation on the output of the first two join conditions.”), and
in a case the length of the single string in the alpha control text column as the value being tested from S1 in the query example (Khayyat, ¶34-¶36 “Qs: SELECT S1.t_id, S2.t_id FROM west s1, west s2 WHERE s1.time>s2.time”) is less than as condition operator being satisfied in the WHERE clause, for example “>” (Id; The example conditions in ¶41 of Khayyat are evaluating a greater than and a less than condition. One of ordinary skill would recognize that any appropriate conditional check (i.e., <, >, <= or >=) may be set as a matter of device optimization. These conditionals are recognized as substantially equivalent conditionals, and one of ordinary skill in the art would recognize which condition is best used for any particular desired check.) the [[ as the variable being tested, for example ‘time’ (Khayyat, ¶34-¶36) of the corresponding single string in the beta control text column as value being tested from S2 (Id), retaining the result row in the two-string-column result table as when the condition is satisfied, the row is identified as a result (Id; See the result table depicted above), and
the single string in the alpha control text column as the Si value (Id) represents a [[ as satisfying the condition (Id) single string in the beta control text column as the Sj value (Id);
[b6] automatically analyze the two-string-column result table as the table of result pairs (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,si), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}, see Table depicted above) using [[cosine similarity]] as using a second condition evaluation, for example checking both cost and time (Khayyat, ¶41) to generate, for each result row as for any given tuple (Khayyat, ¶43 “Thus, given a tuple s, any tuple… has a higher cost than the one in s… Our observations here is as follows. For any tuple ś to form a join result (s, ś) with tuple s, the following two conditions must be satisfied: (i) s’ is on the left of s in L1, i.e., s has a larger value for time than s ś, and (ii) ś is on the right of s in L2, i.e., s has a smaller value for cost than ś.”), … calculating, via execution of the PySpark (¶99) column-wise functions (¶51 “IESELFJOIN takes a self-join inequality query Q as input, and returns a set of result pairs. The algorithm first sorts the two lists of attributes to be joined (lines 2-5), computes the permutation array (line 6), and sets up the bit-array (line 7), as well as the result set (line 8). It also sets an offset variable to distinguish inequality operators with or without equality (lines 9-10).”; ¶97 “We used MonetDB Database Server Toolkit vl.1 (October2014-SP2), which is an open-source column-oriented database, in a disk partition of size 669 GB”), [[
[b8] automatically generate at least two [[ as a first grouping of results with i=2, a second grouping being the results with i=1, and the third grouping of results with i=4 in the Si column (Khayyat, ¶36 “returns a set of pairs {(si,sj)} where si takes more time than sj; the result is {(s2,si), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}, see the Table depicted above).
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the device taught by Maan to determine candidate pairings between words using the techniques taught by Khayyat to yield the predictable results of generating a set of word-sequence pairings which can be used to substitute longer words with relatively shorter words within the text document. Please note that within the proposed combination, Maan takes an array of terms (Maan, ¶47, Figure 5, 504) and generates a list of paired terms (Maan, Figure 5, 506), wherein the system is able to identify which term in the pairing is the shortest term (Maan, ¶32, ¶52). Maan does not explicitly state how this functionality is performed. The techniques taught by Khayyat provide a means of generating term pairings, and applying a condition to the pairings.
With regard to the “string” limitations noted above, the specific data values and condition that is being tested are a specific coding implementation. Within the proposed combination of Maan-Khayyat, the underlying table is an array of terms, these terms are alphanumeric strings as detailed above (See Mapping to Maan). One of ordinary skill in the art would readily know the specific conditions that would be necessary to enable the system to use the SQL techniques taught by Khayyat to achieve the desired evaluations within the situation taught by Maan. As such, the proposed combination would be built to operate upon this array of strings instead of the array of time / cost data depicted by Khayyat.
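For illustration only, the following is a minimal sketch of how a self-join with an inequality condition of the kind discussed above might operate on an array of strings rather than on time / cost data. It is written in plain Python rather than Spark SQL; the sample records, the function name `length_filtered_self_join`, and the four-column row layout are assumptions made for discussion purposes and are not taken from Maan or Khayyat.

```python
from itertools import product

# Hypothetical input: (record id, phrase) pairs standing in for the array of terms.
records = [
    (1, "motor vehicle"),
    (2, "automobile"),
    (3, "passenger motor vehicle"),
]


def length_filtered_self_join(rows):
    """Cross-join rows with themselves, keeping pairs whose left string is shorter."""
    result = []
    for (alpha_id, alpha_text), (beta_id, beta_text) in product(rows, repeat=2):
        # Plain-Python analogue of a WHERE clause comparing the two string lengths.
        if len(alpha_text) < len(beta_text):
            result.append((alpha_id, alpha_text, beta_id, beta_text))
    return result


result_table = length_filtered_self_join(records)
for row in result_table:
    print(row)
```

In this sketch the strict "<" comparison also discards pairs in which a string is joined with itself, since a string is never shorter than itself.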
With regard to the sorting “by” alpha control id, it is acknowledged that Khayyat teaches sorting by the alpha control text column. There is a finite number of values that the table may be sorted on, e.g. (i, j, time at Si, and time at Sj). The function necessary to perform the sorting based on any of the data values is the same functionality taught by Khayyat to perform the sorting on the alpha control text column. One of ordinary skill in the art would identify the results of sorting based on any of these values as a predictable outcome, with a reasonable expectation of successfully sorting the values. Sorting based on any particular value allows the user of the device to more easily visualize a grouping based on that value, facilitating the ability to see similar data values. It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time in which the invention was filed to have implemented the proposed device to have tried sorting using the other values in the table (KSR Rationale E “Obvious to Try”; See MPEP 2143 (E)).
It is acknowledged that Khayyat does not explicitly use the term “table”. As noted above one of ordinary skill in the art would recognize the returned set of pairs as the mathematical representation of the “Table of Results Pairs” herein. Nevertheless, as the term “table” is not explicitly recited:
CSS-Profk explicitly teaches
[b5] construct …a two-string-column result table …, the two-string-column result table as the common layout pattern using a four column table (CSS-Profk Page 1 “I have inherited a web forms report laid out using pure HTML and ASP.NET server controls, and a common layout pattern is a section laid out using a four column table, with first and third columns used for field labels, and the second and fourth columns for values”) including:
[b.5.1] at least an alpha control text column as the second column used for values (Id), an alpha control id column as the first column used for field labels (Id), a beta control text column as the fourth column used for values (Id), a beta control id column as the third column used for labels (Id)…
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the results table generated by the proposed combination using the common layout pattern detailed by CSS-Profk as it yields the predictable results of being an easy-to-code stylistic format for tables which is commonly used within the field of art.
Maan does not explicitly teach [b5] … a similarity score column … [b6] automatically analyze the two-string-column result table using cosine similarity to generate… a similarity score … [b7] compare the similarity scores to a threshold level … [b8] …, each string family having similarity scores above the threshold level... [b9] arrange to output indications of the similarity scores. Please note that Maan teaches determining substitute words based on a mapping dictionary (Maan, ¶52) but does not state how the mapping dictionary is used to determine such substitute words.
Kumaresan teaches [b5] … a similarity score column as generating the similarity score (Kumaresan, ¶98) in association with the instant 103 combination (See motivation below) … [b6] automatically analyze the two-string-column result table using cosine similarity as the cosine similarity assigned to records within the dictionary (Kumaresan, ¶98 “a dictionary token may have a greater cosine similarity to another vector in the vector space”; ¶99 “the dictionary of word embeddings may assign records to the same group … similarity may be determined based on the Euclidean distance or cosine similarity between different embeddings”) to generate, for each result row as each word (Id), a similarity score (Id) for the string in a first row of the alpha control column as the vector for a dictionary token (Kumaresan, ¶98 “a vector for a dictionary token may have a smaller Euclidean distance and/or greater cosine similarity to another vector in the vector space for a token that has a more similar semantic meaning than for a token that is not as similar”) and the corresponding string in the beta control text column as another vector (Id) of the two-string-column result table as the vector space (Id);
wherein the analysis comprises:
creating a first multi-dimensional as the vector is projected into vector space, such as a Euclidean space (Kumaresan, ¶98 “Word Embeddings may be assigned such that a vector for a dictionary token may have a smaller Euclidean distance and/or greater cosine similarity to another vector in the vector space for a token that has a more similar semantic meaning than for a token that is not as similar”) vector as a real-valued vector (Kumaresan, ¶98 “In an embodiment, a dictionary may comprise a set of word embeddings where a word embedding is a numeric representation, such as a real-valued vector, of a word in a given context.”) representing the single string in the alpha control text column as the word (Id), creating a second multi-dimensional vector representing the corresponding single string in the beta control text column as the word embedding created for the second word (Id), calculating, [[ as the cosine similarity (Id) between the first multi-dimensional vector and the second multi-dimensional vector as the cosine similarity between a vector and another vector in vector space (Id);
[b7] compare the similarity scores to a threshold level as the threshold similarity level (Kumaresan, ¶99 “However, the semantic meaning conveyed by the keywords may be substantially similar (e.g., within a threshold Euclidean distance or cosine similarity) of the word embeddings included in the matched records”);
[b8] automatically generate at least two string families as the set of clustered terms (Kumaresan, ¶127 “the interactive interface may present links between different sets of clusters”; ¶98 “cosine similarity”), each string family having similarity scores above the threshold level as satisfying a threshold (Kumaresan, ¶47 “retain the top n tokens in the dictionary or only tokens with weights satisfying a threshold. Other tokens may be discarded or otherwise not stored in the dictionary”; ¶92), wherein each string family includes a plurality of rows as the tokens identified as being similar (Kumaresan, ¶98);
[b9] arrange to output indications (Kumaresan, ¶127 “the interactive interface may present links between different sets of clusters”) of the similarity scores as the links (Id) which include the cosine similarity (¶98 “cosine similarity”) for each string family as the set of clustered terms (Kumaresan, ¶127 “the interactive interface may present links between different sets of clusters”; ¶98 “cosine similarity”).
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the dictionary mapping within the proposed combination, using the word clustering techniques to build the dictionary. The proposed combination yields the predictable results of providing a means of generating a dictionary that is capable of relating words based on similar meaning. The proposed combination provides a similarity value that can be tested against a predefined threshold, as taught by Kumaresan, to identify the terms that are acceptable replacements of each other as is desired by Maan. Within the proposed combination, the final result table generated by the techniques taught by Khayyat may be modified to include the cosine similarity score taught by Kumaresan to facilitate the identification of similar words. Please note that the techniques taught by Khayyat allow for the result table to test for two conditions to generate the result set.
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination to add the cosine similarity score as additional information to the table generated within the proposed combination as it serves to facilitate identifying the association between similar words (Maan, ¶52).
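For discussion purposes, the scoring and threshold comparison described above may be sketched as follows. This is a hedged illustration in plain Python, not the implementation of Kumaresan: the `embed` function substitutes a simple bag-of-words count vector for the real-valued word embeddings described by Kumaresan, and the sample pairs and the `THRESHOLD` value of 0.5 are assumptions chosen only to make the example concrete.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a word embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as Counters."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an empty vector as dissimilar to everything
    return dot / (norm_u * norm_v)


THRESHOLD = 0.5  # assumed value, for illustration only

# Hypothetical (alpha text, beta text) pairs from a result table.
pairs = [
    ("motor vehicle", "passenger motor vehicle"),
    ("motor vehicle", "insurance policy number"),
]

# Append a similarity score to each row, then keep rows above the threshold.
scored = [(a, b, cosine_similarity(embed(a), embed(b))) for a, b in pairs]
retained = [row for row in scored if row[2] > THRESHOLD]
```

The first pair shares two of its words and is retained, while the second pair shares none and scores zero.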
Khayyat does not explicitly teach [b4] compute a length of each alphanumeric string, wherein the length is a value representing a count of characters in the alphanumeric string; [b5] construct, in association with a PySpark instruction associated with a user-defined function, a two-string-column result table via a cross-join on
…
[b.5.3] …in a case the length of the single string … the length of the corresponding single string …[b6] calculating, via execution of the PySpark column-wise functions, cosine of an angle between the first multi-dimensional vector and the second multi-dimensional vector. It should be noted that Khayyat is taught as being implemented using Spark SQL v1.0.2 from the spark.apache.org/sql/ website.
Apache-Spark v2.1.0 teaches [b4] compute a length of each alphanumeric string (Apache-Spark V2.1.0, Page 2 pyspark.sql.functions.length: “Computes the character length of string data”), wherein the length is a value representing a count of characters in the alphanumeric string as character length (Id);
[b5] construct, in association with a PySpark instruction associated with a user-defined function (Apache-Spark V2.1.0, Pages 15-17; pyspark.sql.functions.udf), a two-string-column result table via a cross-join (Apache-Spark V2.1.0, Page 6 pyspark.sql.DataFrame.crossJoin: “Returns the cartesian product of another DataFrame”) on the initial string column as DataFrame using an alias name for the initial string (Apache-Spark v2.1.0, Page 8, pyspark.sql.DataFrame.alias; Please note that one of ordinary skill in the art would recognize the use of alias name for the input as the means of executing the self-Cartesian product recited by Khayyat, ¶66)
…
[b.5.3] …in a case the length of the single string (Apache-Spark V2.1.0, Page 2 pyspark.sql.functions.length: “Computes the character length of string data”)… the length of the corresponding single string (Apache-Spark V2.1.0, Page 2 pyspark.sql.functions.length: “Computes the character length of string data”) … [b6] calculating, via execution of the PySpark column-wise functions (Apache-Spark V2.1.0, Page 4 pyspark.sql.functions.cos: “Parameters: col: Column or column name”), cosine of an angle (Apache-Spark V2.1.0, Page 4 pyspark.sql.functions.cos: “Returns: Column cosine of the angle, as computed by java.lang.Math.cos()”)…;
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the proposed combination to execute the device using PySpark SQL version 2.1.0 as it is the earliest version of PySpark that contains the necessary functions. The proposed combination is simply updating the version of PySpark from Spark SQL v1.0.2 (Khayyat, ¶99), which was used to perform the example experimentations not requiring the Cartesian product, to pyspark.sql v2.1.0 to facilitate the ability to perform the Cartesian product, which is the simplest way to approach running IEJOIN (Khayyat, ¶60), as crossJoin was added to pyspark.sql in version 2.1.0 (Apache-Spark v2.1.0, Page 6 pyspark.sql.DataFrame.crossJoin: “New in version 2.1.0”).
With regard to claim 2 the proposed combination further teaches wherein the cross-join (Khayyat, ¶60 “(ii)apply Cartesian product (or self-Cartesian product for a single relation input) on the data blocks”; ¶33 “an inequality self-join query Qs”; Apache-Spark V2.1.0, Page 6 pyspark.sql.DataFrame.crossJoin: “Returns the cartesian product of another DataFrame”) is associated with a Structured Query Language ("SQL")-type (Khayyat, ¶70 “Spark SQL allows users to query structured data on top of Spark”; Apache-Spark V2.1.0) WHERE condition as using a WHERE condition such as si.LEN() < sj.LEN() to determine a substitute shorter word sequence (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”; Khayyat, ¶36 “s1.time>s2.time”; ¶36 “the result is {(s2,si), (s2,s3), (s2,s4), (s1,s3), (s1,s4), (s4,s3)}”; ¶43 “the following two conditions must be satisfied: (i) s’ is on the left of s in L1, i.e., s has a larger value for time than s’”; Apache-Spark V2.1.0, Page 2 pyspark.sql.functions.length: “Computes the character length of string data”) to keep shorter strings as using a WHERE condition such as si.LEN() < sj.LEN() to determine a substitute shorter word sequence (Id) to the left in the result table as the si column (Id).
With regard to claims 4 and 13 the proposed combination further teaches wherein the back-end application computer server removes stop words from the alphanumeric strings as removing the stop words during the array generation (Maan, ¶28 “the array generating module 204 may generate an array that includes a plurality of tuples from the patent document text… In some embodiments, the plurality of tuples may be identified by removing one or more stop words in the data repository from the patent document text”) before computing the length of each alphanumeric string as the replacement is done based on the word length, which is required to be done after the array is generated (Maan, ¶30 “replace each of the identified one or more word-sequences with an associated substitute word-sequence”; ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”).
With regard to claims 5 and 14 the proposed combination further teaches wherein the back-end application computer server automatically replaces longer alphanumeric strings with shorter versions from a same family as replacing the longer words with shorter words that are identified as being similar (Maan, ¶30 “replace each of the identified one or more word-sequences with an associated substitute word-sequence”; ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences based on a mapping dictionary”).
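For illustration only, the ordering discussed above (stop words removed first, length computed afterwards) may be sketched as follows. The stop-word set, the sample phrase, and the function names are hypothetical and are not taken from Maan; the sketch merely shows that the character count is taken on the string that remains after stop-word removal.

```python
# Hypothetical stop-word list, assumed for illustration only.
STOP_WORDS = {"the", "of", "a", "an", "and"}


def remove_stop_words(text):
    """Drop stop words from a phrase, preserving the order of the remaining words."""
    return " ".join(word for word in text.split() if word.lower() not in STOP_WORDS)


def cleaned_length(text):
    """Character count of the phrase after stop-word removal."""
    return len(remove_stop_words(text))


phrase = "the owner of the vehicle"
cleaned = remove_stop_words(phrase)
print(cleaned, cleaned_length(phrase))
```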
With regard to claims 6, 15, and 22 the proposed combination further teaches wherein the computation of the length of each alphanumeric string as determining that a word sequence is long or short (Maan, ¶52 “an associated substitute short word-sequence may be determined for each of the one or more long word-sequences”; Apache-Spark V2.1.0, Page 2 pyspark.sql.functions.length: “Computes the character length of string data”) in the initi