DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the communications filed on 08/19/2024, claims 1, 4-7, 14-15, 18-19, and 28 were amended. Claims 1-28 are pending in this examination.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. This examination is in response to US Patent Application No. 18/347,251.
Information Disclosure Statement
The listing of references in the specification is not a proper information disclosure statement. In this application there are existing references in the specification (i.e., paragraphs [0002]-[0012]). 37 CFR 1.98(b) requires a list of all patents, publications, or other information submitted for consideration by the Office, and MPEP § 609.04(a) states, "the list may not be incorporated into the specification but must be submitted in a separate paper." Therefore, unless the references have been cited by the examiner on form PTO-892, they have not been considered.
Examiner Note
Claims 1-3 and 15-17 recite "separate module." The separate module has been described in the specification: [0016] Fig. 4 shows a dual core processor. In one embodiment of the current invention, the separate modules of the current invention are implemented and executed on different cores of a multi-core processor. While the figure shows 2 cores, one of ordinary skill in the art would understand the current system could be implemented on any number of cores. [0017] Fig. 5 shows another embodiment of the current invention which is implemented on a multi-processor system. Rather than separate cores, the modules of the current system are implemented on separate processors. [0023] The current invention has separate modules for a) data ingestion; b) data quality checking; c) processing data after quality check. It would be appreciated by one of ordinary skill in the art that these separate modules could be separate code modules such as object-oriented classes, or separate layers in a multi-tiered application. However, there is a performance advantage from implementing the current invention's modules either on separate cores as in Fig. 4, or on separate processors as in Fig. 5.
Claims 1 and 15 recite "said threshold being data quantity, diversity, quality, OR similar metric". This limitation with "or" gives the examiner the option to choose just one of the items above to examine; if the examiner chooses any item except data quality, then the examiner need not examine any of claims 6-13 and 20-27, since all of these claims recite data quality algorithms. Appropriate correction/modification is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-28 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim Interpretation: Under the broadest reasonable interpretation, the terms of the claim are presumed to have their plain meaning consistent with the specification as it would be interpreted by one of ordinary skill in the art. See MPEP 2111.
The claim(s) recite(s) a system comprising: a computer having a processor device, an operating system, and storage devices comprising a computer memory and a persistent storage; a separate module for data querying; said module being configurable to seek new data when a threshold is met; said threshold being data quantity, diversity, quality, or similar metric; a separate module for data evaluation; said module being configurable to evaluate data based on information theory equations, statistical analysis, or similar data quality algorithm; and a separate module configurable for machine learning execution; said module being capable of executing one or more machine learning algorithms using the data that has been acquired and tested for quality. The claim does not put any limits on how the machine learning algorithm uses the data received for quality testing.
The steps recited above are performed by "a processor coupled to a memory". The processor is recited at a high level of generality; the specification refers to the term "processor" as one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions. Because the specification is devoid of adequate structure to perform the claimed steps/functions, the processor amounts to no more than a generic computer performing generic computer functions.
Step 1: See MPEP 2106.03. The claim recites at least one limitation, including "separate module for data querying…", "separate module for data evaluation…", and "separate module configurable for machine learning execution…". Thus, the claim is to a machine (a system), which is one of the statutory categories of invention. (Step 1: YES).
Step 2A, Prong One: As explained in MPEP 2106.04, subsection II, a claim "recites" a judicial exception when the judicial exception is "set forth" or "described" in the claim. The broadest reasonable interpretation of the recited steps is that those steps fall within the mental process grouping of abstract ideas because they cover concepts performed in the human mind, including observation, evaluation, judgment, and opinion. See MPEP 2106.04(a)(2), subsection III. Under its broadest reasonable interpretation when read in light of the specification, the "querying data", "evaluating data", and "executing the data" encompass mental observations or evaluations that are practically performed in the human mind, for example, "separate module for data querying…", "separate module for data evaluation…", and "separate module configurable for machine learning execution".
Step 2A, Prong Two.
See MPEP 2106.04(d). The claim recites the additional elements of a "computer having a processor and memory" and a "machine learning algorithm". This judicial exception is not integrated into a practical application because the limitations "separate module for data querying…", "separate module for data evaluation…", and "separate module configurable for machine learning execution", in this case a processor and a machine learning algorithm, are recited at a high level of generality. The device is used as a tool to perform the generic computer functions of querying data, evaluating data, and executing a machine learning algorithm, and the claim lacks details about how these tasks are performed. Courts have held that describing use of a generic computer, a "machine learning model," or performing abstract math on a computer is not, by itself, an inventive concept (failing Alice step 2) unless the claim recites how the computer or machine learning model is specially configured to improve computing or solve a technical problem. See MPEP 2106.05(f). As discussed above in Step 2A, Prong One, the computer is used to perform an abstract idea, such that it amounts to no more than mere instructions to apply the exception using a generic computer. See MPEP 2106.05(f). Even when viewed in combination, these additional elements do not integrate the recited judicial exception into a practical application (Step 2A, Prong Two: NO), and the claim is directed to the judicial exception. (Step 2A: YES).
Step 2B:
See MPEP 2106.05. As explained with respect to Step 2A, Prong Two, the additional elements of "computer having processor, and memory, and machine learning algorithm" are at best mere instructions to "apply" the abstract idea, which cannot provide an inventive concept. See MPEP 2106.05(f).
The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because even when considered in combination, these additional elements represent mere instructions to implement an abstract idea or other exception on a computer and insignificant extra-solution activity, which do not provide an inventive concept. (Step 2B: NO).
The claim is ineligible.
Claims 2-5, 14, 16-19, and 28 all recite other forms of querying, evaluating, and executing data, and the claims do not include additional elements that are sufficient to amount to significantly more under Step 2A and Step 2B, as analyzed above.
Regarding claim 15, the limitations "computer having processor and memory, and machine learning algorithm" and the same steps, in this case a computer and a machine learning algorithm, are recited at a high level of generality, as noted above for claim 1. These limitations are used as a tool to perform the generic computer functions of receiving data and creating data. See MPEP 2106.05(f).
Claims 6-13 and 20-27 all recite mathematical formulas, which amount to nothing more than performing mathematical calculations to create information; these encompass mental evaluations and mathematical calculations. The claims do not include additional elements that are sufficient to amount to significantly more under Step 2A and Step 2B, as analyzed above.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL. — The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 6-13 and 20-27 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
In regard to claims 6-13 and 20-27, the specification fails to provide written description support for the claim limitation of "data quality algorithm formula for Shannon entropy, Hartley entropy, Renyi entropy, diversity index, Shannon Weaver index, Simpson index, Kolmogorov-Smirnov test, and Kruskal-Wallis test". There is no evidence in the disclosure of how these different algorithms are applied to the claimed invention; the specification merely mentions the formulas and what they are made of, and there is no real application of the formulas in the specification [see paragraphs 29-44].
The level of detail required to satisfy the written description requirement varies depending on the nature and scope of the claims and on the complexity and predictability of the relevant technology. Ariad, 598 F.3d at 1351, 94 USPQ2d at 1172; Capon v. Eshhar, 418 F.3d 1349, 1357-58, 76 USPQ2d 1078, 1083-84 (Fed. Cir. 2005). Computer-implemented inventions are often disclosed and claimed in terms of their functionality. For computer-implemented inventions, the determination of the sufficiency of disclosure will require an inquiry into the sufficiency of both the disclosed hardware and the disclosed software due to the interrelationship and interdependence of computer hardware and software. The critical inquiry is whether the disclosure of the application relied upon reasonably conveys to those skilled in the art that the inventor had possession of the claimed subject matter as of the filing date. Vasudevan Software, Inc. v. MicroStrategy, Inc., 782 F.3d 671, 682, 114 USPQ2d 1349, 1356 (Fed. Cir. 2015) (citing Ariad Pharm., Inc. v. Eli Lilly & Co., 598 F.3d 1336, 1351, 94 USPQ2d 1161, 1172 (Fed. Cir. 2010) in the context of determining possession of a claimed means of accessing disparate databases).
Applicant is kindly requested to show the examiner support in the original disclosure for the new or amended claims. See MPEP 714.02 and 2163.06 (“Applicant should specifically point out the support for any amendments made to the disclosure").
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 1-28 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1 and 15 recite the word "being", which does not properly indicate whether the recited function is performed, rendering the claim language indefinite.
Claims 2-14 and 16-28 do not cure the deficiency of claims 1 and 15 and are rejected under 35 USC 112, 2nd paragraph, for their dependency upon claims 1 and 15.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-5 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Permeh (US 2017/0249455 A1).
Regarding claim 1, Permeh discloses a system comprising: a computer having a processor device, an operating system, and storage devices comprising a computer memory and a persistent storage [0085] In some embodiments, method 500 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information)…]; and
a separate module for data querying; said module being configurable to seek new data when a threshold is met; said threshold being data quantity, diversity, quality, or similar metric [0087] At 502, an electronic storage device, for example electronic storage device 104, can be scanned (equated to data querying) for files having an unsafe file type (equated to new data when a threshold is met). The scanning can be performed using an isolated operating environment, such as local isolated operating environment 110, remote isolated operating environment 114, or the like. The scanning can be performed to identify data stored on the electronic storage device that do not match with a database of known safe data (equated to meeting the threshold)], and [0040]; and
a separate module for data evaluation; said module being configurable to evaluate data based on information theory equations, statistical analysis, or similar data quality algorithm [0030] The isolated operating environment can analyze, using a machine learning model, the data file 106. Through the analysis, by the machine learning model, the isolated operating environment can determine whether a file, such as file 106, is safe for processing by the primary operating environment 109.], and [0087] At 502, an electronic storage device, for example electronic storage device 104, can be scanned (equated to data querying) for files having an unsafe file type. The scanning can be performed using an isolated operating environment, such as local isolated operating environment 110 (equated to separate data evaluation module), remote isolated operating environment 114 (equated to separate data evaluation module), or the like. The scanning can be performed to identify data stored on the electronic storage device that do not match with a database of known safe data (equated to meeting the threshold)]; and
and a separate module configurable for machine learning execution; said module being capable of executing one or more machine learning algorithms using the data that has been acquired and tested for quality.[0030] The isolated operating environment can analyze, using a machine learning model, the data file 106. Through the analysis, by the machine learning model, the isolated operating environment can determine whether a file, such as file 106, is safe for processing by the primary operating environment 109], and [0039] When the isolated operating environment is integrated with the user terminal 108, the isolated operating environment can release the requested file for interaction by a processor of the user terminal 108, in response to identifying that the requested file is safe by machine learning models], and [0042] The isolated operating environment, such as local isolated operating environment 110, remote isolated operating environment 114, or the like, can be configured to analyze the isolated data to determine whether the identified data is safe for interaction by the at least one computer processor. The analysis can be performed using a machine learning mode], and [0044] In response to determining that the data is safe for interaction by one or more components of the enterprise system, the isolated operating environment can cause an updating of the database of known safe data to include the identified data. In some variations, the isolated operating environment can cause the data identified as being safe data on the electronic storage device 104 to be updated to include a flag indicating that the data is safe for interaction].
Regarding claim 2, Permeh discloses wherein the separate modules for data querying; data evaluation; and machine learning execution constitute software modules such as separate code modules, classes, or software components that are configurable for data querying, data evaluation, and machine learning [0079]The one or more processing devices may include one or more devices executing some or all of the operations of method 400 in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 400], and [0085, 0095].
Regarding claim 3, Permeh discloses wherein the separate modules for data querying; data evaluation; and machine learning execution constitute hardware modules such as separate processors, processor cores, or separate computers that are configurable for data querying, data evaluation, and machine learning [0079] In some embodiments, method 400 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information). The one or more processing devices may include one or more devices executing some or all of the operations of method 400 in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 400], and [0085, 0095].
Regarding claim 4, Permeh discloses further having a system for triggering data querying; said trigger can be configured to seek additional data based on one or more of the following: machine learning metrics such as algorithm accuracy; information theoretic methods such as information diversity; information theoretic methods such as information entropy; quantity of information; and time since last information queried [0087] At 502, an electronic storage device, for example electronic storage device 104, can be scanned (equated to data querying) for files having an unsafe file type. The scanning can be performed using an isolated operating environment, such as local isolated operating environment 110, remote isolated operating environment 114, or the like. The scanning can be performed to identify data stored on the electronic storage device that do not match with a database of known safe data (equated to quantity of information)], and [0040] In some variations, the isolated operating environment, such as local isolated operating environment 110, remote isolated operating environment 114, or the like, can be configured to scan the electronic memory storage device 104. The electronic memory storage device 104 can be associated with the enterprise server 102. The scanning of the electronic memory storage device 104 can include scanning to identify data stored on the electronic memory storage device that do not match with a database of known safe datatypes].
Regarding claim 5, Permeh discloses wherein triggering will cause the system to utilize the data querying module to seek out new data; that data is then processed and formatted, then ingested for use by the data evaluation modules [0087] At 502, an electronic storage device, for example electronic storage device 104, can be scanned ( equated to data querying) for files having an unsafe file type. The scanning can be performed using an isolated operating environment, such as local isolated operating environment 110( equated to separate data evaluation module) , remote isolated operating environment 114( equated to separate data evaluation module), or the like. The scanning can be performed to identify data stored on the electronic storage device that do not match with a database of known safe data], and [0040] In some variations, the isolated operating environment, such as local isolated operating environment 110, remote isolated operating environment 114, or the like, can be configured to scan the electronic memory storage device 104. The electronic memory storage device 104 can be associated with the enterprise server 102. The scanning of the electronic memory storage device 104 can include scanning to identify data stored on the electronic memory storage device that do not match with a database of known safe datatypes].
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 6-9 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Permeh (US 2017/0249455 A1) in view of Auerbach (US 77,763,176).
Regarding claim 6, Permeh does not explicitly disclose, however, Auerbach discloses where at least one of the data quality algorithms is the Shannon entropy:

H(X) = -Σ_i p_i log_b(p_i)   [media_image1.png]
[ Abstract, Techniques for improved searching and querying in computer-based reasoning systems are discussed and include receiving multiple new multidimensional data element to store in a computer-based reasoning data model, determining a feature bucket for each feature of each data element and storing a reference identifier in the feature bucket(s). A query on the computer-based reasoning system includes input data element (e.g., an actual data element, or a set of restrictions on features). For each feature in the input data element, feature buckets are determined, candidate results are determined based on whether cases have related feature buckets, and the results are determined based at least in part on the candidate results], and [ Col. 61, lines 11-31, Determining a certainty score for a particular case can be accomplished by removing the particular case from the case-based or computer-based reasoning model and determining the conviction score of the particular case based on an entropy measure associated with adding that particular case back into the model. Any appropriate entropy measure, variance, confidence, and/or related method can be used for making this determination, such as the ones described herein. In some embodiments, certainty or conviction is determined by the expected information gain of adding the case to the model divided by the actual information gain of adding the case. For example, in some embodiments, certainty or conviction may be determined based on Shannon Entropy, Renyi entropy, Hartley entropy, min entropy, Collision entropy, Renyi divergence, diversity index, Simpson index, Gini coefficient, Kullback-Leibler divergence, Fisher information, Jensen-Shannon divergence, Symmetrised divergence. In some embodiments, certainty scores are conviction scores and are determined by calculating the entropy, comparing the ratio of entropies, and/or the like].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh by incorporating “Shannon Entropy”, as taught by Auerbach. One could have been motivated to do so in order to implement reasoning systems and more specifically to searching and querying in computer-based reasoning systems, including the certainty (e.g., as a certainty function) that a particular set of data fits a model, the confidence that a particular set of data conforms to the model, or the importance of a feature or case with regard to the model [ Auerbach, field of invention, Col. 61, lines 11-31].
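For reference only (an illustrative sketch, not part of the record of this action), the Shannon entropy measure discussed above can be computed directly from a discrete probability distribution; the function name below is hypothetical:

```python
import math

def shannon_entropy(probs, base=2):
    """Shannon entropy H(X) = -sum(p * log_b(p)), summed over nonzero probabilities."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A uniform distribution over 4 outcomes carries log2(4) = 2 bits of entropy;
# a certain outcome (p = 1) carries none.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))
print(shannon_entropy([1.0]))
```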
Regarding claim 7, Permeh does not explicitly disclose, however, Auerbach discloses where at least one of the data quality algorithms is the Hartley entropy: H_0(A) := log_b |A|.
[ Abstract, Techniques for improved searching and querying in computer-based reasoning systems are discussed and include receiving multiple new multidimensional data element to store in a computer-based reasoning data model; determining a feature bucket for each feature of each data element and storing a reference identifier in the feature bucket(s). A query on the computer-based reasoning system includes input data element (e.g., an actual data element, or a set of restrictions on features). For each feature in the input data element, feature buckets are determined, candidate results are determined based on whether cases have related feature buckets, and the results are determined based at least in part on the candidate results], and [ Col. 61, lines 11-31, Determining a certainty score for a particular case can be accomplished by removing the particular case from the case-based or computer-based reasoning model and determining the conviction score of the particular case based on an entropy measure associated with adding that particular case back into the model. Any appropriate entropy measure, variance, confidence, and/or related method can be used for making this determination, such as the ones described herein. In some embodiments, certainty or conviction is determined by the expected information gain of adding the case to the model divided by the actual information gain of adding the case. For example, in some embodiments, certainty or conviction may be determined based on Shannon Entropy, Renyi entropy, Hartley entropy, min entropy, Collision entropy, Renyi divergence, diversity index, Simpson index, Gini coefficient, Kullback-Leibler divergence, Fisher information, Jensen-Shannon divergence, Symmetrised divergence. In some embodiments, certainty scores are conviction scores and are determined by calculating the entropy, comparing the ratio of entropies, and/or the like].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh by incorporating “Hartley Entropy”, as taught by Auerbach. One could have been motivated to do so in order to implement reasoning systems and more specifically to searching and querying in computer-based reasoning systems, including the certainty (e.g., as a certainty function) that a particular set of data fits a model, the confidence that a particular set of data conforms to the model, or the importance of a feature or case with regard to the model [ Auerbach, field of invention, Col. 61, lines 11-31].
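For reference only (an illustrative sketch, not part of the record of this action), the Hartley entropy discussed above depends only on the size of the alphabet; the function name below is hypothetical:

```python
import math

def hartley_entropy(alphabet_size, base=2):
    """Hartley entropy H_0(A) = log_b(|A|): the entropy of |A| equally likely outcomes."""
    return math.log(alphabet_size, base)

# An alphabet of 8 symbols has Hartley entropy log2(8) = 3 bits.
print(hartley_entropy(8))
```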
Regarding claim 8, Permeh does not explicitly disclose, however, Auerbach discloses where at least one of the data quality algorithms is the Renyi entropy:

H_α(X) = (1/(1-α)) log_b(Σ_i p_i^α)   [media_image2.png]
[ Abstract, Techniques for improved searching and querying in computer-based reasoning systems are discussed and include receiving multiple new multidimensional data element to store in a computer-based reasoning data model; determining a feature bucket for each feature of each data element and storing a reference identifier in the feature bucket(s). A query on the computer-based reasoning system includes input data element (e.g., an actual data element, or a set of restrictions on features). For each feature in the input data element, feature buckets are determined, candidate results are determined based on whether cases have related feature buckets, and the results are determined based at least in part on the candidate results], and [ Col. 61, lines 11-31, Determining a certainty score for a particular case can be accomplished by removing the particular case from the case-based or computer-based reasoning model and determining the conviction score of the particular case based on an entropy measure associated with adding that particular case back into the model. Any appropriate entropy measure, variance, confidence, and/or related method can be used for making this determination, such as the ones described herein. In some embodiments, certainty or conviction is determined by the expected information gain of adding the case to the model divided by the actual information gain of adding the case. For example, in some embodiments, certainty or conviction may be determined based on Shannon Entropy, Renyi entropy, Hartley entropy, min entropy, Collision entropy, Renyi divergence, diversity index, Simpson index, Gini coefficient, Kullback-Leibler divergence, Fisher information, Jensen-Shannon divergence, Symmetrised divergence. In some embodiments, certainty scores are conviction scores and are determined by calculating the entropy, comparing the ratio of entropies, and/or the like].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh by incorporating the Renyi entropy, as taught by Auerbach. One could have been motivated to do so in order to improve searching and querying in computer-based reasoning systems, including determining the certainty (e.g., as a certainty function) that a particular set of data fits a model, the confidence that a particular set of data conforms to the model, or the importance of a feature or case with regard to the model [Auerbach, field of invention; Col. 61, lines 11-31].
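As an illustrative aside on the measure relied upon above, the Renyi entropy of a discrete distribution can be sketched in Python as follows. The function name, signature, and default log base are illustrative assumptions and are not drawn from Permeh or Auerbach:

```python
import math

def renyi_entropy(probs, alpha, base=2.0):
    """Renyi entropy H_a(X) = (1/(1-a)) * log_base(sum(p_i**a)).

    alpha = 0 gives the Hartley entropy, the limit alpha -> 1 the
    Shannon entropy, and alpha = 2 the collision entropy, consistent
    with the list of measures quoted from Auerbach at Col. 61.
    """
    support = [p for p in probs if p > 0]
    if alpha == 1:  # Shannon limit, handled separately
        return -sum(p * math.log(p, base) for p in support)
    return math.log(sum(p ** alpha for p in support), base) / (1.0 - alpha)
```

For a uniform distribution over N outcomes, every choice of alpha yields log_base(N), which is a quick sanity check on any implementation.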
Regarding claim 9, Permeh does not explicitly disclose, however, Auerbach discloses, wherein at least one of the data quality algorithms is the diversity index:
[Equation image media_image3.png: diversity index formula]
[ Abstract, Techniques for improved searching and querying in computer-based reasoning systems are discussed and include receiving multiple new multidimensional data element to store in a computer-based reasoning data model; determining a feature bucket for each feature of each data element and storing a reference identifier in the feature bucket(s). A query on the computer-based reasoning system includes input data element (e.g., an actual data element, or a set of restrictions on features). For each feature in the input data element, feature buckets are determined, candidate results are determined based on whether cases have related feature buckets, and the results are determined based at least in part on the candidate results], and [ Col. 61, lines 11-31, Determining a certainty score for a particular case can be accomplished by removing the particular case from the case-based or computer-based reasoning model and determining the conviction score of the particular case based on an entropy measure associated with adding that particular case back into the model. Any appropriate entropy measure, variance, confidence, and/or related method can be used for making this determination, such as the ones described herein. In some embodiments, certainty or conviction is determined by the expected information gain of adding the case to the model divided by the actual information gain of adding the case. For example, in some embodiments, certainty or conviction may be determined based on Shannon Entropy, Renyi entropy, Hartley entropy, min entropy, Collision entropy, Renyi divergence, diversity index, Simpson index, Gini coefficient, Kullback-Leibler divergence, Fisher information, Jensen-Shannon divergence, Symmetrised divergence. In some embodiments, certainty scores are conviction scores and are determined by calculating the entropy, comparing the ratio of entropies, and/or the like].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh by incorporating the diversity index, as taught by Auerbach. One could have been motivated to do so in order to improve searching and querying in computer-based reasoning systems, including determining the certainty (e.g., as a certainty function) that a particular set of data fits a model, the confidence that a particular set of data conforms to the model, or the importance of a feature or case with regard to the model [Auerbach, field of invention; Col. 61, lines 11-31].
Regarding claim 11, Permeh does not explicitly disclose, however, Auerbach discloses, wherein at least one of the data quality algorithms is the Simpson index:
[Equation image media_image4.png: Simpson index formula]
[ Abstract, Techniques for improved searching and querying in computer-based reasoning systems are discussed and include receiving multiple new multidimensional data element to store in a computer-based reasoning data model; determining a feature bucket for each feature of each data element and storing a reference identifier in the feature bucket(s). A query on the computer-based reasoning system includes input data element (e.g., an actual data element, or a set of restrictions on features). For each feature in the input data element, feature buckets are determined, candidate results are determined based on whether cases have related feature buckets, and the results are determined based at least in part on the candidate results], and [ Col. 61, lines 11-31, Determining a certainty score for a particular case can be accomplished by removing the particular case from the case-based or computer-based reasoning model and determining the conviction score of the particular case based on an entropy measure associated with adding that particular case back into the model. Any appropriate entropy measure, variance, confidence, and/or related method can be used for making this determination, such as the ones described herein. In some embodiments, certainty or conviction is determined by the expected information gain of adding the case to the model divided by the actual information gain of adding the case. For example, in some embodiments, certainty or conviction may be determined based on Shannon Entropy, Renyi entropy, Hartley entropy, min entropy, Collision entropy, Renyi divergence, diversity index, Simpson index, Gini coefficient, Kullback-Leibler divergence, Fisher information, Jensen-Shannon divergence, Symmetrised divergence. In some embodiments, certainty scores are conviction scores and are determined by calculating the entropy, comparing the ratio of entropies, and/or the like].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh by incorporating the Simpson index, as taught by Auerbach. One could have been motivated to do so in order to improve searching and querying in computer-based reasoning systems, including determining the certainty (e.g., as a certainty function) that a particular set of data fits a model, the confidence that a particular set of data conforms to the model, or the importance of a feature or case with regard to the model [Auerbach, field of invention; Col. 61, lines 11-31].
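As an illustrative aside, the Simpson index named in the claim is a short computation on category proportions. The following Python sketch is illustrative only; the function name is not drawn from any cited reference:

```python
def simpson_index(counts):
    """Simpson index D = sum(p_i**2) over category proportions.

    Smaller D means more diverse data; 1 - D is the Gini-Simpson
    form, which also appears in the list quoted from Auerbach.
    """
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)
```

A single-category data set gives D = 1 (no diversity), while four equally sized categories give D = 0.25.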
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication No. US 2017/0249455 A1 to Permeh, in view of US Patent Application Publication No. US 2017/0212929 to Sundaresan.
Regarding claim 10, Permeh does not explicitly disclose, however, Sundaresan discloses, wherein at least one of the data quality algorithms is the Shannon-Weaver index:
[Equation image media_image5.png: Shannon-Weaver index formula]
[0029] Once the associated queries are identified, the search engine module then measures a diversity of the query terms included in these queries using, for example as discussed earlier, Simpson's diversity index or Shannon's index. In this example, a diversity between the query terms 404 is measured to be sufficiently diverse because of the variety of different query terms 404, such as “ancient,” “challenge,” and “Chinese,” As explained in more detail below, the identification of whether a measurement is sufficiently diverse is based on a comparison of the computed index, for example, with a threshold value. However, if all the query terms are only “coins,” then the query terms 404 may not be sufficiently diverse for use to expand the search.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh by incorporating the Shannon-Weaver index, as taught by Sundaresan. One could have been motivated to do so in order to perform a diversity-based search in which the system calculates a diversity index for a result set comprising results for a query, where the diversity index is a measure of diversity among the results, relates to differences among the results, and is compared to a threshold value [Sundaresan, Abstract, ¶2].
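Sundaresan's paragraph [0029], quoted above, describes computing an index over query terms and comparing it with a threshold. That two-step procedure can be sketched in Python as follows; the function names and the threshold value are illustrative assumptions:

```python
import math
from collections import Counter

def shannon_weaver_index(terms):
    """Shannon-Weaver index H' = -sum(p_i * ln(p_i)) over the
    relative frequencies of the query terms."""
    counts = Counter(terms)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def sufficiently_diverse(terms, threshold=1.0):
    """Compare the computed index with a threshold value, mirroring
    the step described in Sundaresan [0029]."""
    return shannon_weaver_index(terms) > threshold
```

As in Sundaresan's example, varied terms such as "ancient", "challenge", and "Chinese" score well above zero, while a list consisting only of "coins" yields H' = 0 and is not sufficiently diverse.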
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication No. US 2017/0249455 A1 to Permeh, in view of US Patent Application Publication No. US 2016/0070795 to Salamon.
Regarding claim 12, Permeh does not explicitly disclose, however, Salamon discloses, wherein at least one of the data quality algorithms is the Kolmogorov-Smirnov test:
[Equation image media_image6.png: Kolmogorov-Smirnov test statistic]
[0007] Text search of documents and published or otherwise summarized results from specific data analyses has been performed, and the data collected is query able at a low level in databases. This is done using some form of Structured Query Language. [0053] In one embodiment, the present invention generates statistically significant findings that combine descriptive text from different sources… Examples of statistical methods to test (a) are CERNO (see the example below), the Mann-Whitney U test, the Wilcoxon Sum Rank test, the hypergeometric test, the Fishert exact test, Gene Set Enrichment Analysis (GSEA) tests such as based on the Kolmogorov-Smirnov statistic, and others. Examples of statistical tests for (b) are the globaltest, methods based on Hotelling's T-squared distribution, and others. Sources of text associated with the findings include descriptive text from the experiment that generated the transcript data and descriptive text for the subsets of genes, such as text that describes subset types 1-19 above.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh by incorporating the Kolmogorov-Smirnov statistic, as taught by Salamon. One could have been motivated to do so in order to analyze data, and more particularly to search large data collections to generate large numbers of statistically significant findings from large data repositories using automatic computation on subsets of identified electronic records, and to provide for ranking and querying statistical analysis results of database contents for the purpose of populating search engine query results with novel content [Salamon, Abstract, [0003]].
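As an illustrative aside, the two-sample form of the Kolmogorov-Smirnov statistic referenced above is the largest absolute gap between two empirical cumulative distribution functions. A minimal Python sketch, with an illustrative function name not drawn from Salamon:

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical CDFs, evaluated at every
    observed value."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = sum(1 for v in a if v <= x) / len(a)
        cdf_b = sum(1 for v in b if v <= x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Identical samples yield a statistic of 0, and fully separated samples yield 1; a library routine such as SciPy's would normally also supply the p-value.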
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication No. US 2017/0249455 A1 to Permeh, in view of US Patent No. 11,082,487 to Jain.
Regarding claim 13, Permeh does not explicitly disclose, however, Jain discloses, wherein at least one of the data quality algorithms is the Kruskal-Wallis test:
[Equation image media_image7.png: Kruskal-Wallis test statistic]
[Col. 57 lines 43-49, In some implementations, the data collected through the multi-tenant data access platform comprises monitoring data generated using sensors of mobile devices and/or wearable devices of users. In some implementations, the data access request comprises a search query for the data for different units of the multiple tenant organizations], and [Col. 35 lines 41-51, Many types of data access requests are possible. Examples include requests to view or download data. Other examples include requests to search the data (e.g., a query), to generate a report based on the data, to generate a visualization based on the data, to perform a machine learning task based on the data (e.g., use the data as training data for training a model, using the data to test or validate a model, applying a model to generate a prediction based on the data, etc.), to evaluate the data to identify candidate participants for a research study cohort, to perform statistical analysis on the data, and so on [ Col. 36, 3-14, In some implementations, the data access request includes a search query for health data for health research studies corresponding to different units of the tenant organizations. For example, as shown in FIG. 2, the data access request 202 includes a search query for chemotherapy treatments received by participants of cancer research studies. In this example, the server system 110 determines that study data 214A of Study A and study data 214B of Study B both include participant data that is relevant to the search query. Study A and Study B can also be conducted by different departments of the same hospital (e.g., Oncology department and Cardiology department of Hospital A)], and [(160) Other examples of data processing requests include requests to determine statistical measures for a data set (e.g., mean, median, mode, maximum, minimum, variance, distribution characteristics, etc.) for various properties… Kruskal Wallis test, paired t-test].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh by incorporating the Kruskal-Wallis test, as taught by Jain. One could have been motivated to do so in order to support data access requests, including search queries, with data processing requests that determine statistical measures for a data set (e.g., mean, median, mode, maximum, minimum, variance, distribution characteristics, etc.) for various properties [Jain, Col. 36, lines 3-14; claim 3].
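As an illustrative aside, the Kruskal-Wallis H statistic named in the claim ranks the pooled observations and compares per-group rank sums. The sketch below (illustrative name, no tie-variance correction) follows the textbook formula rather than anything in Jain:

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic:
    H = 12/(N(N+1)) * sum_j(R_j**2 / n_j) - 3(N+1),
    where R_j is the rank sum of group j over the pooled ranking
    (tied values receive their average rank)."""
    pooled = sorted(v for g in groups for v in g)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2.0  # average of ranks i+1..j
        i = j
    n = len(pooled)
    rank_sq = sum(sum(ranks[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * rank_sq - 3.0 * (n + 1)
```

Two groups with identical value distributions give H near 0, while fully separated groups give a large H, which would then be referred to a chi-squared distribution for the significance test.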
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication No. US 2017/0249455 A1 to Permeh, in view of US Patent Application Publication No. US 2019/0347359 to Guy.
Regarding claim 14, Permeh does not explicitly disclose, however, Guy discloses, wherein the data query module utilizes web crawling and web scraping [Abstract, Various methods and systems for processing web crawling queries using a web crawling prioritization model based on classification operation performance. A classification operation for organizing products in a product listing platform is accessed. A web crawling engine is accessed for the classification operation. The web crawling engine operates based on a web crawling query prioritization model that supports determining web crawling priority scores that indicate a predicted performance improvement for classification operations executed with known data and web crawled data to be retrieved from executing a web crawling query operation. Using the web crawling prioritization model, a web crawling priority score is determined for a web crawling query for the corresponding classification operation. The classification operation is associated with a product in a product listing platform and known data for the product. Based on the web crawling priority score, the web crawling query is executed to identify web crawled data].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh by incorporating the web crawling prioritization model, as taught by Guy. One could have been motivated to do so in order to process web crawling queries using a web crawling prioritization model based on classification operation performance, in which a web crawling engine operates based on a web crawling query prioritization model that determines web crawling priority scores indicating a predicted performance improvement for classification operations executed with known data and web crawled data to be retrieved from executing a web crawling query operation [Guy, Abstract].
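Guy describes crawling at the level of priority scores rather than code. For concreteness, the scraping step of a crawl, extracting link targets from a fetched page so they can be queued (and, per Guy, prioritized), can be sketched with Python's standard-library HTML parser. The class name and the sample page are illustrative assumptions:

```python
from html.parser import HTMLParser

class LinkScraper(HTMLParser):
    """Collects href targets from fetched HTML so they can be
    queued for further crawling."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A fabricated sample page stands in for a fetched document.
page = ('<html><body><a href="/coins/ancient">Ancient coins</a>'
        '<a href="/coins/chinese">Chinese coins</a></body></html>')
scraper = LinkScraper()
scraper.feed(page)
# scraper.links now holds the candidate URLs to crawl next.
```

In a full crawler, each collected URL would be scored by the prioritization model before being fetched, rather than crawled in discovery order.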
Claims 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication No. US 2017/0249455 A1 to Permeh, in view of US Patent Application Publication No. US 2018/0158157 to Dintenfass.
Regarding claim 15, Permeh discloses a computer having a processor device, an operating system, and storage devices comprising a computer memory and a persistent storage [0085] In some embodiments, method 500 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information)…]; and
a separate module for data querying, said module being configurable to seek new data when a threshold is met [0087] At 502, an electronic storage device, for example electronic storage device 104, can be scanned (equated to data querying) for files having an unsafe file type (equated to new data when a threshold is met). The scanning can be performed using an isolated operating environment, such as local isolated operating environment 110, remote isolated operating environment 114, or the like. The scanning can be performed to identify data stored on the electronic storage device that do not match with a database of known safe data (equated to meeting the threshold)], and [0040]; and
a separate module for data evaluation, said module being configurable to evaluate data based on information theory equations, statistical analysis, or similar data quality algorithm; said threshold being data quantity, diversity, quality, or similar metric [0030] The isolated operating environment can analyze, using a machine learning model, the data file 106. Through the analysis, by the machine learning model, the isolated operating environment can determine whether a file, such as file 106, is safe for processing by the primary operating environment 109], and [0087] At 502, an electronic storage device, for example electronic storage device 104, can be scanned (equated to data querying) for files having an unsafe file type. The scanning can be performed using an isolated operating environment, such as local isolated operating environment 110 (equated to a separate data evaluation module), remote isolated operating environment 114 (equated to a separate data evaluation module), or the like. The scanning can be performed to identify data stored on the electronic storage device that do not match with a database of known safe data (equated to meeting the threshold)]; and
a separate module configurable for machine learning execution; said module being capable of executing one or more machine learning algorithms using the data that has been acquired and tested for quality [0030] The isolated operating environment can analyze, using a machine learning model, the data file 106. Through the analysis, by the machine learning model, the isolated operating environment can determine whether a file, such as file 106, is safe for processing by the primary operating environment 109], and [0039] When the isolated operating environment is integrated with the user terminal 108, the isolated operating environment can release the requested file for interaction by a processor of the user terminal 108, in response to identifying that the requested file is safe by machine learning models], and [0042] The isolated operating environment, such as local isolated operating environment 110, remote isolated operating environment 114, or the like, can be configured to analyze the isolated data to determine whether the identified data is safe for interaction by the at least one computer processor. The analysis can be performed using a machine learning model], and [0044] In response to determining that the data is safe for interaction by one or more components of the enterprise system, the isolated operating environment can cause an updating of the database of known safe data to include the identified data. In some variations, the isolated operating environment can cause the data identified as being safe data on the electronic storage device 104 to be updated to include a flag indicating that the data is safe for interaction].
Permeh does not explicitly disclose, however, Dintenfass discloses, said data querying module also being configurable for object character recognition (OCR) [0078] The OCR recognition engine 226 is configured to identify objects, object features, text, and/or logos using images 207 or video streams created from a series of images 207. In one embodiment, the OCR recognition engine 226 is configured to identify objects and/or text within an image 207 captured by the camera 206. In another embodiment, the OCR recognition engine 226 is configured to identify objects and/or text in about real-time on a video stream captured by the camera 206 when the camera 206 is configured to continuously capture images 207. The OCR recognition engine 226 employs any suitable technique for implementing object and/or text recognition as would be appreciated by one of ordinary skill in the art upon viewing this disclosure].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh by incorporating the OCR recognition engine, as taught by Dintenfass. One could have been motivated to do so in order to employ any suitable technique for implementing object and/or text recognition, including recognition of objects, text, and/or logos using images or video streams created from a series of images, as would be appreciated by one of ordinary skill in the art upon viewing this disclosure [Dintenfass, [0078]].
Regarding claim 16, the combination of Permeh and Dintenfass disclose wherein the separate modules for data querying; object character recognition (OCR); data evaluation; and machine learning execution constitute software modules such as separate code modules, classes, or software components that are configurable for data querying, data evaluation, and machine learning.
Permeh discloses: [0079] The one or more processing devices may include one or more devices executing some or all of the operations of method 400 in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 400], and [0085, 0095].
Dintenfass discloses: [0078] The OCR recognition engine 226 is configured to identify objects, object features, text, and/or logos using images 207 or video streams created from a series of images 207. In one embodiment, the OCR recognition engine 226 is configured to identify objects and/or text within an image 207 captured by the camera 206. In another embodiment, the OCR recognition engine 226 is configured to identify objects and/or text in about real-time on a video stream captured by the camera 206 when the camera 206 is configured to continuously capture images 207. The OCR recognition engine 226 employs any suitable technique for implementing object and/or text recognition as would be appreciated by one of ordinary skill in the art upon viewing this disclosure].
Regarding claim 17, the combination of Permeh and Dintenfass disclose wherein the separate modules for data querying; object character recognition (OCR); data evaluation; and machine learning execution constitute hardware modules such as separate processors, processor cores, or separate computers that are configurable for data querying, data evaluation, and machine learning.
Permeh discloses: [0079] In some embodiments, method 400 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information). The one or more processing devices may include one or more devices executing some or all of the operations of method 400 in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 400], and [0085, 0095].
Dintenfass discloses: [0078] The OCR recognition engine 226 is configured to identify objects, object features, text, and/or logos using images 207 or video streams created from a series of images 207. In one embodiment, the OCR recognition engine 226 is configured to identify objects and/or text within an image 207 captured by the camera 206. In another embodiment, the OCR recognition engine 226 is configured to identify objects and/or text in about real-time on a video stream captured by the camera 206 when the camera 206 is configured to continuously capture images 207. The OCR recognition engine 226 employs any suitable technique for implementing object and/or text recognition as would be appreciated by one of ordinary skill in the art upon viewing this disclosure].
Regarding claim 18, Permeh discloses further having a system for triggering data querying; said trigger can be configured to seek additional data based on one or more of the following: machine learning metrics such as algorithm accuracy; information theoretic methods such as information diversity; information theoretic methods such as information entropy; quantity of information; time since last information queried [0087] At 502, an electronic storage device, for example electronic storage device 104, can be scanned (equated to data querying) for files having an unsafe file type. The scanning can be performed using an isolated operating environment, such as local isolated operating environment 110, remote isolated operating environment 114, or the like. The scanning can be performed to identify data stored on the electronic storage device that do not match with a database of known safe data (equated to quantity of information)], and [0040] In some variations, the isolated operating environment, such as local isolated operating environment 110, remote isolated operating environment 114, or the like, can be configured to scan the electronic memory storage device 104. The electronic memory storage device 104 can be associated with the enterprise server 102. The scanning of the electronic memory storage device 104 can include scanning to identify data stored on the electronic memory storage device that do not match with a database of known safe datatypes].
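For concreteness, the trigger conditions recited in claim 18 (quantity of information, information diversity/entropy, and time since the last query) can be sketched as a single check. All threshold values, the record format, and the function name below are illustrative assumptions, not drawn from the claims or from Permeh:

```python
import math
import time

def should_query(records, last_query_time,
                 min_count=100, min_entropy=1.0, max_age_s=3600):
    """Trigger check in the spirit of claim 18: seek additional data
    when quantity, diversity (Shannon entropy of a categorical
    field), or staleness crosses a configured threshold."""
    if len(records) < min_count:
        return True  # too little data on hand
    counts = {}
    for r in records:
        counts[r] = counts.get(r, 0) + 1
    n = len(records)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    if entropy < min_entropy:
        return True  # data not diverse enough
    return (time.time() - last_query_time) > max_age_s  # data too stale
```

When the check returns True, the system would invoke the data querying module of claim 19 to seek out new data.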
Regarding claim 19, Permeh discloses wherein triggering will cause the system to utilize the data querying module to seek out new data; that data is then processed and formatted, then ingested for use by the data evaluation modules [0087] At 502, an electronic storage device, for example electronic storage device 104, can be scanned (equated to data querying) for files having an unsafe file type. The scanning can be performed using an isolated operating environment, such as local isolated operating environment 110 (equated to a separate data evaluation module), remote isolated operating environment 114 (equated to a separate data evaluation module), or the like. The scanning can be performed to identify data stored on the electronic storage device that do not match with a database of known safe data], and [0040] In some variations, the isolated operating environment, such as local isolated operating environment 110, remote isolated operating environment 114, or the like, can be configured to scan the electronic memory storage device 104. The electronic memory storage device 104 can be associated with the enterprise server 102. The scanning of the electronic memory storage device 104 can include scanning to identify data stored on the electronic memory storage device that do not match with a database of known safe datatypes].
Claims 20-23 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication No. US 2017/0249455 A1 to Permeh, in view of US Patent Application Publication No. US 2018/0158157 to Dintenfass, and further in view of US Patent No. (US77,763,176) to Auerbach.
Regarding claim 20, Permeh and Dintenfass do not explicitly disclose, however, Auerbach discloses, wherein at least one of the data quality algorithms is the Shannon entropy:
[Equation image media_image1.png: Shannon entropy formula]
[ Abstract, Techniques for improved searching and querying in computer-based reasoning systems are discussed and include receiving multiple new multidimensional data element to store in a computer-based reasoning data model; determining a feature bucket for each feature of each data element and storing a reference identifier in the feature bucket(s). A query on the computer-based reasoning system includes input data element (e.g., an actual data element, or a set of restrictions on features). For each feature in the input data element, feature buckets are determined, candidate results are determined based on whether cases have related feature buckets, and the results are determined based at least in part on the candidate results], and [ Col. 61, lines 11-31, Determining a certainty score for a particular case can be accomplished by removing the particular case from the case-based or computer-based reasoning model and determining the conviction score of the particular case based on an entropy measure associated with adding that particular case back into the model. Any appropriate entropy measure, variance, confidence, and/or related method can be used for making this determination, such as the ones described herein. In some embodiments, certainty or conviction is determined by the expected information gain of adding the case to the model divided by the actual information gain of adding the case. For example, in some embodiments, certainty or conviction may be determined based on Shannon Entropy, Renyi entropy, Hartley entropy, min entropy, Collision entropy, Renyi divergence, diversity index, Simpson index, Gini coefficient, Kullback-Leibler divergence, Fisher information, Jensen-Shannon divergence, Symmetrised divergence. In some embodiments, certainty scores are conviction scores and are determined by calculating the entropy, comparing the ratio of entropies, and/or the like].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh and Dintenfass by incorporating the Shannon entropy, as taught by Auerbach. One could have been motivated to do so in order to improve searching and querying in computer-based reasoning systems, including determining the certainty (e.g., as a certainty function) that a particular set of data fits a model, the confidence that a particular set of data conforms to the model, or the importance of a feature or case with regard to the model [Auerbach, field of invention; Col. 61, lines 11-31].
Regarding claim 21, Permeh and Dintenfass do not explicitly disclose, however, Auerbach discloses, wherein at least one of the data quality algorithms is the Hartley entropy: H0(A) := log_b |A|.
[ Abstract, Techniques for improved searching and querying in computer-based reasoning systems are discussed and include receiving multiple new multidimensional data element to store in a computer-based reasoning data model; determining a feature bucket for each feature of each data element and storing a reference identifier in the feature bucket(s). A query on the computer-based reasoning system includes input data element (e.g., an actual data element, or a set of restrictions on features). For each feature in the input data element, feature buckets are determined, candidate results are determined based on whether cases have related feature buckets, and the results are determined based at least in part on the candidate results], and [ Col. 61, lines 11-31, Determining a certainty score for a particular case can be accomplished by removing the particular case from the case-based or computer-based reasoning model and determining the conviction score of the particular case based on an entropy measure associated with adding that particular case back into the model. Any appropriate entropy measure, variance, confidence, and/or related method can be used for making this determination, such as the ones described herein. In some embodiments, certainty or conviction is determined by the expected information gain of adding the case to the model divided by the actual information gain of adding the case. For example, in some embodiments, certainty or conviction may be determined based on Shannon Entropy, Renyi entropy, Hartley entropy, min entropy, Collision entropy, Renyi divergence, diversity index, Simpson index, Gini coefficient, Kullback-Leibler divergence, Fisher information, Jensen-Shannon divergence, Symmetrised divergence. In some embodiments, certainty scores are conviction scores and are determined by calculating the entropy, comparing the ratio of entropies, and/or the like].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh, and Dintenfass by incorporating “Hartley Entropy”, as taught by Auerbach. One could have been motivated to do so in order to implement computer-based reasoning systems, and more specifically searching and querying in such systems, including determining the certainty (e.g., as a certainty function) that a particular set of data fits a model, the confidence that a particular set of data conforms to the model, or the importance of a feature or case with regard to the model [Auerbach, field of invention, Col. 61, lines 11-31].
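For illustration only, a minimal Python sketch of the Hartley entropy named above (the function name and the example alphabet are hypothetical, not drawn from the cited references):

```python
import math

def hartley_entropy(alphabet, base=2):
    """Hartley entropy H0(A) = log_b |A|: the entropy of a set A of
    |A| symbols when every symbol is treated as equally likely."""
    return math.log(len(alphabet), base)

# A four-symbol alphabet carries log2(4) = 2 bits of Hartley entropy.
print(hartley_entropy({"a", "c", "g", "t"}))  # 2.0
```

As a data quality check, a column whose observed entropy falls far below the Hartley entropy of its declared domain may signal missing or collapsed categories.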
Regarding claim 22, Permeh, and Dintenfass do not explicitly disclose, however, Auerbach discloses where at least one of the data quality algorithms is the Renyi entropy:
[Equation image media_image2.png (greyscale): the Renyi entropy formula]
[ Abstract, Techniques for improved searching and querying in computer-based reasoning systems are discussed and include receiving multiple new multidimensional data element to store in a computer-based reasoning data model; determining a feature bucket for each feature of each data element and storing a reference identifier in the feature bucket(s). A query on the computer-based reasoning system includes input data element (e.g., an actual data element, or a set of restrictions on features). For each feature in the input data element, feature buckets are determined, candidate results are determined based on whether cases have related feature buckets, and the results are determined based at least in part on the candidate results], and [ Col. 61, lines 11-31, Determining a certainty score for a particular case can be accomplished by removing the particular case from the case-based or computer-based reasoning model and determining the conviction score of the particular case based on an entropy measure associated with adding that particular case back into the model. Any appropriate entropy measure, variance, confidence, and/or related method can be used for making this determination, such as the ones described herein. In some embodiments, certainty or conviction is determined by the expected information gain of adding the case to the model divided by the actual information gain of adding the case. For example, in some embodiments, certainty or conviction may be determined based on Shannon Entropy, Renyi entropy, Hartley entropy, min entropy, Collision entropy, Renyi divergence, diversity index, Simpson index, Gini coefficient, Kullback-Leibler divergence, Fisher information, Jensen-Shannon divergence, Symmetrised divergence. In some embodiments, certainty scores are conviction scores and are determined by calculating the entropy, comparing the ratio of entropies, and/or the like].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh, and Dintenfass by incorporating “Renyi entropy”, as taught by Auerbach. One could have been motivated to do so in order to implement computer-based reasoning systems, and more specifically searching and querying in such systems, including determining the certainty (e.g., as a certainty function) that a particular set of data fits a model, the confidence that a particular set of data conforms to the model, or the importance of a feature or case with regard to the model [Auerbach, field of invention, Col. 61, lines 11-31].
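For illustration only, a minimal Python sketch of the Renyi entropy family (the function name and example distribution are hypothetical, not drawn from the cited references):

```python
import math

def renyi_entropy(probs, alpha, base=2):
    """Renyi entropy H_a(X) = (1/(1-a)) * log_b(sum_i p_i^a), a >= 0, a != 1.
    The limit a -> 1 recovers Shannon entropy; a = 0 gives Hartley entropy."""
    if alpha == 1:
        # Shannon limit, handled separately to avoid division by zero.
        return -sum(p * math.log(p, base) for p in probs if p > 0)
    return math.log(sum(p ** alpha for p in probs), base) / (1 - alpha)

uniform = [0.25, 0.25, 0.25, 0.25]
print(renyi_entropy(uniform, 0))  # 2.0 (Hartley entropy of 4 outcomes)
print(renyi_entropy(uniform, 2))  # 2.0 (collision entropy; equal when uniform)
```

For a uniform distribution all orders of the Renyi entropy coincide; for skewed distributions higher orders weight the most probable outcomes more heavily.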
Regarding claim 23, Permeh, and Dintenfass do not explicitly disclose, however, Auerbach discloses where at least one of the data quality algorithms is the diversity index:
[Equation image media_image3.png (greyscale): the diversity index formula]
[ Abstract, Techniques for improved searching and querying in computer-based reasoning systems are discussed and include receiving multiple new multidimensional data element to store in a computer-based reasoning data model; determining a feature bucket for each feature of each data element and storing a reference identifier in the feature bucket(s). A query on the computer-based reasoning system includes input data element (e.g., an actual data element, or a set of restrictions on features). For each feature in the input data element, feature buckets are determined, candidate results are determined based on whether cases have related feature buckets, and the results are determined based at least in part on the candidate results], and [ Col. 61, lines 11-31, Determining a certainty score for a particular case can be accomplished by removing the particular case from the case-based or computer-based reasoning model and determining the conviction score of the particular case based on an entropy measure associated with adding that particular case back into the model. Any appropriate entropy measure, variance, confidence, and/or related method can be used for making this determination, such as the ones described herein. In some embodiments, certainty or conviction is determined by the expected information gain of adding the case to the model divided by the actual information gain of adding the case. For example, in some embodiments, certainty or conviction may be determined based on Shannon Entropy, Renyi entropy, Hartley entropy, min entropy, Collision entropy, Renyi divergence, diversity index, Simpson index, Gini coefficient, Kullback-Leibler divergence, Fisher information, Jensen-Shannon divergence, Symmetrised divergence. In some embodiments, certainty scores are conviction scores and are determined by calculating the entropy, comparing the ratio of entropies, and/or the like].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh, and Dintenfass by incorporating “diversity index”, as taught by Auerbach. One could have been motivated to do so in order to implement computer-based reasoning systems, and more specifically searching and querying in such systems, including determining the certainty (e.g., as a certainty function) that a particular set of data fits a model, the confidence that a particular set of data conforms to the model, or the importance of a feature or case with regard to the model [Auerbach, field of invention, Col. 61, lines 11-31].
Regarding claim 25, Permeh, and Dintenfass do not explicitly disclose, however, Auerbach discloses where at least one of the data quality algorithms is the Simpson index:
[Equation image media_image4.png (greyscale): the Simpson index formula]
[ Abstract, Techniques for improved searching and querying in computer-based reasoning systems are discussed and include receiving multiple new multidimensional data element to store in a computer-based reasoning data model; determining a feature bucket for each feature of each data element and storing a reference identifier in the feature bucket(s). A query on the computer-based reasoning system includes input data element (e.g., an actual data element, or a set of restrictions on features). For each feature in the input data element, feature buckets are determined, candidate results are determined based on whether cases have related feature buckets, and the results are determined based at least in part on the candidate results], and [ Col. 61, lines 11-31, Determining a certainty score for a particular case can be accomplished by removing the particular case from the case-based or computer-based reasoning model and determining the conviction score of the particular case based on an entropy measure associated with adding that particular case back into the model. Any appropriate entropy measure, variance, confidence, and/or related method can be used for making this determination, such as the ones described herein. In some embodiments, certainty or conviction is determined by the expected information gain of adding the case to the model divided by the actual information gain of adding the case. For example, in some embodiments, certainty or conviction may be determined based on Shannon Entropy, Renyi entropy, Hartley entropy, min entropy, Collision entropy, Renyi divergence, diversity index, Simpson index, Gini coefficient, Kullback-Leibler divergence, Fisher information, Jensen-Shannon divergence, Symmetrised divergence. In some embodiments, certainty scores are conviction scores and are determined by calculating the entropy, comparing the ratio of entropies, and/or the like].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh, and Dintenfass by incorporating “Simpson index”, as taught by Auerbach. One could have been motivated to do so in order to implement computer-based reasoning systems, and more specifically searching and querying in such systems, including determining the certainty (e.g., as a certainty function) that a particular set of data fits a model, the confidence that a particular set of data conforms to the model, or the importance of a feature or case with regard to the model [Auerbach, field of invention, Col. 61, lines 11-31].
Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Permeh (US 2017/0249455 A1) in view of Dintenfass (US 2018/0158157), and further in view of Sundaresan (US 2017/0212929).
Regarding claim 24, Permeh, and Dintenfass do not explicitly disclose, however, Sundaresan discloses where at least one of the data quality algorithms is the Shannon Weaver index:
[Equation image media_image5.png (greyscale): the Shannon Weaver index formula]
[0029] Once the associated queries are identified, the search engine module then measures a diversity of the query terms included in these queries using, for example as discussed earlier, Simpson's diversity index or Shannon's index. In this example, a diversity between the query terms 404 is measured to be sufficiently diverse because of the variety of different query terms 404, such as “ancient,” “challenge,” and “Chinese.” As explained in more detail below, the identification of whether a measurement is sufficiently diverse is based on a comparison of the computed index, for example, with a threshold value. However, if all the query terms are only “coins,” then the query terms 404 may not be sufficiently diverse for use to expand the search].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh, and Dintenfass by incorporating “Shannon Weaver index (Shannon's index)”, as taught by Sundaresan. One could have been motivated to do so in order to perform a search based on diversity, where the system calculates a diversity index for a result set comprising results for a query. The diversity index is a measure of diversity among the results of the query and relates to differences among the results. The system compares the diversity index to a threshold value [Sundaresan, Abstract, ¶2].
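For illustration only, a minimal Python sketch of the two diversity measures Sundaresan names (function names and the example query terms are hypothetical, not drawn from the cited references):

```python
import math
from collections import Counter

def simpson_index(items):
    """Simpson index: sum of p_i^2, the probability that two items drawn
    at random fall in the same category (1.0 = no diversity)."""
    counts = Counter(items)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())

def shannon_index(items):
    """Shannon (Shannon-Weaver) index: -sum of p_i * ln(p_i)
    (0.0 = no diversity; higher values = more diverse)."""
    counts = Counter(items)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

diverse = ["ancient", "challenge", "Chinese", "coins"]
uniform = ["coins", "coins", "coins", "coins"]
print(simpson_index(diverse), simpson_index(uniform))  # 0.25 1.0
print(shannon_index(diverse))                          # ≈ 1.386 (= ln 4)
```

Comparing either index against a threshold, as in Sundaresan's example, would decide whether a set of query terms is "sufficiently diverse" to expand a search.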
Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Permeh (US 2017/0249455 A1) in view of Dintenfass (US 2018/0158157), and further in view of Salamon (US 2016/0070795).
Regarding claim 26, Permeh, and Dintenfass do not explicitly disclose, however, Salamon discloses where at least one of the data quality algorithms is the Kolmogorov-Smirnov test:
[Equation image media_image6.png (greyscale): the Kolmogorov-Smirnov test formula]
[0007] Text search of documents and published or otherwise summarized results from specific data analyses has been performed, and the data collected is queryable at a low level in databases. This is done using some form of Structured Query Language], and [0053] In one embodiment, the present invention generates statistically significant findings that combine descriptive text from different sources… Examples of statistical methods to test (a) are CERNO (see the example below), the Mann-Whitney U test, the Wilcoxon Sum Rank test, the hypergeometric test, the Fisher exact test, Gene Set Enrichment Analysis (GSEA) tests such as based on the Kolmogorov-Smirnov statistic, and others. Examples of statistical tests for (b) are the globaltest, methods based on Hotelling's T-squared distribution, and others. Sources of text associated with the findings include descriptive text from the experiment that generated the transcript data and descriptive text for the subsets of genes, such as text that describes subset types 1-19 above].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh, and Dintenfass by incorporating “Kolmogorov-Smirnov statistic”, as taught by Salamon. One could have been motivated to do so in order to analyze data, and more particularly to search large data collections to generate large numbers of statistically significant findings from large data repositories using automatic computation on subsets of identified electronic records, and to provide for ranking and querying statistical analysis results of database contents for the purpose of populating search engine query results with novel content [Salamon, Abstract, [0003]].
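For illustration only, a minimal Python sketch of the one-sample Kolmogorov-Smirnov statistic D_n = sup_x |F_n(x) − F(x)| (the function name and example are hypothetical, not drawn from the cited references):

```python
def ks_statistic(sample, cdf):
    """Two-sided one-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDF of `sample` and a hypothesized continuous
    CDF `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # The empirical CDF jumps at x from i/n to (i+1)/n;
        # check the gap on both sides of the jump.
        d = max(d, abs(fx - i / n), abs((i + 1) / n - fx))
    return d

# A small sample tested against the Uniform(0, 1) CDF, F(x) = x.
print(ks_statistic([0.25, 0.5, 0.75], lambda x: x))  # 0.25
```

A large D_n relative to the critical value for n rejects the hypothesis that the sample was drawn from the given distribution, which is how the statistic serves as a distributional data quality check.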
Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Permeh (US 2017/0249455 A1) in view of Dintenfass (US 2018/0158157), and further in view of Jain (US 11,082,487).
Regarding claim 27, Permeh, and Dintenfass do not explicitly disclose, however, Jain discloses where at least one of the data quality algorithms is the Kruskal-Wallis test:
[Equation image media_image7.png (greyscale): the Kruskal-Wallis test formula]
[Col. 57 lines 43-49, In some implementations, the data collected through the multi-tenant data access platform comprises monitoring data generated using sensors of mobile devices and/or wearable devices of users. In some implementations, the data access request comprises a search query for the data for different units of the multiple tenant organizations], and [Col. 35 lines 41-51, Many types of data access requests are possible. Examples include requests to view or download data. Other examples include requests to search the data (e.g., a query), to generate a report based on the data, to generate a visualization based on the data, to perform a machine learning task based on the data (e.g., use the data as training data for training a model, using the data to test or validate a model, applying a model to generate a prediction based on the data, etc.), to evaluate the data to identify candidate participants for a research study cohort, to perform statistical analysis on the data, and so on [ Col. 36, 3-14, In some implementations, the data access request includes a search query for health data for health research studies corresponding to different units of the tenant organizations. For example, as shown in FIG. 2, the data access request 202 includes a search query for chemotherapy treatments received by participants of cancer research studies. In this example, the server system 110 determines that study data 214A of Study A and study data 214B of Study B both include participant data that is relevant to the search query. Study A and Study B can also be conducted by different departments of the same hospital (e.g., Oncology department and Cardiology department of Hospital A)], and [(160) Other examples of data processing requests include requests to determine statistical measures for a data set (e.g., mean, median, mode, maximum, minimum, variance, distribution characteristics, etc.) for various properties… Kruskal Wallis test, paired t-test].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh, and Dintenfass by incorporating “Kruskal-Wallis test”, as taught by Jain. One could have been motivated to do so in order to support data access requests, including search queries for the data, with data processing requests that determine statistical measures for a data set (e.g., mean, median, mode, maximum, minimum, variance, distribution characteristics, etc.) for various properties [Jain, Col. 36, 3-14, claim 3].
Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Permeh (US 2017/0249455 A1) in view of Dintenfass (US 2018/0158157), and further in view of Guy (US 2019/0347359).
Regarding claim 28, Permeh does not explicitly disclose, however, Guy discloses where the data query module utilizes web crawling and web scraping [Abstract, Various methods and systems for processing web crawling queries using a web crawling prioritization model based on classification operation performance. A classification operation for organizing products in a product listing platform is accessed. A web crawling engine is accessed for the classification operation. The web crawling engine operates based on a web crawling query prioritization model that supports determining web crawling priority scores that indicate a predicted performance improvement for classification operations executed with known data and web crawled data to be retrieved from executing a web crawling query operation. Using the web crawling prioritization model, a web crawling priority score is determined for a web crawling query for the corresponding classification operation. The classification operation is associated with a product in a product listing platform and known data for the product. Based on the web crawling priority score, the web crawling query is executed to identify web crawled data].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Permeh, and Dintenfass by incorporating “web crawling prioritization model”, as taught by Guy. One could have been motivated to do so in order to process web crawling queries using a web crawling prioritization model based on classification operation performance, where a web crawling engine accessed for the classification operation operates based on a web crawling query prioritization model that supports determining web crawling priority scores indicating a predicted performance improvement for classification operations executed with known data and web crawled data to be retrieved from executing a web crawling query operation [Guy, Abstract].
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
See form PTO-892 for additional relevant references.
Google Search: The Kruskal-Wallis test is a non-parametric method used in data quality analysis to determine whether three or more independent groups originate from the same distribution, based on ranked data. The test statistic, denoted H, evaluates whether the medians of the groups differ significantly, offering a robust alternative to ANOVA for non-normally distributed data. The statistic H is calculated as
H = [12 / (N(N+1))] * Σ_{i=1..k} (R_i² / n_i) − 3(N+1),
where N is the total number of observations across all groups, k is the number of groups, R_i is the sum of ranks for the i-th group, and n_i is the number of observations in the i-th group.
Interpretation: if H exceeds the critical value of the χ² (chi-square) distribution with df = k − 1, the null hypothesis is rejected, indicating that at least one group's distribution differs significantly. Assumptions: independent samples and, ideally, similar distribution shapes. Handling ties: if tied ranks occur, a correction factor is applied to H, though it often has little effect.
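For illustration only, the H statistic in the note above can be sketched in Python (the function name is hypothetical; this simple version assigns average ranks to ties but omits the tie-correction factor):

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H = [12 / (N(N+1))] * sum(R_i^2 / n_i) - 3(N+1),
    where R_i is the sum of (average) ranks of group i in the pooled data,
    n_i is the group size, and N is the total number of observations."""
    pooled = sorted((x, g) for g, grp in enumerate(groups) for x in grp)
    n_total = len(pooled)
    rank_of = [0.0] * n_total
    i = 0
    while i < n_total:
        # Give tied values the average of the ranks they occupy.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            rank_of[k] = avg_rank
        i = j
    rank_sums = [0.0] * len(groups)
    for (_, g), r in zip(pooled, rank_of):
        rank_sums[g] += r
    h = 12 / (n_total * (n_total + 1)) * sum(
        r_i ** 2 / len(grp) for r_i, grp in zip(rank_sums, groups)
    )
    return h - 3 * (n_total + 1)

# Two separated groups yield a larger H than two identical groups.
print(kruskal_wallis_h([1, 2], [3, 4]))        # ≈ 2.4
print(kruskal_wallis_h([1, 2, 3], [1, 2, 3]))  # ≈ 0.0
```

The resulting H would then be compared against the χ² critical value with k − 1 degrees of freedom, as described in the note.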
Banipal (US 2021/0263978) [Abstract, Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: obtaining user browsing data from a browser plugin of a client computer device, the client computing device being associated to the user; examining the user browsing data; classifying a current activity of the user in dependence on the examining the user browsing data, wherein the classifying includes classifying the current activity of the user as searching for a service provider; in response to classifying the current activity of the user as searching for a service provider, performing web crawling on multiple websites to obtain research data, wherein the multiple websites are crawled by the performing web crawling in dependence on extracted data extracted by the examining the user browsing data; and providing an action decision in dependence on the research data, wherein the action decision includes a decision to transmit a communication to the user on behalf of a certain service provider].
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAHRIAR ZARRINEH whose telephone number is (571)272-1207. The examiner can normally be reached Monday-Friday, 8:30am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jorge Ortiz-Criado can be reached at 571-272-7624. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHAHRIAR ZARRINEH/Primary Examiner, Art Unit 2496