DETAILED ACTION
This is the initial Office action based on the application filed on January 23, 2025. Claims 1-13 are currently pending and have been considered below.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1 and 7 state “classify multi-source heterogeneous data to obtain classified data sources, wherein the classified data sources comprise a structured data source, a semi-structured data source, an unstructured data source, and a binary data source.”
The above limitation implies classifying data to obtain classified data sources. That would mean that data is first classified in order to identify the data sources. However, it is unclear where the initial data comes from. Furthermore, it is unclear how classifying data would lead to obtaining data sources.
Claims 1 and 7 further state “perform information configuration of a predetermined configuration rule on the classified data sources to obtain data source information, wherein the predetermined configuration rule comprises a Uniform Resource Locator (URL), a username, a password, a driver.” However, it is unclear what “perform information configuration” means. Does that mean applying configuration rules?
Furthermore, the limitation states “wherein the predetermined configuration rule comprises a Uniform Resource Locator (URL), a username, a password, a driver.” However, it is unclear what a driver in the above context means. A driver is usually something that deals with operating system level operations and not just for retrieving data. Furthermore, because a driver is usually a software program, it is unclear how a particular rule comprises a driver.
Moreover, adding to the lack of clarity, the instant specification at [071] states “The data source information (such as the URL, the username, the password, and the driver) is saved in a system database in the form of a page for the user to fill in.” This language suggests that the term driver is not used in its customary sense in the computer arts.
As such, it is difficult to understand the metes and bounds of the claims.
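For illustration only, the following minimal Python sketch shows one possible reading of the passage quoted from [071], under which the "driver" is merely a text field (for example, a driver class name) saved alongside the URL, the username, and the password. The class name DataSourceInfo, its field names, and the sample values are hypothetical and do not appear in the application; the sketch is not a statement of what the claims actually require.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DataSourceInfo:
    """Hypothetical record holding the four items named in the claim."""
    url: str
    username: str
    password: str
    driver: str  # assumed to be an identifier string, not executable software


# Hypothetical sample values, as a user might type them into a form page.
info = DataSourceInfo(
    url="jdbc:postgresql://db.example.com:5432/sales",
    username="reader",
    password="secret",
    driver="org.postgresql.Driver",
)

# A plain dictionary that could be saved as one row of a system database.
record = asdict(info)
```

Under this reading the driver is data that is filled in and stored, which differs from the customary meaning of a driver as a software program.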
Claims 1 and 7 further state “querying… wherein the predetermined data processing method comprises a batch data mode and a streaming data mode.”
However, a batch data mode and a streaming data mode are mutually exclusive. Furthermore, the Abstract of the instant application states “query, retrieve, and read the data source information by using a predetermined data processing method, such as a batch data mode or a streaming data mode, to obtain read data.” But other parts of the instant specification state “a batch data mode and a streaming data mode.”
In light of the differing parts of the instant specification and the above claim limitation reciting mutually exclusive functionality, it is difficult to understand the metes and bounds of the claims.
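For illustration only, the following minimal Python sketch shows the "or" reading found in the Abstract, under which each read operation uses exactly one of the two modes. The names Mode and read and the sample records are hypothetical and are not taken from the application.

```python
from enum import Enum
from typing import Iterable, Iterator, List, Union


class Mode(Enum):
    BATCH = "batch"
    STREAMING = "streaming"


def read(records: Iterable[str], mode: Mode) -> Union[List[str], Iterator[str]]:
    """Read records in exactly one mode per call (hypothetical helper)."""
    if mode is Mode.BATCH:
        return list(records)  # the whole dataset is materialized at once
    return iter(records)      # records are handed over one at a time


batch = read(["a", "b", "c"], Mode.BATCH)
stream = read(["a", "b", "c"], Mode.STREAMING)
first = next(stream)  # only the first record has been consumed so far
```

A single call cannot be in both modes at once, which is why the "and" wording of the claim is difficult to reconcile with the "or" wording of the Abstract.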
Claims 4 and 11 state “wherein the statistical data comprises a destination of the multi-source heterogeneous data and a source of the read data.” However, it is unclear what the destination of the multi-source heterogeneous data is and where it came from. Furthermore, it is unclear what specific statistical data is being analyzed.
Claims 5 and 12 state “wherein the data lake format comprises Delta Lake, Apache Hudi, and Apache Iceberg.”
However, it is highly unusual for a data lake to include all three formats without an interoperability layer, and neither the claims nor the specification describe how all three formats function together, if at all. As such, it is unclear how the data lake comprises the above claimed formats all at once.
Adding to the ambiguity, the specification at [080] states “This module supports selection of the data lake format, and provides three mainstream international data lake formats: Delta Lake, Apache Hudi, and Apache Iceberg.” That implies that only one format is selected and not all three as claimed.
The Claims also state “and the underlying storage medium comprises Hadoop Distributed File System (HDFS), Amazon Web Services (AWS), and Ceph.” However, it is unclear how such functionality is achieved when the medium comprises all three of the above recited types.
Adding to the ambiguity, the specification at [080] states “Furthermore, the module supports selection of the underlying storage medium for the data lake, and provides HDFS, AWS, Ceph, etc.” That implies that only one type is selected and not all three as claimed.
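For illustration only, the following minimal Python sketch shows the "selection" reading suggested by [080], under which one data lake format and one underlying storage medium are chosen from the options provided. The names LakeFormat, StorageMedium, and configure_storage are hypothetical and do not appear in the application.

```python
from enum import Enum


class LakeFormat(Enum):
    DELTA_LAKE = "Delta Lake"
    APACHE_HUDI = "Apache Hudi"
    APACHE_ICEBERG = "Apache Iceberg"


class StorageMedium(Enum):
    HDFS = "HDFS"
    AWS = "AWS"
    CEPH = "Ceph"


def configure_storage(fmt: LakeFormat, medium: StorageMedium) -> dict:
    """Return a configuration holding exactly one format and one medium."""
    return {"format": fmt.value, "medium": medium.value}


# Hypothetical user selection: one option from each set of three.
config = configure_storage(LakeFormat.APACHE_ICEBERG, StorageMedium.CEPH)
```

Under this reading the three formats and three media are alternatives offered for selection, whereas the claim language recites all three of each as being comprised at once.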
Claims 6 and 13 recite “manage and allocate a query index of the data acquisition module.” However, it is unclear what information is in the query index and what information was used to form the index.
Claim 7 recites a method with analogous steps to Claim 1. As such, it is unclear if Claim 7 is supposed to be its own Independent Claim. Furthermore, Claim 7 is claiming multiple statutory classes, method and system, which makes it unclear under what statutory class the Claim should be analyzed.
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
Claims 7-13 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.
Claim 7 recites a method with analogous steps to Claim 1. Those steps do not further limit Claim 1. Rather, it seems that Claim 7 is its own Independent Claim, but is currently dependent upon Claim 1.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent Claim 1 recites abstract subject matter directed towards classifying and retrieving information. Specifically, Claim 1 recites:
classify multi-source heterogeneous data to obtain classified data sources, wherein the classified data sources comprise a structured data source, a semi-structured data source, an unstructured data source, and a binary data source – Classifying data is something that can be performed in the mind, and is thus an abstract concept.
perform information configuration of a predetermined configuration rule on the classified data sources to obtain data source information, wherein the predetermined configuration rule comprises a Uniform Resource Locator (URL), a username, a password, a driver – Applying rules to classified data is something that can be performed in the mind, and is thus an abstract concept.
the data acquisition module is configured to query, retrieve, and read the data source information by using a predetermined data processing method, to obtain read data, wherein the predetermined data processing method comprises a batch data mode and a streaming data mode – Retrieving information is considered as insignificant extra-solution activity as per MPEP 2106.05(g).
This judicial exception is not integrated into a practical application. Other than the abstract idea, the claims recite additional elements of modules executing the abstract idea. The additional elements, such as the modules, are recited at a high level of generality, i.e., as generic computer components performing generic computer functions of information processing. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The rest of the Dependent Claims, 2-13 (although Claim 7 is likely supposed to be an independent claim as addressed above), further describe more details of the above identified mental processes and thus do not provide additional elements that would make them statutory under 35 USC 101.
Claims 1-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Specifically, the Claims lack the necessary physical articles or objects to constitute a machine or a manufacture within the meaning of 35 USC 101. Rather, the Claims describe a system, executing different software modules, without any hardware to execute the units, which is considered non-statutory.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3 and 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over Ghosal et al (US Patent Application Publication 2024/0160632) in view of Read et al (US Patent 9,984,136).
Claims 1 and 7: Ghosal discloses a system and method comprising: a data source management module and a data acquisition module that are connected to each other, wherein the data source management module is configured to
classify multi-source heterogeneous data to obtain classified data sources, wherein the classified data sources comprise a structured data source, a semi-structured data source, an unstructured data source, and a binary data source [0021-0022]. [See at least classifying data sources based on metadata including the data sources.]
Ghosal alone does not explicitly disclose perform information configuration of a predetermined configuration rule on the classified data sources to obtain data source information, wherein the predetermined configuration rule comprises a Uniform Resource Locator (URL), a username, a password, and a driver.
However, Read (Col 6 ln 19-36) discloses using configuration data such as at least a URL, a username, a password, etc. to provide access to data sources.
As such, it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Ghosal with Read. One would have been motivated to do so in order to access data sources using configuration information.
Ghosal as modified further discloses:
the data acquisition module is configured to query, retrieve, and read the data source information by using a predetermined data processing method, to obtain read data, wherein the predetermined data processing method comprises a batch data mode and a streaming data mode [0133-0134]. [See at least retrieving data as “a batch (i.e., a dataset) as a stream…”]
Claims 2 and 9: Ghosal as modified discloses the system and method of Claims 1 and 7 above, and Ghosal further discloses wherein the system further comprises a data storage module, wherein the data storage module is connected to the data source management module and the data acquisition module; the data storage module is configured to store the predetermined configuration rule and the data source information; and the data storage module is further configured to store the read data [0133-0134].
Claims 3 and 10: Ghosal as modified discloses the system and method of Claims 2 and 9 above, and Ghosal further discloses wherein the system further comprises a data query module, wherein the data query module is connected to the data storage module; and the data query module is configured to perform single-table query and multi-table joint query on the read data to obtain query data information, wherein the query data information comprises a storage path and data details of the read data [0131, 0133-0134]. [Also see Read (Col 6 ln 19-36) for identifying a path where data is stored.]
Claim 8: Ghosal as modified discloses the method of Claim 7 above, and Ghosal in view of Read further discloses wherein the method further comprises storing the read data [0131, 0133-0134].
Claims 4 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Ghosal et al (US Patent Application Publication 2024/0160632) in view of Read et al (US Patent 9,984,136) and further in view of Brown et al (US Patent 10,877,964).
Claims 4 and 11: Ghosal as modified discloses the system and method of Claims 1 and 7 above, but Ghosal alone does not explicitly disclose wherein the system further comprises a statistical analysis module, wherein the statistical analysis module is connected to the data source management module and the data acquisition module; the statistical analysis module is configured to display statistical data, wherein the statistical data comprises a destination of the multi-source heterogeneous data and a source of the read data; and the statistical analysis module is further configured to perform causal analysis on the read data by calling a causal analysis algorithm to obtain analysis data, and display the analysis data.
However, Brown (Col 19 ln 34-67) discloses providing a statistical report of data based on destination and sources of data.
As such, it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Ghosal with Brown. One would have been motivated to do so in order to provide statistical analysis of data based on a user request.
Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Ghosal et al (US Patent Application Publication 2024/0160632) in view of Read et al (US Patent 9,984,136) further in view of O’Krafka (US Patent Application Publication 2022/0277006) and further in view of Raghunath et al (US Patent Application Publication 2020/0319915).
Claims 5 and 12: Ghosal as modified discloses the system and method of Claims 1 and 7 above, but Ghosal alone does not explicitly disclose wherein the data storage module comprises a configured data lake format and underlying storage medium, wherein the data lake format comprises Delta Lake, Apache Hudi, and Apache Iceberg; and the underlying storage medium comprises Hadoop Distributed File System (HDFS), Amazon Web Services (AWS), and Ceph.
However, O’Krafka [0040] discloses where a data lake may be stored using “Hive, Hudi, Delta Lake, and Iceberg.” And Raghunath [0056] discloses having storage based at least on HDFS, AWS and Ceph.
As such, it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Ghosal with O’Krafka and Raghunath. One would have been motivated to do so in order to store objects on a particular user-indicated platform.
Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Ghosal et al (US Patent Application Publication 2024/0160632) in view of Read et al (US Patent 9,984,136) and further in view of Yi et al (US Patent Application Publication 2016/0078135).
Claims 6 and 13: Ghosal as modified discloses the system and method of Claims 1 and 7 above, but Ghosal alone does not explicitly disclose wherein the system further comprises a task management module, wherein the task management module is connected to the data source management module and the data acquisition module; the task management module is configured to manage and set the data source information in the data source management module; and the task management module is further configured to manage and allocate a query index of the data acquisition module.
However, Yi [0032-0033] discloses having and managing a query index based on various data sources.
As such, it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Ghosal with Yi. One would have been motivated to do so in order to be able to properly retrieve information.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Howes et al (2014/0283048) describes at least categorizing data sources;
Raphael et al (2022/0269663) describes at least types of data stored in a data lake (i.e. “structured data from relational databases (e.g., rectangular datasets), semi-structured data, unstructured data, and/or binary data.”);
Wood et al (2023/0300112) describes at least types of data stored in a data lake.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEX GOFMAN whose telephone number is (571)270-1072. The examiner can normally be reached Monday-Friday 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi can be reached at 571-272-4078. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALEX GOFMAN/Primary Examiner, Art Unit 2163