DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 5 is objected to because of the following informalities:
Claim 5 starts with Equation (2) and depends on claims 4 and 1, which do not disclose an Equation (1); thus, Equation (2) should be labeled as Equation (1).
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-6 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding claim 1, the claim recites the term “obtained multi-source urban energy data” in the limitation “S1, converting obtained multi-source urban energy data into a multimodal input sequence….”, and it is unclear where the “multi-source urban energy data” is collected or obtained from, as there is no preceding obtaining step disclosed in the prior limitations. Thus, this renders the claim indefinite.
Claim 1 recites the terms “a text X_W∈R^(T_W×D_W)”, “an image X_I∈R^(T_I×D_I)”, and “audio X_A∈R^(T_A×D_A)” in the limitation “S1, converting obtained multi-source urban energy data into a multimodal input sequence, wherein three types of heterogeneous data of urban energy data comprise a text X_W∈R^(T_W×D_W), an image X_I∈R^(T_I×D_I), and audio X_A∈R^(T_A×D_A)”, and it is unclear how a text, an image, and audio relate to the mathematical notations “X_W∈R^(T_W×D_W)”, “X_I∈R^(T_I×D_I)”, and “X_A∈R^(T_A×D_A)”, thus rendering the claim indefinite.
Claim 1 also fails to define the variables in the mathematical notations “X_W∈R^(T_W×D_W)”, “X_I∈R^(T_I×D_I)”, and “X_A∈R^(T_A×D_A)” in the limitation “S1, converting obtained multi-source urban energy data into a multimodal input sequence, wherein three types of heterogeneous data of urban energy data comprise a text X_W∈R^(T_W×D_W), an image X_I∈R^(T_I×D_I), and audio X_A∈R^(T_A×D_A)”, and the variables in the mathematical notations “[Z_I→W[D], Z_A→W[D]]∈R^(T_W×2d)”, “[Z_W→I[D], Z_A→I[D]]∈R^(T_I×2d)”, and “[Z_I→A[D], Z_W→A[D]]∈R^(T_A×2d)” in the limitations “S4, performing a multi-scale and multimodal information fusion on an output in the step S3 by using a cross-modal transformer to implement a cross-modal mutual fusion of the three types of the heterogeneous data that is represented by [Z_I→W[D], Z_A→W[D]]∈R^(T_W×2d), [Z_W→I[D], Z_A→I[D]]∈R^(T_I×2d), and [Z_I→A[D], Z_W→A[D]]∈R^(T_A×2d)” and “S5, putting [Z_I→W[D], Z_A→W[D]]∈R^(T_W×2d), [Z_W→I[D], Z_A→I[D]]∈R^(T_I×2d), and [Z_I→A[D], Z_W→A[D]]∈R^(T_A×2d) pairwise into three transformer networks with a self-attention for a self-attention calculation, to obtain a fused feature of the multi-source urban energy data, wherein the fused feature is an input to a deep learning-based prediction model and used to predict a quantity of energy that will be used in the future and the quantity of the energy that needs to be produced”, which renders the claim indefinite. Furthermore, it is unclear whether the term “Z” refers to “a cross-modal fusion” or to “the three types of the heterogeneous data”, thus rendering the claim indefinite.
Claim 1 recites the limitation "heterogeneous data of urban energy data" in “S1, converting obtained multi-source urban energy data into a multimodal input sequence, wherein three types of heterogeneous data of urban energy data….”, and it is unclear whether the “urban energy data” refers to the “multi-source urban energy data” or to urban energy data that is distinct from the “multi-source urban energy data”, thus rendering the claim indefinite.
Finally, claim 1 repeatedly recites the term “an output” in all of its limitations, and it is unclear which data structure each “output” from the preceding limitations refers to, thus rendering the claim indefinite.
Claims 2-5 are also rejected under 35 U.S.C. 112(b) since the claims inherit the deficiencies of claim 1.
Claim 6 is also rejected under 35 U.S.C. 112(b) as it recites “A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the steps of the method according to claim 1 are implemented” and therefore is rejected under the same rationale as claim 1.
Claim 2 recites “textual data is sourced” and “image data is sourced” in the limitations “sources of the multi-source urban energy data comprise water, coal, electricity, heating power, and oil industries, wherein textual data is sourced in production data, management data, and marketing data that are found in energy big data from various energy industries, and data about a consumption of various types of the energy” and “image data is sourced in geographic information system (GIS) information and a meteorogram that are found in the energy big data from the coal, the oil, and the electricity industries”. It is unclear whether the data “is sourced” from all of the coal, oil, and electricity industries at the same time or from just one of the coal, oil, and electricity industries, thus rendering the claim indefinite.
Additionally, claim 2 discloses “various energy industries” in “sources of the multi-source urban energy data comprise water, coal, electricity, heating power, and oil industries, wherein textual data is sourced in production data, management data, and marketing data that are found in energy big data from various energy industries” and “audio data is sourced in energy use-related audio report information obtained from the various energy industries”. It is unclear whether “various energy industries” is limited to the water, coal, electricity, heating power, and oil industries or whether “various energy industries” also includes other urban energy industries, thus rendering the claim indefinite.
Claim 2 recites the limitation "the energy" in “audio data of an interview with people working in the various energy industries about a usage amount of a type of the energy in the future”. There is insufficient antecedent basis for this limitation in the claim.
Claim 3 fails to define the variable “T_α” in “X′_α=Conv1D(X_α, k_α)∈R^(T_α×d) (1)”, thus rendering the claim indefinite. Additionally, it is unclear whether “X′_α” refers to “X_α” or to another variable, thus rendering the claim indefinite.
Claim 4 recites “keep an interaction with windowing” in “an input of a MCTB in the target mode is from outputs of a plurality of MCTBs in the source mode, and to keep an interaction with windowing, a local interaction is formed between the MCTB in the target mode and the MCTB in the source mode that are at a same scale….”. There is no clear definition of the term “windowing” in the claims or the specification, and it is unclear how “an interaction” relates to “windowing”, thus rendering the claim indefinite.
Claim 4 also recites the limitation "the MACT block" in “wherein, the MACT block comprises three subnetwork layers, which are a multi-scale multi-head cross-modal layer, a multi-scale attention layer….” and the limitation “the CT block” in “a position-wise feedforward layer, and the CT block is used for only a representation of a single scale in the source mode”. There is insufficient antecedent basis for these limitations in the claim.
Claim 5 fails to define the variables “Z_β”, “Z_α”, “R”, “d_β”, and “d_α” in the mathematical notations “Z_α∈R^(T_α×d_α)” and “Z_β∈R^(T_β×d_β)”, and fails to define the variables “CM”, “j”, “LN”, and “P_β→α” in Equations (2), (4), (5), and (6), thus rendering the claim indefinite.
Additionally, claim 5 recites “the target mode α and the source mode β”, and it is unclear whether the target mode α and the source mode β are referring to the target mode and the source mode in claim 4 or are distinct from the target mode and the source mode in claim 4, thus rendering the claim indefinite.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Claim 6 does not fall within at least one of the four categories of patent-eligible subject matter because the claim recites “A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor”. According to paragraph [0034] of the specification, the “computer-readable storage medium” is described as follows: “Therefore, the present invention may use complete hardware embodiments or complete software embodiments, or have a form combining the embodiments in aspects of software and hardware. Further, the present invention may be in a form of a computer program product that is executed on one or more computer-usable storage media including computer-usable program code.” Hence, the “computer-readable storage medium” can be construed as software per se, which does not fall under one of the four statutory categories, and claim 6 is therefore rejected.
Claims 1-6 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1:
Subject Matter Eligibility Analysis Step 1:
Claim 1 recites “A deep learning-based method for fusing multi-source urban energy data” which is a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 1 recites the steps:
“S1, converting obtained multi-source urban energy data into a multimodal input sequence, wherein three types of heterogeneous data of urban energy data comprise a text X_W∈R^(T_W×D_W), an image X_I∈R^(T_I×D_I), and audio X_A∈R^(T_A×D_A)”: This involves a human converting urban energy data into a multimodal input sequence which contains either text, image, or audio data. Therefore, this is a mental process.
“S2, performing a one-dimensional time convolution once on data in the multimodal input sequence in the step S1 to obtain time information and obtain an urban energy data feature with the time information”: This involves a human performing a one-dimensional convolution (applying activation functions) on the input sequence from step (I) to obtain time feature information. Hence, this is a mental process.
“S3, performing positional encoding (PE) on an output in the step S2 to ensure that the time information is retained in a subsequent calculation”: This involves a human taking an output from step (II) and performing positional encoding on that output, thus this is a mental process.
“S4, performing a multi-scale and multimodal information fusion on an output in the step S3 by using a cross-modal transformer to implement a cross-modal fusion of the three types of heterogeneous data that is represented by [Z_I→W[D], Z_A→W[D]]∈R^(T_W×2d), [Z_W→I[D], Z_A→I[D]]∈R^(T_I×2d), and [Z_I→A[D], Z_W→A[D]]∈R^(T_A×2d)”: This involves a human implementing a cross-modal fusion of the three types of data from step (I). Thus, this is a mental process.
Claim 1 hence recites abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 1 discloses the additional elements:
“S4, performing a multi-scale and multimodal information fusion on an output in the step S3 by using a cross-modal transformer to implement a cross-modal fusion of the three types of heterogeneous data that is represented by [Z_I→W[D], Z_A→W[D]]∈R^(T_W×2d), [Z_W→I[D], Z_A→I[D]]∈R^(T_I×2d), and [Z_I→A[D], Z_W→A[D]]∈R^(T_A×2d)”: This element does not integrate the abstract ideas into a practical application because the element recites a generic computing component (MPEP 2106.05(f)).
“S5, putting [Z_I→W[D], Z_A→W[D]]∈R^(T_W×2d), [Z_W→I[D], Z_A→I[D]]∈R^(T_I×2d), and [Z_I→A[D], Z_W→A[D]]∈R^(T_A×2d) pairwise into three transformer networks with a self-attention for a self-attention calculation, to obtain a fused feature of the multi-source urban energy data, wherein the fused feature is an input to a deep learning-based prediction model and used to predict a quantity of energy that will be used in the future and the quantity of the energy that needs to be produced”: This element does not integrate the abstract ideas from Step 2A Prong 1 into a practical application because the element recites an insignificant extra-solution activity of data transmission (MPEP 2106.05(g)).
Therefore, claim 1 is directed to the abstract ideas.
Subject Matter Eligibility Analysis Step 2B:
The additional elements in claim 1 do not provide significantly more than the abstract ideas themselves, taken alone and in combination, because:
“S4, performing a multi-scale and multimodal information fusion on an output in the step S3 by using a cross-modal transformer to implement a cross-modal fusion of the three types of heterogeneous data that is represented by [Z_I→W[D], Z_A→W[D]]∈R^(T_W×2d), [Z_W→I[D], Z_A→I[D]]∈R^(T_I×2d), and [Z_I→A[D], Z_W→A[D]]∈R^(T_A×2d)”: This element recites a generic computing component (MPEP 2106.05(f)).
“S5, putting [Z_I→W[D], Z_A→W[D]]∈R^(T_W×2d), [Z_W→I[D], Z_A→I[D]]∈R^(T_I×2d), and [Z_I→A[D], Z_W→A[D]]∈R^(T_A×2d) pairwise into three transformer networks with a self-attention for a self-attention calculation, to obtain a fused feature of the multi-source urban energy data, wherein the fused feature is an input to a deep learning-based prediction model and used to predict a quantity of energy that will be used in the future and the quantity of the energy that needs to be produced”: This element recites a well-understood, routine, and conventional activity of “receiving or transmitting data over a network” (MPEP 2106.05(d)(I); Intellectual Ventures v. Symantec, 838 F.3d 1307, 1321; 120 USPQ2d 1353, 1362 (Fed. Cir. 2016) [utilizing an intermediary computer to forward information]).
Since there is no nexus between additional elements that could cause the combination to provide an inventive concept, claim 1 is subject-matter ineligible.
Regarding claim 2:
Subject Matter Eligibility Analysis Step 1:
Claim 2 is a process as in claim 1.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 2 recites the steps:
“sources of the multi-source urban energy data comprise water, coal, electricity, heating power, and oil industries, wherein textual data is sourced in production data, management data, and marketing data that are found in energy big data from various energy industries, and data about a consumption of various types of the energy”: This element expands on step (I) from claim 1 and further describes “a text”. Hence, this is a mental process.
“image data is sourced in geographic information system (GIS) information and a meteorogram that are found in the energy big data from the coal, the oil, and the electricity industries, and traffic flow image information about an energy consumption of the oil and the electricity”: This element expands on step (I) from claim 1 and further describes “an image”. Thus, this is a mental process.
“audio data is sourced in energy use-related audio report information obtained from the various energy industries through Internet big data mining, interview audio information related to the energy use in various industries, and the audio data of an interview with people working in the various energy industries about a usage amount of a type of the energy in the future”: This element expands on step (I) from claim 1 and further describes “audio”. Thus, this is a mental process.
Claim 2 therefore recites abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 2 recites the same additional elements as claim 1, hence claim 2 is directed to the abstract ideas.
Subject Matter Eligibility Analysis Step 2B:
Since claim 2 recites the same additional elements as claim 1 and there is no nexus between additional elements that could cause the combination to provide an inventive concept, claim 2 is subject-matter ineligible.
Regarding claim 3:
Subject Matter Eligibility Analysis Step 1:
Claim 3 is a process as in claim 1.
Subject Matter Eligibility Analysis Step 2A Prong 1:
In addition to the mental processes in claim 1, claim 3 recites:
“a manner of a time convolution is as follows: X′_α=Conv1D(X_α, k_α)∈R^(T_α×d) (1)”: This element recites the formula for a one-dimensional time convolution; hence this is math.
Claim 3 thus recites abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 3 recites the same additional elements as claim 1, hence claim 3 is directed to the abstract ideas.
Subject Matter Eligibility Analysis Step 2B:
Since claim 3 recites the same additional elements as claim 1 and there is no nexus between additional elements that could cause the combination to provide an inventive concept, claim 3 is subject-matter ineligible.
Regarding claim 4:
Subject Matter Eligibility Analysis Step 1:
Claim 4 is a process as in claim 1.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 4 recites the same mental processes as claim 1, thus claim 4 recites abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
In addition to the additional elements in claim 1, claim 4 recites:
“a multi-scale cooperative multimodal transformer (MCMulT) architecture is built to implement a multimodal and multi-scale information fusion, wherein a MCMulT network is divided into several connected cross-modal transformer blocks (MCTBs), and blocks comprise two types of cross-modal units, which are a multi-scale attention cross-modal (MACT) block and a cross-modal (CT) block, wherein a MACT block in one dimension and two adjacent CT blocks in another dimension form a cross-modal transformer block”: This element does not integrate the abstract ideas into a practical application because the element recites generic computing components (MPEP 2106.05(f)).
“for a target mode and a source mode, a global interaction of the MCMulT is performed by a plurality of MACT blocks, wherein an input of a MCTB in the target mode is from outputs of a plurality of MCTBs in the source mode, and to keep an interaction with windowing, a local interaction is formed between the MCTB in the target mode and the MCTB in the source mode that are at a same scale, a local interaction of the MCMulT is performed by the CT block, wherein an input of the CT block comprises a previous-layer output in the target mode and a first layer output of the MCTB in a same scale in the source mode, and the local interaction is represented using only a single scale in the source mode”: This element does not integrate the abstract ideas into a practical application because the element recites generic computing components (MPEP 2106.05(f)).
“ wherein, the MACT block comprises three subnetwork layers, which are a multi-scale multi-head cross-modal layer, a multi-scale attention layer, and a position-wise feedforward layer, and the CT block is used for only a representation of a single scale in the source mode.”: This element does not integrate the abstract ideas into a practical application because the element recites generic computing components (MPEP 2106.05(f)).
Claim 4 hence is directed to the abstract ideas.
Subject Matter Eligibility Analysis Step 2B:
The additional elements in claim 4 do not provide significantly more than the abstract ideas themselves, taken alone and in combination, because:
“a multi-scale cooperative multimodal transformer (MCMulT) architecture is built to implement a multimodal and multi-scale information fusion, wherein a MCMulT network is divided into several connected cross-modal transformer blocks (MCTBs), and blocks comprise two types of cross-modal units, which are a multi-scale attention cross-modal (MACT) block and a cross-modal (CT) block, wherein a MACT block in one dimension and two adjacent CT blocks in another dimension form a cross-modal transformer block”: This element recites generic computing components (MPEP 2106.05(f)).
“for a target mode and a source mode, a global interaction of the MCMulT is performed by a plurality of MACT blocks, wherein an input of a MCTB in the target mode is from outputs of a plurality of MCTBs in the source mode, and to keep an interaction with windowing, a local interaction is formed between the MCTB in the target mode and the MCTB in the source mode that are at a same scale, a local interaction of the MCMulT is performed by the CT block, wherein an input of the CT block comprises a previous-layer output in the target mode and a first layer output of the MCTB in a same scale in the source mode, and the local interaction is represented using only a single scale in the source mode”: This element recites generic computing components (MPEP 2106.05(f)).
“ wherein, the MACT block comprises three subnetwork layers, which are a multi-scale multi-head cross-modal layer, a multi-scale attention layer, and a position-wise feedforward layer, and the CT block is used for only a representation of a single scale in the source mode.”: This element recites generic computing components (MPEP 2106.05(f)).
Since there is no nexus between additional elements that could cause the combination to provide an inventive concept, claim 4 is subject-matter ineligible.
Regarding claim 5:
Subject Matter Eligibility Analysis Step 1:
Claim 5 is a process as in claim 4.
Subject Matter Eligibility Analysis Step 2A Prong 1:
In addition to the mental processes in claim 4, claim 5 recites:
[Equations (2)-(6) of claim 5]: This element recites the formulas for interactions and fusions; hence this is math.
Claim 5 thus recites abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 5 recites the same additional elements as claim 4, hence claim 5 is directed to the abstract ideas.
Subject Matter Eligibility Analysis Step 2B:
Since claim 5 recites the same additional elements as claim 4 and there is no nexus between additional elements that could cause the combination to provide an inventive concept, claim 5 is subject-matter ineligible.
Regarding claim 6:
Subject Matter Eligibility Analysis Step 1:
As stated in the 101 rejections for claim 6 above, claim 6 recites “A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor” which can be construed as software per se and hence claim 6 does not fall under one of the four statutory categories. For further examination purposes, however, it will be assumed that the “computer-readable storage medium” in claim 6 includes a non-transitory computer-readable medium which is an article of manufacture and falls under one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Since claim 6 discloses the same steps as claim 1, it recites abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 6 recites the same additional elements as claim 1 and further includes the element:
“wherein when the computer program is executed by a processor”: This element does not integrate the abstract ideas into a practical application because it recites a generic computing component (MPEP 2106.05(f)).
Hence, claim 6 is directed to the abstract ideas.
Subject Matter Eligibility Analysis Step 2B:
The additional elements in claim 6 do not provide significantly more than the abstract ideas themselves, taken alone and in combination, because of the same rationale as claim 1 and because the element:
“wherein when the computer program is executed by a processor”: recites a generic computing component (MPEP 2106.05(f)).
Since there is no nexus between additional elements that could cause the combination to provide an inventive concept, claim 6 is subject-matter ineligible.
Pertinent Prior Art
The Examiner notes that, due to the various 112(b) rejections of the claims above, a prior art rejection under 35 U.S.C. 102 and/or 35 U.S.C. 103 has not been applied to the claims at this time. However, the Examiner notes the pertinent prior art references below that may be applicable to claims 1-6: Zhu et al.’s “Multimodal Fusion Method Based on Self-Attention Mechanism” and Song et al.’s “Research on Scattering Transform of Urban Sound Events Detection Based on Self-Attention Mechanism”.
Zhu et al. discloses a multimodal fusion model that fuses visual, audio, and language data by applying self-attention to the visual, audio, and language data to form a multimodal fusion and then inputting the multimodal fusion to a prediction model for a classification task, as summarized in the diagram below:
[Diagram: Zhu et al.’s multimodal fusion model]
Song et al. also teaches a fusion model, but for urban sound event data, based on self-attention: relevant semantic features are filtered using a one-dimensional convolution, and the position information for each feature is then embedded. The encoded positional information is then input to a transformer model that performs a self-attention mechanism using multi-head attention (Song et al., Section II).
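For general context only, the pipeline summarized above (a one-dimensional convolution over the input feature sequence, followed by positional encoding and a self-attention calculation) can be sketched as follows. This NumPy sketch is illustrative of the generic technique; the kernel width, feature dimensions, and random projection weights are hypothetical and are not taken from Song et al., Zhu et al., or the claimed invention.

```python
import numpy as np

def conv1d(x, kernel_size, d_out, rng):
    # x: (T, d_in) feature sequence; slide a window of `kernel_size` steps
    # and project each flattened window to d_out dims (same output length
    # via zero padding), analogous to a generic Conv1D layer.
    T, d_in = x.shape
    w = rng.standard_normal((kernel_size * d_in, d_out)) * 0.1
    pad = kernel_size // 2
    xp = np.pad(x, ((pad, kernel_size - 1 - pad), (0, 0)))
    windows = np.stack([xp[t:t + kernel_size].ravel() for t in range(T)])
    return windows @ w  # (T, d_out)

def positional_encoding(T, d):
    # Standard sinusoidal encoding so that order information is retained
    # through the subsequent (order-agnostic) attention calculation.
    pos = np.arange(T)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(x):
    # Single-head scaled dot-product self-attention with identity
    # query/key/value projections, for simplicity.
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
audio = rng.standard_normal((50, 8))            # T=50 frames, 8 raw features
feat = conv1d(audio, kernel_size=3, d_out=16, rng=rng)
feat = feat + positional_encoding(*feat.shape)  # inject time information
fused = self_attention(feat)                    # attention-weighted features
print(fused.shape)  # (50, 16)
```

A full transformer would add learned projections, multiple heads, and feedforward sublayers; the sketch shows only the ordering of the three stages.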
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RADIAH ALA ISLAM whose telephone number is (571)270-3483. The examiner can normally be reached Monday-Thursday 9:00am-7:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold can be reached at (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/R.A.I./Examiner, Art Unit 2148
/MICHELLE T BECHTOLD/Supervisory Patent Examiner, Art Unit 2148