Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Amendments
This action is in response to the amendments filed November 13, 2025, in which Claims 1-4, 8, 11-15, 19, & 22 have been amended. No claims have been added. Claims 6-7, 9, 17-18, & 20 have been cancelled. The amendments have been entered, and Claims 1-5, 8, 10-16, 19, & 21-22 are currently pending.
Response to Arguments
Regarding the applicant’s traversal of the 35 U.S.C. 101 rejections of the previous office action, the applicant’s arguments filed November 13, 2025 have been fully considered but are unpersuasive.
The applicant asserts that claims 1 and 12, as amended, do not recite any judicial exception but merely involve one, and further asserts that the claims do not set forth any mathematical relationships, calculations, formulas, or equations using words or mathematical symbols.
The examiner respectfully submits that the previous office action never asserted that claims 1 or 12 recite mathematical processes. It is recognized that employing the detection sub-network, for example, does “involve” mathematical processes, but these processes are not positively recited in the limitations, and the claims were therefore not rejected under this rationale. Five limitations of claim 1 do, however, recite mental processes, as follows:
“A method of generating a prediction value of a Neural Network (NN)” (A person can mentally evaluate data of a neural network and make a judgement to generate a “prediction value” based on it (MPEP 2106).)
“generating… a plurality of features based on an input object” (A person can mentally evaluate an input object and make a judgement to generate a list of features based on it (MPEP 2106).)
“generating… the prediction value based on the other human-interpretable output and the second portion of the input object” (A person can mentally evaluate human-interpretable output in relation to the input and make a judgement to generate a “prediction value” based on that (MPEP 2106).)
“generating… a detection output based on the plurality of features… indicative of a human-interpretable output for a first portion of the input object” (A person can mentally evaluate the plurality of features and make a judgement to generate a detection output (communicate something they noticed) in a way that is human-interpretable for a portion of the input (MPEP 2106).)
“generating… a modified detection output being indicative of an other human-interpretable output for the second portion of the input object, the other human-interpretable output being different from the human-interpretable output” (A person can mentally evaluate further portions of the input and make a judgement to modify the detection output (changing thoughts based on new information) in a way that is human-interpretable for the second portion of the input, this new modified output being different than the previous human-interpretable output (MPEP 2106).)
Similar limitations are also present in claim 12. These mental process limitations are found to recite abstract ideas, which means that the additional elements of the claims must be examined to determine whether they integrate the judicial exception into a practical application. The analysis for this is detailed in the previous action, and below for the limitations as amended.
Further, the applicant asserts that any abstract idea in the claims would also be integrated into a practical application, citing paragraphs [04] & [09] of the specification, and further specifying that explaining outputs in terms of human-interpretable intermediate concepts, and allowing humans to correct those outputs if deemed erroneous, allows for intervention and provides greater interpretability to the end user.
The examiner respectfully submits that these improvements appear to be directed toward the generation of the human-interpretable output and the modified human-interpretable output, which are themselves abstract ideas and therefore cannot be relied upon as “additional elements” that integrate the abstract ideas into a practical application. When examining the claim as a whole, it is the “additional elements,” and not the abstract limitations themselves, that can integrate the abstract limitations into a practical application, as shown in MPEP 2106.04 at Prong Two:
“Prong Two asks does the claim recite additional elements that integrate the judicial exception into a practical application? In Prong Two, examiners evaluate whether the claim as a whole integrates the exception into a practical application of that exception. If the additional elements in the claim integrate the recited exception into a practical application of the exception, then the claim is not directed to the judicial exception (Step 2A: NO) and thus is eligible at Pathway B. This concludes the eligibility analysis. If, however, the additional elements do not integrate the exception into a practical application, then the claim is directed to the recited judicial exception (Step 2A: YES), and requires further analysis under Step 2B (where it may still be eligible if it amounts to an “inventive concept”). For more information on how to evaluate whether a judicial exception is integrated into a practical application …”
Further, the reception of user input merely recites an insignificant extra-solution activity (mere data gathering) (MPEP 2106.05(g)), which does not provide evidence of integration into a practical application.
Therefore, the 35 U.S.C. 101 rejections of the previous action are maintained.
Regarding the applicant’s traversal of the 35 U.S.C. 102 rejections of the previous office action, the applicant’s arguments filed November 13, 2025 have been fully considered but are unpersuasive.
The applicant asserts that JEYUKUMAR does not describe or suggest receiving user input of a selection of a second portion of an input object and generating a modified detection output based on the second portion, or receiving user input with a modification to human-interpretable output.
The examiner respectfully submits that these limitations are taught as follows:
JEYUKUMAR teaches “outputting, by the at least one processor, a user interface comprising the detection output” [Figure 5]
In figure 5, the prediction values and human-interpretable detection outputs can be seen presented on a user interface.
Further, JEYUKUMAR teaches “receiving, by the at least one processor and via the user interface, user input comprising a selection of a second portion of the input object, the second portion being different from the first portion”:
[Figure 1]
The input object (the dataset of videos, X) is shown being input into a “Feature Extractor” that extracts “Latent features”/“a plurality of features” based on the input object. Because multiple videos are input and a plurality of features is extracted, the user input comprises a second portion of the input object, and the extraction of the “plurality of features” means that a “selection” of the second portion, which is different from the first portion, takes place as well.
Further, JEYUKUMAR teaches “generating, by the at least one processor employing the detection sub-network, a modified detection output being indicative of an other human-interpretable output for the second portion of the input object, the other human-interpretable output being different from the human-interpretable output”:
[Figure 5]
In figure 5, we can see the “human-interpretable output” (e.g. “the batter hit the ball into foul territory”) and the “second portion of the input object”/the extracted features (e.g. the “Predicted Concepts”) which are shown to have multiple portions different from one another.
And further:
([Figure 2] The figure explicitly shows the detection output being “modified”.)
Therefore, the 35 U.S.C. 102 rejections for claims 1 & 12 are maintained. Further, claims 2-5, 8, 10-11, 13-16, 19, & 21-22 depend upon these claims and are therefore rejected under the same rationale, in addition to being addressed on their own merits as provided in the previous action, and below, as amended.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-5, 8, 10-16, 19, & 21-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea (a mental process) without significantly more.
Regarding claim 1, in Step 1 of the 101-analysis set forth in MPEP 2106, the claim recites “A method of generating a prediction value”. A method is one of the four statutory categories of invention.
In Step 2a Prong 1 of the 101-analysis set forth in the MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for recitation of generic computer components:
(i) “A method of generating a prediction value of a Neural Network (NN)” (A person can mentally evaluate data of a neural network and make a judgement to generate a “prediction value” based on it (MPEP 2106).)
(ii) “generating… a plurality of features based on an input object” (A person can mentally evaluate an input object and make a judgement to generate a list of features based on it (MPEP 2106).)
(iii) “generating… the prediction value based on the other human-interpretable output and the second portion of the input object” (A person can mentally evaluate human-interpretable output in relation to the input and make a judgement to generate a “prediction value” based on that (MPEP 2106).)
(iv) “generating… a detection output based on the plurality of features… indicative of a human-interpretable output for a first portion of the input object” (A person can mentally evaluate the plurality of features and make a judgement to generate a detection output (communicate something they noticed) in a way that is human-interpretable for a portion of the input (MPEP 2106).)
(v) “generating… a modified detection output being indicative of an other human-interpretable output for the second portion of the input object, the other human-interpretable output being different from the human-interpretable output” (A person can mentally evaluate further portions of the input and make a judgement to modify the detection output (changing thoughts based on new information) in a way that is human-interpretable for the second portion of the input, this new modified output being different than the previous human-interpretable output (MPEP 2106).)
If claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental process grouping of abstract ideas. Accordingly, the claim “recites” an abstract idea.
In Step 2a Prong 2 of the 101-analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
(vi) “…the method executable by at least one processor…” (Uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).)
(vii) “…by the at least one processor employing a feature extraction sub-network…” (Uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).)
(viii) “…by the at least one processor employing a detection sub-network… the detection sub-network having been trained to generate the detection output” (Uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).)
(ix) “outputting, by the at least one processor, a user interface comprising the detection output” (Adding insignificant extra-solution activity (mere data output) to the judicial exception (MPEP 2106.05(g)).)
(x) “receiving, by the at least one processor and via the user interface, user input comprising a selection of a second portion of the input object, the second portion being different from the first portion” (Adding insignificant extra-solution activity (mere data gathering) to the judicial exception (MPEP 2106.05(g)).)
(xi) “…by the at least one processor employing the detection sub-network…” (Uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).)
(xii) “…by the at least one processor employing a prediction sub-network…” (Uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).)
(xiii) “providing, by the at least one processor, an indication of the prediction value and the other human-interpretable output via the user interface” (Adding insignificant extra-solution activity (mere data output) to the judicial exception (MPEP 2106.05(g)).)
Since the claim as a whole, considering the additional elements individually and in combination, does not contain any additional elements that are indicative of integration into a practical application, the claim is “directed to” an abstract idea.
In Step 2b of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, additional elements (vi), (vii), (viii), (xi), & (xii) recite use of a computer as a tool to perform the abstract idea, which is not indicative of significantly more. Additional elements (ix), (x), (xiii) recite insignificant extra-solution activities. Further, elements (ix) & (xiii) recite steps that present output of data which has been determined by the courts to recite a well-understood, routine, and conventional activity which is not indicative of significantly more (Presenting offers and gathering statistics, OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93). Further, element (x) recites steps of receiving/transmitting data via a network, which has been determined by the courts to recite a well-understood, routine, and conventional activity, which is not indicative of significantly more (Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362). Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Regarding claim 2, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to, claim 1. Further, claim 2 recites “providing, by the at least one processor, an indication of the first portion of the input object to the user interface, in addition to the prediction value and the human-interpretable output” (In step 2A, prong 2, this adds insignificant extra-solution activity (mere data output) to the judicial exception (MPEP 2106.05(g)). In step 2B, the courts have found steps that present output of data to be a well-understood, routine, and conventional activity, which is not indicative of significantly more (Presenting offers and gathering statistics, OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93).)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application, nor provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 3, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 3 recites the following additional mental process:
“generating… an other prediction value based on the plurality of features” (A person can mentally evaluate the plurality of features and make a judgement to generate another “prediction value” based on that (MPEP 2106).)
Further, claim 3 recites “…by the at least one processor employing an other prediction sub-network…” (In step 2A, prong 2, this recites using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). In step 2B, using a computer as a tool to perform an abstract idea is not indicative of significantly more.)
Further, claim 3 recites “providing, by the at least one processor, an indication of the other prediction value to the user interface, in addition to the prediction value and the human-interpretable output” (In step 2A, prong 2, this adds insignificant extra-solution activity (mere data output) to the judicial exception (MPEP 2106.05(g)). In step 2B, the courts have found steps that present output of data to be a well-understood, routine, and conventional activity, which is not indicative of significantly more (Presenting offers and gathering statistics, OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93).)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application, nor provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 4, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 4 recites “wherein the feature extraction sub-network is a Convolutional Neural Network (CNN)” (In step 2a, prong 2, this recites generally linking the use of the judicial exception to a particular technological environment or field of use (MPEP 2106.05(h).) In step 2B, generally linking the use of the judicial exception to a particular technological environment or field of use is not indicative of significantly more.)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application, nor provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 5, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 5 recites “wherein the prediction sub-network is at least one of a classification sub-network and a regression sub-network” (In step 2a, prong 2, this recites generally linking the use of the judicial exception to a particular technological environment or field of use (MPEP 2106.05(h).) In step 2B, generally linking the use of the judicial exception to a particular technological environment or field of use is not indicative of significantly more.)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application, nor provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 8, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 8 recites the following additional mental process:
“determining… a location of the first portion in the input object based on the plurality of features” (A person can mentally evaluate an input object in relation to a plurality of features and make a judgement to determine a location of the first portion in the input object based on that plurality of features (MPEP 2106).)
Further, claim 8 recites “wherein the detection sub-network includes a Location Learning Network (LLN) and a Concept Learning Network (CLN)” (In step 2a, prong 2, this recites generally linking the use of the judicial exception to a particular technological environment or field of use (MPEP 2106.05(h)). In step 2B, generally linking the use of the judicial exception to a particular technological environment or field of use is not indicative of significantly more.)
Further, claim 8 recites “…by the at least one processor employing the LLN…” (In step 2A, prong 2, this recites using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). In step 2B, using a computer as a tool to perform an abstract idea is not indicative of significantly more.)
Further, claim 8 recites “generating, by the at least one processor employing the CLN, the human-interpretable output based on the location of the first portion and the plurality of features” (In step 2A, prong 2, this adds insignificant extra-solution activity (mere data output) to the judicial exception (MPEP 2106.05(g)). In step 2B, the courts have found steps that present output of data to be a well-understood, routine, and conventional activity, which is not indicative of significantly more (Presenting offers and gathering statistics, OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93).)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application, nor provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 10, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 10 recites “wherein the input object is at least one of: an image file, an audio file, and a video file” (In step 2a, prong 2, this recites generally linking the use of the judicial exception to a particular technological environment or field of use (MPEP 2106.05(h).) In step 2B, generally linking the use of the judicial exception to a particular technological environment or field of use is not indicative of significantly more.)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application, nor provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 11, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 11 recites “wherein the human-interpretable output is for enabling a user of the at least one processor to evaluate accuracy of the prediction value” (In step 2a, prong 2, this recites generally linking the use of the judicial exception to a particular technological environment or field of use (MPEP 2106.05(h).) In step 2B, generally linking the use of the judicial exception to a particular technological environment or field of use is not indicative of significantly more.)
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application, nor provide significantly more than the judicial exception, the claim is not patent eligible.
Regarding claim 12, in Step 1 of the 101-analysis set forth in MPEP 2106, the claim recites “A system for generating a prediction value”. A system is within one of the four statutory categories of invention.
In Step 2a Prong 1 of the 101-analysis set forth in the MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for recitation of generic computer components:
(i) “generating a prediction value of a Neural Network (NN)” (A person can mentally evaluate data of a neural network and make a judgement to generate a “prediction value” based on it (MPEP 2106).)
(ii) “generate… a plurality of features based on an input object” (A person can mentally evaluate an input object and make a judgement to generate a list of features based on it (MPEP 2106).)
(iii) “generate… the prediction value based on the modified human-interpretable output and the given portion of the input object” (A person can mentally evaluate human-interpretable output in relation to the input and make a judgement to generate a “prediction value” based on that (MPEP 2106).)
(iv) “generate… a detection output based on the plurality of features… indicative of a human-interpretable output for a given portion of the input object” (A person can mentally evaluate the plurality of features and make a judgement to generate a detection output (communicate something they noticed) in a way that is human-interpretable for a portion of the input (MPEP 2106).)
(v) “generate, based on the user input, a modified human-interpretable output” (A person can mentally evaluate the received modification and make a judgement to modify the human-interpretable output (changing thoughts based on new information) accordingly (MPEP 2106).)
If claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental process grouping of abstract ideas. Accordingly, the claim “recites” an abstract idea.
In Step 2a Prong 2 of the 101-analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
(vi) “A system… comprising at least one processor and at least one memory, the at least one memory comprising instructions which, upon being executed by the at least one processor, cause the at least one processor to:…” (Uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).)
(vii) “…by employing a feature extraction sub-network…” (Mere instructions to apply the judicial exception (MPEP 2106.05(f)).)
(viii) “…by employing a detection sub-network… the detection sub-network having been trained to generate the detection output” (Mere instructions to apply the judicial exception (MPEP 2106.05(f)).)
(ix) “output a user interface comprising the human-interpretable output” (Adding insignificant extra-solution activity (mere data output) to the judicial exception (MPEP 2106.05(g)).)
(x) “receive, via the user interface, user input comprising a modification to the human-interpretable output” (Adding insignificant extra-solution activity (mere data gathering) to the judicial exception (MPEP 2106.05(g)).)
(xi) “…by employing a prediction sub-network…” (Mere instructions to apply the judicial exception (MPEP 2106.05(f)).)
(xii) “provide an indication of the prediction value and the modified human-interpretable output via the user interface” (Adding insignificant extra-solution activity (mere data output) to the judicial exception (MPEP 2106.05(g)).)
Since the claim as a whole, considering the additional elements individually and in combination, does not contain any additional elements that are indicative of integration into a practical application, the claim is “directed to” an abstract idea.
In Step 2b of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, additional element (vi) recites use of a computer as a tool to perform the abstract idea, which is not indicative of significantly more. Additional elements (vii), (viii), & (xi) recite mere instructions to apply the judicial exception, which are not indicative of significantly more. Additional elements (ix), (x), & (xii) recite insignificant extra-solution activities. Further, elements (ix) & (xii) recite steps that present output of data which has been determined by the courts to recite a well-understood, routine, and conventional activity which is not indicative of significantly more (Presenting offers and gathering statistics, OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93). Further, element (x) recites steps of receiving/transmitting data via a network, which has been determined by the courts to recite a well-understood, routine, and conventional activity, which is not indicative of significantly more (Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362). Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Regarding claims 13-16, 19, & 21-22, they are dependent upon claim 12, and thereby incorporate the limitations of, and corresponding analysis applied to claim 12. Further, claims 13-16, 19, & 21-22 recite similar additional limitations as claims 2-5, 8, & 10-11 respectively, and are rejected under the same rationale.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-5, 8, 10-16, 19, & 21-22 are rejected under 35 U.S.C. 102(a)(1) as being clearly anticipated by Jeyukumar, J. et al., “Automatic Concept Extraction for Concept Bottleneck-based Video Classification,” available at https://arxiv.org/pdf/2206.10129, June 21, 2022 (hereafter, JEYUKUMAR).
Regarding claim 1, JEYUKUMAR teaches “A method of generating a prediction value of a Neural Network (NN), the method executable by a processor” [Figure 1]
In Figure 1, the dotted arrows show “Model Outputs,” which can be seen to include “Concepts Prediction” (a prediction), “Concept Scores” (a prediction value), and “Predicted Label.” The method is a machine learning (ML) method using a Concept Bottleneck Model, meaning that “execution by a processor” is inherent, since processors are a known requirement for ML models, as shown at https://www.einfochips.com/blog/everything-you-need-to-know-about-hardware-requirements-for-machine-learning/.
Further, JEYUKUMAR teaches “the method comprising: generating, by the at least one processor employing a feature extraction sub-network, a plurality of features based on an input object” [Figure 1]
The Input Object (Dataset (Videos) (X)) is shown being input into a “Feature Extractor” that extracts “Latent features”/ “A plurality of features” based on the input object.
And further:
([3.2 Concept Bottleneck Model, Paragraphs 1-2] “We use the videos, the extracted concepts from CoDEx, and the labels to train an interpretable concept-bottleneck model to predict the activity and the corresponding concepts. …
Our bottleneck model architecture is based on the standard end-to-end video classification models where we use convolutional neural network-based feature extractors pretrained on the Imagenet dataset [8] to extract the spatial features from the videos (the feature extraction network(s) is/are subnetwork(s) of the Concept Bottleneck Model). The features are then passed through temporal layers that can capture features across multiple frames which in turn is bottle-necked to predict the concepts. …”)
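To illustrate the general kind of pipeline the quoted passage describes (a feature-extraction sub-network whose output is bottlenecked into concept predictions, which in turn drive a label prediction), a minimal sketch is provided below. It is not taken from JEYUKUMAR; the module names, layer choices, and dimensions are assumptions introduced solely for illustration.

import torch
import torch.nn as nn


class ConceptBottleneckSketch(nn.Module):
    """Hypothetical concept-bottleneck pipeline: features -> concepts -> label."""

    def __init__(self, feat_dim=512, num_concepts=20, num_classes=10):
        super().__init__()
        # Feature-extraction sub-network (a stand-in for a pretrained CNN backbone).
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Temporal aggregation across frames (a simple GRU is assumed here).
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        # Detection stage: bottleneck that predicts human-interpretable concepts.
        self.concept_head = nn.Linear(feat_dim, num_concepts)
        # Prediction stage: label predicted from the concept predictions alone.
        self.label_head = nn.Linear(num_concepts, num_classes)

    def forward(self, frames):
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        feats = self.feature_extractor(frames.view(b * t, c, h, w)).view(b, t, -1)
        _, last = self.temporal(feats)  # aggregate features over time
        concepts = torch.sigmoid(self.concept_head(last.squeeze(0)))
        label_logits = self.label_head(concepts)
        return concepts, label_logits


# Example usage with a dummy clip of 8 RGB frames at 64x64 resolution:
model = ConceptBottleneckSketch()
concepts, logits = model(torch.randn(2, 8, 3, 64, 64))

The salient design point for the mapping above is that, in such a pipeline, the label prediction consumes only the intermediate concept predictions, which is what makes that intermediate output human-interpretable.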
Further, JEYUKUMAR teaches “generating, by the at least one processor employing a detection sub-network, a detection output based on the plurality of features, the detection sub-network having been trained to generate the detection output indicative of a human-interpretable output for a first portion of the input object”:
([3.2 Concept Bottleneck Model, Paragraphs 2-4] “Our bottleneck model architecture is based on the standard end-to-end video classification models where we use convolutional neural network-based feature extractors pretrained on the Imagenet dataset [8] to extract the spatial features from the videos (the feature extraction network(s) is/are subnetwork(s) of the Concept Bottleneck Model). The features are then passed through temporal layers that can capture features across multiple frames which in turn is bottle-necked to predict the concepts. Lastly, we deploy an additive attention module [3] that gives the concept score 𝛼𝑐 indicating the importance of every concept to the classification (a detection sub-network that generates detection outputs based on the plurality of features extracted by the feature extractor). …
Model loss function. The entire bottleneck classification model is trained in an end-to-end manner. … The hyperparameter 𝛽 controls the tradeoff between concept loss, 𝐿𝐶, versus classification loss, 𝐿𝑌 as shown in equation 3.
[Equations from JEYUKUMAR section 3.2, reproduced as images in the original citation.]
Testing phase. Given an input test video, the model provides us with the activity prediction (label of the video) (the actual prediction), a concept vector indicating the relevant concepts that induced this classification and the concept importance score for each concept (a prediction value). By retrieving the phrase representing the concepts present in the video, the result obtained is a human-understandable explanation (human-interpretable output for a first portion (based on feature extraction) of the input object) of the classification.”)
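For ease of reference, the tradeoff described in the quoted passage (and depicted in the equation images reproduced in the citation) can be summarized in the following general form; this is an assumption consistent with the quoted description rather than a verbatim reproduction of JEYUKUMAR’s equation 3:

L_total = L_Y + β · L_C

where L_Y is the classification loss, L_C is the concept loss, and the hyperparameter β controls the tradeoff between the two.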
Further, JEYUKUMAR teaches “outputting, by the at least one processor, a user interface comprising the detection output” [Figure 5]
In figure 5, the prediction values and human-interpretable detection outputs can be seen presented on a user interface.
Further, JEYUKUMAR teaches “receiving, by the at least one processor and via the user interface, user input comprising a selection of a second portion of the input object, the second portion being different from the first portion”:
[Figure 1]
The input object (the dataset of videos, X) is shown being input into a “Feature Extractor” that extracts “Latent features”/“a plurality of features” based on the input object. Because multiple videos are input and a plurality of features is extracted, the user input comprises a second portion of the input object, and the extraction of the “plurality of features” means that a “selection” of the second portion, which is different from the first portion, takes place as well.
Further, JEYUKUMAR teaches “generating, by the at least one processor employing the detection sub-network, a modified detection output being indicative of an other human-interpretable output for the second portion of the input object, the other human-interpretable output being different from the human-interpretable output”:
[Figure 5]
In figure 5, we can see the “human-interpretable output” (e.g. “the batter hit the ball into foul territory”) and the “second portion of the input object”/the extracted features (e.g. the “Predicted Concepts”) which are shown to have multiple portions different from one another.
And further:
([Figure 2] The figure explicitly shows the detection output being “modified”.)
Further, JEYUKUMAR teaches “generating, by the at least one processor employing a prediction sub-network, the prediction value based on the other human-interpretable output and the second portion of the input object” [Figure 5]
In figure 5, the “Concept Score” can be seen as the “prediction value” which is based upon the “human-interpretable output” (e.g. “the batter hit the ball into foul territory”) and the “second portion of the input object”/the extracted features (e.g. the “Predicted Concepts”).
Further, JEYUKUMAR teaches “providing, by the at least one processor, an indication of the prediction value and the other human-interpretable output via the user interface” [Figure 5]
In figure 5, the prediction values and human-interpretable outputs can be seen presented on a user interface.
Regarding claim 2, JEYUKUMAR teaches the limitations of claim 1. Further, JEYUKUMAR teaches “wherein the method further comprises: providing, by the at least one processor, an indication of the first portion of the input object to the user interface, in addition to the prediction value and the human-interpretable output” [Figure 5]
In figure 5, an “indication of the first portion of the input object” (e.g., “Predicted Class: ‘foul’”) is shown on the interface in addition to the prediction value/Concept Score and the human-interpretable output (e.g., “the batter hit the ball into foul territory”).
Regarding claim 3, JEYUKUMAR teaches the limitations of claim 1. Further, JEYUKUMAR teaches “wherein the method further comprises: generating, by the processor employing an other prediction sub-network, an other prediction value based on the plurality of features”:
([3.2 Concept Bottleneck Model, Paragraphs 2-4] “Our bottleneck model architecture is based on the standard end-to-end video classification models where we use convolutional neural network-based feature extractors pretrained on the Imagenet dataset [8] to extract the spatial features from the videos (the feature extraction network(s) is/are subnetwork(s) of the Concept Bottleneck Model. Take note that multiple may be used including a second sub-network). The features are then passed through temporal layers that can capture features across multiple frames which in turn is bottle-necked to predict the concepts. Lastly, we deploy an additive attention module [3] that gives the concept score 𝛼𝑐 indicating the importance of every concept to the classification (a detection sub-network that generates detection outputs based on the plurality of features extracted by the feature extractor. A second sub-network may be used as cited above.). The attention module also improves the interpretability of the bottleneck model by indicating the key concepts for classification and this is evaluated in section 5. More details regarding the model architecture and hyper-parameters are in the Appendix A.4
…
Testing phase. Given an input test video, the model provides us with the activity prediction (label of the video) (the actual prediction), a concept vector indicating the relevant concepts that induced this classification and the concept importance score for each concept (a prediction value). By retrieving the phrase representing the concepts present in the video, the result obtained is a human-understandable explanation (human-interpretable output for a given portion (based on feature extraction) of the input object) of the classification.”) Further, as mentioned above, the use of a processor in ML methods is inherent.
And further [Figure 5]
In figure 5, multiple concepts (e.g. “the batter hit the ball into foul territory” & “the batter made contact with the ball” etc.) are noted with their own Concept Scores/Prediction values.
Further, JEYUKUMAR teaches “providing, by the at least one processor, an indication of the other prediction value to the user interface, in addition to the prediction value and the human-interpretable output” [Figure 5]
The multiple prediction values as cited above are all shown to be presented via the same user interface, alongside the human-interpretable output. Further, as mentioned above, the use of a processor in ML methods is inherent.
Regarding claim 4, JEYUKUMAR teaches the limitations of claim 1. Further, JEYUKUMAR teaches “wherein the feature extraction sub-network is a Convolutional Neural Network (CNN)”:
([3.2 Concept Bottleneck Model, Paragraph 2] “Our bottleneck model architecture is based on the standard end-to-end video classification models where we use convolutional neural network-based feature extractors pretrained on the Imagenet dataset [8] to extract the spatial features from the videos (the feature extraction network(s) is/are subnetwork(s) convolutional neural network(s)). …”)
Regarding claim 5, JEYUKUMAR teaches the limitations of claim 1. Further, JEYUKUMAR teaches “wherein the prediction sub-network is at least one of a classification sub-network and a regression sub-network”:
([3.2 Concept Bottleneck Model, Paragraph 2] “…Lastly, we deploy an additive attention module [3] (the prediction sub-network) that gives the concept score 𝛼𝑐 indicating the importance of every concept to the classification (a classification sub-network). The attention module also improves the interpretability of the bottleneck model by indicating the key concepts for classification (a classification sub-network) and this is evaluated in section 5. …”)
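As context for the “concept score” mapping above, a brief sketch of an additive-attention scoring step of the general kind the quoted passage names is provided below. It is an assumption for illustration only, not JEYUKUMAR’s actual module; the class name, dimensions, and pooling step are hypothetical.

import torch
import torch.nn as nn


class AdditiveConceptAttention(nn.Module):
    """Hypothetical additive attention that scores the importance of each concept."""

    def __init__(self, concept_dim=64, hidden_dim=32):
        super().__init__()
        self.proj = nn.Linear(concept_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, concept_embeddings):
        # concept_embeddings: (batch, num_concepts, concept_dim)
        energies = self.score(torch.tanh(self.proj(concept_embeddings))).squeeze(-1)
        alphas = torch.softmax(energies, dim=-1)  # one importance score per concept
        pooled = (alphas.unsqueeze(-1) * concept_embeddings).sum(dim=1)
        return alphas, pooled


# Example usage: score 20 concepts, each represented by a 64-dimensional embedding.
attn = AdditiveConceptAttention()
alphas, pooled = attn(torch.randn(2, 20, 64))

In a scheme of this kind, each entry of alphas plays the role of a per-concept importance score that can be surfaced alongside the classification.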
Regarding claim 8, JEYUKUMAR teaches the limitations of claim 1. Further, JEYUKUMAR teaches “wherein the detection sub-network includes a Location Learning Network (LLN) and a Concept Learning Network (CLN)”:
([3.2 Concept Bottleneck Model, Paragraphs 1-2] “We use the videos, the extracted concepts (learned concepts from a neural network meaning it is a “Concept Learning Network”) from CoDEx, and the labels to train an interpretable concept-bottleneck model to predict the activity and the corresponding concepts (learned concepts from a neural network meaning it is a “Concept Learning Network”). Figure 1 shows the overview of our bottleneck architecture. The activity label, the concepts, and the corresponding concept scores are the outputs of the interpretable model and are indicated by dotted arrows in Figure 1 (learned concepts from a neural network meaning it is a “Concept Learning Network”).
Our bottleneck model architecture is based on the standard end-to-end video classification models where we use convolutional neural network-based feature extractors pretrained on the Imagenet dataset [8] to extract the spatial features from the videos (extraction of spatial features means that this is a “location learning network”). The features are then passed through temporal layers that can capture features across multiple frames which in turn is bottle-necked to predict the concepts (learned concepts from a neural network meaning it is a “Concept Learning Network”) …”)
Further, JEYUKUMAR teaches “wherein the generating detection output comprises: determining, by the at least one processor employing the LLN, a location of the first portion in the input object based on the plurality of features”:
([3.2 Concept Bottleneck Model, Paragraph 2] “Our bottleneck model architecture is based on the standard end-to-end video classification models where we use convolutional neural network-based feature extractors pretrained on the Imagenet dataset [8] to extract the spatial features from the videos (extraction of spatial features means that this is a “location learning network” where the location of the given portion in the input object is determined based on the features). The features are then passed through temporal layers that can capture features across multiple frames which in turn is bottle-necked to predict the concepts. …”)
Further, JEYUKUMAR teaches “generating, by the at least one processor employing the CLN, the human-interpretable output based on the location of the first portion and the plurality of features”:
([3.2 Concept Bottleneck Model, Paragraph 2] “Our bottleneck model architecture is based on the standard end-to-end video classification models where we use convolutional neural network-based feature extractors pretrained on the Imagenet dataset [8] to extract the spatial features from the videos (extraction of spatial features means that this is a “location learning network” where the location of the given portion in the input object is determined based on the features). The features are then passed through temporal layers that can capture features across multiple frames which in turn is bottle-necked to predict the concepts. …”)
And further, [Figure 5]
In figure 5, the “human-interpretable output” (e.g., “the batter hit the ball into foul territory”) and the “first portion of the input object”/the extracted features (e.g., the “Predicted Concepts”) can be seen. As shown in the prior citation, these would have been influenced by the spatial location of the relevant input portions. Further, as mentioned above, the use of a processor in ML methods is inherent.
Regarding claim 10, JEYUKUMAR teaches the limitations of claim 1. Further, JEYUKUMAR teaches “wherein the input object is at least one of: an image file, an audio file, and a video file” [Figure 1]
The Input Object (Dataset (Videos) (X)) is shown as the set of input objects which are video files.
Regarding claim 11, JEYUKUMAR teaches the limitations of claim 1. Further, JEYUKUMAR teaches “wherein the human-interpretable output is for enabling a user of the at least one processor to evaluate accuracy of the prediction value”:
([Introduction, Paragraph 2, Research Questions] “In summary, this paper seeks to answer the following research questions:
How can a machine automatically elicit the inherent complex concepts from natural language to construct a necessary and sufficient set of concepts for video classification tasks?
Given that a machine can extract such concepts, are they informative and meaningful enough to be detected in videos by DNNs for downstream prediction tasks?
Are the machine extracted concepts perceived by humans as good explanations for the correct classifications?” In other words, the human-interpretable output (the extracted concepts) enables humans to evaluate whether the prediction value is accurate.)
Regarding claim 12, JEYUKUMAR teaches “A system for generating a prediction value of a Neural Network (NN), the system comprising at least one processor and at least one memory, the at least one memory comprising instructions which, upon being executed by the at least one processor, cause the at least one processor to:” [Figure 1]
In Figure 1, the dotted arrows show “Model Outputs,” which can be seen to include “Concepts Prediction” (a prediction), “Concept Scores” (a prediction value), and “Predicted Label.” The method is a machine learning (ML) method using a Concept Bottleneck Model, meaning that the presence and execution of a processor with some form of memory is inherent, as these are well-known requirements of ML models, as shown at https://www.einfochips.com/blog/everything-you-need-to-know-about-hardware-requirements-for-machine-learning/.
Further, claim 12 recites similar additional limitations as claim 1 and is rejected under the same rationale.
Regarding claims 13-16, 19, & 21-22, JEYUKUMAR teaches the limitations of claim 12. Further, claims 13-16, 19, & 21-22 recite similar additional limitations as claims 2-5, 8, & 10-11, respectively, and are rejected under the same rationale.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW LEE LEWIS whose telephone number is (571)272-1906. The examiner can normally be reached Monday: 12:00PM - 4:00PM and Tuesday - Friday: 12:00PM - 9PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle can be reached at (571)272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Matthew Lee Lewis/
Examiner, Art Unit 2144

/TAMARA T KYLE/
Supervisory Patent Examiner, Art Unit 2144