Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is in reply to the amendments and remarks filed on 10/29/2025.
Claims 1-16 and 21-24 are pending.
Claims 1, 6, 9, 10, and 13-14 have been amended.
Claims 17-20 were previously canceled.
Response to Arguments
Applicant’s arguments with respect to the rejection(s) of claim(s) 1, 9, and 13 under 35 U.S.C. 103 have been considered but are not persuasive. Applicant argues that no reference teaches the amended claim limitation that now states “apply, after performing the first convolution calculation and the second convolution calculation, a task-specific filter to the dedicated output data to selectively obtain a subset of the dedicated output data corresponding to the first task” since “Kurokawa does not disclose applying a post-convolution, task-specific filter to select a subset of dedicated output data based on the current task”. The examiner respectfully disagrees in view of the breadth of the claim language.
Kurokawa has been found to teach the amended limitations because paragraphs 0108, 0111, 0115, 0162-0169, and Fig. 3 teach that a “CNN can be formed of a plurality of convolution layers CL (first/second convolution calculation)…in which z layers L (a layer L.sub.1 to a layer L.sub.z) (here, z is an integer greater than or equal to 1)” and further that the “neural network [can be] including L layers (here, L is an integer greater than or equal to 3)”. The layers utilize a “weight filter” for the “convolutional processing”. Further, paragraphs 0106-0109 and 0118-0120 and Figs. 3-5B teach a “weight filter” in each CNN layer, including the layers after L1-L2 (after performing the first convolution calculation and the second convolution calculation), wherein “image data input to the convolution layer CL is subjected to filter processing using the filters fil.sub.a, fil.sub.b, and fil.sub.c, so that data D.sub.a, D.sub.b, and D.sub.c are generated” (task-specific) for each pixel region in the image, operating on the data passed from the previous layer.
See the 35 U.S.C. 103 section below for the full mapping of claim limitations necessitated by Applicant’s amendments.
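For illustration only, the filter processing quoted above from Kurokawa can be sketched as follows. This is a minimal sketch and not Kurokawa's implementation: the function name conv2d_valid, the 3x3 image values, and the 2x2 filter values are assumptions chosen for the example. Only the labels fil_a, fil_b, fil_c and D_a, D_b, D_c follow the quoted passage.

```python
# Illustrative sketch only. The image and filter values are made up for the
# example; they are not taken from Kurokawa.

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    Each output value is one product-sum over the covered pixel region.
    As is common for CNN layers, the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# Three hypothetical weight filters applied to the same input data.
filters = {
    "fil_a": [[1, 0], [0, 1]],
    "fil_b": [[1, -1], [0, 0]],
    "fil_c": [[1, 1], [1, 1]],
}

# One output data set per filter: fil_a -> D_a, fil_b -> D_b, fil_c -> D_c.
feature_maps = {
    name.replace("fil", "D"): conv2d_valid(image, kernel)
    for name, kernel in filters.items()
}
```

In this sketch every filter sees the same input, and each produces its own output data set, which is the relationship the quoted passage describes between the filters and the generated data.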
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-16 and 21-24 are rejected under 35 U.S.C. 103 as being unpatentable over Gao et al. (US Pub. 20160342895), hereinafter Gao, in view of Kurokawa et al. (US Pub. 20200382730), hereinafter Kurokawa, and further in view of Szegedy et al. (US Pub. 20180068207), hereinafter Szegedy.
Regarding claim 1, Gao teaches a computer program product comprising a neural network model comprising M network layers and configured to execute N tasks comprising a first task, wherein when executing the first task (paragraph 0091 teaches “Embodiments of the present invention may be encoded upon one or more non-transitory computer-readable media with instructions for one or more processors or processing units to cause steps to be performed”), an ith network layer in the M network layers causes an apparatus to:
obtain input data based on a type of the ith network layer, wherein the ith network layer is a convolutional layer, a fully connected layer, a deconvolution layer, or a recurrent layer (paragraphs 0037-0041, 0045, and Fig. 2 and 4 teach obtaining input question and image data to the mQA system, wherein the question data is routed to LSTM embedding layers for processing (based on a type of the ith network layer… recurrent layer));
obtain output data based on a tth group of dedicated weight values corresponding to the first task, a shared weight value that executes each of the N tasks, and the input data, wherein the ith network layer comprises the shared weight value and N groups of dedicated weight values, wherein each of the N groups of dedicated weight values executes one of the N tasks (paragraphs 0037 and 0044 teach “the weight matrix in the word embedding layers of the two LSTMs (one for the question and one for the answer) are shared (shared weight value that executes each of the N tasks, and the input data)” for mapping “a one-hot vector of the word into a dense semantic space” based on the matrix of weights (shared weight value that executes each of the N tasks…wherein each of the N groups of dedicated weight values executes one of the N tasks)), wherein the N groups of dedicated weight values are in one-to-one correspondence with the N tasks, wherein the shared weight value remains unchanged when switching from the first task to a second task, wherein 1≤i≤M, wherein i is an integer, wherein N is an integer greater than or equal to 2, wherein M is a positive integer, wherein 1≤t≤N, and wherein t is an integer (paragraphs 0037, 0039, 0044-0045, 0054, and Fig. 2 and 4 teach “the weight matrix in the word embedding layers of the two LSTMs (one for the question and one for the answer) are shared” in the multiple layers of each LSTM (M is a positive integer) and of neural networks in the mQA System (alternative M is a positive integer); and the matrix can be one of a number of “matrices” (wherein i is an integer, wherein N is an integer greater than or equal to 2). The matrices are utilized for mapping “a one-hot vector of the word into a dense semantic space” based on the matrix of weights and repeating this for each word using the matrix (shared weight value remains unchanged when switching from the first task to a second task).);
transmit the output data to an (i+1)th network layer in the M network layers when 1≤i<M, wherein the output data comprises the shared output data and the subset of the dedicated output data (paragraph 0060 and Fig. 2 teach outputting each of the LSTM results (output data comprises the shared output data and the subset of the dedicated output data) to a Softmax layer ((i+1)th network layer), and “The function of the weight matrix in the Softmax layer is to decode the dense word representation into a pseudo one-word representation, which is the inverse operation of the word embedding”); and
output the output data when i=M (paragraph 0060 and Fig. 2 teach outputting the LSTM results (output data) to a Softmax layer, and “The function of the weight matrix in the Softmax layer is to decode the dense word representation into a pseudo one-word representation, which is the inverse operation of the word embedding”).
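For illustration only, the shared word-embedding weight matrix relied upon above from Gao can be sketched as follows. This is a minimal sketch under stated assumptions: the vocabulary, the matrix values, and the function names (one_hot, embed, embed_question_word, embed_answer_word) are hypothetical and do not come from Gao. The sketch shows only that a one-hot vector multiplied by a weight matrix yields a dense vector, and that two embedding paths can use the same matrix.

```python
# Illustrative sketch only. Vocabulary and weights are hypothetical.
VOCAB = ["what", "is", "the", "cat"]

# One row per vocabulary word. The same matrix object is used by both the
# question path and the answer path below, i.e. the weights are shared.
EMBEDDING = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

def one_hot(word):
    """Return a vector with a single 1 at the word's vocabulary index."""
    return [1 if w == word else 0 for w in VOCAB]

def embed(one_hot_vec, matrix):
    """Vector-matrix product; for a one-hot input this selects one row."""
    dim = len(matrix[0])
    return [
        sum(one_hot_vec[r] * matrix[r][d] for r in range(len(matrix)))
        for d in range(dim)
    ]

def embed_question_word(word):
    return embed(one_hot(word), EMBEDDING)

def embed_answer_word(word):
    return embed(one_hot(word), EMBEDDING)
```

Because both paths read the same matrix, repeating the mapping for each word, whether in the question or in the answer, leaves the matrix itself unchanged.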
However, Gao does not explicitly teach perform a first convolution calculation on the input data using the shared weight value to obtain the shared output data when the ith network layer is the convolutional layer; perform a second convolution calculation on the input data using the tth group of dedicated weight values to obtain the dedicated output data when the ith network layer is the convolutional layer; apply, after performing the first convolution calculation and the second convolution calculation, a task-specific filter to the dedicated output data to selectively obtain a subset of the dedicated output data corresponding to the first task.
Kurokawa teaches perform a first convolution calculation on the input data using the shared weight value to obtain the shared output data when the ith network layer is the convolutional layer (paragraphs 0108, 0111, 0115, 0162-0169, and Fig. 3 teach a “CNN can be formed of a plurality of convolution layers CL (first convolution calculation)…in which z layers L (a layer L.sub.1 to a layer L.sub.z) (here, z is an integer greater than or equal to 1)” and further the “neural network [can be] including L layers (here, L is an integer greater than or equal to 3)”. The layers utilizing a “weight filter” for the “convolutional processing”);
perform a second convolution calculation on the input data using the tth group of dedicated weight values to obtain the dedicated output data when the ith network layer is the convolutional layer (paragraphs 0108, 0111, 0115, 0162-0169, and Fig. 3 teach a “CNN can be formed of a plurality of convolution layers CL (first/second convolution calculation)…in which z layers L (a layer L.sub.1 to a layer L.sub.z) (here, z is an integer greater than or equal to 1)” and further the “neural network [can be] including L layers (here, L is an integer greater than or equal to 3)”. The layers utilizing a “weight filter” for the “convolutional processing”);
apply, after performing the first convolution calculation and the second convolution calculation, a task-specific filter to the dedicated output data to selectively obtain a subset of the dedicated output data corresponding to the first task (paragraphs 0106-0109 and 0118-0120 and Figs. 3-5B teach a “weight filter” in each CNN layer, including the layers after L1-L2 (after performing the first convolution calculation and the second convolution calculation), wherein “image data input to the convolution layer CL is subjected to filter processing using the filters fil.sub.a, fil.sub.b, and fil.sub.c, so that data D.sub.a, D.sub.b, and D.sub.c are generated” (task-specific) for each pixel region in the image on the data passed from the previous layer).
Further, Gao at least implies wherein 1≤i≤M, wherein i is an integer, wherein N is an integer greater than or equal to 2, wherein M is a positive integer, wherein 1≤t≤N, and wherein t is an integer; however, Kurokawa teaches wherein 1≤i≤M, wherein i is an integer, wherein N is an integer greater than or equal to 2, wherein M is a positive integer, wherein 1≤t≤N, and wherein t is an integer (paragraphs 0111, 0162-0169, and Fig. 3 teach a “CNN can be formed of a plurality of convolution layers CL…in which z layers L (a layer L.sub.1 to a layer L.sub.z) (here, z is an integer greater than or equal to 1)” and further the “neural network [can be] including L layers (here, L is an integer greater than or equal to 3)”).
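For illustration only, the sequence of operations recited in the three limitations above can be sketched as follows. This is a minimal sketch under stated assumptions and is not a claim construction: the names conv1d_valid, layer_forward, and task_masks are hypothetical, one-dimensional data is used for brevity, and modeling the task-specific filter as a per-task selection mask is an assumption that is not drawn from the claims or from any cited reference.

```python
# Illustrative sketch only. All names and values are hypothetical.

def conv1d_valid(signal, kernel):
    """1-D product-sum convolution (stride 1, no padding, kernel not flipped)."""
    k = len(kernel)
    return [
        sum(signal[p + j] * kernel[j] for j in range(k))
        for p in range(len(signal) - k + 1)
    ]

def layer_forward(input_data, shared_kernel, dedicated_kernels, task_masks, t):
    # First convolution: shared weight value -> shared output data.
    shared_out = conv1d_valid(input_data, shared_kernel)
    # Second convolution: t-th group of dedicated weights -> dedicated output data.
    dedicated_out = conv1d_valid(input_data, dedicated_kernels[t])
    # After both convolutions: keep only the subset selected for task t.
    # (Assumption: the "task-specific filter" is modeled as a boolean mask.)
    subset = [v for v, keep in zip(dedicated_out, task_masks[t]) if keep]
    # Output data comprises the shared output data and the selected subset.
    return shared_out + subset

input_data = [1, 2, 3, 4]
shared_kernel = [1, 1]
dedicated_kernels = {1: [1, -2], 2: [2, 0]}
task_masks = {1: [True, False, True], 2: [False, True, True]}

output_task_1 = layer_forward(input_data, shared_kernel, dedicated_kernels, task_masks, 1)
output_task_2 = layer_forward(input_data, shared_kernel, dedicated_kernels, task_masks, 2)
```

In this sketch the first three values of each output come from the shared kernel and are identical for both tasks, while the trailing values depend on the dedicated kernel and mask selected by t.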
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement a multi-layered neural network corresponding to specific integer values as taught by Kurokawa into Gao’s teaching of a shared weight neural networks system in order to optimize a neural network’s hierarchical structure (Kurokawa, paragraphs 0111, 0162-0169, 0248, and Fig. 3).
Further still, Gao at least implies obtain input data based on a type of the ith network layer, wherein the ith network layer is a convolutional layer, a fully connected layer, a deconvolution layer, or a recurrent layer; however, Szegedy teaches obtain input data based on a type of the ith network layer, wherein the ith network layer is a convolutional layer, a fully connected layer, a deconvolution layer, or a recurrent layer (paragraph 0020 teaches that “convolutional layers have nodes that produce an activation by convolving received inputs in accordance with a set of weights for each node. In some cases, nodes in a convolutional layer may be configured to share weights”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify a shared weight neural networks system, as taught by Gao as modified by a multi-layered neural network corresponding to specific integer values as taught by Kurokawa, to include a convolutional layer receiving inputs for convolving utilizing shared weights as taught by Szegedy, in order to increase training speed and “performance on the image processing task” (Szegedy, paragraphs 0005 and 0020).
Regarding claims 2 and 22, the combination of Gao, Kurokawa, and Szegedy teach all the claim limitations of claims 1 and 21 above; and further teach wherein the ith neural network model is a convolutional model (Kurokawa, paragraphs 0111, 0162-0169, and Fig. 3 teach a “CNN can be formed of a plurality of convolution layers CL…in which z layers L (a layer L.sub.1 to a layer L.sub.z) (here, z is an integer greater than or equal to 1)” and further the “neural network [can be] including L layers (here, L is an integer greater than or equal to 3)”).
Gao and Kurokawa are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claims 3 and 23, the combination of Gao, Kurokawa, and Szegedy teach all the claim limitations of claims 1 and 21 above; and further teach wherein the ith neural network model is an artificial neural network model (Gao, paragraphs 0027-0028 and 0060 teach neural networks (ANNs) with access to a “fully connected Softmax layer”).
Regarding claim 4, the combination of Gao, Kurokawa, and Szegedy teach all the claim limitations of claim 1 above; and further teach wherein the second task is one of the N tasks (Gao, paragraphs 0037, 0039, 0044-0045, 0054, and Fig. 2 teach “the weight matrix in the word embedding layers of the two LSTMs (one for the question and one for the answer) are shared” in the multiple layers of each LSTM and of neural networks in the mQA System; and the matrix can be one of a number of “matrices”. The matrices are utilized for mapping “a one-hot vector of the word into a dense semantic space” based on the matrix of weights and repeating this for each word using the matrix (second task is one of the N tasks).).
Regarding claim 5, the combination of Gao, Kurokawa, and Szegedy teach all the claim limitations of claim 1 above; and further teach wherein the second task is different from the first task (Gao, paragraphs 0037, 0039, 0044-0045, 0054, and Fig. 2 teach “the weight matrix in the word embedding layers of the two LSTMs (one for the question and one for the answer) are shared” in the multiple layers of each LSTM and of neural networks in the mQA System; and the matrix can be one of a number of “matrices”. The matrices are utilized for mapping “a one-hot vector of the word into a dense semantic space” based on the matrix of weights and processing for each word using the matrix (second task is different from the first task).).
Regarding claim 6, the combination of Gao, Kurokawa, and Szegedy teach all the claim limitations of claim 1 above; and further teach wherein the recurrent layer comprises one of a recurrent neural network (RNN) layer and a long short-term memory (LSTM) layer (Gao, paragraphs 0037, 0045, 0050, and Figs. 2-3 teach LSTM layers and combining outputs).
Regarding claim 7, the combination of Gao, Kurokawa, and Szegedy teach all the claim limitations of claim 1 above; and further teach when the ith network layer is the fully connected layer, the ith network layer further causes the apparatus to: perform a first multiply-add calculation on the input data using the shared weight value to obtain the shared output data; and perform a second multiply-add calculation on the input data using the tth group of dedicated weight values to obtain the dedicated output data (Kurokawa, paragraphs 0106-0107 and 0112 teach that the CNN has “a fully connected layer” and that the “convolution processing is performed by repeating the product-sum operation”).
Gao, Kurokawa, and Szegedy are combinable for the same rationale as set forth above with respect to claim 1.
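For illustration only, the first and second multiply-add calculations recited in claim 7 can be sketched as product-sum operations. This is a minimal sketch under stated assumptions: the function name multiply_add and all weight and input values are hypothetical and are not taken from the claims or from Kurokawa.

```python
# Illustrative sketch only. All names and values are hypothetical.

def multiply_add(inputs, weight_rows):
    """Fully connected product-sum: one output per row of weights."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weight_rows]

inputs = [1, 2, 3]

# Shared weights: used regardless of which task is executed.
shared_weights = [[1, 0, 0], [0, 1, 1]]

# Dedicated weights: one group per task index t.
dedicated_weights = {
    1: [[1, 1, 1]],
    2: [[0, 2, 0]],
}

t = 1
shared_output = multiply_add(inputs, shared_weights)            # first multiply-add
dedicated_output = multiply_add(inputs, dedicated_weights[t])   # second multiply-add
```

Changing t to 2 would change only dedicated_output in this sketch; shared_output depends solely on the inputs and the shared weights.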
Regarding claim 8, the combination of Gao, Kurokawa, and Szegedy teach all the claim limitations of claim 1 above; and further teach wherein when the ith network layer is the deconvolution layer, the ith network layer further causes the apparatus to: perform a first transposed convolution calculation on the input data using the shared weight value to obtain the shared output data; and perform a second transposed convolution calculation on the input data using the tth group of dedicated weight values to obtain the dedicated output data (Gao, paragraphs 0037, 0060, and Fig. 3 teach “A transposed weight sharing scheme (deconvolution) may also be adopted to allow weight sharing between the word embedding layer and the fully connected Softmax layer”).
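For illustration only, the transposed weight sharing scheme quoted above from Gao can be sketched as follows. This is a minimal sketch under stated assumptions: the matrix values and the function names transpose and decode are hypothetical, and the sketch shows only the general idea of reusing the transpose of a word-embedding matrix in a Softmax layer to map a dense representation back toward a one-hot-like distribution over words.

```python
import math

# Illustrative sketch only. Three hypothetical words, two embedding dimensions.
EMBEDDING = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]

def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]

def decode(dense, embedding):
    """Score each word with the transposed embedding, then apply softmax."""
    weights = transpose(embedding)  # shape: dims x vocabulary
    vocab_size = len(weights[0])
    logits = [
        sum(dense[d] * weights[d][v] for d in range(len(dense)))
        for v in range(vocab_size)
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Embed word index 2, then decode it with the same (transposed) weights.
dense_vector = EMBEDDING[2]
probabilities = decode(dense_vector, EMBEDDING)
predicted_index = probabilities.index(max(probabilities))
```

In this sketch no separate weight matrix is stored for the Softmax layer; the decoding step reads the same values as the embedding step.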
Regarding claims 9 and 13 [claim 13 differences noted in brackets], Gao teaches a data processing method comprising (paragraph 0091 teaches “Embodiments of the present invention may be encoded upon one or more non-transitory computer-readable media with instructions for one or more processors or processing units to cause steps to be performed”):
obtaining a first to-be-processed object [image] (paragraphs 0037-0039 and Fig. 2 teach obtaining input question and image data to the mQA system);
receiving, from a user, a first processing operation instructing execution of a first task [denoising task] on the first to-be-processed object [image] (paragraphs 0037-0039, claim 18, and Fig. 2 teach obtaining input question and image data to the mQA system from a “user”);
obtaining, in response to the first processing operation, a tth group of dedicated weight values, a shared weight value, and first input data in an ith network layer, wherein the ith network layer is a convolutional layer, a fully connected layer, a deconvolution layer, or a recurrent layer, wherein the first input data is based on a type of the ith network layer, and wherein the first input data is either data output after an (i-1)th network layer in M network layers processes the first to-be-processed object [image] when 1<i<M or data of the first to-be-processed object [image] when i=1 (paragraphs 0037-0045 and Fig. 2 and 4 teach processing the obtained input data via “the weight matrix in the word embedding layers of the two LSTMs (one for the question and one for the answer) (recurrent layer) are shared (tth group of dedicated weight values, a shared weight value, and first input data in an ith network layer…data of the first to-be-processed object [image] when i=1)” for mapping “a one-hot vector of the word into a dense semantic space” and images based on the matrix of weights);
transmit the output data to an (i+1)th network layer in the M network layers when 1≤i<M, wherein the output data comprises the shared output data and the subset of the dedicated output data (paragraph 0060 and Fig. 2 teach outputting each of the LSTM results (output data comprises the shared output data and the subset of the dedicated output data) to a Softmax layer ((i+1)th network layer), and “The function of the weight matrix in the Softmax layer is to decode the dense word representation into a pseudo one-word representation, which is the inverse operation of the word embedding”);
transmitting the first output data, wherein the output data comprises the shared output data and the dedicated output data (paragraph 0060 and Fig. 2 teach outputting the LSTM results (output data comprises the shared output data and the dedicated output data) to a Softmax layer ((i+1)th network layer), and “The function of the weight matrix in the Softmax layer is to decode the dense word representation into a pseudo one-word representation, which is the inverse operation of the word embedding”);
obtaining a second to-be-processed object [image] (paragraphs 0037-0039 and Fig. 2 teach obtaining input question and image data to the mQA system);
receiving, from the user, a second processing operation instructing execution of a second [image recognition] task on the second to-be-processed object [image], wherein the second [image recognition] task is one of N tasks and is different from the first task, and wherein the shared weight value remains unchanged when switching from the first [image denoising] task to the second [image recognition] task (paragraphs 0037-0039, claim 18, and Fig. 2 teach obtaining input question and image data to the mQA system from a “user”. Paragraphs 0037, 0039-0045, 0054, claim 18, and Fig. 2 and 4 teach that matrices are utilized for mapping “a one-hot vector of the word into a dense semantic space” based on the matrix of weights and repeating this for each word using the matrix and image representation (shared weight value remains unchanged when switching from the first [image denoising] task to the second [image recognition] task).); and
obtaining, in response to the second processing operation, a qth group of dedicated weight values and second input data in the ith network layer, wherein the qth group of dedicated weight values are in the ith network layer that uniquely correspond to the second [image recognition] task (paragraphs 0029, 0037, 0041-0045, and 0070 teach “the weight matrix in the word embedding layers of the two LSTMs (one for the question and one for the answer) are shared (tth group of dedicated weight values, a shared weight value, and first input data in an ith network layer…data of the first to-be-processed object [image] when i=1)” for mapping “a one-hot vector of the word into a dense semantic space” and images based on the matrix of weights), wherein N>q>1, wherein q≠t, wherein q is an integer, and wherein the second input data is either data output after the (i-1)th network layer processes the second to-be-processed object [image] when 1<i<M or data of the second to-be-processed object when i=1 (paragraphs 0029, 0037, 0044, 0054, 0070, and Fig. 2 teach “the weight matrix in the word embedding layers of the two LSTMs (one for the question and one for the answer) are shared” in the multiple layers of each LSTM (M is a positive integer) and of neural networks in the mQA System (alternative M is a positive integer); and the matrix can be one of a number of “matrices” (wherein i is an integer, wherein N is an integer greater than or equal to 2). The matrices are utilized for mapping “a one-hot vector of the word into a dense semantic space” based on the matrix of weights.);
obtaining second output data based on the qth group of dedicated weight values, the second input data, and the shared weight value; and transmitting the second output data (paragraph 0060 and Fig. 2 teach outputting the LSTM results (output data) to a Softmax layer ((i+1)th network layer), and “The function of the weight matrix in the Softmax layer is to decode the dense word representation into a pseudo one-word representation, which is the inverse operation of the word embedding”).
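For illustration only, the recited switch from a first task to a second task, with the shared weight value remaining unchanged and a different group of dedicated weight values being obtained, can be sketched as follows. This is a minimal sketch under stated assumptions: the class name MultiTaskLayer, the task labels, and all weight and input values are hypothetical and are not taken from the claims or from Gao.

```python
# Illustrative sketch only. All names and values are hypothetical.

class MultiTaskLayer:
    """Holds one shared weight vector and one dedicated group per task."""

    def __init__(self, shared, dedicated_groups):
        # Stored as a tuple so the shared weights cannot be rewritten in place.
        self.shared = tuple(shared)
        # One-to-one correspondence: each task label maps to exactly one group.
        self.dedicated = {task: tuple(w) for task, w in dedicated_groups.items()}

    def run(self, task, inputs):
        group = self.dedicated[task]  # obtain the group for the requested task
        shared_out = sum(x * w for x, w in zip(inputs, self.shared))
        dedicated_out = sum(x * w for x, w in zip(inputs, group))
        return [shared_out, dedicated_out]

layer = MultiTaskLayer(
    shared=[1, 1],
    dedicated_groups={"denoise": [1, 0], "recognize": [0, 2]},
)

shared_before = layer.shared
first_output = layer.run("denoise", [3, 4])      # first task, first object
second_output = layer.run("recognize", [5, 6])   # second task, second object
shared_after = layer.shared
```

In this sketch, switching tasks changes only which dedicated group is read; the shared weights are the same object before and after the switch.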
However, Gao does not explicitly teach performing a first convolution calculation on the input data using the shared weight value to obtain the shared output data when the ith network layer is the convolutional layer; performing a second convolution calculation on the input data using the tth group of dedicated weight values to obtain the dedicated output data when the ith network layer is the convolutional layer; apply, after performing the first convolution calculation and the second convolution calculation, a task-specific filter to the dedicated output data to selectively obtain a subset of the dedicated output data corresponding to the first task.
Kurokawa teaches performing a first convolution calculation on the input data using the shared weight value to obtain the shared output data when the ith network layer is the convolutional layer (paragraphs 0108, 0111, 0115, 0162-0169, and Fig. 3 teach a “CNN can be formed of a plurality of convolution layers CL (first convolution calculation)…in which z layers L (a layer L.sub.1 to a layer L.sub.z) (here, z is an integer greater than or equal to 1)” and further the “neural network [can be] including L layers (here, L is an integer greater than or equal to 3)”. The layers utilizing a “weight filter” for the “convolutional processing”);
performing a second convolution calculation on the input data using the tth group of dedicated weight values to obtain the dedicated output data when the ith network layer is the convolutional layer (paragraphs 0108, 0111, 0115, 0162-0169, and Fig. 3 teach a “CNN can be formed of a plurality of convolution layers CL (first/second convolution calculation)…in which z layers L (a layer L.sub.1 to a layer L.sub.z) (here, z is an integer greater than or equal to 1)” and further the “neural network [can be] including L layers (here, L is an integer greater than or equal to 3)”. The layers utilizing a “weight filter” for the “convolutional processing”);
apply, after performing the first convolution calculation and the second convolution calculation, a task-specific filter to the dedicated output data to selectively obtain a subset of the dedicated output data corresponding to the first task (paragraphs 0106-0109 and 0118-0120 and Figs. 3-5B teach a “weight filter” in each CNN layer, including the layers after L1-L2 (after performing the first convolution calculation and the second convolution calculation), wherein “image data input to the convolution layer CL is subjected to filter processing using the filters fil.sub.a, fil.sub.b, and fil.sub.c, so that data D.sub.a, D.sub.b, and D.sub.c are generated” (task-specific) for each pixel region in the image on the data passed from the previous layer).
Further, Gao at least implies wherein N>q>1, wherein q≠t, wherein q is an integer, and wherein the second input data is either data output after the (i-1)th network layer processes the second to-be-processed object [image] when 1<i<M or data of the second to-be-processed object when i=1; however, Kurokawa teaches wherein N>q>1, wherein q≠t, wherein q is an integer, and wherein the second input data is either data output after the (i-1)th network layer processes the second to-be-processed object [image] when 1<i<M or data of the second to-be-processed object when i=1 (paragraphs 0111, 0162-0169, and Fig. 3 teach a “CNN can be formed of a plurality of convolution layers CL…in which z layers L (a layer L.sub.1 to a layer L.sub.z) (here, z is an integer greater than or equal to 1)” and further the “neural network [can be] including L layers (here, L is an integer greater than or equal to 3)”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement a multi-layered neural network corresponding to specific integer values as taught by Kurokawa into Gao’s teaching of a shared weight neural networks system in order to optimize a neural network’s hierarchical structure (Kurokawa, paragraphs 0111, 0162-0169, 0248, and Fig. 3).
Further still, Gao at least implies obtain input data based on a type of the ith network layer, wherein the ith network layer is a convolutional layer, a fully connected layer, a deconvolution layer, or a recurrent layer; however, Szegedy teaches obtain input data based on a type of the ith network layer, wherein the ith network layer is a convolutional layer, a fully connected layer, a deconvolution layer, or a recurrent layer (paragraph 0020 teaches that “convolutional layers have nodes that produce an activation by convolving received inputs in accordance with a set of weights for each node. In some cases, nodes in a convolutional layer may be configured to share weights”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify a shared weight neural networks system, as taught by Gao as modified by a multi-layered neural network corresponding to specific integer values as taught by Kurokawa, to include a convolutional layer receiving inputs for convolving utilizing shared weights as taught by Szegedy, in order to increase training speed and “performance on the image processing task” (Szegedy, paragraphs 0005 and 0020).
Regarding claims 10 and 14, the combination of Gao, Kurokawa, and Szegedy teach all the claim limitations of claims 1 and 13 above; and further teach wherein the recurrent layer comprises one of a recurrent neural network (RNN) layer and a long short-term memory (LSTM) layer (Gao, paragraphs 0037, 0045, 0050, and Figs. 2-3 teach LSTM layers and combining outputs).
Regarding claims 11 and 16, the combination of Gao, Kurokawa, and Szegedy teach all the claim limitations of claims 9 and 13 above; and further teach wherein when the ith network layer is a fully connected layer, the ith network layer further causes the apparatus to: perform a first multiply-add calculation on the input data using the shared weight value to obtain the shared output data; and perform a second multiply-add calculation on the input data using the tth group of dedicated weight values to obtain the dedicated output data (Kurokawa, paragraphs 0106-0107 and 0112 teach that the CNN has “a fully connected layer” and that the “convolution processing is performed by repeating the product-sum operation”).
Gao, Kurokawa, and Szegedy are combinable for the same rationale as set forth above with respect to claims 9 and 13.
Regarding claims 12 and 15, the combination of Gao, Kurokawa, and Szegedy teach all the claim limitations of claims 9 and 13 above; and further teach wherein when the ith network layer is a deconvolution layer, the ith network layer further causes the apparatus to: perform a first transposed convolution calculation on the input data using the shared weight value to obtain the shared output data; and perform a second transposed convolution calculation on the input data using the tth group of dedicated weight values to obtain the dedicated output data (Gao, paragraphs 0037, 0060, and Fig. 3 teach “A transposed weight sharing scheme (deconvolution) may also be adopted to allow weight sharing between the word embedding layer and the fully connected Softmax layer”).
Regarding claim 21, the combination of Gao, Kurokawa, and Szegedy teach all the claim limitations of claim 9 above; and further teach wherein the data processing method uses a neural network model to perform data processing (Gao, paragraphs 0037, 0039, 0044-0045, 0054, and Fig. 2 teach “the weight matrix in the word embedding layers of the two LSTMs (one for the question and one for the answer) are shared” in the multiple layers of each LSTM and of neural networks in the mQA System).
Regarding claim 24, the combination of Gao, Kurokawa, and Szegedy teach all the claim limitations of claim 9 above; and further teach wherein the first task is an image denoising task, and wherein the second task is an image recognition task (Gao, paragraphs 0037-0039, claim 18, and Fig. 2 teach obtaining input question and image data to the mQA system from a “user”. Paragraphs 0029, 0037, 0039-0045, 0054, claim 18, and Fig. 2 teach creating “a visual representation” of the input image (image denoising) and then utilizing “object recognition” within the image (image recognition) for answering the input question).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241. The examiner can normally be reached on Mon - Fri 8:00-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123