Office Action Analysis: 18447003 — MULTI-TASK LEARNING FOR DEPENDENT MULTI-OBJECTIVE OPTIMIZATION FOR RANKING DIGITAL CONTENT

Office Action

§102 §112
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-20 are pending and examined herein.
Claim 11 is objected to.
Claims 1-20 are rejected under 35 U.S.C. 112(b).
Claims 1-20 are rejected under 35 U.S.C. 102.

Claim Objections
Claim 11 is objected to because of the following informalities:
“wherein the third head ranks head ranking a search result” should be “wherein the third head ranks a search result.”
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claims 1, 12, and 17 recite "a first head of the plurality of heads, wherein the first head is trained to perform a first task associated with the first objective.” It is unclear if “the first head” refers to “a first head of the plurality of heads” or if it is a different first head. For purposes of examination, this limitation will be interpreted as "a first head of the plurality of heads, wherein the first head of the plurality of heads is trained to perform a first task associated with the first objective.” Additionally, claims 1, 12, and 17 recite "an output of the first head is input into the second head of the plurality of heads." It is unclear whether “the first head” refers to the “first head of the plurality of heads” or if it refers to a different first head. For purposes of examination, any recitation of “the first head” without “of the plurality of heads” will be interpreted as “the first head of the plurality of heads”.

Claims 1, 12, and 17 recite "a second head of the plurality of heads, wherein the second head is trained to perform a first task associated with the second objective.” It is unclear if “the second head” refers to “a second head of the plurality of heads” or if it is a different second head. For purposes of examination, this limitation will be interpreted as "a second head of the plurality of heads, wherein the second head of the plurality of heads is trained to perform a first task associated with the second objective.” Additionally, claims 1, 12, and 17 recite "an output of the second head is input into a fourth head of the plurality of heads." It is unclear whether “the second head” refers to the “second head of the plurality of heads” or if it refers to a different second head. For purposes of examination, any recitation of “the second head” without “of the plurality of heads” will be interpreted as “the second head of the plurality of heads”.

Claims 1, 12, and 17 recite "a third head of the plurality of heads, wherein the third head is trained to perform a second task associated with the first objective." It is unclear if “the third head” refers to “a third head of the plurality of heads” or if it is a different third head. For purposes of examination, this limitation will be interpreted as "a third head of the plurality of heads, wherein the third head of the plurality of heads is trained to perform a second task associated with the first objective." Additionally, claims 1, 12, and 17 recite "an output of the third head is input into the fourth head of the plurality of heads." It is unclear whether “the third head” refers to the “third head of the plurality of heads” or if it refers to a different third head. For purposes of examination, any recitation of “the third head” without “of the plurality of heads” will be interpreted as “the third head of the plurality of heads”.

Claims 2-11, 13-16, and 18-20 fail to resolve the issues and are rejected with the same rationale.

Claim 1 recites "wherein the output is based on the first objective and the second objective" in the last paragraph of the claim. It is unclear as to which output “the output” refers to. Claim 1 recites “an output of the first head”, “an output of the second head”, “an output of the third head” and “an output of the fourth head”. For purposes of examination, this limitation will be interpreted as “wherein the output of the fourth head is based on the first objective and the second objective.”

Claims 2-11 fail to resolve the issue and are rejected with the same rationale.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Andreas (“Deep Compositional Question Answering with Neural Module Networks”, July 24, 2017).
	
	Regarding claim 1, Andreas teaches
	A method comprising: (Page 1 states "This paper describes an approach to visual question answering based on neural module networks (NMNs)." The approach is interpreted as the method.)
configuring a memory according to a machine learning model, (Page 2 states "Thus our goal in this paper is to specify a framework for modular, composable, jointly-trained neural networks." As one of ordinary skill in the art would reason, the neural network (machine learning model) would be executed on a computer. Therefore, the memory of the computer is required to be configured according to the machine learning model.) wherein the machine learning model comprises a shared backbone (Fig. 1 shows the model, where the input image is input to the CNN, interpreted as the backbone.) and a plurality of heads each trained to perform a task associated with either a first objective or a second objective, wherein: (Fig. 2 shows the NMN. Each module is interpreted as a head, which performs the tasks listed on page 4. The attend modules are interpreted as being associated with the first objective (attend) and the remainder of the modules are interpreted as being associated with the second objective (re-attend/combine/measure).)
an output of the shared backbone is input into: (Page 7 states "To produce an initial set of image features, we pass the input image through the convolutional portion of a LeNet [17] which is jointly trained with the question-answering part of the model." Therefore, in Fig. 2(b), the image is first input to the shared backbone before the next heads of the model.)
a first head of the plurality of heads, (Fig. 2b shows the input image, which is first fed through the shared backbone which produces an output that is input into the attend[circle] module, interpreted as the first head.) 
wherein the first head is trained to perform a first task associated with the first objective, (Fig. 2(b) shows that the first head performs the attend[circle] task, which is associated with the attention objective as shown on page 4.)
a second head of the plurality of heads, (Fig. 2b shows the input image, which is first fed through the shared backbone which produces an output that is input into the attend[circle] module, and then the re-attend[above] module, interpreted as the second head. Therefore, the shared backbone output is indirectly input into the second head.)
wherein the second head is trained to perform a first task associated with the second objective, a third head of the plurality of heads, (Fig. 2b shows that the second head performs the re-attend[above] task, which is associated with the re-attention objective, as shown on page 4.)
wherein the third head is trained to perform a second task associated with the first objective, and (Fig. 2b shows the attend[red] module and the combine[and] module, which are interpreted as the third head, which performs the attend[red] task. Page 4 shows that the attend[] modules are associated with the attention objective, which is the first objective. The attend[red] and the combine[and] are interpreted as the second task associated with the first objective.)
an output of the first head is input into the second head of the plurality of heads, (Fig. 2b shows that the output of the attend[circle] module (first head) is input into the re-attend[above] module (second head).)
an output of the second head is input into a fourth head of the plurality of heads, (The measure[is] module is interpreted as the fourth head, to which the output of the re-attend[above] (second head) is input through the combine[and] module, as shown in Fig. 2b.)
an output of the third head is input into the fourth head of the plurality of heads, and (Fig. 2b shows that the output of the attend[red] module and combine[and] module (third head) is input into the measure[is] module, interpreted as the fourth head.)
an output of the fourth head of the plurality of heads, (Fig. 2b shows that there is an output of the measure[is] module, interpreted as the fourth head.)
wherein the output is based on the first objective and the second objective. (As the attention and re-attention modules are both used to produce an output of the measure[is] module, the output is based on the first objective and the second objective.)
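For illustration only, the head-to-head data flow recited in claim 1, as interpreted above for purposes of examination, can be sketched in Python. The function names and the arithmetic inside each head are hypothetical placeholders that appear in neither the application nor Andreas; only the wiring between the functions tracks the claim language.

```python
# Toy stand-ins: each "head" is a plain function over a list of floats.
# The bodies are hypothetical; only the call graph mirrors claim 1.

def shared_backbone(raw):
    """Produce a shared feature vector from the raw input (here: scale by 0.5)."""
    return [0.5 * x for x in raw]

def first_head(features):
    """First task of the first objective; consumes the backbone output."""
    return [f + 1.0 for f in features]

def second_head(features, first_out):
    """First task of the second objective; consumes the backbone output
    and the output of the first head."""
    return [f * a for f, a in zip(features, first_out)]

def third_head(features):
    """Second task of the first objective; consumes the backbone output."""
    return [f - 1.0 for f in features]

def fourth_head(second_out, third_out):
    """Consumes the outputs of the second and third heads, so its output
    depends on both objectives."""
    return [s + t for s, t in zip(second_out, third_out)]

def forward(raw):
    features = shared_backbone(raw)
    h1 = first_head(features)
    h2 = second_head(features, h1)
    h3 = third_head(features)
    return fourth_head(h2, h3)

result = forward([2.0, 4.0])
```

In this sketch the backbone output is passed directly to the second head, which follows the claim wording; under the mapping to Fig. 2b of Andreas set out above, that input is reached indirectly through the first head.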

Regarding claim 2, the rejection of claim 1 is incorporated herein. Andreas teaches
wherein the first objective contradicts the second objective. (As explained above, the attention is interpreted as the first objective, and the remainder of the types of modules are interpreted as the second objective. Page 4 states "An attention module attend[c] convolves every position in the input image with a weight vector (distinct for each c) to produce a heatmap or unnormalized attention. So, for example, the output of the module attend[dog] is a matrix whose entries should be in regions of the image containing cats, and small everywhere else, as shown above." Page 4 further states "So re-attend[above] should take an attention and shift the regions of greatest activation upward (as above), while re-attend[not] should move attention away from the active regions." Therefore, as the first objective is to create an attention in the area of the object, and the second objective is to move the attention away from the area of the object, the first objective contradicts the second objective.)

Regarding claim 3, the rejection of claim 1 is incorporated herein. Andreas teaches
wherein the task performed by each head of the plurality of heads is a listwise ranking task. (Page 4 states "So, for example, the output of the module attend[dog] is a matrix whose entries should be in regions of the image containing cats, and small everywhere else, as shown above." Therefore, the matrix, interpreted as the list, has attention values that rank the involved pixels. Further, the tasks re-attend[c] and combine[c] also produce attention values (see page 4) that form a listwise ranking of the involved pixels. The measure[c] task maps a distribution over labels, which means a value/probability is assigned to each label, forming a listwise ranking of the labels.)
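As a hedged illustration of the interpretation above, an attention matrix can be read as a ranking by ordering pixel positions from the highest attention value to the lowest. The helper name `rank_positions` and the example heatmap values are assumptions made for this sketch and are not taken from Andreas.

```python
def rank_positions(attention):
    """Return (row, col) positions ordered from highest to lowest attention value."""
    scored = [
        (value, (r, c))
        for r, row in enumerate(attention)
        for c, value in enumerate(row)
    ]
    # Sort by descending value; ties are broken by position so the order is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [position for _, position in scored]

# Hypothetical 2x2 heatmap standing in for the output of an attend[c] module.
heatmap = [
    [0.1, 0.9],
    [0.4, 0.2],
]
ranking = rank_positions(heatmap)
```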

Regarding claim 4, the rejection of claim 1 is incorporated herein. Andreas teaches
wherein the second task associated with the first objective includes a plurality of sub-tasks. (As the second task associated with the first objective is both the attend[red] and the combine[and] modules, as explained with respect to claim 1, the sub-tasks are attend[red] and combine[and].)

Regarding claim 5, the rejection of claim 4 is incorporated herein. Andreas teaches
wherein the third head is a nested multi-task machine learning model, and each head of the nested multi-task machine learning model performs a sub-task of the plurality of sub-tasks. (As the third head is interpreted as both the attend[red] and the combine[and] modules, as explained with respect to claim 1, the third head is a nested multi-task machine learning model, with two heads (attend[red] and combine[and]) which each perform their respective sub-tasks.)

Regarding claim 6, the rejection of claim 1 is incorporated herein. Andreas teaches
wherein the shared backbone is configured to extract one or more features from a search result. (Page 7 states "To produce an initial set of image features, we pass the input image through the convolutional portion of a LeNet [17] which is jointly trained with the question-answering part of the model." As the CNN is the shared backbone, the shared backbone extracts image features from the image (search result).)

Regarding claim 7, the rejection of claim 1 is incorporated herein. Andreas teaches
wherein the first task associated with the second objective depends on the first task associated with the first objective. (Fig. 2b shows the machine learning model where the re-attend[above] module, interpreted as the first task associated with the second objective, uses the output of the attend[circle] module, interpreted as the first task associated with the first objective. Therefore, as the first task associated with the second objective uses the output of the first task associated with the first objective, the first task associated with the second objective depends on the first task associated with the first objective.)

Regarding claim 8, the rejection of claim 1 is incorporated herein. Andreas teaches
wherein the machine learning model is trained end-to-end. (Page 6 states "The complete model, including both the NMN and sequence modeling component, is trained jointly.")

Regarding claim 9, the rejection of claim 1 is incorporated herein. Andreas teaches
wherein the first head ranks a search result according to the first task associated with the first objective. (The image pixels are interpreted as the search result. The first head performs the attend[circle] task, interpreted as the first task associated with the first objective. Page 4 states "An attention module attend[c] convolves every position in the input image with a weight vector (distinct for each c) to produce a heatmap or unnormalized attention. So, for example, the output of the module attend[dog] is a matrix whose entries should be in regions of the image containing cats, and small everywhere else, as shown above." Therefore, the matrix, interpreted as the list, has attention values that form a ranking of the involved pixels.)

Regarding claim 10, Andreas teaches
wherein the second head ranks a search result according to the first task associated with the second objective. (The second head is the re-attend[above] module, which performs the re-attend[above] task. Page 4 states "So re-attend[above] should take an attention and shift the regions of greatest activation upward (as above), while re-attend[not] should move attention away from the active regions. For the experiments in this paper, the first fully-connected (FC) layer produces a vector of size 32, and the second is the same size as the input." Therefore, the attention values produced by the second head form a ranking of the pixels.)

Regarding claim 11, Andreas teaches
wherein the third head ranks head ranking a search result according to the second task associated with the first objective. (The third head performs the attend[red] and combine[and] task. Page 4 states "An attention module attend[c] convolves every position in the input image with a weight vector (distinct for each c) to produce a heatmap or unnormalized attention. So, for example, the output of the module attend[dog] is a matrix whose entries should be in regions of the image containing cats, and small everywhere else, as shown above." Therefore, the matrix, interpreted as the list, has attention values that form a ranking of the involved pixels.)

Regarding claim 12, Andreas teaches
A system comprising: (Page 2 states "Thus our goal in this paper is to specify a framework for modular, composable, jointly-trained neural networks." As one of ordinary skill in the art would reason, the neural network (machine learning model) and the framework would be executed on a computer, interpreted as the system.)
at least one processor; and (A computer necessarily contains a processor in order to execute instructions to perform the method.)
at least one memory, (A computer necessarily contains a memory in order to store instructions to perform the method.)
an output of the fourth head of the plurality of heads ranks a search result according to the first objective and the second objective. (As the attention and re-attention modules are both used to produce an output of the measure[is] module, the output is based on the first objective and the second objective. The fourth head, the measure[is] module, according to page 4, “takes an attention alone and maps it to a distribution over labels." Therefore, as the attention, which provides a ranking of the search result, is mapped to the distribution over labels, the distribution of the labels indicates a ranking of the search result.)
The remainder of claim 12 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.
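The reasoning that a distribution over labels indicates a ranking can be illustrated with a short Python sketch, offered only as an aid to reading the mapping above. The label names, the raw scores, and the use of a softmax to form the distribution are assumptions of this sketch rather than details confirmed by the passage quoted from page 4.

```python
import math

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rank_labels(labels, scores):
    """Pair each label with its probability and order from most to least likely."""
    probabilities = softmax(scores)
    return sorted(zip(labels, probabilities), key=lambda pair: -pair[1])

# Hypothetical labels and scores standing in for a measure[c] output.
ranked = rank_labels(["yes", "no", "maybe"], [2.0, 0.5, 1.0])
```

Sorting the labels by assigned probability is what turns the distribution into an ordered list in this sketch.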
 
Regarding claim 13, the rejection of claim 12 is incorporated herein. Andreas teaches 
wherein the first ranking task associated with the first objective, the first ranking task associated with the second objective, and the second ranking task associated with the first objective are each listwise ranking tasks. (The first ranking task associated with the first objective is attend[circle]. The first ranking task associated with the second objective is re-attend[above]. The second ranking task associated with the first objective is attend[red] and combine[and]. Page 4 states "So, for example, the output of the module attend[dog] is a matrix whose entries should be in regions of the image containing cats, and small everywhere else, as shown above." Therefore, the matrix, interpreted as the list, has attention values that rank the involved pixels. Further, the tasks re-attend[c] and combine[c] also produce attention values (see page 4) that form a listwise ranking of the involved pixels. The measure[c] task maps a distribution over labels, which means a value/probability is assigned to each label, forming a listwise ranking of the labels.)

Claims 14-16 recite substantially similar subject matter to claims 4, 5, and 8 respectively and are rejected with the same rationale, mutatis mutandis.

Regarding claim 17, Andreas teaches
A non-transitory computer-readable medium storing executable instructions, which when executed by a processing device, cause the processing device to perform operations comprising: (Page 2 states "Thus our goal in this paper is to specify a framework for modular, composable, jointly-trained neural networks." As one of ordinary skill in the art would reason, the neural network (machine learning model) and the framework would be executed on a computer, interpreted as the system. The computer would necessarily include a non-transitory computer-readable medium storing executable instructions which, when executed by a processing device, cause the processing device to perform operations comprising the method.)
obtaining a search result including a plurality of entries associated with a search query; (Page 3 states "One important component of visual questioning is grounding the question in the image. This grounding task has previously been approached in [13, 24, 12, 15], where the authors tried to localize phrases in an image. [31] use an attention mechanism, to predict a heatmap for each word, as an auxiliary task, during sentence generation. The attentional component of our model is inspired by these approaches." Therefore, the pixels in the image are interpreted as the plurality of entries, the search result is interpreted as the image, and the question that is asked is interpreted as the search query.)
inputting the search result into a machine learning model trained to rank the search result according to a first objective and a second objective, wherein the machine learning model comprises:  (Fig. 1 shows the machine learning model, into which the image (search result) is input. Fig. 2 shows the NMN. Each module is interpreted as a head, which performs the tasks listed on page 4. The attend modules are interpreted as being associated with the first objective (attend) and the remainder of the modules are interpreted as being associated with the second objective, (re-attend/combine/measure). Page 4 states "So, for example, the output of the module attend[dog] is a matrix whose entries should be in regions of the image containing cats, and small everywhere else, as shown above." Therefore, the matrix, interpreted as the list, has attention values that rank the involved pixels. Further, the tasks re-attend[c] and combine[c] also produce attention values (see page 4) that form a listwise ranking of the involved pixels.)
a shared backbone outputting a feature representation into: (Page 7 states "To produce an initial set of image features, we pass the input image through the convolutional portion of a LeNet [17] which is jointly trained with the question-answering part of the model." Therefore, in Fig. 2(b), the image is first input to the shared backbone that outputs a feature representation into the heads of the model.)
an output of the fourth head of the plurality of heads ranks a search result according to the first objective and the second objective. (As the attention and re-attention modules are both used to produce an output of the measure[is] module, the output is based on the first objective and the second objective. The fourth head, the measure[is] module, according to page 4, “takes an attention alone and maps it to a distribution over labels." Therefore, as the attention, which provides a ranking of the search result, is mapped to the distribution over labels, the distribution of the labels ranks the search result.)
The remainder of claim 17 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.

Claims 18-20 recite substantially similar subject matter to claims 4, 5, and 8 respectively and are rejected with the same rationale, mutatis mutandis.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA THUY PHAM whose telephone number is (571)272-2605. The examiner can normally be reached Monday - Friday, 9 A.M. - 5:00 P.M.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/J.T.P./Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121