DETAILED ACTION
Case Status
This Office action is in response to the remarks and amendments filed 8 October 2025. Claims 1-11, 14-16, and 18-20 have been examined.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-11, 14-16, and 18-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claims 1-11, 14-16, and 18-20 fall within one of the statutory categories of subject matter.
With respect to independent claims 1, 14, and 18, the preprocessing, converting, configuring, evaluating, calculating, and classifying limitations cover performance of the limitations manually and/or in the mind (mental processes abstract idea). The outputting, inputting, receiving, and transmitting limitations are recited at a high level of generality and do not add meaningful limitations to the abstract idea; these limitations are directed to insignificant extra-solution activities. The claims as a whole merely describe how to generally “apply” the exception in a computer environment using generic computer functions or components (such as the claimed neural network model). Even when viewed in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are not patent eligible.
With respect to dependent claim 10, the filtering, connecting, and converting limitations cover performance of the limitations manually and/or in the mind (mental processes abstract idea). The pre-trained language model is recited at a high level of generality and does not add meaningful limitations to the abstract idea. The claim as a whole merely describes how to generally “apply” the exception in a computer environment using generic computer functions or components (such as the pre-trained language model). Even when viewed in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
With respect to dependent claims 4, 7, 8, 9, 11, 15, 16, and 19, the assigning, clustering, dividing, generating, concatenating, separating, converting, expressing, evaluating, calculating, classifying, and preprocessing limitations cover performance of the limitations manually and/or in the mind (mental processes abstract idea). No additional elements are recited, and so the claims do not provide a practical application and are not considered to be significantly more. The claims are not patent eligible.
With respect to dependent claims 2, 3, 5, 6, and 20, the single neural network, transformer, output, includes-information, and item-type limitations are recited at a high level of generality and do not add meaningful limitations to the abstract idea. The claims as a whole merely describe how to generally “apply” the exception in a computer environment using generic computer functions or components (such as a single neural network or transformer). Even when viewed in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 3, 9, 10, 14, 16, 18, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al., Multimodal Conversational Fashion Recommendation with Positive and Negative Natural-Language Feedback, hereinafter Wu in view of Bhaskaran et al., Pub. No.: US 20200309923 A1, hereinafter Bhaskaran.
As per claim 1, Wu discloses A multi-modality system for recommending multiple items using an interaction, comprising:
an interaction data preprocessing module that preprocesses an interaction data set and converts the preprocessed interaction data set into interaction training data (see section 1, at least the first and last two paragraphs, and section 2, at least the last paragraph, and section 3; the dialog is tracked turn-by-turn as the model's input state, and the BERT text feature extractor encodes text attributes);
an item data preprocessing module that preprocesses item information data and converts the preprocessed item information data into item training data (see Wu as mapped above; the image feature extractor encodes item images and BERT text feature extractor encodes text attributes);
a learning module that includes a neural network model that is trained using the interaction training data and the item training data and outputs a result including a set of recommended items using a conversation context with a user as input (see rejection of above limitations; note that in Wu, a transformer based neural network model trains on both the dialogue data and item feature data and outputs recommended items in response to the user’s conversational inputs. Note: the claimed conversation context is merely chat or dialogue as explained in par. 55 of the published specification); and
an evaluation module that evaluates the set of recommended items, wherein the evaluation module is configured to calculate a confidence score for two inputs that are the conversation context with the user and each item or two items included in one set of recommended items (sections 5, 6 including Fig. 3, 5 and Table 1), and
Wu does not explicitly disclose, however, Bhaskaran, in the related field of endeavor of machine learning, discloses wherein the evaluation module is trained to classify the two inputs as true/false through a binary classifier, and the confidence score is based on a logit value of the binary classifier (Bhaskaran, pars. 128-133).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Bhaskaran would have allowed Wu to implement the well-known technique of using a binary classifier ML model to output a confidence score indicating a likelihood that an input is associated with a classification, the confidence score indicated as a logit (Bhaskaran, pars. 128-133).
As per claim 14, it is analogous to claim 1 and therefore likewise rejected.
As per claim 18, Wu discloses A multi-modality system for recommending multiple items using an interaction, comprising: a user device that receives a conversation for item recommendation input from a user; and an item recommendation system that configures the conversation input from the user device and an answer transmitted to the user device into a series of conversation contexts, inputs the conversation contexts to a pre-trained neural network model, and outputs a result including a set of recommended items (see section 1, at least the first and last two paragraphs, and section 2, at least the last paragraph, and section 3 - a transformer based neural network model trains on both a user's device-inputted/device-transmitted “answer” dialogue data and item feature data and outputs recommended items) wherein the item recommendation system comprises an evaluation module that evaluates the set of recommended items, wherein the evaluation module is configured to calculate a confidence score for two inputs that are the conversation context with the user and each item or two items included in one set of recommended items (sections 5, 6 including Fig. 3, 5 and Table 1), and Wu does not explicitly disclose, however, Bhaskaran, in the related field of endeavor of machine learning, discloses wherein the evaluation module is trained to classify the two inputs as true/false through a binary classifier, and the confidence score is based on a logit value of the binary classifier (Bhaskaran, pars. 128-133).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Bhaskaran would have allowed Wu to implement the well-known technique of using a binary classifier ML model to output a confidence score indicating a likelihood that an input is associated with a classification, the confidence score indicated as a logit (Bhaskaran, pars. 128-133).
As per claim 2, Wu as modified discloses the multi-modality system of claim 1, wherein the neural network model is a single neural network that processes the interaction training data and the item training data (See Wu as cited in the rejection of claim 1- the Multimodal Interactive Transformer (MIT) model is a single NN that uses a transformer).
As per claim 3, Wu as modified discloses the multi-modality system of claim 2, wherein the neural network is based on a transformer (See Wu as cited in the rejection of claim 1- the Multimodal Interactive Transformer (MIT) model is a single NN that uses a transformer).
As per claim 9, Wu as modified discloses The multi-modality system of claim 1, wherein the item data preprocessing module separates the item information data into text information data and non-text information data, converts the text information data into a text feature, and converts the non-text information data into a non-text feature (see Wu as cited in the rejection of claim 1 including section 4, second paragraph, and Table 2 of section 6; also, section 6.4 discloses that the system understands textual dataset names of the images and that text about the images, such as color information, is understood as a non-text feature in all subsequent interactions of the State Tracking/history as disclosed in section 3.2).
As per claim 16, it is analogous to claim 9 and therefore likewise rejected.
As per claim 10, Wu as modified discloses The method of claim 9, wherein the item data preprocessing module performs filtering on the text information data, connects the filtered text information data to convert into one string sequence, and uses a pre-trained language model to convert the string sequence into the text feature (see rejection of claim 9 and Wu section 3.2).
As per claim 19, Wu as modified discloses The multi-modality system of claim 18, wherein the neural network model is trained based on item training data by preprocessing the interaction data set and preprocessing interaction training data and item information data (see rejection of claim 1).
As per claim 20, Wu as modified discloses The multi-modality system of claim 18, wherein the item is one of clothes, a movie, music, travel, or a book (Sections 6.4 and 7).
Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Wu as modified and further in view of Basu et al., Pub. No.: US 20150044659 A1, hereinafter Basu.
As per claim 4, Wu as modified discloses The multi-modality system of claim 1 wherein the interaction data preprocessing module assigns interaction state information to each utterance in the conversation context with the user (see rejection of claim 1 including at least section 3.2 for language/utterance state tracking).
Wu as modified does not explicitly disclose, however, Basu, in the related field of endeavor of question answering, discloses and clusters system utterances (Basu, pars. 48-59) to divide the system utterances into a plurality of answer sets (Basu, at least pars. 20, 21, 52, 72).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Basu would have allowed Wu to improve grouping of similar answers into clusters in order to provide richer feedback, discover modalities, and more efficiently/rapidly encode the same or similar answers (Basu, pars. 18-19).
Analogous claim 15 is likewise rejected.
Claims 4, 5, 6, 7, 8, 11, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Wu as modified and further in view of Katayama et al., Pub. No.: US 20220261556, hereinafter Katayama.
As per claim 4, Wu as modified discloses The multi-modality system of claim 1 wherein the interaction data preprocessing module assigns interaction state information to each utterance in the conversation context with the user (see rejection of claim 1 including at least section 3.2 for language/utterance state tracking).
Wu as modified does not explicitly disclose, however, Katayama, in the related field of endeavor of natural language processing, discloses and clusters system utterances to divide the system utterances into a plurality of answer sets (Katayama, pars. 23, 36, 37, 39-41 disclose clustering utterances and/or interrogatives as those that are the top N (a first cluster) and those that are not (a second cluster), and storing/using interrogative answers based on rank-based calculated scores (i.e., dividing)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because, as Katayama explains, “[r]eturning a response utterance utilizing the collected information to the user facilitates the user to have a conversation with the interaction system, and hence, smooth interaction can be expected between the interaction system and the user” (Katayama, par. 24).
Analogous claim 15 is likewise rejected.
As per claim 5, Wu as modified discloses the multi-modality system of claim 4, wherein the learning module further outputs information on an answer utterance of the system as the result (Katayama, pars. 23, 36, 37, 39-41 disclose outputting utterances as answer results; see also, Wu, section 4, and Table 2 of section 6).
As per claim 6, Wu as modified discloses the multi-modality system of claim 5, wherein the information on the answer utterance includes previous interaction state information of a current input sequence, interaction state information of an answer of the system to be currently predicted, and identification information of the answer set (Katayama, pars. 23, 36, 37, 39-41; also, Wu discloses tracking in at least section 3.2).
As per claim 7, Wu as modified discloses the multi-modality system of claim 6, wherein the learning module further includes a decoder for generating an answer sentence based on the identification information of the answer set (Katayama, pars. 36-38; see also, Wu section 3.2).
As per claim 8, Wu as modified discloses The multi-modality system of claim 4, wherein the interaction data preprocessing module concatenates similar sentences among the system utterances in the answer set into one sentence (Katayama, at least pars. 37-38 disclose concatenating as adding to the record of the estimated used interrogatives (system answer sentence utterances)).
As per claim 11, Wu as modified discloses The method of claim 7, wherein each item included in the set of recommended items is expressed as composite modality of a text feature and a non-text feature (Wu, section 4 and Table 2 in section 6.3).
Response to Arguments
Applicant's arguments filed 8 October 2025 have been fully considered but they are not persuasive.
With respect to the 35 USC 101 rejection, the remarks present the following:
[Applicant's remarks reproduced as greyscale image: media_image1.png]
Examiner is unable to identify which words in the independent claims are directed to any of these features. More specifically, claim 1 includes the term “multi-modality” in the preamble only, and only as a label for the claimed system. There are no words in the claim requiring “multi-modal feature extraction and fusion”. The claim does not require text features and image features, much less features computed separately by different models. The claim does not combine anything, much less text and image features “via learned logit weighting after FCN processing”. The claim does not concurrently evaluate anything, much less (a) context-item pairs and (b) item-item pairs within a candidate set, producing a logit-based confidence score for each stream. The claim does not recite set-level scoring, N-best re-ranking, combining of logits, or summation of combined logits across items and pairs to form a set score in order to re-rank an N-best list generated by a trained transformer-based recommender.
The remarks further present the following:
[Applicant's remarks reproduced as greyscale image: media_image2.png]
As mentioned above, the independent claims do not recite any such architecture and operations. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Accordingly, Applicant’s arguments directed to the 35 USC 101 rejection are not persuasive.
With respect to the prior art rejection, the remarks present:
[Applicant's remarks reproduced as greyscale images: media_image3.png, media_image4.png, media_image5.png]
As is evident from these statements, Applicant’s interpretation of the independent claims requires reading entire disclosed embodiments into the claims. Although the claims are interpreted in light of the specification, limitations from the specification will not be read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SYED HASAN whose telephone number is (571)270-5008. The examiner can normally be reached M-F 8am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Boris Gorney can be reached at (571)270-5626. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Syed Hasan
Primary Examiner
Art Unit 2154
/SYED H HASAN/Primary Examiner, Art Unit 2154