Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is in reply to the communications filed on May 15, 2023, and December 19, 2025. The applicant’s claim for benefit of provisional application 63341695, filed May 13, 2022, has been received and acknowledged.
Claims 1-18 are currently pending and have been examined. In response to Applicants’ election of Group I, claims 1-18 are examined and claims 19-20 are canceled.
Information Disclosure Statement
The information disclosure statement filed May 15, 2023, has been considered by the Examiner.
Election/Restrictions
Applicant’s election without traverse of Group I, claims 1-18 in the reply filed on December 19, 2025, is acknowledged.
Claim Objections
Claim 15 is objected to because of the following informalities: Claim 15 does not end with a period. See MPEP 608.01(m).
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent claims 1 and 14 recite a method and a system for recommending items. With regard to claim 1, the limitations of generating a collection of embeddings for a collection of items, determining a first embedding for a selected item, determining a second embedding for the user query, determining a third embedding for a conversation history, determining similarities, and recommending an item, as drafted, illustrate a series of steps that, under their broadest reasonable interpretation, cover a mental process. That is, other than reciting that a processor performs the method (in claim 14), nothing in the claim precludes the steps from practically being performed in the mind. Claim 14 recites similar limitations.
The judicial exception is not integrated into a practical application. In particular, claims 1 and 14 recite receiving data and inputting (transmitting) data. These limitations are considered to be insignificant extra-solution activity. Further, claim 14 recites a processor and a memory at a high level of generality (i.e., as generic computer components performing generic computer functions). Accordingly, these elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Thus, claims 1 and 14 are directed to the abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, claims 1 and 14 recite receiving data and inputting (transmitting) data. Per MPEP 2106.05(d)(II), elements such as receiving or transmitting data over a network, using the Internet to gather data, and storing and retrieving information in memory are considered to be well-understood, routine, and conventional computer functions. See Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network).
Further, as discussed above with respect to integration of the abstract idea into a practical application, the claims recite a processor and a memory at a high level of generality (i.e., as generic computer components performing generic computer functions). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.
Additionally, the independent claims recite a multi-modal machine learning model. The Examiner notes that in paragraph [0032] of the published application, Applicants list various models that may constitute the multi-modal machine learning model including CLIP, MUTAN, MCAN, BUTD, ALIGN, VLBERT, VisualBERT, and variations of such models, or another model. Applicants do not describe the particulars of the models, indicating that the models are sufficiently well-known. Thus, the Examiner interprets a multi-modal machine learning model as a well-understood, routine, or conventional element.
Thus, claims 1 and 14 are not patent eligible.
Claims 2-13 and 15-18 depend from claims 1 and 14. Claim 2 is directed to performing the steps of claim 1 again with respect to a first recommended item and is further directed to the abstract idea. Claim 3 is directed to receiving data which, as discussed above, is a function that is considered to be well-understood, routine, and conventional. Claim 4 is directed to the type of item and selecting the item and is further directed to the abstract idea. Claim 5 is directed to fine tuning the model and is further directed to the abstract idea. Claim 6 is directed to adding parameters to the model and training the model and is further directed to the abstract idea. Claim 7 is directed to the type of data and is further directed to the abstract idea. Claim 8 is directed to receiving a selection of the item, the composition of the items, determining the first embeddings, and averaging the first embeddings and is further directed to the abstract idea. Claim 9 is directed to the type of model and is further directed to the abstract idea. Claim 10 is directed to determining the similarities and is further directed to the abstract idea. Claim 11 is directed to the type of item text and is further directed to the abstract idea. Claim 12 is directed to the type of conversation history and is further directed to the abstract idea. Claim 13 is directed to determining the conversation state embeddings and determining a weighted average and is further directed to the abstract idea. Claim 15 is directed to generating training data and fine-tuning the model and is further directed to the abstract idea. Claim 16 is directed to determining item attributes, accessing images, and generating training instances and is further directed to the abstract idea. Claim 17 is directed to the type of items and is further directed to the abstract idea. 
Claim 18 is directed to a purchase link for the recommended item, receiving a selection of the link, and adding the recommended item to a shopping cart and is further directed to the abstract idea.
Thus, the claims are not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2, 4, 11-12, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over US 10,339,586 B1 to Khobragade et al. (hereinafter “Khobragade”), in view of US 2019/0034994 A1 to Wu et al. (hereinafter “Wu”).
Claim 1: Khobragade discloses systems and methods for “identifying similar products or services for the purpose of making relevant recommendations to an online consumer. Products and services are represented by associated vectors which include values for each of a plurality of attributes of the corresponding product or service.” (See Khobragade, at least Abstract). Khobragade further discloses that “[o]ne or more similar products or services are identified relative to a reference product or service set with reference to the distance between the end points of the respective vectors in the associated vector space.” (See Khobragade, at least Abstract). Khobragade further discloses:
receiving a selection of an item, the item including an item image and item text (See Khobragade, at least FIG. 1 and associated text; col. 3, lines 1-13, user expresses interest in a particular product by viewing more detailed information about the product; user is viewing information about a particular tablet; information includes both text (Kindle Fire) and image);
receiving a user query (See Khobragade, at least FIG. 1 and associated text; col. 3, lines 14-25, user selects interface control 114 which is a “Find Similar” button to request information about products similar to the reference product);
determining a first embedding for the selected item…(See Khobragade, at least col. 3, lines 35-40, products are represented by vectors in a two-dimensional vector space; col. 3, lines 60-67, Product 1 is the Reference Product);
determining a…user query (See Khobragade, at least FIG. 3 and associated text; col. 3, lines 1-30, user is viewing information about a particular tablet; user selects link to see similar products; col. 3, lines 35-40, products are represented by vectors in a two-dimensional vector space);
determining similarities between the target embedding and embeddings of the collection of embeddings (See Khobragade, at least col. 3, line 60 to col. 4, line 15, similar products to Product 1 are identified as including all products having product vector end points within some programmable distance of the end point of the product vector representing Product 1; col. 5, lines 38-67, similar products to reference Product 1 are presented to user; these are Products 5, 7, and 8); and
based on the similarities, recommending an item of the collection of items (See Khobragade, at least col. 5, lines 38-67, similar products to reference Product 1 are presented to user; these are Products 5, 7, and 8).
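For illustration only, the distance-based similarity determination cited above (products whose vector end points fall within some programmable distance of the reference product's end point) can be sketched as follows. This is a minimal sketch under stated assumptions, not Khobragade's implementation: the Euclidean metric, the function name find_similar, the coordinate values, and the threshold of 1.0 are all hypothetical and are not taken from the reference.

```python
import math

def find_similar(reference, products, max_distance):
    """Return the names of products whose vector end points lie within
    max_distance of the reference product's end point (hypothetical helper)."""
    ref_vec = products[reference]
    return sorted(
        name
        for name, vec in products.items()
        if name != reference and math.dist(ref_vec, vec) <= max_distance
    )

# Hypothetical two-dimensional attribute vectors; the values are invented
# for illustration and do not come from the cited reference.
products = {
    "Product 1": (1.0, 1.0),
    "Product 2": (5.0, 5.0),
    "Product 5": (1.5, 1.2),
    "Product 7": (0.8, 1.9),
    "Product 8": (1.6, 0.4),
}

# The threshold plays the role of the "programmable distance".
similar = find_similar("Product 1", products, max_distance=1.0)
print(similar)
```

With these assumed values, only the products near Product 1 in the vector space are returned; widening or narrowing max_distance changes which products qualify as similar.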
Khobragade does not expressly disclose generating, using a multi-modal machine learning model, a collection of embeddings for a collection of items; that the first embedding is based at least in part on the item image and the item text; determining a second embedding for the user query; determining a third embedding for a conversation history, the conversation history including a previous user query and a previously selected item; and inputting the first embedding, the second embedding, and the third embedding into the multi-modal machine learning model to generate a target embedding.
However, Wu discloses “a marketplace including products offered for sale by a second user” that includes “filtering a set of product listings based on multiple respective product-listing embeddings and a content-interaction embedding associated with the first user.” (See Wu, at least Abstract). Wu further discloses:
generating, using a multi-modal machine learning model, a collection of embeddings for a collection of items (See Wu, at least para. [0024], machine learning module trains a model to map product listings to product listing embeddings; para. [0026], content associated with each product listing may comprise one or more text items, one or more image items, one or more location items, one or more categories);
the first embedding for the selected item is based at least in part on the item image and the item text (See Wu, at least para. [0024], machine learning module trains a model to map product listings to product listing embeddings; para. [0026], content associated with each product listing may comprise one or more text items and one or more image items; para. [0036], content interaction embedding includes text interaction history and image interaction history);
determining a second embedding for the user query (See Wu, at least para. [0028], n-gram in a query mapped to vector representation; para. [0036], content interaction embedding includes n-gram component from user’s text interaction history);
determining a third embedding for a conversation history, the conversation history including a previous user query and a previously selected item (See Wu, at least para. [0024], machine learning module trains a model to map demographic information, text-interaction history, image interaction history, and other information to content interaction embeddings; para. [0036], content interaction embedding includes n-gram component from user’s text interaction history);
inputting the first embedding, the second embedding, and the third embedding into the multi-modal machine learning model to generate a target embedding (See Wu, at least para. [0024], social networking system may filter a set of product listings based on a plurality of respective product-listing embeddings and a content-interaction embedding associated with the first user using personalized retrieval model; personalized retrieval model may comprise one or more machine-learning generated models).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade the ability of generating, using a multi-modal machine learning model, a collection of embeddings for a collection of items; that the first embedding is based at least in part on the item image and the item text; determining a second embedding for the user query; determining a third embedding for a conversation history, the conversation history including a previous user query and a previously selected item; and inputting the first embedding, the second embedding, and the third embedding into the multi-modal machine learning model to generate a target embedding as disclosed by Wu since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so in order to “encourage users to interact with the products and product listings…[by] show[ing] product listings on a personalized or customized basis to users accessing the marketplace.” (See Wu, at least para. [0018]).
Claim 2: The combination of Khobragade and Wu discloses all the limitations of claim 1 discussed above.
Khobragade further discloses:
receiving a selection of the recommended item (See Khobragade, at least FIG. 3 and associated text; col. 5, lines 38-67, similar products to reference Product 1 are presented to user; these are Products 5, 7, and 8; the option to find similar products may be presented to the user multiple times; user can select Product 7 as a product of interest);
receiving a second user query (See Khobragade, at least FIG. 3 and associated text; col. 5, lines 38-67, similar products to reference Product 1 are presented to user; these are Products 5, 7, and 8; the option to find similar products may be presented to the user multiple times; user can select Product 7 as a product of interest; Product 7 is now added to the reference set for the subsequent iteration of the recommendation process);
determining a fourth embedding for the selected recommended item (See Khobragade, at least col. 5, lines 38-67, the option to find similar products may be presented to the user multiple times; user can select Product 7 as a product of interest; col. 3, lines 35-40, products are represented by vectors in a two-dimensional vector space);
determining a…second user query (See Khobragade, at least col. 5, lines 38-67, the option to find similar products may be presented to the user multiple times; user can select Product 7 as a product of interest; Product 7 replaces product 1 in the reference set; col. 3, lines 35-40, products are represented by vectors in a two-dimensional vector space);
determining second similarities between the second target embedding and the embeddings of the collection of embeddings (See Khobragade, at least col. 5, lines 38-67, similar products to reference Product 1 are presented to user; these are Products 5, 7, and 8; the option to find similar products may be presented to the user multiple times; user can select Product 7 as a product of interest; Product 7 replaces Product 1 in the reference set; Figure 3 represents similar products by the circle 314 centered on Product 7 and encompassing Products 1, 3, 4, 6, and 8, which are considered similar; col. 3, line 60 to col. 4, line 13, similar products are identified as including all products having product vector end points within some programmable distance of the end point of the product vector representing Product 1/7; col. 3, lines 25-30, information on similar products is presented to the user);
based on the second similarities, recommending an additional item of the collection of items (See Khobragade, at least col. 5, lines 38-67, the option to find similar products may be presented to the user multiple times; col. 3, line 60 to col. 4, line 13, similar products are identified as including all products having product vector end points within some programmable distance of the end point of the product vector representing Product 1/7; col. 3, lines 25-30, information on similar products is presented to the user).
Khobragade does not expressly disclose determining a fifth embedding for the second user query; and inputting the target embedding, the fourth embedding, and the fifth embedding into the multi-modal machine learning model to generate a second target embedding.
However, Wu discloses determining a fifth embedding for the second user query (See Wu, at least para. [0028], n-gram in a query mapped to vector representation; para. [0036], content interaction embedding includes n-gram component from user’s text interaction history); and inputting the target embedding, the fourth embedding, and the fifth embedding into the multi-modal machine learning model to generate a second target embedding (See Wu, at least para. [0024], social networking system may filter a set of product listings based on a plurality of respective product-listing embeddings and a content-interaction embedding associated with the first user using personalized retrieval model; personalized retrieval model may comprise one or more machine-learning generated models).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade the ability of determining a fifth embedding for the second user query; and inputting the target embedding, the fourth embedding, and the fifth embedding into the multi-modal machine learning model to generate a second target embedding as disclosed by Wu since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so in order to “encourage users to interact with the products and product listings…[by] show[ing] product listings on a personalized or customized basis to users accessing the marketplace.” (See Wu, at least para. [0018]).
Claim 4: The combination of Khobragade and Wu discloses all the limitations of claim 1 discussed above.
Khobragade further discloses wherein the selected item belongs to the collection of items (See Khobragade, at least col. 2, lines 50-55, products are represented in the data store).
Khobragade does not expressly disclose wherein determining the first embedding for the selected item comprises selecting a precomputed embedding from the collection of embeddings.
However, Wu discloses wherein determining the first embedding for the selected item comprises selecting a precomputed embedding from the collection of embeddings (See Wu, at least para. [0044], product listing embeddings may be generated off-line such as when the particular product listing is submitted to the marketplace, i.e., prior to selection by the user).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade the ability wherein determining the first embedding for the selected item comprises selecting a precomputed embedding from the collection of embeddings as disclosed by Wu since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so in order to “encourage users to interact with the products and product listings…[by] show[ing] product listings on a personalized or customized basis to users accessing the marketplace.” (See Wu, at least para. [0018]).
Claim 11: The combination of Khobragade and Wu discloses all the limitations of claim 1 discussed above.
Khobragade further discloses wherein the item text includes attributes of the item (See Khobragade, at least FIG. 1 and associated text, Kindle Fire HD 7”, Dolby Audio, Dual-Band WiFi, 16 GB).
Claim 12: The combination of Khobragade and Wu discloses all the limitations of claim 1 discussed above.
Khobragade does not expressly disclose wherein the conversation history includes a plurality of previous conversation states.
However, Wu discloses wherein the conversation history includes a plurality of previous conversation states (See Wu, at least para. [0024], content interaction embedding includes text-interaction history, image interaction history).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade the ability wherein the conversation history includes a plurality of previous conversation states as disclosed by Wu since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so in order to “encourage users to interact with the products and product listings…[by] show[ing] product listings on a personalized or customized basis to users accessing the marketplace.” (See Wu, at least para. [0018]).
Claim 14: Khobragade discloses:
a processor (See Khobragade, at least FIG. 1 and associated text, server comprises processor); and
memory storing instructions (See Khobragade, at least col. 2, lines 30-35, memory storing instructions) that, when executed by the processor, cause the recommender system to:
display a user interface (See Khobragade, at least FIG. 1 and associated text, item is displayed to a user);
receive a selection of an item via the user interface (See Khobragade, at least FIG. 1 and associated text; col. 3, lines 1-13, user expresses interest in a particular product by viewing more detailed information about the product; user is viewing information about a particular tablet; information includes both text (Kindle Fire) and image);
receive a user query via the user interface (See Khobragade, at least FIG. 1 and associated text; col. 3, lines 14-25, user selects interface control 114 which is a “Find Similar” button to request information about products similar to the reference product);
determine a…user query (See Khobragade, at least FIG. 3 and associated text; col. 3, lines 1-30, user is viewing information about a particular tablet; user selects link to see similar products; col. 3, lines 35-40, products are represented by vectors in a two-dimensional vector space);
determine similarities between the target embedding and embeddings of the collection of embeddings (See Khobragade, at least col. 3, line 60 to col. 4, line 15, similar products to Product 1 are identified as including all products having product vector end points within some programmable distance of the end point of the product vector representing Product 1; col. 5, lines 38-67, similar products to reference Product 1 are presented to user; these are Products 5, 7, and 8); and
based on the similarities, recommend an item of the collection of items; display the recommended item via the user interface (See Khobragade, at least col. 5, lines 38-67, similar products to reference Product 1 are presented to user; these are Products 5, 7, and 8).
Khobragade does not expressly disclose a multi-modal machine learning model; generate, using the multi-modal machine learning model, a collection of embeddings for a collection of items; determine a second embedding for the user query; determine a third embedding corresponding to a conversation history, the third embedding being a previous target embedding; and input the first embedding, the second embedding, and the third embedding into the multi-modal machine learning model to generate a target embedding.
However, Wu discloses:
a multi-modal machine learning model (See Wu, at least para. [0024], machine learning module trains a model to map product listings to product listing embeddings; para. [0026], content associated with each product listing may comprise one or more text items, one or more image items, one or more location items, one or more categories);
generate, using the multi-modal machine learning model, a collection of embeddings for a collection of items (See Wu, at least para. [0024], machine learning module trains a model to map product listings to product listing embeddings; para. [0026], content associated with each product listing may comprise one or more text items, one or more image items, one or more location items, one or more categories);
determine a second embedding for the user query (See Wu, at least para. [0028], n-gram in a query mapped to vector representation; para. [0036], content interaction embedding includes n-gram component from user’s text interaction history);
determine a third embedding corresponding to a conversation history, the third embedding being a previous target embedding (See Wu, at least para. [0024], machine learning module trains a model to map demographic information, text-interaction history, image interaction history, and other information to content interaction embeddings; para. [0036], content interaction embedding includes n-gram component from user’s text interaction history); and
input the first embedding, the second embedding, and the third embedding into the multi-modal machine learning model to generate a target embedding (See Wu, at least para. [0024], social networking system may filter a set of product listings based on a plurality of respective product-listing embeddings and a content-interaction embedding associated with the first user using personalized retrieval model; personalized retrieval model may comprise one or more machine-learning generated models).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade the ability of a multi-modal machine learning model; generate, using the multi-modal machine learning model, a collection of embeddings for a collection of items; determine a second embedding for the user query; determine a third embedding corresponding to a conversation history, the third embedding being a previous target embedding; and input the first embedding, the second embedding, and the third embedding into the multi-modal machine learning model to generate a target embedding as disclosed by Wu since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so in order to “encourage users to interact with the products and product listings…[by] show[ing] product listings on a personalized or customized basis to users accessing the marketplace.” (See Wu, at least para. [0018]).
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Khobragade in view of Wu as applied to claim 1 above, and further in view of US 2021/0224679 A1 to Misra et al. (hereinafter “Misra”).
The combination of Khobragade and Wu discloses all the limitations of claim 1 discussed above.
Khobragade further discloses wherein receiving the selection of the item comprises receiving a selection of a displayed item via a user interface (See Khobragade, at least FIG. 1 and associated text; col. 3, lines 14-25, user selects interface control 114 which is a “Find Similar” button to request information about products similar to the reference product).
Neither Khobragade nor Wu expressly discloses wherein receiving the user query comprises receiving a text string via a text input field of the user interface.
However, Misra discloses “an advisor system may receive a description of a problem to be solved and problem data identifying quantum computing-related and classical computing-related problems. The advisor system may perform natural language processing on the description of the problem and the problem data to respectively generate a problem embedding vector for the problem and to generate embedding vectors that represent the quantum computing-related and classical computing-related problems. The advisor system may process the problem embedding vector and the embedding vectors, with a vector matching model, to determine a semantically closest matching one of the embedding vectors to the problem embedding vector and, accordingly, may generate a recommendation that includes an indication to solve the problem with a classical computing resource, a quantum computing resource, or a combination of a classical computing resource and a quantum computing resource.” (See Misra, at least Abstract). Misra further discloses wherein receiving the user query comprises receiving a text string via a text input field of the user interface (See Misra, at least para. [0035], search engine allows the user to input a query that includes a description of the problem).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu the ability wherein receiving the user query comprises receiving a text string via a text input field of the user interface as disclosed by Misra since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so in order to help guide a user to solve a problem “when the user only has a general idea about the problem to solve.” (See Misra, at least para. [0018]).
Claims 5-7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Khobragade in view of Wu as applied to claims 1 and 14 above, and further in view of US 2020/0250734 A1 to Pande et al. (hereinafter “Pande”).
Claim 5: The combination of Khobragade and Wu discloses all the limitations of claim 1 discussed above.
Neither Khobragade nor Wu expressly discloses prior to generating the collection of embeddings for the collection of items, fine-tuning the multi-modal machine learning model.
However, Pande discloses systems and methods “for generating product recommendations from among a set of items in an item collection, such as products available at a retailer website.” (See Pande, at least para. [0007]). Pande further discloses prior to generating the collection of embeddings for the collection of items, fine-tuning the multi-modal machine learning model (See Pande, at least para. [0054], generating embeddings based on image data associated with the item, as well as embeddings based on text data associated with the item; embeddings are generated using a pre-trained model; para. [0056], models were trained for distinct categories of items, in this case, merchandise for an online retailer in the areas of clothing, home products, baby goods, and electronics items).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu the ability of prior to generating the collection of embeddings for the collection of items, fine-tuning the multi-modal machine learning model as disclosed by Pande since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so because “a role of item embeddings or image embeddings or past user behavior may differ depending on the category.” (See Pande, at least para. [0056]).
Claim 6: The combination of Khobragade and Wu and Pande discloses all the limitations of claim 5 discussed above.
Neither Khobragade nor Wu expressly discloses wherein fine-tuning the multi-modal machine learning model comprises adding a plurality of parameters to the multi-modal machine learning model; and updating the plurality of parameters by training the multi-modal machine learning model using domain-specific training data.
However, Pande discloses wherein fine-tuning the multi-modal machine learning model comprises adding a plurality of parameters to the multi-modal machine learning model (See Pande, at least paras. [0058]-[0060], various parameters are added/altered to the model including a sampling parameter, an aggregation parameter, a loss parameter); and updating the plurality of parameters by training the multi-modal machine learning model using domain-specific training data (See Pande, at least paras. [0058]-[0060], various parameters are added/altered to the model including a sampling parameter, an aggregation parameter, a loss parameter; para. [0056], models were trained for distinct categories of items, in this case, merchandise for an online retailer in the areas of clothing, home products, baby goods, and electronics items).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu the ability wherein fine-tuning the multi-modal machine learning model comprises adding a plurality of parameters to the multi-modal machine learning model; and updating the plurality of parameters by training the multi-modal machine learning model using domain-specific training data as disclosed by Pande since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so because “a role of item embeddings or image embeddings or past user behavior may differ depending on the category.” (See Pande, at least para. [0056]).
Claim 7: The combination of Khobragade and Wu and Pande discloses all the limitations of claim 6 discussed above.
Neither Khobragade nor Wu expressly discloses wherein the domain-specific training data is labeled fashion data.
However, Pande discloses wherein the domain-specific training data is labeled fashion data (See Pande, at least para. [0056], models were trained for distinct categories of items, in this case, merchandise for an online retailer in the areas of clothing, home products, baby goods, and electronics items).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu the ability wherein the domain-specific training data is labeled fashion data as disclosed by Pande since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so because “a role of item embeddings or image embeddings or past user behavior may differ depending on the category.” (See Pande, at least para. [0056]).
Claim 15: The combination of Khobragade and Wu discloses all the limitations of claim 14 discussed above.
Neither Khobragade nor Wu expressly discloses generate a set of training data; and fine-tune, using the set of training data, the multi-modal machine learning model prior to generating, using the multi-modal machine learning model, a collection of embeddings for the collection of items.
However, Pande discloses:
generate a set of training data (See Pande, at least para. [0054], model is trained using item attributes and descriptions included in an item collection); and
fine-tune, using the set of training data, the multi-modal machine learning model prior to generating, using the multi-modal machine learning model, a collection of embeddings for the collection of items (See Pande, at least para. [0054], generating embeddings based on image data associated with the item, as well as embeddings based on text data associated with the item; embeddings are generated using a pre-trained model; para. [0056], models were trained for distinct categories of items, in this case, merchandise for an online retailer in the areas of clothing, home products, baby goods, and electronics items).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu the ability to generate a set of training data; and fine-tune, using the set of training data, the multi-modal machine learning model prior to generating, using the multi-modal machine learning model, a collection of embeddings for the collection of items as disclosed by Pande since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so because “a role of item embeddings or image embeddings or past user behavior may differ depending on the category.” (See Pande, at least para. [0056]).
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Khobragade in view of Wu as applied to claim 1 above, and further in view of US 11,113,744 B2 to Mantha et al. (hereinafter “Mantha”).
The combination of Khobragade and Wu discloses all the limitations of claim 1 discussed above.
Khobragade further discloses wherein receiving the selection of the item comprises receiving a selection of a plurality of items…(See Khobragade, at least col. 4, lines 30-35, reference set may include more than one product); and wherein determining the first embedding for the selected item comprises: determining a plurality of first embeddings, each embedding of the plurality of first embeddings corresponding to an item of the plurality of selected items (See Khobragade, at least col. 4, lines 30-35, Product 1 and Product 3 make up the reference set and each has its own vector).
Khobragade does not expressly disclose each of the plurality of items including a distinct item image and distinct item text.
However, Wu discloses each of the plurality of items including a distinct item image and distinct item text (See Wu, at least para. [0024], machine learning module trains a model to map product listings to product listing embeddings; para. [0026], content associated with each product listing may comprise one or more text items and one or more image items; para. [0036], content interaction embedding includes text interaction history and image interaction history).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade the ability of each of the plurality of items including a distinct item image and distinct item text as disclosed by Wu since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so in order to “encourage users to interact with the products and product listings…[by] show[ing] product listings on a personalized or customized basis to users accessing the marketplace.” (See Wu, at least para. [0018]).
Neither Khobragade nor Wu expressly discloses averaging the plurality of first embeddings.
However, Mantha discloses a system and method of “providing personalized recommendations through large-scale deep-embedding architecture.” (See Mantha, at least col. 1, lines 8-10). Mantha further discloses “training two sets of item embeddings for items in an item catalog and a set of user embeddings for users, using a triple embeddings model, with triplets. The triplets each can include a respective first user of the users, a respective first item from the item catalog, and a respective second item from the item catalog, in which the respective first user selected the respective first item and the respective second item in a respective same basket.” (See Mantha, at least Abstract). Mantha further discloses averaging the plurality of first embeddings (See Mantha, at least para. [0101], an anchor set of all items in a shopping basket can be determined by taking the average embedding of all the items in the shopping basket).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu the ability of averaging the plurality of first embeddings as disclosed by Mantha since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so because “[o]nline grocery shopping is typically highly personal” and it “can be advantageous to provide relevant recommendations at one of more points of the shopping experience.” (See Mantha, at least col. 8, lines 55-65).
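For illustration only, the averaging limitation discussed above can be sketched as an element-wise mean over the embeddings of the selected items. This is a minimal sketch, not the implementation of the application or of any cited reference; the function name `average_embedding` and the sample vectors are hypothetical.

```python
from typing import List, Sequence


def average_embedding(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of equal-length embedding vectors."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    count = len(embeddings)
    # Average each coordinate across all selected items.
    return [sum(e[i] for e in embeddings) / count for i in range(dim)]


# Hypothetical embeddings for two selected items (illustrative values only).
selected = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
combined = average_embedding(selected)  # [2.0, 1.0, 1.0]
```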
Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Khobragade in view of Wu as applied to claim 1 above, and further in view of US 2023/0260001 A1 to de Juan et al. (hereinafter “de Juan”).
Claim 9: The combination of Khobragade and Wu discloses all the limitations of claim 1 discussed above.
Neither Khobragade nor Wu expressly discloses wherein the multi-modal machine learning model is based on a pre-trained visual language machine learning model.
However, de Juan discloses systems and methods for “product similarity detection and recommendation” in which users may “view articles and/or other content that includes images depicting products.” (See de Juan, at least Abstract). de Juan further discloses a “vector embedding model is used to generate product vector representations of the products. Catalog items that are available from a catalog to supplement the articles and other content may be processed to generate catalog item vector representations. When content (an article) with an image depicting a product is to be displayed to the user, similarity between a product vector representation of the product and the catalog item vector representations is determined in order to identify and display catalog items depicting products that are similar to the product depicted by the image in the content.” (See de Juan, at least Abstract). de Juan further discloses wherein the multi-modal machine learning model is based on a pre-trained visual language machine learning model (See de Juan, at least para. [0034], computer vision model of neural network layers is used to identify items in images).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu the ability wherein the multi-modal machine learning model is based on a pre-trained visual language machine learning model as disclosed by de Juan since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so because if a user cannot identify items in images “the user experience is less immersive and informative due to the lack to additional information about the products that would otherwise be helpful and interesting to the user.” (See de Juan, at least para. [0001]).
Claim 10: The combination of Khobragade and Wu discloses all the limitations of claim 1 discussed above.
Khobragade does not expressly disclose wherein determining the similarities between the target embedding and embeddings of the collection of embeddings comprises determining, in a latent feature space, a…similarity between the target embedding and at least some of the embeddings of the collection of embeddings.
However, Wu discloses wherein determining the similarities between the target embedding and embeddings of the collection of embeddings comprises determining, in a latent feature space, a…similarity between the target embedding and at least some of the embeddings of the collection of embeddings (See Wu, at least FIG. 6 and associated text, embedding space; para. [0046], social networking system filters the set of product listings by searching the product listing embeddings for product listings that are similar to the content interaction history of the first user).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade the ability wherein determining the similarities between the target embedding and embeddings of the collection of embeddings comprises determining, in a latent feature space, a…similarity between the target embedding and at least some of the embeddings of the collection of embeddings as disclosed by Wu since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so in order to “encourage users to interact with the products and product listings…[by] show[ing] product listings on a personalized or customized basis to users accessing the marketplace.” (See Wu, at least para. [0018]).
Neither Khobragade nor Wu expressly discloses that the similarity is a cosine similarity.
However, de Juan discloses that the similarity is a cosine similarity (See de Juan, at least para. [0056], item ranker uses cosine similarity to compute similarity between a product vector representation of a detected product (e.g., the hat being worn by Yori in the image 716) and a catalog item representation of a catalog item).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu the ability that the similarity is a cosine similarity as disclosed by de Juan since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so because if a user cannot identify items in images, “the user experience is less immersive and informative due to the lack to additional information about the products that would otherwise be helpful and interesting to the user.” (See de Juan, at least para. [0001]).
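For illustration only, a cosine similarity between a target embedding and a collection of item embeddings can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `cosine_similarity` and `rank_items`, the item identifiers, and the treatment of zero vectors (similarity taken as 0.0) are hypothetical and are not drawn from the application or the cited references.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def rank_items(
    target: Sequence[float], catalog: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every catalog embedding against the target, most similar first."""
    scored = [(item_id, cosine_similarity(target, emb)) for item_id, emb in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical catalog: item "b" points in the same direction as the target.
ranking = rank_items([1.0, 0.0], {"a": [0.0, 1.0], "b": [2.0, 0.0]})
```

Because cosine similarity depends only on direction, item "b" ranks first here even though its vector is longer than the target.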
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Khobragade in view of Wu as applied to claim 12 above, and further in view of US 2019/0295114 A1 to Pavletic et al. (hereinafter “Pavletic”).
The combination of Khobragade and Wu discloses all the limitations of claim 12 discussed above.
Khobragade does not expressly disclose determining a plurality of conversation state embeddings by determining an embedding for each conversation state of the conversation history.
However, Wu discloses determining a plurality of conversation state embeddings by determining an embedding for each conversation state of the conversation history (See Wu, at least para. [0024], content interaction embedding includes text-interaction history, image interaction history).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade the ability of determining a plurality of conversation state embeddings by determining an embedding for each conversation state of the conversation history as disclosed by Wu since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so in order to “encourage users to interact with the products and product listings…[by] show[ing] product listings on a personalized or customized basis to users accessing the marketplace.” (See Wu, at least para. [0018]).
Neither Khobragade nor Wu expressly discloses determining a weighted average of the plurality of conversation state embeddings, the weighted average including a time-dependent weight applied to each conversation state of the plurality of conversation state embeddings.
However, Pavletic discloses a system and method related “to the field of digital banking, and more particularly to computer implemented systems and methods for efficiently maintaining data structures adapted for provisioning personal financial insights and rewards using at least an extracted set of information derived from transaction information and other tracked information.” (See Pavletic, at least para. [0002]). Pavletic further discloses that “multi-dimensional vectors are digital representations that are extracted and collated through a corpus of transactions and other interactions (e.g., social media interactions). For example, the multi-dimensional vectors may be a data structure storing a series of numerical or string values that each represent a dimension for analysis (e.g., type of retailer—coffee shop is 15, diner is 18). The numerical or string values are adapted to include similarities with similar values (e.g., coffee shop and diner are closer to one another than coffee shop (15) and gym (42).” (See Pavletic, at least para. [0076]). Pavletic further discloses determining a weighted average of the plurality of conversation state embeddings, the weighted average including a time-dependent weight applied to each conversation state of the plurality of conversation state embeddings (See Pavletic, at least para. [0110], user has two previous coffee transactions with respective embeddings; system averages the two embeddings; user makes a new transaction at Pizza Hut; system is configured to apply more weight to recent transactions and thus the newest transaction at Pizza Hut is weighted more heavily than the previous coffee transactions).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu the ability of determining a weighted average of the plurality of conversation state embeddings, the weighted average including a time-dependent weight applied to each conversation state of the plurality of conversation state embeddings as disclosed by Pavletic since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so because “[t]racking user information and behavior is helpful to generate insights in relation to tailoring products and recommendations.” (See Pavletic, at least para. [0003]).
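For illustration only, a weighted average with a time-dependent weight can be sketched as follows. The exponential decay used here (weight equal to `decay` raised to the age of the state) is an assumption chosen for the example; neither the claim language nor the cited references specify a particular weighting function. The name `time_weighted_average` and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def time_weighted_average(
    states: Sequence[Tuple[Sequence[float], int]], decay: float = 0.5
) -> List[float]:
    """Weighted mean of embeddings, each paired with its age in turns.

    Assumption: weight = decay ** age, so newer states (smaller age)
    contribute more than older ones.
    """
    if not states:
        raise ValueError("at least one state is required")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    dim = len(states[0][0])
    totals = [0.0] * dim
    weight_sum = 0.0
    for embedding, age in states:
        if len(embedding) != dim:
            raise ValueError("all embeddings must have the same dimension")
        weight = decay ** age
        weight_sum += weight
        for i, value in enumerate(embedding):
            totals[i] += weight * value
    return [t / weight_sum for t in totals]


# Hypothetical history: an older state (age 1) and the newest state (age 0).
history = [([3.0, 0.0], 1), ([0.0, 3.0], 0)]
blended = time_weighted_average(history)  # [1.0, 2.0]
```

With `decay=0.5`, the newest state receives twice the weight of the state one turn older, so the result sits closer to the newest embedding.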
Claims 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Khobragade in view of Wu and further in view of Pande as applied to claim 15 above, and further in view of Misra and further in view of US 2020/0320348 A1 to Yang et al. (hereinafter “Yang”).
Claim 16: The combination of Khobragade and Wu and Pande discloses all the limitations of claim 15 discussed above.
Neither Khobragade nor Wu expressly discloses for each item of a plurality items, determine item attributes.
However, Pande discloses for each item of a plurality items, determine item attributes (See Pande, at least para. [0054], model is trained using item attributes and descriptions included in an item collection).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu the ability for each item of a plurality items, determine item attributes as disclosed by Pande since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so because “a role of item embeddings or image embeddings or past user behavior may differ depending on the category.” (See Pande, at least para. [0056]).
Neither Khobragade nor Wu nor Pande expressly discloses each of the training instances including… a question determined by inputting at least some of the item attributes into a question template, and an answer to the question.
However, Misra discloses each of the training instances including… a question determined by inputting at least some of the item attributes into a question template, and an answer to the question (See Misra, at least para. [0037], advisor system may provide one or more questions associated with the problem to the client device and receive answers to the questions from the client device; advisor system may receive, from the client device, one or more answers to the one or more questions and may determine the description of the problem based on the one or more answers; paras. [0019]-[0028] provide examples of the questions and answers, i.e., the attributes).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu and the recommendation system and method of Pande the ability each of the training instances including… a question determined by inputting at least some of the item attributes into a question template, and an answer to the question as disclosed by Misra since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so in order to help guide a user to solve a problem “when the user only has a general idea about the problem to solve.” (See Misra, at least para. [0018]).
Neither Khobragade nor Wu nor Pande nor Misra expressly discloses for each item of the plurality of items, accessing one or more images of the item; and for each item of the plurality of items, generating a plurality of training instances, each of the training instances including at least one of the one or more images.
However, Yang discloses a system and method “for training an inference model…[including] providing a text-to-vector converter; providing the inference model and pre-training the inference model using labeled fashion entries”. (See Yang, at least Abstract). Yang further discloses for each item of the plurality of items, accessing one or more images of the item (See Yang, at least para. [0174], image inference model is trained using a plurality of entries, each entry includes an image and a label of the images); and for each item of the plurality of items, generating a plurality of training instances, each of the training instances including at least one of the one or more images (See Yang, at least para. [0174], image inference model is trained using a plurality of entries, each entry includes an image and a label of the images).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu and the recommendation system and method of Pande and the advisor system and method of Misra the ability of, for each item of the plurality of items, accessing one or more images of the item; and for each item of the plurality of items, generating a plurality of training instances, each of the training instances including at least one of the one or more images as disclosed by Yang since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so in order to allow consumers to efficiently evaluate products by showing attributes of products accurately. (See Yang, at least para. [0004]).
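For illustration only, the training-instance structure addressed in the rejection of claim 16 (an image, a question produced from a template, and an answer) can be sketched as follows. This is a minimal sketch under stated assumptions: the `TrainingInstance` type, the template wording, the choice to fill the template with an attribute name, and the file name are all hypothetical and are not taken from the application or the cited references.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical template; the attribute name is substituted into the question.
QUESTION_TEMPLATE = "What is the {attribute} of this item?"


@dataclass(frozen=True)
class TrainingInstance:
    image_path: str  # one of the item's images
    question: str    # question produced from the template
    answer: str      # attribute value serving as the answer


def build_training_instances(
    image_paths: List[str], attributes: Dict[str, str]
) -> List[TrainingInstance]:
    """Create one instance per (image, attribute) pair for a single item."""
    return [
        TrainingInstance(
            image_path=path,
            question=QUESTION_TEMPLATE.format(attribute=name),
            answer=value,
        )
        for path in image_paths
        for name, value in attributes.items()
    ]


# Hypothetical item with one image and two attributes.
instances = build_training_instances(
    ["shirt_front.jpg"], {"color": "blue", "material": "cotton"}
)
```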
Claim 17: The combination of Khobragade and Wu and Pande and Misra and Yang discloses all the limitations of claim 16 discussed above.
Neither Khobragade nor Wu expressly discloses wherein the plurality of items are fashion items.
However, Pande discloses wherein the plurality of items are fashion items (See Pande, at least para. [0056], models were trained for distinct categories of items, in this case, merchandise for an online retailer in the areas of clothing, home products, baby goods, and electronics items).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the system and method of Khobragade-Wu-Pande-Misra-Yang the ability wherein the plurality of items are fashion items as further disclosed by Pande since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so because “a role of item embeddings or image embeddings or past user behavior may differ depending on the category.” (See Pande, at least para. [0056]).
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Khobragade in view of Wu as applied to claim 14 above, and further in view of US 9,361,640 B1 to Donsbach et al. (hereinafter “Donsbach”).
The combination of Khobragade and Wu discloses all the limitations of claim 14 discussed above.
Neither Khobragade nor Wu expressly discloses wherein displaying the recommended item via the user interface comprises displaying a purchase link associated with the recommended item; and wherein the instructions, when executed by the processor, further cause the recommender system to: receive a selection of the purchase link via the user interface; and in response to the selection of the purchase link, add the recommended item to a digital shopping cart.
However, Donsbach discloses systems and methods for ordering products in which the “customer enters keywords (or generic terms) that describe a desired item on a first portion of a display screen and search result items associated with each of the keywords on the list automatically appear on a second portion of the display screen. When the customer selects the desired items from the second portion of the display screen, the selected items automatically appear on a third portion of the display screen. The customer can then click a “Check Out” button on the same screen and have the products shipped to him/her.” (See Donsbach, at least Abstract). Donsbach further discloses wherein displaying the recommended item via the user interface comprises displaying a purchase link associated with the recommended item (See Donsbach, at least FIG. 2 and associated text; col. 5, lines 58-67, an “Add” or “Add to Cart” button is provided under each item in the recommendations viewer); and wherein the instructions, when executed by the processor, further cause the recommender system to:
receive a selection of the purchase link via the user interface (See Donsbach, at least FIG. 2 and associated text; col. 5, lines 58-67, an “Add” or “Add to Cart” button is provided under each item in the recommendations viewer; when this button is selected, the corresponding item is added to a shopping cart window); and
in response to the selection of the purchase link, add the recommended item to a digital shopping cart (See Donsbach, at least FIG. 2 and associated text; col. 5, lines 58-67, an “Add” or “Add to Cart” button is provided under each item in the recommendations viewer; when this button is selected, the corresponding item is added to a shopping cart window).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in the recommendation system and method of Khobragade and the social networking marketplace of Wu the ability wherein displaying the recommended item via the user interface comprises displaying a purchase link associated with the recommended item; and wherein the instructions, when executed by the processor, further cause the recommender system to: receive a selection of the purchase link via the user interface; and in response to the selection of the purchase link, add the recommended item to a digital shopping cart as disclosed by Donsbach since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. One of ordinary skill in the art would have been motivated to do so in order “to reduce time spent ordering groceries or other items.” (See Donsbach, at least col. 1, lines 40-46).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNE MARIE GEORGALAS, whose telephone number is (571) 270-1258. The examiner can normally be reached Monday-Friday, 8:30am-5:00pm E.S.T.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Marissa Thein, can be reached at 571-272-6764. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Anne M Georgalas/
Primary Examiner, Art Unit 3689