Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over LIPKA et al. (US Pub No. 2022/0253477), in view of Miller et al. (US Pub No. 2024/0354641).
As to claims 1, 12, 17, LIPKA teaches a method comprising:
accessing, by a photo application of a mobile device, a corpus of image files, wherein each image file of the corpus of image files comprises respective metadata (i.e. the user determines a search query, and provides it to an image search engine, [0041]; At operation 205, the system identifies a query mention from the received search query, [0042]; query mentions (m) in a search query (q) can be linked to knowledge graph entities (e.g., via entity linking), [0044]);
identifying, by the photo application of the mobile device, a plurality of signals based at least in part on an analysis of the respective metadata of each image file of the corpus of image files (i.e. the search query 125 may also be a same media type as the search objects (e.g., an image), [0024]; the user may provide a search query including query object such as an image query, [0025]; The knowledge graph component updates a knowledge graph to include a surrogate node corresponding to a search query (e.g., corresponding to a query mention included in a search query), [0028]; FIG. 7 shows an example of a process 700 for link prediction according to aspects of the present disclosure. For example, aspects of the techniques described herein use an entity linking algorithm, providing the ability to map queries to entities of an underlying knowledge graph. Knowledge-derived query suggestion relies on properly identifying the entities occurring in the query. The entity linking system can annotate the query mentions in the query with known entities and a task is concluded, [0088]);
generating, by the photo application of the mobile device, a knowledge graph data structure representing associations between the corpus of image files and the plurality of signals (i.e. Embeddings are computed for the entities of the updated knowledge graph. For example, embeddings are computed (e.g., via a knowledge graph embedding algorithm) for the knowledge graph that includes additional edges between surrogate entities (em) and other entities of the original knowledge graph, [0044]; knowledge graph component 415 updates a knowledge graph with a surrogate entity corresponding to the at least one query mention and with a set of connection elements connecting the surrogate entity to a set of entities in the knowledge graph, [0057]);
traversing, by the photo application of the mobile device, the knowledge graph data structure to generate a respective score for each signal of the plurality of signals (i.e. A distance between the surrogate entity and each of the candidate entities can be calculated based on the vector representation, and one or more neighboring entities from among the candidate entities may be selected and retrieved based on the computed distances, [0045]; For each surrogate entity, em, k nearest neighbors (kNN) of linked entities are determined, el (i.e., top k entities whose embeddings are the closest to the embedding of linked entities of a specific surrogate entity). Triplets are predicted in the form of (em, score, t̂), where t̂ is an entity belonging to the k nearest neighbors (kNN) of em, and score is a special relation which may be defined as the distance between the corresponding linked entity and t̂, and may represent the confidence score of the entity linking component, [0092]);
generating, by the photo application of the mobile device, a plurality of search objects comprising a respective search object for each signal of the plurality of signals (i.e. Next, for each query mention m in the query set, a new node (i.e., an entity) em is added to the knowledge graph, referred to as a surrogate entity. Surrogate entities are connected to linked entities if the surrogate entities are present in the knowledge graph. Therefore, the new triplets have the form of (em, sc, el), where el is the linked entity and sc is the confidence score, [0090]; At operation 215, the system retrieves search results based on at least one neighboring entity of the surrogate entity in the updated knowledge graph, [0045]; The present disclosure uses entity linking algorithms to map query mentions in search queries to multiple knowledge graph entities ... the output of some entity linking methods is accompanied by confidence scores. The scores can be used to weight edges, where the edges are connected to the knowledge graph entities, [0089]);
ranking, by the photo application of the mobile device, the plurality of respective search objects for each signal of the plurality of signals to generate a plurality of ranked search objects, the ranking based at least in part on the score for each signal of the plurality of signals (i.e. The argmin function may be used when f is a ‘distance’ function, and may be replaced with argmax function if f is a ‘similarity’ function. The knowledge graph embedding algorithm is used to compute low dimensional embeddings for entities. Given the vector representation of an entity, the present disclosure uses a link prediction model, ranking entities based on relevance to the surrogate entities. Link prediction may be defined with Eq. 1. Given a head entity h and a relation r, the goal is to infer a tail entity t̂ that completes a triplet, [0091]; The task of finding candidate query suggestions is an application of link prediction. For example, in entity-oriented search with low recall, knowledge graph entities are ranked as tail entity candidates using a link prediction model. The link prediction model ranks related entities globally or to a selected relation type and can use high ranked predicted tails as suggested queries for respective head entities. Entity linking may be used, where the correct linking of mentions in the query is provided to the entities of the underlying knowledge graph, [0096]);
generating, by the photo application of the mobile device, a phrase from the plurality of signals and the plurality of ranked search objects (i.e. In one embodiment of the information retrieval model of the present disclosure (e.g., in the example of FIG. 8), a query 800 may include ‘U.S. Navy Shipyard,’ and entity linking 805 may link the query 800 (e.g., or a query mention 800) to knowledge graph entities 810 including ‘Newport News Shipbuilding’ and ‘United States Navy.’ Based on the techniques described herein, a predicted link 820 may be established and a reformulated query (e.g., a suggested updated query) such as ‘Pearl Harbor Hawaii,’ ‘Norfolk Naval,’ or both, may be generated. As shown in FIG. 3, such an updated query (e.g., updated query 310) may result in more numerous search objects retrieved, more precise (e.g., relevant) search objects retrieved, or both, [0094]; query suggestion tasks may be considered as a ranking formulation where a set of candidate suggested queries (e.g., q′) are ranked and provided to the user given an initial query, q, and a score function, score (q, q′), [0105]); and
presenting, by the photo application of the mobile device, the phrase as a graphical element of a graphical user interface of the photo application on a display of the mobile device according to the ranking (i.e. At operation 220, the system displays the retrieved search results corresponding to the neighboring entity (e.g., for the user to view in response to an input search query), [0046]; FIG. 3 shows an example of a knowledge-driven search suggestion display according to aspects of the present disclosure. The example shown includes search query 300, search results 305, updated search query 310, and updated search results 315. For instance, a retrieval network may receive a search query 300 from a user, [0047]; as shown in FIG. 3 ... display the search query 300 and corresponding search results 305, as well as the updated search query 310 and the corresponding updated search results 315 (e.g., such that the user may compare and analyze the updated search query 310 that was reformulated from the search query 300 by the information retrieval system), [0051]; generating one or more updated search queries based on the neighboring entity, where search results are retrieved based on the updated search query ... there may be multiple possible new queries, each of which can be issued and the results either merged together or displayed separately to the user. Some examples of the method described above further include ranking the search results based on relevance to the neighboring entity. Some examples of the method described above further include organizing the search results into categories based on relevance to a plurality of neighboring entities in the updated knowledge graph, [0087]).
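By way of illustration only, the embedding-based retrieval that LIPKA describes at [0044]-[0045] and [0092] (computing entity embeddings, then selecting the k nearest neighbors of a surrogate entity and scoring them by distance) may be sketched as follows. The function name, the example entities, and the two-dimensional vectors below are illustrative assumptions and do not appear in LIPKA.

```python
import numpy as np

def k_nearest_entities(surrogate_vec, entity_embeddings, k=3):
    """Rank knowledge-graph entities by distance to a surrogate-entity
    embedding and return the top-k as (entity, distance) pairs, where a
    smaller distance indicates a more closely related entity."""
    names = list(entity_embeddings)
    vectors = np.stack([entity_embeddings[n] for n in names])
    distances = np.linalg.norm(vectors - surrogate_vec, axis=1)
    order = np.argsort(distances)[:k]
    return [(names[i], float(distances[i])) for i in order]

# Illustrative embeddings only; a real system would learn these with a
# knowledge-graph embedding algorithm over (head, relation, tail) triplets.
embeddings = {
    "Newport News Shipbuilding": np.array([0.90, 0.10]),
    "United States Navy":        np.array([0.80, 0.30]),
    "Pearl Harbor Hawaii":       np.array([0.85, 0.20]),
    "Beach Coffee House":        np.array([0.10, 0.90]),
}
surrogate = np.array([0.87, 0.18])  # embedding of the query's surrogate entity
print(k_nearest_entities(surrogate, embeddings, k=2))
```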
Although LIPKA implicitly teaches the claimed "signals" (identifying ... a plurality of signals based at least in part on an analysis of the respective metadata of each image file of the corpus of image files) (i.e. The knowledge graph component updates a knowledge graph to include a surrogate node corresponding to a search query (e.g., corresponding to a query mention included in a search query), [0028]; other types of queries and search objects may be used. In some examples, the search query 125 may also be a same media type as the search objects (e.g., an image), [0024]; the user may provide a search query including query object such as an image query, [0025]; FIG. 7 shows an example of a process 700 for link prediction according to aspects of the present disclosure. For example, aspects of the techniques described herein use an entity linking algorithm, providing the ability to map queries to entities of an underlying knowledge graph. Knowledge-derived query suggestion relies on properly identifying the entities occurring in the query. The entity linking system can annotate the query mentions in the query with known entities and a task is concluded, [0088]), LIPKA does not explicitly recite this term.
Miller specifically teaches "identifying ... a plurality of signals based at least in part on an analysis of the respective metadata of each image file of the corpus of image files" (i.e. convert the image data into a signal or data format suitable for delivery to the image display device, [0225]; The environmental components 1132 include, for example, one or cameras (with still image/photograph and video capabilities) ... signals corresponding to a surrounding physical environment, [0245]; A media overlay may include text or image data that can be overlaid on top of a photograph taken by the user system 102 or a video stream produced by the user system 102. In some examples, the media overlay may be a location overlay (e.g., Venice beach), a name of a live event, or a name of a merchant overlay (e.g., Beach Coffee House), [0053]; where a particular image included in the message image payload 906 depicts an animal (e.g., a lion), a tag value may be included within the message tag 920 that is indicative of the relevant animal, [0217]).
It would have been obvious to one of ordinary skill in the art, having the teachings of LIPKA and Miller before the effective filing date of the claimed invention, to modify the system of LIPKA to include the limitations as taught by Miller. One of ordinary skill in the art would have been motivated to make this combination in order to apply multimodal memory embeddings to content recommendation using deep learning techniques for feature extraction, and to fuse these representations to provide more accurate and relevant content suggestions for users, in view of Miller ([0025]). Doing so would give the added benefit of enabling content recommendations to be generated more easily, quickly, or intuitively, obviating the need for certain efforts or resources that otherwise would be involved in a content recommendation process, as taught by Miller ([0025]).
As to claims 2, 13, 18, Miller teaches that each signal of the plurality of signals corresponds to one or more of a plurality of signal categories comprising at least one of:
persons (i.e. people, [0089]);
pets (i.e. an entity representing animals, [0090]);
cities (i.e. A map system 222 provides various geographic location, [0060]);
states (i.e. A map system 222 provides various geographic location, [0060]);
scenes (i.e. the media overlay may be a location overlay (e.g., Venice beach), [0053]);
meanings (i.e. image recognition may be deployed in an automated manner to identify objects and location associated with visual media and image data and to generate a keyword cluster or cloud that is then associated with the image-based prompt, [0112]);
seasons (i.e. the personalized AI agent system 232 can create prompts based on the time of day, season, or upcoming events or holidays, such as events that are time sensitive, [0149]);
public events (i.e. a name of a live event, [0053]);
holidays (i.e. holidays, [0053]);
trips (i.e. a user's location in Germany on a recent trip, [0104]);
business names (i.e. Beach Coffee House, [0053]);
home locations (i.e. the processors or processor-implemented components may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other examples, the processors or processor-implemented components may be distributed across a number of geographic locations, [0324]); or
work locations (i.e. a platform that provides enterprise wide connectivity to a group of users such as employees of a company, clients of an enterprise providing professional services, educational institutions, and the like, [0126]).
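The signal categories enumerated above may be collected in a single enumeration. The following sketch is illustrative only; the class and member names are assumptions and are not disclosed by either reference.

```python
from enum import Enum, auto

class SignalCategory(Enum):
    """Signal categories recited in claims 2, 13, and 18 (illustrative only)."""
    PERSONS = auto()
    PETS = auto()
    CITIES = auto()
    STATES = auto()
    SCENES = auto()
    MEANINGS = auto()
    SEASONS = auto()
    PUBLIC_EVENTS = auto()
    HOLIDAYS = auto()
    TRIPS = auto()
    BUSINESS_NAMES = auto()
    HOME_LOCATIONS = auto()
    WORK_LOCATIONS = auto()
```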
As to claims 3, 14, 19, LIPKA teaches that multiple signals correspond to a
particular category of the plurality of signal categories (i.e. Some examples of the method described above further include ranking the search results based on relevance to the neighboring entity. Some examples of the method described above further include organizing the search results into categories based on relevance to a plurality of neighboring entities in the updated knowledge graph, [0087]).
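A minimal sketch of the result categorization that LIPKA describes at [0087] (grouping search results by the neighboring entity to which each is most relevant, then ranking within each group) follows. The function name and the shape of the relevance-score dictionary are assumptions for illustration.

```python
from collections import defaultdict

def organize_by_category(results, relevance):
    """Group search results into categories keyed by the neighboring entity
    to which each result is most relevant (cf. LIPKA [0087])."""
    categories = defaultdict(list)
    for result in results:
        scores = relevance[result]                 # {entity: relevance score}
        best_entity = max(scores, key=scores.get)  # most relevant neighbor
        categories[best_entity].append(result)
    # Rank results within each category by descending relevance.
    for entity, items in categories.items():
        items.sort(key=lambda r: relevance[r][entity], reverse=True)
    return dict(categories)

# Hypothetical results and per-entity relevance scores, for illustration.
results = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]
relevance = {
    "img_001.jpg": {"Pearl Harbor": 0.9, "Norfolk": 0.2},
    "img_002.jpg": {"Pearl Harbor": 0.4, "Norfolk": 0.8},
    "img_003.jpg": {"Pearl Harbor": 0.7, "Norfolk": 0.1},
}
print(organize_by_category(results, relevance))
```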
As to claims 4, 15, 20, LIPKA teaches:
receiving, by the photo application of the mobile device, an input identifying the graphical element, where the graphical element is displayed on a first region of the graphical user interface (i.e. Embodiments of the present disclosure may be used in the context of an image search engine. For example, a retrieval network based on the present disclosure may take natural language text or an image as a query, and efficiently search through millions of images to find images relevant to the query (e.g., or the reformulated query), [0022]);
searching, by the photo application of the mobile device, the corpus of image files to identify one or more matching image files (i.e. The user 100 communicates with the retrieval network 110 via the user device 105 and the cloud 115. For example, the user 100 may provide a search query 125 including query object such as a text query or an image query. In the example illustrated in FIG. 1, the query object includes a natural language search query. The user device 105 transmits the search query 125 to the retrieval network 110 to find related objects or information (e.g., search objects stored within the database 120). In some examples, the user device 105 communicates with the retrieval network 110 via the cloud 115, [0025]); and
presenting, by the photo application of the mobile device, the one or more matching image files on a second region of the graphical user interface (i.e. The retrieval network 110 updates a knowledge graph to include a surrogate entity corresponding to a query mention included in a search query 125. The retrieval network 110 then uses embedding techniques on the updated knowledge graph and retrieves search results (e.g., search objects stored in database 120) based on selected neighboring entities. The database 120 returns results 130 including one or more images related to the search query 125 based on the updated knowledge graph (e.g., the knowledge graph that includes additional edges between surrogate entities and other entities of the original knowledge graph). The search results 130 are presented to the user 100. The process for using the retrieval network 110 to perform an image search is further described with reference to FIG. 2, [0026]).
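The search step mapped above may be illustrated with a deliberately simple matching rule. The tag-based matching below is an assumption for illustration and is not the matching LIPKA discloses.

```python
def search_corpus(corpus, query_terms):
    """Return the image files whose metadata tags contain every term of the
    query (a deliberately simple matching rule, for illustration only)."""
    terms = {t.lower() for t in query_terms}
    return [img for img in corpus
            if terms <= {t.lower() for t in img["tags"]}]

# Hypothetical corpus entries; "tags" stands in for signals mined from metadata.
corpus = [
    {"path": "IMG_0001.jpg", "tags": ["beach", "sunset", "Venice"]},
    {"path": "IMG_0002.jpg", "tags": ["office", "meeting"]},
]
print(search_corpus(corpus, ["beach", "sunset"]))  # matches IMG_0001.jpg
```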
As to claims 5, 16, Miller teaches that analyzing the metadata comprises:
comparing, by the photo application of the mobile device, pairs of metadata files to identify context information that is present in a first metadata file of the pair of metadata files and absent from a second metadata file of the pair of metadata files (i.e. In some examples, the personal AI agent 302 determines that the user is using the AR/VR device 326 and receives input via the AR device including a voice prompt specifying: “What can I cook with these ingredients?” The personal AI agent 302 accesses a camera feed on the AR/VR device 326 and identifies objects within the camera feed, which include potatoes and carrots, [0104]; handling missing values, [0268]); and
associating, by the photo application of the mobile device, the context information with the second metadata file (i.e. The personal AI agent 302 accesses multimodal memory 308 for the user and finds relevant user data from a common space vector, such as a like for a friend's potato and carrot soup, a user's location in Germany on a recent trip, and a video post from the user with context relating to authenticity of recipes. The personal AI agent 302 generates a prompt from this common space vector to input into a neural network engine 314 which asks for “authentic German soup recipes that include potato and carrots.” The personal AI agent 302 accesses the multimodal memory 308 and identifies that the user prefers receiving recipe instructions via the AR device, and, in response, proceeds to display recommended recipes received from the neural network engine 314 on the AR device, [0104]).
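A minimal sketch of the claimed metadata comparison (copying context information present in a first metadata file but absent from a second) follows. The dictionary representation of a metadata file is an assumption for illustration.

```python
def propagate_context(first_meta, second_meta):
    """Copy context fields present in the first metadata file but absent from
    the second, returning the enriched second record (claims 5 and 16)."""
    for key, value in first_meta.items():
        if second_meta.get(key) is None:   # absent or explicitly empty
            second_meta[key] = value
    return second_meta

a = {"location": "Venice Beach", "timestamp": "2024-07-04T12:00"}
b = {"timestamp": "2024-07-04T12:03"}    # no location recorded
print(propagate_context(a, b))           # location propagated to b
```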
As per claim 6, Miller teaches the method of claim 5, further comprising:
identifying, by the photo application of the mobile device, two or more image files with corresponding metadata that contain information identifying a common capture location and a common timestamp (i.e. edges 604 include relationships between users and locations ... edges 604 include relationships between users and timestamps, capturing patterns in their activity or interactions over time, [0172]); and
associating, by the photo application of the mobile device, the two or more image files with an event at the common capture location and the common timestamp (i.e. When the personalized AI agent system 232 generates a knowledge graph multimodal memory from user data gathered across different modalities, the personalized AI agent system 232 leverages this structured representation to provide a recommended custom response to a user prompt. The personalized AI agent system 232 receives or generates a prompt and constructs a graph query that can be used to retrieve relevant information from the knowledge graph. This query can include nodes 602 (entities) and edges 604 (relationships) that correspond to the user's request, [0176]).
As per claim 7, Miller teaches the method of claim 6, wherein the metadata of two image files contain information identifying the common capture location if:
a first capture location of a first image file of the two image files is within a threshold distance of a second capture location of a second image file of the two image files (i.e. node 602 a represents a user (John) and node 602 h represents a particular location (Los Angeles). The edge 604 a that connects node 602 a and node 602 h indicates a geographic user preference for the particular location that node 602 h is linked with, [0174]).
As per claim 8, Miller teaches the method of claim 6, wherein the metadata of two image files contain information identifying the common timestamp if:
a first timestamp of a first image file of the two image files is within a threshold time period of a second timestamp of a second image file of the two image files (i.e. edges 604 include relationships between users and timestamps, capturing patterns in their activity or interactions over time. In some examples, edges 604 include relationships between content and timestamps, showing when content was created, shared, or interacted with, [0172]).
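The common-location and common-timestamp tests of claims 7 and 8 may be sketched together as follows, assuming a haversine great-circle distance and illustrative threshold values; neither the thresholds nor the field names come from the references.

```python
import math
from datetime import datetime, timedelta

DIST_THRESHOLD_KM = 0.5               # assumed threshold distance
TIME_THRESHOLD = timedelta(hours=6)   # assumed threshold time period

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two capture locations."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def common_event(img_a, img_b):
    """True when two image files share a common capture location (within the
    threshold distance) and a common timestamp (within the threshold period)."""
    near = haversine_km(img_a["lat"], img_a["lon"],
                        img_b["lat"], img_b["lon"]) <= DIST_THRESHOLD_KM
    concurrent = abs(img_a["time"] - img_b["time"]) <= TIME_THRESHOLD
    return near and concurrent

a = {"lat": 37.330, "lon": -122.030, "time": datetime(2024, 7, 4, 12, 0)}
b = {"lat": 37.332, "lon": -122.031, "time": datetime(2024, 7, 4, 14, 30)}
print(common_event(a, b))  # True: same location and time window -> one event
```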
As per claim 9, Miller teaches the method of claim 5, wherein the metadata of an image file captured by the photo application at a capture location comprises the context information identifying at least one of:
a label for each identified entity, where an entity is a person or an animal (i.e. Data can be gathered from user content creation and labeled using a machine learning algorithm trained to label data. Data can be generated by applying a machine learning algorithm to identify or generate similar data. This may also include removing duplicates, handling missing values, and converting data into a suitable format, [0268]; an entity representing a dog name can be linked to an entity for the user and to an entity representing animals, [0090]; his or her personal story, [0203]);
an altitude at the capture location (i.e. altitude sensor components, [0238]);
a latitude of the capture location (i.e. The position components include location sensor components to generate location coordinates, [0238]);
a longitude of the capture location (i.e. The position components include location sensor components to generate location coordinates, [0238]);
a name of the capture location (i.e. a “live story” may constitute a curated stream of user-submitted content from various locations and events, [0204]);
a timestamp (i.e. a “live story” may constitute a curated stream of user-submitted content from various locations and events. Users whose client devices have location services enabled and are at a common location event at a particular time may, for example, be presented with an option, via a user interface of the interaction client 104, to contribute content to a particular live story, [0204]);
a speed of the mobile device (i.e. The motion components include
acceleration sensor components (e.g., accelerometer), gravitation sensor components, [0238]);
a direction of travel of the mobile device (i.e. The motion components include ... rotation sensor components, [0238]); or
an orientation of the image file (i.e. orientation sensor components, [0238]).
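The context information enumerated in claim 9 may be represented as a single record. The following dataclass is illustrative only; its field names and units are assumptions rather than anything recited in the claim or disclosed by Miller.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class CaptureMetadata:
    """Context information recited in claim 9. Every field is optional since a
    given capture may record only a subset; names and units are assumptions."""
    entity_labels: List[str] = field(default_factory=list)  # persons or animals
    altitude_m: Optional[float] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    location_name: Optional[str] = None
    timestamp: Optional[datetime] = None
    speed_mps: Optional[float] = None
    direction_deg: Optional[float] = None  # direction of travel, degrees from north
    orientation: Optional[str] = None      # e.g., "portrait" or "landscape"
```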
As per claim 10, Miller teaches the method of claim 1, wherein analyzing metadata of the corpus of image files comprises:
filtering, by the photo application of the mobile device, the corpus of image files to identify a set of recent image files (i.e. a user's location in Germany on a recent trip, [0104]); and
identifying, by the photo application of the mobile device, the plurality of signals in metadata of the set of recent image files (i.e. The personal AI agent 302 generates a prompt from this common space vector to input into a neural network engine 314 which asks for “authentic German soup recipes that include potato and carrots.” The personal AI agent 302 accesses the multimodal memory 308 and identifies that the user prefers receiving recipe instructions via the AR device, and, in response, proceeds to display recommended recipes received from the neural network engine 314 on the AR device, [0104]).
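A minimal sketch of the recency filtering of claim 10 (restricting the corpus to a set of recent image files before signal identification) follows; the 30-day window and field names are assumed parameters for illustration.

```python
from datetime import datetime, timedelta

def recent_images(corpus, window_days=30, now=None):
    """Filter the corpus to image files captured within a recency window,
    before signal identification runs on their metadata (claim 10)."""
    cutoff = (now or datetime.now()) - timedelta(days=window_days)
    return [img for img in corpus if img["time"] >= cutoff]
```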
As per claim 11, Miller teaches the method of claim 1, wherein generating the phrase comprises:
identifying, by the photo application of the mobile device, a template that
corresponds to a particular search object (i.e. The personal AI agent 302 accesses a camera feed on the AR/VR device 326 and identifies objects within the camera feed, which include potatoes and carrots, [0104]);
identifying, by the photo application of the mobile device, a system language of the mobile device or the photo application (i.e. In some examples, the personal AI agent 302 determines that the user is using the AR/VR device 326 and receives input via the AR device including a voice prompt specifying: “What can I cook with these ingredients?”, [0104]);
providing, by the photo application of the mobile device, the template and the particular search object to a machine learning model that is trained to generate language in the system language (i.e. The personal AI agent 302 accesses multimodal memory 308 for the user and finds relevant user data from a common space vector, such as a like for a friend's potato and carrot soup, a user's location in Germany on a recent trip, and a video post from the user with context relating to authenticity of recipes, [0104]); and
receiving, by the photo application of the mobile device, the phrase as output from the machine learning model (i.e. The personal AI agent 302 generates a prompt from this common space vector to input into a neural network engine 314 which asks for “authentic German soup recipes that include potato and carrots.” The personal AI agent 302 accesses the multimodal memory 308 and identifies that the user prefers receiving recipe instructions via the AR device, and, in response, proceeds to display recommended recipes received from the neural network engine 314 on the AR device, [0104]).
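The template-based phrase generation of claim 11 may be sketched as follows. The prompt format and the stand-in model callable are assumptions for illustration, not the trained model of the claim nor anything disclosed by Miller.

```python
def generate_phrase(template, search_object, system_language, model):
    """Fill a phrase template for a particular search object and ask a
    text-generation model for a phrase in the device's system language.
    `model` is any text-in/text-out callable standing in for the trained
    machine learning model recited in claim 11."""
    prompt = (f"Using the template '{template}' and the search object "
              f"'{search_object}', write a short phrase in {system_language}.")
    return model(prompt)

# A trivial stand-in model returning canned output, for illustration only.
stub_model = lambda prompt: "Trips to Germany"
print(generate_phrase("Trips to {place}", "Germany", "English", stub_model))
```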
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Vishuu et al. (US Pat. 10,997,500) discloses generating an output vector during training using the artificial neural network and the training data input vector; computing a difference between the output vector generated during training and the reference data output vector; updating a parameter of the artificial neural network using the difference between the output vector generated during training and the reference data output vector; obtaining an input vector associated with a second user, wherein an element of the input vector corresponds to data representing the first property of behavior of the second user with respect to the first item.
Li et al. (US Pub. 2022/0391768) discloses generating, for an instance of input data of the second domain, a feature representation that classifies the instance of input data of the second domain; and refining at least one parameter of the machine learning model using the feature representation; and outputting the machine learning model to be used for classifying input data of an image defined by the second domain.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MIRANDA LE whose telephone number is (571)272-4112. The examiner can normally be reached M-F 7AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley can be reached on 571-272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MIRANDA LE/ Primary Examiner, Art Unit 2153