DETAILED ACTION
1. Claims 1-20 are pending in this application.
Notice of Pre-AIA or AIA Status
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. §102 and §103 (or as subject to pre-AIA 35 U.S.C. §102 and §103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Continued Examination Under 37 CFR 1.114
3. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 03/03/2026 has been entered.
Response to Amendment
4. This Office action is in response to Applicant's amendments and arguments filed on 03/03/2026, responsive to the final Office action mailed on 01/08/2026. Claims 1-20 have been amended. The amendment has been entered.
Response to Arguments
5. Applicant's arguments, filed on 03/03/2026, with respect to the rejection of claims 1-20 under 35 U.S.C. §101 (mental process) (Applicant’s arguments, pages 10-12), have been fully considered but are not persuasive.
Applicant's representative argues that “These operations involve algorithmic model selection, computational feature extraction from digital image data, and feature-space comparison between high-dimensional vector representations. A human cannot practically perform these operations mentally.” However, the algorithmic model selection amounts to mere instructions used to implement the abstract idea on a computer. A recitation of the words "apply it" (or an equivalent) is a mere instruction to implement an abstract idea or other exception on a computer (see MPEP 2106.05(f)).
It is noted that Applicant cites Example 37 of the USPTO guidance. However, each case is examined on its own merits, and the invention of Example 37 of the USPTO guidance is distinguishable from the present claims.
Applicant's representative argues that “… the present claims require machine-generated feature vectors and computational similarity comparison that are inherently technical and not performable as mental processes.” However, even assuming the claims and the specification explicitly require machine-generated feature vectors, the courts do not distinguish between claims that recite mental processes performed by humans and claims that recite mental processes performed on a computer. See MPEP 2106.04(a)(2)(III) – Mental Processes.
Applicant's representative argues that “The claims also do not recite a method of organizing human activity.” The claims are rejected as reciting a mental process; they are not being rejected as reciting a method of organizing human activity. These arguments are therefore moot.
Applicant's representative argues that “The additional elements are not generic computer components performing generic functions. Instead, the claims require a plurality of trained image matching models, machine-generated feature vector representations, structured storage separate from textual tags, and computational comparison of vector representations to determine similarity.” However, the plurality of trained image matching models amounts to mere instructions used to implement the abstract idea. A recitation of the words "apply it" (or an equivalent) is a mere instruction to implement an abstract idea or other exception on a computer (see MPEP 2106.05(f)).
For the above reasons, the rejection of claims 1-20 under 35 U.S.C. §101 as directed to an abstract idea (mental process) is maintained.
Response to Arguments
Applicant's arguments, filed on 03/03/2026, with respect to the rejection of claims 1, 3-8 and 10-20 under 35 U.S.C. §103 (Applicant’s arguments, pages 12-15), have been fully considered but are moot because the amended independent claims introduce new limitations that were not previously presented, and newly found prior art has been applied in the current rejection.
Claim Rejections - 35 USC § 101
6. 35 U.S.C. §101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. §101 because the claimed invention is directed to an abstract idea (mental process) without significantly more. The claims similarly recite steps for continuous indexing of image features for use in image matching and visual search systems.
The following is an analysis based on 2019 Revised Patent Subject Matter Eligibility Guidance (2019 PEG).
Step 1, Statutory Category?
Claims 1-7 are directed to a system.
Claims 8-14 are directed to a method.
Claims 15-20 are directed to a non-transitory computer-readable storage medium.
Therefore, claims 1-20 fall into at least one of the four statutory categories.
Step 2A, Prong I: Judicial Exception Recited?
The examiner submits that the following claim limitations constitute a “Mental Process,” as the claims cover performance of the limitations in the human mind, given the broadest reasonable interpretation.
As per independent claims 1, 8 and 15, the claims similarly recite the limitations of:
“responsive to the request, selecting, from among a plurality of image matching models each trained to detect different categories of visual features, an image matching model based on the image attributes of the image;” A human can observe the attributes of an image and, based on their judgment of those observations, choose a suitable image matching model. There is nothing so complex in the limitation that it could not be done in the human mind.
“generating an image description that comprises a machine-generated feature vector representation including a set of image features output by the selected image matching model;” A human can observe an image and mentally gather information that describes it, using predefined criteria or a template to guide the information-gathering process. There is nothing so complex in the limitation that it could not be done in the human mind.
“associating the image description with the media content within the database as a stored feature vector representation separate from textual tags;” A human can observe image descriptions and, based on their judgment, associate the observed images with particular media content in a database. There is nothing so complex in the limitation that it could not be done in the human mind.
“accessing the media content within the database based on the set of image features of the image and the attributes of the query;” A human can read media content related to the characteristics of preselected images. There is nothing so complex in the limitation that it could not be done in the human mind.
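For illustration only, the flow recited by the independent claims can be summarized in a short Python sketch. Every name below (select_model, MODELS, vector_store, and the stand-in feature functions) is hypothetical and chosen solely to mirror the claim language; it is not a characterization of Applicant's specification:

    import numpy as np

    def keypoint_features(image):
        # Stand-in for a local-keypoint model (e.g., SIFT): a gray-level histogram.
        return np.histogram(image, bins=64, range=(0, 255))[0].astype(float)

    def semantic_features(image):
        # Stand-in for a semantic embedding model (e.g., CLIP): simple statistics.
        return np.array([image.mean(), image.std(), image.max(), image.min()], dtype=float)

    MODELS = {"keypoint": keypoint_features, "semantic": semantic_features}

    def select_model(attributes):
        # Hypothetical routing rule: pick a model category from image attributes.
        return "keypoint" if attributes.get("textured") else "semantic"

    def index_image(image, attributes, content_id, vector_store):
        # Generate the "image description" (a feature vector) and associate it
        # with the media content, stored separately from any textual tags.
        vector_store[content_id] = MODELS[select_model(attributes)](image)

    def access_media(query_image, attributes, vector_store):
        # Extract query features with the same model family and compare them
        # to each stored feature vector representation by cosine similarity.
        q = MODELS[select_model(attributes)](query_image)
        def cosine(a, b):
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            return float(a @ b / denom) if denom else 0.0
        return max(vector_store, key=lambda cid: cosine(q, vector_store[cid]))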
As per dependent claims 5, 12 and 19, the claims recite the limitation of:
“generating, at the first server, the image description that comprises the set of image features based on the image associated with the request;” A human can observe an image and mentally gather information that describes it, using predefined criteria or a template to guide the information-gathering process. There is nothing so complex in the limitation that it could not be done in the human mind.
“associating, by the first server, the image description with the media content within the database;” A human can observe image descriptions and, based on their judgment, associate the observed images with particular media content in a database. There is nothing so complex in the limitation that it could not be done in the human mind.
“accessing, at the second server, the media content within the database based on the set of image features.” A human can read media content related to the characteristics of preselected images. There is nothing so complex in the limitation that it could not be done in the human mind.
As per dependent claims 6, 13 and 20, the claims recite the limitation of:
“associating the image description with the media content and a location encompassed by a geo-fence within the database;” A human can observe image descriptions and, based on their judgment, associate the observed images with particular media content and a location within a database. There is nothing so complex in the limitation that it could not be done in the human mind.
As per dependent claims 7 and 14, the claims recite the limitation of:
“associating the image description with the media content and user profile data within the database;” A human can observe image descriptions and, based on their judgment, associate the observed images with particular media content and user profile data within a database. There is nothing so complex in the limitation that it could not be done in the human mind.
Accordingly, claims 1-20 recite at least one abstract idea.
Step 2A, Prong II: Integrated into a Practical Application?
The claims recite the following additional limitations/elements:
As per independent claims 1, 8 and 15, the claims recite the additional limitations of “receiving a request to associate an image with media content within a database, the request comprising metadata and the image comprising image attributes; receiving a query that comprises image data from a client device; extracting query image features from the image data of the query using at least one image matching model and comparing the query image features to the stored feature vector representation of the image description; and causing display of a presentation of the media content at the client device.” These limitations amount to insignificant extra-solution activity of data gathering and/or output, and can be understood as activities incidental to the primary process or product that are merely a nominal or tangential addition to the claim (see MPEP 2106.05(g)).
The claims also recite the additional elements of “a plurality of image matching models; a machine-generated feature vector representation; and a stored feature vector representation separate from textual tags,” which amount to mere instructions to apply an exception. A recitation of the words "apply it" (or an equivalent) is a mere instruction to implement an abstract idea or other exception on a computer (see MPEP 2106.05(f)).
As per independent claim 1, the claim recites the additional elements of:
“at least one processor, at least one memory component, and client devices.” These elements are examples of mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (see MPEP § 2106.05(f)). Specifically, these additional elements invoke computers or other machinery merely as a tool to perform an existing process. Use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data), or simply adding a general-purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation), does not provide an improvement to the functioning of a computer or to any other technology or technical field and does not integrate a judicial exception into a practical application.
As per independent claim 8, the claim recites the additional elements of:
“client devices.” This element is an example of mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (see MPEP § 2106.05(f)). Specifically, this additional element invokes computers or other machinery merely as a tool to perform an existing process. Use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data), or simply adding a general-purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation), does not provide an improvement to the functioning of a computer or to any other technology or technical field and does not integrate a judicial exception into a practical application.
As per independent claim 15, the claim recites the additional elements of:
“a non-transitory computer-readable storage medium, at least one processor, and client devices.” These elements are examples of mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (see MPEP § 2106.05(f)). Specifically, these additional elements invoke computers or other machinery merely as a tool to perform an existing process. Use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data), or simply adding a general-purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation), does not provide an improvement to the functioning of a computer or to any other technology or technical field and does not integrate a judicial exception into a practical application.
As per dependent claims 2, 9 and 16, the claims recite the additional limitations of “a first image matching model trained to detect local keypoint features; and a second image matching model trained to detect semantic object-level features,” which amount to mere instructions to apply an exception. A recitation of the words "apply it" (or an equivalent) is a mere instruction to implement an abstract idea or other exception on a computer (see MPEP 2106.05(f)).
As per dependent claims 3, 10 and 17, the claims recite the additional limitations of “a scale-invariant feature transform (SIFT); Gradient Location and Orientation Histogram (GLOH); histogram of oriented gradients (HOG); Fast Image Retrieval (FIRe); and Contrastive Language Image Pre-Training (CLIP) and wherein at least one of the plurality of image matching models generates a feature vector representation of the image,” which amount to mere instructions to apply an exception. A recitation of the words "apply it" (or an equivalent) is a mere instruction to implement an abstract idea or other exception on a computer (see MPEP 2106.05(f)).
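For reference, the enumerated techniques are well-known feature extractors. A minimal sketch using the OpenCV library (not cited by the claims; the input path is a placeholder) illustrates how a keypoint model such as SIFT yields a variable-size set of local descriptors, whereas a dense descriptor such as HOG yields a single fixed-length vector:

    import cv2

    # Assumes an image file exists at this placeholder path.
    image = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)

    # SIFT: local keypoint descriptors, i.e., an (n, 128) array of 128-d vectors.
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(image, None)

    # HOG: one dense gradient-histogram vector over a fixed 64x128 window
    # (about 3780 values with OpenCV's default parameters).
    hog = cv2.HOGDescriptor()
    hog_vector = hog.compute(cv2.resize(image, (64, 128)))

    # Embedding models such as CLIP would instead return a single learned
    # vector (e.g., 512-d) suitable for direct cosine comparison.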
As per dependent claims 4, 11 and 18, the claims recite the additional limitations of “wherein the media content comprises Augmented-Reality (AR) content and wherein the similarity between the query image features and the stored feature vector representation determined rendering of the AR content,” which amount to mere instructions to apply an exception. A recitation of the words "apply it" (or an equivalent) is a mere instruction to implement an abstract idea or other exception on a computer (see MPEP 2106.05(f)).
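Purely as an illustrative sketch of the quoted limitation, rendering AR content can be gated on the similarity comparison; the 0.8 threshold and the print stand-in for a renderer are assumptions, not claim requirements:

    SIMILARITY_THRESHOLD = 0.8  # assumed value, not recited in the claims

    def maybe_render_ar(similarity: float, ar_content_id: str) -> bool:
        # Render the AR content only when the query image features are
        # sufficiently similar to the stored feature vector representation.
        if similarity >= SIMILARITY_THRESHOLD:
            print(f"rendering AR content: {ar_content_id}")  # stand-in for a renderer
            return True
        return False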
As per dependent claims 5, 12 and 19, the claims recite the additional limitations of “receiving, at a first server from among the plurality of servers associated with the system, the request to associate the image with the media content; receiving, at the second server from among the plurality of servers, the query that comprises image data from the client device; extracting, at the second server, query image features from the image data and comparing the query image features to the stored feature vector representation to determine a similarity score.” The courts have recognized receiving or transmitting data over a network, e.g., using the Internet to gather data, as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. See MPEP 2106.05(d)(II)(i).
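As an illustration of the similarity-score determination quoted above, a minimal sketch follows; the cosine metric is an assumption, as the claims do not specify a metric:

    import numpy as np

    def similarity_score(query_features, stored_features):
        # Cosine similarity between the extracted query image features and the
        # stored feature vector representation (both assumed to be 1-D arrays).
        denom = np.linalg.norm(query_features) * np.linalg.norm(stored_features)
        return float(np.dot(query_features, stored_features) / denom) if denom else 0.0

    # Hypothetical usage at the "second server" after feature extraction.
    score = similarity_score(np.array([0.1, 0.9, 0.3]), np.array([0.2, 0.8, 0.4]))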
As per dependent claims 6 and 13, the claims recite the additional limitations of “wherein the query includes location data that identifies the location encompassed by the geo-fence [[.]] and wherein accessing the media content further requires satisfaction of the geo-fence constraint in addition to the similarity comparison,” which amount to mere instructions to apply an exception. A recitation of the words "apply it" (or an equivalent) is a mere instruction to implement an abstract idea or other exception on a computer (see MPEP 2106.05(f)).
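As an illustrative sketch of the conjunctive condition quoted above (geo-fence satisfaction in addition to the similarity comparison); the circular fence, haversine distance, and 0.8 threshold are assumptions:

    import math

    def inside_geofence(lat, lon, fence_lat, fence_lon, radius_m):
        # Haversine great-circle distance against a circular geo-fence.
        earth_radius_m = 6371000.0
        p1, p2 = math.radians(lat), math.radians(fence_lat)
        dp = math.radians(fence_lat - lat)
        dl = math.radians(fence_lon - lon)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * earth_radius_m * math.asin(math.sqrt(a)) <= radius_m

    def media_accessible(similarity, lat, lon, fence):
        # Both conditions must hold: the similarity match AND the geo-fence constraint.
        return similarity >= 0.8 and inside_geofence(lat, lon, *fence)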
As per dependent claims 14 and 20, the claims recite the additional limitations of “wherein the query includes the user profile data, and wherein accessing the media content further comprises considering the user profile data in combination with the similarity comparison between the extracted query image features and the stored feature vector representation,” which amount to mere instructions to apply an exception. A recitation of the words "apply it" (or an equivalent) is a mere instruction to implement an abstract idea or other exception on a computer (see MPEP 2106.05(f)).
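Similarly, “considering the user profile data in combination with the similarity comparison” can be illustrated as a weighted blend; the weights and the profile-affinity inputs below are hypothetical:

    def combined_score(similarity, profile_affinity, w_sim=0.7, w_profile=0.3):
        # Blend visual similarity with a user-profile affinity signal.
        # The weights are illustrative assumptions, not drawn from the claims.
        return w_sim * similarity + w_profile * profile_affinity

    # Hypothetical usage: rank candidate media content by the blended score.
    candidates = [("content_a", 0.92, 0.10), ("content_b", 0.85, 0.90)]
    ranked = sorted(candidates, key=lambda c: combined_score(c[1], c[2]), reverse=True)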
Therefore, claims 1-20 do not integrate the recited abstract ideas into a practical application.
Step 2B: Claim provides an Inventive Concept?
With respect to the limitations identified above as insignificant extra-solution activity, the conclusions are carried over, and the “receiving …,” “extracting …,” and “causing display …” limitations are well-understood, routine, and conventional operations.
For support that the “receiving …” and “causing display …” limitations are well-understood, routine, and conventional, see MPEP 2106.05(d)(II): “i. Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); … buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network);” and/or “iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93;” and/or “iii. Ultramercial, 772 F.3d at 716, 112 USPQ2d at 1755 (updating an activity log);” See also Display Interface - an overview | ScienceDirect Topics; Introducing ASP.NET Web Pages - Displaying Data | Microsoft Docs; Execute DBCC PAGE command to Display Contents of Data Pages in SQL Server (kodyaz.com); and Load and display paged data | Android Developers.
Considering the limitations in combination, and the claims as a whole, does not change this conclusion; the claims remain ineligible.
Therefore, claims 1-20 are not patent eligible.
Claim Rejections - 35 USC § 103
7. In the event the determination of the status of the application as subject to AIA 35 U.S.C. § 102 and § 103 (or as subject to pre-AIA 35 U.S.C. § 102 and § 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
8. Claims 1-2, 8-9 and 15-16 are rejected under 35 U.S.C. § 103 as being unpatentable over Li et al. (US 20180011876 A1), hereinafter Li I, in view of Li et al. (US 20240202230 A1), hereinafter Li II, and further in view of Kale et al. (US 20210034657 A1).
As per claim 1, Li I teaches a system comprising (i.e. “FIGS. 1A and 1B are block diagram illustrating an example of system configuration for matching images with content items according to some embodiments of the invention.”; figs. 1A-B, para. [0019]):
at least one processor (i.e. “processor 1501”; fig. 9, para. [0076]);
at least one memory (i.e. “memory 1503”; fig. 9, para. [0076]) component storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising (i.e. “Module/unit/logic 1528 may also reside, completely or at least partially, within memory 1503 and/or within processor 1501 during execution thereof by data processing system 1500, memory 1503 and processor 1501 also constituting machine-accessible storage media.”; fig. 9, para. [0076]-[0077]):
receiving a request to associate an image with media content (i.e. “Referring to FIG. 8, at block 801, processing logic receives a number of image search requests for searching images to be associated with a number of content items.”; figs. 7-8, para. [0064])
within a database (i.e. “For example, referring back to FIGS. 1A-1B, content database (DB) or server 133 may be an Ads database or Ads server.”; para. [0065]),
the request comprising metadata (i.e. “an index table for content provider based image searching, which includes a number of mapping entries, each entry mapping a content provider ID to one or more image IDs.”; figs. 7-8, para. [0046], [0060], [0064]; Examiner note: the image search includes metadata such as number of mapping entries, each entry mapping a content provider ID to one or more image IDs)
and the image comprising image attributes (i.e. “For each of the images that are identified as image candidates to be matched with a content item, a feature score is calculated for each of the features (e.g., image attributes or properties, and/or any other metadata or circumstantial data surrounding the image) that are extracted or determined from the image.”; fig. 7, para. [0060]-[0062]);
responsive to the request (i.e. “in response to a search query received from a client at block 701”; fig. 7:701, para. [0062]);
receiving a query that comprises image data and attributes from a client device (i.e. “receiving a query image, recognizing one or more text strings on the query image,”; para. [0096]; Examiner note: the query is interpreted as the query image. The image data and attributes are interpreted as the one or more text strings on the query image. Further, i.e. “The computing system can include client devices and servers”; para. [0120]);
extracting query image features from the image data of the query using at least one image matching model (i.e. “one or more text strings may also be extracted from the input image, for example, using optical character recognition (OCR), and/or translated from other languages into English using a computing model”; para. [0028]; Examiner note: the query image features are interpreted as the one or more text strings. The at least one image matching model is interpreted as the optical character recognition (OCR)) and
comparing the query image features to the stored feature vector representation of the image description (i.e. “the digital content contextual tagging system 106 can compare the search query (and/or search query n-gram) and tags (or other information) associated with the one or more selected images from the search results to determine which selected image will be associated with a multi-term contextual tag.”; para. [0069]; Examiner note: the stored feature vector representation of the image description is interpreted as the tags (or other information));
However, it is noted that the prior art of Li I does not explicitly teach “selecting, from among a plurality of image matching models each trained to detect different categories of visual features, an image matching model based on the image attributes of the image; generating an image description that comprises a machine-generated feature vector representation including a set of image features output by the selected image matching model; associating the image description with the media content within the database as a stored feature vector representation separate from textual tags; accessing the media content within the database based on the set of image features of the image and the attributes of the query; and causing display of a presentation of the media content at the client device.”
On the other hand, in the same field of endeavor, Li II teaches selecting, from among a plurality of image matching models each trained to detect different categories of visual features, an image matching model based on the image attributes of the image (i.e. “the image search system is flexible, allowing user-defined and/or user-selected computing models (e.g., machine learning models) to be used.”; para. [0027]. Further, i.e. “the image search system may not match images solely based on the input image (e.g., the query image) alone. In certain embodiments, one or more text strings may also be extracted from the input image, for example, using optical character recognition (OCR), and/or translated from other languages into English using a computing model.”; para. [0028]; Examiner note: the selecting, from among a plurality of image matching models is interpreted as user-defined and/or user-selected computing models (e.g., machine learning models). The image attributes of the image are interpreted as the one or more text strings);
generating an image description that comprises a machine-generated feature vector representation including a set of image features output by the selected image matching model (i.e. “the query vector represents one or more characteristics of the query image. In some embodiments, the query vector is converted from the query image. In certain embodiments, the query vector depends on the computing model used. In some embodiments, the query vector is a first query vector generated based on the query image using a first computing mode”; para. [0034]; Examiner note: the image description is interpreted as the characteristics of the query image. The machine-generated feature vector is interpreted as the query vector. The image features output is interpreted as the query vector generated based on the query image using a first computing mode);
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Li II, which teaches systems and methods for searching images based on query image content, into Li I, which teaches searching content with multi-dimensional image matching in response to a search query. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to improve techniques for searching data repositories—containing both text and images—using a query image. This allows users to search using an image and/or conduct searches based on both the query vector and semantic meaning extracted from the image (Li II, para. [0003]-[0004]).
However, it is noted that the prior art of Li I and Li II do not explicitly teach “associating the image description with the media content within the database as a stored feature vector representation separate from textual tags; accessing the media content within the database based on the set of image features of the image and the attributes of the query; and causing display of a presentation of the media content at the client device.”
On the other hand, in the same field of endeavor, Kale teaches associating the image description with the media content within the database as a stored feature vector representation separate from textual tags (i.e. “Moreover, as illustrated in FIG. 5B, the digital content contextual tagging system 106 can store the digital images with the associated multi-term contextual tags (from act 524) in the image index 216 (e.g., back fill into the image collection).”; figs. 2, 5A-B, para. [0050], [0101]; Examiner notes: herein the image description is considered as the multi-term contextual tags; the media content is considered as the digital images; and the database is considered as the image index. Further, i.e. “associate multi-term contextual tags and scores with images”; para. [0047]; Examiner note: The stored feature vector representation separate from textual tags is interpreted as the scores);
accessing the media content within the database based on the set of image features of the image and the attributes of the query (i.e. “FIG. 4, the digital content contextual tagging system 106 identifies user selections of the digital images portraying a “person with animal on white shirt” from the image search results as the selected images 406. Additionally, as shown in FIG. 4, the digital content contextual tagging system 106 also identifies a search query n-gram 404 from the search query in the interface 402.”; fig. 4, para. [0060], [0065]-[0066]; Examiner note: the accessing the media content within the database is interpreted as the identifies user selections of the digital images portraying a “person with animal on white shirt”. The set of image features of the image is interpreted as the person face, shirt, … in the selected images showed in fig. 4:406. The attributes of the query is interpreted as the search query n-gram in fig. 4:404); and
causing display of a presentation of the media content at the client device (i.e. “Indeed, as shown in FIG. 6, upon ranking the identified images in act 608, the digital content contextual tagging system 106 can provide the identified images in act 610 as search results (e.g., images portraying persons wearing animal shirts based on a correspondence to a multi-term contextual tag of “person with animal on white shirt”). Indeed, as illustrated in FIG. 6, the digital content contextual tagging system 106 can utilize propagated multi-term contextual tags to provide more accurate search results in response to search queries by client devices.”; fig. 6:610, para. [0124]; Examiner note: the causing display of a presentation of the media content at the client device is interpreted as the provide the identified images in act 610 as search results).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Kale, which teaches determining multi-term contextual tags for digital content and propagating the multi-term contextual tags to additional digital content, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, and Li II, which teaches systems and methods for searching images based on query image content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to exploit correlations among tags to improve relevance in the retrieval of images because it provides more robust and intuitive image search and browsing capabilities, better aligning search results with human perception and expectations, and ultimately improving the user experience (Kale, para. [0002]-[0003]).
As per claim 2, Li I, Li II and Kale teach all the limitations as discussed in claim 1 above.
However, it is noted that the prior art of Li I and Kale do not explicitly teach “wherein the selecting the image matching model comprises selecting between: a first image matching model trained to detect local keypoint features; and a second image matching model trained to detect semantic object-level features.”
On the other hand, in the same field of endeavor, Li II teaches wherein the selecting the image matching model comprises selecting between: a first image matching model trained to detect local keypoint features (i.e. “the computing model includes a first machine-learning model trained using a first set of training data and designed to identify a first type of object or a first specific object”; para. [0033]; Examiner note: the first image matching model trained is interpreted as the first machine-learning model trained. The detection of local keypoint features is interpreted as identifying a first type of object or a first specific object);
and a second image matching model trained to detect semantic object-level features (i.e. “the computing model includes a second machine-learning model trained using a second set of training data and designed to identify a second type of object or a second specific object.”; para. [0033]. Further, i.e. “the image search system extracts semantic meanings from images”; para. [0026]; Examiner note: the semantic object-level features are interpreted as the semantic meanings. The second image matching model trained is interpreted as the second machine-learning model trained).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Li II, which teaches systems and methods for searching images based on query image content, into Li I, which teaches searching content with multi-dimensional image matching in response to a search query, and Kale, which teaches determining multi-term contextual tags for digital content and propagating the multi-term contextual tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to improve techniques for searching data repositories—containing both text and images—using a query image. This allows users to search using an image and/or conduct searches based on both the query vector and semantic meaning extracted from the image (Li II, para. [0003]-[0004]).
As per claim 8, Li I teaches a method comprising (i.e. “methods”; para. [0083]):
receiving a request to associate an image with media content (i.e. “Referring to FIG. 8, at block 801, processing logic receives a number of image search requests for searching images to be associated with a number of content items.”; figs. 7-8, para. [0064])
within a database (i.e. “For example, referring back to FIGS. 1A-1B, content database (DB) or server 133 may be an Ads database or Ads server.”; para. [0065]),
the request comprising metadata (i.e. “an index table for content provider based image searching, which includes a number of mapping entries, each entry mapping a content provider ID to one or more image IDs.”; figs. 7-8, para. [0046], [0060], [0064]; Examiner note: the image search includes metadata such as number of mapping entries, each entry mapping a content provider ID to one or more image IDs)
and the image comprising image attributes (i.e. “For each of the images that are identified as image candidates to be matched with a content item, a feature score is calculated for each of the features (e.g., image attributes or properties, and/or any other metadata or circumstantial data surrounding the image) that are extracted or determined from the image.”; fig. 7, para. [0060]-[0062]);
responsive to the request (i.e. “in response to a search query received from a client at block 701”; fig. 7:701, para. [0062]);
receiving a query that comprises image data and attributes from a client device (i.e. “receiving a query image, recognizing one or more text strings on the query image,”; para. [0096]; Examiner note: the query is interpreted as the query image. The image data and attributes are interpreted as the one or more text strings on the query image. Further, i.e. “The computing system can include client devices and servers”; para. [0120])
extracting query image features from the image data of the query using at least one image matching model (i.e. “one or more text strings may also be extracted from the input image, for example, using optical character recognition (OCR), and/or translated from other languages into English using a computing model”; para. [0028])
and comparing the query image features to the stored feature vector representation of the image description (i.e. “the digital content contextual tagging system 106 can compare the search query (and/or search query n-gram) and tags (or other information) associated with the one or more selected images from the search results to determine which selected image will be associated with a multi-term contextual tag.”; para. [0069]; Examiner note: the stored feature vector representation of the image description is interpreted as the tags (or other information));
However, it is noted that the prior art of Li I does not explicitly teach “selecting, from among a plurality of image matching models each trained to detect different categories of visual features, an image matching model based on the image attributes of the image; generating an image description that comprises a machine-generated feature vector representation including a set of image features output by the selected image matching model; associating the image description with the media content within the database as a stored feature vector representation separate from textual tags; accessing the media content within the database based on the set of image features of the image and the attributes of the query; and causing display of a presentation of the media content at the client device.”
On the other hand, in the same field of endeavor, Li II teaches selecting, from among a plurality of image matching models each trained to detect different categories of visual features, an image matching model based on the image attributes of the image (i.e. “the image search system is flexible, allowing user-defined and/or user-selected computing models (e.g., machine learning models) to be used.”; para. [0027]. Further, i.e. “the image search system may not match images solely based on the input image (e.g., the query image) alone. In certain embodiments, one or more text strings may also be extracted from the input image, for example, using optical character recognition (OCR), and/or translated from other languages into English using a computing model.”; para. [0028]; Examiner note: the plurality of image matching models is interpreted as user-defined and/or user-selected computing models (e.g., machine learning models). The image attributes of the image are interpreted as the one or more text strings);
generating an image description that comprises a machine-generated feature vector representation including a set of image features output by the selected image matching model (i.e. “the query vector represents one or more characteristics of the query image. In some embodiments, the query vector is converted from the query image. In certain embodiments, the query vector depends on the computing model used. In some embodiments, the query vector is a first query vector generated based on the query image using a first computing mode”; para. [0034]; Examiner note: the image description is interpreted as the characteristics of the query image. The machine-generated feature vector is interpreted as the query vector. The image features output is interpreted as the query vector generated based on the query image using a first computing mode);
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Li II, which teaches systems and methods for searching images based on query image content, into Li I, which teaches searching content with multi-dimensional image matching in response to a search query. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to improve techniques for searching data repositories—containing both text and images—using a query image. This allows users to search using an image and/or conduct searches based on both the query vector and semantic meaning extracted from the image (Li II, para. [0003]-[0004]).
However, it is noted that the prior art of Li I and Li II do not explicitly teach “associating the image description with the media content within the database as a stored feature vector representation separate from textual tags; accessing the media content within the database based on the set of image features of the image and the attributes of the query; and causing display of a presentation of the media content at the client device.”
On the other hand, in the same field of endeavor, Kale teaches associating the image description with the media content within the database as a stored feature vector representation separate from textual tags (i.e. “Moreover, as illustrated in FIG. 5B, the digital content contextual tagging system 106 can store the digital images with the associated multi-term contextual tags (from act 524) in the image index 216 (e.g., back fill into the image collection).”; figs. 2, 5A-B, para. [0050], [0101]; Examiner notes: herein the image description is considered as the multi-term contextual tags; the media content is considered as the digital images; and the database is considered as the image index. Further, i.e. “associate multi-term contextual tags and scores with images”; para. [0047]; Examiner note: The stored feature vector representation separate from textual tags is interpreted as the scores);
accessing the media content within the database based on the set of image features of the image and the attributes of the query (i.e. “FIG. 4, the digital content contextual tagging system 106 identifies user selections of the digital images portraying a “person with animal on white shirt” from the image search results as the selected images 406. Additionally, as shown in FIG. 4, the digital content contextual tagging system 106 also identifies a search query n-gram 404 from the search query in the interface 402.”; fig. 4, para. [0060], [0065]-[0066]; Examiner note: the accessing the media content within the database is interpreted as the identifies user selections of the digital images portraying a “person with animal on white shirt”; the set of image features of the image can be considered as the person face, shirt, … in the selected images showed in fig. 4:406, the attributes of the query can be considered as the search query n-gram in fig. 4:404); and
causing display of a presentation of the media content at the client device (i.e. “Indeed, as shown in FIG. 6, upon ranking the identified images in act 608, the digital content contextual tagging system 106 can provide the identified images in act 610 as search results (e.g., images portraying persons wearing animal shirts based on a correspondence to a multi-term contextual tag of “person with animal on white shirt”). Indeed, as illustrated in FIG. 6, the digital content contextual tagging system 106 can utilize propagated multi-term contextual tags to provide more accurate search results in response to search queries by client devices.”; fig. 6:610, para. [0124]; Examiner note: the causing display of a presentation of the media content at the client device is interpreted as the provide the identified images in act 610 as search results).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Kale, which teaches determining multi-term contextual tags for digital content and propagating the multi-term contextual tags to additional digital content, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, and Li II, which teaches systems and methods for searching images based on query image content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to exploit correlations among tags to improve relevance in the retrieval of images because it provides more robust and intuitive image search and browsing capabilities, better aligning search results with human perception and expectations, and ultimately improving the user experience (Kale, para. [0002]-[0003]).
As per claim 9, Li I, Li II, and Kale teach all the limitations as discussed in claim 8 above.
Additionally, Li I teaches wherein the image comprises image attributes (i.e. “For each of the images that are identified as image candidates to be matched with a content item, a feature score is calculated for each of the features (e.g., image attributes or properties, and/or any other metadata or circumstantial data surrounding the image) that are extracted or determined from the image.”; fig. 7, para. [0060]-[0062]), and the generating the image description includes:
However, it is noted that the prior art of Li I and Kale do not explicitly teach “selecting the image matching model from among the plurality of image matching models based on the image attributes of the image, wherein the plurality of image matching models are each trained to detect different categories of visual features.”
On the other hand, in the same field of endeavor, Li II teaches selecting the image matching model from among the plurality of image matching models based on the image attributes of the image, wherein the plurality of image matching models are each trained to detect different categories of visual features (i.e. “the image search system is flexible, allowing user-defined and/or user-selected computing models (e.g., machine learning models) to be used.”; para. [0027]. Further, i.e. “the image search system may not match images solely based on the input image (e.g., the query image) alone. In certain embodiments, one or more text strings may also be extracted from the input image, for example, using optical character recognition (OCR), and/or translated from other languages into English using a computing model.”; para. [0028]; Examiner note: the plurality of image matching models is interpreted as user-defined and/or user-selected computing models (e.g., machine learning models). The image attributes of the image are interpreted as the one or more text strings).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Li II, which teaches systems and methods for searching images based on query image content, into Li I, which teaches searching content with multi-dimensional image matching in response to a search query, and Kale, which teaches determining multi-term contextual tags for digital content and propagating the multi-term contextual tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to improve techniques for searching data repositories—containing both text and images—using a query image. This allows users to search using an image and/or conduct searches based on both the query vector and semantic meaning extracted from the image (Li II, para. [0003]-[0004]).
As per claim 15, Li I teaches a non-transitory computer-readable storage medium (i.e. “computer-readable storage medium 1509”; fig. 9, para. [0077])
storing instructions that (i.e. “a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions”; fig. 9, para. [0077]),
when executed by at least one processor, cause the at least one processor to perform operations comprising (i.e. “executable code and/or data of a variety of operating systems, device drivers, firmware (e.g., input output basic system or BIOS), and/or applications can be loaded in memory 1503 and executed by processor 1501”; fig. 9, para. [0070]-[0071]):
receiving a request to associate an image with media content (i.e. “Referring to FIG. 8, at block 801, processing logic receives a number of image search requests for searching images to be associated with a number of content items.”; figs. 7-8, para. [0064])
within a database (i.e. “For example, referring back to FIGS. 1A-1B, content database (DB) or server 133 may be an Ads database or Ads server.”; para. [0065]),
the request comprising metadata (i.e. “an index table for content provider based image searching, which includes a number of mapping entries, each entry mapping a content provider ID to one or more image IDs.”; figs. 7-8, para. [0046], [0060], [0064]; Examiner note: the image search includes metadata such as number of mapping entries, each entry mapping a content provider ID to one or more image IDs)
and the image comprising image attributes (i.e. “For each of the images that are identified as image candidates to be matched with a content item, a feature score is calculated for each of the features (e.g., image attributes or properties, and/or any other metadata or circumstantial data surrounding the image) that are extracted or determined from the image.”; fig. 7, para. [0060]-[0062]);
responsive to the request (i.e. “in response to a search query received from a client at block 701”; fig. 7:701, para. [0062]);
receiving a query that comprises image data and attributes from a client device (i.e. “receiving a query image, recognizing one or more text strings on the query image,”; para. [0096]; Examiner note: the query is interpreted as the query image. The image data and attributes are interpreted as the one or more text strings on the query image. Further, i.e. “The computing system can include client devices and servers”; para. [0120]);
extracting query image features from the image data of the query using at least one image matching model (i.e. “one or more text strings may also be extracted from the input image, for example, using optical character recognition (OCR), and/or translated from other languages into English using a computing model”; para. [0028]) and
comparing the query image features to the stored feature vector representation of the image description (i.e. “the digital content contextual tagging system 106 can compare the search query (and/or search query n-gram) and tags (or other information) associated with the one or more selected images from the search results to determine which selected image will be associated with a multi-term contextual tag.”; para. [0069]; Examiner note: the stored feature vector representation of the image description is interpreted as the tags (or other information));
However, it is noted that the prior art of Li I does not explicitly teach “selecting, from among a plurality of image matching models each trained to detect different categories of visual features, an image matching model based on the image attributes of the image; generating an image description that comprises a machine-generated feature vector representation including a set of image features output by the selected image matching model; associating the image description with the media content within the database as a stored feature vector representation separate from textual tags; accessing the media content within the database based on the set of image features of the image and the attributes of the query; and causing display of a presentation of the media content at the client device.”
On the other hand, in the same field of endeavor, Li II teaches selecting, from among a plurality of image matching models each trained to detect different categories of visual features, an image matching model based on the image attributes of the image (i.e. “the image search system is flexible, allowing user-defined and/or user-selected computing models (e.g., machine learning models) to be used.”; para. [0027]. Further, i.e. “the image search system may not match images solely based on the input image (e.g., the query image) alone. In certain embodiments, one or more text strings may also be extracted from the input image, for example, using optical character recognition (OCR), and/or translated from other languages into English using a computing model.”; para. [0028]; Examiner note: the plurality of image matching models is interpreted as user-defined and/or user-selected computing models (e.g., machine learning models). The image attributes of the image are interpreted as the one or more text strings);
generating an image description that comprises a machine-generated feature vector representation including a set of image features output by the selected image matching model (i.e. “the query vector represents one or more characteristics of the query image. In some embodiments, the query vector is converted from the query image. In certain embodiments, the query vector depends on the computing model used. In some embodiments, the query vector is a first query vector generated based on the query image using a first computing mode”; para. [0034]; Examiner note: the image description is interpreted as the characteristics of the query image. The machine-generated feature vector is interpreted as the query vector. The image features output is interpreted as the query vector generated based on the query image using a first computing mode);
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Li II, which teaches systems and methods for searching images based on query image content, into Li I, which teaches searching content with multi-dimensional image matching in response to a search query. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to improve techniques for searching data repositories—containing both text and images—using a query image. This allows users to search using an image and/or conduct searches based on both the query vector and semantic meaning extracted from the image (Li II, para. [0003]-[0004]).
However, it is noted that the prior art of Li I and Li II do not explicitly teach “associating the image description with the media content within the database as a stored feature vector representation separate from textual tags; accessing the media content within the database based on the set of image features of the image and the attributes of the query; and causing display of a presentation of the media content at the client device.”
On the other hand, in the same field of endeavor, Kale teaches associating the image description with the media content within the database as a stored feature vector representation separate from textual tags (i.e. “Moreover, as illustrated in FIG. 5B, the digital content contextual tagging system 106 can store the digital images with the associated multi-term contextual tags (from act 524) in the image index 216 (e.g., back fill into the image collection).”; figs. 2, 5A-B, para. [0050], [0101]; Examiner notes: herein the image description is considered as the multi-term contextual tags; the media content is considered as the digital images; and the database is considered as the image index. Further, i.e. “associate multi-term contextual tags and scores with images”; para. [0047]; Examiner note: The stored feature vector representation separate from textual tags is interpreted as the scores);
accessing the media content within the database based on the set of image features of the image and the attributes of the query (i.e. “FIG. 4, the digital content contextual tagging system 106 identifies user selections of the digital images portraying a “person with animal on white shirt” from the image search results as the selected images 406. Additionally, as shown in FIG. 4, the digital content contextual tagging system 106 also identifies a search query n-gram 404 from the search query in the interface 402.”; fig. 4, para. [0060], [0065]-[0066]; Examiner note: the accessing the media content within the database is interpreted as the identifies user selections of the digital images portraying a “person with animal on white shirt”; the set of image features of the image can be considered as the person face, shirt, … in the selected images showed in fig. 4:406, the attributes of the query can be considered as the search query n-gram in fig. 4:404); and
causing display of a presentation of the media content at the client device (i.e. “Indeed, as shown in FIG. 6, upon ranking the identified images in act 608, the digital content contextual tagging system 106 can provide the identified images in act 610 as search results (e.g., images portraying persons wearing animal shirts based on a correspondence to a multi-term contextual tag of “person with animal on white shirt”). Indeed, as illustrated in FIG. 6, the digital content contextual tagging system 106 can utilize propagated multi-term contextual tags to provide more accurate search results in response to search queries by client devices.”; fig. 6:610, para. [0124]; Examiner note: the causing display of a presentation of the media content at the client device is interpreted as the provide the identified images in act 610 as search results).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, and Li II, which teaches systems and methods for searching images based on query image content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to exploit correlations among tags to improve relevance in the retrieval of images because it provides more robust and intuitive image search and browsing capabilities, better aligning search results with human perception and expectations, and ultimately improving the user experience (Kale, para. [0002]-[0003]).
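For illustration only, a minimal Python sketch of a record layout in which the stored feature vector representation occupies a field separate from the textual tags, and retrieval consults both the image features and an attribute of the query; the schema and function names are assumptions of this sketch, not Kale's disclosed implementation.

import numpy as np

database = {}

def associate(media_id: str, tags: list, feature_vector: np.ndarray):
    # The feature vector is stored in its own field, separate from the tags.
    database[media_id] = {"tags": tags, "features": feature_vector}

def access(query_features: np.ndarray, query_attribute: str):
    # Access is based on both the image features and an attribute of the query.
    hits = []
    for media_id, record in database.items():
        similarity = float(query_features @ record["features"] /
                           (np.linalg.norm(query_features) *
                            np.linalg.norm(record["features"])))
        if query_attribute in record["tags"]:
            hits.append((media_id, similarity))
    return sorted(hits, key=lambda hit: -hit[1])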
As per claim 16, Li I, Li II, and Kale teach all the limitations as discussed in claim 15 above.
Additionally, Li I teaches wherein the image comprises image attributes (i.e. “For each of the images that are identified as image candidates to be matched with a content item, a feature score is calculated for each of the features (e.g., image attributes or properties, and/or any other metadata or circumstantial data surrounding the image) that are extracted or determined from the image.”; fig. 7, para. [0060]-[0062]), and the generating the image description includes:
However, it is noted that the prior art of Li I and Kale does not explicitly teach “selecting the image matching model from among the plurality of image matching models based on the image attributes of the image, wherein the plurality of image matching models are each trained to detect different categories of visual features.”
On the other hand, in the same field of endeavor, Li II teaches selecting the image matching model from among the plurality of image matching models based on the image attributes of the image, wherein the plurality of image matching models are each trained to detect different categories of visual features (i.e. “the image search system is flexible, allowing user-defined and/or user-selected computing models (e.g., machine learning models) to be used.”; para. [0027]. Further, i.e. “the image search system may not match images solely based on the input image (e.g., the query image) alone. In certain embodiments, one or more text strings may also be extracted from the input image, for example, using optical character recognition (OCR), and/or translated from other languages into English using a computing model.”; para. [0028]; Examiner note: the plurality of image matching models is interpreted as user-defined and/or user-selected computing models (e.g., machine learning models). The image attributes of the image are interpreted as the one or more text strings).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Li II, which teaches systems and methods for searching images based on query image content, into Li I, which teaches searching content with multi-dimensional image matching in response to a search query, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to improve techniques for searching data repositories—containing both text and images—using a query image. This allows users to search using an image and/or conduct searches based on both the query vector and semantic meaning extracted from the image (Li II, para. [0003]-[0004]).
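For illustration only, a minimal Python sketch of selecting an image matching model based on attributes of the image, where each model is assumed to be trained for a different category of visual features; the attribute names and category keys are invented for this sketch.

def select_model(image_attributes: set, models_by_category: dict):
    # Route images containing text to a text-oriented model, faces to a
    # face-oriented model, and everything else to a general-purpose model.
    if "text" in image_attributes:
        return models_by_category["text"]
    if "face" in image_attributes:
        return models_by_category["face"]
    return models_by_category["general"]

# Example: a text-bearing image is matched with the text-oriented model.
models_by_category = {"text": "ocr_model", "face": "face_model", "general": "clip_model"}
chosen = select_model({"text"}, models_by_category)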
9. Claims 3, 10 and 17 are rejected under 35 U.S.C. § 103 as being unpatentable over Li et al. (US 20180011876 A1) hereinafter Li I in view of Li et al. (US 20240202230 A1) hereinafter Li II in further view of Kale et al. (US 20210034657 A1) still in further view of Song et al. (US 20190318195 A1).
As per claim 3, Li I, Li II, and Kale teach all the limitations as discussed in claim 1 above.
However, it is noted that the combination of the prior arts of Li I, Li II, and Kale does not explicitly teach “wherein the plurality of image matching models includes a scale-invariant feature transform (SIFT); Gradient Location and Orientation Histogram (GLOH); histogram of oriented gradients (HOG); Fast Image Retrieval (FIRe); Contrastive Language Image Pre-Training (CLIP); and wherein at least one of the plurality of image matching models generates a feature vector representation of the image.”
On the other hand, in the same field of endeavor, Song teaches wherein the plurality of image matching models includes (i.e. “The feature detection algorithm may include”; para. [0061]):
a scale-invariant feature transform (SIFT); Gradient Location and Orientation Histogram (GLOH); histogram of oriented gradients (HOG); Fast Image Retrieval (FIRe); Contrastive Language Image Pre-Training (CLIP) (i.e. “the scale-invariant feature transform (SIFT)”; para. [0043]; “Gradient Location and Orientation Histogram (GLOH) descriptors”; para. [0043]; “Histograms of Oriented Gradient (HOG)”; para. [0061]; “FAST”; para. [0061]. Examiner note: the analysis is based on FAST algorithms, which results in Fast Image Retrieval); and
wherein at least one of the plurality of image matching models generates a feature vector representation of the image (i.e. “SIFT feature may comprise a 128-dimension vector, or a 36-dimension vector, depending on how the SIFT feature detection algorithm is configured.”; para. [0070]; Examiner note: the one of the plurality of image matching models is interpreted as the SIFT).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Song, which teaches image-based object recognition, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to enable the creation of compact and efficient recognition libraries for image-based object recognition, thereby improving the databases necessary for accurate object recognition (Song, para. [0006]).
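For illustration only, a minimal Python sketch of SIFT descriptor extraction with OpenCV, consistent with Song's observation (para. [0070]) that a default SIFT descriptor is a 128-dimension vector; the image path is a placeholder.

import cv2

# Load a query image in grayscale; "query.jpg" is a placeholder path.
img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()  # available in opencv-python 4.4 and later
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors has shape (num_keypoints, 128): one feature vector per keypoint.
print(None if descriptors is None else descriptors.shape)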
As per claim 10, Li I, Li II, and Kale teach all the limitations as discussed in claim 9 above.
However, it is noted that the combination of the prior arts of Li I, Li II, and Kale does not explicitly teach “wherein the image matching model includes one or more of: a scale-invariant feature transform (SIFT); Gradient Location and Orientation Histogram (GLOH); histogram of oriented gradients (HOG); Fast Image Retrieval (FIRe); Contrastive Language Image Pre-Training (CLIP); and wherein at least one of the plurality of image matching models generates a feature vector representation of the image.”
On the other hand, in the same field of endeavor, Song teaches wherein the plurality of image matching models includes (i.e. “The feature detection algorithm may include”; para. [0061]):
a scale-invariant feature transform (SIFT); Gradient Location and Orientation Histogram (GLOH); histogram of oriented gradients (HOG); Fast Image Retrieval (FIRe); Contrastive Language Image Pre-Training (CLIP) (i.e. “the scale-invariant feature transform (SIFT)”; para. [0043]; “Gradient Location and Orientation Histogram (GLOH) descriptors”; para. [0043]; “Histograms of Oriented Gradient (HOG)”; para. [0061]; “FAST”; para. [0061]. Examiner note: the analysis is based on FAST algorithms, which results in Fast Image Retrieval); and
wherein at least one of the plurality of image matching models generates a feature vector representation of the image (i.e. “SIFT feature may comprise a 128-dimension vector, or a 36-dimension vector, depending on how the SIFT feature detection algorithm is configured.”; para. [0070]; Examiner note: the one of the plurality of image matching models is interpreted as the SIFT).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Song, which teaches image-based object recognition, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to enable the creation of compact and efficient recognition libraries for image-based object recognition, thereby improving the databases necessary for accurate object recognition (Song, para. [0006]).
As per claim 17, Li I, Li II, and Kale teach all the limitations as discussed in claim 16 above.
However, it is noted that the combination of the prior arts of Li I, Li II, and Kale does not explicitly teach “wherein the image matching model includes one or more of: a scale-invariant feature transform (SIFT); Gradient Location and Orientation Histogram (GLOH); histogram of oriented gradients (HOG); Fast Image Retrieval (FIRe); Contrastive Language Image Pre-Training (CLIP); and wherein at least one of the plurality of image matching models generates a feature vector representation of the image.”
On the other hand, in the same field of endeavor, Song teaches wherein the plurality of image matching models includes (i.e. “The feature detection algorithm may include”; para. [0061]):
a scale-invariant feature transform (SIFT); Gradient Location and Orientation Histogram (GLOH); histogram of oriented gradients (HOG); Fast Image Retrieval (FIRe); Contrastive Language Image Pre-Training (CLIP) (i.e. “the scale-invariant feature transform (SIFT)”; para. [0043]; “Gradient Location and Orientation Histogram (GLOH) descriptors”; para. [0043]; “Histograms of Oriented Gradient (HOG)”; para. [0061]; “FAST”; para. [0061]. Examiner note: the analysis is based on FAST algorithms, which results in Fast Image Retrieval); and
wherein at least one of the plurality of image matching models generates a feature vector representation of the image (i.e. “SIFT feature may comprise a 128-dimension vector, or a 36-dimension vector, depending on how the SIFT feature detection algorithm is configured.”; para. [0070]; Examiner note: the one of the plurality of image matching models is interpreted as the SIFT).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Song, which teaches image-based object recognition, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to enable the creation of compact and efficient recognition libraries for image-based object recognition, thereby improving the databases necessary for accurate object recognition (Song, para. [0006]).
10. Claims 4, 11 and 18 are rejected under 35 U.S.C. § 103 as being unpatentable over Li et al. (US 20180011876 A1) hereinafter Li I in view of Li et al. (US 20240202230 A1) hereinafter Li II in further view of Kale et al. (US 20210034657 A1) still in further view of Martin (US 20170337744 A1).
As per claim 4, Li I, Li II and Kale teach all the limitations as discussed in claim 1 above.
However, it is noted that the combination of the prior arts of Li I, Li II and Kale does not explicitly teach “wherein the media content comprises Augmented-Reality (AR) content and wherein the similarity between the query image features and the stored feature vector representation determined rendering of the AR content.”
On the other hand, in the same field of endeavor, Martin teaches wherein the media content comprises Augmented-Reality (AR) content (i.e. “Social Media Content”; para. [0116]. Further, i.e. “AR client device”; para. [0120]) and
wherein the similarity between the query image features and the stored feature vector representation determined rendering of the AR content (i.e. “Media Tags representing social media data from third party sources may look similar to those created within the digital media client software 630,”; para. [0140]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Martin, which teaches augmented reality and virtual reality environments, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to utilize data from body or body-part motion-tracking sensors, allowing for dynamic adjustments in digital media size based on proximity to the real-world setting (Martin, para. [0003]-[0004]).
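For illustration only, a minimal Python sketch in which a similarity comparison between query image features and a stored feature vector representation gates the rendering of AR content; the threshold value and the render() stub are assumptions of this sketch and are not taken from Martin.

import numpy as np

AR_THRESHOLD = 0.85  # assumed cutoff; a real system would tune this

def render(ar_content):
    print("rendering", ar_content)  # stand-in for a client AR engine

def maybe_render_ar(query_features: np.ndarray,
                    stored_features: np.ndarray, ar_content) -> bool:
    # Cosine similarity between the query features and the stored vector
    # determines whether the AR content is rendered.
    similarity = float(query_features @ stored_features /
                       (np.linalg.norm(query_features) *
                        np.linalg.norm(stored_features)))
    if similarity >= AR_THRESHOLD:
        render(ar_content)
        return True
    return False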
As per claim 11, Li I, Li II and Kale teach all the limitations as discussed in claim 8 above.
However, it is noted that the combination of the prior arts of Li I, Li II and Kale does not explicitly teach “wherein the media content comprises Augmented-Reality (AR) content, and wherein accessing the media content based on similarity between the extracted query image features and the stored feature vector representation triggers rendering of the AR content.”
On the other hand, in the same field of endeavor, Martin teaches wherein the media content comprises Augmented-Reality (AR) content (i.e. “Social Media Content”; para. [0116]. Further, i.e. “AR client device”; para. [0120]), and
wherein accessing the media content based on similarity between the extracted query image features and the stored feature vector representation triggers rendering of the AR content (i.e. “Note that Media Tags representing social media data from third party sources may look similar to those created within the digital media client software 630,”; para. [0140]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Martin, which teaches augmented reality and virtual reality environments, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to utilize data from body or body-part motion-tracking sensors, allowing for dynamic adjustments in digital media size based on proximity to the real-world setting (Martin, para. [0003]-[0004]).
As per claim 18, Li I, Li II and Kale teach all the limitations as discussed in claim 15 above.
However, it is noted that the combination of the prior arts of Li I, Li II and Kale does not explicitly teach “wherein the media content comprises Augmented-Reality (AR) content, and wherein accessing the media content based on similarity between the extracted query image features and the stored feature vector representation triggers rendering of the AR content.”
On the other hand, in the same field of endeavor, Martin teaches wherein the media content comprises Augmented-Reality (AR) content (i.e. “Social Media Content”; para. [0116]. Further, i.e. “AR client device”; para. [0120]), and
wherein accessing the media content based on similarity between the extracted query image features and the stored feature vector representation triggers rendering of the AR content (i.e. “Note that Media Tags representing social media data from third party sources may look similar to those created within the digital media client software 630,”; para. [0140]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Martin, which teaches augmented reality and virtual reality environments, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to utilize data from body or body-part motion-tracking sensors, allowing for dynamic adjustments in digital media size based on proximity to the real-world setting (Martin, para. [0003]-[0004]).
11. Claims 5, 12 and 19 are rejected under 35 U.S.C. § 103 as being unpatentable over Li et al. (US 20180011876 A1) hereinafter Li I in view of Li et al. (US 20240202230 A1) hereinafter Li II in further view of Kale et al. (US 20210034657 A1) still in further view of Baril et al. (US 11036781 B1) still in further view of Wexler et al. (US 20190294629 A1).
As per claim 5, Li I, Li II and Kale teach all the limitations as discussed in claim 1 above.
Additionally, Li II teaches extracting, at the second server, query image features from the image data (i.e. “one or more text strings may also be extracted from the input image, for example, using optical character recognition (OCR), and/or translated from other languages into English using a computing model”; para. [0028]; Examiner note: the second server is taught by Wexler), and
Kale teaches comparing the query image features to the stored feature vector representation to determine a similarity score (i.e. “the digital content contextual tagging system 106 can compare the search query (and/or search query n-gram) and tags (or other information) associated with the one or more selected images from the search results to determine which selected image will be associated with a multi-term contextual tag.”; para. [0069]; Examiner note: the stored feature vector representation of the image description is interpreted as the tags (or other information));
However, it is noted that the combination of the prior arts of Li I, Li II and Kale does not explicitly teach “wherein the system comprises a plurality of servers, and wherein the method further comprises: receiving, at a first server from among the plurality of servers associated with the system, the request to associate the image with the media content; generating, at the first server, the image description that comprises the set of image features based on the image associated with the request; associating, by the first server, the image description with the media content within the database; receiving, at the second server from among the plurality of servers, the query that comprises image data from the client device; and accessing, at the second server, the media content within the database based on the set of image features.”
On the other hand, in the same field of endeavor, Baril teaches wherein the system comprises a plurality of servers (i.e. “plurality of servers”; Col.14 lines 43-45), and
wherein the method further comprises: receiving (i.e. “receive”; Col. 13 lines 29-35),
at a first server from among the plurality of servers associated with the system (i.e. “each of the servers in the plurality of servers can perform operations”; Col.14 lines 45-53),
the request to associate the image with the media content (i.e. “request for a media content item from a client device”; Col. 13 lines 29-35);
generating, at the first server, the image description that comprises the set of image features based on the image associated with the request (i.e. “a particular image included in the message image payload 406 depicts an animal (e.g., a lion), a tag value may be included within the message tag 420 that is indicative of the relevant animal”; Col. 9 lines 5-10. Further, i.e. “Receives the request from the client device...the request 65 can be sent to the delivery server system and the client device 102 selects a selectable icon or link”; Col. 9 lines 64-67. Further, i.e. “The user selects the link "Night Court" (description) in the user interface”; Col.10 lines 10-13);
associating, by the first server, the image description with the media content within the database (i.e. “the database 120 also stores a metadata table ... the meta data table 316 includes the meta data associated with the media content items”; Col. 8 lines 1-6. Further, i.e. “A particular image included in the message image payload 406 depicts an animal (e.g., a lion), a tag value may be included within the message tag 420 that is indicative of the relevant animal”; Col. 9 lines 5-10. Further, i.e. “Plurality of servers”; Col.14 lines 43-45);
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Baril, which teaches the use of customized avatars within electronic messages such as text messages, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to support the increasing use of customized avatars in electronic messages, such as texts, emails, and chats, reflecting a global demand for more visual communication that can convey feelings more accurately (Baril, Col. 1, lines 17-29).
However, it is noted that the combination of the prior arts of Li I, Li II, Kale and Baril does not explicitly teach “receiving, at the second server from among the plurality of servers, the query that comprises image data from the client device; and accessing, at the second server, the media content within the database based on the set of image features.”
On the other hand, in the same field of endeavor, Wexler teaches receiving (i.e. “receives inputs”; para. [0041]),
at the second server from among the plurality of servers (i.e. “one or more servers”; para. [0104]),
the query that comprises image data from the client device (i.e. “server 108 receives a query from a second user ... matches the query to a first index entry in the image catalog”; para. [0146]);
accessing (i.e. “access to a given image”; para. [0153]),
at the second server (i.e. “one or more servers”; para. [0104]),
the media content within the database based on the set of image features (i.e. “request that provides access to a given image repository (database)”; para. [0153]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wexler, which teaches a digital image system for cataloging images from multiple sources and using the catalog for discovering and viewing images, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content, and Baril, which teaches the use of customized avatars within electronic messages such as text messages. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to enable queries to the image catalog to operate on groups of images, thereby reducing bandwidth and improving performance (Wexler, para. [0062]).
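For illustration only, a minimal Python sketch of the claimed division of labor between a first server that associates image features with media content and a second server that services image-data queries by similarity score; the shared in-memory store and function names are assumptions of this sketch.

import numpy as np

DATABASE = {}  # stands in for the database reachable by both servers

def first_server_ingest(media_id: str, image_features: np.ndarray):
    # First server: handles the request to associate an image with media
    # content and writes the generated image description into the database.
    DATABASE[media_id] = image_features

def second_server_query(query_features: np.ndarray):
    # Second server: receives the image-data query, compares the query image
    # features to each stored feature vector, and ranks by similarity score.
    scores = []
    for media_id, stored in DATABASE.items():
        similarity = float(query_features @ stored /
                           (np.linalg.norm(query_features) * np.linalg.norm(stored)))
        scores.append((media_id, similarity))
    return sorted(scores, key=lambda s: -s[1])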
As per claim 12, Li I, Li II and Kale teach all the limitations as discussed in claim 8 above.
Additionally, Li II teaches extracting, at the second server, query image features from the image data (i.e. “one or more text strings may also be extracted from the input image, for example, using optical character recognition (OCR), and/or translated from other languages into English using a computing model”; para. [0028]; Examiner note: the second server is taught by Wexler), and
Kale teaches comparing the query image features to the stored feature vector representation to determine a similarity score (i.e. “the digital content contextual tagging system 106 can compare the search query (and/or search query n-gram) and tags (or other information) associated with the one or more selected images from the search results to determine which selected image will be associated with a multi-term contextual tag.”; para. [0069]; Examiner note: the stored feature vector representation of the image description is interpreted as the tags (or other information));
However, it is noted that the combination of the prior arts of Li I, Li II and Kale does not explicitly teach “wherein the system comprises a plurality of servers, and wherein the method further comprises: receiving, at a first server from among the plurality of servers associated with the system, the request to associate the image with the media content; generating, at the first server, the image description that comprises the set of image features based on the image associated with the request; associating, by the first server, the image description with the media content within the database; receiving, at the second server from among the plurality of servers, the query that comprises image data from the client device; and accessing, at the second server, the media content within the database based on the set of image features.”
On the other hand, in the same field of endeavor, Baril teaches wherein the system comprises a plurality of servers (i.e. “plurality of servers”; Col.14 lines 43-45), and
wherein the method further comprises: receiving (i.e. “receive”; Col. 13 lines 29-35),
at a first server from among the plurality of servers associated with the system (i.e. “each of the servers in the plurality of servers can perform operations”; Col.14 lines 45-53),
the request to associate the image with the media content (i.e. “request for a media content item from a client device”; Col. 13 lines 29-35);
generating, at the first server, the image description that comprises the set of image features based on the image associated with the request (i.e. “a particular image included in the message image payload 406 depicts an animal (e.g., a lion), a tag value may be included within the message tag 420 that is indicative of the relevant animal”; Col. 9 lines 5-10. Further, i.e. “Receives the request from the client device...the request 65 can be sent to the delivery server system and the client device 102 selects a selectable icon or link”; Col. 9 lines 64-67. Further, i.e. “The user selects the link "Night Court" (description) in the user interface”; Col.10 lines 10-13);
associating, by the first server, the image description with the media content within the database (i.e. “the database 120 also stores a metadata table ... the meta data table 316 includes the meta data associated with the media content items”; Col. 8 lines 1-6. Further, i.e. “A particular image included in the message image payload 406 depicts an animal (e.g., a lion), a tag value may be included within the message tag 420 that is indicative of the relevant animal”; Col. 9 lines 5-10. Further, i.e. “Plurality of servers”; Col.14 lines 43-45);
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Baril, which teaches the use of customized avatars within electronic messages such as text messages, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to support the increasing use of customized avatars in electronic messages, such as texts, emails, and chats, reflecting a global demand for more visual communication that can convey feelings more accurately (Baril, Col. 1, lines 17-29).
However, it is noted that the combination of the prior arts of Li I, Li II, Kale and Baril does not explicitly teach “receiving, at the second server from among the plurality of servers, the query that comprises image data from the client device; and accessing, at the second server, the media content within the database based on the set of image features.”
On the other hand, in the same field of endeavor, Wexler teaches receiving (i.e. “receives inputs”; para. [0041]),
at the second server from among the plurality of servers (i.e. “one or more servers”; para. [0104]),
the query that comprises image data from the client device (i.e. “server 108 receives a query from a second user ... matches the query to a first index entry in the image catalog”; para. [0146]);
accessing (i.e. “access to a given image”; para. [0153]),
at the second server (i.e. “one or more servers”; para. [0104]),
the media content within the database based on the set of image features (i.e. “request that provides access to a given image repository (database)”; para. [0153]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wexler, which teaches a digital image system for cataloging images from multiple sources and using the catalog for discovering and viewing images, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content, and Baril, which teaches the use of customized avatars within electronic messages such as text messages. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to enable queries to the image catalog to operate on groups of images, thereby reducing bandwidth and improving performance (Wexler, para. [0062]).
As per claim 19, Li I, Li II and Kale teach all the limitations as discussed in claim 15 above.
Additionally, Li II teaches extracting, at the second server, query image features from the image data (i.e. “one or more text strings may also be extracted from the input image, for example, using optical character recognition (OCR), and/or translated from other languages into English using a computing model”; para. [0028]; Examiner note: the second server is taught by Wexler), and
Kale teaches comparing the query image features to the stored feature vector representation to determine a similarity score (i.e. “the digital content contextual tagging system 106 can compare the search query (and/or search query n-gram) and tags (or other information) associated with the one or more selected images from the search results to determine which selected image will be associated with a multi-term contextual tag.”; para. [0069]; Examiner note: the stored feature vector representation of the image description is interpreted as the tags (or other information));
However, it is noted that the combination of the prior arts of Li I, Li II and Kale does not explicitly teach “wherein the system comprises a plurality of servers, and wherein the method further comprises: receiving, at a first server from among the plurality of servers associated with the system, the request to associate the image with the media content; generating, at the first server, the image description that comprises the set of image features based on the image associated with the request; associating, by the first server, the image description with the media content within the database; receiving, at the second server from among the plurality of servers, the query that comprises image data from the client device; and accessing, at the second server, the media content within the database based on the set of image features.”
On the other hand, in the same field of endeavor, Baril teaches wherein the system comprises a plurality of servers (i.e. “plurality of servers”; Col.14 lines 43-45), and
wherein the method further comprises: receiving (i.e. “receive”; Col. 13 lines 29-35),
at a first server from among the plurality of servers associated with the system (i.e. “each of the servers in the plurality of servers can perform operations”; Col.14 lines 45-53),
the request to associate the image with the media content (i.e. “request for a media content item from a client device”; Col. 13 lines 29-35);
generating, at the first server, the image description that comprises the set of image features based on the image associated with the request (i.e. “a particular image included in the message image payload 406 depicts an animal (e.g., a lion), a tag value may be included within the message tag 420 that is indicative of the relevant animal”; Col. 9 lines 5-10. Further, i.e. “Receives the request from the client device...the request 65 can be sent to the delivery server system and the client device 102 selects a selectable icon or link”; Col. 9 lines 64-67. Further, i.e. “The user selects the link "Night Court" (description) in the user interface”; Col.10 lines 10-13);
associating, by the first server, the image description with the media content within the database (i.e. “the database 120 also stores a metadata table ... the meta data table 316 includes the meta data associated with the media content items”; Col. 8 lines 1-6. Further, i.e. “A particular image included in the message image payload 406 depicts an animal (e.g., a lion), a tag value may be included within the message tag 420 that is indicative of the relevant animal”; Col. 9 lines 5-10. Further, i.e. “Plurality of servers”; Col.14 lines 43-45);
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Baril, which teaches the use of customized avatars within electronic messages such as text messages, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to support the increasing use of customized avatars in electronic messages, such as texts, emails, and chats, reflecting a global demand for more visual communication that can convey feelings more accurately (Baril, Col. 1, lines 17-29).
However, it is noted that the combination of the prior arts of Li I, Li II, Kale and Baril does not explicitly teach “receiving, at the second server from among the plurality of servers, the query that comprises image data from the client device; and accessing, at the second server, the media content within the database based on the set of image features.”
On the other hand, in the same field of endeavor, Wexler teaches receiving (i.e. “receives inputs”; para. [0041]),
at the second server from among the plurality of servers (i.e. “one or more servers”; para. [0104]),
the query that comprises image data from the client device (i.e. “server 108 receives a query from a second user ... matches the query to a first index entry in the image catalog”; para. [0146]);
accessing (i.e. “access to a given image”; para. [0153]),
at the second server (i.e. “one or more servers”; para. [0104]),
the media content within the database based on the set of image features (i.e. “request that provides access to a given image repository (database)”; para. [0153]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wexler, which teaches a digital image system for cataloging images from multiple sources and using the catalog for discovering and viewing images, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content, and Baril, which teaches the use of customized avatars within electronic messages such as text messages. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to enable queries to the image catalog to operate on groups of images, thereby reducing bandwidth and improving performance (Wexler, para. [0062]).
12. Claims 6-7, 13-14 and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over Li et al. (US 20180011876 A1) hereinafter Li I in view of Li et al. (US 20240202230 A1) hereinafter Li II in further view of Kale et al. (US 20210034657 A1) still in further view of Wexler et al. (US 20190294629 A1).
As per claim 6, Li I, Li II and Kale teach all the limitations as discussed in claim 1 above.
However, it is noted that the combination of the prior arts of Li I, Li II and Kale does not explicitly teach “wherein the associating the image description with the media content within the database further comprises: associating the image description with the media content and a location encompassed by a geo-fence within the database; and wherein the query includes location data that identifies the location encompassed by the geo-fence and wherein accessing the media content further requires satisfaction of the geo-fence constraint in addition to the similarity comparison.”
On the other hand, in the same field of endeavor, Wexler teaches wherein the associating the image description with the media content within the database further comprises (i.e. “the server analyzes the respective image to extract respective keywords that describe the respective image”; para. [0104]):
associating the image description with the media content and a location encompassed by a geo-fence within the database (i.e. “the location where the image was created (e.g., GPS data) can be stored ... generate a heatmap to show popular spots where tourists take photos as opposed to where locals take photos”; para. [0174] and FIG. 6); and
wherein the query includes location data that identifies the location encompassed by the geo-fence (i.e. “search by filtering the results based on the tapped location”; para. [0130]) and
wherein accessing the media content further requires satisfaction of the geo-fence constraint in addition to the similarity comparison (i.e. “images may live in any location, allowing the user to group collections of images stored in separate repositories, avoiding duplication and workflow changes.”; para. [0160]; Examiner note: the geo-fence constraint is interpreted as the location. The similarity comparison is interpreted as grouping collections of images stored in separate repositories while avoiding duplication).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wexler, which teaches a digital image system for cataloging images from multiple sources and using the catalog for discovering and viewing images, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to enable queries to the image catalog to operate on groups of images, thereby reducing bandwidth and improving performance (Wexler, para. [0062]).
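For illustration only, a minimal Python sketch in which access requires both satisfaction of a geo-fence constraint and a similarity comparison; the circular fence, haversine distance, and threshold are assumptions of this sketch rather than Wexler's disclosed mechanism.

import math
import numpy as np

def within_geofence(lat, lon, fence_lat, fence_lon, radius_m) -> bool:
    # Haversine distance between the query location and the fence center.
    earth_radius_m = 6371000.0
    p1, p2 = math.radians(lat), math.radians(fence_lat)
    dphi = math.radians(fence_lat - lat)
    dlmb = math.radians(fence_lon - lon)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a)) <= radius_m

def accessible(query_features, stored_features, query_location, fence,
               threshold=0.8) -> bool:
    # Access requires BOTH the geo-fence constraint and the similarity test.
    similarity = float(np.dot(query_features, stored_features) /
                       (np.linalg.norm(query_features) *
                        np.linalg.norm(stored_features)))
    return within_geofence(*query_location, *fence) and similarity >= threshold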
As per claim 7, Li I, Li II and Kale teach all the limitations as discussed in claim 1 above.
However, it is noted that the combination of the prior arts of Li I, Li II and Kale does not explicitly teach “wherein the associating the image description with the media content within the database further comprises: associating the image description with the media content and user profile data within the database; and wherein the query includes the user profile data and the image description and wherein the user profile data modifies at least one weighting applied during comparison of the query image features and the stored feature vector representation.”
On the other hand, in the same field of endeavor, Wexler teaches wherein the associating the image description with the media content within the database further comprises (i.e. “the server analyzes the respective image to extract respective keywords that describe the respective image”; para. [0104]):
associating the image description with the media content (i.e. “the server analyzes the respective image to extract respective keywords that describe the respective image”; para. [0104]. Further, i.e. “Neural Network to analyze each image to produce a set of keywords describing the semantic content”; para. [0113]) and
user profile data within the database (i.e. “user profile data 452 storing one or more user accounts associated with a user of viewer device”; para. [0099]); and
wherein the query includes the user profile data (i.e. “show the profile of the current search or the entire image portfolio”; para. [0136]) and
the image description (i.e. “keywords that describe the respective image”; para. [0104]) and
wherein the user profile data modifies at least one weighting applied during comparison of the query image features and the stored feature vector representation (i.e. “Context specific data, such as company events, historical statistics (e.g. play-by-play data from each game), facial geometries of key individuals, and corporate hierarchies can be compared against the data generated by the initial pass to generate a higher level set of keywords that improve search”; para. [0120]; Examiner note: modifying at least one weighting is interpreted as generating a higher-level set of keywords. The comparison of the query image features and the stored feature vector representation is interpreted as comparing the context-specific data, such as company events, historical statistics (e.g. play-by-play data from each game), and facial geometries of key individuals, against the data generated by the initial pass).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wexler, which teaches a digital image system for cataloging images from multiple sources and using the catalog for discovering and viewing images, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to enable queries to the image catalog to operate on groups of images, thereby reducing bandwidth and improving performance (Wexler, para. [0062]).
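For illustration only, a minimal Python sketch in which user profile data supplies per-dimension weights that modify the comparison between the query image features and the stored feature vector representation; the weighting scheme is an assumption of this sketch, and uniform weights reduce it to plain cosine similarity.

import numpy as np

def weighted_similarity(query_features: np.ndarray,
                        stored_features: np.ndarray,
                        profile_weights: np.ndarray) -> float:
    # Profile-derived weights emphasize or de-emphasize feature dimensions
    # before the cosine comparison is applied.
    weighted_query = query_features * profile_weights
    weighted_stored = stored_features * profile_weights
    return float(weighted_query @ weighted_stored /
                 (np.linalg.norm(weighted_query) * np.linalg.norm(weighted_stored)))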
As per claim 13, Li I, Li II and Kale teach all the limitations as discussed in claim 8 above.
However, it is noted that the combination of the prior arts of Li I, Li II and Kale does not explicitly teach “wherein the associating the image description with the media content within the database further comprises: associating the image description with the media content and a location encompassed by a geo-fence within the database; and wherein the query includes location data that identifies the location encompassed by the geo-fence and wherein accessing the media content further requires satisfaction of the geo-fence constraint in addition to the similarity comparison.”
On the other hand, in the same field of endeavor, Wexler teaches wherein the associating the image description with the media content within the database further comprises (i.e. “the server analyzes the respective image to extract respective keywords that describe the respective image”; para. [0104]):
associating the image description with the media content and a location encompassed by a geo-fence within the database (i.e. “the location where the image was created (e.g., GPS data) can be stored ... generate a heatmap to show popular spots where tourists take photos as opposed to where locals take photos”; para. [0174] and FIG. 6); and
wherein the query includes location data that identifies the location encompassed by the geo-fence (i.e. “search by filtering the results based on the tapped location”; para. [0130]) and
wherein accessing the media content further requires satisfaction of the geo-fence constraint in addition to the similarity comparison (i.e. “images may live in any location, allowing the user to group collections of images stored in separate repositories, avoiding duplication and workflow changes.”; para. [0160]; Examiner note: the geo-fence constraint is interpreted as the location. The similarity comparison is interpreted as grouping collections of images stored in separate repositories while avoiding duplication).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wexler, which teaches a digital image system for cataloging images from multiple sources and using the catalog for discovering and viewing images, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to enable queries to the image catalog to operate on groups of images, thereby reducing bandwidth and improving performance (Wexler, para. [0062]).
As per claim 14, Li I, Li II and Kale teach all the limitations as discussed in claim 8 above.
However, it is noted that the combination of the prior arts of Li I, Li II and Kale does not explicitly teach “wherein the associating the image description with the media content within the database further comprises: associating the image description with the media content and user profile data within the database; and wherein the query includes the user profile data and the image description and wherein accessing the media content further comprises considering the user profile data in combination with the similarity comparison between the extracted query image features and the stored feature vector representation.”
On the other hand, in the same field of endeavor, Wexler teaches wherein the associating the image description with the media content within the database further comprises (i.e. “the server analyzes the respective image to extract respective keywords that describe the respective image”; para. [0104]):
associating the image description with the media content (i.e. “the server analyzes the respective image to extract respective keywords that describe the respective image”; para. [0104]. Further, i.e. “Neural Network to analyze each image to produce a set of keywords describing the semantic content”; para. [0113]) and
user profile data within the database (i.e. “user profile data 452 storing one or more user accounts associated with a user of viewer device”; para. [0099]); and
wherein the query includes the user profile data (i.e. “show the profile of the current search or the entire image portfolio”; para. [0136]) and
the image description (i.e. “keywords that describe the respective image”; para. [0104]) and
wherein accessing the media content further comprises considering the user profile data in combination with the similarity comparison between the extracted query image features and the stored feature vector representation (i.e. “Context specific data, such as company events, historical statistics (e.g. play-by-play data from each game), facial geometries of key individuals, and corporate hierarchies can be compared against the data generated by the initial pass to generate a higher level set of keywords that improve search”; para. [0120]; Examiner note: considering the user profile data is interpreted as generating a higher-level set of keywords. The comparison between the extracted query image features and the stored feature vector representation is interpreted as comparing the context-specific data, such as company events, historical statistics (e.g. play-by-play data from each game), facial geometries of key individuals, and corporate hierarchies, against the data generated by the initial pass).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wexler, which teaches a digital image system for cataloging images from multiple sources and using the catalog for discovering and viewing images, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to enable queries to the image catalog to operate on groups of images, thereby reducing bandwidth and improving performance (Wexler, para. [0062]).
As per claim 20, Li I, Li II and Kale teach all the limitations as discussed in claim 15 above.
However, it is noted that the combination of the prior arts of Li I, Li II and Kale does not explicitly teach “wherein the associating the image description with the media content within the database further comprises: associating the image description with the media content and a location encompassed by a geo-fence within the database; and wherein the query includes location data that identifies the location encompassed by the geo-fence and wherein accessing the media content further comprises considering the user profile data in combination with the similarity comparison between the extracted query image features and the stored feature vector representation.”
On the other hand, in the same field of endeavor, Wexler teaches wherein the associating the image description with the media content within the database further comprises (i.e. “the server analyzes the respective image to extract respective keywords that describe the respective image”; para. [0104]):
associating the image description with the media content and a location encompassed by a geo-fence within the database (i.e. “the location where the image was created (e.g., GPS data) can be stored ... generate a heatmap to show popular spots where tourists take photos as opposed to where locals take photos”; para. [0174] and FIG. 6); and
wherein the query includes location data that identifies the location encompassed by the geo-fence (i.e. “search by filtering the results based on the tapped location”; para. [0130]) and
wherein accessing the media content further comprises considering the user profile data in combination with the similarity comparison between the extracted query image features and the stored feature vector representation (i.e. “Context specific data, such as company events, historical statistics (e.g. play-by-play data from each game), facial geometries of key individuals, and corporate hierarchies can be compared against the data generated by the initial pass to generate a higher level set of keywords that improve search”; para. [0120]; Examiner note: considering the user profile data is interpreted as generating a higher-level set of keywords. The comparison between the extracted query image features and the stored feature vector representation is interpreted as comparing the context-specific data, such as company events, historical statistics (e.g. play-by-play data from each game), facial geometries of key individuals, and corporate hierarchies, against the data generated by the initial pass).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wexler, which teaches a digital image system for cataloging images from multiple sources and using the catalog for discovering and viewing images, into the combination of Li I, which teaches searching content with multi-dimensional image matching in response to a search query, Li II, which teaches systems and methods for searching images based on query image content, and Kale, which teaches determining multi-term contextual tags for digital content and propagating those tags to additional digital content. Additionally, this can improve the user experience, such as when playing various media and multimedia content on personal or laptop computers.
The motivation for doing so would be to enable queries to the image catalog to operate on groups of images, thereby reducing bandwidth and improving performance (Wexler, para. [0062]).
Conclusion
13. Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTONIO CAIA DO whose telephone number is (469) 295-9251. The examiner can normally be reached on Monday - Friday / 06:30 to 16:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amy Ng, can be reached on (571) 270-1698. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANTONIO J CAIA DO/
Examiner, Art Unit 2164