DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Remark(s)
Applicant's amendment filed December 5, 2025 has been fully entered and considered. Regarding the arguments directed to the 112(b) and 101 rejections, the examiner respectfully finds the arguments to be non-persuasive; see the Response to Argument(s) section below. Regarding the prior art rejection, all new grounds of rejection set forth in the present action were necessitated by Applicants' claim amendments. Accordingly, this action is made final.
Status of Claims
Claims 1-20 are pending; claims 1, 7, 9, 11-12, 18 and 20 have been amended. Claims 1-20 remain rejected.
Response to Argument(s)
112(b) rejection:
In pages 9-11 of the remarks, the Applicants argue that the terms “likely” and “likelihood” (which were previously rejected under 112(b) as relative terms rendering the claims indefinite) are relative terms that do not render the claims indefinite, since they are well-known terms in the art: the Merriam-Webster Dictionary defines “likelihood” as “probability,” and “likely” as “having a high probability of occurring or being true.” Furthermore, the Applicants argue that these terms are terms of degree which provide enough certainty to one of skill in the art when read in the context of the invention.
The Applicants further cite Eibel Process Co. v. Minnesota & Ontario Paper Co., in which the relative term “substantial elevation” was held to be definite and sufficient.
Examiner’s reply:
The examiner finds the argument to be partially persuasive: the term “likelihood” can be interpreted to mean “probability,” since the instant specification provides sufficient support that the term refers to a likelihood score, as disclosed in paragraph [0036] of the instant specification.
However, the term “likely,” as recited in the claim language “…indicates a likely location…,” renders the claim indefinite, since it is not clear whether a location results from the process at all; the term is relative and creates an unclear claim scope for purposes of examination. “Likely location” renders the claims indefinite because it is not clear whether it refers to the most probable location among candidate locations, or whether the location detection may or may not yield a result, false or true, i.e., whether there is a location at all. Therefore, the interpretation cannot be made definite.
101 rejections:
In pages 11-13 of the remarks, the Applicants argue that the amended features improve the technological fields of computer technology, image processing and machine learning. Specifically, the Applicants argue that the claimed solutions rely on analysis of image data based on the claimed features “output data that includes i) an object detection result that indicates whether a target object is detected in the image and ii) the object embedding for the target object,” and that at least the element of “determining to perform the automated action using the output data” indicates a specific technological solution that influences the physical world.
Furthermore, the Applicants argue that the claims’ steps, in combination, provide a tangible, real-world result rooted in the analysis of image data and provide technical advantages, in that object embeddings can be generated more efficiently because the embedding branch can use data from the visual recognition branch without regenerating that data on its own, saving computational resources. The Applicants further cite the Federal Circuit’s leading case Enfish for the proposition that software that makes non-abstract improvements to computer technology is not an abstract idea.
Examiner’s reply:
The examiner respectfully disagrees with the Applicants’ arguments and finds them to be non-persuasive and incommensurate with the scope of the claims.
The examiner finds that the claims do not provide any improvement to computer technology, since the claims do not alter or specifically improve the functionality of a computer, but instead use a generic, well-known computer as-is, e.g., a processor and a memory storing instructions/programs to be executed by the processor.
Moreover, there is no improvement to machine learning, since the claims, e.g., independent claims 1, 12 and 20, simply recite “…providing, to a machine learning model, data…” and “…receiving, from the machine learning model, output data…,” which merely recite the machine learning model at a high level of generality, receiving an input and providing an output, without further limiting, in detail, any specific structure, algorithm or steps by which the machine learning model functions in a specific way, beyond the conventional, to arrive at such output data. Therefore, there is no improvement to machine learning; the claims merely recite a machine learning model, generically and at a high level of generality, performing generic functions.
There is no improvement in image processing, since the claims recite steps performed on image data without meeting the requirements of the 101 eligibility analysis, i.e., without indicating an integration of the judicial exception into a practical application (Step 2A Prong 2) or amounting to significantly more (Step 2B). The claimed features that the Applicants point to in the remarks are the “output data that includes i) an object detection result that indicates whether a target object is detected in the image and ii) the object embedding for the target object,” and the Applicants argue that at least the element of “determining to perform the automated action using the output data” indicates a specific technological solution that influences the physical world. The examiner respectfully disagrees with the Applicants’ arguments and finds them to be incommensurate with the scope of the claims. Importantly, these features are part of the determining step, recited as “determining whether to perform an automated action using the output data that includes i) the object detection result that indicates whether the target object is detected in the image and ii) the object embedding for the target object.” What is performed in this limitation is the determining step; the features that the Applicants argue are merely data/information and their further specification. Under BRI, such a determining step is a step a human mind can perform through a process of observation and evaluation: the human mind can observe output data that has already been processed or given, here the output data and its further specification “…includes…object detection result…object embedding….” The output data is recited as the basis upon which the determining is performed; the claim does not recite outputting the data, explicitly performing any detection in this step, or generating the object embedding; these data/information are recited as already given/processed and used as the basis for the determination. Further, the Applicants state that the step of “determining to perform the automated action using the output data” indicates a specific technological solution that influences the physical world; however, this statement is not commensurate with the scope of the claim. The limitation simply recites “…an automated action…” without indicating what the specific action is; it is general and broad enough to cover any type of action, not necessarily a real-world action, and even if it were a real-world action in some scope, it is not clear what specific practical action is being performed. There is a lack of a practical application: the claim does not require a particular action to be taken that strictly corresponds to a particular output determined through the process. The claim therefore does not integrate the judicial exception into a practical application. Hence, there is no improvement to software in the sense that software can make non-abstract improvements to computer technology.
Regarding the Applicants’ argument that the claims’ steps, in combination, provide a tangible, real-world result rooted in the analysis of image data and provide technical advantages, namely that object embeddings can be generated more efficiently because the embedding branch can use data from the visual recognition branch without regenerating that data on its own, saving computational resources:
The examiner respectfully disagrees and finds the argument to be non-persuasive and incommensurate with the scope of the claims. The Applicants are reminded that the claims are construed under the BRI in light of the specification; the teachings of the instant specification cannot be imported into the claims to define their scope. The claims do not reflect this alleged improvement in the claim language: nowhere do the claims recite generating “object embeddings more efficiently because the embedding branch can use data from the visual recognition branch without regenerating that data on its own, saving computational resources.”
Prior Art rejections:
In view of the amendments to independent claims 1, 12 and 20, the previously applied prior art rejections are withdrawn, as the amendment introduced new features into the claims and narrows their scope. Applicants’ arguments are rendered moot in view of the new grounds of rejection set forth below.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 6-7 and 17-18 are rejected under 35 U.S.C. 112(b) as being indefinite.
The term “likely” in claims 6-7 and 17-18 is a relative term which renders the claims indefinite. The term “likely” is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. As the Response to Argument(s) section above explains, the 112(b) rejection is maintained for the term “likely” in “likely location” as recited in the claims.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-4, 6-15 and 17-20 are rejected under 35 U.S.C. 101
Regarding Independent Claim 1 and its dependent claims 2-4 and 6-11,
Step 1 Analysis: Claim 1 is directed to a method/process, which falls within one of the four statutory categories.
Step 2A Prong 1 Analysis: Claim 1 recites, in part, “maintaining data that represents an image; indicate whether a target object is detected in the image; determining whether to perform an automated action using the output data that includes i) the object detection result that indicates whether the target object is detected in the image and the object embedding for the target object; in response to determining to perform the automated action using the output data, perform an action using a result of the processing of the object embedding.” The limitations, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind, which falls within the “Mental Processes” grouping of abstract ideas. The limitations of:
“maintaining data that represents an image,” by BRI (broadest reasonable interpretation), is a step of observation and evaluation: a human mind can observe an image and mentally maintain information/data regarding the image; “indicate…in the image” is, by BRI, a step of observation and evaluation: the human mind can indicate whether an object is in an image; “determining whether…the output data” is, by BRI, a step of observation and evaluation in which the human mind can determine whether to perform an action, automated or otherwise, based on certain given data already obtained;
“determining whether to perform an automated action using the output data that includes i) the object detection result that indicates whether the target object is detected in the image and the object embedding for the target object”: such a determining step is, under BRI, a step a human mind can perform through a process of observation and evaluation; the human mind can observe output data that has already been processed or given, here the output data and its further specification “…includes…object detection result…object embedding….” (the output data is recited as the basis upon which the determining is performed; the claim does not recite outputting the data, explicitly performing any detection in this step, or generating the object embedding; these data/information are recited as already given/processed and used as the basis for the determination);
“in response to determining to perform the automated action using the output data, perform an action using a result of the processing of the object embedding” is, under BRI, a step which a human mind can perform through a process of observation and evaluation, such as the human mind determining, in response to some condition, to perform or execute a corresponding action.
Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the following additional element(s) –
A system comprising one or more computers and one or more storage devices on which are stored instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising;
…by the system,…is for generation of an object embedding for provision to another system that runs on other hardware;
providing, to a machine learning model, the data that represents the image;
receiving, from the machine learning model, output data that includes i) an object detection result and ii) the object embedding for the target object;
providing, to the other system that runs on other hardware, the object embedding for the target object to cause the other system to process the object embedding.
The additional elements include a generic computer system structure, including one or more computers and storage devices storing instructions to be executed by the computer(s), i.e., a system, which are just generic computers and computer components performing generic functions. There are further insignificant extra-solution activities of data gathering, namely the providing and receiving of information/data, and a generic machine learning model recited at a high level of generality without further limiting, in detail, how the model functions to arrive at such output; therefore, these are mere attempts to implement the abstract-idea judicial exception using a generic machine learning model. The claim as a whole is directed to an abstract idea. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Please see MPEP § 2106.04(d), III.C.
Step 2B Analysis: There are no additional elements that amount to significantly more than the judicial exception. Please see MPEP § 2106.05. The claim is directed to an abstract idea.
For all of the foregoing reasons, claim 1 does not comply with the requirements of 35 U.S.C. 101.
Accordingly, the dependent claims 2-4 and 6-11 do not provide elements that overcome the deficiencies of independent claim 1. Moreover, claim 2 recites the additional element of an insignificant extra-solution data gathering activity in the step of “receiving….data….comprises…,” and further recites what the machine learning model comprises, “visual recognition branch….embedding branch….,” which are just generic neural network branches and functions recited at a high level of generality without further limiting how these branches work to arrive at such an outcome; therefore, they are mere generic neural-network-component additional elements. Claim 3 recites, in part, an additional element of an insignificant extra-solution data gathering activity of “receiving….” and further details of the machine learning model, namely that it includes the “embedding branch that includes a first proper subset of one or more training layers, the one or more training layers having included a) the first proper subset and b) a second proper subset that was not included in the machine learning model for inference.” These are recited at a high level of generality: it is well known and obvious for a neural network branch to include layers, and, by BRI, a proper subset of training layers can be understood to be any proper subset of layers used for training, hence still generic and well known; the same applies to the second proper subset, which is, by BRI, general and generic. Therefore, this limitation does not further limit, in detail, how the machine learning model works to arrive at such an outcome. Claim 4 recites the same generic machine learning model that “include[s] shared initial layers that generate data used by both the visual recognition branch and the embedding branch,” which does not limit, in detail, how the machine learning model works to arrive at such an outcome; these are just generic neural network components, and it is obvious that a neural network includes initial layers whose data is used by subsequent layers; the receiving of data is an insignificant extra-solution data gathering activity. Claim 6 recites, in part, “receiving the output data…,” which is an insignificant extra-solution data gathering activity, and a further specification of what the data includes, “object embedding….detected in the image,” which merely provides further detail on what the data includes; it remains a mere data-gathering additional element of generic data extracted from an image through a generic extraction method. Claim 7 recites an insignificant extra-solution data gathering additional element of receiving data from a machine learning model recited generically at a high level of generality and well known in the art; moreover, the claim further recites that the output is a generic object detection result indicating where in the image the object is located, which is a well-known object detection additional element used to perform the mental process of detecting an object in an image, and further details of the data/information, which include an object embedding. Claim 8 recites, in part, an additional element specifying what the data/information comprises, which is an insignificant extra-solution activity.
Claim 9 recites, in part, an insignificant extra-solution data gathering additional element of “receiving output data” from a machine learning model recited generically at a high level of generality and well known in the art, without further limiting, in detail, how the model works to arrive at such an outcome; moreover, the claim recites what data/information the output data comprises, hence still a mere recitation of data and its gathering activity. All of these recited steps are mere insignificant extra-solution data gathering activities. Claim 10 recites, in part, further specification of what the object embedding data comprises, hence still a mere recitation of data/information for the data gathering step from which it depends. Claim 11 recites, in part, steps of insignificant extra-solution data gathering activities, namely providing resulting data to an engine and receiving data and its details from the engine; hence, still mere recitations of data/information and their gathering steps.
Accordingly, the dependent claims 2-4 and 6-11 are not patent eligible under 101.
Regarding claim 12 and its dependent claims 13-15 and 17-19:
Regarding independent claim 12, the claim recites limitations analogous to those of independent claim 1 and is hence analyzed under the same approach and found ineligible under 101. Moreover, claim 12 recites a method/process, which falls within one of the statutory categories under Step 1. Furthermore, the dependent claims 13-15 and 17-19 are analogous to dependent claims 2-4 and 6-11 and are hence analyzed under the same approach and found ineligible under 101.
Regarding the independent claim 20:
Regarding independent claim 20, the claim recites limitations analogous to those of independent claim 1 and is hence analyzed under the same approach and found ineligible under 101. Claim 20 further recites additional elements of generic computer components performing generic functions, namely non-transitory computer storage media encoded with instructions to be executed by a processor to perform the operations of the claimed invention; these are not indicative of an integration of the judicial exception into a practical application under Step 2A Prong 2 and are not significantly more under Step 2B.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-10 and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over Vicky Kalogeiton et al. (“Joint Learning of Object and Action Detectors,” 2017, Proceedings of the IEEE International Conference on Computer Vision, pp. 4163-4172, hereinafter “Kalogeiton”) in view of Nishitkumar Ashokkumar Desai et al. (US 11,263,795 B1, hereinafter “Desai”).
Regarding claim 1, Kalogeiton discloses a system comprising one or more computers and one or more storage devices on which are stored instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising (the abstract discloses the use of machine learning, which indicates the use of a computer having computer components such as storage storing instructions to be executed by a processor for the operations of the invention): maintaining, by the system, data that represents an image (FIG. 2 shows that the input into the machine learning model is an image that is maintained throughout the processing of the image data, which, by BRI [broadest reasonable interpretation], covers the scope of the limitation); providing, to a machine learning model, the data that represents the image (FIG. 2 shows the image data being input into the machine learning model); receiving, from the machine learning model, output data that includes (FIG. 2 shows that the output from the model includes two branches) i) an object detection result that indicates whether a target object is detected in the image (FIG. 2 shows that one branch is for object detection, which indicates whether an object is detected in the image and, by BRI, covers the scope of the limitation) and ii) an object embedding for the target object (the other branch of FIG. 2 shows action detection, which includes an action label [which, by BRI, can be understood to be an object embedding for the target object as claimed]); and determining whether to perform an automated action using the output data (the action detection of FIG. 2 is understood to indicate that an action is determined for the object using the output data of the model of FIG. 2; the action detection is an automated process and hence, by BRI, can be understood as determining an automated action, which covers the scope of the claim; moreover, section 1, 1st par., FIG. 1, and section 4.1, 3rd par., disclose that the detected object in the image frames and the paired detected action are used for tracking the object over frames in the developed system; therefore, it can be understood that the invention is used for automatically tracking an object when the object is detected with an action paired with it, which, by BRI, covers the scope of the claim).
However, Kalogeiton does not explicitly disclose and is for generation of an object embedding for provision to another system that runs on other hardware; in response to determining to perform the automated action using the output data, providing, to the other system that runs on the other hardware, the object embedding for the target object to cause the other system to process the object embedding and perform an action using a result of the processing of the object embedding.
In the same field of action detection (abstract, Desai), Desai discloses and is for generation of an object embedding for provision to another system that runs on other hardware (column 24, 3rd par., discloses that an aggregated image can be created by merging a plurality of images obtained from a plurality of cameras/imaging sensors; moreover, column 15, lines 15-27, discloses the use of multiple computers to access one or more functions associated with the facility, such as providing the processed result to a system administrator [to another system that runs on other hardware]); in response to determining to perform the automated action using the output data (as discussed above with respect to Kalogeiton’s teaching), providing, to the other system that runs on the other hardware, the object embedding for the target object to cause the other system to process the object embedding and perform an action using a result of the processing of the object embedding (column 3, 1st par., discloses that when the result is processed and presented to the administrator system of the analyst [other system that runs on the other hardware], the visualization information here being the object embedding for the target object, the administrator system analyzes the visualization information and determines actions to take intended to improve the operation of the facility [perform an action using a result of the processing of the object embedding] by changing the data processing parameters [to process the object embedding information]).
Thus, it would have been obvious for a person of ordinary skill in the art before the effective filing date to modify Kalogeiton to perform determining whether to perform an automated action using output data, and in response to determining to perform the automated action using the output data, providing, to the other system that runs on the other hardware, the object embedding for the target object to cause the other system to process the object embedding and perform an action using a result of the processing of the object embedding, as taught by Desai, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to perform visualization processing correctly and efficiently (abstract and column 3, 1st par., Desai).
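For illustration only, the following is a minimal, hypothetical sketch of the overall flow the claim language describes: provide image data to a model, receive an object detection result together with an object embedding, decide whether to perform an automated action, and, if so, provide the embedding to another system for processing. The placeholder model, the decision rule, the helper names, and the use of Python are assumptions of this sketch and are not taken from the claims, the Applicants' specification, or the cited references.

```python
# Hypothetical sketch only: the overall flow described by the claim language.
# The placeholder model, decision rule, and "other system" call are assumptions
# and are not taken from the claims, the specification, or the cited references.
from typing import Dict, List, Tuple

def run_model(image: List[List[float]]) -> Tuple[bool, List[float]]:
    """Placeholder model: returns (object detected?, object embedding)."""
    detected = any(pixel > 0.5 for row in image for pixel in row)
    embedding = [sum(row) / len(row) for row in image]
    return detected, embedding

def send_to_other_system(embedding: List[float]) -> Dict[str, float]:
    """Placeholder for providing the embedding to a system on other hardware."""
    return {"processed_norm": sum(x * x for x in embedding) ** 0.5}

image = [[0.2, 0.7], [0.1, 0.9]]              # dummy image data
detected, embedding = run_model(image)        # output data: result + embedding
if detected:                                  # determining whether to act
    result = send_to_other_system(embedding)  # other system processes the embedding
    print("automated action performed using:", result)
```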
Regarding claim 2, Kalogeiton in view of Desai, wherein Kalogeiton discloses the system of claim 1, wherein receiving the output data comprises receiving the output data from the machine learning model that comprises i) a visual recognition branch that generates the object detection result and ii) an embedding branch that generates the object embedding (FIG. 2 shows that the model has two branches, an object detection branch and an action detection branch; the object detection branch can be understood to be analogous to the claimed visual recognition branch that generates the object detection result, and the action detection branch can be understood, by BRI, to be the claimed embedding branch, which generates the action label understood, by BRI, to be the object embedding).
Regarding claim 3, Kalogeiton in view of Desai, wherein Kalogeiton discloses the system of claim 2, wherein receiving the output data comprises receiving the output data from the machine learning model (as discussed above for claim 2) that includes the embedding branch (the action detection branch, as discussed above for claim 2) that includes a first proper subset of one or more training layers (the action detection branch of FIG. 2 includes several layers trained in an end-to-end training process, as disclosed in FIG. 2, which, by BRI, is analogous to the recited first proper subset of one or more training layers, since any set of layers within a branch is a proper subset of the layers used for training), the one or more training layers having included a) the first proper subset (any portion of the layers of the action detection branch can be understood, by BRI, to be the first proper subset as claimed) and b) a second proper subset (the remaining portion being the second proper subset, by BRI) that was not included in the machine learning model for inference (section 4.2, last paragraph, discloses zero-shot learning; table 5 shows that the network is able to infer information about actions that were not seen at training time for a given object; therefore, in this instance, the action detection branch would have layers that did not learn this new information, in other words, were not included in the machine learning model for inferring such new information, which, by BRI, covers the scope of the claim, and the default layers that learned the information during training can be understood, by BRI, to be the first proper subset as claimed).
Regarding claim 4, Kalogeiton in view of Desai, wherein Kalogeiton discloses the system of claim 2, wherein receiving the output data comprises receiving the output data from the machine learning model that includes one or more shared initial layers that generate data used by both the visual recognition branch and the embedding branch (FIG. 2 shows that the middle portion provides information used by both the action and object detection branches and hence can be understood, by BRI, to be analogous to the one or more shared initial layers as claimed).
Regarding claim 5, Kalogeiton in view of Desai, wherein Kalogeiton discloses the system of claim 4, wherein receiving the output data comprises receiving the output data from the machine learning model (as discussed above for claim 1) that was trained using i) a first loss value for the one or more shared initial layers and the visual recognition branch and ii) a second loss value for the one or more shared initial layers and the embedding branch (equation 2 of section 3.2 shows that a multi-task loss is computed for the training of the model; a loss is calculated per branch for the training, as shown in equation 2, and this is therefore analogous to the claimed limitation, wherein a second loss value is for the initial layers and the embedding branch and the first loss value is for the initial layers and the object detection or visual recognition branch, which, by BRI, covers the scope of the claimed limitation).
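For clarity of the record on how shared initial layers feeding two branches, each trained with its own loss term, are conventionally arranged, the following is a minimal sketch assuming a generic two-branch network written in PyTorch. The layer choices, dimensions, dummy targets, and summed loss terms are assumptions of this sketch; it is not Kalogeiton's architecture or its equation 2.

```python
# Hypothetical sketch only: a generic two-branch network with shared initial
# layers and one loss term per branch. Layer choices, dimensions, and the use
# of PyTorch are assumptions; this is not Kalogeiton's architecture or loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchModel(nn.Module):
    def __init__(self, num_classes: int = 10, embed_dim: int = 64):
        super().__init__()
        # Shared initial layers whose output feeds both branches.
        self.shared = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # "Visual recognition" branch: per-class scores (detection result).
        self.recognition = nn.Linear(16, num_classes)
        # "Embedding" branch: feature vector for the detected object.
        self.embedding = nn.Linear(16, embed_dim)

    def forward(self, image: torch.Tensor):
        features = self.shared(image)       # data used by both branches
        return self.recognition(features), self.embedding(features)

model = TwoBranchModel()
images = torch.randn(2, 3, 32, 32)          # dummy image batch
labels = torch.tensor([1, 3])               # dummy class labels
targets = torch.randn(2, 64)                # dummy embedding targets
logits, embeddings = model(images)
# The first loss value trains the shared layers plus the recognition branch;
# the second loss value trains the shared layers plus the embedding branch.
loss = F.cross_entropy(logits, labels) + F.mse_loss(embeddings, targets)
loss.backward()
```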
Regarding claim 6, Kalogeiton in view of Desai, wherein Kalogeiton discloses the system of claim 1, wherein receiving the output data comprises receiving the output data (as discussed above for claim 1) that includes the object embedding for the target object that was extracted from an image object embedding for the image (the action detection branch of FIG. 2, as discussed above for claim 1, determines the action label [object embedding] for the object extracted from the image) using location data that indicates a likely location of the target object detected in the image (using bounding box data [according to section 2, 2nd to last par.], which is location data indicating the likely location of the object detected in the image and, by BRI, covers the scope of the claimed limitation).
Regarding claim 7, Kalogeiton in view of Desai, wherein Kalogeiton discloses the system of claim 1, wherein receiving, from the machine learning model, the output data (as discussed above for claim 1) that includes i) an object detection result that indicates whether a target object is detected in the image (FIG. 2 shows that one branch is for object detection, which indicates whether an object is detected in the image and, by BRI, covers the scope of the limitation) and ii) an object embedding for the target object (the other branch of FIG. 2 shows action detection, which includes an action label [which, by BRI, can be understood to be an object embedding for the target object as claimed]) comprises: receiving, from the machine learning model, the output data that includes i) an object detection result that indicates that a target object is detected in the image (FIG. 2 shows that the output of the object detection branch is data indicating that an object is detected in the image) and location data that indicates a likely location of the target object detected in the image (using bounding box data [according to section 2, 2nd to last par.], which is location data indicating the likely location of the object detected in the image and, by BRI, covers the scope of the claimed limitation), and ii) an object embedding for the target object (the other branch of FIG. 2 shows action detection, which includes an action label [which, by BRI, can be understood to be an object embedding for the target object as claimed]).
Regarding claim 8, Kalogeiton in view of Desai, wherein Kalogeiton discloses the system of claim 7, wherein the location data comprises a bounding box for the detected target object (the bounding box data according to section 2, 2nd to last par., as discussed above for claim 7, which, by BRI, covers the scope of the claimed limitation).
Regarding claim 9, Kalogeiton in view of Desai, wherein Kalogeiton discloses the system of claim 1, wherein receiving, from the machine learning model, output data that includes an object detection result that indicates whether a target object is detected in the image comprises (as discussed above for claim 1): receiving output data that includes, for the object detection result, an object category (the output of the object detection branch of FIG. 2 is the object label, as shown and disclosed in FIG. 4, which, by BRI, is analogous to the object category as claimed); and receiving, for the object detection result, a likelihood that the detected target object belongs to the object category (based on a probability calculation that the box is the object-action instance, as disclosed in the “Multitask” portion of section 3.2; by BRI, the probability is analogous to the claimed likelihood that the detected target belongs to the object category).
Regarding claim 10, Kalogeiton in view of Desai, wherein Kalogeiton discloses the system of claim 1, wherein the object embedding for the target object comprises: discriminative features of the detected target object (section 2, last par., discloses that actions are detected based on a set of attributes, per FIG. 2, wherein the action is detected for the detected objects; hence these attributes can be understood to be discriminative features that match the action to the object in the image, which, by BRI, is analogous to the claimed limitation), the features containing data elements for differentiating objects that belong to the same category (since the attributes used to detect the action belong to the object and pair them together, they can be understood to contain data elements for differentiating objects belonging to the same category, which, by BRI, covers the scope of the claim).
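For illustration only, the following is a minimal sketch of the kind of output record the claims describe: a detection result (category, likelihood, bounding box) paired with an object embedding whose elements can differentiate two objects of the same category. All field names and example values are hypothetical assumptions and are not drawn from the claims or the cited art.

```python
# Hypothetical sketch only: the kind of output record the claims describe.
# Field names and example values are assumptions, not taken from the claims
# or the cited art.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DetectionOutput:
    category: str                    # object category
    likelihood: float                # likelihood the object belongs to the category
    bounding_box: List[float]        # [x1, y1, x2, y2] likely location in the image
    embedding: List[float] = field(default_factory=list)  # discriminative features

person_a = DetectionOutput("person", 0.92, [10.0, 20.0, 50.0, 120.0], [0.1, 0.9, 0.3])
person_b = DetectionOutput("person", 0.88, [200.0, 25.0, 240.0, 130.0], [0.8, 0.2, 0.5])
# Same category, but distinct embeddings allow the two detections to be told apart.
print(person_a.category == person_b.category, person_a.embedding != person_b.embedding)
```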
Regarding claim 12, Kalogeiton discloses a computer-implemented method comprising (the abstract discloses the use of machine learning, which indicates the use of a computer having computer components such as storage storing instructions to be executed by a processor for the operations of the invention): maintaining data that represents an image (FIG. 2 shows that the input into the machine learning model is an image that is maintained throughout the processing of the image data, which, by BRI [broadest reasonable interpretation], covers the scope of the limitation); providing, to a machine learning model, the data that represents the image (FIG. 2 shows the image data being input into the machine learning model); receiving, from the machine learning model, output data that includes (FIG. 2 shows that the output from the model includes two branches) i) an object detection result that indicates whether a target object is detected in the image (FIG. 2 shows that one branch is for object detection, which indicates whether an object is detected in the image and, by BRI, covers the scope of the limitation) and ii) an object embedding for the target object (the other branch of FIG. 2 shows action detection, which includes an action label [which, by BRI, can be understood to be an object embedding for the target object as claimed]); and determining whether to perform an automated action using the output data (the action detection of FIG. 2 is understood to indicate that an action is determined for the object using the output data of the model of FIG. 2; the action detection is an automated process and hence, by BRI, can be understood as determining an automated action, which covers the scope of the claim; moreover, section 1, 1st par., FIG. 1, and section 4.1, 3rd par., disclose that the detected object in the image frames and the paired detected action are used for tracking the object over frames in the developed system; therefore, it can be understood that the invention is used for automatically tracking an object when the object is detected with an action paired with it, which, by BRI, covers the scope of the claim) that includes (FIG. 2 shows that the output from the model includes two branches) i) an object detection result that indicates whether a target object is detected in the image (FIG. 2 shows that one branch is for object detection, which indicates whether an object is detected in the image and, by BRI, covers the scope of the limitation) and ii) an object embedding for the target object (the other branch of FIG. 2 shows action detection, which includes an action label [which, by BRI, can be understood to be an object embedding for the target object as claimed]).
However, Kalogeiton does not explicitly disclose and is for generation of an object embedding for provision to another system that runs on other hardware; in response to determining to perform the automated action using the output data, providing, to the other system that runs on the other hardware, the object embedding for the target object to cause the other system to process the object embedding and perform an action using a result of the processing of the object embedding.
In the same field of action detection (abstract, Desai), Desai discloses and is for generation of an object embedding for provision to another system that runs on other hardware (column 24, 3rd par., discloses that an aggregated image can be created by merging a plurality of images obtained from a plurality of cameras/imaging sensors; moreover, column 15, lines 15-27, discloses the use of multiple computers to access one or more functions associated with the facility, such as providing the processed result to a system administrator [to another system that runs on other hardware]); in response to determining to perform the automated action using the output data (as discussed above with respect to Kalogeiton’s teaching), providing, to the other system that runs on the other hardware, the object embedding for the target object to cause the other system to process the object embedding and perform an action using a result of the processing of the object embedding (column 3, 1st par., discloses that when the result is processed and presented to the administrator system of the analyst [other system that runs on the other hardware], the visualization information here being the object embedding for the target object, the administrator system analyzes the visualization information and determines actions to take intended to improve the operation of the facility [perform an action using a result of the processing of the object embedding] by changing the data processing parameters [to process the object embedding information]).
Thus, it would have been obvious for a person of ordinary skill in the art before the effective filing date to modify Kalogeiton to perform determining whether to perform an automated action using output data, and in response to determining to perform the automated action using the output data, providing, to the other system that runs on the other hardware, the object embedding for the target object to cause the other system to process the object embedding and perform an action using a result of the processing of the object embedding, as taught by Desai, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to perform visualization processing correctly and efficiently (abstract and column 3, 1st par., Desai).
Regarding claim 13, Kalogeiton in view of Desai, wherein Kalogeiton discloses the method of claim 12, wherein receiving the output data comprises receiving the output data from the machine learning model that comprises i) a visual recognition branch that generates the object detection result and ii) an embedding branch that generates the object embedding (FIG. 2 shows that the model has two branches, an object detection branch and an action detection branch; the object detection branch can be understood to be analogous to the claimed visual recognition branch that generates the object detection result, and the action detection branch can be understood, by BRI, to be the claimed embedding branch, which generates the action label understood, by BRI, to be the object embedding).
Regarding claim 14, Kalogeiton in view of Desai, wherein Kalogeiton discloses the method of claim 13, wherein receiving the output data comprises receiving the output data from the machine learning model (as discussed above for claim 13) that includes the embedding branch (the action detection branch, as discussed above for claim 13) that includes a first proper subset of one or more training layers (the action detection branch of FIG. 2 includes several layers trained in an end-to-end training process, as disclosed in FIG. 2, which, by BRI, is analogous to the recited first proper subset of one or more training layers, since any set of layers within a branch is a proper subset of the layers used for training), the one or more training layers having included a) the first proper subset (any portion of the layers of the action detection branch can be understood, by BRI, to be the first proper subset as claimed) and b) a second proper subset (the remaining portion being the second proper subset, by BRI) that was not included in the machine learning model for inference (section 4.2, last paragraph, discloses zero-shot learning; table 5 shows that the network is able to infer information about actions that were not seen at training time for a given object; therefore, in this instance, the action detection branch would have layers that did not learn this new information, in other words, were not included in the machine learning model for inferring such new information, which, by BRI, covers the scope of the claim, and the default layers that learned the information during training can be understood, by BRI, to be the first proper subset as claimed).
Regarding claim 15, Kalogeiton in view of Desai, wherein Kalogeiton discloses the method of claim 13, wherein receiving the output data comprises receiving the output data from the machine learning model that includes one or more shared initial layers that generate data used by both the visual recognition branch and the embedding branch (FIG. 2 shows that the middle portion provides information used by both the action and object detection branches and hence can be understood, by BRI, to be analogous to the one or more shared initial layers as claimed).
Regarding claim 16, Kalogeiton in view of Desai, wherein Kalogeiton discloses the method of claim 14, wherein receiving the output data comprises receiving the output data from the machine learning model (as discussed above for claim 14) that was trained using i) a first loss value for the one or more shared initial layers and the visual recognition branch and ii) a second loss value for the one or more shared initial layers and the embedding branch (equation 2 of section 3.2 shows that a multi-task loss is computed for the training of the model; a loss is calculated per branch for the training, as shown in equation 2, and this is therefore analogous to the claimed limitation, wherein a second loss value is for the initial layers and the embedding branch and the first loss value is for the initial layers and the object detection or visual recognition branch, which, by BRI, covers the scope of the claimed limitation).
Regarding claim 17, Kalogeiton in view of Desai, wherein Kalogeiton discloses the method of claim 12, wherein receiving the output data comprises receiving the output data (as discussed above for claim 12) that includes the object embedding for the target object that was extracted from an image object embedding for the image (the action detection branch of FIG. 2, as discussed above for claim 12, determines the action label [object embedding] for the object extracted from the image) using location data that indicates a likely location of the target object detected in the image (using bounding box data [according to section 2, 2nd to last par.], which is location data indicating the likely location of the object detected in the image and, by BRI, covers the scope of the claimed limitation).
Regarding claim 18, Kalogeiton in view of Desai, wherein Kalogeiton discloses the method of claim 12, wherein receiving, from the machine learning model, the output data (as discussed above for claim 12) that includes i) an object detection result that indicates whether a target object is detected in the image (FIG. 2 shows that one branch is for object detection, which indicates whether an object is detected in the image and, by BRI, covers the scope of the limitation) and ii) an object embedding for the target object (the other branch of FIG. 2 shows action detection, which includes an action label [which, by BRI, can be understood to be an object embedding for the target object as claimed]) comprises: receiving, from the machine learning model, the output data that includes i) an object detection result that indicates that a target object is detected in the image (FIG. 2 shows that the output of the object detection branch is data indicating that an object is detected in the image) and location data that indicates a likely location of the target object detected in the image (using bounding box data [according to section 2, 2nd to last par.], which is location data indicating the likely location of the object detected in the image and, by BRI, covers the scope of the claimed limitation), and ii) an object embedding for the target object (the other branch of FIG. 2 shows action detection, which includes an action label [which, by BRI, can be understood to be an object embedding for the target object as claimed]).
Regarding claim 19, Kalogeiton in view of Desai, wherein Kalogeiton discloses the method of claim 18, wherein the location data comprises a bounding box for the detected target object (the bounding box data according to section 2, 2nd to last par., as discussed above for claim 18, which, by BRI, covers the scope of the claimed limitation).
Regarding claim 20, Kalogeiton discloses one or more non-transitory computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising (the abstract discloses the use of machine learning, which indicates the use of a computer having computer components such as non-transitory storage storing instructions to be executed by a processor for the operations of the invention): maintaining data that represents an image (FIG. 2 shows that the input into the machine learning model is an image that is maintained throughout the processing of the image data, which, by BRI [broadest reasonable interpretation], covers the scope of the limitation); providing, to a machine learning model, the data that represents the image (FIG. 2 shows the image data being input into the machine learning model); receiving, from the machine learning model, output data that includes (FIG. 2 shows that the output from the model includes two branches) i) an object detection result that indicates whether a target object is detected in the image (FIG. 2 shows that one branch is for object detection, which indicates whether an object is detected in the image and, by BRI, covers the scope of the limitation) and ii) an object embedding for the target object (the other branch of FIG. 2 shows action detection, which includes an action label [which, by BRI, can be understood to be an object embedding for the target object as claimed]); and determining whether to perform an automated action using the output data (the action detection of FIG. 2 is understood to indicate that an action is determined for the object using the output data of the model of FIG. 2; the action detection is an automated process and hence, by BRI, can be understood as determining an automated action, which covers the scope of the claim; moreover, section 1, 1st par., FIG. 1, and section 4.1, 3rd par., disclose that the detected object in the image frames and the paired detected action are used for tracking the object over frames in the developed system; therefore, it can be understood that the invention is used for automatically tracking an object when the object is detected with an action paired with it, which, by BRI, covers the scope of the claim) that includes (FIG. 2 shows that the output from the model includes two branches) i) an object detection result that indicates whether a target object is detected in the image (FIG. 2 shows that one branch is for object detection, which indicates whether an object is detected in the image and, by BRI, covers the scope of the limitation) and ii) an object embedding for the target object (the other branch of FIG. 2 shows action detection, which includes an action label [which, by BRI, can be understood to be an object embedding for the target object as claimed]).
However, Kalogeiton does not explicitly disclose and is for generation of an object embedding for provision to another system that runs on other hardware; in response to determining to perform the automated action using the output data, providing, to the other system that runs on the other hardware, the object embedding for the target object to cause the other system to process the object embedding and perform an action using a result of the processing of the object embedding.
In the same field of action detection (abstract, Desai), Desai discloses and is for generation of an object embedding for provision to another system that runs on other hardware (column 24, 3rd par., discloses that an aggregated image can be created by merging a plurality of images obtained from a plurality of cameras/imaging sensors; moreover, column 15, lines 15-27, discloses the use of multiple computers to access one or more functions associated with the facility, such as providing the processed result to a system administrator [to another system that runs on other hardware]); in response to determining to perform the automated action using the output data (as discussed above with respect to Kalogeiton’s teaching), providing, to the other system that runs on the other hardware, the object embedding for the target object to cause the other system to process the object embedding and perform an action using a result of the processing of the object embedding (column 3, 1st par., discloses that when the result is processed and presented to the administrator system of the analyst [other system that runs on the other hardware], the visualization information here being the object embedding for the target object, the administrator system analyzes the visualization information and determines actions to take intended to improve the operation of the facility [perform an action using a result of the processing of the object embedding] by changing the data processing parameters [to process the object embedding information]).
Thus, it would have been obvious for a person of ordinary skill in the art before the effective filing date to modify Kalogeiton to perform determining whether to perform an automated action using output data, and in response to determining to perform the automated action using the output data, providing, to the other system that runs on the other hardware, the object embedding for the target object to cause the other system to process the object embedding and perform an action using a result of the processing of the object embedding, as taught by Desai, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to perform visualization processing correctly and efficiently (abstract and column 3, 1st par., Desai).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Vicky Kalogeiton et al. (“Joint Learning of Object and Action Detectors,” 2017, Proceedings of the IEEE International Conference on Computer Vision, pp. 4163-4172, hereinafter “Kalogeiton”) in view of Nishitkumar Ashokkumar Desai et al. (US 11,263,795 B1, hereinafter “Desai”), and further in view of Philippe Weinzaepfel et al. (“Learning to track for spatio-temporal action localization,” 2015, Proceedings of the IEEE International Conference on Computer Vision, pp. 3164-3172, hereinafter “Weinzaepfel”).
Regarding claim 11, Kalogeiton in view of Desai, wherein Kalogeiton discloses the system of claim 1, wherein determining whether to perform an automated action using the output data (as discussed above for claim 1) comprises: providing i) the object detection result that indicates whether a target object is detected in the image (FIG. 2 shows that one branch is for object detection, which indicates whether an object is detected in the image and, by BRI, covers the scope of the limitation) and ii) the object embedding for the target object (the other branch of FIG. 2 shows action detection, which includes an action label [which, by BRI, can be understood to be an object embedding for the target object as claimed]).
However, Kalogeiton in view of Desai does not explicitly disclose providing, to an object matching engine, the data and receiving, from the object matching engine, data that includes an object matching result indicating whether the detected target object is likely the same as another object detected in another image from a sequence of images that includes the image as part of an object tracking process.
In the same field of object action localization (title, Weinzaepfel), Weinzaepfel discloses providing, to an object matching engine, the data (section 3, 3rd par. of the “Tracking best candidates” section, discloses providing the extracted regions to the processing that finds the best candidates [analogous to providing the data to an object matching engine as claimed, since this processing is performed by a processor]) and receiving, from the object matching engine, data that includes an object matching result indicating whether the detected target object is likely the same as another object detected in another image from a sequence of images (section 3, “Tracking best candidates” and “Scoring tracks,” discloses selecting the best candidates for the tracking based on scoring to determine whether the action and the object match across the tracked images; hence, by BRI, this is analogous to the claimed limitation, wherein the scoring is analogous to an object matching result indicating the likelihood that the same object is being tracked across images, which, by BRI, covers the scope of the claimed limitation) that includes the image as part of an object tracking process (for the object tracking as discussed previously).
Thus, it would have been obvious for a person of ordinary skill in the art before the effective filing date to modify Kalogeiton in view of Desai to perform determining whether to perform an automated action using the output data comprising providing, to an object matching engine, i) the object detection result that indicates whether a target object is detected in the image and ii) the object embedding for the target object, and receiving, from the object matching engine, data that includes an object matching result indicating whether the detected target object is likely the same as another object detected in another image from a sequence of images that includes the image as part of an object tracking process, as taught by Weinzaepfel, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to detect an object and perform tracking of the object more robustly based on scoring or matching of the tracked objects (abstract, Weinzaepfel).
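For illustration only, the following is a minimal sketch of a generic object-matching step of the kind discussed above: an embedding from the current image is compared against embeddings from a prior image, and the best match is reported together with an indication of whether it is likely the same object. The cosine-similarity measure, the threshold, and the helper names are assumptions of this sketch; it is not Weinzaepfel's scoring method.

```python
# Hypothetical sketch only: a generic object-matching step for tracking.
# The cosine-similarity measure, threshold, and helper names are assumptions;
# this is not Weinzaepfel's scoring method.
import math
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_object(current: List[float],
                 previous: Dict[str, List[float]],
                 threshold: float = 0.8) -> Tuple[str, bool]:
    """Return the best-matching prior track and whether it is likely the same object."""
    best_id, best_score = "", -1.0
    for track_id, emb in previous.items():
        score = cosine_similarity(current, emb)
        if score > best_score:
            best_id, best_score = track_id, score
    return best_id, best_score >= threshold

prior_tracks = {"track-1": [0.1, 0.9, 0.3], "track-2": [0.7, 0.2, 0.6]}
track_id, likely_same = match_object([0.12, 0.88, 0.31], prior_tracks)
print(track_id, likely_same)  # expected: track-1 True
```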
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHUONG HAU CAI whose telephone number is (571)272-9424. The examiner can normally be reached M-F 8:30 am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chineyere Wills-Burns can be reached at (571) 272-9752. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHUONG HAU CAI/Examiner, Art Unit 2673
/CHINEYERE WILLS-BURNS/Supervisory Patent Examiner, Art Unit 2673