Prosecution Insights
Last updated: April 19, 2026
Application No. 18/505,839

OBJECT TRACKING USING PREDICTED POSITIONS

Non-Final OA: §101, §103, §112
Filed: Nov 09, 2023
Examiner: PATEL, PINALBEN V
Art Unit: 2673
Tech Center: 2600 — Communications
Assignee: Qualcomm Incorporated
OA Round: 1 (Non-Final)
Grant Probability: 89% (Favorable)
OA Rounds: 1-2
To Grant: 2y 6m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 89% (484 granted / 545 resolved; +26.8% vs TC avg), above average
Interview Lift: +9.9% among resolved cases with interview (moderate, ~+10% lift)
Typical Timeline: 2y 6m average prosecution; 23 applications currently pending
Career History: 568 total applications across all art units

Statute-Specific Performance

§101: 9.1% (-30.9% vs TC avg)
§103: 59.9% (+19.9% vs TC avg)
§102: 5.9% (-34.1% vs TC avg)
§112: 14.9% (-25.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 545 resolved cases.
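
One property of the chart data worth noting: every statute's rate and its "vs TC avg" delta recover the same baseline. A minimal check, assuming the deltas were computed as examiner rate minus Tech Center average:

```python
# Recover the implied Tech Center baseline from each statute's rate and
# its "vs TC avg" delta, assuming delta = examiner rate - TC average.
rates = {"§101": (9.1, -30.9), "§103": (59.9, +19.9),
         "§102": (5.9, -34.1), "§112": (14.9, -25.1)}

for statute, (rate, delta) in rates.items():
    print(f"{statute}: implied TC average = {rate - delta:.1f}%")
```

Each pair recovers 40.0%, consistent with the single flat "Tech Center average estimate" described in the note.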

Office Action

Rejections: §101, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-2, 5-6, 15-16, 24-25, and 28-29 are rejected under 35 U.S.C. 101 as directed to an abstract idea. 35 U.S.C. 101 requires that a claimed invention fall within one of the four eligible categories of invention (i.e., process, machine, manufacture, or composition of matter) and not be directed to subject matter encompassing a judicially recognized exception as interpreted by the courts. MPEP 2106. Three categories of subject matter are judicially recognized exceptions to 35 U.S.C. § 101 (i.e., patent ineligible): (1) laws of nature, (2) physical phenomena, and (3) abstract ideas. MPEP 2106(II). To be patent-eligible, a claim directed to a judicial exception must, as a whole, be directed to significantly more than the exception itself. See 2014 Interim Guidance on Patent Subject Matter Eligibility, 79 Fed. Reg. 74618, 74624 (Dec. 16, 2014). Hence, the claim must describe a process or product that applies the exception in a meaningful way, such that it is more than a drafting effort designed to monopolize the exception. Id.

Claims 1 and 24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without significantly more. The claims are directed to determining an object position in an image based on a combination of first and second position determinations. The concept is similar to data recognition and storage: collecting data, recognizing certain data within the collected data set, and storing the recognized data in memory. Furthermore, the mere recitation of a computer-implemented invention that uses a machine learning algorithm to determine the position of an object in an image reflects well-known, routine, and conventional activities previously known to the industry. (Content Extraction and Transmission LLC v. Wells Fargo Bank). The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. Dependent claims 2, 5-6, 15-16, 25, and 28-29 are similarly rejected for reciting a generic machine learning classifier. The other dependent claims do not recite significantly more features and merely add to the concept of data collection and recognition.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 10 and 11 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 10 recites the limitations "wherein the modification to the output embedding is determined based on a least-norm solution to an undetermined linear equation system, and wherein the undetermined linear equation system is based on the decoder", which appear to be directed to obtaining a modified output embedding using a least-norm solution to an undetermined linear equation system based on the decoder. However, it is not clear which specific parameters, from the obtained embedding values of the first and second images, are used to determine the undetermined linear equation system based on the decoder. Therefore, the Examiner suggests amending the limitations to clarify the specific decoder-based parameters used to determine the undetermined linear equation system, as disclosed in the embodiments of the original specification, in order to render the claims definite. (See the least-norm sketch following the Claim 1-4 discussion below.)

Claim 11 recites the limitation "wherein the modification to the output embedding is determined using a gradient-descent technique", which appears to be directed to determining the modified output embedding using a gradient-descent technique. However, it is not clear which specific parameters, obtained from the position information of the first and second images, are used by the gradient-descent technique. Therefore, the Examiner suggests amending the claims to clarify the specific parameters, as disclosed in the original specification, in order to render the claims definite. (See the gradient-descent sketch following the Claim 5-8 discussion below.)

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9, 12, and 14-30 are rejected under 35 U.S.C. 103 as being unpatentable over Rublee et al. (US Patent No. 11,436,752 B1) in view of Igor et al. (CA 3044609 A1).

Regarding Claim 1, Rublee discloses an apparatus for tracking objects, the apparatus comprising: at least one memory; and at least one processor coupled to the at least one memory and configured to: (Rublee, Col. 14, Lines 62-67, Col. 15, Lines 1-16, Fig. 5, discloses a flow chart 500 for a set of methods for localizing a device with respect to a known object in accordance with specific embodiments of the invention. Flow chart 500 begins with a step 501 of loading, into at least one computer-readable medium on a device 550, a set of instructions which, when executed by the at least one processor on the device, cause the system to execute the additional steps in flow chart 500. Step 501 can additionally include loading the trained machine intelligence system, such as the system trained using the methods in flow chart 400, into a computer-readable medium on device 550. In the illustrated case, device 550 is a space-faring vessel with an attached sensor 551 in the form of a visible light camera, at least one onboard computer-readable medium 553, and a processor 554. The at least one computer-readable medium 553 can store both the instructions for the disclosed methods and the trained machine intelligence system. In the alternative, portions of the methods can be conducted using instructions stored, and processors located, at alternative locations, with the product of those alternative instructions and the inputs thereto being exchanged with the device over a network);

generate an output embedding based on an object in a first image; (Rublee, Col. 4, Lines 15-37, discloses that the trained machine intelligence system can have various inputs depending upon where it is in the localization pipeline and what its role will be. In general, the trained machine intelligence system will be used with the image to generate known object coordinates for at least a subset of the pixels in the image. However, the image may be preprocessed by other systems before it is delivered to the trained machine intelligence system. The input to the trained machine intelligence system can be an image of at least a portion of the known object, or processed image data of the known object. The image can be captured using a sensor attached to the device that is being localized with respect to the object. The image can include a set of pixels, each having one or more pixel values (e.g., a greyscale integer value, or multiple values encoding a color scheme such as an RGB encoding). Regardless of whether raw image data is input directly to the trained machine intelligence system or the image data is first processed by an earlier stage of the localization pipeline, the trained machine intelligence system is still described as being used with the image to determine object coordinates for the pixels in the image; the position coordinates of the object in the image are determined and embedded as output in the image).

Rublee does not explicitly disclose: obtain a predicted position of the object in a second image; modify the output embedding based on the predicted position to generate a modified output embedding; and detect the object in the second image based on the modified output embedding.

Igor discloses obtain a predicted position of the object in a second image; (Igor, Description, discloses wherein the computer is configured to: process the image from the camera, detect at least one moveable object within the image using a detection algorithm selected from a library of detection algorithms, estimate a current position of the at least one moveable object, estimate a current position of a user relative to the current position of the at least one moveable object, and predict a future position of the at least one moveable object; the position of the object in the second image is predicted from among the plural images captured of the same object during object tracking);

modify the output embedding based on the predicted position to generate a modified output embedding; (Igor, Description, discloses that the current PTZ camera configuration may be provided by the GPU 302, which may also assist in making the determination. The logic controller 304 may also use information from the state estimator 310 to make the selection. For example, the logic controller 304 may use prediction and/or estimation information regarding certain objects that were detected and/or tracked, as well as the user of the object-tracking system 100 and/or users of other object-tracking systems, to determine the new PTZ camera configuration(s). For example, the new PTZ configuration may correspond to a configuration that will direct the camera 104 toward an approximate estimated and/or predicted bounding box center/centroid of an object being tracked within the scene of the image; information from objects tracked across the plurality of images, i.e., the predicted position of the object in the second image and the location obtained in the first image, is combined, and a modified position of the object in the image is determined);

and detect the object in the second image based on the modified output embedding. (Igor, Description; the same passage quoted for the preceding limitation; the combined information yields a modified position of the object in the image, which is determined and output).

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Rublee, which detect an object in an image using a machine-learning classifier, with the teachings of Igor, which detect an object's position in an image using a machine-learning classifier and combine that result (modifying the embedded first-image object output) with the object position obtained through other sensor methods, including GPS data, in order to accurately determine the position of an object in an image by using multiple outputs from varying techniques.

Regarding Claim 2, the combination of Rublee and Igor further discloses wherein: the output embedding comprises a first output embedding, and to detect the object in the second image, the at least one processor is configured to: provide the second image and the modified output embedding to an object-detection machine-learning model; and receive, from the object-detection machine-learning model, a second output embedding based on the second image and the modified output embedding. (Igor, Description, discloses that the one or more image processing techniques may include 2D and 3D object recognition, image segmentation, motion detection (e.g., single particle tracking), video tracking, optical flow, 3D pose estimation, etc. In certain aspects, the detection algorithms and tracking control schemes may be linked and/or otherwise associated. In certain aspects, the detection algorithms and tracking control schemes may be structured to conform to a particular modular format, to be easily swapped in and/or out of the object detector 306. In certain aspects, the detecting algorithms and/or tracking schemes may be tailored for various use cases in a reconfigurable design. In certain aspects, the detection algorithms and tracking control schemes may be trained through machine learning by artificial neural networks. In some examples, certain detection algorithms and/or tracking control schemes may be more appropriate for detecting and/or tracking a particular class, classification, type, variety, category, group, and/or grade of object than others. The detector library 308 may be implemented in hardware and/or software. In certain aspects, the detector library 308 may comprise a database. The object detector 306 may activate appropriate detecting algorithms and/or tracking schemes, while deactivating inappropriate detecting algorithms and/or tracking schemes, depending on the object being detected and/or tracked. In certain aspects, the object detector 306 may activate and/or deactivate detecting algorithms as a function of the class, classification, type, variety, category, group, and/or grade of the object being detected and/or tracked. In certain aspects, the object detector 306 may activate appropriate detecting algorithms and/or tracking schemes, and/or deactivate inappropriate ones, depending on the desired and/or selected use case. In certain aspects, the GPU 302 may provide the object detector 306 with preprocessed images from the camera 104, to assist the object detector 306 in determining the appropriate detecting algorithms and/or tracking schemes to activate and/or deactivate. In certain aspects, the user may provide information through the user interface 114 to assist the object detector 306 in determining the appropriate detecting algorithms and/or tracking schemes to activate and/or deactivate. For example, the user may input information regarding the surrounding environment, such as the approximate region and whether it is indoors, outdoors, urban, rural, elevated, underground, etc. This may assist the object detector 306 in excluding less useful detecting algorithms and/or tracking schemes (e.g., mountain detectors/trackers in an underground urban environment, elevator detectors/trackers in an outdoor rural setting, etc.). In certain aspects, the object detector 306 may automatically detect aspects of the surrounding environment to activate and/or deactivate the appropriate detecting algorithms and/or tracking schemes. In cases where detection of the object requires differentiation between the object and various environmental cues, features may be extracted from the images that are independent of the object bounding box: aspects of the environment such as foreground/background classification, environment classification, lighting, etc. The object detector 306 architecture may be configured to allow for an object-agnostic PTZ camera 104 target tracking system that is easily configurable for the type of object to be tracked and highly extensible to other object domains with little work needed on the part of the user. In certain aspects, the object detector 306 may use a bounding box to circumscribe an object within a scene during detection. In certain aspects, the detector may use a centroid, centered within the bounding box, to assist with detecting and/or tracking an object; machine-learning algorithms determine object locations and assist in object tracking across the sequence of images (first and second images) to determine the object's specific coordinates, which are output in embedded form). Additionally, the rationale and motivation to combine Rublee and Igor, as applied in the rejection of Claim 1, apply to this claim.

Regarding Claim 3, the combination of Rublee and Igor further discloses wherein the at least one processor is further configured to: provide the second output embedding to a decoder; and receive, from the decoder, image coordinates corresponding to the object in the second image based on the second output embedding. (Rublee, Col. 3, Lines 58-67, Col. 4, Lines 1-15, Fig. 1, discloses a block diagram of a trained machine intelligence system 100 in accordance with specific embodiments of the invention. In the example of FIG. 1, the trained machine intelligence system is a convolutional neural network with a set of layers comprising an encoder 101 and a set of layers comprising a decoder 102. This is a common structure in the field of CNNs for image processing, with the encoder converting information from the image space to information in a feature space and the decoder converting information in the feature space back into information in the image space. The CNN includes multiple kinds of layers, including convolutional layers, up-sampling layers, pooling layers, and dropout layers. The CNN can be a deep convolutional network such as U-Net. Layers in the encoder and decoder are linked by concatenate operations 103. The intermediate layers of the trained machine intelligence system 100 include filters with values that are used in the convolutional layers and which are adjusted during training in order for the network to learn the characteristics of the known object. In this manner, aspects of the known object and any variances in the training data are incorporated into the trained machine intelligence system during training; an encoder and a decoder encode and decode the features of the image object's embedded outputs). Additionally, the rationale and motivation to combine Rublee and Igor, as applied in the rejection of Claim 1, apply to this claim.

Regarding Claim 4, the combination of Rublee and Igor further discloses wherein the image coordinates are indicative of a bounding box associated with the object. (Igor, Description; the same detector-library passage quoted in full for Claim 2 above; the object is tracked in the images and its boundary is determined and outlined as a bounding box). Additionally, the rationale and motivation to combine Rublee and Igor, as applied in the rejection of Claim 1, apply to this claim.
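
A note on the mathematics behind Claims 10 and 8: if the decoder is (at least locally) a linear map W from a D-dimensional embedding to four box coordinates, then requiring the modified embedding to decode to the predicted coordinates gives four equations in D unknowns, an underdetermined system ("undetermined" in the claim language), whose least-norm solution is given by the Moore-Penrose pseudoinverse. The following is a minimal numpy sketch of that reading only; the toy decoder W, the embedding dimension, and the box values are illustrative assumptions rather than parameters from the application (those specifics are exactly what the §112 rejection above says are unclear).

```python
import numpy as np

rng = np.random.default_rng(0)

D = 256                       # embedding dimension (illustrative)
W = rng.normal(size=(4, D))   # toy linear decoder: embedding -> (cx, cy, w, h)

e = rng.normal(size=D)                          # output embedding (first image)
box_pred = np.array([0.52, 0.40, 0.10, 0.08])   # predicted position (second image)

# W @ (e + delta) = box_pred is underdetermined in delta (4 equations,
# D unknowns), so pick the smallest modification: the least-norm solution
# delta* = W^+ (box_pred - W @ e) via the Moore-Penrose pseudoinverse.
delta = np.linalg.pinv(W) @ (box_pred - W @ e)
e_mod = e + delta

print(np.allclose(W @ e_mod, box_pred))   # True: the modified embedding
print(np.linalg.norm(delta))              # decodes exactly to the predicted box
```

np.linalg.lstsq(W, box_pred - W @ e, rcond=None)[0] returns the same minimum-norm solution for an underdetermined system.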
Regarding Claim 5, the combination of Rublee and Igor further discloses wherein the modified output embedding is provided to the object-detection machine-learning model as a query. (Igor, Description; the same detector-library passage quoted in full for Claim 2 above; the object is tracked in the images and its boundary is determined and outlined as a bounding box). Additionally, the rationale and motivation to combine Rublee and Igor, as applied in the rejection of Claim 1, apply to this claim.

Regarding Claim 6, the combination of Rublee and Igor further discloses wherein the object-detection machine-learning model comprises a detection transformer. (Igor, Description; the same detector-library passage quoted in full for Claim 2 above; the object is processed in the images using machine-learning algorithms and its position is located in the images as an embedded output). Additionally, the rationale and motivation to combine Rublee and Igor, as applied in the rejection of Claim 1, apply to this claim.

Regarding Claim 7, the combination of Rublee and Igor further discloses wherein the object-detection machine-learning model comprises: a convolutional neural network (CNN) to generate features based on images; a transformer encoder to generate image features based on the features; and a transformer decoder to generate output embeddings based on the image features and queries. (Rublee, Col. 3, Lines 58-67, Col. 4, Lines 1-15, Fig. 1; the same encoder-decoder passage quoted in full for Claim 3 above; the CNN converts the features of the image object, and an encoder and a decoder encode and decode the features of the image object's embedded outputs). Additionally, the rationale and motivation to combine Rublee and Igor, as applied in the rejection of Claim 1, apply to this claim.

Regarding Claim 8, the combination of Rublee and Igor further discloses wherein the at least one processor is further configured to decode, using a decoder, the output embedding to generate first image coordinates corresponding to the object in the first image, wherein the predicted position of the object in the second image comprises second image coordinates corresponding to the predicted position of the object in the second image; and wherein, to modify the output embedding, the at least one processor is configured to determine a modification to the output embedding that results in the modified output embedding being decodable by the decoder to generate the second image coordinates. (Rublee, Col. 3, Lines 58-67, Col. 4, Lines 1-15, Fig. 1; the same encoder-decoder passage quoted in full for Claim 3 above; the CNN converts the features of the image object, and an encoder and a decoder encode and decode the features of the image object's embedded outputs). (Igor, Description; the same detector-library passage quoted in full for Claim 2 above; the object is processed in the images using machine-learning algorithms and its position is located in the images as an embedded output). Additionally, the rationale and motivation to combine Rublee and Igor, as applied in the rejection of Claim 1, apply to this claim.
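
Claim 11's gradient-descent limitation, read together with Claim 8's requirement that the modification leave the embedding decodable to the predicted coordinates, suggests an iterative counterpart to the closed-form least-norm sketch above. Again a hedged sketch under the same toy assumptions (a frozen linear decoder and a squared-error objective); the dimensions, learning rate, and iteration count are illustrative, not taken from the application.

```python
import numpy as np

rng = np.random.default_rng(1)

D = 256
W = rng.normal(size=(4, D)) / np.sqrt(D)        # frozen toy linear decoder

e = rng.normal(size=D)                          # embedding from the first image
box_pred = np.array([0.52, 0.40, 0.10, 0.08])   # predicted box in the second image

# Minimize ||W @ (e + delta) - box_pred||^2 over the modification delta.
delta = np.zeros(D)
lr = 0.5
for _ in range(500):
    residual = W @ (e + delta) - box_pred       # decoder output vs. target
    delta -= lr * (2.0 * W.T @ residual)        # gradient step on the loss

print(np.allclose(W @ (e + delta), box_pred, atol=1e-6))   # True
```

Initialized at zero, every update lies in the row space of W, so the iterates converge to the same minimum-norm modification the pseudoinverse gives, consistent with Claims 10 and 11 reciting alternative techniques for the same modification.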
Regarding Claim 9, The combination of Rublee and Igor further discloses wherein the decoder comprises a linear predictor configured to generate image coordinates based on output embeddings. Regarding Claim 12, The combination of Rublee and Igor further discloses wherein the first image coordinates are indicative of a first bounding box associated with the object and wherein the second image coordinates are indicative of a second bounding box associated with the object. (Rublee, Col. 3, Lines 58-67, Col. 4, Lines 1-15, Fig. 1, discloses a block diagram of a trained machine intelligence system 100 that is in accordance with specific embodiments of the invention disclosed herein. In the example of FIG. 1, the trained machine intelligence system is a convolutional neural network with a set of layers comprising an encoder 101 and a set of layers comprising a decoder 102. This is a common structure in the field of CNNs for image processing with the encoder converting the information from the image space to information in a feature space and the decoder converting the information in the feature space back into information in the image space. The CNN includes multiple kinds of layers including convolutional layers, up-sampling layers, pooling layers, and drop out layers. The CNN can be a deep convolutional network such as U-Net. Layers in the encoder and decoder are linked by concatenate operations 103. The intermediate layers of the trained machine intelligence system 100 include filters with values that are used in the convolutional layers and which are adjusted during training in order for the network to learn the characteristics of the known object. In this manner aspects of the known object and any variances in the training data are incorporated into the trained machine intelligence system during training; CNN converts the features of image object and an encoder and decoder encodes and decodes the feature of image object embedded outputs). (Igor, Description, discloses the one or more image processing techniques may include 2D and 3D object recognition, image segmentation, motion detection (e.g., single particle tracking), video tracking, optical flow, 3D Pose Estimation, etc. In certain aspects, the detection algorithms and tracking control schemes may be linked and/or otherwise associated. In certain aspects, the detection algorithms and tracking control schemes may be structured to conform to a particular modular format, to be easily swapped in and/or out of the object detector 306. In certain aspects, the detecting algorithms and/or tracking schemes may be tailored for various use cases in a reconfigurable design. In certain aspects, the detection algorithms and tracking control schemes may be trained through machine learning by artificial neural networks. In some examples, certain detection algorithms and/or tracking control schemes may be more appropriate for detecting and/or tracking a particular class, classification, type, variety, category, group, and/or grade of object than others. The detector library 308 may be implemented in hardware and/or software. In certain aspects, the detector library 308 may comprise a database. The object detector 306 may activate appropriate detecting algorithms and/or tracking schemes, while deactivating inappropriate detecting algorithms and/or tracking schemes; depending on the object being detected and/or tracked. 
In certain aspects, the object detector 306 may activate and/or deactivate detecting algorithms as a function of the class, classification, type, variety, category, group, and/or grade of the object being detected and/or tracked. In certain aspects, the object detector 306 may activate appropriate detecting algorithms and/or tracking schemes and/or deactivate inappropriate detecting algorithms and/or tracking schemes, depending on the desired and/or selected use case. In certain aspects, the GPU 302 may provide the object detector 306 with preprocessed images from the camera 104, to assist the object detector 306 in determining the appropriate detecting algorithms and/or tracking schemes to activate and/or deactivate. In certain aspects, the user may provide information through the user interface 114 to assist the object detector 306 in determining the appropriate detecting algorithms and/or tracking schemes to activate and/or deactivate. For example, the user may input information regarding the surrounding environment, such as the approximate region, whether it is indoors, outdoors, urban, rural, elevated, underground, etc. This may assist the object detector 306 in excluding less useful detecting algorithms and/or Tracking schemes (e.g., mountain detectors/trackers in an underground urban environment, elevator detectors/trackers in an outdoor rural setting, etc.). In certain aspects, the object detector 306 may automatically detect aspects of the surrounding environment to activate and/or deactivate the appropriate detecting algorithms and/or tracking schemes. In cases where detection of the object requires differentiation between the object and various environmental cues, features may be extracted from the images that are independent of the object bounding box. Aspects of the environment, such a foreground/background classification, environment classification, lighting, etc. The object detector 306 architecture may be configured to allow for an object-agnostic PTZ camera 104 target tracking system that is easily configurable for the type of object to be tracked and highly extensible to other object domains with little work needed on the part of the user. In certain aspects, the object detector 306 may use a bounding box to circumscribe an object within a scene during detection. In certain aspects, the detector may use a centroid, centered within the bounding box, to assist with detecting and/or tracking an object. In certain aspects, the object detector 306 may determine whether the detected; object is processed in images using machine learning algorithms and its position is located in images as output embedded). Additionally, the rational and motivation to combine the references Rublee and Igor as applied in rejection of claim 1 apply to this claim. Regarding Claim 14, The combination of Rublee and Igor further discloses wherein the output embedding is associated with detection of the object in the first image. (Rublee, Col. 3, Lines 58-67, Col. 4, Lines 1-15, Fig. 1, discloses a block diagram of a trained machine intelligence system 100 that is in accordance with specific embodiments of the invention disclosed herein. In the example of FIG. 1, the trained machine intelligence system is a convolutional neural network with a set of layers comprising an encoder 101 and a set of layers comprising a decoder 102. 
This is a common structure in the field of CNNs for image processing with the encoder converting the information from the image space to information in a feature space and the decoder converting the information in the feature space back into information in the image space. The CNN includes multiple kinds of layers including convolutional layers, up-sampling layers, pooling layers, and drop out layers. The CNN can be a deep convolutional network such as U-Net. Layers in the encoder and decoder are linked by concatenate operations 103. The intermediate layers of the trained machine intelligence system 100 include filters with values that are used in the convolutional layers and which are adjusted during training in order for the network to learn the characteristics of the known object. In this manner aspects of the known object and any variances in the training data are incorporated into the trained machine intelligence system during training; CNN converts the features of image object and an encoder and decoder encodes and decodes the feature of image object embedded outputs). (Igor, Description, discloses the one or more image processing techniques may include 2D and 3D object recognition, image segmentation, motion detection (e.g., single particle tracking), video tracking, optical flow, 3D Pose Estimation, etc. In certain aspects, the detection algorithms and tracking control schemes may be linked and/or otherwise associated. In certain aspects, the detection algorithms and tracking control schemes may be structured to conform to a particular modular format, to be easily swapped in and/or out of the object detector 306. In certain aspects, the detecting algorithms and/or tracking schemes may be tailored for various use cases in a reconfigurable design. In certain aspects, the detection algorithms and tracking control schemes may be trained through machine learning by artificial neural networks. In some examples, certain detection algorithms and/or tracking control schemes may be more appropriate for detecting and/or tracking a particular class, classification, type, variety, category, group, and/or grade of object than others. The detector library 308 may be implemented in hardware and/or software. In certain aspects, the detector library 308 may comprise a database. The object detector 306 may activate appropriate detecting algorithms and/or tracking schemes, while deactivating inappropriate detecting algorithms and/or tracking schemes; depending on the object being detected and/or tracked. In certain aspects, the object detector 306 may activate and/or deactivate detecting algorithms as a function of the class, classification, type, variety, category, group, and/or grade of the object being detected and/or tracked. In certain aspects, the object detector 306 may activate appropriate detecting algorithms and/or tracking schemes and/or deactivate inappropriate detecting algorithms and/or tracking schemes, depending on the desired and/or selected use case. In certain aspects, the GPU 302 may provide the object detector 306 with preprocessed images from the camera 104, to assist the object detector 306 in determining the appropriate detecting algorithms and/or tracking schemes to activate and/or deactivate. In certain aspects, the user may provide information through the user interface 114 to assist the object detector 306 in determining the appropriate detecting algorithms and/or tracking schemes to activate and/or deactivate. 
For example, the user may input information regarding the surrounding environment, such as the approximate region, whether it is indoors, outdoors, urban, rural, elevated, underground, etc. This may assist the object detector 306 in excluding less useful detecting algorithms and/or Tracking schemes (e.g., mountain detectors/trackers in an underground urban environment, elevator detectors/trackers in an outdoor rural setting, etc.). In certain aspects, the object detector 306 may automatically detect aspects of the surrounding environment to activate and/or deactivate the appropriate detecting algorithms and/or tracking schemes. In cases where detection of the object requires differentiation between the object and various environmental cues, features may be extracted from the images that are independent of the object bounding box. Aspects of the environment, such a foreground/background classification, environment classification, lighting, etc. The object detector 306 architecture may be configured to allow for an object-agnostic PTZ camera 104 target tracking system that is easily configurable for the type of object to be tracked and highly extensible to other object domains with little work needed on the part of the user. In certain aspects, the object detector 306 may use a bounding box to circumscribe an object within a scene during detection. In certain aspects, the detector may use a centroid, centered within the bounding box, to assist with detecting and/or tracking an object. In certain aspects, the object detector 306 may determine whether the detected; object is processed in images using machine learning algorithms and its position is located in images as output embedded). Additionally, the rational and motivation to combine the references Rublee and Igor as applied in rejection of claim 1 apply to this claim. Regarding Claim 15, The combination of Rublee and Igor further discloses wherein, to generate the output embedding, the at least one processor is configured to: provide the first image to an object-detection machine-learning model; and receive, from the object-detection machine-learning model, the output embedding based on the first image. (Rublee, Col. 3, Lines 58-67, Col. 4, Lines 1-15, Fig. 1, discloses a block diagram of a trained machine intelligence system 100 that is in accordance with specific embodiments of the invention disclosed herein. In the example of FIG. 1, the trained machine intelligence system is a convolutional neural network with a set of layers comprising an encoder 101 and a set of layers comprising a decoder 102. This is a common structure in the field of CNNs for image processing with the encoder converting the information from the image space to information in a feature space and the decoder converting the information in the feature space back into information in the image space. The CNN includes multiple kinds of layers including convolutional layers, up-sampling layers, pooling layers, and drop out layers. The CNN can be a deep convolutional network such as U-Net. Layers in the encoder and decoder are linked by concatenate operations 103. The intermediate layers of the trained machine intelligence system 100 include filters with values that are used in the convolutional layers and which are adjusted during training in order for the network to learn the characteristics of the known object. 
In this manner aspects of the known object and any variances in the training data are incorporated into the trained machine intelligence system during training; CNN converts the features of image object and an encoder and decoder encodes and decodes the feature of image object embedded outputs). (Igor, Description, discloses the one or more image processing techniques may include 2D and 3D object recognition, image segmentation, motion detection (e.g., single particle tracking), video tracking, optical flow, 3D Pose Estimation, etc. In certain aspects, the detection algorithms and tracking control schemes may be linked and/or otherwise associated. In certain aspects, the detection algorithms and tracking control schemes may be structured to conform to a particular modular format, to be easily swapped in and/or out of the object detector 306. In certain aspects, the detecting algorithms and/or tracking schemes may be tailored for various use cases in a reconfigurable design. In certain aspects, the detection algorithms and tracking control schemes may be trained through machine learning by artificial neural networks. In some examples, certain detection algorithms and/or tracking control schemes may be more appropriate for detecting and/or tracking a particular class, classification, type, variety, category, group, and/or grade of object than others. The detector library 308 may be implemented in hardware and/or software. In certain aspects, the detector library 308 may comprise a database. The object detector 306 may activate appropriate detecting algorithms and/or tracking schemes, while deactivating inappropriate detecting algorithms and/or tracking schemes; depending on the object being detected and/or tracked. In certain aspects, the object detector 306 may activate and/or deactivate detecting algorithms as a function of the class, classification, type, variety, category, group, and/or grade of the object being detected and/or tracked. In certain aspects, the object detector 306 may activate appropriate detecting algorithms and/or tracking schemes and/or deactivate inappropriate detecting algorithms and/or tracking schemes, depending on the desired and/or selected use case. In certain aspects, the GPU 302 may provide the object detector 306 with preprocessed images from the camera 104, to assist the object detector 306 in determining the appropriate detecting algorithms and/or tracking schemes to activate and/or deactivate. In certain aspects, the user may provide information through the user interface 114 to assist the object detector 306 in determining the appropriate detecting algorithms and/or tracking schemes to activate and/or deactivate. For example, the user may input information regarding the surrounding environment, such as the approximate region, whether it is indoors, outdoors, urban, rural, elevated, underground, etc. This may assist the object detector 306 in excluding less useful detecting algorithms and/or Tracking schemes (e.g., mountain detectors/trackers in an underground urban environment, elevator detectors/trackers in an outdoor rural setting, etc.). In certain aspects, the object detector 306 may automatically detect aspects of the surrounding environment to activate and/or deactivate the appropriate detecting algorithms and/or tracking schemes. In cases where detection of the object requires differentiation between the object and various environmental cues, features may be extracted from the images that are independent of the object bounding box. 
Regarding Claim 16, the combination of Rublee and Igor further discloses wherein the object-detection machine-learning model comprises a detection transformer. (Rublee, Col. 3, Lines 58-67, Col. 4, Lines 1-15, Fig. 1, as quoted in the rejection of Claim 15 above.) (Igor, Description, as quoted in the rejection of Claim 15 above.) Additionally, the rationale and motivation to combine the references Rublee and Igor as applied in the rejection of Claim 1 apply to this claim.
Regarding Claim 17, the combination of Rublee and Igor further discloses wherein the object-detection machine-learning model comprises: a convolutional neural network (CNN) to generate features based on images; a transformer encoder to generate image features based on the features; and a transformer decoder to generate output embeddings based on the image features and queries. (Rublee, Col. 3, Lines 58-67, Col. 4, Lines 1-15, Fig. 1, as quoted in the rejection of Claim 15 above.) (Igor, Description, as quoted in the rejection of Claim 15 above.) Additionally, the rationale and motivation to combine the references Rublee and Igor as applied in the rejection of Claim 1 apply to this claim.
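The claim-17 limitation recites the familiar DETR-style layout: CNN features, a transformer encoder producing image features, and a transformer decoder turning learned queries into output embeddings. The following is a minimal sketch of that layout (positional encodings and prediction heads omitted for brevity); the dimensions and layer counts are assumptions for illustration and do not come from the references.

```python
# Hedged sketch of the architecture recited in claim 17: CNN -> transformer
# encoder -> transformer decoder with learned object queries -> embeddings.
import torch
import torch.nn as nn

class DetrStyleDetector(nn.Module):
    def __init__(self, d_model: int = 256, num_queries: int = 100):
        super().__init__()
        # CNN backbone: image -> spatial feature map.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
        )
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        # Learned object queries, one per candidate detection.
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        fmap = self.cnn(image)                    # (B, C, H, W) CNN features
        b, c, h, w = fmap.shape
        tokens = fmap.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        memory = self.encoder(tokens)             # encoded image features
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        return self.decoder(q, memory)            # (B, num_queries, C) embeddings

embeddings = DetrStyleDetector()(torch.rand(1, 3, 224, 224))
```

In a full detector, small feed-forward heads would map each output embedding to a class label and a bounding box; the sketch stops at the embeddings because that is all the claim recites.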
Regarding Claim 18, the combination of Rublee and Igor further discloses wherein the predicted position of the object in the second image is based on relative motion data. (Igor, Description, discloses the cameras 104 may be used to identify objects through three-dimensional reconstruction techniques such as optical flow to process a sequence of images. Optical flow may be used to determine the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and a scene. In certain aspects, the object detector 306 may use audio information from the camera 104 in determining whether an object is moving or stationary. For example, a changing amplification of a particular sound, and/or a changing frequency of a particular sound, may be interpreted as indicating movement.
The object detector 306 may disregard objects (and/or corresponding bounding boxes and/or centroids) that are determined to be moving. Moving objects may be, for example, humans, vehicles, and/or animals. The object detector 306 may provide bounding box and/or centroid information corresponding to stationary objects to the state estimator 310. Stationary objects may comprise, for example, sign posts, landmarks, vending machines, entrance/exit doors, building architecture, topography, etc. In certain aspects, the object detector 306 may perform its operations in conjunction with (and/or with assistance from) other components of the object-tracking system 100, such as, for example, the logic controller 304, the GPU 302, the IMU 110, the data-management unit 312, and/or the state estimator 310. The IMU 110 may be configured to measure the user's specific force, angular rate, and/or the magnetic field surrounding the user. The IMU 110 may additionally, or alternatively, measure angular velocity, rotational rate, and/or linear acceleration of the user. The IMU 110 may comprise one or more of an accelerometer, a gyroscope, and/or a magnetometer. In certain aspects, the IMU 110 may comprise a plurality of accelerometers, gyroscopes, and/or magnetometers. The state estimator 310 may be configured to perform a variety of tasks. In certain aspects, the state estimator 310 may estimate and/or predict the current and/or future position(s) (and/or location(s)) of one or more objects detected and/or tracked by the camera 104 and/or object detector 306. In certain aspects, the state estimator 310 may estimate and/or predict the current and/or future position(s) (and/or location(s)) of one or more users of the object-tracking system 100. In certain aspects, the state estimator 310 may perform simultaneous localization and mapping (SLAM) using one or more SLAM algorithms to estimate and/or predict the current and/or future position(s) of objects and users in the local environment. In certain aspects, the state estimator 310 may employ visual odometry with a Kalman filter to assist in performing its prediction and/or estimation. In certain aspects, the Kalman filter may be a multi-state constrained Kalman filter (MSCKF). In certain aspects, the state estimator 310 may also employ traditional odometry with information provided by the IMU 110 to assist in its prediction and/or estimation. In some examples, drift may be prevalent in the measurements of the IMU 110, and the visual odometry used by the state estimator 310 may help to correct for this drift. In some examples, the IMU 110 may be part of the computer 112. Information to and/or from the IMU 110 may be routed through the data-management unit 312. The state estimator 310 may use information from the object detector 306 and/or IMU 110, in conjunction with SLAM algorithms, odometry methods, and/or visual odometry methods, to estimate and/or predict the current and/or future position(s) of the user and/or objects in the local environment, and may generate, maintain, and/or update a local map with this information. The map may be stored in a memory device 122. In certain aspects, the map may be generated using map information acquired before tracking services (e.g., GPS, satellite, and/or cellular communication abilities) were lost. The GPU 302 may be configured to render the map on the display 116 in accordance with a corresponding selection by the user via the user interface 114, an example of which is described in connection with Figures 6a and 6b.
The data-management unit 312 may be configured to provide an interface between components of the object-tracking system 100, and/or other systems and/or devices external to the object-tracking system 100. For example, the data-management unit 312 may provide an interface between the GPU 302, the controller, and the state estimator 310; the relative motion of the camera with respect to objects in the frames is determined based on the obtained position of the object from the first frame to the second frame). Additionally, the rationale and motivation to combine the references Rublee and Igor as applied in the rejection of Claim 1 apply to this claim. Regarding Claim 19, the combination of Rublee and Igor further discloses wherein the relative motion data is based on at least one of ego-motion data that is indicative of motion of a camera associated with the first image and the second image or object-motion data that is indicative of motion of the object. (Igor, Description, as quoted in the rejection of Claim 18 above; the relative motion of the camera (the ego-motion of a vehicle or other moving object on which the camera is placed) with respect to objects in the frames is determined based on the obtained position of the object from the first frame to the second frame.) Additionally, the rationale and motivation to combine the references Rublee and Igor as applied in the rejection of Claim 1 apply to this claim.
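The Igor passage quoted for claims 18-19 leans on a state estimator that fuses drifting IMU odometry with visual odometry, e.g., via a Kalman filter or MSCKF. A one-dimensional toy version of that predict/correct cycle, with made-up noise variances (a real MSCKF tracks full 6-DoF poses), looks like this:

```python
# Illustrative 1-D Kalman predict/update cycle: IMU dead-reckoning predicts,
# a visual-odometry measurement corrects the accumulated drift.
import numpy as np

def kalman_step(x, p, imu_delta, visual_meas, q=0.05, r=0.2):
    """One predict/update cycle for a scalar position estimate.

    x, p        : prior state estimate and its variance
    imu_delta   : displacement integrated from the IMU (drifts over time)
    visual_meas : position measured by visual odometry
    q, r        : assumed process and measurement noise variances
    """
    # Predict: dead-reckon with the IMU; uncertainty grows.
    x_pred = x + imu_delta
    p_pred = p + q
    # Update: blend in the visual measurement; uncertainty shrinks.
    k = p_pred / (p_pred + r)              # Kalman gain
    x_new = x_pred + k * (visual_meas - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

x, p = 0.0, 1.0
for step in range(5):
    imu_delta = 1.0 + 0.1 * step           # biased IMU increments (drift)
    visual = float(step + 1)               # visual odometry sees the true position
    x, p = kalman_step(x, p, imu_delta, visual)
print(round(x, 2))                         # stays near truth despite the IMU bias
```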
Regarding Claim 20, the combination of Rublee and Igor further discloses wherein the at least one processor is further configured to determine a position of the object in the first image, wherein the predicted position of the object in the second image is based on the position of the object in the first image. (Rublee, Col. 3, Lines 58-67, Col. 4, Lines 1-15, Fig. 1, as quoted in the rejection of Claim 15 above.) (Igor, Description, as quoted in the rejection of Claim 15 above.) Additionally, the rationale and motivation to combine the references Rublee and Igor as applied in the rejection of Claim 1 apply to this claim. Regarding Claim 21, the combination of Rublee and Igor further discloses wherein the at least one processor is further configured to: determine a relative position of the object relative to a camera associated with the first image based on the position of the object in the first image; and generate the predicted position of the object in the second image based on the relative position of the object. (Rublee, Col. 3, Lines 58-67, Col. 4, Lines 1-15, Fig. 1, as quoted in the rejection of Claim 15 above.) (Igor, Description, as quoted in the rejection of Claim 15 above.) Additionally, the rationale and motivation to combine the references Rublee and Igor as applied in the rejection of Claim 1 apply to this claim.
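Claims 20 and 21 chain three steps: position in the first image, camera-relative position, predicted position in the second image. Under a standard pinhole model this is back-projection, an ego-motion update, and reprojection. The sketch below uses invented intrinsics, an assumed known depth, and a pure translation; none of these values come from the references.

```python
# Hedged sketch of the claim-20/21 chain: pixel -> camera-relative 3-D point
# -> apply camera ego-motion -> reproject to predict the pixel in image 2.
import numpy as np

K = np.array([[500.0, 0.0, 320.0],      # assumed pinhole intrinsics
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

def pixel_to_camera(u, v, depth):
    """Back-project a pixel with known depth to camera coordinates."""
    return depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))

def camera_to_pixel(p_cam):
    """Project a camera-space point back to pixel coordinates."""
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]

# Object position in the first image, with an assumed depth of 4 m.
p_cam = pixel_to_camera(400.0, 260.0, depth=4.0)

# Ego-motion between frames: the camera translates 0.2 m to the right
# (no rotation), so the point shifts 0.2 m left in the new camera frame.
t = np.array([0.2, 0.0, 0.0])
p_cam_next = p_cam - t

predicted_uv = camera_to_pixel(p_cam_next)   # predicted position in image 2
print(predicted_uv)                          # approx. (375, 260): shifted left
```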
Regarding Claim 22, the combination of Rublee and Igor further discloses wherein the predicted position of the object in the second image is further based on at least one of ego-motion data that is indicative of motion of a camera associated with the first image and the second image or object-motion data that is indicative of motion of the object. (Igor, Description, as quoted in the rejection of Claim 18 above; the relative motion of the camera with respect to objects in the frames is determined based on the obtained position of the object from the first frame to the second frame.) Additionally, the rationale and motivation to combine the references Rublee and Igor as applied in the rejection of Claim 1 apply to this claim. Regarding Claim 23, the combination of Rublee and Igor further discloses wherein the predicted position of the object in the second image is based on at least one of: a position of the object in the first image; a relative position of the object relative to a camera associated with the first image and the second image; ego-motion data indicative of motion of the camera associated with the first image and the second image; or object-motion data indicative of motion of the object. (Igor, Description, as quoted in the rejection of Claim 18 above; the relative motion of the camera with respect to objects in the frames is determined based on the obtained position of the object from the first frame to the second frame.) Additionally, the rationale and motivation to combine the references Rublee and Igor as applied in the rejection of Claim 1 apply to this claim.
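The moving-versus-stationary test that recurs in the quoted Igor passages can be illustrated with a brute-force patch-matching stand-in for optical flow; the search window and threshold below are arbitrary assumptions, and a real system would use a proper flow method such as Lucas-Kanade.

```python
# Toy moving-vs-stationary check: estimate the apparent shift of an object
# patch between two frames and flag the object if it moved past a tolerance.
import numpy as np

def apparent_shift(prev, curr, box, search=5):
    """Find the (dy, dx) shift of the patch in `box` that best matches curr."""
    y, x, h, w = box
    patch = prev[y:y+h, x:x+w]
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if y + dy < 0 or x + dx < 0:
                continue                       # skip out-of-frame candidates
            cand = curr[y+dy:y+dy+h, x+dx:x+dx+w]
            if cand.shape != patch.shape:
                continue
            err = np.mean((cand - patch) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def is_stationary(prev, curr, box, tol=1.5):
    dy, dx = apparent_shift(prev, curr, box)
    return (dy * dy + dx * dx) ** 0.5 <= tol

prev = np.zeros((100, 100)); prev[40:50, 40:50] = 1.0
curr = np.zeros((100, 100)); curr[40:50, 43:53] = 1.0   # shifted 3 px right
print(is_stationary(prev, curr, (40, 40, 10, 10)))      # False: it moved
```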
Claims 24-30 recite a method with steps corresponding to the apparatus elements recited in Claims 1-7, respectively. Therefore, the recited steps of method Claims 24-30 are mapped to the proposed combination in the same manner as the corresponding elements of Claims 1-7, respectively. Additionally, the rationale and motivation to combine the Rublee and Igor references presented in the rejection of Claim 1 apply to these claims. Allowable Subject Matter Claims 10-11 and 13 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 101 (abstract idea) and under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, set forth in this Office action, and to include all of the limitations of the base claim and any intervening claims. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US 8873798 B2 (A method, non-transitory computer readable medium, and apparatus that tracks an object includes utilizing random projections to represent an object in a region of an initial frame in a transformed space with at least one less dimension. One of a plurality of regions in a subsequent frame with the closest similarity between the represented object and one or more of a plurality of templates is identified as the location for the object in the subsequent frame. A learned distance is applied for template matching, and techniques that incrementally update the distance metric online are utilized in order to model the appearance of the object and increase the discrimination between the object and the background. A hybrid template library, with stable templates and hybrid templates that contain appearances of the object from the initial stage of tracking as well as more recent ones, is utilized to achieve robustness with respect to pose variation and illumination changes. In this equation, D is a distance metric that indicates the similarity between different image patches; different choices for the distance metric are discussed below. The minimization problem can be solved numerically by using an iterative optimization method, such as gradient descent, by the object tracking processing apparatus 12. In order to solve the equivalent discretized problem, a selection of how to model the gradient is made with the object tracking processing apparatus 12. If the gradient is coarsely approximated, i.e., using a large step-size value, the method will converge faster but the solution may lack accuracy. If a small value is selected, more iterations will be necessary but the solution will exhibit higher accuracy. Here, γ and δ are appropriate constants that will keep the large margin between similar and dissimilar data points, and they may be the same as the constants u and v in Eqs. 9 and 10, respectively. Techniques, such as exact and approximate gradient descent of appropriate loss functions, can be used for the achievement of this goal.)
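The US 8873798 B2 summary above combines random-projection dimensionality reduction with template matching. A stripped-down sketch, using plain Euclidean distance in place of the patent's learned, online-updated metric, is shown below; the patch size and projection dimension are arbitrary assumptions.

```python
# Rough sketch of the cited tracking scheme: represent image patches in a
# lower-dimensional space via a random projection, then pick the candidate
# region closest to the template library.
import numpy as np

rng = np.random.default_rng(0)
PATCH_DIM, PROJ_DIM = 1024, 64            # 32x32 patch -> 64-D representation
R = rng.standard_normal((PROJ_DIM, PATCH_DIM)) / np.sqrt(PROJ_DIM)

def represent(patch):
    """Project a flattened patch into the reduced space."""
    return R @ patch.ravel()

# Template library built from earlier appearances of the object.
templates = [represent(rng.random((32, 32))) for _ in range(3)]

def best_match(candidate_patches):
    """Return the index of the candidate region most similar to any template."""
    def score(patch):
        z = represent(patch)
        return min(np.linalg.norm(z - t) for t in templates)
    return int(np.argmin([score(p) for p in candidate_patches]))

candidates = [rng.random((32, 32)) for _ in range(5)]
print(best_match(candidates))             # index of the best-matching region
```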
US-20210027431-A1 (A computer-implemented method of detecting an object depicted in a digital image includes: detecting a plurality of identifying features of the object, wherein the plurality of identifying features are located internally with respect to the object; projecting a location of region(s) of interest of the object based on the plurality of identifying features, where each region of interest depicts content; building and/or selecting an extraction model configured to extract the content based at least in part on the location of the region(s) of interest, the identifying feature(s), or both; and extracting some or all of the content from the digital image using the extraction model. Corresponding system and computer program product embodiments are disclosed. The inventive concepts enable reliable extraction of data from digital images where portions of an object are obscured/missing and/or depicted on a complex background.) Any inquiry concerning this communication or earlier communications from the examiner should be directed to PINALBEN V PATEL, whose telephone number is (571) 270-5872. The examiner can normally be reached M-F, 10am - 8pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Chineyere Wills-Burns, can be reached at 571-272-9752. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /Pinalben Patel/ Examiner, Art Unit 2673

Prosecution Timeline

Nov 09, 2023
Application Filed
Mar 04, 2026
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602824
SUBSTRATE TREATING APPARATUS AND SUBSTRATE TREATING METHOD
2y 5m to grant Granted Apr 14, 2026
Patent 12596437
Monitoring System and Method Having Gesture Detection
2y 5m to grant Granted Apr 07, 2026
Patent 12597235
INFORMATION PROCESSING APPARATUS, LEARNING METHOD, RECOGNITION METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM
2y 5m to grant Granted Apr 07, 2026
Patent 12586215
VEHICLE POSE
2y 5m to grant Granted Mar 24, 2026
Patent 12586217
VISION SENSOR, OPERATING METHOD OF VISION SENSOR, AND IMAGE PROCESSING DEVICE INCLUDING THE VISION SENSOR
2y 5m to grant Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

1-2
Expected OA Rounds
89%
Grant Probability
99%
With Interview (+9.9%)
2y 6m
Median Time to Grant
Low
PTA Risk
Based on 545 resolved cases by this examiner. Grant probability derived from career allow rate.
