DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 01/08/2025 and 10/24/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 4-5, 7-11, 14-16, 18-23, and 25-26 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ostrovsky-Berman et al. (US PG-Pub US 20140161314 A1).
Regarding Claim 1, Ostrovsky-Berman teaches a method of operating a computing apparatus, comprising: accessing a plurality of temporal metadata datasets (¶[0005], "An exemplary embodiment of a method of searching image data for objects includes a database populated with metadata identified from image data."), each of the plurality of temporal metadata datasets associated with a video image frame of a scene (¶[0020], "An exemplary embodiment of the method 100 begins at 102 with the acquisition of video image data which may exemplarily be from a plurality of CCTV video cameras deployed about a monitored environment. In a non-limiting example as will be used for exemplary purposes herein, the monitored environment may be a store. The plurality of cameras acquire video image data, which in exemplary embodiments may be a stream of video data"; ¶[0020] discloses that video image data of an environment is captured by a plurality of cameras and processed to determine objects of interest, as further described in ¶[0021]) and comprising (i) identification information for that video image frame (¶[0006], "The acquired video image data includes an identifier indicative of a camera of the plurality of cameras that acquired the video image data and a time stamp of when the video image data was acquired."; ¶[0006] discloses retrieving video image data acquired from a plurality of cameras with an associated timestamp); (ii) an object identifier (ID) for each of one or more objects detected in that video image frame (¶[0006], "The detected objects, object characteristics and identifiers associated with each detected object are stored in the database"; ¶[0006] discloses that the detected objects and associated identifiers are stored in a database); and (iii) one or more object attributes associated with each of the one or more objects detected in that video image frame (¶[0021], "At least one object characteristic is detected for each of the detected objects in the frame of video image data. Object characteristics may be a type of specific descriptor that reflects a property of a specific instance of the predefined detected object. In exemplary embodiments, such object characteristics may be a color or may be an overall estimated height or relative height of the object."; ¶[0021] discloses that object characteristics associated with the object may be used to identify the object in the image); for a particular object having an object ID, identifying a subset of temporal metadata datasets in the plurality of temporal metadata datasets comprising an object ID that matches the object ID of the particular object (¶[0007], "A user interface is operable by a processor that is configured to receive at least one search parameter. A searcher engine is operable by the processor to receive at least one search parameter from the user interface. The searcher engine is further operable by the processor to query the object database based upon the at least one search parameter and receive returned query results comprising at least one object, at least one object characteristic, and at least one identification of the video image data from which the objects and object characteristics are identified. The returned query results are presented on the user interface."; ¶[0007] discloses receiving a search query for an object and returning objects that match the object in the video image data), and processing the subset of temporal metadata datasets to create an object-based metadata record for the particular object (¶[0007], "A system for image data storage and retrieval, includes a plurality of cameras that operate to acquire video image data from a plurality of locations. An object detection engine receives the acquired video image data and identifies objects and object characteristics within the video image data. An object database is stored on a computer readable medium. The object database stores the identified objects and object characteristics with an identification of the video image data from which the objects and object characteristics are identified"; ¶[0007] discloses detecting object data from acquired video image data and storing the object and object characteristics in a database), the object-based metadata record for the particular object comprising (i) the object ID; (ii) one or more object attributes associated with the particular object; and (iii) aggregated identification information for video image frames in which the particular object was detected (¶[0023], "At 108 the identifications of the aggregated objects and object characteristics are stored in a database. Additionally, the object and object characteristic are stored in the database at 108 along with an identifier which may be an identification number or code that represents the camera of the plurality of cameras used to acquire the video image data in which the objects and object characteristics were detected"; ¶[0023] discloses that the stored metadata record of the object includes an identifier and characteristics of the object, along with an identification of which camera captured the object); and causing the object-based metadata record to be stored in an object-based metadata database (¶[0032], "The object detection engine 216 operates to store the aggregated detected objects and object characteristics in an object database 220. The object database 220 may be implemented on any of a variety of known computer readable media as described above, and may operate to store identifications of the aggregated detected objects and object characteristics in association with an identification of the video image data in which the objects and object characteristics were detected. Exemplarily, this identification identifies the camera that acquired the video image data and a time stamp associated with a frame or frames in which the objects and object characteristics were detected"; ¶[0032] discloses storing a record of detected objects and characteristics in a database).
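For illustration, the following is a minimal sketch of the kind of per-frame-to-per-object aggregation recited in claim 1: temporal metadata datasets, each carrying frame identification information and per-object attributes, grouped by object ID into object-based records. All names and data shapes are hypothetical and are not taken from the cited reference.

    def build_object_records(temporal_datasets):
        """Group per-frame metadata by object ID into object-based records."""
        records = {}
        for dataset in temporal_datasets:
            frame_id = dataset["frame_id"]  # identification information for the frame
            for obj in dataset["objects"]:
                rec = records.setdefault(
                    obj["object_id"],
                    {"object_id": obj["object_id"], "attributes": set(), "frames": []},
                )
                rec["attributes"].update(obj["attributes"])  # e.g. color, height
                rec["frames"].append(frame_id)  # aggregated identification information
        return records

    frames = [
        {"frame_id": ("cam01", "09:00:00"),
         "objects": [{"object_id": 7, "attributes": {"red shirt", "tall"}}]},
        {"frame_id": ("cam01", "09:00:01"),
         "objects": [{"object_id": 7, "attributes": {"red shirt"}}]},
    ]
    print(build_object_records(frames)[7]["frames"])  # both frames showing object 7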
Regarding Claim 4, Ostrovsky-Berman teaches the method defined in claim 1, wherein the aggregated identification information for the video image frames in which the particular object was detected comprises timestamps and/or frame identifiers corresponding to the video image frames in which the particular object was detected (¶[0023], "the object and object characteristic are stored in the database at 108 along with an identifier which may be an identification number or code that represents the camera of the plurality of cameras used to acquire the video image data in which the objects and object characteristics were detected, as well as an indication of the frames and/or a time stamp of when the video data was acquired."; ¶[0023] discloses that a specific frame and time stamp are stored alongside information pertaining to the object in the database).
Regarding Claim 5, Ostrovsky-Berman teaches the method defined in claim 1, wherein the object-based metadata record for the particular object further comprises a thumbnail image representative of the particular object, the thumbnail image being a selected one of the video frame images identified by the aggregated identification information of the object-based metadata record (¶[0027], "The visual presentation of the returned results may exemplarily include a link to the stored video image data associated with each of the returned camera identifiers. Selection of the link results in the presentation of the video data associated with the at least one camera identifier at 116. The video data is exemplarily presented on the graphical display. In an alternative embodiment, a thumb nail image, snap shot, or cropped frame of the video data associated with each of the returned camera identifiers is presented."; ¶[0027] discloses that the database record of the object may include a thumbnail image associated with the object).
Regarding Claim 7, Ostrovsky-Berman teaches the method defined in claim 1, further comprising obtaining the plurality of temporal metadata datasets from a camera (¶[0028], "FIG. 2 depicts an exemplary embodiment of a system 200 for video image data storage and retrieval. The system 200 includes at least one camera 202 and in embodiments a plurality of cameras 202. As described above, the cameras are exemplarily CCTV cameras arranged at a variety of locations around an environment to be monitored. Each of the cameras 202 acquires a stream of video image data, exemplarily digital video image data."; ¶[0028] discloses a camera system used to acquire the stream of video image data).
Regarding Claim 8, Ostrovsky-Berman teaches the method defined in claim 7, wherein the computing apparatus is a server communicatively coupled to the camera (¶[0029], "The system 200 includes a front end 204 and a back end 206 as will be described in further detail herein. In exemplary embodiments, the front end is used as described herein to process the video image data as it is acquired by the cameras 202. The back end 206 operates to search query, retrieve the results, and display the results in an informative manner. In a non-limiting embodiment, each of the front end 204 and the back end 206 are implemented on a computing system 300 as described above with respect to FIG. 3."; ¶[0029] discloses a back end server that is coupled to the front end, which receives image data from the cameras).
Regarding Claim 9, Ostrovsky-Berman teaches the method defined in claim 1, the method further comprising creating the object-based metadata record in real-time (¶[0023], "At 108 the identifications of the aggregated objects and object characteristics are stored in a database."; ¶[0023] discloses storing a record of the object and its characteristics in a database, and ¶[0020] discloses that the image processing and database storage are performed in real time).
Regarding Claim 10, Ostrovsky-Berman teaches the method defined in claim 9, the method further comprising obtaining the video image frames (Fig. 1, element 102, "acquire video image data"), wherein the object-based metadata record is created as the video image frames are obtained (Figure 1, element 108 discloses creating a database record of the object after the video image data is acquired).
Regarding Claim 11, Ostrovsky-Berman teaches the method defined in claim 10, wherein the computing apparatus is a camera (¶[0028] discloses that the computing system includes multiple cameras).
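For illustration, a minimal sketch of the real-time behavior addressed by claims 9-11: the object-based record is updated as each frame is obtained, rather than in a later batch pass. The callback structure and all names are hypothetical and are not taken from the cited reference.

    class StreamingRecorder:
        """Builds object-based records incrementally as frames arrive."""

        def __init__(self):
            self.records = {}

        def on_frame(self, frame_id, detections):
            # invoked once per acquired frame, e.g. from a camera's frame callback
            for det in detections:
                rec = self.records.setdefault(
                    det["object_id"],
                    {"object_id": det["object_id"], "attributes": set(), "frames": []},
                )
                rec["attributes"].update(det["attributes"])
                rec["frames"].append(frame_id)  # record is current after every frame

    recorder = StreamingRecorder()
    recorder.on_frame(("cam01", 1), [{"object_id": 7, "attributes": {"red shirt"}}])
    recorder.on_frame(("cam01", 2), [{"object_id": 7, "attributes": {"tall"}}])
    print(recorder.records[7]["frames"])  # [('cam01', 1), ('cam01', 2)]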
Regarding Claim 14, Ostrovsky-Berman teaches a method of operating a computing apparatus (¶[0013], "FIG. 3 is a system diagram of an exemplary embodiment of a computing system 300 as may be used to implement embodiments of the method 100"), comprising: accessing a plurality of temporal metadata datasets (¶[0005], "An exemplary embodiment of a method of searching image data for objects includes a database populated with metadata identified from image data."), each of the plurality of temporal metadata datasets associated with a video image frame of a scene (¶[0020], "An exemplary embodiment of the method 100 begins at 102 with the acquisition of video image data which may exemplarily be from a plurality of CCTV video cameras deployed about a monitored environment. In a non-limiting example as will be used for exemplary purposes herein, the monitored environment may be a store. The plurality of cameras acquire video image data, which in exemplary embodiments may be a stream of video data"; ¶[0020] discloses that video image data of an environment is captured by a plurality of cameras and processed to determine objects of interest, as further described in ¶[0021]) and comprising (i) identification information for that video image frame (¶[0006], "The acquired video image data includes an identifier indicative of a camera of the plurality of cameras that acquired the video image data and a time stamp of when the video image data was acquired."; ¶[0006] discloses retrieving video image data acquired from a plurality of cameras with an associated timestamp); and (ii) an object attribute combination associated with each of one or more objects detected in that video image frame (¶[0006], "The detected objects, object characteristics and identifiers associated with each detected object are stored in the database"; ¶[0006] discloses that the detected objects and associated identifiers are stored in a database), wherein the object attribute combination includes one or more object attributes (¶[0021], "At least one object characteristic is detected for each of the detected objects in the frame of video image data. Object characteristics may be a type of specific descriptor that reflects a property of a specific instance of the predefined detected object. In exemplary embodiments, such object characteristics may be a color or may be an overall estimated height or relative height of the object."; ¶[0021] discloses that object characteristics associated with the object may be used to identify the object in the image); for a particular object attribute combination, identifying a subset of temporal metadata datasets in the plurality of temporal metadata datasets comprising an object attribute combination that matches the particular object attribute combination (¶[0007], "A user interface is operable by a processor that is configured to receive at least one search parameter. A searcher engine is operable by the processor to receive at least one search parameter from the user interface. The searcher engine is further operable by the processor to query the object database based upon the at least one search parameter and receive returned query results comprising at least one object, at least one object characteristic, and at least one identification of the video image data from which the objects and object characteristics are identified. The returned query results are presented on the user interface."; ¶[0007] discloses receiving a search query for an object and returning objects that match the object in the video image data), and processing the identified subset of temporal metadata datasets to create an object-based metadata record for the particular object attribute combination (¶[0007], "A system for image data storage and retrieval, includes a plurality of cameras that operate to acquire video image data from a plurality of locations. An object detection engine receives the acquired video image data and identifies objects and object characteristics within the video image data. An object database is stored on a computer readable medium. The object database stores the identified objects and object characteristics with an identification of the video image data from which the objects and object characteristics are identified"; ¶[0007] discloses detecting object data from acquired video image data and storing the object and object characteristics in a database), the object-based metadata record for the particular object attribute combination comprising (i) a plurality of object attributes of the particular object attribute combination; and (ii) aggregated identification information for the video image frames in which an object having the particular object attribute combination was detected (¶[0023], "At 108 the identifications of the aggregated objects and object characteristics are stored in a database. Additionally, the object and object characteristic are stored in the database at 108 along with an identifier which may be an identification number or code that represents the camera of the plurality of cameras used to acquire the video image data in which the objects and object characteristics were detected"; ¶[0023] discloses that the stored metadata record of the object includes an identifier and characteristics of the object, along with an identification of which camera captured the object); and causing the object-based metadata record to be stored in an object-based metadata database (¶[0032], "The object detection engine 216 operates to store the aggregated detected objects and object characteristics in an object database 220. The object database 220 may be implemented on any of a variety of known computer readable media as described above, and may operate to store identifications of the aggregated detected objects and object characteristics in association with an identification of the video image data in which the objects and object characteristics were detected. Exemplarily, this identification identifies the camera that acquired the video image data and a time stamp associated with a frame or frames in which the objects and object characteristics were detected"; ¶[0032] discloses storing a record of detected objects and characteristics in a database).
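For illustration, a minimal sketch of the claim 14 variant, which keys the aggregation on the combination of object attributes rather than on an object ID. All names are hypothetical and are not taken from the cited reference.

    def build_combination_records(temporal_datasets):
        """Group per-frame metadata by attribute combination."""
        records = {}
        for dataset in temporal_datasets:
            for obj in dataset["objects"]:
                key = frozenset(obj["attributes"])  # the object attribute combination
                rec = records.setdefault(key, {"attributes": sorted(key), "frames": []})
                rec["frames"].append(dataset["frame_id"])
        return records

    frames = [
        {"frame_id": ("cam02", "09:05:00"),
         "objects": [{"attributes": {"red shirt", "tall"}}]},
        {"frame_id": ("cam03", "09:06:10"),
         "objects": [{"attributes": {"red shirt", "tall"}}]},
    ]
    combos = build_combination_records(frames)
    print(combos[frozenset({"red shirt", "tall"})]["frames"])  # frames across both cameras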
Regarding Claim 15, Ostrovsky-Berman teaches the method defined in claim 14, wherein the object-based metadata record for the object further comprises a thumbnail image representative of the object, the thumbnail image being a selected one of the video frame images identified by the aggregated identification information of the object-based metadata record (¶[0027], "The visual presentation of the returned results may exemplarily include a link to the stored video image data associated with each of the returned camera identifiers. Selection of the link results in the presentation of the video data associated with the at least one camera identifier at 116. The video data is exemplarily presented on the graphical display. In an alternative embodiment, a thumb nail image, snap shot, or cropped frame of the video data associated with each of the returned camera identifiers is presented."; ¶[0027] discloses that the database record of the object may include a thumbnail image associated with the object).
Regarding Claim 16, Ostrovsky-Berman teaches the method defined in claim 14, wherein the aggregated identification information for the video image frames in which the object was detected comprises timestamps and/or frame identifiers corresponding to the video image frames in which the object was detected (¶[0023], "the object and object characteristic are stored in the database at 108 along with an identifier which may be an identification number or code that represents the camera of the plurality of cameras used to acquire the video image data in which the objects and object characteristics were detected, as well as an indication of the frames and/or a time stamp of when the video data was acquired."; ¶[0023] discloses that a specific frame and time stamp are stored alongside information pertaining to the object in the database).
Regarding Claim 18, Ostrovsky-Berman teaches the method defined in claim 14, further comprising obtaining the plurality of temporal metadata datasets from a camera (¶[0028], "FIG. 2 depicts an exemplary embodiment of a system 200 for video image data storage and retrieval. The system 200 includes at least one camera 202 and in embodiments a plurality of cameras 202. As described above, the cameras are exemplarily CCTV cameras arranged at a variety of locations around an environment to be monitored. Each of the cameras 202 acquires a stream of video image data, exemplarily digital video image data."; ¶[0028] discloses a camera system used to acquire the stream of video image data).
Regarding Claim 19, Ostrovsky-Berman teaches the method defined in claim 18, wherein the computing apparatus is a server communicatively coupled to the camera (¶[0029], "The system 200 includes a front end 204 and a back end 206 as will be described in further detail herein. In exemplary embodiments, the front end is used as described herein to process the video image data as it is acquired by the cameras 202. The back end 206 operates to search query, retrieve the results, and display the results in an informative manner. In a non-limiting embodiment, each of the front end 204 and the back end 206 are implemented on a computing system 300 as described above with respect to FIG. 3."; ¶[0029] discloses a back end server that is coupled to the front end, which receives image data from the cameras).
Regarding Claim 20, Ostrovsky-Berman teaches the method defined in claim 14, the method further comprising creating the object-based metadata record in real-time (¶[0023], "At 108 the identifications of the aggregated objects and object characteristics are stored in a database."; ¶[0023] discloses storing a record of the object and its characteristics in a database, and ¶[0020] discloses that the image processing and database storage are performed in real time).
Regarding Claim 21, Ostrovsky-Berman teaches the method defined in claim 14, the method further comprising obtaining the video image frames (Fig. 1, element 102, "acquire video image data"), wherein the object-based metadata record is created as the video image frames are obtained (Figure 1, element 108 discloses creating a database record of the object after the video image data is acquired).
Regarding Claim 22, Ostrovsky-Berman teaches the method defined in claim 21, wherein the computing apparatus is a camera (¶[0028] discloses that the computing system includes multiple cameras).
Regarding Claim 23, Ostrovsky-Berman teaches a method of operating a computing apparatus, comprising: deriving a combination of object attributes of interest from a user input (¶[0024], "At 110, the method 100 is used to search the database in order to identify video data that contains an object meeting a specified description. At 110 at least one search parameter is received through a user interface. As will be described in further detail herein, FIG. 4 depicts an exemplary embodiment of a graphical user interface (GUI) which may be presented to a user and is operable to receive the at least one search parameter. In exemplary embodiments, the received search parameter may specify general categories or properties of the object of interest, such as, but not limited to height or object sub-portion color"; ¶[0024] discloses that a user is able to input a search parameter with properties of the object of interest); consulting a database of records, each of the records being associated with an object and comprising (i) object attributes associated with that object; and (ii) identification information associated with a subset of video image frames in which that object was detected (¶[0023], "At 108 the identifications of the aggregated objects and object characteristics are stored in a database. Additionally, the object and object characteristic are stored in the database at 108 along with an identifier which may be an identification number or code that represents the camera of the plurality of cameras used to acquire the video image data in which the objects and object characteristics were detected"; ¶[0023] discloses that the stored metadata record of the object includes an identifier and characteristics of the object, along with an identification of which camera captured the object), wherein the consulting comprises identifying each record associated with an object for which the object attributes stored in that record match the combination of object attributes of interest defined in the user input (¶[0007], "A user interface is operable by a processor that is configured to receive at least one search parameter. A searcher engine is operable by the processor to receive at least one search parameter from the user interface. The searcher engine is further operable by the processor to query the object database based upon the at least one search parameter and receive returned query results comprising at least one object, at least one object characteristic, and at least one identification of the video image data from which the objects and object characteristics are identified. The returned query results are presented on the user interface."; ¶[0007] discloses receiving a search query for an object and returning objects that match the object in the video image data); implementing a plurality of interactive graphical elements each of which corresponds to an identified record (¶[0026], "The returned database entry at 114 includes the camera identifier associated with the stored objects and object characteristics that met the database query. In embodiments, a plurality of results are returned from the database at 114 and are sorted by relevancy, and presented in an order of relevancy on the graphical display of the user interface. In an embodiment, the relevancy by which the results are sorted is a similarity score between the object characteristics and the at least one search parameter. A similarity function may be applied to calculate this similarity score."), wherein selection by the user input of a particular one of the plurality of interactive graphical elements causes playback of the subset of video image frames identified by the identification information in the record corresponding to the particular one of the plurality of interactive graphical elements (¶[0027], "The visual presentation of the returned results may exemplarily include a link to the stored video image data associated with each of the returned camera identifiers. Selection of the link results in the presentation of the video data associated with the at least one camera identifier at 116. The video data is exemplarily presented on the graphical display. In an alternative embodiment, a thumb nail image, snap shot, or cropped frame of the video data associated with each of the returned camera identifiers is presented. While it will be recognized that in alternative embodiments, other manners of presenting the video data directly, or a link or pointer to the video data may be presented in the returned results."; ¶[0027] discloses that the results of the query for the object of interest are presented to the user, along with a link to the video data for playback).
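For illustration, a minimal sketch of the consultation step in claim 23: records whose stored attributes contain the user's combination of attributes of interest are returned, and each hit carries the frame identification information a front end would use to drive playback behind an interactive element. All names are hypothetical and are not taken from the cited reference.

    def find_matching_records(database, attributes_of_interest):
        """Return every record whose attributes include all attributes of interest."""
        wanted = set(attributes_of_interest)
        return [rec for rec in database if wanted <= rec["attributes"]]

    database = [
        {"object_id": 7, "attributes": {"red shirt", "tall"},
         "frames": [("cam01", 1), ("cam01", 2)]},
        {"object_id": 9, "attributes": {"blue shirt"}, "frames": [("cam02", 5)]},
    ]
    for rec in find_matching_records(database, {"red shirt"}):
        # each hit would back one interactive graphical element; selecting the
        # element would play back rec["frames"]
        print(rec["object_id"], rec["frames"])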
Regarding Claim 25, Ostrovsky-Berman teaches the method defined in claim 23, wherein the selection by the user input includes a set of one or more keywords (¶[0034], "Referring to FIG. 4, the GUI 228 may include a plurality of prompts 230 and one or more drop down menus 232 or radio buttons 234 in order to input various search parameters. In alternative embodiments, search parameters may be entered with a text field. The GUI 228 depicted in FIG. 4 is exemplarily an embodiment for providing a description of a person to be found in video image data."; ¶[0034] discloses allowing a user to enter text to search for a person in the video image data).
Regarding Claim 26, Ostrovsky-Berman teaches the method defined in claim 23, wherein the selection by the user input includes keywords connected by one or more Boolean operators (¶[0036], "It will be recognized that in embodiments, the user may establish the search parameters in a variety of manners and not so limited to those as described above with respect to the GUI in FIG. 4. Alternatively, search parameters that include Boolean operators such as and or may be included, exemplarily as to enter search parameters in the alternative (e.g. red or orange shirts; red shirt or pants; brown or black hair). Similarly height, date, or time ranges may also be employed to restrict or broaden search results."; ¶[0036] discloses that the search parameters could include Boolean operators).
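For illustration, a minimal sketch of keyword matching with Boolean operators as recited in claims 25-26. A real system would parse a query string; here the query is a pre-parsed nested tuple, and all names are hypothetical rather than taken from the cited reference.

    def matches(attributes, query):
        """Evaluate a query such as ("and", ("or", "red shirt", "orange shirt"), "brown hair")."""
        if isinstance(query, str):        # a bare keyword
            return query in attributes
        op, *terms = query                # a Boolean node: "and" / "or"
        combine = any if op == "or" else all
        return combine(matches(attributes, t) for t in terms)

    attrs = {"red shirt", "brown hair"}
    print(matches(attrs, ("and", ("or", "red shirt", "orange shirt"), "brown hair")))  # True
    print(matches(attrs, ("or", "blue shirt", "black hair")))                          # False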
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 2, 6, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Ostrovsky-Berman et al. (US PG-Pub US 20140161314 A1) in view of Tusch et al. (US PG-Pub US 20160019426 A1).
Regarding Claim 2, while Ostrovsky-Berman teaches the method defined in claim 1, wherein the object-based metadata database comprises a plurality of previously stored object-based metadata records (¶[0032], "The object detection engine 216 operates to store the aggregated detected objects and object characteristics in an object database 220… Such information may exemplarily be provided by the identifier and time stamper 210 of the front end 204. The object database 220 can store this information in a manner that allows for efficient retrieval. In non-limiting embodiments, the object database 220 is arranged to index the object instances by camera, camera location, a dominant object characteristic, or temporally by time stamp or frame number."), and wherein causing the object-based metadata record to be stored in the object-based metadata database comprises: determining if an object ID for any of the plurality of previously stored object-based metadata records matches the object ID of the particular object (¶[0024], "At 110, the method 100 is used to search the database in order to identify video data that contains an object meeting a specified description. At 110 at least one search parameter is received through a user interface. As will be described in further detail herein, FIG. 4 depicts an exemplary embodiment of a graphical user interface (GUI) which may be presented to a user and is operable to receive the at least one search parameter.");
Ostrovsky-Berman does not explicitly teach responsive to determining that the object ID for a particular one of the plurality of previously stored object-based metadata records matches the object ID, updating the particular one of the plurality of previously stored object-based metadata records by aggregating aggregated identification information of the particular one of the plurality of previously stored object-based metadata records with the aggregated identification information of the object-based metadata record.
Tusch teaches responsive to determining that the object ID for a particular one of the plurality of previously stored object-based metadata records matches the object ID (¶[0030], "FIG. 4 shows the combination of two object records to form a combined object record, in response to determining that the detected objects correspond to the same object."; as shown in Figure 4, Tusch determines whether the object matches a previously detected object in the database), updating the particular one of the plurality of previously stored object-based metadata records by aggregating aggregated identification information of the particular one of the plurality of previously stored object-based metadata records with the aggregated identification information of the object-based metadata record (¶[0073], "The person then re-enters the video stream in a frame 406, is also present in further frames 407, 408, 409, and then leaves the video stream. Data corresponding to these frames are included in a second object record 410. It is then determined that the two object records correspond to the same person, for example by using a facial recognition algorithm, and the first and second object records are combined into a combined object record 411."; ¶[0073] discloses that if the objects happen to be the same person, the records are combined and an updated record is formed with information related to the object).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the claimed invention as taught by Ostrovsky-Berman with Tusch in order to determine if the object matches a previously stored object in the database and update the record of the object. One skilled in the art would have been motivated to modify Ostrovsky-Berman in this manner in order to analyze a video stream and generate metadata, which may be transmitted to a different location. (Tusch, ¶[0003])
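For illustration, a minimal sketch of the record-update behavior relied on for claim 2: if an incoming record's object ID matches a previously stored record, the aggregated identification information is merged into the stored record instead of creating a duplicate. All names are hypothetical and are not taken from either cited reference.

    def store_record(database, new_record):
        """Insert a record, merging with any stored record having the same object ID."""
        existing = database.get(new_record["object_id"])
        if existing is None:
            database[new_record["object_id"]] = new_record
        else:
            existing["frames"].extend(new_record["frames"])          # aggregate frame info
            existing["attributes"].update(new_record["attributes"])  # aggregate attributes

    db = {}
    store_record(db, {"object_id": 7, "attributes": {"red shirt"}, "frames": [("cam01", 1)]})
    store_record(db, {"object_id": 7, "attributes": {"tall"}, "frames": [("cam01", 6)]})
    print(db[7]["frames"])  # [('cam01', 1), ('cam01', 6)]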
Regarding Claim 6, while Ostrovsky-Berman teaches the method defined in claim 5, Ostrovsky-Berman does not explicitly teach further comprising accessing the video frame images identified by the aggregated identification information of the object-based metadata record and selecting the thumbnail image of the object-based metadata record based on performing image processing on the accessed video frame images to determine which of the accessed video frame images best represents the detected object.
Tusch teaches further comprising accessing the video frame images identified by the aggregated identification information of the object-based metadata record (¶[0062], "analyzing of the object record at at least one time at which the object is detected in the video stream"; ¶[0062] discloses analyzing the object record based on the time at which the object is detected in the video frame) and selecting the thumbnail image of the object-based metadata record based on performing image processing on the accessed video frame images to determine which of the accessed video frame images best represents the detected object (¶[0063], "In addition to the data described above, additional data such as an image or images corresponding to part of a frame or frames, for example a thumbnail or multiple thumbnails, may be included in the object record. As an illustrative example, a thumbnail comprising a cropped portion of a video frame including a detected human face may be captured at the time of "best snap", and selected for inclusion in an object record. The additional data could alternatively comprise, for example, a histogram of the color distribution in a relevant portion of the image. One or more entire frames of the video could also be included in the object record." and ¶[0064], "As shown in FIG. 1, the transmitted object record, including any additional data such as histograms, are analyzed by the second processing system. For example, a thumbnail corresponding to a human face captured at the time of "best snap" may be included in the object record in the first analysis"; as disclosed in ¶[0063]-¶[0064], a thumbnail image of the detected object may be selected based on which thumbnail best resembles the object of interest in the video).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the claimed invention as taught by Ostrovsky-Berman with Tusch in order to have a thumbnail image of the best match. One skilled in the art would have been motivated to modify Ostrovsky-Berman in this manner in order to analyze a video stream and generate metadata, which may be transmitted to a different location. (Tusch, ¶[0003])
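For illustration, a minimal sketch of the "best snap" thumbnail selection relied on for claims 6 and 17: the frames aggregated in a record are scored and the highest-scoring frame is kept as the thumbnail. The scoring function is a stand-in, and all names are hypothetical rather than taken from either cited reference.

    def select_thumbnail(record, frame_store, score):
        """Return the frame, among those the record identifies, that best represents the object."""
        candidates = (frame_store[fid] for fid in record["frames"])
        return max(candidates, key=score)

    frame_store = {
        ("cam01", 1): {"sharpness": 0.4, "label": "partly occluded"},
        ("cam01", 6): {"sharpness": 0.9, "label": "frontal view"},
    }
    record = {"object_id": 7, "frames": [("cam01", 1), ("cam01", 6)]}
    best = select_thumbnail(record, frame_store, score=lambda f: f["sharpness"])
    print(best["label"])  # "frontal view" is kept as the thumbnail source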
Regarding Claim 17, while Ostrovsky-Berman teaches the method defined in claim 16, Ostrovsky-Berman does not explicitly teach further comprising accessing the video frame images identified by the aggregated identification information of the object-based metadata record and selecting the thumbnail image of the object-based metadata record based on performing image processing on the accessed video frame images to determine which of the accessed video frame images best represents the object.
Tusch teaches further comprising accessing the video frame images identified by the aggregated identification information of the object-based metadata record (¶[0062], "analyzing of the object record at least one time at which the object is detected in the video stream"; ¶[0062] discloses analyzing the object record based on the time at which the object is detected in the video frame) and selecting the thumbnail image of the object-based metadata record based on performing image processing on the accessed video frame images to determine which of the accessed video frame images best represents the detected object (¶[0063], "In addition to the data described above, additional data such as an image or images corresponding to part of a frame or frames, for example a thumbnail or multiple thumbnails, may be included in the object record. As an illustrative example, a thumbnail comprising a cropped portion of a video frame including a detected human face may be captured at the time of "best snap", and selected for inclusion in an object record. The additional data could alternatively comprise, for example, a histogram of the color distribution in a relevant portion of the image. One or more entire frames of the video could also be included in the object record." and ¶[0064], "As shown in FIG. 1, the transmitted object record, including any additional data such as histograms, are analyzed by the second processing system. For example, a thumbnail corresponding to a human face captured at the time of "best snap" may be included in the object record in the first analysis"; as disclosed in ¶[0063]-¶[0064], a thumbnail image of the detected object may be selected based on which thumbnail best resembles the object of interest in the video).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the claimed invention as taught by Ostrovsky-Berman with Tusch in order to have a thumbnail image of the best match. One skilled in the art would have been motivated to modify Ostrovsky-Berman in this manner in order to analyze a video stream and generate metadata, which may be transmitted to a different location. (Tusch, ¶[0003])
Claims 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Ostrovsky-Berman et al. (US PG-Pub US 20140161314 A1) in view of Ong et al. (US PG-Pub US 20240104966 A1).
Regarding Claim 12, while Ostrovsky-Berman teaches the method defined in claim 1, wherein the computing apparatus is a server communicatively coupled to a plurality of cameras (¶[0029], "The system 200 includes a front end 204 and a back end 206 as will be described in further detail herein. In exemplary embodiments, the front end is used as described herein to process the video image data as it is acquired by the cameras 202. The back end 206 operates to search query, retrieve the results, and display the results in an informative manner. In a non-limiting embodiment, each of the front end 204 and the back end 206 are implemented on a computing system 300 as described above with respect to FIG. 3"), the method further comprising: obtaining the plurality of temporal metadata datasets from a plurality of cameras (¶[0020], "An exemplary embodiment of the method 100 begins at 102 with the acquisition of video image data which may exemplarily be from a plurality of CCTV video cameras deployed about a monitored environment."), wherein each of the plurality of temporal metadata datasets is associated with a respective camera-unique object identifier (¶[0023], "At 108 the identifications of the aggregated objects and object characteristics are stored in a database. Additionally, the object and object characteristic are stored in the database at 108 along with an identifier which may be an identification number or code that represents the camera of the plurality of cameras used to acquire the video image data in which the objects and object characteristics were detected, as well as an indication of the frames and/or a time stamp of when the video data was acquired"); identifying a plurality of camera-unique object identifiers corresponding to an identical object (¶[0024], "the received search parameter may specify general categories or properties of the object of interest, such as, but not limited to height or object sub-portion color. Additionally, the search parameters may include specific cameras, locations, times, or dates, within which the video image data is of interest. In still further embodiments, the search parameter may include user preferences on the display of returned search results which may include, but is not limited to an image grid or a list, and such user preferences may also include a number of results per page or other preferences as may be recognized by a person of ordinary skill in the art.");
However, Ostrovsky-Berman does not explicitly teach modifying the plurality of camera-unique object identifiers corresponding to the identical object to the object ID, wherein the modified object ID is server-unique.
Ong teaches modifying the plurality of camera-unique object identifiers corresponding to the identical object to the object ID (¶[0057], "FIG. 1 depicts an example system 100 used for identifying a target subject 702. A system 100 may comprise one or more cameras (e.g., cameras 100a-c) and each of the cameras may be configured to capture image frames of a location. When an image frame is captured by a camera, (e.g., 100a-c), information (e.g., appearance and facial information) is retrieved from the image frame to identify a subject and assigned to a subject ID relating to a subject by comparing with information stored in a database. Such subject ID is used by the camera to differentiate the subject from other subjects whose information characterized under different subject IDs."; as disclosed in ¶[0057], the cameras assign a subject ID to an object if it matches a subject ID in a database, and Figure 1 also shows which camera, by camera number, captured the subject ID that matches in the database), wherein the modified object ID is server-unique (¶[0046], "In an embodiment, when a server receives new information from a camera that has not been assigned to a subject ID, the server will compare it against information stored in its database, and, if it exhibits high degree of similarity with a set of data stored in the database, for example the degree of similarity is greater than a threshold value (e.g. >80%), the new information is then assigned to a subject ID associated with that set of data and stored under the subject ID, indicating that a new set of data for a subject with the subject ID has been identified"; ¶[0046] discloses that the server assigns a subject ID to an object in the image).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the claimed invention as taught by Ostrovsky-Berman with Ong in order to update the object ID and assign a camera identifier based on the plurality of cameras. One skilled in the art would have been motivated to modify Ostrovsky-Berman in this manner in order to adaptively update a target subject identification relating to a target subject stored in a database in a server. (Ong, ¶[0001])
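For illustration, a minimal sketch of the identifier resolution relied on in claims 12 and 3: camera-unique identifications are compared against the server's database with a similarity function, an existing server-unique ID is reused above a threshold, and a new record is created otherwise. The similarity function and threshold are stand-ins, and all names are hypothetical rather than taken from the cited references.

    import itertools

    _id_counter = itertools.count(1)

    def resolve_server_id(database, signature, similarity, threshold=0.8):
        """Map a camera-local detection to a server-unique object ID."""
        best_id, best_score = None, 0.0
        for server_id, stored in database.items():
            s = similarity(signature, stored["signature"])
            if s > best_score:
                best_id, best_score = server_id, s
        if best_id is not None and best_score > threshold:
            return best_id                       # same object already on record
        new_id = next(_id_counter)               # no match: create a new record
        database[new_id] = {"signature": signature, "frames": []}
        return new_id

    db = {}
    sim = lambda a, b: 1.0 if a == b else 0.0    # stand-in for a real similarity measure
    print(resolve_server_id(db, "face-A", sim))  # 1 (new record created)
    print(resolve_server_id(db, "face-A", sim))  # 1 (matched, ID reused)
    print(resolve_server_id(db, "face-B", sim))  # 2 (new record created)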
Regarding Claim 13, the combination of Ostrovsky-Berman and Ong teach the method defined in claim 12, where Ong further teaches wherein the modified object ID is uniquely determined based on specifications or configurations of the server that is communicatively coupled to the plurality of cameras. (¶[0037] discloses a server coupled to a camera system in which subject IDs are updated based on facial image recognition)
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the claimed invention as taught by Ostrovsky-Berman with Ong in order to update the object id based on the plurality of cameras. One skilled in the art would have been motivated to modify Ostrovsky-Berman in this manner in order to adaptively update a target subject identification relating to a target subject stored in a database in a server. (Ong, ¶[0001])
Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Ostrovsky-Berman et al. (US PG-Pub US 20140161314 A1) in view of Zadeh et al. (US Patent 10789291 B1).
Regarding Claim 24, while Ostrovsky-Berman teaches the method defined in claim 23, Ostrovsky-Berman does not explicitly teach further comprising accessing a database of video image frames based on the identification information in the record corresponding to the particular one of the plurality of interactive graphical elements to retrieve the subset of video image frames for playback.
Zadeh teaches further comprising accessing a database of video image frames based on the identification information in the record corresponding to the particular one of the plurality of interactive graphical elements to retrieve the subset of video image frames for playback. (Col 18, Lines 37-54, “FIG. 7 is a flow chart of an example process of playing back a video, according to one embodiment. The media detection system 140 receives 702 a selection of a video for playback from a user. The media detection system 140 generates 704 a user interface for playing the selected video. The user interface includes a progress bar interface element. A location within the progress bar interface element corresponds to a frame of the video. The media detection system 140 provides 706 the generated user interface to a client device associated with the user for display. The media detection system 140 receives 708 a request to search the selected video for a selected object from the user. The media detection system 140 classifies 710 frames of the video using one or more detectors. A detector is configured to process the frame and output a confidence score indicating a likelihood that the selected object is present within the frame.”, as disclosed in this section of the prior art, the user is able to retrieve a selection of video frames in the database for playback in order to search for a frame of the selected object.)
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the claimed invention as taught by Ostrovsky-Berman with Zadeh in order to access the database and retrieve a selection of frames to playback the video for the object of interest. One skilled in the art would have been motivated to modify Ostrovsky-Berman in this manner in order to search videos and other media content to identify items, objects, faces, or other entities within the media content. (Zadeh, Abstract)
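For illustration, a minimal sketch of the retrieval step relied on for claim 24: a detector scores stored frames for the selected object, and frames whose confidence clears a threshold are returned for retrieval and playback. The detector is a stand-in callable, and all names are hypothetical rather than taken from the cited references.

    def frames_with_object(frame_db, detector, threshold=0.5):
        """Return identifiers of frames in which the detector finds the object."""
        return [frame_id for frame_id, image in sorted(frame_db.items())
                if detector(image) >= threshold]

    frame_db = {1: "frame-without-object", 2: "frame-with-object", 3: "frame-with-object"}
    detector = lambda image: 0.9 if "with-object" in image else 0.1  # stand-in detector
    print(frames_with_object(frame_db, detector))  # [2, 3] -> retrieved for playback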
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Ostrovsky-Berman et al. (US PG-Pub US 20140161314 A1) in view of Tusch et al. (US PG-Pub US 20160019426 A1), and further in view of Ong et al. (US PG-Pub US 20240104966 A1).
Regarding Claim 3, while Ostrovsky-Berman and Tusch teach the method defined in claim 2, they do not explicitly teach further comprising: responsive to determining that no object ID of the plurality of previously stored object-based metadata records matches the object ID of the particular object, causing the object-based metadata record to be stored as a new record in the object-based metadata database.
Ong teaches further comprising: responsive to determining that no object ID of the plurality of previously stored object-based metadata records matches the object ID of the particular object, causing the object-based metadata record to be stored as a new record in the object-based metadata database (¶[0040], "Target subject—a target subject may be any suitable type of entity, which may include a person, a user or a subject whose identification (ID) will be adaptively updated. In various embodiments, a target subject may refer to a new subject that does not match any subject in a list of subjects known to a server (based on a corresponding list of subject information and IDs stored in a database of the server) and hence a new subject ID (e.g., target subject ID) is created by the server for identifying the target subject"; ¶[0040] discloses that if a new subject does not match any subject in the database, a new record is created for the subject).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the claimed invention as taught by Ostrovsky-Berman and Tusch with Ong in order to add a new record of an object in the case no match was found. One skilled in the art would have been motivated to modify Ostrovsky-Berman and Tusch in this manner in order to adaptively update a target subject identification relating to a target subject stored in a database in a server. (Ong, ¶[0001])
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAN D HOANG whose telephone number is (571)272-4344. The examiner can normally be reached Monday-Friday 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JOHN M VILLECCO can be reached at 571-272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HAN HOANG/Examiner, Art Unit 2661