Prosecution Insights
Last updated: April 19, 2026
Application No. 18/800,833

IDENTIFYING AND RETRIEVING VIDEO METADATA WITH PERCEPTUAL FRAME HASHING

Status: Non-Final OA (§103, §DP)
Filed: Aug 12, 2024
Examiner: PARK, SUNGHYOUN
Art Unit: 2484
Tech Center: 2400 — Computer Networks
Assignee: Painted Dog Inc.
OA Round: 1 (Non-Final)
Grant Probability: 75% (Favorable)
OA Rounds: 1-2
To Grant: 2y 9m
With Interview: 85%

Examiner Intelligence

Career Allow Rate: 75% (459 granted / 613 resolved; +16.9% vs TC avg, above average)
Interview Lift: +10.2% (moderate, roughly +10%; allow rate among resolved cases with vs. without an interview)
Avg Prosecution: 2y 9m (typical timeline); 43 applications currently pending
Career History: 656 total applications across all art units
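The headline figures above follow directly from the raw counts. A quick check (the TC-average baseline is inferred from the stated +16.9% delta, not given directly, so treat it as an assumption):

```python
# Quick check of the headline examiner statistics from the raw counts above.
# The 459/613 counts come from the page; the TC-average baseline is inferred
# from the stated +16.9% delta, not given directly.
granted, resolved = 459, 613
allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.1%}")  # 74.9%, displayed as 75%

tc_avg = allow_rate - 0.169  # baseline implied by the +16.9% figure
print(f"Implied TC average: {tc_avg:.1%}")     # roughly 58%
```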

Statute-Specific Performance

§101: 5.9% (-34.1% vs TC avg)
§103: 51.8% (+11.8% vs TC avg)
§102: 26.4% (-13.6% vs TC avg)
§112: 6.9% (-33.1% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 613 resolved cases

Office Action

Grounds: §103, §DP
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Ma in view of Kerl

Claims 2-7 and 9-12 are rejected under 35 U.S.C. 103 as being unpatentable over Ma et al. (U.S. Patent No. 8,874,777; hereinafter Ma) in view of Kerl (US Pub. No. 2019/0042853).
As per claim 2, Ma teaches a method, comprising: generating a first index information for a frame of a first version of a source video; storing the first index information in a database(“receiving index information from the server, the indexing information contained in a rate map index file and relating to an indexing of a single concatenated file at the server, the concatenated file containing multiple versions of a content item encoded at different bit rates, the indexing information identifying respective locations in the concatenated file of a plurality of individual encodings of the content item” in Claim 1, “the file encoder 116 logs transcoding and concatenation to a file or database. … The file encoder 116 is also responsible for generating the rate map index files for each concatenated file. During the transcoding and concatenation processes, the file encoder 116 has all the information necessary to generate the rate map index files. The transcoding configurations contain information on the granularity and units for index information. The rate map index files are written to the storage device 114 with the concatenated media files.” in Col. 
9 lines 49-65); playing a second version of the source video on a playback device (“the native media player to initiate a seek operation from the current playback position to a position corresponding to the new concatenated file byte offset for a new encoding” in Claim 15); generating a second index information for a frame of the second version of the source video; matching the second index information to the first index information in the database (“receiving index information from the server, the indexing information contained in a rate map index file and relating to an indexing of a single concatenated file at the server, the concatenated file containing multiple versions of a content item encoded at different bit rates, the indexing information identifying respective locations in the concatenated file of a plurality of individual encodings of the content item” in Claim 1, “the file encoder 116 logs transcoding and concatenation to a file or database. … The file encoder 116 is also responsible for generating the rate map index files for each concatenated file. During the transcoding and concatenation processes, the file encoder 116 has all the information necessary to generate the rate map index files. The transcoding configurations contain information on the granularity and units for index information. The rate map index files are written to the storage device 114 with the concatenated media files.” in Col.
9 lines 49-65); and in response to matching the second index information to the first index information, providing a timestamp associated with the first index information from the database (“identifying, using the retrieved indexing information, a time offset in the concatenated file for the selected encoding; identifying, using the retrieved indexing information, a byte offset in the concatenated file for the selected encoding; retrieving segments of the content item from the server by specifying the identified byte offset for the selected encoding to the server; and notifying the native media player to play at the identified time offset and providing the retrieved segments to the native media player for the selected encoding” in Claim 1).

Ma is silent about generating a first hash vector for a frame of a first video and generating a second hash vector for a frame of the second video. Kerl teaches generating a first hash vector for a frame of a first video and generating a second hash vector for a frame of the second video (“processes may receive a first video from a network; compute first hash values of first frames of the first video; determine, from a data structure that stores second hash values of second frames of a second video based at least on spatial relationships among respective portions of the second hash values” in Abs). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Kerl in order to improve detection accuracy.

As per claim 3, Ma and Kerl teach all of the limitations of claim 2. Ma is silent about wherein generating the first hash vector comprises generating the first hash vector with a perceptual hashing process.
Kerl teaches wherein generating the first hash vector comprises generating the first hash vector with a perceptual hashing process (“a vector associated with a hash value of the first hash values, of the first frames of the first video, may be X, and a hash value associated with a hash value of the second hash values, of the second frames of the second video, may be Y. In particular embodiments, X_i may include a value of a byte of a first hash value, and y_i may include a value of a byte of a second hash value. In particular embodiments, determining if a hash value of a frame of the second video is within the first distance of a hash value of a frame of the first video may include representing the hash value of the frame of the second video as a vector Y, representing the hash value of the frame of the first video as a vector X, and determining if a Euclidean distance between X and Y is within the first distance. In particular embodiments, utilizing the data structure that stores the second hash values based at least on spatial relationships among respective portions of the second hash values may reduce a number of comparison and/or a number of computations. For example, one or more portions of the data structure may not be utilized in determining the number of the second hash values, of the second frames of the second video, that are within the first distance of the one or more of the first hash values. For instance, the data structure stores the second hash values based at least on spatial relationships, which may be utilized in determining that one or more of the second hash values may not be within the first distance without computing a distance between a hash value of a frame of the first video and a hash value of a frame of the second video” in Para. [0040]).
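The Euclidean-distance comparison Kerl describes can be sketched in a few lines. This is an illustrative sketch, not code from either reference; the 8-byte hashes and the threshold value are invented for the example.

```python
import math

def hash_distance(x: bytes, y: bytes) -> float:
    """Euclidean distance between two frame hashes, treating each byte as one
    vector component (the X / Y formulation quoted from Kerl's Para. [0040])."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def frames_match(x: bytes, y: bytes, threshold: float) -> bool:
    # Two frames "match" when their hash distance is within the threshold.
    # The threshold value is a tunable parameter, not specified by Kerl.
    return hash_distance(x, y) <= threshold

# Hypothetical 8-byte perceptual hashes of two visually similar frames
h1 = bytes([10, 200, 34, 56, 7, 89, 120, 3])
h2 = bytes([12, 198, 33, 58, 7, 90, 119, 3])
print(frames_match(h1, h2, threshold=10.0))  # True (distance ≈ 3.87)
```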
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Kerl in order to improve detection accuracy.

As per claim 4, Ma and Kerl teach all of the limitations of claim 2. Ma teaches wherein playing the second version of the source video on the playback device comprises displaying the frame of the second version of the source video on at least one of a television, a set-top box, a computer, or a mobile device (“One of the primary methods for monetizing video content is the periodic insertion of video advertisements, as with television and some internet-based long form video content delivery, as well as through strictly pre-roll and/or post-roll advertisements as with movies and some short form video content delivery” in Col. 2 lines 3-11).

As per claim 5, Ma and Kerl teach all of the limitations of claim 2. Ma teaches wherein the second version of the source video is edited for length, edited for content, and/or changed in format with respect to the first version of the source video (“receiving index information from the server, the indexing information contained in a rate map index file and relating to an indexing of a single concatenated file at the server, the concatenated file containing multiple versions of a content item encoded at different bit rates, the indexing information identifying respective locations in the concatenated file of a plurality of individual encodings of the content item” in Claim 1).

As per claim 6, Ma and Kerl teach all of the limitations of claim 2. Ma is silent about wherein matching the second hash vector to the first hash vector comprises determining that the second hash vector is within a predefined strict threshold distance of the first hash vector.
Kerl teaches wherein matching the second hash vector to the first hash vector comprises determining that the second hash vector is within a predefined strict threshold distance of the first hash vector (“a number of the second hash values that are within a distance of one or more of the first hash values; and determine that the first video includes at least a portion of the second video based at least on the number of the second hash values that are within the distance value of the one or more of the first hash values and based at least on a minimum of a number of frames of the first video and a number of frames of the second video” in Abs). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Kerl in order to improve detection accuracy.

As per claim 7, Ma and Kerl teach all of the limitations of claim 2. Ma teaches wherein matching the second hash vector to the first hash vector comprises determining that the second hash vector is within a threshold distance of hash vectors corresponding to frames of the first version of the source video within a few seconds of each other (“An offset is calculated into the concatenated file for the new encoding. The offset corresponds to the same current position in the current encoding. In one embodiment, the offset is calculated as a time offset (e.g. 30 seconds in to the first encoding, and 30 seconds in to the second encoding). In another embodiment, the offset is calculated as a frame offset (e.g. 5th frame of the first encoding, to the 5th frame of the second encoding). The offset is then converted into a concatenated file byte offset. In one embodiment, the offset is calculated directly, using a known frame size for a given encoding, as (N*F), where N is the frame number and F is the known frame size. In another embodiment, the offset is looked up in the rate map index file, as described above.
In one embodiment, the offset is calculated as the next frame or range to be retrieved by the downloader.” in Col. 6 lines 15-36).

As per claim 9, Ma and Kerl teach all of the limitations of claim 2. Ma teaches wherein the database is a first database and further comprising: querying a second database for metadata associated with the frame of the first version of the source video based on the timestamp; and retrieving the metadata from the second database (“In one embodiment the media stream position is derived from the frames requested by the media player 14. In one embodiment, the stream position is adjusted for the size of the media player's 14 playback buffer. In one embodiment, the stream position is saved locally as a bookmark. In another embodiment, the stream position is provided by the stream assembler 13 to the segment downloader 12, so that a bookmark may be set on the server 10. The server 10 stores the bookmark as per-user/per-media metadata in the server database. When the media player 14 starts, it may either request that rendering begin at the start of the content, or it may request that rendering begin at the last known bookmark position. In the latter case, the segment downloader 12 retrieves the bookmark metadata from the server 10, calculates the necessary offsets and begins downloading segments from that point” in Col. 8 lines 66-67 and Col. 9 lines 1-9).

As per claim 10, Ma and Kerl teach all of the limitations of claim 9. Ma teaches further comprising: transmitting the metadata to the playback device; and displaying the metadata on the playback device (“In one embodiment the media stream position is derived from the frames requested by the media player 14. In one embodiment, the stream position is adjusted for the size of the media player's 14 playback buffer. In one embodiment, the stream position is saved locally as a bookmark.
In another embodiment, the stream position is provided by the stream assembler 13 to the segment downloader 12, so that a bookmark may be set on the server 10. The server 10 stores the bookmark as per-user/per-media metadata in the server database. When the media player 14 starts, it may either request that rendering begin at the start of the content, or it may request that rendering begin at the last known bookmark position. In the latter case, the segment downloader 12 retrieves the bookmark metadata from the server 10, calculates the necessary offsets and begins downloading segments from that point” in Col. 8 lines 66-67 and Col. 9 lines 1-9).

As per claim 11, Ma and Kerl teach all of the limitations of claim 2. Ma teaches further comprising: in response to matching the second hash vector to the first hash vector, providing a timestamp offset representing a difference between the timestamp corresponding to the frame of the first version of the source video and a timestamp corresponding to the frame of the second version of the source video (“An offset is calculated into the concatenated file for the new encoding. The offset corresponds to the same current position in the current encoding. In one embodiment, the offset is calculated as a time offset (e.g. 30 seconds in to the first encoding, and 30 seconds in to the second encoding). In another embodiment, the offset is calculated as a frame offset (e.g. 5th frame of the first encoding, to the 5th frame of the second encoding). The offset is then converted into a concatenated file byte offset. In one embodiment, the offset is calculated directly, using a known frame size for a given encoding, as (N*F), where N is the frame number and F is the known frame size. In another embodiment, the offset is looked up in the rate map index file, as described above. In one embodiment, the offset is calculated as the next frame or range to be retrieved by the downloader.” in Col. 6 lines 15-36).
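Ma's offset arithmetic and the claimed timestamp offset reduce to simple calculations. The sketch below is illustrative: the (N*F) byte-offset formula is taken from the passage quoted above, while the function names and the sample frame size and timestamps are assumptions.

```python
def frame_byte_offset(frame_number: int, frame_size: int) -> int:
    # Ma's direct calculation: byte offset = N * F, where N is the frame
    # number and F is the known (fixed) frame size for the encoding.
    return frame_number * frame_size

def timestamp_offset(source_ts: float, playback_ts: float) -> float:
    # Claim 11's offset: the difference between the matched frame's timestamp
    # in the first (source) version and its timestamp in the played version.
    return source_ts - playback_ts

# Illustrative values: 5th frame at 4096 bytes per frame, and a playback
# version whose matched frame sits 62 seconds later in the source cut.
print(frame_byte_offset(5, 4096))    # 20480
print(timestamp_offset(92.0, 30.0))  # 62.0
```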
As per claim 12, Ma and Kerl teach all of the limitations of claim 2. Ma teaches further comprising: in response to matching the second hash vector to the first hash vector, providing information identifying the source video (“a rate map index file is used. The rate map index file contains a plurality of entries, each entry containing an index into the concatenated file. Each index contains a plurality of concatenated file byte offsets which are offsets into the concatenated file. Each entry contains a concatenated file byte offset for each encoding in the concatenated file, such that each byte offset maps a position, in the current encoding, to the corresponding position in another encoding within the concatenated file. The offsets may be tuned to different granularity. In one embodiment the rate map indices map out only the start of the encodings. In another embodiment, the rate map indices map out individual frames of a video encoding. In another embodiment, the rate map indices map out groups of frames, beginning with key frames, for a video encoding. In another embodiment, the rate map indices map out the different compression or encryption blocks of a data encoding. The rate map indices are all of fixed size, so that the rate map indices themselves may be easily indexed by a rate map index file byte offset which is an offset into the rate map index file. For example, the index for a given frame F of a given encoding E can be found in the rate map index file at byte (((E*N)+F)*I), where N is the number of frames in each encoding, and I is the size of each index. The number of frames N is preferably consistent for all encodings of a given source video, though may differ from one source video to another.” in Col. 3 lines 61-67 and Col. 4 lines 1-19).

Ma in view of Kerl and Gupta

Claims 8 and 13-21 are rejected under 35 U.S.C.
103 as being unpatentable over Ma et al. (U.S. Patent No. 8,874,777; hereinafter Ma) in view of Kerl (US Pub. No. 2019/0042853) further in view of Gupta et al. (US Pub. No. 2019/0332849; hereinafter Gupta).

As per claim 8, Ma and Kerl teach all of the limitations of claim 2. Ma and Kerl are silent about wherein matching the second hash vector to the first hash vector comprises: transmitting the second hash vector from the playback device to an Application Programming Interface (API) server; and querying, via the API server, the database with the first hash vector. Gupta teaches wherein matching the second hash vector to the first hash vector comprises: transmitting the second hash vector from the playback device to an Application Programming Interface (API) server; and querying, via the API server, the database with the first hash vector (“the social networking server 112 communicates with the various databases 116-124 through the one or more database server(s) 126. In this regard, the database server(s) 126 provide one or more interfaces and/or services for providing content to, modifying content in, removing content from, or otherwise interacting with the databases 116-124. For example, and without limitation, such interfaces and/or services may include one or more Application Programming Interfaces (APIs)” in Para. [0033], “When utilizing LSH, the near-neighbor feature vectors are those feature vectors with the same LSH hash as the query feature vector. That is, the search for the near-duplicate images is confined to those feature vectors in the same hash bucket as the query feature vector.” in Para. [0084]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma and Kerl with the above teachings of Gupta in order to improve performance with an API server.
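The hash-bucket confinement Gupta describes can be sketched as follows. The bucketing function here is a simple quantization stand-in, not Gupta's actual LSH family (which would use random projections), and all vector values and names are hypothetical.

```python
from collections import defaultdict

def lsh_key(vec, cell=50):
    # Stand-in bucketing function: coarse quantization of each component.
    # It only illustrates the "same bucket" confinement Gupta describes;
    # real LSH families use random projections.
    return tuple(v // cell for v in vec)

# Hypothetical stored frame-hash vectors, indexed by bucket
index = defaultdict(list)
stored = {"frame_a": [10, 210, 34], "frame_b": [12, 212, 33], "frame_c": [240, 5, 170]}
for name, vec in stored.items():
    index[lsh_key(vec)].append(name)

# The near-neighbour search is confined to vectors sharing the query's bucket
query = [11, 211, 35]
candidates = index[lsh_key(query)]
print(candidates)  # ['frame_a', 'frame_b']
```

Only the same-bucket candidates would then be compared by distance, which is what makes the API-server lookup cheap relative to a full scan.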
As per claim 13, Ma teaches a system comprising: a database to store index information for respective frames of different versions of a source video, the index information being associated in the database with respective timestamps(“receiving index information from the server, the indexing information contained in a rate map index file and relating to an indexing of a single concatenated file at the server, the concatenated file containing multiple versions of a content item encoded at different bit rates, the indexing information identifying respective locations in the concatenated file of a plurality of individual encodings of the content item” in Claim 1, “the file encoder 116 logs transcoding and concatenation to a file or database. … The file encoder 116 is also responsible for generating the rate map index files for each concatenated file. During the transcoding and concatenation processes, the file encoder 116 has all the information necessary to generate the rate map index files. The transcoding configurations contain information on the granularity and units for index information. The rate map index files are written to the storage device 114 with the concatenated media files.” in Col. 
9 lines 49-65); and a server, communicatively coupled to the database, to perform a query of the database for a match to index information for a frame of a first version of the source video played on a playback device, the query causing the database to match the index information to a second index information from among the index information stored in the database and to return the timestamp associated with the second hash vector (“the native media player to initiate a seek operation from the current playback position to a position corresponding to the new concatenated file byte offset for a new encoding” in Claim 15, “receiving index information from the server, the indexing information contained in a rate map index file and relating to an indexing of a single concatenated file at the server, the concatenated file containing multiple versions of a content item encoded at different bit rates, the indexing information identifying respective locations in the concatenated file of a plurality of individual encodings of the content item” in Claim 1, “the file encoder 116 logs transcoding and concatenation to a file or database. … The file encoder 116 is also responsible for generating the rate map index files for each concatenated file. During the transcoding and concatenation processes, the file encoder 116 has all the information necessary to generate the rate map index files. The transcoding configurations contain information on the granularity and units for index information. The rate map index files are written to the storage device 114 with the concatenated media files.” in Col.
9 lines 49-65, “identifying, using the retrieved indexing information, a time offset in the concatenated file for the selected encoding; identifying, using the retrieved indexing information, a byte offset in the concatenated file for the selected encoding; retrieving segments of the content item from the server by specifying the identified byte offset for the selected encoding to the server; and notifying the native media player to play at the identified time offset and providing the retrieved segments to the native media player for the selected encoding” in Claim 1).

Ma is silent about an application programming interface (API) server, communicatively coupled to the database, and storing hash vectors for respective frames of a first video and second video. Kerl teaches storing hash vectors for respective frames of a first video and second video (“processes may receive a first video from a network; compute first hash values of first frames of the first video; determine, from a data structure that stores second hash values of second frames of a second video based at least on spatial relationships among respective portions of the second hash values” in Abs). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Kerl in order to improve detection accuracy.

Gupta teaches an application programming interface (API) server, communicatively coupled to the database (“the social networking server 112 communicates with the various databases 116-124 through the one or more database server(s) 126. In this regard, the database server(s) 126 provide one or more interfaces and/or services for providing content to, modifying content in, removing content from, or otherwise interacting with the databases 116-124.
For example, and without limitation, such interfaces and/or services may include one or more Application Programming Interfaces (APIs)” in Para. [0033], “When utilizing LSH, the near-neighbor feature vectors are those feature vectors with the same LSH hash as the query feature vector. That is, the search for the near-duplicate images is confined to those feature vectors in the same hash bucket as the query feature vector.” in Para. [0084]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma and Kerl with the above teachings of Gupta in order to improve performance with an API server.

As per claim 14, Ma, Kerl and Gupta teach all of the limitations of claim 13. Ma is silent about wherein the hash vectors are generated with a perceptual hashing process. Kerl teaches wherein the hash vectors are generated with a perceptual hashing process (“a vector associated with a hash value of the first hash values, of the first frames of the first video, may be X, and a hash value associated with a hash value of the second hash values, of the second frames of the second video, may be Y. In particular embodiments, X_i may include a value of a byte of a first hash value, and y_i may include a value of a byte of a second hash value. In particular embodiments, determining if a hash value of a frame of the second video is within the first distance of a hash value of a frame of the first video may include representing the hash value of the frame of the second video as a vector Y, representing the hash value of the frame of the first video as a vector X, and determining if a Euclidean distance between X and Y is within the first distance. In particular embodiments, utilizing the data structure that stores the second hash values based at least on spatial relationships among respective portions of the second hash values may reduce a number of comparison and/or a number of computations.
For example, one or more portions of the data structure may not be utilized in determining the number of the second hash values, of the second frames of the second video, that are within the first distance of the one or more of the first hash values. For instance, the data structure stores the second hash values based at least on spatial relationships, which may be utilized in determining that one or more of the second hash values may not be within the first distance without computing a distance between a hash value of a frame of the first video and a hash value of a frame of the second video” in Para. [0040]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Kerl in order to improve detection accuracy.

As per claim 15, Ma, Kerl and Gupta teach all of the limitations of claim 13. Ma teaches wherein the database is a first database and further comprising: a second database, communicatively coupled to the API server, to store metadata about the source video associated with the respective timestamps and to return at least a portion of the metadata about the source video to the API server in response to a query based on the timestamp associated with the second hash vector (“In one embodiment the media stream position is derived from the frames requested by the media player 14. In one embodiment, the stream position is adjusted for the size of the media player's 14 playback buffer. In one embodiment, the stream position is saved locally as a bookmark. In another embodiment, the stream position is provided by the stream assembler 13 to the segment downloader 12, so that a bookmark may be set on the server 10. The server 10 stores the bookmark as per-user/per-media metadata in the server database.
When the media player 14 starts, it may either request that rendering begin at the start of the content, or it may request that rendering begin at the last known bookmark position. In the latter case, the segment downloader 12 retrieves the bookmark metadata from the server 10, calculates the necessary offsets and begins downloading segments from that point” in Col. 8 lines 66-67 and Col. 9 lines 1-9).

As per claim 16, Ma, Kerl and Gupta teach all of the limitations of claim 13. Ma teaches wherein the playback device comprises at least one of a television, a set-top box, a computer, or a mobile device (“One of the primary methods for monetizing video content is the periodic insertion of video advertisements, as with television and some internet-based long form video content delivery, as well as through strictly pre-roll and/or post-roll advertisements as with movies and some short form video content delivery” in Col. 2 lines 3-11).

As per claim 17, Ma, Kerl and Gupta teach all of the limitations of claim 13. Ma teaches wherein the different versions of the source video are edited for length, edited for content, and/or changed in format with respect to each other (“receiving index information from the server, the indexing information contained in a rate map index file and relating to an indexing of a single concatenated file at the server, the concatenated file containing multiple versions of a content item encoded at different bit rates, the indexing information identifying respective locations in the concatenated file of a plurality of individual encodings of the content item” in Claim 1).

As per claim 18, Ma, Kerl and Gupta teach all of the limitations of claim 13. Ma is silent about wherein the database is configured to match the first hash vector to the second hash vector by determining that the first hash vector is within a predefined strict threshold distance of the second hash vector.
Kerl teaches wherein the database is configured to match the first hash vector to the second hash vector by determining that the first hash vector is within a predefined strict threshold distance of the second hash vector (“a number of the second hash values that are within a distance of one or more of the first hash values; and determine that the first video includes at least a portion of the second video based at least on the number of the second hash values that are within the distance value of the one or more of the first hash values and based at least on a minimum of a number of frames of the first video and a number of frames of the second video” in Abs). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Kerl in order to improve detection accuracy.

As per claim 19, Ma, Kerl and Gupta teach all of the limitations of claim 13. Ma teaches wherein the database is configured to match the first hash vector to the second hash vector by determining that the first hash vector is within a threshold distance of several of the hash vectors corresponding to frames of the source video within a few seconds of each other (“An offset is calculated into the concatenated file for the new encoding. The offset corresponds to the same current position in the current encoding. In one embodiment, the offset is calculated as a time offset (e.g. 30 seconds in to the first encoding, and 30 seconds in to the second encoding). In another embodiment, the offset is calculated as a frame offset (e.g. 5th frame of the first encoding, to the 5th frame of the second encoding). The offset is then converted into a concatenated file byte offset. In one embodiment, the offset is calculated directly, using a known frame size for a given encoding, as (N*F), where N is the frame number and F is the known frame size.
In another embodiment, the offset is looked up in the rate map index file, as described above. In one embodiment, the offset is calculated as the next frame or range to be retrieved by the downloader.” in Col. 6 lines 15-36). As per claim 20, Ma, Kerl and Gupta teach all of limitation of claim 13. Ma teaches wherein the database is further configured to return a timestamp offset in response to matching the first hash vector to the second hash vector, the timestamp offset representing a difference between a timestamp corresponding to the frame of the first version of the source video and the timestamp associated with the second hash vector (“An offset is calculated into the concatenated file for the new encoding. The offset corresponds to the same current position in the current encoding. In one embodiment, the offset is calculated as a time offset (e.g. 30 seconds in to the first encoding, and 30 seconds in to the second encoding). In another embodiment, the offset is calculated as a frame offset (e.g. 5th frame of the first encoding, to the 5th frame of the second encoding). The offset is then converted into a concatenated file byte offset. In one embodiment, the offset is calculated directly, using a known frame size for a given encoding, as (N*F), where N is the frame number and F is the known frame size. In another embodiment, the offset is looked up in the rate map index file, as described above. In one embodiment, the offset is calculated as the next frame or range to be retrieved by the downloader.” in Col. 6 lines 15-36). As per claim 21, Ma, Kerl and Gupta teach all of limitation of claim 13. Ma teaches wherein the database further stores information identifying the source video and is configured to provide the information identifying the source video in response to matching the first hash vector to the second hash vector (“a rate map index file is used. 
The rate map index file contains a plurality of entries, each entry containing an index into the concatenated file. Each index contains a plurality of concatenated file byte offsets which are offsets into the concatenated file. Each entry contains a concatenated file byte offset for each encoding in the concatenated file, such that each byte offset maps a position, in the current encoding, to the corresponding position in another encoding within the concatenated file. The offsets may be tuned to different granularity. In one embodiment the rate map indices map out only the start of the encodings. In another embodiment, the rate map indices map out individual frames of a video encoding. In another embodiment, the rate map indices map out groups of frames, beginning with key frames, for a video encoding. In another embodiment, the rate map indices map out the different compression or encryption blocks of a data encoding. The rate map indices are all of fixed size, so that the rate map indices themselves may be easily indexed by a rate map index file byte offset which is an offset into the rate map index file. For example, the index for a given frame F of a given encoding E can be found in the rate map index file at byte (((E*N)+F)*I), where N is the number of frames in each encoding, and I is the size of each index. The number of frames N is preferably consistent for all encodings of a given source video, though may differ from one source video to another.” in Col. 3 lines 61-67 and Col. 4 lines 1-19). Double Patenting The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. 
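The offset arithmetic quoted from Ma above reduces to two formulas: a direct byte offset of (N*F) for a fixed frame size, and a rate map index file offset of (((E*N)+F)*I). A minimal sketch of that arithmetic follows; the frame size, frame count, and index size are hypothetical values chosen only for illustration.

```python
# Sketch of the byte-offset arithmetic quoted from Ma (Col. 6 and Col. 3-4).
# All concrete numbers below are hypothetical, not from the reference.

def direct_byte_offset(frame_number: int, frame_size: int) -> int:
    """Direct offset for a fixed-frame-size encoding: (N * F)."""
    return frame_number * frame_size

def rate_map_index_offset(encoding: int, frame: int,
                          frames_per_encoding: int, index_size: int) -> int:
    """Offset of the index for frame F of encoding E in the rate map
    index file: (((E * N) + F) * I)."""
    return ((encoding * frames_per_encoding) + frame) * index_size

# Hypothetical example: frame 5 of the second encoding (index 1), with
# 3000 frames per encoding, 4096-byte frames, and 16-byte indices.
print(direct_byte_offset(5, 4096))            # 20480
print(rate_map_index_offset(1, 5, 3000, 16))  # 48080
```

Because every index has the same fixed size I, a reader can seek directly to any (encoding, frame) pair without scanning the file, which is the point of Ma's fixed-size rate map indices.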
A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement.

A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). The USPTO internet Web site contains terminal disclaimer forms which may be used. Please visit http://www.uspto.gov/forms/. The filing date of the application will determine what form should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 2-21 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-17 of U.S. Patent No. 12,062,026. Although the conflicting claims at issue are not identical, they are not patentably distinct from each other.
See the reasons set forth below:

Instant Application No. 18/800,833:

2. A method, comprising: generating a first hash vector for a frame of a first version of a source video; storing the first hash vector in a database; playing a second version of the source video on a playback device; generating a second hash vector for a frame of the second version of the source video; matching the second hash vector to the first hash vector in the database; and in response to matching the second hash vector to the first hash vector, providing a timestamp associated with the first hash vector from the database.

3. The method of claim 2, wherein generating the first hash vector comprises generating the first hash vector with a perceptual hashing process.

4. The method of claim 2, wherein playing the second version of the source video on the playback device comprises displaying the frame of the second version of the source video on at least one of a television, a set-top box, a computer, or a mobile device.

5. The method of claim 2, wherein the second version of the source video is edited for length, edited for content, and/or changed in format with respect to the first version of the source video.

6. The method of claim 2, wherein matching the second hash vector to the first hash vector comprises determining that the second hash vector is within a predefined strict threshold distance of the first hash vector.

7. The method of claim 2, wherein matching the second hash vector to the first hash vector comprises determining that the second hash vector is within a threshold distance of hash vectors corresponding to frames of the first version of the source video within a few seconds of each other.

8. The method of claim 2, wherein matching the second hash vector to the first hash vector comprises: transmitting the second hash vector from the playback device to an Application Programming Interface (API) server; and querying, via the API server, the database with the first hash vector.

9. The method of claim 2, wherein the database is a first database and further comprising: querying a second database for metadata associated with the frame of the first version of the source video based on the timestamp; and retrieving the metadata from the second database.

10. The method of claim 9, further comprising: transmitting the metadata to the playback device; and displaying the metadata on the playback device.

11. The method of claim 2, further comprising: in response to matching the second hash vector to the first hash vector, providing a timestamp offset representing a difference between the timestamp corresponding to the frame of the first version of the source video and a timestamp corresponding to the frame of the second version of the source video.

12. The method of claim 2, further comprising: in response to matching the second hash vector to the first hash vector, providing information identifying the source video.

13. A system comprising: a database to store hash vectors for respective frames of different versions of a source video, the hash vectors being associated in the database with respective timestamps; and an application programming interface (API) server, communicatively coupled to the database, to perform a query of the database for a match to a first hash vector for a frame of a first version of the source video played on a playback device, the query causing the database to match the first hash vector to a second hash vector from among the hash vectors stored in the database and to return the timestamp associated with the second hash vector.

14. The system of claim 13, wherein the hash vectors are generated with a perceptual hashing process.

15. The system of claim 13, wherein the database is a first database and further comprising: a second database, communicatively coupled to the API server, to store metadata about the source video associated with the respective timestamps and to return at least a portion of the metadata about the source video to the API server in response to a query based on the timestamp associated with the second hash vector.

16. The system of claim 13, wherein the playback device comprises at least one of a television, a set-top box, a computer, or a mobile device.

17. The system of claim 13, wherein the different versions of the source video are edited for length, edited for content, and/or changed in format with respect to each other.

18. The system of claim 13, wherein the database is configured to match the first hash vector to the second hash vector by determining that the first hash vector is within a predefined strict threshold distance of the second hash vector.

19. The system of claim 13, wherein the database is configured to match the first hash vector to the second hash vector by determining that the first hash vector is within a threshold distance of several of the hash vectors corresponding to frames of the source video within a few seconds of each other.

20. The system of claim 13, wherein the database is further configured to return a timestamp offset in response to matching the first hash vector to the second hash vector, the timestamp offset representing a difference between a timestamp corresponding to the frame of the first version of the source video and the timestamp associated with the second hash vector.

21. The system of claim 13, wherein the database further stores information identifying the source video and is configured to provide the information identifying the source video in response to matching the first hash vector to the second hash vector.

U.S. Patent No. 12,062,026:

1. A method of identifying a frame of a source video, the method comprising: generating hash vectors for respective frames of different versions of the source video; associating the hash vectors with information about the source video; separating the hash vectors into subsets; storing each subset in a different shard of a database; playing a first version of the source video on a playback device; generating a first hash vector for a first frame of the first version of the source video; matching the first hash vector to a matching hash vector among the hash vectors in the different shards of the database; and in response to matching the first hash vector to the matching hash vector, retrieving the information about the source video.

2. The method of claim 1, wherein the hash vectors and the first hash vector are generated with a perceptual hashing process.

3. The method of claim 1, wherein the first hash vector has a size of less than or equal to 4096 bits.

4. The method of claim 1, wherein generating the first hash vector occurs within about 100 milliseconds.

5. The method of claim 1, wherein generating the first hash vector occurs automatically at regular intervals.

6. The method of claim 1, wherein generating the first hash vector occurs in response to a command from a viewer.

7. The method of claim 1, wherein separating the hash vectors into subsets comprises separating the hash vectors randomly into subsets.

8. The method of claim 1, wherein separating the hash vectors into subsets comprises separating the hash vectors randomly into even subsets.

9. The method of claim 1, further comprising: generating a second hash vector for a second frame of the first version of the source video; matching the second hash vector to the matching hash vector; in response to matching the second hash vector to the matching hash vector, retrieving a timestamp corresponding to the second hash vector; and transmitting the timestamp to the playback device.

10. The method of claim 1, wherein the database is a first database and the information about the source video comprises metadata corresponding to the respective frames of the source video, and further comprising: storing the metadata corresponding to the respective frames in a second database, wherein retrieving the information about the source video comprises retrieving the metadata corresponding to the matching hash vector from the second database.

11. The method of claim 10, wherein the metadata represents at least one of a location in the source video, a garment worn by an actor in the source video, a product appearing in the source video, or music playing the source video.

12. The method of claim 10, wherein the hash vectors are associated with the metadata by respective timestamps, and wherein retrieving the metadata comprises: querying the second database based on a timestamp associated with the matching hash vector; and retrieving the metadata associated with the timestamp from the second database.

13. The method of claim 10, further comprising: displaying the metadata to a viewer via the playback device.

14. The method of claim 13, wherein the playback device includes at least one of a television, a set-top box, a computer, or a mobile device.

15. The method of claim 1, wherein matching the first hash vector to the matching hash vector comprises: determining that the first hash vector is within a threshold distance of the matching hash vector.

16. The method of claim 1, wherein the matching hash vector is for a frame in a second version of the source video different than the first version of the source video.

17. The method of claim 1, wherein matching the first hash vector to the matching hash vector comprises: transmitting the first hash vector to an Application Programming Interface (API) server; determining, via the API server, that the first hash vector matches the matching hash vector; and in response to matching the first hash vector to the matching hash vector, identifying a timestamp associated with the matching hash vector.

Claims 2-21 are anticipated by claims 1-17 of U.S. Patent No. 12,062,026, as shown in the table above.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SUNGHYOUN PARK whose telephone number is (571)270-1333. The examiner can normally be reached M - Thur 6:00 am - 4 pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, THAI Q TRAN, can be reached at (571)272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SUNGHYOUN PARK/Examiner, Art Unit 2484

Prosecution Timeline

Aug 12, 2024
Application Filed
Nov 18, 2024
Response after Non-Final Action
Dec 30, 2025
Non-Final Rejection — §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586377
SYSTEMS AND METHODS TO PREDICT AGGRESSION IN SURVEILLANCE CAMERA VIDEO
2y 5m to grant Granted Mar 24, 2026
Patent 12556650
FLEET WIDE VIDEO SEARCH
2y 5m to grant Granted Feb 17, 2026
Patent 12556795
FOLDING PRINTED CIRCUIT BOARD ASSEMBLY FOR ENDOSCOPE CAMERA
2y 5m to grant Granted Feb 17, 2026
Patent 12549697
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM
2y 5m to grant Granted Feb 10, 2026
Patent 12549797
METHODS AND SYSTEMS FOR PROVIDING MEDIA CONTENT
2y 5m to grant Granted Feb 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
75%
Grant Probability
85%
With Interview (+10.2%)
2y 9m
Median Time to Grant
Low
PTA Risk
Based on 613 resolved cases by this examiner. Grant probability derived from career allow rate.
