Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/20/2026 has been entered.
Response to Amendment
The submission dated 01/20/2026 amends claims 5-7, 13-15 and 20. Claims 1-20 are pending.
Response to Arguments
Applicant's arguments in the submission have been fully considered but they are not persuasive.
On pages 6-7, the applicant argues that Amiri as applied does not teach or suggest “selecting a representative snapshot for each of the at least one cluster” as recited in the claims because 1) the minimum number of shot-level keyframes that must be selected is two, not one, and 2) the multiple shot-level keyframes are clustered after the keyframes are selected.
With respect to argument 1), the examiner disagrees because the claim language does not limit the number of shot-level keyframes. As set forth in MPEP 2111.01, it is improper to import claim limitations from the specification (see MPEP 2111.01(II), which states “[t]hough understanding the claim language may be aided by explanations contained in the written description, it is important not to import into a claim limitations that are not part of the claim.”). The examiner suggests amending the claim to recite the number of representative snapshots if the applicant wishes to pursue the above argument.
With respect to argument 2), the examiner disagrees because, as explained before and below, the frames within each shot boundary are grouped before the shot-level keyframes are selected (see, e.g., sections 5.1 and 6.2 of Amiri, which teach grouping frames of an input video that are within each shot boundary).
For the foregoing reasons, the examiner finds the applicant’s arguments unpersuasive.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-4, 9-12 and 17-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over a non-patent literature titled “Hierarchical Keyframe-based Video Summarization Using QR-Decomposition and Modified k-Means Clustering” by Amiri et al. (hereinafter Amiri), published in 2010, in view of a non-patent literature titled “Video Shot Boundary Detection Using QR-Decomposition and Gaussian Transition Detection” by Fathy et al. (hereinafter Fathy), published in 2009.
For claims 1, 9 and 17, Amiri as applied discloses a method of identifying targets, comprising:
receiving a first plurality of snapshots (see, e.g., sections 5 and 6.2, which teach receiving frames of an input video);
generating a first plurality of descriptors representing visual appearances of one or more objects in each of the first plurality of snapshots (see, e.g., sections 5.1 and 6.2, which teach extracting spatial features of each frame from a broad range of image features; the examiner interprets the spatial features as the claimed first descriptors);
grouping the first plurality of snapshots into at least one cluster based on the plurality of descriptors (see, e.g., sections 5.1 and 6.2, which teach grouping frames of an input video that are within each shot boundary);
selecting a representative snapshot for each of the at least one cluster (see, e.g., sections 5, 5.1 and 6.2, which teach detecting shot level keyframes for each shot);
generating at least one second descriptor for each representative snapshot, wherein the at least one second descriptor is more complex than the first plurality of descriptors (see, e.g., sections 5.1-5.2 and 6.2, which teach clustering the shot-level keyframes into common scenes; the examiner interprets the keyframes and identifications/descriptions thereof as the claimed second descriptors; the examiner finds the purported second descriptors, keyframes (KFs) within each shot, are more complex than the purported first descriptors, the shots themselves, because the purported second descriptors further classify the purported first descriptors and hence correspond to a higher-complexity descriptor than the purported first descriptors (see, e.g., pars. 23-28 of the specification, which use similar logic to determine complexity levels of the descriptors, where a descriptor of a higher complexity is extracted after a further classification)); and
identifying a target based on comparing the at least second descriptor and a third descriptor (see, e.g., section 5.2, which teaches detecting scene level keyframes based on the scene level summaries and the value of Tmax; the examiner interprets the values of Tmax as the claimed third descriptor).
While Amiri as applied discusses the spatial feature in the context of shot boundaries, it does not explicitly teach that the shot boundaries for the grouping of the frames are determined based on the extracted spatial features. Fathy in the analogous art teaches determining shot boundaries from the spatial features by applying QR-Decomposition and Gaussian transition detection (see, e.g., sections 5.1-5.4 of Fathy).
It would have been obvious to modify Amiri to determine shot boundaries as taught by Fathy because doing so would allow finding the correct transitions between shots (see sections 5.2-5.3 of Fathy).
For claims 2, 10 and 18, Amiri in view of Fathy teaches that the third descriptor is associated with a second plurality of snapshots or an input query (see, e.g., section 5.2 of Amiri, which teaches that values of Tmax are predefined, i.e., user inputs, and associated with the snapshots; the examiner notes that the claimed second plurality of snapshots can be interpreted as any snapshots).
For claims 3, 11 and 19, Amiri in view of Fathy teaches that the input query includes one or more arrays of numbers representing an intended target (see, e.g., sections 5.1 and 5.2 of Amiri, which teach that values of Tmax are integers, representing the number of keyframes).
For claims 4 and 12, while Amiri in view of Fathy does not explicitly teach the limitations of these claims, they are obvious over Amiri in view of Fathy and further in view of MPEP 2144.04(VI)(B), i.e., In re Harza, because the limitations of claims 4 and 12 merely duplicate the subject matter of claims 1 and 9, which have been anticipated by Amiri, and as held in In re Harza, merely duplicating what has already been taught is obvious, i.e., has no patentable significance unless a new and unexpected result is produced (see MPEP 2144.04(VI)(B) and In re Harza, 274 F.2d 669, 124 USPQ 378 (CCPA 1960)).
Claim(s) 6-8, 14-16 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Amiri in view of Fathy and further in view of U.S. Patent Application Publication No. 2020/0272509 to Wright et al. (hereinafter Wright).
For claims 6 and 14, Amiri in view of Fathy teaches, prior to selecting a representative snapshot for each of the at least one cluster, estimating a precision value for each snapshot in the at least one cluster (see, e.g., section 5.2 of Amiri, which teaches determining a distance of each frame from the center of the cluster before selecting a keyframe, wherein the distance is the measure of precision/accuracy). Amiri, however, does not explicitly teach utilizing a mean average precision value as the precision value.
Wright in the analogous art teaches using a MAP of a snapshot (see, e.g., pars. 58-59 of Wright). It would have been obvious to one of ordinary skill in the art to modify Amiri in view of Fathy to use a neural network to estimate the MAP for keyframe extraction as taught by Wright because doing so would provide more accurate analysis of a snapshot (see, e.g., par. 57 of Wright) and doing so would constitute a simple substitution of one element for another to achieve more accurate analysis of a snapshot (see, e.g., MPEP 2143(I)(B)).
For claims 7 and 15, Amiri in view of Fathy teaches selecting a representative snapshot for each of the at least one cluster comprises selecting a snapshot in the at least one cluster having a highest estimated MAP (see, e.g., section 5.2, which teaches selecting the center-most frame as the keyframe; the examiner interprets the center-most frame as the claimed snapshot with the highest precision value). Amiri, however, does not explicitly teach utilizing a mean average precision value as the precision value.
Wright in the analogous art teaches using a MAP of a snapshot (see, e.g., pars. 58-59 of Wright). It would have been obvious to one of ordinary skill in the art to modify Amiri to use a neural network to estimate the MAP for keyframe extraction as taught by Wright because doing so would provide more accurate analysis of a snapshot (see, e.g., par. 57 of Wright) and doing so would constitute a simple substitution of one element for another to achieve more accurate analysis of a snapshot (see, e.g., MPEP 2143(I)(B)).
For claims 8 and 16, while Amiri in view of Fathy does not explicitly teach these limitations, Wright in the analogous art teaches estimating a MAP of a snapshot using a neural network (see, e.g., pars. 58-59 of Wright).
It would have been obvious to one of ordinary skill in the art to modify Amiri to use a neural network to estimate the MAP for keyframe extraction as taught by Wright because doing so would provide predictable results of automating and making the extraction process more robust by using a neural network (see MPEP 2143(I)(D)).
For claim 20, Amiri in view of Fathy teaches, prior to selecting a representative snapshot, estimating a mean average precision (MAP) for each snapshot in the at least one cluster (see, e.g., section 5.2, which teaches determining a distance of each frame from the center of the cluster to select a keyframe, wherein the distance is the measure of precision/accuracy). Amiri, however, does not explicitly teach estimating a MAP of a snapshot using a neural network.
Wright in the analogous art teaches estimating a MAP of a snapshot using a neural network (see, e.g., par. 58 of Wright).
It would have been obvious to one of ordinary skill in the art to modify Amiri to use a neural network to estimate the MAP for keyframe extraction as taught by Wright because doing so would provide predictable results of automating and making the extraction process more robust by using a neural network (see MPEP 2143(I)(D)).
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
U.S. Patent Application No. 17/882,208
U.S. Patent No. 11,423,248
1. A method of identifying targets, comprising:
receiving a first plurality of snapshots;
generating a first plurality of descriptors representing visual appearances of one or more objects in each of the first plurality of snapshots;
grouping the first plurality of snapshots into at least one cluster based on the plurality of descriptors;
selecting a representative snapshot for each of the at least one cluster;
generating at least one second descriptor for each representative snapshot, wherein the at least one second descriptor is more complex than the first plurality of descriptors; and
identifying a target based on comparing the at least second descriptor and a third descriptor.
1. A method of hierarchical sampling, comprising:
receiving a first plurality of snapshots;
generating a first plurality of descriptors each associated with the first plurality of snapshots;
grouping the first plurality of snapshots into at least one cluster based on the plurality of descriptors;
selecting a representative snapshot for each of the at least one cluster;
generating at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors; and
identifying a target by applying the at least second descriptor to a second plurality of snapshots.
2. The method of claim 1, wherein the third descriptor is associated with a second plurality of snapshots or an input query.
2. The method of claim 1, further comprising, prior to receiving the first plurality of snapshots:
receiving a third plurality of snapshots;
generating a third plurality of descriptors each associated with the third plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors;
grouping the third plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and
selecting a second plurality of representative snapshots as the first plurality of snapshots.
3. The method of claim 2, wherein the input query includes one or more arrays of numbers representing an intended target.
2. The method of claim 1, further comprising, prior to receiving the first plurality of snapshots:
receiving a third plurality of snapshots;
generating a third plurality of descriptors each associated with the third plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors;
grouping the third plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and
selecting a second plurality of representative snapshots as the first plurality of snapshots.
4. The method of claim 1, further comprising, prior to receiving the first plurality of snapshots:
receiving a second plurality of snapshots;
generating a third plurality of descriptors each associated with the second plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors;
grouping the second plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and
selecting a second plurality of representative snapshots as the first plurality of snapshots.
2. The method of claim 1, further comprising, prior to receiving the first plurality of snapshots:
receiving a third plurality of snapshots;
generating a third plurality of descriptors each associated with the third plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors;
grouping the third plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and
selecting a second plurality of representative snapshots as the first plurality of snapshots.
5. The method of claim 1, further comprising:
classifying each representative snapshot for each of the at least one cluster;
aggregating classification scores of each representative snapshot for each of the at least one cluster;
determining a class based on the aggregated classification scores; and
wherein the at least one second descriptor is a class-specific descriptor.
4. The method of claim 1, further comprising:
classifying the representative snapshot for each of the at least one cluster;
aggregating classification scores of the representative snapshot for each of the at least one cluster;
determining a class based on the aggregated classification scores; and
wherein the at least one descriptor is a class-specific descriptor.
6. The method of claim 1, further comprising, prior to selecting a representative snapshot for each of the at least one cluster, estimating a mean average precision (MAP) for each snapshot in the at least one cluster.
5. The method of claim 1, further comprising, prior to selecting the representative snapshot, estimating a mean average precision (MAP) for each snapshot in the at least one cluster.
7. The method of claim 6, wherein selecting a representative snapshot for each of the at least one cluster comprises selecting a snapshot in the at least one cluster having a highest estimated MAP.
6. The method of claim 5, wherein selecting the representative snapshot for each of the at least one cluster comprises selecting a snapshot in the at least one cluster having a highest estimated MAP.
8. The method of claim 6, wherein estimating a MAP comprises using a neural network to estimate the MAP.
8. The method of claim 5, wherein estimating a MAP comprises using a neural network to estimate the MAP.
9. A non-transitory computer readable medium comprising instructions stored therein that, when executed by a processor of a system, cause the processor to:
receive a first plurality of snapshots;
generate a first plurality of descriptors representing visual appearances of one or more objects in each of the first plurality of snapshots;
group the first plurality of snapshots into at least one cluster based on the plurality of descriptors;
select a representative snapshot for each of the at least one cluster;
generate at least one second descriptor for each representative snapshot, wherein the at least one second descriptor is more complex than the first plurality of descriptors; and
identify a target based on comparing the at least second descriptor and a third descriptor.
9. A non-transitory computer readable medium comprising instructions stored therein that, when executed by a processor of a system, cause the processor to:
receive a first plurality of snapshots;
generate a first plurality of descriptors each associated with the first plurality of snapshots;
group the first plurality of snapshots into at least one cluster based on the plurality of descriptors;
select a representative snapshot for each of the at least one cluster;
generate at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors; and
identify a target by applying the at least second descriptor to a second plurality of snapshots.
10. The non-transitory computer readable medium of claim 9, wherein the third descriptor is associated with a second plurality of snapshots or an input query.
10. The non-transitory computer readable medium of claim 9, further comprising instructions stored therein that, when executed by the processor of the system, cause the processor to, prior to receiving the first plurality of snapshots:
receive a third plurality of snapshots;
generate a third plurality of descriptors each associated with the third plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors;
group the third plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and
select a second plurality of representative snapshots as the first plurality of snapshots.
11. The non-transitory computer readable medium of claim 10, wherein the input query includes one or more arrays of numbers representing an intended target.
10. The non-transitory computer readable medium of claim 9, further comprising instructions stored therein that, when executed by the processor of the system, cause the processor to, prior to receiving the first plurality of snapshots:
receive a third plurality of snapshots;
generate a third plurality of descriptors each associated with the third plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors;
group the third plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and
select a second plurality of representative snapshots as the first plurality of snapshots.
12. The non-transitory computer readable medium of claim 9, further comprising instructions that, prior to receiving the first plurality of snapshots, cause the processor to:
receive a second plurality of snapshots;
generate a third plurality of descriptors each associated with the second plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors;
group the second plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and
select a second plurality of representative snapshots as the first plurality of snapshots.
10. The non-transitory computer readable medium of claim 9, further comprising instructions stored therein that, when executed by the processor of the system, cause the processor to, prior to receiving the first plurality of snapshots:
receive a third plurality of snapshots;
generate a third plurality of descriptors each associated with the third plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors;
group the third plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and
select a second plurality of representative snapshots as the first plurality of snapshots.
13. The non-transitory computer readable medium of claim 9, further comprising instructions that cause the processor to:
classify each representative snapshot for each of the at least one cluster;
aggregate classification scores of each representative snapshot for each of the at least one cluster;
determine a class based on the aggregated classification scores; and
wherein the at least one descriptor is a class-specific descriptor.
12. The non-transitory computer readable medium of claim 9, further comprising instructions stored therein that, when executed by the processor of the system, cause the processor to:
classify the representative snapshot for each of the at least one cluster;
aggregate classification scores of the representative snapshot for each of the at least one cluster;
determine a class based on the aggregated classification scores; and
wherein the at least one descriptor is a class-specific descriptor.
14. The non-transitory computer readable medium of claim 9, further comprising instructions that, prior to selecting a representative snapshot for each of the at least one cluster, cause the processor to estimate a mean average precision (MAP) for each snapshot in the at least one cluster.
13. The non-transitory computer readable medium of claim 9, further comprising instructions stored therein that, when executed by the processor of the system, cause the processor to, prior to select the representative snapshot, estimate a Mean Average Precision for each snapshot in the at least one cluster.
15. The non-transitory computer readable medium of claim 14, wherein the instructions for selecting a representative snapshot for each of the at least one cluster comprises instructions for selecting a snapshot in the at least one cluster having a highest estimated MAP.
14. The non-transitory computer readable medium of claim 13, wherein the instructions for selecting the representative snapshot for each of the at least one cluster comprises instructions that, when executed by the processor of the system, cause the processor to select a snapshot in the at least one cluster having a highest estimated mean average precision (MAP).
16. The non-transitory computer readable medium of claim 14, wherein the instructions for estimating a MAP comprises instructions for using a neural network to estimate the MAP.
16. The non-transitory computer readable medium of claim 13, wherein the instructions for estimating a MAP comprises instructions that, when executed by the processor of the system, cause the processor to use a neural network to estimate the MAP.
17. A system, comprising:
memory that stores instructions; and
a processor configured to execute the instructions to:
receive a first plurality of snapshots;
generate a first plurality of descriptors representing visual appearances of one or more objects in each of the first plurality of snapshots;
group the first plurality of snapshots into at least one cluster based on the plurality of descriptors;
select a representative snapshot for each of the at least one cluster;
generate at least one second descriptor for each representative snapshot, wherein the at least one second descriptor is more complex than the first plurality of descriptors; and
identify a target based on comparing the at least second descriptor and a third descriptor.
17. A system, comprising:
memory that stores instructions; and
a processor configured to execute the instructions to:
receive a first plurality of snapshots;
generate a first plurality of descriptors each associated with the first plurality of snapshots;
group the first plurality of snapshots into at least one cluster based on the plurality of descriptors;
select a representative snapshot for each of the at least one cluster;
generate at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors; and
identify a target by applying the at least second descriptor to a second plurality of snapshots.
18. The system of claim 17, wherein the third descriptor is associated with a second plurality of snapshots or an input query.
18. The system of claim 17, wherein the processor is further configured to execute the instructions to, prior to receiving the first plurality of snapshots:
receive a third plurality of snapshots;
generate a third plurality of descriptors each associated with the third plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors;
group the third plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and
select a second plurality of representative snapshots as the first plurality of snapshots.
19. The system of claim 18, wherein the input query includes one or more arrays of numbers representing an intended target.
18. The system of claim 17, wherein the processor is further configured to execute the instructions to, prior to receiving the first plurality of snapshots:
receive a third plurality of snapshots;
generate a third plurality of descriptors each associated with the third plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors;
group the third plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and
select a second plurality of representative snapshots as the first plurality of snapshots.
20. The system of claim 17, wherein the processor is further configured to, prior to selecting a representative snapshot for each of the at least one cluster, estimate a mean average precision (MAP) for each snapshot in the at least one cluster using a neural network.
8. The method of claim 5, wherein estimating a MAP comprises using a neural network to estimate the MAP.
16. The non-transitory computer readable medium of claim 13, wherein the instructions for estimating a MAP comprises instructions that, when executed by the processor of the system, cause the processor to use a neural network to estimate the MAP.
Claims 1, 9 and 17 are rejected on the ground of nonstatutory double patenting as being unpatentable over patent claims 1, 9 and 17 of U.S. Patent No. 11,423,248 (hereinafter the patent) in view of Amiri and Fathy. For the difference between claims 1, 9 and 17 and patent claims 1, 9 and 17, Amiri in the analogous art teaches detecting scene level keyframes based on the scene level summaries and the value of Tmax (see, e.g., section 5.2 of Amiri; the examiner interprets the value of Tmax as the claimed third descriptor because it is compared to the last classification results for the further classification/clustering).
It would have been obvious to modify the teaching of claims 1, 9 and 17 to identify a target based on the comparison between a description of the last classification and another description such as the value of Tmax as taught by Amiri because doing so would yield a predictable result of providing an additional layer of classification and a more accurate identification of the target (see MPEP 2143(I)(D)).
For the difference between claims 1, 9 and 17 and patent claims 1, 9 and 17 in view of Amiri, Fathy in the analogous art teaches determining shot boundaries from the spatial features by applying QR-Decomposition and Gaussian transition detection (see, e.g., sections 5.1-5.4 of Fathy).
It would have been obvious to modify Amiri to determine shot boundaries as taught by Fathy because doing so would allow finding the correct transitions between shots (see sections 5.2-5.3 of Fathy).
Claims 2, 10 and 18 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 9, 10, 17, and 18 of the patent in view of Amiri and Fathy.
For claims 2, 10 and 18, patent claims 2, 10 and 18 disclose that the third descriptor is associated with a second plurality of snapshots.
Claims 3, 11 and 19 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 9, 10, 17, and 18 of the patent in view of Amiri and Fathy.
For the difference between claims 3, 11 and 19 and patent claims 1, 2, 9, 10, 17, and 18, Amiri as applied teaches that the input query includes one or more arrays of numbers representing an intended target (see, e.g., sections 5.1 and 5.2, which teach that β and Tmax are integers, representing the number of keyframes).
It would have been obvious to modify the teaching of claims 1, 9 and 17 to identify a target based on the comparison between a description of the last classification and another description, such as the value of Tmax, as taught by Amiri because doing so would yield the predictable result of providing an additional layer of classification and a more accurate identification of the target (see MPEP 2143(I)(D)).
Claims 4 and 12 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 9, 10, 17, and 18 of the patent in view of Amiri and Fathy.
For claims 4 and 12, the patent claims in view of Amiri teach, prior to receiving the first plurality of snapshots:
receiving a second plurality of snapshots; generating a third plurality of descriptors each associated with the second plurality of snapshots, wherein the third plurality of descriptors are less complex than the first plurality of descriptors; grouping the second plurality of snapshots into a plurality of clusters based on the third plurality of descriptors; and
selecting a second plurality of representative snapshots as the first plurality of snapshots (see, e.g., patent claims 2, 10 and 18).
Claims 5 and 13 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 4, 9, 12, 17, and 20 of the patent in view of Amiri and Fathy.
For claims 5 and 13, the patent claims in view of Amiri teach:
classifying the representative snapshot for each of the at least one cluster;
aggregating classification scores of the representative snapshot for each of the at least one cluster;
determining a class based on the aggregated classification scores; and
wherein the at least one descriptor is a class-specific descriptor (see, e.g., patent claims 4, 12 and 20).
Claims 6 and 14 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 5, 9, 13, and 17 of the patent in view of Amiri and Fathy.
For claims 6 and 14, the patent claims in view of Amiri teach, prior to selecting the representative snapshot, estimating a mean average precision (MAP) for each snapshot in the at least one cluster (see, e.g., patent claims 5 and 13).
Claims 7 and 15 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 6, 9, 14, and 17 of the patent in view of Amiri and Fathy.
For claims 7 and 15, the patent claims in view of Amiri teach that selecting the representative snapshot for each of the at least one cluster comprises selecting a snapshot in the at least one cluster having a highest estimated MAP (see, e.g., patent claims 6 and 14).
Claims 8, 16, and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 8, 9, 16, and 17 of the patent in view of Amiri and Fathy.
For claims 8, 16 and 20, the patent claims in view of Amiri teach using a neural network for keyframe extraction (see, e.g., patent claims 8 and 16).
Allowable Subject Matter
Claims 5 and 13 would be allowable if the double patenting rejection is overcome.
In regard to claims 5 and 13, when considering each claim as a whole, the prior art of record fails to disclose or render obvious, alone or in combination:
“classifying each representative snapshot for each of the at least one cluster;
aggregating classification scores of each representative snapshot for each of the at least one cluster;
determining a class based on the aggregated classification scores; and
wherein the at least one second descriptor is a class-specific descriptor.”
Additional Citations
The following table lists several references that are relevant to the subject matter claimed and disclosed in this Application. The references are not relied on by the Examiner, but are provided to assist the Applicant in responding to this Office action.
Citation
Relevance
Yang et al. (US Pat. Pub. 2020/0380263)
Describes implementing key frame detection in video compression in an artificial intelligence semiconductor solution. In one embodiment, a system for detecting key frames in a video may include a feature extractor configured to extract feature descriptors for each of the multiple image frames in the video. The feature extractor may be an embedded cellular neural network of an artificial intelligence (AI) chip. The system may also include a key frame extractor configured to determine one or more key frames in the multiple image frames based on the corresponding feature descriptors of the image frames. The key frame extractor may determine the key frames based on distance values between a first set of feature descriptors corresponding to a first subset of image frames and a second set of feature descriptors corresponding to a second subset of image frames. The system may output an alert based on determining the key frames and/or display the key frames. The system may also compress the video by removing the non-key frames.
Koval et al. (US Pat. Pub. 2020/0372292)
Describes improved systems and methods of indexing and searching video content. In one embodiment, a method of indexing and searching for video content is provided. For each frame of a first plurality of frames, a first global feature and a first plurality of local features may be identified. The first plurality of local features may be clustered around a first plurality of cluster centers. The first plurality of local features may be converted into a first plurality of binary signatures. An index that maps the first plurality of cluster centers and the first plurality of binary signatures to the first plurality of frames may be generated. A search request associated with a second video may be received and its direct and indirect features may be identified. The identified features of the second video may be compared against the index and a candidate video may be selected as a result of the search request.
Table 1
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See table 1 and form 892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WOO RHIM whose telephone number is (571) 272-6560. The examiner can normally be reached Mon - Fri 9:30 am - 6:00 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Henok Shiferaw can be reached at 571-272-4637. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/WOO C RHIM/ Examiner, Art Unit 2676