DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 3-5, 7, 12, 13, 17 and 19 are objected to because of the following informalities:
For claim 3, Examiner believes this claim should be amended in the following manner:
The system of claim 1, wherein the operations further comprise:
comparing an updated timestamp of the playback of the audio track to the viseme-timestamp data for the audio track to identify an updated viseme corresponding to the updated timestamp of the audio playback; and
positioning the updated viseme at the detected location of the mouth in the video feed.
For claim 4, Examiner believes this claim should be amended in the following manner:
The system of claim 1, wherein the detection of the detected location of the mouth in the video feed comprises detecting corners of the mouth in the video feed, and wherein positioning the viseme at the detected location of the mouth in the video feed comprises positioning corners of a mouth depicted in the viseme at the corners of the mouth in the video feed.
For claim 5, Examiner believes this claim should be amended in the following manner:
The system of claim 1, wherein the detection of the detected location of the mouth in the video feed comprises detecting an angle of the mouth in the video feed, and wherein positioning the viseme at the detected location of the mouth in the video feed comprises rotating the viseme by the detected angle.
For claim 7, Examiner believes this claim should be amended in the following manner:
The system of claim 1, wherein the viseme-timestamp data comprises first and second sets of viseme-timestamps for two vocal tracks, and wherein the operations further comprise:
detecting first and second mouths in the video feed;
positioning a first viseme at a location of the first mouth in the video feed based on the first set of viseme-timestamps; and
positioning a second viseme at a location of the second mouth in the video feed based on the second set of viseme-timestamps.
For claim 12, Examiner believes this claim should be amended in the following manner:
The method of claim 10, further comprising:
comparing an updated timestamp of the playback of the audio track to the viseme-timestamp data for the audio track to identify an updated viseme corresponding to the updated timestamp of the audio playback; and
positioning the updated viseme at the detected location of the mouth in the video feed.
For claim 13, Examiner believes this claim should be amended in the following manner:
The method of claim 10, wherein the viseme-timestamp data comprises first and second sets of viseme-timestamps for two vocal tracks, the method further comprising:
detecting first and second mouths in the video feed;
positioning a first viseme at a location of the first mouth in the video feed based on the first set of viseme-timestamps; and
positioning a second viseme at a location of the second mouth in the video feed based on the second set of viseme-timestamps.
For claim 17, Examiner believes this claim should be amended in the following manner:
The non-transitory computer-readable storage medium of claim 16, wherein the detection of the detected location of the mouth in the video feed comprises detecting an angle of the mouth in the video feed, and wherein positioning the viseme at the detected location of the mouth in the video feed comprises rotating the viseme by the detected angle.
For claim 19, Examiner believes this claim should be amended in the following manner:
The non-transitory computer-readable storage medium of claim 16, wherein the viseme-timestamp data comprises first and second sets of viseme-timestamps for two vocal tracks, and wherein the operations further comprise:
detecting first and second mouths in the video feed;
positioning a first viseme at a location of the first mouth in the video feed based on the first set of viseme-timestamps; and
positioning a second viseme at a location of the second mouth in the video feed based on the second set of viseme-timestamps.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 4, 5 and 17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
For dependent claim 4, parent claim 1 establishes “location” in line 6 and “detected location” in line 9. Claim 4 then recites the phrase “the location” in line 1 of the claim, and it is unclear which of the previously established “location” and “detected location” the phrase “the location” references. Examiner has suggested amendments in the claim objections discussed above to resolve the ambiguity.
For dependent claim 5, parent claim 1 establishes “location” in line 6 and “detected location” in line 9. Claim 5 then recites the phrase “the location” in line 1 of the claim, and it is unclear which of the previously established “location” and “detected location” the phrase “the location” references. Examiner has suggested amendments in the claim objections discussed above to resolve the ambiguity.
For dependent claim 17, parent claim 16 establishes “location” in line 5 and “detected location” in line 8. Claim 17 then recites the phrase “the location” in line 2 of the claim, and it is unclear which of the previously established “location” and “detected location” the phrase “the location” references. Examiner has suggested amendments in the claim objections discussed above to resolve the ambiguity.
Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3, 7, 10-13, 16 and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cohen-Or et al. (WO 2023/137557 A1, hereinafter “Cohen-Or”) (made of record in the IDS submitted 9/15/2025).
For claim 1, Cohen-Or discloses a system (disclosing a computer-implemented system (par. 38)) comprising:
at least one processor (disclosing a processor (par. 38));
at least one memory component storing instructions that, when executed by the at least one processor (disclosing a memory storing instructions for execution by the processor so that the processor performs the functions of the computer-implemented system (par. 38)), cause the at least one processor to perform operations comprising:
playing back an audio track (disclosing play back of an audio track (Figs. 1 and 2; par. 114 and 148));
detecting a location of a mouth depicted in a video feed (disclosing the detection of a mouth region as a location of a mouth depicted in a video feed (par. 134, 135, 141, 148, 179 and 231));
comparing a timestamp of playback of the audio track to viseme-timestamp data for the audio track to identify a viseme corresponding to the timestamp of the audio playback (disclosing the playback of the audio track is time-coded into timestamps of phonemes for comparison to a time-coded set of visemes with corresponding timestamps to identify a viseme corresponding to the timestamps of the audio playback (Figs. 1-2; par. 119-123 and 150)); and
positioning the viseme at the detected location of the mouth in the video feed (disclosing the viseme is positioned at the detected location of the mouth in the video feed to replace the mouth region with the viseme to perform lip dubbing (par. 77, 141, 151-152 and 154)).
For claim 2, depending on claim 1, Cohen-Or discloses wherein the operations further comprise:
detecting an updated location of the mouth depicted in the video feed (disclosing its operations are repeated across multiple timestamps to detect an updated mouth region as an updated location of the mouth depicted in the video feed (Figs. 1-2; par. 134, 135, 141, 148, 179 and 231)); and
positioning the viseme at the detected updated location of the mouth in the video feed (disclosing the viseme is positioned at the detected updated location of the mouth in the video feed where the viseme may be mapped to multiple phonemes across multiple timestamps to replace the mouth region with the viseme to perform lip dubbing (Figs. 1-2; par. 77, 141, 151-152 and 154)).
For claim 3, depending on claim 1, Cohen-Or discloses wherein the operations further comprise:
comparing an updated timestamp of the playback of the audio track to the viseme-timestamp data for the audio track to identify an updated viseme corresponding to the timestamp of the audio playback (disclosing its operations are repeated across multiple timestamps to compare updated timestamps of the playback of the audio track to the time-coded set of visemes with corresponding timestamps to identify an updated viseme corresponding to the updated timestamps of the audio playback (Figs. 1-2; par. 119-123 and 150)); and
positioning the updated viseme at the detected location of the mouth in the video feed (disclosing the updated viseme is positioned at the detected location of the mouth in the video feed to replace the mouth region with the updated viseme to perform lip dubbing (par. 77, 141, 151-152 and 154)).
For claim 7, depending on claim 1, Cohen-Or discloses wherein the viseme-timestamp data comprises first and second sets of viseme-timestamps for two vocal tracks (disclosing the timestamped viseme data may form multiple sets of visemes assigned to timestamps for multiple vocal tracks for individual characters (par. 119-123, 148-150 and 217-219)), and wherein the operations further comprise:
detecting first and second mouths in the video feed (disclosing the detection of mouths associated with the individual characters in the video feed (par. 119-123, 148-150 and 217-219));
positioning a first viseme at a location of the first mouth in the video feed based on the first set of viseme-timestamps (disclosing a first viseme is positioned at a location of a first mouth in the video feed based on a first set of visemes assigned to timestamps for a first vocal track for a first individual character (par. 77, 119-123, 141, 148-152, 154 and 217-219)); and
positioning a second viseme at a location of the second mouth in the video feed based on the second set of viseme-timestamps (disclosing a second viseme is positioned at a location of a second mouth in the video feed based on a second set of visemes assigned to timestamps for a second vocal track for a second individual character (par. 77, 119-123, 141, 148-152, 154 and 217-219)).
For claim 10, Cohen-Or discloses a method, executed by one or more processors (disclosing a method performed by a computer-implemented system with a processor (par. 38)), the method comprising steps corresponding to the functions performed by the system of claim 1 (see above as to claim 1).
For claim 11, depending on claim 10, this claim is a combination of the limitations of claim 10 and claim 2. It follows claim 11 is rejected for the same reasons as to claim 10 and claim 2.
For claim 12, depending on claim 10, this claim is a combination of the limitations of claim 10 and claim 3. It follows claim 12 is rejected for the same reasons as to claim 10 and claim 3.
For claim 13, depending on claim 10, this claim is a combination of the limitations of claim 10 and claim 7. It follows claim 13 is rejected for the same reasons as to claim 10 and claim 7.
For claim 16, Cohen-Or discloses a non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor (disclosing a memory storing instructions for execution by the processor so that the processor performs the functions of a computer-implemented system (par. 38)), cause the at least one processor to perform operations comprising functions performed by the system of claim 1 (see above as to claim 1).
For claim 19, depending on claim 16, this claim is a combination of the limitations of claim 16 and claim 7. It follows claim 19 is rejected for the same reasons as to claim 16 and claim 7.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Cohen-Or in view of Aoyama et al. (U.S. Patent Application Publication 2010/0332229 A1, hereinafter “Aoyama”).
For claim 4, depending on claim 1, Cohen-Or does not disclose detecting corners of a mouth.
However, these limitations are well-known in the art as disclosed in Aoyama.
Aoyama similarly discloses a system and method for detecting lip images of a mouth for comparison to a plurality of visemes (par. 24). Aoyama explains its system detects corners of the mouth to determine an associated viseme for generating a lip image (par. 101, 188 and 200). It follows Cohen-Or may be accordingly modified with the teachings of Aoyama to detect corners of its mouth in its video feed for positioning its viseme at the detected corners of its mouth in its video feed.
A person having ordinary skill in the art (PHOSITA) before the effective filing date of the claimed invention would find it obvious to modify Cohen-Or with the teachings of Aoyama. Aoyama is analogous art in dealing with a system and method for detecting lip images of a mouth for comparison to a plurality of visemes (par. 24). Aoyama discloses its detection of corners is advantageous in appropriately detecting lips of a mouth for association with an appropriate viseme (par. 101, 188 and 200). Consequently, a PHOSITA would incorporate the teachings of Aoyama into Cohen-Or for appropriately detecting lips of a mouth for association with an appropriate viseme. Therefore, claim 4 is rendered obvious to a PHOSITA before the effective filing date of the claimed invention.
Claims 5-6 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen-Or in view of Moulton et al. (U.S. Patent Application Publication 2002/0097380 A1, hereinafter “Moulton”).
For claim 5, depending on claim 1, Cohen-Or does not disclose detecting an angle of a mouth and rotating a viseme by the detected angle.
However, these limitations are well-known in the art as disclosed in Moulton.
Moulton similarly discloses a system and method for performing audio dubbing with the use of visemes (par. 22). Moulton explains its system determines an angle of a facial position of a muzzle or mouth so that the viseme is rotated by the angle to match the orientation of the mouth (par. 21, 23, 24, 39 and 40). It follows Cohen-Or may be accordingly modified with the teachings of Moulton to detect an angle of its mouth in its video feed for positioning its viseme by rotating its viseme by the detected angle.
A PHOSITA before the effective filing date of the claimed invention would find it obvious to modify Cohen-Or with the teachings of Moulton. Moulton is analogous art in dealing with a system and method for performing audio dubbing with the use of visemes (par. 22). Moulton discloses its detection of an angle is advantageous in rotating a viseme to match an orientation of a corresponding mouth for appropriate dubbing (par. 21, 23, 24, 39 and 40). Consequently, a PHOSITA would incorporate the teachings of Moulton into Cohen-Or for rotating a viseme to match an orientation of a corresponding mouth for appropriate dubbing. Therefore, claim 5 is rendered obvious to a PHOSITA before the effective filing date of the claimed invention.
For claim 6, depending on claim 1, Cohen-Or as modified by Moulton discloses wherein the operations further comprise: scaling the viseme based on a characteristic dimension of a head detected in the video feed (Moulton similarly discloses a system and method for performing audio dubbing with the use of visemes (par. 22); Moulton explains its system scales a viseme based on dimensions of a head of an actor in a video (par. 21-24 and 26); and it follows Cohen-Or may be accordingly modified with the teachings of Moulton to scale its viseme based on dimensions of a head detected in its video feed to appropriately scale its viseme to a correct size).
For claim 17, depending on claim 16, this claim is a combination of the limitations of claim 16 and claim 5. It follows claim 17 is rejected for the same reasons as to claim 16 and claim 5.
For claim 18, depending on claim 16, this claim is a combination of the limitations of claim 16 and claim 6. It follows claim 18 is rejected for the same reasons as to claim 16 and claim 6.
Claims 8 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen-Or in view of Cabanilla et al. (U.S. Patent Application Publication 2016/0012853 A1, hereinafter “Cabanilla”).
For claim 8, depending on claim 7, Cohen-Or discloses wherein the first mouth is larger in the video feed than the second mouth (disclosing a near head with the first mouth has a larger number of pixels forming a larger image region than the second mouth (par. 148)).
Cohen-Or does not disclose a lead vocal.
However, these limitations are well-known in the art as disclosed in Cabanilla.
Cabanilla similarly discloses a system and method of adding lip synching effects to a video clip (par. 27). Cabanilla explains a clip may be associated with multiple audio tracks where one of the tracks may be a lead vocal track for lip synching (par. 32 and 35). It follows Cohen-Or may be accordingly modified with the teachings of Cabanilla to implement a lead vocal corresponding to its first set of viseme-timestamps.
A PHOSITA before the effective filing date of the claimed invention would find it obvious to modify Cohen-Or with the teachings of Cabanilla. Cabanilla is analogous art in dealing with a system and method of adding lip synching effects to a video clip (par. 27). Cabanilla discloses its use of a lead vocal track is advantageous in facilitating lip synching effects for a video clip (par. 32 and 35). Consequently, a PHOSITA would incorporate the teachings of Cabanilla into Cohen-Or for facilitating lip synching effects for a video clip. Therefore, claim 8 is rendered obvious to a PHOSITA before the effective filing date of the claimed invention.
For claim 14, depending on claim 13, this claim is a combination of the limitations of claim 13 and claim 8. It follows claim 14 is rejected for the same reasons as to claim 13 and claim 8.
Claims 9, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen-Or in view of Adams et al. (U.S. Patent 10,770,092 B1, hereinafter “Adams”) (made of record in the IDS submitted 9/15/2025).
For claim 9, depending on claim 7, Cohen-Or does not disclose that assignment of visemes is done randomly.
However, these limitations are well-known in the art as disclosed in Adams.
Adams similarly discloses a system and method for viseme data generation to perform lip syncing (col. 1/lines 57-67 and col. 2/lines 1-5). Adams explains visemes may be associated or assigned randomly to audio data (col. 9/lines 41-57). It follows Cohen-Or may be accordingly modified with the teachings of Adams to assign its first set of viseme-timestamps to a mouth randomly.
A PHOSITA before the effective filing date of the claimed invention would find it obvious to modify Cohen-Or with the teachings of Adams. Adams is analogous art in dealing with a system and method for viseme data generation to perform lip syncing (col. 1/lines 57-67 and col. 2/lines 1-5). Adams discloses its use of random association is advantageous in assigning visemes randomly to audio data to facilitate dynamic acoustic energy levels that change over time (col. 9/lines 41-57). Consequently, a PHOSITA would incorporate the teachings of Adams into Cohen-Or for assigning visemes randomly to audio data to facilitate dynamic acoustic energy levels that change over time. Therefore, claim 9 is rendered obvious to a PHOSITA before the effective filing date of the claimed invention.
For claim 15, depending on claim 13, this claim is a combination of the limitations of claim 13 and claim 9. It follows claim 15 is rejected for the same reasons as to claim 13 and claim 9.
For claim 20, depending on claim 19, this claim is a combination of the limitations of claim 19 and claim 9. It follows claim 20 is rejected for the same reasons as to claim 19 and claim 9.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES TSENG whose telephone number is (571)270-3857. The examiner can normally be reached 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHARLES TSENG/ Primary Examiner, Art Unit 2613