DETAILED ACTION
This action is responsive to the preliminary amendment filed 11 September 2024.
Claims 2-22 are currently pending and considered below.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 2-22 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 12,073,844. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-20 of U.S. Patent No. 12,073,844 anticipate the currently pending claims.
The parent claims include all of the limitations of the instant application claims, respectively. The parent claims also include additional limitations. Hence, the instant application claims are generic to the species of invention covered by the respective parent claims. As such, the instant application claims are anticipated by the parent claims and are therefore not patentably distinct therefrom. See Eli Lilly and Co. v. Barr Laboratories Inc., 58 USPQ2d 1869 ("a later genus claim limitation is anticipated by, and therefore not patentably distinct from, an earlier species claim"); In re Goodman, 29 USPQ2d 2010 ("Thus, the generic invention is 'anticipated' by the species of the patented invention"), where the instant "application claims are generic to species of invention covered by the patent claim, and since without terminal disclaimer, extant species claims preclude issuance of generic application claims."
Further, it is well settled that the omission of an element/step and its function is an obvious expedient if the remaining elements perform the same function as before. In re Karlson, 136 USPQ 184 (CCPA 1963); see also Ex parte Rainu, 168 USPQ 375 (Bd. App. 1969). Omission of a reference element or step whose function is not needed would have been obvious to one of ordinary skill in the art.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 2-3, 7-13 and 17-22 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Tiefenau (US PGPub 2021/0289300).
Claim 2:
Tiefenau discloses a method comprising:
receiving, by a user device, a first indication (a user input selecting the first user interface element) of one or more first speakers visible in a current view recorded by a camera of the user device (“obtaining image data with a camera of the accessory device. The image data may comprise moving image data also denoted video image data … The method comprises identifying, e.g. with accessory device, one or more audio sources including a first audio source based on the image data.”, [0030]-[0031], see also “identifying one or more audio sources comprises determining a first position of the first audio source based on the image data, displaying, e.g. on touch-sensitive display device of the accessory device, a first user interface element indicative of the first audio source, and detecting a user input selecting the first user interface element. The method may comprise, in accordance with detecting a user input selecting the first user interface element, determining first image data of the image data, the first image data associated with the first audio source.”, [0043]);
in response to receiving the first indication, generating a respective isolated speech signal for each of the one or more first speakers that isolates speech of the one or more first speakers in the current view (“determining, e.g. in the accessory device, a first model comprising first model coefficients, wherein the first model is based on image data of the first audio source and the audio input signal”, [0032], see also “The first model or first model coefficients may be applied in a (speech) separation process, e.g. in the hearing device processing the first input signal or in the accessory device, in order to separate out e.g. speech of the first audio source from the first input signal”, [0033]) and sending the isolated speech signals for each of the one or more first speakers to a listening device operatively coupled to the user device (“processing, in the accessory device, the first audio input signal based on the first model for provision of a first output signal. Transmitting a hearing device signal optionally comprises transmitting the first output signal to the hearing device”, [0049]);
while generating the respective isolated speech signal for each of the one or more first speakers, receiving, by the user device, a second indication (a user input selecting the second user interface element) of one or more second speakers visible in the current view recorded by the camera of the user device (“identifying one or more audio sources comprises determining a second position of the second audio source based on the image data, displaying, e.g. on touch-sensitive display device of the accessory device, a second user interface element indicative of the second audio source, and detecting a user input selecting the second user interface element. The method may comprise, in accordance with detecting a user input selecting the second user interface element, determining second image data of the image data, the second image data associated with the second audio source”, [0056]); and
in response to the second indication, generating and sending a respective isolated speech signal for each of the one or more second speakers to the listening device (“processing, in the accessory device, the first audio input signal based on the second model for provision of a second output signal. Transmitting a hearing device signal optionally comprises transmitting the second output signal to the hearing device”, [0062]).
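For purposes of illustrating the examiner's mapping of claim 2 onto Tiefenau, the following minimal Python sketch models the claimed flow. Every name in it (IsolatedStream, SpeechIsolationSession, the listening_device interface, and the pass-through _isolate stand-in) is hypothetical and is not drawn from Tiefenau's disclosure or the instant claims.

    from dataclasses import dataclass

    @dataclass
    class IsolatedStream:
        speaker_id: str
        samples: list  # placeholder for an audio buffer

    class SpeechIsolationSession:
        """Hypothetical model of the claimed flow: an indication of one
        or more speakers visible in the camera view starts a per-speaker
        isolated speech signal sent to a coupled listening device."""

        def __init__(self, listening_device):
            self.listening_device = listening_device
            self.active = {}  # speaker_id -> IsolatedStream

        def on_indication(self, speaker_ids, mixed_audio):
            # First (and later, second) indication: isolate each
            # indicated speaker and forward the result, without stopping
            # streams already in progress for earlier-indicated speakers.
            for sid in speaker_ids:
                stream = IsolatedStream(sid, self._isolate(sid, mixed_audio))
                self.active[sid] = stream
                self.listening_device.send(stream)

        def _isolate(self, speaker_id, mixed_audio):
            # Stand-in for a visually guided separation model (compare
            # Tiefenau's image-data-based "first model"); pass-through here.
            return list(mixed_audio)

Under this sketch, a first indication for speakers A and B followed by a second indication for speaker C would be session.on_indication(["A", "B"], mix) and then session.on_indication(["C"], mix), with all three isolated streams active concurrently, mirroring the "while generating ... receiving a second indication" limitation.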
Claim 3:
Tiefenau discloses the method of claim 2, wherein the listening device is configured to receive audio input from a plurality of audio channels, and wherein sending the isolated speech signals for each of the one or more first speakers to the listening device comprises: sending isolated speech signals to different audio channels of the plurality of audio channels ([0075], see also [0099]).
Claim 7:
Tiefenau discloses the method of claim 2, wherein the generating and the sending of the isolated speech signals of the one or more first speakers comprises generating and sending an isolated speech signal of a first speaker of the one or more first speakers only while the first speaker is visible in the current view recorded by the camera (“determining, e.g. in the accessory device, a first model comprising first model coefficients, wherein the first model is based on image data of the first audio source and the audio input signal”, [0032], note that the first model, used in generating the isolated speech signal, requires visible image data captured by the camera).
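The examiner's reading of claim 7, generation conditioned on the speaker remaining visible, can likewise be expressed as a simple gate; speaker_visible stands in for whatever face detection Tiefenau's image data supports, and the function continues the hypothetical session sketch above.

    def emit_while_visible(session, speaker_id, mixed_audio, speaker_visible):
        # Generate and send the isolated signal only while the first
        # speaker is visible in the current camera view; otherwise stop.
        if speaker_visible:
            session.on_indication([speaker_id], mixed_audio)
        else:
            session.active.pop(speaker_id, None)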
Claim 8:
Tiefenau discloses the method of claim 2, wherein the method further comprises receiving an indication of a preferred speaker of the one or more first speakers, and whenever generating and sending isolated speech signals for more than one first speaker, generating and sending an isolated speech signal for the preferred speaker at the exclusion of the other first speakers ([0043], see also [0033]).
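Claim 8's preferred-speaker behavior reduces to a selection rule of the following form (a hypothetical sketch continuing the names above):

    def select_streams(streams, preferred_id=None):
        # When more than one first speaker is active and a preferred
        # speaker has been indicated, send only the preferred speaker's
        # isolated signal, to the exclusion of the other first speakers.
        if preferred_id is not None and len(streams) > 1:
            return [s for s in streams if s.speaker_id == preferred_id]
        return streams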
Claim 9:
Tiefenau discloses the method of claim 8, wherein receiving the indication of the preferred speaker comprises receiving, at the user device, a user input selecting the preferred speaker ([0043]).
Claim 10:
Tiefenau discloses the method of claim 2, wherein receiving the first indication comprises receiving, at the user device, a first user input indicating the one or more first speakers ([0043]); and wherein receiving the second indication comprises receiving, at the user device, a second user input indicating the one or more second speakers ([0056]).
Claim 11:
Tiefenau discloses the method of claim 10, wherein the first user input and/or the second user input is a user selection received via a display operatively coupled to the user device ([0043], see also [0056]).
Claims 12-13 and 17-21:
Tiefenau discloses a system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations ([0136]), the operations comprising the steps of process claims 2-3 and 7-11 as shown above.
Claim 22:
Tiefenau discloses one or more non-transitory computer-readable storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations ([0136]), the operations comprising the steps of process claim 2 as shown above.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 5-6 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Tiefenau (US PGPub 2021/0289300) in view of Chaudhuri et al. (US PGPub 2018/0174600).
Claim 5:
Tiefenau discloses the method of claim 2, but does not explicitly disclose for each of one or more of the first speakers, processing a respective isolated speech signal for the speaker to generate a transcription of the speech of the speaker; and displaying the transcription while sending the isolated speech signal of the first speaker.
In a method that similarly generates a respective isolated speech signal for one or more first speakers, Chaudhuri discloses, for each of one or more of the first speakers, processing a respective isolated speech signal for the speaker to generate a transcription of the speech of the speaker; and displaying the transcription while sending the isolated speech signal of the first speaker (“the caption subsystem 160 may include bounding boxes around both faces, and indicate in the bounding box having the speaker a label (e.g., “speaker”) indicating that the person is speaking. The caption subsystem 160 may also provide the captions for the speaker adjacent to that speaker”, [0129]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the references and yield the predictable result of processing a respective isolated speech signal for Tiefenau's speaker to generate a transcription of the speech of the speaker, and displaying the transcription while sending the isolated speech signal of the first speaker, in order to provide additional speech-related information in textual form, which is especially useful in loud and noisy environments.
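The combined teaching relied on for claim 5 amounts to transcribing each isolated signal and displaying the caption while the signal is sent. A hypothetical sketch follows; the transcribe, display, and device interfaces are illustrative and are not Chaudhuri's API.

    def send_with_caption(stream, listening_device, display, transcribe):
        # Send the isolated speech signal and, alongside sending,
        # display a transcription of that speaker's speech (cf.
        # Chaudhuri's per-speaker captions adjacent to the speaking face).
        listening_device.send(stream)
        display.show(stream.speaker_id, transcribe(stream.samples))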
Claim 6:
Tiefenau discloses the method of claim 2, but does not explicitly disclose wherein the one or more first speakers indicated are speakers at or near the center of the current view recorded by the camera.
In a method that similarly generates a respective isolated speech signal for one or more first speakers, Chaudhuri discloses wherein the one or more first speakers indicated are speakers at or near the center of the current view recorded by a camera (“the caption subsystem 160 may include bounding boxes around both faces, and indicate in the bounding box having the speaker a label (e.g., “speaker”) indicating that the person is speaking. The caption subsystem 160 may also provide the captions for the speaker adjacent to that speaker”, [0129], see also representative Fig. 8, item 820 where the speakers are at or near the center of the current camera view).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the references and yield the predictable result wherein the one or more first speakers indicated are speakers at or near the center of the current view recorded by the camera because, as shown by Chaudhuri in Fig. 8, item 820, there are times when speakers are captured at or near the center of the current view of an image-capturing camera.
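For claim 6, the "at or near the center" condition can be stated concretely as a tolerance test on a detected face's bounding box; the function and the 0.2 tolerance below are illustrative only and appear in neither reference.

    def near_center(box, frame_w, frame_h, tol=0.2):
        # True when the bounding-box midpoint falls within a tolerance
        # band around the frame center (cf. Chaudhuri Fig. 8, item 820).
        x0, y0, x1, y1 = box
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        return abs(cx / frame_w - 0.5) <= tol and abs(cy / frame_h - 0.5) <= tol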
Claims 15-16:
Tiefenau in view of Chaudhuri discloses one or more non-transitory computer-readable storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations (Tiefenau, [0136]), the operations comprising the steps of process claims 5-6 as shown above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Peleg et al. (US PGPub 2019/0005976) discloses a method and system for enhancing a speech signal. The method may include the following steps: obtaining an original video, wherein the original video includes a sequence of original input images showing a face of at least one human speaker, and an original soundtrack synchronized with said sequence of images; and processing, using a computer processor, the original video, to yield an enhanced speech signal of said at least one human speaker, by detecting sounds that are acoustically unrelated to the speech of the at least one human speaker, based on visual data derived from the sequence of original input images.
Beaumont et al. (US PGPub 2015/0088515) provides a method, including: receiving image data from a visual sensor of an information handling device; receiving audio data from one or more microphones of the information handling device; identifying, using one or more processors, human speech in the audio data; identifying, using the one or more processors, a pattern of visual features in the image data associated with speaking; matching, using the one or more processors, the human speech in the audio data with the pattern of visual features in the image data associated with speaking; selecting, using the one or more processors, a primary speaker from among matched human speech; assigning control to the primary speaker; and performing one or more actions based on audio input of the primary speaker.
Krupka et al. (US PGPub 2019/0341054) discloses multi-modal speech localization using image data captured by one or more cameras and audio data captured by a microphone array. Audio data captured by each microphone of the array is transformed to obtain a frequency domain representation that is discretized in a plurality of frequency intervals. Image data captured by each camera is used to determine a positioning of each human face. Input data is provided to a previously trained audio source localization classifier, including the frequency domain representation of the audio data captured by each microphone and the positioning of each human face captured by each camera, in which the positioning of each human face represents a candidate audio source. Based on the input data, the classifier indicates an identified audio source, i.e., the human face from which the audio data is estimated to have originated.
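To make Krupka's input construction concrete, the following hypothetical sketch assembles the classifier input from per-microphone frequency-domain representations plus candidate face positions; the shapes, FFT size, and names are illustrative, not Krupka's.

    import numpy as np

    def build_classifier_input(mic_signals, face_positions, n_fft=512):
        # One discretized frequency-domain representation per microphone,
        # plus the position of each detected face as a candidate source.
        spectra = np.stack([np.abs(np.fft.rfft(sig, n=n_fft))
                            for sig in mic_signals])
        return {"spectra": spectra, "candidates": list(face_positions)}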
Ephrat et al. ("Looking to listen at the cocktail party: A speaker-independent audio-visual model for speech separation." arXiv preprint arXiv:1804.03619 (2018)) presents a joint audio-visual model for isolating a single speech signal from a mixture of sounds such as other speakers and background noise. Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video. A deep network-based model that incorporates both visual and auditory signals to solve this task is presented. The visual features are used to "focus" the audio on desired speakers in a scene and to improve the speech separation quality.
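The core of Ephrat's approach, visual conditioning of a separation mask applied to the mixture, can be sketched as follows; mask_fn stands in for the trained audio-visual network and is hypothetical.

    import numpy as np

    def visually_guided_separation(mixture_spec, visual_features, mask_fn):
        # The network predicts a per-time-frequency mask in [0, 1] from
        # the target speaker's visual features; applying it to the
        # mixture spectrogram "focuses" the audio on that speaker.
        mask = mask_fn(mixture_spec, visual_features)
        return mixture_spec * mask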
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMUEL G NEWAY whose telephone number is (571)270-1058. The examiner can normally be reached Monday-Friday 9:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SAMUEL G NEWAY/Primary Examiner, Art Unit 2657