DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-2, 4-7, 11-16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wold et al. (U.S. Pub No. 20230244710, hereinafter Wold) in view of Thagadur Shivappa et al. (U.S. Patent No. 9905233, hereinafter Thagadur).
Regarding claim 1, Wold teaches a method to control audio identification (See Wold Fig 3A, method 300), the method comprising: classifying, by the portable computing device, the received audio as containing media content or as containing no media content (See Wold Fig 3A & ¶ [0023] lines 5-8, step 308 determines which class the media content belongs to which can include a first class containing music and a second class that does not contain music), wherein classifying the received audio as containing media content or as containing no media content comprises determining whether the audio defines content emitted from a media player (See Wold ¶ [0076-0077], at blocks 304 and 306 a set of features for received media content are determined and then analyzed for classification such as comprising or not comprising music. Media player is being read as anything capable of producing media content that would fall into the comprising music category); and based on the classifying, controlling by the portable computing device whether to engage in an audio-identification process for determining an identity of the media content (See Wold Fig 3A, step 310 do not perform further analysis and step 312 perform further analysis), wherein the controlling includes (i) if the portable computing device classifies the received audio as containing media content rather than as containing no media content, then engaging in the audio-identification process for determining the identity of the media content (See Wold Fig 3A, step 312 perform further analysis and step 322 identification process), and (ii) if the portable computing device classifies the received audio as containing no media content rather than as containing media content, then forgoing from engaging in the audio-identification process for determining the identity of the received audio (See Wold Fig 3A, step 310 do not perform further analysis).
Wold does not explicitly teach receiving audio from a surrounding environment via a microphone of the portable computing device and determining the identity of the media content by searching in the received audio for watermarking that encodes an identifier of the media content.
Thagadur teaches receiving audio from a surrounding environment via a microphone of the portable computing device (See Thagadur Fig 2 & column 3 lines 33-39, sample ambient content received via microphone) and determining the identity of the media content by searching in the received audio for watermarking that encodes an identifier of the media content (See Thagadur column 9 lines 57-61, signal identification and watermark detection can be performed in parallel).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated a portable electronic device utilizing a microphone and audio watermarking as taught by Thagadur with the audio identification method taught by Wold. Portable electronic devices with microphones are well known in the art and include many devices such as cellphones, tablets, laptops, smart watches, etc. These devices offer many benefits including portability, ease of use, and connectivity. Audio watermarking is also well known in the art and widely used for content authentication and protection. This allows for easy identification and tracking of content.
Regarding claim 2, Wold in view of Thagadur teaches the method of claim 1, wherein engaging in the audio-identification process for determining the identity of the media content comprises generating digital fingerprint data representing the received audio (See Wold Fig 3A, step 312 digital fingerprint of media content is further analyzed).
Wold does not explicitly teach the use of audio content recognition (ACR).
Thagadur teaches the use of audio content recognition (ACR) (See Thagadur column 19 lines 30-36, audio content recognition).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the audio content recognition taught by Thagadur with the audio identification method taught by Wold. Audio content recognition (ACR) is well known in the art and is commonly used in many everyday devices such as smart TVs, computers, and streaming devices. ACR allows for streamlined automatic processing of audio content allowing users to save time.
Regarding claim 4, Wold in view of Thagadur teaches the method of claim 1, wherein the audio-identification process facilitates measuring media exposure (See Wold ¶ [0039] lines 13-16, information relating to the media content can be stored).
Regarding claim 5, Wold in view of Thagadur teaches the method of claim 1, wherein classifying the received audio as containing media content or containing no media content comprises applying a trained machine-learning model that classifies the received audio as containing either media content or not containing media content (See Wold ¶ [0037] lines 1-4, classification performed using machine learning profiles).
Regarding claim 6, Wold in view of Thagadur teaches the method of claim 5, wherein the method further comprises training a machine-learning model to establish the trained machine-learning model based on a dataset of a plurality of audio segments and corresponding audio segment labels classifying each corresponding audio segment as containing media content or containing no media content (See Wold ¶ [0037] lines 4-9, media content used to train a machine learning profile/model), wherein training the machine-learning model comprises: (i) determining at least one statistical measure of each of at least one audio property of each of the audio segments (See Wold ¶ [0069] lines 7-13, feature vectors of the training set), (ii) feeding the at least one statistical measure of each of the audio segments to obtain a prediction of whether each of the plurality of audio segments contains media content or contains no media content (See Wold ¶ [0069] lines 1-7, training data used to generate a media classification model 235), and (iii) updating the machine-learning model based on a comparison of the prediction of each of the plurality of audio segments with the corresponding audio segment labels (See Wold ¶ [0069] lines 1-7, training data and classification used to predict media content classification).
Regarding claim 7, Wold in view of Thagadur teaches the method of claim 5, wherein the machine-learning model is trained based on at least one statistical measure of each of at least one audio property (See Wold ¶ [0069] lines 7-13, feature vectors of the training set), and wherein applying the trained machine-learning model comprises (i) determining the at least one statistical measure of each of the at least one audio property of the received audio (See Wold ¶ [0069], determine optimal algorithms and features for classification) and (ii) feeding into the trained machine-learning model the determined at least one statistical measure of each of the at least one audio property of the received audio (See Wold ¶ [0069] lines 1-7, training data used to generate a media classification model 235).
Regarding claim 11, Wold teaches a portable computing device comprising: a processor (See Wold Fig 9, processor 902); and a non-transitory computer-readable storage medium (See Wold Fig 9, main memory 904), having stored thereon program instructions (See Wold Fig 9, instructions 922) that, upon execution by the processor, cause performance of a set of operations comprising: classifying the received audio as containing media content or as containing no media content (See Wold Fig 3A & ¶ [0023] lines 5-8, step 308 determines which class the media content belongs to which can include a first class containing music and a second class that does not contain music), wherein classifying the received audio as containing media content or as containing no media content comprises determining whether the audio defines content emitted from a media player (See Wold ¶ [0076-0077], at blocks 304 and 306 a set of features for received media content are determined and then analyzed for classification such as comprising or not comprising music. Media player is being read as anything capable of producing media content that would fall into the comprising music category); and based on the classifying, controlling whether to engage in an audio-identification process for determining an identity of the media content (See Wold Fig 3A, step 310 do not perform further analysis and step 312 perform further analysis), wherein the controlling includes (i) if the portable computing device classifies the received audio as containing media content rather than as containing no media content, then engaging in the audio-identification process for determining the identity of the media content (See Wold Fig 3A, step 312 perform further analysis and step 322 identification process), and (ii) if the portable computing device classifies the received audio as containing no media content rather than as containing media content, then forgoing from engaging in the audio-identification process for determining the identity of the received audio (See Wold Fig 3A, step 310 do not perform further analysis).
Wold does not explicitly teach receiving audio from a surrounding environment via a microphone of the portable computing device and determining the identity of the media content by searching in the received audio for watermarking that encodes an identifier of the media content.
Thagadur teaches receiving audio from a surrounding environment via a microphone of the portable computing device (See Thagadur Fig 2 & column 3 lines 33-39, sample ambient content received via microphone) and determining the identity of the media content by searching in the received audio for watermarking that encodes an identifier of the media content (See Thagadur column 9 lines 57-61, signal identification and watermark detection can be performed in parallel).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated a portable electronic device utilizing a microphone and audio watermarking as taught by Thagadur with the audio identification method taught by Wold. Portable electronic devices with microphones are well known in the art and include many devices such as cellphones, tablets, laptops, smart watches, etc. These devices offer many benefits including portability, ease of use, and connectivity. Audio watermarking is also well known in the art and widely used for content authentication and protection. This allows for easy identification and tracking of content.
Regarding claim 12, Wold in view of Thagadur teaches the portable computing device of claim 11, wherein engaging in the audio-identification process for determining the identity of the media content comprises generating digital fingerprint data representing the received audio (See Wold Fig 3A, step 312 digital fingerprint of media content is further analyzed).
Wold does not explicitly teach the use of audio content recognition (ACR).
Thagadur teaches the use of audio content recognition (ACR) (See Thagadur column 19 lines 30-36, audio content recognition).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the audio content recognition taught by Thagadur with the audio identification method taught by Wold. Audio content recognition (ACR) is well known in the art and is commonly used in many everyday devices such as smart TVs, computers, and streaming devices. ACR allows for streamlined automatic processing of audio content allowing users to save time.
Regarding claim 13, Wold in view of Thagadur teaches the portable computing device of claim 11, wherein engaging in the audio-identification process for determining the identity of the media content comprises searching in the received audio for watermarking that encodes an identifier of the media content (See Wold ¶ [0043] lines 1-2, classification controller 200A can determine when to invoke licensing logic 162 which would require a determination of ownership for the media content such as a watermark).
Regarding claim 14, Wold in view of Thagadur teaches the portable computing device of claim 11, wherein the audio-identification process facilitates measuring media exposure (See Wold ¶ [0039] lines 13-16, information relating to the media content can be stored).
Regarding claim 15, Wold in view of Thagadur teaches the portable computing device of claim 11, wherein classifying the received audio as containing media content or containing no media content comprises applying a trained machine-learning model that classifies the received audio as containing either media content or not containing media content (See Wold ¶ [0037] lines 1-4, classification performed using machine learning profiles).
Regarding claim 16, Wold in view of Thagadur teaches the portable computing device of claim 15, wherein the machine-learning model is trained based on at least one statistical measure of each of at least one audio property (See Wold ¶ [0037] lines 4-9, media content used to train a machine learning profile/model), and wherein applying the trained machine-learning model comprises (i) determining the at least one statistical measure of each of the at least one audio property of the received audio (See Wold ¶ [0069] lines 7-13, feature vectors of the training set) and (ii) feeding into the trained machine-learning model the determined at least one statistical measure of each of the at least one audio property of the received audio (See Wold ¶ [0069] lines 1-7, training data used to generate a media classification model 235).
Regarding claim 20, Wold teaches a non-transitory computer-readable storage medium (See Wold Fig 9, main memory 904), having stored thereon program instructions (See Wold Fig 9, instructions 922) that, upon execution by a processor of a portable computing device, cause performance of a set of operations comprising: classifying the received audio as containing media content or as containing no media content (See Wold Fig 3A & ¶ [0023] lines 5-8, step 308 determines which class the media content belongs to which can include a first class containing music and a second class that does not contain music), wherein classifying the received audio as containing media content or as containing no media content comprises determining whether the audio defines content emitted from a media player (See Wold ¶ [0076-0077], at blocks 304 and 306 a set of features for received media content are determined and then analyzed for classification such as comprising or not comprising music. Media player is being read as anything capable of producing media content that would fall into the comprising music category); and based on the classifying, controlling whether to engage in an audio-identification process for determining an identity of the media content (See Wold Fig 3A, step 310 do not perform further analysis and step 312 perform further analysis), wherein the controlling includes (i) if the portable computing device classifies the received audio as containing media content rather than as containing no media content, then engaging in the audio-identification process for determining the identity of the media content (See Wold Fig 3A, step 312 perform further analysis and step 322 identification process), and (ii) if the portable computing device classifies the received audio as containing no media content rather than as containing media content, then forgoing from engaging in the audio-identification process for determining the identity of the received audio (See Wold Fig 3A, step 310 do not perform further analysis).
Wold does not explicitly teach receiving audio from a surrounding environment via a microphone of the portable computing device and determining the identity of the media content by searching in the received audio for watermarking that encodes an identifier of the media content.
Thagadur teaches receiving audio from a surrounding environment via a microphone of the portable computing device (See Thagadur Fig 2 & column 3 lines 33-39, sample ambient content received via microphone) and determining the identity of the media content by searching in the received audio for watermarking that encodes an identifier of the media content (See Thagadur column 9 lines 57-61, signal identification and watermark detection can be performed in parallel).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated a portable electronic device utilizing a microphone and audio watermarking as taught by Thagadur with the audio identification method taught by Wold. Portable electronic devices with microphones are well known in the art and include many devices such as cellphones, tablets, laptops, smart watches, etc. These devices offer many benefits including portability, ease of use, and connectivity. Audio watermarking is also well known in the art and widely used for content authentication and protection. This allows for easy identification and tracking of content.
Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Wold et al. (U.S. Pub No. 20230244710, hereinafter Wold) in view of Thagadur Shivappa et al. (U.S. Patent No. 9905233, hereinafter Thagadur) as applied to claims above, and further in view of Stojancic et al. (U.S. Pub No. 20190373311, hereinafter Stojancic) and McKenna et al. (U.S. Pub No. 20150154973, hereinafter McKenna).
Regarding claim 8, Wold in view of Thagadur teaches the method of claim 7.
Wold in view of Thagadur does not explicitly teach audio properties consisting of a spectrogram, signal-to-noise ratio, and sound pressure level.
Stojancic teaches audio properties from a spectrogram (See Stojancic ¶ [0055], spectrogram generated using windowed samples).
Wold in view of Thagadur and Stojancic does not explicitly teach audio properties consisting of a signal-to-noise ratio and sound pressure level.
McKenna teaches audio properties consisting of a signal-to-noise ratio (See McKenna ¶ [0091] lines 2-3, frequency values represented as a signal-to-noise ratio) and sound pressure level (See McKenna ¶ [0086] lines 4-9, symbol values represented as sound pressure level).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the audio properties taught by Stojancic and McKenna with the audio identification method taught by Wold in view of Thagadur. Spectrograms, signal-to-noise ratio, and sound pressure level audio property measurements are well known in the art and are common audio measurements. These measurements are used to quantify audio data and provide distinct characteristics for comparison when trying to identify audio information.
Regarding claim 17, Wold in view of Thagadur teaches the portable computing device of claim 15.
Wold in view of Thagadur does not explicitly teach audio properties consisting of a spectrogram, signal-to-noise ratio, and sound pressure level.
Stojancic teaches audio properties from a spectrogram (See Stojancic ¶ [0055], spectrogram generated using windowed samples).
Wold in view of Thagadur and Stojancic does not explicitly teach audio properties consisting of a signal-to-noise ratio and sound pressure level.
McKenna teaches audio properties consisting of a signal-to-noise ratio (See McKenna ¶ [0091] lines 2-3, frequency values represented as a signal-to-noise ratio) and sound pressure level (See McKenna ¶ [0086] lines 4-9, symbol values represented as sound pressure level).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the audio properties taught by Stojancic and McKenna with the audio identification method taught by Wold in view of Thagadur. Spectrograms, signal-to-noise ratio, and sound pressure level audio property measurements are well known in the art and are common audio measurements. These measurements are used to quantify audio data and provide distinct characteristics for comparison when trying to identify audio information.
Claims 9-10 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Wold et al. (U.S. Pub No. 20230244710, hereinafter Wold) in view of Thagadur Shivappa et al. (U.S. Patent No. 9905233, hereinafter Thagadur), Stojancic et al. (U.S. Pub No. 20190373311, hereinafter Stojancic) and McKenna et al. (U.S. Pub No. 20150154973, hereinafter McKenna) as applied to claims above, and further in view of Yu (U.S. Pub No. 20180343501, hereinafter Yu).
Regarding claim 9, Wold in view of Thagadur, Stojancic and McKenna teaches the method of claim 8, wherein the at least one statistical measure comprises a statistical measure selected from the group consisting of mean (See Wold ¶ [0070], k-means clustering) and standard deviation (See Wold ¶ [0147], standard deviation of each feature).
Wold in view of Thagadur, Stojancic and McKenna does not explicitly teach statistical measures including skewness and kurtosis.
Yu teaches statistical measures including skewness (See Yu ¶ [0049], acoustic features including skew) and kurtosis (See Yu ¶ [0049], acoustic features including kurtosis).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the statistical measures taught by Yu with the audio identification method taught by Wold in view of Thagadur, Stojancic and McKenna. Skew and kurtosis are well known in the art and common audio measurements used in analysis. These measurements provide information used in characterizing acoustic features enabling the identification of audio information.
Regarding claim 10, Wold in view of Thagadur, Stojancic and McKenna teaches the method of claim 7, wherein the at least one audio property comprises a spectrogram (See Stojancic ¶ [0055], spectrogram generated using windowed samples), a signal-to-noise ratio (See McKenna ¶ [0091] lines 2-3, frequency values represented as a signal-to-noise ratio), and a sound pressure level measurement (See McKenna ¶ [0086] lines 4-9, symbol values represented as sound pressure level), and the at least one statistical measure comprises mean (See Wold ¶ [0070], k-means clustering) and standard deviation (See Wold ¶ [0147], standard deviation of each feature).
Wold in view of Thagadur, Stojancic and McKenna does not explicitly teach statistical measures including skewness and kurtosis.
Yu teaches statistical measures including skewness (See Yu ¶ [0049], acoustic features including skew) and kurtosis (See Yu ¶ [0049], acoustic features including kurtosis).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the statistical measures taught by Yu with the audio identification method taught by Wold in view of Thagadur, Stojancic and McKenna. Skew and kurtosis are well known in the art and common audio measurements used in analysis. These measurements provide information used in characterizing acoustic features enabling the identification of audio information.
Regarding claim 18, Wold in view of Thagadur, Stojancic and McKenna teaches the portable computing device of claim 17, wherein the at least one statistical measure comprises a statistical measure selected from the group consisting of mean (See Wold ¶ [0070], k-means clustering) and standard deviation (See Wold ¶ [0147], standard deviation of each feature).
Wold in view of Thagadur, Stojancic and McKenna does not explicitly teach statistical measures including skewness and kurtosis.
Yu teaches statistical measures including skewness (See Yu ¶ [0049], acoustic features including skew) and kurtosis (See Yu ¶ [0049], acoustic features including kurtosis).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the statistical measures taught by Yu with the audio identification method taught by Wold in view of Thagadur, Stojancic and McKenna. Skew and kurtosis are well known in the art and common audio measurements used in analysis. These measurements provide information used in characterizing acoustic features enabling the identification of audio information.
Regarding claim 19, Wold in view of Thagadur, Stojancic and McKenna teaches the portable computing device of claim 16, wherein the at least one audio property comprises a spectrogram (See Stojancic ¶ [0055], spectrogram generated using windowed samples), a signal-to-noise ratio (See McKenna ¶ [0091] lines 2-3, frequency values represented as a signal-to-noise ratio), and a sound pressure level measurement (See McKenna ¶ [0086] lines 4-9, symbol values represented as sound pressure level), and the at least one statistical measure comprises mean (See Wold ¶ [0070], k-means clustering) and standard deviation (See Wold ¶ [0147], standard deviation of each feature).
Wold in view of Thagadur, Stojancic and McKenna does not explicitly teach statistical measures including skewness and kurtosis.
Yu teaches statistical measures including skewness (See Yu ¶ [0049], acoustic features including skew) and kurtosis (See Yu ¶ [0049], acoustic features including kurtosis).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the statistical measures taught by Yu with the audio identification method taught by Wold in view of Thagadur, Stojancic and McKenna. Skew and kurtosis are well known in the art and common audio measurements used in analysis. These measurements provide information used in characterizing acoustic features enabling the identification of audio information.
Response to Arguments
Applicant’s arguments with respect to claims 1-2, 4-12, and 14-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TYLER LIEBGOTT whose telephone number is (703)756-1818. The examiner can normally be reached Mon-Fri 10-6:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fan Tsang can be reached at (571)272-7547. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/T.M.L./Examiner, Art Unit 2694
/FAN S TSANG/Supervisory Patent Examiner, Art Unit 2694