Prosecution Insights
Last updated: April 19, 2026
Application No. 18/391,578

SYSTEM AND METHOD FOR DETECTING AND CLASSIFYING CLASSES OF BIRDS

Status: Non-Final OA (§103)
Filed: Dec 20, 2023
Examiner: TENGBUMROONG, NATHAN NARA
Art Unit: 2654
Tech Center: 2600 — Communications
Assignee: Loggerhead Instruments Inc.
OA Round: 1 (Non-Final)
Grant Probability: 43% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 3y 0m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 43% of resolved cases (6 granted / 14 resolved; -19.1% vs TC avg)
Interview Lift: +75.0% allow-rate lift on resolved cases with an interview (strong)
Typical Timeline: 3y 0m avg prosecution; 34 currently pending
Career History: 48 total applications across all art units
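
To make the headline numbers reproducible, here is a minimal sketch of the arithmetic. The 6 granted / 14 resolved counts come from this page; the with- and without-interview allow rates are hypothetical stand-ins, since the page reports only the +75.0% lift, not the underlying split.

```python
# Sketch of the examiner-stat arithmetic. The 6/14 counts are from this
# page; the per-interview allow rates below are assumed for illustration.

granted, resolved = 6, 14
allow_rate = granted / resolved  # 0.4286 -> displayed as 43%

# Hypothetical allow rates for resolved cases with vs. without an interview;
# any pair with a 1.75x ratio reproduces the displayed +75.0% lift.
rate_with, rate_without = 0.70, 0.40
interview_lift = rate_with / rate_without - 1  # 0.75 -> +75.0%

print(f"Career allow rate: {allow_rate:.0%}")      # Career allow rate: 43%
print(f"Interview lift: {interview_lift:+.1%}")    # Interview lift: +75.0%
```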

Statute-Specific Performance

§101: 27.2% (-12.8% vs TC avg)
§103: 54.3% (+14.3% vs TC avg)
§102: 14.8% (-25.2% vs TC avg)
§112: 3.2% (-36.8% vs TC avg)
TC avg = Tech Center average estimate • Based on career data from 14 resolved cases
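
As a sanity check, each delta above is just the examiner's rate minus the Tech Center average, and all four displayed deltas are consistent with a single baseline estimate near 40% (for example, 27.2% + 12.8% = 54.3% - 14.3% = 40.0%). A minimal sketch of that recomputation follows; the dictionary layout is ours, not the tool's.

```python
# Recompute each "vs TC avg" delta from the rates shown above. Subtracting
# the displayed delta from each rate recovers the same ~40% baseline.

examiner_rate = {"§101": 0.272, "§103": 0.543, "§102": 0.148, "§112": 0.032}
tc_avg = 0.40  # implied baseline: rate minus delta is 40.0% for all four rows

for statute, rate in examiner_rate.items():
    print(f"{statute}: {rate:.1%} ({rate - tc_avg:+.1%} vs TC avg)")
```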

Office Action

§103
DETAILED ACTION

This office action is in response to Applicant’s submission filed on 12/20/2023. Claims 1-12 are pending in the application. As such, claims 1-12 have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Graciarena et al. (US 20250046333 A1; hereinafter referred to as Graciarena) in view of Matsukawa et al. (US 20200267473 A1; hereinafter referred to as Matsukawa), Chu (US 20140304600 A1), and Berres (US 20130266147 A1).

Regarding claim 1, Graciarena teaches: a hybrid edge and cloud system for detecting and identifying bird vocalizations, the hybrid edge and cloud system comprising: an edge device ([0030] FIG. 2 is a block diagram illustrating is a block diagram illustrating a computing system, according to one or more techniques of this disclosure. In the example of FIG. 2, computation engine 230 executes machine learning system 232 on computing system 200) including; an edge neural network running on the edge device ([0022] System 100 of this disclosure, and more specifically embeddings extractor 106, may include deep neural network (DNN) models trained to generate embeddings), the edge neural network trained with audio samples for making predictions about identification of the bird vocalizations ([0021] Embeddings extractor 106 may be trained using an audio space comprising a plurality of sounds, which in some examples includes non-speech sounds. Non-speech sounds may include a sounds generated in nature, e.g., an avalanche, bird songs), the edge neural network for generating a score based on the predictions ([0019] The sound detection pipeline of system 100 may receive an input sound, e.g., input audio waveform 116, at input device 102, and output a score 112 for the input sound. Score 112 may indicate whether received input audio waveform 116 is the same or is different from a particular class of sounds); and an audio sensor connected to the edge neural network, the audio sensor for sending sound information to the edge neural network ([0020] input device 102 may include one or more microphones, cameras, and similar devices as well as circuitry to directly capture an audio waveform and output audio spectrogram 104); wherein the edge neural network generates a score for each of the trained audio samples based on predictions made from a bird audio clip… ([0039] Classifier 208 may then output score 212 for each of the subsequent waveforms to indicate whether the subsequent waveform is the same or different from the new class. As described above in relation to FIG. 1, in some examples calibration module 210 may provide score calibration to produce interpretable scores); and if the top scoring audio sample is not on a list of common bird species ([0037] In one example implementation of adding a new class, a user may first identify that the new input audio waveform 216 is a new class, e.g., via user interface 224. The new input audio waveform 216 is processed by the DNN of embeddings extractor 206 as described above, and in relation to FIG. 1. It can be determined that an audio is a new class and therefore not part of the database that contains common bird species.), the edge device sends the bird audio clip as a raw detection to the second storage service ([0053] the backend classifier may identify new classes not trained into embeddings extractor 300, with a low error rate, even though the backend classifier may have only trained with a few examples. The backend classifier may compare the representation of the new sound to the representations of sounds, e.g., embeddings 314 from embeddings extractor 300. In this manner the system of this disclosure may distinguish new classes of sounds without the need to retrain embedding extractor 300. The new sound class can represent the raw detection.) whereby the bird clip is processed for cloud inference and sent to the hit table… ([0036] classifier may be further configured to receive classification data 225, e.g., via user interface 224, and an input audio waveform 216 that adds new classes of non-speech sounds that the user may want to identify, without the need to retrain the world knowledge included in embeddings extractor 206. Classifier 208 may enroll new classes of sounds and use the new classes to discriminate and identify other input sounds, using only a limited number of examples for the new class, which in some examples may be only one example); the edge device determines that if the bird clip is not a published bird clip, the bird clip is sent to the hit table ([0025] The bird watcher may like to know the bird species that produced the song. This could be accomplished by recording that sound and creating a detector, e.g., training backend classifiers 108 with a new class for that sound. In some examples, this detector can be used later to match the novel new bird song to a catalog of bird songs labeled by bird species. A new class represents a new bird clip that is not published.) and if the bird clip is a published bird clip, the bird clip is sent to the first storage service wherein the bird clip is processed for entry into the hit table ([0027] The current system and methods described in this disclosure may include detection of some existing sound events such as gunfire, music, moving vehicles, background noises, animals, etc. In addition, the system of this disclosure may also include the capability to detect new sound classes by providing audio examples, as well as updating an existing sound class detector with new audio samples. Updating an existing sound class means the bird clip is published.).

Graciarena does not explicitly, but Matsukawa discloses: a peripheral neural network implementing a cloud computing system ([0023] In some embodiments, the audio converter 120 may be local to the hybrid speaker assembly 100 and/or comprise a cloud-based and/or server-based converter) that includes at least one cloud neural network for processing the sound information ([0029] Examples of such sound may include car sounds, bird sounds, ballistic projectile sounds, human speech, alerts, warning sounds, and the like. In some embodiments, the identification module 202 may comprise audio event detection algorithms which utilizes convolutional neural network (CNN), recurrent neural network (RNN), and/or generative adversarial network (GAN) that identify specific sounds through machine learning using sounds in the sound database 203 as a learning set), the peripheral neural network including: a first storage service and a second storage service in the peripheral neural network… ([0030] the sound database 203 may comprise a local and/or a network-based database storing audio clips and/or sound profiles); Graciarena and Matsukawa are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Graciarena to combine the teachings of Matsukawa because doing so would allow for a cloud-based neural network and database system for analyzing bird vocalizations, enabling different users from different areas to effectively share data (Matsukawa [0030] the sound database 203 may comprise a local and/or a network-based database storing audio clips and/or sound profiles. In some embodiments, the sound database 203 may be maintained by a central service shared among a plurality of audio converters 200 associated with different users and different spaces. In some embodiments, the sound database 203 may be updated and pruned by machine learning algorithm such as a generative adversarial network (GAN) using audio and feedback received at a plurality of audio converter).

The combination of Graciarena and Matsukawa does not explicitly, but Chu teaches: and a hit table which stores metadata about bird detections… ([0025] information storage module may include a database or other information storage medium configured to store information, including characteristics about birds. The information storage module will also be termed an "information module" and a "bird information module". The bird information module may be built using information from expert sources, other databases, research tools, other modules according to the present invention, and publications, including, for example, peer reviewed journals, text books, aviary reports, and field guides. This database can contain common bird species and published bird clips.) and a browsing device for browsing and listening ([0128] The computer system 200 may be a handheld device and include any small-sized computer device including, for example, a personal digital assistant ("PDA"), smart hand-held computing device, cellular telephone, or a laptop or netbook computer), the browsing device having access to the peripheral network ([0132] The client computer 302 establishes communication with the Internet 304--specifically to one or more servers--to, in turn, establish communication with one or more cloud data centers 306); whereby the browsing device may request ([0016] an additional procedure for obtaining information about an object may include doing a search of records. To search, the observer may enter a search request in the form of a text, an image, or a sound bite) and receive video and audio information from the hit table regarding at least one of plurality of birds for which the hit table includes ([0047] the system presents images, videos and/or sound recordings of the most likely species from the database and asks the users which, if any, of the images, videos, or sound recordings match the same species that the user uploaded); and whereby the browsing device may request and receive the video and the audio information after processing from the first storage service ([0141] the system will produce a result in the results field 412. The results may include text, image, sound recording, or video recording regarding one or more possible bird species results. In certain embodiments, the results include a list of bird species results that are ranked by likelihood of being true based on the user-provided information). Graciarena, Matsukawa, and Chu are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Graciarena and Matsukawa to combine the teachings of Chu because doing so would allow for an accessible database system that stores bird metadata, enabling users to effectively browse and request updated bird information, leading to improved identification of bird vocalizations (Chu [0023] information may be exchanged between modules generally in real time, such that the information available in the modules is updated regularly and benefits from the information received by each other module. Specifically, crowd sourced submission--that is, any information, including text, recorded sounds, recorded videos, or images provided to the system by the user--may be used to improve and expand stored information. In addition, learning algorithms may be utilized to enhance search results and bird identification processes).

The combination of Graciarena, Matsukawa, and Chu does not explicitly, but Berres teaches: and selects a top scoring audio sample… ([0027] A length of audio input can be converted into a low-dimensional vector that contains certain spectral and temporal features. This vector can be used as an acoustic fingerprint that is then compared against a database of pre-computed fingerprints using a dynamic linear programming (DLP) algorithm to accommodate intrinsic spectral and temporal variation in the audio signal. A match determination is made using a scoring system and selecting (or reporting) the highest scoring alignment (or alignments) of both the query and subject (database) sequences); and if the top scoring audio sample is on a list of common bird species… ([0032] the remo[t]e server compares those attributes against a number of attributes of known bird songs stored in a database in step 106. As discussed above, unlike conventional sound recording identification systems that look for a perfect match, the present system uses a dynamic system, as described below, to identify the best match out of a listing of candidate birds). Graciarena, Matsukawa, Chu, and Berres are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Graciarena, Matsukawa, and Chu to combine the teachings of Berres because doing so would allow for the system to better determine a top score of multiple audio sample scores by using audio gaps to improve the comparison between a trained audio sample and a bird audio clip (Berres [0073] To optimize the alignment between fingerprints, a number of gaps may be inserted into one or both of the fingerprints to improve the alignment score. The insertion or opening of gap, as well as the extension of that gap (i.e., increasing the length of the gap) may be associated with penalties that can reduce the score of an alignment of the two fingerprints).

Regarding claim 2, the combination of Graciarena, Matsukawa, Chu, and Berres teaches: the system according to claim 1. Graciarena further teaches: wherein the edge device is a portable device ([0058] The cloud-based remote access is coded to utilize a protocol, such as Hypertext Transfer Protocol (HTTP), to engage in a request and response cycle with both a mobile device application resident on a client device, 302A-302G, as well as a web-browser application resident on the client device, 302A-302G. In some situations, the cloud-based remote access for a wearable electronic device 302C, can be accessed via a mobile device, a desktop, a tablet device, cooperating with that wearable electronic device 302C).

Regarding claim 3, the claim recites similar limitations as claim 1 and therefore is rejected similarly.

Regarding claim 11, Graciarena teaches: a hybrid edge and cloud system for detecting and identifying a sound, the hybrid edge and cloud system comprising: an edge device ([0030] FIG. 2 is a block diagram illustrating is a block diagram illustrating a computing system, according to one or more techniques of this disclosure. In the example of FIG. 2, computation engine 230 executes machine learning system 232 on computing system 200) including; an edge neural network running on the edge device ([0022] System 100 of this disclosure, and more specifically embeddings extractor 106, may include deep neural network (DNN) models trained to generate embeddings), the edge neural network trained with audio samples for making predictions about identification of the sound ([0021] Embeddings extractor 106 may be trained using an audio space comprising a plurality of sounds, which in some examples includes non-speech sounds. Non-speech sounds may include a sounds generated in nature, e.g., an avalanche, bird songs), the edge neural network for generating a score based on the predictions ([0019] The sound detection pipeline of system 100 may receive an input sound, e.g., input audio waveform 116, at input device 102, and output a score 112 for the input sound. Score 112 may indicate whether received input audio waveform 116 is the same or is different from a particular class of sounds); and an audio sensor connected to the edge neural network, the audio sensor for sending sound information to the edge neural network ([0020] input device 102 may include one or more microphones, cameras, and similar devices as well as circuitry to directly capture an audio waveform and output audio spectrogram 104); wherein the edge neural network generates a score for each of the trained audio samples based on predictions made from a sound audio clip… ([0039] Classifier 208 may then output score 212 for each of the subsequent waveforms to indicate whether the subsequent waveform is the same or different from the new class. As described above in relation to FIG. 1, in some examples calibration module 210 may provide score calibration to produce interpretable scores); and if the top scoring audio sample is not on a list of common sound ([0037] In one example implementation of adding a new class, a user may first identify that the new input audio waveform 216 is a new class, e.g., via user interface 224. The new input audio waveform 216 is processed by the DNN of embeddings extractor 206 as described above, and in relation to FIG. 1. It can be determined that an audio is a new class and therefore not part of the database that contains common sounds of bird species.), the edge device sends the sound audio clip as a raw detection to the second storage service ([0053] the backend classifier may identify new classes not trained into embeddings extractor 300, with a low error rate, even though the backend classifier may have only trained with a few examples. The backend classifier may compare the representation of the new sound to the representations of sounds, e.g., embeddings 314 from embeddings extractor 300. In this manner the system of this disclosure may distinguish new classes of sounds without the need to retrain embedding extractor 300. The new sound class can represent the raw detection.) whereby the sound clip is processed for cloud inference and sent to the hit table… ([0036] classifier may be further configured to receive classification data 225, e.g., via user interface 224, and an input audio waveform 216 that adds new classes of non-speech sounds that the user may want to identify, without the need to retrain the world knowledge included in embeddings extractor 206. Classifier 208 may enroll new classes of sounds and use the new classes to discriminate and identify other input sounds, using only a limited number of examples for the new class, which in some examples may be only one example); the edge device determines that if the sound clip is not a published sound clip, the sound clip is sent to the hit table ([0025] The bird watcher may like to know the bird species that produced the song. This could be accomplished by recording that sound and creating a detector, e.g., training backend classifiers 108 with a new class for that sound. In some examples, this detector can be used later to match the novel new bird song to a catalog of bird songs labeled by bird species. A new class represents a new sound clip that is not published.) and if the sound clip is a published sound clip, the sound clip is sent to the first storage service wherein the sound clip is processed for entry into the hit table ([0027] The current system and methods described in this disclosure may include detection of some existing sound events such as gunfire, music, moving vehicles, background noises, animals, etc. In addition, the system of this disclosure may also include the capability to detect new sound classes by providing audio examples, as well as updating an existing sound class detector with new audio samples. Updating an existing sound class means the sound clip is published.).

Graciarena does not explicitly, but Matsukawa discloses: a cloud computing system ([0023] In some embodiments, the audio converter 120 may be local to the hybrid speaker assembly 100 and/or comprise a cloud-based and/or server-based converter) that includes a cloud neural network for processing the sound information ([0029] Examples of such sound may include car sounds, bird sounds, ballistic projectile sounds, human speech, alerts, warning sounds, and the like. In some embodiments, the identification module 202 may comprise audio event detection algorithms which utilizes convolutional neural network (CNN), recurrent neural network (RNN), and/or generative adversarial network (GAN) that identify specific sounds through machine learning using sounds in the sound database 203 as a learning set), the cloud computing system including: a first storage service and a second storage service… ([0030] the sound database 203 may comprise a local and/or a network-based database storing audio clips and/or sound profiles); Graciarena and Matsukawa are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Graciarena to combine the teachings of Matsukawa because doing so would allow for a cloud-based neural network and database system for analyzing sounds such as bird vocalizations, enabling different users from different areas to effectively share data (Matsukawa [0030] the sound database 203 may comprise a local and/or a network-based database storing audio clips and/or sound profiles. In some embodiments, the sound database 203 may be maintained by a central service shared among a plurality of audio converters 200 associated with different users and different spaces. In some embodiments, the sound database 203 may be updated and pruned by machine learning algorithm such as a generative adversarial network (GAN) using audio and feedback received at a plurality of audio converter).

The combination of Graciarena and Matsukawa does not explicitly, but Chu teaches: and a hit table which stores metadata about sound detections… ([0025] information storage module may include a database or other information storage medium configured to store information, including characteristics about birds. The information storage module will also be termed an "information module" and a "bird information module". The bird information module may be built using information from expert sources, other databases, research tools, other modules according to the present invention, and publications, including, for example, peer reviewed journals, text books, aviary reports, and field guides. This database can contain sounds including common bird species and published bird clips.) and a browsing device for browsing and listening ([0128] The computer system 200 may be a handheld device and include any small-sized computer device including, for example, a personal digital assistant ("PDA"), smart hand-held computing device, cellular telephone, or a laptop or netbook computer), the browsing device having access to the cloud computing system ([0132] The client computer 302 establishes communication with the Internet 304--specifically to one or more servers--to, in turn, establish communication with one or more cloud data centers 306); whereby the browsing device may request ([0016] an additional procedure for obtaining information about an object may include doing a search of records. To search, the observer may enter a search request in the form of a text, an image, or a sound bite) and receive video and audio information from the hit table regarding at least one of plurality of sounds for which the hit table includes ([0047] the system presents images, videos and/or sound recordings of the most likely species from the database and asks the users which, if any, of the images, videos, or sound recordings match the same species that the user uploaded); and whereby the browsing device may request and receive the video and the audio information after processing from the first storage service ([0141] the system will produce a result in the results field 412. The results may include text, image, sound recording, or video recording regarding one or more possible bird species results. In certain embodiments, the results include a list of bird species results that are ranked by likelihood of being true based on the user-provided information). Graciarena, Matsukawa, and Chu are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Graciarena and Matsukawa to combine the teachings of Chu because doing so would allow for an accessible database system that stores bird metadata, enabling users to effectively browse and request updated bird information, leading to improved identification of bird vocalizations (Chu [0023] information may be exchanged between modules generally in real time, such that the information available in the modules is updated regularly and benefits from the information received by each other module. Specifically, crowd sourced submission--that is, any information, including text, recorded sounds, recorded videos, or images provided to the system by the user--may be used to improve and expand stored information. In addition, learning algorithms may be utilized to enhance search results and bird identification processes).

The combination of Graciarena, Matsukawa, and Chu does not explicitly, but Berres teaches: and selects a top scoring audio sample… ([0027] A length of audio input can be converted into a low-dimensional vector that contains certain spectral and temporal features. This vector can be used as an acoustic fingerprint that is then compared against a database of pre-computed fingerprints using a dynamic linear programming (DLP) algorithm to accommodate intrinsic spectral and temporal variation in the audio signal. A match determination is made using a scoring system and selecting (or reporting) the highest scoring alignment (or alignments) of both the query and subject (database) sequences); and if the top scoring audio sample is on a list of common sound… ([0032] the remo[t]e server compares those attributes against a number of attributes of known bird songs stored in a database in step 106. As discussed above, unlike conventional sound recording identification systems that look for a perfect match, the present system uses a dynamic system, as described below, to identify the best match out of a listing of candidate birds). Graciarena, Matsukawa, Chu, and Berres are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Graciarena, Matsukawa, and Chu to combine the teachings of Berres because doing so would allow for the system to better determine a top score of multiple audio sample scores by using audio gaps to improve the comparison between a trained audio sample and a bird audio clip (Berres [0073] To optimize the alignment between fingerprints, a number of gaps may be inserted into one or both of the fingerprints to improve the alignment score. The insertion or opening of gap, as well as the extension of that gap (i.e., increasing the length of the gap) may be associated with penalties that can reduce the score of an alignment of the two fingerprints).

Regarding claim 12, the combination of Graciarena, Matsukawa, Chu, and Berres teaches: a method for using a hybrid edge and cloud system according to claim 11. Graciarena further teaches: the edge neural network generating a score for each of the trained audio samples based on predictions made from a bird audio clip… ([0039] Classifier 208 may then output score 212 for each of the subsequent waveforms to indicate whether the subsequent waveform is the same or different from the new class. As described above in relation to FIG. 1, in some examples calibration module 210 may provide score calibration to produce interpretable scores); and if the top scoring audio sample is not on a list of common bird species ([0037] In one example implementation of adding a new class, a user may first identify that the new input audio waveform 216 is a new class, e.g., via user interface 224. The new input audio waveform 216 is processed by the DNN of embeddings extractor 206 as described above, and in relation to FIG. 1. It can be determined that an audio is a new class and therefore not part of the database that contains common bird species.), the edge device sends the bird audio clip as a raw detection to the second storage service ([0053] the backend classifier may identify new classes not trained into embeddings extractor 300, with a low error rate, even though the backend classifier may have only trained with a few examples. The backend classifier may compare the representation of the new sound to the representations of sounds, e.g., embeddings 314 from embeddings extractor 300. In this manner the system of this disclosure may distinguish new classes of sounds without the need to retrain embedding extractor 300. The new sound class can represent the raw detection.) whereby the bird clip is processed for cloud inference and sent to the hit table… ([0036] classifier may be further configured to receive classification data 225, e.g., via user interface 224, and an input audio waveform 216 that adds new classes of non-speech sounds that the user may want to identify, without the need to retrain the world knowledge included in embeddings extractor 206. Classifier 208 may enroll new classes of sounds and use the new classes to discriminate and identify other input sounds, using only a limited number of examples for the new class, which in some examples may be only one example); the edge device determines that if the bird clip is not a published bird clip, the bird clip is sent to the hit table ([0025] The bird watcher may like to know the bird species that produced the song. This could be accomplished by recording that sound and creating a detector, e.g., training backend classifiers 108 with a new class for that sound. In some examples, this detector can be used later to match the novel new bird song to a catalog of bird songs labeled by bird species. A new class represents a new bird clip that is not published.) and if the bird clip is a published bird clip, the bird clip is sent to the first storage service wherein the bird clip is processed for entry into the hit table ([0027] The current system and methods described in this disclosure may include detection of some existing sound events such as gunfire, music, moving vehicles, background noises, animals, etc. In addition, the system of this disclosure may also include the capability to detect new sound classes by providing audio examples, as well as updating an existing sound class detector with new audio samples. Updating an existing sound class means the bird clip is published.). Berres further teaches: and selecting a top scoring audio sample… ([0027] A length of audio input can be converted into a low-dimensional vector that contains certain spectral and temporal features. This vector can be used as an acoustic fingerprint that is then compared against a database of pre-computed fingerprints using a dynamic linear programming (DLP) algorithm to accommodate intrinsic spectral and temporal variation in the audio signal. A match determination is made using a scoring system and selecting (or reporting) the highest scoring alignment (or alignments) of both the query and subject (database) sequences); and if the top scoring audio sample is on a list of common bird species… ([0032] the remo[t]e server compares those attributes against a number of attributes of known bird songs stored in a database in step 106. As discussed above, unlike conventional sound recording identification systems that look for a perfect match, the present system uses a dynamic system, as described below, to identify the best match out of a listing of candidate birds). Chu further teaches: and the browsing device requesting ([0016] an additional procedure for obtaining information about an object may include doing a search of records. To search, the observer may enter a search request in the form of a text, an image, or a sound bite) and receiving video and audio information from the hit table regarding at least one of plurality of birds for which the hit table includes ([0047] the system presents images, videos and/or sound recordings of the most likely species from the database and asks the users which, if any, of the images, videos, or sound recordings match the same species that the user uploaded); and the browsing device requesting and receiving the video and the audio information after processing from the first storage service ([0141] the system will produce a result in the results field 412. The results may include text, image, sound recording, or video recording regarding one or more possible bird species results. In certain embodiments, the results include a list of bird species results that are ranked by likelihood of being true based on the user-provided information).

Claims 4-5 and 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over Graciarena in view of Matsukawa and Berres.

Regarding claim 4, Graciarena teaches: a hybrid edge and cloud system for detecting and identifying bird vocalizations, the hybrid edge and cloud system comprising: an edge device including an audio sensor for audio input ([0020] input device 102 may include one or more microphones, cameras, and similar devices as well as circuitry to directly capture an audio waveform and output audio spectrogram 104) on an edge neural network ([0022] System 100 of this disclosure, and more specifically embeddings extractor 106, may include deep neural network (DNN) models trained to generate embeddings) trained with audio samples for making predictions about identification of the bird vocalizations… ([0021] Embeddings extractor 106 may be trained using an audio space comprising a plurality of sounds, which in some examples includes non-speech sounds. Non-speech sounds may include a sounds generated in nature, e.g., an avalanche, bird songs); wherein the edge neural network generates a score for each of the trained audio samples based on predictions made from a bird audio clip… ([0034] classifier 208 may determine a score 212 for input audio waveform 216 indicating whether input audio waveform 216 is the same or is different from the selected class of non-speech sounds, as described above in relation to FIG. 1); the bird audio clip is sent to the cloud neural network wherein the bird audio clip is processed for cloud inference and sent to the hit table ([0036] classifier may be further configured to receive classification data 225, e.g., via user interface 224, and an input audio waveform 216 that adds new classes of non-speech sounds that the user may want to identify, without the need to retrain the world knowledge included in embeddings extractor 206. Classifier 208 may enroll new classes of sounds and use the new classes to discriminate and identify other input sounds, using only a limited number of examples for the new class, which in some examples may be only one example).

Graciarena does not explicitly, but Matsukawa teaches: and wherein the edge device communicates with a cloud neural network for processing the sound information ([0029] Examples of such sound may include car sounds, bird sounds, ballistic projectile sounds, human speech, alerts, warning sounds, and the like. In some embodiments, the identification module 202 may comprise audio event detection algorithms which utilizes convolutional neural network (CNN), recurrent neural network (RNN), and/or generative adversarial network (GAN) that identify specific sounds through machine learning using sounds in the sound database 203 as a learning set), the cloud neural network including a hit table which stores metadata about bird detections ([0029] the specific sound is identified by comparing the source audio with sound clips and/or sound profiles in a sound database 203. For example, the sound database 203 may comprise sound clip and/or sound profiles (e.g. frequency spectrum characteristics, frequency distribution, temporal frequency or amplitude changes, etc.) associated with a plurality of types of sounds suitable for playback via a directional speaker. Examples of such sound may include car sounds, bird sounds). Graciarena and Matsukawa are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Graciarena to combine the teachings of Matsukawa because doing so would allow for a cloud-based neural network and database system for analyzing bird vocalizations, enabling different users from different areas to effectively share data (Matsukawa [0030] the sound database 203 may comprise a local and/or a network-based database storing audio clips and/or sound profiles. In some embodiments, the sound database 203 may be maintained by a central service shared among a plurality of audio converters 200 associated with different users and different spaces. In some embodiments, the sound database 203 may be updated and pruned by machine learning algorithm such as a generative adversarial network (GAN) using audio and feedback received at a plurality of audio converter).

The combination of Graciarena and Matsukawa does not explicitly, but Berres teaches: and selects a top scoring trained audio sample… ([0027] A length of audio input can be converted into a low-dimensional vector that contains certain spectral and temporal features. This vector can be used as an acoustic fingerprint that is then compared against a database of pre-computed fingerprints using a dynamic linear programming (DLP) algorithm to accommodate intrinsic spectral and temporal variation in the audio signal. A match determination is made using a scoring system and selecting (or reporting) the highest scoring alignment (or alignments) of both the query and subject (database) sequences); and if the scores indicate that no trained audio sample matches the bird audio clip… ([0072] a first score may be determine when matching a fingerprint of an unknown bird song or call with a known predetermined fingerprint. The score may be determined based upon an optimal alignment between the two fingerprints. The score can be increased as the number of bitwise matches between the two fingerprints increases. Similarly, the score is decreased by bitwise mismatches between the fingerprints). Graciarena, Matsukawa, and Berres are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Graciarena and Matsukawa to combine the teachings of Berres because doing so would allow for the system to better determine a top score of multiple audio sample scores by using audio gaps to improve the comparison between a trained audio sample and a bird audio clip (Berres [0073] To optimize the alignment between fingerprints, a number of gaps may be inserted into one or both of the fingerprints to improve the alignment score. The insertion or opening of gap, as well as the extension of that gap (i.e., increasing the length of the gap) may be associated with penalties that can reduce the score of an alignment of the two fingerprints).

Regarding claim 5, the combination of Graciarena, Matsukawa, and Berres teaches: the system according to claim 4. Berres further teaches: wherein if the top scoring audio sample is on a list of common bird species… ([0032] the remo[t]e server compares those attributes against a number of attributes of known bird songs stored in a database in step 106. As discussed above, unlike conventional sound recording identification systems that look for a perfect match, the present system uses a dynamic system, as described below, to identify the best match out of a listing of candidate birds). Graciarena further teaches: the edge device determines that if the bird clip is not a published bird clip, the bird clip is sent to the hit table ([0025] The bird watcher may like to know the bird species that produced the song. This could be accomplished by recording that sound and creating a detector, e.g., training backend classifiers 108 with a new class for that sound. In some examples, this detector can be used later to match the novel new bird song to a catalog of bird songs labeled by bird species. A new class represents a new bird clip that is not published.) and if the bird clip is a published bird clip, the bird clip is sent to the first storage service wherein the bird clip is processed for entry into the hit table ([0027] The current system and methods described in this disclosure may include detection of some existing sound events such as gunfire, music, moving vehicles, background noises, animals, etc. In addition, the system of this disclosure may also include the capability to detect new sound classes by providing audio examples, as well as updating an existing sound class detector with new audio samples. Updating an existing sound class means the bird clip is published.).

Regarding claim 7, the combination of Graciarena, Matsukawa, and Berres teaches: the system according to claim 4. Graciarena further teaches: wherein the edge neural network generates a score based on the predictions, and a score is provided for each trained audio sample based on the predictions… ([0039] Classifier 208 may then output score 212 for each of the subsequent waveforms to indicate whether the subsequent waveform is the same or different from the new class. As described above in relation to FIG. 1, in some examples calibration module 210 may provide score calibration to produce interpretable scores). Berres further teaches: and the highest scoring trained audio sample is determined ([0027] A length of audio input can be converted into a low-dimensional vector that contains certain spectral and temporal features. This vector can be used as an acoustic fingerprint that is then compared against a database of pre-computed fingerprints using a dynamic linear programming (DLP) algorithm to accommodate intrinsic spectral and temporal variation in the audio signal. A match determination is made using a scoring system and selecting (or reporting) the highest scoring alignment (or alignments) of both the query and subject (database) sequences).

Regarding claim 8, the combination of Graciarena, Matsukawa, and Berres teaches: the system according to claim 4. Matsukawa further teaches: wherein the first storage service is a first storage service and the second storage service is a second storage service in the cloud neural network ([0030] service in the peripheral neural network” – the sound database 203 may comprise a local and/or a network-based database storing audio clips and/or sound profiles).

Regarding claim 9, the combination of Graciarena, Matsukawa, and Berres teaches: the system according to claim 4. Graciarena further teaches: wherein the edge neural network generates a score for each of the trained audio samples based on predictions made from the bird audio clip… ([0039] Classifier 208 may then output score 212 for each of the subsequent waveforms to indicate whether the subsequent waveform is the same or different from the new class. As described above in relation to FIG. 1, in some examples calibration module 210 may provide score calibration to produce interpretable scores) and if the top scoring audio sample is not on a list of common bird species ([0037] In one example implementation of adding a new class, a user may first identify that the new input audio waveform 216 is a new class, e.g., via user interface 224. The new input audio waveform 216 is processed by the DNN of embeddings extractor 206 as described above, and in relation to FIG. 1. It can be determined that an audio is a new class and therefore not part of the database that contains common bird species.), the edge device sends the bird audio clip as a raw detection to the second storage service ([0053] the backend classifier may identify new classes not trained into embeddings extractor 300, with a low error rate, even though the backend classifier may have only trained with a few examples. The backend classifier may compare the representation of the new sound to the representations of sounds, e.g., embeddings 314 from embeddings extractor 300. In this manner the system of this disclosure may distinguish new classes of sounds without the need to retrain embedding extractor 300. The new sound class can represent the raw detection.) whereby the bird clip is processed for cloud inference and sent to the hit table ([0036] classifier may be further configured to receive classification data 225, e.g., via user interface 224, and an input audio waveform 216 that adds new classes of non-speech sounds that the user may want to identify, without the need to retrain the world knowledge included in embeddings extractor 206. Classifier 208 may enroll new classes of sounds and use the new classes to discriminate and identify other input sounds, using only a limited number of examples for the new class, which in some examples may be only one example). Berres further teaches: and selects a top scoring audio sample… ([0027] A length of audio input can be converted into a low-dimensional vector that contains certain spectral and temporal features. This vector can be used as an acoustic fingerprint that is then compared against a database of pre-computed fingerprints using a dynamic linear programming (DLP) algorithm to accommodate intrinsic spectral and temporal variation in the audio signal. A match determination is made using a scoring system and selecting (or reporting) the highest scoring alignment (or alignments) of both the query and subject (database) sequences).

Regarding claim 10, the combination of Graciarena, Matsukawa, and Berres teaches: the system according to claim 4. Berres further teaches: wherein the edge neural network generates a score for each of the trained audio samples based on predictions made from the bird audio clip ([0011] determined by a similarity of the fingerprint of the input audio signal to at least one of a plurality of predetermined fingerprints within the set of categorized samples of bird vocalizations, wherein the similarity is determined by determining a score of a desirable alignment between the fingerprint and the at least one of a plurality of predetermined fingerprints. The processor is then configured to generate a report including content associated with the bird species) and selects a top scoring audio sample ([0027] A length of audio input can be converted into a low-dimensional vector that contains certain spectral and temporal features. This vector can be used as an acoustic fingerprint that is then compared against a database of pre-computed fingerprints using a dynamic linear programming (DLP) algorithm to accommodate intrinsic spectral and temporal variation in the audio signal. A match determination is made using a scoring system and selecting (or reporting) the highest scoring alignment (or alignments) of both the query and subject (database) sequences) and if the top scoring audio sample is on a list of common bird species… ([0032] the remo[t]e server compares those attributes against a number of attributes of known bird songs stored in a database in step 106. As discussed above, unlike conventional sound recording identification systems that look for a perfect match, the present system uses a dynamic system, as described below, to identify the best match out of a listing of candidate birds). Graciarena further teaches: the edge device determines that if the bird clip is not a published bird clip, the bird clip is sent to the hit table ([0025] The bird watcher may like to know the bird species that produced the song. This could be accomplished by recording that sound and creating a detector, e.g., training backend classifiers 108 with a new class for that sound. In some examples, this detector can be used later to match the novel new bird song to a catalog of bird songs labeled by bird species. A new class represents a new bird clip that is not published.) and if the bird clip is a published bird clip, the bird clip is sent to the first storage service wherein the bird clip is processed for entry into the hit table ([0027] The current system and methods described in this disclosure may include detection of some existing sound events such as gunfire, music, moving vehicles, background noises, animals, etc. In addition, the system of this disclosure may also include the capability to detect new sound classes by providing audio examples, as well as updating an existing sound class detector with new audio samples. Updating an existing sound class means the bird clip is published.).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Graciarena in view of Matsukawa and Berres, as applied to claims 4-5 and 7-10 above, and further in view of Chu.

Regarding claim 6, the combination of Graciarena, Matsukawa, and Berres teaches: the system according to claim 4. The combination of Graciarena, Matsukawa, and Berres does not explicitly, but Chu teaches: including a browsing device for browsing
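To make the claim-1 data flow quoted above easier to follow, here is a hedged sketch of the routing logic the rejection characterizes: the edge neural network scores each trained audio sample against the incoming clip, the top-scoring sample is selected, and the clip is then routed based on the common-species list and the published-clip check. Every name below is invented for illustration; this is one reading of the claim language, not the applicant's or the references' actual implementation.

```python
# Illustrative sketch of the claim-1 routing flow as characterized in this
# Office action. All names (route_clip, COMMON_SPECIES, the storage-service
# strings) are invented for readability; none come from the application
# or the cited references.

COMMON_SPECIES = {"northern cardinal", "blue jay"}  # the "list of common bird species"

def route_clip(scores: dict[str, float], published: bool) -> str:
    """Route one bird audio clip given per-sample scores from the edge NN."""
    # The edge neural network scores each trained audio sample against the
    # clip; the top-scoring sample drives the routing decision.
    top_sample = max(scores, key=scores.get)

    if top_sample not in COMMON_SPECIES:
        # Uncommon match: send the clip as a raw detection to the second
        # storage service, where it is processed for cloud inference and
        # then entered into the hit table.
        return "second storage service -> cloud inference -> hit table"

    # Common match: route on whether the clip is already a published clip.
    if not published:
        return "hit table"  # unpublished clip goes straight to the hit table
    return "first storage service -> hit table"  # published clip is processed first

# Example: a high-confidence cardinal detection that has not been published.
print(route_clip({"northern cardinal": 0.91, "blue jay": 0.42}, published=False))
```

Read this only as a map of the claim's branches: the rejection maps the branches themselves to Graciarena, with Berres supplying the top-score selection and Matsukawa and Chu supplying the cloud-side storage services and hit table.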

Prosecution Timeline

Dec 20, 2023: Application Filed
Dec 10, 2025: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12530536
Mixture-Of-Expert Approach to Reinforcement Learning-Based Dialogue Management
2y 5m to grant • Granted Jan 20, 2026
Patent 12451142
NON-WAKE WORD INVOCATION OF AN AUTOMATED ASSISTANT FROM CERTAIN UTTERANCES RELATED TO DISPLAY CONTENT
2y 5m to grant • Granted Oct 21, 2025
Patent 12412050
MULTI-PLATFORM VOICE ANALYSIS AND TRANSLATION
2y 5m to grant • Granted Sep 09, 2025
Study what changed to get past this examiner. Based on 3 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 43% (99% with interview, +75.0% lift)
Median Time to Grant: 3y 0m
PTA Risk: Low
Based on 14 resolved cases by this examiner. Grant probability derived from career allow rate.
