DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In response to this Office action, the Examiner respectfully requests that support be shown for language added to any original claims by amendment and for any new claims. That is, indicate support for newly added claim language by specifically pointing to the page(s) and line numbers in the specification and/or the drawing figure(s). This will assist the Examiner in prosecuting this application.
Information Disclosure Statement
2. The information disclosure statement filed on April 2, 2024 has been considered and placed in the application file.
Claim Rejections - 35 USC § 102
3. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
4. Claims 1, 3, 15, and 17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Anushiravani et al. U.S. Patent Application Publication 20200388287 (hereinafter, “Anushiravani”).
Regarding claim 1, Anushiravani teaches a computer-based diagnostic device (AI system including digital stethoscope; The sensors 102, 103, 104, 105 and 106 are examples of possible sensors that can be used with the disclosed AI system. Other sensors can also be used by the AI system, including but not limited to: a digital stethoscope, Fig. 1A, par [0076], see Anushiravani), comprising:
a memory (a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, par [0213], see Anushiravani); and
one or more processors operatively coupled to the memory, wherein the one or more processors are configured to execute instructions causing the one or more processors (The health monitoring device can include one or more processors, memory (e.g., flash memory) for storing instructions and data, power source (e.g., a battery), wireless connectivity (e.g., a wireless transceiver) for wirelessly communicating with a network (e.g., the Internet, local area network) access point (e.g., WIFI router, cell tower), par [0213], see Anushiravani) to:
receive (via wireless connectivity (e.g., a wireless transceiver) for wirelessly communicating with a network (e.g., the Internet, local area network) access point (e.g., WIFI router, cell tower) or directly with another device (e.g., Bluetooth, Near Field Communications, RFID), par [0079], see Anushiravani), from a client device (i.e., digital stethoscope), source audio data (data frame 301, Fig. 3A; The feature extraction method shown at the top of FIG. 3A extracts low dimensional features from time-domain audio signals captured by one or more microphones and a digital stethoscope or other auditory data, Fig. 3A, par [0087]; In an embodiment, each audio signal is analyzed frame by frame (e.g., a consecutive group of audio samples) (301), Fig. 3A, par [0088]; A digital stethoscope, however, usually has a sampling rate between 50 Hz to 2000 Hz. Because a digital stethoscope needs to be placed on a user's chest, lungs or back, there may be more than one spot that needs to be recorded, par [0213], see Anushiravani) associated with a patient (In FIG. 7, the method described in FIG. 6 is implemented using neural networks. Based on the predicted symptoms and ground truth labels provided by a physician, a neural network is trained that maps a patient's symptoms to their disease or disease state (704), Fig. 7, par [0111], see Anushiravani);
convert the source audio data (via Mel Frequency Cepstral Coefficients (MFCC), Discrete Cosine Transform coefficients (DCT), Fast Fourier Transform (FFT) coefficients, par [0088], see Anushiravani) into one or more images (see sound images of 302, Fig. 3A. In an embodiment, each audio signal is analyzed frame by frame (e.g., a consecutive group of audio samples) (301). Each frame of the data can be anywhere between 64 milliseconds to 512 milliseconds in length to capture the audio characteristics of one event. Each frame can then be divided into four or more equally spaced sub-frames based on the frame size (302, 303). Such features can include but are not limited to: Mel Frequency Cepstral Coefficients (MFCC), Fig. 3A, par [0088], see Anushiravani);
extract one or more numerical features from the one or more images (see extract features and concatenate 304, Fig. 3A; In some embodiments, features are extracted from the whole frame and concatenated with the subframe feature vector. Feature extraction is then performed on each subframe and the resulting features are concatenated together (304, 305) into a combined feature vector along with features from other sensory data and other available resources, Fig. 3A, par [0088], see Anushiravani);
determine (FIG. 6 is a graphical illustration of determining the disease or disease state of a user based on the predicted symptoms of the user at each timestamp using a neural network, according to an embodiment, Fig. 6, par [0109], see Anushiravani), using a network (FIG. 5 is a graphical illustration of a multitask symptom classification algorithm using a deep neural network, according to an embodiment, par [0101], see Anushiravani), one or more diagnoses of the patient (a neural network is trained that maps a patient's symptoms to their disease or disease state (704), Fig. 7, par [0111], see Anushiravani; Once features are extracted from the user data objects they are used to train a neural network that detects particular symptoms. For example, audio features at a certain time-stamp that correspond to a cough sound, along with other current information obtained from the user data object) based at least in part on the extracted one or more numerical features (Such features can include but are not limited to: Mel Frequency Cepstral Coefficients (MFCC); Feature extraction is then performed on each subframe and the resulting features are concatenated together (304, 305) into a combined feature vector along with features from other sensory data and other available resource, Fig. 3A, par [0088], see Anushiravani).
Anushiravani thus teaches all the claimed limitations.
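For illustration only, the following is a minimal Python sketch of the frame-and-subframe feature extraction described in the cited par [0088] of Anushiravani (an analysis frame of 64-512 ms divided into equally spaced subframes, with MFCC features extracted and concatenated into a combined feature vector). The use of the librosa library, the function name, and all parameter values are assumptions of this sketch, not teachings of the reference.

import numpy as np
import librosa

def frame_features(y, sr, frame_ms=256, n_subframes=4, n_mfcc=13):
    # One analysis frame of 64-512 ms (301 in Fig. 3A of Anushiravani).
    frame_len = int(sr * frame_ms / 1000)
    sub_len = frame_len // n_subframes      # equally spaced subframes (302, 303)
    vectors = []
    for start in range(0, len(y) - frame_len + 1, frame_len):
        frame = y[start:start + frame_len]
        # Whole-frame MFCCs, time-averaged to a fixed-length vector.
        parts = [librosa.feature.mfcc(y=frame, sr=sr, n_mfcc=n_mfcc,
                                      n_fft=128, hop_length=64).mean(axis=1)]
        # Per-subframe MFCCs, concatenated with the whole-frame features (304, 305).
        for k in range(n_subframes):
            sub = frame[k * sub_len:(k + 1) * sub_len]
            parts.append(librosa.feature.mfcc(y=sub, sr=sr, n_mfcc=n_mfcc,
                                              n_fft=128, hop_length=64).mean(axis=1))
        vectors.append(np.concatenate(parts))   # combined feature vector per frame
    return np.array(vectors)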
Regarding claim 3, Anushiravani teaches the computer-based diagnostic device of claim 1, wherein the audio data is captured by a digital stethoscope (The feature extraction method shown at the top of FIG. 3A extracts low dimensional features from time-domain audio signals captured by one or more microphones and a digital stethoscope or other auditory data, Fig. 3A, par [0087], see Anushiravani).
Regarding claim 15, this claim recites the method corresponding to the apparatus of claim 1 and is therefore rejected for the same reasons.
Regarding claim 17, this claim recites the method corresponding to the apparatus of claim 3 and is therefore rejected for the same reasons.
Claim Rejections - 35 USC § 103
5. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
6. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
7. This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
8. Claims 2 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Anushiravani et al. U.S. Patent Application Publication 20200388287 (hereinafter, “Anushiravani”) in view of Alqudah et al., "Deep learning models for detecting respiratory pathologies from raw lung auscultation sounds", Soft Computing (2022), 26 September 2022, pages 13405-13429 (hereinafter, “Alqudah”).
Regarding claim 2, Anushiravani teaches the computer-based diagnostic device of claim 1. However, Anushiravani does not explicitly disclose wherein the one or more diagnoses include COPD, healthy, URTI, Bronchiectasis, Pneumonia, and Bronchiolitis.
Alqudah teaches deep learning models for detecting respiratory pathologies from raw lung auscultation sounds (see Title), in which the data used incorporated two different datasets, both consisting of stethoscope lung sounds classified with different respiratory diseases; Table 1 provides a detailed overview of the used dataset and the four datasets from their merging, and each dataset is discussed in detail in the next two sections (page 13412, left column, last paragraph – right column, first paragraph, see Alqudah). Table 1 teaches normal, asthma, bronchiectasis, bronchiolitis, COPD, LRTI, pneumonia, and URTI (see Table 1, page 13410). See also the confusion matrix in Fig. 4(B) on page 13413, see Alqudah.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the deep learning models for detecting respiratory pathologies from raw lung auscultation sounds taught by Alqudah with the computer-based diagnostic device of Anushiravani to obtain wherein the one or more diagnoses include COPD, healthy, URTI, Bronchiectasis, Pneumonia, and Bronchiolitis, in order to improve the diagnosis performance of many diseases, especially respiratory diseases, as suggested by Alqudah in the Abstract.
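For illustration only, a short Python sketch of mapping a classifier output to the recited diagnosis labels; the label set mirrors Alqudah's Table 1 (claim 2 recites a subset), while the function name and the assumption of a softmax probability vector are hypothetical.

import numpy as np

# Label set mirroring Alqudah Table 1; claim 2 recites a subset of these.
DIAGNOSES = ["healthy", "asthma", "Bronchiectasis", "Bronchiolitis",
             "COPD", "LRTI", "Pneumonia", "URTI"]

def diagnose(class_probabilities):
    # Select the most probable class of the network's softmax output.
    return DIAGNOSES[int(np.argmax(class_probabilities))]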
Regarding claim 16, this claim recites the method corresponding to the apparatus of claim 2 and is therefore rejected for the same reasons.
9. Claims 4, 6, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Anushiravani et al. U.S. Patent Application Publication 20200388287 (hereinafter, “Anushiravani”) in view of Park et al. U.S. Patent Application Publication 20240087265 (hereinafter, “Park”).
Regarding claim 4, Anushiravani teaches the computer-based diagnostic device of claim 1. Alqudah further teaches that, for instance, with images, one might do things like rotating the image slightly, cropping or scaling it, or modifying colors or lighting (see page 13414, right column, last paragraph, see Alqudah).
However, Anushiravani does not explicitly disclose wherein the extracting of one or more numerical features comprises converting the plurality of images into separate matrix arrays comprised of numerical translations of pixel colors.
Park teaches multidimensional image editing from an input image (see Title), in which the 2D image conversion module 102 is generally responsible for converting a 2D image (e.g., one of the runtime images 105-3) into a set of vectors that represent the 2D image. For example, each pixel color value of the 2D image is converted to a numerical value that represents the pixel color value. In some embodiments, such numerical value ranges from 0 to 1. In some embodiments, one or more machine learning models or pre-processing components are responsible for such conversion. For example, a Convolutional Neural Network (CNN) can receive an encoded jpeg image and decode the image into a 3-channel representation, a 3-dimensional matrix (or list, array, or tensor) where each dimension represents a color channel: R, G, and B (Red, Green, Blue), and where each matrix is composed of numbers representing pixel intensity (Fig. 1, par [0032], see Park).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the multidimensional image editing from an input image taught by Park with the computer-based diagnostic device of Anushiravani to obtain wherein the extracting of one or more numerical features comprises converting the plurality of images into separate matrix arrays comprised of numerical translations of pixel colors, in order to improve computer resource consumption, as suggested by Park in paragraph [0028].
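For illustration only, a minimal Python sketch of the conversion Park's par [0032] describes: each pixel color value is translated to a numerical value ranging from 0 to 1, yielding one separate matrix per R, G, and B channel. The Pillow/NumPy toolchain and the function name are assumptions of this sketch.

import numpy as np
from PIL import Image

def image_to_channel_matrices(path):
    # Decode the image and translate pixel colors to numbers in [0, 1].
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    # Return separate matrix arrays, one per color channel (R, G, B).
    return rgb[..., 0], rgb[..., 1], rgb[..., 2]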
Regarding claim 6, Anushiravani in view of Park teaches the computer-based diagnostic device of claim 4. Anushiravani in view of Park, as modified, teaches wherein the matrix arrays (FIG. 13 illustrates a feature extraction technique for convolutional neural networks, according to an embodiment. In another example embodiment, the features extracted from each sensor at each timestamp are fit into a matrix (1305) instead of concatenated together into a feature vector as previously described in reference to FIG. 3 (305), par [0130], see Anushiravani) include one or more of respiratory oscillations (see patterns of frame size 302, 303, Fig. 3A, par [0088]; abnormal breathing pattern, par [0135], see Anushiravani), pitch content (wheezing, par [0147], see Anushiravani), amplitude of breathing noises (see amplitude of frame size 302, Fig. 3A, par [0088], see Anushiravani), peaks and valleys in the audio data (see peaks and valleys of frame size 302, Fig. 3A, par [0088], see Anushiravani), and chord sequences in the audio data (see sequences 303 of frame size 302, Fig. 3A, par [0088], see Anushiravani). The motivation is to improve determination of a disease state for a user, as suggested by Anushiravani in paragraph [0113].
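For illustration only, a short Python sketch computing per-frame numerical descriptors of the kinds recited in claim 6 (amplitude of breathing noises, peaks and valleys in the audio data); the use of SciPy's find_peaks is an assumption of this sketch, not a teaching of the references.

import numpy as np
from scipy.signal import find_peaks

def frame_descriptors(frame):
    amplitude = float(np.abs(frame).max())   # amplitude of breathing noises
    peaks, _ = find_peaks(frame)             # peaks in the audio data
    valleys, _ = find_peaks(-frame)          # valleys in the audio data
    return amplitude, peaks.size, valleys.size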
Regarding claim 18, this claim recites the method corresponding to the apparatus of claim 4 and is therefore rejected for the same reasons.
Regarding claim 20, this claim recites the method corresponding to the apparatus of claim 6 and is therefore rejected for the same reasons.
10. Claims 5 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Anushiravani et al. U.S. Patent Application Publication 20200388287 (hereinafter, “Anushiravani”) in view of Park et al. U.S. Patent Application Publication 20240087265 (hereinafter, “Park”), in view of Xing U.S. Patent Application Publication 20180276540 (hereinafter, “Xing”), and further in view of Peleg et al. U.S. Patent Application Publication 20190005976 (hereinafter, “Peleg”).
Regarding claim 5, Anushiravani in view of Park teaches the computer-based diagnostic device of claim 4. Anushiravani in view of Park, as modified, teaches that researchers employed different types of machine learning algorithms with handcrafted features like Mel-frequency cepstral coefficient (MFCC) features in a support-vector machine (SVM) or fed the spectrogram images to the convolutional neural network (CNN) (page 13407, left column, last paragraph, see Alqudah).
However, Anushiravani in view of Park does not explicitly disclose wherein the plurality of images include one or more of mel-frequency cepstral and chromagram.
Xing teaches modeling of the latent embedding of music using a deep neural network (see Title), in which, in some embodiments, other time-frequency analysis may be used as known in the art, for example, mel-frequency analysis and/or mel-frequency cepstrum (MFC) or mel-frequency cepstral coefficients analysis (MFCC) (see par [0041], see Xing). Xing also teaches other acoustic signal representations, such as spectral contrast and music harmonics such as tonal centroid features (tonnetz), and all the acoustic signal representations, e.g., chromagram, tempogram, mel-spectrogram (see par [0052], see Xing).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the modeling of the latent embedding of music using a deep neural network taught by Xing with the computer-based diagnostic device of Anushiravani in view of Park to obtain wherein the plurality of images include one or more of mel-frequency cepstral and chromagram, in order to perform backward propagation to artificially construct an audio hyper-image, as suggested by Xing in paragraph [0062].
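For illustration only, a minimal Python sketch producing the two image-like representations at issue here (mel-frequency cepstral and chromagram); Xing names the representations, while the librosa calls are an assumed toolchain for this sketch.

import librosa

def audio_image_representations(y, sr):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)   # mel-frequency cepstral
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)     # chromagram
    return mfcc, chroma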
Anushiravani in view of Park in view of Xing, as modified, teaches mel-frequency analysis and/or mel-frequency cepstrum (MFC) or mel-frequency cepstral coefficients analysis (MFCC) (see par [0041], see Xing).
However, Anushiravani in view of Park in view of Xing does not explicitly disclose wherein the plurality of images include one or more of mel-scaled spectrogram.
Peleg teaches a method and system for enhancing a speech signal of a human speaker in a video using visual information (see Title), in which, regarding audio spectrogram manipulation, generation of a spectrogram may be done by applying a short-time Fourier transform (STFT) to the waveform signal, and a mel-scale spectrogram is computed by multiplying the spectrogram by a mel-spaced filterbank (par [0093], see Peleg).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method and system for enhancing a speech signal of a human speaker in a video using visual information taught by Peleg with the computer-based diagnostic device of Anushiravani in view of Park in view of Xing to obtain wherein the plurality of images include one or more of mel-scaled spectrogram, in order to achieve a significant improvement in enhancement performance in different scenarios, as suggested by Peleg in paragraph [0050].
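For illustration only, a minimal Python sketch of the computation Peleg's par [0093] describes: an STFT is applied to the waveform signal, and the spectrogram is multiplied by a mel-spaced filterbank to give the mel-scale spectrogram. The librosa calls and the parameter values are assumptions of this sketch.

import numpy as np
import librosa

def mel_scaled_spectrogram(y, sr, n_fft=1024, hop=256, n_mels=64):
    # Short-time Fourier transform of the waveform signal (Peleg par [0093]).
    power = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    # Multiply the spectrogram by a mel-spaced filterbank.
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    return librosa.power_to_db(mel_fb @ power)   # mel-scale spectrogram in dB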
Regarding claim 19, this claim recites the method corresponding to the apparatus of claim 5 and is therefore rejected for the same reasons.
11. Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Anushiravani et al. U.S. Patent Application Publication 20200388287 (hereinafter, “Anushiravani”) in view of Park et al. U.S. Patent Application Publication 20240087265 (hereinafter, “Park”), and further in view of Sobol et al. U.S. Patent Application Publication 20190209022 (hereinafter, “Sobol”).
Regarding claim 7, Anushiravani in view of Park teaches the computer-based diagnostic device of claim 4. However, Anushiravani in view of Park does not explicitly disclose wherein the matrix arrays are separate numpy matrix arrays.
Sobol teaches a wearable electronic device and system for tracking location and identifying changes in salient indicators of patient health (see Title), in which, for the sensors 121 mentioned previously, certain physiological parameters are monitored, such as respiration rate, heart rate, and unusual breathing patterns (such as wheezing or the like) (see Fig. 2F, par [0331], see Sobol). In one form, an open-source machine learning core library such as NumPy is used for performing fast linear algebra and related scientific computing within the Python programming language. NumPy provides support operations for multidimensional array and matrix (but not on scalar quantities) data structures, along with a large collection of high-level mathematical functions to operate on these arrays (par [0241], see Sobol). In one form, the process of segmenting the data may be performed in a random manner, while in another by known algorithms, such as regularization algorithms and others found in a Scikit-Learn function library for the previously-mentioned Pandas dataframes or NumPy arrays. All such data set splitting helps ensure that the model is not overfitting, or that the predictor variables associated with the input data avoid a covariate shift associated with an improper choice of training and testing data sets 1610, 1630 (Fig. 6, par [0263], see Sobol).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the wearable electronic device and system for tracking location and identifying changes in salient indicators of patient health taught by Sobol with the computer-based diagnostic device of Anushiravani in view of Park to obtain wherein the matrix arrays are separate numpy matrix arrays, in order to increase the accuracy of data analytic-based predictions of UTIs or other adverse health conditions, as suggested by Sobol in paragraph [0226].
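For illustration only, a short Python sketch of holding each image as its own NumPy matrix array and splitting the collection into training and testing sets; NumPy and Scikit-Learn are the libraries Sobol's pars [0241] and [0263] identify, while the function name and split ratio are assumptions of this sketch.

import numpy as np
from sklearn.model_selection import train_test_split

def to_separate_numpy_arrays(images):
    # One separate NumPy matrix array per image.
    return [np.asarray(img, dtype=np.float32) for img in images]

# Per Sobol par [0263], splitting into training and testing sets helps
# ensure that a model trained on these arrays is not overfitting, e.g.:
# train_arrays, test_arrays = train_test_split(arrays, test_size=0.2, random_state=0)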
12. Claims 8 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Anushiravani et al. U.S. Patent Application Publication 20200388287 (hereinafter, “Anushiravani”) in view of Sobol et al. U.S. Patent Application Publication 20190209022 (hereinafter, “Sobol”).
Regarding claim 8, Anushiravani teaches a computer-readable medium (a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, par [0213], see Anushiravani) comprising computer readable instructions executable by one or more processors (The health monitoring device can include one or more processors, memory (e.g., flash memory) for storing instructions and data, power source (e.g., a battery), wireless connectivity (e.g., a wireless transceiver) for wirelessly communicating with a network (e.g., the Internet, local area network) access point (e.g., WIFI router). In an embodiment, one or more of the methods/processes/features described below is at least partially implemented/performed on a second device, such as a network server computer, companion device, medical instrument or machine that is wirelessly coupled (or wired) to the health monitoring device, par [0079], see Anushiravani) to:
receive (via wireless connectivity (e.g., a wireless transceiver) for wirelessly communicating with a network (e.g., the Internet, local area network) access point (e.g., WIFI router, cell tower) or directly with another device (e.g., Bluetooth, Near Field Communications, RFID), par [0079], see Anushiravani), from a client device (i.e., digital stethoscope), source audio data (data frame 301, Fig. 3A; The feature extraction method shown at the top of FIG. 3A extracts low dimensional features from time-domain audio signals captured by one or more microphones and a digital stethoscope or other auditory data, Fig. 3A, par [0087]; In an embodiment, each audio signal is analyzed frame by frame (e.g., a consecutive group of audio samples) (301), Fig. 3A, par [0088]; A digital stethoscope, however, usually has a sampling rate between 50 Hz to 2000 Hz. Because a digital stethoscope needs to be placed on a user's chest, lungs or back, there may be more than one spot that needs to be recorded, par [0213], see Anushiravani) associated with a patient (In FIG. 7, the method described in FIG. 6 is implemented using neural networks. Based on the predicted symptoms and ground truth labels provided by a physician, a neural network is trained that maps a patient's symptoms to their disease or disease state (704), Fig. 7, par [0111], see Anushiravani);
convert the source audio data (via Mel Frequency Cepstral Coefficients (MFCC), Discrete Cosine Transform coefficients (DCT), Fast Fourier Transform (FFT) coefficients, par [0088], see Anushiravani) into one or more images (see sound images of 302, Fig. 3A. In an embodiment, each audio signal is analyzed frame by frame (e.g., a consecutive group of audio samples) (301). Each frame of the data can be anywhere between 64 milliseconds to 512 milliseconds in length to capture the audio characteristics of one event. Each frame can then be divided into four or more equally spaced sub-frames based on the frame size (302, 303). Such features can include but are not limited to: Mel Frequency Cepstral Coefficients (MFCC), Fig. 3A, par [0088], see Anushiravani);
extract one or more numerical features from the one or more images (see extract features and concatenate 304, Fig. 3A; In some embodiments, features are extracted from the whole frame and concatenated with the subframe feature vector. Feature extraction is then performed on each subframe and the resulting features are concatenated together (304, 305) into a combined feature vector along with features from other sensory data and other available resources, Fig. 3A, par [0088], see Anushiravani);
determine (FIG. 6 is a graphical illustration of determining the disease or disease state of a user based on the predicted symptoms of the user at each timestamp using a neural network, according to an embodiment, Fig. 6, par [0109], see Anushiravani), using a network (FIG. 5 is a graphical illustration of a multitask symptom classification algorithm using a deep neural network, according to an embodiment, par [0101], see Anushiravani), one or more diagnoses of the patient (a neural network is trained that maps a patient's symptoms to their disease or disease state (704), Fig. 7, par [0111], see Anushiravani; Once features are extracted from the user data objects they are used to train a neural network that detects particular symptoms. For example, audio features at a certain time-stamp that correspond to a cough sound, along with other current information obtained from the user data object) based at least in part on the extracted one or more numerical features (Such features can include but are not limited to: Mel Frequency Cepstral Coefficients (MFCC); Feature extraction is then performed on each subframe and the resulting features are concatenated together (304, 305) into a combined feature vector along with features from other sensory data and other available resource, Fig. 3A, par [0088], see Anushiravani).
However, Anushiravani does not explicitly disclose that the computer-readable medium is a non-transitory computer-readable medium.
Sobol teaches wearable electronic device and system for tracking location and identifying changes in salient indicators of patient health (see Title) in which the steps or events of a method, algorithm or ensuing model disclosed herein may be embodied in a processor-executable software module, which may reside on a tangible, non-transitory version of such computer-readable storage medium such that the medium be in any available form that permits access to the events or steps by a processor or related part of a computer (par [0365], see Sobol).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the wearable electronic device and system for tracking location and identifying changes in salient indicators of patient health taught by Sobol with the computer-readable medium of Anushiravani to obtain a non-transitory computer-readable medium, in order to increase the accuracy of data analytic-based predictions of UTIs or other adverse health conditions, as suggested by Sobol in paragraph [0226].
Regarding claim 10, Anushiravani in view of Sobol teaches the non-transitory computer-readable medium of claim 8, wherein the audio data is captured by a digital stethoscope (The feature extraction method shown at the top of FIG. 3A extracts low dimensional features from time-domain audio signals captured by one or more microphones and a digital stethoscope or other auditory data, Fig. 3A, par [0087], see Anushiravani).
13. Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Anushiravani et al. U.S. Patent Application Publication 20200388287 (hereinafter, “Anushiravani”) in view of Sobol et al. U.S. Patent Application Publication 20190209022 (hereinafter, “Sobol”), and further in view of Alqudah et al., "Deep learning models for detecting respiratory pathologies from raw lung auscultation sounds", Soft Computing (2022), 26 September 2022, pages 13405-13429 (hereinafter, “Alqudah”).
Regarding claim 9, Anushiravani in view of Sobol teaches the non-transitory computer-readable medium of claim 8. However, Anushiravani in view of Sobol does not explicitly disclose wherein the one or more diagnoses include COPD, healthy, URTI, Bronchiectasis, Pneumonia, and Bronchiolitis.
Alqudah teaches deep learning models for detecting respiratory pathologies from raw lung auscultation sounds (see Title), in which the data used incorporated two different datasets, both consisting of stethoscope lung sounds classified with different respiratory diseases; Table 1 provides a detailed overview of the used dataset and the four datasets from their merging, and each dataset is discussed in detail in the next two sections (page 13412, left column, last paragraph – right column, first paragraph, see Alqudah). Table 1 teaches normal, asthma, bronchiectasis, bronchiolitis, COPD, LRTI, pneumonia, and URTI (see Table 1, page 13410). See also the confusion matrix in Fig. 4(B) on page 13413, see Alqudah.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the deep learning models for detecting respiratory pathologies from raw lung auscultation sounds taught by Alqudah with the non-transitory computer-readable medium of Anushiravani in view of Sobol to obtain wherein the one or more diagnoses include COPD, healthy, URTI, Bronchiectasis, Pneumonia, and Bronchiolitis, in order to improve the diagnosis performance of many diseases, especially respiratory diseases, as suggested by Alqudah in the Abstract.
14. Claims 11, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Anushiravani et al. U.S. Patent Application Publication 20200388287 (hereinafter, “Anushiravani”) in view of Sobol et al. U.S. Patent Application Publication 20190209022 (hereinafter, “Sobol”), and further in view of Park et al. U.S. Patent Application Publication 20240087265 (hereinafter, “Park”).
Regarding claim 11, Anushiravani in view of Sobol teaches the non-transitory computer-readable medium of claim 8. Alqudah further teaches that, for instance, with images, one might do things like rotating the image slightly, cropping or scaling it, or modifying colors or lighting (see page 13414, right column, last paragraph, see Alqudah).
However, Anushiravani in view of Sobol does not explicitly disclose wherein the extracting of one or more numerical features comprises converting the plurality of images into separate matrix arrays comprised of numerical translations of pixel colors.
Park teaches multidimensional image editing from an input image (see Title), in which the 2D image conversion module 102 is generally responsible for converting a 2D image (e.g., one of the runtime images 105-3) into a set of vectors that represent the 2D image. For example, each pixel color value of the 2D image is converted to a numerical value that represents the pixel color value. In some embodiments, such numerical value ranges from 0 to 1. In some embodiments, one or more machine learning models or pre-processing components are responsible for such conversion. For example, a Convolutional Neural Network (CNN) can receive an encoded jpeg image and decode the image into a 3-channel representation, a 3-dimensional matrix (or list, array, or tensor) where each dimension represents a color channel: R, G, and B (Red, Green, Blue), and where each matrix is composed of numbers representing pixel intensity (Fig. 1, par [0032], see Park).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the multidimensional image editing from an input image taught by Park with the non-transitory computer-readable medium of Anushiravani in view of Sobol to obtain wherein the extracting of one or more numerical features comprises converting the plurality of images into separate matrix arrays comprised of numerical translations of pixel colors, in order to improve computer resource consumption, as suggested by Park in paragraph [0028].
Regarding claim 13, Anushiravani in view of Sobol in view of Park teaches the non-transitory computer-readable medium of claim 11. Anushiravani in view of Sobol in view of Park, as modified, teaches wherein the matrix arrays (FIG. 13 illustrates a feature extraction technique for convolutional neural networks, according to an embodiment. In another example embodiment, the features extracted from each sensor at each timestamp are fit into a matrix (1305) instead of concatenated together into a feature vector as previously described in reference to FIG. 3 (305), par [0130], see Anushiravani) include one or more of respiratory oscillations (see patterns of frame size 302, 303, Fig. 3A, par [0088]; abnormal breathing pattern, par [0135], see Anushiravani), pitch content (wheezing, par [0147], see Anushiravani), amplitude of breathing noises (see amplitude of frame size 302, Fig. 3A, par [0088], see Anushiravani), peaks and valleys in the audio data (see peaks and valleys of frame size 302, Fig. 3A, par [0088], see Anushiravani), and chord sequences in the audio data (see sequences 303 of frame size 302, Fig. 3A, par [0088], see Anushiravani). The motivation is to improve determination of a disease state for a user, as suggested by Anushiravani in paragraph [0113].
Regarding claim 14, Anushiravani in view of Sobol in view of Park teaches the non-transitory computer-readable medium of claim 11. Anushiravani in view of Sobol in view of Park, as modified, teaches that the matrix arrays are separate numpy matrix arrays: for the sensors 121 mentioned previously, certain physiological parameters are monitored, such as respiration rate, heart rate, and unusual breathing patterns (such as wheezing or the like) (see Fig. 2F, par [0331], see Sobol). In one form, an open-source machine learning core library such as NumPy is used for performing fast linear algebra and related scientific computing within the Python programming language. NumPy provides support operations for multidimensional array and matrix (but not on scalar quantities) data structures, along with a large collection of high-level mathematical functions to operate on these arrays (par [0241], see Sobol). In one form, the process of segmenting the data may be performed in a random manner, while in another by known algorithms, such as regularization algorithms and others found in a Scikit-Learn function library for the previously-mentioned Pandas dataframes or NumPy arrays. All such data set splitting helps ensure that the model is not overfitting, or that the predictor variables associated with the input data avoid a covariate shift associated with an improper choice of training and testing data sets 1610, 1630 (Fig. 6, par [0263], see Sobol). The motivation is to increase the accuracy of data analytic-based predictions of UTIs or other adverse health conditions, as suggested by Sobol in paragraph [0226].
15. Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Anushiravani et al. U.S. Patent Application Publication 20200388287 (hereinafter, “Anushiravani”) in view of Sobol et al. U.S. Patent Application Publication 20190209022 (hereinafter, “Sobol”) in view of Park et al. U.S. Patent Application Publication 20240087265 (hereinafter, “Park”) in view of Xing U.S. Patent Application Publication 20180276540, and further in view of Peleg et al. U.S. Patent Application Publication 20190005976 (hereinafter, “Peleg”).
Regarding claim 12, Anushiravani in view of Sobol in view of Park teaches the non-transitory computer-readable medium of claim 11. Anushiravani in view of Sobol in view of Park, as modified, teaches that researchers employed different types of machine learning algorithms with handcrafted features like Mel-frequency cepstral coefficient (MFCC) features in a support-vector machine (SVM) or fed the spectrogram images to the convolutional neural network (CNN) (page 13407, left column, last paragraph, see Alqudah).
However, Anushiravani in view of Sobol in view of Park does not explicitly disclose wherein the plurality of images include one or more of mel-frequency cepstral and chromagram.
Xing teaches modeling of the latent embedding of music using a deep neural network (see Title), in which, in some embodiments, other time-frequency analysis may be used as known in the art, for example, mel-frequency analysis and/or mel-frequency cepstrum (MFC) or mel-frequency cepstral coefficients analysis (MFCC) (see par [0041], see Xing). Xing also teaches other acoustic signal representations, such as spectral contrast and music harmonics such as tonal centroid features (tonnetz), and all the acoustic signal representations, e.g., chromagram, tempogram, mel-spectrogram (see par [0052], see Xing).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the modeling of the latent embedding of music using a deep neural network taught by Xing with the non-transitory computer-readable medium of Anushiravani in view of Sobol in view of Park to obtain wherein the plurality of images include one or more of mel-frequency cepstral and chromagram, in order to perform backward propagation to artificially construct an audio hyper-image, as suggested by Xing in paragraph [0062].
Anushiravani in view of Sobol in view of Park in view of Xing, as modified, teaches mel-frequency analysis and/or mel-frequency cepstrum (MFC) or mel-frequency cepstral coefficients analysis (MFCC) (see par [0041], see Xing).
However, Anushiravani in view of Sobol in view of Park in view of Xing does not explicitly disclose wherein the plurality of images include one or more of mel-scaled spectrogram.
Peleg teaches a method and system for enhancing a speech signal of a human speaker in a video using visual information (see Title), in which, regarding audio spectrogram manipulation, generation of a spectrogram may be done by applying a short-time Fourier transform (STFT) to the waveform signal, and a mel-scale spectrogram is computed by multiplying the spectrogram by a mel-spaced filterbank (par [0093], see Peleg).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method and system for enhancing a speech signal of a human speaker in a video using visual information taught by Peleg with the non-transitory computer-readable medium of Anushiravani in view of Sobol in view of Park in view of Xing to obtain wherein the plurality of images include one or more of mel-scaled spectrogram, in order to achieve a significant improvement in enhancement performance in different scenarios, as suggested by Peleg in paragraph [0050].
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CON P TRAN whose telephone number is (571) 272-7532. The examiner can normally be reached M-F (8:30 AM - 5:00 PM) ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VIVIAN C. CHIN can be reached at 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.P.T/Examiner, Art Unit 2695
/VIVIAN C CHIN/Supervisory Patent Examiner, Art Unit 2695