Prosecution Insights
Last updated: April 18, 2026
Application No. 18/535,106

ARTICULATION ABNORMALITY DETECTION METHOD, ARTICULATION ABNORMALITY DETECTION DEVICE, AND RECORDING MEDIUM

Status: Final Rejection — §103
Filed: Dec 11, 2023
Examiner: SHAIKH, ZEESHAN MAHMOOD
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: Panasonic Holdings Corporation
OA Round: 2 (Final)

Grant Probability: 52% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 3y 2m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 52% (grants 52% of resolved cases; 16 granted / 31 resolved; -10.4% vs TC avg)
Interview Lift: strong, +55.0% across resolved cases with interview
Typical Timeline: 3y 2m avg prosecution; 32 applications currently pending
Career History: 63 total applications across all art units

Statute-Specific Performance

§101: 25.7% (-14.3% vs TC avg)
§103: 45.8% (+5.8% vs TC avg)
§102: 17.3% (-22.7% vs TC avg)
§112: 5.8% (-34.2% vs TC avg)

Deltas are measured against the Tech Center average estimate • Based on career data from 31 resolved cases
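The figures above can be sanity-checked with a few lines of arithmetic. The sketch below (illustrative only; variable names are my own) recovers the Tech Center average implied by each statute's delta, and recomputes the career allow rate from the raw grant counts:

```python
# Examiner's statute-specific rates and deltas vs the Tech Center
# average, as reported above (in percent).
statute_rates = {"101": (25.7, -14.3), "103": (45.8, +5.8),
                 "102": (17.3, -22.7), "112": (5.8, -34.2)}

# Implied TC average per statute: examiner rate minus the reported delta.
implied_tc_avg = {s: round(rate - delta, 1)
                  for s, (rate, delta) in statute_rates.items()}
# Every statute implies the same 40.0% baseline, which suggests the dashboard
# compares against one overall TC figure rather than per-statute averages.

# Career allow rate from the raw counts: 16 granted of 31 resolved.
allow_rate = round(16 / 31 * 100, 1)  # 51.6, displayed as 52%
```

Note that the 51.6% career allow rate and the 40.0% implied baseline are different metrics (overall allowance vs statute-specific performance), so the -10.4% delta on the allow rate is not expected to reconcile with the per-statute deltas.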

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

This communication is responsive to the applicant’s amendment dated 1/22/2026. The applicant amended claims 1-4 and 8. Additionally, the applicant has added new claims 10-14.

Response to Arguments

Applicant’s arguments, see Remarks (pg. 7, line 12 – pg. 8, line 10), filed 1/22/2026, with respect to claim 8 have been fully considered and are persuasive. The 35 U.S.C. 112(f) claim interpretation of claim 8 has been withdrawn.

Applicant’s arguments with respect to 35 U.S.C. 103 (see Remarks pg. 8, line 11 – pg. 11, line 21) for claims 1-9 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Given the amendments, a new ground of rejection is provided below. Additionally, the applicant argues that the prior art fails to teach a certain portion of dependent claim 3. The examiner has provided additional support that reads on that particular limitation below.

Applicant’s arguments, see Remarks (pg. 12, line 1 – pg. 32, line 15), filed 1/22/2026, with respect to claims 1-9 have been fully considered. While the examiner does not agree with the reasons the applicant finds the claims to be directed to patent-eligible subject matter under 35 U.S.C. 101, the examiner nevertheless finds the claims directed to patent-eligible subject matter under Step 2A, Prong 2, due to a technical improvement of a neural network for processing this type of speaker/utterance data. The examiner views this as an improvement in machine learning for articulation abnormality detection. The applicant has amended the claims to incorporate a specifically trained neural network for a particular usage. As a result, the 35 U.S.C. 101 rejection has been withdrawn.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-14 are rejected under 35 U.S.C. 103 as being unpatentable over Moran et al. US 20070005357 A1 (hereinafter Moran) in view of Chang et al. US 20220208173 A1 (hereinafter Chang) in view of Berisha et al. US 20230045078 A1 (hereinafter Berisha).

Regarding independent claims 1 and 8, Moran teaches an articulation abnormality detection method comprising / an articulation abnormality detection device comprising: a processor (FIG. 1, 16); and memory, wherein the processor, using the memory (FIG. 1, 16): calculating a first acoustic feature from first utterance data of a first speaker at present (Tables 2 and 3 show calculations of acoustic features; [0021] “the feature extraction engine processes each of the speech samples in the database 60 to provide their respective feature vectors”); calculating a degree of similarity between a second speaker feature of the first speaker and the first speaker feature (FIG. 1, [0035] “a classification engine 70 is arranged to compare feature vectors for respective speech samples (probes) provided by remote users of the client devices 12, 14 or 16 to feature vectors from the database 60 either as they are written to the database or offline in batch mode.”, examiner interprets the comparison of vectors as calculating the degree of similarity; [0016] “The mixed gender 4337 database contains 631 voice recordings, each with an associated clinical diagnosis--573 from patients exhibiting a pathology and 58 for normal patients”, examiner interprets normal patients as the speaker who articulated properly); and determining whether the first speaker has an articulation abnormality, based on the degree of similarity ([0038] “While a sustained phonation, recorded in a controlled environment, can be classified as normal or pathologic with accuracy greater than 90%, results for the first embodiment indicate that a telephone quality speech can be classified as normal or pathologic with an accuracy of 74.2%”).
Moran fails to teach: calculating, from the first acoustic feature, a first speaker feature of the first utterance data by a trained deep neural network (DNN) trained using a plurality of sets of information identifying a speaker and utterance data of the speaker and using, as an input, an acoustic feature of utterance data to output a speaker feature indicating a speaker characteristic for identifying a speaker of the utterance data; and the second speaker feature being calculated by the DNN from a second acoustic feature calculated from second utterance data of the first speaker when the first speaker articulated properly.

However, Chang teaches calculating, from the first acoustic feature, a first speaker feature of the first utterance data by a trained deep neural network (DNN) trained using a plurality of sets of information identifying a speaker and utterance data of the speaker and using, as an input, an acoustic feature of utterance data to output a speaker feature indicating a speaker characteristic for identifying a speaker of the utterance data (FIG. 3, [0154] “the deep neural network comprises algorithm for decoding the physiological feature signal to a speech pattern signal”; [0158] “In some embodiments, a decoder is configured to be trained to decode from the physiological feature signals to acoustic feature signals, coded as 25 dimensional mel frequency cepstral coefficients”; [0289] “The training dataset comprised simultaneous recordings of speech acoustics and EMA data from eight participants reading aloud sentences from the MOCHA-TIMIT dataset”).

Moran and Chang are considered to be analogous to the claimed invention because both are in the same field of speech disorder detection.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the remote user-assessment techniques of Moran with the technique of using a DNN to calculate acoustic features taught by Chang, in order to provide methods and systems of encoding and decoding speech from a subject using articulatory physiology (see Chang [Abstract]).

Moran in view of Chang fails to teach the second speaker feature being calculated by the DNN from a second acoustic feature calculated from second utterance data of the first speaker when the first speaker articulated properly. However, Berisha teaches the second speaker feature being calculated by the DNN from a second acoustic feature calculated from second utterance data of the first speaker when the first speaker articulated properly ([0025] “the multi-dimensional statistical signature comprising a plurality of features extracted from the input signal; evaluate the multi-dimensional statistical signature against one or more baseline statistical signatures of speech production ability using a deep learning convolutional neural network trained using a training dataset, thereby generating a speech change identification signal”, examiner interprets baseline as proper articulation).

Moran in view of Chang and Berisha are considered to be analogous to the claimed invention because all are in the same field of speech disorder detection. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the remote user-assessment techniques of Moran in view of Chang with the technique of comparing signals for articulation abnormalities taught by Berisha, in order to improve systems, devices, and methods for evaluating or analyzing complex audio signals using multi-dimensional statistical signatures and machine learning algorithms (see Berisha [Abstract]).
Regarding claim 2, Moran in view of Chang in view of Berisha teaches all of the limitations of claim 1, upon which claim 2 depends. Additionally, Moran teaches in the determining, the first speaker is determined to have an articulation abnormality when the degree of similarity is less than a predetermined first threshold ([0036] “It will be seen that the classification engine could be re-defined to use Hidden Markov Models which would utilise features extracted in the time domain and discriminate between pathological and normal using a non-linear network”, examiner interprets there to be limits/thresholds for the features to provide a diagnosis).

Regarding claim 3, Moran in view of Chang in view of Berisha teaches all of the limitations of claim 1, upon which claim 3 depends. Additionally, Moran teaches in the calculating of the first acoustic feature, first acoustic features including the first acoustic feature are calculated from respective items of first utterance data of the first speaker including the first utterance data ([0022] “Preferably, the features extracted include the fundamental frequency (F0), jitter (short-term, cycle to cycle, perturbation in the fundamental frequency of the voice), shimmer (short-term, cycle to cycle, perturbation in the amplitude of the voice), signal-to-noise ratios and harmonic-to-noise ratios”, examiner interprets fundamental frequency, shimmer, and signal-to-noise ratios as items used to calculate acoustic features; [0023] “Referring to Tables 2 and 3, pitch and amplitude perturbation measures were calculated by segmenting the speech waveform (2-5 seconds in length) into overlapping ‘epochs’”), in the calculating of the degree of similarity, degrees of similarity between the second speaker feature and the first speaker features are calculated, the degrees of similarity including the degree of similarity ([0038] “While a sustained phonation, recorded in a controlled environment, can be classified as normal or pathologic with accuracy greater than 90%, results for the first embodiment indicate that a telephone quality speech can be classified as normal or pathologic with an accuracy of 74.2%.”), and in the determining: a variance of the degrees of similarity is calculated ([0036] “It will be seen that the classification engine could be re-defined to use Hidden Markov Models which would utilise features extracted in the time domain and discriminate between pathological and normal using a non-linear network”, examiner interprets features to be the variance); and when the variance is greater than a predetermined second threshold, the speaker is determined to have an articulation abnormality ([0038] “While a sustained phonation, recorded in a controlled environment, can be classified as normal or pathologic with accuracy greater than 90%, results for the first embodiment indicate that a telephone quality speech can be classified as normal or pathologic with an accuracy of 74.2%”). Additionally, Chang teaches in the calculating of the first speaker feature, first speaker features including the first speaker feature are calculated from the first acoustic features by using the trained DNN (FIG. 3, [0154] “the deep neural network comprises algorithm for decoding the physiological feature signal to a speech pattern signal”).

Regarding claim 4, Moran in view of Chang in view of Berisha teaches all of the limitations of claim 1, upon which claim 4 depends. Additionally, Chang teaches calculating an acoustic statistic from the first utterance data (FIG. 15, [0020] “Acoustics are represented as spectral features (e.g. Mel-frequency cepstral coefficients (MFCCs)) extracted from the speech waveform. FIG. 15D, Decoded signals are synthesized into an acoustic waveform”, examiner interprets acoustic waveforms as acoustic statistics), wherein the determining includes determining whether the first speaker has an articulation abnormality, based on the degree of similarity and the acoustic statistic ([0264] “The methods and systems of the present disclosure also find use in diagnosing speech motor disorders (e.g., aphasia, dysarthria, stuttering, and the like)”).

Regarding claim 5, Moran in view of Chang in view of Berisha teaches all of the limitations of claim 4, upon which claim 5 depends. Additionally, Chang teaches the acoustic statistic includes a pitch variation ([0061] “In certain aspects, the speech sound includes speech information such as formants (e.g., spectral peaks of the sound spectrum |P(f)| of the voice) and pitch (e.g., how ‘high’ or ‘low’ the speech sound is depending on the rate of vibration of the vocal chords) which are encoded in the speech production signals and capable of being decoded from the detected speech production signals and/or patterns thereof.”), and in the determining, a possibility of the speaker having an articulation abnormality is determined to be higher for smaller pitch variations ([0264] “The methods and systems of the present disclosure also find use in diagnosing speech motor disorders (e.g., aphasia, dysarthria, stuttering, and the like)”).

Regarding claim 6, Moran in view of Chang in view of Berisha teaches all of the limitations of claim 4, upon which claim 6 depends. Additionally, Chang teaches the acoustic statistic includes waveform periodicity (FIG. 15, [0020] “Acoustics are represented as spectral features (e.g. Mel-frequency cepstral coefficients (MFCCs)) extracted from the speech waveform. FIG. 15D, Decoded signals are synthesized into an acoustic waveform”), and in the determining, a possibility of the speaker having an articulation abnormality is determined to be higher for shorter waveform periodicity ([0264] “The methods and systems of the present disclosure also find use in diagnosing speech motor disorders (e.g., aphasia, dysarthria, stuttering, and the like)”).

Regarding claim 7, Moran in view of Chang in view of Berisha teaches all of the limitations of claim 4, upon which claim 7 depends. Additionally, Chang teaches wherein the acoustic statistic includes skewness (FIG. 15, [0020] “Acoustics are represented as spectral features (e.g. Mel-frequency cepstral coefficients (MFCCs)) extracted from the speech waveform. FIG. 15D, Decoded signals are synthesized into an acoustic waveform”, examiner interprets spectral features to include skewness.), and in the determining, a possibility of the speaker having an articulation abnormality is determined to be higher for greater skewness ([0264] “The methods and systems of the present disclosure also find use in diagnosing speech motor disorders (e.g., aphasia, dysarthria, stuttering, and the like)”).

Regarding claim 9, Moran in view of Chang in view of Berisha teaches all of the limitations of claim 1, upon which claim 9 depends.
Additionally, Chang teaches a non-transitory computer-readable recording medium for use in a computer, the recording medium having recorded thereon a computer program for causing the computer to execute the articulation abnormality detection ([0101] “Aspects of the present disclosure include a non-transitory computer readable medium storing instructions that, when executed by one or more processors and/or computing devices, cause the one or more processors and/or computing devices to perform the steps for decoding speech events in an individual, as provided herein”).

Regarding claim 10, Moran in view of Chang in view of Berisha teaches all of the limitations of claim 1, upon which claim 10 depends. Additionally, Chang teaches wherein the first utterance data is an audio signal obtained by converting a voice from the first speaker ([0345] “To transform the acoustics of all data to the target speaker, a voice conversion was applied to transform the spectral properties of each EMA speaker to match those of the target participant”), the voice being obtained using a microphone ([0343] “Speech was amplified digitally and recorded with a microphone”).

Regarding claim 11, Moran in view of Chang in view of Berisha teaches all of the limitations of claim 3, upon which claim 11 depends. Additionally, Chang teaches wherein each of the items of first utterance data is an audio signal obtained by converting a voice from the first speaker ([0345] “To transform the acoustics of all data to the target speaker, a voice conversion was applied to transform the spectral properties of each EMA speaker to match those of the target participant”), the voice being obtained using a microphone ([0343] “Speech was amplified digitally and recorded with a microphone”).

Regarding claim 12, Moran in view of Chang in view of Berisha teaches all of the limitations of claim 11, upon which claim 12 depends.
Additionally, Berisha teaches further comprising: giving an instruction to the first speaker to utter a same predetermined phrase two or more times by at least one of an indication on a display or a voice from a loudspeaker, wherein the items of first utterance data are obtained from utterances by the first speaker based on the instruction ([0069] “The notification element 114 may further provide instructions to the user 118 for providing the speech 116 (e.g., displaying a passage for the user 118 to read)”; [0078] “Speech or audio data that fails this quality control assessment may be rejected, and the user asked to repeat or redo an instructed task (or alternatively, continue passive collection of audio/speech)”).

Regarding claim 13, Moran in view of Chang in view of Berisha teaches all of the limitations of claim 1, upon which claim 13 depends. Additionally, Berisha teaches further comprising: notifying a result of the determining by at least one of an indication on a display or a voice from a loudspeaker (FIG. 1, 120, [0016] “the at least one notification signal comprises a display notification instructing the user to take action to relieve symptoms associated with respiratory tract function.”).

Regarding claim 14, Moran in view of Chang in view of Berisha teaches all of the limitations of claim 1, upon which claim 14 depends. Additionally, Berisha teaches giving an instruction to the first speaker to utter a predetermined phrase by at least one of an indication on a display or a voice from a loudspeaker ([0069] “The notification element 114 may further provide instructions to the user 118 for providing the speech 116 (e.g., displaying a passage for the user 118 to read)”, examiner interprets passage as predetermined phrase).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Im et al. (US 12573504 B2) teaches an apparatus for diagnosing a disease and a method for diagnosing a disease, in which: a plurality of voice signals are received to generate a first image signal and a second image signal which are image signals for each voice signal; a plurality of disease probability information for a target disease causing a voice change are extracted by using an artificial intelligence model determined according to the type of each voice signal and a generation method used to generate each image signal for the first image signal and the second image signal for each voice signal; and it is determined whether the target disease is negative or positive on the basis of the plurality of disease probability information.

Mesgarani et al. (US 20190066713 A1) teaches devices, systems, apparatus, methods, products, and other implementations, including a method comprising obtaining, by a device, a combined sound signal for signals combined from multiple sound sources in an area in which a person is located, and applying, by the device, speech-separation processing (e.g., deep attractor network (DAN) processing, online DAN processing, LSTM-TasNet processing, Conv-TasNet processing) to the combined sound signal from the multiple sound sources to derive a plurality of separated signals that each contains signals corresponding to different groups of the multiple sound sources. The method further includes obtaining, by the device, neural signals for the person, the neural signals being indicative of one or more of the multiple sound sources the person is attentive to, and selecting one of the plurality of separated signals based on the obtained neural signals. The selected signal may then be processed (amplified, attenuated).

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a).
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZEESHAN SHAIKH, whose telephone number is (703) 756-1730. The examiner can normally be reached Monday-Friday, 7:30 AM-5:00 PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ZEESHAN MAHMOOD SHAIKH/
Examiner, Art Unit 2658

/RICHEMOND DORVIL/
Supervisory Patent Examiner, Art Unit 2658
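For readers mapping the rejection onto the claims, the claimed detection flow (claims 1-3 as characterized in the action) reduces to: embed utterances with a trained speaker-identification DNN, compare against a baseline embedding taken when the speaker articulated properly, and flag an abnormality when similarity falls below a first threshold or the variance of similarities exceeds a second threshold. A minimal sketch, assuming cosine similarity as the degree of similarity and illustrative threshold values (the claims fix neither choice; the DNN embedding step is represented only by the pre-computed feature vectors):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Degree of similarity between two speaker-feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_articulation_abnormality(current_features, baseline_feature,
                                    sim_threshold=0.7, var_threshold=0.02):
    """Claim-1/claim-3 style decision: abnormal if the speaker's current
    embeddings drift from the healthy baseline (low similarity) or become
    inconsistent across utterances (high variance)."""
    sims = [cosine_similarity(f, baseline_feature) for f in current_features]
    low_similarity = np.mean(sims) < sim_threshold       # claims 1-2
    high_variance = float(np.var(sims)) > var_threshold  # claim 3
    return bool(low_similarity or high_variance)
```

Here `current_features` stands in for DNN speaker embeddings of the speaker's present utterances and `baseline_feature` for the embedding of the same speaker articulating properly; the threshold values are placeholders, not values from the application.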

Prosecution Timeline

Dec 11, 2023: Application Filed
Oct 17, 2025: Non-Final Rejection — §103
Jan 15, 2026: Examiner Interview Summary
Jan 15, 2026: Applicant Interview (Telephonic)
Jan 22, 2026: Response Filed
Apr 02, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579373
SYSTEM AND METHOD FOR SYNTHETIC TEXT GENERATION TO SOLVE CLASS IMBALANCE IN COMPLAINT IDENTIFICATION
Granted Mar 17, 2026 • 2y 5m to grant
Patent 12555575
Wakeup Indicator Monitoring Method, Apparatus and Electronic Device
Granted Feb 17, 2026 • 2y 5m to grant
Patent 12518090
LOGICAL ROLE DETERMINATION OF CLAUSES IN CONDITIONAL CONSTRUCTIONS OF NATURAL LANGUAGE
Granted Jan 06, 2026 • 2y 5m to grant
Patent 12511318
MULTI-SYSTEM-BASED INTELLIGENT QUESTION ANSWERING METHOD AND APPARATUS, AND DEVICE
Granted Dec 30, 2025 • 2y 5m to grant
Patent 12512088
METHOD AND SYSTEM FOR USER-INTERFACE ADAPTATION OF TEXT-TO-SPEECH SYNTHESIS
Granted Dec 30, 2025 • 2y 5m to grant
Study what changed to get these applications past this examiner. Based on the examiner's 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 52%
With Interview: 99% (+55.0%)
Median Time to Grant: 3y 2m
PTA Risk: Moderate

Based on 31 resolved cases by this examiner. Grant probability derived from career allow rate.
