Prosecution Insights
Last updated: April 19, 2026
Application No. 19/027,045

METHOD AND SYSTEM FOR EARLY DIAGNOSIS OF PARKINSON'S DISEASE BASED ON MULTIMODAL DEEP LEARNING

Non-Final OA (§103)
Filed: Jan 17, 2025
Examiner: REMALY, MARK DONALD
Art Unit: 3797
Tech Center: 3700 — Mechanical Engineering & Manufacturing
Assignee: Shandong University
OA Round: 1 (Non-Final)
Grant Probability: 69% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 7m
With Interview: 85%

Examiner Intelligence

Career Allow Rate: 69% (492 granted / 709 resolved; above average overall, -0.6% vs TC avg)
Interview Lift: +15.8% (strong), measured across resolved cases with an interview
Typical Timeline: 3y 7m average prosecution; 24 applications currently pending
Career History: 733 total applications across all art units

Statute-Specific Performance

§101: 6.8% (-33.2% vs TC avg)
§103: 37.8% (-2.2% vs TC avg)
§102: 23.6% (-16.4% vs TC avg)
§112: 28.5% (-11.5% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 709 resolved cases
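The "vs TC avg" deltas above can be back-solved to recover the black-line baseline. A quick sketch (the figures are taken from the page; the uniform result suggests a single ~40% estimate is used as the baseline for every statute):

```python
# Implied Tech Center average per statute: examiner rate minus the signed delta.
stats = {
    "101": (6.8, -33.2),
    "103": (37.8, -2.2),
    "102": (23.6, -16.4),
    "112": (28.5, -11.5),
}
implied_tc_avg = {s: round(rate - delta, 1) for s, (rate, delta) in stats.items()}
print(implied_tc_avg)  # every statute works out to 40.0
```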

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Election/Restrictions

Claims 6-9 are withdrawn from further consideration pursuant to 37 CFR 1.142(b) as being drawn to a nonelected system, there being no allowable generic or linking claim. Election was made without traverse in the reply filed on 12/05/2025.

Priority

Acknowledgment is made of applicant's claim for foreign priority based on an application filed in China on 01/18/2024. It is noted, however, that applicant has not filed a certified copy of the CN202410080108.X application as required by 37 CFR 1.55.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors.
In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-5 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Lipsmeier et al. (US 2023/0172526 A1) in view of Hao et al. (US 2022/0280098 A1).

Regarding claim 1, Lipsmeier et al. (‘526) teach a method for early diagnosis of Parkinson's disease (“Parkinson’s disease” see [0105]) based on multimodal deep learning (“deep learning models” see [0208]), comprising: (1) acquiring audio data of a to-be-diagnosed subject while performing a speech task (“identifying a plurality of segments of the voice recording” see [0007]); (2) preprocessing the audio data to extract a plurality of audio segments, and calculating a Mel-spectrogram of each of the plurality of audio segments (“computing one or more Mel-frequency cepstral coefficients (MFCCs) for the segments” see [0007]); and (3) inputting the Mel-spectrogram into a multimodal deep learning model to output a classification result for Parkinson's disease early diagnosis of the to-be-diagnosed subject, wherein the multimodal deep learning model comprises a local feature extraction module, an audio feature extraction module, a feedforward network and a cross-attention module (“classifying the subject as belonging to one of the plurality of UHDRS dysarthria score classes” see [0123]); wherein step (3) is performed through steps of: (3.1) extracting audio features from the Mel-spectrogram through the audio feature extraction
module (“computing one or more Mel-frequency cepstral coefficients (MFCCs) for the segments” see [0007]); and (3.2) inputting the audio features to the feedforward network, and the audio features to the cross-attention module to learn a cross-modal attention weight (“threshold may be determined as a weighted average of the relative energy values assumed to correspond to signal and the relative energy values assumed to correspond to background noise” see [0042]); performing feature fusion based on the cross-modal attention weight to obtain multimodal features and outputting the classification result based on the multimodal features (“determining the speech rate associated with the voice recording comprises computing the total number of words in the recording and diving the total number of words by the length of the recording” see [0048]).

Lipsmeier et al. fail to explicitly teach wherein each of the plurality of audio segments corresponds to a synchronized one among the plurality of video segments. However, Hao et al. (‘098), from the same field of endeavor, do teach a plurality of audio segments each corresponding to a synchronized one among the plurality of video segments (“Parkinson's symptoms assessor 134 may convey the one or more Parkinson's symptoms assessments to the user in the form of audio, video, text, or any other manner” see [0051]); extracting a face image sequence from each of the plurality of video segments (“video processing to extract features from collected video of an individual moving their face” see [0038]); and inputting the face image sequence as part of the multimodal features (“assess Parkinson's disease symptoms” see [0053]).

It would have been obvious to one of ordinary skill in the art at the time of the invention to modify the invention of Lipsmeier et al. with the features of Hao et al. for the benefit of more accurate Parkinson's disease symptom identification (see Lipsmeier et al. [0009]).

Regarding claim 2, Lipsmeier et al.
(‘526) in view of Hao et al. (‘098) teach the method of claim 1, wherein the local feature extraction module comprises a visual front-end network and a visual temporal network; the visual front-end network is based on ShuffleNet-V2, and further comprises a two-dimensional (2D) convolution module; the visual front-end network is configured to encode the face image sequence into a frame-based embedding sequence; and the visual temporal network consists of a video temporal convolution module, and is configured to capture facial motion visual features in different time intervals; and the step of extracting the visual features from the face image sequence through the local feature extraction module comprises: extracting facial visual features from each frame of the face image sequence through the visual front-end network, and extracting the visual features from the facial visual features through the visual temporal network, wherein the visual features are time-correlated (see Hao et al. [0036]).

Regarding claim 3, Lipsmeier et al. (‘526) in view of Hao et al. (‘098) teach the method of claim 1, wherein the audio feature extraction module is a VGGish network provided with a convolution module; the audio feature extraction module is configured to extract the audio features at different time intervals from the plurality of audio segments; and the step of extracting the audio features from the Mel-spectrogram through the audio feature extraction module comprises: inputting the Mel-spectrogram into the audio feature extraction module, and extracting the audio features through the VGGish network, wherein the audio features are time-correlated (see Lipsmeier et al. [0007]).

Regarding claim 4, Lipsmeier et al. (‘526) in view of Hao et al.
(‘098) teach the method of claim 1, wherein step (3.2) comprises: after the visual features and the audio features pass through the feedforward network, inputting the visual features and the audio features into the cross-attention module with the visual features as key vectors and value vectors and the audio features as query vectors to learn the cross-modal attention weight, and acquiring visual feature-enhanced audio features based on the cross-modal attention weight; and inputting the visual features and the audio features into the cross-attention module with the audio features as the key vectors and the value vectors and the visual features as the query vectors to learn the cross-modal attention weight, and acquiring audio feature-enhanced visual features based on the cross-modal attention weight; and fusing the visual feature-enhanced audio features with the audio features to obtain first fused features, and fusing the audio feature-enhanced visual features with the visual features to obtain second fused features, and concatenating the first fused features with the second fused features to obtain the multimodal features (see Hao et al. [0045]).

Regarding claim 5, Lipsmeier et al. (‘526) in view of Hao et al.
(‘098) teach the method of claim 1, wherein the multimodal deep learning model is trained through steps of: collecting a plurality of sets of audio-visual data of a plurality of test subjects while performing the speech task, wherein the plurality of test subjects comprise Parkinson's disease patients and healthy subjects; performing disease severity evaluation according to a unified Parkinson's disease rating scale (UPDRS) to annotate and score the plurality of sets of audio-visual data; constructing a training data set based on the plurality of sets of annotated audio-visual data; and based on the training data set, training the multimodal deep learning model by means of a cross-entropy loss and a stochastic gradient descent optimizer until a preset number of iterations is reached (see Lipsmeier et al. [0123]-[0126]).

Regarding claim 10, the claim is rejected mutatis mutandis in view of the rejection of claim 1 above by Lipsmeier et al. (‘526) in view of Hao et al. (‘098), including at least one non-transitory computer readable medium containing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising the operations described in relation to the disclosed methods (see Lipsmeier et al. [0130]).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK REMALY whose telephone number is (571) 270-1491. The examiner can normally be reached Mon - Fri 9:00 - 6:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Christopher Koharski, can be reached at (571) 272-7230.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MARK D REMALY/
Primary Examiner, Art Unit 3797
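The bidirectional cross-attention fusion recited in claim 4 (audio features querying visual keys/values and vice versa, followed by fusion and concatenation into multimodal features) can be sketched in NumPy as follows. This is a minimal illustration under assumed shapes, with random stand-in features, not the applicant's disclosed implementation; in the application the inputs would come from the VGGish-based audio module and ShuffleNet-V2-based visual module described in claims 2-3.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key, value):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = query.shape[-1]
    weights = softmax(query @ key.T / np.sqrt(d))
    return weights @ value

rng = np.random.default_rng(0)
T, d = 10, 64                         # assumed: 10 time steps, 64-dim features
audio = rng.standard_normal((T, d))   # stand-in for time-correlated audio features
visual = rng.standard_normal((T, d))  # stand-in for time-correlated visual features

# Audio as query, visual as key/value -> visual feature-enhanced audio features
vis_enhanced_audio = cross_attention(audio, visual, visual)
# Visual as query, audio as key/value -> audio feature-enhanced visual features
aud_enhanced_visual = cross_attention(visual, audio, audio)

# Residual-style fusion of each enhanced stream with its raw counterpart,
# then concatenation into the multimodal feature vector.
fused = np.concatenate(
    [vis_enhanced_audio + audio, aud_enhanced_visual + visual], axis=-1
)
print(fused.shape)  # (10, 128)
```

A classifier head (claim 1's classification output) would then operate on `fused`; the residual fusion shown here is one plausible reading of "fusing the enhanced features with the original features", not a confirmed detail of the application.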

Prosecution Timeline

Jan 17, 2025
Application Filed
Jan 10, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12599361: ULTRASONIC IMAGING SYSTEM AND METHOD
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12588848: DEVICES AND SYSTEMS FOR MEASURING MAGNETIC FIELDS
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12549866: LIGHT FIELD CAPTURE, COMPRESSION, TRANSMISSION AND RECONSTRUCTION
Granted Feb 10, 2026 (2y 5m to grant)

Patent 12543940: ATHERECTOMY CATHETER DRIVE ASSEMBLIES
Granted Feb 10, 2026 (2y 5m to grant)

Patent 12539086: METHOD, APPARATUS AND SYSTEM FOR FILTERING PULSE SIGNAL MOTION INTERFERENCE
Granted Feb 03, 2026 (2y 5m to grant)
Study what changed in these applications to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
69%
Grant Probability
85%
With Interview (+15.8%)
3y 7m
Median Time to Grant
Low
PTA Risk
Based on 709 resolved cases by this examiner. Grant probability derived from career allow rate.
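The headline projections follow from simple arithmetic on the examiner's career totals shown above (a sketch; variable names are illustrative):

```python
granted, resolved = 492, 709               # examiner's career totals from the page
allow_rate = granted / resolved            # ~0.694, reported as 69%
interview_lift = 0.158                     # +15.8 percentage points with an interview
with_interview = allow_rate + interview_lift  # ~0.852, reported as 85%
print(round(allow_rate * 100), round(with_interview * 100))  # 69 85
```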
