DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Election/Restrictions
Claims 6-9 are withdrawn from further consideration pursuant to 37 CFR 1.142(b) as being drawn to a nonelected system, there being no allowable generic or linking claim. Election was made without traverse in the reply filed on 12/05/2025.
Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in China on 01/18/2024. It is noted, however, that applicant has not filed a certified copy of the CN202410080108.X application as required by 37 CFR 1.55.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-5 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Lipsmeier et al. (US 2023/0172526 A1) in view of Hao et al. (US 2022/0280098 A1).
Regarding claim 1, Lipsmeier et al. (‘526) teach a method for early diagnosis of Parkinson's disease (“Parkinson’s disease” see [0105]) based on multimodal deep learning (“deep learning models” see [0208]), comprising: (1) acquiring audio data of a to-be-diagnosed subject while performing a speech task (“identifying a plurality of segments of the voice recording” see [0007]); (2) preprocessing the audio data to extract a plurality of audio segments, and calculating a Mel-spectrogram of each of the plurality of audio segments (“computing one or more Mel-frequency cepstral coefficients (MFCCs) for the segments” see [0007]); and (3) inputting the Mel-spectrogram into a multimodal deep learning model to output a classification result for Parkinson's disease early diagnosis of the to-be-diagnosed subject, wherein the multimodal deep learning model comprises a local feature extraction module, an audio feature extraction module, a feedforward network and a cross-attention module (“classifying the subject as belonging to one of the plurality of UHDRS dysarthria score classes” see [0123]); wherein step (3) is performed through steps of: (3.1) extracting audio features from the Mel-spectrogram through the audio feature extraction module (“computing one or more Mel-frequency cepstral coefficients (MFCCs) for the segments” see [0007]); and (3.2) inputting the audio features to the feedforward network, and inputting the audio features to the cross-attention module to learn a cross-modal attention weight (“threshold may be determined as a weighted average of the relative energy values assumed to correspond to signal and the relative energy values assumed to correspond to background noise” see [0042]); performing feature fusion based on the cross-modal attention weight to obtain multimodal features; and outputting the classification result based on the multimodal features (“determining the speech rate associated with the voice recording comprises computing the total number of words in the recording and dividing the total number of words by the length of the recording” see [0048]).
Lipsmeier et al. fail to explicitly teach wherein each of the plurality of audio segments corresponds to a synchronized one among the plurality of video segments. However, Hao et al. (‘098), from the same field of endeavor, do teach a plurality of audio segments corresponding to a synchronized one among the plurality of video segments (“Parkinson's symptoms assessor 134 may convey the one or more Parkinson's symptoms assessments to the user in the form of audio, video, text, or any other manner” see [0051]); extracting a face image sequence from each of the plurality of video segments (“video processing to extract features from collected video of an individual moving their face” see [0038]); and inputting the face image sequence as part of the multimodal features (“assess Parkinson's disease symptoms” see [0053]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lipsmeier et al. with the features of Hao et al. for the benefit of more accurate Parkinson's disease symptom identification (see Lipsmeier et al. [0009]).
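For illustration only (not part of the claim mapping or the record), the Mel-spectrogram computation recited in step (2) of claim 1 can be sketched in NumPy. All parameter values (16 kHz sample rate, 512-point FFT, 40 Mel bands, etc.) are assumptions for the sketch and are not drawn from the claims or the cited references:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal and apply a Hann window to each frame.
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame: shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular Mel filterbank spanning 0 Hz to the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    # Log-compressed Mel energies: shape (n_frames, n_mels).
    return np.log(power @ fb.T + 1e-10)

# One second of a synthetic 440 Hz tone standing in for an audio segment.
seg = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
spec = mel_spectrogram(seg)
```

The resulting time-by-frequency array is the kind of input a VGGish-style audio feature extractor, as recited in claim 3, would consume.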
Regarding claim 2, Lipsmeier et al. (‘526) in view of Hao et al. (‘098) teach the method of claim 1, wherein the local feature extraction module comprises a visual front-end network and a visual temporal network; the visual front-end network is based on ShuffleNet-V2, and further comprises a two-dimensional (2D) convolution module; the visual front-end network is configured to encode the face image sequence into a frame-based embedding sequence; and the visual temporal network consists of a video temporal convolution module, and is configured to capture facial motion visual features in different time intervals; and the step of extracting the visual features from the face image sequence through the local feature extraction module comprises: extracting facial visual features from each frame of the face image sequence through the visual front-end network, and extracting the visual features from the facial visual features through the visual temporal network, wherein the visual features are time-correlated (see Hao et al. [0036]).
Regarding claim 3, Lipsmeier et al. (‘526) in view of Hao et al. (‘098) teach the method of claim 1, wherein the audio feature extraction module is a VGGish network provided with a convolution module; the audio feature extraction module is configured to extract the audio features at different time intervals from the plurality of audio segments; and the step of extracting the audio features from the Mel-spectrogram through the audio feature extraction module comprises: inputting the Mel-spectrogram into the audio feature extraction module, and extracting the audio features through the VGGish network, wherein the audio features are time-correlated (see Lipsmeier et al. [0007]).
Regarding claim 4, Lipsmeier et al. (‘526) in view of Hao et al. (‘098) teach the method of claim 1, wherein step (3.2) comprises: after the visual features and the audio features pass through the feedforward network, inputting the visual features and the audio features into the cross-attention module with the visual features as key vectors and value vectors and the audio features as query vectors to learn the cross-modal attention weight, and acquiring visual feature-enhanced audio features based on the cross-modal attention weight; inputting the visual features and the audio features into the cross-attention module with the audio features as the key vectors and the value vectors and the visual features as the query vectors to learn the cross-modal attention weight, and acquiring audio feature-enhanced visual features based on the cross-modal attention weight; fusing the visual feature-enhanced audio features with the audio features to obtain first fused features; fusing the audio feature-enhanced visual features with the visual features to obtain second fused features; and concatenating the first fused features with the second fused features to obtain the multimodal features (see Hao et al. [0045]).
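For illustration only (not part of the claim mapping), the bidirectional cross-attention fusion recited in claim 4 can be sketched with scaled dot-product attention in NumPy. The residual-style fusion by addition and the mean-pooling before concatenation are assumptions of the sketch; the claim recites "fusing" and "concatenating" without specifying the operations:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key, value):
    # Scaled dot-product attention: the query modality attends over the other.
    d = query.shape[-1]
    weights = softmax(query @ key.T / np.sqrt(d))  # cross-modal attention weight
    return weights @ value

rng = np.random.default_rng(0)
T_a, T_v, d = 8, 6, 16                 # audio steps, video steps, feature dim
audio = rng.standard_normal((T_a, d))  # audio features from the feedforward net
visual = rng.standard_normal((T_v, d)) # visual features from the feedforward net

# Audio queries attend over visual keys/values, and vice versa.
vis_enhanced_audio = cross_attention(audio, visual, visual)   # (T_a, d)
aud_enhanced_visual = cross_attention(visual, audio, audio)   # (T_v, d)

# Fuse each enhanced stream with its original modality, then concatenate.
first_fused = vis_enhanced_audio + audio        # (T_a, d)
second_fused = aud_enhanced_visual + visual     # (T_v, d)
multimodal = np.concatenate([first_fused.mean(0), second_fused.mean(0)])  # (2d,)
```

The concatenated vector corresponds to the "multimodal features" from which the classification result is produced in step (3.2).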
Regarding claim 5, Lipsmeier et al. (‘526) in view of Hao et al. (‘098) teach the method of claim 1, wherein the multimodal deep learning model is trained through steps of: collecting a plurality of sets of audio-visual data of a plurality of test subjects while performing the speech task, wherein the plurality of test subjects comprise Parkinson's disease patients and healthy subjects; performing disease severity evaluation according to a unified Parkinson's disease rating scale (UPDRS) to annotate and score the plurality of sets of audio-visual data; constructing a training data set based on the plurality of sets of annotated audio-visual data; and, based on the training data set, training the multimodal deep learning model by means of a cross-entropy loss and a stochastic gradient descent optimizer until a preset number of iterations is reached (see Lipsmeier et al. [0123]-[0126]).
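For illustration only (not part of the claim mapping), the training scheme recited in claim 5 (cross-entropy loss, gradient-descent updates, a preset number of iterations) can be sketched for a linear classifier head on fused features. The random features, two-class labels, learning rate, and iteration count are all assumptions of the sketch:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 32))   # fused multimodal features per subject
y = rng.integers(0, 2, size=64)     # 0 = healthy subject, 1 = Parkinson's patient
W = np.zeros((32, 2))               # classifier head weights
lr, n_iters = 0.1, 200              # learning rate and preset iteration count

for _ in range(n_iters):
    probs = softmax(X @ W)
    # Gradient of the mean cross-entropy loss for a linear softmax head.
    grad = X.T @ (probs - np.eye(2)[y]) / len(y)
    W -= lr * grad                  # gradient-descent update

# Mean cross-entropy after training (ln 2 ~= 0.693 before training).
loss = -np.log(softmax(X @ W)[np.arange(64), y] + 1e-12).mean()
```

Training stops after the preset number of iterations, matching the stopping condition recited in the claim.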
Regarding claim 10, the claim is rejected mutatis mutandis in view of the rejection of claim 1 above by Lipsmeier et al. (‘526) in view of Hao et al. (‘098), including at least one non-transitory computer readable medium containing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising the operations described in relation to the disclosed methods (see Lipsmeier et al. [0130]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK REMALY whose telephone number is (571)270-1491. The examiner can normally be reached Mon - Fri 9:00 - 6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christopher Koharski can be reached at (571) 272-7230. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARK D REMALY/Primary Examiner, Art Unit 3797