DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-17 were cancelled with the amendment to the claims filed 05/09/2024. New claims 18-32 were added. Pending claims are 18-32.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 18-32 are rejected under 35 U.S.C. 101 because the claims are directed toward an abstract idea without significantly more.
Claim 18 is rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (an abstract idea) and does not include additional elements that amount to significantly more than the judicial exception.
Step 1
Claim 18 is directed toward a “method”, which is a process and thus falls within a statutory category under 35 U.S.C. 101.
Step 2A, Prong 1
Claim 18 recites instructions for “acquiring, in a process of performing dialogue interaction with a user, first voice information that the user currently inputs, wherein the first voice information comprises a silent segment”; “determining, according to text information of the first voice information and historical context information of the first voice information, semantic feature information of the text information”; “determining, according to a voice fragment, which is before the silent segment, in the first voice information, phonetic feature information of the first voice information”; “acquiring temporal feature information of the first voice information”; and “determining, according to the semantic feature information, the phonetic feature information and the temporal feature information, whether the user ends voice input.” These limitations collectively recite the collection, evaluation and determination of information, including speech/silence determination using semantic and phonetic features. As characterized by USPTO guidance and case law, such activities fall within the abstract-idea groupings of mental processes (e.g., observations, evaluations, and judgments that could be performed in the human mind or with pen and paper) and organizing/transmitting information. Reference can be made to the latest patent eligibility guidelines. Accordingly, claim 18 recites an abstract idea.
Step 2A, Prong 2
The claim does not recite any specific improvement to computer functionality (e.g., a particular translation algorithm, model architecture, data structure, memory organization, caching mechanism, latency-reduction technique, or network protocol that improves the operation of the computer or network). Nor does it effect a transformation of a physical article or use the abstract idea in any other manner that imposes a meaningful limit on the claim’s scope. Therefore, the claim does not integrate the abstract idea into a practical application under Step 2A, Prong 2.
Step 2B
Beyond the abstract idea, the additional elements are the generic “server,” “one or more processors,” and “memory” performing their conventional functions. Implementing the abstract idea on generic computer components does not amount to significantly more. See Alice, 573 U.S. at 223–24.
The ordered combination of limitations mirrors the abstract idea itself, performed using routine computer operations. There is no recited unconventional hardware, no technical improvement to the functioning of the computer itself, and no unconventional arrangement of known components.
Accordingly, claim 18 does not include an “inventive concept” sufficient to transform the abstract idea into a patent-eligible application.
Therefore, claim 18 is directed to an abstract idea and does not recite additional elements that integrate the exception into a practical application or amount to significantly more than the exception itself. Claim 18 is therefore rejected under 35 U.S.C. § 101. Dependent claims 19-24 do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, when considered both individually and as an ordered combination, do not amount to significantly more than the abstract idea.
Claim 25 is directed toward a “device”, which is a machine and thus falls within a statutory category under the most recent guidelines of 35 U.S.C. 101.
Step 2A, Prong 1
Claim 25 recites instructions for “acquiring, in a process of performing dialogue interaction with a user, first voice information that the user currently inputs, wherein the first voice information comprises a silent segment”; “determining, according to text information of the first voice information and historical context information of the first voice information, semantic feature information of the text information”; “determining, according to a voice fragment, which is before the silent segment, in the first voice information, phonetic feature information of the first voice information”; “acquiring temporal feature information of the first voice information”; and “determining, according to the semantic feature information, the phonetic feature information and the temporal feature information, whether the user ends voice input.” These limitations collectively recite the collection, evaluation and determination of information, including speech/silence determination using semantic and phonetic features. As characterized by USPTO guidance and case law, such activities fall within the abstract-idea groupings of mental processes (e.g., observations, evaluations, and judgments that could be performed in the human mind or with pen and paper) and organizing/transmitting information. Reference can be made to the latest patent eligibility guidelines. Accordingly, claim 25 recites an abstract idea.
Step 2A, Prong 2
The claim does not recite any specific improvement to computer functionality (e.g., a particular translation algorithm, model architecture, data structure, memory organization, caching mechanism, latency-reduction technique, or network protocol that improves the operation of the computer or network). Nor does it effect a transformation of a physical article or use the abstract idea in any other manner that imposes a meaningful limit on the claim’s scope. Therefore, the claim does not integrate the abstract idea into a practical application under Step 2A, Prong 2.
Step 2B
Beyond the abstract idea, the additional elements are the generic “server,” “one or more processors,” and “memory” performing their conventional functions. Implementing the abstract idea on generic computer components does not amount to significantly more. See Alice, 573 U.S. at 223–24.
The ordered combination of limitations mirrors the abstract idea itself, performed using routine computer operations. There is no recited unconventional hardware, no technical improvement to the functioning of the computer itself, and no unconventional arrangement of known components.
Accordingly, claim 25 does not include an “inventive concept” sufficient to transform the abstract idea into a patent-eligible application.
Therefore, claim 25 is directed to an abstract idea and does not recite additional elements that integrate the exception into a practical application or amount to significantly more than the exception itself. Claim 25 is therefore rejected under 35 U.S.C. § 101. Dependent claims 26-31 do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, when considered both individually and as an ordered combination, do not amount to significantly more than the abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. With respect to integration of the abstract idea into a practical application, the additional element of using a generic computing device to perform the determining and data-gathering steps amounts to no more than mere instructions to apply the exception using a generic computer. The current specification, in paragraph 0104, clearly specifies that “… logic and/or steps represented in flowcharts or otherwise described herein, for example, can be considered as a sequenced list of executable instructions for implementing the logical functions, and can be embodied in any computer-readable medium for use by or in combination with an instruction execution system, apparatus, or device (such as a computer-based system, a system comprising a processor, or other systems that can fetch instructions from the instruction execution system, apparatus, or device and execute the instructions). As far as the present Description is concerned, the “computer-readable medium” may be any apparatus that can contain, store, communicate, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. More specific examples (non-exhaustive list) of the computer-readable storage medium include: an electrical connection with one or more wires (electronic device), a portable computer disk case (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable, programmable read-only memory (EPROM or flash memory), a fiber optic device, and a portable compact disk read-only memory (CDROM).
In addition, the computer-readable medium may even be paper or other suitable medium on which a program can be printed, because the program can be obtained electronically, such as by optical scanning of paper or other media followed by editing, interpretation, or other suitable processing if necessary, and then stored in a computer memory.” The additional elements have been considered both individually and as an ordered combination in the significantly-more consideration. The inclusion of the computer, memory and controller to perform the determining and acquiring steps amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computing device cannot provide an inventive concept. Therefore, claim 25 as drafted is not patent eligible. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, when considered both individually and as an ordered combination, do not amount to significantly more than the abstract idea.
Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Independent claim 25 is therefore not drawn to eligible subject matter, as it is directed to an abstract idea without significantly more. Claims 26-31 are dependent claims and do not contain subject matter that can overcome the rejection of independent claim 25. Claim 32 is directed toward a non-transitory computer readable medium with instructions to implement the method of claim 18 and is rejected under a similar rationale.
All dependent claims when analyzed as a whole are held to be patent ineligible under 35 U.S.C. §101 because any additional recited limitations fail to establish that the claims are not directed to an abstract idea for the same reasons already recited for the independent claims.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 18-32 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Krishnan et al. (US 2022/0093101 A1).
As per claims 18, 25 and 32, Krishnan et al. teach a voice dialogue processing method/device/non-transitory computer readable medium with instructions to implement said method, based on multi-modal features (0344), comprising:
acquiring, in a process of performing dialogue interaction with a user, first voice information that the user currently inputs, wherein the first voice information comprises a silent segment (0040, 0347, 0348);
determining, according to text information of the first voice information and historical context information of the first voice information, semantic feature information of the text information (0133, 0075);
determining, according to a voice fragment, which is before the silent segment, in the first voice information, phonetic feature information of the first voice information; acquiring temporal feature information of the first voice information; and
determining, according to the semantic feature information, the phonetic feature information and the temporal feature information, whether the user ends voice input (0164, 0406, 0511).
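For purposes of illustrating the examiner's reading of the recited method, the claimed end-of-input determination can be sketched as the following minimal Python orchestration. All function and key names below are hypothetical stand-ins for the recited determining/acquiring steps; they are not drawn from the Krishnan reference or from Applicant's specification.

```python
# Hypothetical sketch of the claimed end-of-input pipeline (claims 18/25/32).
# Each *_fn stands in for a recited determining/acquiring step.
def user_ended_input(first_voice, history,
                     semantic_fn, phonetic_fn, temporal_fn, decide_fn):
    """Combine semantic, phonetic and temporal features to decide whether
    the user has finished voice input."""
    # Voice fragment preceding the recited silent segment.
    pre_silence = first_voice["pre_silence_fragment"]
    # Semantic features from text plus historical context.
    semantic = semantic_fn(first_voice["text"], history)
    # Phonetic features from the pre-silence fragment.
    phonetic = phonetic_fn(pre_silence)
    # Temporal features of the first voice information.
    temporal = temporal_fn(first_voice)
    # Joint decision over all three modalities.
    return decide_fn(semantic, phonetic, temporal)
```

Any concrete feature extractors and decision model can be plugged in; the claim recites only that the three feature types jointly drive the end-of-input decision.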
As per claims 19 and 26, Krishnan et al., teach the method/device according to claims 18 and 25, wherein the determining, according to text information of the first voice information and historical context information of the first voice information, semantic feature information of the text information comprises: performing voice recognition (0446) on the first voice information to obtain text information of the first voice information (0133); acquiring historical context information of the first voice information (0274); and inputting the text information and the historical context information into a semantic representation model to obtain semantic feature information of the text information (0075).
As per claims 20 and 27, Krishnan et al., teach the method/device according to claims 18 and 25, wherein the determining, according to a voice fragment, which is before the silent segment, in the first voice information, phonetic feature information of the first voice information comprises: acquiring a voice fragment of a first preset time length, which is before the silent segment, in the first voice information; segmenting, according to a second preset time length, the voice fragment to obtain multiple voice fragments (0498); extracting respective acoustic feature information of the multiple voice fragments, and splicing the respective acoustic feature information of the multiple voice fragments, respectively, to obtain respective splicing features of the multiple voice fragments (0130, 0498, 0504, 05047); inputting the splicing features into a deep residual network to obtain phonetic feature information of the first voice information (0063, 0130, 0429, 0477).
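The segmenting-and-splicing limitation of claims 20 and 27 can be illustrated with the following toy sketch. The fragment length, the choice of acoustic features (energy and zero-crossing rate here), and all names are the examiner's hypothetical illustrations, not Krishnan's implementation; the recited deep residual network stage is omitted.

```python
# Illustrative sketch of claims 20/27: split the pre-silence voice fragment
# into fixed-length pieces, extract toy acoustic features per piece, and
# splice (concatenate) the per-piece feature vectors.
def segment(samples, fragment_len):
    """Split the pre-silence fragment into fixed-length sub-fragments."""
    return [samples[i:i + fragment_len]
            for i in range(0, len(samples) - fragment_len + 1, fragment_len)]

def acoustic_features(fragment):
    """Toy acoustic features: mean energy and zero-crossing rate."""
    energy = sum(s * s for s in fragment) / len(fragment)
    zcr = sum(1 for a, b in zip(fragment, fragment[1:]) if a * b < 0) / len(fragment)
    return [energy, zcr]

def splice(fragments):
    """Concatenate (splice) the per-fragment feature vectors."""
    spliced = []
    for frag in fragments:
        spliced.extend(acoustic_features(frag))
    return spliced
```

In the claim, the spliced features would then be fed to a deep residual network to obtain the phonetic feature information.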
As per claims 21 and 28, Krishnan et al., teach the method/device according to claims 18 and 25, wherein the acquiring temporal feature information of the first voice information comprises: acquiring a voice duration, a speaking speed and a text length of the first voice information (0498); inputting the voice duration, the speaking speed and the text length into a pre-trained multi-layer perceptron MLP model to obtain temporal feature information of the first voice information (0164, 0406, 0511).
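The temporal-feature limitation of claims 21 and 28 amounts to a small multi-layer perceptron over a three-element input (voice duration, speaking speed, text length). The forward pass can be sketched as follows; the layer sizes and the placeholder weights are hypothetical, not trained values from any reference.

```python
import math

# Hypothetical MLP forward pass for claims 21/28: maps
# (duration, speaking speed, text length) to a temporal feature vector.
def mlp_forward(x, w1, b1, w2, b2):
    """One hidden layer with tanh activation; returns the output vector."""
    # Hidden layer: tanh(W1 x + b1)
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    # Output layer: W2 h + b2 (the temporal feature information)
    return [sum(wi * hi for wi, hi in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]
```

A pre-trained model of this shape would supply the weights; the claim itself recites only the three inputs and the MLP mapping.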
As per claims 22 and 29, Krishnan et al., teach the method/device according to claims 18 and 25, wherein the determining, according to the semantic feature information, the phonetic feature information and the temporal feature information, whether the user ends voice input comprises: inputting the semantic feature information, the phonetic feature information and the temporal feature information into a multi-modal fusion model (0120, 0156, 0447); determining, according to an output result of the multi-modal fusion model, whether the user ends voice input (0061).
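The fusion limitation of claims 22 and 29 can be illustrated as concatenating the three modality vectors and thresholding a learned score. The linear-plus-sigmoid scorer, weights, and threshold below are the examiner's assumed placeholders, not the fusion model of the Krishnan reference.

```python
import math

# Illustrative sketch of claims 22/29: fuse semantic, phonetic and temporal
# feature vectors and decide whether the user has ended voice input.
def fuse_and_decide(semantic, phonetic, temporal, weights, bias, threshold=0.5):
    """Concatenate the three modality vectors, score them, and threshold."""
    fused = semantic + phonetic + temporal          # early fusion by concatenation
    logit = sum(w * f for w, f in zip(weights, fused)) + bias
    score = 1.0 / (1.0 + math.exp(-logit))          # sigmoid probability
    return score >= threshold                        # True: treat input as ended
```

In practice the scoring layer would be part of a trained multi-modal fusion model whose output drives the end-of-input determination.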
As per claims 23 and 30, Krishnan et al., teach the method/device according to claims 18 and 25, further comprising: determining, in the case of determining that the user ends the voice input, first reply voice information corresponding to the first voice information, and outputting the first reply voice information (0109).
As per claims 24 and 31, Krishnan et al., teach the method/device according to claims 18 and 25, further comprising: acquiring, in the case of determining that the user does not end the voice input, second voice information input again by the user; and determining, according to the first voice information and the second voice information, corresponding second reply voice information, and outputting the second reply voice information (0109, 0261, 0283).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
The following prior art references may be used, alone or in combination, with respect to Applicant's claimed invention.
Klein et al. (US 2022/03088718 A1) teach enabling client applications to be heavily integrated with a voice assistant in order to perform commands associated with voice utterances of users via voice assistant functionality and also seamlessly cause client applications to automatically perform native functions as part of executing the voice utterance. Such heavy integration also allows particular embodiments to support multi-modal input from a user for a single conversational interaction. In this way, client application user interface interactions, such as clicks, touch gestures, or text inputs, are executed as an alternative or in addition to the voice utterances.
Brunn et al. (US 2020/0042595 A1) teach a method, a device and a computer program product for processing a segment. In the method, a property of at least one of a first segment and a second segment in a segment set is obtained. The segment set includes a plurality of segments belonging to at least one conversation. The second segment occurs after the first segment. A boundary feature of at least one of the first segment and the second segment is determined based on the property. The boundary feature indicates whether there is a boundary of a conversation after the first segment.
Vig et al. (US 2018/0113854 A1) teach providing a system for automatically extracting conversational structure from a voice record based on lexical and acoustic features. The system also aggregates business-relevant statistics and entities from a collection of spoken conversations. The system may infer a coarse-level conversational structure based on fine-level activities identified from extracted acoustic features. The system improves significantly over previous systems by extracting structure based on lexical and acoustic features. This enables extracting conversational structure on a larger scale and finer level of detail than previous systems, and can feed an analytics and business intelligence platform, e.g., for customer service phone calls. During operation, the system obtains a voice record. The system then extracts a lexical feature using automatic speech recognition (ASR). The system extracts an acoustic feature. The system then determines, via machine learning and based on the extracted lexical and acoustic features, a coarse-level structure of the conversation.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIJAY B CHAWAN whose telephone number is (571)272-7601. The examiner can normally be reached 7-5 Monday through Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/VIJAY B CHAWAN/Primary Examiner, Art Unit 2658