DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because it recites a “computer program product”. The computer program product as disclosed in the specification [Paragraph 0112] is a program/software. It does not fall into one of the statutory categories.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 6 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. It recites “the analyzing the image signal authentication the person as authorized to operate the device based on the identity” (emphasis added). It is not clear what is meant by “analyzing the image signal authentication the person as authorized”.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-4, 8, 10, 12-16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cagle (US Patent Application Publication No. 2020/0360098).
Regarding claim 1, Cagle teaches a computer-implemented method for voice control of a device comprising:
recording an audio signal via an audio recording device (Paragraphs 0007, 0031-0033, 0036, 0040-0041 voice sensors/microphone recording voice);
recording an image signal of an environment of the device via an image recording device; analyzing the image signal to provide an image analysis result (Paragraphs 0009, 0026-0027, 0041 camera capturing user images and analyzing for lip motions);
processing the audio signal using the image analysis result to provide an audio analysis result (Paragraphs 0036, 0041 lip-synched voice signal);
generating a control signal for controlling the device based on the audio analysis result; and inputting the control signal into the device (Abstract, Paragraphs 0034, 0036, 0038, 0040-0041 voice signal based commands to surgical instrument) (Paragraphs 0007-0015, 0025-0051 for complete details).
Regarding claim 2, Cagle teaches wherein the audio signal contains a voice input of a person, and the image signal contains an image of the person (Paragraphs 0007, 0009, 0031, 0033, 0040-0041).
Regarding claim 3, Cagle teaches wherein the recording the image signal comprises: aligning the image recording device onto the person (Paragraphs 0009, 0041 aligning onto the face of designated user).
Regarding claim 4, Cagle teaches wherein the analyzing the image signal comprises: detecting a speech activity of the person, and the image analysis result comprises the speech activity (Paragraphs 0009, 0041 spoken voice activity and lip movement matched).
Regarding claim 8, Cagle teaches generating a verification signal based on the image analysis result to confirm a voice input contained in the audio signal, and the generating the control signal is based on the verification signal (Paragraphs 0009, 0041).
Regarding claim 10, Cagle teaches wherein the image analysis result comprises a speech activity of a person and the verification signal is based on a determination of a temporal coherence of the speech activity and the voice input (Paragraphs 0009, 0041 lip motion and voice signal at similar time).
Regarding claim 12, Cagle teaches a voice analysis device for voice control of a device comprising:
an interface (Paragraph 0031 wireless/data cable) configured to receive an audio signal recorded via an audio recording device (Fig. 2 items 212, 214, Fig. 4 items 404, 406, Paragraphs 0007, 0031-0033, 0036, 0040-0041 voice sensors/microphone recording voice) and an image signal of an environment of the device recorded via an image recording device (Fig. 4 item 402, Paragraphs 0009, 0026-0027, 0041 camera capturing user images and analyzing for lip motions), and a control device (Fig. 1 item 115, Figs. 2-3 item 250) configured to cause the voice analysis device to, analyze the image signal to provide an image analysis result, process the audio signal using the image analysis result to provide an audio analysis result, generate a control signal to control the device based on the audio analysis result, and input the control signal into the device (Paragraphs 0007, 0031-0034, 0036, 0038-0041 lip-synched voice signal processed command to generate device control signal) (Paragraphs 0007-0015, 0025-0051 for complete details).
Regarding claim 13, Cagle teaches the voice analysis device of claim 12; and the device, wherein the device is configured to perform a medical procedure (Paragraphs 0028, 0032, 0036, 0038, 0040-0041).
Regarding claim 14, Cagle teaches a computer program product which comprises a program that, when executed by a programmable computing unit, causes the programmable computing unit to perform the method of claim 1 (Paragraph 0043).
Regarding claim 15, Cagle teaches a non-transitory computer-readable storage medium on which readable and executable program sections are stored that, when executed by a programmable computing unit, cause the programmable computing unit to perform the method of claim 1 (Paragraphs 0042-0045, 0047).
Regarding claim 16, Cagle teaches wherein the image signal is an image taken of a face of the person (Paragraphs 0009, 0041 face with lips).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 5-6, 9, 11 are rejected under 35 U.S.C. 103 as being unpatentable over Cagle as applied to claims 2, 8 above, and further in view of Ichikawa (US Patent Application Publication No. 2011/0235870).
Regarding claim 5, Cagle teaches analyzing user voice and position to identify a designated user, but Cagle does not teach recognizing the person to establish an identity of the person, and the image analysis result comprises the identity of the person.
However, in the similar field of device control, Ichikawa teaches recognizing the person to establish an identity of the person, and the image analysis result comprises the identity of the person (Paragraphs 0067-0073, 0080).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the present invention to modify Cagle to include recognizing the person to establish an identity of the person, and the image analysis result comprises the identity of the person as taught by Ichikawa so that “the object person is determined as a registered person in the facial identification result and the speech recognition result is determined as a match with the password associated with the registered person” (Ichikawa, Paragraph 0072).
Regarding claim 6, Ichikawa teaches the analyzing the image signal authentication the person as authorized to operate the device based on the identity, and at least one of the processing the audio signal, the generating the control signal, or the inputting the control signal is performed only if the person has been authenticated as authorized to operate the device (Paragraphs 0072-0073, 0080).
Regarding claim 9, Cagle does not explicitly teach the image analysis result comprises a detection of a person and the verification signal is based on a presence of the person.
However, in the similar field of device control, Ichikawa teaches the image analysis result comprises a detection of a person and the verification signal is based on a presence of the person (Paragraphs 0067-0073, 0080).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the present invention to modify Cagle to include the image analysis result comprises a detection of a person and the verification signal is based on a presence of the person as taught by Ichikawa so that “the object person is determined as a registered person in the facial identification result and the speech recognition result is determined as a match with the password associated with the registered person” (Ichikawa, Paragraph 0072).
Regarding claim 11, Cagle teaches the verification signal is based on an authentication of the person as authorized to operate the device based on the identity (Paragraph 0032), but Cagle does not explicitly teach the image analysis result comprises an identity of a person to authenticate the person as authorized to operate the device based on the identity.
However, in the similar field of device control, Ichikawa teaches the image analysis result comprises an identity of a person to authenticate the person as authorized to operate the device based on the identity (Paragraphs 0067-0073, 0080).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the present invention to modify Cagle to include the image analysis result comprises an identity of a person to authenticate the person as authorized to operate the device based on the identity as taught by Ichikawa so that “the object person is determined as a registered person in the facial identification result and the speech recognition result is determined as a match with the password associated with the registered person” (Ichikawa, Paragraph 0072).
Claims 7, 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Cagle as applied to claims 1, 2 above, and further in view of Kim (US Patent Application Publication No. 2018/0268812).
Regarding claim 7, Cagle teaches the generating the control signal is based on the voice data stream, but Cagle does not teach wherein the processing the audio signal comprises: detecting a start of a voice input in the audio signal based on the image analysis result, detecting an end of the voice input based on the image analysis result, and providing a voice data stream based on the audio signal between the detected start and the detected end as the audio analysis result.
However, in the similar field of recognition, Kim teaches detecting a start of a voice input in the audio signal based on the image analysis result, detecting an end of the voice input based on the image analysis result, and providing a voice data stream based on the audio signal between the detected start and the detected end as the audio analysis result (Paragraphs 0004, 0072-0078).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the present invention to modify Cagle to include detecting a start of a voice input in the audio signal based on the image analysis result, detecting an end of the voice input based on the image analysis result, and providing a voice data stream based on the audio signal between the detected start and the detected end as the audio analysis result as taught by Kim in order to “reduce false positive voice query detection” and “identify the occurrence of multiple voice commands within audio data” (Kim, Paragraph 0005).
Regarding claim 17, Cagle teaches the generating the control signal is based on the voice data stream, but Cagle does not teach wherein the processing the audio signal comprises: detecting a start of a voice input in the audio signal based on the image analysis result, detecting an end of the voice input based on the image analysis result, and providing a voice data stream based on the audio signal between the detected start and the detected end as the audio analysis result.
However, in the similar field of recognition, Kim teaches detecting a start of a voice input in the audio signal based on the image analysis result, detecting an end of the voice input based on the image analysis result, and providing a voice data stream based on the audio signal between the detected start and the detected end as the audio analysis result (Paragraphs 0004, 0072-0078).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the present invention to modify Cagle to include detecting a start of a voice input in the audio signal based on the image analysis result, detecting an end of the voice input based on the image analysis result, and providing a voice data stream based on the audio signal between the detected start and the detected end as the audio analysis result as taught by Kim in order to “reduce false positive voice query detection” and “identify the occurrence of multiple voice commands within audio data” (Kim, Paragraph 0005).
Regarding claim 18, Cagle teaches generating a verification signal based on the image analysis result to confirm a voice input contained in the audio signal, and the generating the control signal is based on the verification signal (Paragraphs 0009, 0041).
Claims 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Cagle and Kim as applied to claim 18 above, and further in view of Ichikawa.
Regarding claim 19, Cagle and Kim do not explicitly teach the image analysis result comprises a detection of a person and the verification signal is based on a presence of the person.
However, in the similar field of device control, Ichikawa teaches the image analysis result comprises a detection of a person and the verification signal is based on a presence of the person (Paragraphs 0067-0073, 0080).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the present invention to modify Cagle and Kim to include the image analysis result comprises a detection of a person and the verification signal is based on a presence of the person as taught by Ichikawa so that “the object person is determined as a registered person in the facial identification result and the speech recognition result is determined as a match with the password associated with the registered person” (Ichikawa, Paragraph 0072).
Regarding claim 20, Cagle teaches wherein the image analysis result comprises a speech activity of a person and the verification signal is based on a determination of a temporal coherence of the speech activity and the voice input (Paragraphs 0009, 0041 lip motion and voice signal at similar time).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HEMANT PATEL whose telephone number is (571)272-8620. The examiner can normally be reached M-F 8:00 AM - 4:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fan Tsang, can be reached at 571-272-7547. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
HEMANT PATEL
Primary Examiner
Art Unit 2694
/HEMANT S PATEL/ Primary Examiner, Art Unit 2694