Last updated: May 29, 2026
Application No. 18/667,260
VOICE OPERATION DEVICE THAT OPERATES OPERATED DEVICE, COMPUTER READABLE NON-TRANSITORY RECORDING MEDIUM HAVING VOICE OPERATION PROGRAM STORED THEREIN, AND VOICE OPERATING SYSTEM

Final Rejection §102
Filed: May 17, 2024
Priority: May 19, 2023 (JP 2023-083247)
Examiner: COLUCCI, MICHAEL C
Art Unit: 2655
Tech Center: 2600 (Communications)
Assignee: Kyocera Document Solutions Inc.
OA Round: 2 (Final)
Interview Optional

Examiner has a relatively high allowance rate (76%), with a +15.2% interview lift. A written response may suffice.
Based on 999 resolved cases, 2023–2026
Examiner Intelligence

COLUCCI, MICHAEL C
Career Allowance Rate: 76% (above average); 758 granted / 999 resolved; +13.9% vs TC avg
Interview Lift: +15.2% (strong), based on resolved cases with interview
Typical timeline: 3y 1m avg prosecution; 32 currently pending
Career history: 1033 total applications across all art units
Statute-Specific Performance

§101: 3.6% (-36.4% vs TC avg)
§103: 86.9% (+46.9% vs TC avg)
§102: 2.9% (-37.1% vs TC avg)
§112: 1.1% (-38.9% vs TC avg)
Based on career data from 999 resolved cases
Office Action

§102
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Response to Arguments
Applicant's arguments with respect to claims 1-5 have been considered but are moot in view of the new ground(s) of rejection. Applicant’s arguments are directed to the amended subject matter; new citations from existing prior art have been incorporated to address the claim amendments. 


Note: The claims are not directed towards patent ineligible subject matter under 35 U.S.C. 101

Step 1: IS THE CLAIM DIRECTED TO A PROCESS, MACHINE, MANUFACTURE OR COMPOSITION OF MATTER?
Yes

Step 2A.1: IS THE CLAIM DIRECTED TO A LAW OF NATURE, A NATURAL PHENOMENON (PRODUCT OF NATURE) OR AN ABSTRACT IDEA? 
No

Step 2A.2: DOES THE CLAIM RECITE ADDITIONAL ELEMENTS THAT INTEGRATE THE JUDICIAL EXCEPTION INTO A PRACTICAL APPLICATION?
No. The claims demonstrate the transmission of voice recognition results from one device to another to control the operation of the second device. This is not necessarily a practical application since two devices can process data and communicate with one another in a specific way based on the processed data as it changes.
Supported by the following:
In Finjan Inc. v. Blue Coat Systems, Inc., 879 F.3d 1299, 125 USPQ2d 1282 (Fed. Cir. 2018), the claimed invention was a method of virus scanning that scans an application program, generates a security profile identifying any potentially suspicious code in the program, and links the security profile to the application program. 879 F.3d at 1303-04, 125 USPQ2d at 1285-86. The Federal Circuit noted that the recited virus screening was an abstract idea, and that merely performing virus screening on a computer does not render the claim eligible. 879 F.3d at 1304, 125 USPQ2d at 1286. The court then continued with its analysis under part one of the Alice/Mayo test by reviewing the patent’s specification, which described the claimed security profile as identifying both hostile and potentially hostile operations. The court noted that the security profile thus enables the invention to protect the user against both previously unknown viruses and “obfuscated code,” as compared to traditional virus scanning, which only recognized the presence of previously-identified viruses. The security profile also enables more flexible virus filtering and greater user customization. 879 F.3d at 1304, 125 USPQ2d at 1286. The court identified these benefits as improving computer functionality, and verified that the claims recite additional elements (e.g., specific steps of using the security profile in a particular way) that reflect this improvement. Accordingly, the court held the claims eligible as not being directed to the recited abstract idea. 879 F.3d at 1304-05, 125 USPQ2d at 1286-87. This analysis is equivalent to the Office’s analysis of determining that the additional elements integrate the judicial exception into a practical application at Step 2A Prong Two, and thus that the claims were not directed to the judicial exception (Step 2A: NO).
Examples of claims that improve technology and are not directed to a judicial exception include: Enfish, LLC v. Microsoft Corp., 822 F.3d 1327, 1339, 118 USPQ2d 1684, 1691-92 (Fed. Cir. 2016) (claims to a self-referential table for a computer database were directed to an improvement in computer capabilities and not directed to an abstract idea); McRO, Inc. v. Bandai Namco Games Am. Inc., 837 F.3d 1299, 1315, 120 USPQ2d 1091, 1102-03 (Fed. Cir. 2016) (claims to automatic lip synchronization and facial expression animation were directed to an improvement in computer-related technology and not directed to an abstract idea); Visual Memory LLC v. NVIDIA Corp., 867 F.3d 1253,1259-60, 123 USPQ2d 1712, 1717 (Fed. Cir. 2017) (claims to an enhanced computer memory system were directed to an improvement in computer capabilities and not an abstract idea); Finjan Inc. v. Blue Coat Systems, Inc., 879 F.3d 1299, 125 USPQ2d 1282 (Fed. Cir. 2018) (claims to virus scanning were found to be an improvement in computer technology and not directed to an abstract idea); SRI Int’l, Inc. v. Cisco Systems, Inc., 930 F.3d 1295, 1303 (Fed. Cir. 2019) (claims to detecting suspicious activity by using network monitors and analyzing network packets were found to be an improvement in computer network technology and not directed to an abstract idea). Additional examples are provided in MPEP § 2106.05(a).
Regarding the December 5th 2025 Memo in light of September 26, 2025 Appeals Review Panel Decision in Ex parte Desjardins, Appeal 2024-000567 for Application 16/319,040, in deciding if a recited abstract idea does or does not direct the entire claim to an abstract idea, when a claim is considered as a whole:
Paragraph 21 of the Specification, which the Appellant cites, identifies improvements in training the machine learning model itself. Of course, such an assertion in the Specification alone is insufficient to support a patent eligibility determination, absent a subsequent determination that the claim itself reflects the disclosed improvement. See MPEP § 2106.05(a) (citing Intellectual Ventures I LLC v. Symantec Corp., 838 F.3d 1307, 1316 (Fed. Cir. 2016)). Here, however, we are persuaded that the claims reflect such an improvement. For example, one improvement identified in the Specification is to "effectively learn new tasks in succession whilst protecting knowledge about previous tasks." Spec. ¶ 21. The Specification also recites that the claimed improvement allows artificial intelligence (AI) systems to "us[e] less of their storage capacity" and enables "reduced system complexity." Id. When evaluating the claim as a whole, we discern at least the following limitation of independent claim 1 that reflects the improvement: "adjust the first values of the plurality of parameters to optimize performance of the machine learning model on the second machine learning task while protecting performance of the machine learning model on the first machine learning task." We are persuaded that constitutes an improvement to how the machine learning model itself operates, and not, for example, the identified mathematical calculation. Under a charitable view, the overbroad reasoning of the original panel below is perhaps understandable given the confusing nature of existing § 101 jurisprudence, but troubling, because this case highlights what is at stake. Categorically excluding AI innovations from patent protection in the United States jeopardizes America's leadership in this critical emerging technology.
Yet, under the panel's reasoning, many AI innovations are potentially unpatentable-even if they are adequately described and nonobvious-because the panel essentially equated any machine learning with an unpatentable "algorithm" and the remaining additional elements as "generic computer components," without adequate explanation. Dec. 24. Examiners and panels should not evaluate claims at such a high level of generality.
Specifically, Ex Parte Desjardins explained the following: 
Enfish ranks among the Federal Circuit's leading cases on the eligibility of technological improvements. In particular, Enfish recognized that “[m]uch of the advancement made in computer technology consists of improvements to software that, by their very nature, may not be defined by particular physical features but rather by logical structures and processes.” 822 F.3d at 1339. Moreover, because “[s]oftware can make non-abstract improvements to computer technology, just as hardware improvements can,” the Federal Circuit held that the eligibility determinations should turn on whether “the claims are directed to an improvement to computer functionality versus being directed to an abstract idea.” Id. at 1336. (Desjardins, page 8).
Further in Ex Parte Desjardins, Appeal No. 2024-000567 (PTAB September 26, 2025, Appeals Review Panel Decision) (precedential), the claimed invention was a method of training a machine learning model on a series of tasks. The Appeals Review Panel (ARP) overall credited benefits including reduced storage, reduced system complexity and streamlining, and preservation of performance attributes associated with earlier tasks during subsequent computational tasks as technological improvements that were disclosed in the patent application specification. Specifically, the ARP upheld the Step 2A Prong One finding that the claims recited an abstract idea (i.e., mathematical concept). In Step 2A Prong Two, the ARP then determined that the specification identified improvements as to how the machine learning model itself operates, including training a machine learning model to learn new tasks while protecting knowledge about previous tasks to overcome the problem of “catastrophic forgetting” encountered in continual learning systems. Importantly, the ARP evaluated the claims as a whole in discerning at least the limitation “adjust the first values of the plurality of parameters to optimize performance of the machine learning model on the second machine learning task while protecting performance of the machine learning model on the first machine learning task” reflected the improvement disclosed in the specification. Accordingly, the claims as a whole integrated what would otherwise be a judicial exception instead into a practical application at Step 2A Prong Two, and therefore the claims were
The claim itself does not need to explicitly recite the improvement described in the specification (e.g., “thereby increasing the bandwidth of the channel”). See, e.g., Ex Parte Desjardins, Appeal No. 2024-000567 (PTAB September 26, 2025, Appeals Review Panel Decision) (precedential), in which the specification identified the improvement to machine learning technology by explaining how the machine learning model is trained to learn new tasks while protecting knowledge about previous tasks to overcome the problem of “catastrophic forgetting,” and that the claims reflected the improvement identified in the specification. Indeed, enumerated improvements identified in the Desjardins specification included disclosures of the effective learning of new tasks in succession in connection with specifically protecting knowledge concerning previously accomplished tasks; allowing the system to reduce use of storage capacity; and the enablement of reduced complexity in the system. Such improvements were tantamount to how the machine learning model itself would function in operation and therefore not subsumed in the identified mathematical calculation.


Step 2B: DOES THE CLAIM RECITE ADDITIONAL ELEMENTS THAT AMOUNT TO SIGNIFICANTLY MORE THAN THE JUDICIAL EXCEPTION?
If Yes at step 2A.1 and step 2A.2 fails, the interpretation in the context of 35 USC 101 amounts to the transmission of voice recognition results from one device to another to control the operation of the second device. This is not extra-solution activity, since user selected media is not taking place, nor is it collecting or manipulating data per se. The only interaction a human has is the voice input, such that the device operations can take place without the intervention of the user. Further, while well-known and not a practical application, a single human cannot be substituted for the two devices, transmit information between them, and operate the device accordingly based on the result of that transmission. Since the devices perform the operations as claimed as non-mental, non-human-involved, and non-extra-solution activity, a rejection under 35 USC 101 would not be justified.
Such claim language provides significance beyond mere data manipulation or mental decision making and is also analogous to interim amendment examples. Additionally, if one of ordinary skill in the art fails to properly consider any specified hardware or arranged components in the claims, considers such claim limitations non-generic, or simply disregards Enfish, the claims still demonstrate that there exist improvements to the functioning of a computer (or technology), e.g., a modification of conventional Internet hyperlink protocol to dynamically produce a dual-source hybrid webpage, as discussed in DDR Holdings, LLC v. Hotels.com, L.P., 773 F.3d 1245, 1258-59, 113 USPQ2d 1097, 1106-07 (Fed. Cir. 2014) (see MPEP § 2106.05(a)). Therefore, in light of the above as a whole, it is not warranted to give a rejection under 35 USC 101.
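The order of the Step 1, Step 2A.1, Step 2A.2, and Step 2B questions above can be read as a short decision flow. The following is a minimal sketch of that flow only, assuming each question has already been answered yes or no; the class name, field names, and function name are hypothetical and appear nowhere in the action or the MPEP.

```python
from dataclasses import dataclass


@dataclass
class EligibilityAnswers:
    """Yes/no answers to the four questions (hypothetical field names)."""
    statutory_category: bool      # Step 1
    recites_exception: bool       # Step 2A.1
    practical_application: bool   # Step 2A.2
    significantly_more: bool      # Step 2B


def is_eligible(a: EligibilityAnswers) -> bool:
    """Walk the questions in order and stop at the first decisive answer."""
    if not a.statutory_category:
        return False              # Step 1: not a process, machine, etc.
    if not a.recites_exception:
        return True               # Step 2A.1: no judicial exception recited
    if a.practical_application:
        return True               # Step 2A.2: exception is integrated
    return a.significantly_more   # Step 2B decides the remaining cases


# Answers as recorded in this action: Step 1 "Yes", Step 2A.1 "No".
# The later answers are never reached once Step 2A.1 is "No".
action = EligibilityAnswers(True, False, False, False)
print(is_eligible(action))  # True
```

Under this reading, the "No" recorded at Step 2A.2 does not change the outcome, because the flow already ends at Step 2A.1.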


Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
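The three-prong test quoted above can be sketched as a single boolean check. This is a simplified illustration, assuming each prong has already been judged yes or no for a given limitation; the function and parameter names are hypothetical, and the sketch does not model how an examiner actually reaches those judgments.

```python
def invokes_112f(uses_means_or_step: bool,
                 generic_placeholder: bool,
                 functional_language: bool,
                 sufficient_structure: bool) -> bool:
    """Return True only when prongs (A), (B), and (C) are all met."""
    # (A) "means"/"step" or a generic placeholder substituting for "means"
    prong_a = uses_means_or_step or generic_placeholder
    # (B) the term is modified by functional language
    prong_b = functional_language
    # (C) the term is NOT modified by sufficient structure, material, or acts
    prong_c = not sufficient_structure
    return prong_a and prong_b and prong_c


# "a voice receiver that receives voice input": no "means", and the action
# treats "voice receiver" as structural rather than a generic placeholder.
print(invokes_112f(False, False, True, True))   # False

# A bare "means for receiving voice" with no supporting structure.
print(invokes_112f(True, False, True, False))   # True
```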

	NOTE: The claims are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The claim limitations, specifically claims 1-5, in question recite:
“a voice operation device…”
“a voice receiver…”
“a storage device…”
“a first communication device…”
“a first controller…”
“a second communication device…”
“a second controller…”
“a command executor …”

In the scope of software-hardware, such elements preceding device and receiver provide structure that aligns with the courts and is analogous to BRI examples such as a “digital detector” in the scope of computing per se, and a “knife blade” unit for cutting, in the scope of non-computing technology. For instance, a storage device is known to be memory, a voice receiver known to be a microphone-like device, and a voice operation device similarly is known to control devices. Similarly, the other terms involve explicit processors, further limited, as part of their execution process. Such terms have the structure to perform the claimed limitations.

Specifically considering the following:
E. “Detector” 
1. Personalized Media Commc’ns, LLC, v. ITC, 161 F.3d 696 (Fed. Cir. 1998) 
United States Patent 5,335,277 (“the ’277 patent”) 

The claimed subject matter relates to an integrated system for communicating programming, e.g., content electronically transmitted to entertain, instruct, or inform, including television, radio, broadcast print, etc. The relevant claim language of independent claim 44 of the ’277 patent is as follows: 

“. . .a digital detector operatively connected to a mass medium receiver for detecting digital information in a mass medium transmission and transferring some of said detected information to a processor; . . .” 

Issue: Does the claim limitation “digital detector” invoke 35 U.S.C. § 112, sixth paragraph? 

Analysis: The claim limitation does not use “means for” language to invoke 35 U.S.C. § 112, sixth paragraph. To one of ordinary skill in the relevant art, the term “detector” connotes or describes in general a structure. The claim term “digital detector” is subsequently modified by the functional language “for detecting.” The fact that the term “detector” does not suggest a precise physical structure would not result in 35 U.S.C. § 112, sixth paragraph being invoked. The claim term “detector” is understood in the relevant prior art and defined in dictionaries as having a well known meaning in the electrical arts connotative of structure. Further, just because the claim term “detector” is a name for structure drawn from the function it performs, should not result in treatment under 35 U.S.C. 112, sixth paragraph. Therefore the term “detector” is structural and not a nonce word or a verbal construct that is not recognized as the name of structure and simply a substitute for the term “means for.” Accordingly, the presence of a structural term combined with the absence of any “means for” language indicates that 35 U.S.C. § 112, sixth paragraph is not invoked. 

Conclusion: 35 U.S.C. § 112, sixth paragraph is not invoked. 
The above analysis is in accordance with MPEP 2181 I 8th Ed. Rev. 6., Sept 2007 Pages 2100-236, and further in view of the analytical framework of the 2011 Supplementary Guidelines to determine whether a limitation invokes 35 U.S.C. § 112, sixth paragraph.
Additionally, in view of the courts' treatment of “means” per se, consider the example of an “ink jet means” for ink delivery: the modifier “ink jet” supplies sufficient structure for achieving the specified functions. If elements such as “unit,” “module,” or “sensor” were recited on their own in the current claims, this interpretation would not apply; a generic placeholder would instead be present, just as the sole mention of “device” or “apparatus” on its own would in fact invoke 112(f). In the case above, this is not reasonable for the field of software/hardware.
Such claims are believed not to exhibit:
1) a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and
2) usage of the word “means” or “step”.

		
	
	
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-5 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by US 20110054895 A1 Phillips; Michael S. et al. (hereinafter Phillips).
Re claim 1, Phillips teaches 
1. A voice operation device that includes: (under BRI, fig. 1 the combination of several devices per se to render ASR results in the form of text as a command and control scheme)
A voice receiver that receives voice input from a user; and (under BRI, this can be a microphone and its affiliated components to be able to accept voice 0145)
a first communication device that communicates with an operated device and a voice recognition device, (under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network)
a first controller that includes a processor and causes the processor to execute a computer program to transmit voice data indicating the voice received by the voice receiver to the voice recognition device from the first communication device, (under BRI, a controller in some location including a processor and memory as a combination to render a “controller” and a program per se in some capacity 0205-0207 to execute the ASR and command operations)
wherein the first controller transmits text data indicating text generated as a result of a voice recognition process in the voice recognition device received from the voice recognition device by the first communication device to the operated device that operates in accordance with the result through the first communication device. (literal collection of speech, conversion to text, processing of text to extract a command, and communicating the results locally or over a network as previously identified under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network… and operations for example with limiting scope, as follows: command and control results delivered as in fig. 7b for example, speech converted to text to derive a command to control a device such as a user needing directions or to make a phone call, externally for in-context applications e.g. music vs navigation in 0058 with indicators of reception e.g. fig. 7b… using a voice command input 0070, referencing fig. 1 and 1a there exists two devices, one is a user device as in 0057 with figure elements 142 / 120, and second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)
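The data flow recited in claim 1 (voice data sent to a recognizer, text returned, text relayed to the operated device) can be illustrated with a small sketch. This is a toy model under stated assumptions, not the applicant's or Phillips's implementation: all class and method names are hypothetical, network communication is replaced by direct method calls, and the recognizer is a lookup table standing in for real speech recognition.

```python
class VoiceRecognitionDevice:
    """Stand-in recognizer: maps voice data to text via a lookup table."""

    def __init__(self, transcripts):
        self.transcripts = transcripts

    def recognize(self, voice_data: bytes) -> str:
        return self.transcripts.get(voice_data, "")


class OperatedDevice:
    """Stand-in operated device: records the text it is sent."""

    def __init__(self):
        self.received = []

    def receive_text(self, text: str) -> None:
        self.received.append(text)


class VoiceOperationDevice:
    """Relays voice data to the recognizer and the resulting text onward."""

    def __init__(self, recognizer, operated):
        self.recognizer = recognizer
        self.operated = operated

    def handle_voice(self, voice_data: bytes) -> str:
        # Transmit voice data to the recognizer and receive text back.
        text = self.recognizer.recognize(voice_data)
        # Forward the text result to the operated device.
        self.operated.receive_text(text)
        return text


recognizer = VoiceRecognitionDevice({b"\x01\x02": "print two copies"})
printer = OperatedDevice()
device = VoiceOperationDevice(recognizer, printer)
device.handle_voice(b"\x01\x02")
print(printer.received)  # ['print two copies']
```

The point of the sketch is the routing: the text result passes through the voice operation device on its way to the operated device, rather than going from the recognizer to the operated device directly.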


Re claim 2, Phillips teaches 
2. (Currently Amended) The voice operation device according to claim 1, wherein the first controller downloads dictionary data indicating specialized terminology specific to the operated device through the first communication device and transmits the downloaded dictionary data to the voice recognition device that executes the voice recognition process using the downloaded dictionary data through the first communication device. (under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network… and also under BRI, a controller in some location including a processor and memory as a combination to render a “controller” and a program per se in some capacity 0205-0207 to execute the ASR and command operations… using a dictionary from an external source 0172 that is updateable 0149 transmitting results back to the user ASR device, referencing fig. 1 and 1a there exists two devices, one is a user ASR device as in 0057 with figure elements 142 / 120, and a second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)
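The additional step in claim 2 (download device-specific dictionary data, then pass it to the recognizer before recognition) can be sketched in the same toy style. Everything here is an assumption made for illustration: the names are hypothetical, the "download" is a lookup in an in-memory mapping, and the recognizer simply prefers candidate words that appear in the loaded dictionary.

```python
class VoiceRecognitionDevice:
    """Stand-in recognizer that prefers words found in a loaded dictionary."""

    def __init__(self):
        self.dictionary = set()

    def load_dictionary(self, terms):
        self.dictionary = set(terms)

    def recognize(self, candidates):
        # candidates: one list of alternative words per spoken word
        words = []
        for options in candidates:
            preferred = [w for w in options if w in self.dictionary]
            words.append(preferred[0] if preferred else options[0])
        return " ".join(words)


class FirstController:
    """Downloads dictionary data and relays it to the recognizer."""

    def __init__(self, dictionary_source, recognizer):
        self.dictionary_source = dictionary_source  # stands in for a download server
        self.recognizer = recognizer

    def sync_dictionary(self, device_id):
        terms = self.dictionary_source[device_id]   # "download"
        self.recognizer.load_dictionary(terms)      # "transmit"
        return terms


recognizer = VoiceRecognitionDevice()
heard = [["stable", "staple"], ["duplex"]]
print(recognizer.recognize(heard))  # stable duplex

controller = FirstController({"printer-1": ["staple", "duplex"]}, recognizer)
controller.sync_dictionary("printer-1")
print(recognizer.recognize(heard))  # staple duplex
```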


Re claim 3, Phillips teaches
3. (Currently Amended) A computer readable non-transitory recording medium having a voice operation program stored therein, the voice operation program causes a computer included in the voice operation device to act (under BRI, fig. 1 the combination of several devices per se to render ASR results in the form of text as a command and control scheme) voice recognition performed by a voice recognition device on voice data indicating the received voice to an operated device that operates in accordance with the result 
as a first controller that causes a first communication device to transmit voice data indicating a voice accepted by a voice receiver in the voice operation device to a voice recognition device, and that causes a first communication device to transmit text data indicating text generated as a result of a voice recognition process in the voice recognition device received from the voice recognition device by the first communication device to the operated device that operates in accordance with the result. (literal collection of speech, conversion to text, processing of text to extract a command, and communicating the results locally or over a network as previously identified under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network… and operations for example with limiting scope, as follows: command and control results delivered as in fig. 7b for example, speech converted to text to derive a command to control a device such as a user needing directions or to make a phone call, externally for in-context applications e.g. music vs navigation in 0058 with indicators of reception e.g. fig. 7b… using a voice command input 0070, referencing fig. 1 and 1a there exists two devices, one is a user device as in 0057 with figure elements 142 / 120, and second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)


Re claim 4, Phillips teaches 
4. (Currently Amended) A voice operating system comprising: (under BRI, fig. 1 the combination of several devices per se to render ASR results in the form of text as a command and control scheme)
an operated device; and (an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177…)
a voice operation device that receives voice input from a user and operates the operated device, (under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… with an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network)
wherein the voice operation device includes,
a voice receiver that receives voice input from a user, (under BRI, this can be a microphone and its affiliated components to be able to accept voice 0145)
a first communication device that communicates with an operated device and a voice recognition device, and (under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network)
a first controller that includes a processor and causes the processor to execute a computer program to transmit voice data indicating the voice received by the voice receiver to the voice recognition device from the first communication device, (under BRI, a controller in some location including a processor and memory as a combination to render a “controller” and a program per se in some capacity 0205-0207 to execute the ASR and command operations)
wherein the first controller transmits text data indicating text generated as a result of a voice recognition process in the voice recognition device received from the voice recognition device by the first communication device to the operated device through the first communication device, and (literal collection of speech, conversion to text, processing of text to extract a command, and communicating the results locally or over a network as previously identified under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network… and operations for example with limiting scope, as follows: command and control results delivered as in fig. 7b for example, speech converted to text to derive a command to control a device such as a user needing directions or to make a phone call, externally for in-context applications e.g. music vs navigation in 0058 with indicators of reception e.g. fig. 7b… using a voice command input 0070, referencing fig. 1 and 1a there exists two devices, one is a user device as in 0057 with figure elements 142 / 120, and second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)
wherein the operated device includes 
a second communication device that communicates with the voice operation device, (on the server side of operations, under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network)
a storage device that stores the dictionary data, and (on the server side of operations,  as part of the network as a whole connected to the operated device for inclusion in the processing per se, under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network… and also under BRI, a controller in some location including a processor and memory as a combination to render a “controller” and a program per se in some capacity 0205-0207 to execute the ASR and command operations… using a dictionary from an external source 0172 that is updateable 0149 transmitting results back to the user ASR device, referencing fig. 1 and 1a there exists two devices, one is a user ASR device as in 0057 with figure elements 142 / 120, and a second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)
a second controller that includes a processor and causes the processor to execute an operated program to operate as a natural language processor that generates a command from the text data when the second communication device receives the text data from the voice operation device, and (ASR on the server side of operations, under BRI, a controller in some location including a processor and memory as a combination to render a “controller” and a program per se in some capacity 0205-0207 to execute the ASR and command operations)
a command executor that brings the operated device into operation in accordance with the command generated by the natural language processor. (commands from collection of speech, conversion to text, processing of text to extract a command, and communicating the results locally or over a network as previously identified under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network… and operations for example with limiting scope, as follows: command and control results delivered as in fig. 7b for example, speech converted to text to derive a command to control a device such as a user needing directions or to make a phone call, externally for in-context applications e.g. music vs navigation in 0058 with indicators of reception e.g. fig. 7b… using a voice command input 0070, referencing fig. 1 and 1a there exists two devices, one is a user device as in 0057 with figure elements 142 / 120, and second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)

Re claim 5, Phillips teaches 
5. (Currently Amended) The voice operating system according to claim 4, wherein the first controller of the voice operation device causes the first communication device to transmit dictionary data indicating specialized terminology specific to the operated device, which is received from the operated device to the voice recognition device, and causes the first communication device to transmit text data indicating text generated through a voice recognition process using the dictionary data in the voice recognition device to the operated device, (literal collection of speech, conversion to text, processing of text to extract a command, and communicating the results locally or over a network as previously identified under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network… and operations for example with limiting scope, as follows: command and control results delivered as in fig. 7b for example, speech converted to text to derive a command to control a device such as a user needing directions or to make a phone call, externally for in-context applications e.g. music vs navigation in 0058 with indicators of reception e.g. fig. 7b… using a voice command input 0070, referencing fig. 1 and 1a there exists two devices, one is a user device as in 0057 with figure elements 142 / 120, and second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)

wherein the operated device further includes
a storage device that stores the dictionary data, and (a part of the network as a whole connected to the operated device for inclusion in the processing per se, under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network… and also under BRI, a controller in some location including a processor and memory as a combination to render a “controller” and a program per se in some capacity 0205-0207 to execute the ASR and command operations… using a dictionary from an external source 0172 that is updateable 0149 transmitting results back to the user ASR device, referencing fig. 1 and 1a there exists two devices, one is a user ASR device as in 0057 with figure elements 142 / 120, and a second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)
the second controller further causes the processor to execute the operated program to operate as a dictionary data transmitter that transmits the dictionary data from the second communication device to the voice operation device (as part of the network as a whole connected to the operated device for inclusion in the processing per se, under BRI, a communication device such as a transmitter and receiver device combination necessary for network communication 0201 with fig. 1 or for on-board local processing it can be literal wires and chips that are communicating data 0205… an operated device 0202 such as a printer 0197, a GPS arrangement 0050 with fig. 5b, a music player with speakers arrangement 0059 and 0177… and voice recognition device such as literal ASR arrangement both locally as an ASR client and/or ASR at a server device as in fig. 1 in communication with an operational/operated device across a network… and also under BRI, a controller in some location including a processor and memory as a combination to render a “controller” and a program per se in some capacity 0205-0207 to execute the ASR and command operations… using a dictionary from an external source 0172 that is updateable 0149 transmitting results back to the user ASR device, referencing fig. 1 and 1a there exists two devices, one is a user ASR device as in 0057 with figure elements 142 / 120, and a second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)


5. The voice operating system according to claim 4, wherein the voice operation device includes 
a voice receiver that receives voice input from a user, (microphone otherwise inherent for ASR 0177)
a first communication device that communicates with the operated device and the voice recognition device, and (one is a user device as in 0057 with figure elements 142 / 120)
a first controller that includes a processor and causes the processor to execute a computer program to transmit voice data indicating the voice received by the voice receiver and dictionary data indicating specialized terminology specific to the operated device received from the operated device by the first communication device by the first communication device to the voice recognition device, (in-context applications e.g. music vs navigation in 0058 with indicators of reception e.g. fig. 7b, using a dictionary thereof from an external source 0172 that is updateable 0149 transmitting results back to the user ASR device, referencing fig. 1 and 1a there exists two devices, one is a user ASR device as in 0057 with figure elements 142 / 120, and a second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)
the first controller transmits text data indicating text generated through a voice recognition process using the dictionary data in the voice recognition device from the first communication device to the operated device, and (speech converted to text externally for in-context applications e.g. music vs navigation in 0058 with indicators of reception e.g. fig. 7b, using a dictionary thereof from an external source 0172 that is updateable 0149 transmitting results back to the user ASR device, referencing fig. 1 and 1a there exists two devices, one is a user ASR device as in 0057 with figure elements 142 / 120, and a second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)
the operated device includes 
a second communication device that communicates with the voice operation device, (ASR route element 202… or 0171 a network based server to transmit and receive as part of the server infrastructure in fig. 1)
a storage device that stores the dictionary data, and (in fig. 2 element 102 and specifically element 218, further using a dictionary thereof from an external source 0172 that is updateable 0149 transmitting results back to the user ASR device)
a second controller that includes a processor and causes the processor to execute an operated program to operate as (the ASR engine per se element 208 as part of a processor based server 0194)
a dictionary data transmitter that transmits the dictionary data from the second communication device to the voice operation device, (using the identified components of the server… using a dictionary from an external source 0172 that is updateable 0149 transmitting results back to the user ASR device, referencing fig. 1 and 1a there exists two devices, one is a user ASR device as in 0057 with figure elements 142 / 120, and a second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)
a natural language processor that generates a command from the text data when the second communication device receives the text data from the voice operation device, and (speech converted to text to derive a command to control a device such as a user needing directions or to make a phone call, externally for in-context applications e.g. music vs navigation in 0058 with indicators of reception e.g. fig. 7b, using a dictionary thereof from an external source 0172 that is updateable 0149 transmitting results back to the user ASR device, referencing fig. 1 and 1a there exists two devices, one is a user ASR device as in 0057 with figure elements 142 / 120, and a second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)
a command executor that brings the operated device into operation in accordance with the command generated by the natural language processor. (as in fig. 7b, actions are taking place to produce results executed…based on speech converted to text to derive a command to control a device such as a user needing directions or to make a phone call, externally for in-context applications e.g. music vs navigation in 0058 with indicators of reception e.g. fig. 7b, using a dictionary thereof from an external source 0172 that is updateable 0149 transmitting results back to the user ASR device, referencing fig. 1 and 1a there exists two devices, one is a user ASR device as in 0057 with figure elements 142 / 120, and a second device which receives the transmitted data to process and return results such as a command that the user is requesting be handled, in this instance at an external model at the second device as in fig. 1 and fig. 2 containing models, vocabularies, dictionaries, etc. element 102 with element 218)
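
As a reading aid only, the data flow recited in claim 5 can be sketched in code. This is a minimal illustration under stated assumptions, not an implementation taken from the application or from the cited reference. Every class name, method name, and sample term below (OperatedDevice, VoiceRecognitionDevice, VoiceOperationDevice, "stapling", "duplex") is hypothetical, the "voice data" is a byte string standing in for audio, and the fuzzy word matching is a toy substitute for real speech recognition.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass, field


@dataclass
class OperatedDevice:
    """Hypothetical operated device (for example, a printer)."""
    dictionary: list[str]                      # storage device: specialized terms
    executed: list[str] = field(default_factory=list)

    def send_dictionary(self) -> list[str]:
        # Dictionary data transmitter: hands the terms to the voice operation device.
        return list(self.dictionary)

    def generate_command(self, text: str) -> str:
        # Toy natural language processor: first dictionary term found in the text.
        for term in self.dictionary:
            if term in text.lower():
                return f"RUN:{term}"
        return "NOOP"

    def receive_text(self, text: str) -> str:
        # Command executor: records the command as if it had been carried out.
        command = self.generate_command(text)
        self.executed.append(command)
        return command


class VoiceRecognitionDevice:
    """Hypothetical recognizer; bytes stand in for audio (assumption)."""

    def recognize(self, voice_data: bytes, dictionary: list[str]) -> str:
        words = voice_data.decode("utf-8").split()
        fixed = []
        for word in words:
            # Use the device-specific dictionary to correct near-miss words.
            match = difflib.get_close_matches(word, dictionary, n=1, cutoff=0.8)
            fixed.append(match[0] if match else word)
        return " ".join(fixed)


@dataclass
class VoiceOperationDevice:
    """Hypothetical relay between the user, the recognizer, and the device."""
    recognizer: VoiceRecognitionDevice
    target: OperatedDevice

    def handle_voice(self, voice_data: bytes) -> str:
        dictionary = self.target.send_dictionary()                # device -> relay
        text = self.recognizer.recognize(voice_data, dictionary)  # relay -> recognizer
        return self.target.receive_text(text)                     # relay -> device


if __name__ == "__main__":
    printer = OperatedDevice(dictionary=["stapling", "duplex"])
    hub = VoiceOperationDevice(VoiceRecognitionDevice(), printer)
    print(hub.handle_voice(b"please start stapeling"))
```

The sketch keeps the three roles separate so that the direction of each transfer (dictionary data, voice data, text data, command) can be compared against the claim language limitation by limitation.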


Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
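
The month arithmetic in the paragraph above can be illustrated with a short sketch. It assumes the May 06, 2026 mailing date listed in the Prosecution Timeline of this page, uses hypothetical helper names (add_months, reply_dates), and deliberately ignores weekend and holiday adjustments as well as the advisory-action contingency, so it is a rough calendar aid and not a docketing calculation.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the same day-of-month N months later, clamped to the month's end."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_dates(mailing: date) -> dict[str, date]:
    """Nominal two-, three-, and six-month marks counted from the mailing date."""
    return {
        "two_month_mark": add_months(mailing, 2),
        "shortened_period_end": add_months(mailing, 3),
        "statutory_maximum": add_months(mailing, 6),
    }


if __name__ == "__main__":
    # Mailing date assumed from the Prosecution Timeline on this page.
    for label, when in reply_dates(date(2026, 5, 6)).items():
        print(label, when.isoformat())
```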

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

US 20180336894 A1	GRAHAM; David Chance et al.
Voice command processing

US 20170358302 A1	ORR; Ryan M. et al.
Open ended user intent processing


Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL COLUCCI whose telephone number is (571)270-1847.  The examiner can normally be reached on M-F 9 AM - 7 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571)272-7516.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/MICHAEL COLUCCI/
Primary Examiner, Art Unit 2655
(571)-270-1847
Examiner FAX:  (571)-270-2847
Michael.Colucci@uspto.gov
Prosecution Timeline

May 17, 2024: Application Filed
Dec 23, 2025: Non-Final Rejection mailed (§102)
Mar 19, 2026: Response Filed
May 06, 2026: Final Rejection mailed (§102, current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/422,681, Patent 12640144: Generating Synthetic Conference Transcripts Using Natural Language Processing (2y 4m to grant; granted May 26, 2026)
18/401,171, Patent 12633286: MACHINE LEARNING MODEL IMPROVEMENT (2y 4m to grant; granted May 19, 2026)
18/352,601, Patent 12626697: SYSTEM AND METHOD FOR KEYWORD FALSE ALARM REDUCTION (2y 10m to grant; granted May 12, 2026)
19/225,487, Patent 12620262: USING ARTIFICIAL ENTITIES FOR GENERATING PERSONALIZED RESPONSES (11m to grant; granted May 05, 2026)
18/515,502, Patent 12592240: ENCODING AND DECODING OF ACOUSTIC ENVIRONMENT (2y 4m to grant; granted Mar 31, 2026)
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 76%
With Interview (+15.2%): 91%
Median Time to Grant: 3y 1m (~1y 1m remaining)
PTA Risk: Moderate
Based on 999 resolved cases by this examiner. Grant probability derived from career allowance rate.