DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 02/12/2026 was filed after the mailing date of the Non-Final Rejection on 10/02/2025. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Response to Amendment
This communication is responsive to the applicant’s amendment dated 12/10/2025. The applicant(s) amended claims 1-9 and added claim 10.
Response to Arguments
Applicant's arguments with respect to independent claims 1 and 9 have been considered but are moot in view of the new ground(s) of rejection because the arguments pertain to the newly amended limitations.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1 and 9 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Sharifi et al. (US 20230252995 A1).
Regarding claims 1 and 9, Sharifi teaches:
“A speech input support device” (par. 0033; ‘In some implementations, one or more microphones of the client device can capture audio data, where the audio data captures the spoken input spoken by the user.’) comprising:
“a microphone which obtains a speech input by a user by converting the speech into an electric signal” (par. 0033; ‘In some implementations, one or more microphones of the client device can capture audio data, where the audio data captures the spoken input spoken by the user.’);
“a recording unit which records the speech obtained by the microphone” (par. 0033; ‘In some implementations, one or more microphones of the client device can capture audio data, where the audio data captures the spoken input spoken by the user.’ Capturing audio data reads on recording the speech obtained by the microphone.); and
“a processor including hardware” (par. 0048; ‘… one or more processors for accessing data and executing applications…’) configured to:
“perform first speech recognition with respect to the speech obtained by the microphone for input of a first recording content” (par. 0033; ‘The system can generate a candidate text representation of the spoken input. In some implementations, the system can generate the candidate text representation of the spoken input by processing the audio data capturing the spoken input using an automatic speech recognition (‘ASR’) model.’);
“generate the first recording content based on a result of the first speech recognition” (par. 0034; ‘In some implementations, the system can render candidate text output 104 of the “WHAT IS A HAT”.’);
“perform, separately from the first speech recognition, second speech recognition with respect to the speech recorded by the recording unit for generation of a second recording content” (par. 0034; ‘In some implementations, the user can see the misrecognition of the word ‘vat’ and can speak further spoken utterance of “with a V” to correct the misrecognition.’);
“generate the second recording content based on (i) a result of the second speech recognition and (ii) a next operation for the user to perform input of the first recording content” (par. 0034; ‘In example 100, the system generates a revised text representation of the spoken input 208 of “WHAT IS A VAT”, where the revised text representation corrects the misrecognition of the word ‘vat’.’); and
“compare the first recording content with the second recording content” (par. 0096; ‘In some versions of those implementations, the method further includes determining the correction of the at least one word in the candidate text representation based on comparing the one or more attributes with the plurality of hypotheses of the text representation.’).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 2-8 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Sharifi in view of Preis et al. (US 20230252083 A1).
Regarding claim 2 (dep. on claim 1), Sharifi does not expressly teach:
“the next operation for the user is an operation to output guidance speech from the speech input support device;”
“the processor is further configured to perform third speech recognition with respect to the guidance speech output from the speech input support device;” and
“the processor generates the second recording content by associating (i) the result of the second speech recognition with (ii) a result of the third speech recognition.”
Preis teaches:
“the next operation for the user is an operation to output guidance speech from the speech input support device;” (par. 0114; ‘Upon triggering of a rule or threshold at the data integration and analysis manager, another natural language response is generated 941 stating that the minimum pressure of pump X has fallen below the normal threshold of 100 psi 916, to which the user responds by stating the exception “the minimum pressure of Pump X should be 90 psi” 917.’);
“the processor is further configured to perform third speech recognition with respect to the guidance speech output from the speech input support device;” (par. 0114; ‘Upon triggering of a rule or threshold at the data integration and analysis manager, another natural language response is generated 941 stating that the minimum pressure of pump X has fallen below the normal threshold of 100 psi 916, to which the user responds by stating the exception “the minimum pressure of Pump X should be 90 psi” 917.’); and
“the processor generates the second recording content by associating (i) the result of the second speech recognition with (ii) a result of the third speech recognition” (par. 0114; ‘The platform IVA manager updates the user-specific and device-specific IVA to reflect the new information in the exception 944 and the updates are uploaded to the facilities manager device IVA 918, along with a natural language generated response 945 confirming that the minimum pressure threshold for Pump X has been set to 90 psi, and asking whether this new rule should be applied to all pumps of type X 919.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sharifi’s method of processing further spoken input by incorporating Preis’s industrial virtual assistant (IVA) manager in order to output and further process guidance speech. The modification would have been beneficial in assisting the operation of an industrial facility (Preis: par. 0113).
Regarding claim 3 (dep. on claim 2), the combination of Sharifi in view of Preis further teaches:
“the recording unit records a series of speeches of the user and the guidance speech together” (Preis: par. 0113; ‘Here, messaging is shown between a user 910 and the system 930, along sequences of operations occurring either at the IVA instance or the IVA platform.’); and
“the processor recognizes the speeches of the user and the guidance speech together” (Preis: par. 0113; ‘Here, messaging is shown between a user 910 and the system 930, along sequences of operations occurring either at the IVA instance or the IVA platform.’).
Regarding claim 4 (dep. on claim 3), the combination of Sharifi in view of Preis further teaches:
“recognizes the speech of the user and the guidance speech while maintaining a relationship in order therebetween” (Preis: par. 0101; ‘Each domain contains information and rules about that domain of knowledge such as terminology, formulas, objects, relationships, interactions, and other information that experts in that field would be expected to know or be familiar with.’ See also par. 0104); and
“associates the result of the second speech recognition with the result of the third speech recognition based on the relationship in order” (Preis: par. 0114; ‘A positive response by the user 920 and natural language recognition by the IVA communications manager 946 sets in motion an update process which identifies all affected assets 947, retrains the machine learning algorithms in using both the new domain information (90 psi for pumps of type X instead of 100 psi) and the new context information (from the facilities manager using a particular device at a particular time on a particular day, etc.) 948, updates the domain-specific knowledge databases based on the retraining 949, and finally updates the user-specific and device-specific IVA 950 and uploads it to the facility manager's device 921.’).
Regarding claim 5 (dep. on claim 3), the combination of Sharifi in view of Preis further teaches:
“wherein the processor associates the result of the second speech recognition with the result of the third speech recognition based on a time period during which the speech of the user is being input and a time period during which the guidance speech is being output” (Preis: par. 0047; ‘For example, a facility manager/engineer might ask (by voice or text) the industrial virtual assistant that may be installed on his or her mobile device “What the fluid pressure is at a certain pumping station during time interval x?” In this case, the industrial virtual assistant has the relevant domain expertise to understand the correct meaning of the technical terms: “fluid pressure,” “in the pumping station,” and “at time interval x.”’).
Regarding claim 6 (dep. on claim 3), the combination of Sharifi in view of Preis further teaches:
“the speech obtained by the microphone comprises an input to a data item” (Preis: par. 0047; ‘For example, a facility manager/engineer might ask (by voice or text) the industrial virtual assistant that may be installed on his or her mobile device “What the fluid pressure is at a certain pumping station during time interval x?” In this case, the industrial virtual assistant has the relevant domain expertise to understand the correct meaning of the technical terms: “fluid pressure,” “in the pumping station,” and “at time interval x.”’); and
“the processor matches the result of the third speech recognition with a name of the data item at a phoneme level to associate the result of the second speech recognition with the result of the third speech recognition” (Sharifi: par. 0001; ‘For example, an automatic speech recognition (ASR) engine can process audio data that correspond to a spoken utterance of a user to generate ASR output, such as one or more speech hypotheses (i.e., sequence of term(s) and/or other token(s)) of the spoken utterance or phoneme(s) that are predicted to correspond to the spoken utterance.’).
Regarding claim 7 (dep. on claim 1), the combination of Sharifi in view of Preis further teaches:
“the speech obtained by the microphone comprises an input of a value to a data item” (Preis: par. 0114; ‘Upon triggering of a rule or threshold at the data integration and analysis manager, another natural language response is generated 941 stating that the minimum pressure of pump X has fallen below the normal threshold of 100 psi 916, to which the user responds by stating the exception “the minimum pressure of Pump X should be 90 psi” 917.’); and
“the processor matches the result of the third speech recognition with a candidate of a value to be input to the data item at a phoneme level to generate the second recording content using a candidate closest to the phoneme level as the value input to the data item” (Sharifi: par. 0001; ‘For example, an automatic speech recognition (ASR) engine can process audio data that correspond to a spoken utterance of a user to generate ASR output, such as one or more speech hypotheses (i.e., sequence of term(s) and/or other token(s)) of the spoken utterance or phoneme(s) that are predicted to correspond to the spoken utterance.’).
Regarding claim 8 (dep. on claim 1), the combination of Sharifi in view of Preis further teaches:
“wherein the processor presents to the user time at which a difference between the first recording content and the second recording content occurred” (Preis: par. 0134; ‘This summary includes and is not limited to incident type, time, location map; exceptions included in the incident along with the time series and geographical information; and status of incident in terms of being active, under investigation, or closed.’).
Regarding claim 10 (dep. on claim 1), the combination of Sharifi in view of Preis further teaches:
“the speech obtained by the microphone comprises input to one or more data items” (Preis: par. 0114; ‘Upon triggering of a rule or threshold at the data integration and analysis manager, another natural language response is generated 941 stating that the minimum pressure of pump X has fallen below the normal threshold of 100 psi 916, to which the user responds by stating the exception “the minimum pressure of Pump X should be 90 psi” 917.’); and
“the processor sequentially fills in the one or more data items based on interaction between the speech input support device and the user” (Preis: par. 0114; ‘Upon triggering of a rule or threshold at the data integration and analysis manager, another natural language response is generated 941 stating that the minimum pressure of pump X has fallen below the normal threshold of 100 psi 916, to which the user responds by stating the exception “the minimum pressure of Pump X should be 90 psi” 917.’).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK VILLENA whose telephone number is (571)270-3191. The examiner can normally be reached 10 am - 6 pm EST, Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
MARK VILLENA
Examiner
Art Unit 2658
/MARK VILLENA/Examiner, Art Unit 2658