Prosecution Insights
Last updated: April 18, 2026
Application No. 18/522,672

INFORMATION PROCESSING DEVICE AND NON-TRANSITORY COMPUTER-READABLE MEDIUM STORING INFORMATION PROCESSING PROGRAM

Final Rejection — §101, §103
Filed: Nov 29, 2023
Examiner: LEE, JANGWOEN
Art Unit: 2656
Tech Center: 2600 — Communications
Assignee: Toyota Jidosha Kabushiki Kaisha
OA Round: 2 (Final)
Grant Probability: 82% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 11m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 82% — above average (36 granted / 44 resolved; +19.8% vs TC avg)
Interview Lift: +24.2% (strong; with vs. without an interview, among resolved cases with an interview)
Typical Timeline: 2y 11m average prosecution; 23 applications currently pending
Career History: 67 total applications across all art units
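
The headline allow-rate figures above follow from simple counts. A minimal sketch of that arithmetic (the Tech Center average used below is back-solved from the displayed +19.8% delta and is an assumption, not sourced data):

```python
# Counts taken from the dashboard above: 36 granted out of 44 resolved cases.
granted = 36
resolved = 44

career_allow_rate = granted / resolved        # 0.818... -> displayed as "82%"
tc_average = 0.62                             # hypothetical TC average, back-solved from the +19.8% delta
delta_vs_tc = career_allow_rate - tc_average  # ~ +0.198 -> "+19.8% vs TC avg"

print(f"Career allow rate: {career_allow_rate:.0%}")
print(f"Delta vs TC average: {delta_vs_tc:+.1%}")
```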

Statute-Specific Performance

§101: 26.5% (-13.5% vs TC avg)
§103: 54.6% (+14.6% vs TC avg)
§102: 11.0% (-29.0% vs TC avg)
§112: 4.1% (-35.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 44 resolved cases.

Office Action

§101, §103
DETAILED ACTION

The Response filed on 01/26/2026 has been accepted and considered in this Office action. Claims 1-5 are pending. Claims 1 and 5 are independent and amended.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

With respect to the rejection of Claims 1-5 under 35 U.S.C. § 101 as being directed to an abstract idea without significantly more, Applicant appears to present the following position in the Remarks, pp. 6-9, filed on 01/26/2026:

"…Contrary to the Office Action's position that the features of independent claims 1 and 5 recite limitations that cover mental processes (Office Action at p. 3), independent claims 1 and 5 now affirmatively recite the operations by the processor… Moreover, amended independent claims 1 and 5 recite more than just mere instructions to apply the exception using a generic computer component (Office Action at p. 4)…"

In response, Examiner respectfully notes that the amended limitations, "(1) setting a window size", "(2) moving a start position and an end position by a frame shift", "(3) generating an estimation model", "(4) selecting the estimation model corresponding to the user…", "(5) determining the emotion of the user…", and "(6) executing a majority decision approach…", do not integrate the judicial exception into a practical application, nor are the claim elements sufficient to amount to significantly more than the judicial exception. Limitations (1) and (2) recite, at a high level of generality, insignificant extra-solution activities or well-understood, routine, and conventional activities previously known to the industry. Each of the additional limitations is no more than mere instructions to apply the exception using a generic computer component. Limitations (3)-(5) encompass mental processes or recite generally linking the use of the judicial exception to a particular technological environment or field of use. Limitations (3) and (4) do not provide details about how the estimation model is generated and selected. Furthermore, limitation (6), "executing a majority decision…", also recites no more than mere instructions to apply the exception using a generic computer component based on estimation results. Even when considered in combination, these amended limitations (1)-(6) represent mere instructions to apply an exception and insignificant extra-solution activity, and therefore do not amount to significantly more and fail to demonstrate improvements to the functioning of the computer itself or to any other technology or technical field; see MPEP 2106.05(a). For the reasons provided, Examiner respectfully disagrees, and the rejection of Claims 1-5 under 35 U.S.C. § 101 as being directed to an abstract idea without significantly more is maintained.

Claims 1-5 stand rejected under 35 U.S.C. § 103. Applicant's arguments with respect to Claims 1-5 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. In order to expedite prosecution, and as to material from the Specification that is not in the claims but is argued by Applicant, please note Fujimura (US Pub. No. 2015/0269940). For at least the reasons provided above, Applicant's arguments have been fully considered but they are not persuasive.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-5 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding Claims 1 and 5: Claims 1 and 5 recite a device and a non-transitory computer-readable medium for information processing, which fall under the statutory categories of machine and manufacture, respectively (Step 1: Yes).

The claims recite the limitations "(1) acquire one piece of audio data of a user", "(2) setting a window size", "(3) extract a plurality of pieces of audio data…", "(4) moving a start position and an end position by a frame shift", "(5) generating an estimation model", "(6) estimate respective feature amounts…", "(7) selecting the estimation model corresponding to the user…", "(8) determining the emotion of the user…", and "(9) executing a majority decision approach…". Except for the recitation of generic computer components (i.e., one or more hardware processors, a memory, a machine learning model, an estimation model), limitations 3 and 5-9 can be performed in the human mind or with pen and paper. The claims, under their broadest reasonable interpretation, cover the concept of a person listening to another person's utterance, speech, or singing and determining the emotional state or mood by carefully examining the characteristics of vocal features (pitch, tone, and prosody) and the frequency with which characteristic vocal features appear, based on judging criteria (i.e., an estimation model); see MPEP 2106.04(a)(2), Section III. Under their broadest reasonable interpretation when read in light of the specification, the actions recited in limitations 3 and 5-9 encompass mental processes practically performed in the human mind. Accordingly, the claims recite an abstract idea (Step 2A, Prong One).

The judicial exception is not integrated into a practical application. In particular, Claims 1 and 5 recite additional elements: a memory, a processor, a computer, and an estimation model in limitation (c). But these are recited at a high level of generality (i.e., the estimation model and the combination of hardware and software are a generic computing device and generic computer components performing generic computer functions such as processing and storing data from a given input), such that they amount to no more than mere instructions to apply the exception using a generic computer component. The claims also recite additional limitations 1-2 and 4. These limitations are recited at a high level of generality and amount to mere data gathering, which is a form of insignificant extra-solution activity. Each of the additional limitations is no more than mere instructions to apply the exception using a generic computer component, or well-understood, routine, and conventional activities previously known to the industry. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, and the claims are therefore directed to the judicial exception (Step 2A: Yes).
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because they do not include subject matter that could not be performed by a human. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computing elements to perform the claimed steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. As noted previously, the claim as a whole merely generally links the use of the aforementioned concept to a particular technological environment or field of use. Thus, even when viewed as a whole, nothing in the claims adds significantly more (i.e., an inventive concept) to the abstract idea. The claims are not patent eligible (Step 2B: No).

Regarding Dependent Claims 2-4: Claims 2-4 depend from Claim 1, include all the limitations of that claim, and further limit the elements of Claim 1. Therefore, the dependent claims recite the same abstract idea. Claim 4 recites the additional limitations of an individual user estimation model and an overall user estimation model, which are no more than mere instructions to apply the exception using a generic computer component, generally link the use of the judicial exception to a particular technological environment or field of use, constitute insignificant extra-solution activity, or are well-understood, routine, and conventional activities previously known to the industry. No additional elements beyond the use of generic computing elements are claimed; therefore, the judicial exception is not integrated into a practical application, nor are the claim elements sufficient to amount to significantly more than the judicial exception. Therefore, the claims are not patent eligible.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5 are rejected under 35 U.S.C. 103 as being unpatentable over Katsuhiko et al. (JP6638435B2, hereinafter "Katsuhiko") in view of Rozgic et al. (US Pat. No. 11,854,538, hereinafter "Rozgic"), further in view of Fujimura (US Pub. No. 2015/0269940, hereinafter "Fujimura").
Regarding Claim 1: Katsuhiko discloses an information processing device comprising: a memory; and a processor coupled to the memory, the processor being configured to (Katsuhiko, Fig. 2, par [015], "…the emotion estimation device 100 estimates the emotional state of the speaker as one of positive, negative, and neutral emotional states…"; par [018], "…The control unit 1 includes a ROM (Read Only Memory), a RAM (Random Access Memory), and a CPU (Central Processing Unit). The ROM stores an emotion estimator personal adaptation processing program according to the present embodiment"):

acquire one piece of audio data of a user (Katsuhiko, Fig. 2, par [023], "…The audio data acquisition unit 110 acquires, via the input/output unit 3, audio data of a user to be analyzed…");

set, as a predetermined time period, a window size (Fig. 3, par [024], "…an analysis window…shifting the analysis window by a shift width dt (i.e., frame shift)");

extract a plurality of pieces of audio data that have been extracted during the predetermined time period from the one piece of audio data and in accordance with the window size, the plurality of pieces of audio data being extracted by moving the time period, including moving a start position and an end position, by a predetermined unit time as a frame shift (Figs. 2-3, par [024], "…The audio data analysis unit 120 analyzes the acquired audio data…The audio data analysis unit 120 sets an analysis window starting from the start point t0 of the audio data…shifting the analysis window by a shift width dt"; par [025], "…the analysis window width and the shift width dt of the analysis window are set based on the sampling frequency of the audio data…"; i.e., a predetermined window width and shift width);

generate an estimation model (par [010], "…By constructing an emotion estimator for the specific individual..."; par [042], "…The emotion estimator adaptation processing unit 180...the emotion estimator generated as the teacher data using the voice data uttered by the unspecified number of speakers is personally adapted as the emotion estimator for estimating the emotional state of the specific individual when speaking...");

estimate respective feature amounts indicating an emotion of the user from each of the plurality of pieces of audio data (par [026], "…The feature extraction unit 130 uses the power time-series data of the extracted voice data and the pitch time-series data of the voice data as the characteristics of the voice data…"; par [027], "…the feature extracting unit 130 includes a specific individual classifying unit 131 in order for the frequency analyzing unit 150 described later to analyze the appearance frequency of the power time-series change pattern and the pitch time-series change pattern for each specific individual…");

by selecting the estimation model corresponding to the user related to the plurality of pieces of audio data (par [042, 103], a personal adaptation of the emotion estimator).
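For illustration only, a minimal sketch of the window-size / frame-shift extraction recited in Claim 1 and mapped to Katsuhiko's analysis window above; the function name and the sample-count values are assumptions for illustration, not taken from the claims or the reference:

```python
import numpy as np

def extract_windows(audio: np.ndarray, window_size: int, frame_shift: int) -> list[np.ndarray]:
    """Slice one piece of audio data into overlapping windows.

    Both the start and end positions advance by `frame_shift` samples each
    step, mirroring the claimed window-size / frame-shift extraction.
    """
    windows = []
    start = 0
    while start + window_size <= len(audio):
        windows.append(audio[start:start + window_size])
        start += frame_shift
    return windows

# Example: 1 s of 16 kHz audio, 25 ms windows shifted by 10 ms (illustrative values).
audio = np.zeros(16000)
pieces = extract_windows(audio, window_size=400, frame_shift=160)
print(len(pieces))  # 98 overlapping windows
```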
Katsuhiko does not explicitly disclose the limitation "by using an estimation model obtained by executing machine learning for estimating feature amounts indicating an emotion of the user from the plurality of pieces of audio data that have been extracted."

However, Rozgic, in the analogous field of endeavor, discloses using an estimation model obtained by executing machine learning for estimating feature amounts indicating an emotion of the user from the plurality of pieces of audio data that have been extracted (Rozgic, Title, Abstract, "…a system for sentiment detection in audio data. The system processes audio frame level features of input audio data using a machine learning algorithm to classify the input audio data into a particular sentiment category..."; col. 3, ll. 21-25, "…employ an adversarial autoencoder to train a ML model to perform variational inferences over the latent variables present in audio data…"; Fig. 5, col. 19, ll. 34-57, "…The sentiment detection component 275 may predict one of three sentiment categories 540, 550. The sentiment categories may be broad such as positive, neutral, and negative or may be more precise such as angry, happy, distressed, surprised, disgust…The machine learning model for the trained model component 515, 525 may take many forms, including a neural network…").

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the emotion estimation device of Katsuhiko with the sentiment detection system using machine learning models of Rozgic, with a reasonable expectation of success, to allow systems to learn ways to solve complex problems without needing an explicit algorithm for the system to follow, and to improve human-computer interactions by enabling the system to detect the emotion or other type of sentiment of a user while speaking to the system or to another person and to determine a user's satisfaction with his or her interactions with the voice-activated system or smart speaker system (Rozgic, col. 1, l. 5 - col. 2, l. 64).

Neither Katsuhiko nor Rozgic explicitly discloses "determine the emotion of the user indicated by the one piece of audio data as the most frequent emotion, by using the respective feature amounts corresponding to the plurality of pieces of audio data and executing a majority decision approach using a plurality of integrated estimation results."

However, Fujimura, in the analogous field of endeavor, discloses an information processing device (Fig. 1, par [014-015], the pattern recognition device 100 detects the voice segment and recognizes the speaker attribute), acquiring one piece of audio data of a user (par [019], the receiver 101 receives an input of a time-series signal (such as a sound)), extracting a plurality of pieces of audio data (par [020], "…The signal processor 102 extracts the digital sound waveform every 256 point sample with a shift of a 128 point sample and makes it a single frame..."), and an estimation model obtained by executing machine learning (par [016, 036-038], the first recognizer 103 and the second recognizer 105 perform recognition processing by using a recognizer in a neural network (e.g., DNN)) for estimating feature amounts (par [020-021], the signal processor 102 calculates a 12-dimensional MFCC (Mel Frequency Cepstral Coefficient) feature or other features) indicating the speaker attribute (par [012], "…gender is recognized as a speaker attribute...other speaker attribute includes an age, a generation, emotion such as anger and sorrow, laughter, a cough, a speaker, and a voice itself..."; par [022-027], the first recognizer 103, the detector 104, and the second recognizer 105 perform the class recognition based on speaker attributes).
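For illustration only, a minimal sketch of frame-level feature extraction of the kind Fujimura describes (256-sample frames, a 128-sample shift, and 12-dimensional MFCC features); librosa and the input file name are assumed stand-ins, since Fujimura does not specify an implementation:

```python
import librosa

# Illustrative only: mirrors Fujimura's 256-sample frame / 128-sample shift /
# 12-dimensional MFCC description using librosa as an assumed stand-in.
y, sr = librosa.load("utterance.wav", sr=None)           # hypothetical input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12,
                            n_fft=256, hop_length=128)   # shape: (12, n_frames)
frame_features = mfcc.T                                   # one 12-dim vector per frame
print(frame_features.shape)
```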
Fujimura further discloses determining the emotion of the user indicated by the one piece of audio data as the most frequent emotion, by using the respective feature amounts corresponding to the plurality of pieces of audio data and executing a majority decision approach using a plurality of integrated estimation results (par [012], "…gender is recognized as a speaker attribute...other speaker attribute includes an age, a generation, emotion such as anger and sorrow, laughter, a cough, a speaker, and a voice itself..."; par [022-027], the first recognizer 103, the detector 104, and the second recognizer 105 perform the class recognition based on speaker attributes; Fig. 3, par [053], "…the first recognizer 103 performs recognition of three classes including the leaf classes (male, female) and the single class (Sil)..."; par [054], "…out of the total of 13 frames (frames between a start end 301 and a terminal 302) detected as the voice segment by the detector 104, eight frames are recognized to be the male voice while two frames are recognized to be the female voice."; par [055], "…the second recognizer 105 determines, in the event segment, the male voice 'm' largest in number among the frames excluding the silence 's' as the final recognition result in the sound segment illustrated in FIG. 3...").

It would have been obvious to a person of ordinary skill in the art to use the neural network-based, multi-level recognizers of Fujimura in the machine learning-based emotion estimation/detection system taught by Katsuhiko in view of Rozgic, to improve the device with a reasonable expectation that this would result in an emotion estimation device that could accurately detect the voice segment and recognize the speaker attribute of the segment using a probability of each class for the frame. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Katsuhiko in view of Rozgic and Fujimura to obtain the invention as specified in Claim 1 (Fujimura, par [003-014]).
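For illustration only, a minimal sketch of the majority decision approach described above, mirroring Fujimura's 13-frame example (the function name and the silence handling are illustrative assumptions):

```python
from collections import Counter

def majority_decision(frame_labels: list[str]) -> str:
    """Pick the most frequent per-frame label as the utterance-level result."""
    counts = Counter(label for label in frame_labels if label != "silence")  # ignore silent frames
    label, _ = counts.most_common(1)[0]
    return label

# Mirrors Fujimura's example: 13 frames, 8 "male", 2 "female", rest silence -> "male".
# The same voting applies to per-window emotion estimates (positive/negative/neutral).
frames = ["male"] * 8 + ["female"] * 2 + ["silence"] * 3
print(majority_decision(frames))  # "male"
```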
Regarding Claim 2: The combination of Katsuhiko, Rozgic, and Fujimura discloses the information processing device according to Claim 1, wherein the predetermined time period is set according to, in the one piece of audio data that has been previously acquired, feature amounts indicating an emotion that corresponds to an emotion of the user that has been set as a label of the one piece of audio data, and feature amounts indicating an emotion that is different from the emotion of the user that has been set as the label of the one piece of audio data (Katsuhiko, Figs. 2, 9, par [045], "…The emotion estimation unit 190 estimates the emotional state of the analysis target user at the time of utterance, using an emotion estimator mounted on the emotion estimation device 100…As shown in FIG. 10, the emotion estimating unit 190 adds the neutral, positive, and negative labels assigned by the first label assigning unit 160 and the second label assigning unit 170 based on each of the power analysis result and the pitch analysis result…"; par [079], "…for the extracted section to which the neutral label has not been added, a positive or negative discrimination is performed using the mounted emotion estimator. Then, the emotion state (positive, negative, or neutral) with the largest number of sections among the number of sections determined as positive, the number of sections determined as negative, and the number of sections determined as neutral is estimated as the emotion state of the speaker…").

Regarding Claim 3: The combination of Katsuhiko, Rozgic, and Fujimura discloses the information processing device according to Claim 1, wherein the processor is configured to extract the plurality of pieces of audio data by setting the unit time such that a number of pieces of the audio data extracted from the one piece of audio data is a predetermined number (Katsuhiko, par [025], "…the analysis window width and the shift width dt of the analysis window are set based on the sampling frequency of the audio data…"; i.e., a predetermined window width and shift width).

Regarding Claim 4: The combination of Katsuhiko, Rozgic, and Fujimura discloses the information processing device according to Claim 1, wherein the processor is configured to: estimate the respective feature amounts indicating the emotion of the user, using, as the estimation model, an individual user estimation model obtained by learning one piece of audio data for each individual user among a plurality of users (Katsuhiko, Fig. 15, par [063], "…the emotion estimator adaptation processing unit 180 converts the emotion estimator in the initial state in which voice data spoken by an unspecified number of speakers as teacher data into an emotion estimator for estimating an emotion state dedicated to a specific individual..."; see Fig. 15 and par [064-070] for an emotion estimator adaptation process for the specific user), and an overall user estimation model obtained by learning one piece of audio data related to all of the plurality of users (par [041], "…At the start of use of the emotion estimation device 100, positive or negative is determined by using an emotion estimator in an initial state in which voice data spoken by an unspecified number of speakers is generated as teacher data..."; i.e., emotion estimation of a plurality of users), and determine the emotion of the user indicated by the one piece of audio data, using feature amounts that have respectively been estimated by the individual user estimation model and the overall user estimation model (par [011-016], "…By constructing an emotion estimator for the specific individual, an emotion estimator generated as teacher data using the voice data spoken by the unspecified number of speakers estimates the emotional state of the specific individual at the time of speech. It is characterized by functioning as personal adaptation means for personal adaptation as an emotion estimator…it is possible to improve the estimation accuracy of the emotion estimation device for a specific individual….the configuration of the emotion estimation device 100, the personal adaptation process for adapting the emotion estimation device 100 as the emotion estimation device of the specific individual user, and the emotion estimation process of the specific individual by the emotion estimation device 100 will be described...").
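For illustration only, a minimal sketch of how an individual user estimation model and an overall user estimation model might be combined in the manner recited in Claim 4; the `predict` interface, the dictionary lookup, and the pooling of votes are assumptions for illustration, not taken from the references:

```python
from collections import Counter

def estimate_emotion(user_id, windows, individual_models, overall_model):
    """Score each extracted window with the user's own model (when one exists)
    and with the overall model trained on all users, then take a majority
    decision over the pooled per-window results."""
    personal = individual_models.get(user_id)       # individual user estimation model, if trained
    votes = []
    for window in windows:
        if personal is not None:
            votes.append(personal.predict(window))  # per-user estimate
        votes.append(overall_model.predict(window)) # overall user estimation model
    return Counter(votes).most_common(1)[0][0]
```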
Claim 5 is a non-transitory computer-readable medium claim with limitations similar to those of Claim 1 and is rejected under a similar rationale. Additionally, Katsuhiko discloses a non-transitory computer-readable medium storing an information processing program that is executable by a computer to perform processing comprising (Katsuhiko, par [101], "…The program can be stored in a computer-readable recording medium…"): … The rationale for the combination is similar to that provided for Claim 1.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JANGWOEN LEE, whose telephone number is (703) 756-5597. The examiner can normally be reached Monday-Friday, 8:00 am - 5:00 pm ET. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, BHAVESH MEHTA, can be reached at (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JANGWOEN LEE/
Examiner, Art Unit 2656

/Paras D Shah/
Supervisory Patent Examiner, Art Unit 2653

03/31/2026

Prosecution Timeline

Nov 29, 2023
Application Filed
Oct 28, 2025
Non-Final Rejection — §101, §103
Jan 22, 2026
Examiner Interview Summary
Jan 22, 2026
Applicant Interview (Telephonic)
Jan 26, 2026
Response Filed
Mar 31, 2026
Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597432
HUM NOISE DETECTION AND REMOVAL FOR SPEECH AND MUSIC RECORDINGS
2y 5m to grant • Granted Apr 07, 2026
Patent 12586571
EFFICIENT SPEECH TO SPIKES CONVERSION PIPELINE FOR A SPIKING NEURAL NETWORK
2y 5m to grant • Granted Mar 24, 2026
Patent 12573381
SPEECH RECOGNITION METHOD AND APPARATUS, STORAGE MEDIUM, AND ELECTRONIC DEVICE
2y 5m to grant • Granted Mar 10, 2026
Patent 12567430
METHOD AND DEVICE FOR IMPROVING DIALOGUE INTELLIGIBILITY DURING PLAYBACK OF AUDIO DATA
2y 5m to grant • Granted Mar 03, 2026
Patent 12566930
CONDITIONING OF PRODUCTIVITY APPLICATION FILE CONTENT FOR INGESTION BY AN ARTIFICIAL INTELLIGENCE MODEL
2y 5m to grant • Granted Mar 03, 2026
Study what changed to get past this examiner. Based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 82%
With Interview: 99% (+24.2%)
Median Time to Grant: 2y 11m
PTA Risk: Moderate
Based on 44 resolved cases by this examiner. Grant probability derived from career allow rate.
