DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This communication is responsive to the applicant’s amendment dated 12/11/2025. The applicant amended claims 1-4, 6, and 8-15. Additionally, the applicant cancelled claims 5 and 7.
Response to Arguments
Applicant’s arguments, see Remarks (pg. 8, lines 18-19), filed 12/11/2025, with respect to the abstract have been fully considered and are persuasive. The objection to the abstract has been withdrawn.
Applicant’s arguments, see Remarks (pg. 8, lines 20-21), filed 12/11/2025, with respect to the title have been fully considered and are persuasive. The objection to the title has been withdrawn.
Applicant’s arguments, see Remarks (pg. 8, lines 22-23), filed 12/11/2025, with respect to claims 1-13 have been fully considered and are persuasive. The 35 U.S.C. 112(f) rejection of claims 1-13 has been withdrawn.
Applicant's arguments with respect to 35 U.S.C. 101, filed 12/11/2025, have been fully considered but they are not persuasive.
The applicant argues that claim 1 does not recite an abstract idea because “outputting second script data in a synthesized voice clearly cannot practically be performed in the human mind”. Next, the applicant argues that claim 1 “integrates any purported abstract idea into a practical application and represents significantly more than an abstract idea”. The applicant supports this position by citing paragraph [0137] of the specification: “can automatically provide data that can output performance voice in accordance with an intention of the script 31 by processing the first script data 30 by the information processing device”. The examiner respectfully disagrees. The amended limitations following “generating the second script data by…” are all things a human can do. As to the practical application, eligibility cannot be furnished by the abstract idea itself. MPEP 2106.05(I) states that an inventive concept "cannot be furnished by the unpatentable law of nature (or natural phenomenon or abstract idea) itself." Genetic Techs. Ltd. v. Merial LLC, 818 F.3d 1369, 1376, 118 USPQ2d 1541, 1546 (Fed. Cir. 2016). See also Alice Corp., 573 U.S. at 217-18, 110 USPQ2d at 1981 (citing Mayo, 566 U.S. at 78, 101 USPQ2d at 1968) (after determining that a claim is directed to a judicial exception, "we then ask, ‘[w]hat else is there in the claims before us?’") (emphasis added). The claim outputs script data with a synthesized voice, which the examiner interprets as reading out the script that is being produced. This can be done by arranging people in a play to read their designated lines. The applicant is not improving or inventing voice synthesis, but simply automating the speech production process with a machine. The speech synthesis is a software program for producing human voice that is not itself being improved or invented upon; the synthesis is being used for its ordinary purpose of reading out text/a script.
Speech synthesis is not being invented or improved upon; it is being used for its well-known, ordinary purpose of outputting text data, so it does not constitute an inventive concept. The examiner therefore views the speech synthesis as mere automation of a human process, namely the production of speech. Additionally, with respect to the passage of the specification the applicant cites to support the argument for a practical application, the examiner fails to see how that feature is reflected in the claim language. If the applicant is relying on this feature to support an inventive concept, the feature must be recited in the claims. Therefore, the 35 U.S.C. 101 rejection is maintained.
Applicant’s arguments with respect to 35 U.S.C. 102 and 103 for claims 1-15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Given the amendments to the independent claims, a new ground of rejection is provided below.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-4, 6, and 8-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent claims 1 and 14 recite “output second script data in which dialogue data of a dialogue included in first script data is associated with utterer data of an utterer of the dialogue from the first script data as a basis for performance, with a synthesized voice”, “specifying a script pattern representing an arrangement of the utterer and the dialogue including in the first script data”, “extracting pairs of the utterer data and the dialogue data corresponding to each other by analyzing the first script data based on the script pattern”, “giving dialogue (IDs) to pieces of the dialogue data in ascending order of appearance of the pieces of the dialogue data in the first script data”, and “arranging, in order of the dialogue IDs, each of the dialogue IDs and a corresponding one pair of the pairs of the utterer data and the dialogue data”.
The limitation of outputting script data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “a hardware processor” and “a computer”, nothing in the claim precludes the step from practically being performed in the mind. For example, “outputting…” in the context of this claim encompasses generating data, which a human can do in the mind or with a pen and paper. Next, the limitation of specifying a script pattern, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the elements listed above, nothing in the claim precludes the step from practically being performed in the mind. For example, “specifying…” in the context of this claim encompasses classifying text, which a human can do in the mind or with a pen and paper. Next, the limitation of extracting pairs of utterer data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the elements listed above, nothing in the claim precludes the step from practically being performed in the mind. For example, “extracting…” in the context of this claim encompasses analyzing a script for data, which a human can do in the mind or with a pen and paper.
Next, the limitation of giving IDs, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the elements listed above, nothing in the claim precludes the step from practically being performed in the mind. For example, “giving…” in the context of this claim encompasses assigning an ID to text, which a human can do in the mind or with a pen and paper. Finally, the limitation of arranging IDs, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the elements listed above, nothing in the claim precludes the step from practically being performed in the mind. For example, “arranging…” in the context of this claim encompasses organizing data, which a human can do in the mind or with a pen and paper.
The judicial exception is not integrated into a practical application. In particular, the claim recites only the additional elements “a hardware processor” and “a computer” to perform the recited limitations. These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of using “a hardware processor” and “a computer” to perform the recited limitations amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. The claim is not patent eligible.
Dependent claims 2-4, 6, 8-13, and 15 are also rejected for the same reasons provided for independent claims 1 and 14 above. The dependent claims, including the further recited limitations, do not integrate the abstract idea into a practical application, and the additional elements, taken individually and in combination, do not contribute to an inventive concept. In other words, the dependent claims are directed to an abstract idea without significantly more.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4 and 10-14 are rejected under 35 U.S.C. 103 as being unpatentable over Min et al. (US 20100302254 A1) (hereinafter Min) in view of Chang (US 20130124202 A1).
Regarding independent claims 1 and 14, Min teaches an information processing device, and an information processing method executed by a computer, the device and method comprising:
a hardware processor configured to (Fig. 10, 1010, [0118] “hardware devices that are specially configured to store and perform program instructions”):
output second script data in which dialogue data of a dialogue included in first script data is associated with utterer data of an utterer of the dialogue from the first script data as a basis for performance, with a synthesized voice (FIG. 8, [0101] “The animation script generating device 120 of the receiving terminal 820 may generate an animation script 840 including the extracted emotion and the selected action. Next, the animation outputting device 230 of the receiving terminal 820 may output the animation script 840 as an animation”, examiner interprets 230 as the output unit, animation script as the second script data, and text scenario as the first script data; [0069-0070] “a technology outputting text of the text-based data as a voice may be referred to as a Text To Speech (TTS)… The animation generation unit 238 may combine the graphic generated in the graphic generation unit 234 with the voice or the background sound outputted from the audio processing unit 236 to thereby output an animation”).
Min fails to teach wherein the hardware processor is further configured to generate the second script data by: specifying a script pattern representing an arrangement of the utterer and the dialogue including in the first script data; extracting pairs of the utterer data and the dialogue data corresponding to each other by analyzing the first script data based on the script pattern; giving dialogue (IDs) to pieces of the dialogue data in ascending order of appearance of the pieces of the dialogue data in the first script data; and arranging, in order of the dialogue IDs, each of the dialogue IDs and a corresponding one pair of the pairs of the utterer data and the dialogue data.
However, Chang teaches wherein the hardware processor is further configured to generate the second script data by: specifying a script pattern representing an arrangement of the utterer and the dialogue including in the first script data (FIG. 1B, 130b describes the arrangement of the utterers, which the examiner interprets as the script pattern);
extracting pairs of the utterer data and the dialogue data corresponding to each other by analyzing the first script data based on the script pattern ([0079] “each successive pair of hard/soft alignment points is used to create an alignment sub-matrix. The alignment sub-matrix may include script words (e.g., sub-set of script words) that occur between matched script words (e.g., script words associated with hard alignment points) and intermediate transcript words (e.g., a sub-set of transcript words) that occur between matched transcript words (e.g., transcript words associated with hard alignment points). The script words may be provided along one axis (e.g., the y or x-axis) of the sub-matrix, and the intermediate transcript words may be provided along the other axis (e.g., the x or y-axis) of the sub-matrix”);
giving dialogue (IDs) to pieces of the dialogue data in ascending order of appearance of the pieces of the dialogue data in the first script data (FIG. 1C-1D, examiner interprets the timecode as the dialogue ID); and
arranging, in order of the dialogue IDs, each of the dialogue IDs and a corresponding one pair of the pairs of the utterer data and the dialogue data (FIG. 1C-1D, 136-138, [0062] “An exemplary time-aligned script data/document 134 is depicted in FIG. 1D. As depicted, time-aligned data/document 134 includes spoken words 136 grouped with other spoken words of their respective script elements 137, and provided along with their associated timecodes 138. A start time 140 for each element grouping of lines is also provided. In the depicted time-aligned data/document, each of the script elements (and text of the script elements) is also assigned a corresponding time code”).
Min and Chang are considered to be analogous to the claimed invention because both are in the same field of script generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the animation system of Min, which generates animation based on text-based data, with the technique of generating and arranging specific script data taught by Chang, in order to improve methods and apparatus for time-aligning documents (e.g., scripts) to associated video/audio content (e.g., movies) (see Chang [0006]).
Regarding claim 2, Min in view of Chang teaches all of the limitations of claim 1, upon which claim 2 depends.
Additionally, Min teaches wherein the hardware processor is further configured to output the second script data in which the dialogue data is associated with the utterer data as an estimation result of the utterer who utters the dialogue based on the dialogue data ([0020] “extraction unit configured to: identify, in the user profile database, the character corresponding to the generator of the text-based data”; FIG. 10, 1010).
Regarding claim 3, Min in view of Chang teaches all of the limitations of claim 1, upon which claim 3 depends.
Additionally, Min teaches wherein the hardware processor is further configured to output the second script data in which the utterer data is associated with the dialogue data in which a punctuation mark included in the dialogue is optimized (FIG. 10, 1032, 1040, “I guess around 399,000 won?”, examiner interprets the question mark as the punctuation mark; 1010).
Regarding claim 4, Min in view of Chang teaches all of the limitations of claim 1, upon which claim 4 depends.
Additionally, Min teaches wherein the hardware processor is further configured to estimate a feeling of the utterer at a time of uttering the dialogue data, and output the first script data with which feeling data of the estimated feeling is further associated ([0017] “[T]here is provided a receiving terminal, including: a reference database configured to store user relationship information, character information, and emotional information, and an animation script generating device configured to: extract an emotion based on analyzing received text-based data”; 1010).
Regarding claim 10, Min in view of Chang teaches all of the limitations of claim 1, upon which claim 10 depends.
Additionally, Min teaches receive setting information including dictionary identification information of voice dictionary data corresponding to the dialogue data included in the second script data ([0014] “The emotion extraction unit may be further configured to: identify, in the user profile database, the character corresponding to the generator of the text-based data and the user relationship information, identify, in the emotional vocabulary dictionary database”); and
generate third script data in which the received setting information is associated with the corresponding dialogue data in the second script data ([0068] “The graphic generation unit 234 may generate a graphic in the background image based on an action of the character according to the camera work. The graphic generation unit 234 may generate the graphic including text of the text-based data”).
Regarding claim 11, Min in view of Chang teaches all of the limitations of claim 10, upon which claim 11 depends.
Additionally, Min teaches wherein the hardware processor is further configured to receive the setting information further including voice quality information at a time when the dialogue of the dialogue data is uttered ([0069] “The audio processing unit 236 may output the text-based data as a voice in which an intonation or a tone is applied according to the emotional information”. Examiner interprets tone as voice quality information; 1010).
Regarding claim 12, Min in view of Chang teaches all of the limitations of claim 10, upon which claim 12 depends.
Additionally, Min teaches generate performance voice data including dialogue voice data in which the dialogue data included in the third script data is associated with at least one of a voice synthesis parameter for generating synthesized voice of the dialogue data using the voice dictionary data identified with the corresponding dictionary identification information and synthesized voice data of the synthesized voice ([0059] “In addition, the emotion extraction unit 122 may extract emotions from the text-based data, or extract the emotions from predetermined physiological features such as measured pulses or brain waves of a user. The physiological features may be measured through an external instrument such as a pulse meter, an electric spgygnomanometer, an electroencephalograph, and the like”; [0050] “The emotional vocabulary dictionary database 111 is a database in which mapping information of emotions based on vocabulary and user relationship information may be stored”).
Regarding claim 13, Min in view of Chang teaches all of the limitations of claim 12, upon which claim 13 depends.
Additionally, Min teaches wherein the hardware processor is further configured to give one or a plurality of labels to the dialogue voice data (FIG. 12, examiner interprets the columns of the table as labels of the voice data and 122 as the label giving unit; 1010).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Min in view of Chang, as applied to claim 1 above, and further in view of Yang et al. (US 20220351714 A1) (hereinafter Yang).
Regarding claim 6, Min in view of Chang teaches all of the limitations of claim 1, upon which claim 6 depends.
Min in view of Chang fails to teach wherein the hardware processor is further configured to output the second script data as an output result obtained by inputting the first script data to a first learning model.
However, Yang teaches wherein the hardware processor is further configured to output the second script data as an output result obtained by inputting the first script data to a first learning model ([0157] “The model learning unit 24 can perform learning such that a neural network model has a determination reference about how to classify predetermined data, using the acquired learning data”, examiner interprets 24 as the first learning model; FIG. 4, 12).
Min, Chang, and Yang are considered to be analogous to the claimed invention because all are in the same field of text-to-speech generation systems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the script analysis techniques of Min in view of Chang with the technique of using learning models taught by Yang in order to provide a TTS method and device that allow multiple speakers to be set so as to perform speech synthesis by customizing an audiobook having multiple characters to a voice desired by a user (see Yang [0002]).
Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Min in view of Chang, and further in view of Nonaka (US 20130282376 A1).
Regarding claim 8, Min in view of Chang in view of Nonaka teaches all of the limitations of claim 1, upon which claim 8 depends.
Min in view of Chang fails to teach wherein the hardware processor is further configured to specify the script pattern of the first script data as an output result obtained by inputting the first script data to a second learning model.
However, Nonaka teaches wherein the hardware processor is further configured to specify the script pattern of the first script data as an output result obtained by inputting the first script data to a second learning model (FIG. 1, 1, [0090] “The content element may be automatically detected based on machine learning”).
Min, Chang, and Nonaka are considered to be analogous to the claimed invention because all are in the same field of script analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the script analysis techniques of Min in view of Chang with the technique of analyzing script patterns taught by Nonaka in order to improve a file format for digitizing a comic content (see Nonaka [0002]).
Regarding claim 9, Min in view of Chang in view of Nonaka teaches all of the limitations of claim 1, upon which claim 9 depends.
Min in view of Chang fails to teach wherein the hardware processor is further configured to: receive a correction instruction for the script pattern; and correct the script pattern in accordance with the correction instruction.
However, Nonaka teaches wherein the hardware processor is further configured to: receive a correction instruction for the script pattern; and correct the script pattern in accordance with the correction instruction ([0096] “For example, the rule is learned as follows. First, a correct rule is preliminarily prepared for each of a plurality of basic patterns…. The page information analysis section 10 optimizes a parameter for estimating the reading order of panels by comparing the reading order estimated for each basic pattern and the correct rule”; [0099] “The various accompanying information acquired by the page information analysis section 10 can be corrected by operating the operation section 16”).
Min, Chang, and Nonaka are considered to be analogous to the claimed invention because all are in the same field of script analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the script analysis techniques of Min in view of Chang with the technique of analyzing script patterns taught by Nonaka in order to improve a file format for digitizing a comic content (see Nonaka [0002]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Ishii et al. (US 20210174702 A1) teaches a communication skill evaluation system including: an audio input device; a measurement device; an utterance period detection unit that, based on the audio information input by the audio input device, detects an utterance period and a participant who spoke in each utterance period; a participatory role assignment unit that assigns a participatory role to each participant in accordance with whether or not each participant spoke in each utterance pair composed of two utterances obtained in chronological order from each utterance period; a feature quantity extraction unit that, based on the measurement results by the measurement device, extracts a non-verbal feature quantity for each participant, relating to the non-verbal action at the end of the utterance in each utterance period; and an estimation unit that estimates communication skill for each participant based on a combination, in each utterance period, of the participatory role and the non-verbal feature quantity.
Qu et al. (US 20200273450 A1) teaches a computer implemented method and system for processing an audio signal. The method includes the steps of extracting prosodic features from the audio signal, aligning the extracted prosodic features with a script derived from or associated with the audio signal, and segmenting the script with the aligned extracted prosodic features into structural blocks of a first type. The method may further include determining a distance measure between a structural block of a first type derived from the script with another structural block of the first type using, for example, the Damerau-Levenshtein distance.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZEESHAN SHAIKH whose telephone number is (703)756-1730. The examiner can normally be reached Monday-Friday 7:30AM-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ZEESHAN MAHMOOD SHAIKH/Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658