Prosecution Insights
Last updated: April 19, 2026
Application No. 18/208,765

METHOD AND APPARATUS FOR GENERATING SIGN LANGUAGE VIDEO, COMPUTER DEVICE, AND STORAGE MEDIUM

Final Rejection: §101, §102, §103
Filed: Jun 12, 2023
Examiner: ANTOINE, LISA HOPE
Art Unit: 3715
Tech Center: 3700 — Mechanical Engineering & Manufacturing
Assignee: Tencent Technology (Shenzhen) Company Limited
OA Round: 2 (Final)
Grant Probability: 0% (At Risk)
OA Rounds: 3-4
To Grant: 3y 2m
With Interview: 0%

Examiner Intelligence

Grants only 0% of cases
Career Allow Rate: 0% (0 granted / 15 resolved; -70.0% vs TC avg)
Interview Lift: +0.0% (minimal lift; based on resolved cases with interview)
Avg Prosecution: 3y 2m (typical timeline)
Career History: 63 total applications across all art units; 48 currently pending

Statute-Specific Performance

§101: 21.8% (-18.2% vs TC avg)
§103: 49.6% (+9.6% vs TC avg)
§102: 25.6% (-14.4% vs TC avg)
§112: 2.3% (-37.7% vs TC avg)
Deltas measured against the Tech Center average estimate • Based on career data from 15 resolved cases
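
For readers checking the math, the deltas above are consistent with a simple signed difference against the Tech Center average. A minimal sketch, assuming that interpretation; the helper names and the back-computed TC averages of 70% and 40% are illustrative assumptions, not source data:

```python
# Sketch of the delta arithmetic behind the cards and table above.
def allow_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved if resolved else 0.0

def delta_vs_tc(rate: float, tc_avg: float) -> float:
    """Signed difference between an examiner's rate and the TC average."""
    return rate - tc_avg

career = allow_rate(granted=0, resolved=15)  # 0.0
print(f"career: {career:.1f}% ({delta_vs_tc(career, 70.0):+.1f}% vs TC avg)")
# -> career: 0.0% (-70.0% vs TC avg), matching the Career Allow Rate card

print(f"sec. 103: 49.6% ({delta_vs_tc(49.6, 40.0):+.1f}% vs TC avg)")
# -> sec. 103: 49.6% (+9.6% vs TC avg), matching the table row
```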

Office Action

Rejections: §101, §102, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

This is a Final Office action in response to communications filed on January 20, 2026. Applicant amended claims 1, 2, 6-9, 13-16, and 19-20. Applicant cancelled claims 3-5, 10-12, and 17-18. Accordingly, Examiner withdraws the objections to claims 4 and 11. Applicant added claims 21-26. Claims 1, 2, 6-9, 13-16, and 19-26 remain pending in this application.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 2, 6-9, 13-16, and 19-26 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: Does the claimed invention fall inside one of the four statutory categories (process, machine, manufacture, or composition of matter)? Yes for claims 1, 2, 6-9, 13-16, and 19-26. Claims 1-2, 6-7, and 21-26 are drawn to a method for generating sign language videos (i.e., a process). Claims 8-9 and 13-14 are drawn to a computer device for generating a sign language video (i.e., a machine). Claims 15-16 and 19-20 are drawn to a non-transitory computer-readable storage medium for generating a sign language video (i.e., a manufacture).

Step 2A - Prong One: Do the claims recite a judicial exception (an abstract idea enumerated in the 2019 PEG, a law of nature, or a natural phenomenon)? Yes, for claims 1, 2, 6-9, 13-16, and 19-26. Claim 1 recites: A method for generating sign language videos performed by a computer device, the method comprising: obtaining a listener text, wherein the listener text includes (i) timestamps that indicate time intervals of audio corresponding to the listener text and (ii) text that conforms to grammatical structures of a person who is able to hear; extracting a summary of the listener text to obtain a summary text, wherein a text length of the summary text is shorter than a text length of the listener text and wherein the extracting includes: dividing the listener text into a plurality of statements; determining a respective compression ratio for each statement of the plurality of statements; compressing respective text in each statement of the plurality of statements, according to the determined respective compression ratio for the respective statement, thereby obtaining the summary text that includes a plurality of compressed statements, the summary text having full-text semantics that is consistent with full-text semantics of the listener text; and inputting the summary text into a trained translation model that is trained on text pairs of sample sign language text and sample listener text; receiving, as output from the trained translation model, a sign language text output that conforms to grammatical structures of a hearing-impaired person; and generating a sign language video based on the sign language text output including synchronizing the sign language video with the timestamps that indicate time intervals of the audio corresponding to the listener text.
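
For orientation, the pipeline recited in claim 1 can be sketched in a few lines of Python. This is a hypothetical illustration only: `TimedStatement`, the length-based `compression_ratio` policy, and the `translation_model`/`renderer` interfaces are invented stand-ins, not structures disclosed in the application or relied on by the Examiner.

```python
from dataclasses import dataclass

@dataclass
class TimedStatement:
    text: str
    start: float  # start of the corresponding audio interval (seconds)
    end: float    # end of the corresponding audio interval (seconds)

def compression_ratio(stmt: TimedStatement) -> float:
    # Placeholder policy: compress long statements harder. The claim requires
    # only that *a* respective ratio be determined per statement.
    return 0.5 if len(stmt.text) > 80 else 0.8

def compress(stmt: TimedStatement, ratio: float) -> TimedStatement:
    # Naive illustration: keep the first `ratio` share of words. A real system
    # would drop low-salience words so full-text semantics stay consistent
    # with the listener text (and, per claim 25, could filter out candidate
    # compressions whose semantic similarity falls below a threshold).
    words = stmt.text.split()
    kept = max(1, round(len(words) * ratio))
    return TimedStatement(" ".join(words[:kept]), stmt.start, stmt.end)

def generate_sign_video(statements, translation_model, renderer):
    # 1) divide -> 2) per-statement ratio -> 3) compress -> summary text
    summary = [compress(s, compression_ratio(s)) for s in statements]
    for stmt in summary:
        # 4) trained model maps listener-grammar text to sign-language-grammar text
        sign_text = translation_model.translate(stmt.text)
        # 5) render and synchronize each clip with the source audio interval
        renderer.emit(sign_text, start=stmt.start, end=stmt.end)
```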
These steps amount to a mental process and a method of organizing human activity (i.e., an abstract idea) because a human can obtain listener text, summarize the extracted listener text, and convert the extracted summarized text into sign language. The Applicant’s specification discloses “acquiring listener text, the listener text being texts conforming to grammatical structures of a hearing-friendly person.” [0007] Independent claims 8 and 15 describe nearly identical steps as claim 1 (and therefore recite limitations that fall within the same grouping of abstract ideas), and these claims are therefore determined to recite an abstract idea under the same analysis. Dependent claims 2 and 6-7 are directed towards mini-tasks (performing semantic analysis, determining a compression ratio, acquiring sign language gesture information, etc.) for generating sign language videos. Each claim amounts to a form of collecting, generating, and analyzing information, and therefore falls within the scope of a method for organizing human activity (i.e., an abstract idea). As such, the Examiner concludes that claims 1, 2, 6-9, 13-16, and 19-26 recite an abstract idea.

Step 2A - Prong Two: Do the claims recite additional elements that integrate the exception into a practical application of the exception? No.

In prong two of step 2A, an evaluation is made whether a claim recites any additional element, or combination of additional elements, that integrates the exception into a practical application of that exception. An “additional element” is an element that is recited in the claim in addition to (beyond) the judicial exception (i.e., an element/limitation that sets forth an abstract idea is not an additional element). The phrase “integration into a practical application” is defined as requiring an additional element or a combination of additional elements in the claim to apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that it is more than a drafting effort designed to monopolize the exception. The requirement to execute the claimed steps/functions using a computer device (independent claims 1, 8, and 15 and dependent claims 2, 6-7, 9, 13-14, 16, and 19-26) is equivalent to adding the words “apply it” on a generic computer and/or mere instructions to implement the abstract idea on a generic computer. Similarly, the limitations of a computer device (independent claims 1, 8, and 15 and dependent claims 2, 6-7, 9, 13-14, 16, and 19-26) are recited at a high level of generality and amount to no more than mere instructions to apply the exception using generic computer components. These limitations do not impose any meaningful limits on practicing the abstract idea, and therefore do not integrate the abstract idea into a practical application (see MPEP 2106.05(f)). Use of a computer, processor, memory or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general-purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not integrate a judicial exception into a practical application or provide significantly more. See Affinity Labs v. DirecTV, 838 F.3d 1253, 1262, 120 USPQ2d 1201, 1207 (Fed. Cir. 2016) (cellular telephone); TLI Communications LLC v. AV Auto, LLC, 823 F.3d 607, 613, 118 USPQ2d 1744, 1748 (Fed. Cir. 2016) (computer server and telephone unit);
Intellectual Ventures I LLC v. Capital One Bank (USA), 792 F.3d 1363, 1367, 115 USPQ2d 1636, 1639 (Fed. Cir. 2015) (see MPEP 2106.05(f)). Further, the additional limitations beyond the abstract idea identified above serve merely to generally link the use of the judicial exception to a particular technological environment or field of use. Specifically, they serve to limit the application of the abstract idea to a computerized environment (e.g., identifying and displaying, etc.) performed by a computing device, processor, and memory, etc. This reasoning was demonstrated in Intellectual Ventures I LLC v. Capital One Bank (Fed. Cir. 2015), where the court determined that “an abstract idea does not become nonabstract by limiting the invention to a particular field of use or technological environment, such as the Internet [or] a computer.” These limitations do not impose any meaningful limits on practicing the abstract idea, and therefore do not integrate the abstract idea into a practical application (see MPEP 2106.05(h)). Dependent claims 2, 6-7, 9, 13-14, 16, and 19-26 fail to include any additional elements. In other words, each of the limitations/elements recited in the respective dependent claims is further part of the abstract idea as identified by the Examiner for each respective independent claim (i.e., they are part of the abstract idea recited in each respective claim). The Examiner has therefore determined that the additional elements, or combination of additional elements, do not integrate the abstract idea into a practical application. Accordingly, the claims are directed to an abstract idea.

Step 2B: Does the claim as a whole amount to significantly more than the judicial exception? i.e., Are there any additional elements (features/limitations/steps) recited in the claim beyond the abstract idea? No.

In step 2B, the claims are analyzed to determine whether any additional element, or combination of additional elements, is sufficient to ensure that the claims amount to significantly more than the judicial exception. This analysis is also termed a search for an “inventive concept.” An “inventive concept” is furnished by an element or combination of elements that is recited in the claim in addition to (beyond) the judicial exception, and is sufficient to ensure that the claim as a whole amounts to significantly more than the judicial exception itself. Alice Corp., 573 U.S. at 217-18, 110 USPQ2d at 1981 (citing Mayo, 566 U.S. at 72-73, 101 USPQ2d at 1966). As discussed above in “Step 2A - Prong Two”, the identified additional elements in independent claims 1, 8, and 15 and dependent claims 2, 6-7, 9, 13-14, 16, and 19-26 are equivalent to adding the words “apply it” on a generic computer, and/or generally link the use of the judicial exception to a particular technological environment or field of use. Therefore, the claims do not amount to significantly more than the judicial exception. Viewing the additional limitations in combination also shows that they fail to ensure the claims amount to significantly more than the abstract idea.
When considered as an ordered combination, the additional components of the claims add nothing that is not already present when considered separately, and thus simply append the abstract idea with words equivalent to “apply it” on a generic computer, and/or mere instructions to implement the abstract idea on a generic computer, and/or insignificant extra-solution activity associated with the implementation of the judicial exception (e.g., mere data gathering, post-solution activity), and/or well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality. Dependent claims 2, 6-7, 9, 13-14, 16, and 19-26 fail to include any additional elements. In other words, each of the limitations/elements recited in the respective dependent claims is further part of the abstract idea as identified by the Examiner for each respective independent claim (i.e., they are part of the abstract idea recited in each respective claim). The Examiner has therefore determined that no additional element, or combination of additional claim elements, is sufficient to ensure the claims amount to significantly more than the abstract idea identified above. Therefore, claims 1-2, 6-9, 13-16, and 19-26 are not eligible subject matter under 35 USC 101.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 7-8, 14-15, and 20-26 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by US 20140046661 A1 (“Bruner”).

In regards to claim 1, Bruner discloses A method for generating sign language videos, performed by a computer device, the method comprising: obtaining a listener text ([0025], “The translation platform may be configured to receive ... textual information” Examiner notes that textual information is similar to listener text in that listener text is text presented in a format for listening.), wherein the listener text includes (i) timestamps that indicate time intervals of audio corresponding to the listener text ([0056], “the translation platform may associate a timer with the … display” Examiner notes that a timer can provide timestamps that indicate time intervals.)
and (ii) text that conforms to grammatical structures of a person who is able to hear ([0021], “the Translation Platform allows for rapid conversion of Closed Captioning text” Examiner notes that closed captioning text conforms to the grammatical structures, spelling, and punctuation of the spoken language and aims to match the audio for a hearing person.); extracting a summary of the listener text to obtain a summary text, wherein a text length of the summary text is shorter than a text length of the listener text and wherein the extracting includes: dividing the listener text into a plurality of statements ([0059], “the translation platform evaluates the one or more words to be dropped and/or one or more words around the one or more words to be dropped to determine whether the word or phrase is a word or phrase of significance.” Examiner notes that for summary text insignificant details are excluded to focus on essential information and that summary text typically includes a plurality of statements.); determining a respective compression ratio for each statement of the plurality of statements ([0058], “certain words … are not converted … to save time … the system may process at a one to one element ratio”); compressing respective text in each statement of the plurality of statements ([0058], “certain words … are not converted … to save time”), according to the determined respective compression ratio for the respective statement, thereby obtaining the summary text that includes a plurality of compressed statements, the summary text having full-text semantics that is consistent with full-text semantics of the listener text ([0059], “the translation platform evaluates the one or more words to be dropped and/or one or more words around the one or more words to be dropped to determine whether the word or phrase is a word or phrase of significance.” Examiner notes that for summary text insignificant details are excluded to focus on essential information, which amounts to compression, and that summary text typically includes a plurality of statements.); and inputting the summary text into a trained translation model that is trained on text pairs of sample sign language text and sample listener text ([0025], “The translation platform may be configured to receive and process … information and/or data (e.g., … spoken information, textual information, … image … information, and/or other such information, … regardless of the language), and produce corresponding translated information”); receiving, as output from the trained translation model, a sign language text output that conforms to grammatical structures of a hearing-impaired person ([0140], “some embodiments provide a conversion, translation or the like of substantially any type of ... information into sign language” Examiner notes that sign language text inherently has unique grammatical structures that accommodate hearing-impaired individuals.); and generating a sign language video based on the sign language text output ([0141], “the sign language format can comprise one or more video clips or segments”) including synchronizing the sign language video with the timestamps that indicate time intervals of the audio corresponding to the listener text ([0056], “the translation platform may associate a timer with the … display” Examiner notes that a timer can provide timestamps that indicate time intervals.).
In regards to claim 7, Bruner discloses wherein obtaining the listener text comprises at least one of: acquiring an input listener text ([0022], “a method is disclosed for translating speech elements derived from … an input speech element”); acquiring a subtitle file, and extracting the listener text from the subtitle file ([0022], “a method is disclosed for translating speech elements derived from a Closed Captioning feed” Examiner notes that a closed captioning feed is a type of subtitle.); acquiring an audio file, performing speech recognition on the audio file to obtain a speech recognition result, and generating the listener text based on the speech recognition result ([0022], “a method is disclosed for … querying a video database” Examiner notes that a video database may include an audio file.); and acquiring a video file, performing character recognition on video frames of the video file to obtain a character recognition result, and generating the listener text based on the character recognition result ([0022], “a method is disclosed for … selecting from the video database at least one target video clip”).

In regards to claim 8, Bruner discloses A computer device, comprising a memory ([0005], “embodiments provide systems for use in translating information, comprising: … memory”); and a processor, the memory storing instructions that, when executed by the processor, cause the computer device to perform operations ([0005], “embodiments provide systems for use in translating information, comprising: a … processor … memory; one or more processors … wherein the one or more processors are configured to implement a plurality of processing instructions”). The recited operations are identical to the steps of claim 1 and are mapped to Bruner as set forth above in the rejection of claim 1.
In regards to claim 14, Bruner discloses the limitations of claim 14, which mirror those of claim 7, for the reasons set forth above in the rejection of claim 7.

In regards to claim 15, Bruner discloses A non-transitory computer-readable storage medium storing thereon computer-readable instructions, the computer-readable instructions, when executed by a processor of a computer device, causing the computer device to perform operations ([0005], “embodiments provide systems for use in translating information, comprising: a … processor … memory; one or more processors … wherein the one or more processors are configured to implement a plurality of processing instructions”). The recited operations are identical to the steps of claim 1 and are mapped to Bruner as set forth above in the rejection of claim 1.
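
Claims 7, 14, and 20 recite four alternative routes for obtaining the listener text (direct input, a subtitle file, speech recognition on an audio file, character recognition on video frames). A minimal dispatch sketch under those alternatives; every name below is invented for illustration, and the ASR/OCR paths are left as stubs:

```python
from dataclasses import dataclass

@dataclass
class Source:
    kind: str      # "text" | "subtitle" | "audio" | "video"
    payload: str   # raw text, subtitle-file contents, or a media file path

def parse_subtitles(srt: str) -> str:
    # Keep caption text; drop SRT cue numbers and "00:00:01 --> 00:00:04" lines.
    keep = [ln for ln in srt.splitlines()
            if ln.strip() and "-->" not in ln and not ln.strip().isdigit()]
    return " ".join(keep)

def speech_recognize(audio_path: str) -> str:
    raise NotImplementedError("plug an ASR system in here")    # hypothetical stub

def ocr_frames(video_path: str) -> str:
    raise NotImplementedError("plug frame-level OCR in here")  # hypothetical stub

def obtain_listener_text(src: Source) -> str:
    dispatch = {
        "text": lambda p: p,          # acquiring an input listener text
        "subtitle": parse_subtitles,  # extracting the listener text from a subtitle file
        "audio": speech_recognize,    # speech recognition on an audio file
        "video": ocr_frames,          # character recognition on video frames
    }
    return dispatch[src.kind](src.payload)
```

For example, `obtain_listener_text(Source("subtitle", "1\n00:00:01 --> 00:00:02\nHello there\n"))` strips the cue number and timing line and returns only the caption text.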
In regards to claim 20, Bruner discloses the limitations of claim 20, which mirror those of claim 7, for the reasons set forth above in the rejection of claim 7.

In regards to claim 21, Bruner discloses wherein obtaining the listener text comprises ([0025], “The translation platform may be configured to receive ... textual information” Examiner notes that textual information is similar to listener text in that listener text is text presented in a format for listening.): performing speech recognition on an audio file corresponding to a live broadcast, to obtain a speech recognition result ([0022], “a method is disclosed for … querying a video database” Examiner notes that a video database may include an audio file and that querying a video database is a form of acquiring an audio file.); and generating the listener text based on the speech recognition result ([0037], “voice recognition software can be applied to recorded audio to convert audio information into textual information” Examiner notes that voice recognition software can generate text based on speech recognition results.).

In regards to claim 22, Bruner discloses wherein obtaining the listener text comprises ([0025], “The translation platform may be configured to receive ... textual information” Examiner notes that textual information is similar to listener text in that listener text is text presented in a format for listening.): obtaining an audio file from a live broadcast ([0141], “the conversion performed … provide … conversion in real time, such as with … live broadcast content”); converting audio signals of the audio file from time-domain signals to frequency-domain signals ([0141], “the conversion can allow some embodiments to … perform translations to other formats of information and/or communication” Examiner notes that time-domain signals and frequency-domain signals are formats for communication systems.); extracting feature vectors from the frequency-domain signals ([0141], “the conversion can allow some embodiments to … perform translations to other formats of information and/or communication” Examiner notes that frequency-domain signals are formats for communication systems and that feature vectors can be extracted from frequency-domain signals.); and inputting the feature vectors into an acoustic model ([0144], “received information is processed to identify speech elements … through a voice recognition application” Examiner notes that a voice recognition application is a type of automatic speech recognition system, which includes an acoustic model.) and a language model ([0151], “The remote server … includes … the grammar engine” Examiner notes that a grammar engine can function as a language model.); and obtaining, as a decoding result of at least the acoustic model and the language model, the listener text ([0144], “received information is processed … through a voice recognition application” Examiner notes that a voice recognition application is a type of automatic speech recognition system that can decode spoken audio into written text.).

In regards to claim 23, Bruner discloses wherein the extracted feature vectors comprise linear prediction cepstral coefficients (LPCC) ([0141], “the conversion can allow some embodiments to … perform translations to other formats of information and/or communication” Examiner notes that frequency-domain signals are formats for communication systems and that feature vectors can be extracted from frequency-domain signals. Examiner also notes that a feature vector frequently comprises linear prediction cepstral coefficients.).

In regards to claim 24, Bruner discloses wherein the extracted feature vectors comprise mel frequency cepstral coefficients (MFCC) ([0141], “the conversion can allow some embodiments to … perform translations to other formats of information and/or communication” Examiner notes that frequency-domain signals are formats for communication systems and that feature vectors can be extracted from frequency-domain signals. Examiner also notes that a feature vector commonly comprises mel frequency cepstral coefficients.).

In regards to claim 25, Bruner discloses wherein compressing the respective text ([0058], “certain words … are not converted … to save time … the system may process at a one to one element ratio”) in each statement of the plurality of statements further includes: obtaining a plurality of candidate compression statements in accordance with the compressing ([0059], “the translation platform evaluates the one or more words to be dropped and/or one or more words around the one or more words to be dropped to determine whether the word or phrase is a word or phrase of significance.” Examiner notes that dropped words result in compressed statements.); and determining that the plurality of candidate compression statements includes at least one candidate compression statement whose semantic similarity with its corresponding statement is less than a threshold value ([0151], “The remote server … includes the … similarity engine” Examiner notes that a similarity engine can assess semantic similarity among words, phrases, and even data points.); and filtering out the at least one candidate compression statement to obtain the summary text ([0059], “the translation platform evaluates the one or more words to be dropped and/or one or more words around the one or more words to be dropped to determine whether the word or phrase is a word or phrase of significance.” Examiner notes that dropping words results in filtering words for a compressed statement to include in a corresponding summary text.).

In regards to claim 26, Bruner discloses wherein the listener text comprises an offline text ([0025], “The translation platform may be configured to receive … spoken information” Examiner notes that spoken information is offline information.).
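
Claims 22-24 describe a conventional ASR front end: framing, a time-to-frequency transform, per-frame feature vectors (LPCC or MFCC), then decoding with an acoustic model and a language model. A minimal numpy sketch under those assumptions; the log-spectral features and the `score`/`rescore` interfaces are stand-ins, not anything disclosed by the application or Bruner:

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Split a mono time-domain signal into overlapping frames."""
    n_frames = 1 + max(0, len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def to_frequency_domain(frames: np.ndarray) -> np.ndarray:
    """Windowed FFT per frame: time-domain signals -> magnitude spectra."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1))

def feature_vectors(spectra: np.ndarray, n_feats: int = 13) -> np.ndarray:
    # Log-spectral features as a stand-in. MFCCs (claim 24) would apply a mel
    # filterbank plus a DCT here; LPCCs (claim 23) would instead be derived
    # from linear-prediction coefficients.
    return np.log(spectra + 1e-8)[:, :n_feats]

def decode(features, acoustic_model, language_model) -> str:
    # Claim 22's final step: combine acoustic-model scores with language-model
    # scores to decode the listener text. Both interfaces are hypothetical.
    hypotheses = acoustic_model.score(features)
    return language_model.rescore(hypotheses)
```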
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2, 6, 9, 13, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Bruner in view of US 20210043110 A1 (“Jung”).

In regards to claim 2, Bruner discloses the following limitations, with the exception of the underlined limitations: wherein extracting the summary of the listener text to obtain the summary text comprises: performing semantic analysis on the listener text; extracting key statements from the listener text ([0059], “the translation platform evaluates … words to be dropped and/or … words around the one or more words to be dropped to determine whether the word or phrase is a word or phrase of significance.” Examiner notes that dropping insignificant words results in the extraction of key statements.) based on semantic analysis results, the key statements being statements for expressing full-text semantics in the listener text; and determining the key statements as the summary text ([0059], “the translation platform evaluates … words to be dropped and/or … words around the one or more words to be dropped to determine whether the word or phrase is a word or phrase of significance.” Examiner notes that dropping insignificant words results in key statements and the corresponding summary text.).

Jung discloses wherein extracting the summary of the listener text to obtain the summary text comprises: performing semantic analysis on the listener text ([0059], “The recognition step … includes a semantic recognition step … of recognizing a … sentence from the … information”); and, based on semantic analysis results, the key statements being statements for expressing full-text semantics in the listener text ([0060], “The semantic recognition step … is a process of recognizing the linguistic meaning of the speech language from the speech information storing the sound”). Bruner and Jung are considered analogous to the claimed invention because they are in the field of sign language apparatuses and methods.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Bruner’s method for generating a sign language video performed by a computer device (acquiring listener text conforming to grammatical structures of a hearing-friendly person; extracting a summary of the listener text to obtain a summary text whose text length is shorter than that of the listener text; converting the summary text into sign language text conforming to grammatical structures of a hearing-impaired person; generating the sign language video based on the sign language text; extracting key statements from the listener text; and determining the key statements as the summary text) so that the summarization extraction comprises performing semantic analysis on the listener text and extracting the key statements based on semantic analysis results, the key statements being statements for expressing full-text semantics in the listener text, as disclosed by Jung, to provide a semantic recognition step for a method, apparatus, and terminal for providing a sign language video reflecting an appearance of a conversation partner. One skilled in the art would understand and recognize the value of the addition of a semantic recognition step to improve a sign language video reflecting an appearance of a conversation partner.

In regards to claim 6, Bruner does not disclose wherein generating the sign language video based on the sign language text output comprises: acquiring sign language gesture information corresponding to sign language words in the sign language text output; controlling a virtual object to perform sign language gestures in sequence based on the sign language gesture information; and generating the sign language video based on a picture of the virtual object performing the sign language gestures.

Jung discloses wherein generating the sign language video based on the sign language text output comprises: acquiring sign language gesture information corresponding to sign language words in the sign language text output ([0015], “the word-joint information and the sentence-joint information may include one or more pieces of unit-joint information including coordinates … of a human body necessary for reproducing gestures”); controlling a virtual object to perform sign language gestures in sequence based on the sign language gesture information ([0020], “the video-based non-linguistic information may be information quantified by recognizing … a gesture appearing in the appearance image” Examiner notes that an appearance image may be a virtual object.); and generating the sign language video based on a picture of the virtual object performing the sign language gestures ([0085], “When the background image synthesis step … is performed, the motion model … reproduces the sign language gesture on the background image”). Bruner and Jung are considered analogous to the claimed invention because they are in the field of sign language apparatuses and methods.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Bruner’s method for generating a sign language video (as summarized above for claim 2) so that generating the sign language video based on the sign language text output comprises acquiring sign language gesture information corresponding to sign language words in the sign language text output, controlling a virtual object to perform sign language gestures in sequence based on the sign language gesture information, and generating the sign language video based on a picture of the virtual object performing the sign language gestures, as disclosed by Jung, to provide word-joint information, sentence-joint information, video-based non-linguistic information, and a background image synthesis step for a method, apparatus, and terminal for providing a sign language video reflecting an appearance of a conversation partner. One skilled in the art would understand and recognize the value of the addition of word-joint information, sentence-joint information, video-based non-linguistic information, and a background image to improve a sign language video reflecting an appearance of a conversation partner.

In regards to claims 9 and 16, the recited limitations mirror those of claim 2, and the claims are rejected over Bruner in view of Jung for the reasons set forth above in the rejection of claim 2.

In regards to claims 13 and 19, the recited limitations mirror those of claim 6, and the claims are rejected over Bruner in view of Jung for the reasons set forth above in the rejection of claim 6.
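
The Jung mappings above (claims 6, 13, and 19) describe driving a virtual signer from per-word joint information. A hedged sketch of that generation step; the gesture bank, the classes, and the fingerspelling fallback are assumptions for illustration, not disclosures of Jung or the application:

```python
from dataclasses import dataclass, field

Joint = tuple[float, float, float]  # x, y, z coordinates of one body joint

@dataclass
class Gesture:
    word: str
    keyframes: list[list[Joint]]  # unit-joint coordinates, one list per keyframe

@dataclass
class VirtualSigner:
    timeline: list[list[Joint]] = field(default_factory=list)

    def perform(self, gesture: Gesture) -> None:
        # Keyframes are appended in order, so gestures play back in sequence.
        self.timeline.extend(gesture.keyframes)

def generate_sign_video(sign_text: str, bank: dict[str, Gesture]) -> VirtualSigner:
    signer = VirtualSigner()
    for word in sign_text.split():
        if word in bank:               # unknown words might be fingerspelled instead
            signer.perform(bank[word])
    return signer                      # signer.timeline would drive the rendered frames
```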
Response to Arguments

Applicant's arguments filed January 20, 2026 have been fully considered, but they are not persuasive. Claims 1, 2, 6-9, 13-16, and 19-26 remain pending in this application. With respect to the amended claims, Applicant argues that “The human mind cannot practically generate sign language videos. The human mind cannot practically perform text compression according to a compression ratio to obtain the summary text. The human mind cannot practically ‘input[] the summary text into a trained translation model that is trained on text pairs of sample sign language text and sample listener text’ and ‘receiv[e], as output from the trained translation model, a sign language text output that conforms to grammatical structures of a hearing-impaired person.’ The human mind also cannot practically generate a sign language video by ‘synchronizing the sign language video with the timestamps that indicate time intervals of the audio corresponding to the listener text.’” (See AMENDMENT, SUBSTANCE OF INTERVIEW, AND RESPONSE TO NON-FINAL OFFICE ACTION, REMARKS, REMARKS CONCERNING REJECTIONS UNDER 35 U.S.C. 101, page 10, paragraph 4). Examiner acknowledges Applicant's remarks. Examiner notes that, in the 35 U.S.C. § 101 rejection, claim 1 recites a method for generating sign language videos performed by a computer device, the method comprising: obtaining a listener text, wherein the listener text includes (i) timestamps that indicate time intervals of audio corresponding to the listener text and (ii) text that conforms to grammatical structures of a person who is able to hear; extracting a summary of the listener text to obtain a summary text, wherein a text length of the summary text is shorter than a text length of the listener text and wherein the extracting includes: dividing the listener text into a plurality of statements; determining a respective compression ratio for each statement of the plurality of statements; compressing respective text in each statement of the plurality of statements, according to the determined respective compression ratio for the respective statement, thereby obtaining the summary text that includes a plurality of compressed statements, the summary text having full-text semantics that is consistent with full-text semantics of the listener text; and inputting the summary text into a trained translation model that is trained on text pairs of sample sign language text and sample listener text; receiving, as output from the trained translation model, a sign language text output that conforms to grammatical structures of a hearing-impaired person; and generating a sign language video based on the sign language text output, including synchronizing the sign language video with the timestamps that indicate time intervals of the audio corresponding to the listener text. Although a human mind cannot practically generate a video, using computing equipment a human can take the requisite steps (obtaining text, extracting a summary from the text, dividing the text into statements, etc., as described in the claimed invention) to cause the generation of a video. Also, a human can filter words out of a summary to compress the summary and thereafter calculate a compression ratio.
Further, a human can input text into a model for translation and synchronize timestamps for a video. Taken as a whole, claim 1 amounts to a form of mental process and organizing human activity (i.e., an abstract idea) because a human can obtain listener text, summarize the extracted listener text, and convert the extracted summarized text into sign language. Applicant's specification discloses “acquiring listener text, the listener text being texts conforming to grammatical structures of a hearing-friendly person” ([0007]).

In prong two of step 2A, an evaluation is made whether a claim recites any additional element, or combination of additional elements, that integrates the exception into a practical application of that exception. An “additional element” is an element that is recited in the claim in addition to (i.e., beyond) the judicial exception; an element or limitation that sets forth the abstract idea is not an additional element. The phrase “integration into a practical application” requires an additional element, or a combination of additional elements, to apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize the exception. The requirement to execute the claimed steps/functions using a computing device (independent claims 1, 8, and 15 and dependent claims 2, 6-7, 9, 13-14, 16, and 19-26) is equivalent to adding the words “apply it” on a generic computer and/or mere instructions to implement the abstract idea on a generic computer. Similarly, the limitations of memory/storage medium and processors (independent claims 1, 8, and 15 and dependent claims 2, 6-7, 9, 13-14, 16, and 19-26) are recited at a high level of generality and amount to no more than mere instructions to apply the exception using generic computer components. These limitations do not impose any meaningful limits on practicing the abstract idea and therefore do not integrate the abstract idea into a practical application (see MPEP 2106.05(f)). Use of a computer, processor, memory, or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data), or simply adding a general-purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation), does not integrate a judicial exception into a practical application or provide significantly more. See Affinity Labs of Texas, LLC v. DIRECTV, LLC, 838 F.3d 1253, 1262, 120 USPQ2d 1201, 1207 (Fed. Cir. 2016) (cellular telephone); TLI Communications LLC v. AV Automotive, LLC, 823 F.3d 607, 613, 118 USPQ2d 1744, 1748 (Fed. Cir. 2016) (computer server and telephone unit); Intellectual Ventures I LLC v. Capital One Bank (USA), 792 F.3d 1363, 1367, 115 USPQ2d 1636, 1639 (Fed. Cir. 2015). See MPEP 2106.05(f). Further, the additional limitations beyond the abstract idea identified above serve merely to generally link the use of the judicial exception to a particular technological environment or field of use. Specifically, they limit the application of the abstract idea to a computerized environment (e.g., identifying and displaying, etc.) performed by a computing device, processor, and memory. This reasoning was demonstrated in Intellectual Ventures I LLC v. Capital One Bank (USA) (Fed. Cir.
2015), where the court determined that “an abstract idea does not become nonabstract by limiting the invention to a particular field of use or technological environment, such as the Internet [or] a computer.” These limitations do not impose any meaningful limits on practicing the abstract idea and therefore do not integrate the abstract idea into a practical application (see MPEP 2106.05(h)). The Examiner has therefore determined that the additional elements, or combination of additional elements, do not integrate the abstract idea into a practical application. Accordingly, the claims are directed to an abstract idea.

In step 2B, the claims are analyzed to determine whether any additional element, or combination of additional elements, is sufficient to ensure that the claims amount to significantly more than the judicial exception. This analysis is also termed a search for an “inventive concept.” An “inventive concept” is furnished by an element or combination of elements that is recited in the claim in addition to (beyond) the judicial exception and is sufficient to ensure that the claim as a whole amounts to significantly more than the judicial exception itself. Alice Corp., 573 U.S. at 217-18, 110 USPQ2d at 1981 (citing Mayo, 566 U.S. at 72-73, 101 USPQ2d at 1966). The identified additional elements in independent claims 1, 8, and 15 and dependent claims 2, 6-7, 9, 13-14, 16, and 19-26 are equivalent to adding the words “apply it” on a generic computer and/or generally link the use of the judicial exception to a particular technological environment or field of use. Therefore, the claims as a whole do not amount to significantly more than the judicial exception itself. Viewing the additional limitations in combination also shows that they fail to ensure the claims amount to significantly more than the abstract idea. When considered as an ordered combination, the additional components of the claims add nothing that is not already present when considered separately; they simply append to the abstract idea words equivalent to “apply it” on a generic computer, mere instructions to implement the abstract idea on a generic computer, insignificant extra-solution activity associated with the implementation of the judicial exception (e.g., mere data gathering or post-solution activity), and/or well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality. Dependent claims 2, 6-7, 9, 13-14, 16, and 19-26 fail to include any additional elements; in other words, the limitations recited in the dependent claims are further part of the abstract idea identified by the Examiner for each respective independent claim. The Examiner has therefore determined that no additional element, or combination of additional claim elements, is sufficient to ensure the claims amount to significantly more than the abstract idea identified above. Therefore, the rejections of claims 1-2, 6-9, 13-16, and 19-26 under 35 U.S.C. § 101 are maintained.
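To make the claim elements under discussion concrete, the recited division, per-statement compression, and model translation can be read as a short sequence of ordinary computing steps. The Python sketch below is purely illustrative: the compression heuristic, the helper names, and the model stub are hypothetical and appear in neither the claims nor the cited references.

    # Hypothetical illustration of the recited summary-extraction steps.
    def compress(statement, ratio):
        # Compress one statement by keeping roughly `ratio` of its words.
        words = statement.split()
        keep = max(1, round(len(words) * ratio))
        return " ".join(words[:keep])

    def summarize(listener_text, target_words=5):
        # Divide the listener text into a plurality of statements.
        statements = [s.strip() for s in listener_text.split(".") if s.strip()]
        # Determine a respective compression ratio for each statement
        # (here: longer statements are compressed harder).
        ratios = [min(1.0, target_words / len(s.split())) for s in statements]
        # Compress each statement according to its respective ratio.
        return ". ".join(compress(s, r) for s, r in zip(statements, ratios))

    summary = summarize("The quarterly meeting will begin at three. Bring the reports.")
    # The summary text would then be input to a translation model trained on
    # text pairs of sample sign language text and sample listener text, e.g.:
    # sign_text = model.translate(summary)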
With respect to “REJECTION OF CLAIMS 1, 3-5, 7-8, 10-12, 14-15, 17-18, AND 20 UNDER 35 U.S.C. 102 AS BEING UNPATENTABLE OVER BRUNER”, Applicant argues “The cited portions of Bruner do not teach the newly recited claim features.” (See AMENDMENT, SUBSTANCE OF INTERVIEW, AND RESPONSE TO NON-FINAL OFFICE ACTION, REMARKS, REMARKS CONCERNING REJECTIONS UNDER 35 U.S.C. 102, page 12, paragraph 5). Examiner acknowledges Applicant's remarks. Applicant cancelled claims 3-5, 10-12, and 17-18. Accordingly, Examiner withdraws the rejections of claims 3-5, 10-12, and 17-18. Regarding claim 1, Bruner discloses a method for generating sign language videos, performed by a computer device, the method comprising: obtaining a listener text ([0025], “The translation platform may be configured to receive ... textual information” Examiner notes that textual information is similar to listener text in that listener text is text presented in a format for listening.), wherein the listener text includes (i) timestamps that indicate time intervals of audio corresponding to the listener text ([0056], “the translation platform may associate a timer with the … display” Examiner notes that a timer can provide timestamps that indicate time intervals.) and (ii) text that conforms to grammatical structures of a person who is able to hear ([0021], “the Translation Platform allows for rapid conversion of Closed Captioning text” Examiner notes that closed captioning text conforms to the grammatical structures, spelling, and punctuation of the spoken language and aims to match the audio for a hearing person.); extracting a summary of the listener text to obtain a summary text, wherein a text length of the summary text is shorter than a text length of the listener text and wherein the extracting includes: dividing the listener text into a plurality of statements ([0059], “the translation platform evaluates the one or more words to be dropped and/or one or more words around the one or more words to be dropped to determine whether the word or phrase is a word or phrase of significance.” Examiner notes that for summary text insignificant details are excluded to focus on essential information and that summary text typically includes a plurality of statements.); determining a respective compression ratio for each statement of the plurality of statements ([0058], “certain words … are not converted … to save time … the system may process at a one to one element ratio”); compressing respective text in each statement of the plurality of statements ([0058], “certain words … are not converted … to save time”), according to the determined respective compression ratio for the respective statement, thereby obtaining the summary text that includes a plurality of compressed statements, the summary text having full-text semantics that is consistent with full-text semantics of the listener text ([0059], “the translation platform evaluates the one or more words to be dropped and/or one or more words around the one or more words to be dropped to determine whether the word or phrase is a word or phrase of significance.” Examiner notes that for summary text insignificant details are excluded to focus on essential information, which amounts to compression, and that summary text typically includes a plurality of statements.); and inputting the summary text into a trained translation model that is trained on text pairs of sample sign language text and sample listener text ([0025], “The translation platform may be configured to receive and process … information and/or data (e.g., … spoken information, textual information, … image … information, and/or
other such information, … regardless of the language), and produce corresponding translated information”); receiving, as output from the trained translation model, a sign language text output that conforms to grammatical structures of a hearing-impaired person ([0140], “some embodiments provide a conversion, translation or the like of substantially any type of ... information into sign language” Examiner notes that sign language text inherently has unique grammatical structures that accommodate hearing-impaired individuals.); and generating a sign language video based on the sign language text output ([0141], “the sign language format can comprise one or more video clips or segments”) including synchronizing the sign language video with the timestamps that indicate time intervals of the audio corresponding to the listener text ([0056], “the translation platform may associate a timer with the … display” Examiner notes that a timer can provide timestamps that indicate time intervals.). MPEP § 2111 discusses proper claim interpretation, including giving claims their broadest reasonable interpretation (“BRI”) in light of the specification during examination. Under BRI, the words of a claim must be given their plain meaning unless such meaning is inconsistent with the specification, and it is improper to import claim limitations from the specification into the claim. Applicant's argument is not persuasive because the BRI is broader than what is argued. Therefore, the rejection of claim 1, as anticipated by Bruner, is maintained. Consequently, the rejections of independent claims 8 and 15, which are similar to claim 1, and dependent claims 2, 6-7, 9, 13-14, 16, and 19-26 are maintained. With respect to “REJECTION OF CLAIMS 2, 6, 9, 13, 16, AND 19 UNDER 35 U.S.C. 103 AS BEING UNPATENTABLE UNDER BRUNER, IN VIEW OF JUNG”, Applicant argues “Bruner does not teach all the limitations of the amended independent claims. Furthermore, Jung is not cited for, and does not teach, the limitations of the amended independent claims that are missing from Bruner” (See AMENDMENT, SUBSTANCE OF INTERVIEW, AND RESPONSE TO NON-FINAL OFFICE ACTION, REMARKS, REMARKS CONCERNING REJECTIONS UNDER 35 U.S.C. 103, page 13, paragraph 3). Examiner acknowledges Applicant's remarks. As set forth above in response to the arguments under 35 U.S.C. 102, Bruner discloses each limitation of amended claim 1, and the same mapping applies here.
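The synchronization limitation mapped to Bruner's timer can likewise be pictured as a small routine. The sketch below is illustrative only; the data layout and the function name are hypothetical and are not drawn from Bruner.

    # Hypothetical illustration: pair each generated sign language segment
    # with the audio time interval carried by the listener text timestamps,
    # so the sign language video plays in step with the audio.
    def synchronize(sign_segments, intervals):
        return [
            {"start": start, "end": end, "segment": segment}
            for segment, (start, end) in zip(sign_segments, intervals)
        ]

    timeline = synchronize(["HELLO ALL", "MEETING START NOW"],
                           [(0.0, 2.5), (2.5, 5.0)])
    # -> [{'start': 0.0, 'end': 2.5, 'segment': 'HELLO ALL'},
    #     {'start': 2.5, 'end': 5.0, 'segment': 'MEETING START NOW'}]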
For the reasons given above with respect to the rejection under 35 U.S.C. 102, Applicant's argument is not persuasive because the BRI is broader than what is argued. Therefore, the rejection of claim 1, as anticipated by Bruner, is maintained. Consequently, the rejections of independent claims 8 and 15, which are similar to claim 1, and dependent claims 7, 14, and 20-26 are maintained. Further, the rejections of dependent claims 2, 6, 9, 13, 16, and 19 as obvious over Bruner in view of Jung are maintained. With respect to “NEW CLAIMS 21-26”, Applicant argues “New claims 21-26 depend from claim 1 and therefore they include all the limitations of claim 1. As discussed above, neither Bruner nor Jung teaches all the limitations of amended claim 1” (See AMENDMENT, SUBSTANCE OF INTERVIEW, AND RESPONSE TO NON-FINAL OFFICE ACTION, REMARKS, REMARKS CONCERNING NEW CLAIMS, page 13, paragraph 5). Examiner acknowledges Applicant's remarks. As set forth above, Bruner discloses each limitation of amended claim 1, and that mapping applies equally to new claims 21-26, which depend from claim 1.
Applicant's argument is again not persuasive because the BRI is broader than what is argued. Therefore, the rejection of claim 1, as anticipated by Bruner, is maintained. Consequently, the rejections of independent claims 8 and 15, which are similar to claim 1, and dependent claims 2, 6-7, 9, 13-14, 16, and 19-26 are maintained.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Lisa Antoine, whose telephone number is (571) 272-4252. The examiner can normally be reached Monday through Thursday, 8:30 am to 6:30 pm ET. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Xuan Thai, can be reached at (571) 272-7147. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

LISA H ANTOINE
Examiner, Art Unit 3715

/XUAN M THAI/
Supervisory Patent Examiner, Art Unit 3715

Prosecution Timeline

Jun 12, 2023
Application Filed
Oct 16, 2025
Non-Final Rejection — §101, §102, §103
Nov 20, 2025
Interview Requested
Dec 04, 2025
Examiner Interview Summary
Dec 04, 2025
Applicant Interview (Telephonic)
Jan 20, 2026
Response Filed
Feb 13, 2026
Final Rejection — §101, §102, §103
Mar 26, 2026
Interview Requested

Prosecution Projections

3-4
Expected OA Rounds
0%
Grant Probability
0%
With Interview (+0.0%)
3y 2m
Median Time to Grant
Moderate
PTA Risk
Based on 15 resolved cases by this examiner. Grant probability derived from career allow rate.
