Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are pending. Claims 1, 11, and 20 are independent.
This Application was published as US 20250232140.
Apparent priority is 17 January 2024.
The instant Application is directed to a method of translation based on sentiment analysis.
Response to Amendment
Applicant’s amendments to the claims have overcome the objection to claims 1, 11, and 20; therefore, the objection is withdrawn.
Response to Arguments
35 USC 101
Applicant's arguments have been fully considered but they are not persuasive.
Regarding Argument (a.): a. The Office's analysis fails to establish that the Applicant's claims "recite" an abstract idea
The Final Rejection mailed 1/9/26 details on pgs. 5-6 how specific claim limitations can be performed in the human mind (mental process).
Applicant argues that the phrasing “determine sentiment and perform a translation based on the sentiment” improperly imputes features into the claim language that are not present. Examiner disagrees, as these phrases, which are meant only to summarize limitations which are listed in full in subsequent sections, are taken almost verbatim from the claims. “Determine sentiment” is a feature of claim 1, as seen in line 8: “determine a sentiment score associated with a first scene of the first content item…”. “Perform a translation based on the sentiment” is a feature of claim 1, as seen in line 13: “generate a translation of the first scene based on the sentiment score…”. At least these two features are both fully recited in the claims, and can be fully performed as a mental process. For completeness of the record, the currently amended limitations which recite mental processes are reproduced here along with examples:
for a first content item, obtain text data, audio data, and video data, [Agent watches a video with sound and subtitles]
the text data and audio data being associated with a first language; [The audio and subtitles are in English]
determine a sentiment score associated with a first scene of the first content item based on corresponding portions of text data, the audio data and the video data, the portions of the video data identifying and characterizing a combination of environmental objects included in the first scene and a combination of colors associated with the first scene; [Agent scores the sentiment as positive based on a transcript, audio, and video, further identifying the scene is a party based on identifying cake and balloons, and identifying colors of red and orange as positive.]
generate a translation of the first scene based on the sentiment score, the translation being associated with a second language [Agent translates the video to Spanish using positive words]
output the translation [Agent reads the translation]
Therefore, under Step 2A Prong One, the claims do recite mental processes.
The remaining arguments under Argument (a.) (pgs. 7-8) are directed to Step 2A Prong Two. Applicant argues that the claim recites a specific improvement to a translation computing system.
MPEP 2106.05(a) states: “It is important to note, the judicial exception alone cannot provide the improvement.” The argued improvement to the translation computing system is provided only by the limitations listed above, which recite a judicial exception. Therefore, the claims do not provide an improvement (practical application) under Step 2A Prong Two.
Regarding Argument (b.): b. Applicant's independent claims are not "directed to" an abstract idea
As mentioned above, Final Rejection mailed 1/9/26 details on pgs. 5-6 every limitation of claim 1. The updated rejection below includes similar analysis for the amended claims.
Applicant argues that a “computing system” and “at least one processor” (as well as “computer-implemented method” and “tangible, non-transitory computer-readable medium” in claims 11 and 20 respectively) “represent meaningful limitations that integrate any allegedly abstract idea into a practical application, which, among other things, improves the operation of a translation computing system itself.” (pg. 10). Examiner disagrees, and maintains that these additional elements are generic computing components as will be discussed further below. As previously mentioned, an improvement that integrates the judicial exception into a practical application cannot be provided by the abstract idea alone (MPEP 2106.05(a)). The limitations of a “computing system” and “at least one processor” (as well as “computer-implemented method” and “tangible, non-transitory computer-readable medium” in claims 11 and 20 respectively) do not of themselves provide any improvement. Although not explicitly argued, a “communications interface” and a “memory” also do not provide an improvement. Thus, when considered as a whole, the claim elements do not integrate the abstract idea into a practical application under Step 2A Prong Two.
Regarding Argument (c.): c. Applicant's independent claims amount to "significantly more" than any alleged abstract idea
Applicant argues that the above-quoted elements (“computing system”, “at least one processor”, “computer-implemented method”, and “tangible, non-transitory computer-readable medium”) do not merely recite a generic computer implementation of human activity, but rather address a specific technological deficiency.
The Step 2B analysis is similar to the Step 2A Prong Two analysis with an additional consideration of whether additional elements are well-understood, routine, conventional activities previously known to the industry. See MPEP 2106.05(d). Examiner maintains that the use of the quoted elements of a “computing system”, “at least one processor”, “computer-implemented method”, and “tangible, non-transitory computer-readable medium” is well-understood, routine, and conventional activity. As mentioned above, the additional elements themselves do not provide an improvement. Therefore, under Step 2B, the additional elements do not amount to significantly more than the judicial exception.
Applicant argues that references to MPEP 2106.05(h) and an unidentified portion of Alice Corp. Pty. Ltd. v. CLS Bank Intl, 134 S. Ct. 2347 (2014) are inapplicable. Examiner is unclear what Applicant is referencing, as these references do not appear to be cited in the Final Action mailed 1/9/26. Further, at least MPEP 2106.05(h) is applicable to Step 2B: “Another consideration when determining whether a claim integrates the judicial exception into a practical application in Step 2A Prong Two or recites significantly more than a judicial exception in Step 2B is whether the additional elements amount to more than generally linking the use of a judicial exception to a particular technological environment or field of use.” Examiner requests clarification on which pages of the office action are referenced. The Final Action of 1/9/26 does reference MPEP 2106.05(f) under Step 2B, which is also applicable. “For example, because this consideration often overlaps with the improvement consideration (see MPEP § 2106.05(a)), the particular machine and particular transformation considerations (see MPEP § 2106.05(b) and (c), respectively), and the well-understood, routine, conventional consideration (see MPEP § 2106.05(d)), evaluation of those other considerations may assist examiners in making a determination of whether an element (or combination of elements) is more than mere instructions to apply an exception. Note, however, that examiners should not evaluate the well-understood, routine, conventional consideration in the Step 2A Prong Two analysis, because that consideration is only evaluated in Step 2B.” (MPEP 2106.05(f)).
Applicant requested documentary evidence that the claimed elements are “well known, routine, and conventional activities.” In response, Examiner cites, as one example, Torres et al. (US 20200210780 A1), which discloses, as well-known, the argued elements of “computing system”, “at least one processor”, “computer-implemented method”, and “tangible, non-transitory computer-readable medium”, as well as the unargued additional elements of a “communications interface”, a “memory storing instructions”, “audio device”, and “display device” (mapping added in parentheses):
“[0045] The above-described methods can be implemented on a computer using well-known computer processors, memory units, storage devices, computer software, and other components. (computing system, computer-implemented method) A high-level block diagram of such a computer is illustrated in FIG. 6. Computer 600 contains a processor 610, (at least one processor) which controls the overall operation of the computer 600 by executing computer program instructions which define such operation. It is to be understood that the processor 610 can include any type of device capable of executing instructions. For example, the processor 610 may include one or more of a central processing unit (CPU), a graphical processing unit (GPU), a field-programmable gate array (FPGA), and an application-specific integrated circuit (ASIC). The computer program instructions may be stored in a storage device 620 (e.g., magnetic disk) and loaded into memory 630 when execution of the computer program instructions is desired. Thus, the steps of the methods described herein may be defined by the computer program instructions stored in the memory 630 (tangible, non-transitory computer-readable medium, memory storing instructions) and controlled by the processor 610 executing the computer program instructions. According to various implementations, the computer may perform method steps as part of an in-house server or cloud based service. The computer 600 may include one or more network interfaces 650 for communicating with other devices (communications interface) via a network. The computer 600 also includes other input/output devices 660 that enable user interaction with the computer 600 (e.g., display, (display device) keyboard, mouse, speakers, (audio device) buttons, etc.). According to various embodiments, FIG. 6 is a high level representation of possible components of a computer for illustrative purposes and the computer may contain other components.”
As shown by Torres, the basic, and generically described, computing components listed in the claims do not exceed well-understood, routine, conventional activities known in the industry. Therefore, the additional elements do not amount to significantly more than the judicial exception under Step 2B.
Additionally, Examiner’s arguments in the Final Action of 1/9/2026 (pgs. 2-3) are not addressed by Applicant, and are still applicable. As cited previously, MPEP 2106.04(a)(2), subsection III states: “Nor do the courts distinguish between claims that recite mental processes performed by humans and claims that recite mental processes performed on a computer.” As a whole, the claims amount to mental processes performed on a computer, and do not recite additional elements that integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.
Therefore, Applicant’s arguments are not persuasive, and the rejection is maintained.
35 USC 103
Applicant’s arguments with respect to 35 USC 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Step 1: The independent Claims are directed to statutory categories:
Claim 1 is a device claim and directed to the machine or manufacture category of patentable subject matter.
Claim 11 is a method claim and directed to the process category of patentable subject matter.
Claim 20 is a computer program product claim and is directed to the machine or manufacture category of patentable subject matter.
Step 2A, Prong One: Does the Claim recite a Judicially Recognized Exception? Abstract Idea? Are these Claims nevertheless considered Abstract as a Mathematical Concept (mathematical relationships, mathematical formulas or equations, mathematical calculations), Mental Process (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)), or Certain Methods of Organizing Human Activity (1-fundamental economic principles or practices (including hedging, insurance, mitigating risk), 2-commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations), 3-managing personal behavior or relationships or interactions between people (including social activities, teaching, and following rules or instructions)) and fall under the judicial exception to patentable subject matter?
The rejected Claims recite Mental Processes. See below mapping which shows the example mental process of each limitation that is not an additional element.
Step 2A, Prong Two: Additional Elements that Integrate the Judicial Exception into a Practical Application? Identifying whether there are any additional elements recited in the claim beyond the judicial exception(s), and evaluating those additional elements to determine whether they integrate the exception into a practical application of the exception. “Integration into a practical application” requires an additional element(s) or a combination of additional elements in the claim to apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize the exception. Uses the considerations laid out by the Supreme Court and the Federal Circuit to evaluate whether the judicial exception is integrated into a practical application.
The rejected Claims do not include additional limitations that point to integration of the abstract idea into a practical application and are therefore directed to a Mental Process.
Claim 1 is a generic automation of a mental process because a human agent can determine sentiment and perform a translation based on the sentiment, as further detailed below. Prong Two of step 2A in the 101 analysis asks whether the abstract idea is integrated with a practical application. The answer is no in this instance because there is no technological solution in the Claim that “integrates” the abstract idea. The Claim only suggests that the abstract idea be applied. It does not describe an application.
1. A computing system comprising: a communications interface;
a memory storing instructions; and
at least one processor coupled to the communications interface and to the memory, the at least one processor being configured to execute the instructions to: [these are generic computing components]
for a first content item, obtain text data, audio data, and video data, [Agent watches a video with sound and subtitles]
the text data and audio data being associated with a first language; [The audio and subtitles are in English]
determine a sentiment score associated with a first scene of the first content item based on corresponding portions of text data, the audio data and the video data, the portions of the video data identifying and characterizing a combination of environmental objects included in the first scene and a combination of colors associated with the first scene; [Agent scores the sentiment as positive based on a transcript, audio, and video, further identifying the scene is a party based on identifying cake and balloons, and identifying colors of red and orange as positive.]
generate a translation of the first scene based on the sentiment score, the translation being associated with a second language [Agent translates the video to Spanish using positive words]
output the translation via at least one of a display device, an audio device or a combination thereof in accordance with the sentiment score and the translation. [Agent reads the translation aloud – a display device or an audio device is a generic computing component]
Step 2B: Search for Inventive Concept: Additional Elements Do Not Amount to Significantly More: The limitations of “computing system,” “a communications interface,” “a memory,” “at least one processor,” “display device,” and “audio device” are well-understood, routine, and conventional machine components that are being used for their well-understood, routine, conventional, and rather generic functions. Additionally, these limitations are expressed parenthetically and lack nexus to the Claim language and as such are a separable and divisible mention of a machine. Accordingly, they are not sufficient to cause the Claim to amount to significantly more than the underlying abstract idea.
The Dependent Claims do not add limitations that could help the Claim as a whole to amount to significantly more than the Abstract idea identified for the Independent Claim:
2. The computing system of claim 1, wherein to determine the sentiment score, the at least one processor is further configured to: determine the sentiment score by applying a machine learning process to the portions of text data, portions of audio data, and portions of video data associated with the first scene. [“Applying a machine learning process” does not provide any details of how machine learning is implemented, and amounts to well-understood, routine, and conventional use of machine components. See MPEP 2106.05(f). Additionally, under the broadest reasonable interpretation, the machine learning process does not even have to directly determine the score and could be used for preprocessing or other tasks.]
3. The computing system of claim 1, wherein the sentiment score is based on a sentiment value associated with the portion of audio data corresponding to the first scene. [Agent scores the audio as a 3]
4. The computing system of claim 1, wherein the sentiment score is based on a sentiment value associated with the portion of text data corresponding to the first scene. [Agent scores the text as a 5]
5. The computing system of claim 1, wherein to generate the translation of the first scene, the at least one processor is further configured to: generate the translation of the first scene by applying a machine learning process to the sentiment score and the text data. [“Applying a machine learning process” does not provide any details of how machine learning is implemented, and amounts to well-understood, routine, and conventional use of machine components. See MPEP 2106.05(f). Additionally, under the broadest reasonable interpretation, the machine learning process does not even have to directly determine the translation and could be used for preprocessing or other tasks.]
6. The computing system of claim 1, wherein the translation of the first scene is translated text data associated with the second language. [Agent translates to Spanish text]
7. The computing system of claim 1, wherein the translation of the first scene is translated audio data associated with the second language. [Agent reads the Spanish translation out loud]
8. The computing system of claim 1, wherein a portion of the audio data is associated with music associated with the first scene. [Agent scores the music as a 4]
9. The computing system of claim 1, wherein the audio data is associated with dialogue associated with the first scene. [The video contains dialog]
10. The computing system of claim 1, wherein the at least one processor is further configured to: receive a search query; search the text data; search the translation; and return a search result. [Agent searches both transcripts for the word “no” and highlights it]
The additional limitations introduced by the Dependent Claims are not sufficient as additional elements that integrate the judicial exception into a practical application or as additional elements that cause the Claim as a whole to amount to significantly more than the underlying abstract idea.
With respect to Independent Claim 11 and independent Claim 20, which have limitations similar to the limitations of Claim 1, there are no additional limitations that cause the Claim as a whole to amount to more than the underlying abstract idea.
The Dependent Claims 12-19 are similar to claims 2-4 and 6-10 and do not add limitations that could integrate the judicial exception into a practical application or help the Claim as a whole to amount to significantly more than the Abstract idea identified for the Independent Claim.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-9, 11-18, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gupta et al. (US 20220358905 A1) in view of Poria et al. ("Fusing audio, visual and textual clues for sentiment analysis from multimodal content"), Cheng et al. (“Context-Aware Based Visual-Audio Feature Fusion for Emotion Recognition”), and Plummer et al. (US 11770572 B1).
Regarding claim 1, Gupta discloses: 1. A computing system comprising: a communications interface; ("[0191] Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire-line, optical fiber cable, radio frequency, etc., or any suitable combination of the foregoing…." )
a memory storing instructions; and ("[0189] The computer readable medium described in the claims below may be a computer readable signal medium or a computer readable storage medium..." )
at least one processor coupled to the communications interface and to the memory, the at least one processor being configured to execute the instructions to: ("[0192]... These computer program instructions may be provided to a processor of a general-purpose computer..." )
for a first content item, obtain text data, audio data, and video data, ("Input Transcription 108" ; "Input Audio 106" ; "Input Video 104" Fig. 1)
the text data and audio data being associated with a first language; ("Translate input transcription from input language to output language 210" ; "Translate input audio from input language to output language 212" Fig. 2)
determine a sentiment score associated with a first scene of the first content item based on corresponding portions of text data, the audio data and the video data, the portions of the video data identifying and characterizing a combination of environmental objects included in the first scene and a combination of colors associated with the first scene; ("[0064] In some embodiments, text preprocessor 128 is configured to convert text into phoneme analysis and/or perform emotional/sentiment analysis…" )
generate a translation of the first scene based on the sentiment score, the translation being associated with a second language; and ("[0015] Once the meta data is acquired, the input transcription and input meta information are translated into the first output language based at least on the timing information and the emotion data, such that the translated transcription and meta information include similar emotion and pacing in comparison to the input transcription and input meta information..." ; See also "[0125] Training AI meta information processor 130 to recognize and generate emotional data further improves the overall system because various sentiments can be captured and inserted into the translations…" )
output the translation via at least one of a display device, an audio device or a combination thereof in accordance with the sentiment score and the translation. (Fig. 1, “Output Media File” 122)
Gupta does not explicitly disclose that a sentiment score is based on audio data and video data, or that the portions of video data identify and characterize a combination of environmental objects included in the first scene and a combination of colors associated with the first scene.
Poria discloses: determine a sentiment score associated with a first scene of the first content item based on corresponding portions of text data, the audio data and the video data; ("Next, we fused the audio, visual and textual feature vectors to form a final feature vector which contained the information of both audio, visual and textual data. Later, a supervised classifier was employed on the fused feature vector to identify the overall polarity of each segment of the video clip. On the other hand, we also carried out an experiment on decision-level fusion, which took the sentiment classification result from 3 individual modalities as inputs and produced the final sentiment label as an output." Pg. 53, Section 4.4.)
Gupta and Poria are considered analogous art to the claimed invention because they disclose methods of sentiment analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Gupta to use audio and video with the text to determine sentiment as taught by Poria. Doing so would have been beneficial in order to enable effective extraction of the semantic and affective information conveyed during communication. (Poria, Pg. 51 Section 2.)
Poria does not explicitly disclose that the portions of video data identify and characterize a combination of environmental objects included in the first scene and a combination of colors associated with the first scene.
Cheng discloses: determine a sentiment score associated with a first scene of the first content item based on corresponding portions of the video data, the portions of the video data identifying and characterizing a combination of environmental objects included in the first scene. (“Therefore, this paper proposes an emotion recognition method based on video scenes and objects context clues.” Pg. 1, last para; see also: “Due to scenes and objects in videos include abundant emotional clues, as shown in Fig.1, we propose a temporal-spatial network to exploit the scenes and objects information respectively. Specifically, we introduce hierarchical Bi-LSTM to summarize video scenes and combine the attention mechanism with the GCN to dig the emotion relationship between different objects." Pg. 2, section A. Overview. See also Fig. 5 which shows a video scene of a graveyard is identified as Sadness. Cheng discloses emotion relationship between different objects which implies a combination of objects.)
Gupta, Poria, and Cheng are considered analogous art to the claimed invention because they disclose methods of sentiment analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Gupta in view of Poria to analyze environmental objects in the video data to determine sentiment, as disclosed by Cheng. Doing so would have been beneficial because scenes and objects in videos include abundant emotional clues (Cheng, Pg. 2, Section A) and because it would allow determining sentiment if there is not a person in the frame. Gupta discloses that emotional information improves translation (Gupta [0074]). This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
Cheng does not explicitly disclose that the portions of video data identify and characterize a combination of colors associated with the first scene.
Plummer discloses: determine a sentiment score associated with a first scene of the first content item based on corresponding portions of the video data, the portions of the video data identifying and characterizing a combination of colors associated with the first scene. (“In aspects, once the second computer vision module 108 performs its functions, the video 128 and its associated categorizations may be stored in a database 114. Optionally, in aspects, once the second computer vision module 108 performs its functions, the video 128 may be sent to further modules implementing artificial intelligence and/or machine learning models for further processing. In aspects, one such module may be the sentiment analysis module 110. The sentiment analysis module 110 enables identification of a third information from the video 128, where the third information includes, without limitation, a sentiment in the video 128. The sentiment analysis module 110 can determine the sentiment in the video 128 by using trained models that are trained to determine mood, stress levels, colors, positive/negative sentiment, etc. in the video 128 to determine an overall sentiment. FIG. 1 shows trained model 112, which may be used for such a purpose.” Col 8, para 2 – Plummer discloses multiple colors which reads on a combination of colors.)
Gupta, Poria, Cheng, and Plummer are considered analogous art to the claimed invention because they disclose methods of sentiment analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination to further analyze color combinations in the video data to determine sentiment, as disclosed by Plummer. Doing so would have been beneficial so that the sentiment could be mapped to a category/mood (Plummer Col 2; 54-57) and because it would allow determining sentiment if there is not a person in the frame. Gupta discloses that emotional information improves translation (Gupta [0074]). This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
Regarding claim 2, Gupta discloses: 2. The computing system of claim 1, wherein to determine the sentiment score, the at least one processor is further configured to: determine the sentiment score by applying a machine learning process to the portions of text data, portions of audio data, and portions of video data associated with the first scene. ("[0096] The following systems may be replaced by preprocessing AI: video preprocessor 124, audio preprocessor 126, speaker diarization processor 125, text preprocessor 128…")
Regarding claim 3, Gupta does not disclose a sentiment value based on audio data.
Poria discloses: 3. The computing system of claim 1, wherein the sentiment score is based on a sentiment value associated with the portion of audio data corresponding to the first scene. ("In decision-level fusion, we obtained feature vectors from the above-mentioned methods but instead of concatenating the feature vectors as in feature-level fusion, we used a separate classifier for each modality. The output of each classifier was treated as a classification score. In particular, we obtained a probability score for each sentiment class, from each classifier. In our case, as there are 3 sentiment classes, we obtained 3 probability scores from each modality." Pg. 56, section 8.2.)
See claim 1 for motivation statement.
Regarding claim 4, Gupta discloses a sentiment score based on text, but does not specifically disclose an intermediate value for determining the score.
Poria discloses: 4. The computing system of claim 1, wherein the sentiment score is based on a sentiment value associated with the portion of text data corresponding to the first scene. ("In decision-level fusion, we obtained feature vectors from the above-mentioned methods but instead of concatenating the feature vectors as in feature-level fusion, we used a separate classifier for each modality. The output of each classifier was treated as a classification score. In particular, we obtained a probability score for each sentiment class, from each classifier. In our case, as there are 3 sentiment classes, we obtained 3 probability scores from each modality." Pg. 56, section 8.2.)
See claim 1 for motivation statement.
Regarding claim 5, Gupta discloses: 5. The computing system of claim 1, wherein to generate the translation of the first scene, the at least one processor is further configured to: generate the translation of the first scene by applying a machine learning process to the sentiment score and the text data. ("[0072]... As exemplified in FIG. 5, various inputs are provided to transcription and meta translation generator 132, which translates the input transcription into output language 112 in the form of translated transcription 134 and translated meta information 135..."; see also "[0096]...Likewise, the following generators may be replaced by generative AI: input transcription generator 127, transcription and meta translation generator 132, translated text preprocessor 136, audio translation generator 138, translated audio preprocessor 142, and video sync generator 144...")
Regarding claim 6, Gupta discloses: 6. The computing system of claim 1, wherein the translation of the first scene is translated text data associated with the second language. ("Translated Transcription 134" Fig. 5)
Regarding claim 7, Gupta discloses: 7. The computing system of claim 1, wherein the translation of the first scene is translated audio data associated with the second language. ("Translated Audio 140" Fig. 6)
Regarding claim 8, Gupta discloses: 8. The computing system of claim 1, wherein a portion of the audio data is associated with music associated with the first scene. (Claim 8 does not specify that the music is used in the sentiment analysis, only that it is associated with a portion of the audio. One of ordinary skill in the art would understand that many videos contain music. Nothing in the system of Gupta prevents the video from containing music. Gupta discloses in [0011] and [0050] that the input audio can be preprocessed to partition the vocal streams. Therefore, the system disclosed by Gupta would be capable of performing the claimed limitations. For the purposes of compact prosecution, a reference regarding sentiment analysis of music is provided in the conclusion.)
Regarding claim 9, Gupta discloses: 9. The computing system of claim 1, wherein the audio data is associated with dialogue associated with the first scene. ("[0055] Speaker diarization processor 125 is configured to partition input audio 106 into homogeneous vocal segments according to an identifiable speaker. Ultimately, speaker diarization processor 125 performs a series of steps to identify one or more speakers in input media 102 and associate each string of speech (also referred to as a vocal segment) with the proper speaker." - vocal segments are dialogue.)
Claim 11 is a method claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
Claim 12 is a method claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 13 is a method claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 14 is a method claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.
Claim 15 is a method claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.
Claim 16 is a method claim with limitations corresponding to the limitations of Claim 7 and is rejected under similar rationale.
Claim 17 is a method claim with limitations corresponding to the limitations of Claim 8 and is rejected under similar rationale.
Claim 18 is a method claim with limitations corresponding to the limitations of Claim 9 and is rejected under similar rationale.
Claim 20 is a computer readable medium claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
Claims 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Gupta in view of Poria, Cheng, and Plummer as applied to claim 1 above, and further in view of Kofman et al. (US 20210165973 A1).
Regarding claim 10, Gupta, Poria, Cheng, and Plummer do not disclose the additional limitations. Plummer discloses (Col 3, para 1; Col 10, para 3; Col 11, para 1) searching text data, but not translated data.
Kofman discloses: 10. The computing system of claim 1, wherein the at least one processor is further configured to: receive a search query; search the text data; search the translated data; and return a search result. ("[0200] As can be seen in FIG. 13, in some example embodiments, the transcribed language UI 1300A includes a search field(s) 1360, which can be used to quickly find specified text in the viewed transcribed language and translated language transcripts..." – the transcribed language is the original language (text data).)
Gupta, Poria, Cheng, Plummer, and Kofman are considered analogous art to the claimed invention because they are in the field of speech processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination with a search engine to search both original and translated texts as taught by Kofman. Doing so would have been beneficial so that people around the world can receive diverse content (Kofman [0004]), and so that the user could search in either language.
Claim 19 is a method claim with limitations corresponding to the limitations of Claim 10 and is rejected under similar rationale.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JON C MEIS, whose telephone number is (703) 756-1566. The examiner can normally be reached Monday - Thursday, 8:30 am - 5:30 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan, can be reached at 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JON CHRISTOPHER MEIS/Examiner, Art Unit 2654
/HAI PHAN/Supervisory Patent Examiner, Art Unit 2654