Office Action Analysis: 18627979 — Method for Providing a Sign-Language Avatar Video for a Primary Video

Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s amendments and arguments filed on 3/27/2026 have been considered. Claims 1-31 are pending in the application. Applicant’s amendments to the specification and claims have overcome each and every objection and 35 U.S.C. 101 rejection previously set forth in the Non-Final Office Action mailed on 12/31/2025.
Response to Arguments
Applicant’s arguments with respect to claim(s) 10 and 22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Applicant’s addition of claims 30 and 31 has also been considered, but these claims are also rejected based on the new ground of rejection. Please see the 35 U.S.C. 103 rejections below.

Claim Objections
Claim 17 is objected to because of the following informalities:
Claim 17 depends on claim 16, which was canceled by Applicant’s amendments. For the purposes of examination, Examiner has included the rejection of claim 16 mailed on 12/31/2025 for context of the rejection of claim 17; see the 35 U.S.C. 103 rejections below. Examiner respectfully suggests changing the dependency of claim 17 or canceling it.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 10, 13-18, 22, 25-27, and 29-31 is/are rejected under 35 U.S.C. 103 as being unpatentable over Marey et al (US 11908056 B2), Miller (US 20220150514 A1), and David (US 20210157872 A1), hereinafter Marey, Miller, and David respectively.

Regarding claim 10, Marey teaches a method for providing a sign-language avatar in a display (“Systems and methods for doing presenting an avatar that speaks sign language based on sentiment of a speaker is disclosed herein” - Abstract), comprising: receiving on a user's computing device or system an audio input (“A translation application running on a device receives a content item comprising a video and an audio” - Abstract); converting speech in the audio input into a transcription of speech (speech to text conversion, column 2, line 42); converting the transcription into a sequence of sign language instructions (“the translation application translates the text into sign language” – column 2, lines 42-43. NOTE: sign language in Marey inherently contains a sequence of sign language instructions so that the avatar can present to the user), the sign-language instructions including one or both of gesture or movement instructions associated with word-for-word translation of the transcription into sign language and any associated sign language grammar, including sign order (“The translation application may use a database of words that are pre-mapped to images or videos showing gestures or movements of corresponding signs.” – Col 2, Lines 48-50); and generating an animation from the sequence of sign-language instructions in a display portion of the user's computing device (“The translation application generates an avatar that speaks the first sign of the first sign language where the avatar exhibits the determined emotional state. The content item and the avatar are presented for display on the device” - Abstract).
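For orientation only, the word-to-sign lookup that the Examiner cites from Marey (a database of words pre-mapped to images or videos of signs) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Marey's implementation: the names `SignInstruction`, `SIGN_DATABASE`, `fingerspell`, and `transcript_to_instructions`, the clip paths, and the fingerspelling fallback for unknown words are all hypothetical. The sketch does word-for-word lookup only and does not model sign language grammar or sign order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignInstruction:
    """One step the avatar would perform (hypothetical structure)."""
    gloss: str  # label of the sign, e.g. "HELLO"
    clip: str   # path to a pre-recorded gesture/movement clip (made up)


# Hypothetical stand-in for a database of words pre-mapped to sign clips.
SIGN_DATABASE = {
    "hello": "clips/hello.mp4",
    "world": "clips/world.mp4",
}


def fingerspell(word: str) -> list[SignInstruction]:
    """Assumed fallback: spell an unknown word letter by letter."""
    return [
        SignInstruction(gloss=ch.upper(), clip=f"clips/letters/{ch}.mp4")
        for ch in word
        if ch.isalpha()
    ]


def transcript_to_instructions(transcript: str) -> list[SignInstruction]:
    """Word-for-word lookup of a transcript against the sign database."""
    instructions: list[SignInstruction] = []
    for raw in transcript.lower().split():
        word = raw.strip(".,!?;:")
        if not word:
            continue
        clip = SIGN_DATABASE.get(word)
        if clip is not None:
            instructions.append(SignInstruction(gloss=word.upper(), clip=clip))
        else:
            instructions.extend(fingerspell(word))
    return instructions
```

Calling `transcript_to_instructions("Hello, world!")` yields two instructions, one per known word, in transcript order.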
Marey does not teach receiving on a user's computing device or system an audio input, wherein the audio input is extracted from a video input uploaded from a transcoding pipeline; and wherein at least one of the step of converting the speech from the audio into a transcript or the step of converting the transcript into the sequence of sign language instructions is performed in a parallel computing path to the transcoding pipeline. However, Miller teaches receiving on a user's computing device or system an audio input wherein the audio input is extracted from a video input uploaded from a transcoding pipeline (“When transcoding for a media stream is established, an audio transcoder 305 and video transcoder 207 are established for the audio/video stream. The input audio/video stream is separated by a source demuxer 303 into a video input and a set of audio inputs” – Par 59, Lines. NOTE: during the transcoding process, a demuxer is used to separate the audio and the video from an audio/video input. This shows that the audio is extracted from the audio/video stream used as input for the transcoding process). It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to modify Marey by incorporating the teachings of Miller to receive an audio input extracted from a video input from a transcoding pipeline. Extracting the audio from a video would have the predictable result of yielding separate audio data that can be further processed to obtain a transcript or text data. This text data could then be translated into sign-language instructions.
Marey in view of Miller still does not teach wherein at least one of the step of converting the speech from the audio into a transcript or the step of converting the transcript into the sequence of sign language instructions is performed in a parallel computing path to the transcoding pipeline. However, David teaches wherein at least one of the step of converting the speech from the audio into a transcript or the step of converting the transcript into the sequence of sign language instructions is performed in a parallel computing path to the transcoding pipeline (“A model that extracts the audio part of the video comprises the VGGish model… In parallel, the content of the audio is grabbed with an audio to text converter.” – Par 57, Lines 4-11. NOTE: David teaches the use of a model that extracts audio from a video input. In parallel, the audio is put through an audio to text converter. After the combination, the step of converting speech from the audio into a transcript as taught by David can work in parallel with the transcoding process as taught by Miller. This modification is then added to Marey so that the digital avatar’s performance of the sign language instructions can be performed efficiently in real time.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to modify Marey by incorporating the teachings of David so that the step of converting the speech to a transcript is done in parallel with the transcoding process. This would result in the predicted outcome of a more seamless and natural viewing experience, as the sign language avatar would be able to perform the sign language in real time.
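As a reading aid, the arrangement described in the rejection of claim 10 (audio demuxed from an audio/video input, with speech-to-text running on a path parallel to transcoding) can be sketched roughly as follows. This is an illustrative sketch only and is not drawn from Marey, Miller, or David: every function here (`demux`, `transcode`, `speech_to_text`, `text_to_sign`, `process`) is a hypothetical stub, and the use of a thread pool is an assumption chosen for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def demux(av_stream: dict) -> tuple[str, str]:
    """Stub demuxer: split an audio/video input into video and audio."""
    return av_stream["video"], av_stream["audio"]


def transcode(video: str, audio: str) -> dict:
    """Stub for the transcoding pipeline (no real encoding happens)."""
    return {"video": f"transcoded({video})", "audio": f"transcoded({audio})"}


def speech_to_text(audio: str) -> str:
    """Stub speech recognizer returning a placeholder transcript."""
    return f"transcript of {audio}"


def text_to_sign(transcript: str) -> list[str]:
    """Stub translator: one upper-case gloss per transcript word."""
    return [word.upper() for word in transcript.split()]


def process(av_stream: dict) -> tuple[dict, list[str]]:
    """Run transcoding and the sign-language path side by side."""
    video, audio = demux(av_stream)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Path 1: the transcoding pipeline.
        transcoded = pool.submit(transcode, video, audio)
        # Path 2: speech-to-text then text-to-sign, parallel to path 1.
        signs = pool.submit(lambda: text_to_sign(speech_to_text(audio)))
        return transcoded.result(), signs.result()
```

In this sketch neither path waits on the other; both consume the demuxed audio independently and their results are joined only at the end.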

Regarding claim 22, the claim describes one or more non-transitory memory devices that perform the method of claim 10. Therefore, non-transitory memory device claim 22 corresponds to the method disclosed in claim 10 and is rejected for the same reasons of obviousness as used above.

Regarding claim 13, Marey in view of Miller and David teaches the method of claim 10. Marey further teaches wherein the animation is either a 3D humanoid model or a 2D humanoid model (“The translation application generates a two-dimensional or three-dimensional graphical representation of the avatar via a user interface and renders the avatar with the content item for display” – Col 13, Lines 21-24). [NOTE: Fig.3 also shows a humanoid woman news anchor as an example.]

Screenshot of Fig. 3 from Marey et al (US 11908056 B2)

Regarding claim 25, the claim describes one or more non-transitory memory devices that perform the method of claim 13. Therefore, non-transitory memory device claim 25 corresponds to the method disclosed in claim 13 and is rejected for the same reasons of obviousness as used above.

Regarding claim 14, Marey in view of Miller and David teaches the method of claim 13. Marey further teaches wherein the animation is an avatar animation, which is customizable on at least one of the user's computing device or on a video-creator's configuration options (“In some embodiments, an avatar may be customized based on user preference or user input. For example, the user may change the visual characteristics of the avatar, such as a hairstyle or an eye color 506 of an avatar, as shown in FIG. 5” – Col 14, Lines 21-25).

Screenshot of Fig. 5 from Marey et al (US 11908056 B2)

Regarding claim 26, the claim describes one or more non-transitory memory devices that perform the method of claim 14. Therefore, non-transitory memory device claim 26 corresponds to the method disclosed in claim 14 and is rejected for the same reasons of obviousness as used above.

Regarding claim 15, Marey in view of Miller and David teaches the method of claim 14. Marey further teaches wherein the avatar animation is customizable on the user's computing device, which provides a user with a selection of available avatars from which to select from (“A user may customize various features of the avatar so that the avatar resembles a person of interest in appearance as the avatar is in the real world. In some embodiments, a special avatar 502 may be generated based on a public figure (e.g., Justin Bieber) or virtual character of a content item (e.g., Harry Potter). A user may also find an avatar 508 by looking up a character online or a preconfigured avatar stored in a configuration file associated with the user.” – Col 14 Lines 32-40, Fig. 5).

Regarding claim 27, the claim describes one or more non-transitory memory devices that perform the method of claim 15. Therefore, non-transitory memory device claim 27 corresponds to the method disclosed in claim 15 and is rejected for the same reasons of obviousness as used above.

Regarding claim 16 (CANCELED), Marey [in view of Miller and David] teaches the method of claim 10 (new references added to rejection of claim 16 for consistency with Applicant’s amendments). Marey further teaches wherein the audio input is extracted from a video input (“Speech-to-text module 230 may implement any machine learning speech recognition or voice recognition techniques, such as Google® DeepMind, to decipher the speech of a user or a character in a content item” – Col 8, Lines 52-56). [NOTE: One of ordinary skill in the art would know that speech recognition requires an extraction of the audio input. Since the example used here is from a live broadcast news report, the implication can be made that the audio input is extracted from the live video broadcast input.]

Regarding claim 17, Marey in view of Miller and David teaches the method of claim 16. Marey further teaches wherein the steps occur in real time or near real time (“Because the content item is broadcast in real time (e.g., live news), the conversion may be performed in real time” – Col 10 Line 67, Col 11 Lines 1-2). [NOTE: From the disclosure, one of ordinary skill in the art would understand that the extraction of audio from the live broadcast must be occurring in real time.]
Regarding claim 18, Marey in view of Miller and David teaches the method of claim 10. Marey further teaches wherein the steps occur in real time or near real time (“The avatar may be displayed in a separate window on a display device. The avatar may be a live avatar performing the signs in real time.” – Col 3, Lines 20-22).

Regarding claim 29, the claim describes one or more non-transitory memory devices that perform the method of claim 18. Therefore, non-transitory memory device claim 29 corresponds to the method disclosed in claim 18 and is rejected for the same reasons of obviousness as used above.

Regarding claim 30, Marey in view of Miller and David teaches the method of claim 10. Marey further teaches translating the transcript into sign language (“Subsequent to speech-to-text conversion, the translation application translates the text into sign language.” – Col 2 Line 42-43. NOTE: After the combination, the step of converting the speech from an audio to a transcript taught by David can work in parallel with the transcoding process taught by Miller, see rejection of claim 10. This modification is added to Marey, who teaches that the transcript of the audio is directly used by a text-to-sign language module to produce the sign language instructions to be performed by the digital avatar, see Col 8, Lines 48-68. This step of translating the text transcript to sign language instructions should also work in parallel with the transcoding pipeline since it relies on the transcript obtained from the step of converting the speech from the audio into a transcript. This combination would then teach wherein each of the steps of converting the speech from the audio into a transcript and the step of converting the transcript into the sequence of sign language instructions are performed in a parallel computing path to the transcoding pipeline.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to modify Marey to incorporate the teachings of Miller and David to have both the step of converting the speech from the audio into a transcript and the step of converting the transcript into the sequence of sign language instructions be done in parallel with the transcoding pipeline. This would have the predicted result of a cleaner real-time experience in which the digital avatar will quickly receive the instructions to perform the sign language for the viewer in order to create a natural viewing experience.

Regarding claim 31, the claim describes one or more non-transitory memory devices that perform the method of claim 30. Therefore, non-transitory memory device claim 31 corresponds to the method disclosed in claim 30 and is rejected for the same reasons of obviousness as used above.

Claim(s) 11 and 23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Marey in view of Miller, David and Miyazaki (US 20100010951 A1), hereinafter Miyazaki.

Regarding claim 11, Marey in view of Miller and David teaches the method of claim 10. Marey does not teach wherein the animation is generated using accelerated graphical APIs. However, Miyazaki teaches wherein the animation is generated using accelerated graphical APIs (“For example, animation may be two-dimensional animation such as Flash (registered trademark), or three-dimensional CG animation using known technology such as OpenGL (registered trademark) or DirectX (registered trademark)” – Par 37 Lines 6-9). It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to modify Marey to incorporate the teachings of Miyazaki to generate the animations using an accelerated graphical API. It is known in the art to use graphical APIs such as DirectX or Vulkan to generate animations because they are built for enhancing performance and visual quality, which is integral to clearly articulating sign language gestures.

Regarding claim 23, the claim describes one or more non-transitory memory devices that perform the method of claim 11. Therefore, non-transitory memory device claim 23 corresponds to the method disclosed in claim 11 and is rejected for the same reasons of obviousness as used above.

Claim(s) 19 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Marey in view of Miller, David and Bean et al (US 20240335740 A1), hereinafter Bean.

Regarding claim 19, Marey in view of Miller and David teaches the method of claim 10. Marey does not teach wherein the user's computing device or system comprises an augmented reality device. However, Bean teaches wherein the user's computing device or system comprises an augmented reality device (“the principles of the present disclosure can be applied to other contexts in which communication via sign language may occur, including other types of interactive applications (e.g. social networking applications, video communications, other interactive virtual environments, virtual reality and augmented reality environments, etc.” – Par 31, Lines 3-9). [NOTE: One of ordinary skill in the art would be able to infer that if Bean’s disclosure can be applied to augmented reality environments, then an augmented reality device should be used to do so.] It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to modify Marey to incorporate the teachings of Bean so that the user’s device is an augmented reality device. It is known in the art that AR devices are a convenient form of displaying content (such as the avatar for performing the sign language gestures). One of ordinary skill in the art would recognize that using AR devices could help translate real-time conversation in everyday life rather than just from an audio input.

Regarding claim 20, Marey in view of Miller and David teaches the method of claim 10. Marey does not teach wherein the user's computing device or system comprises a virtual reality device. However, Bean teaches wherein the user's computing device or system comprises a virtual reality device (“In one embodiment, the various technical examples can be implemented using a virtual environment via a head-mounted display (HMD). An HMD may also be referred to as a virtual reality (VR) headset.” – Par 85, Lines 1-4). It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to modify Marey to incorporate the teachings of Bean to consider that the user’s device is a virtual reality device. One of ordinary skill in the art would recognize that a VR headset is another form of displaying content (such as the avatar for performing the sign language gestures). Users may prefer a more immersive experience when engaging with the content and having the translation on the side of the VR display would be a logical addition.

Claim(s) 12 and 24 is/are rejected under 35 U.S.C. 103 as being unpatentable over Marey in view of Miller, David, and Lee (US 20230215296), hereinafter Lee.

Regarding claim 12, Marey in view of Miller and David teaches the method of claim 10. Marey does not teach wherein the sequence of sign language instructions is generated by an AI model. However, Lee teaches wherein the sequence of sign language instructions is generated by an AI model (the processor 220 may convert the converted text into the sign language through an artificial intelligent (AI) based sign language translation model, paragraph 0083). Therefore, it would have been obvious to a person with ordinary skill in the art to have modified Marey by incorporating the teachings of Lee to include: wherein the sequence of sign language instructions is generated by an AI model. The reason for doing so would have been to allow the system to generate better sign language instructions efficiently.

Regarding claim 24, the claim describes one or more non-transitory memory devices that perform the method of claim 12. Therefore, non-transitory memory device claim 24 corresponds to the method disclosed in claim 12 and is rejected for the same reasons of obviousness as used above.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID V. NGUYEN whose telephone number is (571)272-6111. The examiner can normally be reached M-F 9:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Y Poon, can be reached at 571-270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DAVID VAN NGUYEN/ Examiner, Art Unit 2617
/KING Y POON/ Supervisory Patent Examiner, Art Unit 2617