Prosecution Insights
Last updated: April 19, 2026
Application No. 18/745,098

Gesture Playback System

Non-Final OA (§101, §103)
Filed: Jun 17, 2024
Examiner: MASTERS, KRISTEN MICHELLE
Art Unit: 2659
Tech Center: 2600 — Communications
Assignee: Comcast Cable Communications LLC
OA Round: 1 (Non-Final)
Grant Probability: 62% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 3y 2m
Grant Probability With Interview: 87%
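
How the headline figures relate: the 87% "with interview" estimate is consistent with simply adding the examiner's +24.7% interview lift to the 62% base grant probability. The sketch below is a rough illustration under that assumption only; the function name and the additive model are hypothetical, not the tool's documented methodology.

def estimated_grant_probability_with_interview(base: float, interview_lift: float) -> float:
    # Hypothetical additive model: base grant probability plus interview lift, capped at 100%.
    return min(base + interview_lift, 1.0)

print(estimated_grant_probability_with_interview(0.62, 0.247))  # about 0.87, matching the 87% shown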

Examiner Intelligence

Career Allow Rate: 62% (grants 62% of resolved cases; 25 granted / 40 resolved; +0.5% vs. TC avg)
Interview Lift: +24.7% among resolved cases with an interview (a strong lift of roughly +25%)
Avg Prosecution: 3y 2m typical timeline; 36 applications currently pending
Total Applications: 76 across all art units (career history)

Statute-Specific Performance

§101: 35.2% (-4.8% vs. TC avg)
§103: 46.9% (+6.9% vs. TC avg)
§102: 8.0% (-32.0% vs. TC avg)
§112: 7.1% (-32.9% vs. TC avg)
Comparisons are against a Tech Center average estimate; based on career data from 40 resolved cases.
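
To make the "vs. TC avg" deltas concrete, the short sketch below (illustrative only; the dictionary and names are hypothetical) back-computes the implied Tech Center average from each statute's rate and delta.

# Implied Tech Center average = statute allowance rate minus the reported delta.
statute_stats = {"§101": (35.2, -4.8), "§103": (46.9, 6.9), "§102": (8.0, -32.0), "§112": (7.1, -32.9)}
for statute, (rate, delta) in statute_stats.items():
    print(f"{statute}: implied TC average ≈ {rate - delta:.1f}%")
# Every statute implies roughly 40.0%, suggesting the deltas are measured against a single ~40% estimate.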

Office Action

§101 §103
Detailed Action

This communication is in response to the Application filed on 6/17/2024. Claims 1-20 are pending and have been examined. Independent Claims 1, 12, and 17 are method claims. Apparent priority: 6/17/2024.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 6/17/2024 has been considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Independent claim 1 recites, “1. A method comprising: accessing, by one or more computing devices, audio information associated with content; translating the audio information into a sequence of sign language gestures; (This relates to a human using natural language translation to translate audio into sign language.) determining an allocated duration for each gesture in the sequence of sign language gestures; (This relates to a human using perception to determine a duration for each gesture.) determining, based on the allocated duration for each gesture, a gesture playback rate; (This relates to a human using perception to determine a rate.) and providing, for output, data comprising the sequence of sign language gestures at the determined gesture playback rate. (This relates to a human using hand movements to output sign language gestures at a rate.) No additional elements. The Dependent Claims do not include additional limitations that could incorporate the abstract idea into a practical application or cause the Claim as a whole to amount to significantly more than the underlying abstract idea. Regarding Independent Claim 12, claim 12 is a method claim with limitations similar to that of Claim 1 and is rejected under the same rationale. No additional elements. Regarding Independent Claim 17, claim 17 is a method claim with limitations similar to that of Claim 1 and is rejected under the same rationale. No additional elements. Dependent claim 2 recites, “2. The method of claim 1, further comprises storing a text segment associated with the audio information, a start time associated with the text segment, and a duration associated with the text segment. (This relates to a human using pen and paper to store a text segment and start time and a duration.) No additional elements. Dependent claim 3 recites, “3. The method of claim 1, wherein the data further comprises: a start time associated with the allocated duration of each gesture; and an end time associated with the allocated duration of each gesture. (This relates to a human using perception to note a start time and duration.) No additional elements. Dependent claim 4 recites, “4. The method of claim 1, wherein the determining further comprises: sending a text segment associated with the audio information; accessing a start time associated with the text segment and a segment duration associated with the text segment; (This relates to a human using pen and paper to store a text segment and start time and a duration.) 
and determining, for each gesture, the allocated duration by dividing the segment duration with a total number of gestures in the sequence of sign language gestures.” (This relates to a human counting gestures.) No additional elements. Dependent claim 5 recites, “5. The method of claim 1, further comprises: determining, for each gesture, the gesture playback rate by dividing a predetermined gesture time with the allocated duration. (This relates to a human using perception to determine a rate and using logic and reasoning dividing gesture time with duration.) No additional elements. Dependent claim 6 recites, “6. The method of claim 1, further comprises: determining, for each gesture, the gesture playback rate that is less than a minimum playback threshold; and adjusting, based on the determination, the gesture playback rate to be equivalent to the minimum playback threshold. (This relates to a human determining a rate according to a threshold using perception and logic and reasoning and adjusting a rate to a minimum threshold.) No additional elements. Dependent claim 7 recites, “7. The method of claim 1, further comprises: determining, for each gesture, the gesture playback rate by dividing a predetermined gesture time with the allocated duration; (This relates to a human determining a rate according to a threshold using perception and logic and reasoning.) and determining an adjusted gesture playback rate based on a maximum value between a minimum playback threshold and the determined gesture playback rate. (This relates to a human adjusting a rate to a minimum threshold using perception or pen and paper.) No additional elements. Dependent claim 8 recites, “8. The method of claim 1, further comprises: receiving a content player rate; determining that the content player rate is above a normal rate; (This relates to a human using perception to determine a rate is above normal.) and determining, an adjusted gesture playback rate by multiplying the content player rate with the gesture playback rate. (This relates to a human applying logic and reasoning to determine an adjusted rate using multiplication.) No additional elements. Dependent claim 9 recites, “9. The method of claim 1, further comprises the sequence of sign language gestures associated with Sign Language. (This relates to a human performing sign language using natural language understanding and gestures.) No additional elements. Dependent claim 10 recites, “10. The method of claim 1, further comprises: determining, based on a context of a text segment, an intensity associated with each gesture. (This relates to a human determining intensity of gestures using perception and natural language understanding.) No additional elements. Dependent claim 11 recites, “11. The method of claim 1, wherein the translating further comprises training a machine learning model to translate the audio information to the sequence of sign language gestures. (This relates to a human using natural language understanding to translate audio into gestures.) No additional elements. As to claim 13, claim 13 is a parallel method claim with limitations similar to that of claim 6 and is rejected under the same rationale. As to claim 14, claim 14 is a parallel method claim with limitations similar to that of claim 6 and is rejected under the same rationale. As to claim 15, claim 15 is a parallel method claim with limitations similar to that of claim 2 and is rejected under the same rationale. 
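
For reference, the operations recited in dependent claims 4 through 8 reduce to simple arithmetic: an allocated duration obtained by dividing a segment duration by the gesture count, a playback rate obtained by dividing a predetermined gesture time by that allocated duration, clamping to a minimum playback threshold, and scaling by a content player rate when content plays faster than normal. A minimal sketch of that arithmetic follows; the function names and example values are hypothetical and are not taken from the application.

def allocated_duration(segment_duration: float, num_gestures: int) -> float:
    # Claim 4: divide the segment duration by the total number of gestures.
    return segment_duration / num_gestures

def gesture_playback_rate(predetermined_gesture_time: float, allocated: float,
                          min_threshold: float, content_player_rate: float = 1.0) -> float:
    # Claim 5: rate = predetermined gesture time / allocated duration.
    rate = predetermined_gesture_time / allocated
    # Claims 6-7: take the maximum of the computed rate and a minimum playback threshold.
    rate = max(rate, min_threshold)
    # Claim 8: if the content player rate is above normal, multiply it into the rate.
    if content_player_rate > 1.0:
        rate = rate * content_player_rate
    return rate

# Example: a 6-second text segment signed with 4 gestures, 1-second nominal gesture time.
alloc = allocated_duration(6.0, 4)                   # 1.5 seconds per gesture
print(gesture_playback_rate(1.0, alloc, 0.75))       # 0.75 (the 0.667 rate is clamped up to the minimum)
print(gesture_playback_rate(1.0, alloc, 0.5, 1.5))   # about 1.0 (0.667 scaled by a 1.5x player rate)
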
As to claim 16, claim 16 is a parallel method claim with limitations similar to that of claim 3 and is rejected under the same rationale. As to claim 18, claim 18 is a parallel method claim with limitations similar to that of claim 7 and is rejected under the same rationale. As to claim 19, claim 19 is a parallel method claim with limitations similar to that of claim 2 and is rejected under the same rationale. As to claim 20, claim 20 is a parallel method claim with limitations similar to that of claim 3 and is rejected under the same rationale.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-11, 17, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Jawahar (U.S. Patent Application Publication No. US 20230290371 A1) in view of Natesan (U.S. Patent Application Publication No. US 20200159833 A1). Regarding independent Claim 1, Jawahar teaches 1. A method comprising: accessing, by one or more computing devices, audio information associated with content; translating the audio information into a sequence of sign language gestures; (see Jawahar [0006] “In view of the foregoing, an embodiment herein provides a method for automatically generating a sign language video from an input speech using a machine learning model. The method includes extracting a plurality of spectrograms of an input speech by (i) encoding, using an encoder, a time domain series of the input speech to a frequency domain series, and (ii) decoding, using a decoder, a plurality of tokens for time steps of the frequency domain series. Each spectrogram comprises at least one visual representation of a strength of the input speech over time, the input speech is obtained from a user device associated with a user. The method includes generating a plurality of pose sequences for a current time step of the plurality of spectrograms using a first machine learning model. The first machine learning model is trained by correlating historical pose sequences of historical users in historical sign language videos with historical spectrograms of historical input speeches. The method includes automatically generating, using a discriminator of a second machine learning model, a sign language video for the input speech using the plurality of pose sequences and the plurality of spectrograms when the plurality of pose sequences are matched with corresponding the plurality of spectrograms that are extracted.”) determining, based on the allocated duration for each gesture, a gesture playback rate; (see Jawahar [0066] “FIG. 
6 illustrates a flow diagram of a method for automatically generating a sign language video from an input speech using the machine learning model of FIG. 1 according to some embodiments herein. At the step 602, the method includes extracting a plurality of spectrograms of an input speech by (i) encoding, using an encoder, a time domain series of the input speech to a frequency domain series, and (ii) decoding, using a decoder, a plurality of tokens for time steps of the frequency domain series. Each spectrogram comprises at least one visual representation of a strength of the input speech over time, wherein the input speech is obtained from a user device associated with a user. At the step 604, the method includes generating one or more pose sequences for a current time step of the plurality of spectrograms using a first machine learning model. The first machine learning model is trained by correlating historical pose sequences of historical users in historical sign language videos with historical spectrograms of historical input speeches. At the step 606, the automatically generating, using a second machine learning model, a sign language video for the input speech using the one or more pose sequences and the plurality of spectrograms when the one or more pose sequences are matched with corresponding the plurality of spectrograms that are extracted.”) and providing, for output, data comprising the sequence of sign language gestures at the determined gesture playback rate. (see Jawahar [0056] “The generator 308 is configured to input the Mel spectrogram of the input speech from the speech encoder. In some embodiments, the generator 308 embeds the Mel Spectrogram accurately. The generator 308 may include a positional encoder that encodes the Mel Spectrogram according to the previous time step and the current time step. The generator 308 is configured to input the predicted sign poses from the pose decoder. In some embodiments, the generator 308 embeds the predicted sign poses accurately. The generator 308 may include a positional encoder that encodes the predicted sign poses according to the previous time step and the current time step.”) (see Jawahar [0057] “The generator 308 associates with the speech embedding and the pose embedding, that is configured to learn attention aware representations for the modalities. In some embodiments, the modalities include any of the input speech and the pose sequence. The generator 308 is configured to fuse the modalities that learns to embed the speech segments of the input speech into the pose sequence. In some embodiments, the generator 308 learns a relationship between the input speech and the pose sequence. The generator 308 may merge the speech segments with the pose sequence. The generator 308 is configured to apply attention to the fused embedding of two modalities and to find whether the two modalities match or not.”) Jawahar does not specifically teach determining an allocated duration for each gesture in the sequence of sign language gestures; However, Natesan does teach this limitation (see Natesan [0014] “The apparatuses, methods, and non-transitory computer readable media disclosed herein address at least the aforementioned technical challenges by implementing natural language processing to extract, from a speech video of a speaker, sentences, along with their duration. 
In this regard, the apparatuses, methods, and non-transitory computer readable media disclosed herein provide for the conversion of speech to sign language by translating the speech at the sentence level. By using natural language processing, the apparatuses, methods, and non-transitory computer readable media disclosed herein may extract each sentence from a speech video, and translate each sentence to sign language standard. Further, for each sentence, along with the sentence, the sentence start and end times may be identified in the speech video. The sentence start and end times may be utilized to ensure that the sign language video that is generated and the original speech video play in sync based, for example, on alignment of each sentence start and end time.”) Jawahar and Natesan are in the same field of endeavor of signal processing, therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Jawahar to incorporate determining an allocated duration for each gesture in the sequence of sign language gestures of Natesan. This allows objective interpretation of sentences into sign language generation as recognized by Natesan [0013-0014]. As to Independent Claim 17, Claim 17 is a parallel method claim with limitations similar to that of claim 1 and is rejected under the same rationale. Furthermore Jawahar teaches and providing, for output, data comprising the sequence of sign language gestures according to a gesture playback rate and a content player rate. (see Jawahar [0056] “The generator 308 is configured to input the Mel spectrogram of the input speech from the speech encoder. In some embodiments, the generator 308 embeds the Mel Spectrogram accurately. The generator 308 may include a positional encoder that encodes the Mel Spectrogram according to the previous time step and the current time step. The generator 308 is configured to input the predicted sign poses from the pose decoder. In some embodiments, the generator 308 embeds the predicted sign poses accurately. The generator 308 may include a positional encoder that encodes the predicted sign poses according to the previous time step and the current time step.”) (see Jawahar [0057] “The generator 308 associates with the speech embedding and the pose embedding, that is configured to learn attention aware representations for the modalities. In some embodiments, the modalities include any of the input speech and the pose sequence. The generator 308 is configured to fuse the modalities that learns to embed the speech segments of the input speech into the pose sequence. In some embodiments, the generator 308 learns a relationship between the input speech and the pose sequence. The generator 308 may merge the speech segments with the pose sequence. The generator 308 is configured to apply attention to the fused embedding of two modalities and to find whether the two modalities match or not.”) As to Claim 2, Jawahar in view of Natesan teaches 2. The method of claim 1, Furthermore, Jawahar teaches further comprises storing a text segment associated with the audio information, a start time associated with the text segment, and a duration associated with the text segment. (see Jawahar [0054] “FIG. 3 is a block diagram of the second machine learning model 108B of FIG. 1 according to some embodiments herein. 
The second machine learning model 108B includes a ground truth module 302, a discriminator 304, a loss function 306, and a generator 308. The generator 308 generates the plurality of pose sequences, and the sign language video for the input speech. The ground truth module 302 determines ground truth spectrograms, ground truth pose sequences, ground truth input speech. The discriminator 304 discriminates between at least one of ground truth spectrograms, ground truth pose sequences, ground truth input speech, and generated plurality of pose sequences, generated sign language video. The loss function 306 is generated when the discriminator 304 discriminates between at least one of ground truth spectrograms, ground truth pose sequences, ground truth input speech, and generated plurality of pose sequences, generated sign language video.”) (see Jawahar [0055] “The discriminator 304 is configured to match the speech segments with the pose sequence. The pose sequence may be predicted sign poses. In some embodiments, the discriminator 304 includes a separate speech and pose embedding layers that learns a high dimensional embedding of the input speech and the pose sequence.”) (see Jawahar [0056] “The generator 308 is configured to input the Mel spectrogram of the input speech from the speech encoder. In some embodiments, the generator 308 embeds the Mel Spectrogram accurately. The generator 308 may include a positional encoder that encodes the Mel Spectrogram according to the previous time step and the current time step. The generator 308 is configured to input the predicted sign poses from the pose decoder. In some embodiments, the generator 308 embeds the predicted sign poses accurately. The generator 308 may include a positional encoder that encodes the predicted sign poses according to the previous time step and the current time step.”) As to Claim 3, Jawahar in view of Natesan teaches 3. The method of claim 1, Furthermore, Jawahar teaches wherein the data further comprises: a start time associated with the allocated duration of each gesture; and an end time associated with the allocated duration of each gesture. (see Jawahar [0042] “In some embodiments, the system 100 includes a language model to control the language, an accent, and a duration of speech, and includes an alignment module that aligns the input speech with the pose sequence, thereby eliminating false pose sequences. The database may store instructions to generate the sign language video with the input speech. The system 100 may include a memory that stores instructions and a processor that executes the stored instructions to generate the sign language video with the input speech.”) (see Jawahar [0063] “The system 100 may use an embedding size of d.sub.model=512, N=2 layers and number of heads, M=8. In some embodiments, the system 100 uses Xavier initialization and Adam optimizer with an initial learning rate of 10-3 for training the multi-task transformer and the cross modal discriminator. Data augmentations like predicting multiple-frame poses may be determined at each time step. In some embodiments, the system 100 predict 10 frames at every time step to penalize the network heavily for producing mean pose sequences.”) (see Jawahar [0064] “In some embodiments, quality of the generated sign language pose sequences can be evaluated using Dynamic Time Warping (DTW) and Probability of Correct Key points (PCK) scores. The DTW may find an optimal alignment between two time series by non-linearly warping the pose sequences. 
The PCK may be used in pose detections and generation to evaluate the probability of pose key points to be close to the ground truth key points..”) As to Claim 4, Jawahar in view of Natesan teaches 4. The method of claim 1, Furthermore, Jawahar teaches wherein the determining further comprises: sending a text segment associated with the audio information; accessing a start time associated with the text segment and a segment duration associated with the text segment; and determining, for each gesture, the allocated duration by dividing the segment duration with a total number of gestures in the sequence of sign language gestures. (see Jawahar [0066] “FIG. 6 illustrates a flow diagram of a method for automatically generating a sign language video from an input speech using the machine learning model of FIG. 1 according to some embodiments herein. At the step 602, the method includes extracting a plurality of spectrograms of an input speech by (i) encoding, using an encoder, a time domain series of the input speech to a frequency domain series, and (ii) decoding, using a decoder, a plurality of tokens for time steps of the frequency domain series. Each spectrogram comprises at least one visual representation of a strength of the input speech over time, wherein the input speech is obtained from a user device associated with a user. At the step 604, the method includes generating one or more pose sequences for a current time step of the plurality of spectrograms using a first machine learning model. The first machine learning model is trained by correlating historical pose sequences of historical users in historical sign language videos with historical spectrograms of historical input speeches. At the step 606, the automatically generating, using a second machine learning model, a sign language video for the input speech using the one or more pose sequences and the plurality of spectrograms when the one or more pose sequences are matched with corresponding the plurality of spectrograms that are extracted.”) As to Claim 5, Jawahar in view of Natesan teaches 5. The method of claim 1, Furthermore, Jawahar teaches further comprises: determining, for each gesture, the gesture playback rate by dividing a predetermined gesture time with the allocated duration. (see Jawahar [0054] “FIG. 3 is a block diagram of the second machine learning model 108B of FIG. 1 according to some embodiments herein. The second machine learning model 108B includes a ground truth module 302, a discriminator 304, a loss function 306, and a generator 308. The generator 308 generates the plurality of pose sequences, and the sign language video for the input speech. The ground truth module 302 determines ground truth spectrograms, ground truth pose sequences, ground truth input speech. The discriminator 304 discriminates between at least one of ground truth spectrograms, ground truth pose sequences, ground truth input speech, and generated plurality of pose sequences, generated sign language video. The loss function 306 is generated when the discriminator 304 discriminates between at least one of ground truth spectrograms, ground truth pose sequences, ground truth input speech, and generated plurality of pose sequences, generated sign language video.”) (see Jawahar [0055] The discriminator 304 is configured to match the speech segments with the pose sequence. The pose sequence may be predicted sign poses. 
In some embodiments, the discriminator 304 includes a separate speech and pose embedding layers that learns a high dimensional embedding of the input speech and the pose sequence.”) (see Jawahar [0056] “The generator 308 is configured to input the Mel spectrogram of the input speech from the speech encoder. In some embodiments, the generator 308 embeds the Mel Spectrogram accurately. The generator 308 may include a positional encoder that encodes the Mel Spectrogram according to the previous time step and the current time step. The generator 308 is configured to input the predicted sign poses from the pose decoder. In some embodiments, the generator 308 embeds the predicted sign poses accurately. The generator 308 may include a positional encoder that encodes the predicted sign poses according to the previous time step and the current time step.”) As to Claim 6, Jawahar in view of Natesan teaches 6. The method of claim 1, Furthermore, Jawahar teaches further comprises: determining, for each gesture, the gesture playback rate that is less than a minimum playback threshold; and adjusting, based on the determination, the gesture playback rate to be equivalent to the minimum playback threshold. (see Jawahar [0066] “FIG. 6 illustrates a flow diagram of a method for automatically generating a sign language video from an input speech using the machine learning model of FIG. 1 according to some embodiments herein. At the step 602, the method includes extracting a plurality of spectrograms of an input speech by (i) encoding, using an encoder, a time domain series of the input speech to a frequency domain series, and (ii) decoding, using a decoder, a plurality of tokens for time steps of the frequency domain series. Each spectrogram comprises at least one visual representation of a strength of the input speech over time, wherein the input speech is obtained from a user device associated with a user. At the step 604, the method includes generating one or more pose sequences for a current time step of the plurality of spectrograms using a first machine learning model. The first machine learning model is trained by correlating historical pose sequences of historical users in historical sign language videos with historical spectrograms of historical input speeches. At the step 606, the automatically generating, using a second machine learning model, a sign language video for the input speech using the one or more pose sequences and the plurality of spectrograms when the one or more pose sequences are matched with corresponding the plurality of spectrograms that are extracted.”) As to Claim 7, Jawahar in view of Natesan teaches 7. The method of claim 1 Furthermore, Jawahar teaches, further comprises: determining, for each gesture, the gesture playback rate by dividing a predetermined gesture time with the allocated duration; and determining an adjusted gesture playback rate based on a maximum value between a minimum playback threshold and the determined gesture playback rate. (see Jawahar [0054] “FIG. 3 is a block diagram of the second machine learning model 108B of FIG. 1 according to some embodiments herein. The second machine learning model 108B includes a ground truth module 302, a discriminator 304, a loss function 306, and a generator 308. The generator 308 generates the plurality of pose sequences, and the sign language video for the input speech. The ground truth module 302 determines ground truth spectrograms, ground truth pose sequences, ground truth input speech. 
The discriminator 304 discriminates between at least one of ground truth spectrograms, ground truth pose sequences, ground truth input speech, and generated plurality of pose sequences, generated sign language video. The loss function 306 is generated when the discriminator 304 discriminates between at least one of ground truth spectrograms, ground truth pose sequences, ground truth input speech, and generated plurality of pose sequences, generated sign language video.”) (see Jawahar [0055] “The discriminator 304 is configured to match the speech segments with the pose sequence. The pose sequence may be predicted sign poses. In some embodiments, the discriminator 304 includes a separate speech and pose embedding layers that learns a high dimensional embedding of the input speech and the pose sequence.”) (see Jawahar [0056] “The generator 308 is configured to input the Mel spectrogram of the input speech from the speech encoder. In some embodiments, the generator 308 embeds the Mel Spectrogram accurately. The generator 308 may include a positional encoder that encodes the Mel Spectrogram according to the previous time step and the current time step. The generator 308 is configured to input the predicted sign poses from the pose decoder. In some embodiments, the generator 308 embeds the predicted sign poses accurately. The generator 308 may include a positional encoder that encodes the predicted sign poses according to the previous time step and the current time step.”) As to Claim 8, Jawahar in view of Natesan teaches 8. The method of claim 1, Furthermore, Jawahar teaches, further comprises: receiving a content player rate; determining that the content player rate is above a normal rate; and determining, an adjusted gesture playback rate by multiplying the content player rate with the gesture playback rate. (see Jawahar [0063] “The system 100 may use an embedding size of d.sub.model=512, N=2 layers and number of heads, M=8. In some embodiments, the system 100 uses Xavier initialization and Adam optimizer with an initial learning rate of 10-3 for training the multi-task transformer and the cross modal discriminator. Data augmentations like predicting multiple-frame poses may be determined at each time step. In some embodiments, the system 100 predict 10 frames at every time step to penalize the network heavily for producing mean pose sequences.”) (see Jawahar [0064] “In some embodiments, quality of the generated sign language pose sequences can be evaluated using Dynamic Time Warping (DTW) and Probability of Correct Key points (PCK) scores. The DTW may find an optimal alignment between two time series by non-linearly warping the pose sequences. The PCK may be used in pose detections and generation to evaluate the probability of pose key points to be close to the ground truth key points.”) (see Jawahar [0066] “FIG. 6 illustrates a flow diagram of a method for automatically generating a sign language video from an input speech using the machine learning model of FIG. 1 according to some embodiments herein. At the step 602, the method includes extracting a plurality of spectrograms of an input speech by (i) encoding, using an encoder, a time domain series of the input speech to a frequency domain series, and (ii) decoding, using a decoder, a plurality of tokens for time steps of the frequency domain series. Each spectrogram comprises at least one visual representation of a strength of the input speech over time, wherein the input speech is obtained from a user device associated with a user. 
At the step 604, the method includes generating one or more pose sequences for a current time step of the plurality of spectrograms using a first machine learning model. The first machine learning model is trained by correlating historical pose sequences of historical users in historical sign language videos with historical spectrograms of historical input speeches. At the step 606, the automatically generating, using a second machine learning model, a sign language video for the input speech using the one or more pose sequences and the plurality of spectrograms when the one or more pose sequences are matched with corresponding the plurality of spectrograms that are extracted.”) As to Claim 9, Jawahar in view of Natesan teaches 9. The method of claim 1, Furthermore, Jawahar teaches, further comprises the sequence of sign language gestures associated with Sign Language. (see Jawahar [0066] “FIG. 6 illustrates a flow diagram of a method for automatically generating a sign language video from an input speech using the machine learning model of FIG. 1 according to some embodiments herein. At the step 602, the method includes extracting a plurality of spectrograms of an input speech by (i) encoding, using an encoder, a time domain series of the input speech to a frequency domain series, and (ii) decoding, using a decoder, a plurality of tokens for time steps of the frequency domain series. Each spectrogram comprises at least one visual representation of a strength of the input speech over time, wherein the input speech is obtained from a user device associated with a user. At the step 604, the method includes generating one or more pose sequences for a current time step of the plurality of spectrograms using a first machine learning model. The first machine learning model is trained by correlating historical pose sequences of historical users in historical sign language videos with historical spectrograms of historical input speeches. At the step 606, the automatically generating, using a second machine learning model, a sign language video for the input speech using the one or more pose sequences and the plurality of spectrograms when the one or more pose sequences are matched with corresponding the plurality of spectrograms that are extracted.”) As to Claim 10, Jawahar in view of Natesan teaches 10. The method of claim 1, Furthermore, Natesan teaches further comprises: determining, based on a context of a text segment, an intensity associated with each gesture. (see Natesan [0072] “The sentiment analyzer 116 may identify the sentiment 118 of the sentence using sentiment analysis. In this regard, the sentiment analyzer 116 may utilize machine learning to identify sentiment of each sentence. The sentiment may include, for example, joy, sorrow, thinking, etc. The sentiment may be used to ensure a sign video with face expression aligned to the sentiment is shown to the user 106.”) Jawahar and Natesan are in the same field of endeavor of signal processing, therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Jawahar and Natesan to incorporate further comprises: determining, based on a context of a text segment, an intensity associated with each gesture of Natesan. This allows objective interpretation of sentences into sign language generation as recognized by Natesan [0013-0014]. As to Claim 11, Jawahar in view of Natesan teaches 11. 
The method of claim 1, Furthermore, Jawahar teaches wherein the translating further comprises training a machine learning model to translate the audio information to the sequence of sign language gestures. (see Jawahar [0006] “In view of the foregoing, an embodiment herein provides a method for automatically generating a sign language video from an input speech using a machine learning model. The method includes extracting a plurality of spectrograms of an input speech by (i) encoding, using an encoder, a time domain series of the input speech to a frequency domain series, and (ii) decoding, using a decoder, a plurality of tokens for time steps of the frequency domain series. Each spectrogram comprises at least one visual representation of a strength of the input speech over time, the input speech is obtained from a user device associated with a user. The method includes generating a plurality of pose sequences for a current time step of the plurality of spectrograms using a first machine learning model. The first machine learning model is trained by correlating historical pose sequences of historical users in historical sign language videos with historical spectrograms of historical input speeches. The method includes automatically generating, using a discriminator of a second machine learning model, a sign language video for the input speech using the plurality of pose sequences and the plurality of spectrograms when the plurality of pose sequences are matched with corresponding the plurality of spectrograms that are extracted.”) As to claim 19, claim 19 is a parallel method claim with limitations similar to that of claim 2 and is rejected under the same rationale. As to claim 20, claim 20 is a parallel method claim with limitations similar to that of claim 3 and is rejected under the same rationale.

Claims 12-16 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Jawahar (U.S. Patent Application Publication No. US 20230290371 A1) in view of Natesan (U.S. Patent Application Publication No. US 20200159833 A1) and further in view of WANG (U.S. Patent Application Publication No. US 20230326369 A1). Jawahar teaches 12. A method comprising: accessing, by one or more computing devices, audio information associated with content; translating the audio information into a sequence of sign language gestures; (see Jawahar [0006] “In view of the foregoing, an embodiment herein provides a method for automatically generating a sign language video from an input speech using a machine learning model. The method includes extracting a plurality of spectrograms of an input speech by (i) encoding, using an encoder, a time domain series of the input speech to a frequency domain series, and (ii) decoding, using a decoder, a plurality of tokens for time steps of the frequency domain series. Each spectrogram comprises at least one visual representation of a strength of the input speech over time, the input speech is obtained from a user device associated with a user. The method includes generating a plurality of pose sequences for a current time step of the plurality of spectrograms using a first machine learning model. The first machine learning model is trained by correlating historical pose sequences of historical users in historical sign language videos with historical spectrograms of historical input speeches. 
The method includes automatically generating, using a discriminator of a second machine learning model, a sign language video for the input speech using the plurality of pose sequences and the plurality of spectrograms when the plurality of pose sequences are matched with corresponding the plurality of spectrograms that are extracted.”) (see Jawahar [0066] “FIG. 6 illustrates a flow diagram of a method for automatically generating a sign language video from an input speech using the machine learning model of FIG. 1 according to some embodiments herein. At the step 602, the method includes extracting a plurality of spectrograms of an input speech by (i) encoding, using an encoder, a time domain series of the input speech to a frequency domain series, and (ii) decoding, using a decoder, a plurality of tokens for time steps of the frequency domain series. Each spectrogram comprises at least one visual representation of a strength of the input speech over time, wherein the input speech is obtained from a user device associated with a user. At the step 604, the method includes generating one or more pose sequences for a current time step of the plurality of spectrograms using a first machine learning model. The first machine learning model is trained by correlating historical pose sequences of historical users in historical sign language videos with historical spectrograms of historical input speeches. At the step 606, the automatically generating, using a second machine learning model, a sign language video for the input speech using the one or more pose sequences and the plurality of spectrograms when the one or more pose sequences are matched with corresponding the plurality of spectrograms that are extracted.”) (see Jawahar [0056] “The generator 308 is configured to input the Mel spectrogram of the input speech from the speech encoder. In some embodiments, the generator 308 embeds the Mel Spectrogram accurately. The generator 308 may include a positional encoder that encodes the Mel Spectrogram according to the previous time step and the current time step. The generator 308 is configured to input the predicted sign poses from the pose decoder. In some embodiments, the generator 308 embeds the predicted sign poses accurately. The generator 308 may include a positional encoder that encodes the predicted sign poses according to the previous time step and the current time step.”) (see Jawahar [0057] “The generator 308 associates with the speech embedding and the pose embedding, that is configured to learn attention aware representations for the modalities. In some embodiments, the modalities include any of the input speech and the pose sequence. The generator 308 is configured to fuse the modalities that learns to embed the speech segments of the input speech into the pose sequence. In some embodiments, the generator 308 learns a relationship between the input speech and the pose sequence. The generator 308 may merge the speech segments with the pose sequence. 
The generator 308 is configured to apply attention to the fused embedding of two modalities and to find whether the two modalities match or not.”) Jawahar does not specifically teach determining an allocated duration for each gesture in the sequence of sign language gestures; However, Natesan does teach this limitation (see Natesan [0014] “The apparatuses, methods, and non-transitory computer readable media disclosed herein address at least the aforementioned technical challenges by implementing natural language processing to extract, from a speech video of a speaker, sentences, along with their duration. In this regard, the apparatuses, methods, and non-transitory computer readable media disclosed herein provide for the conversion of speech to sign language by translating the speech at the sentence level. By using natural language processing, the apparatuses, methods, and non-transitory computer readable media disclosed herein may extract each sentence from a speech video, and translate each sentence to sign language standard. Further, for each sentence, along with the sentence, the sentence start and end times may be identified in the speech video. The sentence start and end times may be utilized to ensure that the sign language video that is generated and the original speech video play in sync based, for example, on alignment of each sentence start and end time.”) Jawahar and Natesan are in the same field of endeavor of signal processing, therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Jawahar to incorporate determining an allocated duration for each gesture in the sequence of sign language gestures of Natesan. This allows objective interpretation of sentences into sign language generation as recognized by Natesan [0013-0014]. Jawahar in view of Natesan do not specifically teach determining, based on the allocated duration for each gesture, a slow gesture playback rate; and providing, for output, data comprising the sequence of sign language gestures at the slow gesture playback rate. (However Wang does teach this limitation (see Wang [0191] “In some embodiments, the listener text contain corresponding timestamps, and the timestamps are used for indicating a time interval of audio corresponding to the listener text on an audio time axis. 
The extraction module 1302 is configured to determine candidate clip durations of candidate sign language video clips corresponding to the candidate compression statements; determine audio clip durations of audio corresponding to the text statements based on timestamps corresponding to the text statements; and determine, based on the candidate clip durations and the audio clip durations, the target compression statements from the candidate compression statements through the dynamic path planning algorithm, a video time axis of sign language video corresponding to texts composed of the target compression statements being aligned with the audio time axis of the audio corresponding to the listener text.”) (see Wang [0197] “In some embodiments, the acquisition module 1301 is configured to: acquire the input listener text; acquire a subtitle file, and extract the listener text from the subtitle file; acquire an audio file, perform speech recognition on the audio file to obtain a speech recognition result, and generate the listener text based on the speech recognition result; and acquire a video file, perform character recognition on video frames of the video file to obtain a character recognition result, and generate the listener text based on the character recognition result.”) (see Wang [0198] In summary, in the embodiments of this application, the summary text are obtained by performing text summarization extraction on the listener text, and then the text length of the listener text are shortened, so that the finally generated sign language video can keep synchronization with the audio corresponding to the listener text. Since the sign language video is generated based on the sign language text after the summary text are converted into the sign language text conforming to the grammatical structures of a hearing-impaired person, the sign language video can better express the content to a hearing-impaired person, improving the accuracy of the sign language video.”) Jawahar in view of Natesan and Wang are in the same field of endeavor of signal processing, therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of combination of Jawahar and Natesan to incorporate determining an allocated duration for each gesture in the sequence of sign language gestures of Wang. This allows for improved generation efficiency of the sign language video as recognized by Wang [0013-0014]. As to claim 13 Jawahar in view of Natesan and further in view of Wang teach 13. The method of claim 12, Furthermore, Wang teaches further comprises: receiving a minimum playback threshold, wherein the slow gesture playback rate is less than the minimum playback threshold; and adjusting, the slow gesture playback rate to be equivalent to the minimum playback threshold. (see Wang [0191] “In some embodiments, the listener text contain corresponding timestamps, and the timestamps are used for indicating a time interval of audio corresponding to the listener text on an audio time axis. 
The extraction module 1302 is configured to determine candidate clip durations of candidate sign language video clips corresponding to the candidate compression statements; determine audio clip durations of audio corresponding to the text statements based on timestamps corresponding to the text statements; and determine, based on the candidate clip durations and the audio clip durations, the target compression statements from the candidate compression statements through the dynamic path planning algorithm, a video time axis of sign language video corresponding to texts composed of the target compression statements being aligned with the audio time axis of the audio corresponding to the listener text.”) (see Wang [0197] “In some embodiments, the acquisition module 1301 is configured to: acquire the input listener text; acquire a subtitle file, and extract the listener text from the subtitle file; acquire an audio file, perform speech recognition on the audio file to obtain a speech recognition result, and generate the listener text based on the speech recognition result; and acquire a video file, perform character recognition on video frames of the video file to obtain a character recognition result, and generate the listener text based on the character recognition result.”) (see Wang [0198] In summary, in the embodiments of this application, the summary text are obtained by performing text summarization extraction on the listener text, and then the text length of the listener text are shortened, so that the finally generated sign language video can keep synchronization with the audio corresponding to the listener text. Since the sign language video is generated based on the sign language text after the summary text are converted into the sign language text conforming to the grammatical structures of a hearing-impaired person, the sign language video can better express the content to a hearing-impaired person, improving the accuracy of the sign language video.”) Jawahar in view of Natesan in view of Wang are in the same field of endeavor of signal processing, therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of combination of Jawahar and Natesan and Wang to incorporate receiving a minimum playback threshold, wherein the slow gesture playback rate is less than the minimum playback threshold; and adjusting, the slow gesture playback rate to be equivalent to the minimum playback threshold of Wang. This allows for improved generation efficiency of the sign language video as recognized by Wang [0013-0014]. As to claim 14 Jawahar in view of Natesan and further in view of Wang teach 14. The method of claim 12, Furthermore Wang teaches further comprises: determining, for each gesture, the slow gesture playback rate by dividing a predetermined gesture time with the allocated duration, wherein a minimum playback threshold is greater than the slow gesture playback rate; and adjusting, based on the determination, the slow gesture playback rate to be equivalent to the minimum playback threshold. (see Wang [0191] “In some embodiments, the listener text contain corresponding timestamps, and the timestamps are used for indicating a time interval of audio corresponding to the listener text on an audio time axis. 
The extraction module 1302 is configured to determine candidate clip durations of candidate sign language video clips corresponding to the candidate compression statements; determine audio clip durations of audio corresponding to the text statements based on timestamps corresponding to the text statements; and determine, based on the candidate clip durations and the audio clip durations, the target compression statements from the candidate compression statements through the dynamic path planning algorithm, a video time axis of sign language video corresponding to texts composed of the target compression statements being aligned with the audio time axis of the audio corresponding to the listener text.”) (see Wang [0197] “In some embodiments, the acquisition module 1301 is configured to: acquire the input listener text; acquire a subtitle file, and extract the listener text from the subtitle file; acquire an audio file, perform speech recognition on the audio file to obtain a speech recognition result, and generate the listener text based on the speech recognition result; and acquire a video file, perform character recognition on video frames of the video file to obtain a character recognition result, and generate the listener text based on the character recognition result.”) (see Wang [0198] In summary, in the embodiments of this application, the summary text are obtained by performing text summarization extraction on the listener text, and then the text length of the listener text are shortened, so that the finally generated sign language video can keep synchronization with the audio corresponding to the listener text. Since the sign language video is generated based on the sign language text after the summary text are converted into the sign language text conforming to the grammatical structures of a hearing-impaired person, the sign language video can better express the content to a hearing-impaired person, improving the accuracy of the sign language video.”) Jawahar in view of Natesan in view of Wang are in the same field of endeavor of signal processing, therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of combination of Jawahar and Natesan and Wang to incorporate determining, for each gesture, the slow gesture playback rate by dividing a predetermined gesture time with the allocated duration, wherein a minimum playback threshold is greater than the slow gesture playback rate; and adjusting, based on the determination, the slow gesture playback rate to be equivalent to the minimum playback threshold of Wang. This allows for improved generation efficiency of the sign language video as recognized by Wang [0013-0014]. As to claim 15 Jawahar in view of Natesan and further in view of Wang teach 15. The method of claim 12, Furthermore, Wang teaches further comprises sending a text segment associated with the audio information, a start time associated with the text segment, and a duration associated with the text segment. (see Wang [0191] “In some embodiments, the listener text contain corresponding timestamps, and the timestamps are used for indicating a time interval of audio corresponding to the listener text on an audio time axis. 
The extraction module 1302 is configured to determine candidate clip durations of candidate sign language video clips corresponding to the candidate compression statements; determine audio clip durations of audio corresponding to the text statements based on timestamps corresponding to the text statements; and determine, based on the candidate clip durations and the audio clip durations, the target compression statements from the candidate compression statements through the dynamic path planning algorithm, a video time axis of sign language video corresponding to texts composed of the target compression statements being aligned with the audio time axis of the audio corresponding to the listener text.”) (see Wang [0197] “In some embodiments, the acquisition module 1301 is configured to: acquire the input listener text; acquire a subtitle file, and extract the listener text from the subtitle file; acquire an audio file, perform speech recognition on the audio file to obtain a speech recognition result, and generate the listener text based on the speech recognition result; and acquire a video file, perform character recognition on video frames of the video file to obtain a character recognition result, and generate the listener text based on the character recognition result.”) (see Wang [0198] In summary, in the embodiments of this application, the summary text are obtained by performing text summarization extraction on the listener text, and then the text length of the listener text are shortened, so that the finally generated sign language video can keep synchronization with the audio corresponding to the listener text. Since the sign language video is generated based on the sign language text after the summary text are converted into the sign language text conforming to the grammatical structures of a hearing-impaired person, the sign language video can better express the content to a hearing-impaired person, improving the accuracy of the sign language video.”) Jawahar in view of Natesan in view of Wang are in the same field of endeavor of signal processing, therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of combination of Jawahar and Natesan and Wang to incorporate sending a text segment associated with the audio information, a start time associated with the text segment, and a duration associated with the text segment of Wang. This allows for improved generation efficiency of the sign language video as recognized by Wang [0013-0014]. As to claim 16 Jawahar in view of Natesan and further in view of Wang teach 16. The method of claim 12, Furthermore, Wang teaches wherein the data further comprises: a start time associated with the allocated duration of each gesture; and an end time associated with the allocated duration of each gesture. (see Wang [0191] “In some embodiments, the listener text contain corresponding timestamps, and the timestamps are used for indicating a time interval of audio corresponding to the listener text on an audio time axis. 
As to claim 16, Jawahar in view of Natesan and further in view of Wang teaches the method of claim 12. Furthermore, Wang teaches wherein the data further comprises: a start time associated with the allocated duration of each gesture; and an end time associated with the allocated duration of each gesture (see Wang [0191], [0197], and [0198], as reproduced above).

Jawahar, Natesan, and Wang are in the same field of endeavor of signal processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of the combination of Jawahar and Natesan to incorporate a start time associated with the allocated duration of each gesture and an end time associated with the allocated duration of each gesture, as taught by Wang. This allows for improved generation efficiency of the sign language video, as recognized by Wang [0013-0014].
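To visualize the claim 16 output data, the sketch below annotates each gesture's allocated duration with a start time and an end time so a renderer could align the gesture track with the audio time axis. The structure and names are illustrative assumptions rather than anything disclosed in the application or Wang.

```python
# Illustrative sketch: attach a start time and an end time to the allocated
# duration of each gesture in the sequence. Names and values are hypothetical.

from typing import List


def gesture_timing(gestures: List[str],
                   allocated_durations: List[float],
                   sequence_start: float = 0.0) -> List[dict]:
    """Return per-gesture records with start/end times of each allocated duration."""
    timed = []
    cursor = sequence_start
    for gesture, duration in zip(gestures, allocated_durations):
        timed.append({"gesture": gesture,
                      "start_time": cursor,
                      "end_time": cursor + duration})
        cursor += duration
    return timed


# Example: three gestures allotted 0.8 s, 1.2 s, and 0.5 s, starting at t = 10 s.
for record in gesture_timing(["HELLO", "MY", "NAME"], [0.8, 1.2, 0.5], 10.0):
    print(record)
```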
As to claim 18, Jawahar in view of Natesan and further in view of Wang teaches the method of claim 17. Furthermore, Wang teaches further comprising: determining, for each gesture, the gesture playback rate by dividing a predetermined gesture time with the allocated duration; receiving the content player rate, wherein the content player rate is above a normal rate; and determining an adjusted gesture playback rate by multiplying the content player rate with the gesture playback rate (see Wang [0191], [0197], and [0198], as reproduced above).

Jawahar, Natesan, and Wang are in the same field of endeavor of signal processing; therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of the combination of Jawahar and Natesan to incorporate determining, for each gesture, the gesture playback rate by dividing a predetermined gesture time with the allocated duration; receiving the content player rate, wherein the content player rate is above a normal rate; and determining an adjusted gesture playback rate by multiplying the content player rate with the gesture playback rate, as taught by Wang. This allows for improved generation efficiency of the sign language video, as recognized by Wang [0013-0014].
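Finally, the claim 18 adjustment can be sketched as follows: the per-gesture rate is computed as before and, when the content player rate is above the normal (1x) rate, the gesture rate is multiplied by the content player rate. Function names and example values below are illustrative assumptions only.

```python
# Illustrative sketch: adjust the gesture playback rate by the content player rate
# when the content is being played above the normal rate. Values are hypothetical.

def adjusted_gesture_playback_rate(predetermined_gesture_time: float,
                                   allocated_duration: float,
                                   content_player_rate: float) -> float:
    """Multiply the gesture playback rate by an above-normal content player rate."""
    gesture_rate = predetermined_gesture_time / allocated_duration
    if content_player_rate > 1.0:  # content player rate is above the normal rate
        return content_player_rate * gesture_rate
    return gesture_rate


# Example: a 1.0 s gesture allocated 2.0 s (0.5x) while the content plays at 2x
# yields an adjusted gesture playback rate of 1.0x.
print(adjusted_gesture_playback_rate(1.0, 2.0, 2.0))  # 1.0
```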
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KRISTEN MICHELLE MASTERS, whose telephone number is (703) 756-1274. The examiner can normally be reached M-F 8:30 AM - 5:00 PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Louis Desir, can be reached at 571-272-7799.

The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KRISTEN MICHELLE MASTERS/
Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/
Supervisory Patent Examiner, Art Unit 2659

Prosecution Timeline

Jun 17, 2024
Application Filed
Jan 07, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592219
Hearing Device User Communicating With a Wireless Communication Device
2y 5m to grant · Granted Mar 31, 2026
Patent 12548569
METHOD AND SYSTEM OF DETECTING AND IMPROVING REAL-TIME MISPRONUNCIATION OF WORDS
2y 5m to grant · Granted Feb 10, 2026
Patent 12548564
SYSTEM AND METHOD FOR CONTROLLING A PLURALITY OF DEVICES
2y 5m to grant · Granted Feb 10, 2026
Patent 12547894
ENTROPY-BASED ANTI-MODELING FOR MACHINE LEARNING APPLICATIONS
2y 5m to grant · Granted Feb 10, 2026
Patent 12547840
MULTI-STAGE PROCESSING FOR LARGE LANGUAGE MODEL TO ANSWER MATH QUESTIONS MORE ACCURATELY
2y 5m to grant · Granted Feb 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

1-2
Expected OA Rounds
62%
Grant Probability
87%
With Interview (+24.7%)
3y 2m
Median Time to Grant
Low
PTA Risk
Based on 40 resolved cases by this examiner. Grant probability derived from career allow rate.
