DETAILED ACTION
Case Status
This office action is in response to remarks and amendments of 22 October 2025. Claims 1-19 have been examined.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 5-9, 11-14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Talieh et al., Patent No. 11,315,569, hereinafter Talieh, in view of Hirano, Pub. No. US 2023/0238002, hereinafter Hirano, and further in view of Chen et al., UNITER: UNiversal Image-Text Representation Learning, hereinafter Chen.
As per claim 1, Talieh discloses A method for automatically visualizing a transcript, the method comprising:
receiving, by at least one processor, a voice input from at least one entity (col. 1, lines 55-65, col. 3, line 56, col. 10, line 59-64, fig. 5);
recognizing, by the at least one processor, speech from the voice input using a speech recognition technique (see above rejection including at least abstract, col. 2, line 5, col. 4, lines 49-55, col. 7, lines 1-2);
Talieh does not expressly disclose, however Hirano, in the related field of endeavor of speech processing, discloses removing, by the at least one processor, unwanted noise from the recognized speech using at least one signal processing technique to enhance the speech, wherein the speech is processed to remove an unwanted noise, wherein the at least one signal processing technique includes a bandpass filter, a low-pass filter, loudness control, and/or acoustic echo cancellation (Hirano, pars. 89, 113, 142-144, 155, 156). A minimal illustrative sketch of such filtering is provided following the rejection of claim 1 below.
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Hirano would have allowed Talieh to incorporate a noise suppression unit that removes noise so that “the accuracy of single speech detection can be improved” and “unnatural behavior due to distortion can be suppressed” (Hirano, par. 143).
converting, by the at least one processor, the enhanced speech into a transcript using a speech transcription technique (see above rejections and note that every figure of Talieh discloses speech transcription); and
automatically visualizing, by the at least one processor, the transcript into a design diagram (Talieh, col. 8, line 63 discloses Gantt charts; see also col. 7, last par.; col. 8, lines 14-43; and col. 8, line 60 to col. 9, line 50), comprising:
The combination does not expressly disclose, however Chen, in the related field of endeavor of computer vision, discloses training a language model for aligning specific parts of images with text to create a frozen language model (Chen, pages 2-3 disclose that the model is explicitly trained for fine-grained word-region alignment: “Word-Region Alignment (WRA) … to explicitly encourage fine-grained alignment between words and image regions.” and explain that the language encoder is BERT-based and pre-trained as part of a multimodal alignment system);
converting the transcript into tokenized text (Chen, page 4 discloses “For Text Embedder, we follow BERT [9] and tokenize the input sentence into WordPieces,” which applies to text generally; see Talieh as cited above for the specific text being a transcript);
passing the tokenized text into a frozen language model to convert the tokenized text into a sequence of embeddings that captures meaning and context of the tokenized text (Chen, pages 3-4 disclose passing tokenized text to a BERT transformer encoder which produces contextualized token representations – page 4 says “We design an Image Embedder and a Text Embedder to extract their respective embeddings. These embeddings are then fed into a multi-layer Transformer to learn a cross-modality contextualized embedding across visual regions and textual tokens”; pages 4-9 disclose using a pre-trained language model that is not further updated or modified (i.e., a frozen language model));
mapping, by an embedding layer, the tokenized text to vector representations (Chen, pages 4-5 disclose a text embedding layer that is used to obtain (i.e. map to) final representations (i.e. vector representations): “The final representation for each sub-word token is obtained via summing up its word embedding and position embedding, followed by another LN layer”);
processing, by one or more hidden layers, the vector representations using a transformer architecture or other neural network architecture to generate transformed representations (Chen, pages 3-5 disclose that the embeddings are processed by multiple transformer encoder layers); and generating, by an output layer, from the transformed representations, an output including text predictions, scores, and/or embeddings (Chen, see pages 4-6 and abstract for MLM, ITM, and WRA, which cover the outputs as claimed). A minimal illustrative sketch of this tokenization-to-embedding pipeline is provided following the rejection of claim 1 below.
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Chen would have allowed the combination to use transformer-based embeddings to better capture contextual meaning (of textual content) for downstream visual alignment and generation tasks including producing image-based charts or diagrams from speech-based transcript text.
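For illustration of the bandpass filtering recited in the claim and mapped to Hirano above, the following is a minimal sketch in Python, assuming NumPy/SciPy and assumed sample-rate and cutoff values; it is not code from Talieh or Hirano.

```python
# Minimal illustrative sketch (not from Talieh or Hirano): a Butterworth
# bandpass filter of the kind the claim recites for speech enhancement.
# The 16 kHz sample rate and 300-3400 Hz speech band are assumed values.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_speech(samples: np.ndarray, sample_rate: int = 16000,
                    low_hz: float = 300.0, high_hz: float = 3400.0) -> np.ndarray:
    """Attenuate energy outside the assumed speech band."""
    sos = butter(4, [low_hz, high_hz], btype="bandpass",
                 fs=sample_rate, output="sos")
    return sosfiltfilt(sos, samples)

# Example: filter one second of (placeholder) noisy audio.
noisy = np.random.randn(16000)       # stand-in for a captured voice input
enhanced = bandpass_speech(noisy)    # "enhanced speech" per the claim
```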
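Likewise, the tokenization-to-embedding pipeline mapped to Chen above can be sketched as follows, assuming the Hugging Face transformers and torch libraries and a standard BERT checkpoint. This is an illustrative sketch only, not UNITER's implementation; Chen's model additionally fuses image-region features, which are omitted here.

```python
# Minimal illustrative sketch (not UNITER's code): WordPiece tokenization
# followed by a frozen BERT encoder that outputs contextualized embeddings.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()
for param in model.parameters():     # "frozen": weights receive no updates
    param.requires_grad = False

transcript = "kick off the project and assign the design review"  # assumed input
tokens = tokenizer(transcript, return_tensors="pt")  # WordPiece token IDs
with torch.no_grad():
    outputs = model(**tokens)
# The embedding layer maps token IDs to vectors, the hidden transformer
# layers transform them, and the final layer emits one contextualized
# embedding per token, shape (1, seq_len, 768).
embeddings = outputs.last_hidden_state
```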
As per claim 2, Talieh as modified discloses the method as claimed in claim 1, wherein the speech recognition technique comprises at least one from among an automatic speech recognition technique and a wake word detection technique (col. 7, line 2, col. 8, line 33).
As per claim 3, Talieh as modified discloses the method as claimed in claim 1, wherein the speech transcription technique comprises at least one from among a natural language understanding technique and a natural language generation technique (see Talieh as cited in the rejection of claim 1).
As per claim 5, Talieh as modified discloses the method as claimed in claim 1, wherein the voice input is received via an audio input device (see Talieh as cited in the rejection of claim 1).
As per claim 6, Talieh as modified discloses The method as claimed in claim 1, wherein the at least one entity is one from among a team member involved in a project, a team lead of the project, a project reviewer, a user to present the project, and a client (Talieh col. 3, line 56, col. 9, lines 39 to col. 10, line 10, col. 10, line 59-64, fig. 5).
As per claims 7-9, 11-14 and 16, they are analogous to claims above and therefore likewise rejected.
Claims 4, 10 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Talieh as modified above and further in view of Yang et al., Zero-Shot Video Question Answering via Frozen Bidirectional Language Models, hereinafter Yang.
As per claim 4, Talieh as modified discloses The method as claimed in claim 1. The combination does not expressly disclose, however the combination in view of Yang discloses wherein the automatically visualizing the transcript into the design diagram comprises: generating, by the at least one processor, the design diagram using a frozen clip model (Yang, section 3 discloses a frozen bidirectional language model (BiLM) and a frozen CLIP ViT-L/14 model, with the language model encoding input text/tokens and the CLIP model answering visual questions in a multimodal manner; see Talieh as cited above for design diagrams).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Yang would have allowed the combined teaching to replace the text input in Yang with a transcript from a speech system and to use the joint embedding of the frozen language and frozen CLIP encoders to generate corresponding visual content (such as Talieh’s visual output including, but not limited to, the Gantt charts of col. 8, line 63). The motivation to combine lies in minimizing training costs, since both models are frozen, and in using powerful pretrained encoders, available to an ordinary artisan before the effective filing date of the claimed invention, in a practical interface that generates more relevant visual content in an intelligent manner (such as Talieh’s meeting transcript-based auto-generated visual content).
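For illustration of a frozen CLIP encoder of the kind mapped to Yang above, a minimal sketch follows, assuming the Hugging Face transformers CLIP implementation; it is not Yang's code, and the prompt text is an assumed example derived from a transcript.

```python
# Minimal illustrative sketch (not Yang's implementation): a frozen
# CLIP ViT-L/14 text encoder producing a joint-space embedding.
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")  # ViT-L/14
clip.eval()
for param in clip.parameters():      # keep the pretrained weights frozen
    param.requires_grad = False

processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
prompt = ["Gantt chart of project milestones"]  # assumed transcript-derived text
inputs = processor(text=prompt, return_tensors="pt", padding=True)
with torch.no_grad():
    text_features = clip.get_text_features(**inputs)  # (1, 768) embedding
```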
Analogous claims 10 and 15 are likewise rejected.
Claims 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Talieh as modified above and further in view of Srinivasamurthy et al., Pub. No.: US 20060161901 A1, hereinafter Srinivasamurthy.
As per claim 17, Talieh as modified discloses The method as claimed in claim 1. The combination does not expressly disclose, however the combination in view of Srinivasamurthy discloses wherein the design diagram is a flowchart (Srinivasamurthy, abstract, pars. 12, 20, 28). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Srinivasamurthy would have allowed the combined teaching to automatically generate flowcharts (in addition to, or in place of, Talieh’s automatic generation of charts such as Gantt charts). This would allow the combined teaching to support display of many alternative forms of well-known visualizations; Srinivasamurthy, par. 2, says “…information may be represented in a myriad of different formats, each of which define a particular arrangement of data that can be processed and/or stored by a computer. In some computing environments, it is desirable to represent information in a process or flow. Any particular computing process can be represented in a flow diagram for easier visual comprehension of the flow of the particular process…”.
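For illustration of automatic flowchart generation of the kind mapped to Srinivasamurthy, a minimal sketch follows, assuming the Python graphviz package and assumed step labels extracted from a transcript; it is not code from Srinivasamurthy.

```python
# Minimal illustrative sketch (not from Srinivasamurthy): render process
# steps extracted from a transcript as a top-to-bottom flowchart.
from graphviz import Digraph

steps = ["Receive voice input", "Suppress noise", "Transcribe speech",
         "Embed transcript", "Render design diagram"]  # assumed extracted steps

flow = Digraph("process", format="png")
flow.attr(rankdir="TB")                      # top-to-bottom layout
for i, step in enumerate(steps):
    flow.node(str(i), step, shape="box")     # one box per step
    if i > 0:
        flow.edge(str(i - 1), str(i))        # connect consecutive steps
flow.render("flowchart", cleanup=True)       # writes flowchart.png
```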
Claims 18-19 are likewise rejected.
Response to Arguments
Applicant's arguments filed 22 October 2025 have been considered. In view of the amendments and Applicant’s arguments, the 35 USC 101 rejection has been withdrawn. With respect to the prior art rejection, Chen et al., UNITER: UNiversal Image-Text Representation Learning, has been applied in response to claim amendments.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SYED HASAN whose telephone number is (571) 270-5008. The examiner can normally be reached M-F, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Boris Gorney can be reached at (571)270-5626. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SYED H HASAN/Primary Examiner, Art Unit 2154