Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment filed December 2, 2025 has been entered. Claims 1, 10 and 16 have been amended, claim 18 has been cancelled, and claim 21 has been newly added. Claims 1-17 and 19-21 remain rejected in this application. Applicant's amendments to the drawings and specification have overcome each and every objection previously set forth in the Non-Final Office Action mailed September 2, 2025; those objections are therefore withdrawn.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-6, 9, 10, 12-14, 16 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Cui et al. (U.S. Patent No. 12,197,881 B2), hereinafter Cui, in view of Dell et al. (Pub. No. US 2023/0222149 A1), hereinafter Dell, further in view of Mishra et al. (Pub. No. US 2021/0342552 A1), hereinafter Mishra, and further in view of Finegan et al. (U.S. Patent No. 11,797,780 B1), hereinafter Finegan.
Regarding claim 1, Cui discloses a method (FIG. 5 and Col. 1 Lines 58-60 teach that FIG. 5 is a flowchart illustrating a text-to-visualization method according to some implementations of the present disclosure.), comprising:
dividing a set of textual data into a plurality of text chunks (Col. 4 Lines 59-64 teach that after the text 202 is provided to the parser 204, the parser 204 can parse the text 202 into a plurality of information items. For example, the parser 204 can divide or segment the text 202 into a plurality of portions, and each portion indicates respective information and corresponds to an information item.);
generating a plurality of text items based on processing the plurality of text chunks using one or more machine learning models (Col. 7 Lines 43-49 teach that at 502, a plurality of information items are extracted from a natural language sentence. In some implementation, extracting the plurality of information items includes extracting the plurality of information items from the natural language sentence through a machine learning based model. The machine learning based model can be the parser 204 as shown in FIG. 3). However, Cui fails to disclose that the plurality of information items/portions are text summaries.
Dell discloses that the plurality of text items consist of text summaries (Paragraph 26 teaches that the summary model 110 is a machine learning model that has been trained to produce summaries of text. For example, summary model 110 may be a denoising autoencoder that is implemented as a sequence-to-sequence model with a bidirectional encoder over a string of text and a left-to-right auto-regressive decoder.). Since Cui teaches method steps for dividing sets of textual data into portions/chunks, with each portion corresponding to an information item, and then using a machine learning model to extract information items from natural language sentences, and Dell teaches using a machine learning model that has been trained to generate summaries of text, it would have been obvious to a person having ordinary skill in the art to combine the teachings so that the portions/information items consist of text summaries, which could then be used in training the machine learning model.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cui to incorporate the features of Dell, so that the combined features would allow for additional machine learning model training that includes portions of text, where those portions consist of generated summaries of the text data.
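For illustration only, the following minimal Python sketch shows the kind of chunk-then-summarize pipeline discussed above; the word-based chunking rule, the Hugging Face summarization pipeline, and the BART model name are editorial assumptions and are not drawn from the cited references.

```python
# Illustrative sketch only (not code from the cited references): divide text
# into chunks and summarize each chunk with a pretrained sequence-to-sequence
# model of the kind Dell describes (e.g., a BART-style denoising autoencoder).
from transformers import pipeline  # assumed available; any summarizer would do

def chunk_text(text: str, max_words: int = 200) -> list[str]:
    """Split text into word-bounded chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize_chunks(chunks: list[str]) -> list[str]:
    """Generate one short summary per chunk."""
    return [summarizer(c, max_length=60, min_length=10, do_sample=False)[0]["summary_text"]
            for c in chunks]
```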
However, Cui in view of Dell fail to disclose generating a plurality of keywords based on processing at least one of the plurality of text chunks or the plurality of text summaries using the one or more machine learning models.
Mishra discloses generating a plurality of keywords based on processing at least one of the plurality of text chunks or the plurality of text summaries using the one or more machine learning models (Paragraph 56 teaches that an embodiment of the present invention may further generate the keyword vectors by generating word embeddings for each of the keywords, encoding the word embeddings using a machine learning model to produce encoded vector representations of the keywords, and generating the keyword vectors based on the encoded vector representations. The machine learning model is trained to produce the same encoded vector representations for a set of keywords regardless of an order of the keywords. This enables the natural language content generation to be agnostic of the order of the keywords, thereby generating consistent content from the same keywords (regardless of order) that closely aligns with the template.). Since Cui in view of Dell teach generating text summaries based on information items related to text portions/chunks using a machine learning model and Mishra teaches generating keyword vectors related to keywords to be used for a machine learning model, it would have been obvious to a person having ordinary skill in the art to combine the teachings together so that any of the information items being generated could also consist of certain keywords and that those keywords could also then be used for helping in the training of a machine learning model.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cui in view of Dell to incorporate the features of Mishra, so that the combined features together would allow for the generation of additional text information items, such as specific keywords, to be used in the training of a machine learning model.
However, Cui in view of Dell and Mishra fail to disclose wherein the plurality of keywords includes, for each respective text summary of the plurality of text summaries, a corresponding set of one or more keywords generated by processing the respective text summary using the one or more machine learning models.
Finegan discloses wherein the plurality of keywords includes, for each respective text summary of the plurality of text summaries, a corresponding set of one or more keywords generated by processing the respective text summary using the one or more machine learning models (Col. 6, Lines 23-30 teaches that in block 314, a set of keywords is generated from the summary using the set of large language machine learning models. These keywords are distilled from the natural language summaries, reducing the summaries down to a number of keywords that grab the essence of the text documents and enabling the text-to-image model to generate images that are both relevant and reasonable light of the provided text documents. Additionally, Col. 6, Lines 44-53 teaches that in some embodiments, generating the set of keywords includes generating a sentiment using the set of large language machine learning models. The set of keywords can then be generated from the summary and the sentiment using the set of large language machine learning models. In some embodiments, generating the set of keywords includes generating a topic using the set of large language machine learning models. The set of keywords can then be generated from the summary and the topic using the set of large language machine learning models.). Since Cui in view of Dell and Mishra teach generating text summaries and keywords based on information items related to text portions/chunks using a machine learning model and Finegan teaches generating sets of keywords related to text summaries for use in training a machine learning model, it would have been obvious to a person having ordinary skill in the art to combine the teachings so that any of the text summaries initially generated for use in training a machine learning model could also be improved for training by incorporating sets of keywords corresponding to the text summaries, instead of training on entire text summaries and potentially unrelated keywords.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cui in view of Dell and Mishra to incorporate the features of Finegan, so that the combined features would allow for the incorporation of additional corresponding sets of keywords related to the text summaries to be used in the training of a machine learning model, which should help reduce the overall training time of the machine learning model by focusing only on corresponding sets of keywords instead of entire text summaries.
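For illustration only, the following sketch produces one keyword set per text summary, as the limitation discussed above requires; it uses a simple TF-IDF ranking purely as a stand-in for the large language model distillation Finegan describes, so the library choice and function names are editorial assumptions.

```python
# Illustrative stand-in (not the Finegan approach itself): one small keyword
# set is generated for each text summary.
from sklearn.feature_extraction.text import TfidfVectorizer

def keywords_per_summary(summaries: list[str], k: int = 5) -> list[list[str]]:
    vec = TfidfVectorizer(stop_words="english")
    tfidf = vec.fit_transform(summaries)      # one row per summary
    vocab = vec.get_feature_names_out()
    result = []
    for row in tfidf.toarray():
        top = row.argsort()[::-1][:k]         # indices of the top-k scoring terms
        result.append([vocab[i] for i in top if row[i] > 0])
    return result
```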
In addition, Cui in view of Dell, Mishra and Finegan disclose selecting a first visualization template, from a library of visualization templates, based on at least one of the plurality of keywords generated for at least one of the text summaries (Col. 6, Lines 10-20 of Cui teach that in some implementations, various types of templates can be designed or otherwise obtained to construct a template library. The template 404 can include a visualization feature indicating numerical information. For example, in an example template, the visualization feature can be in the form of ring map, in which two arcs having different colors are fused to form a circle, and a proportion of one of the two arcs in the circle represents the value thereof. The template library can include general templates for various scenarios and dedicated templates dedicated to particular scenarios. Additionally, Col. 7, Lines 4-9 of Finegan teach that in block 316, an image prompt is generated from the set of keywords using the set of large language machine learning models. For example, taking the set of the words as input, the large language model generates the natural language prompt that can be ingested into the text-to-image model.);
selecting a first set of icons, from a library of icons, based on at least one of the plurality of keywords generated for at least one of the text summaries (Col. 6, Lines 1-9 of Cui teach that in some implementations, a visual element library can be constructed for each visual element. For example, various types of icons can be designed or otherwise obtained to construct an icon library. The icon library can include icons which are used as pictograms, containers, backgrounds and so on, and each icon can include an associated descriptive tag for match use. Therefore, one or more icons 402 can be selected from the icon library by matching these information items with the descriptive tags in the icon library. Additionally, Col. 7, Lines 4-9 of Finegan teach that in block 316, an image prompt is generated from the set of keywords using the set of large language machine learning models. For example, taking the set of the words as input, the large language model generates the natural language prompt that can be ingested into the text-to-image model.);
and generating a first visualization using the first visualization template and the first set of icons and using at least one of the plurality of text summaries (Col. 5, Lines 58-67 of Cui teach that now returning to FIG. 2, after the parser 204 parses information items from the text 202, the generator 206 generates candidate visual representations of the text 202. The generator 206 can determine visual elements associated with the information items extracted by the parser 204. FIG. 4 is a schematic diagram illustrating the generator 206 according to some implementations of the present disclosure. As shown in FIG. 4, the visual elements may be one or more of an icon 402, a template 404, a description 406, and a color 408.).
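For illustration only, the following sketch shows template and icon selection by matching generated keywords against descriptive tags, followed by assembly of a simple visualization record from the selected template, icons and a summary; the library contents, tag sets and data structures are editorial assumptions rather than structures taken from the cited references.

```python
# Minimal sketch with assumed data structures: pick a template and icons whose
# descriptive tags overlap the generated keywords, then bundle them with a
# text summary into a simple visualization record.
TEMPLATE_LIBRARY = {"ring_map": {"ratio", "percent"}, "bar_chart": {"count", "compare"}}
ICON_LIBRARY = {"student.svg": {"student", "school"}, "clock.svg": {"time", "hour"}}

def select_by_tag_overlap(library: dict[str, set[str]], keywords: set[str], n: int = 1):
    """Rank library entries by how many of their tags appear in the keywords."""
    ranked = sorted(library, key=lambda name: len(library[name] & keywords), reverse=True)
    return ranked[:n]

def build_visualization(summary: str, keywords: set[str]) -> dict:
    template = select_by_tag_overlap(TEMPLATE_LIBRARY, keywords)[0]
    icons = select_by_tag_overlap(ICON_LIBRARY, keywords, n=2)
    return {"template": template, "icons": icons, "text": summary}
```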
Regarding claim 3, Cui in view of Dell, Mishra and Finegan disclose everything claimed as applied above (see claim 1), in addition, Cui in view of Dell, Mishra and Finegan disclose receiving a set of user keywords from a user, wherein generating at least one of the plurality of text chunks or the plurality of text summaries is performed based at least in part on the set of user keywords (Paragraph 37 of Mishra teaches that client systems 114 enable users to submit sets of keywords and templates (and optionally part-of-speech (POS) tags for the keywords) to server systems 110 for generation of natural language content (e.g., sentences, clauses, phrases, etc.). In addition, paragraph 49 of Dell teaches that operations 400 begin at step 402, with providing, based on a text string comprising a first number of tokens, one or more first inputs to a summary model that has been trained to generate summaries of input text.).
Regarding claim 4, Cui in view of Dell, Mishra and Finegan disclose everything claimed as applied above (see claim 1), in addition, Cui in view of Dell, Mishra and Finegan disclose wherein dividing the set of textual data into the plurality of text chunks comprises identifying split points in the set of textual data based on (i) a defined minimum number of summaries, (ii) a defined maximum number of words per text chunk, and (iii) locations of end-of-sentence tokens in the set of textual data (Paragraph 32 of Dell teaches that in split path 106, a plurality of segments 130.sub.1-n (collectively, segments 130 and individually, segment 130) are determined from text 102 and separately provided as inputs to summary model 110. Each segment 130 represents a portion of text 102 (e.g., a given number of characters or a given number of tokens from text 102). ... The number of segments and the size of the buffer may be configurable parameters. Additionally, paragraph 53 of Dell teaches that in some embodiments, rather than tokens, characters or another input unit may be used, and the summarized version of the text string may comprise a number of characters or other unit that is less than or equal to a maximum number of input characters or maximum number of other unit for the machine learning model.).
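For illustration only, the following sketch shows split-point selection of the kind recited in claim 4: splitting only at end-of-sentence boundaries, capping the number of words per chunk, and re-splitting with a smaller cap if too few chunks (and hence too few summaries) would result. The specific thresholds, regular expression and retry strategy are editorial assumptions, not the algorithm of the cited references.

```python
# Illustrative sketch of split-point selection; a single sentence longer than
# the cap is kept intact rather than split mid-sentence.
import re

def split_text(text: str, min_summaries: int = 3, max_words: int = 120) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())   # end-of-sentence boundaries
    chunks, current = [], []
    for sent in sentences:
        if current and len(" ".join(current + [sent]).split()) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    # too few chunks for the required minimum number of summaries: retry with a smaller cap
    if len(chunks) < min_summaries and max_words > 20:
        return split_text(text, min_summaries, max_words // 2)
    return chunks
```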
Regarding claim 5, Cui in view of Dell, Mishra and Finegan disclose everything claimed as applied above (see claim 1), in addition, Cui in view of Dell, Mishra and Finegan disclose identifying a set of template keywords associated with the first visualization template (Paragraph 50 of Mishra teaches that according to one embodiment of the present invention, a system comprises a processor to generate natural language content from a set of keywords in accordance with a template. Keyword vectors representing a context for the keywords are generated. The keywords are associated with language tags (e.g., part-of-speech (POS) or other tags, etc.), while the template includes a series of language tags (e.g., part-of-speech (POS) or other tags, etc.) indicating an arrangement for the generated natural language content. Template vectors are generated from the series of language tags of the template and represent a context for the template.);
computing a similarity score between the set of template keywords and the at least one of the plurality of keywords (Paragraph 78 of Mishra teaches that the importance or context weight is provided in the form of a probability term (e.g., λ=[λ.sub.1, λ.sub.2, λ.sub.3, . . . λ.sub.M] as viewed in FIG. 3) indicating a likelihood of a match between a part-of-speech (POS) tag for a keyword and a template tag, ... The bias and weights may be determined to provide a desired distribution or curve for the sigmoid function which produces a probability value based on the embeddings for the template tag and a keyword tag having the greatest similarity. Additionally, paragraph 79 of Mishra teaches that the probability term, λ, for each template tag indicates the highest probability for a match between that template tag and one of the keyword tags, and typically has a value that resides between 0 and 1 (e.g., or other range indicating between 0% and 100% probability of a match));
and selecting the first visualization template based on the similarity score (Paragraph 80 of Mishra teaches that the weighted contexts for the keywords and template tags are combined by combiners 355, and provided to decoding module 358. When the probability term, λ, indicates a likely match between a template tag and one of the keyword tags, the attended context of the keywords has greater influence over decoding module 358 for selection of one of the keywords for the template tag (as opposed to another word in the word vocabulary derived from the training data)).
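For illustration only, the following sketch scores each template by the similarity between its associated keyword set and the generated keywords and selects the highest-scoring template; a plain bag-of-words cosine is used here as a simple stand-in for the embedding-based probabilistic tag matching Mishra describes, and the same scoring could equally be applied to icon selection as addressed in claim 6 below.

```python
# Minimal sketch (a simple stand-in, not Mishra's tag-matching model): select
# the template whose keyword set is most similar to the generated keywords.
from collections import Counter
from math import sqrt

def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity between two keyword lists treated as bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def select_template(template_keywords: dict[str, list[str]], generated: list[str]) -> str:
    return max(template_keywords, key=lambda name: cosine(template_keywords[name], generated))
```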
Regarding claim 6, Cui in view of Dell, Mishra and Finegan disclose everything claimed as applied above (see claim 1), in addition, Cui in view of Dell, Mishra and Finegan disclose identifying a set of icon keywords associated with a first icon of the set of icons (Col. 7, Line 65 through Col. 8, Line 7 of Cui teach that at 504, a plurality of visual elements associated with the plurality of information items are determined. In some implementations, determining the plurality of visual elements includes: obtaining at least one of the plurality of visual elements associated with the plurality of information items from a visual element library, the visual element library comprising a plurality of predetermined visual elements. In some implementations, the plurality of visual elements include at least one of an icon, a template, a description and a color.);
computing a similarity score between the set of icon keywords and the at least one of the plurality of keywords (Col. 7, Lines 4-13 of Cui teach that in some implementations, the score of each visual representation includes one or more of a semantic score, an aesthetic score and an informative score. For example, the semantic score indicates a degree of semantic match between the visual element in the candidate visual representation and the natural language sentence. For example, for an information item “student,” the candidate visual representation including the student icon should have a higher semantic score than the visual representation of another candidate icon showing a human.);
and selecting the first icon for inclusion in the first set of icons based on the similarity score (Col. 6, Lines 50-62 of Cui teach that after the visual elements are determined, the generator 206 can organize the visual elements into different candidate visual representations in different manners, and rank the candidate visual representations. For example, the generator 206 can combine these visual elements to determine the candidate visual representations. In some implementations, a synthesis and ranking module 410 can determine respective scores of the candidate visual representations, and rank the candidate visual representations based on the respective scores of the candidate visual representations. The synthesis and ranking module 410 can select one or more visual representations from the candidate icons based on the ranking of the candidate visual representations.).
Regarding claim 9, Cui in view of Dell, Mishra and Finegan disclose everything claimed as applied above (see claim 1), in addition, Cui in view of Dell, Mishra and Finegan disclose further comprising outputting the first visualization via one or more display devices (Col. 4, Lines 21-24 of Cui teach that the visual representation can be provided to the output device 160 and further to a user or the like, as an output 180. For example, the visual representation can be displayed on the display.).
Regarding claim 10, the system steps correspond to and are rejected similarly to the method steps of claim 1 (see claim 1 above). In addition, Cui discloses a system (FIG. 1 and Col. 2, Lines 19-21 teach that FIG. 1 is a block diagram illustrating a computing device 100 for implementing various implementations of the present disclosure.) comprising:
one or more memories collectively storing computer-executable instructions;
and one or more processors configured to collectively execute the computer-executable instructions and cause the system to perform an operation (Col. 2, Lines 27-31 teach that the components of the computing device 100 can include, but are not limited to, one or more processors or processing units 110, a memory 120, a storage device 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160.).
Regarding claim 12, the system steps correspond to and are rejected similarly to the method steps of claim 3 (see claim 3 above).
Regarding claim 13, the system steps correspond to and are rejected similarly to the method steps of claim 5 (see claim 5 above).
Regarding claim 14, the system steps correspond to and are rejected similarly to the method steps of claim 6 (see claim 6 above).
Regarding claim 16, the non-transitory computer readable medium corresponds to and is rejected similarly to the method steps of claim 1 (see claim 1 above). In addition, Cui discloses one or more non-transitory computer-readable media containing, in any combination, computer program code that, when executed by operation of a computer system, performs an operation (Col. 10, Lines 16-22 teach that in a third aspect, there is provided a computer program product being tangibly stored in a non-transitory computer storage medium and comprising machine-executable instructions, the machine executable instructions, when executed by a device, causes the device to implement the method according to the second aspect of the present disclosure.).
Regarding claim 19, the non-transitory computer readable medium corresponds to and is rejected similarly to the method steps of claim 5 (see claim 5 above).
Regarding claim 20, the non-transitory computer readable medium corresponds to and is rejected similarly to the method steps of claim 6 (see claim 6 above).
Claims 2, 7, 11, 17 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Cui in view of Dell, Mishra and Finegan as applied to claims 1 and 10 above, and further in view of Ivers et al. (Pub. No. US 2022/0208155 A1), hereinafter Ivers.
Regarding claim 2, Cui in view of Dell, Mishra and Finegan disclose everything claimed as applied above (see claim 1), in addition, Cui in view of Dell, Mishra and Finegan disclose accessing audio data comprising natural language audio (Paragraph 146 of Mishra teaches that the keyword, tags, and generated natural language content may be in any desired form (e.g., text, characters, extracted from images or video using optical character recognition or other techniques, extracted from any documents using natural language processing (NLP) or other techniques, extracted from audio files, etc.).);
However, Cui in view of Dell, Mishra and Finegan fail to disclose delineating the audio data into a set of audio segments.
Ivers discloses delineating the audio data into a set of audio segments (Paragraph 135 teaches that in addition, the instructions (106) include a segmentation module (114), which when executed by the processor (104), facilitates the segmentation of an audio track (124) into a plurality of topical audio segments or chapters. In accordance with one embodiment, the segmentation module (114) divides audio tracks (124) into one or more segments, i.e., chapters, denoting some transition between portions of the audio of the audio track (124), e.g., changes in topics or themes, etc.). Since Cui in view of Dell, Mishra and Finegan teach using audio data related to natural language audio and Ivers teaches delineating and segmenting audio data into segments or chapters, it would have been obvious to a person having ordinary skill in the art to combine the teachings so that, in addition to being able to segment and separate text portions/chunks of summarized data, any audio data collected to be used for visualization purposes could also be delineated and separated into different portions/segments.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cui in view of Dell, Mishra and Finegan to incorporate the features of Ivers, so that the combined features together would provide additional separation and segmenting of information data to be used for generating visualizations, including audio data as well.
In addition, Cui in view of Dell, Mishra, Finegan and Ivers disclose generating a set of text transcriptions by processing each respective audio segment, of the set of audio segments, using a speech-to-text machine learning model (Paragraph 251 of Ivers teaches that the system may generate (362) a transcript dataset from the episode audio, which may be performed using conventional transcription software and methods. In some implementations, the generated (362) transcript may include additional indicators from the episode audio beyond speech to text transcription, which may include, for example, indications of periods of silence, indications of laughter, indications of background music or sound effects, indications of audio amplitude, and other non-speech characteristics. The generated (362) transcription may be time indexed such that any single word or other indicator in the transcript corresponds to a time location within the episode audio. Also, Col. 4, Lines 53-58 of Cui teach that without departing from the spirits and scope of the present disclosure, the text 202 may also be other types of natural language sentences, statements or texts, or even may not include numerical information. The text 202 can be provided in the form of voice, speech, or the like, and obtained in a speech-to-text form.);
and concatenating the set of text transcriptions to form the set of textual data (Paragraph 254 of Ivers teaches that the system may then determine one or more moment models to apply to the transcript dataset in order to automatically identify relevant moments occurring within the transcript dataset and the corresponding episode audio. As has been described, a moment should be understood to include a portion of a sentence, a sentence, or a set of related sentences, from a transcript dataset and/or corresponding episode audio, that has a particularly high relevance to a particular moment type. Additionally, paragraph 259 of Ivers teaches that continuing the above example, where the base model is used to analyze a transcript dataset for a podcast that is associated with a genre, sub-genre, or other model type, some or all of the set of ten moments may also be added to the training dataset for a corresponding podcast specific model, genre specific model, or sub-genre specific model. This may be advantageous to build initial training datasets for podcast specific, genre specific, and sub-genre specific models that have not yet been created, due to the lack of sufficient training data.).
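For illustration only, the following sketch transcribes a set of audio segments with a speech-to-text model and concatenates the transcriptions into one set of textual data, as the claim 2 limitations discussed above recite; the openai-whisper package is used only as an example speech-to-text model and is an editorial assumption, not the model of the cited references.

```python
# Illustrative sketch: per-segment speech-to-text followed by concatenation.
import whisper  # assumed installed: pip install openai-whisper

def transcribe_and_concatenate(segment_paths: list[str]) -> str:
    model = whisper.load_model("base")
    transcriptions = [model.transcribe(path)["text"].strip() for path in segment_paths]
    return " ".join(transcriptions)   # the concatenated set of textual data
```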
Regarding claim 7, Cui in view of Dell, Mishra and Finegan disclose everything claimed as applied above (see claim 1), however, Cui in view of Dell, Mishra and Finegan fail to disclose further comprising at least one of: (i) receiving a title from a user, or (ii) generating a title based on processing at least one of the plurality of text chunks or the plurality of text summaries using the one or more machine learning models, wherein generating the first visualization comprises adding the title to the first visualization.
Ivers discloses further comprising at least one of: (i) receiving a title from a user, or (ii) generating a title based on processing at least one of the plurality of text chunks or the plurality of text summaries using the one or more machine learning models, wherein generating the first visualization comprises adding the title to the first visualization (Paragraph 144 teaches that in some embodiments, the textual element (122) is a title. The title may be defined by a curator (human, software, hardware, or a combination thereof) during the review of the audio track (10) for segmentation. In some embodiments, the segmentation module (114) alone or in concert with the voice recognition module (112) is (are) further configured to provide a title to the audio segment via a title algorithm (121) stored in memory (108). Input for the titling algorithm (121) is an audio signal, such as audio signal (10), and the output is text that represents a sequence of titles for each word/sentence in the speech. In some embodiments, the titling algorithm (121) is an HMM. Additionally, FIG. 7 and paragraph 194 teach that in accordance with another aspect of the present disclosure and with reference to FIG. 7, a flowchart of an exemplary method (700) for packaging audio segments is provided. The method segments long-playing audio tracks, e.g., audio tracks (124), into audio segments (126) and tags the audio segments with meaningful textual elements (122) while linking a visual asset (126) to the indexed audio segment. In this way, the audio segments (126) are easily searchable and sharable, e.g., via social media platforms.). Since Cui in view of Dell, Mishra and Finegan teach the initial method steps for generating a visualization using a template, icons and text summaries and Ivers teaches applying and adding/tagging a textual element, such as a title, to a visualization of segmented audio data and the title is created either by a user or a machine algorithm, it would have been obvious to a person having ordinary skill in the art to combine the teachings together so that the visualizations being created could also have a title added or assigned to each of the templates being used for the segmented summaries of textual data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cui in view of Dell, Mishra and Finegan to incorporate the features of Ivers, so that the combined features together would allow for improved classification and searching of specific segmented text data and summaries because each visualization template would also have an associated title related to each text portion/chunk.
Regarding claim 11, the system steps correspond to and are rejected similarly to the method steps of claim 2 (see claim 2 above).
Regarding claim 17, the non-transitory computer readable medium corresponds to and is rejected similarly to the method steps of claim 2 (see claim 2 above).
Regarding claim 21, Cui in view of Dell, Mishra and Finegan disclose everything claimed as applied above (see claim 1), however, Cui in view of Dell, Mishra and Finegan fail to disclose wherein each icon in the library of icons is associated with a set of icon keywords generated using a computer vision model configured to generate textual descriptions of icon image data, and selecting the first set of icons comprises matching the plurality of keywords to the set of icon keywords generated via computer vision.
Ivers discloses wherein each icon in the library of icons is associated with a set of icon keywords generated using a computer vision model configured to generate textual descriptions of icon image data, and selecting the first set of icons comprises matching the plurality of keywords to the set of icon keywords generated via computer vision (Paragraph 147 teaches that in some embodiments, the image searching module is configured to execute a visual matching algorithm (125) that can suggest a visual asset (128) that is relevant to a textual element (122) of and audio segment (126). The visual matching algorithm (125) may use a Named Entity Recognition algorithm that extracts a textual element and develops a pairing based on the extracted content. The visual matching algorithm (125) may incorporate a NER system (New Enhancement Recognition System). In some embodiments, the visual matching algorithm (125) may include AI for removing duplicate and watermarked images. In some embodiments, the visual matching Algorithm (125) utilizes a Text Semantic Similarity Neural Network based on natural language understanding. Additionally, paragraph 150 teaches that in some embodiments, the visual asset (128) paired with the indexed audio segment (126) is a collage. That is, the image-searching module (116) is further configured to create a collage image from visual assets (126) (images and/or videos stored in database (144)), based on tags, topic names, summaries, and/or user explanations/descriptions. Lastly, paragraph 151 teaches that FIG. 9A illustrates an example of the architecture of a preferred Attentional Generative Adversarial Neural Network (900). Each attentional model (902), (903) automatically retrieves the words, represented by word vectors (904) (embeddings—a method used to represent discrete variables as continues vectors), for generating different sub-regions of a collage. The Deep Attentional Multimodal Similarity Model (906) provides the fine-grained image-text matching loss function for the generative network (900). The DAMSM (906) is composed of at least two neural networks, one that maps the sub-regions of the image and the other maps the words of the sentence to a common semantic space by measuring the image-text similarity at the word level to computer mentioned fine-grained loss function for the image generation. In some embodiments, a text encoder (907), similar or the same as the Recurrent Neural Network, is used to generate a descriptive copy. The image encoder (908) is preferably a Convolutional Neural Network for generating an image. FIG. 9B is an exemplary collage created via the Attentional Generative Network (900). In some embodiments, Computer Vision is used to segment an image. In Computer Vision, image segmentation is the process of portioning a digital image into multiple segments. The segmentation is performed to simplify and/or change the representation of an image into something that is more meaningful and/or easier to analyze. In some embodiments, image segmentation is used for object extraction and those extracted objects are used to generate portions of a collage.). 
Since Cui in view of Dell, Mishra and Finegan teach the initial method steps for generating a visualization using a template, icons and text summaries, which involve object and text recognition related to computer vision, and Ivers teaches using computer vision to help generate text-to-image creations, such as textual descriptions being used to generate different icons, it would have been obvious to a person having ordinary skill in the art to combine the teachings so that the visualizations and icons being created could also incorporate a computer vision model in the process and utilize text-to-image creations based on textual descriptions and/or certain keywords.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cui in view of Dell, Mishra and Finegan to incorporate the features of Ivers, so that the combined features together would allow for improved accuracy when generating any of the associated icons used within the library of icons because they would be generated using object recognition and/or a computer vision model using related textual description and/or keywords associated with the proper icon to be generated.
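For illustration only, the following sketch derives icon keywords by captioning icon images with a computer vision model and then matches the generated keywords against those captions, mirroring the claim 21 limitation discussed above; the BLIP captioning model and the overlap-based matching rule are editorial assumptions, not the approach of the cited references.

```python
# Illustrative sketch: caption each icon image to obtain icon keywords, then
# select icons whose caption keywords best match the generated keywords.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def icon_keyword_index(icon_paths: list[str]) -> dict[str, set[str]]:
    """Build a map from icon file to keywords taken from its generated caption."""
    index = {}
    for path in icon_paths:
        caption = captioner(path)[0]["generated_text"]
        index[path] = set(caption.lower().split())
    return index

def match_icons(index: dict[str, set[str]], keywords: set[str], n: int = 2) -> list[str]:
    return sorted(index, key=lambda p: len(index[p] & keywords), reverse=True)[:n]
```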
Claims 8 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Cui in view of Dell, Mishra and Finegan as applied to claims 1 and 10 above, and further in view of Tory et al. (U.S. Patent No. 11,314,817 B1), hereinafter Tory.
Regarding claim 8, Cui in view of Dell, Mishra and Finegan disclose everything claimed as applied above (see claim 1), in addition, Cui in view of Dell, Mishra and Finegan disclose dividing the set of textual data into a plurality of text chunks (Col. 4, Lines 59-64 of Cui teach that after the text 202 is provided to the parser 204, the parser 204 can parse the text 202 into a plurality of information items. For example, the parser 204 can divide or segment the text 202 into a plurality of portions, and each portion indicates respective information and corresponds to an information item). However, Cui in view of Dell, Mishra and Finegan fail to disclose that the set of textual data is a new plurality of text chunks.
Tory discloses the set of textual data is a new plurality of text chunks (Col. 11, Lines 38-46 teach that some additional examples of user intent include: elaborate (add new information to the visualization); adjust/pivot (adapt aspects of the visualization, such as apply/remove/modify a filter, or add/remove data fields); start new (create an altogether new visualization); retry (re-attempt a previous step that “failed”—either for technical reasons, such as a query timeout, or because the previous command failed to convey the desired visualization); and undo (return to the prior state)). Since Cui in view of Dell, Mishra and Finegan teach the initial method steps for dividing a set of textual data into a plurality of text portions/chunks for visualization and Tory teaches a concept of being able to add new information or make adjustments after a first initial visualization procedure has been completed, as well as the ability to start brand new visualization procedures if needed, it would have been obvious to a person having ordinary skill in the art to combine the concepts together so that any method step or function being implemented could then also be adjusted or modified to include new information text data separate from the first initial information data that was being gathered.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cui in view of Dell, Mishra and Finegan to incorporate the concepts of Tory, so that the combined concepts together would allow for additional steps related to the application of new text data to be incorporated, including dividing up new sets of information data that would consist of new portions/chunks of text data.
In addition, Cui in view of Dell, Mishra, Finegan and Tory disclose generating a new plurality of text summaries based on processing the new plurality of text chunks using the one or more machine learning models (Col. 7, Lines 43-49 of Cui teach that at 502, a plurality of information items are extracted from a natural language sentence. In some implementation, extracting the plurality of information items includes extracting the plurality of information items from the natural language sentence through a machine learning based model. The machine learning based model can be the parser 204 as shown in FIG. 3. Additionally, paragraph 26 of Dell teaches that the summary model 110 is a machine learning model that has been trained to produce summaries of text. For example, summary model 110 may be a denoising autoencoder that is implemented as a sequence-to-sequence model with a bidirectional encoder over a string of text and a left-to-right auto-regressive decoder.);
generating a new plurality of keywords based on processing at least one of the new plurality of text chunks or the new plurality of text summaries using the one or more machine learning models (Col. 15, Lines 54-67 of Tory teach that in some implementations, the computer also receives (590) a second natural language command and determines: one or more second keywords (591) from the second natural language command, a second user intent (592) based on the one or more keywords, and a second context (593) based on the first natural language command and/or the data visualization that is currently displayed. The computer then modifies (594) the data visualization based on the second user intent and the second context and displays (595) the modified data visualization. For example, in FIG. 3A, the computer receives a second natural language command 304 and in response, generates and displays the second data visualization 314. Additional details are provided above with respect to FIG. 3A. Additionally, paragraph 56 of Mishra discloses that an embodiment of the present invention may further generate the keyword vectors by generating word embeddings for each of the keywords, encoding the word embeddings using a machine learning model to produce encoded vector representations of the keywords, and generating the keyword vectors based on the encoded vector representations. The machine learning model is trained to produce the same encoded vector representations for a set of keywords regardless of an order of the keywords. This enables the natural language content generation to be agnostic of the order of the keywords, thereby generating consistent content from the same keywords (regardless of order) that closely aligns with the template.);
selecting a second visualization template, from the library of visualization templates, based on at least one of the new plurality of keywords (Col. 6, Lines 10-20 of Cui teach that in some implementations, various types of templates can be designed or otherwise obtained to construct a template library. The template 404 can include a visualization feature indicating numerical information. For example, in an example template, the visualization feature can be in the form of ring map, in which two arcs having different colors are fused to form a circle, and a proportion of one of the two arcs in the circle represents the value thereof. The template library can include general templates for various scenarios and dedicated templates dedicated to particular scenarios.);
selecting a second set of icons, from the library of icons, based on at least one of the new plurality of keywords (Col. 6, Lines 1-9 of Cui teach that in some implementations, a visual element library can be constructed for each visual element. For example, various types of icons can be designed or otherwise obtained to construct an icon library. The icon library can include icons which are used as pictograms, containers, backgrounds and so on, and each icon can include an associated descriptive tag for match use. Therefore, one or more icons 402 can be selected from the icon library by matching these information items with the descriptive tags in the icon library.);
and generating a second visualization using the second visualization template and the second set of icons and using at least one of the new plurality of text summaries (Col. 12, Lines 33-45 of Tory teach that in some implementations, an initial data visualization may be generated based on a user's selection of a data source and one or more data fields. After the initial data visualization is generated and displayed (in the data visualization region 414), the user may use natural language commands (e.g., in the natural language processing region 420) to further explore the displayed data visualization. For example, the user may provide a command to create a relationship between two data elements. In response to receiving the command, an updated data visualization that shows a correlation between the two data elements is generated and displayed in the data visualization region 414. In addition, Col. 5, Lines 58-67 of Cui teach that now returning to FIG. 2, after the parser 204 parses information items from the text 202, the generator 206 generates candidate visual representations of the text 202. The generator 206 can determine visual elements associated with the information items extracted by the parser 204. FIG. 4 is a schematic diagram illustrating the generator 206 according to some implementations of the present disclosure. As shown in FIG. 4, the visual elements may be one or more of an icon 402, a template 404, a description 406, and a color 408.).
Regarding claim 15, the system steps correspond to and are rejected similarly to the method steps of claim 8 (see claim 8 above).
Response to Arguments
Applicant's arguments, see Remarks pages 13-16, filed December 2, 2025, with respect to the rejections of claims 1-20 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, those rejections have been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of the newly applied prior art reference of Finegan; the amended claim language therefore remains rejected under 35 U.S.C. 103 with the additional prior art of Finegan incorporated into the rejections (see claims 1, 10 and 16 above for the reasoning).
Applicant's additional arguments regarding dependent claims 2-17 and 19-21 are moot by virtue of their dependency, because the independent claims from which they depend are not allowable.
Regarding newly added claim 21, see the rejection of claim 21 above for the additional reasoning under 35 U.S.C. 103.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Goodrich et al. (U.S. Patent No. 11,488,359 B2) teaches a collection management system that can organize collections of text and create icons for the associated collections.
Ramesh et al. (Pub. No. US 2018/0314715 A1) teaches an image management and text selection system used for training a machine learning system.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to George Renze whose telephone number is (703)756-5811. The examiner can normally be reached Monday-Friday 9:00am - 6:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/G.R./Examiner, Art Unit 2613
/XIAO M WU/Supervisory Patent Examiner, Art Unit 2613