DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The application has been examined. Claims 1 – 20 are pending in this Office action.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under pre-AIA 35 U.S.C. 103(a) are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA 35 U.S.C. 103(a).
Claims 1 – 20 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Srivastava et al. (US 20200311467 A1) (‘Srivastava’ hereinafter) in view of Sohyeong Kim (US 20240338957 A1) (‘Kim’ hereinafter).
With respect to claims 1, 12, and 20,
Srivastava discloses a method of organizing data, the method comprising: extracting, by a processor, a first textual information from one or more electronic documents (Figures 1 and 3 and paragraph 17, which teach an image containing embedded text characters from which the embedded text may be extracted; the image processing service may implement optical character recognition (OCR) techniques to identify features that may represent text characters that make up words; Srivastava); segmenting, by the processor, the first textual information into one or more chunks of sentences including at least one word, based on pre-defined rules (Figures 1 and 3 and paragraph 20, Srivastava); generating, by the processor, using a machine learning model, first numerical representations of the one or more chunks (Figures 1 and 3 and paragraphs 18 – 20, which teach a machine learning model used to generate a vector representation, Srivastava); storing, in a memory, identity of each of the one or more electronic documents, the one or more chunks, and the first numerical representations, wherein an association between the one or more chunks, the first numerical representations, and a respective electronic document of the one or more electronic documents is also stored (paragraphs 32, 43, and 45 – 48, which teach storing the data and the classifications, Srivastava); extracting, by the processor, one or more images from the one or more electronic documents (Figures 1 and 3 and paragraph 17, as discussed above, Srivastava); extracting, by the processor, a second textual information from the one or more images, wherein the second textual information includes one or more keywords (Figures 1 and 3 and paragraph 17, as discussed above, Srivastava); generating, by the processor, using the machine learning model, second numerical representations of the one or more keywords (Figures 1 and 3 and paragraphs 18 – 20, as discussed above, Srivastava); matching, by the processor, the first numerical representations with the second numerical representations for determining an association of the one or more images with the first textual information based on the association of the first numerical representations with the one or more chunks (paragraphs 35 and 45 – 48, which teach determining a relation between the representations, Srivastava); and updating, by the processor, the memory for storing the association of the one or more images with the first textual information (paragraphs 43 – 50, which teach matching the representations and storing the associations, Srivastava).
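For illustration of the chunking, embedding, and matching steps mapped above, the following minimal sketch shows one way such a pipeline could be arranged. It is offered for context only; it assumes a generic sentence-embedding model from the sentence-transformers package and hypothetical document and image identifiers, and it is not drawn from the claims or from either cited reference.

    # Illustrative sketch only; not the method of Srivastava or Kim.
    # Assumes the sentence-transformers package; any text-embedding model would do.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

    def segment_into_chunks(text, max_words=50):
        # Pre-defined rule (hypothetical): split on sentence boundaries, then
        # group sentences into chunks of roughly max_words words.
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        chunks, current = [], []
        for s in sentences:
            current.append(s)
            if sum(len(c.split()) for c in current) >= max_words:
                chunks.append(". ".join(current))
                current = []
        if current:
            chunks.append(". ".join(current))
        return chunks

    # First textual information extracted from an electronic document (assumed input).
    doc_text = "Example body text of the document. It describes a figure."
    chunks = segment_into_chunks(doc_text)
    chunk_vectors = model.encode(chunks)            # first numerical representations

    # Keywords recovered from an image in the same document, e.g. via OCR (assumed input).
    image_keywords = ["figure", "network diagram"]
    keyword_vectors = model.encode(image_keywords)  # second numerical representations

    # Match the two sets of representations; the chunk most similar to the image's
    # keywords is associated with the image, and the association is stored together
    # with the document identity.
    scores = util.cos_sim(keyword_vectors, chunk_vectors)
    best_chunk = int(scores.mean(dim=0).argmax())
    store = {"doc_id": "doc-001", "chunk": chunks[best_chunk], "image": "img-001"}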
Srivastava teaches images being processed but does not explicitly state, as claimed, that the images could also be electronic documents.
However, Kim teaches, in paragraphs 20 – 28, that the processed media are electronic documents, which may be text files or image or video media files. Furthermore, Kim teaches extracting data from the files and analyzing the data with a machine learning model to label, associate, and classify them.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Kim with the teachings of Srivastava because both are directed to the same field of processing data for extraction and classification and of building the data for accurate results. Furthermore, in paragraph 12, Kim teaches an approach that may receive an image that includes one or more embedded characters, which are identified from the image. A first machine learning model may be used to determine a text vector. The text vector may represent numerical representations of the one or more embedded characters. A second machine learning model may be used to determine an image vector. The image vector may represent numerical representations of the image. The text vector and the image vector may be processed to generate a multi-modal vector that represents information from the text vector and the image vector. Based on the multi-modal vector, the image may be classified into one of a plurality of image classifications. Image classifications may be used by downstream systems to determine whether images are suitable for presentation to users.
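The multi-modal combination described in Kim's paragraph 12 can likewise be illustrated with a short sketch. The vectors, dimensions, and classifier below are generic stand-ins chosen only to keep the example self-contained; they are not Kim's actual models.

    # Illustrative sketch only; generic stand-ins, not Kim's actual models.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Assume upstream models already produced these (hypothetical dimensions).
    text_vectors = rng.normal(size=(100, 64))    # numerical representations of embedded characters
    image_vectors = rng.normal(size=(100, 128))  # numerical representations of the images
    labels = rng.integers(0, 2, size=100)        # e.g. suitable / not suitable for presentation

    # Multi-modal vector: a simple concatenation of the text vector and the image vector.
    multimodal = np.concatenate([text_vectors, image_vectors], axis=1)

    # Any classifier over the multi-modal vector would do; logistic regression is
    # used here only because it is compact.
    clf = LogisticRegression(max_iter=1000).fit(multimodal, labels)
    print(clf.predict(multimodal[:5]))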
With respect to claims 2 and 13,
Srivastava as modified discloses the method as claimed in claim 1, wherein the memory is updated when a value of the matching of the first numerical representations and the second numerical representations is greater than a pre-defined threshold (paragraphs 24 and 52, Srivastava; paragraphs 20, 24, and 48, Kim).
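For context, the threshold condition recited in claims 2 and 13 may be illustrated as follows; the similarity measure and the threshold value are assumptions, not values taken from the cited references.

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    THRESHOLD = 0.75  # pre-defined threshold (assumed value)

    first_rep = np.array([0.1, 0.9, 0.3])   # representation of a text chunk
    second_rep = np.array([0.2, 0.8, 0.4])  # representation of an image keyword

    if cosine(first_rep, second_rep) > THRESHOLD:
        # Only when the matching value exceeds the threshold is the memory updated.
        association = {"chunk_id": 7, "image_id": 3}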
With respect to claim 3,
Srivastava as modified discloses the method as claimed in claim 1, wherein the one or more keywords are extracted using optical character recognition (paragraphs 17 and 40 – 41, Srivastava; paragraphs 29 – 30, Kim).
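Optical character recognition of the kind cited here can be sketched as follows. The sketch assumes the pytesseract and Pillow packages with a locally installed Tesseract engine, and the stop-word rule for selecting keywords is a hypothetical example rather than anything taught by the references.

    # Illustrative OCR sketch; assumes pytesseract/Pillow and a local Tesseract install.
    from PIL import Image
    import pytesseract

    STOP_WORDS = {"the", "a", "an", "of", "and", "to"}  # hypothetical keyword rule

    def keywords_from_image(path):
        text = pytesseract.image_to_string(Image.open(path))  # second textual information
        words = [w.strip(".,;:").lower() for w in text.split()]
        return [w for w in words if w and w not in STOP_WORDS]

    # Example use (assumed file name): keywords = keywords_from_image("figure1.png")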
With respect to claims 4 and 14,
Srivastava as modified discloses the method as claimed in claim 1, further comprising generating an electronic document including the first textual information and the one or more images associated with the first textual information (Figures 1 and 3 and paragraphs 18 – 20, Srivastava).
With respect to claims 5 and 15,
Srivastava as modified discloses the method as claimed in claim 1, wherein the pre-defined rules are deployed using one or more of semantic text classification models and semantic text extraction models (paragraphs 20, 23, and 32, Srivastava).
With respect to claims 6 and 16,
Srivastava as modified discloses the method as claimed in claim 1, wherein the association of the one or more images with the first textual information includes one or more of an index, identity, and link to location of the one or more images contained in the one or more electronic documents (paragraphs 24 – 27, Srivastava).
With respect to claims 7 and 17,
Srivastava as modified discloses the method as claimed in claim 1, wherein the association of the one or more images with the first textual information is stored as a single entry of a table (paragraphs 20 and 28, Kim).
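Storing the association as a single table entry, as recited in claims 7 and 17, can be illustrated with the Python standard-library SQLite module; the table and column names below are assumptions.

    import sqlite3, json

    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE associations (
        doc_id TEXT, chunk TEXT, chunk_vector TEXT, image_ref TEXT)""")

    # A single row captures the chunk, its numerical representation, and the associated image.
    con.execute("INSERT INTO associations VALUES (?, ?, ?, ?)",
                ("doc-001", "Example chunk text.", json.dumps([0.1, 0.9, 0.3]), "img-001"))
    con.commit()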
With respect to claims 8 and 18,
Srivastava as modified discloses the method as claimed in claim 1, further comprising: receiving a user query including one or more query words; determining a similarity between the one or more query words and the first textual information; and providing a response including the first textual information and the one or more images associated with the first textual information based on the similarity between the one or more query words and the first textual information (paragraphs 24 and 52, Srivastava; paragraphs 20, 24, and 48, Kim).
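The query-and-response limitation of claims 8 and 18 can be illustrated as a similarity search over previously stored representations; the record schema and helper function below are hypothetical.

    import numpy as np

    def answer(query_vector, stored):
        # stored: list of dicts with "vector", "chunk", and "images" keys (hypothetical schema).
        def cosine(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
        best = max(stored, key=lambda row: cosine(query_vector, row["vector"]))
        # The response includes the matching textual information and its associated images.
        return {"text": best["chunk"], "images": best["images"]}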
With respect to claim 9,
Srivastava as modified discloses the method as claimed in claim 8, wherein the user query is processed using a natural language processing technique for determining the one or more query words (Figures 1 and 3 and paragraphs 18 – 20, Srivastava).
With respect to claims 10 and 19,
Srivastava as modified discloses the method as claimed in claim 1, further comprising generating a video using the one or more images associated with the first textual information overlaid on a speech synthesized audio sequence of the first textual information (paragraphs 56 – 58, Kim).
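Generating a video from the associated images over a speech-synthesised audio sequence, as recited in claims 10 and 19, could be realised, for example, as follows. The sketch assumes a pre-synthesised speech.wav produced by any text-to-speech engine, numbered image files, and a locally installed ffmpeg binary; none of these names come from the references.

    import subprocess

    # Assumes numbered images img_000.png, img_001.png, ... and a pre-synthesised speech.wav.
    subprocess.run([
        "ffmpeg",
        "-framerate", "1/3",          # show each image for roughly three seconds
        "-i", "img_%03d.png",
        "-i", "speech.wav",
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-shortest",                  # stop when the shorter of slideshow or audio ends
        "overview.mp4",
    ], check=True)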
With respect to claim 11,
Srivastava as modified discloses the method as claimed in claim 1, wherein the one or more images are captured from a video file (paragraphs 53 – 58, Kim).
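Capturing images from a video file, as recited in claim 11, can be illustrated with OpenCV; the sampling interval below is an assumed value.

    import cv2

    def sample_frames(video_path, every_n=30):
        # Keep one frame out of every `every_n` frames (assumed sampling rule).
        cap = cv2.VideoCapture(video_path)
        frames, i = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if i % every_n == 0:
                frames.append(frame)
            i += 1
        cap.release()
        return frames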
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20250005952 A1 teaches receiving document images, each including representations of characters. The processor is caused to parse each document image to extract, based on structure type, subsets of characters, to generate a text encoding for that document image. For each document image, the processor is caused to extract visual features to generate a visual encoding for that document image, each visual feature associated with a subset of characters. The processor is caused to generate parsed documents, each parsed document uniquely associated with a document image and based on the text encoding and visual encoding for that document image. For each parsed document, the processor is caused to identify sections uniquely associated with a section type. The processor is caused to train machine learning models, each machine learning model associated with one section type and trained using a portion of each parsed document associated with that section type.
US 20250156460 A1 teaches receiving input data comprising a corpus of documents; processing the corpus of documents to generate training data, wherein processing the corpus of documents comprises: segmenting the corpus of documents into a plurality of segments based on a semantic pattern; and producing one or more embeddings for each segment of the plurality of segments; and generating the training data based on the one or more embeddings; and training a large language model (LLM) using the training data.
US 20200394509 A1 teaches that training a neural network includes receiving a text corpus containing a labeled portion and an unlabeled portion, extracting local n-gram features and a sequence of the local n-gram features from the text corpus, processing the text corpus, using convolutional layers, according to the local n-gram features to determine capsule parameters of capsules configured to preserve the sequence of the local n-gram features, performing a forward-oriented dynamic routing between the capsules using the capsule parameters to extract global characteristics of the text corpus, and processing the text corpus according to the global characteristics using a long short-term memory layer to extract global sequential text dependencies from the text corpus, wherein parameters of the neural network are updated according to the local n-gram features, the capsule parameters, the global characteristics, and the global sequential text dependencies.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAVNEET K GMAHL whose telephone number is (571) 272-5636.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SANJIV SHAH, can be reached on . The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NAVNEET GMAHL/Examiner, Art Unit 2166 Dated: 1/24/2026
/SANJIV SHAH/Supervisory Patent Examiner, Art Unit 2166