DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 7-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sureshkumar et al., US 2021/0117685 A1 (Sureshkumar).
Regarding claim 12, Sureshkumar teaches a computer-implemented method (computer system) (Fig. 7; [0031]) for processing media content (facilitating a video annotation system) ([0031]), the computer-implemented method comprising:
obtaining one or more media content items of a segment of media content (determining segment-level contextual information for a segment of video 140) (Fig. 1A; [0047]), the media content comprising a video (obtaining a video 140) (Fig. 1A; [0044]);
generating, based on one or more signals in the one or more media content items (based on one or more input signals of the media content, such as video frames, audio signal, textual information, metadata, etc.) (Fig. 1A; [0049-0050]), one or more media content representations encoding information (autoencoder for identifying information of the video content; background signal embeddings, word embeddings, etc.) ([0073] and [0078-0079]) about the one or more media content items (wherein the analysis module can apply a corresponding AI model on each mode of input and infer semantic contextual information indicating the corresponding classification into a semantic category) (Fig. 1A; [0050]);
classifying a content of the segment of the media content based on the one or more media content representations (wherein the analysis module can apply a corresponding AI model on each mode of input and infer semantic contextual information indicating the corresponding classification into a semantic category) (Fig. 1A; [0050]), the content of the segment of the media content being classified into one or more categories of content (wherein the analysis module can apply a corresponding AI model on each mode of input and infer semantic contextual information indicating the corresponding classification into a semantic category) (Fig. 1A; [0050-0051]); and
matching the segment of the media content with a targeted media content item (matching the segment's contextual information with an advertisement that matches that contextual information) ([0043], [0053-0054], and [0083]) based on the one or more categories of content associated with the segment of the media content and at least one category of content associated with the targeted media content item (the advertisement system can dynamically match the annotations with the content of the available advertisements and place the advertisement that matches the contextual information represented in the annotations) ([0043], [0053-0054], and [0083]) (wherein the annotations can be associated with the preceding and subsequent scenes of the advertisement spot) ([0043]) (wherein the annotation is a respective determined category and a likelihood of the segment being in that category) ([0050-0051]).
Although Sureshkumar does not explicitly recite “media” content, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the video in Sureshkumar’s invention, which can be a movie ([0061]) or television content ([0044]), constitutes media content.
Regarding claim 13, Sureshkumar teaches further comprising: inserting the targeted media content item within the segment of the media content (placing the contextually relevant advertisements in a video; in inter-segment availabilities) ([0042-0043] and [0081-0082]); and providing, to a device associated with a user (such as user device 106) (Fig. 1A; [0045]), the segment of the media content with the targeted media content item (placing the advertisement within the video that the user is watching) ([0087]).
Regarding claim 14, Sureshkumar teaches wherein matching the segment of the media content with the targeted media content item (matching the segment's contextual information with an advertisement that matches that contextual information) ([0043], [0053-0054], and [0083]) comprises: matching the one or more categories of content associated with the segment with the at least one category of content associated with the targeted media content item (matching the segment's contextual information with an advertisement that matches that contextual information) ([0043], [0053-0054], [0083], and [0087]) (wherein the annotation is a respective determined category and a likelihood of the segment being in that category) ([0050-0051]); and based on the matching of the one or more categories of content associated with the segment with the at least one category of content associated with the targeted media content item (based on matching the segment's contextual information with an advertisement that matches that contextual information) ([0043], [0053-0054], and [0083]), matching the segment with the targeted media content item (matching the segment's contextual information with an advertisement that matches that contextual information) ([0043], [0053-0054], and [0083]).
Regarding claim 15, Sureshkumar teaches further comprising: determining similarity metrics (corresponding similarity strengths) ([0085]) indicating respective similarities between the one or more categories of content associated with the segment and a set of categories of content associated with a set of targeted media content items (to perform the matching operation, the advertisement system can obtain values and strengths from the keys in the annotations, and select a set of values with corresponding strengths greater than a threshold) ([0085]), the set of targeted media content items comprising the targeted media content item (the targeted media content comprising the media content item) ([0085] and [0087]); based on a comparison of the similarity metrics (based on the corresponding strengths) ([0085]), determining that a similarity between the one or more categories of content associated with the segment and the at least one category of content associated with the targeted media content item is greater than a respective similarity between each category of content from the set of categories of content associated with the set of targeted media content items (selecting an advertisement that has the best match with the content and narrative and is greater than a threshold) ([0085]) (wherein if the strength is greater than a threshold, it may indicate that the segment belongs to a category corresponding to a key) (Fig. 2A; [0067]); and based on the determining that the similarity between the one or more categories of content associated with the segment and the at least one category of content associated with the targeted media content item is greater than the respective similarity between each category of content from the set of categories of content, matching the one or more categories of content associated with the segment with the at least one category of content associated with the targeted media content item (since the system only selects the advertisement with the best match that has a category that matches over a threshold, it would only select the one targeted advertisement) ([0067], [0085], and [0087]).
Regarding claim 17, Sureshkumar teaches wherein the one or more signals in the one or more media content items comprise a visual signal comprising image data from the one or more media content items (wherein the input set can include video frames 122) (Fig. 1A; [0049-0050]), an audio signal comprising audio from the one or more media content items (the input set can also include an audio signal 124) (Fig. 1A; [0049-0050]), a closed caption signal comprising text associated with the one or more media content items (the input set can also include textual information 126; wherein the textual information can be subtitles/closed captions) (Figs. 1A and 3A; [0049-0050] and [0070]), or a combination thereof (wherein the input set can include a combination of signals) (Fig. 1A; [0049-0050] and [0070]); and wherein the one or more media content representations comprise a first media content representation encoding information determined based on the visual signal (wherein the video frames can include encoding information, such as visual embeddings) (Fig. 3B; [0078]), a second media content representation encoding information determined based on the audio signal (wherein the audio signal can include background signal embeddings) (Fig. 3B; [0078]), a third media content representation encoding information determined based on the closed caption signal (wherein the textual information can include word embeddings) (Fig. 3B; [0078]), or a combination thereof (wherein the embeddings can be concatenated; fused together) (Fig. 3B; [0079]).
Regarding claim 18, Sureshkumar teaches further comprising: combining at least two media content representations from the first media content representation, the second media content representation, and the third media content representation into a fused media content representation (wherein the media representations (visual embeddings from the video frames, background signal embeddings from the audio signal, and word embeddings from the textual information) can be fused using a concatenating layer representing the multimodal embeddings) (Fig. 3B; [0078-0079]); and classifying the content of the segment of the media content into the one or more categories of content based on the fused media content representation (wherein the multimodal embeddings are input into a multimodal segment classifier which can infer properties and provide the classification labels) (Fig. 3B; [0079]) (wherein annotation can include a set of keys representing the contextual information) ([0080]) (wherein the annotation is a respective determined category and a likelihood of the segment being in that category) ([0050-0051]).
Regarding claim 19, Sureshkumar teaches wherein the one or more media content representations comprises one or more embeddings encoding information about the one or more media content items (wherein the media representations comprises visual embeddings from the video frames, background signal embeddings from the audio signal, and word embeddings from the textual information) (Fig. 3B; [0078-0079]), and wherein the information about the one or more media content items comprises a context associated with the content of the segment of the media content (wherein the annotation can include a set of keys representing the contextual information associated with the segment) (Fig. 3B; [0080]), one or more features of the content of the segment of the media content (wherein the annotation key can include a feature of the segment) (Fig. 2B; [0064]), one or more characteristics of the content of the segment of the media content (wherein the annotation key can include characteristics of the content; such as a person who is smiling) (Fig. 2B; [0069]), one or more characteristics of a scene in the segment of the media content (wherein the annotation key can include characteristics of the scene such as the genre of the segment) (Fig. 2B; [0064-0066] and [0069]), one or more characteristics of a shot in the segment of the media content (wherein the characteristics can also be based on shots of the segment) ([0070]), or a combination thereof (wherein the annotation key can be a combination of contextual information) (Fig. 2B; [0069]).
Regarding claim 20, see the rejection made to claim 12, as well as prior art Sureshkumar for a non-transitory computer-readable medium (non-transitory computer-readable medium) ([0100-0101]) having instructions stored thereon that (having code and/or data stored thereon) ([0100-0101]), when executed by at least one computing device (when executed by the computer system) ([0100-0101]), cause the at least one computing device to perform operations (wherein the computer system performs the methods and processes stored within the computer-readable storage medium) ([0101]), for they teach all the limitations within this claim.
Regarding claim 1, see the rejection made to claim 12, as well as prior art Sureshkumar for a system for processing video content (computer system 700 for facilitating a video annotation system) (Fig. 7; [0094]), the system comprising: one or more memories (memory 704) (Fig. 7; [0094]); and at least one processor (processor 702) (Fig. 7; [0094]) coupled to at least one of the one or more memories (wherein the processor is coupled to the memory being within the computer system) (Fig. 7; [0094]) and configured to perform operations (causing the computer system 700 to perform methods and/or processes) (Fig. 7; [0094-0095]), for they teach all the limitations within this claim.
Regarding claim 2, see the rejection made to claim 13, as well as prior art Sureshkumar for a system for processing video content (computer system 700 for facilitating a video annotation system) (Fig. 7; [0094]), the system comprising: one or more memories (memory 704) (Fig. 7; [0094]); and at least one processor (processor 702) (Fig. 7; [0094]) coupled to at least one of the one or more memories (wherein the processor is coupled to the memory being within the computer system) (Fig. 7; [0094]) and configured to perform operations (causing the computer system 700 to perform methods and/or processes) (Fig. 7; [0094-0095]), for they teach all the limitations within this claim.
Regarding claim 3, see the rejection made to claim 14, as well as prior art Sureshkumar for a system for processing video content (computer system 700 for facilitating a video annotation system) (Fig. 7; [0094]), the system comprising: one or more memories (memory 704) (Fig. 7; [0094]); and at least one processor (processor 702) (Fig. 7; [0094]) coupled to at least one of the one or more memories (wherein the processor is coupled to the memory being within the computer system) (Fig. 7; [0094]) and configured to perform operations (causing the computer system 700 to perform methods and/or processes) (Fig. 7; [0094-0095]), for they teach all the limitations within this claim.
Regarding claim 4, Sureshkumar teaches wherein the at least one processor (processor 702) (Fig. 7; [0094]) is configured to perform operations (causing the computer system 700 to perform methods and/or processes) (Fig. 7; [0094-0095]) comprising: determining similarity metrics (corresponding similarity strengths) ([0085]) indicating respective similarities between the one or more categories of content associated with the segment and a set of categories of content associated with a set of targeted media content items (to perform the matching operation, the advertisement system can obtain values and strengths from the keys in the annotations, and select a set of values with corresponding strengths greater than a threshold) ([0085]), the set of targeted media content items comprising the targeted media content item (the targeted media content comprising the media content item) ([0085] and [0087]); and matching the one or more categories of content associated with the segment with the at least one category of content associated with the targeted media content item based on a respective similarity metric (corresponding similarity strengths) ([0085]) associated with the at least one category of content (selecting an advertisement that has the best match with the content and narrative and is greater than a threshold) ([0085]) (wherein if the strength is greater than a threshold, it may indicate that the segment belongs to a category corresponding to a key) (Fig. 2A; [0067]).
Regarding claim 5, Sureshkumar teaches wherein the at least one processor (processor 702) (Fig. 7; [0094]) is configured to perform operations (causing the computer system 700 to perform methods and/or processes) (Fig. 7; [0094-0095]) comprising: comparing the similarity metrics (based on the corresponding strengths) ([0085]); based on the comparing of the similarity metrics (based on the corresponding strengths) ([0085]), determining that a similarity between the one or more categories of content associated with the segment and the at least one category of content associated with the targeted media content item is greater than a respective similarity between each category of content from the set of categories of content associated with the set of targeted media content items (selecting an advertisement that has the best match with the content and narrative and is greater than a threshold) ([0085]) (wherein if the strength is greater than a threshold, it may indicate that the segment belongs to a category corresponding to a key) (Fig. 2A; [0067]); and based on the determining that the similarity between the one or more categories of content associated with the segment and the at least one category of content associated with the targeted media content item is greater than the respective similarity between each category of content from the set of categories of content, matching the one or more categories of content associated with the segment with the at least one category of content associated with the targeted media content item (since the system only selects the advertisement with the best match that has a category that matches over a threshold, it would only select the one targeted advertisement) ([0067], [0085], and [0087]).
Regarding claim 7, see the rejection made to claim 17, as well as prior art Sureshkumar for a system for processing video content (computer system 700 for facilitating a video annotation system) (Fig. 7; [0094]), the system comprising: one or more memories (memory 704) (Fig. 7; [0094]); and at least one processor (processor 702) (Fig. 7; [0094]) coupled to at least one of the one or more memories (wherein the processor is coupled to the memory being within the computer system) (Fig. 7; [0094]) and configured to perform operations (causing the computer system 700 to perform methods and/or processes) (Fig. 7; [0094-0095]), for they teach all the limitations within this claim.
Regarding claim 8, see the rejection made to claim 18, as well as prior art Sureshkumar for a system for processing video content (computer system 700 for facilitating a video annotation system) (Fig. 7; [0094]), the system comprising: one or more memories (memory 704) (Fig. 7; [0094]); and at least one processor (processor 702) (Fig. 7; [0094]) coupled to at least one of the one or more memories (wherein the processor is coupled to the memory being within the computer system) (Fig. 7; [0094]) and configured to perform operations (causing the computer system 700 to perform methods and/or processes) (Fig. 7; [0094-0095]), for they teach all the limitations within this claim.
Regarding claim 9, see the rejection made to claim 19, as well as prior art Sureshkumar for a system for processing video content (computer system 700 for facilitating a video annotation system) (Fig. 7; [0094]), the system comprising: one or more memories (memory 704) (Fig. 7; [0094]); and at least one processor (processor 702) (Fig. 7; [0094]) coupled to at least one of the one or more memories (wherein the processor is coupled to the memory being within the computer system) (Fig. 7; [0094]) and configured to perform operations (causing the computer system 700 to perform methods and/or processes) (Fig. 7; [0094-0095]), for they teach all the limitations within this claim.
Regarding claim 10, Sureshkumar teaches wherein the at least one processor (processor 702) (Fig. 7; [0094]) is configured to perform operations (causing the computer system 700 to perform methods and/or processes) (Fig. 7; [0094-0095]) comprising: determining, based on a sentiment analysis performed using a large language network model (emotion detection technique and/or classifier using natural language processing) ([0075-0076]), an emotional tone associated with the content of the segment of the media content (which can determine/infer the emotion and vibe of the segment) ([0075]); and classifying the content of the segment of the media content based on the one or more media content representations and the emotional tone associated with the content of the segment of the media content (classifying the content of the segment based on classifications generated from the visual, audio, and text classifiers; which includes inferring contextual information such as sentiment and emotion) ([0075-0076]).
Regarding claim 11, Sureshkumar teaches wherein the at least one processor (processor 702) (Fig. 7; [0094]) is configured to perform operations (causing the computer system 700 to perform methods and/or processes) (Fig. 7; [0094-0095]) comprising: generating, based on text describing the information encoded in the one or more media content representations (based on subtitles/closed captions, script, and speech recognition) ([0075]), augmented data (generating augmented data) ([0075]) comprising an indication of the one or more categories of content and additional information about the one or more categories of content (which includes inferring the emotion and vibe of the segment using the augmented technique; which in turn classifies the segment into contextual information, such as the genre associated with the segment) ([0075-0076]), the content of the segment of the media content, or a combination thereof (wherein the system can then fuse the classifications, predictions, and/or scores from each of the visual, audio, and text classifiers based on a classifier score fusion technique and generate jointly inferred contextual information from the multiple input modalities associated with the segment) ([0076]); and associating the segment of the media content with the augmented data (wherein the segment is associated with the augmented data) ([0075-0076]).
Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sureshkumar et al., US 2021/0117685 A1 (Sureshkumar), in view of Henkin et al., US 2011/0213655 A1 (Henkin).
Regarding claim 16, Sureshkumar teaches determining strengths between the one or more categories of content associated with the segment and a set of categories of content associated with a set of targeted media content items (matching the segment's contextual information with an advertisement that matches that contextual information) ([0043], [0053-0054], [0083], and [0087]) (wherein the annotation is a respective determined category and a likelihood of the segment being in that category) ([0050-0051]), the set of targeted media content items comprising the targeted media content item (the targeted media content comprising the media content item) ([0085] and [0087]);
determining that a similarity between the one or more categories of content associated with the segment and the at least one category of content associated with the targeted media content item is greater than a respective similarity between each category of content from the set of categories of content associated with the set of targeted media content items (selecting an advertisement that has the best match with the content and narrative and is greater than a threshold) ([0085]) (wherein if the strength is greater than a threshold, it may indicate that the segment belongs to a category corresponding to a key) (Fig. 2A; [0067]); and
matching the one or more categories of content associated with the segment with the at least one category of content associated with the targeted media content item (since the system only selects the advertisement with the best match that has a category that matches over a threshold, it would only select the one targeted advertisement) ([0067], [0085], and [0087]).
However, Sureshkumar does not explicitly state that the similarity is based on determining "distance metrics indicating respective distances within a representation space" or "determining, based on the distance metrics, that a distance within the representation space is smaller than a respective distance" for detecting the category.
Henkin teaches hybrid contextual advertising and related content analysis and display techniques for facilitating on-line contextual advertising operations and related content delivery operations implemented in a computer network (Abstract); wherein a distance metric in a representation space can be used to determine whether two representations are similar ([0053] and [1524]); and wherein the smaller the distance, the closer the match ([0053] and [1524]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sureshkumar, which uses a normalized representation of the strengths associated with each key (category), such as a value of 1 or 100 (Sureshkumar; [0040]), and compares each strength to a threshold for matching the category (Sureshkumar; [0041]), to include determining a distance within a representation space, since comparing distances between the annotations of each segment, as taught by Henkin, is an obvious alternative to comparing the strength of each annotation to a threshold and also allows for matching in an accurate and effective way (Henkin; [0085]).
Regarding claim 6, see the rejection made to claim 16, as well as prior art Sureshkumar for a system for processing video content (computer system 700 for facilitating a video annotation system) (Fig. 7; [0094]), the system comprising: one or more memories (memory 704) (Fig. 7; [0094]); and at least one processor (processor 702) (Fig. 7; [0094]) coupled to at least one of the one or more memories (wherein the processor is coupled to the memory being within the computer system) (Fig. 7; [0094]) and configured to perform operations (causing the computer system 700 to perform methods and/or processes) (Fig. 7; [0094-0095]), for they teach all the limitations within this claim.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Zhang et al., US 2018/0020247 A1 (Zhang), teaches a method of advertising, the method including: 1) segmenting a source video into individual scenes using a clustering-based approach; 2) obtaining relevant information about objects in the video for each individual scene using region-wise convolutional characteristics based detection; 3) searching, in a database, for advertisement objects matching the objects using garment retrieval and a category-based strategy; 4) performing optimization processing of retrieved advertisement objects matching the objects to obtain a candidate advertisement; 5) optimizing a distance between an advertisement and a target object and an area of overlapping regions between the advertisement and all objects; and 6) distributing the video that contains the candidate advertisement to a plurality of displays and displaying the video that contains the candidate advertisement on the plurality of displays (Abstract).
Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J VANCHY JR whose telephone number is (571)270-1193. The examiner can normally be reached Monday - Friday 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emily Terrell, can be reached at (571) 270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL J VANCHY JR/Primary Examiner, Art Unit 2666 Michael.Vanchy@uspto.gov