DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of the Claims
Currently, claims 1-20 are pending in the application. Claims 1-4, 7-13, and 15-16 are amended. Claims 17-20 are added.
Response to Arguments / Amendments
Applicant’s arguments have been fully considered but are not persuasive; see the discussion below.
Rejections under 35 U.S.C. § 103:
Applicant argues that the cited references fail to teach or suggest a system that analyzes, compresses, and delivers multimedia content as required by the claims; that the Office Action lacks a reasoned motivation, supported by evidence, to combine Wu and Debnath in the manner claimed; and that, because the rejection does not establish a prima facie case of obviousness, the § 103 rejection should be withdrawn.
As to the above argument, Wu discloses analyzing content characteristics of input media content: retrieval block 119 includes a plurality of encoders to encode multi-modal (e.g., textual, video, image, and audio) information from public codebase 115 and private codebase 117 ([0071], FIG. 1). Wu also discloses dynamically adjusting processing parameters based on the content characteristics, using two sets of encoders 215 and 217 for public codebase 115 and private codebase 117 ([0077], FIG. 2).
Wu further discloses processing the input media content based on the dynamically adjusted processing parameters ([0077], FIG. 2; [0118], FIG. 2) and selecting encoding parameters based on the analyzed content characteristics and received user preference data ([0072], FIG. 1; [0355]).
In addition to Wu, Debnath teaches a network interface configured to monitor network conditions, and selecting encoding parameters based on the monitored network conditions ([0006], [0024]).
Furthermore, Wu and Debnath are in the same field of endeavor: Debnath teaches predicting video compression parameters based on an analysis of the video and an assessment of the current network bandwidth using a control network, and compressing the video based on the predicted parameters with an adaptive video compression module (Abstract), while Wu discloses a plurality of encoders to encode multi-modal (e.g., textual, video, image, and audio) information from a public codebase ([0071], FIG. 1).
Accordingly, the rejection is maintained with regard to the above arguments.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (US 20250238208, hereinafter Wu) in view of Debnath et al. (US 20240275983, hereinafter Debnath).
Regarding Claim 1, Wu discloses a system for adaptive multi-modal media processing and delivery (Fig. 2), comprising:
a hardware accelerator ([0141], FIG. 9, data center infrastructure layer 910 including a resource orchestrator 912, accelerators); and
a memory storing instructions to be executed by one or more processors; and one or more processors operably coupled to the hardware accelerator, the network interface, and the memory ([0069], FIG. 1; [0141], FIG. 9, data center infrastructure layer 910 may include a resource orchestrator 912, processors (including accelerators, field programmable gate arrays (FPGAs), graphics processors, etc.), memory storage devices 918(1)-918(N) (e.g., dynamic read-only memory, solid state storage or disk drives), network input/output (“NW I/O”) devices, network switches, and virtual machines (“VMs”)), wherein the one or more processors are configured to execute the instructions to:
analyze content characteristics of input media content ([0071], FIG. 1, retrieval block 119 includes a plurality of encoders to encode multi-modal (e.g., textual, video, image, and audio) information from public codebase 115 and private codebase 117);
dynamically adjust processing parameters based on the content characteristics ([0077], FIG. 2, a retrieval block 119 includes two sets of encoders 215 and 217 for public codebase 115 and private codebase 117);
process the input media content using the hardware accelerator, wherein the processing is based on the dynamically adjusted processing parameters ([0077], FIG. 2, a retrieval block 119 includes two sets of encoders 215 and 217 for public codebase 115 and private codebase 117; [0118], FIG. 2, store graph code or other software to control timing and/or order in which weight and/or other parameter information is to be loaded to configure logic, including integer and/or floating point units (collectively, arithmetic logic units (ALUs)), and graph code loads weight or other parameter information into processor ALUs based on an architecture of a neural network to which such code corresponds);
select encoding parameters based on the analyzed content characteristics and received user preference data ([0072], FIG. 1, encoder 105 parses specifications 103 to determine its contents, which involves tokenizing text describing specifications 103 and embedding generated tokens into high-dimensional vectors; [0070], detailing layers of functionality and code dependencies, from high-level user interfaces down to lower-level system operations);
apply adaptive processing techniques to the media content using the hardware accelerator to encode the media content into compressed media content ([0083], FIG. 2, transformer encoder 105, which is trainable, receives neighbors 237 and 239 as well as representation tensor 247 after representation tensor 247 passes through a self-attention module 251 and generates encoded neighbors 241 and 243); and
output the compressed media content for delivery ([0083], FIG. 2, transformer encoder 105 generates encoded neighbors 241 and 243 from neighbors 237 and 239, respectively, which incorporate information about dependencies in code 203 from self-attention module 251).
Wu does not explicitly disclose a network interface configured to monitor network conditions; and select encoding parameters based on the monitored network conditions.
Debnath teaches a network interface configured to monitor network conditions ([0006], current network bandwidth is assessed and future bandwidth availability is predicted; video compression parameters are predicted based on an analysis of the video and an assessment of the current network bandwidth using a control network, and the video is compressed based on the predicted parameters with an adaptive video compression module); and
select encoding parameters based on the monitored network conditions ([0024], an integrated system and method for adaptively controlling video compression based on deep learning techniques to optimize network bandwidth usage while maintaining high-quality video for analytics, utilizing surrogate model-based video encoding with reinforcement learning for dynamic adjustment of encoding parameters to achieve an optimal balance between bandwidth efficiency and the analytical accuracy of video content in varying network conditions).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the selection of encoding parameters based on monitored network conditions, as taught by Debnath ([0024]), into the encoding and decoding system of Wu in order to enhance video compression and analytics for network optimization and to achieve an optimal balance between bandwidth efficiency and the analytical accuracy of video content under varying network conditions (Debnath, [0024]).
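For illustration only, and without characterizing the actual implementation of either reference, the following minimal Python sketch shows the kind of network-adaptive encoding-parameter selection recited in claim 1 and taught by Debnath; all names, thresholds, and values are hypothetical.

```python
# Illustrative sketch only: hypothetical names and values, not the
# implementation of Wu or Debnath.
from dataclasses import dataclass

@dataclass
class EncodingParams:
    bitrate_kbps: int
    resolution: tuple   # (width, height)
    quantization: int   # lower value = finer quantization / higher quality

def select_encoding_params(measured_bandwidth_kbps: float,
                           content_complexity: float) -> EncodingParams:
    """Select encoder settings from monitored network conditions and
    analyzed content characteristics (cf. claim 1; Debnath [0024])."""
    # Reserve headroom so transient bandwidth dips do not stall delivery.
    target_bitrate = int(0.8 * measured_bandwidth_kbps)
    # More complex content receives finer quantization.
    quantization = max(10, int(40 - 20 * content_complexity))
    resolution = (1920, 1080) if target_bitrate > 4000 else (1280, 720)
    return EncodingParams(target_bitrate, resolution, quantization)

# Example: a 6 Mbps link carrying moderately complex content.
print(select_encoding_params(6000, content_complexity=0.5))
```

The 0.8 headroom factor and the resolution cutoff are arbitrary placeholders; a system such as Debnath's would learn or tune such values rather than hard-code them.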
Regarding Claim 2, Wu in view of Debnath discloses the system of claim 1.
Wu discloses wherein the one or more processors are configured to execute the instructions to:
manage hierarchical models for processing input media content, wherein the hierarchical models comprise at least one parent model trained on a content category and one or more child models specialized for specific content items within the content category; and combine at least one of the child models with the parent model to process the media content ([0058], each self-attention operation over a pair of code hierarchical levels is performed between raw code tokens at a lower level and summary data tokens at a higher level).
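For illustration only, the following minimal Python sketch shows one way a parent model trained on a content category could be combined with a child model specialized for a specific content item, as recited in claim 2; all names are hypothetical and do not reflect Wu's disclosure.

```python
# Illustrative sketch only: hypothetical names, not Wu's disclosure.
class ParentModel:
    """Model trained on a broad content category (e.g., a video genre)."""
    def predict(self, features):
        # Coarse, category-level parameter estimate.
        return {"base_param": sum(features) / len(features)}

class ChildModel:
    """Model specialized for a specific content item within the category."""
    def __init__(self, item_bias):
        self.item_bias = item_bias

    def refine(self, parent_output):
        # Specialize the parent's coarse estimate for this content item.
        parent_output["refined_param"] = parent_output["base_param"] + self.item_bias
        return parent_output

def process(features, parent, child):
    # Combine at least one child model with the parent model (claim 2).
    return child.refine(parent.predict(features))

print(process([0.2, 0.4, 0.6], ParentModel(), ChildModel(item_bias=0.05)))
```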
Regarding Claim 3, Wu in view of Debnath discloses the system of claim 1.
Wu discloses wherein the hardware accelerator component includes at least one specialized processing unit ([0069], FIG. 1; [0141], FIG. 9, data center infrastructure layer 910 may include a resource orchestrator 912, processors (including accelerators, field programmable gate arrays (FPGAs), graphics processors, etc.), memory storage devices 918(1)-918(N) (e.g., dynamic read-only memory, solid state storage or disk drives), network input/output (“NW I/O”) devices, network switches, and virtual machines (“VMs”)).
Regarding Claim 4, Wu in view of Debnath discloses the system of claim 1.
Wu discloses wherein processing the input media content includes processing multi-dimensional video by performing symmetry analysis or diffeomorphic transformations on the multi-dimensional video ([0455], performing 3D operations, such as rendering three-dimensional images and scenes using processing functions that act upon 3D primitive shapes (e.g., rectangle, triangle, etc.)).
Regarding Claim 5, Wu in view of Debnath discloses the system of claim 1.
Wu discloses further comprising a content security subsystem ([0185], a CNN for facial recognition and vehicle owner identification using data from camera sensors, and/or a CNN for security and/or safety related events; [0355], GPGPU 1830 trains neural networks used within an inferencing platform, where memory technology associated with memory 1844A-1844B may differ between inferencing and training configurations, with higher-bandwidth memory technologies devoted to training configurations, and support is provided for one or more 8-bit integer dot product instructions, which may be used during inferencing operations for deployed neural networks).
Regarding Claim 6, Wu in view of Debnath discloses the system of claim 1.
Wu discloses wherein the adaptive processing techniques include AI-driven compression ([0072], neural network 101 uses such learned code dependencies and said specifications 103 to generate new code for said code-generating task; [0129], FIG. 8, training and deployment of a deep neural network, including untrained neural network 806; [0149]).
Regarding Claim 7, Wu in view of Debnath discloses the system of claim 1.
Wu discloses wherein the adaptive processing techniques include continuous learning refinement based on historical data and user feedback ([0149], allowing users to train or perform inferencing of information, such as image recognition, speech recognition, or other artificial intelligence services).
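For illustration only, a minimal Python sketch of continuous-learning refinement of processing parameters from historical data and user feedback, as recited in claim 7; all names, weights, and thresholds are hypothetical.

```python
# Illustrative sketch only: hypothetical names, weights, and thresholds.
def refine_parameters(params: dict, history: list, feedback: float) -> dict:
    """Refine encoding parameters from historical quality scores and a
    user-feedback signal in [0, 1] (cf. claim 7)."""
    avg_quality = sum(history) / len(history) if history else 0.5
    # Blend long-run history with the freshest user feedback.
    signal = 0.7 * avg_quality + 0.3 * feedback
    if signal < 0.5:
        # Poor perceived quality: spend more bits (finer quantization).
        params["quantization"] = max(10, params["quantization"] - 2)
    else:
        # Acceptable quality: reclaim bandwidth (coarser quantization).
        params["quantization"] = min(51, params["quantization"] + 1)
    return params

print(refine_parameters({"quantization": 30}, history=[0.40, 0.45], feedback=0.3))
```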
Regarding Claim 8, Wu in view of Debnath discloses the system of claim 1.
Wu discloses wherein applying the adaptive processing techniques includes performing cross-media optimization across multiple content types ([0071], FIG. 1, retrieval block 119 includes a plurality of encoders to encode multi-modal (e.g., textual, video, image, and audio) information from public codebase 115 and private codebase 117).
Regarding Claim 17, Wu in view of Debnath discloses the system of claim 1.
Wu discloses further comprising a generative artificial intelligence (AI) codec, wherein the media content is encoded into compressed media content using the generative AI codec ([0149], application-specific integrated circuits (ASICs), GPUs, FPGAs, or other hardware to perform training and/or inferencing using above-described resources to allow users to train or perform inferencing of information, such as image recognition, speech recognition, or other artificial intelligence services).
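For illustration only, a toy Python sketch of a "learned codec" style encode/decode round trip, standing in for the claimed generative AI codec; neither the codec of the claims nor that of Wu is reproduced here, and all names are hypothetical.

```python
# Illustrative toy sketch only: a stand-in "learned codec" round trip,
# not the generative AI codec of the claims or of Wu.
def learned_encode(samples, scale=10.0):
    """Stand-in for a learned analysis transform plus quantizer."""
    return [round(s * scale) for s in samples]

def learned_decode(codes, scale=10.0):
    """Stand-in for a learned (generative) synthesis transform that
    reconstructs content from the compact representation."""
    return [c / scale for c in codes]

original = [0.12, 0.57, 0.93]
compressed = learned_encode(original)       # compact integer codes
reconstructed = learned_decode(compressed)  # approximate reconstruction
print(compressed, reconstructed)
```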
Regarding Claim 18, Wu in view of Debnath discloses the system of claim 3.
Wu discloses wherein the specialized processing unit is configured to perform deep learning-based compression ([0149], application-specific integrated circuits (ASICs), GPUs, FPGAs, or other hardware to perform training and/or inferencing using above-described resources to allow users to train or perform inferencing of information, such as image recognition, speech recognition, or other artificial intelligence services).
Regarding Claims 9-16 and 19-20: method claims 9-16 and 19-20 recite methods of using the corresponding systems claimed in claims 1-8 and 17-18, respectively, and are rejected for the same reasons as set forth above, which are incorporated herein.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Samuel D Fereja whose telephone number is (469)295-9243. The examiner can normally be reached 8AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DAVID CZEKAJ can be reached at (571) 272-7327. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SAMUEL D FEREJA/Primary Examiner, Art Unit 2487