DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1-23 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 8 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (U.S. Pub. No. 2022/0374635 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1).
As per claim 1, Xiong teaches a method of processing video data, comprising: receiving a plurality of frames of a video at an edge device (fig. 1), wherein the video depicts an action that spans the plurality of frames ([0041], [0045], [0052], [0063], [0087], [0092]; “detected motion”); compressing, using an encoder network of the edge device (fig. 1 el. 118; “encoder”), each of the plurality of frames to obtain compressed frame features, wherein the compressed frame features include fewer data bits than the plurality of frames of the video (fig. 1 el. 118; [0025]-[0027], “Encoder 118 may use various possible digital encoding and/or compression formats for encoding the video data generated by image sensor 112 into a time-dependent video stream composed of video frames at a determined frame rate (number of frames per second). In some embodiments, encoder 118 may use a compressed video format to reduce the storage size and network bandwidth necessary for storing and transferring the original video stream.”); classifying, using a classification network of the edge device (fig. 1 el. 124; “analysis 124”), the compressed frame features to obtain classification information corresponding to the video (fig. 1, fig. 2, fig. 4, fig. 6, [0008], [0029], [0056]-[0057], [0061]-[0062], [0099]; “In some embodiments, video analysis subsystem 124 may be configured to support real-time image classification and object detection within camera 110 without processing support from network video recorder 130 or network video server 160. For example, video analysis subsystem 124 may receive a video stream (from sensor 112 and/or encoder 118), classify the video frame to determine whether an object type of interest is present and, if so, initiate an object detector to determine the object's position within the video frame (and/or subsequent video frames)”), wherein the classification information comprises a prediction of a likelihood (fig. 2, [0057], “… If the video frame does include the object type, the video frame may be selectively passed to object detection 218. At block 216, image classification 212 may generate an object type confidence determination in addition to the binary indicator of the object presence or absence. For example, object type confidence determination 216 may include a confidence value between 0 and 1 that indicates the likelihood that the image classification algorithm has correctly identified the presence or absence of the object type in the video frame”); and transmitting the classification information from the edge device to a central server (fig. 1). Xiong does not explicitly disclose classifying to obtain action classification information, wherein the action classification information comprises a prediction of a likelihood that the video depicts the action, as recited in claim 1.
However, Kumar teaches classifying to obtain action classification information corresponding to the action in the video (abstract; col. 2 lines 36-40 and 55-58; col. 4 lines 6-17; col. 5 lines 40-44; and figs. 1-2), wherein the action classification information comprises a prediction of a likelihood that the video depicts the action (col. 5 lines 57-65; col. 10 lines 38-67; and figs. 1-2); and transmitting the action classification information to a central server (fig. 11; col. 22 lines 23-30; col. 23 lines 25-36).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Kumar with Xiong for the benefit of providing improved classification.
As per claim 2, Xiong (modified by Kumar) as a whole teaches everything as claimed above, see claim 1. In addition, Xiong teaches recording the video at the edge device (fig. 1 and [0026]) and selecting the plurality of frames of the video for classification (fig. 4 and [0102]).
As per claim 8, Xiong (modified by Kumar) as a whole teaches everything as claimed above, see claim 1. In addition, Xiong teaches the compressed frame features comprise a binary code (fig. 1 el. 118; [0025]-[0027]).
As per claim 21, this claim is the corresponding non-transitory computer-readable medium storing instructions with the limitations of the method for processing video data as recited in claim 1. Thus, the rejection and analysis made for claim 1 also apply here.
Claims 4 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (U.S. Pub. No. 2022/0374635 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1) and further in view of Tran et al. (U.S. Pub. No. 2023/0344962 A1).
As per claim 4, Xiong (modified by Kumar) as a whole teaches everything as claimed above, see claim 1. Although Xiong teaches decoding the compressed frame features ([0074]) and wherein classification is based on the decoding ([0075-0077]), Xiong does not explicitly disclose decoding the compressed frame features using a three-dimensional convolution network and a fully connected layer, wherein the action classification information is based on the decoding.
However, Kumar teaches wherein the action classification information is based on the decoding (col. 7 lines 6-11).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Kumar with Xiong for the benefit of providing improved classification.
Xiong (modified by Kumar) does not explicitly disclose decoding the compressed frame features using a three-dimensional convolution network and a fully connected layer.
However, Tran teaches decoding the compressed frame features using a three-dimensional convolutional network and a fully connected layer ([0024], [0038-0039] and figs. 3A-3B).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Tran with Xiong (modified by Kumar) for providing improved performance.
As per claim 23, this claim is the corresponding non-transitory computer-readable medium storing instructions with the limitations of the method as recited in claim 4. Thus, the rejection and analysis made for claim 4 also apply here.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (U.S. Pub. No. 2022/0374635 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1) and further in view of Zhu et al. (U.S. Pub. No. 2021/0289227 A1).
As per claim 6, Xiong (modified by Kumar) as a whole teaches everything as claimed above, see claim 1. Although Xiong teaches decoding the compressed frame features (fig. 3 and [0074]), wherein the classification is based on the decoding (fig. 3 and [0075-0077]), Xiong does not explicitly disclose decoding the compressed frame features using a recurrent neural network.
However, Kumar teaches wherein the action classification is based on the decoding (col. 7 lines 6-11).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Kumar with Xiong for the benefit of providing improved classification.
Xiong (modified by Kumar) does not explicitly disclose decoding the compressed frame features using a recurrent neural network.
However, Zhu teaches decoding the compressed frame features using a recurrent neural network (fig. 2, fig. 4 and [0029], [0052]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Zhu with Xiong (modified by Kumar) for the benefit of providing improved image processing.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (U.S. Pub. No. 2022/0374635 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1) and further in view of Lu et al. (U.S. Pub. No. 2018/0144248 A1).
As per claim 7, Xiong (modified by Kumar) as a whole teaches everything as claimed above, see claim 6. Xiong does not explicitly disclose performing a convolution operation on at least one frame of the video, wherein a layer of the recurrent neural network takes a hidden state from a previous layer and an output of the convolution operation as input.
However, Kumar teaches performing a convolution operation on at least one frame of the video (col. 7 lines 1-6).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Kumar with Xiong for the benefit of providing improved classification.
Xiong (modified by Kumar) does not explicitly disclose wherein a layer of the recurrent neural network takes a hidden state from a previous layer and an output of the convolution operation as input.
However, Lu teaches wherein a layer of the recurrent neural network takes a hidden state from a previous layer and an output of the convolution operation as input ([0018]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Lu with Xiong (modified by Kumar) for the benefit of yielding predictable results of improving accuracy of predictions.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (U.S. Pub. No. 2022/0374635 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1) and further in view of Appalaraju et al. (U.S. Patent No. 10,909,728 B1).
As per claim 9, Xiong (modified by Kumar) as a whole teaches everything as claimed above, see claim 1. Xiong does not explicitly disclose a compression ratio of the compressed frame features is at least 2.
However, Appalaraju teaches a compression ratio of the compressed frame features is at least 2 (col. 3 lines 40-63).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Appalaraju with Xiong (modified by Kumar) for the benefit of providing improved image quality and increased coding efficiency.
Claims 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Golinski et al. (U.S. Pub. No. 2021/0281867 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1).
As per claim 13, Golinski teaches an apparatus comprising: an encoder network configured to compress each of a plurality of frames of a video to obtain compressed frame features (abstract, [0002], [0005], [0073], [0100], [0121]), wherein the encoder network is trained to compress the plurality of frames by comparing the plurality of frames to reconstructed frames that are based on the compressed frame features (fig. 4 and [0121]; “The encoder portion 432 can be trained to compress input data 440 (e.g., video frames) using motion compensation based on previous representations (e.g., one or more previously reconstructed frames).” The Examiner notes that motion compensation compares current frames to reconstructed frames); and a classification network (fig. 2D, classification section, and [0082]). Golinski does not explicitly disclose a classification network configured to classify the compressed frame features to obtain action classification information for an action in the video that spans the plurality of frames of the video, wherein the action classification information comprises a prediction of a likelihood that the video depicts the action.
However, Kumar teaches classifying the compressed frame features to obtain action classification information for an action in the video that spans the plurality of frames of the video (abstract; col. 2 lines 36-40 and 55-58; col. 4 lines 6-17; col. 5 lines 40-44; and figs. 1-2), wherein the action classification information comprises a prediction of a likelihood that the video depicts the action (col. 5 lines 57-65; col. 10 lines 38-67; and figs. 1-2).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Kumar with Golinski for the benefit of providing improved classification.
As per claim 14, Golinski (modified by Kumar) as a whole teaches everything as claimed above, see claim 13. In addition, Golinski teaches a camera configured to capture the video ([0034], [0081], [0116] and figs. 2D, 4).
As per claim 15, Golinski (modified by Kumar) as a whole teaches everything as claimed above, see claim 13. Golinski does not explicitly disclose a reporting component configured to report the classification information to a central server.
However, Kumar teaches a reporting component configured to report the classification information to a central server (fig. 11; col.19 lines 9-23).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Kumar with Golinski for the benefit of providing improved classification.
As per claim 16, Golinski (modified by Kumar) as a whole teaches everything as claimed above, see claim 13. In addition, Golinski teaches a decoder network configured to generate a reconstructed video based on the compressed frame features (abstract, [0005], [0073], [0110], and fig. 4).
Claims 17-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Golinski et al. (U.S. Pub. No. 2021/0281867 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1) and further in view of Tran et al. (U.S. Pub. No. 2023/0344962 A1).
As per claim 17, Golinski (modified by Kumar) as a whole teaches everything as claimed above, see claim 13. Although Golinski discloses that the connections between layers of a neural network may be fully connected or locally connected ([0078] and fig. 2A), Golinski does not explicitly disclose the classification network comprises a three-dimensional convolution layer and a fully connected layer.
However, Tran teaches the classification network comprises a three-dimensional convolution layer and a fully connected layer ([0024], [0038-0039], and figs. 3A-3B).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Tran with Golinski (modified by Kumar) for providing improved performance.
As per claim 18, Golinski (modified by Kumar and Tran) as a whole teaches everything as claimed above, see claim 17. Golinski does not explicitly disclose the classification network comprises a two-dimensional convolution layer.
However, Tran teaches the classification network comprises a two-dimensional convolution layer ([0039], [0041], [0054] and at least fig. 12).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Tran with Golinski (modified by Kumar) for providing improved performance.
As per claim 20, Golinski (modified by Kumar) as a whole teaches everything as claimed above, see claim 13. Golinski does not explicitly disclose the classification network comprises an attention layer.
However, Tran teaches the classification network comprises an attention layer ([0025], [0057], [0061]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Tran with Golinski (modified by Kumar) for providing improved performance.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Golinski et al. (U.S. Pub. No. 2021/0281867 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1) and further in view of Zhu et al. (U.S. Pub. No. 2021/0289227 A1).
As per claim 19, Golinski (modified by Kumar) as a whole teaches everything as claimed above, see claim 13. In addition, Golinski teaches the classification network comprises a convolution component ([0068-0069], [0075-0077], [0080-0082] and fig. 2D). Golinski does not explicitly disclose the classification network comprises a recurrent neural network.
However, Zhu teaches the classification network comprises a recurrent neural network ([0027], [0029-0030]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Zhu with Golinski (modified by Kumar) for the benefit of providing improved image processing and classification.
Allowable Subject Matter
Claims 3 and 5 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA PRINCE whose telephone number is (571) 270-1821. The examiner can normally be reached M-F, 7:30 AM to 3:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jamie Atala can be reached at 571-272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
JESSICA PRINCE
Examiner
Art Unit 2486
/JESSICA M PRINCE/ Primary Examiner, Art Unit 2486