DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of the following prior-filed applications under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged:
15/456,294, filed 3/10/2017, now Patent No. 10147193, issued 12/04/2018;
15/693,446, filed 8/31/2017, now Patent No. 10067509, issued 9/04/2018;
15/796,769, filed 10/28/2017, now Patent No. 10311312, issued 6/04/2019.
Claims 1-20 receive priority to application 15/693,446, filed 8/31/2017.
Response to Amendment
This action is in response to the amendments and remarks filed on 09/02/2021. Claims 1, 4, 6-7, 10-11, 13, 15-16, and 20 have been amended. Claims 1-20 are pending and are considered in this Office action.
Response to Arguments
Applicant presents the following arguments regarding the previous office action:
A. None of the cited references teaches the newly added claim limitation “the different dilation rates not having a common factor relationship.”
Applicant’s argument A with respect to the independent claims pertains to the newly added claim limitations, which have been considered and are addressed in detail below under Claim Rejections.
Drawings
Color photographs and color drawings are not accepted in utility applications unless a petition filed under 37 CFR 1.84(a)(2) is granted. Any such petition must be accompanied by the appropriate fee set forth in 37 CFR 1.17(h), one set of color drawings or color photographs, as appropriate, if submitted via EFS-Web or three sets of color drawings or color photographs, as appropriate, if not submitted via EFS-Web, and, unless already present, an amendment to include the following language as the first paragraph of the brief description of the drawings section of the specification:
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Color photographs will be accepted if the conditions for accepting color drawings and black and white photographs have been satisfied. See 37 CFR 1.84(b)(2).
Claim Objections
Claim 11 is objected to because of the following informalities:
Claim 11 line 2 “rates including using” should read “rates includes using”
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 1, claim 7, and claim 13, the term “common factor relationship” is unclear and thus renders the claims indefinite. Where applicant acts as his or her own lexicographer to specifically define a term of a claim contrary to its ordinary meaning, the written description must clearly redefine the claim term and set forth the uncommon definition so as to put one reasonably skilled in the art on notice that the applicant intended to so redefine that claim term. Process Control Corp. v. HydReclaim Corp., 190 F.3d 1350, 1357, 52 USPQ2d 1029, 1033 (Fed. Cir. 1999).
The accepted ordinary meaning of “common factor relationship” in the art is “related using a common mathematical function” (e.g., the numbers 2, 4, 8, 16 are related using a common exponential function of 2^n (n = 1, 2, …); the numbers 3, 6, 9, 12 are related using a common factor function of 3*n (n = 1, 2, …)). Par. [0063]-[0064] of the instant specification describe the image processing module using hybrid dilation convolution (HDC) to address the problem of “gridding”: instead of using the same dilation rate for all layers, a different dilation rate is used for each layer, and the process uses arbitrary dilation rates that should not have a common factor relationship (e.g., like 2, 4, 8, etc.) within a group of layers. The specification describes an example embodiment that forms three succeeding layers as a group, and changes their dilation rates from all being r=2 to being r=1, r=2, and r=3, respectively. However, according to the accepted ordinary meaning of “common factor relationship”, which the dilation rates of a group of layers should not have according to the instant specification, a group of three succeeding layers with dilation rates of r=1, r=2, and r=3, respectively, would constitute a group of layers with dilation rates having a common factor relationship (i.e., a common factor function of r = 1*n (n = 1, 2, 3)).
Thus, the meaning assigned to the term “common factor relationship” used in the claims is not the same as the ordinary accepted meaning for the term “common factor relationship” used in the art, and the instant specification does not appear to provide any description that clearly redefines the claim term “common factor relationship” or sets forth the uncommon definition. Therefore, the claims are rendered indefinite. For the purposes of examination, Examiner is interpreting the limitation “the different dilation rates not having a common factor relationship” in claims 1, 7, and 13 as “the different dilation rates having an arbitrary value”.
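For illustration only, the arithmetic underlying the ordinary “common factor function” reading applied above can be expressed as the following sketch (the helper name has_common_factor_relationship is hypothetical and forms no part of the record): a group of rates fits the reading when each rate equals c*n for consecutive integers n = 1, 2, 3, … and some common factor c.

```python
def has_common_factor_relationship(rates):
    # True if the rates fit a "common factor function" c*n for
    # consecutive integers n = 1, 2, 3, ... and some factor c
    # (here c = rates[0], since the sequence starts at n = 1).
    return bool(rates) and all(r == rates[0] * n
                               for n, r in enumerate(rates, start=1))

# The specification's example group r = 1, 2, 3 fits c = 1:
print(has_common_factor_relationship([1, 2, 3]))      # True
# The factor-function example 3, 6, 9, 12 fits c = 3:
print(has_common_factor_relationship([3, 6, 9, 12]))  # True
# 2, 4, 8 does not fit c*n (it is instead exponential, 2^n):
print(has_common_factor_relationship([2, 4, 8]))      # False
```

Under this reading, the specification’s own example group r = 1, 2, 3 has a common factor relationship (c = 1), which is the inconsistency noted above.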
Claims 2-6 and 19, claims 8-12 and 20, and claims 14-18 are rejected based on their dependence from rejected base claims 1, 7, and 13, respectively, for the same rationale as recited above.
Regarding claim 11, the element “a range of different dilation rates” is unclear and renders the claim indefinite. It is unclear if the “range of different dilation rates” is the same “range of different dilation rates” recited earlier in claim 7 or a different “range of different dilation rates”. Therefore, the claim is rendered indefinite.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (“DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs”. 12 May 2017. arXiv: 1606.00915v2, v2, all pages) in view of Yu et al. (“Multi-Scale Context Aggregation by Dilated Convolutions”. 30 Apr 2016. arXiv: 1511.07122v3, v3, all pages) and further in view of Brand et al. (“Instance-level segmentation of vehicles by deep contours”. Asian Conference on Computer Vision. Springer, Cham, 2016).
Regarding claim 1, Chen teaches “A system comprising a data processor (it is inherent that a processor is needed for neural network operations); and an occluding object contour detection processing module (Fig. 11 shows occluded objects detected and separated from foreground elements), executable by the data processor, the occluding object contour detection processing module being configured to at least: receive an input image (Pg. 4, section 3, paragraph 4 and Fig. 6, 11); produce a feature map from the input image by semantic segmentation (Abstract discloses DeepLab is the State of the Art at semantic segmentation task at the time and Fig. 6, 11 shows object instances separated, which a person having ordinary skill in the art would recognize is a feature map, produced by DeepLab); apply a range of different dilation rates to the feature map to produce a final feature map (Fig. 4 and Fig. 7 show multiple filters with different dilation rates to produce the final map) maintaining a resolution corresponding to training labels (section 3.2, To produce the final result, we bilinearly interpolate the feature maps from the parallel DCNN branches to the original image resolution and fuse them); match object shapes from the training labels to objects and object instances detected in the input image (introduction, pg. 2, left hand column, second to last paragraph and fig. 1); generate, based on the object shapes, contour information of the objects and object instances detected in the input image (fig. 10 shows generated object boundaries)”. However, Chen does not explicitly teach “wherein the different dilation rates are applied to each of a plurality of convolution layers, the different dilation rate not having a common factor relationship” and the processor configured to “apply the contour information onto the final feature map”.
From the same field of endeavor, Yu teaches “wherein the different dilation rates are applied to each of a plurality of convolution layers, the different dilation rate not having a common factor relationship (Section 2 Par. 4 lines 5-6 teaches the dilated convolution operator applying the same filter at different ranges using different dilation factors; Table 1 and Section 3 Par. 3 lines 1-2 teaches the basic context module has 7 layers that apply 3x3 convolutions with different dilation factors, where the dilations are 1, 1, 2, 4, 8, 16, and 1 (i.e. arbitrary with no common factor relationship))”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosed invention to modify the teachings of Chen to incorporate the teachings of Yu to have the different dilation rates taught by Chen be applied to each of a plurality of convolution layers and not have a common factor relationship as taught by Yu.
The motivation for doing so would be to increase accuracy of the feature maps by passing them through multiple layers that expose contextual information (Yu, Section 3 Par. 2 lines 4-5).
However, the combination of Chen and Yu does not explicitly teach the processor configured to “apply the contour information onto the final feature map”.
From the same field of endeavor, Brand teaches the processor configured to “generate, based on the object shapes, contour information of the objects and object instances detected in the input image (fig. 2, middle image shows detecting contours from the feature map, and sections 3.1 teaches detecting contours in the image); and apply the contour information onto the final feature map (fig. 2, right most image shows the detected contours are applied onto the feature map and filled, and section 3.3 teaches applying the contours to the upscaled feature map)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination of Chen and Yu to incorporate the teachings of Brand to have the semantic segmentation and dilated convolution process within a fully convolutional neural network, trained on traffic scenes, to create a dense feature map as taught by the combination of Chen and Yu to further include generating contour information and applying it onto an upscaled feature map within a fully convolutional network as taught by Brand.
The motivation for doing so would be because the detection and segmentation of objects into their single instances is an important task on the path towards full understanding of traffic scenes (Brand, introduction, par. 1).
Regarding claim 2, the combination of Chen, Yu, and Brand teaches all the limitations of claim 1 above, and further teaches “wherein the semantic segmentation (Chen, Pg. 4, section 3.1, paragraphs 1-2) is machine learnable (Chen, Section 4.1, 4.1.1 and pg. 2, left hand column, par. 3, A deep convolutional neural network trained in the task of image classification is re-purposed to the task of semantic segmentation)”.
Regarding claim 3, the combination of Chen, Yu, and Brand teaches all the limitations of claim 1 above, and further teaches “wherein the semantic segmentation is performed by a deep convolutional neural network (Chen, Abstract, DeepLab is a DCNN) trained on a dataset configured for a traffic environment (Chen, Section 4.4, based on Cityscape data)”.
Regarding claim 4, the combination of Chen, Yu, and Brand teaches all the limitations of claim 1 above, and further teaches “the system of claim 1 being configured to operate within a fully convolutional network (Chen, Pg. 2, paragraph 3, transforming all full connected layers to fully convolutional network and pg. 4, section 3, paragraph 1)”.
Regarding claim 5, the combination of Chen, Yu, and Brand teaches all the limitations of claim 1 above, and further teaches “wherein the contour information is produced without the use of bounding boxes (Chen. section 3.1-3.3, contour information is produced by Atrous convolution, pyramid pooling and CRFs and Fig. 5 shows no bounding boxes are used in producing contour information)”.
Regarding claim 6, the combination of Chen, Yu, and Brand teaches all the limitations of claim 1 above, and further teaches “wherein the contour information enables an autonomous control subsystem to control a vehicle without a driver (Brand, Introduction, In the context of autonomous driving, a full scene understanding of the environment is also crucial, because objects must be recognized for collision avoidance and path planning. This is saying being able to detect individual instances of objects (through the use of contour information) is needed for the implementation of collision avoidance and path planning. In autonomous vehicles, collision avoidance and path planning are implemented to control the vehicle)”.
Regarding claim 7, Chen teaches “A method comprising: receiving an input image (Pg. 4, section 3, paragraph 4 and Fig. 6, 11); producing a feature map from the input image by semantic segmentation (Abstract discloses DeepLab is the State of the Art at semantic segmentation task at the time and Fig. 6, 11 shows object instances separated, which a person having ordinary skill in the art would recognize is a feature map, produced by DeepLab); applying a range of different dilation rates to the feature map to produce a final feature map (Fig. 4 and Fig. 7 show multiple filters with different dilation rates to produce the final map) maintaining a resolution corresponding to training labels (section 3.2, To produce the final result, we bilinearly interpolate the feature maps from the parallel DCNN branches to the original image resolution and fuse them); matching object shapes from the training labels to objects and object instances detected in the input image (introduction, pg. 2, left hand column, second to last paragraph and fig. 1); generating, based on the object shapes, contour information of the objects and object instances detected in the input image (fig. 10 shows generated object boundaries)”. However, Chen does not explicitly teach “wherein the different dilation rates are applied to each of a plurality of convolution layers, the different dilation rate not having a common factor relationship” and “applying the contour information onto the final feature map”.
From the same field of endeavor, Yu teaches “wherein the different dilation rates are applied to each of a plurality of convolution layers, the different dilation rate not having a common factor relationship (Section 2 Par. 4 lines 5-6 teaches the dilated convolution operator applying the same filter at different ranges using different dilation factors; Table 1 and Section 3 Par. 3 lines 1-2 teaches the basic context module has 7 layers that apply 3x3 convolutions with different dilation factors, where the dilations are 1, 1, 2, 4, 8, 16, and 1 (i.e. arbitrary with no common factor relationship))”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosed invention to modify the teachings of Chen to incorporate the teachings of Yu to have the different dilation rates taught by Chen be applied to each of a plurality of convolution layers and not have a common factor relationship as taught by Yu.
The motivation for doing so would be to increase accuracy of the feature maps by passing them through multiple layers that expose contextual information (Yu, Section 3 Par. 2 lines 4-5).
However, the combination of Chen and Yu does not explicitly teach “applying the contour information onto the final feature map”.
From the same field of endeavor, Brand teaches “generating, based on the object shapes, contour information of the objects and object instances detected in the input image (fig. 2, middle image shows detecting contours from the feature map, and sections 3.1 teaches detecting contours in the image); and applying the contour information onto the final feature map (fig. 2, right most image shows the detected contours are applied onto the feature map and filled, and section 3.3 teaches applying the contours to the upscaled feature map)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination of Chen and Yu to incorporate the teachings of Brand to have the semantic segmentation and dilated convolution process within a fully convolutional neural network, trained on traffic scenes, to create a dense feature map as taught by the combination of Chen and Yu to further include generating contour information and applying it onto an upscaled feature map within a fully convolutional network as taught by Brand.
The motivation for doing so would be because the detection and segmentation of objects into their single instances is an important task on the path towards full understanding of traffic scenes (Brand, introduction, par. 1).
Regarding claim 8, the combination of Chen, Yu, and Brand teaches all the limitations of claim 7 above, and further teaches “wherein the range of different dilation rates are applied in an encoding phase (Chen, section 3.1, pg. 5, par. 1, We have adopted instead a hybrid approach that strikes a good efficiency/accuracy trade-off, using atrous convolution to increase by a factor of 4 the density of computed feature maps and Conclusion, atrous spatial pyramid pooling, which encodes objects as well as image context at multiple scales)”.
Regarding claim 9, the combination of Chen, Yu, and Brand teaches all the limitations of claim 7 above, and further teaches “applying convolutional operations directly on the feature map to generate a pixel-wise prediction map (Chen, section 3.1 and fig. 3, feature extraction with atrous convolution applied on a high resolution input feature map)”.
Regarding claim 10, the combination of Chen, Yu, and Brand teaches all the limitations of claim 7 above, and further teaches “using dense upsampling convolution with the semantic segmentation (Yu, Introduction Par. 4 lines 1-5 teaches a convolutional network module designed specifically for dense prediction that aggregates multi-scale contextual information without losing resolution or analyzing rescaled images comprising a rectangular prism of convolutional layers with no pooling or subsampling; Section 4 Par. 2 lines 4-7 teaches removing the pooling layers and the padding of the intermediate feature maps from the network (i.e. removes upsampling and padding and applies convolutional operations directly on feature maps for dense prediction))”. NOTE: “Dense upsampling convolution” is being interpreted by Examiner as corresponding to Par. [0055] lines 7-10 of the instant specification stating “Instead of performing bilinear upsampling, which is not learnable, or deconvolution in which zeros have to be padded in the unpooling step before the convolution operation, DUC (dense upsampling convolution) applies convolutional operations directly on the feature maps to get the pixel-wise prediction map”.
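The DUC interpretation applied above rests on a channel-to-space rearrangement: a coarse (h, w, d*d*C) feature map produced by convolutions is reshaped into a dense (h*d, w*d, C) pixel-wise prediction map, with no bilinear interpolation or zero-padded unpooling. A minimal illustrative sketch of that reshape step alone (the function name duc_reshape is hypothetical, and the learned convolutions that produce the d*d*C channels are omitted):

```python
def duc_reshape(fmap, d, channels):
    # Rearrange a coarse (h, w, d*d*channels) nested-list feature map
    # into a dense (h*d, w*d, channels) pixel-wise prediction map:
    # each coarse cell contributes a d x d block of output pixels.
    h, w = len(fmap), len(fmap[0])
    out = [[None] * (w * d) for _ in range(h * d)]
    for i in range(h):
        for j in range(w):
            cell = fmap[i][j]  # flat list of length d*d*channels
            for di in range(d):
                for dj in range(d):
                    base = (di * d + dj) * channels
                    out[i * d + di][j * d + dj] = cell[base:base + channels]
    return out

# A 1x1 coarse cell with d = 2 and C = 1 expands to a 2x2 dense map:
print(duc_reshape([[[10, 20, 30, 40]]], d=2, channels=1))
# [[[10], [20]], [[30], [40]]]
```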
Regarding claim 11, the combination of Chen, Yu, and Brand teaches all the limitations of claim 7 above, and further teaches “wherein applying a range of different dilation rates including using hybrid dilation convolution with the semantic segmentation (Yu, Section 4 Par. 2 lines 2-3 teaches removing the last two pooling and striding layers from the network entirely; Table 1 and Section 3 Par. 3 lines 1-2 teaches the basic context module has 7 layers that apply 3x3 convolutions with different dilation factors, where the dilations are 1, 1, 2, 4, 8, 16, and 1 (i.e. different dilation rates concatenated serially))”. NOTE: “Hybrid dilation convolution” is being interpreted by Examiner as corresponding to Par. [0059] lines 13-15 of the instant specification stating “a simple hybrid dilation convolution (HDC)…instead of using the same rate of dilation for the same spatial resolution, the example embodiment uses a range of dilation rates and concatenates them serially the same way as “blocks” in ResNet-101”.
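The “gridding” behavior that HDC addresses, and the effect of concatenating a group of different rates such as r = 1, 2, 3 rather than repeating one rate, can be seen in a simplified 1-D sketch (illustrative only; the helper name coverage is hypothetical):

```python
def coverage(dilations, ksize=3):
    # Input offsets (relative to one output pixel) reachable through a
    # serial stack of 1-D dilated convolutions with the given rates.
    seen = {0}
    for d in dilations:
        seen = {p + d * k for p in seen
                for k in range(-(ksize // 2), ksize // 2 + 1)}
    return sorted(seen)

# Repeating r = 2 in every layer samples only even offsets ("gridding"):
print(coverage([2, 2, 2]))  # [-6, -4, -2, 0, 2, 4, 6]
# An HDC-style group r = 1, 2, 3 covers every offset in its field:
print(coverage([1, 2, 3]))  # [-6, -5, -4, ..., 4, 5, 6]
```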
Regarding claim 12, the combination of Chen, Yu, and Brand teaches all the limitations of claim 7 above, and further teaches “wherein the contour information is used by an autonomous vehicle motion planner to control a vehicle without a driver (Brand, Introduction, In the context of autonomous driving, a full scene understanding of the environment is also crucial, because objects must be recognized for collision avoidance and path planning. This is saying being able to detect individual instances of objects (through the use of contour information) is needed for the implementation of collision avoidance and path planning. In autonomous vehicles, collision avoidance and path planning are implemented to control the vehicle)”.
Regarding claim 13, Chen teaches “A non-transitory machine-useable storage medium embodying instructions which, when executed by at least one processor, cause the at least one processor (a memory storing instructions operated by a machine is inherent for operating a neural network) to at least: receive an input image (Pg. 4, section 3, paragraph 4 and Fig. 6, 11); produce a feature map from the input image by semantic segmentation (Abstract discloses DeepLab is the State of the Art at semantic segmentation task at the time and Fig. 6, 11 shows object instances separated, which a person having ordinary skill in the art would recognize is a feature map, produced by DeepLab); apply a range of different dilation rates to the feature map to produce a final feature map (Fig. 4 and Fig. 7 show multiple filters with different dilation rates to produce the final map) maintaining a resolution corresponding to training labels (section 3.2, To produce the final result, we bilinearly interpolate the feature maps from the parallel DCNN branches to the original image resolution and fuse them); match object shapes from the training labels to objects and object instances detected in the input image (introduction, pg. 2, left hand column, second to last paragraph and fig. 1); generate, based on the object shapes, contour information of the objects and object instances detected in the input image (fig. 10 shows generated object boundaries)”. However, Chen does not explicitly teach “wherein the different dilation rates are applied to each of a plurality of convolution layers, the different dilation rate not having a common factor relationship” and the processor caused to “apply the contour information onto the final feature map”.
From the same field of endeavor, Yu teaches “wherein the different dilation rates are applied to each of a plurality of convolution layers, the different dilation rate not having a common factor relationship (Section 2 Par. 4 lines 5-6 teaches the dilated convolution operator applying the same filter at different ranges using different dilation factors; Table 1 and Section 3 Par. 3 lines 1-2 teaches the basic context module has 7 layers that apply 3x3 convolutions with different dilation factors, where the dilations are 1, 1, 2, 4, 8, 16, and 1 (i.e. arbitrary with no common factor relationship))”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosed invention to modify the teachings of Chen to incorporate the teachings of Yu to have the different dilation rates taught by Chen be applied to each of a plurality of convolution layers and not have a common factor relationship as taught by Yu.
The motivation for doing so would be to increase accuracy of the feature maps by passing them through multiple layers that expose contextual information (Yu, Section 3 Par. 2 lines 4-5).
However, the combination of Chen and Yu does not explicitly teach the processor caused to “apply the contour information onto the final feature map”.
From the same field of endeavor, Brand teaches the processor configured to “generate, based on the object shapes, contour information of the objects and object instances detected in the input image (fig. 2, middle image shows detecting contours from the feature map, and sections 3.1 teaches detecting contours in the image); and apply the contour information onto the final feature map (fig. 2, right most image shows the detected contours are applied onto the feature map and filled, and section 3.3 teaches applying the contours to the upscaled feature map)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination of Chen and Yu to incorporate the teachings of Brand to have the semantic segmentation and dilated convolution process within a fully convolutional neural network, trained on traffic scenes, to create a dense feature map as taught by the combination of Chen and Yu to further include generating contour information and applying it onto an upscaled feature map within a fully convolutional network as taught by Brand.
The motivation for doing so would be because the detection and segmentation of objects into their single instances is an important task on the path towards full understanding of traffic scenes (Brand, introduction, par. 1).
Regarding claim 14, the combination of Chen, Yu, and Brand teaches all the limitations of claim 13 above, and further teaches “The non-transitory machine-useable storage medium of claim 13 being configured to apply a range of different dilation rates as part of the convolution operation (Chen, fig. 4 and fig. 7 shows multiple dilation rates used in Atrous convolution)”.
Regarding claim 15, the combination of Chen, Yu, and Brand teaches all the limitations of claim 13 above, and further teaches “wherein the semantic segmentation is performed by a deep convolutional neural network trained (Chen, introduction, pg. 2, left hand column second to last paragraph, deep convolutional neural network trained in the task of image classification is re-purposed to the task of semantic segmentation) on a cityscape dataset (Chen, section 4.4 and fig. 13)”.
Regarding claim 16, the combination of Chen, Yu, and Brand teaches all the limitations of claim 13 above, and further teaches “The non-transitory machine-useable storage medium of claim 13 being configured to use conditional random fields (Chen, pg. 6, section 3.3, left hand column, using CRFs to produce accurate semantic segmentation results and recover object boundaries at a level of detail)”.
Regarding claim 18, the combination of Chen, Yu, and Brand teaches all the limitations of claim 13 above, and further teaches “wherein the contour information is used by an autonomous vehicle motion planner to plan a route for an autonomous vehicle (Brand, introduction, In the context of autonomous driving, a full scene understanding of the environment is also crucial, because objects must be recognized for collision avoidance and path planning. This is saying being able to detect individual instances of objects (through the use of contour information) is needed for the implementation of collision avoidance and path planning. In autonomous vehicles, collision avoidance and path planning are implemented to control the vehicle)”.
Regarding claim 19, the combination of Chen, Yu, and Brand teaches all the limitations of claim 1 above, and further teaches “The system of claim 1 being further configured to use dense upsampling convolution (DUC) to generate pixel-level predictions of objects detected in the input image (Yu, Introduction Par. 4 lines 1-5 teaches a convolutional network module designed specifically for dense prediction that aggregates multi-scale contextual information without losing resolution or analyzing rescaled images comprising a rectangular prism of convolutional layers with no pooling or subsampling; Section 4 Par. 2 lines 4-7 teaches removing the pooling layers and the padding of the intermediate feature maps from the network (i.e. removes upsampling and padding and applies convolutional operations directly on feature maps for dense prediction))”. NOTE: “Dense upsampling convolution” is being interpreted by Examiner as corresponding to Par. [0055] lines 7-10 of the instant specification stating “Instead of performing bilinear upsampling, which is not learnable, or deconvolution in which zeros have to be padded in the unpooling step before the convolution operation, DUC (dense upsampling convolution) applies convolutional operations directly on the feature maps to get the pixel-wise prediction map”.
Regarding claim 20, the combination of Chen, Yu, and Brand teaches all the limitations of claim 7 above, and further teaches “The method of claim 7 including using dense upsampling convolution to generate pixel-level predictions of objects detected in the input image (Yu, Introduction Par. 4 lines 1-5 teaches a convolutional network module designed specifically for dense prediction that aggregates multi-scale contextual information without losing resolution or analyzing rescaled images comprising a rectangular prism of convolutional layers with no pooling or subsampling; Section 4 Par. 2 lines 4-7 teaches removing the pooling layers and the padding of the intermediate feature maps from the network (i.e. removes upsampling and padding and applies convolutional operations directly on feature maps for dense prediction))”. NOTE: “Dense upsampling convolution” is being interpreted by Examiner as corresponding to Par. [0055] lines 7-10 of the instant specification stating “Instead of performing bilinear upsampling, which is not learnable, or deconvolution in which zeros have to be padded in the unpooling step before the convolution operation, DUC (dense upsampling convolution) applies convolutional operations directly on the feature maps to get the pixel-wise prediction map”.
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (“DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs”. 12 May 2017. arXiv: 1606.00915v2, all pages) in view of Yu et al. (“Multi-Scale Context Aggregation by Dilated Convolutions”. 30 Apr 2016. arXiv: 1511.07122v3, all pages), in view of Brand et al. (“Instance-level segmentation of vehicles by deep contours”. Asian Conference on Computer Vision. Springer, Cham, 2016), and further in view of Zitnick et al. (“Edge Boxes: Locating Object Proposals from Edges”. Computer Vision – ECCV 2014. LNCS, vol. 8693. Springer, Cham, 2014).
Regarding claim 17, the combination of Chen, Yu, and Brand teaches all the limitations of claim 13 above; however, the combination of Chen, Yu, and Brand does not explicitly teach “wherein the contour information is produced in addition to the use of bounding boxes”.
From the same field of endeavor, Zitnick teaches “wherein the contour information is produced in addition to the use of bounding boxes (Fig. 1 shows contour information being produced and additionally bounding boxes are also used)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination of Chen, Yu, and Brand to incorporate the teachings of Zitnick, such that the method for producing contour information as taught by the combination of Chen, Yu, and Brand further produces contour information in addition to using bounding boxes as taught by Zitnick.
The motivation for doing so would be that the number of contours wholly enclosed by a bounding box is indicative of the likelihood of the box containing an object (Zitnick, pg. 3, par. 2). One would want to determine the likelihood of a box containing an object because, instead of searching for an object at every image location and scale, the system can first generate a set of object bounding box proposals, which reduces the set of positions that need to be further analyzed (Zitnick, Introduction, par. 1).
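The enclosed-contour scoring cue relied upon above may be illustrated by the following toy sketch (illustrative only, not Zitnick's actual scoring function; all names are the Examiner's own): a candidate box is scored by the number of contours it wholly encloses, so that low-scoring positions can be discarded before detailed analysis.

```python
# Toy illustration of the Edge Boxes cue: a box wholly enclosing more
# contours is more likely to contain an object, so contour counts can
# rank candidate boxes before any further per-box analysis.

def wholly_encloses(box, contour):
    """box = (x0, y0, x1, y1); contour = list of (x, y) points."""
    x0, y0, x1, y1 = box
    return all(x0 <= x <= x1 and y0 <= y <= y1 for x, y in contour)

def box_score(box, contours):
    """Count the contours wholly enclosed by the candidate box."""
    return sum(wholly_encloses(box, c) for c in contours)

contours = [
    [(2, 2), (3, 2), (3, 3)],   # contour near the origin
    [(2, 4), (4, 4)],           # second contour in the same region
    [(8, 8), (9, 9)],           # contour outside the first candidate box
]
print(box_score((0, 0, 5, 5), contours))  # 2: two contours wholly enclosed
```

Zitnick's full method additionally down-weights contours that straddle the box boundary and searches boxes in a sliding-window fashion; the sketch captures only the counting cue cited above.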
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Sun et al. (US 2019/0228529 A1) teaches performing image segmentation within an FCN and performing dilated convolution with a different dilation rate for each layer.
Sun et al. (US 9858496 B2) teaches generating a convolutional feature map and then creating object proposals on the feature map.
Song et al. (CN 105787482 A) teaches using a DCNN to obtain target outline image segmentation for an automobile contour segmentation application.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATHERINE M FITZHARRIS whose telephone number is (469)295-9147. The examiner can normally be reached Monday-Thursday, 7:30 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHRISTIAN CHACE can be reached on (571)272-4190. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.M.F./Examiner, Art Unit 3665 /CHRISTIAN CHACE/Supervisory Patent Examiner, Art Unit 3665