DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Note
The instant application has a lengthy prosecution history and the examiner encourages the applicant to have an interview (telephonic or personal) with the examiner prior to filing a response to the instant office action. Also, prior to the interview the examiner encourages the applicant to present multiple possible claim amendments, so as to enable the examiner to identify claim amendments that will advance prosecution in a meaningful manner.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/17/2025 has been entered.
Response to Arguments/Amendments
Presented arguments have been fully considered, but are rendered moot in view of the new ground(s) of rejection necessitated by amendment(s) initiated by the applicant(s).
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-7 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Applicant has not pointed out where the new (or amended) claim is supported, nor does there appear to be a written description of the claim limitation ‘a plurality of feature extraction filtering configurations’ in the application as filed (claim 1).
When an amendment is filed in reply to an objection or rejection based on 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph, a study of the entire application is often necessary to determine whether or not "new matter" is involved. Applicant should therefore specifically point out the support for any amendments made to the disclosure. MPEP 2163.06 I.
Claims 1-7 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 1 recites the limitation “the filtering configurations” in line 7. There is insufficient antecedent basis for this limitation in the claim.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2 are rejected under 35 U.S.C. 103 as being unpatentable over Fan Chen et al. [US 20190026864 A1: already of record] in view of Matteo Tiezzi et al. [Foveated Neural Computation].
Regarding claim 1, Fan teaches:
1. A method (i.e. An embodiment of a semiconductor package apparatus may include technology to identify a region of interest portion of a first image, and render the region of interest portion with super-resolution. Other embodiments are disclosed and claimed- Abstract), comprising:
defining a plurality of regions of pixels (i.e. a camera may capture an image of a pupil and the system may determine where the user is looking (e.g., a focus area, depth, and/or direction). The camera may capture pupil dilation information and the system may infer where the user's focus area is based on that information- ¶0030 …One problem in foveated rendering is producing a smooth transition between the central vision with high pixel details (e.g., near the ROI) and the peripheral vision with less pixel details- ¶0050) in an image sensing pixel array (i.e. Turning now to FIGS. 11A to 11F, embodiments of regions of interest for foveated encoding may be represented by any of a variety of different shapes and sizes. An image area 110 may generally have a rectangular shape or a square shape. A focus area 112 may have any suitable shape such as circular (e.g., FIG. 11A), elliptical (e.g., FIGS. 11B and 11D), square or rectangular (e.g., FIG. 11E), a point (e.g., FIG. 11C), or arbitrary (e.g., FIG. 11F). A ROI 114 may have any suitable shape such as a square (e.g., FIGS. 11A, 11C, and 11E) or a rectangle (e.g., FIGS. 11B, 11D, and 11F). The size of the ROI 114 may be fixed or may be adjusted based on the size of the focus area 112. For example, the size of the ROI 114 may correspond to the size of the focus area 112 plus some delta X and delta Y (e.g., which may be different from each other). The ROI 114 may generally be bigger than the focus area 112 to provide a smooth transition between the relatively higher quality central vision region and the relatively lower quality peripheral vision region- ¶0058);
associating a plurality of filtering configurations with the plurality of regions respectively (i.e. The size of the ROI 114 may be fixed or may be adjusted based on the size of the focus area 112. For example, the size of the ROI 114 may correspond to the size of the focus area 112 plus some delta X and delta Y (e.g., which may be different from each other). The ROI 114 may generally be bigger than the focus area 112 to provide a smooth transition between the relatively higher quality central vision region and the relatively lower quality peripheral vision region- ¶0058);
generating, using the image sensing pixel array, image data representative of an image of a scene (i.e. a camera may capture an image of a pupil and the system may determine where the user is looking (e.g., a focus area, depth, and/or direction). The camera may capture pupil dilation information and the system may infer where the user's focus area is based on that information- ¶0030);
selecting first input data generated for the image by a first block of pixels in the image sensing pixel array located in a first region among the plurality of regions (i.e. For each input image to be rendered at block 101, an image for the central vision may be extracted at block 102 and provided to the trained super-resolution network at block 103- ¶0055);
performing, using a multiplier-accumulator unit, a dot product between the first weight matrix and the first image data to obtain first feature data representative of the first image data being filtered via a first kernel of a convolutional neural network (i.e. Some embodiments may provide a super-resolution technique/framework that may benefit from the foveated characteristic of the human vision. For example, some embodiments may train a super-resolution neural network with synthesized foveated rendering images, which may have high resolution details from super-resolution as well as a smooth transition from high-quality pixels to medium quality peripheral pixels. When the resulting foveated image is rendered on a VR display, for example, some embodiments may achieve less-visible boundaries between the central vision region and the peripheral vision regions to reduce or avoid artifact that may distract the viewer. An example of a suitable super-resolution neural network may include a convolutional neural network (CNN) such as super-resolution CNN (SRCNN) or a fast CNN such as a fast super-resolution CNN (FSRCNN)- ¶0053).
However, Fan does not teach explicitly:
a plurality of feature extraction filtering configurations… according to the filtering configurations; identifying a first weight matrix associated with the first region in the plurality of filtering configurations.
In the same field of endeavor, Matteo teaches:
a plurality of feature extraction filtering configurations (i.e. Fig. 1: Top-left: out-of-the-box FCLs. Bottom-left: example of R = 4 regions in the piecewise-defined kernel case, when the attention a is given. Right: three strategies (one-per-column) to implement a piecewise-defined kernel, with examples of spatial coverage of the 4 region-wise kernels (coordinates not covered due to dilation are blank). We report right after the strategy name further operations needed to fulfil the uniform spatial coverage assumption- page 5… This implies that all the region-defined filters share related semantics across the image plane, due to the shared nature of the learnable kj . A natural alternative to this model consists in using independent learnable filters in each region- page 7, ¶2) according to the filtering configurations (i.e. In this paper we propose to go beyond such a scheme, introducing the notion of Foveated Convolutional Layer (FCL), that formalizes the idea of location-dependent convolutions with foveated processing, i.e., fine-grained processing in a given-focused area and coarser processing in the peripheral regions); identifying a first weight matrix associated with the first region in the plurality of filtering configurations (i.e. For each kernel kj :R2 → R, j = 1, . . . , F, the convolution between I and kj is defined as equation 1- page 4, ¶1).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan with the teachings of Matteo to efficiently handle the information in the peripheral regions, eventually avoiding the development of misleading biases (Matteo- Abstract).
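For orientation only, the location-dependent convolution at issue in the combination above can be sketched as a minimal Python model. This is not the implementation of Fan, Matteo, or the claims; the name foveated_filter, the single shared kernel size, and the region mask layout are illustrative assumptions.

```python
import numpy as np

def foveated_filter(image, region_mask, weights):
    """Minimal model of region-dependent filtering: region_mask assigns
    each pixel to a region, and each region index selects its own weight
    matrix; each output value is the dot product (the multiply-accumulate
    a MAC unit would perform) between that matrix and the block of
    pixels around the output location."""
    k = weights[0].shape[0]            # assume square kernels of one size here
    pad = k // 2
    padded = np.pad(image, pad)
    out = np.zeros_like(image, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            w = weights[region_mask[y, x]]        # weight matrix for this region
            block = padded[y:y + k, x:x + k]      # first/second block of pixels
            out[y, x] = float(np.sum(w * block))  # dot product / MAC result
    return out
```

With a delta kernel (1 at the center, 0 elsewhere) assigned to every region, the model reproduces the input image, which is a convenient sanity check of the dot-product step.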
Regarding claim 2, Fan and Matteo teach all the limitations of claim 1.
However, Fan does not teach explicitly:
wherein the plurality of filtering configurations identify a plurality of kernels of different kernel sizes for the plurality of regions respectively.
In the same field of endeavor, Matteo teaches:
wherein the plurality of filtering configurations identify a plurality of kernels of different kernel sizes for the plurality of regions respectively (i.e. The first two ones are based on the fact that the cost of convolution is directly proportional to the number of spatial components of the kernel, thus the computational burden can be controlled by reducing the size of the kernel defined in each region in function of ri. However, a smaller kernel size implies covering smaller receptive inputs, thus violating the previously introduced uniform spatial coverage assumption- page 6, ¶3).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan with the teachings of Matteo to efficiently handle the information in the peripheral regions, eventually avoiding the development of misleading biases (Matteo- Abstract).
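As a minimal illustration of filtering configurations that identify kernels of different sizes for different regions (the region names and averaging kernels below are hypothetical, not drawn from Matteo):

```python
import numpy as np

# Hypothetical filtering-configuration table: each region identifies a
# kernel of a different size, fine-grained for the focal region and
# coarser (cheaper) for the periphery, since the cost of convolution is
# proportional to the number of spatial components of the kernel.
filtering_configurations = {
    "central":    np.full((5, 5), 1.0 / 25),  # 25 multiply-accumulates per pixel
    "peripheral": np.full((3, 3), 1.0 / 9),   # 9 multiply-accumulates per pixel
}

def kernel_for(region):
    """Return the kernel that the region's filtering configuration identifies."""
    return filtering_configurations[region]
```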
Claims 3-5 are rejected under 35 U.S.C. 103 as being unpatentable over Fan Chen et al. [US 20190026864 A1: already of record] in view of Matteo Tiezzi et al. [Foveated Neural Computation] and further in view of Paolo Di Febbo et al. [US 20180268256 A1: already of record].
Regarding claim 3, Fan and Matteo teach all the limitations of claim 2.
However, Fan and Matteo do not teach explicitly:
further comprising: selecting, according to the filtering configurations, second image data generated for the image by a second block of pixels in the image sensing pixel array located in a second region, different from the first region, among the plurality of regions; identifying a second weight matrix associated with the second region in the plurality of filtering configurations; and performing a dot product between the second weight matrix and the second image data to obtain second feature data representative of the second image data being filtered by a second kernel.
In the same field of endeavor, Paolo teaches:
further comprising: selecting, according to the filtering configurations, second image data generated for the image by a second block of pixels in the image sensing pixel array located in a second region, different from the first region, among the plurality of regions; identifying a second weight matrix associated with the second region in the plurality of filtering configurations; and performing a dot product between the second weight matrix and the second image data to obtain second feature data representative of the second image data being filtered by a second kernel (i.e. Because aspects of embodiments of the present invention are implemented using a convolutional neural network, and because the CNN processes each patch (having dimensions equal in size to the convolutional kernel, w×w) of its input image independently, the training of the CNN can be performed using patches that are selected from the response map and corresponding patches of the training images- ¶0129).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan and Matteo with the teachings of Paolo to allow the keypoints to be uniquely identified in each image (¶0077- Paolo).
Regarding claim 4, Fan, Matteo and Paolo teach all the limitations of claim 3 and Fan further teaches:
wherein the first region is configured to capture a central region of the image; the second region is configured to capture a peripheral region of the image; and the second block of pixels has a size larger than the first block of pixels (i.e. see FIGS. 11A to 11F).
Regarding claim 5, Fan, Matteo and Paolo teach all the limitations of claim 4.
However, Fan and Matteo do not teach explicitly:
wherein the central region of the image filtered using the first kernel but not the second kernel; and the peripheral region is filtered using the second kernel but not the first kernel.
In the same field of endeavor, Paolo teaches:
wherein the central region of the image filtered using the first kernel but not the second kernel; and the peripheral region is filtered using the second kernel but not the first kernel (i.e. Because aspects of embodiments of the present invention are implemented using a convolutional neural network, and because the CNN processes each patch (having dimensions equal in size to the convolutional kernel, w×w) of its input image independently, the training of the CNN can be performed using patches that are selected from the response map and corresponding patches of the training images- ¶0129).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan and Matteo with the teachings of Paolo to allow the keypoints to be uniquely identified in each image (¶0077- Paolo).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Fan Chen et al. [US 20190026864 A1: already of record] in view of Matteo Tiezzi et al. [Foveated Neural Computation] further in view of Paolo Di Febbo et al. [US 20180268256 A1: already of record] and even further in view of Liao Jiping et al. [US 20220207764 A1: already of record].
Regarding claim 6, Fan, Matteo and Paolo teach all the limitations of claim 4.
However, Fan and Matteo do not teach explicitly:
wherein the plurality of filtering configurations further identify a plurality of stride lengths for filtering within the plurality of regions respectively; and the method further comprises: filtering the first region according to a first stride length; filtering the second region according to a second stride length larger than the first stride length.
In the same field of endeavor, Liao teaches:
wherein the plurality of filtering configurations further identify a plurality of stride lengths for filtering within the plurality of regions respectively; and the method further comprises: filtering the first region according to a first stride length; filtering the second region according to a second stride length larger than the first stride length (i.e. The convolution operator is also referred to as a kernel. In image processing, the convolution operator functions as a filter that extracts specific information from a matrix of an input image. The convolution operator may essentially be a weight matrix, and the weight matrix is usually predefined. In a process of performing a convolution operation on an image, the weight matrix usually processes pixels at a granularity level of one pixel (or two pixels, depending on a value of a stride) in a horizontal direction in the input image, to extract a specific feature from the image. A size of the weight matrix is related to a size of the image- ¶0094).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan and Matteo with the teachings of Liao to improve definition of the extended depth of field image (¶0017- Liao).
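The per-region stride lengths at issue can be illustrated with a short valid-mode convolution sketch; the function name and shapes are assumptions for illustration, not Liao's implementation:

```python
import numpy as np

def filter_region(image, kernel, stride):
    """Filter with a given stride length; a larger stride yields fewer
    output samples, i.e. cheaper filtering for the peripheral region."""
    k = kernel.shape[0]
    h = (image.shape[0] - k) // stride + 1
    w = (image.shape[1] - k) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            y, x = i * stride, j * stride
            out[i, j] = float(np.sum(kernel * image[y:y + k, x:x + k]))
    return out
```

Filtering a first region with stride 1 and a second region with a larger stride 2 produces a denser and a sparser feature map respectively, matching the computational asymmetry the claim describes.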
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Fan Chen et al. [US 20190026864 A1: already of record] in view of Matteo Tiezzi et al. [Foveated Neural Computation] further in view of Paolo Di Febbo et al. [US 20180268256 A1: already of record] and even further in view of Liao Jiping et al. [US 20220207764 A1: already of record] and Sanghoon Lee et al. [Foveated Video Compression with Optimal Rate Control].
Regarding claim 7, Fan, Matteo, Paolo and Liao teach all the limitations of claim 6.
However, Fan, Matteo, Paolo and Liao do not teach explicitly:
wherein the plurality of filtering configurations further identify a plurality of quantization levels for filtering within the plurality of regions respectively; and the method further comprises: quantizing the first image data at a first precision level as an input to the multiplier-accumulator unit; and quantizing the second image data at a second precision level, lower than the first precision level.
In the same field of endeavor, Sanghoon teaches:
wherein the plurality of filtering configurations further identify a plurality of quantization levels for filtering within the plurality of regions respectively; and the method further comprises: quantizing the first image data at a first precision level as an input to the multiplier-accumulator unit; and quantizing the second image data at a second precision level, lower than the first precision level (i.e. The other is nonuniform quantization which maximizes the FSNR subject to a rate constraint over the curvilinear coordinates- page 982, ¶7… The area in Fig. 2(d) becomes Ac which is unchanged near the center of the foveation point and decreases from the foveation point toward the periphery relative to the area A0 in fig. 2b- page 981, ¶5… The QPs for coding macroblocks consist of a quantization state vector Q̄ = {q1, q2,…, qM}- page 983, ¶1).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan, Matteo, Paolo and Liao with the teachings of Sanghoon to deliver high-quality video at reduced bit rates by seeking to match the nonuniform sampling of the human retina (Sanghoon - Abstract).
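The two precision levels discussed above can be illustrated with a minimal uniform quantizer; this is an assumption for illustration only, as Sanghoon's rate-controlled, foveation-weighted quantization is considerably more elaborate:

```python
import numpy as np

def quantize(data, bits):
    """Uniform quantizer over [0, 1): 2**bits levels, so more bits means
    a finer (higher-precision) representation of the image data."""
    levels = 2 ** bits
    return np.floor(np.asarray(data, dtype=float) * levels) / levels
```

Quantizing focal-region data at, say, 8 bits and peripheral-region data at 3 bits leaves a strictly smaller reconstruction error in the focal region, which is the asymmetry the claim recites.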
Claims 8-9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Fan Chen et al. [US 20190026864 A1: already of record] in view of Matteo Tiezzi et al. [Foveated Neural Computation] and further in view of Sanghoon Lee et al. [Foveated Video Compression with Optimal Rate Control].
Regarding claim 8, Fan teaches:
8. (Currently Amended) A device (i.e. Embodiments of each of the above application processor 11, persistent storage media 12, graphics subsystem 13, sense engine 14, focus engine 15, motion engine 16, super-resolution foveated renderer 17, and other system components may be implemented in hardware, software, or any suitable combination thereof. For example, hardware implementations may include configurable logic such as, for example, programmable logic arrays (PLAs), FPGAs, complex programmable logic devices (CPLDs), or in fixed-functionality logic hardware using circuit technology such as, for example, ASIC, complementary metal oxide semiconductor (CMOS) or transistor-transistor logic (TTL) technology, or any combination thereof- ¶0021, 0041, 0044 and 0047), comprising:
a first integrated circuit die having an image sensing pixel array configured to generate image data representative of an image of a scene (i.e. For example, a sense engine may include a sensor hub communicatively coupled to two dimensional (2D) cameras, three dimensional (3D) cameras, depth cameras- ¶0024), the image having a first region and a second region (i.e. a camera may capture an image of a pupil and the system may determine where the user is looking (e.g., a focus area, depth, and/or direction). The camera may capture pupil dilation information and the system may infer where the user's focus area is based on that information- ¶0030 …One problem in foveated rendering is producing a smooth transition between the central vision with high pixel details (e.g., near the ROI) and the peripheral vision with less pixel details- ¶0050… Turning now to FIGS. 11A to 11F, embodiments of regions of interest for foveated encoding may be represented by any of a variety of different shapes and sizes. An image area 110 may generally have a rectangular shape or a square shape. A focus area 112 may have any suitable shape such as circular (e.g., FIG. 11A), elliptical (e.g., FIGS. 11B and 11D), square or rectangular (e.g., FIG. 11E), a point (e.g., FIG. 11C), or arbitrary (e.g., FIG. 11F). A ROI 114 may have any suitable shape such as a square (e.g., FIGS. 11A, 11C, and 11E) or a rectangle (e.g., FIGS. 11B, 11D, and 11F). The size of the ROI 114 may be fixed or may be adjusted based on the size of the focus area 112. For example, the size of the ROI 114 may correspond to the size of the focus area 112 plus some delta X and delta Y (e.g., which may be different from each other). The ROI 114 may generally be bigger than the focus area 112 to provide a smooth transition between the relatively higher quality central vision region and the relatively lower quality peripheral vision region- ¶0058);
a second integrated circuit die having a memory cell array configured to store (i.e. these components may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as random access memory (RAM)- ¶0021) a first weight matrix representative of a first kernel of a convolutional neural network and a second weight matrix representative of a second kernel (i.e. Some embodiments may provide a super-resolution technique/framework that may benefit from the foveated characteristic of the human vision. For example, some embodiments may train a super-resolution neural network with synthesized foveated rendering images, which may have high resolution details from super-resolution as well as a smooth transition from high-quality pixels to medium quality peripheral pixels. When the resulting foveated image is rendered on a VR display, for example, some embodiments may achieve less-visible boundaries between the central vision region and the peripheral vision regions to reduce or avoid artifact that may distract the viewer. An example of a suitable super-resolution neural network may include a convolutional neural network (CNN) such as super-resolution CNN (SRCNN) or a fast CNN such as a fast super-resolution CNN (FSRCNN)- ¶0053); and
However, Fan does not teach explicitly:
a third integrated circuit die having a logic circuit configured to:
apply the first kernel to the first region using the first weight matrix to generate first feature data; and apply the second kernel to the second region using the second weight matrix to generate second feature data.
In the same field of endeavor, Matteo teaches:
a third integrated circuit die having a logic circuit configured to:
apply the first kernel to the first region using the first weight matrix to generate first feature data; and apply the second kernel to the second region using the second weight matrix to generate second feature data (i.e. In this paper we propose to go beyond such a scheme, introducing the notion of Foveated Convolutional Layer (FCL), that formalizes the idea of location-dependent convolutions with foveated processing, i.e., fine-grained processing in a given-focused area and coarser processing in the peripheral regions… For each kernel kj :R2 → R, j = 1, . . . , F, the convolution between I and kj is defined as equation 1- page 4, ¶1… Fig. 1: Top-left: out-of-the-box FCLs. Bottom-left: example of R = 4 regions in the piecewise-defined kernel case, when the attention a is given. Right: three strategies (one-per-column) to implement a piecewise-defined kernel, with examples of spatial coverage of the 4 region-wise kernels (coordinates not covered due to dilation are blank). We report right after the strategy name further operations needed to fulfil the uniform spatial coverage assumption- page 5… This implies that all the region-defined filters share related semantics across the image plane, due to the shared nature of the learnable kj . A natural alternative to this model consists in using independent learnable filters in each region- page 7, ¶2).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan with the teachings of Matteo to efficiently handle the information in the peripheral regions, eventually avoiding the development of misleading biases (Matteo- Abstract).
However, Fan and Matteo do not teach explicitly:
wherein the logic circuit is configured to apply quantization of image data from the first region according at a first precision level and apply quantization of image data from the second region according to a second precision level different from the first precision level.
In the same field of endeavor, Sanghoon teaches:
wherein the logic circuit is configured to apply quantization of image data (i.e. The other is nonuniform quantization which maximizes the FSNR subject to a rate constraint over the curvilinear coordinates- page 982, ¶7) from the first region according at a first precision level and apply quantization of image data from the second region according to a second precision level different from the first precision level (i.e. The QPs for coding macroblocks consist of a quantization state vector Q̄ = {q1, q2,…, qM}- page 983, ¶1).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan and Matteo with the teachings of Sanghoon to deliver high-quality video at reduced bit rates by seeking to match the nonuniform sampling of the human retina (Sanghoon - Abstract).
Regarding claim 9, Fan, Matteo and Sanghoon teach all the limitations of claim 8 and Fan further teaches:
further comprising: an integrated circuit package configured to enclose at least the second integrated circuit die and the third integrated circuit die (i.e. FIG. 13 is a block diagram of an example of a system having a small form factor according to an embodiment).
Regarding claim 16, Fan teaches:
16. (Currently Amended) An apparatus (i.e. Embodiments of each of the above application processor 11, persistent storage media 12, graphics subsystem 13, sense engine 14, focus engine 15, motion engine 16, super-resolution foveated renderer 17, and other system components may be implemented in hardware, software, or any suitable combination thereof. For example, hardware implementations may include configurable logic such as, for example, programmable logic arrays (PLAs), FPGAs, complex programmable logic devices (CPLDs), or in fixed-functionality logic hardware using circuit technology such as, for example, ASIC, complementary metal oxide semiconductor (CMOS) or transistor-transistor logic (TTL) technology, or any combination thereof- ¶0021, 0041, 0044 and 0047), comprising:
an image sensor; a lens configured to project an image onto the image sensor (i.e. or example, the user's device(s) may include one or more 2D, 3D, and/or depth cameras- ¶0025);
a storage device (i.e. alternatively, or additionally, these components may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as random access memory (RAM), read only memory (ROM), programmable ROM (PROM), firmware, flash memory, etc., to be executed by a processor or computing device- ¶0021) configured to store:
a region mask configured to identify a focal region of the image and a peripheral region of the image (i.e. The cropped image may be intentionally down-sampled at block 93 and the down-sampled image may be re-up-sampled at block 94 (e.g., to simulate the up-sampling process in the testing phase). The training workflow 90 may then simulate the foveated rendering process by blending the re-up-sampled image with the cropped image at block 95 (e.g., with a pre-defined mask which defines the size of the central vision and the blurry characteristics of the peripheral vision) to generate a foveated image at block 96. The super-resolution neural network may then be trained at block 97 by feeding the down-sampled image as an input to the super-resolution neural network and feeding the generated foveated image to the super-resolution neural network as a target output of the super-resolution neural network. The result of the training may be a trained super-resolution network based on foveated rendering- ¶0054… A focus area 112 may have any suitable shape such as circular (e.g., FIG. 11A), elliptical (e.g., FIGS. 11B and 11D), square or rectangular (e.g., FIG. 11E), a point (e.g., FIG. 11C), or arbitrary (e.g., FIG. 11F)- ¶0058, fig. 11);
a communication device (i.e. The radio 718 may be a network controller including one or more radios capable of transmitting and receiving signals using various suitable wireless communications techniques. Such techniques may involve communications across one or more wireless networks- ¶0069); and
However, Fan does not teach explicitly:
a first kernel of a convolutional neural network; and a second kernel; a processor configured to: apply, according to the region mask, the first kernel to the focal region to generate first feature data; and apply, according to the region mask, the second kernel to the peripheral region to generate second feature data.
In the same field of endeavor, Matteo teaches:
a first kernel of a convolutional neural network; and a second kernel; a processor configured to: apply, according to the region mask, the first kernel to the focal region to generate first feature data; and apply, according to the region mask, the second kernel to the peripheral region to generate second feature data (i.e. In this paper we propose to go beyond such a scheme, introducing the notion of Foveated Convolutional Layer (FCL), that formalizes the idea of location-dependent convolutions with foveated processing, i.e., fine-grained processing in a given-focused area and coarser processing in the peripheral regions); identifying a first weight matrix associated with the first region in the plurality of filtering configurations (i.e. For each kernel kj :R2 → R, j = 1, . . . , F, the convolution between I and kj is defined as equation 1- page 4, ¶1… Fig. 1: Top-left: out-of-the-box FCLs. Bottom-left: example of R = 4 regions in the piecewise-defined kernel case, when the attention a is given. Right: three strategies (one-per-column) to implement a piecewise-defined kernel, with examples of spatial coverage of the 4 region-wise kernels (coordinates not covered due to dilation are blank). We report right after the strategy name further operations needed to fulfil the uniform spatial coverage assumption- page 5… This implies that all the region-defined filters share related semantics across the image plane, due to the shared nature of the learnable kj . A natural alternative to this model consists in using independent learnable filters in each region- page 7, ¶2).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan with the teachings of Matteo to efficiently handle the information in the peripheral regions, eventually avoiding the development of misleading biases (Matteo- Abstract).
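For illustration only (hypothetical code prepared by the editor, not drawn from Fan, Matteo, or any other cited reference), the claimed arrangement of applying a first kernel to the focal region and a second kernel to the peripheral region according to a region mask could be sketched as follows:

```python
import numpy as np

def conv2d_same(image, kernel):
    """'Same'-size 2-D convolution with zero padding (odd-sized kernels only)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def region_masked_features(image, region_mask, first_kernel, second_kernel):
    """Apply the first kernel where region_mask is True (focal region) and
    the second kernel where it is False (peripheral region), selecting the
    per-pixel output according to the mask."""
    first_feature_data = conv2d_same(image, first_kernel)
    second_feature_data = conv2d_same(image, second_kernel)
    return np.where(region_mask, first_feature_data, second_feature_data)
```

In this sketch the function names and the use of a boolean mask are the editor's assumptions; the cited references describe the concept (location-dependent, foveated convolution) rather than this particular implementation.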
However, Fan and Matteo do not teach explicitly:
wherein the processor is configured to quantize image data from the focal region at a first precision level and quantize image data from the peripheral region at a second precision level lower than the first precision level.
In the same field of endeavor, Sanghoon teaches:
wherein the processor is configured to quantize image data (i.e. The other is nonuniform quantization which maximizes the FSNR subject to a rate constraint over the curvilinear coordinates- page 982, ¶7) from the focal region at a first precision level and quantize image data from the peripheral region at a second precision level lower than the first precision level (i.e. The area in Fig. 2(d) becomes Ac which is unchanged near the center of the foveation point and decreases from the foveation point toward the periphery relative to the area A0 in fig. 2b- page 981, ¶5… The QPs for coding macroblocks consist of a quantization state vector Q = {q1, q2,…, qM}- page 983, ¶1).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan and Matteo with the teachings of Sanghoon to deliver high-quality video at reduced bit rates by seeking to match the nonuniform sampling of the human retina (Sanghoon - Abstract).
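For illustration only (hypothetical code prepared by the editor, not drawn from Sanghoon or any other cited reference), quantizing the focal region at a higher precision level than the peripheral region could be sketched as follows:

```python
import numpy as np

def foveated_quantize(image, region_mask, focal_bits=8, peripheral_bits=4):
    """Quantize pixel values in [0, 1] with more levels (higher precision)
    in the focal region (mask True) than in the peripheral region."""
    focal_levels = 2 ** focal_bits - 1
    peripheral_levels = 2 ** peripheral_bits - 1
    focal_q = np.round(image * focal_levels) / focal_levels
    peripheral_q = np.round(image * peripheral_levels) / peripheral_levels
    # Select the per-pixel precision level according to the region mask.
    return np.where(region_mask, focal_q, peripheral_q)
```

The bit depths shown are the editor's assumptions; Sanghoon instead derives macroblock quantization parameters from a rate constraint, which this uniform-quantizer sketch does not model.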
Claims 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Fan Chen et al. [US 20190026864 A1: already of record] in view of Matteo Tiezzi et al. [Foveated Neural Computation] further in view of Sanghoon Lee et al. [Foveated Video Compression with Optimal Rate Control] even further in view of Liao Jiping et al. [US 20220207764 A1: already of record].
Regarding claim 10, Fan, Matteo and Sanghoon teach all the limitations of claim 9 and Fan further teaches:
wherein the memory cell array is further configured to store a region mask (i.e. To prepare for training, the logic 53 may be configured to crop a training image based on the region of interest to generate a cropped image- ¶0040… the logic 62 may be configured to crop a training image based on the region of interest to generate a cropped image- ¶0043) configured to identify the first region within the image and the second region within the image (i.e. Turning now to FIG. 9, an embodiment of a workflow 90 for training a super-resolution network based on foveated rendering may include starting with a dataset of high quality training images at block 90. For each high quality image in the original dataset, the image may be cropped at block 92 at randomly generated central vision positions. The cropped image may be intentionally down-sampled at block 93 and the down-sampled image may be re-up-sampled at block 94 (e.g., to simulate the up-sampling process in the testing phase). The training workflow 90 may then simulate the foveated rendering process by blending the re-up-sampled image with the cropped image at block 95 (e.g., with a pre-defined mask which defines the size of the central vision and the blurry characteristics of the peripheral vision) to generate a foveated image at block 96- ¶0054 ).
However, Fan, Matteo and Sanghoon do not teach explicitly:
the logic circuit is further configured to select, according to the region mask, the first kernel and the second kernel to filter the first region and the second region in generation of the first feature data and the second feature data.
In the same field of endeavor, Liao teaches:
the logic circuit is further configured to select, according to the region mask, the first kernel and the second kernel to filter the first region and the second region in generation of the first feature data and the second feature data.
(i.e. The convolution operator is also referred to as a kernel. In image processing, the convolution operator functions as a filter that extracts specific information from a matrix of an input image. The convolution operator may essentially be a weight matrix, and the weight matrix is usually predefined. In a process of performing a convolution operation on an image, the weight matrix usually processes pixels at a granularity level of one pixel (or two pixels, depending on a value of a stride) in a horizontal direction in the input image, to extract a specific feature from the image. A size of the weight matrix is related to a size of the image- ¶0094).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan, Matteo and Sanghoon with the teachings of Liao to improve definition of the extended depth of field image (¶0017- Liao).
Regarding claim 11, Fan, Matteo, Sanghoon and Liao teach all the limitations of claim 10.
However, Fan, Matteo and Sanghoon do not teach explicitly:
further comprising: a communication device configured to communicate the first feature data and the second feature data to a remote server system.
In the same field of endeavor, Liao teaches:
further comprising: a communication device configured to communicate the first feature data and the second feature data to a remote server system (i.e. A computational pipeline for performing a computer vision task includes keypoint detection, such as in the flowchart of FIG. 2. According to various embodiments of the present invention, various portions of the computational pipeline can be performed locally (e.g., in hardware directly connected to the cameras 302 capturing images of the scenes to be analyzed) or remotely (e.g., in hardware connected to the cameras over a network, such as on a computer server). Systems and methods for processing the various stages of the computational pipeline locally or remotely are described in more detail in U.S. patent application Ser. No. 15/805,107 “System and Method for Portable Active 3D Scanning,” filed in the United States Patent and Trademark Office on Nov. 6, 2017, the entire disclosure of which is incorporated by reference herein- ¶0179).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan, Matteo and Sanghoon with the teachings of Liao to improve definition of the extended depth of field image (¶0017- Liao).
Regarding claim 12, Fan, Matteo, Sanghoon and Liao teach all the limitations of claim 11.
However, Fan, Matteo and Sanghoon do not teach explicitly:
wherein the logic circuit is configured to apply the first kernel in the first region according to a first stride length and apply the second kernel in the second region according to a second stride length different from the first stride length.
In the same field of endeavor, Liao teaches:
wherein the logic circuit is configured to apply the first kernel in the first region according to a first stride length and apply the second kernel in the second region according to a second stride length different from the first stride length (i.e. The convolution operator is also referred to as a kernel. In image processing, the convolution operator functions as a filter that extracts specific information from a matrix of an input image. The convolution operator may essentially be a weight matrix, and the weight matrix is usually predefined. In a process of performing a convolution operation on an image, the weight matrix usually processes pixels at a granularity level of one pixel (or two pixels, depending on a value of a stride) in a horizontal direction in the input image, to extract a specific feature from the image. A size of the weight matrix is related to a size of the image- ¶0094).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan, Matteo and Sanghoon with the teachings of Liao to improve definition of the extended depth of field image (¶0017- Liao).
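For illustration only (hypothetical code prepared by the editor, not drawn from Liao or any other cited reference), applying a kernel at a region-dependent stride length could be sketched as follows; a larger stride in one region yields a coarser (smaller) feature map for that region:

```python
import numpy as np

def strided_conv2d(region, kernel, stride):
    """Valid-mode 2-D convolution of an image region with a configurable
    stride length; the output shrinks as the stride grows."""
    kh, kw = kernel.shape
    h, w = region.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(region[r:r + kh, c:c + kw] * kernel)
    return out
```

Under this sketch, applying the first kernel with a first stride length of 1 in the first region and the second kernel with a second stride length of 2 in the second region samples the first region more densely, consistent with the stride-dependent granularity Liao describes at ¶0094.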
Claims 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Fan Chen et al. [US 20190026864 A1: already of record] in view of Matteo Tiezzi et al. [Foveated Neural Computation] further in view of Sanghoon Lee et al. [Foveated Video Compression with Optimal Rate Control] and even further in view of Bing Song et al. [US 20190080453 A1: already of record].
Regarding claim 17, Fan, Matteo and Sanghoon teach all the limitations of claim 16.
However, Fan, Matteo and Sanghoon do not teach explicitly:
wherein the processor is further configured to recognize anomaly in the image based on the first feature data and the second feature data.
In the same field of endeavor, Bing teaches:
wherein the processor is further configured to recognize anomaly in the image based on the first feature data and the second feature data (i.e. In step 500C, using client 401 or client 403, the technician uploads the WSI to web server 405 (there may be multiple web servers but only one is shown for simplicity). Web server 405 then transmits the WSI to file server 407, which stores a plurality of WSIs and related metadata. Web server 405 then uses a queueing engine running on the web server to determine which of GPU Servers 409 or 411 to use to break up the WSI into patches and extract features based on a load balancing algorithm. Once a GPU server is selected, the web server 405 transmits instructions to the selected GPU server (such as GPU server 409) to break up the WSI into patches and extract features using a convolutional neural network running on GPU server 409 (as discussed in more detail below with reference to step 505 of FIG. 5B).- ¶0071… In step 500D, once GPU server 409 (or 411) extracts the features using the convolutional neural network, it transmits a message to file server 407 to store metadata associated with WSI, the metadata including patch location, patch size (e.g., 400×400 pixels), and values of each of the features of the patch extracted by the GPU server 409. The metadata is then stored on file server 407 (associated with the stored WSI) and can be accessed by any GPU server (for example, 409 and 411) or web server 405.- ¶0072… It is noted that the above process of determining which of the previous positive patches 302D are in the same class as current positive patches 302B via the positive one-class SVM 311 in step 509C may be performed by other means of outlier detection. For example, elliptic envelope, or isolation forest (which, for example, detects data-anomalies using binary trees) may be used in place of, or in combination with, a one-class SVM to determine which patches in previous positive patches 302D to remove as outliers.- ¶0105).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan, Matteo and Sanghoon with the teachings of Bing to facilitate various services, such as the storage, caching, or transmission of content, streaming media and applications on behalf of content providers (Bing- ¶0057).
Regarding claim 18, Fan, Matteo and Sanghoon teach all the limitations of claim 16.
However, Fan, Matteo and Sanghoon do not teach explicitly:
wherein the processor is further configured to communicate, using the communication device, the first feature data and the second feature data to a remote server system configured to recognize anomaly in the image based on the first feature data and the second feature data.
In the same field of endeavor, Bing teaches:
wherein the processor is further configured to communicate, using the communication device, the first feature data and the second feature data to a remote server system configured to recognize anomaly in the image based on the first feature data and the second feature data (i.e. In step 500C, using client 401 or client 403, the technician uploads the WSI to web server 405 (there may be multiple web servers but only one is shown for simplicity). Web server 405 then transmits the WSI to file server 407, which stores a plurality of WSIs and related metadata. Web server 405 then uses a queueing engine running on the web server to determine which of GPU Servers 409 or 411 to use to break up the WSI into patches and extract features based on a load balancing algorithm. Once a GPU server is selected, the web server 405 transmits instructions to the selected GPU server (such as GPU server 409) to break up the WSI into patches and extract features using a convolutional neural network running on GPU server 409 (as discussed in more detail below with reference to step 505 of FIG. 5B).- ¶0071… In step 500D, once GPU server 409 (or 411) extracts the features using the convolutional neural network, it transmits a message to file server 407 to store metadata associated with WSI, the metadata including patch location, patch size (e.g., 400×400 pixels), and values of each of the features of the patch extracted by the GPU server 409. The metadata is then stored on file server 407 (associated with the stored WSI) and can be accessed by any GPU server (for example, 409 and 411) or web server 405.- ¶0072… It is noted that the above process of determining which of the previous positive patches 302D are in the same class as current positive patches 302B via the positive one-class SVM 311 in step 509C may be performed by other means of outlier detection. For example, elliptic envelope, or isolation forest (which, for example, detects data-anomalies using binary trees) may be used in place of, or in combination with, a one-class SVM to determine which patches in previous positive patches 302D to remove as outliers.- ¶0105).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan, Matteo and Sanghoon with the teachings of Bing to facilitate various services, such as the storage, caching, or transmission of content, streaming media and applications on behalf of content providers (Bing- ¶0057).
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Fan Chen et al. [US 20190026864 A1: already of record] in view of Matteo Tiezzi et al. [Foveated Neural Computation] further in view of Sanghoon Lee et al. [Foveated Video Compression with Optimal Rate Control] and even further in view of Liao Jiping et al. [US 20220207764 A1: already of record].
Regarding claim 19, Fan, Matteo and Sanghoon teach all the limitations of claim 16.
However, Fan, Matteo and Sanghoon do not teach explicitly:
wherein the processor is configured to apply the first kernel in the focal region according to a first stride length and apply the second kernel in the peripheral region according to a second stride length larger than the first stride length.
In the same field of endeavor, Liao teaches:
wherein the processor is configured to apply the first kernel in the focal region according to a first stride length and apply the second kernel in the peripheral region according to a second stride length larger than the first stride length (i.e. The convolution operator is also referred to as a kernel. In image processing, the convolution operator functions as a filter that extracts specific information from a matrix of an input image. The convolution operator may essentially be a weight matrix, and the weight matrix is usually predefined. In a process of performing a convolution operation on an image, the weight matrix usually processes pixels at a granularity level of one pixel (or two pixels, depending on a value of a stride) in a horizontal direction in the input image, to extract a specific feature from the image. A size of the weight matrix is related to a size of the image- ¶0094).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Fan, Matteo and Sanghoon with the teachings of Liao to improve definition of the extended depth of field image (¶0017- Liao).
Allowable Subject Matter
Claims 14 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLIFFORD HILAIRE whose telephone number is (571)272-8397. The examiner can normally be reached 5:30-14:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SATH V PERUNGAVOOR can be reached at (571)272-7455. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
CLIFFORD HILAIRE
Primary Examiner
Art Unit 2488
/CLIFFORD HILAIRE/Primary Examiner, Art Unit 2488