DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-30 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (Liu) (US 2023/0110558 A1).
Regarding claim 1, Liu discloses an apparatus for image processing (e.g., a system 100, paragraph 88, figure 1), comprising:
at least one memory comprising instructions; and at least one processor coupled to the at least one memory (e.g., The system 100 may further include a server 110 having at least a processor, a memory, a storage medium, and/or other components, paragraph 88), wherein the at least one processor is configured to:
detect a first object in an image using a first object detection machine learning model (e.g., the at least one processor may be configured to execute programming instructions stored in the memory to analyze the images and perform object detection on each of the images to detect one or more objects in the images, paragraph 88);
Liu, in one embodiment, does not specifically disclose: upscale the image to generate an upscaled image;
generate a plurality of sub-images from the upscaled image based on the first object;
perform object detection on the plurality of sub-images using a second object detection machine learning model to detect a second object;
fuse locations of objects detected in the plurality of sub-images into a single object location; and
output a location of the first object and second object in the upscaled image.
Liu, in another embodiment, discloses upscale the image to generate an upscaled image (e.g., In some examples, the pre-trained machine learning network may include a feature pyramid network and an upsampling-and-concatenation network, paragraph 104);
generate a plurality of sub-images from the upscaled image based on the first object (e.g., In some embodiments, the upsampling-and-concatenation network in the pre-trained machine learning network may be configured to take each feature map and upsample it (e.g., using a bi-linear interpolation) to a common size h×w, paragraph 105);
perform object detection on the plurality of sub-images using a second object detection machine learning model to detect a second object (e.g., In some embodiments, detecting the one or more objects may further include recognizing the one or more objects in the image by processing the locations of the one or more objects in the image using another machine learning model (e.g., a second machine learning model) and the feature map obtained from the pre-trained machine learning model, paragraph 92);
fuse locations of objects detected in the plurality of sub-images into a single object location (e.g., the multiple feature maps may be respectively converted to feature maps of sizes: h×w×D1, h×w×D2, and h×w×D3. These feature maps may be concatenated to generate the output feature map of size h×w×D, paragraph 105); and
output a location of the first object and second object in the upscaled image (e.g., In some examples, the number of channels D in the output feature map may have a value D=D1+D2+D3. In a non-limiting example, the feature map may have a different size than that of the image. For example, h and w may take the values of H/8 and W/8, respectively, or other suitable values, where H and W are the size of the image. The width and height of an image may not be equal, nor are the width and height of a feature map, paragraph 105).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liu, in one embodiment, to include: upscale the image to generate an upscaled image; generate a plurality of sub-images from the upscaled image based on the first object; perform object detection on the plurality of sub-images using a second object detection machine learning model to detect a second object; fuse locations of objects detected in the plurality of sub-images into a single object location; and output a location of the first object and second object in the upscaled image, as taught by Liu in another embodiment. One of ordinary skill in the art would have been motivated to combine the embodiments of Liu in order to apply Liu's upsampling and multi-model detection techniques to the particular application at hand, with a reasonable expectation of success, since both embodiments are directed to the same object detection system.
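For clarity of the record, the two-stage detection pipeline recited in claim 1 may be sketched as follows. This is an illustrative sketch only; all function names (upscale, crop_around, fuse_boxes) are hypothetical placeholders and are not drawn from Liu or from the claims, and the model calls are stubbed out.

```python
# Illustrative sketch of the claim 1 pipeline: upscale, generate
# sub-images around a first detection, detect again, fuse locations.
# Images are plain 2-D lists; boxes are (x0, y0, x1, y1) tuples.

def upscale(image, factor):
    """Nearest-neighbor upscale of a 2-D grid by an integer factor."""
    return [[px for px in row for _ in range(factor)]
            for row in image for _ in range(factor)]

def crop_around(image, box, pad=1):
    """Generate a sub-image around a detected box, clamped to the
    image bounds; returns the crop and its origin in the parent image."""
    x0, y0, x1, y1 = box
    y0, y1 = max(0, y0 - pad), min(len(image), y1 + pad)
    x0, x1 = max(0, x0 - pad), min(len(image[0]), x1 + pad)
    return [row[x0:x1] for row in image[y0:y1]], (x0, y0)

def fuse_boxes(boxes):
    """Fuse detections from multiple sub-images into a single object
    location by taking the enclosing bounding box."""
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return (min(xs0), min(ys0), max(xs1), max(ys1))
```

In this sketch the first and second object detection models would be invoked on the original image and on each sub-image respectively; only the surrounding geometry is shown.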
Regarding claim 2, Liu discloses wherein the first object detection machine learning model and the second object detection machine learning model are a same object detection machine learning model (e.g., The first and the second machine learning models may be trained in a field training process, paragraph 4).
Regarding claim 3, Liu discloses wherein the first object detection machine learning model and the second object detection machine learning model are different object detection machine learning models (e.g., processing the feature map of the image using a first machine learning model to generate a character center heatmap for the image, wherein the character center heatmap includes a plurality of samples each having a value indicative of a likelihood of a corresponding sample in the image being a center of a character; and processing the feature map of the image and the character center heatmap for the image using a second machine learning model to recognize one or more characters in the image, paragraph 19).
Regarding claim 4, Liu discloses wherein the second object was missed by the first object detection machine learning model (e.g., In some examples, the method may additionally include processing the locations of the one or more objects in the image using a second machine learning model and the feature map to recognize at least one object of the one or more objects, paragraph 4).
Regarding claim 5, Liu discloses wherein the second object detection machine learning model outputs a location of the second object relative to an image of the plurality of sub-images, and wherein the at least one processor is further configured to transform the location of the second object from a coordinate system relative to the image of the plurality of sub-images to a coordinate system relative to the upscaled image (e.g., The pre-trained machine learning model 220 may be configured to encode some features of the image for subsequent processing. In some examples, the pre-trained machine learning network may include a feature pyramid network and an upsampling-and-concatenation network. The feature pyramid network may be a neural network that takes an image as input and outputs one or more feature maps that are generated independently, for example, at different scales. In a non-limiting example, the feature pyramid network may generate multiple feature maps of different sizes, e.g., h1×w1×D1, h2×w2×D2, and h3×w3×D3. Various implementations of a feature pyramid network may be available, paragraph 104).
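The coordinate transformation recited in claim 5, mapping a detection from a sub-image's local coordinate system back to the coordinate system of the upscaled image, may be sketched as follows. The offset-based mapping shown is a common illustrative approach and is not asserted to be Liu's specific implementation; the function name and parameters are hypothetical.

```python
def to_parent_coords(box, sub_origin):
    """Map a box (x0, y0, x1, y1) from a sub-image's local coordinate
    system into the coordinate system of the upscaled image, given the
    sub-image's origin (ox, oy) within that image."""
    ox, oy = sub_origin
    x0, y0, x1, y1 = box
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)
```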
Regarding claims 6 and 7, Liu discloses wherein the first object detection machine learning model and the second object detection machine learning model are configurable; and wherein the at least one processor is further configured to: receive a first indication of a first machine learning model to use as the first object detection machine learning model; and receive a second indication of a second machine learning model to use as the second object detection machine learning model (e.g., In some aspects, the above described techniques may be applied to the problem of character detection (e.g., OCR). For example, a first machine learning model may be configured to extract certain features of the image that allow detecting characters locations in the image. A second machine learning model may be configured to extract certain other features of the image that allow the system to recognize the characters in the image, paragraph 66).
Regarding claims 8 and 9, Liu discloses wherein the image is upscaled using an upscaling machine learning model based on a scaling factor; wherein the at least one processor is further configured to receive an indication of a machine learning model to use as the upscaling machine learning model along with an indication of the scaling factor (e.g., In some examples, the pre-trained machine learning network may include a feature pyramid network and an upsampling-and-concatenation network. The feature pyramid network may be a neural network that takes an image as input and outputs one or more feature maps that are generated independently, for example, at different scales. In a non-limiting example, the feature pyramid network may generate multiple feature maps of different sizes, e.g., h1×w1×D1, h2×w2×D2, and h3×w3×D3. Various implementations of a feature pyramid network may be available. For example, feature pyramid networks as described in Lin et al., “Feature Pyramid Networks for Object Detection,” December, 2016, (arxiv.org/abs/1612.03144), may be used, and herein incorporated by reference in its entirety, paragraph 104).
Regarding claim 10, Liu discloses wherein the plurality of sub-images is generated based on locations of each relevant object detected in the image (e.g., The method may use a pre-trained machine learning model to determine the feature map. The pre-trained machine learning model may be a deep machine learning model. The method further includes processing the feature map of the image using a first machine learning model to generate an object center heatmap for the image, where the object center heatmap includes a plurality of samples each having a value indicative of a likelihood of a corresponding sample in the image being a center of an object. The method further include determining locations of one or more objects in the image based on the object center heatmap, paragraph 4).
Regarding claim 11, claim 11 is a method claim for image processing with limitations similar to the limitations of claim 1. Therefore, claim 11 is rejected as set forth above with respect to claim 1.
Regarding claim 12, claim 12 is a method claim for image processing with limitations similar to the limitations of claim 2. Therefore, claim 12 is rejected as set forth above with respect to claim 2.
Regarding claim 13, claim 13 is a method claim for image processing with limitations similar to the limitations of claim 3. Therefore, claim 13 is rejected as set forth above with respect to claim 3.
Regarding claim 14, claim 14 is a method claim for image processing with limitations similar to the limitations of claim 4. Therefore, claim 14 is rejected as set forth above with respect to claim 4.
Regarding claim 15, claim 15 is a method claim for image processing with limitations similar to the limitations of claim 5. Therefore, claim 15 is rejected as set forth above with respect to claim 5.
Regarding claims 16 and 17, claims 16 and 17 are method claims for image processing with limitations similar to the limitations of claims 6 and 7. Therefore, claims 16 and 17 are rejected as set forth above with respect to claims 6 and 7.
Regarding claims 18 and 19, claims 18 and 19 are method claims for image processing with limitations similar to the limitations of claims 8 and 9. Therefore, claims 18 and 19 are rejected as set forth above with respect to claims 8 and 9.
Regarding claim 20, claim 20 is a method claim for image processing with limitations similar to the limitations of claim 10. Therefore, claim 20 is rejected as set forth above with respect to claim 10.
Regarding claim 21, claim 21 is a non-transitory computer-readable medium claim with limitations similar to the limitations of claim 1. Therefore, claim 21 is rejected as set forth above with respect to claim 1.
Regarding claim 22, claim 22 is a non-transitory computer-readable medium claim with limitations similar to the limitations of claim 2. Therefore, claim 22 is rejected as set forth above with respect to claim 2.
Regarding claim 23, claim 23 is a non-transitory computer-readable medium claim with limitations similar to the limitations of claim 3. Therefore, claim 23 is rejected as set forth above with respect to claim 3.
Regarding claim 24, claim 24 is a non-transitory computer-readable medium claim with limitations similar to the limitations of claim 4. Therefore, claim 24 is rejected as set forth above with respect to claim 4.
Regarding claim 25, claim 25 is a non-transitory computer-readable medium claim with limitations similar to the limitations of claim 5. Therefore, claim 25 is rejected as set forth above with respect to claim 5.
Regarding claims 26 and 27, claims 26 and 27 are non-transitory computer-readable medium claims with limitations similar to the limitations of claims 6 and 7. Therefore, claims 26 and 27 are rejected as set forth above with respect to claims 6 and 7.
Regarding claims 28 and 29, claims 28 and 29 are non-transitory computer-readable medium claims with limitations similar to the limitations of claims 8 and 9. Therefore, claims 28 and 29 are rejected as set forth above with respect to claims 8 and 9.
Regarding claim 30, claim 30 is a non-transitory computer-readable medium claim with limitations similar to the limitations of claim 10. Therefore, claim 30 is rejected as set forth above with respect to claim 10.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUANG N VO whose telephone number is (571) 270-1121. The examiner can normally be reached Monday through Friday, 7:00 AM to 4:00 PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Akwasi M. Sarpong can be reached at 571-270-3438. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/QUANG N VO/Primary Examiner, Art Unit 2681