DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 03/21/2024 and 10/15/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because the drawings include Fig. 10, but the specification refers to this drawing as “Fig. 11” in [0103] of PG Pub US 2024/0420392 A1; no “Fig. 11” exists in the drawings. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 15-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by LEE et al. (US 2021/0365724 A1).
RE claim 15, Lee teaches a method for image encoding, comprising:
(a)
generating multiple candidate bounding boxes by applying a convolution layer to a first feature map extracted from an input image;
In Fig. 1, Lee teaches the feature map extraction module (10) may receive an image for object detection, that is, an input image (IMG1), and extract a feature map having multiple resolutions (said first feature map) for the input image (IMG1) [0044]. Additionally, Lee teaches the bounding box detection module (20) may classify (or identify) a bounding box by applying a first group of convolution layers to the feature map extracted by the feature map extraction module (10) [0049, 0077-0081, Fig. 5]. The bounding box detection module (20) may set offsets in multiple directions based on the center point of the object (said multiple candidate bounding boxes) [0054].
(b)
extracting a second feature map that is based on multiple masks for shapes of objects within the multiple candidate bounding boxes using the first feature map; and
Lee teaches the bounding box detection module (20) may classify (or identify) a bounding box by applying a first group of convolution layers to the feature map (said first feature map) extracted by the feature map extraction module (10) [0049]. Additionally, the mask generation module (30), using the feature map extracted by the feature map extraction module (10), may generate a mask for the shape of the object in the bounding box predicted by the bounding box detection module (20) to output an output image (IMG2) [0057]. Lee provides the example in which, when four “zebras” are detected, the output image (IMG2) may include four masks (said multiple masks) [0060].
(c)
generating a third feature map by combining the first feature map and the second feature map.
Lee teaches the mask generation module (30), using the feature map extracted by the feature map extraction module (10), may generate a mask for the shape of the object in the bounding box predicted by the bounding box detection module (20) to output an output image (IMG2) (said third feature map) [0057].
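For illustration only, the following minimal PyTorch-style sketch shows one way a mask-based second feature map could be combined with a first feature map to form a third feature map; the tensor shapes and the modulate-then-concatenate scheme are assumptions made for the example, not the specific operation disclosed by Lee.

import torch

# hypothetical tensors: a 256-channel first feature map and a one-channel,
# mask-based second feature map (per-pixel object probabilities)
first_fm = torch.randn(1, 256, 64, 64)
second_fm = torch.sigmoid(torch.randn(1, 1, 64, 64))

# one possible combination: modulate the first map by the mask, then concatenate
third_fm = torch.cat([first_fm, first_fm * second_fm], dim=1)
print(third_fm.shape)  # torch.Size([1, 512, 64, 64])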
RE claim 16, Lee teaches wherein generating the multiple candidate bounding boxes includes:
(a)
applying a convolution layer of a first group to the first feature map and performing classification of the bounding box using a binary classifier; and
Lee teaches the bounding box detection module (20) may classify (or identify) a bounding box by applying a first group of convolution layers to the feature map extracted by the feature map extraction module (10) [0049]. Lee also teaches the bounding box detection module (20) may classify a bounding box using a binary classifier [0051].
(b)
predicting the bounding box by applying a convolution layer of a second group of the first feature map.
In addition, the bounding box detection module (20) may predict the bounding box by applying the second group of convolution layers to the feature map extracted by the feature map extraction module (10) [0052].
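As an illustration of the two-group arrangement described above, the sketch below shows a first group of convolution layers feeding a binary (object / not-object) classifier and a second group predicting the bounding box; the layer counts, channel sizes, and module names are assumptions, not Lee's actual architecture.

import torch
import torch.nn as nn

class TwoGroupHead(nn.Module):
    """Sketch only: one conv group for binary classification of a bounding box,
    a second conv group for predicting the bounding box (assumed shapes)."""
    def __init__(self, channels=256):
        super().__init__()
        self.cls_convs = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.reg_convs = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.cls_out = nn.Conv2d(channels, 1, 3, padding=1)   # binary classification logit
        self.box_out = nn.Conv2d(channels, 4, 3, padding=1)   # box prediction per location

    def forward(self, feature_map):
        cls_score = torch.sigmoid(self.cls_out(self.cls_convs(feature_map)))  # binary classifier
        box_pred = self.box_out(self.reg_convs(feature_map))
        return cls_score, box_pred

cls_score, box_pred = TwoGroupHead()(torch.randn(1, 256, 64, 64))
print(cls_score.shape, box_pred.shape)  # (1, 1, 64, 64) (1, 4, 64, 64)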
RE claim 17, Lee teaches wherein predicting the bounding box comprises setting offsets in multiple directions based on a center point of the object and estimating a location and size of the bounding box.
Lee teaches the bounding box detection module (20) may set offsets in multiple directions based on the center point of the object (said multiple candidate bounding boxes), and then estimate the position (said estimating a location) and the size of the bounding box (said estimating size of the bounding box) [0054].
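A short worked example of how directional offsets measured from a center point can be decoded into a bounding box location and size is shown below; the four-direction (left/top/right/bottom) convention and the numeric values are assumptions chosen for illustration, not values taken from Lee.

# Hypothetical example: offsets measured from a center point are decoded
# into a box location (x1, y1, x2, y2) and a box size (width, height).
cx, cy = 100, 80                         # assumed center point of the object
left, top, right, bottom = 30, 20, 25, 35   # assumed offsets in four directions

x1, y1 = cx - left, cy - top
x2, y2 = cx + right, cy + bottom
width, height = x2 - x1, y2 - y1         # estimated size of the bounding box
print((x1, y1, x2, y2), (width, height))  # (70, 60, 125, 115) (55, 55)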
RE claim 18, Lee teaches wherein generating the multiple candidate bounding boxes comprises adjusting confidence of the predicted bounding box based on a confidence score for the classification of the bounding box and centeredness indicating a degree of matching between a center of the predicted bounding box and a center of ground truth (GT).
Lee teaches the bounding box detection module (20) may adjust the reliability of the predicted bounding box based on the confidence score for the classification of the bounding box and the centeredness indicating the degree to which the predicted bounding box coincides with the ground truth (GT) [0055].
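As a hedged illustration of this kind of confidence adjustment, one common formulation scales the classification confidence score by a centeredness term in [0, 1]; the exact formula used by Lee is not reproduced here, and the values below are hypothetical.

# Illustrative only: adjusted confidence as classification score x centeredness.
cls_score = 0.92        # confidence score for the box classification (assumed)
centeredness = 0.75     # degree of match between predicted and GT centers (assumed)

adjusted_confidence = cls_score * centeredness
print(round(adjusted_confidence, 2))  # 0.69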
RE claim 19, Lee teaches wherein generating the masks includes:
(a)
extracting a region corresponding to the bounding box from the first feature map and warping the region into a feature map having a preset first resolution;
Lee teaches the mask generation module (30) may extract an area corresponding to the bounding box from the feature map, and then perform warping with a feature map having a preset resolution [0058].
(b)
acquiring a convolution feature map by applying a convolution layer to a warped feature map acquired as a result of warping;
Lee teaches the mask generation module (30) may obtain a convolutional feature map by applying a convolutional layer to the warped feature map [0058].
(c)
generating a max-pooled feature map and an average-pooled feature map by performing max pooling and average pooling on the convolution feature map;
Lee teaches generating a maximum pooling feature map and an average pooling feature map by performing maximum pooling and average pooling on the convolutional feature map [0058].
(d)
acquiring an attention map by combining the max-pooled feature map and the average-pooled feature map and applying a nonlinear function to a combination of the max-pooled feature map and the average-pooled feature map;
Additionally, Lee teaches the mask generation module (30) may obtain an attention map by applying a nonlinear function to the combined maximum pooling feature map and average pooling feature map [0059].
(e)
acquiring an up-sampling result having a second resolution higher than the first resolution by performing up-sampling on a result of multiplying the attention map and the convolution feature map; and
Lee teaches the mask generation module (30) may multiply the attention map (35) and the convolutional feature map (32) and then perform up-sampling on the multiplied result (16) to obtain the up-sampling result (37) [0080].
(f)
generating the mask by performing binary classification on the up-sampling result.
Lee teaches performing a binary classification on the multiplied result to generate the mask (38) [0080].
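For illustration of the claim 19 pipeline as a whole, the following PyTorch-style sketch walks through steps (a)-(f): region extraction and warping, convolution, max/average pooling, an attention map via a nonlinear function, multiplication and up-sampling, and per-pixel binary classification. The resolutions (14x14 and 28x28), the pooling arrangement, and the random weights are assumptions made for the example only, not Lee's implementation.

import torch
import torch.nn.functional as F
from torchvision.ops import roi_align

def mask_head_sketch(first_fm, box):
    """Illustrative sketch of the pooling/attention/up-sampling pipeline;
    all sizes and weights are assumptions."""
    # (a) extract the box region and warp it to a preset first resolution (14x14)
    warped = roi_align(first_fm, [box], output_size=(14, 14))

    # (b) convolution feature map from the warped feature map
    weight = torch.randn(256, first_fm.shape[1], 3, 3)
    conv_fm = F.relu(F.conv2d(warped, weight, padding=1))

    # (c) max-pooled and average-pooled feature maps from the conv feature map
    max_fm = F.max_pool2d(conv_fm, kernel_size=3, stride=1, padding=1)
    avg_fm = F.avg_pool2d(conv_fm, kernel_size=3, stride=1, padding=1)

    # (d) attention map: combine the pooled maps and apply a nonlinear function
    attention = torch.sigmoid(max_fm + avg_fm)

    # (e) multiply the attention map and the conv feature map, then up-sample
    #     to a higher second resolution (28x28)
    upsampled = F.interpolate(attention * conv_fm, size=(28, 28),
                              mode="bilinear", align_corners=False)

    # (f) per-pixel binary classification to produce the mask
    mask_logit = F.conv2d(upsampled, torch.randn(1, 256, 1, 1))
    return torch.sigmoid(mask_logit) > 0.5

fm = torch.randn(1, 256, 64, 64)
box = torch.tensor([[10.0, 10.0, 40.0, 40.0]])   # assumed (x1, y1, x2, y2)
mask = mask_head_sketch(fm, box)
print(mask.shape)  # torch.Size([1, 1, 28, 28])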
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over RAO et al. (US 2020/0394434 A1).
RE claim 1, Rao teaches a system and method for processing an image. In particular, Rao teaches a method for scene graph generation, comprising:
(a)
extracting a first feature map from an input image;
In Figs. 4A and 4B, the machine-learning system (410) performs the method (400) for image searching and image cropping [0061]. In step (401), a saliency map (said first feature map) for an image is generated by a saliency map generator (412) [0062, 0071, 0073]. The saliency map (414) represents where areas of interest (e.g., the one or more salient portions including salient objects) in the image (411) are located, based on information about the content of the image (411) [0071].
(b)
extracting a second feature map that is based on a mask for a shape of an object within a bounding box using the first feature map;
In Fig. 4B, the saliency map (414(1)) (said first feature map) may represent boundaries or contours of each salient portion detected from the image (411) [0073]. As shown in Fig. 4B, two helmets, a boy, a girl, and two bikes are dominant objects in the image (411), i.e., salient objects. Other objects in the image (411), such as clouds and grass, are less important and may be located in the background of the image (411), i.e., background objects [0074]. The saliency map (414) may define the salient portions based on the boundaries of salient objects and may omit background objects of the image (411) [0074]. In Fig. 4D, object segmentation and regression may be performed on the image (411) [0081]. Feature vectors of a plurality of detected objects (420) within the image (411) may be extracted by the image feature extractor (4131). The segmentation and regression unit (4132) receives the feature vectors of the plurality of objects and generates labels (422) and bounding boxes (421) for each of the plurality of objects (420) (said mask for a shape of an object within a bounding box) [0081]. The segmentation and regression unit (4132) further generates bounding boxes (421) for each detected object (420) along with corresponding location parameters (434) for the bounding box (421) [0082]. A second dataset (said second feature map) is generated comprising image information (411)(1) and bounding box information (421)(1) [0082].
(c)
generating a third feature map by combining the first feature map and the second feature map; and
Rao does not specifically reference a third feature map; however, as discussed in the rationales for claims 1(a) and 1(b), Rao does teach that the saliency map (said first feature map) [0061] in combination with the object segmentation and regression (said second feature map) [0081] provides the information used to generate the scene graph of Rao (said combining the first feature map and the second feature map). Thus, it would have been obvious before the effective filing date of the claimed invention to reference the output of the combination of the saliency map and segmentation as the claimed third feature map, since this information is further used to generate the scene graph.
(d)
generating a scene graph by predicting a relationship between objects from the third feature map.
The scene graph generator generates a scene graph for at least the one or more salient portions of the image [0064-0065]. The scene graph may be pruned to be relevant only to the salient portions of the image [0070]. Object detection and object classification are performed within the boundaries of the salient portion(s) of the image (411) to generate nodes of the scene graph (416) [0076]. The object pair detector (4133) receives the labels (422) and determines which objects from the detected objects can form different object pairs (431) [0077, 0083]. The relationship extractor (4134) performs relationship extraction between each respective object pair (431) by receiving the generated bounding boxes (421) from the segmentation and regression unit (4132) [0084]. At least one vector is calculated to connect the two bounding boxes associated with the object pair (431) [0084].
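Purely as an illustration of forming object pairs and a connecting vector between their bounding boxes as a precursor to relationship prediction, the sketch below uses assumed labels, boxes, and a center-to-center vector; it is not Rao's actual relationship extractor.

import itertools
import torch

# hypothetical detected objects: labels and (x1, y1, x2, y2) bounding boxes
boxes = torch.tensor([[10.0, 10.0, 50.0, 60.0],   # e.g. "boy"
                      [55.0, 20.0, 90.0, 70.0],   # e.g. "bike"
                      [15.0,  5.0, 30.0, 20.0]])  # e.g. "helmet"
labels = ["boy", "bike", "helmet"]

# box centers, used here as a simple geometric cue
centers = torch.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                       (boxes[:, 1] + boxes[:, 3]) / 2], dim=1)

scene_graph_edges = []
for i, j in itertools.permutations(range(len(labels)), 2):
    vec = centers[j] - centers[i]          # vector connecting the two bounding boxes
    # a real system would classify the relationship from visual and geometric features
    scene_graph_edges.append((labels[i], labels[j], vec.tolist()))

print(scene_graph_edges[0])   # ('boy', 'bike', [42.5, 10.0])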
RE claim 8, claim 8 recites similar limitations as claim 1 but in system form. Therefore, the same rationale used for claim 1 is applied. Rao further teaches the system shown in Fig. 2, which includes the processing system (200) (said backbone network, encoder, and decoder) that executes the limitations of claim 8.
Claims 2-7 and 9-14 are rejected under 35 U.S.C. 103 as being unpatentable over RAO et al. (US 2020/0394434 A1) in view of LEE et al. (US 2021/0365724 A1).
RE claim 2, Rao teaches the limitations of claim 2 with the exception of providing the specifics discussed below. Lee is made of record as teaching an object detection system. In further view of Lee, Lee teaches wherein extracting the second feature map includes:
(a)
generating multiple candidate bounding boxes by applying a convolution layer to the first feature map; and
Lee teaches the bounding box detection module (20) may classify (or identify) a bounding box by applying a first group of convolution layers to the feature map extracted by the feature map extraction module (10) [0049, 0077-0081, Fig. 5]. The bounding box detection module (20) may set offsets in multiple directions based on the center point of the object (said multiple candidate bounding boxes) [0054].
(b)
generating multiple masks for shapes of objects within the multiple candidate bounding boxes using the first feature map.
The mask generation module (30), using the feature map extracted by the feature map extraction module (10), may generate a mask for the shape of the object in the bounding box predicted by the bounding box detection module (20) to output an output image (IMG2) [0057]. Lee provides the example in which, when four “zebras” are detected, the output image (IMG2) may include four masks (said generating multiple masks) [0060].
It would have been obvious before the effective filing date of the claimed invention to utilize the object detection and segmentation of Lee within the method/system of Rao because the method of Lee is based on points without using a predefined anchor box that requires a high computation amount and memory usage. This makes it possible to achieve efficiency in terms of computational amount and memory occupancy [Lee: 0061]. Furthermore, it is possible to implement real-time object detection and segmentation in various fields based on platforms with little computing power [0061].
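As a short, purely illustrative sketch of per-instance mask generation (one mask per detected object, as in the four-zebra example above), the code below loops a hypothetical mask head over each candidate box; the names, shapes, and placeholder mask head are assumptions and not Lee's or Rao's implementation.

import torch

def mask_for_box(feature_map, box):
    """Hypothetical stand-in for a mask head; a real system would use the
    warping / convolution / attention / up-sampling pipeline discussed for
    claims 6 and 19."""
    return torch.zeros(28, 28, dtype=torch.bool)

feature_map = torch.randn(1, 256, 64, 64)                     # said first feature map
candidate_boxes = [torch.tensor([10.0 * i, 5.0, 10.0 * i + 8.0, 20.0]) for i in range(4)]
masks = [mask_for_box(feature_map, box) for box in candidate_boxes]   # four objects -> four masks
print(len(masks))  # 4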
RE claim 3, in further view of Lee, Lee teaches wherein generating the multiple candidate bounding boxes includes:
(a)
applying a convolution layer of a first group to the first feature map and performing classification of the bounding box using a binary classifier; and
Lee teaches the bounding box detection module (20) may classify (or identify) a bounding box by applying a first group of convolution layers to the feature map extracted by the feature map extraction module (10) [0049]. Lee also teaches the bounding box detection module (20) may classify a bounding box using a binary classifier [0051].
(b)
predicting the bounding box by applying a convolution layer of a second group of the first feature map.
In addition, the bounding box detection module (20) may predict the bounding box by applying the second group of convolution layers to the feature map extracted by the feature map extraction module (10) [0052].
The same motivation to combine as taught in the rationale of claim 2 is incorporated herein.
RE claim 4, in further view of Lee, Lee teaches wherein predicting the bounding box comprises setting offsets in multiple directions based on a center point of the object and estimating a location and size of the bounding box.
Lee teaches the bounding box detection module (20) may set offsets in multiple directions based on the center point of the object (said multiple candidate bounding boxes), and then estimate the position (said estimating a location) and the size of the bounding box (said estimating size of the bounding box) [0054]. The same motivation to combine as taught in the rationale of claim 2 is incorporated herein.
RE claim 5, in further view of Lee, Lee teaches wherein generating the multiple candidate bounding boxes comprises adjusting confidence of the predicted bounding box based on a confidence score for the classification of the bounding box and centeredness indicating a degree of matching between a center of the predicted bounding box and a center of ground truth (GT).
Lee teaches the bounding box detection module (20) may adjust the reliability of the predicted bounding box based on the confidence score for the classification of the bounding box and the centeredness indicating the degree to which the predicted bounding box coincides with the ground truth (GT) [0055]. The same motivation to combine as taught in the rationale of claim 2 is incorporated herein.
RE claim 6, in further view of Lee, Lee teaches wherein generating the masks includes:
(a)
extracting a region corresponding to the bounding box from the first feature map and warping the region into a feature map having a preset first resolution;
Lee teaches the mask generation module (30) may extract an area corresponding to the bounding box from the feature map, and then perform warping with a feature map having a preset resolution [0058].
(b)
acquiring a convolution feature map by applying a convolution layer to a warped feature map acquired as a result of warping;
Lee teaches the mask generation module (30) may obtain a convolutional feature map by applying a convolutional layer to the warped feature map [0058].
(c)
generating a max-pooled feature map and an average-pooled feature map by performing max pooling and average pooling on the convolution feature map;
Lee teaches generating a maximum pooling feature map and an average pooling feature map by performing maximum pooling and average pooling on the convolutional feature map [0058].
(d)
acquiring an attention map by combining the max-pooled feature map and the average-pooled feature map and applying a nonlinear function to a combination of the max-pooled feature map and the average-pooled feature map;
Additionally, Lee teaches the mask generation module (30) may obtain an attention map by applying a nonlinear function to the combined maximum pooling feature map and average pooling feature map [0059].
(e)
acquiring an up-sampling result having a second resolution higher than the first resolution by performing up-sampling on a result of multiplying the attention map and the convolution feature map; and
Lee teaches the mask generation module (30) may multiply the attention map (35) and the convolutional feature map (32) and then perform up-sampling on the multiplied result (16) to obtain the up-sampling result (37) [0080].
(f)
generating the mask by performing binary classification on the up-sampling result.
Lee teaches performing a binary classification on the multiplied result to generate the mask (38) [0080].
The same motivation to combine as taught in the rationale of claim 2 is incorporated herein.
RE claim 7, in further view of Lee, Lee teaches wherein extracting the first feature map includes:
(a)
extracting multiple feature maps for respective layers in a backbone network;
In Fig. 2, Lee teaches a feature map extraction module (10) of an object detection system that may generate a feature pyramid (13) from a backbone network (11) [0063]. The feature map extraction module (10) may extract a feature map having multiple resolutions using the feature pyramid (13) [0044, 0065-0066].
(b)
forming a feature pyramid for fusing information of the multiple feature maps for the respective layers by adding the extracted multiple feature maps for the respective layers in reverse order; and
The feature map extraction module (10) may construct a feature pyramid that combines information of feature maps of each layer from the input image (IMG1), and use the feature pyramid to extract the feature map having multiple resolutions [0047]. The feature pyramid may be configured by extracting feature maps for each layer from a backbone network and adding the feature maps for each layer in reverse order [0048].
(c)
extracting the first feature map having multi-resolution for the image using the feature pyramid.
The feature map extraction module (10) may extract a feature map having multiple resolutions using the feature pyramid (13) [0044, 0065-0066].
It would have been obvious before the effective filing date of the claimed invention to utilize the object detection and segmentation of Lee within the method/system of Rao because the method of Lee is based on points without using a predefined anchor box that requires a high computation amount and memory usage. This makes it possible to achieve efficiency in terms of computational amount and memory occupancy [Lee: 0061]. Furthermore, it is possible to implement real-time object detection and segmentation in various fields based on platforms with little computing power [0061].
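For illustration of the feature pyramid construction described for claim 7, the following sketch adds per-layer backbone feature maps in reverse order (deepest layer first), up-sampling as needed so information is fused across resolutions; the equal channel counts and specific resolutions are assumptions for the example, not Lee's exact pyramid.

import torch
import torch.nn.functional as F

def build_feature_pyramid(layer_maps):
    """Sketch only: fuse per-layer feature maps from a backbone by adding them
    in reverse order, producing multi-resolution feature maps."""
    pyramid = [layer_maps[-1]]                        # start from the deepest layer
    for fm in reversed(layer_maps[:-1]):              # walk remaining layers in reverse order
        top = F.interpolate(pyramid[0], size=fm.shape[-2:], mode="nearest")
        pyramid.insert(0, fm + top)                   # fuse coarser information by addition
    return pyramid                                    # multi-resolution feature maps

# hypothetical backbone outputs at three resolutions, all with 256 channels
maps = [torch.randn(1, 256, 64, 64),
        torch.randn(1, 256, 32, 32),
        torch.randn(1, 256, 16, 16)]
pyramid = build_feature_pyramid(maps)
print([p.shape[-1] for p in pyramid])  # [64, 32, 16]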
RE claim 9, claim 9 recites similar limitations as claim 2 but in system form. Therefore, the same rationale used for claim 2 is applied.
RE claim 10, claim 10 recites similar limitations as claim 3 but in system form. Therefore, the same rationale used for claim 3 is applied.
RE claim 11, claim 11 recites similar limitations as claim 4 but in system form. Therefore, the same rationale used for claim 4 is applied.
RE claim 12, claim 12 recites similar limitations as claim 5 but in system form. Therefore, the same rationale used for claim 5 is applied.
RE claim 13, claim 13 recites similar limitations as claim 6 but in system form. Therefore, the same rationale used for claim 6 is applied.
RE claim 14, claim 14 recites similar limitations as claim 7 but in system form. Therefore, the same rationale used for claim 7 is applied.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE L SAMS, whose direct telephone number is (571) 272-7661 and whose email address is michelle.sams@uspto.gov.
The examiner is currently part-time and can be reached Monday through Friday, 5:30 am to 9:30 am.
Examiner interviews are available via telephone and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee M. Tung can be reached on (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHELLE L SAMS/
Primary Examiner, Art Unit 2611
26 January 2026