Prosecution Insights
Last updated: April 19, 2026
Application No. 18/599,064

INTELLIGENT INDUSTRIAL WORKSHOP INSPECTION BASED ON ARTIFICIAL INTELLIGENCE

Non-Final OA — §103, §112
Filed
Mar 07, 2024
Examiner
CHEN, JOSHUA NMN
Art Unit
2665
Tech Center
2600 — Communications
Assignee
International Business Machines Corporation
OA Round
1 (Non-Final)
85%
Grant Probability (Favorable)
1-2
OA Rounds
2y 11m
To Grant
99%
With Interview

Examiner Intelligence

Grants 85% — above average
85%
Career Allow Rate
34 granted / 40 resolved
+23.0% vs TC avg
Strong +26% interview lift
+26.1%
Interview Lift (with vs. without an interview, across resolved cases)
Typical timeline
2y 11m
Avg Prosecution
20 currently pending
Career history
60
Total Applications
across all art units

Statute-Specific Performance

§101
18.7%
-21.3% vs TC avg
§103
52.0%
+12.0% vs TC avg
§102
15.7%
-24.3% vs TC avg
§112
12.0%
-28.0% vs TC avg
Black line = Tech Center average estimate • Based on career data from 40 resolved cases

Office Action

§103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 03/07/2024 was filed and is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Status

Claims 1-20 are pending in the present application. Claims 2-3, 9-10, and 16-17 are rejected under 35 USC 112(b). Claims 1, 4, 8, and 11 are rejected under 35 USC 103 as being unpatentable over Wang (Vision-assisted behavior-based construction safety: Integrating computer vision and natural language processing) in view of Linder et al. (WO 2024/220444 A2). Claims 5 and 12 are rejected under 35 USC 103 as being unpatentable over Wang in view of Linder and PATEL et al. (US 2023/0162502 A1). Claims 6-7 and 13-14 are rejected under 35 USC 103 as being unpatentable over Wang in view of Linder and Xia et al. (Maintenance planning recommendation of complex industrial equipment based on knowledge graph and graph neural network). Claims 15 and 18 are rejected under 35 USC 103 as being unpatentable over Wang in view of Linder and MAN (US 2019/0236370 A1). Claim 19 is rejected under 35 USC 103 as being unpatentable over Wang in view of Linder, MAN, and PATEL. Claim 20 is rejected under 35 USC 103 as being unpatentable over Wang in view of Linder, MAN, and Xia. No prior art rejection is currently applied to claims 2-3, 9-10, and 16-17 due to the 112(b) indefiniteness rejection.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 2-3, 9-10, and 16-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding claims 2, 9, and 16, the claims recite: "selecting a subset of video frames from the plurality of video frames that show the operation that is performed from a set of video frames included in the video based on execution of a neural network on the set of video frames and the description of the operation that is performed." It is unclear to the examiner which set of frames is being selected and how the set of video frames is selected. The definitions of the video frames rely upon each other, and the range of each set of video frames is unclear. As such, claims 2, 9, and 16 are rejected under 35 U.S.C. 112(b) as indefinite. Claims 3, 10, and 17 are rejected under 35 U.S.C. 112(b) as indefinite for depending upon claims 2, 9, and 16, respectively.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 4, 8, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (Vision-assisted behavior-based construction safety: Integrating computer vision and natural language processing, hereinafter Wang) in view of Linder et al. (WO 2024/220444 A2, hereinafter Linder).

Regarding claims 1 and 8, Wang discloses (Claim 1) a computer-implemented method comprising, and (Claim 8) a computer system comprising: a processor set; a set of one or more computer-readable storage media; and program instructions, collectively stored in the set of one or more storage media (P. 29, 3.3.2 Implementations: "The experiments ran on a workstation with Xeon E5-2678 CPU and GeForce RTX 2080 Ti GPU. The software environment is configured as ubuntu 18.04, Python 3.8, PyTorch 1.7.1, CUDA 11.0, Detectron2, and other related packages."), for causing the processor set to perform computer operations to:

generate a description of the plurality of video frames based on execution of an artificial intelligence (AI) model on the plurality of video frames (Figure 3 Captioning and Dense Captioning, Figure 10 Region caption, P. 46: "Dense captioning: The region features are forwarded to a long short-term memory (LSTM) based recurrent neural network (RNN) to realize semantic information recognition and extraction. Figure 12 illustrates how the semantic information is recognized and how the region caption is generated in the LSTM cells. The LSTM cell performs several calculations to encode the necessary information as the hidden state. The hidden state transfers the information between cells, meaning that the previous cell's hidden state is used to calculate the hidden state of the current time step.");

determine a safety issue with respect to the operation based on execution of a language machine learning model on the description of the plurality of video frames and text from a safety specification for the industrial equipment (Figure 10 Semantic Similarity, Figure 11 Semantic Similarity, Hazards reasoning and identification, Figure 17 Thresholding and Rule Compliance, P. 52: "4.2.5 Rule Compliance Checking: Rule compliance checking is performed by measuring the semantic similarity of the embedding vectors of captions and rules. As discussed in section 3.3, word embedding is a technique that represents words in a continuous, dense vector space, where each dimension represents some semantic or syntactic feature of the word. These vectors, which in the context of this study describe the region captions and the safety rule corpus, constitute a vector space that enables the evaluation of the semantic similarity between these two textual pieces of information. This process is illustrated in Figure 17 intuitively."); and

present an identifier of the safety issue via an output device of a computer associated with the industrial equipment (P. 58: "4.4.1 Model Predictions and Evaluation Results: The rule reasoning accuracy was evaluated to determine whether the model correctly predicted the safety rule that had been violated. As can be seen from the analysis results, the corresponding score was almost identical to hazard identification accuracy." P. 95, Para. 2: "The CBIR system provides a valuable tool for construction site managers to improve their decision-making processes and identify potential safety hazards. Overall, the contribution of a CBIR system for construction images can enhance the efficiency and effectiveness of construction site management and contribute to a safer working environment for construction workers."; Since the computer output device merely has to be associated with the industrial equipment, and a system must run on a computer, the computer does not have to be directly attached to the equipment. A system being used to manage the site on which the equipment sits is enough of an association.).

However, Wang does not explicitly disclose identifying a plurality of video frames within a video of an operation that is performed with industrial equipment, or processing multiple frames. Linder teaches identifying a plurality of video frames within a video of an operation that is performed with industrial equipment, and processing multiple frames (Para [0167]: "More generally, any computer vision or other image or video processing techniques suitable for segmenting and/or identifying actions in a video may be used to parse the video for further processing."; Para [0168]: "In one aspect, parsing the video may include extracting feature data for segments including action segments, keyframe detections correlated to action segments, and natural language action descriptions correlated to action segments. A variety of techniques are known for keyframe extraction and related functions, such as motion detection, feature extraction, clustering, and the like, as well as the use of machine learning and deep learning modules to provide keyframe segmentation using, e.g., supervised learning, temporal models, and the like."). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang with the keyframe selection and motion detection using machine learning and deep learning models of Linder, since Wang at P. 10 suggests that the image captioning and safety compliance detection techniques of Wang can be extended to video, videos being sequences of images. In addition, Linder in Para. [0170] describes the parsed video as being transformed into an intermediate representation, which may include text descriptions among other forms.

Regarding claims 4 and 11, dependent upon claims 1 and 8 respectively, Wang in view of Linder teaches everything regarding claims 1 and 8. Wang further teaches masking content within the plurality of video frames to remove content which is unrelated to the industrial equipment based on execution of a segmentation model, prior to the execution of the AI model on the plurality of video frames (Figure 3, P. 45: "Region proposal: The extracted image feature map is forwarded to a region proposal network (RPN) to generate target regions in the image… After obtaining the subregions' coordinates B and scores s, the final target subregions are filtered out by applying a threshold score, whereas the image feature vectors of the subregions are obtained using another small CNN"; P. 46: "Dense captioning: The region features are forwarded to a long short-term memory (LSTM)-based recurrent neural network (RNN) to realize semantic information recognition and extraction."; Although the region proposal here draws bounding boxes rather than masks, it achieves the same result as masking, since the captioning is based on what is inside the bounding boxes.).
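To make the rule-compliance step at the heart of this rejection concrete, the following is a minimal sketch of caption-versus-rule matching by embedding similarity, as the Wang P. 52 passage quoted above describes it. The `embed` stub is a hypothetical stand-in for whatever sentence-embedding model a real system would use, and the 0.6 threshold is illustrative; nothing here is taken from Wang's actual implementation.

```python
# Minimal sketch of rule-compliance checking by semantic similarity:
# embed each region caption and each safety rule, then report the rule
# whose embedding is closest to the caption when the cosine similarity
# clears a threshold. `embed` is a hypothetical placeholder.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical placeholder: a real system would call a trained
    # sentence-embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_rule(caption: str, rules: list[str], threshold: float = 0.6):
    """Return (best_rule, score) if any rule clears the threshold, else None."""
    scored = [(rule, cosine(embed(caption), embed(rule))) for rule in rules]
    best_rule, best_score = max(scored, key=lambda rs: rs[1])
    return (best_rule, best_score) if best_score >= threshold else None
```

The design point the examiner relies on is that the comparison happens entirely in text space: once frames are captioned, checking them against a safety specification reduces to nearest-neighbor search over rule embeddings.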
Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (Vision-assisted behavior-based construction safety: Integrating computer vision and natural language processing, hereinafter Wang) in view of Linder et al. (WO 2024/220444 A2, hereinafter Linder) and PATEL et al. (US 2023/0162502 A1, hereinafter Patel).

Regarding claims 5 and 12, dependent upon claims 4 and 11 respectively, Wang in view of Linder teaches everything regarding claims 4 and 11. However, Wang in view of Linder does not explicitly teach that the masking content comprises masking the plurality of video frames based on a description of an object of interest which is input into the segmentation model. Patel teaches that the masking content comprises masking the plurality of video frames based on a description of an object of interest which is input into the segmentation model (Fig. 2, Para [0026]: "At numeral 2, the query parser 104 can parse the text of the user input 102. For example, the query parser 104 can include a trained machine learning model that applies natural language processing to the text of user input 102 to identify an object and an intended video edit. The object may include a person, place, or other feature of the video scene such as, but not limited to: 'person with a red jacket,' or 'dog swimming,' or 'motorcyclist and the bike.'"; Para [0030]: "As shown in FIG. 2, a video segmentation system can receive a video that includes a set of frames 202. The video segmentation system also receives a text input 204."; Para [0031]: "Once the object keyframes are identified, the video segmentation system can generate a set of reference keyframes 212A-C (i.e., reference keyframe 212A, reference keyframe 212B, and reference keyframe 212C, collectively 'reference keyframes 212A-C') based on a ranking or other categorization of the keyframes based on an importance of the object keyframe."). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang in view of Linder with Patel's application of a mask to a video based on a text description, to effectively increase the flexibility of the system when identifying objects.
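For reference, here is a minimal sketch of the text-conditioned masking Patel is cited for: a text query names the object of interest, a segmentation model produces a per-frame mask for it, and everything outside the mask is removed before captioning. The `segment` callable is a hypothetical stand-in for Patel's query-parser-plus-segmentation pipeline, and the query string is an invented example.

```python
# Minimal sketch of masking video frames to an object named in text.
# `segment` is a hypothetical callable standing in for a text-conditioned
# segmentation model; it returns a boolean (H, W) mask per frame.
from typing import Callable
import numpy as np

def mask_to_object(frames: list[np.ndarray], query: str,
                   segment: Callable[[np.ndarray, str], np.ndarray]) -> list[np.ndarray]:
    """Zero out everything outside the object described by `query`."""
    masked = []
    for frame in frames:                      # frame: (H, W, 3) uint8
        mask = segment(frame, query)          # e.g. "worker near the press brake"
        masked.append(np.where(mask[..., None], frame, 0))
    return masked
```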
Claims 6-7 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (Vision-assisted behavior-based construction safety: Integrating computer vision and natural language processing, hereinafter Wang) in view of Linder et al. (WO 2024/220444 A2, hereinafter Linder) and Xia et al. (Maintenance planning recommendation of complex industrial equipment based on knowledge graph and graph neural network, hereinafter Xia).

Regarding claims 6 and 13, dependent upon claims 1 and 8 respectively, Wang in view of Linder teaches everything regarding claims 1 and 8. However, Wang in view of Linder does not explicitly teach generating a knowledge graph based on the text from the safety specification, wherein the knowledge graph comprises nodes representing pieces of equipment, and edges between the nodes represent operational dependencies between the pieces of equipment. Xia teaches generating a knowledge graph based on the text from the safety specification, wherein the knowledge graph comprises nodes representing pieces of equipment, and edges between the nodes represent operational dependencies between the pieces of equipment (Fig. 1 Data Acquisition: Multi-Modal Data, Fig. 2, P. 3 Para. 2: "The roadmap for MKG construction is depicted in Fig. 1. Maintenance knowledge is usually heterogeneous, including signal data, images, maintenance logs, and domain-related data, thus requiring separate processing and aligned fusion."; Since the data acquisition of Fig. 1 includes multi-modal data with natural-language text, it would have been reasonable for a person of ordinary skill in the art to also include the safety manual of a piece of equipment as part of the data used to construct the knowledge graph). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang in view of Linder with Xia's generation of a knowledge graph with components as nodes and relationships between components as edges, among other aspects of Xia, to effectively increase the readability of the relationships between components.

Regarding claims 7 and 14, dependent upon claims 6 and 13 respectively, Wang in view of Linder and Xia teaches everything regarding claims 6 and 13. Xia further teaches that determining the safety issue with respect to the operation that is performed is based on execution of the language machine learning model on the knowledge graph (P. 6 Para. 1: "Based on the RGCN structure [41,53], this paper proposed an ACRGCN, where involved graph attention network and compressed mechanism can overcome the sparsity and the long-tail distribution problem of MKG above. As shown in Fig. 4, the left upper side is the input of this model, and there are node feature matrix and graph structure matrix. It is noted that the graph structure matrix is used for RGCN module and DeRGCN module (Eq. (7)). The input data will travel through a graph attention mechanism after passing through a number of encoder modules, as the right side in the Fig. 4. Then, the intermediate result will go through the Decoder section, where the opposite computation in the encoder phase will be done. Meanwhile, Encoder and Decoder have cross-calculation by a residual block, where same layer of Encoder and Decoder will be joined by an add method. Finally, the proposed model can predict practical linkages according to trigger node."; P. 7: "The DistMult model creates a ranking list of predictive links, which provides references for operators to handle the maintenance tasks. For instance, direct cause and root cause can help operators understand why the fault will occur."; P. 11: "Besides, a specific mechanism is established to maximize MKG utilization while continually increasing KG quality, as depicted in Fig. 14. The operators send a query sentence in the QA system, which converts to searching triggers in the KG. These searching triggers will transmit to the QA system to generate an answer for maintenance tasks. However, suppose the KG cannot search for anything, the proposed ACRGCN model will be activated to predict the potential link as the recommendation, where the KG will be updated based on the engineer's feedback. Moreover, the proposed ACRGCN model can periodically optimize KG.").
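As a concrete illustration of the claimed graph structure and of the DistMult ranking Xia mentions, the sketch below builds a small equipment graph (nodes are pieces of equipment, edges are operational dependencies) and scores a candidate link. The triples, relation names, and random embeddings are invented for illustration; Xia's ACRGCN learns its embeddings from maintenance data rather than sampling them.

```python
# Minimal sketch: equipment knowledge graph plus a DistMult-style link
# scorer. Triples and embeddings are illustrative, not taken from Xia.
import numpy as np
import networkx as nx

triples = [
    ("conveyor_A", "feeds", "press_B"),
    ("press_B", "guarded_by", "light_curtain_1"),
    ("press_B", "interlocked_with", "e_stop_circuit"),
]

kg = nx.DiGraph()
for head, rel_name, tail in triples:
    kg.add_edge(head, tail, relation=rel_name)  # nodes = equipment, edges = dependencies

# DistMult scores a candidate triple (h, r, t) as sum(e_h * w_r * e_t);
# higher scores rank the link as more plausible.
dim, rng = 64, np.random.default_rng(0)
ent = {n: rng.standard_normal(dim) for n in kg.nodes}
rel = {r: rng.standard_normal(dim) for _, r, _ in triples}

def distmult(h: str, r: str, t: str) -> float:
    return float(np.sum(ent[h] * rel[r] * ent[t]))

print(distmult("conveyor_A", "feeds", "press_B"))
```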
Claims 15 and 18 are rejected under 35 USC 103 as being unpatentable over Wang (Vision-assisted behavior-based construction safety: Integrating computer vision and natural language processing) in view of Linder et al. (WO 2024/220444 A2) and MAN (US 2019/0236370 A1, hereinafter Man).

Regarding claim 15, Wang discloses a computer program product comprising: a set of one or more computer-readable storage media; and program instructions, collectively stored in the set of one or more computer-readable storage media (P. 29, 3.3.2 Implementations: "The experiments ran on a workstation with Xeon E5-2678 CPU and GeForce RTX 2080 Ti GPU. The software environment is configured as ubuntu 18.04, Python 3.8, PyTorch 1.7.1, CUDA 11.0, Detectron2, and other related packages."), for causing a processor set to perform computer operations comprising:

generating a description of the plurality of video frames based on execution of an artificial intelligence (AI) model on the plurality of video frames (Figure 3 Captioning and Dense Captioning, Figure 10 Region caption, P. 46: "Dense captioning: The region features are forwarded to a long short-term memory (LSTM) based recurrent neural network (RNN) to realize semantic information recognition and extraction. Figure 12 illustrates how the semantic information is recognized and how the region caption is generated in the LSTM cells. The LSTM cell performs several calculations to encode the necessary information as the hidden state. The hidden state transfers the information between cells, meaning that the previous cell's hidden state is used to calculate the hidden state of the current time step."); and

determining a safety issue with respect to the operation based on execution of a language machine learning model on the description of the plurality of video frames and text from a safety specification for the industrial equipment (Figure 10 Semantic Similarity, Figure 11 Semantic Similarity, Hazards reasoning and identification, Figure 17 Thresholding and Rule Compliance, P. 52: "4.2.5 Rule Compliance Checking: Rule compliance checking is performed by measuring the semantic similarity of the embedding vectors of captions and rules. As discussed in section 3.3, word embedding is a technique that represents words in a continuous, dense vector space, where each dimension represents some semantic or syntactic feature of the word. These vectors, which in the context of this study describe the region captions and the safety rule corpus, constitute a vector space that enables the evaluation of the semantic similarity between these two textual pieces of information. This process is illustrated in Figure 17 intuitively.").

However, Wang does not explicitly disclose identifying a plurality of video frames within a video of an operation that is performed with industrial equipment; displaying an identifier of the safety issue on a display screen associated with the industrial equipment; or processing multiple frames. Linder teaches identifying a plurality of video frames within a video of an operation that is performed with industrial equipment, and processing multiple frames (Para [0167]: "More generally, any computer vision or other image or video processing techniques suitable for segmenting and/or identifying actions in a video may be used to parse the video for further processing."; Para [0168]: "In one aspect, parsing the video may include extracting feature data for segments including action segments, keyframe detections correlated to action segments, and natural language action descriptions correlated to action segments. A variety of techniques are known for keyframe extraction and related functions, such as motion detection, feature extraction, clustering, and the like, as well as the use of machine learning and deep learning modules to provide keyframe segmentation using, e.g., supervised learning, temporal models, and the like."). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang with the keyframe selection and motion detection using machine learning and deep learning models of Linder, since Wang at P. 10 suggests that the image captioning and safety compliance detection techniques of Wang can be extended to video, videos being sequences of images. In addition, Linder in Para. [0170] describes the parsed video as being transformed into an intermediate representation, which may include text descriptions among other forms.

However, Wang in view of Linder does not explicitly teach displaying an identifier of the safety issue on a display screen associated with the industrial equipment. Man teaches displaying an identifier of the safety issue on a display screen associated with the industrial equipment (Para [0164]: "detecting the equipment failure or an anomaly may be performed by employing a changepoint detection algorithm."; Para [0165]: "The user interface may be configured to generate and display visual and audio alerts, as well as providing interactivity mechanisms enabling interactions between users and the decision support system. Enabling the interactions may include generating and providing the functionalities to allow the users to submit inputs to the decision support system, control tasks executed by the decision support system, and receive results from the decision support system."). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang in view of Linder with Man's output of alerts and warnings to a display when an equipment failure or anomaly has occurred, to effectively increase the adaptability of the system in different situations.

Regarding claim 18, dependent upon claim 15, Wang in view of Linder and Man teaches everything regarding claim 15. Wang further discloses masking content within the plurality of video frames to remove content which is unrelated to the industrial equipment based on execution of a segmentation model, prior to the execution of the AI model on the plurality of video frames (Figure 3, P. 45: "Region proposal: The extracted image feature map is forwarded to a region proposal network (RPN) to generate target regions in the image… After obtaining the subregions' coordinates B and scores s, the final target subregions are filtered out by applying a threshold score, whereas the image feature vectors of the subregions are obtained using another small CNN"; P. 46: "Dense captioning: The region features are forwarded to a long short-term memory (LSTM)-based recurrent neural network (RNN) to realize semantic information recognition and extraction."; Although the region proposal here draws bounding boxes rather than masks, it achieves the same result as masking, since the captioning is based on what is inside the bounding boxes.).
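The Linder passages quoted in these rejections describe keyframe extraction driven by motion detection. Below is a minimal sketch of that idea under two stated assumptions: frames arrive as grayscale arrays, and a hand-picked change threshold substitutes for the learned models Linder actually mentions.

```python
# Minimal sketch of motion-based keyframe selection: keep a frame when its
# mean absolute pixel change from the last kept frame exceeds a threshold.
import numpy as np

def select_keyframes(frames: list[np.ndarray], threshold: float = 12.0) -> list[int]:
    """Return indices of keyframes; `frames` are (H, W) grayscale arrays."""
    if not frames:
        return []
    keep = [0]                                  # always keep the first frame
    prev = frames[0].astype(np.float32)
    for i in range(1, len(frames)):
        cur = frames[i].astype(np.float32)
        if np.mean(np.abs(cur - prev)) > threshold:
            keep.append(i)
            prev = cur
    return keep
```

The selected indices are what a downstream captioning model would consume, which is the "plurality of video frames" mapping the examiner draws between Linder and the claims.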
Claim 19 is rejected under 35 USC 103 as being unpatentable over Wang (Vision-assisted behavior-based construction safety: Integrating computer vision and natural language processing) in view of Linder et al. (WO 2024/220444 A2), MAN (US 2019/0236370 A1), and PATEL et al. (US 2023/0162502 A1).

Regarding claim 19, dependent upon claim 18, Wang in view of Linder and Man teaches everything regarding claim 18. However, Wang in view of Linder and Man does not explicitly teach masking the plurality of video frames based on a description of an object of interest which is input into the segmentation model. Patel teaches masking the plurality of video frames based on a description of an object of interest which is input into the segmentation model (Fig. 2, Para [0026]: "At numeral 2, the query parser 104 can parse the text of the user input 102. For example, the query parser 104 can include a trained machine learning model that applies natural language processing to the text of user input 102 to identify an object and an intended video edit. The object may include a person, place, or other feature of the video scene such as, but not limited to: 'person with a red jacket,' or 'dog swimming,' or 'motorcyclist and the bike.'"; Para [0030]: "As shown in FIG. 2, a video segmentation system can receive a video that includes a set of frames 202. The video segmentation system also receives a text input 204."; Para [0031]: "Once the object keyframes are identified, the video segmentation system can generate a set of reference keyframes 212A-C (i.e., reference keyframe 212A, reference keyframe 212B, and reference keyframe 212C, collectively 'reference keyframes 212A-C') based on a ranking or other categorization of the keyframes based on an importance of the object keyframe."). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang in view of Linder and Man with Patel's application of a mask to a video based on a text description, to effectively increase the flexibility of the system when identifying objects.
Claim 20 is rejected under 35 USC 103 as being unpatentable over Wang (Vision-assisted behavior-based construction safety: Integrating computer vision and natural language processing) in view of Linder et al. (WO 2024/220444 A2), MAN (US 2019/0236370 A1), and Xia et al. (Maintenance planning recommendation of complex industrial equipment based on knowledge graph and graph neural network).

Regarding claim 20, dependent upon claim 15, Wang in view of Linder and Man teaches everything regarding claim 15. However, Wang in view of Linder and Man does not explicitly teach generating a knowledge graph based on the text from the safety specification, wherein the knowledge graph comprises nodes representing pieces of equipment, and edges between the nodes represent operational dependencies between the pieces of equipment. Xia teaches generating a knowledge graph based on the text from the safety specification, wherein the knowledge graph comprises nodes representing pieces of equipment, and edges between the nodes represent operational dependencies between the pieces of equipment (Fig. 1 Data Acquisition: Multi-Modal Data, Fig. 2, P. 3 Para. 2: "The roadmap for MKG construction is depicted in Fig. 1. Maintenance knowledge is usually heterogeneous, including signal data, images, maintenance logs, and domain-related data, thus requiring separate processing and aligned fusion."; Since the data acquisition of Fig. 1 includes multi-modal data with natural-language text, it would have been reasonable for a person of ordinary skill in the art to also include the safety manual of a piece of equipment as part of the data used to construct the knowledge graph). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang in view of Linder and Man with Xia's generation of a knowledge graph with components as nodes and relationships between components as edges, among other aspects of Xia, to effectively increase the readability of the relationships between components.

Relevant Prior Art Directed to State of Art

Qin et al. (US 2021/0081497 A1, hereinafter Qin) is prior art not applied in the rejections above. Qin discloses a method of detecting key messages for a video, in which a processor builds a role model, based on data from one or more data sources, with an identification feature of each role in the video. Zhang et al. (Automatic construction site hazard identification integrating construction scene graphs with BERT-based domain knowledge, hereinafter Zhang) is prior art not applied in the rejections above. Zhang discloses an automatic hazard-inference method using construction scene graphs and a C-BERT network.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSHUA CHEN, whose telephone number is (703) 756-5394. The examiner can normally be reached M-Th 9:30 am-4:30 pm ET and F 9:30 am-2:30 pm ET. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, STEPHEN R KOZIOL, can be reached at (408) 918-7630. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/J. C./
Examiner, Art Unit 2665

/Stephen R Koziol/
Supervisory Patent Examiner, Art Unit 2665

Prosecution Timeline

Mar 07, 2024
Application Filed
Feb 13, 2026
Non-Final Rejection — §103, §112
Mar 16, 2026
Interview Requested
Mar 24, 2026
Applicant Interview (Telephonic)
Mar 24, 2026
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602747
METHOD AND APPARATUS FOR DENOISING A LOW-LIGHT IMAGE
2y 5m to grant Granted Apr 14, 2026
Patent 12592090
COMPENSATION OF INTENSITY VARIANCES IN IMAGES USED FOR COLONY ENUMERATION
2y 5m to grant Granted Mar 31, 2026
Patent 12579614
IMAGING DEVICE
2y 5m to grant Granted Mar 17, 2026
Patent 12579678
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT
2y 5m to grant Granted Mar 17, 2026
Patent 12573065
Vision Sensing Device and Method
2y 5m to grant Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
85%
Grant Probability
99%
With Interview (+26.1%)
2y 11m
Median Time to Grant
Low
PTA Risk
Based on 40 resolved cases by this examiner. Grant probability derived from career allow rate.
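For transparency, here is one plausible reading of how these headline figures compose, sketched only from the numbers shown on this page; the tool's actual model may weight examiner, art-unit, and statute data differently.

```python
# Sketch of how the page's figures plausibly compose; these are assumptions,
# not the tool's actual model. 34 of 40 resolved cases granted; +26.1-point
# interview lift; displayed probabilities capped at 99%.
granted, resolved, interview_lift = 34, 40, 0.261

base = granted / resolved                          # 0.85 -> the 85% shown
with_interview = min(base + interview_lift, 0.99)  # capped -> the 99% shown

print(f"base {base:.0%}, with interview {with_interview:.0%}")
```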
