Detailed Action
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. See 35 U.S.C. § 100 (note).
Oath/Declaration
The receipt of Oath/Declaration is acknowledged.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Preliminary Amendment
The Preliminary Amendment submitted on August 9, 2024, containing amendments to the specification, is acknowledged.
Drawings
The drawing(s) filed on April 3, 2024 are accepted by the Examiner.
Status of Claims
Claims 1–20 are pending in this application.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on April 3, 2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Patent Subject Matter Eligibility
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1–20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
In January 2019 (updated October 2019), the USPTO released new examination guidelines setting forth a two-step inquiry for determining whether a claim is directed to non-statutory subject matter. According to the guidelines, a claim is directed to non-statutory subject matter if:
STEP 1: the claim does not fall within one of the four statutory categories of invention (process, machine, manufacture or composition of matter), or
STEP 2: the claim recites a judicial exception, e.g., an abstract idea, without reciting additional elements that amount to significantly more than the judicial exception, as determined using the following analysis:
STEP 2A (PRONG 1): Does the claim recite an abstract idea, law of nature, or natural phenomenon?
STEP 2A (PRONG 2): Does the claim recite additional elements that integrate the judicial exception into a practical application?
STEP 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
Using the two-step inquiry, it is clear that the claims are directed toward non-statutory subject matter, as shown below:
STEP 1: Do the claims fall within one of the statutory categories? Yes.
Claims 1–8: Process (method).
Claims 9–20: Machine (apparatus comprising processors).
The claims are within a statutory category.
STEP 2A (PRONG 1): Is the claim directed to a law of nature, a natural phenomenon or an abstract idea? Yes, the claims are directed to an abstract idea.
With regard to STEP 2A (PRONG 1), the guidelines provide three groupings of subject matter that are considered abstract ideas:
Mathematical concepts – mathematical relationships, mathematical formulas or equations, mathematical calculations;
Certain methods of organizing human activity – fundamental economic principles or practices (including hedging, insurance, mitigating risk); commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations); managing personal behavior or relationships or interactions between people (including social activities, teaching, and following rules or instructions); and
Mental processes – concepts that are practicably performed in the human mind (including an observation, evaluation, judgment, opinion).
The claims recite abstract ideas in the “mathematical concepts” grouping (mathematical relationships, formulas, and calculations under the 2019 PEG) and the “mental processes” grouping (determining/classifying an output), insofar as they are reducible to data manipulation and mathematical scoring without a specific technological improvement to computer functionality.
Mathematical concepts:
“calculating respective similarities between a first feature map … and a plurality of second feature maps …” (Claim 2)
“converting the respective similarities … into corresponding scores” (Claim 3)
“convert the similarity … into a score based on a softmax function” (Claim 13)
“train … through a minimization of a loss calculated from a similarity …” (Claim 14)
“training … to increase the similarity … through plural training iterations” (Claim 8)
Data-only creation/manipulation:
“generating a plurality of text data by adding a plurality of candidate classes … to product text information” (Claim 1)
“generate a first feature map using an image model” and “generate a plurality of second feature maps using a text model” (Claim 9)
Mental process/result-oriented determination:
“determining whether the product is defective based on the calculated respective similarities” (Claim 2)
“outputting a candidate class … corresponding to a highest score” (Claim 3)
These recitations mirror the abstract idea cases such as SAP v. InvestPic (mathematical calculations/scoring), Digitech (data creation), Electric Power Group (collecting, analyzing, and displaying information), and In re TLI Commc’ns (classifying/storing images with metadata using generic hardware).
STEP 2A (PRONG 2): Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim does not recite additional elements that integrate the judicial exception into a practical application.
With regard to STEP 2A (prong 2), whether the claim recites additional elements that integrate the judicial exception into a practical application, the guidelines provide the following exemplary considerations that are indicative that an additional element (or combination of elements) may have integrated the judicial exception into a practical application:
an additional element reflects an improvement in the functioning of a computer, or an improvement to other technology or technical field;
an additional element that applies or uses a judicial exception to effect a particular treatment or prophylaxis for a disease or medical condition;
an additional element implements a judicial exception with, or uses a judicial exception in conjunction with, a particular machine or manufacture that is integral to the claim;
an additional element effects a transformation or reduction of a particular article to a different state or thing; and
an additional element applies or uses the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception.
While the guidelines further state that the exemplary considerations are not an exhaustive list and that there may be other examples of integrating the exception into a practical application, the guidelines also list examples in which a judicial exception has not been integrated into a practical application:
an additional element merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea;
an additional element adds insignificant extra-solution activity to the judicial exception; and
an additional element does no more than generally link the use of a judicial exception to a particular technological environment or field of use.
The claims are applied in the field of defect detection from product images, and they use “image model,” “text model,” “feature maps,” and “softmax” to decide defect status. However:
No specific improvement to computer functionality is recited (e.g., no change to memory architecture, processor operations, specialized encoder architecture, hybrid compute pipeline, latency reduction mechanism, bandwidth management, or image acquisition pipeline improvement).
No particular machine is claimed beyond “one or more processors,” “image model,” “text model,” and a “classifier” (generic computing components).
No transformation of an article is recited; the output is a classification/label (OK/NG or defect type).
The recited product text information and decision information are field-of-use/extra-solution data constraints (customer standards, lighting, process info) that inform the scoring but do not integrate the math into a technological process that changes how the computer itself operates.
The “softmax” and “similarity” scoring are standard ML post-processing. Joint training with loss minimization is likewise a conventional ML training paradigm.
Accordingly, the judicial exception is not integrated into a practical application under the 2019 PEG. The claims amount to applying mathematical scoring and ML classification to a particular field (industrial defect inspection) without reciting a technological improvement in the computer or imaging technology itself. See the October 2019 Update (field-of-use and extra-solution activity are insufficient).
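For illustration of the point that the recited similarity-and-softmax scoring is generic mathematics, the operations the claims invoke reduce to a few lines of elementary arithmetic. The following sketch is illustrative only; the embedding values and class names ("OK"/"NG") are hypothetical and not taken from the claims:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    """Convert raw similarity scores into a probability distribution."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical image embedding and per-class text embeddings
image_emb = [0.9, 0.1, 0.2]
text_embs = {"OK": [0.8, 0.2, 0.1], "NG": [0.1, 0.9, 0.4]}

# Similarity per candidate class, converted to scores, highest score selected
sims = {c: cosine_similarity(image_emb, e) for c, e in text_embs.items()}
probs = softmax(list(sims.values()))
predicted = max(zip(sims, probs), key=lambda kv: kv[1])[0]
```

As the sketch shows, the claimed “calculating,” “converting,” and “outputting” steps are standard mathematical post-processing performable on any generic computer.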
STEP 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No, the claim does not recite additional elements that amount to significantly more than the judicial exception.
With regard to STEP 2B, whether the claims recite additional elements that provide significantly more than the recited judicial exception, the guidelines specify that the pre-guideline Step 2B procedure remains in effect. Specifically, examiners should continue to consider whether an additional element or combination of elements:
adds a specific limitation or combination of limitations that are not well-understood, routine, conventional activity in the field, which is indicative that an inventive concept may be present; or
simply appends well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception, which is indicative that an inventive concept may not be present.
Computer functions claimed in a merely generic manner (i.e., at a high level of generality) have been recognized as well-understood, routine, and conventional. Here, the additional elements in the claims consist of generic processors, a generic “image model,” a generic “text model,” a “classifier” implementing a softmax function, and general ML training steps (pairs of image/text data, similarity-based loss minimization). On the present record, these appear to be well-understood, routine, and conventional (WURC) in the ML/CV art:
Feature-map extraction by an “image encoder” and “text encoder,” similarity computation, softmax-based class scoring, and joint training via iterative loss minimization are standard ML techniques widely documented in the literature (e.g., standard CV pipelines, multimodal similarity learning, and classification routines).
The claims do not specify any unconventional network architecture, parameterization, hardware acceleration scheme, or non-generic computing configuration that departs from routine ML practice. Therefore, the claims do not add “significantly more” to the judicial exception under Step 2B.
CONCLUSION
Thus, since claims 1, 9, and 19: (a) are directed toward an abstract idea, (b) do not recite additional elements that integrate the judicial exception into a practical application, and (c) do not recite additional elements that amount to significantly more than the judicial exception, claims 1, 9, and 19 are directed towards non-statutory subject matter.
Further, dependent claims 2–8, 10–18 and 20 further limit the abstract idea without integrating it into a practical application or adding significantly more. Each of the claimed limitations either expands upon or adds 1) a new mental process, 2) a new additional element, 3) a previously presented mental process, and/or 4) a previously presented additional element. As such, claims 2–8, 10–18 and 20 are similarly rejected as being directed towards non-statutory subject matter.
Art Rejections
Obviousness Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1–3, 5–13, 15, 16 and 19 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of “Learning Transferable Visual Models From Natural Language Supervision” (Radford et al., hereinafter “Radford”) in view of “A review on modern defect detection models using DCNNs – Deep convolutional neural networks” (Tulbure et al., hereinafter “Tulbure”).
Claims 4, 17, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over “Radford” in view of “Tulbure” as applied to claims 1 and 9 above, and further in view of US Patent Application Publication 2013/0170733 (published Jul. 4, 2013) (“Leu”).
With respect to claim 1, Radford discloses a processor-implemented method (Radford Abstract – teaches computer vision systems trained to predict a fixed set of predetermined object categories, and a processor-implemented method for classifying images using a dual-model architecture: an image encoder and a text encoder …), the method comprising:
generating a plurality of text data by adding a plurality of candidate classes (Radford, Sec. 3.1.2, 3.1.4 and Fig. 1, 3 – teaches generating a set of text prompts by combining class labels (e.g., object names, categories, or status such as “defective” or “OK”) with descriptive text to form input to a text encoder for classification. Prompt engineering (Sec. 3.1.4) makes clear that such candidate classes can be systematically combined with context …),
detecting whether the product image represents a defective product based on a similarity between feature maps generated by an image model and a text model (Radford, Sec. 2.3, Fig. 1, 3 – teaches an image encoder (“image model”) and text encoder (“text model”) that encode images and text into a shared embedding space and use similarity for prediction/classification (CLIP dual-encoder framework; zero-shot classification via image-text similarity …)).
However, Radford fails to explicitly disclose defect detection, candidate classes which indicate whether a product is defective being added to product text information, and detecting whether a product image represents a defective product using an image and a text.
Tulbure, working in the same field of endeavor, recognizes this problem and teaches defect detection (Tulbure, Introduction p. 34 – discloses multiple industrial applications evolved from it, one of which is defect detection in industrial processes …), candidate classes which indicate whether a product is defective added to product text information, and detecting whether a product image represents a defective product using an image and a text (Tulbure; see, e.g., Abstract, p. 33–38, 44–46, and “Deep learning era applications,” “Defect detection deep learning frameworks” – explicitly teaches the use of defect detection in industrial and manufacturing settings, where candidate classes such as “defective” and “non-defective” (or “OK”/“NG”) are used to indicate whether a product is defective. Tulbure teaches that defect detection can be performed by providing a product image to a deep learning model that classifies the image as “defective” or “non-defective”, and that such models can use additional product information as context or input …).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the invention of Radford with the defect detection application of Tulbure, since doing so would predictably and advantageously enable automated, accurate detection of defective products in manufacturing by using product images and defect-related class prompts.
Therefore, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
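The prompt construction for which Radford is cited (combining candidate class labels with contextual product text, delimited by a special character such as a comma, cf. Radford's templated prompts like “A photo of a {label}, a type of pet.”) can be sketched as follows. The field names and class labels below are hypothetical illustrations, not taken from the claims' actual data:

```python
# Hypothetical product text information fields and candidate classes
product_info = ["factory A", "line 3", "front surface"]
candidate_classes = ["OK", "NG-scratch", "NG-crack"]

def build_prompts(info, classes, sep=", "):
    """Join the product information fields with one candidate class per
    prompt, delimited by a special character (here a comma), in the manner
    of templated text prompts for a text encoder."""
    context = sep.join(info)
    return [f"{context}{sep}{label}" for label in classes]

# One prompt per candidate class, ready to be fed to a text encoder
prompts = build_prompts(product_info, candidate_classes)
```

Each resulting string pairs the same contextual fields with a different candidate class, so the downstream similarity comparison distinguishes classes rather than context.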
With respect to claim 2, which incorporates claim 1, Radford discloses calculating respective similarities between a first feature map generated by the image model and a plurality of second feature maps generated by the text model (Radford (Sec. 2.3 and 2.4) – teaches computing similarity between image embeddings and text embeddings (e.g., cosine similarity / dot product in shared embedding space) for matching/classification …); and
determining whether the product is defective based on the calculated respective similarities (Radford (Sec. 3.1.4 and 3.1.5) – teaches selecting the best-matching text label/prompt based on similarity scores for classification decisions (zero-shot classification decision based on similarity ranking) …).
With respect to claim 3, which incorporates claim 2, Radford discloses calculating respective similarities between each of the plurality of second feature maps and the first feature map (Radford (Sec. 3.1.2) – teaches the same CLIP similarity computation between the image embedding and multiple candidate text embeddings …);
converting the respective similarities between each of the plurality of second feature maps and the first feature map into corresponding scores (Radford (Sec. 3.1.2) – teaches converting similarity logits into probabilities/scores (e.g., softmax over similarities in classification) …); and
outputting a candidate class comprised in text data corresponding to a highest score among the corresponding scores (Radford (Sec. 3.1.2 and 3.1.3) – teaches selecting the highest scoring text prompt/class as the predicted label…).
With respect to claim 5, which incorporates claim 1, Radford discloses a plurality of candidate classes (Radford (Sec. 3.1.2, 3.1.4, Fig. 1, 3) – teaches that the candidate classes (class labels) used to generate text prompts for the text encoder can be arbitrary and are not limited to any specific domain, and demonstrates prompt engineering, where the class label can be a phrase or sentence, and multiple attributes can be included in the prompt text …).
However, Radford fails to explicitly disclose information indicating whether the product is defective, and a defect type of the product.
Tulbure, working in the same field of endeavor, recognizes this problem and teaches information indicating whether the product is defective, and a defect type of the product (Tulbure (see p. 35–38, 44–46, e.g., “defect type,” “defect detection and classification”) – teaches that in industrial defect detection, candidate classes often include both whether a product is defective (“defective”/“non-defective”, “OK”/“NG”) and the type of defect (e.g., “scratch,” “crack,” “missing part,” etc.). Tulbure further explains that models are trained to output both defect status and defect type as part of the classification result. …).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the invention of Radford with Tulbure’s explicit teaching that candidate classes for defect detection include both defect status and defect type, in order to improve the utility and accuracy of automated defect detection and classification systems.
Therefore, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
With respect to claim 6, which incorporates claim 1, Radford discloses wherein the product text information comprises a plurality of information that identifies the product (Sec. 2.2 and 3.1.4 – CLIP’s training data includes text descriptions that often contain multiple fields or types of information (object name, context, category, etc.), which together serve to identify the subject/product in the image …), and wherein the plurality of information and the candidate class of the plurality of candidate classes included in each text data are distinguished by a special character (Sec. 2.2 and 3.1.4 – CLIP uses prompt engineering and templated prompts, which explicitly include special characters (commas, periods, etc.) to separate different pieces of information (e.g., ‘A photo of a {label}, a type of pet.’). This demonstrates that the text data used by CLIP distinguishes between various information fields and class labels using special characters. …).
With respect to claim 7, which incorporates claim 1, Radford discloses training the image model and the text model using a plurality of training data (Radford (Sec. 1) – teaches training both the image encoder and text encoder on large datasets of paired images and text using contrastive objectives …), wherein each of the plurality of training data comprises a pair that includes text data and a product image, and the training text data comprises product text information and a ground truth label (Radford (Sec. 1 and 2.3) – teaches optimizing a contrastive objective to increase similarity for matched pairs across training iterations, and see Fig. 21 – ground truth labels …).
With respect to claim 8, which incorporates claim 7, Radford discloses calculating a similarity between a third feature map for the training product image output from the image model and a fourth feature map for the training text data output from the text model (Radford (Sec. 1) – teaches similarity computation between image and text embeddings during training (contrastive learning objective) …); and
training the image model and the text model to increase the similarity between the third feature map and the fourth feature map through plural training iterations (Radford (Sec. 1) – teaches optimizing a contrastive objective to increase similarity for matched pairs across training iterations …).
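The contrastive training for which Radford is cited (increasing similarity for matched image-text pairs and decreasing it for mismatched ones, per the symmetric loss in Radford's Fig. 3 pseudocode) can be sketched in simplified form. The toy embeddings below are hypothetical, and the encoders and gradient update are omitted; this is an illustration of the loss computation, not an implementation of the claimed method:

```python
import math

def softmax(row):
    exps = [math.exp(x - max(row)) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over the image-text similarity matrix:
    the i-th image should match the i-th text (the diagonal)."""
    n = len(image_embs)
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature
               for txt in text_embs] for img in image_embs]
    # image -> text direction: the correct text for image i is text i
    loss_i = -sum(math.log(softmax(logits[i])[i]) for i in range(n)) / n
    # text -> image direction: transpose the logits matrix
    logits_t = [list(col) for col in zip(*logits)]
    loss_t = -sum(math.log(softmax(logits_t[i])[i]) for i in range(n)) / n
    return (loss_i + loss_t) / 2

# Toy batch of two matched (image, text) embedding pairs, assumed normalized
image_embs = [[1.0, 0.0], [0.0, 1.0]]
text_embs = [[0.9, 0.1], [0.1, 0.9]]
loss = contrastive_loss(image_embs, text_embs)
# Training iterations would adjust both encoders to drive this loss down,
# i.e., to increase the similarity of matched pairs.
```

Minimizing this loss over plural iterations is precisely what pushes the matched image and text feature maps toward higher similarity.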
With respect to claim 9, Radford discloses an apparatus comprising: one or more processors (Radford Abstract – teaches computer vision systems trained to predict a fixed set of predetermined object categories, and a processor-implemented method for classifying images using a dual-model architecture: an image encoder and a text encoder …) configured to:
generate a plurality of text data by combining each of a plurality of candidate classes with product text information (Radford (Sec. 3.1.2, Fig. 1, 3) – teaches generating text prompts by combining candidate classes with product-related context for the text encoder …);
generate a first feature map using an image model based on a product image (Radford (Sec. 2.3, Fig. 1, 3) – teaches generating a feature map from an image using an image encoder …);
generate a plurality of second feature maps using a text model based on the plurality of text data (Radford (Sec. 2.3, Fig. 1, 3) – teaches using a text encoder to generate feature maps (embeddings) for each text prompt …); and
detect whether the product image represents a defective product based on a determined similarity between each of the plurality of second feature maps and the first feature map (Radford (Sec. 3.1.2, Eqns., Fig. 3) – teaches comparing the image feature map to each text feature map (using cosine similarity or softmax over similarities) to classify the image …).
However, Radford fails to explicitly disclose defect detecting.
Tulbure, working in the same field of endeavor, recognizes this problem and teaches defect detection as the classification task (Tulbure (see p. 35–38, 44–46) – teaches using deep learning to classify whether a product image is “defective” or “non-defective” …).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the invention of Radford with the defect detection application of Tulbure, since doing so would predictably and advantageously enable automated, accurate detection of defective products in manufacturing by using product images and defect-related class prompts.
Therefore, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
With respect to claim 10, which incorporates claim 9, Radford discloses wherein the product text information comprises at least one of customer company information, production area information for the product, factory information for the product, product line information for the product, process information for the product, external environment information for the product, and inspection surface information for the product (Sec. 2.2 and 3.1.4 – CLIP’s text data includes a wide range of contextual and descriptive information about each image, such as the object’s type, location, and context (e.g., “a satellite photo of a city,” “a photo of a German traffic sign,” “a photo of a barn,” etc.). This encompasses production area, external environment, inspection surface, and product line information, as these are all types of context or location information that can be included in the text descriptions used for CLIP’s training and zero-shot classification …).
With respect to claim 11, which incorporates claim 10, Radford discloses wherein the customer company information comprises defect inspection standard information of a customer company, and wherein the external environment information of the product comprises lighting condition information (Sec. 2.2, 3.1.4 and 3.3 – CLIP’s text descriptions can and do include information about standards (“a photo of a traffic sign,” “a photo of a {label}, a type of pet,” etc.), which can be extended to include inspection standards as part of the label or context (e.g., “a product meeting standard X”). Similarly, environmental information such as lighting is a common part of image descriptions (“a photo taken at night,” “a photo under bright sunlight,” etc.) and is part of the dataset diversity that CLIP is trained to handle …).
With respect to claim 12, which incorporates claim 9, Radford discloses a plurality of candidate classes (Radford (Sec. 3.1.2, 3.1.4, Fig. 1, 3) – teaches that the candidate classes (class labels) used to generate text prompts for the text encoder can be arbitrary and are not limited to any specific domain, and demonstrates prompt engineering, where the class label can be a phrase or sentence, and multiple attributes can be included in the prompt text …).
However, Radford fails to explicitly disclose information indicating whether the product is defective, and a defect type of the product.
Tulbure, working in the same field of endeavor, recognizes this problem and teaches information indicating whether the product is defective, and a defect type of the product (Tulbure (see p. 35–38, 44–46, e.g., “defect type,” “defect detection and classification”) – teaches that in industrial defect detection, candidate classes often include both whether a product is defective (“defective”/“non-defective”, “OK”/“NG”) and the type of defect (e.g., “scratch,” “crack,” “missing part,” etc.). Tulbure further explains that models are trained to output both defect status and defect type as part of the classification result. …).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the invention of Radford with Tulbure’s explicit teaching that candidate classes for defect detection include both defect status and defect type, in order to improve the utility and accuracy of automated defect detection and classification systems.
Therefore, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
With respect to claim 13, which incorporates claim 9, Radford discloses wherein the classifier is configured to convert the similarity between each of the plurality of second feature maps and the first feature map into a score based on a softmax function (Radford (Sec. 2.3, 3.1.1, and 3.1.2) – teaches applying softmax over similarity logits for classification probabilities …), and output a candidate class corresponding to a highest score among the converted scores as a result of whether the product image is defective (Radford (Sec. 3.1.1 and 3.1.2, Eqns., Fig. 3) – teaches selecting the highest probability/highest similarity class as the predicted output …).
With respect to claim 15, which incorporates claim 9, Radford discloses wherein the one or more processors are further configured to determine a plurality of candidate classes based on decision information (Radford (Sec. 3.1.2 and 3.1.4) – in CLIP, the set of candidate classes (labels) is determined by the user or dataset designer based on the classification or decision task at hand (e.g., ImageNet classes, or custom prompts for a specific dataset). The “decision information” is thus the set of labels relevant to the intended prediction task, and CLIP’s zero-shot inference is structured around constructing a set of candidate text prompts (e.g., “A photo of a cat,” “A photo of a dog,” etc.) from that decision information …).
With respect to claim 16, which incorporates claim 15, Radford fails to explicitly disclose wherein the decision information comprises defect type information and defect status information.
However, Tulbure, working in the same field of endeavor, recognizes this problem and teaches wherein the decision information comprises defect type information and defect status information (Tulbure (see p. 35–38, 44–46, e.g., “defect type,” “defect detection and classification”) – teaches that in industrial defect detection, candidate classes often include both whether a product is defective (“defective”/“non-defective”, “OK”/“NG”) and the type of defect (e.g., “scratch,” “crack,” “missing part,” etc.). Tulbure further explains that models are trained to output both defect status and defect type as part of the classification result. …).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the invention of Radford with Tulbure’s explicit teaching that candidate classes for defect detection include both defect status and defect type, in order to improve the utility and accuracy of automated defect detection and classification systems.
Therefore, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
With respect to claim 19, Radford discloses an apparatus, comprising: processors (Radford Abstract – teaches computer vision systems trained to predict a fixed set of predetermined object categories, and a processor-implemented method for classifying images using a dual-model architecture: an image encoder and a text encoder …) configured to:
train an image model, using product text data and product images, to generate a first feature map (Radford (Sec. 2.3, 2.5, Fig. 3, 8) – teaches training an image encoder using paired image/text data to generate feature maps for images …);
train a text model, using the product text data, to generate a second feature map (Radford (Sec. 2.3, 2.5, Fig. 3, 8) – teaches training a text encoder using paired image/text data to generate feature maps for text prompts …); and
train a classifier to convert a determined similarity between the first feature map and the second feature map into a class score that is indicative of whether a product is defective, wherein the image model, the text model, and the classifier are trained together (Radford (Sec. 2.3, 2.5, Fig. 3, 8) – teaches jointly training the image and text models such that the similarity between image and text feature maps reflects the correct class, with a classifier (e.g., softmax over similarities) outputting a class score …).
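The joint training attributed to Radford above (image encoder, text encoder, and softmax-over-similarities classifier trained together on paired data) can be sketched by its loss function. The following is a numerical sketch only, assuming precomputed embedding matrices; an actual system would use a differentiable framework with learned encoders, and the temperature value here is illustrative.

```python
import numpy as np

def contrastive_loss(img_feats: np.ndarray, txt_feats: np.ndarray,
                     temperature: float = 0.07) -> float:
    """Symmetric cross-entropy over an image-text similarity matrix,
    in the style of Radford Fig. 3 (sketch, not the reference code)."""
    # L2-normalize so dot products are cosine similarities.
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature           # (N, N) similarity matrix
    labels = np.arange(len(logits))              # matched pairs lie on the diagonal

    def xent(lg: np.ndarray) -> float:
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()      # cross-entropy at matched pairs

    # Average the image-to-text and text-to-image cross-entropies.
    return (xent(logits) + xent(logits.T)) / 2
```

Minimizing this loss drives the similarity between each image embedding and its paired text embedding above the similarities to all non-matching texts, which is what lets a softmax over similarities act as the class-score classifier at inference time.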
However, Radford fails to explicitly disclose that the class score is indicative of whether a product is defective.
Tulbure, working in the same field of endeavor, recognizes this problem and teaches “defective”/“non-defective” as the classes, with defect detection as the application (Tulbure (see p. 35–38, 44–46) – teaches using deep learning to classify whether a product image is “defective” or “non-defective” …).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the invention of Radford with the defect detection application of Tulbure, since doing so would have predictably and advantageously allowed classification using product images and defect-related class prompts, thereby enabling automated, accurate detection of defective products in manufacturing.
Therefore, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Claims 4, 17, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over “Radford” in view of “Tulbure” as applied to claims 1 and 9 above, and further in view of US Patent Application Publication 2013/0170733 (published Jul. 4, 2013) (“Leu”).
With respect to claim 4, which incorporates claim 1, neither Radford nor Tulbure appears to explicitly disclose wherein the product text information comprises at least one of customer company information, production area information for the product, factory information for the product, product line information for the product, process information for the product, external environment information for the product, and inspection surface information for the product.
However, Leu, working in the same field of endeavor, recognizes this problem and teaches wherein the product text information comprises at least one of customer company information, production area information for the product, factory information for the product, product line information for the product, process information for the product, external environment information for the product, and inspection surface information for the product (¶ 33 – treats brightness, shadow (and related image/process characteristics) as characteristics used in classification; these correspond to environmental/illumination conditions affecting captured inspection imagery … note that claim 4 only requires at least one of the listed categories …).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the invention of Radford in view of Tulbure to apply external environment information for the product as taught by Leu, since doing so would have predictably and advantageously improved defect detection accuracy and robustness (see at least Leu, ¶¶ 4–8).
Therefore, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
With respect to claim 17, which incorporates claim 16, neither Radford nor Tulbure appears to explicitly disclose wherein the user-specific product text information comprises at least one of customer company information, production area information, factory information, product line information, process information, external environment information, and inspection surface information.
However, Leu, working in the same field of endeavor, recognizes this problem and teaches wherein the user-specific product text information comprises at least one of customer company information, production area information, factory information, product line information, process information, external environment information, and inspection surface information (¶¶ 31, 33, and 38 – Leu discloses that defect scan data used for defect classification includes process information, lot information, and equipment information (¶ [0033]). The disclosed process, lot, and equipment information constitute user-specific manufacturing context corresponding to process, production area, factory, and product line information. Leu further discloses that defect classification considers environmental and imaging characteristics, including brightness and shadow, as classification characteristics (¶ [0038]), which correspond to external environment information. Additionally, the reference discloses that defect classification is performed on a semiconductor image representing a surface of a wafer or specimen on which defects are detected (¶ [0031]), which corresponds to inspection surface information … note that claim 17 only requires at least one of the listed categories).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the invention of Radford in view of Tulbure to apply user-specific product text information comprising at least one of customer company information, production area information, factory information, product line information, process information, external environment information, and inspection surface information as taught by Leu, since doing so would have predictably and advantageously improved defect detection accuracy and robustness (see at least Leu, ¶¶ 4–8).
Therefore, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
With respect to claim 18, which incorporates claim 17, neither Radford nor Tulbure appears to explicitly disclose wherein the customer company information comprises defect inspection standard information of a customer company.
However, Leu, working in the same field of endeavor, recognizes this problem and teaches wherein the customer company information comprises defect inspection standard information of a customer company (¶¶29, 34, 44–45 – Leu discloses that defect classification rules (“rules of thumb”) are established based on user-defined ground truth and human judgment standards used to determine defect classes during training and classification (¶¶[0029], [0044]–[0045]). These disclosed judgment standards define how defects are determined and applied during inspection and correspond to defect inspection standard information provided by a user or customer entity. Leu further discloses that such standards directly influence defect classification outcomes by guiding the labeling of training data and the application of classification criteria (¶¶[0034], [0044]–[0045]). …).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the invention of Radford in view of Tulbure to apply defect inspection standard information of a customer company as taught by Leu, since doing so would have predictably and advantageously improved defect detection accuracy and robustness (see at least Leu, ¶¶ 4–8).
Therefore, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
With respect to claim 20 (drawn to an apparatus), the proposed combination of Radford in view of Tulbure, and further in view of Leu, explained in the rejection of method claim 4 renders obvious the limitations of the apparatus of claim 20, because these steps occur in the operation of the apparatus as discussed above. Thus, arguments similar to those presented above for claim 4 are equally applicable to claim 20.
Summary
Claims 1–13 and 15–20 are rejected under at least one of 35 U.S.C. §§ 102 and 103 as being unpatentable over the cited prior art. In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 C.F.R. § 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
ALLOWABLE SUBJECT MATTER
Claim(s) 14 is/are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if the rejection under 35 U.S.C. § 101 is overcome.
Claim(s) 14 contain subject matter that is not disclosed or made obvious in the cited art.
In regard to claim 14, when considering the claim as a whole, the prior art of record fails to disclose or render obvious, alone or in combination:
“[…] wherein the one or more processors are further configured to: train the image model and the text model using a plurality of training data consisting of pairs of respective training text data and training product image through a minimization of a loss calculated from a similarity between a third feature map for the training product image output from the image model and a fourth feature map for the training text data output from the text model, respectively, wherein the training text data comprises the product text information and a ground truth label.”
ADDITIONAL CITATIONS
The following table lists several references that are relevant to the subject matter claimed and disclosed in this Application. The references are not relied on by the Examiner, but are provided to assist the Applicant in responding to this Office action.
Table 1

Citation: Lee et al. (US 2024/0273374)
Relevance: Describes a method for classifying and locating surface defects in the semiconductor industry. The method predicts the defect category of an input image using a CNN classification model, and combines a rough localization map generated by the CNN classification model with an anomaly map generated by an anomaly detection model, through position integration, to localize the defect region.

Citation: Wang et al. (US 2024/0062362)
Relevance: Describes an apparatus for generating synthetic defect images for wafer inspection in charged-particle beam inspection of integrated circuits (ICs) and unfinished or finished circuit components. The apparatus improves defect inspection performance by training machine learning or deep learning models on sufficient amounts of training defect images.

Citation: Niu, "Defect Image Sample Generation With GAN for Improving Defect Recognition"
Relevance: Addresses deep-learning-based surface defect recognition, where the insufficiency of defect images in practical production lines and the high cost of labeling make it difficult to obtain a defect data set with adequate diversity and quantity. The article proposes a defect image-generation method, SDGAN, that employs a large number of defect-free images to generate defect images balancing diversity and authenticity; the augmented defect data set is then used to build a deep-learning defect recognition model.
CONCLUSION
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENOK A SHIFERAW whose telephone number is (571)272-4637. The examiner can normally be reached Monday-Friday, 8:30AM - 5:00PM, (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
HENOK A. SHIFERAW
Supervisory Patent Examiner
Art Unit 2676
/Henok Shiferaw/Supervisory Patent Examiner, Art Unit 2676