DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Status
Claims 1-12 were pending for examination in the application filed June 15, 2023. Per the remarks and amendments received July 28, 2025, claims 1, 6-8, 10, and 12 are amended, claims 13-16 are added, and no claims are cancelled. Accordingly, claims 1-16 are pending in the application for examination.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 1, and its dependent claims by virtue of dependency, are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. The newly amended limitation “calculates a similarity of each combination using the image language model, calculates an average of similarity from combinations having a high similarity, and when the average of the similarity is equal to or larger than a predetermined value” is confusing. First, “high” is a relative term of degree; the specification does not clarify the relationship, and the claim does not define the metes and bounds of the term. Additionally, the relationship between the average that is compared to the predetermined value and the “high” similarity is not clarified. Further, there is no disclosure as to what occurs if a “high” similarity is not in fact found. For purposes of examination, the Examiner interprets this limitation as the average of the similarities of a combination of two or more elements.
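For clarity of the record, the interpretation applied above can be expressed as the following minimal, non-limiting sketch (Python), in which the number of top-scoring combinations retained and the predetermined value are illustrative assumptions only and are not drawn from the claims or the specification:

# Illustrative sketch of the interpretation applied above: the similarities of the
# combinations are sorted, the two or more highest similarities are averaged, and
# the average is compared against a predetermined value.
def average_of_top_similarities(similarities, top_k=3, predetermined_value=0.5):
    # top_k and predetermined_value are assumed, illustrative parameters.
    if len(similarities) < 2:
        raise ValueError("interpretation assumes a combination of two or more elements")
    top = sorted(similarities, reverse=True)[:top_k]
    average = sum(top) / len(top)
    return average, average >= predetermined_value

# Example: four combination similarities; the top three are averaged and compared.
avg, passes = average_of_top_similarities([0.91, 0.85, 0.40, 0.77])
print(avg, passes)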
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-13 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Zadeh et al. (Zadeh; U.S. Patent Application Publication 2018/0204111 A1).
Regarding Claim 1, (Currently Amended) Zadeh discloses the aspects of the image recognition support apparatus comprising:
a processor (processor) coupled to a memory storing instructions for the processor to function as:
an image acquisition unit (processor) that acquires an image ([2405] For example, see FIG. 213. In one embodiment, we have dictionary of images, or reverse dictionary of images, for searches both ways. In one embodiment, we have a document and document reviewer, for rendering, e.g. with image and text on the same file or page or layout. See Section Dictionary of Images);
an image recognition unit (processor) that detects an object included in the image using an object detection model ([2408] In one embodiment, the plug-in has a corresponding user interface, communicating with Z-web and its databases. In one embodiment, there is a context, obtained from metadata, or from enclosing documents, or from user ID, or from reader's ID. In one embodiment, in one or more databases, we have well-known people or objects, or classes of them. In one embodiment, we have various thresholds for different matchings. In one embodiment, we have batch process before the user's view, based on web page or image, or document capture, or indexing process, or background process in network (e.g. not user-driven), e.g. using annotations. (Note that these can be before that process, or at the same time.) In one embodiment, some of the process can be done without user's involvement or reader's involvement. In one embodiment, the UI or drop-down menu is used for entry into database, for editing and entry, or for learning an image or object.); and
a detection result processing unit (processor see especially [2097]-[2098]) that
generates one or more expanded image queries indicating a partial image of the image including the object (See section Feature Detection and Learning especially [0675], and Partial Image Training Section beginning [1741] as well as Object Detection in Layers: [1753] In one embodiment, an object/feature detector/classifier detects an object in a data/image. In one embodiment, the detected object may be part of or component of another object or detected for example based on the recognition of a partial image. In one embodiment, the structure of the object (e.g., the periphery, blob, coverage projection, or support regions) is determined based on localization of the object within the image (e.g., through reconstruction). In one embodiment, the potential objects/concepts in the image are determined, e.g., based on the context of the image or correlation with the context(s) of the detected object. In one embodiment, the visible structure of the object is removed from the image, e.g., as part of the objects in the image foreground. In one embodiment, e.g., with RBMs or deep belief networks, partial clamping of the input (visible) data is done for regions in the image not removed. Based on the context or correlation with other types of objects, corresponding detectors, e.g., RBMs or deep belief networks, are used to detect objects (which may be partially visible). In one embodiment, through reconstruction at the visible layer, the hidden/blocked portion of such objects is estimated/predicted. For example, this facilitates reconstructing background (if learned) or the rest of the face of a person (if learned). This approach can be executed continuously or iteratively to gather correlated collections of objects or their degree of possibilities based on the reliability factors. In one embodiment, more specific context may be derived based on each correlated (and for example expanding) collection of objects, and further information or proposition may be inferred (with a reliability factor) based on the image, by feeding the relationships and the reliability factors in a knowledge web. [1754] In one embodiment, face recognition is performed on a partially blocked face in an image using a feature detector/classifier and database of known signature (vectors) associated with identified faces. In one embodiment, the comparison of detected features provides a matching probability measure between the partial image and a subset of those known in the database. In one embodiment, the reconstructed image at, for example, unclamped visible units representing the blocked portion, may provide full a face candidate for comparison with those images in the database),
wherein the expanded image queries are images including the object, overlapping the object, or included in the object ([1750] In one embodiment, the datasets (e.g., images) include (or associated with) various objects or concepts (e.g., face, body, book, computer, chair, car, plane, road, and building). In one embodiment, classifiers are trained to detect high level signatures/features of various objects/concepts, e.g., by training the classifiers with (labeled) training data sets, including those with and without object features. Some data sets may include multiple objects or concepts, and therefore, the occurrences of the objects/concepts overlap. In one embodiment, a classifier may classify multiple objects/concepts. In one embodiment, the correlations between the objects/concepts are determined as the result of classification of various datasets. In one embodiment, a data-concept matrix is setup based on the classification of the data sets, and further analyzed, for example, by decomposition using orthogonal matrices and a (e.g., low dimensional) diagonal matrix (e.g., to a low dimensional space), e.g., by using single value decomposition technique (SVD). In one embodiment, this dimensional space represents various contexts (e.g., family, sitting, coworkers, house, office, city, outdoor, and landscape) that support or relate to various object/concepts. In one embodiment, each context represents/contributes a set of weights representing the relationships between object/concepts.), generates an expanded language query by acquiring synonyms, template extensions or additions, and labels (¶1750),
acquires a combination having a similarity greater than or equal to a predetermined value calculated using an image language model trained (see classifier training discussions, especially [1750] and figure 198) on a relationship between an image and an attribute including a state or situation among combinations of an expanded image query (context relationships) and an expanded language query (synonyms and multiple objects or concepts) indicating one or more language labels (See Context Relationship section beginning [1750]),
wherein the detection result processing unit generates combinations of the expanded image queries (¶1750, for the expanded image query based on combinations of the expanded image queries and photographs ¶02409) and the expanded language query (¶02409), calculates a similarity of each combination using the image language model, calculates an average of similarity from combinations having a high similarity, and when the average of the similarity is equal to or larger than a predetermined value ([2492] In one embodiment, the system evaluates the totality of all N parameters for matching photos or images or faces, or compares them using weights for more emphasis on some parameters, or adds all the scores for comparisons together for all parameters, or do a weighted average or score or vote for N parameters (e.g. N comparisons), e.g. against or versus one or more thresholds, e.g. N threshold values, or do a fuzzy comparison with no hard boundary or thresholding for any parameter, using fuzzy sets, fuzzy rules engine, or membership functions, for each or all parameter(s) or comparison(s).)
Use Images from Different Angles or Perspectives:
[1883] To model an object, from a 3-D perspective, one models the object using images taken by a real camera, from different angles. For example, for the recognition of a face or person, one looks at the face from multiple directions, e.g. from side view left, front view, half-side view right, and back side. Thus, we store the multiple views from different camera positions or angles, for the same person, for later recognition of the person, to find an exact match or a match between two or more of these snap shots or images (i.e. using limited numbers of images, as discrete sampling, for continuous matching positions, later on), as interpolation or extrapolation of one or more images, or some weighted average of them, or some average of them.
[1839] In one embodiment, we use 3 types of templates for face model in 3-D (dimensional) for face recognition, or after scanning the face (with a light, scanner, or by a 2D image or multiple 2-D images), or for storage, library, or comparison, alone or in combination: (1) wire mesh using thousands of points on the face, (2) contours of face for topography and geometry, e.g. cheek bone curves and structure, and (3) semantic model, which models the face based on the general semantics and description of the face, e.g. “big nose” or “small lips”, which are Fuzzy descriptions, with corresponding library of descriptors and shapes, plus rules engine or database, defining those beforehand, so that we can store or reconstruct or combine Fuzzy features e.g. “big nose” and “small lips”, and e.g. make up a face from descriptors later, or compare 2 faces just using descriptors without reconstructing the faces at all, which is very fast and cheap, for a Fuzzy match or closeness degree. In one embodiment, we use many small steps between Fuzzy descriptors on the scale or axis, to have differentiation between objects more easily and have a good coverage for all samples in the defined set or universe, e.g. for “height” property, we will have: “short”, “very short”, “very very short”, “extremely short”, “unbelievably short”, and so on. See e.g. FIG. 135 for such a system.
[1840] The method of recognition mentioned above is helpful as one of the parameters for face recognition, or validation for identity of a person, using pictures of different years or ages, to find a person. Identity recognition, in turn, is a factor for determination of the relationships between objects and humans (or other subjects), and to build such a web of relationships or Z-web from all these determinations, like a tree structure, with nodes and branches, with strength of relationship and reliability of the determination e.g. symbolized with the thickness and inverse length of the branches (respectively), connecting the concepts as nodes, for example, for display purposes, for visual examination by the user (which we call Z-web).
[1841] In one embodiment, we have a picture, or multiple pictures of a same person, possibly from different angles, and then we feed that to the system, and then from library, based on shape comparison (e.g. features and parameters of the head in N-dimensional feature space), the system chooses the most possible type of head, out of say e.g. 105 types it has, to suggest that as a model. Once we have the model, we fit those one or more pictures into that model, and construct point by point or mesh structure or contour map of the face. The model has some parameters as variables, which can be adjusted in 3D using those 2D images as input, which gives elasticity to the form of the face and head in the 3D format, for minor adjustments to the 3D model in computer (which can be displayed for the user, as well, as an option). In addition, the same 3D model can be input to a 3D printer, or 2D rendering image printer, or laser induced bubble printer (in plastic or glass), to construct the same head in the solid format, e.g. in glass or plastic or polymer.
[1842] In one embodiment, we have e.g. front view of a person, e.g. in a picture or image. Then, we use slanting or some deforming lens or filter or translational transform(s) to change the shape of the face slightly, and store them as the basis for the rotating or moving head slightly, from the front view position (from its original position, with small perturbation or movements), in the library. So, we can use them as eigenfaces for frontal or near frontal sideway faces, for the future face modeling, face replacement, face recognition, face storage, as linear combination of eigenfaces, face approximation, efficient storing of faces, coding the face, and comparison of faces. See e.g. FIG. 136 for such a system.
[1843] In one embodiment, we have orthogonal or orthonormal eigenfaces as basis. In one embodiment, we have non-orthogonal or non-orthonormal eigenfaces as basis, e.g. some being as linear combination of others, which is less efficient for recognition (and being too redundant), but easier to generate the basis functions, due to less constraints on basis functions. In one embodiment, we obtain eigenfaces from thousands of samples, by cloudifying or fuzzifying or averaging pixels in large neighborhood regions for the samples, in the first step. Then, optionally, we can stop there, and use the result of the first step as our final answer, as eigenfaces. Or, we go one more step, in another embodiment, and we average the first step results together, to get even more “cloudy” images, as our final result, for our basis, for eigenfaces. Or, we go one more step, in a loop, recursively, in another embodiment, and we average the averages again, until it is cloudy enough or we reach N loop count, and we stop at that point, yielding our eigenfaces. Then, any given face is a linear combination of our eigenfaces. See e.g. FIG. 137 for such a system.
[1844] To remove redundant eigenfaces from our basis functions, e.g. to have an orthogonal set, we try or choose one eigenface, and if we can write it in terms of linear combination of others, then that chosen eigenface is redundant (and not needed) and can be removed from the set. In one embodiment, to make some image fuzzified, we can use fuzzy parameters, rather than crisp ones, or use dirty or oily lens for image, or use defocused lens or out-of-focus lens for images, as a filter or transformation or operator, to get the cloudy or average effect between pixels.
[1845] In one embodiment, for face recognition, or eyes or any other object, we have Sobel operator or filter or matrix or convolution, based on gradient or derivative, so that the operator finds the gradient of the image intensity at each pixel, e.g. the direction of the largest increase for pixel intensity (with the rate) or contrast, as an indication of abruptness of changes in the image, to find the edges or boundaries, to find the objects or recognize them. In one embodiment, other filter kernels, e.g. Scharr operators, can be used for edge detection or gradient analysis.
[1846] In one embodiment, for face recognition, we use edge detection or other object recognition methods to find eyes (or nose), first, as an anchor point or feature. Then, from the eyes' positions, we know relatively where other parts may be located, if it is a real face, based on expected values or distances based on face models in library, e.g. as a probability distribution or expected value or average value or median value, for distances. See e.g. FIG. 138 for such a system. Or, in one embodiment, based on the eyes' positions, we can normalize the face size or other components or the image, for faster comparison. In one embodiment, for face recognition, we find the edges, first. In one embodiment, for face recognition, we find the separate components, eg. eyes and nose and mouth, first. In one embodiment, for face recognition, we find the whole face, as a whole, first, using e.g. eigenfaces. In one embodiment, we combine the 3 methods mentioned above, for different parts or components or stages of image or object or recognition process, for higher efficiency. In one embodiment, we generate the eigenfaces based on a large number of samples or pictures of many people, e.g. from front view or from side view, for different sets of corresponding eigenfaces, for front or side view, respectively, e.g. using averaging or weighted averaging on pictures, or using a training module.
[1744] In one embodiment, in the partial image training, the weight/bias adjustments for a learning step is modified by scaling the learning rate for a given unit (e.g., a hidden unit in H.sup.(1) layer) with the ratio of the number of its links traceable to the clamped visible units and the number of its links traceable to any visible unit. In one embodiment, similar adjustment to the learning rate is made with respect to a higher level hidden unit (e.g., in layer H.sup.(2) by, for example, determining such ratio (indirectly) by tracing through layer H.sup.(1), or simply by estimating the ratio based on similar average ratio from the traceable units in H.sup.(1) layer. For higher hidden layers where each unit is quite likely traceable to every visible unit, the ratio is estimated as number of clamped visible units to number of visible units. In one embodiment, by tempering the learning rate, the impact of the partial image on the weights is tempered as well. In one embodiment, by limiting the adjustment of weights, the impact of learning from phantom or residual data/images from the unclamped is also reduced.), and
sets the object indicated by the expanded image query of the combination as a detected object (identified people, places or objects) and sets the expanded language query of the combination as an attribute detail label of the object (concept or object see [2409] In one embodiment, we have reverse dictionary of photos, e.g. with a GUI, e.g. with a plug-in, e.g. with a right-click-mouse function, e.g. for WHOIS? function (to identify the person or object in the image), or UPLOAD function to upload, or ANNOTATE function to annotate, or LINK function to link, or the like. In one embodiment, for the match, we compare with user's album, friend's album, friend-of-friend's album, group's album, super-group's album, social network's album, or the like, in an expanding manner, for scope or reach or size or width. In one embodiment, for the match, we have a repository of famous people, places, objects, or the like, with corresponding thresholds or criteria, with corresponding Z-factors, e.g. reliability factor. In one embodiment, for a given image or picture, the system gets or extracts a concept or object, and from that, the system can get antonyms or synonyms for that concept or object, if any, pictorially or textually or both, displayed to the user, on GUI or monitor or display. In one embodiment, the system displays ads based on antonyms or synonyms or related concepts. In one embodiment, the system displays concepts related to the object, based on a thesaurus, slangs, proverbs, or idioms dictionary, fully pictorially, or half pictorially (mixed with text).).
In the disclosed embodiments, Zadeh discloses the expanded language query and natural language aspects, and therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the presently filed invention to modify the system of Zadeh with the teachings of the further embodiments of Zadeh because “this disclosure has many embodiments, systems, methods, algorithms, inventions, vertical applications, usages, topics, functions, variations, and examples[,] We divided them into sections for ease of reading, but they are all related and can be combined as one system, or as combination of subsystems and modules, in any combinations or just alone,” as taught by Zadeh.
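For context only, the workflow mapped above (pairing expanded image queries with expanded language queries, scoring each pair with an image language model, and retaining pairs at or above a threshold) may be sketched as follows; the scoring function, threshold value, and all identifiers are illustrative assumptions and do not represent Zadeh's or Applicant's actual implementation:

from itertools import product

def score_pair(image_crop, text):
    # Stand-in for an image-language model similarity, e.g. a cosine similarity
    # between an image embedding and a text embedding; returns a float in [0, 1].
    return 0.0  # placeholder value

def evaluate_expanded_queries(expanded_images, expanded_texts, threshold=0.6):
    # Score every (expanded image query, expanded language query) combination and
    # keep the combinations whose similarity is at or above the threshold.
    scored = [(img, txt, score_pair(img, txt))
              for img, txt in product(expanded_images, expanded_texts)]
    kept = [c for c in scored if c[2] >= threshold]
    best = max(scored, key=lambda c: c[2]) if scored else None
    return kept, best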
Regarding Claim 2, (Original) Zadeh further discloses the aspects of the image recognition support apparatus according to claim 1, wherein the expanded language query includes a preset language label indicating an attribute of the object (Tags and Comments for Pictures and Images section: [1996] Picture annotation and caption is useful for recognition of people in the image, e.g. looking for phrases such as “from left to right”, or “top row”, to find location of faces or people in the image, and order them in rows or columns, and then call or label them as objects or persons P.sub.R1, P.sub.R2, . . . , PRN, as placeholders for names, and then compare them with the names coming after the flagged phrases such as “from left to right”, to get names matched with placeholders P.sub.R1, P.sub.R2, . . . , PRN. For recognition of names and flagged or pre-designated phrases, we use OCR and then basic or full natural language processor module. [1997] In one embodiment, we can simply look for specific words such as “left”, as flagged words, and if successful, then look for specific phrases, such as “from left to right”, as flagged phrases, from our library of flagged phrases and words, pre-recorded and stored, or dynamically adjusted and improved through time, without actually understanding the meaning of the full text and sentence, for fast picture analysis and matching names or tags or comments related to the pictures.).
Regarding Claim 3, (Original) Zadeh further discloses the aspects of the image recognition support apparatus according to claim 1, wherein the image recognition unit detects a language label indicating an attribute of the object using an attribute classification model (classifiers), and the expanded language query includes at least one of the language label indicating the attribute and a language label indicating a synonym of the attribute ([2409] In one embodiment, we have reverse dictionary of photos, e.g. with a GUI, e.g. with a plug-in, e.g. with a right-click-mouse function, e.g. for WHOIS? function (to identify the person or object in the image), or UPLOAD function to upload, or ANNOTATE function to annotate, or LINK function to link, or the like. In one embodiment, for the match, we compare with user's album, friend's album, friend-of-friend's album, group's album, super-group's album, social network's album, or the like, in an expanding manner, for scope or reach or size or width. In one embodiment, for the match, we have a repository of famous people, places, objects, or the like, with corresponding thresholds or criteria, with corresponding Z-factors, e.g. reliability factor. In one embodiment, for a given image or picture, the system gets or extracts a concept or object, and from that, the system can get antonyms or synonyms for that concept or object, if any, pictorially or textually or both, displayed to the user, on GUI or monitor or display. In one embodiment, the system displays ads based on antonyms or synonyms or related concepts. In one embodiment, the system displays concepts related to the object, based on a thesaurus, slangs, proverbs, or idioms dictionary, fully pictorially, or half pictorially (mixed with text).).
Regarding Claim 4, (Original) Zadeh further discloses the aspects of the image recognition support apparatus according to claim 1, wherein the expanded language query includes a specified language label ([2433] In one embodiment, the system gets a video, and it identifies e.g. Abraham Lincoln, “war” scene, and “old style clothing”, along with audio track which is transcribed to text for search, or searched by voice analyzer directly, which can identify the person in movie as e.g. Abraham Lincoln, as well, indicating a movie about Abraham Lincoln. In one embodiment, the system classifies the video as historical, or comedy, or the like, based on some rules and tags or labels or identifiers or indicators, or set of them, or rules engine, or fuzzy rules engine.).
Regarding Claim 5, (Original) Zadeh further discloses the aspects of the image recognition support apparatus according to claim 1, wherein the expanded image query includes a partial image in the image specified (See section Feature Detection and Learning especially [0675], and Partial Image Training Section beginning [1741] as well as Object Detection in Layers: [1753] In one embodiment, an object/feature detector/classifier detects an object in a data/image. In one embodiment, the detected object may be part of or component of another object or detected for example based on the recognition of a partial image. In one embodiment, the structure of the object (e.g., the periphery, blob, coverage projection, or support regions) is determined based on localization of the object within the image (e.g., through reconstruction). In one embodiment, the potential objects/concepts in the image are determined, e.g., based on the context of the image or correlation with the context(s) of the detected object. In one embodiment, the visible structure of the object is removed from the image, e.g., as part of the objects in the image foreground. In one embodiment, e.g., with RBMs or deep belief networks, partial clamping of the input (visible) data is done for regions in the image not removed. Based on the context or correlation with other types of objects, corresponding detectors, e.g., RBMs or deep belief networks, are used to detect objects (which may be partially visible). In one embodiment, through reconstruction at the visible layer, the hidden/blocked portion of such objects is estimated/predicted. For example, this facilitates reconstructing background (if learned) or the rest of the face of a person (if learned). This approach can be executed continuously or iteratively to gather correlated collections of objects or their degree of possibilities based on the reliability factors. In one embodiment, more specific context may be derived based on each correlated (and for example expanding) collection of objects, and further information or proposition may be inferred (with a reliability factor) based on the image, by feeding the relationships and the reliability factors in a knowledge web. [1754] In one embodiment, face recognition is performed on a partially blocked face in an image using a feature detector/classifier and database of known signature (vectors) associated with identified faces. In one embodiment, the comparison of detected features provides a matching probability measure between the partial image and a subset of those known in the database. In one embodiment, the reconstructed image at, for example, unclamped visible units representing the blocked portion, may provide full a face candidate for comparison with those images in the database).
Regarding Claim 6, (Currently Amended) Zadeh further discloses the aspects of the image recognition support apparatus according to claim 1, wherein the processor further functions as a display control unit (processor) that outputs an expanded query display screen including an image indicating the expanded image query (images with synonyms and associated annotate functions: [2409] In one embodiment, we have reverse dictionary of photos, e.g. with a GUI, e.g. with a plug-in, e.g. with a right-click-mouse function, e.g. for WHOIS? function (to identify the person or object in the image), or UPLOAD function to upload, or ANNOTATE function to annotate, or LINK function to link, or the like. In one embodiment, for the match, we compare with user's album, friend's album, friend-of-friend's album, group's album, super-group's album, social network's album, or the like, in an expanding manner, for scope or reach or size or width. In one embodiment, for the match, we have a repository of famous people, places, objects, or the like, with corresponding thresholds or criteria, with corresponding Z-factors, e.g. reliability factor. In one embodiment, for a given image or picture, the system gets or extracts a concept or object, and from that, the system can get antonyms or synonyms for that concept or object, if any, pictorially or textually or both, displayed to the user, on GUI or monitor or display. In one embodiment, the system displays ads based on antonyms or synonyms or related concepts. In one embodiment, the system displays concepts related to the object, based on a thesaurus, slangs, proverbs, or idioms dictionary, fully pictorially, or half pictorially (mixed with text).), the expanded language query (synonyms and related concepts for the match), and similarity between the expanded image query and the expanded language query calculated using the image language model (Dictionary of Images section, see especially match to albums and famous people, as well as Image Recognition for People (or Objects), Auto-Annotation & Feature-Enabling Web Albums).
Regarding Claim 7, (Currently Amended) Zadeh further discloses the aspects of the image recognition support apparatus according to claim 1, wherein the processor further functions as a display control unit (processor) that outputs an image recognition result display screen including an image indicating an expanded image query and an expanded language query (images with synonyms and associated annotate functions: [2409] In one embodiment, we have reverse dictionary of photos, e.g. with a GUI, e.g. with a plug-in, e.g. with a right-click-mouse function, e.g. for WHOIS? function (to identify the person or object in the image), or UPLOAD function to upload, or ANNOTATE function to annotate, or LINK function to link, or the like. In one embodiment, for the match, we compare with user's album, friend's album, friend-of-friend's album, group's album, super-group's album, social network's album, or the like, in an expanding manner, for scope or reach or size or width. In one embodiment, for the match, we have a repository of famous people, places, objects, or the like, with corresponding thresholds or criteria, with corresponding Z-factors, e.g. reliability factor. In one embodiment, for a given image or picture, the system gets or extracts a concept or object, and from that, the system can get antonyms or synonyms for that concept or object, if any, pictorially or textually or both, displayed to the user, on GUI or monitor or display. In one embodiment, the system displays ads based on antonyms or synonyms or related concepts. In one embodiment, the system displays concepts related to the object, based on a thesaurus, slangs, proverbs, or idioms dictionary, fully pictorially, or half pictorially (mixed with text).), the expanded language query (synonyms and related concepts for the match) that are a combination having a maximum similarity among combinations of the expanded image query and the expanded language query related to the object (Dictionary of Images section, see especially match to albums and famous people, as well as Image Recognition for People (or Objects), Auto-Annotation & Feature-Enabling Web Albums).
Regarding Claim 8, (Currently Amended) Zadeh discloses the aspects of the image recognition support apparatus according to claim 1, wherein the detection result processing unit calculates, as a display switching label of the image, a display switching label having a similarity greater than or equal to a predetermined value calculated using the image language model among combinations of the image and a display switching label indicating one or more language labels (reviews entirety of image for objects and labeling based on the entire image with objects within: [2399] In one embodiment, the system shows the nodes connected to the original node, in the Z-web, on screen or monitor for the user to see, for more research, search, hint, clue, or the like, pictorially or in text format or in voice format or bar-code or coded format or multimedia or mixed format or any other format. For example, if the user is searching for “outdoors”, the system shows the picture of “outdoors”, as well as e.g. a picture of “SUV” or “JEEP” automobile or “beach umbrella” or “flying kite”, on the side of the screen, so that the user or her friends can select the side pictures and continue going deep with it, in one or more directions in the Z-web, for related concepts or objects or people, e.g. by clicking on them, to discover more in the Z-web and the knowledge base. Auto-Annotation & Feature-Enabling Web Albums: [2400] For example, see FIG. 212. In one embodiment, we search e.g. by gender, or looking for a “red dress”, color of dress, or “wedding dress”, or looking for “cold weather”, using indicators, e.g. hat, coat, GPS data (for location on hemisphere or planet Earth), ice, snow, time or season, or the like. In one embodiment, we look for bride in an album (image album or video library) for wedding, or wedding indicators, e.g. wedding dress, tags, comments, formal dress, cake, tie, or jacket. In one embodiment, we do the same in frames of video. [2403] In one embodiment, we can search by e.g. people's name, occasion, or time, or by the characteristics of a dress, e.g. type of dress, e.g. “wedding dress”, or attributes, e.g. color, or objects, e.g. hat or type of “hat”, or e.g. abstract level of relationship for web or semantic web, e.g. “bride”, to search for “wedding dress”, or “cold weather”, to search for hat, snow, or ice, or search for concepts or environments, e.g. night or day (in the image, based on color, histogram, “Moon”, “Sun”, intensity, time, or the like). Then, in one embodiment, after search, we can organize and rank the results. The result is based on images, or parts of images with objects in them, or highlighted parts. In one embodiment, with a click of the mouse or selector on screen or monitor or display, the system goes or jumps or refers to other data about the image, or original image itself, or relationships for the image. In one embodiment, we can get the information about the body of the person, or infer the age of person, in the image, using the metadata (or the like). In one embodiment, we search e.g. for CASUAL EVENT, e.g. looking for casual dress, tie, or jacket, coming from a database or rule storage, to expand the search terms. See also [2404]).
Regarding Claim 9, (Original) Zadeh discloses the aspects of the image recognition support apparatus according to claim 1, wherein the detection result processing unit regards a plurality of specified objects as one object, and calculates the attribute detail label of the one object (reviews entirety of image for objects and labeling based on the entire image with objects within: [2399] In one embodiment, the system shows the nodes connected to the original node, in the Z-web, on screen or monitor for the user to see, for more research, search, hint, clue, or the like, pictorially or in text format or in voice format or bar-code or coded format or multimedia or mixed format or any other format. For example, if the user is searching for “outdoors”, the system shows the picture of “outdoors”, as well as e.g. a picture of “SUV” or “JEEP” automobile or “beach umbrella” or “flying kite”, on the side of the screen, so that the user or her friends can select the side pictures and continue going deep with it, in one or more directions in the Z-web, for related concepts or objects or people, e.g. by clicking on them, to discover more in the Z-web and the knowledge base. Auto-Annotation & Feature-Enabling Web Albums: [2400] For example, see FIG. 212. In one embodiment, we search e.g. by gender, or looking for a “red dress”, color of dress, or “wedding dress”, or looking for “cold weather”, using indicators, e.g. hat, coat, GPS data (for location on hemisphere or planet Earth), ice, snow, time or season, or the like. In one embodiment, we look for bride in an album (image album or video library) for wedding, or wedding indicators, e.g. wedding dress, tags, comments, formal dress, cake, tie, or jacket. In one embodiment, we do the same in frames of video. [2403] In one embodiment, we can search by e.g. people's name, occasion, or time, or by the characteristics of a dress, e.g. type of dress, e.g. “wedding dress”, or attributes, e.g. color, or objects, e.g. hat or type of “hat”, or e.g. abstract level of relationship for web or semantic web, e.g. “bride”, to search for “wedding dress”, or “cold weather”, to search for hat, snow, or ice, or search for concepts or environments, e.g. night or day (in the image, based on color, histogram, “Moon”, “Sun”, intensity, time, or the like). Then, in one embodiment, after search, we can organize and rank the results. The result is based on images, or parts of images with objects in them, or highlighted parts. In one embodiment, with a click of the mouse or selector on screen or monitor or display, the system goes or jumps or refers to other data about the image, or original image itself, or relationships for the image. In one embodiment, we can get the information about the body of the person, or infer the age of person, in the image, using the metadata (or the like). In one embodiment, we search e.g. for CASUAL EVENT, e.g. looking for casual dress, tie, or jacket, coming from a database or rule storage, to expand the search terms. See also [2404]).
Regarding Claim 10, (Currently Amended) Zadeh discloses the aspects of the image recognition support apparatus according to claim 1, wherein the processor further functions as a clustering unit that performs clustering processing on the object included in a specified region in the image and divides objects at positions relative to each other into a plurality of groups having equal number of objects, wherein the detection result processing unit regards the group as one object, and calculates the group and the attribute detail label of the group (Data Extraction, Including Emotions and Taste: [2053] In one embodiment, we have a face recognition based on the chunks or pieces of face, e.g. recognizing nose or lips, individually and with respect to each other, to confirm that they constitute a face, e.g. with respect to relative position or size. The parameters are all fuzzy parameters, in one embodiment. The relationship and relative position or size can be expressed through our Z-web, as a method of recognition of an object, with all its components, to first see that it is actually a face, and if so, whose face it belongs to, i.e. recognize the person in the next step. The shape and size of the components of a face or object are expressed in fuzzy relationships or fuzzy rules, in one embodiment. Or, it can be stored as a target object or training sample in a database or library or storage, for recognition, training, and comparison purposes. See also Face Locating Module, Data Type, Feature Detection, Using Basis Objects or Basis Windows, [2618] and Image Matching: [2487]-[2492]).
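As a non-limiting illustration of the clustering limitation as interpreted (objects within a specified region divided, based on their relative positions, into groups having an equal number of objects, each group then treated as one object), the following sketch is provided; the sort-and-chunk grouping and the group size are illustrative assumptions, not the claimed or cited method:

def group_objects_by_position(centers, objects_per_group=2):
    # centers: (x, y) positions of detected objects within the specified region.
    # Objects are ordered by position and divided into groups of equal size; each
    # group may then be labeled with a single attribute detail label.
    ordered = sorted(centers, key=lambda c: (c[0], c[1]))
    return [ordered[i:i + objects_per_group]
            for i in range(0, len(ordered), objects_per_group)]

# Example: six detected objects grouped into three groups of two.
print(group_objects_by_position([(10, 5), (12, 7), (30, 6), (33, 8), (50, 5), (52, 9)]))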
Regarding Claim 11, (Original) Zadeh discloses the aspects of the image recognition support apparatus according to claim 1, wherein the image is a monochrome image, a color image, an infrared image, or a computer graphics image ([2773] There is a great wealth of information in image and video content, which cannot be obtained from text data. People and machines generate ever increasing volume of images and videos, e.g., using mobile devices with cameras. Search for an object by text alone is inadequate. Current image search engines deliver incomplete knowledge, with unreliable or irrelevant results. By analyzing/recognizing images, highly targeted and more relevant ads may be supplied to the users. In one embodiment, a search engine platform for image and/or video is used, e.g., for recognition of objects and/or humans, with high reliability, relevance, and speed. One embodiment results in high rate of click through and/or conversion for display ads relevant to the displayed items on webpages. Appendix 4 (slides including photographic images) depicts various embodiments of the invention. [2754] In one embodiment, first, the face is located in an image, e.g. using Viola-Jones algorithm. Then, two or more images of a person are captured at 2 or more different bands of spectrum of light, using different detectors or sensors or cameras, at different ranges of frequencies. Then, the captured images are aligned and normalized, for the referencing or comparison to each other. Then, the accessories, e.g. hair or eyeglasses, are removed from the image. Then, the system normalizes or equalizes the histogram from the images, to reduce any environmental effects or camera effects. Then, any of the methods below can be applied, for face recognition (or other methods we described in our prior disclosures). In one embodiment, the system uses grayscale image or infrared image of the face or object, as the basis, for recognition.).
Regarding Claim 12, (Currently Amended) Zadeh further discloses the aspects of the image recognition support method as interpreted and rejected in light of claim 1, enumerated above; please see the above rejection of claim 1.
Regarding Claim 13, (New) Zadeh further discloses aspects of the image recognition support apparatus according to claim 1, wherein the processor is configured to transmit a work instruction in text or voice format (gesture, voice, text) to a photographer who is capturing the image or a worker at a capturing site (worker, user, friends, family, co-worker, boss, employee, contractor, senior management, public, social network, college, school, classmate, roommate, household, shared device, shared account, friend-of-friend, friend-of-friend-of-friend, and so on, or the like), the work instruction being based on a recognition situation of the image ([2471] In one embodiment, the system is used for hand gesture analysis from a video, with templates of sign language in different styles or languages, for translation to regular English or other languages or text or voice, or for analysis of hand gesture in other applications, e.g. for baseball game, or for construction workers in a noisy environment with critical results, or for codes between friends, or for special symbols between cultures or people, e.g. “V” sign, by 2 fingers, indicating VICTORY. [2472] In one embodiment, the system is used for tracking and understanding video or camera images, e.g. for a computer or smart phone or tablet input (or computer game systems), e.g. for capturing and interpreting the finger(s), hand, body, face, eye, eyebrow, nose, mouth, hat on the head, eyeglasses on the head, and the like, for poses, gestures, sequences, movements, and the like, based on coded definitions or prior interpretations or stored sequences or videos or images or frames, for comparison and analysis, to match and interpret the meaning, e.g. to convert to text or computer commands or codes, e.g. to initiate an action on the device, or other functionalities or options on the device, e.g. emailing a file or picture to a friend. [2473] Or, for example, the system interprets a “closed first” for left hand and “circular motion” with right index finger as e.g. a command for “drawing a complete circle on the screen”, on the drawing software, using pre-programmed sequences or commands or codes or executables on the drawing software, based on the library of hand motions, e.g. in the server farm, to initiate such an action, to draw a circle on the screen or display. In one embodiment, the system combines e.g. text commands and voice commands, as well, to e.g. move the circle (in the example above) around on the screen, e.g. for “move up” command, or using an “arrow-up” on the keyboard, to e.g. move the “circle” up on the display.).
Regarding Claim 15, (New) Zadeh further discloses the aspects of the image recognition support apparatus according to claim 1, wherein the processor is configured to obtain the expanded image query by performing image conversion processing on the image of the object, the image conversion processing including at least one of color conversion, super-resolution, affine transformation, text removal, and noise removal ([1680] In one embodiment, noise is incorporated into the rendering in order to make the network more resilient to noise. In one embodiment, a stochastic noise (e.g., Gaussian) is applied to the rendering, e.g., in illumination, intensity, texture, color, contrast, saturation, edges, scale, angles, perspective, projection, skew, rotation, or twist, across or for portion(s) of the image. In one embodiment, noise is added to a hidden layer in a reproducible manner, i.e., for a given data sample (or for a given model parameters), in order to adjust the weight to result in a more modal range of activities to increase tolerance for noise. See also ¶1685, ¶3219).
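As a non-limiting illustration of the image conversion processing recited in claim 15 (e.g., color conversion, affine transformation, and noise removal applied to the image of the object to obtain expanded image queries), the following sketch is provided; the Pillow calls and parameter values are generic assumptions and are not drawn from Zadeh or Applicant's disclosure:

from PIL import Image, ImageFilter

def expand_image_query(crop):
    # crop: a PIL image of the detected object.
    grayscale = crop.convert("L")                               # color conversion
    translated = crop.transform(crop.size, Image.Transform.AFFINE,
                                (1, 0, 5, 0, 1, 5))             # small affine shift
    denoised = crop.filter(ImageFilter.MedianFilter(3))         # basic noise removal
    return [crop, grayscale, translated, denoised]

# Example with a synthetic image standing in for a detected-object crop.
variants = expand_image_query(Image.new("RGB", (64, 64), "gray"))
print(len(variants))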
Regarding Claim 16, (New) Zadeh further discloses the aspects of the image recognition support apparatus according to claim 1, wherein the processor is configured to issue an alarm (which then can be detected for a specific person, activity, object, action, or sequence, e.g. for suspicious activities, e.g. to alarm the police or authorities) (this limitation is claimed in the alternative, and thus only one alternative needs to be met by the prior art of record) or when the similarity exceeds a predetermined threshold ([2394] In one embodiment, we get confidence factor and reliability factor, as Z-factors. In one embodiment, we use it for dating sites or type of look search or FBI face search. In one embodiment, we get images of object A, rather than images related to object A. In one embodiment, we use video frame images or sequences or major changes. In one embodiment, we search by image or video piece (e.g. based on percentage of matches between single frames or series of frames, with a threshold(s)), for action sequences or emotion or pose or behavior, e.g. walking or drunk walking or explosion scene or sitting or rocket take-off, for human or animal or object, e.g. for similar to dictionary or reverse dictionary search. In one embodiment, we have self-annotate function, by user or friends. In one embodiment, we have auto-annotate function. In one embodiment, we have the data indexed or linked from the database. In one embodiment, we locate or identify the person or object in various videos or still images. In one embodiment, we use audio track or OCR for recognition analysis. In one embodiment, we name the person with real name that we had found. In one embodiment, we tag the person as person X, as a substitute, until we name the person later on, with his/her real name, when it is known to us. In one embodiment, we have video frames or pieces identified with its track or piece number or ID.).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Zadeh et al. (Zadeh; U.S. Patent Application Publication 2018/0204111 A1) in view of Li (U.S. Patent Application Publication 2023/0386183 A1).
Regarding Claim 14, (New) Zadeh further discloses the aspects of the image recognition support apparatus according to claim 1, wherein the image is a satellite image or aerial image (Satellite or Aerial ¶0222-0223, ¶1324, 1335, 1579) comprising thermal images or synthetic images, and wherein the processor is configured to assign the attribute detail label to the attribute in the image without adding learning data or performing manual label assignment work ([2768] In one embodiment, the system classifies e.g. a bird, e.g. a Cardinal, as a male and female, due to various or different appearances, or various colors due to different seasons or climates for animals, as a separate clusters or classifications, that are later related or connected by tags and extra information, to be in the same or under the same family or name or umbrella. In one embodiment, the system classifies the Cardinal, as a bird, and carries both information with the object, for the fact that all description of BIRD carries here, as inheritance, for simplicity for recognition for faster result, to describe the parameters or units defining a bird in an image (or sound or the like), from a template for BIRD in a database, already populated or learned. For example, see the example above, for the description of an insect with geometrical units or alphabets.).
Zadeh does not explicitly teach that the satellite or aerial image is thermal or synthetic; however, in further embodiments, Zadeh teaches infrared imaging for object recognition and “[3177] This can be for any kind of data, not just image. The ZAC-AI platform is horizontal, feeding the vertical applications, e.g., for image recognition, e.g., for clothing, shoe, bag, face, biometrics, satellite, aerial, building, structures, landmarks (artificial or natural), or medical, for end-users for, e.g., image referral network, image ad network, searchable images and videos, mobile and wearable devices, smart cameras and phones, social network, tracking and monitoring, analytics, security and intelligence, dating sites, location services, maps, tourism, real estate, electronic medical records, diagnostic tools, fraud detection, e.g., for blockchain and banking, or the like.”
In the same field of endeavor, satellite and aerial imaging, Li teaches “[0007] Accordingly, implementations are described herein for generating realistic synthetic satellite imagery that simulates myriad variations of terrain conditions. Implementations are also described herein for using this realistic synthetic satellite imagery to train a remote sensing machine learning model training so that it can be used to process satellite imagery to infer terrain conditions in satellite imagery for which underlying terrain conditions are not readily known (e.g., from ground-level observations).”
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the presently filed invention to modify the system of Zadeh with the teachings of Li to train a remote sensing machine learning model so that it can be used to process satellite imagery to infer terrain conditions in satellite imagery for which underlying terrain conditions are not readily known (e.g., from ground-level observations).
Response to Arguments
Claim Rejections Under 35 U.S.C. §103
The Examiner most respectfully disagrees with Applicant's assertion that Zadeh fails to teach or suggest the specific technical workflow recited in amended claim 1, i.e., that Zadeh broadly discusses aspects of image recognition and similarity calculation but does not disclose or suggest the particular approach of generating and evaluating expanded queries as claimed, and that this general discussion of ZAC methods for complex scenes does not teach the specific steps of generating expanded image queries that include, overlap, or are included in the detected object, combined with expanded language queries derived from synonyms, templates, and labels. As noted in the claim rejection, paragraph [1750] discusses objects with overlap and concept matching:
“[1750] In one embodiment, the datasets (e.g., images) include (or associated with) various objects or concepts (e.g., face, body, book, computer, chair, car, plane, road, and building). In one embodiment, classifiers are trained to detect high level signatures/features of various objects/concepts, e.g., by training the classifiers with (labeled) training data sets, including those with and without object features. Some data sets may include multiple objects or concepts, and therefore, the occurrences of the objects/concepts overlap. In one embodiment, a classifier may classify multiple objects/concepts. In one embodiment, the correlations between the objects/concepts are determined as the result of classification of various datasets. In one embodiment, a data-concept matrix is setup based on the classification of the data sets, and further analyzed, for example, by decomposition using orthogonal matrices and a (e.g., low dimensional) diagonal matrix (e.g., to a low dimensional space), e.g., by using single value decomposition technique (SVD). In one embodiment, this dimensional space represents various contexts (e.g., family, sitting, coworkers, house, office, city, outdoor, and landscape) that support or relate to various object/concepts. In one embodiment, each context represents/contributes a set of weights representing the relationships between object/concepts.”
Additionally, Zadeh teaches here labeling object features, which, per paragraph [2409] of Zadeh's disclosure, include synonyms and antonyms for concepts and objects. The combination of embodiments of Zadeh teaches the expanded image queries with expanded language queries and meets the broadest reasonable interpretation of the claimed invention; therefore, the Examiner most respectfully maintains the rejection.
The Examiner most respectfully disagrees with Applicant's assertion that, while Zadeh discusses classifying objects and relating them through tags and information, it lacks the claimed generation of specific combinations of expanded image and language queries, i.e., that [Zadeh's] general classification and tagging approach does not disclose the specific steps of generating combinations of expanded queries, calculating similarities for each combination, and determining an average similarity from high-scoring combinations as recited in amended claim 1. In the Image Matching section beginning at paragraph [2487], the system utilizes several parameters for photos or images and compares them based on parameter matching by calculating a weighted average score:
“[2492] In one embodiment, the system evaluates the totality of all N parameters for matching photos or images or faces, or compares them using weights for more emphasis on some parameters, or adds all the scores for comparisons together for all parameters, or do a weighted average or score or vote for N parameters (e.g. N comparisons), e.g. against or versus one or more thresholds, e.g. N threshold values, or do a fuzzy comparison with no hard boundary or thresholding for any parameter, using fuzzy sets, fuzzy rules engine, or membership functions, for each or all parameter(s) or comparison(s).”
As previously noted, several of the parameters and comparisons are with the labeled data containing the synonyms, antonyms, and the several words and phrases that describe the detected object, person, or thing. The combination of embodiments of Zadeh teaches the expanded image queries with expanded language queries and meets the broadest reasonable interpretation of the claimed invention; therefore, the Examiner most respectfully maintains the rejection.
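For clarity, the weighted-average matching described in Zadeh at [2492] may be sketched as follows; the example weights and threshold are illustrative assumptions only:

def weighted_match(scores, weights, threshold=0.7):
    # scores: N per-parameter comparison scores; weights: emphasis per parameter.
    # The weighted average of the N comparisons is checked against a threshold.
    total_weight = sum(weights)
    weighted_average = sum(s * w for s, w in zip(scores, weights)) / total_weight
    return weighted_average, weighted_average >= threshold

# Example: three parameter comparisons with more emphasis on the first parameter.
avg, is_match = weighted_match([0.9, 0.6, 0.8], [2.0, 1.0, 1.0])
print(avg, is_match)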
The Examiner most respectfully disagrees with Applicant’s assertion that Zadeh’s similarity calculations focus on matching image signatures or hashes to a repository rather than evaluating specific combinations of expanded image and language queries, and that this approach differs significantly from the claimed method of generating and evaluating combinations of expanded queries.
Applicant’s specification discusses these elements in the originally filed disclosure at paragraphs [0035]-[0039]:
[0039] Next, the detection result processing unit 113 generates a combination of the expanded image query 310 (the original image 311 and the expanded images 312 to 314) and the expanded language query 320 (the original text 321 and the expanded texts 322 to 324). Next, the detection result processing unit 113 calculates a similarity of each combination using the image language model 123, and calculates an average of similarity having high predetermined number or predetermined ratio from combinations having a high similarity. Any average such as a geometric average may be used as the average in addition to an arithmetic average. When the average of the similarity is equal to or larger than a predetermined value, the detection result processing unit 113 stores the expanded image query and the expanded language query included in the combination, and the similarity of the combination in the attribute detail label of the image database 130 (see Fig. 2), and notifies the display control unit 115 to be described later of them. The display control unit 115 displays the expanded image query and the expanded language query (see Fig. 9). Note that in Fig. 5, the similarity is indicated by a thickness of a line connecting the expanded image query and the expanded language query. For example, a line connecting the expanded image 314 and the expanded text 322 is the thickest, which indicates that the expanded image 314 and the expanded text 322 have a maximum similarity.
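For clarity, the Examiner’s reading of this passage may be sketched as follows (an interpretation sketch only; the similarity function stands in for the image language model 123, and all names, the value of k, and the threshold are hypothetical):

from itertools import product
from statistics import mean

def top_k_average_similarity(image_queries, language_queries, similarity, k=3, threshold=0.5):
    # Pair every expanded image query with every expanded language query,
    # score each pair with the image language model, average the k highest
    # scores (an arithmetic average here; [0039] permits other averages),
    # and compare the average against a predetermined value.
    pairs = list(product(image_queries, language_queries))
    scored = sorted(((similarity(img, txt), img, txt) for img, txt in pairs),
                    key=lambda item: item[0], reverse=True)
    top = scored[:k]
    average = mean(score for score, _, _ in top)
    # The high-scoring combinations are kept only if the average clears the threshold.
    return (top, average) if average >= threshold else ([], average)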
Zadeh teaches:
Image of an Object (Versus Image Related to an Object (or Face or Person)):
[2393] For example, see FIG. 209. In one embodiment, we get features from images and video, along with user identity, relationships, and annotations, to match features with labels, users, entities, to auto-annotate and apply matched relationships. Then, in one embodiment, for a new image and/or metadata of that image or video, the system extracts features, and then, the system matches with existing features or annotations, as mentioned above, e.g. based on rules and policy engine. Then, in one embodiment, for matched images, the system locates or identifies same or similar person, object, or entity, based on matched features, in various images, photos, audios, or videos (e.g. the location in video frames, e.g. at 30-50 second points or frames or ranges), e.g. based on annotations, including auto-annotations. In one embodiment, the system locates (for pictures) at what part of picture or body, and for videos, it locates at what track and timing or segment of track, using scene name or number or ID, or for frames, locating with location or position or coordinate (at which part(s) of the frame or image)…. [2492] In one embodiment, the system evaluates the totality of all N parameters for matching photos or images or faces, or compares them using weights for more emphasis on some parameters, or adds all the scores for comparisons together for all parameters, or do a weighted average or score or vote for N parameters (e.g. N comparisons), e.g. against or versus one or more thresholds, e.g. N threshold values, or do a fuzzy comparison with no hard boundary or thresholding for any parameter, using fuzzy sets, fuzzy rules engine, or membership functions, for each or all parameter(s) or comparison(s). Here, the weighted average of parameters gathered from the annotated image data (the expanded image and language queries), compared against a threshold, meets the claimed subject matter in light of the specification, and the Examiner most respectfully maintains the rejection.
[1439] In one embodiment, parallel/suggestive attributes are queried, e.g., from an attribute/relationship database. For example, a parallel/suggestive query for “Age” attribute, results in attribute “Birth”. In one embodiment, a template set of attributes/relationship is determined based on the result of such query. For example, along with attribute/event “Birth”, other related attributes, e.g., “Time” and “Place” related to “Birth” are returned as set/template for application and instantiation. For example, such template is applied to objects/records “Vera”, “Rob”, and “Alice”, e.g., based on their existing attribute “Age”. In one embodiment, the instantiation of template results in separate records and relationships for each instance. A template may include a class level attribute with instantiation at the class level. In one embodiment, the expanded attributes/relationships are supplemented to the relationships and records, e.g., in database. In one embodiment, a protoform of the existing attributes/relationships are instantiated and/or linked to the objects/records, as for example, depicted in FIG. 120(b) (in dotted lines): Mother(Rob) is Vera.
[1457] In one embodiment, the contextual facts/functions are provided as template/set to supplement via instantiation and/or used in bind/join operation. In one embodiment, such instantiation further extends the attributes related to records/objects, as for example depicted in FIG. 120(d) in dotted lines, expanding “Elapsed” attribute/function on “Time” attribute, i.e., on “Time(Birth(Vera))”, “Time(Birth(Rob))”, and “Time(Birth(Alice))”
[1579] This can be applied to any pattern recognition system or method, such as image mining or recognition on a large number of images (for example, for satellite or radar or laser or stereo or 3D (3-dimensional) imaging), e.g. using a knowledge-based database, with metadata attached or annotated to each image, identifying the source, parameters, or details of the image, e.g. as keywords or indices (which can also be used for database query). This can be used as a user-trainable search tool, employing a neural network module, with scoring functions using examples and counterexamples histograms. For example, in a bin (or partition) where there are more counterexamples than the number of examples, the resulting score is negative. These can be used for the recognition of (for example) trucks, cars, people, structures, and buildings in the images, with membership values associated with each target recognition. Each stored object or class of objects in the database (of all possible objects) has a signature (or one or more specific features, in an N-dimensional feature space, such as the length of the object, the angle between two lines, or the ratio of the length-to-width of the object), which can be matched to (or compared with) a target, with a corresponding membership value for each feature. This can be used for biometrics and security applications, as well, such as face recognition, iris recognition, hand recognition, or fingerprint recognition (e.g. with feature vectors defined from the curved pieces on fingerprints).
[2309] In one embodiment, for machine translation, we use alignment lines between corresponding words and phrases, sometimes in different order in the sentence. In one embodiment, for machine translation, we use a pyramid (called Vauquois Triangle), starting from base as source language text, as input, and ending at other end at the base of pyramid, as output, as target language text. For the first level, we have words to words, direct translation. Then, on the 2.sup.nd level, for synthetic structure, we have synthetic transfer. Then, on the 3.sup.rd level, for semantic structure, we have semantic transfer. Then, on the top, at peak, we have interlingua. So, starting from input base on the bottom of the pyramid or triangle, going up between each level to the peak, we have morphological analysis input to the first level (words), which feeds parsing to the second level (synthetic structure), which feeds shallow synthetic analysis to the 3.sup.rd level (semantic structure), which feeds conceptual analysis to the peak (interlingua), which feeds back down from the top, conceptual generation to lower level (semantic structure), which feeds semantic generation to the lower level (synthetic structure), which feeds synthetic generation to the lower level (words), which outputs morphological generation for target language text, at the bottom of the pyramid, at the other side. Therefore, now, we have a complete machine translation method and system here.
[2310] In one embodiment, for machine translation, we use statistical alignment lines, or we use offset alignment lines, using signal processing methods, e.g. on bit text maps, to correspond the matching text together in different languages. In one embodiment, for text categorization, we use decision trees, using conditional probability and training sets. In one embodiment, for ranking or recognition, we use the frequency and distribution of some keywords. In one embodiment, the keywords can be obtained from the related nodes in Z-web.
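As an illustration only of the bin-scoring idea in paragraph [1579] above (a sketch under assumed inputs, not Zadeh’s implementation), a bin’s score may be taken as the count of examples minus the count of counterexamples, going negative when counterexamples dominate:

from collections import Counter

def bin_scores(example_bins, counterexample_bins):
    # Score each feature bin as (#examples - #counterexamples); a bin with
    # more counterexamples than examples scores negative, per [1579].
    ex, cx = Counter(example_bins), Counter(counterexample_bins)
    return {b: ex[b] - cx[b] for b in set(ex) | set(cx)}

# e.g. bin_scores([0, 0, 1, 2, 2, 2], [1, 1, 2]) -> {0: 2, 1: -1, 2: 2}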
While the Examiner appreciates the benefits Applicant notes for the claimed methodology, the Examiner kindly notes that the approach presently claimed is met by Zadeh’s teachings on image recognition and classification, and because the references meet the claimed elements, the claims remain rejected.
Applicant further submits that amended claim 1 is allowable over Zadeh, and that independent claim 12, as amended, recites features similar to those of claim 1 and is allowable for at least the same reasons. The Examiner most respectfully disagrees for the reasons set forth above with respect to claim 1, and the rejection of claim 12 is likewise maintained.
Claims 2-11 remain rejected by virtue of their dependency, either directly or indirectly, on claim 1, for the reasons stated above in response to the arguments and amendments to claim 1.
Conclusion
The following prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure:
Rumble, U.S. Patent Application Publication 2011/0025851 A1, discloses IMAGE ACQUISITION:
[0003] It is known to provide aerial thermal imaging maps, for example to provide an overview of heat emitted over a broad area such as over a built-up area or to identify locations of raised temperature. Individual areas, objects, people or buildings can be readily identified using various techniques involving the use of aerial thermal imaging. For mapping purposes, a thermal image can be overlaid against a known map to identify and locate relevant locations or buildings for further investigation.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Emily C Terrell whose telephone number is (571)270-3717. The examiner can normally be reached Monday - Thursday 7 a.m.-4 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EMILY C TERRELL/Supervisory Patent Examiner, Art Unit 2666