DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged that this application is a National Stage entry of PCT International Application No. PCT/JP2022/032977. Priority to JP2021-195454, with a priority date of 12/01/2021, is acknowledged under 35 U.S.C. 119(a)-(d) and 37 CFR 1.55.
Information Disclosure Statement
The IDS dated 7/30/2021 has been considered and placed in the application file.
Specification Objection - Title
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The following title is suggested: Machine Learning Device, Method, and Non-Transitory Computer-Readable Medium for Semantic Vector Generation.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 4, and 5 are rejected under 35 U.S.C. 103 as obvious over US 2018/0225548 A1 (Cao et al.) in view of US 2019/0325243 A1 (Sikka et al.).
Claim 1
Regarding Claim 1, Cao et al. teach a machine learning device comprising: a feature extraction unit that extracts a feature vector from input data; ("the extraction submodule 310 transforms the objects into low-level features," par. 62) ("Each object is associated with a low-level feature vector denoted by x.sub.i ∈R.sup.m.sup.0, which can be obtained from the extraction submodule 310," par. 71) a semantic vector generation unit that generates a semantic vector from semantic information added to the input data; ("the representation submodule 310 transforms the high-level labels into multiple views of semantic representations," par. 62) ("Each label is associated with a high-level semantic representation in a vector form, y.sub.v(z.sub.i) ∈R.sup.m.sup.v, which can also be obtained from the extraction submodule 310," par. 71) a semantic prediction unit that has been trained in advance in a meta-learning process and that generates a semantic vector from the feature vector of the input data; ("This mapping process, as shown in FIG. 4 and performed in the training phase using the training data of ‘known’ object or scene classes, will capture how the low-level image features (e.g., the features extracted within the different bounding boxes in the input image) are related with the high-level semantic descriptions (e.g., a walking child, head lights, tires, a stop sign, a yellow bus) of the scene label (e.g., ‘a school bus picking up a child’)," par. 76) a mapping unit that has learned a base class and that generates a semantic vector from the feature vector of the input data; ("An objective of an embedding technique is to map vectors from different spaces to a common space, such that there is a unified representation (e.g. same dimensionality) for the vectors, allowing for a convenient comparison of the vectors," par. 74) and when semantic information is not added to input data of a novel class at the time of learning the novel class ("while in the zero-shot object recognition mode 320, the model 316 will be only tested on objects with unseen 324 labels," par. 66).
Cao et al. do not explicitly teach all of an optimization unit that optimizes parameters of the mapping unit using the semantic vector generated by the semantic prediction unit as a correct answer semantic vector such that a distance between the semantic vector generated by the mapping unit and the correct answer semantic vector is minimized.
However, Sikka et al. teach an optimization unit that optimizes parameters of the mapping unit using the semantic vector generated by the semantic prediction unit as a correct answer semantic vector ("the seen object bounding box was assigned the object class label of the ground-truth box by the bounding box proposal module 110. Each ground truth bounding box is capable of generating several object bounding boxes for training the semantic embedding space," par. 50) such that a distance between the semantic vector generated by the mapping unit and the correct answer semantic vector is minimized ("At 716, a similarity measure is computed between the projected features of the proposed object bounding box and the embedded features of at least one of the embedded foreground object bounding boxes and the embedded background object bounding box in the semantic embedding space. The method 700 can proceed to 718. At 718, an object class label is predicted for the features/object of the proposed object bounding box by determining at least one of a nearest foreground object class and a nearest background object class to the projected features of the proposed object bounding box in the semantic embedding space based on the similarity measure computed for the projected features of the proposed object bounding box," par. 78-79).
Therefore, taking the teachings of Cao et al. and Sikka et al. as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the feature vector extraction taught by Cao et al. to use the ground-truth parameter optimization taught by Sikka et al. The suggestion/motivation for doing so would have been that "an object class label is predicted for the features/object of the proposed object bounding box by determining at least one of a nearest foreground object class and a nearest background object class to the projected features of the proposed object bounding box in the semantic embedding space based on the similarity measure computed for the projected features of the proposed object bounding box," as noted by the Sikka et al. disclosure in paragraph [0079]. The combination is further motivated because it would predictably improve efficiency: there is a reasonable expectation that selecting the nearest foreground/background object class in the semantic embedding space provides a robust and efficient classification for zero-shot object detection. Moreover, doing so merely combines prior art elements according to known methods to yield predictable results.
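For illustration only (this sketch is not part of the disclosure of either reference), the claimed optimization — training a mapping so that the distance between the mapped semantic vector and a correct-answer semantic vector is minimized — can be shown as a minimal single-sample gradient descent. The linear mapping, Euclidean loss, learning rate, and dimensions are assumptions of the sketch.

```python
import numpy as np

def optimize_mapping(W, x, s_target, lr=0.1, steps=200):
    """Minimal sketch: gradient descent on the squared Euclidean distance
    between the mapped vector W @ x and the correct-answer semantic
    vector s_target (all names here are hypothetical)."""
    for _ in range(steps):
        residual = W @ x - s_target          # mapped vector minus correct-answer vector
        grad = 2.0 * np.outer(residual, x)   # gradient of ||W @ x - s_target||**2 w.r.t. W
        W = W - lr * grad                    # step that shrinks the distance
    return W

# Hypothetical 3-D feature vector and 2-D correct-answer semantic vector.
x = np.array([0.6, 0.8, 0.0])
s_target = np.array([1.0, -1.0])
W = optimize_mapping(np.zeros((2, 3)), x, s_target)
```

After training, `np.linalg.norm(W @ x - s_target)` is driven toward zero, which is the sense in which the distance between the mapping's output and the correct-answer vector is "minimized."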
The rejection of device claim 1 above applies mutatis mutandis to the corresponding limitations of method claim 4 and non-transitory computer-readable medium claim 5, noting that the citations above encompass both method and non-transitory computer-readable medium disclosures. Claims 4 and 5 are mapped below for clarity of the record and to specify any new limitations not included in claim 1.
Claim 2
Regarding claim 2, Cao et al. and Sikka et al. teach the machine learning device according to Claim 1 as noted above. Cao et al. teach when semantic information is added to the input data of the novel class ("In the general object recognition mode 318, the model 316 will be tested on objects with seen 322 or unseen 324 labels," par. 66).
Cao et al. do not explicitly teach all of wherein the optimization unit optimizes the parameters of the mapping unit using the semantic vector generated by the semantic vector generation unit as the correct answer semantic vector such that the distance between the semantic vector generated by the mapping unit and the correct answer semantic vector is minimized.
However, Sikka et al. teach wherein the optimization unit optimizes the parameters of the mapping unit using the semantic vector generated by the semantic vector generation unit as the correct answer semantic vector ("the seen object bounding box was assigned the object class label of the ground-truth box by the bounding box proposal module 110. Each ground truth bounding box is capable of generating several object bounding boxes for training the semantic embedding space," par. 50) such that the distance between the semantic vector generated by the mapping unit and the correct answer semantic vector is minimized ("At 716, a similarity measure is computed between the projected features of the proposed object bounding box and the embedded features of at least one of the embedded foreground object bounding boxes and the embedded background object bounding box in the semantic embedding space. The method 700 can proceed to 718. At 718, an object class label is predicted for the features/object of the proposed object bounding box by determining at least one of a nearest foreground object class and a nearest background object class to the projected features of the proposed object bounding box in the semantic embedding space based on the similarity measure computed for the projected features of the proposed object bounding box," par. 78-79).
Cao et al. and Sikka et al. are combined as per claim 1.
Claim 4
Regarding Claim 4, Cao et al. teach a machine learning method comprising: extracting a feature vector from input data; ("the extraction submodule 310 transforms the objects into low-level features," par. 62) ("Each object is associated with a low-level feature vector denoted by x.sub.i ∈R.sup.m.sup.0, which can be obtained from the extraction submodule 310," par. 71) generating a semantic vector from semantic information added to the input data; ("the representation submodule 310 transforms the high-level labels into multiple views of semantic representations," par. 62) ("Each label is associated with a high-level semantic representation in a vector form, y.sub.v(z.sub.i) ∈R.sup.m.sup.v, which can also be obtained from the extraction submodule 310," par. 71) generating a semantic vector from the feature vector of the input data by using a semantic prediction module that has been trained in advance in a meta-learning process; ("This mapping process, as shown in FIG. 4 and performed in the training phase using the training data of ‘known’ object or scene classes, will capture how the low-level image features (e.g., the features extracted within the different bounding boxes in the input image) are related with the high-level semantic descriptions (e.g., a walking child, head lights, tires, a stop sign, a yellow bus) of the scene label (e.g., ‘a school bus picking up a child’)," par. 76) generating a semantic vector from the feature vector of the input data by using a mapping module that has learned a base class; ("An objective of an embedding technique is to map vectors from different spaces to a common space, such that there is a unified representation (e.g. same dimensionality) for the vectors, allowing for a convenient comparison of the vectors," par. 74) and when semantic information is not added to input data of a novel class at the time of learning the novel class ("while in the zero-shot object recognition mode 320, the model 316 will be only tested on objects with unseen 324 labels," par. 66).
Cao et al. do not explicitly teach all of optimizing parameters of the mapping module using the semantic vector generated by the semantic prediction module as a correct answer semantic vector such that a distance between the semantic vector generated by the mapping module and the correct answer semantic vector is minimized.
However, Sikka et al. teach optimizing parameters of the mapping module using the semantic vector generated by the semantic prediction module as a correct answer semantic vector ("the seen object bounding box was assigned the object class label of the ground-truth box by the bounding box proposal module 110. Each ground truth bounding box is capable of generating several object bounding boxes for training the semantic embedding space," par. 50) such that a distance between the semantic vector generated by the mapping module and the correct answer semantic vector is minimized ("At 716, a similarity measure is computed between the projected features of the proposed object bounding box and the embedded features of at least one of the embedded foreground object bounding boxes and the embedded background object bounding box in the semantic embedding space. The method 700 can proceed to 718. At 718, an object class label is predicted for the features/object of the proposed object bounding box by determining at least one of a nearest foreground object class and a nearest background object class to the projected features of the proposed object bounding box in the semantic embedding space based on the similarity measure computed for the projected features of the proposed object bounding box," par. 78-79).
Cao et al. and Sikka et al. are combined as per claim 1.
Claim 5
Regarding Claim 5, Cao et al. teach a non-transitory computer-readable medium having a machine learning program comprising computer-implemented modules including: ("the computer program product includes computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having one or more processors," par. 17) a feature extraction module that extracts a feature vector from input data; ("the extraction submodule 310 transforms the objects into low-level features," par. 62) ("Each object is associated with a low-level feature vector denoted by x.sub.i ∈R.sup.m.sup.0, which can be obtained from the extraction submodule 310," par. 71) a semantic vector generation module that generates a semantic vector from semantic information added to the input data; ("the representation submodule 310 transforms the high-level labels into multiple views of semantic representations," par. 62) ("Each label is associated with a high-level semantic representation in a vector form, y.sub.v(z.sub.i) ∈R.sup.m.sup.v, which can also be obtained from the extraction submodule 310," par. 71) a semantic prediction module that has been trained in advance in a meta-learning process and that generates a semantic vector from the feature vector of the input data; ("This mapping process, as shown in FIG. 4 and performed in the training phase using the training data of ‘known’ object or scene classes, will capture how the low-level image features (e.g., the features extracted within the different bounding boxes in the input image) are related with the high-level semantic descriptions (e.g., a walking child, head lights, tires, a stop sign, a yellow bus) of the scene label (e.g., ‘a school bus picking up a child’)," par. 76) a mapping module that has learned a base class and that generates a semantic vector from the feature vector of the input data; ("An objective of an embedding technique is to map vectors from different spaces to a common space, such that there is a unified representation (e.g. same dimensionality) for the vectors, allowing for a convenient comparison of the vectors," par. 74) and when semantic information is not added to input data of a novel class at the time of learning the novel class ("while in the zero-shot object recognition mode 320, the model 316 will be only tested on objects with unseen 324 labels," par. 66).
Cao et al. do not explicitly teach all of an optimization module that optimizes parameters of the mapping module using the semantic vector generated by the semantic prediction module as a correct answer semantic vector such that a distance between the semantic vector generated by the mapping module and the correct answer semantic vector is minimized.
However, Sikka et al. teach an optimization module that optimizes parameters of the mapping module using the semantic vector generated by the semantic prediction module as a correct answer semantic vector ("the seen object bounding box was assigned the object class label of the ground-truth box by the bounding box proposal module 110. Each ground truth bounding box is capable of generating several object bounding boxes for training the semantic embedding space," par. 50) such that a distance between the semantic vector generated by the mapping module and the correct answer semantic vector is minimized ("At 716, a similarity measure is computed between the projected features of the proposed object bounding box and the embedded features of at least one of the embedded foreground object bounding boxes and the embedded background object bounding box in the semantic embedding space. The method 700 can proceed to 718. At 718, an object class label is predicted for the features/object of the proposed object bounding box by determining at least one of a nearest foreground object class and a nearest background object class to the projected features of the proposed object bounding box in the semantic embedding space based on the similarity measure computed for the projected features of the proposed object bounding box," par. 78-79).
Cao et al. and Sikka et al. are combined as per claim 1.
Claim Rejections - 35 USC § 103
Claim 3 is rejected under 35 U.S.C. 103 as obvious over US 2018/0225548 A1 (Cao et al.) and US 2019/0325243 A1 (Sikka et al.) in view of US 2021/0319263 A1 (Schwartz et al.).
Claim 3
Regarding Claim 3, Cao et al. and Sikka et al. teach the machine learning device according to Claim 1 as noted above. Cao et al. teach wherein the semantic vector generation unit generates a semantic vector from semantic information added to input data ("the representation submodule 310 transforms the high-level labels into multiple views of semantic representations," par. 62) ("Each label is associated with a high-level semantic representation in a vector form, y.sub.v(z.sub.i) ∈R.sup.m.sup.v, which can also be obtained from the extraction submodule 310," par. 71) and wherein the semantic prediction unit generates a semantic vector from a feature vector ("the extraction submodule 310 transforms the objects into low-level features," par. 62) ("Each object is associated with a low-level feature vector denoted by x.sub.i ∈R.sup.m.sup.0, which can be obtained from the extraction submodule 310," par. 71).
Sikka et al. teach wherein the optimization unit optimizes parameters of the semantic prediction unit using the semantic vector generated by the semantic vector generation unit as a correct answer semantic vector ("the seen object bounding box was assigned the object class label of the ground-truth box by the bounding box proposal module 110. Each ground truth bounding box is capable of generating several object bounding boxes for training the semantic embedding space," par. 50) such that the distance between the semantic vector generated by the semantic prediction unit and the correct answer semantic vector is minimized ("At 716, a similarity measure is computed between the projected features of the proposed object bounding box and the embedded features of at least one of the embedded foreground object bounding boxes and the embedded background object bounding box in the semantic embedding space. The method 700 can proceed to 718. At 718, an object class label is predicted for the features/object of the proposed object bounding box by determining at least one of a nearest foreground object class and a nearest background object class to the projected features of the proposed object bounding box in the semantic embedding space based on the similarity measure computed for the projected features of the proposed object bounding box," par. 78-79).
Cao et al. and Sikka et al. do not explicitly teach all semantic information added to input data of a pseudo few-shot class selected from the base class and a feature vector of the input data of the pseudo few-shot class.
However, Schwartz et al. teach all semantic information added to input data of a pseudo few-shot class selected from the base class ("Embodiments may utilize few-shot learning with semantics closer to the setting used by human infants by building on multiple semantic explanations (e.g. name and description) that accompany the few image examples and utilize more complex natural language based semantics rather than just the name of the category," par. 37) and a feature vector of the input data of the pseudo few-shot class ("the image may be processed by model 100 using an included visual information branch 101 supported by a CNN backbone 104 that may compute features both for the training images of the few-shot task and for the query images 112," par. 19).
Therefore, taking the teachings of Cao et al., Sikka et al., and Schwartz et al. as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the feature vector extraction taught by Cao et al. and the ground-truth parameter optimization taught by Sikka et al. to use the few-shot object classification taught by Schwartz et al. The suggestion/motivation for doing so would have been that "These richer descriptions and the multiple semantic setting may facilitate few-shot learning (leveraging the intuition of how human infants learn). Typically, more complex semantics (description) alone is not sufficient for improving performance. However, embodiments that combine more complex semantics with the label semantics in a multiple semantic setting, may provide improved performance," as noted by the Schwartz et al. disclosure in paragraph [0016]. The combination is further motivated because it would predictably improve efficiency and reduce computational complexity: there is a reasonable expectation that combining richer descriptions with label semantics mirrors human learning and improves object classification even when training data is limited. Moreover, doing so merely combines prior art elements according to known methods to yield predictable results.
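For illustration only (this sketch is not part of the disclosure of any cited reference), the similarity-measure classification cited from Sikka et al. above — predicting an object class label by finding the nearest class embedding to a projected feature in a semantic embedding space — can be shown minimally as follows. The cosine similarity measure, the class names, and the 2-D embeddings are assumptions of the sketch.

```python
import numpy as np

def predict_label(projected, class_embeddings):
    """Minimal sketch: return the class label whose semantic embedding is
    nearest (by cosine similarity) to the projected feature vector.
    All names and values here are hypothetical."""
    def cosine(a, b):
        # cosine similarity between two vectors
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(class_embeddings, key=lambda lbl: cosine(projected, class_embeddings[lbl]))

# Hypothetical 2-D semantic embeddings for two classes.
class_embeddings = {
    "cat": np.array([1.0, 0.0]),
    "dog": np.array([0.0, 1.0]),
}
label = predict_label(np.array([0.9, 0.2]), class_embeddings)
```

Because the projected feature lies closer (in cosine similarity) to the "cat" embedding than to the "dog" embedding, the nearest-class rule assigns the "cat" label, even if no training example of that class was seen, which is the zero-shot behavior the references describe.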
Reference Cited
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
US 2021/0124993 A1 (Singh et al.) discloses classifying digital images in few-shot tasks based on self-supervision and manifold mix-up.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KARSTEN F LANTZ whose telephone number is (571) 272-4564. The examiner can normally be reached Monday-Friday 8:00-4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ms. Jennifer Mehmood can be reached on 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Karsten F. Lantz/Examiner, Art Unit 2664
Date: 2/25/2026
/JENNIFER MEHMOOD/Supervisory Patent Examiner, Art Unit 2664