DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment filed on February 2, 2026 has been entered. Claims 2-4 and 6-8 have been canceled; claims 1, 5, and 9 remain pending.
Response to Applicant’s Arguments
Applicant’s arguments, see pages 1-13, filed 02/02/2026, with respect to the rejections under 35 U.S.C. §101 and §112, which were applied to the original claim set, have been fully considered and are persuasive. Therefore, the previous rejections under 35 U.S.C. §101 and §112 have been withdrawn in view of Applicant’s amendment to the claims. It is noted, however, that the amendments raise new issues under 35 U.S.C. §112, as fully set forth below.
Applicant’s arguments, see pages 13-18, filed 02/02/2026, with respect to the rejection under 35 U.S.C. §103 based on Che in view of Ren and further in view of Sun, which was applied to the original claim set, have been fully considered and are persuasive. Therefore, the previous 35 U.S.C. §103 rejection has been withdrawn in view of Applicant’s amendment to the claims. The scope of the pending claims has been substantively modified, including the addition of limitations directed to disentangled feature vectors, distance-based metric learning, cross-instance motion recombination, and regeneration operations. However, upon further consideration, a new ground of rejection over the prior art is made in further view of Aberman. The present §103 rejection reflects the Examiner’s consideration of the amended claim scope and the results of the additional search conducted in response thereto. Based on these facts, this action is made FINAL.
Examiner’s Comments
The amendments to claims 1, 5 and 9 include newly added limitations that are blurred and faded, making them difficult to read and interpret. The submitted documents appear to contain low-quality or degraded text such that certain words, characters, and symbols are not clearly legible without undue effort. Pursuant to 37 CFR 1.52(a) and 37 CFR 1.121, application papers must be presented in a clear and legible manner to permit accurate examination. The Examiner has made a reasonable effort to interpret the submitted amendment; however, the poor legibility of the document places an unnecessary burden on examination. Applicant is requested to submit any further responses as clear, legible documents of good quality. See 37 CFR 1.52(b).
Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1, 5 and 9 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitations "motion data" in line 5, “synthesize motion data” in line 6, “new motion data” in lines 7-8, “the motion data (x)” in line 10, “motion data (x+)” in line 13, “motion data (x-)” in line 15, “a first motion data (x)” in line 17, “a second motion data (x′)” in line 18, and “a new motion data (x̃)” in line 24. Because the term “motion data” is recited in so many similar forms throughout the claim, it is unclear whether these instances refer to the same motion data or to different motion data, and several of the instances lack antecedent basis. Consistent language and symbols are required to resolve the indefiniteness.
Similarly, claim 1 recites the limitations "feature vector" in line 4, “feature vector (s)” in line 12, “feature vector (s+)” in line 13, “feature vector (s-)” in line 14, “feature vectors (s, s′)” in line 18, and “feature vector (s′)” in line 23. Because the term “feature vector” is recited in so many similar forms throughout the claim, it is unclear whether these instances refer to the same feature vector or to different feature vectors, and several of the instances lack antecedent basis. Consistent language and symbols are required to resolve the indefiniteness.
Claim 1 further recites the limitation "that performance is improved as adversarial learning between the generator (G) and the discriminator (D) is repeated" in line 31. It is unclear, and therefore indefinite, what or whose performance is improved.
Accordingly, claims 5 and 9, which recite corresponding limitations, are rejected for the same reasons set forth with respect to claim 1 above.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 5, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Che (Che et al., US 2021/0390713 A1, 2021) in view of Aberman (Aberman et al., "Unpaired Motion Style Transfer from Video to Animation," ACM Transactions on Graphics, 39(4), 2020).
Regarding claim 1, with deficiencies of Che noted in square brackets [], Che discloses a data augmentation apparatus for action recognition through self-supervised learning based on objects, comprising:
an image input unit configured to input image information (Che discloses a motion transfer device 110 including a communication interface 202, processor 204, memory 206, and storage 208 that receives source image 101 and target image 102 from a database repository 150 and/or user device 160, corresponding to an image input unit that receives and inputs image information into the system; see Che, Fig. 3, Step 302 and paragraph [0040]);
an information extraction unit configured to extract a feature vector [with object information] from motion data of the inputted image information (Che, in [0041], teaches extracting motion features (pose information/location) from an object depicted in an input image using learning model 105, which includes a motion feature encoder. Che, in [0044], further teaches generating an encoded motion feature vector representing the object’s motion feature); and
a motion information synthesis unit configured to synthesize motion data [taking a different action] and the feature vector [with the object information] to generate new motion data (Che, in [0044] and [0046], teaches generating a synthesized image by synthesizing the motion feature vector, thereby synthesizing new motion data),
wherein the information extraction unit is configured to:
receive the motion data (x) (Che, as mentioned above in [0041] & [0044], teaches receiving images and then extracting motion features using a motion-feature encoder, which corresponds to receiving the motion data),
extract the feature vector (s) [with the object information] via an encoder (Es) (Che, in [0041] & [0044], teaches generating an encoded motion feature vector from the motion data using a motion refiner and encoder network, the encoded feature vector representing attributes of the motion source), and
[perform learning such that a distance between the feature vector (s) and a feature vector (s+) extracted from motion data (x+) having same object information as the motion data (x) is decreased, and a distance between the feature vector (s) and a feature vector (s-) extracted from motion data (x-) having different object information from the motion data (x) is increased],
wherein the motion information synthesis unit is configured to receive a first motion data (x), [a second motion data (x′), and feature vectors (s, s′)] respectively extracted by the information extraction unit (Che, in [0041] & [0044], teaches receiving motion data (x) and a corresponding encoded motion feature vector (s) extracted from image-derived motion data; Che does not expressly disclose receiving two motion instances (x, x′) and their respective feature vectors (s, s′) for cross-combination),
wherein the motion information synthesis unit comprises a generator (G) (Che, in [0053-0055], teaches a generator forming part of a generative adversarial framework for generating motion data; thus, Che teaches that the motion information synthesis unit comprises a generator) [configured to:
(i) receive the first motion data (x) and the feature vector (s) corresponding to the first motion data (x) to regenerate the first motion data (x);
(ii) receive the first motion data (x) and the feature vector (s′) corresponding to the second motion data (x′) to generate a new motion data (x̂) in which the object information is substituted while action information is maintained; and
(iii) receive the generated new motion data (x̂) and the feature vector (s) corresponding to the first motion data (x) to regenerate the first motion data (x),]
wherein the apparatus further comprises a discriminator (D) configured to learn to classify the new motion data generated by the generator (G) as fake (Che, in [0055-0056], teaches a discriminator network (D) forming part of a generative adversarial framework. Che discloses that the discriminator is trained to distinguish generated outputs from real motion data, thereby classifying generated motion data as fake relative to real data),
wherein the generator (G) learns to cause the discriminator (D) to misclassify the generated new motion data as real data, such that performance is improved as adversarial learning between the generator (G) and the discriminator (D) is repeated (Che, in [0053-0056], teaches adversarial training between the generator (G) and the discriminator (D), wherein the generator is optimized to generate motion data that fools the discriminator into classifying generated data as real. The generator and discriminator are iteratively trained in opposition, improving generation quality through repeated adversarial learning).
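For purposes of illustration only, the adversarial learning scheme characterized above may be summarized in the following minimal sketch prepared by the Examiner. The module definitions, tensor shapes, optimizer settings, and loss terms are hypothetical and are not taken from Che; the sketch reflects only the generic generator/discriminator update described in the preceding paragraph.

    # Minimal, illustrative sketch of one adversarial training step.
    # All modules, shapes, and hyperparameters are hypothetical (not Che's implementation).
    import torch
    import torch.nn as nn

    feat_dim, motion_dim = 16, 32
    G = nn.Sequential(nn.Linear(motion_dim + feat_dim, 64), nn.ReLU(), nn.Linear(64, motion_dim))
    D = nn.Sequential(nn.Linear(motion_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()

    real_motion = torch.randn(8, motion_dim)   # stand-in for real motion data (x)
    feature = torch.randn(8, feat_dim)         # stand-in for a feature vector (s')

    # Discriminator step: learn to classify real motion as real and generated motion as fake.
    fake_motion = G(torch.cat([real_motion, feature], dim=1)).detach()
    d_loss = bce(D(real_motion), torch.ones(8, 1)) + bce(D(fake_motion), torch.zeros(8, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: learn to cause the discriminator to misclassify generated motion as real.
    fake_motion = G(torch.cat([real_motion, feature], dim=1))
    g_loss = bce(D(fake_motion), torch.ones(8, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

Alternating these two updates is the repeated adversarial learning referred to in the limitation above.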
As noted above in square brackets, Che fails to disclose, but Aberman teaches:
an information extraction unit configured to extract a feature vector with object information from motion data of the inputted image information and extract the feature vector (s) with the object information via an encoder (Es) (Aberman, in [§3.1 ("Architecture")], teaches encoding motion sequences using an encoder network (E) to extract latent codes / feature vectors comprising a content code and a style code. The style code represents performer-specific characteristics associated with the motion sequence. Such latent vectors correspond to a “feature vector with object information” because they encode identity-related attributes of the motion subject that are used to condition generation);
perform learning such that a distance between the feature vector (s) and a feature vector (s+) extracted from motion data (x+) having same object information as the motion data (x) is decreased, and a distance between the feature vector (s) and a feature vector (s-) extracted from motion data (x-) having different object information from the motion data (x) is increased (Aberman, in [§3.2 ("Training and Loss"), including Eq. (7)], teaches a triplet loss applied to latent style embeddings: the term ‖E_S(n_t) − E_S(x_t)‖, where x_t is a motion sharing the same style t as the anchor, encourages the distance between feature vectors that share the same style to be smaller, while the term ‖E_S(n_t) − E_S(w_s)‖, where w_s is a motion with a different style s ≠ t, encourages the distance between feature vectors that do not share the same style to be larger. In other words, an anchor style embedding is encouraged to be closer to a positive embedding corresponding to the same style and farther from a negative embedding corresponding to a different style. This explicitly teaches decreasing the distance for embeddings having the same subject / style characteristics and increasing the distance for embeddings having different subject / style characteristics);
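For purposes of illustration only, the metric-learning behavior characterized above may be summarized in the following minimal sketch prepared by the Examiner. The encoder definition, margin value, and variable names are hypothetical and are not reproduced from Aberman's Eq. (7); the sketch reflects only the generic triplet relationship described in the preceding paragraph.

    # Illustrative triplet-style loss on style embeddings (names and margin are hypothetical).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    motion_dim, style_dim = 32, 16
    E_s = nn.Sequential(nn.Linear(motion_dim, 64), nn.ReLU(), nn.Linear(64, style_dim))

    x  = torch.randn(8, motion_dim)   # anchor motion data (x)
    xp = torch.randn(8, motion_dim)   # motion data (x+) with the same object/style information
    xn = torch.randn(8, motion_dim)   # motion data (x-) with different object/style information

    # Minimizing the triplet loss decreases ||E_s(x) - E_s(x+)|| and increases
    # ||E_s(x) - E_s(x-)|| up to the margin, matching the relationship recited above.
    loss = F.triplet_margin_loss(E_s(x), E_s(xp), E_s(xn), margin=1.0)
    loss.backward()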
taking a different action and the feature vector with the object information (Aberman, in [§3.1 ("Architecture")], teaches a generator network (G) that receives a content code and a style code and generates a new motion sequence combining motion content with different style attributes. Aberman explicitly teaches transferring style attributes from one motion sequence to another while maintaining motion content.)
a second motion data (x′), and feature vectors (s, s′) (Aberman, in [§3.1 and §3.2], teaches using an anchor motion and a separate motion providing style code, wherein the generator receives motion content from one sequence and style embedding from another sequence to generate a new motion output. This corresponds to receiving a first motion sequence (x) and a second motion sequence (x′) and their respective latent codes (s, s′) for recombination.)
comprises a generator (G) (Aberman, Fig. 1, explicitly teaches a generator network G forming part of an adversarial framework) configured to:
(i) receive the first motion data (x) and the feature vector (s) corresponding to the first motion data (x) to regenerate the first motion data (x) (Aberman, in [§3 ("Motion Style Transfer Framework")], teaches a reconstruction loss wherein the generator reconstructs the original motion when content and style codes originate from the same motion sequence. This corresponds to regenerating the first motion data from its own motion content and feature vector);
(ii) receive the first motion data (x) and the feature vector (s′) corresponding to the second motion data (x′) to generate a new motion data (x̂) in which the object information is substituted while action information is maintained (Aberman, in [§3.1 ("Architecture")], teaches generating a new motion by combining content from one motion and style code from another motion. The generated output maintains motion content (action information) while substituting performer / style attributes); and
(iii) receive the generated new motion data (x̂) and the feature vector (s) corresponding to the first motion data (x) to regenerate the first motion data (x) (Aberman, in [§3 ("Motion Style Transfer Framework")], teaches cycle consistency, wherein a motion generated with substituted style is re-encoded and passed back through the generator with the original style code to reconstruct the original motion. This corresponds to regenerating the first motion from the generated motion and the original feature vector).
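For purposes of illustration only, the reconstruction and cycle-consistency operations mapped in items (i) through (iii) above may be summarized in the following minimal sketch prepared by the Examiner. The encoder and generator modules, tensor shapes, and L1 reconstruction terms are hypothetical and are not taken from Aberman; the sketch reflects only the generic regeneration scheme described above.

    # Illustrative reconstruction / cycle-consistency sketch for items (i)-(iii):
    # (i) regenerate x from (x, s); (ii) generate x_hat from (x, s'); (iii) regenerate x
    # from (x_hat, s). All modules, shapes, and loss terms are hypothetical.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    motion_dim, style_dim = 32, 16
    E_s = nn.Sequential(nn.Linear(motion_dim, 64), nn.ReLU(), nn.Linear(64, style_dim))
    G = nn.Sequential(nn.Linear(motion_dim + style_dim, 64), nn.ReLU(), nn.Linear(64, motion_dim))

    x = torch.randn(8, motion_dim)        # first motion data (x)
    x_prime = torch.randn(8, motion_dim)  # second motion data (x')
    s, s_prime = E_s(x), E_s(x_prime)     # feature vectors (s, s')

    recon = G(torch.cat([x, s], dim=1))         # (i)  regenerate x from its own feature vector
    x_hat = G(torch.cat([x, s_prime], dim=1))   # (ii) substitute object information, keep action
    cycle = G(torch.cat([x_hat, s], dim=1))     # (iii) regenerate x from x_hat and original s

    loss = F.l1_loss(recon, x) + F.l1_loss(cycle, x)   # reconstruction + cycle-consistency terms
    loss.backward()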
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Che’s motion generation and adversarial framework with the encoding, disentanglement, and metric-learning techniques taught by Aberman. Che already discloses a generative adversarial framework for motion generation including a generator (G) and discriminator (D) trained in opposition (Che, [0053–0056]). Aberman teaches encoding motion sequences into disentangled latent representations including content and style codes (§3.1), and further teaches applying a triplet loss to explicitly decrease the distance between embeddings having the same object / style characteristics and increase the distance between embeddings having different object / style characteristics (§3.2, Eq. (7)). Aberman further teaches recombining content from one motion sequence with style from another to generate new motion sequences while preserving action information (§3.1), as well as reconstruction and cycle consistency losses to maintain structural integrity of generated motion (§3). One of ordinary skill in the art would have been motivated to incorporate Aberman’s disentangled latent representation and metric-learning approach into Che’s adversarial motion synthesis framework in order to improve controllability, stability, and attribute preservation during motion generation. Such modification would have yielded predictable results, namely enhanced separation of motion content from object-related attributes and improved quality of synthesized motion data, consistent with the rationale set forth in KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398 (2007), that combining known elements according to known methods to yield predictable results is obvious. Additionally, Aberman itself discloses a generator–discriminator adversarial framework for motion generation (Fig. 1; §3), wherein a generator network produces motion sequences and a discriminator network is trained to distinguish generated sequences from real motion data. Thus, Aberman teaches the same general adversarial GAN paradigm already present in Che. The combination therefore represents the application of known adversarial motion generation techniques with known disentangled latent-space and metric-learning techniques to achieve predictable improvements in representation learning and motion synthesis quality.
Regarding claims 5 and 9, the rationale set forth in the rejection of claim 1 above applies. The data augmentation apparatus of claim 1 corresponds to the method of claim 5 and to the non-transitory computer-readable recording medium of claim 9, which perform the steps addressed above. Therefore, claims 5 and 9 are rejected under 35 U.S.C. 103 for the same reasons as claim 1.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEN KUDO whose telephone number is (571)272-4498. The examiner can normally be reached M-F 8am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
KEN KUDO
Examiner
Art Unit 2671
/KEN KUDO/ Examiner, Art Unit 2671
/VINCENT RUDOLPH/ Supervisory Patent Examiner, Art Unit 2671