Last updated: May 29, 2026

Application No. 18/290,495

LEARNING APPARATUS, LEARNING METHOD AND PROGRAM

Non-Final OA §103

Filed

Nov 14, 2023

Priority

May 17, 2021 (nonprovisional of PCT/JP2021/018586)

Examiner

TENGBUMROONG, NATHAN NARA

Art Unit

2654

Tech Center

2600 — Communications

Assignee

NTT, Inc.

OA Round

2 (Non-Final)

This examiner grants 47% of resolved cases, rising to 74% with an interview (+26.7% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 19 resolved cases, 2023–2026

Examiner Intelligence

TENGBUMROONG, NATHAN NARA

Grants 47% of resolved cases

Career Allowance Rate

9 granted / 19 resolved

-14.6% vs TC avg

Strong +27% interview lift

Interview lift: +26.7% (resolved cases with interview vs. without)

Typical timeline

3y 0m

Avg Prosecution

21 currently pending

Statute-Specific Performance

§103

98.3%

+58.3% vs TC avg

§102

1.7%

-38.3% vs TC avg

Based on career data from 19 resolved cases

Office Action

§103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Claims 1 and 5 are amended. Claims 1-6 are presented for examination.

Response to Arguments
Rejection under 35 U.S.C. 103
Applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Gfeller et al. (US 20210056980 A1; hereinafter referred to as Gfeller) in view of Hayakawa (US 20180167649 A1).
Regarding claim 1, Gfeller teaches: a learning device comprising: a processor ([0005] the present disclosure is directed to a computing system comprising at least one processor); and a storage medium having computer program instructions stored thereon, wherein the computer program instructions ([0090] The memory 654 can include one or more non-transitory computer-readable storage mediums), when executed by the processor, perform processing of: updating content of main conversion processing for converting data to be processed into data in a predetermined format by executing self-supervised learning ([0024] a self-supervised learning task can be applied to audio spectrograms. As examples, an audio signal can be converted to a log-mel spectrogram, a short-time Fourier transform (STFT), or other suitable signal, and one or more spectrogram slices of the audio signal can be sampled);
and executing data augmentation processing of generating data to be processed in the main conversion processing based on an acoustic time series ([0022] the audio signal can be converted to a log-mel spectrogram, and one or more spectrogram slices can be input into the machine-learned model. The machine-learned model can include an encoder network and one or more decoder networks, and can be configured to output one or more determined characteristics associated with the audio signal),
wherein acoustic time series clipping processing of clipping a partial time series that is a time series of a part of the acoustic time series... ([0022] The audio signal can be sampled to select one or more slices. For example, in some implementations, the audio signal can be converted to a log-mel spectrogram, and one or more spectrogram slices can be input into the machine-learned model),
and the content of the main conversion processing is updated by self-supervised learning based on a result obtained by the conversion processing ([0038] provide an improvement to computing technology, particularly in the area of unsupervised (e.g., self-supervised) learning of machine-learned models... A loss function can be determined for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal. For example, a task-specific loss function can be determined for each particular task. In some implementations, a plurality of tasks can be trained concurrently using embeddings received from the encoder network. The machine-learned model can then be trained from end to end based at least in part on the loss function(s). The loss function can be used to update the conversion processing.).
Gfeller does not explicitly teach, but Hayakawa discloses: duplication processing of duplicating the partial time series ([0101] the audio processing unit 170 may stretch the reproduction time period by processing of dividing the audio waveform into a plurality of pieces) to create two partial time series having identical values ([0140] The duplication unit 173 duplicates the high-resolution audio data. The duplication unit 173 duplicates the number of times of duplication indicated by the additional information and supplies each of the generated duplicated audio data to the gain adjustment unit 175 as duplicated audio data. The number of duplications can be two.), and conversion processing of applying a different transformation to each of the two partial time series… ([0141-0142] The gain adjustment unit 175 adjusts a volume level of the duplicated audio data with a gain. The gain adjustment unit 175 adjusts the volume level with different gains for respective duplicated audio data, for example, according to the additional information… The equalizer processing unit 176 performs the equalizer processing of changing frequency characteristics of the duplicated audio data to the characteristics different from each other).
Gfeller and Hayakawa are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Gfeller to combine the teachings of Hayakawa because doing so would allow for an acoustic time series to be converted into a different format using duplication processing and different transformations on the duplicated audio (Hayakawa [0155] The metadata generation unit 190 generates detailed setting data indicating reproduction time point and contents of the signal processing (the number of times of duplication and the like) of high-resolution audio data from additional information, stores the same in metadata, and supplies the metadata to a recording format conversion unit 150).
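The claim 1 limitations mapped above (clip a partial time series, duplicate it, then apply a different transformation to each copy) can be illustrated with a minimal Python sketch. It is not taken from the application or from either reference: the function names, the fixed gain of 0.5, and the moving average used as a crude stand-in for equalizer processing are all assumptions made for illustration.

```python
import random

def clip_partial(series, length, rng):
    """Clip a contiguous partial time series of the given length at a random offset."""
    start = rng.randrange(0, len(series) - length + 1)
    return series[start:start + length]

def duplicate(partial):
    """Return two independent copies with identical values."""
    return list(partial), list(partial)

def apply_gain(samples, gain):
    """Hypothetical transformation A: scale the volume level."""
    return [gain * v for v in samples]

def moving_average(samples, width):
    """Hypothetical transformation B: causal moving average (a crude low-pass filter)."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - width + 1):i + 1]
        out.append(sum(window) / len(window))
    return out

def make_two_views(series, length, seed=0):
    """Clip once, duplicate, and transform each copy differently."""
    rng = random.Random(seed)
    partial = clip_partial(series, length, rng)
    first, second = duplicate(partial)
    return apply_gain(first, 0.5), moving_average(second, 3)

if __name__ == "__main__":
    signal = [float(i) for i in range(20)]
    view_a, view_b = make_two_views(signal, 8)
    print(len(view_a), len(view_b))  # 8 8
```

In a self-supervised setup the two views would then be fed to the model being trained; that step is omitted here because the office action does not describe it at the code level.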

Regarding claim 5, it recites similar limitations as claim 1 and therefore is rejected similarly.

Regarding claim 6, Gfeller teaches: a non-transitory computer readable medium which stores a program for causing a computer to function... ([0005] at least one tangible, non-transitory computer-readable medium that stores instructions that, when executed by the at least one processor). The rest of the claim is rejected similarly to claim 1.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Gfeller in view of Hayakawa, as applied to claims 1 and 5-6 above, and further in view of Grauman et al. (US 20210174817 A1; hereinafter referred to as Grauman).
Regarding claim 2, Gfeller in view of Hayakawa teaches: the learning device according to claim 1. Gfeller in view of Hayakawa does not explicitly teach, but Grauman teaches: wherein the conversion processing includes first mix-up processing of changing a first partial time series that is one of partial time series using a first mixed time series that is another time series ([0005] each training data includes one or more sets of audio data and associated video data. In some embodiments, the method includes performing object detection, mixing the audio data, and training the neural network for each training data. The step of performing object detection can include performing object detection on the video data of the one or more sets to detect one or more sound producing objects of the video data. The step of mixing the audio data can include mixing the audio data of the one or more sets to generate mixed audio data. This can include a first partial time series and a second partial time series.), and second mix-up processing of changing a second partial time series that is the other of partial time series using a second mixed time series different from the first mixed time series ([0088-0089] The video/audio separator 422 may provide the audio data from each video or from each set of input data to audio mixer 420... audio mixer 420 is configured to receive the audio data from video/audio separator 422 and mix, merge, etc., the audio data to generate mixed audio data. In some embodiments, audio mixer 420 provides the mixed audio data to magnitude spectrogram generator 412. In some embodiments, audio mixer 420 is configured to generate or output mixed audio 304 (e.g., $x_m(t)$)).
Gfeller, Hayakawa, and Grauman are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Gfeller and Hayakawa to combine the teachings of Grauman because doing so would allow for a time-series to be processed with mixed audio and converted to an acoustic image for further analyzing in order to improve the accuracy of acoustic time series conversions (Grauman [0062] The audio 208 (e.g., $x(t)$ or $x_m(t)$ if the audio 208 is artificially mixed audio) can be transferred into a magnitude spectrogram 210. The magnitude spectrogram 210 can also be referred to as $X^M$, where $X^M \in \mathbb{R}_+^{F \times N}$ and the magnitude spectrogram 210 includes F frequency bins and N short-time Fourier transform (STFT) frames. The magnitude spectrogram 210 may encode changes of a signal's frequency and phase content over time).
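To make the claim 2 language concrete, the following hedged sketch shows one common form of mix-up: a convex combination of a partial time series with another series. The office action does not state the mixing formula, so the weighting by `lam` and the function name `mix_up` are assumptions for illustration only. The point of the example is that the two partial time series are each changed using a different mixing series.

```python
def mix_up(partial, other, lam):
    """Blend `partial` with `other` as (1 - lam) * partial + lam * other.

    `lam` in [0, 1] is an assumed mixing weight; both inputs must have equal length.
    """
    if len(partial) != len(other):
        raise ValueError("series must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return [(1.0 - lam) * p + lam * o for p, o in zip(partial, other)]

if __name__ == "__main__":
    partial_1 = [1.0, 1.0]
    partial_2 = [1.0, 1.0]      # duplicate of partial_1
    mixer_1 = [0.0, 2.0]        # first mixing series (hypothetical data)
    mixer_2 = [4.0, 4.0]        # second, different mixing series
    print(mix_up(partial_1, mixer_1, 0.5))   # [0.5, 1.5]
    print(mix_up(partial_2, mixer_2, 0.25))  # [1.75, 1.75]
```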

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Gfeller in view of Hayakawa and Grauman, as applied to claim 2 above, and further in view of Gupta et al. (US 11617008 B1; hereinafter referred to as Gupta).
Regarding claim 3, the combination of Gfeller, Hayakawa, and Grauman teaches: the learning device according to claim 2. The combination of Gfeller, Hayakawa, and Grauman does not explicitly teach, but Gupta teaches: wherein using information indicating an intensity for each set of frequency and time as acoustic image data ([col 8, lines 19-21] To perform feature transformation, log scaled mel-spectrograms of the clip may be used. Spectrograms include intensity data.), the conversion processing includes first random resizing processing of executing affine conversion on at least a part of acoustic images expressing a first mixing time series that is a first partial time series after change by the first mix-up processing, and second random resizing processing of executing affine conversion on at least a part of acoustic images expressing a second mixing time series that is a second partial time series after change by the second mix-up processing ([col 8, lines 29-35] the system 100 may use various spectrogram augmentation schemes such as time warping of the spectrogram, time stretching and frequency stretching, spectrogram magnitude adjustment by a constant but random factor, and/or introducing time-frequency masking by masking 20% continuous time and frequency bins. This can be performed for multiple audio clips representing a first mixing time series and a second mixing time series.).
Gfeller, Hayakawa, Grauman, and Gupta are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Gfeller, Hayakawa, and Grauman to combine the teachings of Gupta because doing so would allow for further flexibility in resizing acoustic images for conversion processing (Gupta [col 8, lines 39-44] The augmented sample may be computed by time warping the spectrogram between 30% and 50%. Following the time warping, the sample may be time and frequency stretched in the range of [0-20%]. The augmentation process may add time-frequency masking in 50% of the samples).
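The Gupta passage cited for claim 3 mentions time-frequency masking of 20% continuous time and frequency bins. A minimal sketch of that one augmentation is shown below, assuming the spectrogram is a list of frequency rows, each a list of time frames. The function name, the zero fill value, and the choice to mask one frequency band and one time band together are illustrative assumptions, not details from Gupta.

```python
import random

def mask_time_frequency(spec, fraction=0.2, rng=None):
    """Zero out one contiguous band of frequency rows and one of time columns.

    Each band covers roughly `fraction` of its axis; band positions are random.
    Returns a new spectrogram and leaves the input unchanged.
    """
    rng = rng or random.Random()
    n_freq, n_time = len(spec), len(spec[0])
    width_f = int(round(fraction * n_freq))
    width_t = int(round(fraction * n_time))
    f0 = rng.randrange(0, n_freq - width_f + 1)
    t0 = rng.randrange(0, n_time - width_t + 1)
    masked = [row[:] for row in spec]
    for f in range(n_freq):
        for t in range(n_time):
            if f0 <= f < f0 + width_f or t0 <= t < t0 + width_t:
                masked[f][t] = 0.0
    return masked

if __name__ == "__main__":
    spec = [[1.0] * 10 for _ in range(10)]
    masked = mask_time_frequency(spec, 0.2, random.Random(0))
    zeros = sum(v == 0.0 for row in masked for v in row)
    print(zeros)  # 36: two rows plus two columns, minus the 4 overlapping cells
```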

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Gfeller in view of Hayakawa, as applied to claims 1 and 5-6 above, and further in view of Gupta.
Regarding claim 4, Gfeller in view of Hayakawa teaches: the learning device according to claim 1. Gfeller in view of Hayakawa does not explicitly teach, but Gupta teaches: wherein the conversion processing includes first random resizing processing of executing affine conversion ([col 8, lines 14-21] the datasets 25 may include thousands of audio clips, e.g., audio clips of up to 15 seconds with one or more tags or labels per clip. For preprocessing, the datasets may be divided into two-second overlapping clips with 50% overlap. Across all datasets, hundreds of thousands of such clips may be produced. To perform feature transformation, log scaled mel-spectrograms of the clip may be used) on at least a part of an acoustic image representing a first partial time series that is one of partial time series, and second random resizing processing of executing affine conversion on at least a part of an acoustic image representing a second partial time series that is the other of partial time series ([col 8, lines 29-35] the system 100 may use various spectrogram augmentation schemes such as time warping of the spectrogram, time stretching and frequency stretching, spectrogram magnitude adjustment by a constant but random factor, and/or introducing time-frequency masking by masking 20% continuous time and frequency bins).
Gfeller, Hayakawa, and Gupta are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Gfeller and Hayakawa to combine the teachings of Gupta because doing so would allow for further flexibility in resizing acoustic images for conversion processing (Gupta [col 8, lines 39-44] The augmented sample may be computed by time warping the spectrogram between 30% and 50%. Following the time warping, the sample may be time and frequency stretched in the range of [0-20%]. The augmentation process may add time-frequency masking in 50% of the samples).
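Claims 3 and 4 both recite random resizing by affine conversion of an acoustic image. As a rough illustration of what such an operation could look like, the sketch below rescales a spectrogram along the frequency and time axes with nearest-neighbour sampling and keeps the output the same size as the input. The scale range of 0.8 to 1.2, the zero padding, and the function names are assumptions; neither the claims as quoted nor Gupta specify them.

```python
import random

def affine_resize(image, scale_f, scale_t):
    """Apply the affine map (f, t) -> (scale_f * f, scale_t * t) to a 2-D image.

    Uses nearest-neighbour sampling on the inverse map; cells that fall outside
    the source image are left as zero padding. Output has the input's shape.
    """
    n_freq, n_time = len(image), len(image[0])
    out = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            src_f = int(f / scale_f)
            src_t = int(t / scale_t)
            if src_f < n_freq and src_t < n_time:
                out[f][t] = image[src_f][src_t]
    return out

def random_resize(image, rng, low=0.8, high=1.2):
    """Draw independent random scales for each axis and resize."""
    return affine_resize(image, rng.uniform(low, high), rng.uniform(low, high))

if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    print(affine_resize(image, 1.0, 1.0))  # [[1, 2], [3, 4]]
    print(affine_resize(image, 2.0, 2.0))  # [[1, 1], [1, 1]]
```

Calling `random_resize` twice with the same random generator yields two differently resized views, which mirrors the claim's first and second random resizing processing.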


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
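The reply-period arithmetic in the paragraph above can be sketched as simple calendar-month addition. The example assumes the Jan 09, 2026 date shown in the prosecution timeline is the mailing date of this action, and it ignores weekend and holiday adjustments as well as the advisory-action exception described in the paragraph. The helper `add_months` is a hypothetical name, not part of any USPTO tool.

```python
import calendar
from datetime import date

def add_months(start, months):
    """Add calendar months, clamping the day to the end of the target month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

if __name__ == "__main__":
    mailing_date = date(2026, 1, 9)  # assumed mailing date, taken from the timeline
    print(add_months(mailing_date, 3))  # 2026-04-09, end of shortened statutory period
    print(add_months(mailing_date, 6))  # 2026-07-09, latest possible statutory date
```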

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Nathan Tengbumroong whose telephone number is (703)756-1725. The examiner can normally be reached Monday - Friday, 11:30 am - 8:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan, can be reached at 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NATHAN TENGBUMROONG/Examiner, Art Unit 2654                          

/HAI PHAN/Supervisory Patent Examiner, Art Unit 2654

Prosecution Timeline

Nov 14, 2023

Application Filed

Sep 17, 2025

Non-Final Rejection mailed — §103

Dec 10, 2025

Response Filed

Jan 09, 2026

Final Rejection mailed — §103

Apr 06, 2026

Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

18/195,121

Patent 12640161

METHOD AND APPARATUS FOR PROCESSING AUDIO FOR SCENE CLASSIFICATION

3y 0m to grant; granted May 26, 2026

18/173,495

Patent 12530536

Mixture-Of-Expert Approach to Reinforcement Learning-Based Dialogue Management

2y 11m to grant; granted Jan 20, 2026

17/876,156

Patent 12451142

NON-WAKE WORD INVOCATION OF AN AUTOMATED ASSISTANT FROM CERTAIN UTTERANCES RELATED TO DISPLAY CONTENT

3y 2m to grant; granted Oct 21, 2025

17/883,265

Patent 12412050

MULTI-PLATFORM VOICE ANALYSIS AND TRANSLATION

3y 1m to grant; granted Sep 09, 2025

Study what changed to get past this examiner. Based on 4 most recent grants.

Prosecution Projections

2-3

Expected OA Rounds

47%

Grant Probability

74%

With Interview (+26.7%)

3y 0m (~6m remaining)

Median Time to Grant

Moderate

PTA Risk

Based on 19 resolved cases by this examiner. Grant probability derived from career allowance rate.
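The headline figures on this page can be reproduced from the counts it reports. The sketch below assumes the interview lift is applied as additive percentage points on top of the career allowance rate; the page does not say how the with-interview figure is computed, so that is an assumption, as are the function names.

```python
def allowance_rate(granted, resolved):
    """Career allowance rate as a percentage of resolved cases."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved

def with_interview(base_pct, lift_points):
    """Add an interview lift in percentage points, capped at 100 (assumed model)."""
    return min(100.0, base_pct + lift_points)

if __name__ == "__main__":
    base = allowance_rate(9, 19)              # 9 granted of 19 resolved
    print(round(base))                        # 47
    print(round(with_interview(base, 26.7)))  # 74
```

With only 19 resolved cases, a single additional grant or abandonment moves the rate by about five points, so these percentages are best read as rough indicators.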