Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This office action is in response to correspondence 09/05/25 regarding application 18/296,133, in which claims 1 and 10 were amended. Claims 1-20 are pending in the application, with claims 17-20 withdrawn. Claims 1-16 have been considered.
Response to Arguments
Amended claim 10 overcomes the objection for a minor informality, and so the objection is withdrawn.
Applicant argues on pages 8-9 that the 35 U.S.C. 101 rejections should be withdrawn because as amended, the claims contain “a new type of neural network with a novel structure compared to existing growing neural gas network” (Remarks, page 9). While the examiner has determined via an updated search that the neural network found in the amended claim is not new (see the newly discovered reference to Palomo et al. 2017), the trained deep growing neural gas neural network as detailed in the amended claim is no longer merely recited at a high level of generality (i.e., as a generic deep growing neural gas neural network) such that it can be considered mere instructions to apply the exception using a generic computer component. Also, it is not considered mere extra-solution activity because the claim recites using the trained network to parse the transcript. Therefore, the 35 U.S.C. 101 rejections of claims 1-16 as being directed to an abstract idea without significantly more are withdrawn.
Applicant’s arguments on pages 10-12 regarding the 35 U.S.C. 103 rejections of claims 1-16 based on Tran, Andreakis, Ichimura, Carpenter, and Aronowitz have been considered but are moot in view of the new grounds for rejection, which were necessitated by Applicant’s amendments directed to the details of the deep-growing neural gas neural network as claimed and are based in part on the newly discovered reference to Palomo et al. (“The Growing Hierarchical Neural Gas Self-Organizing Neural Network”. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 28, NO. 9, SEPTEMBER 2017).
Claim Objections
In claim 1, line 4, should “the data set” be “the dataset”?
In claim 10, line 13, should “the data set” be “the dataset”?
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 10, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Tran et al. (“On the Role of Style in Parsing Speech with Neural Models”. arXiv:2010.04288v1 [cs.CL] 8 Oct 2020) in view of Palomo et al. (“The Growing Hierarchical Neural Gas Self-Organizing Neural Network”. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 28, NO. 9, SEPTEMBER 2017).
Consider claim 1, Tran discloses a method for dialogue parsing (parsing conversational speech, Section 3.1, page 2), the method comprising:
receiving dialogue transcript data (transcripts from the SWBD corpus of telephone speech conversations, Section 3.1, page 2);
pre-processing dialogue transcript data to generate pre-processed dialogue transcript data (generating the word embeddings ei from transcript words w1…wt, Figure 1, page 2, Section 2.1, page 1, Section 4.1, pages 2-3);
providing pre-processed dialogue transcript data as an input to a trained deep neural network (parser model accepts word embeddings ei, Section 2.1, page 1, which are provided to self-attentive parser, which is composed of a multihead self-attention encoder and span-based chart decoder, Section 2.2, page 1, Figure 1, page 2); and
receiving parsed dialogue transcript data as an output from the trained deep neural network (the predicted parse tree, e.g. Figure 2, page 4).
Tran does not specifically mention:
training a deep-growing neural gas neural network by:
extending a dataset to a deep neural network;
arranging a first subset of the data set into a layered topology, comprising L layers, to form a deep-growing neural gas neural network structure;
receiving training data at the deep-growing neural gas neural network to generate the trained deep-growing neural gas neural network.
Palomo discloses training a deep-growing neural gas neural network (the GHNG models, with 3 levels, i.e. “deep”, were trained using 100000 input samples and during N = 20000 time steps for each input distribution, page 2003, Section A) by:
extending a dataset to a deep neural network (a tree of self-organizing graphs, Section II, page 2001);
arranging a first subset of the data set into a layered topology, comprising L layers, to form a deep-growing neural gas neural network structure (each graph is the child of a unit in the upper level, except for the top level graph, to yield a hierarchy of graphs, Section II, page 2001; the maximum number of levels was set to three, i.e. “L layers”, Section A, page 2003; each level of the hierarchy is considered a layer making up a subset of the graphs);
receiving training data at the deep-growing neural gas neural network to generate the trained deep-growing neural gas neural network (the GHNG models, with 3 levels, i.e. “deep”, were trained using 100000 input samples and during N = 20000 time steps for each input distribution, page 2003, Section A).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran by training a deep-growing neural gas neural network by: extending a dataset to a deep neural network; arranging a first subset of the data set into a layered topology, comprising L layers, to form a deep-growing neural gas neural network structure; receiving training data at the deep-growing neural gas neural network to generate the trained deep-growing neural gas neural network in order to represent data in a more plastic and flexible way, as suggested by Palomo (Section 1, page 2000), predictably improving visualization capabilities and understanding of data, as suggested by Palomo (Section 1, page 2000). The references cited are analogous art in the same field of machine learning.
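For illustration only, the layered arrangement mapped above (each graph being the child of a unit in the level above, with a maximum number of levels) may be sketched as follows. This is a simplified sketch and not Palomo's algorithm or Applicant's claimed method: the function names, the fixed number of units per graph, and the winner-only prototype update (used here as a stand-in for growing neural gas adaptation) are all assumptions made for the example.

```python
import random


def nearest(units, x):
    """Index of the unit (prototype vector) closest to sample x."""
    return min(range(len(units)),
               key=lambda i: sum((u - v) ** 2 for u, v in zip(units[i], x)))


def train_graph(samples, n_units, steps, rng, lr=0.1):
    """Fit n_units prototypes by nudging the winning unit toward each sample.

    Assumption: this winner-only update is a simplified stand-in for the
    growing-neural-gas adaptation rule; no units or edges are added or removed.
    """
    units = [list(s) for s in rng.sample(samples, n_units)]
    for _ in range(steps):
        x = rng.choice(samples)
        winner = units[nearest(units, x)]
        for d in range(len(winner)):
            winner[d] += lr * (x[d] - winner[d])
    return units


def build_hierarchy(samples, level=1, max_levels=3, n_units=2, steps=200, rng=None):
    """Return a tree of graphs: each unit owns a child graph trained on the
    subset of samples that unit wins, until max_levels is reached."""
    rng = rng or random.Random(0)
    units = train_graph(samples, n_units, steps, rng)
    node = {"level": level, "units": units, "children": []}
    if level < max_levels:
        for i in range(len(units)):
            subset = [s for s in samples if nearest(units, s) == i]
            if len(subset) >= n_units:  # enough data to train a child graph
                node["children"].append(
                    build_hierarchy(subset, level + 1, max_levels,
                                    n_units, steps, rng))
    return node


def depth(node):
    """Number of levels actually present in the tree rooted at node."""
    return 1 + max((depth(c) for c in node["children"]), default=0)
```

In this sketch the `max_levels` parameter plays the role of the "L layers" limit, and each child graph is trained only on the subset of the data won by its parent unit.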
Consider claim 10, Tran discloses a system for dialogue parsing (parsing conversational speech, Section 3.1, page 2), the system comprising:
a memory, configured to store dialogue transcript data (a memory is inherent for storing the SWBD transcripts and audio files, page 2, Section 3.1); and
a processor, coupled to the memory, configured to execute a dialogue pre-processing module and trained deep neural network (a processor is inherent for running the experiments described at Section 4, as well as implementing the elements of Figure 1, page 2, and running word embeddings such as ELMo and BERT, pages 2-3, Section 4);
wherein the processor is configured to receive the dialogue transcript data from the memory (transcripts from the SWBD corpus of telephone speech conversations, Section 3.1, page 2), pre-process the dialogue transcript data using the dialogue pre-processing module to generate pre-processed dialogue transcript data (generating the word embeddings ei from transcript words w1…wt, Figure 1, page 2, Section 2.1, page 1, Section 4.1, pages 2-3), provide the pre-processed dialogue transcript data to the trained deep neural network as an input (parser model accepts word embeddings ei, Section 2.1, page 1, which are provided to self-attentive parser, which is composed of a multihead self-attention encoder and span-based chart decoder, Section 2.2, page 1, Figure 1, page 2), and receive parsed dialogue transcript data from the trained deep neural network as an output (the predicted parse tree, e.g. Figure 2, page 4).
Tran does not specifically mention:
training a deep-growing neural gas neural network by:
extending a dataset to a deep neural network;
arranging a first subset of the data set into a layered topology, comprising L layers, to form a deep-growing neural gas neural network structure;
receiving training data at the deep-growing neural gas neural network to generate the trained deep-growing neural gas neural network.
Palomo discloses training a deep-growing neural gas neural network (the GHNG models, with 3 levels, i.e. “deep”, were trained using 100000 input samples and during N = 20000 time steps for each input distribution, page 2003, Section A) by:
extending a dataset to a deep neural network (a tree of self-organizing graphs, Section II, page 2001);
arranging a first subset of the data set into a layered topology, comprising L layers, to form a deep-growing neural gas neural network structure (each graph is the child of a unit in the upper level, except for the top level graph, to yield a hierarchy of graphs, Section II, page 2001; the maximum number of levels was set to three, i.e. “L layers”, Section A, page 2003; each level of the hierarchy is considered a layer making up a subset of the graphs);
receiving training data at the deep-growing neural gas neural network to generate the trained deep-growing neural gas neural network (the GHNG models, with 3 levels, i.e. “deep”, were trained using 100000 input samples and during N = 20000 time steps for each input distribution, page 2003, Section A).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran by training a deep-growing neural gas neural network by: extending a dataset to a deep neural network; arranging a first subset of the data set into a layered topology, comprising L layers, to form a deep-growing neural gas neural network structure; receiving training data at the deep-growing neural gas neural network to generate the trained deep-growing neural gas neural network for reasons similar to those for claim 1.
Consider claim 2, Tran discloses the trained deep neural network is generated by providing object node data to an untrained deep neural network to train the untrained deep neural network (gold and silver parse trees were used in the training set for the parser, Table 1, page 2).
Tran does not specifically mention a growing neural gas neural network.
Palomo discloses a growing neural gas neural network (Abstract, page 2000).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran by using a growing neural gas neural network for reasons similar to those for claim 1.
Consider claim 12, Tran discloses the trained deep neural network is generated by providing object node data to an untrained deep neural network to train the untrained deep neural network (gold and silver parse trees were used in the training set for the parser, Table 1, page 2).
Tran does not specifically mention a growing neural gas neural network.
Palomo discloses a growing neural gas neural network (Abstract, page 2000).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran by using a growing neural gas neural network for reasons similar to those for claim 1.
Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Tran et al. (“On the Role of Style in Parsing Speech with Neural Models”. arXiv:2010.04288v1 [cs.CL] 8 Oct 2020) in view of Palomo et al. (“The Growing Hierarchical Neural Gas Self-Organizing Neural Network”. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 28, NO. 9, SEPTEMBER 2017), in further view of Ichimura (US 20170270095).
Consider claim 3, Tran discloses pre-processing dialogue transcript data comprises: applying word embeddings to dialogue transcript data to convert words into word embeddings (generating the word embeddings ei from transcript words w1…wt, Figure 1, page 2, Section 2.1, page 1, Section 4.1, pages 2-3); and applying concepts to the words of dialogue transcript data to associate words of dialogue transcript data to concepts (e.g. NP, VP, etc., Fig 2, page 4).
Tran and Palomo do not specifically mention a concept dictionary.
Ichimura discloses a concept dictionary (concept dictionary, Fig 2, [0008]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran and Palomo by including a concept dictionary in order to better understand variations of utterances, as suggested by Ichimura ([0004-0006]), predictably resulting in better understanding of user intention, as suggested by Ichimura ([0004-0006]). The references cited are analogous art in the same field of natural language.
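For illustration only, associating words of a transcript with concepts by way of a concept dictionary may be sketched as a simple lookup. The dictionary entries, concept labels, and function name below are hypothetical and are not taken from Ichimura, Tran, or the claims.

```python
# Hypothetical concept dictionary; entries are invented for this example.
CONCEPTS = {
    "burger": "FOOD_ITEM",
    "fries": "FOOD_ITEM",
    "cola": "DRINK",
    "large": "SIZE",
}


def tag_concepts(words, concept_dictionary):
    """Pair each word with its concept, or None when no entry exists."""
    return [(w, concept_dictionary.get(w.lower())) for w in words]
```

Words without an entry are kept and paired with `None` rather than dropped, so the word order of the transcript is preserved for any later parsing step.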
Consider claim 13, Tran discloses pre-processing dialogue transcript data comprises: applying word embeddings to dialogue transcript data to convert words into word embeddings (generating the word embeddings ei from transcript words w1…wt, Figure 1, page 2, Section 2.1, page 1, Section 4.1, pages 2-3); and applying concepts to the words of dialogue transcript data to associate words of dialogue transcript data to concepts (e.g. NP, VP, etc., Fig 2, page 4).
Tran and Palomo do not specifically mention a concept dictionary.
Ichimura discloses a concept dictionary (concept dictionary, Fig 2, [0008]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran and Palomo by including a concept dictionary for reasons similar to those for claim 3.
Claims 4, 5, 11, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Tran et al. (“On the Role of Style in Parsing Speech with Neural Models”. arXiv:2010.04288v1 [cs.CL] 8 Oct 2020) in view of Palomo et al. (“The Growing Hierarchical Neural Gas Self-Organizing Neural Network”. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 28, NO. 9, SEPTEMBER 2017), in further view of Carpenter, II et al. (US 10592706).
Consider claim 4, Tran discloses: collecting audio data, wherein the audio data comprises human dialogue (the SWBD audio files of telephone speech conversations, Section 3.1, page 2).
Tran and Palomo do not specifically mention applying a speech recognition algorithm to audio stream data to generate dialogue transcript data.
Carpenter II discloses applying a speech recognition algorithm to audio stream data to generate dialogue transcript data (converting words in the audio stream of a customer order to text using a speech recognition module, Col 3 lines 37-47).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran and Palomo by applying a speech recognition algorithm to audio stream data to generate dialogue transcript data in order to reduce slow, inaccurate, or inefficient capture of verbal orders, predictably reducing customer frustration, as suggested by Carpenter, II (Col 1 lines 50-56). The references cited are analogous art in the same field of audio processing.
Consider claim 5, Tran and Palomo do not, but Carpenter II discloses the audio stream data comprises quick service restaurant order audio (converting words in the audio stream of a customer fast food order to text using a speech recognition module, Col 3 lines 37-47, Col 1 lines 50-56).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran and Palomo such that the audio stream data comprises quick service restaurant order audio for reasons similar to those for claim 4.
Consider claim 11, Tran and Palomo do not, but Carpenter II discloses the system further comprises: an audio capture device, configured to capture audio stream data, and provide the audio stream data to the memory for storage (audio stream from a microphone to a base station, Col 3 lines 31-34, an audio buffer, i.e. storage memory, inherent for performing speech recognition, Col 3 lines 37-47); and wherein the processor further comprises a speech recognition module, configured to receive audio stream data from the memory as an input, generate dialogue transcript data as an output and transmit dialogue transcript data to the memory for storage (converting words in the audio stream of a customer fast food order to text using a speech recognition module, Col 3 lines 37-47, the text stored in memory for future access, Col 11-12 lines 65-8).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran and Palomo such that the system further comprises: an audio capture device, configured to capture audio stream data, and provide the audio stream data to the memory for storage; and wherein the processor further comprises a speech recognition module, configured to receive audio stream data from the memory as an input, generate dialogue transcript data as an output and transmit dialogue transcript data to the memory for storage for reasons similar to those for claim 4.
Consider claim 14, Tran and Palomo do not, but Carpenter II discloses the audio stream data comprises quick service restaurant order audio (converting words in the audio stream of a customer fast food order to text using a speech recognition module, Col 3 lines 37-47, Col 1 lines 50-56).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran and Palomo such that the audio stream data comprises quick service restaurant order audio for reasons similar to those for claim 4.
Claims 6, 7, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Tran et al. (“On the Role of Style in Parsing Speech with Neural Models”. arXiv:2010.04288v1 [cs.CL] 8 Oct 2020) in view of Palomo et al. (“The Growing Hierarchical Neural Gas Self-Organizing Neural Network”. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 28, NO. 9, SEPTEMBER 2017), in further view of Aronowitz (US 20090319269).
Consider claim 6, Tran and Palomo do not, but Aronowitz discloses: collecting audio stream data (an unlabeled audio stream, [0012]); and diarizing audio stream data, generating sequenced speech data (generating sequenced segments labeled by speaker identity, [0029]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran and Palomo by collecting audio stream data and diarizing audio stream data, generating sequenced speech data in order to optimize performance and accuracy of speech and speaker recognition systems, as suggested by Aronowitz ([0013]). The references cited are analogous art in the same field of audio processing.
Consider claim 7, Tran and Palomo do not, but Aronowitz discloses diarizing audio stream data comprises: extracting features of audio stream data (vectors representing the audio characteristics, Fig 5 step 74, [0031]); separating audio stream data into data chunks (divide audio stream into small evenly spaced segments, step 72, Fig 5, [0031]); and providing chunked audio stream data to a trained speech sequencing module (clustering engine, which is trained to cluster by participant and combine adjacent segments, [0029-0031]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran and Palomo such that diarizing audio stream data comprises: extracting features of audio stream data; separating audio stream data into data chunks; and providing chunked audio stream data to a trained speech sequencing module for reasons similar to those for claim 6.
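For illustration only, the three diarization steps mapped above (separating the audio stream into evenly spaced segments, extracting a feature per segment, and clustering with adjacent segments combined) may be sketched as follows. This is a simplified sketch and not Aronowitz's method: the mean-energy feature, the two-cluster loop standing in for a trained clustering engine, and all function names are assumptions made for the example.

```python
def chunk(samples, size):
    """Split a sample sequence into evenly sized, non-overlapping segments."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def energy(segment):
    """One scalar feature per segment: mean squared amplitude (an assumption;
    practical systems typically use spectral feature vectors)."""
    return sum(s * s for s in segment) / len(segment)


def cluster_two(features, iters=10):
    """Label each feature 0 or 1 using a small one-dimensional two-means loop."""
    centers = [min(features), max(features)]
    labels = [0] * len(features)
    for _ in range(iters):
        labels = [0 if abs(f - centers[0]) <= abs(f - centers[1]) else 1
                  for f in features]
        for k in (0, 1):
            members = [f for f, lab in zip(features, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


def merge_adjacent(labels):
    """Combine runs of identically labelled segments into
    (label, first_segment, last_segment) spans."""
    spans = []
    for i, lab in enumerate(labels):
        if spans and spans[-1][0] == lab:
            spans[-1] = (lab, spans[-1][1], i)
        else:
            spans.append((lab, i, i))
    return spans
```

A usage example under the same assumptions: a quiet stretch, a loud stretch, and a second quiet stretch are chunked, labelled, and merged into three sequenced spans.

```python
audio = [0.1] * 40 + [1.0] * 40 + [0.1] * 20
labels = cluster_two([energy(seg) for seg in chunk(audio, 10)])
spans = merge_adjacent(labels)
```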
Consider claim 9, Tran and Palomo do not, but Aronowitz discloses the trained speech sequencing module is trained is generated by providing speech sequencing training data to an untrained trained speech sequencing module to train the trained speech sequencing module (creating the intra-speaker variability profiles from the training data which is the labeled audio stream, [0027]-[0032]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran and Palomo such that the trained speech sequencing module is trained is generated by providing speech sequencing training data to an untrained trained speech sequencing module to train the trained speech sequencing module for reasons similar to those for claim 6.
Claims 8, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Tran et al. (“On the Role of Style in Parsing Speech with Neural Models”. arXiv:2010.04288v1 [cs.CL] 8 Oct 2020) in view of Palomo et al. (“The Growing Hierarchical Neural Gas Self-Organizing Neural Network”. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 28, NO. 9, SEPTEMBER 2017), in further view of Aronowitz (US 20090319269), in further view of Carpenter, II et al. (US 10592706).
Consider claim 8, Tran, Palomo, and Aronowitz do not, but Carpenter II discloses the audio stream data comprises quick service restaurant order audio (converting words in the audio stream of a customer fast food order to text using a speech recognition module, Col 3 lines 37-47, Col 1 lines 50-56).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran, Palomo, and Aronowitz such that the audio stream data comprises quick service restaurant order audio in order to reduce slow, inaccurate, or inefficient capture of verbal orders, predictably reducing customer frustration, as suggested by Carpenter, II (Col 1 lines 50-56).
Consider claim 15, Tran and Palomo do not, but Aronowitz discloses providing an audio stream data to the memory for storage (audio stream is loaded, [0031]); and wherein the processor further comprises a diarizing module (software executed by a computer to implement the diarization method, [0022]) configured to receive audio stream data from the memory as an input, generate sequenced speech data as an output and transmit sequenced speech data to the memory for storage (clustering engine receives loaded audio stream and clusters by participant, then combines and stores adjacent segments, [0029-0031]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran and Palomo by providing an audio stream data to the memory for storage; and wherein the processor further comprises a diarizing module configured to receive audio stream data from the memory as an input, generate sequenced speech data as an output and transmit sequenced speech data to the memory for storage in order to optimize performance and accuracy of speech and speaker recognition systems, as suggested by Aronowitz ([0013]).
Tran, Palomo, and Aronowitz do not specifically mention an audio capture device, configured to capture audio stream data.
Carpenter II discloses an audio capture device, configured to capture audio stream data (audio stream from a microphone to a base station, Col 3 lines 31-34).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran, Palomo, and Aronowitz by including an audio capture device, configured to capture audio stream data for reasons similar to those for claim 6.
Consider claim 16, the Tran-Palomo-Aronowitz-Carpenter II combination discloses the system of claim 15 (see above). Regarding the further features of claim 16, Tran and Palomo do not, but Aronowitz discloses generate sequenced speech data comprises: extracting features of audio stream data (vectors representing the audio characteristics, Fig 5 step 74, [0031]); separating audio stream data into data chunks (divide audio stream into small evenly spaced segments, step 72, Fig 5, [0031]); and providing chunked audio stream data to a trained speech sequencing module (clustering engine, which is trained to cluster by participant and combine adjacent segments, [0029-0031]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tran-Palomo-Aronowitz-Carpenter II further such that diarizing audio stream data comprises: extracting features of audio stream data; separating audio stream data into data chunks; and providing chunked audio stream data to a trained speech sequencing module in order to optimize performance and accuracy of speech and speaker recognition systems, as suggested by Aronowitz ([0013]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached on 571/272-7516.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Jesse S Pullias/
Primary Examiner, Art Unit 2655 09/23/25