DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.
Status of Claims
This Office action is in response to arguments and amendments entered on July 17, 2025, for patent application 17/220,781, originally filed on April 1, 2021. Claims 1-4, 8-11, 14-17, 21, 22, and 25 are amended. Claims 6, 13, 18-20, and 24 are canceled. Claim 26 is new. Claims 1-5, 7-12, 14-17, 21-23, 25, and 26 remain pending. The first Office action of May 14, 2024, the second Office action of November 29, 2024, and the third Office action of April 21, 2025 are fully incorporated by reference into this Office action.
Response to Amendment
Applicant’s amendments to the claims have been noted by the Examiner.
Applicant’s amendments to the claims are not sufficient to overcome the outstanding rejections under 35 U.S.C. § 101, for the reasons set forth below.
The Applicant’s amendments to the claims are sufficient to overcome the outstanding rejections under 35 U.S.C. § 103. However, new rejections under 35 U.S.C. § 103 are applied below in light of the amended limitations of the claims.
Claim Rejections - 35 USC § 101
35 U.S.C. § 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-5, 7-12, 14-17, 21-23, 25, and 26 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
Claim 1 is directed to “a method” (i.e. a process), claim 10 is directed to “a computing device” (i.e. a machine), and claim 16 is directed to “an electronic device” (i.e. a machine), hence the claims are directed to one of the four statutory categories (i.e. process, machine, manufacture, or composition of matter). In other words, Step 1 of the subject-matter eligibility analysis is “Yes.”
However, the claims are drawn to the abstract idea of “entering a demonstration mode” in the form of “mental processes,” i.e., processes that can be performed in the human mind (including an observation, evaluation, judgment, or opinion), even though they are “performed on a computer” (per MPEP § 2106.04(a)(2)(III)(C), “A Claim That Requires a Computer May Still Recite a Mental Process”).
Specifically, the claims recite the following limitations, which are reasonably understood as “mental processes”:
“capturing, automatically… audio data that includes a conversation including a first audio source associated with a primary entity and a second audio source associated with a secondary entity within a proximity…
encoding the audio data as a vector representation for processing by a machine learning model that includes representations of audio frequency data for overlapping time windows of the audio data, the machine learning model trained to generate intent classifications for each entity within conversations based on vector representations that include audio frequency data for overlapping time windows;
processing the encoded audio data by the machine learning model to determine an intent of the primary entity is to have the secondary entity assess features of [a] computing device and an intent of the secondary entity is to assess features of the computing device; and
changing, automatically and without user interaction based on the intent of the primary entity and the intent of the secondary entity, one or more device property values… to demonstration mode device property values including at least one demonstration mode gesture value that specifies an input style for… navigation in a user interface.”
These limitations simply describe a process of data gathering and manipulation, which is partially analogous to “collecting information, analyzing it, and displaying certain results of the collection and analysis” (see Electric Power Group, LLC v. Alstom S.A., 830 F.3d 1350, 119 U.S.P.Q.2d 1739 (Fed. Cir. 2016)). Hence, these limitations are akin to a concept that the courts have identified, among non-limiting examples, as an abstract idea. In other words, Step 2A, Prong 1 of the subject-matter eligibility analysis is “Yes.”
Furthermore, the claims do not include additional elements that, either alone or in combination, integrate the judicial exception into a practical application. To the extent that, e.g., “a computing device,” “one or more sensors,” “a processor,” “a microphone,” “a computer-readable storage medium,” “an electronic device,” “a storage device,” and “a display mode determination module” are claimed, these elements merely add insignificant extra-solution activity to the judicial exception (e.g., data gathering) and/or do no more than generally link the use of the judicial exception to a particular technological environment or field of use. In other words, the claimed “entering a demonstration mode” does not provide a practical application; thus, Step 2A, Prong 2 of the subject-matter eligibility analysis is “No.”
Likewise, the claims do not include additional elements that, either alone or in combination, are sufficient to amount to significantly more than the judicial exception. To the extent that, e.g., “a computing device,” “one or more sensors,” “a processor,” “a microphone,” “a computer-readable storage medium,” “an electronic device,” “a storage device,” and “a display mode determination module” are claimed, these are all generic, well-known, and conventional computing elements. As evidence that these are generic, well-known, and conventional computing elements, Applicant’s specification discloses them in a manner indicating that the additional elements are sufficiently well known that the specification need not describe their particulars to satisfy 35 U.S.C. § 112(a), per MPEP § 2106.07(a)(III)(A), which satisfies the Examiner’s evidentiary burden under the Berkheimer memorandum.
Specifically, the Applicant’s claimed “computing device” is described in paragraph [0006] as including “many different types of computing or electronic devices. For example, the computing device 102 can be a smartphone or other wireless phone, a camera (e.g., compact or single-lens reflex), or a tablet or phablet computer. By way of further example, the computing device 102 can be a notebook computer (e.g., netbook or ultrabook), a laptop computer, a wearable device (e.g., a smartwatch, an augmented reality headset or device, a virtual reality headset or device), a personal media player, a personal navigating device (e.g., global positioning system), an entertainment device (e.g., a gaming console, a portable gaming device, a streaming media player, a digital video recorder, a music or other audio playback device), a video camera, an Internet of Things (IoT) device, an automotive computer, and so forth.” The Applicant’s claimed “electronic device” is described in paragraph [0065] as “any of the devices described with reference to FIGs. 1-5, such as any type of client device, mobile phone, tablet, computing, communication, entertainment, gaming, media playback, or other type of electronic device.” The Applicant’s claimed “one or more sensors” are described in paragraph [0010] as “any of a variety of different types of sensors, such as a fingerprint sensor (e.g., a capacitive scanner, an optical scanner, an ultrasonic scanner, etc.), an image sensor (e.g., a charge-coupled device (CCD) sensor or a complementary metal-oxide-semiconductor (CMOS) sensor), a touchscreen (e.g., as part of the display 104 on one or more surfaces of the computing device 102), and so forth. 
In one or more embodiments, the sensor 112 is an audio sensor (e.g., a microphone, such as any suitable type of microphone incorporating a transducer that converts sound into an electrical signal, including a dynamic microphone, a condenser microphone, a piezoelectric microphone, and so forth).” These elements are reasonably interpreted as generic computer components that provide no details of anything beyond ubiquitous standard equipment.
Applicant’s claimed “a processor,” “a microphone,” “a computer-readable storage medium,” and “a storage device” are all claimed as and described in the instant specification as generic computer components.
The claimed “display mode determination module” is described in paragraph [0080] as being implemented at least in part in hardware, and configured to analyze conversations. It is reasonably understood that the “display mode determination module” is a generic component of a conventional computer.
As such, the claimed limitations of “a computing device,” “one or more sensors,” “a processor,” “a microphone,” “a computer-readable storage medium,” “an electronic device,” “a storage device,” and “a display mode determination module” are reasonably understood as not providing anything significantly more. Therefore, Step 2B of the subject-matter eligibility analysis is “No.”
In addition, dependent claims 2-5, 7-9, 11, 12, 14, 15, 17, 21-23, 25, and 26 do not provide a practical application and are insufficient to amount to significantly more than the judicial exception. As such, dependent claims 2-5, 7-9, 11, 12, 14, 15, 17, 21-23, 25, and 26 are also rejected under 35 U.S.C. § 101, based on their respective dependencies from independent claims 1, 10, and 16.
Therefore, claims 1-5, 7-12, 14-17, 21-23, 25, and 26 are rejected under 35 U.S.C. § 101 as being directed to non-statutory subject matter.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4, 7, 10, 11, 16, 17, 21-23, 25, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Kanevsky et al. (hereinafter “Kanevsky,” US 2017/0094049) in view of Parthasarathi et al. (hereinafter “Parthasarathi,” US 2017/0270919), and in further view of Shintani et al. (hereinafter “Shintani,” US 2011/0164143).
Regarding claim 1, and substantially similar limitations in claims 10, 16, 22, and 26, Kanevsky discloses a method implemented in a computing device, the method comprising:
capturing, automatically by one or more audio sensors of the computing device, audio data that includes a conversation including a first audio source associated with a primary entity and a second audio source associated with a secondary entity within a proximity to the computing device (Kanevsky [0044], “Delegation module 16 may also be able to determine that voice inputs received by computing device 2 from a first user and a second user indicate intentional delegation of computing device 2 from the first user to the second user even if neither the delegator (i.e., first user) nor the delegatee (i.e., the second user) speaks a voice command that is recognized by delegation module 16 as a command that directs delegation module 16 to delegate computing device 2 from the first user to the second user or a command that directs delegation module 16 to accept delegation of computing device 2 by the second user from the first user. Computing device 2 may receive voice input from a conversation held between the first user and the second user and may determine, based on the conversation, a delegation of computing device 2 from the first user to the second user. For example, John may ask Lena “Lena, which road should I take to avoid traffic?” Lena may respond by saying “let me check that with your phone.” In this example, neither of the voice inputs received by computing device 2 from John and Lena is a command that is recognized by delegation module 16.”).
Kanevsky does not explicitly teach every limitation of encoding the audio data as a vector representation for processing by a machine learning model that includes representations of audio frequency data for overlapping time windows of the audio data, the machine learning model trained to generate intent classifications for each entity within conversations based on vector representations that include audio frequency data for overlapping time windows; processing the encoded audio data by the machine learning model to determine an intent of the primary entity is to have the secondary entity assess features of the computing device.
However, Parthasarathi discloses encoding the audio data as a vector representation for processing by a machine learning model that includes representations of audio frequency data for overlapping time windows of the audio data, the machine learning model trained to generate intent classifications for each entity within conversations based on vector representations that include audio frequency data for overlapping time windows; processing the encoded audio data by the machine learning model to determine an intent of the primary entity is to have the secondary entity assess features of the computing device (Parthasarathi [0018], “encoding reference audio data into a feature vector”; also Parthasarathi [0049], “In one configuration each audio frame includes 25 ms of audio and the frames start at 10 ms intervals resulting in a sliding window where adjacent audio frames include 15 ms of overlapping audio. Many different features for a particular frame may be determined, as known in the art, and each feature represents some quality of the audio that may be useful for ASR [automatic speech recognition] processing,” overlapping time windows of audio data; also Parthasarathi [0032], “The server 120 then processes (136) further input audio data (such as audio feature vectors corresponding to further audio frames) using the encoded reference audio data. An audio frame corresponds to a particular set of audio data, for example 25 ms worth of PCM or similar audio data. For example, the server 120 may use a classifier or other trained machine learning model to determine if the incoming audio feature vectors represent speech from the same speaker as the speech in the reference audio data by using the encoded reference audio data. 
The server then labels (138) each audio feature vector (and/or the corresponding audio frame),” processing the audio data with machine learning; also Parthasarathi [0055], “The NLU [natural language understanding] process takes textual input (such as processed from ASR [automatic speech recognition] 250 based on the utterance 11) and attempts to make a semantic interpretation of the text. That is, the NLU process determines the meaning behind the text based on the individual words and then implements that meaning. NLU processing 260 interprets a text string to derive an intent or a desired action from the user as well as the pertinent pieces of information in the text that allow a device (e.g., device 110) to complete that action,” determining and classifying intent).
Parthasarathi is analogous to Kanevsky, as both are drawn to the art of audio processing. It would have been obvious to one of ordinary skill in the art at the time of filing to have modified the method as taught by Kanevsky, to include encoding the audio data as a vector representation for processing by a machine learning model that includes representations of audio frequency data for overlapping time windows of the audio data, the machine learning model trained to generate intent classifications for each entity within conversations based on vector representations that include audio frequency data for overlapping time windows; processing the encoded audio data by the machine learning model to determine an intent of the primary entity is to have the secondary entity assess features of the computing device, as taught by Parthasarathi, since it applies known techniques of audio encoding for machine learning processing to determine intent to a known method ready for improvement to yield predictable results. Doing so is a predictable solution that one of ordinary skill in the art could have pursued with a reasonable expectation of success.
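For illustration only, the sliding-window framing described in the Parthasarathi passage quoted above (25 ms frames beginning every 10 ms, so that adjacent frames share 15 ms of audio) can be sketched as follows; the helper name and the 16 kHz sample rate are assumptions chosen for the example, not details taken from the reference:

```python
import numpy as np

def frame_audio(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Split an audio signal into overlapping frames.

    With 25 ms frames starting every 10 ms, adjacent frames share
    15 ms of audio, matching the sliding window described above.
    Each row of the returned array is one frame, from which feature
    vectors could then be computed.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # 400 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)      # 160 samples at 16 kHz
    frames = [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop_len)
    ]
    return np.stack(frames)

# One second of audio at 16 kHz yields 98 overlapping 25 ms frames:
audio = np.zeros(16000)
frames = frame_audio(audio)
# frames.shape == (98, 400)
```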
Kanevsky in view of Parthasarathi does not teach every limitation of changing, automatically and without user interaction based on the intent of the primary entity and the intent of the secondary entity, one or more device property values of the computing device to demonstration mode device property values including at least one demonstration mode gesture value that specifies an input style for touch-based navigation in a user interface of the computing device.
Kanevsky does disclose changing, automatically and without user interaction based on the intent of the primary entity and the intent of the secondary entity, one or more device property values of the computing device (Kanevsky [0052-0053], “In examples where behavioral module 14 determines that the user physically possessing computing device 2 is the primary user of computing device 2, and that the change in possession of computing device 2 is a change away from the primary user being in possession of computing device 2, such that the primary user of computing device 2 is no longer physically possessing computing device 2 after the change in possession of computing device 2, delegation control module 8 may change the level of access to functionality of computing device 2 from a first level of access granted to the primary user of computing device 2 to a second level of access that is of a lower level of access than the first level of access. In other words, the second level of access may be more limited or restricted than the first level of access… In the example of where first user states “delegate my phone to Lena as a user level 2,” “user level 2” may be associated with a specific set of access rights and/or permissions to functionality of computing device 2. In some examples, if the user specifies a specific delegatee (e.g., “Lena”), the delegatee may be associated with a specific level of access. In some examples, the first and second users may specify the application on computing device 2 that the delegatee is restricted to using”). However, Kanevsky does not teach changing the device property values to demonstration mode device property values including at least one demonstration mode gesture value that specifies an input style for touch-based navigation in a user interface of the computing device.
However, Shintani discloses changing the device property values to demonstration mode device property values including at least one demonstration mode gesture value that specifies an input style for touch-based navigation in a user interface of the computing device (Shintani [0021], “If the TV has been set in a demonstration interactive mode wherein the system is awaiting a customer's appropriate gesture to enable the demo at 18, then the system's camera looks for the appropriate gesture. A suitable and readily implemented gesture is a smile. Currently available cameras are equipped with software, hardware or firmware that can detect faces and smiles to aid in focus and timing of a photograph, and similar technology can be applied here. In the case of detection of a smile, the face is detected at 22 and analyzed for the presence of a smile. If a smile is detected at 26, then the TV can enter a demonstration mode at 30 which pre-empts other video inputs. Hence, at the prompting or spontaneous occurrence of the pre-determined gesture, the TV enters the demo mode at 30 and a demo can play to completion at 34 without need for the user to interact using a remote controller or other device that could leave the TV in an undesirable mode of operation,” adjusting the computer property values to be in demo mode; also Shintani [0024], “during the demo 30, the demo can pause for a user input in the form of a gesture representing a "yes" answer or selection of a gesture from a menu of gestures to select more information,” settings so that certain gestures are available during demo mode, which are changes to the input style for navigation of the demo mode that were unavailable when outside of demo mode; Official Notice is taken that televisions with touch-screen navigation capabilities were well known at the time the invention was filed).
Shintani is analogous to Kanevsky in view of Parthasarathi, as both are drawn to the art of device sharing. It would have been obvious to one of ordinary skill in the art at the time of filing to have modified the method as taught by Kanevsky in view of Parthasarathi, to include changing the device property values to demonstration mode device property values including at least one demonstration mode gesture value that specifies an input style for touch-based navigation in a user interface of the computing device, as taught by Shintani, since it would be a simple substitution of one known element (software features enabled) for another to obtain predictable results. Doing so is a predictable solution that one of ordinary skill in the art could have pursued with a reasonable expectation of success.
Regarding claim 2, Kanevsky in view of Parthasarathi and Shintani discloses the primary entity including an owner of the computing device and the secondary entity including a secondary user of the computing device, and the intent of the secondary entity indicates that the secondary user is requesting the computing device to assess the features of the computing device (Kanevsky [0053], “In some examples, if the user specifies a specific delegatee (e.g., “Lena”), the delegatee may be associated with a specific level of access. In some examples, the first and second users may specify the application on computing device 2 that the delegatee is restricted to using. For example, in the aforementioned example where the first user says “Lena, please check the route in Maps,” and the second user says “open Maps,” the second user may have a level of access that enables the second user to only use the Maps application and that locks the second user out of the other functionalities of computing device 2,” the secondary user is requesting to open Maps).
Regarding claim 3, Kanevsky in view of Parthasarathi and Shintani discloses the primary entity including an owner of the computing device and the secondary entity including a secondary user of the computing device, and the intent of the first entity indicates that the owner of the computing device is requesting that the secondary user assess the features of the computing device (Kanevsky [0053], “In some examples, if the user specifies a specific delegatee (e.g., “Lena”), the delegatee may be associated with a specific level of access. In some examples, the first and second users may specify the application on computing device 2 that the delegatee is restricted to using. For example, in the aforementioned example where the first user says “Lena, please check the route in Maps,” and the second user says “open Maps,” the second user may have a level of access that enables the second user to only use the Maps application and that locks the second user out of the other functionalities of computing device 2,” the owner of the device is delegating a task on Maps to a secondary user).
Regarding claim 4, and substantially similar limitations in claims 11 and 17, Kanevsky in view of Parthasarathi and Shintani discloses the intent of the primary entity or the intent of the secondary entity determined based on one or more words or key phrases detected in the conversation directed to one of the primary entity or the secondary entity rather than to the computing device (Kanevsky [0045], “delegation module 16 may receive the conversation between John and Lena as vocal input and may parse the conversation to determine that John implicitly delegated computing device 2 to Lena and that Lena in response implicitly accepts the delegation of computing device 2. Delegation module 16 may determine that the sentence “Lena, which road should I take to avoid traffic” includes the name Lena and is a question directed towards Lena. Further, delegation module 16 may determine that the question asked by John can be answered using the Maps application on computing device 2. As such, delegation module 16 may determine that the phrase spoken by John is an implicit command to delegate computing device 2 to Lena. Delegation module 16 may further determine that the sentence “let me check that with your phone” includes the word “your phone” as well as the action term “check that.” As such, delegation module 16 may determine that the phrase spoken by Lena is an implicit command to accept delegation of computing device 2 from John.”).
Regarding claim 7, Kanevsky in view of Parthasarathi does not explicitly teach the demonstration mode device property values including a value indicating to automatically enter a demonstration mode to run a demonstration program on the computing device that highlights features of the computing device.
However, Shintani discloses the demonstration mode device property values including a value indicating to automatically enter a demonstration mode to run a demonstration program on the computing device that highlights features of the computing device (Shintani [0024], “a basic demo might include video content that highlights the performance of the TV along with text (and possibly audio) that will explain basic features that are associated with the TV. But, during the course of the demo, it may be desired to permit the user to select features that he or she desires further information about.”).
Shintani is analogous to Kanevsky in view of Parthasarathi, as both are drawn to the art of device sharing. It would have been obvious to one of ordinary skill in the art at the time of filing to have modified the method as taught by Kanevsky in view of Parthasarathi, to include the demonstration mode device property values including a value indicating to automatically enter a demonstration mode to run a demonstration program on the computing device that highlights features of the computing device, as taught by Shintani, since it would be a simple substitution of one known element (software features enabled) for another to obtain predictable results. Doing so is a predictable solution that one of ordinary skill in the art could have pursued with a reasonable expectation of success.
Regarding claim 21, Kanevsky in view of Parthasarathi and Shintani discloses wherein the computing device includes a biometric sensor configured to detect biometric data that includes one or more of a voice, facial features, fingerprint features, or a grip on the computing device of one or more of the primary entity or the secondary entity, the method further comprising detecting a current user of the computing device based on the audio data and the biometric data (Kanevsky [0086], “determining delegation of computing device 2 from the first user to the second user may be further based at least in part on biometric data received by computing device”; also Kanevsky [0047-0048], “For example, a mom may physically give computing device 2 to her child so that her child may play a computer game on computing device 2. Because the mom may be apprehensive about her child possibly damaging computing device 2, she may tell the child “be careful” or may otherwise speak in an apprehensive tone. Delegation module 16 may analyze the mom's speech and may determine a change in emotional state in the mom and/or may determine an apprehensive emotion based on the mom's speech. Delegation module 16 may then determine that the mom intends to delegate computing device 2 based at least in part on the emotion detected in the mom's speech… Delegation module 16 may further determine whether the user that possesses computing device 2 during or after delegation of computing device 2 by delegator is the intended delegatee of computing device 2 based at least in part on authenticating the user that possesses computing device 2. Computing device 2 may perform such authentication by authenticating the fingerprint, facial features, and/or other biometric features of the user.”).
Regarding claim 23, Kanevsky in view of Parthasarathi and Shintani discloses the electronic device including one or more sensors that include a biometric sensor configured to identify one or more voices in the audio data and determine one or more voice features for the one or more voices (Kanevsky [0042], “If delegation module 16 determines, such as via voice authentication, that Lena is the person that accepts delegation of computing device”).
Regarding claim 25, Kanevsky in view of Parthasarathi and Shintani does not explicitly teach wherein the demonstration program includes a plurality of slides to highlight features of the computing device, and the demonstration mode gesture value controls the input style to navigate between the plurality of slides using touch-based input. However, the display of “slides” as opposed to any other screen content is an obvious design choice. Shintani discloses a “demo mode” and gives the example of televisions, which implies video content. However, Shintani does not specify exactly what type of content the demo content is. The content displayed as demo content presumably would be content that highlights the capabilities of whatever device is being demonstrated. Applicant has not disclosed that specifically showing “slides,” which are being interpreted as any displayed still images, as demonstration content solves any stated problem or is for any particular purpose. Moreover, it appears that any content displayed using the device of Kanevsky in view of Parthasarathi and Shintani or the Applicant would perform equally well. Therefore, it would have been prima facie obvious to modify Kanevsky in view of Parthasarathi and Shintani to obtain the computing device as specified in claim 25, because such a modification would have been considered a mere design consideration which fails to patentably distinguish over the prior art of Kanevsky in view of Parthasarathi and Shintani.
Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Kanevsky in view of Parthasarathi and Shintani, and in further view of Dvijotham et al. (hereinafter “Dvijotham,” US 2021/0334459).
Regarding claim 5, and substantially similar limitations in claim 12, Kanevsky in view of Parthasarathi and Shintani does not teach wherein the machine learning model is trained to minimize a loss between classifications of whether an intent of a user is to have a secondary user assess features of the computing device generated by the machine learning model for training data and known labels corresponding to the training data.
However, Dvijotham discloses wherein the machine learning model is trained to minimize a loss between classifications of whether an intent of a user is to have a secondary user assess features of the computing device generated by the machine learning model for training data and known labels corresponding to the training data (Dvijotham [0029], “The text classification machine learning model receives as input combined feature representations of text samples that each include a sequence of words from a vocabulary, e.g., a vocabulary of words in a particular natural language.”; also Dvijotham [0037], “The training system 100 trains the text classification machine learning model by adjusting the values of the model parameters to minimize a loss function that measures errors between classifications generated by the text classification machine learning model and target classifications for the training text samples. For example, if the text classification machine learning model is a binary classifier for a given category of interest that receives a combined feature representation of a text sample and generates a predicted probability between 0 and 1 that characterizes the likelihood that the text sample belongs to the category of interest, then the loss function can be the cross-entropy loss function”).
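For illustration only (not part of the record), the conventional training procedure described in Dvijotham's paragraph [0037] — adjusting model parameters to minimize a cross-entropy loss between a binary classifier's predicted probabilities and known target labels — can be sketched as follows. The toy data, feature dimensions, and learning rate are arbitrary choices for the sketch.

```python
# Illustrative sketch: minimizing a cross-entropy loss between a binary
# classifier's predictions (probabilities between 0 and 1) and known labels,
# the conventional procedure quoted from Dvijotham [0037].
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: 2-dimensional feature vectors with known binary labels.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # known labels for the training data

w = np.zeros(2)  # model parameters, adjusted during training
b = 0.0
lr = 0.5

def predict(X, w, b):
    """Predicted probability between 0 and 1 (sigmoid of a linear score)."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def cross_entropy(p, y):
    """Loss measuring errors between predicted and target classifications."""
    eps = 1e-12  # guard against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

losses = []
for _ in range(300):
    p = predict(X, w, b)
    losses.append(cross_entropy(p, y))
    # Gradient of the cross-entropy loss with respect to the parameters.
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b
```

After training, the loss on the training data is lower than at initialization, reflecting the "adjusting the values of the model parameters to minimize a loss function" language quoted above.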
Dvijotham is analogous to Kanevsky in view of Parthasarathi and Shintani, as both are drawn to the art of natural language processing. It would have been obvious to one of ordinary skill in the art at the time of filing to have modified the method as taught by Kanevsky in view of Parthasarathi and Shintani to include wherein the machine learning model is trained to minimize a loss between classifications of whether an intent of a user is to have a secondary user assess features of the computing device generated by the machine learning model for training data and known labels corresponding to the training data, as taught by Dvijotham, since doing so applies a known technique for minimizing loss to a known natural language processing method ready for improvement to yield predictable results. Doing so is a predictable solution that one of ordinary skill in the art could have pursued with a reasonable expectation of success.
Claims 8 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Kanevsky in view of Parthasarathi and Shintani, and in further view of He et al. (hereinafter “He,” US 2020/0236454).
Regarding claim 8, and substantially similar limitations in claim 14, Kanevsky in view of Parthasarathi and Shintani does not explicitly teach: determining, while in the demonstration mode, that a current user of the computing device has changed from the current user to an additional user; and automatically exiting the demonstration mode in response to determining that the current user has changed.
However, He discloses determining, while in the demonstration mode, that a current user of the computing device has changed from the current user to an additional user; and automatically exiting the demonstration mode in response to determining that the current user has changed (He [0064], “When any one or more of the foregoing sensors detect that the user wears the headset, the terminal is controlled to automatically play music or a video, automatically turn on a music balancer, automatically answer a call, and so on. When the sensor detects that the user takes off the headset, the terminal is controlled to automatically pause or stop music or a video, automatically exit a background application, turn off a music balancer, automatically hang up a call, and so on,” when the user takes off the headset, the current user is changed and the relevant software programs are exited).
He is analogous to Kanevsky in view of Parthasarathi and Shintani, as both are drawn to the art of user presence detection. It would have been obvious to one of ordinary skill in the art at the time of filing to have modified the method as taught by Kanevsky in view of Parthasarathi and Shintani to include determining, while in the demonstration mode, that a current user of the computing device has changed from the current user to an additional user; and automatically exiting the demonstration mode in response to determining that the current user has changed, as taught by He, in order to reduce unnecessary power consumption (He [0086]). Doing so is a predictable solution that one of ordinary skill in the art could have pursued with a reasonable expectation of success.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Kanevsky in view of Parthasarathi and Shintani, and in further view of Guthery (US 2019/0074003).
Regarding claim 9, Kanevsky in view of Parthasarathi and Shintani does not explicitly teach receiving additional audio at the computing device while the computing device is operating in the demonstration mode, the additional audio including a conversation between the first entity and the second entity; determining, by analyzing the conversation from the additional audio using the machine learning model, whether an updated intent of the first entity or an updated intent of the second entity is to have the computing device exit the demonstration mode; and automatically exiting the demonstration mode in response to determining that the updated intent of the first entity or the updated intent of the second entity is to have the computing device exit the demonstration mode.
However, Guthery discloses receiving additional audio at the computing device while the computing device is operating in the demonstration mode, the additional audio including a conversation between the first entity and the second entity; determining, by analyzing the conversation from the additional audio using the machine learning model, whether an updated intent of the first entity or an updated intent of the second entity is to have the computing device exit the demonstration mode; and automatically exiting the demonstration mode in response to determining that the updated intent of the first entity or the updated intent of the second entity is to have the computing device exit the demonstration mode (Guthery [0040], “Upon receiving an audio signal representing a user utterance of a phrase to exit programming mode (e.g., “end programming mode”), the voice command recognition and programming application 102 may interpret the audio signal as a command to again change the grammar applied to subsequent utterances,” a voice command by the user causes the device to exit a mode of operation).
Guthery is analogous to Kanevsky in view of Parthasarathi and Shintani, as both are drawn to the art of voice command processing. It would have been obvious to one of ordinary skill in the art at the time of filing to have modified the method as taught by Kanevsky in view of Parthasarathi and Shintani to include receiving additional audio at the computing device while the computing device is operating in the demonstration mode, the additional audio including a conversation between the first entity and the second entity; determining, by analyzing the conversation from the additional audio using the machine learning model, whether an updated intent of the first entity or an updated intent of the second entity is to have the computing device exit the demonstration mode; and automatically exiting the demonstration mode in response to determining that the updated intent of the first entity or the updated intent of the second entity is to have the computing device exit the demonstration mode, as taught by Guthery, since doing so uses a known technique (a voice command to exit a mode) on a known device ready for improvement to yield predictable results. Doing so is a predictable solution that one of ordinary skill in the art could have pursued with a reasonable expectation of success.
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Kanevsky in view of Parthasarathi and Shintani, and in further view of Chen et al. (hereinafter “Chen,” US 2016/0323282).
Regarding claim 15, Kanevsky in view of Parthasarathi and Shintani does not explicitly teach checking whether the secondary entity is a trusted secondary entity, and the automatically entering including automatically entering the demonstration mode only if the secondary entity is not the trusted secondary entity.
However, Chen discloses checking whether the secondary entity is a trusted secondary entity, and the automatically entering including automatically entering the demonstration mode only if the secondary entity is not the trusted secondary entity (Chen [0005], “when the user is an unauthorized user, disabling a permission for modifying configurations of the terminal by the user”; also Chen [0040], “when the fingerprint information of the user is successfully obtained based on the touch operation of the user, and the user is determined as an unauthorized user by fingerprint information matching, the terminal may firstly determine whether the touch operation of the user triggers a configuration modifying event; if the touch operation of the user triggers a configuration modifying event, the configuration modifying permission corresponding to the configuration modifying event may be disabled,” when an unauthorized secondary user is detected, a mode is entered where the configuration cannot be modified).
Chen is analogous to Kanevsky in view of Parthasarathi and Shintani, as both are drawn to the art of computer interfaces. It would have been obvious to one of ordinary skill in the art at the time of filing to have modified the method as taught by Kanevsky in view of Parthasarathi and Shintani to include checking whether the secondary entity is a trusted secondary entity, and the automatically entering including automatically entering the demonstration mode only if the secondary entity is not the trusted secondary entity, as taught by Chen, since doing so combines prior art elements according to known methods to yield predictable results. Doing so is a predictable solution that one of ordinary skill in the art could have pursued with a reasonable expectation of success.
Response to Arguments
The Applicant’s arguments filed on July 17, 2025 have been fully considered.
§ 101 Rejection
The Applicant respectfully argues that the claims “are not simply directed to a mental process, particularly in view of the current amendments. Rather, Applicant's claimed subject matter is directed to devices and operations of those devices… the limitations of claim 1, and similarly claims 10 and 16, cannot be practically performed by the human mind and thus do not fall within the mental process grouping. For example, it is not possible by a human mind by thinking to automatically capture audio data that includes a first audio source and a second audio source by one or more audio sensors, encode the audio data as a vector representation that includes representations of audio frequency data for overlapping time windows of the audio data, inputting the encoded audio data to a machine learning model trained to generate intent classifications for entities within conversations, process the encoded audio data by the machine learning model to determine an intent of a first entity and/or a second entity, and automatically changing one or more device property values (such as a gesture value that specifies an input style for touch-based navigation in a user interface of the computing device) based on the intent classification generated by the machine learning model.”
The Examiner respectfully disagrees. Applicant’s claimed devices are generic computing devices, and MPEP 2106.04(a)(2)(III)(C) states that a claim that requires a computer may still recite a mental process. The MPEP further offers guidance that “examiners should review the specification to determine if the claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. In these situations, the claim is considered to recite a mental process.” In the present case, the claimed invention is described as the concept of “entering a demonstration mode” performed on a generic computer or in a computer environment, or merely uses a computer as a tool to perform the concept.
Applicant’s claimed “computing device” is described in paragraph [0006] as including “many different types of computing or electronic devices. For example, the computing device 102 can be a smartphone or other wireless phone, a camera (e.g., compact or single-lens reflex), or a tablet or phablet computer. By way of further example, the computing device 102 can be a notebook computer (e.g., netbook or ultrabook), a laptop computer, a wearable device (e.g., a smartwatch, an augmented reality headset or device, a virtual reality headset or device), a personal media player, a personal navigating device (e.g., global positioning system), an entertainment device (e.g., a gaming console, a portable gaming device, a streaming media player, a digital video recorder, a music or other audio playback device), a video camera, an Internet of Things (IoT) device, an automotive computer, and so forth.” The Applicant’s claimed “electronic device” is described in paragraph [0065] as “any of the devices described with reference to FIGs. 1-5, such as any type of client device, mobile phone, tablet, computing, communication, entertainment, gaming, media playback, or other type of electronic device.”
Therefore, the claimed invention is directed to abstract mental processes.
The Applicant further respectfully argues that similar to Example 39 of the 2019 PEG Examples, “claim 1 as amended describes a particular machine learning model that is trained using a specific training process to perform a specific task. For instance, as recited in claim 1 the machine learning model is ‘trained to generate intent classifications for each entity within conversations based on vector representations that include audio frequency data for overlapping time windows’ and as such is implemented by the computing device to process ‘the encoded audio data ... to determine an intent of the primary entity is to have the secondary entity assess features of the computing device and an intent of the secondary entity is to assess features of the computing device.’”
The Examiner respectfully disagrees. Example 39 of the 2019 PEG Examples is directed to a highly specific method of training a neural network for facial detection, including collecting specific types of images, applying specific transformations to the images, and using specific types of images to train a neural network using a two-stage process. In contrast, instant claim 1 merely recites encoding highly general “audio data” for “processing by a machine learning model.”
The audio data is converted to vector representations, but such conversion is conventional in audio data processing in order to allow for analysis of the audio data, since raw audio data cannot be effectively analyzed.
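As an illustrative sketch only (not drawn from the application or the cited references), the conventional conversion discussed above — slicing raw audio into overlapping time windows and representing the frequency content of each window as a vector — can be expressed as follows. The window size, hop size, and toy signal are arbitrary choices for the sketch.

```python
# Illustrative sketch: conventional conversion of raw audio into vector
# representations of frequency data over overlapping time windows.
import numpy as np

def overlapping_windows(audio, window_size, hop_size):
    """Slice a 1-D audio signal into overlapping frames."""
    frames = []
    start = 0
    while start + window_size <= len(audio):
        frames.append(audio[start:start + window_size])
        start += hop_size  # hop < window => consecutive frames overlap
    return np.array(frames)

def frequency_vectors(audio, window_size=256, hop_size=128):
    """Magnitude spectrum of each overlapping window (a short-time FFT)."""
    frames = overlapping_windows(audio, window_size, hop_size)
    return np.abs(np.fft.rfft(frames, axis=1))

# Toy signal: one second of a 50 Hz tone sampled at 1 kHz.
t = np.arange(1000) / 1000.0
audio = np.sin(2 * np.pi * 50 * t)
vectors = frequency_vectors(audio)
# Each row is the frequency representation of one 256-sample window;
# with hop_size=128, each window overlaps its neighbor by half.
```

With a 50% hop, each window shares half its samples with the next, which is the overlap the claim language refers to; the resulting rows are the vector representations fed to downstream analysis.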
As for training the machine learning model, the claim only states that the machine learning model is “trained to generate intent classifications of conversations based on audio data.” Unlike Example 39, instant claim 1 recites training the machine learning model with a high level of generality that amounts to simply “using” a generic machine learning model. Therefore, the instant claims recite a judicial exception and fall under mental processes.
The Applicant further respectfully argues, again with reference to Example 39 of the 2019 PEG Examples, that “the claims as amended do not describe ‘highly general audio data’ and ‘recite training the machine learning model at a high level of generality that amounts to simply “using” a generic machine learning model’ as asserted by the Office. Office Action, p. 24. Rather, the claims recite a specific transformation to the audio data, e.g., ‘encoding the audio data as a vector representation for processing by a machine learning model that includes representations of audio frequency data for overlapping time windows of the audio data,’ a specific manner of training the machine learning model, e.g., ‘the machine learning model trained to generate intent classifications for each entity within conversations based on vector representations that include audio frequency data for overlapping time windows,’ and a specific manner of implementing the trained machine learning model to perform a particular task, e.g., ‘processing the encoded audio data by the machine learning model to determine an intent of the primary entity is to have the secondary entity assess features of the computing device and an intent of the secondary entity is to assess features of the computing device.’ Similar to how in the example claim the limitations of using a machine learning system trained for a particular task do not pertain to a mental process, neither do the analogous limitations in claim 1. As such, the claims are patent eligible for at least the same rationale that the Office found the claim in Example 39 patent eligible.”
The Examiner respectfully disagrees. As for processing overlapping time windows of audio data, the instant disclosure only describes the process in paragraph [0029] as an optional process, and does not state any reason why this should be done ([0029], “The audio input 204 is optionally separated into multiple windows (e.g., 5-second or 10-second) windows of audio, optionally overlapping.”). In other words, there appears to be no technical benefit for processing the audio data as overlapping windows of