Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Detailed Action
2. This Final Office Action is responsive to Applicants’ amendments and arguments, as received 8/20/25. Claims 1-20 remain pending, of which claims 1 and 11 are independent.
3. The previously-presented rejections under 35 U.S.C. 101 are withdrawn in view of Applicants’ amendments and arguments.
4. The previously-presented rejection of claim 18 under 35 U.S.C. 112(b) is withdrawn in view of Applicants’ arguments.
Claim Rejections - 35 USC § 103
5. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office Action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
6. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
7. Claims 1-2, 4-6, 8-9, 11-12, 14-16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 2020/0275895 (“Barachant”) in view of U.S. Patent Application Publication No. 2020/0143154 (“Ette”) and further in view of Non-Patent Literature “Ensemble Learning” (“Polikar”).
Regarding claim 1, BARACHANT teaches A method comprising:
receiving, with at least one processor, sensor data indicative of a ... gesture made by a user (FIG. 4 step 402 teaching obtaining sensor signals during training of the classification model, and FIG. 7 step 702 teaching the use of recorded sensor data (i.e., subject to obtaining/”receiving” (as recited)) as applied to the trained classification model, where per [0070] and [0085] the obtained/received sensor data corresponds to a user making a gesture), the sensor data obtained from at least one sensor of a wearable device worn on a limb of the user (FIG. 2A and [0034], [0058], [0060], and [0065] for example teaching that the operable sensors are worn on/around a user’s wrist (i.e., “limb of the user” as recited));
generating, with at least one processor, a current encoding of features extracted from the sensor data using a machine learning model with the features as input ... and generating, with at least one processor, similarity metrics between the current encoding and each encoding in a set of previously generated encodings for gestures ([0085]: “FIG. 7 illustrates a process 700 for recognizing user-defined gestures based on recorded sensor data in accordance with some embodiments. FIG. 7 represents a “gesture recognition phase” where the trained classification model is used to recognize gestures being performed by users and generate corresponding command/control signals. In act 702, sensor data recorded by one or more sensors is provided as input to one or more trained classification models that include one or more categorical representations for each of the gestures previously performed by a user or multiple users, as described briefly above.”, and then [0119]-[0120]: “After computing the direction and average magnitude corresponding to each of the gestures performed by the user during the training phase as discussed above, gesture classifications can be effectively determined for unseen gesture vectors. An unseen gesture vector is a vector derived from an unclassified user-defined gesture. ... Given an unseen gesture vector, a gesture can be inferred or classified based on a similarity metric (e.g., the cosine of the angle) between the unseen gesture vector and a set of gesture vectors produced during the training phase of the system. Each gesture vector in the set of gesture vectors corresponds to a gesture learned by the model during the training phase ...”);
generating, with at least one processor, similarity scores based on the similarity metrics ([0119]-[0123] discussing a relative magnitude for an unseen vector that is based in part on similarity metrics and/or cosine distances);
predicting the gesture made by the user based on the similarity scores and predicting, with the at least one processor, the ... gesture made by the user using the machine learning model with the similarity scores as input ([0119]-[0123] discussing the basis of a relative magnitude to classify an unseen gesture vector in real-time); and
performing, with at least one processor, an action on the wearable device or other device based on the predicted ... gesture (generation of control signals responsive to the successful classification and identification of a user’s gesture in real-time, per [0002], [0065], [0087], and [0126], the control signal for example to control objects in AR/VR environments and/or other systems/devices).
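By way of illustration only, the cosine-distance matching quoted above from Barachant’s [0119]-[0121] may be sketched as follows (hypothetical Python; the vectors, labels, and function names are the Examiner’s illustrative assumptions and are not drawn from any reference of record):

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def classify_gesture(unseen_vec, learned_vecs, labels):
    # Select the learned gesture vector having the minimum cosine
    # distance with respect to the unseen vector (cf. [0120]).
    distances = [cosine_distance(unseen_vec, v) for v in learned_vecs]
    return labels[int(np.argmin(distances))]

# Two previously learned gesture vectors and one unseen vector.
learned = [np.array([1.0, 0.0, 0.2]), np.array([0.1, 1.0, 0.3])]
print(classify_gesture(np.array([0.9, 0.1, 0.2]), learned,
                       ["gesture_A", "gesture_B"]))  # gesture_A
```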
The claims have been amended to recite a customized gesture, e.g. for which sensor data is received and for which a prediction is made. The Examiner believes Barachant teaches this limitation, see e.g., [0072]: “In some embodiments, the gestures may include non-restrictive or customized gestures (also referred to herein as “user-defined” gestures) defined by a user that may represent the user's own and unique collection of gestures. These gestures may not be pre-defined or pre-modeled in the system and the classification model may be trained to identify any type of gesture while applying any amount of force.”
In view of further teachings ([0064] discussing training and retraining the model, and [0119] discussing the generation of a vector to characterize an unseen gesture), the Examiner believes these customized gestures may be understood to include gestures learned at a later time, e.g., subsequent to an earlier-learned gesture. Hence, Barachant as discussed here may also read on part of the further limitation wherein the machine learning model includes a pre-trained gesture recognition model for predicting a gesture class based on a set of known gestures and additionally for predicting customized gestures.
That said, Barachant does not teach the further limitation of an additional prediction head, for predicting customized gestures, that is separate and apart from the pre-trained gesture recognition model for predicting a gesture class based on a set of known gestures. Rather, the Examiner relies upon ETTE and POLIKAR to teach what Barachant may otherwise lack:
ETTE teaches a comparable gesture recognition framework, where per [0060]-[0061] it is contemplated that existing manufacturer gesture profiles can exist (e.g., akin to Applicants’ recitation for known gestures part of a pre-trained gesture recognition model) as well as the capacity to “capture and store” new gesture profiles. The Examiner understands Ette to make more explicit and concrete the capacity for a gesture recognition framework to have existing/known profiles and to actively learn new profiles, e.g. beyond what Barachant contemplates based on the portions of the primary reference discussed above.
Both references are similarly directed to gesture recognition and the capacity to train and continually train a classification model therefor. Hence, they are similarly directed and therefore analogous. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ette’s dichotomy of known and new gesture processing functionality to further/better implement the training and retraining contemplations that Barachant mentions but does not concretely detail, e.g. such that a user of Barachant’s modified framework can benefit from a gesture recognition framework that can actively learn new gestures from that particular user for example.
Neither Barachant nor Ette concretely teaches or delineates the existing and new gesture processing into separate portions of an architecture, e.g., the additional recitation of an additional prediction head. Rather, the Examiner relies upon POLIKAR to teach what Barachant and Ette otherwise lack, see e.g., POLIKAR’s pages 13-14 discussing incremental learning: “Incremental learning refers to the ability of an algorithm to learn from new data that may become available after a classifier (or a model) has already been generated from a previously available dataset ... A commonly used approach to learn from additional data - discarding the existing classifier, and retraining a new one with the old and new data combined together does not meet the definition of incremental learning, since it causes catastrophic forgetting of all previously learned information and uses previous data. Ensemble based systems can be used for such problems by training an additional classifier (or an additional ensemble of classifiers) on each dataset that becomes available. Learn++ primarily for incremental learning problems that do not introduce new classes (Polikar 2001), and Learn++.NC for those that introduce new classes with additional datasets (Muhlbaier 2008) are two examples of ensemble based incremental learning algorithms.” The Examiner understands Polikar to teach additional/later classifier development for new classes based on new data, subsequent to earlier learning, earlier classifier development, etc. In the Examiner’s view, the new classifier in an ensemble, as discussed here, is akin to a separate processing head for classifying a new/additional class of output.
Like Barachant and Ette, Polikar relates to classification, and specifically classification frameworks that train or retrain or continually learn. Hence, they are similarly directed and therefore analogous. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Polikar’s designation of a new classifier for new data, in accordance with its incremental learning principles, into a framework such as Barachant’s, with a reasonable expectation of success, such that the new classification can be attained without disrupting or otherwise thwarting prior learning.
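By way of illustration only, the ensemble-based incremental learning principle relied upon from Polikar may be sketched as follows (hypothetical Python; the prototype-based member classifier and gesture names are illustrative assumptions): a member trained only on the newly introduced class is appended without retraining prior members, akin to adding a separate prediction head.

```python
import numpy as np

class PrototypeClassifier:
    """Toy stand-in for any trained member classifier."""
    def __init__(self, prototypes):          # {class_name: prototype vector}
        self.prototypes = prototypes
    def predict_scores(self, x):
        # Higher score = closer to the class prototype.
        return {c: -np.linalg.norm(x - p) for c, p in self.prototypes.items()}

class IncrementalEnsemble:
    def __init__(self):
        self.members = []
    def add_classifier(self, clf):
        # Existing members are never retrained; the new member alone covers
        # the newly introduced class (no catastrophic forgetting).
        self.members.append(clf)
    def predict(self, x):
        scores = {}
        for clf in self.members:
            for cls, s in clf.predict_scores(x).items():
                scores[cls] = max(scores.get(cls, float("-inf")), s)
        return max(scores, key=scores.get)

ens = IncrementalEnsemble()
ens.add_classifier(PrototypeClassifier({"known_gesture": np.array([1.0, 0.0])}))
# A customized gesture later arrives: train a separate member for it only.
ens.add_classifier(PrototypeClassifier({"customized_gesture": np.array([0.0, 1.0])}))
print(ens.predict(np.array([0.1, 0.9])))  # customized_gesture
```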
Regarding claim 2, Barachant in view of Ette and further in view of Polikar teach the method of claim 1, as discussed above. The aforementioned references teach the additional limitations wherein the limb is a wrist of the user (FIG. 2A and [0034], [0058], [0060], and [0065] for example teaching that the operable sensors are worn on/around a user’s wrist (i.e., “limb of the user” as recited)), and the sensor data is obtained from a combination of a bio signal ([0061]: “other types of sensors such as a heart-rate monitor”) and at least one motion signal ([0057]: “Sensors 102 may include one or more Inertial Measurement Units (IMUs), which measure a combination of physical aspects of motion, using, for example, an accelerometer, a gyroscope, a magnetometer, or any combination of one or more accelerometers, gyroscopes and magnetometers”).
Regarding claim 4, Barachant in view of Ette and further in view of Polikar teach the method of claim 1, as discussed above. The aforementioned references teach the additional limitations wherein the similarity metrics are distance metrics ([0013], [0026], and [0029] clarifying the similarity metric as “a cosine distance”).
Regarding claim 5, Barachant in view of Ette and further in view of Polikar teach the method of claim 1, as discussed above. The aforementioned references teach the additional limitations wherein the machine learning model is a neural network (trained classification model per [0022] and [0051]-[0055] is akin to a neural network or a model that one would understand to be implemented using a neural network).
Regarding claim 6, Barachant in view of Ette and further in view of Polikar teach the method of claim 1, as discussed above. The aforementioned references teach the additional limitations wherein the similarity scores are generated by a neural network ([0120]: “Given an unseen gesture vector, a gesture can be inferred or classified based on a similarity metric (e.g., the cosine of the angle) between the unseen gesture vector and a set of gesture vectors produced during the training phase of the system.”, and [0121]: “Accordingly, unseen gesture vectors computed from subsequent user-performed gestures (after the training phase) can be classified, in some instances, based on their cosine distance”).
Regarding claim 8, Barachant in view of Ette and further in view of Polikar teach the method of claim 1, as discussed above. The aforementioned references teach the additional limitations wherein the action corresponds to navigating a user interface on the wearable device or other device (generation of control signals responsive to the successful classification and identification of a user’s gesture in real-time, per [0002], [0065], [0087], and [0126], the control signal for example to control objects in AR/VR environments and/or other systems/devices and/or scrolling through text (specifically, see [0065])).
Regarding claim 9, Barachant in view of Ette and further in view of Polikar teach the method of claim 1, as discussed above. The aforementioned references teach the additional limitations wherein the machine learning model is a neural network trained using sample data for pairs of gestures obtained from a known set of gestures ([0120]: “Given an unseen gesture vector, a gesture can be inferred or classified based on a similarity metric (e.g., the cosine of the angle) between the unseen gesture vector and a set of gesture vectors produced during the training phase of the system. Each gesture vector in the set of gesture vectors corresponds to a gesture learned by the model during the training phase. For example, a match between an unseen gesture vector and a learned gesture vector can be inferred by selecting a learned gesture vector from the set of learned gesture vectors having the minimum cosine distance with respect to the unseen gesture vector.”, which essentially involves a pairwise comparison between a present gesture and a prior learned gesture), where each gesture in the pair is annotated with a label indicating that the gesture is from a same class or a different class (“categorical representation” as determined based on the gesture identification and related processing is akin to a labeling (see, e.g., [0054]-[0055], [0080], [0085], [0102])), and a feature vector for each gesture in the pair is separately encoded using the machine learning model ([0014], [0020], [0027]-[0029], [0077], [0081]).
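By way of illustration only, the pairwise training mapped above may be sketched as follows (hypothetical Python; the linear encoder and contrastive-style loss are illustrative stand-ins and are not asserted to be Barachant’s disclosed model): each gesture in the pair is encoded separately by the same model, and the pair carries a same-class/different-class label.

```python
import numpy as np

def encode(x, W):
    # Stand-in encoder; each gesture in the pair is encoded separately
    # by the same model.
    return np.tanh(W @ x)

def pair_loss(z_a, z_b, same_class, margin=1.0):
    # Pull same-class encodings together; push different-class apart.
    d = np.linalg.norm(z_a - z_b)
    return d ** 2 if same_class else max(0.0, margin - d) ** 2

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
x_a, x_b = rng.normal(size=8), rng.normal(size=8)
print(pair_loss(encode(x_a, W), encode(x_b, W), same_class=True))
```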
Regarding claim 11, the claim includes the same or similar limitations as claim 1 discussed above, and is therefore rejected under the same rationale.
Regarding claim 12, the claim includes the same or similar limitations as claim 2 discussed above, and is therefore rejected under the same rationale.
Regarding claim 14, the claim includes the same or similar limitations as claim 4 discussed above, and is therefore rejected under the same rationale.
Regarding claim 15, the claim includes the same or similar limitations as claim 5 discussed above, and is therefore rejected under the same rationale.
Regarding claim 16, the claim includes the same or similar limitations as claim 6 discussed above, and is therefore rejected under the same rationale.
Regarding claim 19, the claim includes the same or similar limitations as claim 9 discussed above, and is therefore rejected under the same rationale.
8. Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Barachant in view of Ette and Polikar and further in view of U.S. Patent Application Publication No. 2021/0373676 (“Jorasch”).
Regarding claim 3, Barachant in view of Ette and further in view of Polikar teach the method of claim 2, as discussed above. The aforementioned references teach the additional limitations wherein ... the at least one motion signal is acceleration or angular rate (Barachant’s [0057]: “Sensors 102 may include one or more Inertial Measurement Units (IMUs), which measure a combination of physical aspects of motion, using, for example, an accelerometer, a gyroscope, a magnetometer, or any combination of one or more accelerometers, gyroscopes and magnetometers”) but not specifically the further limitation wherein the bio signal is a photoplethysmography (PPG) signal. At best, Barachant teaches a heart-rate monitoring sensor or the like, as discussed above per claim 2, but not a PPG signal specifically, which the Examiner understands to be a type of heart-rate monitor that Barachant only generally teaches. Rather, the Examiner relies upon JORASCH to teach what Barachant etc. otherwise lacks, see e.g., Jorasch’s [1915]-[1916] teaching a PPG sensor as recited, in a comparable framework that also extracts features to identify gestures using a machine learning model or the like ([0445], [0460], [1229], [1244]).
The references Barachant and Jorasch both relate to feature extraction and model building frameworks that at least in part involve gesture identification using similar machine-learning approaches. Hence, the references are similarly directed and therefore analogous. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to extend Barachant’s sensing modalities, which explicitly involve heart-rate monitoring, to a specific type of that same heart-rate monitoring via Jorasch’s PPG signal monitoring as taught, with a reasonable expectation of success, such that Barachant’s signal collection and processing aspects can benefit from an increased breadth that encompasses specific types of signals/sensors known in the state of the art, such as Jorasch teaches.
Regarding claim 13, the claim includes the same or similar limitations as claim 3 discussed above, and is therefore rejected under the same rationale.
9. Claims 7 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Barachant in view of Ette and Polikar and further in view of U.S. Patent No. 11,281,293 (“Hernandez”).
Regarding claim 7, Barachant in view of Ette and Polikar teach the method of claim 6, as discussed above. The aforementioned references teach a trained classifier, as discussed above per claims 1 and 6 and Barachant for example, which the Examiner equates with the recited “neural network.” That said, Barachant etc. is silent as to the further limitations wherein the neural network is a deep neural network that includes a sigmoid activation function. Rather, the Examiner relies upon HERNANDEZ to teach what Barachant etc. otherwise lack, see e.g., a comparable handstate modeling framework, capable of gesture recognition per column 18, lines 9-29 and column 23, lines 13-31, whose teachings encompass deep neural network implementations inclusive of sigmoid activation functions, per column 15, line 25 through column 16, line 2.
The references Barachant and Hernandez both relate to feature extraction and model building frameworks that at least in part involve gesture identification using similar machine-learning approaches. Hence, the references are similarly directed and therefore analogous. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to extend Barachant’s machine learning approaches to encompass techniques known in the state of the art to accomplish the same feature extraction and similarity determinations, such as Hernandez does, with a reasonable expectation of success, such that Hernandez’s deep learning implementations can be leveraged to solve the same or similar problems considered by Barachant with a capability to handle more/greater complexity.
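By way of illustration only, a deep neural network employing a sigmoid activation function, consistent with Hernandez’s cited teachings, may be sketched as follows (hypothetical Python; layer sizes and weights are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deep_forward(x, layers):
    # Several stacked layers ("deep"), each followed by a sigmoid activation.
    for W, b in layers:
        x = sigmoid(W @ x + b)
    return x

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(16, 8)), np.zeros(16)),
          (rng.normal(size=(16, 16)), np.zeros(16)),
          (rng.normal(size=(3, 16)), np.zeros(3))]
print(deep_forward(rng.normal(size=8), layers))  # three scores in (0, 1)
```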
Regarding claim 17, the claim includes the same or similar limitations as claim 7 discussed above, and is therefore rejected under the same rationale.
Regarding claim 18, Barachant in view of Ette and Polikar and further in view of Hernandez teach the method of claim 6, as discussed above. The aforementioned references teach the additional limitations wherein the performed action corresponds to navigating a user interface on the wearable device or other device (Barachant: generation of control signals responsive to the successful classification and identification of a user’s gesture in real-time, per [0002], [0065], [0087], and [0126], the control signal for example to control objects in AR/VR environments and/or other systems/devices and/or scrolling through text (specifically, see [0065]), and where the control signal is understood to perform a control action or a scrolling action as discussed here). The motivation for combining the references is as discussed above in relation to claim 7.
10. Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Barachant in view of Ette and Polikar and further in view of U.S. Patent Application Publication No. 2021/0319321 (“Krishnamurthy”).
Regarding claim 10, Barachant in view of Ette and further in view of Polikar teach the method of claim 9, as discussed above. The aforementioned references do not teach the further limitation wherein the machine learning model uses a different loss function for each gesture in each pair during training, and rather the Examiner relies upon KRISHNAMURTHY to teach what Barachant etc. otherwise lack, see e.g., Krishnamurthy’s [0023]: “After initialization the activation function and optimizer is defined. The NN is then provided with a feature vector or input dataset at 142. Each of the different feature vectors may be generated by the NN from inputs that have known relationships. Similarly, the NN may be provided with feature vectors that correspond to inputs having known relationships. The NN then predicts a distance between the features or inputs at 143. The predicted distance is compared to the known relationship (also known as ground truth) and a loss function measures the total error between the predictions and ground truth over all the training samples at 144. By way of example and not by way of limitation the loss function may be a cross entropy loss function, quadratic cost, triplet contrastive function, exponential cost, mean square error etc. Multiple different loss functions may be used depending on the purpose. By way of example and not by way of limitation, for training classifiers a cross entropy loss function may be used whereas for learning an embedding a triplet contrastive loss function may be employed. The NN is then optimized and trained, using known methods of training for neural networks such as backpropagating the result of the loss function and by using optimizers, such as stochastic and adaptive gradient descent etc., as indicated at 145. In each training epoch, the optimizer tries to choose the model parameters (i.e., weights) that minimize the training loss function (i.e. total error). Data is partitioned into training, validation, and test samples.”
The references Barachant and Krishnamurthy both relate to feature extraction and model building frameworks that at least in part involve solving some identification/classification problem using similar machine-learning approaches. Hence, the references are similarly directed and therefore analogous. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to extend Barachant’s machine learning training approach to encompass the optimization techniques discussed above per Krishnamurthy, with a reasonable expectation of success, such that loss evaluation is more finely tuned to the underlying learning task at hand as part of the larger framework as assembled.
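By way of illustration only, Krishnamurthy’s point that multiple different loss functions may be used depending on the purpose may be sketched as follows (hypothetical Python; values are illustrative): a cross-entropy loss for training a classifier versus a triplet contrastive loss for learning an embedding.

```python
import numpy as np

def cross_entropy(probs, true_idx):
    # Loss for training a classifier over known classes.
    return -np.log(probs[true_idx])

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Loss for learning an embedding: the anchor should sit nearer the
    # positive than the negative by at least the margin.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

print(cross_entropy(np.array([0.7, 0.2, 0.1]), 0))               # ~0.357
print(triplet_loss(np.zeros(4), 0.1 * np.ones(4), np.ones(4)))   # 0.0
```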
Regarding claim 20, the claim includes the same or similar limitations as claim 10 discussed above, and is therefore rejected under the same rationale.
Response to Arguments
11. Applicants’ arguments with respect to the pending claims have been carefully considered but are moot in view of the newly formulated grounds of rejection necessitated by the recent amendments.
Conclusion
12. The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure:
US 2017/0344859
US 9582080 and US 10540597 and US 1082954
Non-Patent Literature “Thoughts on a Recursive Classifier Graph: a Multiclass Network for Deep Object Recognition”
Non-Patent Literature “Fast Adaptive Stacking of Ensembles Adaptation for Supporting Active Learning. A Real Case Application”
Non-Patent Literature “Concept Drift Detection and Adaption in Big Imbalance Industrial IoT Data Using an Ensemble Learning Method of Offline Classifiers”
13. Applicants’ amendment necessitated the new ground(s) of rejection presented in this Office Action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicants are reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
14. Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHOURJO DASGUPTA whose telephone number is (571)272-7207. The examiner can normally be reached M-F 8am-5pm CST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle, can be reached at (571) 272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHOURJO DASGUPTA/Primary Examiner, Art Unit 2144