DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Style
In this action, unitalicized bold is used for claim language, while italicized bold is used for emphasis.
Applicant Reply
“The claims may be amended by canceling particular claims, by presenting new claims, or by rewriting particular claims as indicated in 37 CFR 1.121(c). The requirements of 37 CFR 1.111(b) must be complied with by pointing out the specific distinctions believed to render the claims patentable over the references in presenting arguments in support of new claims and amendments. . . . The prompt development of a clear issue requires that the replies of the applicant meet the objections to and rejections of the claims. Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. . . . An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714.” MPEP § 714.02. Generic statements or listings of numerous paragraphs do not “specifically point out the support for” claim amendments. “With respect to newly added or amended claims, applicant should show support in the original disclosure for the new or amended claims. See, e.g., Hyatt v. Dudas, 492 F.3d 1365, 1370, n.4, 83 USPQ2d 1373, 1376, n.4 (Fed. Cir. 2007) (citing MPEP § 2163.04 which provides that a ‘simple statement such as ‘applicant has not pointed out where the new (or amended) claim is supported, nor does there appear to be a written description of the claim limitation ‘___’ in the application as filed’ may be sufficient where the claim is a new or amended claim, the support for the limitation is not apparent, and applicant has not pointed out where the limitation is supported.’)” MPEP § 2163(II)(A).
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
Unless specifically indicated in this Office action, claim limitations are not interpreted as means-plus-function under § 112(f).
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. “‘The standard is whether the words of the claim are understood by persons of ordinary skill in the art to have a sufficiently definite meaning as the name for structure.’ Williamson, 792 F.3d at 1349, 115 USPQ2d at 1111; see also Greenberg v. Ethicon Endo-Surgery, Inc., 91 F.3d 1580, 1583, 39 USPQ2d 1783, 1786 (Fed. Cir. 1996).” MPEP § 2181(I).
Such claim limitation(s) is/are: “automated machine learning engine” in claims 1 and 10 and “monitoring and decision module” in claims 1, 8, 10, and 12-15.
The claimed “automated machine learning engine” is supported in the Specification. See Spec. ¶31.
The claimed “monitoring and decision module” is supported in the Specification. See Spec. ¶34.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-3 and 5-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
All independent claims substantially recite: “dividing an input data set of a single problem type into multiple different regions of a feature space by clustering outputs of a feature detector, and applying autoML mechanisms to each of the different regions to generate the model set such that each of the different models of the model set is specialized for a particular region of the feature space.” The original Specification does not mention “dividing an input data set of a single problem type into multiple different regions of a feature space[.]” See also Claim 4 as filed 25 Feb. 2025. Further, this language could be read as dividing data into sets associated with different regions in the feature space for training models on a given region in the feature space (“applying autoML mechanisms to each of the different regions [of the feature space] to generate the model set such that each of the different models of the model set is specialized for a particular region of the feature space”) and then sub-dividing the data for a given problem into subsets in the feature space (“dividing an input data set for a single ML problem type into multiple different regions of a feature space by clustering outputs of a feature detector”). Nothing is found in the Specification which would provide support for division of the data at two different granularities. Further, the claim language appears to recite using both granularities as part of training specialized models. Nothing is found in the original Specification which would support training of specialized models in this configuration.
All dependent claims are rejected as containing the limitations of the claims from which they depend.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-3 and 5-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant regards as the invention.
All independent claims substantially recite: “dividing an input data set of a single problem type into multiple different regions of a feature space by clustering outputs of a feature detector, and applying autoML mechanisms to each of the different regions to generate the model set such that each of the different models of the model set is specialized for a particular region of the feature space.” The original Specification does not mention “dividing an input data set of a single problem type.” This language first appears in the claims filed 25 Feb. 2025. As such, the Specification does not provide clarity. The Specification describes dividing an input dataset into multiple distinct regions within a feature space based on features of input data. Spec. ¶18. AutoML mechanisms can then be applied to each different region to generate models specialized for particular regions in the feature space. Spec. ¶18. The only example indicates that images could be objects or could be persons. Spec. ¶18. Based on this description, the claimed “multiple different regions of a feature space” would be understood as referring to different types of images, such as persons or objects. These different types of images can then be “clustered” based on their respective type of image. Spec. ¶18. In other words, the limitation above could read on dividing input data based on inclusion of a particular type of image (e.g. an object or a person) and training a corresponding model for the model set. Alternatively, the dividing of “an input data set of a single problem type into multiple different regions of a feature space” could be read as dividing data based on features within an image, based on the plain meaning of “feature” in the field of image recognition. See e.g. Kumar, A Detailed Review of Feature Extraction in Image Processing Systems, P. 6 (“The structural features representing the coarser shape of the character capture the presence of corners, diagonal lines, and vertical and horizontal lines in the gradient image. The concavity features capture the major topological and geometrical features including direction of bays, presence of holes, and large vertical and horizontal strokes.”). This second interpretation is also consistent with another section of the Specification. Paragraph 18 of the Specification cites SIFT. “SIFT” originates from an academic paper, which uses “feature” consistent with its plain meaning in the field of image recognition. (See Lowe, Distinctive Image Features from Scale-Invariant Keypoints, 2004, pp. 4-5.) Since it is not clear whether the claims are reciting dividing data into sets of data based on the type of image on which a model is to be trained, or dividing data based on low-level features within an image (or both), the claim is indefinite.
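By way of illustration of the second interpretation only, the following minimal sketch (the examiner's own, not drawn from the Specification; the data shapes and cluster count are hypothetical) shows low-level feature-detector outputs being clustered to divide a data set into feature-space regions:

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical feature-detector outputs: one descriptor vector per image,
    # e.g., a pooled SIFT-style descriptor. Shapes are illustrative only.
    rng = np.random.default_rng(0)
    descriptors = rng.random((1000, 128))

    # Cluster the descriptors; each cluster is one "region" of the feature space.
    kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(descriptors)

    # Divide the input data set by region: image i belongs to region labels[i],
    # and a dedicated model could then be trained per region.
    labels = kmeans.labels_
    regions = {r: np.where(labels == r)[0] for r in range(4)}

Under the first interpretation, by contrast, the “regions” would simply be image types (persons versus objects), and no such clustering of low-level features would be involved.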
All independent claims substantially recite “applying autoML mechanisms to each of the different regions to generate the model set such that each of the different models of the model set is specialized for a particular region of the feature space[.]” No objective measure allowing one of ordinary skill in the art to determine whether “mechanisms” constitute “autoML mechanisms” is found in the Specification. Without any objective measure distinguishing an “autoML mechanism” from an ordinary mechanism, the way this term should be construed to modify the claim language cannot objectively be determined. This renders the claim scope subjective, and therefore indefinite. The Specification mentions various operations in the context of autoML mechanisms, but no clear meaning for the term can be extrapolated. For instance, the Specification mentions training and storing as autoML mechanisms in paragraph 19. In paragraph 30, autoML engine 1 uses unsupervised learning to divide input data into different regions, and autoML mechanisms are used to derive dedicated models. This fails to provide a common genus of operations that one of ordinary skill in the art could apply to the current claim language for clarity. As such, this subjective term renders the claim indefinite. Note that this issue applies to the similar language in claim 19 as well.
Claim 16 recites “The method according to claim 1, wherein the particular situation is a situation determined independently of computing resource availability of the resource-constrained device.” But Claim 17 recites “The method according to claim 16, wherein the particular situation is one of an available time to execute the one or more selected models, an expected accuracy of the one or more selected models, or a data sample arrival rate at the resource-constrained device.” Claim 16 limits “the particular situation is a situation determined independently of computing resource availability[.]” This limitation on the recited “situation” is inconsistent with the usage of “the particular situation” recited in claim 17. Specifically, Claim 17 recites various “situations” that cannot reasonably be construed as “independent[] of computing resource availability.” For instance, claim 17 recites “the particular situation is . . . an available time to execute the one or more selected models.” Since time to execute a model is not independent from computing resource availability based on a plain meaning of the language, it is not clear how the language of claim 16 should be construed to be consistent with both claim 16 and claim 17. For the foregoing reasons, claims 16 and 17 are indefinite.
All dependent claims are rejected as containing the limitations of the claims from which they depend.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claim(s) 1-3, 5-7, 9-14, and 16-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Han (MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints, 2016).
Claim 1: A method of solving a machine learning (ML) problem using a resource-constrained device, the method comprising: (Intended use language is explained in MPEP §§ 2103 and 2111.02. “Claim scope is not limited by claim language that suggests or makes optional but does not require steps to be performed, or by claim language that does not limit a claim to a particular structure.” MPEP § 2111.04. Further, Han teaches: “We consider applying computer vision to video on cloud-backed mobile devices using Deep Neural Networks (DNNs). The computational demands of DNNs are high enough that, without careful resource management, such applications strain device battery, wireless data, and cloud cost budgets.” Han Abstract.) generating and training, by an automated machine learning (autoML) engine, a model set including a number of different models for the ML problem, (“The large number of simultaneous DNNs in operation and the high frequency of their use will strain available resources, even when optimized models are used. We use two insights to address this problem. First, model optimization typically allows a graceful tradeoff between accuracy and resource use. Thus, a system could adapt to high workloads by using less accurate variants of optimized models.” Han P. 123 Col. 2. “We formulate the problem of adaptively selecting model variants of differing accuracy in order to remain within per-request resource constraints (e.g., memory) and long-term constraints (e.g., energy) while maximizing average classification accuracy as a constrained optimization problem we call Approximate Model Scheduling (AMS). To solve AMS, MCDNN contains three innovations. First, we generate optimized variants of models by automatically applying a variety of model optimization techniques with different settings.” Han P. 124 Col. 1. “The matrix multiplication and convolution layers are parameterized by weight arrays that are estimated from training data. The network description before training is called a model architecture, and the trained network with weights instantiated is a model.” Han P. 125 Col. 2. “Recent model optimization techniques for reducing resource use in DNNs have targeted these opportunities in three main ways: . . . Architectural changes [44] explore the space of model architectures, including varying the number of layers, size of weight matrices including kernels, etc. For instance, reducing the number of layers from 19 to 11 results in a drop in accuracy of 4.1% points. A common theme across these techniques is the trading off of resource use for accuracy.” Han P. 125 Col. 2 – P. 126 Col. 1. “MCDNN applies the following traditional techniques: . . . Architectural change: There is a wide variety of architectural transformations possible. It concentrates on the following: (1) For convolutional and locally connected layers, increase kernel/stride size or decrease number of kernels, to yield quadratic or linear reduction in computation. (2) For fully connected layers, reduce the size of the output layer to yield a linear reduction in size of the layer. (3) Eliminate convolutional layers entirely.” Han P. 127 Col. 2 – P. 128 Col. 1.) wherein each of the different models of the model set is generated for a different input data characteristic to be specialized for a particular situation, comprising: (Han teaches: “Assume a device memory of size S, device energy budget E and cloud cost budget of D.
Assume a set {M1,…,Mn} of models, where each model Mi has a set Vi of variants {Mij | n ≥ i ≥ 1, ni ≥ j ≥ 1}, typically generated by model optimization. For instance, M1 may be a model for face recognition, M2 for object recognition, and so on.” Han P. 126 Col. 2. “Each mt is a request for model Mi (for some i) at time t. For each request mt, the system must decide where (device or cloud) to execute this model. If on-device, it must decide which variant Mij, if any, of Mi to load into the cache and execute at timestep t, while also deciding which, if any, models must be evicted from the cache to make space for Mij.” Han P. 126 Col. 2. “When applications are installed, they register with the scheduler a map from input types to a catalog of model fragments to be used to process those inputs, and handlers to be invoked with the result from each model. The catalog is stored on disk. When a new input appears, the scheduler (with help from the router and profiler) is responsible for identifying the model variants that need to be executed in response, paging if needed the appropriate model variants in from disk to the in-memory model-fragment cache in the appropriate location (i.e., on-device or on-cloud) for execution, executing the models on the input and dispatching the results to the registered handlers.” Han P. 130, col. 1. “Figure 9 illustrates the architecture of the MCDNN system. An application developer interested in using a DNN in resource-constrained settings provides the compiler with the type of input to which the model should be applied (e.g., faces), a model schema, and training data. The compiler derives a catalog of trained models from this data, mapping each trained variant of a model to its resource costs, accuracy, and information relevant to executing them (e.g., the runtime context in which they apply). When a user installs the associated application, the catalog is stored on disk on the device and cloud and registered with the MCDNN runtime as handling input of a certain type for a certain app.” Han P. 131 col. 1-2.) dividing an input data set of a single problem type into multiple different regions of a feature space by clustering outputs of a feature detector, (As best understood, this limitation refers to feature extraction within an image. That is, image data is processed to determine individual features within images (e.g. lines, dots, corners) which are used to recognize images. Note that “feature maps” are divided into regions, on which convolution is performed to extract the features. Han offers a more technical explanation below. It is understood that one of ordinary skill in the art is familiar with the common usage of the terms “feature map” and “convolution” as those terms are used in this area. Han teaches: “With model specialization, MCDNN seeks to exploit class-clustering in contexts to derive more efficient DNN-based classifiers for those contexts.” Han P. 129. “Figure 2: (a) DNN “layers” are array operations on lists of arrays called feature maps.” Han P. 125, description of figure 2. “A CNN can be viewed as a dataflow graph where the nodes, or layers, are array-processing operations (see Figure 2(a)). Each layer takes as input a set of arrays (called feature maps), performs an array operation on them, and outputs a set of feature maps that will in turn be processed by downstream layers.
The array operations belong to a small set, including matrix multiplication (that multiplies feature maps by a weight matrix), convolution (that convolves inputs by a convolution kernel, typically of size 3x3 or 5x5), max-pooling (that replaces each input array value by the maximum of its neighbors), non-linearizing (that replaces each array value by a non-linear function of itself) and re-scaling (that re-scales inputs to sum to 1). Figure 2(b) shows how layers are connected in a typical CNN: groups of convolution, pooling and non-linearizing layers are repeated several times before 1-2 matrix-multiplication (or fully connected) layers are applied, ended by re-scaling.” Han P. 125 col 1.) and applying autoML mechanisms to each of the different regions to generate the model set such that each of the different models of the model set is specialized for a particular region of the feature space (The Specification explains “region” in reference to the feature space. See e.g. Spec. ¶18. “Assume a device memory of size S, device energy budget E and cloud cost budget of D. Assume a set {M1,…,Mn} of models, where each model Mi has a set Vi of variants {Mij | n ≥ i ≥ 1, ni ≥ j ≥ 1}, typically generated by model optimization. For instance, M1 may be a model for face recognition, M2 for object recognition, and so on.” Han P. 126 Col. 2. “With model specialization, MCDNN seeks to exploit class-clustering in contexts to derive more efficient DNN-based classifiers for those contexts. . . . The specializer, which runs in the background in the cloud, determines if a small fraction of possible classes “dominate” the CDF for a given model. If so, it adds to the catalog specialized versions of the generic variants (stored in the catalog) of the model by “re-training” them on a subset of the original data dominated by these classes. If a few classes do indeed dominate strongly, we expect even smaller models, that are not particularly accurate on the general inputs, to be quite accurate on inputs drawn from the restricted context.” Han P. 129 col. 1. See also Han P. 129 col. 1-2. (“When data flow from devices embedded in the real world, however, it is well-known that classes are heavily clustered by context. For instance you may tend to see the same 10 people 90% of the time you are at work . . . Intuitively, we seek to train a resource-light “specialized” variant of the developer-provided model for the few classes that dominate each context. Crucially, this model must also recognize well when an input does not belong to one of the classes; we refer to this class[.] . . . The profiler maintains a cumulative distribution function (CDF) of the classes resulting from classifying inputs so far to each model. The specializer, which runs in the background in the cloud, determines if a small fraction of possible classes “dominate” the CDF for a given model. If so, it adds to the catalog specialized versions of the generic variants (stored in the catalog) of the model by “re-training” them on a subset of the original data dominated by these classes. . . . The specializer determines that a CDF C is n; p-dominated if n of its most frequent classes account for at least fraction p of its weight. For instance, if 10 of 4000 possible people account for 90% of faces recognized, the corresponding CDF would be (10,0.9) dominated. The specializer checks for n;p-dominance in incoming CDFs. MCDNN currently takes the simple approach of . . . .
The straightforward way to specialize a model in the catalog to a restricted context would be to re-train the schema for that model on the corresponding restricted dataset. Full retraining of DNNs is often expensive, as we discussed in the previous section. Further, the restricted datasets are often much smaller than the original ones; the reduction in data results in poorly trained models. The MCDNN specializer therefore uses a variant of the in-place transformation discussed in the previous section to retrain just the output layer, i.e., the last fully-connected layer and softmax layers, of the catalog model on the restricted data.”)) monitoring, by a monitoring and decision module, based on one or more of the input data characteristics being present in the input data, input data of the ML problem and selecting one or more models of the generated model set as active models to be applied by the resource-constrained device, (“We formulate the problem of adaptively selecting model variants of differing accuracy in order to remain within per-request resource constraints (e.g., memory) and long-term constraints (e.g., energy) while maximizing average classification accuracy as a constrained optimization problem we call Approximate Model Scheduling (AMS).” Han P. 124 Col. 1. “Instead of using a single hand-picked approximate variant of a model, MCDNN dynamically picks the most appropriate variant from its model catalog.” Han P. 127 Col. 2. See also Han Table 2 and Figs. 4 and 5 showing tradeoffs between resource usage and accuracy for different models identifying different types of images. “To study these questions, we have generated catalogs for ten distinct classification tasks (a combination of model architecture and training data, the information a developer would input to MCDNN).” Han P. 128 Col. 1. “With model specialization, MCDNN seeks to exploit class-clustering in contexts to derive more efficient DNN-based classifiers for those contexts. We adopt a cascaded approach (Figure 7(a)) to exploit this opportunity. Intuitively, we seek to train a resource-light “specialized” variant of the developer-provided model for the few classes that dominate each context.” Han P. 129 Col. 1. “When a new input appears, the scheduler (with help from the router and profiler) is responsible for identifying the model variants that need to be executed in response, paging if needed the appropriate model variants in from disk to the in-memory model-fragment cache in the appropriate location (i.e., on-device or on-cloud) for execution, executing the models on the input and dispatching the results to the registered handlers.” Han P. 130 Col. 1. As explained above, Han teaches selecting from a set of models based on the type of input data. See Han P. 130 col. 1 and Han P. 131 col. 1-2, cited above.) and receiving, by the resource-constrained device, input data of the ML problem and applying the one or more models selected by the monitoring and decision module to the received input data. (“When a new input appears, the scheduler (with help from the router and profiler) is responsible for identifying the model variants that need to be executed in response, paging if needed the appropriate model variants in from disk to the in-memory model-fragment cache in the appropriate location (i.e., on-device or on-cloud) for execution, executing the models on the input and dispatching the results to the registered handlers.” Han P. 130 Col. 1.)
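For clarity of the record, the n;p-dominance test Han describes may be summarized in the following minimal sketch (the function and variable names are the examiner's, not Han's):

    def is_np_dominated(class_freqs, n, p):
        """True if the n most frequent classes account for at least
        fraction p of the distribution's weight (Han P. 129 Col. 1)."""
        top_n = sorted(class_freqs, reverse=True)[:n]
        return sum(top_n) >= p

    # Han's example: 10 of 4000 possible people account for 90% of faces
    # recognized, so the CDF is (10, 0.9)-dominated.
    freqs = [0.09] * 10 + [0.1 / 3990] * 3990
    print(is_np_dominated(freqs, 10, 0.9))  # True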
Claim 2: The method according to claim 1, wherein the model set is configured to include a number of different models that satisfy resource constraints of the resource-constrained device. (See rejection of claim 1.)
Claim 3: The method according to claim 1, wherein the model set includes a number of different models having varying trade-offs between accuracy of the models and resource requirements of the models. (See rejection of claim 1.)
Claim 5: The method according to claim 4, wherein the different regions of the input data set are defined as distinct data space regions based on features of the input data. (See rejection of claim 1. Images of faces and images of objects are “distinct data space regions based on features of the input data.”)
Claim 6: The method according to claim 4, wherein the different regions of the input data set are defined based on time and/or frequency of input data arrival. (Han teaches: “Second, both streaming and multi-programming themselves provide additional structure that enable powerful new model optimizations. For instance, streams often have strong temporal locality (e.g., a few classes dominate output for stretches of time)[.]” Han P. 123 Col. 2. “The specializer, which runs in the background in the cloud, determines if a small fraction of possible classes ‘dominate’ the CDF for a given model. If so, it adds to the catalog specialized versions of the generic variants (stored in the catalog) of the model by ‘re-training’ them on a subset of the original data dominated by these classes. If a few classes do indeed dominate strongly, we expect even smaller models, that are not particularly accurate on the general inputs, to be quite accurate on inputs drawn from the restricted context. We seek to minimize the overhead of specialization to 10s or less, so we can exploit class skews lasting as little as five minutes, a key to making specialization broadly useful. Implementing the above raises three main questions. What is the criterion for whether a set of classes dominates the CDF? How can models be re-trained efficiently? How do we avoid re-training too many variants of models and focus our efforts on profitable ones? We describe how MCDNN addresses these. The specializer determines that a CDF C is n; p-dominated if n of its most frequent classes account for at least fraction p of its weight.” Han P. 129 Col. 1. “When applications are installed, they register with the scheduler a map from input types to a catalog of model fragments to be used to process those inputs, and handlers to be invoked with the result from each model. The catalog is stored on disk. When a new input appears, the scheduler (with help from the router and profiler) is responsible for identifying the model variants that need to be executed in response, paging if needed the appropriate model variants in from disk to the in-memory model-fragment cache in the appropriate location (i.e., on-device or on-cloud) for execution, executing the models on the input and dispatching the results to the registered handlers.” Han P. 130 Col. 1.)
Claim 7: The method according to claim 1, wherein switching between the models of the model set that are selected to be applied by the resource-constrained device is performed based on a frequency of input data arrival at the resource-constrained device, (See rejection of claim 6.) a time of input data arrival at the resource constrained device, data characteristics of the input data, or a model execution context. (See rejection of claim 1.)
Claim 9: The method according to claim 1, wherein the models of the model set are configured to provide information on the confidence of their predictions, (See Han Fig. 2 showing softmax rescaling.) and wherein an input data sample is fed into a model of higher complexity when the confidence of a model of lower complexity is below a configurable threshold and the resource-constrained device has sufficient computational resources available. (“The large number of simultaneous DNNs in operation and the high frequency of their use will strain available resources, even when optimized models are used. We use two insights to address this problem. First, model optimization typically allows a graceful tradeoff between accuracy and resource use. Thus, a system could adapt to high workloads by using less accurate variants of optimized models.” Han P. 123 Col. 2. “Recent model optimization techniques for reducing resource use in DNNs have targeted these opportunities in three main ways: . . . Architectural changes [44] explore the space of model architectures, including varying the number of layers, size of weight matrices including kernels, etc. For instance, reducing the number of layers from 19 to 11 results in a drop in accuracy of 4.1% points. A common theme across these techniques is the trading off of resource use for accuracy.” Han P. 125 Col. 2 – P. 126 Col. 1. “We formulate the problem of adaptively selecting model variants of differing accuracy in order to remain within per-request resource constraints (e.g., memory) and long-term constraints (e.g., energy) while maximizing average classification accuracy as a constrained optimization problem we call Approximate Model Scheduling (AMS).” Han P. 124 Col. 1. Read as a whole, Han indicates that the tradeoff between accuracy and resource use refers to the tradeoff between accuracy and larger, more complex models.)
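The mapped limitation may be illustrated by the following minimal sketch of a confidence-gated cascade (the threshold value, model callables, and resource check are hypothetical, supplied by the examiner for illustration only, and are not asserted to be Han's implementation):

    def classify(sample, small_model, large_model, threshold=0.8,
                 resources_available=lambda: True):
        """Run the low-complexity model first; escalate to the
        higher-complexity model only when confidence (e.g., the softmax
        top-1 score) is below the configurable threshold and the device
        has sufficient computational resources available."""
        label, confidence = small_model(sample)
        if confidence < threshold and resources_available():
            label, confidence = large_model(sample)
        return label, confidence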
Claim 10: A system for solving a machine learning (ML) problem, the system comprising: an automated machine learning (autoML) engine configured to generate and train a model set including a number of different models for the ML problem, wherein each of the different models of the model set is generated for a different input data characteristic to be specialized for a particular situation, comprising: dividing an input data set for a single ML problem type into multiple different regions of a feature space by clustering outputs of a feature detector, and applying autoML mechanisms to each of the different regions to generate the model set such that each of the divided input data sets is specialized for a particular region of the feature space; a monitoring and decision module configured to monitor input data of the ML problem and to select, based on one or more of the input data characteristics being present in the input data, one or more models of the generated model set as active models to be applied by a resource-constrained device; and the resource-constrained device configured to receive input data of the ML problem and to apply the one or more models selected by the monitoring and decision module to the received input data. (See rejection of claim 1.)
Claim 11: The system according to claim 10, wherein the resource-constrained device is an edge device. (See rejection of claim 1. (“We consider applying computer vision to video on cloud-backed mobile devices using Deep Neural Networks (DNNs). The computational demands of DNNs are high enough that, without careful resource management, such applications strain device battery, wireless data, and cloud cost budgets.” Han Abstract.))
Claim 12: The system according to claim 10, wherein the monitoring and decision module is configured to map the input data of the ML problem to at least one appropriate model of the model set. (“When applications are installed, they register with the scheduler a map from input types to a catalog of model fragments to be used to process those inputs, and handlers to be invoked with the result from each model. The catalog is stored on disk. When a new input appears, the scheduler (with help from the router and profiler) is responsible for identifying the model variants that need to be executed in response, paging if needed the appropriate model variants in from disk to the in-memory model-fragment cache in the appropriate location (i.e., on-device or on-cloud) for execution, executing the models on the input and dispatching the results to the registered handlers.” Han P. 130 Col. 1.)
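Han's registration of input types to model catalogs, and dispatch of results to handlers, may be illustrated by the following minimal sketch (the names and data structures are the examiner's, for illustration only):

    from collections import defaultdict

    # Map from input types (e.g., "face") to (catalog, handler) pairs,
    # registered when an application is installed (Han P. 130 Col. 1).
    registry = defaultdict(list)

    def register(input_type, catalog, handler):
        registry[input_type].append((catalog, handler))

    def dispatch(input_type, data, pick_variant):
        """On a new input, select a model variant from each registered
        catalog, execute it, and pass the result to the handler."""
        for catalog, handler in registry[input_type]:
            model = pick_variant(catalog)  # e.g., by resource budget
            handler(model(data))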
Claim 13: The system according to claim 10, wherein the monitoring and decision module is configured to make decisions on deactivating a currently active model of the model set and replacing the currently active model by an inactive model of the model set. (“This online scheduling problem is challenging because it combines several elements considered separately in the scheduling literature. First, it has an “online paging” element [7], in that every time an input is processed, it must reckon with the limited capacity of the model cache. If no space is available for a model that needs to be loaded, it must evict existing models and page in new ones.” Han P. 130 Col. 1.)
Claim 14: The system according to claim 10, wherein the monitoring and decision module and the trained models of the model set for the ML problem are hosted on the resource-constrained device. (Han teaches that the “MCDNN infrastructure for sharing [is] replicated in the client and cloud.” Han P. 130, description of Fig. 8. Han teaches: “Since detection and tracking computations are expected to be performed frequently (e.g., a few times a second), we expect DNNs to be applied several times to many of the video frames. We therefore view DNN execution as the bulk of modern vision computation.” Han P. 124 Col. 2. “The two main physical components are a battery-powered mobile device (typically some combination of a phone and a wearable) and powered computing infrastructure (some combination of a cloudlet and the deep cloud).” Han P. 124 Col. 2. “Offloading all data for cloud-processing using maximally accurate DNNs would thus probably only be practical in usages with a large dedicated wearable battery, lightly subscribed WiFi connection (thus limiting mobility) and substantial cloud budget. One path to reducing these resource demands is to reduce the amount of data transmitted by executing “more lightweight” DNN calculations (e.g. detection and tracking) on the device and only transmitting heavier calculations (e.g., recognition) to the cloud. Once at the cloud, reducing the overhead of (DNN) computations can support more modest cloud budgets. Now consider performing vision computations on the device. Such local computation may be necessary during the inevitable disconnections from the cloud, or just preferable in order to reduce data transmission power and compute overhead.” Han P. 124 Col. 2 – P. 125 Col. 1. The claimed hosting of the trained model on the resource-constrained device reads on carrying out DNN calculations, for example detection and tracking, on the device. Han further teaches: “When a new input appears, the scheduler (with help from the router and profiler) is responsible for identifying the model variants that need to be executed in response, paging if needed the appropriate model variants in from disk to the in-memory model-fragment cache in the appropriate location (i.e., on-device or on-cloud) for execution, executing the models on the input and dispatching the results to the registered handlers.” Han P. 130 Col. 1. See also Han Fig. 9. “At run time, inputs for classification stream to the device. For each input, the scheduler selects the appropriate variant of all registered models from their catalogs, selects a location for executing them, pages them into memory if necessary, and executes them.” Han P. 131 Col. 2. Note that the “scheduler,” part of the MCDNN infrastructure, is shown located in both the device and in the cloud in Fig. 9.)
Claim 16: The method according to claim 1, wherein the particular situation is a situation determined independently of computing resource availability of the resource-constrained device. (See rejection of claim 1. Note that the models in the series Mi taught in Han are directed to “particular situations” such as identification of aspects of a face or identification of objects. These “situations” are “determined independently of computing resource availability.”)
Claim 17: The method according to claim 16, wherein the particular situation is one of an available time to execute the one or more selected models, an expected accuracy of the one or more selected models, or a data sample arrival rate at the resource-constrained device. (See rejection of claim 1. Note that picking the model directed to the correct image (e.g. an aspect of a face or object to identify) is based on an expected accuracy of the selected model.)
Claim 18: The method according to claim 1, wherein selecting, based on one or more of the input data characteristics being present in the input data, the one or more models of the model set further comprises selecting the one or more models based on a presence of a cluster corresponding to the one or more input data characteristics the one or more models are generated for. (Han teaches: “When data flow from devices embedded in the real world, however, it is well-known that classes are heavily clustered by context. For instance you may tend to see the same 10 people 90% of the time you are at work, with a long tail of possible others seen infrequently; the objects you use in the living room are a small fraction of all those you use in your life; the places you visit while shopping at the mall are likewise a tiny fraction of all the places you may visit in daily life. With model specialization, MCDNN seeks to exploit class-clustering in contexts to derive more efficient DNN-based classifiers for those contexts. . . . The specializer, which runs in the background in the cloud, determines if a small fraction of possible classes “dominate” the CDF for a given model. If so, it adds to the catalog specialized versions of the generic variants (stored in the catalog) of the model by “re-training” them on a subset of the original data dominated by these classes. If a few classes do indeed dominate strongly, we expect even smaller models, that are not particularly accurate on the general inputs, to be quite accurate on inputs drawn from the restricted context.” Han P. 129 col. 1. This teaches that the specialized models used for various tasks in Han refer to models specialized for “clustered” data. This is also consistent with the description of clustering in paragraph 19 of the Specification. With respect to the operation of selecting the model, see rejection of claim 1.)
Claim 19 (New): The method according to claim 1, wherein applying the autoML mechanisms further comprises storing trained models of the model set, the clustered outputs of the feature detector and a mapping of the divided input data regions to the trained models of the model set in a database accessible to the resource-constrained device, and wherein the resource-constrained device is an edge device. (Han teaches “When applications are installed, they register with the scheduler a map from input types to a catalog of model fragments to be used to process those inputs, and handlers to be invoked with the result from each model. The catalog is stored on disk. When a new input appears, the scheduler (with help from the router and profiler) is responsible for identifying the model variants that need to be executed in response, paging if needed the appropriate model variants in from disk to the in-memory model-fragment cache in the appropriate location (i.e., on-device or on-cloud) for execution, executing the models on the input and dispatching the results to the registered handlers.” Han P. 130. See also Algorithm 1 lines 22 (“Calculate energy, energy/dollar and dollar budgets for executing model n on the mobile device, split between device/cloud and cloud only.”) and 44 (“Insert variant v in location l, where l ∈ {“dev”, “split”, “cloud”}”). Note that the claimed “clustered outputs of the feature detector” and the “input types” of Han both refer to the type of images to be evaluated.)
Claim 20 (New): The method according to claim 1, wherein each of the different models of the model set is generated for a data input rate, and wherein selecting one or more models of the generated model set as the active models is further based on the data input rate. (This reads on selecting the models based on the rate at which a given type of data is input (e.g. how much face data is received in a given time period). Note that where data of a given type is input, the input data rate for that type of image increases from 0 to something above 0. Han teaches “When applications are installed, they register with the scheduler a map from input types to a catalog of model fragments to be used to process those inputs, and handlers to be invoked with the result from each model. The catalog is stored on disk. When a new input appears, the scheduler (with help from the router and profiler) is responsible for identifying the model variants that need to be executed in response, paging if needed the appropriate model variants in from disk to the in-memory model-fragment cache in the appropriate location (i.e., on-device or on-cloud) for execution, executing the models on the input and dispatching the results to the registered handlers.” Han P. 130. See also Algorithm 1 line 22 (“Calculate energy, energy/dollar and dollar budgets for executing model n on the mobile device, split between device/cloud and cloud only.”))
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Han (MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints, 2016) and Kim (Dynamic Security-Level Maximization for Stabilized Parallel Deep Learning Architectures in Surveillance Applications, 2017).
Claim 8: The method according to claim 1, further comprising, performing, by the monitoring and decision module: monitoring a buffer fill status of a buffer of the resource-constrained device, and applying a buffer management strategy configured to select, for each data sample in the buffer, a model of the model set that maximizes the average accuracy under the constraint that no buffer overflows occur. (Han teaches: “To inform the resource allocator, we characterize how several common DNNs, when subjected to state-of-the art optimizations, trade off accuracy for resource use such as memory, computation, and energy.” Han Abstract.
The previously cited art does not expressly refer to buffers.
Kim teaches: “Based on this system architecture, a new dynamic control algorithm which selects one deep learning framework for time-average security-level (i.e., machine learning accuracy for recognition and classification) maximization under the consideration of system stability.” Kim Abstract. “If the deep learning framework has very simple architecture (i.e., one hidden layer with small number of nodes in the layer) [1], the recognition accuracy (a.k.a., security-level, in this paper) will be degraded whereas the computation will be fast which is good in terms of system/queue stability. Note that the queue is stable when lim t→∞ Q[t] < ∞ [2]. On the other hand, sophisticated deep learning framework takes higher recognition accuracy whereas the computation becomes slow which leads to un-stability in the queue. Therefore, there exists tradeoff between security-level and system stability. Based on this tradeoff, a dynamic control algorithm is designed which aims at time-average recognition accuracy (i.e., security-level) maximization based on system stability.” Kim P. 192 Col. 2. “Among the given deep learning frameworks, i.e., D = {D1, D2, · · · , Dn}, one deep learning framework will be selected by the “Stochastic Deep Learning Framework Selection” component that pursues time-average security-level maximization subject to system stability. Based on the theory of stochastic network optimization which aims at time-average utility optimization subject to queue stability[.]” Kim P. 193 Col. 1. “(ii) suppose that Q[t] ≈ ∞. In this case, the V · S(Di) in (1) becomes negligible. Therefore, we have to maximize Q[t] · P(Di) which means we have to process maximum number of bits from Q[t] with one selected deep learning framework Di ∈ D. Therefore, we have to select one Di ∈ D which is the fastest without any consideration of security-level.” Kim P. 193 Col. 1. One of ordinary skill in the art would understand the goal of system stability, defined as avoiding buffers being filled to infinity over time, as a way of setting up a cost function designed to avoid buffer overflow. See e.g. equation 1 on Kim P. 192.
It would have been obvious to one of ordinary skill in the art before the effective filing date, in view of Kim, to modify the teachings of the primary reference to include a constraint that no buffer overflows occur, because avoiding buffer overflow avoids losing data and avoids accessing overflowed data in larger, slower memories, which slows the system.)
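Kim's selection rule, as the examiner understands the cited equation (1), may be sketched as follows (the catalog values and tradeoff weight V are hypothetical, supplied for illustration only):

    def select_framework(frameworks, queue_backlog, V=100.0):
        """Pick the framework maximizing V*S(D) + Q[t]*P(D): accuracy
        (security-level S) dominates when the queue is short, while
        processing rate P dominates as the buffer backlog Q[t] grows
        (Kim P. 193 Col. 1)."""
        return max(frameworks,
                   key=lambda d: V * d["S"] + queue_backlog * d["P"])

    # Hypothetical catalog: accurate-but-slow versus fast-but-coarse.
    catalog = [{"name": "deep", "S": 0.95, "P": 10.0},
               {"name": "shallow", "S": 0.80, "P": 50.0}]
    print(select_framework(catalog, 0.1)["name"])     # "deep"
    print(select_framework(catalog, 1000.0)["name"])  # "shallow"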
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Han (MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints, 2016) and McCaffrey (Machine Learning with IoT Devices on the Edge, 2018).
Claim 15: The system according to claim 10, wherein the monitoring and decision module is hosted in a cloud ML system, and wherein the monitoring and decision module is configured to instruct the resource-constrained device to download and activate one or more particular models of the model set. (Han teaches: “The specializer, which runs in the background in the cloud, determines if a small fraction of possible classes “dominate” the CDF for a given model. If so, it adds to the catalog specialized versions of the generic variants (stored in the catalog) of the model by “re-training” them on a subset of the original data dominated by these classes.” Han P. 129 Col. 1. This teaches retraining models by the MCDNN system in the cloud. Figure 8 of Han shows the model fragment cache “replicated in the client and cloud,” indicating that the retrained models arrive on the client device. Further, Han teaches running of retrained models at least in part on mobile devices: “However, MCDNN’s optimizations cut this overhead dramatically. If we only seek to re-target the model (i.e., only retrain the top layer), overhead falls to tens of seconds and pre-forwarding (see Section 5.2.1) training data through lower layers yields roughly 10-second specialization times. MCDNN is thus well positioned to exploit data skew that lasts for as little as tens of seconds to minutes, a capability that we believe dramatically broadens the applicability of specialization. The benefits of sharing, when applicable, are even more striking. For scene recognition, assuming that the baseline scene recognition model (dataset S in Table 2) is running, we share all but its last layer to infer three attributes: we check if the scene is man-made or not (dataset M), whether lighting is natural or artificial (L), and whether the horizon is visible (H). Similarly, for face identification (D), we consider inferring three related attributes: age (Y), gender (G) and race (R). When sharing is feasible, the resources consumed by shared models are remarkably low: tens of kilobytes per model (cyan and purple dots in Figure 10a), roughly 100mJ of energy consumption (Figure 10b) and under 1ms of execution latency (Figure 10c), representing over 100x savings in these parameters over standard models. Shared models can very easily run on mobile devices.”
Han does not expressly state that models are sent from the cloud to a client or edge device.
McCaffrey teaches: “The image in Figure 1 shows an example of what training an ML model looks like. I used Visual Studio Code as the editor and the Python language API interface to the CNTK v2.4 library. Creating a trained ML model can take days or weeks of effort, and typically requires a lot of processing power and memory. Therefore, model training is usually performed on powerful machines, often with one or more GPUs. Additionally, as the size and complexity of a neural network increases, the number of weights and biases increases dramatically, and so the file size of a saved model also increases greatly.” McCaffrey P. 5. “So, suppose you have a trained ML model. You want to deploy the model to a small, weak, IoT device.” McCaffrey P. 6. “Why does ML need to be on the IoT edge?” McCaffrey P. 1. “Latency is often a big problem. In the smart traffic intersection example, a delay of more than a fraction of a second could have disastrous consequences. Additional problems with trying to perform ML in the cloud include reliability (a dropped network connection is typically impossible to predict and difficult to deal with), network availability (for example, a ship at sea may have connectivity only when a satellite is overhead) and privacy/security (when, for example, you’re monitoring a patient in a hospital.)” McCaffrey P. 2.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Han and McCaffrey to have the model sent to the edge device, because the power of hardware in the cloud shortens training time, while running the model on the edge device can improve speed and reliability.)
Response to Arguments
Applicant's arguments filed 10/01/2025 have been fully considered but they are not persuasive.
Rejections under § 101:
In response to claim amendments and after further consideration of the inventive concept as a whole, the rejection under this section is withdrawn. The claims are drawn to creating specialized models for use on resource-constrained devices (i.e. edge devices) and using the models on resource-constrained devices. Similar to caching, the claims are directed to a technical solution to the technical problem of edge device resource utilization. In layman's terms, the problem is that generic models use too many computing resources on small devices like cell phones. The solution is creating small models that are only good at evaluating a specific type of data (e.g. pictures of cats or pictures of faces) and applying those models to the specific types of data on resource-constrained devices.
Rejections under § 112
No specific arguments are offered in the Remarks.
Rejections under §§ 102 and 103
Applicant states that Han fails to teach a “feature detector in the first place” and reasons that this failure precludes the reference from teaching “outputs of a feature detector that are clustered.” Rem. 14. This assertion appears to be inconsistent with Han’s teaching of feature maps within a convolutional neural network used to evaluate images. See rejection of claim 1, above.
Applicant also asserts that Han fails to teach clustering, asserting that the “feature maps are an input for a DNN.” As is conventional, the feature maps of Han are part of the deep convolutional neural network. As one of ordinary skill in the art would generally understand, feature maps encode image data in different ways as the data propagates through the convolutional network. Similarly, “clustering” is a common term in this area, which one of ordinary skill in the art would understand as referring to grouping of similar inputs by a neural network (i.e. grouping similar types of images). As illustrated in Han, the last layer in the network is a scaling layer (i.e. a layer that is scaled so that all outputs sum to one), which outputs classes with associated scores. Note that outputs having similar scores of the same class are “clustered.”
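The scaling layer referenced above may be illustrated by a minimal softmax sketch (the logit values are hypothetical, for illustration only):

    import numpy as np

    def softmax(logits):
        """Rescale raw class outputs so all scores sum to one."""
        e = np.exp(logits - np.max(logits))  # subtract max for numerical stability
        return e / e.sum()

    scores = softmax(np.array([2.0, 1.0, 0.1]))
    print(scores, scores.sum())  # per-class scores summing to 1.0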
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL M KNIGHT whose telephone number is (571) 272-8646. The examiner can normally be reached Monday - Friday 9-5 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold, can be reached on (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
PAUL M. KNIGHT
Examiner
Art Unit 2148
/PAUL M KNIGHT/Examiner, Art Unit 2148