Prosecution Insights
Last updated: April 19, 2026
Application No. 17/647,628

UNKNOWN OBJECT CLASSIFICATION FOR UNSUPERVISED SCALABLE AUTO LABELLING

Non-Final OA: §103, §112
Filed: Jan 11, 2022
Examiner: AGRAWAL, SHISHIR
Art Unit: 2123
Tech Center: 2100 — Computer Architecture & Software
Assignee: DELL PRODUCTS, L.P.
OA Round: 3 (Non-Final)
Grant Probability: 0% (At Risk)
Expected OA Rounds: 3-4
Time to Grant: 3y 3m
Grant Probability with Interview: 0%

Examiner Intelligence

Career Allow Rate: 0% (grants 0% of cases; 0 granted / 13 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal lift, based on resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline; 31 currently pending)
Total Applications: 44 (career history, across all art units)

Statute-Specific Performance

§101: 26.9% (-13.1% vs TC avg)
§103: 37.6% (-2.4% vs TC avg)
§102: 5.6% (-34.4% vs TC avg)
§112: 29.9% (-10.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 13 resolved cases.

Office Action

Grounds: §103, §112
DETAILED ACTION

Status of Claims

This Office action is responsive to communications filed on 2025-09-22. Claims 1-20 are pending and are examined herein. Claims 1-20 are objected to. Claims 1-20 are rejected under 35 USC 112(b). Claims 1-20 are rejected under 35 USC 103.

Notice of Pre-AIA or AIA Status

The present application, filed on or after 2013-03-16, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 2025-09-22 has been entered.

Response to Arguments

Regarding the objections for informalities and the rejections under 35 USC 112, the applicant's amendments resolve some, but not all, of the concerns from the previous Office action, and they also introduce new concerns. Issues in the pending claims are described below.

Regarding 35 USC 101, the rejections are withdrawn upon further consideration of the claims as a whole.

Regarding 35 USC 103, the applicant's arguments have been fully considered. The applicant asserts that "Park does not disclose or suggest an autoencoder reconstruction error threshold as a trigger for identifying samples that cannot be soft labeled" [remarks, page 12]. However, this rationale is not pertinent to the pending claims as recited, because this claim element occurs in one of three limitations that are listed in the alternative, which means that only one of those three elements is required under the broadest reasonable interpretation of the claim. In other words, while Park may not disclose autoencoder reconstruction errors, they are also not presently required by the claim. If the applicant wishes to narrow the scope of the claim to necessitate autoencoder reconstruction errors, they are invited to amend the claim accordingly (e.g., by removing the other two alternatives from the claim). However, even if the applicant were to do this, the examiner notes that Oza, as cited in the conclusion of a previous Office action, already discloses reconstruction errors in classification contexts. The applicant asserts that Oza "does not tie reconstruction error to autoencoders at individual edge nodes in a distributed system" [remarks, page 12], but the examiner maintains the position suggested in the previous Office actions: namely, that Oza, in combination with other references made of record, would disclose the use of autoencoder reconstruction errors in a distributed system.

The applicant asserts that "Sapek does not teach identifying neighboring edge nodes based on communication feasibility factors such as latency, bandwidth, or cost" [remarks, page 12]. However, the applicant provides no rationale in support of this assertion. In fact, as noted in previous Office actions and again below, Sapek clearly indicates that "the selection of neighbors might also be based on a variety of other factors, such as (e.g.) physical proximity, network topology (e.g., a subset of nodes on the same local area network or within a range of IP addresses), or comparatively low network latencies among the source node 12 and the neighbors" [Sapek, 0047].
These are clearly "communication feasibility factors" of the type described by the applicant.

The applicant asserts that Park "does not teach selecting among neighboring edge nodes" [remarks, page 12]. However, the examiner maintains the position of the previous Office actions, which is reiterated in the prior art mapping given below: namely, that the disclosures of Park in view of Sapek would have made it obvious to a person of ordinary skill in the art before the effective filing date of the invention to select the candidate node from among neighboring nodes in order to ensure efficient communications.

The applicant has amended the claim in what appears to be an attempt to claim the decision-tree-style conditional logic of [specification, figure 3C] for selecting the candidate node, and appears to assert that this conditional logic is not disclosed by Park [remarks, pages 12-13]. However, the applicant is reminded that the broadest reasonable interpretation of limitations recited in the alternative requires only one of the alternatives to be satisfied, and that MPEP 2111.04(II) indicates that "[t]he broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) are not met" [emphasis added]. The claims as presently recited do not require the set of neighboring edge nodes to be either nonempty or empty; and, if the set of neighboring edge nodes is nonempty, they also do not require information on classes not known to the first edge node to be either available or unavailable. This means that, since the claim does not require meeting any of the conditions in the conditional logic of [specification, figure 3C], the broadest reasonable interpretation of the method claims also does not necessitate any of the three possible strategies for selecting the candidate node. If the applicant wishes to unambiguously claim the conditional logic of [specification, figure 3C], appropriate amendments would be required (cf. examiner's remarks).

The complete prior art mapping, with updates in view of the applicant's amendments, is given below.

Examiner's Remarks

If the applicant wishes to claim the conditional logic of the decision tree of [specification, figure 3C] for the selection of the candidate node, the examiner suggests, in view of the guidance regarding conditional limitations in MPEP 2111.04(II), that the applicant amend the claim to include first, second, and third edge nodes, where each of these three edge nodes performs essentially the same steps except that they "go down separate paths" in the decision tree in order to select their corresponding candidate nodes. This would mean, for example, introducing first, second, and third samples (received by the first, second, and third edge nodes, respectively), and first, second, and third sets of neighboring edge nodes (identified by the first, second, and third edge nodes, respectively). The first and second sets of neighboring edge nodes would be required to be nonempty, while the third set of neighboring edge nodes would be required to be empty. For the first edge node, information on classes not known to the first edge node would be required to be available, while for the second edge node, the information on classes not known to the second edge node would be required to be unavailable. There would then be a first candidate node selected from the first set of neighboring edge nodes which "has a maximum number of classes not known to the first edge node", a second candidate node selected from the second set of neighboring edge nodes whose known classes have "a maximum intersection with classes known to the [second] edge node", and a third candidate node selected "from among all nodes in the system". (Appropriate amendments to and/or cancellations of the dependent claims would be required in view of such an amendment to the independent claim. For example, the substance of dependent claims 4-5, 7, and 9-10 would likely be subsumed by such an independent claim, so those dependents could likely be cancelled.)
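As an illustration of the three-branch selection logic described in these remarks, the following is a minimal Python sketch. All names (Node, known_classes, select_candidate) are hypothetical, and both the tie-breaking rule and the central node's selection policy are left open in the record, so they are only placeholders here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical edge node: holds the set of class labels its classifier knows."""
    name: str
    known_classes: set = field(default_factory=set)

def select_candidate(first_node, neighbors, all_nodes, class_info_available):
    """Sketch of the decision tree described in the examiner's remarks.

    Branch 1: neighbors exist and information on classes not known to the
              first edge node is available -> pick the neighbor with the
              maximum number of classes not known to the first edge node.
    Branch 2: neighbors exist but that information is unavailable -> pick
              the neighbor whose known classes have the maximum intersection
              with the first edge node's known classes.
    Branch 3: the neighbor set is empty -> the central node selects a
              candidate from among all nodes in the system.
    """
    if neighbors:
        if class_info_available:
            return max(neighbors,
                       key=lambda n: len(n.known_classes - first_node.known_classes))
        return max(neighbors,
                   key=lambda n: len(n.known_classes & first_node.known_classes))
    candidates = [n for n in all_nodes if n is not first_node]
    return candidates[0] if candidates else None  # central node's policy: unspecified
```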
Claim Objections

Claims 1-20 are objected to because of the following informalities:

Claims 1 and 12 recite selecting, by the central node, a candidate node, but this should be "selecting, by the central node, the candidate node" for proper antecedent basis (since the claim previously recites "a candidate node"). Dependent claims 2-11 and 13-20 inherit the objection.

Claims 4 and 15 recite a maximum number of classes not known to the first edge node [emphasis added], but this should be "the maximum number of classes not known to the first edge node" (since the parent claim already recites "a maximum number of classes not known to the first edge node").

Claims 10 and 20 recite based on a reconstruction error exceeding the first threshold, based on a probability distribution over a set of classes that includes an unknown class, or based on an assignment of a known unknown class to the sample [emphasis added], but this should be "based on the reconstruction error exceeding the first threshold, based on the predicted probability distribution over a set of classes that includes the unknown class, or based on the assignment to the unknown class".

Appropriate correction is required.

Claim Rejections - 35 USC 112(b)

The following is a quotation of 35 USC 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 USC 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 USC 112(b) or 35 USC 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 USC 112, the applicant) regards as the invention.

Claims 1 and 12 were amended to indicate that the first edge node is selected from a set of neighboring edge nodes configured for direct communication with the first edge node [emphasis added], but the intended scope of "direct communication" is not clear. The specification does not provide a special definition clarifying the scope of the phrase. For example, it is not clear whether a node whose communication with the first edge node is mediated only by one or more network adapters and wireless routers would be included or excluded by this language. For the purpose of compact prosecution, the limitation is interpreted broadly as encompassing a situation where each of the neighboring edge nodes is connected in some fashion to the first edge node. Dependent claims 2-11 and 13-20 inherit the rejection. The examiner suggests either removing the word "direct" or amending the claim to clearly demarcate the intended scope of "direct communication".

Claims 1 and 12 recite a node that has a maximum intersection with classes known to the first edge node, but it is not clear what it means to take an intersection of a node (i.e., a computing device) with a set of classes known to the first edge node. Dependent claims 2-11 and 13-20 inherit the rejection. In view of the specification [specification, 0051], the intention behind this limitation appears to the examiner to be to refer to a node whose set of known classes intersects maximally with the set of known classes of the first edge node. In view of this, the examiner suggests "a node whose known classes have a maximum intersection with classes known to the first edge node" for clarity. For the purpose of compact prosecution, the claims are interpreted broadly herein as encompassing at least this interpretation.

Claims 6 and 17 recite communicating the sample to each of the plurality of candidate nodes, wherein each of the plurality of candidate nodes generates a plurality of soft label for the sample; aggregating the plurality of soft labels into a single soft label for the sample [emphasis added]. However, the underlined phrase is both indefinite and ungrammatical, since "plurality" conflicts with the use of the singular "soft label". The specification describes a singular result (i.e., a single soft label) being generated by each candidate node (cf. [specification, 0056, 0058]), and this is the interpretation used herein. The examiner suggests amending the claim to "communicating the sample to each of the plurality of candidate nodes, wherein each of the plurality of candidate nodes generates a candidate soft label for the sample; aggregating the candidate soft labels generated by the plurality of candidate nodes into a single soft label for the sample" for definiteness and consistency with the specification (or, alternatively, using some other word besides "candidate" to avoid repeating the nomenclature of the "soft label" already recited in the parent claim). For the purpose of compact prosecution, the claims are interpreted broadly herein as encompassing at least this interpretation.

Claim Rejections - 35 USC 103

The following is a quotation of 35 USC 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.
Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 USC 102(b)(2)(C) for any potential 35 USC 102(a)(2) prior art against the later invention.

Claims 1-10 and 12-20 are rejected under 35 USC 103 as being unpatentable over Gi-Ho PARK et al. (US20220335289A1, effectively filed 2019-06-25; hereafter, "Park") in view of Adam SAPEK (US20100306280A1, published 2010-12-02; hereafter, "Sapek") and Bruno RICHARD (US20030076786A1, published 2003-04-24; hereafter, "Richard").

Claim 1

Park discloses:

In a system including a central node associated with a plurality of nodes, ([Park, figure 1]: Park discloses a distributed computing system having a client 130, a server 120, and a plurality 110 of terminals 111-115 [Park, figure 1; see also 0029]. For the sake of concreteness, the plurality 110 of terminals 111-115 maps to the "plurality of nodes" of the claim, and the server 120 and/or client 130 maps to the "central node" of the claim.)

a method, comprising: training a model at a first edge node; ([Park, 0001, 0031, 0037, 0058]: Park discloses "a method of distributing pretrained classification models to several terminals" [Park, 0001], i.e., where "[s]ubclassification models are each stored in one of the terminals 111 to 115 in a distributed manner" [Park, 0031]. Various aspects of training these subclassification models are discussed, for example, in [Park, 0037, 0058, etc.]. Any one of the terminals is the "first edge node" of the claim, the subclassification model at that terminal is the "model at [the] first edge node" of the claim, and its training maps to the "training" step of the claim. This mapping for the "first edge node" is refined in the following parentheticals.)

identifying, by the model executing at the first edge node, a sample from a data stream received at the first edge node by the model; ([Park, 0032, 0041]: Park discloses that "[t]he subclassification models may generate classification data for input target data, for example, a target image" [Park, 0032], where "[t]he target image may be transmitted from one of the plurality of terminals 111 to 115 to another terminal [or] the target image may be provided from the server 120 to the plurality of terminals 111 to 115" [Park, 0041]. The input target data of Park (also called "target data" [Park, figure 3] or "input data" [Park, 0025]) maps to the "sample" of the claim, and the transmission by which it is received maps to the "data stream" of the claim.)

determining, by the first edge node, that the sample cannot be soft labeled at the node based on one or more of: a reconstruction error of an autoencoder at the first edge node exceeding a first threshold, a predicted probability distribution indicating a highest probability for an unknown class, or an assignment to the unknown class; ([Park, 0032-0033, 0050]: Each subclassification model classifies objects into one of a number of classes, including "a plurality of preset target classes" and, "in some embodiments", an Others class ("which [is a] class other than the target classes") [Park, 0033]. Moreover, the "classification data" generated by the subclassification models [Park, 0032] "may include confidence values for all the classes allocated to each subclassification model" [Park, 0050]. In other words, the subclassification models each perform a soft labeling of the input target data. The Others class maps to the "unknown class" of the claim. Any terminal which classifies the input target data into the Others class (e.g., by assigning the Others class the highest confidence value) then maps to the "first edge node" of the claim, since that terminal has then determined "that the sample cannot be soft labeled" based on both "a predicted probability distribution indicating a highest probability for an unknown class" and an "assignment to the unknown class" as recited by the claim. The applicant is invited to consult Oza, as cited in the conclusion of a previous Office action, regarding reconstruction errors of autoencoders.)
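For readers unfamiliar with the first of the three alternative triggers, the following is a minimal sketch (not drawn from Park or the application; all names and thresholds are illustrative) of flagging a sample as one that cannot be soft labeled:

```python
import numpy as np

def cannot_soft_label(sample, autoencoder, probs, error_threshold, unknown="others"):
    """Illustrative check of the claim's three alternative triggers:
    (1) autoencoder reconstruction error above a first threshold;
    (2) the predicted distribution peaking on the unknown class; and
    (3) assignment to the unknown class (here collapsed into (2) via argmax).
    `autoencoder` is any callable returning a reconstruction of `sample`;
    `probs` maps class names (including the unknown class) to probabilities."""
    reconstruction = autoencoder(np.asarray(sample))
    error = float(np.mean((np.asarray(sample) - np.asarray(reconstruction)) ** 2))
    predicted = max(probs, key=probs.get)  # argmax of the soft label
    return error > error_threshold or predicted == unknown
```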
determining, by the first edge node, a set of unlikely classes for the sample, the set comprising one or more known classes whose predicted probabilities are below a second threshold; ([Park, 0032-0033, 0050]: As noted above, the input target data is classified into the Others class by the subclassification model of the terminal that is mapped to the "first edge node" of the claim [Park, 0032-0033, 0050]. The confidence value assigned to the Others class maps to the "second threshold" of the claim, and all of the target classes of that subclassification model (i.e., all of the classes associated with that model besides the Others class) map to the "unlikely classes" of the claim. For example, in the embodiment depicted in [Park, figure 2], if the input target data were classified into Others by the second subclassification model 212 (so that the second terminal is the "first edge node" of the claim), then C and D would be the "unlikely classes" of the claim, since the confidence values for those two classes would be below the confidence value of the Others class, i.e., they are "one or more known classes whose predicted probabilities are below [the] second threshold" as recited by the claim. The applicant is also invited to consult [Park, 0055-0056], or Daniels or Ebihara as cited in the conclusion of a previous Office action.)
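The examiner's mapping of this step reduces to a simple thresholding rule; the following is a sketch under the assumption that the soft label is a dict and that the Others-class confidence serves as the second threshold:

```python
def unlikely_classes(probs, unknown="others"):
    """Known classes whose predicted probability falls below the confidence
    assigned to the unknown/Others class (the 'second threshold' as mapped)."""
    second_threshold = probs[unknown]
    return {c for c, p in probs.items() if c != unknown and p < second_threshold}

# Park's figure 2 mapping: the second model knows C, D, and Others.
# unlikely_classes({"C": 0.1, "D": 0.2, "others": 0.7}) == {"C", "D"}
```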
selecting, by the first edge node, a candidate node [from the set of neighboring edge nodes,] the candidate node being a node that has a maximum number of classes not known to the first edge node, or, when information on classes not known to the first edge node is unavailable, a node that has a maximum intersection with classes known to the first edge node; ([Park, 0041, figure 2]: Park discloses that the input target data "may be transmitted from one of the plurality of terminals 111 to 115 to another terminal" [Park, 0041]. The other terminal to which the data is transmitted is the "candidate node" of the claim. In one of the embodiments depicted in Park, there are 10 target labels A to J, and each of the 5 subclassification models is associated with exactly 2 of those target labels [Park, figure 2]. In other words, the "number of classes not known to the first edge node" is exactly 8 for all of the other terminals/nodes in this embodiment, so any one of the other terminals has the property of "being a node that has a maximum number of classes not known to the first edge node" as recited by the claim. The examiner notes that the claim as presently recited requires only one of the two limitations which are presently recited in the alternative. Nonetheless, the examiner notes that, in the same embodiment, the intersection of the sets of target labels of any pair of subclassification models is empty (no two subclassification models share target labels in this embodiment), so it is also true that any other terminal has the property that its set of target classes "has a maximum intersection with classes known to the first edge node" as recited by the claim.)

[when the set of neighboring edge nodes is empty, communicating the sample to the central node] and selecting, by the central node, a candidate node from among all nodes in the system; ([Park, 0040-0041]: This recites a conditional limitation: the claim does not require that the set of neighboring edge nodes be empty, so it also does not require steps which are contingent on that hypothesis. Nonetheless, the examiner notes that the "candidate node" as mapped above is selected "from among all nodes in the system" as recited by the claim. Park also indicates that the terminal to which data is sent "may be determined according to the results of monitoring terminal resources" [Park, 0042] and that "[s]uch monitoring may be performed by one of the plurality of terminals… or the server" [Park, 0040]. Since the server is mapped to the "central node" of the claim, a selection of the "candidate node" on the basis of monitoring terminal resources by the server falls under the broadest reasonable interpretation of a selection "by the central node" as recited by the claim. The applicant is also invited to consult the combination with Richard as described below.)

communicating the sample to the candidate node; ([Park, 0041]: As noted above, Park discloses that the input target data "may be transmitted from one of the plurality of terminals 111 to 115 to another terminal" [Park, 0041]. The terminal to which the data is transmitted/communicated is the "candidate node" of the claim.)

and receiving, from the candidate node, a soft label for the sample, wherein the soft label comprises a probability distribution generated by a classifier at the candidate node over a set of known classes of the candidate node. ([Park, 0042, 0050]: Every terminal in Park has a subclassification model with a set of target classes. In particular, considering the terminal that has been mapped to the "candidate node" as described above, the subclassification model at that terminal maps to the "classifier at the candidate node" of the claim and its target classes map to the "set of known classes of the candidate node" of the claim. As noted above, the classification data generated by that terminal includes confidence values for each of the target classes associated with that model [Park, 0050], so the classification data generated by the "candidate node" as mapped above maps to the "soft label for the sample, wherein the soft label comprises a probability distribution generated by [the] classifier at the candidate node over [the] set of known classes of the candidate node" as recited by the claim. Moreover, Park discloses that "[o]ne of the plurality of terminals 111 to 115 may receive classification data generated by another terminal" [Park, 0042]. This receiving step maps to the "receiving" step of the claim.)
Park might not distinctly disclose: identifying, by the first edge node, a set of neighboring edge nodes configured for direct communication with the first edge node based on one or more of latency, bandwidth, or transmission cost; [selecting, by the first edge node, a candidate node] from the set of neighboring edge nodes, when the set of neighboring edge nodes is empty, communicating the sample to the central node

Sapek is in the field of distributed computing. Moreover, Park in view of Sapek discloses:

identifying, by the first edge node, a set of neighboring edge nodes configured for direct communication with the first edge node based on one or more of latency, bandwidth, or transmission cost; [selecting, by the first edge node, a candidate node] from the set of neighboring edge nodes, ([Sapek, 0042, 0047, and figure 10]: Sapek discloses "selection, by a node, of a set of neighbors", where the selection is "based on a variety of other factors, such as (e.g.) physical proximity, network topology (e.g., a subset of nodes on the same local area network or within a range of IP addresses), or comparatively low network latencies among the source node 12 and the neighbors" [Sapek, 0047]. It further discloses forming a "swarm network", which is a subset of the neighbor set [Sapek, figure 10 and 0042], to which data is transmitted (cf. "send the updated object 14 to the swarm nodes" [Sapek, 0042]). The factors described in Sapek for selecting the neighbor set fall under the broadest reasonable interpretation of "one or more of latency, bandwidth, or transmission cost" as recited by the claim. In the combination, any one of the swarm nodes is taken to be the candidate node as disclosed by Park, so that the candidate node is in fact "select[ed]… from the set of neighboring edge nodes" as recited by the claim.)

when the set of neighboring edge nodes is empty, ([Sapek, 0047]: Sapek mentions a "neighbor set comprising any number of neighbors" and more specifically discusses certain actions to be taken in case of an "empty neighbor set" [Sapek, 0047].)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the distributed classification system of Park with the use of neighbor sets as in Sapek because it "may help avoid… network congestion and inefficient synchronization" [Sapek, 0047], so the combination would be more efficient overall.
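A sketch of neighbor-set identification from communication-feasibility factors of the kind Sapek lists follows; the metrics and cutoff values are hypothetical, not taken from Sapek or the application:

```python
def identify_neighbors(links, max_latency_ms=50.0, min_bandwidth_mbps=10.0, max_cost=1.0):
    """Filter peers into a neighbor set using latency, bandwidth, and
    transmission-cost cutoffs. `links` is an iterable of
    (peer, latency_ms, bandwidth_mbps, cost) tuples; cutoffs are illustrative."""
    return [peer for peer, latency_ms, bandwidth_mbps, cost in links
            if latency_ms <= max_latency_ms
            and bandwidth_mbps >= min_bandwidth_mbps
            and cost <= max_cost]
```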
The claim does not require the set of neighboring edge nodes to be empty, and therefore also does not require any actions which are contingent on this hypothesis. Nonetheless, it may be argued that Park in view of Sapek might not distinctly disclose: [when the set of neighboring edge nodes is empty,] communicating the sample to the central node

Richard is in the field of distributed computing. Moreover, Park in view of Sapek and Richard discloses:

[when the set of neighboring edge nodes is empty,] communicating the sample to the central node ([Richard, 0003]: Richard discloses a system in which, when a "device 101a wishes to distribute data to other devices in the network, e.g. devices 101b to 101n, the device 101a sends the data to the central server 102. The central server knows which devices are connected to it, and therefore is able to distribute the data as appropriate to the devices 101b to 101n" [Richard, 0003]. Sending data to the central server maps to the step of "communicating the sample to the central node" of the claim. The central server distributing data to an appropriate device corresponds to the selection of, and communication to, the candidate node of the claim as mapped from Park. In the combination, these actions may be taken when the neighbor set of Sapek is empty.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the distributed classification system of Park in view of Sapek with the process of routing data through the central server as described in Richard because it would ensure that the system is able to transmit data even from nodes that have no neighbors, making the system more robust overall.

Claim 2

Park in view of Sapek and Richard discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, wherein] determining unlikely classes includes, when the first edge node includes an open set model, including all classes known to the first edge node in the unlikely classes. ([Park, 0033]: The broadest reasonable interpretation of this claim does not require that the "first edge node" actually include "an open set model", which means that it also does not require the set of unlikely classes to have any additional properties. In other words, the mappings given under the parent claim already satisfy the broadest reasonable interpretation of the contingent limitation of this dependent claim. Nonetheless, as noted under the parent claim, Park discloses that their subclassification models may, "in some embodiments", include an Others class "which [is a] class other than the target classes" [Park, 0033]. A subclassification model including an Others class maps to the "open set model" of the claim. Moreover, the mappings described under the parent claim include a mapping in which all of the target classes of a subclassification model (i.e., all of the classes allocated to the model except for the Others class) are mapped to the "unlikely classes" of the claim, so the target classes map to "all classes known to the [first edge] node" of the claim.) The same motivation to combine applies.

Claim 3

Park in view of Sapek and Richard discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, wherein] determining unlikely classes includes, when the model executing at the first edge node does not include an open set model, including in the set of unlikely classes those classes known to the first edge node whose predicted probabilities are lower than the second threshold. ([Park, 0033]: The broadest reasonable interpretation of this claim does not require the "model executing at the first edge node" to actually lack an "open set model", which means that it also does not require the set of unlikely classes to have any particular properties. In other words, the mappings given under the parent claim already satisfy the broadest reasonable interpretation of the contingent limitation of this dependent claim. Nonetheless, as noted under the parent claim, Park discloses that their subclassification models may, "in some embodiments", include an Others class "which [is a] class other than the target classes" [Park, 0033]. Since the Others class need only occur in some embodiments, the subclassification models need not include an Others class and so need not be an "open set model" as recited by the claim.
Moreover, the mappings described under the parent claim include a mapping wherein the unlikely classes are those whose predicted probabilities are lower than the second threshold.) The same motivation to combine applies.

Claim 4

Park in view of Sapek and Richard discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, further comprising] selecting, as the candidate node, a neighboring edge node that has a maximum number of classes not known to the first edge node. ([Park, 0041 and figure 2; Sapek, 0047]: As noted under the parent claim, Park in view of Sapek and Richard discloses selecting, as the candidate node, a neighboring node having the property that the number of its target classes not known to the first edge node is a maximum.) The same motivation to combine applies.

Claim 5

Park in view of Sapek and Richard discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, further comprising] selecting the candidate node such that an intersection of classes known to the candidate node and classes known to the first edge node is maximized. ([Park, 0041 and figure 2]: As noted under the parent claim, Park in view of Sapek and Richard discloses selecting, as the candidate node, a neighboring node having the property that the number of its target classes which are also known to the first edge node is a maximum.) The same motivation to combine applies.
Claim 6

Park in view of Sapek and Richard discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, further comprising:] selecting a plurality of candidate nodes; communicating the sample to each of the plurality of candidate nodes, wherein each of the plurality of candidate nodes generates a plurality of soft label for the sample; ([Park, figure 1; Sapek, 0042 and figure 10]: As noted under the parent claim, Park discloses input target data being transmitted to each of the plurality of terminals [Park, 0041 and figure 1], where each terminal stores a subclassification model that generates classification data [Park, 0031-0032], including confidence values associated with all of the (target) classes of the subclassification model [Park, 0050]. The plurality of terminals maps to the "plurality of candidate nodes" of the claim, transmitting data to the terminals is the "communicating" step of the claim, and any class together with a corresponding confidence value produced by that subclassification model maps to the "soft label" of the claim. Alternatively, in the combination with Sapek, the swarm nodes of Sapek (i.e., the subset of the neighbor set) [Sapek, figure 10 and 0042] could be taken to be the "plurality of candidate nodes" of the claim.)

aggregating the plurality of soft labels into a single soft label for the sample before communicating the single soft label to the central node; and communicating the single soft label to the central node. ([Park, 0043-0044, 0052-0055]: These limitations can be mapped in at least two ways. First, as noted under the parent claim, Park discloses several strategies for processing the classification data produced by the subclassification models in order to obtain a "final class" alongside its associated confidence value [Park, 0052-0055], after which the "determined final class is provided to the client" [Park, 0043]. In this case, the final class maps to the "single soft label" of the claim, and any of the strategies used to produce it can map to the "aggregating" step. Second, Park discloses embodiments where a target class is shared across multiple subclassification models (e.g., class B in [Park, figure 4]), and where the confidence value associated with such a shared class may be taken to be an average, or a maximum, across the confidence values assigned to it by the associated subclassification models [Park, 0054]. The resulting classification data is then transmitted to either a terminal or a server for determining the final class [Park, 0044]. In this case, the shared class together with its aggregated confidence value maps to the "single soft label" of the claim.) The same motivation to combine applies.
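A sketch of the aggregation step follows, under the assumption that each candidate returns its soft label as a dict of class to probability; both strategies Park mentions for shared classes (averaging and taking the maximum) are shown:

```python
def aggregate_soft_labels(soft_labels, strategy="mean"):
    """Combine per-candidate soft labels (dicts of class -> probability) into
    a single soft label. A class missing from some candidates' outputs is
    aggregated only over the candidates that actually scored that class."""
    combined = {}
    for label in soft_labels:
        for cls, p in label.items():
            combined.setdefault(cls, []).append(p)
    if strategy == "max":
        return {cls: max(ps) for cls, ps in combined.items()}
    return {cls: sum(ps) / len(ps) for cls, ps in combined.items()}

# Example: two candidates both score class "B" (cf. Park's shared class):
# aggregate_soft_labels([{"A": 0.6, "B": 0.4}, {"B": 0.8, "C": 0.2}])
# -> {"A": 0.6, "B": 0.6, "C": 0.2}
```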
Claim 7

Park in view of Sapek and Richard discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, further comprising] selecting the candidate node from the set of neighboring edge nodes. ([Park, 0041; Sapek, 0047]: As noted under the parent claim, Park discloses selecting a candidate node, Sapek discloses constructing a neighbor set and choosing from it, and the combination discloses the selection of the candidate node from the neighbor set.) The same motivation to combine applies.

Claim 8

Park in view of Sapek and Richard discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, further comprising] communicating the soft label generated by the candidate node to the central node. ([Park, 0042-0043]: As noted under the parent claim, the server and/or client maps to the "central node" of the claim. Park discloses that "the server 120 may receive the classification data" that is generated at a terminal by a subclassification model [Park, 0042] and also that the "determined final class is provided to the client" [Park, 0043].) The same motivation to combine applies.

Claim 9

Park in view of Sapek and Richard discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, further comprising,] when the set of neighboring edge nodes is empty, selecting the candidate node by the central node from among all nodes. ([Sapek, 0047; Park, 0040-0041]: This limitation repeats a limitation already substantively incorporated into the parent claim, and it is disclosed in the same way as noted above. Sapek discloses constructing a neighbor set, including an empty neighbor set [Sapek, 0047], and Park discloses selecting the candidate node from among all nodes [Park, 0040-0041].) The same motivation to combine applies.

Claim 10

Park in view of Sapek and Richard discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, wherein] the sample is identified as a sample that cannot be soft labeled based on a reconstruction error exceeding the first threshold, based on a probabilistic distribution over a set of classes that includes an unknown class, or based on an assignment of a known unknown class to the sample. ([Park, 0032-0033, 0050]: This limitation repeats a limitation already substantively incorporated into the parent claim, and it is disclosed in the same way as noted above. In Park, the sample is determined/identified as a sample that cannot be soft labeled based on at least two of the three criteria recited in the claim. The applicant is invited to consult Oza, as cited in the conclusion of a previous Office action, regarding reconstruction error. The examiner notes that the Others class of Park falls under the broadest reasonable interpretation of a "known unknown class": it is an "unknown class" in the sense that it is a catch-all for all non-target classes of a subclassification model, but it is also "known" by the subclassification model.) The same motivation to combine applies.

Claim 12

Park discloses:

In a system including a central node associated with a plurality of nodes, ([Park, figure 1]: Park discloses a distributed computing system having a client 130, a server 120, and a plurality 110 of terminals 111-115 [Park, figure 1; see also 0029]. For the sake of concreteness, the plurality 110 of terminals 111-115 maps to the "plurality of nodes" of the claim, and the server 120 and/or client 130 maps to the "central node" of the claim.)

a non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising: ([Park, 0026, 0074]: Park discloses that the methods disclosed therein "may be performed in a computing device including a processor and a memory" and "may be implemented in the form of a program command executable by various computing means and recorded on a computer readable recording medium" [Park, 0074].)

training a model at a first edge node; ([Park, 0001, 0031, 0037, 0058]: Park discloses "a method of distributing pretrained classification models to several terminals" [Park, 0001], i.e., where "[s]ubclassification models are each stored in one of the terminals 111 to 115 in a distributed manner" [Park, 0031]. Various aspects of training these subclassification models are discussed, for example, in [Park, 0037, 0058, etc.]. Any one of the terminals is the "first edge node" of the claim, the subclassification model at that terminal is the "model at [the] first edge node" of the claim, and its training maps to the "training" step of the claim. This mapping for the "first edge node" is refined in the following parentheticals.)

identifying, by the model executing at the first edge node, a sample from a data stream received at the first edge node by the model; ([Park, 0032, 0041]: Park discloses that "[t]he subclassification models may generate classification data for input target data, for example, a target image" [Park, 0032], where "[t]he target image may be transmitted from one of the plurality of terminals 111 to 115 to another terminal [or] the target image may be provided from the server 120 to the plurality of terminals 111 to 115" [Park, 0041]. The input target data of Park (also called "target data" [Park, figure 3] or "input data" [Park, 0025]) maps to the "sample" of the claim, and the transmission by which it is received maps to the "data stream" of the claim.)

determining, by the first edge node, that the sample cannot be soft labeled at the node based on one or more of: a reconstruction error of an autoencoder at the first edge node exceeding a first threshold, a predicted probability distribution indicating a highest probability for an unknown class, or an assignment to the unknown class; ([Park, 0032-0033, 0050]: Each subclassification model classifies objects into one of a number of classes, including "a plurality of preset target classes" and, "in some embodiments", an Others class ("which [is a] class other than the target classes") [Park, 0033].
Moreover, the "classification data" generated by the subclassification models [Park, 0032] "may include confidence values for all the classes allocated to each subclassification model" [Park, 0050]. In other words, the subclassification models each perform a soft labeling of the input target data. The Others class maps to the "unknown class" of the claim. Any terminal which classifies the input target data into the Others class (e.g., by assigning the Others class the highest confidence value) then maps to the "first edge node" of the claim, since that terminal has then determined "that the sample cannot be soft labeled" based on both "a predicted probability distribution indicating a highest probability for an unknown class" and an "assignment to the unknown class" as recited by the claim. The applicant is invited to consult Oza, as cited in the conclusion of a previous Office action, regarding reconstruction errors of autoencoders.)

determining, by the first edge node, a set of unlikely classes for the sample, the set comprising one or more known classes whose predicted probabilities are below a second threshold; ([Park, 0032-0033, 0050]: As noted above, the input target data is classified into the Others class by the subclassification model of the terminal that is mapped to the "first edge node" of the claim [Park, 0032-0033, 0050]. The confidence value assigned to the Others class maps to the "second threshold" of the claim, and all of the target classes of that subclassification model (i.e., all of the classes associated with that model besides the Others class) map to the "unlikely classes" of the claim. For example, in the embodiment depicted in [Park, figure 2], if the input target data were classified into Others by the second subclassification model 212 (so that the second terminal is the "first edge node" of the claim), then C and D would be the "unlikely classes" of the claim, since the confidence values for those two classes would be below the confidence value of the Others class, i.e., they are "one or more known classes whose predicted probabilities are below [the] second threshold" as recited by the claim. The applicant is also invited to consult [Park, 0055-0056], or Daniels or Ebihara as cited in the conclusion of a previous Office action.)

selecting, by the first edge node, a candidate node [from the set of neighboring edge nodes,] the candidate node being a node that has a maximum number of classes not known to the first edge node, or, when information on classes not known to the first edge node is unavailable, a node that has a maximum intersection with classes known to the first edge node; ([Park, 0041, figure 2]: Park discloses that the input target data "may be transmitted from one of the plurality of terminals 111 to 115 to another terminal" [Park, 0041]. The other terminal to which the data is transmitted is the "candidate node" of the claim. In one of the embodiments depicted in Park, there are 10 target labels A to J, and each of the 5 subclassification models is associated with exactly 2 of those target labels [Park, figure 2]. In other words, the "number of classes not known to the first edge node" is exactly 8 for all of the other terminals/nodes in this embodiment, so any one of the other terminals has the property of "being a node that has a maximum number of classes not known to the first edge node" as recited by the claim. The examiner notes that the claim as presently recited requires only one of the two limitations which are presently recited in the alternative. Nonetheless, the examiner notes that, in the same embodiment, the intersection of the sets of target labels of any pair of subclassification models is empty (no two subclassification models share target labels in this embodiment), so it is also true that any other terminal has the property that its set of target classes "has a maximum intersection with classes known to the first edge node" as recited by the claim.)

[when the set of neighboring edge nodes is empty, communicating the sample to the central node] and selecting, by the central node, a candidate node from among all nodes in the system; ([Park, 0040-0041]: This recites a conditional limitation: the claim does not require that the set of neighboring edge nodes be empty, so it also does not require steps which are contingent on that hypothesis. Nonetheless, the examiner notes that the "candidate node" as mapped above is selected "from among all nodes in the system" as recited by the claim. Park also indicates that the terminal to which data is sent "may be determined according to the results of monitoring terminal resources" [Park, 0042] and that "[s]uch monitoring may be performed by one of the plurality of terminals… or the server" [Park, 0040]. Since the server is mapped to the "central node" of the claim, a selection of the "candidate node" on the basis of monitoring terminal resources by the server falls under the broadest reasonable interpretation of a selection "by the central node" as recited by the claim. The applicant is also invited to consult the combination with Richard as described below.)

communicating the sample to the candidate node; ([Park, 0041]: As noted above, Park discloses that the input target data "may be transmitted from one of the plurality of terminals 111 to 115 to another terminal" [Park, 0041]. The terminal to which the data is transmitted/communicated is the "candidate node" of the claim.)

and receiving, from the candidate node, a soft label for the sample, wherein the soft label comprises a probability distribution generated by a classifier at the candidate node over a set of known classes of the candidate node. ([Park, 0042, 0050]: Every terminal in Park has a subclassification model with a set of target classes. In particular, considering the terminal that has been mapped to the "candidate node" as described above, the subclassification model at that terminal maps to the "classifier at the candidate node" of the claim and its target classes map to the "set of known classes of the candidate node" of the claim. As noted above, the classification data generated by that terminal includes confidence values for each of the target classes associated with that model [Park, 0050], so the classification data generated by the "candidate node" as mapped above maps to the "soft label for the sample, wherein the soft label comprises a probability distribution generated by [the] classifier at the candidate node over [the] set of known classes of the candidate node" as recited by the claim. Moreover, Park discloses that "[o]ne of the plurality of terminals 111 to 115 may receive classification data generated by another terminal" [Park, 0042]. This receiving step maps to the "receiving" step of the claim.)
Park might not distinctly disclose: identifying, by the first edge node, a set of neighboring edge nodes configured for direct communication with the first edge node based on one or more of latency, bandwidth, or transmission cost; [selecting, by the first edge node, a candidate node] from the set of neighboring edge nodes, when the set of neighboring edge nodes is empty, communicating the sample to the central node

Sapek is in the field of distributed computing. Moreover, Park in view of Sapek discloses:

identifying, by the first edge node, a set of neighboring edge nodes configured for direct communication with the first edge node based on one or more of latency, bandwidth, or transmission cost; [selecting, by the first edge node, a candidate node] from the set of neighboring edge nodes, ([Sapek, 0042, 0047, and figure 10]: Sapek discloses "selection, by a node, of a set of neighbors", where the selection is "based on a variety of other factors, such as (e.g.) physical proximity, network topology (e.g., a subset of nodes on the same local area network or within a range of IP addresses), or comparatively low network latencies among the source node 12 and the neighbors" [Sapek, 0047]. It further discloses forming a "swarm network", which is a subset of the neighbor set [Sapek, figure 10 and 0042], to which data is transmitted (cf. "send the updated object 14 to the swarm nodes" [Sapek, 0042]). The factors described in Sapek for selecting the neighbor set fall under the broadest reasonable interpretation of "one or more of latency, bandwidth, or transmission cost" as recited by the claim. In the combination, any one of the swarm nodes is taken to be the candidate node as disclosed by Park, so that the candidate node is in fact "select[ed]… from the set of neighboring edge nodes" as recited by the claim.)

when the set of neighboring edge nodes is empty, ([Sapek, 0047]: Sapek mentions a "neighbor set comprising any number of neighbors" and more specifically discusses certain actions to be taken in case of an "empty neighbor set" [Sapek, 0047].)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the distributed classification system of Park with the use of neighbor sets as in Sapek because it "may help avoid… network congestion and inefficient synchronization" [Sapek, 0047], so the combination would be more efficient overall.

The claim does not require the set of neighboring edge nodes to be empty, and therefore also does not require any actions which are contingent on this hypothesis. Nonetheless, it may be argued that Park in view of Sapek might not distinctly disclose: [when the set of neighboring edge nodes is empty,] communicating the sample to the central node

Richard is in the field of distributed computing. Moreover, Park in view of Sapek and Richard discloses:

[when the set of neighboring edge nodes is empty,] communicating the sample to the central node ([Richard, 0003]: Richard discloses a system in which, when a "device 101a wishes to distribute data to other devices in the network, e.g. devices 101b to 101n, the device 101a sends the data to the central server 102. The central server knows which devices are connected to it, and therefore is able to distribute the data as appropriate to the devices 101b to 101n" [Richard, 0003]. Sending data to the central server maps to the step of "communicating the sample to the central node" of the claim. The central server distributing data to an appropriate device corresponds to the selection of, and communication to, the candidate node of the claim as mapped from Park. In the combination, these actions may be taken when the neighbor set of Sapek is empty.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the distributed classification system of Park in view of Sapek with the process of routing data through the central server as described in Richard because it would ensure that the system is able to transmit data even from nodes that have no neighbors, making the system more robust overall.

Claims 13-20 inherit limitations from claim 12 and recite additional limitations which are substantially similar to those recited by claims 2-10 (with claim 19 reciting limitations found in claims 8-9), so they are rejected by the same rationale.

Claim 11 is rejected under 35 USC 103 as being unpatentable over Park in view of Sapek and Richard, further in view of Kevin ASH et al. (US20170109226A1, published 2017-04-20; hereafter, "Ash").

Claim 11

Park in view of Sapek and Richard discloses the elements of the parent claim(s). It does not distinctly disclose: [The method of claim 1, further comprising] marking the sample for manual labelling after a threshold number of attempts have been attempted to determine the soft label.

Ash is in the field of computing. Moreover, Park in view of Sapek, Richard, and Ash discloses:

[The method of claim 1, further comprising] marking the sample for manual labelling after a threshold number of attempts have been attempted to determine the soft label. ([Ash, abstract]: Ash discloses a system in which, "in response to a failure beyond a threshold number of times…, an error notification is transmitted for manual intervention" [Ash, abstract]. In the combination, the classification procedure of Park is iterated, the threshold number of failures of Ash maps to the "threshold number of attempts" of the claim, and transmitting an error notification for manual intervention as in Ash maps to "marking the sample for manual labeling" as in the claim.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the classification system of Park in view of Sapek and Richard with the idea of flagging problematic items for manual intervention as in Ash because it would avoid expending computing resources on tasks at which the system repeatedly fails.
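A minimal sketch of this combination follows (hypothetical interface; neither Ash's nor the applicant's implementation): iterate soft-labeling attempts across candidate nodes and fall back to manual labeling once a threshold number of attempts fails:

```python
def soft_label_with_fallback(sample, candidate_nodes, max_attempts=3):
    """Try up to `max_attempts` candidate nodes for a soft label; after the
    threshold number of failed attempts, mark the sample for manual labeling
    (the role played by Ash's error notification in the combination).
    Each node is assumed to expose soft_label(sample), returning a dict of
    class -> probability, or None on failure."""
    for attempt, node in enumerate(candidate_nodes):
        if attempt >= max_attempts:
            break
        label = node.soft_label(sample)
        if label is not None:
            return label
    return {"_status": "manual_labeling_required"}  # flag for human review
```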
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Surin AHN et al. (Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts, published 2020-12-03; hereafter, "Ahn") discloses performing multiclass classification using "[h]eterogeneous [l]ocal [e]xperts" [Ahn, title], i.e., "a set of m classifiers F_{m,K} = {f_i : X → Y_i, i ∈ [m]} with Y_i ⊆ [K] = {1, 2, …, K} and 2 ≤ |Y_i| ≤ K for all i ∈ [m]. Each possible input x in X belongs to one of the classes in [K]. We assume that each classifier f_i was trained to distinguish only a subset of the classes in [K], i.e., those contained in Y_i. Therefore, given any input x in X, each f_i necessarily outputs a class in Y_i" [Ahn, section II, first paragraph]. If each model f_i is distributed to a different node in a distributed system (as, e.g., in Park), the corresponding subset of labels Y_i would then correspond to the set of "classes known to [the] node" of the claims.

Jin-Hyuk HONG et al. (Fingerprint classification using one-vs-all support vector machines dynamically ordered with naïve Bayes classifiers, published 2008; hereafter, "Hong") discloses performing multiclass classification using one-vs-all (OVA) decomposition where each of the subclassifiers is "dynamically ordered with a naïve Bayes classifier" [Hong, abstract]. OVA decomposition means that the set of "classes known to [each] node", in the language of the claims, is just a single class, and the dynamic ordering means that each subsequent node (i.e., each "candidate node" in the language of the claims) is selected in a systematic way using the naïve Bayes classifier.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Shishir AGRAWAL, whose telephone number is +1 703-756-1183. The examiner can normally be reached Monday through Thursday, 08:30-14:30 Pacific Time.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Alexey SHMATOV, can be reached at +1 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is +1 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at +1 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call +1 800-786-9199 (in USA or Canada) or +1 571-272-1000.

/S.A./
Examiner, Art Unit 2123

/ALEXEY SHMATOV/
Supervisory Patent Examiner, Art Unit 2123

Prosecution Timeline

Jan 11, 2022 — Application Filed
Mar 18, 2025 — Non-Final Rejection (§103, §112)
Jun 18, 2025 — Response Filed
Jul 21, 2025 — Final Rejection (§103, §112)
Sep 22, 2025 — Examiner Interview Summary
Sep 22, 2025 — Applicant Interview (Telephonic)
Sep 22, 2025 — Request for Continued Examination
Oct 02, 2025 — Response after Non-Final Action
Jan 26, 2026 — Non-Final Rejection (§103, §112) (current)


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 0%
With Interview: 0% (+0.0%)
Median Time to Grant: 3y 3m
PTA Risk: High
Based on 13 resolved cases by this examiner. Grant probability derived from career allow rate.
