Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Remarks
This Office Action is responsive to Applicants' Amendment filed 10/14/2025.
Claims 1-4, 8-14 and 17 are pending.
Response to Arguments
The rejections of the claims under 35 U.S.C. § 101 are hereby withdrawn, as necessitated by applicant's amendments and remarks directed to those rejections.
The rejections of claims 1 and 2 under 35 U.S.C. § 102 are withdrawn as necessitated by applicant's amendment. Amended claim 1 includes the limitations from cancelled claims 5-7. An updated § 103 rejection is set forth in the Office Action below.
In response to applicant’s argument on page 10 that Park “fails to disclose that the calibrated table is corrected at runtime by reflecting the real-time current consumption quantity even after the table information has been recorded, to prevent the real-time consumption quantity of a resource from exceeding a resource limit,” and in particular applicant's argument that the table disclosed by Park is ‘offline’ rather than dynamically generated, the argument has been carefully considered, but the examiner respectfully disagrees.
Park, Fig. 5, discloses a table of metadata recording consumption quantities of energy consumed (e.g., a resource), and Fig. 6 discloses how the table is dynamically generated. Park, Figs. 7A-B and cols. 8-9, further discloses how the table is used to calibrate and update the NN layer allocation based on thermal energy dissipation thresholds. “Information identifying any processors having temperatures exceeding the first threshold may be added to a first list, as indicated by block 708. The measured temperature associated with each SoC processor also may be compared with a second threshold (block 706) representing a temperature below which undesirable effects of excess thermal energy dissipation are estimated to be less likely to occur. Information identifying any processors having temperatures below the second threshold may be added to a second list, as indicated by block 710.” This discloses two lists of processors based on thermal energy dissipation. More importantly, Park further discloses “the method 700 may dynamically re-allocate the executing neural network layers or other neural network units among the various SoC processors in a manner that tends to maintain the processors at temperatures between the first (higher) and second (lower) thresholds. ... (40) As indicated by the loop between blocks 712 and 714, the method 700 may include iteratively performing the above-described comparisons and categorizations (blocks 704-710), selecting the next processor (block 714) and determining if there are more processors that have not yet been selected (block 712). ... (44) the method 700 may include iteratively performing the steps described above with regard to blocks 718-722, selecting the next processor (if any) in the first list (block 716) and determining if there are more processors in the first list that have not yet been selected (block 724), until all processors in the first list have been the subject of the steps described above with regard to blocks 718-722.
… The method 700 may be repeated at time intervals on a continuous basis while the neural networks are operating.” Calibration is applying a gain or offset to the predicted metric according to the current metric. The actual list of processors being added or relocated corresponds to a “generated calibrated table”: it is generated by the circuit by looking at the resource table, such as temperature, and determining whether to maintain or relocate each processor based on the thresholds recorded in the metadata table (e.g., the table of consumption quantities of the resource).
Therefore, Park, Figs. 5, 6, and 7A-B, and the related disclosure teach a process to dynamically generate a list of processors for re-allocating resources based on energy consumption (a calibrated table).
The following references are relied upon for the § 103 rejections set forth below:
Ryu (US 20200394504 A1)
Park (US 1149238 B2)
Bleiweiss (US12086705 B2)
Hassantabar (US 20220036150 A1)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 4, and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Ryu in view of Park.
Regarding claim 1, Ryu teaches “An artificial intelligence semiconductor processor comprising: a neural network computational accelerator configured to implement a neural network based on neural network configuration information;” (Paragraph 0002, “Embodiments of the inventive concept described herein relate to a semiconductor device, and more particularly, relate to a precision scalable neural network accelerator.” Paragraph 0029, “A neural network accelerator 1000 may process input feature data IF based on the neural network to generate output feature data OF. For example, the neural network accelerator 1000 may process the input feature data IF based on convolutional neural network (CNN). However, the inventive concept is not limited thereto. For example, the neural network accelerator 1000 may use various neural network algorithms.” Ryu teaches that the neural network accelerator will implement feature data based on the neural network and neural network algorithms that can affect the neural network configuration information). Ryu teaches “and a control circuit configured to adjust precision of the neural network configuration information based on device information, wherein the control circuit adjusts the precision of the neural network configuration information such that neural network processing is performed by using a resource within a resource limit” (Figure 1, the processing circuit is a part of the neural network accelerator. Paragraph 0033, "According to embodiments of the inventive concept, the neural network accelerator 1000 may perform calculations based on data precision scalable depending on the required accuracy. In particular, even though the number of bits of the input feature data IF and the weight data WT varies depending on the required accuracy, the neural network accelerator 1000 may perform calculations based on the input feature data IF and the weight data WT, which have various numbers of bits." 
Paragraph 0073, "Accordingly, as well as the hardware area of the neural network accelerator 1000 including a plurality of bit operators, the power required for calculation of the neural network accelerator 1000 may be reduced." The present application defines precision as adjusting the sparsity ratio, or the number of bits. Ryu teaches that the control circuit will perform calculations on the bits based on the device configuration, such as weight and feature data and the required accuracy. Ryu also teaches that power is a resource where its usage can be reduced to stay within a limit).
However, Ryu does not disclose the following limitations:
wherein the device information includes a table of consumption quantities of the resource according to the precision of the neural network configuration information and a current consumption quantity of the resource.
wherein the control circuit generates a calibrated table by calibrating the consumption quantities of the resource in the table, based on the current consumption quantity of the resource.
Park though teaches the artificial intelligence semiconductor processor of claim 1, wherein the device information includes a table of consumption quantities of the resource according to the precision of the neural network configuration information (Figure 5 shows a file that includes the neural network configuration and the metadata associated with that configuration. The metadata is shown in the form of a table 506. Col. 6, lines 50-53, "Such a process of executing a neural network unit, measuring the power consumption, and storing it in the table 506 may be repeated for each processor until the neural network unit has been executed on each processor.” Col. 7, lines 64-67, "The method 600 may begin with obtaining a plurality of measurements that characterize an aspect of the operation of each of the corresponding plurality of SoC processors, as indicated by block 602." Park teaches that the table is updated until each neural network unit has been executed on each processor. Figure 6 shows the process for work reallocation, with the first step (block 602) gathering the current measurements into the table. Thus, the table will be up to date and have the current consumption quantities when the method 600 begins.)
Park further teaches wherein the control circuit generates a calibrated table by calibrating the consumption quantities of the resource of the table, based on the current consumption quantity of the resource (Col. 9, lines 3-8, "In such an example, involving two thresholds, the method 700 may dynamically re-allocate the executing neural network layers or other neural network units among the various SoC processors in a manner that tends to maintain the processors at temperatures between the first (higher) and second (lower) thresholds" In the present application, calibrating is defined as applying a gain or offset to the predicted metric according to the current metric. Park teaches that the processor, which includes a circuit, looks at the resource, such as temperature, and determines whether or not to maintain it between thresholds, thus applying an offset to calibrate it, the dynamically generated list of processors to be maintained or re-allocated corresponds to a calibrated table.)
Ryu and Park are analogous to the claimed invention because they are both in the same field of increasing the efficiency of neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ryu to incorporate the teachings of Park. This is because Park teaches an easier way of identifying consumption problems and a way to monitor resources. This is done with tables, as they make it easier to visualize the device's current resource usage.
Regarding claim 2, Ryu-Park teaches the subject matter of claim 1. Ryu teaches the artificial intelligence semiconductor processor of claim 1, wherein the neural network configuration information includes weight data and feature map data constituting the neural network (Ryu, Paragraph 0031, "The processing circuit 100 may receive the weight data WT from the memory 10 and may perform calculations based on the weight data WT and the input feature data IF. The processing circuit 100 may generate the output feature data OF as the result of the calculations." Ryu teaches that the circuit will utilize weight data and feature map data. This shows that the neural network accelerator may process input feature data and weight data based on the neural network to generate output feature data).
Regarding claim 4, the combination of Ryu and Park teaches the subject matter of claim 1. However, Ryu does not teach the artificial intelligence semiconductor processor of claim 1, wherein the resource limit includes at least one of a threshold value of a temperature and a threshold value of a power.
Park though teaches the artificial intelligence semiconductor processor of claim 1, wherein the resource limit includes at least one of a threshold value of a temperature and a threshold value of a power (Col. 6, lines 50-59, "Such a process of executing a neural network unit, measuring the power consumption, and storing it in the table 506 may be repeated for each processor until the neural network unit has been executed on each processor. The stored power consumption level for a particular neural network unit executing on a particular SoC processor thus represents an estimate of the power that would be consumed if that particular neural network unit were later executed on that particular processor. The process may be repeated for each neural network unit." Col. 8, line 60 -67, "Information identifying any processors having temperatures exceeding the first threshold may be added to a first list, as indicated by block 708. The measured temperature associated with each SoC processor also may be compared with a second threshold (block 706) representing a temperature below which undesirable effects of excess thermal energy dissipation are estimated to be less likely to occur.” SoC is defined as system on chip and it includes processing units. Park teaches that the power consumption of the processor is measured and put into a table. It is an estimate of the power that would be consumed and needs to be monitored so it does not use more power than it needs to. The temperature of the processor is also measured to make sure it does not exceed any thresholds.)
Ryu and Park are analogous to the claimed invention because they are both in the same field of increasing the efficiency of neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ryu to incorporate the teachings of Park. This is because Park teaches an easier way of identifying consumption problems and a way to monitor resources. This is done with tables, as they make it easier to visualize the device's current resource usage.
Regarding claim 8, the combination of Ryu and Park teaches the subject matter of claim 7. Park further teaches the artificial intelligence semiconductor processor of claim 7, wherein the device information further includes information of the resource limit (Col. 8, lines 9-13, "As indicated by block 604, the measurements may be compared with one or more thresholds. For example, an SoC processor associated with each measurement may be categorized as corresponding to a measurement above a threshold or below a threshold. " Park teaches that the measurements, such as temperature, will be compared to their respective threshold, or limit.)
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Ryu in view of Park, further in view of Bleiweiss (US 12086705 B2).
Regarding claim 3, Ryu-Park teaches the subject matter of claim 1. Ryu further teaches “The artificial intelligence semiconductor processor of claim 1, wherein the precision of the neural network configuration information includes at least one of a sparsity ratio” (Paragraph 0037, "In an exemplary embodiment, each of the shifters 140 to 160 may shift the number of digits of the addition result depending on a predetermined shift value or may shift the number of digits of the addition result depending on the shift value entered as a separate control signal. For example, the shifter may shift the number of digits of the addition result by adding 0-bits to the addition result depending on the shift value." Sparsity ratio is defined as the proportion of zero-valued weights. Ryu teaches that shifters can shift the number of digits by adding 0-bits to the addition result and thereby increase the sparsity ratio). However, Ryu-Park does not teach a quantization format of the neural network configuration information.
Bleiweiss though teaches a quantization format of the neural network configuration information (Col. 41, lines 1-8, "In another embodiment, compute mechanism 2010 may be implemented to perform quantization for deep neural networks. Quantization is implemented to shrink file sizes by storing a minimum and maximum for each network layer, and then compressing each floating-point value to an integer representing the closest real number in a linear set within the range. Quantization also reduces the computational resources needed to perform inference calculations" Quantization format is defined as reducing the precision of weights and features from higher bit depths to lower ones. Bleiweiss teaches that quantization can be used to compress floating-point values to smaller integers within the range.)
Ryu-Park and Bleiweiss are analogous to the claimed invention because they are all in the same field of increasing the speed and efficiency of neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ryu-Park to incorporate the teachings of Bleiweiss. This is because Bleiweiss teaches a faster way of training a neural network by utilizing a quantization format, which reduces the precision of numerical representations such as weights and bits. This leads to reduced computational cost and lower power consumption, and thus a faster training time.
Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Ryu in view of Park, further in view of Hassantabar.
Regarding claim 9, Ryu-Park teaches the subject matter of claim 1. However, Ryu-Park does not teach the artificial intelligence semiconductor processor of claim 1, wherein the device information further includes a table of accuracy according to precision of layers of the neural network.
Hassantabar though teaches the artificial intelligence semiconductor processor of claim 1, wherein the device information further includes a table of accuracy according to precision of layers of the neural network (Paragraph 0009, "The system includes one or more processors configured to provide an initial neural network architecture; perform a dataset modification on the dataset, the dataset modification including reducing dimensionality of the dataset; perform a first compression step on the initial neural network architecture that results in a compressed neural network architecture, the first compression step including reducing a number of neurons in one or more layers of the initial neural network architecture based on a feature compression ratio determined by the reduced dimensionality of the dataset;" Figure 13 shows a table of accuracy for each data set. Hassantabar teaches that each data set consists of a neural network and its layers that have been modified. The table shows accuracy values for each neural network).
Ryu-Park and Hassantabar are considered to be analogous to the claimed invention because they are all in the same field of increasing the training speed and efficiency of neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ryu-Park to incorporate the teachings of Hassantabar. This is because Hassantabar teaches a more accurate way of training a convolutional neural network by introducing a table of accuracies for each neural network and its layers. This allows the processor to identify the layers with low accuracy values that require changes to the configuration.
Regarding claim 10, the combination of Ryu, Park, and Hassantabar teaches the subject matter of claim 9. However, Ryu-Park does not teach the artificial intelligence semiconductor processor of claim 9, wherein the control circuit adjusts the precision of the neural network configuration information, based on the table.
Hassantabar though teaches the artificial intelligence semiconductor processor of claim 9, wherein the control circuit adjusts the precision of the neural network configuration information, based on the table (Paragraph 0100, "For these seven datasets, DR+SCANN can meet the baseline accuracy with a 28.0x to 5078.7x smaller network. This shows a significant improvement over the compression ratio achievable by just using SCANN." Paragraph 0101, "The performance of applying DR without the benefit of the SCANN synthesis step is also reported. While these results show improvements, DR+SCANN can be seen to have much more compression power, relative to when DR and SCANN are used separately. This points to a synergy between DR and SCANN." Hassantabar defines SCANN as a neural network synthesis system and method, and DR+SCANN as a synthesis system and method with dimensionality reduction. Hassantabar teaches that DR+SCANN leads to higher accuracy values, thus adjusting the precision).
Claims 11-14 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Hassantabar in view of Park, further in view of Ryu.
Regarding claim 11, Hassantabar teaches an operating method of an artificial intelligence semiconductor processor, including a neural network computational accelerator and a control circuit, the method comprising: receiving neural network configuration information; receiving initial device information; (Paragraph 0009, "The system includes one or more processors configured to provide an initial neural network architecture; perform a dataset modification on the dataset, the dataset modification including reducing dimensionality of the dataset; perform a first compression step on the initial neural network architecture that results in a compressed neural network architecture, the first compression step including reducing a number of neurons in one or more layers of the initial neural network architecture based on a feature compression ratio determined by the reduced dimensionality of the dataset;" Hassantabar teaches that each data set consists of a neural network and its layers that have been modified. These data sets can be the initial device information and the initial network configuration information).
Hassantabar fails to teach device information “including a table of consumption quantities of a resource according to updating of the neural network configuration information;” real-time device information “including a real-time resource consumption quantity”; “updating, by the control circuit, the table by calibrating the consumption quantities of the resource included in the table based on the real-time resource consumption quantity”; and “such that neural network processing performed by the neural network computational accelerator consumes a resource lower than a resource limit”.
However, Park teaches the above limitations:
…a table of consumption quantities of a resource according to the updating of the neural network configuration information, … including a real-time resource consumption quantity (Col. 6, lines 50-53, "Such a process of executing a neural network unit, measuring the power consumption, and storing it in the table 506 may be repeated for each processor until the neural network unit has been executed on each processor." Col. 7, lines 64-67, "The method 600 may begin with obtaining a plurality of measurements that characterize an aspect of the operation of each of the corresponding plurality of SoC processors, as indicated by block 602." Park teaches that the table is updated until each neural network unit has been executed on each processor. Figure 6 shows the process for work reallocation. Thus, the table will be up to date and have the current consumption quantities when the method 600 begins.)
“updating, by the control circuit, the table by calibrating the consumption quantities of the resource included in the table based on the real-time resource consumption quantity”; (Col. 6, lines 50-53, "Such a process of executing a neural network unit, measuring the power consumption, and storing it in the table 506 may be repeated for each processor until the neural network unit has been executed on each processor." Col. 7, lines 64-67, "The method 600 may begin with obtaining a plurality of measurements that characterize an aspect of the operation of each of the corresponding plurality of SoC processors, as indicated by block 602." Col. 8, lines 18-24, "For example, reallocating a neural network unit from execution on a processor associated with a measurement that exceeds a threshold (block 604) to a processor associated with a measurement that is significantly lower, may help distribute the workloads in a manner that promotes higher performance and/or less localized thermal energy dissipation." Park teaches that the table is updated with the current resource consumption quantity before starting the process. Once the processor looks at the resource consumption quantities, the network configuration changes, such as by reallocating the workloads).
Updating, by the control circuit, the neural network configuration information, based on the initial device information and the real-time device information (Col. 6, lines 50-53, "Such a process of executing a neural network unit, measuring the power consumption, and storing it in the table 506 may be repeated for each processor until the neural network unit has been executed on each processor." Col. 6, lines 26-34, "In the example illustrated in FIG. 5, in which the neural network units are layers of a neural network, the metadata information is provided on a per-Layer basis. Note that in an example of operation in which a plurality of neural network units may be executing concurrently, some of those neural network units may be neural networks while others may be layers of the same neural network. Stated generally, the metadata may be provided on a per-neural network unit basis." Col. 7, lines 64-67, and Col. 8, lines 1-13, "The method 600 may begin with obtaining a plurality of measurements that characterize an aspect of the operation of each of the corresponding plurality of SoC processors, as indicated by block 602. The aspect may be any aspect that potentially constrains or limits the operation. For example, a measurement may be a temperature characterizing the heat dissipation of a processor under its present workload (i.e., executing one or more neural network units). As indicated by block 604, the measurements may be compared with one or more thresholds. For example, an SoC processor associated with each measurement may be categorized as corresponding to a measurement above a threshold or below a threshold." Park teaches that the table for resources, and similarly the neural network configuration for each layer, are updated continuously based on real-time information. These values are then compared to thresholds set based on the initial device information values).
“such that neural network processing performed by the neural network computational accelerator consumes a resource lower than a resource limit”, Park Col. 9, lines 3-8, "In such an example, involving two thresholds, the method 700 may dynamically re-allocate the executing neural network layers or other neural network units among the various SoC processors in a manner that tends to maintain the processors at temperatures between the first (higher) and second (lower) thresholds." Park teaches that the measurements such as temperature and power will be compared to their respective threshold, or limit. Park teaches that the processor will dynamically make sure that the resource stays within the threshold).
Hassantabar and Park are analogous to the claimed invention because they are both in the same field of increasing the speed and resource efficiency of neural networks by monitoring resource usage. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hassantabar to incorporate the teachings of Park. This is because Park teaches a more accurate way of training a neural network by monitoring more resources and updating the tables in real time.
Hassantabar and Park do not disclose a neural network computational accelerator, which is disclosed by Ryu. See Paragraph 0029, “A neural network accelerator 1000 may process input feature data IF based on the neural network to generate output feature data OF. For example, the neural network accelerator 1000 may process the input feature data IF based on convolutional neural network (CNN). However, the inventive concept is not limited thereto. For example, the neural network accelerator 1000 may use various neural network algorithms.” Ryu teaches that the neural network accelerator will implement feature data based on the neural network and neural network algorithms that can affect the neural network configuration information.
Hassantabar, Park, and Ryu are all considered to be analogous to the claimed invention because they are all in the same field of increasing the efficiency of neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hassantabar and Park to incorporate the teachings of Ryu. This is because Ryu teaches a more efficient way to train neural networks by increasing the sparsity ratio. When the sparsity ratio is increased, it leads to reduced parameters and faster training times.
Regarding claim 12, the combination of Hassantabar and Park teaches the subject matter of claim 11. Park further teaches the method of claim 11, wherein the updating of the neural network configuration information includes: (Col. 6, lines 26-34, "In the example illustrated in FIG. 5, in which the neural network units are layers of a neural network, the metadata information is provided on a per-Layer basis. Note that in an example of operation in which a plurality of neural network units may be executing concurrently, some of those neural network units may be neural networks while others may be layers of the same neural network. Stated generally, the metadata may be provided on a per-neural network unit basis." Park teaches that each layer and neural network configuration is updated alongside the metadata). However, Hassantabar and Park do not teach adjusting a sparsity ratio of the neural network configuration information.
Ryu though teaches adjusting a sparsity ratio of the neural network configuration information (Paragraph 0037, "In an exemplary embodiment, each of the shifters 140 to 160 may shift the number of digits of the addition result depending on a predetermined shift value or may shift the number of digits of the addition result depending on the shift value entered as a separate control signal. For example, the shifter may shift the number of digits of the addition result by adding 0-bits to the addition result depending on the shift value." Sparsity ratio is defined as the proportion of zero-valued weights. Ryu teaches that shifters can shift the number of digits by adding 0-bits to the addition result and thereby increase the sparsity ratio).
Regarding claim 14, the combination of Hassantabar and Park teaches the subject matter of claim 11. Park further teaches the method of claim 11, wherein the initial device information includes information indicating a resource limit, and wherein the updating of the neural network configuration information includes: updating the neural network configuration information such that the artificial intelligence semiconductor processor consumes a resource lower than the resource limit (Park, Col. 8, lines 9-13, "As indicated by block 604, the measurements may be compared with one or more thresholds. For example, an SoC processor associated with each measurement may be categorized as corresponding to a measurement above a threshold or below a threshold." Col. 9, lines 3-8, "In such an example, involving two thresholds, the method 700 may dynamically re-allocate the executing neural network layers or other neural network units among the various SoC processors in a manner that tends to maintain the processors at temperatures between the first (higher) and second (lower) thresholds." Park teaches that the measurements such as temperature and power will be compared to their respective threshold, or limit. Park teaches that the processor will dynamically make sure that the resource stays within the threshold).
Regarding claim 17, the combination of Hassantabar and Park and Ryu teaches the subject matter of claim 11. Hassantabar further teaches the method of claim 11, wherein the initial device information includes a table of accuracy according to the updating of the neural network configuration information, and wherein the updating of the neural network configuration information is based on the table (Figure 13 shows a table of accuracy for each data set. Each data set comprises a neural network and its layers that have been modified. Paragraph 0009, "The system includes one or more processors configured to provide an initial neural network architecture; perform a dataset modification on the dataset, the dataset modification including reducing dimensionality of the dataset; perform a first compression step on the initial neural network architecture that results in a compressed neural network architecture, the first compression step including reducing a number of neurons in one or more layers of the initial neural network architecture based on a feature compression ratio determined by the reduced dimensionality of the dataset;" Paragraph 0104, "The advantages of SCANN and DR+SCANN are derived from its core benefit: the network architecture is allowed to dynamically evolve during training. This benefit is not directly available in several other existing automatic architecture synthesis techniques, such as the evolutionary and reinforcement learning based approaches." Hassantabar teaches that the SCANN network configuration is able to dynamically evolve during training, and is thus able to consult the accuracy table and update accordingly).
Claim(s) 13 is rejected under 35 U.S.C. 103 as being unpatentable over Hassantabar in view of Park, further in view of Ryu, further in view of Bleiweiss.
Regarding claim 13, the combination of Hassantabar and Park teaches the subject matter of claim 11. Park further teaches the method of claim 11, wherein the updating of the neural network configuration information includes: (Col. 6, lines 26-34, "In the example illustrated in FIG. 5, in which the neural network units are layers of a neural network, the metadata information is provided on a per-layer basis. Note that in an example of operation in which a plurality of neural network units may be executing concurrently, some of those neural network units may be neural networks while others may be layers of the same neural network. Stated generally, the metadata may be provided on a per-neural network unit basis." Park teaches that each layer and neural network configuration is updated alongside the metadata). However, Hassantabar and Park do not teach adjusting a quantization format of the neural network configuration information.
Bleiweiss, though, teaches adjusting a quantization format of the neural network configuration information (Col. 41, lines 1-8, "In another embodiment, compute mechanism 2010 may be implemented to perform quantization for deep neural networks. Quantization is implemented to shrink file sizes by storing a minimum and maximum for each network layer, and then compressing each floating point value to an integer representing the closest real number in a linear set within the range. Quantization also reduces the computational resources needed to perform inference calculations." Quantization format is defined as reducing the precision of weights and features from higher bit depths to lower ones. Bleiweiss teaches that quantization can be used to compress floating-point values into smaller integers within the stored range).
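As an illustrative sketch only (function names are hypothetical and not drawn from Bleiweiss), the min/max linear quantization described in the quoted passage, storing a per-layer minimum and maximum and mapping each float to the nearest integer level in a linear range, can be expressed as:

```python
def quantize_layer(values, bits=8):
    """Store the layer's min and a scale, and map each float to the
    nearest integer level in a linear set spanning [min, max]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1          # e.g. 255 integer levels for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    quantized = [round((v - lo) / scale) for v in values]
    return quantized, lo, scale

def dequantize_layer(quantized, lo, scale):
    """Recover the closest representable real number for each integer."""
    return [lo + q * scale for q in quantized]
```

The round trip recovers each value to within one quantization step, which is the trade-off underlying the reduced file size and computational cost described in the reference.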
Hassantabar, Park, Ryu, and Bleiweiss are all considered to be analogous to the claimed invention because they are all in the same field of increasing the speed and efficiency of neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hassantabar and Park to incorporate the teachings of Bleiweiss. This is because Bleiweiss teaches a faster way of training a neural network by utilizing a quantization format, which reduces the precision of numerical representations such as weights. This leads to reduced computational cost and lower power consumption, and thus faster training time.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MIRANDA HUANG whose telephone number is 571.270.7092. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Wiley can be reached on 571.272.4150. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124