Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-21 are pending for examination.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim language in the following claims is not clearly understood:
As per claim 1, line 14, it is unclear whether “the sync request wires” and “the sync acknowledgment wires” refer to “a sync request wire” and “a sync acknowledgment wire” of “the sets of wiring” recited in lines 5-6 (i.e., consistent terminology should be used, with “the” or “said”, if the same elements are intended).
At line 16, the claim recites the limitation "the sync controller". There is insufficient antecedent basis for this limitation in the claim.
As per claim 21, it has the same deficiencies as claim 1 above. Appropriate correction is required.
As per claims 2-20, they depend from rejected claims and do not resolve the deficiencies thereof and are therefore rejected for at least the same reasons.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 8-9, 13-14, 17, and 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Knowles US Pub 2019/0121638 (hereafter Knowles) in view of Huse US Pub 2021/0194793 (hereafter Huse).
As per claim 1, Knowles teaches the invention substantially as claimed including a data processing device comprising: a plurality of processors, each comprising an execution unit configured to participate in at least one of a plurality of barrier synchronisations (para[0062, 0110], FIGS. 1 and 9, each processor module 4 comprises an execution unit, and the processor modules (4i, 4ii …) participate in the barrier synchronization);
and a plurality of sets of wiring for co-ordinating the barrier synchronisations between the processors, wherein each of the sets of wiring is associated with a respective one of the processors and comprises a sync request wire and a sync acknowledgment wire, wherein for each of the processors, circuitry of the respective processor is configured to: receive a signal representing a state of the sync acknowledgment wire for the respective processor (para[0121-0123, 0138-0139], FIG. 16, a sync req wire and a sync ack wire are dedicated to each processing module, and the sync_req and sync_ack signals are transmitted to and received from the sync controller);
assert a sync request by setting a state of the sync request wire for the respective processor in dependence upon the received signal, so as to be opposite to the state of the sync acknowledgement wire for the respective processor; each of the sync request wires has been set to the opposite of the state of the sync acknowledgment wires; causing the state of the sync acknowledgment wire of the respective processor to be set to be the same as the state of the sync request wire of the respective processor (para[0139, 0141, 0144], the sync request wire is raised to the value 1 to signal the sync request, which is opposite to the value 0 held by the sync ack wire before it is set; the sync ack wire of each processor module is then set in response to the sync request signal, and thus has the same state as the sync req wire).
Knowles does not explicitly teach wherein the data processing device further comprises: aggregation circuitry configured to, in response to detecting that each of the sync request wires has been set to the opposite of the state of the sync acknowledgment wires, output an aggregate sync request for a first of the barrier synchronisations to the sync controller; a sync controller comprising circuitry configured to, in response to the aggregate sync request, return to each of the processors, an acknowledgment of the sync request of the respective processor by causing the state of the sync acknowledgment wire of the respective processor to be set to be the same as the state of the sync request wire of the respective processor.
However, Huse teaches wherein the data processing device further comprises: aggregation circuitry configured to, in response to detecting that each of the sync request wires has been set, output an aggregate sync request for a first of the barrier synchronisations to the sync controller (para[0060-0066, 0074], sync requests are issued by the tiles (processing units), and the accelerator aggregates the sync requests and issues an aggregated sync request to the gateway, where the sync logic uses dedicated wiring for transmitting the sync requests and sync acknowledgements);
a sync controller comprising circuitry configured to, in response to the aggregate sync request, return to each of the processors, an acknowledgment of the sync request of the respective processor (para[0066, 0073-0074, 0083], the gateway receives the aggregated sync request and allows the synchronization barrier to be passed, and the sync logic uses dedicated wiring for transmitting the sync acknowledgement to each tile once sync requests have been received from all the tiles).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Huse’s teaching into Knowles’ invention in order to provide a gateway connected to a computer subsystem for acting as a work accelerator to control propagation of sync requests and acknowledgements in the gateway, which allows synchronization to be completed even if there is a fault in one of the paths of the sync network (para[0008-0010]).
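For illustration only, the request/acknowledgment handshake recited in claim 1 may be modeled in software as follows. This is a hypothetical sketch that assumes each wire is a single boolean and that aggregation and acknowledgment occur in one step; the names (SyncWires, run_barrier, and so on) are invented here and are not taken from Knowles, Huse, Wilkinson, or the present specification.

```python
# Hypothetical model of the toggle-style request/acknowledgment handshake
# recited in claim 1. Wire states are booleans; all names are illustrative.
from dataclasses import dataclass
from typing import List


@dataclass
class SyncWires:
    """One processor's set of wiring: a sync request wire and a sync ack wire."""
    req: bool = False
    ack: bool = False


def assert_sync_request(wires: SyncWires) -> None:
    # Processor circuitry sets its request wire opposite to its ack wire.
    wires.req = not wires.ack


def all_requests_pending(wiring: List[SyncWires]) -> bool:
    # Aggregation: true only when every request wire differs from its ack wire.
    return all(w.req != w.ack for w in wiring)


def acknowledge_all(wiring: List[SyncWires]) -> None:
    # Sync controller: set each ack wire to the same state as its request wire.
    for w in wiring:
        w.ack = w.req


def run_barrier(wiring: List[SyncWires]) -> bool:
    """Run one barrier; return True if an aggregate request was output."""
    for w in wiring:
        assert_sync_request(w)
    if all_requests_pending(wiring):
        acknowledge_all(wiring)
        return True
    return False
```

In this sketch the wires end the first barrier high and the second barrier low, so no separate reset step is needed between barriers.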
As per claim 2, Knowles and Huse teach the data processing device of claim 1, and Knowles teaches wherein the circuitry of the sync controller is configured to, for each of the processors, perform the causing the state of the sync acknowledgment wire for the respective processor to be set by setting the state of the sync acknowledgment wire for the respective processor in dependence upon the aggregate sync request so as to be the same as state of a wire on which the aggregate sync request is provided (para[0120, 0139, 0144], when the sync controller has received a sync_req from all the tiles (sync req wires raised), the sync controller sends the sync_ack back to the sync logic on each of the tiles using the sync ack wire, so that the sync_ack wire has the same state as the sync_req wire).
As per claim 3, Huse teaches wherein the aggregation circuitry is configured to output the aggregate sync request on an aggregate sync request wire to the sync controller by: following a transition in the state of all of the sync request wires to an updated state, updating a state of the aggregate sync request wire to match the updated state of all of the sync request wires (para[0074, 0082], FIG. 4, the sync logic uses dedicated wiring for transmitting the aggregated sync request 56, which matches the state of the sync requests from each of the tiles).
As per claim 4, Knowles teaches wherein the aggregation circuitry comprises a plurality of aggregation circuits, each of which is associated with a set of one or more of the processors and is configured to aggregate the state of the sync request wires of its associated processors with running aggregate state (para[0138, 0144], FIG. 16, sync block 95 comprises respective gating logic and a respective sync aggregator, so that a plurality of sync aggregators aggregate the sync requests from the processor modules).
As per claim 5, Huse teaches wherein for at least some of the processors, the respective execution unit is configured to, in response to the acknowledgment from the sync controller, proceed past the first of the barrier synchronisations (para[0071, 0090, 0254], allowing the sync barrier to be passed involves generating a sync ack and sending it to the accelerator).
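Claims 3 and 4 recite an aggregate sync request wire that follows the request wires once they have all transitioned, with the aggregation split across circuits that each combine their processors' request wires with a running aggregate state. The hypothetical sketch below illustrates that reading under two assumptions not drawn from the cited references: the aggregation circuits form a simple chain, and the "updated state" is the complement of the aggregate wire's current state. The function name is illustrative only.

```python
from typing import List


def update_aggregate_wire(groups: List[List[bool]], current: bool) -> bool:
    """Return the new state of a hypothetical aggregate sync request wire.

    groups:  request-wire states, one inner list per aggregation circuit
             (each circuit serves its own set of processors).
    current: present state of the aggregate sync request wire.
    """
    target = not current  # assumed "updated state" the request wires move to
    running = True        # running aggregate state passed along the chain
    for states in groups:
        # Each circuit ANDs its own processors' result into the running state.
        running = running and all(s == target for s in states)
    # The aggregate wire changes only once every request wire has transitioned.
    return target if running else current
```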
As per claim 8, Knowles teaches wherein for each of the processors, the respective execution unit is configured to operate in an alternating cycle of compute phases and exchange phases separated by the barrier synchronisations (para[0032-0033, 0110-0111], alternate between compute phase and exchange phase).
As per claim 9, Knowles teaches wherein for each of at least some of the processors, the respective execution unit is configured to: in response to receipt of the acknowledgment of the sync request of the respective processor, proceed to one of the exchange phases in which each of the at least some of the processors at least one of: sends or receives data (para[0033, 0110, 0120], a sync ack signal is returned, and the tile then proceeds to the next exchange phase to exchange data).
As per claim 13, Knowles teaches wherein the circuitry of the sync controller is configured to: in response to the aggregate sync request, issue a further request to an external sync controller for the processors to participate in the first of the barrier synchronisations with further processors belonging to further devices; and subsequently, in response to receipt of a further acknowledgment of the further request from the external sync controller, return to each of the processors, the acknowledgment of the sync request of the respective processor (para[0129, 0136, 0159], external sync participation is signalled to the external system, and the tile remains suspended until it receives an external sync acknowledgement from the external system).
As per claim 14, Knowles teaches wherein for each of the processors, the respective sync request wire and the respective sync acknowledgment are associated with a first sync group to which at least some of the processors belong (para[0139-0140], FIG. 96, the sync_req and sync_ack wires are associated with the first sync group).
As per claim 17, Knowles teaches wherein for each of the processors, the circuitry comprises at least one of: an inverter gate configured to invert the signal representing the state of the sync acknowledgment wire for the respective processor in order to set the state of the sync request wire for the respective processor to be opposite to the state of the sync acknowledgement wire for the respective processor; and a XOR gate configured to invert the signal representing the state of the sync acknowledgment wire for the respective processor in order to set the state of the sync request wire for the respective processor to be opposite to the state of the sync acknowledgement wire for the respective processor (para[0095], XOR logic which inverts the states).
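The equivalence of the two recited gates can be checked directly: an XOR gate with one input held at logic 1 outputs the complement of its other input, so either gate sets the request wire opposite to the acknowledgment wire. In this minimal sketch the second XOR input, `tie`, is a hypothetical name for that held-high input, and the function names are illustrative only.

```python
def req_from_inverter(ack: bool) -> bool:
    # Inverter gate: output is the complement of the ack wire state.
    return not ack


def req_from_xor(ack: bool, tie: bool = True) -> bool:
    # XOR gate: with `tie` held at 1 the output is the complement of `ack`;
    # with `tie` at 0 the ack state would pass through unchanged.
    return ack ^ tie
```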
As per claim 19, Knowles teaches wherein for each of at least some of the processors, the circuitry of the respective processor is configured to: detect the acknowledgment of the sync request for the respective processor in response to detecting a transition in the state of the sync acknowledgment wire for the respective processor (para[0139, 0144], detect the sync wire raised by the chip).
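Detecting the acknowledgment as a transition, rather than as a level, is what allows the same wire to acknowledge successive barriers under a toggle-style handshake. A minimal sketch, assuming the acknowledgment wire is sampled once per step and that its state before the first sample is known (the `initial` parameter and the function name are hypothetical):

```python
from typing import Iterable, List


def ack_edges(samples: Iterable[bool], initial: bool = False) -> List[int]:
    """Return the sample indices at which the ack wire changes state.

    Each change, rising or falling, is treated as one acknowledgment.
    `initial` is the assumed wire state before the first sample.
    """
    edges: List[int] = []
    previous = initial
    for i, state in enumerate(samples):
        if state != previous:
            edges.append(i)
        previous = state
    return edges
```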
As per claim 20, Knowles teaches wherein the data processing device is an integrated circuit (para[0106], chip is implemented alone on its own single chip integrated circuit package).
Claims 6-7, 10-12, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Knowles in view of Huse as applied to claim 1 above, and further in view of Wilkinson et al. US Pub 2019/0121679 (hereafter Wilkinson).
As per claim 6, Knowles and Huse teach the data processing device of claim 1, but do not teach wherein each of the processors comprises: a memory storing a local program comprising a set of computer readable instructions, the respective set of computer readable instructions comprising indications of each of ones of the barrier synchronisations in which the respective processor is to participate; and an execution unit configured to execute the computer readable instructions of the respective processor so as to enable the respective processor to participate in ones of the barrier synchronisations.
However, Wilkinson teaches a memory storing a local program comprising a set of computer readable instructions, the respective set of computer readable instructions comprising indications of each of ones of the barrier synchronisations in which the respective processor is to participate; and an execution unit configured to execute the computer readable instructions of the respective processor so as to enable the respective processor to participate in ones of the barrier synchronisations (para[0164, 0171], the group of participating tiles can be set by the mode operand of the sync instruction, and the tiles in that group are thereby enabled to participate in the barrier synchronization).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Wilkinson’s teaching into Knowles and Huse’s invention in order to provide a method for synchronizing the workloads of different tiles using bulk synchronous parallel communication schemes where the sync instruction provides the ability to synchronize amongst a desired set or subset of tiles with a reduced latency and lower code density (para[0018]).
As per claim 7, Knowles, Huse and Wilkinson teach the data processing device of claim 6, and Wilkinson teaches wherein for each of at least some of the processors: the respective execution unit is configured to, in response to a first of the indications for the first of the barrier synchronisations, execute a sync instruction to cause the circuitry of the respective processor to assert the sync request for the respective processor (para[0164, 0171], each participating tile automatically asserts its readiness using the sync_req signal).
As per claim 10, Knowles and Huse teach the data processing device of claim, but they do not explicitly teach wherein a subset of the processors each comprise a register storing an indication that the processor does not belong to a group of processors that are configured to participate in the first of the barrier synchronisations, wherein for each of the subset of the processors, the circuitry of the respective processor is configured to assert the sync request for the respective processor in response to the indication that the respective processor does not belong to the group.
However, Wilkinson teaches wherein a subset of the processors each comprise a register storing an indication that the processor does not belong to a group of processors that are configured to participate in the first of the barrier synchronisations, wherein for each of the subset of the processors, the circuitry of the respective processor is configured to assert the sync request for the respective processor in response to the indication that the respective processor does not belong to the group (para[0164-0170], not all tiles participate in the synchronization; each non-participating tile executes a SAN instruction that causes it to abstain from the current barrier synchronization and to send the sync_req signal to the sync controller).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Wilkinson’s teaching into Knowles and Huse’s invention in order to provide a method for synchronizing the workloads of different tiles using bulk synchronous parallel communication schemes where the sync instruction provides the ability to synchronize amongst a desired set or subset of tiles with a reduced latency and lower code density (para[0018]).
As per claim 11, Knowles, Huse and Wilkinson teach the data processing device of claim 10, and Wilkinson teaches wherein for each of the subset of the processors: an execution unit of the respective processors is configured to, following assertion of the sync request by the circuitry of the respective processor and prior to receipt of the acknowledgment, proceed with computation or data exchange without waiting at the first of the barrier synchronisations (para[0166-0168], the SAN instruction causes the tile to abstain from the current barrier synchronization without pausing to await the sync_ack and without holding up other tiles which are waiting for all tiles in the group to SYNC).
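The non-member behavior discussed for claims 10 and 11 can be sketched as follows. This is a hypothetical model, assuming a single boolean membership bit per tile and the same toggle-style wires as claim 1; it does not reproduce Wilkinson's SAN instruction mechanism, and the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tile:
    member: bool           # hypothetical register bit: belongs to the sync group
    req: bool = False      # sync request wire
    ack: bool = False      # sync acknowledgment wire
    stalled: bool = False  # True while waiting at the barrier

    def reach_barrier(self) -> None:
        """Assert the request; only group members stall awaiting the ack."""
        self.req = not self.ack
        self.stalled = self.member  # non-members proceed without waiting


def controller_step(tiles: List[Tile]) -> bool:
    """Acknowledge every tile once all request wires differ from their ack wires."""
    if all(t.req != t.ack for t in tiles):
        for t in tiles:
            t.ack = t.req
            t.stalled = False
        return True
    return False
```

In this model a non-member asserts its request as soon as it is told it is not in the group, so the barrier completes as soon as the members arrive.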
As per claim 12, Knowles and Huse teach the data processing device of claim 9, but they do not explicitly teach wherein a subset of the processors each comprise a register storing an indication that the processor does not belong to a group of processors that are configured to participate in the first of the barrier synchronisations, wherein for each of the subset of the processors, the circuitry of the respective processor is configured to assert the sync request for the respective processor in response to the indication that the respective processor does not belong to the group; wherein for each of the subset of the processors: an execution unit of the respective processors is configured to, in response to the indication that the processor does not belong to the sync group, abstain from participating in the one of the exchange phases.
However, Wilkinson teaches wherein a subset of the processors each comprise a register storing an indication that the processor does not belong to a group of processors that are configured to participate in the first of the barrier synchronisations, wherein for each of the subset of the processors, the circuitry of the respective processor is configured to assert the sync request for the respective processor in response to the indication that the respective processor does not belong to the group; wherein for each of the subset of the processors: an execution unit of the respective processors is configured to, in response to the indication that the processor does not belong to the sync group, abstain from participating in the one of the exchange phases (para[0164-0170, 0173], not all tiles participate in the synchronization; each non-participating tile executes a SAN instruction that causes it to abstain from the current barrier synchronization and to send the sync_req signal to the sync controller).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Wilkinson’s teaching into Knowles and Huse’s invention in order to provide a method for synchronizing the workloads of different tiles using bulk synchronous parallel communication schemes where the sync instruction provides the ability to synchronize amongst a desired set or subset of tiles with a reduced latency and lower code density (para[0018]).
As per claim 15, Knowles and Huse teach the data processing device of claim 14, but they do not explicitly teach for each of the processors, the respective set of sync wiring comprising: a plurality of further sync request wires, each of which is associated with a different sync group and is operable to transport sync requests for ones of the barrier synchronisations involving the respective sync group; and a plurality of further sync acknowledgment wires, each of which is associated with a different sync group and is operable to transport sync acknowledgments in relation to ones of the barrier synchronisations involving the respective sync group.
However, Wilkinson teaches for each of the processors, the respective set of sync wiring comprising: a plurality of further sync request wires, each of which is associated with a different sync group and is operable to transport sync requests for ones of the barrier synchronisations involving the respective sync group; and a plurality of further sync acknowledgment wires, each of which is associated with a different sync group and is operable to transport sync acknowledgments in relation to ones of the barrier synchronisations involving the respective sync group (para[0018, 0039-0048], performs sync signalling among the specified group of tiles via dedicated sync (req and ack) wires).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Wilkinson’s teaching into Knowles and Huse’s invention in order to provide a method for synchronizing the workloads of different tiles using bulk synchronous parallel communication schemes where the sync instruction provides the ability to synchronize amongst a desired set or subset of tiles with a reduced latency and lower code density (para[0018]).
As per claim 16, Knowles, Huse and Wilkinson teach the data processing device of claim 15, and Huse teaches wherein each of the different sync groups is a configurable sync group, wherein each of the processors comprises a register comprising an indication, for each of the configurable sync groups, whether or not the respective processor belongs to that configurable sync group (para[0068, 0092], an indication is stored in the register as to the sync zone for a group of tiles).
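One way to picture the register recited in claim 16 is a bitmask with one bit per configurable sync group. The sketch below assumes that encoding and a fixed group count; the constant NUM_GROUPS and the function names are hypothetical and are not taken from Huse or the present specification.

```python
from typing import List

NUM_GROUPS = 4  # hypothetical number of configurable sync groups


def is_member(register: int, group: int) -> bool:
    """Return True if bit `group` of a processor's membership register is set."""
    if not 0 <= group < NUM_GROUPS:
        raise ValueError(f"group must be in [0, {NUM_GROUPS})")
    return bool((register >> group) & 1)


def members(registers: List[int], group: int) -> List[int]:
    """Indices of processors whose registers mark them as members of `group`."""
    return [i for i, reg in enumerate(registers) if is_member(reg, group)]
```

Under this model, each group would then have its own request/acknowledgment wire pair per processor, as recited in claim 15.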
Allowable Subject Matter
Claim 18 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAMMY EUNHYE LEE whose telephone number is (571)270-7773. The examiner can normally be reached Mon, Tues, Thur 9AM-4PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An, can be reached at (571)272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TAMMY E LEE/Primary Examiner, Art Unit 2195