DETAILED ACTIONS
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for the benefit of foreign priority from Japanese Patent Application No. JP2022-156128, filed on September 9, 2022.
Information Disclosure Statement
The information disclosure statement (“IDS”) filed on 05/22/2023 was reviewed and the listed references were noted.
Drawings
The 17 pages of drawings have been considered and placed on record in the file.
Status of Claims
Claims 1-11 are pending.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-5 and 11 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1,
Step 1 Analysis: Claim 1 is directed to an apparatus, which falls within one of the four statutory categories.
Step 2A-Prong 1 Analysis: The limitation of estimating a number of at least one target object included in a target region being at least part of the acquired image by using a learned model, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, which falls within the Mental Processes grouping of abstract ideas. For example, a learned model can be any process with specific steps that could be utilized by a human to estimate the number of people or objects in a physical image. Similarly, the limitation that estimation of a number of the at least one target object included in the target region is performed by using the likelihood data and the numerical data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, which falls within the Mental Processes grouping of abstract ideas. For example, a human can determine the number of people or objects in a region of the image as well as the likelihood of a person or object being in that region, and sum the totals across every region. Accordingly, the claim recites an abstract idea.
Step 2A-Prong 2 Analysis: The limitation of “acquiring an image” is considered to be insignificant extra-solution activity of mere data gathering. An initial step of acquiring an image does not integrate the exception into a practical application or add significantly more. The limitation directed to output data of the model is considered to be insignificant post-solution activity. The claim recites additional elements: a memory and a processor. The memory and processor are recited at a high level of generality (i.e., as a generic computer performing the generic computer function of counting the number of people in an image) such that they amount to no more than mere instructions to apply the exception using a generic computer. The claim does not include additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.
Step 2B Analysis: Because the claim fails under Step 2A, the claim is further evaluated under Step 2B. The claim does not contain additional elements that are sufficient to amount to significantly more than the judicial exception because, as discussed above with respect to integration of the abstract idea into a practical application, the additional elements/limitations of acquiring an image and outputting data of the model amount to no more than insignificant, well-understood, routine, and conventional activity. Therefore, independent claim 1 is not patent eligible.
Regarding dependent claims 2-5, they do not overcome the deficiencies of rejected independent claim 1 and are therefore also rejected.
Regarding claim 11,
Step 1 Analysis: Claim 11 is directed to a method, which falls within one of the four statutory categories.
Step 2A-Prong 1 Analysis: The limitation of estimating a number of at least one target object included in a target region being at least part of the acquired image by using a learned model, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, which falls within the Mental Processes grouping of abstract ideas. For example, a learned model can be any process with specific steps that could be utilized by a human to estimate the number of people or objects in a physical image. Similarly, the limitation that estimation of a number of the at least one target object included in the target region is performed by using the likelihood data and the numerical data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind, which falls within the Mental Processes grouping of abstract ideas. For example, a human can determine the number of people or objects in a region of the image as well as the likelihood of a person or object being in that region, and sum the totals across every region. Accordingly, the claim recites an abstract idea.
Step 2A-Prong 2 Analysis: The limitation of “acquiring an image” is considered to be an insignificant extra-solution activity for mere data gathering. An initial step of acquiring an image does not integrate the exception into a practical application or add significantly more. The limitation of “output data of the model” is considered to be an insignificant post-solution activity. The claim does not include additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.
Step 2B Analysis: Because the claim fails under Step 2A, the claim is further evaluated under Step 2B. The claim does not contain additional elements that are sufficient to amount to significantly more than the judicial exception because, as discussed above with respect to integration of the abstract idea into a practical application, the additional elements/limitations of acquiring an image and outputting data of the model amount to no more than insignificant, well-understood, routine, and conventional activity. Therefore, independent claim 11 is not patent eligible.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 5-8, and 11 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Tan et al. (US 2023/0301522 A1, PCT filed on 08/18/2020), hereinafter referred to as Tan.
Claim 1
Tan discloses an estimation apparatus (Tan, Fig. 1) comprising:
at least one memory configured to store instructions (Tan, [0006], “instructions are stored on the memory”); and
at least one processor (Tan, [0006], “a processing unit comprising a processor,”) configured to execute the instructions to perform operations (Tan, [0006], “instructions are stored on the memory”) comprising:
acquiring an image (Tan, Fig. 4, step 405, “receive and process thermal images”); and
estimating a number of at least one target object included in a target region (Tan, [0045], “This allows the sensor to always obtain a human silhouette based on the heat signature of the human, in addition other objects. Key information that could be generated includes: the number of persons within view; the position of the persons in relation to the surrounding; the direction of movement; body postures that indicate if one is standing up, walking, sitting or lying down, and actions such as bed turning, cleaning, medication adherence.”, [0065], “The methodology of segmenting an image into grids, and then finding overlaps of parts of a potential object and later merging into a final bounding box is a common methodology and two known examples are the You Only Look Once (YOLO) family algorithms and Single Shot Multibox Detector (SSD). In this application two bounding box for a human is obtained, the first bounding box encompasses the entire human body and the second bounding box for the head of the human body. The bounding box for the human returns a 1 or 0 and contributes towards counting the total number of people in the thermal sensor 120 field-of-view. The bounding box of the human can also be used to estimate its position in relation to other humans, or with objects such as bed and chair.”), being at least part of the acquired image by using a learned model (Tan, [0065], “Each cell is responsible for predicting bounding boxes of potential objects to be recognised and removing boxes with low object probability and matching against a pre-trained model. In order to train a model, a certain number of images are required.”) wherein
input data of the model (Tan, [0065], “Object recognition is then applied to the processed array of temperature data in the following manner. The array of temperature data is first split into cells in the form of a grid. Each cell is responsible for predicting bounding boxes of potential objects to be recognised and removing boxes with low object probability and matching against a pre-trained model.”) are the image (Tan, [0057], “the processing unit 110 is a multi-input data acquisition module that receive pertinent information on the monitored subject. The thermal sensor 120 and UWB sensor 130 are connected to processing unit come with an on-board processor that is configured to receive data and transmit data to a main server 190.”, [0062], “FIG. 4 illustrates a process 400 performed by the processing unit 110 for receiving data from the thermal sensor 120 and UWB sensor 130 in accordance with an embodiment of this invention”),
output data of the model (Tan, [0065], “With the trained model, the objection recognition is able to find all the objects in the image to draw the bounding box”) include:
likelihood data indicating a likelihood of the one or more target objects being included in each of a plurality of partial regions (Tan, [0065], “the bounding box for the human returns a 1 or 0 and contributes towards counting the total number of people in the thermal sensor 120 field-of-view. The bounding box of the human can also be used to estimate its position in relation to other humans, or with objects such as bed and chair.”, the bounding box is the partial region and the 1 or 0 is the likelihood of a human or target object being in the bounding box) acquired by dividing the image (Tan, [0065], “The methodology of segmenting an image into grids, and then finding overlaps of parts of a potential object and later merging into a final bounding box is a common methodology and two known examples are the You Only Look Once (YOLO) family algorithms and Single Shot Multibox Detector (SSD).”); and
numerical data indicating an estimated number of the at least one target object for the partial region estimated to include the one or more target objects out of the plurality of partial regions (Tan, [0065], “the bounding box for the human returns a 1 or 0 and contributes towards counting the total number of people in the thermal sensor 120 field-of-view”, each bounding box holds a numerical value of 1 or 0 indicating whether a human is inside the box, and these values add up to the total number of target objects in the image), and
estimation of a number of the at least one target object included in the target region is performed by using the likelihood data and the numerical data (Tan, [0065], “the bounding box for the human returns a 1 or 0 and contributes towards counting the total number of people in the thermal sensor 120 field-of-view”, the number of humans in each bounding box is summed to output the total number of people in the thermal image; Tan, [0069], “In step 420, the data processed from step 405, 410 and analysed data and activity intensity are transmitted to the main server 190 for further analysis. The data includes number of people, respiratory rate, heart rate, temperature of patient, thermal images, bed-exit status, and activity intensity.”).
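For illustration of the examiner's reading of Tan [0065], the per-cell counting scheme (each grid cell or bounding box returns a 1 or 0, and the total count is the sum of those indicators) can be expressed as a minimal sketch; the function name and data layout below are hypothetical and not drawn from either reference:

```python
# Hypothetical sketch of the counting described in Tan [0065]:
# each grid cell yields an object-presence indicator (1 or 0),
# and the total count is the sum of the indicators over all cells.

def count_targets(cell_indicators):
    """Sum the 1/0 presence indicators to obtain the total object count."""
    return sum(cell_indicators)

# Example: a 3x3 grid flattened to nine cells, two of which contain a person.
indicators = [0, 1, 0,
              0, 0, 1,
              0, 0, 0]
print(count_targets(indicators))  # 2
```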
Claim 5
Tan discloses the estimation apparatus according to claim 1 (Tan, Fig. 1), wherein the output data include the likelihood data for each type of the target object (Tan, [0065], “The methodology of segmenting an image into grids, and then finding overlaps of parts of a potential object and later merging into a final bounding box is a common methodology and two known examples are the You Only Look Once (YOLO) family algorithms and Single Shot Multibox Detector (SSD). In this application two bounding box for a human is obtained, the first bounding box encompasses the entire human body and the second bounding box for the head of the human body. The bounding box for the human returns a 1 or 0 and contributes towards counting the total number of people in the thermal sensor 120 field-of-view. The bounding box of the human can also be used to estimate its position in relation to other humans, or with objects such as bed and chair.”, the two types are human and not a human, which are indicated by the 1 or 0).
Claim 6
Tan discloses a model generation apparatus (Tan, Fig. 1) comprising:
at least one memory configured to store instructions (Tan, [0006], “instructions are stored on the memory”); and
at least one processor (Tan, [0006], “a processing unit comprising a processor,”) configured to execute the instructions to perform operations (Tan, [0006], “instructions are stored on the memory”) comprising:
acquiring training data in which a training image and ground truth data are associated with each other (Tan, [0062], “In order to train a model, a certain number of images are required. Each image includes marked out points and bounding box or boxes with tag depicting a category such as people, hand, leg, torso, head, table, chair, bed, etc.”, “The exact number of images to be used for training a model may be through trial and error and is left to one skilled in the art to determine. Approximately 70-80% of the images are used to train a model while the remaining 20-30% of the images are used to verify the accuracy of the model.”, labeled data is the ground truth data); and
generating a model by performing machine learning using the training data (Tan, [0062], “In order to train a model, a certain number of images are required. Each image includes marked out points and bounding box or boxes with tag depicting a category such as people, hand, leg, torso, head, table, chair, bed, etc.”, [0065], “The methodology of segmenting an image into grids, and then finding overlaps of parts of a potential object and later merging into a final bounding box is a common methodology and two known examples are the You Only Look Once (YOLO) family algorithms and Single Shot Multibox Detector (SSD).”), wherein
input data of the model (Tan, [0065], “Object recognition is then applied to the processed array of temperature data in the following manner. The array of temperature data is first split into cells in the form of a grid. Each cell is responsible for predicting bounding boxes of potential objects to be recognised and removing boxes with low object probability and matching against a pre-trained model.”) are an image (Tan, [0057], “the processing unit 110 is a multi-input data acquisition module that receive pertinent information on the monitored subject. The thermal sensor 120 and UWB sensor 130 are connected to processing unit come with an on-board processor that is configured to receive data and transmit data to a main server 190.”, [0062], “FIG. 4 illustrates a process 400 performed by the processing unit 110 for receiving data from the thermal sensor 120 and UWB sensor 130 in accordance with an embodiment of this invention”),
output data of the model (Tan, [0065], “With the trained model, the objection recognition is able to find all the objects in the image to draw the bounding box”) include:
likelihood data indicating a likelihood of the one or more target objects being included in each of a plurality of partial regions (Tan, [0065], “the bounding box for the human returns a 1 or 0 and contributes towards counting the total number of people in the thermal sensor 120 field-of-view. The bounding box of the human can also be used to estimate its position in relation to other humans, or with objects such as bed and chair.”, the bounding box is the partial region and the 1 or 0 is the likelihood of a human or target object being in the bounding box) acquired by dividing the image (Tan, [0065], “The methodology of segmenting an image into grids, and then finding overlaps of parts of a potential object and later merging into a final bounding box is a common methodology and two known examples are the You Only Look Once (YOLO) family algorithms and Single Shot Multibox Detector (SSD).”); and
numerical data indicating an estimated number of the at least one target object for the partial region estimated to include the one or more target objects out of the plurality of partial regions (Tan, [0065], “the bounding box for the human returns a 1 or 0 and contributes towards counting the total number of people in the thermal sensor 120 field-of-view”, each bounding box holds a numerical value of 1 or 0 indicating whether a human is inside the box, and these values add up to the total number of target objects in the image).
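Tan [0062], as quoted in the claim 6 mapping above, describes training a model on labeled images with roughly 70-80% of the images used for training and the remaining 20-30% used for verification. As illustration only, that split can be sketched as follows; the function name, seed, and fraction are hypothetical choices, not details from the reference:

```python
# Hypothetical sketch of the training/verification split described in
# Tan [0062]: ~70-80% of labeled images train the model, the rest verify it.
import random

def split_dataset(images, train_fraction=0.8, seed=0):
    """Shuffle the labeled images and split them into train/verify sets."""
    rng = random.Random(seed)  # fixed seed for a reproducible shuffle
    shuffled = images[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Example: 100 labeled images -> 80 for training, 20 for verification.
train, verify = split_dataset(list(range(100)))
print(len(train), len(verify))  # 80 20
```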
Claim 7
Tan discloses the model generation apparatus according to claim 6 (Tan, Fig. 1), wherein the ground truth data include ground truth numerical data indicating a number of the at least one target object in each of a plurality of partial regions acquired by dividing the training image (Tan, [0062], “In order to train a model, a certain number of images are required. Each image includes marked out points and bounding box or boxes with tag depicting a category such as people, hand, leg, torso, head, table, chair, bed, etc.”, “The exact number of images to be used for training a model may be through trial and error and is left to one skilled in the art to determine. Approximately 70-80% of the images are used to train a model while the remaining 20-30% of the images are used to verify the accuracy of the model.”, the labeled data is the ground truth data; Tan, [0065], “The methodology of segmenting an image into grids, and then finding overlaps of parts of a potential object and later merging into a final bounding box is a common methodology and two known examples are the You Only Look Once (YOLO) family algorithms and Single Shot Multibox Detector (SSD). In this application two bounding box for a human is obtained, the first bounding box encompasses the entire human body and the second bounding box for the head of the human body. The bounding box for the human returns a 1 or 0 and contributes towards counting the total number of people in the thermal sensor 120 field-of-view”).
Claim 8
Tan discloses the model generation apparatus according to claim 6 (Tan, Fig. 1), wherein,
the machine learning is performed in such a way that the model outputs a number of the at least one target object in the partial region for the partial region in which the one or more target objects exist (Tan, [0065], “With the trained model, the objection recognition is able to find all the objects in the image to draw the bounding box”, [0065], “the bounding box for the human returns a 1 or 0 and contributes towards counting the total number of people in the thermal sensor 120 field-of-view”).
Claim 11 is rejected for reasons similar to those described for claim 1. The additional element recited in claim 11 is also disclosed by Tan: an estimation method (Tan, Fig. 4).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 2-4 and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Tan in view of Rudzicz et al. (US 2023/0419503 A1, PCT filed on 11/19/2021), hereinafter referred to as Rudzicz.
Claim 2
Tan discloses the estimation apparatus according to claim 1 (Tan, Fig. 1), wherein the output data further include position data indicating an estimated position of the target object for the partial region (Tan, [0065], “The bounding box of the human can also be used to estimate its position in relation to other humans, or with objects such as bed and chair.”) with an estimated number of the at least one target object indicated in the numerical data being equal to 1 or the partial region with the estimated number being equal to or greater than 1 (Tan, [0065], “the bounding box for the human returns a 1 or 0 and contributes towards counting the total number of people in the thermal sensor 120 field-of-view”, each bounding box holds a numerical value of 1 or 0 indicating whether a human is inside the box, and these values add up to the total number of target objects in the image).
Tan does not explicitly disclose wherein the output data further include size data indicating an estimated size of the target object for the partial region with an estimated number of the at least one target object indicated in the numerical data being equal to 1 or the partial region with the estimated number being equal to or greater than 1, and the operations further comprise estimating a position and a size of the target object by using the position data and the size data.
However, Rudzicz teaches wherein the output data further include size data indicating an estimated size of the target object for the partial region with an estimated number of the at least one target object indicated in the numerical data being equal to 1 or the partial region with the estimated number being equal to or greater than 1 (Rudzicz, [0129], “A neural network can detect where a face is in an image 810. The neural network can include a number of linear transformation sub-processes 820, 840, 860, 880. The detection of a face can be represented by a vector 890 of size 5. For example, the vector can include five elements, each describing, respectively: if a face is presented, height in pixels, width in pixels, centre pixel location on x axis, and centre pixel location on y axis. If there is a face, the value stored at the first position can be 1, if there is no face, the value stored at the first position can be 0, and so on.”, the height and the width in pixels are analogous to the size of the target object), and the operations further comprise estimating a position and a size of the target object by using the position data and the size data (Rudzicz, [0046], “the processor is configured to, for one or more regions corresponding to a detected head, hand, or body, compute a bounding box or pixel-level mask for the respective region, a confidence score for the detected head, hand, or body, and compute data indicating the bounding boxes or pixel-level masks, the confidence scores, and the frames of the video data”, the bounding box determines the position of the object within the image; [0129], “The neural network can include a number of linear transformation sub-processes 820, 840, 860, 880. The detection of a face can be represented by a vector 890 of size 5. For example, the vector can include five elements, each describing, respectively: if a face is presented, height in pixels, width in pixels, centre pixel location on x axis, and centre pixel location on y axis. If there is a face, the value stored at the first position can be 1, if there is no face, the value stored at the first position can be 0, and so on”, the height and width in pixels can indicate the size of the object).
Tan and Rudzicz are both considered to be analogous art to the claimed invention because they are in the same field of counting humans in an image. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the estimation apparatus taught by Tan to incorporate the teachings of Rudzicz wherein the output data further include size data indicating an estimated size of the target object for the partial region with an estimated number of the at least one target object indicated in the numerical data being equal to 1 or the partial region with the estimated number being equal to or greater than 1, and the operations further comprise estimating a position and a size of the target object by using the position data and the size data. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been improved efficiency (Rudzicz, [0115]).
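For illustration of the examiner's reading of Rudzicz [0129], the five-element detection vector (presence flag, height in pixels, width in pixels, centre x, centre y) yields both a position and a size for a detected object. The sketch below is hypothetical; the function name and dictionary layout are not taken from the reference:

```python
# Hypothetical sketch of the five-element detection vector described in
# Rudzicz [0129]: [presence (1/0), height_px, width_px, centre_x, centre_y].

def parse_detection(vec):
    """Unpack a detection vector into position and size, or None if no face."""
    presence, height, width, cx, cy = vec
    if presence == 0:
        return None  # first element 0 means no face was detected
    return {"position": (cx, cy), "size": (width, height)}

print(parse_detection([1, 40, 30, 120, 80]))
# {'position': (120, 80), 'size': (30, 40)}
print(parse_detection([0, 0, 0, 0, 0]))  # None
```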
Claim 3
Tan discloses the estimation apparatus according to claim 1 (Tan, Fig. 1), wherein the position data indicate an estimated position of the target object only for the partial region with an estimated number of the at least one target object indicated in the numerical data being equal to 1 (Tan, [0065], “the bounding box for the human returns a 1 or 0 and contributes towards counting the total number of people in the thermal sensor 120 field-of-view. The bounding box of the human can also be used to estimate its position in relation to other humans, or with objects such as bed and chair.”, the bounding box is the partial region and the 1 or 0 is the likelihood of a human or target object being in the bounding box), and the size data indicate an estimated size of the target object only for the partial region with an estimated number of the at least one target object indicated in the numerical data being equal to 1 (Rudzicz, [0046], “the processor is configured to, for one or more regions corresponding to a detected head, hand, or body, compute a bounding box or pixel-level mask for the respective region, a confidence score for the detected head, hand, or body, and compute data indicating the bounding boxes or pixel-level masks, the confidence scores, and the frames of the video data”, the bounding box determines the position of the object within the image; [0129], “The neural network can include a number of linear transformation sub-processes 820, 840, 860, 880. The detection of a face can be represented by a vector 890 of size 5. For example, the vector can include five elements, each describing, respectively: if a face is presented, height in pixels, width in pixels, centre pixel location on x axis, and centre pixel location on y axis. If there is a face, the value stored at the first position can be 1, if there is no face, the value stored at the first position can be 0, and so on”, the height and width in pixels can indicate the size of the object). The proposed combination, as well as the motivation for combining the Tan and Rudzicz references presented in the rejection of claim 2, apply to claim 3 and are incorporated herein by reference. Thus, the apparatus of claim 3 is met by Tan in view of Rudzicz.
Claim 4
Tan discloses the estimation apparatus according to claim 1 (Tan, Fig. 1). Rudzicz teaches wherein the position data indicate an estimated mean position of the one or more target objects for the partial region with an estimated number of the at least one target object indicated in the numerical data being equal to or greater than 1 (Rudzicz, [0046], “the processor is configured to, for one or more regions corresponding to a detected head, hand, or body, compute a bounding box or pixel-level mask for the respective region, a confidence score for the detected head, hand, or body, and compute data indicating the bounding boxes or pixel-level masks, the confidence scores, and the frames of the video data”, the bounding box determines the position of the object within the image; [0129], “The neural network can include a number of linear transformation sub-processes 820, 840, 860, 880. The detection of a face can be represented by a vector 890 of size 5. For example, the vector can include five elements, each describing, respectively: if a face is presented, height in pixels, width in pixels, centre pixel location on x axis, and centre pixel location on y axis. If there is a face, the value stored at the first position can be 1, if there is no face, the value stored at the first position can be 0, and so on”, the centre pixel location is analogous to the mean position of the object, as shown, for example, in Fig. 16 of the Specification, where the open circle marks the center of the person), and the size data indicate an estimated size of a region including the one or more target objects for the partial region with an estimated number of the at least one target object indicated in the numerical data being equal to or greater than 1 (Rudzicz, [0046] and [0129], as quoted above, the height and width in pixels can indicate the size of the object).
Tan and Rudzicz are both considered to be analogous art to the claimed invention because they are in the same field of counting humans in an image. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the estimation apparatus taught by Tan to incorporate the teachings of Rudzicz wherein the position data indicate an estimated mean position of the one or more target objects for the partial region with an estimated number of the at least one target object indicated in the numerical data being equal to or greater than 1. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been improved efficiency (Rudzicz, [0115]).
Claim 9
Tan discloses the model generation apparatus according to claim 8 (Tan, Fig. 1), wherein the machine learning is performed in such a way that the model outputs a position (Tan, [0065], “The bounding box of the human can also be used to estimate its position in relation to other humans, or with objects such as bed and chair.”).
Tan does not explicitly disclose wherein the machine learning is performed in such a way that the model outputs a position and a size of the target object for the partial region with a number of the at least one target object being equal to 1.
However, Rudzicz teaches wherein the machine learning is performed in such a way that the model outputs a position and a size of the target object for the partial region with a number of the at least one target object being equal to 1 (Rudzicz, [0046], “the processor is configured to, for one or more regions corresponding to a detected head, hand, or body, compute a bounding box or pixel-level mask for the respective region, a confidence score for the detected head, hand, or body, and compute data indicating the bounding boxes or pixel-level masks, the confidence scores, and the frames of the video data”, the bounding box determines the position of the object within the image, [0129], “The neural network can include a number of linear transformation sub-processes 820, 840, 860, 880. The detection of a face can be represented by a vector 890 of size 5. For example, the vector can include five elements, each describing, respectively: if a face is presented, height in pixels, width in pixels, centre pixel location on x axis, and centre pixel location on y axis. If there is a face, the value stored at the first position can be 1, if there is no face, the value stored at the first position can be 0, and so on”, the center pixel location is the position, and the width and height indicate the size of the object).
Tan and Rudzicz are both considered to be analogous to the claimed invention because they are in the same field of human counting in an image. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the image signal processor as taught by Tan to incorporate the teachings of Rudzicz wherein the machine learning is performed in such a way that the model outputs a position and a size of the target object for the partial region with a number of the at least one target object being equal to 1. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been for improved efficiency (Rudzicz, [0115]).
Claim 10
Tan discloses the model generation apparatus according to claim 8 (Tan, Fig. 1).
Tan does not explicitly disclose wherein the machine learning is performed in such a way that the model outputs a mean position of the one or more target objects and a size of a region including the one or more target objects for the partial region with a number of the at least one target object being equal to or greater than 1.
However, Rudzicz teaches wherein the machine learning is performed in such a way that the model outputs a mean position of the one or more target objects (Rudzicz, [0046], “the processor is configured to, for one or more regions corresponding to a detected head, hand, or body, compute a bounding box or pixel-level mask for the respective region, a confidence score for the detected head, hand, or body, and compute data indicating the bounding boxes or pixel-level masks, the confidence scores, and the frames of the video data”, the bounding box determines the position of the object within the image, [0129], “The neural network can include a number of linear transformation sub-processes 820, 840, 860, 880. The detection of a face can be represented by a vector 890 of size 5. For example, the vector can include five elements, each describing, respectively: if a face is presented, height in pixels, width in pixels, centre pixel location on x axis, and centre pixel location on y axis. If there is a face, the value stored at the first position can be 1, if there is no face, the value stored at the first position can be 0, and so on”, the center pixel location is analogous to the mean position of the object, as shown, for example, in Fig. 16 of the Specification, where the open circle indicates the center of the person), and a size of a region including the one or more target objects for the partial region with a number of the at least one target object being equal to or greater than 1 (Rudzicz, [0046], as quoted above, the bounding box determines the position of the object within the image, [0129], “The detection of a face can be represented by a vector 890 of size 5. For example, the vector can include five elements, each describing, respectively: if a face is presented, height in pixels, width in pixels, centre pixel location on x axis, and centre pixel location on y axis”, the height and width in pixels can indicate the size of the object).
Tan and Rudzicz are both considered to be analogous to the claimed invention because they are in the same field of human counting in an image. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the image signal processor as taught by Tan to incorporate the teachings of Rudzicz wherein the machine learning is performed in such a way that the model outputs a mean position of the one or more target objects and a size of a region including the one or more target objects for the partial region with a number of the at least one target object being equal to or greater than 1. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been for improved efficiency (Rudzicz, [0115]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Chembakassery Rajendran et al. (US 2023/0267742 A1) discloses dividing the image into a grid and counting the number of persons/objects inside each grid cell (Fig. 3).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENISE G ALFONSO whose telephone number is (571)272-1360. The examiner can normally be reached Monday - Friday 7:30 - 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amandeep Saini can be reached at (571)272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DENISE G ALFONSO/Examiner, Art Unit 2662
/AMANDEEP SAINI/Supervisory Patent Examiner, Art Unit 2662