DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The disclosure is objected to because of the following informalities:
"Teacher data" is recited in Paragraphs [0001]-[0003], [0005]-[0006], and [0175]; examiner believes this to be a mistranslation as the original Japanese text reads: 教師データ, which more accurately translates to "training data"; 教師 by itself translates to "teacher" and データ by itself translates to "data"; as a result, examiner suggests amending "teacher data" to “training data” to avoid unnecessary confusion.Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6, 8, 10-11, 13, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Wong et al. ("SmartAnnotator: An Interactive Tool for Annotating RGBD Indoor Images"), hereinafter referenced as Wong, in view of Sharma et al. (US 10977518 B1), hereinafter referenced as Sharma.
Regarding Claim 1, Wong discloses an information processing apparatus (Wong, [Section 7 Evaluation]: teaches testing a system on a benchmark dataset) comprising:
at least one memory storing instructions (Wong, [Section 7 Evaluation]: teaches testing the system on a benchmark dataset that includes stored RGBD images for the Smart-Annotator program <read on instructions> to process; Note: it should be noted that storing images requires memory); and
at least one processor configured to execute the instructions to (Wong, [Section 7 Evaluation]: teaches testing the system on a benchmark dataset; Note: it should be noted that a system requires a processor):
acquire object configuration information including a definition of each of one or more elements added to an image in order to indicate an object included in the image and a definition of order of the element (Wong, [Section 4.1 Structure Graph]: teaches using and generating a structure graph <read on object configuration information> to encode geometric and structural information from a 3D structure of a scene, where each node indicates a detected object and its spatial and support relationships <read on definition> to the room layout (e.g., pillow on top of bed, bed on top of floor) as shown in FIG. 4; FIG. 4 teaches annotated bounding boxes <read on elements> for each detected object being added in the image, where each bounding box indicates its spatial and support relationships to the other bounding boxes; Note: it should be noted that the structure graph includes levels of a tree data structure, where the levels are being interpreted as the definition of order of an element; an illustrative sketch of this interpretation is provided below, following the rejection of this claim);
[Image: media_image1.png (greyscale PNG)]
provide a user interface being set based on a definition of the element indicated by the object configuration information according to order of the element defined by the object configuration information, in order to accept an input for specifying the element with respect to the object (Wong, [Section 6.2 User Session]: teaches the annotating phase, where an interface <read on user interface> allows a user to interact with the system in which they are allowed to click on an object and proceed with one of four actions: "Confirm", "Re-order", "Type", and "Approve All"; [Section 6.2 User Session]: further teaches the four acceptable actions <read on input>, where "Confirm" allows the user to confirm and lock the system's suggestion, "Re-order" allows the user to fix certain predicted labels <read on specifying element with respect to object> by selecting alternative labels, which would modify the level <read on order of element> of the node in the structure graph <read on object configuration information>, "Type" allows the user to override all labels when none of the suggestions are correct, and "Approve All" allows the user to confirm all labels and finish the annotation process as shown in FIG. 6; FIG. 6 teaches the predicted annotated bounding boxes and labels <read on element> for each detected object in the image); and
[Image: media_image2.png (greyscale PNG)]
[[control in such a way as to store information of the element specified by the input in association with the image.]]
However, Wong does not expressly disclose
control in such a way as to store information of the element specified by the input in association with the image.
Sharma discloses
control in such a way as to store information of the element specified by the input in association with the image (Sharma, [Column 5, Lines 28-37]: teaches a user interface 200 for configuring an annotation job for images, where "a user may provide a name for an annotation job (“name”), an identifier <read on specified element> of the input dataset (the input data elements 138) such as a storage location (e.g., a URL or URI) storing the input data elements 138 (“input dataset location”), and/or an identifier of a location where data generated by the annotation job should be stored (“output dataset location”)" as shown in FIG. 2; Note: it should be noted that "control in such a way as to store information" is being interpreted as a way to control stored information).
[Image: media_image3.png (greyscale PNG)]
Sharma is analogous art with respect to Wong because they are from the same field of endeavor, namely interactive annotation systems. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to incorporate an annotation application user interface that allows a user to specify annotation jobs and task types, and provide instructions with examples to annotators as taught by Sharma into the teaching of Wong. The suggestion for doing so would allow annotators to achieve high quality annotations through a streamlined user interface, thereby enabling faster and more accurate annotations for the neural network to use. Therefore, it would have been obvious to combine Sharma with Wong.
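The following sketch is provided for clarity of record only. It illustrates the examiner's interpretation, discussed in the rejection of Claim 1 above, that each node of Wong's structure graph defines an annotated element (label and bounding box) and that the node's level in the graph supplies the recited "order of the element." It is not part of Wong's disclosure or the applicant's claimed method, and all names, values, and data structures below are assumptions made solely for illustration.

    # Examiner's illustrative sketch (Python) -- not Wong's code or the applicant's method.
    # Each node carries an element definition (label and bounding box) and a support
    # relationship; the node's depth in the structure graph is read as the order of the element.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ElementNode:
        label: str                              # definition of the element (e.g., "pillow")
        box: tuple                              # bounding box added to the image (x, y, w, h)
        parent: Optional["ElementNode"] = None  # support relationship (e.g., pillow -> bed)

        def level(self) -> int:
            # Depth in the structure graph, interpreted as the defined order of the element.
            return 0 if self.parent is None else self.parent.level() + 1

    # Hypothetical example: floor supports bed, bed supports pillow.
    floor = ElementNode("floor", (0, 300, 640, 180))
    bed = ElementNode("bed", (120, 200, 300, 180), parent=floor)
    pillow = ElementNode("pillow", (150, 190, 80, 40), parent=bed)
    print(pillow.level())  # 2 -> ordered after "bed" (level 1) and "floor" (level 0)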
Regarding Claim 16, it recites limitations that are similar in scope to those of Claim 1, but in the form of an information processing method. As shown in the rejection above, the combination of Wong and Sharma discloses the limitations of Claim 1. Additionally, Wong discloses an information processing method (Wong, [Section 6 Annotating Phase]: teaches an annotation process <read on information processing method>) comprising:…
Thus, Claim 16 is met by the combination of Wong and Sharma according to the mapping presented in the rejection of Claim 1, given that the information processing apparatus corresponds to an information processing method.
Regarding Claim 17, it recites limitations that are similar in scope to those of Claim 1, but in the form of a non-transitory computer-readable medium. As shown in the rejection above, the combination of Wong and Sharma discloses the limitations of Claim 1. Additionally, Wong discloses a non-transitory computer-readable medium storing a program causing a computer to execute (Wong, [Section 7 Evaluation]: teaches testing the system on a benchmark dataset that includes stored RGBD images for the Smart-Annotator program to process; Note: it should be noted that storing images requires memory <read on non-transitory computer-readable medium>):…
Thus, Claim 17 is met by the combination of Wong and Sharma according to the mapping presented in the rejection of Claim 1, given that the information processing apparatus corresponds to a non-transitory computer-readable medium.
Regarding Claim 2, the combination of Wong and Sharma discloses the information processing apparatus of Claim 1. Additionally, Wong further discloses wherein the processor is further configured to execute the instructions to:
further acquire task configuration information indicating a set of one or more objects (Wong, [Section 6.2 User Session]: teaches the annotating phase, where the interface displays drawn cuboids over each detected object and shows the first label in the ordered suggestions <read on acquired task configuration information> on top of each detected object as shown in FIG. 2; Note: it should be noted that Paragraph [0094] of the specification states that the task configuration information includes Object-Order data), and
[Image: media_image4.png (greyscale PNG)]
provide the user interface with reference to the object configuration information of any object included in a set indicated by the task configuration information (Wong, [Section 6.2 User Session]: teaches the annotating phase, where the interface <read on user interface> displays drawn cuboids over each detected object <read on reference to object configuration information> and shows the first label in the ordered suggestions <read on indicated by task configuration information> on top of each detected object).
Regarding Claim 3, the combination of Wong and Sharma discloses the information processing apparatus of Claim 2. Additionally, Wong further discloses wherein the processor is further configured to execute the instructions to
provide the user interface with reference to the object configuration information selected according to an instruction (Wong, [Section 6.2 User Session]: teaches the annotating phase, where the interface <read on user interface> displays drawn cuboids over each detected object <read on reference to object configuration information> and shows <read on instruction> the first label in the ordered suggestions on top of each detected object, and where each cuboid is selectable).
Regarding Claim 4, the combination of Wong and Sharma discloses the information processing apparatus of Claim 2. Additionally, Wong further discloses wherein the task configuration information includes
a definition of order of a plurality of objects included in the set (Wong, [Section 4.1 Structure Graph]: teaches using and generating a structure graph <read on object configuration> to encode geometric and structural information from a 3D structure of a scene, where each node indicates a detected object and its spatial and support relationships <read on definition> to the room layout (e.g., pillow on top of bed, bed on top of floor) as shown in FIG. 4), and
the processor is further configured to execute the instructions to provide the user interface with reference to the object configuration information selected according to order of the plurality of objects defined in the task configuration information (Wong, [Section 6.2 User Session]: teaches the annotating phase, where the interface <read on user interface> displays drawn cuboids over each detected object <read on reference to object configuration information> and shows the first label in the ordered suggestions <read on task configuration information> on top of each detected object).
Regarding Claim 5, the combination of Wong and Sharma discloses the information processing apparatus of Claim 1. Additionally, Wong further discloses wherein
the object configuration information further defines a constraint condition of arrangement of the element on an image (Wong, [Section 4 Modeling Scene Structure]: teaches denoting the initial segmentation of detected objects in an input image using a segmentation set S = {s_1, ..., s_Ns}, where each segment s_i = {I_i, X_i, p_i, n_i} encodes the image pixels, 3D points, 3D plane, and normal to the 3D plane, respectively, and where "segments s_i and s_j are parallel (orthogonal) <read on arrangement of element> if the angle between n_i and n_j is within (above) a tolerance angle of a_T (90 - a_T)"; [Section 4 Modeling Scene Structure]: further teaches defining "s_i and s_j are coplanar if s_i and s_j are parallel and distance between them is within a threshold of d_T <read on constraint condition>"; Note: it should be noted that the segmentation set is being interpreted as representing the initial bounding boxes <read on element> of detected objects in the input image, where the structure graph provides an order for how the elements should be arranged as shown in FIG. 4), and
the processor is further configured to execute the instructions to support an input for specifying the element, based on the constraint condition (Wong, [Section 4.1 Structure Graph]: teaches using and generating a structure graph <read on object configuration> to encode geometric and structural information from a 3D structure of a scene, where each node indicates a detected object and its spatial and support relationships to the room layout (e.g., pillow on top of bed, bed on top of floor) as shown in FIG. 4; [Section 4.1 Structure Graph]: further teaches the spatial and support relationships of a cuboid c_i being defined as contacting a wall or floor, respectively, if the distance between its back face and the nearest wall, or the distance between the top face of c_i and the bottom face of c_j, is within a threshold d_T <read on constraint condition>; [Section 6.2 User Session]: teaches Re-order, where the user rectifies <read on input> the predicted label by selecting an alternative label among the suggestions <read on specifying element> in the popup menu, thereby modifying the structure graph).
Regarding Claim 6, the combination of Wong and Sharma discloses the information processing apparatus of Claim 5. Additionally, Wong further discloses wherein
the constraint condition restricts a relative relationship of a plurality of the elements on an image (Wong, [Section 5.1 Learning Priors]: teaches a spatial model, where in order to model spatial relationships of an object with respect to the room layout, the most commonly used metrics are measuring the objects' relative distance and orientation to the walls of the room; [Section 5.1 Learning Priors]: further teaches encoding the relationships to the walls as spatial constraints <read on constraint condition of relative relationship>, which are used to guide the local refinement, such as adjusting cuboids <read on element>; [Section 6 Annotating Phase]: teaches the learned models of the geometric and structural priors and the spatial and support constraints are used to facilitate smart annotation of an image).
Regarding Claim 8, the combination of Wong and Sharma discloses the information processing apparatus of Claim 5. Additionally, Wong further discloses wherein, the processor is further configured to execute the instructions to,
when an accepted input does not satisfy the constraint condition, correct a position of the element specified by the input to a position satisfying the constraint condition (Wong, FIG. 6 teaches when the initial cuboids of detected objects are incorrectly labeled and/or placed <read on accepted input not satisfying constraint condition>, the user re-ordering objects <read on input> to improve the dimension and orientation of the bounding box/cuboid of the detected objects, which is done to resolve the ambiguity that would arise from nearby detected objects, such as correcting the cuboid of a pillow <read on correcting position of element>).
Regarding Claim 10, the combination of Wong and Sharma discloses the information processing apparatus of Claim 1. Additionally, Wong further discloses wherein the processor is further configured to execute the instructions to
accept correction of information of the element specified by the input (Wong, FIG. 6 teaches the user re-ordering a nightstand to a bed <read on input>, where the system performs local refinements <read on correction of information> to improve the dimension and orientation of the bed's cuboid/bounding box).
Regarding Claim 11, the combination of Wong and Sharma discloses the information processing apparatus of Claim 1. Wong does not expressly disclose the limitations of Claim 11; however, Sharma discloses wherein the processor is further configured to execute the instructions to
output guide information guiding an input for specifying the element for the object (Sharma, FIG. 5 teaches a user interface with instructions that explains to the user to draw a box <read on input specifying element> around any automobile <read on object> shown in the provided image, where good, bad, and edge case examples <read on guide information> are shown <read on output>).
[Image: media_image5.png (greyscale PNG)]
Sharma is analogous art with respect to Wong because they are from the same field of endeavor, namely interactive annotation systems. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to incorporate an annotation application user interface that allows a user to specify annotation jobs and task types, and provide instructions with examples to annotators as taught by Sharma into the teaching of Wong. The suggestion for doing so would allow annotators to achieve high quality annotations through a streamlined user interface, thereby enabling faster and more accurate annotations for the neural network to use. Therefore, it would have been obvious to combine Sharma with Wong.
Regarding Claim 13, the combination of Wong and Sharma discloses the information processing apparatus of Claim 1. Additionally, Wong further discloses wherein the processor is further configured to execute the instructions to
generate the object configuration information, based on an accepted input (Wong, FIG. 6 teaches the user re-ordering a nightstand to a bed <read on accepted input>, where the system performs local refinements to improve the dimension and orientation of the bed's cuboid/bounding box by modifying the structure graph <read on object configuration information>), and
acquire the generated object configuration information (Wong, FIG. 6 teaches the user re-ordering a nightstand to a bed, where the system performs local refinements to improve the dimension and orientation of the bed's cuboid/bounding box by modifying the structure graph <read on acquiring object configuration information>).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Wong et al. ("SmartAnnotator: An Interactive Tool for Annotating RGBD Indoor Images"), hereinafter referenced as Wong, in view of Sharma et al. (US 10977518 B1), hereinafter referenced as Sharma, as applied to Claim 5 above, and further in view of Aoshima et al. (US 20240370571 A1), hereinafter referenced as Aoshima.
Regarding Claim 7, the combination of Wong and Sharma discloses the information processing apparatus of Claim 5. Additionally, Wong further discloses wherein the processor is further configured to execute the instructions to
[[output a warning when]] an accepted input does not satisfy the constraint condition (Wong, FIG. 6 teaches when the initial cuboids of detected objects are incorrectly labeled and/or placed <read on accepted input not satisfying constraint condition>, the user re-ordering objects <read on input> to improve the dimension and orientation of the bounding box/cuboid of the detected objects, which is done to resolve the ambiguity that would arise from nearby detected objects, such as correcting the cuboid of a pillow).
However, the combination of Wong and Sharma does not expressly disclose
output a warning when an accepted input does not satisfy the constraint condition.
Aoshima discloses
output a warning when an accepted input does not satisfy the constraint condition (Aoshima, [0130]: teaches a vulnerability verification unit 133 verifying "whether a specification of an annotation is satisfied <read on constraint condition> using an execution context extracted by the context extraction unit 132," such that "when it cannot be determined that the specification of the annotation is not satisfied, a warning is output").
Aoshima is analogous art with respect to Wong, in view of Sharma because they are from the same field of endeavor, namely handling annotations. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement a verification unit to verify the specification of annotations as taught by Aoshima into the teaching of Wong, in view of Sharma. The suggestion for doing so would inform annotators when improper annotation data is present, thereby resulting in error-free training data. Therefore, it would have been obvious to combine Aoshima with Wong, in view of Sharma.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Wong et al. ("SmartAnnotator: An Interactive Tool for Annotating RGBD Indoor Images"), hereinafter referenced as Wong, in view of Sharma et al. (US 10977518 B1), hereinafter referenced as Sharma, as applied to Claim 1 above, and further in view of Dasgupta et al. (US 10719301 B1), hereinafter referenced as Dasgupta.
Regarding Claim 9, the combination of Wong and Sharma discloses the information processing apparatus of Claim 1. Additionally, Wong further discloses wherein the object configuration information includes
a definition indicating whether the element is essential to be added to an image (Wong, [Section 6.1 Label Prediction]: teaches adding a label with the highest cost <read on definition of essential element> to the suggestions, which can then be added to an image),
the object configuration information defines order of the element being essential to be added (Wong, [Section 6.1 Label Prediction]: teaches adding a label <read on essential element> with the highest cost to the suggestions, which can then be added to an image, where "the suggestions are built until all the labels in parent suggestions are visited in order," which is the structure graph), and
the processor is further configured to execute the instructions to sequentially provide a user interface being set based on a definition of the element being essential to be added, according to the order (Wong, [Section 6.2 User Session]: teaches the annotating phase, where the interface <read on user interface> displays drawn cuboids over each detected object and shows <read on instruction> the first label in the ordered suggestions on top of each detected object, and where each cuboid is selectable), and
[[provide a user interface associated to a definition of the element not being essential to be added, in response to an instruction to switch a user interface.]]
However, the combination of Wong and Sharma does not expressly disclose
provide a user interface associated to a definition of the element not being essential to be added, in response to an instruction to switch a user interface.
Dasgupta discloses
provide a user interface associated to a definition of the element not being essential to be added, in response to an instruction to switch a user interface (Dasgupta, [Column 48, Lines 38-44]: teaches a query interface <read on user interface associated to definition of element>, which is a part of the model diagnosis interface 148, that includes user control elements that "allow users to alter the query sample, for example to add or remove one or more elements <read on definition of non-essential element> in the sample, and easily rerun the query"; FIG. 1 teaches the model diagnosis interface 148 and the media data management interface 142 being two separate user interfaces, which would require UI switching; Note: it should be noted that one skilled in the art would understand that switching between interfaces would require a computer instruction).
[Image: media_image6.png (greyscale PNG)]
Dasgupta is analogous art with respect to Wong, in view of Sharma because they are from the same field of endeavor, namely handling annotations via an interactive annotation system. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement a query interface that allows users to modify annotation data elements in a training dataset as taught by Dasgupta into the teaching of Wong, in view of Sharma. The suggestion for doing so would allow for detailed reviews of the dataset, thereby allowing annotators to remove unwanted or mislabeled data, which can then be used as good, bad, and edge case examples for future annotation tasks. Therefore, it would have been obvious to combine Dasgupta with Wong, in view of Sharma.
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Wong et al. ("SmartAnnotator: An Interactive Tool for Annotating RGBD Indoor Images"), hereinafter referenced as Wong, in view of Sharma et al. (US 10977518 B1), hereinafter referenced as Sharma, as applied to Claim 11 above, and further in view of Ashikawa et al. (US 20140046735 A1), hereinafter referenced as Ashikawa.
Regarding Claim 12, the combination of Wong and Sharma discloses the information processing apparatus of Claim 11. Wong does not expressly disclose the limitations of Claim 12; however, Sharma discloses wherein the processor is further configured to execute the instructions to
manage, for each user, progress of an input for specifying the element for the object (Sharma, [Column 3, Lines 54-56]: teaches an annotation service 110 "that can run annotation jobs on behalf of users"; [Column 4, Lines 40-43]: teaches "a user 102 may seek to run an annotation job <read on manage progress> via the annotation service 110, e.g., to generate labeled data elements that may be used, for example, by the user 102 to later train a machine learning model"; [Column 11, Lines 57-60]: teaches "the updated or adapted job instructions can be provided at circle (9) to annotators working on the annotation job, and the process may continue as described (potentially updating the job instructions multiple times <read on progress of input>)"; FIG. 5 teaches a user interface with instructions that explains to the user to draw a box <read on input specifying element> around any automobile shown in the provided image), and
output the guide information for a user[[, based on a work pace of the user determined from the progress]] (Sharma, FIG. 5 teaches a user interface with instructions that explains to the user to draw a box around any automobile shown in the provided image, where good, bad, and edge case examples <read on guide information> are shown <read on output>).
Sharma is analogous art with respect to Wong because they are from the same field of endeavor, namely interactive annotation systems. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to incorporate an annotation application user interface that allows a user to specify annotation jobs and task types, and provide instructions with examples to annotators as taught by Sharma into the teaching of Wong. The suggestion for doing so would allow annotators to achieve high quality annotations through a streamlined user interface, thereby enabling faster and more accurate annotations for the neural network to use. Therefore, it would have been obvious to combine Sharma with Wong.
However, the combination of Wong and Sharma does not expressly disclose
output the guide information for a user, based on a work pace of the user determined from the progress.
Ashikawa discloses
output the guide information for a user, based on a work pace of the user determined from the progress (Ashikawa, [0055]: teaches the system determining the current pace of progress toward a user's completion time).
Ashikawa is analogous art with respect to Wong, in view of Sharma because they are from the same field of endeavor, namely handling delegated tasks. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement an evaluation calculation device to monitor and evaluate the pace of progress of a worker, such as an annotator as taught by Ashikawa into the teaching of Wong, in view of Sharma. The suggestion for doing so would allow the system to associate an appropriate reward amount for the annotators, thereby optimizing workload and task accuracy, resulting in more accurate training datasets without exhausting the annotators. Therefore, it would have been obvious to combine Ashikawa with Wong, in view of Sharma.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Bahrami et al. (US 20200364044 A1) discloses automatic annotation configuration for API formats;
Chandler et al. (US 20220262100 A1) discloses creating one or more annotated perception inputs;
Duffy (US 20170075958 A1) discloses a system for user identification of a desired document;
Irshad et al. (US 20190362186 A1) discloses an assisted image annotation system;
Kuo et al. (US 20240257510 A1) discloses an object localization network (OLN) which is used to localize objects in an instance of vision data;
Pinnamaneni et al. (US 20220222417 A1) discloses creating, organizing, viewing, and connecting annotations of web documents within web browsers;
Shen et al. (US 20200019799 A1) discloses an annotation system that provides tools for facilitating training data annotation; and
Deneke et al. ("A Multi-Agent Approach to Simulate Collaborative Classroom Activities Using Digital Humans") discloses a classroom activity simulation program that utilizes an annotation logger.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KARL TRUONG whose telephone number is (703)756-5915. The examiner can normally be reached 10:30 AM - 7:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kent Chang, can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.D.T./Examiner, Art Unit 2614
/KENT W CHANG/Supervisory Patent Examiner, Art Unit 2614