Prosecution Insights
Last updated: April 19, 2026
Application No. 18/684,045

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM

Status: Non-Final Office Action (§101, §103)
Filed: Feb 15, 2024
Examiner: HAUSMANN, MICHELLE M
Art Unit: 2671
Tech Center: 2600 — Communications
Assignee: Sony Group Corporation
OA Round: 1 (Non-Final)

At a glance: 76% grant probability (favorable), 1-2 expected OA rounds, 3y 1m to grant, 98% with interview.

Examiner Intelligence

Career allow rate: 76% (658 granted / 863 resolved), +14.2% vs Tech Center average (above average).
Interview lift: strong, +21.6% higher allowance among resolved cases with an interview.
Typical timeline: 3y 1m average prosecution; 23 applications currently pending.
Career history: 886 total applications across all art units.
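
These headline figures are simple ratios over the examiner's resolved cases, and the projections section below states that grant probability is derived from the career allow rate. A minimal worked check (Python; illustrative only, and treating the interview lift as additive is an assumption, not documented tool behavior):

# Reproducing the dashboard's headline arithmetic (illustrative sketch).
granted = 658                  # grants by this examiner
resolved = 863                 # resolved dispositions (granted + abandoned)
allow_rate = granted / resolved               # 0.762 -> shown as "76%"
interview_lift = 0.216                        # the +21.6% lift shown above
with_interview = allow_rate + interview_lift  # 0.978 -> shown as "98%"
print(f"{allow_rate:.1%} base, {with_interview:.1%} with interview")

The numbers are self-consistent: 658/863 = 76.2%, and 76.2% + 21.6% = 97.8%, which rounds to the 98% "with interview" badge in the header.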

Statute-Specific Performance

§101: 14.6% (-25.4% vs TC avg)
§103: 61.2% (+21.2% vs TC avg)
§102: 5.7% (-34.3% vs TC avg)
§112: 10.1% (-29.9% vs TC avg)

Tech Center averages are estimates. Based on career data from 863 resolved cases.
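
All four comparisons appear to share a single Tech Center baseline rather than per-statute averages: subtracting each delta from the examiner's rate recovers 40.0% in every row. A quick check (Python; illustrative):

# Each statute's rate minus its delta recovers one shared baseline (~40%).
rates  = {"§101": 14.6, "§103": 61.2, "§102": 5.7, "§112": 10.1}
deltas = {"§101": -25.4, "§103": 21.2, "§102": -34.3, "§112": -29.9}
for statute in rates:
    baseline = round(rates[statute] - deltas[statute], 1)
    print(statute, baseline)   # prints 40.0 for all four statutes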

Office Action

Rejections: §101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “An information processing apparatus comprising a control unit, the control unit being configured to: estimate…” in claim 1.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. The publication of the specification indicates “The control unit 250 integrally controls the operation of the terminal device 200 using, for example, a CPU, a graphics processing unit (GPU), and a RAM built in the terminal device 200. For example, the control unit 250 causes the display unit 230 to display a video received from the information processing apparatus 100” ([0097]) “The control unit 130 integrally controls the operation of the information processing apparatus 100 using, for example, a CPU, a graphics processing unit (GPU), and a RAM, provided in the information processing apparatus 100. For example, the control unit 130 is implemented by a processor executing various programs stored in the storage device inside the information processing apparatus 100 using a random access memory (RAM) or the like as a work area. Note that the control unit 130 may be realized by an integrated circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA). Any of the CPU, the MPU, the ASIC, and the FPGA can be regarded as a controller” ([0104]), therefore this is interpreted as the corresponding structure.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim recites an information processing apparatus comprising a control unit, the control unit being configured to: estimate a person region including a user in distance information generated by a distance measuring device provided in a device used by the user, the person region being estimated based on a user posture estimated using a sensor provided in the device; and update environment information around the user based on the person region and the distance information.

The limitation of “estimate a person region”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “processing apparatus,” nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the “processing apparatus” language, “estimate” in the context of this claim encompasses the user manually estimating how big a region a user takes up based on images or gyroscope data. Similarly, the limitation of update environment information around the user based on the person region and the distance information, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. For example, but for the “processing apparatus” language, “update” in the context of this claim encompasses the user indicating there is an obstacle near a person. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

This judicial exception is not integrated into a practical application. In particular, the claim only recites one additional element – using a processing apparatus to perform both the estimating and updating steps. The processing apparatus in both steps is recited at a high-level of generality (i.e., as a generic processing apparatus performing a generic computer function of estimating) such that it amounts no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a processing apparatus to perform both the estimating and updating steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. A person can “assign” a value by just saying a pixel seems more or less reliable.

Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. A person can make a very coarse “occupancy map” to show a person and no person area.

Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. A person can say measurements they are taking that are close to them will be more accurate than measurements further away.

Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. A person can look at readings from different devices to determine where a person is.

Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. A person can tell if a person is sitting or standing.

Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. A person can assume a basic person shape has a basic skeleton shape as humans all have a skeleton.

Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. A person can read data from a hand held device.

Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. A person can make a judgement based on a small area or a whole room or a building.

Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. A person can “assign” a value of unknown if they can’t tell if an arm is in a region or not.

Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. A person can use information about a plane (the ground they are standing on for instance) to help determine an area with a person.

Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Wearing an HMD is insignificant post-solution activity.

Claim 13 is rejected under 35 U.S.C. 101 for the same reasons as claim 1. Claim 14 is rejected under 35 U.S.C. 101 for the same reasons as claim 1.

35 U.S.C. 101 requires that a claimed invention must fall within one of the four eligible categories of invention (i.e. process, machine, manufacture, or composition of matter) and must not be directed to subject matter encompassing a judicially recognized exception as interpreted by the courts. MPEP 2106.
The four eligible categories of invention include: (1) process which is an act, or a series of acts or steps, (2) machine which is an concrete thing, consisting of parts, or of certain devices and combination of devices, (3) manufacture which is an article produced from raw or prepared materials by giving to these materials new forms, qualities, properties, or combinations, whether by hand labor or by machinery, and (4) composition of matter which is all compositions of two or more substances and all composite articles, whether they be the results of chemical union, or of mechanical mixture, or whether they be gases, fluids, powders or solids. MPEP 2106(I).

Claim 14 is also rejected under 35 U.S.C. 101 as not falling within one of the four statutory categories of invention because the claimed invention is directed to computer program per se. See MPEP 2106(I). A claim directed toward a non-transitory computer-readable medium having the program encoded thereon establishes a sufficient functional relationship between the program and a computer so as to remove it from the realm of “program per se”. MPEP 2111.05(III). Hence, adding the limitation of “stored on a non-transitory computer-readable medium” would resolve this issue.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 7-10, and 12-14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Johnson et al. (US 20210124412 A1).

Regarding claims 1, 13, and 14, Johnson et al. disclose an information processing apparatus comprising a control unit, the control unit being configured to, an information processing method comprising, and program causing a computer to function as a control unit executing:

estimate a person region including a user in distance information generated by a distance measuring device provided in a device used by the user (“The system can modify how the warning generation operates based on the detected modality. For example, the system can decrease the sensitivity or obtrusiveness of the guardian as the user moves into less active modalities (e.g., sitting or lying), such as to decrease the distance threshold from obstacles at which to trigger the guardian when the user is lying down, or change how the warning is used to warn about objects in the periphery of the user's field of view or behind the user. The system can modify a region of the user's field of view that the guardian occupies depending on the modality, such as to expand the guardian if the user is standing or moving”, [0007], “The system can modify how the warning generation operates based on the detected modality. For example, the system can decrease the sensitivity or obtrusiveness of the guardian as the user moves into less active modalities, such as to decrease the distance threshold from obstacles at which to trigger the guardian when the user is lying down, or change how the warning is used to warn about objects in the periphery of the user's field of view or behind the user. The system can modify a region of the user's field of view that the guardian occupies depending on the modality, such as to expand the guardian if the user is standing up”, [0024], “The skeletal inference model 152 can include parameters such as height, reach, limb lengths, and distances between the head and hands in different poses or arrangements to infer the skeletal pose. The skeletal inference model 152 can use the sensor data to evaluate the parameters in order to infer the skeletal pose. The skeletal inference model 152 can include a statistical or probabilistic model that assigns a confidence score to each of a variety of expected skeletal poses using the sensor data (e.g., using positions or orientations of the head and hands as well as parameters that are determined based on the sensor data, such as distances between the hands or between each hand and the head)”, [0062], “In some embodiments, the warning generator 160 uses a distance between the position of the user (e.g., as determined by motion detector 120) and the position of the obstacle to determine an angular range corresponding to the obstacle, such that as the distance decreases, the angle increases. For example, the warning generator 160 can determine an angular range corresponding to an angular size of an appearance of the obstacle within the FOV of the user. The warning generator 160 can assign a weight to each pixel representing the obstacle based on the distance between the position of the user and the position of the obstacle (e.g., greater weight as the distance decreases)”, [0098], “At 325, a warning is generated regarding potential collisions of the user with one or more obstacles in an environment around the user using the type of the pose of the user. Collision data regarding the obstacles can be determined using various sensor data, such as collision data indicating distances to object”, warning can be generated based on comparing the collision data to respective thresholds, [0130]) [“guardian” interpreted as “person region”],

the person region being estimated based on a user posture estimated using a sensor provided in the device (When in use, the machine learning model can use position data from the headset and controllers and the skeletal inference model to predict the modality. The system can calibrate to specific users using a body model that generates confidence over time regarding the user's positioning (e.g., height, reach, etc.). The system can detect transitions between modalities, which can enable more effective triggering of changes of how the guardian operates. For example, the machine learning model can be trained using training data that identifies modality transitions (e.g., when a user moves from standing up to sitting down), and thus in operation output an indication of a modality transition, [0023], “The processing circuitry 116 can include a pose detector 148. The pose detector 148 can include any function, operation, routine, logic, or instructions to perform functions such as detecting a pose that the user is oriented in responsive to sensor data from the sensors 104, motion data from the motion detector 120, or various combinations thereof. For example, the pose detector 148 can receive the sensor data from position sensors (e.g., accelerometer data, gyroscope data, camera data, head tracking position sensor 220, hand tracking position sensor 228 described with reference to FIG. 2, or any combination thereof), including receiving the sensor data as motion data from the motion detector 120. As such, the pose detector 148 can detect the pose of the user using data from various stages of processing by the system 100, including using accelerometer data (e.g., position, velocity, or acceleration data), gyroscope data (e.g., angular data, orientation data), or camera data (e.g., image data) that may or may not be processed by the motion detector 120, hand tracker 124, or head tracker 128. The sensor data can include six degree of freedom (DOF) transform data from any of the sensors or trackers. The pose detector 148 can use the received data to categorize the pose of the user based on a set of types (e.g., modalities), such as standing, sitting, or lying down. For example, the pose detector 148 can determine the pose to indicate various aspects of how the user is oriented, such as distance of head or hands from each other, the ground, or other landmarks, angles of hands relative to various planes of the body, or orientation of the head, and can use this information to assign a type to the pose, which may be a particular type of a predetermined set of types. The pose detector 148 can provide an indication of the type of the pose to warning generator 160, which, as described below, can cause the warning generator 160 to modify how the warning generator 160 operates based on the detected type”, [0056]); and

update environment information around the user based on the person region and the distance information (“The locations of the one or more obstacles can be detected by adjusting locations of the one or more obstacles as maintained in the obstacle database (or as determined via depth mapping) based on at least one of a position or an orientation of the user. For example, the at least one of the position or the orientation of the user can be periodically sampled (e.g., based on sensor data from position sensors of the VR system, thus taking into account movement of the user) and compared to the frame of reference of the locations of the one or more obstacles to determine the locations of obstacles relative to the user”, [0054], “In some embodiments, the warning generator 160 processes the collision data received from the collision detector 132 to determine an extent of the one or more obstacles in the FOV. For example, the warning generator 160 can evaluate the collision data on a more granular basis than determining whether to generate a warning on a single obstacle basis, such as by evaluating the collision data on a pixel-by-pixel or groups of pixel basis. For example, the warning generator 160 can identify multiple subsets of one or more pixels that correspond to a selected obstacle of the one or more obstacles. The warning generator 160 can identify collision data corresponding to each subset of one or more pixels, and evaluate each subset to determine whether the collision data corresponding to the subset indicates a warning should be generated (e.g., for a particular subset of pixels, determine whether the subset of pixels satisfies the corresponding thresholds). Responsive to determining to generate a warning for the subset, the warning generator 160 can assign the subset of pixels as the extent of the obstacle”, [0097], “As described above, the warning can be generated at edge or edge portions of the FOV of the HMD, such as if the one or more obstacles are to the side of or behind the user, and can be modified responsive to the type of the pose of the user. The warning can be displayed with each image that is displayed by the HMD, or can be displayed at a different frame rate that the images displayed by the HMD. The warning can be displayed using depth map data regarding the one or more obstacles. The warning can be presented as an overlay in the virtual environment displayed using the HMD.”, [0137]) [environment interpreted as obstacles that could collide with the person].

Johnson et al. does not use the exact term “person region”. As Johnson et al. determines a person’s pose (sitting, lying, standing, moving, [0007]) and from that determine the “guardian” which appears to be a zone in which a person is notified of objects with the potential for collisions near them (see for instance: “The system can modify how the warning generation operates based on the detected modality. For example, the system can decrease the sensitivity or obtrusiveness of the guardian as the user moves into less active modalities (e.g., sitting or lying), such as to decrease the distance threshold from obstacles at which to trigger the guardian when the user is lying down, or change how the warning is used to warn about objects in the periphery of the user's field of view or behind the user. The system can modify a region of the user's field of view that the guardian occupies depending on the modality, such as to expand the guardian if the user is standing or moving”, [0007]), it would have been obvious at the time of filing to one of ordinary skill in the art that this “guardian” area can be interpreted as a “person region”.

Regarding claim 7, Johnson et al. disclose the information processing apparatus according to claim 1. Johnson et al. further indicate the control unit: estimates skeleton information indicating the user posture, and estimates the person region using the skeleton information (“Sensor data such as accelerometer data, gyroscope data, camera data, or any combination thereof, may be directly used to determine the type of the pose, or may be processed through various portions of a pose estimation (e.g., positional tracking) pipeline, such as through a hand tracker associated with a controller manipulated by a hand of the user or a head tracker associated with a head device (e.g., HMD, HWD) on the head of the user. Various such systems and methods can use skeletal inference models to accurately determine an expected skeletal pose of the user (e.g., using reference points corresponding to the head and one or more hands of the user), and provide the expected skeletal pose to one or more machine learning models trained to categorize the expected skeletal pose into one or more types of poses.”, [0020], “The pose models 152 can include one or more skeletal inference models 152. The skeletal inference model 152 can receive the reference points as input and output an expected skeletal pose of the user. The pose detector 148 can categorize the type of the pose of the user using the expected skeletal pose, or can further process the expected skeletal pose (e.g., using templates or other models 152) to determine the pose which can then be categorized”, [0061]).

Regarding claim 8, Johnson et al. disclose the information processing apparatus according to claim 1. Johnson et al. further indicate the control unit estimates an arm of the user as the person region according to a position of a second device gripped by the user (“Sensor data such as accelerometer data, gyroscope data, camera data, or any combination thereof, may be directly used to determine the type of the pose, or may be processed through various portions of a pose estimation (e.g., positional tracking) pipeline, such as through a hand tracker associated with a controller manipulated by a hand of the user or a head tracker associated with a head device (e.g., HMD, HWD) on the head of the user. Various such systems and methods can use skeletal inference models to accurately determine an expected skeletal pose of the user (e.g., using reference points corresponding to the head and one or more hands of the user), and provide the expected skeletal pose to one or more machine learning models trained to categorize the expected skeletal pose into one or more types of poses.”, [0020], The hand tracking sensors 104 can generate motion data including at least one of a position, a velocity, or an acceleration of a respective hand (e.g., of a hand device 224 manipulated by the hand as described with reference to FIG. 2), [0033], “For example, the pose detector 148 can receive the sensor data from position sensors (e.g., accelerometer data, gyroscope data, camera data, head tracking position sensor 220, hand tracking position sensor 228 described with reference to FIG. 2, or any combination thereof), including receiving the sensor data as motion data from the motion detector 120. As such, the pose detector 148 can detect the pose of the user using data from various stages of processing by the system 100, including using accelerometer data (e.g., position, velocity, or acceleration data), gyroscope data (e.g., angular data, orientation data), or camera data (e.g., image data) that may or may not be processed by the motion detector 120, hand tracker 124, or head tracker 128. The sensor data can include six degree of freedom (DOF) transform data from any of the sensors or trackers. The pose detector 148 can use the received data to categorize the pose of the user based on a set of types (e.g., modalities), such as standing, sitting, or lying down. For example, the pose detector 148 can determine the pose to indicate various aspects of how the user is oriented, such as distance of head or hands from each other, the ground, or other landmarks, angles of hands relative to various planes of the body, or orientation of the head, and can use this information to assign a type to the pose, which may be a particular type of a predetermined set of types”, [0056]).

Regarding claim 9, Johnson et al. disclose the information processing apparatus according to claim 1. Johnson et al. further indicate the control unit updates the environment information using person region environment information generated based on the distance information in the person region and surrounding environment information generated based on the distance information in an entire region (The environment can be an indoor or outdoor environment, including both natural and man-made structures, terrain, or other objects, including sky, clouds, roads, buildings, streets, pedestrians, or cyclists. The environment can include one or more objects (e.g., real-world objects), which can be represented by the images 112 captured by the sensors, [0034], “For example, the warning generator 160 can determine a size metric of the play space (e.g., the real world environment, such as the room, in which the user is operating the HMD). The warning generator 160 can determine the size metric based on boundaries maintained by the obstacle tracker 140 regarding the one or more obstacles, including obstacles such as walls, floors, and ceilings. The warning generator 160 can decrease the effect of velocity or acceleration data as the size metric decreases (e.g., in smaller spaces the user may be more aware of their surroundings) and increase the effect of velocity or acceleration data as the size metric increases (e.g., in larger spaces the user may be less aware of their surroundings and may make larger motions). For example, the warning generator 160 can assign a greater weight to velocity or acceleration data as the size metric increases”, [0087]).

Regarding claim 10, Johnson et al. disclose the information processing apparatus according to claim 9. Johnson et al. further indicate the environment information, the person region environment information, and the surrounding environment information are occupancy grid maps, and the control unit updates the environment information by changing an occupancy state, to an unknown state, of a grid corresponding to the person region environment information among grids of the surrounding environment information (For example, some traditional techniques construct occupancy grids that assign statuses to every possible point within an environment, such statuses including “unoccupied”, “occupied” or “unknown”, [0971]).

Regarding claim 12, Johnson et al. disclose the information processing apparatus according to claim 1. Johnson et al. further indicate the device used by the user is worn on a head of the user and provides predetermined content to the user (The VR system can include the HMD (e.g., headset), which can be worn by the user to present the display data to the user, as well as one or more hand devices, such as hand-held controllers, that can be manipulated by the user as the user interacts with the virtual environment, [0018], In some embodiments, the system can include processing circuitry configured to receive sensor data regarding a user operating a head mounted display (HMD), identify a plurality of reference points of a pose of the user based at least on the sensor data, apply one or more models to the plurality of reference points to determine a type of the pose of the user, and select a mode of operation of the HMD responsive to the type of the pose, [0021]; “In some embodiments, the warning generator 160 (or the image renderer 168) uses the depth map data to generate display data regarding the one or more obstacles. For example, the warning generator 160 can use the depth map data to determine the pixels to use to display the one or more obstacles. In some embodiments, the warning generator 160 generates the display data regarding the one or more obstacles in a manner different than used to display the virtual environment, to help distinguish the real world obstacles from objects in the virtual environment. The warning generator 160 can apply peripheral scaling factors as described herein (e.g., relatively higher intensity at the periphery of the FOV of the HMD) to the display data regarding the one or more obstacles generated using depth map data,” [0104]).

Claim(s) 2-6 and 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Johnson et al. (US 20210124412 A1) as applied to claim 1 above, further in view of Ebrahimi Afrouzi et al. (US 20220066456 A1).

Regarding claim 2, Johnson et al. disclose the information processing apparatus according to claim 1. Johnson et al. partly disclose the distance information is a depth map, and when a pixel of the depth map is included in the person region, the control unit assigns a person region reliability value to the pixel (depth information to be extracted from the first image 112a and second image 112b, [0030], the pose detector 148 assigns a confidence score to each of several candidate types of poses, the user's body type (e.g., height, weight, arm length, leg length, etc.) can be stored as part of a user profile and used by the pose detector, [0060], The position sensor 220 can output at least one of a position or an orientation of the body 202, depth mapping of obstacles detected via the image capture devices, [0119]) however another reference is added to make this more explicit.

Ebrahimi Afrouzi et al. teach the distance information is a depth map, and when a pixel of the depth map is included in the person region, the control unit assigns a person region reliability value to the pixel (In some embodiments, the processor of the robot makes decisions relating to robot navigation and instructs the robot to move along a path that may be (or may not be) the most beneficial way for the robot to increase its depth confidences. In embodiments, motion may be determined based on increasing confidences of enough number of pixels which may be achieved by increasing depth confidences, [0356], determine an uncertainty of the pose of the robot and the state space surrounding the robot, [0421], “In some embodiments, a camera of the robot may capture images t_0, t_1, t_2, . . . , t_n. In some embodiments, the processor of the robot may use the images together with SLAM concepts described herein in real time to actuate a decision and/or series of decisions. For example, the methods and techniques described herein may be used in determining a certainty in a position of a robot arm in relation to the robot itself and the world. This may be easily determined for a robot arm when its fixed on a manufacturing site to act as screwdriver as the robot arm is fixed in place. The range of the arm may be very controlled and actions are almost deterministic. One example may include a factory robot and an autonomous car. The car may approach the robot in a controlled way and end up where it is supposed to be given the fixed location of the factory robot. In contrast to a carwash robot, the position of the robot in relation to a car is probabilistic on its own”, [0442], In some embodiments, the processor of the robot may account for uncertainties that the robot arm may have with respect to uncertainties of the robot itself. For instance, actuation may not be perfect and there may be an error in a predicted location of the robot that may impact an end point of the arm. Further, motors of joints of the robot arm may be prone to error and the error in each motor may add to the uncertainty, [0443], However, if the robot or the object captured by a camera of the robot is in motion, SLAM methods may be necessary to account for uncertainties of motion of the robot and the object and uncertainties of perception due to motion of the robot and the object captured by the camera of the robot, [0615], The probability density of the location of the robot has a large variance and the region surrounding the mean is large due to low confidence. In some embodiments, the processor may relate the location of the plant and the position of the robot using a cost function and minimize the cost function to narrow down a region around the mean., [0621]) [the robot interpreted as the “person” in this scenario as its perspective is the “self”; probability/probabilistic is an indication of “reliability value”].

Johnson et al. and Ebrahimi Afrouzi et al. are in the same art of determining pose (Johnson et al., abstract; Ebrahimi Afrouzi et al., [0317]). The combination of Ebrahimi Afrouzi et al. with Johnson et al. will enable assigning a pixel reliability value. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the reliability values of Ebrahimi Afrouzi et al. with the invention of Johnson et al. as this was known at the time of filing, the combination would have predictable results, and Ebrahimi Afrouzi et al. indicate this will help to “devise intelligent path and task plans for efficient navigation and task completion” ([0004]) which would also be useful for a person wanting to navigate their environment in a safe and efficient manner.

Regarding claim 3, Johnson et al. and Ebrahimi Afrouzi et al. disclose the information processing apparatus according to claim 2. Ebrahimi Afrouzi et al. further teach the environment information is an occupancy grid map, and the control unit updates an occupancy state of each grid of the occupancy grid map according to a value corresponding to the person region reliability value (In some embodiments, the processor may use an inverse measurement model when filling obstacle data into the coordinate system to indicate obstacle occupancy, free space, or probability of obstacle occupancy, [0421], probabilistic occupancy grid map may be represented using a grayscale image and vice versa, [0597], “Some traditional techniques may use that data to create a computationally expensive occupancy map. In contrast, some embodiments implement a less computationally expensive approach for creating a map whereby, in some cases, the output matrix of depth cameras, any digital camera (e.g., a camera without depth sensing), or other depth perceiving devices (e.g., ultrasonic or laser range finders) may be used. In some embodiments, pixel intensity of captured images is not required. In some cases, the resulting map may be converted into an occupancy map”, [0927]).

Regarding claim 4, Johnson et al. and Ebrahimi Afrouzi et al. disclose the information processing apparatus according to claim 2. Ebrahimi Afrouzi et al. further teach the control unit sets the person region reliability value such that the person region reliability value increases as a region is closer to the user (“The processor of the robot initially generates a uniform phase space probability distribution over the phase space D. FIGS. 115A-115D illustrate examples of initial phase space probability distributions the processor may use. FIG. 115A illustrates a Gaussian distribution over the phase space, centered at q=5, p=0. The robot is estimated to be in close proximity to the center point with high probability, the probability decreasing exponentially as the distance of the point from the center point increases”, [1041]-[1042]).

Regarding claim 5, Johnson et al. and Ebrahimi Afrouzi et al. disclose the information processing apparatus according to claim 2. Ebrahimi Afrouzi et al. further teach the control unit estimates the person region based on a front distance measurement direction of the distance measuring device and a gravity direction (The pose and maps portion may include a coverage tracker, a pose estimator, SLAM, and a SLAM updater. The pose estimator may include an Extended Kalman Filter (EKF) that uses odometry, IMU, and LIDAR data. SLAM may build a map based on scan matching. The pose estimator and SLAM may pass information to one another in a feedback loop. The SLAM updated may estimate the pose of the robot., [0244], image data bundled with IMU data may be directly sent to a pose estimation subsystem, [0319]) [IMU uses gravity direction].

Regarding claim 6, Johnson et al. and Ebrahimi Afrouzi et al. disclose the information processing apparatus according to claim 2. Johnson et al. further indicate the control unit corrects the person region when a sitting position is detected as the user posture (“The system can modify how the warning generation operates based on the detected modality. For example, the system can decrease the sensitivity or obtrusiveness of the guardian as the user moves into less active modalities (e.g., sitting or lying), such as to decrease the distance threshold from obstacles at which to trigger the guardian when the user is lying down, or change how the warning is used to warn about objects in the periphery of the user's field of view or behind the user”, [0007], “The warning generator 160 can generate the warning by evaluating various combinations of factors. For example, the warning generator 160 can generate no warning when the user is far from the obstacles even if the motion data indicates that the user is moving relatively quickly; generate at least a minimal warning when the user is close to the obstacles even if the motion data indicates that the user is not moving or the type of the pose indicates that the user is sitting down or lying down”, [0077], the warning generator 160 can modify the thresholds to decrease the sensitivity or obtrusiveness of the warnings as the user moves into less active modalities (e.g., less active types of poses, such as lying down or sitting as compared to standing up or moving while standing), such as to decrease a threshold at which to trigger generation and output of the warning when the user is lying down, or change how the warning is used to warn about objects in the periphery of the user's field of view or behind the user, [0080]).

Regarding claim 11, Johnson et al. disclose the information processing apparatus according to claim 1. Johnson et al. do not disclose the control unit corrects the person region based on plane information detected using the distance information. Ebrahimi Afrouzi et al. teach the control unit corrects the person region based on plane information detected using the distance information (the robot may use a LIDAR (e.g., 360 degrees LIDAR) to measure distances to objects along a two dimensional plane. For example, a robot may use a LIDAR to measure distances to objects within an environment along a 360 degrees plane. In some embodiments, the robot may use a two-and-a-half dimensional LIDAR. For example, the two-and-a-half dimensional LIDAR may measure distances along multiple planes at different heights corresponding with the total height of illumination provided by the LIDAR, [0585], In some embodiments, the value associated with each cell may be used to determine a location of the cell in a planar surface along with a height from a ground zero plane. In some embodiments, a plane of reference (e.g., x-y plane) may be positioned such that it includes a lowest point in the map. In this way, all vertical measurements (e.g., z values measured in a z-direction normal to the plane of reference) are always positive. In some embodiments, the processor of the robot may adjust the plane of reference each time a new lower point is discovered and all vertical measurements accordingly, [0940], In some embodiments, the processor may describe a geometric feature by defining a region R of a binary image as a two-dimensional distribution of foreground points p_1=(u_1, v) on the discrete plane Z^2 as a set R={x_0, . . . , x_N−1}={(u_0, v_0), (u_1, v_1), . . . , (u_N−1, v_(N−1))}, [1200]).

Johnson et al. and Ebrahimi Afrouzi et al. are in the same art of determining pose (Johnson et al., abstract; Ebrahimi Afrouzi et al., [0317]). The combination of Ebrahimi Afrouzi et al. with Johnson et al. will enable correcting the person region based on plane information. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the correcting of Ebrahimi Afrouzi et al. with the invention of Johnson et al. as this was known at the time of filing, the combination would have predictable results, and Ebrahimi Afrouzi et al. indicate this will help to “devise intelligent path and task plans for efficient navigation and task completion” ([0004]) which would also be useful for a person wanting to navigate their environment in a safe and efficient manner.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

US 11507203 B1: “In particular embodiments, the controllers 220 and 230 may each include IMU, cameras, and any suitable sensors, which can be used to perform SLAM for self-localization. Thus, the controllers 220 and 230 may accurately determine the user's hand positions (e.g., as represented by the key point 242A and 242B). As a result, the system may accurately determine at least three keypoints 241, 242A, and 242B associated with the user's head and hands. Because human skeletons have inherent structural constraints, the system may use limited keypoints (e.g., 242A, 242B, and 241) to infer the positions of other keypoints (e.g., neck, shoulders, elbows) and estimate the body pose for the upper body of the user 201. For example, because human skeletons only allow particular arm poses for the arm 207B when the user's hand 242B is at the key point 244B, the system may accurately infer the user's arm pose for the right arm 207B based on the single key point 244B. Similarly, because human skeletons only allow particular arm poses for the arm 207A when the user's hand is at the key point 244A, the system may effectively infer the user's arm pose for the left arm 207A based on the single key point 244A”.

US 12425554 B2 (One or more non-transitory computer-readable media having instructions stored thereon, which, when executed, cause a computing device to perform operations comprising: receiving physical pose information associated with a user; estimating a future head pose of the user within a virtual immersive environment based on the physical pose information; determining that an object has a relevance level based on the future head pose of the user; associating the object with a fidelity label that corresponds to the relevance level; and encoding a point cloud representing the object according to the fidelity label associated with the object to generate an encoded point cloud).

US 20240310851 A1 (Then, the distance measuring sensor 2 calculates a depth value (depth) d which is a distance from the distance measuring sensor 2 to the object 3 on the basis of four luminance images (the luminance image includes a luminance value (luminance information) of each pixel of the pixel array and coordinate information corresponding to each pixel) supplied for each pixel of the pixel array. Then, the depth map (distance image) in which the depth value d is stored as the pixel value of each pixel and a reliability map in which the reliability conf is stored as the pixel value of each pixel are generated and output from the distance measuring sensor 2 to an external device.).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084. The examiner can normally be reached 10-7 M-F.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent M Rudolph can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHELLE M ENTEZARI HAUSMANN/
Primary Examiner, Art Unit 2671
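
The claim 10 mapping above is the crux of the claimed data handling: the surrounding environment is an occupancy grid, and cells covered by the estimated person region are forced to an "unknown" state so the user's own body is never baked into the map as a fixed obstacle. A minimal sketch of that update (Python; the cell states, grid layout, and function names are illustrative assumptions for this page, not code from the application or from Johnson):

# Sketch of the claim 10 idea: depth returns mark cells OCCUPIED, then
# cells inside the estimated person region are reset to UNKNOWN so the
# user's body is not stored as a fixed obstacle. All names illustrative.
from enum import Enum

class Cell(Enum):
    FREE = 0
    OCCUPIED = 1
    UNKNOWN = 2

def update_environment(grid, depth_hits, person_region):
    # Surrounding environment information: mark sensed surfaces.
    for cell in depth_hits:
        grid[cell] = Cell.OCCUPIED
    # Person region environment information: mask the user's body.
    for cell in person_region:
        grid[cell] = Cell.UNKNOWN
    return grid

# Example: the user's torso overlaps a depth return at (2, 3); after the
# update that cell is UNKNOWN while the unrelated return stays OCCUPIED.
grid = {(x, y): Cell.FREE for x in range(5) for y in range(5)}
grid = update_environment(grid, depth_hits={(2, 3), (4, 4)},
                          person_region={(2, 2), (2, 3)})
print(grid[(2, 3)].name, grid[(4, 4)].name)   # UNKNOWN OCCUPIED

Claims 2 and 4 layer a per-pixel reliability value on top of this, higher for regions closer to the user; in a sketch like the one above that value would weight how strongly each cell's state is overridden.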

Prosecution Timeline

Feb 15, 2024 — Application Filed
Jan 17, 2026 — Non-Final Rejection, §101 and §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602775 — INTERPOLATION OF MEDICAL IMAGES — granted Apr 14, 2026 (2y 5m to grant)
Patent 12602793 — Systems and Methods for Predicting Object Location Within Images and for Analyzing the Images in the Predicted Location for Object Tracking — granted Apr 14, 2026 (2y 5m to grant)
Patent 12602949 — SYSTEM AND METHOD FOR DETECTING HUMAN PRESENCE BASED ON DEPTH SENSING AND INERTIAL MEASUREMENT — granted Apr 14, 2026 (2y 5m to grant)
Patent 12597261 — OBJECT MOVEMENT BEHAVIOR LEARNING — granted Apr 07, 2026 (2y 5m to grant)
Patent 12597244 — METHOD AND DEVICE FOR IMPROVING OBJECT RECOGNITION RATE OF SELF-DRIVING CAR — granted Apr 07, 2026 (2y 5m to grant)
Study what changed in these cases to get past this examiner. Based on the 5 most recent grants.

Prosecution Projections

Expected OA rounds: 1-2
Grant probability: 76% (98% with interview, +21.6%)
Median time to grant: 3y 1m
PTA risk: Low

Based on 863 resolved cases by this examiner. Grant probability derived from career allow rate.
