Prosecution Insights
Last updated: April 19, 2026
Application No. 18/089,071

BEHAVIOR RECOGNITION DEVICE, BEHAVIOR RECOGNITION METHOD, AND NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM

Status: Final Rejection (§103)
Filed: Dec 27, 2022
Examiner: HAUSMANN, MICHELLE M
Art Unit: 2671
Tech Center: 2600 — Communications
Assignee: Panasonic Intellectual Property Corporation of America
OA Round: 2 (Final)
Grant Probability: 76% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 1m
With Interview: 98%

Examiner Intelligence

Career Allow Rate: 76%, above average (658 granted / 863 resolved; +14.2% vs Tech Center average)
Interview Lift: +21.6%, a strong lift in allow rate among resolved cases with an interview versus without
Typical Timeline: 3y 1m average prosecution; 23 applications currently pending
Career History: 886 total applications across all art units

Statute-Specific Performance

§101: 14.6% (-25.4% vs TC avg)
§103: 61.2% (+21.2% vs TC avg)
§102: 5.7% (-34.3% vs TC avg)
§112: 10.1% (-29.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 863 resolved cases.

Office Action

§103
DETAILED ACTION

Response to Amendment

Claims 1-12 are pending. Claims 1-12 are amended directly or by dependency on an amended claim.

Response to Arguments

Applicant's arguments with respect to the 35 USC 103 rejections of claims 1-12 have been considered but are moot because the new ground of rejection does not rely on the combination of references applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. In particular, the new limitations which Mishra does not disclose are now taught by Niu et al. and Cai et al.

Applicant's arguments, see pages 9-12, filed 12 August 2025, with respect to the 35 USC 101 rejections of claims 1-12, along with accompanying amendments received on the same date, have been fully considered and are persuasive. The 35 USC 101 rejections of claims 1-12 have been withdrawn.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

    A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Mishra (US 9305216 B1) in view of Niu et al. ("AtLAS: An Activity-Based Indoor Localization and Semantic Labeling Mechanism for Residences") in view of Cai et al. (US 11380108 B1).

Regarding claims 1, 11, and 12: Mishra discloses a behavior recognition device that recognizes a behavior of a person in a building, the behavior recognition device comprising a processor; and a memory including a program that, when executed by the processor, causes the processor to; a method for recognizing a behavior of a user in a building, the method causing a computer to perform; and a non-transitory computer-readable recording medium recording a program for recognizing a behavior of a user in a building, the program causing a computer to:

acquire target behavior information including one or more target behaviors that are each to be a predetermined recognition target, room layout information on the building, and position information on an image sensor installed in the building (see Figs. 1A and 1B: imaging device 140 is "image sensor", worker 150 and workstation 160-3 are "predetermined recognition target", conveyors 135-1, 135-3 form "room layout information", and col. 2, line 55 - col. 3, line 25; see Fig. 5: commence monitoring of environment using image devices, provide to classifiers; "Many imaging devices also include manual or automatic features for modifying their respective fields of view or orientations. For example, a digital camera may be configured in a fixed position, or with a fixed focal length (e.g., fixed-focus lenses) or angular orientation. Alternatively, an imaging device may include one or more motorized features for adjusting a position of the imaging device," col. 5, line 60 - col. 6, line 5 [interpreted as position information]; "Detecting tools used by workers in the performance of such tasks, or the items or containers themselves, and tracking their changes in state, which may include but are not limited to linear, translational or rotational motion, may be used to discern which of the actions or activities is being performed by each of the workers. For example, the vertical and/or horizontal changes in position of a tool, as well as changes in rotation or angular alignment of the tool, may be tracked and associated with one or more actions or activities involving the tool," col. 9, line 55 - col. 10, line 25 [interpreted as position information]; "The imaging device 240 may capture one or more still or moving images, as well as any relevant audio signals or other information, within one or more designated locations within the fulfillment center 230," col. 13, lines 20-45 [interpreted as position information]; "The fulfillment center 230 may also include one or more predefined two-dimensional or three-dimensional storage areas including facilities for accommodating items and/or containers of such items, such as aisles, rows, bays, shelves, slots, bins, racks, tiers, bars, hooks, cubbies or other like storage means, or any other appropriate regions or stations," col. 14, lines 35-60 [interpreted as layout information]; based on the presence or absence of objects within an environment, and the states or changes in states of the objects, e.g., changes in position or orientation of the objects caused by one or more types of motion, as determined from imaging data captured from the environment, col. 22, lines 1-15 [recognition target, position]; see Fig. 8A and col. 23, lines 40-65: imaging device 840 is "image sensor", patron and staff are "predetermined recognition target", shelves 860-5 - 860-7 and cart 860-4 form "room layout information");

specify a space in which the image sensor is installed and select a candidate behavior that is estimated that a user acts in a space from among the one or more target behaviors included in the target behavior information based on the position information on the image sensor and the room layout information ("Once the contextual cues within a scene of an environment, e.g., not only the objects recognized as present therein but also the states or changes in states of such objects, have been identified, the contextual cues may be leveraged in order to narrow a set of possible actions or activities that may have occurred within the scene and been captured within the imaging data, thereby facilitating the process by which a predicted action or activity is identified," col. 2, lines 40-55; "According to the systems and methods of the present disclosure, a set of possible actions 155 that may be performed by an actor within an environment and which may be detected and classified using imaging data (e.g., one or more still or moving images) captured from the environment may be narrowed based on a context associated with the environment that may be determined by identifying one or more objects or entities that are present therein identified therein," col. 3, lines 40-60; col. 9, lines 25-55; "According to some embodiments, the classifiers to which the captured imaging data is provided may be selected for any purpose based on the environment. For example, where imaging data is captured from a scene at a construction site, the imaging data may be provided to classifiers that are configured to recognize hardware tools (e.g., hammers, saws, drills, sanders and the like). Where imaging data is captured from a scene at a bank, the imaging data may be provided to classifiers that are configured to recognize bank-related objects or entities including but not limited to keyboards, printers, monitors, pens, moneybags, money sacks or carts," col. 17, lines 5-23);

acquire image data detected by the image sensor ("Machine vision systems and methods are typically provided in order to enable computers to see, i.e., to visually recognize and identify one or more objects, from imaging data captured by an imaging device, e.g., one or more digital cameras," col. 5, lines 25-50; "The imaging device 240 may comprise any form of optical recording device that may be used to photograph or otherwise record images of structures, facilities or other elements within the fulfillment center 230, as well as the items within the fulfillment center 230, or for any other purpose"; "Although the working area 235 of FIG. 2 includes a single imaging device 240, any number or type of imaging devices may be provided in accordance with the present disclosure, including but not limited to digital cameras or other optical sensors," col. 13, lines 20-45; "the monitoring of an environment using one or more imaging devices commences, and at box 510, imaging data is captured from a foreground of the environment. The imaging devices with which the imaging data was captured may be any type or form of imaging device, e.g., a digital camera or depth sensor," col. 19, lines 5-25);

determine a predetermined one or more recognizers corresponding to the candidate behavior and calculate the one or more feature values of the image data using the one or more predetermined recognizers ("Outputs of the second set of classifiers may be used to generate probabilities that the motion is associated with one or more predetermined activities," col. 3, lines 39-58; "identify probabilities that the objects, entities, states or changes in state are associated with one or more predetermined actions or activities," col. 11, lines 15-33; "The objects or entities may be recognized by providing the imaging data to a first set of classifiers, e.g., one or more support vector machines or other learning models, each of which may be trained to recognize one or more specific objects or entities therein," col. 3, lines 40-60; "Such classifiers may be trained to recognize edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of particular objects or entities within an environment, as well as whether such objects are stationary or in motion," col. 10, lines 35-68; "At box 720, the imaging data may be processed to recognize one or more objects therein. The imaging data may be analyzed in order to identify one or more edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of such objects, e.g., by providing the imaging data to one or more classifiers that are trained to recognize individual objects therein," col. 22, lines 30-45);

recognize the candidate behavior ("The objects or entities may be recognized by providing the imaging data to a first set of classifiers, e.g., one or more support vector machines or other learning models, each of which may be trained to recognize one or more specific objects or entities therein. Once the objects or entities have been recognized, motion of the objects or entities may be subsequently tracked by providing the imaging data to a second set of classifiers, each of which may be trained to recognize a type or kind of motion of the objects or entities. Outputs of the second set of classifiers may be used to generate probabilities that the motion is associated with one or more predetermined activities, or is not associated with any previously known activities," col. 3, lines 40-60; "Once the characteristics of stationary or moving objects or portions thereof have been recognized in one or more digital image, such characteristics of the objects or portions thereof may be matched against information regarding edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of known objects, which may be stored in one or more data stores," col. 7, lines 5-20; "At box 360, a most probable action or activity is determined based on the outputs from the second set of classifiers," col. 17, lines 45-60); and

output a recognition result of the candidate behavior, which is recognized based on the feature value ("After an object or an entity has been detected and recognized within the imaging data, the states or changes in state of the object or the entity may be tracked based on imaging data that is subsequently captured at later points in time," col. 10, lines 35-68; "At box 370, information regarding the most probable action or activity is stored in one or more data stores, and the process ends," col. 17, lines 45-60; "At box 740, the motion of the recognized objects within the monitored environment is classified using the imaging data, and at box 750, a most likely action associated with the objects and the classified motion is selected," col. 22, lines 45-60).

Mishra does not explicitly disclose the room layout information being two-dimensional or three-dimensional information expressing components of rooms and a shape and a positional relationship of each of the rooms; combine the one or more feature values and input combined feature values to a classifier to recognize the candidate behavior; and control an appliance in the building in accordance with the recognition result.

Niu et al. teach the room layout information being two-dimensional or three-dimensional information expressing components of rooms and a shape and a positional relationship of each of the rooms ("By mining the semantics of objects from action patterns and the topological structure among objects, indoor maps can be labeled accurately," part I; "Before semantic labeling, we need to detect all indoor objects to generate floorplan," part IIID; "Residences always have some functional areas, such as a bedroom, kitchen, etc. As shown in Fig. 10, each node belongs to one or more functional areas (e.g., there is an overlap between two functional areas)"; "We cannot consider this area as bedroom because the accuracy of both results is not quite different. The reason for this case is that there may be no door between two functional areas," part IIIE; "These residences include three kinds of house types and indoor layouts (90, 120, and 160 m2), and for each type, we have two families to collect data," part IV; part IVB); and specify a space in which the image sensor is installed and select a candidate behavior that is estimated that a user acts in the space from among the one or more target behaviors included in the target behavior information based on the position information on the image sensor and the room layout information ("The key idea is that some objects in an indoor environment, such as doors and toilets, determine predictable human behaviors in small areas," abstract; part IIIE; "The system [21] fixes a wearable depth camera on user's chest to recognize activities," part VA; "Some systems leverage environmental physical features to conduct indoor localization without relying on the RF signature [1], [2], [41]-[43]. In such systems, users' relative positions to physical features and specific reference points are obtained from videos and images. ClickLoc [37] is an accurate, easy to deploy, sensor enriched, and image-based indoor localization system. These vision-based positioning methods need users to equip a depth camera or set extra specific devices in the indoor environment, and are easily disturbed by the surroundings," part VB).

Mishra and Niu et al. are in the same art of activity labeling (Mishra, abstract; Niu et al., abstract). The combination of Niu et al. with Mishra will enable using a floorplan. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the floorplan of Niu et al. with the invention of Mishra, as this was known at the time of filing and the combination would have predictable results: a floorplan is a common way to express an environment being examined, and, as Niu et al. indicate, "An indoor map with semantic knowledge is also crucial for indoor LBSs to improve positioning accuracy and further understand human behaviors" (part I), implying an improvement to the recognition that may result from the combination of inventions.

Mishra and Niu et al. do not explicitly disclose combining the one or more feature values and inputting combined feature values to a classifier to recognize the candidate behavior, and controlling an appliance in the building in accordance with the recognition result.
Cai et al. teach combine the one or more feature values and input combined feature values to a classifier to recognize the candidate behavior ("A system comprising: one or more processors; and one or more computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising: receiving sensor data of an environment, the sensor data comprising image data; determining an object based at least in part on the sensor data; tracking a behavior of the object based at least in part on the sensor data; inputting the image data into a first portion of a machine-learned model; receiving, from the first portion of the machine-learned model, an image feature representation representing the object; receiving a top-down representation, the top-down representation representing the object; concatenate, as a concatenated representation, the image feature representation and the top-down feature representation; inputting the concatenated representation into a second portion of the machine-learned model; receiving, from the second portion of the machine-learned model, a predicted behavior of the object; and determining a difference between the predicted behavior of the object and the behavior of the object; and altering one or more parameters of one or more of the first portion or the second portion of the machine-learned model to minimize the difference," col. 28, line 55 - col. 29, line 10) and control an appliance in the building in accordance with the recognition result ("controlling an autonomous vehicle based on the predicted behavior," col. 2, lines 1-5; col. 4, lines 20-25).

Mishra, Niu et al., and Cai et al. are in the same art of activity labeling (Mishra, abstract; Niu et al., abstract; Cai et al., col. 28, line 55 - col. 29, line 10). The combination of Cai et al. with Mishra and Niu et al. will enable using combined feature values. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the combined values of Cai et al. with the invention of Mishra and Niu et al., as this was known at the time of filing and the combination would have predictable results, and as Cai et al. indicate, "The techniques discussed herein can improve a functioning of a computing device in a number of ways. For instance, the machine-learned model may use as few as a single image or single video frame to make reliable behavior predictions of objects in the environment surrounding a vehicle. Consequently, significantly less processing resources are used in comparison to conventional techniques that require complex image analysis algorithms applied to sequences of multiple images to predict a behavior of an object. Further, conventional techniques that predict object behavior often require multiple observations (e.g., multiple images or video frames), and thus these conventional techniques have higher latency than the techniques described herein. Since the behavior prediction can be made from a single image, the object direction component may be able to determine predicted behaviors more quickly and/or for more objects in the environment than would be possible if more images, and/or other sensor data, was required. Additionally, supplementing top-down predictions with image features allows the machine-learned model to decipher interactions between objects from a single image, which would require multiple frames and/or images captured over time to determine using conventional techniques. In some cases, the described techniques are more accurate than other behavior prediction mechanisms, thus improving safety of the autonomous vehicle and surrounding persons and/or vehicles" (col. 5, lines 10-65), providing a safety benefit when these inventions are combined.

Regarding claim 2: Mishra, Niu et al., and Cai et al. disclose the behavior recognition device according to claim 1. Mishra further indicates the processor determines a plurality of recognizers when the candidate behavior is a predetermined behavior, and the processor combines feature values calculated by the plurality of recognizers to recognize the candidate behavior based on the combined feature values ("Furthermore, edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of objects, or portions of objects, expressed in still or moving digital images may be identified using one or more algorithms or machine-learning tools," col. 6, line 50 - col. 7, line 5; "For example, the vertical and/or horizontal changes in position of a tool, as well as changes in rotation or angular alignment of the tool, may be tracked and associated with one or more actions or activities involving the tool. Likewise, attributes of the tool such as color, contours or textures may be further tracked and associated with such actions or activities. For example, during operation, a lawn mower may change in color or texture, and its bag may expand, due to the cutting of grass and the proliferation of the cut grass into the bag and onto one or more external surfaces of the lawn mower. Therefore, the lawn mower may be determined to have been used in operation where one or more external surfaces of the lawn mower turns green or otherwise develops a roughened texture, or where the bag expands in volume and shape," col. 9, line 55 - col. 10, line 25; "According to some embodiments of the present disclosure, discrete sets of imaging data (e.g., clips or video data files having finite lengths on the order of tens of seconds) may be provided to one or more first classifiers that are each specifically configured to detect and recognize one or more objects or entities therein, as well as one or more states of such objects or entities. Such classifiers may be trained to recognize edges, contours, outlines, colors, textures, silhouettes, shapes or other characteristics of particular objects or entities within an environment, as well as whether such objects are stationary or in motion, or in one or more of a defined set of possible conditions (e.g., the container 160-3 of FIGS. 1A-1D, which is open at time t.sub.1 and sealed at time t.sub.2). For example, within a fulfillment center environment, the classifiers may be trained to recognize a computer keyboard or other interface, a bin or other item carrier, or one or more accessories for receiving, storing, retrieving, packing or shipping items based on their respective outlines, colors or shapes or other intrinsic features, as well as whether any of the keys of the keyboard is actuated (e.g., depressed downward or returning upward), or whether the bin or the other item carrier is empty or partially or completely filled with items. After an object or an entity has been detected and recognized within the imaging data, the states or changes in state of the object or the entity may be tracked based on imaging data that is subsequently captured at later points in time. Some classifiers that may be utilized in order to recognize an object or an entity, or to determine a state or a change in state of the object or entity, include but are not limited to support vector machines, Bayes classifiers, neural networks, Random Forest methods, deep learning methods or any other type of machine learning tool, method, algorithm or technique," col. 10, lines 35-68; "Next, using one or more second classifiers that are each specifically configured to identify specific actions or activities, or groups of actions or activities, information regarding the objects or entities within the environment, and the static or dynamic states of such objects or entities, may be used to select one or more actions or activities with which such object or entities, or states or changes in state, are associated. For example, in a fulfillment center environment, such classifiers may be trained to associate the appearance of an item within (or disappearance of the item from) imaging data, as well as the linear or rotational motion (e.g., raising, lowering, side-to-side translation, spinning, tumbling) of bins or item carriers that have been recognized within imaging data with an action or an activity," col. 11, lines 1-15; "At box 770, the sequence of the most likely actions defined at box 760 is compared to one or more established procedures. For example, where the sequence of actions is associated with operating a crane or erecting scaffolding, which require the establishment of safety regions using temporary fencing or other barricades, the sequence of likely actions may be evaluated to determine whether such fencing or barricades were installed, or whether such regions were established. Where the sequence of actions is associated with performing a task such as trimming weeds or blowing leaves using powered equipment that requires the wearing of safety goggles and/or ear protection, the sequence of actions may be evaluated to determine whether such goggles or ear protection were worn. Where the sequence of actions is associated with an activity requiring the pounding of nails or the driving of screws, the sequence of actions may be evaluated to determine whether the nails were pounded with a hammer or the screws were driven with a screwdriver," col. 23, lines 1-20 [color, change in rotation, change in state, timing, and order are different recognizers]).

Regarding claim 3: Mishra, Niu et al., and Cai et al. disclose the behavior recognition device according to claim 2. Mishra further indicates the predetermined behavior is cleaning, brushing teeth, cooking, washing, using a computer, reading, or eating ("Similarly, where a wine glass is identified in an environment within imaging data, the set of possible actions or activities that may be performed by an actor may be narrowed to washing or drying the glass, placing the glass on a shelf or removing the glass therefrom, filling the glass or drinking from the glass. By tracking the repeated and temporary raising and lowering of the glass, and a temporary rotation of the glass from a vertical orientation to an angled or horizontal orientation, and back to the vertical orientation again, or any other changes in the state of the glass, it may be determined that the actor is drinking from the glass, and not washing, drying, storing or filling the glass," col. 9, lines 5-30; "For example, where a broom is identified within imaging data captured from one or more imaging devices mounted in a warehouse, it may be reasonably inferred that a worker using the broom is sweeping a floor and not mopping, painting or waxing it," col. 9, lines 25-55; "as well as whether any of the keys of the keyboard is actuated (e.g., depressed downward or returning upward)," col. 10, lines 35-68; "Any content-indexing and retrieval system may be enhanced using one or more of the embodiments disclosed herein, such as by classifying such content with one or more activity types such as 'surfing,' 'golfing,' 'cooking,' 'sports' or 'racing' based on the objects such as surf boards, golf clubs, stoves or other cooking implements, balls or other sporting accessories, or automobiles moving at a high rate of speed expressed within such files," col. 25, lines 1-35).

Regarding claim 4: Mishra, Niu et al., and Cai et al. disclose the behavior recognition device according to claim 1. Mishra further indicates each of the predetermined one or more recognizers is constituted of a convolution neural network ("Some classifiers that may be utilized in order to recognize an object or an entity, or to determine a state or a change in state of the object or entity include neural networks," col. 10, lines 35-68), and the processor recognizes the candidate behavior using a classifier using any one of logistic regression, a support vector machine, a decision tree, random forest, a k-nearest neighbor algorithm, Gaussian naive Bayes, a perceptron, and a stochastic descent method ("After an object or an entity has been detected and recognized within the imaging data, the states or changes in state of the object or the entity may be tracked based on imaging data that is subsequently captured at later points in time. Some classifiers that may be utilized in order to recognize an object or an entity, or to determine a state or a change in state of the object or entity, include but are not limited to support vector machines, Bayes classifiers, neural networks, Random Forest methods, deep learning methods or any other type of machine learning tool, method, algorithm or technique," col. 10, lines 35-68; "plurality of object or entity classifiers, e.g., one or more support vector machines," col. 19, lines 20-35; "one or more state classifiers, which may also be support vector machines or other learning models," col. 19, lines 30-50).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Mishra (US 9305216 B1), Niu et al. ("AtLAS: An Activity-Based Indoor Localization and Semantic Labeling Mechanism for Residences"), and Cai et al. (US 11380108 B1) as applied to claim 1 above, further in view of Takeichi et al. (US 20200250408 A1).

Regarding claim 5: Mishra, Niu et al., and Cai et al. disclose the behavior recognition device according to claim 1.
Mishra, Niu et al., and Cai et al. do not explicitly disclose the processor recognizes the candidate behavior using the classifier that is machine-learned with the one or more feature values as an explanatory variable and the target behavior as an objective variable. Takeichi et al. teach a processor recognizes the candidate behavior using a classifier that is machine-learned with the feature value as an explanatory variable and the target behavior as an objective variable ("For example, in order to evaluate the step length and cadence, a value such as a height, weight, age, gender, exercise speed, or level of a marathon time as the 'value based on the physical condition of the subject' for a plurality of subjects may be used as an explanatory variable, and a regression equation including the step length or cadence at the speed as an objective variable and a standard deviation may be calculated in advance and stored as a regression analysis result," [0012]; "Herein, the value based on the physical condition is a value having a correlation with particularly a value used as an objective variable in the regression equation out of values representing the physical condition of the subject under examination, a physical exercise level, physical exercise feature, and physical exercise attribute of the subject, and the like," [0048]; "For example, in a case of a specification where the exercise speed, height, or the like of the subject under examination is used as a value V based on the physical condition of the subject, the reference storage 132 stores a regression equation (E=aV+b) having the exercise speed, height, or the like as an explanatory variable and the tentative value E of each of the running form parameters as an objective variable, and the standard deviation SD for each of the running form parameters," [0057]; "That is, an injury function for calculating the injury risk is created through statistical analysis using the presence or absence of injury as an objective variable and the running form parameters as explanatory variables, and stored in the reference storage 132. The evaluation processor 146 substitutes the analysis value P of each of various running form parameters into the injury function to evaluate the presence or absence of the injury risk. This allows the user to quantitatively and easily recognize his/her injury risk only through image-analysis of the moving image, and therefore it can be expected to avoid injury," [0069]).

Mishra and Takeichi et al. are in the same art of tracking an activity (Mishra, abstract; Takeichi et al., abstract). The combination of Takeichi et al. with Mishra will enable using an explanatory variable and the target behavior as an objective variable. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the variables of Takeichi et al. with the invention of Mishra, as this was known at the time of filing and the combination would have predictable results, and as Takeichi et al. indicate, "The evaluation processor 146 substitutes the analysis value P of each of various running form parameters into the injury function to evaluate the presence or absence of the injury risk. This allows the user to quantitatively and easily recognize his/her injury risk only through image-analysis of the moving image, and therefore it can be expected to avoid injury" ([0069]), thereby suggesting an improvement to the invention of Mishra by enabling an additional quantitative element to allow for easy comparisons, which, as noted by Takeichi et al., can allow the invention to be used to improve user safety.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Mishra (US 9305216 B1), Niu et al. ("AtLAS: An Activity-Based Indoor Localization and Semantic Labeling Mechanism for Residences"), and Cai et al. (US 11380108 B1) as applied to claim 1 above, further in view of Cai et al. (US 11380108 B1).

Regarding claim 6: Mishra, Niu et al., and Cai et al. disclose the behavior recognition device according to claim 1. Mishra, Niu et al., and Cai et al. do not explicitly disclose the processor weights each of the one or more feature values using a weight coefficient determined in advance in accordance with the candidate behavior, and recognizes the candidate behavior based on each of the one or more feature values weighted. Cai et al. teach the processor weights each feature value using a weight coefficient determined in advance in accordance with the candidate behavior, and recognizes the candidate behavior based on each feature value weighted ("Therefore, the described techniques supplement predictions using top-down representations as determined from image data by giving a machine-learned model access to the image data to generate a corresponding top-down representation of image features. In this way, the machine-learned model may 'learn' features that are important to predicting object behavior from the image features, without requiring that the features be enumerated beforehand. Further, the machine-learned model may learn features that are not important (or are less important) to predicting object behavior, and forego analysis of such features, or give such features less weight when predicting object behavior," col. 3, lines 15-30; "Although the top-down representation 114 and the image feature representation 122 are capable of including information of similar types and values, in some cases, the information embedded in the two different representations will be different. As discussed above and below, conventional top-down image generation techniques may rely upon previously enumerated feature types, which may result in the top-down image ignoring features that may indicate a behavior that would affect how the autonomous vehicle 106 is controlled, while devoting processing resources to features that may have little effect on object behavior that is relevant to the autonomous vehicle 106. By providing access to the image feature representation 122, new features that are relevant to object behavior may be determined, and relevance of features that affect object behavior may be weighted more accurately to control driving outcomes of the autonomous vehicle 106," col. 11, lines 10-30; "Thus, the concatenation component may create a 'concatenated representation' that includes information from both the top-down representation 114 and the image feature representation 122 regarding object type, bounding boxes, movement information, and the like. In some examples, as discussed in more detail below with regards to FIG. 3, a machine-learned model may be trained to make improved predictions from the concatenated features about object behaviors in the environment surrounding the autonomous vehicle 106," col. 11, lines 25-45).

Mishra and Cai et al. are in the same art of tracking an activity (Mishra, abstract; Cai et al., abstract). The combination of Cai et al. with Mishra will enable using weighted feature values. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the weighted values of Cai et al. with the invention of Mishra, as this was known at the time of filing and the combination would have predictable results, and as Cai et al. indicate, "That is, the predictions of future states of objects in an environment can be based on candidate actions proposed to be performed by the autonomous vehicle and such predictions may comprise improved predictions with respect to the additional objects" (col. 2, line 50 - col. 3, line 10); "Additionally, supplementing top-down predictions with image features allows the machine-learned model to decipher interactions between objects from a single image, which would require multiple frames and/or images captured over time to determine using conventional techniques. In some cases, the described techniques are more accurate than other behavior prediction mechanisms, thus improving safety of the autonomous vehicle and surrounding persons and/or vehicles"; "By controlling the vehicle based in part on predicted behaviors of objects determined using image features, the safety of the vehicle can be improved by predicting object behaviors faster and earlier, thus allowing the vehicle to make its own trajectory decisions earlier. Further, techniques for controlling the vehicle based in part on predicted behaviors of objects determined from image features can increase a confidence that the vehicle can avoid collisions with oncoming traffic and/or pedestrians by determining the behaviors earlier and with greater accuracy, which may improve safety outcomes, performance, and/or accuracy. These and other improvements to the functioning of the computer are discussed herein" (col. 5, lines 10-65), showing the combination of inventions can be used to improve user safety in driving applications, and can improve the predictions the Mishra and Cai et al. inventions rely upon.

Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Mishra (US 9305216 B1), Niu et al. ("AtLAS: An Activity-Based Indoor Localization and Semantic Labeling Mechanism for Residences"), and Cai et al. (US 11380108 B1) as applied to claim 1 above, further in view of Tanaka (US 20190272143 A1).
Regarding claim 7: Mishra, Niu et al., and Cai et al. disclose the behavior recognition device according to claim 1. Mishra further indicates one or more objects installed in the building are extracted from the room layout information (see Figs. 1A and 1B: conveyors 135-1, 135-3 form "room layout information" and col. 2, line 55 - col. 3, line 25; "The fulfillment center 230 may also include one or more predefined two-dimensional or three-dimensional storage areas including facilities for accommodating items and/or containers of such items, such as aisles, rows, bays, shelves, slots, bins, racks, tiers, bars, hooks, cubbies or other like storage means, or any other appropriate regions or stations," col. 14, lines 35-60 [interpreted as layout information]; see Fig. 8A and col. 23, lines 40-65: shelves 860-5 - 860-7 and cart 860-4 form "room layout information"); the one or more objects are classified into any one of a first object that is movable (cart 860-4, Fig. 8A and col. 23, lines 40-65), a second object that is a plumbing facility (kitchen appliances, col. 7, lines 25-55), and a third object that is a structure of the building (home or living environments (e.g., yards, living rooms or apartment buildings), retail establishments (e.g., large department stores), transportation facilities (e.g., airports, train stations, bus stations or seaports) or large venues such as stadiums or arenas, dressing rooms, col. 7, lines 25-55); a room feature value in which classification information indicating a classification result is associated with an installation position is extracted for each of the one or more objects ("Many imaging devices also include manual or automatic features for modifying their respective fields of view or orientations. For example, a digital camera may be configured in a fixed position, or with a fixed focal length (e.g., fixed-focus lenses) or angular orientation. Alternatively, an imaging device may include one or more motorized features for adjusting a position of the imaging device," col. 5, line 60 - col. 6, line 5 [interpreted as position information]; "Detecting tools used by workers in the performance of such tasks, or the items or containers themselves, and tracking their changes in state, which may include but are not limited to linear, translational or rotational motion, may be used to discern which of the actions or activities is being performed by each of the workers. For example, the vertical and/or horizontal changes in position of a tool, as well as changes in rotation or angular alignment of the tool, may be tracked and associated with one or more actions or activities involving the tool," col. 9, line 55 - col. 10, line 25 [interpreted as position information]; "The imaging device 240 may capture one or more still or moving images, as well as any relevant audio signals or other information, within one or more designated locations within the fulfillment center 230," col. 13, lines 20-45 [interpreted as position information]; based on the presence or absence of objects within an environment, and the states or changes in states of the objects, e.g., changes in position or orientation of the objects caused by one or more types of motion, as determined from imaging data captured from the environment, col. 22, lines 1-15 [position]; see Fig. 8A and col. 23, lines 40-65: shelves 860-5 - 860-7 and cart 860-4 can be moved); and a behavior selection table is generated based on the room feature value, the behavior selection table shows one or more spaces of the building that are associated with the corresponding one or more target behaviors, and the processor acquires the behavior selection table as the target behavior information ("Once the contextual cues within a scene of an environment, e.g., not only the objects recognized as present therein but also the states or changes in states of such objects, have been identified, the contextual cues may be leveraged in order to narrow a set of possible actions or activities that may have occurred within the scene and been captured within the imaging data, thereby facilitating the process by which a predicted action or activity is identified," col. 2, lines 40-55; "According to the systems and methods of the present disclosure, a set of possible actions 155 that may be performed by an actor within an environment and which may be detected and classified using imaging data (e.g., one or more still or moving images) captured from the environment may be narrowed based on a context associated with the environment that may be determined by identifying one or more objects or entities that are present therein identified therein," col. 3, lines 40-60; col. 9, lines 25-55).

Mishra, Niu et al., and Cai et al. do not disclose using a table. Tanaka teaches a behavior selection table is generated based on the room feature value, the behavior selection table shows one or more spaces of the building that are associated with the corresponding one or more target behaviors, and the processor acquires the behavior selection table as the target behavior information ("The action DB 5c stores possible action options in association with each scene. The action determining section determines, as the action to be carried out by the terminal 9, an action in accordance with a scene inferred by the scene inferring section 24. For example, in a case where (i) the scene inferring section 24 infers that the scene 'living room' is the scene in which the terminal 9 is located and (ii) the action DB 5c stores possible action options as shown in Table 6 in association with the scene 'living room', the action determining section determines the action to be carried out by the terminal 9 by selecting one possible action option from among options such as dancing and singing," [0058]; "For example, in a case where (i) the scene inferring section 24 has inferred that the scene 'living room' is the scene in which the terminal 9 is located and (ii) the action DB 5c stores the possible action options shown in Table 8 in association with the scene 'living room'," [0070]).

Mishra and Tanaka are in the same art of scene detection (Mishra, abstract; Tanaka, abstract). The combination of Tanaka with Mishra will enable using a table to present scene data. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the table of Tanaka with the invention of Mishra, as this was known at the time of filing and the combination would have predictable results; as one of a limited number of ways to organize data, the table would also have been obvious to try and a matter of design choice, as a table is easily readable and comprehensible to a human.

Regarding claim 8: Mishra, Niu et al., Cai et al., and Tanaka disclose the behavior recognition device according to claim 7. Mishra and Tanaka further indicate an installation support memory that is communicably connected to a display terminal, and that is configured to store a name of a space of the building in which the image sensor is installed using the display terminal, and output installation guidance to the display terminal, the installation guidance being for installing the image sensor with a field of view including a specific device or a specific facility related to the space (Mishra: "Many imaging devices also include manual or automatic features for modifying their respective fields of view or orientations. For example, a digital camera may be configured in a fixed position, or with a fixed focal length (e.g., fixed-focus lenses) or angular orientation. Alternatively, an imaging device may include one or more motorized features for adjusting a position of the imaging device, or for adjusting either the focal length (e.g., zooming the imaging device) or the angular orientation (e.g., the roll angle, the pitch angle or the yaw angle), by causing a change in the distance between the sensor and the lens (e.g., optical zoom lenses or digital zoom lenses), a change in the location of the imaging device, or a change in one or more of the angles defining the angular orientation," col. 5, line 60 - col. 6, line 5; "Additionally, where machine vision systems or methods are provided in large, dynamic environments, including but not limited to fulfillment centers, such as the working environment 130 of FIG. 1A, and also home or living environments (e.g., yards, living rooms or apartment buildings), retail establishments (e.g., large department stores), transportation facilities (e.g., airports, train stations, bus stations or seaports) or large venues such as stadiums or arenas, the difficulty in detecting and recognizing objects may be heightened due to the scale of such systems or methods, and the computer processing power and capacity that may be required in order to operate them, as well as complexities in the layouts or configurations of their respective scenes," col. 7, lines 25-55; "The imaging device 240 may comprise any form of optical recording device that may be used to photograph or otherwise record images of structures, facilities or other elements within the fulfillment center 230, as well as the items within the fulfillment center 230," col. 13, lines 20-45. Tanaka: "The scene inferring section 24 refers to the scene DB 5b and infers a scene in which the specific mobile body device is located, based on a combination of names, IDs, or the like of objects identified by the image recognition section 23 (scene inference processing). As used herein, the term 'scene in which the mobile body device is located' refers to the type of the location in which the mobile body device is situated. For example, in a case where the mobile body device is in a house, the scene may be a room such as a living room, a bedroom, a kitchen, etc. The scene DB 5b stores, in advance, association information which indicates an association between each object and each scene. More specifically, for example, the scene DB 5b stores in advance the names, IDs, or the like of objects which are likely to exist in each scene, in association with that scene," [0026]; "For example, in a case where (i) the scene inferring section 24 has inferred that the scene 'living room' is the scene in which the terminal 9 is located and (ii) the action DB 5c stores the possible action options shown in Table 8 in association with the scene 'living room'," [0070]).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Mishra (US 9305216 B1) and Niu et al. ("AtLAS: An Activ

[Office action excerpt truncated.]
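
For readers mapping the rejection onto an implementation, the pipeline recited in claims 1, 11, and 12 reduces to a few steps: resolve the sensor's space from its position and the room layout, narrow the target behaviors to that space's candidates, run one recognizer per candidate over the image data, combine the feature values, and classify. Below is a minimal Python sketch of that flow; it is not anything disclosed in the application or the cited references, and every name in it (BEHAVIOR_TABLE, specify_space, the recognizer and classifier callables) is hypothetical.

    import numpy as np

    # Hypothetical "behavior selection table": space -> candidate target behaviors.
    BEHAVIOR_TABLE = {
        "kitchen": ["cooking", "washing"],
        "study": ["using_computer", "reading"],
    }

    def specify_space(sensor_pos, room_layout):
        """Resolve which room contains the sensor, given 2-D room bounds."""
        x, y = sensor_pos
        for room, ((x0, y0), (x1, y1)) in room_layout.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                return room
        raise ValueError("sensor position lies outside every known room")

    def recognize_behavior(image, sensor_pos, room_layout, recognizers, classifier):
        """Select candidates for the space, compute per-candidate feature values,
        concatenate them, and let a classifier pick one candidate behavior."""
        space = specify_space(sensor_pos, room_layout)
        candidates = BEHAVIOR_TABLE[space]
        feature_values = [recognizers[b](image) for b in candidates]  # e.g., CNN outputs
        combined = np.concatenate(feature_values)  # the claimed "combined feature values"
        return candidates[int(classifier(combined))]  # the recognition result

The concatenation line is the generic pattern the examiner reads onto Cai's "concatenated representation" of image features and a top-down representation.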
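
Claim 5's limitation (a classifier machine-learned with the feature values as explanatory variables and the target behavior as the objective variable) is ordinary supervised fitting, and claim 6's weighting amounts to scaling the feature vector by predetermined coefficients beforehand. A hedged sketch follows; scikit-learn and the synthetic data are my choices for illustration, not anything in the record.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.random((200, 16))                    # explanatory variables: combined feature values
    y = rng.choice(["cooking", "washing"], 200)  # objective variable: labeled target behavior

    # Claim 6 flavor: weight coefficients fixed in advance per candidate behavior
    # (the values here are hypothetical placeholders).
    WEIGHTS = np.linspace(1.0, 2.0, 16)
    X_weighted = X * WEIGHTS

    clf = LogisticRegression(max_iter=1000).fit(X_weighted, y)
    print(clf.predict(X_weighted[:3]))           # predicted candidate behaviors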

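Claim 7 generates the behavior selection table from "room feature values": objects extracted from the layout, each classified as a movable object, a plumbing facility, or a building structure, and tagged with its installation position. A minimal sketch of that generation step, with the object classes and the object-to-behavior mapping invented purely for illustration:

    # The three classification buckets recited in claim 7.
    OBJECT_CLASS = {"cart": "movable", "sink": "plumbing", "stove": "plumbing",
                    "desk": "movable", "wall": "structure"}
    # Hypothetical mapping from objects to the target behaviors they suggest.
    SUGGESTED = {"sink": ["washing"], "stove": ["cooking"],
                 "desk": ["using_computer", "reading"]}

    def room_feature_values(objects):
        """Room feature value: (object, classification, installation position)."""
        return [(name, OBJECT_CLASS.get(name, "movable"), pos) for name, pos in objects]

    def behavior_selection_table(layout):
        """Associate each space with the target behaviors its objects suggest."""
        table = {}
        for space, objects in layout.items():
            behaviors = set()
            for name, _cls, _pos in room_feature_values(objects):
                behaviors.update(SUGGESTED.get(name, []))
            table[space] = sorted(behaviors)
        return table

    print(behavior_selection_table({
        "kitchen": [("sink", (1, 2)), ("stove", (2, 2))],
        "study": [("desk", (0, 1)), ("wall", (0, 0))],
    }))
    # -> {'kitchen': ['cooking', 'washing'], 'study': ['reading', 'using_computer']}

The table-per-space shape mirrors Tanaka's action DB, which the office action quotes as storing possible action options in association with each scene.
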
Prosecution Timeline

Dec 27, 2022: Application Filed
Mar 13, 2023: Response after Non-Final Action
May 09, 2025: Non-Final Rejection (§103)
Aug 12, 2025: Response Filed
Oct 05, 2025: Final Rejection (§103), current

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602775: INTERPOLATION OF MEDICAL IMAGES (granted Apr 14, 2026; 2y 5m to grant)
Patent 12602793: Systems and Methods for Predicting Object Location Within Images and for Analyzing the Images in the Predicted Location for Object Tracking (granted Apr 14, 2026; 2y 5m to grant)
Patent 12602949: SYSTEM AND METHOD FOR DETECTING HUMAN PRESENCE BASED ON DEPTH SENSING AND INERTIAL MEASUREMENT (granted Apr 14, 2026; 2y 5m to grant)
Patent 12597261: OBJECT MOVEMENT BEHAVIOR LEARNING (granted Apr 07, 2026; 2y 5m to grant)
Patent 12597244: METHOD AND DEVICE FOR IMPROVING OBJECT RECOGNITION RATE OF SELF-DRIVING CAR (granted Apr 07, 2026; 2y 5m to grant)
Study what changed to get these applications past this examiner, based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 76%
With Interview: 98% (+21.6%)
Median Time to Grant: 3y 1m
PTA Risk: Moderate
Based on 863 resolved cases by this examiner; grant probability is derived from the career allow rate.
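
The projection figures are consistent with simple arithmetic over the examiner's career counts. A sketch of the apparent derivation; the page does not disclose its exact model, so treat the additive interview adjustment as an assumption:

    granted, resolved = 658, 863
    base = granted / resolved  # 0.7625 -> the 76% grant probability
    lift = 0.216               # interview lift among resolved cases
    print(f"{base:.1%} base; {base + lift:.1%} with interview")
    # 76.2% base; 97.8% with interview (displayed as 98%)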

Free tier: 3 strategy analyses per month