DETAILED ACTION
Response to Amendment
Claims 1-20 are pending. Claims 1-20 are amended directly or by dependency on an amended claim.
Response to Arguments
Applicant’s arguments filed 27 March 2026 with respect to the 35 U.S.C. 103 rejections of claims 1-20 have been considered but are moot because the new ground of rejection does not rely on the combination of references, including the primary reference, applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. It is noted that the secondary reference applied to claim 18 was also changed to more closely align with the new primary reference in view of the amendments.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1, 7, 8, 10, 12, 13, and 15-17 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Croxford et al. (US 20190392194 A1).
Regarding claims 1 and 10, Croxford et al. disclose a method of filtering objects of interest of images captured at a computing device, and a computing device for filtering objects of interest of images, the computing device comprising an image capturing device, memory configured to store objects of interest, and a processor ([0052]-[0055]), the method comprising, and the processor being configured to perform, for an image captured by the image capturing device: determining, in a secure domain, one or more regions of interest in the captured image based on one or more objects of interest (The first sensed data is processed, using the neural network in the secure environment, to identify an object in the first sensed data. The method includes determining that the identified object belongs to a predetermined class of objects. In response to the determining, a first portion of the first sensed data is classified as data to be secured, and a second portion of the first sensed data is classified as data which is not to be secured, abstract; “At item 104, and in response to the determining that the identified object belongs to a predetermined class of objects, a first portion of the first image data is classified as image data to be secured with at least one security feature, and a second portion of the first image data is classified as image data to which said at least one security feature is not to be applied. For example, in the example in which the predetermined class of objects is ‘credit cards’ a first portion of the first image data, corresponding the identified credit card in the image, may be classified as image data to be secured”, [0020]), wherein the secure domain is separate from a non-secure domain comprising an operating system of the computing device (For example, components in the secure environment may have access to certain storage, e.g. secure or “protected” memory regions, which are not accessible to components and systems outside of the secure environment. For example, components and devices performing non-secure operations, e.g. non-secure image processing operations, should be prevented from accessing any storage, e.g. region of memory, which is defined as being secure. Certain storage, e.g. non-secure storage, may exist outside of the secure environment, to which components outside of the secure environment, e.g. in a non-secure environment, may access. For example, a storage device may be divided into a secure region and a non-secure region, [0016]); modifying, in the secure domain, the captured image to form a modified image based on the determined one or more regions of interest (Obfuscating the first portion, e.g. in the modified version of the first image data, may involve at least one of blurring, obscuring, and redacting the first portion for example, [0027]; For example, in the case where the predetermined class of objects corresponds to children's faces, the second image data output by the object identification system may include an obfuscation of the image data corresponding to a location of a child's face in the image, [0028]), wherein an unmodified version of the captured image is inaccessible to the non-secure domain (The secure environment may be implemented on the object identification system using the TrustZone® technology developed by Arm Limited of Cambridge, UK for example, which provides mechanisms for enforcing security boundaries in a data processing apparatus such as an image processing system. In essence, components within the secure environment (or “secure domain”) are trusted within an image processing system (e.g. comprising the object identification system) and therefore are allowed access to security-sensitive data, e.g. within the image processing system, whilst components outside the secure environment (e.g. in a “non-secure domain”) are not allowed access to such security-sensitive data. For example, components in the secure environment may have access to certain storage, e.g. secure or “protected” memory regions, which are not accessible to components and systems outside of the secure environment. For example, components and devices performing non-secure operations, e.g. non-secure image processing operations, should be prevented from accessing any storage, e.g. region of memory, which is defined as being secure, [0016]); and providing, from the secure domain to the non-secure domain, the modified image for display, wherein the modified image is displayed without the one or more objects of interest being viewable (In the example above, the ‘credit card’ class may have been set as the predetermined class of objects so that the parts of an image that are determined to not contain a credit card can be released from the object identification system as non-secure data, e.g. into a non-secure domain, [0024]; In this way, the first portion may be unintelligible when the second image data is displayed. This allows the second image data to be used, e.g. displayed or further processed, as non-secure data without the original first portion of the first image data (corresponding to the object belonging to the predetermined class) being released from the secure environment. Thus, it is possible allow for further use of an image comprising a given type of object, e.g. a credit card, to be made outside of the secure environment while not releasing therefrom the particular information associated with the given type of object, [0027]; For example, in the case where the predetermined class of objects corresponds to children's faces, the second image data output by the object identification system may include an obfuscation of the image data corresponding to a location of a child's face in the image, [0028]).
Regarding claims 7 and 15, Croxford et al. disclose the method and device of claims 1 and 10. Croxford et al. further indicate determining the one or more regions of interest by performing inference processing, using a neural network, on the captured image (“After the training phase, the neural network 300 (which may be referred to as a trained neural network 300) may be used to detect the presence of objects of a predetermined class of objects in input images. This process may be referred to as “classification” or “inference”. Classification typically involves convolution of the kernels obtained during the training phase with image patches of the image input to the neural network 300 to generate a feature map. The feature map may then be processed using at least one fully connected layer to classify the image”, [0044]).
Regarding claims 8 and 16, Croxford et al. disclose the method and device of claims 7 and 15. Croxford et al. further indicate providing, as inputs to the neural network, image data representing the captured image and image data representing the one or more objects of interest (determining that the identified object belongs to a predetermined class of objects, abstract, [0020]); and identifying the one or more regions of interest as comprising the one or more objects of interest (“At item 104, and in response to the determining that the identified object belongs to a predetermined class of objects, a first portion of the first image data is classified as image data to be secured with at least one security feature, and a second portion of the first image data is classified as image data to which said at least one security feature is not to be applied. For example, in the example in which the predetermined class of objects is ‘credit cards’ a first portion of the first image data, corresponding the identified credit card in the image, may be classified as image data to be secured. The rest of the image, determined to not contain an object belonging to the predetermined class, e.g. a credit card, may be classified as the second portion, for example not to be secured with the at least one security feature”, [0020], As an example, the first portion of the first image data may be flagged as secure by setting a value of a security flag associated therewith, [0022], In general, the first image data may be modified to alter the appearance of the first portion in the second image data which is output as non-secure image data. For example, pixel values corresponding to the first portion may be altered to obfuscate the first portion when the second image data is displayed, e.g. on a display device, [0027]).
Regarding claim 12, Croxford et al. disclose the device of claim 10. Croxford et al. further indicate the objects of interest are stored in a secure portion of the memory which is not accessible by a non-secure operating system of the computing device (For example, components in the secure environment may have access to certain storage, e.g. secure or “protected” memory regions, which are not accessible to components and systems outside of the secure environment. For example, components and devices performing non-secure operations, e.g. non-secure image processing operations, should be prevented from accessing any storage, e.g. region of memory, which is defined as being secure, Certain storage, e.g. non-secure storage, may exist outside of the secure environment, to which components outside of the secure environment, e.g. in a non-secure environment, may access, For example, a storage device may be divided into a secure region and a non-secure region, [0016]).
Regarding claim 13, Croxford et al. disclose the device of claim 10. Croxford et al. further indicate the objects of interest comprise one or more nonviewable objects of interest, and the processor is configured to: select the one or more nonviewable objects of interest (The first sensed data is processed, using the neural network in the secure environment, to identify an object in the first sensed data. The method includes determining that the identified object belongs to a predetermined class of objects. In response to the determining, a first portion of the first sensed data is classified as data to be secured, and a second portion of the first sensed data is classified as data which is not to be secured, abstract, “At item 104, and in response to the determining that the identified object belongs to a predetermined class of objects, a first portion of the first image data is classified as image data to be secured with at least one security feature, and a second portion of the first image data is classified as image data to which said at least one security feature is not to be applied. For example, in the example in which the predetermined class of objects is ‘credit cards’ a first portion of the first image data, corresponding the identified credit card in the image, may be classified as image data to be secured” [0020]); and modify the image by preventing the one or more nonviewable objects of interest from being viewable in the image (Obfuscating the first portion, e.g. in the modified version of the first image data, may involve at least one of blurring, obscuring, and redacting the first portion for example, [0027], For example, in the case where the predetermined class of objects corresponds to children's faces, the second image data output by the object identification system may include an obfuscation of the image data corresponding to a location of a child's face in the image, [0028]).
Regarding claim 17, Croxford et al. disclose the computing device of claim 10. Croxford et al. further indicate the modified image is displayed at the display (In this way, the first portion may be unintelligible when the second image data is displayed. This allows the second image data to be used, e.g. displayed or further processed, as non-secure data without the original first portion of the first image data, [0027]).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 2-5 and 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Croxford et al. (US 20190392194 A1) as applied to claims 1 and 10, in view of Nicholson et al. (US 20230306610 A1), and further in view of Tang et al. (US 20220058394 A1).
Regarding claims 2 and 11, Croxford et al. disclose the method and device of claims 1 and 10. Croxford et al. do not disclose identifying an application executing on the computing device as a video conferencing application; in response to the identification of the application executing on the computing device, selecting the one or more objects of interest from a stored list of objects of interest; and determining the one or more regions of interest in the captured image based on the selected one or more objects of interest.
Nicholson et al. teach identifying an application executing on the computing device as a video conferencing application; in response to the identification of the application executing on the computing device, selecting the one or more objects of interest from a stored list of objects of interest; and determining the one or more regions of interest in the captured image based on the selected one or more objects of interest (Optionally, the one or more processors may be configured to identify the set of one or more objects to modify based on designated settings of a user profile stored on the memory or another data storage device that is operably connected to the one or more processors, [0008]; In an embodiment, the controller 102 may display the annotated image 400 during a set-up or preview stage before the video feed generated by the camera 104 is remotely transmitted. For example, upon initiating a video conference application on the user computer device 202, the controller 102 may present the annotated image 400 to the user prior to joining the meeting. In another example, the user may select, via the input device 108, to update user settings. In response to the selection to update user settings, the controller 102 may display the annotated image 400 to enable the user to set the status of the objects 310. Once the statuses for the objects 310 are set, the controller 102 may save the object statuses as a user profile. For a subsequent streaming event (e.g., video conference call), the user may simply select the user profile from a list of profiles, and the controller 102 accesses the object statues from the user database 118 of the memory 114. The user may set multiple different profiles which have different objects 310 modified and/or the same objects 310 modified but in a different way. For example, the user may set one profile for work video conference meetings, a second profile for video conferences with extended family, and a third profile for video conferences with friends. The image alteration system 100 enables the user to curate the particular objects 310 shown in the background environment 304 for each of the different types of video streaming events. The profiles represent short cuts that allows the user to quickly access a pre-selected ensemble of object appearance changes, without setting individual object statuses. For example, if the controller 102 receives a command to implement a specific user profile, then after segmenting the input image data, the controller 102 identifies the set of one or more objects 310 to modify based on the designated settings in the selected user profile, [0046]; The stock objects in the library 120 may be categorized based on type, size, and/or the like. The user may select the stock object to replace the given object 310 via the input device 108, [0054]; The set of objects 310 that are modified may differ for different audiences of the streaming event. For example, a “work” profile may replace a family picture with a stock inspirational poster, may replace family photo albums with stock images of textbooks, and/or the like, [0056]; Optionally, the set of objects to modify may be based on designated settings of a user profile and/or user input selections generated via an input device 108. The user profile may be stored in a memory device, such as the memory 114 of the controller 102, [0061]; The image alteration system and method described herein allows the user to create a physical background that is customized for the audience of the video streaming event, such as a video conference call. The customization may be integrated into the existing background without universally blurring or replacing the background, [0065]).
Croxford et al. and Nicholson et al. are in the same art of object identification (Croxford et al., abstract; Nicholson et al., [0008]). The combination of Nicholson et al. with Croxford et al. will allow for identifying an application executing on the computing device as a video conferencing application. It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to combine the identifying of Nicholson et al. with the invention of Croxford et al., as this was known at that time, the combination would have predictable results, and, as Nicholson et al. state, “The image alteration system and method described herein allows the user to create a physical background that is customized for the audience of the video streaming event, such as a video conference call. The customization may be integrated into the existing background without universally blurring or replacing the background” ([0065]), indicating a customization improvement that would result from combining the inventions.
It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art that Nicholson et al. teach “selecting the one or more objects of interest from a stored list of objects of interest”, as there are stored profiles indicating which objects to replace, and the replacement objects are stored in a library; together these teach the limitation. It would likewise have been obvious to one of ordinary skill in the art that Nicholson et al. teach “determining the one or more regions of interest in the captured image based on the selected one or more objects of interest”, as the reference teaches determining which objects to replace, and the shape of the replacement area is based on the object that is replaced. Another reference is nevertheless added for a more explicit teaching of these limitations.
Tang et al. teach in response to the identification of the application executing on the computing device, selecting the one or more objects of interest from a stored list of objects of interest (In some embodiments, the cloud service 202 may be configured to provide resources such as training data and/or a database of feature maps (e.g., feature maps of recognized objects that may be used as a basis to perform object recognition and/or classification), [0118], In some embodiments, the settings may be stored in the cloud service 202 as the event settings storage 210 (e.g., using a secured account). The signal PREFS may comprise the objects and/or events of interest selected by the user. In one example, the signal PREFS may enable the user to select people and animals as the objects and/or events of interest, [0123]) and determining the one or more regions of interest in the captured image based on the selected one or more objects of interest (In the example shown, the distortion effect 392 may have a circular shape. In some embodiments, the distortion effect 392 may be intelligently selected to have a shape that corresponds to the shape of the body part and/or face that may obscured. For example, the processor 102 may be configured to identify the shape of the face 378 based on the characteristics of the pixels (e.g., an arrangement of similar colors) to determine the shape for the distortion effect 392. In another example, the processor 102 may be configured to apply the distortion effect 392 to a randomly selected area around the face 378 (or other body parts) to help conceal identifying features of the family member 354 (e.g., conceal a body shape or clothes worn that might be used to identify the family member 354). In one example, the distortion effect 392 may be a mask (e.g., a colored mask overlaid on top of the face of the family member). In another example, the distortion effect 392 may be a blur effect. 
In yet another example, the distortion effect 392 may be a mosaic effect. In still another example, the distortion effect 392 may comprise cropping and/or removing pixels (e.g., replacing with null data or random data). In yet another example, the distortion effect 392 may comprise replacing the face 378 with an alternate graphic. The type of the distortion effect 392 applied may be varied according to the design criteria of a particular implementation, [0205]).
Croxford et al., Nicholson et al., and Tang et al. are in the same art of object identification (Croxford et al., abstract; Nicholson et al., [0008]; Tang et al., [0118]). The combination of Tang et al. with Croxford et al. and Nicholson et al. will allow for determining the one or more regions of interest in the captured image based on the selected one or more objects of interest. It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to combine the determining of Tang et al. with the invention of Croxford et al. and Nicholson et al., as this was known at that time, the combination would have predictable results, and, as Tang et al. indicate, “It would be desirable to implement a person-of-interest centric timelapse video with AI input on home security camera to protect privacy” ([0006]), “The edge AI home security device/camera may be configured to implement artificial intelligence (AI) technology. Using AI technology, the edge AI camera may be a more powerful (e.g., by providing relevant data for the user) and a more power efficient solution than using a cloud server in many aspects” ([0026]), and “Implementing various functionality of the processor 102 using the dedicated hardware modules 190a-190n may enable the processor 102 to be highly optimized and/or customized to limit power consumption, reduce heat generation and/or increase processing speed compared to software implementations” ([0099]), thereby providing privacy, efficiency, and customizability advantages to the combination of inventions.
Regarding claim 3, Croxford et al. and Nicholson et al. and Tang et al. disclose the method of claim 2. Tang et al. further teach the stored list of objects of interest are stored in a secure portion of memory which is not accessible by the operating system of the computing device (“In an example, the companion app implemented on the remote devices 204a-204n may enable the end users to adjust various settings for the camera systems 100a-100n and/or the video captured by the camera systems 100a-100n. In some embodiments, the settings may be stored in the cloud service 202 as part of the event settings storage 210 (e.g., using a secured account). However, in some embodiments, to ensure privacy protection, the settings of the signal IPREFS may instead avoid communication to/from the cloud service 202. For example, a direct connection and/or a communication that does not transfer data to the cloud service 202 may be established between one or more of the remote devices 204a-204n and the edge AI camera 100i. The signal IPREFS may comprise the faces and/or identities of various people that may be selected by the user. The signal IPREFS may enable the user to select people (e.g., faces) as privacy events. In one example, the signal IPREFS may enable the user to select people (e.g., faces) to enable the processor 102 to distinguish between people that are considered privacy events and people that are not considered privacy events. Generally, the data from the signal IPREFS may not be stored in the cloud services 202”, [0132], For example, there may be no concern of leaking family privacy information (e.g., video and/or images of family members and/or the behavior of family members) because the faces of the family members may be enrolled locally using the app on the remote devices 204a-204n and the feature set IFEAT generated from the enrolled faces may be sent via a local network rather than through the cloud service 202. 
The data about the events and/or objects of interest may be routed through the cloud service 202, but the family privacy information may never be uploaded to the cloud service 202, [0135]).
Regarding claim 4, Croxford et al. and Nicholson et al. and Tang et al. disclose the method of claim 2. Nicholson et al. and Tang et al. further indicate the stored list of objects of interest comprise one or more nonviewable objects of interest, and the captured image is modified to prevent the one or more nonviewable objects of interest, in the stored list of objects of interest, from being viewable in the captured image (Nicholson et al., Thus, the system enables a user to select, on an object-by-object basis, which objects the user would like to conceal from the outgoing image data without requiring the user to blur or replace the entire background environment with a stock background. The image data produced by the system may resemble the user's actual room or space except for the select objects modified. The system can successfully conceal personal and private aspects visible in the background of a camera view, without substantially changing the image aesthetics, [0022], The first user may modify objects that are personal or private by choosing to blur, remove, or replace those objects., [0023], After the segmentation of the input image data, the controller 102 may identify a set of one or more of the objects 310 in the background environment 304 to modify. For example, some of the objects 310 may be deemed by the user as too personal or private, or simply not appropriate for the tenor of the video conference call. The set of objects 310 may be identified in order to conceal those objects 310 from view by persons that view the video stream showing the room 308., [0043], The profiles represent short cuts that allows the user to quickly access a pre-selected ensemble of object appearance changes, without setting individual object statuses. 
For example, if the controller 102 receives a command to implement a specific user profile, then after segmenting the input image data, the controller 102 identifies the set of one or more objects 310 to modify based on the designated settings in the selected user profile., [0046], The set of objects 310 that are modified may differ for different audiences of the streaming event. For example, a “work” profile may replace a family picture with a stock inspirational poster, may replace family photo albums with stock images of textbooks, and/or the like, [0056]; Tang et al., Embodiments of the present invention may be configured to protect the privacy of particular people when the smart timelapse video is generated. The video content (e.g., what appears in the smart timelapse video) may be automatically adjusted in response to the objects/events detected. Particular objects/events may be shown as captured in the smart timelapse video and other objects/events may be excluded and/or removed from the smart timelapse video stream. For example, the smart timelapse video stream may be generated to include the faces and behaviors of strangers but exclude the faces and behaviors of family members. Generally, the faces and/or behaviors excluded from the smart timelapse video stream may correspond to privacy concerns (e.g., identifying particular people, a person being uncomfortable being on video, preventing the storage of potentially embarrassing behaviors, etc.). The criteria for including or excluding video content may be varied according to the design criteria of a particular implementation, [0025], The signal IPREFS may be communicated via a local network in order to protect a privacy of people and/or faces of people that may be communicated (e.g., to generate feature set data)., [0131], The signal IPREFS may enable the user to select people (e.g., faces) as privacy events. 
In one example, the signal IPREFS may enable the user to select people (e.g., faces) to enable the processor 102 to distinguish between people that are considered privacy events and people that are not considered privacy events, [0132]).
Regarding claim 5, Croxford et al. and Nicholson et al. and Tang et al. disclose the method of claim 4. Nicholson et al. and Tang et al. further indicate modifying the captured image to prevent the one or more nonviewable objects of interest from being viewable comprises blurring the one or more nonviewable objects of interest, blacking out the one or more nonviewable objects of interest, or distorting the one or more nonviewable objects of interest (Nicholson et al., blur objects, [0023], [0050]; Tang et al., blur face, [0023], apply distortion, [0034], [0203]).
Claim(s) 6 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Croxford et al. (US 20190392194 A1), Nicholson et al. (US 20230306610 A1), and Tang et al. (US 20220058394 A1) as applied to claims 4 and 13, and further in view of Kurtz et al. (US 20080297587 A1).
Regarding claims 6 and 14, Croxford et al. and Nicholson et al. and Tang et al. disclose the method and device of claims 4 and 13. Nicholson et al. and Tang et al. also indicate the stored list of objects of interest further comprise one or more viewable objects of interest (Nicholson et al., the one or more processors may be configured to identify the set of one or more objects to modify based on designated settings of a user profile stored on the memory or another data storage device that is operably connected to the one or more processors, [0008], In an embodiment, the controller 102 may display the annotated image 400 during a set-up or preview stage before the video feed generated by the camera 104 is remotely transmitted. For example, upon initiating a video conference application on the user computer device 202, the controller 102 may present the annotated image 400 to the user prior to joining the meeting. In another example, the user may select, via the input device 108, to update user settings. In response to the selection to update user settings, the controller 102 may display the annotated image 400 to enable the user to set the status of the objects 310. Once the statuses for the objects 310 are set, the controller 102 may save the object statuses as a user profile. For a subsequent streaming event (e.g., video conference call), the user may simply select the user profile from a list of profiles, and the controller 102 accesses the object statues from the user database 118 of the memory 114. The user may set multiple different profiles which have different objects 310 modified and/or the same objects 310 modified but in a different way. For example, the user may set one profile for work video conference meetings, a second profile for video conferences with extended family, and a third profile for video conferences with friends. 
The image alteration system 100 enables the user to curate the particular objects 310 shown in the background environment 304 for each of the different types of video streaming events. The profiles represent short cuts that allows the user to quickly access a pre-selected ensemble of object appearance changes, without setting individual object statuses. For example, if the controller 102 receives a command to implement a specific user profile, then after segmenting the input image data, the controller 102 identifies the set of one or more objects 310 to modify based on the designated settings in the selected user profile., [0046], The stock objects in the library 120 may be categorized based on type, size, and/or the like. The user may select the stock object to replace the given object 310 via the input device 108, [0054], The set of objects 310 that are modified may differ for different audiences of the streaming event. For example, a “work” profile may replace a family picture with a stock inspirational poster, may replace family photo albums with stock images of textbooks, and/or the like, [0056], Optionally, the set of objects to modify may be based on designated settings of a user profile and/or user input selections generated via an input device 108. The user profile may be stored in a memory device, such as the memory 114 of the controller 102, [0061], The image alteration system and method described herein allows the user to create a physical background that is customized for the audience of the video streaming event, such as a video conference call. 
The customization may be integrated into the existing background without universally blurring or replacing the background, [0065]; Tang et al., In some embodiments, the cloud service 202 may be configured to provide resources such as training data and/or a database of feature maps (e.g., feature maps of recognized objects that may be used as a basis to perform object recognition and/or classification), [0118], In some embodiments, the settings may be stored in the cloud service 202 as the event settings storage 210 (e.g., using a secured account). The signal PREFS may comprise the objects and/or events of interest selected by the user. In one example, the signal PREFS may enable the user to select people and animals as the objects and/or events of interest, [0123]).
Croxford et al. and Nicholson et al. and Tang et al. do not explicitly disclose that the captured image is modified to prevent the one or more nonviewable objects of interest from being viewable in the captured image by cropping the captured image to include the one or more viewable objects of interest without the one or more nonviewable objects of interest.
Kurtz et al. teach the captured image is modified to prevent the one or more nonviewable objects of interest from being viewable in the captured image by cropping the captured image to include the one or more viewable objects of interest without the one or more nonviewable objects of interest (Although users 10 may define image areas 422 for exclusion from video capture for various reasons, maintenance of personal or family privacy is likely the key motivator. As shown in FIG. 4A, an image capture device 120 (the WFOV camera) has a portion of its image field of view 420, indicated by image area 422, modified, for example, by cropping image area 422 out of the captured image before image transmission across network 360 to a remote site 364. The local user 10 can utilize the privacy interface 400 and the contextual interface 450 to establish human perceptible modifications to a privacy sensitive image area 422, [0066], For example, a privacy sensitive image area 422 may simply be cropped out of the captured images. Alternately, an image area 422 can be modified or obscured with other visual effects, such as distorting, blurring (lowering resolution), or shading (reducing brightness or contrast). For example, the shading can be applied as a gradient, to simulate a natural illumination fall-off. Device supplied scene analysis rules can be used to recommend obscuration effects, [0067], As another circumstance typical of the residential setting, it can be anticipated that children or pets or neighbors can wander into the capture field of view during a communication event. In particular, in such environments, it is not uncommon to have unclothed children wandering about the residence in unpresentable forms of attire. The contextual interface 450 can quickly recognize this and direct the image processor 320 to blur or crop out imagery of privacy sensitive areas. Indeed, the default settings in the privacy interface 400 may require such blurring or cropping, [0084]).
Nicholson et al. and Kurtz et al. are in the same art of video-conferencing (Nicholson et al., [0001]; Kurtz et al., [0043]). The combination of Kurtz et al. with Croxford et al. and Nicholson et al. and Tang et al. will allow for cropping the image. It would have been obvious at the time of filing to combine the cropping of Kurtz et al. with the invention of Croxford et al. and Nicholson et al. and Tang et al. because this was known at the time of filing and the combination would have predictable results. In addition, Kurtz et al. indicate, “This video communication system is particularly intended for use in the residential environment, where a variety of factors, such as variable conditions and participants, ease of use, privacy concerns, and system cost, are highly relevant” ([0003]) and “maintenance of personal or family privacy is likely the key motivator” ([0066]), which provides a privacy motivation for the combination of inventions.
Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Croxford et al. (US 20190392194 A1) as applied to claim 1, further in view of Elron et al. (US 20220092400 A1).
Regarding claim 9, Croxford et al. disclose the method of claim 1. Croxford et al. do not explicitly disclose identifying the one or more regions of interest as comprising the one or more objects of interest using a neural network, trained prior to runtime, to recognize the one or more objects of interest.
Elron et al. teach identifying the one or more regions of interest as comprising the one or more objects of interest using a neural network, trained prior to runtime, to recognize the one or more objects of interest (A foreground-background classification (or segmentation) was obtained using a convolutional neural network (CNN). An image 104 shows the difference in classifications (between background and foreground) between the two consecutive frames 100 and 102. Those light areas that are non-zero (in difference indicating motion) in image 104 are a very small part of the image and indicate the noticeable differences between the two frames. An image 106 shows a shaded overlay 108 indicating the locations on the frame that a temporal predictor of the present method disabled turned off a main CNN by omitting layer operations for this area of the frame, [0025], neural network inferencing, [0049], “As a preliminary matter, process 600 may include “train neural networks” 602, and by one example, this is performed offline before a runtime. The main NN may be trained as by known methods and depending on the architecture and purpose of the NN. No significant changes to the training are needed to implement the main NN disclosed herein”, [0052], highly efficient neural network video image processing, semantic classifications when object segmentation is being performed, [0061]).
Nicholson et al. and Elron et al. are in the same art of video-conferencing (Nicholson et al., [0001]; Elron et al., [0027] [0117]). The combination of Elron et al. with Croxford et al. and Nicholson et al. and Tang et al. will allow for using a neural network, trained prior to runtime, to recognize the one or more objects of interest. It would have been obvious at the time of filing to combine the training of Elron et al. with the invention of Croxford et al. and Nicholson et al. and Tang et al. because this was known at the time of filing and the combination would have predictable results. In addition, as Elron et al. indicate, this will allow for highly efficient neural network video image processing, semantic classifications when object segmentation is being performed ([0061]), providing an efficiency benefit to the combination of inventions.
Claim(s) 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Croxford et al. (US 20190392194 A1) in view of Yerli (US 20240048600 A1).
Regarding claim 18, Croxford et al. disclose a computing device for filtering objects of interest of images comprising: an image capturing device; memory configured to store objects of interest; and a first processor configured to ([0052]-[0055]): for a captured image: determining, in a secure domain, one or more regions of interest in the captured image based on one or more objects of interest (The first sensed data is processed, using the neural network in the secure environment, to identify an object in the first sensed data. The method includes determining that the identified object belongs to a predetermined class of objects. In response to the determining, a first portion of the first sensed data is classified as data to be secured, and a second portion of the first sensed data is classified as data which is not to be secured, abstract, “At item 104, and in response to the determining that the identified object belongs to a predetermined class of objects, a first portion of the first image data is classified as image data to be secured with at least one security feature, and a second portion of the first image data is classified as image data to which said at least one security feature is not to be applied. For example, in the example in which the predetermined class of objects is ‘credit cards’ a first portion of the first image data, corresponding the identified credit card in the image, may be classified as image data to be secured” [0020]), wherein the secure domain is separate from a non-secure domain comprising an operating system of the computing device (For example, components in the secure environment may have access to certain storage, e.g. secure or “protected” memory regions, which are not accessible to components and systems outside of the secure environment. For example, components and devices performing non-secure operations, e.g. non-secure image processing operations, should be prevented from accessing any storage, e.g. 
region of memory, which is defined as being secure, Certain storage, e.g. non-secure storage, may exist outside of the secure environment, to which components outside of the secure environment, e.g. in a non-secure environment, may access, For example, a storage device may be divided into a secure region and a non-secure region [0016]); modifying, in the secure domain, the captured image to form a modified image based on the determined one or more regions of interest (Obfuscating the first portion, e.g. in the modified version of the first image data, may involve at least one of blurring, obscuring, and redacting the first portion for example, [0027], For example, in the case where the predetermined class of objects corresponds to children's faces, the second image data output by the object identification system may include an obfuscation of the image data corresponding to a location of a child's face in the image, [0028]), wherein an unmodified version of the captured image is inaccessible to the non-secure domain (The secure environment may be implemented on the object identification system using the TrustZone® technology developed by Arm Limited of Cambridge, UK for example, which provides mechanisms for enforcing security boundaries in a data processing apparatus such as an image processing system. In essence, components within the secure environment (or “secure domain”) are trusted within an image processing system (e.g. comprising the object identification system) and therefore are allowed access to security-sensitive data, e.g. within the image processing system, whilst components outside the secure environment (e.g. in a “non-secure domain”) are not allowed access to such security-sensitive data, For example, components in the secure environment may have access to certain storage, e.g. secure or “protected” memory regions, which are not accessible to components and systems outside of the secure environment. 
For example, components and devices performing non-secure operations, e.g. non-secure image processing operations, should be prevented from accessing any storage, e.g. region of memory, which is defined as being secure, [0016]); and providing, from the secure domain to the non-secure domain, the modified image for display, wherein the modified image is displayed without the one or more objects of interest being viewable (In the example above, the ‘credit card’ class may have been set as the predetermined class of objects so that the parts of an image that are determined to not contain a credit card can be released from the object identification system as non-secure data, e.g. into a non-secure domain, [0024], In this way, the first portion may be unintelligible when the second image data is displayed. This allows the second image data to be used, e.g. displayed or further processed, as non-secure data without the original first portion of the first image data (corresponding to the object belonging to the predetermined class) being released from the secure environment, Thus, it is possible allow for further use of an image comprising a given type of object, e.g. a credit card, to be made outside of the secure environment while not releasing therefrom the particular information associated with the given type of object, [0027], For example, in the case where the predetermined class of objects corresponds to children's faces, the second image data output by the object identification system may include an obfuscation of the image data corresponding to a location of a child's face in the image, [0028]).
Croxford et al. disclose blurring, obscuring, and redacting the first portion of the image in the secure portion of the computing device ([0027]), which could be interpreted as “in the secure domain, for the image captured by the image capturing device: convert the image for processing by the first processor”; however, another reference is added to align more closely with the definition of “convert” given in the specification.
Yerli teaches in the secure domain, for the image captured by the image capturing device: convert the image for processing by the first processor (generating specific secure deep links, [0066], “In various embodiments, the level and ratio of usage of the client-server side 304 with respect to the P2P side 306 depend on the amount of data to be processed, the latency permitted to sustain a smooth user experience, the desired quality of service (QOS), the services required, and the like. In one embodiment, the P2P side 306 is used for video and data processing, streaming and rendering. This mode of employing the hybrid system architecture 300 may be suitable, for example, when a low latency and low amounts of data need to be processed, and when in the presence of “heavy” clients, meaning that client devices 308 comprise sufficient computing power to perform such operations. In another embodiment, a combination of the client-server side 304 and P2P side 306 is employed, such as the P2P side 306 being used for video streaming and rendering while the client-server side 304 is used for data processing. This mode of employing the hybrid system architecture 300 may be suitable, for example, when there is a high amount of data to be processed or when other micro-services may be required. 
In yet further embodiments, the client-server side 304 may be used for video streaming along with data processing while the P2P side 306 is used for video rendering”, [0093], “In such embodiments, where the intermediary server is a SAMS, such media server manages, analyze and processes incoming data of each sending client device 308 (including but not limited to meta-data, priority data, data classes, spatial structure data, three dimensional positional, orientation or locomotion information, image, media, scalable video codec based video), and in such analysis optimizes the forwarding of the outbound data streams to each receiving client device 308 by modifying, upscaling or downscaling the media for temporal (e.g., varying frame rate), spatial (e.g., different image size), quality (e.g., different compression or encoding based qualities) and color (e.g., color resolution and range) based on the specific receiving client device user's spatial, three dimensional orientation, distance and priority relationship to such incoming data achieving optimal bandwidths and computing resource utilizations for receiving one or more user client devices 308” [0095], “In some embodiments, the media, video and data processing comprise one or more further encoding, 
transcoding, decoding spatial or 3D analysis and improvements comprising image filtering, computer vision processing, image sharpening, background improvements, background removal, foreground blurring, eye covering, pixilation of faces, voice-distortion, image uprezzing, image cleansing, bone structure analysis, face or head counting, object recognition, marker or QR, code-tracking, eye tracking, feature analysis, 3D mesh or volume generation, feature tracking, facial recognition, SLAM tracking and facial expression recognition or other modular plugins in form of micro-services running on such media router or servers.” [0096] secure end-to-end communication between the client device 308 and web/application servers 312 over a network, [0097]) [indicates the various image processing such as blurring can be performed on the client-server side or the P2P side as desired].
Croxford et al. and Yerli are in the same art of object identification (Croxford et al., abstract; Yerli, [0008]). The combination of Yerli with Croxford et al. will allow for converting the image for processing by the first processor. It would have been obvious at the time of filing to combine the conversion of Yerli with the invention of Croxford et al. because this was known at the time of filing and the combination would have predictable results. In addition, Yerli indicates, “Providing the entitlements to each of the meeting slots 208 and providing a deep link that directs a participant directly to the corresponding meeting slot 208 comprising the entitlements 210, enables increased session security. In situations where the deep link is unique to the meeting slot 208 and is renewed after each session, this decreases the chances of the link being “leaked” or otherwise obtained by an unauthorized user” ([0086]), indicating an added security benefit when the inventions are combined.
Regarding claim 19, Croxford et al. and Yerli disclose the computing device of claim 18. Croxford et al. further indicate the first processor is an inference processing unit configured to determine the one or more regions of interest by performing inference processing on the image using a neural network (“After the training phase, the neural network 300 (which may be referred to as a trained neural network 300) may be used to detect the presence of objects of a predetermined class of objects in input images. This process may be referred to as “classification” or “inference”. Classification typically involves convolution of the kernels obtained during the training phase with image patches of the image input to the neural network 300 to generate a feature map. The feature map may then be processed using at least one fully connected layer to classify the image”, [0044]), and the second processor is an image signal processor, in the secure domain separate from the non-secure domain comprising the operating system of the computing device (For example, components in the secure environment may have access to certain storage, e.g. secure or “protected” memory regions, which are not accessible to components and systems outside of the secure environment. For example, components and devices performing non-secure operations, e.g. non-secure image processing operations, should be prevented from accessing any storage, e.g. region of memory, which is defined as being secure, Certain storage, e.g. non-secure storage, may exist outside of the secure environment, to which components outside of the secure environment, e.g. in a non-secure environment, may access, For example, a storage device may be divided into a secure region and a non-secure region [0016]).
Regarding claim 20, Croxford et al. and Yerli disclose the computing device of claim 18. Croxford et al. further indicate portions of the memory accessed by the first processor and the second processor are secure portions of memory which are not accessible by a non-secure operating system of the computing device (The secure environment may be implemented on the object identification system using the TrustZone® technology developed by Arm Limited of Cambridge, UK for example, which provides mechanisms for enforcing security boundaries in a data processing apparatus such as an image processing system. In essence, components within the secure environment (or “secure domain”) are trusted within an image processing system (e.g. comprising the object identification system) and therefore are allowed access to security-sensitive data, e.g. within the image processing system, whilst components outside the secure environment (e.g. in a “non-secure domain”) are not allowed access to such security-sensitive data, For example, components in the secure environment may have access to certain storage, e.g. secure or “protected” memory regions, which are not accessible to components and systems outside of the secure environment. For example, components and devices performing non-secure operations, e.g. non-secure image processing operations, should be prevented from accessing any storage, e.g. region of memory, which is defined as being secure, [0016]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084. The examiner can normally be reached 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent M Rudolph, can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHELLE M ENTEZARI HAUSMANN/
Primary Examiner, Art Unit 2671