DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of the Claims
Claims 1-9 are currently pending in the present application, with claims 1 and 7 being independent. Claims 10-22 have been cancelled.
Election/Restrictions
Applicant’s election without traverse of Group I in the reply filed on 25 February 2026 is acknowledged. Applicant has cancelled the claims corresponding to the non-elected groups.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 4-7, and 9 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Smit (US PG Publication 2018/0164876).
Regarding claim 1, Smit teaches a system for remote viewing (see for instance, paragraphs 6, 63-65 and figs. 1 and 2), the system comprising: memory configured to store computer-executable instructions; and a hardware processor in communication with the memory, wherein the computer-executable instructions, when executed by the hardware processor (see for instance, paragraphs 61-65 and 70), cause the hardware processor to:
obtain a sequence of images of a location, wherein the sequence of images is captured by at least one of a plurality of cameras positioned at one or more positions in the location (a plurality of cameras at different viewpoints, the cameras arranged at a location within an environment, see for instance, paragraphs 7, 20, 63, and fig. 1. Images are obtained of the location, see for instance, paragraphs 31, 61, 64, and 68);
generate a virtual space, wherein the virtual space is a virtual representation of the location (A virtual representation of the location is generated, see for instance, paragraphs 63, 68, 70, and 93);
determine at least one of a position, a direction of travel, or a speed of travel of a remote user within the virtual representation of the location based on one or more measurements obtained from a sensory input (see for instance, paragraphs 18, 66, 67, and 69. The user is able to control their point of view in the remote world by rotating a certain degree, thereby moving along a trajectory defined by the positions of the cameras, see for instance, paragraph 18);
select a subset of cameras from the plurality of cameras positioned at one or more positions in the location based on at least one of the position, the direction of travel, or the speed of travel of the user within the virtual representation of the location (At a certain moment in time, if the received orientation information changes, the processing system will switch from the currently selected cameras 11L and 11R to one or two newly selected cameras, such as 12L and 12R – this will give the user the feeling that they are in fact moving in the viewed space, see for instance, paragraph 69); and
cause a user device to display one or more images in the sequence of images captured by the subset of cameras in an order based on the position, the direction of travel, and the speed of travel of the user within the virtual representation of the location (At a certain moment in time, if the received orientation information changes, the processing system will switch from the currently selected cameras 11L and 11R to one or two newly selected cameras, such as 12L and 12R – this will give the user the feeling that they are in fact moving in the viewed space, see for instance, paragraph 69. The images from the selected cameras are displayed, see for instance, paragraphs 10 and 68).
Regarding claim 4, Smit teaches the system of claim 1 and further teaches wherein the computer-executable instructions, when executed, further cause the hardware processor to: determine a location of a first camera in the subset of cameras that captured a first image in the sequence of images that is displayed by the user device (A sensor, such as a torso sensor or a sensor that is part of the device, can be used to select specific cameras, see for instance, paragraphs 71 and 81);
determine a distance from the first camera based on at least one of the direction of travel of the user or the speed of travel of the user; determine that a second camera in the subset of cameras is located at a distance from the location of the first camera that matches the determined distance; and cause the user device to display a second image in the sequence of images captured by the second camera subsequent to the user device displaying the first image (At a certain moment in time, if the received orientation information changes, the processing system will switch from the currently selected cameras 11L and 11R to one or two newly selected cameras, such as 12L and 12R – this will give the user the feeling that they are in fact moving in the viewed space, see for instance, paragraph 69. The images from the selected cameras are displayed, see for instance, paragraphs 10 and 68. When the user rotates his torso from direction 30 to direction 31, the processing system will switch from cameras 11L, 11R to cameras 12L, 12R...when the HMD screens are ready to show the next frame, at that moment it is decided/calculated what cameras to use images from, see for instance, paragraph 71).
Regarding claim 5, Smit teaches the system of claim 1 and further teaches wherein the sensory input comprises one of a touch input, a haptic input, a gesture input, a wearable input, or a voice input provided to the user device (The sensory input may be a wearable input, see for instance, paragraphs 68, 82 and fig. 2).
Regarding claim 6, Smit teaches the system of claim 1 and further teaches wherein the computer-executable instructions, when executed, further cause the hardware processor to determine at least one of an updated position, an updated direction of travel, or an updated speed of travel of the user within the virtual representation of the location based on one or more second measurements obtained from the sensory input and generated subsequent to the one or more measurements (At a certain moment in time, if the received orientation information changes, the processing system will switch from the currently selected cameras 11L and 11R to one or two newly selected cameras, such as 12L and 12R – this will give the user the feeling that they are in fact moving in the viewed space, see for instance, paragraph 69. If newly received orientation information indicates an orientation change across a predetermined threshold, switch from the currently selected cameras to one or two newly selected cameras from other viewpoints, see for instance, paragraph 12).
Regarding claim 7, Smit teaches a non-transitory, computer-readable medium storing computer-executable instructions for remote viewing, wherein the computer-executable instructions (see for instance, paragraphs 6, 61-65, 70, and 98 and figs. 1 and 2), when executed, cause a computing system to:
obtain a sequence of images of a location, wherein the sequence of images is captured by at least one of a plurality of cameras positioned at one or more positions in the location; generate a virtual space, wherein the virtual space is a virtual representation of the location (A virtual representation of the location is generated, see for instance, paragraphs 63, 68, 70, and 93);
determine at least one of a position, a direction of travel, or a speed of travel of a remote user within the virtual representation of the location based on one or more measurements obtained from a sensory input (see for instance, paragraphs 18, 66, 67, and 69. The user is able to control their point of view in the remote world by rotating a certain degree, thereby moving along a trajectory defined by the positions of the cameras, see for instance, paragraph 18);
select a subset of cameras from the plurality of cameras positioned at one or more positions in the location based on at least one of the position, the direction of travel, or the speed of travel of the user within the virtual representation of the location (At a certain moment in time, if the received orientation information changes, the processing system will switch from the currently selected cameras 11L and 11R to one or two newly selected cameras, such as 12L and 12R – this will give the user the feeling that they are in fact moving in the viewed space, see for instance, paragraph 69); and
cause a user device to display one or more images in the sequence of images captured by the subset of cameras in an order based on the position, the direction of travel, and the speed of travel of the user within the virtual representation of the location (At a certain moment in time, if the received orientation information changes, the processing system will switch from the currently selected cameras 11L and 11R to one or two newly selected cameras, such as 12L and 12R – this will give the user the feeling that they are in fact moving in the viewed space, see for instance, paragraph 69. The images from the selected cameras are displayed, see for instance, paragraphs 10 and 68).
Regarding claim 9, Smit teaches the non-transitory, computer-readable medium of claim 7, and further teaches wherein the computer-executable instructions, when executed, further cause the computing system to: determine a location of a first camera in the subset of cameras that captured a first image in the sequence of images that is displayed by the user device (A sensor, such as a torso sensor or a sensor that is part of the device, can be used to select specific cameras, see for instance, paragraphs 71 and 81);
determine a distance from the first camera based on at least one of the direction of travel of the user or the speed of travel of the user; determine that a second camera in the subset of cameras is located at a distance from the location of the first camera that matches the determined distance; and cause the user device to display a second image in the sequence of images captured by the second camera subsequent to the user device displaying the first image (At a certain moment in time, if the received orientation information changes, the processing system will switch from the currently selected cameras 11L and 11R to one or two newly selected cameras, such as 12L and 12R – this will give the user the feeling that they are in fact moving in the viewed space, see for instance, paragraph 69. The images from the selected cameras are displayed, see for instance, paragraphs 10 and 68. When the user rotates his torso from direction 30 to direction 31, the processing system will switch from cameras 11L, 11R to cameras 12L, 12R...when the HMD screens are ready to show the next frame, at that moment it is decided/calculated what cameras to use images from, see for instance, paragraph 71).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2, 3, and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Smit (US PG Publication 2018/0164876) as applied to claims 1 and 7 above, and further in view of Jain et al. (US Patent 5,745,126).
Regarding claim 2, Smit teaches the system of claim 1, but does not teach wherein the computer-executable instructions, when executed, further cause the hardware processor to: obtain an indication of an object to track and a first image in the sequence of images captured by a first camera in the subset of cameras and displayed by the user device; apply image processing to the first image to identify a characteristic of the object; apply image processing to images in the sequence of images other than the first image to identify a second image in the sequence of images that depicts the object with the characteristic; and cause the user device to display the second image following the first image.
In the same art of viewing a subset of cameras/images, Jain teaches obtaining an indication of an object to track and a first image in the sequence of images captured by a first camera in the subset of cameras and displayed by the user device (select a particular object – which may be a dynamically moving object – or even an event in the real world scene that is of particular interest...as the scene develops its presentation to the viewer will prominently feature the selected object or the selected event, see for instance, column 1, lines 60-67); applying image processing to the first image to identify a characteristic of the object (First, many of the features extracted will turn out to be (i) distinct and (ii) fixed; and are in fact the hash marks and yard markings of an American football field! It is clear that these fixed features could be entered into any system, even by manual means, just once "before the game"... easily captured by even the most rudimentary machine vision programs, see for instance, column 38, lines 50-56. The system classifies, tags, and tracks objects in the scene, including static objects such as field markers, and dynamically moving objects such as the football and the football players, see for instance, column 7, lines 50-67. Objects are detected and localized, see for instance, column 28, lines 25-38); applying image processing to images in the sequence of images other than the first image to identify a second image in the sequence of images that depicts the object with the characteristic (First, many of the features extracted will turn out to be (i) distinct and (ii) fixed; and are in fact the hash marks and yard markings of an American football field! It is clear that these fixed features could be entered into any system, even by manual means, just once "before the game"... easily captured by even the most rudimentary machine vision programs, see for instance, column 38, lines 50-56.
The system classifies, tags, and tracks objects in the scene, including static objects such as field markers, and dynamically moving objects such as the football and the football players, see for instance, column 7, lines 50-67. The perception system, using camera hand-off, dynamically tracks objects in the scene as they move from one camera coverage zone to another, see for instance, column 26, lines 50-60); and causing the user device to display the second image following the first image (The viewer can command the selection of real...video images of the scene in response to ...selected...static or moving object in the scene, see for instance, column 7, lines 30-50. The system will automatically accomplish a handing off from one camera to another camera as different ones of multiple cameras best serve to image over time the selected object, see for instance, column 8, lines 1-5).
It would have been obvious to one of ordinary skill in the art having the teachings of Smit and Jain before the effective filing date of the claimed invention to incorporate camera selection and object tracking as taught by Jain into Smit’s telepresence system, as selecting cameras and tracking objects using various techniques, such as those described by Jain, was well known at the time of the effective filing date of the claimed invention and would have yielded predictable results in combination with Smit.
The modification of Smit with Jain would have allowed the computer-executable instructions, when executed, to further cause the hardware processor to: obtain an indication of an object to track and a first image in the sequence of images captured by a first camera in the subset of cameras and displayed by the user device; apply image processing to the first image to identify a characteristic of the object; apply image processing to images in the sequence of images other than the first image to identify a second image in the sequence of images that depicts the object with the characteristic; and cause the user device to display the second image following the first image.
The motivation for combining Smit with Jain would have been to improve the user experience, enhance functionality, and find the best and proper image, see for instance, Jain, column 8, lines 20-25.
Regarding claim 3, Smit teaches the system of claim 1, but does not teach wherein the computer-executable instructions, when executed, further cause the hardware processor to apply the first image as an input to a trained object detection artificial intelligence model, wherein application of the first image as the input to the trained object detection artificial intelligence model causes the trained object detection artificial intelligence model to output an indication that the object with the characteristic is depicted in the first image.
In the same art of viewing a subset of cameras/images, Jain teaches applying the first image as an input to a trained object detection artificial intelligence model, wherein application of the first image as the input to the trained object detection artificial intelligence model causes the trained object detection artificial intelligence model to output an indication that the object with the characteristic is depicted in the first image (The general concepts, and voluminous prior art, concerning machine vision, target classification, and target tracking are all relevant to the present invention, see for instance, column 6, lines 1-3. Many features can be detected automatically using machine vision techniques, see for instance, column 23, lines 40-41. First, many of the features extracted will turn out to be (i) distinct and (ii) fixed; and are in fact the hash marks and yard markings of an American football field! It is clear that these fixed features could be entered into any system, even by manual means, just once "before the game"... easily captured by even the most rudimentary machine vision programs, see for instance, column 38, lines 50-56. Artificial intelligence model, see for instance, column 19, lines 40-45).
It would have been obvious to one of ordinary skill in the art having the teachings of Smit and Jain before the effective filing date of the claimed invention to incorporate camera selection and object tracking as taught by Jain into Smit’s telepresence system, as selecting cameras and tracking objects using various techniques, such as those described by Jain, was well known at the time of the effective filing date of the claimed invention and would have yielded predictable results in combination with Smit.
The modification of Smit with Jain would have allowed the computer-executable instructions, when executed, to further cause the hardware processor to apply the first image as an input to a trained object detection artificial intelligence model, wherein application of the first image as the input to the trained object detection artificial intelligence model causes the trained object detection artificial intelligence model to output an indication that the object with the characteristic is depicted in the first image.
The motivation for combining Smit with Jain would have been to improve the user experience, enhance functionality, and find the best and proper image, see for instance, Jain, column 8, lines 20-25.
Regarding claim 8, Smit teaches the non-transitory, computer-readable medium of claim 7, but does not teach wherein the computer-executable instructions, when executed, further cause the computing system to: obtain an indication of an object to track and a first image in the sequence of images captured by a first camera in the subset of cameras and displayed by the user device; apply image processing to the first image to identify a characteristic of the object; apply image processing to images in the sequence of images other than the first image to identify a second image in the sequence of images that depicts the object with the characteristic; and cause the user device to display the second image following the first image.
In the same art of viewing a subset of cameras/images, Jain teaches obtaining an indication of an object to track and a first image in the sequence of images captured by a first camera in the subset of cameras and displayed by the user device (select a particular object – which may be a dynamically moving object – or even an event in the real world scene that is of particular interest...as the scene develops its presentation to the viewer will prominently feature the selected object or the selected event, see for instance, column 1, lines 60-67); applying image processing to the first image to identify a characteristic of the object (First, many of the features extracted will turn out to be (i) distinct and (ii) fixed; and are in fact the hash marks and yard markings of an American football field! It is clear that these fixed features could be entered into any system, even by manual means, just once "before the game"... easily captured by even the most rudimentary machine vision programs, see for instance, column 38, lines 50-56. The system classifies, tags, and tracks objects in the scene, including static objects such as field markers, and dynamically moving objects such as the football and the football players, see for instance, column 7, lines 50-67. Objects are detected and localized, see for instance, column 28, lines 25-38); applying image processing to images in the sequence of images other than the first image to identify a second image in the sequence of images that depicts the object with the characteristic (First, many of the features extracted will turn out to be (i) distinct and (ii) fixed; and are in fact the hash marks and yard markings of an American football field! It is clear that these fixed features could be entered into any system, even by manual means, just once "before the game"... easily captured by even the most rudimentary machine vision programs, see for instance, column 38, lines 50-56.
The system classifies, tags, and tracks objects in the scene, including static objects such as field markers, and dynamically moving objects such as the football and the football players, see for instance, column 7, lines 50-67. The perception system, using camera hand-off, dynamically tracks objects in the scene as they move from one camera coverage zone to another, see for instance, column 26, lines 50-60); and causing the user device to display the second image following the first image (The viewer can command the selection of real...video images of the scene in response to ...selected...static or moving object in the scene, see for instance, column 7, lines 30-50. The system will automatically accomplish a handing off from one camera to another camera as different ones of multiple cameras best serve to image over time the selected object, see for instance, column 8, lines 1-5).
It would have been obvious to one of ordinary skill in the art having the teachings of Smit and Jain before the effective filing date of the claimed invention to incorporate camera selection and object tracking as taught by Jain into Smit’s telepresence system, as selecting cameras and tracking objects using various techniques, such as those described by Jain, was well known at the time of the effective filing date of the claimed invention and would have yielded predictable results in combination with Smit.
The modification of Smit with Jain would have allowed the computer-executable instructions, when executed, to further cause the computing system to: obtain an indication of an object to track and a first image in the sequence of images captured by a first camera in the subset of cameras and displayed by the user device; apply image processing to the first image to identify a characteristic of the object; apply image processing to images in the sequence of images other than the first image to identify a second image in the sequence of images that depicts the object with the characteristic; and cause the user device to display the second image following the first image.
The motivation for combining Smit with Jain would have been to improve the user experience, enhance functionality, and find the best and proper image, see for instance, Jain, column 8, lines 20-25.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US PG Publication 2022/0277565 to Haro teaches real-time video dimensional transformations of video for presentation in mixed reality-based virtual spaces.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J COBB whose telephone number is (571)270-3875. The examiner can normally be reached Monday - Friday, 11am - 7pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alicia Harrington can be reached at 571-272-2330. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL J COBB/Primary Examiner, Art Unit 2615