Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: a POI entity detection unit, a coordinate conversion unit, and a rendering unit in claims 1-9 and 11-17.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
DETAILED ACTION
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 7-8, and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Castaneda (Pub No. US 20200175747 A1) in view of Cower (Pub No. US 20220254108 A1).
As per claim 1, Castaneda teaches the claimed:
1. An augmented reality (AR) (Castaneda teaches a system for virtual reality content. However, it can also concern augmented reality. Castaneda teaches that virtual reality content can be generated from real-world captured images. Castaneda [0002]: “Various immersive technologies (e.g., virtual reality technology, augmented reality technology, mixed reality technology, etc.) allow users of media player devices to experience immersive scenes that are different from a real-world environment within which the users and media player devices actually exist. For example, a virtual reality scene may be generated based on camera-captured scenery of a real-world scene or virtually-rendered scenery of a virtualized scene. Using an immersive technology such as virtual reality technology, users experience simulated scenes that may be difficult, expensive, or impossible to experience otherwise. Accordingly, virtual reality technology may provide users with a variety of entertainment, educational, vocational, and/or other enjoyable or valuable experiences that may otherwise be difficult or inconvenient for the users to obtain.” Castaneda teaches basing the virtual reality scene on real-time, real-world scenes. Castaneda [0035]: “In some examples, facilities 102 through 108 may be configured to operate in real-time so as to generate, manage, access, process, and/or provide data while a user is experiencing a scene associated with a real-time timeline. As used herein, operations may be performed in “real time” when they are performed immediately and without undue delay such that, for example, data processing operations associated with a virtual reality scene based on an ongoing event (e.g., a real-world sporting event, a virtualized gaming event, etc.) are performed while the event is still ongoing (i.e., rather than after the fact) even if there is some amount of delay such as a few seconds or minutes. In some examples, these types of real-time operations may allow virtual reality users to experience a real-world event live or at approximately the same time as people actually attending the event.” Castaneda teaches mixing real and virtual elements in a volumetric representation. This mixing of real-world and virtual elements is augmented reality. Castaneda [0037]: “…As such, the state of the volumetric representation may represent a current, real-time state that various aspects of the volumetric representation are in (e.g., where objects are located within the scene, how objects are oriented, how objects are moving, etc.). In other examples, the volumetric representation may be associated with a virtual timeline (i.e., a timeline that is unrelated to real time in the real world) and scene management facility 102 may manage the state of the virtual reality scene for each time covered by the virtual timeline (which may be manipulated arbitrarily in ways that real timelines cannot be manipulated). In still other examples, the volumetric representation may be based on both real-world and virtualized elements. For instance, captured color footage of a real-world scene could be projected onto a virtualized 3D depth model of the scene to generate a mixed volumetric representation of a virtual reality scene that is based on the real-world scene.” Additionally, Castaneda teaches the capture devices used to capture the real-world events, including a camera. Castaneda [0045]: “Image capture system 202 may be configured to capture surface data frames representative of a virtual reality scene.
In some examples, image capture system 202 may capture sequences of such surface data frames that will be referred to herein as surface data frame sequences. In certain implementations, a virtual reality scene may be based on a real-world scene (e.g., by being generated based on camera-captured footage of real-world scenery, etc.). As such, image capture system 202 may include or be communicatively coupled with a plurality of capture devices (e.g., video cameras, depth imaging devices, etc.) configured to capture images for processing and distribution by image capture system 202. For instance, an exemplary implementation of image capture system 202 may include a plurality of capture devices that may be selectively and communicatively coupled to one another and to a capture controller included within image capture system 202.” This data is used to create an orthographic projection. Castaneda [0067]: “To illustrate, FIG. 4B shows such a projection. Specifically, FIG. 4B illustrates an exemplary orthographic projection 406 that is generated based on orthographic vantage point 402 to depict each of objects 404. As indicated by the coordinate system in FIG. 4B, orthographic projection 406 depicts objects 404 from a front view where the x-axis still extends to the right across the page, but now the y-axis extends upward toward the top of the page and the z-axis (not explicitly shown in FIG. 4B) will be understood to extend into the page.”).
a display that displays an image acquired from the camera and at least one AR entity; (Castaneda [0122]: “I/O module 1708 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen), one or more output drivers (e.g., display drivers), one or more audio speakers, and one or more audio drivers. In certain embodiments, I/O module 1708 is configured to provide graphical data to a display for presentation to a user. The graphical data may be representative of one or more graphical user interfaces and/or any other graphical content as may serve a particular implementation.” Castaneda describes the mixing of real-world elements and virtual ones into a volumetric representation. Castaneda [0037]: “As such, the state of the volumetric representation may represent a current, real-time state that various aspects of the volumetric representation are in (e.g., where objects are located within the scene, how objects are oriented, how objects are moving, etc.). In other examples, the volumetric representation may be associated with a virtual timeline (i.e., a timeline that is unrelated to real time in the real world) and scene management facility 102 may manage the state of the virtual reality scene for each time covered by the virtual timeline (which may be manipulated arbitrarily in ways that real timelines cannot be manipulated). In still other examples, the volumetric representation may be based on both real-world and virtualized elements. For instance, captured color footage of a real-world scene could be projected onto a virtualized 3D depth model of the scene to generate a mixed volumetric representation of a virtual reality scene that is based on the real-world scene.” Castaneda fig. 3 shows dashed lines and a box overlaying certain portions of a basketball court, which is the real-world object. The dashed line is the AR entity used to analyze the image of the real object.).
a POI entity detection unit that detects point-of-interest (POI) entities from image information acquired through the camera; (Castaneda [0028]: “Different types of projections have different strengths and weaknesses for depicting a scene because different portions of a scene tend to be viewed differently by users experiencing the scene (e.g., virtually experiencing a virtual reality scene). For example, the portion of a scene in immediate proximity to the user position at any given moment may be the portion that calls for the most detailed and thorough representation because it is the portion that the user can see most clearly, explore most directly, and so forth. Objects nearby the user position within the scene may be walked around and viewed from various different angles, for instance, thus making it important for such objects to be thoroughly represented so that they can be flexibly rendered and depicted from any viewpoint from which a user may wish to view the objects.” The objects viewed from the viewpoint the user wishes to look at are the point-of-interest entities.).
Castaneda alone does not explicitly teach the remaining claim limitations.
However, Castaneda in combination with Cower teaches the claimed:
a coordinate conversion unit that sequentially converts coordinates of the detected POI entities based on a preset conversion matrix to convert them into coordinates in an orthographic projection space; (Cower [0062]: “The projection matrix may be defined as a combination of a perspective projective transformation matrix M.sub.o and an orthographic projection matrix M.sub.p. In this regard, the projection matrix may be defined as M.sub.o×M.sub.p. The perspective projective transformation matrix M.sub.o may perform a perspective projection including a linear transformation of 3D data and distortion of points (e.g. using a pinhole projection model) as if such points were viewed from the perspective of the vehicle's camera. The orthographic projection matrix M.sub.p then converts the projected coordinates into clip space coordinates.”).
a rendering unit that renders AR entities including information on respective POI entities into a preset orthographic projection space; (Cower [0018]: “The projection matrix may transform from 3D camera-space coordinates or rather coordinates in a reference frame of the camera into clip-space coordinates. During rendering, the projection matrix is used to transform all visible coordinates into normalized device coordinate space, and anything outside of that is discarded during rendering. The projection matrix may be defined as a combination of a perspective projective transformation matrix and an orthographic projection matrix.” All the objects in view are transformed to the orthographic projection space. AR entities are the overlays applied to the image with the objects. Cower [0019]: “The results of the application of the view matrix and the projection matrix for each camera may be a virtual camera space within a 3D world view for the scene. In other words, all of the 3D content captured in a scene may be incorporated into the virtual camera space for a camera even if it is outside of or beyond the 3D area of the virtual camera model. The 3D world view may then be composited with images captured by the camera to create a visualization of the 3D world view. In this regard, the visualization provides a real-world image from the perspective of the camera of the vehicle with one or more graphical overlays of the 3D content.” Cower Fig. 7 shows a screen that has been rendered according to the projective transformation, and shows graphical overlays that are based on the transformed positions of objects. Those overlays are the AR entities that correspond to objects in the scene.).
and a controller that controls the coordinate conversion unit to convert the POI entity coordinates detected by the POI entity detection unit into coordinates in an orthographic projection space according to a screen of the display, (Castaneda [0059]: “ While depth data may be generated and transmitted over network 210 to media player devices 212 using any suitable coordinate space, it may be advantageous for various reasons to represent depth data within a given scene representation 220 using a clip or screen coordinate space for transmission, and to include a transform (e.g., an inverse view-projection transform) with the depth data to allow the depth data to be converted by the media player device 212 from the clip or screen coordinate space back to the world coordinate space as part of the rendering process. For instance, one advantage of using a clip or a screen coordinate space is that these coordinate spaces are not linearized, as is the world coordinate space. As such, the limited amount of data transmitted may be best leveraged by naturally allocating more data for describing regions closer to a particular vantage point and less data for describing regions farther from the vantage points. Additionally, another advantage of using clip coordinate spaces relates to preserving precision in compressed depth data. Accordingly, system 100 may generate and transmit the surface data frame sequences included within scene representation 220 using non-normalized, non-linear coordinates of a coordinate space such as a screen or clip coordinate space rather than using the normalized, linear coordinates of the world coordinate space.” The controller is what uses a clip or a screen coordinate system to determine the visible location of the objects after the transformation.).
controls the rendering unit to render AR entities including information on respective POI entities at respective positions of the POI entity coordinates converted to the coordinates of the orthographic projection space, and controls the display to display the rendered AR entities at the positions of the POI entity coordinates of the image acquired from the camera. (Castaneda [0062]: “As described above, the output of virtual reality provider system 208, and, more particularly, the output of an implementation of system 100 included therein and/or implemented thereby, may include one or more scene representations 220 that each include a respective set of surface data frame sequences depicting orthographic projections and perspective projections of virtual reality scene 302. As mentioned above, orthographic projections, as used herein, refer to renderings or other projections of color data and/or depth data created using parallel projection lines, while perspective projections refer to renderings or other projections employing diverging projection lines.” Both the objects and the augmented or virtual reality objects are transformed and shown.).
wherein among the respective POI entity coordinates converted into the coordinates of the orthographic projection space, when there are coordinates that are close to one another by a predetermined level or more, (Cower [0063] teaches the matrix that specifies the far and near planes used in clipping, which define the clipping volume. This is the volume that is being viewed. All of the objects within the clipping volume lie within its bounds, which means that all objects within the volume are close to one another by at least a certain amount. The range of values within which the objects reside is predetermined by the variables in the matrix. The examiner is interpreting being close by “more” than a predetermined level to mean a smaller distance, so that the predetermined level corresponds to the range between the planes defined by the variables within the matrix.).
the controller modifies the coordinate values of the respective POI entity coordinates that are close to one another such that the POI entity coordinates that are close to one another are spaced apart from one another according to distances between the respective POI entities and the camera. (This refers to distortion of points based on a viewing transformation. Even if points are close together, they will be distorted based on their relationship to the camera. The distance between them may be uneven because of the pinhole projection model of the camera. This is the spacing apart. Cower [0062]: “The projection matrix may be defined as a combination of a perspective projective transformation matrix M.sub.o and an orthographic projection matrix M.sub.p. In this regard, the projection matrix may be defined as M.sub.o×M.sub.p. The perspective projective transformation matrix M.sub.o may perform a perspective projection including a linear transformation of 3D data and distortion of points (e.g. using a pinhole projection model) as if such points were viewed from the perspective of the vehicle's camera. The orthographic projection matrix M.sub.p then converts the projected coordinates into clip space coordinates.” Cower fig. 7 shows objects from a distorted perspective: the spacing between objects that were originally evenly spaced changes, and the objects appear to be at larger distances from each other based on their relationship to the camera.).
An augmented reality (AR) navigation device, the device comprising: a camera that acquires a front image; (Castaneda does not explicitly teach this limitation; however, Cower teaches a camera used in a navigation device for vehicles. Cower [0068]: “The aforementioned features may enable human operators to more readily identify certain issues, some of which may stem from improper sensor calibration. For example, if user 442 is using client computing device 440 to visualize 3D content for a scene where vehicle 100 is navigating a plurality of objects, with a typical top-down 3D view with camera images displayed at the same time, user 442 would need to mentally “work out” a mapping between the 3D objects in the top-down 3D view and the same objects in the camera image. Depending upon the complexity of the scene, this can be both time consuming and error prone. In other words, it can be especially difficult for scenes such as those on a highway with many other vehicles. However, when viewing the 3D world view overlayed with the camera images, the operator is able to visualize the 3D view from a camera of the vehicle. In this regard, the operator is able to directly visualize what objects the vehicle is detecting.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the orthographic projection matrix and resulting object and graphical overlay distortion as taught by Cower with the system of Castaneda in order to clearly establish the transformation matrix being used on the AR objects and visual overlays of Castaneda.
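For illustration only, the following sketch (not taken from Castaneda or Cower; the OpenGL-style matrix convention, the field of view, and the example point are assumptions) shows how a pinhole-style perspective projection of the kind Cower describes maps a camera-space POI coordinate into clip space, with the subsequent perspective divide yielding normalized device coordinates.

import numpy as np

def perspective_matrix(fov_y_deg, aspect, near, far):
    # Standard pinhole-style perspective projection (OpenGL convention); values assumed.
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0,  0.0,                          0.0],
        [0.0,        f,    0.0,                          0.0],
        [0.0,        0.0,  (far + near) / (near - far),  (2.0 * far * near) / (near - far)],
        [0.0,        0.0, -1.0,                          0.0],
    ])

# Hypothetical POI position in camera space (x, y, z, w), roughly 10 m ahead of the camera.
poi_camera = np.array([2.0, 1.0, -10.0, 1.0])

proj = perspective_matrix(fov_y_deg=60.0, aspect=16 / 9, near=0.1, far=100.0)
clip = proj @ poi_camera          # clip-space coordinates
ndc = clip[:3] / clip[3]          # perspective divide -> NDC in [-1, 1]
print("clip:", clip, "ndc:", ndc)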
As per claim 17, this claim is similar in scope to limitations recited in claim 1, and thus is rejected under the same rationale.
As per claim 2, Castaneda alone does not explicitly teach the claimed limitations.
However, Castaneda in combination with Cower teaches the claimed:
2. The device of claim 1, wherein the coordinate conversion unit converts the POI entity coordinates detected by the POI entity detection unit into coordinates in a clip space based on a model-view-projection (MVP) transformation matrix, (An MVP matrix converts coordinates from model space, through world and view (camera) space, into clip space. Cower [0057]: “[0057] At block 830, a set of matrices using the virtual camera model is generated. The model for each camera may be used to generate a set of matrices for that camera. These matrices may be generated locally by the computing device 440 or may be computed by the server computing devices, stored in the storage system 450 and retrieved as needed by the computing device 440. As with the models, the matrixes may not be the same for each camera for each calibration, and thus may need to be redetermined if the cameras are recalibrated. The matrices may include a view matrix and a projection matrix.”).
converts the converted POI entity coordinates into coordinates in a normalized device coordinate (NDC) space according to the screen of the display, (Cower [0061]: “The projection matrix may transform from 3D camera-space coordinates or rather coordinates in a reference frame of the camera into clip-space coordinates. Clip space may be a 2-unit wide cube, centered at (0, 0, 0), and with corners that range from (−1, −1, −1) to (1, 1, 1). Clip space may be compressed down into a 2D space and rasterized into an image. During rendering, the projection matrix is used to transform all visible coordinates into normalized device coordinate space (−1 to 1 in X, Y, and Z), and anything outside of that −1 to 1 cube is then discarded during rendering. This process may occur as part of the GPU rendering pipeline”).
and converts the coordinates in the normalized space into coordinates in the orthographic projection space according to the screen of the display. (Cower [0062]: “The projection matrix may be defined as a combination of a perspective projective transformation matrix M.sub.o and an orthographic projection matrix M.sub.p. In this regard, the projection matrix may be defined as M.sub.o×M.sub.p. The perspective projective transformation matrix M.sub.o may perform a perspective projection including a linear transformation of 3D data and distortion of points (e.g. using a pinhole projection model) as if such points were viewed from the perspective of the vehicle's camera. The orthographic projection matrix M.sub.p then converts the projected coordinates into clip space coordinates.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the mathematical representation of orthographic projection transformation and normalization as taught by Cower with the system of Castaneda in order to clearly define the operations used to transform and normalize the objects in the view for display.
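As a further illustration of the final NDC-to-screen step recited in claim 2 (a sketch only; the display size and the top-left pixel origin are assumptions, not drawn from the references), normalized device coordinates can be mapped to 2D screen positions at which AR entities would be placed:

def ndc_to_screen(ndc_x, ndc_y, width=1920, height=1080):
    # Map NDC in [-1, 1] to pixel coordinates with the origin at the top-left (assumed convention).
    sx = (ndc_x + 1.0) * 0.5 * width
    sy = (1.0 - (ndc_y + 1.0) * 0.5) * height   # flip Y for screen space
    return sx, sy

print(ndc_to_screen(0.0, 0.0))   # screen centre -> (960.0, 540.0)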
As per claim 3, Castaneda teaches the claimed:
3. The device of claim 2, wherein the clip space, which is a space in which the AR entities are to be rendered, is a frustum-shaped space according to a field of view (FOV) of the camera, centered on a line of sight of the camera. (Castaneda [0074]: “In FIG. 6A, perspective vantage point 602 is symbolized by a relatively short line perpendicular to the z-axis and including arrow tips at either end diverging away from the z-axis in the positive z direction. While only two dimensions are clearly illustrated in FIG. 6A, it will be understood that perspective vantage point 602 may actually be implemented as a 3D vantage point (e.g., a frustum-shaped perspective vantage point) extending from the x-y plane and directed in the positive z-axis direction toward objects 404. In some examples, rather than diverging from a plane to form a frustum shape, implementations of perspective vantage point 602 may diverge from a point to form a pyramid shape. Regardless of the shape of perspective vantage point 602, a perspective projection that depicts objects 404 may be generated based on perspective vantage point 602” The positive z-axis direction of the viewpoint is the line of sight of the camera.).
As per claim 4, Castaneda alone does not explicitly teach the claimed limitations.
However, Castaneda in combination with Cower teaches the claimed:
4. The device of claim 2, wherein the normalized coordinate system is a coordinate system in which an area corresponding to a size of the display screen is normalized into a square with a width and height of 2 each (Cower [0061]: “The projection matrix may transform from 3D camera-space coordinates or rather coordinates in a reference frame of the camera into clip-space coordinates. Clip space may be a 2-unit wide cube, centered at (0, 0, 0), and with corners that range from (−1, −1, −1) to (1, 1, 1). Clip space may be compressed down into a 2D space and rasterized into an image. During rendering, the projection matrix is used to transform all visible coordinates into normalized device coordinate space (−1 to 1 in X, Y, and Z), and anything outside of that −1 to 1 cube is then discarded during rendering. This process may occur as part of the GPU rendering pipeline.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the normalization of the viewable coordinate system into a range of -1 to 1 in 3 axes as taught by Cower with the system of Castaneda in order to normalize the coordinates for the AR viewable objects of Castaneda.
As per claim 7, Castaneda alone does not explicitly teach the claimed limitations.
However, Castaneda in combination with Cower teaches the claimed:
7. The device of claim 2, wherein the coordinate conversion unit converts the POI entity coordinates that have been converted into the coordinates of the NDC space into coordinates in an orthographic projection space that corresponds to the size of the display screen and that is generated in a direction perpendicular to a line of sight of the camera. (Applicant’s specification in [0054] clarifies this as converting coordinates to a 2D plane that is orthogonal to the camera, i.e. the x-y plane. Castaneda teaches a vantage point that is parallel to the x-y plane, that is looking in the positive z-axis direction. Castaneda [0074]: “While only two dimensions are clearly illustrated in FIG. 6A, it will be understood that perspective vantage point 602 may actually be implemented as a 3D vantage point (e.g., a frustum-shaped perspective vantage point) extending from the x-y plane and directed in the positive z-axis direction toward objects 404. In some examples, rather than diverging from a plane to form a frustum shape, implementations of perspective vantage point 602 may diverge from a point to form a pyramid shape. Regardless of the shape of perspective vantage point 602, a perspective projection that depicts objects 404 may be generated based on perspective vantage point 602.” The camera shows the contents of the scene in the x-y plane and the camera looks down the z-axis. This means that the plane representing the view is perpendicular to the line of sight of the view. It would be obvious to view this plane with a camera that has coordinates defined ahead of time.).
As per claim 8, Castaneda teaches the claimed:
8. The device of claim 1, wherein the controller varies sizes of AR entities corresponding to respective POI entities based on distances between respective POI entities in the camera. (Castaneda [0076]: “As shown in FIG. 6B, objects 404 are depicted to be of different sizes in perspective projection 606, despite the fact that, as shown in FIG. 6A, each object 404 is actually identically sized. This is because, as described above, each ray used to generate each pixel of perspective projection 606 may be diverging from a common point (in a pyramidal implementation) or from a small planar area (in a frustum-shaped implementation as shown). Accordingly, in perspective projection 606, each object 404 may be depicted with a unique, different amount of detail and resolution, based on the proximity of each object to perspective vantage point 602. Specifically, as shown, perspective projection 606 depicts relatively close objects to the vantage point such as object 404-2 with greater size and detail than relatively distant objects from the vantage point such as object 404-1.”
Cower fig. 7 shows AR overlays that change in size based on the distance of the object from the camera.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to change overlay sizes based on object distance from the camera as taught by Cower with the system of Castaneda in order either to allow the size of the overlay to directly match the object for bounding purposes, or to convey the distance of an object when presenting other information about it, so that the user has a better sense of the space of the scene.
As per claim 11, Castaneda alone does not explicitly teach the claimed limitations.
However, Castaneda in combination with Cower teaches the claimed:
11. The device of claim 1, wherein among the respective POI entity coordinates converted into the coordinates of the orthographic projection space, the controller determines that the POI entity coordinates with the same coordinates on a specific axis are coordinates that are close to one another by a predetermined level or more, and wherein the specific axis is at least one of an X-axis and a Y-axis of the display screen. (The projective transformation matrices in Cower [0063] and [0064] have values that correspond to the axes of a 3D space, create limits for the X-axis and Y-axis of the screen, and place the objects in the clipping volume within ranges bounded by those limits.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the definition of the projective transformation along certain axes to show distortion on a 2D screen as taught by Cower with the system of Castaneda in order to apply clearly defined distortion on the display screen of the AR system of Castaneda.
As per claim 12, Castaneda alone does not explicitly teach the claimed limitations.
However, Castaneda in combination with Cower teaches the claimed:
12. The device of claim 11, wherein the controller assigns a preset separation value, and modifies the coordinate values of the respective POI entity coordinates that are close to one another according to distances between respective POI entities and the camera such that the POI entity coordinates that are close to one another are spaced apart from one another,
and wherein the preset separation value is reflected in a coordinate value in a direction of the specific axis among the coordinate values of the respective POI entity coordinates that are close to one another. (Because the same projective transformation is applied to all the objects in the clipping volume described by Cower in claim 11, they all receive the same preset separation value. They are separated from one another by the viewing transformation, which is preset but moves different objects in space differently. Because of the nature of a perspective transformation, they will appear to be separated from each other due to the distortion based on their relationship to the camera in the pinhole model, as taught by Cower above.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the consistently applied separation value that is based on distance from the camera as taught by Cower with the system of Castaneda in order to create a consistent change in perspective of the objects viewed by the AR system.
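For illustration of the separation concept addressed in claims 11 and 12, the following hypothetical sketch (not drawn from Cower; the separation value, the axis choice, and the POI data are all assumed) spaces apart screen coordinates that lie close to one another, offsetting each POI in order of its distance from the camera.

SEPARATION_PX = 40  # assumed preset separation value, in pixels

def space_apart(pois, axis="y"):
    # pois: list of dicts with 'screen' (x, y) and 'distance' from the camera.
    ordered = sorted(pois, key=lambda p: p["distance"])
    for i, poi in enumerate(ordered[1:], start=1):
        x, y = poi["screen"]
        if axis == "y":
            poi["screen"] = (x, y + i * SEPARATION_PX)
        else:
            poi["screen"] = (x + i * SEPARATION_PX, y)
    return ordered

pois = [
    {"name": "cafe",   "screen": (600, 300), "distance": 35.0},
    {"name": "bank",   "screen": (602, 301), "distance": 80.0},
    {"name": "garage", "screen": (601, 299), "distance": 120.0},
]
for p in space_apart(pois):
    print(p["name"], p["screen"])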
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Castaneda in view of Cower and further in view of Cha (Pub No. US 20220063510 A1).
As per claim 5, Castaneda alone does not explicitly teach the claimed limitations.
However, Castaneda in combination with Cha teaches the claimed:
5. The device of claim 4, wherein the coordinate conversion unit converts POI entity coordinates in the clip space into coordinates in the NDC space based on an inverse conversion matrix provided by advanced driver assistance systems (ADAS). (Cha [0107]: “Like this, the method to output a 3D graphic object to the ground surface by the head up display of some example embodiments may be a method to convert the specific object (Obj) corresponding to information desired to be transmitted into an image substantially outputted from the display device (LCD) by using M.sup.−1 and C. In equation (2), the M.sup.−1 may be an inverse conversion matrix of a conversion matrix to which characteristics of the optical system of the head up display of some example embodiments have been reflected. And the C may be a conversion matrix for projecting the specific object (Obj) 141 onto the ground surface according to a position and an angle of the camera (or the driver view).”
Cha teaches the ADAS. Cha [0082]: “Main information provided from a vehicle navigation may include route information on a road where the vehicle is running, lane information, distance information from a front vehicle, etc. In an Advanced Driver-Assistance System (ADAS), information related to safety may be provided to a user. The information may include lane information, distance information from a front vehicle/a next vehicle, unexpected (e.g., sudden, urgent and/or emergency) information, etc. Likewise, at the time of an autonomous driving, a vehicle which is a subject of a driving may provide information about situations to occur later, such as a rotation on a road or a lane change, to a driver. Route information may include information to guide routes, which may include turn-by-turn (TBT) information to guide going straight, a rotation, etc.” The information provided concerns the position and orientation of the objects in the scene, and would produce a more accurate viewing transformation depending on the subject of the process. The driver-assistance system provides safety information, which could include the perspective from which to view the scene or an object that would need to be viewed.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the inverse matrix provided by the Advanced Driver-Assistance System as taught by Cha with the system of Castaneda in order to use a vehicle specific system analyzing real-time data to inform the geometry of the transformation to get the most relevant scene possible.
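For illustration only (the matrix values are assumed and do not represent the actual optical-system characteristics Cha describes), applying an inverse conversion matrix of the kind Cha denotes M.sup.−1 to a clip-space POI coordinate might look like the following:

import numpy as np

conversion = np.array([            # assumed 4x4 conversion matrix, for demonstration only
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
inverse_conversion = np.linalg.inv(conversion)

clip_point = np.array([0.8, -0.4, 0.2, 1.0])   # hypothetical clip-space POI coordinate
print(inverse_conversion @ clip_point)          # point mapped back by the inverse matrix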
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Castaneda in view of Cower and further in view of LaMontagne (Pub No. US 20200242831 A1).
As per claim 6, Castaneda alone does not explicitly teach the claimed limitations.
However, Castaneda in combination with LaMontagne teaches the claimed:
6. The device of claim 4, wherein the coordinate conversion unit normalizes coordinate values of respective POI entity coordinates in the clip space by dividing them by a size of the display screen so as to convert the POI entity coordinates in the clip space into coordinates in an NDC space according to the size of the display screen. (LaMontagne [0059]: “In one embodiment, the GAP can use normalized viewport space based coordinates, where the area [0, 1] for x and y-axes is the viewport (screen). Points projected beyond that range considered as located off the screen. Thus, in this embodiments, GAP is measured related to the screen size and includes a ratio of the area of the object of interest projected on the screen divided by the total area of the screen. Thus, the GAP in a normalized viewport space can be represented as:”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the normalization based on the total area of the screen as taught by LaMontagne with the system of Castaneda in order to fit the viewing transformation and normalization of the AR system to the screen and the portion of the screen being used.
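As a simple illustration of dividing coordinates by the display size (the screen dimensions are assumed; this is not LaMontagne's code), pixel coordinates can be normalized to the [0, 1] viewport range as follows:

def normalize_to_viewport(px, py, width=1920, height=1080):
    # Divide pixel coordinates by the display dimensions to get viewport-relative values.
    return px / width, py / height

print(normalize_to_viewport(960, 540))  # -> (0.5, 0.5), the screen centre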
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Castaneda in view of Cower and further in view of Tran (Pub No. US 10149958 B1).
As per claim 9, Castaneda alone does not explicitly teach the claimed limitations.
However, Castaneda in combination with Cower and Tran teaches the claimed:
9. The device of claim 8, wherein the controller determines, according to a distance between the camera and a specific POI entity, a size of an AR entity corresponding to the specific POI entity, based on a preset function,
and wherein the preset function is a quadratic function in which a size of an AR entity is inversely proportional to the square of a change in distance between the specific POI object and the camera. (The quadratic function described is the inverse-square law, in which an effect between two objects is inversely proportional to the square of the distance between them. It is a commonly used mathematical function. Tran col. 22 lines 30-55: “(98) In one embodiment, the GPU detects an object using color and then tracks the object by: 1. Create a masking image by comparing each pixel with a target color value. Convert pixels that fall within the range to white, and convert those that fall outside the range to black. 2. Find the centroid of the target color. The centroid of the tracked color defines the center position for the overlay image. A multipass pixel-processing kernel is used to compute a location. The output of this phase is a 1×1-pixel image, containing the coordinate of the centroid in the first two components (pixel.rg) and the area in the third component (pixel.b). The area is used to estimate the distance of the object from the camera. 3. Composite an image over the detected object. Assuming the shape of the object does not change with respect to the frame, then the change in area of the tracked color is proportional to the square of the distance of the object from the viewer. This information is used to scale the overlay image, so that the overlay image increases or decreases in size appropriately.” Tran teaches scaling the overlay corresponding to an object with the inverse-square law relating to distance from the viewer, which could be through a camera.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the scaling of an overlay based on the distance of an object from the viewer as taught by Tran with the system of Castaneda and Cower in order to scale the AR entities according to their object distance from the camera and then use that information to apply the entities to the transformed objects according to the projective matrix.
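For illustration of the inverse-square relationship discussed for Tran (the reference distance and the particular form of the function are assumptions, not taken from the reference), an overlay scale factor might be computed as follows:

def overlay_scale(distance_m, reference_distance_m=10.0):
    # Scale factor relative to the overlay size at the assumed reference distance;
    # the size falls off with the square of the POI's distance from the camera.
    return (reference_distance_m / distance_m) ** 2

for d in (10.0, 20.0, 40.0):
    print(d, "m ->", overlay_scale(d))   # 1.0, 0.25, 0.0625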
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Castaneda in view of Cower and further in view of “Panda3D Manual: How to Control Render Order” (https://docs.panda3d.org/1.7/python/How_to_Control_Render_Order, Carnegie Mellon University, 2010; hereinafter “Panda3D Order”).
As per claim 13, Castaneda alone does not explicitly teach the claimed limitations.
However, Castaneda in combination with Panda3D Order teaches the claimed:
13. The device of claim 12, wherein the rendering unit renders, when the respective POI entity coordinates that are close to one another are spaced apart from one another according to the preset separation value,
AR entities corresponding to respective POI entities according to the positions of the POI entity coordinates that are spaced apart from one another, and renders AR entities corresponding to respective POI entities in a reverse order of distances, from long to short, between the POI entities and the camera among the POI entity coordinates that are close to one another. (Panda3D Order, just above “Cull Bins”: “Finally, regardless of the rendering optimizations described above, a particular sorting order is required to render transparency properly (in the absence of the specialized transparency support that only a few graphics cards provide). Transparent and semitransparent objects are normally rendered by blending their semitransparent parts with what has already been drawn to the framebuffer, which means that it is important that everything that will appear behind a semitransparent object must have already been drawn before the semitransparent parts of the occluding object is drawn. This implies that all semitransparent objects must be drawn in order from farthest away to nearest, or in "back-to-front" ordering, and furthermore that the opaque objects should all be drawn before any of the semitransparent objects.” It would be obvious to render non-transparent objects in this order as well.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the rendering of objects in order from farthest from the camera to nearest to the camera as taught by Panda3D Order with the system of Castaneda in order to determine whether nearer objects occlude farther objects, both to render semi-transparent objects accurately and to determine whether the viewing transformation causes some far objects to be occluded by nearer ones relative to the camera’s point of view. It would also preserve the data of the hidden portions of an object in case the view changes quickly.
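For illustration of the back-to-front ordering described by Panda3D Order (the entity data are assumed for demonstration), AR entities can be drawn from the farthest POI to the nearest as follows:

entities = [
    {"label": "fuel",  "distance": 55.0},
    {"label": "hotel", "distance": 210.0},
    {"label": "exit",  "distance": 90.0},
]
# Sort by distance from the camera, farthest first, then draw in that order.
for entity in sorted(entities, key=lambda e: e["distance"], reverse=True):
    print("draw", entity["label"])   # hotel, exit, fuel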
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Castaneda in view of Cower and further in view of Choi (Pub No. US 20190051030 A1).
As per claim 14, Castaneda alone does not explicitly teach the claimed limitations.
However, Castaneda in combination with Choi teaches the claimed:
14. The device of claim 1, wherein the controller generates a first layer on which an image acquired from the camera is displayed, renders the AR entities on a second layer having the same size as the first layer, and controls the display to overlap the second layer on the first layer. (Choi [0008]: “In accordance with an embodiment of the present invention, the above and other objects can be accomplished by the provision of a user interface apparatus for a vehicle, including: a first camera configured to capture a forward view image including an object; an interface unit configured to receive information about the object from a second camera; a display; and a processor configured to convert the information about the object in a coordinate system of the second camera with respect to the first camera, generate an augmented reality (AR) graphic object corresponding to the object, and control the display to overlay the AR graphic object on the forward view image.” The information from the first camera is being transformed to the coordinate system of the second camera to be overlayed. Thus, the layer with the graphic object is the same size since it has been transformed. Choi fig. 16 shows a grid-like layer with AR overlays that is the same size as the screen depicting the scene.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a layer containing a graphical overlay that is the same size as the transformed image screen, as taught by Choi, with the system of Castaneda in order to allow an AR overlay to be controlled for size and to separate the graphical parts cleanly from the original image.
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Castaneda in view of Cower and further in view of Pettyjohn (Pub No. US 20150170256 A1).
As per claim 15, Castaneda alone does not explicitly teach the claimed limitations.
However, Castaneda in combination with Pettyjohn teaches the claimed:
15. The device of claim 1, wherein the controller controls the display to display the image acquired from the camera, and controls the rendering unit to render AR entities including information on respective POI entities in a designated area of the display screen separately from the image acquired from the camera, and to render connection lines connecting between the respective POI entity coordinates converted into coordinates of the orthographic projection space and the respective AR entities rendered in the designated area. (Pettyjohn teaches an AR system navigating through a space and providing information on objects in the path through AR overlays. Pettyjohn [0072]: “Alternatively, and as discussed elsewhere herein in an illustrative example, a particular messaging campaign may not merely transmit (107) content when the consumer is detected (107) at a particular location in the retail location, but may transmit (107) navigational and/or map data to the consumer, which is used to provide in a mobile application a visual representation of the location of the consumer in the retail location, and navigational data to direct the consumer to a particular location. In such an embodiment, map data may be used to transmit (107) messaging, in that the consumer will be provided a topological map with navigational instructions.” Pettyjohn fig. 4 shows a designated area for AR entities, in the top left corner, that is outside the image of the physical objects. These entities indicate information about the objects, such as their location and pricing.
Pettyjohn [0076]: “An exemplary embodiment, implemented as a mobile device application, is depicted in FIG. 4. In the depicted embodiment, a mobile device (401) having a display (403), displays a real-time image of a retail location (405), said image comprising a generally faithful presentation of the current state of the retail location. This image is generally produced at least in part using an imaging device built into the mobile device, such as a digital camera. The image is overlaid with various components to create an augmented reality experience. By way of example and not limitation, a topological map (407) of the retail location is displayed. In the depicted embodiment, the topological map (407) is a topological map of the retail location depicted (405) in the display (403), and comprises an indication of the location (408) of the consumer in the retail location and an indication of navigational instructions (410) to locate a certain product (409) in the retail location. In another embodiment, the topological map (407) may further comprise an indication of the location of the product in the retail location (not depicted).” Pettyjohn [0077]: “In the depicted embodiment, the display further comprises an image of a specific product sought (409), displayed in a callout and located in the augmented reality image in the approximate location of the product on the shelf. The display further comprises overlaid navigational instructions (411) to the location of the product (409). In a still further embodiment, the display comprises messaging (413) in a callout. The messaging may be displayed in connection with the physical location of the product to which the messaging pertains. One of ordinary skill in the art will appreciate that, in an augmented reality application, the location of the overlaid components on the display (403) will move, resize, and/or disappear from the display, and new overlaid components may appear, resize, and/or move on the display, as the location and orientation of the device (and thus, the display) changes in response to consumer behavior or movement.” It would be obvious to add connecting lines to show which objects in the scene are being indicated by the AR overlays, especially if they are not right next to the objects they describe, but are in another designated area of the display.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the generation of graphical overlays that point to and indicate objects in a scene as taught by Pettyjohn with the system of Castaneda in order to indicate to a user information about the locations of important objects on a navigable path and to inform the user whether they should be sought out or avoided based on their nature.
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Castaneda in view of Cower, further in view of Pettyjohn, and further in view of Choi.
As per claim 16, Castaneda alone does not explicitly teach the claimed limitations.
However, Castaneda in combination with Choi and Pettyjohn teaches the claimed:
16. The device of claim 1, wherein the controller generates a first layer and a second layer corresponding to a size of the display screen, displays the image acquired from the camera on the first layer, detects corresponding coordinates corresponding to POI entity coordinates in an orthographic projection space detected from the image acquired by the camera in the second layer, and renders AR entities including information on respective POI entities in a designated area of the second layer, and renders connection lines connecting between the respective rendered AR entities and the respective corresponding coordinates on the second layer, and controls the display to overlap the second layer on the first layer. (Claim 16 substantially combines the features of 14 and 15. Choi teaches the layer with overlay information that is the same size as the transformed screen. Choi [0008]: “In accordance with an embodiment of the present invention, the above and other objects can be accomplished by the provision of a user interface apparatus for a vehicle, including: a first camera configured to capture a forward view image including an object; an interface unit configured to receive information about the object from a second camera; a display; and a processor configured to convert the information about the object in a coordinate system of the second camera with respect to the first camera, generate an augmented reality (AR) graphic object corresponding to the object, and control the display to overlay the AR graphic object on the forward view image” Choi teaches the overlay being output on a cropped area. This is the second layer that is in the same coordinate system of the first layer and contains AR graphic overlays. This can be combined with the AR overlays in designated areas pointing to specific objects in the scene as taught by Pettyjohn. The overlays can be in the second layer and point to the objects based on the transformed coordinate system.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a layer containing a graphical overlay that is the same size as the transformed image screen, as taught by Choi, with the system of Castaneda in order to allow an AR overlay to be controlled for size and to separate the graphical parts cleanly from the original image.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the generation of graphical overlays that point to and indicate objects in a scene as taught by Pettyjohn with the system of Castaneda in order to indicate to a user information about the locations of important objects on a navigable path and to inform the user whether they should be sought out or avoided based on their nature.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS JOHN FOSTER whose telephone number is (571)272-5053. The examiner can normally be reached Mon, Fri 8:30-6. Tues-Thurs 7:30-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Hajnik can be reached at 571-272-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/THOMAS JOHN FOSTER/Examiner, Art Unit 2616
/HAI TAO SUN/Primary Examiner, Art Unit 2616