Last updated: May 29, 2026
Application No. 18/889,450
DIETARY INTAKE INFORMATION ACQUISITION DEVICE AND DIETARY INTAKE INFORMATION ACQUISITION METHOD

Final Rejection §103§112
Filed
Sep 19, 2024
Priority
Apr 13, 2022 — continuation of PCTJP2022017684
Examiner
BARTLEY, KENNETH
Art Unit
3684
Tech Center
3600 — Transportation & Electronic Commerce
Assignee
Mitsubishi Electric Corporation
OA Round
2 (Final)
This examiner grants 36% of cases after interview

— +28.8% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 614 resolved cases, 2023–2026
Examiner Intelligence

BARTLEY, KENNETH
Grants only 36% of cases
Career Allowance Rate
223 granted / 614 resolved
-15.7% vs TC avg
Strong +29% interview lift
Interview Lift: +28.8% (resolved cases with interview vs. without)
Typical timeline
3y 10m
Avg Prosecution
38 currently pending
Career history
671
Total Applications
across all art units
Statute-Specific Performance

§101
14.5%
-25.5% vs TC avg
§103
72.8%
+32.8% vs TC avg
§102
2.4%
-37.6% vs TC avg
§112
10.0%
-30.0% vs TC avg
Tech Center average is an estimate. Based on career data from 614 resolved cases
Office Action

§103 §112
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Receipt of Applicant’s Amendment filed January 12, 2026, is acknowledged.

Response to Amendment
Claims 1 and 14 have been amended.  Claims 15 and 16 are new.  Claims 1-16 are pending and are provided to be examined upon their merits.

Response to Arguments
Applicant’s arguments with respect to claims 1-16 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. A response is provided in bold to Applicant’s Remarks:
Applicant argues 35 USC §101 Rejection, starting pg. 9 of Remarks:

Based on the claim amendments, the 35 USC 101 Rejection is withdrawn.  See analysis below.

Applicant argues 35 USC §112(b) Rejection, starting pg. 9 of Remarks:

The prior rejection has been withdrawn, but a new rejection is made based on the claim amendments.

Applicant argues 35 USC §103 Rejection, starting pg. 17 of Remarks:

Claims 1-11 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over US 2015/0168365 to Connor in view of US 2015/0379238 to Connor (hereinafter referred to as Connor 2); and claims 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Connor, Connor 2 and US 2019/0295440 to Hadad. In response, Applicant respectfully submits that amended independent claims 1 and 14 recite novel features not taught or rendered obvious by the applied references.

By way of background, independent claim 1 is directed to a dietary intake information acquisition device including:

a first camera;

a second camera;

a first motor to move the first camera;

a second motor to move the second camera; and

processing circuitry to

infer a dish or an ingredient to be consumed by a user on a basis of a captured image taken by the first camera and perform creation of dietary intake information regarding the dish or the ingredient that has been inferred,

create, for a target dish or a target ingredient among the dish or the ingredient inferred by the inferring, a question asking what the target dish or the target ingredient is,

detect an action of the user related to the target dish or the target ingredient on a basis of another captured image taken by the second camera,

determine a question timing at which the question created by the creation is output from a period in which an action of the user related to the target dish or the target ingredient is detected by the detection,

control a speaker to output of question voice output information for outputting the question created by the creation of the question by voice at the question timing determined by the determination,

acquire an uttered speech of the user by a microphone in response to the question output by the output by voice on a basis of the question voice output information, 

perform speech recognition on the uttered speech acquired by the acquisition, and

perform reflection of, in the dietary intake information, information regarding the dish or the ingredient which has been obtained from the user performing answer of the question and which has been specified on a basis of a result of the speech recognition.

Independent claim 14, although varying in claim scope and statutory class, recites substantially similar features as claim 1. Thus, the arguments presented below with respect to claim 1 are also applicable to independent claim 14.

Turning now to the applied references, Applicant respectfully submits that Reference A fails to teach or suggest "a first camera; a second camera; a first motor to move the first camera; a second motor to move the second camera; and processing circuitry to infer a dish or an ingredient to be consumed by a user on a basis of a captured image taken by the first camera and perform creation of dietary intake information regarding the dish or the ingredient that has been inferred, create, for a target dish or a target ingredient among the dish or the ingredient inferred by the inferring, a question asking what the target dish or the target ingredient is, detect an action of the user related to the target dish or the target ingredient on a basis of another captured image taken by the second camera," as recited in Applicant's claim 1.


Connor describes a mobile phone 901 with a built-in camera 902. Connor also describes a wearable camera. However, Connor fails to disclose a first motor to move the camera 902 and a second motor to move the wearable camera. Thus, Applicant respectfully submits that Connor fails to teach or suggest "a first motor to move the first camera; a second motor to move the second camera," as recited in claim 1.

Accordingly, Applicant respectfully submits that Connor also fails to teach or suggest "processing circuitry to infer a dish or an ingredient to be consumed by a user on a basis of a captured image taken by the first camera and perform creation of dietary intake information regarding the dish or the ingredient that has been inferred, create, for a target dish or a target ingredient among the dish or the ingredient inferred by the inferring, a question asking what the target dish or the target ingredient is, detect an action of the user related to the target dish or the target ingredient on a basis of another captured image taken by the second camera," as recited in Applicant's claim 1.

Thus, Applicant respectfully submits that independent claims 1 and 14 (and all claims depending thereon) patentably distinguish over Connor. Further, Applicant respectfully submits that Connor 2 and Hadad fail to cure the above-noted deficiencies of Connor.

Connor2 teaches:
Fig. 32, ref. 3210 teaches [first] camera taking an image of a dish or ingredient…


[Image: media_image1.png]


“In this example, the person is wearing an automatic-imaging member comprised of a wrist band 3208 to which are attached two cameras, 3209 and 3210, on the opposite (narrow) sides of the person's wrist. Camera 3209 takes pictures within field of vision 3211. Camera 3210 takes pictures within field of vision 3212. Each field of vision, 3211 and 3212, is represented in these figures by a dotted-line conical shape. The narrow tip of the dotted-line cone is at the camera's aperture and the circular base of the cone represents the camera's field of vision at a finite focal distance from the camera's aperture.” [0303]

Camera (first camera) downward-facing on food source (dish or ingredient)…
“Field of vision 3211 from camera 3209 is represented in FIG. 32 by a generally upward-facing cone-shaped configuration of dotted lines that generally encompasses the person's mouth and face as the person eats. Field of vision 3212 from camera 3210 is represented in FIG. 32 by a generally downward-facing cone-shaped configuration of dotted lines that generally encompasses the reachable food source as the person eats.” [0305]

Fig. 32, ref. 3209 teaches [second] camera taking an image of a user…


[Image: media_image2.png]


“In this example, the person is wearing an automatic-imaging member comprised of a wrist band 3208 to which are attached two cameras, 3209 and 3210, on the opposite (narrow) sides of the person's wrist. Camera 3209 takes pictures within field of vision 3211. Camera 3210 takes pictures within field of vision 3212. Each field of vision, 3211 and 3212, is represented in these figures by a dotted-line conical shape. The narrow tip of the dotted-line cone is at the camera's aperture and the circular base of the cone represents the camera's field of vision at a finite focal distance from the camera's aperture.” [0303]

Camera (second camera) upward-facing on face of user as the person eats (detecting an action of the person)…
“Field of vision 3211 from camera 3209 is represented in FIG. 32 by a generally upward-facing cone-shaped configuration of dotted lines that generally encompasses the person's mouth and face as the person eats. Field of vision 3212 from camera 3210 is represented in FIG. 32 by a generally downward-facing cone-shaped configuration of dotted lines that generally encompasses the reachable food source as the person eats.” [0305]

Automatically move cameras (therefore using a motor of some type)…
“In an example, the location of one or more cameras may be moved automatically, independently of movement of the body member to which the cameras are attached, in order to increase the probability of encompassing both the person's mouth and a reachable food source. In an example, the lenses of one or more cameras may be automatically and independently moved in order to increase the probability of encompassing both the person's mouth and a reachable food source. In various examples, a lens may be automatically shifted or rotated to change the direction or focal length of the camera's field of vision. In an example, the lenses of one or more cameras may be automatically moved to track the person's mouth and hand. In an example, the lenses of one or more cameras may be automatically moved to scan for reachable food sources.” [0386] Inherent in automatically moving a camera is some type of motor device.

Therefore, Connor2 teaches automatically moving cameras to capture a person’s mouth and a reachable food source. While Connor2 does not literally use the word “motor,” it is physically impossible to automatically move the location of a camera without some type of motor when the camera moves independently of the body member to which it is attached.

Also from Connor2…
Camera can scan back and forth…
“In an example, video camera 2107 can have a fixed focal direction and focal length. In an example, the focal direction of the video camera may always point toward the person's fingers and the space surrounding the person's fingers. In another example, the video camera can have a focal direction or focal length that is automatically adjusted while the camera is in operation. In an example, when it is in operation, the video camera can scan back and forth through the space near the person's hand and fingers to search for food. In an example, the video camera can use pattern recognition to track the relative location of the person's fingers. In an example, the camera can automatically adjust its focal direction and/or focal length to monitor and identify eating-related objects (such as a fork or glass) that come into contact with the person's fingers.” [0244]  Inherent with scan and track is a motor.

Video camera can scan in a spiral, radial or back and forth pattern (therefore move location)…
“In an example, video camera 2107 can scan in a spiral, radial, or back-and-forth pattern in order to monitor activity near both the person's fingers and the person's mouth. This is more complex than just tracking the person's fingers. This requires that the device keep track of where the person's fingers and mouth are, in three-dimensional space, relative to the camera as the person moves their arm, hand, and head. In an example, face recognition software can help the device to track the person's mouth and gesture recognition software can help the device to track the person's fingers.” [0245] Inherent with scan in spiral, radial and back-and-forth pattern is a motor to cause the scan movements.

“In the example shown in FIG. 21, there is only one miniature video camera in this device and it is located on the outer portion of the person's wrist where the main body of a wrist watch would generally be located. In another example, such a device may have one video camera located on the opposite side of the person's wrist. In other examples, there may be two or more video cameras mounted on different locations around the person's wrist. In an example with two or more video cameras, different cameras may track different objects. For example, one camera may track the person's fingers and the other camera may track the person's mouth. In various examples, different cameras may operate at different times and/or with different focal lengths.” [0246]

Non-wearable imaging device…
“In an example of this invention, multiple imaging members may be worn on the same body member. In another example, multiple imaging members may be worn on different body members. In an example, an imaging member may be worn on each of a person's wrists or each of a person's hands. In an example, one or more imaging members may be worn on a body member and a supplemental imaging member may be located in a non-wearable device that is in proximity to the person. In an example, wearable and non-wearable imaging members may be in wireless communication with each other. In an example, wearable and non-wearable imaging members may be in wireless communication with an image-analyzing member.” [0348]

Therefore, Connor2 teaches cameras that move and are wearable or non-wearable.

Accordingly, Applicant respectfully requests that the rejections under 35 U.S.C. § 103(a) be withdrawn. 

Consequently, in view of the present amendment, and in light of the above discussion, the pending claims as presented herewith are believed to be in condition for formal allowance, and an early and favorable action to that effect is respectfully requested.

The rejection is respectfully maintained but modified for the claim amendments.

35 USC § 101 Analysis 
Regarding 35 USC 101.  Claim 14 recites the following limitations:
A dietary intake information acquisition method comprising:
moving a position of a first camera with a first motor;
moving a position of a second camera with a second motor;
inferring of a dish or an ingredient to be consumed by a user on a basis of a captured image taken by the first camera and perform creation of dietary intake information regarding the dish or the ingredient that has been inferred;
creating, for a target dish or a target ingredient among the dish or the ingredient inferred by the inferring, a question asking what the target dish or the target ingredient is; 
detecting an action of the user related to the target dish or the target ingredient on a basis of another captured image taken by the second camera;
determining a question timing at which the question created by the creation is output from a period in which an action of the user related to the target dish or the target ingredient is detected by the detection;
controlling a speaker to output question voice output information for outputting the question created by the creation of the question by voice at the question timing determined by the determination;
acquiring an uttered speech of the user by a microphone in response to the question output by the output by voice on a basis of the question voice output information;
performing speech recognition on the uttered speech acquired by the acquisition; and
performing reflection of, in the dietary intake information, information regarding the dish or the ingredient which has been obtained from the user performing answer of the question and which has been specified on a basis of a result of the speech recognition.

While Claim 14 above recites abstract elements, the combination of two cameras moved by motors, the use of those cameras to take images of a dish or ingredient and of a user, and the control of a speaker provides a practical application and significantly more.
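
As a reading aid only, the ordering of the data-processing steps recited in Claim 14 can be sketched as below. This is a minimal illustration under stated assumptions, not the Applicant's implementation: every name (`IntakeRecord`, `ActionPeriod`, `run_method`, and the `speak`, `listen`, and `recognize` callables) is hypothetical, the cameras and motors are not modeled, the inferred label and the detected action period are passed in as if already derived from the first and second camera images, and picking the midpoint of the detected period as the question timing is an arbitrary choice made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IntakeRecord:
    """Dietary intake information for one inferred item (hypothetical structure)."""
    label: str
    confirmed: bool = False


@dataclass
class ActionPeriod:
    """Span in seconds during which a user action on the target item was detected."""
    start: float
    end: float


def create_question(target: IntakeRecord) -> str:
    # A question asking what the target dish or ingredient is.
    return f"The system guessed '{target.label}'. What is this dish or ingredient?"


def determine_question_timing(period: ActionPeriod) -> float:
    # Assumption: choose the midpoint of the detected action period.
    return (period.start + period.end) / 2.0


def run_method(
    inferred_label: str,                   # stands in for inference from the first camera image
    action_period: ActionPeriod,           # stands in for detection from the second camera image
    speak: Callable[[str, float], None],   # speaker control: (question, time)
    listen: Callable[[], bytes],           # microphone acquisition
    recognize: Callable[[bytes], str],     # speech recognition
) -> IntakeRecord:
    record = IntakeRecord(label=inferred_label)   # creation of dietary intake information
    question = create_question(record)
    when = determine_question_timing(action_period)
    speak(question, when)                         # voice output at the question timing
    answer = recognize(listen())
    if answer:
        # Reflect the user's answer in the dietary intake information.
        record.label = answer
        record.confirmed = True
    return record
```

With stub callables, for example a `recognize` that always returns "beef stew", the returned record carries the user's answer rather than the initial inference, which is the "reflection" step of the claim.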

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-16 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 14 recites: “moving a position of a first camera with a first motor; moving a position of a second camera with a second motor;” where there is no teaching of moving a position of a camera with a motor.
From Applicant’s specification:
“… The first camera 21 and the second camera 22 include a drive unit (not illustrated) including a motor or the like, and are provided in such a manner that the imaging direction can be changed by the drive unit.” [0013]
Therefore, the imaging direction, not camera position, can be changed.  For examination purposes, this is interpreted as changing an imaging direction of a camera.  Claim 1 has a similar problem.
Claims 2-13, 15, and 16 are further rejected as they depend from their independent claim 1.
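
The distinction drawn in this rejection, between changing a camera's imaging direction and moving the camera's position, can be illustrated with a short sketch. It is a hypothetical model for explanation only: the `CameraPose` fields, the pan/tilt parameterization, and both function names are assumptions, since paragraph [0013] states only that the imaging direction can be changed by a drive unit including a motor or the like.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CameraPose:
    """Hypothetical camera state: where it is mounted and where it points."""
    x: float
    y: float
    z: float
    pan_deg: float
    tilt_deg: float


def change_imaging_direction(pose: CameraPose, d_pan: float, d_tilt: float) -> CameraPose:
    """Rotate the view only; the mounting position is left unchanged."""
    new_pan = (pose.pan_deg + d_pan) % 360.0
    new_tilt = max(-90.0, min(90.0, pose.tilt_deg + d_tilt))  # clamp tilt to +/-90 degrees
    return replace(pose, pan_deg=new_pan, tilt_deg=new_tilt)


def move_position(pose: CameraPose, dx: float, dy: float, dz: float) -> CameraPose:
    """Translate the camera itself; the pointing direction is left unchanged."""
    return replace(pose, x=pose.x + dx, y=pose.y + dy, z=pose.z + dz)
```

Under this model, the specification passage quoted above corresponds to `change_imaging_direction`, while the claim language "moving a position of a first camera" reads on `move_position`, which is why the Examiner interprets the limitation as changing an imaging direction.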

Examiner Request
The Applicant is requested to indicate where in the specification there is support for amendments to claims should Applicant amend.  The purpose of this is to reduce potential 35 U.S.C. §112(a) or §112 1st paragraph issues that can arise when claims are amended without support in the specification.  The Examiner thanks the Applicant in advance.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-11 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Pub. No. US 2015/0168365 to Connor in view of Pub. No. US 2015/0379238 to Connor (hereinafter referred to as Connor2).
Regarding claim 1
A dietary intake information acquisition device comprising:
a first camera;

Connor teaches:
Cameras (plural)…
“Examples of devices and methods in this category include: wearable accelerometers or other motion sensors that detect body motions associated with eating (e.g. particular patterns of hand movements or mouth movements); wearable heart rate, blood pressure, and/or electromagnetic body signal monitors that are used to detect eating events; wearable thermal energy sensors that are used to detect eating events; wearable glucose monitors that are used to detect eating events and provide some information about the nutritional composition of food consumed; wearable body fluid sampling devices such as continuous micro-sampling blood analysis devices; wearable sound sensors that detect body sounds or environmental sounds associated with eating events (e.g. chewing sounds, swallowing sounds, gastrointestinal organ sounds, and verbal food orders); and wearable cameras that continually take video images of the space surrounding the person wherein these video images are analyzed to detect eating events and identify foods consumed.” [0054]

a second camera;

Cameras (plural)…
“Examples of devices and methods in this category include: wearable accelerometers or other motion sensors that detect body motions associated with eating (e.g. particular patterns of hand movements or mouth movements); wearable heart rate, blood pressure, and/or electromagnetic body signal monitors that are used to detect eating events; wearable thermal energy sensors that are used to detect eating events; wearable glucose monitors that are used to detect eating events and provide some information about the nutritional composition of food consumed; wearable body fluid sampling devices such as continuous micro-sampling blood analysis devices; wearable sound sensors that detect body sounds or environmental sounds associated with eating events (e.g. chewing sounds, swallowing sounds, gastrointestinal organ sounds, and verbal food orders); and wearable cameras that continually take video images of the space surrounding the person wherein these video images are analyzed to detect eating events and identify foods consumed.” [0054]

a first motor to move the first camera;

See Move First and Second Camera below.

a second motor to move the second camera;

See Move First and Second Camera below.

processing circuitry

Example of microprocessor…
“In an example, a wearable device for measuring a person's consumption of at least one selected type of food, ingredient, or nutrient can comprise multiple components selected from the group consisting of: Central Processing Unit (CPU) or microprocessor; food-consumption monitoring component (motion sensor, electromagnetic sensor, optical sensor, and/or chemical sensor); graphic display component (display screen and/or coherent light projection); human-to-computer communication component (speech recognition, touch screen, keypad or buttons, and/or gesture recognition); memory component (flash, RAM, or ROM); power source and/or power-transducing component; time keeping and display component; wireless data transmission and reception component; and strap or band.” [0200]

infer a dish or an ingredient to be consumed by a user on a basis of a captured image taken by the first camera and perform creation of dietary intake information regarding the dish or the ingredient that has been inferred,

Images analyzed to identifying (inferring) types of food (dish) consumed…
“On the one hand, one can create an external device that can be very accurate in monitoring and measuring a person's food consumption, but it will be highly-intrusive with respect to the privacy of the person being monitored and other people nearby. For example, one can create a video imaging device that a person wears continually on their head, neck, or torso. This wearable imaging device can continually take video images of the space surrounding the person. Then these video images can be automatically analyzed to detect eating events and to identify the types of foods that the person consumes. However, continuous video monitoring of the space surrounding a person can be highly-intrusive with respect to the person's privacy and also the privacy of other people nearby.” [0018]

Diet log (dietary intake) and example of tracking food (diet)…
“For decades, many people manually kept track of what foods they ate and/or the associated calories (often called a "food log," "diet log," or "calorie counting") using a pencil and paper. With the development of personal computers, mobile electronic devices, and smart phone applications, much of this manual food consumption tracking has been made easier with menu-driven human-to-computer interfaces that help people to more easily enter information concerning what food they eat. Databases of common foods and their associated nutritional information (including calories) have made calorie counting easier by automatically associating calories with foods entered.” [0027]

Analyze pictures (images) of foods (dish) and ingredients to estimate (infer) type of foods, ingredients, etc. being consumed…
“FIGS. 9 through 12 also show an example of how this invention can be embodied in a device for monitoring food consumption comprising: (a) a wearable sensor that is configured to be worn on a person's body or clothing, wherein this wearable sensor automatically collects data that is used to detect probable eating events without requiring action by the person in association with a probable eating event apart from the act of eating, and wherein a probable eating event is a period of time during which the person is probably eating; (b) an imaging member, wherein this imaging member is used by the person to take pictures of food that the person eats, wherein using this imaging member to take pictures of food requires voluntary action by the person apart from the act of eating, and wherein the person is prompted to take pictures of food using this imaging member when data collected by the wearable sensor indicates a probable eating event; and (c) a data analysis component, wherein this component analyzes pictures of food taken by the imaging member to estimate the types and amounts of foods, ingredients, nutrients, and/or calories that are consumed by the person. In this example, the wearable sensor is motion sensor 203. In this example, the imaging member is camera 902 which is part of phone 901. In this example, the data analysis component is data processing unit 204.” [0384]

See First and Second Camera below.

create, for a target dish or a target ingredient among the dish or the ingredient inferred by the inferring, a question asking what the target dish or the target ingredient is,

Device can ask (creation) a question of food consumed (target dish) with estimates (infer) of types of food…
“In an example, a device can ask a person clarifying questions concerning food consumed. In an example, a device can prompt the person with queries to refine initial automatically-generated estimates of the types and quantities of food consumed. In an example, these questions can be asked in real time, as a person is eating, or in a delayed manner, after a person has finished eating or at a particular time of the day. In an example, the results of preliminary automated food identification can be presented to a human via a graphical user interface and the human can then refine the results using a touch screen. In an example, the results of automated food identification can be presented to a human via verbal message and the human can refine the results using a speech recognition interface. In an example, data can be transmitted (such as by the internet) to a review center where food is identified by a dietician or other specialist. In various examples, a human-to-computer interface for entering information concerning food consumption can comprise one or more interface elements selected the group consisting of: microphone, speech recognition, and/or voice recognition interface; touch screen, touch pad, keypad, keyboard, buttons, or other touch-based interface; camera, motion recognition, gesture recognition, eye motion tracking, or other motion detection interface; interactive food-identification menu with food pictures and names; and interactive food-identification search box.” [0276]

detect an action of the user related to the target dish or the target ingredient on a basis of another captured image taken by the second camera,

One example of detecting body motions (actions of a user) associated with eating (related to target dishes or ingredients) and with cameras…
“Examples of devices and methods in this category include: wearable accelerometers or other motion sensors that detect body motions associated with eating (e.g. particular patterns of hand movements or mouth movements); wearable heart rate, blood pressure, and/or electromagnetic body signal monitors that are used to detect eating events; wearable thermal energy sensors that are used to detect eating events; wearable glucose monitors that are used to detect eating events and provide some information about the nutritional composition of food consumed; wearable body fluid sampling devices such as continuous micro-sampling blood analysis devices; wearable sound sensors that detect body sounds or environmental sounds associated with eating events (e.g. chewing sounds, swallowing sounds, gastrointestinal organ sounds, and verbal food orders); and wearable cameras that continually take video images of the space surrounding the person wherein these video images are analyzed to detect eating events and identify foods consumed.” [0054]

Detection of person eating (action)…
“In an example, a device can ask a person clarifying questions concerning food consumed. In an example, a device can prompt the person with queries to refine initial automatically-generated estimates of the types and quantities of food consumed. In an example, these questions can be asked in real time, as a person is eating, or in a delayed manner, after a person has finished eating or at a particular time of the day. In an example, the results of preliminary automated food identification can be presented to a human via a graphical user interface and the human can then refine the results using a touch screen. In an example, the results of automated food identification can be presented to a human via verbal message and the human can refine the results using a speech recognition interface. In an example, data can be transmitted (such as by the internet) to a review center where food is identified by a dietician or other specialist. In various examples, a human-to-computer interface for entering information concerning food consumption can comprise one or more interface elements selected the group consisting of: microphone, speech recognition, and/or voice recognition interface; touch screen, touch pad, keypad, keyboard, buttons, or other touch-based interface; camera, motion recognition, gesture recognition, eye motion tracking, or other motion detection interface; interactive food-identification menu with food pictures and names; and interactive food-identification search box.” [0276]

See First and Second Camera below.

determine a question timing at which the question created by the creation is output from a period in which an action of the user related to the target dish or the target ingredient is detected by the detection,

Question asked in real time (question timing) or a delayed manner…
“In an example, a device can ask a person clarifying questions concerning food consumed. In an example, a device can prompt the person with queries to refine initial automatically-generated estimates of the types and quantities of food consumed. In an example, these questions can be asked in real time, as a person is eating, or in a delayed manner, after a person has finished eating or at a particular time of the day. In an example, the results of preliminary automated food identification can be presented to a human via a graphical user interface and the human can then refine the results using a touch screen. In an example, the results of automated food identification can be presented to a human via verbal message and the human can refine the results using a speech recognition interface. In an example, data can be transmitted (such as by the internet) to a review center where food is identified by a dietician or other specialist. In various examples, a human-to-computer interface for entering information concerning food consumption can comprise one or more interface elements selected the group consisting of: microphone, speech recognition, and/or voice recognition interface; touch screen, touch pad, keypad, keyboard, buttons, or other touch-based interface; camera, motion recognition, gesture recognition, eye motion tracking, or other motion detection interface; interactive food-identification menu with food pictures and names; and interactive food-identification search box.” [0276]

control a speaker to output of question voice output information for outputting the question created by the creation of the question by voice at the question timing determined by the determination,

Voice recognition interface…
“In an example, a device can ask a person clarifying questions concerning food consumed. In an example, a device can prompt the person with queries to refine initial automatically-generated estimates of the types and quantities of food consumed. In an example, these questions can be asked in real time, as a person is eating, or in a delayed manner, after a person has finished eating or at a particular time of the day. In an example, the results of preliminary automated food identification can be presented to a human via a graphical user interface and the human can then refine the results using a touch screen. In an example, the results of automated food identification can be presented to a human via verbal message and the human can refine the results using a speech recognition interface. In an example, data can be transmitted (such as by the internet) to a review center where food is identified by a dietician or other specialist. In various examples, a human-to-computer interface for entering information concerning food consumption can comprise one or more interface elements selected the group consisting of: microphone, speech recognition, and/or voice recognition interface; touch screen, touch pad, keypad, keyboard, buttons, or other touch-based interface; camera, motion recognition, gesture recognition, eye motion tracking, or other motion detection interface; interactive food-identification menu with food pictures and names; and interactive food-identification search box.” [0276]

See Voice and Speaker below.

acquire an uttered speech of the user by a microphone in response to the question output by the output by voice on a basis of the question voice output information,

Receive verbal input (perform acquisition of an uttered speech) from a person with sound sensor…
“In an example, a sound sensor can include speech recognition or voice recognition to receive verbal input from a person concerning food that the person consumes. In an example, a sound sensor can include speech recognition or voice recognition to extract food selecting, ordering, purchasing, or consumption information from other sounds in the environment.” [0167]

Sensor as a microphone…
“In an example, a food-consumption monitor or food-identifying sensor can be a microphone or other type of sound sensor. In an example, a sensor to detect food consumption and/or identify consumption of a selected type of food, ingredient, or nutrient can be a sound sensor. In an example, a sound sensor can be an air conduction microphone or bone conduction microphone. In an example, a microphone or other sound sensor can monitor for sounds associated with chewing or swallowing food. In an example, data collected by a sound sensor can be analyzed to differentiate sounds from chewing or swallowing food from other types of sounds such as speaking, singing, coughing, and sneezing.” [0166]

perform speech recognition on the uttered speech acquired by the acquisition, and

Speech recognition…
“In an example, a sound sensor can include speech recognition or voice recognition to receive verbal input from a person concerning food that the person consumes. In an example, a sound sensor can include speech recognition or voice recognition to extract food selecting, ordering, purchasing, or consumption information from other sounds in the environment.” [0167]

perform reflection of, in the dietary intake information, information regarding the dish or the ingredient which has been obtained from the user performing answer of the question and which has been specified on a basis of a result of the speech recognition.

Better measurement or modification (reflection) of food consumption (dish) information…
“In various examples, a device for measuring a person's consumption of at least one selected type of food, ingredient, or nutrient can provide feedback to the person that is selected from the group consisting of: advice concerning consumption of specific foods or suggested food alternatives (such as advice from a dietician, nutritionist, nurse, physician, health coach, other health care professional, virtual agent, or health plan); electronic verbal or written feedback (such as phone calls, electronic verbal messages, or electronic text messages); live communication from a health care professional; questions to the person that are directed toward better measurement or modification of food consumption; real-time advice concerning whether to eat specific foods and suggestions for alternatives if foods are not healthy; social feedback (such as encouragement or admonitions from friends and/or a social network); suggestions for meal planning and food consumption for an upcoming day; and suggestions for physical activity and caloric expenditure to achieve desired energy balance outcomes.” [0230]

Move First and Second Camera
Connor teaches cameras.  Connor does not specifically teach moving two cameras.

Connor2, also in the business of cameras, teaches:
Fig. 32, ref. 3210 teaches [first] camera taking an image of a dish or ingredient…


    [Figure: media_image1.png (greyscale)]


“In this example, the person is wearing an automatic-imaging member comprised of a wrist band 3208 to which are attached two cameras, 3209 and 3210, on the opposite (narrow) sides of the person's wrist. Camera 3209 takes pictures within field of vision 3211. Camera 3210 takes pictures within field of vision 3212. Each field of vision, 3211 and 3212, is represented in these figures by a dotted-line conical shape. The narrow tip of the dotted-line cone is at the camera's aperture and the circular base of the cone represents the camera's field of vision at a finite focal distance from the camera's aperture.” [0303]

Camera (first camera) downward-facing on food source (dish or ingredient)…
“Field of vision 3211 from camera 3209 is represented in FIG. 32 by a generally upward-facing cone-shaped configuration of dotted lines that generally encompasses the person's mouth and face as the person eats. Field of vision 3212 from camera 3210 is represented in FIG. 32 by a generally downward-facing cone-shaped configuration of dotted lines that generally encompasses the reachable food source as the person eats.” [0305]

Fig. 32, ref. 3209 teaches [second] camera taking an image of a user…


    [Figure: media_image2.png (greyscale)]


“In this example, the person is wearing an automatic-imaging member comprised of a wrist band 3208 to which are attached two cameras, 3209 and 3210, on the opposite (narrow) sides of the person's wrist. Camera 3209 takes pictures within field of vision 3211. Camera 3210 takes pictures within field of vision 3212. Each field of vision, 3211 and 3212, is represented in these figures by a dotted-line conical shape. The narrow tip of the dotted-line cone is at the camera's aperture and the circular base of the cone represents the camera's field of vision at a finite focal distance from the camera's aperture.” [0303]

Camera (second camera) upward-facing on face of user as the person eats (detecting an action of the person)…
“Field of vision 3211 from camera 3209 is represented in FIG. 32 by a generally upward-facing cone-shaped configuration of dotted lines that generally encompasses the person's mouth and face as the person eats. Field of vision 3212 from camera 3210 is represented in FIG. 32 by a generally downward-facing cone-shaped configuration of dotted lines that generally encompasses the reachable food source as the person eats.” [0305]

Automatically move cameras (therefore using a motor of some type)…
“In an example, the location of one or more cameras may be moved automatically, independently of movement of the body member to which the cameras are attached, in order to increase the probability of encompassing both the person's mouth and a reachable food source. In an example, the lenses of one or more cameras may be automatically and independently moved in order to increase the probability of encompassing both the person's mouth and a reachable food source. In various examples, a lens may be automatically shifted or rotated to change the direction or focal length of the camera's field of vision. In an example, the lenses of one or more cameras may be automatically moved to track the person's mouth and hand. In an example, the lenses of one or more cameras may be automatically moved to scan for reachable food sources.” [0386]  Inherent in automatically moving a camera is some type of motor device.

Focal direction automatically adjusted (moved) and track…
“In an example, video camera 2107 can have a fixed focal direction and focal length. In an example, the focal direction of the video camera may always point toward the person's fingers and the space surrounding the person's fingers. In another example, the video camera can have a focal direction or focal length that is automatically adjusted while the camera is in operation. In an example, when it is in operation, the video camera can scan back and forth through the space near the person's hand and fingers to search for food. In an example, the video camera can use pattern recognition to track the relative location of the person's fingers. In an example, the camera can automatically adjust its focal direction and/or focal length to monitor and identify eating-related objects (such as a fork or glass) that come into contact with the person's fingers.” [0244]  Inherent in automatically adjusting direction, scanning, and tracking is a motor.

Video camera can scan in a spiral, radial or back and forth pattern (therefore move location)…
“In an example, video camera 2107 can scan in a spiral, radial, or back-and-forth pattern in order to monitor activity near both the person's fingers and the person's mouth. This is more complex than just tracking the person's fingers. This requires that the device keep track of where the person's fingers and mouth are, in three-dimensional space, relative to the camera as the person moves their arm, hand, and head. In an example, face recognition software can help the device to track the person's mouth and gesture recognition software can help the device to track the person's fingers.” [0245]  Inherent in scanning in a spiral, radial, or back-and-forth pattern is a motor to cause the scan movements.

“In the example shown in FIG. 21, there is only one miniature video camera in this device and it is located on the outer portion of the person's wrist where the main body of a wrist watch would generally be located. In another example, such a device may have one video camera located on the opposite side of the person's wrist. In other examples, there may be two or more video cameras mounted on different locations around the person's wrist. In an example with two or more video cameras, different cameras may track different objects. For example, one camera may track the person's fingers and the other camera may track the person's mouth. In various examples, different cameras may operate at different times and/or with different focal lengths.” [0246]

Non-wearable imaging device…
“In an example of this invention, multiple imaging members may be worn on the same body member. In another example, multiple imaging members may be worn on different body members. In an example, an imaging member may be worn on each of a person's wrists or each of a person's hands. In an example, one or more imaging members may be worn on a body member and a supplemental imaging member may be located in a non-wearable device that is in proximity to the person. In an example, wearable and non-wearable imaging members may be in wireless communication with each other. In an example, wearable and non-wearable imaging members may be in wireless communication with an image-analyzing member.” [0348]

“In an example, first automatic-imaging member 5007 constantly maintains a line of sight to the person's mouth by constantly shifting the direction and/or focal length of its field of vision 5008. In another example, this first automatic-imaging member 5007 scans and acquires a line of sight to the person's mouth only when a sensor indicates that the person is eating. In an example, this scanning function may comprise changing the direction and/or focal length of the member's field of vision 5008. If the line of sight from this member to the person's mouth is obstructed, or otherwise impaired, then this device and method detects and responds to this impairment as part of its tamper-resisting function. In an example, its response to tampering helps to restore proper imaging function for automatic monitoring and estimation of caloric intake.” [0416]

It would have been obvious to one of ordinary skill in the art before the effective filing date to include in the method and system of Connor the ability to move two cameras as taught by Connor2, since the claimed invention is merely a combination of old elements, in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.  Further motivation is provided by Connor2, which teaches the advantages of moving two cameras for monitoring food consumption.

Voice and Speaker
The combined references teach speech/voice and food.  They also teach asking a question.  They do not specifically teach asking the question by voice through a speaker.

Connor2, also in the business of speech/voice and food, teaches:
Question using voice-based inquiry…
“In FIG. 23, the motion sensor has detected this possible eating event (i.e. the glass being tilted up to the mouth and then back down) and this event has triggered a voice-based inquiry from the device to the person via microphone and speaker unit 2106. In an example, the device, upon detection of a probable eating event, can ask the person a question such as—“If you are eating something, please identify it.” The sound waves of this voice-based inquiry from the device are represented in FIG. 23 by concentric dotted lines 2301 expanding outward from the device. In this example, the device solicits voluntary data concerning food consumption from the person through a voice-based message. In other examples, a device may solicit voluntary data via other means such as a display screen, buzzing or ring tone, vibration, or text message.” [0248]

Example of speaker…
“In FIG. 23, the motion sensor has detected this possible eating event (i.e. the glass being tilted up to the mouth and then back down) and this event has triggered a voice-based inquiry from the device to the person via microphone and speaker unit 2106. In an example, the device, upon detection of a probable eating event, can ask the person a question such as—“If you are eating something, please identify it.” The sound waves of this voice-based inquiry from the device are represented in FIG. 23 by concentric dotted lines 2301 expanding outward from the device. In this example, the device solicits voluntary data concerning food consumption from the person through a voice-based message. In other examples, a device may solicit voluntary data via other means such as a display screen, buzzing or ring tone, vibration, or text message.” [0248]

It would have been obvious to one of ordinary skill in the art before the effective filing date to include in the method and system of the combined references the ability to use voice and a speaker for questions as taught by Connor2, since the claimed invention is merely a combination of old elements, in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.  Further motivation is provided by the combined references, which teach using voice capabilities; it would have been obvious to use these capabilities for questions as taught by Connor2.

Regarding claim 2
The dietary intake information acquisition device according to claim 1, wherein the processing circuitry determines whether or not there is the dish or the ingredient that is a target to which the question is to be output among the dish or the ingredient inferred by the inferring on a basis of the dietary intake information created by the creation of the dietary intake information, and creates the question with the dish or the ingredient determined to be the target to which the question is to be output as the target dish or the target ingredient.

Connor teaches:
Various devices (processing circuitry)…
“In an example, a device and system for measuring a person's consumption of at least one selected type of food, ingredient, or nutrient can be entirely wearable or include a wearable component. In an example, a wearable device or component can be worn directly on a person's body, can be worn on a person's clothing, or can be integrated into a specific article of clothing. In an example, a wearable device for measuring food consumption can be in wireless communication with an external device. In various examples, a wearable device for measuring food consumption can be in wireless communication with an external device selected from the group consisting of: a cell phone, an electronic tablet, electronically-functional eyewear, a home electronics portal, an internet portal, a laptop computer, a mobile phone, a remote computer, a remote control unit, a smart phone, a smart utensil, a television set, and a virtual menu system.” [0199]

Example of using computer (processing circuitry)…
“In various examples, a device for measuring a person's consumption of at least one selected type of food, ingredient, or nutrient can provide feedback to the person that is selected from the group consisting of: auditory feedback (such as a voice message, alarm, buzzer, ring tone, or song); feedback via computer-generated speech; mild external electric charge or neural stimulation; periodic feedback at a selected time of the day or week; phantom taste or smell; phone call; pre-recorded audio or video message by the person from an earlier time; television-based messages; and tactile, vibratory, or pressure-based feedback.” [0227]

Regarding claim 3
The dietary intake information acquisition device according to claim 1, wherein the processing circuitry detects an action of the user of touching a tableware on which the target dish or the target ingredient is served.

Connor teaches:
Fig. 2 and touching a spoon (tableware)…


    [Figure: media_image3.png (greyscale)]


One example of touch sensor and utensil…
“In an example, a sensor to monitor, detect, or sense food consumption or to identify a selected type of food, ingredient, or nutrient consumed can be pressure sensor or touch sensor. In an example, a pressure or touch sensor can sense pressure or tactile information from contact with food that will be consumed. In an example, a pressure or touch sensor can be incorporated into a smart food utensil or food probe. In an example, a pressure or touch based sensor can be incorporated into a pad on which a food utensil is placed between mouthfuls or when not in use. In an example, a pressure or touch sensor can sense pressure or tactile information from contact with a body member whose internal pressure or external shape is affected by food consumption. In various examples, a pressure or touch sensor can be selected from the group consisting of: food viscosity sensor, blood pressure monitor, muscle pressure sensor, button or switch on a food utensil, jaw motion pressure sensor, and hand-to-mouth contact sensor.” [0174]

Another example of camera (processing circuitry) and picture of spoon…
“If the person has not already used camera 502 on smart spoon 501 to take pictures of food during a particular eating event detected by smart watch 201, then smart watch 201 prompts the person to take a picture of food using camera 502 on smart spoon 501. In this example, this prompt 301 is represented by a "lightning bolt" symbol in FIG. 7. In this example, the person complies with prompt 301 and activates camera 502 by touch in FIG. 8. In this example, a picture is taken of a mouthful of food 208 in the scoop of smart spoon 501. In another example, the person could aim camera 502 on smart spoon 501 toward food on a plate, food in a bowl, or food packaging to take a picture of food before it is apportioned by spoon 501.” [0358]

Regarding claim 4
The dietary intake information acquisition device according to claim 1, wherein the processing circuitry detects an action of the user of holding the target dish or the target ingredient with cutlery.


Connor teaches:
Fig. 2 and holding dish with spoon (cutlery)…


    [Figure: media_image3.png (greyscale)]

Regarding claim 5
The dietary intake information acquisition device according to claim 1, wherein the processing circuitry detects an action of the user of putting the target dish or the target ingredient into a mouth of the user or an action of the user of chewing the target dish or the target ingredient.

Connor teaches:
Example of eating and mouth movements (chewing) and chewing sounds…
“Examples of devices and methods in this category include: wearable accelerometers or other motion sensors that detect body motions associated with eating (e.g. particular patterns of hand movements or mouth movements); wearable heart rate, blood pressure, and/or electromagnetic body signal monitors that are used to detect eating events; wearable thermal energy sensors that are used to detect eating events; wearable glucose monitors that are used to detect eating events and provide some information about the nutritional composition of food consumed; wearable body fluid sampling devices such as continuous micro-sampling blood analysis devices; wearable sound sensors that detect body sounds or environmental sounds associated with eating events (e.g. chewing sounds, swallowing sounds, gastrointestinal organ sounds, and verbal food orders); and wearable cameras that continually take video images of the space surrounding the person wherein these video images are analyzed to detect eating events and identify foods consumed.” [0054]

Regarding claim 6
The dietary intake information acquisition device according to claim 1, wherein the processing circuitry detects an action of the user of swallowing the dish or the target ingredient. 

Connor teaches:
Example of swallowing sounds…
“Examples of devices and methods in this category include: wearable accelerometers or other motion sensors that detect body motions associated with eating (e.g. particular patterns of hand movements or mouth movements); wearable heart rate, blood pressure, and/or electromagnetic body signal monitors that are used to detect eating events; wearable thermal energy sensors that are used to detect eating events; wearable glucose monitors that are used to detect eating events and provide some information about the nutritional composition of food consumed; wearable body fluid sampling devices such as continuous micro-sampling blood analysis devices; wearable sound sensors that detect body sounds or environmental sounds associated with eating events (e.g. chewing sounds, swallowing sounds, gastrointestinal organ sounds, and verbal food orders); and wearable cameras that continually take video images of the space surrounding the person wherein these video images are analyzed to detect eating events and identify foods consumed.” [0054]

Regarding claim 7
The dietary intake information acquisition device according to claim 1, wherein the processing circuitry determines a period during which the user continues the action of the user related to the target dish or the target ingredient as the question timing.

Connor teaches:
Example of eating event (dish) and period of time…
“In an example, a device and method for measuring a person's consumption of at least one selected type of food, ingredient, or nutrient can collect data that enables tracking the cumulative amount of a type of food, ingredient, or nutrient which the person consumes during a period of time (such as an hour, day, week, or month) or during a particular eating event. In an example, the time boundaries of a particular eating event can be defined by a maximum time between chews or mouthfuls during a meal and/or a minimum time between chews or mouthfuls between meals. In an example, the time boundaries of a particular eating event can be defined by Fourier Transformation analysis of the variable frequencies of chewing, swallowing, or biting during meals vs. between meals.” [0131]

Question asked in real time (question timing) or a delayed manner…
“In an example, a device can ask a person clarifying questions concerning food consumed. In an example, a device can prompt the person with queries to refine initial automatically-generated estimates of the types and quantities of food consumed. In an example, these questions can be asked in real time, as a person is eating, or in a delayed manner, after a person has finished eating or at a particular time of the day. In an example, the results of preliminary automated food identification can be presented to a human via a graphical user interface and the human can then refine the results using a touch screen. In an example, the results of automated food identification can be presented to a human via verbal message and the human can refine the results using a speech recognition interface. In an example, data can be transmitted (such as by the internet) to a review center where food is identified by a dietician or other specialist. In various examples, a human-to-computer interface for entering information concerning food consumption can comprise one or more interface elements selected the group consisting of: microphone, speech recognition, and/or voice recognition interface; touch screen, touch pad, keypad, keyboard, buttons, or other touch-based interface; camera, motion recognition, gesture recognition, eye motion tracking, or other motion detection interface; interactive food-identification menu with food pictures and names; and interactive food-identification search box.” [0276]
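For illustration only, the time-boundary idea quoted from Connor [0131] (an eating event bounded by a maximum time between chews or mouthfuls) can be sketched as follows. This is a minimal sketch and not an implementation disclosed by Connor or by the application; the function names `segment_eating_events` and `in_event`, the 120-second default gap, and the use of plain timestamps in seconds are all assumptions introduced here.

```python
from typing import List, Tuple


def segment_eating_events(
    chew_times: List[float], max_gap_s: float = 120.0
) -> List[Tuple[float, float]]:
    """Group chew timestamps (seconds) into (start, end) eating events.

    A gap longer than max_gap_s between consecutive chews closes the
    current event and opens a new one. The threshold is hypothetical.
    """
    events: List[Tuple[float, float]] = []
    for t in sorted(chew_times):
        if events and t - events[-1][1] <= max_gap_s:
            # Still within the current event: extend its end time.
            events[-1] = (events[-1][0], t)
        else:
            # Gap too long (or first chew): start a new event.
            events.append((t, t))
    return events


def in_event(t: float, events: List[Tuple[float, float]]) -> bool:
    """Return True if time t falls inside any detected eating event."""
    return any(start <= t <= end for start, end in events)
```

Under these assumptions, a question asked "in real time" would be one issued at a time for which `in_event` is true, and a "delayed" question would be one issued after the last event has ended.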

Regarding claim 8
The dietary intake information acquisition device according to claim 1, wherein the processing circuitry acquires the uttered speech of surroundings, and

Connor teaches:
Sound sensor and speech or voice recognition…
“In an example, a sound sensor can include speech recognition or voice recognition to receive verbal input from a person concerning food that the person consumes. In an example, a sound sensor can include speech recognition or voice recognition to extract food selecting, ordering, purchasing, or consumption information from other sounds in the environment.” [0167]

the processing circuitry determines whether or not there is a speech on a basis of the result of the speech recognition, and does not determine the question timing while there is the speech.

Human-to-computer interface for entering information using speech recognition, where no timing is taught during speech recognition…
“In an example, a device can ask a person clarifying questions concerning food consumed. In an example, a device can prompt the person with queries to refine initial automatically-generated estimates of the types and quantities of food consumed. In an example, these questions can be asked in real time, as a person is eating, or in a delayed manner, after a person has finished eating or at a particular time of the day. In an example, the results of preliminary automated food identification can be presented to a human via a graphical user interface and the human can then refine the results using a touch screen. In an example, the results of automated food identification can be presented to a human via verbal message and the human can refine the results using a speech recognition interface. In an example, data can be transmitted (such as by the internet) to a review center where food is identified by a dietician or other specialist. In various examples, a human-to-computer interface for entering information concerning food consumption can comprise one or more interface elements selected the group consisting of: microphone, speech recognition, and/or voice recognition interface; touch screen, touch pad, keypad, keyboard, buttons, or other touch-based interface; camera, motion recognition, gesture recognition, eye motion tracking, or other motion detection interface; interactive food-identification menu with food pictures and names; and interactive food-identification search box.” [0276]

Regarding claim 9
The dietary intake information acquisition device according to claim 1, wherein the processing circuitry creates the question asking what the target dish or the target ingredient is, using a demonstrative.
{
From Applicant’s specification on demonstrative…
“The question creation unit 103 creates, for example, a question asking what the question target is with a demonstrative. As a specific example, the question creation unit 103 creates a question asking "What is that dish?" or "What is that dish you are eating now?" for the target dish, for example. Furthermore, the question creation unit 103 creates a question asking "What is that ingredient?", "What is in that dish?", or "What is that ingredient you are eating now?", for example, for the target ingredient.” [0035]

Therefore, one example is a question asking what dish (food) the user is eating.
}

Connor teaches:
Ask a person questions concerning food consumed…
“In an example, a device can ask a person clarifying questions concerning food consumed. In an example, a device can prompt the person with queries to refine initial automatically-generated estimates of the types and quantities of food consumed. In an example, these questions can be asked in real time, as a person is eating, or in a delayed manner, after a person has finished eating or at a particular time of the day. In an example, the results of preliminary automated food identification can be presented to a human via a graphical user interface and the human can then refine the results using a touch screen. In an example, the results of automated food identification can be presented to a human via verbal message and the human can refine the results using a speech recognition interface. In an example, data can be transmitted (such as by the internet) to a review center where food is identified by a dietician or other specialist. In various examples, a human-to-computer interface for entering information concerning food consumption can comprise one or more interface elements selected the group consisting of: microphone, speech recognition, and/or voice recognition interface; touch screen, touch pad, keypad, keyboard, buttons, or other touch-based interface; camera, motion recognition, gesture recognition, eye motion tracking, or other motion detection interface; interactive food-identification menu with food pictures and names; and interactive food-identification search box.” [0276]

Regarding claim 10
The dietary intake information acquisition device according to claim 2, wherein the processing circuitry calculates degree of certainty indicating certainty of an inference result of the dish or the ingredient, and

Connor teaches:
Computer with types of foods consumed and level of certainty (calculates degree of certainty), also weight (degree of certainty)…
“In an example, initial estimates of the types and amounts of food consumed can be made by a computer in an automated manner and then refined by human review as needed. In an example, if automated methods for identification of the types and amounts of food consumed do not produce results with a required level of certainty, then a device and system can prompt a person to collect and/or otherwise provide supplemental information concerning the types of food that the person is consuming. In an example, a device and system can track the accuracy of food consumption information provided by an automated process vs. that provided by a human by comparing predicted to actual changes in a person's weight. In an example, the relative weight which a device and system places on information from automated processes vs. information from human input can be adjusted based on their relatively accuracy in predicting weight changes. Greater weight can be given to the information source which is more accurate based on empirical validation.” [0275]

the processing circuitry determines whether or not there is the target dish or the target ingredient among the dish or the ingredient inferred by the inferring on a basis of the degree of certainty.
	
Example of initial estimate (inferring) of types of food (dish)…
“In an example, initial estimates of the types and amounts of food consumed can be made by a computer in an automated manner and then refined by human review as needed. In an example, if automated methods for identification of the types and amounts of food consumed do not produce results with a required level of certainty, then a device and system can prompt a person to collect and/or otherwise provide supplemental information concerning the types of food that the person is consuming. In an example, a device and system can track the accuracy of food consumption information provided by an automated process vs. that provided by a human by comparing predicted to actual changes in a person's weight. In an example, the relative weight which a device and system places on information from automated processes vs. information from human input can be adjusted based on their relatively accuracy in predicting weight changes. Greater weight can be given to the information source which is more accurate based on empirical validation.” [0275]
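
For illustration only, the certainty-based determination discussed for claim 10 can be sketched as below. This is a minimal sketch and is not taken from the application or from Connor; the names `Inference` and `select_question_targets` and the 0.8 threshold are hypothetical assumptions standing in for the "required level of certainty" of Connor [0275].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Inference:
    """One inferred dish or ingredient with a certainty score (hypothetical structure)."""
    label: str
    certainty: float  # assumed to lie in the range 0.0 to 1.0


def select_question_targets(inferences: List[Inference],
                            required_certainty: float = 0.8) -> List[str]:
    """Return labels whose certainty falls below the required level.

    These are the items for which a clarifying question would be asked.
    The 0.8 default is an assumed value, not one stated in any reference.
    """
    return [inf.label for inf in inferences if inf.certainty < required_certainty]


if __name__ == "__main__":
    results = [Inference("curry", 0.95), Inference("stew", 0.40)]
    print(select_question_targets(results))
```

Under these assumptions, an item inferred with high certainty is accepted as is, and only a low-certainty item becomes a target for a question to the user.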

Regarding claim 11
The dietary intake information acquisition device according to claim 2, wherein the processing circuitry determines again whether or not there is the target dish or the target ingredient among the dish or the ingredient inferred by the inferring on a basis of the dietary intake information in which information regarding the dish or the ingredient obtained from the user as the answer has been reflected by the reflection.

Connor teaches:
Example of initial estimate (inferring) of types of food (dish) and collect supplemental information (answer) about type of food…
“In an example, initial estimates of the types and amounts of food consumed can be made by a computer in an automated manner and then refined by human review as needed. In an example, if automated methods for identification of the types and amounts of food consumed do not produce results with a required level of certainty, then a device and system can prompt a person to collect and/or otherwise provide supplemental information concerning the types of food that the person is consuming. In an example, a device and system can track the accuracy of food consumption information provided by an automated process vs. that provided by a human by comparing predicted to actual changes in a person's weight. In an example, the relative weight which a device and system places on information from automated processes vs. information from human input can be adjusted based on their relatively accuracy in predicting weight changes. Greater weight can be given to the information source which is more accurate based on empirical validation.” [0275]

Regarding claim 14
A dietary intake information acquisition method comprising:
moving a position of a first camera with a first motor;

Connor teaches:
Example of cameras oriented toward food that move as the person moves and as the person brings food to their mouth (therefore, moving a camera)…
“In an example, use of a hand-held camera, mobile phone, or other imaging device to identify food depends on a person's manually aiming and triggering the device for each eating event. In an example, the person must bring the imaging device with them to each meal or snack, orient it toward the food to be consumed, and activate taking a picture of the food by touch or voice command. In an example, a camera, smart watch, smart necklace or other imaging device that is worn on a person's body or clothing can move passively as the person moves. In an example, the field of vision of an imaging device that is worn on a person's wrist, hand, arm, or finger can move as the person brings food up to their mouth when eating. In an example, such an imaging device can passively capture images of a reachable food source and interaction between food and a person's mouth.” [0151]

See First and Second Camera below.

	See Motor below.


moving a position of a second camera with a second motor;

Example of cameras oriented toward food that move as the person moves and as the person brings food to their mouth (therefore, moving a camera)…
“In an example, use of a hand-held camera, mobile phone, or other imaging device to identify food depends on a person's manually aiming and triggering the device for each eating event. In an example, the person must bring the imaging device with them to each meal or snack, orient it toward the food to be consumed, and activate taking a picture of the food by touch or voice command. In an example, a camera, smart watch, smart necklace or other imaging device that is worn on a person's body or clothing can move passively as the person moves. In an example, the field of vision of an imaging device that is worn on a person's wrist, hand, arm, or finger can move as the person brings food up to their mouth when eating. In an example, such an imaging device can passively capture images of a reachable food source and interaction between food and a person's mouth.” [0151]

See First and Second Camera below.

	See Motor below.

inferring of a dish or an ingredient to be consumed by a user on a basis of a captured image taken by the first camera and perform creation of dietary intake information regarding the dish or the ingredient that has been inferred;

Images analyzed to identify (infer) types of food (dish) consumed…
“On the one hand, one can create an external device that can be very accurate in monitoring and measuring a person's food consumption, but it will be highly-intrusive with respect to the privacy of the person being monitored and other people nearby. For example, one can create a video imaging device that a person wears continually on their head, neck, or torso. This wearable imaging device can continually take video images of the space surrounding the person. Then these video images can be automatically analyzed to detect eating events and to identify the types of foods that the person consumes. However, continuous video monitoring of the space surrounding a person can be highly-intrusive with respect to the person's privacy and also the privacy of other people nearby.” [0018]

Diet log (dietary intake) and example of tracking food (diet)…
“For decades, many people manually kept track of what foods they ate and/or the associated calories (often called a "food log," "diet log," or "calorie counting") using a pencil and paper. With the development of personal computers, mobile electronic devices, and smart phone applications, much of this manual food consumption tracking has been made easier with menu-driven human-to-computer interfaces that help people to more easily enter information concerning what food they eat. Databases of common foods and their associated nutritional information (including calories) have made calorie counting easier by automatically associating calories with foods entered.” [0027]

Analyze pictures (images) of foods (dish) and ingredients to estimate (infer) type of foods, ingredients, etc. being consumed…
“FIGS. 9 through 12 also show an example of how this invention can be embodied in a device for monitoring food consumption comprising: (a) a wearable sensor that is configured to be worn on a person's body or clothing, wherein this wearable sensor automatically collects data that is used to detect probable eating events without requiring action by the person in association with a probable eating event apart from the act of eating, and wherein a probable eating event is a period of time during which the person is probably eating; (b) an imaging member, wherein this imaging member is used by the person to take pictures of food that the person eats, wherein using this imaging member to take pictures of food requires voluntary action by the person apart from the act of eating, and wherein the person is prompted to take pictures of food using this imaging member when data collected by the wearable sensor indicates a probable eating event; and (c) a data analysis component, wherein this component analyzes pictures of food taken by the imaging member to estimate the types and amounts of foods, ingredients, nutrients, and/or calories that are consumed by the person. In this example, the wearable sensor is motion sensor 203. In this example, the imaging member is camera 902 which is part of phone 901. In this example, the data analysis component is data processing unit 204.” [0384]

See First and Second Camera below.


creating, for a target dish or a target ingredient among the dish or the ingredient inferred by the inferring, a question asking what the target dish or the target ingredient is; 

Device can ask (creation) a question of food consumed (target dish) with estimates (infer) of types of food…
“In an example, a device can ask a person clarifying questions concerning food consumed. In an example, a device can prompt the person with queries to refine initial automatically-generated estimates of the types and quantities of food consumed. In an example, these questions can be asked in real time, as a person is eating, or in a delayed manner, after a person has finished eating or at a particular time of the day. In an example, the results of preliminary automated food identification can be presented to a human via a graphical user interface and the human can then refine the results using a touch screen. In an example, the results of automated food identification can be presented to a human via verbal message and the human can refine the results using a speech recognition interface. In an example, data can be transmitted (such as by the internet) to a review center where food is identified by a dietician or other specialist. In various examples, a human-to-computer interface for entering information concerning food consumption can comprise one or more interface elements selected the group consisting of: microphone, speech recognition, and/or voice recognition interface; touch screen, touch pad, keypad, keyboard, buttons, or other touch-based interface; camera, motion recognition, gesture recognition, eye motion tracking, or other motion detection interface; interactive food-identification menu with food pictures and names; and interactive food-identification search box.” [0276]

detecting an action of the user related to the target dish or the target ingredient on a basis of another captured image taken by the second camera;

	Detection of person eating (action)…
“In an example, a device can ask a person clarifying questions concerning food consumed. In an example, a device can prompt the person with queries to refine initial automatically-generated estimates of the types and quantities of food consumed. In an example, these questions can be asked in real time, as a person is eating, or in a delayed manner, after a person has finished eating or at a particular time of the day. In an example, the results of preliminary automated food identification can be presented to a human via a graphical user interface and the human can then refine the results using a touch screen. In an example, the results of automated food identification can be presented to a human via verbal message and the human can refine the results using a speech recognition interface. In an example, data can be transmitted (such as by the internet) to a review center where food is identified by a dietician or other specialist. In various examples, a human-to-computer interface for entering information concerning food consumption can comprise one or more interface elements selected the group consisting of: microphone, speech recognition, and/or voice recognition interface; touch screen, touch pad, keypad, keyboard, buttons, or other touch-based interface; camera, motion recognition, gesture recognition, eye motion tracking, or other motion detection interface; interactive food-identification menu with food pictures and names; and interactive food-identification search box.” [0276]

See First and Second Camera below.

determining a question timing at which the question created by the creation is output from a period in which an action of the user related to the target dish or the target ingredient is detected by the detection;

	Question asked in real time (question timing) or a delayed manner…
“In an example, a device can ask a person clarifying questions concerning food consumed. In an example, a device can prompt the person with queries to refine initial automatically-generated estimates of the types and quantities of food consumed. In an example, these questions can be asked in real time, as a person is eating, or in a delayed manner, after a person has finished eating or at a particular time of the day. In an example, the results of preliminary automated food identification can be presented to a human via a graphical user interface and the human can then refine the results using a touch screen. In an example, the results of automated food identification can be presented to a human via verbal message and the human can refine the results using a speech recognition interface. In an example, data can be transmitted (such as by the internet) to a review center where food is identified by a dietician or other specialist. In various examples, a human-to-computer interface for entering information concerning food consumption can comprise one or more interface elements selected the group consisting of: microphone, speech recognition, and/or voice recognition interface; touch screen, touch pad, keypad, keyboard, buttons, or other touch-based interface; camera, motion recognition, gesture recognition, eye motion tracking, or other motion detection interface; interactive food-identification menu with food pictures and names; and interactive food-identification search box.” [0276]

controlling a speaker to output question voice output information for outputting the question created by the creation of the question by voice at the question timing determined by the determination;

Voice recognition interface…
“In an example, a device can ask a person clarifying questions concerning food consumed. In an example, a device can prompt the person with queries to refine initial automatically-generated estimates of the types and quantities of food consumed. In an example, these questions can be asked in real time, as a person is eating, or in a delayed manner, after a person has finished eating or at a particular time of the day. In an example, the results of preliminary automated food identification can be presented to a human via a graphical user interface and the human can then refine the results using a touch screen. In an example, the results of automated food identification can be presented to a human via verbal message and the human can refine the results using a speech recognition interface. In an example, data can be transmitted (such as by the internet) to a review center where food is identified by a dietician or other specialist. In various examples, a human-to-computer interface for entering information concerning food consumption can comprise one or more interface elements selected the group consisting of: microphone, speech recognition, and/or voice recognition interface; touch screen, touch pad, keypad, keyboard, buttons, or other touch-based interface; camera, motion recognition, gesture recognition, eye motion tracking, or other motion detection interface; interactive food-identification menu with food pictures and names; and interactive food-identification search box.” [0276]

	See Voice and Speaker below.

acquiring an uttered speech of the user by a microphone in response to the question output by the output by voice on a basis of the question voice output information;

Receive verbal input (perform acquisition of an uttered speech) from a person with sound sensor…
“In an example, a sound sensor can include speech recognition or voice recognition to receive verbal input from a person concerning food that the person consumes. In an example, a sound sensor can include speech recognition or voice recognition to extract food selecting, ordering, purchasing, or consumption information from other sounds in the environment.” [0167]

Sensor as a microphone…
“In an example, a food-consumption monitor or food-identifying sensor can be a microphone or other type of sound sensor. In an example, a sensor to detect food consumption and/or identify consumption of a selected type of food, ingredient, or nutrient can be a sound sensor. In an example, a sound sensor can be an air conduction microphone or bone conduction microphone. In an example, a microphone or other sound sensor can monitor for sounds associated with chewing or swallowing food. In an example, data collected by a sound sensor can be analyzed to differentiate sounds from chewing or swallowing food from other types of sounds such as speaking, singing, coughing, and sneezing.” [0166]

performing speech recognition on the uttered speech acquired by the acquisition; and

Speech recognition…
“In an example, a sound sensor can include speech recognition or voice recognition to receive verbal input from a person concerning food that the person consumes. In an example, a sound sensor can include speech recognition or voice recognition to extract food selecting, ordering, purchasing, or consumption information from other sounds in the environment.” [0167]

performing reflection of, in the dietary intake information, information regarding the dish or the ingredient which has been obtained from the user performing answer of the question and which has been specified on a basis of a result of the speech recognition.

Better management or modification (reflection) information of food consumption (dish)…
“In various examples, a device for measuring a person's consumption of at least one selected type of food, ingredient, or nutrient can provide feedback to the person that is selected from the group consisting of: advice concerning consumption of specific foods or suggested food alternatives (such as advice from a dietician, nutritionist, nurse, physician, health coach, other health care professional, virtual agent, or health plan); electronic verbal or written feedback (such as phone calls, electronic verbal messages, or electronic text messages); live communication from a health care professional; questions to the person that are directed toward better measurement or modification of food consumption; real-time advice concerning whether to eat specific foods and suggestions for alternatives if foods are not healthy; social feedback (such as encouragement or admonitions from friends and/or a social network); suggestions for meal planning and food consumption for an upcoming day; and suggestions for physical activity and caloric expenditure to achieve desired energy balance outcomes.” [0230]

Move First and Second Camera
Connor teaches cameras.  Connor does not specifically teach moving two cameras.

Connor2, also in the business of cameras, teaches:
Fig. 32, ref. 3210 teaches [first] camera taking an image of a dish or ingredient…


[Image: media_image1.png, greyscale]


“In this example, the person is wearing an automatic-imaging member comprised of a wrist band 3208 to which are attached two cameras, 3209 and 3210, on the opposite (narrow) sides of the person's wrist. Camera 3209 takes pictures within field of vision 3211. Camera 3210 takes pictures within field of vision 3212. Each field of vision, 3211 and 3212, is represented in these figures by a dotted-line conical shape. The narrow tip of the dotted-line cone is at the camera's aperture and the circular base of the cone represents the camera's field of vision at a finite focal distance from the camera's aperture.” [0303]

Camera (first camera) downward-facing on food source (dish or ingredient)…
“Field of vision 3211 from camera 3209 is represented in FIG. 32 by a generally upward-facing cone-shaped configuration of dotted lines that generally encompasses the person's mouth and face as the person eats. Field of vision 3212 from camera 3210 is represented in FIG. 32 by a generally downward-facing cone-shaped configuration of dotted lines that generally encompasses the reachable food source as the person eats.” [0305]

Fig. 32, ref. 3209 teaches [second] camera taking an image of a user…


[Image: media_image2.png, greyscale]


“In this example, the person is wearing an automatic-imaging member comprised of a wrist band 3208 to which are attached two cameras, 3209 and 3210, on the opposite (narrow) sides of the person's wrist. Camera 3209 takes pictures within field of vision 3211. Camera 3210 takes pictures within field of vision 3212. Each field of vision, 3211 and 3212, is represented in these figures by a dotted-line conical shape. The narrow tip of the dotted-line cone is at the camera's aperture and the circular base of the cone represents the camera's field of vision at a finite focal distance from the camera's aperture.” [0303]

Camera (second camera) upward-facing on face of user as the person eats (detecting an action of the person)…
“Field of vision 3211 from camera 3209 is represented in FIG. 32 by a generally upward-facing cone-shaped configuration of dotted lines that generally encompasses the person's mouth and face as the person eats. Field of vision 3212 from camera 3210 is represented in FIG. 32 by a generally downward-facing cone-shaped configuration of dotted lines that generally encompasses the reachable food source as the person eats.” [0305]

Automatically move cameras (therefore using a motor of some type)…
“In an example, the location of one or more cameras may be moved automatically, independently of movement of the body member to which the cameras are attached, in order to increase the probability of encompassing both the person's mouth and a reachable food source. In an example, the lenses of one or more cameras may be automatically and independently moved in order to increase the probability of encompassing both the person's mouth and a reachable food source. In various examples, a lens may be automatically shifted or rotated to change the direction or focal length of the camera's field of vision. In an example, the lenses of one or more cameras may be automatically moved to track the person's mouth and hand. In an example, the lenses of one or more cameras may be automatically moved to scan for reachable food sources.” [0386] Inherent in automatically moving a camera is some type of motor device.

“In an example, video camera 2107 can have a fixed focal direction and focal length. In an example, the focal direction of the video camera may always point toward the person's fingers and the space surrounding the person's fingers. In another example, the video camera can have a focal direction or focal length that is automatically adjusted while the camera is in operation. In an example, when it is in operation, the video camera can scan back and forth through the space near the person's hand and fingers to search for food. In an example, the video camera can use pattern recognition to track the relative location of the person's fingers. In an example, the camera can automatically adjust its focal direction and/or focal length to monitor and identify eating-related objects (such as a fork or glass) that come into contact with the person's fingers.” [0244]  Inherent with scan and track is a motor.

Video camera can scan in a spiral, radial or back and forth pattern (therefore move location)…
“In an example, video camera 2107 can scan in a spiral, radial, or back-and-forth pattern in order to monitor activity near both the person's fingers and the person's mouth. This is more complex than just tracking the person's fingers. This requires that the device keep track of where the person's fingers and mouth are, in three-dimensional space, relative to the camera as the person moves their arm, hand, and head. In an example, face recognition software can help the device to track the person's mouth and gesture recognition software can help the device to track the person's fingers.” [0245] Inherent with scan in spiral, radial and back-and-forth pattern is a motor to cause the scan movements.

“In the example shown in FIG. 21, there is only one miniature video camera in this device and it is located on the outer portion of the person's wrist where the main body of a wrist watch would generally be located. In another example, such a device may have one video camera located on the opposite side of the person's wrist. In other examples, there may be two or more video cameras mounted on different locations around the person's wrist. In an example with two or more video cameras, different cameras may track different objects. For example, one camera may track the person's fingers and the other camera may track the person's mouth. In various examples, different cameras may operate at different times and/or with different focal lengths.” [0246]

Non-wearable imaging device…
“In an example of this invention, multiple imaging members may be worn on the same body member. In another example, multiple imaging members may be worn on different body members. In an example, an imaging member may be worn on each of a person's wrists or each of a person's hands. In an example, one or more imaging members may be worn on a body member and a supplemental imaging member may be located in a non-wearable device that is in proximity to the person. In an example, wearable and non-wearable imaging members may be in wireless communication with each other. In an example, wearable and non-wearable imaging members may be in wireless communication with an image-analyzing member.” [0348]

It would have been obvious to one of ordinary skill in the art before the effective filing date to include in the method and system of Connor the ability to move two cameras as taught by Connor2 since the claimed invention is merely a combination of old elements and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.  Further motivation is provided by Connor2, which teaches the advantages of moving two cameras for monitoring food consumption.

Voice and Speaker
The combined references teach speech/voice and food.  They also teach asking questions.  They do not specifically teach asking a question by voice through a speaker.

Connor2, also in the business of speech/voice and food, teaches:
Question using voice-based inquiry…
“In FIG. 23, the motion sensor has detected this possible eating event (i.e. the glass being tilted up to the mouth and then back down) and this event has triggered a voice-based inquiry from the device to the person via microphone and speaker unit 2106. In an example, the device, upon detection of a probable eating event, can ask the person a question such as—“If you are eating something, please identify it.” The sound waves of this voice-based inquiry from the device are represented in FIG. 23 by concentric dotted lines 2301 expanding outward from the device. In this example, the device solicits voluntary data concerning food consumption from the person through a voice-based message. In other examples, a device may solicit voluntary data via other means such as a display screen, buzzing or ring tone, vibration, or text message.” [0248]

Example of speaker…
“In FIG. 23, the motion sensor has detected this possible eating event (i.e. the glass being tilted up to the mouth and then back down) and this event has triggered a voice-based inquiry from the device to the person via microphone and speaker unit 2106. In an example, the device, upon detection of a probable eating event, can ask the person a question such as—“If you are eating something, please identify it.” The sound waves of this voice-based inquiry from the device are represented in FIG. 23 by concentric dotted lines 2301 expanding outward from the device. In this example, the device solicits voluntary data concerning food consumption from the person through a voice-based message. In other examples, a device may solicit voluntary data via other means such as a display screen, buzzing or ring tone, vibration, or text message.” [0248]

It would have been obvious to one of ordinary skill in the art before the effective filing date to include in the method and system of the combined references the ability to use voice and a speaker for questions as taught by Connor2 since the claimed invention is merely a combination of old elements and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.  Further motivation is provided by the combined references, which teach using voice capabilities; it would be obvious to use these capabilities for questions as taught by Connor2.

Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over the combined references in section (8) above in further view of Pub. No. US 2019/0295440 to Hadad.
Regarding claim 12
The dietary intake information acquisition device according to claim 1, wherein the processing circuitry gives, when there is a plurality of the target dishes or the target ingredients, priority orders to the plurality of the target dishes or the target ingredients, and

Connor teaches:
Sugar is given priority and main ingredient…
“Many people consume highly-processed foods whose primary ingredients include multiple types of sugar. The total amount of sugar is often obscured or hidden, even from those who read ingredients on labels. Sometimes sugar is disguised as "evaporated cane syrup." Sometimes different types of sugar are labeled as different ingredients (such as "plain sugar," "brown sugar," "maltose", "dextrose," and "evaporated cane syrup") in a single food item. In such cases, "sugar" does not appear as the main ingredient. However, when one adds up all the different types of sugar in different priority places on the ingredient list, then sugar really is the main ingredient. These highly-processed conglomerations of sugar (often including corn syrup, fats, and/or caffeine) often have colorful labels with cheery terms like "100% natural" or "high-energy." However, they are unhealthy when eaten in the quantities to which many Americans have become accustomed. It is no wonder that there is an obesity epidemic. The device and method disclosed herein is not be fooled by deceptive labeling of ingredients.” [0107]

	See Plurality and Priority below.

the processing circuitry determines the question timing in accordance with the priority orders.
{
From Applicant’s specification on priority…
“Furthermore, in the first embodiment described above, the question creation unit 103 may give a higher priority order to a target dish or a target ingredient that is obtained as an answer to a question and that can also be an answer to a question to another target dish or target ingredient.” [0183]

Therefore, a higher priority is given to a dish that has an answer to a question.
}

Question asked in real time (question timing) or a delayed manner…
“In an example, a device can ask a person clarifying questions concerning food consumed. In an example, a device can prompt the person with queries to refine initial automatically-generated estimates of the types and quantities of food consumed. In an example, these questions can be asked in real time, as a person is eating, or in a delayed manner, after a person has finished eating or at a particular time of the day. In an example, the results of preliminary automated food identification can be presented to a human via a graphical user interface and the human can then refine the results using a touch screen. In an example, the results of automated food identification can be presented to a human via verbal message and the human can refine the results using a speech recognition interface. In an example, data can be transmitted (such as by the internet) to a review center where food is identified by a dietician or other specialist. In various examples, a human-to-computer interface for entering information concerning food consumption can comprise one or more interface elements selected the group consisting of: microphone, speech recognition, and/or voice recognition interface; touch screen, touch pad, keypad, keyboard, buttons, or other touch-based interface; camera, motion recognition, gesture recognition, eye motion tracking, or other motion detection interface; interactive food-identification menu with food pictures and names; and interactive food-identification search box.” [0276]

Plurality and Priority
The combined references teach asking questions.  They also teach question timing.  They do not teach priority orders for a plurality of dishes or ingredients.

Hadad, also in the business of questions, teaches:
Identify foods (plurality) definitely (priority) in the image…
“The food image recognition engine described herein can address the above shortcomings of existing systems. The food image recognition engine disclosed herein is capable of one or more of the following: (1) Identify which foods are definitely (i.e. 100% probability) in the image; (2) Identify foods which may be in the image (probabilities), either by studying the image, the context in which it was taken, the user's history, or the likelihoods of certain foods appearing together; (3) Distinguish between dishes (e.g. Pad Thai) and ingredients (e.g. peanuts) they may contain; and (4) Leverage historical eating patterns of a user, and the context in which the user is eating to estimate volume and thus nutritional values. In particular, this can include using known menus when the user is eating out.” [0187]

Sort (prioritize) based on probability…
“To achieve the above goals, the food image recognition engine can include an algorithm comprising the following. First, a complete ontology of visual cues for foods (VICUF) is constructed. This is an ontology of any visual cue that human beings (or computers) may use to identify the food that is before them, and may include: (1) Combo food items (e.g. Greek salad, or a burrito); (2) Ingredients (e.g. banana, apple, shrimps); and (3) Other cues (e.g. cup, liquid, fried, etc.). Knowing the entire ontology of VICUF can be used in conjunction with other inputs to obtain a more accurate identification of restaurant dishes. Next, a robust corpus for each label in VICUF is created. Ideally each label in the training set is to be annotated. Next, a convolutional neural network (CNN) is trained on each label in the VICUF (binary classifier), or a CNN capable of multi-labeling is trained. The latter CNN may provide better results since certain foods often appear together, while other foods do not. Next, each time a new image is supplied, the CNN will map it to a probability vector, where each component represents the probability that a specific food-related visual cue is in the image. The above steps may be sufficient to create a food logging experience. Given an image, the food image recognition engine can sort items in the food logger depending on their probability value in the output vector. A threshold can be included such that items with low probability do not appear.” [0188]

Ask questions based on prediction (priority) …
“The insights and recommendation engine 230 can predict a user's eating patterns or habits. As much as 78% of meals can repeat themselves for an individual's diet. The insights and recommendation engine 230 may find re-occurring patterns in the diet based on (1) the user's historical meal or beverage consumption data, (2) relations between different foods derived from the food ontology, and (3) location and/or time of day. For example, as illustrated in FIGS. 32A-32B, the user may have a habit of eating bananas, red tomatoes, whole wheat toast, and white rice on a first day, and a broccoli rice bowl on a second day. Next time the user consumes a sub-combination or an entirety of banans, red tomatoes, whole wheat toast, and white rice in a, the insights and recommendation engine 230 may predict that the next meal would be the broccoli rice bowl. The insights and recommendation engine 230 can use the GUI-based software interface to ask the user to confirm or correct the prediction prior to logging the meal. Based on the user's response, the insights and recommendation engine 230 can confirm or improve its eating pattern prediction algorithm. In another example, a user can input “omelette” in the GUI-based software interface, and the insights and recommendation engine 230 may predict that the next most likely foods to be logged can be “bread” and “coffee” and auto-complete the user's meal as “Omelette with sliced bread and a cup of coffee.” Such auto-completion capability can allow user clicks and/or inputs to be reduced by at least 30, 40, 50, 60, 70%, or more.” [0297]

It would have been obvious to one of ordinary skill in the art before the effective filing date to include in the method and system of the combined references the ability to have a plurality of dishes and prioritize them as taught by Hadad since the claimed invention is merely a combination of old elements and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.  Further motivation is provided by Hadad, who teaches the benefits of probabilities of dishes and the likelihood of foods appearing together.

Regarding claim 13
The dietary intake information acquisition device according to claim 12, wherein the processing circuitry gives a higher priority order to the target dish or the target ingredient that is obtained as an answer to the question and that is also possibly an answer to a question for another target dish included in the target dish or another target ingredient included in the target ingredient.

Plurality and Priority
The combined references teach priority.  They also teach question timing.  They do not teach another dish or ingredient.

Hadad, also in the business of questions, teaches:

Confirm correct prediction (priority) and example of bread and coffee as another ingredient included with the omelet (target ingredient)…
“The insights and recommendation engine 230 can predict a user's eating patterns or habits. As much as 78% of meals can repeat themselves for an individual's diet. The insights and recommendation engine 230 may find re-occurring patterns in the diet based on (1) the user's historical meal or beverage consumption data, (2) relations between different foods derived from the food ontology, and (3) location and/or time of day. For example, as illustrated in FIGS. 32A-32B, the user may have a habit of eating bananas, red tomatoes, whole wheat toast, and white rice on a first day, and a broccoli rice bowl on a second day. Next time the user consumes a sub-combination or an entirety of banans, red tomatoes, whole wheat toast, and white rice in a, the insights and recommendation engine 230 may predict that the next meal would be the broccoli rice bowl. The insights and recommendation engine 230 can use the GUI-based software interface to ask the user to confirm or correct the prediction prior to logging the meal. Based on the user's response, the insights and recommendation engine 230 can confirm or improve its eating pattern prediction algorithm. In another example, a user can input “omelette” in the GUI-based software interface, and the insights and recommendation engine 230 may predict that the next most likely foods to be logged can be “bread” and “coffee” and auto-complete the user's meal as “Omelette with sliced bread and a cup of coffee.” Such auto-completion capability can allow user clicks and/or inputs to be reduced by at least 30, 40, 50, 60, 70%, or more.” [0297]

It would have been obvious to one of ordinary skill in the art before the effective filing date to include in the method and system of the combined references the ability to have another dish or ingredient as taught by Hadad since the claimed invention is merely a combination of old elements and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.  Further motivation is provided by Hadad, who teaches the benefits of knowing related dishes for analysis purposes.

Claims 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over the combined references in section (8) above in further view of Pub. No. US 2019/0291277 to Oleynik.
Regarding claim 15
The dietary intake information acquisition device according to claim 1, further comprising: 
a motor that propels the dietary intake information acquisition device.

The combined references teach cameras.  They do not specifically teach moving two cameras.

Connor2, also in the business of cameras, teaches:
Automatically move cameras (therefore using a motor of some type)…
“In an example, the location of one or more cameras may be moved automatically, independently of movement of the body member to which the cameras are attached, in order to increase the probability of encompassing both the person's mouth and a reachable food source. In an example, the lenses of one or more cameras may be automatically and independently moved in order to increase the probability of encompassing both the person's mouth and a reachable food source. In various examples, a lens may be automatically shifted or rotated to change the direction or focal length of the camera's field of vision. In an example, the lenses of one or more cameras may be automatically moved to track the person's mouth and hand. In an example, the lenses of one or more cameras may be automatically moved to scan for reachable food sources.” [0386] Inherent with automatically move a camera is some type of motor device.

“In an example, video camera 2107 can have a fixed focal direction and focal length. In an example, the focal direction of the video camera may always point toward the person's fingers and the space surrounding the person's fingers. In another example, the video camera can have a focal direction or focal length that is automatically adjusted while the camera is in operation. In an example, when it is in operation, the video camera can scan back and forth through the space near the person's hand and fingers to search for food. In an example, the video camera can use pattern recognition to track the relative location of the person's fingers. In an example, the camera can automatically adjust its focal direction and/or focal length to monitor and identify eating-related objects (such as a fork or glass) that come into contact with the person's fingers.” [0244]  Inherent with scan and track is a motor.

Video camera can scan in a spiral, radial or back and forth pattern (therefore move location)…
“In an example, video camera 2107 can scan in a spiral, radial, or back-and-forth pattern in order to monitor activity near both the person's fingers and the person's mouth. This is more complex than just tracking the person's fingers. This requires that the device keep track of where the person's fingers and mouth are, in three-dimensional space, relative to the camera as the person moves their arm, hand, and head. In an example, face recognition software can help the device to track the person's mouth and gesture recognition software can help the device to track the person's fingers.” [0245] Inherent with scan in spiral, radial and back-and-forth pattern is a motor to cause the scan movements.

Non-wearable imaging device…
“In an example of this invention, multiple imaging members may be worn on the same body member. In another example, multiple imaging members may be worn on different body members. In an example, an imaging member may be worn on each of a person's wrists or each of a person's hands. In an example, one or more imaging members may be worn on a body member and a supplemental imaging member may be located in a non-wearable device that is in proximity to the person. In an example, wearable and non-wearable imaging members may be in wireless communication with each other. In an example, wearable and non-wearable imaging members may be in wireless communication with an image-analyzing member.” [0348]

It would have been obvious to one of ordinary skill in the art before the effective filing date to include in the method and system of the combined references the ability to move two cameras as taught by Connor2 since the claimed invention is merely a combination of old elements and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.  Further motivation is provided by Connor2, who teaches the advantages of moving two cameras for monitoring food consumption.

The combined references teach an imaging member located in a non-wearable device.  They do not teach a motor that propels the device.

Oleynik, also in the business of non-wearable devices, teaches:

Robotic arms and meal….
“Embodiments of the present disclosure are directed to methods, computer program products, and computer systems of a robotic apparatus with robotic instructions replicating a food dish with substantially the same result as if a chef had prepared the food dish. In a first embodiment, the robotic assistant system in a standardized robotic kitchen comprises two robotic arms and hands that replicate the precise movements of the chef in same sequence (or substantially the same sequence). The two robotic arms and hands replicate the movements in the same timing (or substantially the same timing) to prepare the food dish based on a previously recorded document (a recipe-script) of the chef's precise movements in preparing the same food dish. In a second embodiment, a computer-controlled cooking apparatus prepares a food dish based on a sensory-curve, such as temperature over time, which was previously recorded in a software file where the chef prepared the same food dish with the cooking apparatus with sensors for which a computer recorded the sensor values over time when the chef previously prepared the food dish on the cooking apparatus fitted with the sensors. In a third embodiment, the kitchen apparatus comprises the robotic arms in the first embodiment and the cooking apparatus with sensors in the second embodiment to prepare a dish that combines both the robotic arms and one or more sensory curves, where the robotic arms are capable of quality-checking a food dish during the cooking process, for such characteristics as taste, smell, and appearance, allowing for any cooking adjustments to the preparation steps of the food dish. In a fourth embodiment, the kitchen apparatus comprises a food storage system with computer-controlled containers and container identifiers for storing and supplying ingredients for a user to prepare the food dish by following the chef's cooking instructions. 
In a fifth embodiment, a robotic kitchen comprises a robotic assistant system with arms and a kitchen apparatus in which the robotic assistant system moves around the kitchen apparatus to prepare a food dish by emulating a chef's precise cooking movements, including possible real-time modifications/adaptations to the preparation process defined in the recipe-script.” [0033]

Robot with motor…
“The high-level controller software engine 3057 builds the application-specific task-based robotic instruction-sets, which are in turn fed to a command sequencer software engine that creates machine-understandable command and control sequences for the command executor GG8. The software engine 3052 decomposes the command sequence into motion and action goals and develops execution-plans (both in time and based on performance levels), thereby enabling the generation of time-sequenced motion (positions & velocities) and interaction (forces and torques) profiles, which are then fed to the low-level controller 3059 for execution on the humanoid robot platform by the affected individual actuator controllers 3060, which in turn comprise at least their own respective motor controller and power hardware and software and feedback sensors.” [0652]


Robotic assistant (acquisition device) with wheels…
“As described above, the robotic assistant 5002r can be a standalone and independently movable structure (e.g., a body on wheels) or a structure that is movably attached to the environment or workspace (e.g., robotic parts attached to a multi-rail and actuator system). In either structural scenario, the robotic assistant 5002r can navigate to the desired or target environment. In some embodiments, the robotic assistant 5002 includes a navigation module that can be used to navigate to the desired position in the environment 5002 and/or workspace 5002w.” [0936]

It would have been obvious to one of ordinary skill in the art before the effective filing date to include in the method and system of the combined references the ability to have a motor to propel a device as taught by Oleynik since the claimed invention is merely a combination of old elements and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.  Further motivation is provided by Oleynik who teaches the advantages of using a robot for meals.    

Regarding claim 16
The dietary intake information acquisition device according to claim 1, further comprising: 
a motor that drives wheels to propel the dietary intake information acquisition device.


The combined references teach cameras.  They do not specifically teach moving two cameras.

Connor2, also in the business of cameras, teaches:
Automatically move cameras (therefore using a motor of some type)…
“In an example, the location of one or more cameras may be moved automatically, independently of movement of the body member to which the cameras are attached, in order to increase the probability of encompassing both the person's mouth and a reachable food source. In an example, the lenses of one or more cameras may be automatically and independently moved in order to increase the probability of encompassing both the person's mouth and a reachable food source. In various examples, a lens may be automatically shifted or rotated to change the direction or focal length of the camera's field of vision. In an example, the lenses of one or more cameras may be automatically moved to track the person's mouth and hand. In an example, the lenses of one or more cameras may be automatically moved to scan for reachable food sources.” [0386] Inherent with automatically move a camera is some type of motor device.

“In an example, video camera 2107 can have a fixed focal direction and focal length. In an example, the focal direction of the video camera may always point toward the person's fingers and the space surrounding the person's fingers. In another example, the video camera can have a focal direction or focal length that is automatically adjusted while the camera is in operation. In an example, when it is in operation, the video camera can scan back and forth through the space near the person's hand and fingers to search for food. In an example, the video camera can use pattern recognition to track the relative location of the person's fingers. In an example, the camera can automatically adjust its focal direction and/or focal length to monitor and identify eating-related objects (such as a fork or glass) that come into contact with the person's fingers.” [0244]  Inherent with scan and track is a motor.

Video camera can scan in a spiral, radial or back and forth pattern (therefore move location)…
“In an example, video camera 2107 can scan in a spiral, radial, or back-and-forth pattern in order to monitor activity near both the person's fingers and the person's mouth. This is more complex than just tracking the person's fingers. This requires that the device keep track of where the person's fingers and mouth are, in three-dimensional space, relative to the camera as the person moves their arm, hand, and head. In an example, face recognition software can help the device to track the person's mouth and gesture recognition software can help the device to track the person's fingers.” [0245] Inherent with scan in spiral, radial and back-and-forth pattern is a motor to cause the scan movements.

Non-wearable imaging device…
“In an example of this invention, multiple imaging members may be worn on the same body member. In another example, multiple imaging members may be worn on different body members. In an example, an imaging member may be worn on each of a person's wrists or each of a person's hands. In an example, one or more imaging members may be worn on a body member and a supplemental imaging member may be located in a non-wearable device that is in proximity to the person. In an example, wearable and non-wearable imaging members may be in wireless communication with each other. In an example, wearable and non-wearable imaging members may be in wireless communication with an image-analyzing member.” [0348]

It would have been obvious to one of ordinary skill in the art before the effective filing date to include in the method and system of the combined references the ability to move two cameras as taught by Connor2 since the claimed invention is merely a combination of old elements and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.  Further motivation is provided by Connor2, who teaches the advantages of moving two cameras for monitoring food consumption.

The combined references teach an imaging member located in a non-wearable device.  They do not teach a motor that propels the device.

Oleynik, also in the business of non-wearable devices, teaches:

Robotic arms and meal….
“Embodiments of the present disclosure are directed to methods, computer program products, and computer systems of a robotic apparatus with robotic instructions replicating a food dish with substantially the same result as if a chef had prepared the food dish. In a first embodiment, the robotic assistant system in a standardized robotic kitchen comprises two robotic arms and hands that replicate the precise movements of the chef in same sequence (or substantially the same sequence). The two robotic arms and hands replicate the movements in the same timing (or substantially the same timing) to prepare the food dish based on a previously recorded document (a recipe-script) of the chef's precise movements in preparing the same food dish. In a second embodiment, a computer-controlled cooking apparatus prepares a food dish based on a sensory-curve, such as temperature over time, which was previously recorded in a software file where the chef prepared the same food dish with the cooking apparatus with sensors for which a computer recorded the sensor values over time when the chef previously prepared the food dish on the cooking apparatus fitted with the sensors. In a third embodiment, the kitchen apparatus comprises the robotic arms in the first embodiment and the cooking apparatus with sensors in the second embodiment to prepare a dish that combines both the robotic arms and one or more sensory curves, where the robotic arms are capable of quality-checking a food dish during the cooking process, for such characteristics as taste, smell, and appearance, allowing for any cooking adjustments to the preparation steps of the food dish. In a fourth embodiment, the kitchen apparatus comprises a food storage system with computer-controlled containers and container identifiers for storing and supplying ingredients for a user to prepare the food dish by following the chef's cooking instructions. 
In a fifth embodiment, a robotic kitchen comprises a robotic assistant system with arms and a kitchen apparatus in which the robotic assistant system moves around the kitchen apparatus to prepare a food dish by emulating a chef's precise cooking movements, including possible real-time modifications/adaptations to the preparation process defined in the recipe-script.” [0033]

Robot with motor…
“The high-level controller software engine 3057 builds the application-specific task-based robotic instruction-sets, which are in turn fed to a command sequencer software engine that creates machine-understandable command and control sequences for the command executor GG8. The software engine 3052 decomposes the command sequence into motion and action goals and develops execution-plans (both in time and based on performance levels), thereby enabling the generation of time-sequenced motion (positions & velocities) and interaction (forces and torques) profiles, which are then fed to the low-level controller 3059 for execution on the humanoid robot platform by the affected individual actuator controllers 3060, which in turn comprise at least their own respective motor controller and power hardware and software and feedback sensors.” [0652]


Robotic assistant (acquisition device) with wheels…
“As described above, the robotic assistant 5002r can be a standalone and independently movable structure (e.g., a body on wheels) or a structure that is movably attached to the environment or workspace (e.g., robotic parts attached to a multi-rail and actuator system). In either structural scenario, the robotic assistant 5002r can navigate to the desired or target environment. In some embodiments, the robotic assistant 5002 includes a navigation module that can be used to navigate to the desired position in the environment 5002 and/or workspace 5002w.” [0936]

It would have been obvious to one of ordinary skill in the art before the effective filing date to include in the method and system of the combined references the ability to have wheels with a motor to propel a device as taught by Oleynik since the claimed invention is merely a combination of old elements and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.  Further motivation is provided by Oleynik who teaches the advantages of using a robot for meals.    

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KENNETH BARTLEY whose telephone number is (571)272-5230. The examiner can normally be reached Mon-Fri: 7:30 - 4:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SHAHID MERCHANT can be reached at (571) 270-1360. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KENNETH BARTLEY/Primary Examiner, Art Unit 3684
Prosecution Timeline

Sep 19, 2024: Application Filed
Oct 16, 2025: Non-Final Rejection mailed (§103, §112)
Jan 12, 2026: Response Filed
Apr 01, 2026: Final Rejection mailed (§103, §112) (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/336,611, Patent 12633420: MANAGEMENT DEVICE, THERAPEUTIC INHALER, AND NON-TRANSITORY STORAGE MEDIUM STORING MANAGEMENT PROGRAM (2y 11m to grant; granted May 19, 2026)
17/945,300, Patent 12614631: SYSTEMS AND METHODS TO PROCESS A HEALTHCARE TRANSACTION IN A HETEROGENEOUS ENVIRONMENT COMPRISED OF MAINFRAME AND CLOUD BASED SYSTEMS IN REAL-TIME (3y 7m to grant; granted Apr 28, 2026)
17/996,673, Patent 12603168: SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR PROVIDING QUICK RESPONSE (QR) CODES FOR INJECTION SYSTEMS (3y 5m to grant; granted Apr 14, 2026)
17/729,257, Patent 12512195: SYSTEM FOR MONITORING HEALTH DATA ACQUISITION DEVICES (3y 8m to grant; granted Dec 30, 2025)
17/948,469, Patent 12475987: ROBOTICALLY-ASSISTED DRUG DELIVERY (3y 2m to grant; granted Nov 18, 2025)
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 36%
With Interview (+28.8%): 65%
Median Time to Grant: 3y 10m (~2y 2m remaining)
PTA Risk: Moderate
Based on 614 resolved cases by this examiner. Grant probability derived from career allowance rate.