Engagement when the toddler is watching the movie stimuli is defined by frames in which the toddler exhibits yaw poses with magnitudes less than 20°.

Figure 3.2: Example of a head turn using the automatic method. To differentiate a head turn from a face occlusion, we determine whether the child is performing a head-turning motion before and after the face is lost, or while it is exhibiting a yaw pose with large magnitude. The red bars represent the half-second windows used to determine whether the child is exhibiting a head-turning motion before and after the face is lost (by the camera) or while it is exhibiting a yaw pose with large magnitude.
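The per-frame engagement rule above can be sketched as a simple classifier over a yaw trace. This is an illustrative sketch, not the author's implementation: the 20° threshold comes from the text, while the function and variable names (and the use of NaN for lost-face frames) are assumptions.

```python
import numpy as np

def engaged_frames(yaw_deg, threshold=20.0):
    """Mark frames as 'engaged' when |yaw| < threshold degrees.

    NaN entries (face not detected) are never counted as engaged."""
    yaw = np.asarray(yaw_deg, dtype=float)
    return (np.abs(yaw) < threshold) & ~np.isnan(yaw)

# Example yaw trace in degrees, with one lost-face frame (NaN).
yaw = [5.0, -12.0, 25.0, np.nan, 18.0]
print(engaged_frames(yaw))  # [ True  True False False  True]
```

An engagement fraction per stimulus segment then follows directly from the mean of this boolean mask.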
Figure 3.3: Audio is analyzed to determine the exact time point at which the practitioner said the child's name during a name-call. The power spectral density (PSD) of the recorded audio signal (3.3(a)) contains audio from the movie stimuli (predominantly music) and instances of vocalizations. Root mean squared (RMS) values of the audio signal (3.3(b)) quantify the audio signal at each time point and are used to detect a name-call prompt. Knowing that the practitioner was asked to prompt a name-call 15 seconds into the stimuli, in this example we are able to focus on speech around that time point (green box) and detect the exact time point at which maximum speech occurred.

Head movement and turn detection

We estimate the child's head movement by tracking the distances and pixel-wise displacements of central facial landmarks. We record the frame-by-frame displacements of landmarks around the nose, namely the two outer eye landmarks and the lowest nose landmark shown in Figure 3.1. The magnitudes of these displacements depend heavily on the child's distance from the camera, so they must be normalized with respect to that distance. If depth information were available, this would be a trivial task; since it is not, we normalize the displacements with respect to the distance between the child's eyes, in keeping with our use of only available and ubiquitous hardware. At any given time point, the displacements of the nose landmark are normalized by the Euclidean distance between the eyes averaged over a ±1 second window.

Since the practitioner and caregiver are located behind the child, the child must turn his or her face from looking at the screen to looking behind in order to perform a head turn (in response to name calling or social referencing, for example).
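The normalization step above can be sketched as follows. The ±1 second averaging window and the choice of landmarks come from the text; the frame rate, array layout, and function name are illustrative assumptions.

```python
import numpy as np

def normalized_displacement(nose_xy, eye_dist, fps=30):
    """Normalize frame-by-frame nose-landmark displacement by the
    inter-eye distance averaged over a +/- 1 second window."""
    nose_xy = np.asarray(nose_xy, dtype=float)    # (T, 2) pixel positions
    eye_dist = np.asarray(eye_dist, dtype=float)  # (T,) inter-eye pixel distances
    # Pixel displacement of the nose landmark between consecutive frames.
    disp = np.linalg.norm(np.diff(nose_xy, axis=0), axis=1)  # (T-1,)
    # Windowed average of the eye distance (+/- fps frames around each point).
    half = fps
    scale = np.array([eye_dist[max(0, t - half):t + half + 1].mean()
                      for t in range(len(disp))])
    return disp / scale
```

With this scaling, a nose displacement of, say, 2 pixels per frame at an inter-eye distance of 50 pixels yields the same normalized movement as 4 pixels per frame at 100 pixels, making movement magnitudes comparable across seating distances.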
To detect head turns, and to distinguish a genuine head turn from a mere occlusion of the face, we tracked yaw pose changes and defined two rules: to initiate a head turn, the pose had to go from a frontal position to one extreme head pose position (left or right); to complete a head turn, the pose then had to come back from the same extreme position to a frontal position. More formally, to initiate a head turn the yaw pose had to change from a frontal position θ_yaw ∈ (−20°, +20°) to an extreme |θ_yaw| > 35° within a half-second window. Then, to complete a head turn, the yaw pose had to return from that extreme position to the frontal range.
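The two-rule detector described above can be sketched as a small state machine over the yaw trace. The 20° and 35° thresholds and the half-second initiation window come from the text; the frame rate, names, and structure are assumptions, and the same-side check on the return is omitted for brevity.

```python
import numpy as np

FRONTAL = 20.0   # |yaw| < 20 deg counts as a frontal pose
EXTREME = 35.0   # |yaw| > 35 deg counts as an extreme pose

def detect_head_turns(yaw_deg, fps=30):
    """Return (initiate, complete) frame-index pairs for completed head turns."""
    yaw = np.asarray(yaw_deg, dtype=float)
    half = max(1, fps // 2)          # frames in a half-second window
    turns, start = [], None
    for t in range(1, len(yaw)):
        if start is None:
            # Rule 1: frontal -> extreme within a half-second window.
            if (abs(yaw[t]) > EXTREME
                    and np.any(np.abs(yaw[max(0, t - half):t]) < FRONTAL)):
                start = t
        elif abs(yaw[t]) < FRONTAL:
            # Rule 2: the pose returns to frontal, completing the turn.
            turns.append((start, t))
            start = None
    return turns

# A turn at 4 fps: frontal, swing past 35 deg, return to frontal.
print(detect_head_turns([0, 10, 40, 45, 40, 10, 0], fps=4))  # [(2, 5)]
```

A trace that reaches an extreme pose but never returns to frontal (as when the face is simply occluded) produces no completed turn, which is the distinction the rules are designed to make.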
