Section 4.1 presents the results of simulations, while experimental results obtained with an eye-in-hand robotic system are presented in Section 4.2. Finally, in Section 5 the major contribution of the paper is summarized and future work is outlined.

2. Models and measurements

Assume that the visible object's surface is a smooth manifold of equation $\zeta(x, y) = z$ in terms of the camera frame $\langle c \rangle = \{\boldsymbol{i}_c, \boldsymbol{j}_c, \boldsymbol{k}_c\}$. The visible region of the surface is mapped onto a patch $\Omega$ of the image plane, in which visual analysis is carried out. Let ${}^{c}P_c = [x_c, y_c, z_c]^{\mathrm{T}}$ be the point of the visible surface whose projection onto the image plane is the centroid of $\Omega$. We consider a local approximation of $\zeta$ around ${}^{c}P_c$, resulting in the tangent plane equation

$$\zeta(x, y) \approx px + qy + c, \qquad (1)$$

where $p = \partial\zeta(x, y)/\partial x|_{x_c, y_c}$, $q = \partial\zeta(x, y)/\partial y|_{x_c, y_c}$, and $c = \zeta(x_c, y_c) - px_c - qy_c$. The plane coefficients determine, up to a degree of freedom, the relative pose between the camera and the visible surface. Let us also assume a weak perspective camera approximation [16], and fix an object-centered frame $\langle o \rangle = \{\boldsymbol{i}_o, \boldsymbol{j}_o, \boldsymbol{k}_o\}$ whose origin is the point ${}^{c}P_c$. The tangent plane equation is expressed in the object frame simply as ${}^{o}z = 0$. The following equation relates camera coordinates and object frame coordinates for a given projected point $\boldsymbol{p} = [p_x, p_y]^{\mathrm{T}}$:

$$[p_x, p_y]^{\mathrm{T}} = \rho\,[x_c, y_c]^{\mathrm{T}} + T\,[{}^{o}x, {}^{o}y]^{\mathrm{T}}, \qquad (2)$$

where $\rho = f/z_c$ and $T$ is the weak perspective projection matrix. It holds that

$$T = \rho \begin{bmatrix} c_\sigma c_\tau c_\varphi + s_\tau s_\varphi & c_\sigma c_\tau s_\varphi - s_\tau c_\varphi \\ c_\sigma s_\tau c_\varphi - c_\tau s_\varphi & c_\sigma s_\tau s_\varphi + c_\tau c_\varphi \end{bmatrix}, \qquad (3)$$

where the notation $c_\vartheta = \cos\vartheta$ and $s_\vartheta = \sin\vartheta$ is used, and the angles $\sigma \in [0, \pi/2]$ (slant), $\tau \in [-\pi, \pi]$ (tilt) and $\varphi \in [-\pi, \pi]$ (orientation) define a minimal representation of orientation commonly used in computer vision [6].

Eq. (3) provides two different solutions for $\tau$ (and $\varphi$), which differ by $\pi$. This results in a pose ambiguity which is typical of any perspective linearization: there are two distinct object poses sharing the same visual appearance [6]. In the case of weak perspective, such ambiguity can be written as $T(\rho, \sigma, \tau, \varphi) = T(\rho, \sigma, \tau + \pi, \varphi + \pi)$ (see also Fig. 2). Given two views $\{\boldsymbol{p}_1\}$ and $\{\boldsymbol{p}_2\}$ of the same object point, and the corresponding centroids $\boldsymbol{p}_{1c}$ and $\boldsymbol{p}_{2c}$ of $\Omega$, from Eq. (2) we obtain $\boldsymbol{p}_2 - \boldsymbol{p}_{2c} = A_{12}(\boldsymbol{p}_1 - \boldsymbol{p}_{1c})$, with

$$A_{12} = T_2 (T_1)^{-1}. \qquad (4)$$

It follows that changes in shape are modeled as 2D affine transformations, which form a subgroup of the 2D projective transformations (isomorphic to SL(3)).

The dynamic interaction between camera and object can be expressed, at a generic image point $\boldsymbol{p}$, in terms of the 2D motion field $\boldsymbol{v}(p_x, p_y) = \dot{\boldsymbol{p}}$ arising in the image plane due to both surface shape and the 3D relative velocity twist. We show below that, under our assumptions, the motion field has a first-order spatial structure around the centroid $\boldsymbol{p}_c$ of the image patch $\Omega$. At time $t$, a generic image point satisfies $\boldsymbol{p}(t) = \boldsymbol{p}_c(t) + A(t)(\boldsymbol{p}(0) - \boldsymbol{p}_c(0))$, which, differentiated with respect to time, yields:

$$\dot{\boldsymbol{p}}(t) = \dot{\boldsymbol{p}}_c(t) + \dot{A}(t)A(t)^{-1}(\boldsymbol{p}(t) - \boldsymbol{p}_c(t)). \qquad (5)$$

The tensor (of degree 2) $M_c = \dot{A}(t)A(t)^{-1}$ can be characterized in terms of differential invariants [13].
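As a concrete illustration of Eqs. (2)-(4), the following sketch (in Python with NumPy; the function name and test values are ours, purely illustrative) builds the weak perspective matrix $T$ of Eq. (3), checks the pose ambiguity $T(\rho, \sigma, \tau, \varphi) = T(\rho, \sigma, \tau + \pi, \varphi + \pi)$ numerically, and assembles the inter-view affine map $A_{12}$ of Eq. (4):

```python
import numpy as np

def weak_perspective_T(rho, sigma, tau, phi):
    """Weak perspective projection matrix of Eq. (3):
    T = rho * (2x2 block of the slant/tilt/orientation rotation)."""
    cs = np.cos(sigma)
    ct, st = np.cos(tau), np.sin(tau)
    cp, sp = np.cos(phi), np.sin(phi)
    return rho * np.array([
        [cs * ct * cp + st * sp, cs * ct * sp - st * cp],
        [cs * st * cp - ct * sp, cs * st * sp + ct * cp],
    ])

# Pose ambiguity: (tau, phi) and (tau + pi, phi + pi) give the same T,
# hence the same visual appearance of the planar patch.
T1 = weak_perspective_T(0.8, 0.4, 0.3, -1.0)
assert np.allclose(T1, weak_perspective_T(0.8, 0.4, 0.3 + np.pi, -1.0 + np.pi))

# Changes in shape between two views are 2D affine, Eq. (4).
T2 = weak_perspective_T(0.9, 0.5, -0.2, 0.7)
A12 = T2 @ np.linalg.inv(T1)
```

Note that $T_1$ is invertible whenever $\sigma < \pi/2$, since $\det T = \rho^2 c_\sigma$, so the map $A_{12}$ is well defined away from the degenerate edge-on view.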
3. Hybrid visual servoing

In this section, a hybrid state space representation of camera–object interaction is first derived, and then a robust control law is synthesized.

3.1. State representation

According to the first-order spatial structure of the motion field of Eq. (5), the dynamic evolution of any image patch enclosing the object has six degrees of freedom, namely the centroid velocity coordinates $\boldsymbol{v}_c$, accounting for rigid translations of the whole patch, and the entries of the $2 \times 2$ tensor $M_c$, related to changes in shape of the patch [13]. Let us choose as the state of the system the 6-vector

$$\boldsymbol{x} = [p_{cx}, p_{cy}, \tau - \varphi, p, q, c]^{\mathrm{T}}, \qquad (6)$$

which is a hybrid vector, since it includes both 2D image space information and 3D orientation and distance parameters. Notice that the choice of $\tau - \varphi$ is due to the fact that this quantity remains well defined in the fronto-parallel configuration, which is a singularity of the orientation representation based on the angles $\sigma$, $\tau$ and $\varphi$. We demonstrate below that the state space representation of camera–object interaction can be written as

$$\dot{\boldsymbol{x}} = B(\boldsymbol{x})\, {}^{c}V_{c/o}, \qquad (7)$$

where the notation ${}^{a}V_{b/c}$ stands for the relative twist screw of frame $\langle b \rangle$ with respect to frame $\langle c \rangle$, expressed in frame $\langle a \rangle$. The system described by Eq. (7) is a driftless, input-affine nonlinear system, where ${}^{c}V_{c/o} = {}^{c}V_{c/a} - {}^{c}V_{o/a}$ is the relative twist screw of camera and object, ${}^{c}V_{c/a} = [{}^{c}\boldsymbol{v}_{c/a}^{\mathrm{T}}, {}^{c}\boldsymbol{\omega}_{c/a}^{\mathrm{T}}]^{\mathrm{T}}$ is the control input, ${}^{c}V_{o/a} = [{}^{c}\boldsymbol{v}_{o/a}^{\mathrm{T}}, {}^{c}\boldsymbol{\omega}_{o/a}^{\mathrm{T}}]^{\mathrm{T}}$ is a disturbance input, and $\langle a \rangle$ is an arbitrary reference frame.

Assuming that the object is almost centered in the visual field, and sufficiently far from the camera plane, it approximately holds that $p_x q_x/f^2 \approx 0$, $p_x q_y/f^2 \approx 0$ and $p_y q_y/f^2 \approx 0$ for any two imaged object points $\boldsymbol{p}$ and $\boldsymbol{q}$. The centroid dynamics is then given by the following first-order motion field expression [1]:

$$\dot{\boldsymbol{p}}_c(t) = \begin{bmatrix} -f/z_c & 0 & p_{cx}/z_c & 0 & -f & p_{cy} \\ 0 & -f/z_c & p_{cy}/z_c & f & 0 & -p_{cx} \end{bmatrix} {}^{c}V_{c/o}, \qquad (8)$$

where $z_c(\boldsymbol{x}) = c/(1 - p(p_{cx}/f) - q(p_{cy}/f))$ is the centroid depth expressed as a function of the state variables.
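A minimal sketch of the centroid block of $B(\boldsymbol{x})$ (again Python with NumPy; the function name and numerical values are ours) assembles the $2 \times 6$ matrix of Eq. (8) and the depth $z_c(\boldsymbol{x})$ directly from the state vector of Eq. (6):

```python
import numpy as np

def centroid_interaction(f, x):
    """2x6 centroid block of B(x), Eq. (8), under the approximation that
    neglects quadratic image-coordinate terms (object centered and far).
    State x = [pcx, pcy, tau - phi, p, q, c] as in Eq. (6)."""
    pcx, pcy, _, p, q, c = x
    zc = c / (1.0 - p * pcx / f - q * pcy / f)  # centroid depth z_c(x)
    return np.array([
        [-f / zc, 0.0, pcx / zc, 0.0, -f, pcy],
        [0.0, -f / zc, pcy / zc, f, 0.0, -pcx],
    ])

# Centroid velocity induced by a relative twist screw cV_{c/o} = [v; omega]:
x = np.array([5.0, -3.0, 0.1, 0.02, -0.01, 400.0])
pc_dot = centroid_interaction(600.0, x) @ np.array([0, 0, 10.0, 0, 0.01, 0])
```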
The dynamics of the 3D parameters $p$, $q$ and $c$ can be determined as in [13] by combining Eq. (1) with rigid body kinematics. Moreover, from Eq. (2) we obtain:

$$s_{\tau - \varphi} = \frac{t_{21} - t_{12}}{\rho(c_\sigma + 1)}, \qquad c_{\tau - \varphi} = \frac{t_{11} + t_{22}}{\rho(c_\sigma + 1)}, \qquad (9)$$

where $t_{ij}$ is the generic element of the projection matrix $T$. The quantities $c_\sigma$ and $\rho$ have the following expressions as functions of the state variables:

$$c_\sigma(\boldsymbol{x}) = \frac{1}{\sqrt{1 + p^2 + q^2}}, \qquad \rho(\boldsymbol{x}) = \frac{f - p\,p_{cx} - q\,p_{cy}}{c}. \qquad (10)$$

Since $\tan(\tau - \varphi)$ is a function of $T$, we first need to derive the dynamics of the projection matrix. From Eqs. (5) and (4), it soon follows that $M_c = \dot{T}T^{-1}$. Now, by defining $\boldsymbol{y} = [p_{cx}, p_{cy}, t_{11}, t_{12}, t_{21}, t_{22}, p, q, c]^{\mathrm{T}}$, whose dynamics can be written as $\dot{\boldsymbol{y}} = B_y(\boldsymbol{x})\, {}^{c}V_{c/o}$, we obtain the dynamics of $x_3 = \tau - \varphi$ as

$$\dot{x}_3(\boldsymbol{x}) = \frac{\partial x_3}{\partial \boldsymbol{y}}\dot{\boldsymbol{y}} = (\nabla s_{x_3}\, c_{x_3} - \nabla c_{x_3}\, s_{x_3})\, B_y(\boldsymbol{x})\, {}^{c}V_{c/o}, \qquad (11)$$

where

$$\nabla s_{x_3} = \left[\frac{\partial s_{x_3}}{\partial p_{cx}}, \frac{\partial s_{x_3}}{\partial p_{cy}}, \frac{\partial s_{x_3}}{\partial t_{11}}, \frac{\partial s_{x_3}}{\partial t_{12}}, \frac{\partial s_{x_3}}{\partial t_{21}}, \frac{\partial s_{x_3}}{\partial t_{22}}, \frac{\partial s_{x_3}}{\partial p}, \frac{\partial s_{x_3}}{\partial q}, \frac{\partial s_{x_3}}{\partial c}\right]$$

and the analogous expression for $\nabla c_{x_3}$ are straightforward to compute.
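Eq. (9) also suggests a numerically convenient way of measuring $x_3 = \tau - \varphi$ from an estimate of $T$: both components share the positive factor $\rho(c_\sigma + 1)$, which cancels in a two-argument arctangent, so neither $\rho$ nor $c_\sigma$ needs to be computed. A sketch under the same assumptions and naming conventions as the previous fragments:

```python
import numpy as np

def x3_from_T(T):
    """Recover x3 = tau - phi from the projection matrix T via Eq. (9).
    The common positive factor rho * (c_sigma + 1) cancels in atan2."""
    s = T[1, 0] - T[0, 1]  # proportional to sin(tau - phi)
    c = T[0, 0] + T[1, 1]  # proportional to cos(tau - phi)
    return np.arctan2(s, c)

# Consistency with the matrix of Eq. (3): recovers tau - phi (mod 2*pi),
# e.g. x3_from_T(weak_perspective_T(0.8, 0.4, 0.3, -1.0)) is about 1.3.
```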