RE: TPAMI-2007-06-0371, "Tracking by an Optimal Sequence of Linear Predictors"
Manuscript Type: Regular

Dear Mr. Karel Zimmermann,

We have completed the review process of the above-referenced paper for the IEEE Transactions on Pattern Analysis and Machine Intelligence. Enclosed are your reviews. Associate Editor Dr. Patrick Perez has recommended to the Editor-in-Chief that your paper undergo a major revision. If you choose to revise your paper, please prepare a separate document describing how each of the reviewers' comments is addressed in your revision and send it to us within three months.

To revise your manuscript, log into https://mc.manuscriptcentral.com/tpami-cs and enter your Author Center, where you will find your manuscript title listed under "Manuscripts with Decisions." Under "Actions," click on "Create a Revision." Your manuscript number has been appended to denote a revision. Once the revised manuscript is prepared, you can upload it and submit it through your Author Center.

When submitting your revised manuscript, you will be able to respond to the comments made by the reviewer(s) in the space provided. You can use this space to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the reviewer(s)’ questions and comments. You may also upload your responses as separate files for review along with your revision. If you choose to do this, please choose “Summary of Changes” as the file designation.

IMPORTANT: Your original files are available to you when you upload your revised manuscript. Please delete any redundant files before completing the submission. When the submission process is complete, you will receive an automated confirmation email immediately. If you do not receive that email, your submission is not yet complete.
The journal’s publication coordinator will contact you should we have any concerns or questions regarding your revision. Otherwise, your revision will be forwarded to the assigned associate editor with a request to begin the second round of reviews.

Please be mindful when making your revisions that you still need to observe the size limitations for papers submitted to TPAMI. TPAMI manuscript types and submission length guidelines (including the main text, the abstract, index terms, illustrations and references) are as follows:
• Regular papers* – 35 single column pages
• Comments paper – 4 single column pages
• Survey papers** – 45 single column pages

Please note that double column will translate more readily into the final publication format. Our peer review double column templates can be found at http://www.computer.org/portal/site/transactions/menuitem.eda2ca84d8d67764cfe79d108bcd45f3/index.jsp?&pName=transactions_level1&path=transactions/tpami/mc&file=author.xml&xsl=article.xsl&#templates

Please do not hesitate to contact us if you have any questions about our process or are experiencing technical difficulties. You can reach me at tpami@computer.org.

Thank you for your contribution to TPAMI, and we look forward to receiving your revised manuscript.

Sincerely,
Ms. Elaine Stephenson
on behalf of Dr. Patrick Perez
IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE Computer Society
10662 Los Vaqueros Circle
Los Alamitos, CA 90720 USA
Voice: 714.821.8380
Fax: 714.821.9975
tpami@computer.org

**************
Editor Comments

Editor: 1
Comments to the Author:
The three referees found the main idea novel, sound and useful. They also pointed out that some of the experimental results are impressive.
However, two of them expressed a few major criticisms regarding both the theoretical properties of the proposed tracking system (convergence, robustness and complexity issues) and its practical merit (lack of comparison with [3] and [5], unclear comparison to Lucas-Kanade, lack of experimental assessment of robustness, absence of occlusions of the tracked object by other scene elements). These important questions should be addressed in the revised manuscript, in addition to the various clarity issues listed by all referees.

********************
Reviewer Comments

Reviewer: 1
Recommendation: Author Should Prepare A Major Revision For A Second Review
Comments:
Overall the idea of the paper is very good and the obtained results are impressive. The developed theory is also useful and can probably be used for other tasks too. The emphasis on globally optimal results is important. But there are points that need improvement before this paper can be accepted for PAMI:

1. The presentation: In some parts the paper seems to introduce an overly complex description of the method, which makes it more complicated than necessary. There are many forward references in the paper, which sometimes makes it hard to follow. Please make sure that in all cases the formal apparatus introduced is really necessary for the method. In some cases I have the impression that the formulations are more general than necessary. In other parts, the best visualization/description has not been used. For example, Fig. 1 is not really helpful and could be clearer; similarly, Algorithm 1 does not provide any information. Also, Fig. 7 is a strange form of visualization. Please introduce at the beginning a complete overview of all the parts so that the reader can orient herself.

2. The experimental results can be extended: One should also perform the experiment of Fig. 9 with a regular sampling (e.g. along a circle) instead of a random one.
Please state how the reference points were selected when you compare SLLP to Lucas-Kanade. If these were randomly or regularly distributed points, then the comparison is unfair to LK. If these were corner points and you are still better, then this is very significant. In any case, an experiment using corner points for comparison would be good. I miss a discussion of the accuracy of the tracker. Please perform an experiment which demonstrates how accurately you can estimate the motion parameters (either using synthetic data or, preferably, using some accurate motion stage).

3. There are some minor details that should be fixed: You state that some of the problems you are treating are NP-hard; please provide a reference for the claim. In Fig. 3(a), (b), (d), also use r - range and c - complexity as axis labels to make them consistent with Fig. 3(c). In Fig. 6(b), the coordinate axis with 0.5, 1.5 iterations does not make sense. You cite [21] for more details; please include these measures in the description in order to make the journal paper self-contained. Please state explicitly which operations of your method can be sped up by a GPU implementation. Some English formulations could be improved: "... where planar object is tracked", "... more iteration's is performed ...", " ... the cheapest in a graph." Please do a careful proofreading.

=====================
1. Which category describes this manuscript?: Research/Technology
2. How relevant is this manuscript to the readers of this periodical? If you answer Not very relevant or Irrelevant please explain your rating under Public Comments below.: Very Relevant

1. Please evaluate the significance of the manuscript’s research contribution.: Excellent
2. Please explain how this manuscript advances this field of research and/or contributes something new to the literature. : The paper proposes a tracking method that is based on learning linear predictors.
The general idea of the paper is well in line with recent techniques (Lepetit, Jurie, etc.) that use an offline learning phase to develop detectors/predictors which are very fast at execution time. In this particular case the authors use linear predictors to predict the motion locally and a RANSAC algorithm to arrive at a robust global estimate. The real strength of the paper is the mathematics and the careful design of the ingredients, which allow optimality conditions to be formulated, e.g. the linear predictors are formulated as a minimax problem rather than linear predictors, and the complexity of the tracker can be controlled by the support set of the predictors and the number of predictors used in the RANSAC step. This allows a design (run-time) of the algorithm for a particular application.
3. Is the manuscript technically sound? In the Public Comments section, please provide detailed explanations to support your assessment: Yes
4. How thorough is the experimental validation (where appropriate)? Please discuss any shortcomings in the Public Comments section.: Lacking in some respects; some cases of interest not tested

1. Are the title, abstract, and keywords appropriate? If not, please comment in the Public Comments section.: Yes
2. Does the manuscript contain sufficient and appropriate references? Please comment and include additional suggested references in the Public Comments section.: References are sufficient and appropriate
3. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? If not, please explain your answer in the Public Comments section.: Yes
4. How would you rate the organization of the manuscript? Is it focused? Please elaborate with suggestions for reorganization in the Public Comments section.: Could be improved
5. Please rate the readability of the manuscript. Explain your rating under Public Comments below. : Readable - but requires some effort to understand
6.
How is the length of the manuscript? If changes are suggested, please make explicit recommendations in the Public Comments section.: About right

Please rate the manuscript overall. Explain your choice.: Excellent

********************************************
Reviewer: 2
Recommendation: Author Should Prepare A Major Revision For A Second Review
Comments:
The main contribution and novelty of the paper is that it explicitly takes into consideration the accuracy (uncertainty region), the complexity and the range of linear predictors that are applied sequentially for tracking. The predictor sequence is selected in a way that minimizes the total complexity (for a given level of desired accuracy and range). The computational complexity of the resulting scheme is good (resulting in a faster than real-time MATLAB implementation).

On the other hand, there are a number of issues that in my opinion are not addressed sufficiently well and need to be addressed in a revised version of the manuscript. In particular, the issue of robustness of the individual detectors and of the sequence of detectors is not addressed well, neither theoretically nor experimentally. The manuscript contains some (correctable) technical errors (see below).

1) Proposition 1 states that the uncertainty region is a non-increasing function of the complexity. In the ideal case in which a regressor is able to discard the information provided by additional pixels/features, the proposition should be true. However, this is in general not the case, as learning algorithms cannot always discard irrelevant/redundant features. This is also the case with the proposed regressors, as the greedy feature selection procedure that is employed (Section VI) is not optimal. It seems to me that Proposition 2 (the proof of which was not very clear to me) might also depend on the learning algorithm. These propositions are central for claiming that the sequence of regressors is computationally optimal.
If they do not hold, it seems to me that more paths need to be considered in the graph of Algorithm 3; namely, regressors with larger ranges than the ones considered in step 2a might need to be considered.

2) Occlusions are dealt with by maintaining an active set of predictors and by using the RANSAC algorithm. However, the selection of the active set of predictors is based on quality measures that are defined in the training phase and therefore do not seem appropriate for an online selection. Further, the experimental results do not contain sequences with partial target occlusion, apart from occlusions that are caused by the frame size (i.e. the target is partially outside the visible frame), in which the set of visible predictors can be trivially detected. This is in general not the case, and experimental results using sequences with more complex occlusions should be included.

3) The main idea of the paper (that is, to use a sequence of predictors in a way that the computational complexity is minimized) is interesting. However, an issue that is not sufficiently addressed is the robustness. During training (step 2a of Algorithm 3) only predictors whose range is *just* larger than the uncertainty region of the immediately preceding predictors are considered. Therefore, if a predictor fails, then the next predictor is almost certainly bound to fail and the whole sequence should fail as well. Compromises between robustness and complexity should be introduced (e.g. by allowing some larger difference between the uncertainty region and the range of two immediately subsequent predictors) and evaluated. In any case, the robustness of the sequence of regressors to individual regressor failures needs to be quantified and evaluated on a test set. For example, an important question is to what extent the predictions of the individual detectors fall within their expected uncertainty range.
In continuation of the above, since the accuracy/uncertainty region of any detector cannot be guaranteed during tracking (i.e. on the test set), outliers (that is, predictions that fall outside the uncertainty range that is estimated/minimized during training) are to be expected. From this perspective, the significance of the difference between the errors of the LSQ and the proposed regressor, as illustrated in Fig. 4, is not immediately apparent. A relevant question is whether, for a given uncertainty region, the proposed regressor has a lower probability of outliers (on a test set, since in training this is trivially true).

4) Experimental results should include quantification of the robustness of the algorithm w.r.t. noise.

5) The comparison of the proposed regressor with the LSQ regressor should be clearer, as the results in Table II are too condensed. Figure 4 seems to indicate that the differences between the LSQ and the proposed regressor are very small, and it is rather surprising that RANSAC cannot handle the few 'outlier' predictions.

6) The notation used is sometimes wrong, as are the described algorithms. In particular:
a) Eq. 3: Minimization is not with respect to \lambda (i.e. the notation \arg\min_{\lambda} should be changed).
b) Algorithm 4 is not correct. I assume that the idea is that the set I is incrementally constructed, but this is not what is described in step 2. The equation should be something like: I = I \cup \arg\min{...}
c) Minimization of the uncertainty region w.r.t. a *biased* rectangle (Fig. 4c) is puzzling. If the regressor cannot learn to compensate for the bias, then the uncertainty region should also be calculated with (0,0) as the rectangle center. It is puzzling, though, that the regressor cannot compensate for the bias and at the same time does minimize the rectangle area.

7) The paper would benefit from careful reading and editing. In particular, the authors should pay attention to articles, as they often skip them.
8) The coloring of the bar in Fig. 5 is not consistent with the labeling of the horizontal axis. Also, the fact that the arrows lead to points in the complexity/range graph is misleading. The arrows should lead to lines with constant range. The same holds for Fig. 6.

=================
1. Which category describes this manuscript?: Research/Technology
2. How relevant is this manuscript to the readers of this periodical? If you answer Not very relevant or Irrelevant please explain your rating under Public Comments below.: Relevant

1. Please evaluate the significance of the manuscript’s research contribution.: Good
2. Please explain how this manuscript advances this field of research and/or contributes something new to the literature. : The paper proposes a tracking method that is based on a sequence of learned linear predictors. The main contribution of the paper is that the sequence of the predictors is constructed in a way that minimizes the computational complexity of the compound predictor, for a given level of desired accuracy and for a given range of motions that the predictor is trained on. The performance of the tracking scheme is evaluated on a number of sequences and the reported complexity/speed is very good.
3. Is the manuscript technically sound? In the Public Comments section, please provide detailed explanations to support your assessment: Partially
4. How thorough is the experimental validation (where appropriate)? Please discuss any shortcomings in the Public Comments section.: Lacking in some respects; some cases of interest not tested

1. Are the title, abstract, and keywords appropriate? If not, please comment in the Public Comments section.: Yes
2. Does the manuscript contain sufficient and appropriate references? Please comment and include additional suggested references in the Public Comments section.: References are sufficient and appropriate
3. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on?
If not, please explain your answer in the Public Comments section.: Yes
4. How would you rate the organization of the manuscript? Is it focused? Please elaborate with suggestions for reorganization in the Public Comments section.: Satisfactory
5. Please rate the readability of the manuscript. Explain your rating under Public Comments below. : Easy to read
6. How is the length of the manuscript? If changes are suggested, please make explicit recommendations in the Public Comments section.: Should be trimmed a bit

Please rate the manuscript overall. Explain your choice.: Good

**********************************
Reviewer: 3
Recommendation: Author Should Prepare A Major Revision For A Second Review
Comments:
This paper concerns the visual tracking of an object of interest in a sequence of images. The proposed approach falls in the realm of trained trackers. An off-line step is performed in order to "learn" the relationship between the observed image registration error and the state parameters. The authors build their work on the previous work of Cootes [4] (which has been successively redesigned by Jurie [3]). Trained trackers are contrasted with, and preferred to, on-line methods (like the Lucas-Kanade tracker) based on gradient descent techniques, since the latter suffer from typical problems of local optimization: convergence to a local minimum, an unknown number of iterations and an unknown basin of convergence.

The method proposed by the authors consists of selecting some "reference points" (i.e. arbitrary points on the object) and then training an optimal Sequence of Learned Linear Predictors. The main contribution of the paper is the idea that the training stage should take into account the computational complexity of the tracking. The computational complexity of the tracking is proportional to the number of pixels of the support. On the other hand, the motion parameters considered in the training step are simply a translation in the image (2 d.o.f.s).
In any case, another robust stage (with RANSAC) for the global motion estimation is needed.

The main concern with the proposed trained tracker is that it is supposed to avoid the problems of gradient descent algorithms, as claimed by the authors in the introduction. However, this does not seem to be the case. In the paper, there is no proof that the trained tracker will converge to the correct solution (which is equivalent to finding the global minimum); there is no proof that the trained tracker will converge to the solution in a fixed number of iterations (or equivalently in a fixed time); there is no proof that the trained tracker has a known basin of convergence.

- Concerning the convergence: Given that the learned motion is a translation and the predictor is linear, and so cannot capture the shape of a non-linear function (i.e. even if the real motion is a translation, the predictor may fail to find the true translation parameters), all the estimated 2D translations may have some errors that will propagate into the estimation of the global motion. Despite the use of the RANSAC algorithm, these errors will never be compensated and the complete algorithm may fail to find the true global motion parameters.

- Concerning the number of iterations: The amount of computational time necessary to find the true solution is unknown. In the proposed method the total time is the sum of the time to compute n predictors and the time to compute m iterations of RANSAC. If the time to compute the n predictors is too high, then the number of RANSAC iterations may be too small. Conversely, if you fix too high a number of RANSAC iterations, then the number of predictors may be too small to find correct solutions for the local motion.

- Concerning the basin of convergence: Again, the learned predictors are linear. Since the relationship between the intensity errors and the motion parameters is generally non-linear, the basin of convergence of the algorithm is unknown.
Moreover, the basin of convergence should also depend on the size of the tracked object in the available training images.

The experimental results are not completely convincing and should be improved. The authors do not really compare their results with the work of Jurie [3]. The comparison is limited to the algorithm for learning the linear regressor with the LSQ approach used by Jurie. The authors should compare the global algorithms in order to quantify the improvements with respect to the Jurie algorithm [3]. Also, why do the authors not compare their work with the work by Williams [5]?

The comparison with on-line gradient descent approaches can be completed using the more advanced and efficient gradient descent approaches that have been proposed since 1981 (see for example [Hager 1998] and the inverse compositional algorithm of Baker [2]). Note that subset selection of pixels can also be performed for gradient descent approaches (see for example recent results in [Benhimane 2007]).

[Hager 1998] G. D. Hager, P. N. Belhumeur: "Efficient region tracking with parametric models of geometry and illumination", TPAMI, 20(10):1025-1039, 1998.
[Benhimane 2007] S. Benhimane, A. Ladikos, V. Lepetit, N. Navab: "Linear and Quadratic Subsets for Template-Based Tracking", Computer Vision and Pattern Recognition, 2007.

Finally, the approach proposed in the paper is very similar to a feature-based approach where the matching step (for example with SIFT descriptors) of the reference points is replaced by the SLLP tracking. The authors should also consider these approaches in their discussion and experimental comparison.

Minor issues:
- The SSD optimisation in the KLT algorithm is performed by a Gauss-Newton method and not by a Newton-Raphson method. Indeed, the Newton-Raphson method applied to the gradient of the SSD cost function would need the computation of Hessian matrices (second derivatives).
- The overview is a bit redundant w.r.t. the state of the art.
I would suggest merging them.
- In Figure 11(b), the range of the LP and LP+DP methods starts from 20 pixels. Why? Is this linked to the uncertainty region?

==========================
1. Which category describes this manuscript?: Research/Technology
2. How relevant is this manuscript to the readers of this periodical? If you answer Not very relevant or Irrelevant please explain your rating under Public Comments below.: Relevant

1. Please evaluate the significance of the manuscript’s research contribution.: Good
2. Please explain how this manuscript advances this field of research and/or contributes something new to the literature. : The main contribution of the paper is the idea that the training stage should take into account the computational complexity of the tracking. The computational complexity of the tracking is proportional to the number of pixels of the support.
3. Is the manuscript technically sound? In the Public Comments section, please provide detailed explanations to support your assessment: Appears to be - but didn't check completely
4. How thorough is the experimental validation (where appropriate)? Please discuss any shortcomings in the Public Comments section.: Insufficient; clearly inferior to state of the art, or necessary tests are absent

1. Are the title, abstract, and keywords appropriate? If not, please comment in the Public Comments section.: Yes
2. Does the manuscript contain sufficient and appropriate references? Please comment and include additional suggested references in the Public Comments section.: Important references are missing; more references are needed
3. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? If not, please explain your answer in the Public Comments section.: Could be improved
4. How would you rate the organization of the manuscript? Is it focused? Please elaborate with suggestions for reorganization in the Public Comments section.: Could be improved
5.
Please rate the readability of the manuscript. Explain your rating under Public Comments below. : Readable - but requires some effort to understand
6. How is the length of the manuscript? If changes are suggested, please make explicit recommendations in the Public Comments section.: About right

Please rate the manuscript overall. Explain your choice.: Good
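
Illustrative note on Reviewer 2, comment 6(b): the reviewer assumes that Algorithm 4 is meant to build the set I incrementally, adding at each step the element that minimizes a criterion which the review leaves elided. The Python sketch below shows only that generic greedy pattern. The names greedy_select and cost, and the toy cost used in the usage lines, are assumptions made for illustration; they are not taken from the manuscript, and the actual criterion of Algorithm 4 is not reproduced here.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List


def greedy_select(
    candidates: Iterable[Hashable],
    cost: Callable[[FrozenSet[Hashable]], float],
    k: int,
) -> List[Hashable]:
    """Build a set incrementally: at each step add the candidate that
    minimizes cost(current set plus that candidate).

    `cost` is a hypothetical stand-in for the elided criterion; any
    function that scores a candidate set can be passed in.
    """
    selected: List[Hashable] = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        # One greedy step: pick the single best addition to the current set.
        best = min(remaining, key=lambda j: cost(frozenset(selected + [j])))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage (assumed cost): choose two numbers whose sum is close to 10.
chosen = greedy_select([1, 2, 3, 4, 5, 8], lambda s: abs(sum(s) - 10), k=2)
print(chosen)  # [8, 2]
```

The returned list records the order in which elements were added. As Reviewer 2 points out in comment 1, a greedy selection of this kind is in general not optimal, so the sketch illustrates the incremental update rather than any optimality claim.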