Dr. Ing. Josef Šivic

Theses

Dissertation theses

Learning visuomotor skills for robotic manipulation

Level: Topic of dissertation thesis
Topic description

Humans can solve everyday manipulation tasks remarkably efficiently and safely. After only a few interactions, they learn to use tools for tasks such as hammering a nail, shoveling snow, raking leaves, or drilling holes into different materials, without knowing in advance the exact physical properties of the tools or of the environment. Currently, there is no artificial system with a similar level of visuomotor capabilities.

The objective of this thesis is to develop machine learning models grounded in the physical and geometrical structure of the world that enable learning safe visuomotor skills for robotic manipulation in new, unseen environments with only minimal supervision, for example from observing people perform the same task.

Literature
  • Z Li, J Sedlar, J Carpentier, I Laptev, N Mansard, J Sivic, Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
  • Y Labbe, S Zagoruyko, I Kalevatykh, I Laptev, J Carpentier, M Aubry, J Sivic, Monte-Carlo Tree Search for Efficient Visually Guided Rearrangement Planning, IEEE Robotics and Automation Letters (2020).
  • Y Labbe, J Carpentier, M Aubry, J Sivic, CosyPose: Consistent multi-view multi-object 6D pose estimation, European Conference on Computer Vision (ECCV) (2020).

Machine Learning for Analysis of Molecular Dynamics Simulations

Level: Topic of dissertation thesis
Topic description

Molecular dynamics (MD) simulations allow analyzing the physical movements of biomolecules. The generated data are sequences of frames (hundreds of thousands of them) captured at a predefined time step. Each frame consists of the positions of all atoms of a protein (tens of thousands of them), which are simulated using a molecular mechanics force field. Analyzing such a massive amount of data is often challenging, especially for molecules with conformational heterogeneity, such as the disordered Abeta peptide relevant for Alzheimer's disease (AD). The Abeta peptide is the hallmark of the disease and adopts diverse conformations. Understanding the dynamic properties of the Abeta peptide is key to determining the effects of drug candidates for potential AD treatment. The objective of this thesis is to build on recent advances in large-scale weakly supervised and self-supervised learning for video sequences and to develop new methods for the automatic analysis of molecular dynamics simulations and, more generally, for protein engineering. This topic will be co-advised with Dr. Stanislav Mazurenko and Prof. Jiri Damborsky (Loschmidt Laboratories, Masaryk University).

Literature
  • S Mazurenko, Z Prokop, J Damborsky, Machine learning in enzyme engineering, ACS Catalysis, 10 (2), 1210-1223, 2020.
  • A Miech, D Zhukov, JB Alayrac, M Tapaswi, I Laptev, J Sivic, HowTo100M: Learning a text-video embedding by watching hundred million narrated video clips, In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
  • J-B Alayrac, P Bojanowski, N Agrawal, J Sivic, I Laptev, S Lacoste-Julien, Learning from Narrated Instruction Videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
  • A Mardt, L Pasquali, H Wu, F Noe, VAMPnets for deep learning of molecular kinetics, Nature Communications, 9, 5, 2018.

Weakly supervised learning for visual recognition

Level: Topic of dissertation thesis
Topic description

Building machines that can automatically understand complex visual inputs is one of the central problems in artificial intelligence, with applications in autonomous robotics, automated manufacturing, and healthcare. The problem is difficult due to the large variability of the visual world. Recent successes are, in large part, due to a combination of learnable visual representations based on convolutional neural networks, supervised machine learning techniques, and large-scale Internet image collections. The next fundamental challenge lies in developing visual representations that do not require full supervision in the form of inputs paired with target outputs, but can instead be learned from weak supervision, that is, from noisy and only partially annotated data. This thesis will address this challenge.

Literature
  • Alayrac, J.-B., Bojanowski, P., Agrawal, N., Laptev, I., Sivic, J. and Lacoste-Julien, S., Learning from narrated instruction videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, 40 (9), 2194-2208 (2018).