Learning for Optimizing (Metaheuristics and Machine Learning for Combinatorial Optimization)
Objective
The goal is to provide approaches in which combinatorial optimization problems are solved using metaheuristic and machine-learning techniques, either by combining metaheuristics and machine learning to solve the same problem, or by using them in a workflow where each solves a different part of a larger problem.
Perturbation-based XAI methods for Visual Transformers
Objectives:
We develop an XAI technique for Visual Transformers, named Transformers Input Sampling (TIS), and compare it to state-of-the-art methods (ViT-CX, G-LIME, TAM, Attention Rollout, …). The comparison covers several metrics (insertion/deletion, pointing game, …) and two vision transformer networks: the vanilla Vision Transformer (ViT) and the Data-Efficient Image Transformer (DeiT).
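To make the evaluation concrete, here is a minimal sketch of the insertion metric mentioned above: starting from an empty baseline, the most salient pixels are revealed in batches and the model's confidence is recorded; a faithful saliency map yields a high area under this curve. The `model` callable and the averaging-based AUC approximation are illustrative assumptions, not the exact protocol used in the comparison.

```python
import numpy as np

def insertion_auc(model, image, saliency, steps=20):
    """Insertion metric sketch: reveal pixels from most to least salient
    (per the saliency map) and average the model's confidence over the
    resulting sequence of partially revealed images.
    `model` is assumed to map an image to a scalar class confidence."""
    h, w = saliency.shape
    order = np.argsort(saliency.ravel())[::-1]   # most salient first
    current = np.zeros_like(image)               # empty baseline
    chunk = max(1, (h * w) // steps)
    scores = []
    for i in range(0, h * w, chunk):
        ys, xs = np.unravel_index(order[i:i + chunk], (h, w))
        current[ys, xs] = image[ys, xs]          # reveal next pixel batch
        scores.append(model(current))
    return float(np.mean(scores))                # approximate AUC
```

The deletion metric is the mirror image: start from the full image, remove the most salient pixels first, and expect the confidence to drop quickly.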
DeepRare: Generic Unsupervised Visual Attention Models
The human visual system is modeled in the engineering field by feature-engineered methods that detect contrasted, surprising, or unusual data in images. Such data is "interesting" to humans and leads to numerous applications. Deep neural networks (DNNs) have drastically improved algorithm performance on the main benchmark datasets. However, DNN-based models are counter-intuitive here: surprising or unusual data is by definition difficult to learn because of its low occurrence probability.
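The feature-engineered notion of "surprising" data can be illustrated with a minimal rarity map: pixels whose intensity is rare in the image receive a high score (self-information, -log probability). This toy single-feature version is only a sketch of the general idea; a model like DeepRare applies the same principle to richer (deep) feature maps.

```python
import numpy as np

def rarity_map(gray, bins=32):
    """Rarity sketch: score each pixel by the self-information of its
    intensity bin, so rare intensities stand out. `gray` is assumed to
    be a 2D array of values in [0, 1]."""
    hist, edges = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    prob = hist / hist.sum()
    # Map each pixel back to its histogram bin
    idx = np.clip(np.digitize(gray, edges[1:-1]), 0, bins - 1)
    rarity = -np.log(prob[idx])          # every pixel's bin has prob > 0
    return rarity / (rarity.max() + 1e-12)   # normalize to [0, 1]
```

On an image that is uniform except for one odd pixel, the odd pixel gets a normalized rarity near 1 while the background stays near 0.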
Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation
This work aims at generating captions for soccer videos using deep learning. In this context, the paper introduces a dataset, a model, and a triple-level evaluation. The dataset consists of 22k caption-clip pairs and three types of visual features (images, optical flow, inpainting) for ~500 hours of \emph{SoccerNet} videos. The model is divided into three parts: a transformer learns the language, ConvNets learn the vision, and a fusion of linguistic and visual features generates the captions.
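The fusion step can be sketched as a simple cross-attention between the two streams: each linguistic token attends over per-frame visual features and absorbs the attended context. This is a hypothetical minimal module for illustration, not the paper's exact architecture.

```python
import numpy as np

def fuse_features(text_emb, visual_feats):
    """Cross-attention fusion sketch: text_emb is (T, d) token embeddings
    from the language transformer, visual_feats is (F, d) per-frame
    features from the ConvNets. Each token attends over the frames and
    the attended visual context is added residually."""
    d = text_emb.shape[1]
    scores = text_emb @ visual_feats.T / np.sqrt(d)      # (T, F)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over frames
    context = weights @ visual_feats                     # (T, d)
    return text_emb + context                            # residual fusion
```

A decoder head over the fused embeddings would then predict the caption tokens autoregressively.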
Analysis of Co-Laughter Gesture Relationship on RGB Videos in Dyadic Conversation Context
The development of virtual agents has enabled human-avatar interactions to become increasingly rich and varied. Moreover, an expressive virtual agent, i.e. one that mimics the natural expression of emotions, enhances social interaction between a user (human) and an agent (intelligent machine). The set of non-verbal behaviors of a virtual character is therefore an important component of human-machine interaction.