The natural world is abundant with concepts expressed through visual, acoustic, tactile, and linguistic modalities. Cross-modal learning has emerged as an important research direction in computer vision and multimedia. It refers to the adaptive, synergistic integration of complex perceptions from multiple sensory modalities, such that learning within any individual modality (e.g., vision) can be enhanced with information from one or more other modalities (e.g., text). This session focuses on understanding, reasoning, and generation across language/text and vision, which enables intelligent services such as vision-to-text captioning, text-to-vision generation, and question answering/dialog about images and videos. This special session invites paper submissions; the session will be open to all ICMR 2021 conference registrants.
Prospective submissions should address, but are not limited to, the following topics:
Each full paper should be limited to 8 pages (6 pages of content plus up to 2 pages of references).
Paper Submission: March 3, 2021
Notification of Acceptance: April 11, 2021
Camera-Ready Papers Due: May 1, 2021
See the ICMR 2021 paper submission section for formatting and submission details.