ImpactU - Product Detail

Active Speakers in Context

Open Access

Minciencias ID: ART-0001579086-66

Ranking: ART-GC_ART

Language: English

Published: 01/01/2020

APC (est.): Not available

Abstract:

Current methods for active speaker detection focus on modeling short-term audiovisual information from a single speaker. Although this strategy can be enough for addressing single-speaker scenarios, it prevents accurate detection when the task is to identify which of many candidate speakers are talking. This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. Our Active Speaker Context is designed to learn pairwise and temporal relations from a structured ensemble of audio-visual observations. Our experiments show that a structured feature ensemble already benefits the active speaker detection performance. Moreover, we find that the proposed Active Speaker Context improves the state-of-the-art on the AVA-ActiveSpeaker dataset, achieving a mAP of 87.1%. We present ablation studies that verify that this result is a direct consequence of our long-term multi-speaker analysis.

Topic:

Speech and Audio Processing

Citations:

Citations per year:

No citation data available

Source Information:

Source: arXiv (Cornell University)
Quartile in publication year: Not available
Volume: Not available
Issue: Not available
Pages: Not available
pISSN: Not available
ISSN: Not available
OpenAlex profile: https://openalex.org/S4306400194

Links and Identifiers:

Minciencias ID: ART-0001579086-66
Scienti ID: 0001579086-66
Open access URL: https://arxiv.org/abs/2005.09812
OpenAlex URL: https://openalex.org/W3026908737
DOI URL: https://doi.org/10.48550/arxiv.2005.09812

Non-specialized editorial publications