This study investigated the effect of visual input on L2 listening comprehension in a North American intensive English program. It also examined the interaction between visual input and working memory (WM), with the aim of clarifying the role that visual input, together with WM, plays in L2 listening tests. The study compared two groups of upper-intermediate L1 Chinese and L1 Arabic ESL students. All participants (N = 24) took a WM test and were then divided into two groups, each of which took a listening comprehension test under one of two treatment conditions: video or audio-only. Results indicated that the presence of visual input had a significant negative effect on listening comprehension, whereas WM had no significant effect. Additionally, no interaction was found between WM and the presence or absence of visual input. The paper concludes by discussing further research questions and implications for L2 listening assessment.

Listening in a second language (L2) has been described as an arduous task: comprehending speech requires the simultaneous processing of phonological, syntactic, semantic, and pragmatic information (Flowerdew & Miller, 2005). Moreover, listening does not typically occur in isolation. Listeners usually also receive visual input, such as observations of kinesic behavior and contextual information (Gregersen, 2007; Kellerman, 1992). Recognizing this, teachers began using video in L2 listening classrooms in the mid-1970s because of its ability to contextualize language and increase motivation (Flowerdew & Miller, 2005). However, although video has become standard practice in many L2 classrooms, it is not always used in testing situations. This discrepancy raises the question of what effect, if any, video has on listening comprehension test scores.

Another dimension of L2 listening is working memory (WM). Unlike the written channel used in reading and writing, the aural channel through which listening is accomplished is typically more ephemeral: the input disappears once the speaker has finished speaking. This is particularly true in many academic contexts, where listening is often a one-way, transactional process (Buck, 2001; Morley, 2001; Peterson, 2001) that requires a high level of fluency and possibly a high WM capacity, especially at the discourse level (Juffs &