Abstract This article sets out to bring sound and music to the field of visual studies in International Relations. It argues that IR has largely approached the visual field as if it were without sound, neglecting how audial landscapes frame and direct our interpretation of moving imagery. Sound and music contribute to making imagery intelligible to us: we ‘hear the pictures’, often without noticing. The audial can, for instance, articulate a visual absence, blast visual signs, or bring out certain emotional stages or subjects’ inner life. Audial frames steer us in distinct directions: they can mute the cries of the wounded in war, or amplify the sounds of joy of soldiers shooting in the air. To bring the audial and the visual together analytically and empirically, the article proposes four key analytical themes: 1) the audial–visual frame, 2) point of view/point of audition, 3) modes of audio-visual synchronization and 4) aesthetic moods. These are applied to a study of ‘war music videos’ made and circulated by Shi'a militias currently fighting in Iraq and Syria. Such war music videos, it is suggested, are not just artefacts of popular culture, but have become an integral part of how warfare is practiced today, a practice shared by soldiers in the US and Europe. War music videos perform war, just as they shape how war is known by spectators and participants alike.