People who are blind or have low vision (BLV) may hesitate to travel independently in unfamiliar environments due to uncertainty about the physical landscape. While most tools focus on in-situ navigation, those exploring pre-travel assistance typically provide only landmarks and turn-by-turn instructions, lacking detailed visual context. Street view imagery, which contains rich visual information and has the potential to reveal numerous environmental details, remains inaccessible to BLV people. In this work, we introduce SceneScout, a multimodal large language model (MLLM)-driven AI agent that enables accessible interactions with street view imagery. SceneScout supports two modes: (1) Route Preview, enabling users to familiarize themselves with visual details along a route, and (2) Virtual Exploration, enabling free movement within street view imagery. Our user study (N=10) demonstrates that SceneScout helps BLV users uncover visual information otherwise unavailable through existing means. A technical evaluation shows that most descriptions are accurate (72%) and describe stable visual elements (95%) even in older imagery, though occasional subtle and plausible errors make them difficult to verify without sight. We discuss future opportunities and challenges of using street view imagery to enhance navigation experiences.
- †Work done while at Apple
- ‡ Columbia University