People who are blind or have low vision (BLV) may hesitate to travel independently in unfamiliar environments due to uncertainty about the physical landscape. While most tools focus on in-situ navigation, those exploring pre-travel assistance typically provide only landmarks and turn-by-turn instructions, lacking detailed visual context. Street view imagery, which contains rich visual information and has the potential to reveal numerous environmental details, remains inaccessible to BLV people. In this work, we introduce SceneScout, a multimodal large language model (MLLM)-driven AI agent that enables accessible interactions with street view imagery. SceneScout supports two modes: (1) Route Preview, enabling users to familiarize themselves with visual details along a route, and (2) Virtual Exploration, enabling free movement within street view imagery. Our user study (N=10) demonstrates that SceneScout helps BLV users uncover visual information otherwise unavailable through existing means. A technical evaluation reveals that most descriptions are accurate (72%) and describe stable visual elements (95%) even in older imagery, though occasional subtle and plausible errors make them difficult to verify without sight. We discuss future opportunities and challenges of using street view imagery to enhance navigation experiences.
- † Work done while at Apple
- ‡ Columbia University