Researchers Use AI To Turn Sound Recordings Into Accurate Street Images

2024-11-27 University of Texas at Austin (UT Austin)

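Researchers at the University of Texas at Austin have developed a technique that uses generative artificial intelligence (AI) to produce streetscape images from audio recordings. The technique exploits visual cues contained in the acoustic environment to create high-resolution images from urban and rural audio data. The team trained the AI model on 10-second audio clips and corresponding images collected in cities across North America, Asia, and Europe. The generated images closely matched the real scenes in the proportions of sky, greenery, and buildings, and also in architectural style, distances between objects, and even weather and time of day. The result suggests that AI can mimic human sensory experience and reconstruct visual information from sound.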
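
One way to picture the comparison of sky, greenery, and building proportions is the minimal sketch below. It assumes each image has already been reduced to a grid of per-pixel class labels by some segmentation step (not shown), and it uses Pearson correlation as the agreement measure; both are assumptions for illustration, not details confirmed by the summary above. The names `class_proportions` and `pearson` are hypothetical.

```python
from collections import Counter
from math import sqrt

# Classes named in the summary; the label strings themselves are assumptions.
CLASSES = ("sky", "greenery", "building")


def class_proportions(label_map):
    """Return the fraction of pixels in each class of CLASSES.

    label_map is a list of rows, each row a list of class-label strings.
    """
    pixels = [label for row in label_map for label in row]
    if not pixels:
        raise ValueError("label_map must contain at least one pixel")
    counts = Counter(pixels)
    return {c: counts[c] / len(pixels) for c in CLASSES}


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy, made-up label maps standing in for one real and one generated image.
real = [["sky", "sky"], ["building", "greenery"]]
generated = [["sky", "building"], ["building", "greenery"]]

real_share = class_proportions(real)
generated_share = class_proportions(generated)
print(real_share)
print(generated_share)
```

Over a whole test set, one would collect, say, the sky share of every real image and of its generated counterpart, then call `pearson` on the two lists; a value near 1 would indicate that the generated images track the real proportions.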

<Related Information>

From hearing to seeing: Linking auditory and visual place perceptions with soundscape-to-image generative artificial intelligence

Yonggai Zhuang, Yuhao Kang, Teng Fei, Meng Bian, Yunyan Du
Computers, Environment and Urban Systems  Available online: 1 May 2024
DOI:https://doi.org/10.1016/j.compenvurbsys.2024.102122

Highlights

  • A Soundscape-to-Image Diffusion Model is proposed to visualize street soundscapes.
  • Human auditory and visual perceptions are linked to understanding the sense of place.
  • Soundscapes provide sufficient visual information of places.

Abstract

People experience the world through multiple senses simultaneously, contributing to our sense of place. Prior quantitative geography studies have mostly emphasized human visual perceptions, neglecting human auditory perceptions at place due to the challenges in characterizing the acoustic environment vividly. Also, few studies have synthesized the two-dimensional (auditory and visual) perceptions in understanding human sense of place. To bridge these gaps, we propose a Soundscape-to-Image Diffusion model, a generative Artificial Intelligence (AI) model supported by Large Language Models (LLMs), aiming to visualize soundscapes through the generation of street view images. By creating audio-image pairs, acoustic environments are first represented as high-dimensional semantic audio vectors. Our proposed Soundscape-to-Image Diffusion model, which contains a Low-Resolution Diffusion Model and a Super-Resolution Diffusion Model, can then translate those semantic audio vectors into visual representations of place effectively. We evaluated our proposed model by using both machine-based and human-centered approaches. We proved that the generated street view images align with our common perceptions, and accurately create several key street elements of the original soundscapes. It also demonstrates that soundscapes provide sufficient visual information of places. This study stands at the forefront of the intersection between generative AI and human geography, demonstrating how human multi-sensory experiences can be linked. We aim to enrich geospatial data science and AI studies with human experiences. It has the potential to inform multiple domains such as human geography, environmental psychology, and urban design and planning, as well as advancing our knowledge of human-environment relationships.

Place identity: a generative AI’s perspective

Kee Moon Jang, Junda Chen, Yuhao Kang, Junghwan Kim, Jinhyung Lee, Fabio Duarte & Carlo Ratti
Humanities and Social Sciences Communications  Published: 07 September 2024
DOI:https://doi.org/10.1057/s41599-024-03645-7

Abstract

Do cities have a collective identity? The latest advancements in generative artificial intelligence (AI) models have enabled the creation of realistic representations learned from vast amounts of data. In this study, we test the potential of generative AI as the source of textual and visual information in capturing the place identity of cities assessed by filtered descriptions and images. We asked questions on the place identity of 64 global cities to two generative AI models, ChatGPT and DALL·E2. Furthermore, given the ethical concerns surrounding the trustworthiness of generative AI, we examined whether the results were consistent with real urban settings. In particular, we measured similarity between text and image outputs with Wikipedia data and images searched from Google, respectively, and compared across cases to identify how unique the generated outputs were for each city. Our results indicate that generative models have the potential to capture the salient characteristics of cities that make them distinguishable. This study is among the first attempts to explore the capabilities of generative AI in simulating the built environment in regard to place-specific meanings. It contributes to urban design and geography literature by fostering research opportunities with generative AI and discussing potential limitations for future studies.
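
The text-similarity comparison described in this abstract can be illustrated with the rough sketch below. It scores a generated city description against reference descriptions using bag-of-words cosine similarity, then checks whether each city's own reference is its closest match, as a simple proxy for distinctiveness. The similarity measure, the function names `cosine_similarity` and `best_match`, and the toy strings are all assumptions for illustration; the abstract does not state which measure the authors used.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


def best_match(generated_text, references):
    """Return the key of the reference text most similar to generated_text."""
    return max(
        references,
        key=lambda city: cosine_similarity(generated_text, references[city]),
    )


# Made-up placeholder texts; these are not outputs from the study.
generated = {
    "city_a": "canals and bridges along the water",
    "city_b": "skyscrapers and neon lights at night",
}
references = {
    "city_a": "a city of canals, bridges and water",
    "city_b": "dense skyscrapers with bright neon lights",
}

for city, text in generated.items():
    # A description counts as distinctive here if its own reference wins.
    print(city, "->", best_match(text, references))
```

In this toy setup, a generated description that matches its own city's reference more closely than any other city's reference is treated as capturing something distinguishable about that city.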
