AudioSetCaps|音频-语言多模态数据集|多模态数据数据集
收藏AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models
数据集概述
- 数据来源: AudioSet, YouTube-8M, VGGSound
- 音频文件数量: 6,117,099个10秒音频文件
- Q&A数据对数量: 18,414,789对
数据集内容
- 音频描述: 每个音频文件附有一个描述性标题。
- Q&A对: 每个音频文件附有三个Q&A对,作为生成最终标题的元数据。
示例
| ID | 音频 | 描述 | Q&A 1 | Q&A 2 | Q&A 3 |
|---|---|---|---|---|---|
| _7Xe9vD3Hpg_4_10 | <audio controls><source src="Example /_7Xe9vD3Hpg_4_10.mp3" type="audio/mpeg"> Your browser does not support the audio element.</audio> | A solemn instrumental piece unfolds, featuring the melancholic strains of a cello and the resonant tolling of a bell. The initial tempo is slow and deliberate, gradually building intensity with each successive bell ring. | Question: Describe this audio according to the sounds in it. Answer: The music starts with a slow string melody and continues with a bass note. The sound of a bell rings and the music becomes more intense. | Question: Based on the QAs, give some information about the speech, such as the emotion of the speaker, the gender of the speaker, and the spoken language, only if speech is present in this audio. Answer: Im sorry, but there is no speech in the audio. | Question: Based on the QAs, give some information about the music, such as music genre and music instruments, only if music is present in this audio. Answer: The music genre is instrumental. The music instruments are the cello and the bell. |
| -TL8Mp3xcUM_0_10 | <audio controls><source src="Example/-TL8Mp3xcUM_0_10.mp3" type="audio/mpeg"> Your browser does not support the audio element.</audio> | A woman expresses strong emotions with a noticeably high-pitched vocal tone. | Question: Describe this audio according to the sounds in it. Answer: A woman speaks with a high-pitched voice. | Question: Based on the QAs, give some information about the speech, such as the emotion of the speaker, the gender of the speaker, and the spoken language, only if speech is present in this audio. Answer: The speech is emotional, as the woman speaks in a high-pitched voice. | Question: Based on the QAs, give some information about the music, such as music genre and music instruments, only if music is present in this audio. Answer: There is no music in this audio. |
数据统计
| 数据集 | 音频描述数量 | Q&A描述数量 | 总计 |
|---|---|---|---|
| AudioSetCaps | 1910920 | 5736072 | 7646992 |
| YouTube-8M | 4023990 | 12086037 | 16110027 |
| VGGSound | 182189 | 592680 | 774869 |
| 总计 | 6117099 | 18414789 | 24531888 |
下载
许可证
- 仅允许学术用途
引用
bibtex @inproceedings{ bai2024audiosetcaps, title={AudioSetCaps: Enriched Audio Captioning Dataset Generation Using Large Audio Language Models}, author={JISHENG BAI and Haohe Liu and Mou Wang and Dongyuan Shi and Wenwu Wang and Mark D Plumbley and Woon-Seng Gan and Jianfeng Chen}, booktitle={Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation}, year={2024}, url={https://openreview.net/forum?id=uez4PMZwzP} }

波士顿房价数据集
波士顿房价数据集是一个经典的机器学习数据集,通常用于回归任务,尤其是房价预测。下方文档中有所有字段顺序的描述。
阿里云天池 收录
中国交通事故深度调查(CIDAS)数据集
交通事故深度调查数据通过采用科学系统方法现场调查中国道路上实际发生交通事故相关的道路环境、道路交通行为、车辆损坏、人员损伤信息,以探究碰撞事故中车损和人伤机理。目前已积累深度调查事故10000余例,单个案例信息包含人、车 、路和环境多维信息组成的3000多个字段。该数据集可作为深入分析中国道路交通事故工况特征,探索事故预防和损伤防护措施的关键数据源,为制定汽车安全法规和标准、完善汽车测评试验规程、
北方大数据交易中心 收录
TongueDx Dataset
TongueDx数据集是一个专为远程舌诊研究设计的综合性舌象图像数据集,由香港理工大学和新加坡管理大学的研究团队创建。该数据集包含5109张图像,涵盖了多种环境条件下的舌象,图像通过智能手机和笔记本电脑摄像头采集,具有较高的多样性和代表性。数据集不仅包含舌象图像,还提供了详细的舌面属性标注,如舌色、舌苔厚度等,并附有受试者的年龄、性别等人口统计信息。数据集的创建过程包括图像采集、舌象分割、标准化处理和多标签标注,旨在解决远程医疗中舌诊图像质量不一致的问题。该数据集的应用领域主要集中在远程医疗和中医诊断,旨在通过自动化技术提高舌诊的准确性和可靠性。
arXiv 收录
中国近海台风路径集合数据集(1945-2024)
1945-2024年度,中国近海台风路径数据集,包含每个台风的真实路径信息、台风强度、气压、中心风速、移动速度、移动方向。 数据源为获取温州台风网(http://www.wztf121.com/)的真实观测路径数据,经过处理整合后形成文件,如使用csv文件需使用文本编辑器打开浏览,否则会出现乱码,如要使用excel查看数据,请使用xlsx的格式。
国家海洋科学数据中心 收录
TCIA: The Cancer Imaging Archive
TCIA: The Cancer Imaging Archive 是一个公开的癌症影像数据库,包含多种癌症类型的影像数据,如乳腺癌、肺癌、脑癌等。数据集还包括相关的临床数据和生物标记物信息,旨在支持癌症研究和临床应用。
www.cancerimagingarchive.net 收录
