
Vision Based Navigation Datasets | Space Navigation Dataset | Machine Learning Dataset

arXiv | Updated 2024-09-18 | Indexed 2024-09-19
Space navigation
Machine learning
Download link:
http://arxiv.org/abs/2409.11383v1
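Overview:
Vision Based Navigation Datasets is a collection of datasets created under the lead of the European Space Agency (ESA) with the participation of Airbus Defence and Space. It is intended to support vision-based navigation in space applications. The collection comprises several subsets covering scenarios that range from lunar landing to satellite docking, and totals more than 130,000 records. The data combines real images, laboratory simulations, and synthetic images, so it draws on multiple sources and is of high quality. The datasets are used mainly to train machine learning algorithms, particularly for pose estimation and optical flow, to address precise control in spacecraft navigation.
Provider:
European Space Agency
Created:
2024-09-18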
AI-collected summary
Dataset introduction
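Construction
Vision Based Navigation Datasets is built from several data sources: real images, laboratory simulations, and synthetic images. First, navigation camera images from the Chang’e 3 lander serve as the base data; they are formatted to the PDS standard, and an estimated trajectory is reconstructed from them. Second, the SurRender software is used for high-fidelity image simulation, generating synthetic data from multi-resolution terrain models and metadata. In addition, laboratory simulations from the DLR TRON facility and experimental data from the Airbus Robotic laboratory are included to ensure diversity and realism. Finally, a generative adversarial network (GAN) converts low-resolution synthetic images into high-resolution images, which further enriches the dataset.
Features
The distinguishing features of Vision Based Navigation Datasets are the diversity of its data sources and its high fidelity. It contains real Chang’e 3 images as well as laboratory simulations and synthetic images, so the data covers several dimensions. The high-fidelity images simulated with SurRender, combined with multi-resolution terrain models and metadata, make the data highly consistent in its visual and physical properties. The use of a generative adversarial network (GAN) further improves image quality, which makes the dataset more suitable and more accurate for training machine learning algorithms.
Usage
Vision Based Navigation Datasets can be used in several ways to train and validate vision-based navigation algorithms. First, it can be used directly to train pose estimation algorithms based on convolutional neural networks (CNNs), whose performance is evaluated by comparing the predicted heatmaps with the ground truth. Second, it can be used to train dense optical flow algorithms such as RAFT, whose performance on the different datasets is evaluated with metrics such as the optical flow endpoint error (EPE). Third, it can be used to train generative adversarial networks (GANs) that convert low-resolution synthetic images into high-resolution images, improving image quality and the diversity of the dataset.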
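The endpoint error mentioned above is conventionally the mean Euclidean distance between predicted and ground-truth flow vectors. The following is a minimal sketch of that metric, not the evaluation code used by the dataset authors. The function name `endpoint_error` and the nested-list layout of the flow fields are assumptions made for illustration; the page does not describe the dataset's actual file format.

```python
import math
from typing import Sequence, Tuple

# Assumed layout (hypothetical): an H x W grid of (u, v) displacement vectors.
Flow = Sequence[Sequence[Tuple[float, float]]]


def endpoint_error(pred: Flow, truth: Flow) -> float:
    """Return the mean Euclidean distance between two flow fields."""
    if len(pred) != len(truth):
        raise ValueError("flow fields must have the same number of rows")
    total = 0.0
    count = 0
    for pred_row, truth_row in zip(pred, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("flow fields must have the same number of columns")
        for (pu, pv), (tu, tv) in zip(pred_row, truth_row):
            # Per-pixel error: length of the difference vector.
            total += math.hypot(pu - tu, pv - tv)
            count += 1
    if count == 0:
        raise ValueError("flow fields must not be empty")
    return total / count


if __name__ == "__main__":
    predicted = [[(1.0, 0.0), (0.0, 0.0)]]
    reference = [[(1.0, 0.0), (3.0, 4.0)]]
    print(endpoint_error(predicted, reference))
```

In this toy case the first pixel matches exactly and the second is off by a vector of length 5, so the mean error is 2.5 pixels. A real evaluation would typically run on array data and might mask invalid pixels, which this sketch leaves out.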
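Background and challenges
Background overview
Vision Based Navigation Datasets was developed by Airbus Defence and Space in cooperation with the European Space Agency (ESA) to address key problems of vision-based navigation (VBN) in space applications. The project ran from June 2022 to December 2023. The main researchers include Jérémy Lebreton and Ingo Ahrns, among others, and the participating organizations include Airbus Toulouse, Airbus Bremen, and DLR. The core research question is how to generate training datasets suitable for machine learning algorithms in order to validate and improve the performance of VBN algorithms. The creation of the dataset advances the use of machine learning in the space sector and provides technical support for future space missions.
Current challenges
Several challenges arose in building Vision Based Navigation Datasets. First, generating high-quality synthetic datasets requires accurate simulation tools and complex image processing, such as the SurRender software. Second, ensuring that synthetic data is accurate and consistent with real data is a major difficulty, especially in complex scenarios such as lunar landing and satellite docking. Third, diversity and coverage are a challenge, because the data must span different illumination conditions, viewpoints, and dynamic environments. Finally, a current research focus is how to use techniques such as generative adversarial networks (GANs) to improve the quality and realism of the dataset while remaining computationally efficient.
Common scenarios
Classic use cases
Vision Based Navigation Datasets is widely used in the space sector to train machine learning algorithms, particularly for vision-based guidance, navigation and control (GNC). Its classic use cases comprise two main scenarios: in-orbit satellite docking, using simulated data of the ENVISAT satellite, and lunar landing, using real Chang’e 3 images together with synthetic data. The datasets are generated with the high-fidelity image simulator SurRender and combine real and synthetic images to ensure that the training data is diverse and accurate.
Derived and related work
The development and use of Vision Based Navigation Datasets has led to a series of related research work. For example, researchers have used the dataset to develop several deep learning models for pose estimation and optical flow computation. In addition, the application of generative adversarial networks (GANs) to the dataset shows how low-resolution synthetic images can be converted into high-resolution realistic images, which further improves the quality and scope of the dataset. These derived works enrich research in vision-based navigation and provide a technical foundation for future space missions.
Recent research on the dataset
Latest research directions
In vision-based navigation, the latest research on Vision Based Navigation Datasets focuses on combining synthetic and real data to improve the performance of machine learning algorithms in space applications. By generating high-fidelity synthetic images and metadata, and by combining generative adversarial networks (GANs) with model-capture techniques, the research team aims to address the shortage of datasets for space navigation. This research advances vision-based navigation algorithms and lays a foundation for autonomous navigation in future space missions.
Related research papers
  • 1. Training Datasets Generation for Machine Learning: Application to Vision Based Navigation. European Space Agency · 2024
The content above was collected and summarized by AI.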