Skip to content

Implement NASA flight data loader#696

Open
ThreeMonth03 wants to merge 1 commit intosolvcon:masterfrom
ThreeMonth03:data_processing
Open

Implement NASA flight data loader#696
ThreeMonth03 wants to merge 1 commit intosolvcon:masterfrom
ThreeMonth03:data_processing

Conversation

@ThreeMonth03
Copy link
Collaborator

I implement the data loader of NASA flight dataset to help us finish the future work in issue #685. As for the usage of data loader, you can see the following command and understand how to get data from a iterator.

(base) threemonth03@threemonthwin:~/Downloads/modmesh$ python3 modmesh/track/load.py
Loaded 120791 events.
Event 0:
  timestamp=1.60259601021904e+18
  source=ground_truth
  data=GroundTruthData(con_pos_in_ecef=array([-1387897.36558835, -5268929.31643981,  3306577.65409484]), con_vel_in_ecef=array([ 0.00220006, -0.0081095 ,  0.00414972]), quat_con_ecef=array([-0.45840088, -0.1767584 ,  0.51147502,  0.70499532]))

Event 1:
  timestamp=1.60259601022904e+18
  source=ground_truth
  data=GroundTruthData(con_pos_in_ecef=array([-1387897.36564722, -5268929.31643027,  3306577.65410693]), con_vel_in_ecef=array([ 0.00197762, -0.00815516,  0.00414352]), quat_con_ecef=array([-0.45840118, -0.17675819,  0.51147449,  0.70499557]))

Event 2:
  timestamp=1.6025960102293e+18
  source=imu
  data=IMUData(dvel_in_imu=array([-0.18792725, -0.00048828, -0.04785156]), dangle_in_imu=array([-1.90734863e-06,  3.81469727e-06,  1.90734863e-06]))

Event 3:
  timestamp=1.60259601023904e+18
  source=ground_truth
  data=GroundTruthData(con_pos_in_ecef=array([-1387897.36570593, -5268929.31642279,  3306577.65411988]), con_vel_in_ecef=array([ 0.00190321, -0.00814863,  0.00428433]), quat_con_ecef=array([-0.45840138, -0.17675799,  0.51147416,  0.70499573]))

Event 4:
  timestamp=1.60259601024904e+18
  source=ground_truth
  data=GroundTruthData(con_pos_in_ecef=array([-1387897.36576642, -5268929.31641798,  3306577.65413258]), con_vel_in_ecef=array([ 0.00201493, -0.00820102,  0.00423451]), quat_con_ecef=array([-0.45840157, -0.17675801,  0.51147413,  0.70499562]))

Copy link
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yungyuc Please review this pull request when you are available. Thanks.

Comment on lines +33 to +58
@dataclass
class GroundTruthData:
"""
Data structure for ground truth data.
"""
con_pos_in_ecef: np.ndarray
con_vel_in_ecef: np.ndarray
quat_con_ecef: np.ndarray


@dataclass
class IMUData:
"""
Data structure for IMU data.
"""
dvel_in_imu: np.ndarray
dangle_in_imu: np.ndarray


@dataclass
class LidarData:
"""
Data structure for Lidar data.
"""
range_m: np.ndarray
doppler_speed_mps: np.ndarray
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data structure from different sensors.

Comment on lines +61 to +69
@dataclass(order=True)
class SensorEvent:
"""
A generic sensor event with a timestamp,
source identifier, and data payload.
"""
timestamp: float
source: str = field(compare=False)
data: Any = field(compare=False)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A generic sensor event with a timestamp,
source identifier, and data payload.

Comment on lines +99 to +108
def load(self):
"""
Load imu, lidar, and ground-truth data,
then sort events by timestamp.
"""
self.load_imu()
self.load_lidar()
self.load_ground_truth()
self.events.sort()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When calling this function, this function will load data from different files, then sort the data based on the order of timestamps.

Comment on lines +221 to +232
def main():
dataset = FlightDataset()
dataset.load()
print(f"Loaded {len(dataset)} events.")
it = dataset.iter_events()
for i in range(5):
event = next(it)
print(f"Event {i}:")
print(f" timestamp={event.timestamp}")
print(f" source={event.source}")
print(f" data={event.data}")
print()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple usage to load dataset by iterator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you ever need to have this CLI in the first place? It looks like just an example. If you only need an example, why don't you just create unit tests?

Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was already a script track/dataset.py, and this PR adds a new one. They are not integrated while they should be. It is also questionable that why do you need CLI for load.py in this PR.

Points to address:

  • Clarify why the CLI is ever needed.
  • Add unit tests. The code does not look like friendly to testing. You may need to think about how to restructure to make pretty testing.
  • Do not directly import module content.
  • Clarify if it is better to make FlightDataset an iterator.
  • Name mangling is not necessary. Use _load_text() instead.

The code quality is not very good. We could need more discussions.

from dataclasses import dataclass, field
import numpy as np
from pathlib import Path
from typing import Any, Iterable, Optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not directly import module content. That is, do not from ... import .... Use only import ... for all modules and entities.

import typing, and then consume the contents like typing.Any, instead of from typing import Any; Any. The former removes the module name at the consuming site.

dataset = FlightDataset()
dataset.load()
print(f"Loaded {len(dataset)} events.")
it = dataset.iter_events()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is preventing you from using enumerate()?

for i, event in enumerate(dataset):
  if i > 5:
    break

Making dataset an iterator is more straigh-forward.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are around 130000 event data points, I'm not sure whether it is a good idea without iterator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enumerate() uses the iterator protocol as well.

data: Any = field(compare=False)


class FlightDataset:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a good idea for high-performance computing to rely on looping in Python. As a prototype, it's fine. But eventually you will need to get this thing into C++ for speed and saving memory.

"""
return self.events[idx]

def iter_events(self, sources: Optional[str | Iterable[str]] = None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typing information in Python will get in your way when moving the code into C++.

Let Python be dynamic. If you need typing information, it is a sign that the code needs to be in C++ soon.

Iterate over events of one or more sources.

:param sources: Event source(s). If None, iterate over all events.
Sources can be a single source string or an iterable of source strings.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sphinx docstring format is off.

"""
if sources is None:
yield from self.events
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you return after yeild? Is it ever reaching return?

Load imu data.
"""
data = self.__load_text(self.imu_csv)
for row in data:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be very slow compare to C++. I would guess 20-100x.

)
self.events.append(event)

def __load_text(self, path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name mangling is not necessary. Use _load_text() instead.

Comment on lines +221 to +232
def main():
dataset = FlightDataset()
dataset.load()
print(f"Loaded {len(dataset)} events.")
it = dataset.iter_events()
for i in range(5):
event = next(it)
print(f"Event {i}:")
print(f" timestamp={event.timestamp}")
print(f" source={event.source}")
print(f" data={event.data}")
print()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you ever need to have this CLI in the first place? It looks like just an example. If you only need an example, why don't you just create unit tests?

@yungyuc yungyuc changed the title Implement NASA flight data loader. Implement NASA flight data loader Mar 18, 2026
@yungyuc
Copy link
Member

yungyuc commented Mar 20, 2026

@ThreeMonth03 update?

@ThreeMonth03 ThreeMonth03 force-pushed the data_processing branch 2 times, most recently from 6fb857f to 0ef7fc3 Compare March 20, 2026 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants