Implement NASA flight data loader#696
Conversation
ThreeMonth03
left a comment
There was a problem hiding this comment.
@yungyuc Please review this pull request when you are available. Thanks.
modmesh/track/load.py
Outdated
| @dataclass | ||
| class GroundTruthData: | ||
| """ | ||
| Data structure for ground truth data. | ||
| """ | ||
| con_pos_in_ecef: np.ndarray | ||
| con_vel_in_ecef: np.ndarray | ||
| quat_con_ecef: np.ndarray | ||
|
|
||
|
|
||
| @dataclass | ||
| class IMUData: | ||
| """ | ||
| Data structure for IMU data. | ||
| """ | ||
| dvel_in_imu: np.ndarray | ||
| dangle_in_imu: np.ndarray | ||
|
|
||
|
|
||
| @dataclass | ||
| class LidarData: | ||
| """ | ||
| Data structure for Lidar data. | ||
| """ | ||
| range_m: np.ndarray | ||
| doppler_speed_mps: np.ndarray |
There was a problem hiding this comment.
Data structure from different sensors.
modmesh/track/load.py
Outdated
| @dataclass(order=True) | ||
| class SensorEvent: | ||
| """ | ||
| A generic sensor event with a timestamp, | ||
| source identifier, and data payload. | ||
| """ | ||
| timestamp: float | ||
| source: str = field(compare=False) | ||
| data: Any = field(compare=False) |
There was a problem hiding this comment.
A generic sensor event with a timestamp,
source identifier, and data payload.
modmesh/track/load.py
Outdated
| def load(self): | ||
| """ | ||
| Load imu, lidar, and ground-truth data, | ||
| then sort events by timestamp. | ||
| """ | ||
| self.load_imu() | ||
| self.load_lidar() | ||
| self.load_ground_truth() | ||
| self.events.sort() | ||
|
|
There was a problem hiding this comment.
When calling this function, this function will load data from different files, then sort the data based on the order of timestamps.
| def main(): | ||
| dataset = FlightDataset() | ||
| dataset.load() | ||
| print(f"Loaded {len(dataset)} events.") | ||
| it = dataset.iter_events() | ||
| for i in range(5): | ||
| event = next(it) | ||
| print(f"Event {i}:") | ||
| print(f" timestamp={event.timestamp}") | ||
| print(f" source={event.source}") | ||
| print(f" data={event.data}") | ||
| print() |
There was a problem hiding this comment.
A simple usage to load dataset by iterator.
There was a problem hiding this comment.
Why do you ever need to have this CLI in the first place? It looks like just an example. If you only need an example, why don't you just create unit tests?
yungyuc
left a comment
There was a problem hiding this comment.
There was already a script track/dataset.py, and this PR adds a new one. They are not integrated while they should be. It is also questionable that why do you need CLI for load.py in this PR.
Points to address:
- Clarify why the CLI is ever needed.
- Add unit tests. The code does not look like friendly to testing. You may need to think about how to restructure to make pretty testing.
- Do not directly import module content.
- Clarify if it is better to make
FlightDatasetan iterator. - Name mangling is not necessary. Use
_load_text()instead.
The code quality is not very good. We could need more discussions.
modmesh/track/load.py
Outdated
| from dataclasses import dataclass, field | ||
| import numpy as np | ||
| from pathlib import Path | ||
| from typing import Any, Iterable, Optional |
There was a problem hiding this comment.
Do not directly import module content. That is, do not from ... import .... Use only import ... for all modules and entities.
import typing, and then consume the contents like typing.Any, instead of from typing import Any; Any. The former removes the module name at the consuming site.
modmesh/track/load.py
Outdated
| dataset = FlightDataset() | ||
| dataset.load() | ||
| print(f"Loaded {len(dataset)} events.") | ||
| it = dataset.iter_events() |
There was a problem hiding this comment.
What is preventing you from using enumerate()?
for i, event in enumerate(dataset):
if i > 5:
breakMaking dataset an iterator is more straigh-forward.
There was a problem hiding this comment.
There are around 130000 event data points, I'm not sure whether it is a good idea without iterator.
There was a problem hiding this comment.
enumerate() uses the iterator protocol as well.
| data: Any = field(compare=False) | ||
|
|
||
|
|
||
| class FlightDataset: |
There was a problem hiding this comment.
It is not a good idea for high-performance computing to rely on looping in Python. As a prototype, it's fine. But eventually you will need to get this thing into C++ for speed and saving memory.
modmesh/track/load.py
Outdated
| """ | ||
| return self.events[idx] | ||
|
|
||
| def iter_events(self, sources: Optional[str | Iterable[str]] = None): |
There was a problem hiding this comment.
Typing information in Python will get in your way when moving the code into C++.
Let Python be dynamic. If you need typing information, it is a sign that the code needs to be in C++ soon.
modmesh/track/load.py
Outdated
| Iterate over events of one or more sources. | ||
|
|
||
| :param sources: Event source(s). If None, iterate over all events. | ||
| Sources can be a single source string or an iterable of source strings. |
There was a problem hiding this comment.
Sphinx docstring format is off.
modmesh/track/load.py
Outdated
| """ | ||
| if sources is None: | ||
| yield from self.events | ||
| return |
There was a problem hiding this comment.
Why do you return after yeild? Is it ever reaching return?
modmesh/track/load.py
Outdated
| Load imu data. | ||
| """ | ||
| data = self.__load_text(self.imu_csv) | ||
| for row in data: |
There was a problem hiding this comment.
May be very slow compare to C++. I would guess 20-100x.
modmesh/track/load.py
Outdated
| ) | ||
| self.events.append(event) | ||
|
|
||
| def __load_text(self, path): |
There was a problem hiding this comment.
Name mangling is not necessary. Use _load_text() instead.
| def main(): | ||
| dataset = FlightDataset() | ||
| dataset.load() | ||
| print(f"Loaded {len(dataset)} events.") | ||
| it = dataset.iter_events() | ||
| for i in range(5): | ||
| event = next(it) | ||
| print(f"Event {i}:") | ||
| print(f" timestamp={event.timestamp}") | ||
| print(f" source={event.source}") | ||
| print(f" data={event.data}") | ||
| print() |
There was a problem hiding this comment.
Why do you ever need to have this CLI in the first place? It looks like just an example. If you only need an example, why don't you just create unit tests?
|
@ThreeMonth03 update? |
6fb857f to
0ef7fc3
Compare
0ef7fc3 to
022c53d
Compare
I implement the data loader of NASA flight dataset to help us finish the future work in issue #685. As for the usage of data loader, you can see the following command and understand how to get data from a iterator.
(base) threemonth03@threemonthwin:~/Downloads/modmesh$ python3 modmesh/track/load.py Loaded 120791 events. Event 0: timestamp=1.60259601021904e+18 source=ground_truth data=GroundTruthData(con_pos_in_ecef=array([-1387897.36558835, -5268929.31643981, 3306577.65409484]), con_vel_in_ecef=array([ 0.00220006, -0.0081095 , 0.00414972]), quat_con_ecef=array([-0.45840088, -0.1767584 , 0.51147502, 0.70499532])) Event 1: timestamp=1.60259601022904e+18 source=ground_truth data=GroundTruthData(con_pos_in_ecef=array([-1387897.36564722, -5268929.31643027, 3306577.65410693]), con_vel_in_ecef=array([ 0.00197762, -0.00815516, 0.00414352]), quat_con_ecef=array([-0.45840118, -0.17675819, 0.51147449, 0.70499557])) Event 2: timestamp=1.6025960102293e+18 source=imu data=IMUData(dvel_in_imu=array([-0.18792725, -0.00048828, -0.04785156]), dangle_in_imu=array([-1.90734863e-06, 3.81469727e-06, 1.90734863e-06])) Event 3: timestamp=1.60259601023904e+18 source=ground_truth data=GroundTruthData(con_pos_in_ecef=array([-1387897.36570593, -5268929.31642279, 3306577.65411988]), con_vel_in_ecef=array([ 0.00190321, -0.00814863, 0.00428433]), quat_con_ecef=array([-0.45840138, -0.17675799, 0.51147416, 0.70499573])) Event 4: timestamp=1.60259601024904e+18 source=ground_truth data=GroundTruthData(con_pos_in_ecef=array([-1387897.36576642, -5268929.31641798, 3306577.65413258]), con_vel_in_ecef=array([ 0.00201493, -0.00820102, 0.00423451]), quat_con_ecef=array([-0.45840157, -0.17675801, 0.51147413, 0.70499562]))