Doc example standard analysis reproducibility #14
NEStock left a comment
Looks good and page works as expected!
| @@ -0,0 +1,116 @@ | |||
| # Example Data Standardization | |||
I would suggest making this title a bit more descriptive, maybe "Example Workflow for Data Standardization and Analysis Reproducibility".
| Here we document an example of how we took processed data stored in a typical file format many researchers work with and converted that to the NWB file format (community standard accepted by EMBER). | |||
Might be nice to make this a bulleted list to highlight the goals being:
- Standardize the data in a community-accepted standard (NWB)
- Enable reproducibility of paper figures as a starting point for secondary analyses
That way the two goals are presented in parallel, rather than analysis reproducibility reading as just a secondary effort.
| In collaboration with Dr. Suthana and Dr. Seeber (lead author), we explored each of the data variables in the original .mat files and identified analogous containers within the NWB file structure. | ||
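The variable-by-variable review described in the line above can be pictured as a simple audit: list the variables found in a .mat file and record which NWB container each one is assigned to, flagging anything still undecided. The sketch below is only illustrative. The variable names and the pairings are hypothetical (this PR does not list the actual .mat fields); `TimeSeries`, `DynamicTable`, and `Subject` are real NWB container types, but whether they were the ones chosen here is an assumption.

```python
# Hypothetical .mat variable names mapped to candidate NWB container types.
# The pairings are illustrative assumptions, not the mapping used in the PR.
CANDIDATE_CONTAINERS = {
    "lfp_power": "TimeSeries",
    "trial_info": "DynamicTable",
    "subject_ids": "Subject",
}


def audit_mapping(mat_variables, mapping):
    """Split variable names into those with a chosen container and those without."""
    mapped = {name: mapping[name] for name in mat_variables if name in mapping}
    unmapped = [name for name in mat_variables if name not in mapping]
    return mapped, unmapped


# Hypothetical list of variables found in one .mat file.
mat_variables = ["lfp_power", "trial_info", "group_average"]
mapped, unmapped = audit_mapping(mat_variables, CANDIDATE_CONTAINERS)

for name, container in mapped.items():
    print(f"{name} -> {container}")
for name in unmapped:
    print(f"{name} -> no container chosen yet")
```

Variables that end up in the unmapped list are the ones that would need discussion with the data authors before conversion.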
| Most of the data variables are relevant to multiple subjects at the same time, as is often the case for data representing group averages in paper figures. |
Maybe add an "(e.g., ...)" after "...relevant to multiple subjects at the same time". That would give a more concrete example of what this means, so that a reader understands when ndx-multisubjects should be used. (I think "at the same time" is the key part here?)
| As mentioned above, the first step towards enabling robust secondary analyses is to replicate publication figures or analyses produced with the original file format. | ||
| To continue towards this effort, the following next steps are outlined: |
Maybe we can motivate some of these next steps with one sentence here? Why do we need/want to do this with the raw data as well?
| We also show how we are able to replicate figure results from a paper using the converted data. | ||
| In doing this exercise, we complete a pipeline that is key for any datasets that are uploaded to EMBER. It is important that not only is the data standardized for improved storage and metadata retrieval, but that the standardized data can also be used for secondary processing and analyses. Reproducing key figures is the first such verification step towards ensuring that open datasets can be repurposed for new scientific endeavours. | ||
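One way to picture the verification step mentioned above is to recompute a summary statistic from both the original and the converted data and confirm the two agree within a tolerance. The sketch below is a minimal illustration using only the Python standard library. The nested-list layout, the sample values, the `group_average` and `matches_original` helpers, and the tolerance are all assumptions made for the example, not the method used in the notebook for this PR.

```python
import math
from statistics import fmean


def group_average(values_per_subject):
    """Average each subject's values, then average those means across subjects."""
    return fmean([fmean(values) for values in values_per_subject])


def matches_original(original, converted, rel_tol=1e-9):
    """Return True when both datasets yield the same group average within rel_tol."""
    return math.isclose(
        group_average(original), group_average(converted), rel_tol=rel_tol
    )


# Hypothetical per-subject values read from the original and converted files.
original = [[1.0, 2.0, 3.0], [2.0, 4.0]]
converted = [[1.0, 2.0, 3.0], [2.0, 4.0]]
perturbed = [[1.0, 2.0, 3.5], [2.0, 4.0]]

print(matches_original(original, converted))
print(matches_original(original, perturbed))
```

A check like this only confirms that one summary value survives conversion; reproducing a full figure would still require comparing every plotted quantity.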
It might be good to add a link to the data conversion code (the Python script that converts the .mat files to .nwb) and a link to the Jupyter notebook that reproduces the analysis. We can also mention that we plan to release the standardized version of the dataset in the future.
Thanks for putting this together! I left a few minor comments throughout the file. Feel free to look through and address the ones you think are critical and then get this merged into |
No description provided.