-
Notifications
You must be signed in to change notification settings - Fork 2
[Add] Initial HealthTable setup and support #30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #30 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 3 6 +3
Lines 42 162 +120
==========================================
+ Hits 42 162 +120 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really great so far -- lots of terrific ideas! I'll be able to be more hands-on this week but this is exactly what I was looking for. Great start. Eager to keep things going! Let me know if you have any follow-up questions as we continue iteration.
Oh! And don't forget to add tests -- we can worry about that towards the end. :)
Keep up the great work @kosuri-indu!
…ase.jl into health-table
|
Hey @TheCedarPrince, added initial changes and basic tests (will follow up with standard ones). But I ran into an error: OMOPCDM_VERSIONS not defined in OMOPCommonDataModel, likely a dependency or import issue. Also noting: precompiling throws an error when using Tables in @HealthBase.jl, that also seems related to dependencies. I think I might need help regarding the dependencies actually.. |
|
Hey @TheCedarPrince, So i was working on implementing the HealthTable struct, the Tables.jl interface, and related OMOP CDM utility functions (like one-hot encoding, high-cardinality handling, and concept mapping), as discussed in the docs. I had some doubts on where these definitions and implementations to live?
Let me know when you get a chance! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a bunch of comments and really like where this PR is going! Thanks for the hard wor k so far. Keep it up! I have some more thoughts I'll share in a moment too about your questions.
|
Hey @kosuri-indu, here are my thoughts about your questions
After a bunch of consideration with regards to 1, the more and more I thought about it, the more the question came to mind of "What should HealthBase actually own?" This ended up causing me to go down a rabbit hole of best practices across the Julia ecosystem about how people are handling extensions and putting things where. After consideration, here's my conclusion: we should move a majority of the functionality from the extension here into OMOPCommonDataModel.jl. The reason why I came to that conclusion is that as we keep pushing more and more stuff into HealthBase, we run the risk of making HealthBase less light-weight and more brittle. I would rather OMOPCommonDataModel.jl have DataFrames dependencies and other associated processing. As I read on best practices, extensions really seem more geared to smaller things. Here, we are introducing a lot of functionality that could really go into OMOPCommonDataModel.jl. With that said, for now, keep developing in this repo and this PR! What we will do is merge it into HealthBase when ready and then excise out these utilities into OMOPCommonDataModel.jl. There is some additional prep I have to do within OMOPCommonDataModel.jl anyhow to make things ready for this overhaul. Plus, I would rather have something that works that may not be best practice but is functional. For 2, in a similar way to 1, keep working on developing these features on this repo, and then we will merge them into HealthBase. After that, we can excise and move these functions to OMOPCommonDataModel.jl. I think it just makes more thematic sense anyhow. When this PR is merged, then we can worry about what should stay and what should go to OMOPCommonDataModel.jl. 😃 Great work so far though @kosuri-indu! I am really excited about how this is developing and shaping up! |
|
Also, regarding your dependency question: it seems like you didn't explicitly add Tables.jl as a dependency. Could you add it in? pkg> add TablesDoes that work to fix it? Also, don't forget to add compat entries for Tables.jl with PackageCompatUI.jl :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like how this PR is shaping up! I think the test suite in particular looks fantastic; good questions about documentation as well. Let me know if you have any concerns or questions!
I will say one additional thing: at this point, I am not sure how to actually test using your PR. Could you try adding a small quickstart guide so I could try it out? Plus, that could go into documentation or docstrings. I want to test how it feels/works.
Thanks and keep up the great work Indu!
|
Hey @TheCedarPrince, got it! I am currently working on the preprocessing methods, I will finish that soon and then I’ll set up the complete test guide with the above changes within the next two days. |
|
Hey @TheCedarPrince, Just a quick update:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Indu, thanks for all the hard work so far! I think we are getting well-along with this PR. I left some thoughts as well related to our discussion on Friday. I know we had found some issues on Friday and I also left some thoughts here which might be a bit challenging. If you have any questions or concerns, please don't hesitate to ask and I will be here to support!
~ tcp 🌳
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Indu,
Finished doing another review here. This is excellent! I only had a few comments left specifically on the PR alongside some general comments for future work. Here are my general notes:
Quickstart notes
- This quickstart is FANTASTIC! I followed it perfectly and it was super straightforward to use!
HealthTable Interface
-
I noticed with HealthTable you are still returning a tuple with the table and version information. Could you attach the version information as metadata to the dataframe itself (i.e.
source)? -
The
wrong_dferror message looks great so far! -
This is fantastic!
julia> ht = HealthTable(wrong_df, omop_cdm_version="v5.4.1", disable_type_e
nforcement=true)
┌ Warning: OMOP CDM type validation failed for the following columns:
│ Column 'person_id': has type String, expected Int64
│
│ Type enforcement is disabled. Unexpected behavior may occur.
└ @ HealthBaseOMOPCDMExt ~/FOSS/HealthBase.jl/ext/HealthBaseOMOPCDMExt.jl:1
63
HealthTable{DataFrame}(2×4 DataFrame
Row │ person_id gender_concept_id year_of_birth race_concept_id
│ String Int64 Int64 Int64
─────┼──────────────────────────────────────────────────────────────
1 │ 1 8507 1990 8527
2 │ 2 8532 1985 8516, "v5.4.
1")map_concepts
- I was playing around with the
map_conceptsfunction quite a lot to see how it "felt". I like the idea of it so far. I had a few thoughts about it:
-
We should make an in-place mutation version that is called
map_concepts!and mutates the passed in HealthTable and overwrites the requested column. -
I think that we should enable passing in mutliple
col's and multiplenew_col's. That way, a person can map multiple columns at once and given them names. -
Building off 2, if nothing for
new_colis passed, we could instead rely on asuffixkwarg that is something like_mappedso that if a function likemap_concepts(ht; :gender_concept_id, conn)was passed, the HealthTable has a new column calledgender_concept_id_mapped. If multiple columns are passed, they all get the the same suffix added to them. -
I also think that in the quickstart we should have a simple mapping example showing how to use
map!andmapthat if one wants to pass in their own mappings, they can using base Julia.
As I wrote up this comment, I think this should rather be in a separate issue on OMOPCommonDataModel.jl; over there, we can think further about implementing these functions further.
Otherwise, with my last couple notes on the PR, I think we are just about done after we finish some tweaks, docs, and tests!
|
Hey @TheCedarPrince, I have updated the code based on the review, did some changes and also added show method using PrettyTables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work here Indu! Very few minor comments and as soon as you are done with them, this is ready for merging. Thanks!
|
Hey @TheCedarPrince, I have made all the changes that we discussed. After reviewing everything, it looks good from my end. Do take a final look 😊 |
|
Hey @kosuri-indu I was just about to merge this when I saw that code coverage for this PR seems to still be lacking coverage for a lot of parts of the code. Could you help me with adding additional tests to provide coverage? Thanks! ~ tcp 🌳 |

This PR implements the initial version of the HealthTable function for OMOP CDM datasets via the HealthBaseOMOPExt.jl extension.