Skip to content

Conversation

@kosuri-indu
Copy link
Member

This PR implements the initial version of the HealthTable function for OMOP CDM datasets via the HealthBaseOMOPExt.jl extension.

@codecov
Copy link

codecov bot commented Jun 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (3258bce) to head (db2caec).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##            master       #30    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files            3         6     +3     
  Lines           42       162   +120     
==========================================
+ Hits            42       162   +120     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@TheCedarPrince TheCedarPrince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really great so far -- lots of terrific ideas! I'll be able to be more hands-on this week but this is exactly what I was looking for. Great start. Eager to keep things going! Let me know if you have any follow-up questions as we continue iteration.

Oh! And don't forget to add tests -- we can worry about that towards the end. :)

Keep up the great work @kosuri-indu!

TheCedarPrince

This comment was marked as duplicate.

@kosuri-indu
Copy link
Member Author

Hey @TheCedarPrince, added initial changes and basic tests (will follow up with standard ones). But I ran into an error: OMOPCDM_VERSIONS not defined in OMOPCommonDataModel, likely a dependency or import issue.
44e08ca6-4fdc-4c55-a93d-722f950153c7

Also noting: precompiling throws an error when using Tables in @HealthBase.jl, that also seems related to dependencies.

I think I might need help regarding the dependencies actually..

@kosuri-indu
Copy link
Member Author

Hey @TheCedarPrince, So i was working on implementing the HealthTable struct, the Tables.jl interface, and related OMOP CDM utility functions (like one-hot encoding, high-cardinality handling, and concept mapping), as discussed in the docs. I had some doubts on where these definitions and implementations to live?

  1. Should the HealthTable struct and the Tables.jl interface methods be in the main source (example: src/omopcdm_stub.jl or in the extension ext/HealthBaseOMOPCDMExt.jl)?
  2. Also, should the utility functions (one-hot encoding, concept mapping, etc.) be in the extension, the core package, or split between stub/API in core and implementation in the extension?

Let me know when you get a chance!

Copy link
Member

@TheCedarPrince TheCedarPrince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a bunch of comments and really like where this PR is going! Thanks for the hard wor k so far. Keep it up! I have some more thoughts I'll share in a moment too about your questions.

@TheCedarPrince
Copy link
Member

Hey @kosuri-indu, here are my thoughts about your questions

  1. Should the HealthTable struct and the Tables.jl interface methods be in the main source (example: src/omopcdm_stub.jl or in the extension ext/HealthBaseOMOPCDMExt.jl)?
  2. Also, should the utility functions (one-hot encoding, concept mapping, etc.) be in the extension, the core package, or split between stub/API in core and implementation in the extension?

After a bunch of consideration with regards to 1, the more and more I thought about it, the more the question came to mind of "What should HealthBase actually own?" This ended up causing me to go down a rabbit hole of best practices across the Julia ecosystem about how people are handling extensions and putting things where. After consideration, here's my conclusion: we should move a majority of the functionality from the extension here into OMOPCommonDataModel.jl. The reason why I came to that conclusion is that as we keep pushing more and more stuff into HealthBase, we run the risk of making HealthBase less light-weight and more brittle. I would rather OMOPCommonDataModel.jl have DataFrames dependencies and other associated processing. As I read on best practices, extensions really seem more geared to smaller things. Here, we are introducing a lot of functionality that could really go into OMOPCommonDataModel.jl.

With that said, for now, keep developing in this repo and this PR! What we will do is merge it into HealthBase when ready and then excise out these utilities into OMOPCommonDataModel.jl. There is some additional prep I have to do within OMOPCommonDataModel.jl anyhow to make things ready for this overhaul. Plus, I would rather have something that works that may not be best practice but is functional.

For 2, in a similar way to 1, keep working on developing these features on this repo, and then we will merge them into HealthBase. After that, we can excise and move these functions to OMOPCommonDataModel.jl. I think it just makes more thematic sense anyhow.

When this PR is merged, then we can worry about what should stay and what should go to OMOPCommonDataModel.jl. 😃

Great work so far though @kosuri-indu! I am really excited about how this is developing and shaping up!

@TheCedarPrince
Copy link
Member

Also, regarding your dependency question: it seems like you didn't explicitly add Tables.jl as a dependency. Could you add it in?

pkg> add Tables

Does that work to fix it? Also, don't forget to add compat entries for Tables.jl with PackageCompatUI.jl :)

Copy link
Member

@TheCedarPrince TheCedarPrince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like how this PR is shaping up! I think the test suite in particular looks fantastic; good questions about documentation as well. Let me know if you have any concerns or questions!

I will say one additional thing: at this point, I am not sure how to actually test using your PR. Could you try adding a small quickstart guide so I could try it out? Plus, that could go into documentation or docstrings. I want to test how it feels/works.

Thanks and keep up the great work Indu!

@kosuri-indu
Copy link
Member Author

Hey @TheCedarPrince, got it! I am currently working on the preprocessing methods, I will finish that soon and then I’ll set up the complete test guide with the above changes within the next two days.

@kosuri-indu
Copy link
Member Author

Hey @TheCedarPrince,

Just a quick update:

  • I have completed the basic setup for some of the preprocessing functions (still need to add two more), they are functional but still minimal, need to add more strategies and improve them going forward.
  • I also added a quickstart.md file for a basic test guide, just the beginning for now.

Copy link
Member

@TheCedarPrince TheCedarPrince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Indu, thanks for all the hard work so far! I think we are getting well-along with this PR. I left some thoughts as well related to our discussion on Friday. I know we had found some issues on Friday and I also left some thoughts here which might be a bit challenging. If you have any questions or concerns, please don't hesitate to ask and I will be here to support!

~ tcp 🌳

Copy link
Member

@TheCedarPrince TheCedarPrince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Indu,

Finished doing another review here. This is excellent! I only had a few comments left specifically on the PR alongside some general comments for future work. Here are my general notes:

Quickstart notes

  • This quickstart is FANTASTIC! I followed it perfectly and it was super straightforward to use!

HealthTable Interface

  • I noticed with HealthTable you are still returning a tuple with the table and version information. Could you attach the version information as metadata to the dataframe itself (i.e. source)?

  • The wrong_df error message looks great so far!

  • This is fantastic!

julia> ht = HealthTable(wrong_df, omop_cdm_version="v5.4.1", disable_type_e
nforcement=true)
┌ Warning: OMOP CDM type validation failed for the following columns:
│ Column 'person_id': has type String, expected Int64
│ 
│ Type enforcement is disabled. Unexpected behavior may occur.
└ @ HealthBaseOMOPCDMExt ~/FOSS/HealthBase.jl/ext/HealthBaseOMOPCDMExt.jl:1
63
HealthTable{DataFrame}(2×4 DataFrame
 Row │ person_id  gender_concept_id  year_of_birth  race_concept_id 
     │ String     Int64              Int64          Int64           
─────┼──────────────────────────────────────────────────────────────
   1 │ 1                       8507           1990             8527
   2 │ 2                       8532           1985             8516, "v5.4.
1")

map_concepts

  • I was playing around with the map_concepts function quite a lot to see how it "felt". I like the idea of it so far. I had a few thoughts about it:
  1. We should make an in-place mutation version that is called map_concepts! and mutates the passed in HealthTable and overwrites the requested column.

  2. I think that we should enable passing in mutliple col's and multiple new_col's. That way, a person can map multiple columns at once and given them names.

  3. Building off 2, if nothing for new_col is passed, we could instead rely on a suffix kwarg that is something like _mapped so that if a function like map_concepts(ht; :gender_concept_id, conn) was passed, the HealthTable has a new column called gender_concept_id_mapped. If multiple columns are passed, they all get the the same suffix added to them.

  4. I also think that in the quickstart we should have a simple mapping example showing how to use map! and map that if one wants to pass in their own mappings, they can using base Julia.

As I wrote up this comment, I think this should rather be in a separate issue on OMOPCommonDataModel.jl; over there, we can think further about implementing these functions further.


Otherwise, with my last couple notes on the PR, I think we are just about done after we finish some tweaks, docs, and tests!

@kosuri-indu
Copy link
Member Author

kosuri-indu commented Jul 7, 2025

Hey @TheCedarPrince, I have updated the code based on the review, did some changes and also added show method using PrettyTables.

Copy link
Member

@TheCedarPrince TheCedarPrince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work here Indu! Very few minor comments and as soon as you are done with them, this is ready for merging. Thanks!

@kosuri-indu
Copy link
Member Author

Hey @TheCedarPrince, I have made all the changes that we discussed. After reviewing everything, it looks good from my end. Do take a final look 😊

@TheCedarPrince
Copy link
Member

Hey @kosuri-indu

I was just about to merge this when I saw that code coverage for this PR seems to still be lacking coverage for a lot of parts of the code. Could you help me with adding additional tests to provide coverage? Thanks!

~ tcp 🌳

@TheCedarPrince TheCedarPrince merged commit d5ee502 into JuliaHealth:master Aug 27, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants