Merge pull request #7 from JuliaHealth/add/docs

TheCedarPrince · web-flow · commit fcf213b42eeb · 2025-09-12T09:43:29.000-04:00
[DOCS] Documentation for the package!
diff --git a/docs/Project.toml b/docs/Project.toml
@@ -1,3 +1,11 @@
 [deps]
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
+DocumenterTools = "35a29f4d-8980-5a13-9543-d66fff28ecb8"
+LiveServer = "16fef848-5104-11e9-1b77-fb7a48bbb589"
 OMOPCDMFeasibility = "d7c4b303-36e6-42c5-a114-1784c141c4f7"
+
+[compat]
+Documenter = "1"
+DocumenterTools = "0.1.20"
+LiveServer = "1.5"
+julia = "1.10"
diff --git a/docs/make.jl b/docs/make.jl
@@ -7,14 +7,25 @@ DocMeta.setdocmeta!(
 
 makedocs(;
     modules=[OMOPCDMFeasibility],
+    checkdocs=:none,
     authors="Kosuri Lakshmi Indu <kosurilindu@gmail.com> and contributors",
+    repo="https://github.com/JuliaHealth/OMOPCDMFeasibility.jl/blob/{commit}{path}#{line}",
     sitename="OMOPCDMFeasibility.jl",
     format=Documenter.HTML(;
+        prettyurls=get(ENV, "CI", "false") == "true",
         canonical="https://JuliaHealth.github.io/OMOPCDMFeasibility.jl",
-        edit_link="master",
         assets=String[],
     ),
-    pages=["Home" => "index.md"],
+    pages=[
+        "Home" => "index.md",
+        "Quickstart" => "quickstart.md",
+
+        "Pre-Cohort Analysis" => "precohort.md",
+        "Post-Cohort Analysis" => "postcohort.md",
+        
+        "API" => "api.md",
+    ],
+    doctest=false,
 )
 
-deploydocs(; repo="github.com/JuliaHealth/OMOPCDMFeasibility.jl", devbranch="master")
+deploydocs(; repo="github.com/JuliaHealth/OMOPCDMFeasibility.jl")
diff --git a/docs/src/api.md b/docs/src/api.md
@@ -0,0 +1,15 @@
+```@meta
+CurrentModule = OMOPCDMFeasibility
+```
+
+# OMOPCDMFeasibility
+
+Documentation for [OMOPCDMFeasibility](https://github.com/JuliaHealth/OMOPCDMFeasibility.jl).
+
+```@index
+
+```
+
+```@autodocs
+Modules = [OMOPCDMFeasibility]
+```
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -1,14 +1,24 @@
-```@meta
-CurrentModule = OMOPCDMFeasibility
-```
+# OMOPCDMFeasibility.jl
 
-# OMOPCDMFeasibility
+> A Julia package for feasibility and cohort analysis on OMOP Common Data Model (CDM) data.
 
-Documentation for [OMOPCDMFeasibility](https://github.com/JuliaHealth/OMOPCDMFeasibility.jl).
+## Overview
 
-```@index
-```
+OMOPCDMFeasibility.jl helps researchers and data scientists quickly explore, summarize, and compare patient cohorts using OMOP CDM databases. It is designed for use in observational health studies, cohort discovery, and data quality assessment.
 
-```@autodocs
-Modules = [OMOPCDMFeasibility]
-```
+## Features
+
+- **Pre-cohort analysis:** Explore concept distributions, domain breakdowns, and data quality before defining a cohort.
+- **Post-cohort analysis:** Summarize, profile, and compare cohorts after extraction.
+- **Flexible database support:** Works with DuckDB, SQLite, PostgreSQL, and more.
+- **Composable with JuliaHealth:** Integrates with DataFrames.jl, OMOPCommonDataModel.jl, and other JuliaHealth tools.
+- **Reproducible workflows:** Designed for robust, testable, and transparent research.
+- **Clear error handling:** Provides informative messages and input validation.
+
+## Limitations
+
+- OMOPCDMFeasibility.jl is focused on feasibility and cohort analysis only; it does not perform cohort extraction or patient-level prediction itself.
+- Some advanced features (e.g., custom covariates, non-standard dialects) may require additional JuliaHealth packages or extensions.
+- The package assumes your data is already in OMOP CDM format and accessible via a supported database backend.
+
+For a step-by-step guide, see the [Quickstart](quickstart.md). For detailed workflows and function documentation, explore the other sections in this documentation.
diff --git a/docs/src/postcohort.md b/docs/src/postcohort.md
@@ -0,0 +1,75 @@
+# Post-Cohort Analysis
+
+**What is Post-Cohort Analysis?**
+
+Post-cohort analysis is the process of exploring and summarizing your study population after you have defined your cohort. It helps you answer questions like: Who is in my cohort? What are their characteristics? How do they compare to the rest of the database?
+
+This step is essential for understanding your results, checking for biases, and making your study reproducible and transparent.
+
+Post-cohort analysis in OMOPCDMFeasibility.jl is designed to be simple and clear, even for beginners.
+
+## 1. `create_individual_profiles`
+
+```@docs
+OMOPCDMFeasibility.create_individual_profiles
+```
+
+## 2. `create_cartesian_profiles`
+
+```@docs
+OMOPCDMFeasibility.create_cartesian_profiles
+```
+
+## Example: Post-Cohort Analysis in Practice
+
+```julia
+using DataFrames, DuckDB, DBInterface
+using OMOPCDMFeasibility
+using OMOPCDMCohortCreator:
+    GenerateDatabaseDetails,
+    GenerateTables,
+    GetPatientGender,
+    GetPatientAgeGroup,
+    GetPatientRace,
+    GetPatientEthnicity,
+    ConditionFilterPersonIDs
+
+conn = DBInterface.connect(DuckDB.DB, "synthea_1M_3YR.duckdb")
+
+GenerateDatabaseDetails(:postgresql, "dbt_synthea_dev")
+GenerateTables(conn)
+
+diabetes_concept_ids = [201826]
+cohort_result = ConditionFilterPersonIDs(diabetes_concept_ids, conn)
+cohort_ids = cohort_result.person_id
+
+sample_cohort = DataFrame(
+    person_id = cohort_ids
+)
+
+println("Creating individual demographic profiles...")
+individual_demographics = OMOPCDMFeasibility.create_individual_profiles(
+    cohort_df=sample_cohort,
+    conn=conn,
+    covariate_funcs=[GetPatientGender, GetPatientRace, GetPatientAgeGroup]
+)
+
+println("Individual profiles:")
+for (name, table) in pairs(individual_demographics)
+    println("$name:")
+    println(table)
+    println()
+end
+
+println("Creating Cartesian demographic profiles...")
+cartesian_demographics = OMOPCDMFeasibility.create_cartesian_profiles(
+    cohort_df=sample_cohort,
+    conn=conn,
+    covariate_funcs=[GetPatientAgeGroup, GetPatientGender, GetPatientRace]
+)
+
+println("Cartesian profiles:")
+println(cartesian_demographics)
+
+DBInterface.close!(conn)
+```
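+
+As a rough intuition for the difference between the two functions, the sketch below builds both kinds of summary by hand with DataFrames.jl. The `demographics` table is invented toy data, and the grouping logic is an assumption inferred from the function names and from how the example above iterates over the results; the package's actual output columns may differ.
+
+```julia
+using DataFrames
+
+# Toy cohort demographics (hypothetical values, not from a real database)
+demographics = DataFrame(
+    person_id=[1, 2, 3, 4],
+    gender=["F", "M", "F", "F"],
+    race=["Asian", "White", "White", "Asian"],
+)
+
+covariates = [:gender, :race]
+
+# "Individual" style: one count table per covariate
+individual = Dict(c => combine(groupby(demographics, c), nrow => :count) for c in covariates)
+
+# "Cartesian" style: one count table over every observed combination of covariates
+cartesian = combine(groupby(demographics, covariates), nrow => :count)
+
+println(individual[:gender])
+println(cartesian)
+```
+
+Individual tables answer "how many patients per gender?" and "how many per race?" separately, while the combined table answers "how many patients per gender and race together?".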
diff --git a/docs/src/precohort.md b/docs/src/precohort.md
@@ -0,0 +1,87 @@
+# Pre-Cohort Analysis
+
+**What is Feasibility Analysis?**
+
+Feasibility analysis is the process of checking if your planned study or cohort is possible and meaningful with the data you have. It helps you answer questions like: Are there enough patients? Are the concepts I care about present? Is the data complete and reliable?
+
+**What is Pre-Cohort Analysis?**
+
+Pre-cohort analysis is the first step in any observational health study. Before you define your study population (the "cohort"), you use pre-cohort tools to explore your OMOP CDM database. This helps you:
+
+- Understand what data is available
+- Check the frequency and quality of key concepts
+- Plan your study with confidence
+
+Pre-cohort analysis is like scouting the terrain before starting a journey—it helps you avoid surprises and design better, more robust studies.
+
+## 1. `analyze_concept_distribution`
+
+```@docs
+OMOPCDMFeasibility.analyze_concept_distribution
+```
+
+## 2. `generate_summary`
+
+```@docs
+OMOPCDMFeasibility.generate_summary
+```
+
+## 3. `generate_domain_breakdown`
+
+```@docs
+OMOPCDMFeasibility.generate_domain_breakdown
+```
+
+## Example: Pre-Cohort Analysis in Practice
+
+```julia
+using DataFrames, DuckDB, DBInterface
+using OMOPCDMFeasibility
+using OMOPCDMCohortCreator:
+    GenerateDatabaseDetails,
+    GenerateTables,
+    GetPatientGender,
+    GetPatientAgeGroup,
+    GetPatientRace,
+    GetPatientEthnicity,
+    ConditionFilterPersonIDs
+
+conn = DBInterface.connect(DuckDB.DB, "synthea_1M_3YR.duckdb")
+
+GenerateDatabaseDetails(:postgresql, "dbt_synthea_dev")
+GenerateTables(conn)
+
+concept_ids = [
+    31967,    # Condition: Nausea
+    1127433,  # Drug: Acetaminophen
+]
+
+println("\n")
+distribution = OMOPCDMFeasibility.analyze_concept_distribution(
+    conn;
+    concept_set=concept_ids,
+    covariate_funcs=[GetPatientGender, GetPatientRace],
+    schema="dbt_synthea_dev"
+)
+display(distribution)
+
+println("\n")
+concept_summary = OMOPCDMFeasibility.generate_summary(
+    conn;
+    concept_set=concept_ids,
+    covariate_funcs=[GetPatientAgeGroup, GetPatientRace],
+    schema="dbt_synthea_dev"
+)
+display(concept_summary)
+
+println("\n")
+domain_breakdown = OMOPCDMFeasibility.generate_domain_breakdown(
+    conn;
+    concept_set=concept_ids,
+    schema="dbt_synthea_dev"
+)
+display(domain_breakdown)
+println()
+
+DBInterface.close!(conn)
+```
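+
+To illustrate what "checking the frequency of key concepts" amounts to at the database level, here is a minimal sketch that counts records and distinct patients per concept with plain SQL. It uses an in-memory DuckDB database and a tiny stand-in for the OMOP `condition_occurrence` table; the rows are made up, and this is not how the package itself is implemented.
+
+```julia
+using DataFrames, DuckDB, DBInterface
+
+# In-memory database with a toy version of the OMOP `condition_occurrence` table
+conn = DBInterface.connect(DuckDB.DB, ":memory:")
+DBInterface.execute(conn, "CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER)")
+DBInterface.execute(conn, "INSERT INTO condition_occurrence VALUES (1, 31967), (2, 31967), (2, 31967), (3, 201826)")
+
+# Count records and distinct patients for each concept
+counts = DataFrame(DBInterface.execute(conn, """
+    SELECT condition_concept_id,
+           COUNT(*) AS n_records,
+           COUNT(DISTINCT person_id) AS n_patients
+    FROM condition_occurrence
+    GROUP BY condition_concept_id
+    ORDER BY condition_concept_id
+"""))
+
+println(counts)
+DBInterface.close!(conn)
+```
+
+Separating record counts from patient counts matters for feasibility: a concept with many records but few distinct patients may not support a large enough cohort.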
diff --git a/docs/src/quickstart.md b/docs/src/quickstart.md
@@ -0,0 +1,115 @@
+# Quickstart 🎉
+
+Welcome to the Quickstart guide for OMOPCDMFeasibility.jl! This guide shows you how to set up your Julia environment and use OMOPCDMFeasibility.jl for pre- and post-cohort analysis after you have created a cohort using the recommended Observational Template Workflow.
+
+## 1. Getting Started
+
+### Launch Julia and Enter Your Project Environment
+
+To get started:
+
+1. **Open your terminal.**
+2. **Navigate to your project folder (where `Project.toml` is located):**
+
+```sh
+cd path/to/your/project
+```
+
+3. **Activate the project:**
+
+```sh
+julia --project=.
+```
+
+4. **(Optional) To work on the documentation, start Julia in the docs environment instead:**
+
+```sh
+julia --project=docs
+```
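+
+If you are already inside a Julia session, the same activation can be done with the standard `Pkg` API. This is a minimal sketch, assuming the current working directory is the project folder that contains `Project.toml`:
+
+```julia
+using Pkg
+
+# Activate the environment defined by ./Project.toml (equivalent to `julia --project=.`)
+Pkg.activate(".")
+
+# Install the dependencies recorded for this environment, if they are not already present
+Pkg.instantiate()
+```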
+
+## 2. Create Your Cohort with the Observational Template Workflow
+
+> For a robust, reproducible template for observational study setup and cohort creation, follow the official workflow:
+>
+> **[Observational Template Workflow](https://juliahealth.org/HealthBase.jl/dev/observational_template_workflow/#2.-Download-OHDSI-Cohort-Definitions)**
+>
+> **1.** Go through steps 2–5 in the workflow to define and create your cohort table in your database.
+>
+> **2.** Once your cohort is created, return here to analyze it with OMOPCDMFeasibility.jl.
+
+## 3. Pre-Cohort Analysis
+
+Explore your data before defining a cohort.
+
+**Key points:**
+
+- Pre-cohort functions (like `analyze_concept_distribution`) do **not** accept a cohort table or DataFrame; they always analyze the full database.
+- The `covariate_funcs` argument is optional; include it to stratify results by covariates (e.g., gender, race, age group).
+
+To use covariate getter functions, import them from [OMOPCDMCohortCreator.jl](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl/blob/dev/src/getters.jl):
+
+```julia
+using OMOPCDMCohortCreator: GetPatientGender, GetPatientRace, GetPatientAgeGroup
+```
+
+For more detail and additional options, see [Pre-Cohort Analysis](precohort.md).
+
+```julia
+# `conn` is an open DBInterface connection to your OMOP CDM database
+# Check how common specific OMOP concepts are in your database
+# 201826 = "Type 2 diabetes mellitus", 3004249 = "Systolic blood pressure"
+analyze_concept_distribution(conn; concept_set=[201826, 3004249], schema="main")
+
+# Example with covariate_funcs (optional)
+analyze_concept_distribution(
+   conn;
+   concept_set=[201826, 3004249],
+   covariate_funcs=[GetPatientGender, GetPatientRace],
+   schema="main"
+)
+
+# Get summary statistics for a set of concepts
+generate_summary(conn; concept_set=[201826, 3004249], schema="main")
+
+# See which OMOP domains your concepts belong to
+generate_domain_breakdown(conn; concept_set=[201826, 3004249], schema="main")
+```
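+
+To make "stratify by covariates" concrete, the sketch below counts patients per concept, and then per concept and gender, using plain DataFrames.jl. The `occurrences` and `genders` tables are made-up toy data, and this only illustrates the idea; it is not the package's internal implementation.
+
+```julia
+using DataFrames
+
+# Toy data (hypothetical): which person has which concept, and each person's gender
+occurrences = DataFrame(person_id=[1, 2, 3, 4], concept_id=[201826, 201826, 201826, 3004249])
+genders = DataFrame(person_id=[1, 2, 3, 4], gender=["F", "M", "F", "F"])
+
+# Unstratified: one row per concept
+overall = combine(groupby(occurrences, :concept_id), nrow => :count)
+
+# Stratified: attach the covariate, then count per (concept, gender) combination
+joined = innerjoin(occurrences, genders; on=:person_id)
+stratified = combine(groupby(joined, [:concept_id, :gender]), nrow => :count)
+
+println(overall)
+println(stratified)
+```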
+
+## 4. Post-Cohort Analysis
+
+After extracting your cohort, you can perform post-cohort analyses as shown below.
+
+**Key points:**
+
+- Post-cohort functions (like `create_individual_profiles`) require exactly one of the following:
+  - `cohort_definition_id` (to use a cohort table in the database), or
+  - `cohort_df` (a DataFrame of person IDs).
+- Do not pass both at the same time.
+- The `covariate_funcs` argument is optional; include it to stratify results by covariates (e.g., gender, race, age group).
+
+To use covariate getter functions, import them from [OMOPCDMCohortCreator.jl](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl/blob/dev/src/getters.jl):
+
+```julia
+using OMOPCDMCohortCreator: GetPatientGender, GetPatientRace, GetPatientAgeGroup
+```
+
+For more detail and additional options, see [Post-Cohort Analysis](postcohort.md).
+
+```julia
+# Example: Using a DataFrame (`sample_cohort` holds a `person_id` column, as on the Post-Cohort Analysis page)
+create_individual_profiles(
+   cohort_df = sample_cohort,
+   conn = conn,
+   # covariate_funcs = [GetPatientGender, GetPatientRace], # optional
+   schema = "main"
+)
+
+# Example: Using a cohort_definition_id
+create_individual_profiles(
+   cohort_definition_id = 1,
+   conn = conn,
+   # covariate_funcs = [GetPatientGender, GetPatientRace], # optional
+   schema = "main"
+)
+```
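+
+The "exactly one of `cohort_definition_id` or `cohort_df`" rule can also be checked up front in your own scripts. The helper below, `resolve_cohort_source`, is a hypothetical name used only for illustration; it is not part of OMOPCDMFeasibility.jl and does not reflect how the package validates its inputs.
+
+```julia
+# Hypothetical helper: enforce that exactly one cohort source is supplied
+function resolve_cohort_source(; cohort_definition_id=nothing, cohort_df=nothing)
+    has_id = cohort_definition_id !== nothing
+    has_df = cohort_df !== nothing
+    # Both missing or both present are equally invalid
+    if has_id == has_df
+        throw(ArgumentError("Provide exactly one of `cohort_definition_id` or `cohort_df`"))
+    end
+    return has_id ? (:cohort_table, cohort_definition_id) : (:dataframe, cohort_df)
+end
+
+resolve_cohort_source(cohort_definition_id=1)   # selects the cohort table
+# resolve_cohort_source()                       # would throw an ArgumentError
+```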
+
+Happy experimenting with OMOPCDMFeasibility.jl! 🎉