Conversation

@jleagle94

Adds a new directory for a Fornax notebook and files demonstrating high-energy Galactic source studies with Chandra, Fermi, GAIA and 2MASS.

Closes #491

@zoghbi-a zoghbi-a self-assigned this Aug 27, 2025
@bsipocz bsipocz changed the title Review and add new fornax notebook and files Add CFG2 cross matching notebook Aug 27, 2025
Member

@bsipocz bsipocz left a comment

I had a quick run through it; I haven't tried to run the code or render the notebook.

Overall it feels that there could and should be way more narrative, especially in the last third of the notebook.

Comment on lines 75 to 91
```python
heasarc = vo.dal.TAPService("https://heasarc.gsfc.nasa.gov/xamin/vo/tap")
```

## Step 3: Run a ADQL query to get all observation positions

```python
query = """
SELECT TOP 99999999 obsid, ra, dec, exposure, detector, time, name
FROM chanmaster
WHERE ra IS NOT NULL AND dec IS NOT NULL
"""


results = heasarc.search(query)
table = results.to_table()
table[:5]
```
Member

consider sending this through astroquery rather than directly using pyvo

Copy link

I asked her to use PyVO where possible. If you already know that the data are at HEASARC, and if your query can be done by astroquery, then it is indeed simpler. But note that this is a call to fetch the entire chanmaster catalog. I'm not sure astroquery.heasarc can do that. If you use query_region() and give a full-sky search radius, it'll slow the query down significantly.

Is there an astroquery-y way to fetch an entire catalog that's served out of a database (as opposed to a file on disk)?

Member

yes, a query_tap via astroquery is possible, one can use that for custom ADQL as opposed to the wrapper/convenience of a spatial TAP query via query_region
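
The `query_tap` route mentioned above could look roughly like the sketch below for the full `chanmaster` pull. It assumes a recent astroquery release in which `astroquery.heasarc.Heasarc` provides `query_tap()`. The helper names `build_full_catalog_query` and `fetch_full_catalog` are made up for illustration, and the `TOP 99999999` clause is simply carried over from the notebook query.

```python
# Columns copied from the notebook's ADQL query.
CHANMASTER_COLUMNS = ("obsid", "ra", "dec", "exposure", "detector", "time", "name")


def build_full_catalog_query(table, columns, top=99999999):
    """Return ADQL that selects `columns` for every row of `table` with a position."""
    return (
        f"SELECT TOP {top} {', '.join(columns)}\n"
        f"FROM {table}\n"
        "WHERE ra IS NOT NULL AND dec IS NOT NULL"
    )


def fetch_full_catalog(table="chanmaster", columns=CHANMASTER_COLUMNS):
    """Hypothetical helper: send the custom ADQL through astroquery."""
    # Imported lazily so that building the query string needs no extra packages.
    # Assumption: the installed astroquery exposes Heasarc.query_tap().
    from astroquery.heasarc import Heasarc

    query = build_full_catalog_query(table, columns)
    return Heasarc.query_tap(query).to_table()


query = build_full_catalog_query("chanmaster", CHANMASTER_COLUMNS)
print(query)
# table = fetch_full_catalog()  # uncomment to contact the HEASARC TAP service
```

The ADQL string is the same one the PyVO version sends, so only the client entry point changes.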

Member

> Is there an astroquery-y way to fetch an entire catalog that's served out of a database (as opposed to a file on disk)?

I don't understand this. astroquery doesn't require a file on disk for anything (well, some of the super small modules do that, typically when the data is just something dumped on a website, but that's rare and the exception). It does queries whenever there is an actual server on the other end.

Copy link

But not every archive's module has a query_tap() is my point.

Comment on lines 588 to 591
```python
gaia_dr3 = vo.dal.TAPService("https://gea.esac.esa.int/tap-server/tap")
mass = vo.dal.TAPService("https://irsa.ipac.caltech.edu/TAP")
```
Member

do use astroquery for both of these

Copy link

This is a nice example where we can demonstrate the end-to-end PyVO way of doing things, particularly to demonstrate the data discovery so that users can modify it for a different dataset. (It also depends on whether it will be used later with a constraint that is not just spatial. You cannot do generic SQL queries like that in most astroquery modules, AFAIK.)

We (HEASARCers) think we may present both possibilities, hiding PyVO maybe under a "for a more generic solution, see here" or something. But do we need to do that in every single notebook? That would be a pain. But again, what does the user do when they don't know which astroquery module serves the data they might need? Is there any sort of astroquery registry? Searching astroquery docs for a module that serves the data you want? That could be cool, and an alternative to the VO Registry, which has issues.

To be continued.

Member

You can do generic ADQL with most of the big astroquery modules; the rest are either not VO compliant, or not maintained by the archives, or using VO is just not their preference. For IRSA/2MASS I think it's a reasonable ask to use our astroquery module. And here there is no data discovery: the users are expected not just to know the names of the archives but to know the exact URLs, which is not a reasonable expectation.

Author

There are a lot of great points here. I will incorporate them in the data access notebook I am working on that will cover astroquery and pyvo uses.

Member

for data access I would think it would be great to get Jessica involved in the brainstorming, too. I mean we did add a lot of data access already (some of which may have room for improvements), and just overall to coordinate on the exact topic as some of the resources may exist already

Author

It used to be in fornax-demo-notebooks, but it was moved...

Fornax-specific data access means demonstrating how to access and utilize all archival data. I personally would like to see such a notebook to expand my research interests to utilize IRSA and MAST. This is the point of Fornax, right? As a high-energy person myself, I am unfamiliar with IRSA and MAST and how they have become accustomed to storing, accessing, and using their data, and would like to learn how to. I feel like this is a shared thought for many of us.

Member

Yes, but there is nothing Fornax-specific about that data access; you will be using standard tools and protocols.

Fornax-specific would mean baking in local paths and code that are only available on Fornax. Fortunately both the libraries and the data are public. There are multiple reasons to do all of this within Fornax, but none of them is that there is some Fornax-specific data access for otherwise non-public data.

Author

My understanding is that anything that is multi-archive = Fornax-specific. This is also a requirement noted in the checklist

We should really discuss this as a group so we are all on the same page. I would think anything considered a Fornax notebook is not necessarily only capable of running in the Fornax environment, but is tested and maintained to run on it, and emphasizes the power of multi-archival abilities.

Contributor

Thanks for bringing up different types of notebooks @jleagle94. I'm just seeing this and want to add a bit to try to help clarify what our approach has been. Most of our notebooks show how to access data. Access methods for quite a few different datasets are demonstrated and I can point out a few things about them on Thursday. There are certainly datasets and access methods that aren't represented here yet and showing ones that are here again in different contexts is good too. So if you've got a multi-archive data access notebook in progress I'm sure we're all interested to bring it in here, and we'll also push to make it focus on a science use case.

We haven't made a notebook for this repo that only shows how to access data for a few reasons, I think. One is exactly what @bsipocz said above; there's nothing specific about Fornax to justify that. Another is that there's often more than one way to access a particular dataset and which option is "best" usually depends on the use case. Both the science use case and the amount of data being accessed can make a big difference. (And the amount of data needed depends on the science use case.)

In addition, this repo is intended to highlight the capabilities of the Fornax Science Console and the Console is particularly well suited to certain kinds of use cases. I can expand on all that on Thursday, but in a nutshell, this is why our approach has been to focus the notebooks in this repo on science use cases.

Author

Thanks for the additional context, @troyraen. I know that each archive already has some data access notebooks, but as @trjaffe has pointed out, there are differences in how each archive stores and accesses data in the cloud. I think it would be useful to have a multi-archive notebook that can touch on this a bit, and also inform users of the various tools and methods that are at their disposal. It would be a bit more comprehensive than what I have seen in single archive notebooks.

I have a current data access notebook that tries to do this that is under review with HEASARC right now that might make its way here at some point. The same version is available here. The point of the notebooks I am developing is to exactly emphasize one of your points about Fornax: that data access changes depending on the goal of the user and to demonstrate what is available and when it might be useful.

I would like to know more about what use cases are well suited for a Fornax environment so I can make sure the ones I develop take advantage of this.

@DavidT3

DavidT3 commented Aug 28, 2025

For HEASARC reference, the relevant issue to this PR is HEASARC/heasarc-tutorials#3

@jleagle94
Author

Thanks for your comments @bsipocz. I am working on implementing them over the next week or so. I will explore astroquery to see if it simplifies the code.

Contributor

@troyraen troyraen left a comment

Thanks @jleagle94, it's fun to see some high-energy data! My background isn't in this area but it made me interested to learn more.

I ran the notebook on the Fornax Science Console and it ran without error. Reading through it, I was mostly able to follow what you're doing but had a hard time following the why. I would have loved some intro text before jumping in and a quick note in many places to help me learn about the data. For this PR, I would say please add a title and an introduction giving a little bit of background about the datasets and what can be learned by putting them together. Also fix the requirements file so it can be used reliably to run the notebook.

I left other comments below with an eye towards the review checklists (https://github.com/nasa-fornax/fornax-demo-notebooks/blob/main/notebook_review_checklists.md) but historically we don't expect that everything is addressed in the first PR. The existing workflow for this repo is a 2-step process. First step is into the repo and second is into the rendered docs. We review using the checklists more explicitly before finishing step two. There's a notebook template in prep in #473. Feel free to comment. Of course, all of these processes and related docs can and should continue to be iterated on. Just letting you know where we're at up until this point.

```python
plt.show()
```

The code below plots the matched table along the Galactic plane again, but using the Astropy WCS package so we can display only the Galactic plane.
Contributor

This is a lot of code to plot the same thing as above but zoomed in. Is it actually helpful to have both? If so, what additional information do we get from each figure? What should I be noticing? Same for step 6.

Author

Well, it offers two ways to plot the data, and zooming in on the part of the sky that we are actually interested in can be more informative than an all-sky map, but to do a zoomed in sky map I cannot use the aitoff projection. I don't know if it is more than just a little coding exercise 😀

@jleagle94
Author

jleagle94 commented Aug 29, 2025

> For HEASARC reference, the relevant issue to this PR is HEASARC/tutorials#3.

It should actually be HEASARC/tutorials#5 😃

@jleagle94
Author

Thanks for the comments and background info @troyraen. I will definitely improve the narration throughout the notebook, related to many of the comments of others.


```python
import requests

#2MASS query
```
Copy link

```python
twomass_query = f"""
    SELECT TOP 3 designation, ra, dec, glon, glat, DISTANCE(POINT('ICRS', ra,dec),POINT('ICRS', {f_coord.ra.deg}, {f_coord.dec.deg})) AS dist
    FROM fp_psc
    WHERE 1=CONTAINS(
        POINT('ICRS', ra, dec),
        CIRCLE('ICRS', {f_coord.ra.deg}, {f_coord.dec.deg}, {radius})
    )
    ORDER BY dist ASC
    """
vo.dal.TAPService("https://irsa.ipac.caltech.edu/TAP").search(twomass_query).to_table()
```

works for me.

Author

That is annoying 😀 . I tested exactly this and got either "invalid number" or "invalid query". Anyway, I will retest and implement.

Author

@jleagle94 jleagle94 Sep 9, 2025

Huh. I actually still cannot get this to run, but this time I get this error: DALQueryError: Query Error: <No useful error from server> (using .run_async()) but back to invalid number if I use .search().

Author

@jleagle94 jleagle94 Sep 10, 2025

What version of pyvo are you using? @trjaffe

Member

I would really prefer that you use query_region for these types of spatial queries for 2MASS, because this is anything but user friendly: first they have to hand-write the ADQL (lots of things to unpack there), then call this line:
vo.dal.TAPService("https://irsa.ipac.caltech.edu/TAP").search(twomass_query).to_table()

Instead of this:

Irsa.query_region(f_coord, catalog='fp_psc')

Author

@jleagle94 jleagle94 Sep 10, 2025

I apologize, my intention is not being conveyed here well. In the initial tests I have run so far, an astroquery search within HEASARC data took nearly twice as long (~30 s) as the identical command with pyvo (~15 s). This is only preliminary and may not be significant for other queries.

I haven’t fully explored other differences yet, but since this part of the code already takes several minutes to run, I need more time to properly evaluate both before integrating astroquery. I would appreciate additional time to explore these tools and determine the best way to implement them here.

Contributor

There shouldn't be any time difference between astroquery and pyvo. They make exactly the same call. Any observed difference may come from the response time from the server.
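
One hedged way to check whether a timing gap like the ~30 s versus ~15 s reported above is real or just server variability is to repeat each call several times and compare medians. The sketch below uses only the standard library; `time_call`, `summarize`, and the `fake_query` stand-in are illustrative names, not part of pyvo or astroquery.

```python
import statistics
import time


def time_call(func, repeats=5):
    """Return the wall-clock duration in seconds of each of `repeats` calls to func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of durations to median, fastest, and slowest."""
    return {
        "median": statistics.median(timings),
        "min": min(timings),
        "max": max(timings),
    }


def fake_query():
    """Stand-in workload so the sketch runs offline; replace with a real query."""
    time.sleep(0.01)


stats = summarize(time_call(fake_query, repeats=3))
print(stats)
```

To compare the two clients, `fake_query` would be swapped for something like `lambda: heasarc.search(query)` and `lambda: Heasarc.query_tap(query)`, ideally interleaved so that both see similar server load.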

Author

Ok, I finally got around to testing a bit more the astroquery version of things. Throughout the notebook, I replace pyvo with the astroquery equivalent code (or close to) and find comparable results. More specifically, I retrieve identical results with comparable computing time. I apologize for raising unnecessary red flags and offending anyone as that was not my intention. I am learning the tools and how to use them effectively and that is my primary goal when I communicate any concerns or thoughts, etc.

Now, when I query the 2MASS catalog, I use astroquery, following @bsipocz's suggestion:

Irsa.query_region(f_coord, catalog='fp_psc')

This is much simpler for a few reasons. 1) ADQL parsing is difficult for 2MASS. I ran into issues using DISTANCE() for negative coordinate values. While the helpdesk says this has been fixed, I still cannot get it to work. 2) I cannot appear to use a table upload the way I can for GAIA (although this is true for Irsa.query_region too AFAIK). 3) query_region avoids much of the ADQL text necessary for a query_tap (whether in astroquery or in pyvo).

Which brings me to my final assessment: using query_tap in astroquery or pyvo's .search() still uses the same bit of string to build the query we submit and run. So, in the places where this is done, there is really no discernible difference between using pyvo or astroquery that I can see, so I am indifferent to how this part of the code is implemented. I will let @trjaffe and @bsipocz decide the fate here. query_region for the 2MASS query is in fact simpler code-wise, so I think it makes the most sense to use it here instead of the existing code.

Contributor

> 1) ADQL parsing is difficult for 2MASS. I ran into issues using DISTANCE() for negative coordinate values. While the helpdesk says this has been fixed, I still cannot get it to work. 2) I cannot appear to use a table upload the way I can for GAIA (although this is true for Irsa.query_region too AFAIK).

Can you provide me with a simple reproducible example of both the DISTANCE() and table upload queries that are failing for you? I want to follow up with our backend team.

Author

@jleagle94 jleagle94 Sep 30, 2025

Sure. So, this works:

```python
f_l, f_b = grouped_display_table['fermi_l'][0], grouped_display_table['fermi_b'][0]
f_coord = SkyCoord(l=f_l*u.deg, b=f_b*u.deg, frame="galactic").icrs
ra_in = str(f"{f_coord.ra.deg:.5f}")
dec_in = str(f"{f_coord.dec.deg:.5f}")
rad = str(radius)

twomass_query = (
    """
    SELECT TOP 3 designation, ra, dec, glon, glat
    FROM fp_psc
    WHERE 1=CONTAINS(
        POINT('ICRS', ra, dec),
        CIRCLE('ICRS', """ + ra_in + """, """ + dec_in + """, """ + rad + """)
    )
    """
)

twomass_results = mass.run_async(twomass_query).to_table()
print(twomass_results)
```

where ra_in , dec_in, and rad are 161.27739 -59.68220 0.03 in degrees.

If you add DISTANCE(POINT('ICRS', ra, dec), POINT('ICRS', CAST(""" + ra_in + """ AS DOUBLE), CAST(""" + dec_in + """ AS DOUBLE))) AS dist as we do in the GAIA query using the CAST() and strings of coordinate values to avoid other parsing issues, it returns

DALServiceError: Cannot wait for job completion. Job is not active!

due to calling it with run_async, but if you change it to search it gives

DALQueryError: UsageFault: BAD_REQUEST: Invalid or unsupported ADQL query string. See TAP documentation here: https://irsa.ipac.caltech.edu/docs/program_interface/TAP.html.

If I try JOIN in the same way as I do for GAIA,

```python
twomass_query = (
    """
    SELECT TOP 3 f.fermi_name, f.ra, f.dec, m.designation, m.ra, m.dec, m.glon, m.glat
    FROM TAP_UPLOAD.mytable AS f
    JOIN fp_psc AS m
    ON 1=CONTAINS(
        POINT('ICRS', m.ra, m.dec),
        CIRCLE('ICRS', """ + ra_in + """, """ + dec_in + """, """ + rad + """)
    )
    ORDER BY f.fermi_name
    """
)
twomass_results = mass.search(twomass_query,uploads={'mytable': upload_table}).to_table()
```

the error is again:

DALQueryError: UsageFault: BAD_REQUEST: Invalid or unsupported ADQL query string. See TAP documentation here: https://irsa.ipac.caltech.edu/docs/program_interface/TAP.html

but I cannot figure out why the syntax is wrong other than adding JOIN or DISTANCE.
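
For a helpdesk report, it may help to generate the two cone-search queries from one small builder so that they differ only in the `DISTANCE()` column. The sketch below is one way to do that under stated assumptions: `format_deg` and `build_cone_query` are hypothetical helpers, the coordinates are the values quoted above, and fixed-point formatting is used only to keep scientific notation out of the ADQL text. It does not explain or fix the server-side error.

```python
def format_deg(value, digits=5):
    """Format an angle in degrees as fixed-point text (never scientific notation)."""
    return f"{float(value):.{digits}f}"


def build_cone_query(ra_deg, dec_deg, radius_deg, with_distance=False, top=3):
    """Build the 2MASS (fp_psc) cone-search ADQL, optionally with a DISTANCE column."""
    ra, dec, rad = format_deg(ra_deg), format_deg(dec_deg), format_deg(radius_deg)
    columns = ["designation", "ra", "dec", "glon", "glat"]
    if with_distance:
        columns.append(
            f"DISTANCE(POINT('ICRS', ra, dec), POINT('ICRS', {ra}, {dec})) AS dist"
        )
    lines = [
        f"SELECT TOP {top} {', '.join(columns)}",
        "FROM fp_psc",
        "WHERE 1=CONTAINS(",
        "    POINT('ICRS', ra, dec),",
        f"    CIRCLE('ICRS', {ra}, {dec}, {rad})",
        ")",
    ]
    if with_distance:
        lines.append("ORDER BY dist ASC")
    return "\n".join(lines)


# Values quoted in the comment above (degrees).
without_distance = build_cone_query(161.27739, -59.68220, 0.03)
with_distance = build_cone_query(161.27739, -59.68220, 0.03, with_distance=True)
print(without_distance)
print(with_distance)
```

Either string can then be passed unchanged to `mass.search(...)` or to `Irsa.query_tap(...)`, which makes it easier to confirm that both clients receive identical ADQL.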

…tch/cxc-fermi-gaia-2mass_cross-match/README.md
…ndra_fermi_gaia_2mass.md to crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md
Following notebook review comments. Need to implement astroquery in 2MASS query at the end of the notebook. Look for it in the next version.
@jleagle94
Author

I have implemented most of everyone's comments/suggestions following the discussions here. Latest version is up. I ran into a new issue running it on Fornax though, which I think is coming from Python 3.12 dependency clashes between matplotlib and astropy. I don't have time to work through this, so I instead made a Python 3.11 environment and ran the notebook there successfully. The version in shared-storage/ on Fornax documents those steps needed for now.

Development

Successfully merging this pull request may close these issues.

Galactic study using Chandra, Fermi, GAIA, and 2MASS data

6 participants