Add CFG2 cross matching notebook #492

jleagle94 · 2025-08-27T20:32:32Z

Adds a new directory for a Fornax notebook and files demonstrating high-energy Galactic source studies with Chandra, Fermi, GAIA and 2MASS.

Closes #491

bsipocz

I had a quick read through it; I haven't tried to run the code or render the notebook.

Overall, it feels like there could and should be much more narrative, especially in the last third of the notebook.

crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md

all_sky_galactic_chandra_fermi_gaia_2mass/requirements.txt

crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md

bsipocz · 2025-08-27T22:13:58Z

crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md

+```python
+heasarc = vo.dal.TAPService("https://heasarc.gsfc.nasa.gov/xamin/vo/tap")
+```
+
+## Step 3: Run a ADQL query to get all observation positions
+
+```python
+query = """
+SELECT TOP 99999999 obsid, ra, dec, exposure, detector, time, name
+FROM chanmaster
+WHERE ra IS NOT NULL AND dec IS NOT NULL
+"""
+
+
+results = heasarc.search(query)
+table = results.to_table()
+table[:5]


consider sending this through astroquery rather than directly using pyvo

I asked her to use PyVO where possible. If you already know that the data are at HEASARC, and if your query can be done by astroquery, then it is indeed simpler. But note that this is a call to fetch the entire chanmaster catalog. I'm not sure astroquery.heasarc can do that. If you use query_region() and give a full-sky search radius, it'll slow the query down significantly.

Is there an astroquery-y way to fetch an entire catalog that's served out of a database (as opposed to a file on disk)?

yes, a query_tap via astroquery is possible, one can use that for custom ADQL as opposed to the wrapper/convenience of a spatial TAP query via query_region
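As a rough sketch of the two routes discussed in this thread, the snippet below builds the same whole-catalog ADQL string once and shows how it could be sent either directly with PyVO or through astroquery. The helper names (`build_catalog_query`, `fetch_with_pyvo`, `fetch_with_astroquery`) are hypothetical, and the astroquery route assumes the installed version provides `Heasarc.query_tap()`. The network calls are wrapped in functions and not executed here.

```python
HEASARC_TAP_URL = "https://heasarc.gsfc.nasa.gov/xamin/vo/tap"  # URL taken from the notebook diff


def build_catalog_query(table, columns, not_null=(), top=None):
    """Build an ADQL SELECT over a whole table (no spatial constraint)."""
    limit = f"TOP {top} " if top is not None else ""
    query = f"SELECT {limit}{', '.join(columns)} FROM {table}"
    if not_null:
        query += " WHERE " + " AND ".join(f"{col} IS NOT NULL" for col in not_null)
    return query


def fetch_with_pyvo(query, url=HEASARC_TAP_URL):
    """Send the query straight to the TAP endpoint with PyVO."""
    import pyvo  # lazy import: the query builder works without PyVO installed

    return pyvo.dal.TAPService(url).search(query).to_table()


def fetch_with_astroquery(query):
    """Send the same query through astroquery's HEASARC module.

    Assumption: the installed astroquery provides Heasarc.query_tap().
    """
    from astroquery.heasarc import Heasarc  # lazy import, same reason as above

    return Heasarc.query_tap(query).to_table()


# The same query string works for either route.
query = build_catalog_query(
    "chanmaster",
    ["obsid", "ra", "dec", "exposure", "detector", "time", "name"],
    not_null=["ra", "dec"],
    top=99999999,
)
print(query)
```

Either `fetch_with_pyvo(query)` or `fetch_with_astroquery(query)` would then return a table; only the ADQL string carries the science logic.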

> Is there an astroquery-y way to fetch an entire catalog that's served out of a database (as opposed to a file on disk)?

I don't understand this. astroquery doesn't require a file on disk for anything (a few very small modules do, typically when the data are just something dumped on a website, but that is rare and the exception). It runs queries whenever there is an actual server on the other end.

But not every archive's module has a query_tap() is my point.

crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md

bsipocz · 2025-08-27T22:15:58Z

crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md

+```python
+gaia_dr3 = vo.dal.TAPService("https://gea.esac.esa.int/tap-server/tap")
+mass = vo.dal.TAPService("https://irsa.ipac.caltech.edu/TAP")
+```


do use astroquery for both of these

This is a nice example where we can demonstrate the end-to-end PyVO way of doing things, particularly to demonstrate the data discovery so that users can modify it for a different dataset. (It also depends on whether it will be used later with a constraint that is not just spatial. You cannot do generic SQL queries like that in most astroquery modules, AFAIK.)

We (HEASARCers) think we may present both possibilities, perhaps hiding PyVO under a "for a more generic solution, see here" note or something similar. But do we need to do that in every single notebook? That would be a pain. And again, what does the user do when they don't know which astroquery module serves the data they might need? Is there any sort of astroquery registry, or do they search the astroquery docs for a module that serves the data they want? A registry could be cool, and an alternative to the VO Registry, which has issues.

To be continued.

You can do generic ADQL in most of the big astroquery modules; the rest are either not VO compliant, not maintained by the archives, or using VO is just not their preference. For IRSA/2MASS I think it's a reasonable ask to use our astroquery module. And here there is no data discovery: users are expected not just to know the names of the archives but to know the exact URLs, which is not a reasonable expectation.
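For the data-discovery point raised above, a minimal sketch of looking up TAP endpoints in the VO Registry with PyVO, instead of hard-coding URLs, might look like the following. The helper name `find_tap_services` and its `search` parameter are hypothetical; the default path assumes `pyvo.registry.search` accepts `keywords` and `servicetype` and that each returned resource exposes `short_name` and `access_url`.

```python
def find_tap_services(keywords, search=None):
    """Return (short_name, access_url) pairs for TAP services matching keywords.

    `search` is a hypothetical hook so the function can be exercised offline;
    by default it falls back to pyvo.registry.search, which needs network access.
    """
    if search is None:
        import pyvo  # lazy import so the function can be defined without PyVO

        search = pyvo.registry.search
    resources = search(keywords=list(keywords), servicetype="tap")
    # Each resource is assumed to expose short_name and access_url attributes.
    return [(res.short_name, res.access_url) for res in resources]
```

Calling `find_tap_services(["2mass"])` would query the registry; the returned URLs could then be passed to `pyvo.dal.TAPService`.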

There are a lot of great points here. I will incorporate them in the data access notebook I am working on that will cover astroquery and pyvo uses.

For data access, I think it would be great to get Jessica involved in the brainstorming, too. We have already added a lot of data access (some of which may have room for improvement), and it would help to coordinate on the exact topic, as some of the resources may exist already.

It used to be in fornax-demo-notebooks, but it was moved...

Fornax-specific data access means demonstrating how to access and utilize all archival data. I personally would like to see such a notebook to expand my research interests to utilize IRSA and MAST. This is the point of Fornax, right? As a high-energy person myself, I am unfamiliar with IRSA and MAST and how they have become accustomed to storing, accessing, and using their data, and would like to learn how to. I feel like this is a shared thought for many of us.

Yes, but there is nothing Fornax-specific about that data access; you will be using standard tools and protocols.

Fornax-specific would mean baking in local paths and code that is only available on Fornax. Fortunately, both the libraries and the data are public. There are multiple reasons to do all of this within Fornax, but none of them is that there is some Fornax-specific data access for otherwise non-public data.

My understanding is that anything that is multi-archive = Fornax-specific. This is also a requirement noted in the checklist

We should really discuss this as a group so we are all on the same page. I would think anything considered a Fornax notebook is not necessarily only capable of running in the Fornax environment, but is tested and maintained to run on it, and emphasizes the power of multi-archival abilities.

Thanks for bringing up different types of notebooks @jleagle94. I'm just seeing this and want to add a bit to try to help clarify what our approach has been.

Most of our notebooks show how to access data. Access methods for quite a few different datasets are demonstrated and I can point out a few things about them on Thursday. There are certainly datasets and access methods that aren't represented here yet and showing ones that are here again in different contexts is good too. So if you've got a multi-archive data access notebook in progress I'm sure we're all interested to bring it in here, and we'll also push to make it focus on a science use case.

We haven't made a notebook for this repo that only shows how to access data for a few reasons, I think. One is exactly what @bsipocz said above; there's nothing specific about Fornax to justify that. Another is that there's often more than one way to access a particular dataset and which option is "best" usually depends on the use case. Both the science use case and the amount of data being accessed can make a big difference. (And the amount of data needed depends on the science use case). In addition, this repo is intended to highlight the capabilities of the Fornax Science Console and the Console is particularly well suited to certain kinds of use cases.

I can expand on all that on Thursday, but in a nutshell, this is why our approach has been to focus the notebooks in this repo on science use cases.

Thanks for the additional context, @troyraen. I know that each archive already has some data access notebooks, but as @trjaffe has pointed out, there are differences in how each archive stores and accesses data in the cloud. I think it would be useful to have a multi-archive notebook that can touch on this a bit, and also inform users of the various tools and methods that are at their disposal. It would be a bit more comprehensive than what I have seen in single archive notebooks.

I have a data access notebook that tries to do this. It is currently under review with HEASARC and might make its way here at some point. The same version is available here. The point of the notebooks I am developing is exactly to emphasize one of your points about Fornax: that data access changes depending on the goal of the user, and to demonstrate what is available and when it might be useful.

I would like to know more about what use cases are well suited for a Fornax environment so I can make sure the ones I develop take advantage of this.

crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md

all_sky_galactic_chandra_fermi_gaia_2mass/README.md

DavidT3 · 2025-08-28T20:51:54Z

For HEASARC reference, the relevant issue to this PR is HEASARC/heasarc-tutorials#3

jleagle94 · 2025-08-28T23:55:47Z

Thanks for your comments @bsipocz. I am working on implementing them over the next week or so. I will explore astroquery to see if it simplifies the code.

troyraen

Thanks @jleagle94, it's fun to see some high-energy data! My background isn't in this area but it made me interested to learn more.

I ran the notebook on the Fornax Science Console and it ran without error. Reading through it, I was mostly able to follow what you're doing but had a hard time following the why. I would have loved some intro text before jumping in and a quick note in many places to help me learn about the data. For this PR, I would say please add a title and an introduction giving a little bit of background about the datasets and what can be learned by putting them together. Also fix the requirements file so it can be used reliably to run the notebook.

I left other comments below with an eye towards the review checklists (https://github.com/nasa-fornax/fornax-demo-notebooks/blob/main/notebook_review_checklists.md), but historically we don't expect everything to be addressed in the first PR. The existing workflow for this repo is a 2-step process: the first step is into the repo and the second is into the rendered docs. We review using the checklists more explicitly before finishing step two. There's a notebook template in prep in #473. Feel free to comment. Of course, all of these processes and related docs can and should continue to be iterated on. Just letting you know where we're at up until this point.

crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md

troyraen · 2025-08-29T04:23:00Z

crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md

+plt.show()
+```
+
+The code below plots the matched table along the Galactic plane again, but using the Astropy WCS package so we can display only the Galactic plane.


This is a lot of code to plot the same thing as above but zoomed in. Is it actually helpful to have both? If so, what additional information do we get from each figure? What should I be noticing? Same for step 6.

Well, it offers two ways to plot the data, and zooming in on the part of the sky that we are actually interested in can be more informative than an all-sky map, but to do a zoomed in sky map I cannot use the aitoff projection. I don't know if it is more than just a little coding exercise 😀

crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md

jleagle94 · 2025-08-29T13:30:45Z

> For HEASARC reference, the relevant issue to this PR is HEASARC/tutorials#3.

It should actually be HEASARC/tutorials#5 😃

jleagle94 · 2025-08-29T14:02:31Z

Thanks for the comments and background info @troyraen. I will definitely improve the narration throughout the notebook, related to many of the comments of others.

trjaffe · 2025-08-28T19:17:29Z

crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md


But not every archive's module has a query_tap() is my point.

trjaffe · 2025-08-28T19:20:02Z

crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md

+
+import requests
+
+#2MASS query


    twomass_query = f"""
    SELECT TOP 3 designation, ra, dec, glon, glat,
           DISTANCE(POINT('ICRS', ra,dec),
                    POINT('ICRS', {f_coord.ra.deg}, {f_coord.dec.deg})) AS dist
    FROM fp_psc
    WHERE 1=CONTAINS(
        POINT('ICRS', ra, dec),
        CIRCLE('ICRS', {f_coord.ra.deg}, {f_coord.dec.deg}, {radius})
    )
    ORDER BY dist ASC
    """
    vo.dal.TAPService("https://irsa.ipac.caltech.edu/TAP").search(twomass_query).to_table()

works for me.

That is annoying 😀 . I tested exactly this and got either "invalid number" or "invalid query". Anyway, I will retest and implement.

Huh. I actually still cannot get this to run, but this time I get this error: DALQueryError: Query Error: <No useful error from server> (using .run_async()) but back to invalid number if I use .search().

What version of pyvo are you using? @trjaffe

I would really prefer that you use query_region for these types of spatial queries for 2MASS, because this is anything but user friendly: first users have to hand-write the ADQL (lots of things to unpack there), then call this line:
vo.dal.TAPService("https://irsa.ipac.caltech.edu/TAP").search(twomass_query).to_table()

Instead of this:

Irsa.query_region(f_coord, catalog='fp_psc')
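To make the comparison concrete, here is a small sketch that puts the hand-written ADQL cone search next to the `query_region` call. The function names are hypothetical, the ADQL mirrors the CONTAINS/CIRCLE pattern quoted in this thread, and the astroquery route assumes `Irsa.query_region` accepts `catalog`, `spatial`, and `radius` keywords. The network call is wrapped in a function and not executed here.

```python
def cone_search_adql(ra_deg, dec_deg, radius_deg, table="fp_psc", top=3):
    """Hand-written ADQL cone search, as in the PyVO version of the notebook."""
    # Fixed-precision formatting keeps the coordinates in plain decimal notation.
    ra, dec = f"{ra_deg:.5f}", f"{dec_deg:.5f}"
    return (
        f"SELECT TOP {top} designation, ra, dec, glon, glat "
        f"FROM {table} "
        f"WHERE 1=CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra}, {dec}, {radius_deg}))"
    )


def cone_search_astroquery(ra_deg, dec_deg, radius_deg, catalog="fp_psc"):
    """The same cone search through astroquery, with no ADQL to write by hand."""
    import astropy.units as u  # lazy imports: only needed when the query is sent
    from astropy.coordinates import SkyCoord
    from astroquery.ipac.irsa import Irsa

    coord = SkyCoord(ra=ra_deg * u.deg, dec=dec_deg * u.deg, frame="icrs")
    return Irsa.query_region(coord, catalog=catalog, spatial="Cone",
                             radius=radius_deg * u.deg)


# Example values are the ones quoted later in this thread (degrees).
print(cone_search_adql(161.27739, -59.68220, 0.03))
```

The ADQL route leaves room for non-spatial constraints; the `query_region` route hides the query text entirely.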

I apologize, my intention is not being conveyed here well. In the initial tests I have run so far, an astroquery search within HEASARC data took nearly twice as long (~30 s) as the identical command with pyvo (~15 s). This is only preliminary and may not be significant for other queries.

I haven’t fully explored other differences yet, but since this part of the code already takes several minutes to run, I need more time to properly evaluate both before integrating astroquery. I would appreciate additional time to explore these tools and determine the best way to implement them here.

There shouldn't be any time difference between astroquery and pyvo. They make exactly the same call. Any observed difference may come from the response time of the server.
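Since a single run cannot separate client overhead from server response time, one way to compare the two routes is to repeat each call several times and look at the spread. The helper below is a hypothetical sketch using only the standard library; the stand-in workload is a placeholder for the real query callables.

```python
import statistics
import time


def time_calls(func, repeats=5):
    """Call func() `repeats` times and summarize the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Stand-in workload; in practice pass e.g. lambda: service.search(query)
# for one route and the astroquery equivalent for the other.
stats = time_calls(lambda: sum(range(100_000)), repeats=3)
print(stats)
```

If the min-to-max spread within one route is as large as the gap between routes, the difference is likely server variability rather than the client library.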

Ok, I finally got around to testing a bit more the astroquery version of things. Throughout the notebook, I replace pyvo with the astroquery equivalent code (or close to) and find comparable results. More specifically, I retrieve identical results with comparable computing time. I apologize for raising unnecessary red flags and offending anyone as that was not my intention. I am learning the tools and how to use them effectively and that is my primary goal when I communicate any concerns or thoughts, etc.

Now, when I query the 2MASS catalog, I use astroquery, following @bsipocz's suggestion:

Irsa.query_region(f_coord, catalog='fp_psc')

This is much simpler for a few reasons. 1) ADQL parsing is difficult for 2MASS. I ran into issues using DISTANCE() for negative coordinate values. While the helpdesk says this has been fixed, I still cannot get it to work. 2) I cannot seem to use a table upload the way I can for GAIA (although this is true for Irsa.query_region too, AFAIK). 3) query_region avoids much of the ADQL text necessary for a query_tap (whether in astroquery or in pyvo).

Which brings me to my final assessment: using query_tap in astroquery or pyvo's .search() still uses the same string to build the query we submit and run. So, in the places where this is done, there is really no discernible difference between using pyvo or astroquery that I can see, and I am indifferent to how this part of the code is implemented. I will let @trjaffe and @bsipocz decide its fate. query_region for the 2MASS query is in fact simpler code-wise, so I think it makes the most sense to use it here instead of the existing code.

> ADQL parsing is difficult for 2MASS. I ran into issues using DISTANCE() for negative coordinate values. While the helpdesk says this has been fixed, I still cannot get it to work. 2) I cannot appear to use a table upload the way I can for GAIA (although this is true for Irsa.query_region too AFAIK).

Can you provide me with a simple reproducible example of both the DISTANCE() and table upload queries that are failing for you? I want to follow up with our backend team.

Sure. So, this works:

    f_l, f_b = grouped_display_table['fermi_l'][0], grouped_display_table['fermi_b'][0]
    f_coord = SkyCoord(l=f_l*u.deg, b=f_b*u.deg, frame="galactic").icrs
    ra_in = str(f"{f_coord.ra.deg:.5f}")
    dec_in = str(f"{f_coord.dec.deg:.5f}")
    rad = str(radius)
    twomass_query = (
        """
        SELECT TOP 3 designation, ra, dec, glon, glat
        FROM fp_psc
        WHERE 1=CONTAINS(
            POINT('ICRS', ra, dec),
            CIRCLE('ICRS', """ + ra_in + """, """ + dec_in + """, """ + rad + """)
        )
        """
    )
    twomass_results = mass.run_async(twomass_query).to_table()
    print(twomass_results)

where ra_in, dec_in, and rad are 161.27739, -59.68220, and 0.03 (all in degrees).

If you add DISTANCE(POINT('ICRS', ra, dec), POINT('ICRS', CAST(""" + ra_in + """ AS DOUBLE), CAST(""" + dec_in + """ AS DOUBLE))) AS dist as we do in the GAIA query using the CAST() and strings of coordinate values to avoid other parsing issues, it returns

DALServiceError: Cannot wait for job completion. Job is not active!

due to calling it with run_async, but if you change it to search it gives

DALQueryError: UsageFault: BAD_REQUEST: Invalid or unsupported ADQL query string. See TAP documentation here: https://irsa.ipac.caltech.edu/docs/program_interface/TAP.html.

If I try JOIN in the same way as I do for GAIA,

    twomass_query = (
        """
        SELECT TOP 3 f.fermi_name, f.ra, f.dec, m.designation, m.ra, m.dec, m.glon, m.glat
        FROM TAP_UPLOAD.mytable AS f
        JOIN fp_psc AS m
        ON 1=CONTAINS(
            POINT('ICRS', m.ra, m.dec),
            CIRCLE('ICRS', """ + ra_in + """, """ + dec_in + """, """ + rad + """)
        )
        ORDER BY f.fermi_name
        """
    )
    twomass_results = mass.search(twomass_query,uploads={'mytable': upload_table}).to_table()

the error is again:

DALQueryError: UsageFault: BAD_REQUEST: Invalid or unsupported ADQL query string. See TAP documentation here: https://irsa.ipac.caltech.edu/docs/program_interface/TAP.html

but I cannot figure out why the syntax is wrong other than adding JOIN or DISTANCE.
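Until the server-side DISTANCE() issue is understood, one possible workaround (an assumption, not something tested against IRSA in this thread) is to run the CONTAINS-only query that does work and compute the separations on the client. The sketch below uses the haversine formula with the standard library only; the function names and the example rows are hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (ra, dec) points given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def nearest_rows(rows, ra0, dec0, top=3):
    """Sort rows (mappings with 'ra' and 'dec' keys) by separation from (ra0, dec0)."""
    return sorted(
        rows,
        key=lambda row: angular_separation_deg(row["ra"], row["dec"], ra0, dec0),
    )[:top]


# Hypothetical rows standing in for the result of the CONTAINS-only query.
rows = [
    {"designation": "a", "ra": 161.27739, "dec": -59.66220},
    {"designation": "b", "ra": 161.27739, "dec": -59.68120},
    {"designation": "c", "ra": 161.27739, "dec": -59.70000},
]
closest = nearest_rows(rows, 161.27739, -59.68220, top=2)
print([row["designation"] for row in closest])
```

This replaces `ORDER BY dist ASC` with a client-side sort, so the query would need a TOP limit large enough to include every source inside the search circle.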

crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md

jleagle94 · 2025-09-30T15:20:09Z

I have implemented most of everyone's comments/suggestions following the discussions here. Latest version is up. I ran into a new issue running it on Fornax though, which I think is coming from Python 3.12 dependency clashes between matplotlib and astropy. I don't have time to work through this, so I instead made a Python 3.11 environment and ran the notebook there successfully. The version in shared-storage/ on Fornax documents those steps needed for now.

Review and add new fornax notebook and files

50ed1ec

jleagle94 added ready-for-review use case: cross-matching labels Aug 27, 2025

zoghbi-a self-assigned this Aug 27, 2025

bsipocz changed the title ~~Review and add new fornax notebook and files~~ Add CFG2 cross matching notebook Aug 27, 2025

bsipocz reviewed Aug 27, 2025

troyraen reviewed Aug 29, 2025

trjaffe reviewed Aug 29, 2025

Merge branch 'nasa-fornax:main' into add-notebook

0db2ee1

troyraen reviewed Sep 16, 2025

jleagle94 added 9 commits September 29, 2025 16:26

Merge branch 'nasa-fornax:main' into add-notebook

441fee3

Rename all_sky_galactic_chandra_fermi_gaia_2mass/README.md to crossma…

da8facf

…tch/cxc-fermi-gaia-2mass_cross-match/README.md

Update README.md

c2e591e

Rename all_sky_galactic_chandra_fermi_gaia_2mass/all_sky_galactic_cha…

ee5cb77

…ndra_fermi_gaia_2mass.md to crossmatch/cxc-fermi-gaia-2mass_cross-match/cxc-fermi-gaia-2mass_cross-match.md

Delete all_sky_galactic_chandra_fermi_gaia_2mass directory

a662e3d

requirements file

82eebe5

Updated notebook file

dc4f3c1

Following notebook review comments. Need to implement astroquery in 2MASS query at the end of the notebook. Look for it in the next version.

Update index.md

fdb8cf6

Final versions

528d035

