TaQL: integer aggregate function columns have invalid datatype #265

@bennahugo

Hi, I'm putting together a demo of TaQL and am running into a problem with simple GROUPBY / aggregate functions. I'm importing the table system from casatools, CASA version 6.5.6.22.

Here is a simple example:

tt = tb.taql("select ANTENNA1,ANTENNA2,gcount() as samplecount, sqrt(sumsqr(UVW[:2])) as bllength from tart.ms WHERE ANTENNA1!=ANTENNA2 GROUPBY ANTENNA1,ANTENNA2")

This executes, but it fails when I then call tt.getcol("ANTENNA1") or tt.getcol('samplecount'), with:

2024-04-11 19:25:37	SEVERE	getcol::samplecount	Exception Reported: Unknown casa DataType!

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[88], line 1
----> 1 tt.getcol('samplecount')

File /opt/venvcasa/lib/python3.8/site-packages/casatools/table.py:838, in table.getcol(self, columnname, startrow, nrow, rowincr)
    827 def getcol(self, columnname, startrow=int(0), nrow=int(-1), rowincr=int(1)):
    828     """The entire column (or part of it) is returned. Warning: it might be big!
    829     The functions can only be used if all arrays in the column have the
    830     same shape. That is guaranteed for columns containing scalars or fixed
   (...)
    836     shaped
    837     """
--> 838     return self._swigobj.getcol(columnname, startrow, nrow, rowincr)

File /opt/venvcasa/lib/python3.8/site-packages/casatools/__casac__/table.py:2154, in table.getcol(self, *args, **kwargs)
   2115 def getcol(self, *args, **kwargs):
   2116     """
   2117     getcol(self, _columnname, _startrow, _nrow, _rowincr) -> variant *
   2118 
   (...)
   2152 
   2153     """
-> 2154     return _table.table_getcol(self, *args, **kwargs)

RuntimeError: Unknown casa DataType!

and

2024-04-11 19:34:41	SEVERE	getcol::ANTENNA1	Exception Reported: Unknown casa DataType!

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[90], line 1
----> 1 tt.getcol("ANTENNA1")

File /opt/venvcasa/lib/python3.8/site-packages/casatools/table.py:838, in table.getcol(self, columnname, startrow, nrow, rowincr)
    827 def getcol(self, columnname, startrow=int(0), nrow=int(-1), rowincr=int(1)):
    828     """The entire column (or part of it) is returned. Warning: it might be big!
    829     The functions can only be used if all arrays in the column have the
    830     same shape. That is guaranteed for columns containing scalars or fixed
   (...)
    836     shaped
    837     """
--> 838     return self._swigobj.getcol(columnname, startrow, nrow, rowincr)

File /opt/venvcasa/lib/python3.8/site-packages/casatools/__casac__/table.py:2154, in table.getcol(self, *args, **kwargs)
   2115 def getcol(self, *args, **kwargs):
   2116     """
   2117     getcol(self, _columnname, _startrow, _nrow, _rowincr) -> variant *
   2118 
   (...)
   2152 
   2153     """
-> 2154     return _table.table_getcol(self, *args, **kwargs)

RuntimeError: Unknown casa DataType!

It succeeds for the floating-point aggregate, or whenever I fetch a floating-point valued column:

tt.getcol('bllength')
array([0.19455482, 0.51647411, 0.75153464, 0.3595906 , 0.86945279,
       1.03569452, 1.20868631, 1.46945354, 1.61672806, 1.95074426,
       2.13140339, 1.0461806 , 1.18919058, 1.70865012, 2.0106987 ,
       2.17048716, 2.48252164, 1.02388213, 1.36084806, 1.68034738,
       1.79704041, 1.9604013 , 2.17944748, 0.67646162, 0.89569323,
       0.53671945, 1.00977189, 0.8411397 , 1.01413148, 1.27489871,
       1.42217324, 1.75618943, 1.93684857, 1.17858796, 1.31766977,...

Funnily enough, when no aggregate functions are requested in the select, the integer values are returned correctly, e.g.:

tt = tb.taql("select ANTENNA1,ANTENNA2 as bllength from tart.ms WHERE ANTENNA1!=ANTENNA2 GROUPBY ANTENNA1,ANTENNA2")
tt.getcol("ANTENNA1") # or ANTENNA2

returns an int-valued array:

array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,
        2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  3,  3,
        3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,

Full environment:

casadata==2023.6.26
casafeather==0.0.18
casalogger==1.0.16
casampi==0.5.3
casaplotms==2.1.2
casaplotserver==1.6.1
casashell==6.5.6.22
casatablebrowser==0.0.32
casatasks==6.5.6.22
casatestutils==6.5.6.22
casatools==6.5.6.22
casaviewer==1.8.2
python-casacore==3.5.2

With the table import from python-casacore, the same TaQL statements work correctly. I'm not sure what inside the CASA environment makes this fail; perhaps it is something to raise with the CASA team members?

from pyrap.tables import table as tbl
from pyrap.tables import taql
tt = taql("select ANTENNA1,ANTENNA2,gcount() as samplecount, sqrt(sumsqr(UVW[:2])) as bllength from tart.ms WHERE ANTENNA1!=ANTENNA2 GROUPBY ANTENNA1,ANTENNA2")
tt.getcol("ANTENNA1")
tt.getcol("samplecount")

returns

array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,
        2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  3,  3,...

and

array([6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,
       6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,
       6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,
       6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,
       6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,
       6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,
       6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841, 6841,

respectively, as expected.
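To make the comparison between the two environments easier to reproduce, here is a minimal sketch of a probe that tries getcol() on every column of a query result and records which ones fail. It assumes only that the table object exposes colnames(), coldatatype() and getcol(), which I believe both the casatools table tool and the python-casacore table do. The helper name probe_columns and the FakeTable stand-in are hypothetical and exist only so the snippet runs without CASA installed; the stand-in simply mimics the failure pattern reported above.

```python
def probe_columns(tab, names=None):
    """Try getcol() on each column and record (datatype, status) per column."""
    report = {}
    for name in (names if names is not None else tab.colnames()):
        # Ask the table what it thinks the column type is; this may itself fail.
        try:
            dtype = tab.coldatatype(name)
        except Exception as exc:
            dtype = f"<unavailable: {exc}>"
        # The actual read is where "Unknown casa DataType!" was raised.
        try:
            tab.getcol(name)
            status = "ok"
        except RuntimeError as exc:
            status = f"failed: {exc}"
        report[name] = (dtype, status)
    return report


class FakeTable:
    """Hypothetical stand-in for a TaQL result table (not a CASA class)."""

    def __init__(self, columns, broken=()):
        self._columns = dict(columns)
        self._broken = set(broken)

    def colnames(self):
        return list(self._columns)

    def coldatatype(self, name):
        # Report the Python type of the first value as a rough type label.
        return type(self._columns[name][0]).__name__

    def getcol(self, name):
        if name in self._broken:
            raise RuntimeError("Unknown casa DataType!")
        return self._columns[name]


# Mimic the behaviour described in this issue: integer columns fail, float works.
demo = FakeTable(
    {
        "ANTENNA1": [0, 0, 1],
        "samplecount": [6841, 6841, 6841],
        "bllength": [0.19, 0.52, 0.75],
    },
    broken={"ANTENNA1", "samplecount"},
)
report = probe_columns(demo)
for name, (dtype, status) in report.items():
    print(f"{name:12s} {dtype:8s} {status}")
```

With a real result, the call would be probe_columns(tt), where tt is the table returned by tb.taql(...) or by the python-casacore taql(...) call; the datatype strings reported by the two packages may differ in spelling.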
