Support writing and reading table indices for formats supporting column metadata by taldcroft · Pull Request #18703 · astropy/astropy

taldcroft · 2025-10-09T15:47:29Z

Description

This has been a long-standing goal since 2017 (#6925 and #16836 from 2024).

See this gist for a quick demo of this functionality.

By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

github-actions · 2025-10-09T15:47:45Z

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

Do the proposed changes actually accomplish desired goals?
Do the proposed changes follow the Astropy coding guidelines?
Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

github-actions · 2025-10-09T15:47:46Z

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

mhvk · 2025-10-09T20:24:31Z

I only had time for a brief look, and think the overall idea is good. However, I wonder if we cannot stick closer to the existing idea of serializing columns, i.e., treat this as a representation of a specific class. I feel that might avoid introducing specific new arguments like engine or wonder about how to represent that.

For reference, here's a simple table with a SkyCoord and a masked column (note that the mask in the sky column ignores serialize_method="data_mask" -- #18704):

# %ECSV 1.0
# ---
# datatype:
# - {name: sc.ra, unit: deg, datatype: float64}
# - {name: sc.dec, unit: deg, datatype: float64}
# - {name: q, unit: km, datatype: float64}
# - {name: q.mask, datatype: bool}
# meta: !!omap
# - __serialized_columns__:
#     q:
#       __class__: astropy.units.quantity.Quantity
#       unit: !astropy.units.Unit {unit: km}
#       value: !astropy.table.SerializedColumn
#         __class__: astropy.utils.masked.core.MaskedNDArray
#         data: !astropy.table.SerializedColumn {name: q}
#         mask: !astropy.table.SerializedColumn {name: q.mask}
#     sc:
#       __class__: astropy.coordinates.sky_coordinate.SkyCoord
#       dec: !astropy.table.SerializedColumn
#         __class__: astropy.coordinates.angles.core.Latitude
#         unit: &id001 !astropy.units.Unit {unit: deg}
#         value: !astropy.table.SerializedColumn
#           __class__: astropy.utils.masked.core.MaskedNDArray
#           data: !astropy.table.SerializedColumn {name: sc.dec}
#       frame: icrs
#       ra: !astropy.table.SerializedColumn
#         __class__: astropy.coordinates.angles.core.Longitude
#         unit: *id001
#         value: !astropy.table.SerializedColumn
#           __class__: astropy.utils.masked.core.MaskedNDArray
#           data: !astropy.table.SerializedColumn {name: sc.ra}
#         wrap_angle: !astropy.coordinates.Angle
#           unit: *id001
#           value: 360.0
#       representation_type: spherical
# schema: astropy-2.0
sc.ra sc.dec q q.mask
"" "" 0.0 False
29.999999999999996 6.0 8.0 True
44.99999999999999 7.0 9.0 False

I guess an overall difficulty, as always, is naming... This is partially related to whether one should think of the indices as belonging to the column or to the table. In the former case, maybe it should be something like col._index_. But multiple columns is possible and the index cannot be accessed that way. Still should it be _index_col and _index_col1_col2 rather than a number?

taldcroft · 2025-10-11T10:12:36Z

However, I wonder if we cannot stick closer to the existing idea of serializing columns, i.e., treat this as a representation of a specific class.

@mhvk - I did think about trying to fit this into the serialized columns infrastructure and I agree that re-using this would be preferable if possible. There were a couple of issues that I could not resolve:

I don't think it will be possible to entirely eliminate all the table-level sandwich construct/represent code. To create an index for a multi-column index on ["a", "b"], the code needs the data for both columns a and b. The serialized column machinery is explicitly running per-column and column a knows nothing about the column b data.

Similarly, the machinery is expecting that the attribute data for a column is unique to that column. Consider the multi-column index above. Then column a represents the row index data with a.indices.row_index, but this will be repeated for column b as b.indices.row_index. It isn't obvious how to get around this. This also presumes some new class to allow constructing an index from some attributes and data (basically doing what get_index does but in a class, though again since this requires the entire table I'm not even sure this can work).

So overall I am unable to see a clear path to actually using the serialized columns machinery here. That doesn't mean it isn't possible, just that I probably can't do it without help.

~~What might be reasonable is moving the representation tbl["a"]["meta"]["__indices__"] data into the meta["__serialized_columns__"] structure. That would at least avoid making a new namespace.~~ Edit: see below.

One lingering doubt in all of this is that I do not like the current indexing implementation of the indices living on the columns instead of the table. I was supervising the GSOC student that did this but I honestly don't remember why we went that direction, but I have in my head a project to refactor this. If you look at the earlier commits in this PR, the indices were being serialized at the table level which actually feels somewhat more natural and you avoid duplication for multi-column indices.

taldcroft · 2025-10-11T10:24:19Z

About putting the indices into __serialized_columns__, I realize now that this would not be easy. The way things are now the code wants to consume all the keywords and put them into the class constructor. So:

>>> text = """
# %ECSV 1.0
# ---
# datatype:
# - {name: col0, unit: m, datatype: float64}
# meta: !!omap
# - __serialized_columns__:
#     col0:
#       __class__: astropy.units.quantity.Quantity
#       unit: !astropy.units.Unit {unit: m}
#       value: !astropy.table.SerializedColumn {name: col0}
#       indices: stuff
# schema: astropy-2.0
col0
1.0
2.0
"""
>>> QTable.read(text, format="ecsv")
TypeError: Quantity.__new__() got an unexpected keyword argument 'indices'

Everything is possible but I fear this would be messy.

taldcroft · 2025-10-11T10:29:02Z

~~I'm somewhat leaning to going back to the original implementation of a new __table_indices__ key on the table meta, alongside __serialized_columns__.~~ That eliminates duplication and matches the more intuitive notion that the indices are an attribute of the table not the columns.

EDIT: that is conceptually nicer but has an implementation problem.

mhvk · 2025-10-12T00:27:40Z

@taldcroft - I agree, let's think of those indices not as part of any column! Given that, I agree with the basic idea of your notation -- though perhaps it could still be something like f"_index_{'_'.join(relevant_colnames)}". I do worry that might get quite long, so perhaps just a number is fine. After all, in general there will be only one index (in which case I'd suggest it to be just _index_ or _index or so).

This, however, arguably makes those indices more like a column, not less so! So, I continue to wonder if it would not be possible to use the SerializedColumn machinery. If I understand the indexing correctly (which I'm really not sure I do...), the actual index holders are SlicedIndex. I guess conceptually the question is whether one can imagine it having a sensible .info so that it looks enough like a column that it can be serialized. Of course, one then still has to special-case that it is not stored in TableColumns, but rather in Table.meta. But it seems OK that where the indices are stored on the table is an implementation detail.

Anyway, I'll try to think a bit more too, with the idea that they live on the table, not on columns, in mind.

taldcroft · 2025-10-12T10:41:38Z

@mhvk - for some reason when writing about using a table-level __table_indices__, I forgot the reason I switched to columns. The FITS writer handles meta on columns transparently, but for table level meta it defaults to those items being normal FITS keywords, with special handling for __serialized_columns__.

This highlights that the current solution (meta on columns) allows keeping the code for this feature confined to table/connect.py, while ideas at the table level will require changes in other modules.

taldcroft · 2025-10-12T11:01:22Z

the question is whether one can imagine it having a sensible .info so that it looks enough like a column that it can be serialized.

As described more fully previously, I do not see how this is possible. The serialization mechanism with .info is predicated on the object owning its attributes and having a constructor that takes an attribute column (e.g. ra for SkyCoord) as a keyword argument. If that previous comment does not make sense I can elaborate.

taldcroft · 2025-10-12T11:06:14Z

Of course, one then still has to special-case that it is not stored in TableColumns, but rather in Table.meta.

If, in the end, we are putting the serialized index information somewhere other than __serialized_columns__, then I don't see any reason to use the SerializedColumn machinery to do the serialization since that amounts to an implementation detail. The current implementation is almost certainly simpler and more confined than what we could do via SerializedColumn and .info since that is not a great fit and requires making more special cases and other scaffolding to work.

mhvk · 2025-10-12T13:27:33Z

@taldcroft - I can see it making some sense, in that, effectively, these are computed columns, so different, and it would seem correct to not have them inside __serializaed_columns__. (Which doesn't mean one couldn't use the machinery more -- I think whatever would be in .info is not all that different from what you currently coded up in get_index -- but let's agree to make it an implementation detail.)

So, agreed to think of it as part of the table meta data and call it __indices__.

Another question would be why one would store the whole index rather than just have it be recomputed? (Or that at least be the default option?) But to answer that at least in part myself: I think it is great to have the possibility to write out the index column since it provides information about the table that now is explicitly exposed, and can thus be read by humans or regular .csv readers, etc. And whatever is in the header needs of course to include what is needed to reconstruct the index.

mhvk · 2025-10-12T13:30:25Z

More to the nitty gritty, looking at the output as you have it,

      #   - __indices__:
      #     - colnames: !!python/tuple [a]
      #       engine: SortedArray
      #       primary: true
      #       row_index_colname: __index__0
      #       unique: false

First, smaller details, easy to fix:

I'd just let colnames be a yaml list - that it is a tuple is irrelevant for the information, and since we are creating a new serialization mechanism for .escv, we don't need to include that.
primary would seem a property not of an individual index, but of the whole group, so I'd move that out under __indices__ -- though not sure one wants it to refer to [a], as it does in the table, or rather to __index__0.

A slightly larger suggestion would still be to go with a format more like SerialiazedColumn, to keep implementation simpler and more consistent (also for other people writing .ecsv readers!). I.e., I'd do,

- __indices__
  - primary: __index__0  # or primary_key: [a], might want to omit if there is only one index
  - __index__0  # still don't like that name much!, do omit zero if there is only one.
    - __class__: Index  # not SlicedIndex, that seems irrelevant
    - columns: [a]
    - engine: SortedArray
    - unique: False

So, not very different from what you have, except that now __index__0 is basically formatted like a serialized column, so initialization is trivial: just __class__(**kwargs), with machinery that is in place.

mhvk · 2025-10-12T13:39:20Z

This highlights that the current solution (meta on columns) allows keeping the code for this feature confined to table/connect.py, while ideas at the table level will require changes in other modules.

Hmm, only now saw your counterpoint to what we seemed to be heading towards... I do think our conclusion was the right one, though, so perhaps we should bite the bullet and start treating tables more like columns everywhere. Maybe this is not so difficult if we start making table.info more compatible with column.info? (Not suggesting we again try to make a table be able to be a column in another one, but just to increase the amount of shared machinery...)

taldcroft · 2025-10-13T10:10:24Z

Another question would be why one would store the whole index rather than just have it be recomputed?

For me at least, the promise of serializing the indices comes with actually storing the index like a real database. I.e. prefering to use disk space instead of CPU and processing time. Just having a light shim to store the keys and recompute the indices seems like a bit of a sham.

taldcroft · 2025-10-13T10:44:41Z

So, agreed to think of it as part of the table meta data and call it __indices__.

I'm not sure if you are being loose with words or I am not getting the message across. The index information needs to be stored on the COLUMN meta, not on the table meta. That is, unless we are willing to add significant new infrastructure to serializing meta for FITS.

Currently the FITS writer assumes that all keywords in the table meta can be expressed directly in FITS, i.e. the values should be simple types like int, float, string. In _encode_mixins(), the code checks if there is any column meta, and if so triggers the subsequent code to serialize that arbitrary column meta using FITS COMMENT line starting with "--BEGIN-ASTROPY-SERIALIZED-COLUMNS--". It is not elegant but that's what we use.

tl;dr: column meta gets serialized for free in FITS (and ECSV, HDF5).

Serializing table meta in FITS would require special-casing some keyword like __indices__ and then making a new section of FITS header comments like "--BEGIN-ASTROPY-SERIALIZED-INDICES--". This is feasible, but I'm not very keen on doubling-down a formalism that I don't like to begin with (despite having invented it!).

taldcroft · 2025-10-13T11:07:27Z

About the suggestion of formatting like below:

- __indices__
  - primary: __index__0  # or primary_key: [a], might want to omit if there is only one index
  - __index__0  # still don't like that name much!, do omit zero if there is only one.
    - __class__: Index  # not SlicedIndex, that seems irrelevant
    - columns: [a]
    - engine: SortedArray
    - unique: False

I also don't like the repetition of primary, but with your suggestion there is a problem that YAML mappings do not preserve the order, so managing them by key does not guarantee preserving the index order. This matters because tbl.indices is fundamentally a list and code can certainly do index = tbl.indices[1].

tl;dr: the indices must be represented as a list, not a mapping.

So, not very different from what you have, except that now __index__0 is basically formatted like a serialized column, so initialization is trivial: just __class__(**kwargs), with machinery that is in place.

Sadly the indexing code is not so simple. The Index class is actually created as an object and then passed into the SlicedIndex initializer. The new get_index code somewhat approximates what you have in your mind, but you can see that the entire table is an input to this constructor. Your __class__(**kwargs) does not have sufficient information to create a SlicedIndex.

taldcroft · 2025-10-13T13:42:41Z

@mhvk - I've pushed a few more commits to start making this real, with the understanding that the output format may still be open for update.

Right now, here is what it looks like, where I implemented your suggestions about colnames and starting with __index__. Another key update here is not repeating index_info, so each unique index info appears only once. I agree this is not perfect, but I do think it is better than the alternative I outlined for FITS.

>>> t = QTable()
>>> t["a"] = [2, 3, 1, 1]
>>> t["b"] = [3, 5, 4, 6]
>>> t.add_index("a", engine=SCEngine)
>>> t.add_index(["b", "a"])
>>> t.write(sys.stdout, format="ecsv", write_indices=True)
# %ECSV 1.0
# ---
# datatype:
# - name: a
#   datatype: int64
#   meta: !!omap
#   - __indices__:
#     - colnames: [a]
#       engine: SCEngine
#       primary: true
#       row_index_colname: __index__
#       unique: false
# - name: b
#   datatype: int64
#   meta: !!omap
#   - __indices__:
#     - colnames: [b, a]
#       engine: SortedArray
#       primary: false
#       row_index_colname: __index__1
#       unique: false
# - {name: __index__, datatype: int64}
# - {name: __index__1, datatype: int64}
# schema: astropy-2.0
a b __index__ __index__1
2 3 2 0
3 5 3 2
1 4 0 1
1 6 1 3

taldcroft · 2025-10-19T12:23:02Z

Ping @mhvk.

The feature freeze is coming very soon and this would be good to get in if we can agree on the format and location for the new metadata. Thinking about how the ECSV looks is maybe the easiest way to visualize. The internal implementation does not matter too much since we can change that later.

The failing tests are all from a trivial doctest issue that I will fix.

pllim · 2025-10-20T16:12:22Z

The feature freeze is coming very soon

OMG is it this week? 😱

taldcroft · 2025-10-20T16:24:01Z

The release calendar says Friday Oct 24, but I have not seen any notice to astropy-dev. Usually there should be a week since the final warning and the feature freeze.

taldcroft · 2026-04-30T13:00:50Z

Passing tests!

neutrinoceros · 2026-04-30T13:02:46Z

Quick note that I couldn't actually get to review the tests this morning, because the task I tackled first (looking for the origin of a segfault in devdeps) took me much longer than I anticipated. I'll get around to this next time I'm at a keyboard, which will be about 2 hours from now. Sorry about this delay.

taldcroft · 2026-04-30T14:28:08Z

@mhvk @neutrinoceros - I looked again at the FITS code and I think I see a way to use a table-level meta. It requires some special-casing and also smushing a new key into the --BEGIN-ASTROPY-SERIALIZED-COLUMNS-- comments section. It would look like below. I'm going to play around with this today as a follow-up PR to this one, but let me know now if you have objections.

COMMENT --BEGIN-ASTROPY-SERIALIZED-COLUMNS--
COMMENT datatype:
COMMENT - {name: a, unit: m, datatype: float64}
COMMENT - {name: b, datatype: int64}
COMMENT meta: !!omap
COMMENT - __serialized_columns__:
COMMENT     a:
COMMENT       __class__: astropy.units.quantity.Quantity
COMMENT       unit: !astropy.units.Unit {unit: m}
COMMENT       value: !astropy.table.SerializedColumn {name: a}
COMMENT - __table_indices__:
COMMENT     indices:
COMMENT     - colnames: [a, __index__1]
COMMENT       row_index_colname: __index__
COMMENT     - colnames: [a]
COMMENT       row_index_colname: __index__2
COMMENT     primary_key: [a, __index__1]
COMMENT --END-ASTROPY-SERIALIZED-COLUMNS--

mhvk

@taldcroft - I think that FITS comment section would be very self-explanatory, so definitely good as far as I'm concerned. And happy to take (correctly) any blame if we have to convince the maintainers that this can go in a little late...

But let me also approve this now -- it looks all good! I won't merge, though, since @neutrinoceros seemed to intend to come back to this soon (aside, thanks very much @neutrinoceros for taking over the more careful review!).

taldcroft · 2026-04-30T16:07:17Z

Thanks @mhvk. I only regret now calling that section ASTROPY-SERIALIZED-COLUMNS instead of something like ASTROPY-TABLE-META. Sigh.

About the follow-up PR, it might not be a stretch to call it a bug fix that could be applied to the RC. 😄

neutrinoceros

got a couple simple questions about these tests

neutrinoceros · 2026-04-30T16:31:14Z

+    lines = [
+        "# %ECSV 1.0",
+        "# ---",
+        "# datatype:",
+        "# - name: a",
+        "#   datatype: int64",
+        "#   meta: !!omap",
+        "#   - __indices__:",
+        "#     - colnames: [a]",
+        "#       engine: Foo",
+        "#       index_colname: __index__",
+        "#       unique: false",
+        "# - {name: __index__, datatype: int64}",
+        "# schema: astropy-2.0",
+        "a __index__",
+        "1 0",
+        "3 2",
+        "2 1",
+    ]
+    text = "\n".join(lines)


I think this is stable under ruff format

Suggested change

lines = [

"# %ECSV 1.0",

"# ---",

"# datatype:",

"# - name: a",

"# datatype: int64",

"# meta: !!omap",

"# - __indices__:",

"# - colnames: [a]",

"# engine: Foo",

"# index_colname: __index__",

"# unique: false",

"# - {name: __index__, datatype: int64}",

"# schema: astropy-2.0",

"a __index__",

"1 0",

"3 2",

"2 1",

]

text = "\n".join(lines)

text = "\n".join([

"# %ECSV 1.0",

"# ---",

"# datatype:",

"# - name: a",

"# datatype: int64",

"# meta: !!omap",

"# - __indices__:",

"# - colnames: [a]",

"# engine: Foo",

"# index_colname: __index__",

"# unique: false",

"# - {name: __index__, datatype: int64}",

"# schema: astropy-2.0",

"a __index__",

"1 0",

"3 2",

"2 1",

])

I don't like stacking things like this, won't fix.

What about using implicit str concatenation then ?

(Basically, replace "\n".join([ with (, drop all trailing commas, and change ]) for ) at the end)

I find the easiest to read for these kinds of situations is just a triple quote string with a dedent at the end!

neutrinoceros · 2026-04-30T16:38:33Z

+    t.add_index("a")
+    out = io.StringIO()
+    t.write(out, format="ecsv", write_indices=True)
+    assert out.getvalue().splitlines() == [


Is there a particular benefit to comparing lists of lines instead of a single string of text that I'm not seeing ?

Because it formats more nicely. I'm always bothered by having fully left-justified lines in the code, it just looks ugly. I've settled on this idiom and it works well and is convenient for me.

Implicit str concatenation looks very similar but even simpler.

But the output using implicit str concatenation is not easy to generate, in contrast to the way I am doing it.

What do mean "generate" ?

For "regression test" type outputs like this, I normally run the test script with --pdb and if it fails, then I print out.getvalue().splitlines(), paste that into the file and reformat with ruff. Then I examine the diffs carefully to ensure that they are what I expect.

That process is reliable and efficient.

fair enough

neutrinoceros · 2026-04-30T16:43:15Z

+    t = QTable()
+    t["a"] = Time([1, 3, 2, 2], format="cxcsec")
+    t["b"] = [3, 2, 2, 1]
+    t["__index__"] = [3, 1, 4, 2]  # Force a collision
+    indices_colnames = [
+        ["a"],
+        ["b", "a"],
+        ["a", "b", "__index__"],
+        ["__index__"],
+    ]


since the setup never changes, I'd prefer storing this input in a module constant and reuse-it (without mutations) than skip test case that we currently think do not matter (but for how long ?).

Won't fix, out of time.

neutrinoceros · 2026-04-30T16:44:53Z

+        t.add_index(colnames, engine=engine)
+
+    path = tmp_path / f"out.{fmt}"
+    kwargs = {"serialize_meta": True, "path": "root"} if fmt == "hdf5" else {}


this would be clearer if expressed as part of the test's parametrization instead of here

Is it okay if I do it ? Maybe as a follow up PR ?

You can do a followup PR, but not before the 8.0 feature freeze. I'm currently working on a PR (intended for 8.0) to move the serialization info from columns into table meta. This will touch tests and I don't want to get into a merge conflict.

neutrinoceros · 2026-04-30T16:45:02Z

+    kwargs = {"serialize_meta": True, "path": "root"} if fmt == "hdf5" else {}
+    t.write(path, format=fmt, write_indices=True, **kwargs)
+
+    kwargs = {"astropy_native": True} if fmt == "fits" else {}


neutrinoceros · 2026-04-30T16:47:02Z

+    for colnames in indices_colnames:
+        index = t.indices[colnames]
+        index2 = t2.indices[colnames]
+        assert index.id == index2.id


just a note in passing, I really wish we could use pytest 9's subtest fixture here. We can't, it's too new, but I'll still take the opportunity to raise awareness about this new and extremely convenient way to write large tests with many asserts

neutrinoceros · 2026-04-30T16:49:23Z

+        assert np.all(index.data.sorted_data() == index2.data.sorted_data())
+        # Check engine items as a list of pairs of the form
+        # [(key, [row 1, row 2, ...]), ...].
+        assert index.data.items() == index2.data.items()


does this actually compare lists ? Deliberately not refreshing my memory here, but when I see an items() method I expect it to work as dict's, which produces a generator, which in turn doesn't really support comparisons other than identity.

For these engines the protocol dictates that .items() returns list[tuple[Hashable, list[Integral]]]. So this is OK.

taldcroft

@neutrinoceros - thanks for the detailed review. I'm not going to address some of your change requests that don't directly improve test correctness or completeness but are intended to improve the code style. I am out of time. I'd like to get this PR merged and if you want to improve the test code style later that's fine.

taldcroft · 2026-04-30T18:25:56Z

+    t = QTable()
+    t["a"] = Time([1, 3, 2, 2], format="cxcsec")
+    t["b"] = [3, 2, 2, 1]
+    t["__index__"] = [3, 1, 4, 2]  # Force a collision
+    indices_colnames = [
+        ["a"],
+        ["b", "a"],
+        ["a", "b", "__index__"],
+        ["__index__"],
+    ]


Won't fix, out of time.

taldcroft · 2026-04-30T18:26:26Z

+        t.add_index(colnames, engine=engine)
+
+    path = tmp_path / f"out.{fmt}"
+    kwargs = {"serialize_meta": True, "path": "root"} if fmt == "hdf5" else {}


taldcroft · 2026-04-30T18:26:32Z

+    kwargs = {"serialize_meta": True, "path": "root"} if fmt == "hdf5" else {}
+    t.write(path, format=fmt, write_indices=True, **kwargs)
+
+    kwargs = {"astropy_native": True} if fmt == "fits" else {}


taldcroft · 2026-04-30T18:28:25Z

+        assert np.all(index.data.sorted_data() == index2.data.sorted_data())
+        # Check engine items as a list of pairs of the form
+        # [(key, [row 1, row 2, ...]), ...].
+        assert index.data.items() == index2.data.items()


For these engines the protocol dictates that .items() returns list[tuple[Hashable, list[Integral]]]. So this is OK.

neutrinoceros · 2026-04-30T19:07:14Z

Understandable. Will you be available tomorrow at all, or should I just assume that any correction has to be postponed to after the freeze ?

taldcroft · 2026-04-30T21:37:22Z

I won't be able to look at it until my follow-up or is merged. Your cleanup PR needs to come after my follow-up PR because they are likely to conflict. I'm only working for a couple of hours tomorrow morning.

neutrinoceros · 2026-05-01T08:25:18Z

argh. Forgot to squash.

neutrinoceros · 2026-05-01T08:26:40Z

@astrofrog, any way we could replay this as a squash-merge ? (basically what I'm asking amounts to "could you force push my merge commit away very soon ?")

astrofrog · 2026-05-01T09:58:40Z

Why do we need to squash merge?

neutrinoceros · 2026-05-01T10:04:51Z

NBD if we can't, as far as I'm concerned. I just wish I could ease my instant regrets.

taldcroft · 2026-05-01T10:05:27Z

I would have preferred a squash merge, but I agree at this point that we should not do any shenanigans with main. There are no huge files or other particular problems in the commits.

taldcroft added this to the v7.2.0 milestone Oct 9, 2025

github-actions Bot added table unified-io labels Oct 9, 2025

taldcroft mentioned this pull request Oct 9, 2025

index column not preserved on table write/read #16836

Closed

taldcroft requested a review from mhvk October 9, 2025 15:52

taldcroft commented Oct 9, 2025

View reviewed changes

Comment thread astropy/table/connect.py Outdated

taldcroft added the whatsnew-needed label Oct 13, 2025

taldcroft force-pushed the table-serialize-indices-on-read-write branch from ca99162 to 991b078 Compare October 14, 2025 18:53

taldcroft marked this pull request as ready for review October 19, 2025 12:23

taldcroft requested a review from neutrinoceros as a code owner October 19, 2025 12:23

taldcroft force-pushed the table-serialize-indices-on-read-write branch from e8a4b10 to 4d97ef3 Compare April 30, 2026 10:34

taldcroft added 3 commits April 30, 2026 06:36

Fix test

530ca36

Fix changelog file name

c5f4ac6

Add a comment to clarify code flow

035d6cd

taldcroft requested review from mhvk and neutrinoceros April 30, 2026 13:00

mhvk approved these changes Apr 30, 2026

View reviewed changes

neutrinoceros reviewed Apr 30, 2026

View reviewed changes

taldcroft added 2 commits April 30, 2026 14:06

Change warning message for clarity

4e8b292

Address review comments

0aff966

taldcroft commented Apr 30, 2026

View reviewed changes

neutrinoceros approved these changes May 1, 2026

View reviewed changes

neutrinoceros merged commit d7e0ad4 into astropy:main May 1, 2026
37 of 38 checks passed

taldcroft deleted the table-serialize-indices-on-read-write branch May 1, 2026 10:05

taldcroft mentioned this pull request May 1, 2026

Put serialized table indices into table meta instead of column meta #19630

Merged

1 task

astrofrog mentioned this pull request May 1, 2026

armv7 CI failures with recent serialization changes #19632

Closed

neutrinoceros mentioned this pull request May 1, 2026

TST: fix portability issue with new serialization tests #19634

Closed

1 task

taldcroft mentioned this pull request May 2, 2026

Fix table indices serialization test failures on ARMv7 and improve coverage #19644

Merged

1 task

neutrinoceros mentioned this pull request May 7, 2026

TST: centralize parameters on a test from #18703 #19709

Open

1 task

Uh oh!

Conversation

taldcroft commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Uh oh!

github-actions Bot commented Oct 9, 2025

Uh oh!

github-actions Bot commented Oct 9, 2025

Uh oh!

Uh oh!

mhvk commented Oct 9, 2025

Uh oh!

taldcroft commented Oct 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

taldcroft commented Oct 11, 2025

Uh oh!

taldcroft commented Oct 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mhvk commented Oct 12, 2025

Uh oh!

taldcroft commented Oct 12, 2025

Uh oh!

taldcroft commented Oct 12, 2025

Uh oh!

taldcroft commented Oct 12, 2025

Uh oh!

mhvk commented Oct 12, 2025

Uh oh!

mhvk commented Oct 12, 2025

Uh oh!

mhvk commented Oct 12, 2025

Uh oh!

taldcroft commented Oct 13, 2025

Uh oh!

taldcroft commented Oct 13, 2025

Uh oh!

taldcroft commented Oct 13, 2025

Uh oh!

taldcroft commented Oct 13, 2025

Uh oh!

taldcroft commented Oct 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pllim commented Oct 20, 2025

Uh oh!

taldcroft commented Oct 20, 2025

Uh oh!

taldcroft commented Apr 30, 2026

Uh oh!

neutrinoceros commented Apr 30, 2026

Uh oh!

taldcroft commented Apr 30, 2026

Uh oh!

mhvk left a comment

Choose a reason for hiding this comment

Uh oh!

taldcroft commented Apr 30, 2026

Uh oh!

neutrinoceros left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

taldcroft Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

taldcroft commented Oct 9, 2025 •

edited

Loading

taldcroft commented Oct 11, 2025 •

edited

Loading

taldcroft commented Oct 11, 2025 •

edited

Loading

taldcroft commented Oct 19, 2025 •

edited

Loading

taldcroft Apr 30, 2026 •

edited

Loading

neutrinoceros Apr 30, 2026 •

edited

Loading