This repository was archived by the owner on Mar 13, 2026. It is now read-only.
We use pandas-gbq a lot for our daily analyses. It is known that memory consumption can be a pain, see e.g. https://www.dataquest.io/blog/pandas-big-data/
I have started to write a patch, which could be integrated into an enhancement for `read_gbq` (rough idea, details TBD):

- Provide a boolean `optimize_memory` option.
- If `True`, the source table is inspected with a query to get the min, max, and presence of nulls for INTEGER columns, and the percentage of unique values for STRING columns.
- When calling `to_dataframe`, this information is passed to the `dtypes` option, downcasting integers to the appropriate numpy (u)int type and converting strings to the pandas `category` type below some threshold (less than 50% unique values).

I already have a working monkey-patch, which is still a bit rough. If there is enough interest, I'd happily make it more robust and submit a PR. This would be my first significant contribution to an open source project, so some help and feedback would be appreciated.
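To make the idea more concrete, the dtype selection step could look roughly like the sketch below. It is only an illustration, not the monkey-patch itself: the helper names (`pick_int_dtype`, `build_dtypes`), the layout of the `column_stats` dictionary, and the example numbers are all hypothetical, and the statistics are assumed to have been fetched from the table already. Using pandas' nullable integer dtype names (`Int8`, `UInt16`, ...) for columns with nulls is also an assumption, made because plain numpy integers cannot hold missing values.

```python
def pick_int_dtype(min_value, max_value, has_nulls):
    """Return the name of the narrowest integer dtype covering [min_value, max_value]."""
    signed = min_value < 0
    prefix = "" if signed else "u"
    for bits in (8, 16, 32, 64):
        if signed:
            lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        else:
            lo, hi = 0, 2 ** bits - 1
        if lo <= min_value and max_value <= hi:
            if has_nulls:
                # Assumption: fall back to pandas' nullable extension dtypes,
                # since numpy integer arrays cannot represent missing values.
                return f"{prefix.upper()}Int{bits}"
            return f"{prefix}int{bits}"
    raise ValueError("value range does not fit into a 64-bit integer")


def build_dtypes(column_stats, threshold=0.5):
    """Map column names to dtype names based on precomputed column statistics.

    `column_stats` is a hypothetical structure: one dict per column, holding
    the BigQuery type and the statistics gathered by the inspection query.
    """
    dtypes = {}
    for name, stats in column_stats.items():
        if stats["type"] == "INTEGER":
            dtypes[name] = pick_int_dtype(
                stats["min"], stats["max"], stats["has_nulls"]
            )
        elif stats["type"] == "STRING":
            # Convert to category only when the share of unique values
            # is below the threshold; otherwise leave the column alone.
            if stats["n_rows"] > 0 and stats["n_unique"] / stats["n_rows"] < threshold:
                dtypes[name] = "category"
    return dtypes


# Made-up statistics, purely for illustration.
example_stats = {
    "user_id": {"type": "INTEGER", "min": 1, "max": 70_000, "has_nulls": False},
    "delta": {"type": "INTEGER", "min": -100, "max": 100, "has_nulls": True},
    "country": {"type": "STRING", "n_unique": 200, "n_rows": 1_000_000},
    "comment": {"type": "STRING", "n_unique": 900_000, "n_rows": 1_000_000},
}

print(build_dtypes(example_stats))
# {'user_id': 'uint32', 'delta': 'Int8', 'country': 'category'}
```

The resulting mapping is the kind of dictionary that would be handed to the `dtypes` option mentioned above. Columns that do not benefit (here `comment`) are simply left out, so they keep their default dtype.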
Curious to hear your views on this.