As pointed out in #9457, creating a GAPIC client and letting it be garbage collected without first closing its transport's channel leaks sockets / file descriptors.
Steps to reproduce
- Start a Jupyter Notebook (or launch the code example with ipython).
- Load the magics with %load_ext google.cloud.bigquery
- Run a %%bigquery magic command.
- Observe with psutil that open connections are not closed.
Code example
Notebook as Markdown:
import psutil
from google.cloud import bigquery
current_process = psutil.Process()
num_conns = len(current_process.connections())
print("connections before loading magics: {}".format(num_conns))
connections before loading magics: 12
%load_ext google.cloud.bigquery
num_conns = len(current_process.connections())
print("connections after loading magics: {}".format(num_conns))
connections after loading magics: 12
%%bigquery --use_bqstorage_api
SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15
|    | year | birth_count |
|----|------|-------------|
| 0  | 2008 | 4255156 |
| 1  | 2007 | 4324008 |
| 2  | 2006 | 4273225 |
| 3  | 2005 | 4145619 |
| 4  | 2004 | 4118907 |
| 5  | 2003 | 4096092 |
| 6  | 2002 | 4027376 |
| 7  | 2001 | 4031531 |
| 8  | 2000 | 4063823 |
| 9  | 1999 | 3963465 |
| 10 | 1998 | 3945192 |
| 11 | 1997 | 3884329 |
| 12 | 1996 | 3894874 |
| 13 | 1995 | 3903012 |
| 14 | 1994 | 3956925 |
num_conns = len(current_process.connections())
print("connections after running magics: {}".format(num_conns))
connections after running magics: 16
%%bigquery --use_bqstorage_api
SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15
|    | year | birth_count |
|----|------|-------------|
| 0  | 2008 | 4255156 |
| 1  | 2007 | 4324008 |
| 2  | 2006 | 4273225 |
| 3  | 2005 | 4145619 |
| 4  | 2004 | 4118907 |
| 5  | 2003 | 4096092 |
| 6  | 2002 | 4027376 |
| 7  | 2001 | 4031531 |
| 8  | 2000 | 4063823 |
| 9  | 1999 | 3963465 |
| 10 | 1998 | 3945192 |
| 11 | 1997 | 3884329 |
| 12 | 1996 | 3894874 |
| 13 | 1995 | 3903012 |
| 14 | 1994 | 3956925 |
num_conns = len(current_process.connections())
print("connections after running magics: {}".format(num_conns))
connections after running magics: 20
Full example:
import psutil
from google.cloud import bigquery
current_process = psutil.Process()
num_conns = len(current_process.connections())
print("connections before loading magics: {}".format(num_conns))
get_ipython().run_line_magic('load_ext', 'google.cloud.bigquery')
num_conns = len(current_process.connections())
print("connections after loading magics: {}".format(num_conns))
get_ipython().run_cell_magic('bigquery', '--use_bqstorage_api', 'SELECT\n source_year AS year,\n COUNT(is_male) AS birth_count\nFROM `bigquery-public-data.samples.natality`\nGROUP BY year\nORDER BY year DESC\nLIMIT 15')
num_conns = len(current_process.connections())
print("connections after running magics: {}".format(num_conns))
get_ipython().run_cell_magic('bigquery', '--use_bqstorage_api', 'SELECT\n source_year AS year,\n COUNT(is_male) AS birth_count\nFROM `bigquery-public-data.samples.natality`\nGROUP BY year\nORDER BY year DESC\nLIMIT 15')
num_conns = len(current_process.connections())
print("connections after running magics: {}".format(num_conns))
Stack trace
N/A
Suggested fix
As identified in #9457, we need to close the bqstorage_client.transport.channel, since we create a new BQ Storage client each time.
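A minimal sketch of what closing the channel could look like, assuming the client exposes `transport.channel` with a `close()` method as named above. The `closing_channel` helper and the `Fake*` classes are hypothetical stand-ins so the example runs without credentials or network access; they are not part of any library.

```python
import contextlib


@contextlib.contextmanager
def closing_channel(client):
    """Yield ``client`` and close its transport's channel on exit.

    Hypothetical helper: assumes ``client.transport.channel.close()`` exists.
    """
    try:
        yield client
    finally:
        # Runs on normal exit and on exceptions, so the socket is released
        # either way instead of waiting for garbage collection.
        client.transport.channel.close()


# Hypothetical stand-ins for a BQ Storage client and its transport.
class FakeChannel:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeTransport:
    def __init__(self):
        self.channel = FakeChannel()


class FakeStorageClient:
    def __init__(self):
        self.transport = FakeTransport()


client = FakeStorageClient()
with closing_channel(client) as bqstorage_client:
    pass  # e.g. download the query results through bqstorage_client here
print("channel closed: {}".format(client.transport.channel.closed))
```

Using `try`/`finally` means the channel is also closed when the query or download raises, which matters because a new client is created on every cell execution.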
I suggest we also add psutil as a test-only dependency and verify in a system test of google.cloud.bigquery.magics._cell_magic that there are no additional open connections after running the cell magic.
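The check could be sketched roughly as follows. `count_open_connections` uses the same psutil call as the notebook above; `assert_no_new_connections` is a hypothetical helper name, and the counter is injectable so the logic can be exercised without psutil or a real query.

```python
import contextlib


def count_open_connections():
    """Count this process's open connections, as the notebook above does."""
    import psutil  # third-party; imported lazily so only this helper needs it

    return len(psutil.Process().connections())


@contextlib.contextmanager
def assert_no_new_connections(count=count_open_connections):
    """Fail if the wrapped block leaves more connections open than before.

    Hypothetical helper: ``count`` is any zero-argument callable returning
    the current number of open connections.
    """
    before = count()
    yield
    after = count()
    assert after <= before, "leaked {} connection(s)".format(after - before)


# Demonstration with a fake counter that mimics the 12 -> 16 jump seen above.
fake_counts = iter([12, 16])
try:
    with assert_no_new_connections(lambda: next(fake_counts)):
        pass  # a system test would run the cell magic here
    leak_detected = False
except AssertionError:
    leak_detected = True
print("leak detected: {}".format(leak_detected))
```

In a system test, the body of the `with` block would run the cell magic and the default psutil-based counter would be used.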