
asyncio code error during S3 upload despite only using synchronous functions #909

Open
anastaciacastro01 opened this issue Nov 4, 2024 · 5 comments

Comments

@anastaciacastro01

I am using s3fs to send data from SQL Server to S3. I recently started getting an error that seems to be related to some type of async process.

Here is the exact process I take:

  1. Read data from a table in SQL Server
  2. Save it to a local file
  3. Push it to S3 using the write() function of the S3File object (this is the step where I am getting the errors described below)

Here are the exact lines of code where the error occurs (fs is the S3FileSystem object):

with open(local_name, "rb") as local:
    with fs.open(s3_path_unique, "wb") as s3_file:
        s3_file.write(local.read())

At random times and for random tables, I have been getting these errors that seem to be related to a connection issue:

Exception in callback _SelectorSocketTransport._write_send()
handle: <Handle _SelectorSocketTransport._write_send()>
Traceback (most recent call last):
  File "C:\Program Files\Python\Python312\Lib\asyncio\events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "C:\Program Files\Python\Python312\Lib\asyncio\selector_events.py", line 1137, in _write_send
    assert self._buffer, 'Data should not be empty'
           ^^^^^^^^^^^^
AssertionError: Data should not be empty
Future exception was never retrieved
future: <Future finished exception=ConnectionError('Connection lost')>
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
 
The above exception was the direct cause of the following exception:
 
ConnectionError: Connection lost

I am currently working on solving the ConnectionError, but can anyone help me understand these other two questions?

  1. Where is the asyncio code used in s3fs? I am not using asyncio in my code and for some reason have not been able to catch these errors with a try/except structure.
  2. Is there some type of timeout parameter that I can adjust in s3fs to reduce the likelihood of a connection error?

Thanks in advance!

@martindurant
Member

First, a clarification: some fsspec implementations like s3fs are implemented async internally. The API that you are calling is blocking, but an event loop is running on another thread which actually executes the IO. This means that batch operations can happen without serialising the latency wait for each one.
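The pattern described here, a blocking API that hands work to an event loop running on a background thread, can be sketched with just the standard library. This is an illustration of the general technique, not s3fs's actual internals; it also shows why exceptions raised inside loop callbacks ("Future exception was never retrieved") never reach a try/except in the calling thread unless they propagate through the returned future:

```python
import asyncio
import threading

class SyncFacade:
    """Blocking wrapper around async IO running on a dedicated loop thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        # The loop runs forever on its own daemon thread; the public
        # methods below stay synchronous from the caller's point of view.
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    async def _async_write(self, data: bytes) -> int:
        # Stand-in for the real async HTTP call (aiobotocore, in s3fs's case).
        await asyncio.sleep(0)
        return len(data)

    def write(self, data: bytes) -> int:
        # Submit the coroutine to the background loop and block on the result.
        # Exceptions raised in the coroutine surface here, via the future;
        # exceptions raised in other loop callbacks do not.
        fut = asyncio.run_coroutine_threadsafe(self._async_write(data), self._loop)
        return fut.result()

f = SyncFacade()
print(f.write(b"hello"))  # caller never touches asyncio directly; prints 5
```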

Second: your actual error is the 'Data should not be empty' one. This rings a bell; it sounds like something that was fixed earlier this year. What version of s3fs do you have?
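On the question about timeouts: S3FileSystem forwards client settings to the underlying aiobotocore client via `config_kwargs`, so something along these lines should let you tune timeouts and retries. The specific keys here (`connect_timeout`, `read_timeout`, `retries`) are botocore `Config` options; double-check them against the botocore docs for your installed version:

```python
import s3fs

# config_kwargs are passed through to the botocore/aiobotocore client config.
fs = s3fs.S3FileSystem(
    config_kwargs={
        "connect_timeout": 30,
        "read_timeout": 120,
        "retries": {"max_attempts": 5, "mode": "adaptive"},
    },
)
```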

@anastaciacastro01
Author

anastaciacastro01 commented Nov 4, 2024

Thank you!

My current version is 2024.3.1

-- Edit: I updated the version and will let you know if I am still running into issues in the next few days

@anastaciacastro01
Author

I'm still running into the same error after updating the version. The error itself is probably related to the server, then.

Is there a way to at least catch the error, though? Right now, it's not going to the except block.

@martindurant
Member

Please post your whole traceback, so we can see where this is being called from.

@anastaciacastro01
Author

That is the whole traceback that I see. The first error repeats over and over, and occasionally the second error shows up once at the bottom.

I did end up switching to put_file() instead, and that seems to have solved the issue for now.
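put_file() hands the whole transfer to s3fs, which is generally the simpler path. If the ConnectionError still surfaces occasionally, a plain retry wrapper around the call is one way to contain it. The helper below is generic; the commented usage line assumes the `fs`, `local_name`, and `s3_path_unique` names from earlier in the thread:

```python
import time

def with_retries(fn, attempts=3, delay=1.0, exceptions=(ConnectionError, OSError)):
    """Call fn(), retrying on transient connection-style errors."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the error propagate
            time.sleep(delay * attempt)  # simple linear backoff

# Usage with s3fs (names from the thread above):
# with_retries(lambda: fs.put_file(local_name, s3_path_unique))
```

Note this only catches errors raised through the blocking call itself; errors logged from the event-loop thread's callbacks still won't pass through it.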
