I need to implement the FUSE getattr (stat) callback. That is, for a given path I need at least the file type and size, and possibly the name.
I am failing to do this with the HTTP filesystem implementation because:
info(path) always returns the file information of the HTML page, i.e., the type is always reported as a file. This is inconsistent with all other fsspec implementations. The same goes for isfile, which always returns True.
isdir(path) hangs. Looking at my local HTTP server log, or at my network bandwidth when testing against an external server, I see that this call downloads the whole file. This means that, currently, an ls -la will download every file in the given folder...
IMHO, isdir should be implemented via a listdir of the parent if there is no other way. I am also wondering what it actually checks. Is it simply checking whether the mimetype is HTML? If so, the first 1000 or so bytes would suffice. But then, wouldn't it wrongly detect arbitrary HTML files inside a given "folder" as folders?
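A minimal sketch of the parent-listing idea, assuming fs.ls(parent, detail=True) returns dicts with "name" and "type" keys (as fsspec listings generally do) and that directory entries may carry a trailing slash. isdir_via_parent is a hypothetical helper, not part of fsspec:

```python
import posixpath


def isdir_via_parent(fs, path):
    """Decide whether `path` is a directory by looking it up in its parent's listing.

    Hypothetical helper: assumes fs.ls(parent, detail=True) yields dicts
    with "name" and "type" keys.
    """
    target = path.rstrip("/")
    parent = posixpath.dirname(target)
    for entry in fs.ls(parent, detail=True):
        # Compare without trailing slashes, since HTTP listings often
        # report directories as ".../name/".
        if entry["name"].rstrip("/") == target:
            return entry["type"] == "directory"
    # Not present in the parent's listing: treat it as not a directory.
    return False
```

This still fetches one listing page per parent, but not the contents of every file in it.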
My current workaround is to call info first and only call isdir if the mimetype is text/html. This logic could also be implemented in HTTPFileSystem if there is no better way.
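The workaround can be sketched as follows. stat_entry is a hypothetical helper; it assumes fs.info(path) may include "mimetype" and "size" keys and treats anything without an HTML mimetype as a plain file, so the expensive isdir call is only reached for HTML pages:

```python
def stat_entry(fs, path):
    """Return the type and size a FUSE getattr callback needs.

    Hypothetical helper: fs is any object with fsspec-style info() and
    isdir() methods. info() is assumed to be cheap, while isdir() may
    download the whole resource.
    """
    info = fs.info(path)
    # Strip parameters such as "; charset=utf-8" before comparing.
    mimetype = (info.get("mimetype") or "").partition(";")[0].strip().lower()
    # Only HTML pages can be directory listings, so only they pay for isdir().
    if mimetype == "text/html" and fs.isdir(path):
        return {"type": "directory", "size": None}
    return {"type": "file", "size": info.get("size")}
```

An HTML file that is not a listing still costs one isdir call, which is exactly the ambiguity raised above.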
The way ls works for HTTP is to download the URL/page and look for links to URLs that look like children of the original page. This works well for "ftp-style" servers (like python -m http.server), and it is the only way to know whether a URL is a directory.
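To illustrate the idea (a simplified sketch, not fsspec's actual implementation), the following collects every `<a href>` on a page with the standard-library HTMLParser and keeps only absolute URLs that sit strictly below the page's own URL. children_from_listing is a hypothetical name:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def children_from_listing(base_url, html_text):
    """Return absolute child URLs linked from a directory-style page."""
    if not base_url.endswith("/"):
        base_url += "/"
    parser = _LinkCollector()
    parser.feed(html_text)
    parser.close()
    children = []
    for href in parser.hrefs:
        # Resolve relative links, then drop query strings and fragments
        # (e.g. Apache's "?C=N;O=D" sort links).
        url = urljoin(base_url, href).split("#")[0].split("?")[0]
        # Keep only URLs strictly below the page itself (skips "../").
        if url.startswith(base_url) and url != base_url and url not in children:
            children.append(url)
    return children
```

As the issue points out, any HTML page whose links happen to point below its own URL would look like a directory under this rule.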
I guess it would be reasonable to shortcut isdir to return False for ANY URL that isn't HTML?
I suppose the shortcut would actually be in info or ls. Would you like to have a go at coding that? Before calling .text here, checking the content-type of r and returning nothing for any request that isn't HTML should avoid the download.
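A sketch of the suggested shortcut, kept independent of any HTTP client: headers is assumed to be a plain dict of response headers and read_text a zero-argument callable standing in for reading the body (the .text call mentioned above). Both names, and treating a missing Content-Type as non-HTML, are assumptions for illustration:

```python
def _mimetype(headers):
    # Header names are case-insensitive; a plain dict is assumed here.
    for name, value in headers.items():
        if name.lower() == "content-type":
            return value.partition(";")[0].strip().lower()
    return ""


def listing_text_or_none(headers, read_text):
    """Return the response body only when the response is HTML.

    read_text is only called for text/html responses, so non-HTML
    resources are never downloaded just to look for links.
    """
    if _mimetype(headers) != "text/html":
        return None  # nothing to list; skip reading the body
    return read_text()
```

The caller would then treat None as "no children", which is what lets isdir return False without fetching the file.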