Using read_csv within Databricks to open a local file #2177
-
I have ported some code from pandas to Databricks/Koalas. My `read_csv` statement does not work because the file is local to my computer, not within the Databricks file system (DBFS). I feel like I am missing something obvious. I want my pandas code to work on Databricks/Koalas with minor changes. I know I can use the Databricks GUI point-and-click to create a DBFS table and then make a DataFrame from the table, but that is not programmatic and is a poor solution if I have hundreds of local files. After `import databricks.koalas as ks`, the read fails with `java.io.IOException: No FileSystem for scheme: C`.
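For reference, a minimal sketch of the kind of call that produces this error, assuming a hypothetical local Windows path (`C:/data/example.csv` is illustrative, not the actual file):

```python
import databricks.koalas as ks

# Hypothetical local path -- Spark treats the "C:" prefix as a URL scheme,
# which is why the error reads "No FileSystem for scheme: C".
kdf = ks.read_csv("C:/data/example.csv")
```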
-
It should be either a canonical URL (e.g., …)
-
Can you see if it works with plain PySpark?
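For example, a quick check with plain PySpark against the same path (the path below is a placeholder) would look roughly like this; if this also fails, the problem is in Spark's file access rather than in Koalas:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder path -- substitute the same path you passed to ks.read_csv
sdf = spark.read.csv("C:/data/example.csv", header=True)
sdf.show()
```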
-
Additionally to #2177 (comment), this is just not going to work ‒ internally, `read_*` methods use standard Spark data sources (koalas/databricks/koalas/namespace.py, line 282 in 4d14f37), so the same restriction applies. Any path you want to read has to be accessible to every Spark worker in your cluster, and that's really not the case when you use your local file system. In general, you should migrate your data to distributed file storage first ‒ in the case of DBFS, the official CLI should do the trick.
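A rough sketch of that workflow, assuming the Databricks CLI is installed and configured; the local directory and DBFS path below are placeholders:

```python
import databricks.koalas as ks

# First copy the files to DBFS outside of Spark, e.g. with the Databricks CLI
# (roughly: databricks fs cp --recursive ./local_data/ dbfs:/data/ -- paths are placeholders).
# Then read them through a dbfs:/ URL, which every Spark worker can reach.
kdf = ks.read_csv("dbfs:/data/example.csv")
kdf.head()
```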