Whether you use BitTorrent or not, the bits and bytes still have to move from one place to another. The good thing about something like BitTorrent is that it offloads the traffic from NVFLARE and leaves it for another system to handle. The bad thing is that IT folks have another dependency to worry about and another system to keep an eye on. The good news is that NVFLARE does not dictate either way: you can write your executors to upload results to another system and include only a link in the execution result, and write your controller to download the content from that system. Of course, you will also have to handle error conditions in case the upload or download fails.
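To make the upload-a-result, send-a-link idea concrete, here is a minimal sketch in plain Python. It deliberately does not use any NVFLARE classes; `upload_result`, `download_result`, `with_retries`, and the reference dict layout are all hypothetical names, and a local directory stands in for whatever external system you would actually use. The retry wrapper and checksum check are one possible way to handle the failed upload/download cases.

```python
import hashlib
import shutil
import tempfile
import time
from pathlib import Path


def with_retries(fn, attempts=3, delay=0.0):
    """Call fn(); on OSError retry, re-raising after the last attempt."""
    for i in range(attempts):
        try:
            return fn()
        except OSError:
            if i == attempts - 1:
                raise
            time.sleep(delay)


def upload_result(local_path, store_dir):
    """Client side: copy the result into the store, return a small reference."""
    data = Path(local_path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    dest = Path(store_dir) / digest  # content-addressed file name
    with_retries(lambda: shutil.copyfile(local_path, dest))
    # Only this dict would travel in the execution result, not the payload.
    return {"uri": str(dest), "sha256": digest, "size": len(data)}


def download_result(ref):
    """Server side: fetch the payload named by the reference and verify it."""
    data = with_retries(lambda: Path(ref["uri"]).read_bytes())
    if hashlib.sha256(data).hexdigest() != ref["sha256"]:
        raise ValueError("checksum mismatch for " + ref["uri"])
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "store"
        store.mkdir()
        weights = Path(tmp) / "weights.bin"
        weights.write_bytes(b"\x00" * 1024)
        ref = upload_result(weights, store)
        assert download_result(ref) == weights.read_bytes()
```

In a real deployment the executor would place the returned reference in its result and the controller would call the download side when it is ready to consume the payload.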
-
Something @yanchengnv mentioned in another thread got me thinking that the data layer is probably the biggest bottleneck in this system, and wondering how to solve that.
By data layer I mean only the data that is actively transferred between nodes, not the private training data.
I see a few bottlenecks:
a) all data must go through the server
b) the server could hit memory errors if too many clients respond at once
a) becomes a problem if, for example, you want to train a separate model on each client and then scatter the weights of any client to all clients. That requires either synchronous, one-at-a-time transfer (the server gets weights from client 1 and sends them to all clients, then gets weights from client 2 and sends them to all clients, and so on), which is very slow, or all clients sending their data at once, in which case the server's memory must be large enough to hold all of it (if that is even possible, and it is also slow).
There should be a way to pass around data references rather than the data itself and then to fetch those results rather than have them pushed.
My first thought is that a BitTorrent-style approach would work well in a many-to-many weight-sharing scenario, since clients could send data among themselves and free the server of that responsibility. It may not be feasible in some scenarios, though.
Probably the solution should just provide a simple interface and allow arbitrary backends, e.g. S3, NFS, or BitTorrent.
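As a rough sketch of what such an interface could look like, here is a two-method store in Python with two toy backends. Everything here is hypothetical (`BlobStore`, `DirectoryStore`, `MemoryStore`, `publish_weights`, `fetch_peers`); none of it is an existing NVFLARE API. `DirectoryStore` stands in for a shared mount such as NFS, and an S3 or BitTorrent backend would implement the same `put`/`get` pair.

```python
import tempfile
import uuid
from abc import ABC, abstractmethod
from pathlib import Path


class BlobStore(ABC):
    """Hypothetical interface: store bytes, get back a small reference."""

    @abstractmethod
    def put(self, data: bytes) -> str:
        """Store data and return a reference string."""

    @abstractmethod
    def get(self, ref: str) -> bytes:
        """Fetch the data a reference points to."""


class DirectoryStore(BlobStore):
    """Backend on a directory; stands in for a shared mount like NFS."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data):
        ref = uuid.uuid4().hex
        (self.root / ref).write_bytes(data)
        return ref

    def get(self, ref):
        return (self.root / ref).read_bytes()


class MemoryStore(BlobStore):
    """In-process backend, useful only for tests."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        ref = uuid.uuid4().hex
        self._blobs[ref] = data
        return ref

    def get(self, ref):
        return self._blobs[ref]


def publish_weights(store, weights_by_client):
    """Each client stores its weights; only the references are passed around."""
    return {name: store.put(w) for name, w in weights_by_client.items()}


def fetch_peers(store, refs, me):
    """A client pulls every other client's weights on demand."""
    return {name: store.get(ref) for name, ref in refs.items() if name != me}


if __name__ == "__main__":
    store = DirectoryStore(tempfile.mkdtemp())
    refs = publish_weights(store, {"site-1": b"w1", "site-2": b"w2"})
    print(fetch_peers(store, refs, "site-1"))
```

With this shape the server only ever relays the small `refs` mapping, and each client decides when to pull the actual payloads.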