Scalability issues #82
BellmannRichard
started this conversation in
General
Replies: 2 comments 1 reply
-
Hi @BellmannRichard
-
Hey @BellmannRichard, as @yzaparto said, we only send the first 5 records of the table. The only scalability issue arises with tables that have many columns. I'll turn this into a discussion; let's see if we manage to find a solution!
-
You clearly have to pass the whole data frame to the OpenAI API. Even for small data frames (hundreds of rows, dozens of columns) this could easily fill up a 4096-token context window, or make users spend a lot of money. You should compute the number of tokens before you make the API call, and if it's over some threshold, warn the user.
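Not from the project's code — just a sketch of what that pre-call check could look like. A real implementation would use a proper tokenizer (e.g. `tiktoken` for OpenAI models); here a crude chars/4 heuristic stands in so the idea is self-contained:

```python
# Hypothetical pre-flight check: estimate token count before calling the API
# and warn the user when the prompt would blow past the context window.

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    # Swap in a real tokenizer (e.g. tiktoken) for accurate counts.
    return max(1, len(text) // 4)

def check_budget(prompt: str, limit: int = 4096) -> bool:
    """Return True if the prompt likely fits within `limit` tokens."""
    n = estimate_tokens(prompt)
    if n > limit:
        print(f"Warning: prompt is ~{n} tokens, over the {limit}-token limit.")
        return False
    return True
```

The threshold and the warning behavior (hard stop vs. prompt the user) are design choices; the point is only that the check happens before any money is spent.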
Also, this will clearly not scale to the size of the datasets used in industry. Try a random dataset with 10,000 rows and 100 columns, for example. If it doesn't work (as I expect), consider testing some fix, such as splitting the data frame into chunks, summarizing them, and using the summaries to answer the research question. Summaries will most likely mess up the floating-point numbers, though. All in all, I don't see how this can work even for medium-sized data frames.
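To make the chunk-and-summarize suggestion concrete, here is a minimal sketch (all names are illustrative, nothing here is from the project): rows are split into fixed-size chunks and each numeric column is reduced to min/max/mean, so only the summaries would ever reach the model:

```python
# Illustrative chunk-and-summarize sketch: rows as a list of dicts with
# numeric values, reduced per chunk to a small per-column summary.

def chunked(rows, size):
    """Yield consecutive slices of `rows` with at most `size` elements."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def summarize_chunk(rows):
    """Reduce each column to min/max/mean for one chunk of rows."""
    summary = {}
    for col in rows[0]:
        vals = [r[col] for r in rows]
        # Note: the mean is a lossy float aggregate -- this is exactly the
        # floating-point precision loss mentioned above.
        summary[col] = {"min": min(vals), "max": max(vals),
                        "mean": sum(vals) / len(vals)}
    return summary

# 10,000-row toy dataset split into 10 chunks of 1,000 rows each.
rows = [{"a": i, "b": i * 0.5} for i in range(10_000)]
summaries = [summarize_chunk(c) for c in chunked(rows, 1_000)]
```

Whether such summaries are enough to answer a research question depends entirely on the question; anything needing exact cell values is lost at this step.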