Stream implementation #9
Node streams for objects come with a terrible performance bottleneck (see for instance rdfjs/N3.js#6), but iteration by itself would make sense. However, there is a performance penalty associated with crossing the JS/C++ barrier, so the current system of limit/offset is probably as efficient as it gets.
…but it would make sense to implement a JavaScript-land iterator interface on top of the limit/offset system, for instance such that triples are fetched in groups of 100, along the lines of the sketch below.
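A minimal sketch of that idea, assuming a callback-style `searchTriples(subject, predicate, object, { offset, limit }, callback)` method on the document object; the method name, its signature, and the async-generator shape are illustrative assumptions, not the library's confirmed interface:

```js
const BATCH_SIZE = 100;

// Sketch: iterate over all matches, crossing the JS/C++ barrier once per
// batch of 100 triples instead of once per triple.
async function* iterateTriples(document, subject, predicate, object) {
  for (let offset = 0; ; offset += BATCH_SIZE) {
    const triples = await new Promise((resolve, reject) => {
      document.searchTriples(subject, predicate, object,
        { offset, limit: BATCH_SIZE },
        (error, batch) => error ? reject(error) : resolve(batch));
    });
    yield* triples;
    if (triples.length < BATCH_SIZE)
      return; // a short batch means we are past the last match
  }
}
```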
Interesting discussion there; I haven't had such a horrible experience with streams myself. Things have probably changed quite a bit in Node.js land since that discussion. I reran your perf script on a file of mine, and these are the results: the N3 parser perf script in master (uses
A script I quickly wrote that uses
Ran on Node v7.7.2. P.S. An iterator would be great to have as well, of course ;)
Not bad, not bad at all… I wonder how much better RubenVerborgh/AsyncIterator performs on that. In any case, a more than appropriate feature request.
What would you think about the implementation details, Ruben? I was thinking about:
Any thoughts?
I wouldn't do that: you want to reduce the number of times you cross the C/JavaScript barrier for performance reasons.
I would implement all of this in JS-land, with a stream/iterator that communicates with the C code in batches of N triples (with N for instance 100).
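One possible shape for that, again assuming the hypothetical callback-style `searchTriples` from the sketch above:

```js
const { Readable } = require('stream');

// Sketch: an object-mode Readable that fetches one batch of N triples
// across the C++ barrier only when downstream asks for more data.
class TripleStream extends Readable {
  constructor(document, subject, predicate, object, batchSize = 100) {
    super({ objectMode: true });
    Object.assign(this, { document, subject, predicate, object, batchSize });
    this.offset = 0;
    this.fetching = false;
  }

  _read() {
    if (this.fetching) return; // a batch is already on its way
    this.fetching = true;
    this.document.searchTriples(this.subject, this.predicate, this.object,
      { offset: this.offset, limit: this.batchSize },
      (error, triples) => {
        if (error) return this.emit('error', error);
        this.fetching = false;
        this.offset += triples.length;
        for (const triple of triples) this.push(triple);
        if (triples.length < this.batchSize) this.push(null); // exhausted
      });
  }
}
```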
I've already implemented a stream-wrapper as you describe myself, and notice a performance penalty there as well (i.e., fetching 1000 query results is cheaper than fetching 10×100 query results).
We probably should have an optional … Also note optimizations such as this one that minimize the number of JavaScript objects being allocated. If multiple triples in the same batch share the same subject (or predicate, or object), then they will be the exact same in-memory string. We should carefully think about these as well.
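The actual optimization lives on the C++ side of the binding; a JS-land analogue of the idea, purely for illustration, would be to intern terms per batch so that repeated terms reference a single string:

```js
// Sketch: map identical term texts to one shared string instance per batch,
// so V8 keeps a single allocation for a repeated subject/predicate/object.
function internTerms(rawTriples) {
  const terms = new Map(); // term text -> shared string
  const intern = (term) => {
    let shared = terms.get(term);
    if (shared === undefined) terms.set(term, shared = term);
    return shared;
  };
  return rawTriples.map((t) => ({
    subject: intern(t.subject),
    predicate: intern(t.predicate),
    object: intern(t.object),
  }));
}
```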
A quick experiment where I 1) fetch 1 million query results in one go (i.e., 1 million results in a single callback), and 2) fetch 1 million results in 100 batches (i.e., 10,000 results per callback):
The difference in performance greatly depends on the query. A … To me, it seems that implementing this in batches (all on the JS side) comes at a cost.
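For reproducing such numbers, a rough harness could look like this (`searchTriples` and its signature remain assumptions):

```js
// Hypothetical helper: fetch one page of results as a promise.
function fetchBatch(document, [s, p, o], offset, limit) {
  return new Promise((resolve, reject) =>
    document.searchTriples(s, p, o, { offset, limit },
      (error, triples) => error ? reject(error) : resolve(triples)));
}

// Time fetching `total` results in pages of `batchSize`.
async function timeBatched(document, pattern, total, batchSize) {
  const start = Date.now();
  for (let offset = 0; offset < total; offset += batchSize)
    await fetchBatch(document, pattern, offset, batchSize);
  return Date.now() - start; // milliseconds
}

// One batch of 1,000,000 versus 100 batches of 10,000:
// await timeBatched(document, [null, null, null], 1e6, 1e6);
// await timeBatched(document, [null, null, null], 1e6, 1e4);
```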
Yes, but everything depends on the access pattern! What if only the first 10 results are ever used? For the scenario where we know that a) all triples will be read and b) they all fit in memory, my assumption would indeed be that fetching all is more performant. Can you try with different buffer sizes to see the effect? I'm thinking that around |
Also, the …
Agreed, it depends on the access pattern. What is your use case for only fetching the first 10 results? If you know you're interested in 10, you can just use the callback mechanism and add a limit of 10, right? In a stream scenario you're often interested in all the results (but without the memory costs, of course).
It's b) that I'd like to fix here with a stream implementation. Maybe I wasn't clear, but I think we can get the best of both worlds (low memory usage and fast result streaming) by:
That would only get worse, right? My example was 1 million triples divided over 100 chunks, i.e., a buffer size of 10,000. Lowering that even more would greatly degrade performance (takes ~50 seconds). Or is there something lost in translation here? ;)
True, with
Nah, a stream means that I can stop whenever I want. Maybe the 10th result has a characteristic that I'm interested in. Maybe my downstream stops pulling after 10 results. In general, a consumer does not always know in advance how many results to pull.
That means maintaining state though, and carefully cleaning up. You should especially be careful with keeping JavaScript objects alive (such as the strings I mentioned above as an optimization).
Indeed, avoiding backpressure is key here; but backpressure also exists when I'm slowly pulling 10 results and thousands are already in memory.
My bad, I misread. A serious performance penalty indeed. I'm surprised offsetting is not cheaper. Are we sure the problem is offsetting, BTW? What is the overhead of 100 times fetching the first 10,000 results versus fetching 100 consecutive pages of 10,000 results?
Wait, isn't that result better?
Fair enough. Still, we can simply optimize the buffer size in that case. A buffer size of 1024 still sounds reasonable, as the performance difference in fetching 10 or 1024 is negligible. Ideally, this can be improved with a dynamic buffer, as in the sketch below.
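A minimal sketch of such a dynamic buffer, growing geometrically from a small first batch toward a cap (the numbers, and the `fetchBatch` helper from the earlier sketch, are illustrative assumptions):

```js
// Sketch: cheap first results for consumers that stop early,
// near-bulk throughput for consumers that read everything.
async function* iterateTriplesDynamic(document, pattern,
                                      initial = 16, max = 1024) {
  let offset = 0;
  for (let limit = initial; ; limit = Math.min(limit * 2, max)) {
    const triples = await fetchBatch(document, pattern, offset, limit);
    yield* triples;
    if (triples.length < limit) return; // exhausted
    offset += limit;
  }
}
```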
Good points
That's right, but I guess we have to find a middle ground between memory usage and performance here. And a buffer of 1024 still sounds reasonable, right?
Good point. The results are 1) 950 ms for fetching the first 10,000 results 100 times, vs. 2) 5.7 seconds for fetching 100 consecutive pages of 10,000 results (using the
Ha! Completely misread, you're right. No clue why a chunked query would perform better than a single large query. Got any ideas?
+1
I wonder if there's anything within hdt-cpp we can do for that. Intuitively, I'm surprised that offsetting is so expensive. The
I'm not too surprised; I've seen similar results with the N3.js parser parsing a streaming file versus loading the entire file into memory first. There is a threshold at which streaming becomes faster. I think the block of memory simply becomes too large and inefficient. This is good evidence for streaming in blocks. We can even use this to find a good buffer size.
IMO that makes it a good reason to have a well-performing stream implementation, so we're not making a badly performing query even worse.
That's an interesting observation. Some quick experiments showed that a chunk size of 2000 is the pivot point (on Node v8.4.0).
An addition: the …
So we might even need to treat the … Regarding the pivot point, it would be good to set up an experiment to determine it, as it can indeed evolve between Node versions. In general, a performance suite would be good, also w.r.t. changes to hdt-cpp.
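Such an experiment could be as small as the following sketch, reusing the hypothetical `timeBatched` helper from above (all names and chunk sizes are illustrative):

```js
// Sketch: locate the pivot chunk size for a given machine and Node version.
async function findPivot(document, pattern, total = 1e6) {
  for (const chunkSize of [500, 1000, 2000, 4000, 10000, total]) {
    const ms = await timeBatched(document, pattern, total, chunkSize);
    console.log(`${chunkSize}\t${ms} ms`);
  }
}
```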
Oops, I tested this without updating the git submodule. The new hdt-cpp version does support offsetting the … So, it seems there is no need to modify the C++ code for a streaming interface, and we can simply put all the streaming logic in the JS code.
Perfect, happy to hear that! My colleague @joachimvh is testing the performance of streams, iterators, and ES6 native iterators for another project; his findings will be useful to make a decision here as well.
@joachimvh I'm interested: did you manage to do an analysis of streams, iterators, and ES6 native iterators? Did you observe anything interesting?
I remember @joachimvh working on this. What were our results? Can we submit them to the WWW dev track?
I only did some minor tests before getting distracted by something else, but the first results seemed to indicate:
- Fastest is the standard Node.js stream if you have a datatype that can easily be put into a Buffer, such as integers. With other datatypes, the problem is that if you still try to use a Buffer, you somehow have to put the data in there (with JSON.stringify or a custom serialization function, maybe), which drastically slows things down.
- AsyncIterator and sync generator functions had pretty much the same performance, which was a bit slower than the Buffer stream implementation for integers, but didn't degrade for more complex objects.
- A lot slower were the Node.js object streams (i.e., objectMode = true), and even slower were the async generator functions (tested with both Babel and TypeScript transpilation).

But again, these tests were not that extensive, which is why I didn't want to put exact numbers on them (though they might give an indication already).
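For reference, toy versions of the shapes being compared might look as follows (illustrative sketches, not the actual test code):

```js
const { Readable } = require('stream');

// 1. Buffer stream: integers packed into one Buffer (the fast path).
function bufferStream(n) {
  let sent = false;
  return new Readable({
    read() {
      if (sent) return this.push(null);
      const buf = Buffer.alloc(4 * n);
      for (let i = 0; i < n; i++) buf.writeInt32LE(i, 4 * i);
      sent = true;
      this.push(buf);
    },
  });
}

// 2. Object-mode stream: one JavaScript object per chunk (far more overhead).
function objectStream(n) {
  let i = 0;
  return new Readable({
    objectMode: true,
    read() { this.push(i < n ? { value: i++ } : null); },
  });
}

// 3. Sync generator: no stream machinery at all.
function* generate(n) {
  for (let i = 0; i < n; i++) yield { value: i };
}
```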
Thanks. I recently read this blog post about async iterators, which might explain the bad performance of your Node object streams: https://medium.com/netscape/async-iterators-these-promises-are-killing-my-performance-4767df03d85b
@joachimvh That's good news for the generators. Any need still for https://www.npmjs.com/package/asynciterator then?
Well, they are still sync generators, so you lose the whole async advantage of AsyncIterator there. The moment you start having streams where you need to wait on data, this would be a problem (or at least complicate things).
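The difference in a nutshell (`fetchNextBatch` is a hypothetical async source):

```js
// A sync generator can only yield data that is already in memory…
function* syncTriples(batches) {
  for (const batch of batches) yield* batch;
}

// …whereas an async iterator can suspend until the next batch arrives.
async function* asyncTriples(fetchNextBatch) {
  let batch;
  while ((batch = await fetchNextBatch()).length > 0) yield* batch;
}
```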
Ah yes, I misread, indeed. So we have:
Let's try to finish your tests somehow so we can keep track of async generator performance as engines start to support it natively. But for the moment, I think we should stick with AsyncIterator then.
Sorry for digging this up; I came across this issue by chance. What could be interesting is transferring (parts of) the dictionary to JS first (if that is possible). Then all data could be stored in a Buffer, and the buffer could be much larger.
@mielvds That would be less data to transfer, but creation of objects in JS land might (or might not!) be more expensive.
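A sketch of that idea, using a single flat dictionary for simplicity (HDT actually has per-role dictionaries, and every name here is hypothetical):

```js
// Sketch: triples travel as three 32-bit term IDs each, packed into one
// Buffer; term strings are looked up in a dictionary transferred once.
function decodeTriples(idBuffer, dictionary) {
  const triples = [];
  for (let pos = 0; pos + 12 <= idBuffer.length; pos += 12) {
    triples.push({
      subject: dictionary[idBuffer.readUInt32LE(pos)],
      predicate: dictionary[idBuffer.readUInt32LE(pos + 4)],
      object: dictionary[idBuffer.readUInt32LE(pos + 8)],
    });
  }
  return triples;
}
```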
Have you ever considered a stream implementation instead of a callback? Any interest in that direction?