Cut ripgrep some slack in benchmarks ;-), document how to run benchmarks #104
Comments
@BurntSushi : In early results, it does look like you're beating me with the '-u':

| Test Case | built_ucg | inst_ucg | inst_ag | inst_ripgrep | inst_pcre2grep | inst_system_grep | inst_gnu_grep_e | ...

...and tables don't work in GitHub issue comments, great. :-/

Anyway, this is your second benchmark, PM_RESUME against the built Linux tree: ucg == 0.267, rg -u == 0.181. Except something's not right: you and everyone else are getting 5 hits, and I'm getting 11 hits. As the wise Mark Freuder Knopfler, OBE once said, "Two men say they're Jesus/One of them must be wrong", and I'm guessing that may be me in this instance...

...well, wait again. I'm actually getting 5 hits, but I'm also detecting 6 recursive directory loops due to symlinks, which are mistakenly being counted as hits. You're doing a physical traversal, right? I'm defaulting to logical.
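For readers following along, the miscount described above can be illustrated with a small Python sketch. This is not code from ucg or ripgrep; the function name `walk_logical` and the "skipped" label are made up for the example. It follows symlinks (a logical traversal), remembers the device and inode of every directory it enters, and reports an already-visited directory separately from the files to search, so a symlink loop never lands in the hit count.

```python
import os


def walk_logical(root):
    """Yield (kind, path) pairs while following symlinks.

    kind is "file" for a regular file that should be searched, or
    "skipped" for a directory that was already entered via another path.
    A symlink loop shows up as "skipped"; so does a second, harmless
    symlink to the same directory.
    """
    seen = set()  # (st_dev, st_ino) of every directory already entered
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            st = os.stat(directory)  # os.stat follows symlinks
        except OSError:
            continue  # dangling symlink or unreadable path
        key = (st.st_dev, st.st_ino)
        if key in seen:
            yield ("skipped", directory)
            continue
        seen.add(key)
        try:
            entries = list(os.scandir(directory))
        except OSError:
            continue  # e.g. permission denied
        for entry in entries:
            if entry.is_dir(follow_symlinks=True):
                stack.append(entry.path)
            elif entry.is_file(follow_symlinks=True):
                yield ("file", entry.path)
```

A physical traversal would instead pass `follow_symlinks=False` and never descend into symlinked directories, which makes the `seen` set unnecessary for loop protection.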
Yeah, my impression is that standard behavior is to not follow symlinks, so ripgrep won't do it by default. If you pass
Also, I'm kind of surprised at how slow
Yeah, the Fedora 24 numbers are on VirtualBox. If GitHub supported tables, I could post the system info my benchmark suite obtains for you here. I'm working on getting the results into HTML form suitable for posting (graphs and everything), but I'm not quite there yet. Let me try the table for that specific system:

Test System Details
Parameter | Value

I guess that's almost readable. Like I said before, it's way past time for me to get a new rig. Never enough round tuits.....
Yeah, in my testing the silver searcher does much worse in a virtual machine than on a native system, and my current hypothesis is because of memory maps. You can test it for yourself by passing the (That's not to say it invalidates your benchmark. Running these tools in a VM is a perfectly common and legitimate use case. But it's probably important to acknowledge or at least understand.)
Yep, I've done the experiments too ( Similar topic: Have you tried asynchronous I/O for reading in the files? I have not, but I'm curious if that's any better than just a read() loop.
@gvansickle I've always heard pretty terrible things about async I/O on Linux, so I've never tried it. See: http://stackoverflow.com/questions/8513663/linux-disk-file-aio Note that ripgrep does I/O differently from ucg. When it doesn't use memory maps, it reads incrementally. I think ucg just slurps the entire file in at once and then searches it, right?
Right. I try to read the entire file with one read() call. Honestly I was a bit surprised that worked as well as it does. Again it's probably due to the use-case: mostly smallish files. I should gather statistics on that.....
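The two reading strategies discussed here (slurping the whole file versus reading incrementally) can be contrasted with a rough Python sketch. Neither function is taken from ucg or ripgrep; the names `count_slurp` and `count_incremental`, the 64 KiB buffer size, and the plain substring count are all assumptions made for illustration. The sketch also assumes the search string cannot overlap itself, which holds for something like `PM_RESUME`.

```python
import os


def count_slurp(path, needle):
    """Read the whole file up front, then search it in memory."""
    fd = os.open(path, os.O_RDONLY)
    try:
        remaining = os.fstat(fd).st_size
        chunks = []
        # One read() usually returns everything for a regular file, but it
        # may legally return less, so loop until the file is exhausted.
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:
                break
            chunks.append(chunk)
            remaining -= len(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks).count(needle)


def count_incremental(path, needle, bufsize=64 * 1024):
    """Search through a fixed-size buffer, one read() at a time."""
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1  # bytes carried over between reads
    count = 0
    tail = b""
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            # Prepend the end of the previous window so that a match
            # straddling two reads is still seen exactly once.
            window = tail + chunk
            count += window.count(needle)
            tail = window[-overlap:] if overlap > 0 else b""
    return count
```

The slurping version needs memory proportional to the file size but keeps the search loop trivial, which fits the "mostly smallish files" use case mentioned above; the incremental version bounds memory at the cost of handling matches that cross buffer boundaries.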
Me too. :P It actually holds up pretty well even on largeish files too. (Look at the subtitle benchmarks.)
@BurntSushi : I just updated the one benchmark in the README.md. Sorry it took so long (in more ways than one: now you're winning! ;-))
@BurntSushi rightly reported that the 0.3.0 benchmarks do not pass '-u' to rg, thus giving the other utilities, which don't look at .gitignore files, an arguably unfair advantage. Address this.
Also, document how to obtain the corpora and run the benchmarks. This was slated for 0.3.0 (in my best intentions at least), but didn't make it. Maybe add rg's Linux corpus into the mix as well; it's a more typical use case.