fix: ngram_bench #49

Merged
merged 2 commits into from
May 27, 2024

Conversation

shenxiangzhuang (Owner) commented May 27, 2024

Summary by CodeRabbit

  • Bug Fixes

    • Fixed the use of a counter library function in the ngram benchmark.
  • New Features

    • Switched to AHash in the ngram module for improved performance.
  • Dependency Updates

    • Updated cached dependency to version 0.51.3.
    • Updated versions of regex, lazy_static, rayon, and ahash.
    • Removed counter dependency.
  • Refactor

    • Updated function signatures and imports in the ngram module for better efficiency and clarity.

@shenxiangzhuang shenxiangzhuang added the documentation and enhancement labels May 27, 2024
@shenxiangzhuang shenxiangzhuang self-assigned this May 27, 2024
coderabbitai bot (Contributor) commented May 27, 2024

Walkthrough

Recent updates focus on performance and dependency management. Key changes include fixing a counter function in the ngram bench, switching to AHash in the ngram module, and updating various dependencies in Cargo.toml. The ngram module now uses a more efficient hashing algorithm and has streamlined its function signatures and imports, enhancing both performance and maintainability.
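
As a rough illustration of what switching the hasher involves, here is a minimal sketch of n-gram counting where the hasher is a type parameter. The function name `count_ngrams` and its signature are hypothetical, not the crate's actual code. The sketch runs on the standard library's `RandomState`; it assumes that a faster hasher such as `ahash::RandomState` could be supplied in its place, provided it implements `BuildHasher` and `Default`.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::BuildHasher;

/// Count every n-gram of order 1..=max_order in `tokens`.
/// Hypothetical signature: the hasher `S` is a type parameter so that a
/// different hasher can replace the standard-library default.
fn count_ngrams<S: BuildHasher + Default>(
    tokens: &[String],
    max_order: usize,
) -> HashMap<&[String], usize, S> {
    let mut counts: HashMap<&[String], usize, S> = HashMap::default();
    for order in 1..=max_order {
        // `windows` yields nothing when `order` exceeds the token count.
        for ngram in tokens.windows(order) {
            *counts.entry(ngram).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let tokens: Vec<String> = ["a", "a", "b", "c"].iter().map(|s| s.to_string()).collect();
    // Standard-library hasher here; changing the type argument changes the hasher.
    let counts = count_ngrams::<RandomState>(&tokens, 4);
    println!("{} distinct n-grams", counts.len());
}
```

Keys are slices borrowed from the input, so no token is cloned while counting. Whether the real module takes this approach is not stated in this PR.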

Changes

| File | Change Summary |
| --- | --- |
| CHANGELOG.md | Documented fixes and updates related to the ngram module and dependencies. |
| Cargo.toml | Updated versions for cached, regex, lazy_static, rayon, and ahash. Removed counter dependency. |
| src/ngram.rs | Removed counter::Counter import, updated function signatures, and refactored tests and benchmarks. |

In the code, a shift so bright,
Dependencies take flight.
With AHash now in sight,
Performance gains ignite.
The ngram module, lean and tight,
Prepares for future fights.

github-actions bot commented May 27, 2024

🐰Bencher

Report: Mon, May 27, 2024 at 03:16:49 UTC
Project: bleuscore
Branch: fix/ngram_bench
Testbed: ubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!

  • Latency (latency)

Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds CLI flag.

| Benchmark | Latency | Latency Results (ns) |
| --- | --- | --- |
| bleu::benchmark::bench_batch_bleu | ➖ | 12,097,976.90 |
| bleu::benchmark::bench_bleu | ➖ | 247,295.15 |
| ngram::benchmark::bench_ngram | ➖ | 49,676.07 |
| tokenizer::benchmark::bench_tokenizer | ➖ | 16,209.28 |


codecov bot commented May 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.96%. Comparing base (8785907) to head (8f16eca).

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #49      +/-   ##
==========================================
- Coverage   85.05%   83.96%   -1.09%     
==========================================
  Files           4        4              
  Lines         368      343      -25     
==========================================
- Hits          313      288      -25     
  Misses         55       55              
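
The drop in project coverage, despite every modified line being covered, follows from the ratio itself: hits and lines both fall by 25 while the 55 misses stay in place, so the uncovered share grows. A quick check of the arithmetic, using a hypothetical helper `coverage_pct` and the figures from the diff above:

```rust
/// Coverage as a percentage: covered lines (hits) over coverable lines.
fn coverage_pct(hits: u32, lines: u32) -> f64 {
    100.0 * f64::from(hits) / f64::from(lines)
}

fn main() {
    // Figures copied from the coverage diff.
    let base = coverage_pct(313, 368);
    let head = coverage_pct(288, 343);
    // Misses are identical on both sides: only covered lines went away.
    assert_eq!(368 - 313, 343 - 288);
    println!("base {:.3}%  head {:.3}%  delta {:.3}", base, head, head - base);
}
```

The reported percentages agree with these ratios to within display rounding.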


@shenxiangzhuang shenxiangzhuang changed the title from "add: changelog" to "fix: ngram_bench" May 27, 2024
@shenxiangzhuang shenxiangzhuang merged commit b80988a into master May 27, 2024
15 checks passed
@shenxiangzhuang shenxiangzhuang deleted the fix/ngram_bench branch May 27, 2024 03:17
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: Files that changed from the base of the PR and between 8785907 and 8f16eca.
Files selected for processing (3)
  • CHANGELOG.md (1 hunks)
  • Cargo.toml (1 hunks)
  • src/ngram.rs (3 hunks)
Files skipped from review due to trivial changes (1)
  • Cargo.toml
Additional Context Used
LanguageTool (1)
CHANGELOG.md (1)

Near line 11: There seems to be a noun/verb agreement error. Did you mean “counters” or “countered”?
Context: ...eleased] ### Fixed - ngram bench use counter lib's function rather than the truly us...
Rule ID: SINGULAR_NOUN_VERB_AGREEMENT

Markdownlint (15)
CHANGELOG.md (15)

18: Expected: 1; Actual: 2
Multiple consecutive blank lines


25: Expected: 1; Actual: 2
Multiple consecutive blank lines


10: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines


13: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines


19: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines


26: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines


27: Expected: 1; Actual: 0; Above
Headings should be surrounded by blank lines


27: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines


31: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines


11: null
Lists should be surrounded by blank lines


14: null
Lists should be surrounded by blank lines


20: null
Lists should be surrounded by blank lines


28: null
Lists should be surrounded by blank lines


32: null
Lists should be surrounded by blank lines


32: null
Files should end with a single newline character

Additional comments not posted (2)
src/ngram.rs (2)

21-21: Ensure test imports are correctly scoped.

This import is correctly scoped within the test module, which is good practice in Rust to avoid including test dependencies in the main binary.


60-60: Ensure benchmark imports are correctly scoped.

This import is correctly scoped within the benchmark module, which is good practice in Rust to avoid including benchmark dependencies in the main binary.
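
For readers unfamiliar with the pattern these two comments refer to, here is a minimal sketch of imports scoped to a test-only module. The function `count_tokens` is a hypothetical stand-in and is not taken from `src/ngram.rs`.

```rust
/// Hypothetical library function, standing in for the crate's real code.
pub fn count_tokens(line: &str) -> usize {
    line.split_whitespace().count()
}

#[cfg(test)]
mod tests {
    // These imports are compiled only when building tests,
    // so they add nothing to a normal build.
    use super::count_tokens;
    use std::collections::HashSet;

    #[test]
    fn counts_whitespace_separated_tokens() {
        assert_eq!(count_tokens("a a b c"), 4);
        let unique: HashSet<&str> = "a a b c".split_whitespace().collect();
        assert_eq!(unique.len(), 3);
    }
}

fn main() {
    println!("{}", count_tokens("a a b c"));
}
```

Because the whole module sits behind `#[cfg(test)]`, a `use` inside it cannot trigger unused-import warnings or pull extra items into a non-test build.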

Comment on lines +10 to +11
### Fixed
- `ngram` bench use counter lib's function rather than the truly used function.

Ensure grammatical correctness in the changelog entry.

- `ngram` bench use counter lib's function rather than the truly used function.
+ `ngram` bench uses counter lib's function rather than the truly used function.

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
- ### Fixed
- - `ngram` bench use counter lib's function rather than the truly used function.
+ ### Fixed
+ - `ngram` bench uses counter lib's function rather than the truly used function.

Address formatting issues to improve readability.

+ 
### Fixed
- `ngram` bench uses counter lib's function rather than the truly used function.
+ 
### Changed
- Use AHash in ngram module
+ 
## [0.1.2] - 2024-04-29
+ 
### Changed
- Implement ngram counts with a better way (#33)
- Rust `compute_score` function signature changed to use references (#34)
- Use rayon to run every prediction-references statistics calculation parallely (#38)
- Use ahash for better performance (#39)
+ 
## [0.1.1] - 2024-04-26
+ 
### Changed
- Upgrade `cached` version to `0.50.0`
Change crate-type to `["cdylib", "rlib"]`
+ 
## [0.1.0] - 2024-04-23
+ 
- BLEU score calculation with `Tokenizer13a` as default tokenizer.

Also applies to: 13-14, 18-20, 25-27


Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
- ### Fixed
- - `ngram` bench use counter lib's function rather than the truly used function.
+ ### Fixed
+ - `ngram` bench uses counter lib's function rather than the truly used function.
+ ### Changed
+ - Use AHash in ngram module
+ ## [0.1.2] - 2024-04-29
+ ### Changed
+ - Implement ngram counts with a better way (#33)
+ - Rust `compute_score` function signature changed to use references (#34)
+ - Use rayon to run every prediction-references statistics calculation parallely (#38)
+ - Use ahash for better performance (#39)
+ ## [0.1.1] - 2024-04-26
+ ### Changed
+ - Upgrade `cached` version to `0.50.0`
+ Change crate-type to `["cdylib", "rlib"]`
+ ## [0.1.0] - 2024-04-23
+ - BLEU score calculation with `Tokenizer13a` as default tokenizer.

Comment on lines +65 to +76
  let tokens: Vec<String> = vec![
      "a".to_string(),
      "a".to_string(),
      "b".to_string(),
      "c".to_string(),
  ];
  let max_order = 4;

  let iter_num: usize = 100;
  b.iter(|| {
      std::hint::black_box(for _ in 1..=iter_num {
-         get_ngram_counter(line, max_order);
+         get_token_ngram_counter(&tokens, max_order);

Optimize the benchmark setup.

Consider pre-allocating the tokens vector outside the benchmark loop to avoid repeated allocations during the benchmarking process.

let tokens: Vec<String> = vec![
    "a".to_string(),
    "a".to_string(),
    "b".to_string(),
    "c".to_string(),
];
let max_order = 4;
let iter_num: usize = 100;
b.iter(|| {
    std::hint::black_box(for _ in 1..=iter_num {
        get_token_ngram_counter(&tokens, max_order);
    });
});
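
The same setup can be sketched on stable Rust without the nightly bench harness. This is a sketch under assumptions: `count_bigrams` and `time_calls` are hypothetical names, and `count_bigrams` only stands in for the function under test. The input is allocated once, outside the timed closure, as the comment recommends. Each call's result is also passed through `black_box`, because a `for` loop evaluates to `()` and wrapping the loop itself may leave the optimizer free to discard the work inside it.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the function under test.
fn count_bigrams(tokens: &[String]) -> usize {
    tokens.windows(2).count()
}

/// Time `iter_num` calls of `f`, passing each result through `black_box`
/// so the optimizer cannot discard the computation.
fn time_calls<T>(iter_num: usize, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iter_num {
        black_box(f());
    }
    start.elapsed()
}

fn main() {
    // Input is allocated once, outside the timed closure.
    let tokens: Vec<String> = ["a", "a", "b", "c"].iter().map(|s| s.to_string()).collect();
    let elapsed = time_calls(100, || count_bigrams(black_box(&tokens)));
    println!("100 calls took {:?}", elapsed);
}
```

A single wall-clock measurement like this is noisy; a real harness repeats the measurement and reports statistics.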
