fix: ngram_bench #49

shenxiangzhuang · 2024-05-27T03:13:22Z

Summary by CodeRabbit

Bug Fixes
- Fixed the ngram benchmark, which was calling the counter library's function instead of the function the crate actually uses.
New Features
- Switched to AHash in the ngram module for improved performance.
Dependency Updates
- Updated cached dependency to version 0.51.3.
- Updated versions of regex, lazy_static, rayon, and ahash.
- Removed counter dependency.
Refactor
- Updated function signatures and imports in the ngram module for better efficiency and clarity.

coderabbitai · 2024-05-27T03:13:28Z

Walkthrough

Recent updates focus on performance and dependency management. Key changes include fixing a counter function in the ngram bench, switching to AHash in the ngram module, and updating various dependencies in Cargo.toml. The ngram module now uses a more efficient hashing algorithm and has streamlined its function signatures and imports, enhancing both performance and maintainability.

Changes

File	Change Summary
`CHANGELOG.md`	Documented fixes and updates related to the `ngram` module and dependencies.
`Cargo.toml`	Updated versions for `cached`, `regex`, `lazy_static`, `rayon`, and `ahash`. Removed `counter` dependency.
`src/ngram.rs`	Removed `counter::Counter` import, updated function signatures, and refactored tests and benchmarks.
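
As a rough illustration of the kind of n-gram counting the walkthrough refers to, here is a minimal sketch. The function name `count_ngrams` and its signature are hypothetical and not taken from `src/ngram.rs`. It uses the standard library `HashMap` so that it compiles without extra crates; assuming the `ahash` crate is a dependency, `ahash::AHashMap` could be substituted for `HashMap` to get the faster hasher the PR describes.

```rust
use std::collections::HashMap;

/// Count every n-gram of order 1..=max_order in `tokens`.
/// Hypothetical sketch: keys borrow slices of the input, so no token is cloned.
fn count_ngrams(tokens: &[String], max_order: usize) -> HashMap<&[String], usize> {
    let mut counts: HashMap<&[String], usize> = HashMap::new();
    // Cap the order at the token count; `windows` needs a non-zero size,
    // and the range is empty when either bound is zero.
    for order in 1..=max_order.min(tokens.len()) {
        for window in tokens.windows(order) {
            *counts.entry(window).or_insert(0) += 1;
        }
    }
    counts
}
```

For the tokens `a a b c` with `max_order = 4`, this yields nine distinct n-grams, with the unigram `a` counted twice.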

In the code, a shift so bright,
Dependencies take flight.
With AHash now in sight,
Performance gains ignite.
The ngram module, lean and tight,
Prepares for future fights.

github-actions · 2024-05-27T03:14:15Z

Bencher

Report	Mon, May 27, 2024 at 03:16:49 UTC
Project	bleuscore
Branch	fix/ngram_bench
Testbed	ubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!
Latency (latency)

Benchmark	Latency	Latency Results (ns)
bleu::benchmark::bench_batch_bleu	➖	12,097,976.90
bleu::benchmark::bench_bleu	➖	247,295.15
ngram::benchmark::bench_ngram	➖	49,676.07
tokenizer::benchmark::bench_tokenizer	➖	16,209.28

codecov · 2024-05-27T03:14:30Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.96%. Comparing base (8785907) to head (8f16eca).

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #49      +/-   ##
==========================================
- Coverage   85.05%   83.96%   -1.09%     
==========================================
  Files           4        4              
  Lines         368      343      -25     
==========================================
- Hits          313      288      -25     
  Misses         55       55
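
The drop in project coverage despite every modified line being covered follows from the arithmetic in the table: 25 covered lines were removed while the 55 misses stayed the same, so the hit ratio shrinks. A small sketch of that calculation, assuming the report truncates to two decimal places (which is consistent with the 83.96% shown); `coverage_bp` is a hypothetical helper, not part of Codecov:

```rust
/// Coverage in basis points (hundredths of a percent), truncated.
/// Hypothetical helper for checking the figures in the report above.
fn coverage_bp(hits: u64, lines: u64) -> u64 {
    hits * 10_000 / lines
}

/// Change in coverage between base and head, in basis points.
fn coverage_delta_bp(base: (u64, u64), head: (u64, u64)) -> i64 {
    coverage_bp(head.0, head.1) as i64 - coverage_bp(base.0, base.1) as i64
}
```

With base `(313, 368)` and head `(288, 343)`, this gives 8505 and 8396 basis points, a change of -109, matching the 85.05%, 83.96% and -1.09% in the diff.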

coderabbitai

Actionable comments posted: 3

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8785907 and 8f16eca.

Files selected for processing (3)

CHANGELOG.md (1 hunks)
Cargo.toml (1 hunks)
src/ngram.rs (3 hunks)

Files skipped from review due to trivial changes (1)

Cargo.toml

Additional Context Used

LanguageTool (1)

CHANGELOG.md (1)

Near line 11: There seems to be a noun/verb agreement error. Did you mean “counters” or “countered”?
Context: ...eleased] ### Fixed - ngram bench use counter lib's function rather than the truly us...
Rule ID: SINGULAR_NOUN_VERB_AGREEMENT

Markdownlint (15)

CHANGELOG.md (15)

18: Expected: 1; Actual: 2
Multiple consecutive blank lines

25: Expected: 1; Actual: 2
Multiple consecutive blank lines

10: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

13: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

19: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

26: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

27: Expected: 1; Actual: 0; Above
Headings should be surrounded by blank lines

27: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

31: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

11: null
Lists should be surrounded by blank lines

14: null
Lists should be surrounded by blank lines

20: null
Lists should be surrounded by blank lines

28: null
Lists should be surrounded by blank lines

32: null
Lists should be surrounded by blank lines

32: null
Files should end with a single newline character

Additional comments not posted (2)

src/ngram.rs (2)

21-21: Ensure test imports are correctly scoped.

This import is correctly scoped within the test module, which is good practice in Rust to avoid including test dependencies in the main binary.

60-60: Ensure benchmark imports are correctly scoped.

This import is correctly scoped within the benchmark module, which is good practice in Rust to avoid including benchmark dependencies in the main binary.
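
A minimal sketch of the scoping pattern these two comments describe. The helper `bigram_count` and the module name `tests` are hypothetical stand-ins, not the actual contents of `src/ngram.rs`:

```rust
/// Hypothetical helper standing in for the real ngram code.
pub fn bigram_count(tokens: &[&str]) -> usize {
    // A sequence of n tokens has n - 1 adjacent pairs (0 when empty).
    tokens.len().saturating_sub(1)
}

#[cfg(test)]
mod tests {
    // Compiled only under `cargo test`, so this import never
    // reaches a normal build of the crate.
    use super::bigram_count;

    #[test]
    fn counts_adjacent_pairs() {
        assert_eq!(bigram_count(&["a", "a", "b", "c"]), 3);
    }
}
```

Because the `use` sits inside the `#[cfg(test)]` module, anything imported only for tests stays out of the non-test build.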

coderabbitai · 2024-05-27T03:18:11Z

CHANGELOG.md

+### Fixed
+- `ngram` bench use counter lib's function rather than the truly used function.


Ensure grammatical correctness in the changelog entry.

- `ngram` bench use counter lib's function rather than the truly used function.
+ `ngram` bench uses counter lib's function rather than the truly used function.

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

### Fixed

- `ngram` bench use counter lib's function rather than the truly used function.

### Fixed

- `ngram` bench uses counter lib's function rather than the truly used function.

Address formatting issues to improve readability.

+ ### Fixed
- `ngram` bench uses counter lib's function rather than the truly used function.
+ ### Changed
- Use AHash in ngram module
+ ## [0.1.2] - 2024-04-29
+ ### Changed
- Implement ngram counts with a better way (#33)
- Rust `compute_score` function signature changed to use references (#34)
- Use rayon to run every prediction-references statistics calculation parallely (#38)
- Use ahash for better performance (#39)
+ ## [0.1.1] - 2024-04-26
+ ### Changed
- Upgrade `cached` version to `0.50.0`
Change crate-type to `["cdylib", "rlib"]`
+ ## [0.1.0] - 2024-04-23
+ - BLEU score calculation with `Tokenizer13a` as default tokenizer.

Also applies to: 13-14, 18-20, 25-27

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

### Fixed

- `ngram` bench use counter lib's function rather than the truly used function.

### Fixed

- `ngram` bench uses counter lib's function rather than the truly used function.

### Changed

- Use AHash in ngram module

## [0.1.2] - 2024-04-29

### Changed

- Implement ngram counts with a better way (#33)

- Rust `compute_score` function signature changed to use references (#34)

- Use rayon to run every prediction-references statistics calculation parallely (#38)

- Use ahash for better performance (#39)

## [0.1.1] - 2024-04-26

### Changed

- Upgrade `cached` version to `0.50.0`

Change crate-type to `["cdylib", "rlib"]`

## [0.1.0] - 2024-04-23

- BLEU score calculation with `Tokenizer13a` as default tokenizer.

coderabbitai · 2024-05-27T03:18:11Z

src/ngram.rs

+        let tokens: Vec<String> = vec![
+            "a".to_string(),
+            "a".to_string(),
+            "b".to_string(),
+            "c".to_string(),
+        ];
        let max_order = 4;

        let iter_num: usize = 100;
        b.iter(|| {
            std::hint::black_box(for _ in 1..=iter_num {
-                get_ngram_counter(line, max_order);
+                get_token_ngram_counter(&tokens, max_order);


Optimize the benchmark setup.

Consider pre-allocating the tokens vector outside the benchmark loop to avoid repeated allocations during the benchmarking process.

        let tokens: Vec<String> = vec![
            "a".to_string(),
            "a".to_string(),
            "b".to_string(),
            "c".to_string(),
        ];
        let max_order = 4;
        let iter_num: usize = 100;
        b.iter(|| {
            std::hint::black_box(for _ in 1..=iter_num {
                get_token_ngram_counter(&tokens, max_order);
            });
        });
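
The idea of keeping setup out of the measured region can be sketched on stable Rust without the bencher handle `b`. This is an illustration under assumptions, not the crate's benchmark: `workload` is a hypothetical stand-in for `get_token_ngram_counter`, and `time_workload` is a made-up helper. The caller builds the token vector once and only the calls themselves are timed.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload standing in for `get_token_ngram_counter`:
/// counts how many n-gram windows exist for orders 1..=max_order.
fn workload(tokens: &[String], max_order: usize) -> usize {
    (1..=max_order.min(tokens.len()))
        .map(|order| tokens.windows(order).count())
        .sum()
}

/// Time `iters` calls. `tokens` is allocated by the caller, once,
/// outside the timed region.
fn time_workload(tokens: &[String], max_order: usize, iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box on the inputs and the result discourages the
        // optimizer from eliding or hoisting the call.
        black_box(workload(black_box(tokens), black_box(max_order)));
    }
    start.elapsed()
}
```

Passing each call's result through `black_box`, rather than wrapping the whole `for` loop (which evaluates to `()`), is one way to make it less likely that the compiler discards the work being measured.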

add: changelog

99423a6

shenxiangzhuang added the documentation and enhancement labels May 27, 2024

shenxiangzhuang self-assigned this May 27, 2024

fix: cargo fmt

8f16eca

shenxiangzhuang changed the title ~~add: changelog~~ fix: ngram_bench May 27, 2024

shenxiangzhuang merged commit b80988a into master May 27, 2024
15 checks passed

shenxiangzhuang deleted the fix/ngram_bench branch May 27, 2024 03:17
