
session: skip creating indexes on the analyze_jobs table for older clusters #58608

Open
wants to merge 3 commits into master

Conversation

Rustin170506
Member

What problem does this PR solve?

Issue Number: close #57996

Problem Summary:

What changed and how does it work?

I tested #58134 again locally to evaluate the performance of creating these indexes. For 100k rows it takes about 16 seconds, which is not terribly slow but still takes noticeable time. So I decided to undo part of that change: we only create the new indexes for new clusters, and we skip creating them for existing clusters during the upgrade process.

Normally, for smaller clusters, this should not be a problem. For huge clusters, we can ask users to create the indexes manually instead of blocking the upgrade process:

> ALTER TABLE mysql.analyze_jobs ADD INDEX `idx_schema_table_state` (`table_schema`, `table_name`, `state`)
[2024-12-30 14:17:40] completed in 5 s 755 ms
> ALTER TABLE mysql.analyze_jobs ADD INDEX `idx_schema_table_partition_state` (`table_schema`, `table_name`, `partition_name`, `state`)
[2024-12-30 14:17:52] completed in 11 s 860 ms
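
As a quick sanity check after creating them manually, something like the following should list both indexes (a sketch; it assumes the standard SHOW INDEX output, where Key_name holds the index name):

> SHOW INDEX FROM mysql.analyze_jobs WHERE Key_name IN ('idx_schema_table_state', 'idx_schema_table_partition_state')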

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None


Signed-off-by: Rustin170506 <techregister@pm.me>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 30, 2024
Signed-off-by: Rustin170506 <techregister@pm.me>
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 30, 2024
Signed-off-by: Rustin170506 <techregister@pm.me>

codecov bot commented Dec 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.6977%. Comparing base (3ac2b49) to head (7a52e88).
Report is 5 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #58608        +/-   ##
================================================
+ Coverage   73.5500%   74.6977%   +1.1477%     
================================================
  Files          1680       1695        +15     
  Lines        464730     464785        +55     
================================================
+ Hits         341809     347184      +5375     
+ Misses       102055      96025      -6030     
- Partials      20866      21576       +710     
Flag Coverage Δ
integration 46.0155% <ø> (?)
unit 72.2857% <ø> (-0.0134%) ⬇️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
dumpling 52.6910% <ø> (ø)
parser ∅ <ø> (∅)
br 61.6686% <ø> (+15.9031%) ⬆️

@Rustin170506
Member Author

/retest

Member Author

@Rustin170506 Rustin170506 left a comment

🔢 Self-check (PR reviewed by myself and ready for feedback.)

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Dec 30, 2024
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Dec 30, 2024

ti-chi-bot bot commented Dec 30, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-12-30 10:28:30.856884013 +0000 UTC m=+346246.212888551: ☑️ agreed by winoros.
  • 2024-12-30 10:53:21.606136303 +0000 UTC m=+347736.962140842: ☑️ agreed by time-and-fate.

@Rustin170506 Rustin170506 requested review from Leavrth and removed request for D3Hunter December 31, 2024 06:19
Comment on lines -3351 to -3352
doReentrantDDL(s, addAnalyzeJobsSchemaTableStateIndex, dbterror.ErrDupKeyName)
doReentrantDDL(s, addAnalyzeJobsSchemaTablePartitionStateIndex, dbterror.ErrDupKeyName)
Contributor

@D3Hunter D3Hunter Dec 31, 2024

Personally, I don't like the idea of two clusters of the same version ending up with different schemas. It introduces more maintenance burden: if I don't know about this PR, hit an issue, and see this difference, I will take it as a bug at first glance.

For a production cluster with very many tables, say 1M, the upgrade duration is quite long in most cases anyway. It is not just the TiDB upgrade itself but also the rolling upgrade of other components, and it gets even longer when there is a lot of online traffic; it can take hours or even days. I think it's acceptable for this add-index to be slower; it's the tradeoff we have to make.

If we can reduce the size of this table, or make the index creation faster, that would certainly be better.

Member Author

> Personally, I don't like the idea of two clusters of the same version ending up with different schemas. It introduces more maintenance burden: if I don't know about this PR, hit an issue, and see this difference, I will take it as a bug at first glance.

Yes, I agree that having different schemas is annoying. But for most users the full table scan is fine, so the potential risk of slowing down the upgrade is not worth it. We don't want to impose that risk on the 99% of users who don't have the problem.
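
For context, the queries these indexes speed up are lookups like the following (a hypothetical example; the filter columns are the ones covered by idx_schema_table_state, the literal values are placeholders). While the table stays small, a full table scan for such a query is acceptable:

> SELECT * FROM mysql.analyze_jobs WHERE table_schema = 'test' AND table_name = 't' AND state = 'running'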

Member Author

I also asked @tangenta; he suggested that it is better not to do this kind of operation on a table that can hold a large volume of rows.

Contributor

> We don't want to impose that risk on the 99% of users who don't have the problem.

99% of users don't have 1M tables, so the upgrade is fast even with the index. This PR is for the 1%.

Member Author

/hold

I am not in a hurry with this PR, so let's discuss it further.

Member Author

> 99% of users don't have 1M tables, so the upgrade is fast even with the index. This PR is for the 1%.

My point is that 99% of users do not have 1M tables, so there is no need to add this index for them and expose them to the potential upgrade risk.

Contributor

In my understanding: 99% of users do not have 1M tables -> 99% of users don't have 100K rows in this table -> add-index is very fast -> no such risk.

Member

We may also need to add new indexes to tables like mysql.stats_histograms. That table grows with the column count and index count, so it is even more likely to be big, and the problem will still exist.

I think we can add the related operation to our upgrading guide.

Contributor

> I think we can add the related operation to our upgrading guide.

You mean asking users to manually create an index on a system table? System tables should be managed by TiDB itself, IMO, and most production clusters have very strict permission control. If we ask a DBA, or someone else with root permission, to do what TiDB itself should handle, I'm not sure how much they will buy that idea.


ti-chi-bot bot commented Dec 31, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Leavrth, time-and-fate, winoros, yudongusa

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added approved do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Dec 31, 2024
Labels
  • approved
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • lgtm
  • needs-cherry-pick-release-8.5: Should cherry pick this PR to release-8.5 branch.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

mysql.analyze_jobs missed indexes
6 participants