
session: skip creating indexes on the analyze_jobs table for older clusters #58608

Open
wants to merge 3 commits into master

Conversation

Rustin170506
Member

What problem does this PR solve?

Issue Number: close #57996

Problem Summary:

What changed and how does it work?

I tested #58134 again locally to evaluate the performance of creating these indexes. For 100k rows it takes about 16 seconds, which is not terribly slow but still takes noticeable time. So I decided to undo part of that change: we only create the new indexes for new clusters, and we skip creating them for existing clusters during the upgrade process.

Normally, for smaller clusters, this should not be a problem. For huge clusters, we can ask users to create the indexes manually instead of blocking the upgrade process:

> ALTER TABLE mysql.analyze_jobs ADD INDEX `idx_schema_table_state` (`table_schema`, `table_name`, `state`)
[2024-12-30 14:17:40] completed in 5 s 755 ms
> ALTER TABLE mysql.analyze_jobs ADD INDEX `idx_schema_table_partition_state` (`table_schema`, `table_name`, `partition_name`, `state`)
[2024-12-30 14:17:52] completed in 11 s 860 ms
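
As a quick sanity check after creating them manually, something like the following should list both indexes (a sketch; it assumes the standard SHOW INDEX output, where Key_name holds the index name):

> SHOW INDEX FROM mysql.analyze_jobs WHERE Key_name IN ('idx_schema_table_state', 'idx_schema_table_partition_state')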

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None


Signed-off-by: Rustin170506 <techregister@pm.me>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 30, 2024
Signed-off-by: Rustin170506 <techregister@pm.me>
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 30, 2024
Signed-off-by: Rustin170506 <techregister@pm.me>

codecov bot commented Dec 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.6977%. Comparing base (3ac2b49) to head (7a52e88).
Report is 5 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #58608        +/-   ##
================================================
+ Coverage   73.5500%   74.6977%   +1.1477%     
================================================
  Files          1680       1695        +15     
  Lines        464730     464785        +55     
================================================
+ Hits         341809     347184      +5375     
+ Misses       102055      96025      -6030     
- Partials      20866      21576       +710     
Flag Coverage Δ
integration 46.0155% <ø> (?)
unit 72.2857% <ø> (-0.0134%) ⬇️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
dumpling 52.6910% <ø> (ø)
parser ∅ <ø> (∅)
br 61.6686% <ø> (+15.9031%) ⬆️

@Rustin170506
Member Author

/retest

Member Author

@Rustin170506 Rustin170506 left a comment

🔢 Self-check (PR reviewed by myself and ready for feedback.)

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Dec 30, 2024
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Dec 30, 2024

ti-chi-bot bot commented Dec 30, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-12-30 10:28:30.856884013 +0000 UTC m=+346246.212888551: ☑️ agreed by winoros.
  • 2024-12-30 10:53:21.606136303 +0000 UTC m=+347736.962140842: ☑️ agreed by time-and-fate.

@Rustin170506 Rustin170506 requested review from Leavrth and removed request for D3Hunter December 31, 2024 06:19
Comment on lines -3351 to -3352
doReentrantDDL(s, addAnalyzeJobsSchemaTableStateIndex, dbterror.ErrDupKeyName)
doReentrantDDL(s, addAnalyzeJobsSchemaTablePartitionStateIndex, dbterror.ErrDupKeyName)
Contributor

@D3Hunter D3Hunter Dec 31, 2024

Personally, I don't like the idea of two clusters of the same version ending up with different schemas. It introduces more maintenance burden: if I don't know about this PR, hit an issue, and see this difference, I will take it as a bug at first glance.

For a production cluster with very many tables, say 1M, the upgrade duration is quite long in most cases anyway. It is not just the TiDB upgrade itself but also the rolling upgrade of other components, and it gets even longer when there is a lot of online traffic; it can take hours or even days. I think it's acceptable for this add-index to be slower; it's the tradeoff we have to make.

If we can reduce the size of this table, or make the index creation faster, that would certainly be better.

Member Author

> Personally, I don't like the idea of two clusters of the same version ending up with different schemas. It introduces more maintenance burden: if I don't know about this PR, hit an issue, and see this difference, I will take it as a bug at first glance.

Yes, I agree that having different schemas is annoying. But for most users the full table scan is fine, so the potential risk of slowing down the upgrade is not worth it. We don't want to impose that risk on the 99% of users who don't have the problem.
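
For context, the queries these indexes speed up are lookups like the following (a hypothetical example; the filter columns are the ones covered by idx_schema_table_state, the literal values are placeholders). While the table stays small, a full table scan for such a query is acceptable:

> SELECT * FROM mysql.analyze_jobs WHERE table_schema = 'test' AND table_name = 't' AND state = 'running'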

Member Author

I also asked @tangenta; he suggested that it is better not to do this kind of operation on a table that can hold a large volume of rows.

Contributor

> We don't want to impose that risk on the 99% of users who don't have the problem.

99% of users don't have 1M tables, so the upgrade is fast even with the index. This PR is for the 1%.

Member Author

/hold

I am not in a hurry with this PR, so let's discuss it further.

Member Author

> 99% of users don't have 1M tables, so the upgrade is fast even with the index. This PR is for the 1%.

My point is that 99% of users do not have 1M tables, so there is no need to add this index for them and expose them to the potential upgrade risk.

Contributor

In my understanding: 99% of users do not have 1M tables -> 99% of users don't have 100K rows in this table -> add-index is very fast -> no such risk.

Member

We may also need to add new indexes to tables like mysql.stats_histograms. That table grows with the column count and index count, so it is even more likely to be big, and the problem will still exist.

I think we can add the related operation to our upgrading guide.

Contributor

> I think we can add the related operation to our upgrading guide.

You mean asking users to manually create an index on a system table? System tables should be managed by TiDB itself, IMO, and most production clusters have very strict permission control. If we ask a DBA, or someone else with root permission, to do what TiDB itself should handle, I'm not sure how much they will buy that idea.


ti-chi-bot bot commented Dec 31, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Leavrth, time-and-fate, winoros, yudongusa

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added approved do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Dec 31, 2024
Labels
  • approved
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • lgtm
  • needs-cherry-pick-release-8.5: Should cherry pick this PR to release-8.5 branch.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

mysql.analyze_jobs missed indexes
6 participants