-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add valgrind tests #188
Add valgrind tests #188
Changes from 6 commits
3e7f29d
c1786a3
5377c9f
b8b605e
0c3d4ef
18b7b1c
eb6fff0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,6 +27,7 @@ required for testing citus. | |
* [Running PgBench Tests Against Hyperscale (Citus)](#pgbench-cloud) | ||
* [Running TPC-H Tests](#tpch) | ||
* [Running TPC-H Tests Against Hyperscale (Citus)](#tpch-cloud) | ||
* [Running Valgrind Tests](#valgrind) | ||
* [Example fab Commands](#fab-examples) | ||
* [Tasks, and Ordering of Tasks](#fab-tasks) | ||
* [Task Namespaces](#task-namespaces) | ||
|
@@ -602,6 +603,90 @@ On the coordinator node: | |
fab run.tpch_automate:tpch_q1.ini,connectionURI='postgres://citus:dwVg70yBfkZ6hO1WXFyq1Q@c.fhhwxh5watzbizj3folblgbnpbu.db.citusdata.com:5432/citus?sslmode\=require' | ||
``` | ||
|
||
## <a name="valgrind"></a> Running Valgrind Tests | ||
|
||
TL;DR | ||
|
||
```bash | ||
SaitTalhaNisanci marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# 1 # start valgrind test | ||
|
||
# create valgrind instance to run | ||
export RESOURCE_GROUP_NAME='your-valgrind-test-rg-name-here' | ||
export VALGRIND_TEST=1 | ||
cd azure | ||
./create-cluster.sh | ||
|
||
# connect to coordinator | ||
eval `ssh-agent -s` | ||
ssh-add | ||
./connect.sh | ||
|
||
# run fab command in coordinator in a detachable session | ||
sudo yum install tmux | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We already have tmux in our servers by default now: |
||
tmux new -d "fab use.postgres:12.3 use.enterprise:enterprise-master run.valgrind" | ||
|
||
# simply exit from coordinator after detaching | ||
|
||
# 2 # finalize valgrind test | ||
|
||
# reconnect to coordinator after 9.5 hours (if you preferred default coordinator configuration) | ||
export RESOURCE_GROUP_NAME='your-valgrind-test-rg-name-here' | ||
./connect.sh | ||
|
||
# you can first check if valgrind test is finished by attaching to tmux session | ||
tmux a | ||
# then you should detach from the session before moving forward | ||
Ctrl+b d | ||
|
||
# run push results script | ||
cd test-automation/azure | ||
./push-results.sh <branch name you prefer to push results> | ||
|
||
# simply exit from coordinator after pushing the results | ||
|
||
# delete resource group finally | ||
cd azure | ||
./delete-resource-group.sh | ||
``` | ||
|
||
DETAILS: | ||
|
||
To create a valgrind instance, following the steps in [Setup Steps For Each Test](#azure-setup-steps), do the following before executing `create-cluster.sh`: | ||
|
||
```bash | ||
export VALGRIND_TEST=1 | ||
``` | ||
|
||
, which makes `numberOfWorkers` setting useless. | ||
This is because we will already be using our regression test structure and it creates a local cluster | ||
itself. Also, as we install `valgrind` only on coordinator, if we have worker nodes, then we cannot build | ||
PostgreSQL as we require `valgrind` on workers and get error even if we do not need them. | ||
|
||
On the coordinator node: | ||
|
||
```bash | ||
# an example usage: Use PostgreSQL 12.1 and run valgrind test on enterprise/enterprise-master | ||
fab use.postgres:12.1 use.enterprise:enterprise-master run.valgrind | ||
``` | ||
|
||
However as valgrind tests take too much time to complete, we recommend you to run valgrind tests in a detached session: | ||
```bash | ||
sudo yum install tmux | ||
tmux new -d "fab use.postgres:12.1 use.enterprise:enterprise-master run.valgrind" | ||
``` | ||
|
||
After the tests are finished (takes up to 9 hours with default coordinator size), re-connect to the coordinator. | ||
Result can be found under `$HOME/results` directory. | ||
|
||
To push the results to `release_test_results` repository, run the below command in coordinator node: | ||
|
||
```bash | ||
sh $HOME/test-automation/azure/push-results.sh <branch_name_to_push> | ||
``` | ||
|
||
Finally, delete your resource group. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we mention for the periodic run? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess I couldn't quite understand There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess I meant should we mention that there is a periodic run for valgrind tests? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oh now I understood, will apply that too There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. + that destroy job isn't actually executed automatically when user follows manual steps, so I think we should not mention this here ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. adressed with last commit |
||
Note that automated (weekly) valgrind test already destroys the resources that it uses. | ||
|
||
## <a name="fab-examples"></a> Example fab Commands | ||
|
||
Use `fab --list` to see all the tasks you can run! This is just a few examples. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -40,8 +40,19 @@ trap cleanup EXIT | |
|
||
rg=$1 | ||
export RESOURCE_GROUP_NAME=${rg} | ||
./create-cluster.sh | ||
|
||
if [ "$rg" == "citusbot_valgrind_test_resource_group" ]; then | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We kind of want to do the cleanup if it fails while creating the cluster I think, but we need the VALGRIND_TEST variable set before running ./create-cluster.sh We can possibly do the following: if [ "$rg" == "citusbot_valgrind_test_resource_group" ]; then
# If running valgrind tests, export VALGRIND_TEST to be 1 to ensure
# only coordinator instance is created in create-cluster script
export VALGRIND_TEST=1
fi
./create-cluster.sh
if <VALGRIND_TEST variable = 1> ; then
# If running valgrind tests, do not run cleanup function
# This is because, as valgrind tests requires too much time to run,
# we start valgrind tests via nohup in ci. Hence ssh session
# will immediately be closed just after the fabric command is run
trap - EXIT
fi
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. makes a lot sense, thanks |
||
# If running valgrind tests, do not run cleanup function | ||
# This is because, as valgrind tests requires too much time to run, | ||
# we start valgrind tests via nohup in ci. Hence ssh session | ||
# will immediately be closed just after the fabric command is run | ||
trap - EXIT | ||
# If running valgrind tests, export VALGRIND_TEST to be 1 to ensure | ||
# only coordinator instance is created in create-cluster script | ||
export VALGRIND_TEST=1 | ||
fi | ||
|
||
./create-cluster.sh | ||
|
||
public_ip=$(az group deployment show -g ${rg} -n azuredeploy --query properties.outputs.publicIP.value) | ||
# remove the quotes | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
#!/bin/bash | ||
|
||
# fail if trying to reference a variable that is not set. | ||
set -u | ||
# exit immediately if a command fails | ||
set -e | ||
# echo commands | ||
set -x | ||
|
||
function cleanup { | ||
sh ./delete-resource-group.sh | ||
} | ||
|
||
export RESOURCE_GROUP_NAME="citusbot_valgrind_test_resource_group" | ||
|
||
trap cleanup EXIT | ||
|
||
public_ip=$(az group deployment show -g ${RESOURCE_GROUP_NAME} -n azuredeploy --query properties.outputs.publicIP.value) | ||
# remove the quotes | ||
public_ip=$(echo ${public_ip} | cut -d "\"" -f 2) | ||
|
||
echo ${public_ip} | ||
|
||
ssh-keyscan -H ${public_ip} >> ~/.ssh/known_hosts | ||
chmod 600 ~/.ssh/known_hosts | ||
|
||
sh ./delete-security-rule.sh | ||
|
||
echo "adding public ip to known hosts in remote" | ||
ssh -o "StrictHostKeyChecking no" -A pguser@${public_ip} "ssh-keyscan -H ${public_ip} >> /home/pguser/.ssh/known_hosts" | ||
echo "running tests in remote" | ||
|
||
# ssh with non-interactive mode does not source bash profile, so we will need to do it ourselves here. | ||
# put an empty success file for valgrind tests under results dir if there are error logs | ||
# push the files under results dir | ||
ssh -o "StrictHostKeyChecking no" -A pguser@${public_ip} \ | ||
"source ~/.bash_profile;" \ | ||
"sh /home/pguser/test-automation/azure/push-results.sh ${RESOURCE_GROUP_NAME}"; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
#!/bin/bash | ||
|
||
# this scripts pushes the results under results/ directory to release-test-results repository | ||
|
||
# args # | ||
# $1 -> branch name to push results | ||
|
||
# fail if trying to reference a variable that is not set. | ||
set -u | ||
# exit immediately if a command fails | ||
set -e | ||
# fail in a pipeline if any of the commands fails | ||
set -o pipefail | ||
|
||
branch_name=$1 | ||
|
||
# add github to known hosts | ||
|
||
echo "github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==" >> ~/.ssh/known_hosts | ||
|
||
git clone git@github.com:citusdata/release-test-results.git "${HOME}"/release-test-results | ||
|
||
git config --global user.email "citus-bot@microsoft.com" | ||
git config --global user.name "citus bot" | ||
|
||
now=$(date +"%m_%d_%Y_%s") | ||
|
||
mv "${HOME}"/results "${HOME}"/release-test-results/periodic_job_results/"${now}" | ||
|
||
cd "${HOME}"/release-test-results | ||
|
||
commit_message="add test results" | ||
SaitTalhaNisanci marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
git checkout -b "${branch_name}/${now}" | ||
git add -A | ||
git commit -m "$commit_message" | ||
git push origin "${branch_name}/${now}" |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,22 +22,23 @@ if [ "$rg_name" = "citusbot_tpch_test_resource_group" ]; then | |
fab run.tpch_automate | ||
fi | ||
|
||
|
||
# add github to known hosts | ||
echo "github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==" >> ~/.ssh/known_hosts | ||
|
||
git clone git@github.com:citusdata/release-test-results.git | ||
|
||
git config --global user.email "citus-bot@microsoft.com" | ||
git config --global user.name "citus bot" | ||
|
||
now=$(date +"%m_%d_%Y_%s") | ||
|
||
mv ${HOME}/results ${HOME}/release-test-results/periodic_job_results/${now} | ||
|
||
cd ${HOME}/release-test-results | ||
|
||
git checkout -b ${rg_name}/${now} | ||
git add -A | ||
git commit -m "add test results for performance tests ${rg_name}" | ||
git push origin ${rg_name}/${now} | ||
# If running valgrind tests, do not run cleanup function | ||
# This is because, as valgrind tests requires too much time to run, | ||
# we start valgrind tests via nohup in ci. Hence ssh session | ||
# will immediately be closed just after the fabric command is run | ||
# | ||
# We have a seperate job to terminate the machine and push the results | ||
if [ "$rg_name" = "citusbot_valgrind_test_resource_group" ]; then | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this doesn't seem to be the correct way of checking There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should use "==" ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no, they are very same when comparing strings |
||
nohup fab use.postgres:13.1 use.enterprise:enterprise-master run.valgrind > /dev/null 2>&1 & | ||
|
||
# wait for cloning to end | ||
while ! test -d "$HOME/citus-enterprise"; | ||
SaitTalhaNisanci marked this conversation as resolved.
Show resolved
Hide resolved
|
||
do | ||
echo "Wait until citus is cloned completely ..."; | ||
sleep 60; | ||
done | ||
|
||
echo "Citus is cloned succesfully"; | ||
else | ||
sh "${HOME}"/test-automation/azure/push-results.sh "$1"; | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: This could be put right above "cron" field so that it is closer to where it is relevant.