
Create px_healthcheck.sh#106

Merged
usujlana-px merged 5 commits into main from usujlana-px-patch-1
Mar 18, 2026

Conversation

@usujlana-px
Collaborator

First commit

What this PR does / why we need it:

Which issue(s) this PR fixes (optional)
Closes #

Special notes for your reviewer:

Collaborator

@tjoseph-px tjoseph-px left a comment

LGTM

README first commit
fixed typo
removed px-security comment
@tjoseph-px
Collaborator

ReadMe LGTM

Member

@adityadani adityadani left a comment

When would one run this script? Is it at install time or at upgrade time?
We don't need to do it now, but separating the checks based on the activity someone wants to perform would be better. For example, run the NBDD checks only during a general health check, or run the PDB check only before upgrades.

Secondly, I would highly recommend using YAML and JSON output for parsing CLI or CR outputs. The CLI outputs can change, and we don't want to keep changing this script. JSON outputs will never break compatibility.
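
For illustration, the suggestion to group checks by activity could look roughly like the sketch below. It is only an assumption about how such a split might be wired up: `run_checks` and `check_pdb` are hypothetical names, and the stub bodies stand in for the script's real check functions.

```shell
#!/usr/bin/env bash
# Hypothetical stubs standing in for the script's real check functions.
check_pdb()        { echo "pdb"; }
check_nbdd()       { echo "nbdd"; }
check_flasharray() { echo "flasharray"; }

# run_checks MODE: run only the checks relevant to the given activity.
# MODE defaults to "health" when omitted.
run_checks() {
  local mode="${1:-health}"
  case "$mode" in
    upgrade) check_pdb ;;                               # pre-upgrade checks only
    health)  check_nbdd; check_flasharray ;;            # general health checks
    all)     check_pdb; check_nbdd; check_flasharray ;; # everything
    *)       echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}
```

A caller would then run something like `run_checks upgrade` before an upgrade and `run_checks health` for a routine check.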

Comment on lines +449 to +451
print_warning "Found duplicate IP addresses (possible ghost entries):"
echo "========================================="
while IFS= read -r ip; do
Member

Duplicate IP entries can happen momentarily in cloud drive environments as disks/DriveSets move between nodes. Instead of mentioning ghost entries, the message could say to make sure all storage nodes are up and running.

Collaborator Author

The script is intended primarily to be run before a PXE upgrade and, if required, as a quick health check.
We discussed this internally, and CX TSE/DSE preferred to keep the script as is rather than separate it out at this time. If this comes up again in the future, I will try to do so.
I did change the commands to use JSON wherever possible.
In some cases we have seen a cluster with nodes that have the same IP but different UUIDs. The check is intended for that.
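
As a rough sketch of the case described here (same IP, different UUID), the helper below flags an IP only when more than one distinct UUID claims it. The function name `find_duplicate_ips` and the two-column "UUID IP" input format are assumptions for illustration, not the format the merged script actually consumes.

```shell
#!/usr/bin/env bash
# find_duplicate_ips: read "UUID IP" pairs on stdin (one per line) and print
# each IP that is claimed by more than one distinct node UUID.
# A UUID/IP pair that is merely listed twice is not reported.
find_duplicate_ips() {
  awk 'NF >= 2 {
         key = $2 SUBSEP $1                  # IP + UUID identifies one node entry
         if (!(key in seen)) { seen[key] = 1; count[$2]++ }
       }
       END { for (ip in count) if (count[ip] > 1) print ip }' | sort
}
```

Counting distinct UUIDs per IP, rather than raw occurrences of the IP, avoids flagging a node that simply appears twice in the input.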

Comment on lines +894 to +901
for vol_id in $(/opt/pwx/bin/pxctl volume list 2>/dev/null | awk "NR>1 && NF>0 {print \$1}"); do
    inspect=$(/opt/pwx/bin/pxctl volume inspect "$vol_id" 2>/dev/null)
    if echo "$inspect" | grep -q "Replication Status.*:.*Resync"; then
        # Extract name and status - format is " Name : volume-name"
        name=$(echo "$inspect" | awk -F: "/^[[:space:]]*Name[[:space:]]*:/{gsub(/^[[:space:]]+/,\"\",\$2);print \$2;exit}")
        status=$(echo "$inspect" | awk -F: "/^[[:space:]]*Status[[:space:]]*:/{gsub(/^[[:space:]]+/,\"\",\$2);print \$2;exit}")
        echo "RESYNC|${vol_id}|${name}|${status}"
    fi
Member

nit: You could do a pxctl v i <vol> -j | grep resync

Member

In general, I would suggest using -j or the JSON outputs to parse fields instead of parsing pretty-printed CLI output. Once you have JSON, you can also use jq to parse subfields.
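
A minimal sketch of the grep-on-JSON approach suggested in this thread is shown below. The helper name `json_mentions_resync` is hypothetical, and the sample field name used in the comments is an assumption; the actual key names in the JSON output should be checked against a live cluster.

```shell
#!/usr/bin/env bash
# json_mentions_resync: succeed (exit 0) if the JSON on stdin mentions
# "resync" anywhere, case-insensitively. This mirrors the reviewer's
# "-j | grep resync" nit and does not depend on pretty-printed column layout.
json_mentions_resync() {
  grep -qi 'resync'
}

# Intended use against a cluster (not executed here):
#   /opt/pwx/bin/pxctl volume inspect "$vol_id" -j | json_mentions_resync \
#     && echo "RESYNC|${vol_id}"
#
# Example with a made-up payload (the field name is an assumption):
#   printf '{"replication_status":"Resync"}' | json_mentions_resync
```

A plain grep would also match a volume whose name or labels happen to contain "resync", so matching on a specific field with jq is the stricter option where jq is available.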

Collaborator Author

Made the changes to the script to do this.

Comment on lines +1207 to +1244
local pure_json
pure_json=$(echo "$pure_json_b64" | base64 -d 2>/dev/null)
if [[ -z "$pure_json" ]]; then
    print_error "Failed to decode pure.json from px-pure-secret."
    return
fi

# Step 4: Parse FlashArrays from the JSON
local fa_count=0
local fa_endpoints=()
local fa_tokens=()

local fa_section
fa_section=$(echo "$pure_json" | tr -d '\n' | sed -n 's/.*"FlashArrays"[[:space:]]*:[[:space:]]*\(\[[^]]*\]\).*/\1/p')

if [[ -z "$fa_section" ]]; then
    print_info "No FlashArrays section found in px-pure-secret. Skipping FlashArray check."
    return
fi

# Parse the FlashArrays section - extract all MgmtEndPoint and APIToken values
while IFS= read -r line; do
    [[ -n "$line" ]] && fa_endpoints+=("$line")
done < <(echo "$fa_section" | grep -o '"MgmtEndPoint"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*:.*"\([^"]*\)"/\1/')

while IFS= read -r line; do
    [[ -n "$line" ]] && fa_tokens+=("$line")
done < <(echo "$fa_section" | grep -o '"APIToken"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*:.*"\([^"]*\)"/\1/')

fa_count=${#fa_endpoints[@]}

if [[ $fa_count -eq 0 ]]; then
    print_info "No FlashArrays found in px-pure-secret."
    return
fi

Member

Since this is JSON, jq will make your life much easier than parsing line by line.

Collaborator Author

Since jq may or may not be installed on a user's system, I did not do this. Skipping.
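
One possible middle ground, sketched here under assumptions, is to use jq when it is present and fall back to the grep/sed approach otherwise. The function name `extract_endpoints` is hypothetical; the `FlashArrays` and `MgmtEndPoint` keys are taken from the snippet under review, and the fallback reuses its sed expression to isolate the FlashArrays section.

```shell
#!/usr/bin/env bash
# extract_endpoints: print one MgmtEndPoint per line from the pure.json
# document supplied on stdin. Uses jq when it is installed; otherwise falls
# back to grep/sed text matching on the FlashArrays section.
extract_endpoints() {
  local json
  json=$(cat)
  if command -v jq >/dev/null 2>&1; then
    # "[]?" tolerates a missing FlashArrays key; "// empty" skips null values.
    printf '%s' "$json" | jq -r '.FlashArrays[]?.MgmtEndPoint // empty'
  else
    printf '%s' "$json" | tr -d '\n' \
      | sed -n 's/.*"FlashArrays"[[:space:]]*:[[:space:]]*\(\[[^]]*\]\).*/\1/p' \
      | grep -o '"MgmtEndPoint"[[:space:]]*:[[:space:]]*"[^"]*"' \
      | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
  fi
}
```

Usage would be along the lines of `echo "$pure_json" | extract_endpoints`, which keeps the script working on machines without jq while preferring the structured parser when it exists.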

fi
echo ""

# Step 5: Test connectivity to each FlashArray
Member

The connectivity should be checked from the worker nodes, not from the Mac/Windows machine where this script is running.

Collaborator Author

Good point. I have made this change. The FA perf testing is now done from the Portworx pod that is selected at the start of the script.
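
Running a check from inside a pod rather than from the local machine might look like the sketch below. This is not the merged implementation: `check_fa_from_pod` is a hypothetical helper, it assumes curl is available inside the selected container, and the namespace and pod name in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# check_fa_from_pod NAMESPACE POD ENDPOINT
# Runs curl inside the given pod so that reachability is measured from the
# cluster network, and prints the HTTP status code returned by the endpoint.
# Assumption: the container image ships curl.
check_fa_from_pod() {
  local ns="$1" pod="$2" endpoint="$3"
  kubectl exec -n "$ns" "$pod" -- \
    curl -k -s -o /dev/null -w '%{http_code}' --connect-timeout 5 \
    "https://${endpoint}/"
}
```

A call such as `check_fa_from_pod portworx "$px_pod" "$endpoint"` would then be made once per FlashArray endpoint; if the pod has several containers, a `-c` container argument may also be needed.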

manual_image_check
check_flasharray
px_alerts_show
check_nbdd
Member

You need a PX version check to ensure this is reported only when PX is at a version that supports NBDD.

Collaborator Author

Made changes to the script to only run check_nbdd if the PXE version is 3.5.0 or above.
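
A version gate of this kind can be sketched in plain bash as follows. The helper `version_ge` and the `px_version` variable in the usage note are hypothetical names, and the merged script may compare versions differently.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if dotted version A is greater than or equal to B.
# Compares the first three components numerically; missing components count
# as 0 and non-numeric suffixes such as "-rc1" are ignored.
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)      # split on "." via the local IFS
  local i x y
  for ((i = 0; i < 3; i++)); do
    x=${a[i]:-0}; y=${b[i]:-0}
    x=${x%%[!0-9]*}; y=${y%%[!0-9]*}   # keep only the leading digits
    ((10#${x:-0} > 10#${y:-0})) && return 0
    ((10#${x:-0} < 10#${y:-0})) && return 1
  done
  return 0                    # all compared components are equal
}
```

The gate itself would then read something like `version_ge "$px_version" 3.5.0 && check_nbdd`. Comparing per component avoids the string-comparison trap where "3.10.0" sorts before "3.5.0".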

Added changes flagged by Aditya - Mar 3, 2026
@adityadani
Member

lgtm

@usujlana-px usujlana-px merged commit e52fafa into main Mar 18, 2026
3 participants