This repository has been archived by the owner on Jul 24, 2021. It is now read-only.

Device Report audit results #871

Open
daleghent opened this issue Aug 23, 2019 · 1 comment
Labels
database (involves database schema or config changes, or non-trivial query authoring); device reports (involves data coming from reporters); discussion; needs-reporter (needs accompanying changes in conch-reporter/livesys); validation

Comments

@daleghent
Contributor

Device Report Audit

Categories

CPUS

cpus[] is an array of logical processors, with one array member per logical CPU. Each member contains information that is specific to that logical CPU.

Before cpus[] existed, we used (and still have) a processor section that counts the physical CPUs present and uses the model_name string from one of them to identify the type:

  "processor": {
    "count": 2,
    "type": "Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz"
  }

The CpuCount validation already counts processors by counting the array members in cpus[]. Strictly speaking that is in error, as it yields a count of logical CPUs, not physical CPUs. In practice this is probably the better measure, as core count is going to be more critical than socket count.
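
To make the distinction concrete, here is a minimal Python sketch that reads both numbers out of a report. It is not the actual CpuCount validation; the function name `cpu_counts` and the trimmed sample report are assumptions made for illustration.

```python
def cpu_counts(report):
    """Return (logical, physical) CPU counts from a device report dict.

    logical  -- number of members in cpus[], one per logical CPU
    physical -- the legacy processor.count value, or None if absent
    """
    logical = len(report.get("cpus", []))
    physical = report.get("processor", {}).get("count")
    return logical, physical


# Hypothetical, heavily trimmed report: 4 logical CPUs across 2 sockets.
report = {
    "processor": {"count": 2, "type": "Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz"},
    "cpus": [{"model_name": "Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz"}] * 4,
}
logical, physical = cpu_counts(report)
print(logical, physical)
```

Keeping the two values separate makes it explicit which one a given validation compares against the expected hardware.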

Can Remove

| Element | Reason |
| --- | --- |
| flags[] | CPU flags are copious and not required for any existing or future validation needs. |
| model_family | Not needed |
| model_stepping | Not needed |
| core_id | Not needed |
| clock | Not needed |
| microcode | Not needed (SmartOS installs its own microcode version upon boot) |

Needs Validation

| Element | Reason |
| --- | --- |
| model_name | Ensure the correct CPU model is installed in the system. |
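
A model_name check could look roughly like the following Python sketch. The function name `unexpected_cpu_models` is hypothetical, and it assumes the expected model string comes from the hardware profile the device is validated against.

```python
def unexpected_cpu_models(cpus, expected_model):
    """Return the sorted model_name values in cpus[] that differ from expected_model.

    An empty list means every logical CPU reports the expected model.
    A member with no model_name is reported as the string "None".
    """
    found = {cpu.get("model_name") for cpu in cpus}
    return sorted(str(model) for model in found if model != expected_model)


# Hypothetical sample data for illustration only.
expected = "Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz"
cpus = [{"model_name": expected}, {"model_name": expected}]
print(unexpected_cpu_models(cpus, expected))
```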

Disks

disks is one of the remaining sets of data that is still in hash form instead of an array (disks[]). Some of the info is not validated directly, but is used to help locate a disk in the system should it need attention during the preflight process. This means that we may retain information that is not used for validation, but supports preflight in other ways.

All information in this set is used in some way, or needs to be.

Example (SATA)

    "PHYG830501N21P9DGN": {
      "model": "INTEL SSDSC2KG01",
      "health": "OK",
      "hba": 0,
      "firmware": "0100",
      "temp": "12",
      "hctl": "8:0:0:0",
      "transport": "sata",
      "vendor": "ATA",
      "size": 1800000,
      "device": "sdi",
      "drive_type": "SATA_SSD"
    }

Example (NVMe)

    "S36WNX0KA02240": {
      "health": "OK",
      "vendor": "0x144d",
      "device": "nvme2n1",
      "model": "",
      "drive_type": "SAS_SSD",
      "hba": 0,
      "block_sz": 512,
      "temp": "31",
      "firmware": "CXV8601Q",
      "transport": "nvme",
      "size": 1800000,
      "hctl": ""
    }

Needs Validation

| Element | Reason |
| --- | --- |
| firmware | We need to validate expected drive firmware versions. |
| model | We need to validate the expected models of drives in the system and their respective quantities. Currently we count drives broadly by type (SSD, SAS, SATA, etc.), which is sufficient for BOMs as a whole but can lead to inaccuracies, as we have experienced in the past. |
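
As a rough illustration of both checks, the Python sketch below counts drives per model and compares firmware per drive. The function name `check_disks` and the two expectation mappings (`expected_models`, `expected_firmware`) are hypothetical; where those expectations would be stored is not decided here. Note that the NVMe example above reports an empty model string, which a check like this would flag as an unexpected model.

```python
from collections import Counter


def check_disks(disks, expected_models, expected_firmware):
    """Return a list of error strings; an empty list means the disks pass.

    disks             -- the report's disks hash, keyed by serial number
    expected_models   -- hypothetical mapping of model -> expected quantity
    expected_firmware -- hypothetical mapping of model -> expected firmware
    """
    errors = []
    counts = Counter(disk.get("model", "") for disk in disks.values())

    # Quantity per model, rather than per broad drive type.
    for model, want in expected_models.items():
        have = counts.get(model, 0)
        if have != want:
            errors.append(f"model {model!r}: expected {want}, found {have}")
    for model in sorted(set(counts) - set(expected_models)):
        errors.append(f"unexpected model {model!r}: found {counts[model]}")

    # Firmware per drive, only for models that have an expectation.
    for serial, disk in sorted(disks.items()):
        want_fw = expected_firmware.get(disk.get("model", ""))
        if want_fw is not None and disk.get("firmware") != want_fw:
            errors.append(
                f"disk {serial}: firmware {disk.get('firmware')!r}, expected {want_fw!r}"
            )
    return errors


disks = {"PHYG830501N21P9DGN": {"model": "INTEL SSDSC2KG01", "firmware": "0100"}}
print(check_disks(disks, {"INTEL SSDSC2KG01": 1}, {"INTEL SSDSC2KG01": "0100"}))
```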

Interfaces

interfaces is also a legacy hash of hashes and must be converted to an array (interfaces[]). This section holds data on each physical ethernet interface in the system and its LLDP peers, which is required for determining proper switch wiring. All data in this section is used, and it currently does not lack any validations.

Example

    "eth2": {
      "vendor": "Mellanox",
      "state": "up",
      "product": "MT27640 Family",
      "ipaddr": "192.168.227.72",
      "peer_mac": "74:83:ef:d6:98:0d",
      "peer_switch": "sin102-tor02-0513",
      "peer_port": "Ethernet33",
      "mac": "50:6B:4B:AB:9B:9A",
      "mtu": "1500",
      "peer_descr": "Arista Networks EOS version 4.20.7M running on an Arista Networks DCS-7160-48YC6",
      "peer_text": "sin102-tor02-0513 Ethernet33"
    }
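
One possible shape for the hash-to-array conversion is sketched below in Python: each hash key is folded into its member under a new field. The helper `hash_to_array` and the field name `name` are assumptions; the same helper would apply to disks with a serial number field.

```python
def hash_to_array(section, key_name):
    """Convert {"eth2": {...}} into [{key_name: "eth2", ...}], sorted by key.

    If a member already contains key_name, the member's own value is kept.
    """
    return [{key_name: key, **value} for key, value in sorted(section.items())]


# Trimmed, hypothetical sample of the legacy interfaces hash.
interfaces = {
    "eth2": {"state": "up", "mtu": "1500"},
    "eth0": {"state": "up", "mtu": "1500"},
}
print(hash_to_array(interfaces, "name"))
```

Sorting by key gives a stable member order, which keeps diffs between successive reports readable.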

DIMMs

dimms[] is an array of memory modules, with one member per module. We currently use the info in this area to validate total system memory (RamTotal), DIMM count (DimmCount) and where DIMMs are plugged in on the motherboard (DimmMap).

Example (populated slot)

    {
      "memory-speed": "2666",
      "memory-rank": "2",
      "memory-set": null,
      "memory-bank-locator": "P1_Node1_Channel0_Dimm0",
      "memory-maximum-voltage": "1.2",
      "memory-serial-number": "23E38365",
      "memory-data-width": "64",
      "memory-total-width": "72",
      "memory-minimum-voltage": "1.2",
      "memory-configured-clock-speed": "2666",
      "memory-part-number": "M393A4K40CB2-CTD",
      "memory-type": "DDR4",
      "memory-asset-tag": "P2-DIMMD1_AssetTag (date:18/39)",
      "memory-configured-voltage": "1.2",
      "memory-locator": "P2-DIMMD1",
      "memory-size": 32,
      "memory-form-factor": "DIMM",
      "memory-manufacturer": "Samsung",
      "memory-type-detail": "Synchronous"
    }

Example (unpopulated slot)

    {
      "memory-manufacturer": null,
      "memory-bank-locator": null,
      "memory-maximum-voltage": "1.2 V",
      "memory-configured-voltage": "1.2 V",
      "memory-data-width": null,
      "memory-speed": null,
      "memory-locator": "P1-DIMMC2",
      "memory-form-factor": null,
      "memory-type-detail": null,
      "memory-minimum-voltage": "1.2 V",
      "memory-type": null,
      "memory-asset-tag": null,
      "memory-set": null,
      "memory-total-width": null,
      "memory-configured-clock-speed": null,
      "memory-part-number": null,
      "memory-serial-number": null,
      "memory-size": null,
      "memory-rank": null
    }

As a part of past efforts, we now count DIMMs and the amount of memory in a system by counting dimms[] array members and summing their capacities using the memory-size member of each DIMM. This means we can completely remove the separate memory section of the report, which contains a total amount (in GB) of memory and a total count of DIMMs:

  "memory": {
    "total": 768,
    "count": 24
  }
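
The derivation can be sketched in a few lines of Python. This is not the DimmCount or RamTotal implementation; the function name `dimm_totals` is hypothetical, and it assumes (based on the examples above) that an unpopulated slot is a dimms[] member whose memory-size is null and should not be counted.

```python
def dimm_totals(dimms):
    """Return (count, total_gb) for the populated members of dimms[].

    A slot is treated as populated when its memory-size is not null.
    """
    populated = [dimm for dimm in dimms if dimm.get("memory-size") is not None]
    total_gb = sum(dimm["memory-size"] for dimm in populated)
    return len(populated), total_gb


# Two populated 32 GB slots and one empty slot (hypothetical sample).
dimms = [
    {"memory-locator": "P2-DIMMD1", "memory-size": 32},
    {"memory-locator": "P2-DIMMD2", "memory-size": 32},
    {"memory-locator": "P1-DIMMC2", "memory-size": None},
]
print(dimm_totals(dimms))
```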

Must Retain

| Element | Reason |
| --- | --- |
| memory-locator | Required for DimmMap |
| memory-serial-number | Required for DimmMap |
| memory-size | Required for RamTotal |

All other memory-* elements can be elided at this time.
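
If the elision were done as an allowlist, it might look like this Python sketch. The names `RETAINED_DIMM_KEYS` and `prune_dimm` are hypothetical, and whether the pruning happens in the reporter or on ingest is left open.

```python
# The three elements listed under Must Retain.
RETAINED_DIMM_KEYS = ("memory-locator", "memory-serial-number", "memory-size")


def prune_dimm(dimm):
    """Return a copy of one dimms[] member with only the retained elements.

    Missing elements are emitted as None so every member has the same shape.
    """
    return {key: dimm.get(key) for key in RETAINED_DIMM_KEYS}


dimm = {
    "memory-locator": "P2-DIMMD1",
    "memory-serial-number": "23E38365",
    "memory-size": 32,
    "memory-type": "DDR4",
    "memory-manufacturer": "Samsung",
}
print(prune_dimm(dimm))
```

An allowlist means any memory-* element added to the report later is dropped by default until a validation needs it.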

Special Item: MegaRAID cards

There are currently a few things that may or may not show up in a given report. As a part of the Ceres v2 rollout, we had to ensure that an LSI/Avago/Broadcom MegaRAID card was present in the system. The presence of one, and technical details concerning it, will show up in a megaraid[] array in the report. We must formulate a validation that signals its presence.
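
At its simplest, such a validation only needs to test for a non-empty megaraid[] array, as in the Python sketch below. The function name `has_megaraid` is hypothetical, and the contents of each array member are deliberately not inspected, since they are not described here.

```python
def has_megaraid(report):
    """Return True if the report carries a non-empty megaraid[] array."""
    cards = report.get("megaraid")
    return isinstance(cards, list) and len(cards) > 0


# Member contents are placeholders; only presence is checked.
print(has_megaraid({"megaraid": [{}]}))
print(has_megaraid({}))
```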

@daleghent added the database, device reports, discussion, needs-reporter, and validation labels on Aug 23, 2019
@karenetheridge
Contributor

As part of the 3.1 spec work we will figure out how things need to move around in device reports.
