
dell / idrac-telemetry-reference-tools


Reference toolset for PowerEdge telemetry metric collection and integration with analytics and visualization solutions.

Home Page: https://github.com/dell/iDRAC-Telemetry-Reference-Tools

License: Apache License 2.0

Go 74.37% Shell 13.81% Dockerfile 0.77% Python 4.69% JavaScript 6.35%
poweredge telemetry analytics visualization idrac9

idrac-telemetry-reference-tools's Introduction

Telemetry Reference Tools


So, what is the point of this repo?

While it is easy to collect telemetry metrics, it is not easy to build a well-structured pipeline that follows best practices. We will not profess to be perfect, and the totality of this code base is maintained by Dell employees in their spare time, but this repo is here to give people a head start by providing a near one-button deploy mechanism for creating a telemetry collection pipeline.

We tried hard to write these instructions so that they are easy to follow. If you have feedback, especially if something was confusing, feel free to open an issue.

What is Telemetry?

Telemetry is a vendor-neutral part of the Redfish standard for providing telemetry data from a device. For more on the Redfish standard see DMTF's white paper. You can also see their developer resources. Many people first ask, "What is in telemetry?"

Telemetry is presented in what are called metric reports. There are currently 24 pre-canned report types available, each with its own set of metrics. You can obtain a list by browsing to your iDRAC at https://<iDRAC>/redfish/v1/TelemetryService/MetricReports.

The currently available metrics (MetricIDs) and the pre-canned reports they belong to are listed below. Details for each metric (its MetricDefinition), such as description, type, units, and sensing interval, can be obtained with the following command:

curl -s -k -u <username>:<password> -X GET https://<iDRAC>/redfish/v1/TelemetryService/MetricDefinitions/SystemMaxPowerConsumption

Output:

{
  "@odata.type": "#MetricDefinition.v1_1_1.MetricDefinition",
  "@odata.context": "/redfish/v1/$metadata#MetricDefinition.MetricDefinition",
  "@odata.id": "/redfish/v1/TelemetryService/MetricDefinitions/SystemMaxPowerConsumption",
  "Id": "SystemMaxPowerConsumption",
  "Name": "System Max Power Consumption Metric Definition",
  "Description": "Peak system power consumption",
  "MetricType": "Numeric",
  "MetricDataType": "Decimal",
  "Units": "W",
  "Accuracy": 1,
  "SensingInterval": "PT60S"
}
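
If you prefer to explore these endpoints from a script rather than a browser, the same queries are easy to issue programmatically. Below is a minimal sketch assuming the Python requests library; the iDRAC address and root/calvin credentials are placeholders, so substitute your own. It lists the available metric reports and then fetches one metric definition.

    import requests
    from requests.auth import HTTPBasicAuth
    import urllib3

    urllib3.disable_warnings()  # iDRACs typically use self-signed certificates

    IDRAC = "192.168.0.120"                      # placeholder iDRAC address
    AUTH = HTTPBasicAuth("root", "calvin")       # placeholder credentials

    base = f"https://{IDRAC}/redfish/v1/TelemetryService"

    # List the available metric reports
    reports = requests.get(f"{base}/MetricReports", auth=AUTH, verify=False).json()
    for member in reports.get("Members", []):
        print(member["@odata.id"])

    # Fetch the definition of a single metric, e.g. SystemMaxPowerConsumption
    metric = requests.get(f"{base}/MetricDefinitions/SystemMaxPowerConsumption",
                          auth=AUTH, verify=False).json()
    print(metric["Description"], metric["Units"], metric["SensingInterval"])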

List of Metric Reports with Metrics:

  • StorageDiskSMARTData

    • CommandTimeout
    • CRCErrorCount
    • CurrentPendingSectorCount
    • DriveTemperature
    • ECCERate
    • EraseFailCount
    • ExceptionModeStatus
    • MediaWriteCount
    • PercentDriveLifeRemaining
    • PowerCycleCount
    • PowerOnHours
    • ProgramFailCount
    • ReadErrorRate
    • ReallocatedBlockCount
    • UncorrectableErrorCount
    • UncorrectableLBACount
    • UnusedReservedBlockCount
    • UsedReservedBlockCount
    • VolatileMemoryBackupSourceFailures 
    
  • SerialLog

  • ThermalMetrics

    • ComputePower
    • ITUE
    • PowerToCoolRatio
    • PSUEfficiency
    • SysAirFlowEfficiency
    • SysAirflowPerFanPower
    • SysAirflowPerSysInputPower
    • SysAirflowUtilization
    • SysNetAirflow
    • SysRackTempDelta
    • TotalPSUHeatDissipation
    
  • MemorySensor

    • TemperatureReading
    
  • GPUMetrics

    • BoardPowerSupplyStatus
    • BoardTemperature
    • GPUHealth
    • GPUMemoryUsage
    • GPUUsage
    • GPUMemoryClockFrequency
    • GPUClockFrequency
    • GPUStatus
    • MemoryTemperature
    • PowerBrakeState
    • PowerConsumption
    • PowerSupplyStatus
    • PrimaryTemperature
    • SecondaryTemperature
    • ThermalAlertState
    
  • MemoryMetrics

    • AddressParityError
    • UncorrectableECCError
    • CorrectableECCError
    • DataLossDetected
    • MemorySpareBlock
    • PredictedMediaLifeLeftPercent
    • TemperatureThresholdAlarm
    
  • ThermalSensor

    • TemperatureReading
    
  • CPURegisters

  • AggregationMetrics

    • SystemAvgInletTempHour
    • SystemMaxInletTempHour
    • SystemMaxPowerConsumption
    
  • GPUStatistics

    • CumulativeDBECounterFB
    • CumulativeDBECounterGR
    • CumulativeSBECounterFB
    • CumulativeSBECounterGR
    • DBECounterFB
    • DBECounterFBL2Cache
    • DBECounterGRL1Cache
    • DBECounterGRRF
    • DBECounterGRTex
    • DBERetiredPages
    • SBECounterFB
    • SBECounterFBL2Cache
    • SBECounterGRL1Cache
    • SBECounterGRRF
    • SBECounterGRTex
    • SBERetiredPages    
    
  • Sensor

    • AmpsReading
    • CPUUsagePctReading
    • IOUsagePctReading
    • MemoryUsagePctReading
    • RPMReading
    • SystemUsagePctReading
    • TemperatureReading
    • VoltageReading
    • WattsReading
    
  • NICSensor

    • TemperatureReading
    
  • FanSensor

    • RPMReading
    
  • PowerMetrics

    • SystemHeadRoomInstantaneous
    • SystemInputPower
    • SystemOutputPower
    • SystemPowerConsumption
    • TotalCPUPower
    • CPUPower
    • TotalFanPower
    • TotalMemoryPower
    • TotalPciePower
    • TotalStoragePower
    • TotalFPGAPower
    • FPGAPower
    
  • NICStatistics

    • DiscardedPkts
    • FCCRCErrorCount
    • FCOELinkFailures
    • FCOEPktRxCount
    • FCOEPktTxCount
    • FCOERxPktDroppedCount
    • LanFCSRxErrors
    • LanUnicastPktRxCount
    • LanUnicastPktTxCount
    • LinkStatus
    • OSDriverState
    • PartitionLinkStatus
    • PartitionOSDriverState
    • RDMARxTotalBytes
    • RDMARxTotalPackets
    • RDMATotalProtectionErrors
    • RDMATotalProtocolErrors
    • RDMATxTotalBytes
    • RDMATxTotalPackets
    • RDMATxTotalReadReqPkts
    • RDMATxTotalSendPkts
    • RDMATxTotalWritePkts
    • RxBroadcast
    • RxBytes
    • RxErrorPktAlignmentErrors
    • RxErrorPktFCSErrors
    • RxFalseCarrierDetection
    • RxJabberPkt
    • RxMutlicast
    • RxPauseXOFFFrames
    • RxPauseXONFrames
    • RxRuntPkt
    • RxUnicast
    • TxBroadcast
    • TxBytes
    • TxErrorPktExcessiveCollision
    • TxErrorPktLateCollision
    • TxErrorPktMultipleCollision
    • TxErrorPktSingleCollision
    • TxMutlicast
    • TxPauseXOFFFrames
    • TxPauseXONFrames
    • TxUnicast
    
  • StorageSensor

    • TemperatureReading
    
  • CPUMemMetrics

    • CPUC0ResidencyHigh
    • CPUC0ResidencyLow
    • CUPSIIOBandwidthDMI
    • CUPSIIOBandwidthPort0
    • CUPSIIOBandwidthPort1
    • CUPSIIOBandwidthPort2
    • CUPSIIOBandwidthPort3
    • NonC0ResidencyHigh
    • NonC0ResidencyLow
    • AvgFrequencyAcrossCores
    • CPUPkgEnergy
    • DRAMPkgEnergy
    • LimitingEvents
    • EnergyTimestamp
    • PkgPwr
    • DRAMPwr
    • PkgThermalStatus
    • ThermalCrtlCircuitActivation
    • DRAMThrottling
    • TJMax
    • CPUEpi
    • CPUViolationCounter
    • CPULimitingCounter
    • DDRLimitingCounter
    • TCtrl
    • CPUAvgPbmRatioCounterLow
    • AccCoreCyclesLow
    • AccCoreCyclesHigh
    • UncoreClocksLow
    • UncoreClocksHigh
    
  • PowerStatistics

    • LastDayAvgPower
    • LastDayMaxPower
    • LastDayMaxPowerTime
    • LastDayMinPower
    • LastDayMinPowerTime
    • LastHourAvgPower
    • LastHourMaxPower
    • LastHourMaxPowerTime
    • LastHourMinPower
    • LastHourMinPowerTime
    • LastMinuteAvgPower
    • LastMinuteMaxPower
    • LastMinuteMaxPowerTime
    • LastMinuteMinPower
    • LastMinuteMinPowerTime
    • LastWeekAvgPower
    • LastWeekMaxPower
    • LastWeekMaxPowerTime
    • LastWeekMinPower
    • LastWeekMinPowerTime
    
  • FPGASensor

    • TemperatureReading
    
  • CPUSensor

    • TemperatureReading
    
  • PSUMetrics

    • PSURPMReading
    • PSUTemperatureReading
    
  • FCPortStatistics

    • FCInvalidCRCs
    • FCLinkFailures
    • FCLossOfSignals
    • FCRxKBCount
    • FCRxSequences
    • FCRxTotalFrames
    • FCTxKBCount
    • FCTxSequences
    • FCTxTotalFrames
    • FCStatOSDriverState
    • PortSpeed
    • PortStatus
    
  • NVMeSMARTData

    • AvailableSpare
    • AvailableSpareThreshold
    • CompositeTemparature
    • ControllerBusyTimeLower
    • ControllerBusyTimeUpper
    • CriticalWarning
    • DataUnitsReadLower
    • DataUnitsReadUpper
    • DataUnitsWrittenLower
    • DataUnitsWrittenUpper
    • HostReadCommandsLower
    • HostReadCommandsUpper
    • HostWriteCommandsLower
    • HostWriteCommandsUpper
    • MediaDataIntegrityErrorsLower
    • MediaDataIntegrityErrorsUpper
    • NumOfErrorInfoLogEntriesLower
    • NumOfErrorInfoLogEntriesUpper
    • PercentageUsed
    • PowerCyclesLower
    • PowerCyclesUpper
    • PowerOnHoursLower
    • PowerOnHoursUpper
    • UnsafeShutdownsLower
    • UnsafeShutdownsUpper
    
  • FCSensor

    • TemperatureReading
    
  • SFPTransceiver

    • SFPTemperature
    • TemperatureStatus
    • SFPVoltage
    • VoltageStatus
    • TxBiasCurrent
    • TxBiasCurrentStatus
    • TxOutputPower
    • TxOutputPowerStatus
    • RxInputPower
    • RxInputPowerStatus
    
  • SystemUsage

    • CPUUsage
    • IOUsage
    • MemoryUsage
    • AggregateUsage
    
  • DPU Metrics (no pre-canned report exists for these metrics; create a custom metric report to collect them)

    • DPUTemperature
    • DPUPowerConsumption
    

If you want to see what a report looks like, check out this sample report of the StorageDiskSMARTData report.

What do I do with telemetry data?

The question most people then have is, "Well, what do I do with it?" Generally, what most people want to do is grab these JSON reports and push them into a time series database so they can monitor their systems over time for things like load tracking, failure prediction, and anomaly detection. For example, maybe you want to know what times of day your network is most active; telemetry can tell you that. Maybe you have a group of systems failing with higher frequency and you want to know why. Telemetry could tell you that everything is overheating because no one told you the datacenter in question was 92 degrees. Not that we have seen anyone do that, cough.
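
As a rough illustration of that first step, the sketch below flattens a Redfish MetricReport (its MetricValues array carries MetricId, MetricValue, and Timestamp fields) into rows that could be written to any of the databases listed later. It is a minimal example for illustration, not the pipeline's actual ingestion code.

    import json

    def metric_report_to_rows(report_json):
        """Flatten a Redfish MetricReport into (timestamp, metric_id, value) rows
        ready to be written to a time series database."""
        report = json.loads(report_json)
        return [(mv.get("Timestamp"), mv.get("MetricId"), mv.get("MetricValue"))
                for mv in report.get("MetricValues", [])]

    # Example with a single, made-up reading
    sample = ('{"Id": "Sensor", "MetricValues": [{"MetricId": "TemperatureReading", '
              '"MetricValue": "25", "Timestamp": "2022-08-23T13:54:45Z"}]}')
    print(metric_report_to_rows(sample))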

Here is what the data might look like for you. This is what an R840 looked like during startup in Splunk analytics.

What is in this data pipeline?

Currently, we support the following time series databases for the pipeline:

  • Elasticsearch
  • InfluxDB
  • Prometheus
  • Timescale
  • Splunk (you must bring your own Splunk instance)

With the exception of Splunk, the databases for the pipeline are self-deploying. For Splunk you will have to deploy your own instance, but we detail how to do this in the instructions.

At its highest level the pipeline looks like this:

iDRAC telemetry -> ActiveMQ -> Time series database

Overview

There are several GoLang programs in between that provide the glue connecting all of these data pipeline stages. See the architecture for more details.

Architecture Details

See ARCHITECTURE.md

Getting Started

Make Sure iDRAC is up-to-date

You must be running iDRAC firmware 4.0 or higher for telemetry support.

Licensing

The first thing you will need is the Datacenter license for iDRAC. If you do not know which license your iDRAC currently has, you can check by logging into the iDRAC GUI and looking at the licensing information:

If you just want to try things out, you can get a trial license for your iDRACs here. If you would like to deploy licenses to many servers programmatically, there are example scripts showing how to do that in Python and PowerShell.

Enabling Telemetry

The next thing you will need to do is enable telemetry on your servers. You can either do this through the GUI or use the script that does it programmatically:

python3 ./ConfigurationScripts/EnableOrDisableAllTelemetryReports.py -ip YOUR_IDRAC_IP -u IDRAC_ADMIN -p IDRAC_PASSWORD -s Enabled

To do it through the GUI, log into the iDRAC, go to Configuration -> System Settings -> Telemetry Streaming, and set Telemetry Data Stream to Enabled.
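
If you would rather see what the script is doing, the same idea can be sketched directly against the Redfish API. The snippet below is only an illustration: the Attributes URI and the Telemetry.1.EnableTelemetry attribute name are assumptions about the iDRAC attribute registry, so treat the repository's EnableOrDisableAllTelemetryReports.py script as the authoritative reference.

    import requests
    from requests.auth import HTTPBasicAuth

    IDRAC = "YOUR_IDRAC_IP"
    AUTH = HTTPBasicAuth("IDRAC_ADMIN", "IDRAC_PASSWORD")

    # Assumed attribute name and URI -- verify against your iDRAC's attribute
    # registry, or use EnableOrDisableAllTelemetryReports.py instead.
    url = f"https://{IDRAC}/redfish/v1/Managers/iDRAC.Embedded.1/Attributes"
    payload = {"Attributes": {"Telemetry.1.EnableTelemetry": "Enabled"}}

    resp = requests.patch(url, json=payload, auth=AUTH, verify=False)
    print(resp.status_code, resp.text)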

Hardware and System Requirements

Whatever server you decide to run the telemetry reference tools on will have to run the following:

  • A few lightweight GoLang programs
  • Docker
  • Apache ActiveMQ
  • Your time series database

The amount of resources you will need depends strongly on the number of servers from which you will collect data. We are still in the initial phases of testing, but here are some stats from a 5-minute capture of a live R840:

Total packets: 2933
Average packets/second: 9.8
Average packet size: 720 bytes
Total bytes sent/received: 2112629 (~2.06 MB)
Average bytes/second: 7056

As you can see from the above, the data load is very light. You could easily pull data from hundreds of servers with one receiver depending on its resources. As we perform more load testing we will push results here.
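
For a rough sense of scale, here is a back-of-the-envelope calculation assuming a hypothetical fleet of 500 servers, each producing the same load as the R840 capture above:

    # Scale the 5-minute R840 capture (2112629 bytes) to a larger fleet.
    bytes_per_sec_per_server = 2112629 / (5 * 60)           # ~7 KB/s per server
    servers = 500                                            # hypothetical fleet size
    total_mbit_per_sec = bytes_per_sec_per_server * servers * 8 / 1e6
    print(f"~{total_mbit_per_sec:.0f} Mb/s for {servers} servers")  # roughly 28 Mb/s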

If you would like to perform some simple testing on your own, you can pull all reports via HTTP SSE with this command:

curl -kX GET -u root:PASSWORD "https://YOUR_IDRAC_IP/redfish/v1/SSE?\$filter=EventFormatType%20eq%20MetricReport"
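
The same stream can be consumed from a short script. The sketch below assumes the Python requests library and simply reads the SSE stream line by line, printing the start of each MetricReport payload; it is a test aid, not part of the pipeline.

    import requests
    from requests.auth import HTTPBasicAuth
    import urllib3

    urllib3.disable_warnings()

    IDRAC = "YOUR_IDRAC_IP"
    AUTH = HTTPBasicAuth("root", "PASSWORD")

    url = f"https://{IDRAC}/redfish/v1/SSE?$filter=EventFormatType%20eq%20MetricReport"

    # Each SSE "data:" line carries one MetricReport as JSON.
    with requests.get(url, auth=AUTH, verify=False, stream=True) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            if line and line.startswith("data:"):
                print(line[len("data:"):].strip()[:120], "...")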

What to do next?/Installation

After you have gone through Getting Started, you can head over to our installation instructions.

NOTE: You will need access to the internet for the initial build of the pipeline but can move it offline after it is built.

Post Installation

After you have your setup running, you will likely want to start customizing it. This is meant to be a reference architecture, so it is unlikely it will do exactly what you want out of the box. You will likely want to develop your own dashboards and analytics setups, or tune parts of the pipeline.

Debugging

If you need to debug things, we have included a few tips and tricks we learned along the way in DEBUGGING.md.

Default Ports Used by the Framework

  • 3000 - Grafana
  • 8080 - configgui port (external). Internally it uses 8082
  • 8088 - Splunk HTTP Event Listener (if using Splunk)
  • 8000 - Splunk Management UI (if using Splunk)
  • 8161 - ActiveMQ Administrative Interface (default credentials are admin/admin)
  • 61613 - ActiveMQ messaging port. Redfish read will send to this port
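
For debugging, it can be handy to attach a throwaway consumer directly to the ActiveMQ STOMP port to see what the pipeline is publishing. The sketch below assumes the third-party stomp.py client and a hypothetical destination name (/queue/databus); check the repository's internal databus code for the destinations actually used.

    import stomp  # third-party client: pip install stomp.py (8.x on_message signature)

    class Printer(stomp.ConnectionListener):
        def on_message(self, frame):
            # Each frame body is a JSON message placed on the bus by the pipeline.
            print(frame.body[:120], "...")

    conn = stomp.Connection([("localhost", 61613)])
    conn.set_listener("printer", Printer())
    conn.connect("admin", "admin", wait=True)   # default ActiveMQ credentials noted above
    conn.subscribe(destination="/queue/databus", id=1, ack="auto")  # hypothetical queue name
    input("Listening for messages, press Enter to quit...\n")
    conn.disconnect()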

FAQ

What is the advantage of using HTTP SSE over other approaches?

HTTP SSE consumes far less bandwidth than the alternative methods of obtaining telemetry data. Long polling (a hanging GET) requires a GET to be held open against every iDRAC, POST subscriptions run the risk of no one being available to listen for the events and consume more bandwidth, and syslog has a number of protocol-specific problems detailed below.

Why not syslog?

The issue with the syslog protocol is that it does not specify a maximum message size, so each syslog server implementation chooses its own. Furthermore, the behavior when a message exceeds the maximum size is not defined; many syslog servers simply truncate the message, which is obviously not desirable. If you decide you really want to use syslog (we do not recommend this approach), a member of our team has written a script for reassembling the messages.

How much horsepower do I need to collect telemetry?

See Hardware and System Requirements

What is the output format from telemetry?

JSON

Do I need to worry about telemetry overwhelming the iDRAC link?

No. Even with all reports turned on, telemetry cannot overwhelm the iDRAC's 1 Gb/s link; each report consumes only a few KB/s. Even if you tune the reports to send very frequently, you will not overwhelm the link.

What does a telemetry report look like?

See this example of what a telemetry report looks like.

Is telemetry vendor neutral?

Yes. Telemetry is part of DMTF's Redfish specification. While there are parts of the Redfish standard that are left to vendor implementation, Dell's telemetry implementation is compliant with the specification.

Are there demo licenses for iDRAC Datacenter?

Yes. See this website.

What license is required for telemetry?

iDRAC Datacenter

LICENSE

This project is licensed under the Apache 2.0 License. See the LICENSE file for more information.

Contributing

We welcome your contributions to this reference toolset. See Contributing Guidelines for more details. Please reference our Code of Conduct.

Disclaimer

The software applications included in this package are considered "BETA". They are intended for testing use in non-production environments only.

No support is implied or offered. Dell Corporation assumes no responsibility for results or performance of "BETA" files. Dell does NOT warrant that the Software will meet your requirements, or that operation of the Software will be uninterrupted or error free. The Software is provided to you "AS IS" without warranty of any kind. DELL DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. The entire risk as to the results and performance of the Software is assumed by you. No technical support provided with this Software.

IN NO EVENT SHALL DELL OR ITS SUPPLIERS BE LIABLE FOR ANY DIRECT OR INDIRECT DAMAGES WHATSOEVER (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR OTHER PECUNIARY LOSS) ARISING OUT OF USE OR INABILITY TO USE THE SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Some jurisdictions do not allow an exclusion or limitation of liability for consequential or incidental damages, so the above limitation may not apply to you.

Support

  • To report an issue open one here.
  • If any requirements have not been addressed, then create an issue here.
  • To provide feedback to the development team, email [email protected].

idrac-telemetry-reference-tools's People

Contributors

cjmadathil, garasankara, grantcurell, krishnakartik1, mahiredz, nikii1118, sailm23, srinivasadatari, sumedhk24, superchalupa, trevorsquillario, windsparks33


idrac-telemetry-reference-tools's Issues

[enhancement] cover container deployment for environments that require a proxy

It would be helpful to include a reference for environments that require a proxy, given that the proxy has to be defined in many places:

Docker Daemon required to pull images
Docker Client required for container images that need to pull down imports

  • Note: the noProxy option should also be specified so intra-container communication does not attempt to go through the proxy.

Additionally, the docker-compose files could define port mappings for the containers:

  kib01:
    image: docker.elastic.co/kibana/kibana:7.10.1
    container_name: kib01
    depends_on: {"es01": {"condition": "service_healthy"}}
    environment:
      ELASTICSEARCH_URL: http://es01:9200
      ELASTICSEARCH_HOSTS: http://es01:9200
    networks:
      - elastic
    ports:
      - 5601:5601

helper page isn't displaying

This issue is used to track 1 issue.

  1. The helper page doesn't display
    ./docker-compose-files/compose.sh -h

Fix coming soon!

prometheus-ingester is not known host

Installing the Prometheus pump results in the prometheus-ingester target (defined in cmd/prometheuspump/prometheus.yml) being down, and as such no data is loaded.

Error at Prometheus /targets:

Get http://prometheus-ingester:2112/metrics: dial tcp: lookup prometheus-ingester on 127.0.0.11:53: server misbehaving

The Prometheus pump container has the alias prometheus-pump-withtestserver or prometheus-pump-standalone, depending on whether the test DB is being used.

prometheus-ingester does not seem to be defined anywhere.

Not sure what was intended, but one way to get around this is to add an alias to prometheus-pump-standalone: &prometheus-pump in docker-compose-files/docker-compose.yml:

    networks:
      host-bridge-net:
        aliases:
          - prometheus-ingester

As long as prometheus-pump-standalone and prometheus-pump-withtestserver are not both running with the same alias, this should be fine, but maybe there is a better solution.

Compose files declare a docker named volume but do not use it

For example, here a named docker volume called mysql-db is declared but is not consumed in the mysqldb definition:

  mysqldb:
    image: mysql:latest
    restart: always
    container_name: mysqldb
    environment:
      - MYSQL_DATABASE=telemetrysource_services_db #specify configured database name
      - MYSQL_USER=reftools           #sample reference - specify configured user name
      - MYSQL_PASSWORD=*Ref!**lsq8#v* #sample reference - specify configured credentials
      - MYSQL_ROOT_PASSWORD=""        #sample reference - specify configured credentials
    networks:
      - elastic
    #mount deployed mysql filesystem location for persistance

elkpump will not retry if Elasticsearch doesn't come up

In my instance I didn't set all of the required Elasticsearch settings (e.g. vm.max_map_count), so Elasticsearch failed to start. When this happened, elkpump's DNS lookup failed and the container stopped:

go: downloading github.com/go-stomp/stomp v2.0.3+incompatible
go: downloading github.com/mitchellh/mapstructure v1.1.2
go: downloading github.com/elastic/go-elasticsearch/v8 v8.0.0-20201229214741-2366c2514674
2022/02/02 02:11:19 Cannot delete index: dial tcp: lookup es01: Temporary failure in name resolution
exit status 1

The desired behavior is that it continues to retry until Elasticsearch is up.

configgui does not expose any ports

The telemetry-receiver container, which runs configgui, does not expose any ports, so there is no way to connect to it.

Temporary solution: expose a fixed port.
Permanent solution: Many of the variables inside the docker compose file are configurable. There would need to be an external config file combined with a jinja2 template, or something that updates these values live.

It would probably be best if a central template engine controlled everything, because the user may also turn things on and off in the telemetry receiver file, which would further change which ports should and should not be open. This could all be done with Jinja, or, in a more sophisticated scenario, Ansible.

config.ini file has readable iDRAC Password

config.ini stores the iDRAC password in readable form. This is a security issue.

It is fine to accept the password in readable form, but once the telemetry containers start this file should no longer exist in readable form. The telemetry container should convert this file into an encrypted form, or take the password in encrypted form inside the container and then delete this file.

Later, when the user wants to change a password or add new nodes, they should supply a new config.ini file and restart the telemetry container.

Can you please treat this issue as a priority? The Omnia HLD itself will not get clearance while this vulnerability is open.

Update the upload file service to properly check for completion

Because I am bad at web dev and it is getting late I did this:

    async function uploadFile() {
        let formData = new FormData();
        formData.append("file", fileupload.files[0]);
        await fetch('/api/v1/CsvUpload', {
            method: "POST",
            body: formData,
            contentType: 'text/csv'
        });
        alert("CSV file processing completed. Refresh the page to confirm the services were added. Check logs for " +
            "results.")
        $('#csvModal').modal('toggle');
    }

It is bad. I know it is bad. It ought to be fixed, but I needed it working for tomorrow and it currently works. This should be updated to check whether there were any errors in the file upload.

Kibana ports not forwarded

Port 5601 is not forwarded in our docker compose, so while Kibana will run, the user won't be able to access it externally.

root@telemetrytest:~# docker ps -a
CONTAINER ID   IMAGE                                                  COMMAND                  CREATED        STATUS                   PORTS                 NAMES
e4232510c507   docker.elastic.co/kibana/kibana:7.10.1                 "/usr/local/bin/dumb…"   13 hours ago   Up 13 hours              5601/tcp              kib01
8431b93df5ad   golang:1.15                                            "/bin/sh -c cmd/idra…"   13 hours ago   Up 13 hours                                    telemetry-receiver
0e42bea7943a   golang:1.15                                            "go run cmd/elkpump/…"   13 hours ago   Up 25 seconds                                  es-ingester
7e3a154ee5d5   mysql:latest                                           "docker-entrypoint.s…"   13 hours ago   Up 13 hours              3306/tcp, 33060/tcp   mysqldb
9451d17b7164   docker.elastic.co/elasticsearch/elasticsearch:7.10.1   "/tini -- /usr/local…"   13 hours ago   Up 2 minutes             9200/tcp, 9300/tcp    es02
d15f8433250f   rmohr/activemq:5.10.0                                  "/bin/bash -c 'bin/a…"   13 hours ago   Up 13 hours              8161/tcp, 61616/tcp   activemq
03d3664da575   docker.elastic.co/elasticsearch/elasticsearch:7.10.1   "/tini -- /usr/local…"   13 hours ago   Up 2 minutes             9200/tcp, 9300/tcp    es03
0ed7ff0bcc76   docker.elastic.co/elasticsearch/elasticsearch:7.10.1   "/tini -- /usr/local…"   13 hours ago   Up 2 minutes (healthy)   9200/tcp, 9300/tcp    es01
root@telemetrytest:~# ss -ltn
State                             Recv-Q                            Send-Q                                                       Local Address:Port                                                         Peer Address:Port                            Process
LISTEN                            0                                 4096                                                         127.0.0.53%lo:53                                                                0.0.0.0:*
LISTEN                            0                                 128                                                                0.0.0.0:22                                                                0.0.0.0:*
LISTEN                            0                                 128                                                                   [::]:22                                                                   [::]:*

Pipeline fails when volumes are in root - permission denied

root@telemetrytest:~/root# docker logs ab33864fc5bb
/bin/sh: 1: cmd/idrac-telemetry-receiver.sh: Permission denied

It looks like the root cause is that when you pull the repo, idrac-telemetry-receiver.sh is missing the execute bit:

root@telemetrytest:~# ls -al root/idrac-testing/cmd/
total 160
drwxr-xr-x 12 root root  4096 Feb  8 12:35 .
drwxr-xr-x 10 root root  4096 Feb  8 12:35 ..
drwxr-xr-x  2 root root  4096 Feb  8 12:35 configui
drwxr-xr-x  2 root root  4096 Feb  8 12:35 dbdiscauth
drwxr-xr-x  2 root root  4096 Feb  8 12:35 elkpump
-rw-r--r--  1 root root 97146 Jan 13 18:45 highleveldesign.png
-rw-r--r--  1 root root   562 Jan 13 18:45 idrac-telemetry-collector-all.sh
-rw-r--r--  1 root root   347 Feb  8 11:52 idrac-telemetry-receiver.sh
drwxr-xr-x  2 root root  4096 Feb  8 12:35 influxpump
-rw-r--r--  1 root root   187 Jan 13 18:45 initialize_timescaledb.sh
drwxr-xr-x  2 root root  4096 Feb  8 12:35 prometheuspump
-rw-r--r--  1 root root  2482 Feb  8 11:52 README.md
drwxr-xr-x  2 root root  4096 Feb  8 12:35 redfishread
drwxr-xr-x  2 root root  4096 Feb  8 12:35 simpleauth
drwxr-xr-x  2 root root  4096 Feb  8 12:35 simpledisc
drwxr-xr-x  2 root root  4096 Feb  8 12:35 splunkpump
drwxr-xr-x  2 root root  4096 Feb  8 12:35 timescalepump

Possible fixes:

  • Create a start script that checks prerequisites
  • Run container as root (bad idea but it does solve the problem)
  • Custom container with the script built in instead of volume mounted (kinda high maintenance)

Need to run the golinter on all the files of the repo. For example:

~/go/bin/golangci-lint run cmd/redfishread/redfishread.go
cmd/redfishread/redfishread.go:229:24: Error return value of r.Redfish.GetLceSSE is not checked (errcheck)
go r.Redfish.GetLceSSE(eventsIn, "https://"+r.Redfish.Hostname+"/redfish/v1/SSE")
^
cmd/redfishread/redfishread.go:364:34: Error return value of dataBusService.ReceiveCommand is not checked (errcheck)
go dataBusService.ReceiveCommand(commands)
^
cmd/redfishread/redfishread.go:243:2: S1023: redundant return statement (gosimple)
return
^
cmd/redfishread/redfishread.go:136:2: printf: log.Printf format %s has arg eventData of wrong type *github.com/dell/iDRAC-Telemetry-Reference-Tools/internal/redfish.RedfishPayload (govet)
log.Printf("RedFish LifeCycle Events Found for parsing: %s\n", eventData)
^
cmd/redfishread/redfishread.go:279:3: printf: log.Println arg list ends with redundant newline (govet)
log.Println("EventService not supported...\n")
^
cmd/redfishread/redfishread.go:230:2: ineffectual assignment to eventsOut (ineffassign)
eventsOut := new(redfish.RedfishEvent)
^
cmd/redfishread/redfishread.go:227:38: SA4009: argument eventService is overwritten before first use (staticcheck)
func getRedfishLce(r *RedfishDevice, eventService *redfish.RedfishPayload, dataBusService *databus.DataBusService) {
^
cmd/redfishread/redfishread.go:232:2: SA4009(related information): assignment to eventService (staticcheck)
eventService = eventsOut.Payload
^

Elastic Pipeline missing documents

When using the docker-compose ELK reference pipeline there are gaps between sensor readings (see the kibana_db_gaps screenshot).
Viewing the aggregation from ES, it is clear that documents are missing for these sensors:

  "aggregations": {
    "by_context": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "CPU2 Temp",
          "doc_count": 13520
        },
        {
          "key": "CPU1 Temp",
          "doc_count": 8088
        }
      ]
    }
  }

When viewing a parallel SSE stream from the iDRAC, it is clear all readings are present.
When viewing the es-ingester logs, it is clear that all readings are being read from the databus:

/* https://github.com/dell/iDRAC-Telemetry-Reference-Tools/blob/main/cmd/elkpump-basic/elkpump-basic.go */
			log.Print("value: ", value) // line 80 

I suspect that these readings are getting lost during the handleGroups call to esapi.bulk.

Corresponding es-ingester logs and ES aggregation for dashboard image
es-ingester_and_kibana_agg.log

configgui logs password to STDOUT in plaintext


Configgui is a cool tool and I like the Bootstrap interface, but it looks like it logs all received system passwords in plaintext. This is retrievable with docker logs telemetry-receiver. This should be changed moving forward.

sometimes cgo crash happens

docker ps -a --no-trunc
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7acbc2910e9781d407f3a90a795a7959fdb325c0c2d07be07ff65594e292aa43 idrac-telemetry-reference-tools/redfishread:latest "/app" 11 minutes ago Up 10 minutes idrac-telemetry-reference-tools-redfishread-1
0a956d1e75932501e605ae598388f558710643ed4b7f1de663c47d79fa26f2e0 idrac-telemetry-reference-tools/configui:latest "/app" 11 minutes ago Up 10 minutes 0.0.0.0:8080->8082/tcp idrac-telemetry-reference-tools-configui-1
d9b3a00e7ed7ab3888053dbf183225f1267364ae1715a1245f4b56ae6a537013 idrac-telemetry-reference-tools/dbdiscauth:latest "/app" 11 minutes ago Up 10 minutes idrac-telemetry-reference-tools-dbdiscauth-1
dbb66d6c2e7796ca76b5502bb8d05b39893c41e992129e820222ac19c9355f73 mysql:latest "docker-entrypoint.sh mysqld" 11 minutes ago Up 10 minutes 3306/tcp, 33060/tcp mysqldb
3b936a4bcc866d69389b5d537430deec8ced262290e815cdf80e5f2154c5f8b6 idrac-telemetry-reference-tools/influxpump:latest "/app" 11 minutes ago Up 10 minutes idrac-telemetry-reference-tools-influx-pump-withtestserver-1
775a2dab68347ad59cd2aa8b503bc39dfa35b6c71496f447b0c05ad0e4820685 sha256:bc3956082f663fbbf4d470b64a28b745b28fea8833e65256d2fd2c98044acddc "/bin/sh -c 'CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o app ./cmd/${CMD}'" 15 minutes ago Exited (2) 15 minutes ago relaxed_swartz
39433244cded1e89fe01b8ee12d0224ed37e85966fa60b7361342fc594b5949c grafana/grafana:9.0.1 "/run.sh" 22 minutes ago Up 10 minutes 0.0.0.0:3000->3000/tcp telemetry-reference-tools-grafana
bca1cf893acd07b2dcfa5e48f9e9bc18fb5898f46d57ae097d56258216d19636 influxdb:latest "/entrypoint.sh influxd" 22 minutes ago Up 10 minutes (healthy) 0.0.0.0:8086->8086/tcp telemetry-reference-tools-influx
844da1fb8bec31453e1d4acabf7503c3108f1cae8bc8981cec26ca7a1cda6bf1 rmohr/activemq:latest "/bin/sh -c 'bin/activemq console'" 19 hours ago Up 10 minutes 1883/tcp, 5672/tcp, 61613-61614/tcp, 61616/tcp, 0.0.0.0:8161->8161/tcp activemq

Configui is dependent on constant external web connection to function

Configui uses Bootstrap under the hood but depends on external CDNs being reachable to load it:

        <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous"/>
        <script src="http://code.jquery.com/jquery-3.4.1.min.js" integrity="sha384-vk5WoKIaW/vJyUAd9n/wmopsmNhiy+L2Z+SBxGYnUkunIxVxAv/UtMOhba/xskxh" crossorigin="anonymous"></script>
        <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js" integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q" crossorigin="anonymous"></script>
        <script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js" integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl" crossorigin="anonymous"></script>

Enhancement: Debugging fairly difficult - no log output from containers

Right now, as far as I can tell, logs aren't being redirected from things like redfishread.go. In my case, I was having problems with the queue not being created in ActiveMQ. I had to set up my target Linux box as a remote debug target and then debug redfishread.go but the problem is it doesn't have the same DNS namespace so I'm having to resolve that before moving on.

The timescale container is always restarting

I was trying to use the tool with the timescale pipeline. However, the timescale container is always restarting:

$ docker-compose -f ./timescale-docker-pipeline-reference-unenc.yml up -d
Starting activemq  ... done
Starting timescale          ... done
Starting grafana            ... done
Starting mysqldb            ... done
Starting timescale-ingester ... done
Starting telemetry-receiver ... done
$ docker-compose -f ./timescale-docker-pipeline-reference-unenc.yml ps
       Name                     Command                 State             Ports
--------------------------------------------------------------------------------------
activemq             /bin/bash -c bin/activemq  ...   Up           61616/tcp, 8161/tcp
grafana              /run.sh                          Up           3000/tcp
mysqldb              docker-entrypoint.sh mysqld      Up           3306/tcp, 33060/tcp
telemetry-receiver   /bin/sh -c cmd/idrac-telem ...   Up
timescale            /bin/sh -c cmd/initialize_ ...   Restarting
timescale-ingester   go run cmd/timescalepump/t ...   Up

Its docker logs are like this:

$ docker logs --tail 50 --follow --timestamps timescale
2021-12-07T01:27:26.172757282Z 	Is the server running locally and accepting
2021-12-07T01:27:26.172759726Z 	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
2021-12-07T01:27:27.092481235Z psql: error: could not connect to server: No such file or directory

docker compose bind mounts incorrect

Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /home/michael_e_brown/git/iDRAC-Telemetry-Reference-Tools/dashboards/FanRPM.json

When running setup:

$ ./docker-compose-files/compose.sh --setup-influx-test-db
Pass: Docker compose version is 2.2.3.
Cleaning up old containers for telemetry-reference-tools-influx: 8c090a61b2b4
833789f8c09c
Stopping: 8c090a61b2b4
833789f8c09c
Removing: 8c090a61b2b4
833789f8c09c
Cleaning up old containers for telemetry-reference-tools-grafana: 757330c6973d
Stopping: 757330c6973d
Removing: 757330c6973d
Removing volume: telemetry-reference-tools_influxdb-storagetelemetry-reference-tools_influxdb-storage
Set up environment file in /home/michael_e_brown/git/iDRAC-Telemetry-Reference-Tools/.env
To run manually, run the following command line:
docker-compose --project-directory /home/michael_e_brown/git/iDRAC-Telemetry-Reference-Tools -f /home/michael_e_brown/git/iDRAC-Telemetry-Reference-Tools/docker-compose-files/docker-compose.yml --profile setup-influx-test-db up -d

WARN[0000] The "SPLUNK_HEC_URL" variable is not set. Defaulting to a blank string.
WARN[0000] The "SPLUNK_HEC_KEY" variable is not set. Defaulting to a blank string.
WARN[0000] The "SPLUNK_HEC_INDEX" variable is not set. Defaulting to a blank string.
[+] Running 5/5
⠿ Volume "idrac-telemetry-reference-tools_influxdb-storage" Created 0.0s
⠿ Volume "idrac-telemetry-reference-tools_grafana-storage" Created 0.0s
⠿ Container telemetry-reference-tools-influx Created 13.8s
⠿ Container telemetry-reference-tools-grafana Created 28.0s
⠿ Container idrac-telemetry-reference-tools-setup-influx-pump-1 Created 17.4s
⠋ Container idrac-telemetry-reference-tools-grafana-setup-influx-datasource-1 C... 0.0s
Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /home/michael_e_brown/git/iDRAC-Telemetry-Reference-Tools/dashboards/FanRPM.json

It looks like the compose yaml file has old stuff:

$ grep -r FanRPM.json *
docker-compose-files/docker-compose.yml: source: ${PWD}/dashboards/FanRPM.json

GetSensorThresholds.py - Report pull per example fails

Running the code as the example describes fails. I ran it against 192.168.1.45 as root (with its password) for the Sensor report. This yields:

For metric report: Sensor
FAIL: detailed error message: 'MetricProperty'

The problem is in the indicated line (see the attached image). It looks like the code expects there to be an attribute called MetricProperty that is not present.

docker-compose.yml

There is a value for SPLUNK_URL

This looks to be only a single value. Can a change be made so that multiple values for SPLUNK_URL can be provided? In many Splunk deployments there are multiple endpoints so as not to overwhelm one. All the endpoints communicate with each other, so the data only needs to be sent to one. Sending round robin, or some other scheme, would be fantastic.

Example:

SPLUNK_URL: "http://splunk-index01:8088/","http://splunk-index02:8088/","http://splunk-index03:8088/","http://splunk-index04:8088/","http://splunk-index05:8088/","http://splunk-index06:8088/"

telemetry reference tools not working with Prometheus

Hello.
Configuration setup could not complete for Prometheus. The installation process keeps printing the same message and does not stop.
Last message: "Waiting for grafana container setup Prometheus DATA_SOURCE & DASHBOARD to finish"

Setup process:

ubuntu@ip-172-27-1-25:~/iDRAC-Telemetry-Reference-Tools$ ./docker-compose-files/compose.sh setup --prometheus-test-db
Pass: Docker compose version is 2.2.3.
[+] Running 23/25
 ⠿ grafana-setup-influx-datasource Error                                                                                                                                                                                                                 1.5s
 ⠿ grafana Pulled                                                                                                                                                                                                                                       16.6s
   ⠿ df9b9388f04a Pull complete                                                                                                                                                                                                                          3.5s
   ⠿ 8a2d7b6c89bf Pull complete                                                                                                                                                                                                                          3.9s
   ⠿ 290368f4e636 Pull complete                                                                                                                                                                                                                          5.5s
   ⠿ 34f09fc247ac Pull complete                                                                                                                                                                                                                          5.7s
   ⠿ f58a7c0e6b43 Pull complete                                                                                                                                                                                                                          6.6s
   ⠿ a5e39be1ca05 Pull complete                                                                                                                                                                                                                         14.2s
   ⠿ 9310026c8fc6 Pull complete                                                                                                                                                                                                                         14.3s
   ⠿ f7053e076105 Pull complete                                                                                                                                                                                                                         14.4s
   ⠿ dd166239212b Pull complete                                                                                                                                                                                                                         14.5s
 ⠿ prometheus Pulled                                                                                                                                                                                                                                    12.6s
   ⠿ 50e8d59317eb Pull complete                                                                                                                                                                                                                          1.0s
   ⠿ b6c3b3e34d73 Pull complete                                                                                                                                                                                                                          2.0s
   ⠿ a2e16c7047f9 Pull complete                                                                                                                                                                                                                          5.5s
   ⠿ a96a052cd33c Pull complete                                                                                                                                                                                                                          8.4s
   ⠿ e008aa9ce341 Pull complete                                                                                                                                                                                                                          8.7s
   ⠿ 4691b2e44244 Pull complete                                                                                                                                                                                                                          8.9s
   ⠿ 9661a7a702f8 Pull complete                                                                                                                                                                                                                          9.2s
   ⠿ 586ae72743cc Pull complete                                                                                                                                                                                                                          9.5s
   ⠿ 21cdbc93d370 Pull complete                                                                                                                                                                                                                          9.7s
   ⠿ 9285c2ca8bb2 Pull complete                                                                                                                                                                                                                          9.9s
   ⠿ 75dc67f7e388 Pull complete                                                                                                                                                                                                                         10.1s
   ⠿ 3dfa2b58407d Pull complete                                                                                                                                                                                                                         10.4s
 ⠿ setup-prometheus-pump Error                                                                                                                                                                                                                           1.5s
Sending build context to Docker daemon  118.1kB
Step 1/2 : FROM alpine:latest
latest: Pulling from library/alpine
213ec9aee27d: Pull complete 
Digest: sha256:bc41182d7ef5ffc53a40b044e725193bc10142a1243f395ee852a8d9730fc2ad
Status: Downloaded newer image for alpine:latest
 ---> 9c6f07244728
Step 2/2 : RUN apk --no-cache add curl jq uuidgen
 ---> Running in ab8e032784f2
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/community/x86_64/APKINDEX.tar.gz
(1/9) Installing ca-certificates (20220614-r0)
(2/9) Installing brotli-libs (1.0.9-r6)
(3/9) Installing nghttp2-libs (1.47.0-r0)
(4/9) Installing libcurl (7.83.1-r2)
(5/9) Installing curl (7.83.1-r2)
(6/9) Installing oniguruma (6.9.8-r0)
(7/9) Installing jq (1.6-r1)
(8/9) Installing libuuid (2.38-r1)
(9/9) Installing uuidgen (2.38-r1)
Executing busybox-1.35.0-r17.trigger
Executing ca-certificates-20220614-r0.trigger
OK: 9 MiB in 23 packages
Removing intermediate container ab8e032784f2
 ---> 0920c4b0ec5a
[Warning] One or more build-args [GROUPNAME GROUP_ID USERNAME USER_ID] were not consumed
Successfully built 0920c4b0ec5a
Successfully tagged idrac-telemetry-reference-tools/setupprometheus:latest
Sending build context to Docker daemon  118.1kB
Step 1/2 : FROM alpine:latest
 ---> 9c6f07244728
Step 2/2 : RUN apk --no-cache add curl jq uuidgen
 ---> Using cache
 ---> 0920c4b0ec5a
[Warning] One or more build-args [GROUPNAME GROUP_ID USERNAME USER_ID] were not consumed
Successfully built 0920c4b0ec5a
Successfully tagged idrac-telemetry-reference-tools/setup:latest

Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
[+] Running 7/7
 ⠿ Network idrac-telemetry-reference-tools_host-bridge-net                      Created                                                                                                                                                                  0.1s
 ⠿ Volume "idrac-telemetry-reference-tools_grafana-storage"                     Created                                                                                                                                                                  0.0s
 ⠿ Volume "idrac-telemetry-reference-tools_prometheus-data"                     Created                                                                                                                                                                  0.0s
 ⠿ Container prometheus                                                         Started                                                                                                                                                                  0.8s
 ⠿ Container telemetry-reference-tools-grafana                                  Started                                                                                                                                                                  1.5s
 ⠿ Container idrac-telemetry-reference-tools-grafana-setup-influx-datasource-1  Started                                                                                                                                                                  2.4s
 ⠿ Container idrac-telemetry-reference-tools-setup-prometheus-pump-1            Started                                                                                                                                                                  1.7s
Waiting for grafana container setup Prometheus DATA_SOURCE & DASHBOARD to finish
Waiting for grafana container setup Prometheus DATA_SOURCE & DASHBOARD to finish
Waiting for grafana container setup Prometheus DATA_SOURCE & DASHBOARD to finish
Waiting for grafana container setup Prometheus DATA_SOURCE & DASHBOARD to finish

I created the DATA_SOURCE manually, but that did not stop the message flooding (see the attached image).

About my server:

ubuntu@ip-172-27-1-25:~/iDRAC-Telemetry-Reference-Tools$ docker version
Client: Docker Engine - Community
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:02:57 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:01:03 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.7
  GitCommit:        0197261a30bf81f1ee8e6a4dd2dea0ef95d67ccb
 runc:
  Version:          1.1.3
  GitCommit:        v1.1.3-0-g6724737
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

ubuntu@ip-172-27-1-25:~/iDRAC-Telemetry-Reference-Tools$ docker-compose -v
Docker Compose version v2.2.3

ubuntu@ip-172-27-1-25:~/iDRAC-Telemetry-Reference-Tools$ uname -a
Linux ip-172-27-1-25 5.13.0-1029-aws #32~20.04.1-Ubuntu SMP Thu Jun 9 13:03:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

ubuntu@ip-172-27-1-25:~/iDRAC-Telemetry-Reference-Tools$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.4 LTS"

ubuntu@ip-172-27-1-25:~/iDRAC-Telemetry-Reference-Tools$ docker ps -a
CONTAINER ID   IMAGE                                                    COMMAND                  CREATED          STATUS                      PORTS                                       NAMES
86d83075dc3f   idrac-telemetry-reference-tools/setup:latest             "/extrabin/initializ…"   13 seconds ago   Up 11 seconds                                                           idrac-telemetry-reference-tools-grafana-setup-influx-datasource-1
e3d2a0cd46be   idrac-telemetry-reference-tools/setupprometheus:latest   "/bin/sh"                14 seconds ago   Exited (0) 11 seconds ago                                               idrac-telemetry-reference-tools-setup-prometheus-pump-1
311131208d18   grafana/grafana:9.0.1                                    "/run.sh"                14 seconds ago   Up 12 seconds               0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   telemetry-reference-tools-grafana
5f207b262c4f   prom/prometheus:v2.36.0                                  "/bin/prometheus --c…"   14 seconds ago   Up 12 seconds               0.0.0.0:9090->9090/tcp, :::9090->9090/tcp   prometheus

ubuntu@ip-172-27-1-25:~/iDRAC-Telemetry-Reference-Tools$ docker logs idrac-telemetry-reference-tools-setup-prometheus-pump-1
ubuntu@ip-172-27-1-25:~/iDRAC-Telemetry-Reference-Tools$ docker logs telemetry-reference-tools-grafana
✔ Downloaded grafana-polystat-panel v1.2.11 zip successfully

Please restart Grafana after installing plugins. Refer to Grafana documentation for instructions if necessary.

logger=settings t=2022-08-23T13:54:45.48605418Z level=info msg="Starting Grafana" version=9.0.1 commit=14e988bd22 branch=HEAD compiled=2022-06-21T13:43:01Z
logger=settings t=2022-08-23T13:54:45.486480809Z level=info msg="Config loaded from" file=/usr/share/grafana/conf/defaults.ini
logger=settings t=2022-08-23T13:54:45.48658704Z level=info msg="Config loaded from" file=/etc/grafana/grafana.ini
logger=settings t=2022-08-23T13:54:45.486656115Z level=info msg="Config overridden from command line" arg="default.paths.data=/var/lib/grafana"
logger=settings t=2022-08-23T13:54:45.486705388Z level=info msg="Config overridden from command line" arg="default.paths.logs=/var/log/grafana"
logger=settings t=2022-08-23T13:54:45.486772204Z level=info msg="Config overridden from command line" arg="default.paths.plugins=/var/lib/grafana/plugins"
logger=settings t=2022-08-23T13:54:45.486825175Z level=info msg="Config overridden from command line" arg="default.paths.provisioning=/etc/grafana/provisioning"
logger=settings t=2022-08-23T13:54:45.486875607Z level=info msg="Config overridden from command line" arg="default.log.mode=console"
logger=settings t=2022-08-23T13:54:45.486934365Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_DATA=/var/lib/grafana"
logger=settings t=2022-08-23T13:54:45.486981209Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_LOGS=/var/log/grafana"
logger=settings t=2022-08-23T13:54:45.48701263Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_PLUGINS=/var/lib/grafana/plugins"
logger=settings t=2022-08-23T13:54:45.487060119Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_PROVISIONING=/etc/grafana/provisioning"
logger=settings t=2022-08-23T13:54:45.487094841Z level=info msg="Path Home" path=/usr/share/grafana
logger=settings t=2022-08-23T13:54:45.487139563Z level=info msg="Path Data" path=/var/lib/grafana
logger=settings t=2022-08-23T13:54:45.487179483Z level=info msg="Path Logs" path=/var/log/grafana
logger=settings t=2022-08-23T13:54:45.487209926Z level=info msg="Path Plugins" path=/var/lib/grafana/plugins
logger=settings t=2022-08-23T13:54:45.487251395Z level=info msg="Path Provisioning" path=/etc/grafana/provisioning
logger=settings t=2022-08-23T13:54:45.487291383Z level=info msg="App mode production"
logger=sqlstore t=2022-08-23T13:54:45.487398521Z level=info msg="Connecting to DB" dbtype=sqlite3
logger=migrator t=2022-08-23T13:54:45.509445413Z level=info msg="Starting DB migrations"
logger=migrator t=2022-08-23T13:54:45.515163292Z level=info msg="migrations completed" performed=0 skipped=425 duration=723.447µs
logger=plugin.manager t=2022-08-23T13:54:45.556576958Z level=info msg="Plugin registered" pluginId=input
logger=plugin.manager t=2022-08-23T13:54:45.585589318Z level=info msg="Plugin registered" pluginId=grafana-polystat-panel
logger=secrets t=2022-08-23T13:54:45.586282279Z level=info msg="Envelope encryption state" enabled=true currentprovider=secretKey.v1
logger=query_data t=2022-08-23T13:54:45.593731982Z level=info msg="Query Service initialization"
logger=live.push_http t=2022-08-23T13:54:45.601840938Z level=info msg="Live Push Gateway initialization"
logger=infra.usagestats.collector t=2022-08-23T13:54:45.756288751Z level=info msg="registering usage stat providers" usageStatsProvidersLen=2
logger=provisioning.datasources t=2022-08-23T13:54:45.756705794Z level=error msg="can't read datasource provisioning files from directory" path=/etc/grafana/provisioning/datasources error="open /etc/grafana/provisioning/datasources: no such file or directory"
logger=provisioning.plugins t=2022-08-23T13:54:45.756848604Z level=error msg="Failed to read plugin provisioning files from directory" path=/etc/grafana/provisioning/plugins error="open /etc/grafana/provisioning/plugins: no such file or directory"
logger=provisioning.notifiers t=2022-08-23T13:54:45.756922518Z level=error msg="Can't read alert notification provisioning files from directory" path=/etc/grafana/provisioning/notifiers error="open /etc/grafana/provisioning/notifiers: no such file or directory"
logger=http.server t=2022-08-23T13:54:45.760591432Z level=info msg="HTTP Server Listen" address=[::]:3000 protocol=http subUrl= socket=
logger=ngalert t=2022-08-23T13:54:45.760793116Z level=info msg="warming cache for startup"
logger=ngalert.multiorg.alertmanager t=2022-08-23T13:54:45.761029529Z level=info msg="starting MultiOrg Alertmanager"
logger=provisioning.dashboard t=2022-08-23T13:54:45.761397896Z level=error msg="can't read dashboard provisioning files from directory" path=/etc/grafana/provisioning/dashboards error="open /etc/grafana/provisioning/dashboards: no such file or directory"
logger=grafanaStorageLogger t=2022-08-23T13:54:45.764935956Z level=info msg="storage starting"
logger=context traceID=00000000000000000000000000000000 userId=1 orgId=1 uname=admin t=2022-08-23T13:55:44.681988313Z level=info msg="Request Completed" method=GET path=/api/live/ws status=0 remote_addr=185.119.0.165 time_ms=1 duration=1.295208ms size=0 referer= traceID=00000000000000000000000000000000
logger=live t=2022-08-23T13:57:06.172551959Z level=info msg="Initialized channel handler" channel=grafana/dashboard/uid/Cl2GHHW4z address=grafana/dashboard/uid/Cl2GHHW4z
ubuntu@ip-172-27-1-25:~/iDRAC-Telemetry-Reference-Tools$ docker logs prometheus
ts=2022-08-23T13:54:44.381Z caller=main.go:491 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2022-08-23T13:54:44.383Z caller=main.go:535 level=info msg="Starting Prometheus Server" mode=server version="(version=2.36.0, branch=HEAD, revision=d48f381d9a4e68c83283ce5233844807dfdc5ba5)"
ts=2022-08-23T13:54:44.384Z caller=main.go:540 level=info build_context="(go=go1.18.2, user=root@b3126bd1c115, date=20220530-13:56:56)"
ts=2022-08-23T13:54:44.384Z caller=main.go:541 level=info host_details="(Linux 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 ba0b8a950a3e (none))"
ts=2022-08-23T13:54:44.384Z caller=main.go:542 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2022-08-23T13:54:44.384Z caller=main.go:543 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2022-08-23T13:54:44.386Z caller=web.go:553 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2022-08-23T13:54:44.387Z caller=main.go:972 level=info msg="Starting TSDB ..."
ts=2022-08-23T13:54:44.407Z caller=tls_config.go:195 level=info component=web msg="TLS is disabled." http2=false
ts=2022-08-23T13:54:44.408Z caller=head.go:493 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2022-08-23T13:54:44.408Z caller=head.go:536 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=2.47µs
ts=2022-08-23T13:54:44.408Z caller=head.go:542 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2022-08-23T13:54:44.809Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=1
ts=2022-08-23T13:54:44.809Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=1 maxSegment=1
ts=2022-08-23T13:54:44.809Z caller=head.go:619 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=1.567329ms wal_replay_duration=398.947324ms total_replay_duration=400.539787ms
ts=2022-08-23T13:54:44.810Z caller=main.go:993 level=info fs_type=EXT4_SUPER_MAGIC
ts=2022-08-23T13:54:44.810Z caller=main.go:996 level=info msg="TSDB started"
ts=2022-08-23T13:54:44.810Z caller=main.go:1177 level=info msg="Loading configuration file" filename=/config/prometheus.yml
ts=2022-08-23T13:54:44.811Z caller=main.go:1214 level=info msg="Completed loading of configuration file" filename=/config/prometheus.yml totalDuration=655.084µs db_storage=1.267µs remote_storage=2.248µs web_handler=672ns query_engine=997ns scrape=275.221µs scrape_sd=41.265µs notify=833ns notify_sd=3.344µs rules=1.734µs tracing=3.757µs
ts=2022-08-23T13:54:44.811Z caller=main.go:957 level=info msg="Server is ready to receive web requests."
ts=2022-08-23T13:54:44.811Z caller=manager.go:937 level=info component="rule manager" msg="Starting rule manager..."


Expected behavior: the configuration process (./compose.sh setup --prometheus-test-db) completes successfully, after which the pipeline can be started with ./compose.sh start.
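
For reference, a minimal sketch of the intended workflow under the default options; the docker ps check at the end is just one way to confirm the containers came up and is not part of compose.sh itself:

```bash
# Expected end-to-end sequence for the Prometheus test database path.
./compose.sh setup --prometheus-test-db        # build and configure the stack
./compose.sh start                             # start the containers
docker ps --format '{{.Names}}\t{{.Status}}'   # verify grafana/prometheus containers are up
```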

GetSensorThresholds.py - Specifying example metric property fails

Using the example code [screenshot of the example GetSensorThresholds.py invocation in the original issue] and running it as-is gives:

Thresholds for mp:'/redfish/v1/Systems/System.Embedded.1/Oem/Dell/DellNumericSensors/iDRAC.Embedded.1_0x23_SystemBoardSYSUsage' are below:
FAIL, status code for reading attributes is not 200, code is: 404

Tested against a PowerEdge R840 with iDRAC 4.32.15.00.
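
One way to debug the 404 is to enumerate the parent DellNumericSensors collection first, since the member IDs (such as iDRAC.Embedded.1_0x23_SystemBoardSYSUsage) vary by platform and firmware. The collection path below is taken from the failing URI above; credentials and the pretty-printing/grep step are placeholders, not part of GetSensorThresholds.py:

```bash
# List the numeric-sensor URIs that actually exist on the target iDRAC,
# then pass one of them to GetSensorThresholds.py.
curl -s -k -u <user>:<password> -X GET \
  https://<iDRAC>/redfish/v1/Systems/System.Embedded.1/Oem/Dell/DellNumericSensors \
  | python3 -m json.tool | grep '@odata.id'
```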

compose.sh

When using compose.sh with the --splunk-pump option I get the following:

Creating network "idrac-telemetry-reference-tools_host-bridge-net" with driver "bridge"
Creating volume "idrac-telemetry-reference-tools_influxdb-storage" with default driver
Creating volume "idrac-telemetry-reference-tools_grafana-storage" with default driver
Creating volume "idrac-telemetry-reference-tools_prometheus-data" with default driver
Creating volume "idrac-telemetry-reference-tools_mysqldb-volume" with default driver
ERROR: Service "activemq" was pulled in as a dependency of service "splunk-pump-standalone" but is not enabled by the active profiles. You may fix this by adding a common profile to "activemq" and "splunk-pump-standalone".
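
As the error message suggests, the fix is to give activemq a profile that is also active whenever splunk-pump-standalone is. The fragment below is an illustrative sketch only; the image, build path, and profile names are assumptions, not taken from the repository's actual docker-compose.yml:

```yaml
# Hypothetical docker-compose.yml fragment showing a shared profile.
services:
  activemq:
    image: rmohr/activemq                   # placeholder image
    profiles: ["standard", "splunk-pump"]   # share a profile with the pump service
  splunk-pump-standalone:
    build: cmd/splunkpump                   # placeholder build path
    profiles: ["splunk-pump"]
    depends_on:
      - activemq
```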

Because files are mounted into the containers, mounting from a filesystem whose files use CRLF line endings will cause failures

root@e0f6acee0d14:/go/src/github.com/telemetry-reference-tools# cmd/idrac-telemetry-receiver.sh
cmd/idrac-telemetry-receiver.sh: line 1: $'\r': command not found
cmd/idrac-telemetry-receiver.sh: line 2: $'\r': command not found
cmd/idrac-telemetry-receiver.sh: line 3: $'\r': command not found

In my case I was developing from Windows, synced to a remote Linux system. Because the code is bind-mounted instead of being copied into the containers, the scripts fail if the editor saves them with CRLF line endings.

We should ensure that git saves all files with Unix-style line endings.

If we choose to include a startup script, it should check that all required files use only LF instead of CRLF; a sketch of both ideas follows.
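
A minimal sketch of both suggestions, assuming shell scripts are the main concern (the file pattern and messages are illustrative):

```bash
# Enforce LF on checkout for shell scripts via .gitattributes.
printf '*.sh text eol=lf\n' >> .gitattributes

# Pre-flight check a startup script could run: fail if any shell script
# still contains a carriage return.
bad_files=$(grep -rlI $'\r' --include='*.sh' . || true)
if [ -n "$bad_files" ]; then
  echo "CRLF line endings detected in:" >&2
  echo "$bad_files" >&2
  echo "Convert with: dos2unix <file>" >&2
  exit 1
fi
```

On the Windows side, setting git config core.autocrlf to input (or running dos2unix on the affected files) is another way to keep the mounted copies LF-only.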
