Java Application Monitoring and Troubleshooting Basics
4. Java Application as a Runtime White Box: running the app, JVM and application monitoring, troubleshooting, fault analysis, and tuning. 24 hrs / 3 days.
Software at student's developer station
Network access from student stations to emulation of prod host
Network Access from student stations and prod host
* starred items and checked checklist items are optional
Training introduction and focus (15m)
Hands-on: Teams and their demands (15m)
Java Platform crash course (2h)
What does any application do?
System as a Public Service metaphor
How do we model the data?
How do we model the behavior?
Where is data stored? Core data scopes
| Concept | Metaphor | Implementation |
| --- | --- | --- |
| Local/method/stack variables | Short-term memory: the chef remembers the sugar dose only while adding sugar | Call stack |
| Parameters | Details given when asking others to do some work: the waiter asks johnnyChief.makeMeal(whatMeals?) | Call stack |
| Object state | State of a worker or structure: its current property values | Heap object space |
| - Request scope | Object state accessible to all workers in the call chain handling a request: a sticky note or voice message passed from each worker to the next, "not spicy" | Parameters, framework support, ThreadLocal |
| - Session scope | Object state accessible to all workers handling all requests from the same visitor: "it's for table 13" | Framework support |
| - Singleton/application scope | Object state accessible to all workers | Framework support, language support for static variables |
| Persistent | Long-term data store that survives system restarts | File, embedded/local database, remote filesystem, remote database |
| Integration | Data stored and processed by an external system | Remote system procedure call, message queue |
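The in-memory scopes from the table can be sketched in a few lines of Java. This is only an illustration: the class `DataScopes`, the `Chef` worker, and the field names are hypothetical and not part of the course application. Session, persistent, and integration scopes are left out because they need a framework or an external system.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrates stack, heap, request and application scopes with hypothetical kitchen names. */
final class DataScopes {

    // Singleton/application scope: one static counter shared by all workers (threads).
    static final AtomicLong MEALS_SERVED = new AtomicLong();

    // Request scope: every thread that handles a request sees its own note.
    static final ThreadLocal<String> REQUEST_NOTE = ThreadLocal.withInitial(() -> "none");

    // Object state: lives in the heap for as long as the Chef instance is reachable.
    static final class Chef {
        private final String name;
        private int mealsCooked;

        Chef(String name) {
            this.name = name;
        }

        // 'meal' is a parameter: it lives on the call stack for the duration of the call.
        String makeMeal(String meal) {
            int sugarDose = meal.length() % 3;  // local variable: gone when the method returns
            mealsCooked++;                       // object state
            MEALS_SERVED.incrementAndGet();      // application scope
            return name + " cooked " + meal + " (sugar: " + sugarDose
                    + ", note: " + REQUEST_NOTE.get() + ")";
        }

        int mealsCooked() {
            return mealsCooked;
        }
    }

    // Attaches the note to the current thread for the duration of one request.
    static String handleRequest(Chef chef, String meal, String note) {
        REQUEST_NOTE.set(note);
        try {
            return chef.makeMeal(meal);
        } finally {
            REQUEST_NOTE.remove();  // do not leak the note into the next request on a pooled thread
        }
    }

    public static void main(String[] args) {
        Chef johnny = new Chef("johnny");
        System.out.println(handleRequest(johnny, "soup", "not spicy"));
        System.out.println("Meals served by all chefs: " + MEALS_SERVED.get());
    }
}
```

Clearing the `ThreadLocal` in `finally` matters once threads are pooled: a stale note would otherwise travel with the thread into an unrelated request.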
How do we implement an application with Java?
| Concept | Metaphor | Reality |
| --- | --- | --- |
| Runtime | If the Developer is the CEO who sets the application logic, the Runtime is your deputy | JVM API and system library API |
| Working with threads: Thread API, states, pooling | We can create a workforce on demand to execute our instructions | But each thread has a RAM and performance cost |
| Working with classes: dynamic class loading | Workers get their instructions just in time, not ahead of time, and remember them until they die | But loading has a run-time latency cost |
| Working with instances: creation and GC | We ask our deputy to hire and retire workers | Object state costs RAM. When an object is no longer needed, it is purged from RAM |
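The three runtime concerns above (threads, classes, instances) map onto standard JDK calls. The sketch below is a minimal illustration under stated assumptions: the class name `RuntimeBasics` and its method names are made up for this example, and the heap figure is only a rough estimate taken from `Runtime`.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class RuntimeBasics {

    /** Threads: a fixed pool bounds the RAM and scheduling cost of the workforce. */
    static int sumInPool(List<Integer> numbers, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Callable<Integer> task = () -> numbers.stream().mapToInt(Integer::intValue).sum();
            return pool.submit(task).get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();  // retire the workers, otherwise their threads keep the JVM alive
        }
    }

    /** Classes: looked up by name and loaded on first use, which adds latency once. */
    static Class<?> loadByName(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No such class: " + className, e);
        }
    }

    /** Instances: rough estimate of the heap currently occupied by objects. */
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        System.out.println("Sum computed by a pooled thread: " + sumInPool(List.of(1, 2, 3), 2));
        System.out.println("Loaded on demand: " + loadByName("java.util.ArrayList").getName());

        byte[] worker = new byte[8 * 1024 * 1024];  // "hire": the instance occupies heap
        System.out.println("Used heap with " + worker.length + " byte array: " + usedHeapBytes());
        worker = null;                              // "retire": unreachable, eligible for GC
        System.gc();                                // only a hint, the JVM may ignore it
        System.out.println("Used heap after GC hint: " + usedHeapBytes());
    }
}
```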
How do we build a Java application?
How do we run a Java application?
How do we monitor Java application internals?
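One way to look at JVM internals from inside the application is the standard `java.lang.management` API. The following is a minimal sketch, not the course's monitoring setup; the class name `JvmSnapshot` and the metric keys are assumptions chosen for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

final class JvmSnapshot {

    /** Collects a few basic JVM metrics from the platform MXBeans. */
    static Map<String, Long> collect() {
        Map<String, Long> metrics = new LinkedHashMap<>();

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("heap.used.bytes", heap.getUsed());
        metrics.put("heap.max.bytes", heap.getMax());  // -1 when the maximum is undefined

        metrics.put("threads.live", (long) ManagementFactory.getThreadMXBean().getThreadCount());
        metrics.put("classes.loaded",
                (long) ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());
        metrics.put("uptime.ms", ManagementFactory.getRuntimeMXBean().getUptime());

        long collections = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            collections += Math.max(0, gc.getCollectionCount());  // -1 means "not available"
        }
        metrics.put("gc.collections", collections);
        return metrics;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```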
Teamwork: NFRs and metrics checklist (15m)
Hands-on: Simple application local building, running and monitoring (30m)
cd
git clone https://github.com/{{ STUDENT_ACCOUNT }}/java-application-monitoring-and-troubleshooting
cd java-application-monitoring-and-troubleshooting
git checkout {{ group_custom_branch }}
mvn clean verify [-DskipTests]
java \
-Xms128m -Xmx256m \
-cp target/dbo-1.0-SNAPSHOT.jar \
-Dapp.property=value \
com.acme.dbo.Presentation \
program arguments
linux$ top [-p jvmpid]
windows> taskmgr
Then answered and reviewed at debrief
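To connect the launch command with what the process actually sees, a small class can print the values it was started with. This is a hedged sketch: `LaunchInfo` is a hypothetical class, not the `com.acme.dbo.Presentation` entry point, and `ProcessHandle` assumes Java 9 or later.

```java
final class LaunchInfo {

    /** Describes how the current JVM was launched: pid, -Xmx limit, a -D property, program arguments. */
    static String describe(String[] args) {
        long pid = ProcessHandle.current().pid();                           // the id to pass to top -p
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);  // roughly the -Xmx value
        String property = System.getProperty("app.property", "<not set>");  // set with -Dapp.property=value
        return "pid=" + pid
                + " maxHeapMb=" + maxHeapMb
                + " app.property=" + property
                + " args=" + String.join(" ", args);
    }

    public static void main(String[] args) {
        System.out.println(describe(args));
    }
}
```

Started with the same flags as the command above, the output shows the `-Dapp.property` value and the two program arguments; the reported maximum heap is usually close to, but not always exactly, the `-Xmx` setting.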
Modern application architecture and deployment: What tiers do we monitor? (1h)
Tier
Application Layers: UI/P, API/C, BL/S, DAL/R
Application caching
Thread Pool
JPA Caching
JPA subsystem
Connection Pools
JDBC subsystem
Framework configuration with profiles
Framework for Spring modules management
Framework for Web/SOAP/REST application expose
Framework for Application
Application Server/Servlet Container
JVM: application debug API
JVM: application profiling API
JVM: universal monitoring API
JVM: threads, IO
JVM: memory, GC
JVM: process
Container: Networking
Container: Core
Message queues
DBMS
OS: Threads
OS: Processes
Hardware: HDD/SSD
Hardware: RAM
Hardware: CPU
Teamwork: What metrics do we monitor for production app? (30m)
Monitoring architecture overview (30m)
pUML source
@startuml
node "dev station" as devstation {
[ssh terminal ] as terminal
[ansible playbook ] as ansible
[browser ]
[jmeter ]
ansible -> terminal
}
actor Ops as ops
ops --> ansible
ops --> terminal
ops --> browser
ops --> jmeter
node prod {
[jmeter agent ] as jmeter_agent
[node exporter ] as node_exporter
component [application ] {
[monitoring endpoint ] as monitor
}
component [prometheus ] {
database metrics_history
}
prometheus --> monitor
prometheus -> node_exporter
jmeter_agent -> application
node_exporter -> prod
interface port
monitor -( port
}
terminal --> prod
browser --> prometheus
browser --> application
jmeter --> jmeter_agent
@enduml
Monitoring overview and tools
Load generation architecture overview
Hands-on: Prod host and monitoring provisioning (15m)
jmeter -Jremote_hosts=127.0.0.1 -Dserver.rmi.ssl.disable=true
JMeter → Options → Log Viewer
JMeter → Run → Remote Start → 127.0.0.1
Modern application architecture and deployment: How do we monitor tiers? (1h)
| Tier | Implementation | Tools |
| --- | --- | --- |
| Application Layers | PWA or Server-side Template Engine, Spring @Controllers, @Services, Spring Data JPA @Repositories | Spring Metrics for Counters, Timers, Long Task Timers, Statistics |
| Application caching | spring-boot-starter-cache module + built-in default Simple cache provider | Spring Metrics for Caches |
| Thread Pool | Java built-in ExecutorService | Spring Metrics for DataSources |
| JPA subsystem and JPA Caching | Hibernate | service:jmx:// Hibernate built-in statistics |
| JDBC subsystem and Connection Pools | Derby JDBC driver + HikariCP | service:jmx://com.zaxxer.hikari, Spring Metrics for DataSources |
| Framework for modules management | Spring Boot | spring-boot-actuator + Built-in Micrometer + Prometheus Adapter |
| Framework for Application | Spring Core + Spring MVC (spring-boot-starter-web) | Spring Metrics for Web Instrumentation [for Prometheus], Core Micrometer [for Prometheus] |
| Application Server/Servlet Container | spring-boot-starter-tomcat | |
| JVM: application debug API | JPDA | jsadebugd |
| JVM: application profiling API | JVMTI | hprof |
| JVM: threads, IO | JVM scheduler, JNI | jstack |
| JVM: memory, GC | Built-in Garbage Collectors | jstat, jstatd, jmap, jhat |
| JVM: universal monitoring API | JMX | jvisualvm |
| JVM: process | Oracle/OpenJDK JRE | jps, jcmd, jinfo |
| Containers | Docker | docker cli, docker api for Prometheus, Prometheus cAdvisor |
| Message queues | n/u | vendor tools, Prometheus exporters |
| DBMS | Apache Derby / PostgreSQL | vendor tools, Prometheus pg_exporter, pg explain, pg analyse |
| OS | Linux | ps, top |
| Hardware | x86 | df, free, SNMP, Prometheus Node Exporter |
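The "JVM: universal monitoring API" row can be illustrated with the generic JMX calls that tools such as jvisualvm rely on: look up an MBean by its `ObjectName` and read an attribute without knowing its Java type in advance. The sketch below reads from the local platform MBean server; the class name `JmxRead` is an assumption for this example. A remote client would issue the same calls through an `MBeanServerConnection`.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

final class JmxRead {

    /** Reads the used heap size through the generic JMX attribute API. */
    static long heapUsedBytes() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            // HeapMemoryUsage is exposed as an open-type CompositeData with a "used" item.
            CompositeData usage = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (JMException e) {
            throw new IllegalStateException("Cannot read heap usage over JMX", e);
        }
    }

    /** Lists the canonical names of registered MBeans matching a pattern such as "java.lang:*". */
    static Set<String> mbeanNames(String pattern) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            Set<String> names = new TreeSet<>();
            for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
                names.add(name.getCanonicalName());
            }
            return names;
        } catch (JMException e) {
            throw new IllegalArgumentException("Bad ObjectName pattern: " + pattern, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Heap used (bytes): " + heapUsedBytes());
        mbeanNames("java.lang:*").forEach(System.out::println);
    }
}
```

Libraries that publish their own MBeans appear under their own domains, so the same `mbeanNames` query with a different pattern is a quick way to see what an application exposes.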
Hands-on: Modern application remote building, running and monitoring (30m)
ssh -p {{ ansible_port }} {{ ansible_user }}@{{ prod }}
cd /opt
git clone --branch master --depth 1 https://github.com/{{ STUDENT_ACCOUNT }}/agile-practices-application
cd agile-practices-application
mvn clean verify [-DskipTests]
cd /opt/agile-practices-application
rm -rf dbo-db
nohup \
java \
-Xms128m -Xmx128m \
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=heapdump.hprof \
-Dderby.stream.error.file=log/derby.log \
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.rmi.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=0.0.0.0 \
-jar target/dbo-1.0-SNAPSHOT.jar \
--spring.profiles.active=qa \
--server.port=8080 \
> /dev/null 2>&1 &
jmeter -n -t load.jmx -Jremote_hosts=127.0.0.1 -Dserver.rmi.ssl.disable=true
df -ah
free -m
docker images -a
docker ps -a
ps -ef
ps -eaux --forest
ps -eT | grep <pid>
top
top + 'f'
top -p <pid>
top -H -p <pid>
jps [-lvm]
jcmd <pid> help
jcmd <pid> VM.uptime
jcmd <pid> VM.system_properties
jcmd <pid> VM.flags
http://{{ prod }}:8080/dbo/swagger-ui.html
http://{{ prod }}:8080/dbo/actuator/health
http://{{ prod }}:8080/dbo/actuator
http://{{ prod }}:8080/dbo/actuator/prometheus
http://{{ prod }}:9090/alerts
http://{{ prod }}:9090/graph
http://{{ prod }}:9090/graph?g0.range_input=15m&g0.tab=0&g0.expr=http_server_requests_seconds_count
curl --request POST http://{{ prod }}:8080/dbo/actuator/shutdown
rm -rf dbo-db
Then answered and reviewed at debrief
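The health URL above can also be polled from code, for example by a smoke test after deployment. This is a minimal sketch that assumes Java 11+ (`java.net.http`) and the default Spring Boot Actuator response body `{"status":"UP"}`; the class `HealthProbe` and its methods are hypothetical, and the string check is deliberately naive where a real client would parse the JSON.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class HealthProbe {

    /** Decides from status code and body whether the application reports itself as UP. */
    static boolean reportsUp(int statusCode, String body) {
        // Naive check: strip spaces and look for the status field instead of parsing JSON.
        return statusCode == 200 && body.replace(" ", "").contains("\"status\":\"UP\"");
    }

    /** Calls the health endpoint; any connection problem or timeout counts as "not up". */
    static boolean isUp(String healthUrl) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(healthUrl))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return reportsUp(response.statusCode(), response.body());
        } catch (IOException e) {
            return false;  // refused connection, timeout, broken stream
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the real URL as the first argument; the default host here is a placeholder.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/dbo/actuator/health";
        System.out.println(url + " up? " + isUp(url));
    }
}
```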
Typical JVM memory issues (3)
Heap dumps and key memory metrics
Typical issues and resolution
jcmd <pid> GC.heap_dump /tmp/dump.hprof
jmap -dump:live,format=b,file=/tmp/dump.hprof <pid>
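A heap dump comparable to the two commands above can also be triggered from inside the JVM. The sketch below assumes a HotSpot-based JVM, because `com.sun.management.HotSpotDiagnosticMXBean` is a JDK-specific API rather than part of the Java SE standard; the class name `HeapDumper` is made up for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

final class HeapDumper {

    /**
     * Writes a heap dump of live (reachable) objects to the given file.
     * The file must not exist yet, and recent JDKs require the name to end with ".hprof".
     */
    static Path dumpLiveObjects(Path target) {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            diagnostics.dumpHeap(target.toString(), true);  // true = only live objects
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException("Heap dump failed: " + target, e);
        }
    }

    public static void main(String[] args) {
        Path target = Path.of(args.length > 0 ? args[0] : "/tmp/dump.hprof");
        System.out.println("Heap dump written to " + dumpLiveObjects(target));
    }
}
```

Dumping only live objects forces a full GC first, so the call pauses the application; on a production host that trade-off deserves a deliberate decision.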
Typical JVM threading issues (3)
JVM threading architecture
Application threading architecture
Typical issues and resolution
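A deadlock is one of the classic threading issues, and the JVM can report it the same way jstack does. The sketch below deliberately creates a deadlock between two daemon threads and then detects it with `ThreadMXBean`; the class `DeadlockDemo` and the thread names `worker-a` and `worker-b` are hypothetical and only serve this illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class DeadlockDemo {

    /** Starts two daemon threads that take the same two locks in opposite order. */
    static void startDeadlock() {
        Object forks = new Object();
        Object knives = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        spawn("worker-a", forks, knives, bothHoldFirstLock);
        spawn("worker-b", knives, forks, bothHoldFirstLock);
    }

    private static void spawn(String name, Object first, Object second, CountDownLatch latch) {
        Thread thread = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await();  // wait until the other thread holds its first lock too
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread owns 'second' and waits for 'first'
                }
            }
        }, name);
        thread.setDaemon(true);  // daemon threads do not block JVM exit
        thread.start();
    }

    /** Polls for threads deadlocked on object monitors; returns their sorted names, or an empty list. */
    static List<String> waitForDeadlock(long timeoutMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null) {
                List<String> names = new ArrayList<>();
                for (ThreadInfo info : threads.getThreadInfo(ids)) {
                    if (info != null) {
                        names.add(info.getThreadName());
                    }
                }
                Collections.sort(names);
                return names;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return List.of();
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        startDeadlock();
        System.out.println("Deadlocked threads: " + waitForDeadlock(5_000));
    }
}
```

The usual resolution is to acquire locks in one global order; if both workers took `forks` before `knives`, the cycle could not form.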
Typical JVM IO issues (3)
Non-blocking IO architecture
Typical data storage issues (3)
Typical JVM containerization issues (1)*
Containerization architecture
Typical caching issues (1.5)*
Generating application workload (1.5)*
Distributed logging (1.5)*
Intro to Java logging solutions
Distributed logging collection and processing
Distributed monitoring architecture
Typical RDBMS issues (1.5)*
How to deal with typical distributed system issues? (2.5)*
First Law of Distributed Objects
Microservices architecture patterns and trade-offs