You can see that we're spending almost all of our clock cycles on garbage collection, and that collections happen often. After profiling, the culprit turned out to be the summaries that wai-prometheus-middleware keeps for each request:
```
        Tue Jul  4 11:17 2017 Time and Allocation Profiling Report  (Final)

           warpmachine +RTS -N -T -p -RTS --proxy-host localhost --proxy-port 8000

        total time  =       17.73 secs   (17725 ticks @ 1000 us, 1 processor)
        total alloc = 10,233,152,016 bytes  (excludes profiling overheads)

COST CENTRE                            MODULE                           SRC                                                    %time %alloc

invariant.fj                           Prometheus.Metric.Summary        src/Prometheus/Metric/Summary.hs:(197,9)-(198,58)      17.9   13.1
wrapCompile.\                          Text.Regex.Posix.Wrap            Text/Regex/Posix/Wrap.hsc:(480,49)-(484,42)             7.9    0.0
throwSocketErrorIfMinus1RetryMayBlock  Network.Socket.Internal          Network/Socket/Internal.hsc:(161,1)-(162,53)            6.2    0.6
getAddrInfo.\.\.\.\                    Network.Socket                   Network/Socket.hsc:(1460,36)-(1470,43)                  5.4    0.0
connect.\.\.connectLoop                Network.Socket                   Network/Socket.hsc:(444,9)-(458,29)                     4.7    0.0
throwSocketErrorIfMinus1_              Network.Socket.Internal          Network/Socket/Internal.hsc:166:1-47                    3.0    0.0
invariant                              Prometheus.Metric.Summary        src/Prometheus/Metric/Summary.hs:(193,1)-(198,58)       2.4    6.0
wrapTest.\                             Text.Regex.Posix.Wrap            Text/Regex/Posix/Wrap.hsc:(489,45)-(495,40)             1.9    0.0
compress.compressPair.inv              Prometheus.Metric.Summary        src/Prometheus/Metric/Summary.hs:172:17-60              1.8    5.1
connect.\.\.connectBlocked             Network.Socket                   Network/Socket.hsc:(460,9)-(465,67)                     1.6    0.4
compress                               Prometheus.Metric.Summary        src/Prometheus/Metric/Summary.hs:(156,1)-(172,60)       1.3    3.9
decodeWithTable/fill                   Data.ByteString.Base64.Internal  Data/ByteString/Base64/Internal.hs:(171,62)-(194,68)    1.1    2.3
encode/fill                            Data.ByteString.Base64.Internal  Data/ByteString/Base64/Internal.hs:(61,55)-(69,52)      1.1    2.6
urlDecode                              Network.HTTP.Types.URI           Network/HTTP/Types/URI.hs:(209,1)-(228,34)              1.0    2.3
peek8_32                               Data.ByteString.Base64.Internal  Data/ByteString/Base64/Internal.hs:43:1-36              0.9    2.8
decodeWithTable.\.\.look               Data.ByteString.Base64.Internal  Data/ByteString/Base64/Internal.hs:(165,11)-(168,49)    0.8    2.8
asCString                              Text.Regex.Posix.ByteString      Text/Regex/Posix/ByteString.hs:(150,1)-(152,40)         0.5    1.1
memConstEqual.loop                     Data.Memory.PtrMethods           Data/Memory/PtrMethods.hs:(103,5)-(107,34)              0.5    1.7
toByteString                           Blaze.ByteString.Builder         Blaze/ByteString/Builder.hs:151:1-46                    0.3    1.4
urlDecode.go                           Network.HTTP.Types.URI           Network/HTTP/Types/URI.hs:(211,5)-(221,40)              0.2    1.5
toByteStringIOWith.getBuffer           Blaze.ByteString.Builder         Blaze/ByteString/Builder.hs:(187,5)-(204,40)            0.2   10.8
toByteStringIOWith                     Blaze.ByteString.Builder         Blaze/ByteString/Builder.hs:(184,1)-(204,40)            0.2   10.7
recv                                   Network.Socket.ByteString        Network/Socket/ByteString.hsc:(226,1)-(231,62)          0.1    3.3
```
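The top cost centres, `invariant` and `compress`, all live in `Prometheus.Metric.Summary`: the quantile bookkeeping that a summary metric performs for every observation. For context, below is a minimal sketch of how such a middleware is typically wired into a Warp application, assuming an API like Hackage's `wai-middleware-prometheus`; the handler is a placeholder, not our actual application.

```haskell
-- A minimal sketch, assuming an API like Hackage's wai-middleware-prometheus;
-- names and the handler are illustrative, not the code from this post.
{-# LANGUAGE OverloadedStrings #-}

import Data.Default (def)
import Network.HTTP.Types (status200)
import Network.Wai (Application, responseLBS)
import Network.Wai.Handler.Warp (run)
import qualified Network.Wai.Middleware.Prometheus as Prom

-- A trivial placeholder application.
app :: Application
app _request respond = respond (responseLBS status200 [] "ok")

-- Wrapping `app` in the middleware times every request and records the
-- duration in a metric; the profile above shows that per-observation
-- bookkeeping happening in Prometheus.Metric.Summary.
main :: IO ()
main = run 8080 (Prom.prometheus def app)
```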
We commented out the middleware in our application, and instead of using 400% CPU we're back to 2% CPU, with the GC contribution down to only 0.02 instead of 1.0. Also see the graph below: on the left is before removing the middleware, on the right is after removing it.
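As an aside on where that GC number comes from: the `-T` flag in the RTS options above makes the runtime's statistics available to the program itself, so the GC share of CPU time can be computed and exported as a metric. Below is a minimal sketch of that computation, assuming GHC 8.2 or later where `GHC.Stats` exposes `getRTSStats`; the function name is ours, not from this post.

```haskell
-- A minimal sketch, assuming GHC >= 8.2 (GHC.Stats with getRTSStats).
-- The program must be started with +RTS -T, as in the command line above,
-- or the runtime will not collect these statistics.
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Fraction of CPU time spent in the garbage collector, between 0.0 and 1.0.
gcCpuFraction :: IO (Maybe Double)
gcCpuFraction = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then pure Nothing -- started without +RTS -T
    else do
      stats <- getRTSStats
      pure (Just (fromIntegral (gc_cpu_ns stats) / fromIntegral (cpu_ns stats)))
```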