Comments (7)
Related Stack Overflow answer: https://stackoverflow.com/questions/32123475/profiling-builds-with-stack
TL;DR: use stack build --profile to build the system, then run it with stack exec -- <path-to-penrose-binary> +RTS -p
It seems that once we run stack build --profile, going back to the original stack build is quite costly, because the libraries have to be rebuilt, which, as we all know, takes 20 minutes.
- TODO: find a way to build libs with profiling options separately
from penrose.
Profile result for a circle + a label + contains
Sun Jul 22 11:36 2018 Time and Allocation Profiling Report (Final)
penrose +RTS -p -RTS sub/oneset.sub sty/venn.sty dsll/setTheory.dsl
total time = 2.35 secs (2354 ticks @ 1000 us, 1 processor)
total alloc = 1,224,297,568 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
shapeDefs ShapeDef src/ShapeDef.hs:(105,1)-(106,45) 3.3 3.7
evalProperty.(...) NewStyle src/NewStyle.hs:1411:13-52 3.0 15.5
lookupProperty NewStyle src/NewStyle.hs:(1528,1)-(1535,27) 2.8 0.4
lookupField NewStyle src/NewStyle.hs:(1516,1)-(1525,28) 2.6 1.0
penalty NewFunctions src/NewFunctions.hs:(168,1)-(169,23) 2.5 1.3
addProperty.properties' NewStyle src/NewStyle.hs:1051:16-63 2.0 1.4
evalProperty NewStyle src/NewStyle.hs:(1409,1)-(1412,64) 2.0 4.4
evalExpr.argResult NewStyle src/NewStyle.hs:(1441,11)-(1512,97) 1.9 1.8
constrFuncDict NewFunctions src/NewFunctions.hs:(174,1)-(186,13) 1.8 1.5
initProperty NewStyle src/NewStyle.hs:(1686,1)-(1694,68) 1.8 3.7
.: ShapeDef src/ShapeDef.hs:462:1-10 1.7 0.8
addProperty NewStyle src/NewStyle.hs:(1027,1)-(1054,76) 1.7 2.5
evalGPI_withUpdate.trans' NewStyle src/NewStyle.hs:1418:13-87 1.4 8.1
pPrint Text.Show.Pretty Text/Show/Pretty.hs:71:1-26 1.3 0.1
evalExpr NewStyle src/NewStyle.hs:(1438,1)-(1512,97) 1.3 4.4
checkDeclaredType Env src/Env.hs:(319,1)-(322,118) 1.2 0.0
happyDoAction Text.Show.Parser templates/GenericTemplate.hs:(111,1)-(136,60) 1.1 0.3
ppCon Text.Show.Pretty Text/Show/Pretty.hs:(132,1)-(133,51) 1.1 2.1
binary Numeric.AD.Internal.Reverse src/Numeric/AD/Internal/Reverse.hs:(182,3)-(191,89) 1.1 0.8
textType ShapeDef src/ShapeDef.hs:(250,1)-(261,6) 1.1 1.5
circType ShapeDef src/ShapeDef.hs:(222,1)-(234,6) 1.1 1.2
addProperty.fieldDict' NewStyle src/NewStyle.hs:1052:16-77 1.1 1.0
shapes2vals.lookupPath NewStyle src/NewStyle.hs:(1621,9)-(1625,46) 1.1 0.6
r2f Utils src/Utils.hs:65:1-16 1.0 0.2
block Text.Show.Pretty Text/Show/Pretty.hs:(150,1)-(152,63) 0.9 1.6
shapeDefs.zipWithKey ShapeDef src/ShapeDef.hs:106:11-45 0.9 2.2
toPolymorphic Server src/Server.hs:156:1-74 0.8 1.4
evalExprs.evalExprF.(...) NewStyle src/NewStyle.hs:1543:28-71 0.6 1.4
shapeExprsToVals.properties' NewStyle src/NewStyle.hs:1612:15-50 0.5 1.1
For nested.sub, the optimization seems to be very slow. Here is the profiling result:
Sun Jul 22 11:41 2018 Time and Allocation Profiling Report (Final)
penrose +RTS -p -RTS sub/nested.sub sty/venn.sty dsll/setTheory.dsl
total time = 30.60 secs (30596 ticks @ 1000 us, 1 processor)
total alloc = 23,942,023,920 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
.: ShapeDef src/ShapeDef.hs:462:1-10 6.5 2.9
lookupField NewStyle src/NewStyle.hs:(1516,1)-(1525,28) 5.4 1.1
evalProperty.(...) NewStyle src/NewStyle.hs:1411:13-52 4.5 17.8
lookupProperty NewStyle src/NewStyle.hs:(1528,1)-(1535,27) 3.8 0.4
predEq NewStyle src/NewStyle.hs:641:1-99 3.6 1.0
shapeDefs ShapeDef src/ShapeDef.hs:(105,1)-(106,45) 3.3 3.0
penalty NewFunctions src/NewFunctions.hs:(168,1)-(169,23) 3.3 1.3
initProperty NewStyle src/NewStyle.hs:(1686,1)-(1694,68) 2.8 3.0
evalExpr.argResult NewStyle src/NewStyle.hs:(1441,11)-(1512,97) 2.7 2.1
compare NewStyle src/NewStyle.hs:874:25-27 2.1 0.0
evalProperty NewStyle src/NewStyle.hs:(1409,1)-(1412,64) 2.1 5.1
evalGPI_withUpdate.trans' NewStyle src/NewStyle.hs:1418:13-87 2.0 9.2
addProperty.properties' NewStyle src/NewStyle.hs:1051:16-63 1.8 1.1
addProperty NewStyle src/NewStyle.hs:(1027,1)-(1054,76) 1.7 2.0
evalExpr NewStyle src/NewStyle.hs:(1438,1)-(1512,97) 1.4 4.9
findShape.\ ShapeDef src/ShapeDef.hs:401:24-45 1.4 0.0
textType ShapeDef src/ShapeDef.hs:(250,1)-(261,6) 1.2 1.2
shapes2vals.lookupPath NewStyle src/NewStyle.hs:(1621,9)-(1625,46) 1.1 0.5
toSubPred NewStyle src/NewStyle.hs:(460,1)-(462,73) 1.0 2.2
circType ShapeDef src/ShapeDef.hs:(222,1)-(234,6) 1.0 1.0
relMatchesLine NewStyle src/NewStyle.hs:(742,1)-(749,28) 1.0 2.8
evalFnArgs NewStyle src/NewStyle.hs:(1547,1)-(1551,37) 0.8 2.2
toPolymorphic Server src/Server.hs:156:1-74 0.8 1.2
shapeDefs.zipWithKey ShapeDef src/ShapeDef.hs:106:11-45 0.8 1.8
shapeExprsToVals.properties' NewStyle src/NewStyle.hs:1612:15-50 0.7 1.3
evalExprs.evalExprF.(...) NewStyle src/NewStyle.hs:1543:28-71 0.5 1.4
After INLINEing (.:) and penalty, I profiled nested.sub again (report below). I expected heavy usage of the ShapeDef util functions, but it seems that evaluation is what takes a long time here.
(UPDATE: adding INLINE may defeat the purpose, because -fprof-auto automatically leaves INLINE functions out of its cost centres...)
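On the UPDATE note: GHC's -fprof-auto only adds automatic cost centres to bindings that are not marked INLINE, so an inlined function drops out of the report unless it is annotated by hand. A minimal sketch of keeping such a function visible with a manual SCC pragma follows. Here scale is a hypothetical stand-in, not the real (.:) or penalty:

```haskell
-- Hypothetical stand-in for a small hot helper such as penalty.
-- INLINE keeps it out of -fprof-auto's automatic cost centres,
-- so the SCC pragma gives it a manual cost centre named "scale".
{-# INLINE scale #-}
scale :: Double -> Double -> Double
scale k x = {-# SCC "scale" #-} (k * x)

main :: IO ()
main = print (sum (map (scale 2) [1 .. 10]))
```

When the program is built without profiling, the SCC pragma is simply ignored, so the annotation can stay in the source.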
Sun Jul 22 13:28 2018 Time and Allocation Profiling Report (Final)
penrose +RTS -p -RTS sub/nested.sub sty/venn.sty dsll/setTheory.dsl
total time = 22.30 secs (22299 ticks @ 1000 us, 1 processor)
total alloc = 20,678,967,480 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
lookupField NewStyle src/NewStyle.hs:(1516,1)-(1525,28) 5.8 1.1
getName ShapeDef src/ShapeDef.hs:(474,1)-(476,64) 5.0 0.9
evalProperty.(...) NewStyle src/NewStyle.hs:1411:13-52 4.7 18.3
lookupProperty NewStyle src/NewStyle.hs:(1528,1)-(1535,27) 3.9 0.4
predEq NewStyle src/NewStyle.hs:641:1-99 3.9 1.1
shapeDefs ShapeDef src/ShapeDef.hs:(105,1)-(106,45) 3.5 3.1
evalExpr.argResult NewStyle src/NewStyle.hs:(1441,11)-(1512,97) 2.9 2.2
initProperty NewStyle src/NewStyle.hs:(1686,1)-(1694,68) 2.7 3.1
evalProperty NewStyle src/NewStyle.hs:(1409,1)-(1412,64) 2.5 5.2
compare NewStyle src/NewStyle.hs:874:25-27 2.3 0.0
evalGPI_withUpdate.trans' NewStyle src/NewStyle.hs:1418:13-87 2.2 9.5
addProperty NewStyle src/NewStyle.hs:(1027,1)-(1054,76) 2.0 2.1
addProperty.properties' NewStyle src/NewStyle.hs:1051:16-63 2.0 1.2
findShape.\ ShapeDef src/ShapeDef.hs:401:24-45 1.7 0.0
evalExpr NewStyle src/NewStyle.hs:(1438,1)-(1512,97) 1.4 5.1
shapes2vals.lookupPath NewStyle src/NewStyle.hs:(1621,9)-(1625,46) 1.3 0.5
toSubPred NewStyle src/NewStyle.hs:(460,1)-(462,73) 1.1 2.3
relMatchesLine NewStyle src/NewStyle.hs:(742,1)-(749,28) 1.1 2.9
textType ShapeDef src/ShapeDef.hs:(250,1)-(261,6) 1.1 1.2
addProperty.trn' NewStyle src/NewStyle.hs:1053:16-57 1.1 1.0
dist Utils src/Utils.hs:319:1-64 1.0 0.5
contains NewFunctions src/NewFunctions.hs:(379,1)-(400,79) 1.0 0.3
circType ShapeDef src/ShapeDef.hs:(222,1)-(234,6) 1.0 1.0
shapeDefs.zipWithKey ShapeDef src/ShapeDef.hs:106:11-45 0.9 1.9
evalFnArgs NewStyle src/NewStyle.hs:(1547,1)-(1551,37) 0.8 2.3
shapeExprsToVals.properties' NewStyle src/NewStyle.hs:1612:15-50 0.8 1.3
toPolymorphic Server src/Server.hs:156:1-74 0.7 1.2
evalExprs.evalExprF.(...) NewStyle src/NewStyle.hs:1543:28-71 0.6 1.5
toSubExpr NewStyle src/NewStyle.hs:(451,1)-(453,110) 0.5 1.0
toSubPredArg NewStyle src/NewStyle.hs:(456,1)-(457,49) 0.5 1.0
BTW, does that mean 20,678.97 MB (20,678,967,480 bytes) were used in memory?
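On the memory question: as far as I know, the "total alloc" line in a GHC profiling report counts every byte allocated over the whole run, most of which is garbage-collected soon afterwards, so it is not the amount of memory held at any one time. Running with +RTS -s should report the maximum residency instead. The unit conversion itself, as a small sketch:

```haskell
import Text.Printf (printf)

-- Total allocation copied from the report above, in bytes.
totalAllocBytes :: Integer
totalAllocBytes = 20678967480

-- Decimal megabytes (10^6 bytes).
toMB :: Integer -> Double
toMB b = fromIntegral b / 1.0e6

-- Binary gibibytes (2^30 bytes).
toGiB :: Integer -> Double
toGiB b = fromIntegral b / 2 ^ (30 :: Int)

main :: IO ()
main = do
  printf "%.2f MB\n" (toMB totalAllocBytes)   -- prints 20678.97 MB
  printf "%.2f GiB\n" (toGiB totalAllocBytes) -- prints 19.26 GiB
```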
Ran the old system on master and got the following result:
Sun Jul 22 17:02 2018 Time and Allocation Profiling Report (Final)
penrose +RTS -p -RTS src/sub/nested.sub src/sty/venn.sty src/dsll/setTheory.dsl
total time = 40.54 secs (40541 ticks @ 1000 us, 1 processor)
total alloc = 26,682,575,096 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
breakDelim Data.List.Split.Internals src/Data/List/Split/Internals.hs:(151,1)-(156,36) 7.1 24.6
breakDelim.(...) Data.List.Split.Internals src/Data/List/Split/Internals.hs:155:25-52 6.3 4.3
matchDelim Data.List.Split.Internals src/Data/List/Split/Internals.hs:(73,1)-(77,23) 5.5 4.0
splitInternal Data.List.Split.Internals src/Data/List/Split/Internals.hs:(139,1)-(148,70) 5.4 7.9
insertBlanks' Data.List.Split.Internals src/Data/List/Split/Internals.hs:(195,1)-(201,49) 4.5 7.2
split Data.List.Split.Internals src/Data/List/Split/Internals.hs:249:1-68 3.4 9.6
penalty Functions src/Functions.hs:(498,1)-(499,23) 3.2 1.2
onSublist Data.List.Split.Internals src/Data/List/Split/Internals.hs:278:1-72 3.1 0.0
doDrop Data.List.Split.Internals src/Data/List/Split/Internals.hs:(172,1)-(173,14) 3.0 4.3
splitInternal.(...) Data.List.Split.Internals src/Data/List/Split/Internals.hs:144:3-31 2.7 0.0
postProcess Data.List.Split.Internals src/Data/List/Split/Internals.hs:(163,1)-(168,45) 2.6 2.1
breakDelim.match Data.List.Split.Internals src/Data/List/Split/Internals.hs:155:25-52 2.0 0.0
objOrSecondaryShape Runtime src/Runtime.hs:(328,1)-(335,27) 1.9 0.0
binary Numeric.AD.Internal.Reverse src/Numeric/AD/Internal/Reverse.hs:(182,3)-(191,89) 1.6 1.2
splitInternal.toSplitList Data.List.Split.Internals src/Data/List/Split/Internals.hs:(146,3)-(148,70) 1.5 2.8
matched Style src/Style.hs:(362,1)-(371,30) 1.4 0.0
lookupAll Runtime src/Runtime.hs:320:1-110 1.4 0.2
procBlock.isOneToOne.bijectify Style src/Style.hs:465:19-75 1.3 2.4
getDictAndFns.initDict.\ Style src/Style.hs:335:22-103 1.2 0.9
procBlock.addShapes Style src/Style.hs:(443,9)-(445,50) 1.1 0.6
procBlock.varmaps Style src/Style.hs:434:9-87 1.1 0.4
getConstrTuples.getType Substance src/Substance.hs:673:11-137 1.0 2.0
procBlock.isOneToOne.flatMap Style src/Style.hs:462:17-66 0.8 1.9
matchWith Style src/Style.hs:(376,1)-(380,33) 0.8 2.0
Not sure what we can learn from this. I'm pretty confused about the Data.List.Split usage. The only line that actually calls splitOn, which comes from this package, is:
nameParts = splitOn nameSep
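For reference, here is a base-only sketch of what a splitOn-style function has to do. It is not the Data.List.Split implementation, and the separator used in the example is an assumption, since the value of nameSep is not shown in this thread. Every call walks the whole input, so if the line above is reached on each name lookup during optimization (also an assumption, because the call site is not shown), that would be consistent with the Data.List.Split cost centres at the top of the report.

```haskell
import Data.List (isPrefixOf)

-- Split a list on every occurrence of a separator.
-- The separator must be non-empty; an empty one would loop forever.
splitOnSep :: Eq a => [a] -> [a] -> [[a]]
splitOnSep sep = go []
  where
    n = length sep
    -- acc holds the current chunk in reverse order.
    go acc [] = [reverse acc]
    go acc rest@(x : xs)
      | sep `isPrefixOf` rest = reverse acc : go [] (drop n rest)
      | otherwise             = go (x : acc) xs

main :: IO ()
main = print (splitOnSep " " "a b c")  -- prints ["a","b","c"]
```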
Thanks for doing the profiling! Looking at the most recent nested.sub report, I'm very surprised that getName and predEq take so much time. getName should be fast, and predEq is only called in the compilation phase, not during optimization/runtime. Can we exclude the compilation from the profiling?
Also, I would expect the optimization (step, line search, etc.) to take a lot of time, but I don't even see it on the list (except for dist and contains).
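One way to tell the phases apart without excluding anything from the run is to wrap each phase in its own manual cost centre, then compare the inherited %time columns in the call-tree part of the .prof file. A minimal sketch, where compileStub and optimizeStub are hypothetical stand-ins for the real compilation and optimization entry points:

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for the compilation phase.
compileStub :: Int -> [Double]
compileStub n = {-# SCC "compile_phase" #-} map fromIntegral [1 .. n]

-- Hypothetical stand-in for the optimization loop.
optimizeStub :: [Double] -> Double
optimizeStub xs =
  {-# SCC "optimize_phase" #-} foldl' (\acc x -> acc + x * x) 0 xs

main :: IO ()
main = print (optimizeStub (compileStub 1000))
```

Work done under each annotation, including everything it calls, is then summed under compile_phase or optimize_phase in the inherited columns. That makes it easier to see which phase getName and predEq are actually charged to.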
Some of the slowness might be on the rendering side. Try using the Chrome JS profiler? https://developers.google.com/web/tools/chrome-devtools/rendering-tools/
Also try looking for tips on making numeric Haskell code fast, e.g. https://wiki.haskell.org/Performance/Floating_point
http://book.realworldhaskell.org/read/profiling-and-optimization.html
We profiled the system several times and removed the largest bottlenecks. The optimization could still be faster per step, but it is probably more effective now to think about using smarter optimization methods or better objective functions. #120