Giter Site home page Giter Site logo

Comments (27)

arybczak avatar arybczak commented on August 11, 2024 1

For the record, I experimented with -fprof-late a bit and the trick to get all dependencies to compile with -fprof-late I came up with is to put into your cabal.project.local:

package *
  ghc-options: -fno-prof-auto -fprof-late

from optics.

arybczak avatar arybczak commented on August 11, 2024 1

Ok, I didn't see this bit:

As others already stated this doesn't apply prof-late to dependencies which is a shortcomming of cabal. To work around this you can use the cabal.project trick mentioned further up.

Looks like a bug in cabal to me if profiling affects dependencies, but profiling-detail doesn't.

from optics.

phadej avatar phadej commented on August 11, 2024

profiling does get into way of optimizations. It's results are unreliable. Try -fprof-late with GHC 9.4

from optics.

maralorn avatar maralorn commented on August 11, 2024

I have built my project with ghc 9.4 and with

cabal run --enable-profiling --profiling-detail=none --ghc-option -fprof-late --ghc-option -with-rtsopts=-pj

It appears the prof-late does not apply to the dependencies, but only to my project. Is this what you wanted me to do?

castOptic and beneath it repIso, Lens.first, Lens.second are still present in the most expensive function of my program.

I don’t believe that you can tell anything from a speedscope screenshot, but maybe it illustrates, vaguely, what I am talking about:

image

I am not sure if the simplifier could profit from inlining castOptic, but I think it’s plausible that it could help.

(maybe for context: In this benchmark this processJsonMessage function is being called on over 1M json messages, so squeezing even a tiny bit out of that would probably help perfomance.)

from optics.

arybczak avatar arybczak commented on August 11, 2024

This might be similar to #324.

from optics.

arybczak avatar arybczak commented on August 11, 2024

Also, @maralorn where does the optics code that doesn't optimize come from? Is it hand written?

from optics.

arybczak avatar arybczak commented on August 11, 2024

Also, optics doesn't define repIso. This function comes from generic-lens-core. For the record, optics provides built-in generic lenses and prisms, not sure why do you use generic-optics.

from optics.

maralorn avatar maralorn commented on August 11, 2024

@arybczak Thank you for helping me understanding this.

You are right, I was using generic-optics here. I didn’t know that optics has generic lenses, thx for the hint. I removed generic-optics from my project. Replacing field with gfield actually made the function 25% faster (in this benchmark), so that’s a huge win!

Still castOptic remains in the eventlog. My code in questions is the following function:

(gfield @"activities" %~ IntMap.adjust (_3 ?~ progress) id'.value)

I apply it to a record which has a field called activities of type IntMap _.

I wonder if the problem is, that I haven’t yet brought cabal to build optics with -fprof-late. I think the -fprof-late currently only applies to my project. Maybe castOptic will disappear when I get it to do that? Although apparently all the operators from optics which I am using seem to get inlined correctly. Only castOptic and # are remaining.

from optics.

arybczak avatar arybczak commented on August 11, 2024

Replacing field with gfield actually made the function 25% faster (in this benchmark), so that’s a huge win!

That's great news 👍 FYI you can use gfield via OverloadedLabels, like this:

(#activities %~ IntMap.adjust (_3 ?~ progress) id'.value)

Regarding the topic, let's experiment. Can you try #470 and see whether it fixes the problem?

from optics.

maralorn avatar maralorn commented on August 11, 2024

(I am not sure, why I refrained from using OverloadedLabels, I think it’s because sometimes the error messages get more confusing.)

Well, yes, applying that patch makes castOptic disappear from the eventlog. One operator remained: #. so I additionally created 4d7be0d and now presumably everything get’s inlined, at least there are no visible cost centers in the function anymore, I haven‘t looked at the core though.

Now I don‘t know if the function got faster again, I have run the benchmark muitiple times and it looks like it might be a improvement of about 10% but it might not be statistically significant.

So yes, castOptic get’s inlined with that change, I am just not sure how important that is.

from optics.

arybczak avatar arybczak commented on August 11, 2024

I did an experiment with using review in a code with profiling enabled and I didn't get castOptic there, but I got #. and .#.

I don't see a harm in marking there functions INLINE because they and very small and thus always get inlined when profiling is off anyway, but the more fundamental issue here is that profiling in GHC interferes with optimizations, so other, similar problems will most likely pop in the future 😞

from optics.

phadej avatar phadej commented on August 11, 2024

A friendly reminder about -fprof-late.

from optics.

maralorn avatar maralorn commented on August 11, 2024

Well, if this is really just being caused by the profiler preventing inlining, then let‘s just close this issue.

I hear you @phadej, but then the real issue here is that there are currently no resources on the internet about the relation between -fprof-late and dependencies build with cabal. I am actually not even sure what -fprof-late is supposed to do when building a library …

from optics.

phadej avatar phadej commented on August 11, 2024

the real issue here is that there are currently no resources on the internet about the relation between -fprof-late and dependencies build with cabal.

... which library maintainers cannot really address. Doing stuff with cabal (or tbh, any build tool) which is just recently got added to GHC is often troublesome. Indeed, I have no good idea how to pass profiling specific flags to dependency package builds, and only when cabal profiling build is done. (My first idea is to build a custom cabal which does -fprof-late instead of -fprof-auto, but I wouldn't call that user friendly, nor really recommend to anyone).

from optics.

maralorn avatar maralorn commented on August 11, 2024

Yeah, it’s obviously not an optics issue. I just insisted, to make sure, that I haven’t been making a stupid mistake.

from optics.

maralorn avatar maralorn commented on August 11, 2024

Thx, anyway for taking the time to talk this through with me.

from optics.

siraben avatar siraben commented on August 11, 2024

@maralorn I'm running into very similar issues with https://discourse.haskell.org/t/debugging-why-late-profiling-detail-still-shows-optics-unexpectedly/6391. I'm trying to experiment with @arybczak's suggestion.

from optics.

siraben avatar siraben commented on August 11, 2024

I've created a minimal, reproducible example showing the performance differences between lens and optics while profiling here: https://github.com/siraben/lens-optics-profiling

from optics.

maralorn avatar maralorn commented on August 11, 2024

Very interesting. Thank you!

from optics.

arybczak avatar arybczak commented on August 11, 2024

@siraben can you check with #491?

from optics.

arybczak avatar arybczak commented on August 11, 2024

Ok, I checked it myself and it does indeed fix it.

from optics.

maralorn avatar maralorn commented on August 11, 2024

I am still likely confused whether this fixes a problem that only happens during profiling and if so, why?

from optics.

arybczak avatar arybczak commented on August 11, 2024

Looks like GHC doesn't inline small/trivial functions when profiling is enabled unless they are marked with INLINE pragmas. As for why this happens, it would be best to ask GHC devs.

from optics.

AndreasPK avatar AndreasPK commented on August 11, 2024

Looks like GHC doesn't inline small/trivial functions when profiling is enabled unless they are marked with INLINE pragmas. As for why this happens, it would be best to ask GHC devs.

I don't think this is the case. Rather cabal ends up invoking ghc with both -fprof-auto and -fprof-late which ends up inserting the cost centres implied by both modes.

I've created a minimal, reproducible example showing the performance differences between lens and optics while profiling here: https://github.com/siraben/lens-optics-profiling

This was very helpful. At least in the reproducer this issue is that cabal will pass "-fprof-auto" while you then later pass -fprof-late manually. These two don't cancel each other out. Rather ghc will insert both the cost-centres for "-fprof-auto" and for "-fprof-late". This is because there are edge cases where one might want to combine these profiling modes. But perhaps that's not worth the confusion.

Anyway to properly use prof-late for just your project either use:

cabal run --enable-profiling {lens|optics}-profiling --ghc-options='-O2 -fprof-late' --profiling-detail=none -- +RTS -p -RTS

Or for newer versions of cabal

cabal run --enable-profiling {lens|optics}-profiling --ghc-options='-O2' --profiling-detail=late-- +RTS -p -RTS

As others already stated this doesn't apply prof-late to dependencies which is a shortcomming of cabal. To work around this you can use the cabal.project trick mentioned further up. So something like:

package *
  ghc-options: -fno-prof-auto -fprof-late
  profiling-detail: none

If there is a reproducer showing clearly that -fprof-late actually prevents inlining I would be very interested to take a look at it. But using the flags as I described it seems to work as expected for this reproducer.

from optics.

arybczak avatar arybczak commented on August 11, 2024

@AndreasPK I've run this repo with the following cabal.project.local:

with-compiler: ghc-9.4.5

profiling: True
profiling-detail: late

tests: True

and here's the difference:

lens-profiling:

        Sun Jun 18 15:49 2023 Time and Allocation Profiling Report  (Final)

           lens-profiling +RTS -s -p -RTS

        total time  =        0.02 secs   (19 ticks @ 1000 us, 1 processor)
        total alloc =  40,051,984 bytes  (excludes profiling overheads)

COST CENTRE MODULE SRC                 %time %alloc

$wloop      Main   <no location info>  100.0   99.9


                                                                                    individual      inherited
COST CENTRE       MODULE                SRC                      no.     entries  %time %alloc   %time %alloc

MAIN              MAIN                  <built-in>               162           0    0.0    0.0   100.0  100.0
 main             Main                  app/Main-lens.hs:29:1-4  324           1    0.0    0.0     0.0    0.0
  main7           Main                  <no location info>       325           1    0.0    0.0     0.0    0.0
   main1          Main                  <no location info>       326           1    0.0    0.0     0.0    0.0
 CAF              Main                  <entire-module>          323           0    0.0    0.0   100.0   99.9
  $fShowBox13     Main                  <no location info>       333           1    0.0    0.0     0.0    0.0
  main2           Main                  <no location info>       327           1    0.0    0.0     0.0    0.0
  main3           Main                  <no location info>       328           1    0.0    0.0     0.0    0.0
   $w$cshowsPrec2 Main                  app/Main-lens.hs:9:17-20 329           1    0.0    0.0     0.0    0.0
  main4           Main                  <no location info>       330           1    0.0    0.0   100.0   99.9
   $wloop         Main                  <no location info>       331     1000001  100.0   99.9   100.0   99.9
  main5           Main                  <no location info>       332           1    0.0    0.0     0.0    0.0
 CAF              GHC.Conc.Signal       <entire-module>          284           0    0.0    0.0     0.0    0.0
 CAF              GHC.IO.Encoding       <entire-module>          269           0    0.0    0.0     0.0    0.0
 CAF              GHC.IO.Encoding.Iconv <entire-module>          267           0    0.0    0.0     0.0    0.0
 CAF              GHC.IO.Handle.FD      <entire-module>          259           0    0.0    0.1     0.0    0.1

optics-profiling:

        Sun Jun 18 15:49 2023 Time and Allocation Profiling Report  (Final)

           optics-profiling +RTS -s -p -RTS

        total time  =        0.02 secs   (24 ticks @ 1000 us, 1 processor)
        total alloc =  40,051,784 bytes  (excludes profiling overheads)

COST CENTRE MODULE                SRC                                            %time %alloc

increment3  Main                  <no location info>                              58.3   39.9
$wloop      Main                  <no location info>                              25.0   59.9
%           Optics.Internal.Optic src/Optics/Internal/Optic.hs:(120,1)-(130,54)   16.7    0.0


                                                                                                           individual      inherited
COST CENTRE       MODULE                  SRC                                           no.     entries  %time %alloc   %time %alloc

MAIN              MAIN                    <built-in>                                    139           0    0.0    0.0   100.0  100.0
 main             Main                    app/Main-optics.hs:36:1-4                     278           1    0.0    0.0     0.0    0.0
  main7           Main                    <no location info>                            279           1    0.0    0.0     0.0    0.0
   main1          Main                    <no location info>                            280           1    0.0    0.0     0.0    0.0
 CAF              Main                    <entire-module>                               277           0    0.0    0.0   100.0   99.9
  $fShowBox13     Main                    <no location info>                            295           1    0.0    0.0     0.0    0.0
  increment2      Main                    <no location info>                            286           1    0.0    0.0     0.0    0.0
   castOptic      Optics.Internal.Optic   src/Optics/Internal/Optic.hs:(102,1)-(108,44) 288           1    0.0    0.0     0.0    0.0
    %             Optics.Internal.Optic   src/Optics/Internal/Optic.hs:(120,1)-(130,54) 289           2    0.0    0.0     0.0    0.0
   runFunArrow    Data.Profunctor.Indexed src/Data/Profunctor/Indexed.hs:78:37-47       287           1    0.0    0.0     0.0    0.0
  main2           Main                    <no location info>                            281           1    0.0    0.0     0.0    0.0
  main3           Main                    <no location info>                            282           1    0.0    0.0     0.0    0.0
   $w$cshowsPrec2 Main                    app/Main-optics.hs:16:26-29                   283           1    0.0    0.0     0.0    0.0
  main4           Main                    <no location info>                            284           1    0.0    0.0   100.0   99.9
   $wloop         Main                    <no location info>                            285     1000001   25.0   59.9   100.0   99.9
    increment2    Main                    <no location info>                            290           0    0.0    0.0    75.0   39.9
     castOptic    Optics.Internal.Optic   src/Optics/Internal/Optic.hs:(102,1)-(108,44) 291           0    0.0    0.0    75.0   39.9
      %           Optics.Internal.Optic   src/Optics/Internal/Optic.hs:(120,1)-(130,54) 292           0   16.7    0.0    75.0   39.9
       increment3 Main                    <no location info>                            293     1000000   58.3   39.9    58.3   39.9
  main5           Main                    <no location info>                            294           1    0.0    0.0     0.0    0.0
 CAF              GHC.Conc.Signal         <entire-module>                               265           0    0.0    0.0     0.0    0.0
 CAF              GHC.IO.Encoding         <entire-module>                               250           0    0.0    0.0     0.0    0.0
 CAF              GHC.IO.Encoding.Iconv   <entire-module>                               248           0    0.0    0.0     0.0    0.0
 CAF              GHC.IO.Handle.FD        <entire-module>                               240           0    0.0    0.1     0.0    0.1

Only after #491 it becomes:

        Sun Jun 18 15:53 2023 Time and Allocation Profiling Report  (Final)

           optics-profiling +RTS -s -p -RTS

        total time  =        0.02 secs   (19 ticks @ 1000 us, 1 processor)
        total alloc =  40,051,768 bytes  (excludes profiling overheads)

COST CENTRE MODULE SRC                 %time %alloc

$wloop      Main   <no location info>  100.0   99.9


                                                                                       individual      inherited
COST CENTRE       MODULE                SRC                         no.     entries  %time %alloc   %time %alloc

MAIN              MAIN                  <built-in>                  127           0    0.0    0.0   100.0  100.0
 main             Main                  app/Main-optics.hs:36:1-4   254           1    0.0    0.0     0.0    0.0
  main7           Main                  <no location info>          255           1    0.0    0.0     0.0    0.0
   main1          Main                  <no location info>          256           1    0.0    0.0     0.0    0.0
 CAF              Main                  <entire-module>             253           0    0.0    0.0   100.0   99.9
  $fShowBox13     Main                  <no location info>          263           1    0.0    0.0     0.0    0.0
  main2           Main                  <no location info>          257           1    0.0    0.0     0.0    0.0
  main3           Main                  <no location info>          258           1    0.0    0.0     0.0    0.0
   $w$cshowsPrec2 Main                  app/Main-optics.hs:16:26-29 259           1    0.0    0.0     0.0    0.0
  main4           Main                  <no location info>          260           1    0.0    0.0   100.0   99.9
   $wloop         Main                  <no location info>          261     1000001  100.0   99.9   100.0   99.9
  main5           Main                  <no location info>          262           1    0.0    0.0     0.0    0.0
 CAF              GHC.Conc.Signal       <entire-module>             248           0    0.0    0.0     0.0    0.0
 CAF              GHC.IO.Encoding       <entire-module>             238           0    0.0    0.0     0.0    0.0
 CAF              GHC.IO.Encoding.Iconv <entire-module>             236           0    0.0    0.0     0.0    0.0
 CAF              GHC.IO.Handle.FD      <entire-module>             228           0    0.0    0.1     0.0    0.1

from optics.

siraben avatar siraben commented on August 11, 2024

@arybczak the new changes help, thank you! Would you be able to make a release to Hackage that incorporates these changes?

from optics.

arybczak avatar arybczak commented on August 11, 2024

I will after #492 is merged.

from optics.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.