Comments (16)

davidlmobley commented on June 16, 2024

Totally agree with @leeping here. @vtlim in my group also uses this aspect a lot, and @bannanc likely will as well.

from qcschema.

dgasmith commented on June 16, 2024

I believe this is part of the change where the base schema starts with a list and looks like the following:

[
    "spec_version",
    {
      ...input_one
    },
    {
      ...input_two
    },
    ...
]

While I do like this, my main concern is getting the various QM programs to actually execute it. Can @saromleang or @loriab weigh in?
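For illustration only, here is a minimal Python sketch of how a consumer might unpack the list-based layout above. It assumes the first element is the spec version string and every later element is one input object; the function name `split_batch`, the version `"1.0"`, and the `"driver"` key in the usage line are hypothetical, not part of qcschema.

```python
def split_batch(document: list) -> tuple:
    """Split a list-form batch document into (spec_version, inputs).

    Hypothetical helper: assumes element 0 is the spec version string and
    every following element is one input object (a dict).
    """
    if not document or not isinstance(document[0], str):
        raise ValueError("first element must be the spec_version string")
    spec_version, *inputs = document
    for position, job in enumerate(inputs, start=1):
        if not isinstance(job, dict):
            raise ValueError(f"element {position} is not an input object")
    return spec_version, inputs


# Usage with made-up contents; "1.0" and "driver" are placeholders.
version, jobs = split_batch(["1.0", {"driver": "energy"}, {"driver": "gradient"}])
print(version, len(jobs))  # 1.0 2
```

A program without native batch support would still need some wrapper to loop over `inputs` and call its single-job entry point, which is exactly the implementation burden raised here.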

dgasmith commented on June 16, 2024

I think Psi4 could natively support this, but other codes would require calling wrappers which might as well be moved to other program layers rather than baking into the spec itself. Can I get other QM devs to weigh in here?

wadejong commented on June 16, 2024

loriab commented on June 16, 2024

I once held the view that whatever a QC input file could support, a job schema should support. I've since retreated to the view that a single schema job should support what is, quantum-chemically, a single logical unit: a whole SAPT is one job, but a CCSD followed by a CISD is two jobs, even if the SCF is shared between them. That keeps the job spec schema from getting too combinatorial (loop over these molecules, doing these methods, at all these basis sets, and at each of this list of torsion angles). Psi4 could run that job, but I'm reluctant to see it be doable in the job schema that immediately faces a QC program. Better that it be driven by the next layer up in the workflow.
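A rough Python sketch of the division of labor described here: the combinatorial loop lives in a workflow layer, which emits one single-unit job per combination for the QC program. The job dictionary keys (`molecule`, `driver`, `model`, `method`, `basis`), the use of plain strings as molecule placeholders, and the function name are illustrative assumptions, not the schema's actual fields.

```python
import itertools


def expand_jobs(molecules, methods, basis_sets, driver="energy"):
    """Expand a combinatorial request into single-unit job specs.

    Workflow-layer sketch: the QC program only ever sees one job at a time.
    Key names are hypothetical placeholders.
    """
    jobs = []
    for molecule, method, basis in itertools.product(molecules, methods, basis_sets):
        jobs.append({
            "molecule": molecule,
            "driver": driver,
            "model": {"method": method, "basis": basis},
        })
    return jobs


jobs = expand_jobs(["water", "ammonia"], ["hf", "mp2"], ["cc-pvdz", "cc-pvtz", "cc-pvqz"])
print(len(jobs))  # 12
```

Adding another axis (such as a list of torsion angles) would be one more argument to `itertools.product`, leaving the per-job document that reaches the QC program unchanged.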

saromleang commented on June 16, 2024

I don't believe GAMESS is set up to take in a batch of inputs and process them. A lot of work would need to be done within GAMESS to allow this (not saying that there isn't any benefit to it).

leeping commented on June 16, 2024

A set of configurations with the same atomic symbols / charge / multiplicity / method is a common kind of calculation; it's also a convenient unit of data to include in a database, because the user is likely going to request the entire set rather than just one configuration. That was my starting point for requesting this feature.

I appreciate the concerns of the QM developers. It's a major task to make a quantum chemistry code support this kind of batch processing if it doesn't already. My hope is that requesting a feature in the schema is not equivalent to requesting this feature in all QM codes.

Maybe the "set of multiple conformations" should have a variable name other than "geometry", such as:

json_molecule["geometries"] = [[0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 2]]

If "geometries" is provided, then "geometry" should not be provided. That way, the QM codes that support batch processing will loop over the configurations, and those that don't can simply throw an error. What do you think?
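One way the proposed rule could be enforced, sketched in Python under the assumption that a molecule is a dict with a `symbols` list plus either a flat `geometry` list or a `geometries` list of flat lists (three coordinates per atom, as in the example above, with the `["H", "H"]` symbols added here as an assumption). The `supports_batch` flag stands in for whether a given QM program implements batch processing; the function name is hypothetical.

```python
def geometries_to_run(molecule: dict, supports_batch: bool) -> list:
    """Return the list of flat geometries a program should evaluate.

    Enforces the proposed rule: exactly one of "geometry" or "geometries".
    Programs without batch support raise instead of silently picking one.
    """
    has_single = "geometry" in molecule
    has_batch = "geometries" in molecule
    if has_single == has_batch:
        raise ValueError('provide exactly one of "geometry" or "geometries"')
    if has_batch and not supports_batch:
        raise NotImplementedError("this program does not support batch geometries")

    geometries = molecule["geometries"] if has_batch else [molecule["geometry"]]
    n_coords = 3 * len(molecule["symbols"])
    for index, geometry in enumerate(geometries):
        if len(geometry) != n_coords:
            raise ValueError(f"geometry {index} has {len(geometry)} values, expected {n_coords}")
    return [list(g) for g in geometries]


h2 = {"symbols": ["H", "H"], "geometries": [[0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 2]]}
print(len(geometries_to_run(h2, supports_batch=True)))  # 2
```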

loriab commented on June 16, 2024

Can multiple job spec documents differing in geometry serve the same role? It's considerable redundancy for the workflows you're planning, but it's pretty modest compared to the output and cost of QC calculations. So long as the runtime database is indexable by molecule identity, the records should be readily grouped. Conformations can then be associated even if they came from different input files.
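If each geometry lives in its own job document, the grouping described here can be recovered after the fact. A minimal Python sketch, assuming each stored record carries a molecule with `symbols`, `molecular_charge` and `molecular_multiplicity`, plus a `model` with a `method`; these key names, the default charge and multiplicity, and the helper functions are assumptions for illustration.

```python
from collections import defaultdict


def identity_key(record: dict) -> tuple:
    """Key shared by all conformations of one molecule computed with one method."""
    molecule = record["molecule"]
    return (
        tuple(molecule["symbols"]),
        molecule.get("molecular_charge", 0),        # assumed default: neutral
        molecule.get("molecular_multiplicity", 1),  # assumed default: singlet
        record["model"]["method"],
    )


def group_conformations(records: list) -> dict:
    """Group single-geometry records into conformation sets keyed by identity."""
    groups = defaultdict(list)
    for record in records:
        groups[identity_key(record)].append(record)
    return dict(groups)


def make_record(z: float, method: str = "hf") -> dict:
    # Hypothetical record for H2 with the second atom at height z.
    return {
        "molecule": {"symbols": ["H", "H"], "geometry": [0, 0, 0, 0, 0, z]},
        "model": {"method": method},
    }


records = [make_record(1.0), make_record(2.0), make_record(1.0, method="mp2")]
groups = group_conformations(records)
print(len(groups))  # 2
```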

leeping commented on June 16, 2024

I think multiple job spec documents could serve the same role, similar to how a stack of loose-leaf pages can play a similar role as a book. It's mainly a matter of organization and convenience, and having the technology to bind the book can save a lot of time.

langner commented on June 16, 2024

I tend to agree with @loriab: it will be much easier to implement a simple schema that covers a single unit of computation. But how do we intend to deal with multiple conformations when they occur in a single job, for example a geometry optimization? Surely the output should be in one file. Perhaps we could extend the design to cover such cases more generically?

leeping commented on June 16, 2024

I certainly understand @loriab's concern that a single conformation makes the most sense as a single unit of computation. On the other hand, an array of single-point calculations (sharing atomic symbols / method / charge / multiplicity) is becoming increasingly common and important. There is currently no easy and standard way to manage these arrays, leading to a lot of overhead in doing this research.

I'm mainly asking for this feature as an organizational tool, which would enable us to store the data in one file, have our data-processing programs process a single file instead of looping over multiple ones, store an array of single-point calculations as one entry in a database, and refer to the whole array using one key.

It would also be great if QM codes could support running an array of single-point calculations as a feature, but that's not what I'm directly interested in. I would implement this behavior in the codes I contribute to, but I wouldn't go as far as to request it in every single code. A small script could be provided to split the job array file into multiple single units, or multiple outputs may be combined into one file.
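The "small script" mentioned here could look something like the following Python sketch. It assumes a job array stores its configurations under `molecule["geometries"]` (the name proposed earlier in this thread) and that each single-unit job uses `molecule["geometry"]`; the function names and the `results` wrapper key are hypothetical.

```python
import copy
import json


def split_job_array(array_json: str) -> list:
    """Split one job-array document into single-geometry JSON documents."""
    array_job = json.loads(array_json)
    geometries = array_job["molecule"].pop("geometries")
    singles = []
    for geometry in geometries:
        job = copy.deepcopy(array_job)  # every other field is shared across the array
        job["molecule"]["geometry"] = geometry
        singles.append(json.dumps(job))
    return singles


def combine_outputs(output_jsons: list) -> str:
    """Combine per-geometry outputs into one document, keeping input order."""
    return json.dumps({"results": [json.loads(text) for text in output_jsons]})


array_doc = json.dumps({
    "molecule": {"symbols": ["H", "H"], "geometries": [[0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 2]]},
    "model": {"method": "hf"},
})
singles = split_job_array(array_doc)
print(len(singles))  # 2
```

With a pair of helpers like these, QM codes only ever need to read single-unit documents, while storage and analysis can keep the whole array as one file.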

More generally, if we request such "organizational features" in the schema, is it equivalent to requesting the same kind of organization in the QM codes that use it?

vtlim commented on June 16, 2024

This feature would definitely be useful for me as well: running the same geometry optimization scheme on a large array of conformations (10s to 1000s of conformations per molecule). If I want to perform additional optimizations or visualizations, that requires me to iterate through the conformations' directories a number of times. Supporting multiple conformations would make processing and maintenance easier.

jchodera commented on June 16, 2024

It would indeed be extremely useful for JSON files intended to specify input for quantum chemical calculations to list a number of configurations on which the same operation is to be performed. If this is difficult for all programs to support natively, could this just be added as a separate Tier of spec support? It would seem like a simple Python driver would be sufficient to then act as a harness for codes that conform to a lower level Tier of the spec.

Another relevant question: If the JSON spec would also provide a way to associate the output with the inputs, how would the mapping from calculation outputs to input configuration be managed? And if several configurations were produced by a single input (e.g., for geometry optimization), we also have to worry about the association between each input configuration and many output configurations (as well as, for example, other associated properties with every output configuration).
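The driver-harness idea and the input-to-output mapping question can be sketched together in Python. Here each result records the index of the configuration it came from, and `outputs` is a list so that one input can map to many output frames (for example, an optimization trajectory), each carrying its own properties. `run_batch`, the result keys, and the `run_single` callable are hypothetical stand-ins for a wrapper around a program that only supports single jobs.

```python
from typing import Callable, List


def run_batch(configurations: List[list], run_single: Callable[[list], List[dict]]) -> List[dict]:
    """Run each configuration and tag its outputs with the input index.

    run_single returns a list of output frames: one frame for a single
    point, or many for a geometry optimization.
    """
    results = []
    for index, configuration in enumerate(configurations):
        frames = run_single(configuration)
        results.append({
            "input_index": index,  # links outputs back to the input configuration
            "outputs": frames,     # one-to-many: every frame keeps its own properties
            "final": frames[-1] if frames else None,
        })
    return results


def fake_optimizer(configuration: list) -> List[dict]:
    # Toy stand-in for a QM program: three "steps" with decreasing energy.
    return [{"geometry": configuration, "energy": -1.0 - 0.1 * step} for step in range(3)]


results = run_batch([[0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 2]], fake_optimizer)
print(results[1]["input_index"], len(results[1]["outputs"]))  # 1 3
```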

dgasmith commented on June 16, 2024

I think this goes back to what the scope of the schema project really is. Folks who have implemented successful schemas before have told me that projects which are narrow in scope have a much (much!) higher chance of being successful. Having this spec serve as a simple "API" for single QC applications seems complex enough (to me).

A few downsides to implementing "workflows" in the spec:

  • What happens if the QC program crashes on computation number 49/50?
  • Putting multiple computations through a single QC program makes parallelizing over these computations harder.
  • How do we link multiple inputs and outputs and what happens with nested outputs? (as John mentioned)
  • We create further divisions in the spec between QM programs that support this and those that do not.
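To make the first two concerns concrete: when each computation is an independent job, a workflow layer can run them in parallel and record a crash in one job without losing the others. The Python sketch below uses `concurrent.futures.ThreadPoolExecutor`, on the assumption that `run_one` mostly waits on an external QC process; the function names and result keys are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_independent(jobs, run_one, max_workers=4):
    """Run independent jobs in parallel; a failure affects only its own job.

    Results come back in input order (Executor.map preserves ordering).
    """
    def guarded(job):
        try:
            return {"success": True, "result": run_one(job)}
        except Exception as exc:  # record the crash instead of aborting the batch
            return {"success": False, "error": repr(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(guarded, jobs))


# Toy run_one that "crashes" on computation number 49 of 50 (index 48).
outcomes = run_independent(range(50), lambda job: 1 / (job - 48))
print(sum(o["success"] for o in outcomes))  # 49
```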

I do understand that this ability is very useful; however, I believe it would best take place at a higher workflow level. As John mentioned, could we move this to a higher-level "workflow" schema?

I'm happy to be convinced otherwise here; I mostly worry that increasing the scope and complexity of this project pushes its first version date even further out (if ever).

davidlmobley commented on June 16, 2024

I think I'm inclined to agree that, while useful, this should be punted to further down the line or a higher workflow level.

leeping commented on June 16, 2024

Sounds good - I agree it's best to focus on making something narrow in scope that works. A higher-level workflow schema would be a good way forward.
