The bond_analysis.py only outputs the correct

Update on this, I recently reworked <a href="https://lorenzo-rovigatti.github.io/oxDN

Hi, I have written a to categorize the bond status of each NT

RodenLuo, thanks for the support. The does contain the functions that are helpf

[FEATURE REQUEST] bond_analysis.py,about lorenzo-rovigatti/oxdna

Comments (20)

Prateek1410 commented on September 16, 2024 1

Okay, got it, thanks! This was very helpful.

from oxdna.

ErikPoppleton commented on September 16, 2024 1

Update on this, I recently reworked oxDNA_analysis_tools.bond_analysis.bond_analysis to have per-configuration output. The CLI script now also includes a plot which shows the number of bonds over time, which I find very useful for equilibrating structures where there are a lot of bonds I need to form.

from oxdna.

RodenLuo commented on September 16, 2024

Hi,

I have written a script to categorize the bond status of each NT for each Frame. It will generate a file in a format similar to the trajectory format of oxDNA. From this output, several ways of downstream analysis can be followed, such as summing up all frames or getting averaged statistics. I attach the code (in the same license as oxDNA, GPL-3.0 license) and the necessary info below. Hope it helps.

We (the KAUST NANOVIS group) have developed a visualization tool in Houdini and also in standalone C++ to visualize the bond status in each frame as well as the PCA result for an oxDNA trajectory in various abstract layouts. See the Advertisement for SynopSpace section below for both the static figure and mp4. The tool is in the alpha-testing phase now. Feel free to reach out to deng.luo[AT]kaust.edu.sa if you are interested in seeing your structures/trajectories inside it.

Code: oxDNA bond status.zip

Output

Click me to see the details

An example output

t = 1000000
-2
-1
0
1
2 7923
...

Output format:

t = <FrameNumber>
<StatusCode> [Pair ID if code == 2] # for NT0
<StatusCode> [Pair ID if code == 2] # for NT1
<StatusCode> [Pair ID if code == 2] # for NT2
...

So, there are 5 categories. The meaning of them are as follows. hb stands for hydrogen-bonded.

-2: Designed to be hb, wrong hb
-1: Designed to be singled, but is hb
0: Designed to be hb, not yet formed
1: Designed to be singled, and is single
2: Designed to be hb, correctly formed. In this case, the pair id will be printed as well.

Commands to run the code:

Click me to see the details

python convert_hb_to_nt_status.py design.top design.ordered.pair.txt hb_status.txt hb_status.dat

# design.top is oxDNA topology file
# hb_status.dat is the output
# The rest is explained below

hb_status.txt is the output of DNAnalysis if you add the following to input and invoke DNAnalysis input

####    Analysis    ####
analysis_data_output_1 = {
    name = hb_status.txt
    print_every = 1
    col_1 = {
          type = hb_list
    }
}

design.ordered.pair.txt, the format is the same as hb_status.txt but with only one frame. And it records the designed hb status. Example below.

To generate design.ordered.pair.txt, there are several ways.

If you have cadnano or CanDo json file, download tacoxDNA, replace 3 files with the ones attached in the code. Invoke the conversion command will generate .ordered.pair.txt automatically.
If the above is not the case, choose one step from DNAnalysis output and add to the designed status manually. (Sometimes DNAnalysis failed to capture a few base pairs.)
Use oxview.
Drag & drop the design to oxview.
Enable strand in Selection mode, and enable Select pairs.
Select the scaffold.
Download Selected base list.
sed 's/ /\n/2; P; D' baseListFile.txt > basePairList.txt
awk '$1 > $2 {temp = $2; $2 = $1; $1 = temp} 1' basePairList.txt > basePairList.ordered.txt
Add header # step 0 at the first line manually.

Advertisement for SynopSpace

Click me to see the details

alpha_testing.mp4

from oxdna.

supers1x commented on September 16, 2024

RodenLuo, thanks for the support. The script does contain the functions that are helpful. All I need is to build my workflow based on the hb_status.dat file.
However, when dealing with disordered hb_status.txt (the DNAnalysis is to blame), the script gives wrong results compared to design.ordered.pair.txt. Thus, it is also important to print base ID in the output.
Maybe my code could help to solve the problem.
code.zip

from oxdna.

RodenLuo commented on September 16, 2024

So, first, you will need to add one line such as # step 0 or t = 0, or anything else as the first line in your pairs.txt. (This is because I initially want to keep some format consistency to have this header. During parsing, in line 28 of convert_hb_to_nt_status.py, it is skipping the first line.)

Then, this corner case (with only one frame) actually reveals a bug. There is a code section to get the nt status code "1" correct. But this section has not been executed in this corner case. I have updated the code to correct it: convert_hb_to_nt_status.zip.

Now the output looks as follow:

t = 3000000
1
0
2 250
2 249
...

Keep in mind that oxDNA NT index starts at 0 (and strand index starts at 1). This means at this frame, NT0 is designed to be single and is single, NT1 is designed to be paired and not yet formed, NT2 is designed to be paired with 250 and is correctly formed, the same for NT3 with 249.

PS: I have the feeling that you manually come up with the pairs.txt, which is fine if you don't have a big structure. Just make sure that in each line, the smaller ID always goes first. The awk command mentioned in my previous reply guarantees this.

PPS: I think understand what you mean by "disordered". It has been taken into account. In short, construct_hb_design function builds the hb_design_dict first. Then hb_frame_list with the same length as the total NT is initialized with all 0s. And then when parsing each line of the "disordered" output, hb_frame_list is updated at that specific index. In the end, a final go through of hb_frame_list is there to identify the case for status code = 1. So, even if the output is disordered, the script runs fine.

from oxdna.

zoombya commented on September 16, 2024

My take on this problem is to write my own function counting the number of correct bonds in this case.

# pairs is a dictionary of following form 
#   {lower_base_id : bound_base_id}
# it's importaint to have the id with the lower index come first in those pairs.  

def analyze_bonds(path, pairs):
    import oxpy
    # collect the states
    n_states = []
    # we need the context as it otherwise complains about the logger
    with oxpy.Context():
        # get our simulation from the path 
        s = Simulation(f"{path}/input")
        # add the observable
        s.input_file["analysis_data_output_1"] = '{ \n name = stdout \n print_every = 1e10 \n col_1 = { \n id = my_obs \n type = pair_energy \n } \n }'
        
        # get oxDNA Analysis instead of the normal backend
        backend = oxpy.analysis.AnalysisBackend(
            s.input_file
        )
        
        # fire the analysis off
        for i in range(3000): # i am interested only in a subset of the trajectory 
            # read in a conf to analyze
            backend.read_next_configuration()
            # split the observable output 
            e_txt = backend.config_info().get_observable_by_id("my_obs").get_output_string(0).strip().split('\n')
            
            # accumulate the number of bonds => state
            state = 0
            for line in e_txt:
                if line[0] != "#": # filter out the header and footer
                    # split output into calculated parameters
                    id1, id2, FENE, BEXC, STCK, NEXC, HB, CRSTCK, CXSTCK, DH, total = map(float,line.split())
                    # figure out if pair ids are in the observable output and if we have a HB
                    if id1 in pairs and id2 == pairs[id1] and HB :
                        state +=1
            # record the state
            n_states.append(state)
        
        #output stuff
        with open(join(path,"bond_states.json"),"w") as file:
            file.write(
                dumps(n_states)
            )

This function also uses a peace of my boiler plate simulation class, which I am actively fiddling with.
Attached is a link to the prototype.

Biggest problem with the observable is that it will output all the bonds in the system, regardless if they are wrong or not, so you have to maintain some sort of a list as @RodenLuo mentioned.

ox_boilerplate2.zip

from oxdna.

Prateek1410 commented on September 16, 2024

Thanks for the help, RodenLuo. I followed your instructions and ran the codes for a small system of dsDNA (40nt long). I can't seem to understand the outputted hb_status.dat file: looking at the trajectory, I believe all pairs are intact throughout and thus, the status should be '2' however, this is the case for only 1 base pair: 20 and 59 at all frames. I am attaching the files here; could you please take a look? Thanks again! I converted all to .txt files cause otherwise it wasn't accepting them.

input.txt
last_conf.txt
trajectory.txt
baseListFile.txt

hb_status_output.txt

basePairList.ordered.txt
basePairList.txt
dsdna.txt
dsdna_conf.txt
hb_status.txt

from oxdna.

RodenLuo commented on September 16, 2024

Hi, How come you have index 84 in your baseListFile.txt and all the related files, given that your top file has only 80 NTs?

I guess you have edited the structure after you downloaded the baseListFile.txt, or you messed up two structures here. I repeated everything using your dsdna.txt and dsdna_conf.txt and the output looked fine. It generated this basePairList.ordered.txt. With it, you should be able to get the correct result.

from oxdna.

RodenLuo commented on September 16, 2024

Oh, the chance is very low. But you might hit this bug as I did before... sulcgroup/oxdna-viewer#87

I played in oxview with your structure, the baseListFile.txt content order sometimes changes. But the pairs are correct.

from oxdna.

Prateek1410 commented on September 16, 2024

It seems the issue is with the awk command not working, weirdly. Thus, the ordering of the particle ids doesn't happen.

I don't think I hit that bug; I simply selected all particles and then downloaded the selected baselistfile on which I successfully ran the sed command. However, after running awk, I don't see any changes in the generated ordered file.

from oxdna.

RodenLuo commented on September 16, 2024

The awk command is to guarantee the first column is always smaller than the second column. If it is already the case, the awk command will simply do nothing.

What I don't get is, in your original baseListFile.txt, it starts with "45 84 ...". How is it possible that you have an "84" there, as you only have 80 NTs in the whole system.

Have you tried the new basePairList.ordered.txt I uploaded above?

from oxdna.

Prateek1410 commented on September 16, 2024

Hi Roden,

Thanks, I got it now. I don't know how those extra NTs came up though. The problem lay in my faulty selection which wasn't giving me the correct pairs. After correct selection I am getting the expected result, thanks again!

To clarify: in every frame, each particle's bond status is given right?

How should I use the script if I am interested in the bond status of only a few particles? Is there a way, or would I have to generate the hb_status.dat file for the entire structure and then process it further to study my chosen particles?

Thanks and regards
Prateek

from oxdna.

RodenLuo commented on September 16, 2024

Hi Prateek, Great to know it's working.

in every frame, each particle's bond status is given right?

Yes. There is a detailed doc in my first reply on this page.

to study my chosen particles?

When I first wrote the script, I tried to code it as general as possible. Then, any downstream analysis can go from here and filter the output to get the case-specific results. For example, if you are only interested in the first NT, then grabbing the first line of each frame would do. Either manually in a text/Excel editor or a bit of learning on Python scripting about file parsing and array indexing should be good enough for your case.

Cheers,
Roden

from oxdna.

RodenLuo commented on September 16, 2024

You are very welcome! Just a final heads-up, I forgot to mention just now. oxDNA's NT indexing starts with 0. So, NT 0 is the first line under each frame.

from oxdna.

Prateek1410 commented on September 16, 2024

Okay, thanks.

from oxdna.

lorenzo-rovigatti commented on September 16, 2024

Hey guys, I didn't manage to follow the whole thread, but is the feature request still active? In other words, is this something you think we should somehow incorporate in the code or not?

from oxdna.

Prateek1410 commented on September 16, 2024

Hello Professor Lorenzo

Yes, it'd be fantastic if you could please rework the script to produce info on missbonds, native bonds for each configuration.

Warm Regards
Prateek

from oxdna.

supers1x commented on September 16, 2024

Hello Professor Lorenzo,
Just as Prateek said, a function that outputs misbonds, desined-bonds would be appreciated.

from oxdna.

ErikPoppleton commented on September 16, 2024

Hey guys, sorry for the slow responses, in the middle of moving to a new positions so things are a bit hectic. That being said, a few months ago I rewrote all the scripts in oat to be composed of cleanly importable functions for use in your own Python scripts. oxDNA_analysis_tools.bond_analysis.bond_analysis might be exactly what you're looking for. If you need it on a per-configuration basis rather than an average over configurations, feel free to copy the function and just change the data accumulation.

from oxdna.

Prateek1410 commented on September 16, 2024

Thanks a lot, Erik. This is super helpful.

…

On Tue, 20 Dec, 2022, 9:00 pm Erik Poppleton, ***@***.***> wrote: Update on this, I recently reworked oxDNA_analysis_tools.bond_analysis.bond_analysis <https://lorenzo-rovigatti.github.io/oxDNA/oat/api.html#bond-analysis> to have per-configuration output. The CLI script now also includes a plot which shows the number of bonds over time, which I find very useful for equilibrating structures where there are a lot of bonds I need to form. [image: bonds] <https://user-images.githubusercontent.com/36451772/208703953-c1d697a9-290b-4e2e-9b9d-e73578318c95.png> — Reply to this email directly, view it on GitHub <#45 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ARS5VAZMINFHXAFEIEDCSTTWOHGIZANCNFSM6AAAAAARB3RWGI> . You are receiving this because you commented.Message ID: ***@***.***>

from oxdna.

[FEATURE REQUEST] bond_analysis.py about oxdna HOT 20 CLOSED

Comments (20)

Output

Commands to run the code:

Advertisement for SynopSpace

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent