Comments (4)

aphayt commented on August 15, 2024

Hi Sarah

Last night I ran into the same problem while setting up a virulence gene database for Salmonella, and I got an identical error message. Have you found a solution?

Many thanks,
Yue

sarahpenir commented on August 15, 2024

Hi @aphayt,

I got the program working by replacing the "main" function of VFDB_cdhit_to_csv.py with the following code:

def main():
    args = parse_args()
    # open() works in both Python 2 and 3; the file() builtin exists only in Python 2
    outfile = open(args.outfile, "w")
    outfile.write("seqID,clusterid,gene,allele,DNA,annotation\n")

    database = {} # key = clusterid, value = list of seqIDs
    seq2cluster = {} # key = seqID, value = clusterid

    # First pass: read the CD-HIT cluster file and record which cluster each seqID belongs to
    for line in open(args.cluster_file):
        if line.startswith(">"):
            # cluster header line: the second whitespace-separated field is the cluster number
            ClusterNr = line.split()[1]
            continue

        # member line: the seqID sits between ">" and the first "("
        line_split = line.split(">")
        seqID = line_split[1].split("(")[0]

        if ClusterNr not in database:
            database[ClusterNr] = []
        if seqID not in database[ClusterNr]:
            database[ClusterNr].append(seqID) # for virulence gene DB, this is the unique ID R0xxx
        seq2cluster[seqID] = ClusterNr

    # Second pass: write one CSV row per sequence in the input FASTA file
    for record in SeqIO.parse(open(args.infile, "r"), "fasta"):
        clusterid = ""
        full_name = record.description
        genus = full_name.split("[")[2].split()[0]
        id_bits = re.sub("[()]","",full_name.split("[")[0]).split() # 'R004852 fliL VP2243 '
        seqID = full_name.split()[0].split("(")[0] # R004852
        gene = id_bits[1] # fliL

        if len(id_bits) > 2:
            allele = id_bits[1]+"_"+id_bits[2] # fliL_VP2243
        else:
            allele = id_bits[1]
        if seqID in seq2cluster:
            clusterid = seq2cluster[seqID]
        # commas are stripped from the description so they do not break the CSV columns
        outstring = ",".join([seqID, clusterid, gene, allele, str(record.seq), re.sub(",","",record.description)]) + "\n"
        outfile.write(outstring)
    outfile.close()
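
To make the cluster-file step easier to check in isolation, here is a minimal standalone sketch of the same parsing logic. The function name parse_clstr and the sample lines are made up for illustration (the IDs and "gi" numbers are not real VFDB entries), and it assumes the usual CD-HIT .clstr layout of a ">Cluster N" header followed by member lines.

```python
def parse_clstr(lines):
    """Map each sequence ID to its CD-HIT cluster number (hypothetical helper)."""
    seq2cluster = {}
    cluster_nr = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line such as ">Cluster 0": keep the number after "Cluster"
            cluster_nr = line.split()[1]
            continue
        # Member line: the ID sits between ">" and the first "("
        seq_id = line.split(">")[1].split("(")[0]
        seq2cluster[seq_id] = cluster_nr
    return seq2cluster


# Invented sample in the assumed .clstr layout
sample = [
    ">Cluster 0",
    "0\t1500nt, >R004852(gi:1)... *",
    "1\t1400nt, >R004853(gi:2)... at +/98.00%",
    ">Cluster 1",
    "0\t900nt, >R000001(gi:3)... *",
]

print(parse_clstr(sample))
```

With this sample, R004852 and R004853 end up in cluster "0" and R000001 in cluster "1". If your own .clstr file gives different IDs than you expect, the sequence headers probably do not follow the "ID(...)" pattern that the split on "(" relies on.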

Hope this helps,
Sarah P.

aphayt commented on August 15, 2024

Hi Sarah

Many thanks for sharing. I have now built two VF databases, one for Campylobacter and one for Salmonella, by following the steps in 'Error in step: Using the VFDB Virulence Factor Database with SRST2' #59.

Best,
Yue

rrwick commented on August 15, 2024

Fixed in 5b1639b - thanks!
