Python subprocess.Popen gymnastics with ccp4 programs


This is an article that will hopefully help me understand what the subprocess module is doing.

 

The problem: I am trying to run the following shell script, but from inside a python script. I want to do this because it is more elegant to have one script do all the work than to have a python script write a shell script, run that shell script externally, process its output in python, and then delete the shell script to tidy things up.

 

Anyway, here is the shell script I want to pythonify:

#!/usr/bin/sh
f2mtz HKLIN 32_4run1.phs HKLOUT 32_4run1.mtz <<eof
SYMM p1
CELL 79.814 102.639 153.627 82.465 75.531 73.450
format '(3f4.0,f11.2,f8.2,f7.3,4f11.3)'
skipline 0
LABOUT H K L IP FOMS PHIS HLA HLB HLC HLD
CTYPOUT H H H I W P A A A A
END
eof

To do this I am using the subprocess module.

Based on a pointer that Albert Hopkins provided, I analyse the task as follows.

Ask yourself what you are trying to do: 

      * Run an executable
      * Pass arguments to the executable
      * Use a string as standard input to the call

Then re-read the subprocess module documentation.
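As a rough illustration of those three steps, here is a minimal sketch that does not need ccp4 at all. The child process is a stand-in (the Python interpreter running a one-line script, which is my assumption and not part of the original task), and run_with_input is a hypothetical helper name. The child prints the arguments it received and then its standard input in upper case, so you can see which text arrived by which route.

```python
import subprocess
import sys

# Stand-in child program (hypothetical): prints its arguments,
# then prints whatever arrived on standard input, upper-cased.
CHILD = "import sys; print(' '.join(sys.argv[1:])); print(sys.stdin.read().upper())"


def run_with_input(argv, text):
    """Run argv, send text on stdin, and return what the child wrote to stdout."""
    proc = subprocess.Popen(
        argv,                      # 1. the executable and 2. its arguments
        stdin=subprocess.PIPE,     # 3. a pipe we can write the string into
        stdout=subprocess.PIPE,
        universal_newlines=True,   # exchange str rather than bytes
    )
    out, _ = proc.communicate(input=text)
    return out


output = run_with_input(
    [sys.executable, "-c", CHILD, "HKLIN", "foo.hkl"],
    "symm p1\nend\n",
)
print(output)
```

The arguments travel in the list given to Popen, while the string travels through communicate. That is the same split the ccp4 documentation makes between arguments and keyworded input.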

Before you read any further: the peculiarities described here may just be a side effect of the way the ccp4 programs work, and may not be generally applicable.

The basic problem I had was getting the distinction right between arguments and keywords, as defined by the ccp4 documentation.

For example, the synopsis for f2mtz reads:

SYNOPSIS

   f2mtz hklin foo.hkl hklout foo.mtz
   [Keyworded input]

So hklin foo.hkl and hklout foo.mtz are arguments, and the rest of the inputs, like CELL, SYMM etc., are keywords.

Based on this, here is the shell script again, split into arguments and keywords from the perspective of subprocess.Popen: the f2mtz line carries the arguments, and the lines between <<eof and eof are the keywords.

#!/usr/bin/sh
f2mtz HKLIN 32_4run1.phs HKLOUT 32_4run1.mtz <<eof
SYMM p1
CELL 79.814 102.639 153.627 82.465 75.531 73.450
format '(3f4.0,f11.2,f8.2,f7.3,4f11.3)'
skipline 0
LABOUT H K L IP FOMS PHIS HLA HLB HLC HLD
CTYPOUT H H H I W P A A A A
END

eof
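The same split can be sketched in code. The helper below, split_heredoc_script, is a hypothetical name of mine, and it only handles the simple "one command followed by one here-document" layout used in this script. It returns the argument list and the keyword text separately.

```python
import shlex

SCRIPT = """#!/usr/bin/sh
f2mtz HKLIN 32_4run1.phs HKLOUT 32_4run1.mtz <<eof
SYMM p1
CELL 79.814 102.639 153.627 82.465 75.531 73.450
format '(3f4.0,f11.2,f8.2,f7.3,4f11.3)'
skipline 0
LABOUT H K L IP FOMS PHIS HLA HLB HLC HLD
CTYPOUT H H H I W P A A A A
END

eof
"""


def split_heredoc_script(text):
    """Split 'prog args <<marker ... marker' into (argv, keyword_text)."""
    # Drop the shebang line and any blank lines before the command.
    lines = [ln for ln in text.splitlines() if not ln.startswith("#!")]
    start = next(i for i, ln in enumerate(lines) if ln.strip())
    # Everything before '<<' is the command line; the rest names the marker.
    command, marker = lines[start].split("<<", 1)
    marker = marker.strip()
    argv = shlex.split(command)
    # Collect the here-document body up to (not including) the marker line.
    body = []
    for ln in lines[start + 1:]:
        if ln.strip() == marker:
            break
        body.append(ln)
    return argv, "\n".join(body) + "\n"


argv, keywords = split_heredoc_script(SCRIPT)
print(argv)
print(keywords)
```

Here argv holds what the ccp4 documentation calls arguments, and keywords holds the keyworded input that the shell would otherwise feed to the program on standard input.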

To automate a ccp4 script inside python, put the command and its arguments in a list as follows:

commandtorun = ["command HKLIN %s HKLOUT %s " % (hklin, hklout), "<<eof"]  # hklin, hklout: the relevant inputs from the python script

Then start a subprocess

commandpopen = subprocess.Popen(commandtorun,stdin=subprocess.PIPE,shell=True)

And then communicate with the process as follows :

commandpopen.communicate(input=script)

where script is a string holding all the keywords and the closing eof.

Accordingly, my program to automate the phs to mtz conversion is given below.

# Convert a SHELX .phs file to an mtz file by driving the ccp4 program f2mtz.

__author__ = "hari"
__date__ = "$Mar 26, 2009 4:21:37 PM$"

import os.path
import subprocess
import sys
from optparse import OptionParser

parser = OptionParser()


def getcellparams(scafile):
    """Return (cell, spacegroup) read from the third line of a .sca file."""
    if not os.path.lexists(scafile):
        print("File not found %s: Please give cell parameters" % scafile)
        sys.exit(1)
    with open(scafile, "r") as myfile:
        # Skip the two header lines; the third holds the cell and the space group.
        myfile.readline()
        myfile.readline()
        cellline = myfile.readline().split()
    cell = " ".join(cellline[:-1])
    spg = cellline[-1]
    return (cell, spg)


def main():
    parser.add_option("-o", dest="mtzfile", help="output mtz file", metavar="*.mtz")
    parser.add_option("-i", "--phs", help="Input phases file from shelx", dest="phs", metavar="*.phs")
    parser.add_option("--sca", "-s", dest="scafile", help="scafile for cell parameters", metavar="*.sca")
    parser.add_option("-c", "--cell", dest="cell", help="cell parameters a b c alpha beta gamma", metavar="CELL")
    parser.add_option("--sym", dest="symm", help="symmetry / space group", metavar="P1")
    (options, spillover) = parser.parse_args()
    # print("Converting %s file to %s mtz file" % (options.phs, options.mtzfile))
    # Cell parameters from -c serve as the fallback when no scafile is given.
    cellparams, symm_from_sca = (options.cell, None)
    if options.scafile is not None:
        (cellparams, symm_from_sca) = getcellparams(options.scafile)
    if options.symm is None:
        options.symm = symm_from_sca
    # Keyworded input for f2mtz, sent on standard input.
    script = """SYMM %s
    CELL %s
    skipline
    LABOUT H K L FP FOM PHIS X
    CTYPOUT H H H F W P R
    FORMAT '(3f4.0,f11.2,f8.2,f8.1,f8.2)'
    END
    eof""" % (options.symm, cellparams)
    # The whole command line, arguments included, goes in the first list item.
    f2mtzargs = ["f2mtz hklin %s hklout %s " % (options.phs, options.mtzfile), "<<eof"]
    # universal_newlines=True lets communicate() take a str under Python 3 as well.
    a = subprocess.Popen(f2mtzargs, stdin=subprocess.PIPE, shell=True, universal_newlines=True)
    a.communicate(input=script)


if __name__ == "__main__":
    main()
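For comparison, here is a sketch of a variant that skips the shell altogether: every argument becomes its own list item, shell=True is dropped, and the keywords are sent without a trailing eof because no here-document is involved. The function names build_f2mtz_call and run_f2mtz are hypothetical, and I have not run this against f2mtz itself, so treat it as an assumption about how the program behaves rather than a tested recipe.

```python
import subprocess


def build_f2mtz_call(phs, mtz, symm, cell):
    """Return (argv, keywords) for an f2mtz run; nothing is executed here."""
    # Arguments: one list item per word, exactly as the shell would split them.
    argv = ["f2mtz", "hklin", phs, "hklout", mtz]
    # Keywords: the text f2mtz reads from standard input, ending at END.
    keywords = "\n".join([
        "SYMM %s" % symm,
        "CELL %s" % cell,
        "skipline",
        "LABOUT H K L FP FOM PHIS X",
        "CTYPOUT H H H F W P R",
        "FORMAT '(3f4.0,f11.2,f8.2,f8.1,f8.2)'",
        "END",
    ]) + "\n"
    return argv, keywords


def run_f2mtz(phs, mtz, symm, cell):
    """Run f2mtz without a shell (requires ccp4 to be installed and on PATH)."""
    argv, keywords = build_f2mtz_call(phs, mtz, symm, cell)
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, universal_newlines=True)
    proc.communicate(input=keywords)
    return proc.returncode


argv, keywords = build_f2mtz_call(
    "32_4run1.phs", "32_4run1.mtz", "p1",
    "79.814 102.639 153.627 82.465 75.531 73.450",
)
print(argv)
print(keywords)
```

Keeping the builder separate from the runner makes it possible to inspect the exact arguments and keywords before anything is launched.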

What does not work, and I don't know why:

In the above program the following versions of the command do not work

f2mtzargs = ["f2mtz ", "hklin %s" % options.phs, "hklout %s" % options.mtzfile, "<<eof"]

f2mtzargs = ["f2mtz hklin %s hklout %s <<eof" % (options.phs, options.mtzfile)]

Also, using communicate to convey both arguments and keywords does not work, because almost all the ccp4 programs expect HKLIN and HKLOUT to be provided as command-line arguments, not as keywords after the program has launched.
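A possible explanation, going by the subprocess documentation for POSIX systems: when shell=True is combined with a list, only the first item is the command string, and any further items become arguments to the shell itself ($0, $1, ...), not to the command. If that applies here, the first failing version runs a bare f2mtz with no arguments. The second one presumably makes the shell open a real here-document with no body, which would replace the pipe that communicate writes to. The sketch below shows the list behaviour with echo as a stand-in. It assumes a POSIX system with /bin/sh, and shell_sees is a hypothetical helper name.

```python
import subprocess


def shell_sees(args):
    """Run args with shell=True and return the command's output, stripped."""
    proc = subprocess.Popen(
        args,
        shell=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    out, _ = proc.communicate()
    return out.strip()


# Extra list items land in the shell's $0 and $1, not on echo's command line.
as_shell_args = shell_sees(["echo zero=$0 one=$1", "hklin", "foo.phs"])
print(as_shell_args)

# Same shape as the first failing f2mtz call: echo runs with no arguments,
# so it prints only an empty line.
bare = shell_sees(["echo", "hklin foo.phs"])
print(repr(bare))
```

Read this way, the working version succeeds because the whole command line sits in the first list item, while the trailing "<<eof" item is just a shell argument that nothing uses.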

 
