Below are the instructions for the Rowland NMR Toolkit script generator, along with
general information about NMR data processing. For additional details see the
official Rowland NMR Toolkit manual.
- Introduction:
- The Rowland NMR Toolkit Script generator is designed to aid in the creation of RNMRTK scripts for the processing of 2- and 3-dimensional NMR data collected using States or States-TPPI for quadrature detection. It also handles sensitivity-enhanced data without any prior manipulation.
- The script generator is still under development. While I have attempted to eliminate as many bugs as possible, I cannot test all possible
scenarios. If you find any bugs in the script generator please let me know and I will attempt to fix them as soon as possible (markm@neuron.uchc.edu). The program at this point does some very basic error checking when executed. If any of the parameters you have selected are incompatible, an error message should appear with suggestions on how to fix the problem. However, the checks are not rigorous and there are sure to be cases where incorrectly entered parameters are not caught by the error checking. If you find any additional error checks you would like included please let me know.
Many of the comments made here come from the book "NMR Data Processing" by Jeffrey C. Hoch and Alan S. Stern (1996).
- Usage:
- After creating a script using the RNMRTK script generator copy the script to your favorite text editor.
Save the file to any name you like (here I will use process.com). Make the file executable by typing chmod u+x
process.com, or whatever you called the script. Then simply execute the script by typing ./process.com, or
whatever name you used. I often use the tee command to generate a log file of the run: "./process.com | tee process.log".
Note that on OSX systems I find it best to use vi as the text editor as it handles copy/pastes from OSX browsers
without causing issues with hidden characters and premature line breaks.
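For a typical run the steps above amount to the following commands, assuming the script was saved as process.com (substitute your own filename):
- chmod u+x process.com
- ./process.com | tee process.log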
- Enter parameter string / Load Parameters:
- The Enter parameter string text box allows one to type in any parameter directly to set its value. For example if you
want to set the number of cpu's to 4 you can type nproc=4 in the text box and
hit the "Load Parameter" button. You can enter multiple values to be set at once
by separating them with a comma, but with no spaces. For example "nproc=2,machine=V,proc_type2=ft"
will set the number of cpus to 2, the machine type to Varian, and the t1
processing type to FT.
Of course this method is probably more laborious than simply filling out the
form directly. However, I have created macros/scripts that will generate an
input string which may be copied and pasted into the text box to rapidly fill
out the form.
For Varian instruments I have created a macro called varian2sbtools and a
minor modification to the BPsvf macro which calls varian2sbtools upon a save.
These macros create a file called sbtools.input which is the parameter string to
be copied into the text box.
For Bruker data sets I have created a perl script called bruker2sbtools.prl
which is run from the directory of the saved data. Likewise this script creates
a file called sbtools.input which is the parameter string to be copied into the
text box.
All of the scripts are available from the
downloads page.
These macros / scripts also create additional files, macros, etc. to aid in data
processing. See the scripts themselves for more details.
- Dimension:
- Toggle to switch between 2-dimensional and 3-dimensional data. When the 2-dimensional
toggle is selected only the first two columns will appear (direct dimension and t1:first indirect).
When the 3-dimensional toggle is selected all three columns will appear (direct dimension, t1:first indirect, and t2:second indirect).
- CPU's:
- Pull down menu that allows you to choose how many CPU's to use in the
calculation. This feature is only utilized when performing Maximum Entropy
reconstructions in two dimensions (msa2d). The script generator will divide a 3D
data set into chunks of 2D planes to reduce the overall memory requirements of
the MSA2D program. The number of planes in each chunk will be equal to the
number of CPU's selected. An environment variable MP_SET_NUMTHREADS is set
which tells the msa2d program to use multiple threads (cpu's). Note that the
number of cpu's chosen can exceed the number of actual cpu's in your computer,
as each cpu can run multiple threads simultaneously; each thread will just take
longer to run because the threads share the system resources.
The cluster
option allows an msa2d calculation to be distributed over a cluster of
computers. A cluster is defined as any group of computers to which you can connect
via ssh without the use of a password and on which you have access to the same
"home" folder, likely through an NFS mount. Currently each of the computers
should be able to call the same binary msa2d program. However, minor changes to
the scripts could possibly allow distribution over different hardware platforms.
The CPU value determines how large a chunk to send to each of the cluster nodes.
The cluster scripts provide load balancing so that computers of different
capabilities may be mixed easily with no extra work. In internal testing I have
found no benefit to setting the CPU value beyond the number of CPU's actually
present in the cluster computers; choosing a larger chunk size mainly reduces
the total I/O cost of making many ssh connections and performing many read/write
steps. In order to get the cluster option to function three files are needed:
cluster-msa2d, msa2d-process, and a computer_list. The computer_list will need
to be created for your cluster. The two scripts and an example computer_list
file can be downloaded from the
downloads page. The
two scripts will need to have minor modifications of defined paths when they are
installed and they should be placed somewhere in the path (the exe folder of the
rnmrtk program is a good spot).
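As a rough sketch of what the generated script does to enable multithreading (assuming a C-shell style script and 4 CPU's selected; the exact lines written by the generator may differ), the environment variable is simply set before msa2d is called:
- setenv MP_SET_NUMTHREADS 4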
- Sampling:
- Dialog box for selecting how the indirect dimensions of the data set were collected. Uniform means that the time increments in the indirect dimensions were collected with uniform time spacing (typical manner of collecting data). Non-uniform means that the indirect time points were taken from a file and could be non-uniform in their spacing. If
non-uniform is selected a filename with the sampling schedule needs to be
entered and the file must be in the current directory where the script will be
executed from.
A "Use sample schedule" checkbox has been added. When checked a
sample schedule is used and the filename must be provided. If the data was
collected with non-uniform sampling then this checkbox must be selected. If the
data was collected linearly and the "Use sample schedule" checkbox is checked
then the data will be processed non-uniformly. Only those points which are
present in the sample schedule will be kept; all others will be deleted. This
mode can be used for various purposes. One is to process uniformly sampled data
where a few of the FID's are corrupted due to an error such as an ADC overflow.
See the FAQ list for more details. Another purpose is to collect a large
complete uniform data set and then apply several different sampling schedules
for testing purposes. Lastly, it gives the user the ability to test processing
non-uniform data without the hassle of figuring out how to collect non-uniform
data. It is hoped that this type of test will convince the user that it is worth
collecting non-uniform data in order to obtain the highest resolution data in
the shortest amount of time.
Finally, a "Random Order?" checkbox has been added. If the sampling schedule
for 2D experiments is not in sequential order then this checkbox should be
checked. This checkbox has no effect for 3D experiments.
- Input:
- Filename of the raw NMR data. There are three checkboxes called Varian,
Bruker, and RNMRTK.
When the Varian checkbox is selected the input filename is
changed to fid by default. Varian data will be loaded using the loadvnmr command
built into the rnmrtk program. The procpar file is parsed by the loadvnmr
command to find all relevant information needed to load the data. For data
collected with non-uniform sampling the procpar file will be edited to have
additional information appended to the end of the file which will allow the
loadvnmr command to function without any further user intervention. For 3D
processing there is a button which allows the user to select between
phase2,phase and phase,phase2. For uniform sampling this setting is not used as
the loadvnmr command extracts the proper setting directly from the procpar file
itself. However, setting this value correctly for non-uniform sampling is
critical.
When
the Bruker checkbox is selected the input filename is changed to ser by default.
The data will be loaded with the generic rnmrtk load command. This load command
needs the raw filename to have an extension and there needs to be a parameter
file defining the layout of the raw data with the same name as the raw data but
with a .par extension. The script that the RNMRTK script generator creates will
automatically add a .dat extension to the ser filename and automatically create
a parameter file called ser.par. No further intervention should be needed by the
user. For data sets collected with digital filtering you may enter the number of
points to left shift your data to remove the "odd" beginning of the Bruker FID's.
Shifting left will allow proper processing of the data, but will introduce a
significant curvature of the baseline near the edge of the spectrum. This is
normally not an issue as long as the sweep width is large enough so that real
peaks are not at the edge of the spectrum. To determine the number of points to
shift left you can examine the data set or the following
table may be of use.
When the RNMRTK checkbox is selected the user must provide a raw NMR
filename along with an extension. It is expected that a parameter file with the
same base name as the raw NMR data is present which defines the layout of the
raw NMR data. See the load command from the rnmrtk program for further details (RNMRTK manual).
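As a minimal illustration, the load commands corresponding to the three checkboxes look like the commands used in the viewing recipes later on this page (here "start 1 num 1" loads only the first fid; the generated script loads the full data set):
- rnmrtk loadvnmr ./fid start 1 num 1 - Varian data, parameters parsed from procpar
- rnmrtk load ./ser.dat start 1 num 1 - Bruker data renamed to ser.dat, with a ser.par parameter file present
- rnmrtk load ./mydata.dat start 1 num 1 - RNMRTK input, where mydata.dat is a hypothetical name and a matching mydata.par file must exist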
- Big/Little Endian:
- The big-endian little-endian checkbox only needs to be selected for Bruker data. In general if you have an SGI computer
which controls your system you will select little-endian. For all other computer types big-endian needs to be selected.
- Output:
- Name of the final transformed data set. The data set will automatically be saved in RNMRTK format with a .sec extension.
A parameter file of the same base name will also be created with a .par
extension which defines the layout of the rnmrtk file format. If additional file formats are selected the same base name will be used and the appropriate extension will be added.
- Output Format - For security reasons only letters, numbers, and underscores are allowed in valid filenames. If you need to use other character types, simply use a text editor to rename the files after the script is generated.
- Processing Type:
- Sets the processing type to none, FT, or maximum entropy reconstruction. None cannot be selected for the acquisition dimension, but can be selected for the indirect dimensions. For 3D data sets the only option for the acquisition dimension is FT
as the msa2d program can only currently reconstruct two dimensions at a time. A
3D version is being created. Data collected with uniform time spacing (typical
data collection) can be processed with either FT or maximum entropy
reconstruction. Data collected non-uniformly must be processed with maximum
entropy reconstruction. Light blue areas must be filled out for both FT and maximum entropy options. Dark blue areas must be filled out for a given column if the processing type is set to FT. The grey area must be filled out if maximum entropy
reconstruction is selected in any dimension. Advantages of using maximum entropy
reconstruction over FT include more robust results as compared to
linear prediction, the ability to de-convolve natural line-width and J-couplings
to get significant increases in resolution without compromising the S/N, and the
ability to use sparse data sets, which allow much shorter
experiments without compromising sensitivity and which can achieve higher
resolution.
- Number of Data Points:
- For each column in the form page there are three values: total, real, and
imag. Once one value is entered the other two should be filled out automatically
by JavaScript. For Varian instruments the first column is represented by np, the
second column ni, and the third column ni2. For Bruker data the first column is
represented by TD from acqus, the second column TD from acqu2s, and the third
column TD from acqu3s.
- total - The total number of points in each free induction decay (fid). Each fid consists of 1/2 real and 1/2 imaginary points.
Varian and Bruker both define np and TD the same for the acquisition dimensions
and the value for total should be equal to np (Varian) or TD (Bruker). For
indirect dimensions Bruker uses the same convention: the TD values from
acqu2s and acqu3s are the total number of fids (reals + imaginaries). Varian
defines the ni and ni2 values as complex points, and therefore total for the
indirect dimensions for Varian should be equal to ni*2 and ni2*2.
- real - Number of real points collected in each of the dimensions.
For Varian the acquisition dimension should be equal to 1/2*np, and for the
indirect dimensions it should be equal to ni and ni2. For Bruker all three
should be equal to the appropriate TD/2.
- imag - Number of imaginary points collected in each of the
dimensions. For Varian the acquisition dimension should be equal to 1/2*np, and
for the indirect dimensions it should be equal to ni and ni2. For Bruker all
three should be equal to the appropriate TD/2.
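A quick worked example of the conventions above: for a Varian 2D data set collected with np = 1024 and ni = 64, the acquisition column would be total = 1024, real = 512, imag = 512, and the t1 column would be total = 128 (ni*2), real = 64, imag = 64. The equivalent Bruker data set, with TD = 1024 in acqus and TD = 128 in acqu2s, gives the same six values.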
- Output Data Size:
- The output data size sets the final REAL data set size along all
dimensions and controls the number of zero-fills used to process the data. For
Fourier transformed data, if the output data size for any given dimension is
larger than the number of complex points collected or linear predicted then the
data will be zero filled to the size of the output data size for that dimension.
Note that for the acquisition dimension if "Extract Left Half of Spectrum?" is
selected the zero-fill size will be doubled as it will need to be cut in half to
give the final output data set size selected.
Zero-filling extends an fid by appending zeros to the end. This causes a
slight increase in the digital resolution of the frequency domain data after
Fourier transformation and allows imaginary data to be reconstructed using a
Hilbert Fourier transformation. For data which is being processed by maximum
entropy reconstruction, the data set will be reconstructed to the size selected
in the output data size for each dimension. Note that if the output data size
for any given dimension is larger than the number of complex points collected in
that dimension then the data will be "predicted" to the final data set size, much
like linear prediction. However, maximum entropy reconstruction
should give superior results to linear prediction when done properly.
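As a worked example (sizes chosen purely for illustration): with 64 complex t1 points collected and an output data size of 256, the FT route zero-fills the (possibly linear predicted) t1 fid so that the final real size of that dimension is 256 points, while the maximum entropy route reconstructs the dimension directly to 256 points. In the acquisition dimension, an output data size of 1024 with "Extract Left Half of Spectrum?" selected means the data is first zero-filled to 2048 points so that the retained left half is the requested 1024 points.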
- Acquisition Type:
- Selects the type of data you have collected in each dimension. Currently for the acquisition dimension the only choice is complex. For the two indirect dimensions the current choices are States, States-TPPI, and sensitivity enhanced data. There is no reason to rearrange sensitivity enhanced data before processing.
It is assumed that sensitivity enhanced data is collected as either States or
States-TPPI.
- Quadrature Fix:
- The R, I, -R, -I option for the quadrature fix will multiply every other real/imaginary pair along a given dimension
by negative one. This is completely analogous to the -ALT flag that nmrPipe uses
for the FT command. This will cause a reordering of the signals along the given
dimension and is often needed when processing Bruker data sets. In principle
other combinations for QUADFIX may be used such as R, -I which would multiply
all the imaginaries by negative one (conjugate) and cause a reversal of the
spectrum along that dimension. At this point reversals are handled by a separate
reverse switch on the script generator. Other fixes will be added later as
needed.
- Reverse Spectrum:
- Often the indirect dimensions of 2-dimensional and 3-dimensional NMR experiments are reversed.
In some cases the acquisition dimension can also be reversed. This can be fixed by changing the phase of the
receiver during detection, but it is easier to reverse the fid during processing. The acquisition dimension is
reversed by taking the complex conjugate of the fid during the initial importing of the data set.
Indirect dimensions are reversed using the reverse command.
- Delete Imaginaries?:
- Check these boxes if you would like the imaginaries deleted during
processing. Note that once you know your phase parameters the imaginaries can be
safely deleted. If you are having trouble with memory size make sure the
imaginaries are being deleted and check your output data set size.
Note that when converting to other analysis packages such as XEASY, Sparky,
NMRPipe, etc. the imaginaries are automatically deleted even when the delete
imaginaries checkboxes are not selected as these packages do not accept
imaginary data. However, the RNMRTK formatted output file will have the
imaginaries saved in this case.
In order to reduce processing time it is important to delete the imaginaries
in the acquisition dimension when processing the two indirect dimensions with
maximum entropy reconstruction.
- Time Domain Convolution:
- Time domain convolution is a very effective method to remove large solvent signals, such as residual water, from your spectra. In this script generator you may only select a single
frequency to subtract. However, you can simply edit the script after creation to add additional sstdc command lines with different frequencies to subtract more than one signal. The Defaults button
resets the Time Domain Convolution parameters back to their default values.
- Filter Width - The filter width is inversely proportional to the size of the window to suppress
and depends on the line-width of the signal you are removing and the number of points in the fid. The smaller the value, the greater the amount of signal that is subtracted and the faster the calculation. For experiments where no signals overlap the solvent signal you generally want to use a small filter width, such as 16. For experiments where you have resonances close to the solvent signal it is generally best to try larger values (~60) and to test several different window sizes to find the best compromise between subtracting the solvent and leaving your signals
unaffected.
- EndPoints - The endpoints parameter sets the number of points at the beginning and end of the fid to be treated by extrapolation rather than convolution.
- Filter Shape - The filter shape can be set to Gaussian (default) or Cosine.
- Frequency - Sets the frequency of the signal to be suppressed. By default the signal to be suppressed is at zero frequency (the center of the spectrum); however, any frequency may be entered.
- Remove Diagonal - For 2D data sets the remove diagonal checkbox can be selected to use the solvent suppression
command sstdc to remove the diagonal.
- Correct First Point:
- This setting allows you to correct the initial point(s) of the acquisition fid by backward linear prediction.
The first few points of the acquisition fid are often responsible for causing baseline rolls after
Fourier transformation. The initial points of an fid are often collected incorrectly and contain large amounts of noise because the data is usually collected only a few usec after applying high power RF pulses to the probe coil, and because of other imperfections in the probe itself. These points can be corrected by backward linear prediction. If linear predict first point(s) is selected then the initial point or points are predicted. This replaces the existing points; it does not add additional points to the beginning of the fid.
- Predict - The value "predict" selects the number of points to predict. Generally a value of 1
or 2 is chosen.
- Points - The value "points" selects the number of points to use in the prediction.
- Coef - The value "coef" selects the number of coefficients to use in calculation. Coef should not exceed 50% of "points".
- Scale First Point and DC Offset:
- Scale - If Scale First Point is selected then the value scale is
multiplied by the initial point of the fid. This applies for all dimensions. For
any given dimension the value of scale should typically be 0.5 assuming that the
first order phase correction is zero. For cases where the first order phase
correction is not zero then scale is typically set to 1.
- DC Offset - Often the tail end of an fid does not equal zero because the entire fid is either shifted up or down slightly from the zero point. This is referred to as a DC offset. If an fid with a DC offset is
Fourier transformed a spike at zero frequency will appear. Worse, if the fid is zero-filled, the appended zeros will not extend from the last point of the fid but rather will be offset from the last point. This will be interpreted as a truncation artifact when performing a
Fourier transformation and cause wiggles at the base of peaks. Both of these adverse effects can be removed by simply adjusting the fid up or down so that the center of the fid is near zero. This feature can be turned on and off by adjusting the DC Offset switch. It is only available in the acquisition dimension.
- Linear Prediction:
- Linear prediction extrapolates additional data points from the time-domain data (fid). Linear prediction can be an effective way to increase the number of data points, and hence the resolution, for data sets that are truncated. Often in 3-dimensional data sets you set ni and ni2 small to save acquisition time even though the signal has not decayed away to zero. Using linear prediction to extend the fid in these cases can improve the resolution considerably.
- Maximum entropy reconstruction can be used as a replacement for linear
prediction and Fourier transformation for both linear and non-linearly collected
data.
Linear prediction should only be used when the signal you are trying to predict has not decayed completely away to zero. If the signal has already decayed to zero then linear predicting further data points will generally add noise and not improve resolution. It is therefore best to always process data without any linear prediction and then compare the spectra to one with linear prediction. It is also best to try different linear prediction parameters and compare them to see which works the best. Remember processing parameters can have huge effects on the quality of the data.
For experiments that have dimensions that were collected with constant time evolution it is generally best to use mirror-image linear prediction. See the readme file of the pulse sequence if you are unsure if any of the dimensions were collected with constant time evolution. In general, the mirror-image linear prediction algorithm will give superior results and is faster to perform.
Linear prediction works best when the signal is strong, truncated, and there are as few peaks as possible to predict. Because of this last feature it is best in 3-dimensional data sets, where both the t1 and t2 dimensions are to be linear predicted, to
Fourier transform the acquisition dimension first, then transform the t1 dimension without linear prediction and then process the t2 dimension with linear prediction. Afterwards the f1 dimension can be inverse
Fourier transformed, linear predicted, and then re-transformed. All of this is built into the script generator and takes no extra work on your part.
For 3-dimensional data sets where only one of the indirect dimensions will be
linear predicted the script generator automatically processes the dimension to
be linear predicted last.
- Linear Prediction Type - The choices are none, forward, mirror-image (0,0) and mirror-image (90, -180). For dimensions without constant time
evolution use forward linear prediction. For cases where the dimension was collected with constant time evolution it is best to use mirror-image linear prediction. However, forward linear prediction may be used on constant time data. For mirror image linear prediction use the (90, -180)
setting when the initial point was collected at half dwell time. This is generally set by the f1180 and f2180 flags in Varian pulse sequences.
- Predict - The number of data points to predict. The larger the value the better the resolution will become
up to the natural decay point of the fid, but at the expense of extra noise
appearing in the final spectra. Like most processing parameters it is best to try different values to see which one works the best. For cases where the signal is weak or the truncation effect is minimal it is best to use a smaller value for predict and for cases where there is plenty of signal and the truncation effect is large use a larger value for predict. Note that for protein work we generally do not make predict greater than ni, but for small molecules or peptide work you may be able to make predict quite a bit larger with beneficial results.
- Coefficients - Coefficients sets the number of sinusoids (signals) that will be
predicted. Coef must be less than 50% of the number of points used in the prediction. If
coef is set high the calculation time will be longer, and additional noise may appear if coef is much larger than the number of
sinusoids in any given slice being linear predicted. If coef is set lower than the number of
sinusoids being predicted in any given slice then different peaks will be fit with the same parameters, leading to
frequency shifts of the peaks. It is therefore very important to have coef set
appropriately for your data set to get optimal results and not lead to any frequency errors in your spectrum.
- Points - Points sets the actual number of collected points to be used in the prediction. This value is generally set to match ni or ni2 for t1 and t2 dimensions, respectively.
By default the values are changed automatically on the form when ni and ni2 are
entered.
- Nextrap - Nextrap sets the point from which the linear prediction will proceed from. For example if ni = 32 and nextrap is set to 32 then points will be predicted from 33 onward. Typically this value is set equal to ni or ni2 for t1 and t2 dimensions, respectively.
By default the values are changed automatically on the form when ni and ni2 are
entered.
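A worked example for a t1 dimension collected with ni = 64: the form sets points = 64 and nextrap = 64 automatically, predict might be set to 64 (doubling the dimension while staying within the "not greater than ni" guideline for protein work), and coef must then be less than 32 (50% of points), ideally close to the number of signals expected in a given t1 slice.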
- First Apodization (Window Functions):
- Rarely does Fourier transformation of the fid give rise to good quality spectra. There are often problems with the final result such as truncation artifacts, low signal to noise, or limited resolution. Apodization is the process in which the spectrum is convolved to achieve a more satisfactory line-shape. This is done by multiplying the fid by a time domain filter function. Two common functions are the sine bell and the Gaussian. The idea is to multiply the fid by a function so that it always decays away to zero at the end. This removes truncation artifacts that would otherwise give rise to wiggles along the baseline near peaks,
especially strong signals. For fids that are severely truncated this can lead to noise that stretches across the entire spectra.
The strength of the signal will determine the type of function that you will want to apply. Typically one tries to increase the resolution as far as possible while keeping noise to a minimum. For some spectra that are very noisy the only thing that can be done is to decrease the noise at the expense of resolution. The initial few points of an fid are responsible for most of the signal to noise that you get. The stronger the initial part of the fid the weaker the noise will appear. The longer the fid "rings out" the higher the resolution will be. Therefore, increasing the initial part of the fid will lead to good signal to noise but poor resolution while enhancing the late parts of the fid will give better resolution but add noise. It is up to you to try and decide which function will give the most desirable effects. Often it is good to have two processed spectra, one with good signal to noise and one with good resolution. That way you can have the best of both worlds and will not have to compromise.
- Function - The functions to choose from include none, Gaussian,
sine bell, and sine bell squared. I suggest trying each of them initially to find which one gives the best results.
Sine bell squared functions are most common in the indirect dimensions, and
Gaussian functions are used primarily in the acquisition dimension. Note that when processing a given dimension with maximum entropy reconstruction no apodization is applied to that dimension. Apodization is only used when data is Fourier transformed.
- Shift - Shift is used to determine the shift of the sine bell functions. It is not used for the
Gaussian functions and can be ignored. The shift is entered in degrees. A shift of 90 gives a pure cosine function and a shift of 0 gives a pure
sine bell function. Small values give increased resolution at the expense of extra noise and values
near 90 give good signal to noise at the expense of resolution.
- lb - Gaussian peaks have narrower line widths than lorentzian line-shapes, especially near the base of the peak. However, NMR signals have lorentzian line-shapes. The
Gaussian window function converts the lorentzian line into a Gaussian line by multiplying the signal by an exponential to cancel the decay of the fid, followed by applying a decreasing
Gaussian function to introduce a Gaussian decay and hence a Gaussian line-shape. A typical value for lb is 20. Note that many other NMR processing programs enter lb as a negative number, however the Rowland NMR Toolkit needs this value to be positive.
A negative value will create an exponentially rising function with disastrous results. The value for lb is very important and dramatically determines the shape of the
Gaussian function. lb is only used with a Gaussian window function and can be ignored for
sine bell functions.
- gc - gc is the Gaussian decay coefficient. Typically a value of 20% is used to give the resonances a
Gaussian line-shape. Note that many NMR processing packages enter this number as a decimal (Ex: 0.20), however, the
Rowland NMR Toolkit needs this value to be a percentage. Both lb and gc are dependent on the sweep width and the number of points in the fid. Because of this it is important to view the
Gaussian function first before applying it to make sure that it is doing what you think it is. gc is only used with a Gaussian window function and can be ignored for sine bell or sine bell squared functions.
- Size - Size represents the number of points that the window function will be applied to in the direct dimension. Typically you apply the window function to all of the real points (1/2*np). However, in cases where np was set too high the size variable allows you to select only part of the fid for transformation. Let's say that np = 2048, giving 1024 real and 1024 imaginary points. When viewing the 1024 real points of the fid you realize that the signal has decayed away by point 256. If you process the data using all 1024 real points you get a large amount of noise. If you chop the fid off after real point 512 and transform it you will get a reduction in the noise level and not diminish resolution significantly
or at all, as long as the signal truly has decayed away before the point in which you chopped the data. There is no size value for either of the indirect dimensions because
generally you want to use all of the points in the transformation. For the indirect dimension size is set automatically to ni (ni2), or in the case of linear prediction it is set equal to the last point predicted.
Size is set automatically on the form when np is entered.
- Viewing Window Functions from the Macro Generator - Soon there will be a view button located from within the script generator form page that will display the window function for each dimension based on the selected parameters.
Until that time the following recipe can be used for viewing the function using
the RNMRTK program seepln.
- Enter the following from the command line:
- section -c 32000
- rnmrtk loadvnmr ./fid start 1 num 1 - For Varian data
- rnmrtk load ./ser.dat start 1 num 1 - For Bruker data. Note that a parameter file must have already been created.
- rnmrtk dim t1
- rnmrtk ones 1
- rnmrtk FUNCTION: For example GM 20. 20.
- seepln
- section -d
- Second Apodization:
- This allows the fid to be multiplied by a second window function. Currently the only choice is exponential multiplication (em). EM multiplies the data in the work space by an exponential window. This apodization function is used to reduce noise at the expense of spectral resolution. EM may be used alone (by setting the 1st apodization to none) or in conjunction with other window functions. EM is dependent on the sweep width and
the number of points in the fid. Because of this, always view the window function before transformation to be sure you know what you are applying. For instance, applying em 5 to a fid with 1024 points will be significantly different than applying em 5 to a fid with 256 points. Typically it is not beneficial to apply em unless the spectrum is very noisy. In these cases it can be used quite effectively to help locate weak peaks hidden under the noise. However, this is done at the expense of resolution.
- EM - No exponential multiplication is applied when em is set to 0. The larger the value of em the faster the exponential decay that is applied giving reduced noise but poorer resolution.
- Phasing:
- Applies a phase correction to the frequency domain data. If both the zero order and first order phase correction values are zero no phase correction will be performed. It is typical to process the data with no phase corrections, phase the
spectrum in nmrDraw, seepln, or your favorite visualization tool and then reprocess the data with the phase correction applied. To speed things up first process the data without linear prediction
and do a minimal number of zero-fills. After you have determined the phases then go ahead and add all the extra bells and whistles to the processing script. Note that things such as window functions and linear prediction will not affect phase parameters.
NOTES: The Rowland NMR Toolkit uses a phase convention with the opposite sign of that used by nmrPipe / nmrDraw and VNMR. When using nmrDraw to phase your spectra be sure to use the same value but with the opposite sign. It is sometimes necessary to apply an additional 90 degree phase correction to data that has been collected with sensitivity enhancement. This has been built into the scripts as separate phase commands in addition to the general phasing commands. For Varian data this always seems to be necessary. However, I have not tested a large number of Bruker data sets collected with sensitivity enhancement to know if the same holds true for Bruker data. Please let me know if Bruker data should be handled differently and I will make the appropriate changes to the script generator.
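A small worked example of the sign convention (values chosen purely for illustration): if phasing in nmrDraw gives a zero order correction of 35 degrees and a first order correction of -120 degrees, enter -35 and 120 in the script generator.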
- Directions for phasing the initial FID in the RNMRTK program seepln
- Enter the following from the command line:
- section -c 32000
- rnmrtk loadvnmr ./fid start 1 num 1 - For Varian data
- rnmrtk load ./ser.dat start 1 num 1 - For Bruker data. Note that a parameter file must have already been created.
- rnmrtk dim t1
- rnmrtk fft
- seepln
- Use the number keys 1-4 and 6-9 to phase and note final value.
- section -d
- Referencing Parameters:
- Referencing information is not only critical for obtaining the correct positions
of signals, but also for the proper importing of data and for the proper
operation of any processing command that utilizes the sweep width or
spectrometer frequency, such as the EM and Gaussian apodization functions.
- Sweep Width - sw, sw1 and sw2 are the sweep widths for each of the
three dimensions.
- Spectrometer Frequency - sfrq1, sfrq2, and sfrq3 are the frequencies used in each of the three dimensions. The frequency used should be the frequency at zero ppm.
- Reference - ref1, ref2, and ref3 are the reference ppm at the center of each of the three dimensions. This program assumes that the reference value is set to the center of the spectra. Let me know if you would like to be able to select a reference point manually and I can edit the script generator.
- Nucleus - The nucleus that is detected in each of the three dimensions. These values are for display purposes only and do not affect the referencing in any way.
Notes: Once you have the spectrometer frequency set for the acquisition
dimension the form page will automatically set the correct spectrometer
frequency for indirect dimensions when the nucleus is selected. However, this
only works on a refresh so be careful.
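As a worked example of obtaining the frequency at zero ppm (numbers made up purely for illustration; the scripts described below are intended to handle this bookkeeping when given the center ppm value): sfrq (0 ppm) = carrier frequency / (1 + carrier position in ppm * 1e-6). For a 1H carrier at 599.7768 MHz placed at 4.772 ppm this gives sfrq1 = 599.7768 / (1 + 4.772e-6), or about 599.7739 MHz.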
For Varian data we often utilize two scripts / macros to aid in setting up
experiments and obtaining processing parameters. The first one is called
setcar which is a Varian macro. To run the
macro place it in your vnmrsys/macro folder and then type setcar from the VNMR
command line. The program will prompt you for the ppm value for the center
position of the acquisition dimension. You can then enter "n" to have the
program display the referencing information for the other nuclei or select a
transmitter channel. If you select a transmitter channel you are prompted to
enter the ppm value you want the center of that particular nuclei to be set to.
This will change dof, dof2, or dof3 appropriately.
The other program is called procpar.prl and is a
perl script that parses a procpar file and extracts all the information needed to process the data set.
To use the program place the procpar.prl script in your path and type
procpar.prl procpar 4.772 where 4.772 is the center ppm value of the acquisition
dimension.
We also now have a new Varian macro, varian2sbtools,
which works in conjunction with a slightly modified BPsvf command.
When installed, the BPsvf command will create an extra folder called scripts. Inside the scripts folder
will be auto-generated rnmrtk and nmrPipe processing scripts, along with an sbtools parameter line which
can be copied into the "Enter parameter string" text box at the top of the form
to rapidly fill out the form with the appropriate values. A perl script called
bruker2sbtools.prl has also been created which does very similar things.
- Maximum Entropy Parameters:
- If maximum entropy is selected for any dimension in the Processing Type?
section then the Maximum Entropy Parameter section needs to be filled out. There
are many dialog boxes to choose from, however, in many cases very few if any
need to be entered by the user. For each choice under Maximum Entropy Algorithm
the items that may need to be entered are shown in parentheses. In all cases
where data was collected with linear sampling the nuse values need to be entered
for each indirect dimension that will be processed with maximum entropy
reconstruction. nuse is not used when data was collected with non-uniform
sampling. Note that nuse values on the form page are changed automatically when
ni or ni2 values are entered. In most cases these default values will be used.
- Auto with NO Deconvolution - When the Auto with No Deconvolution option is chosen, values for def and aim are determined automatically by the program noisecalc. If
"Use separate noise fid" is NOT checked then noisecalc will analyze the fid
collected with the largest time increment (the one that should have the weakest signal) and determine the magnitude of the noise in the spectra. This analysis will be used to determine initial values for def and aim. The noisecalc
program will also analyze the FID with the shortest time increment (the one that should have the
strongest signal - usually the first increment) to determine the position of the 10 largest
signals which are outside the solvent region. Preliminary maximum entropy runs for slices (1D reconstructions) or planes (2D reconstructions) are performed and the
10 values of lambda determined after convergence for each slice/plane are averaged. The entire spectrum is then reprocessed in constant lambda mode utilizing the average lambda value
from the preliminary runs. Note that when maximum entropy reconstruction is
applied in both dimensions (msa2d) of a 2D data set and auto mode is selected
the msa2d program simply runs once in constant aim mode and will converge with a
single lambda value. There is no need to re-run in this case under constant
lambda mode as a similar result would be obtained.
Separate Noise and
Signal FID's - If "Use separate noise fid" is selected
then a separate filename needs to be provided which is a 1D spectrum collected
identically to the data set being processed except that only one time increment
is selected (1D) and the experiment was altered in some manner to provide a
spectrum of pure noise. Shifting the offset frequency by at least 1/pw is
one option, and setting the transmitter power to a near-zero level is another.
In either case the script generator will utilize that spectrum to
analyze the noise and determine initial starting values for def and aim. If "Use
separate signal fid" is selected then a separate filename needs to be entered
for a 1D data set collected with an identical sweep width and number of points.
This 1D spectrum will then be used by noisecalc to find the location of the 10
strongest signals outside the noise region. This option is only truly useful for
experiments which give poor signal in the initial FID such as an HNCACB. In this
case the first FID of an HSQC would be appropriate to use. Under normal
conditions the use of "Auto Mode" WITHOUT separate noise or signal FID's
seems to be giving satisfactory results. It is expected that the relative path
from the script directory to the separate signal and noise FID's be provided on
the form page. For example, if you collected a separate noise fid on a Varian
and named the file noise.fid and you copied the noise.fid folder to the
directory where you were processing the data then you would enter noise.fid/fid
in the form text box.
- Auto with Deconvolution - Auto with deconvolution works
identically to Auto with NO Deconvolution so see instructions above. The only
difference is that J-coupling deconvolution and/or Line-width deconvolution may
be applied. J-coupling and line-width deconvolution settings are independent of
each other and need not be set similarly for each dimension. Values for
deconvolution only need to be selected for dimensions which will have maximum
entropy reconstruction applied to them. J-coupling deconvolution will attempt to
deconvolve a J-coupling of a given Hz value which must be entered. The coupling
to be removed must be chosen as in-phase or anti-phase. It appears that choosing
a J-coupling value slightly less than the theoretical coupling gives the best
results. J-coupling deconvolution is best used for experiments where decoupling
is not possible, the couplings to be removed are consistent in their value, and
where the J-coupling is large enough to cause splitting of the signals or
broadening. For line-width deconvolution a line-width must be entered and that
line-width will be deconvolved from all of the signals giving sharper lines
without altering the S/N significantly (assuming reasonable values are
selected). Choosing large line-width values to deconvolve will give sharper
signals, however, if the line-width value is set too large (especially if it
approaches the natural line-width of the signals) then distortions will appear.
It is also important when choosing to deconvolve line-width that you create a
large enough final output size for the reconstruction. If the final data set
size is too small and a large value for line-width deconvolution is chosen then
the resulting FID's may ring out strong to the end causing truncation artifacts
to appear. For both J-coupling and line-width deconvolution the number of
iterations that will be necessary for convergence will be much higher than when
NO deconvolution is used. You may find that the script errors with the message
that convergence was not reached. If this occurs examine the output of the
script (you may need to capture the output with > log or | tee log) to find the
step which is failing. The value for loops may need to be increased in some
situations beyond the default values to converge properly. The loops value can
simply be edited with your favorite text editor and the script re-run.
- AIM - In AIM mode you must enter values for def, aim, and nloops. nuse for each dimension processed by maximum entropy
must be entered if the data was collected
linearly. In constant aim mode J-coupling and line-width deconvolution may be
applied in any combination for dimensions being processed by maximum entropy
reconstruction. For 2D NMR experiments where maximum entropy reconstruction will only
be used in the indirect dimension, in constant aim mode the MSA program will
be run separately on each slice and each slice will therefore converge with a
different lambda value. The lambda value is a sort of "scaling factor" so each
slice will have a different scale. This will cause severe line-shape distortions
when viewing the entire 2D spectrum. In general for 2D cases where MSA will only
be performed in the indirect dimension the constant aim mode will only be used in
a trial fashion to choose an appropriate lambda value. The entire 2D spectra
will then be re-transformed in constant lambda mode. For 2D spectra where msa2d
will be utilized to perform maximum entropy reconstruction along both the direct
and indirect dimensions then constant aim mode is appropriate as the entire 2D
plane will be processed in one step and hence have a single lambda value after
convergence. No line-shape distortions will appear and there is no need to
re-run in constant lambda mode. For 3D data sets where maximum entropy
reconstruction is performed only along a single dimension (msa) or along both
indirect dimensions (msa2d) constant aim mode again is only used as trial runs
to find an appropriate lambda value. Once found the whole 3D spectrum can be
re-processed in constant lambda mode so no line-shape distortions appear. Values
for aim, lambda, and def can also be found automatically by using one of the two
auto functions.
- Constant Lambda - In Constant lambda mode values for def, lambda,
nloops, and nuse must be entered. Nuse only needs to be entered for dimensions
being processed by maximum entropy reconstruction and only for data that was
collected linearly. It is typical to process 1d slices (msa) or 2D planes
(msa2d) initially in constant aim mode to determine an appropriate value for
lambda and then to re-process the data in constant lambda mode to obtain the
final spectra. In constant lambda mode J-coupling and line-width deconvolution
may be applied in any combination for dimensions being processed by maximum
entropy reconstruction. Values for aim, lambda, and def can also be found
automatically by using one of the two auto functions.
- Def, Aim, Lambda, and NLoops - The parameters def, aim and lambda affect
the results of the maximum entropy reconstruction. There are two methods in
which maximum entropy reconstruction may be run. In constant aim mode values for
def and aim are input and the value lambda is adjusted by the program until a
convergence is met. In constant-lambda mode values for def and lambda are
entered and the program adjusts aim until convergence is met. In principle, aim should be
equal to the noise level in the data, which is not always easy to ascertain
beforehand. If aim is too large, the result is a flat featureless spectrum. If
aim is too small the result is a spectrum that resembles the zero-filled FT of
the data. For linear data a method for estimating the aim value is to process
the data with the FT without apodization or zero-filling. The noise level can
then be estimated by computing the root-mean-square value for a blank region of
the spectrum. The def value should be set to a value which is smaller than the
smallest significant feature of the spectrum that you want to keep, but larger
than the noise. If def is too large then the spectrum will resemble a
zero-filled FT spectrum. If def is set too low then the onset of nonlinearity
occurs in the range spanned by the noise and results in reconstructions with
spiky noise peaks. Since the non-linearity of a maximum entropy reconstruction
is dependent on def and lambda it is important that the final reconstruction be
performed in constant-lambda mode for 3D data sets where data is reconstructed
along two of the dimensions with msa2d or for 2D/3D data sets where only one
dimension is reconstructed with maximum entropy (msa). In constant aim mode each
slice (msa) or plane (msa2d) will have a different non-linearity and hence a
different scale which would cause severe distortions to the line-shape. In
principle constant-aim mode is used as a test to find a good value for lambda.
The one exception to this case is for 2D data sets being processed with msa2d.
In this case the entire spectrum is reconstructed in one go so there is no
reason to re-run in constant lambda mode. For situations where peak amplitudes
across spectra need to be compared, such as relaxation data sets, the final
reconstructions should always be done in constant-lambda mode to get proper
values of the integrals across all the spectra.
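As a rough numeric illustration of these guidelines (numbers invented purely for illustration): if the root-mean-square noise in a blank region of an FT-processed spectrum is about 5e4 and the weakest peaks you care about have amplitudes around 5e5, a reasonable starting point would be aim near 5e4 and def somewhere between the two, perhaps 1e5 to 2e5, adjusting from there if the reconstruction looks too flat (values too large) or too spiky (def too small).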
- Very Generic, NON-Mathematical, Description of how Maximum Entropy Reconstruction Works - The Fourier
transform converts data collected in the time domain to the frequency domain.
In order to accomplish this goal the time spacing between each point in the time
domain data (usually 1/sw) must be equally spaced and no points may be missing.
For the directly detected dimension this is typically not an issue as the entire
time domain data is usually collected in under 100 msec. However, for indirect
dimensions each time point can take from several seconds to several minutes to
collect depending on the amount of time averaging that needs to be performed.
This causes 3D and 4D experiments to often take days to collect, and the data is
usually truncated as it would simply take too long to collect enough points to
allow the signal to fully decay away. This leads to a reduction in the possible
resolution which could be obtained if a greater number of increments were
collected. When processing data with the Fourier transform one typically
attempts to predict out those points which were not collected by using linear
prediction. Maximum entropy reconstruction takes an "inverse" look at the
problem of converting data from the time domain to the frequency domain. Rather
than applying a FT directly to the time domain data, the frequency domain data
is "guessed". This "guessed" frequency domain data is then inverse Fourier
transformed back to the time domain and the "guessed" time domain data is
compared to the collected time domain data. Based on comparison between the two
time domain data sets a better guess of the frequency domain data can be
generated. The process is then repeated and the "guessed" time domain data will
be a closer match to the collected time domain data. After several rounds of
repeating the process a convergence will be achieved where the experimentally
collected time domain data matches the "guessed" frequency domain data after
inverse Fourier transformation. While it may appear that it may take a very long
time for convergence to occur, in practice convergence generally occurs in less
than 50 rounds for data with a significant number of signals and even quicker
for noise regions. One huge advantage of this "inverse" approach is that not all
of the data points need to be used. Assume that only 1/3 of the points in the
collected time domain data were actually collected. When the frequency domain
data is "guessed" and inverse Fourier transformed all of the points are present
in the "guessed" FID. However, the comparison to the collected FID is only
performed for those points which were collected. In this manner data with
missing points can be easily processed. In principle this method can be used
along multiple dimensions concurrently. In practice the Rowland NMR Toolkit only
has the ability to process along 1 dimension (msa) and 2 dimensions (msa2d) at a
time. For 3D data sets this is not a significant limitation as the acquisition
dimension can be processed by using the Fourier transform. For 4D data sets one
of the indirect dimensions needs to be collected uniformly and processed by
using the Fourier transform. An msa3d version is currently under development.
Values for def, aim, and lambda determine how the convergence proceeds and are
thus very critical to getting a proper convergence. The AUTO buttons in the
script generator have been setup so the scripts utilize a separate program (noisecalc)
to obtain proper values for def, aim and lambda without any user interaction.
Once determined, the user may wish to "tweak" the parameters slightly to either
obtain a spectrum with more emphasis on smaller signals or to obtain a cleaner
spectrum. The values for def, aim, and lambda used in AUTO mode are reported in
the log files for the user to view. One additional feature of maximum entropy
reconstruction is the ability to deconvolve line-width and J-couplings from the
final frequency domain data without the addition of noise or line-shape
distortions which occur when severe apodization functions are used to increase
resolution of spectra. The "guessed" frequency domain data is predicted with the
line-width or J-coupling removed and then before comparison to the
experimentally collected data set the line-width and J-coupling are multiplied
back into the "guessed" spectra. This feature allows one to narrow line-widths
by performing "virtual" decoupling or by reducing the line-width of all the
resonances. Note that for J-coupling deconvolution to work properly all the
J-couplings to be deconvolved must be similar. For line-width deconvolution it
is important that you do not deconvolve more line-width than exists in the peak
or severe distortions to the line-shape will occur and severe truncation
artifacts will appear. If reasonable values are applied these two features can
allow dramatically improved spectra over conventional processing schemes. When
processing data with line-width or J-coupling deconvolution being applied the
number of loops needed to obtain convergence is much higher. Therefore, nloops
must be set high (above 1000 typically) compared to 200-300 for a reconstruction
without deconvolution.
- Non-linearity of maximum entropy reconstructions - One feature of maximum entropy
reconstructions is that a non-linearity occurs where strong peaks are amplified
relative to small peaks and small peaks are squashed in amplitude. The values
def and lambda set the point at which this non-linearity occurs. It is therefore
important that def be set high enough so noise spikes are not amplified to look
like real signals and low enough so that weak real peaks are not squashed into the
noise. The Auto feature determines the values for def, aim, and lambda
automatically, and the user can then "tweak" these values if desired.
For many spectra the non-linearity of the data is not a significant issue, but
for spectra where you need precise integration of the signals a maximum entropy
reconstruction will pose a slight issue as the integrals will not be linearly
related. However, a simple way to overcome this issue is to synthetically inject
peaks of known parameters (frequency, line-width, amplitude) into the spectrum
in an area where no peaks exist so as to not cause any overlap. If several peaks
are injected with different amplitudes a calibration curve of peak amplitude
versus integral value can be measured. This calibration curve can then be used
to back correct the integral values of your peaks. This process has been done by
several groups and is robust enough to handle even the demanding integration
requirements of NMR relaxation experiments.
- Extract Left Half of Spectrum?
- For 15N edited spectra it is typical that only the amide resonances appear in the direct dimension. Since all of the amide resonances are downfield of water, which is typically the center of the spectra, there is no need to keep the right half of the spectra. For these cases it is best to cut the direct dimension in half. This saves disk space by 50%, decreases processing time 4 fold for 3-dimensional spectra, and allows for faster screen drawing during analysis. Updating of the referencing is handled by the program. At this point the cut dimension in half switch will only save the left half of the spectra. If there is a need I can edit the generator to allow selection of the right half of the spectra or allow the user to define a particular region to save. Let me know if either choice would be beneficial.
- Output File Formats
- By default the output will be saved in the Rowland NMR Toolkit format. This
file format contains two files. One is the processed NMR data in binary format
and the other is a file with the same base name and a .par extension which is a
small text file which defines the layout of the binary format. In addition to the
rnmrtk file format, which can be viewed with the programs contour and seepln, the data can also be converted to
nmrDraw/nmrPipe, Xeasy, Felix, nmrView, and Sparky formats. Two additional
file formats are available, and for the 3D nmrPipe formats the order of the
output dimensions can be selected, which can be useful for checking the phasing along all
three dimensions without the need to swap the order of the data. The conversion
to Xeasy, Felix, and nmrPipe file formats is built into the rnmrtk program.
Sparky and nmrView file formats are created from an nmrPipe file format as an
intermediate. Therefore nmrPipe must be installed and in the path for these
conversions to work. Also the pipe2ucsf and ucsfdata programs must be installed
for the Sparky conversion.
- Remove Temporary Files
- The script generator creates many temporary files when executed. These
temporary files can be either removed automatically or left in the folder. The
only difference to the scripts is whether the rm statements at the end of the
script are commented or uncommented. This allows for the user to easily change
the behavior of the script in regard to removing temporary files and gives the
user the complete rm command to be used later if desired to "clean up" a folder
from these temporary files. Certain files which are likely to be of value to the
user are not deleted automatically. However, these files are removed
automatically when the script is executed an additional time.
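In practice this just means the end of the generated script contains lines such as the following, commented out or not depending on the checkbox (the filenames here are hypothetical; the real script lists the actual temporary files it created):
- # rm -f proc_tmp1.sec proc_tmp1.par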