Below are the instructions for the Rowland NMR Toolkit script generator, along with
general information about NMR data processing. For additional details see the
official Rowland NMR Toolkit manual.
- Introduction:
- The Rowland NMR Toolkit Script generator is designed to aid in the creation of RNMRTK scripts for the processing of 2- and 3-dimensional NMR data collected using States or States-TPPI for quadrature detection. It also handles sensitivity-enhanced data without any prior manipulation.
- The script generator is still under development. While I have attempted to eliminate as many bugs as possible, I cannot test all possible
scenarios. If you find any bugs in the script generator please let me know and I will attempt to fix them as soon as possible (markm@neuron.uchc.edu). The program at this point does some very basic error checking when executed. If any of the parameters you have selected are incompatible, an error message should appear with suggestions on how to fix the problem. However, the checks are not rigorous and there are sure to be cases where incorrectly entered parameters are not caught by the error checking. If you find any additional error checks you would like included please let me know.
Many of the comments made here come from the book "NMR Data Processing" by Jeffrey C. Hoch and Alan S. Stern (1996).
- Usage:
- After creating a script using the RNMRTK script generator copy the script to your favorite text editor.
Save the file to any name you like (here I will use process.com). Make the file executable by typing chmod u+x
process.com, or whatever you called the script. Then simply execute the script by typing ./process.com, or
whatever name you used. I often use the tee command to generate a log file of the run: "./process.com | tee process.log".
Note that on OSX systems I find it best to use vi as the text editor as it handles copy/pastes from OSX browsers
without causing issues with hidden characters and premature line breaks.
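For a typical run the steps above amount to the following commands, assuming the script was saved as process.com (substitute your own filename):
- chmod u+x process.com
- ./process.com | tee process.log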
- Enter parameter string / Load Parameters:
- The Enter parameter string text box allows one to type in any parameter directly to set its value. For example if you
want to set the number of cpu's to 4 you can type nproc=4 in the text box and
hit the "Load Parameter" button. You can enter multiple values to be set at once
by separating them with a comma, but with no spaces. For example "nproc=2,machine=V,proc_type2=ft"
will set the number of cpus to 2, the machine type to Varian, and the t1
processing type to FT.
Of course this method is probably more laborious than simply filling out the
form directly. However, I have created macros/scripts that will generate an
input string which may be copied and pasted into the text box to rapidly fill
out the form.
For Varian instruments I have created a macro called varian2sbtools and a
minor modification to the BPsvf macro which calls varian2sbtools upon a save.
These macros create a file called sbtools.input which is the parameter string to
be copied into the text box.
For Bruker data sets I have created a perl script called bruker2sbtools.prl
which is run from the directory of the saved data. Likewise this script creates
a file called sbtools.input which is the parameter string to be copied into the
text box.
All of the scripts are available from the
downloads page.
These macros / scripts also create additional files, macros, etc. to aid in data
processing. See the scripts themselves for more details.
- Dimension:
- Toggle to switch between 2-dimensional and 3-dimensional data. When the 2-dimensional
toggle is selected only the first two columns will appear (direct dimension and t1:first indirect).
When the 3-dimensional toggle is selected all three columns will appear (direct dimension, t1:first indirect, and t2:second indirect).
- CPU's:
- Pull down menu that allows you to choose how many CPU's to use in the
calculation. This feature is only utilized when performing Maximum Entropy
reconstructions in two dimensions (msa2d). The script generator will divide a 3D
data set into chunks of 2D planes to reduce the overall memory requirements of
the MSA2D program. The number of planes in each chunk will be equal to the
number of CPU's selected. An environment variable MP_SET_NUMTHREADS is set
which tells the msa2d program to use multiple threads (cpu's). Note that the
number of cpu's chosen can exceed the number of actual cpu's in your computer,
as each cpu can run multiple threads simultaneously; each thread will just take
longer to run because the threads share the system resources.
The cluster
option allows an msa2d calculation to be distributed over a cluster of
computers. A cluster is defined as any group of computers to which you can connect
via ssh without the use of a password and on which you have access to the same
"home" folder, likely through an NFS mount. Currently each of the computers
should be able to call the same binary msa2d program. However, minor changes to
the scripts could possibly allow distribution over different hardware platforms.
The CPU value determines how large a chunk to send to each of the cluster nodes.
The cluster scripts provide load balancing so that computers of different
capabilities may be mixed easily with no extra work. In internal testing I have
found no benefit to setting the CPU value beyond the number of CPU's actually
present in the cluster computers; choosing a larger chunk size mainly reduces
the total I/O cost of making many ssh connections and performing many read/write
steps. In order to get the cluster option to function three files are needed:
cluster-msa2d, msa2d-process, and a computer_list. The computer_list will need
to be created for your cluster. The two scripts and an example computer_list
file can be downloaded from the
downloads page. The
two scripts will need to have minor modifications of defined paths when they are
installed and they should be placed somewhere in the path (the exe folder of the
rnmrtk program is a good spot).
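As a rough sketch of what the generated script does to enable multithreading (assuming a C-shell style script and 4 CPU's selected; the exact lines written by the generator may differ), the environment variable is simply set before msa2d is called:
- setenv MP_SET_NUMTHREADS 4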
- Sampling:
- Dialog box for selecting how the indirect dimensions of the data set were collected. Uniform means that the time increments in the indirect dimensions were collected with uniform time spacing (typical manner of collecting data). Non-uniform means that the indirect time points were taken from a file and could be non-uniform in their spacing. If
non-uniform is selected a filename with the sampling schedule needs to be
entered and the file must be in the current directory where the script will be
executed from.
A "Use sample schedule" checkbox has been added. When checked a
sample schedule is used and the filename must be provided. If the data was
collected with non-uniform sampling then this checkbox must be selected. If the
data was collected linearly and the "Use sample schedule" checkbox is checked
then the data will be processed non-uniformly. Only those points which are
present in the sample schedule will be kept; all others will be deleted. This
mode can be used for various purposes. One is to process uniformly sampled data
where a few of the FID's are corrupted due to an error such as an ADC overflow.
See the FAQ list for more details. Another purpose is to collect a large
complete uniform data set and then apply several different sampling schedules
for testing purposes. Lastly, it gives the user the ability to test processing
non-uniform data without the hassle of figuring out how to collect non-uniform
data. It is hoped that this type of test will convince the user that it is worth
collecting non-uniform data in order to obtain the highest resolution data in
the shortest amount of time.
Finally, a "Random Order?" checkbox has been added. If the sampling schedule
for 2D experiments is not in sequential order then this checkbox should be
checked. This checkbox has no effect for 3D experiments.
- Input:
- Filename of the raw NMR data. There are three checkboxes called Varian,
Bruker, and RNMRTK.
When the Varian checkbox is selected the input filename is
changed to fid by default. Varian data will be loaded using the loadvnmr command
built into the rnmrtk program. The procpar file is parsed by the loadvnmr
command to find all relevant information needed to load the data. For data
collected with non-uniform sampling the procpar file will be edited to have
additional information appended to the end of the file which will allow the
loadvnmr command to function without any further user intervention. For 3D
processing there is a button which allows the user to select between
phase2,phase and phase,phase2. For uniform sampling this setting is not used as
the loadvnmr command extracts the proper setting directly from the procpar file
itself. However, setting this value correctly for non-uniform sampling is
critical.
When
the Bruker checkbox is selected the input filename is changed to ser by default.
The data will be loaded with the generic rnmrtk load command. This load command
needs the raw filename to have an extension and there needs to be a parameter
file defining the layout of the raw data with the same name as the raw data but
with a .par extension. The script that the RNMRTK script generator creates will
automatically add a .dat extension to the ser filename and automatically create
a parameter file called ser.par. No further intervention should be needed by the
user. For data sets collected with digital filtering you may enter the number of
points to left shift your data to remove the "odd" beginning of the Bruker FID's.
Shifting left will allow proper processing of the data, but will introduce a
significant curvature of the baseline near the edge of the spectrum. This is
normally not an issue as long as the sweep width is large enough so that real
peaks are not at the edge of the spectrum. To determine the number of points to
shift left you can examine the data set or the following
table may be of use.
When the RNMRTK checkbox is selected the user must provide a raw NMR
filename along with an extension. It is expected that a parameter file with the
same base name as the raw NMR data is present which defines the layout of the
raw NMR data. See the load command from the rnmrtk program for further details (RNMRTK manual).
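As a minimal illustration, the load commands corresponding to the three checkboxes look like the commands used in the viewing recipes later on this page (here "start 1 num 1" loads only the first fid; the generated script loads the full data set):
- rnmrtk loadvnmr ./fid start 1 num 1 - Varian data, parameters parsed from procpar
- rnmrtk load ./ser.dat start 1 num 1 - Bruker data renamed to ser.dat, with a ser.par parameter file present
- rnmrtk load ./mydata.dat start 1 num 1 - RNMRTK input, where mydata.dat is a hypothetical name and a matching mydata.par file must exist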
- Big/Little Endian:
- The big-endian little-endian checkbox only needs to be selected for Bruker data. In general if you have an SGI computer
which controls your system you will select little-endian. For all other computer types big-endian needs to be selected.
- Output:
- Name of the final transformed data set. The data set will automatically be saved in RNMRTK format with a .sec extension.
A parameter file of the same base name will also be created with a .par
extension which defines the layout of the rnmrtk file format. If additional file formats are selected the same base name will be used and the appropriate extension will be added.
- Output Format - For security reasons only letters, numbers, and underscores are allowed in valid filenames. If you need to use other character types, simply use a text editor to rename the files after the script is generated.
- Processing Type:
- Sets the processing type to none, FT, or maximum entropy reconstruction. None cannot be selected for the acquisition dimension, but can be selected for the indirect dimensions. For 3D data sets the only option for the acquisition dimension is FT
as the msa2d program can only currently reconstruct two dimensions at a time. A
3D version is being created. Data collected with uniform time spacing (typical
data collection) can be processed with either FT or maximum entropy
reconstruction. Data collected non-uniformly must be processed with maximum
entropy reconstruction. Light blue areas must be filled out for both FT and maximum entropy options. Dark blue areas must be filled out for a given column if the processing type is set to FT. The grey area must be filled out if maximum entropy
reconstruction is selected in any dimension. Advantages of using maximum entropy
reconstruction over FT include more robust results as compared to
linear prediction, the ability to de-convolve natural line-width and J-couplings
to get significant increases in resolution without compromising the S/N, and the
ability to use sparse data sets, which allow much shorter
experiments without compromising sensitivity and which can achieve higher
resolution.
- Number of Data Points:
- For each column in the form page there are three values: total, real, and
imag. Once one value is entered the other two should be filled out automatically
by JavaScript. For Varian instruments the first column is represented by np, the
second column ni, and the third column ni2. For Bruker data the first column is
represented by TD from acqus, the second column TD from acqu2s, and the third
column TD from acqu3s.
- total - The total number of points in each free induction decay (fid). Each fid consists of 1/2 real and 1/2 imaginary points.
Varian and Bruker both define np and TD the same for the acquisition dimensions
and the value for total should be equal to np (Varian) or TD (Bruker). For
indirect dimensions Bruker uses the same convention: the TD values from
acqu2s and acqu3s are the total number of fids (reals + imaginaries). Varian
defines the ni and ni2 values as complex points, and therefore total for the
indirect dimensions for Varian should be equal to ni*2 and ni2*2.
- real - Number of real points collected in each of the dimensions.
For Varian the acquisition dimension should be equal to 1/2*np, and for the
indirect dimensions it should be equal to ni and ni2. For Bruker all three
should be equal to the appropriate TD/2.
- imag - Number of imaginary points collected in each of the
dimensions. For Varian the acquisition dimension should be equal to 1/2*np, and
for the indirect dimensions it should be equal to ni and ni2. For Bruker all
three should be equal to the appropriate TD/2.
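A quick worked example of the conventions above: for a Varian 2D data set collected with np = 1024 and ni = 64, the acquisition column would be total = 1024, real = 512, imag = 512, and the t1 column would be total = 128 (ni*2), real = 64, imag = 64. The equivalent Bruker data set, with TD = 1024 in acqus and TD = 128 in acqu2s, gives the same six values.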
- Output Data Size:
- The output data size sets the final REAL data set size along all
dimensions and controls the number of zero-fills used to process the data. For
Fourier transformed data, if the output data size for any given dimension is
larger than the number of complex points collected or linear predicted then the
data will be zero filled to the size of the output data size for that dimension.
Note that for the acquisition dimension if "Extract Left Half of Spectrum?" is
selected the zero-fill size will be doubled as it will need to be cut in half to
give the final output data set size selected.
Zero-filling extends an fid by appending zeros to the end. This causes a
slight increase in the digital resolution of the frequency domain data after
Fourier transformation and allows imaginary data to be reconstructed using a
Hilbert Fourier transformation. For data which is being processed by maximum
entropy reconstruction, the data set will be reconstructed to the size selected
in the output data size for each dimension. Note that if the output data size
for any given dimension is larger than the number of complex points collected in
that dimension then the data will be "predicted" to the final data set size, much
like linear prediction. However, maximum entropy reconstruction
should give superior results to linear prediction when done properly.
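As a worked example (sizes chosen purely for illustration): with 64 complex t1 points collected and an output data size of 256, the FT route zero-fills the (possibly linear predicted) t1 fid so that the final real size of that dimension is 256 points, while the maximum entropy route reconstructs the dimension directly to 256 points. In the acquisition dimension, an output data size of 1024 with "Extract Left Half of Spectrum?" selected means the data is first zero-filled to 2048 points so that the retained left half is the requested 1024 points.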
- Acquisition Type:
- Selects the type of data you have collected in each dimension. Currently for the acquisition dimension the only choice is complex. For the two indirect dimensions the current choices are States, States-TPPI, and sensitivity enhanced data. There is no reason to rearrange sensitivity enhanced data before processing.
It is assumed that sensitivity enhanced data is collected as either States or
States-TPPI.
- Quadrature Fix:
- The R, I, -R, -I option for the quadrature fix will multiply every other real/imaginary pair along a given dimension
by negative one. This is completely analogous to the -ALT flag that nmrPipe uses
for the FT command. This will cause a reordering of the signals along the given
dimension and is often needed when processing Bruker data sets. In principle
other combinations for QUADFIX may be used such as R, -I which would multiply
all the imaginaries by negative one (conjugate) and cause a reversal of the
spectrum along that dimension. At this point reversals are handled by a separate
reverse switch on the script generator. Other fixes will be added later as
needed.
- Reverse Spectrum:
- Often the indirect dimensions of 2-dimensional and 3-dimensional NMR experiments are reversed.
In some cases the acquisition dimension can also be reversed. This can be fixed by changing the phase of the
receiver during detection, but it is easier to reverse the fid during processing. The acquisition dimension is
reversed by taking the complex conjugate of the fid during the initial importing of the data set.
Indirect dimensions are reversed using the reverse command.
- Delete Imaginaries?:
- Check these boxes if you would like the imaginaries deleted during
processing. Note that once you know your phase parameters the imaginaries can be
safely deleted. If you are having trouble with memory size make sure the
imaginaries are being deleted and check your output data set size.
Note that when converting to other analysis packages such as XEASY, Sparky,
NMRPipe, etc. the imaginaries are automatically deleted even when the delete
imaginaries checkboxes are not selected as these packages do not accept
imaginary data. However, the RNMRTK formatted output file will have the
imaginaries saved in this case.
In order to reduce processing time it is important to delete the imaginaries
in the acquisition dimension when processing the two indirect dimensions with
maximum entropy reconstruction.
- Time Domain Convolution:
- Time domain convolution is a very effective method to remove large solvent signals, such as residual water, from your spectra. In this script generator you may only select a single
frequency to subtract. However, you can simply edit the script after creation to add additional sstdc command lines with different frequencies to subtract more than one signal. The Defaults button
resets the Time Domain Convolution parameters back to their default values.
- Filter Width - The filter width is inversely proportional to the size of the window to suppress
and depends on the line-width of the signal you are removing and the number of points in the fid. The smaller the value, the greater the amount of signal that is subtracted and the faster the calculation. For experiments where no signals overlap the solvent signal you generally want to use a small filter width, such as 16. For experiments where you have resonances close to the solvent signal it is generally best to try larger values (~60) and to test several different window sizes to find the best compromise between subtracting the solvent and leaving your signals
unaffected.
- EndPoints - The endpoints parameter sets the number of points at the beginning and end of the fid to be treated by extrapolation rather than convolution.
- Filter Shape - The filter shape can be set to Gaussian (default) or Cosine.
- Frequency - Sets the frequency of the signal to be suppressed. By default the signal to be suppressed is at zero frequency (the center of the spectrum); however, any frequency may be entered.
- Remove Diagonal - For 2D data sets the remove diagonal checkbox can be selected to use the solvent suppression
command sstdc to remove the diagonal.
- Correct First Point:
- This setting allows you to correct the initial point(s) of the acquisition fid by backward linear prediction.
The first few points of the acquisition fid are often responsible for causing baseline rolls after
Fourier transformation. The initial points of an fid are often collected incorrectly and contain large amounts of noise because the data is usually collected only a few usec after applying high power RF pulses to the probe coil, and because of other imperfections in the probe itself. These points can be corrected by backward linear prediction. If linear predict first point(s) is selected then the initial point or points are predicted. This replaces the existing points; it does not add additional points to the beginning of the fid.
- Predict - The value "predict" selects the number of points to predict. Generally a value of 1
or 2 is chosen.
- Points - The value "points" selects the number of points to use in the prediction.
- Coef - The value "coef" selects the number of coefficients to use in calculation. Coef should not exceed 50% of "points".
- Scale First Point and DC Offset:
- Scale - If Scale First Point is selected then the value scale is
multiplied by the initial point of the fid. This applies for all dimensions. For
any given dimension the value of scale should typically be 0.5 assuming that the
first order phase correction is zero. For cases where the first order phase
correction is not zero then scale is typically set to 1.
- DC Offset - Often the tail end of an fid does not equal zero because the entire fid is either shifted up or down slightly from the zero point. This is referred to as a DC offset. If an fid with a DC offset is
Fourier transformed a spike at zero frequency will appear. Worse, if the fid is zero-filled, the appended zeros will not extend from the last point of the fid but rather will be offset from the last point. This will be interpreted as a truncation artifact when performing a
Fourier transformation and cause wiggles at the base of peaks. Both of these adverse effects can be removed by simply adjusting the fid up or down so that the center of the fid is near zero. This feature can be turned on and off by adjusting the DC Offset switch. It is only available in the acquisition dimension.
- Linear Prediction:
- Linear prediction extrapolates additional data points from the time-domain data (fid). Linear prediction can be an effective way to increase the number of data points, and hence the resolution, for data sets that are truncated. Often in 3-dimensional data sets you set ni and ni2 small to save acquisition time even though the signal has not decayed away to zero. Using linear prediction to extend the fid in these cases can improve the resolution considerably.
- Maximum entropy reconstruction can be used as a replacement for linear
prediction and Fourier transformation for both linear and non-linearly collected
data.
Linear prediction should only be used when the signal you are trying to predict has not decayed completely away to zero. If the signal has already decayed to zero then linear predicting further data points will generally add noise and not improve resolution. It is therefore best to always process data without any linear prediction and then compare the spectra to one with linear prediction. It is also best to try different linear prediction parameters and compare them to see which works the best. Remember processing parameters can have huge effects on the quality of the data.
For experiments that have dimensions that were collected with constant time evolution it is generally best to use mirror-image linear prediction. See the readme file of the pulse sequence if you are unsure if any of the dimensions were collected with constant time evolution. In general, the mirror-image linear prediction algorithm will give superior results and is faster to perform.
Linear prediction works best when the signal is strong, truncated, and there are as few peaks as possible to predict. Because of this last feature it is best in 3-dimensional data sets, where both the t1 and t2 dimensions are to be linear predicted, to
Fourier transform the acquisition dimension first, then transform the t1 dimension without linear prediction and then process the t2 dimension with linear prediction. Afterwards the f1 dimension can be inverse
Fourier transformed, linear predicted, and then re-transformed. All of this is built into the script generator and takes no extra work on your part.
For 3-dimensional data sets where only one of the indirect dimensions will be
linear predicted the script generator automatically processes the dimension to
be linear predicted last.
- Linear Prediction Type - The choices are none, forward, mirror-image (0,0) and mirror-image (90, -180). For dimensions without constant time
evolution use forward linear prediction. For cases where the dimension was collected with constant time evolution it is best to use mirror-image linear prediction. However, forward linear prediction may be used on constant time data. For mirror image linear prediction use the (90, -180)
setting when the initial point was collected at half dwell time. This is generally set by the f1180 and f2180 flags in Varian pulse sequences.
- Predict - The number of data points to predict. The larger the value the better the resolution will become
up to the natural decay point of the fid, but at the expense of extra noise
appearing in the final spectra. Like most processing parameters it is best to try different values to see which one works the best. For cases where the signal is weak or the truncation effect is minimal it is best to use a smaller value for predict and for cases where there is plenty of signal and the truncation effect is large use a larger value for predict. Note that for protein work we generally do not make predict greater than ni, but for small molecules or peptide work you may be able to make predict quite a bit larger with beneficial results.
- Coefficients - Coefficients sets the number of sinusoids (signals) that will be
predicted. Coef must be less than 50% of the number of points used in the prediction. If
coef is set high the calculation time will be longer, and additional noise may appear if coef is much larger than the number of
sinusoids in any given slice being linear predicted. If coef is set lower than the number of
sinusoids being predicted in any given slice then different peaks will be fit with the same parameters, leading to
frequency shifts of the peaks. It is therefore very important to have coef set
appropriately for your data set to get optimal results and not lead to any frequency errors in your spectrum.
- Points - Points sets the actual number of collected points to be used in the prediction. This value is generally set to match ni or ni2 for t1 and t2 dimensions, respectively.
By default the values are changed automatically on the form when ni and ni2 are
entered.
- Nextrap - Nextrap sets the point from which the linear prediction will proceed from. For example if ni = 32 and nextrap is set to 32 then points will be predicted from 33 onward. Typically this value is set equal to ni or ni2 for t1 and t2 dimensions, respectively.
By default the values are changed automatically on the form when ni and ni2 are
entered.
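A worked example for a t1 dimension collected with ni = 64: the form sets points = 64 and nextrap = 64 automatically, predict might be set to 64 (doubling the dimension while staying within the "not greater than ni" guideline for protein work), and coef must then be less than 32 (50% of points), ideally close to the number of signals expected in a given t1 slice.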
- First Apodization (Window Functions):
- Rarely does Fourier transformation of the fid give rise to good quality spectra. There are often problems with the final result such as truncation artifacts, low signal to noise, or limited resolution. Apodization is the process in which the spectrum is convolved to achieve a more satisfactory line-shape. This is done by multiplying the fid by a time domain filter function. Two common functions are the sine bell and the Gaussian. The idea is to multiply the fid by a function so that it always decays away to zero at the end. This removes truncation artifacts that would otherwise give rise to wiggles along the baseline near peaks,
especially strong signals. For fids that are severely truncated this can lead to noise that stretches across the entire spectra.
The strength of the signal will determine the type of function that you will want to apply. Typically one tries to increase the resolution as far as possible while keeping noise to a minimum. For some spectra that are very noisy the only thing that can be done is to decrease the noise at the expense of resolution. The initial few points of an fid are responsible for most of the signal to noise that you get. The stronger the initial part of the fid the weaker the noise will appear. The longer the fid "rings out" the higher the resolution will be. Therefore, increasing the initial part of the fid will lead to good signal to noise but poor resolution while enhancing the late parts of the fid will give better resolution but add noise. It is up to you to try and decide which function will give the most desirable effects. Often it is good to have two processed spectra, one with good signal to noise and one with good resolution. That way you can have the best of both worlds and will not have to compromise.
- Function - The functions to choose from include none, Gaussian,
sine bell, and sine bell squared. I suggest trying each of them initially to find which one gives the best results.
Sine bell squared functions are most common in the indirect dimensions, and
Gaussian functions are used primarily in the acquisition dimension. Note that when processing a given dimension with maximum entropy reconstruction no apodization is applied to that dimension. Apodization is only used when data is Fourier transformed.
- Shift - Shift is used to determine the shift of the sine bell functions. It is not used for the
Gaussian functions and can be ignored. The shift is entered in degrees. A shift of 90 gives a pure cosine function and a shift of 0 gives a pure
sine bell function. Small values give increased resolution at the expense of extra noise and values
near 90 give good signal to noise at the expense of resolution.
- lb - Gaussian peaks have narrower line widths than lorentzian line-shapes, especially near the base of the peak. However, NMR signals have lorentzian line-shapes. The
Gaussian window function converts the lorentzian line into a Gaussian line by multiplying the signal by an exponential to cancel the decay of the fid, followed by applying a decreasing
Gaussian function to introduce a Gaussian decay and hence a Gaussian line-shape. A typical value for lb is 20. Note that many other NMR processing programs enter lb as a negative number, however the Rowland NMR Toolkit needs this value to be positive.
A negative value will create an exponentially rising function with disastrous results. The value for lb is very important and dramatically determines the shape of the
Gaussian function. lb is only used with a Gaussian window function and can be ignored for
sine bell functions.
- gc - gc is the Gaussian decay coefficient. Typically a value of 20% is used to give the resonances a
Gaussian line-shape. Note that many NMR processing packages enter this number as a decimal (Ex: 0.20), however, the
Rowland NMR Toolkit needs this value to be a percentage. Both lb and gc are dependent on the sweep width and the number of points in the fid. Because of this it is important to view the
Gaussian function first before applying it to make sure that it is doing what you think it is. gc is only used with a Gaussian window function and can be ignored for sine bell or sine bell squared functions.
- Size - Size represents the number of points that the window function will be applied to in the direct dimension. Typically you apply the window function to all of the real points (1/2*np). However, in cases where np was set too high the size variable allows you to select only part of the fid for transformation. Let's say that np = 2048, giving 1024 real and 1024 imaginary points. When viewing the 1024 real points of the fid you realize that the signal has decayed away by point 256. If you process the data using all 1024 real points you get a large amount of noise. If you chop the fid off after real point 512 and transform it you will get a reduction in the noise level and not diminish resolution significantly
or at all, as long as the signal truly has decayed away before the point in which you chopped the data. There is no size value for either of the indirect dimensions because
generally you want to use all of the points in the transformation. For the indirect dimension size is set automatically to ni (ni2), or in the case of linear prediction it is set equal to the last point predicted.
Size is set automatically on the form when np is entered.
- Viewing Window Functions from the Macro Generator - Soon there will be a view button located from within the script generator form page that will display the window function for each dimension based on the selected parameters.
Until that time the following recipe can be used for viewing the function using
the RNMRTK program seepln.
- Enter the following from the command line:
- section -c 32000
- rnmrtk loadvnmr ./fid start 1 num 1 - For Varian data
- rnmrtk load ./ser.dat start 1 num 1 - For Bruker data. Note that a parameter file must have already been created.
- rnmrtk dim t1
- rnmrtk ones 1
- rnmrtk FUNCTION: For example GM 20. 20.
- seepln
- section -d
- Second Apodization:
- This allows the fid to be multiplied by a second window function. Currently the only choice is exponential multiplication (em). EM multiplies the data in the work space by an exponential window. This apodization function is used to reduce noise at the expense of spectral resolution. EM may be used alone (by setting the 1st apodization to none) or in conjunction with other window functions. EM is dependent on the sweep width and
the number of points in the fid. Because of this, always view the window function before transformation to be sure you know what you are applying. For instance, applying em 5 to a fid with 1024 points will be significantly different than applying em 5 to a fid with 256 points. Typically it is not beneficial to apply em unless the spectrum is very noisy. In these cases it can be used quite effectively to help locate weak peaks hidden under the noise. However, this is done at the expense of resolution.
- EM - No exponential multiplication is applied when em is set to 0. The larger the value of em the faster the exponential decay that is applied giving reduced noise but poorer resolution.
- Phasing:
- Applies a phase correction to the frequency domain data. If both the zero order and first order phase correction values are zero no phase correction will be performed. It is typical to process the data with no phase corrections, phase the
spectrum in nmrDraw, seepln, or your favorite visualization tool and then reprocess the data with the phase correction applied. To speed things up first process the data without linear prediction
and do a minimal number of zero-fills. After you have determined the phases then go ahead and add all the extra bells and whistles to the processing script. Note that things such as window functions and linear prediction will not affect phase parameters.
NOTES: The Rowland NMR Toolkit uses a phase convention with the opposite sign of that used by nmrPipe / nmrDraw and VNMR. When using nmrDraw to phase your spectra be sure to use the same value but with the opposite sign. It is sometimes necessary to apply an additional 90 degree phase correction to data that has been collected with sensitivity enhancement. This has been built into the scripts as separate phase commands in addition to the general phasing commands. For Varian data this always seems to be necessary. However, I have not tested a large number of Bruker data sets collected with sensitivity enhancement to know if the same holds true for Bruker data. Please let me know if Bruker data should be handled differently and I will make the appropriate changes to the script generator.
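A small worked example of the sign convention (values chosen purely for illustration): if phasing in nmrDraw gives a zero order correction of 35 degrees and a first order correction of -120 degrees, enter -35 and 120 in the script generator.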
- Directions for phasing the initial FID in the RNMRTK program seepln
- Enter the following from the command line:
- section -c 32000
- rnmrtk loadvnmr ./fid start 1 num 1 - For Varian data
- rnmrtk load ./ser.dat start 1 num 1 - For Bruker data. Note that a parameter file must have already been created.
- rnmrtk dim t1
- rnmrtk fft
- seepln
- Use the number keys 1-4 and 6-9 to phase and note final value.
- section -d
- Referencing Parameters:
- Referencing information is not only critical for obtaining the correct positions
of signals, but also for the proper importing of data and for the proper
operation of any processing command that utilizes the sweep width or
spectrometer frequency, such as the EM and Gaussian apodization functions.
- Sweep Width - sw, sw1 and sw2 are the sweep widths for each of the
three dimensions.
- Spectrometer Frequency - sfrq1, sfrq2, and sfrq3 are the frequencies used in each of the three dimensions. The frequency used should be the frequency at zero ppm.
- Reference - ref1, ref2, and ref3 are the reference ppm at the center of each of the three dimensions. This program assumes that the reference value is set to the center of the spectra. Let me know if you would like to be able to select a reference point manually and I can edit the script generator.
- Nucleus - The nucleus that is detected in each of the three dimensions. These values are for display purposes only and do not affect the referencing in any way.
Notes: Once you have the spectrometer frequency set for the acquisition
dimension the form page will automatically set the correct spectrometer
frequency for indirect dimensions when the nucleus is selected. However, this
only works on a refresh so be careful.
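As a worked example of obtaining the frequency at zero ppm (numbers made up purely for illustration; the scripts described below are intended to handle this bookkeeping when given the center ppm value): sfrq (0 ppm) = carrier frequency / (1 + carrier position in ppm * 1e-6). For a 1H carrier at 599.7768 MHz placed at 4.772 ppm this gives sfrq1 = 599.7768 / (1 + 4.772e-6), or about 599.7739 MHz.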
For Varian data we often utilize two scripts / macros to aid in setting up
experiments and obtaining processing parameters. The first one is called
setcar which is a Varian macro. To run the
macro place it in your vnmrsys/macro folder and then type setcar from the VNMR
command line. The program will prompt you for the ppm value for the center
position of the acquisition dimension. You can then enter "n" to have the
program display the referencing information for the other nuclei or select a
transmitter channel. If you select a transmitter channel you are prompted to
enter the ppm value you want the center of that particular nuclei to be set to.
This will change dof, dof2, or dof3 appropriately.
The other program is called procpar.prl and is a
perl script that parses a procpar file and extracts all the information needed to process the data set.
To use the program place the procpar.prl script in your path and type
procpar.prl procpar 4.772 where 4.772 is the center ppm value of the acquisition
dimension.
We also now have a new Varian macro, varian2sbtools,
which works in conjunction with a slightly modified BPsvf command.
When installed, the BPsvf command will create an extra folder called scripts. Inside the scripts folder
will be auto-generated rnmrtk and nmrPipe processing scripts, along with an sbtools parameter line which
can be copied into the "Enter parameter string" text box at the top of the form
to rapidly fill out the form with the appropriate values. A perl script called
bruker2sbtools.prl has also been created which does very similar things.
- Maximum Entropy Parameters:
- If maximum entropy is selected for any dimension in the Processing Type?
section then the Maximum Entropy Parameter section needs to be filled out. There
are many dialog boxes to choose from, however, in many cases very few if any
need to be entered by the user. For each choice under Maximum Entropy Algorithm
the items that may need to be entered are shown in parentheses. In all cases
where data was collected with linear sampling the nuse values need to be entered
for each indirect dimension that will be processed with maximum entropy
reconstruction. nuse is not used when data was collected with non-uniform
sampling. Note that nuse values on the form page are changed automatically when
ni or ni2 values are entered. In most cases these default values will be used.
- Auto with NO Deconvolution - When the Auto with No Deconvolution option is chosen, values for def and aim are determined automatically by the program noisecalc. If
"Use separate noise fid" is NOT checked then noisecalc will analyze the fid
collected with the largest time increment (the one that should have the weakest signal) and determine the magnitude of the noise in the spectra. This analysis will be used to determine initial values for def and aim. The noisecalc
program will also analyze the FID with the shortest time increment (the one that should have the
strongest signal - usually the first increment) to determine the position of the 10 largest
signals which are outside the solvent region. Preliminary maximum entropy runs for slices (1D reconstructions) or planes (2D reconstructions) are performed and the
10 values of lambda determined after convergence for each slice/plane are averaged. The entire spectrum is then reprocessed in constant lambda mode utilizing the average lambda value
from the preliminary runs. Note that when maximum entropy reconstruction is
applied in both dimensions (msa2d) of a 2D data set and auto mode is selected
the msa2d program simply runs once in constant aim mode and will converge with a
single lambda value. There is no need to re-run in this case under constant
lambda mode as a similar result would be obtained.
Separate Noise and
Signal FID's - If "Use separate noise fid" is selected
then a separate filename needs to be provided which is a 1D spectrum collected
identically to the data set being processed except that only one time increment
is selected (1D) and the experiment was altered in some manner to provide a
spectrum of pure noise. Shifting the offset frequency by at least 1/pw is
one option, and setting the transmitter power to a near-zero level is another.
In either case the script generator will utilize that spectrum to
analyze the noise and determine initial starting values for def and aim. If "Use
separate signal fid" is selected then a separate filename needs to be entered
for a 1D data set collected with an identical sweep width and number of points.
This 1D spectrum will then be used by noisecalc to find the location of the 10
strongest signals outside the noise region. This option is only truly useful for
experiments which give poor signal in the initial FID such as an HNCACB. In this
case the first FID of an HSQC would be appropriate to use. Under normal
conditions the use of "Auto Mode" WITHOUT separate noise or signal FID's
seems to be giving satisfactory results. It is expected that the relative path
from the script directory to the separate signal and noise FID's be provided on
the form page. For example, if you collected a separate noise fid on a Varian
and named the file noise.fid and you copied the noise.fid folder to the
directory where you were processing the data then you would enter noise.fid/fid
in the form text box.
- Auto with Deconvolution - Auto with deconvolution works
identically to Auto with NO Deconvolution so see instructions above. The only
difference is that J-coupling deconvolution and/or Line-width deconvolution may
be applied. J-coupling and line-width deconvolution settings are independent of
each other and need not be set similarly for each dimension. Values for
deconvolution only need to be selected for dimensions which will have maximum
entropy reconstruction applied to them. J-coupling deconvolution will attempt to
deconvolve a J-coupling of a given Hz value which must be entered. The coupling
to be removed must be chosen as in-phase or anti-phase. It appears that choosing
a J-coupling value slightly less than the theoretical coupling gives the best
results. J-coupling deconvolution is best used for experiments where decoupling
is not possible, the couplings to be removed are consistent in their value, and
where the J-coupling is large enough to cause splitting of the signals or
broadening. For line-width deconvolution a line-width must be entered and that
line-width will be deconvolved from all of the signals giving sharper lines
without altering the S/N significantly (assuming reasonable values are
selected). Choosing large line-width values to deconvolve will give sharper
signals, however, if the line-width value is set too large (especially if it
approaches the natural line-width of the signals) then distortions will appear.
It is also important when choosing to deconvolve line-width that you create a
large enough final output size for the reconstruction. If the final data set
size is too small and a large value for line-width deconvolution is chosen then
the resulting FID's may ring out strong to the end causing truncation artifacts
to appear. For both J-coupling and line-width deconvolution the number of
iterations that will be necessary for convergence will be much higher than when
NO deconvolution is used. You may find that the script errors with the message
that convergence was not reached. If this occurs examine the output of the
script (you may need to capture the output with > log or | tee log) to find the
step which is failing. The value for loops may need to be increased in some
situations beyond the default values to converge properly. The loops value can
simply be edited with your favorite text editor and the script re-run.
- AIM - In AIM mode you must enter values for def, aim, and nloops. nuse for each dimension processed by maximum entropy
must be entered if the data was collected
linearly. In constant aim mode J-coupling and line-width deconvolution may be
applied in any combination for dimensions being processed by maximum entropy
reconstruction. For 2D NMR experiments where maximum entropy reconstruction will only
be used in the indirect dimension, in constant aim mode the MSA program will
be run separately on each slice and each slice will therefore converge with a
different lambda value. The lambda value is a sort of "scaling factor" so each
slice will have a different scale. This will cause severe line-shape distortions
when viewing the entire 2D spectrum. In general for 2D cases where MSA will only
be performed in the indirect dimension the constant aim mode will only be used in
a trial fashion to choose an appropriate lambda value. The entire 2D spectra
will then be re-transformed in constant lambda mode. For 2D spectra where msa2d
will be utilized to perform maximum entropy reconstruction along both the direct
and indirect dimensions then constant aim mode is appropriate as the entire 2D
plane will be processed in one step and hence have a single lambda value after
convergence. No line-shape distortions will appear and there is no need to
re-run in constant lambda mode. For 3D data sets where maximum entropy
reconstruction is performed only along a single dimension (msa) or along both
indirect dimensions (msa2d) constant aim mode again is only used as trial runs
to find an appropriate lambda value. Once found the whole 3D spectrum can be
re-processed in constant lambda mode so no line-shape distortions appear. Values
for aim, lambda, and def can also be found automatically by using one of the two
auto functions.
- Constant Lambda - In Constant lambda mode values for def, lambda,
nloops, and nuse must be entered. Nuse only needs to be entered for dimensions
being processed by maximum entropy reconstruction and only for data that was
collected linearly. It is typical to process 1d slices (msa) or 2D planes
(msa2d) initially in constant aim mode to determine an appropriate value for
lambda and then to re-process the data in constant lambda mode to obtain the
final spectra. In constant lambda mode J-coupling and line-width deconvolution
may be applied in any combination for dimensions being processed by maximum
entropy reconstruction. Values for aim, lambda, and def can also be found
automatically by using one of the two auto functions.
- Def, Aim, Lambda, and NLoops - The parameters def, aim and lambda affect
the results of the maximum entropy reconstruction. There are two methods in
which maximum entropy reconstruction may be run. In constant aim mode values for
def and aim are input and the value lambda is adjusted by the program until a
convergence is met. In constant-lambda mode values for def and lambda are
entered and the program adjusts aim until convergence is met. In principle, aim should be
equal to the noise level in the data, which is not always easy to ascertain
beforehand. If aim is too large, the result is a flat featureless spectrum. If
aim is too small the result is a spectrum that resembles the zero-filled FT of
the data. For linear data a method for estimating the aim value is to process
the data with the FT without apodization or zero-filling. The noise level can
then be estimated by computing the root-mean-square value for a blank region of
the spectrum. The def value should be set to a value which is smaller than the
smallest significant feature of the spectrum that you want to keep, but larger
than the noise. If def is too large then the spectrum will resemble a
zero-filled FT spectrum. If def is set too low then the onset of nonlinearity
occurs in the range spanned by the noise and results in reconstructions with
spiky noise peaks. Since the non-linearity of a maximum entropy reconstruction
is dependent on def and lambda it is important that the final reconstruction be
performed in constant-lambda mode for 3D data sets where data is reconstructed
along two of the dimensions with msa2d or for 2D/3D data sets where only one
dimension is reconstructed with maximum entropy (msa). In constant aim mode each
slice (msa) or plane (msa2d) will have a different non-linearity and hence a
different scale which would cause severe distortions to the line-shape. In
principle constant-aim mode is used as a test to find a good value for lambda.
The one exception to this case is for 2D data sets being processed with msa2d.
In this case the entire spectrum is reconstructed in one go so there is no
reason to re-run in constant lambda mode. For situations where peak amplitudes
across spectra need to be compared, such as relaxation data sets, the final
reconstructions should always be done in constant-lambda mode to get proper
values of the integrals across all the spectra.
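As a rough numeric illustration of these guidelines (numbers invented purely for illustration): if the root-mean-square noise in a blank region of an FT-processed spectrum is about 5e4 and the weakest peaks you care about have amplitudes around 5e5, a reasonable starting point would be aim near 5e4 and def somewhere between the two, perhaps 1e5 to 2e5, adjusting from there if the reconstruction looks too flat (values too large) or too spiky (def too small).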
- Very Generic, NON-Mathematical, Description of how Maximum Entropy Reconstruction Works - The Fourier
transform converts data collected in the time domain to the frequency domain.
In order to accomplish this goal the time spacing between each point in the time
domain data (usually 1/sw) must be equally spaced and no points may be missing.
For the directly detected dimension this is typically not an issue as the entire
time domain data is usually collected in under 100 msec. However, for indirect
dimensions each time point can take from several seconds to several minutes to
collect depending on the amount of time averaging that needs to be performed.
This causes 3D and 4D experiments to often take days to collect, and the data is
usually truncated as it would simply take too long to collect enough points to
allow the signal to fully decay away. This leads to a reduction in the possible
resolution which could be obtained if a greater number of increments were
collected. When processing data with the Fourier transform one typically
attempts to predict out those points which were not collected by using linear
prediction. Maximum entropy reconstruction takes an "inverse" look at the
problem of converting data from the time domain to the frequency domain. Rather
than applying a FT directly to the time domain data, the frequency domain data
is "guessed". This "guessed" frequency domain data is then inverse Fourier
transformed back to the time domain and the "guessed" time domain data is
compared to the collected time domain data. Based on comparison between the two
time domain data sets a better guess of the frequency domain data can be
generated. The process is then repeated and the "guessed" time domain data will
be a closer match to the collected time domain data. After several rounds of
repeating the process a convergence will be achieved where the experimentally
collected time domain data matches the "guessed" frequency domain data after
inverse Fourier transformation. While it may appear that it may take a very long
time for convergence to occur, in practice convergence generally occurs in less
than 50 rounds for data with a significant number of signals and even quicker
for noise regions. One huge advantage of this "inverse" approach is that not all
of the data points need to be used. Assume that only 1/3 of the points in the
collected time domain data were actually collected. When the frequency domain
data is "guessed" and inverse Fourier transformed all of the points are present
in the "guessed" FID. However, the comparison to the collected FID is only
performed for those points which were collected. In this manner data with
missing points can be easily processed. In principle this method can be used
along multiple dimensions concurrently. In practice the Rowland NMR Toolkit only
has the ability to process along 1 dimension (msa) and 2 dimensions (msa2d) at a
time. For 3D data sets this is not a significant limitation as the acquisition
dimension can be processed by using the Fourier transform. For 4D data sets one
of the indirect dimensions needs to be collected uniformly and processed by
using the Fourier transform. An msa3d version is currently under development.
Values for def, aim, and lambda determine how the convergence proceeds and are
thus very critical to getting a proper convergence. The AUTO buttons in the
script generator have been setup so the scripts utilize a separate program (noisecalc)
to obtain proper values for def, aim and lambda without any user interaction.
Once determined, the user may wish to "tweak" the parameters slightly to either
obtain a spectrum with more emphasis on smaller signals or to obtain a cleaner
spectrum. The values for def, aim, and lambda used in AUTO mode are reported in
the log files for the user to view. One additional feature of maximum entropy
reconstruction is the ability to deconvolve line-width and J-couplings from the
final frequency domain data without the addition of noise or line-shape
distortions which occur when severe apodization functions are used to increase
resolution of spectra. The "guessed" frequency domain data is predicted with the
line-width or J-coupling removed and then before comparison to the
experimentally collected data set the line-width and J-coupling are multiplied
back into the "guessed" spectra. This feature allows one to narrow line-widths
by performing "virtual" decoupling or by reducing the line-width of all the
resonances. Note that for J-coupling deconvolution to work properly all the
J-couplings to be deconvolved must be similar. For line-width deconvolution it
is important that you do not deconvolve more line-width than exists in the peak
or severe distortions to the line-shape will occur and severe truncation
artifacts will appear. If reasonable values are applied these two features can
allow dramatically improved spectra over conventional processing schemes. When
processing data with line-width or J-coupling deconvolution being applied the
number of loops needed to obtain convergence is much higher. Therefore, nloops
must be set high (above 1000 typically) compared to 200-300 for a reconstruction
without deconvolution.
- Non-linearity of maximum entropy reconstructions - One feature of maximum entropy
reconstructions is that a non-linearity occurs where strong peaks are amplified
relative to small peaks and small peaks are squashed in amplitude. The values
def and lambda set the point at which this non-linearity occurs. It is therefore
important that def be set high enough so noise spikes are not amplified to look
like real signals and low enough so that weak real peaks are not squashed into the
noise. The Auto feature determines the values for def, aim, and lambda
automatically, and the user can then "tweak" these values if desired.
For many spectra the non-linearity of the data is not a significant issue, but
for spectra where you need precise integration of the signals a maximum entropy
reconstruction will pose a slight issue as the integrals will not be linearly
related. However, a simple way to overcome this issue is to synthetically inject
peaks of known parameters (frequency, line-width, amplitude) into the spectrum
in an area where no peaks exist so as to not cause any overlap. If several peaks
are injected with different amplitudes a calibration curve of peak amplitude
versus integral value can be measured. This calibration curve can then be used
to back correct the integral values of your peaks. This process has been done by
several groups and is robust enough to handle even the demanding integration
requirements of NMR relaxation experiments.
- Extract Left Half of Spectrum?
- For 15N edited spectra it is typical that only the amide resonances appear in the direct dimension. Since all of the amide resonances are downfield of water, which is typically the center of the spectra, there is no need to keep the right half of the spectra. For these cases it is best to cut the direct dimension in half. This saves disk space by 50%, decreases processing time 4 fold for 3-dimensional spectra, and allows for faster screen drawing during analysis. Updating of the referencing is handled by the program. At this point the cut dimension in half switch will only save the left half of the spectra. If there is a need I can edit the generator to allow selection of the right half of the spectra or allow the user to define a particular region to save. Let me know if either choice would be beneficial.
- Output File Formats
- By default the output will be saved in the Rowland NMR Toolkit format. This
file format contains two files. One is the processed NMR data in binary format
and the other is a file with the same base name and a .par extension which is a
small text file which defines the layout of the binary format. In addition to the
rnmrtk file format, which can be viewed with the programs contour and seepln, the data can also be converted to
nmrDraw/nmrPipe, Xeasy, Felix, nmrView, and Sparky formats. Two additional
file formats are available, and for the 3D nmrPipe formats the order of the
output dimensions can be selected, which can be useful for checking the phasing along all
three dimensions without the need to swap the order of the data. The conversion
to Xeasy, Felix, and nmrPipe file formats is built into the rnmrtk program.
Sparky and nmrView file formats are created from an nmrPipe file format as an
intermediate. Therefore nmrPipe must be installed and in the path for these
conversions to work. Also the pipe2ucsf and ucsfdata programs must be installed
for the Sparky conversion.
- Remove Temporary Files
- The script generator creates many temporary files when executed. These
temporary files can be either removed automatically or left in the folder. The
only difference to the scripts is whether the rm statements at the end of the
script are commented or uncommented. This allows for the user to easily change
the behavior of the script in regard to removing temporary files and gives the
user the complete rm command to be used later if desired to "clean up" a folder
from these temporary files. Certain files which are likely to be of value to the
user are not deleted automatically. However, these files are removed
automatically when the script is executed an additional time.
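In practice this just means the end of the generated script contains lines such as the following, commented out or not depending on the checkbox (the filenames here are hypothetical; the real script lists the actual temporary files it created):
- # rm -f proc_tmp1.sec proc_tmp1.par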