Sample Scheduler for Non-Uniform Data Sets

Update

The web-based Sample Scheduler has been replaced with a downloadable Java application called ScheduleTool. To install the program, follow the installation instructions in the documentation below.

NUS ScheduleTool Documentation

The ScheduleTool was created to assist users in creating Non-Uniform Sampling (NUS) schedules for the collection of NMR data. The tool was designed with ease of use in mind. From easily known values for molecular weight, nucleus types, number of dimensions, and sweep widths, the program automatically determines suitable values for the parameters directly responsible for building the random sample schedules. A graphical user interface was built to aid in this process and to visualize the sample schedules. Analysis tools are also built in, both for developers and to help educate users as to why some sample schedules may be better than others.

While the ScheduleTool was designed with ease of use in mind, it also supports more advanced features and can be run from the command line. This is intended for advanced users and developers who may be generating many sample schedules for testing purposes rather than a single sample schedule for NMR data collection.

While I attempted to think of as many features as possible, I am sure there are many more that users may want to see added. Send feature requests, bug reports, or questions about the tool to markm@neuron.uchc.edu.

The ScheduleTool was created by Mark Maciejewski and Val Gorbatyuk in conjunction with Jeffrey Hoch at the University of Connecticut Health Center. Thanks to Alan Stern, Mehdi Mobli, and Jay Vyas for helpful discussions.

OUTLINE

INSTALLATION

Note: A reasonably recent version of Java (version 1.5 or higher) must be installed to run the ScheduleTool. Run the command "java -version" from a command line to determine the version installed on your system.

Download the latest version of the ScheduleTool (ScheduleTool-dir.tar), unpack, and save the jar file and ScheduleTool script to a location in the system path. NOTE: OSX installations should use the ScheduleTool-mac script. The script simply runs the command:

java -jar -Xmx1024m ScheduleTool-date.jar $@

Note: The date part of the filename for the ScheduleTool will change as the program is updated. The OSX version has an additional argument which places the name of the program on the dock when it is running.

The -Xmx1024m allows the program to use up to 1GB of memory. This number can be adjusted based on your needs. Note that if you encounter program errors, especially for very large 3D sample schedules, you may want to try increasing the 1024 to a larger value, say 2048. The $@ passes all arguments on the command line to the program.

KNOWN BUGS or are they FEATURES?

NON-UNIFORM SAMPLING (NUS)

A description of non-uniform sampling can be found at http://rnmrtk.uchc.edu/rnmrtk/NUS.html

Non-uniform sampling is the process of collecting time-domain data at non-fixed intervals. There are several advantages to collecting data in this manner.

ALGORITHM

The algorithm for generating the sample schedule is based on randomly picking a subset of possible points from a 1D, 2D or 3D grid, but with a skewed random distribution based on an exponentially decaying function in each dimension. First the algorithm finds the maximum increment in all dimensions to determine the size of the 1D, 2D, or 3D grid of potential points that may be picked (this would be the size of a uniformly collected spectrum). The algorithm then uses the decay rate in each dimension along with the sweep width in each dimension (used to determine the evolution time of each potential point) to determine the probability of each potential point by:

1D: Probability = [EXP (-t1*decayRate_t1)]

2D: Probability = [EXP (-t1*decayRate_t1)] * [EXP (-t2*decayRate_t2)]

3D: Probability = [EXP (-t1*decayRate_t1)] * [EXP (-t2*decayRate_t2)] * [EXP (-t3*decayRate_t3)]

where tn = increment_number / sweep_width_tn is the evolution time of the point in dimension n

The initial point, with an evolution time equal to zero, has a probability of 1, and all other points have a probability between 0 and 1. Note that for constant time experiments the decay rate is set to zero for that dimension. Also note that for SIN or COS modulated signals the exponential decay function above is multiplied by the appropriate SIN or COS function to determine the probability of each point.

New Probability = Probability (above) * ABS [COS/SIN(pi * J_tn * tn)], where tn = increment_number / sweep_width_tn

Once the probability of each potential point is determined, that probability is multiplied by a random number between 0 and 1. A potential point with a high probability based on the decay rate may therefore end up with a low value if the random number is close to 0. This product of the decay-based (and, where applicable, J-coupling-based) probability with the random number is what gives the process its randomness. The resulting values are then sorted from greatest to least, and the sample schedule consists of the highest values in the sorted list, up to the total number of points in the sample schedule.
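
The steps above can be sketched in a few lines of Python. This is only an illustration of the described procedure, not the ScheduleTool source: the function name and argument names are hypothetical, the grid is assumed to run from increment 0 to maximum increment minus 1 in each dimension, and Python's own random number generator is used, so a given seed will not reproduce a schedule made by the ScheduleTool. Forced initial points and oversampling are not modeled.

```python
import itertools
import math
import random


def build_schedule(max_incr, sweep_widths, decay_rates, total_points,
                   j_mod=None, seed=None):
    """Pick total_points grid points using decay-weighted random scores.

    max_incr     -- grid size per dimension (assumed: increments 0..n-1)
    sweep_widths -- sweep width in Hz per dimension
    decay_rates  -- decay rate per dimension (0 for constant time)
    j_mod        -- optional per-dimension (kind, J), kind "sin", "cos" or None
    """
    rng = random.Random(seed)
    j_mod = j_mod or [(None, 0.0)] * len(max_incr)
    scored = []
    for point in itertools.product(*(range(n) for n in max_incr)):
        prob = 1.0
        for d, incr in enumerate(point):
            t = incr / sweep_widths[d]            # evolution time tn
            prob *= math.exp(-t * decay_rates[d])  # exponential decay term
            kind, j = j_mod[d]
            if kind == "cos":
                prob *= abs(math.cos(math.pi * j * t))
            elif kind == "sin":
                prob *= abs(math.sin(math.pi * j * t))
        # Multiply by a random number in [0, 1) to randomize the ranking.
        scored.append((prob * rng.random(), point))
    scored.sort(key=lambda item: item[0], reverse=True)
    # Keep the highest-scoring points, returned in grid order.
    return sorted(point for _, point in scored[:total_points])
```

For example, `build_schedule([16, 8], [2000.0, 8000.0], [30.0, 20.0], 32, seed=1)` returns 32 distinct (t1, t2) increment pairs out of the 128 on the grid, with early increments more likely to be chosen than late ones.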

RUNNING THE SCHEDULE TOOL WITH A GRAPHICAL USER INTERFACE (GUI)

Note: The ScheduleTool may be run with a Graphical User Interface (GUI) or standalone from the command line. Any command line arguments entered when running the program in GUI mode are passed to the program and used to populate the program entries with default values. The values ndim, molWeight, field, swtN, and nucltN must be entered first; all other values must come after them.

  1. Running the program in generic GUI mode
    • Enter the command "ScheduleTool" or "ScheduleTool --gui"
  2. Running the program in help mode, which shows all arguments that may be passed to the program
    • Enter the command "ScheduleTool --help"
  3. Running the program in GUI mode with default parameters passed to the program
    • Example: "ScheduleTool --ndim 2 --molWeight 32000 --field 600 --swt1 8000 --swt2 1750 --nuclt1 13C --nuclt2 15N --gui"
      • This command invokes the GUI of the ScheduleTool and sets the number of dimensions to 2, the Molecular Weight to 32,000 Da, the field to 600, the sweep widths to 8000 and 1750 Hz, and the nuclei to 13C and 15N. Those values are then processed to determine reasonable values for Maximum Increment, Decay Rate, and Total Points, and the determined values are populated in the GUI text boxes when the program opens, as if the Compute Defaults button had been pressed. Any of the values may be changed by the user prior to creating a sample schedule.

DESCRIPTION OF THE GRAPHICAL USER INTERFACE

The GUI has seven components.

  1. The "Number of Non-Uniform Dimensions" located at the very top.
  2. Sample information including "Molecular Weight", "Field Strength", "Nucleus" type, "Sweep Widths", and whether the experimental dimension is "Constant Time". A checkbox for using a "Random" seed is also present along with a text box to enter a "Seed" value if the random checkbox is not checked.
  3. A "Compute Defaults" button. This button, when pressed, uses the sample information described above to compute default values for the "Maximum Increment", "Decay Rate", and "Total Points". Other values such as "Force Initial Points", "J-Coupling", and "Oversampling" are not altered by the "Compute Defaults" button.
  4. Sample schedule information is located below the "Compute Defaults" button and includes text boxes for "Maximum Increment", "Decay Rate", "Force Initial Points", "J-Coupling Value", and "Total Points" along with pull down menus for "J-Coupling" type and "Oversampling". The values in this section, along with the "Sweep Widths" entered in the sample information section, are used to compute the sample schedule.
  5. The "Create Schedule" button creates the sample schedule and opens a Sample Schedule window. This window lets the user view the sample schedule and its point spread function (PSF), provides some basic statistical information about the sample schedule and the parameters used to create it, and provides pull down menus for saving the sample schedule and other information. Each of these is described in more detail below in the sample schedule window help.
  6. The "Help" button is a web link to this document.
  7. Information mouse-overs are shown as blue italic i's. When the mouse hovers over any of these, additional information is presented to the user. For example, when the mouse is placed over the "Molecular Weight" information mouse-over, the user is presented with the molecular rotational correlation time that will be used to compute default T2 relaxation times based on the "Nucleus" choice when the "Compute Defaults" button is pressed.

DETAILS OF GUI COMPONENTS

SAMPLE SCHEDULE WINDOW

The sample schedule window opens automatically after creating a sample schedule when running in GUI mode. The window is composed of five parts: a Schedule window, an FFT window, a Stats window, and the File and Data menu bars.

Schedule Window

FFT or Point Spread Function Window

Statistics Window

  1. Graphical view of the sample schedule.
    1. This is a graphical representation of the sample schedule with points being collected shown as red squares. Sample points not in the sample schedule are blank. The blue circle represents the median increment and may or may not actually be included in the sample schedule.
      1. For 1D and 2D schedules the entire schedule is displayed as a single figure. For 3D schedules only the t1 / t2 dimensions are shown and a slider bar is included to view the t1 / t2 schedule planes for any given t3 value. Averages are shown per t1 / t2 slice.
      2. Using the mouse any region can be zoomed. The right mouse button may be used to zoom back out by selecting Auto Range - Both Axis.
  2. Graphical view of the point spread function (PSF).
    1. The point spread function is displayed by selecting the FFT tab. The PSF is shown with contours and as an intensity plot.
      1. The contour start level can be changed manually. Enter zero to turn contours off.
    2. The PSF is calculated by performing a discrete Fourier transform of the sample schedule where all points that are included in the sample schedule are set to a value of "one" and all points not included in the sample schedule are set to a value of "zero".
      1. When collecting NUS data there are often sampling artifacts that appear in the spectrum. These artifacts arise because each signal in the spectrum is convolved with the PSF. They are normally weak and can be ignored, but a PSF with large intensities outside the central region (0 Hz) can lead to significant artifacts in the final spectrum. Therefore a sample schedule that minimizes these large amplitudes in the PSF will tend to give spectra with fewer intense sampling artifacts. In addition, using a processing method such as maximum entropy reconstruction (MaxEnt) is beneficial in that the sampling artifacts are deconvolved from the final spectrum to some degree without affecting any "true" signals (signals that do not arise from sampling artifacts).
      2. In general the larger the percentage of points and the more random the sample schedule the better the PSF will look. Sample schedules with any type of pattern such as those used in back projection will lead to the most significant sampling artifacts. In addition the central component is broadened slightly as the percentage of kept points in the sample schedule decreases. This will cause a slight broadening of the peaks in the final spectrum. As this is slight, and NUS generally affords much better resolution to begin with, this slight broadening is often ignored.
        1. Note however that MaxEnt can easily deconvolve this added linewidth from the spectrum and can even deconvolve some of the true linewidth of the peaks without causing deleterious effects to the spectrum (essentially narrowing all the peaks) as long as it is applied in a conservative manner. Conservative means that no more than 1/2 of the natural linewidth is deconvolved and the size of the final spectrum is large enough to eliminate truncation artifacts that may arise with the decrease in linewidth.
    3. Statistics window
      1. The statistics window shows values for:
        1. Sample schedule sensitivity
          1. This value is simply the sum of the exponential decay values for the given T2 based on the molecular weight, sweep widths, and nucleus choice for all points that are included in the sample schedule.
        2. Uniform sampling sensitivity
          1. This value is the same as the sampling schedule sensitivity except that it is the sum over all points up to the maximum increment and shows what the sensitivity would be if all the points were collected.
            1. Note: This value will likely be higher than the sampling schedule sensitivity. However, the experimental time would be much greater to achieve this sensitivity. Generally the sensitivity per unit time is greater for NUS sample schedules than uniform sampling.
        3. Relative sensitivity
          1. Simply the ratio of sampling schedule sensitivity and uniform sampling sensitivity.
        4. Improvement in sensitivity per unit time
          1. Simply the relative sensitivity normalized for experiment time.
        5. Rotational correlation time (tm)
          1. The tm is derived from the MW using experimental relaxation data from the literature rather than a theoretical value. The tm is slightly higher than the theoretical value.
        6. Parameters used to calculate the sample schedule.
          1. These values are for recording the parameters used to create the sample schedule. Note that sweep width, maximum increment, and decay rate are each listed with both the value chosen and the actual value used to calculate the sample schedule. For sample schedules created without oversampling these values are identical, but they may differ for sample schedules created with oversampling, depending on whether the force decay rate checkbox was selected.
        7. Median
          1. The median increment in each of the dimensions along with the evolution time that corresponds to that increment.
        8. Average
          1. The average increment number in each of the dimensions along with the evolution time that corresponds to that increment.
        9. R2
          1. The R2 relaxation rate expected based on the molecular weight and the nucleus choice.
    4. File Menu
      1. Save Varian
        1. Saves the sample schedule in a format suitable for the Orekhov / Hoch method of collecting NUS data from BioPack sequences. Also saves a second file suitable for processing the data using the Rowland NMR Toolkit.
          1. The difference is that the Varian file starts with 0 as the first increment while the toolkit file starts from 1.
      2. Save Bruker
        1. Saves the sample schedule in a format suitable for using the Wagner method of collecting NUS data on Bruker instruments.
      3. Save PSF-RNMRTK script
        1. Saves a Rowland NMR Toolkit processing script that will synthetically generate a data set and produce a point spread function of the sample schedule. The values in the PSF are identical between this tool and the toolkit.
      4. Save FFT result
        1. Saves a text file with the result of the point spread function. Both the real and imaginary components are saved in separate columns.
      5. Save peaks
        1. The sample schedule tool has a rudimentary peak picker. The save peaks command saves a text file with the output from the peak picker.
          1. Output is sorted from largest signal (the central component) to the smallest picked peak.
          2. The peak picker picks the peaks automatically upon the creation of the sample schedule so no peak picking is necessary to save the peak pick results.
      6. See command line
        1. The ScheduleTool can be run in GUI mode and from a command line. The "See command line" option allows the user to see in GUI mode the command line that was used to build the sample schedule. It is hoped that this can be used as a tutorial on how to build sample schedules from the command line.
      7. Print
        1. Prints the sample schedule window.
      8. Version
        1. Opens a window showing the current version number.
      9. Close
        1. Closes the Sample Schedule Window.
        2. NOTE: This does not close the ScheduleTool, just the Sample Schedule window. It is a known bug that having multiple sample schedule windows open at the same time may lead to problems. For instance, if multiple sample schedule windows are open and the save peaks command is issued, there is no guarantee that the correct PSF will be analyzed. I strongly urge users to have only one sample schedule window open at a time.
    5. Data Menu
      1. FFT data
        1. Shows the intensity for both real and imaginary points from the point spread function. This display only shows 1000 data points at a time. The user can choose the first point to view.
        2. Under the File Menu the result of the FFT can be saved. This will save all the data points and not just 1000 at a time.
      2. Peaks
        1. Displays a list of peaks (from a default peak picker) in a window. Peaks are shown in decreasing intensity with the central component at 0 Hz at the top.
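
The point spread function and the first three sensitivity statistics described above can be illustrated with a short Python sketch. It is not the ScheduleTool code: the function names are hypothetical, the transform is a naive discrete Fourier transform that is only practical for small grids, and the multi-dimensional sensitivity is assumed to be the product of the per-dimension exponential decays, with R2 used as the decay constant. "Improvement in sensitivity per unit time" is left out because the exact normalization is not spelled out here.

```python
import cmath
import itertools
import math


def point_spread_function(schedule, shape):
    """Naive DFT of the sampling mask (one at sampled points, zero elsewhere).

    Returns a dict mapping each frequency index tuple to a complex value.
    Cost grows as grid_size * len(schedule), so use small grids only.
    """
    psf = {}
    for k in itertools.product(*(range(n) for n in shape)):
        total = 0j
        for point in schedule:  # unsampled points are zero and add nothing
            phase = sum(ki * pt / n for ki, pt, n in zip(k, point, shape))
            total += cmath.exp(-2j * math.pi * phase)
        psf[k] = total
    return psf


def sensitivity(points, sweep_widths, r2_rates):
    """Sum over points of the product of exp(-R2_n * t_n) across dimensions."""
    total = 0.0
    for point in points:
        value = 1.0
        for incr, sw, r2 in zip(point, sweep_widths, r2_rates):
            value *= math.exp(-r2 * incr / sw)  # t_n = increment / sweep width
        total += value
    return total


def relative_sensitivity(schedule, shape, sweep_widths, r2_rates):
    """Ratio of schedule sensitivity to uniform-sampling sensitivity."""
    full_grid = itertools.product(*(range(n) for n in shape))
    return (sensitivity(schedule, sweep_widths, r2_rates)
            / sensitivity(full_grid, sweep_widths, r2_rates))
```

In this sketch the central component of the PSF (index zero in every dimension) always equals the number of sampled points, and a fully sampled grid gives zero everywhere else, which is why uniform sampling shows no sampling artifacts.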

DESCRIPTION OF THE COMMAND LINE

The command line option for running the ScheduleTool can be used in two ways. The first is to simply create a sample schedule direct from the command line without any graphical interface. The second is to pass arguments to the ScheduleTool so that when the GUI of the ScheduleTool opens the passed arguments are entered by default rather than the pre-programmed default values.

Whenever the ScheduleTool opens, the first thing it does is process the number of dimensions, the molecular weight, the sweep widths, the field strength, and the nucleus types to determine suitable parameters for maximum increment, decay rates, and total points. Because of this, the number of dimensions, molecular weight, sweep widths, field, nucleus choices, and the sample schedule filename are required when the tool is used from the command line alone. In GUI mode, default values are used for this initial computation; after the user defines these parameters, the Compute Defaults button may be pressed to update the parameters again.

Optional values for parameters to generate the sample schedule can also be entered from the command line. When these optional parameters are entered from the command line the program first ignores these values. The required entries (sweep widths, field, nucleus choices, molecular weight, and the number of dimensions) are processed to determine suitable values to generate the sample schedule. Then the optional parameters that were entered from the command line are used to replace the default values and the sample schedule is created.
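
The order of operations described above (compute defaults from the required entries, then let the optional command line values replace them) can be sketched as follows. The function names are hypothetical, and the default values in the stand-in function are arbitrary placeholders, not what the ScheduleTool would actually compute.

```python
def resolve_parameters(required, optional, compute_defaults):
    """Return final schedule parameters: computed defaults, then overrides."""
    # Step 1: defaults are derived from the required entries only.
    params = dict(compute_defaults(required))
    # Step 2: optional command line values replace the computed defaults.
    params.update(optional)
    return params


def placeholder_defaults(required):
    # Hypothetical stand-in. The real tool derives these from the molecular
    # weight, field, nucleus, and sweep width; these numbers are arbitrary.
    return {"maxIncrt1": 64, "decayRatet1": 20.0, "totalPoints": 32}
```

For example, `resolve_parameters({"ndim": 1}, {"totalPoints": 48}, placeholder_defaults)` keeps the computed maximum increment and decay rate but uses 48 total points.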

In order to generate a sample schedule the molecular weight, field, and nucleus choices are not strictly needed. However, the way the ScheduleTool is written, these values still need to be entered. Maybe in a future version I will check whether all schedule parameters have been entered, in which case the requirement for the molecular weight, field, and nucleus choices would be removed. For now you must enter these values.

In GUI mode, once a sample schedule has been created, the user may see the command line that would create the same sample schedule by going to the File Menu and clicking on See Command Line. It is hoped that this feature will act as a tutorial on how to use the command line.

NOTE: The required parameters ndim, molWeight, field, swtN, and nucltN must be entered before any non-required parameters for the command line parser to work properly. Also, the command line parser is not tolerant of typos.

COMMAND LINE HELP

Usage:
[--help] to print this
[--gui] to run in graphical mode
--- Required for GUIless mode ----------------------------------------
[--ndim {1,2,3}] Number of non-uniform dimensions (default: 2)
[--molWeight {x.x}]
[--field {500,600,700,800,900}]
[--swtN {x.x}] Sweep width tN
[--nucltN {1H-homo, 1H-13C, 1H-15N, 13C, 15N, 15N-TROSY}] Nucleus tN
[--sched {filename}] Schedule output file (for GUIless mode)
[{--bruker, --varian}] Schedule will be generated in Bruker/Varian format (one at a time, default: Varian)
--- Optional ----------------------------------------------------------
[--constTimetN {true, false}] Constant time tN
[--maxIncrtN {x.x}] Maximum increment tN
[--decayRatetN {x.x}] Decay Rate tN
[--forceFirsttN {x}] Force first x points tN
[--jcoupltN {None,sin,cos}] J-Coupling tN
[--jcoupleFreqtN {x.x}] J-Coupling frequency tN
[--oversampltN {1X, 2X,4X, 8X}] Oversampling tN
[--totalPoints {x}] Force total points
[--seed {x}] The seed for the random number generator
[--path {path}] The location where all the output files will be stored (default: current directory)
[--fftdata {filename}] Where to output the result of the fft function
[--peaks {filename}] Where to output the peaks in the fft'ed data
[--rnmrtk {filename}] Where to output the PSF script for RNMRTK

Example
java -jar ScheduleTool.jar --ndim 2 --molWeight 30000 --field 600 --nuclt1 15N --nuclt2 13C --swt1 2000 --swt2 8000 --path ./ --sched sched.dat
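
When many schedules are generated from a script, a small helper can enforce the ordering rule that ndim, molWeight, field, swtN, and nucltN come before everything else. The sketch below is a hypothetical wrapper, not part of the ScheduleTool; it only builds the argument list using the option names from the usage listing above and does not run the program. The jar filename is an assumption and should match the installed file.

```python
REQUIRED_PREFIXES = ("ndim", "molWeight", "field", "swt", "nuclt")


def build_command(required, optional, jar="ScheduleTool.jar"):
    """Return an argv list with required parameters ahead of optional ones."""
    missing = [prefix for prefix in REQUIRED_PREFIXES
               if not any(name.startswith(prefix) for name in required)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    argv = ["java", "-jar", jar]
    # Dicts keep insertion order, so required entries are emitted first.
    for group in (required, optional):
        for name, value in group.items():
            argv += ["--" + name, str(value)]
    return argv


command = build_command(
    {"ndim": 2, "molWeight": 30000, "field": 600,
     "nuclt1": "15N", "nuclt2": "13C", "swt1": 2000, "swt2": 8000},
    {"path": "./", "sched": "sched.dat"},
)
print(" ".join(command))
```

The resulting list reproduces the example command above and can be passed to `subprocess.run` once the jar path is correct for your installation.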