Frequently Asked Questions by Users

*Question: I'm using the quality_variant.txt file (output by SHORE) for the --marker option of SHOREmap outcross or SHOREmap extract. Is quality_variant.txt indeed the marker file to use?

Answer: For backcross data, it is correct to use the quality_variant.txt determined from the pool as the initial marker file (for option --marker). This file, generated by SHORE, is a huge tab-delimited table (with no annotation line). The corresponding consensus information file is consensus_summary.txt (after unzipping) in the SHORE result folder supplementary_data/. As this file can reach gigabytes in size, we can use SHOREmap extract to get only the lines corresponding to the markers.

For outcross data, we need to use SHOREmap create to create high-quality markers for the analysis of allele frequencies. The 'create' command requires four files: quality_variant.txt and quality_reference.txt from the resequencing of each of the two parental lines. quality_reference.txt can also be large. To save time in later steps, we can use SHOREmap extract to get only the necessary lines. For example, suppose we have quality_variant_parenta.txt and quality_reference_parenta.txt for parental line A, and quality_variant_parentb.txt and quality_reference_parentb.txt for parental line B. We then extract from quality_reference_parentb.txt the lines corresponding to quality_variant_parenta.txt (note that some information might be lost, as the two files are not one-to-one pairs). The same can be done for quality_variant_parentb.txt and quality_reference_parenta.txt. The extracted files are then provided to SHOREmap create to find good markers, and these markers are provided to SHOREmap outcross.

*Question: I am using quality_variant.txt as the marker file. Why do I always receive the error "SNP info line requires at least 5 columns" (although the file does contain 9 columns in the required format) and finally "ERROR: no marker info recorded"? Why are my tab-delimited columns not recognized?

Answer: The warning "SNP info line requires at least 5 columns" (and the final "ERROR: no marker info recorded") can have two causes. One is that the line currently processed does not have at least 5 tab-delimited columns, namely c1=project_name, c2=chromosome_id, c3=marker_position, c4=reference_base, c5=alternative_base. The other is that white-space is detected in the first line of quality_variant.txt, in which case white-space replaces tab as the separator used to split lines into columns.

Both causes usually come from arbitrary modification of the original quality_variant.txt. Please be careful if you want to add annotation information at the beginning of the marker file. Every annotation line should start with the symbol '#' (so that it is ignored when markers are read).
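The two causes above can be checked before running SHOREmap. The following is only a minimal sketch and not part of SHOREmap: the function name check_marker_lines is hypothetical, and the rules it applies (ignore lines starting with '#', require at least 5 tab-delimited columns, flag any space character) are taken from the description above. Skipping empty lines is an additional assumption.

```python
def check_marker_lines(lines, min_columns=5):
    """Return a list of (line_number, problem) for suspicious marker lines.

    Hypothetical helper, not a SHOREmap function.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Annotation lines start with '#'; empty lines are skipped (assumption).
        if not line or line.startswith("#"):
            continue
        # A space inside a marker line may cause columns to be split on white-space.
        if " " in line:
            problems.append((number, "contains a space character"))
        # A marker line needs at least min_columns tab-delimited fields.
        if len(line.split("\t")) < min_columns:
            problems.append(
                (number, "fewer than %d tab-delimited columns" % min_columns)
            )
    return problems
```

To use it, pass an open file handle, for example check_marker_lines(open("quality_variant.txt")), and inspect the reported line numbers in the marker file.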

*Question: Can I run SHOREmap_v3.x on a Mac system?

Answer: It is possible, but the OpenMotif libraries required by DISLIN, such as libXt and libXm, must be properly installed. Try the following steps:

1. Download OpenMotif: get the .dmg package of OpenMotif from http://www.ist-inc.com/motif/download/openmotif_download.html: 'Mac OS X 10.5 Universal (Leopard) compat-2.1.32 openmotif-compat-2.1.32_IST.macosx10.5.dmg MD5 8385161 (7.99 MB)'. Install it according to the instructions in the README file (that is, double-click the .dmg, which produces another installer that you also double-click, then continue with your own settings). OpenMotif-Compat 2.1.31 will be installed in /usr/OpenMotif-2.1.31-22i, which will be linked to /usr/OpenMotif (we need to provide this path later, so before continuing please check that the libXm* files have been installed under this path).

2. Download DISLIN: https://www.mps.mpg.de/dislin/mac-osx-darwin. Check carefully whether you need the version for a 32-bit or a 64-bit system. The following one works for OS 10.8: '10.3.darwin.intel.64.tar.gz Mac OSX 10.5, 10.6, 10.7, 10.8, Intel gcc, g++, g77, Perl, Python, Java, Intel icc, ifort (64-bit) 9898 KB 19-Jul-201'. Install DISLIN and set the environment variables. There are clear instructions in the README file contained in the downloaded package.

2.1 The installation of DISLIN does not provide dislin_d.h, but this file is required for compiling SHOREmap_v3.x. It can be found in the examples/ folder of the original DISLIN installation package. Please copy dislin_d.h to the folder dislin10.3/ of SHOREmap_v3.x/.

2.2 You also need to copy the libraries 'libdislin_d*.dylib' into '/usr/local/lib'.

3. Download SHOREmap_v3.x => unzip it => open 'makefile' file => modify '-L/usr/lib/ -lXt' to '-L/usr/OpenMotif -lXm' => save the changes.

4. Type 'make' in a terminal to compile the SHOREmap executable. There are a few warnings about the usage of '\0' in the code, which have no influence on the functions of SHOREmap.

*Question: For outcross data, I cannot achieve an accurate mapping interval with only the resequencing data of the pool. The difficulty lies in finding proper markers. Can you provide a general workflow that uses resequencing data of the parental lines?

Answer: For outcross data, given only the resequencing data of the F2 pool (used for de novo marker identification), SHOREmap usually cannot guarantee an accurate mapping interval. Here are several suggestions for achieving an accurate mapping interval with additional resequencing data of the parental lines. Taking Arabidopsis thaliana as an example, suppose the mutant is in the Col background and is crossed to the Ler accession to produce the F2 pool, and SHORE is used for resequencing.

First, do these analyses:

a. with sequencing data of the pool of F2, do a resequencing analysis with SHORE;

b. with sequencing data of genome of parent Col, do a resequencing analysis with SHORE;

c. with sequencing data of genome of parent Ler, do a resequencing analysis with SHORE.

You will get many files from each resequencing analysis, which are stored in the standard SHORE folder. In general, SHOREmap requires <1> quality_variant.txt (containing candidate markers, in folder ConsensusAnalysis/), <2> quality_reference.txt (containing consensus calls supporting reference bases, in the same format as quality_variant.txt, in folder ConsensusAnalysis/) and <3> consensus_summary.txt (containing consensus call information, in folder ConsensusAnalysis/supplementary_data/).

Next, use 'SHOREmap extract' to prepare files for 'SHOREmap create' to create markers for analyzing pooled F2. Do the following:

1. use 'SHOREmap extract' to extract reference base calls from quality_reference.txt of Col for quality_variant.txt of Ler, which writes the result to extracted_quality_ref_base_1.txt;

SHOREmap extract --chrsizes chrsizes.chr1-5.txt --folder /Col/ConsensusAnalysis --marker /Ler/ConsensusAnalysis/quality_variant.txt --extract-bg-ref --consen /Col/ConsensusAnalysis/quality_reference.txt -verbose --row-first 1

2. use 'SHOREmap extract' to extract reference base calls from quality_reference.txt of Ler for quality_variant.txt of Col, which writes the result to extracted_quality_ref_base_2.txt;

SHOREmap extract --chrsizes chrsizes.chr1-5.txt --folder /Ler/ConsensusAnalysis --marker /Col/ConsensusAnalysis/quality_variant.txt --extract-bg-ref --consen /Ler/ConsensusAnalysis/quality_reference.txt -verbose --row-first 2

Next, do the following (please tune the score and coverage options according to your own data):

SHOREmap create --chrsizes chrsizes.chr1-5.txt --folder /test_SHOREmap_create --marker-pa /Col/ConsensusAnalysis/quality_variant.txt --marker-pb /Ler/ConsensusAnalysis/quality_variant.txt --bg-ref-base-pa /Col/ConsensusAnalysis/extracted_quality_ref_base_1.txt --bg-ref-base-pb /Ler/ConsensusAnalysis/extracted_quality_ref_base_2.txt --pmarker-score 30 --pmarker-min-cov 20 --pmarker-max-cov 37 --pmarker-min-freq 1 --bg-ref-cov 18 --bg-ref-cov-max 37 --bg-ref-freq 1 --bg-ref-score 30 -verbose

Then, we can find the file SHOREmap_created_F2Pab_specific.txt in the folder test_SHOREmap_create/. This file contains the markers for the SHOREmap outcross analysis, which can be filtered further by tuning the options for coverage and base quality score (in the pooled F2).

Next, use 'SHOREmap extract' to extract consensus base calls from consensus_summary.txt for quality_variant.txt of the pooled F2, which writes the result to extracted_consensus_3.txt. This file provides the resequencing information of the pooled F2 for calculating allele frequencies at the markers (determined by 'SHOREmap create').

SHOREmap extract --chrsizes chrsizes.chr1-5.txt --folder /F2/ConsensusAnalysis --marker /F2/ConsensusAnalysis/quality_variant.txt --consen /F2/ConsensusAnalysis/supplementary_data/consensus_summary.txt -verbose --row-first 3

Finally, do the following:

SHOREmap outcross --chrsizes chrsizes.chr1-5.txt --folder /test_SHOREmap_outcross --marker SHOREmap_created_F2Pab_specific.txt --consen /F2/ConsensusAnalysis/extracted_consensus_3.txt --min-marker 10 -verbose -plot-boost -plot-scale --window-step 10000 --window-size 300000 --interval-min-mean 0.994 --interval-max-cvar 0.015 --min-coverage 22 --max-coverage 35

*Question: How do I determine the values of the options for read coverage and base quality score?

Answer: In most cases, we can check the average number of reads covering a genomic position (the read coverage) in config.log in the SHORE folder, and derive the minimum and maximum coverage thresholds from it. For example, if the log file shows an average coverage of 50 along a chromosome/scaffold, we may run SHOREmap backcross/outcross with a minimum coverage of 30 and a maximum coverage of 70. For the quality of base calls, if the maximum given by SHORE is 40, we may use a threshold of 25. Careful tuning of these parameters for foreground and background resequencing data can lead to the expected patterns (normal segregation of non-causal mutations, or peak patterns indicating the causal mutations).
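The example above can be turned into a small starting-point calculation. This is only a sketch under stated assumptions: the function suggest_thresholds is hypothetical and not part of SHOREmap, and the default fractions (plus or minus 40% around the average coverage, 62.5% of the maximum quality) are simply read off the single example of 50 -> 30/70 and 40 -> 25. They are not defaults defined by SHOREmap and should still be tuned for your own data.

```python
def suggest_thresholds(mean_coverage, max_quality=40,
                       spread=0.4, quality_fraction=0.625):
    """Suggest (min_coverage, max_coverage, min_quality) as a starting point.

    Hypothetical helper: 'spread' and 'quality_fraction' are assumptions
    inferred from the FAQ example, not values defined by SHOREmap.
    """
    min_coverage = round(mean_coverage * (1 - spread))
    max_coverage = round(mean_coverage * (1 + spread))
    min_quality = round(max_quality * quality_fraction)
    return min_coverage, max_coverage, min_quality
```

For an average coverage of 50 and a maximum quality of 40, suggest_thresholds(50) reproduces the values from the example: a minimum coverage of 30, a maximum coverage of 70 and a quality threshold of 25.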

*Question: When compiling SHOREmap, I get an error saying that "dislin_d.h cannot be found". How can I fix it?

Answer: DISLIN provides different header files, such as dislin.h and dislin_d.h, to handle different data precisions. The one we need is dislin_d.h, which handles double-precision data. However, after installing DISLIN, dislin_d.h is not present in the installed folder. Instead, it can be found in the example folder of the DISLIN installation package. We need to copy this dislin_d.h into SHOREmap_v3.0/dislin10.3, where libdislin_d.so and libdislin_d.so.10 are also located.

*Question: When compiling SHOREmap, I get an error saying "libdislin_d.so: could not read symbols: File in wrong format". How can I fix it?

Answer: This error indicates a 32-bit/64-bit incompatibility between DISLIN and the operating system. The installation package of DISLIN given in the manual/website is a 32-bit version, which might not fit your operating system if it is 64-bit. We can check whether the system is 32-bit or 64-bit by running 'uname -a' in the terminal, which should give output like 'Linux ws-name 3.5.0-25-generic #39-Ubuntu SMP Mon Feb 25 18:26:58 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux'.

To fix this error, install a 64-bit package of DISLIN instead, e.g. dislin-10.3.linux.i586_64.tar.gz or dislin-10.3.linux.i386_64.tar.gz. For a comprehensive overview of the DISLIN installation packages for different systems, please refer to the DISLIN download website.
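To see which kind of library is actually installed, the ELF header of libdislin_d.so can be inspected directly. The sketch below is not part of SHOREmap or DISLIN, and the function name elf_bits is hypothetical. It relies only on the ELF file format, where the fifth byte of the header (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. It applies to Linux .so files only, not to Mac .dylib files.

```python
def elf_bits(path):
    """Return 32 or 64 depending on the ELF class of the file at 'path'.

    Hypothetical helper for Linux shared libraries such as libdislin_d.so.
    """
    with open(path, "rb") as handle:
        header = handle.read(5)
    # Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        raise ValueError("%s is not an ELF file" % path)
    # Byte 4 is EI_CLASS: 1 means 32-bit, 2 means 64-bit.
    ei_class = header[4]
    if ei_class == 1:
        return 32
    if ei_class == 2:
        return 64
    raise ValueError("%s has an unknown ELF class %d" % (path, ei_class))
```

If elf_bits("libdislin_d.so") returns 32 while 'uname -a' reports x86_64, the library does not match the system and a 64-bit DISLIN package is needed.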

*Question: when using SHOREmap annotate function, there is a warning message like 'Warning (in get_gene_snps.cpp): length of cds is not a multiple of 3: AT4G14272.1: 8219622~8220026'. What is the problem?

Answer: In the TAIR10 GFF, there are tens of genes whose concatenated CDS length is not a multiple of three. This may change the functional annotation of a mutation from nonsynonymous to synonymous, or vice versa. Unfortunately, this is something we cannot improve at the moment.

Therefore, in downstream analysis, we need to be careful with these genes if they contain striking mutations. No matter how these mutations have been annotated, keep in mind that there are some 'abnormally' annotated genes in the interval. If there are no other 'better' candidate genes of interest, we may need to come back and check such genes.
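The condition behind the warning can be reproduced with a few lines of Python. This is a minimal sketch, not the code of get_gene_snps.cpp: the function names are hypothetical, and the CDS segments are assumed to be given as 1-based inclusive (start, end) pairs, as in GFF files.

```python
def cds_length(segments):
    """Total length of CDS segments given as 1-based inclusive (start, end) pairs."""
    return sum(end - start + 1 for start, end in segments)


def has_complete_codons(segments):
    """True if the concatenated CDS length is a multiple of three."""
    return cds_length(segments) % 3 == 0
```

For example, with the made-up segments [(100, 198), (300, 350)] the concatenated length is 150 and has_complete_codons returns True, while a single segment [(100, 199)] has length 100 and would trigger the warning.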

*Question: When using 'samtools mpileup -uD' to call SNPs, some of them which we can see based on visualization of the read alignment are not in the final variant call file. This can affect downstream SHOREmap analysis. How can we improve?

Answer: SNPs co-located with INDELs may have been filtered out by samtools. Try 'samtools mpileup -uD' with the additional option '-B', which turns off BAQ (Base Alignment Quality) filtering, that is, it stops samtools from ruling out SNPs that it considers false because of nearby INDELs.