Inference

We provide here the source code and command lines to build a new D. melanogaster transcriptional network from the modENCODE data (2010-2011). The network can be written in .dot or .sif format for use in GraphViz, Tulip or Cytoscape. Both formats represent the graph as a list of edges, one per line, in the form FROM XX TO, where FROM is the source node (the transcription factor, TF), TO is the target node (the target gene, TG) and XX is a string describing the interaction. We use the string "->", since it is valid in both formats. Genes (FROM and TO) are denoted by their FlyBase identifiers (FBgn) in every .data file.
The code is distributed freely in the hope that it will be useful, but without any warranty (license: GPL3).

  1. Create the list of all possible edges in the network from the list of genes having at least one measurement (tg.data) and the list of known TFs among them (tf.data), using the C++ script fromTo.cpp and the header fly.h
    1. compile g++ -o fromto fromTo.cpp
    2. download tf.data and tg.data and place the files in the directory data/
    3. execute ./fromto data/tf.data data/tg.data netw/from.netw netw/to.netw in order to generate two files, from.netw and to.netw, of 10M lines each, containing respectively the origins and the destinations of all possible edges.
  2. Create the motif subnetwork, i.e. the set of weights, one for each possible edge created in step 1, derived from the network built from evolutionarily conserved motifs (motif.data.zip), using the C++ script motif.cpp
    1. compile g++ -o motif motif.cpp
    2. download and unzip motif.data.zip and place the files in the directory data/
    3. execute ./motif data/tf.data data/tg.data data/motif.data netw/motif.netw in order to generate the file motif.netw of edge weights.
  3. Create the ChIP subnetwork, i.e. the set of weights, one for each possible edge, inferred from the network built from modENCODE ChIP-chip interactions (chip.data), using the C++ script chip.cpp
    1. compile g++ -o chip chip.cpp
    2. download chip.data and place the file in the directory data/
    3. execute ./chip data/tf.data data/tg.data data/chip.data netw/chip.netw in order to generate the file chip.netw.
  4. Create the expression and chromatin subnetworks using the C++ script correlation.cpp
    1. compile g++ -o correlation correlation.cpp
    2. download the time-course chromatin dataset chrom-tc.data and place the file in the directory data/
    3. execute ./correlation data/tf.data data/tg.data data/chrom-tc.data netw/chrom-tc.netw in order to generate the list of weights chrom-tc.netw.
    4. download the cell-line chromatin dataset chrom-cl.data and place the file in the directory data/
    5. execute ./correlation data/tf.data data/tg.data data/chrom-cl.data netw/chrom-cl.netw in order to generate the list of weights chrom-cl.netw.
    6. download the RNA-seq dataset rnaseq.data and place the file in the directory data/
    7. execute ./correlation data/tf.data data/tg.data data/rnaseq.data netw/rnaseq.netw in order to generate the list of weights rnaseq.netw.
    8. download the microarray dataset microarray.data and place the file in the directory data/
    9. execute ./correlation data/tf.data data/tg.data data/microarray.data netw/microarray.netw in order to generate the list of weights microarray.netw.
    10. download the FlyAtlas dataset flyatlas.data and place the file in the directory data/
    11. execute ./correlation data/tf.data data/tg.data data/flyatlas.data netw/flyatlas.netw in order to generate the list of weights flyatlas.netw.
  5. Create the unsupervised integrated network using the C++ script unsupervised.cpp
    1. compile g++ -o unsupervised unsupervised.cpp
    2. execute ./unsupervised netw/motif.netw netw/chip.netw netw/chrom-tc.netw netw/chrom-cl.netw netw/microarray.netw netw/rnaseq.netw netw/flyatlas.netw netw/unsupervised.netw in order to generate the file unsupervised.netw containing the weights of the unsupervised network, computed from the weights of the seven subnetworks.
  6. Create the supervised integrated network using the C++ script supervised.cpp
    1. compile g++ -o supervised supervised.cpp
    2. execute ./supervised netw/motif.netw netw/chip.netw netw/chrom-tc.netw netw/chrom-cl.netw netw/microarray.netw netw/rnaseq.netw netw/flyatlas.netw netw/supervised.netw in order to generate the file supervised.netw containing the weights given by the logistic regression, computed from the weights of the seven subnetworks.
  7. Create the .sif or .dot files using the C++ script netw2dot.cpp
    1. compile g++ -o netw2dot netw2dot.cpp
    2. execute
      1. ./netw2dot netw/from.netw netw/to.netw netw/supervised.netw 300000 supervised dot in order to obtain the file supervised.dot describing the network made of the 300000 top-ranked edges from the file supervised.netw.
      2. ./netw2dot netw/from.netw netw/to.netw netw/unsupervised.netw 200000 unsupervised sif in order to obtain the file unsupervised.sif describing the network made of the 200000 top-ranked edges from the file unsupervised.netw.
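
The steps above rely on every .netw file listing one value per candidate edge, in the same order, so that the n-th weight belongs to the n-th origin and destination. The following C++ sketch illustrates that layout under stated assumptions; it is not the code of fromTo.cpp or netw2dot.cpp. The names EdgeLists, all_edges and top_edges are hypothetical, the TF-major enumeration order is assumed, TF-to-itself pairs are not filtered out, and "top" is taken to mean highest weight.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Parallel edge lists with one entry per candidate edge,
// mirroring the roles of from.netw and to.netw.
struct EdgeLists {
    std::vector<std::string> from;
    std::vector<std::string> to;
};

// Enumerate every TF -> TG pair. The TF-major order is an assumption of
// this sketch; what matters is that all weight files follow the same order.
EdgeLists all_edges(const std::vector<std::string>& tfs,
                    const std::vector<std::string>& tgs) {
    EdgeLists e;
    e.from.reserve(tfs.size() * tgs.size());
    e.to.reserve(tfs.size() * tgs.size());
    for (const std::string& tf : tfs) {
        for (const std::string& tg : tgs) {
            e.from.push_back(tf);
            e.to.push_back(tg);
        }
    }
    return e;
}

// Return the indices of the n largest weights, highest first.
// If n exceeds the number of weights, all indices are returned.
std::vector<std::size_t> top_edges(const std::vector<double>& weights,
                                   std::size_t n) {
    std::vector<std::size_t> idx(weights.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    n = std::min(n, idx.size());
    // Only the first n positions need to be sorted.
    std::partial_sort(idx.begin(),
                      idx.begin() + static_cast<std::ptrdiff_t>(n),
                      idx.end(),
                      [&weights](std::size_t a, std::size_t b) {
                          return weights[a] > weights[b];
                      });
    idx.resize(n);
    return idx;
}
```

Each index i returned by top_edges identifies the edge from[i] -> to[i]. Using a partial sort keeps the selection cheap when only a few hundred thousand edges are kept out of roughly 10M candidates.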

If you have questions concerning this code, please send an email to software@meyerp.com.