An investigation into whether the similarity of language used in bug reports can
be used to enhance bug localisation techniques.

This work was published in
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6385108

Licence:

This repository consists of multiple programs rather than one coherent
whole.  Licensing information is contained individually within each file. Where
possible, LICENCE_BSD is used but other licences are used where necessary.

Dependencies:

Gant - Optional, used for building and running programs.
Java + libraries
    Apache Commons IO
    Apache Commons Lang
    Guava
    Eclipse - Only some jars required, those related to AST parsing
    Weka
Python + libraries
    gensim
    SciPy
    NumPy
R + libraries - For generating report
    Hmisc
    reshape
    xtable
LaTeX + packages - For generating report
    cite
    dblfloatfix
    glossaries
    hyperref
    ifthen
    cleveref
Sweave - For generating report
My library - https://bitbucket.org/el_loserio/library

Instructions:

To regenerate paper:
1. Download and extract the ArgoUML, JabRef, jEdit and muCommander benchmarks
from http://www.cs.wm.edu/semeru/data/benchmarks/.
2. Edit build/build.properties as appropriate to point to where dependencies are
installed.
3. Edit build/*.config as appropriate to point to where you downloaded the
benchmarks.
4. Run build/build.sh. The paper should regenerate without error, but the
classifiers involve some randomness, so the numbers, charts and classifiers
generated will differ from those published. In particular, the text of
Sections V.A, V.B and V.C will not match the regenerated results. To regenerate
the paper as written, download the results from
http://personal.cis.strath.ac.uk/s.davies/, extract them to the appropriate
place and run gant report.

To add a new project:
1. Create GoldSets and Queries directories, with files for each bug in the same
format as the benchmarks downloaded above.
2. Add a build/<project>.config file in the same format as an existing *.config
file.
3. build/build.sh should now build your new project. If you don't want to
generate other projects, you can remove the corresponding *.config files. Note
that the report itself will not include your new data, and may fail to build if
other *.config files are removed.

To run individual targets:
You can run individual stages of the process using commands such as gant
download_jedit etc. Note that while you can do the same stage for multiple
projects at once (e.g. gant classify_all) or multiple stages for one project at
once (e.g. gant links_jedit classify_jedit), multiple stages for multiple
projects (e.g. gant bugs_all parse_all) will probably fail.
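
One possible workaround, sketched here as an assumption rather than a tested
recipe, is to give each stage its own gant invocation so that no single call
mixes multiple stages with multiple projects. The helper names below are
hypothetical, and the stage order (bugs, then parse) is only illustrative.
Target names follow the <stage>_<project> pattern used in the examples above.

```python
import subprocess


def gant_commands(stages, project="all"):
    """Return one gant command line per stage, e.g. ["gant", "bugs_all"]."""
    return [["gant", "{}_{}".format(stage, project)] for stage in stages]


def run_stages(stages, project="all", dry_run=True):
    """Run each stage as a separate gant call, stopping at the first failure."""
    for cmd in gant_commands(stages, project):
        if dry_run:
            # Only show what would be run; nothing is executed.
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if gant exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumed stage order; adjust to match the targets in build/build.gant.
    run_stages(["bugs", "parse"])
```

With dry_run=True the script only prints the commands, so it can be checked
against build/build.gant before anything is actually run.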

To run without gant:
All the programs can be run as normal Python/Java programs as long as the path
is set up appropriately. Look at build/build.gant to see how this is done. The
first argument to every program is the absolute path to the config file for the
project you wish to run.
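
As a rough illustration of the argument convention, the sketch below builds a
command line whose first program argument is the absolute path to a config
file. The script name classify.py and the config name build/jedit.config are
hypothetical placeholders, not names taken from this repository, and any path
setup needed by the real programs is not shown.

```python
import os
import sys


def program_args(config_path, *extra):
    """Return the argument list with the absolute config path first."""
    return [os.path.abspath(config_path)] + list(extra)


def python_command(script, config_path, *extra):
    """Build the full command for running a Python program in this style."""
    return [sys.executable, script] + program_args(config_path, *extra)


if __name__ == "__main__":
    # Hypothetical script and config names, used only for illustration.
    print(" ".join(python_command("classify.py", "build/jedit.config")))
```

The command is only printed here; passing the resulting list to
subprocess.run would execute it.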
