- This is the code for the MLRC2020 challenge, reproducing the ACL 2020 paper "Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings"[1].
 - The code is built upon [1]:5d8fdbd4.
 - Minor modifications have been made to 5d8fdbd4 in order to perform the ablation study. For any query relating to the original code[1], please contact Apoorv.
 
- Knowledge Graph Embedding model
 - TuckER
 - Tested on {MetaQA_full, MetaQA_half} datasets

- Question embedding models
 - ALBERT
 - XLNet
 - Longformer
 - SentenceBERT (SentenceTransformer)
 - Tested on {fbwq_full, fbwq_half} datasets
 
 
- Python >= 3.7.5, pip
 - zip, unzip
 - Docker (Recommended)
- PyTorch version 1.3.0a0+24ae9b5. For more info, visit here.
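A quick interpreter check before installing anything can catch version mismatches early. A minimal sketch (the helper name is ours, not part of the repo):

```python
import sys

def python_ok(min_version=(3, 7, 5)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:3] >= min_version

if __name__ == "__main__":
    print("Python OK:", python_ok())
```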
 
- Docker image: Cuda-Python[2] can be used, with the `runtime` tag:

docker run -itd --rm --runtime=nvidia -v /raid/kgdnn/:/raid/kgdnn/ --name embedkgqa__4567 -e NVIDIA_VISIBLE_DEVICES=4,5,6,7 -p 7777:7777 qts8n/cuda-python:runtime
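The container name, GPU list, and port in the command above move together, so a small launcher helper can keep them consistent. A hedged bash sketch (the function name is ours; it only echoes the command for inspection rather than running it):

```shell
#!/usr/bin/env bash
# Sketch: assemble the `docker run` command from a GPU list, port, and image.
# The volume mount mirrors the command above; adjust to your setup.
build_run_cmd() {
  local gpus="$1" port="$2" image="$3"
  # e.g. "4,5,6,7" -> container name "embedkgqa__4567"
  local name="embedkgqa__$(echo "$gpus" | tr -d ,)"
  echo "docker run -itd --rm --runtime=nvidia" \
       "-v /raid/kgdnn/:/raid/kgdnn/" \
       "--name ${name}" \
       "-e NVIDIA_VISIBLE_DEVICES=${gpus}" \
       "-p ${port}:${port} ${image}"
}

build_run_cmd 4,5,6,7 7777 qts8n/cuda-python:runtime
```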
 
- Alternatively, Docker image: Embed_KGQA[3] can be used. It is built upon [2] and contains all the packages needed for the experiments.
 - Use the `env` tag for the image without models.
 - Use the `env-models` tag for the image with models.

docker run -itd --rm --runtime=nvidia -v /raid/kgdnn/:/raid/kgdnn/ --name embedkgqa__4567 -e NVIDIA_VISIBLE_DEVICES=4,5,6,7 -p 7777:7777 jishnup/embed_kgqa:env
 - All the required packages and models (from the extended study with better performance) are readily available in [3].
- Model locations within the docker container:
 - `/raid/mlrc2020models/embeddings/` contains the KG embedding models.
 - `/raid/mlrc2020models/qa_models/` contains the QA models.
 
 - The experiments have been done using [2]; the package versions in requirements.txt have been set accordingly and may differ from [1].
 - The `KGQA/LSTM` and `KGQA/RoBERTa` directory nomenclature hasn't been changed, to avoid unnecessary confusion w.r.t. the original codebase[1].
 - `fbwq_full` and `fbwq_full_new` are the same dataset, but both must exist independently because:
   - the pretrained `ComplEx` model uses `fbwq_full_new` as the dataset name;
   - the trained `SimplE` model uses `fbwq_full` as the dataset name.
 - No `fbwq_full_new` dataset was found in the data shared by the author[1], so we went ahead with this setting.
 - Pretrained qa_models were also absent from the shared data; the reproduction results are therefore based on our own training scheme.
 - For training on the QA datasets, use `batch_size >= 2`.
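Given the dual naming of `fbwq_full`/`fbwq_full_new` noted above, a symlink can serve the second name without duplicating data. A sketch under our assumption about the data directory layout (not part of the original setup):

```shell
#!/usr/bin/env bash
# Sketch: expose `fbwq_full` under the second name (`fbwq_full_new`)
# expected by the pretrained ComplEx checkpoint.
link_dataset_alias() {
  local data_dir="$1"   # e.g. "$EMBED_KGQA_DIR/data" (assumed layout)
  ln -sfn fbwq_full "${data_dir}/fbwq_full_new"
}
```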
# Clone the repo
git clone https://github.com/jishnujayakumar/MLRC2020-EmbedKGQA && cd "$_"

# Set an env variable EMBED_KGQA_DIR to the absolute path of the MLRC2020-EmbedKGQA/ directory
# If using the bash shell, run:
echo 'export EMBED_KGQA_DIR=`pwd`' >> ~/.bash_profile && source ~/.bash_profile

# Change script permissions
chmod -R 700 scripts/

# Initial setup
./scripts/initial_setup.sh

# Download and unzip data and pretrained_models from the original EmbedKGQA paper
./scripts/download_artifacts.sh

# Install LibKGE
./scripts/install_libkge.sh

- Steps to train KG embeddings.
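Before launching any training run, it can help to verify that the setup steps above actually exported `EMBED_KGQA_DIR`. A minimal sketch (the helper name is ours, not part of the repo's scripts):

```shell
#!/usr/bin/env bash
# Sketch: verify EMBED_KGQA_DIR is set and points at an existing directory.
check_embed_kgqa_dir() {
  local dir="${EMBED_KGQA_DIR:-}"
  [ -n "$dir" ] && [ -d "$dir" ] && echo "EMBED_KGQA_DIR=$dir"
}
```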
 
Hyperparameters in the following commands are set w.r.t. [1].
# Method: 1
cd $EMBED_KGQA_DIR/KGQA/LSTM;
# --gpu: GPU-ID | --hops: n-hops | --model: KGE model | --use_cuda: enable CUDA
python main.py  --mode train \
            --nb_epochs 100 \
            --relation_dim 200 \
            --hidden_dim 256 \
            --gpu 0 \
            --freeze 0 \
            --batch_size 64 \
            --validate_every 4 \
            --hops <1/2/3> \
            --lr 0.0005 \
            --entdrop 0.1 \
            --reldrop 0.2 \
            --scoredrop 0.2 \
            --decay 1.0 \
            --model <ComplEx/TuckER> \
            --patience 10 \
            --ls 0.0 \
            --use_cuda True \
            --kg_type <half/full>
        
# Method: 2
# Modify the hyperparameters in the script file w.r.t. your usecase
$EMBED_KGQA_DIR/scripts/train_metaQA.sh \
    <ComplEx/TuckER> \
    <half/full> \
    <1/2/3> \
    <batch_size> \
    <gpu_id> \
    <relation_dim>

# Method: 1
cd $EMBED_KGQA_DIR/KGQA/RoBERTa;
python main.py  --mode train \
                --relation_dim 200 \
                --que_embedding_model RoBERTa \
                --do_batch_norm 0 \
                --gpu 0 \
                --freeze 1 \
                --batch_size 16 \
                --validate_every 10 \
                --hops webqsp_half \
                --lr 0.00002 \
                --entdrop 0.0 \
                --reldrop 0.0 \
                --scoredrop 0.0 \
                --decay 1.0 \
                --model ComplEx \
                --patience 20 \
                --ls 0.0 \
                --l3_reg 0.001 \
                --nb_epochs 200 \
                --outfile delete
# Method: 2
# Modify the hyperparameters in the script file w.r.t. your usecase
$EMBED_KGQA_DIR/scripts/train_webqsp.sh \
    <ComplEx/SimplE> \
    <RoBERTa/ALBERT/XLNet/Longformer/SentenceTransformer> \
    <half/full> \
    <batch_size> \
    <gpu_id> \
    <relation_dim>

- To test, set the `mode` parameter to `test` (keep the other hyperparameters the same as used in training).
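This train/test parity can also be captured programmatically, e.g. when driving runs from a script. A minimal sketch (the dict keys mirror the flags above; the helper function is hypothetical, not part of the repo):

```python
def make_test_args(train_args):
    """Return a copy of the training arguments with only `mode` switched to test."""
    return {**train_args, "mode": "test"}

# Example: the same hyperparameters used for training, reused for testing.
train_args = {"mode": "train", "hops": "webqsp_half", "model": "ComplEx", "batch_size": 16}
test_args = make_test_args(train_args)
```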
- Details about data and pretrained weights.
 - Details about dataset creation.
 - Presentation for [1] by Apoorv.
 
Please cite the following if you incorporate our work.
@article{P:2021,
  author = {P, Jishnu Jaykumar and Sardana, Ashish},
  title = {{[Re] Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings}},
  journal = {ReScience C},
  year = {2021},
  month = may,
  volume = {7},
  number = {2},
  pages = {{#15}},
  doi = {10.5281/zenodo.4834942},
  url = {https://zenodo.org/record/4834942/files/article.pdf},
  code_url = {https://github.com/jishnujayakumar/MLRC2020-EmbedKGQA},
  code_doi = {},
  code_swh = {swh:1:dir:c95bc4fec7023c258c7190975279b5baf6ef6725},
  data_url = {},
  data_doi = {},
  review_url = {https://openreview.net/forum?id=VFAwCMdWY7},
  type = {Replication},
  language = {Python},
  domain = {ML Reproducibility Challenge 2020},
  keywords = {knowledge graph, embeddings, multi-hop, question-answering, deep learning}
}

The following 3 options are available for any clarification, comments, or suggestions:
- Join the discussion forum.
 - Create an issue.
 - Contact Jishnu or Ashish.