TIRA
To compete in the SemEval competition, your code must run on the TIRA system. We are distributing sample code that will help you adapt your existing code to run on TIRA.
These instructions worked for us, but you may need some help adapting them to your code. You may also encounter errors that we didn’t, so please let us know about any problems you run into so we can update this page!
Saving out your model
One of the files we are providing, train.py, trains a model and then writes out everything the model needs to a pickle file. In the sample code, which is a version of the Lab 7 sample solution, the things we need for the model are:
- An HNLabels object (label_maker in the sample code)
- An HNFeatures object (feature_maker in the sample code)
- A trained classifier object (clf, a MultinomialNB, in the sample code)
The saved model (say, model.pkl) should be copied to your virtual machine along with predict.py and any other files your prediction code depends on (e.g., HyperpartisanNewsReader.py and MyModelCode.py – or whatever files you’ve already made that you need).
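As a rough illustration of the "write everything to one pickle file" idea, here is a minimal sketch. The helper names save_model and load_model and the dictionary keys are hypothetical, chosen for this example; the provided train.py and predict.py may lay the file out differently.

```python
import pickle


def save_model(path, label_maker, feature_maker, clf):
    """Write everything prediction needs into a single pickle file.

    The function name and the dictionary keys are illustrative only.
    """
    bundle = {
        "label_maker": label_maker,
        "feature_maker": feature_maker,
        "clf": clf,
    }
    with open(path, "wb") as f:
        pickle.dump(bundle, f)


def load_model(path):
    """Read the bundle back and return the three objects in a fixed order."""
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    return bundle["label_maker"], bundle["feature_maker"], bundle["clf"]
```

Note that pickle stores a reference to each object's class rather than the class code itself, so the modules that define those classes must be importable wherever the file is loaded. That is why the supporting .py files need to travel to the VM alongside model.pkl.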
Preprocessing the XML data
Another provided file, preprocess.py, does all of the preprocessing you’ve come to know and love for the training data! You’re welcome to edit it as you see fit (if you want to add extra preprocessing to the data beyond what we’ve already provided you with).
In predict.py, the first block of code in the do_predict() function, under the comment “#Preprocessing!”, does the file handling and preprocessing. You’ll probably want to leave that part unchanged and make edits only to the parts of the file that come later, but of course you can make changes if you’d like.
Reading in the saved model
The next part of predict.py reads in the pickled model information that you saved out with train.py.
predict.py then uses those pickled objects to make predictions on a new dataset. The TIRA evaluation script passes only a directory name giving the location of the XML file(s) to be evaluated, not a filename. Consequently, predict.py takes the location of a directory of XML file(s) as one of its arguments. The do_process() function preprocesses every XML file in that directory and returns an open file pointer for a single XML file, though, so the rest of your prediction code should not need to change.
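If you end up writing your own directory handling, the "directory in, XML files out" step might look something like the sketch below. The function find_xml_files is a hypothetical helper written for this page, not part of the provided code, and it only looks at files directly inside the given directory.

```python
from pathlib import Path


def find_xml_files(input_dir):
    """Return the XML files directly inside input_dir, sorted by name.

    Hypothetical helper: the provided predict.py has its own handling.
    """
    directory = Path(input_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"{input_dir} is not a directory")
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == ".xml"
    )
```

Failing early when the argument is not a directory makes a mistyped path much easier to spot in the TIRA run output.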
Command-line run
Before trying to get the web interface to run, make sure your command works when you ssh into your VM. Ours looks like:
python3 predict.py model.pkl /media/training-datasets/hyperpartisan-news-detection/pan19-hyperpartisan-news-detection-by-article-training-dataset-2018-11-22 out/
Here, out/ is the name of a new, empty directory where we’ll store the predictions, and the path starting with /media points to a read-only copy of the hand-labeled portion of the training set.
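The command above passes three positional arguments: the model file, the input directory, and the output directory. A minimal sketch of handling them with argparse follows. The argument names and the helper prepare_output_dir are assumptions made for this example, and the provided predict.py may read its arguments differently.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the three positional arguments used in the command above."""
    parser = argparse.ArgumentParser(
        description="Predict labels for every XML file in a directory."
    )
    parser.add_argument("model", help="path to the pickled model, e.g. model.pkl")
    parser.add_argument("input_dir", help="directory containing the XML file(s)")
    parser.add_argument("output_dir", help="directory to write predictions into")
    return parser.parse_args(argv)


def prepare_output_dir(output_dir):
    """Create the output directory if it does not exist yet (hypothetical helper)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return out
```

Creating the output directory inside the script means a missing out/ directory will not cause the run to fail.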
Once that works, you can try the web interface.
Setup
Go to the TIRA website and, under “PAN 2012-2018”, select Hyperpartisan News Detection. Log in using your TIRA credentials.
On the website, go to the “My Software” tab and set up a new Software instance. The command from above would be translated as:
python3 predict.py model.pkl $inputDataset $outputDir
If all of your code is in your home directory on the VM, you shouldn’t need to set any other variables.
You can choose which input dataset you want to run on from a drop-down. Eventually, you’ll want to run on the validation (and, even more eventually, the test) set. For a quicker check that everything is working, you can run an evaluation on the hand-labeled training data set, which is small.
Completing a “Run”
Once you set your software up, click “Run.” TIRA will move your VM into sandboxing mode, which means that you won’t be able to access it directly (any existing ssh connections will be closed) until your prediction run is finished. The shutdown-sandbox-startup process takes about 4 minutes, which is overhead on top of whatever time it takes your system to generate its predictions.
While the software is running, the TIRA page will continuously refresh itself. Once it is finished, there will be a new “Runs” section at the bottom of the “My Software” tab on TIRA. Your run will show up there, and clicking the “i” icon to the right will take you to a detailed view of that run.
Evaluating a “Run”
Once your run completes successfully, move to the Evaluator box. Select your run from the “Input Run” dropdown and then click “Run”. That will add another entry to the “Runs” box whose software is listed as “evaluator1.” Clicking the “i” icon on the far right of that row will show the output of the evaluator, which will include all of the performance metrics you’re likely to be interested in.