Finding $\tau^{-} \to \mu^{-}\mu^{+}\mu^{-}$ Using TensorFlow

This is a real example from a CERN competition hosted on Kaggle (ended Oct 2015). The task is to find $\tau \to 3\mu$ decays; lepton flavor violation, if detected, would indicate new physics beyond the Standard Model. Some of the labeled data are simulated, but the successful learning models from the competition can be used for real detections. See this technical CERN report.

The 1st place team's solution, published on the Kaggle blog and on GitHub, used "physical intuition + XGBoost + ANN". Tree boosting was also used by some winning teams in the previous contest, which searched for the Higgs boson (ended Sept 2014); in fact XGBoost has been popular in many other contests too. References: "Higgs Boson Discovery with Boosted Trees" and "The Higgs boson machine learning challenge".

This 3-year-old video by David Hogg (NYU) (April 2013) reflects how much things have changed in terms of the adoption of ANNs in the science community. Thanks to TensorFlow (Google) and Torch (NYU, Facebook), ANNs are a lot easier to use now!

In [3]:
from IPython.display import YouTubeVideo
YouTubeVideo("aA3qdegi8Vw", start=564,end=642, autoplay=0,
             theme="light", color="white", rel=0, 
             height=400, width=500,showinfo=0)
Out[3]:
In [ ]:
'''
My notes on boosting discussed the main difference between tree
bagging (random forest) and tree boosting. Essentially one reduces
variance while the other reduces bias (improving accuracy), but
boosting is more prone to overfitting.

We can also combine the two.
The following code was copied from
https://www.kaggle.com/benhamner/flavours-of-physics/
                           rf-xgboost-example/code
'''

from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb

print "Train a Random Forest model"
rf = RandomForestClassifier(n_estimators=100, random_state=1)

rf.fit(train[features], train["signal"])
#train[features] are the features,
#train["signal"] holds the 0/1 labels for background/signal

print "Train a XGBoost model"
params = {"objective": "binary:logistic",
          "eta": 0.3,
          "max_depth": 5,
          "min_child_weight": 3,
          "silent": 1,
          "subsample": 0.7,
          "colsample_bytree": 0.7,
          "seed": 1}
num_trees=250
gbm = xgb.train(params, xgb.DMatrix(train[features],
                       train["signal"]), num_trees)

#Combining tree boost and random forest
print "Make predictions on the test set"
predictions = (rf.predict_proba(test[features])[:,1] +
              gbm.predict(xgb.DMatrix(test[features])))/2
In [ ]:
'''
Walking through the 1st place team's technical write-up on the blog,
we find that they first created a new attribute, new_mass,
which turns out to be highly correlated with the measured mass
in the training set. (Roughly: if LifeTime is the proper lifetime,
then FlightDistance/LifeTime ~ beta*gamma*c, and p divided by
beta*gamma*c is the invariant mass.) A quick correlation check is
sketched right after the feature code below.

The following code was copied from their github solution
https://github.com/aguschin/flavours-of-physics/blob/
                       master/AssemblingFinalSolution4.ipynb

'''

for tt in train,test, check_agreement, check_correlation:
    tt['p0_pz'] = (tt.p0_p**2 - tt.p0_pt**2)**0.5
    tt['p1_pz'] = (tt.p1_p**2 - tt.p1_pt**2)**0.5
    tt['p2_pz'] = (tt.p2_p**2 - tt.p2_pt**2)**0.5
    tt['pz'] = tt.p0_pz + tt.p1_pz + tt.p2_pz
    tt['p'] = (tt.pt**2 + tt.pz**2)**0.5
    tt['speed'] = tt.FlightDistance / tt.LifeTime
    tt['new_mass'] = tt.p / tt.speed
    
features_m = features + ['new_mass']
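
# Quick sanity check of the claimed correlation -- a minimal sketch,
# assuming the training frame has the measured 'mass' column (present in
# the competition's training data but not in the test set):
import numpy as np
print(np.corrcoef(train['new_mass'], train['mass'])[0, 1])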

'''
Cross-validation was used throughout.

The mass column is missing from the test set, so they used xgboost
to predict the mass (called new_mass2, computed for the training set
as well), and added two more derived features,
'new_mass_delta' and 'new_mass_ratio' (sketched right below).

Then they used nolearn.lasagne for the neural net + parallelism.

'''
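
# A sketch of the two extra features -- the exact definitions are an
# assumption here (the team's notebook has the real formulas), and it
# presumes tt['new_mass2'] has already been filled with the xgboost
# prediction of the mass:
for tt in train, test, check_agreement, check_correlation:
    tt['new_mass_delta'] = tt['new_mass'] - tt['new_mass2']
    tt['new_mass_ratio'] = tt['new_mass'] / tt['new_mass2']
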
import nolearn
from lasagne.layers import DenseLayer, InputLayer, NINLayer
from lasagne.nonlinearities import softmax, rectify, sigmoid
from lasagne.updates import rmsprop, adagrad, sgd, adadelta
from nolearn.lasagne import NeuralNet

# layers0 is not defined in the snippet above; the standard nolearn
# pattern (assumed here) is input -> dense1 hidden layer -> output,
# with names matching the *_num_units keywords below
layers0 = [('input', InputLayer),
           ('dense1', DenseLayer),
           ('output', DenseLayer)]

net = NeuralNet(layers=layers0,

                 input_shape=(None, Xtrain.shape[1]),
                 dense1_num_units=8, # 1 hidden layer with 8 neurons

                 output_num_units=2,
                 output_nonlinearity=softmax, #then softmax layer

                 update=adagrad,
                 update_learning_rate=0.01,

                 eval_size=.005,
                 verbose=1,
                 max_epochs=3000)

net.fit(Xtrain, Ytrain) ;

'''
In summary, from their code we learn that xgboost is about
95-98% accurate with varying numbers of trees,
and the NN is about 99.8% accurate with k-fold CV (k=5);
then they blended everything together (a toy weighted blend is
sketched below).

They didn't show how they got the final blending formula, but
it looked like another NN was used to calculate the weights.

Finally they reached a score of 0.999997749873.
'''
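
# A toy weighted blend of the two models -- a sketch only: the weights
# below are made up (the team apparently learned theirs with another NN),
# and dtest / Xtest stand in for whatever test-feature matrices the two
# models were actually trained on:
w_xgb, w_nn = 0.3, 0.7
p_xgb = gbm.predict(dtest)
p_nn = net.predict_proba(Xtest)[:, 1]
blend = w_xgb * p_xgb + w_nn * p_nn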
In [4]:
'''
One can replace the nolearn.lasagne part of their solution
with TensorFlow softmax (logistic) regression;
it runs in parallel and its accuracy is about 99.2%
(an accuracy check in the tutorial's style is sketched at the end).

Reference:
www.tensorflow.org/versions/0.6.0/tutorials/mnist/pros/index.html
'''

import tensorflow as tf
import numpy as np
In [5]:
sess = tf.InteractiveSession()

#TensorFlow needs placeholders because the graph is built first and
#data are fed in at run time; this lets it assign batches of
#matrices to different processors and run in parallel.

x = tf.placeholder(tf.float32, [None, 45]) #number of features
y_ = tf.placeholder(tf.float32, [None, 2])

W = tf.Variable(tf.zeros([45, 2])) 
b = tf.Variable(tf.zeros([2]))


# softmax (logistic) regression: logits = xW + b
logits = tf.matmul(x, W) + b
y = tf.nn.softmax(logits)

# Define loss and optimizer, then train
cross = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, y_))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross)
predict_op = tf.argmax(logits, 1)


sess.run(tf.initialize_all_variables()) 

for i in range(1000):
  # xtrain is assumed to provide an MNIST-style next_batch() helper
  batch = xtrain.next_batch(50) #batch size
  # no dropout layer is defined above, so there is no keep_prob to feed
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})
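
# A minimal sketch of the accuracy check, in the style of the referenced
# MNIST tutorial. xtest and ytest (held-out features and one-hot labels)
# are assumptions, not defined above:
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval(feed_dict={x: xtest, y_: ytest}))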