Aiming for a return rate above 100% with AI horse racing

はなむけ競馬場


[Day 3] Typing out "How I made top 0.3% on a Kaggle competition" by hand

Posted on:

This is day 3 of typing out the notebook "How I made top 0.3% on a Kaggle competition" by hand.

The previous post is here.

Log transforms and squaring

Create new features from the numeric columns by taking their log or squaring them.


<pre>import numpy as np

# Add a log-transformed copy of each listed column
def logs(res, ls):
    for l in ls:
        # Append the new column as l + "_log"; the 1.01 offset keeps the log finite for zeros
        res = res.assign(**{l + "_log": np.log(1.01 + res[l])})
    # Return the extended frame (without this the caller would receive None)
    return res

log_features = ['LotFrontage','LotArea','MasVnrArea','BsmtFinSF1','BsmtFinSF2','BsmtUnfSF',
                 'TotalBsmtSF','1stFlrSF','2ndFlrSF','LowQualFinSF','GrLivArea',
                 'BsmtFullBath','BsmtHalfBath','FullBath','HalfBath','BedroomAbvGr','KitchenAbvGr',
                 'TotRmsAbvGrd','Fireplaces','GarageCars','GarageArea','WoodDeckSF','OpenPorchSF',
                 'EnclosedPorch','3SsnPorch','ScreenPorch','PoolArea','MiscVal','YearRemodAdd','TotalSF']

all_features = logs(all_features, log_features)</pre>
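As a small illustration of the offset used in the log step, here is a sketch on toy data. The `toy` frame and its values are made up for this example and are not taken from the competition data. The only point is that adding 1.01 before taking the log keeps rows containing 0 finite, whereas a plain `np.log(0)` would give `-inf`.

```python
import numpy as np
import pandas as pd

# Toy column with several zeros (hypothetical values, for illustration only)
toy = pd.DataFrame({"PoolArea": [0, 0, 512, 0, 648]})

# Same transform as in logs(): shift by 1.01, then take the natural log
toy["PoolArea_log"] = np.log(1.01 + toy["PoolArea"])

print(toy)
```

The zero rows end up at `np.log(1.01)`, a small positive number, so no infinities enter the feature matrix.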
<pre># Add a squared copy of each listed column
def squares(res, ls):
    for l in ls:
        # Append the new column as l + "_sq"
        res = res.assign(**{l + "_sq": res[l] * res[l]})
    return res

squared_features = ['YearRemodAdd', 'LotFrontage_log', 
              'TotalBsmtSF_log', '1stFlrSF_log', '2ndFlrSF_log', 'GrLivArea_log',
              'GarageCars_log', 'GarageArea_log']
all_features = squares(all_features, squared_features)</pre>




 

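There is no way to know in advance whether the log or the square will work better for the model, so I think the idea is simply to generate both as extra features and try them.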
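To get a feel for what the two transforms do to a column, here is a rough sketch on synthetic data. The column name `area`, the lognormal distribution, and the seed are all assumptions chosen for illustration. Nothing here comes from the original notebook or the Kaggle data.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed values standing in for an area-like column (not Kaggle data)
rng = np.random.default_rng(0)
toy = pd.DataFrame({"area": rng.lognormal(mean=9.0, sigma=1.0, size=1000)})

# Build both candidate features, the same way the log and square steps do
toy["area_log"] = np.log(1.01 + toy["area"])
toy["area_sq"] = toy["area"] * toy["area"]

# Sample skewness of each column (0 would mean a symmetric distribution)
skews = toy.skew()
print(skews)
```

On data like this the log version should come out much closer to symmetric than the raw column. Which version actually helps a given model still has to be found by trying, which fits the reading above.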

 

Convert categorical variables with get_dummies()

<pre>all_features = pd.get_dummies(all_features).reset_index(drop=True)</pre>
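The effect of this line can be sketched on a toy frame. The two columns and their values below are hypothetical, and the repeated index is set on purpose only to show what `reset_index(drop=True)` does.

```python
import pandas as pd

# Toy frame: one numeric column, one categorical column, and a repeated index
toy = pd.DataFrame(
    {"GrLivArea": [1710, 1262, 1786], "Street": ["Pave", "Grvl", "Pave"]},
    index=[0, 1, 0],
)

# Numeric columns pass through; each category becomes its own indicator column
encoded = pd.get_dummies(toy).reset_index(drop=True)

print(encoded.columns.tolist())  # ['GrLivArea', 'Street_Grvl', 'Street_Pave']
print(encoded.index.tolist())    # [0, 1, 2]
```

The numeric column is left alone, the text column is replaced by one indicator column per category, and the index is renumbered from 0.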

 

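Remove duplicated features

Features generated mechanically can end up with duplicate column names, so drop the repeats and keep only the first column for each name.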

<pre># Remove any duplicated column names
all_features = all_features.loc[:, ~all_features.columns.duplicated()]</pre>
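A minimal sketch of how this filter behaves, using a made-up frame with a repeated column name. Note that `columns.duplicated()` compares column names only, not the values inside the columns.

```python
import pandas as pd

# Toy frame in which the name "a" appears twice (hypothetical data)
toy = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# True marks every repeat of a name that has already been seen
mask = toy.columns.duplicated()

# Keep only the first column for each name
deduped = toy.loc[:, ~mask]

print(mask.tolist())             # [False, False, True]
print(deduped.columns.tolist())  # ['a', 'b']
```

The first `a` column (values 1 and 4) survives and the later one is dropped.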


Copyright© はなむけ競馬場 , 2021 All Rights Reserved Powered by AFFINGER5.