Trying Kaggle House Prices (implemented with Keras)

Everything is lacking, and there is a lot I still have to do.
Score: 0.18858.

Overview:
https://trueman-developer.blogspot.com/2019/08/kaggle-house-prices.html
Data visualization:
https://trueman-developer.blogspot.com/2019/08/kaggle-house-prices_8.html


Other beginner-friendly competitions:
Titanic survival prediction
https://trueman-developer.blogspot.com/2019/07/keras.html

Handwritten digit recognition
https://trueman-developer.blogspot.com/2019/07/kaggle-digit-recognizer-keras.html





Sample code



Maybe I should have published this as a Kaggle kernel instead?
https://github.com/ninomae-makoto/kaggle/blob/master/house-prices01.ipynb

Loading the data


import pandas as pd
import numpy as np

# Data from https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
origin_train = pd.read_csv("./data/house-prices-advanced-regression-techniques/train.csv")
origin_test = pd.read_csv("./data/house-prices-advanced-regression-techniques/test.csv")
np.random.seed(666)
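Before imputing, it helps to see which columns actually have gaps. Below is a minimal sketch built on pandas' `isnull().sum()`. The helper name `missing_summary` and the demo values are made up for illustration. Run against the real files, a summary like this also shows that the test set has gaps in columns the training set does not, which is why the test block further down needs extra `fillna` calls.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, keeping only columns that have any,
    sorted from most to fewest."""
    counts = df.isnull().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Tiny stand-in for origin_train; with the real data you would call
# missing_summary(origin_train) and missing_summary(origin_test).
demo = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan],
    "Alley": [None, "Grvl", None, None],
    "GrLivArea": [1710, 1262, 1786, 1717],
})
print(missing_summary(demo))
```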


Dropping unneeded columns and filling in missing values


train = origin_train.copy()
test = origin_test.copy()

del train['Id']
del train['Utilities']

del test['Id']
del test['Utilities']

    
train.MSSubClass = train.MSSubClass.replace([20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160 , 180, 190], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
train.MSZoning = train.MSZoning.replace(['A', 'C (all)', 'FV', 'I', 'RH', 'RL', 'RP', 'RM'], [0, 1, 2, 3, 4, 5, 6, 7])
train.LotFrontage = train.LotFrontage.fillna(train.LotFrontage.median())
train.Street = train.Street.replace(['Grvl', 'Pave'], [0, 1])
train.Alley = train.Alley.fillna('NA').replace(['Grvl', 'Pave', 'NA'], [0, 1, 2])
train.LotShape = train.LotShape.replace(['Reg', 'IR1', 'IR2', 'IR3'], [0, 1, 2, 3])
train.LandContour = train.LandContour.replace(['Lvl', 'Bnk', 'HLS', 'Low'], [0, 1, 2, 3])
train.LotConfig = train.LotConfig.replace(['Inside', 'Corner', 'CulDSac', 'FR2', 'FR3'], [0, 1, 2, 3, 4])
train.LandSlope = train.LandSlope.replace(['Gtl', 'Mod', 'Sev'], [0, 1, 2])
train.Neighborhood = train.Neighborhood.replace(
    ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'Names', 'NoRidge', 'NPkVill', 'NridgHt', 'NWAmes', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker', 'NAmes'],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25])
train.Condition1 = train.Condition1.replace(['Artery', 'Feedr', 'Norm', 'RRNn', 'RRAn', 'PosN', 'PosA', 'RRNe', 'RRAe'], [0, 1, 2, 3, 4, 5, 6, 7, 8])
train.Condition2 = train.Condition2.replace(['Artery', 'Feedr', 'Norm', 'RRNn', 'RRAn', 'PosN', 'PosA', 'RRNe', 'RRAe'], [0, 1, 2, 3, 4, 5, 6, 7, 8])
train.BldgType = train.BldgType.replace(['1Fam', '2fmCon', 'Duplex', 'Twnhs', 'TwnhsE'], [0, 1, 2, 3, 4])
train.HouseStyle = train.HouseStyle.replace(['1Story', '1.5Fin', '1.5Unf', '2Story', '2.5Fin', '2.5Unf', 'SFoyer', 'SLvl'], [0, 1, 2, 3, 4, 5, 6, 7])
train.RoofStyle = train.RoofStyle.replace(['Flat', 'Gable', 'Gambrel', 'Hip', 'Mansard', 'Shed'], [0, 1, 2, 3, 4, 5])
train.RoofMatl = train.RoofMatl.replace(['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl'], [0, 1, 2, 3, 4, 5, 6, 7])
train.Exterior1st = train.Exterior1st.replace(
    ['AsbShng', 'AsphShn', 'BrkComm', 'BrkFace', 'CBlock', 'CemntBd', 'HdBoard', 'ImStucc', 'MetalSd', 'Other', 'Plywood', 'PreCast', 'Stone', 'Stucco', 'VinylSd', 'Wd Sdng', 'WdShing'], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
train.Exterior2nd = train.Exterior2nd.replace(['Brk Cmn', 'CmentBd'], ['BrkComm', 'CemntBd']).replace(
    ['AsbShng', 'AsphShn', 'BrkComm', 'BrkFace', 'CBlock', 'CemntBd', 'HdBoard', 'ImStucc', 'MetalSd', 'Other', 'Plywood', 'PreCast', 'Stone', 'Stucco', 'VinylSd', 'Wd Sdng', 'WdShing', 'Wd Shng', 'CmentBd', 'Cmn'], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
train.MasVnrType = train.MasVnrType.fillna('None').replace(['BrkCmn', 'BrkFace', 'CBlock', 'None', 'Stone'], [0, 1, 2, 3, 4])
train.MasVnrArea = train.MasVnrArea.fillna(0)
train.ExterQual = train.ExterQual.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
train.ExterCond = train.ExterCond.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
train.Foundation = train.Foundation.replace(['BrkTil', 'CBlock', 'PConc', 'Slab', 'Stone', 'Wood'], [0, 1, 2, 3, 4, 5])
train.BsmtQual = train.BsmtQual.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
train.BsmtCond = train.BsmtCond.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
train.BsmtExposure = train.BsmtExposure.fillna('NA').replace(['Gd', 'Av', 'Mn', 'No', 'NA'], [0, 1, 2, 3, 4])
train.BsmtFinType1 = train.BsmtFinType1.fillna('NA').replace(['GLQ', 'ALQ', 'BLQ', 'Rec', 'LwQ', 'Unf', 'NA'], [0, 1, 2, 3, 4, 5, 6])
train.BsmtFinType2 = train.BsmtFinType2.fillna('NA').replace(['GLQ', 'ALQ', 'BLQ', 'Rec', 'LwQ', 'Unf', 'NA'], [0, 1, 2, 3, 4, 5, 6])
train.Heating = train.Heating.replace(['Floor', 'GasA', 'GasW', 'Grav', 'OthW', 'Wall'], [0, 1, 2, 3, 4, 5])
train.HeatingQC = train.HeatingQC.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
train.CentralAir = train.CentralAir.replace(['N', 'Y'], [0, 1])
train.Electrical = train.Electrical.fillna('SBrkr')
train.Electrical = train.Electrical.replace(['SBrkr', 'FuseA', 'FuseF', 'FuseP', 'Mix'], [0, 1, 2, 3, 4])
train.KitchenQual = train.KitchenQual.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
train.Functional = train.Functional.replace(['Typ', 'Min1', 'Min2', 'Mod', 'Maj1', 'Maj2', 'Sev', 'Sal'], [0, 1, 2, 3, 4, 5, 6, 7])
train.FireplaceQu = train.FireplaceQu.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
train.GarageType = train.GarageType.fillna('NA').replace(['2Types', 'Attchd', 'Basment', 'BuiltIn', 'CarPort', 'Detchd', 'NA'], [0, 1, 2, 3, 4, 5, 6])
train.GarageYrBlt = train.GarageYrBlt.fillna(0)
train.GarageFinish = train.GarageFinish.fillna('NA').replace(['Fin', 'RFn', 'Unf', 'NA'], [0, 1, 2, 3])
train.GarageQual = train.GarageQual.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
train.GarageCond = train.GarageCond.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
train.PavedDrive = train.PavedDrive.replace(['Y', 'P', 'N'], [0, 1, 2])
train.PoolQC = train.PoolQC.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'NA'], [0, 1, 2, 3, 4])
train.Fence = train.Fence.fillna('NA').replace(['GdPrv', 'MnPrv', 'GdWo', 'MnWw', 'NA'], [0, 1, 2, 3, 4])
train.MiscFeature = train.MiscFeature.fillna('NA').replace(['Elev', 'Gar2', 'Othr', 'Shed', 'TenC', 'NA'], [0, 1, 2, 3, 4, 5])
train.SaleType = train.SaleType.replace(['WD', 'CWD', 'VWD', 'New', 'COD', 'Con', 'ConLw', 'ConLI', 'ConLD', 'Oth'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
train.SaleCondition = train.SaleCondition.replace(['Normal', 'Abnorml', 'AdjLand', 'Alloca', 'Family', 'Partial'], [0, 1, 2, 3, 4, 5])



test.MSSubClass = test.MSSubClass.replace([20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160 , 180, 190], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
test.MSZoning = test.MSZoning.replace(['A', 'C (all)', 'FV', 'I', 'RH', 'RL', 'RP', 'RM'], [0, 1, 2, 3, 4, 5, 6, 7])
test.LotFrontage = test.LotFrontage.fillna(test.LotFrontage.median())
test.Street = test.Street.replace(['Grvl', 'Pave'], [0, 1])
test.Alley = test.Alley.fillna('NA').replace(['Grvl', 'Pave', 'NA'], [0, 1, 2])
test.LotShape = test.LotShape.replace(['Reg', 'IR1', 'IR2', 'IR3'], [0, 1, 2, 3])
test.LandContour = test.LandContour.replace(['Lvl', 'Bnk', 'HLS', 'Low'], [0, 1, 2, 3])
test.LotConfig = test.LotConfig.replace(['Inside', 'Corner', 'CulDSac', 'FR2', 'FR3'], [0, 1, 2, 3, 4])
test.LandSlope = test.LandSlope.replace(['Gtl', 'Mod', 'Sev'], [0, 1, 2])
test.Neighborhood = test.Neighborhood.replace(
    ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'Names', 'NoRidge', 'NPkVill', 'NridgHt', 'NWAmes', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker', 'NAmes'],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25])
test.Condition1 = test.Condition1.replace(['Artery', 'Feedr', 'Norm', 'RRNn', 'RRAn', 'PosN', 'PosA', 'RRNe', 'RRAe'], [0, 1, 2, 3, 4, 5, 6, 7, 8])
test.Condition2 = test.Condition2.replace(['Artery', 'Feedr', 'Norm', 'RRNn', 'RRAn', 'PosN', 'PosA', 'RRNe', 'RRAe'], [0, 1, 2, 3, 4, 5, 6, 7, 8])
test.BldgType = test.BldgType.replace(['1Fam', '2fmCon', 'Duplex', 'Twnhs', 'TwnhsE'], [0, 1, 2, 3, 4])
test.HouseStyle = test.HouseStyle.replace(['1Story', '1.5Fin', '1.5Unf', '2Story', '2.5Fin', '2.5Unf', 'SFoyer', 'SLvl'], [0, 1, 2, 3, 4, 5, 6, 7])
test.RoofStyle = test.RoofStyle.replace(['Flat', 'Gable', 'Gambrel', 'Hip', 'Mansard', 'Shed'], [0, 1, 2, 3, 4, 5])
test.RoofMatl = test.RoofMatl.replace(['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl'], [0, 1, 2, 3, 4, 5, 6, 7])
test.Exterior1st = test.Exterior1st.fillna('VinylSd').replace(
    ['AsbShng', 'AsphShn', 'BrkComm', 'BrkFace', 'CBlock', 'CemntBd', 'HdBoard', 'ImStucc', 'MetalSd', 'Other', 'Plywood', 'PreCast', 'Stone', 'Stucco', 'VinylSd', 'Wd Sdng', 'WdShing'], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
test.Exterior2nd = test.Exterior2nd.replace(['Brk Cmn', 'CmentBd'], ['BrkComm', 'CemntBd']).fillna('VinylSd').replace(
    ['AsbShng', 'AsphShn', 'BrkComm', 'BrkFace', 'CBlock', 'CemntBd', 'HdBoard', 'ImStucc', 'MetalSd', 'Other', 'Plywood', 'PreCast', 'Stone', 'Stucco', 'VinylSd', 'Wd Sdng', 'WdShing', 'Wd Shng', 'CmentBd', 'Cmn'], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
test.MasVnrType = test.MasVnrType.fillna('None').replace(['BrkCmn', 'BrkFace', 'CBlock', 'None', 'Stone'], [0, 1, 2, 3, 4])
test.MasVnrArea = test.MasVnrArea.fillna(0)
test.ExterQual = test.ExterQual.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
test.ExterCond = test.ExterCond.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
test.Foundation = test.Foundation.replace(['BrkTil', 'CBlock', 'PConc', 'Slab', 'Stone', 'Wood'], [0, 1, 2, 3, 4, 5])
test.BsmtQual = test.BsmtQual.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
test.BsmtCond = test.BsmtCond.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
test.BsmtExposure = test.BsmtExposure.fillna('NA').replace(['Gd', 'Av', 'Mn', 'No', 'NA'], [0, 1, 2, 3, 4])
test.BsmtFinType1 = test.BsmtFinType1.fillna('NA').replace(['GLQ', 'ALQ', 'BLQ', 'Rec', 'LwQ', 'Unf', 'NA'], [0, 1, 2, 3, 4, 5, 6])
test.BsmtFinSF1 = test.BsmtFinSF1.fillna(0)
test.BsmtFinType2 = test.BsmtFinType2.fillna('NA').replace(['GLQ', 'ALQ', 'BLQ', 'Rec', 'LwQ', 'Unf', 'NA'], [0, 1, 2, 3, 4, 5, 6])
test.BsmtFinSF2 = test.BsmtFinSF2.fillna(0)
test.BsmtUnfSF = test.BsmtUnfSF.fillna(test.BsmtUnfSF.median())
test.TotalBsmtSF = test.TotalBsmtSF.fillna(test.TotalBsmtSF.median())
test.Heating = test.Heating.replace(['Floor', 'GasA', 'GasW', 'Grav', 'OthW', 'Wall'], [0, 1, 2, 3, 4, 5])
test.HeatingQC = test.HeatingQC.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
test.CentralAir = test.CentralAir.replace(['N', 'Y'], [0, 1])
test.Electrical = test.Electrical.replace(['SBrkr', 'FuseA', 'FuseF', 'FuseP', 'Mix'], [0, 1, 2, 3, 4])
test.BsmtFullBath = test.BsmtFullBath.fillna(0)
test.BsmtHalfBath = test.BsmtHalfBath.fillna(0)
test.KitchenQual = test.KitchenQual.fillna('TA')
test.KitchenQual = test.KitchenQual.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
test.Functional = test.Functional.fillna('Typ')
test.Functional = test.Functional.replace(['Typ', 'Min1', 'Min2', 'Mod', 'Maj1', 'Maj2', 'Sev', 'Sal'], [0, 1, 2, 3, 4, 5, 6, 7])
test.FireplaceQu = test.FireplaceQu.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
test.GarageType = test.GarageType.fillna('NA').replace(['2Types', 'Attchd', 'Basment', 'BuiltIn', 'CarPort', 'Detchd', 'NA'], [0, 1, 2, 3, 4, 5, 6])
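test.GarageYrBlt = test.GarageYrBlt.fillna(0)
# Row 1132: overwrite GarageYrBlt with the median (presumably an invalid year)
test.loc[1132, 'GarageYrBlt'] = test.GarageYrBlt.median()
test.GarageFinish = test.GarageFinish.fillna('NA').replace(['Fin', 'RFn', 'Unf', 'NA'], [0, 1, 2, 3])
# Row 1116: set GarageCars and GarageArea to 0 (presumably missing values)
test.loc[1116, 'GarageCars'] = 0
test.loc[1116, 'GarageArea'] = 0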
test.GarageQual = test.GarageQual.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
test.GarageCond = test.GarageCond.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
test.PavedDrive = test.PavedDrive.replace(['Y', 'P', 'N'], [0, 1, 2])
test.PoolQC = test.PoolQC.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'NA'], [0, 1, 2, 3, 4])
test.Fence = test.Fence.fillna('NA').replace(['GdPrv', 'MnPrv', 'GdWo', 'MnWw', 'NA'], [0, 1, 2, 3, 4])
test.MiscFeature = test.MiscFeature.fillna('NA').replace(['Elev', 'Gar2', 'Othr', 'Shed', 'TenC', 'NA'], [0, 1, 2, 3, 4, 5])
test.SaleType = test.SaleType.fillna('WD').replace(['WD', 'CWD', 'VWD', 'New', 'COD', 'Con', 'ConLw', 'ConLI', 'ConLD', 'Oth'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
test.SaleCondition = test.SaleCondition.replace(['Normal', 'Abnorml', 'AdjLand', 'Alloca', 'Family', 'Partial'], [0, 1, 2, 3, 4, 5])


train.head()
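Writing every mapping twice, once for train and once for test, makes it easy for the two copies to drift apart. A single mistyped code is enough to give the same category different numbers in the two sets. Here is a sketch of an alternative in which both frames share one dictionary per column. `ENCODINGS` and `encode` are hypothetical names, and only two columns are filled in.

```python
import pandas as pd

# One mapping per column, shared by train and test. Only two columns are
# shown here; the others would be added in the same way.
ENCODINGS = {
    "RoofStyle": {"Flat": 0, "Gable": 1, "Gambrel": 2, "Hip": 3, "Mansard": 4, "Shed": 5},
    "Alley": {"Grvl": 0, "Pave": 1, "NA": 2},
}


def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every column listed in ENCODINGS mapped to integers."""
    out = df.copy()
    for col, mapping in ENCODINGS.items():
        values = out[col]
        if "NA" in mapping:
            # For these columns a missing value means "no such feature"
            values = values.fillna("NA")
        # Series.map turns any value not in the mapping into NaN, so an
        # unexpected category shows up as a gap instead of staying a string
        out[col] = values.map(mapping)
    return out


train_demo = pd.DataFrame({"RoofStyle": ["Gable", "Hip", "Shed"], "Alley": [None, "Pave", "Grvl"]})
test_demo = pd.DataFrame({"RoofStyle": ["Shed", "Flat"], "Alley": ["Grvl", None]})
print(encode(train_demo))
print(encode(test_demo))
```

One design point: `Series.map` with a dict turns unknown categories into NaN, whereas `Series.replace` passes them through unchanged. As a result, a category that appears only in the test set becomes visible instead of silently remaining a string.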


Getting the data into shape


from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
scaler=MinMaxScaler()

# Split into features (x) and target (y)
y_train = train["SalePrice"].values

# Alternative: use every column except the target
# COLUMNS = list(train.columns)
# COLUMNS.remove('SalePrice')

# Keep only the columns used for training
COLUMNS = ['OverallQual', 'GrLivArea', 'GarageCars', 'GarageArea',
       'TotalBsmtSF', '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt',
       'YearRemodAdd', 'MasVnrArea', 'Fireplaces', 'BsmtFinSF1', 'Foundation',
       'LotFrontage', 'WoodDeckSF', '2ndFlrSF', 'OpenPorchSF', 'SaleCondition',
       'HalfBath']

x_train = train[COLUMNS].values

# Sanity check: both should be 20
assert len(COLUMNS) == x_train.shape[1] == 20

Only the top 20 columns selected in the previous post go into x_train.
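The 20 columns come from the correlation analysis in the previous (visualization) post. Assume the ranking was by Pearson correlation with SalePrice on the already-encoded training frame. This is an assumption; the exact procedure is in that post. Under that assumption, the list could be reproduced roughly as follows. `top_correlated` is a hypothetical helper, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd


def top_correlated(df: pd.DataFrame, target: str = "SalePrice", k: int = 20) -> list:
    """Return the k numeric columns with the highest Pearson correlation to target.

    The ranking uses the signed correlation, so strongly negative columns rank last.
    """
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.nlargest(k).index.tolist()


# Synthetic stand-in data: one strong, one weak, one unrelated, one negative column
rng = np.random.default_rng(0)
n = 200
price = rng.normal(200_000, 50_000, n)
demo = pd.DataFrame({
    "SalePrice": price,
    "Strong": 2 * price + rng.normal(0, 1_000, n),
    "Weak": price + rng.normal(0, 100_000, n),
    "Noise": rng.normal(0, 1, n),
    "Negative": -price + rng.normal(0, 1_000, n),
})
print(top_correlated(demo, k=2))
```

With the real data, `top_correlated(train, k=20)` would be called after the encoding step, because only numeric columns take part in the correlation.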


Building and training the model


import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Fully connected regressor: three ReLU hidden layers (40, 20, 10 units)
# and a single linear output unit for the price
model = Sequential()
model.add(Dense(40, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
# Compile model: Adam optimizer, mean squared error on the raw price
model.compile(optimizer='adam', loss='mean_squared_error')
epochs_hist = model.fit(x_train, y_train, epochs=500, batch_size=3, verbose=1)
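The scores in this post are Kaggle leaderboard values for this competition. That metric is the RMSE between the logarithm of the predicted price and the logarithm of the actual price, and lower is better. Computing the same number on a held-out part of the training data gives feedback without using up a submission. A minimal sketch follows. `rmse_log` is a hypothetical name, and clipping predictions at 1 is an assumption of this sketch, not part of the official definition.

```python
import numpy as np


def rmse_log(y_true, y_pred) -> float:
    """RMSE between the logs of predicted and actual prices.

    Predictions are clipped to at least 1 so that a zero or negative output
    from the network cannot break the logarithm (a choice made for this sketch).
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.clip(np.asarray(y_pred, dtype=float).ravel(), 1.0, None)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))


# A prediction that is off by a factor of e contributes an error of 1
print(rmse_log([100_000], [100_000 * np.e]))
```

To use it, split with `train_test_split(x_train, y_train, test_size=0.25, random_state=42)`, which is already imported above, and fit on the first part. Then pass the held-out prices and `model.predict(...)` on the held-out features to `rmse_log`. The `ravel` calls accept the `(n, 1)` shape Keras returns.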


Writing the results to CSV


predictions = model.predict(test[COLUMNS].values)

# model.predict returns shape (n, 1); flatten it before storing it as a column
origin_test["SalePrice"] = np.round(predictions.ravel()).astype(int)

# Write the submission file
origin_test[["Id", "SalePrice"]].to_csv("./data/house-prices-advanced-regression-techniques/out.csv", index=False)
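Kaggle expects exactly two columns, `Id` and `SalePrice`. Below is a small sketch that builds the frame from the raw `model.predict` output. `make_submission` is a hypothetical helper, and the prediction values are made up. The cast uses the built-in `int` because the `np.int` alias was deprecated in NumPy 1.20 and removed in 1.24.

```python
import numpy as np
import pandas as pd


def make_submission(ids, predictions) -> pd.DataFrame:
    """Build the Id/SalePrice frame for submission.

    predictions may have shape (n, 1), as model.predict returns for a single
    output unit; it is flattened and rounded to whole dollars.
    """
    prices = np.round(np.asarray(predictions, dtype=float).ravel()).astype(int)
    return pd.DataFrame({"Id": np.asarray(ids), "SalePrice": prices})


sub = make_submission([1461, 1462], np.array([[120000.4], [158999.6]]))
print(sub)
# sub.to_csv("out.csv", index=False) would then write the file
```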





Other things I tried



Layer sizes picked more or less at random


model = Sequential()
model.add(Dense(80, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(160, activation='relu'))
model.add(Dense(320, activation='relu'))
model.add(Dense(1))
model.compile(optimizer ='adam', loss = 'mean_squared_error')
model.fit(x_train, y_train, epochs = 1000, batch_size = 3, verbose = 1)

Score:0.621


Something a little more sensible


model = Sequential()
model.add(Dense(10, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(40, activation='relu'))
model.add(Dense(1))
model.compile(optimizer ='adam', loss = 'mean_squared_error')
model.fit(x_train, y_train, epochs = 1000, batch_size = 3, verbose = 1)

Score:0.54


Adding Dropout


model = Sequential()
model.add(Dense(10, input_dim=x_train.shape[1], activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(20, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(40, activation='relu'))
model.add(Dense(1))
model.compile(optimizer ='adam', loss = 'mean_squared_error')
model.fit(x_train, y_train, epochs = 1000, batch_size = 3, verbose = 1)

Score:0.84


Increasing the number of epochs


model = Sequential()
model.add(Dense(40, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
# Compile model
model.compile(optimizer ='adam', loss = 'mean_squared_error')
epochs_hist = model.fit(x_train, y_train, epochs = 10000, batch_size = 3, verbose = 1)

Score:0.62156
Not effective at all. For a dataset of this size, somewhere around 100 epochs seems to be enough.
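An alternative to setting the epoch count by trial and error is to hold out validation data and stop once the validation loss has not improved for a while. Below is a plain-Python sketch of that "patience" rule. It operates on a list of per-epoch validation losses, and the function name is hypothetical.

```python
def best_epoch_with_patience(val_losses, patience=10):
    """Return the index of the best epoch found before training would stop.

    Training "stops" once `patience` consecutive epochs fail to beat the best
    validation loss so far, which is roughly the rule keras.callbacks.EarlyStopping
    applies with its default min_delta of 0.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs: stop here
    return best_epoch


history = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.3]
print(best_epoch_with_patience(history, patience=3))
```

Keras provides the same idea as a callback, for example `model.fit(x_train, y_train, epochs=1000, batch_size=3, validation_split=0.2, callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)])`. The `patience=10` and `validation_split=0.2` values are arbitrary.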


Adding more layers


model = Sequential()
model.add(Dense(80, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(40, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
# Compile model
model.compile(optimizer ='adam', loss = 'mean_squared_error')
epochs_hist = model.fit(x_train, y_train, epochs = 1000, batch_size = 3, verbose = 1)

Score:0.62047
Not very effective.

It seems the only option left is to work on the data itself.


Training on only the top 20 features correlated with SalePrice


Up to this point I had been using every available column; from here on I narrow them down.

# Alternative: use every column except the target
# COLUMNS = list(train.columns)
# COLUMNS.remove('SalePrice')

# Keep only the columns used for training
COLUMNS = ['OverallQual', 'GrLivArea', 'GarageCars', 'GarageArea',
       'TotalBsmtSF', '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt',
       'YearRemodAdd', 'MasVnrArea', 'Fireplaces', 'BsmtFinSF1', 'Foundation',
       'LotFrontage', 'WoodDeckSF', '2ndFlrSF', 'OpenPorchSF', 'SaleCondition',
       'HalfBath']

Score: 0.19811
The score improved sharply in one go.
So blindly adding inputs or training for more epochs apparently does not help.


Training on only the top 10 features correlated with SalePrice


Score: 0.20771
Maybe a few more features would help?


Training on only the top 30 features correlated with SalePrice


Is any feature fine as long as its correlation isn't negative on the heatmap?
Score: 0.20771
Apparently not.


Scaling the data into the 0-1 range


from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

# Split into features (x) and target (y)
y_train = train["SalePrice"].values

# Keep only the columns used for training
COLUMNS = ['OverallQual', 'GrLivArea', 'GarageCars', 'GarageArea',
       'TotalBsmtSF', '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt',
       'YearRemodAdd', 'MasVnrArea', 'Fireplaces', 'BsmtFinSF1', 'Foundation',
       'LotFrontage', 'WoodDeckSF', '2ndFlrSF', 'OpenPorchSF', 'SaleCondition',
       'HalfBath']

# Scale each feature into the 0-1 range using the training data
x_train = train[COLUMNS].values
x_train = scaler.fit_transform(x_train)

# fit_transform refits the scaler on the test rows, so the test data gets its
# own min/max; scaler.transform(x_test) would reuse the training scale instead
x_test = test[COLUMNS].values
x_test = scaler.fit_transform(x_test)

# Hold-out split for local validation (x_val/y_val so it does not overwrite x_test)
# x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.25, random_state=42)

# Sanity check: both should be 20
assert len(COLUMNS) == x_train.shape[1] == 20

Score: 0.27746
It actually got worse.
Could it be because I had already done this manually beforehand?
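Calling `fit_transform` on the test set learns a second min/max from the test rows. As a result, the same raw value can end up with a different scaled value in train and test. The usual practice is to fit on the training data only and reuse those statistics for everything else. Below is a NumPy sketch of that approach; the helper names are hypothetical. It computes the same result as `MinMaxScaler().fit(x_train)` followed by `.transform(...)` with the default 0-1 range. Whether this mismatch explains the drop to 0.27746 was not tested here.

```python
import numpy as np


def fit_min_max(x_train):
    """Learn per-column minimum and range from the training rows only."""
    x_train = np.asarray(x_train, dtype=float)
    col_min = x_train.min(axis=0)
    col_range = x_train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant column: avoid dividing by zero
    return col_min, col_range


def apply_min_max(x, col_min, col_range):
    """Scale any rows with the training statistics; test rows may land
    slightly outside [0, 1], which is expected."""
    return (np.asarray(x, dtype=float) - col_min) / col_range


x_tr = np.array([[0, 10], [5, 20], [10, 30]])
x_te = np.array([[5, 40]])
col_min, col_range = fit_min_max(x_tr)
print(apply_min_max(x_te, col_min, col_range))  # [[0.5 1.5]]
```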


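There are still things I could try, such as computing new explanatory variables from the existing ones, but I'll stop here for now.
My knowledge is still far from sufficient, so I need to do something about that.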

Saturday, August 10, 2019