Score: 0.18858.
An overview is here:
https://trueman-developer.blogspot.com/2019/08/kaggle-house-prices.html
Data visualization is here:
https://trueman-developer.blogspot.com/2019/08/kaggle-house-prices_8.html
Other beginner-friendly competitions:
Titanic survival prediction
https://trueman-developer.blogspot.com/2019/07/keras.html
Handwritten digit recognition
https://trueman-developer.blogspot.com/2019/07/kaggle-digit-recognizer-keras.html
Sample code
Maybe I should have published this as a Kaggle kernel?
https://github.com/ninomae-makoto/kaggle/blob/master/house-prices01.ipynb
Loading the data
import pandas as pd
import numpy as np

# Data from https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
origin_train = pd.read_csv("./data/house-prices-advanced-regression-techniques/train.csv")
origin_test = pd.read_csv("./data/house-prices-advanced-regression-techniques/test.csv")

np.random.seed(666)
Dropping unneeded columns and filling missing values
train = origin_train.copy()
test = origin_test.copy()

# Drop columns that are not used as features
del train['Id']
del train['Utilities']
del test['Id']
del test['Utilities']

# --- train: fill missing values and encode categories as integers ---
train.MSSubClass = train.MSSubClass.replace(
    [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
train.MSZoning = train.MSZoning.replace(['A', 'C (all)', 'FV', 'I', 'RH', 'RL', 'RP', 'RM'], [0, 1, 2, 3, 4, 5, 6, 7])
train.LotFrontage = train.LotFrontage.fillna(train.LotFrontage.median())
train.Street = train.Street.replace(['Grvl', 'Pave'], [0, 1])
train.Alley = train.Alley.fillna('NA').replace(['Grvl', 'Pave', 'NA'], [0, 1, 2])
train.LotShape = train.LotShape.replace(['Reg', 'IR1', 'IR2', 'IR3'], [0, 1, 2, 3])
train.LandContour = train.LandContour.replace(['Lvl', 'Bnk', 'HLS', 'Low'], [0, 1, 2, 3])
train.LotConfig = train.LotConfig.replace(['Inside', 'Corner', 'CulDSac', 'FR2', 'FR3'], [0, 1, 2, 3, 4])
train.LandSlope = train.LandSlope.replace(['Gtl', 'Mod', 'Sev'], [0, 1, 2])
train.Neighborhood = train.Neighborhood.replace(
    ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert',
     'IDOTRR', 'MeadowV', 'Mitchel', 'Names', 'NoRidge', 'NPkVill', 'NridgHt', 'NWAmes', 'OldTown',
     'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker', 'NAmes'],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25])
train.Condition1 = train.Condition1.replace(
    ['Artery', 'Feedr', 'Norm', 'RRNn', 'RRAn', 'PosN', 'PosA', 'RRNe', 'RRAe'], [0, 1, 2, 3, 4, 5, 6, 7, 8])
train.Condition2 = train.Condition2.replace(
    ['Artery', 'Feedr', 'Norm', 'RRNn', 'RRAn', 'PosN', 'PosA', 'RRNe', 'RRAe'], [0, 1, 2, 3, 4, 5, 6, 7, 8])
train.BldgType = train.BldgType.replace(['1Fam', '2fmCon', 'Duplex', 'Twnhs', 'TwnhsE'], [0, 1, 2, 3, 4])
train.HouseStyle = train.HouseStyle.replace(
    ['1Story', '1.5Fin', '1.5Unf', '2Story', '2.5Fin', '2.5Unf', 'SFoyer', 'SLvl'], [0, 1, 2, 3, 4, 5, 6, 7])
train.RoofStyle = train.RoofStyle.replace(['Flat', 'Gable', 'Gambrel', 'Hip', 'Mansard', 'Shed'], [0, 1, 2, 3, 4, 5])
train.RoofMatl = train.RoofMatl.replace(
    ['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl'], [0, 1, 2, 3, 4, 5, 6, 7])
train.Exterior1st = train.Exterior1st.replace(
    ['AsbShng', 'AsphShn', 'BrkComm', 'BrkFace', 'CBlock', 'CemntBd', 'HdBoard', 'ImStucc', 'MetalSd',
     'Other', 'Plywood', 'PreCast', 'Stone', 'Stucco', 'VinylSd', 'Wd Sdng', 'WdShing'],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
# Exterior2nd spells some values differently from Exterior1st; normalize them first
train.Exterior2nd = train.Exterior2nd.replace(['Brk Cmn', 'CmentBd'], ['BrkComm', 'CemntBd']).replace(
    ['AsbShng', 'AsphShn', 'BrkComm', 'BrkFace', 'CBlock', 'CemntBd', 'HdBoard', 'ImStucc', 'MetalSd',
     'Other', 'Plywood', 'PreCast', 'Stone', 'Stucco', 'VinylSd', 'Wd Sdng', 'WdShing', 'Wd Shng',
     'CmentBd', 'Cmn'],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
train.MasVnrType = train.MasVnrType.fillna('None').replace(['BrkCmn', 'BrkFace', 'CBlock', 'None', 'Stone'], [0, 1, 2, 3, 4])
train.MasVnrArea = train.MasVnrArea.fillna(0)
train.ExterQual = train.ExterQual.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
train.ExterCond = train.ExterCond.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
train.Foundation = train.Foundation.replace(['BrkTil', 'CBlock', 'PConc', 'Slab', 'Stone', 'Wood'], [0, 1, 2, 3, 4, 5])
train.BsmtQual = train.BsmtQual.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
train.BsmtCond = train.BsmtCond.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
train.BsmtExposure = train.BsmtExposure.fillna('NA').replace(['Gd', 'Av', 'Mn', 'No', 'NA'], [0, 1, 2, 3, 4])
train.BsmtFinType1 = train.BsmtFinType1.fillna('NA').replace(
    ['GLQ', 'ALQ', 'BLQ', 'Rec', 'LwQ', 'Unf', 'NA'], [0, 1, 2, 3, 4, 5, 6])
train.BsmtFinType2 = train.BsmtFinType2.fillna('NA').replace(
    ['GLQ', 'ALQ', 'BLQ', 'Rec', 'LwQ', 'Unf', 'NA'], [0, 1, 2, 3, 4, 5, 6])
train.Heating = train.Heating.replace(['Floor', 'GasA', 'GasW', 'Grav', 'OthW', 'Wall'], [0, 1, 2, 3, 4, 5])
train.HeatingQC = train.HeatingQC.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
train.CentralAir = train.CentralAir.replace(['N', 'Y'], [0, 1])
train.Electrical = train.Electrical.fillna('SBrkr')
train.Electrical = train.Electrical.replace(['SBrkr', 'FuseA', 'FuseF', 'FuseP', 'Mix'], [0, 1, 2, 3, 4])
train.KitchenQual = train.KitchenQual.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
train.Functional = train.Functional.replace(
    ['Typ', 'Min1', 'Min2', 'Mod', 'Maj1', 'Maj2', 'Sev', 'Sal'], [0, 1, 2, 3, 4, 5, 6, 7])
train.FireplaceQu = train.FireplaceQu.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
train.GarageType = train.GarageType.fillna('NA').replace(
    ['2Types', 'Attchd', 'Basment', 'BuiltIn', 'CarPort', 'Detchd', 'NA'], [0, 1, 2, 3, 4, 5, 6])
train.GarageYrBlt = train.GarageYrBlt.fillna(0)
train.GarageFinish = train.GarageFinish.fillna('NA').replace(['Fin', 'RFn', 'Unf', 'NA'], [0, 1, 2, 3])
train.GarageQual = train.GarageQual.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
train.GarageCond = train.GarageCond.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
train.PavedDrive = train.PavedDrive.replace(['Y', 'P', 'N'], [0, 1, 2])
train.PoolQC = train.PoolQC.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'NA'], [0, 1, 2, 3, 4])
train.Fence = train.Fence.fillna('NA').replace(['GdPrv', 'MnPrv', 'GdWo', 'MnWw', 'NA'], [0, 1, 2, 3, 4])
train.MiscFeature = train.MiscFeature.fillna('NA').replace(['Elev', 'Gar2', 'Othr', 'Shed', 'TenC', 'NA'], [0, 1, 2, 3, 4, 5])
train.SaleType = train.SaleType.replace(
    ['WD', 'CWD', 'VWD', 'New', 'COD', 'Con', 'ConLw', 'ConLI', 'ConLD', 'Oth'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
train.SaleCondition = train.SaleCondition.replace(
    ['Normal', 'Abnorml', 'AdjLand', 'Alloca', 'Family', 'Partial'], [0, 1, 2, 3, 4, 5])

# --- test: same encoding, plus fills for values that are missing only in test ---
test.MSSubClass = test.MSSubClass.replace(
    [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
test.MSZoning = test.MSZoning.replace(['A', 'C (all)', 'FV', 'I', 'RH', 'RL', 'RP', 'RM'], [0, 1, 2, 3, 4, 5, 6, 7])
test.LotFrontage = test.LotFrontage.fillna(test.LotFrontage.median())
test.Street = test.Street.replace(['Grvl', 'Pave'], [0, 1])
test.Alley = test.Alley.fillna('NA').replace(['Grvl', 'Pave', 'NA'], [0, 1, 2])
test.LotShape = test.LotShape.replace(['Reg', 'IR1', 'IR2', 'IR3'], [0, 1, 2, 3])
test.LandContour = test.LandContour.replace(['Lvl', 'Bnk', 'HLS', 'Low'], [0, 1, 2, 3])
test.LotConfig = test.LotConfig.replace(['Inside', 'Corner', 'CulDSac', 'FR2', 'FR3'], [0, 1, 2, 3, 4])
test.LandSlope = test.LandSlope.replace(['Gtl', 'Mod', 'Sev'], [0, 1, 2])
test.Neighborhood = test.Neighborhood.replace(
    ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert',
     'IDOTRR', 'MeadowV', 'Mitchel', 'Names', 'NoRidge', 'NPkVill', 'NridgHt', 'NWAmes', 'OldTown',
     'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker', 'NAmes'],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25])
test.Condition1 = test.Condition1.replace(
    ['Artery', 'Feedr', 'Norm', 'RRNn', 'RRAn', 'PosN', 'PosA', 'RRNe', 'RRAe'], [0, 1, 2, 3, 4, 5, 6, 7, 8])
test.Condition2 = test.Condition2.replace(
    ['Artery', 'Feedr', 'Norm', 'RRNn', 'RRAn', 'PosN', 'PosA', 'RRNe', 'RRAe'], [0, 1, 2, 3, 4, 5, 6, 7, 8])
test.BldgType = test.BldgType.replace(['1Fam', '2fmCon', 'Duplex', 'Twnhs', 'TwnhsE'], [0, 1, 2, 3, 4])
test.HouseStyle = test.HouseStyle.replace(
    ['1Story', '1.5Fin', '1.5Unf', '2Story', '2.5Fin', '2.5Unf', 'SFoyer', 'SLvl'], [0, 1, 2, 3, 4, 5, 6, 7])
# 'Shed' is mapped to 5 so that it matches the train encoding (the original test mapping used 6)
test.RoofStyle = test.RoofStyle.replace(['Flat', 'Gable', 'Gambrel', 'Hip', 'Mansard', 'Shed'], [0, 1, 2, 3, 4, 5])
test.RoofMatl = test.RoofMatl.replace(
    ['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl'], [0, 1, 2, 3, 4, 5, 6, 7])
test.Exterior1st = test.Exterior1st.fillna('VinylSd').replace(
    ['AsbShng', 'AsphShn', 'BrkComm', 'BrkFace', 'CBlock', 'CemntBd', 'HdBoard', 'ImStucc', 'MetalSd',
     'Other', 'Plywood', 'PreCast', 'Stone', 'Stucco', 'VinylSd', 'Wd Sdng', 'WdShing'],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
test.Exterior2nd = test.Exterior2nd.replace(['Brk Cmn', 'CmentBd'], ['BrkComm', 'CemntBd']).fillna('VinylSd').replace(
    ['AsbShng', 'AsphShn', 'BrkComm', 'BrkFace', 'CBlock', 'CemntBd', 'HdBoard', 'ImStucc', 'MetalSd',
     'Other', 'Plywood', 'PreCast', 'Stone', 'Stucco', 'VinylSd', 'Wd Sdng', 'WdShing', 'Wd Shng',
     'CmentBd', 'Cmn'],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
test.MasVnrType = test.MasVnrType.fillna('None').replace(['BrkCmn', 'BrkFace', 'CBlock', 'None', 'Stone'], [0, 1, 2, 3, 4])
test.MasVnrArea = test.MasVnrArea.fillna(0)
test.ExterQual = test.ExterQual.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
test.ExterCond = test.ExterCond.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
test.Foundation = test.Foundation.replace(['BrkTil', 'CBlock', 'PConc', 'Slab', 'Stone', 'Wood'], [0, 1, 2, 3, 4, 5])
test.BsmtQual = test.BsmtQual.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
test.BsmtCond = test.BsmtCond.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
test.BsmtExposure = test.BsmtExposure.fillna('NA').replace(['Gd', 'Av', 'Mn', 'No', 'NA'], [0, 1, 2, 3, 4])
test.BsmtFinType1 = test.BsmtFinType1.fillna('NA').replace(
    ['GLQ', 'ALQ', 'BLQ', 'Rec', 'LwQ', 'Unf', 'NA'], [0, 1, 2, 3, 4, 5, 6])
test.BsmtFinSF1 = test.BsmtFinSF1.fillna(0)
test.BsmtFinType2 = test.BsmtFinType2.fillna('NA').replace(
    ['GLQ', 'ALQ', 'BLQ', 'Rec', 'LwQ', 'Unf', 'NA'], [0, 1, 2, 3, 4, 5, 6])
test.BsmtFinSF2 = test.BsmtFinSF2.fillna(0)
test.BsmtUnfSF = test.BsmtUnfSF.fillna(test.BsmtUnfSF.median())
test.TotalBsmtSF = test.TotalBsmtSF.fillna(test.TotalBsmtSF.median())
test.Heating = test.Heating.replace(['Floor', 'GasA', 'GasW', 'Grav', 'OthW', 'Wall'], [0, 1, 2, 3, 4, 5])
test.HeatingQC = test.HeatingQC.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
test.CentralAir = test.CentralAir.replace(['N', 'Y'], [0, 1])
test.Electrical = test.Electrical.replace(['SBrkr', 'FuseA', 'FuseF', 'FuseP', 'Mix'], [0, 1, 2, 3, 4])
test.BsmtFullBath = test.BsmtFullBath.fillna(0)
test.BsmtHalfBath = test.BsmtHalfBath.fillna(0)
test.KitchenQual = test.KitchenQual.fillna('TA')
test.KitchenQual = test.KitchenQual.replace(['Ex', 'Gd', 'TA', 'Fa', 'Po'], [0, 1, 2, 3, 4])
test.Functional = test.Functional.fillna('Typ')
test.Functional = test.Functional.replace(
    ['Typ', 'Min1', 'Min2', 'Mod', 'Maj1', 'Maj2', 'Sev', 'Sal'], [0, 1, 2, 3, 4, 5, 6, 7])
test.FireplaceQu = test.FireplaceQu.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
test.GarageType = test.GarageType.fillna('NA').replace(
    ['2Types', 'Attchd', 'Basment', 'BuiltIn', 'CarPort', 'Detchd', 'NA'], [0, 1, 2, 3, 4, 5, 6])
test.GarageYrBlt = test.GarageYrBlt.fillna(0)
# Row 1132: overwrite GarageYrBlt with the column median (.loc avoids chained assignment)
test.loc[1132, 'GarageYrBlt'] = test.GarageYrBlt.median()
test.GarageFinish = test.GarageFinish.fillna('NA').replace(['Fin', 'RFn', 'Unf', 'NA'], [0, 1, 2, 3])
# Row 1116: set the garage size to 0
test.loc[1116, 'GarageCars'] = 0
test.loc[1116, 'GarageArea'] = 0
test.GarageQual = test.GarageQual.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
test.GarageCond = test.GarageCond.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'Po', 'NA'], [0, 1, 2, 3, 4, 5])
test.PavedDrive = test.PavedDrive.replace(['Y', 'P', 'N'], [0, 1, 2])
test.PoolQC = test.PoolQC.fillna('NA').replace(['Ex', 'Gd', 'TA', 'Fa', 'NA'], [0, 1, 2, 3, 4])
test.Fence = test.Fence.fillna('NA').replace(['GdPrv', 'MnPrv', 'GdWo', 'MnWw', 'NA'], [0, 1, 2, 3, 4])
test.MiscFeature = test.MiscFeature.fillna('NA').replace(['Elev', 'Gar2', 'Othr', 'Shed', 'TenC', 'NA'], [0, 1, 2, 3, 4, 5])
test.SaleType = test.SaleType.fillna('WD').replace(
    ['WD', 'CWD', 'VWD', 'New', 'COD', 'Con', 'ConLw', 'ConLI', 'ConLD', 'Oth'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
test.SaleCondition = test.SaleCondition.replace(
    ['Normal', 'Abnorml', 'AdjLand', 'Alloca', 'Family', 'Partial'], [0, 1, 2, 3, 4, 5])

# Check the result (notebook cell output)
train.head()
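Writing every mapping twice, once for train and once for test, makes it easy for the two to drift apart. One possible alternative, shown here only as a sketch and not as the notebook's actual code, is to keep one dictionary per column and apply it to both frames. The names `ORDINAL_MAPS`, `FILL_NA`, and `encode` are hypothetical, and the tiny frames stand in for the real CSV files. Note that `Series.map` turns unlisted values into NaN, whereas `replace` leaves them unchanged.

```python
import pandas as pd

# Hypothetical mapping table: one dict per column, shared by train and test.
ORDINAL_MAPS = {
    "RoofStyle": {"Flat": 0, "Gable": 1, "Gambrel": 2, "Hip": 3, "Mansard": 4, "Shed": 5},
    "CentralAir": {"N": 0, "Y": 1},
    "PoolQC": {"Ex": 0, "Gd": 1, "TA": 2, "Fa": 3, "NA": 4},
}
# Columns whose missing values are filled with a placeholder before encoding.
FILL_NA = {"PoolQC": "NA"}


def encode(df, maps=ORDINAL_MAPS, fill=FILL_NA):
    """Return a copy of df with the listed columns replaced by integer codes."""
    out = df.copy()
    for col, mapping in maps.items():
        values = out[col]
        if col in fill:
            values = values.fillna(fill[col])
        out[col] = values.map(mapping)
    return out


# Tiny made-up frames standing in for train.csv and test.csv.
demo_train = pd.DataFrame(
    {"RoofStyle": ["Gable", "Shed"], "CentralAir": ["Y", "N"], "PoolQC": [None, "Gd"]})
demo_test = pd.DataFrame(
    {"RoofStyle": ["Shed", "Hip"], "CentralAir": ["Y", "Y"], "PoolQC": ["Ex", None]})

enc_train = encode(demo_train)
enc_test = encode(demo_test)
```

Because both frames go through the same table, a category such as 'Shed' cannot end up with different codes in train and test.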
Putting the data into the right format
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# Split into explanatory variables and the target variable
y_train = train["SalePrice"].values

# Use every column except the target
# COLUMNS = list(train.columns)
# COLUMNS.remove('SalePrice')

# Use only the selected columns
COLUMNS = ['OverallQual', 'GrLivArea', 'GarageCars', 'GarageArea', 'TotalBsmtSF',
           '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt', 'YearRemodAdd',
           'MasVnrArea', 'Fireplaces', 'BsmtFinSF1', 'Foundation', 'LotFrontage',
           'WoodDeckSF', '2ndFlrSF', 'OpenPorchSF', 'SaleCondition', 'HalfBath']
x_train = train[COLUMNS].values

len(COLUMNS)
x_train.shape[1]
Only the top 20 features selected in the previous post go into x_train.
Building and training the model
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(40, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))  # single linear output for the predicted price

# Compile model
model.compile(optimizer='adam', loss='mean_squared_error')

epochs_hist = model.fit(x_train, y_train, epochs=500, batch_size=3, verbose=1)
Writing the results to CSV
predictions = model.predict(test[COLUMNS].values)
# predict() returns an array of shape (n, 1); flatten it before storing it as a column
origin_test["SalePrice"] = np.round(predictions).astype(int).ravel()

# Write the submission file
origin_test[["Id", "SalePrice"]].to_csv("./data/house-prices-advanced-regression-techniques/out.csv", index=False)
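The scores quoted in this post are Kaggle leaderboard scores. This competition is evaluated with the root mean squared error between the logarithm of the predicted price and the logarithm of the actual price, so lower is better. A minimal sketch for estimating that value locally, assuming part of the training data is held out for validation; `rmsle` is a hypothetical helper name and the prices are made up.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root mean squared error between log(actual) and log(predicted) prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))


# Made-up prices, for illustration only.
actual = [200000, 150000, 320000]
predicted = [210000, 140000, 300000]
score = rmsle(actual, predicted)
```

The logarithm is undefined for zero or negative values, so predictions would need to be clipped to a positive minimum before calling this if the network can output such values.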
Other things I tried
Arbitrary settings
model = Sequential()
model.add(Dense(80, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(160, activation='relu'))
model.add(Dense(320, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, epochs=1000, batch_size=3, verbose=1)
Score:0.621
A slightly better attempt
model = Sequential()
model.add(Dense(10, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(40, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, epochs=1000, batch_size=3, verbose=1)
Score:0.54
Adding Dropout
model = Sequential()
model.add(Dense(10, input_dim=x_train.shape[1], activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(20, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(40, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, epochs=1000, batch_size=3, verbose=1)
Score:0.84
Increasing epochs
model = Sequential()
model.add(Dense(40, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))

# Compile model
model.compile(optimizer='adam', loss='mean_squared_error')

epochs_hist = model.fit(x_train, y_train, epochs=10000, batch_size=3, verbose=1)
Score:0.62156
Not effective at all. For a dataset of this size, roughly 100 epochs or so seems to be enough.
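Rather than guessing the epoch count, training can be stopped once a validation loss stops improving. Keras provides an `EarlyStopping` callback with a `patience` parameter for this; the notebook does not use it, so the following is only a framework-free sketch of the patience rule. `stop_epoch` is a hypothetical name and the losses are made up.

```python
def stop_epoch(val_losses, patience=3):
    """Return (epoch where training stops, best epoch), both 1-based.

    Training stops once `patience` epochs in a row fail to improve on the best loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch


# Made-up validation losses: improvement stalls after epoch 4.
losses = [0.90, 0.60, 0.45, 0.40, 0.41, 0.43, 0.42, 0.44]
stopped_at, best_at = stop_epoch(losses, patience=3)
```

With these numbers the rule stops at epoch 7 and reports epoch 4 as the best one.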
Adding layers
model = Sequential()
model.add(Dense(80, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(40, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))

# Compile model
model.compile(optimizer='adam', loss='mean_squared_error')

epochs_hist = model.fit(x_train, y_train, epochs=1000, batch_size=3, verbose=1)
Score:0.62047
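Not very effective.
It looks like the only option left is to work on the data.
Training on only the top 20 features related to SalePrice
Up to this point I had been feeding in every available column; from here on I narrow them down.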
# Use every column except the target
# COLUMNS = list(train.columns)
# COLUMNS.remove('SalePrice')

# Use only the selected columns
COLUMNS = ['OverallQual', 'GrLivArea', 'GarageCars', 'GarageArea', 'TotalBsmtSF',
           '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt', 'YearRemodAdd',
           'MasVnrArea', 'Fireplaces', 'BsmtFinSF1', 'Foundation', 'LotFrontage',
           'WoodDeckSF', '2ndFlrSF', 'OpenPorchSF', 'SaleCondition', 'HalfBath']
Score:0.19811
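A big improvement in one step (lower scores are better here).
In other words, blindly adding more inputs or more training epochs does not seem to help.
Training on only the top 10 features related to SalePrice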
Score:0.20771
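Maybe a few more features would be better?
Training on only the top 30 features related to SalePrice
Is a feature fine to include as long as its correlation in the heatmap is not negative?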
Score:0.20771
That turned out not to be the case either.
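On the sign of the correlation: a strongly negative correlation carries as much information as a strongly positive one, so ranking by absolute value is a common choice. The exact selection procedure used for the lists above is in the earlier visualization post and is not reproduced here; the following is a sketch under that assumption, with a hypothetical helper `top_k_by_abs_corr` and a made-up frame.

```python
import pandas as pd


def top_k_by_abs_corr(df, target, k):
    """Return the k numeric columns with the largest |Pearson correlation| to target."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False).head(k).index.tolist()


# Made-up data: Quality rises with price, Age falls with price, Noise is weakly related.
demo = pd.DataFrame({
    "SalePrice": [100, 150, 200, 250, 300],
    "Quality": [1, 2, 3, 4, 5],
    "Age": [50, 40, 30, 20, 10],
    "Noise": [3, 1, 4, 1, 5],
})
selected = top_k_by_abs_corr(demo, "SalePrice", 2)
```

Here both `Quality` and `Age` are selected, even though the correlation of `Age` with the price is negative.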
Scaling the data into the 0 to 1 range
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# Split into explanatory variables and the target variable
y_train = train["SalePrice"].values

# Use only the selected columns
COLUMNS = ['OverallQual', 'GrLivArea', 'GarageCars', 'GarageArea', 'TotalBsmtSF',
           '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt', 'YearRemodAdd',
           'MasVnrArea', 'Fireplaces', 'BsmtFinSF1', 'Foundation', 'LotFrontage',
           'WoodDeckSF', '2ndFlrSF', 'OpenPorchSF', 'SaleCondition', 'HalfBath']
x_train = train[COLUMNS].values
x_train = scaler.fit_transform(x_train)

x_test = test[COLUMNS].values
# Reuse the scaling learned from the training data.
# Note: the run that produced the score below called fit_transform on x_test as well.
x_test = scaler.transform(x_test)

# x_train, x_test, y_train, y_test = train_test_split(x_train, y_train, test_size=0.25, random_state=42)

len(COLUMNS)
x_train.shape[1]
Score:0.27746
If anything, it got worse.
Perhaps because I had already converted the values by hand beforehand?
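One thing worth checking with this kind of scaling is that the test features go through the same transform that was learned from the training features. If the test set is scaled with its own minimum and maximum, the same raw value maps to different inputs in train and test. Whether that accounts for the worse score here is not established. The following is a plain NumPy sketch of the idea; `fit_min_max` and `apply_min_max` are hypothetical names and the arrays are made up.

```python
import numpy as np


def fit_min_max(x):
    """Learn the per-column minimum and range from the training matrix."""
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return lo, span


def apply_min_max(x, lo, span):
    """Scale x using statistics learned elsewhere (normally from the training data)."""
    return (np.asarray(x, dtype=float) - lo) / span


x_tr = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
x_te = np.array([[5.0, 40.0]])

lo, span = fit_min_max(x_tr)
x_tr_scaled = apply_min_max(x_tr, lo, span)
x_te_scaled = apply_min_max(x_te, lo, span)
```

A test value outside the training range, such as 40.0 in the second column, lands outside 0 to 1 (here 1.5), which is expected with this approach.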
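There are still moves left, such as deriving additional explanatory variables from the existing ones, but I will stop here for now.
My knowledge is nowhere near sufficient, so I need to do something about that.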