Score: 0.18858.
Overview:
https://trueman-developer.blogspot.com/2019/08/kaggle-house-prices.html
Data visualization:
https://trueman-developer.blogspot.com/2019/08/kaggle-house-prices_8.html
Other beginner-friendly competitions:
Titanic survival prediction
https://trueman-developer.blogspot.com/2019/07/keras.html
Handwritten digit recognition
https://trueman-developer.blogspot.com/2019/07/kaggle-digit-recognizer-keras.html
Sample code
Maybe I should have put this up as a Kaggle kernel?
https://github.com/ninomae-makoto/kaggle/blob/master/house-prices01.ipynb
Loading the data
```python
import pandas as pd
import numpy as np

# Data from https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
origin_train = pd.read_csv("./data/house-prices-advanced-regression-techniques/train.csv")
origin_test = pd.read_csv("./data/house-prices-advanced-regression-techniques/test.csv")
np.random.seed(666)
```
Dropping unneeded data, filling in missing values
```python
# Columns where NaN means "the house has no such feature": filled with an explicit 'NA' label
NA_COLUMNS = ['Alley', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1',
              'BsmtFinType2', 'FireplaceQu', 'GarageType', 'GarageFinish',
              'GarageQual', 'GarageCond', 'PoolQC', 'Fence', 'MiscFeature']

# Other gaps are filled with a typical label or 0
# (most of these columns only have missing values in test)
FILL_VALUES = {'MasVnrType': 'None', 'MasVnrArea': 0, 'Electrical': 'SBrkr',
               'Exterior1st': 'VinylSd', 'Exterior2nd': 'VinylSd',
               'KitchenQual': 'TA', 'Functional': 'Typ', 'SaleType': 'WD',
               'BsmtFinSF1': 0, 'BsmtFinSF2': 0, 'BsmtFullBath': 0,
               'BsmtHalfBath': 0, 'GarageYrBlt': 0}

# Numeric columns filled with the median of the same frame
MEDIAN_COLUMNS = ['LotFrontage', 'BsmtUnfSF', 'TotalBsmtSF']

QUALITY = ['Ex', 'Gd', 'TA', 'Fa', 'Po']
CONDITION = ['Artery', 'Feedr', 'Norm', 'RRNn', 'RRAn', 'PosN', 'PosA', 'RRNe', 'RRAe']
BSMT_FIN = ['GLQ', 'ALQ', 'BLQ', 'Rec', 'LwQ', 'Unf', 'NA']
EXTERIOR = ['AsbShng', 'AsphShn', 'BrkComm', 'BrkFace', 'CBlock', 'CemntBd', 'HdBoard',
            'ImStucc', 'MetalSd', 'Other', 'Plywood', 'PreCast', 'Stone', 'Stucco',
            'VinylSd', 'Wd Sdng', 'WdShing']

# Ordered labels per column: each label is replaced by its index in the list
CATEGORIES = {
    'MSSubClass': [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190],
    'MSZoning': ['A', 'C (all)', 'FV', 'I', 'RH', 'RL', 'RP', 'RM'],
    'Street': ['Grvl', 'Pave'],
    'Alley': ['Grvl', 'Pave', 'NA'],
    'LotShape': ['Reg', 'IR1', 'IR2', 'IR3'],
    'LandContour': ['Lvl', 'Bnk', 'HLS', 'Low'],
    'LotConfig': ['Inside', 'Corner', 'CulDSac', 'FR2', 'FR3'],
    'LandSlope': ['Gtl', 'Mod', 'Sev'],
    'Neighborhood': ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr',
                     'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel',
                     'Names', 'NoRidge', 'NPkVill', 'NridgHt', 'NWAmes', 'OldTown',
                     'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber',
                     'Veenker', 'NAmes'],
    'Condition1': CONDITION,
    'Condition2': CONDITION,
    'BldgType': ['1Fam', '2fmCon', 'Duplex', 'Twnhs', 'TwnhsE'],
    'HouseStyle': ['1Story', '1.5Fin', '1.5Unf', '2Story', '2.5Fin', '2.5Unf', 'SFoyer', 'SLvl'],
    'RoofStyle': ['Flat', 'Gable', 'Gambrel', 'Hip', 'Mansard', 'Shed'],
    'RoofMatl': ['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl'],
    'Exterior1st': EXTERIOR,
    'Exterior2nd': EXTERIOR + ['Wd Shng', 'CmentBd', 'Cmn'],
    'MasVnrType': ['BrkCmn', 'BrkFace', 'CBlock', 'None', 'Stone'],
    'ExterQual': QUALITY,
    'ExterCond': QUALITY,
    'Foundation': ['BrkTil', 'CBlock', 'PConc', 'Slab', 'Stone', 'Wood'],
    'BsmtQual': QUALITY + ['NA'],
    'BsmtCond': QUALITY + ['NA'],
    'BsmtExposure': ['Gd', 'Av', 'Mn', 'No', 'NA'],
    'BsmtFinType1': BSMT_FIN,
    'BsmtFinType2': BSMT_FIN,
    'Heating': ['Floor', 'GasA', 'GasW', 'Grav', 'OthW', 'Wall'],
    'HeatingQC': QUALITY,
    'CentralAir': ['N', 'Y'],
    'Electrical': ['SBrkr', 'FuseA', 'FuseF', 'FuseP', 'Mix'],
    'KitchenQual': QUALITY,
    'Functional': ['Typ', 'Min1', 'Min2', 'Mod', 'Maj1', 'Maj2', 'Sev', 'Sal'],
    'FireplaceQu': QUALITY + ['NA'],
    'GarageType': ['2Types', 'Attchd', 'Basment', 'BuiltIn', 'CarPort', 'Detchd', 'NA'],
    'GarageFinish': ['Fin', 'RFn', 'Unf', 'NA'],
    'GarageQual': QUALITY + ['NA'],
    'GarageCond': QUALITY + ['NA'],
    'PavedDrive': ['Y', 'P', 'N'],
    'PoolQC': ['Ex', 'Gd', 'TA', 'Fa', 'NA'],
    'Fence': ['GdPrv', 'MnPrv', 'GdWo', 'MnWw', 'NA'],
    'MiscFeature': ['Elev', 'Gar2', 'Othr', 'Shed', 'TenC', 'NA'],
    'SaleType': ['WD', 'CWD', 'VWD', 'New', 'COD', 'Con', 'ConLw', 'ConLI', 'ConLD', 'Oth'],
    'SaleCondition': ['Normal', 'Abnorml', 'AdjLand', 'Alloca', 'Family', 'Partial'],
}


def preprocess(df):
    df = df.copy()
    # Align the Exterior2nd spellings with Exterior1st before encoding
    df['Exterior2nd'] = df['Exterior2nd'].replace({'Brk Cmn': 'BrkComm', 'CmentBd': 'CemntBd'})
    df[NA_COLUMNS] = df[NA_COLUMNS].fillna('NA')
    df = df.fillna(FILL_VALUES)
    for col in MEDIAN_COLUMNS:
        df[col] = df[col].fillna(df[col].median())
    # Replace every label with its index (the same lists are used for train and test)
    for col, labels in CATEGORIES.items():
        df[col] = df[col].replace(dict(zip(labels, range(len(labels)))))
    return df.infer_objects()


# Id is only a row key and Utilities is almost constant, so both are dropped
train = preprocess(origin_train.drop(columns=['Id', 'Utilities']))
test = preprocess(origin_test.drop(columns=['Id', 'Utilities']))

# Row-specific fixes for the test set
test.loc[1132, 'GarageYrBlt'] = test.GarageYrBlt.median()
test.loc[1116, ['GarageCars', 'GarageArea']] = 0

train.head()
```
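Each replace() call above encodes a column by swapping every label for its position in a fixed, ordered list. Train and test are encoded separately, so the two lists must match exactly. Otherwise the same label ends up with different codes in the two sets.

Below is a minimal stdlib sketch of the idea: build each mapping once, then reuse it for both sets. `build_codes` and `encode` are hypothetical helper names, and None stands in for a pandas NaN.

```python
def build_codes(labels):
    # Map each label to its position in the ordered list: first label -> 0, ...
    return {label: code for code, label in enumerate(labels)}


def encode(values, codes, missing="NA"):
    # None stands in for NaN: turn it into the explicit "NA" label first
    filled = [missing if v is None else v for v in values]
    # Labels absent from the map pass through unchanged, like pandas' replace()
    return [codes.get(v, v) for v in filled]


# Build the mapping once and reuse it for both train and test
BSMT_QUAL = build_codes(["Ex", "Gd", "TA", "Fa", "Po", "NA"])

train_codes = encode(["Gd", None, "TA"], BSMT_QUAL)
test_codes = encode(["TA", "Ex", None], BSMT_QUAL)
print(train_codes, test_codes)  # [1, 5, 2] [2, 0, 5]
```

Labels missing from the list pass through unchanged, so a typo in a list shows up as a leftover string rather than as an error.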
Shaping the data for training
```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# Split into explanatory variables and the target variable
y_train = train["SalePrice"].values

# Exclude columns that are not used for training
# COLUMNS = list(train.columns)
# COLUMNS.remove('SalePrice')

# Extract only the columns used for training
COLUMNS = ['OverallQual', 'GrLivArea', 'GarageCars', 'GarageArea', 'TotalBsmtSF',
           '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt', 'YearRemodAdd',
           'MasVnrArea', 'Fireplaces', 'BsmtFinSF1', 'Foundation', 'LotFrontage',
           'WoodDeckSF', '2ndFlrSF', 'OpenPorchSF', 'SaleCondition', 'HalfBath']
x_train = train[COLUMNS].values

len(COLUMNS)
x_train.shape[1]
```
Only the top 20 features selected last time go into x_train.
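The post does not spell out how the top 20 were chosen, beyond their being the columns most related to SalePrice in the earlier visualization post. Below is a minimal sketch of one way to do it. It assumes Pearson correlation (the default of pandas' DataFrame.corr()) and ranks by absolute value, so strongly negative columns are kept too. `pearson` and `top_k_features` are hypothetical names, and the toy numbers are made up.

```python
import math


def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length numeric sequences
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sd_x * sd_y)


def top_k_features(columns, target, k):
    # columns: feature name -> list of values, aligned with target
    scores = {name: pearson(values, target) for name, values in columns.items()}
    # Rank by |r| so a strong negative correlation counts as much as a positive one
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:k]


# Made-up toy data, not from the competition
price = [100, 150, 200, 250, 300]
columns = {
    "GrLivArea": [50, 70, 100, 120, 150],  # rises with price
    "Age": [40, 30, 20, 10, 0],            # falls with price
    "Noise": [3, 1, 4, 1, 5],              # weakly related
}
print(top_k_features(columns, price, 2))  # ['Age', 'GrLivArea']
```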
Building and training the model
```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(40, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))  # single linear output for regression

# Compile model
model.compile(optimizer='adam', loss='mean_squared_error')
epochs_hist = model.fit(x_train, y_train, epochs=500, batch_size=3, verbose=1)
```
Writing the results to CSV
```python
predictions = model.predict(test[COLUMNS].values)
# predict() returns shape (n, 1); flatten it, and use the builtin int (np.int was removed from NumPy)
origin_test["SalePrice"] = np.round(predictions.ravel()).astype(int)

# Write the submission file
origin_test[["Id", "SalePrice"]].to_csv(
    "./data/house-prices-advanced-regression-techniques/out.csv", index=False)
```
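The Score values in this post are the competition's leaderboard metric. Kaggle defines it as the RMSE between the logarithm of the predicted price and the logarithm of the actual price, so lower is better.

To compare models without submitting, the same metric can be computed on a held-out part of train, for example via the train_test_split already imported. Below is a minimal stdlib sketch; `rmse_log` is a hypothetical name.

```python
import math


def rmse_log(predicted, actual):
    # RMSE between log(predicted) and log(actual); all prices must be positive
    squared = [(math.log(p) - math.log(a)) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# One prediction 10% too high, one 10% too low
print(round(rmse_log([110000, 90000], [100000, 100000]), 4))  # 0.1005
```

Because the metric works on logs, a 10% miss counts about the same for a cheap house as for an expensive one.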
Other things I tried
Picking layer sizes more or less arbitrarily
```python
model = Sequential()
model.add(Dense(80, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(160, activation='relu'))
model.add(Dense(320, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, epochs=1000, batch_size=3, verbose=1)
```
Score:0.621
Something a bit more sensible
```python
model = Sequential()
model.add(Dense(10, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(40, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, epochs=1000, batch_size=3, verbose=1)
```
Score:0.54
Adding Dropout
```python
model = Sequential()
model.add(Dense(10, input_dim=x_train.shape[1], activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(20, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(40, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, epochs=1000, batch_size=3, verbose=1)
```
Score:0.84
More epochs
```python
model = Sequential()
model.add(Dense(40, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
# Compile model
model.compile(optimizer='adam', loss='mean_squared_error')
epochs_hist = model.fit(x_train, y_train, epochs=10000, batch_size=3, verbose=1)
```
Score:0.62156
Not effective at all. With data of this size, somewhere around 100 epochs seems to be enough.
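Rather than guessing the epoch count from leaderboard scores, a common approach is to hold out part of the training data and stop once the validation loss stops improving. Keras provides this as keras.callbacks.EarlyStopping(monitor='val_loss', patience=..., restore_best_weights=True), passed to fit() through callbacks= together with validation_split=.

The stdlib sketch below only illustrates the stopping rule itself, applied to a list of per-epoch validation losses. `early_stopping_epoch` is a hypothetical name and the loss values are invented.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    # Return the 0-based epoch with the best validation loss seen before
    # stopping; training stops after `patience` epochs without an improvement
    # larger than min_delta
    best_epoch = 0
    best_loss = val_losses[0]
    wait = 0
    for epoch, loss in enumerate(val_losses[1:], start=1):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, wait = epoch, loss, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch


# Invented per-epoch validation losses: improving, then overfitting
losses = [10.0, 8.0, 6.0, 5.0, 5.5, 5.2, 6.0, 7.0, 8.0]
print(early_stopping_epoch(losses, patience=3))  # 3
```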
More layers
```python
model = Sequential()
model.add(Dense(80, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(40, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
# Compile model
model.compile(optimizer='adam', loss='mean_squared_error')
epochs_hist = model.fit(x_train, y_train, epochs=1000, batch_size=3, verbose=1)
```
Score:0.62047
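Not very effective.
It seems the only option left is to work on the data itself.
Training on only the 20 features most correlated with SalePrice
Until now I had been using every available column; from here on I narrow them down.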
```python
# Exclude columns that are not used for training
# COLUMNS = list(train.columns)
# COLUMNS.remove('SalePrice')

# Extract only the columns used for training
COLUMNS = ['OverallQual', 'GrLivArea', 'GarageCars', 'GarageArea', 'TotalBsmtSF',
           '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt', 'YearRemodAdd',
           'MasVnrArea', 'Fireplaces', 'BsmtFinSF1', 'Foundation', 'LotFrontage',
           'WoodDeckSF', '2ndFlrSF', 'OpenPorchSF', 'SaleCondition', 'HalfBath']
```
Score:0.19811
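A big improvement in one go.
In other words, blindly increasing the parameters or the number of training epochs apparently does nothing.
Training on only the 10 features most correlated with SalePrice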
Score:0.20771
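Would a few more be better?
Training on only the 30 features most correlated with SalePrice
Is it fine as long as the correlation doesn't come out negative on the heatmap?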
Score:0.20771
That turned out not to be the case.
Scaling the data into the 0-1 range
```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# Split into explanatory variables and the target variable
y_train = train["SalePrice"].values

# Extract only the columns used for training
COLUMNS = ['OverallQual', 'GrLivArea', 'GarageCars', 'GarageArea', 'TotalBsmtSF',
           '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt', 'YearRemodAdd',
           'MasVnrArea', 'Fireplaces', 'BsmtFinSF1', 'Foundation', 'LotFrontage',
           'WoodDeckSF', '2ndFlrSF', 'OpenPorchSF', 'SaleCondition', 'HalfBath']
x_train = train[COLUMNS].values
# Learn each column's min/max on the training data and scale it to 0-1
x_train = scaler.fit_transform(x_train)
x_test = test[COLUMNS].values
# Reuse the training min/max for the test data instead of refitting on test
x_test = scaler.transform(x_test)
# When predicting, pass x_test (not test[COLUMNS].values) so the inputs are scaled too
# x_train, x_test, y_train, y_test = train_test_split(x_train, y_train, test_size=0.25, random_state=42)

len(COLUMNS)
x_train.shape[1]
```
Score:0.27746
It actually got worse.
Maybe because I had already done this by hand beforehand?
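Whether or not scaling helps, the scaler's parameters should come from the training data only and be reused unchanged on the test data. That way a given raw value maps to the same scaled value in both sets. In scikit-learn this means fit_transform on x_train and transform on x_test.

Below is a minimal stdlib sketch of what MinMaxScaler learns and applies. The helper names are hypothetical and the numbers are toy values.

```python
def fit_min_max(rows):
    # Learn each column's minimum and maximum from the training rows only
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def transform_min_max(rows, mins, maxs):
    # Scale with (x - min) / (max - min) using the parameters passed in;
    # a constant column maps to 0.0 to avoid dividing by zero
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


train_rows = [[1, 100], [3, 300], [5, 500]]
test_rows = [[3, 600]]

mins, maxs = fit_min_max(train_rows)
print(transform_min_max(train_rows, mins, maxs))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
print(transform_min_max(test_rows, mins, maxs))   # [[0.5, 1.25]]
# Refitting on the test rows alone gives a different, meaningless scale
print(transform_min_max(test_rows, *fit_min_max(test_rows)))  # [[0.0, 0.0]]
```

Test values outside the training range simply land outside 0 to 1, which is expected and harmless.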
There are still things to try, such as computing additional explanatory variables, but I'll stop here for now.
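Computing extra explanatory variables usually means combining existing columns. As a purely illustrative example, total floor area could be computed from three columns the model already uses. The `TotalSF` feature is hypothetical and was not tested in this post. Below is a stdlib sketch on a single row.

```python
def add_total_sf(row):
    # Hypothetical derived feature: basement + first floor + second floor area
    new_row = dict(row)
    new_row["TotalSF"] = row["TotalBsmtSF"] + row["1stFlrSF"] + row["2ndFlrSF"]
    return new_row


house = {"TotalBsmtSF": 800, "1stFlrSF": 900, "2ndFlrSF": 700}
print(add_total_sf(house)["TotalSF"])  # 2400
```

With pandas the same thing is one line per frame: train['TotalSF'] = train['TotalBsmtSF'] + train['1stFlrSF'] + train['2ndFlrSF'] (and likewise for test).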
My knowledge is still far from sufficient, so I need to work on that.