Appendix B: Agricultural Insurance Loss and Relationships to Climate across the Inland Pacific Northwest Region of the United States

Appendix B Summary Statement: Agricultural Insurance Loss and Principle Components Analysis

Appendix B documents the exploratory data analysis process, examining the relationships of agricultural commodity loss, at a county level, from 1989-2015, across the three state region of the Pacific Northwest (Washington, Idaho, and Oregon), and then focuses in on the 24 county region of the Inland Pacific Northwest (IPNW).

Here we perform a variety of Principle Components Analyses (PCA) to better understand the combined effects of differing damage causes, commodities, counties, and years on overall loss.

DATA PREPARATION

Step 1. Data Preparation. Here we load our agricultural insurance loss data and prepare it for PCA Analysis.

Step 2. Data Preparation. Original Insurance Loss Dataset - Pacific Northwest

Step 3. Data Preparation. Aggregated Insurance Loss Dataset - Pacific Northwest

Step 4. 24-county inland Pacific Northwest (iPNW) Study Area.

Pacific Northwest PCA

Step 5. PCA: PNW insurance loss, by COUNTY with COMMODITY loadings: 2001 to 2015

Step 6. PCA: PNW insurance loss, by YEAR with DAMAGE CAUSE loadings: 2001 to 2015

Step 7. PCA: PNW insurance loss, by MONTH with DAMAGE CAUSE loadings: 2001 to 2015

Step 8. PCA: PNW WHEAT insurance loss, by YEAR with DAMAGE CAUSE loadings: 2001 to 2015

Step 9. PCA: PNW WHEAT insurance loss, by COUNTY with DAMAGE CAUSE loadings: 2001 to 2015

Step 10. PCA: PNW APPLES insurance loss, by COUNTY with DAMAGE CAUSE loadings: 2001 to 2015

Step 11. PCA: PNW BARLEY insurance loss, by COUNTY with DAMAGE CAUSE loadings: 2001 to 2015

Inland Pacific Northwest PCA

Step 12. PCA: IPNW insurance loss by COUNTY with DAMAGE CAUSE loadings: 2001 to 2015: ALL COMMODITIES

Step 13. PCA: IPNW WHEAT insurance loss, by COUNTY with DAMAGE CAUSE loadings: 2001 to 2015: KMEANS CLUSTERING

Step 14: PCA: IPNW WHEAT insurance loss by COUNTY with DAMAGE CAUSE loadings: PC1 and PC2 table

Step 15. PCA: IPNW WHEAT loss by COUNTY with DAMAGE CAUSE loadings: Spatial Mapping of PC1 and PC2

Step 16. PCA: IPNW WHEAT insurance loss, by YEAR with DAMAGE CAUSE loadings: 2001 to 2015: KMEANS CLUSTERING

Step 17. PCA: IPNW WHEAT insurance loss, by YEAR with DAMAGE CAUSE loadings: 2001 to 2015: PC1 vs PC2 barplot

Step 18: PCA: IPNW WHEAT insurance loss by YEAR with DAMAGE CAUSE loadings: PC1 and PC2 table

Step 19: PCA: EXPERIMENTAL PCA + KMEANS: IPNW WHEAT insurance loss by COUNTY with DAMAGE CAUSE loadings: 2001 TO 2015

Step 1: Data Preparation

In order to generate any of the following tables and graphs, we download all data locally to your /tmp/seamon/seamon directory, from the seamon_dissertation_dataload.R file from the following location:

http://github.com/erichseamon/seamon_dissertation/data/

Then you will be able to re-generate any of the provided Rmarkdown files from:

http://github.com/erichseamon/seamon_dissertation/appendices/

the dataload.R file has a multitude of datasets, including:
* climatology summary data by county for the study area examined (inland Pacific Northwest);
* original RMA crop insurance data;
* state polygon shapefile;
* consumer price indexing;
* wheat yields; and
* wheat pricing

usePackage("knitr")
usePackage("tab")
usePackage("car")
usePackage("RCurl")
usePackage("lme4")
usePackage("ez")
usePackage("lattice")
usePackage("ggplot2")
usePackage("coefplot2")
usePackage("broom")
usePackage("htmlTable")
usePackage("gridExtra")
usePackage("kableExtra")
usePackage("repmis")



#options(scipen=999)


#------add crop insurance data

#URL <- "https://raw.githubusercontent.com/erichseamon/seamon_dissertation/master/data/RMA_originaldata/RMA_originaldata_txt.zip"
#destfile <- "/tmp/seamon/RMA_originaldata_txt.zip"
#download.file(URL, destfile)
#outDir<-"/tmp/seamon/RMA_originaldata/"
#unzip(destfile,exdir=outDir)


rma1989 <- read.csv("/tmp/seamon/RMA_originaldata/1989.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma1990 <- read.csv("/tmp/seamon/RMA_originaldata/1990.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma1991 <- read.csv("/tmp/seamon/RMA_originaldata/1991.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma1992 <- read.csv("/tmp/seamon/RMA_originaldata/1992.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma1993 <- read.csv("/tmp/seamon/RMA_originaldata/1993.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma1994 <- read.csv("/tmp/seamon/RMA_originaldata/1994.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma1995 <- read.csv("/tmp/seamon/RMA_originaldata/1995.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma1996 <- read.csv("/tmp/seamon/RMA_originaldata/1996.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma1997 <- read.csv("/tmp/seamon/RMA_originaldata/1997.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma1998 <- read.csv("/tmp/seamon/RMA_originaldata/1998.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma1999 <- read.csv("/tmp/seamon/RMA_originaldata/1999.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2000 <- read.csv("/tmp/seamon/RMA_originaldata/2000.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2001 <- read.csv("/tmp/seamon/RMA_originaldata/2001.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2002 <- read.csv("/tmp/seamon/RMA_originaldata/2002.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2003 <- read.csv("/tmp/seamon/RMA_originaldata/2003.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2004 <- read.csv("/tmp/seamon/RMA_originaldata/2004.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2005 <- read.csv("/tmp/seamon/RMA_originaldata/2005.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2006 <- read.csv("/tmp/seamon/RMA_originaldata/2006.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2007 <- read.csv("/tmp/seamon/RMA_originaldata/2007.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2008 <- read.csv("/tmp/seamon/RMA_originaldata/2008.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2009 <- read.csv("/tmp/seamon/RMA_originaldata/2009.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2010 <- read.csv("/tmp/seamon/RMA_originaldata/2010.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2011 <- read.csv("/tmp/seamon/RMA_originaldata/2011.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2012 <- read.csv("/tmp/seamon/RMA_originaldata/2012.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2013 <- read.csv("/tmp/seamon/RMA_originaldata/2013.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2014 <- read.csv("/tmp/seamon/RMA_originaldata/2014.txt", sep = "|", header = FALSE, strip.white = TRUE)
rma2015 <- read.csv("/tmp/seamon/RMA_originaldata/2015.txt", sep = "|", header = FALSE, strip.white = TRUE)


#load individual files from 1989 to 2000
RMA_1989 <- rbind(rma1989, rma1990, rma1991, rma1992, rma1993, rma1994, rma1995, rma1996, rma1997, rma1998, rma1999, rma2000)

#filter columns
RMA_1989 <- data.frame(RMA_1989[,1], RMA_1989[,3], RMA_1989[,5], RMA_1989[,7], RMA_1989[,12], RMA_1989[,13], RMA_1989[,14], RMA_1989[,15])

#set column names
colnames(RMA_1989) <- c("year", "state", "county", "commodity", "damagecause", "monthcode", "month", "loss")

#load individual files from 2001 to 2015
RMA <- rbind(rma2001, rma2002, rma2003, rma2004, rma2005, rma2006, rma2007, rma2008, rma2009, rma2010, rma2011, rma2012, rma2013, rma2014, rma2015)

#filter columns
RMA <- data.frame(RMA[,1], RMA[,3], RMA[,5], RMA[,7], RMA[,12], RMA[,13], RMA[,14], RMA[,15], RMA[,16])

#set column names
colnames(RMA) <- c("year", "state", "county", "commodity", "damagecause", "monthcode", "month", "acres", "loss")

#subset 2001 to 2015 for ID, WA, and OR
RMA_PNW <- subset(RMA, state == "WA" | state == "OR" | state == "ID" )

#subset 1989 to 2000 for ID, WA, and OR
RMA_PNW_1989 <- subset(RMA_1989, state == "WA" | state == "OR" | state == "ID" )

#calculate loss per acre
RMA_PNW$lossperacre <- RMA_PNW$loss / RMA_PNW$acres

#remove records with missing months
RMA_PNW <- subset(RMA_PNW, month != "")

#set a crop year column (i.e. water year)
RMA_PNW$cropyear <- RMA_PNW$year

#make oct/nov/dec part of the crop year
RMA_PNW$cropyear[RMA_PNW$monthcode >= 10] <- RMA_PNW$year + 1

#remove 2016
RMA_PNW <- subset(RMA_PNW, year != 2016)


#aggregate by year/state/county/commodity, for total loss in $
RMA_commodity_loss <- aggregate(RMA$loss, by = list(RMA$year, RMA$state, RMA$county, RMA$commodity), FUN = sum)
colnames(RMA_commodity_loss) <- c("year", "state", "county", "commodity", "loss")
RMA_commodity_loss <- subset(RMA_commodity_loss, commodity != "ADJUSTED GROSS REVENUE")

#aggregate by year/state/county/commodity for total number of claims
RMA_commodity_count <- aggregate(RMA$loss, by = list(RMA$year, RMA$state, RMA$county, RMA$commodity), FUN = length)
colnames(RMA_commodity_count) <- c("year", "state", "county", "commodity", "count")
RMA_commodity_count <- subset(RMA_commodity_count, commodity != "ADJUSTED GROSS REVENUE")

#aggregate for year/state/county/commodity for acreage
RMA_commodity_acres <- aggregate(RMA$acres, by = list(RMA$year, RMA$state, RMA$county, RMA$commodity), FUN = sum)
colnames(RMA_commodity_acres) <- c("year", "state", "county", "commodity", "acres")
RMA_commodity_acres <- subset(RMA_commodity_acres, commodity != "ADJUSTED GROSS REVENUE")

#aggregate for year/state/county/damage cause for total loss in $
RMA_damage_loss <- aggregate(RMA$loss, by = list(RMA$year, RMA$state, RMA$county, RMA$commodity, RMA$damagecause), FUN = sum)
colnames(RMA_damage_loss) <- c("year", "state", "county", "commodity", "damagecause", "loss")

#aggregate for year/state/county/commodity/damage cause by total loss in $
RMA_damage_count <- aggregate(RMA$loss, by = list(RMA$year, RMA$state, RMA$county, RMA$commodity, RMA$damagecause), FUN = length)
colnames(RMA_damage_count) <- c("year", "state", "county", "commodity", "damagecause", "count")

#aggregate by year/state/county/commodity/damage cause by acres
RMA_damage_acres <- aggregate(RMA$acres, by = list(RMA$year, RMA$state, RMA$county, RMA$commodity, RMA$damagecause), FUN = sum)
colnames(RMA_damage_acres) <- c("year", "state", "county", "commodity", "damaecause", "acres")

#combine all the aggregated commodity values (loss, count, acreage)
RMA_commodity_combined <- cbind(RMA_commodity_loss, RMA_commodity_count$count, RMA_commodity_acres$acres)
colnames(RMA_commodity_combined) <- c("year", "state", "county", "commodity", "loss", "count", "acres")

#calculate lossperacre for aggregated commodity
RMA_commodity_combined$lossperacre <- RMA_commodity_combined$loss / RMA_commodity_combined$acres

#calculate loss per claim for aggregated commodity
RMA_commodity_combined$lossperclaim <- RMA_commodity_combined$loss / RMA_commodity_combined$count

#calculate acres per claim for aggregated commodity
RMA_commodity_combined$acresperclaim <- RMA_commodity_combined$acres / RMA_commodity_combined$count

#combine all aggregated damage cause values (loss, count, acreage) 
RMA_damage_combined <- cbind(RMA_damage_loss, RMA_damage_count$count, RMA_damage_acres$acres)
colnames(RMA_damage_combined) <- c("year", "state", "county", "commodity", "damagecause", "loss", "count", "acres")

#calculate lossperacre for aggregated damage cause
RMA_damage_combined$lossperacre <- RMA_damage_combined$loss / RMA_damage_combined$acres

#calculate loss per claim for aggregated damage cause
RMA_damage_combined$lossperclaim <- RMA_damage_combined$loss / RMA_damage_combined$count

#calculate acres per claim for aggregated damage cause
RMA_damage_combined$acresperclaim <- RMA_damage_combined$acres / RMA_damage_combined$count

#set NAs for loss per acre for damage cause combo to zero.  
RMA_damage_combined$lossperacre[!is.finite(RMA_damage_combined$lossperacre)] <- 0

#subset for PNW - 2001 - 2015
RMA_damage_PNW <- subset(RMA_damage_combined, state == "WA" | state == "OR" | state == "ID")

#remove all other counties values
RMA_damage_PNW <- subset(RMA_damage_PNW, county != "All Other Counties")

#subset just for Idaho
RMA_Idaho <- subset(RMA_damage_PNW, state == "ID")

#subset just for Oregon
RMA_Oregon <- subset(RMA_damage_PNW, state == "OR")

#subset just for Washington
RMA_Washington <- subset(RMA_damage_PNW, state == "WA")

#subset just for iPNW counties for Idaho
RMA_Idaho_IPNW <- subset(RMA_Idaho, county == "Idaho" | county == "Lewis" | county ==  "Nez Perce" | county ==  "Latah" | county ==  "Benewah")

#subset just for iPNW counties for Oregon
RMA_Oregon_IPNW <- subset(RMA_Oregon, county == "Wasco" | county ==  "Sherman" | county ==  "Gilliam" | county ==  "Morrow" | county ==  "Umatilla" | county ==  "Union" | county ==  "Wallowa")

#subset just for iPNW counties for Washington
RMA_Washington_IPNW <- subset(RMA_Washington, county ==  "Douglas" | county ==  "Grant" | county ==  "Benton" | county ==  "Franklin" | county ==  "Walla Walla" | county ==  "Adams" | county ==  "Lincoln" | county ==  "Spokane" | county ==  "Whitman" | county ==  "Columbia" | county ==  "Garfield" | county ==  "Asotin")


#combine all iPNW counties
RMA_damage_IPNW <- rbind(RMA_Idaho_IPNW,RMA_Oregon_IPNW,RMA_Washington_IPNW)

 palousecounties    <-  c("Idaho", "Lewis", "Nez Perce", "Clearwater", "Latah", "Benewah", "Kootenai","Douglas", "Grant", "Benton", "Franklin", "Walla Walla", "Adams", "Lincoln", "Spokane", "Whitman", "Columbia", "Garfield", "Asotin","Wasco", "Sherman", "Gilliam", "Morrow", "Umatilla", "Union", "Wallowa")
 
sumloss_palouse <- RMA_damage_PNW[RMA_damage_PNW$county %in% palousecounties, ] 
sumloss_nonpalouse <- RMA_damage_PNW[!RMA_damage_PNW$county %in% palousecounties, ]

#----

Step 2. Original Insurance Loss Dataset - Pacific Northwest

In Step 2, we document the following table, which is an example of the original data that was acquired from the USDA Risk Management Agency (RMA). Each record represents an individual insurance claim.

knitr::kable(RMA_PNW[1:10,], format="markdown") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

	year	state	county	commodity	damagecause	monthcode	month	acres	loss	lossperacre	cropyear
11338	2001	ID	Ada	All Other Crops	Drought	9	SEP	17.000	153.00	9.000000	2001
11339	2001	ID	Ada	All Other Crops	Heat	8	AUG	105.200	5249.00	49.895437	2001
11340	2001	ID	Ada	All Other Crops	Freeze	4	APR	125.000	4500.00	36.000000	2001
11341	2001	ID	Ada	All Other Crops	Wind/Excess Wind	5	MAY	50.000	1800.00	36.000000	2001
11342	2001	ID	Ada	All Other Crops	Wind/Excess Wind	4	APR	92.500	3330.00	36.000000	2001
11343	2001	ID	Bannock	WHEAT	Drought	8	AUG	133.000	1212.00	9.112782	2001
11344	2001	ID	Bannock	WHEAT	Drought	9	SEP	777.520	24807.00	31.905289	2001
11345	2001	ID	Bannock	WHEAT	Drought	7	JUL	3529.754	54726.46	15.504327	2001
11346	2001	ID	Bannock	WHEAT	Heat	7	JUL	19.796	2371.60	119.801980	2001
11347	2001	ID	Bannock	WHEAT	Heat	8	AUG	25.000	904.00	36.160000	2001

Step 3. Aggregated Insurance Loss Dataset - Pacific Northwest

In Step 3, the following table is an example of an aggregated dataset derived from the original insurance loss csv files. Here we have summarized claims by year, county, commodity, and damage cause. Each unique combination is summarized, which includes the total summarized loss, the number of claims, the total summarized acreage, loss per acre, loss per claim, and acres per claim. This dataset was the basis for our data examination.

knitr::kable(RMA_damage_PNW[1:10,], format="markdown") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

	year	state	county	commodity	damagecause	loss	count	acres	lossperacre	lossperclaim	acresperclaim
11	2014	ID	Ada	All Other Crops	Area Plan Crops Only	1398	1	0	0.0000000	1398.0	0
12	2015	ID	Ada	All Other Crops	Area Plan Crops Only	11810	1	0	0.0000000	11810.0	0
211	2008	OR	Baker	All Other Crops	Area Plan Crops Only	15292	2	5482	2.7894929	7646.0	2741
212	2010	OR	Baker	All Other Crops	Area Plan Crops Only	1819	2	2282	0.7971078	909.5	1141
215	2009	ID	Bannock	All Other Crops	Area Plan Crops Only	2284	1	600	3.8066667	2284.0	600
245	2008	ID	Bear Lake	All Other Crops	Area Plan Crops Only	8102	1	2468	3.2828201	8102.0	2468
246	2012	ID	Bear Lake	All Other Crops	Area Plan Crops Only	7459	1	2468	3.0222853	7459.0	2468
291	2010	ID	Bingham	All Other Crops	Area Plan Crops Only	4197	1	192	21.8593750	4197.0	192
292	2011	ID	Bingham	All Other Crops	Area Plan Crops Only	7769	1	240	32.3708333	7769.0	240
293	2012	ID	Bingham	All Other Crops	Area Plan Crops Only	6786	1	168	40.3928571	6786.0	168

Step 4: Inland Pacific Northwest (iPNW) Study Area

In Step 4, we document the 24 county iPNW study area.

usePackage("data.table")
usePackage("maptools")
usePackage("classInt")
usePackage("leaflet")
usePackage("dplyr")
usePackage("raster")
usePackage("htmltools")

#URL <- "https://raw.githubusercontent.com/erichseamon/seamon_dissertation/master/data/states/states_conus.zip"
#destfile <- "/tmp/seamon/states_conus.zip"
#download.file(URL, destfile)
#outDir<-"/tmp/seamon"
#unzip(destfile,exdir=outDir)

states <- readShapePoly('/tmp/seamon/states_conus/states_conus.shp',
                        proj4string=CRS
                        ("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
projection = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")



#setwd("/dmine/data/counties/")


#URL <- "https://raw.githubusercontent.com/erichseamon/seamon_dissertation/master/data/counties/threestate_palouse.zip"
#destfile <- "/tmp/seamon/threestate_palouse.zip"
#download.file(URL, destfile)
#outDir<-"/tmp/seamon"
#unzip(destfile,exdir=outDir)

counties <- readShapePoly('/tmp/seamon/threestate_palouse/threestate_palouse.shp',
                     proj4string=CRS
                     ("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
projection = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")


#pal <- colorNumeric(palette = c("white", "orange", "darkorange", "red", "darkred"),
#                    domain = counties$NAME)
exte <- as.vector(extent(counties))

#label <- paste(sep = "<br/>", counties$NAME, round(counties$NAME, 0))
#markers <- data.frame(label)
#labs <- as.list(counties$NAME)

counties <- counties[counties$NAME != "Kootenai",]
counties <- counties[counties$NAME != "Clearwater",]

      lat_long <- coordinates(counties)

 counties_a <- data.frame(counties)
      labs <- lapply(seq(nrow(counties_a)), function(i) {
        paste0(as.character(counties_a[i, "NAME"]))
      })

#leaflet(data = counties, options = leafletOptions(minZoom = 6, maxZoom = 6)) %>%  addProviderTiles("Stamen.TonerLite") %>% fitBounds(exte[1], exte[3], exte[2], exte[4]) %>% addPolygons(color = "blue",  weight = 1) %>%

      
#--


mapz <- leaflet(data = counties, options = leafletOptions(zoomControl = FALSE, zoomSnap = .4, zoomDelta = .4)) %>%  
  addProviderTiles(providers$Hydda.Base) %>%
   addProviderTiles(providers$Stamen.TonerLines) %>% 
  
 
  
  fitBounds(exte[1], exte[3], exte[2], exte[4]) %>% addPolygons(color = "lightyellow",  fillOpacity = .8, weight = 1) %>%  addPolygons(data = states, fillOpacity = 0, weight = 4, stroke = TRUE, smoothFactor = 0.5, color = "black") %>%  addPolygons(color = "gray",  fillOpacity = 0, weight = 2) %>%  

  
 # addMiniMap(
#    tiles = providers$Stamen.TonerLite,
#    position = 'topleft', 
#    width = 175, height = 175,
#    toggleDisplay = FALSE) %>%
  
  
addLabelOnlyMarkers(data = counties, lng = lat_long[,1], lat = lat_long[,2], label = lapply(labs, HTML), labelOptions = labelOptions(style = list("font-family" = "serif"), noHide = TRUE, direction = 'middle', textOnly = TRUE, textsize = "16px", color = "white",  offset = c(20, 0), markerOptions(riseOnHover = TRUE, col = "white")
)) 
        
addScaleBar(mapz, position = c("topright"), options = scaleBarOptions())

#markerOptions(map, riseOnHover = TRUE)
 # addLegend(colors = ~NAME, labels = ~NAME, values = ~NAME, opacity = 0.5, title = NULL,
  #          position = "bottomright")

Step 5: PCA: PNW insurance loss, by COUNTY with COMMODITY loadings: 2001 to 2015

In Step 5, we perform a PCA for the insurance loss for the entire PNW, by county, with commodity as the factor loadings. Data from 2001 is 2015 is used. We additionally have generated a scree plot that shows the proportion of variance explained by the individual components.

usePackage("ggplot2")
usePackage("ggfortify")
usePackage("reshape2")

#PNW by commodity

RMA_damage_PNW_transformed_commodity <- dcast(RMA_damage_PNW, county + state ~ commodity, value.var = c("loss"), sum)
RMA_damage_PNW_transformed_commodity$statecounty <- paste(RMA_damage_PNW_transformed_commodity$county, "_", RMA_damage_PNW_transformed_commodity$state, sep="")
rownames(RMA_damage_PNW_transformed_commodity) <- RMA_damage_PNW_transformed_commodity$statecounty
RMA_damage_PNW_exogenous_commodity <- RMA_damage_PNW_transformed_commodity[,3:41]

RMA_damage_PNW_exogenous_commodity <- RMA_damage_PNW_exogenous_commodity[-1]
RMA_damage_PNW_exogenous_commodity <- RMA_damage_PNW_exogenous_commodity[-20]
RMA_damage_PNW_exogenous_commodity <- RMA_damage_PNW_exogenous_commodity[-37]


fit <- princomp(RMA_damage_PNW_exogenous_commodity, cor=TRUE)
#ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_commodity, scale = TRUE, center = TRUE), data = RMA_damage_PNW_transformed_commodity, colour = 'county', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 3, legend = FALSE) + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_commodity)) + theme(legend.position = "none")

ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_commodity, center = TRUE, scale = TRUE), frame = TRUE, frame.type = 'norm', data = RMA_damage_PNW_transformed_commodity, colour = 'state', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 2, legend = FALSE)  + theme_bw()   + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_commodity)) + theme(legend.position = "none") + ggtitle("Principle Components: PNW insurance loss\nby county with commodity loadings ") + theme(plot.title = element_text(size =28, family = "Serif")) + theme(axis.title.y = element_text(family = "Serif", size=16), axis.title.x = element_text(family = "Serif", size = 16), axis.text.x = element_text(size=rel(1.5), hjust = 1, family = "Serif"), axis.text.y = element_text(size=rel(1.5), hjust = 1, family = "Serif"))

#pca_RMA_damage_IPNW_exogenous$rotation

#plot(pca_RMA_damage_IPNW_exogenous$rotation)

std_dev <- fit$sdev

pr_var <- std_dev^2

#pr_var[1:10]

prop_varex <- pr_var/sum(pr_var)

plot(prop_varex, xlab = "Principal Component",
             ylab = "Proportion of Variance Explained",
             type = "b", main = "Scree Plot")

Step 6: PCA: PNW insurance loss, by YEAR with DAMAGE CAUSE loadings: 2001 to 2015

In Step 6, we perform a PCA for the insurance loss for the entire PNW, for all commodities by year, with damage cause as the factor loadings. Data from 2001 is 2015 is used. We additionally have generated a scree plot that shows the proportion of variance explained by the individual components.

usePackage("ggplot2")
usePackage("ggfortify")
usePackage("reshape2")

RMA_damage_PNW_transformed_year <- dcast(RMA_damage_PNW, year ~ damagecause, value.var = c("loss"), sum)
#RMA_damage_PNW_transformed_year$statecounty <- paste(RMA_damage_PNW_transformed_year$county, "_", RMA_damage_PNW_transformed_year$state, sep="")
rownames(RMA_damage_PNW_transformed_year) <- RMA_damage_PNW_transformed_year$year
RMA_damage_PNW_exogenous_year <- RMA_damage_PNW_transformed_year[,2:31]

#PNW using year as group - with damage cause as loadings
#fit <- princomp(RMA_damage_PNW_exogenous_year, cor=TRUE, scale = TRUE, center = TRUE)

RMA_damage_PNW_exogenous_year_factor <- cbind(RMA_damage_PNW_exogenous_year[,2:3], RMA_damage_PNW_exogenous_year[6], RMA_damage_PNW_exogenous_year[8], RMA_damage_PNW_exogenous_year[,11:14], RMA_damage_PNW_exogenous_year[16:17])

fit <- princomp(RMA_damage_PNW_exogenous_year_factor, cor=TRUE)

ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_year_factor, center = TRUE, scale = TRUE),  data = RMA_damage_PNW_exogenous_year_factor, colour = rownames(RMA_damage_PNW_exogenous_year_factor), loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 2, legend = FALSE)  + theme_bw()   + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_year_factor)) + theme(legend.position = "none") + ggtitle("Principle Components: PNW insurance loss\nby year with damage cause loadings ") + theme(plot.title = element_text(size =28, family = "Serif")) + theme(axis.title.y = element_text(family = "Serif", size=16), axis.title.x = element_text(family = "Serif", size = 16), axis.text.x = element_text(size=rel(1.5), hjust = 1, family = "Serif"), axis.text.y = element_text(size=rel(1.5), hjust = 1, family = "Serif"))

#pca_RMA_damage_IPNW_exogenous$rotation

#plot(pca_RMA_damage_IPNW_exogenous$rotation)

std_dev <- fit$sdev

pr_var <- std_dev^2

#pr_var[1:10]

prop_varex <- pr_var/sum(pr_var)

plot(prop_varex, xlab = "Principal Component",
             ylab = "Proportion of Variance Explained",
             type = "b", main = "Scree Plot")

Step 7: PCA: PNW insurance loss, by MONTH with DAMAGE CAUSE loadings: 2001 to 2015

In Step 7, we perform a PCA for the insurance loss for the entire PNW, for all commodities by month, with damage cause as the factor loadings. Data from 2001 is 2015 is used. We additionally have generated a scree plot that shows the proportion of variance explained by the individual components.

usePackage("ggplot2")
usePackage("ggfortify")
usePackage("reshape2")

#PNW damage cause as loadings using months as groups

#remove negatives related to lossperacre calc
RMA_PNW_noneg <- subset(RMA_PNW, lossperacre != Inf)
RMA_PNW_noneg2 <- subset(RMA_PNW_noneg, lossperacre >=0 )
RMA_PNW_noneg3 <- subset(RMA_PNW_noneg2, month !="" )

RMA_damage_PNW_transformed_month <- dcast(RMA_PNW_noneg3, month ~ damagecause, value.var = c("loss"), mean)

is.nan.data.frame <- function(x)
do.call(cbind, lapply(x, is.nan))

RMA_damage_PNW_transformed_month[is.nan(RMA_damage_PNW_transformed_month)] <- 0

#RMA_damage_PNW_transformed_year$statecounty <- paste(RMA_damage_PNW_transformed_year$county, "_", RMA_damage_PNW_transformed_year$state, sep="")
rownames(RMA_damage_PNW_transformed_month) <- RMA_damage_PNW_transformed_month$month
RMA_damage_PNW_exogenous_month <- RMA_damage_PNW_transformed_month[,2:31]

RMA_damage_PNW_exogenous_month <- cbind(RMA_damage_PNW_exogenous_month[,2:3], RMA_damage_PNW_exogenous_month[6], RMA_damage_PNW_exogenous_month[8], RMA_damage_PNW_exogenous_month[,11:14], RMA_damage_PNW_exogenous_month[16:17])


#ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_year, center = TRUE, scale = TRUE), data = RMA_damage_PNW_transformed_year, colour = 'year', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 3, legend = FALSE) + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_year)) + theme(legend.position = "none")

#PNW using month as group - with damage cause as loadings
fit <- princomp(RMA_damage_PNW_exogenous_month, cor=TRUE)
ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_month, center = TRUE, scale = TRUE),  data = RMA_damage_PNW_transformed_month, colour = 'month', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 2, legend = FALSE)  + theme_bw()   + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_month)) + theme(legend.position = "none") + ggtitle("Principle Components: PNW insurance loss\nby month with damage cause loadings ") + theme(plot.title = element_text(size =28, family = "Serif")) + theme(axis.title.y = element_text(family = "Serif", size=16), axis.title.x = element_text(family = "Serif", size = 16), axis.text.x = element_text(size=rel(1.5), hjust = 1, family = "Serif"), axis.text.y = element_text(size=rel(1.5), hjust = 1, family = "Serif"))

#pca_RMA_damage_IPNW_exogenous$rotation

#plot(pca_RMA_damage_IPNW_exogenous$rotation)

std_dev <- fit$sdev

pr_var <- std_dev^2

#pr_var[1:10]

prop_varex <- pr_var/sum(pr_var)

plot(prop_varex, xlab = "Principal Component",
             ylab = "Proportion of Variance Explained",
             type = "b", main = "Scree Plot")

Step 8: PCA: PNW WHEAT insurance loss, by YEAR with DAMAGE CAUSE loadings: 2001 to 2015

In Step 8, we perform a PCA for the insurance loss for the entire PNW, for wheat by year, with damage cause as the factor loadings. Data from 2001 is 2015 is used. We additionally have generated a scree plot that shows the proportion of variance explained by the individual components.

usePackage("ggplot2")
usePackage("ggfortify")
usePackage("reshape2")

#PNW damage cause as loadings using years as groups, JUST FOR WHEAT

RMA_damage_PNW_wheat_year <- subset(RMA_damage_PNW, commodity == "WHEAT")
RMA_damage_PNW_transformed_wheat_year <- dcast(RMA_damage_PNW_wheat_year, year ~ damagecause, value.var = c("loss"), sum)
#RMA_damage_PNW_transformed_year$statecounty <- paste(RMA_damage_PNW_transformed_year$county, "_", RMA_damage_PNW_transformed_year$state, sep="")
rownames(RMA_damage_PNW_transformed_wheat_year) <- RMA_damage_PNW_transformed_wheat_year$year
RMA_damage_PNW_exogenous_wheat_year <- RMA_damage_PNW_transformed_wheat_year
par(mar=c(8, 6.1, 4, 2.1), family = 'serif', mgp=c(4, 1, 0), las=0)

#PNW using year as group - with damage cause as loadings: JUST FOR WHEAT

RMA_damage_PNW_exogenous_wheat_year_factor <- cbind(RMA_damage_PNW_exogenous_wheat_year[,3:4], RMA_damage_PNW_exogenous_wheat_year[7], RMA_damage_PNW_exogenous_wheat_year[9], RMA_damage_PNW_exogenous_wheat_year[,12:15], RMA_damage_PNW_exogenous_wheat_year[17:18])
fit <- princomp(RMA_damage_PNW_exogenous_wheat_year_factor, cor=TRUE)

ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_wheat_year_factor, center = TRUE, scale = TRUE),  data = RMA_damage_PNW_exogenous_wheat_year_factor, colour = rownames(RMA_damage_PNW_exogenous_wheat_year_factor), loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 2, legend = FALSE)  + theme_bw()   + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_wheat_year_factor)) + theme(legend.position = "none") + ggtitle("Principle Components: PNW WHEAT insurance loss\nby year with damage cause loadings ") + theme(plot.title = element_text(size =28, family = "Serif")) + theme(axis.title.y = element_text(family = "Serif", size=16), axis.title.x = element_text(family = "Serif", size = 16), axis.text.x = element_text(size=rel(1.5), hjust = 1, family = "Serif"), axis.text.y = element_text(size=rel(1.5), hjust = 1, family = "Serif"))

#pca_RMA_damage_IPNW_exogenous$rotation

#plot(pca_RMA_damage_IPNW_exogenous$rotation)

std_dev <- fit$sdev

pr_var <- std_dev^2

#pr_var[1:10]

prop_varex <- pr_var/sum(pr_var)

plot(prop_varex, xlab = "Principal Component",
             ylab = "Proportion of Variance Explained",
             type = "b", main = "Scree Plot")

Step 9: PCA: PNW WHEAT insurance loss, by COUNTY with DAMAGE CAUSE loadings: 2001 to 2015

In Step 9, we perform a PCA for the insurance loss for the entire PNW, for wheat by county, with damage cause as the factor loadings. Data from 2001 is 2015 is used. We additionally have generated a scree plot that shows the proportion of variance explained by the individual components.

#PNW wheat

usePackage("ggplot2")
usePackage("ggfortify")
usePackage("reshape2") 

RMA_damage_PNW_wheat <- subset(RMA_damage_PNW, commodity == "WHEAT")

RMA_damage_PNW_transformed_wheat <- dcast(RMA_damage_PNW_wheat, county + state ~ damagecause, value.var = c("loss"), sum)
RMA_damage_PNW_transformed_wheat$statecounty <- paste(RMA_damage_PNW_transformed_wheat$county, "_", RMA_damage_PNW_transformed_wheat$state, sep="")
rownames(RMA_damage_PNW_transformed_wheat) <- RMA_damage_PNW_transformed_wheat$statecounty
#RMA_damage_PNW_exogenous_wheat <- RMA_damage_PNW_transformed_wheat[,3:27]
RMA_damage_PNW_exogenous_wheat <- RMA_damage_PNW_transformed_wheat

#ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_month), data = RMA_damage_PNW_transformed_month, colour = 'month', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 3, legend = FALSE) + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_month)) + theme(legend.position = "none")

#PNW wheat

RMA_damage_PNW_exogenous_wheat <- cbind(RMA_damage_PNW_exogenous_wheat[,4:5], RMA_damage_PNW_exogenous_wheat[8], RMA_damage_PNW_exogenous_wheat[10], RMA_damage_PNW_exogenous_wheat[,13:16], RMA_damage_PNW_exogenous_wheat[18:19])

fit <- princomp(RMA_damage_PNW_exogenous_wheat, cor=TRUE)
#ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_wheat, scale = TRUE, center = TRUE), data = RMA_damage_PNW_transformed_wheat, colour = 'county', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 3, legend = FALSE) + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_wheat)) + theme(legend.position = "none")

ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_wheat, center = TRUE, scale = TRUE), frame = TRUE, frame.type = 'norm', data = RMA_damage_PNW_transformed_wheat, colour = 'state', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 2, legend = FALSE)  + theme_bw()   + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_wheat)) + theme(legend.position = "none") + ggtitle("Principle Components: PNW wheat insurance loss\nby county with damage cause loadings ") + theme(plot.title = element_text(size =28, family = "Serif")) + theme(axis.title.y = element_text(family = "Serif", size=16), axis.title.x = element_text(family = "Serif", size = 16), axis.text.x = element_text(size=rel(1.5), hjust = 1, family = "Serif"), axis.text.y = element_text(size=rel(1.5), hjust = 1, family = "Serif"))

#pca_RMA_damage_IPNW_exogenous$rotation

#plot(pca_RMA_damage_IPNW_exogenous$rotation)

std_dev <- fit$sdev

pr_var <- std_dev^2

#pr_var[1:10]

prop_varex <- pr_var/sum(pr_var)

plot(prop_varex, xlab = "Principal Component",
             ylab = "Proportion of Variance Explained",
             type = "b", main = "Scree Plot")

Step 10: PCA: PNW APPLES insurance loss, by COUNTY with DAMAGE CAUSE loadings: 2001 to 2015

In Step 10, we perform a PCA for the insurance loss for the entire PNW, for apples by county, with damage cause as the factor loadings. Data from 2001 is 2015 is used. We additionally have generated a scree plot that shows the proportion of variance explained by the individual components.

#PNW apples

usePackage("ggplot2")
usePackage("ggfortify")
usePackage("reshape2")
RMA_damage_PNW_apples <- subset(RMA_damage_PNW, commodity == "APPLES")

RMA_damage_PNW_transformed_apples <- dcast(RMA_damage_PNW_apples, county + state ~ damagecause, value.var = c("loss"), sum)
RMA_damage_PNW_transformed_apples$statecounty <- paste(RMA_damage_PNW_transformed_apples$county, "_", RMA_damage_PNW_transformed_apples$state, sep="")
rownames(RMA_damage_PNW_transformed_apples) <- RMA_damage_PNW_transformed_apples$statecounty
RMA_damage_PNW_exogenous_apples <- RMA_damage_PNW_transformed_apples[,3:24]

RMA_damage_PNW_exogenous_apples <- cbind(RMA_damage_PNW_exogenous_apples[,1:4], RMA_damage_PNW_exogenous_apples[,7:10], RMA_damage_PNW_exogenous_apples[,13:14])

fit <- princomp(RMA_damage_PNW_exogenous_apples, cor=TRUE)
#PNW apples

#ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_apples), data = RMA_damage_PNW_transformed_apples, colour = 'county', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 3, legend = FALSE) + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_apples)) + theme(legend.position = "none")

ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_apples, center = TRUE, scale = TRUE), frame = TRUE, frame.type = 'norm', data = RMA_damage_PNW_transformed_apples, colour = 'state', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 2, legend = FALSE)  + theme_bw()   + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_apples)) + theme(legend.position = "none") + ggtitle("Principle Components: PNW apples insurance loss\nby county with damage cause loadings ") + theme(plot.title = element_text(size =28, family = "Serif")) + theme(axis.title.y = element_text(family = "Serif", size=16), axis.title.x = element_text(family = "Serif", size = 16), axis.text.x = element_text(size=rel(1.5), hjust = 1, family = "Serif"), axis.text.y = element_text(size=rel(1.5), hjust = 1, family = "Serif"))

#pca_RMA_damage_IPNW_exogenous$rotation

#plot(pca_RMA_damage_IPNW_exogenous$rotation)

std_dev <- fit$sdev

pr_var <- std_dev^2

#pr_var[1:10]

prop_varex <- pr_var/sum(pr_var)

plot(prop_varex, xlab = "Principal Component",
             ylab = "Proportion of Variance Explained",
             type = "b", main = "Scree Plot")

Step 11: PCA: PNW BARLEY insurance loss, by COUNTY with DAMAGE CAUSE loadings: 2001 to 2015

In Step 11, we perform a PCA for the insurance loss for the entire PNW, for barley by county, with damage cause as the factor loadings. Data from 2001 is 2015 is used. We additionally have generated a scree plot that shows the proportion of variance explained by the individual components.

#PNW barley

usePackage("ggplot2")
usePackage("ggfortify")
usePackage("reshape2")

RMA_damage_PNW_barley <- subset(RMA_damage_PNW, commodity == "BARLEY")

RMA_damage_PNW_transformed_barley <- dcast(RMA_damage_PNW_barley, county + state ~ damagecause, value.var = c("loss"), sum)
RMA_damage_PNW_transformed_barley$statecounty <- paste(RMA_damage_PNW_transformed_barley$county, "_", RMA_damage_PNW_transformed_barley$state, sep="")
rownames(RMA_damage_PNW_transformed_barley) <- RMA_damage_PNW_transformed_barley$statecounty
#RMA_damage_PNW_exogenous_barley <- RMA_damage_PNW_transformed_barley[,3:23]
RMA_damage_PNW_exogenous_barley <- RMA_damage_PNW_transformed_barley

RMA_damage_PNW_exogenous_barley <- cbind(RMA_damage_PNW_exogenous_barley[,3:4], RMA_damage_PNW_exogenous_barley[7], RMA_damage_PNW_exogenous_barley[9], RMA_damage_PNW_exogenous_barley[,12:15], RMA_damage_PNW_exogenous_barley[17:18])

fit <- princomp(RMA_damage_PNW_exogenous_barley, cor=TRUE)

#PNW barley

#ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_barley), data = RMA_damage_PNW_transformed_barley, colour = 'county', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 3, legend = FALSE) + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_barley)) + theme(legend.position = "none")

ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous_barley, center = TRUE, scale = TRUE), frame = TRUE, frame.type = 'norm', data = RMA_damage_PNW_transformed_barley, colour = 'state', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 2, legend = FALSE)  + theme_bw()   + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous_barley)) + theme(legend.position = "none") + ggtitle("Principle Components: PNW barley insurance loss\nby county with damage cause loadings ") + theme(plot.title = element_text(size =28, family = "Serif")) + theme(axis.title.y = element_text(family = "Serif", size=16), axis.title.x = element_text(family = "Serif", size = 16), axis.text.x = element_text(size=rel(1.5), hjust = 1, family = "Serif"), axis.text.y = element_text(size=rel(1.5), hjust = 1, family = "Serif"))

#pca_RMA_damage_IPNW_exogenous$rotation

#plot(pca_RMA_damage_IPNW_exogenous$rotation)

std_dev <- fit$sdev

pr_var <- std_dev^2

#pr_var[1:10]

prop_varex <- pr_var/sum(pr_var)

plot(prop_varex, xlab = "Principal Component",
             ylab = "Proportion of Variance Explained",
             type = "b", main = "Scree Plot")

Step 12: PCA: IPNW insurance loss by COUNTY with DAMAGE CAUSE loadings: 2001 to 2015

In Step 12, we perform a PCA for the insurance loss for the inland PNW, for all commodities by county, with damage cause as the factor loadings. Data from 2001 is 2015 is used. We additionally have generated a scree plot that shows the proportion of variance explained by the individual components.

#all PNW by damage

usePackage("ggplot2")
usePackage("ggfortify")
usePackage("reshape2")


RMA_damage_PNW_transformed <- dcast(RMA_damage_PNW, county + state ~ damagecause, value.var = c("loss"), sum)
RMA_damage_PNW_transformed$statecounty <- paste(RMA_damage_PNW_transformed$county, "_", RMA_damage_PNW_transformed$state, sep="")
rownames(RMA_damage_PNW_transformed) <- RMA_damage_PNW_transformed$statecounty
RMA_damage_PNW_exogenous <- RMA_damage_PNW_transformed[,3:32]
par(mar=c(8, 6.1, 4, 2.1), family = 'serif', mgp=c(4, 1, 0), las=0)

fit <- stats::princomp(RMA_damage_PNW_exogenous, cor=TRUE)
ggplot2::autoplot(prcomp(RMA_damage_PNW_exogenous, center = TRUE, scale = TRUE), frame = TRUE, frame.type = 'norm', data = RMA_damage_PNW_transformed, colour = 'state', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 2, legend = FALSE)  + theme_bw()   + geom_text(vjust=-1, label=rownames(RMA_damage_PNW_exogenous)) + theme(legend.position = "none") + ggtitle("Principle Components: IPNW insurance loss\nby county with damage cause loadings: 2001 to 2015") + theme(plot.title = element_text(size =28, family = "Serif")) + theme(axis.title.y = element_text(family = "Serif", size=16), axis.title.x = element_text(family = "Serif", size = 16), axis.text.x = element_text(size=rel(1.5), hjust = 1, family = "Serif"), axis.text.y = element_text(size=rel(1.5), hjust = 1, family = "Serif"))

par(mar=c(8, 6.1, 4, 2.1), family = 'serif', mgp=c(4, 1, 0), las=0)

#pca_RMA_damage_IPNW_exogenous$rotation

#plot(pca_RMA_damage_IPNW_exogenous$rotation)

std_dev <- fit$sdev

pr_var <- std_dev^2

#pr_var[1:10]

prop_varex <- pr_var/sum(pr_var)

plot(prop_varex, xlab = "Principal Component",
             ylab = "Proportion of Variance Explained",
             type = "b", main = "Scree Plot")

Step 13: PCA + KMEANS: WHEAT IPNW insurance loss, by COUNTY with DAMAGE CAUSE loadings: 2001 to 2015

In Step 13, we perform a PCA for the insurance loss for the inland PNW, for wheat by county, with damage cause as the factor loadings. Data from 2001 is 2015 is used. We additionally have generated a scree plot that shows the proportion of variance explained by the individual components.

#just the IPNW

usePackage("ggplot2")
usePackage("ggfortify")
usePackage("reshape2")

RMA_damage_IPNW$log10loss <- log10(RMA_damage_IPNW$loss)


RMA_damage_IPNW_wheat <- subset(RMA_damage_IPNW, commodity == "WHEAT")
RMA_damage_IPNW_transformed <- dcast(RMA_damage_IPNW_wheat, county + state ~ damagecause, value.var = c("loss"), sum)
RMA_damage_IPNW_transformed$statecounty <- paste(RMA_damage_IPNW_transformed$county, "_", RMA_damage_IPNW_transformed$state, sep="")
rownames(RMA_damage_IPNW_transformed) <- RMA_damage_IPNW_transformed$statecounty
RMA_damage_IPNW_exogenous <- RMA_damage_IPNW_transformed[,3:26]


#IPNW

#RMA_damage_IPNW_exogenous <- scale(RMA_damage_IPNW_exogenous, scale = TRUE, center = TRUE)
pca_RMA_damage_IPNW_exogenous <- prcomp(RMA_damage_IPNW_exogenous, scale = TRUE, center = TRUE)

RMA_damage_IPNW_exogenous_factor <- cbind(RMA_damage_IPNW_exogenous[2:3], RMA_damage_IPNW_exogenous[5:6], RMA_damage_IPNW_exogenous[8], RMA_damage_IPNW_exogenous[11], RMA_damage_IPNW_exogenous[13:14], RMA_damage_IPNW_exogenous[16:17])

fit <- prcomp(RMA_damage_IPNW_exogenous_factor, cor=TRUE, center = TRUE, scale = TRUE)

comp <- data.frame(fit$x[,1:2])
# Plot
#plot(comp, pch=16, col=rgb(0,0,0,0.5))


wss <- (nrow(RMA_damage_IPNW_exogenous_factor)-1)*sum(apply(RMA_damage_IPNW_exogenous_factor,2,var))
for (i in 1:23) wss[i] <- sum(kmeans(RMA_damage_IPNW_exogenous_factor,
                                     centers=i)$withinss)
plot(1:23, wss, type="b", xlab="Number of Clusters",
     ylab="Within groups sum of squares", main = "KMeans COUNTIES Elbow Plot")

  #identify the optimum number of clusters using NbClust
  par(mar=c(1,1,1,1))
  usePackage("NbClust")
  usePackage("factoextra")
  
  clustering_distance <- c("euclidean")
  clust<-NbClust(fit$x[,1:3], diss=NULL, distance = clustering_distance, min.nc=2, max.nc=20, method = "kmeans", index = "all")

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
##

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 6 proposed 2 as the best number of clusters 
## * 4 proposed 3 as the best number of clusters 
## * 2 proposed 4 as the best number of clusters 
## * 2 proposed 6 as the best number of clusters 
## * 1 proposed 7 as the best number of clusters 
## * 2 proposed 8 as the best number of clusters 
## * 3 proposed 17 as the best number of clusters 
## * 1 proposed 19 as the best number of clusters 
## * 3 proposed 20 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  2 
##  
##  
## *******************************************************************

  #clust<-NbClust(damage_loss[,1:(length(waterscarcity)-4)], diss=NULL, distance = clustering_distance, min.nc=2, max.nc=20, method = "kmeans", index = "all") 
  
  #res$All.index
  #res$Best.nc
  #res$All.CriticalValues
  #res$Best.partition
  
  maxclusters <- length(unique(clust$Best.partition))
  km.res <- kmeans(fit$x[,1:2], maxclusters, nstart = 25)
  #pam.res <- pam(damage_loss, maxclusters)
  hc.cut <- hcut(fit$x[,1:2], k = maxclusters, hc_method = "complete")

usePackage("RColorBrewer")
usePackage("scales")

k <- kmeans(comp, 2, nstart=25, iter.max=1000)

palette(alpha(brewer.pal(9,'Set2'), 1))
#plot(comp, col=k$clust, pch=2)


kclust <- data.frame(k$clust)
kclust$k.clust <- factor(kclust$k.clust)

ggplot2::autoplot(prcomp(RMA_damage_IPNW_exogenous_factor, scale = TRUE, center = TRUE), frame = TRUE, frame.type = 'norm', data = kclust, colour = 'k.clust', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 4, label = FALSE, label.size = 3, legend = FALSE)  + theme_bw()   + geom_text(vjust=-1, label=rownames(RMA_damage_IPNW_exogenous)) + theme(legend.position = "none") + ggtitle("Principle Components: IPNW insurance loss\nby county with damage cause loadings ") + theme(plot.title = element_text(size =28, family = "Serif")) + theme(axis.title.y = element_text(family = "Serif", size=16), axis.title.x = element_text(family = "Serif", size = 16), axis.text.x = element_text(size=rel(1.5), hjust = 1, family = "Serif"), axis.text.y = element_text(size=rel(1.5), hjust = 1, family = "Serif"))  + theme(legend.position="bottom")

#pca_RMA_damage_IPNW_exogenous$rotation

#plot(pca_RMA_damage_IPNW_exogenous$rotation)

std_dev <- fit$sdev

pr_var <- std_dev^2

#pr_var[1:10]

prop_varex <- pr_var/sum(pr_var)

plot(prop_varex, xlab = "Principal Component",
             ylab = "Proportion of Variance Explained",
             type = "b", main = "Scree Plot")

usePackage("plotly")

fit1 <- as.data.frame(fit$x[,1:3])

fit2 <- merge(fit$x, kclust, by="row.names",all.x=TRUE)

p <- plot_ly(fit2, x = ~PC2, y = ~PC1, z = ~PC3, color = ~k.clust, colors = c('#BF382A', '#0C4B8E')) 

p

Step 14: PCA: IPNW WHEAT loss by COUNTY with DAMAGE CAUSE loadings: PC1 and PC2 table

In Step 14, we summarize our PCA findings for the inland PNW, for wheat by county, using damage cause as the factor loadings. Here we plot PC1 and PC2 loadings as a table.

#--create table of PC1 and PC2 loadings

PC_table <- cbind(fit$x[,1], fit$x[,2])
colnames(PC_table) <- c("PC1", "PC2")
PC_table <- as.data.frame(PC_table)
PC_table$PC1 <- round(PC_table$PC1, 3)
PC_table$PC2 <- round(PC_table$PC2, 3)

htmlTable(PC_table,
          cgroup = c("2001-2015"),
          n.cgroup = c(2),
          caption = "Damage cause variable loadings by COUNTY",
          align = "c",
          rnames = rownames(PC_table),
          css.cell = "padding-left: .5em; padding-right: .5em;")

	2001-2015
Damage cause variable loadings by COUNTY
	PC1	PC2
Adams_WA	-3.574	1.325
Asotin_WA	1.703	0.967
Benewah_ID	1.807	-0.563
Benton_WA	1.247	1.232
Columbia_WA	1.559	0.633
Douglas_WA	0.111	0.762
Franklin_WA	1.208	1.038
Garfield_WA	1.575	0.831
Gilliam_OR	0.475	1.027
Grant_WA	0.136	0.859
Idaho_ID	1.132	-2.574
Latah_ID	1.298	-1.538
Lewis_ID	0.691	-2.392
Lincoln_WA	-2.988	1.015
Morrow_OR	-1.805	1.347
Nez Perce_ID	1.167	-0.583
Sherman_OR	1.39	0.415
Spokane_WA	0.836	-1.319
Umatilla_OR	-7.766	0.232
Union_OR	1.928	0.884
Walla Walla_WA	-2.215	-0.184
Wallowa_OR	1.918	0.579
Wasco_OR	0.908	-0.221
Whitman_WA	-2.74	-3.773

Step 15: PCA: IPNW WHEAT loss by COUNTY with DAMAGE CAUSE loadings: Spatial Mapping of PC1 and PC2

In Step 15, we summarize our PCA findings for the inland PNW, for wheat by county, using damage cause as the factor loadings. We take PC1 and PC2 loadings and plot each as a map.

#-map PC loadings

PC1_map <- as.data.frame(fit$x[,1])
PC1_map$NAME <- rownames(PC1_map)
colnames(PC1_map) <- c("PC1", "NAME")
PC1_NAME <- cbind(PC1_map, t(as.data.frame(strsplit(PC1_map$NAME, split='_', fixed=TRUE))))
colnames(PC1_NAME) <- c("PC1", "Statecounty", "NAME", "STATE")

PC1_map <- merge(counties, PC1_NAME, by=c("NAME"), all = FALSE)

PC2_map <- as.data.frame(fit$x[,2])
PC2_map$NAME <- rownames(PC2_map)
colnames(PC2_map) <- c("PC2", "NAME")
PC2_NAME <- cbind(PC2_map, t(as.data.frame(strsplit(PC2_map$NAME, split='_', fixed=TRUE))))
colnames(PC2_NAME) <- c("PC2", "Statecounty", "NAME", "STATE")

PC2_map <- merge(counties, PC2_NAME, by=c("NAME"), all = FALSE)


#PC1 map

#-----

PC1_map$PC1[is.na(PC1_map$PC1)] <- 0

 
## Make vector of colors for values smaller than 0 (20 colors)
rc1 <- colorRampPalette(colors = c("blue", "lightblue", "white"), space = "Lab")(14)

## Make vector of colors for values larger than 0 (180 colors)
rc2 <- colorRampPalette(colors = c("white", "red"), space = "Lab")(8)

## Combine the two color palettes
rampcols <- c(rc1, rc2)



# pal <- colorNumeric(colorRampPalette(c("blue", "lightblue", "white", "lightpink", "red"))(7), na.color = "#ffffff",domain = eval(parse(text=paste("PC1_map$", "PC1", sep=""))))
 pal <- colorNumeric(palette = rampcols, na.color = "#ffffff",domain = eval(parse(text=paste("PC1_map$", "PC1", sep=""))))

map <- leaflet(data = PC1_map, options = leafletOptions(zoomControl = FALSE, zoomSnap = 0.1)) %>%  
  addProviderTiles(providers$Hydda.Base) %>%
   addProviderTiles(providers$Stamen.TonerLines) %>%
  
  addMiniMap(
    tiles = providers$Stamen.TonerLite,
    position = 'topleft', 
    width = 100, height = 100,
    toggleDisplay = FALSE) %>%
  
  
   #addControl(title, position = "topleft", className="map-title") %>%
  
  
  fitBounds(exte[1], exte[3], exte[2], exte[4]) %>% addPolygons(fillColor = ~pal(PC1_map$PC1),  fillOpacity = .8, weight = 1, stroke = TRUE, smoothFactor = 0.5, color = "black") %>% addPolygons(data = states, fillOpacity = 0, weight = 4, stroke = TRUE, smoothFactor = 0.5, color = "black") %>% 
  
    addLegend(pal = pal, values = na.omit(~PC1), group = "circles", title = paste("PC1 Loadings", sep = ""), na.label = "", position = "topright") %>%

  
  #label = lapply(labs, HTML),

addLabelOnlyMarkers(data = PC2_map, lng = lat_long[,1], lat = lat_long[,2],  label = lapply(round(PC1_map$PC1, 2), HTML), labelOptions = labelOptions(style = list("font-family" = "serif"), noHide = TRUE, direction = 'middle', textOnly = TRUE, textsize = "20px", col = "white",  offset = c(20, 0), markerOptions(riseOnHover = TRUE)
)) 
        
addScaleBar(map, position = c("topright"), options = scaleBarOptions())

#-map PC loadings

PC1_map <- as.data.frame(fit$x[,1])
PC1_map$NAME <- rownames(PC1_map)
colnames(PC1_map) <- c("PC1", "NAME")
PC1_NAME <- cbind(PC1_map, t(as.data.frame(strsplit(PC1_map$NAME, split='_', fixed=TRUE))))
colnames(PC1_NAME) <- c("PC1", "Statecounty", "NAME", "STATE")

PC1_map <- merge(counties, PC1_NAME, by=c("NAME"), all = FALSE)

PC2_map <- as.data.frame(fit$x[,2])
PC2_map$NAME <- rownames(PC2_map)
colnames(PC2_map) <- c("PC2", "NAME")
PC2_NAME <- cbind(PC2_map, t(as.data.frame(strsplit(PC2_map$NAME, split='_', fixed=TRUE))))
colnames(PC2_NAME) <- c("PC2", "Statecounty", "NAME", "STATE")

PC2_map <- merge(counties, PC2_NAME, by=c("NAME"), all = FALSE)


#PC1 map

#-----

PC1_map$PC1[is.na(PC1_map$PC1)] <- 0

 
## Make vector of colors for values smaller than 0 (20 colors)
rc1 <- colorRampPalette(colors = c("blue", "lightblue", "white"), space = "Lab")(14)

## Make vector of colors for values larger than 0 (180 colors)
rc2 <- colorRampPalette(colors = c("white", "red"), space = "Lab")(8)

## Combine the two color palettes
rampcols <- c(rc1, rc2)



# pal <- colorNumeric(colorRampPalette(c("blue", "lightblue", "white", "lightpink", "red"))(7), na.color = "#ffffff",domain = eval(parse(text=paste("PC1_map$", "PC1", sep=""))))
 pal <- colorNumeric(palette = rampcols, na.color = "#ffffff",domain = eval(parse(text=paste("PC2_map$", "PC2", sep=""))))

map <- leaflet(data = PC2_map, options = leafletOptions(zoomControl = FALSE, zoomSnap = 0.1)) %>%  
  addProviderTiles(providers$Hydda.Base) %>%
   addProviderTiles(providers$Stamen.TonerLines) %>%
  
  addMiniMap(
    tiles = providers$Stamen.TonerLite,
    position = 'topleft', 
    width = 100, height = 100,
    toggleDisplay = FALSE) %>%
  
  
   #addControl(title, position = "topleft", className="map-title") %>%
  
  
  fitBounds(exte[1], exte[3], exte[2], exte[4]) %>% addPolygons(fillColor = ~pal(PC2_map$PC2),  fillOpacity = .8, weight = 1, stroke = TRUE, smoothFactor = 0.5, color = "black") %>% addPolygons(data = states, fillOpacity = 0, weight = 4, stroke = TRUE, smoothFactor = 0.5, color = "black") %>% 
  
    addLegend(pal = pal, values = na.omit(~PC2), group = "circles", title = paste("PC2 Loadings", sep = ""), na.label = "", position = "topright") %>%

  
  #label = lapply(labs, HTML),

addLabelOnlyMarkers(data = PC2_map, lng = lat_long[,1], lat = lat_long[,2],  label = lapply(round(PC2_map$PC2, 2), HTML), labelOptions = labelOptions(style = list("font-family" = "serif"), noHide = TRUE, direction = 'middle', textOnly = TRUE, textsize = "20px", col = "white",  offset = c(20, 0), markerOptions(riseOnHover = TRUE)
)) 
        
addScaleBar(map, position = c("topright"), options = scaleBarOptions())

Step 16: PCA + KMEANS: IPNW WHEAT insurance loss, by YEAR with DAMAGE CAUSE loadings: 2001 to 2015

In Step 16, we perform a PCA for the insurance loss for the inland PNW, for wheat, by year, with damage cause as the factor loadings. Data from 2001 is 2015 is used. We additionally have generated a scree plot that shows the proportion of variance explained by the individual components.

#IPNW damage cause as loadings using years as groups

usePackage("ggplot2")
usePackage("ggfortify")
usePackage("reshape2")

RMA_damage_IPNW_wheat <- subset(RMA_damage_IPNW, commodity == "WHEAT")


RMA_damage_IPNW_transformed_year <- dcast(RMA_damage_IPNW_wheat, year ~ damagecause, value.var = c("loss"), sum)
#RMA_damage_PNW_transformed_year$statecounty <- paste(RMA_damage_PNW_transformed_year$county, "_", RMA_damage_PNW_transformed_year$state, sep="")
rownames(RMA_damage_IPNW_transformed_year) <- RMA_damage_IPNW_transformed_year$year
#RMA_damage_IPNW_exogenous_year <- RMA_damage_IPNW_transformed_year[,2:28]
RMA_damage_IPNW_exogenous_year <- RMA_damage_IPNW_transformed_year

#IPNW with years as the group

RMA_damage_IPNW_exogenous_year <- cbind(RMA_damage_IPNW_exogenous_year[3:4], RMA_damage_IPNW_exogenous_year[7], RMA_damage_IPNW_exogenous_year[9], RMA_damage_IPNW_exogenous_year[,12:15], RMA_damage_IPNW_exogenous_year[18])

fit <- prcomp(RMA_damage_IPNW_exogenous_year, cor=TRUE, center = TRUE, scale = TRUE)

wss <- (nrow(RMA_damage_IPNW_exogenous_year)-1)*sum(apply(RMA_damage_IPNW_exogenous_year,2,var))
for (i in 1:14) wss[i] <- sum(kmeans(RMA_damage_IPNW_exogenous_year,
                                     centers=i)$withinss)
plot(1:14, wss, type="b", xlab="Number of Clusters",
     ylab="Within groups sum of squares", main = "Kmeans YEARS elbow plot")

usePackage("RColorBrewer")
usePackage("scales")

comp <- data.frame(fit$x[,1:3])

k <- kmeans(comp, 3, nstart=25, iter.max=1000)

palette(alpha(brewer.pal(9,'Set2'), 1))
#plot(comp, col=k$clust, pch=2)

kclust <- data.frame(k$clust)
kclust$k.clust <- factor(kclust$k.clust)

#ggplot2::autoplot(prcomp(RMA_damage_IPNW_exogenous_year, center = TRUE, scale = TRUE), colour = 'k.clust', data = kclust, loadings = TRUE, loadings.label = FALSE, loadings.label.size  = 6, label = FALSE, label.size = 5, legend = FALSE)  + theme_bw()  + geom_text(vjust=-1, label=rownames(RMA_damage_IPNW_exogenous_year)) + theme(legend.position = "none") + ggtitle("Principle Components: IPNW insurance loss\nby year with damage cause loadings") + theme(plot.title = element_text(size =28, family = "Serif")) + theme(axis.title.y = element_text(family = "Serif", size=16), axis.title.x = element_text(family = "Serif", size = 16), axis.text.x = element_text(size=rel(1.5), hjust = 1, family = "Serif"), axis.text.y = element_text(size=rel(1.5), hjust = 1, family = "Serif")) 

ggplot2::autoplot(prcomp(RMA_damage_IPNW_exogenous_year, scale = TRUE, center = TRUE), frame = TRUE, frame.type = 'norm', data = kclust, colour = 'k.clust', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 4, label = FALSE, label.size = 3, legend = FALSE)  + theme_bw()   + geom_text(vjust=-1, label=rownames(RMA_damage_IPNW_exogenous_year)) + theme(legend.position = "none") + ggtitle("Principle Components: IPNW insurance loss\nby year with damage cause loadings ") + theme(plot.title = element_text(size =28, family = "Serif")) + theme(axis.title.y = element_text(family = "Serif", size=16), axis.title.x = element_text(family = "Serif", size = 16), axis.text.x = element_text(size=rel(1.5), hjust = 1, family = "Serif"), axis.text.y = element_text(size=rel(1.5), hjust = 1, family = "Serif")) + theme(legend.position="bottom")

#ggplot2::autoplot(prcomp(RMA_damage_IPNW_exogenous, scale = TRUE, center = TRUE), data = RMA_damage_IPNW_transformed, colour = 'county', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 3, label = FALSE, label.size = 3, legend = FALSE) + geom_text(vjust=-1, label=rownames(RMA_damage_IPNW_exogenous)) + theme(legend.position = "none")

#pca_RMA_damage_IPNW_exogenous$rotation

#plot(pca_RMA_damage_IPNW_exogenous$rotation)

std_dev <- fit$sdev

pr_var <- std_dev^2

#pr_var[1:10]

prop_varex <- pr_var/sum(pr_var)

plot(prop_varex, xlab = "Principal Component",
             ylab = "Proportion of Variance Explained",
             type = "b", main = "Scree Plot")

Step 17: PCA:IPNW WHEAT insurance loss, by YEAR with DAMAGE CAUSE loadings: 2001 to 2015: PC1 vs PC2 barplot

In Step 17, we plot PC1 loadings vs PC2 by year, as a barplot, to visualize orthogonal/opposing patterns.

#--year end

fit <- prcomp(RMA_damage_IPNW_exogenous_year, cor=TRUE, center = TRUE, scale = TRUE)

PC_table_year <- cbind(fit$x[,1], fit$x[,2])
colnames(PC_table_year) <- c("PC1", "PC2")

par(mar=c(6, 6.1, 4, 2.1), family = 'serif', mgp=c(4, 1, 0), las=0)

barplot(t(PC_table_year), beside = T, las = 2, col=c("blue", "red"), cex.axis = 1.5, cex.names = 1.5)

#legend("top", c("PC1","PC2"), pch=15, 
#col=c("blue","red"), 
#bty="n")

Step 18: PCA: IPNW WHEAT loss by YEAR with DAMAGE CAUSE loadings: PC1 and PC2 table

In Step 18, we summarize our PCA findings for the inland PNW, for wheat by year, using damage cause as the factor loadings. Here we plot PC1 and PC2 loadings as a table.

#--create table of PC1 and PC2 loadings

PC_table <- cbind(fit$rotation[,1], fit$rotation[,2])
colnames(PC_table) <- c("PC1", "PC2")
PC_table <- as.data.frame(PC_table)
PC_table$PC1 <- round(PC_table$PC1, 3)
PC_table$PC2 <- round(PC_table$PC2, 3)

htmlTable(PC_table,
          cgroup = c("2001-2015"),
          n.cgroup = c(2),
          caption = "Damage cause variable loadings by YEAR",
          align = "c",
          rnames = rownames(PC_table),
          css.cell = "padding-left: .5em; padding-right: .5em;")

	2001-2015
Damage cause variable loadings by YEAR
	PC1	PC2
Cold Wet Weather	0.341	-0.455
Cold Winter	-0.374	-0.43
Drought	-0.397	-0.29
Excess Moisture/Precip/Rain	0.37	-0.445
Fire	-0.3	-0.262
Flood	0.402	-0.383
Freeze	-0.369	-0.275
Frost	-0.116	0.118
Hot Wind	-0.219	-0.146

Step 19: EXPERIMENTAL: PCA + KMEANS: WHEAT IPNW insurance loss, by COUNTY with DAMAGE CAUSE loadings: 2001 to 2015

In Step 19, we perform a PCA for the insurance loss for the inland PNW, for wheat by county, with damage cause as the factor loadings. Data from 2001 is 2015 is used. We additionally have generated a scree plot that shows the proportion of variance explained by the individual components.

#just the IPNW

usePackage("ggplot2")
usePackage("ggfortify")
usePackage("reshape2")

RMA_damage_IPNW$log10loss <- log10(RMA_damage_IPNW$loss)


RMA_damage_IPNW_wheat <- subset(RMA_damage_IPNW, commodity == "WHEAT")
RMA_damage_IPNW_transformed <- dcast(RMA_damage_IPNW_wheat, county + state + year ~ damagecause, value.var = c("loss"), sum)
RMA_damage_IPNW_transformed$statecounty <- paste(RMA_damage_IPNW_transformed$county, "_", RMA_damage_IPNW_transformed$state, "_", RMA_damage_IPNW_transformed$year, sep="")
rownames(RMA_damage_IPNW_transformed) <- RMA_damage_IPNW_transformed$statecounty
RMA_damage_IPNW_exogenous <- RMA_damage_IPNW_transformed[, -c(1,2,3,4,7,10,12,13,28)]

#RMA_damage_IPNW_exogenous <- RMA_damage_IPNW_transformed[,4:27]


#IPNW

#RMA_damage_IPNW_exogenous <- scale(RMA_damage_IPNW_exogenous, scale = TRUE, center = TRUE)
pca_RMA_damage_IPNW_exogenous <- prcomp(RMA_damage_IPNW_exogenous, scale = TRUE, center = TRUE)

RMA_damage_IPNW_exogenous_factor <- cbind(RMA_damage_IPNW_exogenous[2:3], RMA_damage_IPNW_exogenous[5:6], RMA_damage_IPNW_exogenous[8], RMA_damage_IPNW_exogenous[11], RMA_damage_IPNW_exogenous[13:14], RMA_damage_IPNW_exogenous[16:17])

fit <- prcomp(RMA_damage_IPNW_exogenous_factor, cor=TRUE, center = TRUE, scale = TRUE)

comp <- data.frame(fit$x[,1:2])
# Plot
#plot(comp, pch=16, col=rgb(0,0,0,0.5))


wss <- (nrow(RMA_damage_IPNW_exogenous)-1)*sum(apply(RMA_damage_IPNW_exogenous,2,var))
for (i in 1:23) wss[i] <- sum(kmeans(RMA_damage_IPNW_exogenous,
                                     centers=i)$withinss)
plot(1:23, wss, type="b", xlab="Number of Clusters",
     ylab="Within groups sum of squares", main = "KMeans COUNTIES Elbow Plot")

  #identify the optimum number of clusters using NbClust
  par(mar=c(1,1,1,1))
  usePackage("NbClust")
  usePackage("factoextra")
  
  clustering_distance <- c("euclidean")
  clust<-NbClust(fit$x[,1:3], diss=NULL, distance = clustering_distance, min.nc=2, max.nc=20, method = "kmeans", index = "all")

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
##

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 5 proposed 2 as the best number of clusters 
## * 5 proposed 3 as the best number of clusters 
## * 1 proposed 4 as the best number of clusters 
## * 3 proposed 5 as the best number of clusters 
## * 4 proposed 10 as the best number of clusters 
## * 4 proposed 16 as the best number of clusters 
## * 1 proposed 20 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  2 
##  
##  
## *******************************************************************

#  clust<-NbClust(damage_loss[,1:(length(waterscarcity)-4)], diss=NULL, distance = clustering_distance, min.nc=2, max.nc=20, method = "kmeans", index = "all") 
  
  #res$All.index
  #res$Best.nc
  #res$All.CriticalValues
  #res$Best.partition
  
  maxclusters <- length(unique(clust$Best.partition))
  km.res <- kmeans(RMA_damage_IPNW_exogenous, maxclusters, nstart = 25)
  #pam.res <- pam(RMA_damage_IPNW_exogenous, maxclusters)
  hc.cut <- hcut(RMA_damage_IPNW_exogenous, k = maxclusters, hc_method = "complete")

usePackage("RColorBrewer")
usePackage("scales")

k <- kmeans(fit$x, maxclusters, nstart=25, iter.max=1000)

palette(alpha(brewer.pal(9,'Set2'), 1))
#plot(comp, col=k$clust, pch=2)


kclust <- data.frame(k$clust)
kclust$k.clust <- factor(kclust$k.clust)

fit <- prcomp(RMA_damage_IPNW_exogenous, scale = TRUE, center = TRUE)

#biplot(prcomp(RMA_damage_IPNW_exogenous, scale = TRUE, center = TRUE, pc.biplot = TRUE))

ggplot2::autoplot(prcomp(RMA_damage_IPNW_exogenous, scale = TRUE, center = TRUE), frame = TRUE, frame.type = 'norm', data = kclust, colour = 'k.clust', loadings = TRUE, loadings.label = TRUE, loadings.label.size  = 4, label = FALSE, label.size = 3, legend = FALSE)  + theme_bw()   + geom_text(vjust=-1, label=rownames(RMA_damage_IPNW_exogenous)) + theme(legend.position = "none") + ggtitle("Principle Components: IPNW insurance loss\nby county with damage cause loadings ") + theme(plot.title = element_text(size =28, family = "Serif")) + theme(axis.title.y = element_text(family = "Serif", size=16), axis.title.x = element_text(family = "Serif", size = 16), axis.text.x = element_text(size=rel(1.5), hjust = 1, family = "Serif"), axis.text.y = element_text(size=rel(1.5), hjust = 1, family = "Serif"))  + theme(legend.position="bottom")

usePackage("magrittr")
usePackage("plotly")

#pca_RMA_damage_IPNW_exogenous$rotation

#plot(pca_RMA_damage_IPNW_exogenous$rotation)

std_dev <- fit$sdev

pr_var <- std_dev^2

#pr_var[1:10]

prop_varex <- pr_var/sum(pr_var)

plot(prop_varex, xlab = "Principal Component",
             ylab = "Proportion of Variance Explained",
             type = "b", main = "Scree Plot")

loads <- fit$rotation[,1:3]
fit1 <- as.data.frame(fit$rotation[,1:3])

fit2 <- merge(fit$x, kclust, by="row.names",all.x=TRUE)

#p <- plot_ly(fit2, x = ~PC2, y = ~PC1, z = ~PC3, color = ~k.clust, colors = c('#BF382A', '#0C4B8E')) 

p <- plot_ly() %>%
  add_trace(x=fit2$PC1, y=fit2$PC2, z=fit2$PC3,
            type="scatter3d", mode="markers",
            marker = list(color=fit2$k.clust, 
               colorscale = c("#FFE1A1", "#683531"), 
               opacity = 0.7)) 

for (k in 1:nrow(loads)) {
   x <- c(0, loads[k,1])
   y <- c(0, loads[k,2])
   z <- c(0, loads[k,3])
   p <- p %>% add_trace(x=x, y=y, z=z,
            type="scatter3d", mode="lines",
            line = list(width=8),
            opacity = 1) 
}

print(p)