Time Series Project

Greater Austin Area Investments

Once more I find myself at what I called, in my last blog post, my least favorite part of a project. But as I write my fourth blog post, I can say that it no longer holds that title. Recording myself presenting the project, and then presenting it to my instructor, has taken that crown.

Don’t get me wrong, I’m still not a huge fan of writing this, but there are worse things.

So what have I gotten myself into this time? Honestly, I’m still not entirely sure, but I’m at least comfortable with my results, even if I’m still not certain of the workflow to get to them.

The goal of this project is to analyze an area, in this case the Greater Austin Area, and determine the five best zip codes to invest in. For the purposes of this project, I treated “best” as providing the largest return on investment over a five-year period. Like my other projects, this was approached from the standpoint of a consultant for an investment company.

The dataset provided was a CSV file of Zillow data for various metro areas within the United States, so I began by pulling only the data for the Greater Austin Area, which includes 30 cities in 5 counties and a total of 71 zip codes. The data came in a wide format, with each zip code as its own row and columns for details such as the city and county it was in, plus the mean home sale price for each month from April of 1994 through April of 2018. For ease of access I first reshaped it into a longer, time-indexed format, with each zip code as its own column and each month as a row. I then created subsets holding the mean home sale price for each month by county and across the whole region, so I could compare at those levels as well.
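To make that reshaping step concrete, here is a rough sketch using pandas. The column names (`RegionName`, `City`, `CountyName`) and the tiny table are assumptions for illustration, with made-up prices; the real file has far more rows and columns, and this is not necessarily how my notebook did it.

```python
import pandas as pd

# Hypothetical miniature of the wide table: one row per zip code,
# descriptive columns first, then one column per month. Prices are made up.
wide = pd.DataFrame({
    "RegionName": [78733, 78612],
    "City": ["City A", "City B"],
    "CountyName": ["Travis", "Bastrop"],
    "2018-03": [700000.0, 200000.0],
    "2018-04": [710000.0, 202000.0],
})

def to_time_indexed(df, id_col="RegionName", meta_cols=("City", "CountyName")):
    """Return a frame with months as a DatetimeIndex and one column per zip code."""
    prices = df.drop(columns=list(meta_cols)).set_index(id_col)
    by_month = prices.T  # months become rows, zip codes become columns
    by_month.index = pd.to_datetime(by_month.index, format="%Y-%m")
    by_month.index.name = "Month"
    return by_month

def county_means(df, by_month, id_col="RegionName", county_col="CountyName"):
    """Average the zip-code columns within each county, month by month."""
    zip_to_county = df.set_index(id_col)[county_col]
    return by_month.T.groupby(zip_to_county).mean().T

by_month = to_time_indexed(wide)
by_county = county_means(wide, by_month)
region = by_month.mean(axis=1)  # region-wide mean price for each month
```

Keeping the months in a `DatetimeIndex` is what lets the later time series tools treat each column as a monthly series.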

After that, I performed some EDA to look for patterns, and decomposed the county-level data to see trend, seasonality, and residual plots for each to look for stationarity.
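A decomposition like that could be sketched with statsmodels as below. The additive model, the 12-month period, and the added Augmented Dickey-Fuller test are my assumptions for the example (in the project I mostly judged stationarity from the plots), and the series here is synthetic.

```python
import math

def decompose_and_test(series, period=12):
    """Split a monthly series into trend/seasonal/residual and run an ADF test."""
    # Lazy imports: statsmodels is only needed when this function is called.
    from statsmodels.tsa.seasonal import seasonal_decompose
    from statsmodels.tsa.stattools import adfuller

    parts = seasonal_decompose(series, model="additive", period=period)
    adf_stat, p_value = adfuller(series)[:2]
    return parts, adf_stat, p_value

# Synthetic monthly series (made up): linear trend plus a yearly cycle.
synthetic = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
```

Calling `decompose_and_test(synthetic)` returns the decomposition (with `trend`, `seasonal`, and `resid` attributes) along with the ADF statistic and p-value. A large p-value would suggest the series is not stationary and needs differencing.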

From there I tested various SARIMA parameters to determine the best combination for my model, and then ran the model with those parameters on my region-wide time series, checking the results against my ACF and PACF plots to see how well it did.
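For anyone curious what a parameter test like that can look like, here is a minimal sketch of a grid search that picks the combination with the lowest AIC. The search range (0 to 1 for each term), the 12-month seasonal period, and the helper names `sarima_grid` and `best_by_aic` are assumptions for the example, not the exact setup I used.

```python
import itertools
import warnings

def sarima_grid(max_order=1, seasonal_period=12):
    """Every (p, d, q) x (P, D, Q, s) combination with terms in 0..max_order."""
    pdq = list(itertools.product(range(max_order + 1), repeat=3))
    seasonal = [(P, D, Q, seasonal_period) for P, D, Q in pdq]
    return list(itertools.product(pdq, seasonal))

def best_by_aic(series, grid):
    """Fit each candidate; return (aic, order, seasonal_order) with the lowest AIC."""
    # Imported here so the grid helper above works without statsmodels installed.
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    best = None
    for order, seasonal_order in grid:
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")  # silence convergence warnings
                fit = SARIMAX(series, order=order, seasonal_order=seasonal_order,
                              enforce_stationarity=False,
                              enforce_invertibility=False).fit(disp=False)
        except Exception:
            continue  # some combinations fail to fit; skip them
        if best is None or fit.aic < best[0]:
            best = (fit.aic, order, seasonal_order)
    return best

grid = sarima_grid()  # 8 non-seasonal x 8 seasonal combinations
```

Passing the region-wide series and `grid` to `best_by_aic` would return the winning orders, which can then be used to fit the final model. A lower AIC only means a better trade-off between fit and complexity on that data, so the diagnostics still need a look afterwards.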

Originally, I created a further subset of my data based on the 2008 housing market crash, and only looked at everything from 2012 onward. With this subset, my model’s AIC was really good. However, it had a couple of outliers, both in the PACF plot and in the final results, that I wasn’t able to explain, so I went back and re-ran my parameter test and model on the full dataset, which gave results that were a lot smoother in the end.

2012–2018 Model

1994–2018 Model

As mentioned above, while there are a number of issues with this second model compared to the first, in the end I felt more comfortable with this one, because the first model ended with some pretty drastic outliers that I couldn’t explain, and wasn’t comfortable with. So with that said, here are the…

These results are all from the second model, using the full dataset.

Interestingly, 78733 was one of the worst zip codes in the first model, but for whatever reason it ended up surpassing 78612, which was the best in the first model.

As you can see from the title of that plot, the top five zip codes for a 5-year return on investment are 78733, 78612, 78616, 78731, and 78741.
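The ranking itself comes down to simple arithmetic: ROI is the forecast price minus the current price, divided by the current price. A small sketch of that calculation follows, with hypothetical function names and placeholder prices that are not results from the project.

```python
def five_year_roi(current_price, forecast_price):
    """Return on investment as a fraction of the current price."""
    return (forecast_price - current_price) / current_price

def top_zip_codes(current, forecast, n=5):
    """Rank zip codes by ROI, highest first, and keep the top n."""
    roi = {z: five_year_roi(current[z], forecast[z]) for z in current}
    return sorted(roi.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Placeholder prices keyed by made-up labels, not actual zip code data.
current = {"A": 100.0, "B": 200.0, "C": 400.0}
forecast = {"A": 150.0, "B": 220.0, "C": 500.0}
ranked = top_zip_codes(current, forecast, n=2)
```

Because ROI is relative, a cheaper zip code with a modest dollar gain can outrank an expensive one with a larger dollar gain.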

I feel like I have a lot left to learn about this particular subject. I stumbled my way through most of this project, using various examples to help figure out what I needed to do next and, in some cases, how to do it. I’m happy with my results, but not happy with my workflow, and I absolutely need to develop a workflow that fits with my thought process, because the one I used had me feeling confused for a large portion of the project.

I also need to go back and determine A: Why I got those outliers in my PACF plot and results with my first model, and B: How to fix those issues.

For now, I must wallow in my sadness at the fact that my zip code was the worst investment in the first model, and second worst in the second model, meaning that regardless of which model I use, this zip code is pretty horrible for investments at the moment.
