Working with real data is a rather unique and intriguing experience. I find no end of strange, unforeseen quirks that I’m never entirely certain how to handle. I’m working through the nuts and bolts of importing and processing my market data for FFXIV. As a side note, I must admit I underestimated R.
So, big data plan aside, I was attempting to consolidate 13 days of market data into a single data set. Most of the shenanigans revolved around navigating the ins and outs of data structure manipulation, but I was victorious! Kinda.
I ended up with some interesting entries like this one. My current sorting method considers these two separate entries because the price is different. In reality, it’s the same 205 units from two different time periods; the seller just dropped the price at some point. If I were to consider the entire collection as a whole data set, what should be done with this? Which is more accurate, the higher listing or the lower one? Both represent a legitimate attempt to sell goods. In practice, though, I don’t plan to treat all the days combined as a single data set.
I’m going to run the stats on a per-day basis. Daily average unit price, total quantity, concentration, and elasticity for each good/server combination. That should allow me to see how these things are generally behaving over time and support good v. good and server v. server comparisons between data sets.
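The actual pipeline lives in R, but to make the per-day plan concrete, here is a minimal Python sketch. The record layout, item names, and seller IDs are invented for illustration, and the concentration measure is one possible choice (a Herfindahl-style index over seller shares), not necessarily the one I’ll end up using:

```python
from collections import defaultdict

# Hypothetical record layout, invented for illustration:
# (day, good, server, seller, units, unit_price)
listings = [
    ("2021-05-01", "Mythril Ore", "Adamantoise", "seller_a", 99, 210),
    ("2021-05-01", "Mythril Ore", "Adamantoise", "seller_b", 50, 205),
    ("2021-05-02", "Mythril Ore", "Adamantoise", "seller_a", 99, 147),
]

def daily_stats(listings):
    """Per day/good/server combination: quantity-weighted average unit price,
    total quantity, and a Herfindahl-style seller concentration."""
    groups = defaultdict(list)
    for day, good, server, seller, units, price in listings:
        groups[(day, good, server)].append((seller, units, price))
    stats = {}
    for key, rows in groups.items():
        total_units = sum(units for _, units, _ in rows)
        # Quantity-weighted average unit price for the day
        avg_price = sum(units * price for _, units, price in rows) / total_units
        # Concentration: sum of squared seller shares (1.0 = one seller holds it all)
        by_seller = defaultdict(int)
        for seller, units, _ in rows:
            by_seller[seller] += units
        hhi = sum((u / total_units) ** 2 for u in by_seller.values())
        stats[key] = {"avg_price": avg_price, "total_units": total_units, "hhi": hhi}
    return stats
```

Elasticity is deliberately left out of this aggregation, since it comes from a regression rather than a simple per-day summary.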
Elasticity is the odd one out, though. Since it will be estimated by regression in the first place, I have to consider whether it’s worth retaining statistically insignificant results in the data set. I could be off base here, but I believe that insignificant in the statistical sense simply indicates there isn’t a clear relationship between Q and P. It might be worthwhile to compare significant-only results against all values, just to see what there is to see.
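As a sketch of what that regression looks like: a common approach is to fit ln(Q) on ln(P), where the fitted slope is the elasticity estimate. This Python illustration hand-rolls the OLS math to stay dependency-free; the data and function name are invented, and a proper significance check would test the slope’s t-statistic rather than just eyeballing R²:

```python
import math

def log_log_elasticity(prices, quantities):
    """OLS fit of ln(Q) on ln(P); the slope is the price-elasticity estimate.
    Returns (slope, r_squared). A real significance test would compute the
    slope's t-statistic and p-value on top of these sums."""
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared

# Invented data following Q = 1000 / P^2, i.e. an elasticity of -2:
prices = [100, 120, 150, 200]
quantities = [1000 / p ** 2 for p in prices]
slope, r2 = log_log_elasticity(prices, quantities)
```

On real, noisy data, an “insignificant” slope means the fit can’t distinguish the elasticity from zero, which matches the “no clear relationship between Q and P” reading above.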
Anyway, I’m off track. The same set of goods being listed multiple times, so that 205 units show up in the result as 410, was not the only instance of something like this occurring.
I picked this example out as well, for that reason. I went and looked over the day-by-day data to see what was going on. A seller (not the same one as in the other picture) initially posted 3 listings of 8 units each, 24 units total, at 210 gil. The very next day, they appear to have reduced the unit price to 147 gil. The goods then sat at that price for a couple of days before someone bought a single 8-unit lot, which generated that 16-unit entry. It sat for another day or two before someone bought another 8-unit lot, giving us the last entry, which also eventually sold.
So now I have 24 real units that would show up in the consolidated data as 72: 24 at 210 gil and 48 at 147.
I’m going to abandon that specific avenue of consolidation for now. I think that if I date-stamped the records, I could probably pull out the initial listing quantity and the final unit price, which would accurately read 24 units at 147 gil.
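As a sketch of that date-stamp idea, assuming snapshots of a single seller’s listings for one good, ordered by date (the dates below are invented, but the quantities mirror the example above):

```python
# Hypothetical date-stamped snapshots: (date, total_units_listed, unit_price)
snapshots = [
    ("day 1", 24, 210),  # 3 lots of 8 posted at 210 gil
    ("day 2", 24, 147),  # seller drops the price
    ("day 4", 16, 147),  # one 8-unit lot sells
    ("day 6", 8, 147),   # another lot sells
]

def consolidate(snapshots):
    """Collapse a run of snapshots into the initial listing quantity
    and the final unit price, instead of double-counting every reprice."""
    initial_units = snapshots[0][1]  # quantity when first observed
    final_price = snapshots[-1][2]   # last observed unit price
    return initial_units, final_price

consolidate(snapshots)  # → (24, 147): 24 units at 147 gil, not 72 units
```

This only handles the simple one-seller case; matching snapshots to the same underlying lots across days is the part that actually takes work.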
I would rather invest my time in the “original” plan first, then follow up with this if I feel it needs doing.
Y’all take care, and remember to go check all your listing prices. You never know when your change might be the one that generates a blog post.