The information uncovered could be used to: The analyses could be further enriched with store, client and item information, if available. Explore how much is spent per transactionĪll of the above can be analyzed over the whole dataset or by region, by store or by time frame.Determine which are the most popular items sold.The goal is not to get every single item right, but to showcase critical thinking and domain knowledge.įor this scenario, some of the possible paths to explore are: The answer is not closed and will depend on previous experience and domain expertise. If they are leaky, it’s best to remove them from any sort of model. It’s easy to dismiss IDs as randomly generated values, but sometimes they encode information about the target variable. Another common mistake is to make data preparations, like normalization or outlier removal on the whole dataset, prior to splitting the dataset to validate the model, which is a leak of information.
0 Comments
Leave a Reply. |