This post is by Lizzie.
At the end of a recent course I taught on Bayesian approaches (which reminds me I should blog an update on that) a student asked ‘so when do we divide up our data into test and training?’ This stopped me a little, as the whole course was on a workflow approach to science and stats that I hoped hammered home how to gain mechanistic insights from simulated data, preparing you for more insights using retrodictive checks on a model fit to your empirical data, etc. I was on the spot, suddenly realizing some gaps and failures in my course content. I also should not have been surprised, as ecologists are going in big time on machine learning (are there other uses for test/training data? Yes, but that’s the dominant place this language is in use in my field now, IMHO), and we (I) don’t step back and teach the different approaches.
In discussing this with a stats colleague recently he mentioned the endless search for automated inference. ‘Feed in data, pull crank, get scientific inference.’ It’s the opposite of the workflow to me. I also think it’s not going to work well, but it’s clearly the dream, and an alarming percentage of ecology is dedicated to it, without even realizing it.
Machine learning is the new best hope of automated inference for ecology (and many other fields) without anyone seeming to notice what they’re not getting. It’s amazing to me how many students seem blithely unaware of what machine learning is going to give you: (good) predictions for out-of-sample data, but a hard time finding interpretable parameters and all the science that can go with them. (And, yes, I know some of the machine learning approaches are working on changing this.) So they see it as the inference approach.
The previous best hope of automated inference was model comparison (LOO is the new magic, AIC was a big, BIG, hit, before that was stepwise regression with an alarming number of ecologists never learning any potential for problems with stepwise regression, but I digress) and it’s still running strong in some circles. Fit 6 or 600 or so models and compare them to see which is best. In my area, the models balloon since we don’t know what climatic driver to include. For example, I think water matters to trees growing outside, so for a precipitation variable, should I use total precipitation? Or maybe just during the growing season? Or, wait, maybe divide up growing and non-growing season. But then for the non-growing season, should I use snow depth? Snow water equivalent (SWE)? This is so hard, and there’s no clear answer.
Automatic inference to the rescue! You can put them all in with model comparison, along with a suite of potential interactions, and see which ones really matter. Yay!
Did this work? Not at all if you ask me. I recently saw a tree ring talk that did this, but you could tell the best fitting model actually made no biological sense when they thought about it more, so they presented the ‘second best model.’ And I’m pretty sure the second and third best models were pretty similar in any comparison metric you wanted to throw at them, and they might have had really different answers to how the world works. (Ecologists have tried one way around this, model averaging, which I don’t think offers much either.) I’m not sure why everyone is doing this other than that (1) we have all tacitly agreed it’s okay and (2) the other option seems harder, more uncertain, and maybe we have not all tacitly agreed it’s okay.
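The worry that near-tied models can tell different stories can be checked on simulated data. Below is a minimal sketch, not taken from the talk or from any real dataset: every number, variable name, and function name is a made-up assumption. Growth is simulated to depend only on growing-season precipitation, while total precipitation and SWE are merely correlated with it through a shared ‘wet year’ signal. Each candidate is fit as a single-predictor regression and scored with a Gaussian AIC (written up to an additive constant, which is fine when every model sees the same data). Because all three candidates have the same number of parameters, the AIC ranking here is simply the ranking by residual sum of squares.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, rss)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def aic(rss, n, k=3):
    """Gaussian AIC up to an additive constant; k = intercept, slope, sigma."""
    return n * math.log(rss / n) + 2 * k


def simulate_year(rng):
    """One hypothetical year; growth depends ONLY on growing-season precip."""
    wet = rng.gauss(0, 1)                            # shared 'wet year' signal
    grow = 300 + 50 * wet + rng.gauss(0, 30)         # growing-season precip (mm)
    dormant = 400 + 70 * wet + rng.gauss(0, 40)      # non-growing-season precip (mm)
    swe = 0.6 * dormant + rng.gauss(0, 30)           # snow water equivalent (mm)
    growth = 1.0 + 0.004 * grow + rng.gauss(0, 0.5)  # ring width (mm), invented
    return {"growing": grow, "total": grow + dormant, "swe": swe}, growth


def count_winners(n_reps=200, n_years=40, seed=1):
    """Tally which candidate predictor gets the lowest AIC in each replicate."""
    rng = random.Random(seed)
    wins = {"growing": 0, "total": 0, "swe": 0}
    for _ in range(n_reps):
        years = [simulate_year(rng) for _ in range(n_years)]
        y = [growth for _, growth in years]
        scores = {}
        for name in wins:
            x = [preds[name] for preds, _ in years]
            _, _, rss = fit_line(x, y)
            scores[name] = aic(rss, n_years)
        wins[min(scores, key=scores.get)] += 1
    return wins


if __name__ == "__main__":
    print(count_winners())
```

Calling `count_winners()` tallies how often each candidate ‘wins’ across simulated datasets where the truth is known. How often the true driver loses depends entirely on the invented effect size, noise, and correlations, so the useful exercise is to change those numbers and watch how the tally moves.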
What we have never gotten out of this, as best I can tell:
(a) We start to see new patterns in what matters in these model comparisons and say, ‘hey, all this work together really shows we should focus on SWE in this context. Thank goodness we did model comparison as there is no other way we would have figured this out.’
(b) We use something we learned in model comparison to design an experiment that teaches us something new. Like, ‘wow, I never thought extreme heat in August would be so important, I will now set up an experiment to test the role of extreme heat in August. I’m so glad I put that predictor (and extreme heat in every other month and in 3-month windows) in my model so I could find this out.’
(c) The feeling of pride at saying, ‘look at my minimal adequate model! This is great and so helpful.’
We never get these things because the results are almost always a mess. We all know this, as best I can tell, so we don’t even look closely at them as reviewers any more.
What’s the other option?
The other option to me is that you pick your few best-guess damn variables, the ones you can make predictions about and whose functional relationship to your response variable(s) you can describe, and you put those in your model. Maybe you fit a few models, but not endless models. In my experience, the first step in this process alone (picking these variables) gains me far more insights than any model comparison ever has. Why? Because it’s the opposite of automated inference. It requires me to think.
What’s the downside of this other option? One would be that we pick the wrong predictors and never see that super predictor we would have just tossed in on model comparison. But given where 20+ years of model comparison has gotten us, I’m discounting this possibility. The other (and this is what students in my classes are really worried about) is that we don’t all tacitly agree this is okay. Many students I suggest this to don’t think it’s okay. They see how widespread model comparison and its ilk are and worry they cannot get published without it. They aren’t even trained in how to pick these variables.
We’re so high on automated inference we don’t even train our students to be prepared for anything else. And worse yet, we tell them they’re doing (good) science.
With machine learning* we’re slipping even further away from science, and our training is getting even worse as best I can tell. Students at UBC in data science learn to ‘tidy’ data as if there is no domain expertise in this process. ‘Tidy’ means removing outliers, gap filling and other things that horrify me to see students learn in their first term. How on earth do they know what an outlier is when they don’t even know what the data are? After this they learn random forests and some simple neural nets. Science done.
What’s the solution? I desperately hope people smarter than me are working on this question. One answer is clearly raising our standards and discounting work that doesn’t really give us much from whatever model comparison they used. Another is better training: I think we all need to admit that training has got to change with machine learning on the rise. A lot of students I work with now only take data science; they learn only machine learning and don’t know what a regression is or think it’s anything they use. They need to see how interconnected all the inference methods are and what aims each works well on for now (and not), and be prepared that that might change. This seems tractable. What seems less tractable is better training in science: training students to understand there’s no automated inference for science, and that getting useful insights is actually messier, harder, and involves more uncertainty than most people tell you (but, if you ask me, it’s also a lot more fun).
*We’re somehow also now calling most of machine learning ‘AI’ in ecology. Are other fields doing this? Why (I mean, other than wanting to sound like you’re doing the absolute coolest, most cutting-edge thing)?