SkinnyJoshPeck t1_isu0ui7 wrote

I hear ya; I think the point is less about proficiency and more about mastery -- in my case, I was marked down heavily since I didn't use iloc. Something like

df[df.col < 10]
vs
df[df.iloc[:, 0] < 10]

because I guess it makes it clearer to the reader, and it protects the code from depending on explicit column names; the fact that I didn't use it made me seem like I didn't know pandas well.

to your point, though, I see the importance in the infrastructure. In this case, it was for an ml scientist role where I wouldn't actually be doing any of the MLOps, just designing and tuning the models.

16

phb07jm t1_isudd5v wrote

Can someone please explain why the second is preferable? I would always do the first because it's more likely that the position of a column will change than the name.
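To make that concrete, here's a rough sketch with a made-up frame (the `other` column and all the values are invented for illustration; only `col` comes from the snippet above). Both filters agree while `col` happens to be first, and only the name-based one keeps working once the columns are reordered:

```python
import pandas as pd

# Hypothetical data: "col" is the first column, "other" is invented for the example.
df = pd.DataFrame({"col": [5, 20, 7], "other": [100, 3, 200]})

# While "col" sits in position 0, both filters select the same rows.
by_name = df[df.col < 10]
by_position = df[df.iloc[:, 0] < 10]

# Same data, but the upstream source now delivers the columns in a different order.
reordered = df[["other", "col"]]

# The name-based filter still tests "col"; the positional one now silently tests "other".
by_name_after = reordered[reordered.col < 10]
by_position_after = reordered[reordered.iloc[:, 0] < 10]

print(by_name_after.index.tolist())
print(by_position_after.index.tolist())
```

The positional version doesn't raise anything after the reorder, it just filters on the wrong column.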

21

silvershadow t1_isurezr wrote

Change the iloc to a loc and then I would maybe see the argument.

Assignments through .loc and .iloc are guaranteed to act on the original data frame, while chained [] indexing can in some cases act on a copy. Pandas makes no promises about which one you get.

So depending on what the full expression was, the criticism of using [] indexing could make sense. You'd need to see the full context of what OP was writing though.
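A minimal sketch of the kind of case where this matters, assuming the full expression involved an assignment (that's a guess, OP only showed a filter). The frame and its columns `a` and `b` are made up:

```python
import warnings

import pandas as pd

# Hypothetical frame; "a" and "b" are invented for the example.
df = pd.DataFrame({"a": [1, 50, 3], "b": [0, 0, 0]})

with warnings.catch_warnings():
    # pandas warns about this pattern; the warning is silenced here only to keep the demo quiet.
    warnings.simplefilter("ignore")
    # Chained indexing: the boolean filter builds a temporary copy,
    # so the assignment lands on that copy and never reaches df.
    df[df.a < 10]["b"] = 1

chained_result = df["b"].tolist()  # df is unchanged at this point

# A single .loc call selects rows and column together and writes into df itself.
df.loc[df.a < 10, "b"] = 1
loc_result = df["b"].tolist()

print(chained_result, loc_result)
```

For a plain read-only filter like the one OP posted, the distinction shouldn't change the result.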

From the sounds of what they wrote though, this is not the thinking the interviewer was following.

11

chief167 t1_isuculx wrote

Ok yeah well that's stupid. Because I am actually in favour of column names instead of indexes. Indexes are a pain in the ass when your incoming dataframe changes, it creates an implicit dependency.

But your last line is my point. You shouldn't be concerned about MLOps stuff, but if your model is already in the right framework, it saves soooo much time

11

monkeyunited t1_isufjc7 wrote

That’s dumb and violates the “explicit is better than implicit” rule.

8