Knn #21

Open
pasaunders wants to merge 22 commits into master from knn

Conversation

@pasaunders
Collaborator

No description provided.

Comment thread src/knn.py
"""Calculate the distance between two rows."""
dist = 0.0
for i in range(len(row1) - 1):
    dist += (row1[i] - row2[i]) ** 2

You're missing out on using the power of NumPy (or pandas) here to broadcast mathematical operations. If row1 and row2 are NumPy arrays, then you could just write:

return np.sqrt(np.sum((row1 - row2) ** 2))
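A minimal runnable sketch of that suggestion, assuming both rows contain only numeric feature values of equal length (the name euclidean_distance is hypothetical, not from the PR):

```python
import numpy as np


def euclidean_distance(row1, row2):
    """Euclidean distance between two equal-length numeric rows."""
    row1 = np.asarray(row1, dtype=float)
    row2 = np.asarray(row2, dtype=float)
    # Subtraction and squaring apply element-wise, so no explicit loop is needed.
    return np.sqrt(np.sum((row1 - row2) ** 2))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

For 1-D arrays, np.linalg.norm(row1 - row2) returns the same value.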

Owner

Written this way to account for the difference in length between rows: test data is submitted without a "classification" column, while the data already present in the model has one. Looping over range(len(row1) - 1) skips that last column.
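The length difference does not rule out broadcasting: the training row can be sliced to drop its label before subtracting. A minimal sketch, assuming the classification is the last element of the training row and the test row holds only the features in the same order (distance_ignoring_label is a hypothetical name):

```python
import numpy as np


def distance_ignoring_label(train_row, test_row):
    """Distance between a labeled training row and an unlabeled test row.

    Assumes the last element of train_row is the classification and that
    test_row holds only the feature values, in the same order.
    """
    # list(...) accepts lists, tuples, NumPy arrays, and pandas Series alike;
    # [:-1] drops the classification, like range(len(row1) - 1) in the PR.
    features = np.asarray(list(train_row)[:-1], dtype=float)
    test = np.asarray(test_row, dtype=float)
    return np.sqrt(np.sum((features - test) ** 2))


print(distance_ignoring_label([0.0, 0.0, "a"], [3.0, 4.0]))  # 5.0
```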

Comment thread src/knn.py Outdated

def predict(self, test_data, tk=None):
    """Given data, categorize the data by its k nearest neighbors."""
    if tk is None:

What's tk?

Comment thread src/knn.py Outdated
for row in self.data.iterrows():
    distances.append((row[1][-1], self._distance(row[1], test_data)))
distances.sort(key=lambda x: x[1])
# import pdb; pdb.set_trace()

Corpse code: the commented-out pdb breakpoint should be removed.
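Separately, the iterrows loop builds and sorts a Python list one row at a time; the same ordering can be computed for all rows at once. A minimal sketch, assuming the DataFrame's last column is the classification and the other columns are numeric features in the same order as the test row (nearest_labels is a hypothetical helper, not the PR's code):

```python
import numpy as np
import pandas as pd


def nearest_labels(data, test_row, k):
    """Return the labels of the k training rows closest to test_row.

    Assumes the last column of `data` is the classification and the other
    columns are numeric features in the same order as `test_row`.
    """
    features = data.iloc[:, :-1].to_numpy(dtype=float)
    labels = data.iloc[:, -1].to_numpy()
    # Broadcasting subtracts test_row from every training row in one step.
    diffs = features - np.asarray(test_row, dtype=float)
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    # A stable argsort orders rows by distance, like sorting (label, distance) tuples.
    order = np.argsort(distances, kind="stable")
    return labels[order[:k]]


train = pd.DataFrame(
    {"x": [0.0, 1.0, 5.0], "y": [0.0, 1.0, 5.0], "cls": ["a", "a", "b"]}
)
print(list(nearest_labels(train, [0.5, 0.5], 2)))  # ['a', 'a']
```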

Comment thread src/knn.py Outdated
if my_class:
    return my_class
else:
    return self.predict(test_data, tk - 1)

Confused as to why this has to be recursive.

Owner

@CCallahanIV Feb 22, 2017

Written for the case in which the classification is a "tie" between two classes. In that case, the classify function returns None, so predict is run again with k decreased by one. This is based on my interpretation of the algorithm in the class notes, which doesn't necessarily mean I interpreted it correctly.

https://codefellows.github.io/sea-python-401d5/lectures/k_nearest_neighbors.html?highlight=nearest
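The same tie-breaking rule (retry with a smaller k after a tie) can be written as a loop instead of a recursive call. A minimal sketch, assuming the neighbor labels are already sorted nearest-first; majority_vote and predict_label are hypothetical helpers, not the PR's classify and predict:

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label, or None if the top two counts tie."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def predict_label(sorted_labels, k):
    """Vote among the k nearest labels, shrinking k by one after each tie.

    sorted_labels must be ordered from nearest to farthest neighbor.
    """
    if k < 1 or len(sorted_labels) == 0:
        raise ValueError("need k >= 1 and at least one labeled neighbor")
    for tk in range(k, 0, -1):
        winner = majority_vote(sorted_labels[:tk])
        if winner is not None:
            return winner
    # Not reached: with tk == 1 there is a single vote, which cannot tie.


print(predict_label(["a", "b", "b", "a"], 4))  # b
```

With k = 4 the vote ties two to two, so the loop retries with k = 3 and returns "b". Because tk = 1 always yields a single vote, the loop is guaranteed to terminate with a label.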

3 participants