My Blog

All blogs, in their entirety, are posted on my Medium profile. For those without a paid subscription, follow the friend links provided with the posts below.

  • Joe Robinson

Get set up to work remotely in PyCharm in < 5 minutes!

The connection lets your local machine sync with a remote one whenever changes are made on either side.
Configure once, then connect each time!

PyCharm, for me, is a great IDE: it is complete with features that promote effective programming, it has a community devoted to sharing clever plug-ins, and, my favorite trait, Professional licenses are free to students. With this, the JetBrains Toolbox and its many IDEs (one for most modern programming languages) are available to students for free, no strings attached. Bravo, JetBrains! Free for students is a model that more products should embrace.

Read the complete blog on Medium (link).

Updated: Mar 13

Families In the Wild (FIW), the largest and most comprehensive dataset for visual kinship recognition.

Problem Formulation

The goal of kinship verification is to determine whether a pair of faces of different subjects are kin of a specific type, such as parent-child. This is a classical Boolean problem: the system responds with either KIN or NON-KIN (that is, true or false, respectively). This formulation is the one-vs-one paradigm of automatic kinship recognition.
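To make the Boolean formulation concrete, here is a minimal sketch of a one-vs-one decision. It assumes each face has already been turned into a numeric embedding by some face encoder, and it simply thresholds cosine similarity. The function names, the toy vectors, and the threshold of 0.5 are all hypothetical choices for illustration, not the method used for FIW.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def verify_kinship(emb_a, emb_b, threshold=0.5):
    """Return True (KIN) or False (NON-KIN) for one pair of embeddings.

    The threshold is a hypothetical value; in practice it would be
    tuned on held-out pairs for the relationship type of interest.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy embeddings (assumed, not real face features).
father = [0.9, 0.1, 0.3]
son = [0.8, 0.2, 0.4]
stranger = [-0.5, 0.9, -0.2]

print("father/son:", "KIN" if verify_kinship(father, son) else "NON-KIN")
print("father/stranger:", "KIN" if verify_kinship(father, stranger) else "NON-KIN")
```

Whatever model produces the score, the system's final response for a pair is a single true or false.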

Given a pair of faces, the task here is to determine whether or not they form a father-son pair.

Finish the blog on Medium, Friend Link.

  • Joe Robinson

Updated: Mar 13

In terms of manual labor, preparing data for an ML pipeline takes the majority of the time more often than not. Furthermore, building or extending a database usually costs an enormous amount of time, many subtasks, and close attention to detail. The latter led me to a great command-line tool for cleaning out duplicates and near-duplicates, especially when used with iTerm2 (or iTerm): imgdupes.

Note that the aim here is to introduce imgdupes. See the reference for technical details on specifications, algorithms, options, and such (or stay tuned for a future post on the details).

Problem Statement: De-Duplicating an Image Set.

My situation while building a facial image database was as follows: a root directory of multiple subdirectories, with each subdirectory containing the images for its respective class. This is a common scenario in ML tasks, as many renowned datasets follow this convention: class samples are separated by directory, both for convenience and to serve as explicit labels. In my case, I was cleaning face data, and each subdirectory was named after the identity of the faces it contained.

I knew there were several duplicates and near-duplicates (e.g., neighboring video frames), and that these would hurt the problem I aimed to solve, so I needed an algorithm or tool to find them. Precisely, I needed a tool to discover, display, and prompt me to delete all duplicate images. I was fortunate to stumble upon a wonderful Python-based command-line tool called imgdupes.
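For a feel of the problem, here is a rough sketch that walks a class-per-subdirectory layout and groups files whose bytes are identical. This is not how imgdupes works: it is a standard-library stand-in that catches only exact copies, not near-duplicates such as neighboring video frames, which need image hashing of the kind imgdupes provides. The function name and the example path are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(root):
    """Group files under `root` (searched recursively) by SHA-256 digest.

    Returns a list of groups; each group is a list of paths whose
    contents are byte-for-byte identical. Near-duplicates are NOT found.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    # "faces" is an assumed directory name: one subdirectory per identity.
    for group in find_exact_duplicates("faces"):
        print("duplicates:", ", ".join(str(p) for p in group))
```

The sketch only reports groups and leaves deletion as a manual step, since removing images from a dataset is worth a visual check first.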

Friend Link to finish reading on Medium.