Skip to content

Module cleanup#377

Draft
VamatoHD wants to merge 2 commits into
kaifcodec:mainfrom
VamatoHD:cleanup
Draft

Module cleanup#377
VamatoHD wants to merge 2 commits into
kaifcodec:mainfrom
VamatoHD:cleanup

Conversation

@VamatoHD

Copy link
Copy Markdown
Collaborator

As I was looking at newer modules, I found they need some optimizations. For instance, Result.available(extras) already ignores the fields that are None, therefore checks like the following are unnecessary:

extra = {}
if u.get("id"): extra["id"] = u.get("id")
if u.get("name"): extra["name"] = u.get("name")
if u.get("username"): extra["username"] = u.get("username")
if u.get("title"): extra["title"] = u.get("title")
if u.get("last_posted_at"): extra["last_posted"] = u.get("last_posted_at")
if u.get("last_seen_at"): extra["last_seen"] = u.get("last_seen_at")
if u.get("created_at"): extra["registered"] = u.get("created_at")

and can be collapsed into:

extra = {
    "id": u.get("id"),
    "name": u.get("name"),
    "username": u.get("username"),
    "title": u.get("title"),
    "last_posted": u.get("last_posted_at"),
    "last_seen": u.get("last_seen_at"),
    "registered": u.get("created_at"),
}

There are still a lot of files that require this change.

@VamatoHD VamatoHD added the help wanted Extra attention is needed label Jun 16, 2026
@kaifcodec

Copy link
Copy Markdown
Owner

@VamatoHD Thank you for pointing that out!

While the Result class does clean up None values, those if blocks are actually there to catch specific falsy values like 0 or False. When writing the data extraction logic, for about 10 of these modules, I noticed that if a user hasn't set fields like name or location, the servers sometimes return them as booleans (False) or integers (0) instead of None or an empty string.

If we blindly remove the if checks right now, the console tree outputs would start displaying clutter like name: False or id: 0.

However, I completely agree that the repetitive if checks make the code look bloated, and it’s something I've been meaning to clean up. A better long-term approach would be to update the Result class's filtering logic to handle these falsy values first then clean it.

@VamatoHD

Copy link
Copy Markdown
Collaborator Author

@kaifcodec I've added a couple of helper functions:

def add_extra_unchecked(self, key: str, value: Any): - Adds an extra without checking for 0 and False, only for None and empty strings;

def add_extra(self, key: str, value: Any): - Adds an extra, while checking for 0 and False - Default behavior for Result.available(), Result.taken() and Result.error();


def add_extras_unchecked(self, extras: dict[str, Any]):
def add_extras(self, extras: dict[str, Any]):

This are just wrappers that loop the dictionary and run for each key/value pair.

@kaifcodec

Copy link
Copy Markdown
Owner

@VamatoHD Appreciate the effort on this! Moving the filtering logic to the Result class is definitely the move, saves us a lot of headache with the False/0 stuff.

Though tbh, adding 4 new helper methods feels a bit over-engineered for just processing a simple dictionary. It adds too much extra stuff to the class API when we really just want one consistent rule across the whole project anyway. Also, let's just stick strictly to extra instead of allowing both extra and extras to keep our parameter naming clean.

We can actually get the exact same result by just dropping a clean check directly inside the original block without needing all the extra functions:

if "extra" in kwargs and isinstance(kwargs["extra"], dict):
    for key, value in kwargs["extra"].items():
        # Drop None, False, 0, and empty strings safely
        if value is None or value is False or value == 0 or value == "":
            continue
            
        # Drop whitespace-only strings
        if isinstance(value, str) and not value.strip():
            continue

        clean_key = key.strip().rstrip(":").strip().replace(" ", "_").lower()
        if not clean_key:
            continue

        if not isinstance(value, (bool, int)):
            value = str(value)

        self.extra[clean_key] = value

After this implementation we can safely remove the if checks and pass the raw dict to Result.

@VamatoHD

Copy link
Copy Markdown
Collaborator Author

@kaifcodec The problem from that is that some modules actually return 0 or False, for example:

Flickr ( __ ): Found
      ...
      ├── is_pro: False
      ├── photos_count: 0
      ├── follower_count: 1
      └── following_count: 1

Therefore, there needs to be way to allow both.

@kaifcodec

Copy link
Copy Markdown
Owner

@VamatoHD good catch there, I really forgot that.
Now the real talk is, I tried to think about many workarounds to handle both things but in no cases I can have the module structure exactly clean, no new parameters for Result.taken() and can pass raw dict without the if checks.

And when we try to implement the filtration in Result class it seems like we have to introduce new parameters and new nested helper functions.

But my preference is like we keep the structure of modules similar (i.e. no new parameters for Result.taken()) and pass the raw dict without repeatitive if blocks and Result handles filtration inside this block,

if "extra" in kwargs and isinstance(kwargs["extra"], dict):

Some approaches I imagined is like, hardcoding the key names to result.py and a check in there before passing that to .update() like below,

if "extra" in kwargs and isinstance(kwargs["extra"], dict):
    for key, value in kwargs["extra"].items():
        clean_key = key.strip().rstrip(":").strip().replace(" ", "_").lower()
        if not clean_key:
            continue

        if value is None or value == "":
            continue

        # Drop False/0 ONLY if they belong to typical profile fields
        if value is False or value == 0:
            if any(x in clean_key for x in ["name", "id", "location", "url", "user", "email"]):
                continue

        if isinstance(value, str) and not value.strip():
            continue

        if not isinstance(value, (bool, int)):
            value = str(value)

        self.extra[clean_key] = value

Check this out and tell me any other better approach if you have something on your mind which also aligns with the preferences I said earlier.

@VamatoHD

Copy link
Copy Markdown
Collaborator Author

@kaifcodec I don't like it being hardcoded as each module is unique.

The solution I see is to only have a function like add_extra_unchecked that accepts a dict, just filters None and is called by result.update and/or the module if a value can be 0 or False, something like:

def update(self, **kwargs):
    # Added "url" to the list of fields allowed for dynamic updates
    for field in ("username", "site_name", "category", "is_email", "url"):
        if field in kwargs and kwargs[field] is not None:
            setattr(self, field, kwargs[field])

    if "extra" in kwargs and isinstance(kwargs["extra"], dict):
        filtered = {k: v for k, v in kwargs["extra"].items() if v}
        self.add_extra_unchecked(filtered)

    return self

def add_extra_unchecked(self, extras: dict[str, Any]):
    for k, v in extras.items():
        if v is None or (isinstance(v, str) and not v.strip()):
            continue

        clean_key = k.strip().rstrip(":").strip().replace(" ", "_").lower()
        if not clean_key:
            continue

        if not isinstance(v, (bool, int)):
            v = str(v)

        self.extra[clean_key] = v

    return self

Each module that has values that don't need to be checked would call that function:

result.taken(extra=to_filter).add_extra_unchecked(not_to_filter)

@kaifcodec

Copy link
Copy Markdown
Owner

@VamatoHD The main issue here is developer velocity and the long-term maintenance of these modules.

If we have to split the data into two separate dictionaries, it forces module authors to spend a massive amount of time reverse-engineering every API payload. When I recently added over 80+ sites with deep data extraction logic, I had to test at least 5 different usernames per site just to see which fields were public (e.g., email, name, age, location). This is because users rarely fill out everything; one user sets their location but leaves their bio blank, while another does the opposite, meaning the API keys change or vanish entirely depending on the profile.

If we now have to manually track down which edge-case fields might return a raw boolean False or integer 0 when a user leaves data unset, we would have to test even more usernames just to figure out how to separate the keys into different functions.

That amount of friction just to add a single module isn't justified, especially since our goal is to scale up to hundreds of sites quickly. While the module-side if checks are repetitive, they at least save us from having to map out every single field's data types in advance.

I just want to avoid adding extra work to the API reverse-engineering process. It's already an intensive task to add a site cleanly without missing any fields, and we shouldn't make module creation a bottleneck.

@VamatoHD

Copy link
Copy Markdown
Collaborator Author

@kaifcodec I see what you mean, and I only have one idea left that allows both behaviors while allowing good maintenance.

It would be the implementation of the following simple class:

class Some: # The name can always change
    def __init__(self, value: Any) -> None:
        self.value = value

    def get(self) -> Any:
        return self.value

This allows for the following:

extra = {
    "registration": user_data.get("registration"),
    "gender": user_data.get("gender"),
    # The following are numbers, so they can be 0
    "userid": Some(user_data.get("userid")),
    "editcount": Some(user_data.get("editcount")),
}

Basically, it would be to wrap numbers and booleans, allowing them to bypass the check (None is always checked).

@kaifcodec

Copy link
Copy Markdown
Owner

@VamatoHD I see where you're coming from with the Some wrapper, but honestly, what I am trying to achieve might just be structurally impossible.

We are basically asking the machine to look at a raw False or 0 and magically know whether it represents "missing data" or a legitimate "is_pro: False" status. As we've both realized, Python simply cannot differentiate between them without us modifying the modules with extra wrapper functions or tracking parameters.

The issue with the Some class is that it still forces us to predict beforehand which values might return a raw falsy type when a user leaves a field blank. We would still be stuck hunting down 5 different usernames per site, one who didn't set their name, one who didn't set their bio, one who left their age blank and so on just to see if the backend returns a raw False or 0 boolean for those missing fields.

That specific reverse-engineering headache is exactly what I want to avoid. At this point, keeping the repetitive if checks in the modules is actually better and faster for developer velocity than spending hours testing and tracking down unique usernames just to map out every possible data combination for the Some() wrapper.

I think it's best we stick with the standard if checks for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

help wanted Extra attention is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants