Module cleanup#377
Conversation
|
@VamatoHD Thank you for pointing that out! While the If we blindly remove the However, I completely agree that the repetitive |
|
@kaifcodec I've added a couple of helper functions:
This are just wrappers that loop the dictionary and run for each key/value pair. |
|
@VamatoHD Appreciate the effort on this! Moving the filtering logic to the Result class is definitely the move, saves us a lot of headache with the False/0 stuff. Though tbh, adding 4 new helper methods feels a bit over-engineered for just processing a simple dictionary. It adds too much extra stuff to the class API when we really just want one consistent rule across the whole project anyway. Also, let's just stick strictly to We can actually get the exact same result by just dropping a clean check directly inside the original block without needing all the extra functions: if "extra" in kwargs and isinstance(kwargs["extra"], dict):
for key, value in kwargs["extra"].items():
# Drop None, False, 0, and empty strings safely
if value is None or value is False or value == 0 or value == "":
continue
# Drop whitespace-only strings
if isinstance(value, str) and not value.strip():
continue
clean_key = key.strip().rstrip(":").strip().replace(" ", "_").lower()
if not clean_key:
continue
if not isinstance(value, (bool, int)):
value = str(value)
self.extra[clean_key] = valueAfter this implementation we can safely remove the |
|
@kaifcodec The problem from that is that some modules actually return Therefore, there needs to be way to allow both. |
|
@VamatoHD good catch there, I really forgot that. And when we try to implement the filtration in Result class it seems like we have to introduce new parameters and new nested helper functions. But my preference is like we keep the structure of modules similar (i.e. no new parameters for Result.taken()) and pass the raw dict without repeatitive if "extra" in kwargs and isinstance(kwargs["extra"], dict):Some approaches I imagined is like, hardcoding the key names to if "extra" in kwargs and isinstance(kwargs["extra"], dict):
for key, value in kwargs["extra"].items():
clean_key = key.strip().rstrip(":").strip().replace(" ", "_").lower()
if not clean_key:
continue
if value is None or value == "":
continue
# Drop False/0 ONLY if they belong to typical profile fields
if value is False or value == 0:
if any(x in clean_key for x in ["name", "id", "location", "url", "user", "email"]):
continue
if isinstance(value, str) and not value.strip():
continue
if not isinstance(value, (bool, int)):
value = str(value)
self.extra[clean_key] = valueCheck this out and tell me any other better approach if you have something on your mind which also aligns with the preferences I said earlier. |
|
@kaifcodec I don't like it being hardcoded as each module is unique. The solution I see is to only have a function like def update(self, **kwargs):
# Added "url" to the list of fields allowed for dynamic updates
for field in ("username", "site_name", "category", "is_email", "url"):
if field in kwargs and kwargs[field] is not None:
setattr(self, field, kwargs[field])
if "extra" in kwargs and isinstance(kwargs["extra"], dict):
filtered = {k: v for k, v in kwargs["extra"].items() if v}
self.add_extra_unchecked(filtered)
return self
def add_extra_unchecked(self, extras: dict[str, Any]):
for k, v in extras.items():
if v is None or (isinstance(v, str) and not v.strip()):
continue
clean_key = k.strip().rstrip(":").strip().replace(" ", "_").lower()
if not clean_key:
continue
if not isinstance(v, (bool, int)):
v = str(v)
self.extra[clean_key] = v
return selfEach module that has values that don't need to be checked would call that function: result.taken(extra=to_filter).add_extra_unchecked(not_to_filter) |
|
@VamatoHD The main issue here is developer velocity and the long-term maintenance of these modules. If we have to split the data into two separate dictionaries, it forces module authors to spend a massive amount of time reverse-engineering every API payload. When I recently added over 80+ sites with deep data extraction logic, I had to test at least 5 different usernames per site just to see which fields were public (e.g., email, name, age, location). This is because users rarely fill out everything; one user sets their location but leaves their bio blank, while another does the opposite, meaning the API keys change or vanish entirely depending on the profile. If we now have to manually track down which edge-case fields might return a raw boolean That amount of friction just to add a single module isn't justified, especially since our goal is to scale up to hundreds of sites quickly. While the module-side I just want to avoid adding extra work to the API reverse-engineering process. It's already an intensive task to add a site cleanly without missing any fields, and we shouldn't make module creation a bottleneck. |
|
@kaifcodec I see what you mean, and I only have one idea left that allows both behaviors while allowing good maintenance. It would be the implementation of the following simple class: class Some: # The name can always change
def __init__(self, value: Any) -> None:
self.value = value
def get(self) -> Any:
return self.valueThis allows for the following: extra = {
"registration": user_data.get("registration"),
"gender": user_data.get("gender"),
# The following are numbers, so they can be 0
"userid": Some(user_data.get("userid")),
"editcount": Some(user_data.get("editcount")),
}Basically, it would be to wrap numbers and booleans, allowing them to bypass the check ( |
|
@VamatoHD I see where you're coming from with the We are basically asking the machine to look at a raw The issue with the That specific reverse-engineering headache is exactly what I want to avoid. At this point, keeping the repetitive I think it's best we stick with the standard |
As I was looking at newer modules, I found they need some optimizations. For instance,
Result.available(extras)already ignores the fields that areNone, therefore checks like the following are unnecessary:and can be collapsed into:
There are still a lot of files that require this change.