This page contains the activity log of the pyFF+ experiments and endeavours.
Memory profiling
This shows the bare import and usage of heapy to print heap information while running Python code.
...
```python
from lxml import etree
import xml.sax


class XML(xml.sax.handler.ContentHandler):
    def __init__(self):
        self.current = etree.Element("root")
        self.nsmap = {'xml': 'http://www.w3.org/XML/1998/namespace'}
        self.buffer = ''

    def startElement(self, name, attrs):
        attributes = {}
        for key, value in attrs.items():
            key = key.split(':')
            if len(key) == 2:
                if key[0] == 'xmlns':
                    # namespace declaration: remember prefix -> URI
                    self.nsmap[key[-1]] = value
                else:
                    # prefixed attribute: expand to Clark notation {uri}local
                    attributes[f"{{{self.nsmap.get(key[0], key[0])}}}{key[-1]}"] = value
            elif value:
                attributes[key[-1]] = value
        name = name.split(':')
        if len(name) == 2:
            name = f"{{{self.nsmap.get(name[0], name[0])}}}{name[-1]}"
        else:
            name = name[-1]
        self.current = etree.SubElement(self.current, name, attributes, nsmap=self.nsmap)

    def endElement(self, name):
        self.current.text = self.buffer
        self.current.tail = "\n"
        self.current = self.current.getparent()
        self.buffer = ''

    def characters(self, data):
        d = data.strip()
        if d:
            self.buffer += d


def parse_xml(io, base_url=None):
    parser = xml.sax.make_parser()
    handler = XML()
    parser.setContentHandler(handler)
    parser.parse(io)
    return etree.ElementTree(handler.current[0])
```
...
```python
def process_handler():
    ...
    # Only return request if md is valid?
    valid = True
    log.debug("Resource walk")
    for child in request.registry.md.rm.walk():
        log.debug(f"Resource {child.url}")
        valid = valid and child.is_valid()
    if len(request.registry.md.rm) == 0 or not valid:
        log.debug("Resource not valid")
        # 500: The server has either erred or is incapable of
        # performing the requested operation.
        raise exc.exception_response(500)
    else:
        log.debug("Resource valid")
        return response
```
Performance-test branch
Incorporated the "store.py" changes from the branch https://github.com/IdentityPython/pyFF/compare/preformance-tests to see how that would change the memory consumption of pyFF, but it didn't change much: it still ends up using ~1.8G of RES after several hours of continuously refreshing (every 60s) the edugain metadata feed.
The changes store each entity as its serialized (tostring) representation of the metadata and re-parse it on demand, the idea being that we don't need to keep the whole parsed tree in memory, just the serialized entities.
Parked
https://tech.buzzfeed.com/finding-and-fixing-memory-leaks-in-python-413ce4266e7d
Size limitations
We plan to create a controlled mock metadata set containing multiples of the edugain metadata (e.g. 5k, 10k, 20k and 100k entities) to see how pyFF copes with that number of entities and that volume of metadata.