XML schema validation runs very slowly on the AWS Lambda Python runtime

Bug #1929126 reported by richard wu
This bug affects 1 person
Affects: lxml
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I have test code that loads a set of XSD files and validates an XML string against them. When it runs on my Linux box using python3.8, it finishes in milliseconds, but when it runs on the AWS Lambda python3.8 runtime, it takes 15 seconds. Any thoughts on what the reason could be?

Revision history for this message
richard wu (rwu888) wrote :

import time
from lxml import etree
xml_in = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <!-- Created with Liquid Studio 2021 (https://www.liquid-technologies.com) --> <ns2:SignalProcessingNotification acquisitionPointIdentity="C1P4HV11-1999983.dfw.720" xmlns="urn:cablelabs:md:xsd:signaling:3.0" xmlns:ns6="urn:cablelabs:iptvservices:esam:xsd:manifest:1" xmlns:ns5="urn:cablelabs:iptvservices:esam:xsd:common:1" xmlns:ns2="urn:cablelabs:iptvservices:esam:xsd:signal:1" xmlns:ns4="urn:cablelabs:md:xsd:content:3.0" xmlns:ns3="urn:cablelabs:md:xsd:core:3.0"> <ns5:Ext eventName="esamEndpoint"/> <ns2:ResponseSignal action="replace" acquisitionPointIdentity="C1P4HV11-1999983.dfw.720" acquisitionSignalID="424c514b-e4ce-4003-801e-719761563cd7"> <UTCPoint utcPoint="2021-04-30T18:40:52.821Z"/> <BinaryData signalType="SCTE35">/DBQAAAAAAAA///wFAUAAAfRf+/+T/iBBX4AUmXAAAEBAQArAQpDVUVJPJ8jKjAwAh1DVUVJAAAH0X//AABScXoBCTExMzYzNTIyMTYAAB+wVcY=</BinaryData> </ns2:ResponseSignal> </ns2:SignalProcessingNotification>'

#xml_in = '<?xml version="1.0" encoding="utf-8"?> <!-- Created with Liquid Studio 2021 (https://www.liquid-technologies.com) --> <ns4:SignalProcessingEvent xmlns:ns2="urn:cablelabs:iptvservices:esam:xsd:common:1" xmlns="urn:cablelabs:md:xsd:core:3.0" xmlns:ns4="urn:cablelabs:iptvservices:esam:xsd:signal:1" xmlns:ns3="urn:cablelabs:md:xsd:signaling:3.0" xmlns:ns5="urn:cablelabs:iptvservices:esam:xsd:manifest:1" xmlns:ns6="urn:cablelabs:md:xsd:content:3.0"> <ns2:BatchInfo batchId="B56CEA1A-3F5F-48CD-B411-8BC3609E5F16"> <ns2:Source uriId="mag2-mx5-suits"> <ns6:SourceUrl>mag2-mx5-suits</ns6:SourceUrl> </ns2:Source> </ns2:BatchInfo> <ns4:ResponseSignal action="create" acquisitionPointIdentity="mag2-mx5-suits:SCTE35:66" signalPointID="MUQyNTkzQzgtMzM3Ri00NQ==" acquisitionSignalID="424c514b-e4ce-4003-801e-719761563cd7"> <ns3:NPTPoint nptPoint="14866917.257" /> <ns3:BinaryData signalType="SCTE35">/DAlAH+DGqLsAP///wUAH2gIAOCABUFDVYAAUmXAAAAAAAAAV7AxRg==</ns3:BinaryData> </ns4:ResponseSignal> </ns4:SignalProcessingEvent>'

xml_in = xml_in.encode('ascii')

def validate(xml_bytes: bytes, xsd_path: str) -> bool:
    """Validate xml_bytes against the XSD at xsd_path; return True if valid."""

    # Parse the XSD file and compile it into a schema object.
    xmlschema_doc = etree.parse(xsd_path)
    xmlschema = etree.XMLSchema(xmlschema_doc)

    result = False
    # Parse the XML input passed in (not the module-level xml_in).
    try:
        xml_doc = etree.XML(xml_bytes)
    except Exception as inst:
        print(inst)
        return result

    try:
        xmlschema.assertValid(xml_doc)
        print('XML valid, schema validation ok.')
        result = True

    except etree.DocumentInvalid as err:
        # Dump everything known about the validation failure.
        print("ERROR:" + str(type(err)))
        print("ERROR:" + str(err))
        print("ERROR:" + str(dir(err)))
        print("ERROR:" + str(err.error_log))
        print("dddddddddddddddddddddddddd")
        for error in xmlschema.error_log:
            print(str(dir(error)))
            print(error.path)
            print(error.message)
    except Exception as ex:
        print(str(ex))

    ##result = xmlschema.validate(xml_doc)

    return result

tic = time.perf_counter()
if validate(xml_in, "./schema/OC-SP-ESAM-API-I03-Signal.xsd"):
#if validate("./test.xml", "./OC-SP-ESAM-API-C01-Signal.xsd"):
#if validate(xml_in, "./SCTE35.xsd"):
#if validate("./test.xml...


richard wu (rwu888) wrote :

The AWS Lambda runtime has 3 GB of memory and 2 vCPUs.

scoder (scoder) wrote :

Probably not an lxml issue.

Try to find out how Lambda sets up the execution environment. Maybe it installs everything from scratch and spends 14 seconds on the setup and then a couple of milliseconds on the validation.
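
One way to check where the 15 seconds go is to time each stage separately instead of the whole run. The following is only a sketch: the helper names timed and profile_validation are made up for illustration, and the stage split is an assumption about what is worth measuring, not something the original test code does.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def profile_validation(xml_bytes, xsd_path):
    """Hypothetical helper: time import, schema loading and validation separately."""
    results = {}
    with timed("import lxml", results):
        from lxml import etree  # imported here so the import cost is measured too
    with timed("parse xsd", results):
        schema_doc = etree.parse(xsd_path)
    with timed("compile schema", results):
        schema = etree.XMLSchema(schema_doc)
    with timed("parse xml", results):
        xml_doc = etree.XML(xml_bytes)
    with timed("validate", results):
        schema.validate(xml_doc)
    return results
```

Calling profile_validation(xml_in, "./schema/OC-SP-ESAM-API-I03-Signal.xsd") in both environments and printing the returned dict would show which stage differs.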

richard wu (rwu888) wrote :

Thanks for the replies! When it runs on bare metal, it finishes within milliseconds. I ran the same code in an amazonlinux:2 Docker container with lxml installed, and it returned after 15 seconds. It has something to do with how the Docker container schedules the vCPUs.
Here is my setup:

docker run --cpuset-cpus 0-1 --cpu-period 1000000 --cpu-quota 200000 --memory 2G --memory-swap 2G --security-opt seccomp=unconfined -it --rm --entrypoint /bin/bash -v /tmp:/mnt amazonlinux:2

amazon-linux-extras enable python3.8

python3.8 -m pip install lxml

python3.8 testsxd.py

bash-4.2# python3.8 testsxd.py
XML valid, schema validation ok.
Valid! :)
finished in 15.0521 seconds
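
For reference, the --cpu-period and --cpu-quota values in the docker run command above can be turned into a CPU share with simple arithmetic. This sketch only does that arithmetic; the function name cpu_budget is made up, and whether the limit explains the 15 seconds is not established in this thread.

```python
def cpu_budget(cpu_period_us: int, cpu_quota_us: int) -> float:
    """Return the CPU share implied by CFS period/quota values (microseconds)."""
    return cpu_quota_us / cpu_period_us


period_us = 1_000_000  # --cpu-period from the docker run command
quota_us = 200_000     # --cpu-quota from the docker run command

budget = cpu_budget(period_us, quota_us)
print(f"CPU time allowed: {quota_us / 1000:.0f} ms of every {period_us / 1000:.0f} ms")
print(f"Equivalent CPU share: {budget:.2f} of one CPU")
```

With these values the container gets 200 ms of CPU time per one-second period, i.e. 0.2 of one CPU in total, even though --cpuset-cpus 0-1 makes two cores available.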

richard wu (rwu888) wrote :

I missed one step above:

yum install python3.8

scoder (scoder) wrote :

Closing since there is nothing lxml can do here. Schema validation is implemented in libxml2.

Changed in lxml:
status: New → Invalid
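
Since validation happens in libxml2, it may help to compare the library versions in the fast and the slow environment. A minimal sketch, assuming lxml is installed; the function name library_versions is made up, and it returns an empty dict when lxml cannot be imported.

```python
def library_versions() -> dict:
    """Return version tuples for lxml and the libxml2 it uses ({} without lxml)."""
    try:
        from lxml import etree
    except ImportError:
        return {}
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (running)": etree.LIBXML_VERSION,
        "libxml2 (compiled against)": etree.LIBXML_COMPILED_VERSION,
    }


for name, version in library_versions().items():
    print(f"{name}: {'.'.join(str(part) for part in version)}")
```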