How to be an idiot at coding, part infinity + 1

Use Range headers to avoid downloading the complete response when you only need a small number of bytes.

We had some code that read partial responses from a server. It had been tested thoroughly and worked as intended: we got the number of bytes we needed from the head of a file and could then parse that data as a binary blob.
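
The "binary blob" parsing is ordinary magic-byte sniffing of the file head. Our real parser is more involved, but the shape is roughly this (the helper below is a made-up illustration, not our actual code):

    # Illustration only: video_container is a made-up helper, not our real parser.
    # `head` is the binary head of the file, e.g. the bytes read in the code below.
    def video_container(head)
      return :mp4  if head[4, 4] == 'ftyp'                # ISO BMFF / MP4 marker
      return :webm if head[0, 4] == "\x1A\x45\xDF\xA3".b  # EBML magic (WebM/Matroska)
      :unknown
    end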

However, when we put this into production, containers started crashing randomly.

Or rather: the code had been in production for years without a crash, but now we were hitting this code path many times an hour instead of a couple of times per day.

The culprit was hard to find, but it comes down to how Ruby works, how networking works, and how I thought it worked more like C.

The old code did something like this:

    require 'net/http'
    require 'openssl'
    require 'uri'

    bytes = nil

    uri = URI(url)
    begin
      http = Net::HTTP.new(uri.host, uri.port)
      if url.start_with?('https')
        http.use_ssl = true
        http.verify_mode = OpenSSL::SSL::VERIFY_NONE
      end
      http.start do |h|
        request = Net::HTTP::Get.new(uri.request_uri)
        h.request(request) do |response|
          # Reach into Net::HTTP's internals (@socket is a Net::BufferedIO)
          # and read just the first `count` bytes of the body off the wire.
          bytes = response.instance_variable_get(:@socket).read(count)
          # When this block returns, Net::HTTP still drains the rest of the
          # body from the socket, so the whole file is transferred anyway.
        end
      end
    rescue IOError => e
      # ignore
    end

    bytes

and the new code does something far simpler: it uses the Range header

    require 'net/http'

    # Ranges are inclusive: bytes=0-#{limit} asks for limit + 1 bytes.
    headers = { 'Range' => "bytes=0-#{limit}" }
    uri = URI(url)
    response = Net::HTTP.get_response(uri, headers)  # headers hash needs Ruby 3.0+
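
One caveat worth noting (this guard is my addition, not part of the fix): a server is free to ignore the Range header and answer 200 OK with the full body, so a defensive version checks for 206 Partial Content before trusting the result:

    # Defensive addition: servers may ignore Range and send the whole file.
    case response
    when Net::HTTPPartialContent            # 206: the server honored the Range header
      bytes = response.body
    when Net::HTTPOK                        # 200: the full body arrived anyway;
      bytes = response.body[0, limit + 1]   # keep only the slice we asked for
    else
      raise "unexpected response: #{response.code}"
    end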

The former code retrieved the complete file from the server and then read the needed bytes into a variable: even though we only pulled count bytes off the socket ourselves, Net::HTTP drains the remainder of the body when the response block returns, so the entire file is transferred and buffered anyway. The latter only requests the bytes from the server that are needed.
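
The difference is visible on the wire, too: a ranged response advertises only the slice it carries. A quick sketch of the response metadata (the URL and the sizes in the comments are made up):

    # Hypothetical probe; the URL and the sizes in the comments are invented.
    response = Net::HTTP.get_response(URI('https://example.com/video.mp4'),
                                      { 'Range' => 'bytes=0-1023' })
    response.code              # => "206"
    response['Content-Range']  # => "bytes 0-1023/104857600" (slice / total size)
    response['Content-Length'] # => "1024" -- just the slice, not the whole file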

For small files the difference is negligible, but the larger the files get, the larger the problem becomes. Response times of the former code go up, and memory usage goes up with them. Since video files in our usage can be more than 100 MB, the slow downloads combined with the large memory usage cause the OOM killer to destroy the container before the process finishes. The killed job gets re-enqueued, which increases the chance that multiple jobs of this type get handled by the same container at "the same time", which in turn makes the container even more likely to crash again.