Monthly Archives: January 2020

TPS = connections / latency

If one connection supports 5 TPS, then 5 connections support 25 TPS. In other words, if a connection supports 5 TPS, the request latency is 200 ms, assuming each connection handles one request at a time. With latency measured in seconds, we have:

TPS = number_of_connection / request_latency

So total TPS depends on the TPS per connection (1 / request_latency) and the number of connections.
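As a quick check of the formula, here is a minimal sketch. The function name `max_tps` is just an illustration, and it assumes latency is given in seconds and each connection carries one request at a time.

```python
def max_tps(connections: int, latency_seconds: float) -> float:
    """Upper bound on TPS when each connection handles one request at a time."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return connections / latency_seconds


print(max_tps(1, 0.2))  # one connection, 200 ms per request
print(max_tps(5, 0.2))  # five connections, same latency
```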

How to find max TPS?

  1. Set the number of connections to 1, gradually increase the target TPS from 0 to 10, 20, 40, … and find the service's max TPS.
  2. Set the number of connections to 2, gradually increase the target TPS from 0 to 10, 20, 40, … and find the service's max TPS.
  3. Set the number of connections to 4, gradually increase the target TPS from 0 to 10, 20, 40, … and find the service's max TPS.
  4. Set the number of connections to 8, gradually increase the target TPS from 0 to 10, 20, 40, … and find the service's max TPS.
  5. Set the number of connections to 16, gradually increase the target TPS from 0 to 10, 20, 40, … and find the service's max TPS.

This way we find the max TPS for each number of connections.
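The sweep above could be scripted roughly like this. It is only a sketch: `run_load` is a hypothetical hook standing in for whatever load generator you use, the step values and the 95% tolerance are assumptions, and `fake_service` is a made-up stand-in (200 ms latency, capped at 40 TPS) so the example runs on its own.

```python
from typing import Callable, Dict, Iterable


def find_max_tps(
    run_load: Callable[[int, int], float],
    connection_counts: Iterable[int] = (1, 2, 4, 8, 16),
    target_steps: Iterable[int] = (10, 20, 40, 80, 160),
    tolerance: float = 0.95,
) -> Dict[int, float]:
    """Return the highest achieved TPS seen for each connection count.

    run_load(connections, target_tps) is a hypothetical hook that drives the
    service at target_tps and returns the TPS it actually achieved.
    """
    results = {}
    for connections in connection_counts:
        best = 0.0
        for target in target_steps:
            achieved = run_load(connections, target)
            best = max(best, achieved)
            # Stop ramping once the service no longer keeps up with the target.
            if achieved < target * tolerance:
                break
        results[connections] = best
    return results


def fake_service(connections: int, target_tps: int) -> float:
    """Stand-in for a real test: 200 ms latency, never more than 40 TPS."""
    return min(target_tps, connections / 0.2, 40.0)


print(find_max_tps(fake_service))
```

With a real service, `run_load` would be replaced by a call into the load testing tool, and the per-connection results show where adding connections stops helping.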

RDD vs Dataframe

A DataFrame is like a table in a database. It has a schema that describes the data. We can manipulate two DataFrames much like tables in SQL on a relational database: groupBy, count, sort, join, where.

An RDD (resilient distributed dataset) is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. Unlike a DataFrame, it doesn't have an optimization engine.

TCP read timeout(socket timeout), write timeout, connection timeout, HTTP request timeout

A socket sets up communication between two processes or applications over TCP/IP (TCP is layer 4 in the OSI model). One acts as the client process, the other as the server process. The server calls bind(), listen(), accept(), then read() and write(). The client calls connect(), then read() and write().
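The call sequence above can be sketched with Python's standard `socket` module, where `recv()` and `sendall()` play the role of read() and write(). The helper names `serve_once` and `echo_round_trip` are made up for this example, and the server runs in a thread on localhost so the whole thing fits in one script.

```python
import socket
import threading


def serve_once(server_sock: socket.socket) -> None:
    """Accept one client, read its message, and write it back upper-cased."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def echo_round_trip(message: bytes) -> bytes:
    # Server side: bind() and listen() here, accept() inside the thread.
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()

    # Client side: connect(), then write and read.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server_sock.close()
    return reply


print(echo_round_trip(b"hello"))
```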

Socket timeout:
read timeout: the client hasn't received data from the server within [READ_TIMEOUT]. Normally this is SO_TIMEOUT on the socket.
write timeout: the client hasn't been able to write successfully to the server within [WRITE_TIMEOUT].
connection timeout: the three-way handshake is not finished within [CONNECTION_TIMEOUT].
The three timeouts above are all on the client side.
keep-alive timeout (client side): the HTTP Keep-Alive header carries a timeout value for idle connections. If the client hasn't received anything from the server for [KEEP_ALIVE_TIMEOUT], it checks whether the server is still alive (at the TCP level this is done with keep-alive probes). If the server responds, the client keeps waiting; if not, the client closes the HTTP connection.

request timeout (server side, HTTP): the client needs to keep sending data to the server. If the server hasn't received data from the client within [REQUEST_TIMEOUT], the server drops the connection.
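To make the client-side connection and read timeouts concrete, here is a small sketch using Python's `socket` module. The function name `read_with_timeout` and the timeout values are assumptions for illustration. The "silent" server completes the handshake (the OS does that for a listening socket) but never sends data, so the read timeout is the one that fires.

```python
import socket


def read_with_timeout(host: str, port: int,
                      connect_timeout: float, read_timeout: float) -> str:
    """Connect, then wait for data; report which timeout (if any) fired."""
    try:
        # The timeout here bounds the TCP handshake (connection timeout).
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except socket.timeout:
        return "connection timeout"
    with sock:
        # From now on the timeout applies to each blocking read.
        sock.settimeout(read_timeout)
        try:
            sock.recv(1024)
        except socket.timeout:
            return "read timeout"
    # Reached when recv() returned, either with data or because the peer closed.
    return "ok"


# A local server that listens but never accepts or sends anything.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))
silent.listen(1)
print(read_with_timeout("127.0.0.1", silent.getsockname()[1], 1.0, 0.2))
silent.close()
```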

TTL (time to live) is a number in the IP packet header that limits how long the packet can stay alive. Each router that forwards the packet decrements it, and the packet is dropped once it reaches 0.

References:
https://stackoverflow.com/questions/2735883/relation-between-http-keep-alive-duration-and-tcp-timeout-duration

Read-after-write model vs eventual consistency model

Below is the read-after-write model:

GET /key-prefix/cool-file.jpg 404
PUT /key-prefix/cool-file.jpg 200
GET /key-prefix/cool-file.jpg 200

As we can see, the file doesn't exist at the beginning. Once the PUT is done, GET always returns successfully.

Below is the eventual consistency model:

GET /key-prefix/cool-file.jpg 404
PUT /key-prefix/cool-file.jpg 200
GET /key-prefix/cool-file.jpg 404

The file doesn't exist at first. After the PUT is done, GET may or may not return successfully, because the file is still propagating.
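The difference between the two traces can be imitated with a toy in-memory store. This is not how any real object store is implemented; `ToyStore`, its `primary`/`replica` dictionaries, and the `propagate()` step are all made up to show why a GET right after a PUT can still return 404 under eventual consistency.

```python
class ToyStore:
    """Toy object store: writes land on a primary and reach the replica later."""

    def __init__(self, read_after_write: bool) -> None:
        self.read_after_write = read_after_write
        self.primary = {}
        self.replica = {}

    def put(self, key: str, value: bytes) -> int:
        self.primary[key] = value
        return 200

    def propagate(self) -> None:
        """Simulate replication catching up."""
        self.replica.update(self.primary)

    def get(self, key: str) -> int:
        # Read-after-write reads see the primary;
        # eventually consistent reads may hit a stale replica.
        source = self.primary if self.read_after_write else self.replica
        return 200 if key in source else 404


key = "/key-prefix/cool-file.jpg"
for mode in (True, False):
    store = ToyStore(read_after_write=mode)
    print(mode, store.get(key), store.put(key, b"data"), store.get(key))
```

In the eventually consistent store, calling `propagate()` and reading again returns 200, which corresponds to the file having finished propagating.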

References:
https://codeburst.io/quick-explanation-of-the-s3-consistency-model-6c9f325e3f82