random767435 #266

Open
random767435 wants to merge 14 commits into tempestphp:main from random767435:random767435

@random767435

@random767435 random767435 commented Mar 12, 2026

The unique algorithm 🤘 invented for this challenge:

  • Every URL in the data set is uniquely identified by its last 22 characters. Since the shortest URL (/uses) is 29 characters long, every URL has a full 22-character tail, so the tail can be used as a hash-table key that maps to the URL index.
  • This makes it possible to process each data chunk from right to left: first read the date stamp (at a fixed offset), then the URL tail (also at a fixed offset), and then jump to the next line.
  • Additionally, both the URL index and the URL length are encoded in the same hash value, so a single hash-table lookup per line is enough, which saves additional time.

As a result, this algorithm is ~20% faster than a "standard" left-to-right sliding approach using strpos, because it avoids 100M strpos calls 😌
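
For anyone who wants to see the idea in miniature, here is a rough Python sketch of right-to-left parsing. It is not the PR's PHP code. The `<url>,<timestamp>` line layout, the 25-character timestamp, the example URLs, the toy tail length of 8 (instead of 22), and the names `build_tail_table` and `count_visits` are all assumptions made for illustration. The (index, length) pair is stored as a tuple here rather than packed into one value.

```python
# Toy sketch of right-to-left chunk parsing. The URL list, TAIL_LEN and the
# "<url>,<timestamp>\n" line layout are assumptions made for illustration.

TS_LEN = 25     # assumed fixed-width timestamp, e.g. 2026-03-12T10:00:00+00:00
DATE_LEN = 10   # the YYYY-MM-DD prefix of that timestamp
TAIL_LEN = 8    # toy value; the PR description uses 22 for its real URLs

URLS = [
    "https://example.test/blog/alpha-post",
    "https://example.test/blog/beta-release",
    "https://example.test/uses-page",
]


def build_tail_table(urls, tail_len=TAIL_LEN):
    """Map each URL tail to (url_index, url_length) in a single table."""
    table = {}
    for index, url in enumerate(urls):
        tail = url[-tail_len:]
        if len(url) < tail_len or tail in table:
            raise ValueError(f"tail of {url!r} is too short or not unique")
        table[tail] = (index, len(url))
    return table


def count_visits(chunk, table, tail_len=TAIL_LEN):
    """Walk a newline-terminated chunk from its end towards its start."""
    counts = {}
    end = len(chunk)                      # just past the last "\n"
    while end > 0:
        ts_start = end - 1 - TS_LEN       # skip "\n", then the timestamp
        date = chunk[ts_start:ts_start + DATE_LEN]
        comma = ts_start - 1              # separator between URL and timestamp
        index, url_len = table[chunk[comma - tail_len:comma]]
        key = (index, date)
        counts[key] = counts.get(key, 0) + 1
        end = comma - url_len             # line start == end of previous line
    return counts


rows = [(0, "2026-03-12"), (2, "2026-03-12"), (0, "2026-03-12"), (1, "2026-03-13")]
chunk = "".join(f"{URLS[i]},{day}T10:00:00+00:00\n" for i, day in rows)
print(count_visits(chunk, build_tail_table(URLS)))
```

Because the table returns the URL length together with the index, the loop never has to search for a separator or a newline: every position is computed from the end of the current line.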

@random767435

Author

/bench

@brendt

brendt commented Mar 12, 2026

Member

Benchmarking complete! Best execution time: 2.542s

Full results:

{
  "results": [
    {
      "command": "cd /Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../.benchmark/pr-266 && php -dmax_execution_time=300 tempest data:parse --input-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data.csv\" --output-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data-actual.json\"",
      "mean": 2.55149121888,
      "stddev": 0.009021873639487562,
      "median": 2.54806082748,
      "user": 16.11734322,
      "system": 1.7361627600000002,
      "min": 2.54272507748,
      "max": 2.56613370248,
      "times": [
        2.54806082748,
        2.56613370248,
        2.55339636848,
        2.5471401184799998,
        2.54272507748
      ],
      "memory_usage_byte": [
        74219520,
        74219520,
        74219520,
        74219520,
        74219520
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}
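
The summary fields in this report can be recomputed from the `times` array. A small Python sketch, assuming the reported `stddev` is the sample standard deviation (divisor n - 1), which is what the numbers above agree with:

```python
import statistics

# The five wall-clock times (seconds) copied from the report above.
times = [
    2.54806082748,
    2.56613370248,
    2.55339636848,
    2.5471401184799998,
    2.54272507748,
]

summary = {
    "mean": statistics.mean(times),
    "stddev": statistics.stdev(times),   # sample standard deviation (n - 1)
    "median": statistics.median(times),
    "min": min(times),                   # the "best execution time"
    "max": max(times),
}

for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```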

@random767435

Author

/bench

@brendt

brendt commented Mar 12, 2026

Member

Benchmarking complete! Best execution time: 2.602s

Full results:

{
  "results": [
    {
      "command": "cd /Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../.benchmark/pr-266 && php -dmax_execution_time=300 tempest data:parse --input-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data.csv\" --output-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data-actual.json\"",
      "mean": 2.6246863872799997,
      "stddev": 0.020150453537264905,
      "median": 2.6253111204799997,
      "user": 16.94691388,
      "system": 1.58959684,
      "min": 2.60268878748,
      "max": 2.64741853748,
      "times": [
        2.60268878748,
        2.6416448284799996,
        2.64741853748,
        2.60636866248,
        2.6253111204799997
      ],
      "memory_usage_byte": [
        74088448,
        74088448,
        74088448,
        74088448,
        74088448
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}

@random767435

Author

/bench

@random767435

Author

/bench

@brendt

brendt commented Mar 12, 2026

Member

Benchmarking complete! Best execution time: 2.449s

Full results:

{
  "results": [
    {
      "command": "cd /Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../.benchmark/pr-266 && php -dmax_execution_time=300 tempest data:parse --input-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data.csv\" --output-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data-actual.json\"",
      "mean": 2.4623941148799995,
      "stddev": 0.010637497703390642,
      "median": 2.46028953188,
      "user": 16.283930759999997,
      "system": 0.8963642399999999,
      "min": 2.44901794788,
      "max": 2.47798423988,
      "times": [
        2.46595994788,
        2.47798423988,
        2.44901794788,
        2.46028953188,
        2.4587189068799997
      ],
      "memory_usage_byte": [
        74334208,
        74334208,
        74334208,
        74334208,
        74334208
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}

@brendt

brendt commented Mar 12, 2026

Member

You didn’t optimize. You performed violence (on latency). 🔪
🏆 leaderboard.csv

@random767435

Author

/bench

@brendt

brendt commented Mar 13, 2026

Member

Benchmarking complete! Best execution time: 2.627s

Full results:

{
  "results": [
    {
      "command": "cd /Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../.benchmark/pr-266 && php -dmax_execution_time=300 tempest data:parse --input-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data.csv\" --output-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data-actual.json\"",
      "mean": 2.63730512572,
      "stddev": 0.0089991577442048,
      "median": 2.63655013352,
      "user": 16.62120316,
      "system": 1.97577654,
      "min": 2.62739884252,
      "max": 2.65070280052,
      "times": [
        2.63655013352,
        2.65070280052,
        2.63135471752,
        2.64051913452,
        2.62739884252
      ],
      "memory_usage_byte": [
        74088448,
        74137600,
        74137600,
        74137600,
        74285056
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}

@random767435

Author

/bench

@brendt

brendt commented Mar 13, 2026

Member

Benchmarking complete! Best execution time: 2.465s

Full results:

{
  "results": [
    {
      "command": "cd /Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../.benchmark/pr-266 && php -dmax_execution_time=300 tempest data:parse --input-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data.csv\" --output-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data-actual.json\"",
      "mean": 2.4905864510799995,
      "stddev": 0.029308639510552802,
      "median": 2.4910914004799998,
      "user": 16.17893398,
      "system": 0.98564474,
      "min": 2.46529156748,
      "max": 2.53717727648,
      "times": [
        2.46576560948,
        2.46529156748,
        2.53717727648,
        2.49360640148,
        2.4910914004799998
      ],
      "memory_usage_byte": [
        74088448,
        74088448,
        74088448,
        74088448,
        74465280
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}

@random767435

Author

/bench

@brendt

brendt commented Mar 13, 2026

Member

Benchmarking complete! Best execution time: 2.448s

Full results:

{
  "results": [
    {
      "command": "cd /Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../.benchmark/pr-266 && php -dmax_execution_time=300 tempest data:parse --input-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data.csv\" --output-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data-actual.json\"",
      "mean": 2.460128228,
      "stddev": 0.008552351063892696,
      "median": 2.4615851286,
      "user": 16.24836798,
      "system": 0.9338687199999999,
      "min": 2.4487572946,
      "max": 2.4702872526,
      "times": [
        2.4702872526,
        2.4653869196,
        2.4546245445999997,
        2.4615851286,
        2.4487572946
      ],
      "memory_usage_byte": [
        73842688,
        74268672,
        74268672,
        74268672,
        74268672
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}

@random767435

Author

/bench

@brendt

brendt commented Mar 13, 2026

Member

Benchmarking complete! Best execution time: 2.455s

Full results:

{
  "results": [
    {
      "command": "cd /Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../.benchmark/pr-266 && php -dmax_execution_time=300 tempest data:parse --input-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data.csv\" --output-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data-actual.json\"",
      "mean": 2.4686511485600002,
      "stddev": 0.0137146818580164,
      "median": 2.46913488176,
      "user": 16.2275718,
      "system": 1.0150784,
      "min": 2.45503008976,
      "max": 2.48632004876,
      "times": [
        2.47743400676,
        2.48632004876,
        2.45503008976,
        2.45533671576,
        2.46913488176
      ],
      "memory_usage_byte": [
        74301440,
        74301440,
        74301440,
        74301440,
        74301440
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}

@random767435

Author

/bench

@brendt

brendt commented Mar 14, 2026

Member

Benchmarking complete! Best execution time: 2.475s

Full results:

{
  "results": [
    {
      "command": "cd /Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../.benchmark/pr-266 && php -dmax_execution_time=300 tempest data:parse --input-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data.csv\" --output-path=\"/Users/brentroose/Dev/100-million-row-challenge/app/Commands/../../data/real-data-actual.json\"",
      "mean": 2.48245010106,
      "stddev": 0.007956842258772086,
      "median": 2.47802340886,
      "user": 15.966409559999999,
      "system": 1.1614289799999997,
      "min": 2.47511111786,
      "max": 2.49136340986,
      "times": [
        2.49136340986,
        2.49081436786,
        2.47802340886,
        2.47511111786,
        2.47693820086
      ],
      "memory_usage_byte": [
        79544320,
        79544320,
        79544320,
        79544320,
        79544320
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}
