Is there any way I can optimize reading this file?
Btw, this is a Laravel application. data_fill() is a Laravel-specific helper; it's just my PHP array-building logic:
https://github.com/laravel/framework/blob/3414dcfcbe27cf0f4deee0670f022983e8016392/src/Illuminate/Support/helpers.php#L427
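For reference, this is roughly how I use it. data_fill() only writes a value when the key is missing, and the dot notation expands into nested arrays (simplified example, not taken from the report code; "MODEM" is just a made-up item name):

// data_fill() builds nested arrays from a dot-notation key, but never overwrites
$data = [];
data_fill($data, 'techs.9448.counts.7 Status.MODEM', 1);
// $data is now ['techs' => ['9448' => ['counts' => ['7 Status' => ['MODEM' => 1]]]]]
data_fill($data, 'techs.9448.counts.7 Status.MODEM', 5);
// Still 1 - the key already exists, so data_fill() leaves it alone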
I took the 150,000-row, 16-column file and chopped it down to a much smaller sample of 4,261 rows, and that takes about 10-15 seconds. The complete file takes minutes (I had to heavily modify nginx and PHP settings to allow for this).
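To give an idea of what I mean by modifying nginx and PHP, these are the kinds of limits involved (illustrative values only, not the exact numbers I used):

// Illustrative only - roughly the limits that had to be raised for the full file
ini_set('max_execution_time', '400'); // also set inside the controller method below
ini_set('memory_limit', '512M');      // assumed value; parsing a large XLSX needs headroom
// On the nginx side, fastcgi_read_timeout had to be raised as well so the
// long-running request is not cut off by the web server.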
Here is a blackfire.io report of the smaller file - https://blackfire.io/profiles/c4087f40-dd5c-42ed-9258-3c6d5a1ace51/graph
Looks like it is reading all 68,176 cells (4,261 rows * 16 columns) multiple times on the smaller file: 349,380 reads in total (the darker red boxes).
Spreadsheets attached.
spreadsheets.zip
Here is the code I am using to process the file(s). I have hard-coded some parameters for testing; normally these would be request variables to allow for different column selection based on the header row (see the sketch after the code for what that would look like).
public function report02()
{
    ini_set('max_execution_time', '400');

    // Time how long the script takes to run
    $executionStartTime = microtime(true);

    // Spout (Box\Spout) reader - ReaderFactory and Type are imported at the top of the controller
    $reader = ReaderFactory::create(Type::XLSX); // for XLSX files
    $reader->open(storage_path('smaller.xlsx'));

    $headers = [];
    $header_tech = strtoupper('INSTALLER NBR');
    $header_tech_index = 0;
    $header_equipment_type = strtoupper('ITEM');
    $header_equipment_type_index = 0;
    $header_equipment_sn = strtoupper('SERIAL NUMBER');
    $header_equipment_sn_index = 0;
    $header_equipment_status = strtoupper('EQUIP STS');
    $header_equipment_status_index = 0;
    $header_equipment_age = strtoupper('DAYS ASSIGNED');
    $header_equipment_age_index = 0;

    $show_old_equipment_limit_in_days = 21;

    // Convert comma-delimited tech numbers to an array and trim whitespace
    $techs = array_map('trim', explode(",", '9448,69091,69165,69327,69430,69445,69449,69711,70056'));

    $data = [
        'old_equipment' => [
            'total' => 0
        ]
    ];

    foreach ($reader->getSheetIterator() as $sheet) {
        foreach ($sheet->getRowIterator() as $row => $values) {
            // Check for last row - end of file
            $values2 = array_map('trim', $values);

            if (! empty($values2)) {
                // Header row position
                if ($row == 2) {
                    $headers = $values2;
                    $header_tech_index = array_search($header_tech, $headers);
                    $header_equipment_type_index = array_search($header_equipment_type, $headers);
                    $header_equipment_sn_index = array_search($header_equipment_sn, $headers);
                    $header_equipment_status_index = array_search($header_equipment_status, $headers);
                    $header_equipment_age_index = array_search($header_equipment_age, $headers);
                }

                // Data after header
                if ($row > 2) {
                    // This row contains data for a tech we requested
                    if (in_array($values2[$header_tech_index], $techs)) {
                        // 7 Status
                        if ($values2[$header_equipment_status_index] == 7) {
                            if (array_has($data, 'techs.' . $values2[$header_tech_index] . '.counts.7 Status.' . $values2[$header_equipment_type_index])) {
                                $data['techs'][$values2[$header_tech_index]]['counts']['7 Status'][$values2[$header_equipment_type_index]]++;
                            } else {
                                data_fill($data, 'techs.' . $values2[$header_tech_index] . '.counts.7 Status.' . $values2[$header_equipment_type_index], 1);
                            }

                            // All tech equipment
                            data_fill($data, 'techs.' . $values2[$header_tech_index] . '.equipment.7 Status.' . $values2[$header_equipment_sn_index], [
                                'type' => $values2[$header_equipment_type_index],
                                'age' => $values2[$header_equipment_age_index],
                            ]);
                        }

                        // T Status
                        if ($values2[$header_equipment_status_index] == 'T') {
                            if (array_has($data, 'techs.' . $values2[$header_tech_index] . '.counts.T Status.' . $values2[$header_equipment_type_index])) {
                                $data['techs'][$values2[$header_tech_index]]['counts']['T Status'][$values2[$header_equipment_type_index]]++;
                            } else {
                                data_fill($data, 'techs.' . $values2[$header_tech_index] . '.counts.T Status.' . $values2[$header_equipment_type_index], 1);
                            }

                            // All tech equipment
                            data_fill($data, 'techs.' . $values2[$header_tech_index] . '.equipment.T Status.' . $values2[$header_equipment_sn_index], [
                                'type' => $values2[$header_equipment_type_index],
                                'age' => $values2[$header_equipment_age_index],
                            ]);
                        }

                        // X days or older
                        if ($values2[$header_equipment_age_index] >= $show_old_equipment_limit_in_days && ($values2[$header_equipment_status_index] !== 'L')) {
                            data_fill($data, 'old_equipment.techs.' . $values2[$header_tech_index] . '.' . $values2[$header_equipment_sn_index], [
                                'type' => $values2[$header_equipment_type_index],
                                'status' => $values2[$header_equipment_status_index],
                                'age' => $values2[$header_equipment_age_index],
                            ]);

                            $data['old_equipment']['total']++;
                        }
                    }
                }
            }

            unset($values2);
        }
    }

    $reader->close();

    $executionEndTime = microtime(true);
    $data['runtime_in_seconds'] = round($executionEndTime - $executionStartTime, 2);

    return response()->json($data, 200);
}
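And this is roughly what I mean by the parameters being request variables. It is only a sketch: the field names and the Request parameter are made up for illustration and are not in the code above (it would need use Illuminate\Http\Request; and a report02(Request $request) signature):

// Sketch only - hypothetical request fields, not part of the working code above
$header_tech             = strtoupper($request->input('tech_column', 'INSTALLER NBR'));
$header_equipment_type   = strtoupper($request->input('type_column', 'ITEM'));
$header_equipment_sn     = strtoupper($request->input('sn_column', 'SERIAL NUMBER'));
$header_equipment_status = strtoupper($request->input('status_column', 'EQUIP STS'));
$header_equipment_age    = strtoupper($request->input('age_column', 'DAYS ASSIGNED'));
$show_old_equipment_limit_in_days = (int) $request->input('old_days', 21);

// Comma-delimited tech numbers from the request instead of the hard-coded string
$techs = array_map('trim', explode(',', $request->input('techs', '')));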
Screenshot of smaller file output

Thanks for any help I receive.