introduce an accessor method for `std::fs::ReadDir` that lets us get the path associated with the `ReadDir` iterator

# Proposal

## Problem statement



This ACP is technically two different features, but they are both related to my motivating example, so to avoid making redundant proposals, I'll present both of them here. 

Currently, in `std::fs`, we can construct a `ReadDir` iterator through using `std::fs::read_dir` passing in a `AsRef<Path>` value. However, while the `ReadDir` iterator stores the directory path it's reading directory entries from, we do not have an accessor method to get this directory path. For context and clarity, this is the internal implementation of `ReadDir` for unix platforms as seen in [`library/std/src/sys/fs/unix.rs`](https://github.com/rust-lang/rust/blob/9602bda1dd0c1bbf5787e398385bbac81fd532f8/library/std/src/sys/fs/unix.rs#L259):

```rust
// all DirEntry's will have a reference to this struct
struct InnerReadDir {
    dirp: DirStream,
    root: PathBuf,
}

pub struct ReadDir {
    inner: Arc<InnerReadDir>,
    end_of_stream: bool,
}

impl ReadDir {
    fn new(inner: InnerReadDir) -> Self {
        Self { inner: Arc::new(inner), end_of_stream: false }
    }
}
```

The problem that arises here when a `ReadDir` struct does not exposing a path reference to its internal `PathBuf` is that iterative implementations that constructs a `ReadDir` on nested directories are forced to redundantly allocate a `PathBuf` to remember this path (see motivating example); in iterative implementations we are unable to hold path references to child directories because the lifetime of those path references will not live through the next iteration.

The second issue to point out is in regards to what `std::fs::read_dir` does internally. Underneath the hood, `std::fs::read_dir` wraps around the different platform implementations of `readdir`, which the common thing they do across different platforms is they take a `&Path` argument and allocate memory to convert this path reference to an owned `PathBuf`. This is the unix platform implementation of [readdir](https://github.com/rust-lang/rust/blob/9602bda1dd0c1bbf5787e398385bbac81fd532f8/library/std/src/sys/fs/unix.rs#L2080-L2089):

```rust
pub fn readdir(path: &Path) -> io::Result<ReadDir> {
    let ptr = run_path_with_cstr(path, &|p| unsafe { Ok(libc::opendir(p.as_ptr())) })?;
    if ptr.is_null() {
        Err(Error::last_os_error())
    } else {
        let root = path.to_path_buf();
        let inner = InnerReadDir { dirp: DirStream(ptr), root };
        Ok(ReadDir::new(inner))
    }
}
```

The thing is for iterative/recursive approaches on nested directories, the nested directory `DirEntry` produced by `ReadDir` iterator returns an owned `PathBuf` when accessing this entry's path via `DirEntry::path`. The problem that arises from this is that constructing a `ReadDir` for these directories allocates the same `PathBuf` again instead of taking ownership of the nested directories' `PathBuf`. This could be expensive if we have a bunch of nested directory paths and we're just cloning a `PathBuf` for each nested directory.

It would help to have a way to access the path associated with the `ReadDir` iterator for an optimized iterative solution aiming to operate on a directory and nested directories as well as a way to print what path contains `DirEntry` error results (as `ReadDir` produces `Result<DirEntry, Error>` items). Having a way for the `ReadDir` iterator to be constructed through an owned `PathBuf` would help avoid a redundant allocation for nested directories.

## Motivating examples or use cases



My motivating example comes from my PR in trying to change the implementation of [`std::fs::remove_dir_all` for unix/uefi platform from recursive to an iterative version](https://github.com/rust-lang/rust/pull/154829).

The current implementation of `std::fs::remove_dir_all` for the mentioned platforms are as [follows](https://github.com/rust-lang/rust/blob/9602bda1dd0c1bbf5787e398385bbac81fd532f8/library/std/src/sys/fs/common.rs#L30):
```rust
pub fn remove_dir_all(path: &Path) -> io::Result<()> {
    let filetype = fs::symlink_metadata(path)?.file_type();
    if filetype.is_symlink() { fs::remove_file(path) } else { remove_dir_all_recursive(path) }
}

fn remove_dir_all_recursive(path: &Path) -> io::Result<()> {
    for child in fs::read_dir(path)? {
        let result: io::Result<()> = try {
            let child = child?;
            if child.file_type()?.is_dir() {
                remove_dir_all_recursive(&child.path())?;
            } else {
                fs::remove_file(&child.path())?;
            }
        };
        // ignore internal NotFound errors to prevent race conditions
        if let Err(err) = &result
            && err.kind() != io::ErrorKind::NotFound
        {
            return result;
        }
    }
    ignore_notfound(fs::remove_dir(path))
}
```

My iterative implementation in the [PR](https://github.com/rust-lang/rust/pull/154829) is as follows:
```rust
pub fn remove_dir_all(path: &Path) -> io::Result<()> {
    let filetype = fs::symlink_metadata(path)?.file_type();
    if filetype.is_symlink() { fs::remove_file(path) } else { remove_dir_all_iterative(path) }
}

fn remove_dir_all_iterative(path: &Path) -> io::Result<()> {
    let mut directories = VecDeque::new();
    directories.push_front((path.to_path_buf(), fs::read_dir(path)?));

    while !directories.is_empty() {
        let (parent_path, read_dir) = &mut directories[0];
        let child = read_dir.next();
        if let Some(child) = child {
            let result: io::Result<()> = try {
                let child = child?;
                let child_path = child.path();
                if child.file_type()?.is_dir() {
                    let child_readdir = fs::read_dir(&child_path)?;
                    directories.push_front((child_path, child_readdir));
                } else {
                    fs::remove_file(&child_path)?;
                }
            };

            // ignore internal NotFound errors to prevent race conditions
            if let Err(err) = &result
                && err.kind() != io::ErrorKind::NotFound
            {
                return result;
            }
        } else {
            ignore_notfound(fs::remove_dir(parent_path))?;
            directories.pop_front();
        }
    }

    Ok(())
}
```

In my iterative approach, to preserve the order that the entries are removed, I use a `VecDeque` to store a tuple of the `PathBuf` and the associated `ReadDir` iterator; holding a `PathBuf` value here is necessary in the tuple if I want to remove the path associated with the `ReadDir` iterator when I exhaust the iterator. Moreover, I am unable to store a `&Path` value in the tuple because the `child.path()` returns an owned `PathBuf` that will be deallocated at the end of the while loop iteration if I don't transfer ownership to it to the VecDeque. This storing of `PathBuf`, as mentioned earlier, is redundant because our `ReadDir` iterator already has that information. Moreover, because our `ReadDir` iterator has that information, the current recursive implementation does utilize more memory with each `path: &Path` arguments passed in on recursive calls, when we only need to use that path reference value at the time of removing the associated directory of the `ReadDir` iterator.

Also note: I am aware that I could use a `LinkedList` here if resizing costs is undesirable for this `VecDeque`.

Unfortunately, both the current recursive and iterative implementation has an extra allocation on `PathBuf` when using `std::fs::read_dir` with the nested directories. In an ideal scenario, an iterative version of `remove_dir_all` would be as follows:

```rust
fn remove_dir_all_iterative(path: &Path) -> io::Result<()> {
    let mut directories = VecDeque::new();
    directories.push_front(fs::read_dir(path)?);

    while !directories.is_empty() {
        // directory is not empty
        let read_dir = directories.front_mut().unwrap();

        let child = read_dir.next();
        if let Some(child) = child {
            let result: io::Result<()> = try {
                let child = child?;
                if child.file_type()?.is_dir() {
                    directories
                        .push_front(fs::owned_read_dir(child.path())?);
                } else {
                    fs::remove_file(&child.path())?;
                }
            };

            // ignore internal NotFound errors to prevent race conditions
            if let Err(err) = &result
                && err.kind() != io::ErrorKind::NotFound
            {
                return result;
            }
        } else {
            ignore_notfound(fs::remove_dir(read_dir.root()))?;
            directories.pop_front();
        }
    }

    Ok(())
}
```
This solution assumes that we have an accessor method to the `ReadDir` inner struct `root` field and a `std::fs::owned_read_dir())` method that operates exactly like `std::fs::read_dir()` returning a `Result<ReadDir>`, but accepts an owned `PathBuf` instead of a path reference.

## Solution sketch



In `std::fs`, we can introduce the following methods:

```rust
/// Takes ownership of the path and constructs an iterator that 
/// iterates through the directory entries of the path
pub fn owned_read_dir(path: PathBuf) -> io::Result<ReadDir> {
    fs_imp::read_dir(path).map(ReadDir)
}

impl ReadDir {
    /// Returns the path that this iterator is reading entries from
    pub fn root(&self) -> &Path {
        self.0.root()
    }
}
```

This solution also means that we have to change the `fs_imp::read_dir` signature from currently:

```rust
pub fn read_dir(path: &Path) -> io::Result<ReadDir> {
    // FIXME: use with_native_path on all platforms
    imp::readdir(path)
}
```
to
```rust
pub fn read_dir(path: PathBuf) -> io::Result<ReadDir> {
    imp::readdir(path) // platform impl would also take in a `PathBuf`
}
```

Or if this is undesirable, we can create a separate `owned_read_dir` function with the trade off that it introduces an `owned_read_dir` function that we need to separately implement/maintain per platform.

As for `std::fs::ReadDir::root`, we would have to create a wrapper accessor method per platform implementation of `ReadDir` struct.

## Alternatives

The current working approach I can think of right now for iterative implementations that perform work on a given directory and nested directories is to use a tuple that holds the path and the associated `ReadDir` iterator; you can also get the parent of the child paths through `Path::parent`, but this doesn't seem ideal and won't be useful if you exhaust the `ReadDir` iterator. Otherwise, just use a recursive implementation.

The downside with the solution I propose is that it introduces more function that we need to implement for different platforms, and for any future platforms that we support.



## Links and related work



## What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team [feature lifecycle]. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

[feature lifecycle]: https://std-dev-guide.rust-lang.org/development/feature-lifecycle.html

## Possible responses

The libs team may respond in various different ways. First, the team will consider the *problem* (this doesn't require any concrete solution or alternatives to have been proposed):

- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

introduce an accessor method for `std::fs::ReadDir` that lets us get the path associated with the `ReadDir` iterator #771

Proposal

Problem statement

Motivating examples or use cases

Solution sketch

Alternatives

Links and related work

What happens now?

Possible responses

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

introduce an accessor method for std::fs::ReadDir that lets us get the path associated with the ReadDir iterator #771

Description

Proposal

Problem statement

Motivating examples or use cases

Solution sketch

Alternatives

Links and related work

What happens now?

Possible responses

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

introduce an accessor method for `std::fs::ReadDir` that lets us get the path associated with the `ReadDir` iterator #771