This function downloads a file from Google Cloud Storage (GCS) to a local
directory and reads it into R as a data frame. It uses the gsutil
command-line tool to handle the file download.
dl_read_gcp(
  path,
  sep = "\t",
  header = TRUE,
  tmpdir,
  gsutil_path = "gsutil",
  check_first = TRUE,
  verbose = FALSE,
  ...
)
path: Character. The path to the file in GCS, e.g., gs://bucket-name/file-name.csv.
sep: Character. The field separator character. Default is "\t" (tab).
header: Logical. Whether the first line of the file contains the variable names. Default is TRUE.
tmpdir: Character. The local directory to which the file will be downloaded.
gsutil_path: Character. The path to the gsutil command-line tool. Default is "gsutil".
check_first: Logical. Whether to check if the file already exists locally before downloading. Default is TRUE.
verbose: Logical. If TRUE, prints messages about the download process. Default is FALSE.
...: Additional arguments passed to readr::read_delim.
A data frame containing the contents of the downloaded file.
This function first checks if the specified file exists in GCS. If the file
exists, it downloads the file to the specified local directory (tmpdir). If
the local directory does not exist, it will be created. The function handles
spaces in directory paths by quoting them appropriately. If the file is
successfully downloaded, it is read into R using readr::read_delim.
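The steps described above can be sketched roughly as follows. This is a minimal illustration and not the package's actual implementation: the helper name build_gsutil_cmds is hypothetical, the use of `gsutil -q stat` for the existence check and of basename(path) for the local file name are assumptions, and the commands are only built here, not executed.

```r
# Hypothetical helper (not part of the package): prepares the local directory
# and builds the shell commands a gsutil-based download might run.
build_gsutil_cmds <- function(path, tmpdir, gsutil_path = "gsutil") {
  # Create the local directory if it is missing (silent if it already exists).
  dir.create(tmpdir, recursive = TRUE, showWarnings = FALSE)

  # Assumption: the local copy keeps the object's base name.
  local_file <- file.path(tmpdir, basename(path))

  list(
    # `gsutil -q stat` exits with status 0 when the object exists.
    check = paste(shQuote(gsutil_path), "-q stat", shQuote(path)),
    # shQuote() protects paths that contain spaces.
    copy = paste(shQuote(gsutil_path), "cp", shQuote(path), shQuote(local_file)),
    local_file = local_file
  )
}

cmds <- build_gsutil_cmds(
  path = "gs://bucket-name/file-name.csv",
  tmpdir = file.path(tempdir(), "my downloads")
)
print(cmds$check)
print(cmds$copy)
```

With gsutil installed and authenticated, the commands could then be run with system(cmds$check) and system(cmds$copy), and the result read with something like readr::read_delim(cmds$local_file, delim = sep, col_names = header, ...).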
If the check_first argument is set to TRUE, the function first checks whether
the file already exists locally. If it does, the file is not downloaded again,
which avoids redundant downloads.
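The local-copy check might look something like the sketch below. The helper needs_download is hypothetical, and it assumes the local file is stored in tmpdir under the base name of the GCS path; the real function may locate the local copy differently.

```r
# Hypothetical helper (not part of the package): decides whether a download
# is needed, given the check_first setting.
needs_download <- function(path, tmpdir, check_first = TRUE) {
  # Assumption: the local copy lives in tmpdir under the object's base name.
  local_file <- file.path(tmpdir, basename(path))
  # Skip the download only when checking is enabled and the file is present.
  !(check_first && file.exists(local_file))
}

# Simulate a previously downloaded file in a fresh temporary directory.
tmp <- tempfile("gcs_cache_")
dir.create(tmp)
writeLines(c("a\tb", "1\t2"), file.path(tmp, "file-name.tsv"))

needs_download("gs://bucket-name/file-name.tsv", tmp)                       # FALSE: local copy found
needs_download("gs://bucket-name/file-name.tsv", tmp, check_first = FALSE)  # TRUE: always download
needs_download("gs://bucket-name/other.tsv", tmp)                           # TRUE: no local copy
```

Note that a check of this kind only looks at the file name, so a stale local copy would be reused; setting check_first = FALSE forces a fresh download.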
if (FALSE) {
  df <- dl_read_gcp(
    path = "gs://bucket-name/file-name.csv",
    sep = ",",
    header = TRUE,
    tmpdir = "/local/path",
    gsutil_path = "gsutil",
    check_first = TRUE,
    verbose = TRUE
  )
}