Skip to content

Commit

Permalink
Bring back agi::fs::path to ensure UTF-8 paths
Browse files Browse the repository at this point in the history
On Windows, std::filesystem::path internally stores paths in UTF-16,
but constructing an std::filesystem::path from a string reads that
string in Windows-1252 or some other non-UTF-8 narrow encoding. This
breaks all kinds of code that previously assumed that one could simply
convert between UTF-8 strings, wstrings, and paths freely.

Before the switch from boost::filesystem to std::filesystem, this was
solved by using boost::filesystem::path::imbue to configure
boost::filesystem to always use UTF-8. However, there is no equivalent
function for std::filesystem. It seems that the encoding used can be
controlled to some degree using the C and C++ locales, but changing
these to UTF-8 breaks other things (and global locales are a headache
in general. I won't pull a wm4 here but you probably know what I mean).

So, there does not seem to be any easy solution to this. Aegisub also
isn't the only program to have this problem, see e.g.
https://www.bunkus.org/2021/03/converting-a-c-code-base-from-boostfilesystem-to-stdfilesystem/

As far as I can see, the three options are
- Somehow mess with the global locales until everything magically works.
  This feels risky, might not work on all systems, and could break in
  the future.
- Audit the entire code base and check every single conversion between
  strings and paths (Yeah, no)
- Reinvent the wheel and write a wrapper class that fixes
  std::filesystem::path by forcing all conversions from and to
  std::string to use UTF-8.

So, here we are. It doesn't feel great to have another reinvention of
something that shouldn't be Aegisub's responsibility in the first place,
and we *just* got rid of all the agi::fs wrapper code, but this seems
like the only sane way to be sure that all conversions happen the way we
expect. I guess since agi::fs wraps std::filesystem and not
boost::filesystem this time, it's still better than before.

Incidentally, std::u8string seems to be kind of a meme too. The idea of
being explicit about your string being UTF-8 is great, but how is there
not even a standard function to reinterpret a string as UTF-8 or
vice-versa?? Let alone support in any other string handling or I/O
functions.

The changeset is pretty big, but the main changes are in fs.h/fs.cpp .
The rest is just a few find&replace calls and a handful of manual fixes.

Finally, it should be noted that conversion between
std::filesystem::paths and std::wstrings is broken on gcc <= 11:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048
This is what currently causes the added lagi_mru.add_entry_utf8 test
to fail on the Ubuntu CI. Clang and newer versions of gcc work, though.

Fixes #219.
  • Loading branch information
arch1t3cht committed Dec 26, 2024
1 parent a99092d commit b005c20
Show file tree
Hide file tree
Showing 121 changed files with 442 additions and 368 deletions.
2 changes: 1 addition & 1 deletion libaegisub/audio/provider.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,7 @@ class writer {
std::ostream& out;

public:
writer(std::filesystem::path const& filename) : outfile(filename, true), out(outfile.Get()) { }
writer(agi::fs::path const& filename) : outfile(filename, true), out(outfile.Get()) { }

template<int N>
void write(const char(&str)[N]) {
Expand Down
4 changes: 2 additions & 2 deletions libaegisub/audio/provider_dummy.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ class DummyAudioProvider final : public AudioProvider {
}

public:
DummyAudioProvider(std::filesystem::path const& uri) {
DummyAudioProvider(agi::fs::path const& uri) {
noise = uri.string().find(":noise?") != std::string::npos;
channels = 1;
sample_rate = 44100;
Expand All @@ -70,7 +70,7 @@ class DummyAudioProvider final : public AudioProvider {
}

namespace agi {
std::unique_ptr<AudioProvider> CreateDummyAudioProvider(std::filesystem::path const& file, agi::BackgroundRunner *) {
std::unique_ptr<AudioProvider> CreateDummyAudioProvider(agi::fs::path const& file, agi::BackgroundRunner *) {
if (!file.string().starts_with("dummy-audio:"))
return {};
return std::make_unique<DummyAudioProvider>(file);
Expand Down
8 changes: 3 additions & 5 deletions libaegisub/audio/provider_hd.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,14 +21,12 @@
#include <libaegisub/fs.h>
#include <libaegisub/path.h>

#include <filesystem>
#include <boost/interprocess/detail/os_thread_functions.hpp>
#include <ctime>
#include <thread>

namespace {
using namespace agi;
using namespace std::filesystem;

class HDAudioProvider final : public AudioProviderWrapper {
mutable temp_file_mapping file;
Expand All @@ -49,7 +47,7 @@ class HDAudioProvider final : public AudioProviderWrapper {
}
}

path CacheFilename(path const& dir) {
fs::path CacheFilename(fs::path const& dir) {
// Check free space
if ((uint64_t)num_samples * bytes_per_sample > fs::FreeSpace(dir))
throw AudioProviderError("Not enough free disk space in " + dir.string() + " to cache the audio");
Expand All @@ -59,7 +57,7 @@ class HDAudioProvider final : public AudioProviderWrapper {
}

public:
HDAudioProvider(std::unique_ptr<AudioProvider> src, path const& dir)
HDAudioProvider(std::unique_ptr<AudioProvider> src, fs::path const& dir)
: AudioProviderWrapper(std::move(src))
, file(dir / CacheFilename(dir), num_samples * bytes_per_sample)
{
Expand All @@ -83,7 +81,7 @@ class HDAudioProvider final : public AudioProviderWrapper {
}

namespace agi {
std::unique_ptr<AudioProvider> CreateHDAudioProvider(std::unique_ptr<AudioProvider> src, std::filesystem::path const& dir) {
std::unique_ptr<AudioProvider> CreateHDAudioProvider(std::unique_ptr<AudioProvider> src, agi::fs::path const& dir) {
return std::make_unique<HDAudioProvider>(std::move(src), dir);
}
}
2 changes: 1 addition & 1 deletion libaegisub/common/charset.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@
#endif

namespace agi::charset {
std::string Detect(std::filesystem::path const& file) {
std::string Detect(agi::fs::path const& file) {
agi::read_file_mapping fp(file);

// First check for known magic bytes which identify the file type
Expand Down
6 changes: 3 additions & 3 deletions libaegisub/common/file_mapping.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,7 @@ char *map(int64_t s_offset, uint64_t length, boost::interprocess::mode_t mode,
}

namespace agi {
file_mapping::file_mapping(std::filesystem::path const& filename, bool temporary)
file_mapping::file_mapping(agi::fs::path const& filename, bool temporary)
#ifdef _WIN32
: handle(CreateFileW(filename.wstring().c_str(),
temporary ? read_write : read_only,
Expand Down Expand Up @@ -116,7 +116,7 @@ file_mapping::~file_mapping() {
}
}

read_file_mapping::read_file_mapping(std::filesystem::path const& filename)
read_file_mapping::read_file_mapping(agi::fs::path const& filename)
: file(filename, false)
{
offset_t size = 0;
Expand All @@ -134,7 +134,7 @@ const char *read_file_mapping::read(int64_t offset, uint64_t length) {
return map(offset, length, read_only, file_size, file, region, mapping_start);
}

temp_file_mapping::temp_file_mapping(std::filesystem::path const& filename, uint64_t size)
temp_file_mapping::temp_file_mapping(agi::fs::path const& filename, uint64_t size)
: file(filename, true)
, file_size(size)
{
Expand Down
3 changes: 1 addition & 2 deletions libaegisub/common/format.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,7 @@
#include <libaegisub/format.h>

#include <libaegisub/charset_conv.h>

#include <filesystem>
#include <libaegisub/fs.h>

#ifdef _MSC_VER
#define WCHAR_T_ENC "utf-16le"
Expand Down
21 changes: 18 additions & 3 deletions libaegisub/common/fs.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ namespace sfs = std::filesystem;

namespace agi::fs {
namespace {
void check_error(std::error_code ec, const char *exp, sfs::path const& src_path, sfs::path const& dst_path) {
void check_error(std::error_code ec, const char *exp, path const& src_path, path const& dst_path) {
if (ec == std::error_code{}) return;
using enum std::errc;
switch (ec.value()) {
Expand Down Expand Up @@ -55,7 +55,7 @@ void check_error(std::error_code ec, const char *exp, sfs::path const& src_path,
check_error(ec, #exp, src_path, dst_path);

#define CHECKED_CALL_RETURN(exp, src_path) \
CHECKED_CALL(auto ret = exp, src_path, std::filesystem::path()); \
CHECKED_CALL(auto ret = exp, src_path, agi::fs::path()); \
return ret

#define WRAP_SFS(sfs_name, agi_name) \
Expand All @@ -69,6 +69,12 @@ void check_error(std::error_code ec, const char *exp, sfs::path const& src_path,
return sfs::sfs_name(p, ec); \
}

#define WRAP_SFS_AGI_PATH(sfs_name, agi_name) \
auto agi_name(path const& p) -> agi::fs::path { \
CHECKED_CALL(auto ret = sfs::sfs_name(p, ec), p, agi::fs::path()); \
return agi::fs::path(std::move(ret)); \
}

// sasuga windows.h
#undef CreateDirectory

Expand All @@ -82,7 +88,8 @@ void check_error(std::error_code ec, const char *exp, sfs::path const& src_path,
WRAP_SFS(last_write_time, ModifiedTime)
WRAP_SFS(create_directories, CreateDirectory)
WRAP_SFS(remove, Remove)
WRAP_SFS(canonical, Canonicalize)
WRAP_SFS_AGI_PATH(canonical, Canonicalize)
WRAP_SFS_AGI_PATH(absolute, Absolute)

uintmax_t Size(path const& p) {
if (DirectoryExists(p))
Expand All @@ -104,4 +111,12 @@ void check_error(std::error_code ec, const char *exp, sfs::path const& src_path,
if (filename[filename.size() - ext.size() - 1] != '.') return false;
return boost::iends_with(filename, ext);
}

agi::fs::path CurrentPath() {
return agi::fs::path(std::filesystem::current_path());
}

void CurrentPath(path const& path) {
std::filesystem::current_path(path);
}
}
2 changes: 1 addition & 1 deletion libaegisub/common/io.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@
#include <fstream>

namespace agi::io {
using namespace std::filesystem;
using namespace agi::fs;

std::unique_ptr<std::istream> Open(path const& file, bool binary) {
LOG_D("agi/io/open/file") << file;
Expand Down
2 changes: 1 addition & 1 deletion libaegisub/common/json.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ json::UnknownElement parse(std::istream &stream) {
}
}

json::UnknownElement file(std::filesystem::path const& file, std::string_view default_config) {
json::UnknownElement file(agi::fs::path const& file, std::string_view default_config) {
try {
if (fs::FileExists(file))
return parse(*io::Open(file));
Expand Down
4 changes: 2 additions & 2 deletions libaegisub/common/keyframe.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -90,15 +90,15 @@ int wwxd(std::string const& line) {
}

namespace agi::keyframe {
void Save(std::filesystem::path const& filename, std::vector<int> const& keyframes) {
void Save(agi::fs::path const& filename, std::vector<int> const& keyframes) {
io::Save file(filename);
std::ostream& of = file.Get();
of << "# keyframe format v1" << std::endl;
of << "fps " << 0 << std::endl;
boost::copy(keyframes, std::ostream_iterator<int>(of, "\n"));
}

std::vector<int> Load(std::filesystem::path const& filename) {
std::vector<int> Load(agi::fs::path const& filename) {
auto file = io::Open(filename);
std::istream &is(*file);

Expand Down
4 changes: 2 additions & 2 deletions libaegisub/common/log.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,11 @@
#include "libaegisub/cajun/elements.h"
#include "libaegisub/cajun/writer.h"
#include "libaegisub/dispatch.h"
#include "libaegisub/fs.h"
#include "libaegisub/util.h"

#include <boost/range/algorithm/remove_if.hpp>
#include <chrono>
#include <filesystem>
#include <fstream>

namespace agi::log {
Expand Down Expand Up @@ -97,7 +97,7 @@ Message::~Message() {
agi::log::log->Log(sm);
}

JsonEmitter::JsonEmitter(std::filesystem::path const& directory)
JsonEmitter::JsonEmitter(agi::fs::path const& directory)
: fp(new std::ofstream(directory/util::strftime("%Y-%m-%d-%H-%M-%S.json")))
{
}
Expand Down
8 changes: 4 additions & 4 deletions libaegisub/common/mru.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ int mru_index(std::string_view key) {
}

namespace agi {
MRUManager::MRUManager(std::filesystem::path const& config, std::string_view default_config, agi::Options *options)
MRUManager::MRUManager(agi::fs::path const& config, std::string_view default_config, agi::Options *options)
: config_name(config)
, options(options)
{
Expand All @@ -78,7 +78,7 @@ MRUManager::MRUListMap &MRUManager::Find(std::string_view key) {
return mru[index];
}

void MRUManager::Add(std::string_view key, std::filesystem::path const& entry) {
void MRUManager::Add(std::string_view key, agi::fs::path const& entry) {
MRUListMap &map = Find(key);
auto it = find(begin(map), end(map), entry);
if (it == begin(map) && it != end(map))
Expand All @@ -93,7 +93,7 @@ void MRUManager::Add(std::string_view key, std::filesystem::path const& entry) {
Flush();
}

void MRUManager::Remove(std::string_view key, std::filesystem::path const& entry) {
void MRUManager::Remove(std::string_view key, agi::fs::path const& entry) {
auto& map = Find(key);
map.erase(remove(begin(map), end(map), entry), end(map));
Flush();
Expand All @@ -103,7 +103,7 @@ const MRUManager::MRUListMap* MRUManager::Get(std::string_view key) {
return &Find(key);
}

std::filesystem::path const& MRUManager::GetEntry(std::string_view key, const size_t entry) {
agi::fs::path const& MRUManager::GetEntry(std::string_view key, const size_t entry) {
const auto map = Get(key);
if (entry >= map->size())
throw MRUError("Requested element index is out of range.");
Expand Down
2 changes: 1 addition & 1 deletion libaegisub/common/option.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -171,7 +171,7 @@ struct option_name_cmp {

namespace agi {

Options::Options(std::filesystem::path const& file, std::string_view default_config, OptionSetting setting)
Options::Options(agi::fs::path const& file, std::string_view default_config, OptionSetting setting)
: config_file(file)
, setting(setting)
{
Expand Down
2 changes: 1 addition & 1 deletion libaegisub/common/path.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@ fs::path Path::MakeRelative(fs::path const& path, fs::path const& base) const {
auto ref_it = base.begin();
for (; path_it != path.end() && ref_it != base.end() && *path_it == *ref_it; ++path_it, ++ref_it) ;

std::filesystem::path result;
agi::fs::path result;
for (; ref_it != base.end(); ++ref_it)
result /= "..";
for (; path_it != path.end(); ++path_it)
Expand Down
2 changes: 1 addition & 1 deletion libaegisub/common/thesaurus.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@

namespace agi {

Thesaurus::Thesaurus(std::filesystem::path const& dat_path, std::filesystem::path const& idx_path)
Thesaurus::Thesaurus(agi::fs::path const& dat_path, agi::fs::path const& idx_path)
: dat(std::make_unique<read_file_mapping>(dat_path))
{
read_file_mapping idx_file(idx_path);
Expand Down
4 changes: 2 additions & 2 deletions libaegisub/common/vfr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -171,7 +171,7 @@ Framerate::Framerate(std::initializer_list<int> timecodes)
SetFromTimecodes();
}

Framerate::Framerate(std::filesystem::path const& filename)
Framerate::Framerate(agi::fs::path const& filename)
: denominator(default_denominator)
{
auto file = agi::io::Open(filename);
Expand All @@ -192,7 +192,7 @@ Framerate::Framerate(std::filesystem::path const& filename)
throw UnknownFormat(line);
}

void Framerate::Save(std::filesystem::path const& filename, int length) const {
void Framerate::Save(agi::fs::path const& filename, int length) const {
agi::io::Save file(filename);
auto &out = file.Get();

Expand Down
1 change: 0 additions & 1 deletion libaegisub/include/lagi_pre.h
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,6 @@
#endif

#include <algorithm>
#include <filesystem>
#include <functional>
#include <iterator>
#include <map>
Expand Down
12 changes: 6 additions & 6 deletions libaegisub/include/libaegisub/access.h
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
// ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
// OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

#include <filesystem>
#include <libaegisub/fs.h>

namespace agi::acs {
enum Type {
Expand All @@ -22,11 +22,11 @@ enum Type {
DirWrite
};

void Check(std::filesystem::path const& file, acs::Type);
void Check(agi::fs::path const& file, acs::Type);

static inline void CheckFileRead(std::filesystem::path const& file) { Check(file, acs::FileRead); }
static inline void CheckFileWrite(std::filesystem::path const& file) { Check(file, acs::FileWrite); }
static inline void CheckFileRead(agi::fs::path const& file) { Check(file, acs::FileRead); }
static inline void CheckFileWrite(agi::fs::path const& file) { Check(file, acs::FileWrite); }

static inline void CheckDirRead(std::filesystem::path const& dir) { Check(dir, acs::DirRead); }
static inline void CheckDirWrite(std::filesystem::path const& dir) { Check(dir, acs::DirWrite); }
static inline void CheckDirRead(agi::fs::path const& dir) { Check(dir, acs::DirRead); }
static inline void CheckDirWrite(agi::fs::path const& dir) { Check(dir, acs::DirWrite); }
}
10 changes: 5 additions & 5 deletions libaegisub/include/libaegisub/audio/provider.h
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,9 @@
#pragma once

#include <libaegisub/exception.h>
#include <libaegisub/fs.h>

#include <atomic>
#include <filesystem>
#include <memory>
#include <vector>

Expand Down Expand Up @@ -84,14 +84,14 @@ DEFINE_EXCEPTION(AudioDataNotFound, AudioProviderError);

class BackgroundRunner;

std::unique_ptr<AudioProvider> CreateDummyAudioProvider(std::filesystem::path const& filename, BackgroundRunner *);
std::unique_ptr<AudioProvider> CreatePCMAudioProvider(std::filesystem::path const& filename, BackgroundRunner *);
std::unique_ptr<AudioProvider> CreateDummyAudioProvider(agi::fs::path const& filename, BackgroundRunner *);
std::unique_ptr<AudioProvider> CreatePCMAudioProvider(agi::fs::path const& filename, BackgroundRunner *);

std::unique_ptr<AudioProvider> CreateConvertAudioProvider(std::unique_ptr<AudioProvider> source_provider);
std::unique_ptr<AudioProvider> CreateLockAudioProvider(std::unique_ptr<AudioProvider> source_provider);
std::unique_ptr<AudioProvider> CreateHDAudioProvider(std::unique_ptr<AudioProvider> source_provider,
std::filesystem::path const& dir);
agi::fs::path const& dir);
std::unique_ptr<AudioProvider> CreateRAMAudioProvider(std::unique_ptr<AudioProvider> source_provider);

void SaveAudioClip(AudioProvider const& provider, std::filesystem::path const& path, int start_time, int end_time);
void SaveAudioClip(AudioProvider const& provider, agi::fs::path const& path, int start_time, int end_time);
}
4 changes: 2 additions & 2 deletions libaegisub/include/libaegisub/charset.h
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
// ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
// OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

#include <filesystem>
#include <libaegisub/fs.h>
#include <string>

namespace agi {
Expand All @@ -22,7 +22,7 @@ namespace agi {
/// @brief Returns the character set with the highest confidence
/// @param file File to check
/// @return Detected character set.
std::string Detect(std::filesystem::path const& file);
std::string Detect(agi::fs::path const& file);

} // namespace util
} // namespace agi
Loading

0 comments on commit b005c20

Please sign in to comment.